20

From what I understand, it is difficult for automatic algorithms to figure out the key of a recorded piece of music. For example, in one review of a range of professional "DJ" software that had a key detection feature, the software typically scored about 50%, only guessing the correct key half the time.

I guess I am confused by this. If we can have software like Auto-Tune that can adjust music to be in the correct key, why is it so difficult to detect the key in the first place?

Tyler Durden
  • 2
    Even humans have a hard time detecting keys. That seems like a task that would be much harder to program a computer to do than to do oneself, as hard as it is for a human to do it. – Todd Wilcox Apr 23 '18 at 01:45
  • 14
    As I understand it, Auto-Tune does not really "adjust music to be in the correct key," but rather merely shifts pitches to the nearest semitone. That is a much simpler problem than key-detection. –  Apr 23 '18 at 02:11
  • 2
    @DavidBowling Autotune can do either or both. When autotune is correcting the pitch to the nearest pitch in the key, it is not done by detecting the key from other music, the key is chosen by the user. What Tyler Durden isn't understanding here is that the analysis required for pitch correction software is very simple: detect the pitch. Once the pitch is detected, it's very simple to determine the closest pitch and then shift the pitch to that. Detecting pitches is easy. Detecting keys is much more complicated. – Todd Wilcox Apr 23 '18 at 02:16
  • @Stinkfoot The questions are very similar but none of the answers of the candidate dupe come even close to answering this question. I think an important difference is that this question asks "why?", not "what?". – Todd Wilcox Apr 23 '18 at 03:57
  • @ToddWilcox - I agree. I retracted my close vote. – Stinkfoot Apr 23 '18 at 04:02
  • This question would get much better responses on the Software Engineering SE site – Cloud Apr 23 '18 at 08:39
  • Given a sequence of notes, there may be multiple matching keys, even though the composer had only one in mind. The same sequence of notes can match two scales, but function as different scale degrees in each scale. Without knowing the composer's intent, the answer can be ambiguous. To make things worse, composers regularly break the rules and use notes outside of the key they are composing in. – Owen Johnson Apr 23 '18 at 18:14
  • Would any of these automatic key detectors detect polytonality or at least bitonality? Polytonality isn't restricted to classical music, I've found: Giratina's theme from Pokemon Platinum is bitonal in parts (B major in treble, G Phrygian/C minor-like in bass), and the Team Galactic Admin theme from Pokemon Diamond/Pearl/Platinum is also bitonal in parts (A flat minor in treble, F minor in bass). – Dekkadeci Apr 23 '18 at 23:08

4 Answers

28

There are a few factors at play here:


Let's assume that we have a magical piece of software, which can listen to audio and tell us exactly what notes are being played. Even given this software, determining key is not a trivial problem. Sure, there are simple cases, but even humans disagree over many songs. A computer has no chance.

Take Sweet Home Alabama. The chords are D C G. Many electrons have been wasted arguing over whether this is a V IV I in G Major or a I bVII IV in D Major. I personally think it's in the key of "please never play that again", so I avoid analysing the infernal thing too closely.

Or take Hey Jude. The na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na bit. If we transpose a bit, the chords are also D C G. But that's pretty clearly a I bVII IV in D major. Context is important, and building an algorithm to automatically determine that context is a complex problem.
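To make the ambiguity concrete, here is a minimal sketch (my own illustration, not how any DJ software works). It collects the pitch classes of the D, C and G major triads and checks which major scales and Mixolydian modes contain all of them. The names `major_triad` and `matching_modes` are made up for this example, and treating the "I bVII IV in D" reading as D Mixolydian is my assumption.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Scale degrees as semitones above the tonic.
MODES = {
    "major": [0, 2, 4, 5, 7, 9, 11],
    "mixolydian": [0, 2, 4, 5, 7, 9, 10],  # major with a flattened seventh
}

def major_triad(root):
    """Pitch classes (0-11) of the major triad built on the named root."""
    r = NOTE_NAMES.index(root)
    return {r, (r + 4) % 12, (r + 7) % 12}

def matching_modes(pitch_classes):
    """Return every (tonic, mode) whose scale contains all the given pitch classes."""
    matches = []
    for tonic in range(12):
        for mode, steps in MODES.items():
            scale = {(tonic + s) % 12 for s in steps}
            if pitch_classes <= scale:
                matches.append((NOTE_NAMES[tonic], mode))
    return matches

progression = set().union(*(major_triad(r) for r in ["D", "C", "G"]))
print(matching_modes(progression))
# [('D', 'mixolydian'), ('G', 'major')]
```

Both readings use exactly the same seven pitch classes, so the note content alone cannot tell you which note is "home". That decision has to come from context such as rhythm, phrasing and what came before.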


So, we've established that 100% of the surveyed songs with a D C G progression are annoying. The next part of the problem is actually getting a list of pitches to do this key recognition.

You'll notice that I used the word "magical" in the previous section. Most pitch recognition software will do some sort of frequency analysis. Basically, they grab a section of audio, and determine what frequencies are present. We know the frequency of every note, so we can map that list of frequencies to a list of pitches.
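As a rough sketch of that mapping step, assuming twelve-tone equal temperament with A4 tuned to 440 Hz (both are my assumptions, and `frequency_to_note` is a hypothetical helper name):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_note(freq_hz, a4_hz=440.0):
    """Return the nearest equal-tempered note name and the error in cents."""
    # MIDI note numbers put A4 at 69, with one unit per semitone.
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(midi)
    cents_off = 100 * (midi - nearest)
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents_off

print(frequency_to_note(440.0))   # ('A4', 0.0)
print(frequency_to_note(261.63))  # nearest note is C4, a tiny fraction of a cent sharp
```

Note that the names come out as sharps only. Choosing between A# and B♭ is a separate question.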

Not so fast. Unfortunately, when an instrument plays a note, it produces more than one frequency. That's why a piano doesn't sound like a guitar. Some of those frequencies will be harmonic; that is, multiples of the root frequency. Others will not. If the instrument is not pitched (like untuned percussion, or a noise sweep), there will be lots of these inharmonic frequencies.
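Here is a small sketch of why harmonics confuse a naive frequency-to-pitch mapping. It assumes an idealised instrument whose overtones are exact integer multiples of the fundamental, and equal temperament with A4 = 440 Hz. The helper names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Name of the equal-tempered note closest to the given frequency."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

def harmonic_notes(fundamental_hz, count=6):
    """Nearest note names for the first `count` harmonics of one played note."""
    return [nearest_note(fundamental_hz * n) for n in range(1, count + 1)]

print(harmonic_notes(110.0))
# ['A2', 'A3', 'E4', 'A4', 'C#5', 'E5']
```

A single A at 110 Hz puts energy on A, E and C#, which is the note content of an A major chord. Software that treats every spectral peak as a played note would report three notes where only one was played.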

If you have a complete track, separating all these frequencies, determining which are pitches, and which are harmonics, is non-trivial. It's kind of like trying to separate the ingredients of a milkshake once they are mixed. It's certainly possible to get a good approximation, but hard to actually tell exactly what was being played. The (trained) human ear is much better at this task than computers.

Now, to be fair, if you're just trying to determine the key (rather than transcribe every note), this problem is easier to solve. I don't care who is playing what note; just the overall harmonic structure. But there's still plenty of room for your computer to make mistakes here.


A couple of comments have observed that even if you have a list of pitches, converting them to note names requires some idea of the key. This is because the vast majority of Western music uses enharmonic equivalents. In equal temperament, A# and B♭ are the same frequency, and we choose the name based on the key.

For a lot of music, this isn't really a big problem. For example, here's a set of pitches:

A#/B♭/C♭♭ B#/C/D♭♭ C##/D/E♭♭ D#/E♭/F♭♭ E#/F/G♭♭ F##/G/A♭♭ G##/A/B♭♭

It's pretty obvious that this is B♭ Major. You could call it A# Major, but that's a much more complicated way to spell the scale, so we don't. Equally, C♭♭ Major is not a good name. That sort of heuristic is quite easy to add to software, so in this simple case, it's not really a problem.
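One way such a heuristic could look, as a sketch only: spell the major scale for each candidate tonic name using one letter per degree, count the accidentals, and keep the spelling with the fewest. The function names are hypothetical, and ASCII `#` and `b` stand in for the sharp and flat signs.

```python
LETTERS = "CDEFGAB"
NATURAL_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitones above the tonic

def parse(name):
    """Split a name like 'Bb' or 'A#' into (letter, accidental offset)."""
    letter, marks = name[0], name[1:]
    return letter, marks.count("#") - marks.count("b")

def spell_major_scale(tonic):
    """Spell a major scale so that each degree uses the next letter name."""
    letter, offset = parse(tonic)
    tonic_pc = (NATURAL_PC[letter] + offset) % 12
    start = LETTERS.index(letter)
    scale = []
    for degree, step in enumerate(MAJOR_STEPS):
        deg_letter = LETTERS[(start + degree) % 7]
        target = (tonic_pc + step) % 12
        # Signed distance from the natural letter to the target pitch class.
        acc = (target - NATURAL_PC[deg_letter] + 6) % 12 - 6
        scale.append(deg_letter + ("#" * acc if acc > 0 else "b" * -acc))
    return scale

def accidental_count(scale):
    """Total number of sharp or flat signs in a spelled scale."""
    return sum(len(note) - 1 for note in scale)

def simplest_spelling(candidates):
    """Pick the tonic name whose major scale needs the fewest accidentals."""
    return min(candidates, key=lambda t: accidental_count(spell_major_scale(t)))

print(spell_major_scale("Bb"))                   # ['Bb', 'C', 'D', 'Eb', 'F', 'G', 'A']
print(accidental_count(spell_major_scale("A#"))) # 10
print(simplest_spelling(["A#", "Bb", "Cbb"]))    # Bb
```

With this count, F# major and Gb major both come out at six accidentals, so `min` simply returns whichever candidate is listed first.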

It could be more problematic when there are two equally right options, like F# Major vs G♭ major. Again, either is correct, so you just pick one.

If the key is more ambiguous, then this could be more of an issue. But I think the other problems are much more significant.


Finally, on Auto-Tune. Auto-Tune's job is easier for a couple of reasons. Firstly, it's going in the other direction. It has a set of "good" notes (semitones, or a user-specified key), and moves any "bad" notes accordingly. It doesn't have to assign a key. Secondly, you generally autotune a single isolated instrument. That's much easier to handle than a complete mix. I don't know what Auto-Tune will do if you run it over the whole mix at once, but I don't think it will be pretty.
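To show how little the "which note should this be?" step needs, here is a hedged sketch. It is not Auto-Tune's actual implementation, and it skips the pitch detection and audio resampling entirely. It only picks a target frequency, assuming equal temperament and A4 = 440 Hz. The function names are hypothetical.

```python
import math

def snap_to_semitone(freq_hz, a4_hz=440.0):
    """Return the frequency of the nearest equal-tempered semitone."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return a4_hz * 2 ** ((midi - 69) / 12)

def snap_to_scale(freq_hz, allowed_pitch_classes, a4_hz=440.0):
    """Return the nearest pitch whose pitch class is in the user-chosen set."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    base = math.floor(midi)
    # The nearest allowed pitch is never more than six semitones away.
    candidates = [m for m in range(base - 6, base + 8)
                  if m % 12 in allowed_pitch_classes]
    target = min(candidates, key=lambda m: abs(m - midi))
    return a4_hz * 2 ** ((target - 69) / 12)

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # the key is supplied by the user, not detected

print(round(snap_to_semitone(475.0), 2))        # 466.16 (A#4)
print(round(snap_to_scale(475.0, C_MAJOR), 2))  # 493.88 (B4)
```

The key goes in as an input (`C_MAJOR` here). Nothing in this step tries to work out what the key is.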


In short:

  • Even given a list of all the notes/chords, key detection is non-trivial
  • Getting that list of notes and chords automatically is not a reliable process

As a result, computers can certainly attempt automatic key recognition, and get close in a lot of cases, but it's unlikely that they will ever be 100% accurate. If someone would like to prove me wrong, I'd love a free copy of your software to verify your claims. For scientific purposes, of course.

endorph
  • Actually, your magical software might exist. I think Celemony Melodyne can recognize all the pitches in a piece of music, if everything goes correctly. – Todd Wilcox Apr 23 '18 at 03:56
  • "*A fully mixed track is a much harder problem.*" and I think the worst offenders are [unpitched percussion instruments](https://en.wikipedia.org/wiki/Unpitched_percussion_instrument) because their sounds are considered "noise" to the software. – Andrew T. Apr 23 '18 at 04:38
  • @AndrewT. Yes, my layman's guess is that it's a combination of the inharmonic sound generated by percussion, and the issue of distinguishing "overtones generated by a single timbre" from "actual distinct notes". All that being said, I haven't used Melodyne. And my attempts with AudioScore are from five or six years ago now. – endorph Apr 23 '18 at 04:48
  • If Sweet Home Alabama isn't in G, it must be permanently modulating. How many songs do that? – Tim Apr 23 '18 at 07:05
  • @Tim Nah, the argument is usually "V IV I" in G Major vs. "I bVII IV" in D Major. I honestly haven't spent any time thinking about it, but I've seen the 30-page forum threads arguing about it. Such is the internet. – endorph Apr 23 '18 at 08:49
  • 4
    It's [The dress](https://en.wikipedia.org/wiki/The_dress) of the music world... FWIW I'm a "I bVII IV" guy! – Нет войне Apr 23 '18 at 09:11
  • @topomorto - so, one of the 12 bars I played last night wasn't actually in A, as we thought, but was in E... Funny, 'cos it sounded like A was the best key! – Tim Apr 23 '18 at 10:09
  • Let's just settle once and for all with a compromise: _Sweet Home Alabama is in B♭ major!_ Oh, or is it B major? No, should we wrap around the other way and call it E quarternote-sharp major? Damn... – leftaroundabout Apr 23 '18 at 11:33
  • On a more serious note: I wonder why it is that Sweet Home Alabama sounds more like **Ⅴ** **Ⅳ** **Ⅰ** (to me and apparently to others too), whereas e.g. the outro of Hey Jude is quite clearly **Ⅰ** **Ⅶ♭** **Ⅳ**. – leftaroundabout Apr 23 '18 at 11:39
  • 4
    @ToddWilcox I have some software that is guaranteed to win the jackpot on any lottery, if everything goes correctly. – OrangeDog Apr 23 '18 at 13:01
  • @leftaroundabout Hey Jude firmly establishes itself as being in F before the ending. Also in Sweet Home Alabama the V and IV each take half a bar, which makes the I sound a bit stronger and more home-y. Finally, the Hey Jude coda starts with a very firm F major supported by a melody which very clearly outlines that chord. Sweet Home Alabama is much more ambiguous. – Javier Apr 23 '18 at 14:25
  • To be more clear, computers, like humans, can only hear pitches - the *note* is an abstract concept. Your "list of all the notes" presupposes a key already - what you will have is a list of pitches and the task of key detection is deciding what names you want to give to those pitches; to decide how you would transcribe them as notes. You can't have a list of notes unless you've already decided whether a given pitch is A♯ or B♭, and with that you've also decided the key. Even a group of humans would probably do this differently for many pieces given no sheet music and a recording to transcribe. – J... Apr 23 '18 at 16:29
  • https://youtu.be/DVPq_-oJV5U Adam Neely's recent take on Sweet Home Alabama and key-naming – user45266 Dec 05 '19 at 19:22
5

I guess I am confused by this. If we can have software like Auto-Tune that can adjust music to be in the correct key, why is it so difficult to detect the key in the first place?

Without getting into any of the particulars of automated key detection and its difficulties, I think the answer to this question is fairly simple:

Auto Detect needs a frame of reference - a baseline - from which to work. Music in any key has a particular pattern reflected in its notes. When we transpose to a different key, we duplicate that pattern, just using different notes.

A very simple example:

The pattern for the major scale is:

Tonic->Whole Step->Whole Step->Half Step->Whole Step->Whole Step->Whole Step->Half Step==Octave.

Following that pattern starting on the note C, we get C->D->E->F->G->A->B->C.

But we can just as easily apply that pattern starting with the note D, giving us
D->E->F#->G->A->B->C#->D.

Replicating a pattern from different starting points is a great job for a computer - it's one of the things they do best - a fundamental computing operation. This is because it requires no original thought or analysis - it's a simple, mechanical/mathematical process. So, once we have music in an established key, we can easily tell a computer program to transpose that music to a different key - simply replicate the pattern with a different set of notes.
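A minimal sketch of that mechanical process, using sharps-only note names for simplicity (so it ignores enharmonic spelling). The names `major_scale` and `transpose` are just illustrative.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Whole step = 2 semitones, half step = 1 semitone.
MAJOR_PATTERN = [2, 2, 1, 2, 2, 2, 1]

def major_scale(tonic):
    """Apply the whole/half step pattern starting from the given tonic."""
    index = NOTE_NAMES.index(tonic)
    scale = [tonic]
    for step in MAJOR_PATTERN:
        index = (index + step) % 12
        scale.append(NOTE_NAMES[index])
    return scale

def transpose(notes, from_tonic, to_tonic):
    """Shift every note by the interval between the two tonics."""
    shift = NOTE_NAMES.index(to_tonic) - NOTE_NAMES.index(from_tonic)
    return [NOTE_NAMES[(NOTE_NAMES.index(n) + shift) % 12] for n in notes]

print(major_scale("C"))                     # ['C', 'D', 'E', 'F', 'G', 'A', 'B', 'C']
print(major_scale("D"))                     # ['D', 'E', 'F#', 'G', 'A', 'B', 'C#', 'D']
print(transpose(["C", "E", "G"], "C", "D")) # ['D', 'F#', 'A']
```

Both functions take the key as an argument. Neither has to work it out.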

But detecting the original key with no pre-existing frame of reference to work with and replicate is a very different and much more difficult job for a computer. It requires analysis and discernment and judgment. There is no pattern to be replicated. It requires a great deal of information and knowledge about music to determine the key of a piece, and it is sometimes quite ambiguous. In order for a program to accurately determine the key of a piece of music, it must have all of that knowledge available and be able to use it to arrive at the proper key. As a software developer, I can tell you that is a difficult computing problem indeed - getting it right is no mean feat.

Stinkfoot
2

Even individual notes have multiple names - did I just play G sharp or A flat? The same group of notes may be named as any of several chords. If I play C, E flat, G, B flat, do I mean Cm7, or E flat 6? This depends on which key the composition is in, and that "key" depends on the overall context of (usually all) the other notes and chords in the composition. The idea of a "Key" is an abstract mental model that we overlay onto an acoustic phenomenon to better understand its structure. That "acoustic phenomenon" (the musical composition) may be very complex, and no matter whether it is mathematically precise and regular or loose and free-form, our model of it will necessarily be an approximation. Stated otherwise, the music itself is a physical reality (vibrations in the real world), while the "Key" exists only in our minds. Automatic key detection is one of those things scientists call a "hard problem".
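The chord example can be checked mechanically. This is a small sketch with only two chord shapes and flat-based note names, both chosen for illustration. `possible_names` is a hypothetical helper.

```python
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Intervals in semitones above the root for two chord types.
CHORD_SHAPES = {
    "m7": {0, 3, 7, 10},  # minor seventh
    "6": {0, 4, 7, 9},    # major sixth
}

def possible_names(notes):
    """Try each note as the root and report every chord shape that fits."""
    pcs = {NOTE_NAMES.index(n) for n in notes}
    names = []
    for root in sorted(pcs):
        intervals = {(pc - root) % 12 for pc in pcs}
        for suffix, shape in CHORD_SHAPES.items():
            if intervals == shape:
                names.append(NOTE_NAMES[root] + suffix)
    return names

print(possible_names(["C", "Eb", "G", "Bb"]))
# ['Cm7', 'Eb6']
```

The code can list both names, but it has no basis for preferring one. That choice depends on the surrounding music.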

1

The problem is hard because composers make it so.

Arguably, the very definition of music composition is to obscure and interpret. Half of the techniques we learn have to do with prolongation, suspension, evasion, finding ways to make dissonant notes seem consonant, etc. If you subscribe to Schenkerian Analysis you might even believe all true masterworks are nothing more than a (very complex) embellishment of I.

Interesting music is designed to challenge your ear to find its inner reason. And sometimes it isn't even there. It's no wonder computers have trouble with it.

John Wu
  • 2
    This doesn't really answer the question, as the premise of the question holds true even for the simplest of pop songs. You're correct in that it's harder to infer the key of "masterworks" as you put it, but that doesn't reduce the answer to the question to simply "composers are deliberately obtuse" – Matt Taylor Apr 23 '18 at 09:57
  • 3
    @MattTaylor -- I have to disagree; I think that this is a pretty good answer. This doesn't really say that "composers are deliberately obtuse," but it is true that music, even relatively unassuming music, develops interest by avoiding resolution, suggesting other keys, and generally frustrating the sense of key. Lots of simple pop music avoids simple adherence to a key because it doesn't know better, and even the blues complicates matters by using seventh chords everywhere. –  Apr 23 '18 at 10:36
  • 3
    "the very definition of music composition is to obscure and interpret" Where in the world do you get this from? What about "Create enjoyable sounds" or "Express emotion"? – MickeyfAgain_BeforeExitOfSO Apr 23 '18 at 11:56
  • _Interesting music is designed to challenge your ear to find its inner reason_ Would you consider Mozart "interesting"? – Stinkfoot Apr 23 '18 at 15:44
  • 2
    @MattTaylor "composers are deliberately obtuse" doesn't seem a fair interpretation of this answer. deliberately *opaque*, maybe, but then there's some truth in that - which is why people ask questions like [this one](https://music.stackexchange.com/questions/70012/sources-of-harmonic-ambiguity-in-tonal-music) – Нет войне Apr 23 '18 at 16:01
  • @MattTaylor Tell you what, Matt. You post what you consider today's most "simple" pop song; I'll edit my answer to explain some of the compositional techniques that it uses that obscure the key. – John Wu Apr 23 '18 at 18:15
  • @JohnWu Challenge accepted. Song 1: Blink 182 - all the small things. And not just what's not 'in key', I want you to justify why they _deliberately_ put those elements in to draw you away from 1, 4, 5, 6. – Matt Taylor Apr 24 '18 at 07:22