Alrighty, today was pretty good. Again, not a lot to show, but it was mostly research on finding the beats in a sound file.
Started out trying to work with the MIR files, but they made very little sense. The documentation was meager, and I couldn't actually tell what the analysis was doing. Looking at the website, I think it was the program to compare songs, but the output files wouldn't open in any program, and looking at them as plain text, each was just a bunch of numbers followed by a filename. Useful if one knows what they mean, but....
I started looking around the MIR website and found a version of the BeatFinder written in C++. Wish I could have run it, but I don't know how to compile C++ from the terminal. Anyway, I started reading it, and I can already tell that I can't translate it directly, because rather than using an array of doubles for the data, it creates its own object called Sounddata. All the extra information in it is already stored by our other methods, so I see no reason to try to implement it.
Before trying to get through the BeatDetection, I saw in their comments that they had gotten their algorithms from another site, gamedev.net. I found the article they mentioned, "Beat Detection Algorithms" (c) Frederic Patin 2002, and it helped a lot. Put simply, the human ear finds beats by recognizing a briefly louder sound in a song at a particular frequency. If a program can find where the emphasized notes are, it can find the beats.
It went into a lot of detail about how to decide whether a given sample is a beat by comparing it to the average of the local sound (about one second surrounding the sample, so that if a song changes in intensity, it does not miss quiet beats or falsely return loud non-beats). There are several methods of optimization, such as keeping the computed energy values rather than the raw frames of a sample, adding in a multiplier to avoid getting loud non-beats, and using an FFT to only compare the energy of given frequencies (to better find a back-beat or a cymbal hit).
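To make the comparison concrete, here is a rough Python sketch of the simplest version: chop the samples into blocks, compute each block's energy, and flag a block as a beat when its energy is well above the average of the blocks around it. The function names, the block size of 1024 samples, the window of 43 blocks (roughly one second at 44.1 kHz), and the multiplier of 1.3 are my own placeholder choices for illustration, not tested values, and this is not the MIR code. It also skips the FFT step entirely.

```python
def block_energies(samples, block_size=1024):
    """Sum of squared amplitudes for each full block of samples."""
    return [
        sum(x * x for x in samples[i:i + block_size])
        for i in range(0, len(samples) - block_size + 1, block_size)
    ]


def find_beats(energies, window=43, multiplier=1.3):
    """Return indices of blocks whose energy exceeds multiplier * local average.

    The local average covers up to `window` blocks centred on the block
    being tested (assumed here to be about one second of sound).
    """
    beats = []
    half = window // 2
    for i, energy in enumerate(energies):
        lo = max(0, i - half)
        hi = min(len(energies), i + half + 1)
        local_avg = sum(energies[lo:hi]) / (hi - lo)
        # The multiplier keeps ordinary loud passages from counting as beats.
        if energy > multiplier * local_avg:
            beats.append(i)
    return beats
```

Working on the list of energies rather than the raw frames means each block's energy is only computed once, which is the first optimization mentioned above. The multiplier and window would both need tuning against real songs.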
Comparing what was said and what we have, I feel like we could implement this in a week (it'll probably take less than two days to write, but a while longer to test and tune the various factors until it is accurate). The article suggests having a logarithmic scale (and/or geometric spacing) for greater accuracy, so the constant Q should help with that. We may have to write an extra file with all the FFT data (rather than just the peaks) to use with the beat finder, but that shouldn't take up much time, programming or processing. I think it's doable in a week, doable well in two.
Very encouraging! Good idea to pursue the other article. The spectral centroid and spread also need a set of frequencies and magnitudes from the FFT, so this will be useful for that too. Also, the FFT can give us frequencies that the Q-transform won't, perhaps those made by percussive instruments?