Analyzing Music Internship: 04 July 2010

09 July, 2010

Axtell's Notes: July 9

I started off with a project that I thought would take all day, but it was done before lunch: I switched everything from a 3D array to a Hashtable of 2D arrays so that access would be faster. Most full-length songs run in under a minute now. I also cleaned up the code quite a bit to get rid of warnings (mostly redundant casts and unchecked instances).

I then spent the rest of the day making a browse option, or trying to. It doesn't work yet. The idea is that you should be able to use any sound file from anywhere on the computer, so I'm making a window that works like any Open Document dialog: it shows the directory, and you can expand or collapse folders, select the desired file and use it. There are two problems with this. First, it prints the whole pathname of each file or folder, which makes the labels too long to read comfortably; JTree labels each node using File.toString(), so I need to find a way to print only the end of the pathname. The other problem is that this only works once. If you click browse and choose a filename, that works fine, but try to do it again without restarting the program and it throws NullPointerExceptions.
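One possible way around the long labels, sketched under the assumption that each tree node holds a File as its user object: JTree builds a node's label by calling toString() on that object, so wrapping the File in a small class whose toString() returns only File.getName() would shorten the label while keeping the full path available. FileNode is a hypothetical name for illustration, not a class from the project.

```java
import java.io.File;

/** Hypothetical wrapper: shows only the last path element in a JTree. */
final class FileNode {
    private final File file;

    FileNode(File file) {
        this.file = file;
    }

    /** The full File, for when the selected node is actually opened. */
    File getFile() {
        return file;
    }

    /** JTree uses this text as the node label. */
    @Override
    public String toString() {
        String name = file.getName();
        // Filesystem roots such as "/" have an empty name, so fall back to the path.
        return name.isEmpty() ? file.getPath() : name;
    }
}
```

A node would then be built with new DefaultMutableTreeNode(new FileNode(file)) rather than by passing the File directly.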

While showing the professor how the GUI works, I found out why my code has been printing a lot less than it used to. Where I get the dB of the file, I've been multiplying by 0.775 instead of dividing. This hasn't been noticeable on files such as a440 because 0.775 is close enough to 1 that it didn't make a difference. On full songs, though, it was very noticeable.
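To make the mix-up concrete, here is a minimal sketch of the conversion. It assumes the usual 20 * log10(value / reference) form with 0.775 as the reference; the class and method names are made up for illustration and are not the ones in the project.

```java
/** Hypothetical sketch of a decibel conversion against a 0.775 reference. */
final class DecibelSketch {
    static final double REFERENCE = 0.775;

    /** Intended form: divide by the reference before taking the log. */
    static double toDecibels(double amplitude) {
        return 20.0 * Math.log10(amplitude / REFERENCE);
    }

    /** The mistaken form: multiplying instead of dividing. */
    static double toDecibelsBuggy(double amplitude) {
        return 20.0 * Math.log10(amplitude * REFERENCE);
    }

    public static void main(String[] args) {
        double amplitude = 0.5;
        System.out.println("dividing:    " + toDecibels(amplitude));
        System.out.println("multiplying: " + toDecibelsBuggy(amplitude));
    }
}
```

Under this assumed form the mistaken version comes out a constant 40 * log10(1 / 0.775), roughly 4.4 dB, lower than the intended one, which would push more values under any fixed threshold.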

I have several tasks for next week besides getting the browse feature working. I'm making an "advanced" menu where such variables as buffer size and which weighting equation is used can be changed. I'm updating the readme and help files. I'll be doing a lot of general cleanup to make everything neat.

I'd also like to scale the magnitudes so they are between 0 and 1, but when I briefly tried it, it caused problems because too many values ended up too close to 0 and were discarded.

And, of course, some screenshots:

Maple Leaf Rag

Granular Roads Creatovox

See you Monday.

08 July, 2010

Axtell's Notes: July 8

I got a bunch of small problems fixed today, finished up my window research and continued to write the readme and help files.

I finally figured out how to change the heap size! Excellent. When calling the GUI, instead of calling "java BigGUI" I now call "java -Xmx(heapsize)m BigGUI". I've been using 512 megabytes and haven't had an OutOfMemoryError all day.
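A quick way to confirm the flag took effect, as a small sketch: Runtime.maxMemory() reports the most memory the JVM will try to use, so printing it at startup shows whether -Xmx was picked up. HeapCheck is a hypothetical class name, not part of the program.

```java
/** Hypothetical helper: reports the heap limit the JVM was started with. */
final class HeapCheck {
    /** Maximum heap in whole megabytes, as reported by the running JVM. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it as "java -Xmx512m HeapCheck"; the printed figure may differ somewhat from the requested value depending on the JVM.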

Everything now works with .au, .aif, or .wav files. I'm looking into getting it to work with .mp3s too, but I'm not sure how feasible that is.

I reactivated NoteFinder. It had been commented out about a week ago to try and avoid the memory problems, and I then forgot where it had been commented out.

All the windows are scaled to the same range of points (magnitudes = 0-3.5).

Oh man, it's been a while since I've posted some screenshots. Here's Maple Leaf Rag with several different windows:

Unwindowed Maple Leaf Rag

Maple Leaf Rag - Gaussian

Maple Leaf Rag - Blackman

Maple Leaf Rag - Blackman-Harris

I seem to be missing a lot of bass notes from this. I'm going to look at a bunch of full-length songs tomorrow to find out what needs to be fixed. Also, ThunderIntro is still missing the claps, and ShaveAndAHaircut is missing the cymbal crash. I'm going to look into finding those tomorrow too.

Axtell's Notes: Windows

So I've been doing some research into windowing: which window is better for which type of sound, and so on. There is no window that is universally the best, so we have to decide which window to use depending on what information we want to get from a sound and what kind of sound it is. As a general rule, the more complicated a window is, the more accurate it is.

Some variables that we look at are:
-Highest side-lobe level; low levels reduce bias
-Worst-case processing loss; low levels increase detectability of peaks
-Quality of frequency resolution
-Amount of spectral leakage
-Amplitude Accuracy

Rectangular (none) has the highest side-lobe levels (about -13 dB), a lot of spectral leakage and very bad amplitude accuracy. It is best used with a transient (e.g. spoken word) shorter than the window, or with two close frequencies of almost equal amplitude.

Bartlett also has high side-lobe levels, and the leakage and amplitude accuracy are only a bit better.

Hanning and Hamming are good choices for a fast, general-purpose window. They have good frequency resolution, get rid of a fair amount of leakage and don't take forever to calculate. Hamming window is our current default.

Gaussian windows have the added benefit of a variable that adjusts the side-lobe level and processing loss to a point. They are best used with longer transients.

Flat-Top has very low scalloping loss (its main lobe is nearly flat across a frequency bin), so it is best used when amplitude accuracy is important.

Blackman and 4-term Blackman-Harris are the best at reducing spectral leakage and also have good amplitude accuracy. They have very low side-lobe levels. They are best for general use when speed is not a priority. Right now they have a tendency to push the memory over its limit, which is why we don't use them for full-length songs as often.
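For reference, here is a minimal sketch of two of the windows above, using the standard textbook coefficients for Hamming and Blackman. WindowSketch and its method names are hypothetical; this is not the code from the project's Window enum.

```java
/** Hypothetical sketch of two window functions with textbook coefficients. */
final class WindowSketch {
    /** Hamming: 0.54 - 0.46 cos(2*pi*i / (N - 1)), for N > 1. */
    static double[] hamming(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (n - 1));
        }
        return w;
    }

    /** Blackman: 0.42 - 0.5 cos(2*pi*i / (N - 1)) + 0.08 cos(4*pi*i / (N - 1)). */
    static double[] blackman(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            double x = 2.0 * Math.PI * i / (n - 1);
            w[i] = 0.42 - 0.5 * Math.cos(x) + 0.08 * Math.cos(2.0 * x);
        }
        return w;
    }

    /** Multiplies one frame of samples by a window of the same length. */
    static double[] apply(double[] frame, double[] window) {
        if (frame.length != window.length) {
            throw new IllegalArgumentException("frame and window lengths differ");
        }
        double[] out = new double[frame.length];
        for (int i = 0; i < frame.length; i++) {
            out[i] = frame[i] * window[i];
        }
        return out;
    }
}
```

The extra cosine term is what buys Blackman its lower side lobes, at the cost of a little more arithmetic per sample.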

07 July, 2010

7/7/10 Daily Journal of AT

Well, did some more testing today, and managed to fix two of the things I wanted to.

First, when graphing the data generated by the FFT, lately the higher peaks have been lost (namely cymbal hits). This is more than an aesthetic concern: if those notes aren't graphed, it means they aren't being returned by the threshold cleaning function. I experimented with a few of the variables in there, and managed (in the Hamming window, at least) to get the cymbal hits in a few test files to show up. Amusingly, since the beat finder function gets the data before it's cleaned, it has no problem finding cymbal hits, so there were cases where there was a beat for no notes shown.

I didn't actually solve that problem, but I'm confident that Axtell's new windows will take care of it. I also improved the beat finder so that it returns more accurate beats. Before, all beats were returned, and I was playing with returning no beat if one was found within a fraction of a second of it (because it wasn't a new beat, merely the old one continuing). Now, I've added a portion to the cleaner that checks whether the previous bit was a beat or not. If it was, the beat is assumed not to be a new one and is set to false. To see how this affects the two strong-beat songs:

Sweet Caroline (techno remix)

My Sharona (rock remix)

At top is the song, then the uncleaned beats. After that is beats cleaned by closeness (i.e. if they have a previous true, they're not a beat), beats cleaned by the .1 sec rule from yesterday, and then both cleanings together. They tend to compensate for each other's failings, so it will be kept as is.
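The closeness rule is simple enough to sketch. This is a hypothetical reconstruction, not the actual cleaner: it assumes the beats arrive as one boolean per analysis frame, and that "previous" means the previous frame in the uncleaned data, so only the first frame of each run of true values survives.

```java
/** Hypothetical sketch of cleaning beats by closeness. */
final class ClosenessCleaner {
    /** Keeps a beat only if the frame just before it was not also a beat. */
    static boolean[] clean(boolean[] raw) {
        boolean[] cleaned = new boolean[raw.length];
        for (int i = 0; i < raw.length; i++) {
            // Look at the uncleaned data, so a long run collapses to its first frame.
            boolean previousWasBeat = i > 0 && raw[i - 1];
            cleaned[i] = raw[i] && !previousWasBeat;
        }
        return cleaned;
    }
}
```

Because it only looks one frame back, this rule cannot merge beats separated by a single quiet frame, which is the kind of gap a time-based rule would cover.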

With that, I can say that the Beat Finder is done. There is the option to add in cymbal finders, but all that takes is testing to find the correct band. I won't be in tomorrow or Friday, unfortunately, but when I get back, I plan on working on improving the threshold cleaner for the data.

Axtell's Notes: July 7

We made a lot of progress today. Everything is running about as fast as it did last week. This is because of two changes: first, I modified Gregor's FFT and bitReverse methods to work with a 2D array of doubles instead of Complex (we're now using that instead of Sonogram's code), and second, I moved getWindowed (the actual math of the windows) from the enum Window to FFT.

All the windows (none, Bartlett, Hamming, Hann, Gaussian, Blackman, Blackman-Harris and Flat-Top) are working now. The problem with Blackman-Harris and Flat-Top was that they need to use indexes -N/2 < i < N/2 (where N is the number of samples and i is the index). All the other windows use 0 < i < N. The windowed data is also scaled so that the highest point is always the same. This keeps the colors and the printed output consistent.
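The scaling step could look something like the following sketch, which assumes the magnitudes are non-negative and rescales them so that the largest value lands on a chosen peak. PeakScaler and the targetPeak parameter are hypothetical names, not taken from the project.

```java
/** Hypothetical sketch: rescale magnitudes to a common peak value. */
final class PeakScaler {
    /** Returns a copy of the data scaled so its largest value equals targetPeak. */
    static double[] scaleToPeak(double[] magnitudes, double targetPeak) {
        double max = 0.0;
        for (double m : magnitudes) {
            max = Math.max(max, m);
        }
        double[] scaled = new double[magnitudes.length];
        if (max == 0.0) {
            return scaled; // all-zero (silent) input stays all zero
        }
        double factor = targetPeak / max;
        for (int i = 0; i < magnitudes.length; i++) {
            scaled[i] = magnitudes[i] * factor;
        }
        return scaled;
    }
}
```

Since every window gets the same target, a fixed color scale and a fixed print threshold mean the same thing from one window to the next.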

I've been doing some research into which windows are best for which kinds of sound. Tomorrow's post will explain that once I've had a chance to compile all the information I've found.

We're running out of memory very often still. I've added a popup window to explain what's happened and to stop it from printing the error in the terminal, but we haven't found a way to fix it. It happens mostly with the Blackman-Harris window when running anything longer than ten seconds.

I've started writing the Read Me for the whole program and realised that we don't have a name for our program. For now we're calling it BUFFET (Big Useful Fast Fourier Epic Transform.) That is subject to change.

06 July, 2010

7/6/10 Daily Journal of AT

Well, I've got more to show for my efforts today, even if they're all in picture form.

I started today by trying to get the BeatFinder to be more consistent. I ended up abandoning the function that finds the proper multiplier based on surrounding loudness, as it only ended up deleting all useful data. I decided on a base multiplier of 1.1 times the average as the threshold for a beat, as anything more started cutting out actual peaks. I stuck with bass beats only today, and analyzed a few different songs. I also wrote a "cleaning" function for the beats. Basically, as it is now, if there is a hit on a bass drum (or a really low bass guitar note), the function registers it as a beat. It takes samples every fraction of a second, so naturally, if one note lasts for a quarter-note span, it will return a lot of "beats" in a row, rather than just one. I used four different songs (Sweet Caroline remix, My Sharona, Wild World, and Maple Leaf Rag) that had four different strengths of back beats (strong, moderate, weak, and none, respectively). I ran them with no cleaning, .1 second cleaning, and .2 second cleaning. These are the results:

Sweet Caroline (techno remix)

My Sharona (accordion rock remix)

Wild World (original country)

Maple Leaf Rag (piano only)

Looking at them through Audacity, it becomes much easier to see and compare how each cleaning function is doing. At the top of each image is the sample of the song being played, then the uncleaned beats, the .1 sec cleaned beats, and the .2 second cleaned beats.

With the steady techno beat, no beats are lost in the .2 sec cleaning, and the .1 sec cleaning leaves them messier than they should be (if we want one beat in the file for each actual beat). However, such a rigorous cleaning causes the rock song to lose notes. With the slower country song, it's not as apparent either way, and the acoustic piano shouldn't be getting very strong peaks (it probably has a few from low notes, resonance, and a bit of spill). In any event, the kind of music being analyzed (namely, slow or fast, loud or soft) determines which cleaner is more useful. As the computer should (eventually) be able to decide this on its own when running the program, I left the length in seconds as a changeable variable.
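The two ideas in this entry, the 1.1 times average threshold and the minimum-gap cleaning, can be sketched as follows. This is a hypothetical reconstruction, not the BeatFinder source: it assumes one bass-band loudness value per analysis frame, that the caller has already converted the .1 or .2 second length into a whole number of frames, and that the gap is measured from the last beat that was kept.

```java
/** Hypothetical sketch of threshold beat detection plus gap cleaning. */
final class BeatSketch {
    /** Marks frames whose loudness exceeds multiplier times the average loudness. */
    static boolean[] findBeats(double[] loudness, double multiplier) {
        double sum = 0.0;
        for (double value : loudness) {
            sum += value;
        }
        double threshold = multiplier * (sum / loudness.length);
        boolean[] beats = new boolean[loudness.length];
        for (int i = 0; i < loudness.length; i++) {
            beats[i] = loudness[i] > threshold;
        }
        return beats;
    }

    /** Drops any beat closer than minGapFrames to the last beat that was kept. */
    static boolean[] cleanByGap(boolean[] beats, int minGapFrames) {
        boolean[] cleaned = new boolean[beats.length];
        int lastKept = -minGapFrames; // lets a beat in the very first frame through
        for (int i = 0; i < beats.length; i++) {
            if (beats[i] && i - lastKept >= minGapFrames) {
                cleaned[i] = true;
                lastKept = i;
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        double[] loudness = {1, 1, 5, 5, 5, 1, 1, 5, 1, 1};
        boolean[] beats = findBeats(loudness, 1.1);
        System.out.println(java.util.Arrays.toString(cleanByGap(beats, 3)));
    }
}
```

Keeping the gap as a parameter matches the decision above to leave the length adjustable: a longer gap suits a steady techno beat, a shorter one loses fewer notes in faster rock.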

In other news, the power briefly went out today. Fortunately, I saved recently enough that no work was lost.

Daily Blog 7

Today I spent the day trying to further decipher the constant Q transform. I looked at Judy Brown's MATLAB code for the "brute force" method of calculating the CQT. I tried to translate it into Java, but the translation proved more difficult than I originally thought, so I decided to re-read her paper on an efficient algorithm to calculate the CQT.

I had a little more success with understanding the CQT by re-reading the paper. I think I have a good idea of what the transform does and what variables are used to calculate the transform. Tomorrow I plan on trying to get a working program to calculate the spectral kernel for the transform. After calculating the spectral kernel, the CQT is found through a simple multiplication.
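As a rough sketch of the direct ("brute force") form, here is one constant-Q bin evaluated straight from the usual definition: Q = 1 / (2^(1/b) - 1) for b bins per octave, a window length of N = Q * fs / f for center frequency f, and a Hamming-windowed sum normalized by N. This is an illustration written from that definition under those assumptions, not a translation of Brown's MATLAB code, and ConstantQSketch is a hypothetical name.

```java
/** Hypothetical sketch: direct evaluation of a single constant-Q bin. */
final class ConstantQSketch {
    /** Magnitude of the constant-Q bin centered on centerHz. */
    static double binMagnitude(double[] x, double sampleRate,
                               double centerHz, int binsPerOctave) {
        // Q is the ratio of center frequency to bandwidth, the same for every bin.
        double q = 1.0 / (Math.pow(2.0, 1.0 / binsPerOctave) - 1.0);
        // Lower bins need longer windows to keep that ratio constant.
        int length = (int) Math.ceil(q * sampleRate / centerHz);
        if (length > x.length) {
            throw new IllegalArgumentException("signal shorter than the bin's window");
        }
        double re = 0.0;
        double im = 0.0;
        for (int n = 0; n < length; n++) {
            double window = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (length - 1));
            double phase = 2.0 * Math.PI * q * n / length;
            re += window * x[n] * Math.cos(phase);
            im -= window * x[n] * Math.sin(phase);
        }
        return Math.hypot(re, im) / length;
    }
}
```

Every bin needs its own window length and its own pass over the samples, which is the cost the kernel-based method is meant to avoid.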

Axtell's Notes: July 6

Today I made an enum class for the windows. This cleaned up the code a lot, but also slowed it down. We're also still getting OutOfMemoryErrors.

The enum class Windows has 10 windows right now: rectangular (un-windowed), Bartlett, Hamming, Hann, Gaussian, Blackman, Blackman-Harris, Flat-Top and Tukey. I had a Kaiser-Bessel window as well, but that uses infinite series and that was taking much too long to calculate. I'm going to look into which ones we don't need or won't get used so we don't have anything unnecessary in the code.

Speaking of unnecessary, I meant to go through all my code and make sure that every class only imported what it needed, but I didn't get to it today. I hope that will help to stop the code from using too much memory.

I did some research into which windows are better for different kinds of sound files and which return more precise results. It looks like Blackman (or Blackman-Harris), Gaussian, Flat-top and Hamming are the best depending on what kind of sound is used.

Lots of testing today without any definitive results yet, so I don't have much to say. I'll be doing more of the same tomorrow as well as cleaning up.
