26 July, 2010

Axtell's Notes: July 26

Constant Q is working! Really. It can even be graphed. Here's some proof:



These aren't scaled correctly, as Constant Q is logarithmic and I have been using the FFT grapher just to see whether we were getting any points back. I would have a screenshot of Maple Leaf Rag to show you, but at the pace these were taking, a 2-minute song would have taken around 7 hours. I'll be running it tonight so tomorrow morning we can see that.

We have written the most common kernel for the CQT (min. frequency = 16.352 Hz, max. frequency = 22050.0 Hz, bins per octave = 12, sample rate = 44100.0) to a text file so the computer doesn't have to re-calculate it each time. If those values are changed (min/max frequency and bins are changeable in the advanced menu and the sample rate is given by the sound file) the kernel is calculated, but not written to a file.

Tomorrow's to do list:
-Adding windowing functions to CQT
-Getting the CQT data into a logarithmic scale
-General clean-up and testing

23 July, 2010

7/23/10 Daily Journal of AT

Hey, it's the end of the internship! Okay, not really, since we have at least one more week. But it could be.

The old version of everything is working. Axtell and Gregor are still playing with Constant Q, but we do have a working FFT with peak finder, beat analysis, note generation, and statistics data. And they don't generate errors (well, unless the file you give it is error-ridden, in which case, we can't help you). I finished off the grapher for the stats data as well. It prints the average, as well as the skew, standard deviation, and spread, with kurtosis shown as changing colors.

Next week: code clean up!

22 July, 2010

Axtell's Notes: July 22

That problem from yesterday (only graphing the first section of the spectrogram) is somewhere in the Audio object class and how we split the file into many samples of the given length. I've avoided the problem, and the DFT half of Transforms is working now, but I should try to find out what the problem was.

How we split without the Audio object: Make an AudioInputStream of the audio. Make a for loop that makes a mini AudioInputStream that is a sample-size-long section of the full stream. (The way AudioInputStream works, each time through it will start reading where it last stopped.) Get the data from the mini stream and FFT that.

How we tried to split it with the Audio object: Make an Audio object of the audio. Get all the data from that object (in an array). Make a for loop that makes a mini array that is sample size long starting where the last one stopped.

How we're splitting now: Make an Audio object of the audio. Audio has a split() function that makes mini streams, gets the data from each of those, and adds it all to a 2D double array. So it's a combination of the two.
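That chunking loop is simple enough to sketch on a plain array of samples; the class and method names below are illustrative, not our actual Audio code.

```java
// Sketch of splitting audio data into fixed-size chunks, as described
// above, on a plain double[] instead of an AudioInputStream.
// Hypothetical names; not the project's actual Audio.split().
public class Splitter {
    // Split `data` into consecutive windows of `sampleSize` values,
    // dropping any incomplete tail (as the final short stream read would).
    public static double[][] split(double[] data, int sampleSize) {
        int chunks = data.length / sampleSize;
        double[][] out = new double[chunks][sampleSize];
        for (int c = 0; c < chunks; c++) {
            System.arraycopy(data, c * sampleSize, out[c], 0, sampleSize);
        }
        return out;
    }
}
```

Each row of the returned 2D array then gets handed to the FFT, which is what the real split() does with its mini streams.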

I did some time trials with the old and new DFT classes and found that the new one is slower, but it is only noticeable on files of a minute or more. CQT is still not working, but Gregor's working on that. The new DFT is also taking a lot more memory than the old. I had to boost the max memory to 4096MB for it to run Sweet Caroline. The old DFT can run the same file with a max of 2048MB.

The whole morning and half the afternoon were spent on those two projects.

The rest of the day went to working on some null pointers that come up when running Buffet (formerly BigGUI). They happen because there is a listener on the filename text field that should only fire when the Enter key is pressed, but it isn't very easy to get Java to listen for an Enter key. While these null pointer errors don't stop the program from running, they are annoying and distracting, so I'm going to get rid of them. I'm working on that, and it should be done by the end of tomorrow.

Tomorrow is all-day pair programming to neaten/speed up/shorten/fix all the code and end up with one shared set of classes, since right now we are all working with different code.

7/22/10 Daily Journal of AT

Today was statistics, statistics, statistics. I got a decent graph out of the stats data, and printed the average, along with three standard deviations, with no problems on several files. I then did a bit of research on skew, to better understand how to best represent it visually, and added in a writer for that. After an hour, I realized it wasn't the drawer that was making the skew far off from the average, but the original statistics data.

Initially, the average and standard deviation were based solely on the heights of the data. While this worked fine, it proved useless when trying to find what the average frequency was. However, when trying to find the centroid (the average frequency weighted by heights), the resultant number is always the same, regardless of the file (silent files have the same average as noisy ones, which is clearly wrong).
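For reference, the centroid is just the magnitude-weighted mean of the bin frequencies; if it comes out the same for every file, the weights are probably not making it into the sum. A minimal sketch (hypothetical names, not the actual Statistics code):

```java
// Magnitude-weighted average frequency (spectral centroid).
// freqs[i] is the center frequency of FFT bin i, mags[i] its height.
// Illustrative sketch, not the project's Statistics class.
public class Centroid {
    public static double centroid(double[] freqs, double[] mags) {
        double weighted = 0, total = 0;
        for (int i = 0; i < mags.length; i++) {
            weighted += freqs[i] * mags[i];
            total += mags[i];
        }
        // A silent file has no weight anywhere; return 0 rather than
        // a fixed, file-independent number.
        return total == 0 ? 0 : weighted / total;
    }
}
```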

I haven't even begun on kurtosis. However, it is good to know now that the data is wrong, and hopefully I can fix it tomorrow. The grapher is not as important, though may work as a replacement for NoteFinder.

21 July, 2010

Axtell's Notes: July 21

The DFT half of the Transforms class is kind of working. It doesn't return all the samples. It does the a440 file correctly, but the fade file shows only the fade up (not the fade down) even though it has the same number of samples shown as the old DFT. I've kept the old DFT working in a separate folder so I can keep updating rainbows and getting rid of spill.


The new DFT is also slower than the old one. I haven't figured out why yet, but the dialog box that pops up and quacks on completion of the split now also prints the time it took to split and FFT the file. The new DFT is about twice as slow as the old.

The Constant Q still doesn't work. It will run for about ten minutes, and it does get numbers, but none of them get through PeakData, so after all that work the computer shows a blank spectrogram.

Gregor wasn't in today, but should be back in tomorrow afternoon so we'll look at this together and get that working by Friday hopefully.

That was more or less all I got done today, as we were locked out of the lab for a while this morning. Lots of slow and steady progress as we work towards getting the Constant Q working. We need to start looking at cleaning up, commenting, and packaging all our code together this week.

20 July, 2010

Axtell's Notes: July 20

Everything has been moving so slowly that I haven't been bothering to type up all I tried that didn't work. I've been working on Threshold v. PeakData and cleaning up BigGUI (Now called Buffet). We're working with PeakData right now, though that's not perfect yet (Tayloe's been doing more work on that, so look at her posts for more information.)

More importantly, Gregor and I have been working on getting a java Constant Q Transform method, and yesterday we finally got the same numbers as the MATLAB method. Today we spent all day putting her code (Complex, Audio, DFT and CQT) and my code together. I also did major clean up of BigGUI, FFTGUI, and the graphers. I'm starting to give classes more useful, updated names (e.g. GraphSplitter is now Spectrogram). I made a class Transforms that is a combination of DFT and CQT. We're not sure which will be best, so we made both, and we'll test as we go.

I just got everything compiled and tried running Buffet. The DFT plots some points, but they are clearly wrong:

The Constant Q doesn't return anything as of yet. I hope to get this working tomorrow morning, so I can get the grapher working with CQT data by the end of the day. I'll also be updating the menu to incorporate Constant Q.

7/20/10 Daily Journal of AT

This morning was spent in what I hoped was residual testing. However, I found out something weird: the scaling of the data affects what data is kept in the peak generation. At low multipliers, this can be blamed on some numbers being set to zero, so the lower the multiplier is, the less data is retained. However, a similar effect, though less dramatic, happens as numbers increase (that is, higher multipliers return fewer peaks). The image below shows the same short sound file at multipliers of 1 through 12, counting from left to right, top to bottom.



As you can see, at the multiplier of 4, the most data is returned. This holds true for most files, regardless of overall loudness. We don't know why this is, but have compensated for it.

The afternoon was spent working on graphing the statistical data. Unfortunately, there are no pretty pictures to show from that, as I've only managed to get the data in and corrected, while the adapted GUI grapher is giving me trouble. It will possibly be replacing Note Finder when it's finished, as it seems a bit more useful.

19 July, 2010

7/18/10 Daily Journal of AT

Today was spent re-implementing Peak Data. I know, I know, from all that I said before about Threshold being so great at cleaning data, this is unexpected. However, with the amount of testing done by myself and Axtell, it was clear that Threshold was cutting out audible peaks in the upper frequencies, while leaving lower peaks that may or may not have existed. I've had to play with several aspects of the program, including getting rid of the equalizer. Oddly, it was changing the results of Peak Data, even though the function works on a relative scale. Fortunately, I've managed to get fairly consistently accurate data on most of the windows with the same function (rectangular windowing remains the messiest).

In addition, I re-wrote Statistics so it just creates a file of the statistical data, rather than the FFT as well. The peak data is still being written to file, which is the most important part of the data. Tomorrow, I hope to move on to getting the constant Q to work with the other functions (have to change scaling for a lot of functions). I'm debating abandoning the BPM finder, as it is not terribly accurate, has taken up a lot of time already, and may not be useful in the long run.

16 July, 2010

7/16/10 Daily Journal of AT

Hey, it's a blog! There hasn't been much to say for the past two days; they mostly went to testing and tweaking numbers in BeatFinder and Threshold. They're mostly stable now, so we should be set with that. I also created a statistics finder to go with the FFT, which works, but doesn't do much except display numbers. At the professor's request, I added a method to write the statistical data, as well as all the FFT data, to a file. (I will add more about the statistical data soon.) On a four minute song, the data size is 2.4 GB. And I got an error message:


Next week: Making it smaller! Also, graphing statistical data.

13 July, 2010

7/13/10 Daily Journal of AT

Only worked with the bpm generator today. Started out writing a function to compare a generated bpm to beats found from a file. It worked, but not particularly helpfully. If the downbeats are correlated exactly, it's easy to tell whether a file matches or not. If it is off by, say, an eighth note, then two identical beats will nonetheless return with no matches. It is possible to run and re-run matches to get the best data, but that takes up a lot of processing time and may not be necessary if the most occurring spacing is used by default. I've left the function in the program, uncalled, for now.

After a few hours of testing, I discovered that my bpm generator was constantly getting laggy data. On a few songs, it was accurate, but on most, the beat it created was not the same as the beat in the song. I figured out that the faster a song is, the less accurately the program can determine its bpm. Why? Because of the spacing.

See, if I took every sample and tested if they were a beat, I would be able to tell exactly how far apart the beats are. However, in order to tell if they are a beat, I must take more than one sample at a time; in this case, 1024 samples. The relation of the spacing to the bpm is an inverse function, so the smaller the beat distances are, the larger the bpm discrepancies. For example, the program can accurately generate any bpm less than 80, give or take a beat. From 80 to 95, it skips two or more beats; any higher, and it skips more. By 151, it skips by ten, and above that is unhelpful.
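That inverse relation is easy to make concrete. With 1024-sample blocks at 44100 Hz, the only BPMs the program can represent are 60*44100/(spacing*1024) for whole-number spacings, and the gap between neighboring representable BPMs widens as the tempo climbs. A sketch using those numbers (illustrative, not the program's code):

```java
// Demonstrates the BPM quantization described above: beat spacings are
// measured in whole 1024-sample blocks, so only certain BPMs are
// representable, and the gaps between them grow with tempo.
public class BpmQuantization {
    static final double SAMPLE_RATE = 44100.0;
    static final int BLOCK = 1024;

    // BPM implied by a beat-to-beat spacing of `blocks` analysis blocks.
    public static double bpm(int blocks) {
        return 60.0 * SAMPLE_RATE / (blocks * BLOCK);
    }

    // Gap to the next representable BPM at that spacing.
    public static double gap(int blocks) {
        return bpm(blocks) - bpm(blocks + 1);
    }
}
```

Around 80 BPM (spacing 32) the gap is about 2.4 BPM; around 152 BPM (spacing 17) it is about 8.4, which lines up with the skipping described above.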

However, I have a few plans to combat this. In cases where the bpm is not exactly spaced by sample size, the two most occurring spacings (or the first and third) should be next to each other, because sometimes the spacing is closest to one, and sometimes the other. Finding the average of the two will lead to a more accurate bpm. At least, that's the theory. As of right now, the only song I've had time to test it on has become more inaccurate. Tomorrow will be more testing to correct this.

Axtell's Notes: July 13

Today I finally got to all those things I'd said I'd fix about two weeks ago. When something scrolls, the units scroll too. When something zooms in, it no longer prints data points in the border. FFTGUI once again has a compare function, and now it can compare any number of windows. I updated the Readme and Help files and continued general cleanup of all files.

I also played with weightings. The advanced menu now has 4 options (A, B, C and C modified). C modified should only be used when trying to catch something like a cymbal crash or hand clap. Those sounds tend not to get found by FFTs because they have strange waves. The modification simply lets quieter peaks be graphed. So in Shave and a Haircut the last note (a cymbal crash) is visible, but in Maple Leaf Rag almost all the inaudible spill is visible too.

C-Weighting, in general, seems to be the best. A-Weighting seems to miss a lot of data. As an example, here's a file made by Tayloe. It's two tones fading in and out. It was made to test the color spectrum.

Rainbows are still being tweaked. I discovered that they weren't properly fading because when I had adjusted the scaling, I hadn't adjusted the colors. Oops. Also, as can be seen in the screen shots, instead of all peaks past a certain point being red, the highest peaks now go from red to magenta.

12 July, 2010

7/12/10 Daily Journal of AT

After a bit of a break, I'm back for the seventh week of the internship. Most of my work today was with Beat Finder. I got the data about as clean as it will ever be, relatively speaking, and moved on to finding the beats per minute of any given song.

Essentially, the BPM, or beats per minute, is a way of recording the tempo of a song, originally for use with a metronome. A moderately speedy song would have a BPM of 120, or contain 120 quarter notes per minute (two a second). Slower songs have a lower BPM, and faster songs have a higher BPM.

The path to getting BPM from beat data is a bit complicated. First, the program starts with an array of beats and silences, where each data point covers 1024 samples. The program measures the space between each beat and stores it in a new array. The array is sorted, then each of the lengths is translated into BPM (by being multiplied by the sample size (1024), divided by the number of samples per minute (44100*60), then inverted).

The frequency of each bpm is totaled, and put into another new array. Right now, the program finds the top three occurring BPMs, the average, and the average after outliers are removed. With the test files I've used, the most occurring and the second average tend to be the same number, so it is slightly redundant. However, if a song should change tempo, or has an irregular back beat, this extra data may become necessary.
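The tallying step is a straightforward frequency count; here is a sketch of finding the top-occurring BPM (illustrative helper, not the actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Frequency-count tally of rounded BPM values, as described above,
// returning the most occurring one. Hypothetical helper, for illustration.
public class BpmTally {
    public static int mostOccurring(int[] bpms) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int b : bpms) counts.merge(b, 1, Integer::sum);
        int best = bpms[0], bestCount = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

The average and trimmed average come from the same spacing array, so all three numbers fall out of one pass.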

Overall, this seems fairly accurate. However, when testing the BPM by creating a new beat file, it tended to lag in relation to the song. I'm going to try to fix this tomorrow, then move on to either fixing the file generator or moving on to a new feature.

Axtell's Notes: July 12

Today was a lot of testing of the weighting functions. I made a tester class that prints the curves of the A-, B-, C- and my own tweaked weightings on top of each other in different colors so I could see what looked best. I've added a constant to both the B- and C-weightings to neaten them up, and I am playing with combining them to get a more precise window. Here we have Maple Leaf Rag with each of the weightings:


There is almost no difference between the B-weighted and the B/C-weighted spectrogram.

I made the advanced menu so now the buffer size and weighting controls are less available unless you really want to change them. I, once again, updated the rainbows. I started to go through each class and clean up, comment, get rid of what's unnecessary, etc. I'll finish the clean up tomorrow and continue to play with the weightings. I also need to update the readme and help files.

09 July, 2010

Axtell's Notes: July 9

I started off with a project that I thought would take all day, but that was done before lunch; I switched everything from a 3D array to a HashTable of 2D arrays so that access would be faster. Most full length songs run in under a minute now. I also cleaned up the code quite a bit to get rid of warnings (mostly redundant casts and unchecked instances).

I then spent the rest of the day making a browse option, or trying to. It doesn't work as of yet. The idea is that you should be able to use any sound file from anywhere on the computer, so I'm making a window like any Open Document GUI: it shows the directory, and you can expand or collapse folders and select the desired file. There are two problems with this. First, it prints the whole pathname of each file or folder, which makes it too long to read practically, but JTree names nodes using File.toString(), so I need to find a way to print only the end of the pathname. The other problem is that this only works once. If you click browse and choose a file, that works fine, but try to do it again without restarting the program and you get NullPointerExceptions.

While showing the professor how the GUI works, I found out why my code has been printing a lot less than it used to. Where I get the dB of the file, I've been multiplying by 0.775 instead of dividing. This hasn't been noticeable on files such as a440 because 0.775 is close enough to 1 that it didn't make a difference. On full songs, though, it was very noticeable.

I have several tasks for next week besides getting the browse feature working. I'm making an "advanced" menu where such variables as buffer size and which weighting equation is used can be changed. I'm updating the readme and help files. I'll be doing a lot of general cleanup to make everything neat.

I'd also like to scale the magnitudes so they are between 0 and 1, but this caused problems when I briefly tried this because too many values were too close to 0 and were discarded.

And, of course, some screenshots:


See you Monday.

08 July, 2010

Axtell's Notes: July 8

I got a bunch of small problems fixed today, finished up my window research and continued to write the readme and help files.

I finally figured out how to change the heap size! Excellent. When calling the GUI, instead of calling "java BigGUI" I now call "java -Xmx(heapsize)m BigGUI". I've been using 512 megabytes and haven't had an OutOfMemoryError all day.

Everything can now work with an .au, .aif, or .wav file. I'm looking into getting it to work with .mp3's too, but I'm not sure how possible that is.

I reactivated NoteFinder. It had been commented out about a week ago to try and avoid the memory problems, and I then forgot where it had been commented out.

All the windows are scaled to the same range of points (magnitudes = 0-3.5).

Oh man, it's been a while since I've posted some screenshots. Here's Maple Leaf Rag with several different windows:

Unwindowed Maple Leaf Rag


Maple Leaf Rag - Gaussian


Maple Leaf Rag - Blackman


Maple Leaf Rag - Blackman-Harris

I seem to be missing a lot of bass notes from this. I'm going to look at a bunch of full-length songs tomorrow to find out what needs to be fixed. Also, ThunderIntro is still missing the claps, and ShaveAndAHaircut is missing the cymbal crash. I'm going to look into finding those tomorrow too.

Axtell's Notes: Windows

So I've been doing some research into windowing; which is better for what type of sound and so on. There is no window that is universally the best, so we have to decide what window to use depending on what information we want to get from a sound and what kind of sound it is. As a general rule, the more complicated a window is, the more accurate it is.

Some variables that we look at are:
-Highest side-lobe level; low levels reduce bias
-Worst-case processing loss; low levels increase detectability of peaks
-Quality of frequency resolution
-Amount of spectral leakage
-Amplitude Accuracy

Rectangular (none) has the highest side-lobe levels (~-13dB) and a lot of spectral leakage and very bad amplitude accuracy. It is best used with a transient (e.g. spoken word) shorter than the window or two close frequencies of almost equal amplitudes.

Bartlett also has high side-lobe levels and the leakage and amplitude accuracy is only a bit better.

Hanning and Hamming are good choices for a fast, general-purpose window. They have good frequency resolution, get rid of a fair amount of leakage and don't take forever to calculate. Hamming window is our current default.

Gaussian windows have the added benefit of a variable that adjusts the side-lobe level and processing loss to a point. It is best used with longer transients.

Flat-Top has very low processing loss so is best used when amplitude accuracy is important.

Blackman and 4-term Blackman-Harris are the best at reducing spectral leakage and also have good amplitude accuracy. They have very low side-lobe levels. They are best for general use when speed is not necessary. These have a tendency to push the memory over its limit right now which is why we don't use them for full length songs as often.
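For reference, here are the textbook coefficient forms of a few of the windows above; the constants our Window enum actually uses may differ slightly.

```java
// Standard cosine-sum window definitions, with index i running 0..n-1.
// These are the textbook formulas, not necessarily the exact constants
// in our Window enum.
public class WindowFormulas {
    public static double hamming(int i, int n) {
        return 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1));
    }
    public static double hann(int i, int n) {
        return 0.5 - 0.5 * Math.cos(2 * Math.PI * i / (n - 1));
    }
    public static double blackman(int i, int n) {
        double x = 2 * Math.PI * i / (n - 1);
        return 0.42 - 0.5 * Math.cos(x) + 0.08 * Math.cos(2 * x);
    }
}
```

The extra cosine term is what buys Blackman its lower side lobes, and also part of why it costs more to compute.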

07 July, 2010

7/7/10 Daily Journal of AT

Well, did some more testing today, and managed to fix two of the things I wanted to.

First, when graphing the data generated by the FFT, lately the higher peaks have been lost (namely cymbal hits). This is more than an aesthetic concern: if those notes aren't graphed, it means they aren't being returned by the threshold cleaning function. I experimented with a few of the variables in there, and managed (in the Hamming window, at least) to get the cymbal hits in a few test files to show up. Amusingly, since the beat finder function gets the data before it's cleaned, it has no problem finding cymbal hits, so there were cases where there was a beat but no notes shown.

I didn't actually solve that problem, but with Axtell's new windows, I'm confident that they will take care of it. I also improved the beat finder so that it returns more accurate beats. Before, all beats were returned, and I was playing with returning no beats if one was found within a partial second of another (because it wasn't a new beat, merely the old one continuing). Now, I've added a portion to the cleaner that checks whether the previous bit was a beat or not. If it was, the current beat is assumed not to be a new one and is set to false. To see how this affects the two strong-beat songs:

Sweet Caroline (techno remix)


My Sharona (rock remix)


At top is the song, then the uncleaned beats. After that is beats cleaned by closeness (i.e. if they have a previous true, they're not a beat), beats cleaned by the .1 sec rule from yesterday, and then both cleanings together. They tend to compensate for each other's failings, so it will be kept as is.
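The previous-true check is small enough to sketch; this is the idea, not the actual cleaner code:

```java
// Closeness cleaning as described above: a `true` that directly follows
// another `true` is treated as the same beat continuing, not a new one.
// Illustrative sketch of the idea.
public class BeatCleaner {
    public static boolean[] clean(boolean[] beats) {
        boolean[] out = new boolean[beats.length];
        for (int i = 0; i < beats.length; i++) {
            out[i] = beats[i] && (i == 0 || !beats[i - 1]);
        }
        return out;
    }
}
```

The .1-second rule is the complementary pass: it suppresses a fresh beat that arrives within a tenth of a second of the previous onset, catching re-triggers this check misses.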

With that, I can say that the Beat Finder is done. There is the option to add in cymbal finders, but all that takes is testing to find the correct band. I won't be in tomorrow or Friday, unfortunately, but when I get back, I plan on working on improving the threshold cleaner for the data.





Axtell's Notes: July 7

We made a lot of progress today. Everything is running about as fast as it did last week. This is because of two changes: first, I modified Gregor's FFT and bitReverse methods to work with a 2D array of doubles instead of Complex (we're now using that instead of Sonogram's code), and second, I moved getWindowed (the actual math of the windows) from the enum Window to FFT.

All the windows (none, Bartlett, Hamming, Hann, Gaussian, Blackman, Blackman-Harris and Flat-Top) are working now. The problem with Blackman-Harris and Flat-Top was that they need to use indexes -N/2 < i < N/2 (where N is the number of samples and i is the index); all the other windows use 0 < i < N. The windowed data is also scaled so that the highest point is always the same, so the colors and what prints are consistent.
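The centered-index form works out to the same cosine sum with flipped signs; here is a sketch of the 4-term Blackman-Harris on -N/2 < i < N/2 using the textbook coefficients (our exact constants may differ):

```java
// 4-term Blackman-Harris evaluated on a centered index -n/2 < i < n/2,
// as described above. Textbook coefficients; peak value 1.0 at i = 0.
public class CenteredWindow {
    public static double blackmanHarris(int i, int n) {
        double x = 2 * Math.PI * i / n;
        return 0.35875 + 0.48829 * Math.cos(x)
             + 0.14128 * Math.cos(2 * x)
             + 0.01168 * Math.cos(3 * x);
    }
}
```

Shifting the usual 0..N-1 form by N/2 negates the odd cosine terms, which is why the centered version is written with all plus signs.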

I've been doing some research into which windows are best for which kinds of sound. Tomorrow's post will explain that once I've had a chance to compile all the information I've found.

We're running out of memory very often still. I've added a popup window to explain what's happened and to stop it from printing the error in the terminal, but we haven't found a way to fix it. It happens mostly with the Blackman-Harris window when running anything longer than ten seconds.

I've started writing the Read Me for the whole program and realised that we don't have a name for our program. For now we're calling it BUFFET (Big Useful Fast Fourier Epic Transform.) That is subject to change.

06 July, 2010

7/6/10 Daily Journal of AT

Well, I've got more to show for my efforts today, even if they're all in picture form.

I started today by trying to get the BeatFinder to be more consistent. I ended up abandoning the function that finds the proper multiplier based on surrounding loudness, as it only ended up deleting all useful data. I decided on a base multiplier of 1.1 times the average as the threshold for a beat, as anything more started cutting out actual peaks. I stuck with bass beats only today, and analyzed a few different songs. I also wrote a "cleaning" function for the beats. Basically, as it is now, if there is a hit on a bass drum (or a really low bass guitar note), the function registers it as a beat. It takes samples every portion of a second, so naturally, if one note lasts for a quarter note span, it will return a lot of "beats" in a row, rather than just one. I used four different songs (Sweet Caroline remix, My Sharona, Wild World, and Maple Leaf Rag) that had four different strengths of back beats (strong, moderate, weak, and none, respectively). I ran them with no cleaning, .1 second cleaning, and .2 second cleaning. These are the results:

Sweet Caroline (techno remix)


My Sharona (accordion rock remix)


Wild World (original country)


Maple Leaf Rag (piano only)

Looking at them through Audacity, it becomes much easier to see and compare how each cleaning function is doing. At the top of each image is the sample of the song being played, then the uncleaned beats, the .1 sec cleaned beats, and the .2 second cleaned beats.

With the steady techno beat, no beats are lost in the .2 sec cleaning, and the .1 cleaning leaves them messier than they should be (if we want one beat in the file for each actual beat). However, such a rigorous cleaning causes the rock song to lose notes. With the slower country, it's not as apparent either way, and the acoustic piano shouldn't be getting very strong peaks (it probably has a few from low notes, resonance, and a bit of spill). In any event, which kind of music is being analyzed (namely, slow or fast, loud or soft) determines which cleaner is more useful. As the computer should (eventually) be able to decide this on its own when running the program, I left the second length as a changeable variable.

In other news, the power briefly went out today. Fortunately, I saved recently enough that no work was lost.

Daily Blog 7

Today I spent the day trying to further decipher the constant Q transform. I looked at Judy Brown's MATLAB code for the "brute force" method of calculating the CQT. I tried to translate it into java, but the translation proved more difficult than I originally thought, so I decided to re-read her paper on an efficient algorithm to calculate the CQT.

I had a little more success with understanding the CQT by re-reading the paper. I think I have a good idea of what the transform does and what variables are used to calculate the transform. Tomorrow I plan on trying to get a working program to calculate the spectral kernel for the transform. After calculating the spectral kernel, the CQT is found through a simple multiplication.

Axtell's Notes: July 6

Today I made an enum class for the windows. This cleaned up the code a lot, but also slowed it down. We're also still getting OutOfMemoryErrors.

The enum class Windows currently has these windows: rectangular (un-windowed), Bartlett, Hamming, Hann, Gaussian, Blackman, Blackman-Harris, Flat-Top and Tukey. I had a Kaiser-Bessel window as well, but that uses infinite series and was taking much too long to calculate. I'm going to look into which ones we don't need or won't get used so we don't have anything unnecessary in the code.

Speaking of unnecessary, I meant to go through all my code and make sure that every class only imported what it needed, but I didn't get to it today. I hope that will help to stop the code from using too much memory.

I did some research into which windows are better for different kinds of sound files and which return more precise results. It looks like Blackman (or Blackman-Harris), Gaussian, Flat-top and Hamming are the best depending on what kind of sound is used.

Lots of testing today without any definitive results yet, so I don't have much to say. I'll be doing more of the same tomorrow as well as cleaning up.

02 July, 2010

7/2/10 Daily Journal of AT

Today was kind of slow. Primarily, it was spent testing different multipliers and size ranges for the Beat Finder. The multiplier is how the threshold for determining whether something is a beat or not is found (if the given data is higher than the average times the multiplier, it's a beat). The tutorial I found said that the most reliable multiplier is within a function relating to variance, but after getting a lot of screwy data, I decided to work with a base multiplier for now. Trouble is, it works differently on different files. With something with a strong bass beat, like the Thunder intro, it does quite well, but on anything with a more even sound, it usually doesn't pick up the beats.
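The base-multiplier threshold itself is one line of math; a sketch (illustrative names, not the actual BeatFinder):

```java
// Threshold beat detection as described above: a block is a beat when
// its level exceeds the overall average times a multiplier (e.g. 1.1).
// Illustrative sketch, not the project's BeatFinder class.
public class BeatThreshold {
    public static boolean[] findBeats(double[] levels, double multiplier) {
        double avg = 0;
        for (double v : levels) avg += v;
        avg /= levels.length;
        boolean[] beats = new boolean[levels.length];
        for (int i = 0; i < levels.length; i++) {
            beats[i] = levels[i] > avg * multiplier;
        }
        return beats;
    }
}
```

The failure mode on even-sounding files falls straight out of this: every block sits near the average, so almost nothing clears avg * 1.1.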

I was able to successfully implement the Bark sizing to the frequencies. Previously, when I wanted to find the beat in a given range of hertz, I used a multiplier, which worked well on low frequencies, and poorly on high ones. The Bark sizing works on a semi-logarithmic scaling, so low ranges are still accurate, but high ones can also be found. For example, here's the bass beat of Thunder Intro:


The handclaps are still imperceptible ):
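For the curious, Zwicker's approximation is one common formula for that semi-logarithmic Bark spacing; I'm not claiming it's the exact one used here, but it shows the shape:

```java
// Zwicker's approximation of the Bark scale: near-linear at low
// frequencies, logarithmic at high ones. One common formula, shown
// for illustration; not necessarily the program's exact version.
public class Bark {
    public static double hzToBark(double hz) {
        return 13.0 * Math.atan(0.00076 * hz)
             + 3.5 * Math.atan((hz / 7500.0) * (hz / 7500.0));
    }
}
```

An octave from 1 kHz to 2 kHz spans more Bark bands than the octave from 5 kHz to 10 kHz, which is exactly why fixed multiplier-based ranges failed at the top end.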

We also tried putting together our GUI and FFTs, and found out that we don't actually know what units the height of the peaks is in, other than relative loudness. Presumably, since we're using the magnitude rather than the raw data, there is no necessary correlation between the heights and volts or decibels, only a relative one.


Axtell's Notes: July 2

The Gaussian window now can be called from anything without crashing, but is not the correct equation for that window. I'll be spending some time this weekend looking for the correct equation and at some other windows that could be better than Hamming.

I got Gregor's code working with Tayloe's and mine, but it seems to return a lot less data than our FFT. I got frustrated trying to figure out the difference since the math is almost exactly the same, so I am going to come back to that on Tuesday.

I was reading about using an overlap to get more accurate data. Overlap meaning that, when we split the wav into samples, we take each sample starting halfway through the last one. Example: if the length of each sample is 1024, each sample starts 512 after the one before it did. This way, data that would otherwise be lost because a split lands right on a peak isn't lost. It also means we are making an array twice as long, and we're getting OutOfMemoryErrors again. Clearly, we either need to find a way to use less of the heap, or I need to actually raise the heap size to match our needs.
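The half-overlap split is the same chunking loop with a hop of half the sample size; a sketch (names illustrative, not our actual code):

```java
// 50% overlap splitting as described above: each window of `size`
// samples starts size/2 after the previous one, so a peak on a
// boundary always lands whole inside at least one window.
public class OverlapSplit {
    public static double[][] split(double[] data, int size) {
        int hop = size / 2;
        int chunks = (data.length - size) / hop + 1;
        double[][] out = new double[chunks][size];
        for (int c = 0; c < chunks; c++) {
            System.arraycopy(data, c * hop, out[c], 0, size);
        }
        return out;
    }
}
```

For the same file this produces roughly twice as many windows as the non-overlapped split, which is exactly the doubled array (and doubled memory) mentioned above.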

After a long weekend of window research I will be adding new windows and making the Gaussian window work, looking at the Complex FFT again to see what's going on there, and boosting the java heap size or cleaning up our code to use less space.

Have a good Fourth of July!

01 July, 2010

Daily Blog 6

It's been slow going the past few days. I decided to create a new class in hopes it would help clean up the code in the FFT and, since the constant Q transform uses many of the same methods and calculations as the FFT, I thought a new class would help. Writing the new class was the easy part, but translating my existing program to use the new class proved to be more difficult than it should have been. After eliminating all of the syntax errors, I started receiving null pointer errors followed by FFT output discrepancies. After a day and a half of debugging, I figured out that my problems lay in the fact that I neglected to initialize a field I used later in the program, and that I was not deep-copying my temporary variables in my bit-reverse function.
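For anyone hitting the same bug: the bit-reversal step wants a genuine swap through a temporary, not an aliased reference. A minimal sketch of a correct in-place bit-reversal permutation (standard algorithm, not our exact code):

```java
public class BitReversal {
    /** In-place bit-reversal permutation for an array whose length is a
        power of two. Using a temporary double for the swap avoids the
        shallow-copy mistake described above. */
    public static void permute(double[] data) {
        int n = data.length;
        for (int i = 1, j = 0; i < n; i++) {
            // increment j as a reversed binary counter
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {                 // swap each pair exactly once
                double tmp = data[i];
                data[i] = data[j];
                data[j] = tmp;
            }
        }
    }
}
```

After this permutation the usual iterative butterfly stages of the FFT can run in place.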

After finally getting the program to work again, I started working on implementing a Gaussian windowing function which is almost done and should be complete tomorrow morning. I will then start recoding the constant Q as I had started it a few days ago but need to implement it using the new class.

7/1/10 Daily Journal of AT

First day of July, and I feel like we're getting things done. (Which is good, since we're nearing the end of week 5.)

Started today by working with BeatFinder to speed it up. Since there was no way I could get the file reading to go any faster, I started playing with different ways to pass the data from FFT to BeatFinder. I ended up re-writing BeatFinder as a class, with accessors, that is given each miniwave's FFT data as it is read. After all the FFTs are done, it runs itself to get the beats. If, for some reason, this is done when there is no FFT data, it gets beat information directly from the file (as it did in the original function from the beginning of the week).

While I was working in WaveSplitter (which is where FFT, PeakFinder, and BeatFinder are all called) I realized that we didn't need to create miniwave files and save them to disk. Instead, the miniwaves are passed directly to FFT, without writing them to the disk. This nearly halves the time when running large files, which is great, but makes us unable to run a single FFT on a given second. If there's time, we'll work on getting just the data at that second to find a single FFT, or we'll just abandon it (as it's not terribly important except as a check or as a curiosity).

This afternoon was less interesting, as I've been testing various variables in the BeatFinder in order to better find beats. So far, I've gotten fairly reliable data concerning bass beats, although low notes from non-drums are also included, and vocal sound can throw it off. Tomorrow is more work on that, and I hope to have it working reliably soon.

Axtell's Notes: July 1

Very slow day. I first played with raising the threshold a little bit to see if I could clean up the peak graph at all, but it cut out too much data. I read a paper from IRCAM in France on spectral features. I looked at Gregor's FFT code, which uses complex arrays instead of doubles, to see how much work it would take to start using that instead of what we have now. There are basically two ways to adapt everything to work together: we could get all the FFT data in a complex array and read just the real part as doubles into a double array that could be used in everything Tayloe and I have written, or we could adapt everything we have to work with a complex array. We're not sure which will work the best. We're going to try both tomorrow. I spent the rest of the morning trying to get rid of the miniwavs, but Tayloe's approach worked better, so I scrapped that. We no longer use miniwavs at all. Maple Leaf Rag finishes in just over two minutes on our fastest computer now. It used to take 10-15.

In the afternoon, I fixed the colors again. Someday I'll be done fixing the colors. Now the colors are printed in order from quietest to loudest because the quieter peaks were covering up the louder peaks and making everything harder to read accurately.

Lastly, I worked on adding a Gaussian window because it is supposed to work better than the Hamming window, which is our current default. That is almost working, though not at all in FFTGUI. In fact, FFTGUI doesn't work right now because I haven't had a chance to update it to deal with the added window.

So, first task tomorrow is getting FFTGUI working with Gaussian windows and making sure that my Gaussian window works. Then I will move on to switching our code from the old FFT class to Gregor's.

30 June, 2010

6/30/10 Daily Journal of AT

Spent the morning figuring out things going wrong with the FFT version of BeatFinder. I managed to get an analysis working on bands of frequencies, rather than just one, so that it was more accurate. However, re-reading my notes, I realized I forgot something critical: the imaginary aspect. An easy thing to forget, since our FFT discards the imaginary data entirely. According to the algorithm, the data should be squared, and in the case of FFT data, the imaginary squared is subtracted from the real squared, leading to a very large difference in data. I'm not 100% sure this is necessary, as we use the real data only in everything else. However, this might cause different errors that we are not aware of.

Moved onto/got distracted by Axtell's work with thresholds of hearing. Basically, depending on frequency, the softest note people can hear changes. There are a few algorithms that approximate this threshold, and we worked with them today to replace PeakFinder, since the threshold does the same thing, only better. (This was the magic threshold I was searching for those two weeks ago!) Between the testing and debugging, that took most of the rest of today.

In the afternoon, I did some research to try and speed up the BeatFinder. I started out looking at ways to optimize the speed of the file reading, as that's where most of the lag comes from. Unfortunately, it seems that what I'm doing now is the fastest that it can get, at least in terms of accessing the file. It recommended using buffers (I am) and only getting necessary data (I do).

Alternatively, I could forgo writing all the FFT data (probably wise, as it's a very large file). The options then are (1) use the peaks instead or (2) get the beats as the FFT runs. The problem with the first is that I'm still not sure if it's returning all hearable notes, and, if it's not, it will throw off all the beats. The problem with the second is mainly one of structuring, as I'm sure it would be much faster then what I currently have (no reading or writing of files necessary!). I'll try implementing the latter tomorrow.

Axtell's Notes: June 30

So today we changed everything. Not quite, but it feels like it, because we no longer use PeakData at all. Threshold has completely replaced it. I found the problems that were holding up Threshold (it was using the negative of the dB of that point for some reason...) and did a lot of testing as to which weighting function works best (A, B, C, or D). I also tried a slight shift on the decibel calculation when using the A-, B-, and C-weightings, because their functions use a shift to line up their numbers. (Remember, none of these functions are perfect because no one has yet found an equation to match the ATH data set.)
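For reference, the A-weighting curve we tested can be written straight from the standard IEC 61672 equation; this is a sketch of that textbook formula, not our actual Threshold code:

```java
public class AWeighting {
    /** A-weighting in dB, normalized so that A(1000 Hz) is approximately 0.
        Constants are the squared corner frequencies from the standard:
        20.6, 107.7, 737.9, and 12194 Hz. */
    public static double aWeightDb(double f) {
        double f2 = f * f;
        double ra = (148693636.0 * f2 * f2)          // 12194^2 * f^4
                / ((f2 + 424.36)                      // 20.6^2
                 * Math.sqrt((f2 + 11599.29)          // 107.7^2
                           * (f2 + 544496.41))        // 737.9^2
                 * (f2 + 148693636.0));               // 12194^2
        return 20.0 * Math.log10(ra) + 2.00;          // the "shift" to 0 dB at 1 kHz
    }
}
```

The +2.00 dB term at the end is exactly the kind of shift mentioned above: without it the curve doesn't pass through 0 dB at 1 kHz.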

I looked at four files (a440.wav, threenotes.wav, fade.wav and mapleleafrag.wav) with each weighting with and without the shift. (All images are of fade.wav)

A-Weighting lost a lot of data that was audible:

B-Weighting shows the most data without showing spill:


C-Weighting was a close second to B:

D-Weighting shows a lot of upper frequency spill, just barely visible here:


So B without a shift was the winner and is now our default weighting function. Here is Maple Leaf Rag using our new and improved Threshold:

Compare that to this, the last Maple Leaf Rag I posted from June 25th:


We gained some upper frequency spill, and lost a lot of lower frequency spill. I'll try to clean that up some more tomorrow.

I also made a few updates to the color spectrum to work with Threshold. Tomorrow, I'll be adding decibels everywhere since we have that math working now! Finally.

29 June, 2010

Daily Blog 5

I started researching the constant Q transform (CQT) on Monday and found that it differs from the discrete Fourier transform (DFT) in several ways, the most important being that the constant Q transform analyzes an audio file on a logarithmic scale. At the lower frequencies, there are more "bins" for analysis than at the higher frequencies. The file is analyzed this way because humans cannot distinguish a 2.5 Hz difference at higher frequencies but can at lower ones, so the resolution at the high frequencies (10 kHz to 20 kHz) does not have to be as high as at the lower frequencies.

Like the DFT, the CQT can be calculated by brute force, or a shortcut can be used. The fast Fourier transform (FFT) is the shortcut for the DFT, and the FFT is also used in the shortcut for the CQT: the CQT is equal to multiplying the FFT by the spectral kernel of the data. I found a website with some sample code in MatLab: http://www.hans.fugal.net/research/cq-octave.
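The logarithmic bin layout boils down to a couple of formulas (these follow Brown's standard formulation of the CQT; the method names are mine, not our project's):

```java
public class ConstantQParams {
    /** Quality factor Q = 1 / (2^(1/b) - 1) for b bins per octave. */
    public static double q(int binsPerOctave) {
        return 1.0 / (Math.pow(2.0, 1.0 / binsPerOctave) - 1.0);
    }

    /** Center frequency of bin k: f_k = fMin * 2^(k / b). */
    public static double binFrequency(double fMin, int binsPerOctave, int k) {
        return fMin * Math.pow(2.0, (double) k / binsPerOctave);
    }

    /** Window length for bin k: N_k = ceil(Q * fs / f_k), so low bins get
        long windows (fine frequency resolution) and high bins short ones. */
    public static int windowLength(double sampleRate, double q, double fk) {
        return (int) Math.ceil(q * sampleRate / fk);
    }
}
```

With our defaults (fMin = 16.352 Hz, 12 bins per octave), each bin is one semitone, and the bin frequencies double every 12 bins.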

Yesterday, I spent most of the day rewriting some of my previous code in hopes of condensing and modifying it so I could use a generic class for both my DFT class and CQT class. Unfortunately, I only managed to crash what I had written so I am going to change it back and look to modify it after I have a working CQT.

6/29/10 Daily Journal of AT

Got it working again! Beat Finder will now run with both a basic file and on an FFT. Took most of the day to get it working properly, and there's still more to go. That's good, though, because most of what's left is finding the frequencies corresponding to various beat sources, like bass drums, hi-hats, snares, bongos, tambourines, cowbells, and general frequencies. It'd be cool to have user options to specify this sort of thing, or analyze it to have the program tell itself which one to run. However, that's a bit more long-term.

This morning, I figured out the null pointer error that was causing my try to become a catch and return with an empty array. It was going a bit too far in the file, and couldn't fill the array. Fixed that with a simple -1, then moved on to getting the FFT to play nice. I ended up reinstating the FFT writer, but instead of having it in FFT, it's in WaveSplitter. That way, the data from all the miniwaves can be added.

Getting it back from the file was a lot harder. (It's impossible to just pass it, because there's not enough memory space.) The function ended up being similar to what's done in NoteFinder, but with lots of extra for loops. (It's not terribly efficient yet.) Basically, the FFT file consists of all the miniwave samples, and all the frequencies and heights in each. It's similar to what's in the peak data, but containing everything, so it's a much larger file and takes a lot longer to go through. (On the plus side, for anything else we need FFT data for, we now have it, and PeakFinder/NoteFinder may be re-done to reflect this.)

The function for getting the peaks at all samples (the chunks from yesterday) at a given frequency goes like this: Inside a while loop to check that there's more to be read, it goes past the number equal to the frequency, saves the frequency to a linked list, then goes through the rest until it reaches a double blank line. Then, it does it all again, until the file is empty. The list is made into an array and returned. That data is then used in the beat finder.

Next up is getting multiple frequencies at once, which will be harder, as the options are (A) run the data getter multiple times, which will save space, but take up a lot of processor time, or (B) change the data getter to get more than one frequency, which will be faster, but take up more space and be much more complicated in terms of coding. I'm inclined to go with the first, as I may be able to figure out how to optimize the speed, and it's easier to do that way first.

The Beat Finder will also function as a single note identification program. Basically, right now it can find the peak frequencies of any note and return them. So if, for example, someone wanted to find all instances of E2 in a song...

...it's hard to imagine circumstances that would require it, but we can do it! (In this case, it corresponds to the back beat of a drum set.)

Axtell's Notes: June 29

Today I was looking at a bunch of different psychoacoustic functions and how we could use them. I started looking at the absolute threshold of hearing, or ATH (a data set showing the lowest audible decibel level at each frequency.) I hope to use it to ignore and not graph all the data we get that is inaudible. For now we accomplish that by ignoring the bottom 10% of a file, but this still leaves a lot of spill, since higher frequencies are inaudible at much higher decibel levels. I first got off topic and looked at the equal-loudness contour, which uses A-weighting (a curve that tries to mimic the ATH) to make everything in an audio file (almost) the same volume. There is a check box now that will show that, though I'm not sure it's relevant or useful.
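One common closed-form approximation of the ATH is Terhardt's formula; here's a sketch of it (an assumption on my part, not necessarily the function we'll end up using):

```java
public class ThresholdOfHearing {
    /** Terhardt's approximation of the absolute threshold of hearing in
        dB SPL for a frequency f in Hz. The curve is high at low
        frequencies, dips to its minimum near 3-4 kHz (where hearing is
        most sensitive), then rises steeply again. */
    public static double athDb(double f) {
        double khz = f / 1000.0;
        return 3.64 * Math.pow(khz, -0.8)
             - 6.5 * Math.exp(-0.6 * Math.pow(khz - 3.3, 2.0))
             + 1e-3 * Math.pow(khz, 4.0);
    }
}
```

A point whose level falls below this curve at its frequency can be treated as inaudible and skipped when graphing.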

I am working on using the A-weighting curve to determine if a point is audible or not. Currently, it does find some points that aren't audible and paints them white so I can see where they are, but many points that should come back as inaudible don't. I think there is a flaw in my decibel conversion. Once this gets working, the BeatFinder should be much more accurate since we won't have to worry about spill as much.

Slow day today, but it should lead to something useful eventually. More research in psychoacoustics and testing of decibel converters and threshold functions tomorrow.

28 June, 2010

6/28/10 Daily Journal of AT

Today was pretty good. I started writing my beat finder function, which took some time. Merely reading the file in was hard enough! Once I got the data, I split it into miniwave-sized chunks, got the data at each point, squared, then totaled them. I then took the average of all the chunks together, and compared each chunk to the overall average. If they were higher, they were beats. That worked okay, so I decided to work on improvements. The first, storing all the data as chunks rather than exact bytes, I had already implemented, so I moved onto getting the variance.

That took a good deal longer. First off, I had to actually understand the math and write a separate function for it. Then I realized it made a lot more sense within my other function, so I moved it, and managed to get it mucked. Basically, it always returned a beat, regardless of height. After about an hour and a half of fixing that (and leaving the program a bit messy) it was working more smoothly. So I gave it to Axtell (because her computer is much faster than mine) and we ran it on a song.

It ran out of memory.

Fortunately, there's a fix, as the memory-running-out happened when reading the file in, which means that it's too large to create an array (which happened with our FFT in the early days). At first, I thought the trick would be to use the miniwave files, and read them in instead. However, upon closer inspection, the error proved to be in the size of the array holding the data, which would not be reduced by using the miniwaves.

The answer is, rather than reading everything, read just enough data to fill a chunk, add the chunk to a list, and make an array out of the list. I've gotten rid of that error, but for some reason, all my data is set to zero. Well, it's something to fix tomorrow, as well as working on adding the FFT data to it. Probably have to write all the data to a file...which we got rid of...oh, the joys of coding.

Axtell's Notes: June 28

I re-reorganized the menus today. Now the buttons that only affect the peak graph are in a separate menu that pops up when Show Peaks is pressed. I added the check boxes for constant Q (this doesn't do anything yet) and beats. The beats check box is in the peak graph menu and draws a vertical grey line across the graph wherever BeatFinder finds a beat. It also writes those beats as a wave file. This is done by making a wave that is just a single tone as long as the sample length (for now assumed to be 1024 samples) and a wave of silence of the same length. Where BeatFinder finds a beat, it adds the tone, and where BeatFinder finds silence, it adds silence. BeatFinder is sort of working, so the BeatWriter is sort of working.


This gets almost all the beats except the last one, which is a cymbal crash. We think that's why it's missed: the PeakFinder misses it too (the last peak should be visible around 2 seconds).
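The tone/silence concatenation behind the BeatWriter can be sketched on raw sample arrays like this; it's an illustration with made-up names, not our actual writer:

```java
public class BeatTrackBuilder {
    /** Builds a click track from per-chunk beat flags: each flagged chunk
        becomes a sine burst, every other chunk stays silent (zeros). */
    public static double[] build(boolean[] beats, int chunkLength,
                                 double toneHz, double sampleRate) {
        double[] out = new double[beats.length * chunkLength];
        for (int i = 0; i < beats.length; i++) {
            if (!beats[i]) continue;              // silence stays zero
            for (int n = 0; n < chunkLength; n++) {
                out[i * chunkLength + n] =
                    Math.sin(2 * Math.PI * toneHz * n / sampleRate);
            }
        }
        return out;
    }
}
```

The resulting array just needs the usual conversion to 16-bit PCM before being written out as a wav.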

I started trying to look at BeatFinder on longer files and promptly got many OutOfMemoryErrors. I spent the rest of the day trying to increase the max memory for java to no avail. Tayloe is looking at decreasing how much memory everything uses. Hopefully, tomorrow we'll figure out some way to run full length songs again.

25 June, 2010

6/25/10 Daily Journal of AT

Alrighty, today was pretty good. Again, not a lot to show, but it was mostly research on finding the beats in a sound file.

Started out trying to work with the MIR files, but they made very little sense. The documentation was meager, and I couldn't actually tell what the analysis was doing. Looking at the website, I think it was the program to compare songs, but the output files wouldn't open in any program, and looking at them as text, they were a bunch of numbers followed by a filename. Useful if one knows what they mean, but....

I started looking around the MIR website, and found a version of the BeatFinder written in C++. Wish I could have run it, but I don't know how to compile C++ on the terminal. Anyway, I started looking at it, and I know already, I can't translate it directly and implement it, because, rather than using an array of doubles for the data, it creates its own new object called Sounddata. All the extra information in it has already been stored in our other methods, so I see no reason to try and implement it.

Before trying to get through the BeatDetection, I saw in their comments that they had gotten their algorithms from another site, gamedev.net. I found the article they mentioned, "Beat Detection Algorithms" (c) Frederic Patin 2002, and it helped a lot. Put simply, the way the human ear finds beats is by recognizing a briefly louder sound in a song at a particular frequency. If a program can find when there are emphasized notes, it can find the beats.

It went into a lot of detail about how to compare a given sample to the average of the local sound (about one second surrounding the sample, so that if a song changes in intensity, it does not miss quiet beats or falsely return loud non-beats) to find if it is a beat or not. There are several methods of optimization, such as keeping the energy height rather than the frames of a sample, adding in a multiplier to avoid getting loud non-beats, and using an FFT to only compare the energy of given frequencies (to better find a back-beat or a cymbal hit).
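The multiplier trick from the article scales with the variance of the local energy history; a sketch, with the constants as I remember them from Patin's write-up, so treat them as a starting point rather than gospel:

```java
public class BeatThreshold {
    /** Variance of the local energy history around its average. */
    public static double variance(double[] energies, double average) {
        double v = 0;
        for (double e : energies) v += (e - average) * (e - average);
        return v / energies.length;
    }

    /** Variance-dependent multiplier C from the "Beat Detection Algorithms"
        article: a steadier song (low variance) gets a higher threshold, a
        dynamic song (high variance) a lower one. */
    public static double multiplier(double var) {
        return -0.0025714 * var + 1.5142857;
    }
}
```

A chunk is then a beat when its energy exceeds multiplier(variance) times the local average, which adapts the sensitivity to how "even" the song is.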

Comparing what was said and what we have, I feel like we could implement this in a week (it'll probably take less than two days to write, but a while for testing various factors to become accurate). The article suggests having a logarithmic scale (and/or geometric spacing) for greater accuracy, so the constant Q should help with that. We may have to write an extra file with all the FFT data (rather than just the peaks) to use with the beat finder, but that shouldn't take up much time, programming or processing. I think it's doable in a week, doable well in two.

Axtell's Notes: June 25

Today while Tayloe worked on getting a beat finder, I continued to update FFTGUI and BigGUI. We ran some full length songs and got results that were recognizable as that song.


If you play the song while looking at this, several musical patterns stick out as they repeat. The other song we ran was Freeze Ray from Dr. Horrible's Sing-Along Blog which didn't show as clear notes, but did show a very clear rhythm.


As can be seen from both screen shots, the BigGUI controls got an overhaul today. I'm not sure if I like the new setup, but it seems better than the old one.

I added the same zoom and scroll that the peak graph has on its x-axis to the FFT graph's x-axis. On both of those, the numbers on the x-axis do not scroll with the graph.

I spent more time than I should have learning about keyboard shortcuts today. Java makes it very easy to set a shortcut of Alt+anything, but on Mac keyboards, pressing Alt+anything makes special characters. This means that while trying to use shortcuts, umlauts were added to my sample size, and 1024¨ is not a number. To use Control instead, I had to change how my buttons were set up a bit and add two lines of code for each shortcut, but they work! Huzzah.
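The "two lines per shortcut" pattern was presumably Swing's InputMap/ActionMap key bindings; a hypothetical sketch (names are mine, not our BigGUI code):

```java
import java.awt.event.ActionEvent;
import javax.swing.AbstractAction;
import javax.swing.JButton;
import javax.swing.JComponent;
import javax.swing.KeyStroke;

public class ShortcutDemo {
    /** Binds Ctrl+key to "press" the given button. The two essential lines
        are the InputMap put (keystroke -> action name) and the ActionMap
        put (action name -> action). Ctrl avoids the Mac Alt+key problem
        of typing special characters. */
    public static void bindControlShortcut(final JButton button, char key) {
        KeyStroke ks = KeyStroke.getKeyStroke("control " + Character.toUpperCase(key));
        button.getInputMap(JComponent.WHEN_IN_FOCUSED_WINDOW).put(ks, "press");
        button.getActionMap().put("press", new AbstractAction() {
            public void actionPerformed(ActionEvent e) {
                button.doClick();
            }
        });
    }
}
```

WHEN_IN_FOCUSED_WINDOW makes the shortcut fire no matter which component in the window has focus.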

Lastly, I got my color algorithm to work! Now the same block of code is used for any window. I tested this on several files including an a440 that fades in and then out made by Tayloe:


The colors work the same way: blue is the quietest and progresses through the rainbow until red, the loudest.

Next week we should be starting to play with a beat finder. We plan on adding grey vertical lines on the peak graph showing where the beats are. We are also talking about trying to make a .wav of the beat. I'm also looking at different possibilities for the BigGUI menu as it is sort of a mess.

24 June, 2010

Axtell's Notes: June 24

More updates to BigGUI today and a few to FFTGUI. First change is in FFTGUI. The function that gets the frequency of the point under the mouse and draws it next to the mouse now draws the note of that point under the frequency. I also added both those features to BigGUI's peak graph. After getting the updated NoteFinder and PeakData from Tayloe, I used threenotes.wav to test the notes by the mouse and they were all very close to correct.

We figured out that 512 is too small a sample size to get an accurate sample. We are now using 1024 (about 23 milliseconds if the file has a 44.1 kHz sampling rate.) I took some photos of the same file at 512, 1024, and 2048 samples to show the difference (all FFTs are at .2 seconds):


They show that a larger sample size gets rid of a lot of noise, gets louder peaks and more specific points. It also speeds up the program quite a bit.

I then fixed my algorithms that determine the color for each peak. The Hamming and Hanning windows return much quieter peaks (always under one); they were too small to affect the algorithm I had, so they didn't use shades at all, just five flat colors (red, yellow, green, blue, magenta). I added a multiplier so they now use the full spectrum. The quietest peaks now graph as purple (rather than magenta) because magenta stood out and looked like a very loud peak when it was really the quietest.

All of the afternoon was spent on setting up an x-axis zoom for the peak graph. There is a scrollbar so that any part of the file can be zoomed in on (tomorrow I'll be adding this feature to the y-axis of the peak graph and FFTGUI.) It cuts off the last data point or two and the numbers on the axis don't change with the scroll, but I hope to get both of those problems fixed tomorrow.

I'm working on one algorithm that takes any range of amplitudes and plots them using the color spectrum, so I don't need three different blocks of code (one for Hamming, one for Hanning, and one for Bartlett and un-windowed, since those two are so similar). So far, it often tries to make colors with numbers greater than 255.
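One way to make a single block of code handle any amplitude range is to normalize into [0, 1] first and clamp, so nothing can exceed 255; a sketch of that idea (not our actual color code):

```java
import java.awt.Color;

public class PeakColorMap {
    /** Maps an amplitude in [min, max] onto blue -> rainbow -> red.
        Normalizing and clamping first means the same code works for the
        tiny Hamming/Hanning amplitudes and the larger un-windowed ones. */
    public static Color colorFor(double amp, double min, double max) {
        double t = (amp - min) / (max - min);          // normalize to [0, 1]
        t = Math.max(0.0, Math.min(1.0, t));           // clamp: never past 255
        // hue 240 degrees (blue, quietest) down to 0 degrees (red, loudest)
        float hue = (float) ((1.0 - t) * (240.0 / 360.0));
        return Color.getHSBColor(hue, 1.0f, 1.0f);
    }
}
```

Letting Color.getHSBColor do the RGB conversion sidesteps the hand-rolled per-window arithmetic entirely.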

Tomorrow more BigGUI and FFTGUI work and moving on to look at generating rhythm using some spectral tools.

6/24/10 Daily Journal of AT

Today was...eh.

This morning was spent in more tweaking and adjusting of the various finders. I re-wrote NoteFinder with an extra class to find a single note, so that Axtell could include it in the GUI when a user highlights a peak and find the note, not just the hertz value. That also helped me find a few errors in the finder that had been present, but unseen, like its propensity to have C's be an octave too low, or to needlessly return ? values.

In researching the MIR code and methods, I found that they use an FFT, but never split their minifiles smaller than 23 milliseconds. Testing ours, I found that, at larger sample sizes, the peaks were higher and more accurate, with less shifting to the sharper notes. So, we've now changed the default from 512 frames per sample (roughly 11 milliseconds) to 1024 frames (roughly 23 milliseconds). This led to more tweaking of the Peak Finder, since the new default led to a lot more noise in the Hamming and Hanning windows. It ended up very similar, but enough to have needed the work I put in.

Unfortunately, I couldn't test the MIR program on the computers here, as they refuse to install Java 1.6. Instead, I tried running the program on my personal computer. I don't know what it was doing, but I don't think the program linked to was useful at all, as the files it wrote out were unreadable by any program I had. I'm going to investigate some of their other work, particularly the beat finder, tomorrow.

Daily Blog 4

I finally got an FFT algorithm working today and was able to convert it to a frequency scale. Next, I need to add a short method to apply windowing functions to the audio signal before running the FFT. I would also like to add a method to "reset" the DFT so the same audio file does not have to be loaded again to run another FFT of a different sample size or with a different windowing scheme. I also should add more error catching and checks to my program to make sure it won't crash.

Tonight I am going to begin reading up on the Constant Q transform and will continue research on it through the end of the week. By mid-week next week, I will hopefully know enough about it to be able to start coding the transform. Tomorrow I plan on adding the last few finishing touches on my FFT and start research on the Constant Q transform.

23 June, 2010

6/23/10 Daily Journal of AT

Wow, Wednesday already? In a couple days, we'll be halfway done with the internship.

Spent the morning looking up information on scales and analysis programs. The just vs. equal temperament issue of scales is interesting. Basically, in just temperament, scales are tuned based on one note, relative to that pitch, and each note has a slight variation based on what key it's in. Equal temperament sets a default pitch for each note, all relative, which makes it slightly worse than any given just tuning, but better at moving to other keys with no need to re-tune. Due to the way NoteFinder works in ranges around equal temperament pitches, it should be accurate for instruments tuned that way (namely pianos), and, since it works in ranges around those pitches, it should also suffice for just temperament pieces (like vocal groups).

Also read a lot about the mir group's various projects, which were linked from the main summer page. They talk a lot about different analysis programs, how they can find the genre or the beats or the scale, but don't say a lot about the actual analysis involved. I played with the online versions of the programs and downloaded a bit of the code, but so far I haven't found out much, except they use FFTs in some fashion. I couldn't download the proper programs, because admin passwords D:

This afternoon, I realized that, when the notes were graphed, they appeared sequentially, with no regard to what sample they came from. Axtell and I managed to make the frequencies be packeted by sample, so we've gotten rid of the slanty lines and replaced them with vertically aligned ones. I also did a comparative analysis on a song, showing the difference between no window (the upper) and a Hanning window (the lower). It's pretty clear that the windowing gets rid of noise, but also a lot of quieter data. (For reference, the song was "My Freeze Ray" from Dr. Horrible's Sing-Along Blog.)


Axtell's Notes: June 23

So we haven't gotten to the constant Q transformation yet. Hopefully that will start tomorrow. I fixed a bunch of bugs in BigGUI and FFTGUI today. Some of them have error popups when it is something that the user should fix (such as not putting in a number for the sample size.)

I added a zoom in FFTGUI like BigGUI's, and I'm working on adding a scrolling function for both of them so we can zoom into any part of the spectrum. FFTGUI had a few problems with windows. Namely, mousing over the windowed function's graph changed it to the un-windowed FFT, and zooming did the same thing. Both are fixed, though when FFTGUI is called from BigGUI, the FFT that first appears is un-windowed, and when the mouse moves onto it, it changes to the windowed version. I think I need to look at scaling for the windows because the windowed values are so much smaller than the un-windowed FFT that they look like straight lines.

We changed the 2D double array that held the data to a 3D array so that all the peaks that were in the same sample could be plotted in the same x location.

The last thing I did was to update the function that determines what color to make each point. I wrote completely separate algorithms for the Hanning and Hamming windows because their data is so much smaller than the un-windowed FFT or Bartlett.

I also did some clean-up and commenting of all the code I had written thus far, and ran all of Maple Leaf Rag with and without windows. Each run took about 10 minutes.

We will actually start work on the constant Q transformation tomorrow! I'll also fix whatever problems come up with BigGUI.

22 June, 2010

6/22/10 Daily Journal of AT

Ha-haa! I finally fixed NoteFinder! It took all day, but I managed it!

Basically, as I had it, it was finding the notes relatively well, but it kept going sharp. I played with a few methods of getting the notes, and went back to my original method: taking a range between two adjacent notes. I also thought to print out what hertz my program had for each note. I found that, since the base notes were only accurate to two decimal places, the higher notes were getting further and further away from what they should be. (Only the A notes were accurate, due to their conveniently even values.) So, I went in and put in the actual function for finding the hertz of each note, so that they were accurate to more places initially and kept their accuracy over numerous multiplications. For information, the function is
6.875 x 2 ^ ( ( 3 + midi-pitch-of-the-note ) / 12 ).

That was fine, but when I checked the ranges, they were now overlapping! I had to go in and find the correct fraction to subtract from each range, which was rather difficult, as it had to be accurate to at least ten places, and I couldn't figure out a formula that would work for generating most from some! It looked linear, but not quite. Same with quadratic, squared, cubed, square root, all of them slightly off. Maybe there's a logarithmic function (knowing this program, there would be) but in the end I did it manually, which took a while, but increased accuracy immensely. Any spill now is from actual FFT noise, and nothing to do with NoteFinder!
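The pitch formula from above is easy to code; the range boundaries I ended up tuning by hand are, I suspect, just the geometric mean of adjacent note frequencies (a quarter-tone up), so I've sketched that here as an assumption, not as what NoteFinder actually does:

```java
public class NotePitch {
    /** Frequency of a MIDI pitch, per the formula above:
        f = 6.875 * 2^((3 + midiPitch) / 12). MIDI 69 (A4) gives 440 Hz. */
    public static double frequency(int midiPitch) {
        return 6.875 * Math.pow(2.0, (3 + midiPitch) / 12.0);
    }

    /** Candidate boundary between a note and its upper neighbor: the
        geometric mean of the two frequencies, i.e. exactly a quarter-tone
        above. (An assumption; the journal tuned these fractions by hand.) */
    public static double upperBound(int midiPitch) {
        return Math.sqrt(frequency(midiPitch) * frequency(midiPitch + 1));
    }
}
```

Because pitch is logarithmic, the geometric mean (not the arithmetic midpoint) splits the interval evenly in cents, which may be why no linear or polynomial fit to the hand-tuned fractions ever quite worked.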

Axtell's Notes: June 22

Lots of little updates to BigGUI today. Next is to add the constant Q to BigGUI. It will be at least a few days to get a constant Q working on its own before BigGUI can implement it.

First of all, changing the buffer size no longer calls PeakData and NoteFinder, only SPLIT does that. Also, since windows are working now, BigGUI uses a Hanning window unless you tell it otherwise.

Next, when FFTGUI is called through BigGUI, the FFT and sine curve of the specified sample are shown as soon as it opens. For something that took several days to get working, the final code was very simple. There is a boolean that is true if main is passed data. If the boolean is true, then after the FFT and sine buttons are initialized, they call doClick(). I also added Tayloe's code from yesterday that goes to the specific mini file instead of using the whole file.
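The doClick() trick can be sketched like this (a toy example with invented names, not the actual FFTGUI code): the button's listener runs exactly as if the user had clicked, so the graphs draw themselves on open.

```java
import javax.swing.JButton;

public class AutoClickDemo {
    public static void main(String[] args) {
        final boolean[] fired = {false};
        JButton fftButton = new JButton("FFT");
        fftButton.addActionListener(e -> fired[0] = true);

        boolean passedData = true; // stand-in for "main was passed data"
        if (passedData) {
            fftButton.doClick(); // fires the listener as if the user clicked
        }
        System.out.println(fired[0]); // true
    }
}
```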

I also added a few popup panes that catch and explain errors. There's also one when SPLIT is complete.

An old problem that I finally got around to today is closing/reopening windows. Ideally, we should be able to close the peak and note windows and reopen them by clicking Show Peaks or Show Notes. All I had to do was add a setVisible(true) to their listeners. I also did the same in FFTGUI. Closing either of the control windows ends the program.

The peak graph now has two options for the x-axis units (samples or seconds). Individual seconds are only marked on short files (less than 15 seconds).

Lastly, there is now a button next to the filename text field that plays the named file. Eventually, pressing the button again will stop the sound, but that is not working yet.

Screenshots: proof that I actually get work done. These are the peaks and notes of Maple Leaf Rag (using the old NoteFinder, since I only just got the new one) with a Hanning window, zoomed in to show the first ~1380 Hz.


Tomorrow, I'll be stopping the sound from playing, and looking at constant Q transforms.

21 June, 2010

Axtell's Notes: June 21

Today, I added some more features to BigGUI. First I worked on labeling the x-axis of the peak graph with units of time in samples. That was easy enough and works nicely. I'm considering adding an option to switch to time in seconds, since the FFTGUI call takes a specific second, not a sample.

The points on the peak graph are now colored across the spectrum according to their amplitude. An example (using a440.wav):

That picture also shows that I added a zoom function. It can now show as few as ~1380 Hz or as many as 22050 Hz. That is controlled by two new buttons under the Show Peaks button on the BigGUI controls.
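The amplitude-to-color idea can be sketched as a hue mapping (my own invented names; I'm assuming the amplitude has been normalized to 0–1, which may not match BigGUI's actual scaling):

```java
import java.awt.Color;

public class AmplitudeColor {
    // Map a normalized amplitude (0.0 to 1.0) onto the visible spectrum,
    // running from blue (quiet) to red (loud).
    public static Color colorFor(double normalizedAmplitude) {
        double clamped = Math.max(0.0, Math.min(1.0, normalizedAmplitude));
        float hue = (float) ((1.0 - clamped) * (2.0 / 3.0)); // 2/3 = blue, 0 = red
        return Color.getHSBColor(hue, 1.0f, 1.0f);
    }

    public static void main(String[] args) {
        System.out.println(colorFor(1.0).equals(Color.RED));  // loudest -> red
        System.out.println(colorFor(0.0).equals(Color.BLUE)); // quietest -> blue
    }
}
```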


The last piece I got working today is that the FFTGUI now opens with the same buffer and window as the BigGUI had selected when Get FFT was clicked. Tayloe got the Get FFT to read the time in seconds from the text field and do the FFT on that specific part (I was still working with the whole file). I'm still working on making the FFT and sine curve graph as soon as FFTGUI opens. To do this, I have to tell FFTGUI that those buttons have been pushed even though they haven't. Hopefully that will be working tomorrow.

Lastly, I found an equation for volts to decibels (I decided that the raw data was in volts, as that made the most sense), but I got oddly large numbers, so I'm not sure the given formula is accurate. The formula is 20log(V) for dBV, meaning 0 dB = 1 V, and 20log(V/0.7746) for dBu, meaning 0 dB = 0.7746 V. I assumed that it was log base 10. I will try using ln tomorrow to see if it looks right. I am also going to look at the website that I got it from again to see if they explain it any further.
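For what it's worth, base 10 is the standard convention for voltage decibels, so the formulas above should be right as given. A quick sketch to check the reference points (class and method names are mine):

```java
public class Decibels {
    // 0 dBV is referenced to 1 V; 0 dBu is referenced to 0.7746 V.
    public static double toDBV(double volts) {
        return 20.0 * Math.log10(volts);
    }

    public static double toDBu(double volts) {
        return 20.0 * Math.log10(volts / 0.7746);
    }

    public static void main(String[] args) {
        System.out.println(toDBV(1.0));     // 0.0, as the formula promises
        System.out.println(toDBu(0.7746));  // 0.0
        System.out.println(toDBV(10.0));    // 20.0 -- each factor of 10 adds 20 dB
    }
}
```

If the numbers look oddly large, the raw sample values may just not be in volts; if they're bigger than 1, anything above 0 dBV is expected.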

I also gave up on using the dialog box to show status and stop the splitting because it was taking a long time and is not that important. I will still be using dialog boxes to explain errors though.

So tomorrow I'll be working on the FFTGUI call a bit more and adding in the error dialog boxes. I also want to look into closing windows. I want to be able to close one of the graphs and then bring them back if I click the button that calls it again.

Hopefully we'll be adding in a constant Q option soon. That will probably be a check box in both BigGUI and FFTGUI. Since the constant Q is based on the FFT, we do not need a whole other button.

6/21/10 Daily Journal of AT

Some success, but not a lot of interest today. I made the windowing work by having getPeaks filter the data more in the Hamming and Hanning windows. It took all morning to get right, and I still have a few issues with the note finder. This afternoon I got the file finder working for when the user wants a specific second to FFT. I also looked over what the professor was doing. It makes sense, but I don't know if she wants us to change something we're doing, or add something, or what.
Yeah, short post today, sorry. Will write more when more is done.
Quote of the day: "If I had a penny for every time something to do with this program was logarithmic..."

18 June, 2010

6/18/10 Daily Journal of AT

Not a lot to say today. Sort of lost momentum from the good days we've been having. Morning was spent on NoteFinder, trying to get that to be more accurate. Used logarithmic scaling between notes, but that just made it worse. Ended up using note and next note, without anything between, and that worked best, but there's still a lot of spill.

Afternoon was spent helping with GUI (peer programming) and trying to get windows to work better. Bartlett works, but the two H's have too much noise, and increasing the peak filter hasn't helped yet. May move on to constant Q transforms, which are basically DFTs with logarithmic frequency scaling. Of course, the papers I've looked at so far assume we know the math behind DFTs really well, so... I don't know how well I'll do on that. Perhaps this weekend I shall belatedly research them and figure out how the math works. It's one of those things that shouldn't work, but does.
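The "logarithmic scaling" part of the constant Q transform boils down to two standard formulas: bin center frequencies spaced geometrically, and a constant ratio Q of center frequency to bandwidth. A sketch under those standard definitions (names are mine, and this isn't the project's code):

```java
public class ConstantQBins {
    // Center frequency of bin k: fk = fmin * 2^(k / binsPerOctave)
    public static double binFrequency(double minFrequency, int binsPerOctave, int k) {
        return minFrequency * Math.pow(2.0, (double) k / binsPerOctave);
    }

    // The Q factor shared by every bin: Q = 1 / (2^(1/binsPerOctave) - 1)
    public static double qFactor(int binsPerOctave) {
        return 1.0 / (Math.pow(2.0, 1.0 / binsPerOctave) - 1.0);
    }

    public static void main(String[] args) {
        // With 12 bins per octave, bin 12 sits exactly one octave above the minimum
        System.out.println(binFrequency(16.352, 12, 12)); // 32.704
        System.out.println(qFactor(12));                  // about 16.8
    }
}
```

With 12 bins per octave, each bin lands on one semitone, which is exactly why the CQT suits note finding better than a linearly spaced FFT.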

Oh, and I ran Maple Leaf Rag through our GUI. Took half an hour and the data was very, very wrong. *sigh*

Axtell's Notes: June 18

I paid for those few nicely productive days in a row with a painfully slow day today.

As you can see in the picture, FFTGUI can now be called from BigGUI. It uses whatever filename was typed in last in BigGUI. It does not yet get the current buffer size or window. It also still runs the FFT on the entire file, not the mini file at the specified second. That was today's big breakthrough.

Something I'm working on which is half-way done now is the dialog box labeled "Split" (in between the BigGUI and FFTGUI controls). What's supposed to happen: SPLIT is clicked, and the dialog box pops up while SPLIT is splitting the file, getting the data, etc.; if the user wants to stop the split while it's going (because it is taking too long, for example), pressing the Stop button will do that. What happens now: SPLIT is clicked and the dialog box pops up, but the Stop button doesn't appear until the splitting is done, so there is no chance to stop it.
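That symptom (the dialog freezing until the work finishes) is usually a sign that the long task is running on Swing's event dispatch thread, which also has to paint the Stop button. The usual fix is to move the work onto a background thread, e.g. with SwingWorker. A minimal sketch with a stand-in task in place of the real splitting (all names here are invented, not BigGUI's):

```java
import javax.swing.SwingWorker;

public class SplitWorkerDemo {
    // Hypothetical background splitter: counts "chunks" instead of real audio work.
    static class SplitWorker extends SwingWorker<Integer, Void> {
        private final int chunks;

        SplitWorker(int chunks) { this.chunks = chunks; }

        @Override
        protected Integer doInBackground() {
            int done = 0;
            // Checking isCancelled() each pass is what lets a Stop button
            // (calling worker.cancel(true)) actually interrupt the split.
            for (int i = 0; i < chunks && !isCancelled(); i++) {
                done++; // real code would split one chunk of the sound file here
            }
            return done;
        }
    }

    public static void main(String[] args) throws Exception {
        SplitWorker worker = new SplitWorker(10);
        worker.execute();                 // runs off the event dispatch thread
        System.out.println(worker.get()); // blocks until doInBackground returns
    }
}
```

With the work off the event dispatch thread, the dialog stays live, so the Stop button can appear and respond while the split is still running.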

Now that I know how dialog boxes work, I plan to add them in when there are errors so the user can see what went wrong without going to look at the terminal for a printed error.

Windows are still not working in BigGUI. The Bartlett window works; the Hamming and Hanning windows do not. They return almost every frequency in the sound file, so the graph is covered in red dots. No idea how or why this is happening. I shall look into it on Monday.

A feature to add to BigGUI: if the text file in the text file field already exists, use it for Show Peaks and Show Notes without having to call SPLIT again. If SPLIT is pressed with an existing text file in the field, it will overwrite it with the current sound file's data.
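The decision logic for that feature is small enough to sketch (names are mine, not BigGUI's): re-split only when the user explicitly presses SPLIT or when no cached text file exists yet.

```java
import java.io.File;

public class SplitCache {
    // Decide whether SPLIT needs to run: reuse an existing text file for
    // Show Peaks / Show Notes, unless the user explicitly re-splits.
    public static boolean needsSplit(String textFileName, boolean splitPressed) {
        return splitPressed || !new File(textFileName).exists();
    }

    public static void main(String[] args) {
        // No cached file yet, so SPLIT must run
        System.out.println(needsSplit("no-such-peaks-file.txt", false));
        // Pressing SPLIT always overwrites, even if the file exists
        System.out.println(needsSplit("anything.txt", true));
    }
}
```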

Tune in next week for: Fixing/finishing BigGUI, constant Q transformation research and implementation into current code, actually figuring out how to get the raw data into decibels.