Analyzing Music Internship: 13 June 2010

18 June, 2010

6/18/10 Daily Journal of AT

Not a lot to say today. Sort of lost momentum from the good days we've been having. Morning was spent on NoteFinder, trying to get that to be more accurate. Used logarithmic scaling between notes, but that just made it worse. Ended up using note and next note, without anything between, and that worked best, but there's still a lot of spill.

Afternoon was spent helping with GUI (pair programming) and trying to get windows to work better. Bartlett works, but the two H's have too much noise, and increasing the peak filter hasn't helped yet. May move on to constant Q transforms, which are basically DFTs with logarithmic scaling. Of course, the papers I've looked at so far assume we know the math behind DFTs really well, so...don't know how well I'll do on that. Perhaps this weekend I shall belatedly research them, and figure out how the math works. It's one of those things that shouldn't work, but does.

Oh, and I ran Maple Leaf Rag through our GUI. Took half an hour and the data was very, very wrong. *sigh*

Axtell's Notes: June 18

I paid for those few nicely productive days in a row with a painfully slow day today.

As you can see in the picture, FFTGUI can now be called from BigGUI. It uses whatever filename was typed in last in BigGUI. It does not yet get the current buffer size or window. It also still runs the FFT on the entire file, not the mini file at the specified second. That was today's big break-through.

Something I'm working on, which is half-way done now, is the dialog box labeled "Split" (in between the BigGUI and FFTGUI controls). What's supposed to happen: SPLIT is clicked and the dialog box pops up while SPLIT is splitting the file, getting the data, etc. If the user wants to stop the split while it's going (because it is taking too long, for example), pressing the Stop button will do that. What happens now: SPLIT is clicked and the dialog box pops up while SPLIT is splitting the file, but the Stop button doesn't appear until it is done splitting, so there is no chance to stop it.
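One possible explanation, assuming BigGUI is built on Swing or AWT (the journal does not say): if the split runs on the same thread that paints the dialog, the Stop button cannot be drawn or clicked until the split returns. Below is a minimal sketch of the usual workaround, which is to do the work on a background thread and have it check a stop flag between chunks. The class and method names are hypothetical, and the "work" is a placeholder counter rather than real .wav splitting.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the SPLIT job; not the project's WaveSplitter.
class CancellableSplit implements Runnable {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final int totalChunks;
    private volatile int chunksDone = 0;

    CancellableSplit(int totalChunks) {
        this.totalChunks = totalChunks;
    }

    // Would be called from the Stop button's listener on the GUI thread.
    void requestStop() {
        stopRequested.set(true);
    }

    int getChunksDone() {
        return chunksDone;
    }

    @Override
    public void run() {
        for (int i = 0; i < totalChunks; i++) {
            if (stopRequested.get()) {
                return; // stop between chunks; finished chunks are left alone
            }
            // Placeholder: write mini .wav number i and run its FFT here.
            chunksDone = i + 1;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CancellableSplit job = new CancellableSplit(1000);
        Thread worker = new Thread(job); // leaves the GUI thread free to draw the dialog
        worker.start();
        job.requestStop();
        worker.join();
        System.out.println("chunks finished before stop: " + job.getChunksDone());
    }
}
```

How many chunks finish before the stop takes effect depends on thread timing, so the printed number will vary from run to run.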

Now that I know how dialog boxes work, I plan to add them in when there are errors so the user can see what went wrong without going to look at the terminal for a printed error.

Windows are still not working in BigGUI. The Bartlett window works. The Hamming and Hanning windows do not. They return almost every frequency in the sound file so the graph is covered in red dots. No idea how/why this is happening. I shall look into it on Monday.
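For reference while debugging, here is a sketch of the textbook forms of the three windows. It assumes the common symmetric definitions over a buffer of `size` samples; the class and method names are made up for illustration and are not the project's code. Each window value is multiplied into the matching sample before the FFT, so all three should stay between 0 and 1.

```java
// Illustrative only: textbook window formulas, not the BigGUI implementation.
class WindowSketch {
    // Bartlett (triangular): 0 at both ends, 1 in the middle.
    static double bartlett(int n, int size) {
        return 1.0 - Math.abs(2.0 * n / (size - 1) - 1.0);
    }

    // Hanning (Hann): raised cosine that reaches 0 at both ends.
    static double hanning(int n, int size) {
        return 0.5 - 0.5 * Math.cos(2.0 * Math.PI * n / (size - 1));
    }

    // Hamming: same shape as Hanning, but the ends stay at 0.08 instead of 0.
    static double hamming(int n, int size) {
        return 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (size - 1));
    }

    // Returns a Hanning-windowed copy of the samples (needs at least 2 samples).
    static double[] applyHanning(double[] samples) {
        double[] out = new double[samples.length];
        for (int n = 0; n < samples.length; n++) {
            out[n] = samples[n] * hanning(n, samples.length);
        }
        return out;
    }
}
```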

A feature to add to the BigGUI: If the text file in the text file field already exists, use it for the Show peaks, Show notes without having to call SPLIT again. If SPLIT is pressed with an existing text file in the text file field, it will overwrite with current sound file data.

Tune in next week for: Fixing/finishing BigGUI, constant Q transformation research and implementation into current code, actually figuring out how to get the raw data into decibels.

17 June, 2010

Axtell's Notes: June 17

The BigGUI class has all but two features working! Those that aren't working are windowing (because I haven't gotten around to setting it up) and showing the FFT of a specific sample (haven't figured out how to do this yet.)

Changing buffer size, sample size, the text file to write to and the sound file to read from all work. SPLIT splits the given .wav into smaller .wavs (each of the length given in the sample length text field) and does all the math so that Show Peaks and Show Notes are fast. Show Peaks makes the graph of the highest (or few highest) frequencies of each mini .wav (x-axis is time, y-axis is frequency in Hz). Show Notes makes the array (note by octave); the number in each cell represents how many times that note of that octave occurs in the given .wav.

So Tayloe's code is now completely integrated with mine in BigGUI. NoteFinder and PeakData are both graphed through GraphSplitter (a boolean determines which to graph). I thought I was having problems with the boolean because clicking on Show Notes did nothing and clicking on Show Peaks did what it should and what Show Notes should. Eventually I switched to two classes (GraphPeaks and GraphNotes) to avoid the boolean, but that didn't change anything. I finally thought to look at where I initialize the buttons and sure enough Peak and Note listeners were both controlled by the Show Peaks button. Luckily I had saved the GraphSplitter class and that all works as it should now.

I might have made a bit of progress on the finding decibels side of things. Audacity's formula to scale their amplitudes is value[i] = 20. * log10(sample[i]) according to a few forums I found about it. This scales the sound to a range of -1 to 1 dB. I'm not sure if this is accurate, but since we have access to Audacity's code, I'll look at that tomorrow and see if I can find something useful.
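A sketch of that formula, under the assumption that the samples have already been normalized to the range -1 to 1 (full scale = 1.0), which the forum posts do not confirm. Under that assumption the result is decibels relative to full scale: 0 dB at full amplitude and increasingly negative below it. The absolute value and the silence floor are additions for this sketch, there to keep log10 away from negative numbers and zero. The names are hypothetical.

```java
// Illustrative only: not taken from Audacity's source.
class DecibelSketch {
    // Arbitrary floor chosen for this sketch so that silence has a finite value.
    static final double FLOOR_DB = -120.0;

    // Converts one normalized sample (-1..1) to decibels relative to full scale.
    static double toDecibels(double sample) {
        double magnitude = Math.abs(sample);
        if (magnitude == 0.0) {
            return FLOOR_DB;
        }
        return Math.max(FLOOR_DB, 20.0 * Math.log10(magnitude));
    }

    public static void main(String[] args) {
        double[] samples = {1.0, 0.5, 0.1, 0.0};
        for (double s : samples) {
            System.out.println(s + " -> " + toDecibels(s) + " dB");
        }
    }
}
```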

Tomorrow, besides looking at Audacity's code, I hope to get the windows and get FFT features of BigGUI working, clean up BigGUI and other new classes, and start looking at constant Q transforms.

6/17/10 Daily Journal of AT

Another fine day. I could get used to this.

Started out by getting all the programs to play nice together. All I had left from yesterday was the Note Finder, which now works a lot more efficiently. It also is able to count how many of each note are in a peak data file, and return it in a double array, to be printed out with our Big GUI. The notes are still a little off in the higher pitches, and I'll probably try and figure out a better splitting point tonight. The difference between notes starts out small and doubles each time through, so higher notes have a higher range of error than lower notes. Trouble is, I don't know a way to get a threshold other than getting the point halfway between two notes. Maybe research will help.

I also helped out Axtell with the Big GUI. We've got it to run the splitter and fft minifiles, then save the files to disk. We can change what file, buffer size, filtering (though the non-null ones don't work yet), sample size, and save location. We can also print out a graph of the peaks over time, or a table of the total number of notes. So, once that's shined up and we get our new FFT function, we should have a working program ready for delivery. Let us know if you want any other features.

16 June, 2010

Daily Blog 3

Even though this is my first post in several days, I do not have much to report. I have been continuing research on FFTs in hopes of fully understanding them so I can write a program to run one from scratch. I got a hold of a textbook on computer music, The Computer Music Tutorial by Curtis Roads, and started reading it. The appendix included a very detailed outline on how to program DFTs. After reading part of this textbook and re-reading some of the other sources, I believe I have a firm enough grasp on the subject to be able to write an FFT algorithm.

I started working on creating an Audio object and DFT object to store the information needed to perform the FFT. I believe my Audio class is finished and I am almost done with the DFT class. Today I perfected the code to do bit reversals on an array, which is the first step to running an FFT. I will hopefully be able to finish the code for running the FFT tomorrow and will be able to start combining my code with Axtell and Tayloe's GUIs, other interfaces, and file splitting programs.
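For readers unfamiliar with the bit-reversal step, here is a minimal sketch of one common way to do it; it is not the code described in this post, and the class and method names are made up. Each element at index i swaps places with the element whose index is i with its binary digits reversed, which is the ordering an in-place radix-2 FFT expects. It assumes the array length is a power of two.

```java
// Illustrative only: a standard bit-reversal permutation, not the intern's DFT class.
class BitReversalSketch {
    // Reorders data in place; data.length must be a power of two.
    static void bitReverse(double[] data) {
        int n = data.length;
        if (n == 0 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        int bits = Integer.numberOfTrailingZeros(n); // log2(n)
        for (int i = 0; i < n; i++) {
            // Reverse all 32 bits, then keep only the top `bits` of them.
            int j = Integer.reverse(i) >>> (32 - bits);
            if (j > i) { // swap each pair only once
                double tmp = data[i];
                data[i] = data[j];
                data[j] = tmp;
            }
        }
    }
}
```

For an array of length 8 holding 0 through 7, the reordered contents are 0, 4, 2, 6, 1, 5, 3, 7.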

Axtell's Notes: June 16

Today I updated the FFTGUI class and started work on a BigGUI class that will combine my FFTGUI and Tayloe's WaveSplitter, PeakData and NoteFinder.

Updates to the FFTGUI class:

Clicking on the comparison of the FFT cycles through which windowed FFT is on top. There is a drop-down menu of windows so that any single windowed FFT can be seen on its own. There is a clear button that sort of works (the FFT and Sine graphs are cleared by running a .wav of silence through them and the drop-down menus and text field don't change.) If the filename in the text field does not exist, it uses the silence again. A bunch of null pointer errors came up and were fixed. Most of them came from the program trying to do something on a mouse click or move over the window when there was no graph in the window.

I realised that FFT.java used all floats, GraphingData used all doubles and FFTGUI used a little of each. So everything is now using doubles. They are more precise which means they slow down the program, but seeing as we will be using it for very small files, it shouldn't be too noticeable.

The BigGUI is currently just buttons, text fields, and drop-down menus that don't do anything. Tomorrow I will be working on getting a few of BigGUI's buttons working.

I still need to figure out what units the raw sound bytes are in. I'm doing some research on this tonight and will report back tomorrow.

6/16/10 Daily Journal of AT

Two good days in a row! Something must be going wrong, but stuff works!

This morning, I was inspired. I finally figured out what sort of PeakFinder to use! The secret? More rigorous peak identification and standard deviations. To elaborate, the function, as it is now, first finds the peaks based on the surrounding points, fifteen to the left and right, and only saves them if the data is larger than all the surrounding points. (This may be changed to a smaller number when windowing is used, in order to save smaller peaks and peaks close to each other.) Then, of the remaining peaks, a standard deviation is taken, and all below the 64th percentile are discarded (the noisy peaks). It works super well!
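The description above can be read more than one way, so the following is only a sketch under stated assumptions, not the actual PeakFinder. Stage one keeps an index if its value is strictly larger than every neighbor within `radius` points on either side (15 in the text). Stage two sorts the surviving peak heights and drops those below a given percentile, using the nearest-rank rule. The standard deviation step is not modeled because the journal does not say how it feeds into the cutoff. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: one possible reading of the two-stage peak filter.
class PeakFinderSketch {
    // Stage 1: indices whose value beats every neighbor within `radius`.
    static List<Integer> localPeaks(double[] data, int radius) {
        List<Integer> peaks = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int lo = Math.max(0, i - radius);
            int hi = Math.min(data.length - 1, i + radius);
            boolean isPeak = true;
            for (int j = lo; j <= hi && isPeak; j++) {
                if (j != i && data[j] >= data[i]) {
                    isPeak = false;
                }
            }
            if (isPeak) {
                peaks.add(i);
            }
        }
        return peaks;
    }

    // Stage 2: keep only peaks at or above the given percentile of peak heights.
    static List<Integer> dropBelowPercentile(double[] data, List<Integer> peaks, double percentile) {
        List<Integer> kept = new ArrayList<>();
        if (peaks.isEmpty()) {
            return kept;
        }
        double[] heights = new double[peaks.size()];
        for (int k = 0; k < heights.length; k++) {
            heights[k] = data[peaks.get(k)];
        }
        Arrays.sort(heights);
        int rank = (int) Math.ceil(percentile / 100.0 * heights.length); // 1-based nearest rank
        double cutoff = heights[Math.max(0, rank - 1)];
        for (int p : peaks) {
            if (data[p] >= cutoff) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

A smaller radius keeps peaks that sit close together, which matches the note above about shrinking the neighborhood once windowing is in use.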

I had to readjust the note finder, as the C notes generated were a step lower than they should have been. I also wrote in a bit that tells when a peak is inaudible (as in, lower than the 0th octave or higher than the 9th octave).

This afternoon, I worked on cleaning and commenting my code, and looked at Axtell's shiny new GUI. We discussed how we would implement our two programs, and figured out a new GUI that would take a file, split it, and display either hertz peaks over time or a list of the notes present in the piece. I've managed to re-do Wave Splitter and Peak Finder to work with it, although I may have to overhaul Note Finder more tomorrow. Anyway, it was a surprisingly productive day, and we should hopefully have a working GUI by Friday morning.

Also: Everything's better nested, like grids, loops, and baby birds.

To Do: Comment, clean, and get everyone to play nice with each other

15 June, 2010

Axtell's Notes: June 15

Lots of updates to the GUI today.

GUI Showing FFT and Curve of a440.wav

So, three separate windows open now. This is mostly so the two graphs will resize as their windows do. The menu window has a text field that takes any .wav file, a combo box that changes the buffer size (how many samples are taken) from 512 (2^9) to 262,144 (2^18), three buttons (FFT, comparison, Sine curve), and a check box that controls whether or not the spill on the sine curve is shown. As the buffer size goes up, the program slows down. Gregor is looking at writing a recursive FFT method to see if that is faster than nested for-loops.

The peak of the file is printed in the corner (the peak does read exactly 440 Hz for the a440 file unless the buffer is 2^16 or greater). If the mouse is on the graph, the frequency in Hz at that point appears next to the mouse.

There was an interesting bug when adding in the buffer menu that changed the shape of the sine curve each time a different buffer size was chosen. I figured out that the GraphingSound class gets the 512 samples to plot from the array of raw data of size buffer that is rewritten each time the buffer size changes. The raw data is read from the file using audioInputStream.read(byte array). That method takes as many samples as will fit in the byte array, but it takes them from evenly spaced intervals across the file so that the FFT will be more accurate. To get the first 512 samples, I had to make a new method that read every sample there is, get the first 512 of those, and graph that. This slows down the program, but as we work with smaller and smaller files, it will be less and less of a problem.

I also did some much needed general clean up and commenting of my classes.

I still need to: set up cycling through the windows in comparison (I'm going to try setting up some states to do this), get the raw FFT data into decibels, add Tayloe's WaveSplitter and peak/note finder, clean up the GUI a bit more and research constant Q transforms.

I'm starting with the window cycling tomorrow.

6/15/10 Daily Journal of AT

So I figured out the problem from yesterday. I had created my files with a low amplitude (.1), in case I decided to add them together. However, at that low, the peaks are lost entirely in our FFT. At an amplitude of 1, they give a much different peak. (I should have guessed when the waving occurred.) *sigh*

So I ran the numbers again, with a higher amplitude and shorter length (ten times as loud, but four percent of the length). Because it was so short, there was no data available for frequencies below 40 Hz. However, the ratio is now what was expected, half the sampling rate (44.1 kHz) over one fourth the buffer size (8192). Well, more or less. Annoyingly, this means at the very large buffer size of 32768, there are 2.691650390625 Hz per array element. If I wanted to reduce it, get closer to (at least) a 1:1 ratio, we'd have to quadruple the buffer size (to 131072) which would drastically increase the time it takes to run the FFT. Potentially, we could merely double the buffer size, and have the ratio be at 1.3:1, which would allow us to at least tell the difference between lower notes (C1 and lower).
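The arithmetic above can be checked with a few lines. This sketch simply encodes the ratio as stated in this entry (half the sampling rate over one fourth of the buffer size); it is not a general claim about FFT resolution, and the class and method names are made up for illustration.

```java
// Illustrative only: reproduces the Hz-per-element ratio described in the journal.
class BinWidthSketch {
    // Half the sampling rate divided by one fourth of the buffer size.
    static double hzPerElement(double sampleRate, int bufferSize) {
        return (sampleRate / 2.0) / (bufferSize / 4.0);
    }

    public static void main(String[] args) {
        int[] bufferSizes = {32768, 65536, 131072};
        for (int size : bufferSizes) {
            System.out.println(size + " -> " + hzPerElement(44100.0, size) + " Hz per element");
        }
    }
}
```

With a 44.1 kHz sampling rate, a buffer of 32768 gives 2.691650390625, doubling it gives about 1.35, and quadrupling it gives about 0.67.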

After all that, I made a function to figure out, based on frequency, what note was at the peak. It took a little figuring, but I managed to create it and then improve it. Initially, I had thought to use if/elif statements, but realized that would be a lot of code. Instead, I used a shortcut. I put the frequencies for the lowest scale (Ab0 to G0) into an array called basicNotes, with the corresponding letter value in literalNotes. I also put the distance between each element into another array (that was one longer) called basicRanges. The program takes an array of floats, presumably frequencies. For each one, it first figures out what scale it's in by testing to see if it's greater than the C of that scale, using a for loop. In a new for loop, it tests against each of the basic notes, multiplied by 2^scale (because notes are logarithmic), and returns the note that it's closest to (based on threshold between notes. Right now it's in half, which probably isn't exactly right, but it's close enough for jazz.) I had thought to store all this in one array, but Java requires them all to be the same, so I split them into floats and strings.
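A rough sketch of the idea, not the actual NoteFinder. It makes several assumptions: the octave-0 table runs from C to B, the frequencies are standard equal-temperament values with A4 = 440 Hz rounded to 0.01 Hz, and octaves 0 through 9 are searched. Instead of first locating the octave with a test against C, it simply checks every note in every octave and keeps the closest one in Hz, which gives the same answer as a threshold halfway between neighboring notes. The array names follow the text; everything else is hypothetical.

```java
// Illustrative only: nearest-note lookup built from one octave of base frequencies.
class NoteFinderSketch {
    // Octave 0 in equal temperament (A4 = 440 Hz), rounded to 0.01 Hz.
    static final double[] BASIC_NOTES = {
        16.35, 17.32, 18.35, 19.45, 20.60, 21.83,
        23.12, 24.50, 25.96, 27.50, 29.14, 30.87
    };
    static final String[] LITERAL_NOTES = {
        "C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"
    };
    static final int HIGHEST_OCTAVE = 9;

    // Returns the name of the note whose frequency is closest to hz, e.g. "A4".
    // Frequencies outside octaves 0-9 are clamped to the nearest end of the table.
    static String nearestNote(double hz) {
        String best = "";
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int octave = 0; octave <= HIGHEST_OCTAVE; octave++) {
            double scale = Math.pow(2.0, octave); // each octave doubles the frequency
            for (int i = 0; i < BASIC_NOTES.length; i++) {
                double distance = Math.abs(hz - BASIC_NOTES[i] * scale);
                if (distance < bestDistance) {
                    bestDistance = distance;
                    best = LITERAL_NOTES[i] + octave;
                }
            }
        }
        return best;
    }
}
```

Searching every octave avoids one edge case of the octave-first approach: a peak slightly below a C would otherwise be placed in the octave underneath it.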

The rest of the day was spent testing different files to see how high their frequency peaks were, to try and get a ballpark range for the filtering; and testing the note finder. The latter is going better. Tomorrow will be cleaning up the files so that they play nice with Axtell's GUI, and trying to find a rule of thumb for getting the higher peaks.

P.S. For the test file we were given, the peaks are (approximately) at 292.0, 585.4, 522.2, and 259.7. In note terms, they are D4, D5, C5, and C4, respectively. As analyzed in thirds, the first two notes are D4 and C5, second two are D4 and D5, third two are C4 and C5.

14 June, 2010

Axtell's Notes: June 14

First job today was a program that would graph the .wav file as time/amplitude (as opposed to frequency/amplitude as in an FFT). It was simple enough to plot the float values from the raw data of the audio file, but there was a lot of spill.

a440 Sine Curve Points

Using a simple a440 file (one sustained note) a sine curve can be seen, but there is a lot of spill. I wrote a method that goes through each point, looks at the 5 points on either side of it along the x-axis, finds the closest one, and draws a line between those two points. Only the lines are drawn, not the points.

a440 Sine Curve Without Spill

a440 With A Partial At 880

There is still some odd spill, but not nearly as much and the curve is very clear. That option is now available from the menu in FFTGraph.

I figured out that the FFT graph prints a range of 0 to 22 kHz. With a little math, the FFTGraph window now prints the frequency with the highest amplitude in the corner of the screen.

Next I worked on turning everything I had into a GUI. Eventually if the mouse hovers over the graph the GUI will show the frequency at that point, and if the mouse clicks on the graph while comparing it will cycle through each graph.

Having never written my own GUI from scratch before, a lot of time was spent in working out how to use what I already had in a GUI. The GUI now works minimally. The FFTGraph class still provides a lot more useful information. Also, the GUI image will not scale when the window changes size.

Blank GUI Screen

GUI Showing an a440 FFT

GUI Showing an a440 FFT Comparison

There is a lot more to do with the GUI just to get it caught up with the FFTGraph class, then I'll look at adding Tayloe's WaveSplitter class so the files are of a more useful length, and then we'll move on to the constant Q transform.

I also still need to figure out how to get the raw data from the FFT into amplitude in decibels.

6/14/10 Daily Journal of AT

Another day where things go wrong. They seem to come in cycles.

I basically had two things I wanted to do today: figure out a better peak finder, and find what peaks correspond to what frequencies. Since the correspondence was easier, I decided to get that out of the way first. It started out fairly easily: I created a number of files of frequencies ranging from 10 to 200, going up by 10 each time, then wrote a shell program to run an FFT and a peak finder on each of them. (Since there was only one peak, I ran the halving function six times on each, which got the major peak, or occasionally peaks.) I plotted the data and found the ratio. To be on the safe side, I did so for a variety of buffer sizes, from 2^15 to 2^10. I decided to use the largest, as it gave the most exact data. With 2^15, or 32768, the ratio of array placement to Hz was 28.6. So far so good.

The problem came when I decided to run some more frequencies through; namely, frequencies 50 through 2000, going up by 50. That's when the problems began. For 50, 100, 200, and 250, the peaks corresponded perfectly with their respective Hz. (150 had some problems with resonance, I think. I don't know why.) However, for 300 onward, there were no peaks in their respective frequencies. Every peak was less than 300. I thought it might be the multiplier, but changing it didn't help with the higher frequencies, and made the lower ones wrong as well. The only conclusion I could come to was that the FFT was cutting off data. Which made sense, given the data.

However, this flies in the face of the testing data done on one of our files, a440AndOnePartial.wav. When graphed, it clearly gave two peaks, the nearer one much larger. To try and figure out what was going on, I graphed the data from 50, 100, 150, 200, 250, and the a440 file.

A little blurry, but I'll explain.
The five graphs on the left are 50 through 250, with the yellow boxes pointing out the peaks. 150 is hard to see, but looking at 140 and 160 (immediately to the right) it is clear where 150 should be. However, to the top right is the a440 and partial graph, which, fairly clearly, shows those peaks. In the completely wrong positions!

I haven't had much time to mess with the FFT. I tried taking out the mirroring function, which gives a few peaks close to accurate for 300 to 500. However, beyond that, the peaks do not exist. I'm going to do some reading tonight to try and figure out what's wrong, but unless this is fixed, the FFT function is useless for any frequencies above middle C.
