25 June, 2010

6/25/10 Daily Journal of AT

Alrighty, today was pretty good. Again, not a lot to show, but it was mostly research on finding the beats in a sound file.

Started out trying to work with the MIR files, but they made very little sense. The documentation was meager, and I couldn't actually tell what the analysis was doing. Looking at the website, I think it was the program to compare songs, but the output files wouldn't open in any program I tried, and looking at them as plain text, each was a bunch of numbers followed by a filename. Useful if one knows what they mean, but....

I started looking around the MIR website, and found a version of the BeatFinder written in C++. Wish I could have run it, but I don't know how to compile C++ on the terminal. Anyway, I started looking at it, and I know already, I can't translate it directly and implement it, because, rather than using an array of doubles for the data, it creates its own new object called Sounddata. All the extra information in it has already been stored in our other methods, so I see no reason to try and implement it.

Before trying to get through the BeatDetection, I saw in their comments that they had gotten their algorithms from another site, gamedev.net. I found the article they mentioned, "Beat Detection Algorithms" (c) Frederic Patin 2002, and it helped a lot. Put simply, the way the human ear finds beats is by recognizing a briefly louder sound in a song at a particular frequency. If a program can find when there are emphasized notes, it can find the beats.

It went into a lot of detail about how to compare a given sample to the average of the local sound (about one second surrounding the sample, so that if a song changes in intensity, it does not miss quiet beats and falsely return loud non-beats) to find if it is a beat or not. There are several methods of optimization, such as storing the energy of each sample rather than its raw frames, adding in a multiplier to avoid getting loud non-beats, and using an FFT to only compare the energy of given frequencies (to better find a back-beat or a cymbal hit).
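A minimal sketch of that comparison, in Java since that is what the rest of the project uses. The class and method names are made up for illustration, and it makes one simplifying assumption: the local average is taken over the blocks just before the current one rather than the second surrounding it. The multiplier and history length are the knobs that would need testing.

```java
final class EnergyBeatSketch {
    /** Sum of squared amplitudes for each consecutive block of blockSize frames. */
    static double[] blockEnergies(double[] samples, int blockSize) {
        int blocks = samples.length / blockSize; // a trailing partial block is ignored
        double[] energies = new double[blocks];
        for (int b = 0; b < blocks; b++) {
            double sum = 0.0;
            for (int i = 0; i < blockSize; i++) {
                double s = samples[b * blockSize + i];
                sum += s * s;
            }
            energies[b] = sum;
        }
        return energies;
    }

    /**
     * Marks block b as a beat when its energy is greater than
     * multiplier * (mean energy of the historyBlocks blocks before it).
     * The first historyBlocks blocks are never marked (not enough history yet).
     */
    static boolean[] findBeats(double[] energies, int historyBlocks, double multiplier) {
        boolean[] beats = new boolean[energies.length];
        for (int b = historyBlocks; b < energies.length; b++) {
            double mean = 0.0;
            for (int h = b - historyBlocks; h < b; h++) {
                mean += energies[h];
            }
            mean /= historyBlocks;
            beats[b] = energies[b] > multiplier * mean;
        }
        return beats;
    }
}
```

With 1024-frame blocks at 44.1 kHz, roughly 43 blocks of history cover one second. Keeping only the per-block energies (instead of the frames) is the first optimization mentioned above.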

Comparing what was said and what we have, I feel like we could implement this in a week (it'll probably take less than two days to write, but a while for testing various factors to become accurate). The article suggests having a logarithmic scale (and/or geometric spacing) for greater accuracy, so the constant Q should help with that. We may have to write an extra file with all the FFT data (rather than just the peaks) to use with the beat finder, but that shouldn't take up much time, programming or processing. I think it's doable in a week, doable well in two.

Axtell's Notes: June 25

Today while Tayloe worked on getting a beat finder, I continued to update FFTGUI and BigGUI. We ran some full length songs and got results that were recognizable as that song.


If you play the song while looking at this, several musical patterns stick out as they repeat. The other song we ran was Freeze Ray from Dr. Horrible's Sing-Along Blog which didn't show as clear notes, but did show a very clear rhythm.


As can be seen from both screen shots, the BigGUI controls got an overhaul today. I'm not sure if I like the new setup, but it seems better than the old one.

I added the same zoom and scroll that the peak graph has on its x-axis to the FFT graph's x-axis. On both of those, the numbers on the x-axis do not scroll with the graph.

I spent more time than I should have learning about keyboard shortcuts today. Java makes it very easy to set a shortcut of Alt+anything, but on Mac keyboards, pressing Alt+anything makes special characters. This means that while trying to use shortcuts, umlauts were added to my sample size and 1024¨ is not a number. To use control, I had to change how my buttons were set up a bit and add two lines of code for each shortcut, but they work! Huzzah.

Lastly, I got my color algorithm to work! Now the same block of code is used for any window. I tested this on several files including an a440 that fades in and then out made by Tayloe:


The colors work the same way: blue is the quietest and progresses through the rainbow until red, the loudest.

Next week we should be starting to play with a beat finder. We plan on adding grey vertical lines on the peak graph showing where the beats are. We are also talking about trying to make a .wav of the beat. I'm also looking at different possibilities for the BigGUI menu as it is sort of a mess.

24 June, 2010

Axtell's Notes: June 24

More updates to BigGUI today and a few to FFTGUI. First change is in FFTGUI. The function that gets the frequency of the point under the mouse and draws it next to the mouse now draws the note of that point under the frequency. I also added both those features to BigGUI's peak graph. After getting the updated NoteFinder and PeakData from Tayloe, I used threenotes.wav to test the notes by the mouse and they were all very close to correct.

We figured out that 512 is too small of a sample size to get an accurate sample. We are now using 1024 (about 23 milliseconds if the file has a 44.1 kHz sampling rate). I took some photos of the same file at 512, 1024, and 2048 samples to show the difference (all FFTs are at 0.2 seconds):


They show that a larger sample size gets rid of a lot of noise, gets louder peaks and more specific points. It also speeds up the program quite a bit.
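For reference, the arithmetic behind the sample sizes can be sketched like this. The helper names are hypothetical and the 44.1 kHz rate is an assumption about the test files; the point is that doubling the sample size doubles the time covered and halves the spacing between FFT frequency bins, which is one reason the peaks land on more specific points.

```java
final class FrameDurationSketch {
    /** Length of one sample window in milliseconds. */
    static double frameMillis(int framesPerSample, double sampleRateHz) {
        return 1000.0 * framesPerSample / sampleRateHz;
    }

    /** Spacing between adjacent FFT bins in hertz for a window of this size. */
    static double binWidthHz(int framesPerSample, double sampleRateHz) {
        return sampleRateHz / framesPerSample;
    }
}
```

At 44.1 kHz, 512 frames is about 11.6 ms with bins about 86 Hz apart, and 1024 frames is about 23.2 ms with bins about 43 Hz apart.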

I then fixed my algorithms that determined the color for each peak. The Hamming and Hanning windows return much quieter peaks (always under one). They were too small to affect the algorithm I had, so they didn't use shades at all, just five different colors (red, yellow, green, blue, magenta). I added a multiplier so they now use the full spectrum. The quietest peaks now graph as purple (rather than magenta) because magenta stood out and looked like a very loud peak when it was really the quietest.

All of the afternoon was spent on setting up an x-axis zoom for the peak graph. There is a scrollbar so that any part of the file can be zoomed in on (tomorrow I'll be adding this feature to the y-axis of the peak graph and FFTGUI.) It cuts off the last data point or two and the numbers on the axis don't change with the scroll, but I hope to get both of those problems fixed tomorrow.

I'm working on one algorithm that takes any range of amplitudes and plots them using the color spectrum so I don't need three different blocks of code (one for Hamming, one for Hanning, and one for Bartlett and un-windowed since they are so similar). So far, it often tries to make colors with numbers greater than 255.
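One way such a single algorithm could look, as a rough sketch rather than the code that ended up in BigGUI: normalize the amplitude against the smallest and largest peak in the current data, clamp it to 0..1, and walk a blue-to-red ramp. The class name, method name, and the choice of a plain blue, cyan, green, yellow, red ramp are all assumptions. Clamping before scaling is what keeps every channel inside 0..255.

```java
final class AmplitudeColorSketch {
    /**
     * Maps an amplitude in [min, max] to {red, green, blue}, each 0..255.
     * Quietest maps to blue, loudest to red, passing through cyan, green, yellow.
     */
    static int[] amplitudeToRgb(double amplitude, double min, double max) {
        // Normalize so the same code works for windowed (tiny) and un-windowed (large) data.
        double t = (max > min) ? (amplitude - min) / (max - min) : 0.0;
        t = Math.max(0.0, Math.min(1.0, t)); // clamp so no channel can leave 0..255

        double pos = t * 4.0;                        // four equal segments along the ramp
        int segment = Math.min(3, (int) Math.floor(pos));
        double f = pos - segment;                    // position inside the segment, 0..1

        double r, g, b;
        switch (segment) {
            case 0:  r = 0.0; g = f;       b = 1.0;     break; // blue to cyan
            case 1:  r = 0.0; g = 1.0;     b = 1.0 - f; break; // cyan to green
            case 2:  r = f;   g = 1.0;     b = 0.0;     break; // green to yellow
            default: r = 1.0; g = 1.0 - f; b = 0.0;     break; // yellow to red
        }
        return new int[] {
            (int) Math.round(r * 255.0),
            (int) Math.round(g * 255.0),
            (int) Math.round(b * 255.0)
        };
    }
}
```

The three values can be handed straight to `new java.awt.Color(r, g, b)` when drawing a point.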

Tomorrow more BigGUI and FFTGUI work and moving on to look at generating rhythm using some spectral tools.

6/24/10 Daily Journal of AT

Today was...eh.

This morning was spent in more tweaking and adjusting of the various finders. I re-wrote NoteFinder with an extra class to find a single note, so that Axtell could include it in the GUI when a user highlights a peak and find the note, not just the hertz value. That also helped me find a few errors in the finder that had been present, but unseen, like its propensity to have C's be an octave too low, or to needlessly return ? values.

In researching the MIR code and methods, I found that they use an FFT, but never split their minifiles smaller than 23 milliseconds. Testing ours, I found that, at larger sample sizes, the peaks were higher and more accurate, with less shifting to the sharper notes. So, we've now changed the default from 512 frames per sample (roughly 11 milliseconds) to 1024 frames (roughly 23 milliseconds). This led to more tweaking of the Peak Finder, since the new default led to a lot more noise in the Hamming and Hanning windows. Ended up very similar, but enough to have needed the work I put in.

Unfortunately, I couldn't test the MIR program on the computers here, as they refuse to install Java 1.6. Instead, I tried running the program on my personal computer. I don't know what it was doing, but I don't think the program linked to was useful at all, as the files it wrote out were unreadable by any program I had. I'm going to investigate some of their other work, particularly the beat finder, tomorrow.

Daily Blog 4

I finally got an FFT algorithm working today and was able to convert it to a frequency scale. In addition, I need to add a short method to apply windowing functions to the audio signal before running the FFT. I would also like to add a method to "reset" the DFT so the same audio file does not have to be loaded again to run another FFT of a different sample size or with a different windowing scheme. I also should add a few more error catching and checks into my program to make sure it won't crash.
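A short sketch of what such a windowing method might look like, using the standard Hann (Hanning) and Hamming formulas. The class and method names are placeholders, not the program's actual ones. `apply` returns a new array and leaves the loaded audio untouched, which is one simple way to get the "reset" behavior: the original frames stay available for another FFT with a different window or sample size.

```java
final class WindowSketch {
    /** Hann window: 0.5 * (1 - cos(2*pi*i / (n - 1))), for n >= 2. */
    static double[] hann(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
        }
        return w;
    }

    /** Hamming window: 0.54 - 0.46 * cos(2*pi*i / (n - 1)), for n >= 2. */
    static double[] hamming(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (n - 1));
        }
        return w;
    }

    /** Multiplies frame by window element-wise into a new array; frame is not modified. */
    static double[] apply(double[] frame, double[] window) {
        if (frame.length != window.length) {
            throw new IllegalArgumentException("frame and window must be the same length");
        }
        double[] out = new double[frame.length];
        for (int i = 0; i < frame.length; i++) {
            out[i] = frame[i] * window[i];
        }
        return out;
    }
}
```

Both windows taper the edges of the frame toward zero (Hann) or 0.08 (Hamming), so the windowed signal carries less energy overall and its FFT peaks come out smaller than the un-windowed ones.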

Tonight I am going to begin reading up on the Constant Q transform and will continue research on it through the end of the week. By mid-week next week, I will hopefully know enough about it to be able to start coding the transform. Tomorrow I plan on adding the last few finishing touches on my FFT and start research on the Constant Q transform.

23 June, 2010

6/23/10 Daily Journal of AT

Wow, Wednesday already? In a couple days, we'll be halfway done with the internship.

Spent the morning looking up information on scales and analysis programs. The just vs. equal temperament issue of scales is interesting. Basically, in just temperament, scales are tuned relative to one note, and each note has a slight variation based on what key it's in. Equal temperament sets a fixed pitch for each note, which leads to it being slightly worse than any given just tuning, but better at going to other keys with no need to re-tune. Due to the way NoteFinder works in ranges around equal temperament pitches, it should be accurate for instruments tuned that way (namely pianos), but, since it works in ranges around those pitches, it should also suffice for just temperament pieces (like vocal groups).

Also read a lot about the mir group's various projects, which were linked from the main summer page. They talk a lot about different analysis programs, how they can find the genre or the beats or the scale, but don't say a lot about the actual analysis involved. I played with the online versions of the programs and downloaded a bit of the code, but so far I haven't found out much, except they use FFTs in some fashion. I couldn't download the proper programs, because admin passwords D:

This afternoon, I realized that, when the notes were graphed, they appeared sequentially, with no regard to what sample they came from. Axtell and I managed to make the frequencies be packeted, so we've gotten rid of the slanty lines, and replaced them with linear lines. I also did a comparative analysis on a song, showing the difference between no window (the upper) and Hanning window (the lower). It's pretty clear that the windowing gets rid of noise, but also a lot of quieter data. (For reference, the song was "My Freeze Ray" from Dr. Horrible's Sing-Along Blog.)


Axtell's Notes: June 23

So we haven't gotten to the constant Q transformation yet. Hopefully that will start tomorrow. I fixed a bunch of bugs in BigGUI and FFTGUI today. Some of them have error popups when it is something that the user should fix (such as not putting in a number for the sample size.)

I added a zoom in FFTGUI like BigGUI's, and I'm working on adding a scrolling function for both of them so we can zoom into any part of the spectrum. FFTGUI had a few problems with windows. Namely, mousing over the windowed function's graph changed it to the un-windowed FFT, and zooming did the same thing. Both are fixed, though when FFTGUI is called from BigGUI, the FFT that first appears is un-windowed, and when the mouse moves onto it, it changes to the windowed one. I think I need to look at scaling for the windows because the windowed values are so much smaller than the un-windowed FFT that they look like straight lines.

We changed the 2D double array that held the data to a 3D array so that all the peaks that were in the same sample could be plotted in the same x location.

The last thing I did was to update the function that determines what color to make each point. I wrote completely separate algorithms for the Hanning and Hamming windows because their data is so much smaller than the un-windowed FFT or Bartlett.

I also did some clean up and commenting of all the code I had written thus far, and ran all of Maple Leaf Rag with and without windows. Each took about 10 minutes.

We will actually start work on the constant Q transformation tomorrow! I'll also fix whatever problems come up with BigGUI.

22 June, 2010

6/22/10 Daily Journal of AT

Ha-haa! I finally fixed NoteFinder! It took all day, but I managed it!

Basically, as I had it, it was finding the notes relatively well, but it kept going sharp. I played with a few methods of getting the notes, and went back to my original method, taking a range between two adjacent notes. I also thought to print out what hertz my program had for each note. I found that, since the base notes were only accurate to two decimal places, the higher notes were getting further and further away from what they should be. (Only the A notes were accurate, due to their strange even nature.) So, I went in and put the actual function for finding the hertz for the notes, so that they were accurate to more places initially, and kept accuracy over numerous multiplications. For information, the function is
6.875 x 2 ^ ( ( 3 + midi-pitch-of-the-note ) / 12 ).
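
Written out in Java, the function is a one-liner. The class and method names here are only illustrative, not NoteFinder's actual ones.

```java
final class NotePitchSketch {
    /** Equal temperament frequency in hertz for a MIDI pitch number (A4 = 69 = 440 Hz). */
    static double midiToHertz(int midi) {
        return 6.875 * Math.pow(2.0, (3 + midi) / 12.0);
    }
}
```

MIDI pitch 69 gives exactly 440 Hz, and each step of 12 doubles the result, which is why the A notes come out even. Computing every note from the formula at full double precision avoids multiplying a two-decimal rounding error up through the octaves.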

That was fine, but when I checked the ranges, they were now overlapping! I had to go in and find the correct fraction to subtract from each range, which was rather difficult, as it had to be accurate to at least ten places, and I couldn't figure out a formula that would work for generating most from some! It looked linear, but not quite. Same with quadratic, squared, cubed, square root, all of them slightly off. Maybe there's a logarithmic function (knowing this program, there would be) but in the end I did it manually, which took a while, but increased accuracy immensely. Any spill now is from actual FFT noise, and nothing to do with NoteFinder!
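For what it's worth, there is a logarithmic way to do this, sketched below as an alternative and not as what NoteFinder does. Inverting the frequency formula gives the nearest equal temperament note directly, and the boundary between two adjacent notes falls a quarter tone (half a semitone) above the lower one. The names are hypothetical, and splitting at the quarter tone rather than at the arithmetic midpoint is an assumption about how the ranges should be divided.

```java
final class NearestNoteSketch {
    /** Nearest MIDI pitch to a frequency, measured in semitones from A4 (440 Hz, MIDI 69). */
    static int hertzToNearestMidi(double hertz) {
        double semitonesFromA4 = 12.0 * Math.log(hertz / 440.0) / Math.log(2.0);
        return (int) Math.round(69.0 + semitonesFromA4);
    }

    /** Frequency half a semitone above the given note: the top of that note's range. */
    static double upperBoundaryHertz(int midi) {
        return 440.0 * Math.pow(2.0, (midi - 69 + 0.5) / 12.0);
    }
}
```

Because every range ends exactly where the next begins, the ranges generated this way cannot overlap. For A4 the upper boundary is about 452.89 Hz, so 450 Hz still counts as A4 while 455 Hz rounds up to the next note.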

Axtell's Notes: June 22

Lots of little updates to BigGUI today. Next is to add the constant Q to BigGUI. It will be at least a few days to get a constant Q working on its own before BigGUI can implement it.

First of all, changing the buffer size no longer calls PeakData and NoteFinder, only SPLIT does that. Also, since windows are working now, BigGUI uses a Hanning window unless you tell it otherwise.

Next, when FFTGUI is called through BigGUI, the FFT and sine curve of that specified sample are shown as soon as it opens. For something that took several days to get working, the final code was very simple. There is a boolean that is true if main is passed data. If the boolean is true, after the FFT and sine buttons are initialized, they call doClick(). I also added Tayloe's code from yesterday that goes to the specific mini file as opposed to using the whole file.

I also added a few popup panes that catch and explain errors. There's also one when SPLIT is complete.

An old problem that I finally got around to today is closing/reopening windows. Ideally, we should be able to close the peak and note windows and reopen them by clicking on Show Peaks or Show Notes. All I had to do is add a setVisible(true) to their listeners. I also did the same in FFTGUI. Closing either of the control windows ends the program.

The peak graph has two options for the x-axis units now (samples or seconds.) Not every second is marked except on short files (less than 15 seconds.)

Lastly, there is now a button next to the filename text field that plays the file. Eventually, pressing the button again will stop the sound, but as of right now that is not working.

Screenshots: proof that I actually get work done. This is the peaks and notes of Maple Leaf Rag (using the old NoteFinder, I only just got the new one) with a Hanning window zoomed in to see the first ~1380 Hz.


Tomorrow, I'll be stopping the sound from playing, and looking at constant Q transforms.

21 June, 2010

Axtell's Notes: June 21

Today, I added some more features to BigGUI. First I worked on labeling the x-axis of the peak graph with units of time in samples. That was easy enough and works nicely. I'm considering adding an option to switch into time in seconds since the FFTGUI call takes a specific second, not sample.

The points on the peak graph now are different colors through the spectrum depending on their amplitude. An example (using a440.wav):

That picture also shows that I added a zoom function. It can now show as few as ~1380 Hz or as many as 22050 Hz. That is controlled by two new buttons under the Show Peaks button on the BigGUI controls.


The last piece I got working today is that the FFTGUI now opens with the same buffer and window as the BigGUI had selected when Get FFT was clicked. Tayloe got the Get FFT to read the time in seconds from the text field and do the FFT on that specific part (I was still working with the whole file). I still am working on making the FFT and sine curve graph as soon as FFTGUI opens. To do this, I have to tell FFTGUI that those buttons have been pushed even though they haven't. Hopefully that will be working tomorrow.

Lastly, I found an equation for volts to decibels (I decided that the raw data was in volts as that made the most sense), but I got oddly large numbers, so I'm not sure the given formula is accurate. The formula is 20log(V) for dBV, meaning 0 dB = 1 V, and 20log(V/0.7746) for dBu, meaning 0 dB = 0.7746 V. I assumed that it was log base 10. I will try using ln tomorrow to see if it looks right. I am also going to look at the website that I got it from again to see if they explain any more.
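A small sketch of those formulas, with hypothetical names. Base 10 is the standard logarithm for both dBV and dBu, so that part of the assumption holds. One possible reason for the oddly large numbers, which is only a guess about the data: if the raw values are 16-bit sample counts rather than volts, a full-scale value near 32767 comes out around 90 dB. Dividing by the full-scale value first gives decibels relative to full scale, which are always zero or negative.

```java
final class DecibelSketch {
    /** dBV: decibels relative to 1 volt. */
    static double toDbv(double volts) {
        return 20.0 * Math.log10(Math.abs(volts));
    }

    /** dBu: decibels relative to 0.7746 volts. */
    static double toDbu(double volts) {
        return 20.0 * Math.log10(Math.abs(volts) / 0.7746);
    }

    /** Decibels relative to full scale; fullScale = 32768 is an assumption for 16-bit audio. */
    static double toDbfs(double sample, double fullScale) {
        return 20.0 * Math.log10(Math.abs(sample) / fullScale);
    }
}
```

A value of exactly zero gives negative infinity from `Math.log10`, so silent samples need to be skipped or floored before graphing.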

I also gave up on using the dialog box to show status and stop the splitting because it was taking a long time and is not that important. I will still be using dialog boxes to explain errors though.

So tomorrow I'll be working on the FFTGUI call a bit more and adding in the error dialog boxes. I also want to look into closing windows. I want to be able to close one of the graphs and then bring them back if I click the button that calls it again.

Hopefully we'll be adding in a constant Q option soon. That will probably be a check box in both BigGUI and FFTGUI. Since the constant Q is based on the FFT, we do not need a whole other button.

6/21/10 Daily Journal of AT

Some success, but not a lot of interest today. I made the windowing work, by having get Peaks filter the data more in the Hamming and Hanning windows. It took all morning to get right, and I still have a few issues with the note finder. This afternoon I got the file finder working, when the user wants a specific second to FFT. Also, looked over what professor was doing. It makes sense, but I don't know if she wants us to change something we're doing, or add something, or what.
Yeah, short post today, sorry. Will write more when more is done.
Quote of the day: "If I had a penny for every time something to do with this program was logarithmic..."