Today was kind of slow. Primarily, it was spent testing different multipliers and size ranges for the Beat Finder. The multiplier is how the threshold for determining whether something is a beat or not is found (if the given data is higher than the average times the multiplier, it's a beat). The tutorial I found said that the most reliable multiplier is within a function relating to variance, but after getting a lot of screwy data, I decided to work with a base multiplier for now. Trouble is, it works differently on different files. With something with a strong base beat, like the Thunder intro, it does quite well, but on anything with a more even sound, it usually doesn't pick up the beats.
I was able to successfully implement the Bark sizing to the frequencies. Previously, when I wanted to find the beat in a given range of hertz, I used a multiplier, which worked well on low frequencies, and poorly on high ones. The Bark sizing works on a semi-logarithmic scaling, so low ranges are still accurate, but high ones can also be found. For example, here's the base beat of Thunder Intro:
The handclaps are still imperceptible ):
We also tried putting together our GUI and FFTs, and found out that we don't actually know what the height of the peaks is in, other than relative loudness. Presumably, since we're using the magnitude rather than the direct data, there is not necessarily any correlation between heights, volts, or decibels, other than relatively.
This blog is designed to be a web journal to document the progress made on a 2010 summer research project regarding music analysis and Fast Fourier Transforms.
02 July, 2010
Axtell's Notes: July 2
The Gaussian window now can be called from anything without crashing, but is not the correct equation for that window. I'll be spending some time this weekend looking for the correct equation and at some other windows that could be better than Hamming.
I got Gregor's code working with Tayloe's and mine, but it seems to return a lot less data than our FFT. I got frustrated trying to figure out the difference since the math is almost exactly the same, so I am going to come back to that on Tuesday.
I was reading about using an overlap to get more accurate data. Overlap meaning that, when we split the wav into samples, we take each sample starting halfway through the last one. Example: If the length of each sample is 1028, each sample starts 512 after the one before it did. This way any data that would be lost because the splits are right on a peak aren't lost. This also means that we are making an array twice as long and we're getting OutOfMemoryErrors again. Clearly, we either need to find a way to use less of the heap, or I need to actually raise the heap size to match our needs.
After a long weekend of window research I will be adding new windows and making the Gaussian window work, looking at the Complex FFT again to see what's going on there, and boosting the java heap size or cleaning up our code to use less space.
Have a good Fourth of July!
I got Gregor's code working with Tayloe's and mine, but it seems to return a lot less data than our FFT. I got frustrated trying to figure out the difference since the math is almost exactly the same, so I am going to come back to that on Tuesday.
I was reading about using an overlap to get more accurate data. Overlap meaning that, when we split the wav into samples, we take each sample starting halfway through the last one. Example: If the length of each sample is 1028, each sample starts 512 after the one before it did. This way any data that would be lost because the splits are right on a peak aren't lost. This also means that we are making an array twice as long and we're getting OutOfMemoryErrors again. Clearly, we either need to find a way to use less of the heap, or I need to actually raise the heap size to match our needs.
After a long weekend of window research I will be adding new windows and making the Gaussian window work, looking at the Complex FFT again to see what's going on there, and boosting the java heap size or cleaning up our code to use less space.
Have a good Fourth of July!
01 July, 2010
Daily Blog 6
It's been slow going the past few days. I decided to create a new class in hopes it will help clean up the code in the FFT and, since the constant Q transform uses many of the same methods and calculations as the FFT, I thought a new class would help. Writing the new class was the easy part but translating my existing program to use the new class proved to be more difficult than it should have been. After eliminating all of the syntax errors, I started receiving null pointer errors followed by FFT output discrepancies. After a day and a half of debugging, I was able to figure out my problems laid in the fact I neglected to initialize a field I called later in the program and I was not deep copying my temporary variables in my bit reverse function.
After finally getting the program to work again, I started working on implementing a Gaussian windowing function which is almost done and should be complete tomorrow morning. I will then start recoding the constant Q as I had started it a few days ago but need to implement it using the new class.
After finally getting the program to work again, I started working on implementing a Gaussian windowing function which is almost done and should be complete tomorrow morning. I will then start recoding the constant Q as I had started it a few days ago but need to implement it using the new class.
7/1/10 Daily Journal of AT
First day of July, and I feel like we're getting things done. (Which is good, since we're nearing the end of week 5.)
Started today by working with BeatFinder to speed it up. Since there was no way I could get the file reading to go any faster, I started playing with different ways to pass the data from FFT to BeatFinder. I ended up re-writing BeatFinder as a class, with accessors, that was given each miniwave FFT data as it was read. After all the FFTs were done, it runs itself to get the beats. If, for some reason, this is done when there is no FFT data, it gets beat information directly from the file (as it did in the original function from the begining of the week).
While I was working in WaveSplitter (which is where FFT, PeakFinder, and BeatFinder are all called) I realized that we didn't need to create miniwave files and save them to disk. Instead, the miniwaves are passed directly to FFT, without writing them to the disk. This nearly halfs the time when running large files, which is great, but makes us unable to run a single FFT on a given second. If there's time, we'll work on getting just the data at that second to find a single FFT, or we'll just abandon it (as it's not terribly important except as a check or as a curiosity).
This afternoon was less interesting, as I've been testing various variables in the BeatFinder in order to better find beats. So far, I've gotten fairly reliable data concerning bass beats, although low notes from non-drums are also included, and vocal sound can throw it off. Tomorrow is more work on that, and I hope to have it working reliably soon.
Started today by working with BeatFinder to speed it up. Since there was no way I could get the file reading to go any faster, I started playing with different ways to pass the data from FFT to BeatFinder. I ended up re-writing BeatFinder as a class, with accessors, that was given each miniwave FFT data as it was read. After all the FFTs were done, it runs itself to get the beats. If, for some reason, this is done when there is no FFT data, it gets beat information directly from the file (as it did in the original function from the begining of the week).
While I was working in WaveSplitter (which is where FFT, PeakFinder, and BeatFinder are all called) I realized that we didn't need to create miniwave files and save them to disk. Instead, the miniwaves are passed directly to FFT, without writing them to the disk. This nearly halfs the time when running large files, which is great, but makes us unable to run a single FFT on a given second. If there's time, we'll work on getting just the data at that second to find a single FFT, or we'll just abandon it (as it's not terribly important except as a check or as a curiosity).
This afternoon was less interesting, as I've been testing various variables in the BeatFinder in order to better find beats. So far, I've gotten fairly reliable data concerning bass beats, although low notes from non-drums are also included, and vocal sound can throw it off. Tomorrow is more work on that, and I hope to have it working reliably soon.
Axtell's Notes: July 1
Very slow day. I first played with raising the threshold a little bit to see if I could clean up the peak graph at all, but it cut out too much data. I read a paper from ircam in France on spectral features. I looked at Gregor's FFT code that uses complex arrays instead of doubles to see how much work it would take to start using that instead of what we have now. There are basically two ways to adapt everything to work together. We could either get all the FFT data in a complex array, read just the real part as doubles into a double array that could be used in everything Tayloe and I have written, or we could adapt everything we have to work with a complex array. We're not sure which will work the best. We're going to try both tomorrow. I spent the rest of the morning trying to get rid of the miniwav's, but Tayloe's worked better so I scrapped that. We no longer use miniwavs at all. Maple Leaf Rag finishes in just over two minutes on our fastest computer now. It used to take 10-15.
In the afternoon, I fixed the colors again. Someday I'll be done fixing the colors. Now the colors are printed in order from quietest to loudest because the quieter peaks were covering up the louder peaks and making everything harder to read accurately.
Lastly, I worked on adding a Gaussian window because it is supposed to work better than the Hamming window which is out current default. That is almost working, though not at all in FFTGUI. In fact, FFTGUI doesn't work right now because I haven't had a chance to update it to deal with the added window.
So, first task tomorrow is getting FFTGUI working with Gaussian windows and making sure that my Gaussian window works. Then I will move on to switching our code from the old FFT class to Gregor's.
In the afternoon, I fixed the colors again. Someday I'll be done fixing the colors. Now the colors are printed in order from quietest to loudest because the quieter peaks were covering up the louder peaks and making everything harder to read accurately.
Lastly, I worked on adding a Gaussian window because it is supposed to work better than the Hamming window which is out current default. That is almost working, though not at all in FFTGUI. In fact, FFTGUI doesn't work right now because I haven't had a chance to update it to deal with the added window.
So, first task tomorrow is getting FFTGUI working with Gaussian windows and making sure that my Gaussian window works. Then I will move on to switching our code from the old FFT class to Gregor's.
30 June, 2010
6/30/10 Daily Journal of AT
Spent the morning figuring out things going wrong with the FFT version of BeatFinder. I managed to get an analysis working on bands of frequencies, rather than just one, so that it was more accurate. However, re-reading my notes, I realized I forgot something critical: the imaginary aspect. Fairly easy to do, since our FFT discards the data entirely. According to the algorithm, the data should be squared, and in the case of FFT data, the imaginary squared is subtracted from the real squared, leading to a very large difference in data. I'm not 100% sure this is necessary, as we use the real data only in everything else. However, this might cause different errors that we are not aware of .
Moved onto/got distracted by Axtell's work with thresholds of hearing. Basically, depending on frequency, the softest note people can hear changes. There are a few algorithms that approximate this threshold, and we worked with them today to replace PeakFinder, as it did the same thing, only better. (This was the magic threshold I was searching for those two weeks ago!) Between the testing and debugging, that took most of the rest of today.
In the afternoon, I did some research to try and speed up the BeatFinder. I started out looking at ways to optimize the speed of the file reading, as that's where most of the lag comes from. Unfortunately, it seems that what I'm doing now is the fastest that it can get, at least in terms of accessing the file. It recommended using buffers (I am) and only getting necessary data (I do).
Alternatively, I could forgo writing all the FFT data (probably wise, as it's a very large file). The options then are (1) use the peaks instead or (2) get the beats as the FFT runs. The problem with the first is that I'm still not sure if it's returning all hearable notes, and, if it's not, it will throw off all the beats. The problem with the second is mainly one of structuring, as I'm sure it would be much faster then what I currently have (no reading or writing of files necessary!). I'll try implementing the latter tomorrow.
Moved onto/got distracted by Axtell's work with thresholds of hearing. Basically, depending on frequency, the softest note people can hear changes. There are a few algorithms that approximate this threshold, and we worked with them today to replace PeakFinder, as it did the same thing, only better. (This was the magic threshold I was searching for those two weeks ago!) Between the testing and debugging, that took most of the rest of today.
In the afternoon, I did some research to try and speed up the BeatFinder. I started out looking at ways to optimize the speed of the file reading, as that's where most of the lag comes from. Unfortunately, it seems that what I'm doing now is the fastest that it can get, at least in terms of accessing the file. It recommended using buffers (I am) and only getting necessary data (I do).
Alternatively, I could forgo writing all the FFT data (probably wise, as it's a very large file). The options then are (1) use the peaks instead or (2) get the beats as the FFT runs. The problem with the first is that I'm still not sure if it's returning all hearable notes, and, if it's not, it will throw off all the beats. The problem with the second is mainly one of structuring, as I'm sure it would be much faster then what I currently have (no reading or writing of files necessary!). I'll try implementing the latter tomorrow.
Axtell's Notes: June 30
So today we changed everything. Not quite, but it feels like it because we no longer use PeakData at all. Threshold has completely replaced it. I found the problems that were holding up Threshold (using the negative of the dB of that point for some reason...) and did a lot of testing as to which weighting function works best (A, B, C, D.) I also tried a slight shift on the decibel calculation when using A-, B- and C-Weightings because their functions use a shift to line up their numbers (Remember, none of these functions are perfect because no one has yet found an equation to match the ATH data set.)
I looked at four files (a440.wav, threenotes.wav, fade.wav and mapleleafrag.wav) with each weighting with and without the shift. (All images are of fade.wav)
A-Weighting lost a lot of data that was audible:
B-Weighting shows the most data without showing spill:
C-Weighting was a close second to B:
D-Weighting shows a lot of upper frequency spill, just barely visible here:
So B without a shift was the winner and now our default weighing function. Here is Maple Leaf Rag using our new and improved Threshold:
Compare that to this, the last Maple Leaf Rag I posted from June 25th:
We gained some upper frequency spill, and lost a lot of lower frequency spill. I'll try to clean that up some more tomorrow.
I also did a little bit of updates to the color spectrum to work with Threshold. Tomorrow, I'll be adding decibels everywhere since we have that math working now! Finally.
I looked at four files (a440.wav, threenotes.wav, fade.wav and mapleleafrag.wav) with each weighting with and without the shift. (All images are of fade.wav)
A-Weighting lost a lot of data that was audible:
B-Weighting shows the most data without showing spill:
C-Weighting was a close second to B:
D-Weighting shows a lot of upper frequency spill, just barely visible here:
So B without a shift was the winner and now our default weighing function. Here is Maple Leaf Rag using our new and improved Threshold:
Compare that to this, the last Maple Leaf Rag I posted from June 25th:
We gained some upper frequency spill, and lost a lot of lower frequency spill. I'll try to clean that up some more tomorrow.
I also did a little bit of updates to the color spectrum to work with Threshold. Tomorrow, I'll be adding decibels everywhere since we have that math working now! Finally.
29 June, 2010
Daily Blog 5
I started researching the constant Q transform (CQT) on Monday and found the transform differs from the discrete Fourier transform (DFT) in several ways but the most important difference is the constant Q transform analyzes an audio file on a logarithmic scale. At the lower frequencies, there are more "bins" for analysis than at the higher frequencies. The file is analyzed this way because humans can not distinguish a 2.5 Hz difference at higher frequencies but we can at the lower ones so the resolution of the window at the high frequencies (10 KHz to 20KHz) does not have to be as high as at the lower frequencies.
Like the DFT, the CQT can be calculated by brute force or a short cut can be used. The fast Fourier transform (FFT) is the short cut for the DFT. The FFT is used in the short cut for the CQT. The CQT is equal to multiplying the FFT by the spectral kernel of the data. I found a website with some sample code in MatLab: http://www.hans.fugal.net/research/cq-octave.
Yesterday, I spent most of the day rewriting some of my previous code in hopes of condensing and modifying it so I could use a generic class for both my DFT class and CQT class. Unfortunately, I only managed to crash what I had written so I am going to change it back and look to modify it after I have a working CQT.
Like the DFT, the CQT can be calculated by brute force or a short cut can be used. The fast Fourier transform (FFT) is the short cut for the DFT. The FFT is used in the short cut for the CQT. The CQT is equal to multiplying the FFT by the spectral kernel of the data. I found a website with some sample code in MatLab: http://www.hans.fugal.net/research/cq-octave.
Yesterday, I spent most of the day rewriting some of my previous code in hopes of condensing and modifying it so I could use a generic class for both my DFT class and CQT class. Unfortunately, I only managed to crash what I had written so I am going to change it back and look to modify it after I have a working CQT.
6/29/10 Daily Journal of AT
Got it working again! Beat Finder will now run with both a basic file and on an FFT. Took most of the day to get it working properly, and there's still more to go. That's good, though, because most of what's left is finding the frequencies corresponding to various beat sources, like bass drums, hi hats, snares, bongos, tamborines, cowbells, and general frequencies. It'd be cool to have user options to specify this sort of thing, or analyze it to have the program tell itself which one to run. However, that's a bit more long-term.
This morning, I figured out the null pointer error that was causing my try to become a catch and return with an empty array. It was going a bit too far in the file, and couldn't fill the array. Fixed that with a simple -1, then moved on to getting the FFT to play nice. I ended up reinstating the FFT writer, but instead of having it in FFT, it's in WaveSplitter. That way, the data from all the miniwaves can be added.
Getting it back from the file was a lot harder. (It's impossible to just pass it, because there's not enough memory space.) The function ended up to be similar to what's done in NoteFinder, but with lots of extra for loops. (It's not terribly efficient yet.) Basically, the FFT file consists of all the miniwaves samples, and all the hertzs and heights in each. It's similar to what's in the peak data, but containing everything, so it's a much larger file and takes a lot longer to go through. (On the plus side, for anything else we need FFT data for, we have, and PeakFinder/NoteFinder may be re-done to reflect this.)
The function for getting the peaks at all samples (the chunks from yesterday) at a given frequency goes like this: Inside a while loop to check that there's more to be read, it goes past the number equal to the frequency, saves the frequency to a linked list, then goes through the rest until it reaches a double blank line. Then, it does it all again, until the file is empty. The list is made into an array and returned. That data is then used in the beat finder.
Next up is getting multiple frequencies at once, which will be harder, as the options are (A) run the data getter multiple times, which will save space, but take up a lot of processor time, or (B) change the data getter to get more than one frequency, which will be faster, but take up more space and be much more complicated in terms of coding. I'm inclined to go with the first, as I may be able to figure out how to optimize the speed, and it's easier to do that way first.
Also, the Beat Finder will also function as a single note identification program. Basically, right now it can find the peak frequencies of any note and return them. So if, for example, someone wanted to find all instances of E2 in a song...
...it's hard to imagine circumstances that would require it, but we can do it! (In this case, it corresponds to the back beat of a drum set.)
This morning, I figured out the null pointer error that was causing my try to become a catch and return with an empty array. It was going a bit too far in the file, and couldn't fill the array. Fixed that with a simple -1, then moved on to getting the FFT to play nice. I ended up reinstating the FFT writer, but instead of having it in FFT, it's in WaveSplitter. That way, the data from all the miniwaves can be added.
Getting it back from the file was a lot harder. (It's impossible to just pass it, because there's not enough memory space.) The function ended up to be similar to what's done in NoteFinder, but with lots of extra for loops. (It's not terribly efficient yet.) Basically, the FFT file consists of all the miniwaves samples, and all the hertzs and heights in each. It's similar to what's in the peak data, but containing everything, so it's a much larger file and takes a lot longer to go through. (On the plus side, for anything else we need FFT data for, we have, and PeakFinder/NoteFinder may be re-done to reflect this.)
The function for getting the peaks at all samples (the chunks from yesterday) at a given frequency goes like this: Inside a while loop to check that there's more to be read, it goes past the number equal to the frequency, saves the frequency to a linked list, then goes through the rest until it reaches a double blank line. Then, it does it all again, until the file is empty. The list is made into an array and returned. That data is then used in the beat finder.
Next up is getting multiple frequencies at once, which will be harder, as the options are (A) run the data getter multiple times, which will save space, but take up a lot of processor time, or (B) change the data getter to get more than one frequency, which will be faster, but take up more space and be much more complicated in terms of coding. I'm inclined to go with the first, as I may be able to figure out how to optimize the speed, and it's easier to do that way first.
Also, the Beat Finder will also function as a single note identification program. Basically, right now it can find the peak frequencies of any note and return them. So if, for example, someone wanted to find all instances of E2 in a song...
...it's hard to imagine circumstances that would require it, but we can do it! (In this case, it corresponds to the back beat of a drum set.)
Axtell's Notes: June 29
Today I was looking at a bunch of different psychoacoustic functions and how we could use them. I started looking at the absolute threshold of hearing or ATH (a data set showing the lowest audible decibel of each frequency.) I hope to use it to ignore and not graph all the data we get that is inaudible. For now we accomplish that by ignoring the bottom 10% of a file, but this still leaves a lot of spill since higher frequencies are inaudible at much higher decibels. I first got off topic and looked at the equal-loudness contour which uses A-Weighting (a curve that tries to mimic the ATH) to make everything in an audio file (almost) the same volume. There is a check box now that will show that though I'm not sure that this is relevant or useful.
I am working on using the A-Weighting curve to determine if a point is audible or not. Currently, it does find some points that aren't audible and paints them white so I can see where they are, but a many points that should come back as inaudible don't. I think there is a flaw in my decibel conversion. Once this gets working, the BeatFinder should be much more accurate since we won't have to worry about spill as much.
Slow day today, but it should lead to something useful eventually. More research in psychoacoustics and testing of decibel converters and threshold functions tomorrow.
I am working on using the A-Weighting curve to determine if a point is audible or not. Currently, it does find some points that aren't audible and paints them white so I can see where they are, but a many points that should come back as inaudible don't. I think there is a flaw in my decibel conversion. Once this gets working, the BeatFinder should be much more accurate since we won't have to worry about spill as much.
Slow day today, but it should lead to something useful eventually. More research in psychoacoustics and testing of decibel converters and threshold functions tomorrow.
28 June, 2010
6/28/10 Daily Journal of AT
Today was pretty good. I started writing my beat finder function, which took some time. Merely reading the file in was hard enough! Once I got the data, I split it into miniwave-sized chunks, got the data at each point, squared, then totaled them. I then took the average of all the chunks together, and compared each chunk to the overall average. If they were higher, they were beats. That worked okay, so I decided to work on improvements. The first, storing all the data as chunks rather than exact bytes, I had already implemented, so I moved onto getting the variance.
That took a good deal longer. First off, I had to actually understand the math, and write a separate function for it. Then I realized it made a lot more sense within my other function, so moved it, and managed to get it mucked. Basically, it always returned a beat, regardless of height. After about an hour and a half of fixing that (and leaving the program a bit messy) it was working more smoothly. So, I gave it to Axtell to run on a song (because her computer is much faster than mine) and gave it a song.
It ran out of memory.
Fortunately, there's a fix, as the memory-running-out happened when reading the file in, which means that it's too large to create an array (which happened with our FFT in the early days). At first, I thought the trick would be to use the miniwave files, and read them in instead. However, upon closer inspection, the error proved to be in the size of the array holding the data, which would not be reduced by using the miniwaves.
The answer is, rather than reading everything, read the enough data to find a chunk, add the chunk to a list, and make an array out of the list. I've gotten rid of that error, but for some reason, all my data is set to zero. Well, it's something to fix tomorrow, as well as working on adding the FFT data to it. Probably have to write all the data to a file...which we got rid of...oh, the joys of coding.
That took a good deal longer. First off, I had to actually understand the math, and write a separate function for it. Then I realized it made a lot more sense within my other function, so moved it, and managed to get it mucked. Basically, it always returned a beat, regardless of height. After about an hour and a half of fixing that (and leaving the program a bit messy) it was working more smoothly. So, I gave it to Axtell to run on a song (because her computer is much faster than mine) and gave it a song.
It ran out of memory.
Fortunately, there's a fix, as the memory-running-out happened when reading the file in, which means that it's too large to create an array (which happened with our FFT in the early days). At first, I thought the trick would be to use the miniwave files, and read them in instead. However, upon closer inspection, the error proved to be in the size of the array holding the data, which would not be reduced by using the miniwaves.
The answer is, rather than reading everything, read the enough data to find a chunk, add the chunk to a list, and make an array out of the list. I've gotten rid of that error, but for some reason, all my data is set to zero. Well, it's something to fix tomorrow, as well as working on adding the FFT data to it. Probably have to write all the data to a file...which we got rid of...oh, the joys of coding.
Axtell's Notes: June 28
I re-reorganized the menus today. Now the buttons that only effect the peak graph are in a separate menu that pops up when Show Peaks is pressed. I added the check boxes for constant Q (this doesn't do anything yet) and beats. The beats check box is in the peak graph menu and draws a vertical grey line across the graph wherever BeatFinder finds a beat. It also writes those beats as a wave file. This is done by making a wave that is just a single tone as long as the sample length (for now assumed to be 1024 samples) and a wave of silence of the same length. When BeatFinder finds a beat, it adds the tone and where BeatFinder finds silence, it adds silence. BeatFinder is sort of working, so the BeatWriter is sort of working.
This gets almost all the beats except the last one which is a cymbal crash. We think that is why it's missing it because the PeakFinder is missing it too (The last peak should be visible around 2 seconds).
I started trying to look at BeatFinder on longer files and promptly got many OutOfMemoryErrors. I spent the rest of the day trying to increase the max memory for java to no avail. Tayloe is looking at decreasing how much memory everything uses. Hopefully, tomorrow we'll figure out some way to run full length songs again.
This gets almost all the beats except the last one which is a cymbal crash. We think that is why it's missing it because the PeakFinder is missing it too (The last peak should be visible around 2 seconds).
I started trying to look at BeatFinder on longer files and promptly got many OutOfMemoryErrors. I spent the rest of the day trying to increase the max memory for java to no avail. Tayloe is looking at decreasing how much memory everything uses. Hopefully, tomorrow we'll figure out some way to run full length songs again.
Subscribe to:
Posts (Atom)