Hooray! I feel like everything that went wrong yesterday has been put right. Well, some of it, anyway.
We scrapped the JASS FFT function and implemented the version from Sonogram, which works a lot better. Apparently, the underlying waves were still showing through in the JASS code, which was throwing everything off. Now, nearly everything is working as it should. The windows work, we're pretty sure correctly, although the redrawing problem still exists. The 440 FFT now works and gives the same single peak consistently, regardless of the length of the file. In addition, an older problem of the FFT not working on wave files smaller than a second has been repaired, almost certainly by changing the FFT. Now, an accurate FFT can be run on files of any length (within reason, as data on a few frames will be far from accurate, and the computer may not be able to handle files larger than 20 seconds. However, the FFT will likely only be run on files at a max of 1 second, so that's not really a problem.)
My main dilemma is with the Peak Data program that finds the peaks of a given set of FFT data. There are three options that, at one time or another today, I've implemented:
1) Use a pre-generated threshold to pick the peaks. This is rather dangerous and inaccurate, as the range of peaks generated in different data varies greatly based on the length and the occurrence of other peaks. This method will not be used.
2) Use a threshold based on the top percentage of the range of data. This would be similar to the above, but much more accurate on any given set of data. However, the issue becomes as to what percentage to take. With testing, I've determined that, in general, anything less than 10% will miss many smaller peaks, while anything larger than 20% will pick up too much noise. This method may be used.
3) Get the average of the data and delete anything smaller. Run it multiple times to leave the largest peaks. I had initially thought this would half my data each time; however, it has turned out to get rid of anywhere from half to nine tenths of the previous data, based on what was left. This seems much better for finding more accurate peaks; however, the problem of how many times to run it comes up again. From testing, it should be run between 4 and 7 times. This method may be used.
4) Some combination of the above. This is most likely what I will do, along with more accurate peak finding.
Alternatively, I should probably write a more accurate peak finder. Right now, it considers anything a peak if the data immediately before and after is less than the given value. Perhaps with checking more surrounding data and having more rigorous qualifications will allow the finder to do more of the work, and perhaps not need the cleaner at all.
The other thing I found was that the WaveSplitter is kind of slow. When the length of the resulting mini files is a second, or a largish fraction of a second (as low as .1 sec) it isn't noticeably slow. However, getting files that are 512 samples long (which is around .01 seconds) takes a lot longer. For example, on a two minute song, the program generates 9654 files, and takes about twenty minutes to generate and run an FFT on each. Adding in the analysis will probably bring the total to 30 minutes per song analyzed. Which isn't too bad, but if it can be faster, I'd like it to be.
TODO: Write function to give frequency (range) of a given peak. Write peaks to file. Sort file. Use file to print new graph of total peaks in a song to graph. Get WaveSplitter to go faster, possibly by not saving wave files (?) or by having larger sample size. Find peaks more accurately.
This blog is designed to be a web journal to document the progress made on a 2010 summer research project regarding music analysis and Fast Fourier Transforms.
11 June, 2010
Axtell's Notes: June 11
Today has been massively productive. First off, the doFFT() method (from JASS) has been replaced with Sonogram's method because it does not write over the array that it is given, as the JASS method does.
It also returns a more accurate array of data without any negative values. Because there are no longer negative values, the graph has been reformatted to only show the positive x- and y-axises. On the graphing side of things, the FFT graph now displays the frequency in megahertz (MHz). I hope to set up a feature that will display the frequency of a peak in hertz when hovered over/clicked on.
I also discovered that the windowing functions have been working, but because the graphing class scales each set of data to fill the graph, the difference couldn't be seen. When called in comparison, the four FFTs (FFT and three different windowed FFTs of the same file) appear on top of each other in different colors and in the same scale.
The next step will be to combine this now working code with Tayloe's wave splitting code to make FFTs (windowed or not) of much shorter sound files (512 samples so much shorter than one second).
Next week I will also continue update the GraphingData class, and work on adding a function that will plot the sound file given on a graph (x-axis as time, y-axis as amplitude). So if the sound file given were a single sustained note, it would graph a sine curve.
It also returns a more accurate array of data without any negative values. Because there are no longer negative values, the graph has been reformatted to only show the positive x- and y-axises. On the graphing side of things, the FFT graph now displays the frequency in megahertz (MHz). I hope to set up a feature that will display the frequency of a peak in hertz when hovered over/clicked on.
I also discovered that the windowing functions have been working, but because the graphing class scales each set of data to fill the graph, the difference couldn't be seen. When called in comparison, the four FFTs (FFT and three different windowed FFTs of the same file) appear on top of each other in different colors and in the same scale.
Windowed FFT Comparison
(Click on the image for a larger version)
(Click on the image for a larger version)
Red = unwindowed FFT
Blue = Bartlett window
Green = Hamming window
Yellow = Hanning window
Blue = Bartlett window
Green = Hamming window
Yellow = Hanning window
I will be working on changing which window is in front through some buttons or clicking on the graph.
Another update to the graphs is that if multiple windows are open at once, closing one does not close all of them and does not end the program.
Another update to the graphs is that if multiple windows are open at once, closing one does not close all of them and does not end the program.
The next step will be to combine this now working code with Tayloe's wave splitting code to make FFTs (windowed or not) of much shorter sound files (512 samples so much shorter than one second).
Next week I will also continue update the GraphingData class, and work on adding a function that will plot the sound file given on a graph (x-axis as time, y-axis as amplitude). So if the sound file given were a single sustained note, it would graph a sine curve.
10 June, 2010
6/10/10 Daily Journal of AT
I really hope the professor comes in tomorrow, because I feel like I'm stuck. Most of the morning was debugging and discovery of problems. The afternoon was more testing, comparing graphs and functions, and discovery of more problems. To name a few that I found/helped with (and the solutions we tried/implemented/can't figure out):
- Data files add information, making graphs longer and longer without adding information. (Solution: Clear the data files.)
- Data files are cleared at the end of each program, making it impossible to tell where errors occur. (Solution: Clear data files at the beginning of program.)
- Data files are not passed any zeroes. (Solution: Let them pass zeroes.)
- Data files have a lot of zeroes at the end that mean nothing. (Solution: Reduce size of array.)
- Data is mirrored. (Solution: Reduce size of the array still more.)
- Data is overwritten each time FFT is run, making it impossible to do peak comparison on graphs. (Solution: Manually rename the files so they are not over written. Temporary solution.)
- Windowing doesn't work. (Solution: Run windowing algorithms on wave data, not on FFT data.)
- FFT data is wrong when multiple windows are used. (Solution: Re-write imaginary array of zeroes at the begining of FFT call, rather then at creation.)
- FFT data of rectangular, Bartlett, and Hamming functions is the same. (Solution: ???)
- Peak generator only gets highest points, regardless of their peak-quality. (Solution: Find the peaks, then clean the data, rather than the other way around. To go into more detail, the clean function gets the average of all the data, disregarding any zeroes or negatives, then sets any number below the average to zero. Run multiple times, it reduces the size of the data severalfold. The getPeaks function got the peaks by checking a given piece of data against its immediate neighbors. If it was greater than both, it was considered a peak. However, running clean, then getPeaks, resulted in the same data as only running clean. The reason for this was that, if its neighbors were set to zero, a given large number would always be considered a peak, regardless of if it actually was or not. Run the other way around, so that the peaks were generated first, then cleaned, led to a more accurate analysis.)
- When FFT is run on a single 440 wave, with the only difference being in how long the file is (1 sec, 3 sec, and 12 sec), different data is generated. (Solution: ??? I really don't know why this is happening, or even if it's a problem. The FFT data generated is put into an array, then printed from their. The size of the array holding the bytes does not change based on the length of the sample, but is set to the buffer limit. This could be limiting the data given to the FFT; however, even if this were the case, the data generated should still be identical, because it is exactly the same wave. If it were to vary based on the length of the file, it should only do so in terms of height of the peaks, not their placement. The only option I can think of is to scrap our FFT and start again, but we thought it was working! Also, it's what we've been doing all week, and I really hope we haven't wasted all that time.)
- FFT data is in an array, and graphs have no way of telling what peaks correspond to what frequencies. (Solution: Test single frequencies in the peak finder, find what array values corespond to what frequencies, find a function, and translate songs based on that. However, with the FFT on the fritz, any data generated by it cannot be trusted. In addition, when I tried this with single frequencies, it gave multiple peaks of similar heights. In fact, the file with a single base note (with more actual noise) generated less noise in the FFT! This is frustrating, especially as I did not write the code that does the FFT, and thus have trouble tweaking it to make it work. When zoomed in upon, the graphs the FFT generates are similar to graphs created by Audacity analyzing the same files, or at least, they were yesterday. Today, they show a lot of discrepancy between the pure frequencies.)
Axtell's Notes: June 10
Unproductivity is apparently the word of the week.
I did get some good work done today though. I figured out the flaw in the windowing methods. It was returning a list of zeros because each term was being multiplied by a null. Having fixed that, I realised that I was asking it to graph the curve of the window rather than the window applied to an FFT. That too was fixed and now each window method prints a graph. I believe the graphs are incorrect, but at least something is shown.
Each window function when called on their own (as opposed to together through the compare option) seems to be the same as the FFT without windowing. The Hanning window is a little different, but it doesn't get rid of the leakage. If anything, it adds more.
When called together, something is still off. The Hanning window looks the same, but the others lose several values towards the end of the frequency spectrum.
I did get some good work done today though. I figured out the flaw in the windowing methods. It was returning a list of zeros because each term was being multiplied by a null. Having fixed that, I realised that I was asking it to graph the curve of the window rather than the window applied to an FFT. That too was fixed and now each window method prints a graph. I believe the graphs are incorrect, but at least something is shown.
Each window function when called on their own (as opposed to together through the compare option) seems to be the same as the FFT without windowing. The Hanning window is a little different, but it doesn't get rid of the leakage. If anything, it adds more.
When called together, something is still off. The Hanning window looks the same, but the others lose several values towards the end of the frequency spectrum.
Comparison of FFT Graphs
(Click on the image to see a larger, more legible version)
In this process a few other things got accomplished. I got rid of the mirroring of graphs the was seen yesterday. I had been only plotting the first half of the FFT values, thinking that would get rid of the mirroring. It turns out that it did and the image was mirrored twice, so I now plot only the first quarter of the FFT. Not sure why this is true. I'll do some research into that at a later date.
I also discovered that there were two float arrays with the same data but different names floating around (no pun intended) After some searching I found that the doFFT() method rewrites the float array it is given with the FFT values and returning a second version of that array. That has been remedied.
Another big change in the code is that it no longer writes to Text files. That was getting confusing as it had to write over the same file each time, so the information is stored in a float array in the FFT class that is passed to the GraphingData class to be graphed. That cleaned up the code quite a bit.
The last change was to change the if-else loops that dealt with the menu choice and which window to use are now switch-case blocks. I just find that it's neater. I also added a try-catch block around the menu's switch-case blocks to keep it from crashing.
So maybe today wasn't that unproductive, quite a lot got updated if not fixed, but it felt it because the windows still do not work right. That's top of my list tomorrow. Again.
I'm going to end today with some research into what I should add to this next. Assuming that the windows are fixed and working tomorrow, I will need something else to do for next week.
I also discovered that there were two float arrays with the same data but different names floating around (no pun intended) After some searching I found that the doFFT() method rewrites the float array it is given with the FFT values and returning a second version of that array. That has been remedied.
Another big change in the code is that it no longer writes to Text files. That was getting confusing as it had to write over the same file each time, so the information is stored in a float array in the FFT class that is passed to the GraphingData class to be graphed. That cleaned up the code quite a bit.
The last change was to change the if-else loops that dealt with the menu choice and which window to use are now switch-case blocks. I just find that it's neater. I also added a try-catch block around the menu's switch-case blocks to keep it from crashing.
So maybe today wasn't that unproductive, quite a lot got updated if not fixed, but it felt it because the windows still do not work right. That's top of my list tomorrow. Again.
I'm going to end today with some research into what I should add to this next. Assuming that the windows are fixed and working tomorrow, I will need something else to do for next week.
09 June, 2010
6/9/10 Daily Journal of AT
Today, fortunately, was much more productive than yesterday. In the morning, I wrote and tested WaveSplitter, a program that splits one wave file into many smaller ones, then runs an FFT on them. (Note: Code used to run FFT and draw graphs was from yesterday, still have to implement changes made to them by partners.) After doing research on the AudioInputStream class, I managed to get my code to read the buffered file and write it out again in chunks (rather than using nested loops, as I had originally planned). Each wave file can be successfully FFT'ed upon. Only problem I discovered was that, if one tried to split the wave file into larger chunks (the default that works is one second) it will only give one second to be analyzed, just space them out more and have fewer mini-files. I'm not quite sure how to fix it, so am waiting until Friday. There was another problem but I can't think of it right now.
This afternoon, I worked on a function to get the peaks from the FFT data generated. This involved a bit of trial-and-error with reading the files in (and the hope that the FFT file generation hasn't changed much today). I wrote a simple loop that goes through the data and gets rid of the smallest half (because they are obviously not large enough peaks to bother with). Running this "cleaner function" multiple times gives the peaks of the data. Comparing the graphs to the numbers, it appears fairly accurate. The only trick will be to translate the location of the peak to a frequency, in order to make this useful for analysis, and be able to write and append this data to a single file, then use the file to generate a graph for the song as a whole (with higher accuracy than taking an FFT of the entire song, as in the latter, small peaks would be lost in the noise, while in the former, smaller peaks are preserved).
Also, as a special tangential issue, I researched some information about logarithmic thinking. I first heard about this on a podcast, RadioLab, done through NPR. They did a special on numbers, and their first story was about how babies and toddlers "think" of numbers.
Found at RadioLab's website.
Basically, there was a study done on babies, children from two to three months old. They were hooked up to an electrode net and shown a series of eight identical pictures, in one case, ducks. They were measured on their reactions when the pictures changed to, say, eight trucks. Then, they changed from eight ducks to sixteen ducks. The babies' brains still showed a change, but in a different area. The study went on to show that babies can tell the difference between large differences of numbers, like ten and twenty, but not smaller ones, like nine and ten.
The reason for this is that, initially, our brains think of numbers logarithmically. The show goes on to tell about studies done among non-counting people living in the Amazon basin, who, when asked what is between one and nine, reply "Three." To us, it makes no sense, as the number directly between one and nine is five (four more than one and four less than nine). But the way they think of it is, one times three is three. Three times three is nine. Therefore, three is exactly between one and nine.
This way of thinking is not unique to infants and non-counters, either. Say, for example, there's a sale on a pair of pants. Normally, the pants cost twenty dollars, but today, they cost ten. That's a really great deal, isn't it? Now say that you find a jacket that's also on sale. It used to be a hundred dollars, but now it's only ninety. Is it as good a deal?
Thinking linearly about it, it seems both these deals are the same. However, half off is far better than ten percent off, and our gut reaction is to say that the pants are a better deal than the jacket. As a third example, if you were thinking of buying a ten thousand dollar car, the salesman knocking ten dollars off the price will probably not sway you much.
This relates to our project with the fact that steps on the western scales are arranged logarithmically. For example, going from middle C to high C doubles the frequency (and halves the wavelength). Now, this has something to do with the waves that make up sounds, as waves that have peaks at similar times will reinforce each other, creating a stronger sound. If wave 1 has half the frequency of wave 2, they will reinforce each other every other wave, whereas if wave A has a wavelength of a hundred less than wave B, there's no telling when they will reinforce each other. However, our brains prefer the reinforced waves, which in turn lead to our appreciation for things "in key." I'm fairly sure it's related to our thinking logarithmically, or perhaps we think logarithmically and like chordes due to some other facet of human nature. Regardless, it's pretty interesting.
This afternoon, I worked on a function to get the peaks from the FFT data generated. This involved a bit of trial-and-error with reading the files in (and the hope that the FFT file generation hasn't changed much today). I wrote a simple loop that goes through the data and gets rid of the smallest half (because they are obviously not large enough peaks to bother with). Running this "cleaner function" multiple times gives the peaks of the data. Comparing the graphs to the numbers, it appears fairly accurate. The only trick will be to translate the location of the peak to a frequency, in order to make this useful for analysis, and be able to write and append this data to a single file, then use the file to generate a graph for the song as a whole (with higher accuracy than taking an FFT of the entire song, as in the latter, small peaks would be lost in the noise, while in the former, smaller peaks are preserved).
Also, as a special tangential issue, I researched some information about logarithmic thinking. I first heard about this on a podcast, RadioLab, done through NPR. They did a special on numbers, and their first story was about how babies and toddlers "think" of numbers.
Found at RadioLab's website.
Basically, there was a study done on babies, children from two to three months old. They were hooked up to an electrode net and shown a series of eight identical pictures, in one case, ducks. They were measured on their reactions when the pictures changed to, say, eight trucks. Then, they changed from eight ducks to sixteen ducks. The babies' brains still showed a change, but in a different area. The study went on to show that babies can tell the difference between large differences of numbers, like ten and twenty, but not smaller ones, like nine and ten.
The reason for this is that, initially, our brains think of numbers logarithmically. The show goes on to tell about studies done among non-counting people living in the Amazon basin, who, when asked what is between one and nine, reply "Three." To us, it makes no sense, as the number directly between one and nine is five (four more than one and four less than nine). But the way they think of it is, one times three is three. Three times three is nine. Therefore, three is exactly between one and nine.
This way of thinking is not unique to infants and non-counters, either. Say, for example, there's a sale on a pair of pants. Normally, the pants cost twenty dollars, but today, they cost ten. That's a really great deal, isn't it? Now say that you find a jacket that's also on sale. It used to be a hundred dollars, but now it's only ninety. Is it as good a deal?
Thinking linearly about it, it seems both these deals are the same. However, half off is far better than ten percent off, and our gut reaction is to say that the pants are a better deal than the jacket. As a third example, if you were thinking of buying a ten thousand dollar car, the salesman knocking ten dollars off the price will probably not sway you much.
This relates to our project with the fact that steps on the western scales are arranged logarithmically. For example, going from middle C to high C doubles the frequency (and halves the wavelength). Now, this has something to do with the waves that make up sounds, as waves that have peaks at similar times will reinforce each other, creating a stronger sound. If wave 1 has half the frequency of wave 2, they will reinforce each other every other wave, whereas if wave A has a wavelength of a hundred less than wave B, there's no telling when they will reinforce each other. However, our brains prefer the reinforced waves, which in turn lead to our appreciation for things "in key." I'm fairly sure it's related to our thinking logarithmically, or perhaps we think logarithmically and like chordes due to some other facet of human nature. Regardless, it's pretty interesting.
Axtell's Notes: June 9
Another not as productive day as I would have liked. The main discovery of the day is that I was giving the doFFT() method the wrong length of array. The array length should be a "buffer" that is always a power of two. The higher the power, the more accurate the result. I had been giving it a list as long as there were samples (Often in the hundred thousands for a sound clip of several seconds). It is now set to 8192 (2^13) and seems to be working better. It is displaying a much nicer image, but still has the mirroring effect. Should figure out how to fix that.
I did some basic clean-up of my classes so the code would only have to work as hard as it had to. I also found a new way of converting bytes to floats. Instead of the confusing method I had (borrowed from JASS) I found four lines in Sonogram's code that accomplished the same thing. It casts the bytes as Bytes, and the Byte class has a method floatValue().
The window that pops up with the FFT is neater now. The line graph now is scaled as the window is re-sized so that it always fills the x- and y-axises. Other enhancements to be added to the graph window at a later date include:
-If more than one window is opened, offset each additional window so all are visible
-One window should close without closing all windows and ending the program
-The compare option will print in the same window in a different color for each window
Windowing functions are still not working correctly. Bartlett and Hamming windows seem not to make any change on the FFT. The Hanning window still produces nothing. A little research into this showed that this is because the method returns all zeros. I tried replacing the equation with several other versions of the Hanning window I found, but they all returned zeros. Because of this, I decided that I must be implementing the window in my code incorrectly. Once again, windows are on the to do list tomorrow.
On the topics of windows, I discovered that the Hanning window is more accurately called the "Hann window" in honor of van Hann. Because of the existence of the Hamming window and the ensuing confusion it tends to be called "Hanning" nowadays.
Tomorrow's Goals:
-Get windows working
-Figure out why the function mirrors itself and fix
-Print the corresponding frequency's along the x-axis
-Continue to update the graph window
I did some basic clean-up of my classes so the code would only have to work as hard as it had to. I also found a new way of converting bytes to floats. Instead of the confusing method I had (borrowed from JASS) I found four lines in Sonogram's code that accomplished the same thing. It casts the bytes as Bytes, and the Byte class has a method floatValue().
The window that pops up with the FFT is neater now. The line graph now is scaled as the window is re-sized so that it always fills the x- and y-axises. Other enhancements to be added to the graph window at a later date include:
-If more than one window is opened, offset each additional window so all are visible
-One window should close without closing all windows and ending the program
-The compare option will print in the same window in a different color for each window
Windowing functions are still not working correctly. Bartlett and Hamming windows seem not to make any change on the FFT. The Hanning window still produces nothing. A little research into this showed that this is because the method returns all zeros. I tried replacing the equation with several other versions of the Hanning window I found, but they all returned zeros. Because of this, I decided that I must be implementing the window in my code incorrectly. Once again, windows are on the to do list tomorrow.
On the topics of windows, I discovered that the Hanning window is more accurately called the "Hann window" in honor of van Hann. Because of the existence of the Hamming window and the ensuing confusion it tends to be called "Hanning" nowadays.
Tomorrow's Goals:
-Get windows working
-Figure out why the function mirrors itself and fix
-Print the corresponding frequency's along the x-axis
-Continue to update the graph window
08 June, 2010
Daily Blog 2
Today I spent most of the day finishing reading the articles I started yesterday. Part II of An Introduction to the Mathematics of Digital Signal Processing focused on sampling,m transforms, and filtering. DFTs along with z-transforms were discussed. I skipped the section on z-transforms as we are focusing on DFTs right now.
After finishing the articles, I believe I have a much better understand on how DFTs and sound sampling works and how to deal with the phenomena that can occur when sampling a signal and applying a DFT. The rest of this post will focus on summarizing what I read (hopefully) in a manner that will make the main points clear to Axtell and Tayloe.
As mentioned yesterday, the first part of the article focused on explaining the algebraic and trigonometric topics that are relevant to the understanding of transforms and sampling. Some of these topics included a description of the different types of numbers in math; whole (Z), natural (N), integer (I), real (R), and complex (C). Each set is "included" the sets to the right. For example, the integer number set includes the sets of whole and natural numbers, etc. The set of complex numbers are the most important to the discussion of signal processing as all signals have both a real and an imaginary part and can be described as: x + jy where x,y are elements of R and j2 = -1. j is used to represent the square root of 1 in signal processing as it is a branch of engineering and i is already used to represent current.
The article then goes on to explain polynomials, roots, exponents, logarithms, and e giving the rules and identities (if any) for each topic. He then explains sums and series. Summations are an important as a DFT is expressed mathematically as a summation.
After explaining the important topics in algebra, the topics of trigonometry are explained. Three measurements can be used in trig; radians, degrees, and grads. Pi/2 radians = 90o = 100 grads. Pythagoras' theorem is discussed and the trig the 13 most common trig identities are given. sin(x) and cos(x) only differ in phase (when x = 0, sin(x) = 0 and cos(x) = 1).
I found the following explanations to be helpful:
The second article focused on sampling, transforms, and filtering. With sampling, a phenomenon known as "the wagon-wheel effect" can occur and happens when the sampling rate in relationship to the actual frequency is too small. Using the example given by its namesake, in films, as a wagon increases in speed, the wheel will appear to increase as well until half of the frequency of the sampling rate (frames per second). At this point, the wheel will appear to decelerate or even go backwards. At R/2 (where R equals sampling rate), the frequency of the wheel is indistinguishable from its negative counterpart. For example, in old Western films the frames per second was equal to 24 (R = 24 Hz). At F = 12 Hz (F = actual frequency of the wheel), the wheel appears to be rotating at its maximum velocity. At F = 18 Hz, the wheel appears to be rotating at -6 Hz because it is impossible to tell if the wheel is completely 3/4 of a rotation each frame or -1/4 of a rotation. The Wikipedia article provides a good animation of this effect. The sampling theorem states to avoid aliasing or foldover (the wagon-wheel effect), any simple harmonic variation (a sinusoid) occurring at a rate of F Hz must be sampled at least 2F times per second. However, to avoid ambiguity in the equation statement of this theorem, the sampling rate R can not exactly equal 2F. So the sampling rate must be at least two times greater than the highest frequency component of the original waveform.
Digital signals can be thought of as functions of discrete values of time (n) or sequences of numbers, with each number representing the instantaneous value of a continuous time function (the original analog signal). I came across a good analogy while reading: "Prism is to light as Fourier analysis is to sound." White noise is the sound equivalent to white light in optics.
The article then discusses the DFT. It is "used to calculate the spectrum of a waveform in terms of a set of harmonically related sinusoids, each with a particular amplitude and phase" and is most commonly implemented by the FFT. Some quick notes about the DFT sequence x(n):
-----
\
| x(n)e-jwnk 0 <= k <= N - 1
/
-----
n = 0
where w(omega) = 2Pi/N and e-jwnk = cos(wnk) - jsin(wnk)
After finishing the articles, I believe I have a much better understand on how DFTs and sound sampling works and how to deal with the phenomena that can occur when sampling a signal and applying a DFT. The rest of this post will focus on summarizing what I read (hopefully) in a manner that will make the main points clear to Axtell and Tayloe.
As mentioned yesterday, the first part of the article focused on explaining the algebraic and trigonometric topics that are relevant to the understanding of transforms and sampling. Some of these topics included a description of the different types of numbers in math; whole (Z), natural (N), integer (I), real (R), and complex (C). Each set is "included" the sets to the right. For example, the integer number set includes the sets of whole and natural numbers, etc. The set of complex numbers are the most important to the discussion of signal processing as all signals have both a real and an imaginary part and can be described as: x + jy where x,y are elements of R and j2 = -1. j is used to represent the square root of 1 in signal processing as it is a branch of engineering and i is already used to represent current.
The article then goes on to explain polynomials, roots, exponents, logarithms, and e giving the rules and identities (if any) for each topic. He then explains sums and series. Summations are an important as a DFT is expressed mathematically as a summation.
After explaining the important topics in algebra, the topics of trigonometry are explained. Three measurements can be used in trig; radians, degrees, and grads. Pi/2 radians = 90o = 100 grads. Pythagoras' theorem is discussed and the trig the 13 most common trig identities are given. sin(x) and cos(x) only differ in phase (when x = 0, sin(x) = 0 and cos(x) = 1).
I found the following explanations to be helpful:
- period = frequency = repetition rate of a periodic waveform
- amplitude = the difference between peaks and/or troughs = strength
- timbre = tone quality = general shape
- pitch = "tonal height" of a sound
- any periodic waveform can be described as the sum of a number of sinusoidal variations with each one having a particular
- frequency
- amplitude
- phase
- must be Dirichlet as well as periodic meaning it must satisfy the conditions for a real-valued, periodic function
- f(t) = f(t + T) where f is the periodic waveform, t is time, and T is the period
The second article focused on sampling, transforms, and filtering. With sampling, a phenomenon known as "the wagon-wheel effect" can occur and happens when the sampling rate in relationship to the actual frequency is too small. Using the example given by its namesake, in films, as a wagon increases in speed, the wheel will appear to increase as well until half of the frequency of the sampling rate (frames per second). At this point, the wheel will appear to decelerate or even go backwards. At R/2 (where R equals sampling rate), the frequency of the wheel is indistinguishable from its negative counterpart. For example, in old Western films the frames per second was equal to 24 (R = 24 Hz). At F = 12 Hz (F = actual frequency of the wheel), the wheel appears to be rotating at its maximum velocity. At F = 18 Hz, the wheel appears to be rotating at -6 Hz because it is impossible to tell if the wheel is completely 3/4 of a rotation each frame or -1/4 of a rotation. The Wikipedia article provides a good animation of this effect. The sampling theorem states to avoid aliasing or foldover (the wagon-wheel effect), any simple harmonic variation (a sinusoid) occurring at a rate of F Hz must be sampled at least 2F times per second. However, to avoid ambiguity in the equation statement of this theorem, the sampling rate R can not exactly equal 2F. So the sampling rate must be at least two times greater than the highest frequency component of the original waveform.
Digital signals can be thought of as functions of discrete values of time (n) or sequences of numbers, with each number representing the instantaneous value of a continuous time function (the original analog signal). I came across a good analogy while reading: "Prism is to light as Fourier analysis is to sound." White noise is the sound equivalent to white light in optics.
The article then discusses the DFT. It is "used to calculate the spectrum of a waveform in terms of a set of harmonically related sinusoids, each with a particular amplitude and phase" and is most commonly implemented by the FFT. Some quick notes about the DFT sequence x(n):
- x(n) is modeled as one period with N samples (entire audio file is one period)
- FFT generally requires N to be a power of two
- more than two samples are required (N > 2)
- DFT[x(n)] = X(k) =
-----
\
| x(n)e-jwnk 0 <= k <= N - 1
/
-----
n = 0
where w(omega) = 2Pi/N and e-jwnk = cos(wnk) - jsin(wnk)
- magnitude (amplitude) = |X(k)| = sqrt(ak2 + bk2)
- harmonics of a sound = R/N
6/8/10 Daily Journal of AT
Unfortunately, most of today was spent debugging with little to show for it, except for how not to do things.
Started out with a copy of Axtell's code from yesterday, an FFT function and a graph-drawing function, uncommented. Initially, we both thought that, since the imaginary in an FFT starts as an empty (or all zeroed) array unless a reverse FFT was being done, we could do away with it. I did so in my version. Then I started work on splitting audio files into portions, then analyzing them. This led to the morning of endless debugging.
First, I reasoned that the file gives the information of the number of samples per second, or srate. So, if I wanted to take a one-second sample of the song, I would get srate bytes from a file. If I wanted half a second, I would get half that many. So, I ran a loop to get all the samples, and copy them a portion at a time into a temporary array, then run an FFT on the array and store the data. I realized early on that the song samples do not tend to end exactly on the second, so I added an exception to write the data as zeroes if the loop was trying to copy samples that didn't exist. However, I was still stuck, as every time I tried to run the program, I got a Null Pointer Exception, not within the copying loop, where I almost expected it, but in the FFT function.
After messing with the various variables and becoming very confunsed, I asked Axtell what normally caused this problem. Apparently, if the number of samples is fewer than the srate, the FFT function won't compute. I still don't know why, because as far as I can tell, the function never uses the srate, but I adjusted the loop so that it only used audio samples of srate length or longer. I finally ran the function and printed the graph of the data...to see a mess. I tried zooming in and out, adjusting the window, printing only one graph at a time, still, nothing useful. Then I put the imaginary numbers back in.
Apparently they do do something, as that fixed the problem for the first graph generated. Unfortunately, all subsequent graphs, rather than having peaks in a few places, showed striking similarities to a sin wave. I thought the problem might be the imaginary number file, which was a global variable and never got reset back to zero. Doing that helped, but I still got a wave rather than peaks. I'm still trying to figure that one out, but I have a feeling it has something to do with all the global variables in that function.
The plan for tomorrow is to bypass all this mess. Rather than writing a function to split the sound file within the FFT function, I'm going to write a separate class to split a sound file into half-second (or so) increments, then run each of the resulting files through the FFT. Problems I forsee include having to figure out Java's AudioReader class, which confused me greatly before, or using BufferedReader, which lead to a loss of aliasing, or something syntactically similar. However, Gregor is starting work on getting the wave-byte-wave function working, which would solve that problem. Either way, I should have at least a function to show by tomorrow for my work these past two days.
Related minor issues: Our FFT will not read any music file longer than ~15 seconds. This is a problem, as the function should ultimately analyze full-length songs, but may be solved with the function I'm trying to write. The Hanning window is not working, and no windows are working consistently (Hamming only works for Axtell, and Barlett only worked on half my files). Sample rates need to be really large, as in 16K, for the FFT, because it's only at that level that the minor waves become noticeable above the noise (discovered when creating test files in Audacity).
Interesting notes: In music, whole notes can last anywhere from a second to ten, depending on the tempo. While the written values do not change, the speed at which they're played does. This means that, depending on the song, a second could contain a flurry of pitches or just one, and it cannot necessarily be told in advance based on the sheet music. Ideally, I'd like to analyze each note, but due to not only the variance in notes but in their length from song to song, this would be impossible.
Started out with a copy of Axtell's code from yesterday, an FFT function and a graph-drawing function, uncommented. Initially, we both thought that, since the imaginary in an FFT starts as an empty (or all zeroed) array unless a reverse FFT was being done, we could do away with it. I did so in my version. Then I started work on splitting audio files into portions, then analyzing them. This led to the morning of endless debugging.
First, I reasoned that the file gives the information of the number of samples per second, or srate. So, if I wanted to take a one-second sample of the song, I would get srate bytes from a file. If I wanted half a second, I would get half that many. So, I ran a loop to get all the samples, and copy them a portion at a time into a temporary array, then run an FFT on the array and store the data. I realized early on that the song samples do not tend to end exactly on the second, so I added an exception to write the data as zeroes if the loop was trying to copy samples that didn't exist. However, I was still stuck, as every time I tried to run the program, I got a Null Pointer Exception, not within the copying loop, where I almost expected it, but in the FFT function.
After messing with the various variables and becoming very confunsed, I asked Axtell what normally caused this problem. Apparently, if the number of samples is fewer than the srate, the FFT function won't compute. I still don't know why, because as far as I can tell, the function never uses the srate, but I adjusted the loop so that it only used audio samples of srate length or longer. I finally ran the function and printed the graph of the data...to see a mess. I tried zooming in and out, adjusting the window, printing only one graph at a time, still, nothing useful. Then I put the imaginary numbers back in.
Apparently they do do something, as that fixed the problem for the first graph generated. Unfortunately, all subsequent graphs, rather than having peaks in a few places, showed striking similarities to a sin wave. I thought the problem might be the imaginary number file, which was a global variable and never got reset back to zero. Doing that helped, but I still got a wave rather than peaks. I'm still trying to figure that one out, but I have a feeling it has something to do with all the global variables in that function.
The plan for tomorrow is to bypass all this mess. Rather than writing a function to split the sound file within the FFT function, I'm going to write a separate class to split a sound file into half-second (or so) increments, then run each of the resulting files through the FFT. Problems I forsee include having to figure out Java's AudioReader class, which confused me greatly before, or using BufferedReader, which lead to a loss of aliasing, or something syntactically similar. However, Gregor is starting work on getting the wave-byte-wave function working, which would solve that problem. Either way, I should have at least a function to show by tomorrow for my work these past two days.
Related minor issues: Our FFT will not read any music file longer than ~15 seconds. This is a problem, as the function should ultimately analyze full-length songs, but may be solved with the function I'm trying to write. The Hanning window is not working, and no windows are working consistently (Hamming only works for Axtell, and Barlett only worked on half my files). Sample rates need to be really large, as in 16K, for the FFT, because it's only at that level that the minor waves become noticeable above the noise (discovered when creating test files in Audacity).
Interesting notes: In music, whole notes can last anywhere from a second to ten, depending on the tempo. While the written values do not change, the speed at which they're played does. This means that, depending on the song, a second could contain a flurry of pitches or just one, and it cannot necessarily be told in advance based on the sheet music. Ideally, I'd like to analyze each note, but due to not only the variance in notes but in their length from song to song, this would be impossible.
Axtell's Notes: June 8
Today I built a class that could compute the FFT and show the graph of any WAVE file it was given. It includes a simple menu with a few different options.
I checked the graphs it was making against the graphs Sonogram made for the same WAVE files to ensure that the FFT algorithm works properly.
Once it was written, the rest of the day went to debugging it. The plain FFT works and automatically scales it so it is nicely visible. The windowing and comparison options do not work as they should (instead of a smoother FFT it displays a straight line), so that's on my to do list for tomorrow.
I also need to add numbers to the x-axis saying what frequency is where. I was starting work on finding the frequencies today and hopefully will have that working tomorrow.
I did find out that my FFT method is doing something odd. I only use half the number the method makes because it mirrors itself. Ignoring the second half of the numbers should fix that, but I found out that I was too far zoomed in. If I zoomed out I could see that my function still mirrored itself. I was not able to find out how that was happening, but if I used only a 100th of the list or so, I got rid of the mirroring.
So I have the start of a useful program that is still full of bugs. Tomorrow I hope to work out most of the bugs and do some more research into what should be added to the program next.
I checked the graphs it was making against the graphs Sonogram made for the same WAVE files to ensure that the FFT algorithm works properly.
Once it was written, the rest of the day went to debugging it. The plain FFT works and automatically scales it so it is nicely visible. The windowing and comparison options do not work as they should (instead of a smoother FFT it displays a straight line), so that's on my to do list for tomorrow.
I also need to add numbers to the x-axis saying what frequency is where. I was starting work on finding the frequencies today and hopefully will have that working tomorrow.
I did find out that my FFT method is doing something odd. I only use half the number the method makes because it mirrors itself. Ignoring the second half of the numbers should fix that, but I found out that I was too far zoomed in. If I zoomed out I could see that my function still mirrored itself. I was not able to find out how that was happening, but if I used only a 100th of the list or so, I got rid of the mirroring.
So I have the start of a useful program that is still full of bugs. Tomorrow I hope to work out most of the bugs and do some more research into what should be added to the program next.
07 June, 2010
Daily Blog Entry 1
Today was the first day of my second week of research. Last week I devoted much of the week to background research in order to understand the material needed tackle this research project. I mainly researched the mathematical meanings and derivations of Fourier transformations slowly working my way towards understanding discrete Fourier transformations (DFTs) and fast Fourier transformations (FFTs). I discovered an FFT is the one and the same with a DFT; the only difference between the two are the way they are derived. In computing, calculating a DFT is nearly impossible as it is a function of magnitude O(N2). FFTs, in comparison, are functions of magnitude O(NlogN).
I began this week by continuing where I left off last week with my research regarding the mathematical origins of digital signal processing (DSP). I started the day by looking at Spectral Tools, a music analysis program implementing Java and Max Runtime. I was unable to install Max Runtime on the lab computer but plan to run the program on my personal computer.
I also discovered a MIT Press publication entitled Computer Music Journal which can be accessed online through the MIT Press Journal website. While searching their database, I discovered a textbook I thought would be helpful, The Computer Music Tutorial by Curtis Roads, which I subsequently requested at the library.
Through my research on this site I discovered a two-part article, An Introduction to the Mathematics of Digital Signal Processing, by F. R. Moore which was published in Computer Music Journal which can be accessed through the JSTOR archive. These articles proved to be very enlightening regarding the mathematical aspects of DSP. The first article was a useful tool to recall the main themes of algebra and trigonometry that are imperative to the understanding of DSP. This article was written as an attempt to explain the math without explaining it through calculus, making it easier to understand.
The second article delves into sampling and transforms. Tomorrow I plan on finishing this article and will (hopefully) fully understand the math behind the transforms. I also plan to start programming a FFT from scratch.
I began this week by continuing where I left off last week with my research regarding the mathematical origins of digital signal processing (DSP). I started the day by looking at Spectral Tools, a music analysis program implementing Java and Max Runtime. I was unable to install Max Runtime on the lab computer but plan to run the program on my personal computer.
I also discovered a MIT Press publication entitled Computer Music Journal which can be accessed online through the MIT Press Journal website. While searching their database, I discovered a textbook I thought would be helpful, The Computer Music Tutorial by Curtis Roads, which I subsequently requested at the library.
Through my research on this site I discovered a two-part article, An Introduction to the Mathematics of Digital Signal Processing, by F. R. Moore which was published in Computer Music Journal which can be accessed through the JSTOR archive. These articles proved to be very enlightening regarding the mathematical aspects of DSP. The first article was a useful tool to recall the main themes of algebra and trigonometry that are imperative to the understanding of DSP. This article was written as an attempt to explain the math without explaining it through calculus, making it easier to understand.
The second article delves into sampling and transforms. Tomorrow I plan on finishing this article and will (hopefully) fully understand the math behind the transforms. I also plan to start programming a FFT from scratch.
Axtell's Notes From June 7
On Friday, I ended the day with a short piece of code that I had pieced together using code from JASS (Java Audio Synthesis System) and some modifications of my own. When ran, it printed a long list of floats, but I couldn't figure out what they represented.
So that's how I started today. I did some research into what would be returned by an FFT algorithm, but didn't find anything. I eventually decided that I would graph the points and see if I could see anything useful.
I got a simple graph building class on Java Forums written by hardwired. Again I modified it for my needs.
I ran an FFT on an a440 with on partial meaning an FFT should show 2 peaks (440 and 880)
I decided that the array of floats I had were the amplitudes, so I plotted the points as the y-coords (amplitude) and spreading them evenly across the x-axis (frequency) I got the following graph:
The Hanning window method left just a straight line, but I hopefully that will be fixed tomorrow.
Other than that breakthrough, I found two amusing explanations of FFTs:
A song about FFTs (Lyrics here)
A picture book explaining FFTs applied to visuals
So that's how I started today. I did some research into what would be returned by an FFT algorithm, but didn't find anything. I eventually decided that I would graph the points and see if I could see anything useful.
I got a simple graph building class on Java Forums written by hardwired. Again I modified it for my needs.
I ran an FFT on an a440 with on partial meaning an FFT should show 2 peaks (440 and 880)
I decided that the array of floats I had were the amplitudes, so I plotted the points as the y-coords (amplitude) and spreading them evenly across the x-axis (frequency) I got the following graph:
Two peaks! I then used Tayloe's windowing methods on this same a440.wav file and got the following graphs:
The Hanning window method left just a straight line, but I hopefully that will be fixed tomorrow.
Other than that breakthrough, I found two amusing explanations of FFTs:
A song about FFTs (Lyrics here)
A picture book explaining FFTs applied to visuals
6/7/10 Daily Journal of AT
Today was mostly spent with Audacity, again. Started out finishing translating the FFT function from C++ to Java. Harder than it sounds. Also figured out exactly what each method was doing, and why. For some reason, bits have to be reversed, probably to do inverse function, and code had three ways of doing it, each faster than the last. After sorting out that out, figured out which of the two FFT shells was more useful. Since Axtell was working with simple version, focused on the logarithmic one, which was not any faster, but generated much clearer data. After all, human brains are configured logarithmically, not linearly. Didn't get it working yet.
However, did implement Hamming and Bartell windowing in Axtell's function. Made data somewhat clearer. Hopefully get Hanning to work soon, as, acording to Audacity, it provides the clearest graph and most obvious peaks.
Tomorrow: Write/find function to take data from FFT, find peaks, store them, and work on getting windows of a song to better analyze them.
However, did implement Hamming and Bartell windowing in Axtell's function. Made data somewhat clearer. Hopefully get Hanning to work soon, as, acording to Audacity, it provides the clearest graph and most obvious peaks.
Tomorrow: Write/find function to take data from FFT, find peaks, store them, and work on getting windows of a song to better analyze them.
Subscribe to:
Posts (Atom)