
30 June, 2010

6/30/10 Daily Journal of AT

Spent the morning figuring out what was going wrong with the FFT version of BeatFinder. I managed to get the analysis working on bands of frequencies, rather than just one, so that it was more accurate. However, re-reading my notes, I realized I had forgotten something critical: the imaginary component. An easy thing to miss, since our FFT discards that data entirely. According to the algorithm, the data should be squared, and in the case of FFT data, the imaginary squared is subtracted from the real squared, leading to a very large difference in the results. I'm not 100% sure this is necessary, as we use only the real data everywhere else. However, that might be causing other errors we aren't aware of.
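
As a sanity check on that point, here's a minimal sketch (class and method names are mine, not BeatFinder's) contrasting the two ways of "squaring" an FFT bin: the squared magnitude, re*re + im*im, versus the real part of the complex square, (a+bi)^2 = (a^2 - b^2) + 2abi, which is where the real-squared-minus-imaginary-squared difference comes from:

```java
public class SpectralEnergy {
    // Energy (squared magnitude) of one FFT bin: re^2 + im^2.
    public static double binEnergy(double re, double im) {
        return re * re + im * im;
    }

    // Real part of the complex square: (a+bi)^2 = (a^2 - b^2) + 2abi.
    public static double realOfComplexSquare(double re, double im) {
        return re * re - im * im;
    }
}
```

For a bin with re = 3, im = 4, the first gives 25 while the second gives -7, so the two conventions really do diverge badly, which matches the "very large difference" observed.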

Moved on to/got distracted by Axtell's work with thresholds of hearing. Basically, depending on frequency, the softest note people can hear changes. There are a few algorithms that approximate this threshold, and we worked with them today to replace PeakFinder, since they do the same job, only better. (This was the magic threshold I was searching for two weeks ago!) Between the testing and debugging, that took most of the rest of the day.
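
For reference, one standard closed-form approximation of that frequency-dependent threshold is Terhardt's formula for the absolute threshold of hearing. I can't say this is the exact algorithm we used, but it's one of the common ones:

```java
public class HearingThreshold {
    // Terhardt's approximation of the absolute threshold of hearing,
    // in dB SPL, for a frequency f in Hz (valid roughly 20 Hz - 18 kHz).
    public static double athDb(double f) {
        double khz = f / 1000.0;
        return 3.64 * Math.pow(khz, -0.8)
             - 6.5 * Math.exp(-0.6 * Math.pow(khz - 3.3, 2))
             + 1e-3 * Math.pow(khz, 4);
    }
}
```

The curve dips below 0 dB around 3-4 kHz (where the ear is most sensitive) and climbs steeply at low frequencies, which is exactly the "softest audible note changes with frequency" behavior described above.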

In the afternoon, I did some research to try to speed up BeatFinder. I started out looking at ways to optimize the file reading, as that's where most of the lag comes from. Unfortunately, it seems that what I'm doing now is about as fast as it can get, at least in terms of accessing the file. Everything I read recommended using buffers (I am) and reading only the necessary data (I do).
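
The buffered, chunked read pattern in question looks roughly like this in Java (illustrative names; the real code reads audio data, not arbitrary bytes):

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedRead {
    // Read a file in fixed-size chunks through a BufferedInputStream,
    // processing only the bytes actually returned by each read() call.
    public static long countBytes(File f, int chunkSize) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
            byte[] chunk = new byte[chunkSize];
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;  // only the first n bytes of chunk are valid
            }
        }
        return total;
    }
}
```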

Alternatively, I could forgo writing out all the FFT data (probably wise, as it's a very large file). The options then are (1) use the peaks instead or (2) get the beats as the FFT runs. The problem with the first is that I'm still not sure it's returning all audible notes, and, if it's not, that will throw off all the beats. The problem with the second is mainly one of structuring, as I'm sure it would be much faster than what I currently have (no reading or writing of files necessary!). I'll try implementing the latter tomorrow.
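
A rough sketch of how option (2) could be structured (all names, and the 1.5x-running-average beat rule, are my own placeholders, not the actual BeatFinder logic): each chunk's spectral energy is handed straight to the detector as the FFT produces it, so nothing touches the disk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class StreamingBeats {
    // Each chunk's energy is compared to the running average of the last
    // `window` chunks; a chunk well above average is marked as a beat.
    private final Deque<Double> history = new ArrayDeque<>();
    private final int window;
    public final List<Integer> beats = new ArrayList<>();

    public StreamingBeats(int window) { this.window = window; }

    // Called once per chunk, as soon as the FFT yields its energy.
    public void onEnergy(int chunkIndex, double energy) {
        double avg = history.stream().mapToDouble(d -> d).average().orElse(energy);
        if (history.size() == window && energy > 1.5 * avg) {
            beats.add(chunkIndex);
        }
        history.addLast(energy);
        if (history.size() > window) history.removeFirst();
    }
}
```

The appeal of this structure is exactly what's noted above: the detector only ever holds a small window of energies in memory, so no FFT file is written or re-read.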

23 June, 2010

6/23/10 Daily Journal of AT

Wow, Wednesday already? In a couple of days, we'll be halfway done with the internship.

Spent the morning looking up information on scales and analysis programs. The just vs. equal temperament issue of scales is interesting. Basically, in just temperament, a scale is tuned relative to one base note, and each note has a slight variation depending on what key it's in. Equal temperament sets a default pitch for each note, all in fixed ratios, which makes it sound slightly worse than any given just tuning, but better at moving between keys with no need to re-tune. Since NoteFinder works in ranges around equal temperament pitches, it should be accurate for instruments tuned that way (namely pianos), but, because it uses ranges rather than exact pitches, it should also suffice for just temperament pieces (like vocal groups).
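
The arithmetic behind the two tunings, as a small sketch (class names are illustrative, and the 5/4 just major third is one standard just ratio):

```java
public class Temperament {
    // Equal temperament: each semitone multiplies frequency by 2^(1/12).
    // By convention, MIDI note 69 = A4 = 440 Hz.
    public static double equalHz(int midiNote) {
        return 440.0 * Math.pow(2, (midiNote - 69) / 12.0);
    }

    // Just-intonation major third above a tonic: exact ratio 5/4 = 1.25.
    public static double justMajorThird(double tonicHz) {
        return tonicHz * 5.0 / 4.0;
    }

    // Equal-tempered major third: 2^(4/12), approximately 1.2599.
    public static double equalMajorThird(double tonicHz) {
        return tonicHz * Math.pow(2, 4.0 / 12.0);
    }
}
```

Starting from middle C (about 261.63 Hz), the just third lands near 327.0 Hz and the equal-tempered third near 329.6 Hz; that few-hertz gap is precisely the kind of deviation a range-based NoteFinder should absorb.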

Also read a lot about the MIR group's various projects, which were linked from the main summer page. They talk a lot about different analysis programs, and how they can find the genre or the beats or the scale, but don't say much about the actual analysis involved. I played with the online versions of the programs and downloaded a bit of the code, but so far I haven't found out much, except that they use FFTs in some fashion. I couldn't download the proper programs, because admin passwords D:

This afternoon, I realized that, when the notes were graphed, they appeared sequentially, with no regard to which sample they came from. Axtell and I managed to packet the frequencies by sample, so we've gotten rid of the slanty lines and replaced them with straight ones. I also did a comparative analysis on a song, showing the difference between no window (the upper graph) and a Hanning window (the lower). It's pretty clear that the windowing gets rid of noise, but also a lot of quieter data. (For reference, the song was "My Freeze Ray" from Dr. Horrible's Sing-Along Blog.)
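
For reference, the Hanning (Hann) window applied to each chunk is just a raised-cosine taper; a minimal version:

```java
public class HannWindow {
    // Hann (Hanning) window: w[n] = 0.5 * (1 - cos(2*pi*n/(N-1))).
    // Tapering each chunk to zero at its edges reduces spectral leakage
    // (the "noise"), at the cost of attenuating content near the edges,
    // which is why quieter data disappears along with the noise.
    public static double[] apply(double[] samples) {
        int n = samples.length;  // assumes n >= 2
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            double w = 0.5 * (1 - Math.cos(2 * Math.PI * i / (n - 1)));
            out[i] = samples[i] * w;
        }
        return out;
    }
}
```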


16 June, 2010

6/16/10 Daily Journal of AT

Two good days in a row! Something must be about to go wrong, because stuff works!

This morning, I was inspired. I finally figured out what sort of PeakFinder to use! The secret? More rigorous peak identification and standard deviations. To elaborate, the function, as it is now, first finds candidate peaks by comparing each point to its neighbors, fifteen to the left and right, and only saves the ones larger than all of those surrounding points. (This may be changed to a smaller number when windowing is used, in order to save smaller peaks and peaks close to each other.) Then a standard deviation is taken over the remaining peaks, and all below the 64th percentile are discarded (the noisy peaks). It works super well!
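
The two-stage rule described above can be sketched like so (illustrative code, not the actual PeakFinder; for simplicity this sketch computes the percentile cutoff directly on the sorted peak heights rather than via the standard deviation):

```java
import java.util.ArrayList;
import java.util.List;

public class PeakFinder {
    // Stage 1: keep indices strictly larger than every point within
    // `radius` on each side. Stage 2: of those, discard peaks whose
    // height falls below the given percentile of all candidate heights.
    public static List<Integer> findPeaks(double[] data, int radius, double percentile) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            boolean isPeak = true;
            int lo = Math.max(0, i - radius);
            int hi = Math.min(data.length - 1, i + radius);
            for (int j = lo; j <= hi; j++) {
                if (j != i && data[j] >= data[i]) { isPeak = false; break; }
            }
            if (isPeak) candidates.add(i);
        }
        if (candidates.isEmpty()) return candidates;
        double[] heights = candidates.stream().mapToDouble(i -> data[i]).sorted().toArray();
        int cutIdx = (int) Math.min(heights.length - 1,
                                    Math.floor(percentile / 100.0 * heights.length));
        double cutoff = heights[cutIdx];
        List<Integer> peaks = new ArrayList<>();
        for (int i : candidates) if (data[i] >= cutoff) peaks.add(i);
        return peaks;
    }
}
```

With radius 15 and percentile 64, this matches the description above; the usage in the test below uses a tiny radius just to keep the data small.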

I had to readjust the note finder, as the C notes generated were a step lower than they should have been. I also wrote in a bit that tells when a peak is inaudible (as in, lower than the 0th octave or higher than the 9th octave).
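
The inaudible-peak check can be written as follows (a sketch: the exact Hz meaning of "0th octave" and "9th octave" is my assumption, using scientific pitch notation where C0 is about 16.35 Hz):

```java
public class NoteRange {
    // C0 in scientific pitch notation: 57 semitones below A4 = 440 Hz.
    static final double C0_HZ = 440.0 * Math.pow(2, -57.0 / 12.0);  // ~16.35 Hz

    // Octave number of a frequency: how many doublings it sits above C0.
    public static int octaveOf(double hz) {
        return (int) Math.floor(Math.log(hz / C0_HZ) / Math.log(2));
    }

    // "Audible" per the rule above: octave 0 through octave 9 inclusive.
    public static boolean inRange(double hz) {
        int oct = octaveOf(hz);
        return oct >= 0 && oct <= 9;
    }
}
```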

This afternoon, I worked on cleaning and commenting my code, and looked at Axtell's shiny new GUI. We discussed how we would implement our two programs, and figured out a new GUI that would take a file, split it, and display either hertz peaks over time or a list of the notes present in the piece. I've managed to re-do Wave Splitter and Peak Finder to work with it, although I may have to overhaul Note Finder more tomorrow. Anyway, it was a surprisingly productive day, and we should hopefully have a working GUI by Friday morning.

Also: Everything's better nested, like grids, loops, and baby birds.

To Do: Comment, clean, and get everyone to play nice with each other

09 June, 2010

6/9/10 Daily Journal of AT

Today, fortunately, was much more productive than yesterday. In the morning, I wrote and tested WaveSplitter, a program that splits one wave file into many smaller ones, then runs an FFT on them. (Note: the code used to run the FFT and draw graphs was from yesterday; I still have to implement the changes my partners made to it.) After doing research on the AudioInputStream class, I managed to get my code to read the buffered file and write it out again in chunks (rather than using nested loops, as I had originally planned). Each wave file can be successfully FFT'ed. The only problem I discovered was that, if one tries to split the wave file into larger chunks (the working default is one second), it still only gives one second to be analyzed; the chunks are just spaced out more, with fewer mini-files. I'm not quite sure how to fix it, so I'm waiting until Friday. There was another problem, but I can't think of it right now.
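
Stripped of the audio file I/O, the chunking itself reduces to frame-aligned slicing (a sketch with illustrative names; the real WaveSplitter goes through AudioInputStream and writes actual wave files):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WaveSplitterSketch {
    // Split raw audio bytes into fixed-duration chunks, aligned to whole
    // frames so no sample is ever cut in half. frameSize is bytes per
    // frame; frameRate is frames per second.
    public static List<byte[]> split(byte[] audio, int frameSize,
                                     float frameRate, double chunkSeconds) {
        int framesPerChunk = (int) (frameRate * chunkSeconds);
        int bytesPerChunk = framesPerChunk * frameSize;
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < audio.length; off += bytesPerChunk) {
            int len = Math.min(bytesPerChunk, audio.length - off);
            chunks.add(Arrays.copyOfRange(audio, off, off + len));
        }
        return chunks;
    }
}
```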

This afternoon, I worked on a function to get the peaks from the generated FFT data. This involved a bit of trial and error with reading the files in (and the hope that the FFT file generation hasn't changed much today). I wrote a simple loop that goes through the data and gets rid of the smallest half (because those values are obviously not large enough peaks to bother with). Running this "cleaner function" multiple times gives the peaks of the data. Comparing the graphs to the numbers, it appears fairly accurate. The only trick will be to translate the location of each peak to a frequency, in order to make this useful for analysis, and to write and append this data to a single file, then use that file to generate a graph for the song as a whole. That should be more accurate than taking an FFT of the entire song, where small peaks would be lost in the noise; this way, the smaller peaks are preserved.
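
Both pieces mentioned above are short in code form (a sketch; my interpretation of "gets rid of the smallest half" is zeroing everything below the median of the remaining nonzero values):

```java
import java.util.Arrays;

public class FftPeaks {
    // Translating a peak's bin index to a frequency: each of the fftSize
    // bins spans sampleRate / fftSize Hz.
    public static double binToHz(int bin, double sampleRate, int fftSize) {
        return bin * sampleRate / fftSize;
    }

    // One "cleaner" pass: zero out the smallest half of the remaining
    // nonzero magnitudes. Repeated passes leave only the tallest peaks.
    public static void cleanOnce(double[] mag) {
        double[] nonzero = Arrays.stream(mag).filter(v -> v > 0).sorted().toArray();
        if (nonzero.length == 0) return;
        double median = nonzero[nonzero.length / 2];
        for (int i = 0; i < mag.length; i++) {
            if (mag[i] < median) mag[i] = 0;
        }
    }
}
```

For example, at a 44100 Hz sample rate with a 1024-point FFT, bin 10 corresponds to about 430.66 Hz.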

Also, as a special tangential issue, I researched some information about logarithmic thinking. I first heard about this on a podcast, RadioLab, done through NPR. They did a special on numbers, and their first story was about how babies and toddlers "think" of numbers.

Found at RadioLab's website.
Basically, there was a study done on babies two to three months old. They were hooked up to an electrode net and shown a series of eight identical pictures, in one case, ducks. The researchers measured their reactions when the pictures changed to, say, eight trucks. Then, they changed from eight ducks to sixteen ducks. The babies' brains still showed a change, but in a different area. The study went on to show that babies can tell the difference between large differences in numbers, like ten and twenty, but not smaller ones, like nine and ten.

The reason for this is that, initially, our brains think of numbers logarithmically. The show goes on to tell about studies done among non-counting people living in the Amazon basin, who, when asked what is between one and nine, reply "Three." To us, it makes no sense, as the number directly between one and nine is five (four more than one and four less than nine). But the way they think of it is, one times three is three. Three times three is nine. Therefore, three is exactly between one and nine.
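
The "logarithmic midpoint" behind that answer is just the geometric mean:

```java
public class GeometricMidpoint {
    // The logarithmic midpoint between a and b is sqrt(a * b): the same
    // *ratio* on both sides, rather than the same difference. So between
    // 1 and 9 it is 3 (1 * 3 = 3, and 3 * 3 = 9).
    public static double geometricMean(double a, double b) {
        return Math.sqrt(a * b);
    }
}
```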

This way of thinking is not unique to infants and non-counters, either. Say, for example, there's a sale on a pair of pants. Normally, the pants cost twenty dollars, but today, they cost ten. That's a really great deal, isn't it? Now say that you find a jacket that's also on sale. It used to be a hundred dollars, but now it's only ninety. Is it as good a deal?

Thinking linearly about it, it seems both these deals are the same. However, half off is far better than ten percent off, and our gut reaction is to say that the pants are a better deal than the jacket. As a third example, if you were thinking of buying a ten thousand dollar car, the salesman knocking ten dollars off the price will probably not sway you much.

This relates to our project in that steps on the western scale are arranged logarithmically. For example, going from middle C to high C doubles the frequency (and halves the wavelength). This has something to do with the waves that make up sounds, as waves that have peaks at similar times will reinforce each other, creating a stronger sound. If wave 1 has half the frequency of wave 2, they will reinforce each other every other wave, whereas if wave A has a wavelength a hundred less than wave B's, there's no telling when they will reinforce each other. Our brains prefer the reinforced waves, which in turn leads to our appreciation for things "in key." I'm fairly sure it's related to our thinking logarithmically, or perhaps we think logarithmically and like chords due to some other facet of human nature. Regardless, it's pretty interesting.
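
The doubling and the reinforcement pattern can be checked numerically (a sketch; the p:q realignment rule below is the standard small-integer-ratio argument, with class names of my own choosing):

```java
public class OctaveRatios {
    // An octave up doubles the frequency: middle C (~261.63 Hz) to ~523.26 Hz.
    public static double octaveUp(double hz) {
        return 2.0 * hz;
    }

    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    // For two waves with frequencies in the integer ratio p:q (p >= q),
    // the combined waveform realigns after p / gcd(p, q) cycles of the
    // faster wave. A 2:1 octave realigns "every other wave", as above.
    public static int fasterCyclesPerRealign(int p, int q) {
        return p / gcd(p, q);
    }
}
```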