04 June, 2010

Weekly Blog Internship Update #1

Since this was the first week of our internship, much of our time was spent learning the basics of what we would be doing and reviewing how to do it. Tuesday was our first semi-official day, where we went over the grant proposal and learned the overarching idea behind this internship: analyzing music and sound files, then creating new, similar music from the data generated.
The first analysis method we're using is the Fourier transform, which takes wave data, finds the frequencies of its peaks, and puts them into graph form. The hope is that, once the frequencies in a song are known, they can be analyzed to figure out which instruments are playing which notes, and how often, in order to aid in further music generation. Our first official day, Wednesday, was spent learning about Fourier transforms, the related mathematical formulas, and how to perform the computation faster using FFTs (fast Fourier transforms).
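To make the idea concrete, here's a minimal sketch in Java of the direct (naive) discrete Fourier transform we studied. The class and method names are our own placeholders, not from any library; the point is just to show what the math does. The double loop below is O(n^2), which is exactly the cost an FFT cuts down to O(n log n).

```java
// A minimal sketch of the discrete Fourier transform, assuming
// real-valued input samples. Placeholder names, not library code.
public class NaiveDft {
    // Returns the magnitude of each frequency bin for the input samples.
    public static double[] magnitudes(double[] samples) {
        int n = samples.length;
        double[] mags = new double[n];
        for (int k = 0; k < n; k++) {          // each output frequency bin
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {      // sum over every input sample
                double angle = 2.0 * Math.PI * k * t / n;
                re += samples[t] * Math.cos(angle);
                im -= samples[t] * Math.sin(angle);
            }
            mags[k] = Math.sqrt(re * re + im * im);
        }
        return mags;
    }
}
```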

We also started looking at several existing programs that use FFTs to analyze music. The first was Sonogram, an open-source "visual speech" program that displays sound files as colorful graphs, which we worked with on Wednesday and Thursday. We had high hopes that, since its code is in Java, we could understand how it works and possibly use it in our own project. However, the language barrier (Sonogram was programmed in German) and the number of unnecessary nested calls and transformations made the program very difficult to understand and work with, and we ended up abandoning it.

We moved on to looking at three other programs with similar functionality: JASS, Audacity, and Jigl. JASS (Java Audio Synthesis System) is used for audio manipulation, primarily as a resource for researchers. Audacity is primarily a sound mixer and amateur recording program. Jigl (Java Image and Graphic Library) is a set of functions designed for students studying digital image and audio processing. All three are open source. Our hope was that, through these three programs, we could create and test a simple FFT program of our own.

We came at the problem from three directions: translating the audio files (in this case, WAVE files) into bytes, processing the bytes in the FFT function, and putting the resulting data into a useful form. Our initial foray into audio-to-byte functions led to some problems: the bytes, when put back into audio form, produced a similar but distorted audio file. Comparing Fourier graphs of the two showed that directly copying the raw data would not give an accurate reading. Instead, we started working with the pre-built Java class AudioInputStream, which gave us properly grouped bytes rather than the raw data.
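For anyone curious, here's roughly what the AudioInputStream approach looks like. This is only a sketch assuming 16-bit little-endian mono PCM data; the class name is a placeholder, and real code would need to check the format and handle stereo. Pairing the two bytes of each sample correctly is exactly what our naive byte copy got wrong.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WaveReader {
    // Reads a WAVE file and converts its bytes into sample values.
    // Sketch only: assumes 16-bit little-endian mono PCM.
    public static double[] readSamples(File file) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(file);
        AudioFormat fmt = in.getFormat();
        byte[] bytes = new byte[(int) in.getFrameLength() * fmt.getFrameSize()];
        int off = 0;
        while (off < bytes.length) {           // read may not fill in one call
            int got = in.read(bytes, off, bytes.length - off);
            if (got < 0) break;
            off += got;
        }
        in.close();
        // Each 16-bit sample spans two bytes: low byte first, then the
        // sign-carrying high byte.
        double[] samples = new double[bytes.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int lo = bytes[2 * i] & 0xFF;
            int hi = bytes[2 * i + 1];
            samples[i] = ((hi << 8) | lo) / 32768.0;  // scale to [-1, 1)
        }
        return samples;
    }
}
```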

As for analyzing the data itself: once we understood the transform, implementing it was fairly easy. However, we could not test it until we had the audio in byte form, and even then we had no way to tell whether the numbers we got were correct or useful. For now, we compared our output against Audacity's built-in FFT analysis, but the results were vastly different. We're working on figuring out how to better test our FFT function, and how to begin storing and displaying the data.
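One testing idea we're considering: feed the transform a pure sine wave at a known frequency and check that the peak lands in the right bin. The sketch below reuses the hypothetical NaiveDft class from above; a real FFT would slot into the same check.

```java
public class FftSanityCheck {
    public static void main(String[] args) {
        int n = 1024;
        double sampleRate = 8000.0;
        double freq = 440.0;                   // known test tone
        double[] samples = new double[n];
        for (int t = 0; t < n; t++) {
            samples[t] = Math.sin(2.0 * Math.PI * freq * t / sampleRate);
        }
        double[] mags = NaiveDft.magnitudes(samples);
        // Find the loudest bin in the first half of the spectrum
        // (the second half mirrors it for real input).
        int peak = 0;
        for (int k = 1; k < n / 2; k++) {
            if (mags[k] > mags[peak]) peak = k;
        }
        // Each bin spans sampleRate / n Hz, so the peak should land near 440.
        System.out.println("Peak at " + (peak * sampleRate / n) + " Hz");
    }
}
```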

Our goals for next week are to create a working function that takes an audio file and produces its Fourier transform, and to create a method for checking that our data is correct. We also hope to better understand and utilize the various error-catching functions in the code we have read so far, and to get better at reading C++ code.