Library Spectrum Analysis Statistics (matlab, c++, database)
Jan 26, 2013 at 6:23 PM Thread Starter Post #1 of 22

shrimants

I am interested in conducting an experiment. Let me know what your thoughts are on how I can accomplish the following.

I would like to iterate through my entire music library, one song at a time. I read the song's basic information (artist, album, track number) and store it in a database. Then, I go through the song 1/100th of a second or 1/10th of a second or so at a time. Essentially, I take the song one short time slice at a time and create a spectrum analysis of each slice. This yields the amplitude at each frequency for every slice. I store this information in the same database along with artist/album/track number.

Once this is all done, I want to be able to analyze the information. Which songs contain the lowest frequencies, and what are those frequencies? Which songs are my "bassiest"? Which songs have the highest pitches in them? And a whole bunch of other measurements.
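A minimal sketch of how such a database and a "bassiest songs" report might look, using Python's built-in sqlite3 module. The table names, column names, the 120 Hz cutoff and the "share of total power" definition of bassiness are all assumptions made for illustration, not something specified in this thread.

```python
import sqlite3

def create_schema(conn):
    """Create two hypothetical tables: one row per track, one row per (track, window, band)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS tracks (
            id INTEGER PRIMARY KEY,
            artist TEXT, album TEXT, track_number INTEGER, title TEXT
        );
        CREATE TABLE IF NOT EXISTS band_power (
            track_id INTEGER REFERENCES tracks(id),
            window_index INTEGER,
            band_low_hz REAL,
            band_high_hz REAL,
            power REAL
        );
    """)

def bassiest_tracks(conn, cutoff_hz=120.0, limit=10):
    """Rank tracks by the share of their total power in bands at or below cutoff_hz."""
    return conn.execute("""
        SELECT t.artist, t.title,
               SUM(CASE WHEN b.band_high_hz <= ? THEN b.power ELSE 0 END)
                   / SUM(b.power) AS bass_share
        FROM tracks t JOIN band_power b ON b.track_id = t.id
        GROUP BY t.id
        ORDER BY bass_share DESC
        LIMIT ?
    """, (cutoff_hz, limit)).fetchall()
```

Storing per-band power rather than every raw FFT bin keeps the table small. Other questions (highest pitch, lowest frequency present) would be similar GROUP BY queries over the same table.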



For methodology, the first step is to create the database format. Next would be to figure out how to read flac and mp3 files. Next would be to figure out how to recurse through directories. Next, I'd need to figure out how to work FFT algorithms, possibly in a multithreaded environment so this doesn't take days. Finally, I'd need to write or somehow come up with search/sort/statistical algorithms that can parse through the database one song at a time and generate reports.
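The directory-recursion step is probably the easiest of these. A rough sketch in Python, using only the standard library; the function name and the extension list are assumptions for illustration, and decoding the flac/mp3 audio itself would need a separate library that is not shown here.

```python
import os

# Hypothetical set of extensions to pick up; extend as needed.
AUDIO_EXTENSIONS = {".flac", ".mp3"}

def find_audio_files(root):
    """Yield paths of .flac/.mp3 files under root, walking subdirectories in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes os.walk visit subdirectories alphabetically
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in AUDIO_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Each yielded path could then be handed to whatever reads the tags and runs the spectrum analysis.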

If some software to do this already exists, let me know. Otherwise, I was thinking something like Matlab and C++. I can only code in C++, though Matlab isn't too difficult to re-teach myself. I had a brief introduction to it back in 2007 so it's been quite a while, but I think it has some pretty simplified tools for dealing with this sort of analysis and statistics.
 
Jan 26, 2013 at 11:01 PM Post #2 of 22
I think music transcription, analysis, genre identification and "musical scene analysis" are active AI research areas with lots of literature - even "live" robotic accompaniment has been done
 
Jan 27, 2013 at 9:50 PM Post #3 of 22
That's not really what I'm trying to do. You know those 32 band equalizer visualizers? The bars that move up and down depending on what frequencies are happening in a song at that particular moment? I want to take a song, analyze it moment by moment, and generate the frequency spectrum for it, and continuously save that data. Audacity can do spectrum analysis of an input file, i.e. for every moment in the song, it can tell you what frequencies are currently present. I want to export that spectral analysis data into a database and then parse the database with statistical algorithms. Active AI is not what I'm after. Just simple data scraping. I can obviously do it manually, but I figured there should be some way to automate the process. Audacity doesn't have any command line inputs so I can't script it, so the next best thing would be to code it myself.
 
Jan 27, 2013 at 10:46 PM Post #4 of 22
Quote:
You know those 32 band equalizer visualizers? the bars that move up and down depending on what frequencies are happening in a song at that particular moment? I want to take a song, analyze it moment by moment, and generate the frequency spectrum for it, and continuously save that data. 

 
Here is my free, deeply thought and labored code for how you could do that in matlab: 
[p]= psd(x).  
 
Where p is the power spectral density of signal x.  
 

 
It is very easy to read and write files in matlab, though I've just done it with excel, video & electrophysiology files.  
 
You can read mp3 files with this free function.
Here is a function that will read id3 tags.
 
Write a routine that will take 1-2 second samples of the songs and feed those snippets to psd.  Then store averages of spectral power over whatever frequency ranges you're interested in.  Matlab is a matrix language, so it will work well across many cores.
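For anyone without Matlab, the routine described above can be roughly sketched in plain Python. This is not what psd does internally: it is a simple averaged periodogram with a Hann window and no overlap, the power is unnormalized (so only relative values are meaningful), and the function names, window size and band edges are all assumptions for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def band_powers(samples, sample_rate, window_size, bands):
    """Average unnormalized power per (low_hz, high_hz) band over non-overlapping windows."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window_size - 1))
            for i in range(window_size)]
    totals = [0.0] * len(bands)
    n_windows = 0
    for start in range(0, len(samples) - window_size + 1, window_size):
        frame = [s * w for s, w in zip(samples[start:start + window_size], hann)]
        spectrum = fft(frame)
        for i, (lo, hi) in enumerate(bands):
            for k in range(window_size // 2 + 1):
                freq = k * sample_rate / window_size  # centre frequency of bin k
                if lo <= freq < hi:
                    totals[i] += abs(spectrum[k]) ** 2
        n_windows += 1
    # With no complete window there is nothing to average; return the zeros.
    return [t / n_windows for t in totals] if n_windows else totals
```

The list returned for each song (one number per band) is the kind of summary that could be stored in the database instead of the full spectrum.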
 
Jan 29, 2013 at 12:28 PM Post #5 of 22
I think what i really need to do is a bunch of reading on this. It makes sense in my head on a basic theory level but when I get into the mathematics level of things, it stops making sense to me. Is psd(x) a built in function in matlab?
 
Jan 29, 2013 at 6:11 PM Post #6 of 22
Quote:
 Is psd(x) a built in function in matlab?

 
Yes it is, but you'll need the signal processing toolbox.  
 
Quote:
I think what i really need to do is a bunch of reading on this. It makes sense in my head on a basic theory level but when I get into the mathematics level of things, it stops making sense to me.

 
Why bother.  
  Matlab is so easy, you can just read up on the different estimation methods (Welch, eigenvector, Yule-Walker, Burg, etc.), pick the one you like and let Matlab do the math for you.  psd uses Welch's method by default.
 
Feb 12, 2013 at 1:11 AM Post #7 of 22
Coding what you want to do in C++ would likely be long and arduous.
 
Why not just get Matlab to output the answers to a txt file and then see how you can code an OO-based C++ program that loads the data as arrays into an SQL database using INSERT statements?
 
Otherwise, if you don't need to get technical, Excel is a perfect database.
 
It sounds like you want something that's part of agile development, so you can easily whack together different apps and get a very simplified C++ program to batch add to SQL.
As far as error handling is concerned, that's quite lengthy in C++. I am not sure if your intention is to overwrite data, but the SQL inserts would simply error and continue if the same data tried to go back into the same table again.
 
 
good luck.
 
Feb 12, 2013 at 11:55 AM Post #8 of 22
This is basically what I'm going to do. Matlab to text file, C++ for text processing, and either Matlab or some other data visualization tool for looking at the data. I still need to do reading to figure out exactly what functions exist, what they do, and which ones I can use to get the information I need.
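As a rough illustration of the text-processing stage, here is a Python sketch that reads rows of a hypothetical "track,band_hz,power" CSV layout (an assumed format; the Matlab output could be arranged however is convenient) and averages the power per track and band. The same logic would carry over to C++ directly.

```python
import csv
from collections import defaultdict

def mean_power_by_track(lines):
    """Average power per (track, band_hz) from CSV rows 'track,band_hz,power' (assumed layout)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for track, band_hz, power in csv.reader(lines):
        key = (track, float(band_hz))
        sums[key] += float(power)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Passing an open file object as `lines` works, since csv.reader accepts any iterable of strings.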
 
Feb 12, 2013 at 2:07 PM Post #9 of 22
Quote:
shrimants said:
  Next, i'd need to figure out how to work FFT algorithms, possibly in a multithreaded environment so this doesnt take days. 

 
 
Multi-threading will only reduce the runtime if you have extra cpus available. Ok; you can take advantage of extra cores on the same cpu - but you could do that anyway by running extra instances of the program and telling each instance to work on a different set of files. 
 
Feb 13, 2013 at 4:19 AM Post #10 of 22
Quote:
 
 
Multi-threading will only reduce the runtime if you have extra cpus available. Ok; you can take advantage of extra cores on the same cpu - but you could do that anyway by running extra instances of the program and telling each instance to work on a different set of files. 


Both statements are false. From the software's point of view, multiple cpus or multiple cores makes no difference at all. A core is a cpu.
And multithreading can reduce runtime even if you have only one cpu. Basic example: if your task is getting input from a file followed by processing it, running two of these tasks in parallel, one on each thread, will normally be faster: while one task is reading input from file the cpu is pretty much idle since disk I/O is not the cpu's business. Hence that free cpu time can be used by the other task in the other thread to do processing. 
 
 
On-topic: I don't know your level of experience with C++, but text processing is way easier in scripting languages like Perl/Python/Ruby/... or even higher level languages like C#. On the other hand, if you know your C++, you don't have to go matlab -> text -> C++ but you can put your C++ functionality in mex files so you can call it directly from matlab, no intermediate text file and (error-prone) parsing. Or you can go the other way around and compile pieces of matlab code to C style dlls, which you can then call from pretty much any other language out there. 
 
Feb 13, 2013 at 5:20 AM Post #11 of 22
Quote:
Both statements are false. From the software's point of view, multiple cpus or multiple cores makes no difference at all. A core is a cpu.

 
I didn't say otherwise. Learn to read. Although I'd add that speaking of "software's pov" is inherently silly; from a PERFORMANCE pov cores and multiple cpus can vary quite a lot when you start considering caching and data fetches - but this gets very architecture specific, and the api (which is what wannabes who have never written this sort of application probably confuse with "the software's pov") will stay the same.
 
 
 
And multithreading can reduce runtime even if you have only one cpu. Basic example: if your task is getting input from a file followed by processing it, running two of these tasks in parallel, one on each thread, will normally be faster: while one task is reading input from file the cpu is pretty much idle since disk I/O is not the cpu's business. Hence that free cpu time can be used by the other task in the other thread to do processing. 

 
Again, learn to read. I referred to this specific problem - computing FFTs is NOT io bound! In fact it's about as good an example as you can get of a task that is not.
 
So again, in this particular application, taking time to write a multithreaded version would be stupid - it's much easier to launch multiple instances of the program and give each a different group of files to process.
 
Feb 13, 2013 at 5:35 AM Post #12 of 22
Quote:
 
On-topic: I don't know your level of experience with C++, but text processing is way easier in scripting languages like Perl/Python/Ruby/... 

 
Thank you, Mr Script Kiddie, but no.
 
He's talking about reading in a file of NUMBERS and performing some unspecified operation. Not making Markov chains out of successive nouns! It's nice that you know which languages are usually used to bash text files around, but it's probably not relevant here - reading a csv of numbers is trivial in C++ (if you can actually program it - and it is a much harder language than Python et al) and the languages you mention have about 1/100 the numerical performance of C++. He won't be introducing an intermediate stage between Matlab operations unless he needs to do some pretty hefty number crunching... although I thought that with the addition of JIT etc. to Matlab this sort of thing was getting obsolete. 
 
Feb 13, 2013 at 8:54 AM Post #13 of 22
Look, if you make general statements like "Multi-threading will only reduce the runtime if you have extra cpus available" and I correct you on that (ok maybe a tad too fast) you could just stay nice and friendly and say you meant this in the light of this topic - or even more narrow, only to calculating FFT. Instead you start namecalling, which is not very thoughtful in general nor in this case particularly since you do not know me at all. Well, thanks to you the joke of the day amongst my fellow programmer colleagues is that I now get addressed with 'The Wannabe Script Kiddie'. Good laughs, but silly since I'm writing a compiler that runs on a multithreaded hard realtime system. No joke.
 
I have a feeling discussion with you might become hard but I'll try it once.
 
 
Quote:
So again, in this particular application, taking time to write a multithreaded version would be stupid - it's much easier to launch multiple instances of the program and give each a different group of files to process.

 
Depends on what you consider easy. In the first case you take a list of files, split it in say 4, then launch 4 threads. In the second case you take a list of files, split it in say 4, then launch 4 processes. In C++ I consider the first option easier since you don't have to bother with platform specific C style code for launching a process but can just use the standard threading features. But I guess it's just a matter of preference.
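To make the "split the list in 4, launch 4 workers" idea concrete, here is a small sketch in Python rather than C++, purely for brevity. The function names are made up for illustration and `analyze` stands in for whatever per-file work is done. One caveat that is specific to CPython: threads do not speed up CPU-bound pure-Python code because of the global interpreter lock, so for FFT-heavy work `ProcessPoolExecutor` (same interface, but it needs a picklable worker function instead of the lambda used here) would be the usual swap.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(items, n_chunks):
    """Deal items round-robin into n_chunks lists."""
    return [items[i::n_chunks] for i in range(n_chunks)]

def process_chunk(paths, analyze):
    """Run analyze on every path in one chunk, keeping the path with its result."""
    return [(p, analyze(p)) for p in paths]

def process_all(paths, analyze, n_workers=4):
    """Split paths into n_workers chunks and process each chunk on its own thread."""
    chunks = split_into_chunks(list(paths), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda chunk: process_chunk(chunk, analyze), chunks)
    # The pool has finished all work by the time the with-block exits.
    return dict(pair for chunk_result in results for pair in chunk_result)
```

Either way the splitting logic is identical; only the thing being launched per chunk changes.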
 
 
Quote:
and it is a much harder language than Python et al

 
I'm glad we agree on a point
 
 
Quote:
The languages you mention have about 1/100 the numerical performance of C++

 
Bold and way too general statement. You could have a point for Ruby and Perl (could - hard to tell without actually measuring this specific case) But in C# for instance a multiplication of two double values yields the exact same assembly code as for C++. While this might not be representative for the OP's case, it does illustrate pretty well that this case is not 100 times slower, which backs up my statement that yours is way too general - or false - depending on how you look at it. Or take Python, in which one would use NumPy for numerical stuff that needs speed, which under the hood just calls into the C lapack libs. Good luck finding something much faster than that.
 
Feb 13, 2013 at 9:27 AM Post #14 of 22
Quote:
Look, if you make general statements like "Multi-threading will only reduce the runtime if you have extra cpus available" and I correct you on that (ok maybe a tad too fast) you could just stay nice and friendly
 

 
It wasn't a general statement, it was a statement about this particular problem. Which is why it followed a carefully selected quote from another post, to show it was a comment on that particular statement!
 

Bold and way too general statement. You could have a point for Ruby and Perl (could - hard to tell without actually measuring this specific case) But in C#

 
 
Again, learn to read. And don't try to win arguments by deliberately misquoting people. I made it clear that I was referring to what you had said about "Perl/Python/Ruby/" by actually naming those languages! If I wanted to make a statement about C# then I would have written "Regarding C#..."!
 
 
 
Or take Python, in which one would use NumPy for numerical stuff that needs speed, which under the hood just calls into the C lapack libs. Good luck finding something much faster than that.
 

 
This is factually correct, but hopelessly incompetent. Read what the guy said again - he has Matlab available! If he can't do the unspecified job in the part of the chain where he wants to leave Matlab for C++, then saying "Use NumPy instead!"  is just silly. 
 
And it might be hard for you to tell how suitable Ruby is for writing numerics in, but for a competent programmer, no. (Regarding which, judging whether C# will run numerical code as fast as C++ by looking at the assembly for a single multiplication operation is face-palm helmet territory..)
 
Feb 13, 2013 at 6:45 PM Post #15 of 22
Not to intrude on your guys' discussion, but I was initially going to do C++ because I'm comfortable with it. that is, until I saw the FFT libraries and realized that a) i dont know howTF to use their libraries and b) I dont know diddly scoot about FFT in the first place. Then I went to matlab, which is a lot more scripty and easy to program in. Matlab has built in functions already and due to the large amount of data i would be processing, time is not REALLY an important constraint. it will take a long time and thats just fine as long as the result is accurate. I do know about C++ and mex, though that was a LONG time ago, as was matlab programming.

My real obstacle now is to figure out exactly what kind of FFT I need, what the function is, what kind of data I need to create from this function, and what the function's inputs and more importantly outputs even mean. In my head it all is easy enough but I don't know anything about the math except "FFT lets you take a window of time and get the frequency/amplitude of said frequencies in that window".
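For what the outputs mean, the key fact is that for a window of N samples taken at sample rate fs, output bin k corresponds to frequency k * fs / N, and the magnitude of that bin says how strongly that frequency is present in the window. A small Python sketch of this, using a deliberately slow textbook DFT instead of a real FFT library so that nothing is hidden; the function names are made up for illustration.

```python
import cmath
import math

def dft_magnitudes(window):
    """Naive O(N^2) DFT; returns magnitudes for bins 0..N/2 (the non-negative frequencies)."""
    n = len(window)
    return [abs(sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def dominant_frequency(window, sample_rate):
    """Frequency in Hz of the strongest non-DC bin in this window."""
    mags = dft_magnitudes(window)
    k = max(range(1, len(mags)), key=mags.__getitem__)  # skip bin 0 (DC offset)
    return k * sample_rate / len(window)

# Usage: a 50 Hz sine sampled at 1000 Hz, 100-sample window, so bins are 10 Hz apart.
tone = [math.sin(2 * math.pi * 50 * t / 1000) for t in range(100)]
peak_hz = dominant_frequency(tone, 1000)  # bin 5 is strongest: 5 * 1000 / 100 Hz
```

A longer window gives finer frequency resolution (bins closer together) at the cost of coarser time resolution, which is the trade-off behind choosing 1/100th versus 1/10th of a second.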

I think matlab's DSP toolbox has everything I need including flac/mp3 input file capability. I am 99% sure that the power spectral density function is what needs to be used to generate the data. I have no idea what that means though.

My only other reason for liking C++ is that it does math better than matlab does. But like I said, speed is not really a constraint for me.
 
