Solved

question about music genre classification

Posted on 2004-03-20
15
672 Views
Last Modified: 2008-08-22
Hi,

I need some help developing C++ source code that takes an mp3 and converts it to a wav file. The wav file will be converted to a series of numbers and then compared to existing files, so that the mp3 can be classified into a specific genre depending on its similarity to those files. The code needs to use a neural network. There are a couple of steps I need help with:

1. mp3 conversion to wav file (I am guessing it has to be 44 kHz)
2. take this conversion and use it with a fast Fourier transform to compare with text files classified as genres
3. the classification loop I know how to do; it is a set of if/else statements and a do-while loop, and then based on the closest percentage I can classify it.

I would really appreciate your help in this.
0
Comment
Question by:ollero
15 Comments
 
LVL 30

Expert Comment

by:Axter
Exactly what is your question?
0
 

Author Comment

by:ollero
My question is: how do I do this? Has anyone done anything similar to this? I need code, because I am sincerely stuck in this project.

thanks a million
0
 
LVL 1

Expert Comment

by:Lescha
You can start by emailing a few people who have written programs for converting MP3 to WAV and asking them for their code or for some tips. Maybe they won't let you have the code, but they might give you some sort of DLL instead.
0
 
LVL 1

Expert Comment

by:lap_dog_shuffle
Hi there, I would suggest joining the music-dsp list (http://shoko.calarts.edu/~glmrboy/musicdsp/music-dsp.html); there are many experienced audio developers using it.

Also, 'Portaudio' (http://www.portaudio.com/) is a free multiplatform API for audio; the guys on the mailing list could definitely help you out.

I have been using CPS (http://www.bonneville.nl/cps/) for a number of audio projects recently. It costs $150 and comes with a graphical patch editor and full SDKs for developing in C++. Once you have created a plugin (CPS includes ready-built functions for reading wavetables, and conversion from mp3 is done automatically), build your patch graphically and export the code to be used in your own application. I'm not sure if this is suitable for your project, but it may be good for prototyping.

Good luck. I can imagine that you will need to perform some very complex analysis on your audio before you feed it to your neural network. I know there are people at Queen's University Belfast (http://www.sarc.qub.ac.uk/) doing work in very similar areas, although I think the genres they are working with aren't traditional (jazz, dance, rock) but relate directly to the spectral and timing qualities of any recording.

/Steven

0
 
LVL 1

Expert Comment

by:lap_dog_shuffle
Actually, I would just like to add that unless you perform some method of musical analysis on the data retrieved by your FFTs, I very much doubt you will be able to classify genre beyond what you can retrieve just by examining the dynamics of the file. Because of the standardisation of recording quality and dynamics processing, there really isn't much spectral variation between modern recordings of popular genres.
0
 

Author Comment

by:ollero
My confusion is with neural networks: how do I create one? What does it consist of?

0
 
LVL 1

Expert Comment

by:lap_dog_shuffle
Neural networks are a form of artificial intelligence modelled on the way living organisms learn. They can provide a fairly, um, 'lazy' method of pattern analysis, but they aren't magic! Neural networks are a MASSIVE subject, and there is certainly no 'standard' neural network. Basically, they aren't really for beginners.

For the project you have proposed, you will have to decide what information you want to extract from your FFT analysis and how you are going to store it before you even start analysing it; this will be your major problem. Maybe if you were analysing midi files you would be able to filter the useful musical data in such a way that you could get semi-accurate results, but definitely not with raw audio data. (If you did build a neural network, it may take days to work its way through a single mp3.)

Genre classification is not necessarily a musical thing; genres are the constructions of human analysts, and the factors of analysis are too many to be found in the music alone. Unless you are going to invent your own genres based directly on the spectral qualities of the files (for example, “Slow Quiet Rhythmic Music” or “Fast Noisy Music”), you will not be able to do any traditional genre analysis with raw audio data (and even then neural nets may not be the best way to do it).

I don’t want to shatter your ideas; for all I know you may have already thought of these factors. But from what I have gathered from your original post, your concept seems quite impossible to realise. I would be happy to try to point you in the right direction, but you will need to tell me exactly what your intentions are for your program.
0
 

Author Comment

by:ollero
Lap Dog, I would really appreciate your guidance. Here are my intentions:

The overall program will take the input file (either wav or midi, depending on ease of use), extract information from the FFT analysis (I don't know which method is easier; I was thinking of sequences of 0's and 1's stored in a text file, if possible), and compare that sequence to files that I have modeled. You said that maybe NNs are not the best way to go. If you know of a simple type of A.I. for developing this project, all suggestions are welcome.

Steps
- From what you have stated, I suppose I would have to use midi files for analysis. This is code that I can easily write or find online.
- Do an FFT analysis; I have in mind storing the sequence as 0's and 1's in a text file, if possible.
**Now comes where I have the major problem: NNs.**
- When comparing the midi files to the text-file models that I have created, I would classify them based on the highest percentage match.

Does this clarify things a bit for you?
0
 
LVL 1

Expert Comment

by:lap_dog_shuffle
hehe, Not Really. ;-)

Tell me, what is the purpose of your program? How accurate does it need to be? Give me some examples of the types of genres you would like your program to identify. How good a programmer are you? How much experience do you have with FFTs and DSP in general?
0
 

Author Comment

by:ollero
jeje, ok

Purpose:
I want to grab a music file (wav, midi) and classify it by genre (rock, country, dance; only these, because any more would be too troublesome).

Accuracy:
It does not have to be that accurate; as soon as I can get something working, I will look at improving accuracy.

I have experience in programming, but I am no expert. I am more into web applications (asp, perl, etc.).
As for experience with FFTs and DSP, I am a beginner. I am doing research on the internet to get an idea of what is going on.

I hope this helps.

0
 
LVL 1

Expert Comment

by:lap_dog_shuffle
OK, well, midi files are not audio; there is no way to convert between the two formats. The parameters you can retrieve from an audio file are amplitude and harmonics, and although FFTs can extract harmonic information from audio, what we perceive as a note is often made up of incredibly complex harmonics. Look up ‘additive synthesis’ to see how what we hear as a single tone can actually be many differently pitched partials.

Amplitude: this is how raw music data is stored in digital form. Amplitude data is float precision and moves between -1 and +1. Outputting 0 puts your speakers at rest, values >0 push them out and values <0 pull them in (haha, sorry for the simplistic explanation). Anyway, these values change 44100 times a second, while ‘musical time’ (notes and time signatures) moves far more slowly than this (a quaver beat at 120bpm would be 4 times per second). So working at a rate of around 100 times a second will allow you to accurately find patterns in amplitude data by averaging out the positive peak amplitudes of the waveforms. The typical block size to work with is 512 samples (around 86 times a second); as this is the length of a typical audio buffer, it will make life a lot easier while prototyping.
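To make the block-based idea concrete, here is a minimal sketch, assuming the samples have already been decoded into floats between -1 and +1. The function name `peak_envelope` and its signature are made up for illustration and do not come from any library; it simply keeps the largest positive sample in each 512-sample block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reduce raw samples (floats in [-1, +1]) to one positive peak per block.
// block_size = 512 follows the buffer length mentioned above; the name and
// signature are illustrative assumptions, not part of any audio API.
std::vector<float> peak_envelope(const std::vector<float>& samples,
                                 std::size_t block_size = 512) {
    assert(block_size > 0);
    std::vector<float> peaks;
    peaks.reserve(samples.size() / block_size + 1);
    for (std::size_t start = 0; start < samples.size(); start += block_size) {
        const std::size_t end = std::min(start + block_size, samples.size());
        float peak = 0.0f;
        for (std::size_t i = start; i < end; ++i) {
            peak = std::max(peak, samples[i]);  // negative samples are ignored
        }
        peaks.push_back(peak);
    }
    return peaks;
}
```

At 44100 samples per second this gives roughly 86 values per second, which is a far smaller sequence to search for rhythmic patterns than the raw audio.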

Harmonic: harmonic data is what an FFT will give you: a complete rundown of the frequency components of a section of audio. If you have never seen a spectral analysis before, most audio editors have a spectral view built in (or try ‘sigview’: http://www.sigview.com/ ). You should look at a zoomed-in section of a music file to see how complex the harmonics of even simple sounds can be. Again, analysing the peaks of a harmonic output (typically the brightest patches on a spectral view) will give you a rundown of the most prominent frequency components.
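As a rough illustration of what "the most prominent frequency component" means, the sketch below computes a magnitude spectrum for one block of samples and picks the strongest bin. Note the swap: it uses a direct O(N^2) discrete Fourier transform instead of a real FFT, purely to keep the example short; an FFT gives the same numbers much faster. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude spectrum of one frame via a direct O(N^2) DFT (a slow stand-in
// for an FFT). Bin k corresponds to the frequency k * sample_rate / N.
std::vector<double> magnitude_spectrum(const std::vector<double>& frame) {
    const std::size_t n = frame.size();
    assert(n > 0);
    const double two_pi = 6.283185307179586;
    std::vector<double> mags(n / 2 + 1);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle =
                two_pi * static_cast<double>(k * t) / static_cast<double>(n);
            re += frame[t] * std::cos(angle);
            im -= frame[t] * std::sin(angle);
        }
        mags[k] = std::sqrt(re * re + im * im);
    }
    return mags;
}

// Index of the strongest bin, skipping bin 0 (the constant DC offset).
std::size_t strongest_bin(const std::vector<double>& mags) {
    assert(mags.size() > 1);
    std::size_t best = 1;
    for (std::size_t k = 2; k < mags.size(); ++k) {
        if (mags[k] > mags[best]) best = k;
    }
    return best;
}
```

For a 512-sample frame at 44100 Hz, bin k sits at about k * 86 Hz, so the strongest bin gives only a coarse estimate of the dominant frequency.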

If you map these on two dimensions, you start to build up a profile of the musical and rhythmic information of a file. You will easily be able to write an algorithm which can tell whether the music is loud (many spectral peaks and amplitude peaks), fast (by measuring the frequency of these peaks) or noisy (not much variation in spectral peaks). If you still want to use a neural network, this is where I would suggest it would be of most use; it could ‘learn’ which traits are typical of each genre by analysing other mp3 data, and then choose the most suitable genre.
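To give a feel for what "learning which traits are typical" could look like, here is a sketch of a single perceptron, the smallest building block of a neural network, not a full network. It assumes each track has already been reduced to two numbers; the features (say, average peak level and a scaled peak rate) and the two classes are hypothetical choices for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One training example: two hypothetical features plus a class label.
struct Example {
    std::array<double, 2> features;  // e.g. {average peak level, scaled peak rate}
    int label;                       // +1 or -1 for two illustrative classes
};

// A single artificial neuron: weighted sum of the inputs plus a bias,
// thresholded at zero.
struct Perceptron {
    std::array<double, 2> weights{{0.0, 0.0}};
    double bias = 0.0;

    int predict(const std::array<double, 2>& x) const {
        const double sum = weights[0] * x[0] + weights[1] * x[1] + bias;
        return sum >= 0.0 ? 1 : -1;
    }

    // Perceptron rule: nudge the weights toward every example that is
    // currently misclassified. Returns the number of passes over the data.
    int train(const std::vector<Example>& data, int max_epochs = 1000,
              double rate = 0.1) {
        assert(rate > 0.0);
        for (int epoch = 1; epoch <= max_epochs; ++epoch) {
            bool any_wrong = false;
            for (const Example& ex : data) {
                if (predict(ex.features) != ex.label) {
                    weights[0] += rate * ex.label * ex.features[0];
                    weights[1] += rate * ex.label * ex.features[1];
                    bias += rate * ex.label;
                    any_wrong = true;
                }
            }
            if (!any_wrong) return epoch;  // every example is classified correctly
        }
        return max_epochs;
    }
};
```

A single neuron like this can only separate two classes with a straight line, so three genres would need at least one neuron per genre, and real audio features are unlikely to separate this cleanly.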

One thing that may be incredibly fruitful would be to offset your peak information and subtract it from a fixed copy of itself. The lower the value returned (closer to zero), the more similar the data is; this way you can analyse the repetitiveness of the music.
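One possible reading of the offset-and-subtract idea is sketched below: shift the peak sequence by a number of blocks, take the average absolute difference against the unshifted copy, and look for the shift with the smallest result. The function names and the choice of a mean absolute difference are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute difference between a sequence and a copy of itself shifted
// by `lag` positions. Values near zero mean the data repeats at that lag.
double lagged_difference(const std::vector<float>& peaks, std::size_t lag) {
    assert(lag < peaks.size());
    const std::size_t count = peaks.size() - lag;
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        total += std::fabs(peaks[i + lag] - peaks[i]);
    }
    return total / static_cast<double>(count);
}

// Lag in [min_lag, max_lag] with the smallest difference, i.e. the most
// likely repetition period measured in blocks.
std::size_t best_repeat_lag(const std::vector<float>& peaks,
                            std::size_t min_lag, std::size_t max_lag) {
    assert(min_lag >= 1 && min_lag <= max_lag && max_lag < peaks.size());
    std::size_t best = min_lag;
    double best_diff = lagged_difference(peaks, min_lag);
    for (std::size_t lag = min_lag + 1; lag <= max_lag; ++lag) {
        const double diff = lagged_difference(peaks, lag);
        if (diff < best_diff) {
            best_diff = diff;
            best = lag;
        }
    }
    return best;
}
```

With about 86 blocks per second, a best lag of 43 blocks would correspond to something repeating roughly every half second, which is one beat at 120bpm.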

I would suggest writing an application that performs amplitude analysis first, as it is very simple and will give you most of the information on timing and dynamics that you would get if you were using FFTs.

As I said before, midi files contain no audible information, just sequence parameters. Therefore it is quite easy to analyse midi data in terms of rhythmic and tonal properties (scale, time signature, tempo etc.), and then compare this information with information you have collected on music of a specific genre. (When I say easy, this is still a major project on its own.) I would expect to be working on a program like this for over a month, and I have been working with generative music for years! Using a neural network (or any method of A.I.) would help do this comparison for you. I'm not sure you want to do this though; it would be more exact analysis, but personally I find the spectral & amplitude approach far more exciting ;-) There is a lot more to genre than the music, and from an analysis of an audio recording you will also be able to analyse environmental parameters, such as how modern the recording is, and perhaps how heavy a rock song is.

/Steven
0
 
LVL 1

Expert Comment

by:lap_dog_shuffle
http://www.dspdimension.com/start.html
:: great beginner audio dsp tutorials (many on ffts).

http://www.dspguide.com/pdfbook.htm
:: a full dsp book to download.

http://cnx.rice.edu/content/m11715/latest/
:: on frequency domain pitch correction.

http://cnx.rice.edu/content/m11711/latest/
:: on time domain pitch detection (just to show there are other ways of doing things than FFTs)
0
 

Author Comment

by:ollero
Wow, over a month. I actually need this in exactly a month. Thanks for all your help.

One last thing, do you know where I can get code already made for this?

Thank you one last time.
0
 
LVL 1

Accepted Solution

by:
lap_dog_shuffle earned 500 total points
No, I don't know of anybody who has released any code similar to this. I would suggest you start by compiling the portaudio API examples; this will help you realise the scale of what you want to do. I think there may also be some examples in the SDK which show how to read from and write to audio files. The analysis you require is very specific to your project alone, as most people doing this kind of thing are sonic artists who aren't really worried about offending their peers or conforming to genres, haha. If you want further input on FFT techniques etc., I would really suggest joining the music-dsp list (http://shoko.calarts.edu/~glmrboy/musicdsp/music-dsp.html); there are guys there who may be able to point you towards open-source code for your project.

I am about to re-start my own website, which will provide a more anti-theoretical approach to generative music and synthesis techniques. The address is http://www.automelodic.com, and it will have examples in C++ and javascript (director, flash etc.). I am preparing some pieces now on Markov chains and vector-based synthesis; check it in a couple of weeks ;-)
0
 
LVL 1

Expert Comment

by:lap_dog_shuffle
btw, you will have to accept an answer for this topic. I hope I have answered your question more than I have frightened you! hah. e-e is generally designed for quite short questions; keep your questions short and precise and you can expect answers a lot quicker. If you get most of the question into the subject line, it's good ;-) haha.

Generally, for questions regarding music and dsp, people on mailing lists (such as music-dsp) are very interested in hearing about new projects and helping people out, while e-e is great for program- and platform-specific problems :-)

/Steven
0
