• Status: Solved
  • Priority: Medium
  • Security: Public

question about music genre classification


I need some help developing C++ source code that takes an MP3 and converts it to a WAV file. The WAV file will be converted to a series of numbers and then compared to existing files, so that the MP3 can be classified into a specific genre depending on its similarity to those files. The code needs to make use of a neural network. There are a couple of steps I need help with:

1. MP3 conversion to a WAV file (I am guessing it has to be 44 kHz)
2. Take this conversion and use it with a fast Fourier transform to compare with text files classified as genres
3. The classification loop I know how to do: it is a set of if/else statements and a do-while loop, and then based on the closest percentage I can classify it.

I would really appreciate your help in this.
Exactly what is your question?
ollero (Author) commented:
My question is: how do I do this? Has anyone done anything similar to this? I need code, because I am sincerely stuck in this project.

thanks a million
You can start by emailing a few people who have written programs for converting MP3 to WAV and asking them for their code or for some tips. Maybe they won't let you have the code but will give you some sort of DLL instead.

Hi there, I would suggest joining the music-dsp list (http://shoko.calarts.edu/~glmrboy/musicdsp/music-dsp.html); there are many experienced audio developers using it.

Also, 'Portaudio' (http://www.portaudio.com/) is a free multiplatform API for audio; the guys on the mailing list could definitely help you out.

I have been using CPS (http://www.bonneville.nl/cps/) for a number of audio projects recently. It costs $150 and comes with a graphical patch editor and full SDKs for developing in C++. Once you have created your plugin (CPS includes ready-built functions for reading wavetables, and conversion from MP3 is done automatically), build your patch graphically and export the code to be used in your own application. I'm not sure if this is suitable for your project, but it may be good for prototyping.

Good luck. I can imagine that you will need to perform some very complex analysis on your audio before you feed it to your neural network. I know there are people at Queen's University Belfast (http://www.sarc.qub.ac.uk/) doing work in very similar areas, although I think the genres they are working with aren't traditional (jazz, dance, rock) but relate directly to the spectral and timing qualities of any recording.


Actually, I would just like to add that unless you apply some method of musical analysis to the data retrieved by your FFTs, I very much doubt you will be able to classify genre beyond what you can retrieve just by examining the dynamics of the file. Because of the standardisation of recording quality and dynamics processing, there really isn't much spectral variation between modern recordings of popular genres.
ollero (Author) commented:
My confusion is in neural networks: how do I create one? Or what does it consist of?

Neural networks are a form of artificial intelligence modelled on the way living organisms learn. They can provide a fairly, umm, 'lazy' method of pattern analysis, but they aren't magic! Neural networks are a MASSIVE subject, and there is certainly no 'standard' neural network. Basically, they aren't really for beginners.
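To give a rough feel for what a neural network consists of, the smallest building block is a single artificial neuron (a perceptron): a weighted sum of its inputs plus a bias, passed through a threshold. The sketch below is only that one neuron, trained with the classic perceptron learning rule. The `Perceptron` type and the idea of feeding it two numeric features per song are assumptions for illustration; a real network stacks many such units and needs far more care.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A single artificial neuron with a hard threshold (hypothetical helper type).
struct Perceptron {
    std::vector<double> weights;
    double bias = 0.0;

    explicit Perceptron(std::size_t inputCount) : weights(inputCount, 0.0) {}

    // Weighted sum of the inputs plus the bias; output 1 if it is positive, else 0.
    int predict(const std::vector<double>& x) const {
        assert(x.size() == weights.size());
        double sum = bias;
        for (std::size_t i = 0; i < x.size(); ++i) {
            sum += weights[i] * x[i];
        }
        return sum > 0.0 ? 1 : 0;
    }

    // Perceptron learning rule: nudge the weights towards each misclassified example.
    void train(const std::vector<std::vector<double>>& inputs,
               const std::vector<int>& labels,
               int epochs, double rate) {
        assert(inputs.size() == labels.size());
        for (int e = 0; e < epochs; ++e) {
            for (std::size_t n = 0; n < inputs.size(); ++n) {
                const int error = labels[n] - predict(inputs[n]);
                for (std::size_t i = 0; i < weights.size(); ++i) {
                    weights[i] += rate * error * inputs[n][i];
                }
                bias += rate * error;
            }
        }
    }
};
```

A single neuron like this can only separate two classes with a straight line through the feature space, which is one reason real networks use several layers of them.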

For the project you have proposed, you will have to decide what information you want to extract from your FFT analysis and how you are going to store it before you even start analysing it; this will be your major problem. Maybe if you were analysing MIDI files you would be able to filter the useful musical data in such a way that you could get semi-accurate results, but definitely not with raw audio data. (If you did build a neural network, it may take days to work its way through a single MP3.)

Genre classification is not necessarily a musical thing; genres are the constructions of human analysts, and the factors of analysis are too great to be found in the music alone. Unless you are going to invent your own genres based directly on the spectral qualities of the files (for example, “Slow Quiet Rhythmic Music” or “Fast Noisy Music”), you will not be able to do any traditional genre analysis with raw audio data (and even then neural nets may not be the best way to do it).

I don’t want to shatter your ideas, for all I know you may have already thought of these factors, but from what I have gathered from your original post your concept seems quite impossible to realise. I would be happy to try and point you in the right direction, but you will need to tell me exactly what your intentions are for your program.
ollero (Author) commented:
Lap Dog, I would really appreciate your guidance. Here are my intentions:

The overall program will take the input file (either WAV or MIDI, depending on ease of use), extract information from the FFT analysis (I don't know which method is easier; I was thinking of sequences of 0's and 1's stored in a text file if possible) and compare that sequence to files that I have modelled. You said that maybe NNs are not the best way to go. If you know of a simple type of A.I. to develop this project, all suggestions are welcome.

- From what you have stated, I suppose I would have to use MIDI files for analysis. This is code that I can easily write or find online.
- Do an FFT analysis; I have in mind storing the sequence as 0's and 1's in a text file if possible.
**Now comes where I have the major problem: NNs.**
- When comparing the MIDI files to the text-file models that I have created, I would classify them based on the highest percentage.

Does this clarify things a bit for you?
hehe, Not Really. ;-)

Tell me, what is the purpose of your program? How accurate does it need to be? Give me some examples of the types of genres you would like your program to identify. How good a programmer are you? How much experience do you have with FFTs and DSP in general?
ollero (Author) commented:
jeje, ok

I want to grab a music file (WAV, MIDI) and classify it by genre (rock, country, dance; only these, because otherwise it would be too troublesome).

It does not have to be that accurate; as soon as I can get something working, I will look at more accuracy.

I have experience in programming, but I am no expert. I am more into web applications (ASP, Perl, etc.).
As for experience in FFTs and DSP, I am a beginner. I am doing research over the internet to get an idea of what is going on.

I hope this helps.

OK, well, MIDI files are not audio; there is no way to convert between the two formats. The parameters which you can retrieve from an audio file are amplitude and harmonics, and although FFTs can extract harmonic information from audio, what we perceive as a note is often made up of incredibly complex harmonics. Look up ‘additive synthesis’ to see how what we hear as a single tone can actually be many differently pitched partials.

Amplitude: this is how raw music data is stored in digital form. Amplitude data is usually handled as floating-point values between -1 and +1. Outputting 0 puts your speakers at rest, values > 0 push them out and values < 0 pull them in (haha, sorry for the simplistic explanation). Anyway, at CD quality these values change 44100 times a second, while ‘musical time’ (notes and time signatures) moves far more slowly than this (a quaver beat at 120 bpm would be 4 times per second). So working at a rate of around 100 times a second will allow you to accurately find patterns in amplitude data by averaging out the positive peak amplitudes of the waveform. A typical rate to work at is one value per 512 samples (around 86 times a second); as this is the length of a typical audio buffer, it will make life a lot easier while prototyping.
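The amplitude idea above can be sketched in a few lines of C++. This is only a minimal illustration: the function names `pcm16ToFloat` and `peakEnvelope` are made up, and it assumes the audio has already been decoded into 16-bit PCM samples (decoding MP3 or parsing the WAV header is not shown).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale 16-bit PCM samples into the -1..+1 floating-point range.
std::vector<float> pcm16ToFloat(const std::vector<std::int16_t>& pcm) {
    std::vector<float> out;
    out.reserve(pcm.size());
    for (std::int16_t s : pcm) {
        out.push_back(static_cast<float>(s) / 32768.0f);
    }
    return out;
}

// Reduce the signal to one value per block: the largest positive sample
// in that block. Negative samples are ignored, so a block with no
// positive sample yields 0.
std::vector<float> peakEnvelope(const std::vector<float>& samples,
                                std::size_t blockSize = 512) {
    assert(blockSize > 0);
    std::vector<float> envelope;
    envelope.reserve(samples.size() / blockSize + 1);
    for (std::size_t start = 0; start < samples.size(); start += blockSize) {
        const std::size_t end = std::min(start + blockSize, samples.size());
        float peak = 0.0f;
        for (std::size_t i = start; i < end; ++i) {
            peak = std::max(peak, samples[i]);
        }
        envelope.push_back(peak);
    }
    return envelope;
}
```

With 44100 samples per second and a block size of 512, the envelope has roughly 86 values per second, which is the working rate mentioned above.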

Harmonic: harmonic data is what an FFT will give you: a complete rundown of the frequency components of a section of audio. If you have never seen a spectral analysis before, most audio editors have a spectral view built in (or try ‘sigview’: http://www.sigview.com/). You should look at a zoomed-in section of a music file to see how complex the harmonics of even simple sounds can be. Again, analysing the peaks of a harmonic output (typically the brightest patches on a spectral view) will give you a rundown of the most prominent frequency components.
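For the harmonic side, here is a minimal sketch of a textbook recursive radix-2 FFT and a helper that picks the strongest frequency bin. It is written for readability, not speed, and the block length must be a power of two. The helper names (`magnitudeSpectrum`, `strongestBin`, `binToHz`) are assumptions for illustration; for real work an established FFT library would be a better choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <iterator>
#include <vector>

using Complex = std::complex<double>;

// In-place recursive radix-2 FFT. The length must be a power of two.
void fft(std::vector<Complex>& a) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    assert((n & (n - 1)) == 0);
    std::vector<Complex> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even);
    fft(odd);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const Complex t = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n)) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Magnitudes of the first half of the spectrum (bins 0 .. N/2 - 1).
std::vector<double> magnitudeSpectrum(const std::vector<double>& block) {
    std::vector<Complex> data(block.begin(), block.end());
    fft(data);
    std::vector<double> mags(block.size() / 2);
    for (std::size_t i = 0; i < mags.size(); ++i) {
        mags[i] = std::abs(data[i]);
    }
    return mags;
}

// Index of the largest magnitude: the "brightest patch" in this block.
std::size_t strongestBin(const std::vector<double>& mags) {
    return static_cast<std::size_t>(
        std::distance(mags.begin(), std::max_element(mags.begin(), mags.end())));
}

// Convert a bin index to a frequency in Hz.
double binToHz(std::size_t bin, std::size_t fftSize, double sampleRate) {
    return static_cast<double>(bin) * sampleRate / static_cast<double>(fftSize);
}
```

Running `magnitudeSpectrum` on successive 512-sample blocks and recording the strongest bins gives the kind of frequency-peak profile described above.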

If you map these on two dimensions, you start to build up a profile of the musical and rhythmic information of a file. You will easily be able to write an algorithm which can tell whether the music is loud (many spectral peaks and amplitude peaks), fast (by measuring the frequency of these peaks) or noisy (not much variation in spectral peaks). If you still want to use a neural network, this is where I would suggest it would have most use: it could ‘learn’ which traits are typical of each genre by analysing other MP3 data, and then choose the most suitable genre.
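As a rough sketch of the "loud" and "fast" measurements, the code below counts local maxima in a peak envelope (one value per block) and reports the average level. The threshold, the function names and the simple local-maximum test are all assumptions for illustration; real onset detection is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count local maxima in the envelope that rise above the threshold.
std::size_t countPeaks(const std::vector<float>& envelope, float threshold) {
    std::size_t count = 0;
    for (std::size_t i = 1; i + 1 < envelope.size(); ++i) {
        if (envelope[i] > threshold &&
            envelope[i] > envelope[i - 1] &&
            envelope[i] >= envelope[i + 1]) {
            ++count;
        }
    }
    return count;
}

// "Fast": how many envelope peaks occur per second of audio.
// blocksPerSecond is about 86 for 512-sample blocks at 44100 Hz.
double peaksPerSecond(const std::vector<float>& envelope, float threshold,
                      double blocksPerSecond) {
    assert(!envelope.empty());
    return static_cast<double>(countPeaks(envelope, threshold)) * blocksPerSecond /
           static_cast<double>(envelope.size());
}

// "Loud": the mean of the envelope values.
double averageLevel(const std::vector<float>& envelope) {
    assert(!envelope.empty());
    double sum = 0.0;
    for (float v : envelope) {
        sum += v;
    }
    return sum / static_cast<double>(envelope.size());
}
```

Numbers like these are the sort of features that could later be handed to a classifier instead of the raw audio.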

One thing that may be incredibly fruitful would be to offset your peak information and subtract it from a fixed copy of itself. The lower the value returned (the closer to zero), the more similar the data is; this way you can analyse the repetitiveness of the music.
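The offset-and-subtract idea can be sketched as an average absolute difference between the data and a shifted copy of itself, evaluated for a range of offsets. The function names here are hypothetical, and the input is assumed to be the peak data computed earlier (one value per block).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Average absolute difference between the data and a copy of itself
// shifted by `lag` positions. Values near zero mean the data repeats
// with that period.
double meanAbsDifference(const std::vector<float>& data, std::size_t lag) {
    assert(lag < data.size());
    const std::size_t n = data.size() - lag;
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += std::fabs(data[i] - data[i + lag]);
    }
    return sum / static_cast<double>(n);
}

// Try every lag in [minLag, maxLag] and return the one where the shifted
// copy matches best (the first such lag if several tie).
std::size_t mostRepetitiveLag(const std::vector<float>& data,
                              std::size_t minLag, std::size_t maxLag) {
    assert(minLag >= 1 && minLag <= maxLag && maxLag < data.size());
    std::size_t bestLag = minLag;
    double bestValue = meanAbsDifference(data, minLag);
    for (std::size_t lag = minLag + 1; lag <= maxLag; ++lag) {
        const double value = meanAbsDifference(data, lag);
        if (value < bestValue) {
            bestValue = value;
            bestLag = lag;
        }
    }
    return bestLag;
}
```

If the data is an envelope at about 86 values per second, a best lag of 43 would suggest something repeating roughly twice a second.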

I would suggest writing an application that performs amplitude analysis first, as it is very simple and will give you most of the information on timing and dynamics that you would get if you were using FFTs.

As I said before, MIDI files contain no audible information, just sequence parameters. Therefore it is quite easy to analyse MIDI data in terms of rhythmic and tonal properties (scale, time signature, tempo etc.), and then compare this information with information you have collected on music of a specific genre. (When I say easy, this is still a major project on its own.) I would expect to be working on a program like this for over a month, and I have been working with generative music for years! Using a neural network (or any method of A.I.) would help do this comparison for you. I'm not sure you want to do this though; it would be more exact analysis, but personally I find the spectral and amplitude approach far more exciting ;-) There is a lot more to genre than the music, and from an analysis of an audio recording you will also be able to analyse environmental parameters, such as how modern the recording is, and perhaps how heavy a rock song is.
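Before reaching for a neural network, the comparison step itself can be prototyped very simply: describe each genre by a list of feature values, describe the new song the same way, and pick the genre whose profile is most similar. The sketch below uses cosine similarity as the "closest percentage" measure. The `GenreProfile` type and whatever numbers go into the feature lists are assumptions for illustration, not something taken from a real data set.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// A hypothetical reference profile: a genre label plus the feature
// values (for example tempo, loudness, repetitiveness) that model it.
struct GenreProfile {
    std::string name;
    std::vector<double> features;
};

// Cosine similarity in the range [-1, 1]; 1 means the two feature
// vectors point in exactly the same direction.
double cosineSimilarity(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

// Name of the profile most similar to the song, or an empty string
// if there are no profiles to compare against.
std::string classify(const std::vector<double>& song,
                     const std::vector<GenreProfile>& profiles) {
    std::string best;
    double bestScore = -2.0;  // lower than any possible cosine similarity
    for (const GenreProfile& p : profiles) {
        const double score = cosineSimilarity(song, p.features);
        if (score > bestScore) {
            bestScore = score;
            best = p.name;
        }
    }
    return best;
}
```

Multiplying the similarity by 100 gives a rough percentage, which is easier to read while experimenting with different features.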

:: great beginner audio dsp tutorials (many on ffts).

:: a full dsp book to download.

:: on frequency domain pitch correction.

:: on time domain pitch detection (just to show there are other ways of doing things than FFTs)
ollero (Author) commented:
Wow, over a month. I actually need this in exactly a month. Thanks for all your help.

One last thing, do you know where I can get code already made for this?

Thank you one last time.
No, I don't know of anybody who has released any code similar to this. I would suggest you start by compiling the Portaudio API examples; this will help you realise the scale of what you want to do. I think there may also be some examples in the SDK which show how to read and write audio files. The analysis you require is very specific to your project alone, as most people doing this kind of thing are sonic artists who aren't really worried about offending their peers or conforming to genres, haha. If you want further input on FFT techniques etc., I would really suggest joining the music-dsp list (http://shoko.calarts.edu/~glmrboy/musicdsp/music-dsp.html); there are guys there who may be able to point you towards open-source code for your project.

I am about to restart my own website, which will provide a more anti-theoretical approach to generative music and synthesis techniques. The address is http://www.automelodic.com, and it will have examples in C++ and JavaScript (Director, Flash etc.). I am preparing some pieces now on Markov chains and vector-based synthesis; check it in a couple of weeks ;-)
btw, you will have to accept an answer for this topic. I hope I have answered your question more than I have frightened you! hah. e-e is generally designed for quite short questions: keep your questions short and precise and you can expect answers a lot quicker. If you get most of the question in the subject line, it's good ;-) haha.

generally for questions regarding music and dsp, people on mailing lists (such as music-dsp) are very interested in hearing about new projects and helping people out, while e-e is great for program and platform specific problems :-)
