

MP3 Waveform Image Generation

Posted on 2006-05-15
Medium Priority
Last Modified: 2008-01-09
I'm trying to generate an image of the waveform of an audio file within a script or through system() type commands and have not found a way to do it yet.  Ideally I could read an MP3 file (decode it if necessary) and output a gif or jpeg image of the waveform.  When I say 'waveform' I mean the pretty graphical representation of an audio file that you see in audio editing programs.

If you don't know how to do it could you at least tell me what a waveform is actually representing?  The way I understand it, it's the amplitude (volume) of the audio over time.  I will write a script to make the image for me based on that if I have to but I'm sure it's been done before.

I am looking to develop this in Perl or PHP although I am open to Actionscript and C++ (yuck) as well.
Question by:kamermans

Expert Comment

by:Dushan De Silva
ID: 16688292
You can use GTK (with OpenGL if needed).

BR Dushan

Expert Comment

by:Richard Quadling
ID: 16688768
(I'm from the PHP pointer).

Follow this logic.

An MP3 is normally 1/12th of the original audio size.

Say the original audio is 16-bit stereo at 44.1 kHz.

This means that in 1 second there will be 44,100 * 16 * 2 bits = 1,411,200 bits = 176,400 bytes of data.

But more importantly, 1 second will contain 44,100 samples. This is 44,100 dots along the x axis of the image.
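As a quick check of the arithmetic above, here is a small Python sketch. The constant names are my own; the figures are the ones quoted in this comment.

```python
SAMPLE_RATE_HZ = 44_100   # CD-quality sample rate
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo

# Uncompressed data rate for one second of audio
bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
bytes_per_second = bits_per_second // 8

print(bits_per_second)   # 1411200
print(bytes_per_second)  # 176400
```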

So, you will need to resample this to get an image of sensible proportions.

MP3 files use a form of compression called lossy compression.


Source => mp3 => Output

Source is NOT exactly the same as Output, only an audible equivalent. I can't give you figures, but there will be significant differences between the two.

You will need to decode the MP3 data (it is DATA, it is NOT AUDIO SAMPLES) into the samples to generate the wave form picture.

The waveforms will be VERY wide.

A 3 minute track will contain 7,938,000 samples. That is nearly 8 million samples. To plot that on a screen, even at 1280x1024, you have to resample and shrink that by a factor of 6,200. A significant shrink.
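The shrink factor can be worked out the same way. This is a minimal sketch; `samples_per_pixel` is a hypothetical helper name, and the 1280-pixel width is the screen width mentioned above.

```python
def samples_per_pixel(duration_s, sample_rate_hz, width_px):
    """Return (total samples per channel, samples that share one pixel column)."""
    total = duration_s * sample_rate_hz
    return total, total / width_px

total, factor = samples_per_pixel(3 * 60, 44_100, 1280)
print(total)          # 7938000
print(round(factor))  # 6202
```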

The waveform you see is normally with 2 traces (left and right audio channels).

Each trace will be the value of the sample at that point in time (in increments of 1/44100 seconds).
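To make "the value of the sample" concrete: in ordinary 16-bit PCM (the kind stored in a WAV file) each sample is a signed integer from -32768 to 32767, and stereo data is stored as alternating left and right values. The sketch below assumes that layout in little-endian byte order; `split_stereo_16bit` is a hypothetical helper name. The amplitude of a single sample is simply its absolute value.

```python
import struct

def split_stereo_16bit(pcm_bytes):
    """Split interleaved 16-bit little-endian stereo PCM into (left, right) lists."""
    frame_count = len(pcm_bytes) // 4          # 2 channels * 2 bytes each
    values = struct.unpack("<%dh" % (frame_count * 2), pcm_bytes[:frame_count * 4])
    return list(values[0::2]), list(values[1::2])

# Two hand-made frames: (L=1000, R=-1000) and (L=32767, R=-32768)
raw = struct.pack("<4h", 1000, -1000, 32767, -32768)
left, right = split_stereo_16bit(raw)
print(left)                      # [1000, 32767]
print(right)                     # [-1000, -32768]
print([abs(s) for s in right])   # [1000, 32768]
```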


Author Comment

ID: 16691554
RQuadling - Thanks for the info!  Would it make sense for me to decode the MP3 to raw WAV, then resample it to something very low like 1 kHz mono just to cut down on CPU time, then use some language to determine the average amplitude of each 1,000 samples?  For a 5 min song (5 min * 60 sec = 300 sec) I would have 300 sec * 1 kHz = 300,000 samples, and if I take the average of every 1,000 samples I would be left with 300,000 / 1,000 = 300 total samples, one for each second, which would generate a nice 300px wide image?

Let me know if my logic is off.  I also would like to know if anyone knows how to determine the actual amplitude (volume) of an individual sample in a PCM.

Accepted Solution

Richard Quadling earned 2000 total points
ID: 16706569
If you can, I would decode to a stream.


stream_WAV = new MP3_Decode_Stream('my.mp3')

left_sample = stream_WAV->left_sample()
right_sample = stream_WAV->right_sample()


The idea here being that the decode presents the left and right samples on the fly and sequentially. That way you do not have to physically convert to a wav first and then process the wav file. I've no idea on decoding an MP3 file, but I suspect there are good sources available as the MP3 encoding does all the real work.
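One possible way to get such a stream, offered as a sketch and not as the only route: if the `ffmpeg` command-line tool happens to be installed (an assumption on my part, it is not mentioned in this thread), it can decode an MP3 to raw 16-bit stereo PCM on standard output, which a script can read as it arrives. All function names below are hypothetical.

```python
import io
import struct
import subprocess

def ffmpeg_pcm_command(mp3_path, rate_hz=44_100):
    """Build an ffmpeg command that writes raw 16-bit stereo PCM to stdout."""
    return ["ffmpeg", "-v", "error", "-i", mp3_path,
            "-f", "s16le", "-ac", "2", "-ar", str(rate_hz), "-"]

def iter_stereo_frames(stream):
    """Yield (left, right) sample pairs from a binary stream of s16le stereo PCM."""
    while True:
        frame = stream.read(4)        # 2 bytes left + 2 bytes right
        if len(frame) < 4:
            break
        yield struct.unpack("<hh", frame)

def decode_mp3_frames(mp3_path):
    """Stream (left, right) pairs from an MP3 without writing a WAV file.

    Assumes ffmpeg is on the PATH; not exercised in the demo below.
    """
    proc = subprocess.Popen(ffmpeg_pcm_command(mp3_path), stdout=subprocess.PIPE)
    try:
        yield from iter_stereo_frames(proc.stdout)
    finally:
        proc.stdout.close()
        proc.wait()

# The frame reader works on any binary stream, so it can be tried without ffmpeg:
demo = io.BytesIO(struct.pack("<4h", 10, -10, 20, -20))
print(list(iter_stereo_frames(demo)))  # [(10, -10), (20, -20)]
```

Reading four bytes at a time keeps the sketch short; reading larger blocks would be faster for a full-length track.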

I would then work out how many samples are needed for a pixel. If you have a 300px wide image and you have 300,000 samples wide audio, then you need to average 1,000 samples at a time. Add the values of 1,000 samples together and then divide by 1,000.

This SHOULD provide a fairly reasonable waveform. Don't forget you would need to do left and right simultaneously, as the samples are interleaved.
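A minimal sketch of the per-pixel reduction and drawing, with two deliberate departures that are my assumptions and not part of the answer above. First, it averages the absolute values of the samples, because signed PCM samples swing above and below zero and a plain average of them tends toward zero. Second, it writes a binary PGM image (a very simple greyscale format) so that no image library is needed; converting that to GIF or JPEG would need a separate tool. `column_levels` and `render_pgm` are hypothetical names, and the left and right lists are assumed to be already de-interleaved.

```python
def column_levels(samples, width_px):
    """Reduce samples to width_px values: mean absolute amplitude per column."""
    per_column = max(1, len(samples) // width_px)
    levels = []
    for x in range(width_px):
        chunk = samples[x * per_column:(x + 1) * per_column]
        levels.append(sum(abs(s) for s in chunk) // len(chunk) if chunk else 0)
    return levels

def render_pgm(left, right, width_px, trace_height=64, full_scale=32768):
    """Draw left above right as mirrored bars; return a binary PGM image."""
    rows = [bytearray(b"\xff" * width_px) for _ in range(2 * trace_height)]
    for trace, samples in enumerate((left, right)):
        centre = trace * trace_height + trace_height // 2
        for x, level in enumerate(column_levels(samples, width_px)):
            # Clamp so a full-scale sample stays inside its own trace
            half = min(level, full_scale - 1) * (trace_height // 2) // full_scale
            for y in range(centre - half, centre + half + 1):
                rows[y][x] = 0            # black pixel on white background
    header = b"P5\n%d %d\n255\n" % (width_px, 2 * trace_height)
    return header + b"".join(rows)

left = [1000, -1000, 20000, -20000]
right = [0, 0, 32767, -32768]
print(column_levels(left, 2))    # [1000, 20000]
print(column_levels(right, 2))   # [0, 32767]
image = render_pgm(left, right, 2)
print(len(image))                # 269
```

Writing `image` to a file opened in binary mode gives a picture that most image viewers and converters can open. Samples left over after the last full column are ignored in this sketch.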

