MP3 Waveform Image Generation

Posted on 2006-05-15
Last Modified: 2008-01-09
I'm trying to generate an image of the waveform of an audio file within a script or through system() type commands and have not found a way to do it yet.  Ideally I could read an MP3 file (decode it if necessary) and output a gif or jpeg image of the waveform.  When I say 'waveform' I mean the pretty graphical representation of an audio file that you see in audio editing programs.

If you don't know how to do it could you at least tell me what a waveform is actually representing?  The way I understand it, it's the amplitude (volume) of the audio over time.  I will write a script to make the image for me based on that if I have to but I'm sure it's been done before.

I am looking to develop this in Perl or PHP, although I am open to ActionScript and C++ (yuck) as well.
Question by: kamermans
    LVL 17

    Expert Comment

    You can use GTK (with OpenGL if needed).

    BR Dushan
    LVL 40

    Expert Comment

    (I'm from the PHP pointer).

    Follow this logic.

    An MP3 is normally about 1/12th the size of the original audio (the exact ratio depends on the bitrate).

    Say the original audio is 16-bit stereo at 44.1 kHz.

    This means that 1 second contains 44,100 * 16 * 2 bits = 1,411,200 bits = 176,400 bytes of data.

    But more importantly, 1 second will contain 44,100 samples per channel. Plotted one sample per pixel, that is 44,100 dots along the x axis of the image.
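
    The arithmetic above can be checked with a few lines of Python. This is only a sketch: the 128 kbit/s MP3 bitrate is an assumed typical value that is not stated in the thread, and the variable names are made up for illustration.

```python
SAMPLE_RATE_HZ = 44_100    # CD-quality sample rate
BITS_PER_SAMPLE = 16
CHANNELS = 2               # stereo

# Uncompressed PCM data rate for one second of audio.
bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
bytes_per_second = bits_per_second // 8

print(bits_per_second)     # 1411200
print(bytes_per_second)    # 176400

# Assumption: a 128 kbit/s MP3 (not a figure from the thread).
ASSUMED_MP3_BITS_PER_SECOND = 128_000
compression_ratio = bits_per_second / ASSUMED_MP3_BITS_PER_SECOND
print(compression_ratio)   # about 11, close to the 1/12th rule of thumb
```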

    So, you will need to resample this to get an image of sensible proportions.

    MP3 files use a form of compression called lossy compression.


    Source => mp3 => Output

    The Output is NOT exactly the same as the Source, only an audible equivalent. I can't give you figures, but the sample values will differ significantly between the two.

    You will need to decode the MP3 data (it is DATA, it is NOT AUDIO SAMPLES) into PCM samples before you can generate the waveform picture.

    The waveforms will be VERY wide.

    A 3-minute track will contain 7,938,000 samples per channel (180 seconds * 44,100). That is nearly 8 million samples. To plot that across a screen, even at 1280x1024, you have to resample and shrink it by a factor of about 6,200. A significant shrink.

    The waveform you see is normally drawn as 2 traces (left and right audio channels).
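
    As a quick check of that shrink factor, here is a small Python sketch. The variable names are illustrative, and rounding the bucket size up is a choice made here so that no samples are left over.

```python
import math

SAMPLE_RATE_HZ = 44_100
track_seconds = 3 * 60
image_width_px = 1280

# Samples in one channel of a 3-minute track.
samples_per_channel = SAMPLE_RATE_HZ * track_seconds
print(samples_per_channel)                      # 7938000

# How many samples have to collapse into each pixel column.
shrink_factor = samples_per_channel / image_width_px
print(shrink_factor)                            # 6201.5625

# Round up so that every sample falls into one of the 1280 columns.
bucket_size = math.ceil(shrink_factor)
print(bucket_size)                              # 6202
```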

    Each trace will be the value of the sample at that point in time (in increments of 1/44100 seconds).

    LVL 13

    Author Comment

    RQualing - Thanks for the info!  Would it make sense for me to decode the MP3 to raw WAV, then resample it to something very low like 1 kHz mono just to cut down on CPU time, then use some language to determine the average amplitude of each 1000 samples?  For a 5 min song (5 min * 60 sec = 300 sec) I would have (300 sec * 1 kHz = 300,000 samples) 300,000 samples, and if I take the average of every 1000 samples I would be left with (300,000 samples / 1000 = 300) 300 total samples - one for each second - which would generate a nice 300px wide image?

    Let me know if my logic is off.  I also would like to know if anyone knows how to determine the actual amplitude (volume) of an individual sample in a PCM stream.
    LVL 40

    Accepted Solution

    If you can, I would decode to a stream.

    e.g. (PSEUDO CODE)

    stream_WAV = new MP3_Decode_Stream('my.mp3')

    left_sample = stream_WAV->left_sample()
    right_sample = stream_WAV->right_sample()


    The idea here is that the decoder presents the left and right samples on the fly and sequentially. That way you do not have to physically convert to a WAV file first and then process that file. I've no idea about decoding an MP3 file myself, but I suspect there are good decoders available, as it is the MP3 encoder that does all the real work.
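
    One way to get that kind of stream without writing a decoder is to pipe the output of an external command-line decoder. The Python sketch below assumes ffmpeg is installed and asks it for raw signed 16-bit little-endian stereo PCM on stdout; ffmpeg is my assumption rather than something from this thread, and any decoder that can write raw PCM to stdout should work the same way. The function names are hypothetical. Each value yielded is the signed amplitude of one sample, in the range -32768 to 32767.

```python
import struct
import subprocess

def decoder_command(mp3_path):
    """Build an ffmpeg command line that writes raw 16-bit stereo PCM to stdout."""
    return ["ffmpeg", "-v", "error", "-i", mp3_path,
            "-f", "s16le", "-ac", "2", "-ar", "44100", "pipe:1"]

def iter_stereo_samples(pcm_stream, frames_per_read=4096):
    """Yield (left, right) signed 16-bit samples from an interleaved PCM stream."""
    frame_size = 4  # 2 bytes per sample * 2 channels
    leftover = b""
    while True:
        chunk = pcm_stream.read(frames_per_read * frame_size)
        if not chunk:
            break
        data = leftover + chunk
        # Only unpack whole frames; keep any partial frame for the next read.
        usable = len(data) - len(data) % frame_size
        if usable:
            for left, right in struct.iter_unpack("<hh", data[:usable]):
                yield left, right
        leftover = data[usable:]

def stream_mp3(mp3_path):
    """Decode an MP3 through ffmpeg and yield (left, right) samples as they arrive."""
    proc = subprocess.Popen(decoder_command(mp3_path), stdout=subprocess.PIPE)
    try:
        yield from iter_stereo_samples(proc.stdout)
    finally:
        proc.stdout.close()
        proc.wait()
```

    Usage would look like `for left, right in stream_mp3('my.mp3'): ...`, which is roughly the left_sample()/right_sample() idea from the pseudo code.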

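    I would then work out how many samples are needed for a pixel. If you have a 300px wide image and the audio is 300,000 samples long, then you need to average 1,000 samples at a time. Add the values of 1,000 samples together and then divide by 1,000. Because signed samples swing either side of zero, add their absolute values (or keep the peak) rather than the raw values, otherwise the average comes out near zero.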
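
    A minimal Python sketch of that per-pixel reduction follows. It uses the mean of the absolute sample values as the level for each column, which is a choice made for this sketch; the function name column_levels is made up for illustration.

```python
def column_levels(samples, width):
    """Reduce a list of signed samples to `width` mean-absolute levels."""
    if width <= 0 or not samples:
        return []
    total = len(samples)
    levels = []
    for col in range(width):
        # Integer bucket boundaries spread the samples evenly over the columns.
        start = col * total // width
        end = (col + 1) * total // width
        bucket = samples[start:end]
        if bucket:
            levels.append(sum(abs(s) for s in bucket) / len(bucket))
        else:
            levels.append(0.0)  # more columns than samples
    return levels

samples = [1000, -1000, 2000, -2000, 3000, -3000]

# A plain mean of signed samples cancels out...
print(sum(samples) / len(samples))   # 0.0
# ...while the mean of absolute values follows the loudness.
print(column_levels(samples, 3))     # [1000.0, 2000.0, 3000.0]
```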

    This SHOULD provide a fairly reasonable waveform. Don't forget you would need to do left and right simultaneously, as the samples are interleaved.
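
    Once each channel has been reduced to one level per pixel column, drawing the two traces is mostly bookkeeping. The Python sketch below writes a binary PGM (greyscale) file using only the standard library. PGM is chosen here purely to avoid depending on an image library; to get the GIF or JPEG asked for, you would convert the file afterwards or draw with something like GD instead. The function name and parameters are hypothetical, and the levels are assumed to be non-negative values on a 16-bit scale.

```python
def render_pgm(left_levels, right_levels, path, trace_height=64, full_scale=32768):
    """Write a binary PGM with the left trace on top and the right trace below."""
    width = len(left_levels)
    height = trace_height * 2
    pixels = bytearray([255]) * (width * height)   # white background
    max_half = trace_height // 2 - 1               # keeps bars inside their own trace
    for trace, levels in enumerate((left_levels, right_levels)):
        centre = trace * trace_height + trace_height // 2
        for x, level in enumerate(levels[:width]):
            clamped = max(0.0, min(level, full_scale))
            half = int(clamped / full_scale * max_half)
            # Black vertical bar mirrored about the trace's centre line.
            for y in range(centre - half, centre + half + 1):
                pixels[y * width + x] = 0
    with open(path, "wb") as out:
        out.write(b"P5\n%d %d\n255\n" % (width, height))
        out.write(pixels)
```

    For example, `render_pgm(left, right, 'wave.pgm')` with two 300-entry lists would produce a 300x128 image.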

