PCM/WAV file format information, please.

Posted on 1998-12-11
Last Modified: 2013-12-03

Hello!  I'm currently writing a program that will be reading in and working with PCM/WAV files.  I already know the header formats, gleaned from but I need some help with regards to the meaning of the data portion of the file.  I.e. how do the numbers reflect the waveform of the sound being generated at that time (in that sample), or some such relationship.


Question by:ap9

Accepted Solution

trillo earned 200 total points
ID: 1417094
Ok... here it is:
All wave data is stored in 8-bit bytes. The bytes of multiple-byte values are stored with the low-order (ie, least significant) bytes first. Data bits are as follows (ie, shown with bit numbers on top):

             7  6  5  4  3  2  1  0
     char: | msb               lsb |

                  7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8
short(2 bytes): | byte 0 (low byte)     | byte 1 (high byte)    |
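
If it helps to see the byte order in code, here is a minimal Python sketch (my own illustration, not part of the format description) that reads one 16-bit value stored low byte first. The two example bytes and the variable names are arbitrary.

```python
import struct

# Two bytes exactly as they would appear in the file: low-order byte first.
raw = bytes([0x70, 0xA1])

# "<h" = little-endian signed 16-bit; "<H" = little-endian unsigned 16-bit.
(signed_value,) = struct.unpack("<h", raw)
(unsigned_value,) = struct.unpack("<H", raw)

# The same thing by hand: byte 0 is the low byte, byte 1 is the high byte.
by_hand = raw[0] | (raw[1] << 8)

print(hex(unsigned_value), signed_value, by_hand)
```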

A WAVE file is a collection of a number of different types of chunks. There is a required Format ("fmt ") chunk which contains important parameters describing the waveform, such as its sample rate. The Data chunk, which contains the actual waveform data, is also required. All other chunks are optional. Among the other optional chunks are ones which define cue points, list instrument parameters, store application-specific information, etc.
All applications that use WAVE must be able to read the 2 required chunks and can choose to selectively ignore the optional chunks. A program that copies a WAVE should copy all of the chunks in the WAVE, even those it chooses not to interpret.
There are no restrictions upon the order of the chunks within a WAVE file, with the exception that the Format chunk must precede the Data chunk.
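
As a rough sketch of how a reader can step over chunks it does not understand, the Python below walks the chunks of a WAVE held in memory. The function name iter_chunks is made up for this example, and it assumes the usual RIFF layout: a 12-byte "RIFF"/size/"WAVE" header, then chunks made of a 4-byte ID, a 4-byte little-endian length, the chunk body, and one pad byte after an odd-length body. The test file is built with Python's standard wave module.

```python
import io
import struct
import wave

def iter_chunks(blob):
    """Yield (chunk_id, payload) for each chunk in a RIFF/WAVE byte string."""
    riff, _size, form = struct.unpack("<4sI4s", blob[:12])
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(blob):
        chunk_id, length = struct.unpack("<4sI", blob[pos:pos + 8])
        yield chunk_id, blob[pos + 8:pos + 8 + length]
        # Chunks are word-aligned: an odd length is followed by one pad byte.
        pos += 8 + length + (length & 1)

# Build a tiny 16-bit mono WAVE in memory so there is something to parse.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(11025)
    w.writeframes(struct.pack("<4h", 0, 1000, -1000, 0))

chunks = dict(iter_chunks(buf.getvalue()))
print(sorted(chunks), len(chunks[b"data"]))
```

A program that only cares about the sound would pick out b"fmt " and b"data" and skip everything else.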

Very Important: Sample Points and Sample Frames
A large part of interpreting WAVE files revolves around the two concepts of sample points and sample frames.
A sample point is a value representing a sample of a sound at a given moment in time. For waveforms with greater than 8-bit resolution, each sample point is stored as a linear, 2's-complement value which may be from 9 to 32 bits wide (as determined by the wBitsPerSample field in the Format chunk, assuming uncompressed PCM format). For example, each sample point of a 16-bit waveform would be a 16-bit word (ie, two 8-bit bytes) where 32767 (0x7FFF) is the highest value and -32768 (0x8000) is the lowest value. For 8-bit (or less) waveforms, each sample point is a linear, unsigned byte where 255 is the highest value and 0 is the lowest value. This signed/unsigned discrepancy between 8-bit and larger resolution waveforms looks like one of those "oops" scenarios where someone at Microsoft decided to change the sign sometime after 8-bit wave files were common but before 16-bit wave files had appeared. Remember: 8-bit sound is unsigned and 16-bit sound is signed. This is important when building your buffers.
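
To make the signed/unsigned difference concrete, here is a small Python sketch that maps both kinds of sample onto the same -1.0 to 1.0 scale. The function names are my own, and the choice of 128 as the 8-bit midpoint follows from the 0 to 255 range described above.

```python
import struct

def decode_8bit(data):
    """8-bit samples are unsigned (0..255); the midpoint 128 is silence."""
    return [(b - 128) / 128.0 for b in data]

def decode_16bit(data):
    """16-bit samples are signed little-endian (-32768..32767); 0 is silence."""
    count = len(data) // 2
    values = struct.unpack("<%dh" % count, data[:count * 2])
    return [v / 32768.0 for v in values]

print(decode_8bit(bytes([0, 128, 255])))
print(decode_16bit(struct.pack("<3h", -32768, 0, 32767)))
```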
Because most CPUs' read and write operations deal with 8-bit bytes, it was decided that a sample point should be rounded up to a size which is a multiple of 8 bits when stored in a WAVE. This makes the WAVE easier to read into memory. If your ADC produces a sample point from 1 to 8 bits wide, a sample point should be stored in a WAVE as an 8-bit byte (ie, unsigned char). If your ADC produces a sample point from 9 to 16 bits wide, a sample point should be stored in a WAVE as a 16-bit word (ie, signed short). If your ADC produces a sample point from 17 to 24 bits wide, a sample point should be stored in a WAVE as three bytes. If your ADC produces a sample point from 25 to 32 bits wide, a sample point should be stored in a WAVE as a 32-bit doubleword (ie, signed long). Etc.
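
The rounding rule is just "round the bit width up to whole bytes". A short Python sketch, with function names that are mine rather than anything from the format itself; the 24-bit case is included because no common C type is three bytes wide:

```python
def container_bytes(bits_per_sample):
    """Bytes used to store one sample point: bit width rounded up to whole bytes."""
    return (bits_per_sample + 7) // 8

def decode_24bit(three_bytes):
    """Read one signed, little-endian, three-byte sample point."""
    return int.from_bytes(three_bytes, byteorder="little", signed=True)

for bits in (8, 12, 16, 20, 24, 32):
    print(bits, "bits ->", container_bytes(bits), "byte(s)")
```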

Furthermore, the data bits should be left-justified, with any remaining (ie, pad) bits zeroed. For example, consider the case of a 12-bit sample point. It has 12 bits, so the sample point must be saved as a 16-bit word. Those 12 bits should be left-justified so that they become bits 4 to 15 inclusive, and bits 0 to 3 should be set to zero. Shown below is how a 12-bit sample point with a value of binary 1010 0001 0111 is formatted left-justified as a 16-bit word.

___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| 1   0   1   0   0   0   0   1   0   1   1   1   0   0   0   0 |
 <---------------------------------------------> <------------->
        12 bit sample point is left justified          rightmost
                                                      4 bits are
                                                      zero padded

But note that, because the WAVE format uses Intel little endian byte order, the least significant byte is stored first in the wave file, like so:

 ___ ___ ___ ___ ___ ___ ___ ___   ___ ___ ___ ___ ___ ___ ___ ___
|   |   |   |   |   |   |   |   | |   |   |   |   |   |   |   |   |
| 0   1   1   1   0   0   0   0 | | 1   0   1   0   0   0   0   1 |
|___|___|___|___|___|___|___|___| |___|___|___|___|___|___|___|___|
 <-------------> <------------->   <----------------------------->
   bits 0 to 3     4 pad bits               bits 4 to 11
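
The same 12-bit example can be checked in a few lines of Python. This is only a sketch with variable names of my own: shift the value left by 4 to left-justify it, pack it low byte first, then undo both steps to get the sample back.

```python
import struct

sample_12bit = 0b101000010111          # the 12-bit value from the example
word = (sample_12bit << 4) & 0xFFFF    # left-justify: the low 4 bits become zero
on_disk = struct.pack("<H", word)      # little-endian: low byte is written first

# Reading it back: unpack as a signed 16-bit word, then shift the pad bits out.
(stored,) = struct.unpack("<h", on_disk)
recovered = stored >> 4                # arithmetic shift keeps the sign

print(format(word, "016b"), on_disk.hex(), recovered)
```

Note that the recovered value is negative, because the top bit of the 12-bit sample is set and sample points wider than 8 bits are 2's-complement.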

For multichannel sounds (for example, a stereo waveform), single sample points from each channel are interleaved. For example, assume a stereo (ie, 2 channel) waveform. Instead of storing all of the sample points for the left channel first, and then storing all of the sample points for the right channel next, you "mix" the two channels' sample points together. You would store the first sample point of the left channel. Next, you would store the first sample point of the right channel. Next, you would store the second sample point of the left channel. Next, you would store the second sample point of the right channel, and so on, alternating between storing the next sample point of each channel. This is what is meant by interleaved data; you store the next sample point of each of the channels in turn, so that the sample points that are meant to be "played" (ie, sent to a DAC) simultaneously are stored contiguously.

The sample points that are meant to be "played" (ie, sent to a DAC) simultaneously are collectively called a sample frame. In the example of our stereo waveform, every two sample points make up one sample frame. This is illustrated below for that stereo example.

      sample       sample              sample
      frame 0      frame 1             frame N
     _____ _____ _____ _____         _____ _____
    | ch1 | ch2 | ch1 | ch2 | . . . | ch1 | ch2 |
    |_____|_____|_____|_____|       |_____|_____|
    |     | = one sample point

For a monophonic waveform, a sample frame is merely a single sample point (ie, there's nothing to interleave). For multichannel waveforms, you should follow the conventions shown below for the order in which to store channels within the sample frame. (Each example below shows a single sample frame of a multichannel waveform.)

      channels       1         2
                 _________ _________
                | left    | right   |
      stereo    |         |         |

                     1         2         3
                 _________ _________ _________
                | left    | right   | center  |
      3 channel |         |         |         |

The sample points within a sample frame are packed together; there are no unused bytes between them. Likewise, the sample frames are packed together with no pad bytes.
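
Because the frames are packed with no padding, pulling the channels apart is just a matter of striding through the data. A minimal Python sketch, assuming 16-bit samples; split_channels is a name made up for this example.

```python
import struct

def split_channels(data, channels):
    """Split interleaved 16-bit little-endian PCM into one list per channel."""
    frame_size = channels * 2                       # 2 bytes per sample point
    frames = len(data) // frame_size
    values = struct.unpack("<%dh" % (frames * channels),
                           data[:frames * frame_size])
    # Channel ch owns every channels-th value, starting at index ch.
    return [list(values[ch::channels]) for ch in range(channels)]

# Three stereo frames: left1, right1, left2, right2, left3, right3.
stereo = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
left, right = split_channels(stereo, 2)
print(left, right)
```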

Voila, I hope I could help you.

Note: To see the diagrams properly, copy them into Notepad.

Author Comment

ID: 1417095
Whoa, ok, good information there, but my actual question is how do you interpret the SAMPLE frames?  Like, if I wanted to draw out the waveform, what would I need to do to extract that information from a sample frame (or frames).

Or maybe, by way of example, suppose I have a simple wave (i.e. a sine wave -- sin(x)) sampled at 44.1kHz mono, what happens when I encode it into WAV format (not worrying about the headers, just how it is represented in the data chunk).

I've increased the point value of the question to 200, as I do appreciate your help!


Author Comment

ID: 1417096
Oh, a small point -- for the data chunk, there is a string that consists of "data".  Now, is there a 4 byte value after this that represents the length of the data chunk, or not?  I've read conflicting reports about this.  Thanks.


Expert Comment

ID: 1417097
How do you interpret the sample frames? It depends on what you mean by "interpreting" a sample frame.
First of all, you should base your code on the format header. Here we'll work with a simple example: a wave file with:
SampleRate 11025
Channels 2 (stereo)
BitsPerSample 16
Data bytes = 1024

Before going on, you're right! After each chunk ID there is a 4-byte value giving the length of the chunk in bytes (not including the chunk ID string or the length field itself).

In our example we can see that each sample is represented by a 16-bit value, so a "char" data type won't be enough to store the samples; we choose 16-bit integers instead (a "short" in C; remember that it needs 2 bytes in memory). We also see that our wave file is stereo, which means that we will have 2 integer values per sample frame. In conclusion, each sample frame is formed by 4 bytes = 2 16-bit integers = 2 sample points (the first integer for the left channel, and the second for the right channel).
Our data chunk says that our wave file has 1024 bytes of data, so we will have 1024/4 = 256 sample frames: 256 values for the left channel and 256 values for the right channel.
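
Here is the same arithmetic as a few lines of Python, in case you want to plug in other header values. The variable names are my own; the bytes-per-frame figure is what the fmt chunk stores as nBlockAlign.

```python
channels = 2
bits_per_sample = 16
sample_rate = 11025
data_bytes = 1024

bytes_per_sample = (bits_per_sample + 7) // 8
block_align = channels * bytes_per_sample    # bytes in one sample frame
frames = data_bytes // block_align           # sample frames in the data chunk
duration_seconds = frames / sample_rate      # how long the sound plays

print(block_align, frames, round(duration_seconds, 4))
```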
I've chosen a stereo example here because it's a little more difficult (but not too much). In this case, if you want to graph the waveform, you should make two graphs, one for each channel (remember that the left speaker can play completely different music than the right speaker). If you want to make only one graph, you can calculate the average of the right and left sample points.
Of course, for mono sound you avoid all this trouble.
It's not very difficult, as you can see. You should just read values according to the wave format. In stereo you read: left1, right1, left2, right2, left3, right3, left4, right4, etc. In mono you read: value1, value2, value3, value4, etc. Finally, you draw those values.
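
To tie this back to the sine wave question, here is a rough Python sketch of both directions for a 16-bit mono file at 44.1 kHz: sample sin(), store each value as a signed 16-bit integer, then read the values back as (time, amplitude) points you could draw. The 441 Hz tone and full-scale amplitude are assumptions chosen only to make the numbers tidy.

```python
import math
import struct

sample_rate = 44100
frequency = 441.0        # assumed tone: exactly 100 samples per cycle
amplitude = 32767        # full scale for signed 16-bit samples

# Encode: sample sin() at each tick and round to a signed 16-bit integer.
samples = [round(amplitude * math.sin(2 * math.pi * frequency * n / sample_rate))
           for n in range(100)]
data_chunk_body = struct.pack("<%dh" % len(samples), *samples)

# Decode: read the values back and pair each one with its time in seconds.
values = struct.unpack("<%dh" % (len(data_chunk_body) // 2), data_chunk_body)
points = [(n / sample_rate, v) for n, v in enumerate(values)]

print(points[0], points[25], points[75])
```

Drawing the waveform is then a matter of plotting those points with time on the x axis and sample value on the y axis.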


Author Comment

ID: 1417098
Ah, I see!  Yes, you're right, it isn't very hard -- I was thinking that there was more involved than just that.  Thank you very much!  I accept your answer.

