How do I read audio samples in PHP from a 16-bit Mono Signed Big Endian .RAW Audio File produced by sox or ffmpeg for an Automatic 4K Radial Kaleidoscope Video generator originally created in Flash?

Adobe Flash, PHP, Ubuntu, sox, Linux
Here's the user interface to the VideoKaleidoscope3840x2160.air Adobe Air application. I'm in the process of getting a 10-year code-signing certificate so that I can share the Air application. The timestamping in Flash and Air doesn't work anymore now that they are past end-of-life, so I figured I'd give my users 10 years of operation with a real certificate.

Exhibit 1: Animated Rotating Radial Kaleidoscope Music VIDEO made from an XML list of beautiful STILL images and a URL to an online MP3 audio file. It makes a beat-synchronized 4K Kaleidoscope music VIDEO out of a list of beautiful high resolution STILL images (can you say Creative Commons?).

The website is here: . It's just an index.php that performs a single simple function for non-real-time video animators: it accepts a URL to an MP3 sound file and a video frameRate, and outputs an XML list of AverageVolume and PeakVolume values for each video frame, with timecode information.

The full source is here (as you can see, this is a really short script if you ignore the HTML, CSS, graphics and YouTube video):

This is a super simple PHP application doing a super simple function, but I am having difficulty and need assistance. I feel really embarrassed that I am so stumped by such a simple thing.

I have a small server set up as a web service for a FREE OPEN SOURCE Adobe Air application that slowly generates 4K video kaleidoscopes as a series of BMP images. I have limited programming experience and am getting confused by the values returned by my 16-bit raw audio analysis. I originally wrote this script in 2012 when I was a Flash student. I recently returned to it and tried to make it 16-bit. The 8-bit version, which was almost exactly the same code, worked great.

The sole purpose of this script (the whole purpose of the educational nonprofit VPS) is to provide FREE frame-level metadata on MP3 files for entry-level music video animators who are interested in finding the beat for video production (AverageVolume, PeakVolume and video timecode information).

Here is an example of my very first test with frame-level audio synchronization. You can really see the acoustical percussion patterns as visual patterns in the spherical refraction index of the kaleidoscope. All I'm doing is plugging amplitude directly into image refraction after applying a user-supplied scaling factor, with a little application-level look-ahead and averaging between frames to smooth the visual decay of the beats.
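
To illustrate the kind of look-ahead and decay smoothing described here, a rough sketch follows. It is not the code from the Air application: the function name smoothAmplitudes, the look-ahead window and the decay factor are all made up for illustration.

```php
<?php
// Hypothetical sketch: look-ahead averaging plus a gradual decay.
// $lookAhead and $decay are illustrative parameters, not values from the app.
function smoothAmplitudes(array $peaks, int $lookAhead = 2, float $decay = 0.8): array
{
    $smoothed = [];
    $previous = 0.0;
    $n = count($peaks);
    for ($i = 0; $i < $n; $i++) {
        // Look ahead: average this frame with the next few frames.
        $window = array_slice($peaks, $i, $lookAhead + 1);
        $target = array_sum($window) / count($window);
        // Rise immediately on a beat, then fall off gradually afterwards.
        $previous = max($target, $previous * $decay);
        $smoothed[] = $previous;
    }
    return $smoothed;
}

// One beat followed by silence, no look-ahead, value halves every frame.
print_r(smoothAmplitudes([0.0, 1.0, 0.0, 0.0], 0, 0.5));
```

With those inputs the beat at frame 1 decays to 0.5 and then 0.25 instead of dropping straight back to 0.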

Basically, it's a computerized visualizer of MP3 music amplitude patterns: it applies a 3D spherical displacement map to a 2D radial kaleidoscope made from rotating hi-res still images listed in an input XML file.

It's a totally AUTOMATIC music video MACHINE. A URL to an mp3 audio file and a LIST of STILL images IN and a 4K Blu-ray, YouTube, Hulu, Amazon Prime and Netflix-ready Kaleidoscope Music Video OUT for FREE if you have a LOT of PATIENCE, a fast computer with a good graphics card and a VERY large hard drive. It's a totally FREE money-making machine for entry-level people who have PATIENCE.

You can do this in Adobe Flash 7 (or scripted with Adobe Photoshop CS5) with these PixelBender filters. Here are the PixelBender filters that will do this functionality in a single step. I'd like the Open Source software community to convert these filters into free plug-ins for the modern Adobe Photoshop and Adobe Premiere applications and also command line versions that will run on Ubuntu.

Here is my Adobe Flash and Air Source Code for this Kaleidoscope Video project.

The development environment is PHP 8 and Apache2 on Ubuntu (with sox and ffmpeg installed). It's a super simple basic vanilla server configuration. I've included URLs to sample data files (copyrighted music but used here on a nonprofit website for educational scientific critical discussion purposes). Keeping absolute simplicity in mind, I'd like to try to get this working with the minimum amount of redesign. This is super basic stuff. It's supposed to be SUPER SIMPLE so that anybody can replicate it and scale it.

Here is the meat of the code. This is the PHP sample-reading logic in the source slightly modified to work standalone with the included data files.

Here is the FIRST code snippet for the standalone test. (Below in the SECOND code snippet I provide a standalone example that will accept any mp3 via a GET parameter or command line argument.)

//Experts Exchange 16-bit Audio Analysis Basic Question Standalone Sample Script Example 1

$song="";//Original Song for testing
$mp3TEMP="";//Rock On by David Essex
$rawTEMP="";//16-bit Mono 22KHz Signed Big Endian RAW Audio

echo "<?xml version='1.0' encoding='UTF-8'?>\r\n";


echo "\t<frames mp3File='$song' frameRate='$frameRate' mp3Length='$mp3LengthSeconds' mp3LengthSMPTE='$mp3LengthSMPTE' totalFrames='$totalFrames'>\r\n";

        if($high_byte>127) $high_byte=$high_byte-128; //remove the sign
        $amplitude=1-(($high_byte*255+$low_byte)/32768); //normalize with silence 0 and full loud 1
        if($amplitude>$peakVolume) $peakVolume=$amplitude;
    echo "\t\t<peakVolume frame='$realFrame' seconds='$seconds' timeCodeSMPTE='$timeCodeSMPTE' averageVolume='$averageVolume'>$peakVolume</peakVolume>\r\n";

echo "\t</frames>";
function getTimeCode($time, $inputFrameRate, $smpte=1)
{
    $min = floor($time / 60);
    $hour = floor($min / 60);
    if ($min >= 60) $min = $min - ($hour * 60); //keep minutes in the range 0-59
    $sec = floor($time) % 60;
    $milli = $time - floor($time); //fractional part of the current second
    $frame = round($milli*$inputFrameRate);

    //zero-pad each field to at least two digits
    $shour = str_pad($hour, 2, "0", STR_PAD_LEFT);
    $smin = str_pad($min, 2, "0", STR_PAD_LEFT);
    $ssec = str_pad($sec, 2, "0", STR_PAD_LEFT);
    $sframe = str_pad($frame, 2, "0", STR_PAD_LEFT);

    $tcSMPTE = $shour . ":" . $smin . ":" . $ssec . ":" . $sframe;
    $tc = $shour . ":" . $smin . ":" . $ssec . substr(number_format($milli,3), 1); //append ".mmm"

    //assumption: the $smpte flag selects which of the two formats is returned
    if ($smpte) return $tcSMPTE;
    return $tc;
}

For some reason, I'm NOT getting the correct values out of this. It all worked great when the RAW file was signed 8-bit (technically 7 bits because I throw away the sign). I wanted to improve the fidelity of my kaleidoscope scaling to 16-bit (technically 15-bit because I throw away the sign). I figured that would give me finer steps for the in-between scaling values.

Something is wrong right here:
   if($high_byte>127) $high_byte=$high_byte-128; //remove the sign
   $amplitude=1-(($high_byte*255+$low_byte)/32768); //normalize with silence 0 and full loud 1

    if($amplitude>$peakVolume) $peakVolume=$amplitude;
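
For comparison, here is a minimal sketch of how the two bytes of a signed big-endian sample are usually combined. It assumes $high_byte and $low_byte hold values from 0 to 255 (for example from ord(); the read loop is not shown in the snippet above) and that the data is two's complement, which is the usual encoding for signed PCM. The function name sampleToAmplitude is hypothetical.

```php
<?php
// Hypothetical helper (not in the original script): turn the two bytes of one
// big-endian signed 16-bit sample into a loudness value between 0 and 1.
function sampleToAmplitude(int $high_byte, int $low_byte): float
{
    // Combine the bytes: the high byte is worth 256 (not 255) low-byte steps.
    $sample = ($high_byte << 8) | $low_byte;   // unsigned, 0..65535

    // Two's complement: the patterns 32768..65535 stand for -32768..-1.
    if ($sample >= 32768) {
        $sample -= 65536;
    }

    // Loudness is the distance from zero: 0 is silence, 32768 is full scale.
    return abs($sample) / 32768;
}

echo sampleToAmplitude(0x00, 0x00), "\n"; // silence
echo sampleToAmplitude(0x7F, 0xFF), "\n"; // largest positive sample
echo sampleToAmplitude(0x80, 0x00), "\n"; // largest negative sample
echo sampleToAmplitude(0xFF, 0xFF), "\n"; // -1, one step away from silence
```

The key differences from the two lines above are the multiplier (256 rather than 255), the sign handling (subtracting 65536 from the combined value instead of 128 from the high byte) and dropping the "1 -" inversion, since the absolute value already puts silence at 0.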

I understand binary. I'm used to visualizing waveforms in audio editors as having LOUD be big spikes up and down across the zero axis. So, I think of loud as being +32767 and -32768 with 0 being silence. It's the absolute value. But that's not how I understand it to be represented. Here is the LOGIC that is IMPOSSIBLE for my BRAIN and I'm sure MANY OTHER BRAINS:

As I read the sox documentation, a value of 0 is supposed to indicate full loudness. That doesn't make sense to me. How do you put a sign on a zero if that's supposed to be full volume? That's what's confusing me. It's NOT intuitive. What I'm trying to do is read the correct sample value, then normalize it and return a floating point value between 0 and 1, with 0 being silence and 1 being full loudness.

Here is an example of an MP3 that I'd like to analyze for testing purposes. The MP3 starts with silence with a really low level of background noise, so the first frames should have very low values indicating near silence, until the bass starts doing the hook. Instead the values are high for the whole song and are nearly uniform. So something in my logic is wrong.
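
The symptom (uniformly high values even during near silence) can be reproduced with a handful of made-up samples, assuming the RAW file is two's complement. In that encoding a quiet signal hovers just above and below zero, so its byte pairs look like 0x00 0x01 (+1) or 0xFF 0xFF (-1). The sketch below wraps the formula from the question in a hypothetical function, originalAmplitude, and feeds it such samples.

```php
<?php
// The formula from the question, wrapped in a function so it can be tested.
function originalAmplitude(int $high_byte, int $low_byte): float
{
    if ($high_byte > 127) $high_byte = $high_byte - 128; // clears the top bit only
    return 1 - (($high_byte * 255 + $low_byte) / 32768);
}

// Made-up near-silent samples: +1, -1, +2, -2 as big-endian byte pairs.
$quiet = [[0x00, 0x01], [0xFF, 0xFF], [0x00, 0x02], [0xFF, 0xFE]];

$peakVolume = 0;
foreach ($quiet as [$high_byte, $low_byte]) {
    $amplitude = originalAmplitude($high_byte, $low_byte);
    if ($amplitude > $peakVolume) $peakVolume = $amplitude;
}

// Prints 0.99997: near silence is reported as almost full volume.
printf("peak reported for near silence: %.5f\n", $peakVolume);
```

Small positive samples come out near 1 and small negative samples near 0, so any frame that contains even faint noise ends up with a peak close to 1, which matches values that stay high for the whole song.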

Here's the code analyzing this song as a 16-bit signed-integer big endian RAW file. Processing the local MP3 file takes about 2 seconds but it takes about 5 seconds for the browser to render all the XML with thousands of frames, so please be patient.

Here are copies of the intermediate temp files that are deleted right away during the process.

I loaded the RAW file into an audio editor with the following settings and it sounded fine.

22050 Hz 16-bit Mono Big Endian Signed

It looks like sox is doing its job exactly as the parameters say. It sounds fine. Below, David suggested that I rewrite this in Perl or use ffmpeg instead for reliability and flexibility. In this case, rather than rewrite everything (I would have to find the ffmpeg parameters), I'd just like to get my logic working in this example without a rewrite, starting from the RAW file. I'm just interested in grabbing the samples from memory with the correct values.
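
For reference, a sketch of how the conversion commands could be assembled in PHP for either tool. The helper names and file names are placeholders, and the flags are the ones I believe produce headerless 22050 Hz 16-bit mono signed big-endian output; please check them against the sox and ffmpeg man pages for the installed versions before relying on them.

```php
<?php
// Hypothetical helpers: build shell commands that decode an MP3 to headerless
// 16-bit signed big-endian mono PCM. File names are placeholders.
function buildSoxCommand(string $mp3, string $raw, int $rate = 22050): string
{
    // Output format options go before the output file name.
    return sprintf(
        'sox %s -t raw -r %d -c 1 -b 16 -e signed-integer -B %s',
        escapeshellarg($mp3), $rate, escapeshellarg($raw)
    );
}

function buildFfmpegCommand(string $mp3, string $raw, int $rate = 22050): string
{
    // s16be = signed 16-bit big-endian raw PCM; -y overwrites the output file.
    return sprintf(
        'ffmpeg -y -i %s -ac 1 -ar %d -f s16be -acodec pcm_s16be %s',
        escapeshellarg($mp3), $rate, escapeshellarg($raw)
    );
}

echo buildSoxCommand('song.mp3', 'song.raw'), "\n";
echo buildFfmpegCommand('song.mp3', 'song.raw'), "\n";
```

Either string could then be passed to exec() or shell_exec(). Using escapeshellarg() matters here because the MP3 name ultimately comes from a user-supplied URL.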

Perhaps the experts would use more efficient and technically complex means to produce 4K Video Kaleidoscopes and to analyze raw audio from MP3s. In any case, I have the raw audio in memory from the raw file. That's the majority of the work right there. Fixing about 4 lines of sample-reading code should get this working correctly, if NOT optimized for performance. Everything worked as designed when I was analyzing 8-bit RAW audio, but the switch to 16-bit has left me confused.
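
As a sketch of what the sample-reading loop could look like once the raw bytes are in memory: the hypothetical function analyzeFrames below walks the data one video frame at a time, decodes each sample with unpack('n*') (unsigned big-endian 16-bit), applies the two's complement sign and records peak and average loudness. It assumes mono two's complement data. The array keys are illustrative only, and the XML output and timecodes are left out.

```php
<?php
// Hypothetical helper: per-video-frame peak and average loudness (0..1) from
// raw 16-bit signed big-endian mono PCM held in a string.
function analyzeFrames(string $raw, int $sampleRate, float $frameRate): array
{
    $totalSamples = intdiv(strlen($raw), 2);       // 2 bytes per sample
    $samplesPerFrame = $sampleRate / $frameRate;   // e.g. 22050 / 30 = 735
    $totalFrames = (int) ceil($totalSamples / $samplesPerFrame);

    $frames = [];
    for ($frame = 0; $frame < $totalFrames; $frame++) {
        $start = (int) round($frame * $samplesPerFrame);
        $end = min((int) round(($frame + 1) * $samplesPerFrame), $totalSamples);
        $count = $end - $start;
        $peak = 0.0;
        $sum = 0.0;
        if ($count > 0) {
            // 'n*' reads each byte pair as an unsigned big-endian 16-bit number.
            foreach (unpack('n*', substr($raw, $start * 2, $count * 2)) as $s) {
                if ($s >= 32768) $s -= 65536;      // two's complement sign
                $amplitude = abs($s) / 32768;      // 0 = silence, 1 = full scale
                $sum += $amplitude;
                if ($amplitude > $peak) $peak = $amplitude;
            }
        }
        $frames[] = [
            'peakVolume' => $peak,
            'averageVolume' => $count > 0 ? $sum / $count : 0.0,
        ];
    }
    return $frames;
}

// Synthetic data: samples 0, +16384, -32768, -1 at 4 Hz, 2 video frames per second.
$raw = pack('n*', 0, 0x4000, 0x8000, 0xFFFF);
print_r(analyzeFrames($raw, 4, 2.0));
```

In the real script the string would come from something like file_get_contents($rawTEMP). Unpacking one frame at a time keeps memory use small compared with unpacking the whole file into a single PHP array.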

As you can see, all the TIMING and LENGTH values are CORRECT but all the VOLUME values are wrong as of when I submitted this question.

I realize that this is a super simple application but for the life of me I seem to have no luck finding my own solution.

Thank you.
