Troubleshooting Question

How do I read audio samples in PHP from a 16-bit Mono Signed Big Endian .RAW Audio File produced by sox or ffmpeg for an Automatic 4K Radial Kaleidoscope Video generator originally created in Flash?

Last Modified: 2021-02-26

It cost me $239.88 to join the Experts Exchange to ask this single SIMPLE BASIC QUESTION of the online Experts here.

Here's the user interface to the VideoKaleidoscope3840x2160.air Adobe Air application. I'm in the process of getting a 10-year code-signing certificate to share the Air application. The timestamping in Flash and Air doesn't work any more now that they are past end-of-life. I figured I'd give my users 10 years of operation with a real certificate.



Exhibit 1: Animated Rotating Radial Kaleidoscope Music VIDEO made from an XML list of beautiful STILL images and a URL to an online MP3 audio file. It makes a beat-synchronized 4K Kaleidoscope music VIDEO out of a list of beautiful high resolution STILL images (can you say Creative Commons?).

The website is here: http://mp3cruncher.org . It's just an index.php that performs a simple single function for non-real time video animators. It accepts a URL to an MP3 sound file and a video frameRate and outputs an XML list of AverageVolume and PeakVolume by video frame with timecode information.

The full source is here (as you can see, this is a really short script if you ignore the HTML, CSS, graphics and YouTube video):


My name is Ken Meyering. In my writings, I call myself DROID Ken. I am a severely mentally ill educator. I'm a really cool crazy pothead who seems to be a MISTAKE-MAKING MACHINE. I just try to make all of my mistakes EDUCATIONAL for others by being TOTALLY OPEN and TRANSPARENT about my own UNIQUE PERSONAL SITUATION.

So, this may NOT seem like a typical appropriate Experts Exchange question. I am NOT a programmer looking for a job. I live on Social Security Disability. I am totally free to speak my mind. I over-share compulsively so that EVERYTHING I say is INAPPROPRIATE and POLITICALLY INCORRECT. I provide WAY TOO MUCH INFORMATION about myself for EDUCATIONAL PURPOSES. These are ALL symptoms of my VERY REAL severe mental illness. I am requesting that you please assist me IN SPITE of my mental illness and oversharing. Please don't write me off.

My $1850/month public income is secure. I try to give back. I pretend that I am a one-man scientific think tank consulting for the taxpayers by designing the high-level conceptual outline for an all-new, totally private, completely nonprofit, all-virtual, free and real time futuristic utopian global monetary system. (Getting rid of paper money, doing a worldwide banking jubilee and starting over with an all-virtual economy.)  I am a one-trick pony. A worldwide banking quantum leap is my one trick, which everything else supports.

It's almost like my whole life story is nothing but FODDER for educational scientific critical discussion. I'm like a human CARTOON CHARACTER. Please consider this question and my website to be high-visibility public educational examples for a mass market worldwide audience.

Please bear with me for what on the surface may appear to be stupidity and intellectual laziness. I have a severe mental illness that affects my ability to program. I can pretend to be a programmer, but I'm fooling nobody. I'm a man in need of expert help.

I am an online science fiction writer who has completed several community college introductory programming classes. I know how to use MySQL. I took several Unix System Administration courses that gave me the skills to configure a really basic Ubuntu VPS. I can hack my way to getting simple programs working based on OTHER people's OPEN SOURCE examples and functions, but I feel the need to share this disclaimer as I publicly reach out for help. Just pretend you are talking to a smart student with an invisible mental disability.

I have a form of schizophrenia that affects my MATH and LOGIC abilities and my individual initiative and it severely LIMITS my technical RETENTION and the complexity of what I am capable of doing with computers on my own without the help of more intelligent and experienced others. I have very little technical retention and rely on written notes and code to get things working.

I'm a smart guy but I wasn't able to successfully complete a community college ELEMENTARY SYMBOLIC LOGIC class and I couldn't make it past the UNIT CIRCLE in precalculus. I had to withdraw from both of those classes. Basic logic confuses me. My brain just cannot do the operations and translations. The neural connections just aren't there. I hit an INVISIBLE mental wall. I can forget about taking STATISTICS. That is beyond my ken. So much for neural networks, pattern recognition and quantum physics.

Things that are super easy and intuitive to normal people are hard for me. I lack COMMON SENSE but not for lack of trying. That's why I like writing fiction where anything goes and it's all imaginary fantasy. For a person who suffers from delusions due to no fault of his own, fantasy writing beats programming as an intrinsically rewarding activity. Sorry, Dad, I'm NOT a computer genius. This is the best that I can do with computers.

It is imperative that the solution here be as SIMPLE as possible even if it's NOT the fastest way to produce the MP3 audio analysis. This is a super simple PHP application doing a super simple function, but because of my illness, I am flabbergasted, easily frustrated and confused and need assistance please. I feel really embarrassed that I am so stumped by such a simple thing. I am totally serious and truly need help.

I have a small server set up as a web service for a FREE OPEN SOURCE Adobe Air application that slowly generates 4K video kaleidoscopes as a series of BMP images. I have limited programming experience and am getting confused by the values that are being returned by my 16-bit raw audio analysis. I originally wrote this script in 2012 when I was a Flash student. I just returned to it and tried to make it 16-bit. It worked great almost exactly as it is in 8-bit.

The sole purpose of this script (the whole purpose of the educational nonprofit VPS) is to provide FREE frame-level metadata on MP3 files for entry-level music video animators who are interested in finding the beat for video production (AverageVolume, PeakVolume and video timecode information).

Here is an example of my very first test with frame-level audio synchronization. You can really see the acoustical percussion patterns as visual patterns in the spherical refraction index of the kaleidoscope. All I'm doing is directly plugging amplitude into image refraction after applying a variable scaling factor input by the user with a little bit of application-level look ahead and averaging between frames to smooth the visual decay of the beats.
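The look-ahead averaging described above could be sketched in PHP roughly as follows. This is only an illustration: the real smoothing lives in the Flash/Air application and is not shown here, so the function name smoothWithLookAhead(), the window size and the scale factor are all assumptions.

```php
<?php
// Sketch only: average each frame's volume with the next few frames
// (look-ahead), then apply a user-chosen scaling factor.
// smoothWithLookAhead() is a hypothetical helper, not code from the application.

function smoothWithLookAhead(array $volumes, int $lookAhead, float $scale): array
{
    $smoothed = [];
    $count = count($volumes);
    for ($i = 0; $i < $count; $i++) {
        // Current frame plus up to $lookAhead following frames.
        $window = array_slice($volumes, $i, $lookAhead + 1);
        $smoothed[] = $scale * array_sum($window) / count($window);
    }
    return $smoothed;
}

// One loud frame surrounded by silence gets spread over its neighbor.
print_r(smoothWithLookAhead([0.0, 1.0, 0.0, 0.0], 1, 2.0));
```

A larger look-ahead window makes each beat ramp in earlier and fade more gently, at the cost of blurring sharp hits.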

Basically, it's a computerized visualizer of MP3 music amplitude patterns by applying a 3D spherical displacement map to a 2D radial kaleidoscope made from rotating hi-res still images INPUT by an XML file.

It's a totally AUTOMATIC music video MACHINE. A URL to an mp3 audio file and a LIST of STILL images IN and a 4K Blu-ray, YouTube, Hulu, Amazon Prime and Netflix-ready Kaleidoscope Music Video OUT for FREE if you have a LOT of PATIENCE, a fast computer with a good graphics card and a VERY large hard drive. It's a totally FREE money-making machine for entry-level people who have PATIENCE.

You can do this in Adobe Flash 7 (or scripted with Adobe Photoshop CS5) with these PixelBender filters. Here are the PixelBender filters that will do this functionality in a single step. I'd like the Open Source software community to convert these filters into free plug-ins for the modern Adobe Photoshop and Adobe Premiere applications and also command line versions that will run on Ubuntu.





Here is my Adobe Flash and Air Source Code for this Kaleidoscope Video project.


Pretend I am my 97-year-old grandmother, who doesn't know how to use Google even though my mother bought her a computer and she pays for her monthly internet access. We Skype. She refuses to learn. She has no initiative. She just doesn't have the motivation to learn NEW THINGS. She BELIEVES she doesn't know how to use the computer and NOTHING will shake that BELIEF. She's had breast cancer for 4 years but refuses to get it cut out because she's afraid she will die under general anesthesia.

She has a MENTAL BLOCK. It's not LOGICAL or RATIONAL. I still love her dearly. She's a wonderful loving human being, like me. That's grandma. I have her DNA, for better or worse. I'm not LOGICAL. I know sharing TOO MUCH INFORMATION offends people and drives them away. But I still do it COMPULSIVELY in spite of that knowledge. I'm ALWAYS in BRAIN DUMP MODE. I share my thoughts and feelings without filters with reckless abandon. I am a human CARTOON of exaggerated OPENNESS and TRANSPARENCY.

Assume that I do NOT know how to look up the command line parameters to ffmpeg. I want this little Experts Exchange conversation to embed all of that information into the conversation with the voluntary assistance and participation of members of this online community.

The development environment is PHP 8 and Apache2 on Ubuntu (with sox and ffmpeg installed). It's a super simple basic vanilla server configuration. I've included URLs to sample data files (copyrighted music but used here on a nonprofit website for educational scientific critical discussion purposes). Keeping absolute simplicity in mind, I'd like to try to get this working with the minimum amount of redesign. This is super basic stuff. It's supposed to be SUPER SIMPLE so that anybody can replicate it and scale it.
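For reference, the conversion step on a setup like this could look roughly like the sketch below. It only builds the command strings; the helper names and the file names song.mp3 and song.raw are made up for illustration, and it assumes the installed sox build can read MP3. Both commands ask for headerless 16-bit mono signed big-endian audio at 22050 Hz.

```php
<?php
// Sketch: build shell commands that convert an MP3 into headerless
// 16-bit mono signed big-endian PCM. The function names and file
// paths are hypothetical; adjust them to the real script.

function buildFfmpegRawCommand(string $mp3Path, string $rawPath, int $sampleRate = 22050): string
{
    // -y: overwrite output, -ac 1: mono, -ar: sample rate,
    // -f s16be: raw signed 16-bit big-endian output format
    return sprintf(
        'ffmpeg -y -i %s -ac 1 -ar %d -f s16be %s',
        escapeshellarg($mp3Path),
        $sampleRate,
        escapeshellarg($rawPath)
    );
}

function buildSoxRawCommand(string $mp3Path, string $rawPath, int $sampleRate = 22050): string
{
    // -t raw: headerless output, -r: sample rate,
    // -e signed-integer -b 16: sample encoding and size,
    // -c 1: mono, -B: big-endian byte order
    return sprintf(
        'sox %s -t raw -r %d -e signed-integer -b 16 -c 1 -B %s',
        escapeshellarg($mp3Path),
        $sampleRate,
        escapeshellarg($rawPath)
    );
}

echo buildFfmpegRawCommand('song.mp3', 'song.raw'), PHP_EOL;
echo buildSoxRawCommand('song.mp3', 'song.raw'), PHP_EOL;
```

Either string could then be passed to exec() or shell_exec(). Running the paths through escapeshellarg() matters here because the MP3 location ultimately comes from user input.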

Here is the meat of the code. This is the PHP sample-reading logic in the source slightly modified to work standalone with the included data files.



Here is the FIRST code snippet for the standalone test. (Below in the SECOND code snippet I provide a standalone example that will accept any mp3 via a GET parameter or command line argument.)

//Experts Exchange 16-bit Audio Analysis Basic Question Standalone Sample Script Example 1

$song="http://mp3cruncher.org/examples/David-Essex_Rock-On.mp3";//Original Song for testing
$mp3TEMP="http://mp3cruncher.org/examples/3572ef3c9121e9faa59fdb33b2d7aa84.mp3";//Rock On by David Essex
$rawTEMP="http://mp3cruncher.org/examples/3572ef3c9121e9faa59fdb33b2d7aa84.raw";//16-bit Mono 22KHz Signed Big Endian RAW Audio

echo "<?xml version='1.0' encoding='UTF-8'?>\r\n";


echo "\t<frames mp3File='$song' frameRate='$frameRate' mp3Length='$mp3LengthSeconds' mp3LengthSMPTE='$mp3LengthSMPTE' totalFrames='$totalFrames'>\r\n";

        if($high_byte>127) $high_byte=$high_byte-128; //remove the sign
        $amplitude=1-(($high_byte*255+$low_byte)/32768); //normalize with silence 0 and full loud 1
        if($amplitude>$peakVolume) $peakVolume=$amplitude;
    echo "\t\t<peakVolume frame='$realFrame' seconds='$seconds' timeCodeSMPTE='$timeCodeSMPTE' averageVolume='$averageVolume'>$peakVolume</peakVolume>\r\n";

echo "\t</frames>";
function getTimeCode($time, $inputFrameRate, $smpte=1)
{
    $min = floor($time / 60);
    $hour = floor($min / 60);
    if ($min >= 60) $min = $min - ($hour * 60); //keep minutes in the 0-59 range
    $sec = floor($time) % 60; //whole seconds within the current minute
    $milli = $time - floor($time); //fractional part of the current second
    $frame = round($milli*$inputFrameRate);

    $shour = $hour;
    $smin = $min;
    $ssec = $sec;

    $sframe = $frame;

    //left-pad every field to two digits
    if (strlen($shour) < 2)
        $shour = "0" . $shour;

    if (strlen($smin) < 2)
        $smin = "0" . $smin;

    if (strlen($ssec) < 2)
        $ssec = "0" . $ssec;

    if (strlen($sframe) < 2)
        $sframe = "0" . $sframe;

    $tcSMPTE = $shour . ":" . $smin . ":" . $ssec . ":" . $sframe;
    $tc = $shour . ":" . $smin . ":" . $ssec . number_format($milli,3);

    //assumed from the $smpte parameter: a truthy value selects the frame-based form
    if ($smpte) return $tcSMPTE;
    return $tc;
}
For some reason, I'm NOT getting the correct values out of this. It all worked great when the RAW file was signed 8-bit (technically 7 bits because I throw away the sign). I wanted to improve the fidelity of my kaleidoscope scaling to be 16-bit (technically 15-bit because I throw away the sign). I figured that will make for better scaling in-betweens.

Something is wrong right here:
   if($high_byte>127) $high_byte=$high_byte-128; //remove the sign
   $amplitude=1-(($high_byte*255+$low_byte)/32768); //normalize with silence 0 and full loud 1

    if($amplitude>$peakVolume) $peakVolume=$amplitude;
I understand binary. I'm used to visualizing wave forms in audio editors as having LOUD be big spikes up and down across the zero axis. So, I think of loud as being +32768 and -32768 with 0 being silence. It's the absolute value. But that's not how it's represented. Here is the LOGIC that is IMPOSSIBLE for my BRAIN and I'm sure MANY OTHER BRAINS:

According to the sox documentation, a value of 0 is supposed to indicate full loudness. I'm trying to get the correct value then to normalize and return a floating point value between 0 and 1 with 0 being silence and 1 being full loudness. That doesn't make sense to me. How do you put a sign on a zero if that's supposed to be full volume? That's what's confusing me. For me, it doesn't make sense. It's NOT intuitive.
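In signed 16-bit PCM, a sample of 0 is the resting point (silence), and the loudest samples sit near +32767 and -32768. Negative values are stored in two's complement, so -1 is the byte pair 0xFF 0xFF, not "1 with a sign bit." With the formula as posted, a tiny positive sample such as +1 works out to 1-(1/32768), which is about 1.0, and that would likely explain values that stay high for the whole song. The sketch below shows one way to decode a sample; the helper names are hypothetical.

```php
<?php
// Sketch: decode one 16-bit signed big-endian sample from its two bytes
// and normalize it so silence is 0.0 and full scale is 1.0.
// decodeSample16BE() and normalizedAmplitude() are hypothetical names.

function decodeSample16BE(int $highByte, int $lowByte): int
{
    // The high byte counts in steps of 256 (not 255).
    $unsigned = ($highByte << 8) | $lowByte;   // 0..65535
    // Two's complement: anything with the top bit set is negative.
    return $unsigned >= 32768 ? $unsigned - 65536 : $unsigned; // -32768..32767
}

function normalizedAmplitude(int $sample): float
{
    // Loudness is the distance from zero, regardless of sign.
    return abs($sample) / 32768;
}

$examples = [
    'silence'          => [0x00, 0x00],
    'tiny positive'    => [0x00, 0x01],
    'tiny negative'    => [0xFF, 0xFF],
    'loudest positive' => [0x7F, 0xFF],
    'loudest negative' => [0x80, 0x00],
];

foreach ($examples as $label => [$high, $low]) {
    $sample = decodeSample16BE($high, $low);
    printf("%-17s sample=%6d amplitude=%.5f\n", $label, $sample, normalizedAmplitude($sample));
}
```

The absolute value is what matches the "big spikes up and down across the zero axis" picture from an audio editor: both +32767 and -32768 come out as roughly 1.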

Here is an example of an MP3 that I'd like to analyze for testing purposes. The MP3 starts with silence with a really low level of background noise, so the first frames should have very low values indicating near silence, until the bass starts doing the hook. Instead the values are high for the whole song and are nearly uniform. So something in my logic is wrong.
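A quick way to check that expectation is to run the per-frame analysis on synthetic data where the answer is known. The sketch below assumes the raw audio is already in a PHP string; analyzeFrames() is a hypothetical helper, not the function used on the site, and it leaves out the XML output and timecodes.

```php
<?php
// Sketch: per-frame peak and average volume from raw 16-bit signed
// big-endian mono PCM held in a string. analyzeFrames() is hypothetical.

function analyzeFrames(string $raw, int $sampleRate, float $frameRate): array
{
    $totalSamples = intdiv(strlen($raw), 2);
    $totalFrames = (int) ceil($totalSamples * $frameRate / $sampleRate);
    $frames = [];

    for ($f = 0; $f < $totalFrames; $f++) {
        // Frame boundaries are computed from the frame index to avoid drift
        // when the sample rate is not a whole multiple of the frame rate.
        $start = (int) floor($f * $sampleRate / $frameRate);
        $end = min($totalSamples, (int) floor(($f + 1) * $sampleRate / $frameRate));
        $peak = 0.0;
        $sum = 0.0;
        $count = max(0, $end - $start);

        if ($count > 0) {
            // 'n*' reads unsigned 16-bit big-endian integers.
            foreach (unpack('n*', substr($raw, $start * 2, $count * 2)) as $unsigned) {
                $sample = $unsigned >= 32768 ? $unsigned - 65536 : $unsigned;
                $amplitude = abs($sample) / 32768;
                $sum += $amplitude;
                if ($amplitude > $peak) $peak = $amplitude;
            }
        }

        $frames[] = [
            'peakVolume' => $peak,
            'averageVolume' => $count > 0 ? $sum / $count : 0.0,
        ];
    }
    return $frames;
}

// Synthetic test signal at 22050 Hz and 30 fps (735 samples per frame):
// one frame of silence, then one frame held at half scale (-16384).
$raw = str_repeat(pack('n', 0), 735) . str_repeat(pack('n', 0xC000), 735);
print_r(analyzeFrames($raw, 22050, 30.0));
```

If the silent frame reports 0 and the half-scale frame reports 0.5, the sample decoding is behaving, and the same function can then be pointed at the contents of the real RAW file.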

Here's the code analyzing this song as a 16-bit signed-integer big endian RAW file. Processing the local MP3 file takes about 2 seconds but it takes about 5 seconds for the browser to render all the XML with thousands of frames, so please be patient.


Here are copies of the intermediate temp files that are deleted right away during the process.



I loaded the RAW file into an audio editor with the following settings and it sounded fine.

22050 Hz 16-bit Mono Big Endian Signed
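With those settings every second of audio takes 22050 x 2 = 44100 bytes, so the file size alone gives the song length and the number of video frames. A small sketch of that arithmetic, with hypothetical helper names:

```php
<?php
// Sketch: derive duration and video frame count from the size of a
// headerless PCM file. Both function names are hypothetical.

function rawDurationSeconds(int $byteCount, int $sampleRate = 22050, int $bytesPerSample = 2, int $channels = 1): float
{
    // Bytes per second = sample rate x bytes per sample x channels.
    return $byteCount / ($sampleRate * $bytesPerSample * $channels);
}

function totalVideoFrames(float $durationSeconds, float $frameRate): int
{
    // Round up so a partial final frame still gets an entry.
    return (int) ceil($durationSeconds * $frameRate);
}

$bytes = 88200; // for a real file this would come from filesize()
$duration = rawDurationSeconds($bytes);
printf("%.3f seconds, %d frames at 30 fps\n", $duration, totalVideoFrames($duration, 30.0));
```

If the computed duration matches the known song length, the sample rate, bit depth and channel count assumed by the script agree with what the converter actually wrote.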

It looks like sox is doing its job exactly as the parameters say. It sounds fine. Below, David suggested that I rewrite this in perl or use ffmpeg instead for reliability and flexibility. In this case, rather than rewrite everything (I would have to find the ffmpeg parameters), I'd like just to get my logic working here in this example without a rewrite, starting with the RAW file. I'm just interested in grabbing the samples from memory with the correct values.

Perhaps the experts would use more efficient and technically complex means to produce 4K Video Kaleidoscopes and to analyze raw audio from MP3s. In any case, I have the raw audio in memory from the raw file. That's the majority of the work right there. Fixing about 4 lines of sample-reading code should get this working correctly, if NOT optimized for performance. Everything worked as designed when I was analyzing 8-bit RAW audio, but the switch to 16-bit has left me confused.

As you can see, all the TIMING and LENGTH values are CORRECT but all the VOLUME values are wrong as of when I submitted this question.

I realize that this is a super simple application but for the life of me I seem to have no luck finding my own solution.

Thank you.