Compare audio files programmatically

Posted on 2013-12-05
Last Modified: 2013-12-17
Hi everyone, I'm looking for a way to compare audio files of voice recordings.
One file will be recorded in the studio.
Users are asked to imitate that recording as closely as possible.
We need to compare each user recording to the original and determine programmatically, preferably using server-side scripting (PHP or Node.js), which ones are most similar to the original.

I was thinking we could make a spectrogram out of each recording and compare the bitmap data looking for similarities. I guess there must be other ways of doing this.
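
A minimal sketch of the spectrogram idea, under several assumptions: both recordings are already decoded to mono float samples at the same sample rate and are roughly aligned in time. The names `spectrogram`, `spectrogram_similarity` and `tone`, and the frame sizes, are illustrative only and do not come from any library. A naive DFT keeps the sketch dependency-free; real code would use an FFT.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram via a naive DFT: one list of bin magnitudes per frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces spectral leakage at the frame edges.
        windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
                    for n, s in enumerate(frame)]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(windowed))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def spectrogram_similarity(a, b):
    """Cosine similarity of two spectrograms, clamped to [0.0, 1.0].

    Only the frames both spectrograms have in common are compared.
    """
    n = min(len(a), len(b))
    flat_a = [m for frame in a[:n] for m in frame]
    flat_b = [m for frame in b[:n] for m in frame]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return min(1.0, dot / norm) if norm else 0.0

def tone(freq, rate=8000, n=512):
    """Synthetic test signal: a pure sine tone (stand-in for a real recording)."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

reference = spectrogram(tone(440))
score_same = spectrogram_similarity(reference, spectrogram(tone(440)))
score_other = spectrogram_similarity(reference, spectrogram(tone(1500)))
print(score_same, score_other)
```

A score like this is sensitive to timing, loudness and microphone differences, so it is a starting point for experiments rather than a measure of how good an imitation is.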

Tips & tricks are more than welcome!
Question by: Dreammonkey
LVL 34

Assisted Solution

Slick812 earned 500 total points
ID: 39698700
Greetings Dreammonkey, you say you need to "compare" audio files to determine "which ones are most similar to the original".
You will never be able to do this. Even if you had one or more people listen to the recordings, what criteria would those human judges use to differentiate the audio entries? With more than one human judge, there will be disagreements about what is "similar". You can probably produce a spectrogram image (or use some other digital analysis method), but there is no way to program a series of IF-THEN rules that yields a subjective opinion about what is or is not "most similar", even if the audio files contain only a single spoken word. The fine points and complexities of digital audio are massive: even just changing the microphone for the same person can alter the digital footprint of the recording.

You might consider having your site users vote for the audio entry they think is "most similar", after they have listened to the entries and formed an opinion.

Author Comment

ID: 39699430
Thanks for your comment, Slick812.

I understand the challenge; that's part of the reason I posted the question on this forum ;)
I do understand the complexity of audio recordings and microphones, and you certainly have a point here! ;)

Having read how Shazam works, I do believe it must be possible to compare pitch and rhythm between two recordings?
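
One standard technique for comparing two sequences that differ in speed is dynamic time warping (DTW). The sketch below assumes a pitch contour (one pitch value in Hz per frame) has already been extracted from each recording by some other tool; the contours shown are made-up numbers, and `dtw_distance` is an illustrative name, not a library function. It is not how Shazam works.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences.

    The total cost is divided by len(a) + len(b) so that longer
    recordings are not penalized just for being longer.
    """
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)] / (len(a) + len(b))

# Hypothetical pitch contours in Hz, one value per frame.
reference = [220, 220, 247, 262, 262, 247, 220]
slower = [220, 220, 220, 247, 247, 262, 262, 262, 247, 247, 220, 220]
different = [330, 330, 294, 262, 294, 330, 330]

print(dtw_distance(reference, slower))     # same melody, spoken more slowly
print(dtw_distance(reference, different))  # different melody
```

Because DTW stretches and compresses time, the slower imitation of the same contour scores as identical, while the different contour gets a positive cost. A lower cost means a closer imitation, so user recordings could be ranked by it.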

I found this open source library, will try to experiment with it tomorrow:

I know the mechanics are different:
I guess the software searches for a match (comparing highlights in a spectrogram), eventually resulting in a Boolean (or so I imagine). I wonder whether it could return a float instead of a Boolean, i.e. a value between 0.0 and 1.0.
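
To illustrate the Boolean-versus-float question: if each recording is reduced to a set of fingerprint hashes, the overlap between the two sets gives a value between 0.0 and 1.0, and a Boolean match is just that value compared against a threshold. This is a sketch under that assumption; `match_score`, `is_match` and the 0.5 threshold are hypothetical and not taken from the library mentioned above.

```python
def match_score(reference_hashes, candidate_hashes):
    """Share of fingerprint hashes the two recordings have in common
    (Jaccard index), from 0.0 (nothing shared) to 1.0 (identical sets)."""
    ref, cand = set(reference_hashes), set(candidate_hashes)
    if not ref and not cand:
        return 0.0
    return len(ref & cand) / len(ref | cand)

def is_match(reference_hashes, candidate_hashes, threshold=0.5):
    """Boolean result derived from the float score (threshold is arbitrary)."""
    return match_score(reference_hashes, candidate_hashes) >= threshold

# Made-up hash values standing in for real fingerprints.
studio = [101, 102, 103, 104]
user = [103, 104, 105, 106]

score = match_score(studio, user)
print(score, is_match(studio, user))
```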

I believe the difference is that this software compares the original version of an audio file with a recording of that original plus added ambient noise, ultimately filtering out the noise and finding a match, or not...

I'll keep you posted about my findings, looking forward to your thoughts...
LVL 34

Accepted Solution

Slick812 earned 500 total points
ID: 39699714
I went to the library's site and read their "How it works" page. They reduce the audio to a simpler, less variable "11kHz mono signal", then go through the file time slot by time slot and pick up a reference (evaluating the frequencies and volumes). They then compute a sort of "average" or combination of those references (much as a file checksum hash does for bytes) and use that hash in a lookup table to see if there is a match; if a match is found, the song is found. However, I do not see any way this could work except for comparing the exact same digital recording, as distributed (under copyright) by retailers and download sites. I really doubt that it could make any sort of match if another group performed the exact same song with the same guitars, drum set, bass, key and tempo, but with different singers and players. This is designed to make EXACT matches, in order to pick out a specific song from thousands of other songs. But I do not know whether I understand all of the factors in the way they "pick up a reference".
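
The description above can be sketched roughly as follows. This is a toy illustration of the general idea (pick a reference per time slot, hash it, look the hashes up in a table), not the library's actual algorithm. It assumes each recording is already available as a spectrogram, i.e. a list of frames holding one magnitude per frequency bin; all names and the tiny 4-bin spectrograms are made up.

```python
from collections import Counter

def frame_peaks(spectrogram):
    """Index of the strongest frequency bin in each time frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in spectrogram]

def fingerprint(spectrogram, fan_out=2):
    """Hash pairs of nearby peaks as (bin_a, bin_b, frame_gap) tuples."""
    peaks = frame_peaks(spectrogram)
    hashes = set()
    for i, anchor in enumerate(peaks):
        for gap in range(1, fan_out + 1):
            if i + gap < len(peaks):
                hashes.add((anchor, peaks[i + gap], gap))
    return hashes

def build_index(tracks):
    """Lookup table: hash -> names of the tracks containing it."""
    index = {}
    for name, spec in tracks.items():
        for h in fingerprint(spec):
            index.setdefault(h, set()).add(name)
    return index

def lookup(index, spectrogram):
    """Name of the track sharing the most hashes, or None if nothing matches."""
    votes = Counter()
    for h in fingerprint(spectrogram):
        for name in index.get(h, ()):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None

# Made-up 4-bin spectrograms, four frames each.
song_a = [[0, 9, 1, 0], [0, 1, 9, 0], [9, 0, 1, 0], [0, 0, 1, 9]]
song_b = [[9, 0, 0, 1], [9, 1, 0, 0], [0, 0, 9, 1], [0, 9, 0, 0]]
index = build_index({"a": song_a, "b": song_b})

# The same recording with a little added noise keeps the same peaks.
noisy_a = [[1, 8, 2, 0], [0, 2, 9, 1], [8, 1, 1, 0], [1, 0, 2, 9]]
unknown = [[0, 0, 0, 9]] * 3

print(lookup(index, noisy_a), lookup(index, unknown))
```

In this toy version the noisy copy still matches because its strongest bins are unchanged, while a recording whose peaks differ shares no hashes at all, which is consistent with the point that this kind of scheme targets exact matches.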

Please get my main point:
if you have more than one (human) judge, there will be disagreements about what is "similar"!

Defining what "similar" means in a digital evaluation and comparison would be difficult, and deciding that one audio file is "more similar" than 10 others would, in my opinion, be nearly impossible.

Question has a verified solution.
