Compare audio files programmatically

Hi everyone, I'm looking for a solution to compare audio files of voice recordings.
One file will be recorded in the studio.
Users are asked to imitate that recording as closely as possible.
We need to compare each user recording to the original and determine programmatically, preferably using server-side scripting (PHP or Node.js), which ones are most similar to the original.

I was thinking we could make a spectrogram out of each recording and compare the bitmap data looking for similarities. I guess there must be other ways of doing this.
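
A rough sketch of the spectrogram idea, in case it helps the discussion. Rather than comparing bitmaps, it compares the magnitude values directly with a cosine similarity. It is Python with the standard library only, to keep it short (the same arithmetic ports to Node.js). The naive DFT, the frame sizes, the `tone` helper and the synthetic test tones are all assumptions for illustration; a real version would decode the recordings to PCM samples and use an FFT library.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram via a naive DFT (slow, but dependency-free)."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces spectral leakage at the frame edges.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        mags = []
        for k in range(frame_size // 2):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def cosine_similarity(a, b):
    """Cosine similarity of two spectrograms, truncated to the shorter one."""
    n = min(len(a), len(b))
    flat_a = [v for frame in a[:n] for v in frame]
    flat_b = [v for frame in b[:n] for v in frame]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


def tone(freq, rate=8000, count=512):
    """Hypothetical stand-in for a decoded recording: a pure sine tone."""
    return [math.sin(2 * math.pi * freq * n / rate) for n in range(count)]


reference = spectrogram(tone(1000))
same_pitch = spectrogram(tone(1000))
other_pitch = spectrogram(tone(3000))
print("same pitch: ", cosine_similarity(reference, same_pitch))
print("other pitch:", cosine_similarity(reference, other_pitch))
```

The score is high for the matching tone and low for the other one, but note that this compares frame against frame, so a user who speaks slightly faster or slower than the studio recording would be penalised even if the imitation is good.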

Tips & Tricks are more than welcome !
I went to their site and read the "How it works" page. They reduce the audio to a simpler, less variable "11kHz mono signal", then go through the file time spot by time spot and pick up a reference (evaluating the frequencies and volumes). They then do a sort of "average" or combination (as a file checksum hash does for bytes) and use that hash-average in a lookup table to see if there is a match; if a match is found, the song is found. However, I do not see any way this could work except for comparing the exact same digital recording as distributed by retailers as copyrighted downloads. I really doubt that if another group performed the exact same song (same guitars, same drum set, same bass, same key, same tempo) but with different singers and players, it could make any sort of match. This is designed to make EXACT matches, in order to pick out a specific song from thousands of other songs. But I do not know if I understand all of the factors involved in the way they "pick up a reference".
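
To make the "hash and lookup table" idea concrete, here is a toy sketch in Python. It is not the actual algorithm of any fingerprinting service. It assumes each recording has already been reduced to one dominant frequency bin per time slot (the `peaks` lists, the track names and the `fan_out` parameter are all made up for illustration), hashes pairs of nearby peaks, and counts how many hashes a query shares with each indexed track.

```python
import hashlib


def fingerprint(peaks, fan_out=3):
    """Hash pairs of nearby peaks: (bin at t, bin at t+dt, dt)."""
    hashes = set()
    for t, f1 in enumerate(peaks):
        for dt in range(1, fan_out + 1):
            if t + dt < len(peaks):
                key = f"{f1}|{peaks[t + dt]}|{dt}".encode()
                hashes.add(hashlib.sha1(key).hexdigest()[:10])
    return hashes


def build_index(tracks):
    """Lookup table: hash -> set of track names that contain it."""
    index = {}
    for name, peaks in tracks.items():
        for h in fingerprint(peaks):
            index.setdefault(h, set()).add(name)
    return index


def best_match(index, query_peaks):
    """Return (track name, number of shared hashes), or (None, 0)."""
    votes = {}
    for h in fingerprint(query_peaks):
        for name in index.get(h, ()):
            votes[name] = votes.get(name, 0) + 1
    if not votes:
        return None, 0
    name = max(votes, key=votes.get)
    return name, votes[name]


# Made-up dominant-bin sequences standing in for two indexed songs.
tracks = {
    "song_a": [8, 12, 8, 20, 15, 9, 30, 12],
    "song_b": [5, 5, 7, 22, 18, 3, 11, 25],
}
index = build_index(tracks)

exact_excerpt = [8, 20, 15, 9]    # a slice of song_a, unchanged
imitation = [9, 21, 16, 10]       # same contour, shifted up by one bin

print(best_match(index, exact_excerpt))
print(best_match(index, imitation))
```

With these made-up values, the exact excerpt finds its track, while the same contour shifted up by one bin shares no hashes at all. That is consistent with the point above: this kind of matching targets the exact same recording, not a different performance of it.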

Please get my main point: if you have more than one (human) judge, there will be disagreements about what is "similar"!

Defining what "similar" means in a digital evaluation and comparison would be difficult, and deciding that one audio was "more similar" than 10 other audios would be (in my opinion) nearly impossible.
Greetings Dreammonkey, you say you need to "compare" audio files to determine "which ones are most similar to the original".
You will never be able to do this. Even if you use people, or one person, to listen to these, what choice "parameters" are the (human) judges going to use to differentiate the audio entries? If you have more than one (human) judge, there will be disagreements about what is "similar". You can probably get the "spectrogram" image for digital analysis (or use another digital analysis method), BUT there is no way to program a series of IF-THEN steps that yields a subjective opinion about what is or is not "most similar", even if the audio files contain only one spoken word. The fine points and complexities of digital audio are massive: even just changing the microphone for the same person can alter the digital footprint of the recording.

You might consider having your site users "Vote" on the audio entry that they think is "most similar", after they get to listen to the audio entries and form an opinion.
Dreammonkey (Author) commented:
Thanks for your comment, Slick812,

I understand the challenge, that's part of the reason I posted the question on this forum ;)
I do understand the complexity of audio recordings and microphones, and you certainly have a point here ! ;)

Reading how Shazam works I do believe it must be possible to compare pitch & rhythm between 2 recordings ?

I found this open source library, will try to experiment with it tomorrow:

I know the mechanics are different:
I guess the software searches for a match (comparing highlights in a spectrogram), eventually resulting in a Boolean (or so I imagine). I wonder if it could be possible to have a return value that's not a Boolean, but a float, i.e. a value between 0.0 and 1.0?

I believe the difference is that this software is trying to compare the original version of an audio file with a recorded version of the original plus added ambient noise, ultimately filtering out the noise and finding a match, or not...
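
One way a float score could look, as a minimal Python sketch: dynamic time warping (DTW) over pitch contours, which tolerates a user speaking faster or slower than the reference. This is an assumption about a workable approach, not what any fingerprinting library does. It assumes a pitch tracker has already produced one pitch value per frame (in semitones); the contours, the function names and the `1 / (1 + cost)` mapping are illustrative choices.

```python
def to_relative(contour):
    """Subtract the mean so a higher or lower voice is not penalised."""
    mean = sum(contour) / len(contour)
    return [p - mean for p in contour]


def dtw_distance(a, b):
    """DTW alignment cost between two sequences, divided by len(a) + len(b)."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, advance in both.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[rows][cols] / (rows + cols)


def similarity(reference, attempt):
    """Score in (0.0, 1.0]; 1.0 means the contours align perfectly."""
    distance = dtw_distance(to_relative(reference), to_relative(attempt))
    return 1.0 / (1.0 + distance)


# Made-up pitch contours (semitones per frame) for illustration.
reference = [60, 60, 62, 64, 64, 62, 60]
slow_take = [60, 60, 60, 62, 62, 64, 64, 64, 62, 62, 60]   # same shape, slower
off_take = [60, 65, 59, 70, 61, 66, 58]                    # different shape

print("slow take:", similarity(reference, slow_take))
print("off take: ", similarity(reference, off_take))
```

Sorting the user recordings by this score would give a ranking, although how well such a ranking agrees with human judgement is a separate question that would need testing against real listeners.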

I'll keep you posted about my findings, looking forward to your thoughts...
Question has a verified solution.