Compare audio files programmatically

Posted on 2013-12-05
Last Modified: 2013-12-17
Hi everyone, I'm looking for a solution to compare audio files of voice recordings.
One file will be recorded in the studio.
Users are asked to imitate that recording as closely as possible.
We need to compare each user recording to the original and determine programmatically, preferably with server-side scripting (PHP or Node.js), which ones are most similar to the original.

I was thinking we could make a spectrogram out of each recording and compare the bitmap data looking for similarities. I guess there must be other ways of doing this.
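A minimal sketch of the spectrogram idea, written in Python rather than PHP or Node.js purely for illustration. It computes a log-magnitude spectrogram with a hand-written FFT and scores two recordings by the cosine similarity of the resulting "bitmaps". It assumes both files have already been decoded to mono samples at the same sample rate and start at the same moment. Because it does no time alignment, a user who starts late or speaks faster will score lower. The frame size, hop, `RATE` and the helper names are hypothetical choices.

```python
import cmath
import math

RATE = 8000  # assumed sample rate; both recordings must share it


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(samples, frame_size=256, hop=128):
    """Log-magnitude spectrogram: one row per Hann-windowed frame, one column per frequency bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)) for i in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        spectrum = fft(frame)[: frame_size // 2 + 1]
        # log1p compresses the dynamic range, much like pixel brightness in a spectrogram image
        rows.append([math.log1p(abs(c)) for c in spectrum])
    return rows


def spectrogram_similarity(a, b, frame_size=256, hop=128):
    """Cosine similarity of two spectrogram 'bitmaps', truncated to the shorter one.
    All values are non-negative, so the result lies between 0.0 and 1.0."""
    rows_a, rows_b = spectrogram(a, frame_size, hop), spectrogram(b, frame_size, hop)
    n = min(len(rows_a), len(rows_b))
    va = [v for row in rows_a[:n] for v in row]
    vb = [v for row in rows_b[:n] for v in row]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def make_tone(freq, amp=1.0, seconds=0.25):
    """Hypothetical test signal: a pure sine wave standing in for a decoded recording."""
    return [amp * math.sin(2 * math.pi * freq * t / RATE) for t in range(int(RATE * seconds))]


if __name__ == "__main__":
    reference = make_tone(440)
    print(spectrogram_similarity(reference, make_tone(440, amp=0.5)))  # same pitch, quieter: close to 1
    print(spectrogram_similarity(reference, make_tone(1000)))          # different pitch: much lower
```

Real voice recordings differ in loudness, timing and background noise, so a score like this is at best one input to a ranking, not a verdict on similarity.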

Tips and tricks are more than welcome!
Question by: Dreammonkey

Assisted Solution

Slick812 earned 500 total points
Greetings Dreammonkey. You say you need to "compare" audio files to determine "which ones are most similar to the original".
You will never be able to do this. Even if you use people, or one person, to listen to them, what "parameters" are the (human) judges going to use to differentiate the audio entries? If you have more than one judge, there will be disagreements about what is "similar". You can probably produce the "spectrogram" image for digital analysis (or use some other digital analysis method), BUT there is no way to program a series of IF-THEN rules that turns a subjective opinion into a digital result deciding what is or is not "most similar", even if only one word is spoken in each audio file. The fine points and complexities of digital audio are massive: even changing the microphone for the same person can alter the digital footprint of the recording.

You might consider having your site users "vote" for the audio entry they think is "most similar", after they have listened to the entries and formed an opinion.

Author Comment

Thanks for your comment, Slick812,

I understand the challenge; that's part of the reason I posted the question on this forum ;)
I do understand the complexity of audio recordings and microphones, and you certainly have a point there! ;)

Having read how Shazam works, I believe it must be possible to compare pitch and rhythm between two recordings?

I found this open-source library and will experiment with it tomorrow:

I know the mechanics are different:
I guess the software searches for a match (comparing highlights in a spectrogram), eventually returning a Boolean (or so I imagine). I wonder whether it could return a float instead, i.e. a value between 0.0 and 1.0?

I believe the difference is that this software compares the original version of an audio file with a recording of that original plus added ambient noise... ultimately filtering out the noise and finding a match, or not...
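On the question of a float between 0.0 and 1.0 instead of a Boolean: one common technique, swapped in here and not what a fingerprinting library does, is to compare per-frame features with dynamic time warping (DTW), which tolerates a user speaking faster or slower, and then map the warping cost to a score. The minimal sketch below assumes each recording has already been reduced to a pitch contour in Hz (pitch extraction is not shown). The function names, the contours and the `scale` parameter are hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors,
    averaged over the number of steps in the chosen warping path."""
    if not seq_a or not seq_b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            # cheapest predecessor: a diagonal match, or stretching one of the sequences
            prev_cost, prev_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = prev_cost + local
            steps[i][j] = prev_steps + 1
    return cost[n][m] / steps[n][m]


def pitch_contour_similarity(hz_a, hz_b, scale=1.0):
    """Map the DTW cost of two pitch contours to a score in (0.0, 1.0]; 1.0 means identical.
    Contours are converted to semitones and mean-centred, so a user who imitates the
    melody in a lower voice is not penalised for the transposition itself."""
    def normalise(hz):
        semis = [12 * math.log2(f / 440.0) for f in hz]
        mean = sum(semis) / len(semis)
        return [(s - mean,) for s in semis]
    return 1.0 / (1.0 + dtw_distance(normalise(hz_a), normalise(hz_b)) / scale)


if __name__ == "__main__":
    original = [200, 220, 247, 220, 200]          # hypothetical contour, one value per frame
    imitation = [100, 100, 110, 123.5, 110, 100]  # same shape, slower and an octave lower
    unrelated = [300, 180, 300, 180, 300]
    print(pitch_contour_similarity(original, imitation))  # relatively high
    print(pitch_contour_similarity(original, unrelated))  # noticeably lower
```

The `scale` parameter only controls how quickly the score falls off; which value matches human judgement would have to be tuned against real recordings.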

I'll keep you posted on my findings; looking forward to your thoughts...

Accepted Solution

Slick812 earned 500 total points
I went to the  and read their "How it works" page. They reduce the audio to a simpler, less variable "11kHz mono signal", then step through the file time slice by time slice and pick up a reference at each one (evaluating the frequencies and volumes). They combine these into a kind of "average" (much as a file checksum HASH does for bytes) and use that hash in a lookup table: if a match is found, the song is found. However, I do not see how this could work except for comparing the exact same digital recording, as distributed (under copyright) by retailers and download sites. I really doubt that if another group sang the same song, with the same guitars, drum set, bass, key and tempo, but different singers and players, it would produce any sort of match. This is designed to make EXACT matches, in order to pick out one specific song from thousands of others. But I do not know whether I understand all of the factors in the way they "pick up a reference".
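The fingerprint-and-lookup scheme described above can be roughly sketched as follows, again in Python for illustration. It assumes the audio is already 11025 Hz mono; the band layout, the "loudest bin per band" key and the `match_score` ratio are hypothetical simplifications, not the library's actual algorithm (Shazam's published description, for example, hashes pairs of spectrogram peaks together with their time offset). The example also reflects the point about EXACT matching: shifting both tones up by about 4.5% makes every key miss.

```python
import cmath
import math

SAMPLE_RATE = 11025  # assumption: audio already reduced to ~11 kHz mono before fingerprinting
FRAME = 1024         # samples per time slice (hypothetical choice)
# Hypothetical FFT-bin ranges; one peak is picked inside each range per time slice
BANDS = ((1, 10), (10, 20), (20, 40), (40, 80), (80, 160), (160, 512))


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fingerprint(samples):
    """One key per time slice: the index of the loudest FFT bin inside each band."""
    keys = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        mags = [abs(c) for c in fft(samples[start:start + FRAME])[: FRAME // 2]]
        keys.append(tuple(max(range(lo, hi), key=mags.__getitem__) for lo, hi in BANDS))
    return keys


def match_score(reference, query):
    """Fraction (0.0-1.0) of the query's keys that appear in the reference's lookup table."""
    table = set(fingerprint(reference))
    keys = fingerprint(query)
    return sum(key in table for key in keys) / len(keys) if keys else 0.0


def two_tone(f1, f2, seconds=1.0):
    """Hypothetical test signal: two sine waves, the second at half amplitude."""
    return [math.sin(2 * math.pi * f1 * t / SAMPLE_RATE)
            + 0.5 * math.sin(2 * math.pi * f2 * t / SAMPLE_RATE)
            for t in range(int(SAMPLE_RATE * seconds))]


if __name__ == "__main__":
    reference = two_tone(440, 1320)
    print(match_score(reference, reference))            # the identical recording: 1.0
    print(match_score(reference, two_tone(460, 1380)))  # same "tune", slightly off: 0.0
```

Because each key must match exactly, small differences in pitch knock out whole keys, which is why this kind of lookup identifies a specific recording rather than measuring how close an imitation is.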

Please get my main point: if you have more than one (human) judge, there will be disagreements about what is "similar"!

Defining "similar" somehow in a digital evaluation and comparison would be difficult, and deciding that one audio file is "more similar" than 10 others would be nearly impossible (in my opinion).
