Solved

Compare audio files programmatically

Posted on 2013-12-05
Last Modified: 2013-12-17
Hi everyone, I'm looking for a solution to compare audio files of voice recordings.
One file will be recorded in the studio.
Users are asked to imitate that recording as closely as possible.
We need to compare each user recording to the original and determine programmatically, preferably using server-side scripting (PHP or Node.js), which ones are most similar to the original.

I was thinking we could make a spectrogram out of each recording and compare the bitmap data looking for similarities. I guess there must be other ways of doing this.
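The spectrogram idea can be sketched in a few lines. This is a minimal illustration in Python (chosen for brevity; the same steps would port to Node.js or PHP), not a tested solution for real voice recordings. It assumes both signals are already decoded to lists of samples at the same sample rate and are roughly time-aligned. The function names, the frame size and the synthetic test tones are all hypothetical. Instead of comparing bitmap pixels, it compares the magnitude spectrum of each frame with cosine similarity and averages the result.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one magnitude spectrum per frame (Hann window, naive DFT)."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        # Apply a Hann window to reduce spectral leakage.
        frame = [
            samples[start + n]
            * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n in range(frame_size)
        ]
        # Naive DFT, keeping only the non-negative frequency bins.
        spectrum = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def spectrogram_similarity(signal_a, signal_b):
    """Average frame-by-frame cosine similarity of two spectrograms."""
    spec_a = spectrogram(signal_a)
    spec_b = spectrogram(signal_b)
    count = min(len(spec_a), len(spec_b))
    if count == 0:
        return 0.0
    return sum(cosine_similarity(spec_a[i], spec_b[i]) for i in range(count)) / count


def tone(freq_hz, sample_rate=8000, length=512):
    """Synthetic sine tone, used here only as stand-in test data."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(length)]


reference = tone(440)
print(spectrogram_similarity(reference, tone(445)))   # slightly off-pitch imitation
print(spectrogram_similarity(reference, tone(2000)))  # very different sound
```

Real speech would need at least silence trimming and some form of time alignment before a frame-by-frame comparison like this means much.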

Tips and tricks are more than welcome!
Question by:Dreammonkey

Assisted Solution

by:Slick812
Slick812 earned 500 total points
ID: 39698700
Greetings Dreammonkey. You say you need to "compare" audio files to determine "which ones are most similar to the original".
You will never be able to do this. Even if you have one or more people listen to the recordings, what criteria will the (human) judges use to tell the audio entries apart? With more than one judge, there will be disagreements about what is "similar". You can probably generate a spectrogram image (or use some other method of digital analysis), BUT there is no way to program a series of IF-THEN checks that yields the subjective opinion needed to decide what is or is not "most similar", even if each audio file contains only a single spoken word. The fine points and complexities of digital audio are massive: just changing the microphone for the same person can alter the digital footprint of the recording.

You might consider having your site users vote on the audio entry that they think is "most similar", after they have listened to the entries and formed an opinion.

Author Comment

by:Dreammonkey
ID: 39699430
Thanks for your comment, Slick812,

I understand the challenge; that's part of the reason I posted the question on this forum ;)
I do understand the complexity of audio recordings and microphones, and you certainly have a point here! ;)

Having read how Shazam works, I do believe it must be possible to compare pitch and rhythm between two recordings?

I found this open source library, will try to experiment with it tomorrow:
http://echoprint.me/

I know the mechanics are different:
I guess the software searches for a match (comparing highlights in a spectrogram), eventually resulting in a Boolean (or so I imagine). I wonder whether it could return a float instead of a Boolean, i.e. a value between 0.0 and 1.0.
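On the float-versus-Boolean question: if each recording is reduced to a set of fingerprint codes, one simple way to get a graded result is the Jaccard index, i.e. the number of shared codes divided by the number of distinct codes in either recording. The sketch below is in Python for brevity and assumes the codes have already been extracted somehow. The function names and the sample codes are made up for illustration, and nothing here is claimed to be how Echoprint or Shazam score a match.

```python
def similarity_score(codes_a, codes_b):
    """Jaccard index of two collections of fingerprint codes, from 0.0 to 1.0."""
    set_a, set_b = set(codes_a), set(codes_b)
    if not set_a and not set_b:
        return 0.0  # nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


def rank_by_similarity(reference_codes, entries):
    """Return (name, score) pairs, most similar to the reference first."""
    scored = [
        (name, similarity_score(reference_codes, codes))
        for name, codes in entries.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical codes: one studio reference and three user recordings.
reference = ["a1", "b2", "c3", "d4"]
entries = {
    "user_1": ["a1", "b2", "c3", "x9"],
    "user_2": ["a1", "y7", "z8", "w6"],
    "user_3": ["q1", "q2", "q3", "q4"],
}

ranking = rank_by_similarity(reference, entries)
for name, score in ranking:
    print(name, round(score, 3))
```

The score is only as meaningful as the codes behind it: if the fingerprints are built for exact-copy detection, an imitation by a different voice may share few or no codes with the original.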

I believe the difference is that this software compares the original version of an audio file with a re-recorded version of that original plus added ambient noise, ultimately filtering out the noise and finding a match, or not...

I'll keep you posted about my findings, looking forward to your thoughts...

Accepted Solution

by:Slick812
Slick812 earned 500 total points
ID: 39699714
I went to http://echoprint.me/ and read their "How it works" page. They reduce the audio to a simpler, less variable "11kHz mono signal". Then, as far as I can tell, they step through the file one time slice at a time and pick up a reference at each point (evaluating the frequencies and volumes), combine those references into a sort of "average" (much as a checksum hash does for the bytes of a file), and use that hash in a lookup table to see if there is a match. If a match is found, the song is found. However, I do not see any way this could work except for comparing the exact same digital recording, as distributed by download retailers under copyright. I really doubt that it could make any sort of match if another group performed the exact same song with the same guitars, same drum set, same bass, same key and same tempo, but with different singers and players. This is designed to make EXACT matches, in order to pick out one specific song from thousands of other songs. That said, I do not know whether I understand all of the factors in the way they "pick up a reference".
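To make the lookup-table idea concrete, here is a toy sketch in Python. It is not Echoprint's actual algorithm, only an illustration of the general "hash, then look up" pattern described above. It assumes each recording has already been reduced to a list with the dominant frequency bin of each time slice (the hypothetical `peak_bins`). The function names, the `fan_out` and `min_hits` parameters and the sample data are all invented for the example.

```python
import hashlib


def fingerprint(peak_bins, fan_out=2):
    """Hash each peak together with the next few peaks and their distance in frames."""
    codes = []
    for i, current_bin in enumerate(peak_bins):
        for gap in range(1, fan_out + 1):
            if i + gap < len(peak_bins):
                key = f"{current_bin}:{peak_bins[i + gap]}:{gap}".encode()
                codes.append(hashlib.sha1(key).hexdigest()[:10])
    return codes


def build_index(tracks):
    """Lookup table: code -> set of track names that contain it."""
    index = {}
    for name, peak_bins in tracks.items():
        for code in fingerprint(peak_bins):
            index.setdefault(code, set()).add(name)
    return index


def best_match(index, peak_bins, min_hits=3):
    """Return the track with the most matching codes, or None below min_hits."""
    votes = {}
    for code in fingerprint(peak_bins):
        for name in index.get(code, ()):
            votes[name] = votes.get(name, 0) + 1
    if not votes:
        return None
    name, hits = max(votes.items(), key=lambda item: item[1])
    return name if hits >= min_hits else None


# Hypothetical per-frame dominant bins for two known tracks.
studio = [12, 15, 9, 20, 14, 11, 18, 7]
other = [30, 31, 29, 33, 28, 35, 27, 32]
index = build_index({"studio_take": studio, "other_song": other})

# An imitation whose peaks are each off by one bin from the studio take.
imitation = [13, 14, 10, 19, 15, 10, 19, 8]

print(best_match(index, studio))     # the identical recording
print(best_match(index, imitation))  # a close but not identical performance
```

In this toy version the identical recording is found, while the imitation shares no codes with it at all, which illustrates the concern above: exact hashing rewards exact copies and gives no partial credit for "close".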

Please get my main point:
if you have more than one (human) judge, there will be disagreements about what is "similar"!

Defining what "similar" means for a digital evaluation and comparison would be difficult, and in my opinion it would be nearly impossible to establish that one audio file is "more similar" than 10 others.