Silas2

asked on

Analyse mp3 files for sound?

Does anyone know of a tool (no UI, just code) which can scan through MP3 files to see when sound is present? It's just for call-center stuff, to check whether people have been talking or not.
ChopOMatic

I don't know of anything but if you do find it, please post here for sharing if you don't mind.
btan

One possibility is detecting beats per minute

@ http://superuser.com/questions/129041/any-beat-detection-software-for-linux

Another possibility is using a library for development (if you want to go that route)

@ http://www.surina.net/soundtouch/index.html
Silas2

ASKER

Thanks for those. The library is fine, but I'm not sure how I could detect the presence or absence of speech, which has no beat.
I don't know anything about MP3. For example, does the size of the file reflect the complexity of the audio rather than its real-time playback length? If so, maybe I could detect the playback length with code and, knowing the file size, tell whether it was silent.
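
As a rough illustration of the size-versus-playback-length idea above, here is a minimal Python sketch. It assumes the playback length in seconds has already been obtained from some other tool, and the function name is made up for illustration. For a constant-bitrate MP3 the result is simply the encoder's bitrate and says nothing about silence. Only with variable-bitrate encoding, where quiet passages typically take fewer bits, might an unusually low average hint at a mostly silent file. Tags embedded in the file also count towards its size, so treat the figure as approximate.

```python
def average_bitrate_kbps(size_bytes, duration_seconds):
    """Average bitrate in kbit/s implied by a file's size and playback length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    # bytes -> bits, spread over the playback time, then bits/s -> kbit/s
    return size_bytes * 8 / duration_seconds / 1000


# Hypothetical figures: a 10-minute (600 s) recording occupying 2,400,000 bytes
print(average_bitrate_kbps(2_400_000, 600))
```

The file size itself could come from os.path.getsize; the playback length is the part that needs an MP3-aware tool.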
Hi Silas.

I find myself in a bit of a dilemma with this question.  My personal overview of *some* call centres I have seen is that they are draconian and somewhat patronising to employees.  The request gives me the impression of "chat police" in a dark room analysing continuous audio feeds from every employee's headset to discover who is engaging in idle chit-chat amongst themselves about their weekend adventures ;-)

Putting all personal feelings aside and acknowledging the fact that people are there to do a job and not engage in idle banter, there are a few things that I suppose are important to know before a suitable method is suggested.

1. Are these MP3s created for the purpose of "quality monitoring" (as they say while you are in the queue)?
2. Are you expecting most of them to contain audio content or not?
3. Are you trying to decipher human voice content amongst other audio content such as "on-hold" music or similar?
4. What is the approximate play time of these MP3s?
5. Do you need to extract sections of audio content from lengthy audio files that are mostly silent?

I think you'll see what I'm trying to ascertain.  There are quite a lot of standalone command-line programs that allow you to extract all kinds of properties from MP3 files to a report.

The actual play-time would be a pretty useless property to know because it doesn't tell you if it's just a long silence or actually has audio content in it.  I'm pretty sure that, amongst the utility programs I have saved and any new ones that may exist out there, there will be some that can tell you whether there is actually audio content amongst silent gaps.  The main issue is whether any audio other than voice is expected to be present.

Bill
Silas2

ASKER

The app is to provide remote/home work for people, who will get paid for the amount of time they are on the phone. They don't have to be on the phone, so it's not draconian in that sense; it's just to stop people leaving the phone off the hook and charging us for time on the phone.
Therefore, being able to tell the difference between plain noise or music and speech would be nice, but the difference between speech and silence would be enough.
Thanks for explaining that.  It's not inconceivable that an enterprising individual, with knowledge that the app may be monitoring voice activity, could just place the radio next to the off-hook phone or something similar to defeat the monitoring, but it seems a reasonable enough way of validating activity.

I'll run a few tests with a couple of command line "MP3 Split" type programs using a few test MP3s.  Usually they will allow the generation of a report based on "silence" thresholds detected between audio content, which may in the end be much the same as detecting audio content amongst silence.
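
To illustrate why a silence report amounts to much the same thing as an audio-content report, here is a minimal Python sketch that inverts a list of silent spans into the spans that contain sound. The input format, a list of (start, end) pairs in seconds, is an assumption for illustration only and is not the output format of any particular splitter program; the function names are likewise made up.

```python
def audio_intervals(silences, total_seconds):
    """Return the (start, end) spans NOT covered by the given silence spans."""
    spans = []
    cursor = 0.0  # end of the region accounted for so far
    for start, end in sorted(silences):
        # Clamp each silence span to the bounds of the recording
        start = min(max(start, 0.0), total_seconds)
        end = min(max(end, 0.0), total_seconds)
        if start > cursor:
            spans.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_seconds:
        spans.append((cursor, total_seconds))
    return spans


def audio_seconds(silences, total_seconds):
    """Total time, in seconds, that is not silence."""
    return sum(end - start for start, end in audio_intervals(silences, total_seconds))


# Hypothetical report: silence from 0-5 s and 20-30 s in a 60 s recording
print(audio_intervals([(0.0, 5.0), (20.0, 30.0)], 60.0))
print(audio_seconds([(0.0, 5.0), (20.0, 30.0)], 60.0))
```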

I've had a look for command-line driven programs that might be able to map amplitudes and frequencies, but so far it looks like that type of thing might only be available to GUI applications.
Silas2

ASKER

Audacity is an open-source recording app that certainly does a lot of sound analysis. Are bits of that accessible through code?
You read my mind on that one.  I use Audacity a lot.  Generally scripting in Audacity is for running internal "macro" type scripts such as chaining together encoding and export commands (http://manual.audacityteam.org/man/Scripting), but I have been checking out whether something like the Spectrum Plotter function can be invoked from the command line.  So far I don't think so.
I wrote a tool to detect whether a WAV file is silent or not.
If the whole file is silence or noise, it will return 0 and print a hint message.

This tool only supports PCM WAV files, which MUST be mono, 16 kHz sample rate, 16-bit.
You can convert the MP3 file to a 16 kHz, 16-bit PCM WAV file first, and then use this tool to detect silence.
There are a lot of tools to convert MP3 to WAV, e.g. L3dec, lame.

I think we can write a batch file to do it, like below:
lame --decode your_file.mp3 temp.wav
siledet temp.wav
rem Adjust the value below to whatever exit code siledet returns for a silent file
if %ERRORLEVEL%==1 echo silence file!

Siledet.zip
Silas2

ASKER

That's v.interesting, can you set a threshold level for silence in the cmd line?
Why do you want a threshold level?
Did the tool make a mistake on some files?
Please provide some sample files if possible.

What threshold level do you mean, and for what?
Do you mean the silence time length or the noise gain(db) level?

Detecting whether the whole file is silence or not is OK for me.
I'm not sure I can manage it if you want a noise amplitude threshold.
Silas2

ASKER

Sorry, I've never thought about sound analysis before, but I guess a combination of:
"...silence time length or the noise gain(db) level..." would be ideal, e.g. "over x dB for y% of the time". Is that possible?
ASKER CERTIFIED SOLUTION
BillDL
Sorry, didn't realise you two were conversing.  My comment addressed the question:
"That's v.interesting, can you set a threshold level for silence in the cmd line?"
Silas

How is your recording implemented?  Through the headset, via the telephony network, on the server side?

Could an operative talking to a cat on her lap while browsing YouTube, or the noise of her husband doing the vacuuming potentially leave some "noise" in the recording while sitting with the headphones on awaiting a call?
Silas2

ASKER

That is a good point about background noise. The recording is done through a bespoke softphone, and I don't know if I can split the 'channels' (mic/speakers). Have you got any ideas for distinguishing voice from other sounds?
Unfortunately not.  That kind of thing really starts getting into the realms of forensic audio analysis and my knowledge doesn't extend anywhere near understanding the intricacies.  I do, however, have an old forensic data recovery suite I found years ago, and I'll have a quick look at it and see if there might be an audio analysis tool.  I deleted the GUI program that plots spectrums for WAV files yesterday, but may find it again by searching.

If the mic/headset is able to pick up ambient room sounds with any efficiency, I reckon it's going to be a very hard job to distinguish between "C'mon kitty, up on my lap", "Oh, John, could you take out the bin and make the lunch", and "Good morning, Silas IT Support, how may I help you".  
I would expect "other sounds" to be quieter than a voice from a mouth close to a headset microphone, but with a simple program like the "mp3splt.exe" I was toying with, it would then be a case of lots of experimentation to set a "silence" threshold level to roughly match household noise in the background.
Silas2

ASKER

I think you might be right about the "mp3splt.exe" thresholds; I might be able to find a sweet spot for voice. It's not supposed to be a definitive way to calculate wages routinely. Only in cases of dispute would it be a good first port of call.
Where are the MP3 files stored when recorded, and are they for the duration of the day or are they somehow initiated programmatically when the operative answers or makes a call, or logs in to your server?

That's rhetorical and just food for thought, because I need to go and make my dinner right now ;-)

I was just experimenting a few days ago with a simple batch script or VBS (can't recall which) that, when run, records input from the default recording device on a Windows (XP and possibly other OS) machine, for whatever duration in seconds is set, and creates a WAV file without showing any other activity.  I'll be damned if I can find the script now though.  I'll do a file search later.
I have now added the threshold setting.
Usage:
  siledet c:\test.wav 10
This means that if the silence time is more than 10%, the result will be "silence"; otherwise not.
Of course, this tool only handles the device's background noise; it can't filter out environmental background noise like "come on kitty".

 
Siledet.zip
Sorry, it looks like there is a bug with files that contain pure device background noise.
I'll try to fix it later.
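
For anyone wanting to experiment with the "over x dB for y% of the time" idea raised earlier in the thread, here is a minimal Python sketch using only the standard library. It is not how siledet works (its internals are not described here); it simply measures the RMS level of short windows relative to 16-bit full scale and reports the fraction that exceed a threshold. The function names, the -40 dB default and the 50 ms window are all assumptions chosen for illustration, and the WAV helper expects the same mono 16-bit PCM input that siledet requires.

```python
import array
import math
import sys
import wave


def loud_fraction(samples, sample_rate, threshold_db=-40.0, window_seconds=0.05):
    """Fraction of fixed-length windows whose RMS level exceeds threshold_db (dBFS)."""
    window = max(1, int(sample_rate * window_seconds))
    full_scale = 32768.0  # largest magnitude of a signed 16-bit sample
    loud = total = 0
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        level_db = 20 * math.log10(rms / full_scale) if rms > 0 else float("-inf")
        total += 1
        if level_db > threshold_db:
            loud += 1
    return loud / total if total else 0.0


def loud_fraction_of_wav(path, threshold_db=-40.0):
    """Apply loud_fraction to a mono 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        return loud_fraction(samples, wav.getframerate(), threshold_db)
```

A file could then be treated as "talking" when, say, loud_fraction_of_wav("temp.wav") is at least y / 100, with both x and y found by experimenting on real recordings. Like siledet, this only measures level, so it cannot tell speech from a radio or a vacuum cleaner.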
Silas2

ASKER

Ahem, excuse me, but I think I've just realised I might have an easier way of doing this. I've just discovered that I do receive a 'disconnect' event from the VoIP library when the person called hangs up. I didn't think I was getting one, but I think I am, so I can stop recording at that point. To cheat, they would have to be phoning themselves or some confederate, but we control the numbers phoned.
However, both these utilities look useful to me for other ways I might need to process the files.
Bill, I'd love to know about the GUI tools you mentioned above when you showed the long set of instructions for MP3split.

Thx!
Hi ChopOMatic

I managed to find one of the GUI utilities I had looked at previously but disregarded because of the user interface and that it only processed WAV files.  It's named .... wait for it .... "AnalFreq".  I can only assume that it is pronounced "Annalll" rather than "Aynal", because it's by Christopher Brown, The Parmly Hearing Institute, Loyola University, Chicago, and not by someone in the Gastro-Intestinal Noise Analysis Dept ;-)

It is described as "a single-channel FFT-based (Fast Fourier Transform) spectrum analyzer", and the version I found (afreq18.zip) was from this link:
http://www.simtel.net/author/Christopher-Brown/3891.html

WAV files have to be 16-bit and one of 3 fixed sample rates by the look of the errors I get trying to find a suitable file to load.  It's also giving me runtime errors, so it would appear to either be less than perfectly coded or just too old for anything past Windows 2000.

Another I've known about for a while is by the Australian company NCH which tends to provide free versions of their software as tasters for the full packages:
http://www.nch.com.au/wavepad/fft.html

There are loads more like that, some retail, some shareware, some free.  I had actually been searching for some old style DOS program that might output to an ANSI or ASCII "graph" or "plot", or to numeric ranges that could be parsed and plotted later somehow.
ChopOMatic, I meant to say that I also looked at "MP3Gain" as a possible way to determine a sound level, but its scope doesn't really allow anything to work with in this case.

Hi Silas

You said that you get a "disconnect" event from the VoIP softphone.  Does that imply that you may be able to use these events as definitive "timers" to back up a tribunal case if it came to it, or are you still looking for some software that can be triggered from the event to analyse each MP3 after the recording is terminated each time?
Silas

You might also want to bookmark the NCH software site.  Maybe some of it could prove useful now or at a later stage:
http://www.nch.com.au/software/index.html

http://www.nch.com.au/talk/index.html
http://www.nch.com.au/vrs/index.html
http://www.nch.com.au/soundtap/index.html
Big thanks for sharing that info, Bill. I'm a big fan of NCH stuff. I'm pretty sure I'm remembering correctly that they're the ones who make Golden Records, an app designed to find silent gaps in recordings and split the file accordingly?

I do a fair amount of forensic audio work so I'm always on the lookout for interesting software in this space. Again, thanks!

Chop

PS:  LOL @ "AnalFreq"
Interesting. That's a field I would love to have been able to follow (in pursuance of one of my previous "vocations" - see the "skills" section in my profile), but sadly women, work, life, money, and interests didn't quite go the directions I would have preferred ;-)
Great suggestions here. I thought I'd add "Sound eXchange" (SoX), a command-line audio processing tool (you probably already know it).
I saw that it has some options of interest, such as "silence" (removes silence) and "vad" (Voice Activity Detector).

@ http://sox.sourceforge.net/sox.html
@ http://sox.sourceforge.net/Main/HomePage

Specifically for vad, its description is as follows. It has many options, so it is flexible, but you would need to do some testing to derive suitable values:

Voice Activity Detector. Attempts to trim silence and quiet background sounds from the ends of (fairly high resolution i.e. 16-bit, 44-48kHz) recordings of speech. The algorithm currently uses a simple cepstral power measurement to detect voice, so may be fooled by other things, especially music. The effect can trim only from the front of the audio, so in order to trim from the back, the reverse effect must also be used

On further digging, I also saw that MP3 support is optional rather than built in, so it may take some effort to compile SoX with it.

@ http://techblog.netwater.com/?p=4
@ http://sox.sourceforge.net/soxformat.html
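
As a sketch of how the vad effect described above might be driven from code, the Python below builds the SoX command line and runs it. It assumes the sox executable is on the PATH and that the input has already been decoded to WAV, given that MP3 support may not be compiled in. The function names are made up for illustration. The vad, reverse, vad, reverse chain follows the quoted description: vad trims only from the front, so the audio is reversed to trim the other end and then reversed back.

```python
import subprocess


def sox_vad_command(src, dst):
    """Build a SoX command that trims non-speech from both ends of src."""
    # vad trims only from the front, so reverse, trim again, and reverse back
    return ["sox", src, dst, "vad", "reverse", "vad", "reverse"]


def run_vad(src, dst):
    """Run SoX; raises CalledProcessError if sox exits with an error."""
    subprocess.run(sox_vad_command(src, dst), check=True)


print(" ".join(sox_vad_command("temp.wav", "trimmed.wav")))
```

Note that this only trims the ends of a recording, so comparing the trimmed length with the original gives at best a rough indication of how much leading and trailing non-speech there was, not a measure of gaps in the middle.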
Thank you Silas.
Bill, assuming you still have a pulse, it's NOT too late to change that direction, my friend! Do so!
Erm, yes, only just though ;)  I wouldn't mind liaising with you at some point off site to see if you might be able to offer any suggestions.  Not a brain-picking exercise, maybe just for some rough directions or resources I could use as a feasibility study.  I wish E-E had a personal message function.
Check your email, Bill. (I'm assuming I interpreted it correctly from your EE profile.)