Solved

Analyse mp3 files for sound?

Posted on 2011-04-25
Last Modified: 2012-05-11
Does anyone know of a tool (no UI, just code) that can scan through MP3 files to see when sound is present? It's just for call-center stuff, to check whether people have been talking or not.
Question by:Silas2
34 Comments
 
LVL 5

Expert Comment

by:ChopOMatic
ID: 35463820
I don't know of anything but if you do find it, please post here for sharing if you don't mind.
0
 
LVL 65

Expert Comment

by:btan
ID: 35481495
One possibility is detecting beats per minute

@ http://superuser.com/questions/129041/any-beat-detection-software-for-linux

Another possibility is using a library for development (if you want to go down that route)

@ http://www.surina.net/soundtouch/index.html
0
 

Author Comment

by:Silas2
ID: 35481911
Thanks for those. The library is fine, but I'm not sure how I could detect the presence/absence of speech, which has no beat.
I don't know anything about MP3. For example, does the size of the file reflect how complex the audio is rather than its real-time playback length? If so, maybe I could get the playback length in code and, knowing the file size, tell whether it was silent?
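On the file-size question: for a constant-bitrate (CBR) MP3, size is simply bitrate times duration, so the size says nothing about whether the audio is silent. Variable-bitrate (VBR) encoders typically do spend fewer bits on quiet passages, but that depends on how the softphone encodes. A minimal sketch of the CBR arithmetic, ignoring ID3 tags and header overhead (the function name is made up for illustration):

```python
def cbr_duration_seconds(file_size_bytes: int, bitrate_kbps: int) -> float:
    """Approximate playback time of a constant-bitrate MP3.

    Ignores ID3 tags and other overhead, so treat the result as an estimate.
    """
    bits = file_size_bytes * 8
    return bits / (bitrate_kbps * 1000)


if __name__ == "__main__":
    # A 960,000-byte file at 64 kbps plays for about two minutes,
    # whether it contains speech or pure silence.
    print(cbr_duration_seconds(960_000, 64))
```

So with CBR recordings, two files of the same length have the same size regardless of content, and the audio itself has to be inspected.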
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35482408
Hi Silas.

I find myself in a bit of a dilemma with this question.  My personal overview of *some* call centres I have seen is that they are draconian and somewhat patronising to employees.  The request gives me the impression of "chat police" in a dark room analysing continuous audio feeds from every employee's headset to discover who is engaging in idle chit-chat amongst themselves about their weekend adventures ;-)

Putting all personal feelings aside and acknowledging the fact that people are there to do a job and not engage in idle banter, there are a few things that I suppose are important to know before a suitable method is suggested.

1. Are these MP3s created for the purpose of "quality monitoring" (as they say while you are in the queue)?
2. Are you expecting most of them to contain audio content or not?
3. Are you trying to decipher human voice content amongst other audio content such as "on-hold" music or similar?
4. What is the approximate play time of these MP3s?
5. Do you need to extract sections of audio content from lengthy audio files that are mostly silent?

I think you'll see what I'm trying to ascertain.  There are quite a lot of standalone command-line programs that allow you to extract all kinds of properties from MP3 files to a report.

The actual play-time would be a pretty useless property to know because it doesn't tell you if it's just a long silence or actually has audio content in it.  I'm pretty sure that, amongst the utility programs I have saved and any new ones that may exist out there, there will be some that can tell you whether there is actually audio content amongst silent gaps.  The main issue is whether any audio other than voice is expected to be present.

Bill
0
 

Author Comment

by:Silas2
ID: 35484490
The app is to provide remote/home work for people, who will get paid for the amount of time they are on the phone. They don't have to be on the phone, so it's not draconian in that sense; it's just to stop people leaving the phone off the hook and charging us for time on the phone.
Therefore, being able to tell the difference between plain noise or music and speech would be nice, but the difference between speech and silence would be enough.
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35484675
Thanks for explaining that.  It's not inconceivable that an enterprising individual, with knowledge that the app may be monitoring voice activity, could just place the radio next to the off-hook phone or something similar to defeat the monitoring, but it seems a reasonable enough way of validating activity.

I'll run a few tests with a couple of command line "MP3 Split" type programs using a few test MP3s.  Usually they will allow the generation of a report based on "silence" thresholds detected between audio content, which may in the end be much the same as detecting audio content amongst silence.

I've had a look for command-line driven programs that might be able to map amplitudes and frequencies, but so far it looks like that type of thing might only be available to GUI applications.
0
 

Author Comment

by:Silas2
ID: 35484871
Audacity is an open-source recording app that certainly does a lot of sound analysis. Are bits of that accessible through code?
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35485176
You read my mind on that one.  I use Audacity a lot.  Generally scripting in Audacity is for running internal "macro" type scripts such as chaining together encoding and export commands (http://manual.audacityteam.org/man/Scripting), but I have been checking out whether something like the Spectrum Plotter function can be invoked from the command line.  So far I don't think so.
0
 
LVL 7

Expert Comment

by:huacat
ID: 35492009
I wrote a tool to detect whether a WAV file is silent or not.
If the whole file is silence or noise, it will return 0 and print a hint message.

The tool only supports PCM WAV files, which MUST be mono, 16 kHz sample rate, 16-bit.
You can convert the MP3 file to a 16 kHz, 16-bit PCM WAV file first, and then run the tool on it.
There are a lot of tools to convert MP3 to WAV, e.g. l3dec, lame.

I think we can write a batch file to do it, like below:
lame --decode your_file.mp3 temp.wav
siledet temp.wav
if %ERRORLEVEL%==1 echo silence file!

Siledet.zip
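For readers without the attached tool, the decode-then-measure idea above can be sketched with Python's standard library alone. This is not siledet's actual algorithm (which isn't shown in the thread), just a whole-file RMS check on a WAV that has already been decoded to 16-bit mono PCM. The function names and the -48 dB default are assumptions chosen for illustration.

```python
import array
import math
import sys
import wave


def wav_level_dbfs(path) -> float:
    """Return the RMS level of a 16-bit mono PCM WAV file in dBFS.

    0 dBFS is full scale; digital silence returns -inf.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def is_silent(path, threshold_db: float = -48.0) -> bool:
    """True when the whole file's RMS level is below the threshold."""
    return wav_level_dbfs(path) < threshold_db
```

A single whole-file figure will miss a short burst of speech in a long quiet recording, so it is only a first-pass check.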
0
 

Author Comment

by:Silas2
ID: 35492215
That's v. interesting, can you set a threshold level for silence in the cmd line?
0
 
LVL 7

Expert Comment

by:huacat
ID: 35492480
Why do you want the threshold level?
Did the tool get some files wrong?
Please provide some sample files if possible.

What is the threshold level for?
Do you mean the silence time length or the noise gain(db) level?

Detecting whether the whole file is silent or not is OK for me.
I'm not sure whether I can do it if you want a noise amplitude threshold.
0
 

Author Comment

by:Silas2
ID: 35492534
Sorry, I've never thought about sound analysis before, but I guess a combination of:
"...silence time length or the noise gain(db) level..." would be ideal, e.g. "over x dB for y% of the time". Is that possible?
0
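The "over x dB for y% of the time" rule asked about above can be sketched as a frame-by-frame measurement. This is an illustration only, not how siledet or mp3splt work internally. It assumes the audio has already been decoded to signed 16-bit mono samples, and the function names, the 320-sample frame (20 ms at 16 kHz) and the default thresholds are all assumptions.

```python
import math


def fraction_above(samples, threshold_db, frame_len=320):
    """Fraction of frames whose RMS level exceeds threshold_db (dBFS).

    samples: sequence of signed 16-bit PCM values (mono).
    frame_len: samples per frame; 320 is 20 ms at 16 kHz (an assumption).
    """
    # Convert the dB threshold to a linear RMS value once.
    limit = 32768 * 10 ** (threshold_db / 20)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return 0.0
    loud = 0
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > limit:
            loud += 1
    return loud / len(frames)


def has_enough_sound(samples, threshold_db=-40.0, min_fraction=0.10):
    """True when sound is over threshold_db for at least min_fraction of the time."""
    return fraction_above(samples, threshold_db) >= min_fraction
```

Both numbers would need tuning against real recordings; a level-based rule like this cannot tell speech from a radio or a vacuum cleaner.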
 
LVL 39

Accepted Solution

by:
BillDL earned 1000 total points
ID: 35492571
You can with the command line program "mp3splt.exe":

http://mp3splt.sourceforge.net/mp3splt_page/home.php
http://mp3splt.sourceforge.net/mp3splt_page/downloads.php
http://prdownloads.sourceforge.net/mp3splt/mp3splt_2.3a_i386.zip

Delete the *.bat file from the unzipped package.

To get a brief help file:
mp3splt.exe -h > usage.txt 2>&1

Full Manual:
http://mp3splt.sourceforge.net/mp3splt_page/documentation/man.html

Useful switches:

-s   Silence detection: automatically find splitpoint. (Use -p for arguments)
-p + PARAMETERS (th, nt, off, min, rm, gap - see below)
-N   Don't create the 'mp3splt.log' log file when using '-s'.
-P   Pretend to split: simulation of the process (note uppercase P)

Parameters for -p
th = Float value between -96 and 0 giving the decibel threshold below which audio is considered silence. Default is -48 dB
nt = number of tracks (not useful or applicable to your case)
off = offset of the cutpoint within the silence when splitting at gaps (not useful or applicable to your case)
min = minimum length of silence in seconds. Float value. Default is 0
rm = remove silence between split tracks (not useful or applicable to your case)

If you don't specify any parameter, mp3splt will use the default values.

The program was written to allow people to split up a large MP3 into separate tracks and output the tracks as separate MP3s, whilst retaining the ID3 tags in each new file, or creating them as required.  I wondered if the silence detection would be enough on its own using the -P switch (go through the motions and just report findings) to at least determine that there are silent gaps.  To have silent gaps you need sound in between them, right?

example:

mp3splt -s -p min=5 -N -P test.mp3

It's way less than perfect though.
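If the check needs to be driven from code rather than typed by hand, the example command above can be wrapped in a small script. A hedged sketch: it uses only the switches listed in this post, assumes mp3splt is on the PATH, and assumes several -p parameters are joined with commas (verify that against the manual for your version). The function names are made up, and the report text is returned unparsed because its format isn't shown in this thread.

```python
import subprocess


def mp3splt_dry_run_command(mp3_path, threshold_db=-48.0, min_silence_s=5.0,
                            exe="mp3splt"):
    """Build an mp3splt call that detects silence without writing any files."""
    # -s: silence detection, -p: its parameters, -N: no log file, -P: pretend only.
    params = f"th={threshold_db:g},min={min_silence_s:g}"
    return [exe, "-s", "-p", params, "-N", "-P", mp3_path]


def run_dry_run(mp3_path, **kwargs):
    """Run the command and return mp3splt's combined text output, unparsed."""
    cmd = mp3splt_dry_run_command(mp3_path, **kwargs)
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(mp3splt_dry_run_command("test.mp3")))
```

Raising `threshold_db` towards 0 treats more of the quiet background as silence, which is the knob to experiment with.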

I did find quite a few tools that very quickly show spectral analysis of WAV files, but discounted them because you asked for a non-GUI program and are using MP3s.
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35492580
Sorry, didn't realise you two were conversing.  My comment addressed the question:
"That's v. interesting, can you set a threshold level for silence in the cmd line?"
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35492597
Silas

How is your recording implemented?  Through the headset, via the telephony network, on the server side?

Could an operative talking to a cat on her lap while browsing YouTube, or the noise of her husband doing the vacuuming potentially leave some "noise" in the recording while sitting with the headphones on awaiting a call?
0
 

Author Comment

by:Silas2
ID: 35492696
That is a good point about background noise. The recording is done through a bespoke softphone, and I don't know if I can split the 'channels', mic/speakers. Have you got any ideas for distinguishing voice from other sounds?
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35492763
Unfortunately not.  That kind of thing really starts getting into the realms of forensic audio analysis and my knowledge doesn't extend anywhere near understanding the intricacies.  I do, however, have an old forensic data recovery suite I found years ago, and I'll have a quick look at it and see if there might be an audio analysis tool.  I deleted the GUI program that plots spectrums for WAV files yesterday, but may find it again by searching.

If the mic/headset is able to pick up ambient room sounds with any efficiency, I reckon it's going to be a very hard job to distinguish between "C'mon kitty, up on my lap", "Oh, John, could you take out the bin and make the lunch", and "Good morning, Silas IT Support, how may I help you".  
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35492800
I would expect "other sounds" to be quieter than a voice from a mouth close to a headset microphone, but with a simple program like the "mp3splt.exe" I was toying with, it would then be a case of lots of experimentation to set a "silence" threshold level to roughly match household noise in the background.
0
 

Author Comment

by:Silas2
ID: 35492918
I think you might be right about the "mp3splt.exe" thresholds; I might be able to find a sweet spot for voice. It's not supposed to be a definitive way to calculate wages routinely, only in cases of dispute, where it would be a good first port of call.
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35493285
Where are the MP3 files stored when recorded, and are they for the duration of the day or are they somehow initiated programmatically when the operative answers or makes a call, or logs in to your server?

That's rhetorical and just food for thought, because I need to go and make my dinner right now ;-)

I was just experimenting a few days ago with a simple batch script or VBS (can't recall which) that, when run, records input from the default recording device on a Windows (XP and possibly other OS) machine, for whatever duration in seconds is set, and creates a WAV file without showing any other activity.  I'll be damned if I can find the script now though.  I'll do a file search later.
0
 
LVL 7

Expert Comment

by:huacat
ID: 35493678
I have already added the threshold setting.
Usage:
  siledet c:\test.wav 10
means that if the silence time is more than 10%, the result will be "silence"; otherwise not.
Of course this tool only handles the device background noise; it can't filter out environmental background noise like "come on kitty".

 
 silence detector screenshots Siledet.zip
0
 
LVL 7

Expert Comment

by:huacat
ID: 35493761
Sorry, it looks like it has a bug with files that contain pure device background noise.
I'll try to solve it later.
0
 

Author Comment

by:Silas2
ID: 35494925
Ahem, excuse me, but I think I've just realised I might have an easier way of doing this. I've just discovered that I do receive a 'disconnect' event from the VoIP library when the person called hangs up. I didn't think I was getting one, but I think I am, so I can stop recording at that point. To cheat, they would have to be phoning themselves or some confederate, but we control the numbers phoned.
However, both these utilities look useful to me in other ways I might need to process the files.
0
 
LVL 5

Expert Comment

by:ChopOMatic
ID: 35495412
Bill, I'd love to know about the GUI tools you mentioned above when you posted the long set of instructions for mp3splt.

Thx!
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35495840
Hi ChopOMatic

I managed to find one of the GUI utilities I had looked at previously but disregarded because of the user interface and that it only processed WAV files.  It's named .... wait for it .... "AnalFreq".  I can only assume that it is pronounced "Annalll" rather than "Aynal", because it's by Christopher Brown, The Parmly Hearing Institute, Loyola University, Chicago, and not by someone in the Gastro-Intestinal Noise Analysis Dept ;-)

It is described as "a single-channel FFT-based (Fast Fourier Transform) spectrum analyzer", and the version I found (afreq18.zip) was from this link:
http://www.simtel.net/author/Christopher-Brown/3891.html

WAV files have to be 16-bit and one of 3 fixed sample rates by the look of the errors I get trying to find a suitable file to load.  It's also giving me runtime errors, so it would appear to either be less than perfectly coded or just too old for anything past Windows 2000.

Another I've known about for a while is by the Australian company NCH which tends to provide free versions of their software as tasters for the full packages:
http://www.nch.com.au/wavepad/fft.html

There are loads more like that, some retail, some shareware, some free.  I had actually been searching for some old style DOS program that might output to an ANSI or ASCII "graph" or "plot", or to numeric ranges that could be parsed and plotted later somehow.
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35495854
ChopOMatic, I meant to say that I also looked at "MP3Gain" as a possible way to determine a sound level, but its scope doesn't really allow anything to work with in this case.

Hi Silas

You said that you get a "disconnect" event from the VoIP softphone.  Does that imply that you may be able to use these events as definitive "timers" to back up a tribunal case if it came to it, or are you still looking for some software that can be triggered from the event to analyse each MP3 after the recording is terminated each time?
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35495885
Silas

You might also want to bookmark the NCH software site.  Maybe some of it could prove useful now or at a later stage:
http://www.nch.com.au/software/index.html

http://www.nch.com.au/talk/index.html
http://www.nch.com.au/vrs/index.html
http://www.nch.com.au/soundtap/index.html
0
 
LVL 5

Expert Comment

by:ChopOMatic
ID: 35495917
Big thanks for sharing that info, Bill. I'm a big fan of NCH stuff. I'm pretty sure I'm remembering correctly that they're the ones who make Golden Records, an app designed to find silent gaps in recordings and split the file accordingly?

I do a fair amount of forensic audio work so I'm always on the lookout for interesting software in this space. Again, thanks!

Chop

PS:  LOL @ "AnalFreq"
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35496120
Interesting. That's a field I would love to have been able to follow (in pursuance of one of my previous "vocations" - see the "skills" section in my profile), but sadly women, work, life, money, and interests didn't quite go the directions I would have preferred ;-)
0
 
LVL 65

Expert Comment

by:btan
ID: 35496231
Great suggestions here. I thought I'd add "Sound eXchange" (SoX), a command-line audio processing tool (you probably already know it).
I saw that it has some effects of interest, such as "silence" (removes silence) and "vad" (Voice Activity Detector).

@ http://sox.sourceforge.net/sox.html
@ http://sox.sourceforge.net/Main/HomePage

Specifically for vad, its description is as follows. There are many options, which makes it flexible, but some testing is needed to settle on values:

Voice Activity Detector. Attempts to trim silence and quiet background sounds from the ends of (fairly high resolution i.e. 16-bit, 44-48kHz) recordings of speech. The algorithm currently uses a simple cepstral power measurement to detect voice, so may be fooled by other things, especially music. The effect can trim only from the front of the audio, so in order to trim from the back, the reverse effect must also be used.

But on further digging, I also saw that MP3 support is optional rather than built in, so it takes some effort to compile it in

@ http://techblog.netwater.com/?p=4
@ http://sox.sourceforge.net/soxformat.html
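As a rough illustration of the vad description quoted above, the effect chain `vad reverse vad reverse` trims from both ends of a recording. The sketch below builds that SoX call and compares WAV lengths before and after. It assumes sox is on the PATH and that the input has already been decoded to WAV (given the MP3 caveat above). The function names are hypothetical.

```python
import subprocess
import wave


def sox_vad_command(src_wav, dst_wav, exe="sox"):
    """Build a SoX call that trims non-speech from both ends of a recording."""
    # vad trims only the front, so reverse, trim again, and reverse back.
    return [exe, src_wav, dst_wav, "vad", "reverse", "vad", "reverse"]


def wav_duration_seconds(path):
    """Playback length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def trimmed_seconds(src_wav, dst_wav):
    """Run SoX and return how many seconds the vad effect removed."""
    subprocess.run(sox_vad_command(src_wav, dst_wav), check=True)
    return wav_duration_seconds(src_wav) - wav_duration_seconds(dst_wav)
```

Because vad only trims the ends, this measures leading and trailing non-speech, not gaps in the middle of a call.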
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35498826
Thank you Silas.
0
 
LVL 5

Expert Comment

by:ChopOMatic
ID: 35499179
Bill, assuming you still have a pulse, it's NOT too late to change that direction, my friend! Do so!
0
 
LVL 39

Expert Comment

by:BillDL
ID: 35499227
Erm, yes, only just though ;)  I wouldn't mind liaising with you at some point off site to see if you might be able to offer any suggestions.  Not a brain-picking exercise, maybe just for some rough directions or resources I could use as a feasibility study.  I wish E-E had a personal message function.
0
 
LVL 5

Expert Comment

by:ChopOMatic
ID: 35499475
Check your email, Bill. (I'm assuming I interpreted it correctly from your EE profile.)
0
