Which plot type is best for speech recognition or voice recognition?

fatihbarut
Hi guys,
Could you tell me which plot type in MATLAB is best for speech recognition?
There are hundreds of on-line references.

Search for "speech recognition"  "speech spectrogram"  or "Victor Zue"

Here are a few examples:

http://www.bcs.rochester.edu/courses/crsinf/561/ARCHIVES/S06/0426/Zue.pdf

http://sipl.technion.ac.il/~rafi/spectrogram%20segmentation.pdf

The basic process is to break the speech into short time segments.

Then you do an FFT on each segment to get frequency information.

And plot the frequency content versus time (this is a spectrogram).  There are examples in the second paper.

Then you have to break the spectrogram into phonemes.  This is really the guts of the process.  It means that you have to know what the spectrogram of each phoneme looks like.

And finally convert the phonemes into words.

If you want to understand the words, that is another level entirely.
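
For what it's worth, the first three steps above (short segments, an FFT on each, frequency content plotted versus time) can be sketched in a few lines of MATLAB.  This is only a rough illustration, not a recognizer.  The synthetic chirp stands in for a recorded word, and fs, frameLen, hop, and the hand-written Hamming window are assumed values chosen for the example, not anything taken from the papers.

```matlab
% Sketch only: a hand-rolled short-time FFT on a synthetic test signal.
% fs, frameLen, hop, and the test chirp are illustrative assumptions.
fs = 8000;                           % sample rate in Hz (assumed)
t  = (0:fs-1)' / fs;                 % one second of time stamps
x  = sin(2*pi*(500*t + 750*t.^2));   % tone sweeping from 500 Hz to 2000 Hz

frameLen = 256;                      % 32 ms segments at 8 kHz
hop      = 128;                      % 50% overlap between segments
win      = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));  % Hamming window

nFrames = floor((length(x) - frameLen)/hop) + 1;
S = zeros(frameLen/2 + 1, nFrames);  % one column of magnitudes per segment
for k = 1:nFrames
    idx = (k-1)*hop + (1:frameLen)';     % sample indices of segment k
    X   = fft(x(idx) .* win);            % FFT of one windowed segment
    S(:,k) = abs(X(1:frameLen/2 + 1));   % keep the non-negative frequencies
end

f      = (0:frameLen/2)' * fs / frameLen;          % frequency axis in Hz
tFrame = ((0:nFrames-1)*hop + frameLen/2) / fs;    % segment centers in seconds

imagesc(tFrame, f, 20*log10(S + eps));   % magnitude in dB, time on the x-axis
axis xy                                  % put low frequencies at the bottom
xlabel('Time, sec'); ylabel('Frequency, Hz'); colorbar
```

To try it on a real recording, replace x with one channel of the samples read from the WAV file and fs with the file's sample rate.  The Signal Processing Toolbox has a spectrogram function that does the same job; the loop is written out here only to show the segment-then-FFT idea.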

Author

Commented:
They are both great sources for anyone who has time to read them and, more importantly, has an engineering background :)

On the other hand, I understood exactly what you explained, thank you.

Finally, I am only trying to recognize 200 words at most. I don't need to segment words into phonemes. I just need to tell the words apart.
You could probably implement The Clapper in software without doing an FFT.

The next level up in audio processing would be something like decoding Touch Tone Dialing.  And I think you would need or want an FFT there.
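
As a rough illustration of why an FFT helps with Touch Tone decoding: each key is the sum of one row tone and one column tone, so it is enough to check which of the seven standard frequencies carry the most energy.  The sketch below uses a synthetic tone for the '8' key.  The sample rate, tone length, and variable names are assumptions for the example.

```matlab
% Sketch only: identify a Touch Tone key from the strongest row and column tones.
% The sample rate, tone length, and test key are illustrative assumptions.
fs = 8000;                            % sample rate in Hz (assumed)
N  = 800;                             % 100 ms of samples at 8 kHz
t  = (0:N-1)' / fs;
rowF = [697 770 852 941];             % standard DTMF row frequencies, Hz
colF = [1209 1336 1477];              % standard DTMF column frequencies, Hz
keys = ['123'; '456'; '789'; '*0#'];  % keypad layout, rows by columns

x = sin(2*pi*852*t) + sin(2*pi*1336*t);   % synthetic tone for the '8' key

X = abs(fft(x));                      % magnitude spectrum of the whole tone
f = (0:N-1)' * fs / N;                % frequency of each FFT bin, Hz

% Read the spectrum at the bin nearest each candidate frequency.
rowMag = zeros(1, 4);
for i = 1:4
    [~, b] = min(abs(f - rowF(i)));
    rowMag(i) = X(b);
end
colMag = zeros(1, 3);
for j = 1:3
    [~, b] = min(abs(f - colF(j)));
    colMag(j) = X(b);
end

[~, r] = max(rowMag);                 % strongest row tone
[~, c] = max(colMag);                 % strongest column tone
key = keys(r, c);                     % '8' for this test tone
disp(key)
```

A real decoder would also have to find where each tone starts and stops and reject noise, which this sketch ignores.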

Even if you are just trying to respond to short, simple commands:   "STOP"  "TURN LEFT"  and  "FIRE", you will probably need the FFT and you may still want to use phonemes.

If you can prerecord and process your target vocabulary, you might get by with correlating
the input spectrogram against the target words.  This would work best for a single speaker.
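
A minimal sketch of that correlation idea, under some big assumptions: every recording has already been trimmed to the same length and lined up in time, and the three synthetic signals stand in for prerecorded words.  The labels in words, the signals, and all parameter values are hypothetical.  corrcoef is a base MATLAB function, so no toolbox is needed.

```matlab
% Sketch only: match an input spectrogram against stored template spectrograms.
% Signals, labels, and parameters are hypothetical stand-ins for recorded words.
fs = 8000;  N = 4000;                 % assumed: 0.5 s recordings at 8 kHz
t  = (0:N-1)' / fs;
frameLen = 256;  hop = 128;           % 32 ms segments with 50% overlap
nFrames  = floor((N - frameLen)/hop) + 1;
half     = 1:(frameLen/2 + 1);        % rows holding non-negative frequencies

% Index matrix: column k holds the sample numbers of segment k.
idx = repmat((1:frameLen)', 1, nFrames) + repmat((0:nFrames-1)*hop, frameLen, 1);
win = repmat(0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1)), 1, nFrames);

% "Prerecorded" vocabulary: two steady tones and one rising sweep.
words = {'low', 'high', 'sweep'};
sigs  = {sin(2*pi*400*t), sin(2*pi*1500*t), sin(2*pi*(400*t + 1100*t.^2))};
templates = cell(size(sigs));
for w = 1:numel(sigs)
    X = fft(sigs{w}(idx) .* win);     % FFT of every segment at once
    templates{w} = abs(X(half, :));   % magnitude spectrogram of the template
end

% Input: a quieter, noisy version of the 'high' word.
x = 0.8*sin(2*pi*1500*t) + 0.2*randn(N, 1);
X = fft(x(idx) .* win);
S = abs(X(half, :));

% Score each template by its correlation coefficient with the input.
score = zeros(1, numel(words));
for w = 1:numel(words)
    R = corrcoef(S(:), templates{w}(:));
    score(w) = R(1, 2);
end
[~, best] = max(score);
disp(words{best})                     % best-matching template label
```

Real speech varies in speed and timing from one utterance to the next, so a plain correlation like this is fragile.  That is part of why it would work best for a single speaker.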

Author

Commented:
I used the code below:

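function wavimportandplot(filename)
% Plot a WAV file's waveform against time and save it as a PNG.
% audioread replaces wavread, which was removed in MATLAB R2015b.
[data,fs] = audioread(filename);
figure('visible','off')                  % draw off-screen
% One stem series per channel; the x-axis is the sample index converted to seconds.
stem(repmat((1:size(data,1))'/fs,1,size(data,2)),data,'marker','none')
xlim([1 size(data,1)]/fs)
xlabel('Time, sec')
% Save at 300 dpi under the WAV file's name with the extension removed.
print('-dpng','-r300',strrep(filename,'.wav',''))
close gcf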

However, bar, stem, and area plots didn't satisfy me.
I need something much more useful.
How are you generating the input waveforms?
What is the sample rate and resolution?

What sort of processing are you trying to implement here:

          stem(repmat((1:size(data,1))'/fs,1,size(data,2)),data,'marker','none')

As near as I can tell, the stem() function just plots the amplitude of each sample with a little circle on top.
Here is a recent App Note from Matlab on Isolated Word Recognition.  In this case they are trying to recognize just ten words: the spoken digits from zero to nine.  Your project is at least this complicated.

http://www.mathworks.com/company/newsletters/digest/2010/jan/word-recognition-system-matlab.html

Author

Commented:
Another great article.
However, I just want to see my wave files (spoken words) in a format that even a human can tell apart by eye.
Speech Recognition is a hard problem, and a very active research area with a 50+ year history.

http://www.csd.uoc.gr/~hy578/2005/projects/ieee_sp_spm_1998015_03may_0024juan.pdf

>>  a format which even a human can differentiate by eye

Victor Zue (who I mentioned in my first post) can "read" speech spectrograms as a second language.  It is one of the things that got him written up in Time Magazine (01-APR-1985) and how I first heard about him.

But if you want a format that anybody can differentiate, you really need to do some heavy duty signal processing.  The output format you need is English Text.  There are no shortcuts.

Author

Commented:
I still need a clear answer.
Thanks for all the previous ones.

Author

Commented:
Thanks, I am trying to contact Mr. Zue.
The question is resolved and relevant links are provided.  I was the only responder.
I recommend a split between the following two posts:

d-glitch     https:#a38002614   <== Accept
The author hoped to do speech recognition without any signal processing.  My critique was (and still is) correct.

d-glitch     https:#a37999009
This is a still-active link to an App Note for Isolated Word Recognition (the stated goal) in MATLAB (the author's preferred language).
