fatihbarut


Which plot type is best for speech recognition or voice recognition?

Hi guys,
could you tell me which plot type in MATLAB is best for speech recognition?
d-glitch

There are hundreds of on-line references.

Search for "speech recognition"  "speech spectrogram"  or "Victor Zue"

Here are a few examples:

http://www.bcs.rochester.edu/courses/crsinf/561/ARCHIVES/S06/0426/Zue.pdf

http://sipl.technion.ac.il/~rafi/spectrogram%20segmentation.pdf

The basic process is to break the speech into short time segments.

Then you do an FFT on each segment to get frequency information.

And plot the frequency content versus time (this is a spectrogram).  There are examples in the second paper.
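A minimal MATLAB sketch of those three steps (segment, FFT each segment, plot frequency versus time), written without toolbox calls so it also runs in GNU Octave. The test signal (a chirp rising from 300 Hz to about 1100 Hz), the 8 kHz sample rate, and the 25 ms frame / 10 ms hop are assumptions chosen for illustration; for a real recording, replace x and fs with the output of audioread. With the Signal Processing Toolbox, spectrogram(x, hann(200), 120, 256, fs, 'yaxis') draws the same kind of plot in one call.

```matlab
% Sketch: short-time FFT (spectrogram) computed by hand.
% Assumed parameters: 8 kHz sampling, 25 ms frames, 10 ms hop.
fs = 8000;
t  = (0:fs-1)' / fs;                          % 1 second of sample times
x  = sin(2*pi*(300 + 400*t) .* t);            % assumed input: chirp, 300 -> 1100 Hz

N   = round(0.025 * fs);                      % frame length (200 samples)
hop = round(0.010 * fs);                      % hop between frames (80 samples)
w   = 0.5 - 0.5*cos(2*pi*(0:N-1)' / (N-1));   % Hann window, built by hand

nFrames = floor((numel(x) - N) / hop) + 1;
idx     = (1:N)' + (0:nFrames-1) * hop;       % each column indexes one frame
frames  = x(idx) .* w;                        % windowed frames, one per column

nfft = 256;
X    = fft(frames, nfft);                     % FFT of every column (frame)
S    = 20*log10(abs(X(1:nfft/2+1, :)) + eps); % keep 0..fs/2, convert to dB

f  = (0:nfft/2)' * fs / nfft;                 % frequency of each row, Hz
tc = ((0:nFrames-1) * hop + N/2) / fs;        % centre time of each frame, sec

imagesc(tc, f, S); axis xy; colorbar
xlabel('Time, sec'); ylabel('Frequency, Hz'); title('Spectrogram')
```

Each column of S is one short segment; the bright ridge climbing from left to right is the rising pitch, which is the kind of pattern you learn to read for each phoneme.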

Then you have to break the spectrogram into phonemes.  This is really the guts of the process.  This means that you have to know what the spectrogram of each phoneme looks like.

And finally convert the phonemes into words.

If you want to understand the words, that is another level entirely.
fatihbarut

ASKER

They are both great sources for anyone who has time to read and, more importantly, an engineering background :)

On the other hand, I understood exactly what you explained, thank you.

And finally, I am just trying to recognize at most 200 words. I don't need to segment the words into phonemes; I just need to tell them apart.
You could probably implement The Clapper in software without doing an FFT.
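To illustrate the point, a clap detector only needs the short-term energy of the signal, with no FFT at all. A minimal MATLAB/Octave sketch; the synthetic input (two decaying 20 ms bursts over a quiet 50 Hz hum), the 10 ms frame, and the "10 times the median frame energy" threshold are assumptions for illustration, not tuned values.

```matlab
% Sketch: count claps from short-term energy alone (no FFT).
fs = 8000;
t  = (0:2*fs-1)' / fs;                        % 2 s timeline
x  = 0.001 * sin(2*pi*50*t);                  % assumed input: quiet background hum

tb    = (0:round(0.02*fs)-1)' / fs;           % 20 ms burst
burst = 0.8 * sin(2*pi*2000*tb) .* exp(-tb/0.005);   % decaying "clap" (synthetic)
for onset = [0.5, 1.2]                        % two claps, at 0.5 s and 1.2 s
    k = round(onset*fs) + (1:numel(tb))';
    x(k) = x(k) + burst;
end

N  = round(0.010 * fs);                       % 10 ms non-overlapping frames
nF = floor(numel(x) / N);
E  = sum(reshape(x(1:nF*N), N, nF).^2, 1);    % energy of each frame

loud   = E > 10 * median(E);                  % assumed threshold rule
onsets = find(diff([false, loud]) == 1);      % rising edges = separate events
fprintf('Detected %d clap(s) at %s s\n', numel(onsets), mat2str((onsets-1)*N/fs));
```

On this synthetic input it reports two events, at 0.5 s and 1.2 s. Anything loud and short triggers it, which is exactly why this approach stops working once you need to tell one sound from another.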

The next level up in audio processing would be something like decoding Touch Tone dialing. I think you would need, or at least want, an FFT there.
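As an example of that next level, each Touch Tone (DTMF) key is the sum of one low "row" tone (697, 770, 852, or 941 Hz) and one high "column" tone (1209, 1336, 1477, or 1633 Hz), so decoding means finding which two frequencies are present. A minimal MATLAB/Octave sketch that reads the eight candidate bins from an FFT; the 50 ms synthetic tone for key '5' and the 1024-point FFT are assumptions. Telephone equipment often uses the Goertzel algorithm instead, which evaluates only those eight bins.

```matlab
% Sketch: decode one Touch Tone (DTMF) key with an FFT.
fs = 8000;
t  = (0:round(0.05*fs)-1)' / fs;              % 50 ms tone burst
x  = sin(2*pi*770*t) + sin(2*pi*1336*t);      % assumed input: the key '5'

lowF   = [697 770 852 941];                   % row frequencies, Hz
highF  = [1209 1336 1477 1633];               % column frequencies, Hz
keypad = ['123A'; '456B'; '789C'; '*0#D'];

nfft = 1024;
P    = abs(fft(x, nfft));                     % magnitude spectrum (zero-padded)
bin  = @(fr) round(fr * nfft / fs) + 1;       % nearest FFT bin (1-based index)

[~, r] = max(P(bin(lowF)));                   % strongest row tone
[~, c] = max(P(bin(highF)));                  % strongest column tone
digit  = keypad(r, c);
fprintf('Decoded key: %s\n', digit);
```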

Even if you are just trying to respond to short, simple commands ("STOP", "TURN LEFT", and "FIRE"), you will probably need the FFT, and you may still want to use phonemes.

If you can prerecord and process your target vocabulary, you might get by with correlating
the input spectrogram against the target words.  This would work best for a single speaker.
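A minimal MATLAB/Octave sketch of that idea, assuming each vocabulary word has been prerecorded as a template. Synthetic chirps stand in for real recordings (the labels STOP, LEFT, and FIRE are hypothetical), each log-magnitude spectrogram is linearly stretched to a fixed 40 frames so words spoken at different speeds can be compared, and the score is a plain normalized correlation coefficient. Dynamic time warping (DTW) is the usual, more robust replacement for the linear stretch.

```matlab
% Sketch: pick the stored word whose spectrogram best correlates with the input.
fs = 8000;  N = 200;  hop = 80;  nfft = 256;  K = 40;   % assumed parameters
w  = 0.5 - 0.5*cos(2*pi*(0:N-1)'/(N-1));               % Hann window

% Log-magnitude spectrogram of a column vector x (bins x frames).
frameIdx = @(x) (1:N)' + (0:floor((numel(x)-N)/hop))*hop;
halfSpec = @(X) X(1:nfft/2+1, :);
logSpec  = @(x) log(halfSpec(abs(fft(x(frameIdx(x)) .* w, nfft))) + eps);

% Stretch or squeeze the time axis to K frames so different lengths compare.
toK   = @(F) interp1((1:size(F,2))', F', linspace(1, size(F,2), K)')';
ncorr = @(A,B) sum((A(:)-mean(A(:))) .* (B(:)-mean(B(:)))) / ...
               sqrt(sum((A(:)-mean(A(:))).^2) * sum((B(:)-mean(B(:))).^2));

% Hypothetical vocabulary: linear chirps stand in for prerecorded words.
chirpOf = @(f0, f1, dur) sin(2*pi*(f0*(0:round(dur*fs)-1)'/fs + ...
          (f1-f0)/(2*dur)*((0:round(dur*fs)-1)'/fs).^2));
names     = {'STOP', 'LEFT', 'FIRE'};
templates = {chirpOf(300, 1500, 0.4), chirpOf(1500, 300, 0.4), chirpOf(800, 800, 0.4)};

spoken = chirpOf(300, 1500, 0.55);            % "STOP", spoken more slowly
Fin    = toK(logSpec(spoken));

scores = zeros(1, numel(templates));
for k = 1:numel(templates)
    scores(k) = ncorr(Fin, toK(logSpec(templates{k})));
end
[~, best] = max(scores);
fprintf('Best match: %s (r = %.2f)\n', names{best}, scores(best));
```

With real speech, the templates would come from audioread on recordings of the same speaker, which is why this works best for a single speaker.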
I used the code below:

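function wavimportandplot(filename)
% Plot a WAV file's waveform and save it as a 300 dpi PNG next to the file.
% audioread replaces wavread, which has been removed from recent MATLAB releases.
[data, fs] = audioread(filename);            % data: samples x channels
figure('visible', 'off')                     % draw off-screen
t = (1:size(data,1))' / fs;                  % time of each sample, sec
% One vertical line per sample (no marker), one series per channel.
stem(repmat(t, 1, size(data,2)), data, 'marker', 'none')
xlim([1 size(data,1)] / fs)
xlabel('Time, sec')
print('-dpng', '-r300', strrep(filename, '.wav', ''))   % word.wav -> word.png
close(gcf)
end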

However, bar, stem, and area plots didn't satisfy me.
I just need a much more useful one.
How are you generating the input waveforms?
What is the sample rate and resolution?

What sort of processing are you trying to implement here:

          stem(repmat((1:size(data,1))'/fs,1,size(data,2)),data,'marker','none')

As near as I can tell, the stem() function just plots the amplitude of each sample with a little circle on top.
SOLUTION
d-glitch
Another great article.
However, I just want to see my wave files (spoken words) in a format that even a human can tell apart by eye.
ASKER CERTIFIED SOLUTION
I still need a clear answer.
Thanks for all the previous ones.
Thanks, I am trying to contact Mr. Zue.
The question is resolved and relevant links are provided.  I was the only responder.
I recommend a split between the following two posts:

d-glitch     https:#a38002614   <== Accept
The author hoped to do speech recognition without any signal processing.  My critique was (and still is) correct.

d-glitch     https:#a37999009
This is a still-active link to an app note for Isolated Word Recognition (the stated goal) in MATLAB (the author's preferred language).