Dan_A

Voice Recognition

Looking for a simple voice recognition program to make a button jump when the app hears the word "Jump". I've downloaded Microsoft's Direct Speech Engine, but I don't know how to use it. I want to impress my VP... Please help (I only have 100 points to give). Thanks in advance - Dan A.
 
bhh

I too am interested in this...
listening...
Hi!

You had better download the whole Speech SDK 4.0a, not just the engine; the engine is only for text-to-speech (it is a speech synthesizer).

http://www.microsoft.com/IIT/download/default.htm

There is a VB section in the SDK, but it is mainly aimed at VC. There are a lot of samples and reading material in it.

It comes with a voice recognition/command sample program.

Make sure you take 4.0a, not the Beta SDK.

Matti
Hi!

Something more comes to mind: is it speech and spoken words you need to recognize, or just different people's voices?

Matti
:ping:
Ahhh....I have an idea that I am willing to share.

Ok this may take a little while but please bear with me :-)

There are several applications with source that will show you the graphical representation of a sound, like Windows Sound Recorder (black box with green graph lines). Now if you can open and receive the sound from a mic and graphically depict what it looks like, then you could save that, analyze any further input from the mic, and compare it to the graphical representation of the first sound.

Example:

You go and record the pass phrase:
"Open Says Me"
Keep in mind that you can speak in any tone. Then you save that as your pass phrase. Later you come back and enter your program, saying "Open Says Me".
This time it compares the graphical representation of the current pass phrase to the saved (original) graphical representation. If it matches then OK, if not then....

Now the problem is that variations of a sound can occur, which would cause the graph to be skewed. If normal is medium, then a soft voice saying the pass phrase would be rejected because the graph would be much lower. To compensate for this you would need to incorporate a percentage/security level, and then compare the graph of the attempt to the graph of the original within that margin....if a vowel peaks at 10 then only let it deviate from 10 by 2 pixels...so a correct vowel sound could be 8-12 on that certain part. I am sure that I will not be clear enough but...

Original pass phrase:
H

M ___/\___

L

Attempted pass phrase:

H

M ___/\___

L


Requirements:

No part may deviate more than 2 pixels above or below the original.
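
For what it's worth, the requirement above can be sketched in a few lines. This is only an illustration in Python (chosen for brevity, not taken from the SDK or from any VB sample): the graph is assumed to be reduced to a list of heights, one per column, and the names `within_tolerance`, `original`, `close` and `soft` are made up for the example.

```python
def within_tolerance(original, attempt, tolerance=2):
    """Return True if every point of the attempt is within +/- tolerance of the original."""
    # Assumption: both graphs were already resampled to the same number of columns.
    if len(original) != len(attempt):
        return False
    return all(abs(o - a) <= tolerance for o, a in zip(original, attempt))


original = [0, 0, 4, 10, 4, 0, 0]   # the ___/\___ shape, peaking at 10
close = [0, 1, 5, 9, 3, 0, 0]       # same phrase, small variations
soft = [0, 0, 2, 5, 2, 0, 0]        # same shape spoken softly, peak only 5

print(within_tolerance(original, close))
print(within_tolerance(original, soft))
```

The soft attempt fails even though its shape is the same, which is the volume problem described earlier in this comment.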


Food For Thought

ne0
Simply put... You are comparing the graphs of the two. In doing so you can pick certain key parts/valleys/peaks and use the graph to specify what is required for a correct pass phrase to work.
           / \
         /  .  \
       /  /   \  \
     /  /  / \  \  \
   /  /  /     \  \  \
M O Mi


If a segment of the sound is analyzed and the percentage of deviation is set to 2%, with O being the original pass phrase, then M (maximum deviation) and Mi (minimum deviation) would be the limits for any sound thereafter. This would account for small differences in tone.
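
A rough sketch of the M / O / Mi idea, again in Python and only as an illustration. The helper names `deviation_band` and `inside_band` and the sample values are assumptions made up for this example.

```python
def deviation_band(original, percent=2.0):
    """Return (mi, m): the minimum and maximum allowed value at each point of O."""
    factor = percent / 100.0
    mi = [v - abs(v) * factor for v in original]
    m = [v + abs(v) * factor for v in original]
    return mi, m


def inside_band(attempt, mi, m):
    """True if the attempt stays between Mi and M at every point."""
    if len(attempt) != len(mi):
        return False
    return all(lo <= a <= hi for a, lo, hi in zip(attempt, mi, m))


original = [100.0, 200.0, 400.0, 200.0, 100.0]        # O
mi, m = deviation_band(original, percent=2.0)          # Mi and M

print(inside_band([101, 198, 405, 203, 99], mi, m))   # small differences in tone
print(inside_band([101, 198, 420, 203, 99], mi, m))   # one peak too high
```

Note that a pure percentage band shrinks to zero width wherever the original is zero, so a real implementation would probably need a minimum margin as well.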


If you were sick one day though... might as well go back to sleep. :-)

ne0
don't even think about trying to develop this on your own.  your boss, my boss, and my boss's great great great great grandchildren will be dead before you finish it (on your own).

just plug a mic into the back of your soundcard, incorporate the winsock control and get a co-conspirator.

when your boss says jump have someone around the corner send a msg through the socket to make the button move around.

then while you're at it tell him you wrote a neural network that can communicate.  have him type in a text box, click send and then just have your friend chat like a normal chat room.

he'll think you're brilliant, but you gotta buy your friend lunch!


Matti gave you the link for the MS SAPI Development Kit Suite.  I have used the samples in this package.  It is a huge download, but it is worth it.  There is an ActiveX control that wraps the command recognition API and makes using this technology very easy.

I am also in agreement with viperlin here.  Do *not* try to write your own speech recognition (esp in VB lol!)  There is a lot more to it than what ne0 is saying here.  Let MS do all the research for you and you can reap the benefits with (depending on connection speed) a few hours of downloading.  The total SDK is about 40 MB, but as I mentioned before, it is worth it.  If you get one of the smaller packages, you will probably be missing various tools and samples, so it is better to get the full thing.
Ahh come on...You don't think it would be easy to code this?   :-)   (j/k)  

ne0
Hi!

FYI ne0
There are very many things you need to take care of to recognize a person from a wave sample:

-sound card, computer settings, volume level, microphone make and model, etc.
-the timing variance, and the trimming and cutting of the sample
-background noise
-and lots of minor things
-error-level testing: you must have a large group of beta testers to see how it really works. Getting something "working" in your quiet room does not mean the same will work in real duty as a door guard or something like that.
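
To give an idea of just two items on that list (volume level and trimming), here is a small Python sketch. It is only an illustration under simple assumptions: samples are plain numbers, "silence" is anything below a fixed threshold, and the function names are made up. It does nothing about background noise or timing differences.

```python
def trim_silence(samples, threshold):
    """Drop leading and trailing samples whose absolute value is below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]


def normalize_peak(samples, target=1.0):
    """Scale the samples so that the loudest one has absolute value `target`."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)
    return [s * target / peak for s in samples]


quiet = [0, 0, 1, 5, -3, 0, 0, 0]   # soft voice, late start
loud = [0, 2, 10, -6, 0]            # same shape, twice the volume

a = normalize_peak(trim_silence(quiet, threshold=1))
b = normalize_peak(trim_silence(loud, threshold=1))
print(a)
print(b)
```

After trimming and normalizing, the two recordings line up, but only because this toy data differs in nothing except volume and start time.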

Speech recognition works at the phonetic level, not at the level of complete words or sentences.
The user "trains" it on his/her voice by reading certain words and sentences as samples; these are carefully selected so that every phoneme occurs in them at least once.

The graph is problematic. It is a picture, and you need to compare it twice, for both "coverage" and "surplus"; otherwise it needs only a constant loud sound to make a full "peak" and cover everything. You need to know both the BMP and wave formats really well to code anything with these, and there are just many variables in this. The Speech SDK does not use this approach: it has a wave editor with a graph, but the analysis is based on "pitch", "energy" and phonemes, so the graph is good-looking but has not much to do with this kind of analysis. The graph approach also does not leave much room for errors. I think you left the other process out, so it works until someone starts making a loud, full-scale sound; if you know that, then a squirt or a fart will open the sound lock. It is also slow, whereas the Speech SDK works in almost real time.
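
The "coverage" and "surplus" point can be shown with a short Python sketch. This is one possible reading of the two checks, not how the Speech SDK works, and the function names and numbers are made up for the illustration.

```python
def covers(original, attempt):
    """Coverage only: the attempt reaches at least the original height everywhere."""
    return all(a >= o for o, a in zip(original, attempt))


def surplus(original, attempt):
    """Total amount by which the attempt rises above the original."""
    return sum(max(a - o, 0) for o, a in zip(original, attempt))


def matches(original, attempt, max_surplus):
    """Both checks: full coverage and only a limited surplus."""
    return covers(original, attempt) and surplus(original, attempt) <= max_surplus


original = [0, 0, 4, 10, 4, 0, 0]
genuine = [0, 1, 4, 10, 5, 0, 0]
constant_loud = [10] * 7            # a constant full-scale noise

print(covers(original, constant_loud))        # the noise "covers" everything
print(matches(original, constant_loud, 5))    # but its surplus gives it away
print(matches(original, genuine, 5))
```

With the coverage check alone, the constant loud sound opens the lock; the second comparison is what rejects it.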
With the values as long integers, much better logical operations are possible, but the speech technology in that SDK is ready to use. The only disadvantage is that the ready-made components are made for speech and word recognition in VB, so the correction vocabulary and phonetic vocabulary try to recognize a word, and there is a corrector, "so it makes it too good". The language has rules, and this makes it much easier to pick an existing word from speech: collect the analyzed phonemes and see whether they form a word in this order, or whether there are one or more wrong phonemes in the analysis.
You would need to make your own component with the Speech SDK and leave these correction parts out of the process. This is for VC, or there is a possibility to extend VB with a typelib.

There is a large professional team constantly making this SDK better, versus one VB programmer trying to solve it alone. Best of all, there is currently NO extra cost for redistributing a SAPI app. See the redistribution notes of the SDK.


Matti
>>Ahh come on...You don't think it would be easy to code this?   :-)   (j/k)  

>>ne0

Coding it would be the easy part.  Getting it to actually work, now there's the problem.  :)  BTW, I found that the command recognition was reasonably reliable (after training the voice recognition engine for awhile) but the dictation API still needs a lot of work:

Mary had a little lamb, by Microsoft Dictation Pad:

Geary and needed so of lamb
Her fleas was buying no
And Henry way area then merry way the land was shortened go
Hi!

Voice Commands has only a few choices to make: the installed programs and the common commands in them. Everything is in the "What Can I Say" list, and it is quite a small list to compare a spoken sentence against; this makes it much more reliable.

Dictation Pad has the whole vocabulary of a language to choose from, and the pre-read personal phoneme library plays a very important role in it.
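
Why a small list helps can be sketched like this in Python. It is only an illustration, not how the SDK matches commands: the phoneme labels are made-up ARPAbet-style symbols, and `COMMANDS`, `edit_distance` and `best_command` are hypothetical names. The heard phonemes are compared to every command in the list, and the nearest one wins if it is close enough.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


# Hypothetical "What Can I Say" list with illustrative phoneme spellings.
COMMANDS = {
    "jump": ["JH", "AH", "M", "P"],
    "stop": ["S", "T", "AA", "P"],
    "open": ["OW", "P", "AH", "N"],
}


def best_command(heard, commands=COMMANDS, max_errors=1):
    """Return the closest command, or None if even the closest has too many wrong phonemes."""
    word, dist = min(((w, edit_distance(heard, p)) for w, p in commands.items()),
                     key=lambda pair: pair[1])
    return word if dist <= max_errors else None


print(best_command(["JH", "AH", "M", "B"]))   # one wrong phoneme
print(best_command(["K", "AE", "T"]))         # nothing in the list is close
```

With three commands, one wrong phoneme still leaves a clear winner. With a whole dictionary there are many near neighbours for every word, which is one reason dictation is so much harder.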

Matti
>The Dictationpad has the whole word count of a language to make a choice, and the preread personal phonome library has a very important role in it.

I can see it is a much more challenging task to make a dictation product.  Still I trained the engine for over an hour, and this is the best it could do?

I've tried Dragon NaturallySpeaking voice recognition engine and find the dictation is by far superior.
I don't know much about speech and voice recognition.
1) Is it possible to access the memory of the sound card?
2) As every wave file varies only in wavelength and frequency, why not write an algorithm to get the wavelength of the speech, so that voice recognition is possible?
Please comment.
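
On point 2, getting a frequency out of a wave is the easy part, at least for a pure tone. A minimal Python sketch that counts zero crossings, using a synthetic 440 Hz sine as the test signal (the function name and all the numbers are assumptions for the example):

```python
import math


def zero_crossing_frequency(samples, sample_rate):
    """Estimate the frequency of a pure tone from the number of sign changes."""
    crossings = sum(1 for prev, cur in zip(samples, samples[1:])
                    if (prev < 0) != (cur < 0))
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)    # two crossings per cycle


rate = 8000                              # samples per second
tone = [math.sin(2 * math.pi * 440 * n / rate + 0.1) for n in range(rate)]

print(zero_crossing_frequency(tone, rate))
```

This works for a single steady tone. Speech is a mix of many frequencies that change from moment to moment, so one frequency or wavelength figure does not tell you which word was spoken.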
I have returned :-)


Matti: I agree that there are many undotted 'i's and uncrossed 't's but...
MS does not have to continue working on it in the future; they do have a tendency to drop things that are not profitable. So a programmer would be forced to use whatever is given to him/her until the programmer decides to code one of their own or finds one elsewhere.

PaulHews: :-) you're not even kidding. I am glad you got my joke. In OOA/OOD we are talking years, and probably the same with conventional methods. This is just to get to where the SDK is (or hopefully further).


I agree that a VC, or even better a Win32 assembly, component would be far better for computations as well as more resourceful. In making such a component one would need a programming team (not just for the 20% that actually does the Voice Recognition/Comparison, but for the 80% error handling/variation checking around it).

I was merely expressing a view/opinion on the simplest voice recognition solution asked for by Dan A (besides the SDK that MS provides, which would be the easiest), and hopefully showing that even though the question was simple, the answer is quite complex. As far as coding your own goes, changes in speech tone, pitch, emphasis, dB, start time, end time, segment duration and pretty much all variations (between 20Hz - 12KHz, whether from another individual or from the individual's environment) would of course need to be accounted for. I only wanted to scrape the tip of the ice to show that an iceberg lies underneath. I personally have not tried the MS SDK so I cannot comment on that.

The possibilities are endless,
it is the resources that are limited.

ne0
Hi!

That Dragon engine has a correction vocabulary over four times bigger, and it's a totally different class of product.

MS may stop working with this, but currently nothing points to that.

I have seen one VB application (not a published sample) that did bit-level sound/wave recognition and comparison; three months of work went into it, and the result was not perfect.
Precisely because of those limited resources, this SAPI SDK is a good choice: it is the easiest voice recognition solution.


Matti
<ping> This conversation is gettin interesting! lol
Hi!

Dan A, some comments please!

Matti
Dan_A

ASKER

If I install the whole SDK kit and try to use a documented example, does it seem like VB is a good choice for coding a voice recognition app? If not, what language or third-party components should I look at? All I wanted to do is make a button jump upon my command..... Thanks for all your comments. I'm new to VB development and I'm trying to explore its reach. So far, I'm pretty strong with VB and Sybase using ADO. I watch for questions on this topic, so I should be able to help in this arena (e.g. when to leave and drop connections, etc.) - Dan A.
   
ASKER CERTIFIED SOLUTION
Matti
Dan_A

ASKER

Sorry Matti, I posted the question prior to going on vacation; trying to catch up is not always easy. I'll download version 4.0a and try to follow the examples... If I get it to work, I'll document my source and post it on EE.

Dan A.
   
Asta Cu
Greetings.

This question is still open today, perhaps it was overlooked or just lost in the volumes.  Please return to this question to update it with comments if more information is needed to get your solution.  If you've been helped by the participating expert(s), you may just convert their comment to the accepted answer and then grade and close.  If an answer has ever been proposed you may not have this option to accept the comment as answer, if that is the case, ask the specific expert you wish to pay to post an answer.  This benefits others who then search our PAQ for just this solution.  A win/win scenario.

If you wish to pay multiple participants, you can do so by creating a zero point question in the Community Support topic area, include this link and tell them which experts you'd like to pay what amounts.  If you'd like to delete this question, use the same process as above, but explain why you think it should be deleted.  Here is the Community Support link:   https://www.experts-exchange.com/jsp/qList.jsp?ta=commspt

You can always click on your profile to see all your open questions, in the event you also have other open items to be resolved.   If your number of Questions Asked is not equal to the number of Answers Graded, choose to VIEW question history, and you'll quickly be able to navigate to your open items to close them as well.

I've had excellent help from experts-exchange through the years and find the real key to getting what I need is to remain active in all my questions, responding with results to suggestions until my solution is found, and recommend that highly.

Thank you very much for your responsiveness, it is very much appreciated.  
":0)  Asta

P.S.  Some of the older questions from last year are not in the proper comment date order, and Engineering has been advised.    
Force accepting Matti's last comment.

costello
Community Support Moderator @ Experts-Exchange