Wav to Text program using SAPI 5 and Delphi

fatihbarut
Hi,
I want to convert wav files into text using SAPI 5.1. It should be straightforward, because instead of sound recorded from a microphone I will use sound recorded elsewhere.
Microsoft says this is possible with SAPI 5.1; however, I couldn't work out how to do it after completing these steps:
- Installing SAPI 5.1
- Installing the SAPI components into Delphi

Briefly, I need answers to these 2 questions:
- Which SAPI components should I use?
- Which methods should I use, and how?
Thank you
Commented:
Hi, hope this helps. (The code is not checked, as I don't have the SAPI SDK installed right now.)

1) After installing the SAPI 5.1 SDK, import the SAPI type library into Delphi via Project | Import Type Library. This installs wrapper components for the SAPI engine.

2) Drop a TSpInProcRecoContext component onto your form and name it SR, then drop a TSpFileStream and name it FileStream. Write handlers for the following events of the SR component:

OnRecognition, OnFalseRecognition, OnEndStream

like this:

procedure TForm1.OnSRRecognition(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; RecognitionType: TOleEnum; var Result: OleVariant);
var
  SRResult: ISpeechRecoResult;
begin
  // If the IDE generates a different Result parameter (for example
  // const Result: ISpeechRecoResult), keep the generated signature
  // and use Result directly instead of casting.
  SRResult := IDispatch(Result) as ISpeechRecoResult;
  // Show the whole recognized phrase: all elements, with replacements applied.
  ShowMessage(SRResult.PhraseInfo.GetText(0, -1, True));
end;

procedure TForm1.OnSRFalseRecognition(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; var Result: OleVariant);
begin
 Showmessage('Cannot recognize');
end;

procedure TForm1.OnSREndStream(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; StreamReleased: WordBool);
begin
 FileStream.Close;
end;

then load the wav file like this:

procedure TForm1.Start;
begin
  FileStream.Open(aWaveFile, SPFM_OPEN_READONLY, False);
  SR.Recognizer.AudioInputStream := FileStream.DefaultInterface;
end;

Hope I helped...

Author

Commented:
Firstly, it looks great, thank you.
Secondly, I'm sorry, but I have a problem with my Delphi 7.

It reports [Error] Unit1.pas(14): Undeclared identifier: 'TOleEnum' for the procedure below:


procedure TForm1.OnSRRecognition(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; RecognitionType: TOleEnum; var Result: OleVariant);



Commented:
Hi, did you import the type library as I described? If yes, check that your unit lists these units in its uses clause (TOleEnum is declared in ActiveX):

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  ActiveX, OleServer, ExtCtrls;

Author

Commented:
By the way, I realized that I am using Windows 7, so my SAPI version is 5.4. What should I do now?
Uninstall it somehow and reinstall 5.1?

Author

Commented:
Sorry, after adding ActiveX, OleServer and ExtCtrls to the uses clause the problem was solved. However, when I call the "Start" procedure nothing happens. How can I write the recognized stream into a text file or into a memo?
Commented:
Hmm, use the following code:

  Grammar: ISpeechRecoGrammar; // Put this in private area of your class

Add this event...

procedure TForm1.FormCreate(Sender: TObject);
begin
 Grammar := SR.CreateGrammar(0);
 Grammar.Reset(languageID);           // languageID: your language ID (I don't know which one you need)
end;

Then change the start procedure as follows:

procedure TForm1.Start;
begin
  FileStream.Open(aWaveFile, SPFM_OPEN_READONLY, False);
  SR.Recognizer.AudioInputStream := FileStream.DefaultInterface;
  Grammar.CmdSetRuleState('TopRule', SGDSActive);
end;

This should start the recognition and fire the OnRecognition event...

Author

Commented:
The language I use is English; I would be happy to learn its language code.
I also got an "Undeclared identifier" error for the word 'Grammar'.
Thanks again
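
On the language code question: the Windows language identifier for English (United States) is 1033 decimal, which is $0409 in Delphi hex notation. A minimal sketch, assuming the SR component and the Grammar field from the comment above, and assuming the imported ISpeechRecoGrammar.Reset takes the language ID as an integer; the constant name is made up for illustration:

```pascal
const
  LANGID_EN_US = $0409; // 1033, English (United States); illustrative constant name

procedure TForm1.FormCreate(Sender: TObject);
begin
  Grammar := SR.CreateGrammar(0);
  // Reset clears the grammar's rules and sets its language.
  Grammar.Reset(LANGID_EN_US);
end;
```

Other English variants have their own identifiers (for example $0809 for United Kingdom English), so the value should match the installed recognition engine.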

Author

Commented:
Sorry, my fault.
I added  Grammar: ISpeechRecoGrammar; in the private section.
However, this time when I execute the Start procedure I get an "OLE error 80045001" message.

Author

Commented:
This is where I currently stand.
With the help of twinsoft's answers and the 2 articles linked below, I have produced the working code attached here.

http://edn.embarcadero.com/article/29583
and
http://www.delphi3000.com/articles/article_2629.asp

However, the results are awful. I need to improve the sensitivity and accuracy.
For example, the real speech is:
"Welcome to the turbo power happy voice example program, press any key on your phones touch pad to start program"
The conversion is:
"Welcome to the cattle polish up where it can't be politically ample program but any key on your own cut that to start a program"
Any further help would be much appreciated.

unit AltYazarP;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, SpeechLib_TLB, OleServer, StdCtrls, ActiveX, ExtCtrls;

type
  TForm1 = class(TForm)
    SR: TSpInProcRecoContext;
    FileStream: TSpFileStream;
    Button1: TButton;
    Hipotezler: TMemo;
    OpenDialog1: TOpenDialog;
    Label1: TLabel;
    Label2: TLabel;
    Taninanlar: TMemo;
    Button2: TButton;
    procedure SRFalseRecognition(ASender: TObject; StreamNumber: Integer;
      StreamPosition: OleVariant; const Result: ISpeechRecoResult);
    procedure SREndStream(ASender: TObject; StreamNumber: Integer;
      StreamPosition: OleVariant; StreamReleased: WordBool);
    procedure Start;
    procedure Button1Click(Sender: TObject);
    procedure SRRecognition(ASender: TObject; StreamNumber: Integer;
      StreamPosition: OleVariant; RecognitionType: TOleEnum;
      const Result: ISpeechRecoResult);
    procedure FormCreate(Sender: TObject);
    procedure SRHypothesis(ASender: TObject; StreamNumber: Integer;
      StreamPosition: OleVariant; const Result: ISpeechRecoResult);
    procedure Button2Click(Sender: TObject);
  private
    { Private declarations }
    Grammar: ISpeechRecoGrammar;
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}
procedure TForm1.FormCreate(Sender: TObject);
begin
  // Create a grammar on the recognition context and enable free dictation.
  Grammar := SR.CreateGrammar(0);
  Grammar.DictationSetState(SGDSActive);
end;

procedure TForm1.Start;
begin
  if OpenDialog1.Execute then
  begin
    // Feed the chosen wav file to the recognizer instead of the microphone.
    FileStream.Open(OpenDialog1.FileName, SPFM_OPEN_READONLY, False);
    SR.Recognizer.AudioInputStream := FileStream.DefaultInterface;
  end;
end;

procedure TForm1.SRFalseRecognition(ASender: TObject;
  StreamNumber: Integer; StreamPosition: OleVariant;
  const Result: ISpeechRecoResult);
begin
  // ShowMessage('Cannot recognize');
end;

procedure TForm1.SREndStream(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; StreamReleased: WordBool);
begin
  FileStream.Close;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  Start;
end;

procedure TForm1.SRRecognition(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; RecognitionType: TOleEnum;
  const Result: ISpeechRecoResult);
begin
  // Taninanlar ("recognized") is a memo that collects the final results.
  Taninanlar.Text := Taninanlar.Text + Result.PhraseInfo.GetText(0, -1, True);
end;

procedure TForm1.SRHypothesis(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; const Result: ISpeechRecoResult);
begin
  // Hipotezler ("hypotheses") is another memo that collects interim guesses.
  Hipotezler.Text := Hipotezler.Text + Result.PhraseInfo.GetText(0, -1, False);
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  // erased garbage
end;

end.
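
On the earlier question of getting the recognized text into a text file as well as a memo, one possible variation of two handlers in the unit above is sketched here. It assumes the final Recognition event has already arrived by the time EndStream fires, and the output file name (the wav file's path with a .txt extension) is only an illustrative choice:

```pascal
// Variation on the SRRecognition and SREndStream handlers of the unit above.
procedure TForm1.SRRecognition(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; RecognitionType: TOleEnum;
  const Result: ISpeechRecoResult);
begin
  // One recognized phrase per memo line, so phrases do not run together.
  Taninanlar.Lines.Add(Result.PhraseInfo.GetText(0, -1, True));
end;

procedure TForm1.SREndStream(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; StreamReleased: WordBool);
begin
  FileStream.Close;
  // Hypothetical output name: same path as the wav file, with a .txt extension.
  Taninanlar.Lines.SaveToFile(ChangeFileExt(OpenDialog1.FileName, '.txt'));
end;
```

ChangeFileExt comes from SysUtils, which the unit already uses. Saving in EndStream keeps the file write out of the per-phrase handler.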


Commented:
Hi, the ability of the engine to recognize words correctly has to do with the efficiency of the engine itself. The only thing that you can do is to use grammar rules to help the engine perform better but this is not easy as you have to create something like a vocabulary...
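
As a rough sketch of what using grammar rules could look like: SAPI 5 can load a command-and-control grammar from an XML file that lists the phrases to expect, which restricts recognition to those phrases instead of free dictation. The file name phrases.xml, the rule name TopRule and the phrases themselves are illustrative assumptions; SR and Grammar are the component and field used earlier in this thread.

```pascal
{ phrases.xml (hypothetical file in the SAPI 5 XML grammar format):
  <GRAMMAR LANGID="409">
    <RULE NAME="TopRule" TOPLEVEL="ACTIVE">
      <L>
        <P>welcome</P>
        <P>start program</P>
      </L>
    </RULE>
  </GRAMMAR> }

procedure TForm1.FormCreate(Sender: TObject);
begin
  Grammar := SR.CreateGrammar(0);
  // Load the phrase list; SLOStatic means the rules are not edited at run time.
  Grammar.CmdLoadFromFile('phrases.xml', SLOStatic);
  // Activate the top-level rule so the engine listens for its phrases.
  Grammar.CmdSetRuleState('TopRule', SGDSActive);
end;
```

The trade-off is that only the listed phrases can be recognized, so this suits a small, known vocabulary rather than transcription of arbitrary speech. A relative file name is resolved against the current directory, so a full path may be safer.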

Author

Commented:
Thanks again; however, I didn't completely understand what you said. How can I change the grammar rules? Is there more to do besides setting the parameters?
If there are parameters for grammar rules, I'd be happy to hear about them.

(For twinsoft) I have 4 more questions similar to this one, but comparatively easier ones:


Export M.S Speech Engine 5.1 (SAPI 5.1) and/or SAPI 4 library to another computer to save my recorded words
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/SSML/Q_25100009.html

Convert .wav (sound) to Graphic (jpg etc.) using delphi: Wav to binary -> Interperate binary-> Number to points-> Join points and draw
http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_25100383.html

Speech Recognation program using Delphi and Matlab
http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_25100036.html

Simple Speech Recognation program, which recognise numbers between 1-100 using Delphi (in turkish language)
http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_25100689.html?fromWizard=true

Author

Commented:
Thanks it was good to work with you...
