fatihbarut

asked on

Wav to Text program using SAPI 5 and delphi

Hi,
I want to convert wav files into text using SAPI 5.1. It should be easy, because instead of sound recorded from a microphone I will use sound recorded elsewhere.
Microsoft says this is possible with SAPI 5.1; however, I couldn't find out how to do it after completing these steps:
- Installing SAPI 5.1
- Installing the SAPI components into Delphi

Briefly, I need answers to these 2 questions:
- Which SAPI components should I use for this?
- Which methods should I use, and how?
Thank you
ASKER CERTIFIED SOLUTION
twinsoft
fatihbarut

ASKER

Firstly, it looks great, thank you.
Secondly, I am sorry, but I have a problem with my Delphi 7.

It says [Error] Unit1.pas(14): Undeclared identifier: 'TOleEnum' for the procedure below:


procedure TForm1.OnSRRecognition(Sender: TObject; StreamNumber: Integer; StreamPosition: OleVariant; RecognitionType: TOleEnum; var Result: OleVariant);



Hi, did you import the type library as I described? If yes, then check that your unit lists these units in its uses section (TOleEnum is declared in ActiveX):

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  ActiveX, OleServer, ExtCtrls;
By the way, I realized something: I am using Windows 7, therefore my SAPI is 5.4. What should I do now?
Uninstall it somehow and reinstall 5.1?
Pardon, after adding ActiveX, OleServer, ExtCtrls to the uses clause the problem is solved. However, when I call the "Start" procedure nothing happens. How can I write the recognized stream into a text file or into a memo?
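For the memo / text file part of the question, here is a minimal sketch, not the accepted solution. It assumes the typed event signatures that the SpeechLib_TLB import generates for TSpInProcRecoContext (the same ones used in the unit posted further down), a TMemo named Memo1, and a hypothetical output file name 'recognized.txt'.

```pascal
// Sketch only: Memo1 and 'recognized.txt' are assumptions for illustration.
// Both handlers must be assigned to the OnRecognition and OnEndStream events
// of the TSpInProcRecoContext component (SR) in the Object Inspector.

procedure TForm1.SRRecognition(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; RecognitionType: TOleEnum;
  const Result: ISpeechRecoResult);
begin
  // GetText(0, -1, True) returns the whole recognized phrase;
  // each phrase becomes one line in the memo.
  Memo1.Lines.Add(Result.PhraseInfo.GetText(0, -1, True));
end;

procedure TForm1.SREndStream(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; StreamReleased: WordBool);
begin
  // The wav file has been processed to the end: write the memo to disk.
  Memo1.Lines.SaveToFile('recognized.txt');
end;
```

Saving in the end-of-stream handler means the file is written once per wav file rather than once per phrase. If nothing shows up at all, it is worth checking that the events are actually wired to the component and that a dictation grammar has been activated.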
The language I use is English; I would be happy to learn its language code.
I also got "Undeclared identifier: 'Grammar'".
Thanks again
Sorry, my fault.
I added Grammar: ISpeechRecoGrammar; in the private section.
However, this time when I execute the Start procedure I get an "OLE error 80045001" message.
This is the situation I am in now.
With the help of twinsoft's answers and the 2 articles linked below, I have made working code, which I have added below.

http://edn.embarcadero.com/article/29583
and
http://www.delphi3000.com/articles/article_2629.asp

However, the results are awful. I need to improve the sensitivity and accuracy.
For example:
The real speech is:
"Welcome to the turbo power happy voice example program, press any key on your phones touch pad to start program"
The conversion is:
"Welcome to the cattle polish up where it can't be politically ample program but any key on your own cut that to start a program"
Any further help will be much appreciated.
unit AltYazarP;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,Dialogs, SpeechLib_TLB, OleServer, StdCtrls,ActiveX,ExtCtrls;

type
  TForm1 = class(TForm)
    SR: TSpInProcRecoContext;
    FileStream: TSpFileStream;
    Button1: TButton;
    Hipotezler: TMemo;
    OpenDialog1: TOpenDialog;
    Label1: TLabel;
    Label2: TLabel;
    Taninanlar: TMemo;
    Button2: TButton;
    procedure SRFalseRecognition(ASender: TObject; StreamNumber: Integer;
      StreamPosition: OleVariant; const Result: ISpeechRecoResult);
    procedure SREndStream(ASender: TObject; StreamNumber: Integer;
      StreamPosition: OleVariant; StreamReleased: WordBool);
    procedure Start;
    procedure Button1Click(Sender: TObject);
    procedure SRRecognition(ASender: TObject; StreamNumber: Integer;
      StreamPosition: OleVariant; RecognitionType: TOleEnum;      const Result: ISpeechRecoResult);
    procedure FormCreate(Sender: TObject);
    procedure SRHypothesis(ASender: TObject; StreamNumber: Integer;
      StreamPosition: OleVariant; const Result: ISpeechRecoResult);
    procedure Button2Click(Sender: TObject);
  private
    { Private declarations }
    Grammar: ISpeechRecoGrammar;
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}
procedure TForm1.FormCreate(Sender: TObject);
begin
  // Create a grammar on the recognition context and switch on free dictation.
  Grammar := SR.CreateGrammar(0);
  Grammar.DictationSetState(SGDSActive);
end;

procedure TForm1.Start;
begin
  if OpenDialog1.Execute then
  begin
    // Feed the chosen wav file to the recognizer instead of the microphone.
    FileStream.Open(OpenDialog1.FileName, SPFM_OPEN_READONLY, False);
    SR.Recognizer.AudioInputStream := FileStream.DefaultInterface;
  end;
end;

procedure TForm1.SRFalseRecognition(ASender: TObject;
  StreamNumber: Integer; StreamPosition: OleVariant;
  const Result: ISpeechRecoResult);
begin
// Showmessage('Cannot recognize');
end;

procedure TForm1.SREndStream(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; StreamReleased: WordBool);
begin
 FileStream.Close;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
Start;
end;

procedure TForm1.SRRecognition(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; RecognitionType: TOleEnum; const Result: ISpeechRecoResult);
begin
  // Taninanlar is a memo that collects the final recognized text.
  Taninanlar.Text := Taninanlar.Text + Result.PhraseInfo.GetText(0, -1, True);
end;



procedure TForm1.SRHypothesis(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; const Result: ISpeechRecoResult);
begin
  // Hipotezler is another memo that collects the intermediate hypotheses.
  Hipotezler.Text := Hipotezler.Text + Result.PhraseInfo.GetText(0, -1, False);
end;

procedure TForm1.Button2Click(Sender: TObject);

begin
//erased garbage
end;

end.


Hi, how well the engine recognizes words depends on the quality of the engine itself. The only thing you can do is use grammar rules to help the engine perform better, but this is not easy, as you have to create something like a vocabulary...
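To make the "grammar rules" idea concrete, here is a rough sketch of swapping free dictation for a small command grammar loaded from a SAPI 5 XML grammar file. It assumes the Grammar field has already been created with SR.CreateGrammar(0) as in the unit above. The procedure name UseCommandGrammar, the file name 'phrases.xml', and the rule name 'phrases' are made up for illustration. This only helps when the phrases to be recognized are known in advance.

```pascal
// Sketch only. 'phrases.xml' is a hypothetical SAPI 5 XML grammar file
// placed next to the executable, for example:
//
//   <GRAMMAR LANGID="409">
//     <RULE NAME="phrases" TOPLEVEL="ACTIVE">
//       <L>
//         <P>welcome to the turbo power happy voice example program</P>
//         <P>press any key to start program</P>
//       </L>
//     </RULE>
//   </GRAMMAR>
//
// LANGID="409" is the hexadecimal language ID for English (United States).
// UseCommandGrammar also needs a matching declaration in the TForm1 class.

procedure TForm1.UseCommandGrammar;
begin
  // Turn free dictation off so that only the listed phrases can match.
  Grammar.DictationSetState(SGDSInactive);
  // Load the rules from the XML file; SLOStatic means they will not be
  // modified at run time.
  Grammar.CmdLoadFromFile(ExtractFilePath(Application.ExeName) + 'phrases.xml',
    SLOStatic);
  // Activate the top-level rule named in the file.
  Grammar.CmdSetRuleState('phrases', SGDSActive);
end;
```

With a grammar like this the engine only has to choose between a handful of phrases instead of the whole dictation vocabulary. The trade-off is that anything not listed in the file is reported through the false-recognition event instead of being transcribed.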
Thanks again; however, I didn't completely understand what you said. How can I change the grammar rules? Is there more to do besides adjusting the parameters?
If there are parameters for grammar rules, I'd be happy to hear about them.

(For twinsoft) I have 4 more questions similar to this one, but comparatively easy ones:


Export M.S Speech Engine 5.1 (SAPI 5.1) and/or SAPI 4 library to another computer to save my recorded words
https://www.experts-exchange.com/questions/25100009/Export-M-S-Speech-Engine-5-1-SAPI-5-1-and-or-SAPI-4-library-to-another-computer-to-save-my-recorded-words.html



Convert .wav (sound) to Graphic (jpg etc.) using delphi: Wav to binary -> Interperate binary-> Number to points-> Join points and draw
https://www.experts-exchange.com/questions/25100383/Convert-wav-sound-to-Graphic-jpg-etc-using-delphi-Wav-to-binary-Interperate-binary-Number-to-points-Join-points-and-draw.html


Speech Recognation program using Delphi and Matlab

https://www.experts-exchange.com/questions/25100036/Speech-Recognation-program-using-Delphi-and-Matlab.html


Simple Speech Recognation program, which recognise numbers between 1-100 using Delphi (in turkish language)
https://www.experts-exchange.com/questions/25100689/Simple-Speech-Recognation-program-which-recognise-numbers-between-1-100-using-Delphi-in-turkish-language.html?fromWizard=true
Thanks it was good to work with you...