# Finding start and end points of a speech signal?

Posted on 1998-12-06
I need to find the start and end points of a speech signal.

Below is an excerpt from the comp.speech FAQ, but the code example it refers to is for Unix only.
Where can I find documentation or a Delphi example for this?
And where can I find articles about end-point detection algorithms?
Thank you.

End-point detection algorithms identify sections in an incoming audio signal that contain
speech. Accurate end-pointing is a non-trivial task; however, reasonable behaviour can be
obtained for inputs which contain only speech surrounded by silence (no other noises). Typical
algorithms look at the energy or amplitude of the incoming signal and at the rate of
"zero-crossings". A zero-crossing is where the audio signal changes from positive to negative
or vice versa. When the energy and zero-crossings are at certain levels, it is reasonable to
guess that there is speech. More detailed descriptions are provided in the papers cited below
and in the documentation for the following software.
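
To make the two measures from the FAQ concrete, here is a minimal Delphi sketch. It is not taken from the FAQ or from any of the software it mentions; the function names and the thresholds passed to `LooksLikeSpeech` are made up for illustration, and the mean absolute amplitude is used as a cheap stand-in for energy.

```pascal
{ Mean absolute amplitude of one frame, a cheap stand-in for short-time energy.
  The sum is a LongInt, so the frame should stay below 65536 samples. }
function FrameMeanAbs(const Samples: array of SmallInt): LongInt;
var
  I: Integer;
  Sum: LongInt;
begin
  Result := 0;
  if High(Samples) < 0 then Exit;  { empty frame }
  Sum := 0;
  for I := 0 to High(Samples) do
    Inc(Sum, Abs(LongInt(Samples[I])));
  Result := Sum div (High(Samples) + 1);
end;

{ Number of sign changes between neighbouring samples.
  A sample value of 0 is treated as non-negative. }
function CountZeroCrossings(const Samples: array of SmallInt): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to High(Samples) do
    if (Samples[I] < 0) <> (Samples[I - 1] < 0) then
      Inc(Result);
end;

{ Crude guess: a frame is speech if it is loud enough or crosses zero often.
  Both thresholds are hypothetical and have to be tuned per recording setup. }
function LooksLikeSpeech(const Samples: array of SmallInt;
  EnergyThreshold: LongInt; ZCrossThreshold: Integer): Boolean;
begin
  Result := (FrameMeanAbs(Samples) > EnergyThreshold) or
            (CountZeroCrossings(Samples) > ZCrossThreshold);
end;
```

A caller would cut the recording into short frames (a few hundred samples each) and call `LooksLikeSpeech` per frame; the first and last frames that return True give a rough start and end point.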
Question by: jackton

LVL 10

Expert Comment

ID: 1349620
Hi jackton,

The quote is correct. Finding the start and end points of a speech signal is indeed nontrivial. Here's a simple end detection routine. If you increased the points for your question significantly, I could give you much better functions (using the zero-crossings approach, including optional noise detection).

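function RastaFindStop2(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
  var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
{ NoiseLevel and NeedSilenceAtEnd are not used by this simple variant }
var
  P: PSmallInt;
  SignalBuf: array[0..511] of Integer;  { ring buffer of rectified samples }
  Avg, CurSample, WalkAheadIndex: LongInt;
  CurIndex: Integer;

  function CalcAvgSignal: LongInt;
  var
    L: Integer;
  begin
    Result := 0;
    for L := 0 to High(SignalBuf) do
      Inc(Result, SignalBuf[L]);
    Result := Result div (High(SignalBuf)+1);
  end;

  { Stores |sample at Index| in the ring buffer and returns the new average }
  function PushSample(Index: LongInt): LongInt;
  begin
    P := PSmallInt(LongInt(PCMData) + Index * 2);  { 2 bytes per sample }
    if CurIndex = High(SignalBuf)+1 then
      CurIndex := 0;
    SignalBuf[CurIndex] := Abs(LongInt(P^));
    Inc(CurIndex);
    Result := CalcAvgSignal;
  end;

begin
  { on entry Stop holds the sample index at which the backward search begins }
  CurSample := Stop;
  if CurSample > SampleCount-1 then
    CurSample := SampleCount-1;
  CurIndex := 0;
  FillChar(SignalBuf, SizeOf(SignalBuf), 0);

  while CurSample > Start do
  begin
    Avg := PushSample(CurSample);
    if Avg > 150 then
    begin
      { Assumption: the header of the walk-ahead loop is missing in the post.
        It is reconstructed here as: step forward again until the average
        falls below 120 and take that position as the end point. }
      WalkAheadIndex := CurSample;
      while WalkAheadIndex < SampleCount-1 do
      begin
        Inc(WalkAheadIndex);
        Avg := PushSample(WalkAheadIndex);
        if Avg < 120 then
          Break;
      end;
      CurSample := WalkAheadIndex;
      Break;
    end;
    Dec(CurSample);
  end;
  Stop := CurSample;
  Result := True;
end;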

PCMData is a pointer to raw PCM data (signed 16-bit mono samples only).
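
For readers who want to try the idea on an array instead of a raw pointer, here is a small self-contained sketch of the same kind of backward search with a sliding average. It is a simplified illustration, not Mike's routine: the name `FindEndByEnergy`, its parameters and its return convention are assumptions made for this example, and it keeps a running sum instead of re-adding the whole buffer for every sample.

```pascal
{ Scans backwards from the last sample with a sliding window of WindowLen
  samples. Returns the index of the right edge of the first window (seen
  from the end) whose mean absolute amplitude exceeds Threshold, or -1 if
  no such window exists. WindowLen should stay below 65536 samples because
  the running sum is a LongInt. }
function FindEndByEnergy(const Samples: array of SmallInt;
  WindowLen: Integer; Threshold: LongInt): LongInt;
var
  I, Sum: LongInt;
begin
  Result := -1;
  if (WindowLen <= 0) or (High(Samples) + 1 < WindowLen) then Exit;
  Sum := 0;
  for I := High(Samples) downto 0 do
  begin
    { the window now covers Samples[I .. I+WindowLen-1] }
    Inc(Sum, Abs(LongInt(Samples[I])));
    if I + WindowLen <= High(Samples) then
      Dec(Sum, Abs(LongInt(Samples[I + WindowLen])));  { sample that left the window }
    { only judge windows that are completely filled }
    if (I + WindowLen - 1 <= High(Samples)) and (Sum div WindowLen > Threshold) then
    begin
      Result := I + WindowLen - 1;
      Exit;
    end;
  end;
end;
```

Because the right edge of the window is returned, the estimate errs towards keeping a little trailing silence rather than cutting into the speech.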

Ciao, Mike

Author Comment

ID: 1349621
Hi Lischke,
The problem is urgent for me, but I don't have any more points.
Would you mail me the better function?
If I had more points, I would give them to you.

My e-mail address is: jt9700@yahoo.com


LVL 10

Accepted Solution

Lischke earned 100 total points
ID: 1349622
Grumble, grumble, hmmm, okay, I'll send you the code. But you'll have to find out on your own how to use it.

Ciao, Mike

LVL 10

Expert Comment

ID: 1349623
Without much comment, here's the solution I sent jiangtao privately:

Here is the code. It compiles fine with D2, D3 and D4. Some of the comments were written in German, but I'm sure you will get the sense. The code is for 22 kHz, 16-bit, signed mono samples (easy to adjust to other values):

function OffsetPointer(P: Pointer; Offs: LongInt): Pointer;
{$IFDEF WIN32}
begin Result := Pointer(LongInt(P)+Offs); end;
{$ELSE}
Assembler;
asm
mov ax,Word Ptr[Offs]; mov dx, Word Ptr[Offs+2];
add ax,Word Ptr[P];
adc dx,0;
mov cx,3;  { OFFSET __AHSHIFT }
shl dx,cl
add dx,Word Ptr[P+2];
end;
{$ENDIF}

var
{ True: scale NoiseLift with the noise level (-> quadratic function) }
RastaAutoScaleNoiseLift: Boolean = True;
{ Signal := Signal-NoiseLift*Noise; 1.4 for Noise < 10, 4.0 for Noise = 40 }
RastaNoiseLift: Single = 2.0;
{ Minimum time above threshold (msec) -> it is signal, otherwise just a click }
RastaMinSigLen: LongInt = 35;
{ Start: sliding window size and average AC value }
RastaStartAvgBufSize: LongInt = 200; RastaStartAvg: LongInt = 35;
{ Stop: sliding window size and average AC value - considerably smaller! }
RastaStopAvgBufSize: LongInt = 200;  RastaStopAvg: LongInt = 15;
{ 22050 Hz, sliding window 23 msec -> 13.9... zero crossings = fundamental frequency 600 Hz }
RastaZCross600Hz: Integer = 14;
{ A signal can live on zero crossings alone for 25 msec, after that power must follow }
RastaMaxFricSamples: Integer = (LongInt(22050)*25) div 1000;
{ Just Not Discriminable Volume Change, normally 2 dB, max. 3 dB }
RastaJND: Single = 1.4;
{ True: use StartAvg/StopAvg, False: just Noise * RastaJND }
RastaUseThresholds: Boolean = True;

const
DefaultPauseSamples = 6620;  { 300 msec at 22050 Hz }
PauseSamples: LongInt = DefaultPauseSamples; { -> RastaFindStop }

function EstimateNoise(StdLongThreshold,NoiseLevel: LongInt; IsStart: Boolean): LongInt;
begin
if RastaAutoScaleNoiseLift then
begin { ScaledNoise := NoiseLevel div 512;
if ScaledNoise < 10 then NoiseLift := 1.4
else NoiseLift := ScaledNoise / 10; }
RastaNoiseLift := NoiseLevel / 5120;
if RastaNoiseLift < 1.4 then RastaNoiseLift := 1.4;  { 2 dB }
end;
if RastaUseThresholds then
begin
Result := Trunc(RastaNoiseLift*StdLongThreshold+NoiseLevel div 512);
end else
begin
if IsStart then Result := Trunc(NoiseLevel*RastaJND)
else Result := Trunc(NoiseLevel*Sqrt(RastaJND));  {Stop }
end;
end;

const TriggerFac = 1.5;  { ZCrossings above Noise }
const MaxFricLenMs = 25; { After 25 msec of "signal" based on zero crossings alone, power must follow - otherwise it was nothing }

{ True: if start and NoiseLevel set. NoiseLevel = Real Level * 512  }
function RastaFindStart(PCMData: PSmallInt; SampleCount: LongInt;
NoiseComputing: Boolean; var NoiseLevel, Start: LongInt): Boolean;

type IntBuf = Array[0..999] of SmallInt; PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
AvgBuf: PIntBuf;
AvgLongThresholdWithNoise: LongInt;
LastVal, NewVal, NextLastVal: Integer;
AvgElems, AvgThreshold: LongInt;

NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
MinSignalLength, SignalLength: LongInt;
{ Zero crossings }
DCPos, DCNeg: LongInt; DCOffset: Integer;
ZCrossPosTrigger, ZCrossNegTrigger: Integer;
ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
ZCrossIndex, ZCrossCount: Integer;
FricDownCount: Integer;  { "Signal" without power (zero crossings only): power must follow after this many samples }
begin
AvgElems := RastaStartAvgBufSize; AvgThreshold := RastaStartAvg;
MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;
Start := 0; Result := False;

AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);

{ the first 23 msec AC as noise (accumulated AC-values) and for DC-Offset }
GetMem(NoiseBuf,512*SizeOf(SmallInt));
FillChar(NoiseBuf^,1024,0);
NewNoiseLevel := 0;
NoiseBufIndex := 0;
DCPos := 0;
DCNeg := 0;
FricDownCount:=0;
GetMem(ZCrossBuf,512*SizeOf(SmallInt));
FillChar(ZCrossBuf^,1024,0);

P := PCMData;
LastVal := P^;
DCOffset := 0;

if NoiseComputing then begin
while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
begin
{ DC-Offset }
if LastVal > 0 then
Inc(DCPos,LastVal)
else
Inc(DCNeg,LastVal);

{$IFDEF WIN32} Inc(P); {$ELSE}
asm
add Word Ptr[P],2;
jnb @@1;
add Word Ptr[P+2],8;
@@1:
end; {$ENDIF}

NewVal := P^;

CurVal := Abs(NewVal-LastVal);
LastVal := NewVal;
if CurVal <> 0 then
begin
Inc(NewNoiseLevel,CurVal);
{ saturation }
if CurVal > 32000 then NoiseBuf^[NoiseBufIndex] := 32000
else NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NoiseBufIndex);
end;

Inc(DCOffset);  { own counter - NoiseBuf skips DC-ranges }
end;
if DCOffset > 0 then DCOffset := (DCPos+DCNeg) div DCOffset; { DCNeg < 0; guard: no division by zero for very short buffers }
end
else begin
DCOffset := 0;
NewNoiseLevel := 25000; {30000 for baiser! }
end;

LowestNoiseLevel := NewNoiseLevel;
NoiseBufIndex := 0;
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) /512 + DCOffset);

{ added AC-Level plus Noise }
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel, True);

{ and all again, Signal = AC - NoiseLevel. for the first 23 msec
can very well be a negative signal level  }
P := PCMData;
LastVal := P^;
VoltageBelow := LastVal < ZCrossNegTrigger; ZCrossCount := 0; ZCrossIndex := 0;
for x := 1 to SampleCount-1 do
begin
{$IFDEF WIN32}
Inc(P);
{$ELSE}
asm
add Word Ptr[P],2;
jnb @@1;
add Word Ptr[P+2],8;
@@1:
end;
{$ENDIF}

Dec(Avg, AvgBuf^[AvgBufIndex]);
NewVal := P^; CurVal := Abs(NewVal-LastVal); LastVal := NewVal;

{ Set the upper limit for usable data (saturation) }
if CurVal > 32000 then CurVal := 32000;
AvgBuf^[AvgBufIndex] := CurVal; Inc(Avg,CurVal);

Inc(AvgBufIndex); if AvgBufIndex >= AvgElems then AvgBufIndex := 0;

{ special case for silence-pieces }
if CurVal <> 0 then
begin
Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NewNoiseLevel,CurVal);
Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
end;

if NewNoiseLevel < LowestNoiseLevel then
begin
LowestNoiseLevel := NewNoiseLevel;
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,True);
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
end;

{ Zerocrossings }
Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
if VoltageBelow then
begin
if NewVal > ZCrossPosTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := False;
end
else ZCrossBuf^[ZCrossIndex] := 0;  { no change }
end else
begin
if NewVal < ZCrossNegTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := True;
end
else ZCrossBuf^[ZCrossIndex] := 0;
end;
Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;

if SignalLength > 0 then
begin   { signal found. Is it still there? }
Dec(FricDownCount);
if (Avg < AvgLongThresholdWithNoise)  and (FricDownCount < 0)  { and (ZCrossCount < RastaZCross600Hz) }
then SignalLength := 0  { No. was a peak }
else
begin
{ if Power and ZCrossings are there: Counter Re-Init }
if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
then FricDownCount := RastaMaxFricSamples;
Inc(SignalLength);
if SignalLength > MinSignalLength then Break;  { start is already there }
end;
end else
begin
if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
begin
if ZCrossCount > RastaZCross600Hz
then FricDownCount := RastaMaxFricSamples
else FricDownCount := 0;
SignalLength := AvgElems;
Start := x-AvgElems;
if Start < 0 then Start := 0;
end;
end;
end;
FreeMem(AvgBuf,AvgBufSize);
FreeMem(NoiseBuf,512*SizeOf(SmallInt));
FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
NoiseLevel := LowestNoiseLevel;  { = Lowest Level * 512 }
Result := SignalLength > MinSignalLength; { True: variable Start set }
end;

function RastaFindStop(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
type
IntBuf = Array[0..999] of SmallInt;
PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
AvgBuf: PIntBuf;
AvgLongThresholdWithNoise: LongInt;
LastVal, NewVal, NextLastVal: Integer;
AvgElems, AvgThreshold: LongInt;

NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
MinSignalLength, SignalLength: LongInt;
{ Zero crossings }
DCPos, DCNeg: LongInt; DCOffset: Integer;
ZCrossPosTrigger, ZCrossNegTrigger: Integer;
ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
ZCrossIndex, ZCrossCount: Integer;
FricDownCount: Integer;  { "Signal" without power (zero crossings only): power must follow after this many samples }
begin
{ global Rasta variables -> locals; reason: just for convenience }
AvgElems := RastaStopAvgBufSize; AvgThreshold := RastaStopAvg;
MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;

Result := False;
{ if less than PauseSamples then no search can be done: go out of here }
if NeedSilenceAtEnd and (SampleCount < PauseSamples) then Exit;

AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);

GetMem(NoiseBuf,512*SizeOf(SmallInt)); FillChar(NoiseBuf^,1024,0);
NewNoiseLevel := 0; LowestNoiseLevel := NoiseLevel; NoiseBufIndex := 0;
FricDownCount:=0;
DCPos := 0; DCNeg := 0;
GetMem(ZCrossBuf,512*SizeOf(SmallInt));
FillChar(ZCrossBuf^,1024,0);

{ P to the last element in the buffer }
P := OffsetPointer(PCMData, (SampleCount-1)*2);
LastVal := P^;
DCOffset := 0;

while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
begin
{ DC-Offset }
if LastVal > 0 then
Inc(DCPos,LastVal)
else
Inc(DCNeg,LastVal);
{$IFDEF WIN32}
Dec(P);
{$ELSE}
asm
sub Word Ptr[P],2;
jnb @@1;
sub Word Ptr[P+2],8;
@@1:
end;
{$ENDIF}
NewVal := P^;
CurVal := Abs(NewVal-LastVal);

if CurVal > 32000 then CurVal:=32000;

LastVal := NewVal;
if CurVal <> 0 then
begin
Inc(NewNoiseLevel,CurVal);
NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NoiseBufIndex);
end;
Inc(DCOffset);
end;
if DCOffset > 0 then DCOffset := (DCPos+DCNeg) div DCOffset; { guard: no division by zero for very short buffers }
NoiseBufIndex := 0;
if NewNoiseLevel < LowestNoiseLevel
then LowestNoiseLevel := NewNoiseLevel;
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);

{ summed AC-level plus noise. no recalc of NoiseLiftering }
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,False);

P := OffsetPointer(PCMData, (SampleCount-1)*2);

LastVal := P^;
VoltageBelow := LastVal < ZCrossNegTrigger;
ZCrossCount := 0; ZCrossIndex := 0;

for x := SampleCount downto 1 do
begin
Dec(Avg, AvgBuf^[AvgBufIndex]);
NewVal := P^;
{$IFDEF WIN32} Dec(P); {$ELSE}
asm sub Word Ptr[P],2; jnb @@1; sub Word Ptr[P+2],8; @@1: end; {$ENDIF}

CurVal := Abs(NewVal-LastVal);

if CurVal > 32000 then CurVal:=32000;

LastVal := NewVal;
AvgBuf^[AvgBufIndex] := CurVal;
Inc(Avg,CurVal);
Inc(AvgBufIndex);
if AvgBufIndex >= AvgElems then AvgBufIndex := 0;

{ special case for silence-ranges }
if CurVal <> 0 then
begin
Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NewNoiseLevel,CurVal);
Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
end;

if NewNoiseLevel < LowestNoiseLevel then
begin
LowestNoiseLevel := NewNoiseLevel;
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,False);
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
end;

{ Zerocrossings }
Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
if VoltageBelow then
begin
if NewVal > ZCrossPosTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := False;
end
else ZCrossBuf^[ZCrossIndex] := 0;  { no change }
end else
begin
if NewVal < ZCrossNegTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := True;
end
else ZCrossBuf^[ZCrossIndex] := 0;
end;
Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;

if SignalLength > 0 then
begin
Dec(FricDownCount);
if (Avg < AvgLongThresholdWithNoise) and (FricDownCount < 0) { and (ZCrossCount < RastaZCross600Hz) }
then SignalLength := 0
else
begin
if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
then FricDownCount := RastaMaxFricSamples;
Inc(SignalLength);
if SignalLength > MinSignalLength then
begin
if (Stop > SampleCount-PauseSamples) and NeedSilenceAtEnd then SignalLength := 0;  { no end found }
Break;
end;
end;
end else
begin
if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
begin
if ZCrossCount > RastaZCross600Hz then FricDownCount := RastaMaxFricSamples
else FricDownCount := 0;
SignalLength := AvgElems;
Stop := x+AvgElems;
end;
end;
end;
FreeMem(AvgBuf,AvgBufSize);
FreeMem(NoiseBuf,512*SizeOf(SmallInt));
FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
Result := SignalLength > MinSignalLength;  { True: Stop set }
end;

Ciao, Mike
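
Two details of the code above may be worth spelling out for anyone adapting it: the constants that are tied to the 22050 Hz sample rate, and the way zero crossings are counted against two trigger levels around the DC offset instead of against plain zero. The following sketch illustrates both. It is not part of Mike's code; the helper names `MsToSamples`, `CyclesInWindow` and `CountTriggeredCrossings` are made up here, and which constants need rescaling is only my reading of the comments in the listing.

```pascal
{ Converts a duration in milliseconds to a sample count (rounded down). }
function MsToSamples(Ms, SampleRate: LongInt): LongInt;
begin
  Result := (SampleRate * Ms) div 1000;
end;

{ Number of periods of a tone of FreqHz that fit into WindowLen samples,
  rounded up. For 600 Hz, 512 samples and 22050 Hz this is 13.9.. -> 14. }
function CyclesInWindow(FreqHz, WindowLen, SampleRate: LongInt): LongInt;
begin
  Result := (FreqHz * WindowLen + SampleRate - 1) div SampleRate;
end;

{ Counts crossings with hysteresis: the signal has to rise above
  DCOffset+Margin or fall below DCOffset-Margin before a change of side is
  counted, so low-level noise around the DC offset is ignored. }
function CountTriggeredCrossings(const Samples: array of SmallInt;
  DCOffset, Margin: Integer): Integer;
var
  I: Integer;
  Below: Boolean;
begin
  Result := 0;
  if High(Samples) < 0 then Exit;  { empty input }
  Below := Samples[0] < DCOffset - Margin;
  for I := 1 to High(Samples) do
    if Below then
    begin
      if Samples[I] > DCOffset + Margin then
      begin
        Inc(Result);
        Below := False;
      end;
    end
    else if Samples[I] < DCOffset - Margin then
    begin
      Inc(Result);
      Below := True;
    end;
end;
```

For example, `MsToSamples(25, 22050)` gives 551, the same value as `RastaMaxFricSamples`, and `MsToSamples(300, 22050)` gives 6615, close to the hard-coded `DefaultPauseSamples` of 6620. For another sample rate, the candidates for rescaling appear to be `RastaMaxFricSamples`, `DefaultPauseSamples`, the factor 22 (samples per msec) in `MinSignalLength` and `RastaZCross600Hz`.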
