# Finding start and end points of a speech signal?

Posted on 1998-12-06
I need to find the start and end points of a
speech signal.

Below is an entry from the comp.speech FAQ,
but its code example is for Unix only.
Where can I find documentation or a Delphi example for this?
And where can I find articles about end-point
detection algorithms?
Thank you.

End-point detection algorithms identify sections in an incoming audio signal that contain
speech. Accurate end-pointing is a non-trivial task; however, reasonable behaviour can be
obtained for inputs which contain only speech surrounded by silence (no other noises). Typical
algorithms look at the energy or amplitude of the incoming signal and at the rate of
"zero-crossings". A zero-crossing is where the audio signal changes from positive to negative
or vice versa. When the energy and zero-crossings are at certain levels, it is reasonable to
guess that there is speech. More detailed descriptions are provided in the papers cited below
and in the documentation for the following software.
Question by: jackton
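
The two measurements named in the FAQ text can be sketched in a few lines of Delphi. This is only an illustration, not code from the FAQ or from the cited software: the frame length, the sample values, and the names `MeanAbsAmplitude` and `ZeroCrossings` are made up for the demo, and the mean absolute amplitude stands in for energy.

```pascal
program FrameFeatures;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  FrameLen = 8;  { hypothetical, tiny frame length for the demo }

type
  TFrame = array[0..FrameLen-1] of SmallInt;

{ Mean absolute amplitude of the frame: a cheap stand-in for energy. }
function MeanAbsAmplitude(const Frame: TFrame): LongInt;
var
  I: Integer;
  Sum: LongInt;
begin
  Sum := 0;
  for I := 0 to FrameLen-1 do
    Inc(Sum, Abs(LongInt(Frame[I])));
  Result := Sum div FrameLen;
end;

{ Number of sign changes between neighbouring samples. }
function ZeroCrossings(const Frame: TFrame): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to FrameLen-1 do
    if (Frame[I-1] >= 0) <> (Frame[I] >= 0) then
      Inc(Result);
end;

const
  { made-up sample values }
  Silence: TFrame = (1, -1, 0, 2, -2, 1, 0, -1);
  Voiced: TFrame = (900, 1500, 1200, 300, -800, -1400, -1000, -200);
begin
  WriteLn('silence: level=', MeanAbsAmplitude(Silence), ' zc=', ZeroCrossings(Silence));
  WriteLn('voiced:  level=', MeanAbsAmplitude(Voiced), ' zc=', ZeroCrossings(Voiced));
end.
```

With these made-up frames the near-silent frame has a level of 1 but 5 zero crossings, while the loud frame has a level of 912 and a single crossing. Low-level noise alone can therefore produce many crossings, which is one reason to count only crossings that clear a trigger level around the noise floor.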
Hi jackton,

The quote is correct. Finding the start and end points of a speech signal is indeed non-trivial. Here's a simple end-detection routine. If you increased the points for your question significantly, I'd give you much better functions (using the zero-crossings approach, including optional noise detection).

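function RastaFindStop2(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
  var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
{ On entry Stop is the sample index where the backward search begins.
  NoiseLevel and NeedSilenceAtEnd are not used by this simple version. }
var
  P: PSmallInt;
  SignalBuf: array[0..511] of Integer;  { sliding window of absolute sample values }
  Avg, CurSample, LastSample, WalkAheadIndex: LongInt;
  CurIndex: Integer;

  function CalcAvgSignal: LongInt;
  var
    L: Integer;
  begin
    Result := 0;
    for L := 0 to High(SignalBuf) do
      Inc(Result, SignalBuf[L]);
    Result := Result div (High(SignalBuf)+1);
  end;

  { Stores the absolute value of sample Index in the ring buffer. }
  procedure AddSample(Index: LongInt);
  begin
    P := PCMData;
    Inc(P, Index);  { typed pointer: advances by whole samples }
    if CurIndex = High(SignalBuf)+1 then
      CurIndex := 0;
    SignalBuf[CurIndex] := Abs(P^);
    Inc(CurIndex);
  end;

begin
  Result := False;
  if SampleCount <= 0 then Exit;
  LastSample := Stop;
  if LastSample > SampleCount-1 then LastSample := SampleCount-1;
  CurSample := LastSample;
  CurIndex := 0;
  FillChar(SignalBuf, SizeOf(SignalBuf), #0);

  { walk backwards until the window average rises above 150 }
  while CurSample > Start do
  begin
    AddSample(CurSample);
    Avg := CalcAvgSignal;
    if Avg > 150 then
    begin
      { Signal found: walk forward again until the level drops below 120.
        The loop header is missing from the original post; walking up to the
        initial Stop is an assumption. }
      WalkAheadIndex := CurSample;
      while WalkAheadIndex < LastSample do
      begin
        Inc(WalkAheadIndex);
        AddSample(WalkAheadIndex);
        Avg := CalcAvgSignal;
        if Avg < 120 then
          Break;
      end;
      CurSample := WalkAheadIndex;
      Break;
    end;
    Dec(CurSample);
  end;
  Stop := CurSample;
  Result := True;
end;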

PCMData is a pointer to raw PCM data (signed 16-bit mono only).
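
The routine above recomputes the average over all 512 buffer entries for every sample. A running sum gives the same window average with one subtraction and one addition per sample. The following is a minimal sketch of that idea, not part of the posted routine: `TSlidingAvg`, `InitAvg`, `PushSample`, the window size of 4, and the sample values are all made up for the demo.

```pascal
program RunningAverage;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  WindowSize = 4;  { hypothetical; the routine above uses 512 }

type
  TSlidingAvg = record
    Buf: array[0..WindowSize-1] of LongInt;  { absolute sample values }
    Index: Integer;                          { next slot to overwrite }
    Sum: LongInt;                            { sum of all entries in Buf }
  end;

procedure InitAvg(var A: TSlidingAvg);
begin
  FillChar(A, SizeOf(A), 0);
end;

{ Replaces the oldest entry with |Sample| and returns the new window average. }
function PushSample(var A: TSlidingAvg; Sample: SmallInt): LongInt;
var
  Level: LongInt;
begin
  Level := Abs(LongInt(Sample));
  Dec(A.Sum, A.Buf[A.Index]);   { drop the value that leaves the window }
  A.Buf[A.Index] := Level;
  Inc(A.Sum, Level);            { add the value that enters it }
  A.Index := (A.Index + 1) mod WindowSize;
  Result := A.Sum div WindowSize;
end;

const
  { made-up sample values }
  Samples: array[0..5] of SmallInt = (400, -400, 800, -800, 0, 0);
var
  Avg: TSlidingAvg;
  I: Integer;
begin
  InitAvg(Avg);
  for I := 0 to High(Samples) do
    WriteLn(I, ': ', PushSample(Avg, Samples[I]));
end.
```

The longer code later in this thread keeps its sliding windows the same way, with `Dec(Avg, AvgBuf^[AvgBufIndex])` followed by `Inc(Avg, CurVal)`.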

Ciao, Mike
Hi Lischke,
The problem is urgent for me, but I don't have any more points.
Would you mail me the better function?
If I had more points, I would give them to you.

My e-mail address is: jt9700@yahoo.com

Grumble, grumble, mmmmh, okay, I'll send you the code. But you'll have to find out on your own how to use it.

Ciao, Mike
Without much comment, here's the solution I sent jiangtao privately:

Here is the code. It compiles fine with D2, 3 and 4. Some of the comments were originally in German, but I'm sure you will get the sense. The code is for 22 kHz, 16-bit, signed mono samples (easy to adjust to other values):

function OffsetPointer(P: Pointer; Offs: LongInt): Pointer;
{$IFDEF WIN32}
begin Result := Pointer(LongInt(P)+Offs); end;
{$ELSE}
Assembler;
asm
mov ax,Word Ptr[Offs]; mov dx, Word Ptr[Offs+2];
mov cx,3;  { OFFSET __AHSHIFT }
shl dx,cl
end;
{$ENDIF}

var
{ True: scale NoiseLift with the noise level (-> quadratic function) }
RastaAutoScaleNoiseLift: Boolean = True;
{ Signal := Signal-NoiseLift*Noise; 1.4 for Noise < 10, 4.0 for Noise = 40 }
RastaNoiseLift: Single = 2.0;
{ Minimum time above threshold (msec) -> it is a signal, otherwise a click }
RastaMinSigLen: LongInt = 35;
{ Start: sliding window size and average AC value }
RastaStartAvgBufSize: LongInt = 200; RastaStartAvg: LongInt = 35;
{ Stop: sliding window size and average AC value - considerably smaller! }
RastaStopAvgBufSize: LongInt = 200;  RastaStopAvg: LongInt = 15;
{ 22050 Hz, sliding window 23 msec -> 13.9... zero crossings = fundamental frequency of 600 Hz }
RastaZCross600Hz: Integer = 14;
{ A signal can live on zero crossings alone for 25 msec; after that, power must follow }
RastaMaxFricSamples: Integer = (LongInt(22050)*25) div 1000;
{ Just Not Discriminable volume change, normally 2 dB, max. 3 dB }
RastaJND: Single = 1.4;
{ True: use StartAvg/StopAvg, False: just Noise * RastaJND }
RastaUseThresholds: Boolean = True;

const
DefaultPauseSamples = 6620;  { about 300 msec at 22050 Hz }
PauseSamples: LongInt = DefaultPauseSamples; { -> RastaFindStop }

function EstimateNoise(StdLongThreshold,NoiseLevel: LongInt; IsStart: Boolean): LongInt;
begin
if RastaAutoScaleNoiseLift then
begin { ScaledNoise := NoiseLevel div 512;
if ScaledNoise < 10 then NoiseLift := 1.4
else NoiseLift := ScaledNoise / 10; }
RastaNoiseLift := NoiseLevel / 5120;
if RastaNoiseLift < 1.4 then RastaNoiseLift := 1.4;  { 2 dB }
end;
if RastaUseThresholds then
begin
Result := Trunc(RastaNoiseLift*StdLongThreshold+NoiseLevel div 512);
end else
begin
if IsStart then Result := Trunc(NoiseLevel*RastaJND)
else Result := Trunc(NoiseLevel*Sqrt(RastaJND));  {Stop }
end;
end;

const TriggerFac = 1.5;  { zero crossings above noise }
const MaxFricLenMs = 25; { after 25 msec of "signal" based on zero crossings alone, power must follow - otherwise it was nothing }

{ Returns True if Start and NoiseLevel were set. NoiseLevel = real level * 512 }
function RastaFindStart(PCMData: PSmallInt; SampleCount: LongInt;
NoiseComputing: Boolean; var NoiseLevel, Start: LongInt): Boolean;

type IntBuf = Array[0..999] of SmallInt; PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
AvgBuf: PIntBuf;
AvgLongThresholdWithNoise: LongInt;
LastVal, NewVal, NextLastVal: Integer;
AvgElems, AvgThreshold: LongInt;

NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
MinSignalLength, SignalLength: LongInt;
{ Zero crossings }
DCPos, DCNeg: LongInt; DCOffset: Integer;
ZCrossPosTrigger, ZCrossNegTrigger: Integer;
ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
ZCrossIndex, ZCrossCount: Integer;
FricDownCount: Integer;  { "signal" without power (zero crossings only): power must follow within this many samples }
begin
AvgElems := RastaStartAvgBufSize; AvgThreshold := RastaStartAvg;
MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;
Start := 0; Result := False;

AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);

{ use the first 23 msec of AC as the noise estimate (accumulated AC values) and for the DC offset }
GetMem(NoiseBuf,512*SizeOf(SmallInt));
FillChar(NoiseBuf^,1024,0);
NewNoiseLevel := 0;
NoiseBufIndex := 0;
DCPos := 0;
DCNeg := 0;
FricDownCount:=0;
GetMem(ZCrossBuf,512*SizeOf(SmallInt));
FillChar(ZCrossBuf^,1024,0);

P := PCMData;
LastVal := P^;
DCOffset := 0;

if NoiseComputing then begin
while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
begin
{ DC-Offset }
if LastVal > 0 then
Inc(DCPos,LastVal)
else
Inc(DCNeg,LastVal);

{$IFDEF WIN32} Inc(P); {$ELSE}
asm
add Word Ptr[P],2;
jnb @@1;
add Word Ptr[P+2],8;
@@1:
end; {$ENDIF}

NewVal := P^;

CurVal := Abs(NewVal-LastVal);
LastVal := NewVal;
if CurVal <> 0 then
begin
Inc(NewNoiseLevel,CurVal);
{ saturation }
if CurVal > 32000 then NoiseBuf^[NoiseBufIndex] := 32000
else NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NoiseBufIndex);
end;

Inc(DCOffset);  { own counter - NoiseBuf skips DC-ranges }
end;
{ guard against division by zero when no sample was read }
if DCOffset > 0 then DCOffset := (DCPos+DCNeg) div DCOffset; { DCNeg < 0 }
end
else begin
DCOffset := 0;
NewNoiseLevel := 25000; {30000 for baiser! }
end;

LowestNoiseLevel := NewNoiseLevel;
NoiseBufIndex := 0;
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) /512 + DCOffset);

{ summed AC level plus noise }
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel, True);

{ second pass over the data: Signal = AC - NoiseLevel. During the first 23 msec
the signal level may well be negative }
P := PCMData;
LastVal := P^;
VoltageBelow := LastVal < ZCrossNegTrigger; ZCrossCount := 0; ZCrossIndex := 0;
for x := 1 to SampleCount-1 do
begin
{$IFDEF WIN32}
Inc(P);
{$ELSE}
asm
add Word Ptr[P],2;
jnb @@1;
add Word Ptr[P+2],8;
@@1:
end;
{$ENDIF}

Dec(Avg, AvgBuf^[AvgBufIndex]);
NewVal := P^; CurVal := Abs(NewVal-LastVal); LastVal := NewVal;

{ set an upper limit for usable data (saturation) }
if CurVal > 32000 then CurVal := 32000;
AvgBuf^[AvgBufIndex] := CurVal; Inc(Avg,CurVal);

Inc(AvgBufIndex); if AvgBufIndex >= AvgElems then AvgBufIndex := 0;

{ special case for silence-pieces }
if CurVal <> 0 then
begin
Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NewNoiseLevel,CurVal);
Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
end;

if NewNoiseLevel < LowestNoiseLevel then
begin
LowestNoiseLevel := NewNoiseLevel;
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,True);
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
end;

{ Zerocrossings }
Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
if VoltageBelow then
begin
if NewVal > ZCrossPosTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := False;
end
else ZCrossBuf^[ZCrossIndex] := 0;  { no change }
end else
begin
if NewVal < ZCrossNegTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := True;
end
else ZCrossBuf^[ZCrossIndex] := 0;
end;
Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;

if SignalLength > 0 then
begin   { signal found. Is it still there? }
Dec(FricDownCount);
if (Avg < AvgLongThresholdWithNoise)  and (FricDownCount < 0)  { and (ZCrossCount < RastaZCross600Hz) }
then SignalLength := 0  { No. was a peak }
else
begin
{ if Power and ZCrossings are there: Counter Re-Init }
if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
then FricDownCount := RastaMaxFricSamples;
Inc(SignalLength);
if SignalLength > MinSignalLength then Break;  { start is already there }
end;
end else
begin
if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
begin
if ZCrossCount > RastaZCross600Hz
then FricDownCount := RastaMaxFricSamples
else FricDownCount := 0;
SignalLength := AvgElems;
Start := x-AvgElems;
if Start < 0 then Start := 0;
end;
end;
end;
FreeMem(AvgBuf,AvgBufSize);
FreeMem(NoiseBuf,512*SizeOf(SmallInt));
FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
NoiseLevel := LowestNoiseLevel;  { = Lowest Level * 512 }
Result := SignalLength > MinSignalLength; { True: variable Start set }
end;

function RastaFindStop(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
type
IntBuf = Array[0..999] of SmallInt;
PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
AvgBuf: PIntBuf;
AvgLongThresholdWithNoise: LongInt;
LastVal, NewVal, NextLastVal: Integer;
AvgElems, AvgThreshold: LongInt;

NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
MinSignalLength, SignalLength: LongInt;
{ Zero crossings }
DCPos, DCNeg: LongInt; DCOffset: Integer;
ZCrossPosTrigger, ZCrossNegTrigger: Integer;
ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
ZCrossIndex, ZCrossCount: Integer;
FricDownCount: Integer;  { "signal" without power (zero crossings only): power must follow within this many samples }
begin
{ copy the global Rasta variables to locals, just for convenience }
AvgElems := RastaStopAvgBufSize; AvgThreshold := RastaStopAvg;
MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;

Result := False;
{ if less than PauseSamples then no search can be done: go out of here }
if NeedSilenceAtEnd and (SampleCount < PauseSamples) then Exit;

AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);

GetMem(NoiseBuf,512*SizeOf(SmallInt)); FillChar(NoiseBuf^,1024,0);
NewNoiseLevel := 0; LowestNoiseLevel := NoiseLevel; NoiseBufIndex := 0;
FricDownCount:=0;
DCPos := 0; DCNeg := 0;
GetMem(ZCrossBuf,512*SizeOf(SmallInt));
FillChar(ZCrossBuf^,1024,0);

{ P to the last element in the buffer }
P := OffsetPointer(PCMData, (SampleCount-1)*2);
LastVal := P^;
DCOffset := 0;

while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
begin
{ DC-Offset }
if LastVal > 0 then
Inc(DCPos,LastVal)
else
Inc(DCNeg,LastVal);
{$IFDEF WIN32}
Dec(P);
{$ELSE}
asm
sub Word Ptr[P],2;
jnb @@1;
sub Word Ptr[P+2],8;
@@1:
end;
{$ENDIF}
NewVal := P^;
CurVal := Abs(NewVal-LastVal);

if CurVal > 32000 then CurVal:=32000;

LastVal := NewVal;
if CurVal <> 0 then
begin
Inc(NewNoiseLevel,CurVal);
NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NoiseBufIndex);
end;
Inc(DCOffset);
end;
{ guard against division by zero when no sample was read }
if DCOffset > 0 then DCOffset := (DCPos+DCNeg) div DCOffset;
NoiseBufIndex := 0;
if NewNoiseLevel < LowestNoiseLevel
then LowestNoiseLevel := NewNoiseLevel;
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);

{ summed AC-level plus noise. no recalc of NoiseLiftering }
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,False);

P := OffsetPointer(PCMData, (SampleCount-1)*2);

LastVal := P^;
VoltageBelow := LastVal < ZCrossNegTrigger;
ZCrossCount := 0; ZCrossIndex := 0;

for x := SampleCount downto 1 do
begin
Dec(Avg, AvgBuf^[AvgBufIndex]);
NewVal := P^;
{$IFDEF WIN32} Dec(P); {$ELSE}
asm sub Word Ptr[P],2; jnb @@1; sub Word Ptr[P+2],8; @@1: end; {$ENDIF}

CurVal := Abs(NewVal-LastVal);

if CurVal > 32000 then CurVal:=32000;

LastVal := NewVal;
AvgBuf^[AvgBufIndex] := CurVal;
Inc(Avg,CurVal);
Inc(AvgBufIndex);
if AvgBufIndex >= AvgElems then AvgBufIndex := 0;

{ special case for silence-ranges }
if CurVal <> 0 then
begin
Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NewNoiseLevel,CurVal);
Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
end;

if NewNoiseLevel < LowestNoiseLevel then
begin
LowestNoiseLevel := NewNoiseLevel;
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,False);
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
end;

{ Zerocrossings }
Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
if VoltageBelow then
begin
if NewVal > ZCrossPosTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := False;
end
else ZCrossBuf^[ZCrossIndex] := 0;  { no change }
end else
begin
if NewVal < ZCrossNegTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := True;
end
else ZCrossBuf^[ZCrossIndex] := 0;
end;
Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;

if SignalLength > 0 then
begin
Dec(FricDownCount);
if (Avg < AvgLongThresholdWithNoise) and (FricDownCount < 0) { and (ZCrossCount < RastaZCross600Hz) }
then SignalLength := 0
else
begin
if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
then FricDownCount := RastaMaxFricSamples;
Inc(SignalLength);
if SignalLength > MinSignalLength then
begin
if (Stop > SampleCount-PauseSamples) and NeedSilenceAtEnd then SignalLength := 0;  { no end found }
Break;
end;
end;
end else
begin
if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
begin
if ZCrossCount > RastaZCross600Hz then FricDownCount := RastaMaxFricSamples
else FricDownCount := 0;
SignalLength := AvgElems;
Stop := x+AvgElems;
end;
end;
end;
FreeMem(AvgBuf,AvgBufSize);
FreeMem(NoiseBuf,512*SizeOf(SmallInt));
FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
Result := SignalLength > MinSignalLength;  { True: Stop set }
end;
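
To adjust the code to another sample rate, the rate-dependent constants have to be recomputed. The sketch below shows the arithmetic under the assumption that the constants are meant as plain millisecond and frequency conversions. The helper names `MsToSamples`, `CyclesInWindow` and `PrintFor` are made up for this illustration, and 8000 Hz is just an example rate.

```pascal
program RastaConstants;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{ Number of samples that span Ms milliseconds at the given rate. }
function MsToSamples(SampleRate, Ms: LongInt): LongInt;
begin
  Result := (SampleRate * Ms) div 1000;
end;

{ Cycles of a FreqHz tone that fit into a window of WindowLen samples. }
function CyclesInWindow(SampleRate, WindowLen, FreqHz: LongInt): LongInt;
begin
  Result := Round(WindowLen * FreqHz / SampleRate);
end;

{ Prints the derived values for one sample rate. }
procedure PrintFor(SampleRate: LongInt);
begin
  WriteLn(SampleRate, ' Hz');
  WriteLn('  PauseSamples (300 msec)       = ', MsToSamples(SampleRate, 300));
  WriteLn('  RastaMaxFricSamples (25 msec) = ', MsToSamples(SampleRate, 25));
  WriteLn('  samples per msec              = ', SampleRate div 1000);
  WriteLn('  RastaZCross600Hz (512 window) = ', CyclesInWindow(SampleRate, 512, 600));
end;

begin
  PrintFor(22050);
  PrintFor(8000);
end.
```

For 22050 Hz this gives 6615, 551, 22 and 14, which matches the posted values apart from DefaultPauseSamples, which is rounded to 6620. The factor 22 in `MinSignalLength := 22*RastaMinSigLen` is the samples-per-msec value. The 512-entry noise and zero-crossing buffers are hard-coded, so at a lower rate they cover a longer time span than the 23 msec mentioned in the comments.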

Ciao, Mike
