jackton
asked on
Finding start and end points of a speech signal?
I need to find the start and end points of a speech signal.
Below is an excerpt from the comp.speech FAQ,
but the code example it refers to is only for Unix.
Where can I find documentation or a Delphi example for this?
And where can I find articles about end-point
detection algorithms?
Thank you.
End-point detection algorithms identify sections in an incoming audio signal that contain
speech. Accurate end-pointing is a non-trivial task; however, reasonable behaviour can be
obtained for inputs which contain only speech surrounded by silence (no other noises). Typical
algorithms look at the energy or amplitude of the incoming signal and at the rate of
"zero-crossings". A zero-crossing is where the audio signal changes from positive to negative
or vice versa. When the energy and zero-crossings are at certain levels, it is reasonable to
guess that there is speech. More detailed descriptions are provided in the papers cited below
and in the documentation for the following software.
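As a rough illustration of the energy and zero-crossing idea in the FAQ excerpt, here is a minimal Delphi sketch that classifies fixed-size frames of 16-bit samples. The names (FrameEnergy, ZeroCrossings, IsSpeechFrame), the frame size and all three thresholds are assumptions made up for this example and would need tuning on real recordings. It is not the FAQ's algorithm or that of any particular package, and it assumes Delphi 4 or later (dynamic arrays) or Free Pascal.

```pascal
program FrameFeatures;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  FrameSize  = 256;   { samples per frame (assumed) }
  HighEnergy = 1000;  { average amplitude treated as clearly speech (assumed) }
  LowEnergy  = 100;   { at or below this a frame is treated as silence (assumed) }
  MinZCross  = 20;    { zero crossings per frame hinting at a fricative (assumed) }

type
  TSamples = array of SmallInt;

{ Average absolute amplitude of Count samples starting at First. }
function FrameEnergy(const S: TSamples; First, Count: Integer): LongInt;
var
  I: Integer;
begin
  Result := 0;
  for I := First to First + Count - 1 do
    Inc(Result, Abs(S[I]));
  Result := Result div Count;
end;

{ Number of sign changes between neighbouring samples in the frame. }
function ZeroCrossings(const S: TSamples; First, Count: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := First + 1 to First + Count - 1 do
    if (S[I - 1] >= 0) <> (S[I] >= 0) then
      Inc(Result);
end;

{ Loud frames count as speech; quieter frames only if they also
  show many zero crossings. }
function IsSpeechFrame(const S: TSamples; First, Count: Integer): Boolean;
var
  E: LongInt;
begin
  E := FrameEnergy(S, First, Count);
  Result := (E > HighEnergy) or
            ((E > LowEnergy) and (ZeroCrossings(S, First, Count) > MinZCross));
end;

var
  Data: TSamples;
  I, Frame: Integer;
begin
  { Synthetic test signal: silence, a square wave, silence. }
  SetLength(Data, 3 * FrameSize);
  for I := 0 to High(Data) do
    if (I >= FrameSize) and (I < 2 * FrameSize) then
    begin
      if (I div 2) mod 2 = 0 then Data[I] := 4000 else Data[I] := -4000;
    end
    else
      Data[I] := 0;

  for Frame := 0 to (Length(Data) div FrameSize) - 1 do
    WriteLn('frame ', Frame, ': speech = ',
      IsSpeechFrame(Data, Frame * FrameSize, FrameSize));
end.
```

With this synthetic input only the middle frame should be reported as speech. Real recordings contain background noise that crosses zero all the time, which is why practical detectors estimate a noise floor first instead of using fixed thresholds like these.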
ASKER
Hi Lischke,
Thank you for your help.
The problem is urgent for me, but I don't have any more points.
Would you mail me the better function?
If I had more points, I would give them to you.
My e-mail address is: jt9700@yahoo.com
ASKER CERTIFIED SOLUTION
Without much comment, here's the solution I sent jiangtao privately:
Here is the code. It compiles fine with D2, D3 and D4. The code is for 22 kHz, 16-bit, signed mono samples (easy to adjust to other values):
function OffsetPointer(P: Pointer; Offs: LongInt): Pointer;
{$IFDEF WIN32}
begin Result := Pointer(LongInt(P)+Offs); end;
{$ELSE}
Assembler;
asm
mov ax,Word Ptr[Offs]; mov dx, Word Ptr[Offs+2];
add ax,Word Ptr[P]
adc dx,0
mov cx,3; { OFFSET __AHSHIFT }
shl dx,cl
add dx,Word Ptr[P+2];
end;
{$ENDIF}
var
{ True: scale NoiseLift by the noise level (-> quadratic function) }
RastaAutoScaleNoiseLift: Boolean = True;
{ Signal := Signal-NoiseLift*Noise; 1.4 for Noise < 10, 4.0 for Noise = 40 }
RastaNoiseLift: Single = 2.0;
{ Minimum time above threshold (msec) -> it is signal, otherwise a click }
RastaMinSigLen: LongInt = 35;
{ Start: sliding window size and AC average value }
RastaStartAvgBufSize: LongInt = 200; RastaStartAvg: LongInt = 35;
{ Stop: sliding window size and AC average value - considerably smaller! }
RastaStopAvgBufSize: LongInt = 200; RastaStopAvg: LongInt = 15;
{ 22050 Hz, sliding window 23 msec -> 13.9... zero crossings = fundamental frequency 600 Hz }
RastaZCross600Hz: Integer = 14;
{ A signal can live on zero crossings alone for 25 msec; after that, power must follow }
RastaMaxFricSamples: Integer = (LongInt(22050)*25) div 1000;
{ Just Not Discriminable volume change, normally 2 dB, max. 3 dB }
RastaJND: Single = 1.4;
{ True: use StartAvg/StopAvg, False: just Noise * RastaJND }
RastaUseThresholds: Boolean = True;
const
DefaultPauseSamples = 6620; { 300 msec at 22050 Hz }
PauseSamples: LongInt = DefaultPauseSamples; { -> RastaFindStop }
function EstimateNoise(StdLongThreshold, NoiseLevel: LongInt; IsStart: Boolean): LongInt;
begin
if RastaAutoScaleNoiseLift then
begin { ScaledNoise := NoiseLevel div 512;
if ScaledNoise < 10 then NoiseLift := 1.4
else NoiseLift := ScaledNoise / 10; }
RastaNoiseLift := NoiseLevel / 5120;
if RastaNoiseLift < 1.4 then RastaNoiseLift := 1.4; { 2 dB }
end;
if RastaUseThresholds then
begin
Result := Trunc(RastaNoiseLift*StdLongThreshold+NoiseLevel div 512);
end else
begin
if IsStart then Result := Trunc(NoiseLevel*RastaJND)
else Result := Trunc(NoiseLevel*Sqrt(RastaJND)); { Stop }
end;
end;
const TriggerFac = 1.5; { ZCrossings above Noise }
const MaxFricLenMs = 25; { After 25 msec of "signal" based on zero crossings alone, power must follow - otherwise it was nothing }
{ True: if start and NoiseLevel set. NoiseLevel = Real Level * 512 }
function RastaFindStart(PCMData: PSmallInt; SampleCount: LongInt;
NoiseComputing: Boolean; var NoiseLevel, Start: LongInt): Boolean;
type IntBuf = Array[0..999] of SmallInt; PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
AvgBuf: PIntBuf;
AvgLongThresholdWithNoise: LongInt;
LastVal, NewVal, NextLastVal: Integer;
AvgElems, AvgThreshold: LongInt;
NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
MinSignalLength, SignalLength: LongInt;
{ Zero crossings }
DCPos, DCNeg: LongInt; DCOffset: Integer;
ZCrossPosTrigger, ZCrossNegTrigger: Integer;
ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
ZCrossIndex, ZCrossCount: Integer;
FricDownCount: Integer; { "Signal" without power (zero crossings only): after * samples, power must follow }
begin
AvgElems := RastaStartAvgBufSize; AvgThreshold := RastaStartAvg;
MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;
Start := 0; Result := False;
AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);
{ the first 23 msec AC as noise (accumulated AC-values) and for DC-Offset }
GetMem(NoiseBuf,512*SizeOf(SmallInt));
FillChar(NoiseBuf^,1024,0);
NewNoiseLevel := 0;
NoiseBufIndex := 0;
DCPos := 0;
DCNeg := 0;
FricDownCount:=0;
GetMem(ZCrossBuf,512*SizeOf(SmallInt));
FillChar(ZCrossBuf^,1024,0);
P := PCMData;
LastVal := P^;
DCOffset := 0;
if NoiseComputing then begin
while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
begin
{ DC-Offset }
if LastVal > 0 then
Inc(DCPos,LastVal)
else
Inc(DCNeg,LastVal);
{$IFDEF WIN32} Inc(P); {$ELSE}
asm
add Word Ptr[P],2;
jnb @@1;
add Word Ptr[P+2],8;
@@1:
end; {$ENDIF}
NewVal := P^;
CurVal := Abs(NewVal-LastVal);
LastVal := NewVal;
if CurVal <> 0 then
begin
Inc(NewNoiseLevel,CurVal);
{ saturation }
if CurVal > 32000 then NoiseBuf^[NoiseBufIndex] := 32000
else NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NoiseBufIndex);
end;
Inc(DCOffset); { own counter - NoiseBuf skips DC-ranges }
end;
DCOffset := (DCPos+DCNeg) div DCOffset; { DCNeg < 0 }
end
else begin
DCOffset := 0;
NewNoiseLevel := 25000; {30000 for baiser! }
end;
LowestNoiseLevel := NewNoiseLevel;
NoiseBufIndex := 0;
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
{ added AC-Level plus Noise }
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems, LowestNoiseLevel, True);
{ and all again, Signal = AC - NoiseLevel. for the first 23 msec
can very well be a negative signal level }
P := PCMData;
LastVal := P^;
VoltageBelow := LastVal < ZCrossNegTrigger; ZCrossCount := 0; ZCrossIndex := 0;
for x := 1 to SampleCount-1 do
begin
{$IFDEF WIN32}
Inc(P);
{$ELSE}
asm
add Word Ptr[P],2;
jnb @@1;
add Word Ptr[P+2],8;
@@1:
end;
{$ENDIF}
Dec(Avg, AvgBuf^[AvgBufIndex]);
NewVal := P^; CurVal := Abs(NewVal-LastVal); LastVal := NewVal;
{ Set an upper limit for usable data (saturation) }
if CurVal > 32000 then CurVal := 32000;
AvgBuf^[AvgBufIndex] := CurVal; Inc(Avg,CurVal);
Inc(AvgBufIndex); if AvgBufIndex >= AvgElems then AvgBufIndex := 0;
{ special case for silence-pieces }
if CurVal <> 0 then
begin
Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NewNoiseLevel,CurVal);
Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
end;
if NewNoiseLevel < LowestNoiseLevel then
begin
LowestNoiseLevel := NewNoiseLevel;
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems, LowestNoiseLevel, True);
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
end;
{ Zerocrossings }
Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
if VoltageBelow then
begin
if NewVal > ZCrossPosTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := False;
end
else ZCrossBuf^[ZCrossIndex] := 0; { no change }
end else
begin
if NewVal < ZCrossNegTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := True;
end
else ZCrossBuf^[ZCrossIndex] := 0;
end;
Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;
if SignalLength > 0 then
begin { signal found. Is it still there? }
Dec(FricDownCount);
if (Avg < AvgLongThresholdWithNoise) and (FricDownCount < 0) { and (ZCrossCount < RastaZCross600Hz) }
then SignalLength := 0 { No. was a peak }
else
begin
{ if Power and ZCrossings are there: Counter Re-Init }
if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
then FricDownCount := RastaMaxFricSamples;
Inc(SignalLength);
if SignalLength > MinSignalLength then Break; { start is already there }
end;
end else
begin
if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
begin
if ZCrossCount > RastaZCross600Hz
then FricDownCount := RastaMaxFricSamples
else FricDownCount := 0;
SignalLength := AvgElems;
Start := x-AvgElems;
if Start < 0 then Start := 0;
end;
end;
end;
FreeMem(AvgBuf,AvgBufSize);
FreeMem(NoiseBuf,512*SizeOf(SmallInt));
FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
NoiseLevel := LowestNoiseLevel; { = Lowest Level * 512 }
Result := SignalLength > MinSignalLength; { True: variable Start set }
end;
function RastaFindStop(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
type
IntBuf = Array[0..999] of SmallInt;
PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
AvgBuf: PIntBuf;
AvgLongThresholdWithNoise: LongInt;
LastVal, NewVal, NextLastVal: Integer;
AvgElems, AvgThreshold: LongInt;
NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
MinSignalLength, SignalLength: LongInt;
{ Zero crossings }
DCPos, DCNeg: LongInt; DCOffset: Integer;
ZCrossPosTrigger, ZCrossNegTrigger: Integer;
ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
ZCrossIndex, ZCrossCount: Integer;
FricDownCount: Integer; { "Signal" without power (zero crossings only): after * samples, power must follow }
begin
{ global Rasta variables -> locals. reason: just for convenience }
AvgElems := RastaStopAvgBufSize; AvgThreshold := RastaStopAvg;
MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;
Result := False;
{ if less than PauseSamples then no search can be done: go out of here }
if NeedSilenceAtEnd and (SampleCount < PauseSamples) then Exit;
AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);
GetMem(NoiseBuf,512*SizeOf(SmallInt)); FillChar(NoiseBuf^,1024,0);
NewNoiseLevel := 0; LowestNoiseLevel := NoiseLevel; NoiseBufIndex := 0;
FricDownCount:=0;
DCPos := 0; DCNeg := 0;
GetMem(ZCrossBuf,512*SizeOf(SmallInt));
FillChar(ZCrossBuf^,1024,0);
{ P to the last element in the buffer }
P := OffsetPointer(PCMData, (SampleCount-1)*2);
LastVal := P^;
DCOffset := 0;
while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
begin
{ DC-Offset }
if LastVal > 0 then
Inc(DCPos,LastVal)
else
Inc(DCNeg,LastVal);
{$IFDEF WIN32}
Dec(P);
{$ELSE}
asm
sub Word Ptr[P],2;
jnb @@1;
sub Word Ptr[P+2],8;
@@1:
end;
{$ENDIF}
NewVal := P^;
CurVal := Abs(NewVal-LastVal);
if CurVal > 32000 then CurVal:=32000;
LastVal := NewVal;
if CurVal <> 0 then
begin
Inc(NewNoiseLevel,CurVal);
NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NoiseBufIndex);
end;
Inc(DCOffset);
end;
DCOffset := (DCPos+DCNeg) div DCOffset;
NoiseBufIndex := 0;
if NewNoiseLevel < LowestNoiseLevel
then LowestNoiseLevel := NewNoiseLevel;
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
{ summed AC-level plus noise. no recalc of NoiseLiftering }
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems, LowestNoiseLevel, False);
P := OffsetPointer(PCMData, (SampleCount-1)*2);
LastVal := P^;
VoltageBelow := LastVal < ZCrossNegTrigger;
ZCrossCount := 0; ZCrossIndex := 0;
for x := SampleCount downto 1 do
begin
Dec(Avg, AvgBuf^[AvgBufIndex]);
NewVal := P^;
{$IFDEF WIN32} Dec(P); {$ELSE}
asm sub Word Ptr[P],2; jnb @@1; sub Word Ptr[P+2],8; @@1: end; {$ENDIF}
CurVal := Abs(NewVal-LastVal);
if CurVal > 32000 then CurVal:=32000;
LastVal := NewVal;
AvgBuf^[AvgBufIndex] := CurVal;
Inc(Avg,CurVal);
Inc(AvgBufIndex);
if AvgBufIndex >= AvgElems then AvgBufIndex := 0;
{ special case for silence-ranges }
if CurVal <> 0 then
begin
Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
Inc(NewNoiseLevel,CurVal);
Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
end;
if NewNoiseLevel < LowestNoiseLevel then
begin
LowestNoiseLevel := NewNoiseLevel;
AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems, LowestNoiseLevel, False);
ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
end;
{ Zerocrossings }
Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
if VoltageBelow then
begin
if NewVal > ZCrossPosTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := False;
end
else ZCrossBuf^[ZCrossIndex] := 0; { no change }
end else
begin
if NewVal < ZCrossNegTrigger then
begin
Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
VoltageBelow := True;
end
else ZCrossBuf^[ZCrossIndex] := 0;
end;
Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;
if SignalLength > 0 then
begin
Dec(FricDownCount);
if (Avg < AvgLongThresholdWithNoise) and (FricDownCount < 0) { and (ZCrossCount < RastaZCross600Hz) }
then SignalLength := 0
else
begin
if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
then FricDownCount := RastaMaxFricSamples;
Inc(SignalLength);
if SignalLength > MinSignalLength then
begin
if (Stop > SampleCount-PauseSamples) and NeedSilenceAtEnd then SignalLength := 0; { no end found }
Break;
end;
end;
end else
begin
if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
begin
if ZCrossCount > RastaZCross600Hz then FricDownCount := RastaMaxFricSamples
else FricDownCount := 0;
SignalLength := AvgElems;
Stop := x+AvgElems;
end;
end;
end;
FreeMem(AvgBuf,AvgBufSize);
FreeMem(NoiseBuf,512*SizeOf(SmallInt));
FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
Result := SignalLength > MinSignalLength; { True: Stop set }
end;
Ciao, Mike
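Adjusting the code to other sample rates mostly comes down to the constants that hard-code 22050 Hz. The following is only a sketch of how they could be derived from a sample rate. MsToSamples, ZCrossThreshold and ShowConstants are hypothetical helper names that are not part of the posted code, and the zero-crossing formula simply repeats the arithmetic from the RastaZCross600Hz comment (600 Hz over a 512-sample window).

```pascal
program ScaleRastaConstants;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  WindowSamples = 512;  { size of NoiseBuf and ZCrossBuf in the posted code }

{ Convert a duration in msec to a sample count at the given rate. }
function MsToSamples(SampleRate, Ms: LongInt): LongInt;
begin
  Result := (SampleRate * Ms) div 1000;
end;

{ Zero-crossing threshold for FreqHz within one 512-sample window,
  following the arithmetic of the RastaZCross600Hz comment. }
function ZCrossThreshold(SampleRate, FreqHz: LongInt): LongInt;
begin
  Result := Round(FreqHz * WindowSamples / SampleRate);
end;

{ Print the rate-dependent values; the durations are the ones used
  in the posted code (35, 25 and 300 msec). }
procedure ShowConstants(SampleRate: LongInt);
begin
  WriteLn('Sample rate:         ', SampleRate, ' Hz');
  WriteLn('MinSignalLength:     ', MsToSamples(SampleRate, 35));
  WriteLn('RastaMaxFricSamples: ', MsToSamples(SampleRate, 25));
  WriteLn('PauseSamples:        ', MsToSamples(SampleRate, 300));
  WriteLn('RastaZCross600Hz:    ', ZCrossThreshold(SampleRate, 600));
end;

begin
  ShowConstants(22050);
  ShowConstants(44100);
end.
```

The posted code rounds a little differently (22 samples per msec, 6620 pause samples), so the values at 22050 Hz will not match it exactly. The sketch also leaves the 512-sample buffers unchanged, which means the window covers a shorter time span at higher rates; whether the buffers should grow with the sample rate is a judgment call.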
The quote is correct. Finding the start and end points of a speech signal is indeed nontrivial. Here's a simple end detection routine. If you increase the points for your question significantly, I'll give you much better functions (using the zero-crossings approach, including optional noise detection).
function RastaFindStop2(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
var
P: PSmallInt;
WalkAheadIndex, CurIndex, Signal, CurSample: LongInt;
SignalBuf: array[0..511] of Integer;
Avg: LongInt;
function CalcAvgSignal: LongInt;
var
L: Integer;
begin
Result := 0;
for L := 0 to High(SignalBuf) do
Inc(Result, SignalBuf[L]);
Result := Result div (High(SignalBuf)+1);
end;
begin
CurSample := Stop;
P := PCMData; Inc(P, CurSample); { Inc is a procedure; on a typed pointer it advances by whole samples }
CurIndex := 0;
FillChar(SignalBuf, SizeOf(SignalBuf), #0);
while CurSample > Start do
begin
Signal := Abs(P^);
if CurIndex = High(SignalBuf)+1 then
CurIndex := 0;
SignalBuf[CurIndex] := Signal;
Inc(CurIndex);
Avg := CalcAvgSignal;
if Avg > 150 then
begin
WalkAheadIndex := CurSample;
while WalkAheadIndex < Stop do
begin
P := PCMData; Inc(P, WalkAheadIndex); { advance by whole samples }
Signal := Abs(P^);
if CurIndex = High(SignalBuf)+1 then
CurIndex := 0;
SignalBuf[CurIndex] := Signal;
Inc(CurIndex);
Avg := CalcAvgSignal;
if Avg < 120 then
Break;
Inc(WalkAheadIndex);
end;
CurSample := WalkAheadIndex;
Break;
end;
Dec(CurSample);
P := PCMData; Inc(P, CurSample); { advance by whole samples }
end;
Stop := CurSample;
Result := True;
end;
PCMData is a pointer to raw PCM data (signed 16-bit mono signal only).
Ciao, Mike
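For readers unsure how a PSmallInt relates to a buffer of samples, here is a small self-contained sketch. The buffer contents and the program name are made up for illustration. It relies on the fact that Inc on a typed pointer moves by whole elements (here one 16-bit sample), not by bytes.

```pascal
program WalkPcm;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PSmallInt = ^SmallInt;  { declared locally so the sketch is self-contained }

var
  Buf: array[0..4] of SmallInt;
  P: PSmallInt;
  I: Integer;
  Peak: LongInt;
begin
  { Made-up sample values standing in for raw 16-bit mono PCM data. }
  Buf[0] := 0; Buf[1] := 1200; Buf[2] := -3400; Buf[3] := 250; Buf[4] := 0;

  P := @Buf[0];   { pointer to the first sample }
  Peak := 0;
  for I := 0 to High(Buf) do
  begin
    if Abs(P^) > Peak then Peak := Abs(P^);
    Inc(P);       { step to the next sample (2 bytes further) }
  end;
  WriteLn('Peak amplitude: ', Peak);
end.
```

In a real program the pointer would typically point at the data portion of a loaded wave file or a recording buffer instead of a local array.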