jackton

Finding start and end points of a speech signal?

I need to find the start and end points of a
speech signal.

Below is an FAQ entry from comp.speech,
but its code example is only for Unix.
Where can I find a DOS/Delphi example for this?
And where can I find some articles about end-point
detection algorithms?
Thank you.


End-point detection algorithms identify sections in an incoming audio signal that contain
speech. Accurate end-pointing is a non-trivial task; however, reasonable behaviour can be
obtained for inputs which contain only speech surrounded by silence (no other noises). Typical
algorithms look at the energy or amplitude of the incoming signal and at the rate of
"zero-crossings". A zero-crossing is where the audio signal changes from positive to negative
or vice versa. When the energy and zero-crossings are at certain levels, it is reasonable to
guess that there is speech. More detailed descriptions are provided in the papers cited below
and in the documentation for the following software.
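The energy plus zero-crossing idea described in the FAQ can be illustrated with a short frame-based routine. This is a minimal sketch under assumed parameters, not the comp.speech code and not the routines Lischke posts in this thread: FrameLen, EnergyThreshold and ZcrThreshold are made-up values that would need tuning, and every name in it is hypothetical. A frame counts as speech if its mean absolute amplitude is high, or if it is moderately loud and crosses zero often (as fricatives such as "s" do); the first and last such frames give the end points. It needs Delphi 4 or later (dynamic arrays) or Free Pascal.

```pascal
program EnergyZcrSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  FrameLen = 256;          { assumed: samples per frame (about 11.6 ms at 22050 Hz) }
  EnergyThreshold = 1000;  { assumed: mean |x| above this is speech }
  ZcrThreshold = 50;       { assumed: crossings per frame that suggest a fricative }

type
  TSamples = array of SmallInt;

{ Mean absolute amplitude of the frame starting at sample First }
function FrameEnergy(const S: TSamples; First: Integer): LongInt;
var
  I: Integer;
begin
  Result := 0;
  for I := First to First + FrameLen - 1 do
    Inc(Result, Abs(S[I]));
  Result := Result div FrameLen;
end;

{ Number of sign changes inside the frame starting at sample First }
function FrameZeroCrossings(const S: TSamples; First: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := First + 1 to First + FrameLen - 1 do
    if (S[I - 1] >= 0) <> (S[I] >= 0) then
      Inc(Result);
end;

{ Loud frames are speech; quieter frames only if they also cross zero often }
function IsSpeechFrame(const S: TSamples; First: Integer): Boolean;
var
  E: LongInt;
begin
  E := FrameEnergy(S, First);
  Result := (E > EnergyThreshold) or
    ((E > EnergyThreshold div 4) and (FrameZeroCrossings(S, First) > ZcrThreshold));
end;

{ First sample of the first speech frame, or -1 if there is none }
function FindSpeechStart(const S: TSamples): Integer;
var
  F: Integer;
begin
  Result := -1;
  for F := 0 to Length(S) div FrameLen - 1 do
    if IsSpeechFrame(S, F * FrameLen) then
    begin
      Result := F * FrameLen;
      Exit;
    end;
end;

{ Last sample of the last speech frame, or -1 if there is none }
function FindSpeechStop(const S: TSamples): Integer;
var
  F: Integer;
begin
  Result := -1;
  for F := Length(S) div FrameLen - 1 downto 0 do
    if IsSpeechFrame(S, F * FrameLen) then
    begin
      Result := (F + 1) * FrameLen - 1;
      Exit;
    end;
end;

{ Synthetic input: 1000 samples of silence, a 1024-sample square wave
  (half period 10 samples), then 1000 samples of silence }
function MakeTestSignal(Amplitude: SmallInt): TSamples;
var
  I: Integer;
begin
  SetLength(Result, 3024);
  for I := 0 to High(Result) do
    if (I < 1000) or (I >= 2024) then Result[I] := 0
    else if Odd((I - 1000) div 10) then Result[I] := -Amplitude
    else Result[I] := Amplitude;
end;

begin
  WriteLn('start = ', FindSpeechStart(MakeTestSignal(5000)),
          ', stop = ', FindSpeechStop(MakeTestSignal(5000)));
end.
```

For this synthetic signal it prints `start = 1024, stop = 2047`: the end points are only accurate to one frame, because the decision is made frame by frame.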
Lischke

Hi jackton,

the quote is correct. Finding the start and end points of a speech signal is indeed non-trivial. Here's a simple end-detection routine. If you increase the points for your question significantly, I'll give you much better functions (using the zero-crossings approach, including optional noise detection).

function RastaFindStop2(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
  var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
{ Scans backwards from sample index Stop towards Start and moves Stop to the
  end of the speech. SampleCount, NoiseLevel and NeedSilenceAtEnd are not
  used by this simple version. }
var
  P: PSmallInt;
  WalkAheadIndex, CurIndex, Signal, CurSample: LongInt;
  SignalBuf: array[0..511] of Integer;  { sliding window of absolute sample values }
  Avg: LongInt;

  { average of the sliding window }
  function CalcAvgSignal: LongInt;
  var
    L: Integer;
  begin
    Result := 0;
    for L := 0 to High(SignalBuf) do
      Inc(Result, SignalBuf[L]);
    Result := Result div (High(SignalBuf)+1);
  end;

  { pointer to sample number Index (16 bit = 2 bytes per sample);
    Inc is a procedure and cannot be used to compute a new pointer }
  function SamplePtr(Index: LongInt): PSmallInt;
  begin
    Result := PSmallInt(PAnsiChar(PCMData) + Index * SizeOf(SmallInt));
  end;

begin
  CurSample := Stop;
  P := SamplePtr(CurSample);

  CurIndex := 0;
  FillChar(SignalBuf, SizeOf(SignalBuf), #0);

  while CurSample > Start do
  begin
    Signal := Abs(P^);
    if CurIndex = High(SignalBuf)+1 then
      CurIndex := 0;
    SignalBuf[CurIndex] := Signal;
    Inc(CurIndex);

    Avg := CalcAvgSignal;

    if Avg > 150 then
    begin
      { speech found: walk forward again until the average drops below 120 }
      WalkAheadIndex := CurSample;
      while WalkAheadIndex < Stop do
      begin
        P := SamplePtr(WalkAheadIndex);
        Signal := Abs(P^);
        if CurIndex = High(SignalBuf)+1 then
          CurIndex := 0;
        SignalBuf[CurIndex] := Signal;
        Inc(CurIndex);
        Avg := CalcAvgSignal;
        if Avg < 120 then
          Break;
        Inc(WalkAheadIndex);
      end;
      CurSample := WalkAheadIndex;
      Break;
    end;
    Dec(CurSample);
    P := SamplePtr(CurSample);
  end;
  Stop := CurSample;
  Result := True;
end;

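PCMData is a pointer to raw PCM data (signed 16-bit mono only). On entry, Stop must be the index of the last sample to examine; the routine searches backwards towards Start and returns the detected end point in Stop.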
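CalcAvgSignal in RastaFindStop2 adds up all 512 buffer entries for every sample it looks at. Keeping a running sum (subtract the value that falls out of the window, add the new one) gives exactly the same integer average with two additions per sample, as long as the window starts zeroed. A minimal sketch of that idea; TRunningAvg, AvgInit, AvgPush and AverageOfRun are hypothetical names, not part of the posted code.

```pascal
program RunningAverageSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  WindowSize = 512;  { same window length as SignalBuf in RastaFindStop2 }

type
  TRunningAvg = record
    Buf: array[0..WindowSize - 1] of Integer;  { last WindowSize absolute values }
    Index: Integer;                            { slot that is overwritten next }
    Sum: LongInt;                              { sum of all entries in Buf }
  end;

procedure AvgInit(var A: TRunningAvg);
begin
  FillChar(A, SizeOf(A), 0);
end;

{ Replaces the oldest entry with |Sample| and returns the new window average }
function AvgPush(var A: TRunningAvg; Sample: SmallInt): LongInt;
begin
  Dec(A.Sum, A.Buf[A.Index]);
  A.Buf[A.Index] := Abs(Sample);
  Inc(A.Sum, A.Buf[A.Index]);
  A.Index := (A.Index + 1) mod WindowSize;
  Result := A.Sum div WindowSize;
end;

{ Feeds Count copies of Value into an empty window and returns the last average }
function AverageOfRun(Value: SmallInt; Count: Integer): LongInt;
var
  A: TRunningAvg;
  I: Integer;
begin
  AvgInit(A);
  Result := 0;
  for I := 1 to Count do
    Result := AvgPush(A, Value);
end;

begin
  WriteLn(AverageOfRun(300, 512));  { full window of 300s: prints 300 }
  WriteLn(AverageOfRun(300, 256));  { half window: 256 * 300 div 512 = 150 }
end.
```

In RastaFindStop2, the SignalBuf/CurIndex bookkeeping and each CalcAvgSignal call could then be replaced by a single AvgPush call per sample.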

Ciao, Mike
jackton

ASKER

Hi Lischke,
thank you for your help.
The problem is urgent for me, but I don't have any more points.
Would you mail me the better function?
If I get more points, I will add them for you.

My e-mail address is: jt9700@yahoo.com


ASKER CERTIFIED SOLUTION
Lischke

Without much comment, here's the solution I sent jiangtao privately:

Here is the code. It compiles fine with D2, D3 and D4. Some of the comments were originally written in German. The code is for 22050 Hz, 16-bit, signed mono samples (easy to adjust to other values):

function OffsetPointer(P: Pointer; Offs: LongInt): Pointer;
{$IFDEF WIN32}
begin Result := Pointer(LongInt(P)+Offs); end;
{$ELSE}
Assembler;
  asm
    mov ax,Word Ptr[Offs]; mov dx, Word Ptr[Offs+2];
    add ax,Word Ptr[P]
    adc dx,0
    mov cx,3;  { OFFSET __AHSHIFT }
    shl dx,cl
    add dx,Word Ptr[P+2];
  end;
{$ENDIF}

var
  { True: scale NoiseLift with the noise level (-> quadratic function) }
  RastaAutoScaleNoiseLift: Boolean = True;
  { Signal := Signal-NoiseLift*Noise; 1.4 for Noise < 10, 4.0 for Noise = 40 }
  RastaNoiseLift: Single = 2.0;
  { Minimum time above threshold (msec) to count as signal, otherwise it is a click }
  RastaMinSigLen: LongInt = 35;
  { Start: sliding window size and average AC value }
  RastaStartAvgBufSize: LongInt = 200; RastaStartAvg: LongInt = 35;
  { Stop: sliding window size and average AC value - considerably smaller! }
  RastaStopAvgBufSize: LongInt = 200;  RastaStopAvg: LongInt = 15;
  { 22050 Hz, sliding window 23 msec -> 13.9... zero crossings = fundamental frequency 600 Hz }
  RastaZCross600Hz: Integer = 14;
  { A signal may live on zero crossings alone for 25 msec; after that, power must follow }
  RastaMaxFricSamples: Integer = (LongInt(22050)*25) div 1000;
  { Just noticeable difference (JND) in volume, normally 2 dB, max. 3 dB }
  RastaJND: Single = 1.4;
  { True: use StartAvg/StopAvg, False: just Noise * RastaJND }
  RastaUseThresholds: Boolean = True;

const
  DefaultPauseSamples = 6620;  { 300 msec at 22050 Hz }
  PauseSamples: LongInt = DefaultPauseSamples; { -> RastaFindStop }
 
function EstimateNoise(StdLongThreshold,NoiseLevel: LongInt; IsStart: Boolean): LongInt;
begin
  if RastaAutoScaleNoiseLift then
  begin { ScaledNoise := NoiseLevel div 512;
          if ScaledNoise < 10 then NoiseLift := 1.4
            else NoiseLift := ScaledNoise / 10; }
    RastaNoiseLift := NoiseLevel / 5120;
    if RastaNoiseLift < 1.4 then RastaNoiseLift := 1.4;  { 2 dB }
  end;
  if RastaUseThresholds then
  begin
    Result := Trunc(RastaNoiseLift*StdLongThreshold+NoiseLevel div 512);
  end else
  begin
    if IsStart then Result := Trunc(NoiseLevel*RastaJND)
     else Result := Trunc(NoiseLevel*Sqrt(RastaJND));  {Stop }
  end;
end;

const TriggerFac = 1.5;  { ZCrossings above Noise }
const MaxFricLenMs = 25; { after 25 msec of "signal" based on zero crossings alone, power must follow, otherwise it was nothing }

{ True: Start and NoiseLevel have been set. NoiseLevel = real level * 512 }
function RastaFindStart(PCMData: PSmallInt; SampleCount: LongInt;
  NoiseComputing: Boolean; var NoiseLevel, Start: LongInt): Boolean;

type IntBuf = Array[0..999] of SmallInt; PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
    AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
    AvgBuf: PIntBuf;
    AvgLongThresholdWithNoise: LongInt;
    LastVal, NewVal, NextLastVal: Integer;
    AvgElems, AvgThreshold: LongInt;

    NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
    MinSignalLength, SignalLength: LongInt;
    { Zero crossings }
    DCPos, DCNeg: LongInt; DCOffset: Integer;
    ZCrossPosTrigger, ZCrossNegTrigger: Integer;
    ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
    ZCrossIndex, ZCrossCount: Integer;
    FricDownCount: Integer;  { "signal" without power (zero crossings only): power must follow within this many samples }
begin
  AvgElems := RastaStartAvgBufSize; AvgThreshold := RastaStartAvg;
  MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;
  Start := 0; Result := False;

  AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
  Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);

  { use the AC of the first 23 msec (accumulated AC values) as noise estimate and for the DC offset }
  GetMem(NoiseBuf,512*SizeOf(SmallInt));
  FillChar(NoiseBuf^,1024,0);
  NewNoiseLevel := 0;
  NoiseBufIndex := 0;
  DCPos := 0;
  DCNeg := 0;
  FricDownCount:=0;
  GetMem(ZCrossBuf,512*SizeOf(SmallInt));
  FillChar(ZCrossBuf^,1024,0);

  P := PCMData;
  LastVal := P^;
  DCOffset := 0;

  if NoiseComputing then begin
    while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
    begin
      { DC-Offset }
      if LastVal > 0 then
        Inc(DCPos,LastVal)
      else
        Inc(DCNeg,LastVal);

      {$IFDEF WIN32} Inc(P); {$ELSE}
      asm
        add Word Ptr[P],2;
        jnb @@1;
        add Word Ptr[P+2],8;
      @@1:
      end; {$ENDIF}

      NewVal := P^;
      CurVal := Abs(NewVal-LastVal);
      LastVal := NewVal;
      { saturation first, so that NewNoiseLevel matches the NoiseBuf
        entries that are subtracted from it again later }
      if CurVal > 32000 then CurVal := 32000;
      if CurVal <> 0 then
      begin
        Inc(NewNoiseLevel,CurVal);
        NoiseBuf^[NoiseBufIndex] := CurVal;
        Inc(NoiseBufIndex);
      end;

      Inc(DCOffset);  { own counter - NoiseBuf skips DC-ranges }
    end;
    if DCOffset > 0 then  { guard against SampleCount <= 1 }
      DCOffset := (DCPos+DCNeg) div DCOffset; { DCNeg < 0 }
  end
  else begin
    DCOffset := 0;
    NewNoiseLevel := 25000; {30000 for baiser! }
  end;

  LowestNoiseLevel := NewNoiseLevel;
  NoiseBufIndex := 0;
  ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
  ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) /512 + DCOffset);

  { summed AC level plus noise }
  AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel, True);

  { now scan everything again, Signal = AC - NoiseLevel. During the first
    23 msec the signal level may well be negative }
  P := PCMData;
  LastVal := P^;
  VoltageBelow := LastVal < ZCrossNegTrigger; ZCrossCount := 0; ZCrossIndex := 0;
  for x := 1 to SampleCount-1 do
  begin
    {$IFDEF WIN32}
    Inc(P);
    {$ELSE}
    asm
      add Word Ptr[P],2;
      jnb @@1;
      add Word Ptr[P+2],8;
    @@1:
    end;
    {$ENDIF}

    Dec(Avg, AvgBuf^[AvgBufIndex]);
    NewVal := P^; CurVal := Abs(NewVal-LastVal); LastVal := NewVal;

    { upper limit for usable data (saturation) }
    if CurVal > 32000 then CurVal := 32000;
    AvgBuf^[AvgBufIndex] := CurVal; Inc(Avg,CurVal);

    Inc(AvgBufIndex); if AvgBufIndex >= AvgElems then AvgBufIndex := 0;

    { special case for silence-pieces }
    if CurVal <> 0 then
    begin
      Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
      Inc(NewNoiseLevel,CurVal);
      Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
    end;

    if NewNoiseLevel < LowestNoiseLevel then
    begin
      LowestNoiseLevel := NewNoiseLevel;
      AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,True);
      ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
      ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
    end;

    { Zerocrossings }
    Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
    if VoltageBelow then
    begin
      if NewVal > ZCrossPosTrigger then
      begin
        Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
        VoltageBelow := False;
      end
        else ZCrossBuf^[ZCrossIndex] := 0;  { no change }
    end else
    begin
      if NewVal < ZCrossNegTrigger then
      begin
        Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
        VoltageBelow := True;
      end
        else ZCrossBuf^[ZCrossIndex] := 0;
    end;
    Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;

    if SignalLength > 0 then
    begin   { signal found. Is it still there? }
      Dec(FricDownCount);
      if (Avg < AvgLongThresholdWithNoise) and (FricDownCount < 0)  { and (ZCrossCount < RastaZCross600Hz) }
        then SignalLength := 0  { no, it was just a peak }
        else
        begin
          { power and zero crossings present: re-init the counter }
          if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
            then FricDownCount := RastaMaxFricSamples;
          Inc(SignalLength);
          if SignalLength > MinSignalLength then Break;  { start has been found }
        end;
    end else
    begin
      if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
      begin
        if ZCrossCount > RastaZCross600Hz
          then FricDownCount := RastaMaxFricSamples
          else FricDownCount := 0;
        SignalLength := AvgElems;
        Start := x-AvgElems;
        if Start < 0 then Start := 0;
      end;
    end;
  end;
  FreeMem(AvgBuf,AvgBufSize);
  FreeMem(NoiseBuf,512*SizeOf(SmallInt));
  FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
  NoiseLevel := LowestNoiseLevel;  { = Lowest Level * 512 }
  Result := SignalLength > MinSignalLength; { True: variable Start set }
end;

function RastaFindStop(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
     var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
type
  IntBuf = Array[0..999] of SmallInt;
  PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
  AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
  AvgBuf: PIntBuf;
  AvgLongThresholdWithNoise: LongInt;
  LastVal, NewVal, NextLastVal: Integer;
  AvgElems, AvgThreshold: LongInt;

  NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
  MinSignalLength, SignalLength: LongInt;
  { Zero crossings }
  DCPos, DCNeg: LongInt; DCOffset: Integer;
  ZCrossPosTrigger, ZCrossNegTrigger: Integer;
  ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
  ZCrossIndex, ZCrossCount: Integer;
  FricDownCount: Integer;  { "signal" without power (zero crossings only): power must follow within this many samples }
begin
  { copy the global Rasta variables into locals, just for convenience }
  AvgElems := RastaStopAvgBufSize; AvgThreshold := RastaStopAvg;
  MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;

  Result := False;
  { fewer than PauseSamples samples: the silence at the end cannot be checked, so give up }
  if NeedSilenceAtEnd and (SampleCount < PauseSamples) then Exit;

  AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
  Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);

  GetMem(NoiseBuf,512*SizeOf(SmallInt)); FillChar(NoiseBuf^,1024,0);
  NewNoiseLevel := 0; LowestNoiseLevel := NoiseLevel; NoiseBufIndex := 0;
  FricDownCount:=0;
  DCPos := 0; DCNeg := 0;
  GetMem(ZCrossBuf,512*SizeOf(SmallInt));
  FillChar(ZCrossBuf^,1024,0);


  { P points to the last sample in the buffer }
  P := OffsetPointer(PCMData, (SampleCount-1)*2);
  LastVal := P^;
  DCOffset := 0;

  while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
  begin
    { DC-Offset }
    if LastVal > 0 then
      Inc(DCPos,LastVal)
    else
      Inc(DCNeg,LastVal);
    {$IFDEF WIN32}
      Dec(P);
    {$ELSE}
    asm
      sub Word Ptr[P],2;
      jnb @@1;
      sub Word Ptr[P+2],8;
    @@1:
    end;
    {$ENDIF}
    NewVal := P^;
    CurVal := Abs(NewVal-LastVal);

    if CurVal > 32000 then CurVal:=32000;

    LastVal := NewVal;
    if CurVal <> 0 then
    begin
      Inc(NewNoiseLevel,CurVal);
      NoiseBuf^[NoiseBufIndex] := CurVal;
      Inc(NoiseBufIndex);
    end;
    Inc(DCOffset);
  end;
  if DCOffset > 0 then  { guard against SampleCount <= 1 }
    DCOffset := (DCPos+DCNeg) div DCOffset;
  NoiseBufIndex := 0;
  if NewNoiseLevel < LowestNoiseLevel
    then LowestNoiseLevel := NewNoiseLevel;
  ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
  ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);

  { summed AC-level plus noise. no recalc of NoiseLiftering }
  AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,False);

  P := OffsetPointer(PCMData, (SampleCount-1)*2);

  LastVal := P^;
  VoltageBelow := LastVal < ZCrossNegTrigger;
  ZCrossCount := 0; ZCrossIndex := 0;

  for x := SampleCount downto 1 do
  begin
    Dec(Avg, AvgBuf^[AvgBufIndex]);
    NewVal := P^;
    {$IFDEF WIN32} Dec(P); {$ELSE}
    asm sub Word Ptr[P],2; jnb @@1; sub Word Ptr[P+2],8; @@1: end; {$ENDIF}

    CurVal := Abs(NewVal-LastVal);

    if CurVal > 32000 then CurVal:=32000;

    LastVal := NewVal;
    AvgBuf^[AvgBufIndex] := CurVal;
    Inc(Avg,CurVal);
    Inc(AvgBufIndex);
    if AvgBufIndex >= AvgElems then AvgBufIndex := 0;

    { special case for silence-ranges }
    if CurVal <> 0 then
    begin
      Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
      Inc(NewNoiseLevel,CurVal);
      Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
    end;

    if NewNoiseLevel < LowestNoiseLevel then
    begin
      LowestNoiseLevel := NewNoiseLevel;
      AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,False);
      ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
      ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
    end;

    { Zerocrossings }
    Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
    if VoltageBelow then
    begin
      if NewVal > ZCrossPosTrigger then
      begin
        Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
        VoltageBelow := False;
      end
        else ZCrossBuf^[ZCrossIndex] := 0;  { no change }
    end else
    begin
      if NewVal < ZCrossNegTrigger then
      begin
        Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
        VoltageBelow := True;
      end
        else ZCrossBuf^[ZCrossIndex] := 0;
    end;
    Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;

    if SignalLength > 0 then
    begin   { signal found. Is it still there? }
      Dec(FricDownCount);
      if (Avg < AvgLongThresholdWithNoise) and (FricDownCount < 0) { and (ZCrossCount < RastaZCross600Hz) }
        then SignalLength := 0  { no, it was just a peak }
        else
        begin
          { power and zero crossings present: re-init the counter }
          if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
            then FricDownCount := RastaMaxFricSamples;
          Inc(SignalLength);
          if SignalLength > MinSignalLength then
          begin
            if (Stop > SampleCount-PauseSamples) and NeedSilenceAtEnd then SignalLength := 0;  { no end found }
            Break;
          end;
        end;
    end else
    begin
      if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
      begin
        if ZCrossCount > RastaZCross600Hz then FricDownCount := RastaMaxFricSamples
          else FricDownCount := 0;
        SignalLength := AvgElems;
        Stop := x+AvgElems;
        if Stop > SampleCount-1 then Stop := SampleCount-1;  { keep Stop inside the buffer }
      end;
    end;
  end;
  FreeMem(AvgBuf,AvgBufSize);
  FreeMem(NoiseBuf,512*SizeOf(SmallInt));
  FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
  Result := SignalLength > MinSignalLength;  { True: Stop set }
end;
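The zero-crossing test inside RastaFindStart and RastaFindStop does not count plain sign changes. A crossing is counted only when the signal moves from below DCOffset minus a band to above DCOffset plus that band (or back), where the band is TriggerFac times the noise level divided by 512, so low-level noise around the DC offset produces no crossings at all. The same logic, pulled out into a standalone routine, might look like this. It is a sketch: CountCrossings, TriggerHalfBand and MakeAlternating are hypothetical names, the band is made symmetric here (the posted code truncates the two triggers separately, so they can differ by one), and the test signal is invented for illustration.

```pascal
program HysteresisCrossings;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  TSamples = array of SmallInt;

{ Half band width as in RastaFindStart/RastaFindStop (up to rounding):
  NoiseLevel is 512 times the mean absolute sample-to-sample difference }
function TriggerHalfBand(NoiseLevel: LongInt): Integer;
const
  TriggerFac = 1.5;
begin
  Result := Trunc((NoiseLevel * TriggerFac) / 512);
end;

{ Counts crossings of the band DCOffset +/- HalfBand (Schmitt-trigger style):
  a crossing is only counted when the signal leaves the band on the side
  opposite to where it was before, so noise inside the band is ignored }
function CountCrossings(const S: TSamples; DCOffset, HalfBand: Integer): Integer;
var
  I, PosTrigger, NegTrigger: Integer;
  Below: Boolean;
begin
  Result := 0;
  if Length(S) = 0 then Exit;
  PosTrigger := DCOffset + HalfBand;
  NegTrigger := DCOffset - HalfBand;
  Below := S[0] < NegTrigger;  { same start state as VoltageBelow in the posted code }
  for I := 1 to High(S) do
    if Below then
    begin
      if S[I] > PosTrigger then
      begin
        Inc(Result);
        Below := False;
      end;
    end
    else if S[I] < NegTrigger then
    begin
      Inc(Result);
      Below := True;
    end;
end;

{ Test signal: +Amplitude, -Amplitude, +Amplitude, ... (N samples) }
function MakeAlternating(N: Integer; Amplitude: SmallInt): TSamples;
var
  I: Integer;
begin
  SetLength(Result, N);
  for I := 0 to N - 1 do
    if Odd(I) then Result[I] := -Amplitude
    else Result[I] := Amplitude;
end;

begin
  { HalfBand = 73 for the default noise level 25000 used when NoiseComputing is False }
  WriteLn(CountCrossings(MakeAlternating(10, 100), 0, TriggerHalfBand(25000)));  { 9 }
end.
```

With a band of 150 instead, the same ±100 signal stays inside the band and yields 0 crossings; that is how the code keeps background hiss from triggering RastaZCross600Hz.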

Ciao, Mike
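The listing hard-codes 22050 Hz in several places: MinSignalLength uses 22 samples per millisecond, RastaMaxFricSamples is (22050*25) div 1000, DefaultPauseSamples (6620) stands for roughly 300 ms, and the 512-sample noise and zero-crossing buffers cover about 23 ms. Adjusting the code to another sample rate therefore means recomputing these counts from durations. A minimal sketch; MsToSamples and SamplesToMs are hypothetical helper names, not part of the posted code.

```pascal
program SampleRateSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

{ Converts a duration in milliseconds into a sample count (rounded down) }
function MsToSamples(Ms, SampleRate: LongInt): LongInt;
begin
  Result := (Ms * SampleRate) div 1000;
end;

{ Converts a sample count back into milliseconds (rounded down) }
function SamplesToMs(Samples, SampleRate: LongInt): LongInt;
begin
  Result := (Samples * 1000) div SampleRate;
end;

begin
  WriteLn('25 ms  @ 22050 Hz = ', MsToSamples(25, 22050), ' samples');   { 551 }
  WriteLn('300 ms @ 22050 Hz = ', MsToSamples(300, 22050), ' samples');  { 6615 }
  WriteLn('300 ms @ 44100 Hz = ', MsToSamples(300, 44100), ' samples');  { 13230 }
  WriteLn('512 samples @ 22050 Hz = ', SamplesToMs(512, 22050), ' ms');  { 23 }
end.
```

If the 512-sample buffers are kept unchanged at a higher sample rate, they cover less time, so RastaZCross600Hz would presumably have to shrink in the same proportion. Ms * SampleRate must also fit into a LongInt.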