
Finding start and end points of a speech signal?

Posted on 1998-12-06
Last Modified: 2010-04-04
I need to find the start and end points of a
speech signal.

Below is an excerpt from the comp.speech FAQ,
but its code example is only for Unix.
Where can I find documentation or a Delphi example for this?
And where can I find articles about end-point
detection algorithms?
Thank you.


End-point detection algorithms identify sections in an incoming audio signal that contain
speech. Accurate end-pointing is a non-trivial task; however, reasonable behaviour can be
obtained for inputs which contain only speech surrounded by silence (no other noises). Typical
algorithms look at the energy or amplitude of the incoming signal and at the rate of
"zero-crossings". A zero-crossing is where the audio signal changes from positive to negative
or vice versa. When the energy and zero-crossings are at certain levels, it is reasonable to
guess that there is speech. More detailed descriptions are provided in the papers cited below
and in the documentation for the following software.
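A minimal sketch of the two per-frame measurements the FAQ describes (short-term energy and the number of zero-crossings), assuming signed 16-bit mono samples held in a Pascal open array. FrameEnergy, ZeroCrossings and IsSpeechFrame are hypothetical names, and the thresholds passed to IsSpeechFrame are assumptions that would have to be tuned to the recording; this is not the comp.speech software itself.

```pascal
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

{ Mean squared amplitude of Samples[First .. First+Len-1] }
function FrameEnergy(const Samples: array of SmallInt; First, Len: Integer): Double;
var
  I: Integer;
  S: Double;
begin
  Result := 0;
  for I := First to First + Len - 1 do
  begin
    S := Samples[I];
    Result := Result + S * S;
  end;
  if Len > 0 then
    Result := Result / Len;
end;

{ Number of sign changes (positive <-> negative) inside the frame;
  a zero sample is treated as positive }
function ZeroCrossings(const Samples: array of SmallInt; First, Len: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := First + 1 to First + Len - 1 do
    if (Samples[I] >= 0) <> (Samples[I - 1] >= 0) then
      Inc(Result);
end;

{ Crude decision: "speech" if the frame is loud (voiced sounds) or
  crosses zero often (unvoiced fricatives such as "s" or "f").
  EnergyThreshold and ZCThreshold are hypothetical tuning values. }
function IsSpeechFrame(const Samples: array of SmallInt; First, Len: Integer;
  EnergyThreshold: Double; ZCThreshold: Integer): Boolean;
begin
  Result := (FrameEnergy(Samples, First, Len) > EnergyThreshold) or
            (ZeroCrossings(Samples, First, Len) > ZCThreshold);
end;
```

A plain sign test like this also counts crossings caused by low-level background noise, so on its own it tends to report speech in quiet passages; the routines later in this thread avoid that by counting a crossing only once the signal passes noise-dependent trigger levels.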
Question by:jackton

Expert Comment

by:Lischke
ID: 1349620
Hi jackton,

the quotation is correct: finding the start and end points of a speech signal is indeed non-trivial. Here's a simple end-detection routine. If you increase the points for your question significantly, I'll give you much better functions (using the zero-crossings approach, including optional noise detection).

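{ Searches backwards from Stop towards Start for the end of the speech.
  On entry Stop must be a valid sample index (e.g. SampleCount-1); on exit
  it holds the detected end point. SampleCount, NoiseLevel and
  NeedSilenceAtEnd are not used by this simple version, and the
  thresholds 150/120 (mean absolute amplitude) are fixed. }
function RastaFindStop2(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
  var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
var
  P: PSmallInt;
  WalkAheadIndex, CurIndex, Signal, CurSample: LongInt;
  SignalBuf: array[0..511] of Integer;  { sliding window of absolute amplitudes }
  Avg: LongInt;

{ mean of the sliding window }
function CalcAvgSignal: LongInt;
var
  L: Integer;
begin
  Result := 0;
  for L := 0 to High(SignalBuf) do
    Inc(Result, SignalBuf[L]);
  Result := Result div (High(SignalBuf)+1);
end;

{ address of sample number Index. Inc on a typed pointer advances by
  whole SmallInts, so no "* 2" is needed; Inc is a procedure and
  cannot be used on the right-hand side of an assignment }
function SampleAt(Index: LongInt): PSmallInt;
begin
  Result := PCMData;
  Inc(Result, Index);
end;

begin
  CurSample := Stop;
  P := SampleAt(CurSample);

  CurIndex := 0;
  FillChar(SignalBuf, SizeOf(SignalBuf), #0);

  { walk backwards until the window average exceeds 150 }
  while CurSample > Start do
  begin
    Signal := Abs(P^);
    if CurIndex = High(SignalBuf)+1 then
      CurIndex := 0;
    SignalBuf[CurIndex] := Signal;
    Inc(CurIndex);

    Avg := CalcAvgSignal;

    if Avg > 150 then
    begin
      { speech found: walk forward again until the average drops below 120 }
      WalkAheadIndex := CurSample;
      while WalkAheadIndex < Stop do
      begin
        P := SampleAt(WalkAheadIndex);
        Signal := Abs(P^);
        if CurIndex = High(SignalBuf)+1 then
          CurIndex := 0;
        SignalBuf[CurIndex] := Signal;
        Inc(CurIndex);
        Avg := CalcAvgSignal;
        if Avg < 120 then
          Break;
        Inc(WalkAheadIndex);
      end;
      CurSample := WalkAheadIndex;
      Break;
    end;
    Dec(CurSample);
    P := SampleAt(CurSample);
  end;
  Stop := CurSample;
  Result := True;
end;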

PCMData is a pointer to raw PCM data (signed 16-bit mono samples only).
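CalcAvgSignal re-adds all 512 window entries for every sample. A sketch of the same mean of absolute amplitudes kept as a running sum, so each new sample costs O(1) instead of a 512-step loop. TSlidingAvg, AvgReset, AvgPush and SlidingAvgOf are hypothetical names; like SignalBuf after FillChar, the window starts out filled with zeros.

```pascal
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

const
  WindowSize = 512;

type
  TSlidingAvg = record
    Buf: array[0..WindowSize - 1] of Integer;  { last WindowSize absolute values }
    Next: Integer;                              { slot that will be overwritten next }
    Sum: LongInt;                               { sum of everything in Buf }
  end;

procedure AvgReset(var A: TSlidingAvg);
begin
  FillChar(A.Buf, SizeOf(A.Buf), 0);
  A.Next := 0;
  A.Sum := 0;
end;

{ Replace the oldest value by Abs(Sample) and return the new window mean }
function AvgPush(var A: TSlidingAvg; Sample: SmallInt): LongInt;
var
  V: Integer;
begin
  V := Abs(Integer(Sample));
  Dec(A.Sum, A.Buf[A.Next]);   { drop the value leaving the window }
  A.Buf[A.Next] := V;
  Inc(A.Sum, V);               { add the value entering the window }
  A.Next := (A.Next + 1) mod WindowSize;
  Result := A.Sum div WindowSize;
end;

{ Convenience wrapper: feed all samples through a fresh window
  and return the mean after the last one }
function SlidingAvgOf(const Samples: array of SmallInt): LongInt;
var
  A: TSlidingAvg;
  I: Integer;
begin
  AvgReset(A);
  Result := 0;
  for I := 0 to High(Samples) do
    Result := AvgPush(A, Samples[I]);
end;
```

The sum of 512 values of at most 32768 stays below 17 million, so a LongInt is wide enough.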

Ciao, Mike

Author Comment

by:jackton
ID: 1349621
Hi Lischke,
thank you for your help.
The problem is urgent for me, but I don't have any more points.
Would you mail me the better function?
If I get more points, I will add them for you.

My e-mail address is: jt9700@yahoo.com



Accepted Solution

by:Lischke (earned 50 total points)
ID: 1349622
Grumble, grumble, mmmh... okay, I'll send you the code. But you'll have to figure out on your own how to use it.

Ciao, Mike

Expert Comment

by:Lischke
ID: 1349623
Without much comment, here's the solution I sent jiangtao privately:

Here is the code. It compiles fine with Delphi 2, 3 and 4. Some comments were originally in German, but I'm sure you will get the sense. The code is for 22 kHz, 16-bit signed mono samples (easy to adjust to other values):

function OffsetPointer(P: Pointer; Offs: LongInt): Pointer;
{$IFDEF WIN32}
begin Result := Pointer(LongInt(P)+Offs); end;
{$ELSE}
Assembler;
  asm
    mov ax,Word Ptr[Offs]; mov dx, Word Ptr[Offs+2];
    add ax,Word Ptr[P]
    adc dx,0
    mov cx,3;  { OFFSET __AHSHIFT }
    shl dx,cl
    add dx,Word Ptr[P+2];
  end;
{$ENDIF}

var
  { True: scale NoiseLift by the noise level (-> quadratic function) }
  RastaAutoScaleNoiseLift: Boolean = True;
  { Signal := Signal-NoiseLift*Noise; 1.4 for Noise < 10, 4.0 for Noise = 40 }
  RastaNoiseLift: Single = 2.0;
  { Minimum time above threshold (msec) to count as signal; anything shorter is a click }
  RastaMinSigLen: LongInt = 35;
  { Start: sliding window size and average AC threshold }
  RastaStartAvgBufSize: LongInt = 200; RastaStartAvg: LongInt = 35;
  { Stop: sliding window size and average AC threshold - considerably smaller! }
  RastaStopAvgBufSize: LongInt = 200;  RastaStopAvg: LongInt = 15;
  { 22050 Hz, sliding window 23 msec -> 13.9... zero crossings = fundamental frequency 600 Hz }
  RastaZCross600Hz: Integer = 14;
  { A signal may live on zero crossings alone for 25 msec; after that, power must follow }
  RastaMaxFricSamples: Integer = (LongInt(22050)*25) div 1000;
  { Just not discriminable volume change, normally 2 dB, max. 3 dB }
  RastaJND: Single = 1.4;
  { True: use StartAvg/StopAvg, False: just Noise * RastaJND }
  RastaUseThresholds: Boolean = True;

const
  DefaultPauseSamples = 6620;  { 300 msec at 22050 Hz }
  PauseSamples: LongInt = DefaultPauseSamples; { -> RastaFindStop }
 
function EstimateNoise(StdLongThreshold,NoiseLevel: LongInt; IsStart: Boolean): LongInt;
begin
  if RastaAutoScaleNoiseLift then
  begin { ScaledNoise := NoiseLevel div 512;
          if ScaledNoise < 10 then NoiseLift := 1.4
            else NoiseLift := ScaledNoise / 10; }
    RastaNoiseLift := NoiseLevel / 5120;
    if RastaNoiseLift < 1.4 then RastaNoiseLift := 1.4;  { 2 dB }
  end;
  if RastaUseThresholds then
  begin
    Result := Trunc(RastaNoiseLift*StdLongThreshold+NoiseLevel div 512);
  end else
  begin
    if IsStart then Result := Trunc(NoiseLevel*RastaJND)
     else Result := Trunc(NoiseLevel*Sqrt(RastaJND));  {Stop }
  end;
end;

const TriggerFac = 1.5;  { ZCrossings above Noise }
const MaxFricLenMs = 25; { After 25 msec of "signal" from zero crossings alone, power must follow - otherwise it was nothing }

{ True: if start and NoiseLevel set. NoiseLevel = Real Level * 512  }
function RastaFindStart(PCMData: PSmallInt; SampleCount: LongInt;
  NoiseComputing: Boolean; var NoiseLevel, Start: LongInt): Boolean;

type IntBuf = Array[0..999] of SmallInt; PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
    AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
    AvgBuf: PIntBuf;
    AvgLongThresholdWithNoise: LongInt;
    LastVal, NewVal, NextLastVal: Integer;
    AvgElems, AvgThreshold: LongInt;

    NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
    MinSignalLength, SignalLength: LongInt;
    { Zero crossings }
    DCPos, DCNeg: LongInt; DCOffset: Integer;
    ZCrossPosTrigger, ZCrossNegTrigger: Integer;
    ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
    ZCrossIndex, ZCrossCount: Integer;
    FricDownCount: Integer;  { "Signal" without power (zero crossings only): power must arrive within this many samples }
begin
  AvgElems := RastaStartAvgBufSize; AvgThreshold := RastaStartAvg;
  MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;
  Start := 0; Result := False;

  AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
  Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);

  { the first 23 msec AC as noise (accumulated AC-values) and for DC-Offset }
  GetMem(NoiseBuf,512*SizeOf(SmallInt));
  FillChar(NoiseBuf^,1024,0);
  NewNoiseLevel := 0;
  NoiseBufIndex := 0;
  DCPos := 0;
  DCNeg := 0;
  FricDownCount:=0;
  GetMem(ZCrossBuf,512*SizeOf(SmallInt));
  FillChar(ZCrossBuf^,1024,0);

  P := PCMData;
  LastVal := P^;
  DCOffset := 0;

  if NoiseComputing then begin
    while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
    begin
      { DC-Offset }
      if LastVal > 0 then
        Inc(DCPos,LastVal)
      else
        Inc(DCNeg,LastVal);

      {$IFDEF WIN32} Inc(P); {$ELSE}
      asm
        add Word Ptr[P],2;
        jnb @@1;
        add Word Ptr[P+2],8;
      @@1:
      end; {$ENDIF}

      NewVal := P^;


      CurVal := Abs(NewVal-LastVal);
      LastVal := NewVal;
      if CurVal <> 0 then
      begin
        Inc(NewNoiseLevel,CurVal);
        { saturation }
        if CurVal > 32000 then NoiseBuf^[NoiseBufIndex] := 32000
          else NoiseBuf^[NoiseBufIndex] := CurVal;
        Inc(NoiseBufIndex);
      end;

      Inc(DCOffset);  { own counter - NoiseBuf skips DC-ranges }
    end;
    DCOffset := (DCPos+DCNeg) div DCOffset; { DCNeg < 0 }
  end
  else begin
    DCOffset := 0;
    NewNoiseLevel := 25000; {30000 for baiser! }
  end;

  LowestNoiseLevel := NewNoiseLevel;
  NoiseBufIndex := 0;
  ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
  ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) /512 + DCOffset);

  { added AC-Level plus Noise }
  AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel, True);

  { and the whole thing again: Signal = AC - NoiseLevel. For the first 23 msec
    the signal level can very well be negative }
  P := PCMData;
  LastVal := P^;
  VoltageBelow := LastVal < ZCrossNegTrigger; ZCrossCount := 0; ZCrossIndex := 0;
  for x := 1 to SampleCount-1 do
  begin
    {$IFDEF WIN32}
    Inc(P);
    {$ELSE}
    asm
      add Word Ptr[P],2;
      jnb @@1;
      add Word Ptr[P+2],8;
    @@1:
    end;
    {$ENDIF}

    Dec(Avg, AvgBuf^[AvgBufIndex]);
    NewVal := P^; CurVal := Abs(NewVal-LastVal); LastVal := NewVal;

    { Set an upper limit for usable data (saturation) }
    if CurVal > 32000 then CurVal := 32000;
    AvgBuf^[AvgBufIndex] := CurVal; Inc(Avg,CurVal);

    Inc(AvgBufIndex); if AvgBufIndex >= AvgElems then AvgBufIndex := 0;

    { special case for silence-pieces }
    if CurVal <> 0 then
    begin
      Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
      Inc(NewNoiseLevel,CurVal);
      Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
    end;

    if NewNoiseLevel < LowestNoiseLevel then
    begin
      LowestNoiseLevel := NewNoiseLevel;
      AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,True);
      ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
      ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
    end;

    { Zerocrossings }
    Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
    if VoltageBelow then
    begin
      if NewVal > ZCrossPosTrigger then
      begin
        Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
        VoltageBelow := False;
      end
        else ZCrossBuf^[ZCrossIndex] := 0;  { no change }
    end else
    begin
      if NewVal < ZCrossNegTrigger then
      begin
        Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
        VoltageBelow := True;
      end
        else ZCrossBuf^[ZCrossIndex] := 0;
    end;
    Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;

    if SignalLength > 0 then
    begin   { signal found. Is it still there? }
      Dec(FricDownCount);
      if (Avg < AvgLongThresholdWithNoise)  and (FricDownCount < 0)  { and (ZCrossCount < RastaZCross600Hz) }
       then SignalLength := 0  { No. was a peak }
       else
       begin
         { if Power and ZCrossings are there: Counter Re-Init }
         if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
             then FricDownCount := RastaMaxFricSamples;
         Inc(SignalLength);
         if SignalLength > MinSignalLength then Break;  { start is already there }
       end;
     end else
     begin
       if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
       begin
         if ZCrossCount > RastaZCross600Hz
           then FricDownCount := RastaMaxFricSamples
           else FricDownCount := 0;
         SignalLength := AvgElems;
         Start := x-AvgElems;
         if Start < 0 then Start := 0;
       end;
     end;
  end;
  FreeMem(AvgBuf,AvgBufSize);
  FreeMem(NoiseBuf,512*SizeOf(SmallInt));
  FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
  NoiseLevel := LowestNoiseLevel;  { = Lowest Level * 512 }
  Result := SignalLength > MinSignalLength; { True: variable Start set }
end;

function RastaFindStop(PCMData: PSmallInt; SampleCount, NoiseLevel, Start: LongInt;
     var Stop: LongInt; NeedSilenceAtEnd: Boolean): Boolean;
type
  IntBuf = Array[0..999] of SmallInt;
  PIntBuf = ^IntBuf;
var x: LongInt; P: PSmallInt;
  AvgBufSize, AvgBufIndex, CurVal: Word; Avg, AvgNext: LongInt;
  AvgBuf: PIntBuf;
  AvgLongThresholdWithNoise: LongInt;
  LastVal, NewVal, NextLastVal: Integer;
  AvgElems, AvgThreshold: LongInt;

  NoiseBuf: PIntBuf; NoiseBufIndex: Integer; LowestNoiseLevel, NewNoiseLevel: LongInt;
  MinSignalLength, SignalLength: LongInt;
  { Zero crossings }
  DCPos, DCNeg: LongInt; DCOffset: Integer;
  ZCrossPosTrigger, ZCrossNegTrigger: Integer;
  ZCrossBuf: ^IntBuf; VoltageBelow: Boolean;
  ZCrossIndex, ZCrossCount: Integer;
  FricDownCount: Integer;  { "Signal" without power (zero crossings only): power must arrive within this many samples }
begin
  { copy the global Rasta variables to locals, just for convenience }
  AvgElems := RastaStopAvgBufSize; AvgThreshold := RastaStopAvg;
  MinSignalLength := 22*RastaMinSigLen; SignalLength := 0;

  Result := False;
  { fewer than PauseSamples samples: no search is possible, so leave }
  if NeedSilenceAtEnd and (SampleCount < PauseSamples) then Exit;

  AvgBufSize := AvgElems*SizeOf(SmallInt); GetMem(AvgBuf,AvgBufSize);
  Avg := 0; AvgBufIndex := 0; FillChar(AvgBuf^,AvgBufSize,0);

  GetMem(NoiseBuf,512*SizeOf(SmallInt)); FillChar(NoiseBuf^,1024,0);
  NewNoiseLevel := 0; LowestNoiseLevel := NoiseLevel; NoiseBufIndex := 0;
  FricDownCount:=0;
  DCPos := 0; DCNeg := 0;
  GetMem(ZCrossBuf,512*SizeOf(SmallInt));
  FillChar(ZCrossBuf^,1024,0);


{ P to the last element in the buffer }
  P := OffsetPointer(PCMData, (SampleCount-1)*2);
  LastVal := P^;
  DCOffset := 0;

  while (NoiseBufIndex < 511) and (DCOffset < SampleCount-1) do
  begin
    { DC-Offset }
    if LastVal > 0 then
      Inc(DCPos,LastVal)
    else
      Inc(DCNeg,LastVal);
    {$IFDEF WIN32}
      Dec(P);
    {$ELSE}
    asm
      sub Word Ptr[P],2;
      jnb @@1;
      sub Word Ptr[P+2],8;
    @@1:
    end;
    {$ENDIF}
    NewVal := P^;
    CurVal := Abs(NewVal-LastVal);

    if CurVal > 32000 then CurVal:=32000;

    LastVal := NewVal;
    if CurVal <> 0 then
    begin
      Inc(NewNoiseLevel,CurVal);
      NoiseBuf^[NoiseBufIndex] := CurVal;
      Inc(NoiseBufIndex);
    end;
    Inc(DCOffset);
  end;
  DCOffset := (DCPos+DCNeg) div DCOffset;
  NoiseBufIndex := 0;
  if NewNoiseLevel < LowestNoiseLevel
    then LowestNoiseLevel := NewNoiseLevel;
  ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
  ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);

  { summed AC-level plus noise. no recalc of NoiseLiftering }
  AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,False);

  P := OffsetPointer(PCMData, (SampleCount-1)*2);

  LastVal := P^;
  VoltageBelow := LastVal < ZCrossNegTrigger;
  ZCrossCount := 0; ZCrossIndex := 0;

  for x := SampleCount downto 1 do
  begin
    Dec(Avg, AvgBuf^[AvgBufIndex]);
    NewVal := P^;
    {$IFDEF WIN32} Dec(P); {$ELSE}
    asm sub Word Ptr[P],2; jnb @@1; sub Word Ptr[P+2],8; @@1: end; {$ENDIF}

    CurVal := Abs(NewVal-LastVal);

    if CurVal > 32000 then CurVal:=32000;

    LastVal := NewVal;
    AvgBuf^[AvgBufIndex] := CurVal;
    Inc(Avg,CurVal);
    Inc(AvgBufIndex);
    if AvgBufIndex >= AvgElems then AvgBufIndex := 0;

    { special case for silence-ranges }
    if CurVal <> 0 then
    begin
      Dec(NewNoiseLevel,NoiseBuf^[NoiseBufIndex]); NoiseBuf^[NoiseBufIndex] := CurVal;
      Inc(NewNoiseLevel,CurVal);
      Inc(NoiseBufIndex); if NoiseBufIndex > 511 then NoiseBufIndex := 0;
    end;

    if NewNoiseLevel < LowestNoiseLevel then
    begin
      LowestNoiseLevel := NewNoiseLevel;
      AvgLongThresholdWithNoise := EstimateNoise(AvgThreshold*AvgElems,LowestNoiseLevel,False);
      ZCrossPosTrigger := Trunc((NewNoiseLevel*TriggerFac) / 512 + DCOffset);
      ZCrossNegTrigger := Trunc(-(NewNoiseLevel*TriggerFac) / 512 + DCOffset);
    end;

    { Zerocrossings }
    Dec(ZCrossCount,ZCrossBuf^[ZCrossIndex]);
    if VoltageBelow then
    begin
      if NewVal > ZCrossPosTrigger then
      begin
        Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
        VoltageBelow := False;
      end
        else ZCrossBuf^[ZCrossIndex] := 0;  { no change }
    end else
    begin
      if NewVal < ZCrossNegTrigger then
      begin
        Inc(ZCrossCount); ZCrossBuf^[ZCrossIndex] := 1;
        VoltageBelow := True;
      end
        else ZCrossBuf^[ZCrossIndex] := 0;
    end;
    Inc(ZCrossIndex); if ZCrossIndex > 511 then ZCrossIndex := 0;

    if SignalLength > 0 then
    begin
      Dec(FricDownCount);
      if (Avg < AvgLongThresholdWithNoise) and (FricDownCount < 0) { and (ZCrossCount < RastaZCross600Hz) }
        then SignalLength := 0
        else
        begin
         if (Avg > AvgLongThresholdWithNoise) and (ZCrossCount > RastaZCross600Hz)
             then FricDownCount := RastaMaxFricSamples;
          Inc(SignalLength);
          if SignalLength > MinSignalLength then
          begin
            if (Stop > SampleCount-PauseSamples) and NeedSilenceAtEnd then SignalLength := 0;  { no end found }
            Break;
          end;
        end;
      end else
      begin
        if (Avg > AvgLongThresholdWithNoise) or (ZCrossCount > RastaZCross600Hz) then
        begin
         if ZCrossCount > RastaZCross600Hz then FricDownCount := RastaMaxFricSamples
           else FricDownCount := 0;
          SignalLength := AvgElems;
          Stop := x+AvgElems;
        end;
      end;
  end;
  FreeMem(AvgBuf,AvgBufSize);
  FreeMem(NoiseBuf,512*SizeOf(SmallInt));
  FreeMem(ZCrossBuf,512*SizeOf(SmallInt));
  Result := SignalLength > MinSignalLength;  { True: Stop set }
end;
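Both RastaFindStart and RastaFindStop count zero crossings with hysteresis: a crossing only counts when the signal moves from below ZCrossNegTrigger to above ZCrossPosTrigger or back, and the triggers sit at roughly NoiseLevel*TriggerFac/512 around the DC offset. A sketch of just that counting step, under the hypothetical name CountHysteresisCrossings; unlike the listing, it counts over a whole buffer instead of a 512-sample sliding window and takes the trigger levels as given.

```pascal
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

{ Schmitt-trigger style crossing count: noise that stays between
  NegTrigger and PosTrigger never registers as a crossing }
function CountHysteresisCrossings(const Samples: array of SmallInt;
  PosTrigger, NegTrigger: Integer): Integer;
var
  I: Integer;
  Below: Boolean;  { True while the last trigger passed was NegTrigger }
begin
  Result := 0;
  if Length(Samples) = 0 then
    Exit;
  Below := Samples[0] < NegTrigger;  { same start state as VoltageBelow }
  for I := 1 to High(Samples) do
  begin
    if Below then
    begin
      if Samples[I] > PosTrigger then
      begin
        Inc(Result);
        Below := False;
      end;
    end
    else if Samples[I] < NegTrigger then
    begin
      Inc(Result);
      Below := True;
    end;
  end;
end;
```

With triggers at +/-100, a +/-50 ripple yields no crossings, while a +/-500 oscillation of the same shape yields one per swing; a plain sign test would count both the same.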

Ciao, Mike
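The time constants in the listing are hard-coded for 22050 Hz (RastaMaxFricSamples, DefaultPauseSamples, and the 22 samples per msec in MinSignalLength). One way to "adjust to other values" is to derive them from the sample rate; MsToSamples, TRastaTimes and RastaTimesFor below are hypothetical helpers, a sketch rather than part of the original code.

```pascal
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

type
  TRastaTimes = record
    MinSignalLength: LongInt;  { RastaMinSigLen msec: shorter bursts are clicks }
    MaxFricSamples: LongInt;   { 25 msec of zero-crossing-only "signal" }
    PauseSamples: LongInt;     { 300 msec of silence needed at the end }
  end;

{ msec -> samples, rounded to the nearest sample; Int64 avoids overflow }
function MsToSamples(SampleRate, Msec: LongInt): LongInt;
begin
  Result := (Int64(SampleRate) * Msec + 500) div 1000;
end;

function RastaTimesFor(SampleRate, MinSigLenMs: LongInt): TRastaTimes;
begin
  Result.MinSignalLength := MsToSamples(SampleRate, MinSigLenMs);
  Result.MaxFricSamples := MsToSamples(SampleRate, 25);
  Result.PauseSamples := MsToSamples(SampleRate, 300);
end;
```

At 22050 Hz this gives 551 fricative samples (as in the listing), 6615 pause samples where the listing uses 6620, and 772 minimum-signal samples where 22*RastaMinSigLen gives 770. The 512-sample noise and zero-crossing windows and the 200-sample averaging windows are also counted in samples and would need the same scaling to keep their durations.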
