Solved

Extract sentences from file

Posted on 2006-11-23
9
439 Views
Last Modified: 2010-04-21
Hi

Say I have two files, I want to create a third file, by extracting the 5 sentences from file1, adding it to file3, then extracting 5 sentences from file2, and adding it to file3, then going back to file1, doing the same, then file2, doing the same until the end of both files. If either file1 or file2 file runs out then the rest of the other file must be added to file3.

It is safe to presume a sentence ends in a "." and then next one begins after the "."

Can somebody whip up some code to do this?

Thanks


0
Comment
Question by:zattz
  • 3
  • 2
  • 2
  • +2
9 Comments
 
LVL 28

Accepted Solution

by:
TName earned 125 total points
ID: 18006484
Hi, you could try something like this (put a button on a form. Demofiles: 'C:\Test1.txt', 'C:\Test2.txt'):

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  private
    function FileToString(FN: String):String;
    procedure Dismantle(Sep, Text: String; var SList:TStringList);
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}


procedure TForm1.Button1Click(Sender: TObject);
var
SL1,SL2,SL3:TStringList;
str:String;
i,j,c:Integer;
begin
  SL1:=TStringList.Create;
  SL2:=TStringList.Create;
  SL3:=TStringList.Create;

  if FileExists('C:\Test1.txt') then
    str:=FileToString('C:\Test1.txt');
  if FileExists('C:\Test2.txt') then
     str:=FileToString('C:\Test2.txt');
  try
    Dismantle('.', str, SL1);
    Dismantle('.', str, SL2);
  except
     Exit;
  end;

  c:=SL1.Count;
  if SL2.Count>c then
      c:=SL2.Count;

  try
    for i:=1 to c do begin
      j:=5;
      while j>0 do begin
         if SL1.Count>0 then begin
           SL3.Add(SL1[0]);
           SL1.Delete(0);
         end;
         Dec(j);
      end;
      j:=5;
      while j>0 do begin
         if SL2.Count>0 then begin
           SL3.Add(SL2[0]);
           SL2.Delete(0);
         end;
         Dec(j);
      end;
    end;
  except
    //
  end;
  SL3.SaveToFile('C:\Test3.txt');

  SL1.Free;
  SL2.Free;
  SL3.Free;
end;


procedure TForm1.Dismantle(Sep, Text: String; var SList:TStringList);
var
  i: Integer;
  S: String;
begin
  S := Text;
  i := 0;
  while Pos(Sep, S) > 0 do begin
    SList.Add(Copy(S, 1, Pos(Sep, S)-1));
    S := Copy(S, Pos(Sep, S) + Length(Sep), Length(S));
    Inc(i);
  end;
  SList.Add(Copy(S, 1, Length(S)));
end;


function TForm1.FileToString(FN: String):String;
var
  p: PChar;
begin
  with TFileStream.Create(FN,fmOpenRead) do
    try
      GetMem(p,Size);
      try
        Read(p^,Size);
        Result := p;
      finally
        FreeMem(p);
      end;
    finally
      Free;
    end;
end;

end.
0
 
LVL 15

Assisted Solution

by:mikelittlewood
mikelittlewood earned 125 total points
ID: 18006501
Don't you need 2 different string holder values for the files and not just the one "str"

Instead of doing

  try
    Dismantle('.', str1, SL1);
    Dismantle('.', str2, SL2);
  except
     Exit;
  end;

change it to

  SL1.Delimiter := '.';
  SL2.Delimiter := '.';
  SL1.DelimitedText := str1;
  SL2.DelimitedText := str2;
0
 
LVL 28

Expert Comment

by:TName
ID: 18006538
Thanks Mike, I've done some late changes in the edit field of the browser... !   Very clever idea :\
Glad to see someone really checks! :)


procedure TForm1.Button1Click(Sender: TObject);
var
SL1,SL2,SL3:TStringList;
str:String;
i,j,c:Integer;
begin
  SL1:=TStringList.Create;
  SL2:=TStringList.Create;
  SL3:=TStringList.Create;

  if FileExists('C:\Test1.txt') then
    str:=FileToString('C:\Test1.txt');
  try
    Dismantle('.', str, SL1);  
  except
     Exit;
  end;

 if FileExists('C:\Test2.txt') then
    str:=FileToString('C:\Test2.txt');
  try
    Dismantle('.', str, SL2);  
  except
     Exit;
  end;

  c:=SL1.Count;
  if SL2.Count>c then
      c:=SL2.Count;

{...}
0
 
LVL 15

Expert Comment

by:mikelittlewood
ID: 18006542
he he no probs
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 28

Expert Comment

by:TName
ID: 18006544
 >SL1.Delimiter := '.';
  >SL2.Delimiter := '.';
  >SL1.DelimitedText := str1;
  >SL2.DelimitedText := str2;

Or as Mike suggested.
I'll have to check this. Can you believe that I've used TStringlist for years, without noticing this function?

0
 
LVL 9

Assisted Solution

by:alkisg
alkisg earned 125 total points
ID: 18009263
A shorter version, using TStrings.LineBreak to seperate the sentences and TStrings.LoadFromFile to load them.
WARNING, I didn't compile this, there could be errors in the code:

var
  t1, t2: TStrings;
begin
  t1 := TStringList.Create;
  t1.LineBreak := '.';
  t1.LoadFromFile('text1.txt');
  t2 := TStringList.Create;
  t2.LineBreak := '.';
  t2.LoadFromFile('text1.txt');
//...now the files are read and broken to sentences,
//then you continue with the rest of the code as TName suggested
0
 
LVL 9

Expert Comment

by:alkisg
ID: 18009269
P.S. the dots at the end of the sentences are lost, so you'll have to write them manually at file3, i.e.
Write(t1[i] + '.')

The line breaks are preserved thought, and it makes code simpler...
0
 
LVL 9

Assisted Solution

by:bernani
bernani earned 125 total points
ID: 18018667
Hi,

If it can help, a little demo with display of the results in 3 memos

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls;

type
  TForm1 = class(TForm)
    Memo1: TMemo;
    Button1: TButton;
    Memo2: TMemo;
    Memo3: TMemo;
    procedure Button1Click(Sender: TObject);
  private
    { Déclarations privées }
  public
    { Déclarations publiques }
  end;

var
  Form1: TForm1;

implementation

{$R *.DFM}

procedure StringPickToPieces(Str: AnsiString; Separator: Char; Strings: TStrings);
begin
  try
    Strings.Clear;
    while AnsiPos(Separator, Str) > 0 do
    begin
      Strings.Add(Copy(Str, 1, AnsiPos(Separator, Str) - 1));
      Str := Copy(Str, AnsiPos(Separator, Str) + 1, Length(Str));
    end;
    Strings.Add(Str);
  except
  {}
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
x, c, i,j: integer;
sl1, sl2, sl3: TStringList;


begin
Memo1.clear;
sl1:= TStringlist.Create;
sl1.LoadFromFile('File1.txt');
StringPickToPieces(sl1.text,'.',sl1);
// only for visual correct text display
for x:= 0 to sl1.count -1 do
begin
memo1.lines.Add(sl1[x]+ '.');
end;

Memo2.clear;
sl2:= TStringlist.Create;
sl2.LoadFromFile('File2.txt');
StringPickToPieces(sl2.text,'.',sl2);
// only for visual correct text display
for x:=0 to sl2.count-1 do
begin
memo2.lines.Add(sl2[x]+ '.');
end;

Memo3.clear;
sl3:= TStringlist.Create;

c:=SL1.Count;

if SL2.Count>c then c:= SL2.Count;

  try
    for i:=1 to c do begin
      j:=5;
      while j>0 do begin
         if SL1.Count -1 > 0 then begin
           SL3.Add(SL1[0]+'.');
           SL1.Delete(0);
         end;
         Dec(j);
      end;
      j:=5;
      while j>0 do begin
         if SL2.Count -1 > 0 then begin
           SL3.Add(SL2[0] + '.');
           SL2.Delete(0);
         end;
         Dec(j);
      end;
    end;
  except
    //
  end;

SL3.SaveToFile('File3.txt');
// only for visual correct text display
Memo3.lines.LoadFromFile('File3.txt');

sl1.free;
sl2.free;
sl3.free;
end;

end.

______________ dfm _____________

object Form1: TForm1
  Left = 342
  Top = 214
  Width = 531
  Height = 431
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Shell Dlg 2'
  Font.Style = []
  OldCreateOrder = False
  PixelsPerInch = 96
  TextHeight = 13
  object Memo1: TMemo
    Left = 112
    Top = 32
    Width = 193
    Height = 169
    Lines.Strings = (
      'Memo1')
    TabOrder = 0
  end
  object Button1: TButton
    Left = 16
    Top = 360
    Width = 75
    Height = 25
    Caption = 'Text splitting '
    TabOrder = 1
    OnClick = Button1Click
  end
  object Memo2: TMemo
    Left = 312
    Top = 32
    Width = 185
    Height = 169
    Lines.Strings = (
      'Memo2')
    TabOrder = 2
  end
  object Memo3: TMemo
    Left = 112
    Top = 208
    Width = 393
    Height = 185
    Lines.Strings = (
      'Memo3')
    TabOrder = 3
  end
end


0
 

Author Closing Comment

by:zattz
ID: 31425433
sorry about the delay
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

Objective: - This article will help user in how to convert their numeric value become words. How to use 1. You can copy this code in your Unit as function 2. than you can perform your function by type this code The Code   (CODE) The Im…
In this tutorial I will show you how to use the Windows Speech API in Delphi. I will only cover basic functions such as text to speech and controlling the speed of the speech. SAPI Installation First you need to install the SAPI type library, th…
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …
This video demonstrates how to create an example email signature rule for a department in a company using CodeTwo Exchange Rules. The signature will be inserted beneath users' latest emails in conversations and will be displayed in users' Sent Items…

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now