Solved

Extract sentences from file

Posted on 2006-11-23
9
447 Views
Last Modified: 2010-04-21
Hi

Say I have two files, I want to create a third file, by extracting the 5 sentences from file1, adding it to file3, then extracting 5 sentences from file2, and adding it to file3, then going back to file1, doing the same, then file2, doing the same until the end of both files. If either file1 or file2 file runs out then the rest of the other file must be added to file3.

It is safe to presume a sentence ends in a "." and then next one begins after the "."

Can somebody whip up some code to do this?

Thanks


0
Comment
Question by:zattz
  • 3
  • 2
  • 2
  • +2
9 Comments
 
LVL 28

Accepted Solution

by:
TName earned 125 total points
ID: 18006484
Hi, you could try something like this (put a button on a form. Demofiles: 'C:\Test1.txt', 'C:\Test2.txt'):

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  private
    function FileToString(FN: String):String;
    procedure Dismantle(Sep, Text: String; var SList:TStringList);
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}


procedure TForm1.Button1Click(Sender: TObject);
var
SL1,SL2,SL3:TStringList;
str:String;
i,j,c:Integer;
begin
  SL1:=TStringList.Create;
  SL2:=TStringList.Create;
  SL3:=TStringList.Create;

  if FileExists('C:\Test1.txt') then
    str:=FileToString('C:\Test1.txt');
  if FileExists('C:\Test2.txt') then
     str:=FileToString('C:\Test2.txt');
  try
    Dismantle('.', str, SL1);
    Dismantle('.', str, SL2);
  except
     Exit;
  end;

  c:=SL1.Count;
  if SL2.Count>c then
      c:=SL2.Count;

  try
    for i:=1 to c do begin
      j:=5;
      while j>0 do begin
         if SL1.Count>0 then begin
           SL3.Add(SL1[0]);
           SL1.Delete(0);
         end;
         Dec(j);
      end;
      j:=5;
      while j>0 do begin
         if SL2.Count>0 then begin
           SL3.Add(SL2[0]);
           SL2.Delete(0);
         end;
         Dec(j);
      end;
    end;
  except
    //
  end;
  SL3.SaveToFile('C:\Test3.txt');

  SL1.Free;
  SL2.Free;
  SL3.Free;
end;


procedure TForm1.Dismantle(Sep, Text: String; var SList:TStringList);
var
  i: Integer;
  S: String;
begin
  S := Text;
  i := 0;
  while Pos(Sep, S) > 0 do begin
    SList.Add(Copy(S, 1, Pos(Sep, S)-1));
    S := Copy(S, Pos(Sep, S) + Length(Sep), Length(S));
    Inc(i);
  end;
  SList.Add(Copy(S, 1, Length(S)));
end;


function TForm1.FileToString(FN: String):String;
var
  p: PChar;
begin
  with TFileStream.Create(FN,fmOpenRead) do
    try
      GetMem(p,Size);
      try
        Read(p^,Size);
        Result := p;
      finally
        FreeMem(p);
      end;
    finally
      Free;
    end;
end;

end.
0
 
LVL 15

Assisted Solution

by:mikelittlewood
mikelittlewood earned 125 total points
ID: 18006501
Don't you need 2 different string holder values for the files and not just the one "str"

Instead of doing

  try
    Dismantle('.', str1, SL1);
    Dismantle('.', str2, SL2);
  except
     Exit;
  end;

change it to

  SL1.Delimiter := '.';
  SL2.Delimiter := '.';
  SL1.DelimitedText := str1;
  SL2.DelimitedText := str2;
0
 
LVL 28

Expert Comment

by:TName
ID: 18006538
Thanks Mike, I've done some late changes in the edit field of the browser... !   Very clever idea :\
Glad to see someone really checks! :)


procedure TForm1.Button1Click(Sender: TObject);
var
SL1,SL2,SL3:TStringList;
str:String;
i,j,c:Integer;
begin
  SL1:=TStringList.Create;
  SL2:=TStringList.Create;
  SL3:=TStringList.Create;

  if FileExists('C:\Test1.txt') then
    str:=FileToString('C:\Test1.txt');
  try
    Dismantle('.', str, SL1);  
  except
     Exit;
  end;

 if FileExists('C:\Test2.txt') then
    str:=FileToString('C:\Test2.txt');
  try
    Dismantle('.', str, SL2);  
  except
     Exit;
  end;

  c:=SL1.Count;
  if SL2.Count>c then
      c:=SL2.Count;

{...}
0
Netscaler Common Configuration How To guides

If you use NetScaler you will want to see these guides. The NetScaler How To Guides show administrators how to get NetScaler up and configured by providing instructions for common scenarios and some not so common ones.

 
LVL 15

Expert Comment

by:mikelittlewood
ID: 18006542
he he no probs
0
 
LVL 28

Expert Comment

by:TName
ID: 18006544
 >SL1.Delimiter := '.';
  >SL2.Delimiter := '.';
  >SL1.DelimitedText := str1;
  >SL2.DelimitedText := str2;

Or as Mike suggested.
I'll have to check this. Can you believe that I've used TStringlist for years, without noticing this function?

0
 
LVL 9

Assisted Solution

by:alkisg
alkisg earned 125 total points
ID: 18009263
A shorter version, using TStrings.LineBreak to seperate the sentences and TStrings.LoadFromFile to load them.
WARNING, I didn't compile this, there could be errors in the code:

var
  t1, t2: TStrings;
begin
  t1 := TStringList.Create;
  t1.LineBreak := '.';
  t1.LoadFromFile('text1.txt');
  t2 := TStringList.Create;
  t2.LineBreak := '.';
  t2.LoadFromFile('text1.txt');
//...now the files are read and broken to sentences,
//then you continue with the rest of the code as TName suggested
0
 
LVL 9

Expert Comment

by:alkisg
ID: 18009269
P.S. the dots at the end of the sentences are lost, so you'll have to write them manually at file3, i.e.
Write(t1[i] + '.')

The line breaks are preserved thought, and it makes code simpler...
0
 
LVL 9

Assisted Solution

by:bernani
bernani earned 125 total points
ID: 18018667
Hi,

If it can help, a little demo with display of the results in 3 memos

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls;

type
  TForm1 = class(TForm)
    Memo1: TMemo;
    Button1: TButton;
    Memo2: TMemo;
    Memo3: TMemo;
    procedure Button1Click(Sender: TObject);
  private
    { Déclarations privées }
  public
    { Déclarations publiques }
  end;

var
  Form1: TForm1;

implementation

{$R *.DFM}

procedure StringPickToPieces(Str: AnsiString; Separator: Char; Strings: TStrings);
begin
  try
    Strings.Clear;
    while AnsiPos(Separator, Str) > 0 do
    begin
      Strings.Add(Copy(Str, 1, AnsiPos(Separator, Str) - 1));
      Str := Copy(Str, AnsiPos(Separator, Str) + 1, Length(Str));
    end;
    Strings.Add(Str);
  except
  {}
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
x, c, i,j: integer;
sl1, sl2, sl3: TStringList;


begin
Memo1.clear;
sl1:= TStringlist.Create;
sl1.LoadFromFile('File1.txt');
StringPickToPieces(sl1.text,'.',sl1);
// only for visual correct text display
for x:= 0 to sl1.count -1 do
begin
memo1.lines.Add(sl1[x]+ '.');
end;

Memo2.clear;
sl2:= TStringlist.Create;
sl2.LoadFromFile('File2.txt');
StringPickToPieces(sl2.text,'.',sl2);
// only for visual correct text display
for x:=0 to sl2.count-1 do
begin
memo2.lines.Add(sl2[x]+ '.');
end;

Memo3.clear;
sl3:= TStringlist.Create;

c:=SL1.Count;

if SL2.Count>c then c:= SL2.Count;

  try
    for i:=1 to c do begin
      j:=5;
      while j>0 do begin
         if SL1.Count -1 > 0 then begin
           SL3.Add(SL1[0]+'.');
           SL1.Delete(0);
         end;
         Dec(j);
      end;
      j:=5;
      while j>0 do begin
         if SL2.Count -1 > 0 then begin
           SL3.Add(SL2[0] + '.');
           SL2.Delete(0);
         end;
         Dec(j);
      end;
    end;
  except
    //
  end;

SL3.SaveToFile('File3.txt');
// only for visual correct text display
Memo3.lines.LoadFromFile('File3.txt');

sl1.free;
sl2.free;
sl3.free;
end;

end.

______________ dfm _____________

object Form1: TForm1
  Left = 342
  Top = 214
  Width = 531
  Height = 431
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Shell Dlg 2'
  Font.Style = []
  OldCreateOrder = False
  PixelsPerInch = 96
  TextHeight = 13
  object Memo1: TMemo
    Left = 112
    Top = 32
    Width = 193
    Height = 169
    Lines.Strings = (
      'Memo1')
    TabOrder = 0
  end
  object Button1: TButton
    Left = 16
    Top = 360
    Width = 75
    Height = 25
    Caption = 'Text splitting '
    TabOrder = 1
    OnClick = Button1Click
  end
  object Memo2: TMemo
    Left = 312
    Top = 32
    Width = 185
    Height = 169
    Lines.Strings = (
      'Memo2')
    TabOrder = 2
  end
  object Memo3: TMemo
    Left = 112
    Top = 208
    Width = 393
    Height = 185
    Lines.Strings = (
      'Memo3')
    TabOrder = 3
  end
end


0
 

Author Closing Comment

by:zattz
ID: 31425433
sorry about the delay
0

Featured Post

VMware Disaster Recovery and Data Protection

In this expert guide, you’ll learn about the components of a Modern Data Center. You will use cases for the value-added capabilities of Veeam®, including combining backup and replication for VMware disaster recovery and using replication for data center migration.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Have you ever had your Delphi form/application just hanging while waiting for data to load? This is the article to read if you want to learn some things about adding threads for data loading in the background. First, I'll setup a general applica…
In my programming career I have only very rarely run into situations where operator overloading would be of any use in my work.  Normally those situations involved math with either overly large numbers (hundreds of thousands of digits or accuracy re…
Two types of users will appreciate AOMEI Backupper Pro: 1 - Those with PCIe drives (and haven't found cloning software that works on them). 2 - Those who want a fast clone of their boot drive (no re-boots needed) and it can clone your drive wh…

810 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question