Tampering with very large textfiles...

I have a text file of approx. 120 000 lines. How can I put the n:th line into a string?

--johan
 
God_Ares Commented:
Perhaps something like this?

procedure TForm1.Button1Click(Sender: TObject);
var s:string;
    text : TStringList;
begin
  text := TStringList.Create;
  try
    text.LoadFromFile('filename.txt');
    // nr_line is zero-based: the n:th line is at index n - 1
    s := text.Strings[nr_line];
  finally
    text.Free;
  end;
end;
 
edey Commented:
Or, if you don't want to keep the entire file in RAM, you could try ReadLn'ing through to the n:th line. From the help file:

Description

The Readln procedure reads a line of text and then skips to the next line of the file.
Readln(F) with no parameters causes the current file position to advance to the beginning of the next line if there is one; otherwise, it goes to the end of the file.

hence you could try something like:

ix := 1;
while (not EoF(F))and(ix < n) do
begin
 readln(F);
 inc(ix);
end;
if (not EoF(F)) and (ix = n) then
 readln(F,my_string)
else
 messageDLG('File Contains Fewer Than N Lines',mtError,[mbOk],0);


GL
MIke
 
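inthe Commented:
This was from another similar question:

var
  linenumber : array[1..maxlines] of Cardinal; // byte offset of each line start; must be filled beforehand
  s : string;
  c : Byte;
  f : file of Byte;
begin
  s := '';
  AssignFile(f, 'c:\mylargefile.txt');
  Reset(f);
  Seek(f, linenumber[1245]);
  // read bytes up to (but not including) the carriage return that ends the line
  while not Eof(f) do
  begin
    Read(f, c);
    if c = 13 then Break;
    s := s + Chr(c);
  end;
  CloseFile(f);
end;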
 
sageryd (Author) Commented:
Gor_Ares, your solution does not work at all; I have already tried it a couple of times. A TStringList or any other kind of list is too small to handle such large text files.

Edey, I tried your solution before too. It gets very slow if you have to read, let's say, 100 000 lines before being able to read the 100 001:st.

Barry, I think your solution is the best and easiest. I haven't tried anything yet, but it looks promising. I'll get back to you as soon as I have the time to take a look at it.

Thanks everyone!

--johan
 
sageryd (Author) Commented:
God_ares, sorry for the misspelling of your nick, might have sounded a little bit unpleasant.."Gore_ares" ;) Cheers!
 
Madshi Commented:
Ehm, Barry, who fills the linenumber array? As far as I understand that code, Seek is called with a linenumber that was not initialized, or am I missing something?

Regards, Madshi.
 
Lischke Commented:
Madshi, this is what I wondered about too. I guess the array is filled with positions of the line starts in the file. This would need a specially prepared file or a way to get to this information otherwise.

sageryd,

what you said about TStringList is definitely not true. I have used this (and a rewritten variant for wide strings) to hold a one-million-line file (needs much memory though).

In the case you want a fast AND memory inexpensive solution I recommend that you look at memory mapped files. Map the file to memory and iterate through the line breaks (via simple memory pointers).

Ciao, Mike
 
sageryd (Author) Commented:
OK, Mike, maybe I was wrong, but what I really meant was that it wouldn't work with a list because of the memory needed. I don't want the user's RAM to get filled up! Maybe you can give an example of what you mean by those memory pointers.

--johan
 
Lischke Commented:
Well, I don't have ready to use code but the idea is:

1) create a file mapping for the text file:

var
  TextFile: THandle;
  MapHandle: THandle;

begin
  TextFile := OpenFile(...);
  MapHandle := CreateFileMapping(TextFile, nil, PAGE_READONLY, 0, 0, nil);
  :
end;

2) create views of this mapping in your memory and search/count there:

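const
  MapSize = 65536; // view offsets must be a multiple of the allocation granularity (usually 64 KB)

var
  Data,
  Run: PChar;
  Offset,
  FileSize,
  ViewSize,
  LineCount: Cardinal;
  Done: Boolean;

begin
  // WantedLine (the number of line breaks to pass) is supplied by the caller
  FileSize := GetFileSize(TextFile, nil);
  Offset := 0;
  LineCount := 0;
  Done := False;
  // stop when the line is found or the whole file has been processed
  while not Done and (Offset < FileSize) do
  begin
    // the last view may be shorter than MapSize
    ViewSize := FileSize - Offset;
    if ViewSize > MapSize then ViewSize := MapSize;
    Data := MapViewOfFile(MapHandle, FILE_MAP_READ, 0, Offset, ViewSize);
    if Data = nil then Break;
    // Data points now to raw string (file) data, start searching the block
    Run := Data;
    while Cardinal(Run - Data) < ViewSize do
    begin
      if Run^ = #13 then
      begin
        Inc(LineCount);
        if LineCount = WantedLine then
        begin
          Done := True;
          Break;
        end;
      end;
      Inc(Run);
    end;
    UnmapViewOfFile(Data);
    Inc(Offset, MapSize);
  end;
end;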

3) clean up
  CloseHandle(MapHandle);
  CloseHandle(TextFile);
  etc.

This code is not complete but should give you most of the stuff you need.

Ciao, Mike
 
Lischke Commented:
Ah yes, you need of course to stop the loop also when you have processed the entire file and there weren't as many lines as you expected.

Ciao, Mike
 
sageryd (Author) Commented:
ok...but isn't there a simpler way?
 
Lischke Commented:
Yes, TStringList.LoadFromFile.

Ciao, Mike
 
Madshi Commented:
((-:   Hehe Mike...   :-))     (You're absolutely right of course)
 
Lischke Commented:
;-)
 
sageryd (Author) Commented:
Very funny.....

Wouldn't it be just as fast to do something like this:

var
  F: TextFile;
  LineNo: integer;
  S: string;
begin
  LineNo := 5067;
  AssignFile(F, 'filename');
  Reset(F);
  Seek(F, LineNo);
  Readln(F, S);
  {S = the text of the 5067:th line?}
end;

Does the above work as fast as the other? Does this work at all?

--johan
 
inthe Commented:
howdy fellas:
lischke you were also right about
< the array is filled with positions of the line number >

this is correct ;-)
fill the array with the file positions of the lines you want to reach, e.g. for lines

array[1,34,124,etc]

Seek(f, linenumber[3]);
// to go to line 124

this would only really make sense if you only wanted to store the line numbers of about 50-100 misc lines to quickly jump to a line..otherwise the array could get a bit large :o)
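
For what it's worth, here is a minimal sketch of how such an array could be filled for every line, assuming a plain single-byte text file whose lines end in CR/LF or LF. The names TOffsets, BuildLineIndex and ReadLineAt are made up for illustration, and the file name is simply the one from the earlier snippet. With 8-byte offsets, an index for 120 000 lines takes roughly 960 KB.

```pascal
program LineIndexDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  TOffsets = array of Int64;

// Scans the file once and records the byte offset at which each line starts.
function BuildLineIndex(const FileName: string): TOffsets;
var
  Stream: TFileStream;
  Buf: array[0..65535] of Byte;
  BytesRead, I, Count: Integer;
  Base: Int64;
begin
  SetLength(Result, 1024);
  Result[0] := 0;               // line 1 starts at offset 0
  Count := 1;
  Base := 0;                    // file offset of Buf[0]
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      BytesRead := Stream.Read(Buf, SizeOf(Buf));
      for I := 0 to BytesRead - 1 do
        if Buf[I] = 10 then     // LF ends a line in CR/LF and LF-only files
        begin
          if Count = Length(Result) then
            SetLength(Result, Count * 2);
          Result[Count] := Base + I + 1;
          Inc(Count);
        end;
      Inc(Base, BytesRead);
    until BytesRead = 0;
  finally
    Stream.Free;
  end;
  // An offset at or past the end of the file does not start a real line.
  if (Count > 0) and (Result[Count - 1] >= Base) then
    Dec(Count);
  SetLength(Result, Count);
end;

// Reads the line starting at Offset, without its line break.
function ReadLineAt(const FileName: string; Offset: Int64): string;
var
  Stream: TFileStream;
  C: AnsiChar;
begin
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Stream.Position := Offset;
    while (Stream.Read(C, 1) = 1) and (C <> #10) do
      if C <> #13 then
        Result := Result + Char(C);
  finally
    Stream.Free;
  end;
end;

var
  Index: TOffsets;
begin
  Index := BuildLineIndex('c:\mylargefile.txt');
  // Index[0] is line 1, so line 1245 is Index[1244]
  if Length(Index) >= 1245 then
    WriteLn(ReadLineAt('c:\mylargefile.txt', Index[1244]));
end.
```

Building the index still means reading the whole file once, but after that every lookup is a single seek. Length(Index) is also the number of lines in the file.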
 
sageryd (Author) Commented:
I believe it will work just great for my purpose: I'm going to pick a random line number and display the text at that line. By the way, how can I retrieve how many lines there are in the text file?

--johan
 
Lischke Commented:
Johan, that's exactly the point. There's no way to find lines in a text file other than iterating through the content and counting the line breaks. That's what I try to make clear here. If there were a way to just seek to a particular line, why the heck should I bother you with memory mapped files? The type TextFile is only a wrapper around a normal Windows file and some extra handling like ReadLn (which also iterates through the content).

But you should not be distracted by the code I gave here. Actually, it is almost complete. You need only to add a little file open/close handling and link your own stuff into the search code. This is really not difficult.

Ciao, Mike
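
The counting described above could look roughly like this. It is only a sketch under a few assumptions: the file is single-byte text, every line break contains an LF, and the name CountLines and the file name 'filename.txt' are placeholders rather than anything from the thread.

```pascal
program CountLinesDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Counts the lines of a file by scanning for LF bytes in 64 KB blocks.
function CountLines(const FileName: string): Integer;
var
  Stream: TFileStream;
  Buf: array[0..65535] of Byte;
  BytesRead, I: Integer;
  Last: Byte;
begin
  Result := 0;
  Last := 10;  // so that an empty file counts as zero lines
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      BytesRead := Stream.Read(Buf, SizeOf(Buf));
      for I := 0 to BytesRead - 1 do
        if Buf[I] = 10 then
          Inc(Result);
      if BytesRead > 0 then
        Last := Buf[BytesRead - 1];
    until BytesRead = 0;
  finally
    Stream.Free;
  end;
  if Last <> 10 then
    Inc(Result);  // the final line has no trailing line break
end;

var
  Total: Integer;
begin
  Randomize;
  Total := CountLines('filename.txt');
  WriteLn('Lines in file: ', Total);
  if Total > 0 then
    WriteLn('Random line number to fetch: ', Random(Total) + 1);
end.
```

Random(Total) returns a value from 0 to Total - 1, hence the + 1 to get a one-based line number.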
 
God_Ares Commented:
What if you make every line a fixed length? Then you could get a line really fast!
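
A rough sketch of that idea, assuming every line in the file has been padded with spaces to exactly RecLen bytes including its CR/LF. RecLen = 82, ReadFixedLine and 'fixed.txt' are hypothetical values picked for the example.

```pascal
program FixedLengthDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  RecLen = 82;  // assumption: 80 characters of text plus CR/LF

// Returns line LineNo (one-based), or '' if the file has no such line.
function ReadFixedLine(const FileName: string; LineNo: Integer): string;
var
  Stream: TFileStream;
  Rec: array[0..RecLen - 1] of AnsiChar;
  S: AnsiString;
begin
  Result := '';
  if LineNo < 1 then
    Exit;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // line n occupies the bytes (n - 1) * RecLen .. n * RecLen - 1
    if Int64(LineNo) * RecLen > Stream.Size then
      Exit;
    Stream.Position := Int64(LineNo - 1) * RecLen;
    Stream.ReadBuffer(Rec, RecLen);
    SetString(S, PAnsiChar(@Rec[0]), RecLen - 2);  // drop the CR/LF
    Result := TrimRight(string(S));                // drop the padding
  finally
    Stream.Free;
  end;
end;

begin
  WriteLn(ReadFixedLine('fixed.txt', 5067));
end.
```

With this layout the number of lines is simply the file size divided by RecLen, at the price of a larger file.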
 
AJFleming Commented:
I was playing around with this sort of thing a couple of years ago - trying to alter text in a 4Gb file and the main trick I found was to load as much as you can cope with into memory, look through it there, then dump it out to file again once you've done what you need to.

In your case, you're unfortunately going to have to look at every character in the file - at least until you get to the line you want. There are some cute tricks you can pull though.

Allocate yourself a decent-sized buffer, say a couple of megabytes, as an array of chars. BlockRead to fill this buffer, or close to it if you're at the end of the file.

Scan through it counting CR/LF pairs until you get to the line you want. If you get through an entire buffer and you haven't gotten to your insertion point then just blockwrite it back out.

Once you get to your insertion point, blockwrite everything in the buffer up to that point, write out your inserted line, then blockwrite the rest of the buffer - then blockread/blockwrite until you get to the end of the file.

Obviously, this isn't the most elegant solution - but if you don't have any extra knowledge about the layout of the file, that's all you got.

Mike, I haven't used memory mapped files myself, but I'd be really interested to see a comparison between this and a more brute-force method.

cheers,

Adam...
 
Madshi Commented:
I agree with Mike here. Memory mapped files should be the way to go, because Windows then does all the rest for you. And I'm quite sure that Windows will do it in the most performance optimized way. But I agree with Adam, I would like to see a benchmark...

Regards, Madshi.
 
Lischke Commented:
I won't have the time this very day, but will give you some results tomorrow.

Ciao, Mike
 
Lischke Commented:
Hi guys,

here I am with my test results. The system I did the tests on is a double processor PII 350 running WinNT 4 SP6 with 128 MB. The test file is a ~11MB text file with > 300,000 lines (copied from Windows.pas) on a 2GB partition (IDE drive) containing also my NT system (the program itself is on another partition).

I inserted a line "This is our target line!!!" exactly on line position 300,000 (used Delphi's IDE) which is shown when found, otherwise a preset line is shown. I tested four ways in the order of increasing speed (my expectation was exactly my result). Note for implementors: To write the code given below I needed 1 hour this morning with 3 of the 4 routines running without any bug from the first moment on (I had no template to look at). I say this not to show how good I am but to point out how easy it is to write it.

Results:

ReadLn           1094 ms
File Stream      406 ms
Pure File API    391 ms
MMF              297 ms

Here's the code I used:

object Form1: TForm1
  Left = 400
  Top = 262
  HorzScrollBar.Visible = False
  BorderStyle = bsSingle
  Caption = 'Form1'
  ClientHeight = 240
  ClientWidth = 549
  Color = clBtnFace
  Font.Charset = ANSI_CHARSET
  Font.Color = clWindowText
  Font.Height = -13
  Font.Name = 'Arial'
  Font.Style = []
  KeyPreview = True
  OldCreateOrder = True
  Scaled = False
  Visible = True
  OnKeyPress = FormKeyPress
  PixelsPerInch = 96
  TextHeight = 16
  object Label1: TLabel
    Left = 8
    Top = 80
    Width = 97
    Height = 53
    AutoSize = False
    Caption = 'Time needed to find line number 300,000:'
    WordWrap = True
  end
  object Label2: TLabel
    Left = 260
    Top = 96
    Width = 28
    Height = 16
    Caption = 'Time'
  end
  object Label3: TLabel
    Left = 164
    Top = 96
    Width = 28
    Height = 16
    Caption = 'Time'
  end
  object Label4: TLabel
    Left = 352
    Top = 96
    Width = 28
    Height = 16
    Caption = 'Time'
  end
  object Label5: TLabel
    Left = 456
    Top = 96
    Width = 28
    Height = 16
    Caption = 'Time'
  end
  object Button1: TButton
    Left = 144
    Top = 28
    Width = 75
    Height = 25
    Caption = 'ReadLn'
    TabOrder = 0
    OnClick = Button1Click
  end
  object Button2: TButton
    Left = 240
    Top = 28
    Width = 75
    Height = 25
    Caption = 'Stream'
    TabOrder = 1
    OnClick = Button2Click
  end
  object Button3: TButton
    Left = 336
    Top = 28
    Width = 75
    Height = 25
    Caption = 'File API'
    TabOrder = 2
    OnClick = Button3Click
  end
  object Button4: TButton
    Left = 432
    Top = 28
    Width = 75
    Height = 25
    Caption = 'MMF'
    TabOrder = 3
    OnClick = Button4Click
  end
end




unit Unit1;

interface

uses
  Windows, SysUtils, Forms, Classes, StdCtrls, Controls, Buttons, Graphics, Dialogs, ComCtrls, Messages,
  ExtCtrls;


type
  TForm1 = class(TForm)
    Button1: TButton;
    Button2: TButton;
    Button3: TButton;
    Button4: TButton;
    Label1: TLabel;
    Label2: TLabel;
    Label3: TLabel;
    Label4: TLabel;
    Label5: TLabel;
    procedure FormKeyPress(Sender: TObject; var Key: Char);
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
    procedure Button3Click(Sender: TObject);
    procedure Button4Click(Sender: TObject);
  private
  public
  end;

var
  Form1: TForm1;

implementation

uses
  MMSystem;
 
{$R *.DFM}

procedure TForm1.FormKeyPress(Sender: TObject; var Key: Char);
begin
  if key = #27 then
  begin
    Key:=#0;
    Close;
  end;
end;

const
  FileName = 'C:\Temp\Test.txt'; // 305315 lines in ~11.4MB (this is mainly a Windows.pas copy)

var
  Buffer: array[0..1024 * 1024 - 1] of Byte;
 
procedure TForm1.Button1Click(Sender: TObject);

var
  Start: Cardinal;
  F: TextFile;
  S: String;
  Counter: Cardinal;

begin
  Screen.Cursor := crHourGlass;
  S := 'nothing found';
  try
    AssignFile(F, FileName);
    Reset(F);
    Counter := 0;
    Start := timeGetTime;
    while not EOF(F) do
    begin
      if Counter = 299999 then
      begin
        ReadLn(F, S);
        Break;
      end;
      ReadLn(F);
      Inc(Counter);
    end;
    Label3.Caption := Format('%d ms', [timeGetTime - Start]);
    CloseFile(F);
  finally
    Screen.Cursor := crDefault;
    ShowMessage(S);
  end;
end;

procedure TForm1.Button2Click(Sender: TObject);

var
  Start: Cardinal;
  S: String;
  Stream: TFileStream;
  LineCounter,
  CharCounter: Cardinal;
  Head, Tail: PChar;

begin
  Screen.Cursor := crHourGlass;
  S := 'nothing found';
  try
    Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
    LineCounter := 0;
    Start := timeGetTime;
    with Stream do
    begin
      while Position < Size do
      begin
        CharCounter := Read(Buffer, SizeOf(Buffer));
        Head := @Buffer;
        repeat
          while (CharCounter > 0) and (Head^ <> #13) do
          begin
            Inc(Head);
            Dec(CharCounter);
          end;

          if CharCounter > 0 then
          begin
            Inc(Head);
            Dec(CharCounter);
           
            Inc(LineCounter);
            if LineCounter = 299999 then
            begin
              // load the line
              if Head^ = #10 then Inc(Head);
              Tail := Head;
              // NOTE: here a buffer overrun should be checked too
              while Tail^ <> #13 do Inc(Tail);
              SetString(S, Head, Tail - Head);
              Break;
            end;
          end;
        until CharCounter = 0;
      end;
    end;
    Label2.Caption := Format('%d ms', [timeGetTime - Start]);
    Stream.Free;
  finally
    Screen.Cursor := crDefault;
    ShowMessage(S);
  end;
end;

procedure TForm1.Button3Click(Sender: TObject);

var
  Start: Cardinal;
  S: String;
  LineCounter,
  CharCounter: Cardinal;
  Head, Tail: PChar;
  FileHandle: THandle;
 
begin
  Screen.Cursor := crHourGlass;
  S := 'nothing found';
  try
    // note: access speed could still be improved by using the FILE_FLAG_NO_BUFFERING flag, but this requires
    //       a buffer sized and aligned to the current disk sector size.
    FileHandle := CreateFile(FileName, GENERIC_READ, FILE_SHARE_READ, nil, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);
    LineCounter := 0;
    Start := timeGetTime;
    begin
      while True do
      begin
        ReadFile(FileHandle, Buffer, SizeOf(Buffer), CharCounter, nil);
        if CharCounter = 0 then Break;
        Head := @Buffer;
        repeat
          while (CharCounter > 0) and (Head^ <> #13) do
          begin
            Inc(Head);
            Dec(CharCounter);
          end;

          if CharCounter > 0 then
          begin
            Inc(Head);
            Dec(CharCounter);
           
            Inc(LineCounter);
            if LineCounter = 299999 then
            begin
              // load the line
              if Head^ = #10 then Inc(Head);
              Tail := Head;
              // NOTE: here a buffer overrun should be checked too
              while Tail^ <> #13 do Inc(Tail);
              SetString(S, Head, Tail - Head);
              Break;
            end;
          end;
        until CharCounter = 0;
      end;
    end;
    Label4.Caption := Format('%d ms', [timeGetTime - Start]);
    CloseHandle(FileHandle);
  finally
    Screen.Cursor := crDefault;
    ShowMessage(S);
  end;
end;

procedure TForm1.Button4Click(Sender: TObject);

var
  Start: Cardinal;
  S: String;
  LineCounter,
  CharCounter: Cardinal; // Int64 for files >= 4GB
  Base,
  Head, Tail: PChar;
  FileHandle: THandle;
  FileMapping: THandle;

begin
  Screen.Cursor := crHourGlass;
  S := 'nothing found';
  try
    FileHandle := CreateFile(FileName, GENERIC_READ, FILE_SHARE_READ, nil, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);
    FileMapping := CreateFileMapping(FileHandle, nil, PAGE_READONLY, 0, 0, nil);
    CharCounter := GetFileSize(FileHandle, nil);
    // map entire file content into address space
    Base := MapViewOfFile(FileMapping, FILE_MAP_READ, 0, 0, 0);
    LineCounter := 0;
    Start := timeGetTime;
    begin
      Head := Base;
      while CharCounter > 0 do
      begin
        while (CharCounter > 0) and (Head^ <> #13) do
        begin
          Inc(Head);
          Dec(CharCounter);
        end;

        if CharCounter > 0 then
        begin
          Inc(Head);
          Dec(CharCounter);

          Inc(LineCounter);
          if LineCounter = 299999 then
          begin
            // load the line
            if Head^ = #10 then Inc(Head);
            Tail := Head;
            // NOTE: here a buffer overrun should be checked too
            while Tail^ <> #13 do Inc(Tail);
            SetString(S, Head, Tail - Head);
            Break;
          end;
        end;
      end;
      UnmapViewOfFile(Base);
    end;
    Label5.Caption := Format('%d ms', [timeGetTime - Start]);
    CloseHandle(FileMapping);
    CloseHandle(FileHandle);
  finally
    Screen.Cursor := crDefault;
    ShowMessage(S);
  end;
end;

end.




Ciao, Mike
 
Madshi Commented:
Well, that looks like being worth a grade A (and perhaps even a point boost)...   :-))

Regards, Madshi.
 
AJFleming Commented:
Nice code Mike, I hope you don't mind if I shamelessly lift some of it for my own purposes :)

Any idea what the difference in speed is between them?

cheers,

Adam...
 
AJFleming Commented:
Sorry, should have said - "what was the difference in speed in your system?" I'm away from my Delphi compiler at the moment so I can't test it...

cheers,

Adam...
 
Lischke Commented:
:-) thank you guys...

Adam, I'm not sure why you ask about the speed difference. I have included the results I got in the text above. See there!

Ciao, Mike
 
Lischke Commented:
Johan, are you still with us?
 
sageryd (Author) Commented:
Comment accepted as answer
 
sageryd (Author) Commented:
Yep, I'm with ya! I've just had so much to do this week. It was my last week in high school, and I had to work through that pile of homework before I could deal with anything else. But now I'm done with it! I feel soooo relieved! Lischke, you've done a very good job! You'll get the points! Thanks everyone else too!

--johan