sageryd asked:

Tampering with very large textfiles...

I have a text file of approx. 120,000 lines. How can I put the nth line into a string?

--johan
God_Ares

Perhaps something like this?

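procedure TForm1.Button1Click(Sender: TObject);
var
  s: string;
  text: TStringList;
begin
  text := TStringList.Create;
  try
    text.LoadFromFile('filename.txt');
    // nr_line is the zero-based index of the wanted line
    s := text.Strings[nr_line];
  finally
    // free the list even if loading or indexing raises an exception
    text.Free;
  end;
end;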
Or, if you don't want to keep the entire file in RAM, you could try ReadLn'ing through to the nth line. From the help file:

Description

The Readln procedure reads a line of text and then skips to the next line of the file.
Readln(F) with no parameters causes the current file position to advance to the beginning of the next line if there is one; otherwise, it goes to the end of the file.

hence you could try something like:

ix := 1;
// skip the first n - 1 lines
while (not Eof(F)) and (ix < n) do
begin
  ReadLn(F);
  Inc(ix);
end;
if (not Eof(F)) and (ix = n) then
  ReadLn(F, my_string)
else
  MessageDlg('File contains fewer than N lines', mtError, [mbOk], 0);


GL
Mike
This was from another similar question:

var
  linenumber: array[1..maxlines] of Cardinal;
  s: string;
  c: Byte;
  f: file of Byte;
begin
  s := '';
  AssignFile(f, 'c:\mylargefile.txt');
  Reset(f);
  Seek(f, linenumber[1245]);
  // collect bytes up to (not including) the carriage return
  while not Eof(f) do
  begin
    Read(f, c);
    if c = 13 then
      Break;
    s := s + Chr(c);
  end;
  CloseFile(f);
end;
sageryd (ASKER):

Gor_Ares, your solution does not work at all. I have already tried it a couple of times; a TStringList or any other kind of list is too small to handle such large text files.

Edey, I tried your solution before too. It gets very slow if you have to read, let's say, 100,000 lines before being able to read the 100,001st.

Barry, I think your solution is the best and the easiest. I haven't tried anything yet, but it looks promising. I'll get back to you as soon as I have the time to take a look at it.

Thanks everyone!

--johan
sageryd (ASKER):

God_ares, sorry for the misspelling of your nick, might have sounded a little bit unpleasant.."Gore_ares" ;) Cheers!
Ehm, Barry, who fills the linenumber array? As far as I understand that code, Seek is called with a linenumber that was not initialized, or am I missing something?

Regards, Madshi.
Madshi, this is what I wondered about too. I guess the array is filled with positions of the line starts in the file. This would need a specially prepared file or a way to get to this information otherwise.

sageryd,

what you said about TStringList is definitely not true. I have used it (and a rewritten variant for wide strings) to hold a file of one million lines (it needs a lot of memory, though).

If you want a solution that is both fast AND memory-efficient, I recommend that you look at memory-mapped files. Map the file to memory and iterate through the line breaks (via simple memory pointers).

Ciao, Mike
sageryd (ASKER):

OK, Mike, maybe I was wrong, but what I really meant was that it wouldn't work with a list because of the memory needed. I don't want the user's RAM to get filled up! Maybe you can give an example of what you mean by those memory pointers.

--johan
Well, I don't have ready to use code but the idea is:

1) create a file mapping for the text file:

var
  TextFile: THandle;
  MapHandle: THandle;

begin
  TextFile := OpenFile(...);
  MapHandle := CreateFileMapping(TextFile, nil, PAGE_READONLY, 0, 0, nil);
  :
end;

2) create views of this mapping in your memory and search/count there:

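const
  // View offsets must be a multiple of the system allocation granularity
  // (typically 64 KB; see GetSystemInfo), so 4096 would fail for Offset > 0.
  MapSize = 65536;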

var
  Data,
  Run: PChar;
  Offset: Cardinal;
  Done: Boolean;

begin
  Offset := 0;
  Done := False;
  while not Done do
  begin
    Data := MapViewOfFile(MapHandle, FILE_MAP_READ, 0, Offset, MapSize);
    // Data points now to raw string (file) data, start searching the block
    Run := Data;
    while (Run - Data) < MapSize do
    begin
      if Run^ = #13 then
      begin
        Inc(LineCount);
        if LineCount = WantedLine then
        begin
          Done := True;
          Break;
        end;
      end;
      Inc(Run);
    end;
    UnmapViewOfFile(Data);
    Inc(Offset, MapSize);
  end;
end;

3) clean up
  CloseHandle(MapHandle);
  CloseHandle(TextFile);
  etc.

This code is not complete but should give you most of the stuff you need.

Ciao, Mike
Ah yes, you of course also need to stop the loop when you have processed the entire file and there weren't as many lines as you expected.

Ciao, Mike
sageryd (ASKER):

ok...but isn't there a simpler way?
Yes, TStringList.LoadFromFile.

Ciao, Mike
((-:   Hehe Mike...   :-))     (You're absolutely right of course)
;-)
sageryd (ASKER):

Very funny.....

Wouldn't it be just as fast to do something like this:

var
  F: TextFile;
  LineNo: integer;
  S: string;
begin
  LineNo := 5067;
  AssignFile(F, 'filename');
  Reset(F);
  Seek(F, LineNo);
  Readln(F, S);
  {S = the text of the 5067:th line?}
end;

Does the above work as fast as the other? Does this work at all?

--johan
Howdy fellas,
Lischke, you were also right about
< the array is filled with positions of the line number >

This is correct ;-)
Fill the array with the positions/line numbers of the file:

array[1,34,124,etc]

Seek(f, linenumber[2]);
//to go to line 124

this would only really make sense if you only wanted to store the line numbers of about 50-100 misc lines to quickly jump to a line..otherwise the array could get a bit large :o)
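
A minimal sketch of how such an offset array could be filled, assuming single-byte text with LF or CR/LF line endings. The names `BuildLineIndex`, `ReadLineAt`, `TOffsetArray` and the demo file name are made up for this illustration and do not come from the thread. The file is scanned once, the byte offset of every line start is stored, and any line can then be reached with a single seek. The length of the array also gives the number of lines in the file.

```pascal
program LineIndexDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  TOffsetArray = array of Int64;  // hypothetical helper type

// Scans the file once and records the byte offset at which each line starts.
function BuildLineIndex(const FileName: string): TOffsetArray;
var
  Stream: TFileStream;
  Buf: array[0..65535] of Byte;
  BytesRead, I, Count: Integer;
  Offset: Int64;
begin
  Result := nil;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    if Stream.Size = 0 then
      Exit;
    SetLength(Result, 1024);
    Result[0] := 0;              // the first line starts at offset 0
    Count := 1;
    Offset := 0;
    repeat
      BytesRead := Stream.Read(Buf, SizeOf(Buf));
      for I := 0 to BytesRead - 1 do
      begin
        Inc(Offset);
        if Buf[I] = 10 then      // LF ends a line in both LF and CR/LF files
        begin
          if Count = Length(Result) then
            SetLength(Result, Count * 2);
          Result[Count] := Offset;   // the next line starts right after the LF
          Inc(Count);
        end;
      end;
    until BytesRead = 0;
    // A trailing line break does not start another line; drop that entry.
    if Result[Count - 1] >= Stream.Size then
      Dec(Count);
    SetLength(Result, Count);
  finally
    Stream.Free;
  end;
end;

// Seeks to a stored offset and reads bytes up to the end of that line.
function ReadLineAt(const FileName: string; Offset: Int64): string;
var
  Stream: TFileStream;
  B: Byte;
begin
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Stream.Position := Offset;
    while Stream.Read(B, 1) = 1 do
    begin
      if B = 10 then
        Break;
      if B <> 13 then
        Result := Result + Chr(B);
    end;
  finally
    Stream.Free;
  end;
end;

const
  DemoFile = 'lineindex_demo.txt';  // throwaway file created by this demo
var
  Sample: TStringList;
  LineStarts: TOffsetArray;
begin
  Sample := TStringList.Create;
  try
    Sample.Add('alpha');
    Sample.Add('beta');
    Sample.Add('gamma');
    Sample.SaveToFile(DemoFile);
  finally
    Sample.Free;
  end;

  LineStarts := BuildLineIndex(DemoFile);
  WriteLn('Lines in file: ', Length(LineStarts));
  WriteLn('Line 2: ', ReadLineAt(DemoFile, LineStarts[1]));  // index is 0-based
  DeleteFile(DemoFile);
end.
```

The index costs 8 bytes per line, so roughly 1 MB for 120,000 lines. It has to be rebuilt whenever the file changes.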
sageryd (ASKER):

I believe it will work just great for my purpose, I'm going to pick a random line number and display the text at that line. btw, how can I retrieve how many lines there are in the text file?

--johan
Johan, that's exactly the point. There's no way to find lines in a text file other than iterating through the content and counting the line breaks. That's what I am trying to make clear here. If there were a way to just seek to a particular line, why the heck would I bother you with memory-mapped files? The type TextFile is only a wrapper around a normal Windows file plus some extra handling like ReadLn (which also iterates through the content).

But you should not be distracted by the code I gave here. Actually, it is almost complete. You need only to add a little file open/close handling and link your own stuff into the search code. This is really not difficult.

Ciao, Mike
What if you made every line a fixed length? Then you could get to a line really fast!
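
A rough sketch of that fixed-length idea, assuming the file can be rewritten so that every line occupies exactly `RecordLen` bytes, padded with spaces. The width of 16, the `Fill` helper and the file name are arbitrary choices for this example. With a typed file, `Seek` counts in records rather than bytes, so jumping to a line is a single call.

```pascal
program FixedLengthDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

const
  RecordLen = 16;                  // assumed fixed line width in bytes
  DemoFile = 'fixed_demo.dat';     // throwaway file created by this demo

type
  TFixedLine = array[0..RecordLen - 1] of AnsiChar;

// Copies S into R, padding with spaces and truncating at RecordLen.
procedure Fill(var R: TFixedLine; const S: AnsiString);
var
  K: Integer;
begin
  FillChar(R, SizeOf(R), 32);
  for K := 1 to Length(S) do
    if K <= RecordLen then
      R[K - 1] := S[K];
end;

var
  F: file of TFixedLine;
  Rec: TFixedLine;
  S: AnsiString;
  I: Integer;
begin
  // Write five padded lines.
  AssignFile(F, DemoFile);
  Rewrite(F);
  for I := 1 to 5 do
  begin
    Fill(Rec, AnsiString('line ' + IntToStr(I)));
    Write(F, Rec);
  end;
  CloseFile(F);

  // Jump straight to the 4th line; record numbers are zero-based.
  Reset(F);
  Seek(F, 3);
  Read(F, Rec);
  CloseFile(F);

  SetString(S, PAnsiChar(@Rec[0]), RecordLen);
  WriteLn(TrimRight(string(S)));
  DeleteFile(DemoFile);
end.
```

The trade-off is wasted space for short lines and a hard limit on line length, and the file is no longer a plain text file that other tools can read line by line.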
I was playing around with this sort of thing a couple of years ago - trying to alter text in a 4Gb file and the main trick I found was to load as much as you can cope with into memory, look through it there, then dump it out to file again once you've done what you need to.

In your case, you're unfortunately going to have to look at every character in the file - at least until you get to the line you want. There are some cute tricks you can pull though.

If you allocate yourself a decent sized buffer - say about a couple of Meg - as an array of chars. Blockread to fill this buffer - or close to it if you're at the end of the file.

Scan through it counting CR/LF pairs until you get to the line you want. If you get through an entire buffer and you haven't gotten to your insertion point then just blockwrite it back out.

Once you get to your insertion point, blockwrite everything in the buffer up to that point, write out your inserted line, then blockwrite the rest of the buffer - then blockread/blockwrite until you get to the end of the file.

Obviously, this isn't the most elegant solution - but if you don't have any extra knowledge about the layout of the file, that's all you got.

Mike, I haven't used memory-mapped files myself, but I'd be really interested to see a comparison between this and a more brute-force method.

cheers,

Adam...
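
The read-only half of the buffered scan described above could look roughly like this. It is a sketch under assumptions: single-byte text, LF or CR/LF line endings, a 64 KB buffer instead of a couple of megabytes, and a made-up function name `GetLineBuffered`. It fills the buffer with BlockRead, counts line breaks, and collects the characters of the wanted line as it passes through them.

```pascal
program BufferedScanDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

// Returns True and the text of line WantedLine (1-based) if the file has it.
function GetLineBuffered(const FileName: string; WantedLine: Integer;
  out Line: string): Boolean;
var
  F: file;
  Buf: array[0..65535] of AnsiChar;
  BytesRead, I, Current: Integer;
begin
  Result := False;
  Line := '';
  Current := 1;
  AssignFile(F, FileName);
  Reset(F, 1);                       // untyped file, record size of 1 byte
  try
    repeat
      BlockRead(F, Buf, SizeOf(Buf), BytesRead);
      for I := 0 to BytesRead - 1 do
      begin
        if Buf[I] = #10 then
        begin
          if Current = WantedLine then
          begin
            Result := True;          // reached the end of the wanted line
            Exit;
          end;
          Inc(Current);
        end
        else if (Current = WantedLine) and (Buf[I] <> #13) then
          Line := Line + Char(Buf[I]);
      end;
    until BytesRead = 0;
    // The wanted line may be the last one, without a trailing line break.
    Result := (Current = WantedLine) and (Line <> '');
  finally
    CloseFile(F);
  end;
end;

const
  DemoFile = 'buffered_demo.txt';    // throwaway file created by this demo
var
  Sample: TStringList;
  Line: string;
begin
  Sample := TStringList.Create;
  try
    Sample.Add('first');
    Sample.Add('second');
    Sample.Add('third');
    Sample.SaveToFile(DemoFile);
  finally
    Sample.Free;
  end;

  if GetLineBuffered(DemoFile, 2, Line) then
    WriteLn('Line 2: ', Line)
  else
    WriteLn('File has fewer than 2 lines');
  DeleteFile(DemoFile);
end.
```

Memory use stays at the buffer size no matter how large the file is, but every lookup still reads the file from the start up to the wanted line.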
I agree with Mike here. Memory mapped files should be the way to go, because Windows then does all the rest for you. And I'm quite sure that Windows will do it in the most performance optimized way. But I agree with Adam, I would like to see a benchmark...

Regards, Madshi.
I won't have the time this very day, but will give you some results tomorrow.

Ciao, Mike
ASKER CERTIFIED SOLUTION
Lischke

[The accepted solution is available only to Experts Exchange members and is not included here.]
Well, that looks like being worth a grade A (and perhaps even a point boost)...   :-))

Regards, Madshi.
Nice code Mike, I hope you don't mind if I shamelessly lift some of it for my own purposes :)

Any idea what the difference in speed is between them?

cheers,

Adam...
Sorry, should have said - "what was the difference in speed in your system?" I'm away from my Delphi compiler at the moment so I can't test it...

cheers,

Adam...
:-) thank you guys...

Adam, I'm not sure why you ask about the speed difference. I have included the results I got in the text above. See there!

Ciao, Mike
Johan, are you still with us?
sageryd (ASKER):

Comment accepted as answer
sageryd (ASKER):

Yep, I'm with ya! I've just had so much to do this week: last week in high school, and I had to work through that pile of homework before I could deal with anything else. But now I'm done with it! I feel soooo relieved! Lischke, you've done a very good job! You'll get the points! Thanks everyone else too!

--johan