crap in textfile

Hello,

I'm reading a text file like this:

(global) var lessX:integer;

var f:Textfile;
    s:string;

begin
AssignFile(f,'c:\test1.txt');
Reset(f);
while not eof(f) do
  begin
   readln(f,s);
   if s<'100' then inc(lessX);
  end;
CloseFile(s);
end;

Sometimes the text file contains some crap, and the while
loop ends before the real end of file is reached.
Is there a way to avoid this early exit?
The stray characters range from #0 to #255.

regards
benni
 
ow commented:
Hi benni,

you have to scan for the strings like this:

  var
    F :file of char;
    C :char;
    S :string;
  begin
  AssignFile(F, TEXT_FILE);
  Reset(F);
  {Initialize S}
  S := '';
  while not EoF(F) do
    begin
    {Read one character}
    Read(F, C);
    if (C <> #13) then
      S := S + C
    else
      begin
      {Here you do with S what you want...}
      {...}
      ListBox1.Items.Add(S);
      {Reinitialize S}
      S := '';
      {Overread linefeed}
      Read(F, C);
      end;
    end;
  CloseFile(F);
  end;

Regards
  ow
 
BlackDeath commented:
could you mail me such a crappy file?
my email address can be found in my profile.

regs,
Black Death.
 
rene100 commented:
you can try it with a TMemoryStream and the
method LoadFromFile(FileName).
perhaps this works

regards
rene
 
benni (author) commented:
Black Death
hmm the files are 40 MB or bigger ...

Rene
how do I extract strings from this method ?
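
One possible way to pull lines out of a TMemoryStream, as rene suggests, is to load the file and scan the bytes for CR+LF pairs. The following is only a sketch under assumptions: the helper name CountLinesBelow100 is made up for illustration, lines are assumed to end in CR+LF, and the whole file (40 MB or more here) is held in memory at once.

```pascal
program MemStreamLines;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

// Hypothetical helper: counts the CRLF-terminated lines in FileName
// that compare less than '100' (the asker's original string test).
// Text after the last CRLF is ignored in this sketch.
function CountLinesBelow100(const FileName: string): Integer;
var
  MS: TMemoryStream;
  P: PAnsiChar;
  Start, I, Size: Integer;
  Line: AnsiString;
begin
  Result := 0;
  MS := TMemoryStream.Create;
  try
    MS.LoadFromFile(FileName);
    P := PAnsiChar(MS.Memory);
    Size := MS.Size;
    Start := 0;
    for I := 1 to Size - 1 do
      if (P[I - 1] = #13) and (P[I] = #10) then
      begin
        // Copy the bytes between the previous line end and this CR.
        SetString(Line, P + Start, I - 1 - Start);
        if Line < '100' then
          Inc(Result);
        Start := I + 1;
      end;
  finally
    MS.Free;
  end;
end;

begin
  // Usage: MemStreamLines <file name>
  if ParamCount = 1 then
    WriteLn(CountLinesBelow100(ParamStr(1)));
end.
```

Because the bytes are scanned directly, characters such as #0 or #26 inside a line are simply copied into the string instead of ending the read.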

 
erajoj commented:
Hi,
Can you try the code below and tell us whether the result (in the message box) equals the file size? Then I will know whether to put any energy into this.

var
  f: File;
  p: pointer;
  iSize, cRead, cTotal: Integer;
begin
  AssignFile( f, 'c:\test1.txt' );
  FileMode := 0;
  Reset( f, 1 );
  iSize := 1 shl 16; // 64kB
  cTotal := 0;
  GetMem( p, iSize );
  repeat
    BlockRead( f, p^, iSize, cRead );
    Inc( cTotal, cRead );
  until ( cRead<>iSize );
  FreeMem( p );
  CloseFile( f ); // not 's'!
  ShowMessage( IntToStr( cTotal ) + ' bytes read.' );
end;

/// John
 
erajoj commented:
Hi again,
What would happen if the line contains '0123', '+123' or ' 123'?
Wouldn't that miscount "lessX"?
It seems so, since all three strings compare less than '100' because of
the position of their first character in the ASCII charset.

/// John
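
To make John's point concrete, here is a small sketch that contrasts the string comparison with a numeric one. The helper IsLessThan is a made-up name, and treating lines that do not parse as integers as "not less" is an assumption; the original code would count or skip them depending on their first characters.

```pascal
program CompareDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: True if S holds an integer smaller than Limit.
// Lines that are not valid integers (for example garbage) count as "not less".
function IsLessThan(const S: string; Limit: Integer): Boolean;
var
  Value, Code: Integer;
begin
  // Val sets Code to 0 only when the whole string parsed as a number.
  Val(Trim(S), Value, Code);
  Result := (Code = 0) and (Value < Limit);
end;

begin
  // String comparison goes character by character, so '0123' sorts
  // before '100' even though 123 is the larger number.
  WriteLn('0123' < '100');
  // Numeric comparison treats the same line as 123, which is not below 100.
  WriteLn(IsLessThan('0123', 100));
  // A padded value such as ' 99' is trimmed first and then compared as 99.
  WriteLn(IsLessThan(' 99', 100));
end.
```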
 
ow commented:
Please delete the line
 ListBox1.Items.Add(S);
from the code (it's left over from another use).

ow
 
erajoj commented:
Hi,
Yes, that answer really provides a fast solution for large files and is so much better than Borland's own "ReadLn" implementation: reading one char at a time!!! ;-(
Will the answer work better than the original code if there are stray EOLs in the file? ...NO, it won't!
Please try my code example first if you want a serious solution to the problem.

Typical solution from a sysadmin, just scratching the surface! ;-)

/// John
 
BlackDeath commented:
ouch - circus maximus ?
>:->

benni - 40mb _zipped_ ?

Black Death.
 
ow commented:
Hi Benni, hi John!

The described solution works better than Borland Pascal's ReadLn because ReadLn operates on type "text", whereas my code uses a typed "file". It therefore reads the whole file and does not terminate on char(26).
But you are right, I forgot about single CRs.
To keep a single CR from being treated as a line end, we have to check two characters.
With my simple example I wanted to show that it is necessary to look at every char to determine the line ends.
And of course the following code is much faster:

  type
    tCardinalArray = array[0..High(integer) div 2] of char;
  var
    FileStream :tFileStream;
    FileSize   :integer;
    Buffer     :^tCardinalArray;
    Index      :integer;
    LastIndex  :integer;
    C, C1      :char;
    Count      :integer;
    S          :string;
  begin
  FileStream := tFileStream.Create(TEXT_FILE, fmOpenRead);
  FileSize := FileStream.Size;
  Buffer := AllocMem(FileSize);
  FileStream.ReadBuffer(Buffer^, FileSize);
  FileStream.Free;
  {Initialize S}
  S := '';
  C1 := ' ';
  LastIndex := 0;
  for Index := 0 to FileSize - 1 do
    begin
    {Read one character}
    C := Buffer^[Index];
    {Test for CRLF}
    if (C1 = #13) and (C = #10) then
      begin
      Count := Index - LastIndex - 1;
      SetLength(S, Count);
      {Guard: S[1] is invalid when the line is empty}
      if (Count > 0) then
        Move(Buffer^[LastIndex], S[1], Count);
      LastIndex := Index + 1;
      {Here you may do with S what you want...}
      {...}
      end;
    {Remember C for next loop}
    C1 := C;
    end;
  FreeMem(Buffer);  
  end;

regards
  ow
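
ow's remark about char(26) can be checked with a small experiment. The sketch below writes a scratch file (the name ctrlz_demo.txt is made up) containing an embedded #26, then counts how many lines the text-file routines deliver. The file holds three CR+LF line ends, so a ReadLn loop that stops at the #26 reports fewer than three lines. Whether that happens depends on the compiler and runtime, so no particular output is claimed here.

```pascal
program CtrlZDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

const
  TEST_FILE = 'ctrlz_demo.txt';  // hypothetical scratch file name
var
  Content: AnsiString;
  FS: TFileStream;
  T: TextFile;
  Line: string;
  LineCount: Integer;
begin
  // Three CRLF-terminated lines, with a #26 (Ctrl-Z) inside the second one.
  Content := 'one'#13#10'two'#26'three'#13#10'four'#13#10;

  // Write the raw bytes through a stream so nothing gets translated.
  FS := TFileStream.Create(TEST_FILE, fmCreate);
  try
    FS.WriteBuffer(Content[1], Length(Content));
    WriteLn('Bytes in file: ', FS.Size);
  finally
    FS.Free;
  end;

  // Read it back as a text file and count the lines ReadLn delivers.
  LineCount := 0;
  AssignFile(T, TEST_FILE);
  Reset(T);
  try
    while not Eof(T) do
    begin
      ReadLn(T, Line);
      Inc(LineCount);
    end;
  finally
    CloseFile(T);
  end;
  WriteLn('Lines seen by ReadLn: ', LineCount);
end.
```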
 
ow commented:
Hi Benni,

I remembered that you want to work on very large files.
So if you don't have enough RAM to read the whole file in, here is another version, which uses only a small buffer.
It is almost as fast as the version above (10 MB in 5 s on a P90).

  const
    MAX_BUF_SIZE = $FFF;
  var
    FileStream :tFileStream;
    FileSize   :integer;
    Buffer     :pByteArray;
    BufSize    :integer;
    Index      :integer;
    LastIndex  :integer;
    C, C1      :char;
    S          :string;

  procedure FromBufToS(BufPos :integer);
    var
      Count :integer;
      SLen  :integer;
    begin
    Count := BufPos - LastIndex;
    if (Count > 0) then
      begin
      SLen := Length(S);
      SetLength(S, SLen + Count);
      Move(Buffer^[LastIndex], S[SLen + 1], Count);
      end;
    end;

  begin
  BufSize := MAX_BUF_SIZE;
  GetMem(Buffer, BufSize);
  FileStream := tFileStream.Create(TEXT_FILE, fmOpenRead);
  FileSize := FileStream.Size;
  {Initialize C1, S}
  C1 := ' ';
  S := '';
  while (FileSize > 0) do
    begin
    if (FileSize < BufSize) then
      BufSize := FileSize;
    FileStream.ReadBuffer(Buffer^, BufSize);
    Dec(FileSize, BufSize);
    LastIndex := 0;
    for Index := 0 to BufSize - 1 do
      begin
      {Read one character}
      C := char(Buffer^[Index]);
      {Test for CRLF}
      if (C1 = #13) and (C = #10) then
        begin
        if (Index = 0) then
          {Remove CR }
          Delete(S, Length(S), 1)
        else
          {Move chars to S, exclude CRLF}
          FromBufToS(Index - 1);
        LastIndex := Index + 1;
        {Here you do with S what you want...}
        {...}
        {Reset S}
        S := '';
        end;
      {Remember C for next loop}
      C1 := C;
      end;
    {Move Buffer to S}
    FromBufToS(BufSize);
    end;
  FreeMem(Buffer);
  end;


regards
  ow
 
benni (author) commented:
thanks ow - it seems to work ...

btw: your method is about 5 % slower than the readln; it seems
that my implementation of your source has a little more overhead - but don't worry, time doesn't matter at this point :-) !

thx again

egono
 
benni (author) commented:
For everyone else: I accepted ow's last comment, not his original answer!
Question has a verified solution.