Solved

crap in textfile

Posted on 1998-10-05
Last Modified: 2010-05-19
Hello,

I'm reading a text file like this:

(global) var lessX:integer;

var f:Textfile;
    s:string;

begin
AssignFile(f,'c:\test1.txt');
Reset(f);
while not eof(f) do
  begin
   readln(f,s);
   if s<'100' then inc(lessX);
  end;
CloseFile(s);
end;

Sometimes the text file contains some crap, and the while
loop ends before the real end of file is reached.
Is there a way to avoid this early exit?
The stray characters can be anything from #0 to #255.

regards
Question by:benni
13 Comments
 
LVL 1

Expert Comment

by:BlackDeath
ID: 1341799
could you mail me such a crappy file?
my email address can be found in my profile.

regs,
Black Death.
0
 
LVL 2

Expert Comment

by:rene100
ID: 1341800
you can try it with a TMemoryStream and the
method LoadFromFile(FileName).
perhaps this works

regards
rene
0
 

Author Comment

by:benni
ID: 1341801
Black Death
hmm the files are 40 MB or bigger ...

Rene
how do I extract strings from this method ?
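
One possible way to get strings out of a TMemoryStream is to walk its Memory pointer and cut at each CRLF pair. The sketch below assumes the whole file fits in RAM (for a 40 MB file that means the stream plus the extracted lines) and that lines end in CRLF; ExtractLines is a hypothetical helper name, not part of the VCL. Bytes such as #0 or #26 are treated as ordinary data.

```pascal
{$ifdef FPC}{$mode delphi}{$endif}
uses
  SysUtils, Classes;

{ Hypothetical helper: copies every CRLF-terminated line, plus a final
  unterminated line if there is one, from Stream into Lines. }
procedure ExtractLines(Stream: TMemoryStream; Lines: TStrings);
var
  P: PAnsiChar;
  Size, Start, I: Integer;
  S: AnsiString;
begin
  P := PAnsiChar(Stream.Memory);
  Size := Stream.Size;
  Start := 0;
  I := 0;
  while I < Size do
  begin
    { Only a CR that is directly followed by an LF ends a line. }
    if (P[I] = #13) and (I + 1 < Size) and (P[I + 1] = #10) then
    begin
      SetString(S, P + Start, I - Start);
      Lines.Add(S);
      Inc(I, 2);
      Start := I;
    end
    else
      Inc(I);
  end;
  { Whatever follows the last CRLF is reported as a final line. }
  if Start < Size then
  begin
    SetString(S, P + Start, Size - Start);
    Lines.Add(S);
  end;
end;
```

After MS.LoadFromFile('c:\test1.txt'), a call such as ExtractLines(MS, MyStringList) would fill the list. If memory is tight, the comparison could be done inside the loop instead of storing every line.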


0
 
LVL 4

Expert Comment

by:erajoj
ID: 1341802
Hi,
Can you try the code below and tell us whether the result (in the message box) equals the file size? Then I'll know whether it is worth putting more energy into this.

var
  f: File;
  p: pointer;
  iSize, cRead, cTotal: Integer;
begin
  AssignFile( f, 'c:\test1.txt' );
  FileMode := 0;
  Reset( f, 1 );
  iSize := 1 shl 16; // 64kB
  cTotal := 0;
  GetMem( p, iSize );
  repeat
    BlockRead( f, p^, iSize, cRead );
    Inc( cTotal, cRead );
  until ( cRead<>iSize );
  FreeMem( p );
  CloseFile( f ); // not 's'!
  ShowMessage( IntToStr( cTotal ) + ' bytes read.' ); // cTotal is the declared counter
end;

/// John
0
 
LVL 4

Expert Comment

by:erajoj
ID: 1341803
Hi again,
What would happen if the line contains '0123', '+123' or ' 123'?
Would "lessX" be miscounted?
It seems so, since all three strings above compare as less than '100' because of
the position of their first character in the ASCII charset.

/// John
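
One way around the string-comparison pitfall described above is to convert each line to a number before comparing. The following is only a sketch: IsNumberBelow is a hypothetical helper, and it assumes that lines which are not whole numbers should simply not be counted.

```pascal
{$ifdef FPC}{$mode delphi}{$endif}
uses
  SysUtils;

{ Hypothetical helper: True only if S, after trimming blanks, is a whole
  number smaller than Limit. Lines that are not numbers give False.
  Note that Val also accepts Pascal hex notation such as '$1F'. }
function IsNumberBelow(const S: string; Limit: Integer): Boolean;
var
  Value, ErrPos: Integer;
begin
  { ErrPos is 0 when the whole string was converted successfully. }
  Val(Trim(S), Value, ErrPos);
  Result := (ErrPos = 0) and (Value < Limit);
end;
```

With this, the test in the loop could read "if IsNumberBelow(s, 100) then inc(lessX);". The lines '0123', '+123' and ' 123' are then no longer counted, while '99' still is.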
0
 
LVL 1

Accepted Solution

by:
ow earned 100 total points
ID: 1341804
Hi benni,

you have to scan for the strings like this:

  var
    F :file of char;
    C :char;
    S :string;
  begin
  AssignFile(F, TEXT_FILE);
  Reset(F);
  {Initialize S}
  S := '';
  while not EoF(F) do
    begin
    {Read one character}
    Read(F, C);
    if (C <> #13) then
      S := S + C
    else
      begin
      {Here you do with S what you want...}
      {...}
      ListBox1.Items.Add(S);
      {Reinitialize S}
      S := '';
      {Skip the linefeed, if there is one}
      if not EoF(F) then
        Read(F, C);
      end;
    end;
  CloseFile(F);
  end;

Regards
  ow
0
 
LVL 1

Expert Comment

by:ow
ID: 1341805
Please delete the line
 ListBox1.Items.Add(S);
from the code (it's left over from another use).

ow
0
 
LVL 4

Expert Comment

by:erajoj
ID: 1341806
Hi,
Yes, that answer really provides a fast solution for large files and is sooo much better than Borland's own "ReadLn" implementation, reading one char at a time!!! ;-(
Will the answer work better than the original code if there are stray EOLs in the file? ...NO, it won't!
Please try my code example first, if you want a serious solution to the problem.

Typical solution from a sysadmin, scraping off the surface! ;-)

/// John
0
 
LVL 1

Expert Comment

by:BlackDeath
ID: 1341807
ouch - circus maximus ?
>:->

benni - 40mb _zipped_ ?

Black Death.
0
 
LVL 1

Expert Comment

by:ow
ID: 1341808
Hi Benni, hi John!

The described solution will work better than Borland Pascal's ReadLn, because ReadLn works on type "text" while my code uses type "file". Therefore my code reads the whole file and does not stop at char(26).
But you are right, I forgot about single CRs.
To prevent single CRs from being treated as line ends, we have to check two characters.
With my simple example I wanted to show that it's necessary to look at every char to determine the line ends.
And of course the following code is much faster:

  type
    {An array of char, so name it accordingly}
    tCharArray = array[0..High(integer) div 2] of char;
  var
    FileStream :tFileStream;
    FileSize   :integer;
    Buffer     :^tCharArray;
    Index      :integer;
    LastIndex  :integer;
    C, C1      :char;
    Count      :integer;
    S          :string;
  begin
  FileStream := tFileStream.Create(TEXT_FILE, fmOpenRead);
  FileSize := FileStream.Size;
  Buffer := AllocMem(FileSize);
  FileStream.ReadBuffer(Buffer^, FileSize);
  FileStream.Free;
  {Initialize S}
  S := '';
  C1 := ' ';
  LastIndex := 0;
  for Index := 0 to FileSize - 1 do
    begin
    {Read one character}
    C := Buffer^[Index];
    {Test for CRLF}
    if (C1 = #13) and (C = #10) then
      begin
      Count := Index - LastIndex - 1;
      SetLength(S, Count);
      {S[1] does not exist for an empty line, so only copy when Count > 0}
      if (Count > 0) then
        Move(Buffer^[LastIndex], S[1], Count);
      LastIndex := Index + 1;
      {Here you may do with S what you want...}
      {...}
      end;
    {Remember C for next loop}
    C1 := C;
    end;
  FreeMem(Buffer);  
  end;

regards
  ow
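
To check whether a char(26) byte really is what ends the ReadLn loop early, one option is to look for the first such byte and compare its offset with the number of bytes ReadLn got through. This is a diagnostic sketch only; FindFirstByte is a hypothetical helper, and it returns an offset relative to the position the stream had when it was called.

```pascal
{$ifdef FPC}{$mode delphi}{$endif}
uses
  SysUtils, Classes;

{ Hypothetical helper: returns the zero-based offset of the first byte
  equal to Target, counted from the stream's current position, or -1 if
  no such byte exists. The stream is read in 64 KB chunks. }
function FindFirstByte(Stream: TStream; Target: Byte): Integer;
var
  Buf: array[0..65535] of Byte;
  N, I, Base: Integer;
begin
  Result := -1;
  Base := 0;
  repeat
    { Read returns the number of bytes actually read, 0 at the end. }
    N := Stream.Read(Buf, SizeOf(Buf));
    for I := 0 to N - 1 do
      if Buf[I] = Target then
      begin
        Result := Base + I;
        Exit;
      end;
    Inc(Base, N);
  until N = 0;
end;
```

Called as FindFirstByte(FileStream, 26) on a freshly opened tFileStream, a result of -1 would suggest that char(26) is not the cause and that something else is cutting the loop short.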

0
 
LVL 1

Expert Comment

by:ow
ID: 1341809
Hi Benni,

I remembered that you want to work on very large files.
So if you don't have enough RAM to read the whole file in, here is another version, which uses only a small buffer.
It is almost as fast as the version above (10 MB in 5 s on a P90).

  const
    MAX_BUF_SIZE = $FFF;
  var
    FileStream :tFileStream;
    FileSize   :integer;
    Buffer     :pByteArray;
    BufSize    :integer;
    Index      :integer;
    LastIndex  :integer;
    C, C1      :char;
    S          :string;

  procedure FromBufToS(BufPos :integer);
    var
      Count :integer;
      SLen  :integer;
    begin
    Count := BufPos - LastIndex;
    if (Count > 0) then
      begin
      SLen := Length(S);
      SetLength(S, SLen + Count);
      Move(Buffer^[LastIndex], S[SLen + 1], Count);
      end;
    end;

  begin
  BufSize := MAX_BUF_SIZE;
  GetMem(Buffer, BufSize);
  FileStream := tFileStream.Create(TEXT_FILE, fmOpenRead);
  FileSize := FileStream.Size;
  {Initialize C1, S}
  C1 := ' ';
  S := '';
  while (FileSize > 0) do
    begin
    if (FileSize < BufSize) then
      BufSize := FileSize;
    FileStream.ReadBuffer(Buffer^, BufSize);
    Dec(FileSize, BufSize);
    LastIndex := 0;
    for Index := 0 to BufSize - 1 do
      begin
      {Read one character}
      C := char(Buffer^[Index]);
      {Test for CRLF}
      if (C1 = #13) and (C = #10) then
        begin
        if (Index = 0) then
          {Remove CR }
          Delete(S, Length(S), 1)
        else
          {Move chars to S, exclude CRLF}
          FromBufToS(Index - 1);
        LastIndex := Index + 1;
        {Here you do with S what you want...}
        {...}
        {Reset S}
        S := '';
        end;
      {Remember C for next loop}
      C1 := C;
      end;
    {Move Buffer to S}
    FromBufToS(BufSize);
    end;
  FreeMem(Buffer);
  end;


regards
  ow
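
One detail of the small-buffer approach above is that a last line without a closing CRLF stays in S after the loop and still has to be handled. A more compact variant of the same idea, which reports that final line as well, might look like the sketch below. TLineProc and ForEachLine are hypothetical names, BufSize is assumed to be greater than zero, and the string concatenation favours brevity over speed.

```pascal
{$ifdef FPC}{$mode delphi}{$endif}
uses
  SysUtils, Classes;

type
  { Hypothetical callback type: called once for every extracted line. }
  TLineProc = procedure(const Line: AnsiString);

{ Splits Stream into lines at CRLF, reading at most BufSize bytes at a
  time. A last line without CRLF is reported too. Single CRs or LFs
  stay inside the line as data. }
procedure ForEachLine(Stream: TStream; BufSize: Integer; OnLine: TLineProc);
var
  Buf, Pending: AnsiString;
  N, I, Start, Len: Integer;
begin
  SetLength(Buf, BufSize);
  Pending := '';
  repeat
    N := Stream.Read(Buf[1], BufSize);
    Start := 1;
    for I := 1 to N do
      if Buf[I] = #10 then
      begin
        { Append everything up to and including this LF. }
        Pending := Pending + Copy(Buf, Start, I - Start + 1);
        Start := I + 1;
        Len := Length(Pending);
        { The CR may have arrived at the end of the previous chunk,
          so test the collected text, not the buffer. }
        if (Len >= 2) and (Pending[Len - 1] = #13) then
        begin
          OnLine(Copy(Pending, 1, Len - 2));
          Pending := '';
        end;
      end;
    { Keep the unterminated rest of this chunk for the next round. }
    Pending := Pending + Copy(Buf, Start, N - Start + 1);
  until N = 0;
  if Pending <> '' then
    OnLine(Pending);
end;
```

The callback is the place for the "do with S what you want" step, for example the comparison that increments lessX.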

0
 

Author Comment

by:benni
ID: 1341810
thanks ow - seems that it works ...

btw: your method is about 5 % slower than the readln, seems
that my implementation of your source has a little bit more overhead - but don't worry, time doesn't matter at this point :-) !

thx again

egono

0
 

Author Comment

by:benni
ID: 1341811
for all the other boys and girls - I accepted ow's last comment and not his answer !!!

0
