Solved

crap in textfile

Posted on 1998-10-05
13
370 Views
Last Modified: 2010-05-19
Hallo,

I'm reading a text file like this:

(global) var lessX:integer;

var f:Textfile;
    s:string;

begin
AssignFile(f,'c:\test1.txt');
Reset(f);
while not eof(f) do
  begin
   readln(f,s);
   if s<'100' then inc(lessX);
  end;
CloseFile(s);
end;

Sometimes the textfile contains some crap and the while
loop breaks before the real EndOfFile is reached.
Is there a way to avoid this break ?
The mismatch chars reach from #0 to #255 .

regards
0
Comment
Question by:benni
  • 4
  • 3
  • 3
  • +2
13 Comments
 
LVL 1

Expert Comment

by:BlackDeath
Comment Utility
could you mail me such a crappy file?
my email address can be found in my profile.

regs,
Black Death.
0
 
LVL 2

Expert Comment

by:rene100
Comment Utility
you can try it with a TMemoryStream and the
method LoadFromFile(FileName).
perhaps this works

regards
rene
0
 

Author Comment

by:benni
Comment Utility
Black Death
hmm the files are 40 MB or bigger ...

Rene
how do I extract strings from this method ?


0
 
LVL 4

Expert Comment

by:erajoj
Comment Utility
Hi,
Can you try the code below and tell us if the result (in the messagebox) is equal to the file size? so I know whether to put any energy into this.

var
  f: File;
  p: pointer;
  iSize, cRead, cTotal: Integer;
begin
  AssignFile( f, 'c:\test1.txt' );
  FileMode := 0;
  Reset( f, 1 );
  iSize := 1 shl 16; // 64kB
  cTotal := 0;
  GetMem( p, iSize );
  repeat
    BlockRead( f, p^, iSize, cRead );
    Inc( cTotal, cRead );
  until ( cRead<>iSize );
  FreeMem( p );
  CloseFile( f ); // not 's'!
  ShowMessage( IntToStr( iTotal ) + ' bytes read.' );
end;

/// John
0
 
LVL 4

Expert Comment

by:erajoj
Comment Utility
Hi again,
What would happen if the line contains '0123', '+123' or ' 123'???
Would you do a miscalculation of "lessX"??
It seems so, since both strings above are less than '100' due to
their first characters position in the ASCII charset.

/// John
0
 
LVL 1

Accepted Solution

by:
ow earned 100 total points
Comment Utility
Hi benni,

you have to scan for the strings like this:

  var
    F :file of char;
    C :char;
    S :string;
  begin
  AssignFile(F, TEXT_FILE);
  Reset(F);
  {Initialize S}
  S := '';
  while not EoF(F) do
    begin
    {Read one character}
    Read(F, C);
    if (C <> #13) then
      S := S + C
    else
      begin
      {Here you do with S what you want...}
      {...}
      ListBox1.Items.Add(S);
      {Reinitialize S}
      S := '';
      {Overread linefeed}
      Read(F, C);
      end;
    end;
  CloseFile(F);
  end;

Regards
  ow
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 1

Expert Comment

by:ow
Comment Utility
Please delete the line
 ListBox1.Items.Add(S);
from the code (its from another use).

ow
0
 
LVL 4

Expert Comment

by:erajoj
Comment Utility
Hi,
Yes, that answer really provides a fast solution for large files and is soo much better than Borlands own "ReadLn" implementation, reading one char at a time!!! ;-(
Will the answer work better than the original code if there are stray EOL's in the file? ...NO, it won't!
Please try my code example first, if you want a serious solution to the problem.

Typical solution from a sysadmin, scraping off the surface! ;-)

/// John
0
 
LVL 1

Expert Comment

by:BlackDeath
Comment Utility
outch - circus maximus ?
>:->

benni - 40mb _zipped_ ?

Black Death.
0
 
LVL 1

Expert Comment

by:ow
Comment Utility
Hi Benni, hi John!

The described solution will work better than Borlands Pascal, cause it uses type "text" and not "file". Therefore it will read the whole file and not terminate on char(26).
But you are right, I have forgotten single CRs.
To prevent that single CRs are seen as lineends, we have to check two characters.
With my simple example I wanted to show, that it's necessary to look at every char to determine the lineends.
And of course the following code is much more faster:

  type
    tCardinalArray = array[0..High(integer) div 2] of char;
  var
    FileStream :tFileStream;
    FileSize   :integer;
    Buffer     :^tCardinalArray;
    Index      :integer;
    LastIndex  :integer;
    C, C1      :char;
    Count      :integer;
    S          :string;
  begin
  FileStream := tFileStream.Create(TEXT_FILE, fmOpenRead);
  FileSize := FileStream.Size;
  Buffer := AllocMem(FileSize);
  FileStream.ReadBuffer(Buffer^, FileSize);
  FileStream.Free;
  {Initialize S}
  S := '';
  C1 := ' ';
  LastIndex := 0;
  for Index := 0 to FileSize - 1 do
    begin
    {Read one character}
    C := Buffer^[Index];
    {Test for CRLF}
    if (C1 = #13) and (C = #10) then
      begin
      Count := Index - LastIndex - 1;
      SetLength(S, Count);
      Move(Buffer^[LastIndex], S[1], Count);
      LastIndex := Index + 1;
      {Here you may do with S what you want...}
      {...}
      end;
    {Remember C for next loop}
    C1 := C;
    end;
  FreeMem(Buffer);  
  end;

regards
  ow

0
 
LVL 1

Expert Comment

by:ow
Comment Utility
Hi Benni,

I remembered that you want to work on very large files.
So if you don't have enough RAM to read the whole file in, here is another version, which uses only a small buffer.
It is almost as fast as the version above (10 MB in 5 s on a P90).

  const
    MAX_BUF_SIZE = $FFF;
  var
    FileStream :tFileStream;
    FileSize   :integer;
    Buffer     :pByteArray;
    BufSize    :integer;
    Index      :integer;
    LastIndex  :integer;
    C, C1      :char;
    S          :string;

  procedure FromBufToS(BufPos :integer);
    var
      Count :integer;
      SLen  :integer;
    begin
    Count := BufPos - LastIndex;
    if (Count > 0) then
      begin
      SLen := Length(S);
      SetLength(S, SLen + Count);
      Move(Buffer^[LastIndex], S[SLen + 1], Count);
      end;
    end;

  begin
  BufSize := MAX_BUF_SIZE;;
  GetMem(Buffer, BufSize);
  FileStream := tFileStream.Create(TEXT_FILE, fmOpenRead);
  FileSize := FileStream.Size;
  {Initialize C1, S}
  C1 := ' ';
  S := '';
  while (FileSize > 0) do
    begin
    if (FileSize < BufSize) then
      BufSize := FileSize;
    FileStream.ReadBuffer(Buffer^, BufSize);
    Dec(FileSize, BufSize);
    LastIndex := 0;
    for Index := 0 to BufSize - 1 do
      begin
      {Read one character}
      C := char(Buffer^[Index]);
      {Test for CRLF}
      if (C1 = #13) and (C = #10) then
        begin
        if (Index = 0) then
          {Remove CR }
          Delete(S, Length(S), 1)
        else
          {Move chars to S, exclude CRLF}
          FromBufToS(Index - 1);
        LastIndex := Index + 1;
        {Here you do with S what you want...}
        {...}
        {Reset S}
        S := '';
        end;
      {Remember C for next loop}
      C1 := C;
      end;
    {Move Buffer to S}
    FromBufToS(BufSize);
    end;
  FreeMem(Buffer);
  end;


regards
  ow

0
 

Author Comment

by:benni
Comment Utility
thanks ow - seems that it works ...

btw: your method is about 5 % slower than the readln, seems
that my implementation of your source has a little bit more overhead - but dont worry, time dosnt matter at this point :-) !

thx again

egono

0
 

Author Comment

by:benni
Comment Utility
for all the other boys and girls - I accepted ow's last comment and not his answer !!!

0

Featured Post

What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

The uses clause is one of those things that just tends to grow and grow. Most of the time this is in the main form, as it's from this form that all others are called. If you have a big application (including many forms), the uses clause in the in…
Introduction The parallel port is a very commonly known port, it was widely used to connect a printer to the PC, if you look at the back of your computer, for those who don't have newer computers, there will be a port with 25 pins and a small print…
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now