Solved

crap in textfile

Posted on 1998-10-05
Last Modified: 2010-05-19
Hello,

I'm reading a text file like this:

(global) var lessX:integer;

var f:Textfile;
    s:string;

begin
AssignFile(f,'c:\test1.txt');
Reset(f);
while not eof(f) do
  begin
   readln(f,s);
   if s<'100' then inc(lessX);
  end;
CloseFile(s);
end;

Sometimes the text file contains some crap and the while
loop ends before the real end of file is reached.
Is there a way to avoid this?
The stray characters range from #0 to #255.

regards
Question by:benni
13 Comments
 
LVL 1

Expert Comment

by:BlackDeath
ID: 1341799
could you mail me such a crappy file?
my email address can be found in my profile.

regs,
Black Death.
 
LVL 2

Expert Comment

by:rene100
ID: 1341800
you can try it with a TMemoryStream and the
method LoadFromFile(FileName).
perhaps this works

regards
rene
 

Author Comment

by:benni
ID: 1341801
Black Death
hmm the files are 40 MB or bigger ...

Rene
how do I extract strings from this method ?


 
LVL 4

Expert Comment

by:erajoj
ID: 1341802
Hi,
Can you try the code below and tell us if the result (in the message box) is equal to the file size? Then I know whether to put any energy into this.

var
  f: File;
  p: pointer;
  iSize, cRead, cTotal: Integer;
begin
  AssignFile( f, 'c:\test1.txt' );
  FileMode := 0;
  Reset( f, 1 );
  iSize := 1 shl 16; // 64kB
  cTotal := 0;
  GetMem( p, iSize );
  repeat
    BlockRead( f, p^, iSize, cRead );
    Inc( cTotal, cRead );
  until ( cRead<>iSize );
  FreeMem( p );
  CloseFile( f ); // not 's'!
  ShowMessage( IntToStr( cTotal ) + ' bytes read.' );
end;

/// John
 
LVL 4

Expert Comment

by:erajoj
ID: 1341803
Hi again,
What would happen if the line contains '0123', '+123' or ' 123'???
Would you miscalculate "lessX"??
It seems so, since all three strings above compare as less than '100' due to
their first character's position in the ASCII character set.
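
The pitfall is that s<'100' compares character by character, so a numeric test needs the line converted to a number first. A minimal sketch using the standard Val procedure; the helper name IsBelow is hypothetical, and treating unparsable lines as "not below the limit" is an assumption, since the thread does not say how such lines should count:

```pascal
program NumericCompare;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (not from the thread): True when Line holds an
// integer smaller than Limit. Lines that do not parse count as False.
function IsBelow(const Line: string; Limit: Integer): Boolean;
var
  Value, ErrPos: Integer;
begin
  // Val sets ErrPos to 0 only if the whole trimmed string is a valid integer.
  Val(Trim(Line), Value, ErrPos);
  Result := (ErrPos = 0) and (Value < Limit);
end;

begin
  // As strings, all three sort before '100' because of their first character.
  WriteLn('0123' < '100', ' ', '+123' < '100', ' ', ' 123' < '100');
  // As numbers, each of them is 123, so none is below 100.
  WriteLn(IsBelow('0123', 100), ' ', IsBelow('+123', 100), ' ', IsBelow(' 123', 100));
  // A value that really is below the limit.
  WriteLn(IsBelow(' 50', 100));
end.
```

In the original loop, the test would then read "if IsBelow(s, 100) then inc(lessX);".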

/// John
 
LVL 1

Accepted Solution

by:ow
earned 100 total points
ID: 1341804
Hi benni,

you have to scan for the strings like this:

  var
    F :file of char;
    C :char;
    S :string;
  begin
  AssignFile(F, TEXT_FILE);
  Reset(F);
  {Initialize S}
  S := '';
  while not EoF(F) do
    begin
    {Read one character}
    Read(F, C);
    if (C <> #13) then
      S := S + C
    else
      begin
      {Here you do with S what you want...}
      {...}
      ListBox1.Items.Add(S);
      {Reinitialize S}
      S := '';
      {Skip the linefeed, if one follows}
      if not EoF(F) then
        Read(F, C);
      end;
    end;
  CloseFile(F);
  end;

Regards
  ow
 
LVL 1

Expert Comment

by:ow
ID: 1341805
Please delete the line
 ListBox1.Items.Add(S);
from the code (it's left over from another use).

ow
 
LVL 4

Expert Comment

by:erajoj
ID: 1341806
Hi,
Yes, that answer really provides a fast solution for large files and is so much better than Borland's own "ReadLn" implementation, reading one char at a time!!! ;-(
Will the answer work better than the original code if there are stray EOLs in the file? ...NO, it won't!
Please try my code example first, if you want a serious solution to the problem.

Typical solution from a sysadmin, scraping off the surface! ;-)

/// John
 
LVL 1

Expert Comment

by:BlackDeath
ID: 1341807
ouch - circus maximus ?
>:->

benni - 40mb _zipped_ ?

Black Death.
 
LVL 1

Expert Comment

by:ow
ID: 1341808
Hi Benni, hi John!

The described solution will work better than Borland's ReadLn, because it opens the file as a typed "file" and not as "text". Therefore it will read the whole file and not terminate on char(26).
But you are right, I forgot about single CRs.
To prevent single CRs from being treated as line ends, we have to check two characters.
With my simple example I wanted to show that it's necessary to look at every char to determine the line ends.
And of course the following code is much faster:

  type
    tCardinalArray = array[0..High(integer) div 2] of char;
  var
    FileStream :tFileStream;
    FileSize   :integer;
    Buffer     :^tCardinalArray;
    Index      :integer;
    LastIndex  :integer;
    C, C1      :char;
    Count      :integer;
    S          :string;
  begin
  FileStream := tFileStream.Create(TEXT_FILE, fmOpenRead);
  FileSize := FileStream.Size;
  Buffer := AllocMem(FileSize);
  FileStream.ReadBuffer(Buffer^, FileSize);
  FileStream.Free;
  {Initialize S}
  S := '';
  C1 := ' ';
  LastIndex := 0;
  for Index := 0 to FileSize - 1 do
    begin
    {Read one character}
    C := Buffer^[Index];
    {Test for CRLF}
    if (C1 = #13) and (C = #10) then
      begin
      Count := Index - LastIndex - 1;
      SetLength(S, Count);
      {An empty line has nothing to copy, and S[1] would be invalid}
      if (Count > 0) then
        Move(Buffer^[LastIndex], S[1], Count);
      LastIndex := Index + 1;
      {Here you may do with S what you want...}
      {...}
      end;
    {Remember C for next loop}
    C1 := C;
    end;
  FreeMem(Buffer);  
  end;
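
Whether char(26) is really what cuts the ReadLn loop short can be checked by counting those bytes in the file. A minimal sketch, reading the file as a binary stream in 64 KB chunks so that even a 40 MB file needs little memory; the helper CountByte and the program name are hypothetical, not from the thread, and the file name is taken from the command line:

```pascal
program FindCtrlZ;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

// Hypothetical helper (not from the thread): counts how often Target
// occurs in Stream, reading from the current position in 64 KB chunks.
function CountByte(Stream: TStream; Target: Byte): Integer;
var
  Buf: array[0..65535] of Byte;
  Got, I: Integer;
begin
  Result := 0;
  repeat
    // Read returns the number of bytes actually read, 0 at end of stream.
    Got := Stream.Read(Buf, SizeOf(Buf));
    for I := 0 to Got - 1 do
      if Buf[I] = Target then
        Inc(Result);
  until Got = 0;
end;

var
  FS: TFileStream;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: FindCtrlZ <file>');
    Halt(1);
  end;
  FS := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    WriteLn('Ctrl-Z (#26) bytes found: ', CountByte(FS, 26));
  finally
    FS.Free;
  end;
end.
```

A count of zero would suggest that something other than char(26) ends the loop early.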

regards
  ow

 
LVL 1

Expert Comment

by:ow
ID: 1341809
Hi Benni,

I remembered that you want to work on very large files.
So if you don't have enough RAM to read the whole file in, here is another version, which uses only a small buffer.
It is almost as fast as the version above (10 MB in 5 s on a P90).

  const
    MAX_BUF_SIZE = $FFF;
  var
    FileStream :tFileStream;
    FileSize   :integer;
    Buffer     :pByteArray;
    BufSize    :integer;
    Index      :integer;
    LastIndex  :integer;
    C, C1      :char;
    S          :string;

  procedure FromBufToS(BufPos :integer);
    var
      Count :integer;
      SLen  :integer;
    begin
    Count := BufPos - LastIndex;
    if (Count > 0) then
      begin
      SLen := Length(S);
      SetLength(S, SLen + Count);
      Move(Buffer^[LastIndex], S[SLen + 1], Count);
      end;
    end;

  begin
  BufSize := MAX_BUF_SIZE;
  GetMem(Buffer, BufSize);
  FileStream := tFileStream.Create(TEXT_FILE, fmOpenRead);
  FileSize := FileStream.Size;
  {Initialize C1, S}
  C1 := ' ';
  S := '';
  while (FileSize > 0) do
    begin
    if (FileSize < BufSize) then
      BufSize := FileSize;
    FileStream.ReadBuffer(Buffer^, BufSize);
    Dec(FileSize, BufSize);
    LastIndex := 0;
    for Index := 0 to BufSize - 1 do
      begin
      {Read one character}
      C := char(Buffer^[Index]);
      {Test for CRLF}
      if (C1 = #13) and (C = #10) then
        begin
        if (Index = 0) then
          {Remove CR }
          Delete(S, Length(S), 1)
        else
          {Move chars to S, exclude CRLF}
          FromBufToS(Index - 1);
        LastIndex := Index + 1;
        {Here you do with S what you want...}
        {...}
        {Reset S}
        S := '';
        end;
      {Remember C for next loop}
      C1 := C;
      end;
    {Move Buffer to S}
    FromBufToS(BufSize);
    end;
  FreeMem(Buffer);
  end;


regards
  ow

 

Author Comment

by:benni
ID: 1341810
thanks ow - seems that it works ...

btw: your method is about 5% slower than the readln; it seems
that my implementation of your source has a little more overhead - but don't worry, time doesn't matter at this point :-) !

thx again

egono

 

Author Comment

by:benni
ID: 1341811
for all the other boys and girls - I accepted ow's last comment and not his answer !!!
