TimFHayes

Exception raised when trying to import large text files (> maxint) into streams using 64 bit Delphi XE5 to XE10

I work with a supplier who provides data files containing their product information for use by my system. Until recently, their data file (in wide string / Unicode format) was smaller than 2 GB and loaded easily into various Delphi components such as TStringList and TMemoryStream using the LoadFromFile method.

Although I am already compiling for 64-bit, the data file has now grown beyond 2 GB, and as a result I get a 'Stream Read Error'.

I have looked into the TStream code, and in various places the "size" variable in use is declared as an Integer, hence the error. Although I am working on XE5, I also tried the recent XE10 trial and got the same result.

Ideally I would like to work with TStringList or TStringDynArray to iterate through the input data, but I cannot find an easy workaround.

Am I missing something? I contacted Embarcadero tech support, but their response was that they needed to keep the Integer to protect 32-bit operations!

Any help or suggestions gratefully accepted.
Sinisa Vuk

Could you post the code that reads the data files? How do you read them, and what exactly do you read? Note that Integer is 32 bits in 64-bit Delphi, just as it is in 32-bit Delphi. Maybe something is being done wrong, or maybe some routine can be rewritten to handle large files.
TimFHayes

ASKER

OK:

I wish to load an instance of TStringDynArray with a file whose size/length exceeds 2,147,483,647 bytes - it is 2,167,034,846 bytes.

When I call TFile.ReadAllLines (below) I get an error.

class function TFile.ReadAllLines(const Path: string): TStringDynArray;
var
  Encoding: TEncoding;
  Buff: TBytes;
  Text: string;
  BOMLength: Integer;

begin
  CheckReadAllLinesParameters(Path, nil, False);

  Encoding := nil;
  Buff := DoReadAllBytes(Path);
  BOMLength := TEncoding.GetBufferEncoding(Buff, Encoding);
  Text := Encoding.GetString(Buff, BOMLength, Length(Buff) - BOMLength);
  Result := GetStringArrayFromText(Text);
end;

It seems clear that the variable BOMLength, being a 32-bit Integer, cannot handle the length of my file.

Tracing TEncoding.GetBufferEncoding down, I find the code will only handle file sizes up to 2,147,483,647 bytes. All the work is done with Integers:

class function TEncoding.GetBufferEncoding(const Buffer: TBytes; var AEncoding: TEncoding): Integer;
begin
  Result := GetBufferEncoding(Buffer, AEncoding, Default); // Must call property getter to create Encoding
end;

class function TEncoding.GetBufferEncoding(const Buffer: TBytes; var AEncoding: TEncoding;
  ADefaultEncoding: TEncoding): Integer;

  function ContainsPreamble(const Buffer, Signature: array of Byte): Boolean;
  var
    I: Integer;
  begin
    Result := True;
    if Length(Buffer) >= Length(Signature) then
    begin
      for I := 1 to Length(Signature) do
        if Buffer[I - 1] <> Signature [I - 1] then
        begin
          Result := False;
          Break;
        end;
    end
    else
      Result := False;
  end;

var
  Preamble: TBytes;
begin
  Result := 0;
  if AEncoding = nil then
  begin
    // Find the appropriate encoding
    if ContainsPreamble(Buffer, TEncoding.UTF8.GetPreamble) then
      AEncoding := TEncoding.UTF8
    else if ContainsPreamble(Buffer, TEncoding.Unicode.GetPreamble) then
      AEncoding := TEncoding.Unicode
    else if ContainsPreamble(Buffer, TEncoding.BigEndianUnicode.GetPreamble) then
      AEncoding := TEncoding.BigEndianUnicode
    else
    begin
      AEncoding := ADefaultEncoding;
      Exit; // Don't proceed just in case ADefaultEncoding has a Preamble
    end;
    Result := Length(AEncoding.GetPreamble);
  end
  else
  begin
    Preamble := AEncoding.GetPreamble;
    if ContainsPreamble(Buffer, Preamble) then
      Result := Length(Preamble);
  end;
end;

So I looked at TMemoryStream.LoadFromFile, which in turn calls its own LoadFromStream and contains a further Integer limitation (Count is a Longint):

procedure TMemoryStream.LoadFromStream(Stream: TStream);
var
  Count: Longint;
begin
  Stream.Position := 0;
  Count := Stream.Size;
  SetSize(Count);
  if Count <> 0 then Stream.ReadBuffer(FMemory^, Count);
end;

procedure TMemoryStream.LoadFromFile(const FileName: string);
var
  Stream: TStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    LoadFromStream(Stream);
  finally
    Stream.Free;
  end;
end;

Basically I cannot find a regular way to import this text file (which grows larger by the week).

Is there any way of approaching the problem please?

Thanks.
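
One direction worth trying, sketched here under assumptions rather than offered as a confirmed fix: Longint and Integer stay 32 bits wide in 64-bit Delphi, so routines that hold the whole file length in one such variable cannot represent a size above 2,147,483,647, and the file has to be consumed in pieces. The sketch below reads the file one line at a time with TStreamReader from System.Classes instead of loading it whole. CountLinesInFile is a hypothetical helper name, the 1 MB buffer size is arbitrary, and TEncoding.Unicode is assumed because the file is described as wide string / Unicode. It has not been tested against a file of this size.

```pascal
program ReadLargeTextFile;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper (not part of the RTL): walks the file one line at a
// time and returns the number of lines seen. Only one buffer's worth of
// data is held in memory at any moment.
function CountLinesInFile(const FileName: string; Encoding: TEncoding): Int64;
var
  Reader: TStreamReader;
  Line: string;
begin
  Result := 0;
  // DetectBOM = True lets a byte-order mark override the Encoding argument.
  // The 1 MB buffer size is an arbitrary choice for this sketch.
  Reader := TStreamReader.Create(FileName, Encoding, True, 1024 * 1024);
  try
    while not Reader.EndOfStream do
    begin
      Line := Reader.ReadLine;
      // Process Line here (parse fields, write to a database, and so on).
      Inc(Result); // Int64 counter, so the count is not capped at MaxInt
    end;
  finally
    Reader.Free; // also closes the file stream the reader opened
  end;
end;

begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: ReadLargeTextFile <path to text file>');
    Exit;
  end;
  // Assumption: the supplier file is UTF-16 LE (TEncoding.Unicode).
  Writeln(CountLinesInFile(ParamStr(1), TEncoding.Unicode), ' lines read');
end.
```

If random access to individual lines is needed later, the same loop could record the data of interest per line rather than keeping every line in a TStringList, since a single in-memory copy of the text would run into the same size limits.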
ASKER CERTIFIED SOLUTION
Sinisa Vuk