?
Solved

ClientDataset performance problem -- workaround/alternative(s) needed

Posted on 2005-04-19
38
Medium Priority
?
12,849 Views
Last Modified: 2011-08-18
BACKGROUND:
I have written an engineering application that uses a ClientDataset with DBGrid and navigator components making the development process of easier and simpler (or "cleaner") in terms of both code and appearance on the screen.  Previously, I had been using my own scheme for storing the data sets, making use TurboPower's Orpheus data table to enter, edit, and view data.

The purpose of my application (or, more accurately, subcomponent of an application) is to manage several data sets consisting of terrain data points. The data sets can be manipulated by combining them in various ways to make new data sets. They are stored together in a single database file as nested data sets.
 
~~~~~~~~~~~~~~~~~~~~~~~~
INPUT DATA DESCRIPTION:
The raw "terrain" data are coordinate points obtained from ground surveys and saved in "point files" (using AutoCAD terminology) that come right out of survey instruments. The raw data files can have a variety of formats. Data for a "point" consists of at least three floating point numbers (X, Y, and Z coordinates), but may also include a unique point ID number (integer), and a point code (string). These files are in CSV format and frequently contain more than 200,000 coordinate points.
 
~~~~~~~~~~~~~~~~~~~~~~~~
PERFORMANCE PROBLEM:
Since the ClientDataset.LoadFromFile method only supports CDS and XML file formats, I am reading the CSV file and Inserting the data to the ClientDataset (see code snippet below).  After about 20k-30k Inserts, the performance slows down dramatically.  However, once the data have been inserted into the ClientDataset and saved to a file, in either CDS or XML format, the LoadFromFile only takes a couple of seconds.  So the length of time it takes to insert the CSV data into the ClientDataset more than offsets its convenience.

I am interested in finding a solution to the slow import of CSV files into a client data base.  For me, that would be the cleanest way of storing my data files, specifically by making use of nested data sets and storing them as binary (*.CDS) files.  I prefer saving the CDS in binary format because it makes it harder for people to fiddle with the data, outside of my application's control, if it is not in a text format.  As for preventing people from mucking up a text data file, they will do it if it is possible.  They can still alter a binary file, but if they do not understand what they see they seem to be less likely to touch it.
If there is no solution to the ClientDataset insert performance, I would like to know about alternative solutions.
My DB programming knowledge is scant and this is my first EE question.
 
~~~~~~~~~~~~~~~~~~~~~~~~
SAMPLE CODE (includes attempts to insert fields by name and by number):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Factor := 100/(TerrainDataSet.PointList.Count - 1);
dlgProgressBar.Show;
ClientDataSet2.LogChanges := False;
with ClientDataSet2 do begin
// Open;
  LogChanges := False;
  for i := 0 to TerrainDataSet.PointList.Count - 1 do begin
    Application.ProcessMessages;
    if ((GetKeyState(VK_Escape) and 128) = 128) then begin
      if MessageDlg('Stop importing data?', mtConfirmation, [mbYes, mbNo], 0) = mrYes then Break;
    end;
    TP := TerrainDataSet.PointList[i];
//  Append;
//  FieldByName('Number').AsInteger := i;
//  FieldByName('X').AsFloat := TerrainPoint^.X;
//  FieldByName('Y').AsFloat := TerrainPoint^.Y;
//  FieldByName('Z').AsFloat := TerrainPoint^.Z;
//  FieldByName('Description').AsString := TerrainPoint^.Description;
    AppendRecord([TP^.Number,TP^.X,TP^.Y,TP^.Z,TP^.Description]);
    if (i mod 100 = 0) then dlgProgressBar.ProgressBar.Percent := Round(Factor*i);
 
    end;
end;
 
dlgProgressBar.Close;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0
Comment
Question by:dcfroehlich
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 17
  • 7
  • 4
  • +6
38 Comments
 
LVL 15

Expert Comment

by:mikelittlewood
ID: 13813765
Could I ask how you are loading the csv file and what you are storing it in before analysing.
0
 
LVL 22

Expert Comment

by:Mohammed Nasman
ID: 13814076
check kbmMemTable, it's faster than ClientDataset and supoprt binary format
http://www.components4programmers.com/products/kbmmemtable/

but it may the TClientDataset isnt the problem, maybe the way you parse and insert the cvs file
0
 

Author Comment

by:dcfroehlich
ID: 13814094
The csv text file is read record by record, and each item is stored in a linked list (TList).  Reading the file and storing the data in the list is fast.  Then, each item in the list is appended to the DataTable, using the snippet of code included at the end of the initial question.  This is where the delay occurs.  Tracking things down in the source code, the Addrecord procedure in the Delphi DB unit (DB.pas) is where I have ended my search for the problem.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 22

Expert Comment

by:Mohammed Nasman
ID: 13814193
then check kbmMemTable, it's much faster then TClientDataset in appending
my tests show it's faster by at least 10X in adding 100,000 records
0
 
LVL 46

Expert Comment

by:aikimark
ID: 13815065
0
 
LVL 46

Expert Comment

by:aikimark
ID: 13815491
mnasman,

This ClientDataset performance question arose at last month's RTP Delphi Interest Group meeting.  According to the attendees that have used ClientDatasets, there seems a definite performance problem when inserting/appending new records to a ClientDataset.  A few thousand rows can be added with nearly linear performance.  Somewhere around 10k-20k records, the per-insert performance begins to drop.

I suspect the performance is due to memory management operations, much like string concatenation operations.  kbmMemTable (and other classes) address this performance issue by allocating larger blocks of memory or memory block chaining.  In the .Net world, the StringBuilder class is an implementation of this improved memory allocation to facilitate greatly improved concatenation operations (often used for building web pages).

============================
Possible CDS work-arounds (for the Delphi experts to consider):

1. Does anyone know the format of a CDS file?  It occurs to me that it should be possible to create a CDS formatted file directly from your Delphi program.

2. Is it possible to append multiple CDS files?  One could build multiple CDS files with good performance and then (possibly) combine their data into one contiguous CDS file.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 13816878
dcfroehlich,

<<I prefer saving the CDS in binary format because it makes it harder for people to fiddle with the data...>>

If you find one of the above suggestions (working directly with CSV files) to your liking, you could store your copy of the CSV data in a password protected file (in ASCII format) in compressed format (.zip).  When you wanted to read the data, you could  programmatically uncompress the file with the password to a temporary file and then read the contents in CSV format.  Once you are finished with any file changes, you could rezip the file.  It wouldn't matter what the user did to the temporary file, you would only rely on the contents of your zip file until a new set of data is introduced.
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 13819331
I have been able to import tens of thousands of records into a CDS from CSV using a variant of this code.
This code has since been modified to use ADOQueries to import into SQLServer but it was originally
written to import into a CDS.

{Some of this code was copied from EE and some of it was copied from Borland's NGs.}

type
  TForm1 = class(TForm)
    Button1: TButton;
    OpenDialog1: TOpenDialog;
    ADOQuery1: TADOQuery;
    ADOConnection1: TADOConnection;
    Gauge1: TGauge;
    Button2: TButton;
    StatusBar1: TStatusBar;
    Memo1: TMemo;
    Label1: TLabel;
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
    bDone: Boolean;
    j1: Integer;
    gdtBegtime, gdtEndtime: TDatetime;
    giTotRecords :Integer;
    function GetEstimatedCompletion(ATtlRecs, ACompleteRecs: Integer; ABgTime: TDateTime): String;
    procedure ImportNDFile(Filename: TFIleName);
    procedure LoadLine(ALines: TStringList);
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure ParseCSVLine(ALine: string; AFields: TStrings);
var
  iState: cardinal;
  i: cardinal;
  iLength: cardinal;
  sField: string;
begin
  // determine length of input //
  iLength := Length(ALine);
  // exit if empty string //
  if iLength = 0 then
    Exit;
  // initialize State Machine //
  iState := 0;
  sField := '';

  // state machine //
  for i := 1 to iLength do
  begin
    case iState of

//----------------------------------------------------------------------//

      0: // unknown //
      begin
        sField := '';
        case ALine[i] of
          '"': // start of embedded quotes or commas //
          begin
            iState := 2;
          end;
          ',': // empty field //
          begin
            AFields.Add(sField);
          end;
          else
          begin // start of regular field //
            sField := ALine[i];
            iState := 1;
          end;
        end; // case ALine[i] of //
      end; // 0 //


//----------------------------------------------------------------------//

      1: // continuation of regular field //
      begin
        case ALine[i] of
          ',': // end of regular field //
          begin
            AFields.Add(sField);
            // if end of input, then we know there remains a "null" field //
            if (i = iLength) then
            begin
              AFields.Add('');
            end // (i = iLength) //
            else
            begin
              iState := 0;
            end;
          end;
          else // concatenate current char //
          begin
            sField := sField + ALine[i];
            if (i = iLength) then // EOL //
              AFields.Add(sField);
          end;
        end; // case ALine[i] //
      end; // 1 //


//----------------------------------------------------------------------//

      2: // continuation of embedded quotes or commas //
      begin
        case ALine[i] of
          '"': // end of embedded comma field or beginning of embedded quote
//
          begin
            if (i < iLength) then // NotTheEndPos //
            begin
              if (ALine[i+1] = ',') then
              begin // end of embedded comma field //
                iState := 1
              end
              else
              begin
                iState := 3;
              end;
            end
            else
            begin // end of field since end of line //
              AFields.Add(sField);
            end;
          end
          else // concatenate current char //
          begin
            sField := sField + ALine[i];
          end;
        end; // case ALine[i] //
      end; // 2 //


//----------------------------------------------------------------------//

      3: // beginning of embedded quote //
      begin
        case ALine[i] of
          '"':
          begin
            sField := sField + ALine[i];
            iState := 2;
          end;
        end; // case ALine[i] //
      end; // 3 //


//----------------------------------------------------------------------//
    end; // case iState //
  end; // for i := 1 to iLength //
end;

// This is the function that counts the number of lines
function TextLineCount_BufferedStream(const Filename: TFileName): Integer;
const
  MAX_BUFFER = 1024 *1024;
var
  oStream: TFileStream;
  sBuffer: string;
  iBufferSize: Integer;
  iSeek: Integer;
  bCarry: Boolean;
begin
  Result := 0;
  bCarry := False;
  oStream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(sBuffer, MAX_BUFFER);
    repeat
      iBufferSize := oStream.Read(sBuffer[1], MAX_BUFFER);
      if iBufferSize <= 0 then
        break;
      {Skip LFs that follow a CR - even if it falls in seperate buffers}
      iSeek := 1;
      if bCarry and (sBuffer[1] = #10) then
        Inc(iSeek);
      while iSeek <= iBufferSize do
      begin
        case sBuffer[iSeek] of
        #10:
          Inc(Result);
        #13:
          if iSeek = iBufferSize then
            Inc(Result)
          else
          if sBuffer[iSeek + 1] <> #10 then
            Inc(Result)
          else
          begin
            Inc(Result);
            Inc(iSeek);
          end;
        end;
        Inc(iSeek);
      end;
      {Set carry flag for next pass}
      bCarry := (sBuffer[iBufferSize] = #13);
    until
      iBufferSize < MAX_BUFFER;
  finally
    FreeAndNil(oStream);
  end;
end;

function PosEx(const SubStr, S: string; Offset: Cardinal = 1): Integer;
var
  Tmp: PChar;
begin
  Result := 0;
  if (Offset > Cardinal(Length(S))) then exit;
  if Offset = 1 then
    Result := Pos(SubStr, S)
  else
  begin
    Tmp := StrPos(PChar(@S[Offset]), PChar(SubStr));
    if Tmp = nil then exit;
    Result := Cardinal(Tmp - PChar(@S[Offset])) + Offset;
  end;
end;

procedure Split(S, Delimiter: string; Strings: TStrings);
var
  P, OldP: integer;
  Token: string;
begin
  if (Strings = nil) or (Length(S) = 0) or (Length(Delimiter) = 0) then
    Exit;
  P := Pos(Delimiter, S);
  OldP := 1;
  while P > 0 do
  begin
    Token := Copy(S, OldP, P-OldP);
    Strings.Add(Token);
    OldP := P + 1;
    P := PosEx(Delimiter, S, OldP);
  end;
  if P = 0 then
    Strings.Add(Copy(S, OldP, Length(S)));
end;


procedure TForm1.LoadLine(ALines: TStringList);
  function ConvertExpirationDate(ADateString: String): String;
  var
    sl: TStringList;
    i: Integer;
    iDay, iMonth, iYear: Word;
    dt: TDateTime;
  begin
    Result := '';
    sl := TStringList.Create;
    try
      //         123456789
      // Must be 31-Oct-04
      if Pos('-', ADateString) > 0 then
      begin
        // We will do it this way if it is formatted
        // like this:
        // 31-Oct-04
        // 31-Dec-2004
        sl.Delimiter := '-';
        sl.DelimitedText := aDateString;

        iDay := StrToInt(sl[0]);
        iMonth := 12;
        for i := 1 to 12 do
        begin
          if sl[1] =  ShortMonthNames[I] then
          begin
            iMonth := i;
             Break;
          end;
        end;
        // if this is a four digit year
        if Length(Trim(sl[2])) = 4 then
          iYear := StrToInt(sl[2])
        else
          // else add 2000 to it.
          iYear := 2000 + StrToInt(sl[2]);
        dt := EncodeDate(iYear, iMonth, iDay);
      end
      else
        // ADateString is in standard mm/dd/yyyy format
        // May need to catch conversion errors here??
        dt := StrToDate(ADateString);
    finally
      sl.Free;
    end;
    Result := FormatDateTime('mm/dd/yy hh:nn:ss', dt);
  end;

var
  s: String;
begin
  // Let's convert the ExpirationDate to a DateTime
  s := ConvertExpirationDate(ALines[8]);
                          // You will need to write your own routine to generate
                          // surrogates for the two tables.
  ADOQuery1.InsertRecord(['PPW_'+ IntToStr(j1),           {PushPackWorkSurrogate}
                          'PP_'+ IntToStr(j1),            {PushPackSurrogate}
                          StrToIntDef(Trim(ALines[0]), 0) {Container-Container_Number},
                          Trim(Alines[2])                 {ID-NDC},
                          Trim(Alines[1])                 {Seal-Seal},
                          Trim(Alines[3])                 {Description-Description},
                          StrToIntDef(Trim(Alines[4]), 0) {QTY-Quantity},
                          Trim(Alines[5])                 {Primary UOM-PrimaryUOM},
                          Trim(Alines[6])                 {Packing UOM-PackingUOM},
                          Trim(Alines[7])                 {Lot #-Lot},
                          Trim(s)                         {Expiration Date-ExpirationDate},
                          Trim(Alines[9])                 {Type-Type},
                          // I left these five fields null, and modified the Table
                          // to reflect this. You will need to write code to gen the
                          // data to insert and place it here, also change the fields back
                          Null,                           {Error}
                          Null,                           {CreateDateTime}
                          Null,                           {CreateUserID}
                          Null,                           {UpdateDateTime}
                          Null,                           {UpdateUserID}
                          // I am just using the counter as a version, you need to fix this
                          // to insert what you think is the correct value
                          j1                              {Version}
                          ]);
  Inc(j1);
end;

procedure TForm1.ImportNDFile(Filename: TFIleName);
var
  t: TextFile;
  s: String;
  sl: TStringList;
  begtime, endtime: TDateTime;
begin
  Application.ProcessMessages;
  begtime := Now;
  bDone := False;
  Gauge1.Visible := False;
  AssignFile(t, FileName);
  Reset(t);
  sl := TStringList.Create;
  try
    // Skip to the third line.
    ReadLn(t, s);
    ReadLn(t, s);
    ReadLn(t, s);
    while (not Eof(t)) and (not bDone) and (s[1] <> 'z') {$IFDEF TESTING} and (j1 <= 100) {$ENDIF} do
    begin
      Application.ProcessMessages;
      ReadLn(t, s);
      if (Length(Trim(s)) = 0) then
        bDone := True;
      s := Trim(s);
      // Used when there is no container, see NOCTR records.
      if Pos(', ,', s) > 0 then
        s := StringReplace(s, ', ,', ',0,', [rfReplaceAll]);
      //
      // Used when there is a null date or null lot# and null date
      //
      s := StringReplace(s, ',,,',',0,12/31/2099,', [rfReplaceAll]);
      s := StringReplace(s, ',,',',12/31/2099,', [rfReplaceAll]);
      if Pos('""', s) >0 then
      begin
        s := StringReplace(s, '""', '''', [rfReplaceAll]);
        // s := StringReplace(s, '''''', '', [rfReplaceAll]);
      end;
      sl.Clear;
      if Length(s) > 0 then
      begin
        //s := StringReplace(s, '"', '', [rfReplaceAll]);
        if (j1 mod 10 = 0) and (j1 > 0) then
        begin
          Gauge1.Progress := Round((j1/giTotRecords)*100);
          if (j1 mod 200 = 0) and (j1 > 0) then
            StatusBar1.Panels[0].Text := 'Estimated Completion time: ' + GetEstimatedCompletion(giTotRecords, j1, gdtBegtime);
          Application.ProcessMessages;
        end;
        ParseCSVLine(s, sl);
        // Split(s, ',',  sl);
        LoadLine(sl);
        Label1.Caption := IntToStr(j1);
        Application.ProcessMessages;
      end;
    end;
  finally
    sl.Free;
    CloseFile(t);
    Gauge1.Visible := False;
    endtime := Now;
    begtime := endTime-begTime;
    ShowMessage(FormatDateTime('hh:nn:ss', begtime));
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  j1 := 0;
  if OpenDialog1.Execute then
  begin
    gdtBegtime := Now;
    giTotRecords := TextLineCount_BufferedStream(OpenDialog1.FileName);
    ImportNDFile(OpenDialog1.FileName);
  end;
end;

function TForm1.GetEstimatedCompletion(ATtlRecs, ACompleteRecs: Integer; ABgTime: TDateTime): String;
var
  LeftTime: TDateTime;
begin
  Result     := '';
  LeftTime   := (Now - ABgTime) / ACompleteRecs * (ATtlRecs - ACompleteRecs);
  Result     := FormatDateTime('hh:nn:ss', LeftTime);
end;


procedure TForm1.Button2Click(Sender: TObject);
begin
  Close;
end;
0
 
LVL 46

Expert Comment

by:aikimark
ID: 13819640
EddieShipman,

<<I have been able to import tens of thousands of records into a CDS from CSV using a variant of this code.>>

Since the question is about poor insert performance into a CDS, I don't see how your good performance with an ADOQuery is a possible solution.  Do you have the ADOQuery somehow connected to a ClientDataset?

"Tens of thousands" is only the start of the poor performance experienced by dcfroehlich.  He is inserting a quarter million rows in to the CDS.  It would have been much more helpful to see the original (not variant of) Delphi code that rapidly inserts many tens of thousands of CDS rows if it is different from the code in the question text that dcfroehlich has already tried.
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 13819689
"I don't see how your good performance with an ADOQuery is a possible solution"
The code was originally written to import inot a CDS as I previously stated.

One file we had had almost 1 million records and it was very fast, even with all the
parsing going on.

I no longer have the original code posting to the CDS or at least I don't think I do, let
me check...
0
 
LVL 26

Accepted Solution

by:
EddieShipman earned 750 total points
ID: 13819765
I found the old code and this was the key to my better performance:
{cdImportLegalSections is a TClientDataset}  
  .
  .
  while (not Eof(F)) and (not bDone) do
    begin
      dmDASMain.cdImportLegalSections.Insert;
      LoadSections(AReinYear);
      dmDASMain.cdImportLegalSections.Post;
      Inc(k);
      // ONLY Apply updates every 5000 records.
      if k >=5000 then
      begin
        dmDASMain.cdImportLegalSections.ApplyUpdates(-1);
        k := 0;
      end;
    end;
  finally
    // Make sure you apply the last updates.
    dmDASMain.cdImportLegalSections.ApplyUpdates(-1);
  .
  .
  end;

Here is the LoadSections code:
procedure TfmLegalSectionsImport.LoadSections(AReinYear: Integer);
var
  statecode, countycode: Integer;
  township, range: String;
  sOldtownship, sOldrange: String;
  iSection: Integer;
  iLegalSectionA, iLegalSectionB, iLegalSectionC,
  iLegalSectionD, iLegalSectionE, iLegalSectionF,
  iLegalSectionG, iLegalSectionH, iLegalSectionI,
  iLegalSectionJ: Integer;
  iLenofs: Integer;
begin
  statecode := 99;
  countycode := 0;
  if ByState then
  begin
    township   := Copy(s, 4, 4);
    range      := Copy(s, 8, 4);
  end
  else
  begin
    township   := Copy(s, 6, 4);
    range      := Copy(s, 10, 4);
  end;
  sOldtownship:= township;
  sOldrange   := range;
  iLegalSectionA := 0;
  iLegalSectionB := 0;
  iLegalSectionC := 0;
  iLegalSectionD := 0;
  iLegalSectionE := 0;
  iLegalSectionF := 0;
  iLegalSectionG := 0;
  iLegalSectionH := 0;
  iLegalSectionI := 0;
  iLegalSectionJ := 0;
  while (township=sOldtownship) and (range=sOldrange) and (not Eof(F)) do
  begin
    if ByState then
    begin
      // This is the layout of each state file, note no statecode
      //          1
      // 12345678901234567
      // -------------------
      // 001012N070W006NNN
      statecode  := States.StateAbbrevs[gsStateNo].StateCode;
      countycode := StrToInt(Copy(s, 1, 3));
      township   := Copy(s, 4, 4);
      range      := Copy(s, 8, 4);
      iSection   := StrToInt(Copy(s, 12, 3));
    end
    else
    begin
      // This is the USSECTIO.txt file layout
      //          1
      // 1234567890123456789
      // -------------------
      // 01001016N012E001NNN
      statecode  := StrToIntDef(Copy(s, 1, 2), 99);
      countycode := StrToIntDef(Copy(s, 3, 3), 999);
      township   := Copy(s, 6, 4);
      range      := Copy(s, 10, 4);
      iSection    := StrToIntDef(Copy(s, 14, 3), 9999);
    end;
    sOldtownship:= township;
    sOldrange   := range;
    lblSC.Caption := IntToStr(statecode);
    lblCC.Caption := IntToStr(countycode);
    lblLT.Caption := township;
    lblLR.Caption := range;
    lblLS.Caption := IntToStr(iSection);
    Application.ProcessMessages;
    case iSection of
     1..16:
      SetSection(iLegalSectionA, iSection, True);
     17..32:
      SetSection(iLegalSectionB, iSection-16, True);
     33..48:
      SetSection(iLegalSectionC, iSection-32, True);
     49..64:
      SetSection(iLegalSectionD, iSection-48, True);
     65..80:
      SetSection(iLegalSectionE, iSection-64, True);
     81..96:
      SetSection(iLegalSectionF, iSection-80, True);
     97..112:
      SetSection(iLegalSectionG, iSection-96, True);
     113..128:
      SetSection(iLegalSectionH, iSection-112, True);
     129..144:
      SetSection(iLegalSectionI, iSection-128, True);
     145..160:
      SetSection(iLegalSectionJ, iSection-144, True);
    end; {case}
    Inc(j);
    ReadLn(f, s);
    iLenofs := Length(Trim(s));
    if ByState then
    begin
      // If the line is short, let's skip it by reading the next line
      if (iLenofs < 17) then
        ReadLn(f, s);
      township   := Copy(s, 4, 4);
      range      := Copy(s, 8, 4);
    end
    else
    begin
      // If the line is short, let's skip it by reading the next line
      if (iLenofs < 19) then
        ReadLn(f, s);
      township   := Copy(s, 6, 4);
      range      := Copy(s, 10, 4);
    end;
  end;
  if (j mod 10 = 0) then
  begin
    PBar.Percent := Round((j/giTotRecords)*100);
    Application.ProcessMessages;
  end;
  if statecode < 99 then
  begin
    dmDASMain.cdImportLegalSections.FieldByName('StateCode').Value     := statecode;
    dmDASMain.cdImportLegalSections.FieldByName('CountyCode').Value    := countycode;
    dmDASMain.cdImportLegalSections.FieldByName('LegalTownship').Value := sOldtownship;
    dmDASMain.cdImportLegalSections.FieldByName('LegalRange').Value    := sOldrange;
    dmDASMain.cdImportLegalSections.FieldByName('ReinYear').Value      := AReinYear;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionA').Value := iLegalSectionA;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionB').Value := iLegalSectionB;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionC').Value := iLegalSectionC;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionD').Value := iLegalSectionD;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionE').Value := iLegalSectionE;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionF').Value := iLegalSectionF;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionG').Value := iLegalSectionG;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionH').Value := iLegalSectionH;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionI').Value := iLegalSectionI;
    dmDASMain.cdImportLegalSections.FieldByName('LegalSectionJ').Value := iLegalSectionJ;
    Inc(l);
    if (j mod 200 = 0) and (Trim(Label10.Caption) = '')then
      Label10.Caption := GetEstimatedCompletion(giTotRecords, j, gdtBegtime);
  end;
end;

and, although not really needed,  SetSection:
procedure SetSection(var AData: Integer; ASection: Integer; AValid: Boolean);
begin
  if AValid then
    AData := AData or (1 shl Pred(ASection))
  else
    AData := (AData and (not (1 shl Pred(ASection))));
end;
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 13819788
dcfroehlich , have you tried to parse the CSV to XML and then use LoadFromFile on the XML?
That should be many times faster.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 13820258
EddieShipman,

So you were able to batch the inserts by using the .Post method for 5000 records and only applying your cached data using the .ApplyUpdates method?  Correct me if I'm misinterpreting your code.

I assume this acts like transactions in other database environments.  This certainly reduces the number of 'internal commits' by a factor of 5000 and may affect the ClientDataset memory management behavior/decisions.

That's GREAT!  I'm certainly glad you found your original code.
0
 

Author Comment

by:dcfroehlich
ID: 13822507
EddieShipman:

Yes, converting the raw data csv file to an XML document, then using the TClientDataSet.LoadFromFile procedure to read the XML file works, but it takes several seconds and seems an odd way to go about populating the data table in a reasonable amount of time.  Your solution is better.  I will try it, after I study your code and figure it out.  Will post a reply to let you know how it goes.  Thanks for your help.
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 13824071
Yes, it works that way with ClientDatasets. It keeps an internal memory buffer and it doesn't
get applied until ApplyUpdates is called.

0
 
LVL 3

Expert Comment

by:JDuncan
ID: 13827139
Have you tried taking out the application.processmessages , this will slow things down quite a bit.
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 13827323
IT has to be in there for the progessbar unless he uses threads.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 13827469
JDuncan,

Without the processmessages, how will the progress indicator get updated?

You are correct to include other factors that can/will slow this process:
* keyboard and mouse activity in this application
* form/control repainting because of other application form painting
* ignoring Windows messages from another application
* processing Windows messages as a result of changes to controls in this application (in this case a progress indicator)

=============================
So I guess we should follow some of these guidelines when running a (potentially) lengthy process within our applications:
* indicate to the user that the application is busy (hourglass mousepointer?)
* indicate to the user that the application will not respond to keyboard/mouse (display a "BUSY" message and disable the form?)
* ignore all external messages (don't process messages unless absolutely necessary
* show the user some progress (perhaps in a different, modal, form)
0
 

Author Comment

by:dcfroehlich
ID: 13829117
JDuncan:

The call to Application.ProcessMessages is needed to check for the ESC key being pressed.  Because appending the records to the ClientDataSet bogs down to a snails pace after 20 to 30 thousand records are processed, a way was needed to end data input nicely.  With or without the call to Application.ProcessMessages, the code as I have written it is not usable, more than one-half hour is needed to append several hundred thousand records from the linked list.  However, as noted in previous comments, importing the data from an existing cds binary file or from an XML document file is very fast.
0
 
LVL 3

Expert Comment

by:JDuncan
ID: 13829326
You should include another piece of code to reduce the amount of times ProcessMessages is called. If this is actually slowing down the input.


 i.e.
if i mod 8 >0 then Application.Processmessages.
0
 
LVL 19

Expert Comment

by:Limbeck
ID: 13846613
it is prob. a duhhhh question, but is this the Clientdataset what is connected to the grid nd other controls?

If so, you should add Clientdataset2.disablecontrols before the import, nd enablecontrols when finished.

good luck

Ed
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 13847067
I think that was a given.
0
 
LVL 19

Expert Comment

by:Limbeck
ID: 13847094
yup, but when all else fails, look for the obvious
0
 
LVL 3

Expert Comment

by:PeterIngham
ID: 13848185
I'm not an expert in CDS, but after reviewing the other comments here I thought I'd ask some Supplementary Questions:

When your application is in the "very slow" mode,  What is the system doing? - is the CPU usage very high (indicating poor algorthmic implementation yeilding very non-linear update speeds) or Disk usage is high (especially paging) indicating memory starvation leading to thrashing.  Does the slowdown occur after a predictable number of records?  Does the slowdown occur progressively or does it have a sudden onset?

Does the point at which the slowdown occurs change sigificantly if the system has double (or half) the memory in it?
0
 
LVL 1

Expert Comment

by:jturpin
ID: 13851138

One tiny comment: Don't call Application.ProcessMessages; so many times. Set a counter and call it every 10-100 times.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 13851603
For future readers, here's a very good Dr. Bob (Swart) article on CDS
http://bdn.borland.com/borcon2004/article/paper/0,1963,32262,00.html

===============================
PeterIngham,

I doubt of dcfroehlich could easily alter their memory (physical RAM).  The only reasonable way to vary/test this question's constraints is through a virtual environment, such as VMWare or VirtualPC.

===============================
jturpin,

I think this has already been discussed in the preceeding comments (per JDuncan).  Or are you refering to something else that I'm incorrectly interpreting from your comment?

===============================
Limbeck & EddieShipman,

Since dcfroehlich identified himself as a database (CDS) newbie, "My DB programming knowledge is scant...", I think it would not be a 'given' assumption.  Now the question goes out to dcfroehlich...

Do you break the link between the visible controls (grid) and the CDS before you start inserting data from your CSV file?
0
 
LVL 46

Expert Comment

by:aikimark
ID: 13877844
dcfroehlich,

It's been a week since we've heard from you.  

* Where do you stand with our comments and suggestions?
* Is the grid control connected to your CDS while you are doing your mass inserts?
* Do you need clarification about our comments or the EddieShipman code example?
0
 
LVL 46

Expert Comment

by:aikimark
ID: 13970980
I've discussed the Post/ApplyUpdates suggestion with a local Delphi developer with much more experience with ClientDatasets.  This solution might not work for disconnected datasets, since the ApplyUpdates requires some defined data provider to point to a physical database.

0
 

Author Comment

by:dcfroehlich
ID: 13971796
Have tried some of the solutions mentioned in comments with no appreciable difference.  For the purposes of my application, I created the functionality of the CDS using TurboPower Orpheus data tables.  Importing data files consisting of millions of records is almost instantaneous (Oh, maybe a few seconds), and I manage do what is needed.  But, the CDS is a more efficient way to go about creating the code, and it looks better on the screen.  Also, you can save your data as a binary cds file, or as an xml file.  

Please, try it for yourself by plopping a ClientDataSet and a Table on a form and then populating the CDS with the same record ("1.0,2.0,3.0") a million times.  For Delphi Experts, this might take about two minutes.  See if your code slows to a crawl as mine does.  By the way, I am running Delphi 7.  Perhaps the problem does not occur with newer releases.
0
 
LVL 46

Assisted Solution

by:aikimark
aikimark earned 750 total points
ID: 14423518
On my 300MHz laptop, I ran some tests with a three field disconnected CDS (ID autonumber and two floating fields.  Two additional buttons save the CDS and load from an previously saved CDS.  I've included the code below the following performance figures:

Records      Time       Relative       Relative
      (sec)      Recount       Time
            increase      increase
5      0.000            
50      0.000      10.000      
500      0.030      10.000      
1000      0.070      2.000      2.3333
2000      0.160      2.000      2.2857
2500      0.190      1.250      1.1875
5000      0.500      2.000      2.6337
7500      0.963      1.500      1.9253
10000      1.588      1.333      1.6487
15000      3.385      1.500      2.1309
20000      5.670      1.333      1.6751
25000      8.646      1.250      1.5249
30000      12.198      1.200      1.4108
40000      21.157      1.333      1.7345
50000      32.537      1.250      1.5379
75000      74.597      1.500      2.2927      
100000      127.298      1.333      1.7065      
150000      260.159      1.500      2.0437      
200000      399.134      1.333      1.5342      
250000      568.798      1.250      1.4251      

Time to save 100000 CDS records: 0.6 seconds
time to load 250000 CDS records: 172 seconds
time to save 250000 CDS records: 2 seconds

=========================================
unit cdsMainForm;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, Db, DBClient, Mask;

type
  TForm1 = class(TForm)
    Label1: TLabel;
    cds1: TClientDataSet;
    cds1ID: TIntegerField;
    cds1VAL1: TIntegerField;
    cds1VAL2: TIntegerField;
    Button1: TButton;
    Button2: TButton;
    Button3: TButton;
    Edit1: TEdit;
    procedure Button1Click(Sender: TObject);
    procedure Button2Click(Sender: TObject);
    procedure Button3Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.DFM}


procedure TForm1.Button1Click(Sender: TObject);
var
 i: integer;
 DateTime : TDateTime;

begin
  label1.caption:='Adding records';
  application.processmessages;
 DateTime := Time;  // store the current date and time

 with cds1 do
 begin
  close;
  createdataset;
  Logchanges:=false;
  for i := 1 to strtoint(edit1.Text) do       //250000
  begin
   append;
   cds1VAL1.AsFloat:=i;
   cds1VAL2.AsFloat:=i;
   cds1.Post;
  end;

  label1.caption:='Added Record '+inttostr(i-1) + ' records in ' + floattostrf((Time-datetime)*86400,fffixed,8,6) + ' seconds';
  application.processmessages;

 end;
end;

procedure TForm1.Button2Click(Sender: TObject);
var
 DateTime : TDateTime;

begin
 DateTime := Time;  // store the current date and time
 cds1.SaveToFile('myfile.cds',dfBinary);
 showmessage('Saved ' + inttostr(cds1.recordcount) + ' records in ' + floattostrf((Time-datetime)*86400,fffixed,8,6) + ' seconds');
end;

procedure TForm1.Button3Click(Sender: TObject);
var
 DateTime : TDateTime;

begin
 DateTime := Time;  // store the current date and time
 cds1.Close;
 cds1.LoadFromFile('myfile.cds');
 showmessage('Loaded ' + inttostr(cds1.recordcount) + ' records in ' + floattostrf((Time-datetime)*86400,fffixed,8,6) + ' seconds');
end;

end.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 14444170
Mabe a little more helpful figures are these (I've already calculated the per-record times)

Per-record            Records
time (sec/rec)
0.00000            5
0.00000            50
0.00006            500
0.00007            1000
0.00008            2000
0.00008            2500
0.00010            5000
0.00013            7500
0.00016            10000
0.00023            15000
0.00028            20000
0.00035            25000
0.00041            30000
0.00053            40000
0.00065            50000
0.00099            75000
0.00127            100000
0.00173            150000
0.00200            200000
0.00228            250000

===================================
I also discovered from the questioner that multiple ClientDatasets are stored together in a nested TClientDataset configuration.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 14457573
The only improvement suggestion I might make, using only the CDS and not one of the CSV-related components/units I've already suggested, starts with the EddieShipman proposal to use XML as an intermediate format for importing the raw (CSV format) data.  My suggestion would write the raw data in XML format to a stream, rather than writing to the hard drive.  Then use the CDS LoadFromStream method.

One restriction on this method is that the PC must have enough real memory to contain both the XML stream and the CDS simultaneously.  Once you've imported the XML data, you can free the XML-stream memory, but it will be about the fastest workaround using only the CDS, since it eliminates the I/O associated with the raw-to/thru-XML-to-CDS data conversion.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 14463828
I did some test runs and determined that writing to a TMemoryStream object and then populating the CDS via LoadFromStream() method is about twice as fast as appending individual records to the CDS.

Just for grins, I used a TStringList object to store the '<ROW .../>' part of the XML generated in the loop.  I reduced the execution time to 1/4 of the CDS-individual-record-append method.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 14471864
Marco Cantu presented string concatenation performance comparison at Borcon2004.
http://bdn.borland.com/borcon2004/article/paper/0,1963,32120,00.html

It looks like writing to disk might actually be faster than using the TStringLIst when you get up to 100000 (or more rows).  It also matters which version of Delphi you are using.  Near the conclusion of his presentation, Marco states "...consider that there are some relevant third party (and free) collections of string routines. One of the best known is the FastStrings unit by Peter Morris (http://www.droopyeyes.com). FastStrings aim, as the name implies, is to make existing Delphi routines like Pos or Replace faster. If you are looking more for additional routines you can look into JclStrings.pas of Project JEDI Code Library (JCL) or into the StStrL.pas unit and other units of the TurboPower SysTools."

From this, we might conclude that the fastest possible building of the XML will require one of the optimized string concatenation methods/classes/routines.  In the Delphi.Net world, use the stringbuilder class, which is optimized for concatenation.  One of the TeamB members, Peter Below, has written a StringBuilder class for the Win32 crowd.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 14504287
Since the resolution of this CDS performance problem will ultimately lie with a code fix on Borland's part, I reported this as a problem to Borland's QC team, incident number 14530.

Since all of the execution times used an empty ClientDataset, there is an unknown distribution of loading and appending times for a CDS that already has records (probably as a result of a LoadFromFile/LoadFromStream operation.

============================
dcfroehlich,

The only thing we can do in this discussion thread is to examine the quickest ways to work around the CDS performance problem, since there doesn't apeear to be anything 'wrong' with your code.  Cary Jensen would state that 250000 rows is too much for a CDS, but your CDS performs well once it completes the importation of the CSV data and subsequent LoadFromFile times are acceptably quick.  Here are my workaround suggestions in order of their ease of implementation.

1. Use one of the off-the-shelf CSV-table solutions I've already linked to in my previous comment (04/19/2005).

2. Convert the CSV data into the (MyBase) XML format and then import the data into the CDS via a LoadFromFile method, followed by a deletion of the temporary XML file.  You can optimize this process by writing data in blocks equal to the sector size of the hard drive.  In this optimized scenario, you allocate an output buffer string and do memory copy operations to replace concatenation operations.  You may also choose to use one of the FastString classes downloadable from different web sites.

3a. Similar to recommendation #2, except you use a memorystream to eliminate the file I/O.  However, this isn't as fast as the temporary disk file implementation.
3b. Similar to recommendation #2, except you use a TStringList to cache the XML statements and stream this to a file for subsequent LoadFromFile CDS importing.
3c. Similar to recommendation #2, except you use a TStringList to cache the XML statements and stream this to a TMemoryStream object for subsequent LoadFromStream CDS importing.
3d. Similar to recommendation #2, except you use a TStringList to cache the XML statements and assign the TStringList.Text property to the CDS XMLData property.
3e. Similar to recommendation #2, except you use a huge String variable to cache the XML statements optimized to avoid concatenation and assign the String.Text property to the CDS XMLData property.

Note1: For steps 2-3x, the creation of the XML representation of the data is an order of magnitude faster than the importation of the XML into the clientdataset.  Using the TXMLTransform class was not tested and would be expected to be quite slow.

Note2: Keep the field names short, since the field names are used in every row of the XML representation of the data.  The less overhead, the quicker the process.

4. Since your metadata is fairly simple and consistent, reverse engineer the CDS binary format and LoadFromFile/LoadFromStream import this data format.  Apply similar optimizations mentioned in 3x above.  It seems similar to the XML format in that it contains some specification of the metadata (what fields you have) followed by the actual data.

5. Since the performance problem doesn't appear until after 10k-20k rows have been appended, create several nested client datasets to which you append 10k-20k rows and then combine the data from the individual nested datasets.

~~~~~~~~~~~~~~
It seems we have exhausted this topic.  Unless you have further questions, it would think this discussion thread worth closing.

~~~~~~~~~~~~~~
There are very good online articles on the ClientDataset written by Cary Jensen, Bob (DrBob) Swart, and Marco Cantu.  I've learned a lot about CDS and MyBase in researching this problem from the articles written by these three industry giants.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 14518582
It appears that there are some memory-manager (MM) patches in MIDAS that may improve performance twofold (or better).  The most applicable patch is:

http://cc.borland.com/Item.aspx?id=20524
This changes the memory manager used from the Midas MM  to the Delphi MM.  Sample TClientDataset row appending code is given as an example of the performance improvement.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 16497934
Comment for future readers of this thread:

* The performance problem wasn't fixed by an application of the MemoryManager patch that the TeamB member recommended.
* The questioner had already coded a work-around for this problem that avoided the use of CDS, so you are left to decide which suggested solution is best for you (CVS/table substitution, creating your own CVS front end that looks like a CDS to your grid, something else).
* Although CDS is very handy, neither Borland nor any CDS expert (Cary Jensen) would advise trying to insert lots of rows, as the questioner was doing.
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article explains how to create forms/units independent of other forms/units object names in a delphi project. Have you ever created a form for user input in a Delphi project and then had the need to have that same form in a other Delphi proj…
In this tutorial I will show you how to use the Windows Speech API in Delphi. I will only cover basic functions such as text to speech and controlling the speed of the speech. SAPI Installation First you need to install the SAPI type library, th…
Sometimes it takes a new vantage point, apart from our everyday security practices, to truly see our Active Directory (AD) vulnerabilities. We get used to implementing the same techniques and checking the same areas for a breach. This pattern can re…
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …
Suggested Courses

764 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question