Read a large database into memory more efficiently

I have a program that I would like to make more efficient. A large part of the processing time is spent reading two large databases into memory (about 150,000 records each) so that I can do computations on them. I am reading them in with the construction:

mydatabasetable.First;
while not mydatabasetable.Eof do
begin
  { assign the database fields to various variables }
  mydatabasetable.Next;
end;

I am finding that it takes several minutes to read each database. Is there a faster way to get the data into memory?
riskassessor asked:

pilotz commented:
I think the part "<assign the database fields to various variables>" takes most of the time. Check that, e.g. by just iterating through mydatabasetable without assigning anything. If so, see whether you can change the assigning part; for example, some computations or operations could be done later on.
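pilotz's check can be made measurable with a small timing harness that walks the data twice: once without touching any field, and once with the per-row FieldByName assignments. This is a minimal sketch, assuming a Delphi console program. To keep it self-contained, an in-memory TClientDataSet (linked statically through the MidasLib unit) stands in for mydatabasetable. The Name/Age/Telephone fields and the MakeTestData, TimeBareLoop and TimeFullLoop routines are hypothetical, and absolute timings from the stand-in will differ from an ADO read.

```pascal
program TimeIteration;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, DB, DBClient, MidasLib;

// Hypothetical stand-in for mydatabasetable: an in-memory TClientDataSet
// filled with synthetic rows so the sketch runs without a database server.
function MakeTestData(RowCount: Integer): TClientDataSet;
var
  i: Integer;
begin
  Result := TClientDataSet.Create(nil);
  Result.FieldDefs.Add('Name', ftString, 50);
  Result.FieldDefs.Add('Age', ftInteger);
  Result.FieldDefs.Add('Telephone', ftInteger);
  Result.CreateDataSet;
  for i := 1 to RowCount do
    Result.AppendRecord(['Person ' + IntToStr(i), 20 + i mod 50, 5550000 + i]);
end;

// Walks the dataset without reading any field: pure cursor cost.
function TimeBareLoop(DataSet: TDataSet; out Rows: Integer): Cardinal;
var
  Start: Cardinal;
begin
  Rows := 0;
  Start := GetTickCount;
  DataSet.First;
  while not DataSet.Eof do
  begin
    Inc(Rows);
    DataSet.Next;
  end;
  Result := GetTickCount - Start;
end;

// Same walk, plus per-row field assignments like the loop in the question.
function TimeFullLoop(DataSet: TDataSet; out Rows: Integer): Cardinal;
var
  Start: Cardinal;
  Name: string;
  Age, Telephone: Integer;
begin
  Rows := 0;
  Start := GetTickCount;
  DataSet.First;
  while not DataSet.Eof do
  begin
    Name := DataSet.FieldByName('Name').AsString;
    Age := DataSet.FieldByName('Age').AsInteger;
    Telephone := DataSet.FieldByName('Telephone').AsInteger;
    Inc(Rows);
    DataSet.Next;
  end;
  Result := GetTickCount - Start;
end;

var
  Data: TClientDataSet;
  BareRows, FullRows: Integer;
  BareMs, FullMs: Cardinal;
begin
  Data := MakeTestData(150000);
  try
    BareMs := TimeBareLoop(Data, BareRows);
    FullMs := TimeFullLoop(Data, FullRows);
    Writeln(Format('bare loop: %d rows in %d ms', [BareRows, BareMs]));
    Writeln(Format('with field reads: %d rows in %d ms', [FullRows, FullMs]));
  finally
    Data.Free;
  end;
end.
```

If the bare loop already accounts for most of the time, the cost lies in moving the cursor (provider, network, attached controls) rather than in the assignments, and tuning the loop body will not help much.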
Hardi commented:
If you use mydatabasetable['fieldname'], change it to mydatabasetable.FieldByName('fieldname').As...; it may be a little faster.
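Hardi's suggestion can be taken one step further by calling FieldByName once per column before the loop and keeping the returned TField objects, instead of once per column per row (for three columns and 150,000 rows, 3 lookups instead of 450,000). Wrapping the loop in DisableControls/EnableControls stops any attached data-aware controls, such as a grid, from refreshing on every Next. This is a minimal sketch, assuming a Delphi console program with an in-memory TClientDataSet standing in for the real table. TPerson, TPersonArray, LoadPeople and the field names are hypothetical, and the rows go into a dynamic array that grows by doubling rather than one heap allocation per record.

```pascal
program CachedFieldsLoad;

{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient, MidasLib;

type
  // One in-memory row; the columns are hypothetical.
  TPerson = record
    Name: string;
    Age: Integer;
    Telephone: Integer;
  end;
  TPersonArray = array of TPerson;

// Copies every row of DataSet into Rows.
procedure LoadPeople(DataSet: TDataSet; var Rows: TPersonArray);
var
  FName, FAge, FTelephone: TField;
  Count: Integer;
begin
  Count := 0;
  SetLength(Rows, 1024);
  DataSet.DisableControls; // attached controls are not refreshed on every Next
  try
    // Look each field up once, not once per row.
    FName := DataSet.FieldByName('Name');
    FAge := DataSet.FieldByName('Age');
    FTelephone := DataSet.FieldByName('Telephone');
    DataSet.First;
    while not DataSet.Eof do
    begin
      if Count = Length(Rows) then
        SetLength(Rows, Count * 2); // grow by doubling, not per row
      Rows[Count].Name := FName.AsString;
      Rows[Count].Age := FAge.AsInteger;
      Rows[Count].Telephone := FTelephone.AsInteger;
      Inc(Count);
      DataSet.Next;
    end;
  finally
    DataSet.EnableControls;
  end;
  SetLength(Rows, Count); // drop the unused tail
end;

var
  Data: TClientDataSet;
  People: TPersonArray;
begin
  // Stand-in for the real table: an in-memory dataset with three rows.
  Data := TClientDataSet.Create(nil);
  try
    Data.FieldDefs.Add('Name', ftString, 50);
    Data.FieldDefs.Add('Age', ftInteger);
    Data.FieldDefs.Add('Telephone', ftInteger);
    Data.CreateDataSet;
    Data.AppendRecord(['Ann', 34, 5551234]);
    Data.AppendRecord(['Bob', 41, 5555678]);
    Data.AppendRecord(['Cy', 27, 5559012]);
    LoadPeople(Data, People);
    Writeln(Length(People), ' rows, first = ', People[0].Name);
  finally
    Data.Free;
  end;
end.
```

Against the real table the call would be LoadPeople(mydatabasetable, People), provided the field names match. If no data-aware controls are attached to the dataset, DisableControls will make little difference.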
riskassessor (Author) commented:
My code already implements both the above suggestions. Thanks.

Kristao commented:
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, DB, ADODB;

type
  TForm1 = class(TForm)
    ado: TADOQuery;
    Button1: TButton;
    procedure FormCreate(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    Buffer: TList;
  end;

type
  pDbData = ^rDbData;
  rDbData = record
    Name: string[255];
    Age: integer;
    Telephone: integer;
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
begin
  Buffer := TList.Create;
end;

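procedure TForm1.FormDestroy(Sender: TObject);
var
  i: Integer;
begin
  // Free every record allocated with New in Button1Click; the cast
  // tells Dispose which record type it is releasing.
  for i := 0 to Buffer.Count - 1 do
    Dispose(pDbData(Buffer[i]));
  FreeAndNil(Buffer);
end;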

procedure TForm1.Button1Click(Sender: TObject);
var
  k: pDbData;
begin
  ado.Open;
  while not (ado.Eof) do
  begin
    new(k);
    k.Name := ado.Fields.Fields[0].AsString;
    k.Age := ado.Fields.Fields[1].AsInteger;
    k.Telephone := ado.Fields.Fields[2].AsInteger;
    Buffer.Add(k);
    ado.Next;
  end;
end;

end.

Using ado.Fields.Fields[X].AsSomething may help a little. But the datasets are big: have you tried cutting them into more than one part? Then you could work with the parts in more than one thread.
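One way to cut a big table into parts is to give each part its own query over a slice of an integer key. This is a minimal sketch, assuming the table has an integer key whose values are spread fairly evenly; MyTable, ID, SplitKeyRange, TKeyRange and TKeyRangeArray are hypothetical names. If each part were then read in its own thread, each thread would typically need its own ADO connection and its own CoInitialize call, since ADO objects are COM objects.

```pascal
program SplitKeyRanges;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TKeyRange = record
    Lo, Hi: Integer;
  end;
  TKeyRangeArray = array of TKeyRange;

// Splits the inclusive key range Lo..Hi into at most Parts contiguous
// slices whose sizes differ by at most one key.
function SplitKeyRange(Lo, Hi, Parts: Integer): TKeyRangeArray;
var
  Total, Base, Extra, Start, Size, i: Integer;
begin
  Result := nil; // managed result: clear it explicitly
  Total := Hi - Lo + 1;
  if (Total <= 0) or (Parts <= 0) then
    Exit;
  if Parts > Total then
    Parts := Total;
  SetLength(Result, Parts);
  Base := Total div Parts;
  Extra := Total mod Parts; // the first Extra slices get one more key
  Start := Lo;
  for i := 0 to Parts - 1 do
  begin
    Size := Base;
    if i < Extra then
      Inc(Size);
    Result[i].Lo := Start;
    Result[i].Hi := Start + Size - 1;
    Inc(Start, Size);
  end;
end;

var
  Ranges: TKeyRangeArray;
  i: Integer;
begin
  // MyTable and ID are hypothetical table and key-column names.
  Ranges := SplitKeyRange(1, 150000, 4);
  for i := 0 to High(Ranges) do
    Writeln(Format(
      'SELECT Name, Age, Telephone FROM MyTable WHERE ID BETWEEN %d AND %d',
      [Ranges[i].Lo, Ranges[i].Hi]));
end.
```

For keys 1..150000 and 4 parts it prints four queries, the first ending in BETWEEN 1 AND 37500 and the last in BETWEEN 112501 AND 150000. Gaps in the key values make the parts uneven in row count.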
riskassessor (Author) commented:
Kristao, thanks for the algorithm and ideas.

The algorithm you propose is essentially the same as that already in my program.

Regarding your suggestion of working with more than one thread, I have not tried it, but I would not expect it to be any quicker on a single-processor machine.

In reply to several commenters, I find it makes little difference to the speed whether I use Fields.Fields[X].As..., FieldByName('fieldname').As... or FieldValues['fieldname'].
Kristao commented:
OK, if the datasets are big, I suppose you need to use the data that are in them. This idea could make your software a little quicker:

One thread reads data from the dataset into a buffer.

A second thread takes data from the buffer and works with them; this way you don't need to wait until the whole dataset is loaded in memory.

There will be a little contention in dataput(data:pointer) and dataget(var data:pointer), because in a multithreaded program both must be guarded with a TCriticalSection:
> "TCriticalSection allows a thread in a multi-threaded application to temporarily block other threads from accessing a block of code."
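The reader/worker arrangement described above can be sketched as a console program. The main thread plays the reader and pushes row pointers through DataPut; WorkerCount TThread descendants pull them through DataGet (Pascal is case-insensitive, so these match Kristao's dataput/dataget); both routines are guarded by one TCriticalSection. This is a minimal sketch under several assumptions: the "rows" are fabricated integers allocated with New instead of dataset records; a nil pointer, queued once per worker, marks the end of the data; and a worker that finds the buffer empty simply sleeps 1 ms, where an event or semaphore would avoid the polling. On a single CPU any gain would come from overlapping time spent waiting on the database with computation, not from parallel computation.

```pascal
program ReaderWorkers;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes, SyncObjs;

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  public
    Processed: Integer;
    AgeSum: Int64;
  end;

const
  RowCount = 150000;
  WorkerCount = 4;

var
  Lock: TCriticalSection; // guards Items and Head
  Items: TList;           // shared FIFO of row pointers
  Head: Integer;          // index of the next item to hand out

procedure DataPut(Data: Pointer);
begin
  Lock.Enter;
  try
    Items.Add(Data);
  finally
    Lock.Leave;
  end;
end;

function DataGet(var Data: Pointer): Boolean; // False if nothing is queued
begin
  Lock.Enter;
  try
    Result := Head < Items.Count;
    if Result then
    begin
      Data := Items[Head];
      Inc(Head);
    end;
  finally
    Lock.Leave;
  end;
end;

procedure TWorker.Execute;
var
  P: Pointer;
begin
  while True do
    if not DataGet(P) then
      Sleep(1) // nothing queued yet: let the reader catch up
    else if P = nil then
      Exit // end-of-data marker from the reader
    else
    begin
      Inc(Processed);
      Inc(AgeSum, PInteger(P)^); // stand-in for the real computation
      Dispose(PInteger(P));
    end;
end;

var
  Workers: array[0..WorkerCount - 1] of TWorker;
  Row: PInteger;
  i, Total: Integer;
  AgeTotal: Int64;
begin
  Lock := TCriticalSection.Create;
  Items := TList.Create;
  try
    for i := 0 to WorkerCount - 1 do
      Workers[i] := TWorker.Create(False); // starts running at once
    for i := 1 to RowCount do // reader: the real program walks the dataset here
    begin
      New(Row);
      Row^ := i mod 100; // hypothetical "age" value for row i
      DataPut(Row);
    end;
    for i := 0 to WorkerCount - 1 do
      DataPut(nil); // one end-of-data marker per worker
    Total := 0;
    AgeTotal := 0;
    for i := 0 to WorkerCount - 1 do
    begin
      Workers[i].WaitFor;
      Inc(Total, Workers[i].Processed);
      Inc(AgeTotal, Workers[i].AgeSum);
      Workers[i].Free;
    end;
    Writeln(Format('%d rows processed, age sum %d', [Total, AgeTotal]));
  finally
    Items.Free;
    Lock.Free;
  end;
end.
```

It should print "150000 rows processed, age sum 7425000". Moving the real dataset reading into a background thread, instead of the main thread, would additionally need CoInitialize and a separate ADO connection in that thread.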

I'm using this kind of technique myself. My software gets a very big dataset, with more than 80,000 records in it, and I can't wait until all the data are in memory. I start reading the dataset and put the info in a buffer; other threads just take the data from it and start working with them :). In my case there is 1 data reader and up to 10 data workers :)

regards
Kristao.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.