  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 600

file of records

I made a database "type" program a while back that can store up to 1 million records in a file as big as 2 GB.  I use it for many things and it has worked very well so far.

What I am now doing is going back into the code and adding some new parts to it - one of the parts I'd like to add is the ability to delete a record within the file and move the other records up.

I tried the usual approach of seeking to the record to be deleted, loading all of the records after it into memory, truncating the file and then writing the records in memory back into the file.

The problem is that it uses a lot of memory, and sometimes I get the "out of memory" message depending on what else I have running at the same time.

Anyone got a way to do the same thing, perhaps by using a temporary file, etc?
Asked by: mi6agent
1 Solution
 
kretzschmarCommented:
You may take a look at my solution at
http://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_20147576.html#6273793

It does not fit your needs exactly, but I hope it shows one way.

meikl ;-)
 
mi6agentAuthor Commented:
Thanks meikl - I used all of that code to teach myself along the way while creating my database program.

The problem is that loading the records into memory (as in the example you provided) is no longer possible.  That is why I thought maybe making a temporary file would be a good idea?

 
VoodoomanCommented:
Comment only

Hi

The way this is normally done, as you say, is by writing temp files.

Read 'x' amount of the file and then append it to your temp file; when finished, trash the old file and rename the temp file with the original file name.

You only need to look at hard drive activity on a PC where you are doing bulk deletes to realise that this is what is going on.  This partly explains why bulk deleting records from a table is so time consuming - due to the high levels of reads/writes.

You should only get out of memory messages if you are low on disk space, your temp directory is cluttered up with files (there is a limit on files per directory - I am not sure how many), or your swap file is too small.

Voodooman
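
For illustration, a rough sketch of the chunked temp-file approach described above, assuming fixed-length records of RecSize bytes; the procedure and names below are made up for the example (not mi6agent's actual code) and error handling is omitted:

procedure DeleteViaTempFile(const FileName: string; RecSize, RecNoToDelete: Integer);
const
  ChunkRecs = 4096;                  // copy up to 4096 records per read/write
var
  Src, Dst: file;                    // untyped files, opened with an explicit record size
  Buf: array of Byte;                // holds one chunk's worth of records
  RecNo, Want, Got: Integer;
begin
  SetLength(Buf, ChunkRecs * RecSize);
  AssignFile(Src, FileName);
  AssignFile(Dst, FileName + '.tmp');
  Reset(Src, RecSize);               // RecSize becomes the unit for Seek/BlockRead
  Rewrite(Dst, RecSize);
  try
    // copy the records before the deleted one, chunk by chunk
    RecNo := 0;
    while RecNo < RecNoToDelete do
    begin
      Want := RecNoToDelete - RecNo;
      if Want > ChunkRecs then Want := ChunkRecs;
      BlockRead(Src, Buf[0], Want, Got);
      if Got = 0 then Break;
      BlockWrite(Dst, Buf[0], Got);
      Inc(RecNo, Got);
    end;
    // skip the deleted record, then copy everything after it
    Seek(Src, RecNoToDelete + 1);
    repeat
      BlockRead(Src, Buf[0], ChunkRecs, Got);
      if Got > 0 then
        BlockWrite(Dst, Buf[0], Got);
    until Got = 0;
  finally
    CloseFile(Src);
    CloseFile(Dst);
  end;
  // replace the original with the compacted copy (DeleteFile/RenameFile are in SysUtils)
  DeleteFile(FileName);
  RenameFile(FileName + '.tmp', FileName);
end;

Only one chunk-sized buffer is ever in memory, so the copy never comes near the "out of memory" situation described in the question.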

 
fsurferCommented:
Hi,

Why spend time on such a "record move" every time you delete a record?

I would suggest creating a kind of index, parallel to your big file (or even within your big file), of all your records, marking the deleted ones as deleted.

The thing would then be to have a kind of maintenance tool, or command, to coalesce your file: rebuilding it using the "temporary" solution mentioned by Voodooman.

I'm thinking, in fact, of the way Oracle works: splitting records and, when chaining is above a certain percentage, applying ALTER TABLESPACE nnn COALESCE.

The thing would be to modify your "file" driver to keep the chaining value correctly set, and then have an alert or something to tell you to use the tool...

Just an idea...

Hope this helps,

Best regards,

Guillaume
 
mi6agentAuthor Commented:
That would be my goal - to mark a record as deleted so I can reuse a record marked as deleted instead of inserting a new record structure.

I'm not familiar with Oracle though.

I am looking to actually delete the marked records only when I load the file - which I could then show a progress bar for.

But it is working out how to make use of a temporary file that has me lost.
 
VoodoomanCommented:
Hi

I agree with Guillaume regarding the index idea.

In the old DOS databases it was normal to mark a record as **DELETED** and then periodically purge the deleted records.

This is actually still the case.  When you look at MS Access, the space is not freed unless you do a database compact - this is the same as 'purging' in the old databases.

Obviously you are able to edit your recs.  Why not just add a field 'Status' to the start or end of each rec with a 0 or 1 to indicate whether the rec is deleted or not?

You could pretty obviously loop through the recs, adding them to a string (where they are not empty).  Then every x recs (your approximation of 64k/128k or whatever you deem practical) append the string to your temp file.

Someone is going to point out that this is very inefficient (which it is) but it does approximate in some ways to writing 'pages' of data.

Before someone jumps on this one, remember that the problem is that you would need to loop through the recs to find the empty one or the one marked **DELETED**.  Another way would be simply to grab a chunk of the file and test for any instance of **DELETED** in the chunk.  If there is no **DELETED** in the chunk, write it; if there is an instance of **DELETED**, write the chunk to another temp so you can loop through and clean it up.

Just some theoretical ideas (if you don't try you don't get!!)

Voodooman
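
As a small illustration of that Status-field idea, here is a sketch of a fixed-length record carrying a deleted marker, plus a routine that "gravestones" a record in place; the type and field names are invented for the example and are not mi6agent's actual layout:

type
  // hypothetical fixed-length record; Deleted is the gravestone field
  TItemRec = packed record
    RecNo   : Integer;                    // record number stored inside the record itself
    Deleted : Boolean;                    // True = gravestone, the slot can be reused later
    Name    : array[0..63] of AnsiChar;   // fixed-size payload keeps every record the same length
  end;
  TItemFile = file of TItemRec;

// mark one record as deleted in place - nothing else in the file moves
// (F is assumed to be already open with Reset)
procedure MarkDeleted(var F: TItemFile; RecNo: Integer);
var
  R: TItemRec;
begin
  Seek(F, RecNo);     // typed files are addressed by record number
  Read(F, R);
  R.Deleted := True;
  Seek(F, RecNo);     // Read advanced the file position, so step back
  Write(F, R);
end;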

 
fsurferCommented:
Voodooman,

I would proceed in an easier way:

Each index record in the maintained index file would have, besides the index value and the offset in the file, a field marking the record as deleted or not.

Then, to reconstruct or coalesce the file, the easiest way would be to read all the index records, duplicate all the undeleted entries into a temporary file, and then rename the temp file and write a new index file.

I think, if I remember correctly, that is the way dBase worked.

HTH,

Best regards,

Guillaume
 
VoodoomanCommented:
Hi

I agree with what you are saying, Guillaume, to an extent.  The real problem is that you don't want to read and especially write every single record, because it would be very, very slow.

I remember before data-bound grids existed, when we had to fill the grids manually from the recordsets - it was very, very slow.

Very interesting topic - basically it's 'how to create a database' - personally I would rather just buy one.....

Still interesting

Voodooman
 
Slick812Commented:
Hello mi6agent.  Although you could create a temp file to hold the records you want to move forward, what I would try (since you say this is just an out-of-memory problem for larger files) is to read into memory and move (write) smaller blocks of the file several times, using some amount of memory (maybe 10 MB) that will not tax your system.  If you are just deleting a record, then all record movement would be forward, so any records in the file after the last read-and-move would still have valid data in them to copy and move up in the next FOR or WHILE loop.
 
fsurferCommented:
voodooman,

Well, the question would be: do the records have to be moved "while in use", like the defragmentation process for un*x filesystems?

If that's the case, then eventually, yes, use an existing DB. Or, if you still want to use your "big datafile", mi6agent, maybe have a process that does the work in the background... but then it's reinventing the wheel, and you might as well use a DB that already does it :)

In order to prevent memory failure, what I would do, mi6agent, is (see the sketch after this comment):
1. When loading the app, build - by searching through the file, or from a saved index - an index of all the records (index value, file offset, deleted flag).
2. When accessing data, use that index in memory. If your index value is an integer and the offset a cardinal, that makes 8 bytes per entry, hence just above 8 MB of memory for a million records.
2b. If deleting data, just mark the index entry as deleted.
3. When closing your app, rewrite the index file plus the deleted records in the datafile.

Point 3 can be done by an external maintenance tool provided with your app.

Best regards,

Guillaume
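
To put a number on the entry size mentioned in point 2, here is a minimal sketch of such an in-memory index; the type and names are assumptions for the example (with the deleted flag included, a packed entry is 9 bytes rather than 8):

type
  // hypothetical in-memory index entry
  TIndexEntry = packed record
    RecNo   : Integer;    // the index value (record number)
    Offset  : Cardinal;   // byte offset of the record in the data file
    Deleted : Boolean;    // gravestone flag, kept in the index until the next compact
  end;

var
  Index: array of TIndexEntry;   // roughly 9 MB for a million records

// deleting a record only touches the in-memory index;
// the data file itself is only rewritten later, during the maintenance/compact step
procedure DeleteByIndex(RecNo: Integer);
begin
  Index[RecNo].Deleted := True;
end;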
 
aikimarkCommented:
This could be a HUGE discussion thread if the topic were covered completely.  Although the question was specific about deleted records, this thread seems to have expanded.  So I figured it was time to opine...

Problems:
1. Very large files can/will exceed the range of the 32-bit (long integer) record pointers normally used by the standard file access I/O routines and Windows API calls.  This can be worked around in Delphi (I know the DLSuperC author did it).

2. Reading a very large file into memory isn't feasible or even possible sometimes.  Exceeding available physical memory will tap into virtual memory, which adversely affects entire system performance.  It's possible that there isn't enough physical and virtual memory to accommodate all the records.  Usually, very large files provide a sliding view of the records in the buffer.  A balance is struck between physical memory allocation, I/O operations, and application performance.
Hint: look at some of the listbox controls that provide a virtual view of a file.  Look at some of the MemTable features.

3. The experts in this discussion have already mentioned logical record deletions with later physical deletions.  In addition to taking up space on disk and not meeting your application requirement to be able to delete records, these deleted records consume/waste I/O cycles and buffer space.  You need to decide how much this hurts your application and provide some space relief while achieving some balance of the performance (and capacity) factors.  There are systems, like Btrieve, that reuse deleted record space.

4. Typical sequential file systems use a generational approach (the current master file spawns a new master file with each run of the maintenance application -- add/change/delete records).  However, it can be very expensive to have multiple 2 GB files on your hard drive.

Questions (these sure do bring back memories):
Q1. Is all/some of the record access sequential?
Q2. Is some of the record access direct (position, record number, record key)?
Q3. How are new records added?
Q4a. Where are new records added?  
Q4b. Does the physical record placement matter?
Q5. Why would you need an Index?
Q6. How often would the index be used to access the records?
Q7. What criteria (if any) will you use to judge index performance and need to consolidate space?
Q8. Can deleted record space be reused?

General comments:
G1. Don't reinvent the database (wheel).
G2. Look to the past if this is a sequential system.
G3. If you have fixed length records, you can group multiple records into readable/writeable blocks.  Most databases use a 4K or 2K blocksize.
G4. If new records can be added anywhere in the file, consider adding a freespace factor for your record 'blocks'.
G5. You can reorganize (deleted record space consolidation) your file space by the block instead of cleaning up (compacting) the entire file.
G6. If you have variable length records, consider splitting the fixed part from the variable part of the records.
G7. You can move many records at a time by copying the memory, rather than copying individual records.
G8. Beware of deleting records in front-to-back fashion.
 
mi6agentAuthor Commented:
>This could be a HUGE discussion thread if the topic were covered completely.

Sheesh, so I noticed :)  I was just looking for a quick way of removing deleted records and "compacting" the database file.  I've seen others use the same "file of records" idea and just wanted to implement some of the same ideas for my own program too.

As to the questions:

Q1. Is all/some of the record access sequential?

Not really, the user can access the records in any order they wish - one at a time.

Q2. Is some of the record access direct (position, record number, record key)?

Access is via record number

Q3. How are new records added? And Q4a. Where are new records added?  

At the end of the file - unless a deleted record is found, in which case the new record is inserted there.

Q4b. Does the physical record placement matter?

Not sure what you mean here - but each record has its own record number saved in the record itself.

Q5. Why would you need an Index?

I don't.

Q6. How often would the index be used to access the records?, and Q7. What criteria (if any) will you use to judge index performance and need to consolidate space?

I don't use an index - it's only a simple file of records.

Q8. Can deleted record space be reused?

Yes, I mark a deleted record with a "gravestone" marker and reuse it when a record is added to the file.
 
DragonSlayerCommented:
> Yes, i mark a deleted record with a "gravestone" marker and reuse it when a record is added to the file.

In such a case, you might also need to maintain an INDEX of deleted records? Otherwise, your INSERTion operations will be *slow*, especially if you need to traverse through all the records to see if you can reuse the space.

If, on the other hand, you do not support reuse, the operation will be faster, because you will keep appending records to the end and only COMPACT the file when the user elects to do so.



DragonSlayer.
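
For illustration, a minimal sketch of such a list of deleted records, kept so an INSERT can reuse a gravestone slot without scanning the whole file; all names here are invented for the example:

var
  FreeSlots: array of Integer;   // record numbers of gravestoned records, e.g. rebuilt when the file is loaded

// pick a reusable slot if one exists, otherwise append at the end of the file
function SlotForNewRecord(TotalRecords: Integer): Integer;
begin
  if Length(FreeSlots) > 0 then
  begin
    Result := FreeSlots[High(FreeSlots)];           // reuse the most recently deleted slot
    SetLength(FreeSlots, Length(FreeSlots) - 1);    // and pop it from the list
  end
  else
    Result := TotalRecords;                         // no gravestones left: append as a new record
end;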
 
mi6agentAuthor Commented:
> If, on the other hand, you do not support reuse, the operation will be faster, because you will keep appending records to the end and only COMPACT the file when the user elects to do so.


Which brings me full circle - back to my original question - how to effectively compact such a file :)

 
DragonSlayerCommented:
mi6agent, check this out http://www.nexusdb.com/showpage.asp?Id=122

The embedded version of NexusDB has just been released as freeware (without source), and if your project does not really require you to reinvent the wheel, you might just give this a try...
 
fsurferCommented:
mi6Agent,

The main idea, I think, to compact your database file would be the following:
1. Create an index file. Your index entry would be a record of:
- index value (i.e. record number)
- offset of the record in the data file
- offset of the index entry in the index file
- deleted flag

2. At your application startup, load the index file into memory. You'll have a bit of memory overhead, but far less than loading your datafile completely.

3. Each time you want to access a record (adding, modifying, *deleting*), use the index in memory. It is much more efficient for accessing the data and/or checking the deleted flag. Don't forget to write modified index entries to the index file (at least maintain the index file).

4. What you can do is maintain a ratio of (deleted records) / (valid records). If this ratio goes above some threshold, launch a warning dialog asking for a maintenance (compacting) operation - see the sketch after this comment.

5. The maintenance operation would proceed the following way:
- load the index file
- create a new data file (temporary)
- create a new index file (temporary)
- parse the index and copy several valid records at a time (BlockReads?) to the temporary file
- update the new index file
- when done, delete the old index and data files and rename the new ones

This is theoretical, of course ;)

HTH,

Best regards,

Guillaume
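
A minimal sketch of the ratio check in point 4, assuming the program already keeps counts of deleted and valid records; the names and the 25% threshold are made up for the example:

const
  CompactThreshold = 0.25;   // hypothetical: suggest compacting once deleted records reach 25% of valid ones

// returns True when a maintenance (compacting) run is worth suggesting to the user
function NeedsCompacting(DeletedCount, ValidCount: Integer): Boolean;
begin
  Result := (ValidCount > 0) and
            ((DeletedCount / ValidCount) >= CompactThreshold);
end;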
 
fsurferCommented:
dragonslayer,

(sorry, I was typing my message while you posted yours ;)

The thing, if I've understood mi6agent's problem correctly, is that his application is already running and he wants to improve it.

mi6agent, consider one thing:
Is your application too large to consider replacing the database? Would you spend too much time translating your application to a new database - and an efficient one :) - to be worth the cost of such a change? Unless this "database stuff" is for your own amusement :)

Guillaume
 
mi6agentAuthor Commented:
> The thing, if I've understood mi6agent's problem correctly, is that his application is already running and he wants to improve it.

Yep, I just want to improve it a little.

The whole idea behind my database is not one of super-speed, etc.  I don't expect a file of records to compete with the likes of Oracle or Access, etc., and I don't mind that it might take 10-20 ms to find and display a record from 1 million records.

I know databases such as Oracle, MS Access, etc. are all better at being databases than mine ever could be, but then I don't really need the power of those databases for my application.

Also, I know that Access does not actually delete anything until you "compact" it, and that when you do "compact" the database it can take a while depending on how long it was between "compact" operations and how much "garbage" it has to delete, etc.

That was my idea - to delete records that have been marked as such when the user selects to "compact" my database... in essence they would know it was not going to be a super-fast operation - just like Access isn't super-fast at doing it.

What I was looking for was a way to cut out all the "marked" records so that it would compact the file by removing them.



 
kretzschmarCommented:
var
  sourceF, DestF : File of TYourRecord;
  YourRecord : TYourRecord;
begin
  assignfile(sourceF,'YourFileName.Dat');
  assignFile(destF,'YourFileName.tmp');
  reset(sourceF);
  rewrite(DestF);

  while not eof(sourceF) do
  begin
     read(sourceF,YourRecord);
     if not YourRecord.Deleted then
       write(destF,YourRecord);
  end;
 
  CloseFile(sourceF);
  CloseFile(destF);

  // the files must be closed before the old file can be deleted and the temp file renamed
  DeleteFile('YourFileName.Dat');
  RenameFile('YourFileName.tmp','YourFileName.Dat');
end;

Just off the top of my head.

meikl ;-)
 
mi6agentAuthor Commented:
Thank you meikl

Your example gave me a good starting point to work from.

Thanks to everyone else for their input too - it was interesting and in some ways, well over my head :)
 
aikimarkCommented:
one last post-acceptance comment for future readers

G9. It is possible to get both hard drive and memory relief by storing the records in a compressed state.  Only as the records (blocks) are read are they decompressed.  With the gap between I/O speed and CPU speed widening (I/O is still relatively as slow as it was a decade ago, while CPU speeds have continued to climb), compression is a good technique to consider as a compensation or performance/tuning strategy.
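
As a rough illustration of G9, a sketch that compresses an in-memory block of records with Delphi's ZLib unit (System.ZLib in recent versions) before it is written to disk; the function name and the idea of writing whole blocks are assumptions, not part of the asker's program:

uses
  Classes, ZLib;

// compress a block of records held in memory; the caller writes the result to disk
function CompressBlock(const Buf; Size: Integer): TMemoryStream;
var
  Z: TCompressionStream;
begin
  Result := TMemoryStream.Create;
  Z := TCompressionStream.Create(clDefault, Result);   // compressed bytes land in Result
  try
    Z.WriteBuffer(Buf, Size);
  finally
    Z.Free;   // freeing the stream flushes the remaining compressed data into Result
  end;
end;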

=============
Definition clarification:
* record number is an ordinal number of the non-deleted records in your file, usually starting with 1 or 0 and extending up to (NumOfRecords-NumOfDeletedRecords).
* record key is a unique identifier for each record, independent from its position in the file.

=============
Indexes are most commonly used for the following (some already mentioned in comments above):
I1. Speed or facilitate direct access to a record
I2. Provide access to records in a sequence that is different than the physical order of the records
I3. Point to a subset of the records meeting some selection criteria (this would also include the marking of deleted records).
I4. Provide a unique-record-key mechanism to prevent duplicate key values.
 
Slick812Commented:
Here is some code that will compact a single file without rewriting the entire file. This is just an example, where TAccount is just a record of 128-byte size. Meikl's method is certainly easier, but it will rewrite the entire file, even for a single record delete.



procedure TForm1.sbut_CompactAccountsClick(Sender: TObject);
const
  memSize = 2048; // set your memory use size here, 2 KB is extremely low, just for example
var
  account1: TAccount;
  i, filePos, recAmt, Dropped, SizeF, nextPos, getAmt, SeekPos: Integer;
  pAccMem: Pointer;
  accFile: File of TAccount;
begin
  if not FileExists('E:\Accounts.arf') then Exit;
  recAmt := memSize div SizeOf(TAccount);
  AssignFile(accFile, 'E:\Accounts.arf');
  try
    Reset(accFile);
    Dropped := 0;
    SizeF := FileSize(accFile)-1;

    GetMem(pAccMem, recAmt * SizeOf(TAccount));
    try
      for i := 0 to SizeF do
      begin
        Read(accFile, account1);
        // test for the Delete flag to move the records
        if account1.Delete then
        begin
          ShowMessage(IntToStr(i)+' Drop '+IntToStr(Dropped));
          filePos := i;
          nextPos := filePos;
          Inc(Dropped); // Dropped has the number of records Deleted

          repeat
            // search the next records to see if they are also Deletes
            if i+Dropped > SizeF then Break;
            Read(accFile, account1);
            if account1.Delete then
            begin
              Inc(Dropped);
              Inc(nextPos); // nextPos will have the number of sequential records to delete
            end else Break;
          until not account1.Delete;

          nextPos := (nextPos-filePos)+1;
          if (SizeF) - ((filePos+nextPos)) > recAmt-1 then
            getAmt := recAmt
          else // reduce the number of records moved at the end of the file
            getAmt := (SizeF+1) - ((filePos+nextPos));
          Seek(accFile, (filePos+nextPos));

          if getAmt > 0 then
          begin
            // here I read the file records into memory for the move
            BlockRead(accFile, pAccMem^, getAmt);
            Seek(accFile, filePos);
            BlockWrite(accFile, pAccMem^, getAmt);
            // I move the file pos back and write mem to file
            SeekPos := filePos+nextPos+1;

            while SeekPos < SizeF do
            begin
              if getAmt <> recAmt then Break;
              // this code moves all of the rest of the file forward in mem-block-size chunks
              SeekPos := SeekPos + getAmt;

              if SizeF - (SeekPos+1) > recAmt-1 then
                getAmt := recAmt
              else // adjust move size at end of file
                getAmt := (SizeF+1) - (SeekPos);
              if getAmt < 1 then Break;
              Seek(accFile, SeekPos);
              BlockRead(accFile, pAccMem^, getAmt);
              Seek(accFile, SeekPos-nextPos);
              BlockWrite(accFile, pAccMem^, getAmt);
            end;
          end;
          Seek(accFile, filePos+1); // move file pos back to original to continue the FOR loop
        end;
        if i >= SizeF - Dropped then Break;
      end;
    finally
      FreeMem(pAccMem);
    end;
    if Dropped > 0 then
    begin
      Seek(accFile, (SizeF+1) - Dropped);
      Truncate(accFile);
    end;
  finally
    CloseFile(accFile);
  end;
end;