tanhl asked:

Efficiently manipulating records (lines) in a file

Hi all,

I am trying to implement an order database that stores records (typically ~500 records), one per line.

e.g.

order: 1 [type: 1; date: xxxxx; price: $xxx]
order: 2 [type: 4; date: xxxxx; price: $xxxxx]
order: 3 [type: 3; date: xxxxx; price: $xx]
..
order: x [type: 3; date: xxxxx; price: $xxxxxxx]


I would like to efficiently alter the content of, for example, line #2 without reading all records into an array of order structures. How can I do this?

Cheers,
hl
Topics: C, COBOL

Last comment by jmcg: 8/22/2022 (Mon)
ASKER CERTIFIED SOLUTION
imladris

SOLUTION
gj62

Kent Olsen

gj62 is correct when he says that this is a poor file structure for this type of application.

But if you're committed to this, you have two "real" choices.

1) Preset all the records to the same length so that you CAN change them in place. Actually changing the data is pretty simple if the record size doesn't change.

2) Map the file into paged memory (mmap() on Unix systems). Then, instead of doing read()s and write()s, you can treat the entire file as just a big array and let the system's page handler do all of the I/O behind the scenes.
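A minimal sketch of option 2, assuming a POSIX system and a file made of fixed-length lines (rec_len bytes each, counting the '\n'). The function name mmap_replace_record is hypothetical. Once the file is mapped, replacing record n is just a memcpy at offset n * rec_len:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Overwrite record n (0-based) in a file of fixed-length records by
 * mapping the whole file and writing into the mapping.
 * new_rec must be exactly rec_len bytes. Returns 0 on success, -1 on error.
 */
int mmap_replace_record(const char *path, size_t n, size_t rec_len,
                        const char *new_rec)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || rec_len == 0 ||
        (n + 1) * rec_len > (size_t)st.st_size) {
        close(fd);
        return -1;                      /* record n is past the end of the file */
    }

    size_t size = (size_t)st.st_size;
    char *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping stays valid after close() */
    if (base == MAP_FAILED)
        return -1;

    /* The file is now "just a big array": record n starts at n * rec_len. */
    memcpy(base + n * rec_len, new_rec, rec_len);

    int rc = msync(base, size, MS_SYNC);   /* push the change to the file */
    munmap(base, size);
    return rc == 0 ? 0 : -1;
}
```

A mapping cannot grow the file, so appending new records would still need write() or ftruncate() followed by a fresh mmap().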

Kdo

TheBeaver

You could try to implement an index. Do this by having a separate file that has simple records of just position and length.
- Every time you add a record to the data file, record its position and length in the index file.
- When searching for the nth record, pull out the nth entry in the index. This will give you the position to jump to in the data file.
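A sketch of the index idea, assuming the index file holds raw binary (pos, len) pairs and that the caller opens both files in "w+b" (or "r+b") mode. IndexEntry, idx_append and idx_read are illustrative names, not something from the thread:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical index entry: where a record starts and how long it is. */
typedef struct {
    long pos;
    long len;
} IndexEntry;

/* Append a record to the data file and its (pos, len) to the index file. */
int idx_append(FILE *data, FILE *index, const char *rec)
{
    IndexEntry e;
    if (fseek(data, 0, SEEK_END) != 0 || (e.pos = ftell(data)) < 0)
        return -1;
    e.len = (long)strlen(rec);
    if (fwrite(rec, 1, (size_t)e.len, data) != (size_t)e.len)
        return -1;
    if (fseek(index, 0, SEEK_END) != 0 ||
        fwrite(&e, sizeof e, 1, index) != 1)
        return -1;
    return 0;
}

/* Read record n (0-based) into buf via the index. Returns its length or -1. */
long idx_read(FILE *data, FILE *index, long n, char *buf, size_t bufsize)
{
    IndexEntry e;
    /* The nth index entry sits at a fixed offset: n * sizeof(IndexEntry). */
    if (fseek(index, n * (long)sizeof e, SEEK_SET) != 0 ||
        fread(&e, sizeof e, 1, index) != 1)
        return -1;                      /* no such entry */
    if (e.len < 0 || (size_t)e.len >= bufsize ||
        fseek(data, e.pos, SEEK_SET) != 0 ||
        fread(buf, 1, (size_t)e.len, data) != (size_t)e.len)
        return -1;
    buf[e.len] = '\0';
    return e.len;
}
```

Binary mode keeps ftell() offsets equal to byte counts on platforms that translate newlines in text mode.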

This method DOES allow records of different lengths, BUT if you want to change the records, you need to...
- Add a record-status field to the data: 'A' for active or 'D' for deleted.
- Every time a record is changed, add it as a new record at the end of the data file. Then change the index entry to point to the new position (this maintains the order). Then mark the old record as 'D' for deleted.
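The update steps could look like the following sketch. It assumes every record begins with the one-byte status field ('A' or 'D') and reuses the same hypothetical (pos, len) index layout; idx_update is an illustrative name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical index entry, same layout as a plain (pos, len) index. */
typedef struct {
    long pos;
    long len;
} IndexEntry;

/*
 * Replace record n: append a new 'A' copy, repoint index entry n at it,
 * then flip the old copy's status byte to 'D'.
 * new_rec is the record text WITHOUT the leading status byte.
 */
int idx_update(FILE *data, FILE *index, long n, const char *new_rec)
{
    IndexEntry old, e;
    long slot = n * (long)sizeof(IndexEntry);

    if (fseek(index, slot, SEEK_SET) != 0 ||
        fread(&old, sizeof old, 1, index) != 1)
        return -1;                      /* no record n */

    /* 1. Append the new version, status 'A', at the end of the data file. */
    if (fseek(data, 0, SEEK_END) != 0 || (e.pos = ftell(data)) < 0)
        return -1;
    e.len = 1 + (long)strlen(new_rec);
    if (fputc('A', data) == EOF || fputs(new_rec, data) == EOF)
        return -1;

    /* 2. Point entry n at the new copy; the slot (and so the order) is unchanged. */
    if (fseek(index, slot, SEEK_SET) != 0 ||
        fwrite(&e, sizeof e, 1, index) != 1)
        return -1;

    /* 3. Mark the old copy deleted by overwriting its status byte. */
    if (fseek(data, old.pos, SEEK_SET) != 0 || fputc('D', data) == EOF)
        return -1;
    return (fflush(data) == 0 && fflush(index) == 0) ? 0 : -1;
}
```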

The downside of the above is that the data file will fill up with deleted records. So write a routine that goes through the data file, copying only the active records to a new file (and rebuilding the index, since the record positions change). You would not have to run this cleanup very often.
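A sketch of the cleanup routine under the same assumptions. Rather than scanning the data file for 'A' records, this variant walks the index in order and copies each record it points to, which also yields a fresh index whose offsets match the compacted file. The name compact and the 4096-byte record limit are arbitrary choices for the sketch:

```c
#include <stdio.h>

/* Hypothetical index entry, same layout as a plain (pos, len) index. */
typedef struct {
    long pos;
    long len;
} IndexEntry;

/*
 * Copy every record the index still points at into new_data, in index
 * order, writing matching entries to new_index. Deleted copies are dropped
 * because no index entry refers to them.
 * Returns the number of records copied, or -1 on error.
 */
long compact(FILE *data, FILE *index, FILE *new_data, FILE *new_index)
{
    IndexEntry e;
    char buf[4096];                     /* sketch limit: records <= 4096 bytes */
    long count = 0;

    if (fseek(index, 0, SEEK_SET) != 0)
        return -1;

    while (fread(&e, sizeof e, 1, index) == 1) {
        if (e.len < 0 || e.len > (long)sizeof buf ||
            fseek(data, e.pos, SEEK_SET) != 0 ||
            fread(buf, 1, (size_t)e.len, data) != (size_t)e.len)
            return -1;

        /* Same length, new offset inside the compacted data file. */
        IndexEntry ne = { ftell(new_data), e.len };
        if (ne.pos < 0 ||
            fwrite(buf, 1, (size_t)e.len, new_data) != (size_t)e.len ||
            fwrite(&ne, sizeof ne, 1, new_index) != 1)
            return -1;
        count++;
    }
    return ferror(index) ? -1 : count;
}
```

When it succeeds, the caller can close all four files and rename() the new pair over the old one.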

Alternatively, you could keep the deleted records as a history of the changes that have been made. The best way to do this is to add a field that refers back to the original (now deleted) entry whenever you change a record.
tanhl

ASKER
I need to modify the record as an entire line, not a specific field (e.g. price) within each line. Do I need a constant number of bytes for all records (lines) to make the job easier?

I might have an identical number of bytes for each record (line).

thanks!
hl
TheBeaver

If you have a constant record size, then you can replace the record by...
1) opening the file for update with mode "r+" ("rw" is not a standard fopen mode)
2) using fseek to jump to the nth record (offset n * recordsize)
3) writing the record
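Those three steps, sketched in C for fixed-length ASCII lines. The function name replace_record and the bounds check are additions for illustration. "r+b" opens an existing file for reading and writing without truncating it, and binary mode stops Windows from turning "\n" into "\r\n", which would break the n * recordsize arithmetic:

```c
#include <stdio.h>
#include <string.h>

/*
 * Overwrite line n (0-based) of a file whose lines are all exactly
 * rec_len bytes long, counting the trailing '\n'.
 * Returns 0 on success, -1 on error.
 */
int replace_record(const char *path, long n, long rec_len, const char *new_rec)
{
    /* A record of any other length would overwrite part of its neighbour. */
    if (n < 0 || rec_len <= 0 || (long)strlen(new_rec) != rec_len)
        return -1;

    FILE *f = fopen(path, "r+b");               /* 1) open for update */
    if (f == NULL)
        return -1;

    int rc = 0;
    long size;
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        (n + 1) * rec_len > size ||             /* record n must already exist */
        fseek(f, n * rec_len, SEEK_SET) != 0 || /* 2) jump to record n */
        fwrite(new_rec, 1, (size_t)rec_len, f) != (size_t)rec_len) /* 3) write */
        rc = -1;
    if (fclose(f) != 0)                         /* fclose() also flushes the write */
        rc = -1;
    return rc;
}
```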
Kocil

// Well, just do it with fixed-length records.
// For 500 records that's not a problem:
// 500 * sizeof(DbRecord) is roughly 10 KB.

#include <stdio.h>

//
// Header File
typedef struct {
  unsigned long num_record;
  char another_info[10];
} DbHeader;

typedef struct {
  int type;
  char date[11];        // e.g. "YYYY-MM-DD"; replaces the non-standard struct date
  unsigned long price;  // store cents here if you need them
} DbRecord;

FILE *dbCreate(const char *fname)
{
   static DbHeader h = {0, "MYDATABSE"};
   FILE *f = fopen(fname, "w+b");
   if (f != NULL)
      fwrite(&h, sizeof(h), 1, f);   // fwrite(ptr, size, count, stream)
   return f;
}

int dbAppend(FILE *f, const DbRecord *rec)
{
   DbHeader h;

   fseek(f, 0, SEEK_SET);
   if (fread(&h, sizeof(h), 1, f) != 1)
      return 0;
   h.num_record++;
   fseek(f, 0, SEEK_SET);
   fwrite(&h, sizeof(h), 1, f);
   fseek(f, (long)(sizeof(h) + (h.num_record - 1) * sizeof(DbRecord)), SEEK_SET);
   return fwrite(rec, sizeof(DbRecord), 1, f) == 1;
}

// rec_no from 0 .. N-1
int dbRead(FILE *f, DbRecord *rec, unsigned long rec_no)
{
   DbHeader h;

   fseek(f, 0, SEEK_SET);
   if (fread(&h, sizeof(h), 1, f) == 1 && rec_no < h.num_record) {
      fseek(f, (long)(sizeof(h) + rec_no * sizeof(DbRecord)), SEEK_SET);
      return fread(rec, sizeof(DbRecord), 1, f) == 1;
   }
   return 0;
}

int dbWrite(FILE *f, const DbRecord *rec, unsigned long rec_no)
{
   DbHeader h;

   fseek(f, 0, SEEK_SET);
   if (fread(&h, sizeof(h), 1, f) == 1 && rec_no < h.num_record) {
      fseek(f, (long)(sizeof(h) + rec_no * sizeof(DbRecord)), SEEK_SET);
      return fwrite(rec, sizeof(DbRecord), 1, f) == 1;
   }
   return 0;
}

int dbClose(FILE *f)
{
   return fclose(f);
}

Kocil

Oops, correction:

FILE *dbCreate(const char *fname)
{
  static DbHeader h = {0, "MYDATABSE"};
  FILE *f = fopen(fname, "w+b");
  if (f != NULL)
    fwrite(&h, sizeof(h), 1, f);
  return f;
}


My code is not tested, and I don't check for every I/O error, so use it at your own risk.


 
tanhl

ASKER
Hi Kocil,

But I need the records in ASCII format, not binary.

Please advise.

thanks,
hl
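Keeping the records in ASCII does not rule out fixed-length lines: if each formatted line is padded with blanks to a constant width, the seek-and-overwrite approach described earlier still works. A minimal sketch, assuming a 64-byte line and the field layout from the question (ORDER_LINE_LEN and format_order are hypothetical names):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed line length, including the trailing '\n'. */
#define ORDER_LINE_LEN 64

/*
 * Format one order as a blank-padded ASCII line ending in '\n'.
 * buf must hold ORDER_LINE_LEN + 1 bytes.
 * Returns 0 on success, -1 if the fields do not fit in the fixed width.
 */
int format_order(char *buf, unsigned long order_no, int type,
                 const char *date, double price)
{
    int n = snprintf(buf, ORDER_LINE_LEN + 1,
                     "order: %lu [type: %d; date: %s; price: $%.2f]",
                     order_no, type, date, price);
    if (n < 0 || n > ORDER_LINE_LEN - 1)
        return -1;                      /* too long: refuse rather than truncate */
    memset(buf + n, ' ', (size_t)(ORDER_LINE_LEN - 1 - n));  /* pad with blanks */
    buf[ORDER_LINE_LEN - 1] = '\n';
    buf[ORDER_LINE_LEN] = '\0';
    return 0;
}
```

Every line produced this way is exactly ORDER_LINE_LEN bytes, so line n starts at byte n * ORDER_LINE_LEN.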
Kent Olsen


Now you're entering the world of "how can I make a C program behave like a COBOL program?" It can be done, but it gets a bit cumbersome.

Let's build a small record that we want to maintain.  While we could always use the C 'struct' statement, it's not very flexible and just as wordy as what I'm going to describe.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*  Create constants for field identifiers  */

enum
{
  FN_NULL,
  FN_NAME,
  FN_ADDRESS1,
  FN_ADDRESS2,
  FN_CITY,
  FN_STATE,
  FN_COUNTRY,
  FN_ZIP,
  FN_PHONE,
  FN_SALARY,
  FN_BONUS,
  FN_RESERVED,
  FN_MAX
};

/*  Define the (maximum) length of each field  */

enum
{
  FL_NAME     =40,
  FL_ADDRESS1 =30,
  FL_ADDRESS2 =30,
  FL_CITY     =24,
  FL_STATE    =2,
  FL_COUNTRY  =24,
  FL_ZIP      =10,
  FL_PHONE    =10,
  FL_SALARY   =9,
  FL_BONUS    =9,
  FL_RESERVED =40   /* Pad the record so you can add stuff later without growing the record */
};

/*  Define the starting offset for each field  */

#define RS_NAME     0
#define RS_ADDRESS1 (RS_NAME+FL_NAME)
#define RS_ADDRESS2 (RS_ADDRESS1+FL_ADDRESS1)
#define RS_CITY     (RS_ADDRESS2+FL_ADDRESS2)
#define RS_STATE    (RS_CITY+FL_CITY)
#define RS_COUNTRY  (RS_STATE+FL_STATE)
#define RS_ZIP      (RS_COUNTRY+FL_COUNTRY)
#define RS_PHONE    (RS_ZIP+FL_ZIP)
#define RS_SALARY   (RS_PHONE+FL_PHONE)
#define RS_BONUS    (RS_SALARY+FL_SALARY)
#define RS_RESERVED (RS_BONUS+FL_BONUS)
#define RS_MAX      (RS_RESERVED+FL_RESERVED)

/*
    Make sure that any changes to the RESERVED area
    maintain the same record length (the fields above add up to 228).
    The FL_ values are enum constants, which #if treats as 0,
    so use a C11 static assertion instead of #if/#error.
*/
_Static_assert (RS_MAX == 228, "Record Size Has Changed!");

char *MyRecord;

char * CreateDataRecord (void)
{
  char *Record = (char *)malloc (RS_MAX);

  if (Record != NULL)
    memset (Record, ' ', RS_MAX);    /* start with an all-blank record */
  return (Record);
}

void ChangeTextValue (char *Record, int FieldStart, int Length, const char *NewValue)
{
  size_t Len = strlen (NewValue);

  memset (Record+FieldStart, ' ', Length);
  /* Copy at most Length bytes so a long value can't spill into the next field */
  memcpy (Record+FieldStart, NewValue, Len < (size_t)Length ? Len : (size_t)Length);
}

void ChangeNumericValue (char *Record, int FieldStart, int Length, long NewValue)
{
  char *TempString;

  TempString = (char *)malloc (Length+1);
  if (TempString == NULL)
    return;
  /* Zero-padded to Length digits; snprintf truncates a too-wide value instead of overflowing */
  snprintf (TempString, Length+1, "%*.*ld", Length, Length, NewValue);
  memcpy (Record+FieldStart, TempString, Length);
  free (TempString);
}



Now, that "setup" might look a bit awkward, but it actually does a pretty good job of laying down the framework for managing your custom record. (It needs "GetValue" functions, but I'll leave that up to you.) Here's a sample usage:


MyRecord = CreateDataRecord ();

ChangeTextValue (MyRecord, RS_NAME, FL_NAME, "My Name");
ChangeTextValue (MyRecord, RS_ADDRESS1, FL_ADDRESS1, "My Address");
ChangeNumericValue (MyRecord, RS_SALARY, FL_SALARY, 1000000);
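The "GetValue" functions left to the reader might look something like this sketch. GetTextValue and GetNumericValue are hypothetical names chosen to mirror the Change functions, and the sketch assumes the blank-padded text and zero-padded numeric fields those functions write:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copy a blank-padded text field out of a fixed-layout record into Out
 * (NUL-terminated), dropping the trailing blanks used as padding.
 * Out must hold at least Length + 1 bytes.
 */
void GetTextValue (const char *Record, int FieldStart, int Length, char *Out)
{
  memcpy (Out, Record+FieldStart, (size_t)Length);
  Out[Length] = '\0';
  while (Length > 0 && Out[Length-1] == ' ')
    Out[--Length] = '\0';
}

/* Parse a zero-padded numeric field such as "001000000" back into a long. */
long GetNumericValue (const char *Record, int FieldStart, int Length)
{
  char Temp[32];

  if (Length <= 0 || Length >= (int)sizeof Temp)
    return 0;                        /* field too wide for this sketch */
  memcpy (Temp, Record+FieldStart, (size_t)Length);
  Temp[Length] = '\0';
  return strtol (Temp, NULL, 10);    /* base 10: leading zeros are not octal */
}
```

Passing base 10 to strtol() matters here: with base 0, a zero-padded field like "0010" would be parsed as octal.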

And of course, instead of calling "ChangeTextValue" or "ChangeNumericValue" directly for each change, you might want to build wrappers around them for each field. The code will certainly look a bit cleaner.


void ChangeName (char *Record, char *NewValue)
{
  ChangeTextValue (Record, RS_NAME, FL_NAME, NewValue);
}

This is about as close as you're going to come to making a C record behave like COBOL or C++:

C:     ChangeName (MyRecord, "New Name");
C++:   MyRecord->ChangeName ("New Name");
COBOL: Move "New Name" to NAME of MyRecord;


Good Luck!
Kdo
jmcg

Nothing has happened on this question in over 10 months. It's time for cleanup!

My recommendation, which I will post in the Cleanup topic area, is to
split points between imladris and gj62.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer