checkin

asked on

Seeking Positions in a File

Hi!

I am trying to seek to positions in a file. The file is about 5 MB in size, and a single process needs to seek about 200 times. I am currently using fopen and fseek to do this, and it takes about 6-8 seconds to complete. I need this to be as fast as possible. Also, when about 10 people do this simultaneously, it can take longer than 20 seconds per process. I would be grateful if you could advise me on any faster ways of doing this.

Regards,

Marvin.
rbr

Can you post your code? An fseek should not take that much time!

ASKER

Below is the chunk that records the time elapsed.

char tmpStr[4096];

a = time(&a);

idxFile = fopen("MainData.txt","r");
for(i=1;i<=posCounter;i++) {
  tmpPosition = atoi(read_record(inBuff,i,','));
  fseek(idxFile,tmpPosition, 0);
  fgets(tmpStr,sizeof(tmpStr),idxFile);
}
fclose(idxFile);

b = time(&b);
diff = b - a;
printf("Get Records Time = [%d] Seconds<br>\n",diff);



I guess posCounter goes up to 200. 5-6 seconds seems like a long time for this function. What does read_record do? What kind of computer (processor, OS) are you using?
Also, fseek on a file that is not opened in binary mode can be dangerous!
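
To illustrate the text-mode caveat: the C standard only guarantees fseek on a text stream for an offset of 0 or a value previously returned by ftell, so byte offsets obtained any other way are safer on a stream opened in binary mode. A minimal sketch, assuming the stored positions are plain byte offsets; read_line_at is a hypothetical helper, not part of the posted code:

```c
#include <stdio.h>

/* Hypothetical helper: read one line starting at a byte offset.
 * Returns 0 on success, -1 if the seek or the read fails. */
int read_line_at(FILE *fp, long offset, char *buf, int bufsize)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    if (fgets(buf, bufsize, fp) == NULL)
        return -1;
    return 0;
}
```

The stream would be opened with fopen("MainData.txt", "rb") rather than "r". On Solaris the two modes behave the same, so this is about portability, not speed.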

ASKER

The OS is Solaris on a Sun Ultra 1 with 256 MB RAM.

read_record is a function that returns a specific field from a delimited line. Here it is:

char* read_record(char *rec, int fieldNum, char delimin) {

  int a;
  char *chPtr1;
  char localrec[4096];
  char delimeter[40];
  char tmpbuf[40];

  memset(tmpbuf,'\0',sizeof(tmpbuf));
  memset(delimeter,'\0',sizeof(delimeter));
  memset(localrec,'\0',sizeof(localrec));

  sprintf(delimeter,"%c",delimin);

  for(a=0;rec[a]!='\0';a++) {
    if(a != 0) {
      if(rec[a-1]==delimin && rec[a]==delimin) {
        sprintf(localrec,"%s ",localrec);
      }
    }
    sprintf(localrec,"%s%c",localrec,rec[a]);
  }

  chPtr1 = localrec;
  chPtr1 = strtok((char*)localrec,delimeter);
  for(a=0;a<fieldNum;a++) {
    if(chPtr1+1==delimeter) { a++; }
    chPtr1 = strtok(NULL,delimeter);
  }
  if(chPtr1==NULL)
    return("NULL");
  return(chPtr1);
}

I don't know your situation, but I think your best bet would be to translate the ASCII file, which seems to contain only indices back into the same file, into a binary file.
That would make the file smaller (MUCH smaller). Then you could read the whole file into memory and scan through it. You wouldn't need to do any seeks, only calculate an offset into an array.
I know that's somewhat general, but it's the best I can do without knowing what the file contains.
If the file contains data other than offsets back into the same file, you might want to consider an index file. Keep the data in one file with no offset information, just null-terminated strings. Then have a second, binary file that contains only offsets into the data file. You can read the index file into memory and then do one seek to the data you want in the data file.
Again, I don't know what the data file contains.
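
The index-file idea could look roughly like the sketch below. It assumes the index is simply an array of long offsets written in the machine's native layout (so it is only readable on the same kind of machine that wrote it); write_index and load_index are hypothetical names used for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Write n offsets to an already-open binary stream. Returns 0 on success. */
int write_index(FILE *fp, const long *offsets, size_t n)
{
    return fwrite(offsets, sizeof offsets[0], n, fp) == n ? 0 : -1;
}

/* Load the whole index into memory in one read.
 * Returns a malloc'ed array (caller frees) and stores its length in *count,
 * or returns NULL on failure. */
long *load_index(FILE *fp, size_t *count)
{
    long size;
    long *offsets;
    size_t n;

    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0)
        return NULL;
    rewind(fp);

    n = (size_t)size / sizeof(long);
    offsets = malloc(n ? n * sizeof(long) : 1);
    if (offsets == NULL)
        return NULL;
    if (fread(offsets, sizeof(long), n, fp) != n) {
        free(offsets);
        return NULL;
    }
    *count = n;
    return offsets;
}
```

With the offsets in an array, looking up record i is just offsets[i] followed by one fseek into the data file, with no text parsing on each lookup.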
ASKER CERTIFIED SOLUTION
rbr

I don't understand why your read_record is so complicated. Can you post the contents of inBuff and what you want to do with it? You loop over the whole buffer every time. This can be made faster.
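
As a rough illustration of "can be made faster": the posted read_record rebuilds localrec one character at a time with sprintf (and passes localrec as both source and destination, which is undefined behaviour), so every call costs far more than a single scan. A sketch of a lighter alternative follows; find_field is a hypothetical name, and unlike read_record it returns a pointer into the original string plus a length instead of a copy, and reports an empty field as length 0 instead of a space:

```c
#include <stddef.h>

/* Hypothetical replacement: locate field number field_num (0-based) in rec.
 * Returns a pointer to the start of the field inside rec and stores its
 * length in *len, or returns NULL if rec has fewer fields.
 * Nothing is copied and rec is not modified. */
const char *find_field(const char *rec, int field_num, char delim, size_t *len)
{
    const char *start = rec;
    const char *p;

    for (p = rec; ; p++) {
        if (*p == delim || *p == '\0') {
            if (field_num == 0) {
                *len = (size_t)(p - start);
                return start;
            }
            if (*p == '\0')
                return NULL;      /* ran out of fields */
            field_num--;
            start = p + 1;        /* next field begins after the delimiter */
        }
    }
}
```

This still rescans from the start of the buffer on each call, so it removes the per-character sprintf cost but not the repeated scanning.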

ASKER

Basically, any string that I have, for example:

field1,field2,field3,field4

I made this function so that I could pass it any delimited string, along with the number of the field I want to retrieve and the delimiter used. So in the above example, to retrieve field3 I would call it like this:

read_record(string,2,',')

Marvin.
Change

for(i=1;i<=posCounter;i++) {
  tmpPosition = atoi(read_record(inBuff,i,','));
  fseek(idxFile,tmpPosition, 0);
  fgets(tmpStr,sizeof(tmpStr),idxFile);
}

to
tmpstr = read_record(inBuff,0,',');

for(i=1;i<=posCounter;i++) {
  tmpstr = read_record(tmpstr,0,',');
  tmpPosition = atoi(tmpstr);
  fseek(idxFile,tmpPosition, 0);
  fgets(tmpStr,sizeof(tmpStr),idxFile);
}

Why don't you use the first field?
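
One way to get the effect the change above is aiming for (walking through inBuff once instead of rescanning it for every field) is to let strtol report where each number ends. This is only a sketch under assumptions: parse_offsets is a hypothetical helper, the delimiter is assumed not to be whitespace, a digit or a sign, and fields that do not start with a number (such as a leading key field) are skipped rather than converted to 0 as atoi would do:

```c
#include <stdlib.h>

/* Parse up to max delimiter-separated decimal offsets from buf in one pass.
 * Fields that do not begin with a number are skipped.
 * Returns the number of offsets stored in out. */
size_t parse_offsets(const char *buf, char delim, long *out, size_t max)
{
    size_t n = 0;
    const char *p = buf;
    char *end;

    while (n < max && *p != '\0') {
        long value = strtol(p, &end, 10);
        if (end != p) {           /* at least one digit was converted */
            out[n++] = value;
            p = end;
        }
        /* Move past the rest of this field and its delimiter. */
        while (*p != '\0' && *p != delim)
            p++;
        if (*p == delim)
            p++;
    }
    return n;
}
```

The seek loop then becomes a plain walk over the out array, with one fseek and one fgets per offset.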