Seeking Positions in a File
Hi!
I am trying to seek to positions in a file. The file is about 5 MB in size, and a single process needs to seek about 200 times. I am currently using fopen and fseek to do this, and it takes about 6-8 seconds to complete. I need this to be as fast as possible. Also, when about 10 people do this simultaneously, it can take longer than 20 seconds per process. I would be grateful if you could advise me on any faster ways of doing this.
Regards,
Marvin.
Can you post your code? An fseek should not take that much time!
ASKER
Below is the chunk that records the time elapsed.
char tmpStr[4096];
a = time(&a);
idxFile = fopen("MainData.txt","r");
for(i=1;i<=posCounter;i++) {
    tmpPosition = atoi(read_record(inBuff,i,','));
    fseek(idxFile,tmpPosition,0);
    fgets(tmpStr,sizeof(tmpStr),idxFile);
}
fclose(idxFile);
b = time(&b);
diff = b - a;
printf("Get Records Time = [%d] Seconds<br>\n",diff);
I guess posCounter goes up to 200. 5-6 seconds seems a long time to me for this function. What does read_record do? What kind of computer (processor, OS) do you use?
Also, fseek on a file that is not opened in binary mode can be dangerous!
ASKER
The OS is Solaris on a Sun Ultra 1 with 256 MB of RAM.
read_record is a function that returns a specific field from a delimited line. Here it is:
char* read_record(char *rec, int fieldNum, char delimin) {
    int a;
    char *chPtr1;
    char localrec[4096];
    char delimeter[40];
    char tmpbuf[40];
    memset(tmpbuf,'\0',sizeof(tmpbuf));
    memset(delimeter,'\0',sizeof(delimeter));
    memset(localrec,'\0',sizeof(localrec));
    sprintf(delimeter,"%c",delimin);
    for(a=0;rec[a]!='\0';a++) {
        if(a != 0) {
            if(rec[a-1]==delimin && rec[a]==delimin) {
                sprintf(localrec,"%s ",localrec);
            }
        }
        sprintf(localrec,"%s%c",localrec,rec[a]);
    }
    chPtr1 = localrec;
    chPtr1 = strtok((char*)localrec,delimeter);
    for(a=0;a<fieldNum;a++) {
        if(chPtr1+1==delimeter) { a++; }
        chPtr1 = strtok(NULL,delimeter);
    }
    if(chPtr1==NULL)
        return("NULL");
    return(chPtr1);
}
I don't know your situation, but I think your best bet would be to translate the ASCII file, which seems to contain only indices back into the same file, into a binary file.
That would make the file smaller (MUCH smaller). Then you could read the whole file into memory and scan through it. Then you wouldn't need to do any seeks, only calculate an offset into an array.
I know that's somewhat general, but it's the best I can do without knowing what the file contains.
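As a rough illustration of the "read the whole file into memory" idea, here is a minimal sketch in C. The names slurp_file and copy_line_at are hypothetical, and it assumes newline-terminated records (to match the fgets call in the question) and a POSIX-like system where seeking to the end of a binary stream reports the file size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read an entire file into a malloc'd buffer.
   Returns NULL on failure; the caller frees the buffer. */
char *slurp_file(const char *path, size_t *len)
{
    FILE *f;
    char *buf = NULL;
    long size;

    assert(path != NULL && len != NULL);
    *len = 0;
    f = fopen(path, "rb");              /* binary mode: offsets are plain byte counts */
    if (f == NULL) return NULL;
    if (fseek(f, 0L, SEEK_END) == 0 && (size = ftell(f)) >= 0) {
        rewind(f);
        buf = malloc((size_t)size + 1);
        if (buf != NULL && fread(buf, 1, (size_t)size, f) == (size_t)size) {
            buf[size] = '\0';           /* terminate so string functions are safe */
            *len = (size_t)size;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}

/* Hypothetical helper: copy the record that starts at byte `offset`
   (up to, not including, the next newline) into `out`.
   Returns the number of bytes copied, or -1 if the offset is out of range. */
int copy_line_at(const char *buf, size_t len, size_t offset,
                 char *out, size_t outlen)
{
    const char *start, *end;
    size_t n;

    if (offset >= len || outlen == 0) return -1;
    start = buf + offset;
    end = memchr(start, '\n', len - offset);
    n = (end != NULL) ? (size_t)(end - start) : len - offset;
    if (n >= outlen) n = outlen - 1;    /* truncate rather than overflow */
    memcpy(out, start, n);
    out[n] = '\0';
    return (int)n;
}
```

With the file loaded once by slurp_file, each of the 200 lookups becomes a call to copy_line_at with no I/O at all. For a 5 MB file on a machine with 256 MB of RAM this should fit comfortably, though ten simultaneous processes would each hold their own copy.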
If the file contains data other than offsets back into the same file, you might want to consider an index file. Keep the data in one file with no offset information, just null-terminated strings. Then have a second, binary file that contains only offsets into the data file. You can read the index file into memory and then do one seek to the data you want in the data file.
Again, I don't know what the data file contains.
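The separate index file described above might look something like the following sketch. All three function names are hypothetical, the index is assumed to be a flat array of long offsets, and records are assumed to be newline-terminated so that fgets can read them as in the original code. A file written this way is only readable on machines with the same size and byte order for long.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write n offsets to a binary index file.
   Returns 0 on success, -1 on error. */
int write_index(const char *path, const long *offsets, size_t n)
{
    FILE *f;
    size_t written;

    assert(path != NULL && offsets != NULL);
    f = fopen(path, "wb");
    if (f == NULL) return -1;
    written = fwrite(offsets, sizeof offsets[0], n, f);
    if (fclose(f) != 0 || written != n) return -1;
    return 0;
}

/* Hypothetical helper: load the whole index into a malloc'd array.
   Stores the number of entries in *count; the caller frees the array. */
long *load_index(const char *path, size_t *count)
{
    FILE *f;
    long *offsets = NULL;
    long size;

    assert(path != NULL && count != NULL);
    *count = 0;
    f = fopen(path, "rb");
    if (f == NULL) return NULL;
    if (fseek(f, 0L, SEEK_END) == 0 && (size = ftell(f)) > 0) {
        size_t n = (size_t)size / sizeof(long);
        rewind(f);
        offsets = malloc(n * sizeof(long));
        if (offsets != NULL && fread(offsets, sizeof(long), n, f) == n) {
            *count = n;
        } else {
            free(offsets);
            offsets = NULL;
        }
    }
    fclose(f);
    return offsets;
}

/* Hypothetical helper: one seek plus one read per record.
   Returns 0 on success, -1 on error. */
int fetch_record(FILE *data, long offset, char *buf, size_t buflen)
{
    if (fseek(data, offset, SEEK_SET) != 0) return -1;
    if (fgets(buf, (int)buflen, data) == NULL) return -1;
    return 0;
}
```

The index is read once per process with load_index, after which record i costs a single fetch_record(data, offsets[i], ...) call and no text parsing.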
I don't understand why your read_record is so complicated. Can you post the contents of inBuff and what you want to do with it? You loop over the whole buffer every time. This can be made faster.
ASKER
Basically, for any string that I have, for example:
field1,field2,field3,field4
I made this function so that I could pass it any delimited string, along with the number of the field I want to retrieve and the delimiter used. So in the above example, to retrieve field3 I would call it like this:
read_record(string,2,',')
Marvin.
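For comparison, a field lookup with the same zero-based numbering can be sketched without copying the whole record. As posted, read_record rebuilds the record with sprintf one character at a time on every call, and it returns a pointer into its local array localrec, which is no longer valid once the function returns. The function below, get_field, is a hypothetical replacement that writes into a caller-supplied buffer; it assumes empty fields should come back as empty strings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical replacement for read_record: copy field number
   field_num (counting from 0) of a delimited record into `out`.
   Empty fields yield an empty string.
   Returns 0 on success, -1 if the record has too few fields. */
int get_field(const char *rec, int field_num, char delim,
              char *out, size_t outlen)
{
    const char *start = rec;
    const char *end;
    size_t n;
    int i;

    assert(rec != NULL && out != NULL);
    if (field_num < 0 || outlen == 0) return -1;

    /* Skip field_num delimiters to reach the start of the wanted field. */
    for (i = 0; i < field_num; i++) {
        start = strchr(start, delim);
        if (start == NULL) return -1;
        start++;
    }

    /* The field runs to the next delimiter or to the end of the string. */
    end = strchr(start, delim);
    n = (end != NULL) ? (size_t)(end - start) : strlen(start);
    if (n >= outlen) n = outlen - 1;    /* truncate rather than overflow */
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}
```

A call such as get_field(string, 2, ',', buf, sizeof buf) would leave field3 in buf for the example record above. Each call still scans from the start of the record, so fetching 200 fields one by one remains quadratic in the number of fields.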
Change
for(i=1;i<=posCounter;i++) {
    tmpPosition = atoi(read_record(inBuff,i,','));
    fseek(idxFile,tmpPosition,0);
    fgets(tmpStr,sizeof(tmpStr),idxFile);
}
to
tmpstr = read_record(inBuff,0,',');
for(i=1;i<=posCounter;i++) {
    tmpstr = read_record(tmpstr,0,',');
    tmpPosition = atoi(tmpstr);
    fseek(idxFile,tmpPosition,0);
    fgets(tmpStr,sizeof(tmpStr),idxFile);
}
Why don't you use the first field?
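The goal of not rescanning the buffer for every field could also be reached by converting all positions in a single pass. The sketch below is one possible way to do that, not the code from this thread: parse_offsets is a hypothetical name, and it assumes inBuff holds comma-separated decimal offsets. Empty fields are stored as 0, which is what atoi would give for them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a comma-separated list of decimal
   numbers into an array of long in one pass over the string.
   Stores at most `max` values and returns how many were stored. */
size_t parse_offsets(const char *line, long *out, size_t max)
{
    size_t n = 0;
    const char *p = line;

    assert(line != NULL && out != NULL);
    if (*line == '\0') return 0;        /* empty input has no fields */

    while (n < max) {
        char *end;
        out[n++] = strtol(p, &end, 10); /* gives 0 when the field is empty */
        p = strchr(end, ',');           /* move to the next delimiter */
        if (p == NULL) break;           /* that was the last field */
        p++;
    }
    return n;
}
```

After one call such as count = parse_offsets(inBuff, offsets, 256), the loop body reduces to fseek(idxFile, offsets[i], SEEK_SET) followed by fgets. Since the original loop starts at i=1, the caller would skip offsets[0] to keep the same behaviour.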