Solved

Populate a structure by reading a formatted text file.

Posted on 2003-03-20
Last Modified: 2008-02-01
I am teaching myself to program in C. I've run across a problem with which I need some assistance. I want to read a text file from the hard drive and use the information from the file to populate an array of structures. From there I will do some additional processing. My problem is how to read the file from the disk into the structures. I know how to read a file from disk and I know how to write a file to disk, but reading formatted information into a structure has me a bit stumped at the moment.
 
The file is set up with the following format:
123456789String1, String2 String3 Int1 Int2
123456789String1, String2 String3 Int1 Int2

The first nine digits represent a file number. It is not separated from the first string by any character. The first string is separated from the other two strings by a comma. The second and third strings followed by two integers are separated by spaces. The line ends with \n.  The file is about 1500 lines long. Each line represents one record.
 
 I want to read the file line by line into a structure similar to the following:
 
 struct myRecord {
 int myFileNumber;
 char *myString1;
 char *myString2;
 char *myString3;
 int myInt1;
 int myInt2;
 };

I would also like the ability to start reading at the first line or any line other than the first line.  Could someone provide me with some direction on how to implement this problem with ANSI C. Thanks.
     
Question by:mjones1040
17 Comments
 

Expert Comment

by:bradleyb1537
ID: 8178646
For your program, fwrite and fread sound perfect.
To write a struct, fill it out how you want, then:

Create and open the file with a file pointer fp.

struct myRecord test;
fwrite(&test,sizeof(struct myRecord),1,fp);
/* &test is the address of the structure, the next parameter is the size of the structure, the next is the number of structures to write, and the last is the file pointer. */

To read the file, use
fread(&test,sizeof(struct myRecord),1,fp);
/* &test is the address of the structure, the next parameter is the size of the structure, the next is the number of structures to read, and the last is the file pointer. */

To read (or write) a bunch of data at a time:
struct myRecord records[5];
fread(records,sizeof(struct myRecord),5,fp);
This will read 5 records into the array records.
 
LVL 8

Expert Comment

by:akshayxx
ID: 8178667
ok since each line of ur file will have a fixed format ,  so we can work out a scan strategy for each line,
also it would help if u know the maximum length possible of each line, if not that can also be worked around..

i hope u know ( as u've said) how to read line by line

lets say u have variable
char *line ;
For each line read, "line" points to it.

Here is how u'll parse the line and fill in the record pointed to by 'st', which is passed as an argument to the function.

void parseLine(char *line,struct myRecord *st){
int fno,i1,i2;  //used for convenience
char *s1,*s2,*s3;  // u can directly assign the respective struct's members  , and u can use single char * for s1, s2,s3 .. as they are temporary variables .. but i have used separate for ur convenience to understand
char *endptr=NULL;// used for strtol
char *idx;

fno=strtol(line,&endptr,10);
st->myFileNumber=fno;

//endptr now points to first non numeric character after the myfilenumber

idx=strchr(endptr,',');
s1=(char*)malloc(idx-endptr+1);
strncpy(s1,endptr,idx-endptr);
st->myString1=s1;

//idx points to first comma.. so increment it to point to next string
idx++; // u might need to skip spaces if there are any between comma and the string2
endptr=strchr(idx,' ');// look for the first space after string2

s2=(char*)malloc(endptr-idx+1);
strncpy(s2,idx,endptr-idx);
st->myString2=s2;

//now endptr points to first spcace after second string
endptr++;
idx=strchr(endptr,' ');
s3=(char*)malloc(idx-endptr+1);
strncpy(s3,endptr,idx-endptr);
st->myString3=s3;


//now idx points to first space after third string..so increment it to point to int1..u may need to skip spaces , if u have more than one
idx++;

// now a simple sscanf will do the job to get the rest of the integers in one call.. u can parse them one by one also

sscanf(idx,"%d %d",&i1,&i2);

st->myInt1=i1;
st->myInt2=i2;

}




all this assumes . that each of ur line is well formatted and if it can have potential errors ( like no int2 present after int1.. and other errors) .. then u shud modify the above code to check for errors..
hope this will get you good starting point..
 
LVL 8

Expert Comment

by:akshayxx
ID: 8178673
bradleyb1537 :
fread and fwrite are used to read and write in binary ( bytes ) mode,
the type of data file that mjones1040 has mentioned requires formatted I/O

if he has flexibility to change the available data to 'raw binary mode'
then i will also suggest him to go for fread and fwrite .. but not for formatted I/O
 
LVL 8

Expert Comment

by:akshayxx
ID: 8178685
>>I would also like the ability to start reading at the first line or any line other than the first line

well mjones1040, this is not possible unless u know the length of each line. If each line can be of variable length, then u have no choice but to read each line and keep skipping till u get to the desired line number.
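The skipping approach described above can be sketched as a small helper. This is only an illustration: the name skip_lines is hypothetical, and it counts newline characters, so a final line without a trailing '\n' is not counted as skipped.

```c
#include <assert.h>
#include <stdio.h>

/* Advance fp past the first `count` lines by consuming characters up to
   and including each newline. Returns the number of lines actually
   skipped, which is less than `count` if end-of-file comes first. */
long skip_lines(FILE *fp, long count)
{
    long skipped = 0;
    int ch;

    assert(fp != NULL);
    while (skipped < count && (ch = getc(fp)) != EOF) {
        if (ch == '\n')
            ++skipped;
    }
    return skipped;
}
```

To start at line 5 of the file, for example, open it, call skip_lines(fp, 4), and then read records as usual. Comparing the return value with the requested count tells you whether the file was long enough.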
 
LVL 8

Expert Comment

by:akshayxx
ID: 8178695
also for my example to work, u need to include the following headers
#include <stdlib.h>
#include <string.h>

and also when u pass line and st to my function .. st must already point to allocated memory .. and line must point to the actual single line of data that u got from the file

and also i wrote the above code here  only without testing .. so try it out .. logically it is correct , u might ( i hope u wont) need to modify it a bit .

work out on ur problem and let us know if u get stuck ..  
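As a rough sketch of how the "line" argument might be obtained, the helper below reads one line with fgets and strips the trailing newline. The name read_line and the idea of a caller-supplied buffer are assumptions for illustration; a line longer than the buffer would be returned in pieces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line into buf and strip the trailing newline, if present.
   Returns 1 when a line was read, 0 at end-of-file or on a read error. */
int read_line(FILE *fp, char *buf, size_t size)
{
    assert(fp != NULL && buf != NULL && size > 1);
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';  /* cut the string at the first '\n' */
    return 1;
}
```

A typical loop would declare a buffer such as char line[256] and a struct myRecord rec, then call read_line(fp, line, sizeof line) until it returns 0, passing line and &rec to the parsing function each time.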
 
LVL 8

Expert Comment

by:akshayxx
ID: 8178757
bradleyb1537 : please stop posting same stuff everytime, that too after significant interval of times..
seems u are pressing refresh button many times .. since u posted ur first comment .. dont do that..
 
LVL 6

Expert Comment

by:gj62
ID: 8181063
Hmmm,

First, you should not read and write a structure directly to disk.  It's poor style, not portable, and could break when you change compiler versions even (though not likely).  Structures are internally padded in C by the compiler, so you aren't going to write *exactly* what is in your struct.  Anyhow, you have variable length strings, so it wouldn't work anyway.
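To see the padding point on your own compiler, something like the following can be used. The struct sample and both function names are made up for this illustration, and the numbers printed will differ between compilers and platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct sample {
    char tag;     /* 1 byte */
    long number;  /* usually needs stricter alignment than char */
};

/* Bytes in struct sample that hold no member data (compiler padding). */
size_t sample_padding(void)
{
    size_t members = sizeof(char) + sizeof(long);

    assert(sizeof(struct sample) >= members);
    return sizeof(struct sample) - members;
}

/* Print the layout the compiler chose for struct sample. */
void print_sample_layout(void)
{
    printf("sizeof(struct sample) = %lu\n",
           (unsigned long)sizeof(struct sample));
    printf("offset of number      = %lu\n",
           (unsigned long)offsetof(struct sample, number));
    printf("padding bytes         = %lu\n",
           (unsigned long)sample_padding());
}
```

If the padding is nonzero, writing the struct with fwrite also writes those filler bytes, and a program built with different alignment rules may not read the file back correctly.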

Next, a nine-digit number needs a long, not an int, as the first number in your struct...

Next, the strncpy's shown in the code need to be subsequently NULL-terminated (strncpy does not do that unless the source is completely read and space is still available in target).  So, you need to calculate the length of the string using idx and endptr (whose functions change from string to string - you might want to clear that up for clarity).  E.g., for the first string, do this:

idx=strchr(endptr,',');
s1=(char*)malloc(idx-endptr+1);
strncpy(s1,endptr,idx-endptr);
s1[idx-endptr]=0; /*NULL TERMINATE THE STRING (last valid index of the malloc'd block)*/
st->myString1=s1;
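The allocate, copy, and terminate steps can also be wrapped in one helper so the index arithmetic lives in a single place. This is a sketch only; copy_span is a hypothetical name, not something from the code above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, NUL-terminated copy of the characters in
   the half-open range [start, end). The caller frees the result.
   Returns NULL if malloc fails. */
char *copy_span(const char *start, const char *end)
{
    size_t len;
    char *copy;

    assert(start != NULL && end != NULL && start <= end);
    len = (size_t)(end - start);
    copy = malloc(len + 1);       /* one extra byte for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, start, len);
    copy[len] = '\0';             /* index len is the last valid byte */
    return copy;
}
```

With this, the first string becomes st->myString1 = copy_span(endptr, idx); and the same call works for the other two strings with the pointers swapped as needed.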

QUESTION for mjones - can the strings have spaces or commas that are NOT part of the formatting?  In other words, can the file have a line that looks like:

123456789String One With Spaces,String,Two,with,commas StringThree 123 456

If so, you have to do it the way akshayxx says; if not, use strtok, that's what it's for, e.g.

char delims[]=" ,"; /*space and comma are delimiters*/

fgets(buf, sizeof(buf), pFile); /*read a line*/

/* use akshayxx method for the first number, since there is no delimiter*/
long fno; /* NEED A LONG, NOT AN INT...*/
fno=strtol(buf,&sPtr,10);
st->myFileNumber=fno; /*change myFileNumber to a LONG */

now use strtok...

nextStr = strtok(sPtr, delims); /*nextStr now points to the NULL terminated string, so just strcpy...*/
strcpy(st->myString1,nextStr);
/* do it again for next strings...*/
nextStr =  strtok(NULL, delims);
strcpy(st->myString2, nextStr);
nextStr =  strtok(NULL, delims);
strcpy(st->myString3, nextStr);
/* now read the ints (are you sure they are not LONGS?...*/
nextStr =  strtok(NULL, delims);
st->myInt1 = atoi(nextStr);
nextStr = strtok(NULL, delims);
st->myInt2 = atoi(nextStr);

This just reads a lot cleaner if the file is structured without embedded spaces or commas, other than the intended delimiters...
 

Author Comment

by:mjones1040
ID: 8181633
Thanks to everyone on this problem.  I am still reading all the comments and trying to figure out what will work best for me.  TO: gj62  All of the lines in the file follow the same pattern.  String2 after the comma may have an embedded space.  The other strings do not.  All of the lines are the same length and all of the strings have the same number of characters.  This should make it easier for me to parse each line once I figure out what I need to do.  All of the comments are very helpful.
 
LVL 6

Accepted Solution

by:
gj62 earned 200 total points
ID: 8181770
Well, if they are fixed length strings, it gets even easier...

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FILENOLEN 9
#define STRLEN1  10
#define STRLEN2  15
#define STRLEN3  20 /*assume the strings are 10, 15 and 20 characters... replace with your string lengths*/

/* offsets assume the fields are packed with no separator characters;
   add any separators in your file to these offsets */
#define STR1LOC  FILENOLEN /* 0-based, not 1-based */
#define STR2LOC  (STR1LOC+STRLEN1)
#define STR3LOC  (STR2LOC+STRLEN2)

#define MAXRECS  100 /* assume at most 100 records in file */

/*now, make your struct mirror that*/

struct myRecord {
  long myFileNumber; /* NEED A LONG, NOT AN INT...*/
  char myString1[STRLEN1+1]; /*add 1 for NULL*/
  char myString2[STRLEN2+1];
  char myString3[STRLEN3+1];
  int myInt1;
  int myInt2;
};

int main(void)
{
  FILE *fInput;
  char inBuffer[100];
  static struct myRecord recs[MAXRECS];
  int recNum = 0;
  char *nextStr;
  char delims[]=" ,\n"; /*space, comma and newline are delimiters*/

  fInput = fopen("myfile.txt", "r");
  if (fInput == NULL)
    return EXIT_FAILURE;

  /* test fgets itself rather than feof(), so a failed read is never processed */
  while(recNum < MAXRECS && fgets(inBuffer, sizeof(inBuffer), fInput) != NULL)
  {
    if (strlen(inBuffer) < STR3LOC+STRLEN3)
      continue; /* line too short to hold the fixed-width fields */

    recs[recNum].myFileNumber = strtol(inBuffer,&nextStr,10);

    /* last valid index of each array is STRLENn, so terminate there */
    memcpy(recs[recNum].myString1, &inBuffer[STR1LOC],STRLEN1);
    recs[recNum].myString1[STRLEN1]=0;

    memcpy(recs[recNum].myString2, &inBuffer[STR2LOC],STRLEN2);
    recs[recNum].myString2[STRLEN2]=0;

    memcpy(recs[recNum].myString3, &inBuffer[STR3LOC],STRLEN3);
    recs[recNum].myString3[STRLEN3]=0;

    /* now read the ints (are you sure they are not LONGS?...*/
    nextStr =  strtok(&inBuffer[STR3LOC+STRLEN3], delims);
    if (nextStr != NULL)
      recs[recNum].myInt1 = atoi(nextStr);
    nextStr = strtok(NULL, delims);
    if (nextStr != NULL)
      recs[recNum].myInt2 = atoi(nextStr);

    /*we're done, let's increment recNum and do it until end-of-file */
    ++recNum;
  }
  fclose(fInput);
  return 0;
}
 

Author Comment

by:mjones1040
ID: 8181854
oops.  I should have said that strings 1,2 and 3 combined have the same number of characters.  However the only way to tell the difference between string 1 and string 2 is the comma.  The only way to distinguish string 2 from string 3 is the first space after the space following the comma separating 1 from 2.  Just to be clear... the sum of the lengths for all the strings will be a fixed number, however the lengths of the individual strings will vary from line to line.
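Given the layout as clarified here (nine digits, String1 up to the comma, then String2 and String3 each ending at a space, then two integers), one possible sketch uses a single sscanf call with a scanset. The struct record, the function parse_record, and the 63-character limit per string are all assumptions made for this example; it also assumes String2 itself contains no space.

```c
#include <assert.h>
#include <stdio.h>

#define FIELD_MAX 63   /* assumed upper bound on each string's length;
                          must match the widths in the format below */

struct record {
    long file_number;
    char string1[FIELD_MAX + 1];
    char string2[FIELD_MAX + 1];
    char string3[FIELD_MAX + 1];
    int  int1;
    int  int2;
};

/* Parse one line of the form
       123456789String1, String2 String3 Int1 Int2
   Returns 1 on success, 0 if any of the six fields is missing. */
int parse_record(const char *line, struct record *rec)
{
    int fields;

    assert(line != NULL && rec != NULL);
    /* %9ld    reads at most nine characters for the file number
       %63[^,] reads everything up to (not including) the comma
       %63s    reads one whitespace-delimited word */
    fields = sscanf(line, "%9ld%63[^,], %63s %63s %d %d",
                    &rec->file_number, rec->string1,
                    rec->string2, rec->string3,
                    &rec->int1, &rec->int2);
    return fields == 6;
}
```

Checking the return value against 6 catches lines with a missing comma or missing integers, which the pointer-walking versions would not notice on their own.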
 
LVL 6

Expert Comment

by:gj62
ID: 8181989
OK, so you have a choice.

You can waste some space in the struct, and not worry about dynamic allocation of string space (if you don't have that many structs, that's what I'd do).  Just assign each string the max value it can have.  

WARNING - in the other sample code posted by people, they never allocated this space, so it would crash since your struct is using char * (a pointer - no allocated space)...

You can then use the complete strtok() code I posted, rather than the memcpy...

If you want to dynamically allocate, it's no big deal, just use the following:

nextStr = strtok(sPtr, delims); /*nextStr now points to the NULL terminated string, so just strcpy...*/

recs[recNum].myString1 = (char *)malloc(strlen(nextStr)+1); /*malloc space for the string plus its terminator, assuming myString1 is a char * */
strcpy(recs[recNum].myString1,nextStr);

/* do it again for next strings...*/
nextStr =  strtok(NULL, delims);
recs[recNum].myString2 = (char *)malloc(strlen(nextStr)+1);
strcpy(recs[recNum].myString2, nextStr);
nextStr =  strtok(NULL, delims);
recs[recNum].myString3 = (char *)malloc(strlen(nextStr)+1);
strcpy(recs[recNum].myString3, nextStr);
/* now read the ints (are you sure they are not LONGS?...*/
nextStr =  strtok(NULL, delims);
recs[recNum].myInt1 = atoi(nextStr);
nextStr = strtok(NULL, delims);
recs[recNum].myInt2 = atoi(nextStr);
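If you go the dynamic route, the repeated malloc-and-strcpy pairs can be folded into a helper, and each record's strings should be freed when you are done with them. The names dyn_record, dup_string, and free_record_strings below are hypothetical and only sketch the idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dyn_record {
    char *string1;
    char *string2;
    char *string3;
};

/* Return a malloc'd copy of src, or NULL if allocation fails. */
char *dup_string(const char *src)
{
    char *copy;

    assert(src != NULL);
    copy = malloc(strlen(src) + 1);   /* +1 for the terminating '\0' */
    if (copy != NULL)
        strcpy(copy, src);
    return copy;
}

/* Release the three strings and reset the pointers so a second call,
   or a later check against NULL, is harmless. */
void free_record_strings(struct dyn_record *rec)
{
    assert(rec != NULL);
    free(rec->string1);
    free(rec->string2);
    free(rec->string3);
    rec->string1 = rec->string2 = rec->string3 = NULL;
}
```

Each strtok result would then be stored with something like rec.string1 = dup_string(nextStr); followed by a check that the result is not NULL.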
 

Expert Comment

by:posternb
ID: 8183071
>> long fno; /* NEED A LONG, NOT AN INT...*/

Note on many systems sizeof(long) == sizeof(int)
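Whether int is wide enough on a particular platform can be checked from limits.h. The C standard only guarantees INT_MAX of at least 32767 and LONG_MAX of at least 2147483647, so long always holds a nine-digit number while int may not. The two function names here are made up for the illustration.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if every nine-digit value fits in an int on this platform. */
int int_holds_nine_digits(void)
{
    return INT_MAX >= 999999999L;
}

/* Returns 1 if every nine-digit value fits in a long. The standard
   requires LONG_MAX >= 2147483647, so this holds on any conforming
   implementation. */
int long_holds_nine_digits(void)
{
    assert(LONG_MAX >= 2147483647L);
    return LONG_MAX >= 999999999L;
}
```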
 
LVL 6

Expert Comment

by:gj62
ID: 8183118
Agreed - but not a safe assumption to make if you don't know where your code may end up.  

Kinda like including sizeof(char)*(strlen(string)+1) in a malloc <grin><grin> - just a good habit...
 

Expert Comment

by:posternb
ID: 8183488
Hehe, yep
 

Author Comment

by:mjones1040
ID: 8222590
All of the comments were very helpful.  This is the one I chose as the accepted answer.  It along with your other comments seemed to help me the most.  Thanks!