RonMexico
asked on
How to read a text file from C, line-by-line, in portable fashion
So, I wrote a C program that reads an ASCII file line-by-line using code like the below. This worked fine compiled for Windows and for Ubuntu, but when we compiled it on a Raspberry Pi (which I think runs a variant of Debian Linux), it didn't process the file correctly. It seemed (but I'm not sure) to be loading the entire file into one line, i.e. not detecting the '\n' as an end of line and loading the whole file into line_text.
We have double-checked the data files, and each is encoded properly (i.e. line endings) for its operating system.
So my questions are:
1) isn't '\n' supposed to be the portable way to detect an end of line?
2) if not, what IS the portable way?
Thanks for any help.
while (!done_reading)
{
    ch = getc(fp);
    if (ch == EOF) done_reading = true;
    if ((ch == '\n') || (ch == EOF))
    {
        // process the complete line in line_text
        line_length = 0;
    }
    else
    {
        line_text[line_length] = ch;
        line_length++;
    }
}
>>By the way if I read the doc right one drawback of fgets is that it won't notify you if you hit
>>the max line length.
That's easy to avoid by choosing an ample-sized buffer. E.g. 2048 chars on a single line is pretty uncommon for a well-formed text file.
ASKER
Yes and no... consider if they accidentally try to load a binary file. There won't *be* regular line endings and they could overflow an adequate buffer pretty easily. I'd rather catch and abort (with a precise error code) earlier than later in that situation.
ASKER
*"read" not "load"
Also I should have said "yes, but" not "yes and no" since you indicated a text file in your example.
Thanks for the continued thoughts.
>> consider if they accidentally try to load a binary file.
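The open mode in 'fopen()' will open it as a text file, so a binary file will usually not be read through as if it were text: on Windows a Ctrl-Z byte (0x1A) is treated as end-of-file in text mode, and an embedded binary zero makes the string returned by 'fgets()' look as if it ended there.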
>>There won't *be* regular line endings and they could overflow an adequate buffer pretty
>>easily.
No, since 'fgets()' will only read up to the number of bytes specified (one fewer, in fact, to leave room for the terminating null), so buffer overruns will only happen if a mistake is made in that call.
ASKER
I should have said hitting the buffer limit instead of buffer overruns.
Point is, if I trap hitting the buffer limit, the client function can be more helpful to the user in its error messaging ("you have probably loaded an altogether wrong kind of file") than if it just doesn't recognize the content of a very long line, which could be just a typo. It's just a minor benefit provided by the character-by-character read.
Which will hopefully be more than offset by fgets being able to do what it's intended to do on a raspberry pi. :)
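For illustration, here is a minimal sketch of a character-by-character reader with such a trap. The function name read_line_checked and the status codes are made up for this example, not taken from the asker's code. Note that ch is declared as int so that it can hold EOF as well as every byte value.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch. */
enum read_status { READ_OK = 0, READ_EOF = 1, READ_TOO_LONG = 2 };

/*
 * Reads one line from fp into buf (capacity cap, including the '\0').
 * The newline itself is not stored. Returns READ_TOO_LONG as soon as the
 * line would not fit, so the caller can report a probably-wrong file type.
 */
static enum read_status read_line_checked(FILE *fp, char *buf, size_t cap,
                                          size_t *len)
{
    size_t n = 0;
    int ch;                          /* int, not char: must hold EOF too */

    *len = 0;
    if (cap == 0)
        return READ_TOO_LONG;        /* no room even for the '\0' */

    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {            /* complete line */
            buf[n] = '\0';
            *len = n;
            return READ_OK;
        }
        if (n + 1 >= cap) {          /* no room for this char plus '\0' */
            buf[n] = '\0';
            *len = n;
            return READ_TOO_LONG;
        }
        buf[n++] = (char)ch;
    }

    buf[n] = '\0';
    *len = n;
    /* A final line without a trailing newline still counts as a line. */
    return (n > 0) ? READ_OK : READ_EOF;
}
```

The sketch does not distinguish a read error from end of file; a caller that cares could check ferror(fp) after getting READ_EOF.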
Ironically, the easiest way out would be to say "go C++, use 'getline()'" - this would rid you of all these issues and handle lines of arbitrary length ;o)
ASKER
Thanks for the help!
fgets, unlike the posted code, will stop before the maximum number of bytes is reached, and it will append a null character to the string (whether or not it had already read null bytes in the file).
You would have to check for yourself whether the '\n' was included, since fgets will not tell you with its return value.
I'd like to see the test you did in which it did not process the file correctly, and a clarification of in what way it was not correct.
I'd also like to verify that ch was properly declared as int, not as char, since an EOF value does not fit in a char.
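A rough sketch of that newline check, assuming the file was opened in text mode. The name get_line and the result codes are invented for this example. It peeks one character ahead to tell "last line of the file has no newline" apart from "line did not fit".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum line_result { LINE_OK, LINE_EOF, LINE_TRUNCATED };

/* Reads one line with fgets() and reports whether it fit into buf. */
static enum line_result get_line(FILE *fp, char *buf, int cap)
{
    size_t len;
    int next;

    if (fgets(buf, cap, fp) == NULL)
        return LINE_EOF;             /* end of file or read error */

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';         /* complete line: strip the newline */
        return LINE_OK;
    }

    /* No newline was stored. That is fine only if the file ends here. */
    next = getc(fp);
    if (next == EOF)
        return LINE_OK;              /* last line without trailing newline */

    /* More data follows: the line did not fit, or it holds a zero byte. */
    ungetc(next, fp);
    return LINE_TRUNCATED;
}
```

A caller would typically treat LINE_TRUNCATED as "malformed or wrong kind of file" and stop, rather than continue reading the rest of the over-long line.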
ASKER
@jkr: Would love to go C++ but we are writing Simulink s-functions. Don't ask.
@ozo: Yeah, I actually edited my precious max-line-length trap (in the else branch, with specific error code) out of the posted code, since it was sideways from the problem. :) I didn't bother with a \0 since I was keeping track of the line_length myself as I appended to the buffer.
The result on the Pi was that it was hitting the limit of the line buffer (ie my redacted error code was returned), as if
if ((ch == '\n') || (ch == EOF))
was never evaluating to TRUE.
Very interesting on the EOF. It doesn't fit in a char?? Maybe changing that to int is something else to try...
ASKER
Does '\n' always fit into a char?
Come to think of it ozo, if fgets() doesn't work I think you might have pointed me in the right direction... the Raspberry Pi (embedded) probably has different data widths than my Windows and Ubuntu (desktop) installations.
If Simulink only has a C- style interface, that does not mean you cannot use C++ there - as long as the interface remains C, that's well possible.
ASKER
Interesting point jkr.
'\n' does fit into a char, but a char wouldn't == EOF, so if ch was declared as char, done_reading would never be set.
But if char was signed on your windows and ubuntu implementations, it is possible that the value when promoted to int would become comparable to EOF. (which would also mean that it could prematurely set done_reading if the file contains chars with a value that could be sign extended to look like EOF)
ASKER
By the way if I read the doc right one drawback of fgets is that it won't notify you if you hit the max line length. I'd prefer to trap this situation since it would generally mean a malformed input file. Things like that are why I usually do things like this longhand.