Solved

How to read a text file from C, line-by-line, in portable fashion

Posted on 2013-06-30
Last Modified: 2016-08-13
So, I wrote a C program that loads an ASCII file line by line using code like the below. This worked fine when compiled for Windows and for Ubuntu, but when we compiled it on a Raspberry Pi (which I think runs a variant of Debian Linux), it didn't process the file correctly. It seemed (but I'm not sure) to be loading the entire file into one line, i.e. not detecting the '\n' as an end of line and loading the whole file into line_text.

We have double-checked the data files, and each one is encoded properly (i.e. has the right line endings) for its operating system.

So my questions are:

1) isn't '\n' supposed to be the portable way to detect an end of line?
2) if not, what IS the portable way?

Thanks for any help.

        while (!done_reading)
        {
            ch = getc(fp);
            if (ch == EOF) done_reading = true;
            if ((ch == '\n') || (ch == EOF))
            {
                // process the complete line in line_text
                line_length = 0;
            }
            else
            {
                line_text[line_length] = ch;
                line_length++;
            }
        }

Question by:RonMexico
15 Comments
 
LVL 86

Accepted Solution

by:
jkr earned 500 total points
ID: 39288175
That actually should work in a portable way, although you might see an additional '\r' before the '\n' when the file has Windows-style (CRLF) line endings and they are not translated on reading. That said, looping and reading single chars looks overly complicated; why not use 'fgets()' (http://www.cplusplus.com/reference/cstdio/fgets/?kw=fgets) to read an entire line? See the example on that page:

/* fgets example */
#include <stdio.h>

int main()
{
   FILE * pFile;
   char mystring [100];

   pFile = fopen ("myfile.txt" , "r");
   if (pFile == NULL) perror ("Error opening file");
   else {
     if ( fgets (mystring , 100 , pFile) != NULL )
       puts (mystring);
     fclose (pFile);
   }
   return 0;
}
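
The example above reads only the first line. As a minimal sketch of how it might be extended to every line, the following loops on fgets() and strips a trailing "\n" or "\r\n" before printing. The names strip_line_ending and print_lines and the 256-byte buffer are assumptions made for illustration, not part of the cplusplus.com example, and lines longer than the buffer would simply arrive in several pieces.

```c
#include <stdio.h>
#include <string.h>

/* Cut the string at its first '\r' or '\n' (in place) and return the new length.
   Note: a stray '\r' in the middle of a line would also end the string here. */
size_t strip_line_ending(char *s)
{
    size_t len = strcspn(s, "\r\n");  /* index of first '\r' or '\n', else strlen(s) */
    s[len] = '\0';
    return len;
}

/* Read fp to the end with fgets(), printing each chunk without its line ending.
   Returns the number of successful fgets() calls. */
int print_lines(FILE *fp)
{
    char buf[256];                    /* illustrative size, not a recommendation */
    int count = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        strip_line_ending(buf);
        puts(buf);
        count++;
    }
    return count;
}
```

A caller would open the file with fopen(), check the result for NULL as in the example above, pass the FILE pointer to print_lines(), and then fclose() it.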

 

Author Comment

by:RonMexico
ID: 39288433
Thanks, I will try that and post back about whether it works. Also good to know it *should* have worked and I'm not losing my mind.

By the way, if I read the doc right, one drawback of fgets is that it won't notify you if you hit the max line length. I'd prefer to trap this situation since it would generally mean a malformed input file. Things like that are why I usually do things like this longhand.
 
LVL 86

Expert Comment

by:jkr
ID: 39288456
>>By the way if I read the doc right one drawback of fgets is that it won't notify you if you hit
>>the max line length.

That's easy to avoid by choosing an amply sized buffer. E.g. 2048 chars on a single line are pretty uncommon for a well-formed text file.
 

Author Comment

by:RonMexico
ID: 39288464
Yes and no... consider if they accidentally try to load a binary file.  There won't *be* regular line endings and they could overflow an adequate buffer pretty easily.  I'd rather catch and abort (with a precise error code) earlier than later in that situation.
 

Author Comment

by:RonMexico
ID: 39288468
*"read" not "load"

Also I should have said "yes, but" not "yes and no" since you indicated a text file in your example.  

Thanks for the continued thoughts.
 
LVL 86

Expert Comment

by:jkr
ID: 39288470
>> consider if they accidentally try to load a binary file.  

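The open mode in 'fopen()' will open it as a text file. Note that a binary zero is not itself interpreted as "EOF"; on Windows, text mode treats Ctrl-Z (0x1A) as end of file, and elsewhere the bytes are read as-is.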

>>There won't *be* regular line endings and they could overflow an adequate buffer pretty
>>easily.

No, since 'fgets()' will only read up to the number of bytes specified (one less than the size passed, leaving room for the terminating null), so buffer overruns will only happen if a mistake is made in that call.
 

Author Comment

by:RonMexico
ID: 39288485
I should have said hitting the buffer limit instead of buffer overruns.

Point is, if I trap hitting the buffer limit, the client function will be more helpful to the user in its error messaging ("you have probably loaded an altogether wrong kind of file") than if it just doesn't recognize the content of a very long line, which could be just a typo. It's just a minor benefit provided by the character-by-character read.

Which will hopefully be more than offset by fgets being able to do what it's intended to do on a raspberry pi.  :)
 
LVL 86

Expert Comment

by:jkr
ID: 39288500
Ironically, the easiest way out would be to say "go C++, use 'getline()'" - this would rid you of all these issues and handle lines of arbitrary length ;o)
 

Author Closing Comment

by:RonMexico
ID: 39288501
Thanks for the help!
 
LVL 84

Expert Comment

by:ozo
ID: 39288505
fgets, unlike the posted code, will stop before the maximum number of bytes is reached, and it will append a null character to the string (whether or not it had already read null bytes in the file).
You would have to do the check for whether the '\n' was included yourself, since fgets will not tell you with its return value.

I'd like to see the test you did in which it did not process the file correctly, and a clarification of in what way it was not correct.
I'd also like to verify that ch was properly declared as int, not as char, since an EOF value does not fit in a char.
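
The check for a missing '\n' described above could look roughly like the sketch below, which reports an over-long line as its own result code so the caller can issue a precise error. The function name read_line_checked and the LINE_* codes are hypothetical, the buffer is assumed to hold at least 2 bytes, and a file containing null bytes would defeat the strlen() call, so this is only a starting point.

```c
#include <stdio.h>
#include <string.h>

enum { LINE_OK = 0, LINE_EOF = 1, LINE_TOO_LONG = 2 };  /* hypothetical result codes */

/* Read one line into buf (capacity size, assumed >= 2) using fgets().
   LINE_OK:       a line was read; its '\n', if any, has been removed.
   LINE_EOF:      nothing could be read (end of file or read error).
   LINE_TOO_LONG: the buffer filled before a '\n' was seen; the rest of the
                  line is still unread in the stream. */
int read_line_checked(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int next;

    if (fgets(buf, (int)size, fp) == NULL)
        return LINE_EOF;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';          /* normal case: complete line */
        return LINE_OK;
    }
    if (len < size - 1)
        return LINE_OK;               /* last line of the file, no trailing '\n' */

    /* Buffer is full with no '\n': peek to see whether the line really continues. */
    next = getc(fp);
    if (next == EOF || next == '\n')
        return LINE_OK;               /* the line fit exactly */
    ungetc(next, fp);
    return LINE_TOO_LONG;
}
```

A caller would loop until LINE_EOF and abort with its own error message on LINE_TOO_LONG, which is the situation the character-by-character version was trapping.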
 

Author Comment

by:RonMexico
ID: 39288520
@jkr: Would love to go C++ but we are writing Simulink s-functions.  Don't ask.  

@ozo: Yeah, I actually edited my precious max-line-length trap (in the else branch, with specific error code) out of the posted code, since it was sideways from the problem.  :)  I didn't bother with a \0 since I was keeping track of the line_length myself as I appended to the buffer.

The result on the Pi was that it was hitting the limit of the line buffer (i.e. my redacted error code was returned), as if

if ((ch == '\n') || (ch == EOF))

was never evaluating to TRUE.

Very interesting on the EOF.  It doesn't fit in a char??  Maybe changing that to int is something else to try...
 

Author Comment

by:RonMexico
ID: 39288528
Does '\n' always fit into a char?

Come to think of it, ozo, if fgets() doesn't work I think you might have pointed me in the right direction... the Raspberry Pi (embedded) probably has different data widths than my Windows and Ubuntu (desktop) installations.
 
LVL 86

Expert Comment

by:jkr
ID: 39288534
If Simulink only has a C-style interface, that does not mean you cannot use C++ there. As long as the interface remains C, that's entirely possible.
 

Author Comment

by:RonMexico
ID: 39288538
Interesting point jkr.
 
LVL 84

Expert Comment

by:ozo
ID: 39288575
'\n' does fit into a char, but EOF is a negative int, so if ch was declared as char and char is unsigned, ch could never == EOF and done_reading would never be set.

But if char was signed on your Windows and Ubuntu implementations, it is possible that the value, when promoted to int, would compare equal to EOF (which would also mean that it could prematurely set done_reading if the file contains a char whose value sign-extends to look like EOF, such as 0xFF).
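
To make the point about ch concrete, here is a sketch of the character-by-character read with ch declared as int and a line-length trap of the kind mentioned earlier in the thread. The function name read_line_by_char and its return codes are assumptions for illustration, not the asker's actual code. Plain char is typically unsigned on ARM Linux toolchains and signed on x86 ones, which would be consistent with the behaviour differing on the Pi, though the thread does not confirm that this was the cause.

```c
#include <stdio.h>

/* Read one line from fp into line_text (capacity max_len, no '\0' appended);
   the caller tracks the length through *line_length, as in the posted code.
   Returns 1 if a line was read, 0 at end of file with no data, and -1 if the
   line exceeds max_len (a hint that this is probably not a text file). */
int read_line_by_char(FILE *fp, char *line_text, size_t max_len, size_t *line_length)
{
    int ch;        /* int, not char: must hold EOF as well as every byte value */
    size_t n = 0;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (n >= max_len)
            return -1;                /* buffer limit hit before any '\n' */
        line_text[n++] = (char)ch;    /* convert only when storing */
    }
    *line_length = n;
    if (ch == EOF && n == 0)
        return 0;                     /* nothing left to read */
    return 1;
}
```

Because the comparison with EOF happens on the int returned by getc(), a data byte of 0xFF is stored like any other character and cannot end the loop early, whichever signedness char has on the target.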
