
Reading files : line breaks and EOF

I'm trying to read delimited text files into a C++ program.

There are two issues that are bugging me :

(1) Line breaks

My code is currently as below. If I do not specify "\r" in the getline call, the whole thing stops working.

The issue I've got is that I cannot assume that all files inputted will terminate with "\r", some might be "\n" and some might be "\r\n".

How do I handle that issue ? I did some research that suggested opening the file in binary mode would help, but this has had no positive effect.

(2) EOF

There seems to be something wrong with my EOF detection routine but I can't figure out what.

The output from sscanf is fed into a vector of vectors (myData).

The inner vector operates as expected, there are 10 delimited fields and therefore inner vector size is 10 and everyone is happy.

The outer vector does not operate as expected. There are only 40 lines in the file, however the program will crash and burn if I specify vector size of 40. If I add a magical extra vector element to the outer vector, everyone is happy again !

Have I missed something obvious here ? And I can't figure out how to count the number of lines in the file first due to the line break issue above !

Over to you experts !

vector<vector<string> > myData(41, vector<string>(10));
ifstream infile(fileName, ios_base::in | ios_base::binary);
if (!infile.is_open()) {
	cerr << "Unable to open input file !" << endl;
	return 0;
}
while (!infile.eof()) {
	getline(infile, strLine, '\r');
	if (!strLine.length())
		continue;
	sscanf(find stuff........)
}


evilrix

>> My code is currently as below. If I do not specify "\r" in the getline call, the whole thing stops working.
You are opening the file as binary. If you open it as text you won't need to do this (although see below for more info on this).

>> The issue I've got is that I cannot assume that all files inputted will terminate with "\r", some might be "\n" and some might be "\r\n".
If you open the file as text rather than binary, parsing gets a little simpler: on platforms that translate line endings in text mode (Windows, for example), CRLF and LF are both presented as LF, so the only special case left to handle is a lone CR. I had this exact problem when parsing PDF files :(

>> How do I handle that issue ?
Well, getline is designed to read a text file and splits on a single delimiter character, so for mixed line endings you'll have to code something to parse this yourself.

>> There are only 40 lines in the file, however the program will crash and burn if I specify vector size of 40.
Heh, you didn't provide the important part of the sscanf call, which is the format specifier. That said, why don't you use a stringstream to extract your values? It'll be far simpler and safer.
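A rough sketch of the stringstream approach, under a few assumptions: the delimiter is a comma (the question does not say what it is), and the helper names splitFields and readRows are made up for illustration. Two things differ from the code in the question. The loop tests the result of getline itself rather than eof(), since eof() only becomes true after a read has already failed, and that is one plausible reason the loop runs one more time than there are lines. Rows are also appended with push_back, so the outer vector does not need to be sized in advance.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into fields on `delim`.
// The comma default is an assumption; the question does not name the delimiter.
// Note that a trailing delimiter does not produce a trailing empty field.
std::vector<std::string> splitFields(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::istringstream iss(line);
    std::string field;
    while (std::getline(iss, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// Hypothetical helper: reads every non-empty LF-terminated line of `in`
// into a row of fields. The loop stops as soon as a read fails.
std::vector<std::vector<std::string> > readRows(std::istream& in, char delim = ',') {
    std::vector<std::vector<std::string> > rows;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) {
            continue;  // skip blank lines, as the original code does
        }
        rows.push_back(splitFields(line, delim));
    }
    return rows;
}
```

This version still relies on plain getline, so it only handles LF-terminated input; the mixed line-ending problem has to be dealt with separately.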

>> And I can't figure out how to count the number of lines in the file first due to the line break issue above !
The line-ending inconsistencies mean you'll really have to code your own parser, since the file's line endings aren't consistent. Read in each line, then parse each line for embedded CRs that might make it multiple lines.