bebonham

asked on

reading files

What is the fastest way to read through a file? I will need to know things like...

what is the length (in LINES) of the file
I need to access the beginning of each line like a member of a list or array.

I was thinking of two or three things...

read the file a character at a time into a one-char buffer, check the buffer for newlines, and increment a counter each time,

something like

while reading into char a:
    chars++
    if a == '\n'
        lines++
then I can seek with chars, but I would probably have to make a list of all the positions of the '\n'.

the downside to this is that the loop body runs once for every character in the file.
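In real code, I imagine it would look roughly like this (untested sketch; the file name is just an example):

#include <fstream>
#include <vector>
using namespace std;

int main()
{
     ifstream str ( "myfile.txt", ios::binary);
     vector<long> newlinePos;             // byte offsets of every '\n' found
     long chars = 0;
     long lines = 0;
     char a;

     while ( str.get(a))                  // one character at a time
     {
          if ( a == '\n')
          {
               ++lines;
               newlinePos.push_back (chars);   // remember where this line ends
          }
          ++chars;
     }
     return 0;
}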

I could also try to do it a line at a time, or, say, 10 or 20 chars at a time...

what is the best way to do this with the least memory and processor usage?

I will need to completely reorder the list by the time I am done with it.

Bob
ASKER CERTIFIED SOLUTION
jkr
jonnin

Read as much of the file as you have memory for, like several (5-10) megs' worth.
The best way is to read N sectors at a time (hardware specific, though); the reason is that a sector is the minimum the hardware will read, and the unused part may be thrown away (the OS can hold on to it in anticipation of use soon)...

Use up that buffer, then get more until done...

Line by line is fine for small apps. Large chunking is for BIG files, usually binary files...

Chunking helps even more if the drive is compressed (NT, etc.)...
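The rough shape of it, as an untested sketch (the 1 MB buffer size and file name are just examples):

#include <fstream>
#include <vector>
using namespace std;

int main()
{
     ifstream str ( "myfile.txt", ios::binary);
     vector<char> buf (1024 * 1024);      // read in 1 MB chunks (size is just an example)
     long lines = 0;

     while ( str.read(&buf[0], buf.size()) || str.gcount() > 0)
     {
          streamsize got = str.gcount();  // bytes actually read this pass
          for ( streamsize i = 0; i < got; ++i)
               if ( buf[i] == '\n')
                    ++lines;              // count newlines in the chunk
     }
     return 0;
}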
I recommend you use the global getline function.
It allows you to use a std::string, so you don't have to know in advance the max size of a line before fetching it.

Here's a modified example of the code posted by jkr.

#include <fstream>
#include <string>
#include <list>
using namespace std;


int main(int argc, char* argv[])
{
     ifstream str ( "myfile.txt");
     list<string> lstLines;
     
     string line;
     while ( getline(str, line))     // test the read itself; looping on eof() adds a bogus last line
     {
          lstLines.push_back (line);
     }
     
     str.close();
     return 0;
}

>>I recommend you use the global getline function

Have been scratching my head, I *knew* there was one that takes a string&... well, too much low-level I/O lately, I guess :o)
bebonham

ASKER

isn't it a waste to put the whole lines in memory?

I thought it might be faster to only save the positions in the file where the lines start.

so it doesn't really slow things down to put a whole... 5-10 megabyte file into a list like that?
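What I had in mind is roughly this (untested sketch; the file name and the line I jump to are just examples):

#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
     ifstream str ( "myfile.txt");
     vector<streampos> lineStarts;        // offset where each line begins
     string line;

     while ( true)
     {
          streampos pos = str.tellg();    // position before reading the next line
          if ( !getline(str, line))
               break;
          lineStarts.push_back (pos);
     }
     // lineStarts.size() is the line count

     // later: jump straight to line i (0-based) without re-reading the file
     size_t i = 2;
     if ( i < lineStarts.size())
     {
          str.clear();                    // clear the eof flag before seeking
          str.seekg (lineStarts[i]);
          getline (str, line);            // line now holds line i
     }
     return 0;
}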


Thanks,

Bob
If you don't plan to put the file content into memory, why would you need to completely reorder the list by the time you're done with it?
>>I need to access the beginning of each line like a member of a list or array

It might help to know what is in that 'beginning of each line' and why you want to re-sort it...
Similar to the approaches above, you could use MFC and the CString and CStringArray classes to do what you want in a Win32 console-mode application. You can use the Win32 APIs OpenFile() and ReadFile() with a buffer long enough to hold your longest line, then add that buffer to the CStringArray. Then you can access the CStringArray just like an array of the lines of the file.

I have done this successfully with multiple megabyte files--it just depends on how much memory you have.
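A bare-bones sketch of that idea, using MFC's CStdioFile::ReadString to do the line splitting instead of raw OpenFile()/ReadFile() (function and file names are just placeholders):

#include <afx.h>        // core MFC: CFile, CStdioFile, CString
#include <afxcoll.h>    // CStringArray

int ReadLines(LPCTSTR path, CStringArray& lines)
{
     CStdioFile file;
     if ( !file.Open(path, CFile::modeRead | CFile::typeText))
          return 0;                       // could not open the file

     CString line;
     while ( file.ReadString(line))       // one line per call, trailing newline removed
          lines.Add (line);

     file.Close();
     return (int) lines.GetSize();        // line count == array length
}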

SSR
the file is a simple CSV; this is for a bulk-mailing program for postcards.

the problem is that sometimes we have 1 per page, sometimes 2 per page, and sometimes 4 per page.

if there are 4 per page, then the order has to be changed like so:

from 1,2,3,4,5,6,7,8,9,10,11,12
to
1,4,7,10,2,5,8,11,3,6,9,12

because a 4-up is laid out like this:

                  ---------
                  | a | c |
                  ---------
                  | b | d |
                  ---------

BUT then they get cut into 4 stacks,
so stack a has to be on top of stack b, stack b on top of stack c, etc.

hence the need for the reorder, to maintain the PRESORTED order.



the basic formula is to split the total into four groups; say, for 1000:

group a = 1-250
group b = 251-500
group c = 501-750
group d = 751-1000



Thanks,

Bob


I must be missing the big picture, because I don't understand how the above info fits with the line counting requirement.

Could you tie the two together with more info?
okay, sorry my examples were unclear; thanks for sticking with me.

if I am doing 4-to-a-page postcards,

each pile has to have 1/4 of the total postcards,

but I have to know how many there are so I know where to start each pile.

on the first page printed, if there are 4000 records,
I have to know that in spot b on the first page, the record printed has to be 1001.

so I have to know the total, so I can figure out the starting number of each stack.

I make 4 lists, one for each stack, and then pop the top one off each list
so for a list of 4000
I have 4 lists
list a 1-1000
b 1001-2000
c 2001-3000
d 3001-4000

then for page 1 I have 1, 1001, 2001, 3001
then for page 2 I have 2, 1002, 2002, 3002
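in code, the reordering comes down to something like this (untested sketch; it assumes the total divides evenly by 4, and 4000 is just the example above):

#include <cstdio>

int main()
{
     const int total = 4000;              // example total from above
     const int stackSize = total / 4;     // 1000 records per stack

     // page p gets one record from each of the four stacks
     for ( int p = 1; p <= stackSize; ++p)
     {
          int a = p;                      // stack a: 1..1000
          int b = p + stackSize;          // stack b: 1001..2000
          int c = p + 2 * stackSize;      // stack c: 2001..3000
          int d = p + 3 * stackSize;      // stack d: 3001..4000
          printf ("page %d: %d, %d, %d, %d\n", p, a, b, c, d);
     }
     return 0;
}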


I hope that clears it up

Bob
thanks to everyone.