Solved

reading files

Posted on 2002-03-22
Last Modified: 2010-04-02
What is the fastest way to read through a file? I will need to know things like...

what is the length (in LINES) of the file
I need to access the beginning of each line like a member of a list or array.

I was thinking of two or three things...

read the file a character at a time into a one-char buffer, check the buffer for a newline, and increment a counter each time,

something like

while reading into char a,
chars++
if a == '\n'
lines++
then I can seek using chars, but I would probably have to make a list of all the positions of the \n

the downside to this is that it calls a loop for each character in the file.

I could also try to do it a line at a time, or, say, 10 or 20 chars at a time...

what is the best way to do this with the least memory and processor time?

I will need to completely reorder the list by the time I am done with it.

Bob
0
Question by:bebonham
LVL 86

Accepted Solution

by:
jkr earned 100 total points
ID: 6889221
Reading char-by-char is certainly the slowest way to do that - why not

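#include <fstream>
#include <string>
#include <list>
using namespace std;

// ...

const int MAX_LENGTH = 1024; // assumed upper bound on the length of one line
ifstream str ( "myfile.txt"); // an ifstream never creates a missing file
char buf [ MAX_LENGTH];
list<string> lstLines;

// Test the read itself instead of eof(), so a failed read is never stored.
// Note: the loop also stops early if a line does not fit into buf.
while ( str.getline ( buf, sizeof ( buf))) {

  string line = buf;
  lstLines.push_back ( line);
}

str.close();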

This way, you'll have all the lines in a list...
0
 
LVL 2

Expert Comment

by:jonnin
ID: 6889382
read as much of the file at a time as you have memory for.
like several (5-10) megs worth.
the best way is to read N sectors at a time (hardware specific though). the reason is that a sector is the minimum that the hardware will read; the unused part may be thrown away (although the os can hold on to it in anticipation of use soon)...

use up that buffer, and get more until done...



line by line is fine for small apps. large chunking is for BIG files, usually binary files...

chunking helps even more if the drive is compressed (nt, etc)...
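
A minimal sketch of the chunked approach described above, applied to the line-counting part of the question. The function name count_lines and the 1 MiB default chunk size are assumptions made for illustration, as is the choice to count a final line that lacks a trailing newline.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Count lines by reading large blocks and counting '\n' bytes in each block.
// chunk_size is an illustrative default, not a tuned value.
std::size_t count_lines(std::istream& in, std::size_t chunk_size = 1 << 20) {
    assert(chunk_size > 0);
    std::vector<char> buf(chunk_size);
    std::size_t lines = 0;
    char last = '\n';  // last byte seen; '\n' means "no unterminated line pending"
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;  // nothing more to read
        const auto end = buf.begin() + static_cast<std::ptrdiff_t>(got);
        lines += static_cast<std::size_t>(std::count(buf.begin(), end, '\n'));
        last = buf[got - 1];
    }
    if (last != '\n') ++lines;  // a final line without a trailing newline still counts
    return lines;
}
```

For a real file, the stream would typically be a std::ifstream opened on the file; any std::istream works, including a std::istringstream for quick checks.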
0
 
LVL 30

Expert Comment

by:Axter
ID: 6889628
I recommend you use the global getline function.
It allows you to use a std::string, so you don't have to know in advance the max size of a line before fetching it.

Here's a modified example of the code posted by jkr.

#include <fstream>
#include <string>
#include <list>
using namespace std;


int main(int argc, char* argv[])
{
     ifstream str ( "myfile.txt");
     list<string> lstLines;
     
     // Loop on getline itself; testing eof() first would store a spurious empty line at the end
     string line;
     while ( getline(str,line))
     {
          lstLines.push_back (line);
     }
     
     str.close();
     return 0;
}

0
 
LVL 86

Expert Comment

by:jkr
ID: 6889663
>>I recommend you use the global getline function

Have been scratching my head, I *knew* there was one that takes a string&... well, too much low-level I/O lately, I guess :o)
0
 
LVL 8

Author Comment

by:bebonham
ID: 6889756
isn't it a waste to put the whole line in memory?

I thought it might be faster to only save the position of the file where the lines started.

so it doesn't really slow things down to put a whole...5-10 megabyte file into a list like that?


Thanks,

Bob
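
The idea raised here, keeping only the file position at which each line starts, could be sketched as follows. The names index_line_starts and read_line_at are hypothetical, and the sketch assumes '\n' line endings and a stream opened in binary mode so that offsets correspond to bytes in the file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One pass over the stream, recording the offset at which every line begins.
std::vector<std::streamoff> index_line_starts(std::istream& in) {
    std::vector<std::streamoff> starts;
    std::vector<char> buf(64 * 1024);
    std::streamoff offset = 0;   // offset of buf[0] within the stream
    bool at_line_start = true;   // the next byte read begins a new line
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        for (std::streamsize i = 0; i < got; ++i) {
            if (at_line_start) {
                starts.push_back(offset + i);
                at_line_start = false;
            }
            if (buf[static_cast<std::size_t>(i)] == '\n') at_line_start = true;
        }
        offset += got;
    }
    return starts;
}

// Fetch a single line on demand by seeking to a recorded offset.
std::string read_line_at(std::istream& in, std::streamoff start) {
    in.clear();                       // reset eof/fail state left by the scan
    in.seekg(start, std::ios::beg);
    std::string line;
    std::getline(in, line);
    return line;
}
```

The number of lines is then starts.size(), and only one offset per line stays in memory. The trade-off is a seek and a read each time a line is needed, which may or may not beat holding a 5-10 megabyte file in a list.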
0
 
LVL 30

Expert Comment

by:Axter
ID: 6889767
If you don't plan to put the file content into memory, why would you need to completely reorder the list by the time you're done with it?
0
 
LVL 86

Expert Comment

by:jkr
ID: 6889793
>>I need to access the beginning of each line like a member of a list or array

It might help to know what is in that 'beginning of each line' and why you want to re-sort it...
0
 
LVL 2

Expert Comment

by:ssr
ID: 6889816
Similar to the approaches above, you could use MFC, and CString and CStringArray classes to do what you want, in a Win32 console mode application.  You can use the Win32 APIs OpenFile() and ReadFile() with a buffer that is long enough to handle your longest line, then add this buffer to the CString array.  Then you can access the CStringArray just like an array of the lines of the file.

I have done this successfully with multiple megabyte files--it just depends on how much memory you have.

SSR
0
 
LVL 8

Author Comment

by:bebonham
ID: 6890070
the file is a simple csv, this is for a bulk-mailing program for postcards.

the problem is that sometimes we have 1 per page, sometimes 2 per page and sometimes 4 per page.

if there are 4 per page then the order has to be changed like so

from 1,2,3,4,5,6,7,8,9,10,11,12
to
1,4,7,10,2,5,8,11,3,6,9,12

because a 4-up is

---------
| a | c |
---------
| b | d |
---------

BUT then they get cut in 4's
so stack a has to be on top of stack b, stack b on top of c, etc.

hence the need for the reorder, to maintain the PRESORTED order.



the basic formula is
to split the total into four groups, say for 1000

group a= 1-250
group b=251-500
group c=501-750
group d=751-1000



Thanks,

Bob


0
 
LVL 30

Expert Comment

by:Axter
ID: 6890083
I must be missing the big picture, because I don't understand how the above info fits with the line counting requirement.

Could you tie the two together with more info?
0
 
LVL 8

Author Comment

by:bebonham
ID: 6890120
okay, sorry for the unclearness of my examples, thanks for sticking w/ me.

if I am doing 4 to a page postcards,

each pile has to have 1/4 of all the total postcards,

but I have to know how many so I know where to start each pile

on the first page printed, if there are 4000 records,
I have to know that in spot b on the first page, the record printed has to be 1001

so I have to know the total, so I can figure out what is the starting number of each stack.

I make 4 lists, one for each stack, and then pop the top one off each list
so for a list of 4000
I have 4 lists
list a 1-1000
b 1001-2000
c 2001-3000
d 3001-4000

then for page 1 I have 1, 1001, 2001, 3001
then page 2 I have 2,1002,2002,3002


I hope that clears it up

Bob
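
The reordering described above can be sketched without building four separate lists: once the total is known, the record for each slot on each page follows from arithmetic. The function name n_up_order is hypothetical, and the handling of totals that do not divide evenly (round the stack height up and skip slots that have no record) is an assumption, since the thread only covers evenly divisible totals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return 1-based record numbers in print order for `stacks` cards per page.
// Stack s (0-based) holds records s*height+1 .. (s+1)*height, and page p
// takes the p-th record from each stack in turn.
std::vector<std::size_t> n_up_order(std::size_t total, std::size_t stacks) {
    assert(stacks > 0);
    const std::size_t height = (total + stacks - 1) / stacks;  // records per stack, rounded up
    std::vector<std::size_t> order;
    order.reserve(total);
    for (std::size_t page = 0; page < height; ++page) {
        for (std::size_t s = 0; s < stacks; ++s) {
            const std::size_t record = s * height + page + 1;
            if (record <= total) order.push_back(record);  // skip empty slots (assumed policy)
        }
    }
    return order;
}
```

With 12 records and 4 per page this gives 1,4,7,10,2,5,8,11,3,6,9,12, matching the example earlier in the thread. Combined with a list or vector of lines, order[i] - 1 is the index of the line to print in position i.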
0
 
LVL 8

Author Comment

by:bebonham
ID: 6896771
thanks to everyone.
0
