Solved

reading files

Posted on 2002-03-22
Last Modified: 2010-04-02
What is the fastest way to read through a file?  I will need to know things like...

what is the length (in LINES) of the file
I need to access the beginning of each line like a member of a list or array.

I was thinking of two or three things...

read the file a character at a time into a one-char buffer, check the buffer for a newline, and increment a counter each time,

something like

while reading into char a,
chars++
if a == '\n'
lines++
then I can seek with chars, but I would probably have to make a list of all the positions of the \n

the downside to this is that it runs a loop iteration for each character in the file.

I could also try to do it a line at a time, or, say, 10 or 20 chars at a time...

what is the best way to do this with the least memory and processor time?

I will need to completely reorder the list by the time I am done with it.

Bob
Question by:bebonham
12 Comments
 
LVL 86

Accepted Solution

by:
jkr earned 100 total points
ID: 6889221
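Reading char-by-char is certainly the slowest way to do that - why not

#include <fstream>
#include <string>
#include <list>
using namespace std;

// ...

const int MAX_LENGTH = 1024; // assumed upper bound on line length, adjust to your data
ifstream str ( "myfile.txt");
char buf [ MAX_LENGTH];
list<string> lstLines;

// test the read itself; looping on eof() would add a bogus final entry.
// note: a line longer than MAX_LENGTH - 1 chars sets failbit and ends the loop
while ( str.getline ( buf, sizeof ( buf))) {

  string line = buf;
  lstLines.push_back ( line);
}

str.close();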

This way, you'll have all the lines in a list...
 
LVL 2

Expert Comment

by:jonnin
ID: 6889382
read as much of a file as you have memory.
like several (5-10) megs worth.
the best way is to read N sectors at a time (hardware specific though). the reason is that a sector is the minimum the hardware will read; any unused part may be thrown away (though the os can hold on to it in anticipation of use soon)...

use up that buffer, and get more until done...



line by line is fine for small apps. large chunking is for BIG files, usually binary files...

chunking helps even more if the drive is compressed (nt, etc)...
 
LVL 30

Expert Comment

by:Axter
ID: 6889628
I recommend you use the global getline function.
It allows you to use a std::string, so you don't have to know the maximum line length in advance.

Here's a modified example of the code posted by jkr.

#include <fstream>
#include <string>
#include <list>
using namespace std;


int main(int argc, char* argv[])
{
     ifstream str ( "myfile.txt");
     list<string> lstLines;
     string line;

     // loop on the read itself so a failed read at end of file adds nothing
     while ( getline(str,line))
     {
          lstLines.push_back (line);
     }

     str.close();
     return 0;
}

 
LVL 86

Expert Comment

by:jkr
ID: 6889663
>>I recommend you use the global getline function

Have been scratching my head, I *knew* there was one that takes a string&... well, too much low-level I/O lately, I guess :o)
 
LVL 8

Author Comment

by:bebonham
ID: 6889756
isn't it a waste to put the whole line in memory?

I thought it might be faster to only save the position of the file where the lines started.

so it doesn't really slow things down to put a whole...5-10 megabyte file into a list like that?


Thanks,

Bob
 
LVL 30

Expert Comment

by:Axter
ID: 6889767
If you don't plan to put the file content into memory, why would you need to completely reorder the list by the time you're done with it?
 
LVL 86

Expert Comment

by:jkr
ID: 6889793
>>I need to access the beginning of each line like a member of a list or array

It might help to know what is in that 'beginning of each line' and why you want to re-sort it...
 
LVL 2

Expert Comment

by:ssr
ID: 6889816
Similar to the approaches above, you could use MFC, and CString and CStringArray classes to do what you want, in a Win32 console mode application.  You can use the Win32 APIs OpenFile() and ReadFile() with a buffer that is long enough to handle your longest line, then add this buffer to the CString array.  Then you can access the CStringArray just like an array of the lines of the file.

I have done this successfully with multiple megabyte files--it just depends on how much memory you have.

SSR
 
LVL 8

Author Comment

by:bebonham
ID: 6890070
the file is a simple csv, this is for a bulk-mailing program for postcards.

the problem is that sometimes we have 1 per page, sometimes 2 per page and sometimes 4 per page.

if there are 4 per page then the order has to be changed like so

from 1,2,3,4,5,6,7,8,9,10,11,12
to
1,4,7,10,2,5,8,11,3,6,9,12

because a 4-up is __________
                  | a | c |
                  ---------
                  | b | d |
                  ---------

BUT then they get cut in 4's
so stack a has to be on top of stack b, stack b on top of c, etc.

hence the need for the reorder, to maintain the PRESORTED order.



the basic formula is
to split the total into four groups, say for 1000

group a= 1-250
group b=251-500
group c=501-750
group d=751-1000



Thanks,

Bob


 
LVL 30

Expert Comment

by:Axter
ID: 6890083
I must be missing the big picture, because I don't understand how the above info fits with the line counting requirement.

Could you tie the two together with more info?
 
LVL 8

Author Comment

by:bebonham
ID: 6890120
okay, sorry for the unclearness of my examples, thanks for sticking w/ me.

if I am doing 4 to a page postcards,

each pile has to have 1/4 of all the total postcards,

but I have to know how many so I know where to start each pile

on the first page printed, if there are 4000 records,
I have to know that in spot b on the first page, the record printed has to be 1001

so I have to know the total, so I can figure out what is the starting number of each stack.

I make 4 lists, one for each stack, and then pop the top one off each list
so for a list of 4000
I have 4 lists
list a 1-1000
b 1001-2000
c 2001-3000
d 3001-4000

then for page 1 I have 1, 1001, 2001, 3001
then page 2 I have 2,1002,2002,3002
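
The four-list scheme above can also be sketched as plain index arithmetic, without building separate lists: with N records and k cards per page, each stack holds ceil(N / k) records, and the card in slot s of page p is record s * pages + p (0-based). The function name printOrder is hypothetical, and skipping slots that fall past the last record is an assumption about how an uneven total should be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns 0-based record indices in print order for perPage cards per page.
std::vector<std::size_t> printOrder(std::size_t total, std::size_t perPage) {
    assert(perPage > 0);
    // Height of each stack = number of pages, rounded up.
    const std::size_t pages = (total + perPage - 1) / perPage;
    std::vector<std::size_t> order;
    order.reserve(total);
    for (std::size_t page = 0; page < pages; ++page) {
        for (std::size_t slot = 0; slot < perPage; ++slot) {
            const std::size_t record = slot * pages + page;
            // Assumption: a slot past the last record is skipped here;
            // a real layout would probably print it blank instead.
            if (record < total) order.push_back(record);
        }
    }
    return order;
}
```

For 12 records at 4 per page this gives 0,3,6,9,1,4,7,10,2,5,8,11, which is the 1,4,7,10,2,5,8,11,3,6,9,12 order from the earlier example once 1 is added to each index. The result can be used to index into the list of lines (or line offsets) read from the file.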


I hope that clears it up

Bob
 
LVL 8

Author Comment

by:bebonham
ID: 6896771
thanks to everyone.