
Text File Read Question

I have a humongous text file with elements like this... (This is an example only and NOT the real file.) Note that in each item, the first half before the underscore is the major item name and the other half is the minor description.


There are about 75000 items in this file.

I am writing a C++ class to pick up only the major name of each item, i.e. my result from the above humongous file should be...


I know there are tons of techniques out there. What is the really efficient method I should use so that my result is produced in nanoseconds? :-) (Seriously, efficiency is extremely important for me.)
1 Solution
You'll have to read the entire file anyway, so there's not much room for improvement. The simplest way I can imagine would be something like this:

#include <fstream>
#include <string>
#include <list>

using namespace std;

int main()
{
    string line;
    size_t pos;
    list<string> items;
    ifstream is("file.txt");

    if (!is.is_open()) {
        // error, no such file
        return 1;
    }

    while (getline(is, line)) {
        if (string::npos == (pos = line.find('_'))) {
            // error, malformed line w/o underscore
            continue;
        }
        items.push_back(line.substr(0, pos));  // keep only the major name
    }
    return 0;
}

Nanoseconds are not realistic, even for a fast SSD or flash storage. Each access to a file that is not already in the cache needs milliseconds to position at the beginning of the file, read all blocks into memory, manage the cache, handle the filesystem overhead, schedule the thread, and so on. If the file is not stored contiguously, all of these times must be multiplied by the number of file fragments. The same applies to a debug build, which adds extra time as well.
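If you want to see the real cost rather than guess, here is a rough sketch (my own illustration, not part of the original answer) of timing the read with std::chrono; the "file.txt" name is the same hypothetical file used above.

#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>

int main()
{
    using clock = std::chrono::steady_clock;

    clock::time_point start = clock::now();

    std::ifstream file("file.txt", std::ios::binary);  // hypothetical file name
    std::ostringstream contents;
    contents << file.rdbuf();                           // pull the whole file into memory

    clock::time_point end = clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms to read the file\n";
    return 0;
}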

Generally, if you read a file into memory in one go in binary mode, you can roughly halve the read time for a file of significant size.

#include <sys/stat.h>
#include <fstream>
#include <string>

int main()
{
    const char* szfilepath = "file.txt";   // path to the big file
    struct stat filestatus = { 0 };
    if (stat(szfilepath, &filestatus) == 0)
    {
        std::ifstream file(szfilepath, std::ios::binary | std::ios::in);
        if (file)
        {
            std::string buf(filestatus.st_size, '\0');
            if (file.read(&buf[0], filestatus.st_size))
            {
                std::string crlf = "\r\n";
                std::string line;
                buf += crlf;   // add carriage return/line feed for easier parsing
                size_t pos, lpos = 0;
                while ((pos = buf.find(crlf, lpos)) != std::string::npos)
                {
                    if (pos > lpos)
                    {
                        line = buf.substr(lpos, pos - lpos);
                        // here you have one line extracted
                    }
                    lpos = pos + crlf.length();
                }
            }
        }
    }
    return 0;
}


The above code would need contiguous memory of roughly 75,000 × (average line length + 2) bytes. That can be a problem on a low-memory or busy system; if so, you might think of reading the file in, say, 64k chunks (see the sketch below).
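A minimal sketch of that chunked approach, assuming the same "\r\n" line endings as above; the read_in_chunks name and the 64k buffer size are my own choices, not from the original code. A partial line at the end of one chunk is carried over into the next one.

#include <fstream>
#include <string>

void read_in_chunks(const char* path)
{
    std::ifstream file(path, std::ios::binary | std::ios::in);
    std::string carry;                     // unfinished line carried over between chunks
    char chunk[64 * 1024];                 // 64k read buffer

    while (file)
    {
        file.read(chunk, sizeof(chunk));   // may read fewer bytes near the end of the file
        std::streamsize got = file.gcount();
        if (got <= 0)
            break;
        carry.append(chunk, static_cast<size_t>(got));

        size_t pos, lpos = 0;
        while ((pos = carry.find("\r\n", lpos)) != std::string::npos)
        {
            std::string line = carry.substr(lpos, pos - lpos);
            // one complete line extracted, same place as in the code above
            lpos = pos + 2;
        }
        carry.erase(0, lpos);              // keep only the trailing partial line
    }

    if (!carry.empty())
    {
        // carry now holds the last line, in case the file does not end with "\r\n"
    }
}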

At the point where one line has been extracted, you could take the first part of the line (before the underscore) and add it to a std::set<std::string> container. A set adds a new string only if it is not already in the set.

#include <set>
std::set<std::string> items;
size_t undl = line.find('_');
if (undl != std::string::npos) {
    line.resize(undl);   // truncate line string to the major name
    items.insert(line);  // the set ignores duplicates
}



Finally, you can iterate the set to get all entries (sorted alphabetically).

std::set<std::string>::const_iterator i;
for (i = items.begin(); i != items.end(); ++i)
{
    const std::string& item = *i;  // reference to the current item in the set
    // use item here, e.g. print it or copy it to your result
}



Note: a std::set or a std::map handles duplicates for you; a std::list or std::vector does not. If you use one of the latter, you need to sort after filling and then remove or skip duplicates after sorting. That method can be faster if there are only a few duplicates (see the sketch below).
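For completeness, a minimal sketch of that vector alternative (my own illustration, not from the original answer): fill the vector while reading the file, then sort it and erase the duplicates with std::unique.

#include <algorithm>
#include <string>
#include <vector>

void dedupe(std::vector<std::string>& items)
{
    // items holds every major name pushed back while reading the file
    std::sort(items.begin(), items.end());
    items.erase(std::unique(items.begin(), items.end()), items.end());
}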
