Link to home
Start Free TrialLog in
Avatar of josgood
josgoodFlag for United States of America

asked on

Request critique, Noob Linux C++

I'm learning Linux.  I've written a small C++ program that accepts a file path on stdin, splits the file into sentences, and emits them on stdout.  I'm doing this because it seems that the Linux paradigm is to pipe programs together to create a solution.  My target is to parse a collection of text files to find sentences that meet a criterion...meaning I need to do something like
      ls SomeFolder/* | MakeSentences | FilterByCriterion | more

Please give me a code review.  That will help me learn.  Thanks.

//
// Convert text file to sentences, emitted to stdout.
//
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/regex.hpp>
#include <boost/algorithm/string/regex.hpp>
#include <boost/foreach.hpp>

std::string ExtractFileContents(const std::string path) {
      std::string line;
      std::stringstream contents;
      std::cout << path << std::endl;            // Show file path before contents
      std::ifstream file(path.c_str());
      if (file.is_open()) {
            while (file.good()) {
                  getline(file,line);
                  contents << " " << line;
            }
      }
      return contents.str();
      }

void EmitSentences(const std::string text) {
      std::vector<std::string> result;
      boost::algorithm::split_regex(result,text,boost::regex("\\. +")); //Naive filter, but seems sufficient
      BOOST_FOREACH(std::string sentence, result) {
            std::cout << sentence << ".\n";
      }
}

int main(int, char**) {
      std::string aLine;
      std::stringstream ss;
      while (!std::cin.eof()) {
            getline(std::cin,aLine);
            ss << aLine;
      }
      
      std::string contents = ExtractFileContents(ss.str());
      EmitSentences(contents);
      return 0;
}
ASKER CERTIFIED SOLUTION
Avatar of evilrix
evilrix
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of josgood

ASKER

Excellent!  Thank you!
(I feel silly about missing the references)

I appreciate the thought about making the method more generic.  That's a very good point that I need to work on in general...need to keep it more in mind.

On the regex, I'm saying that a sentence is terminated by a period and whitespace -- quite naive, but fits the text I'm searching.  I had forgotten about \S.

I'm encouraged that the points were all fairly minor...nothing big.

Other than the regex, I will implement your suggestions.
>> On the regex, I'm saying that a sentence is terminated by a period and whitespace
Ah, fair enough.

What happens if there is a dot in the line that isn't a full stop/period? Just something to consider.
Oh, you will also not match the last line if there isn't a space after the period.

Maybe try this instead.
\.(?:\s+|$)

Open in new window

Avatar of josgood

ASKER

Yes, certainly Mr. Jones, Jr. would be a problem, along with other abbreviations.  For the moment that game isn't worth the candle -- I have bigger fish to fry.

Thank you for the regex change.  I hadn't considered that case.

BTW, I am gratified that the code stood up well under your review.  I have a high opinion of you.
>> I have a high opinion of you.

Sheesh, now I am blushing :)
Avatar of josgood

ASKER

Thank you.