Solved

Scanning and Tokenizing a text file in C++

Posted on 2011-02-15
Last Modified: 2012-05-11
I'm playing around with the code below. What I want is to tokenize whatever is in the text file and then do some string comparisons with the tokens.

So far, I'm only getting the first word/token.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fstream>

using namespace std;

int main()
{
char str[]="NOVALUE";
char *pch;
int x=0;

FILE *fp;

fp=fopen("C:\\myfile.txt", "r");
fscanf(fp, "%s", &str);

pch=strtok(str, " ,.");

while(pch != NULL)
{
printf("%s", pch);

//--arbitrary line to stop program from exiting before viewing output goes here --

fflush(fp);
fclose(fp);

return 0;
}

Question by:--TripWire--
 
LVL 9

Expert Comment

by:AriMc
ID: 34902170
I think this is the code you are looking for:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
   char str[1024];
   FILE *fp;

   fp=fopen("d:\\myfile.txt", "r");

   if (fp != NULL)
   { 
      /* %1023s caps each word so it cannot overflow str[1024] */
      while (fscanf(fp, "%1023s", str) != EOF)
      {
         printf("%s\n", str);
      }
   
      fclose(fp);
   }
   
   return 0;
}

 
LVL 9

Expert Comment

by:AriMc
ID: 34902238
Actually, I ignored your strtok, so here is a revised version:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
   char str[1024];
   char *pch;
   FILE *fp;
   int  first;
   
   fp=fopen("d:\\myfile.txt", "r");

   if (fp != NULL)
   { 
      while (fscanf(fp, "%1023s", str) != EOF)   /* width keeps the read inside str[1024] */
      {
         first = 1;
         while ((pch=strtok(first ? str : NULL, " ,.")) != NULL)
         {
            printf("%s\n", pch);
            first=0;
         }
      }
   
      fclose(fp);
   }
   
   return 0;
}

 

Assisted Solution

by:shlomibu
shlomibu earned 87 total points
ID: 34904592
Hi,

If you don't insist on working with fscanf, I think  the following code could help you tokenize strings easily.

Sincerely,
Shlomi


#include <fstream>
#include <string>
#include <vector>

void Tokenize(const std::string& str,
         std::vector<std::string>& tokens,
         const std::string& delimiters)
{
    // Empty Content of Output Vector
    tokens.clear();
    // Skip delimiters at beginning.
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after the token start (the end of the token).
    std::string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the next delimiter (the end of the next token).
        pos = str.find_first_of(delimiters, lastPos);
    }
}

int main()
{
    std::string filename("D:\\temp.txt");

    std::ifstream fid(filename.c_str());

    std::string line;
    std::vector < std::string > token_vec;
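    // Test getline itself: checking eof() before reading never ends
    // if the file failed to open, and processes a failed last read.
    while( getline (fid,line) )
    {
        if ( line.empty() )
        {
            continue;
        }
        // Tokenize clears token_vec first, so it holds only this line's tokens.
        Tokenize(line,token_vec," \n\t");
    }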
    fid.close();

    return 0;
}




 
LVL 32

Assisted Solution

by:sarabande
sarabande earned 25 total points
ID: 34904642
in c++ you don't need C headers stdio.h, stdlib.h and string.h.

instead use

#include <iostream>
#include <fstream>
#include <string>  // not .h

all of these are headers of the c++ standard library (which includes the STL), standardized since 1998.

using <fstream> header you open the text file with

  std::ifstream f("c:\\myfile.txt");

then you can read in while loop like

   while(getline(f, strline))
   {
      ...
   }

where strline is of type std::string.

you can parse strline by calling strline.find_first_of(" ;,", nextpos) in a loop. it returns the position of the first space, semicolon or comma found in strline at or after nextpos. if no separator is found, std::string::npos is returned.
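
As a rough illustration of the find_first_of loop described above, here is a minimal sketch. The function name splitLine and the choice to skip empty pieces are assumptions made for the example, not something taken from the thread.

```cpp
#include <string>
#include <vector>

// Split `line` at any character found in `separators`, skipping empty pieces.
// splitLine is a hypothetical helper name used only for this sketch.
std::vector<std::string> splitLine(const std::string& line,
                                   const std::string& separators)
{
    std::vector<std::string> parts;
    std::string::size_type nextpos = 0;

    while (nextpos < line.size())
    {
        // Position of the next separator at or after nextpos, or npos if none.
        std::string::size_type seppos = line.find_first_of(separators, nextpos);
        if (seppos == std::string::npos)
        {
            seppos = line.size();   // the last piece runs to the end of the line
        }
        if (seppos > nextpos)       // ignore empty pieces between separators
        {
            parts.push_back(line.substr(nextpos, seppos - nextpos));
        }
        nextpos = seppos + 1;       // continue just past the separator
    }
    return parts;
}
```

Inside the while(getline(f, strline)) loop, a call such as splitLine(strline, " ;,") would return the pieces of that line as a vector of strings.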

Sara
 

Author Comment

by:--TripWire--
ID: 34909144
Thanks for all your replies.

shlomibu:  I'm sure your code works, however, I'm not as versed in C++ as you may be, so it might be best that I continue to use fscanf because I need to expand on this code and may need to troubleshoot it later.  But I will try your code anyway, to see if I can stumble through it.  Is there a reason why you're not too fond of fscanf?  What are the drawbacks?

This is a question for anyone - it's obviously been a while since I've programmed in cpp, I seem to remember there being something special about passing a pointer to a function.  Am I correct?  
 
LVL 32

Expert Comment

by:sarabande
ID: 34909641
fscanf is an old c runtime function. its conversions are not type-safe, unlike what you can do in c++. you want to parse the line anyway, so why not use the clean and easy c++ getline function?

instead of pointers you can use references in c++ in most cases, which is less error-prone.

Sara
 

Expert Comment

by:shlomibu
ID: 34914557
Hi TripWire,

I'm not an expert as you may think.
I believe you'd be better off trying to understand the basic elements of the standard template library, such as string, vector and stream. These should help you get started and expand your code to the functionality you need.

A great place to learn these is cplusplus.com, (e.g. string):
http://www.cplusplus.com/reference/string/string/

I've cleaned up the code a bit for you: I added a "using namespace std" statement and removed the "std::" prefixes in the code.

I suggest you run the code in debug mode and walk through its steps.
The main function code basically opens a text file, gets a line and tokenizes it using three separators, line delimiter, tab delimiter and a space.  

The & symbols are references, which are a great addition in C++ (vs. standard C).
When you use a reference parameter, the address (not a copy of the value) of the argument is automatically passed to the function. So when you change the value within the function (as I've done with tokens, which is a vector of strings), the value in the calling function changes as well. To prevent this when it is not needed, you add the const keyword.
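
A small sketch may make the difference between a reference, a const reference and a pointer parameter easier to see. The three function names below are hypothetical and exist only for this illustration; they are not part of the Tokenize code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Non-const reference: the function works on the caller's vector directly,
// so the push_back is visible to the caller after the call returns.
void appendToken(std::vector<std::string>& tokens, const std::string& token)
{
    tokens.push_back(token);
}

// const reference: no copy is made, but the function cannot modify the argument.
std::size_t countTokens(const std::vector<std::string>& tokens)
{
    return tokens.size();
}

// Pointer version for comparison: the caller must pass an address explicitly
// (for example &vec), and the function has to guard against a null pointer.
void appendTokenViaPointer(std::vector<std::string>* tokens,
                           const std::string& token)
{
    if (tokens != nullptr)
    {
        tokens->push_back(token);
    }
}
```

With a vector named vec, appendToken(vec, "a") and appendTokenViaPointer(&vec, "b") both change vec itself; the reference form simply hides the address-taking and leaves no null case to check.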

HTH,
Shlomi




 
 
#include <fstream>
#include <string>
#include <vector>

using namespace std;

void Tokenize(const string& str,
         vector<string>& tokens,
         const string& delimiters)
{
    // Empty Content of Output Vector
    tokens.clear(); 
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after the token start (the end of the token).
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the next delimiter (the end of the next token).
        pos = str.find_first_of(delimiters, lastPos);
    }
}

int main()
{
    string filename("D:\\temp.txt");

    ifstream fid(filename.c_str());

    string line;
    vector < string > token_vec;
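    // Test getline itself: checking eof() before reading never ends
    // if the file failed to open, and processes a failed last read.
    while( getline (fid,line) )
    {
        if ( line.empty() )
        {
            continue;
        }
        // Tokenize clears token_vec first, so it holds only this line's tokens.
        Tokenize(line,token_vec," \n\t");
    }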
    fid.close();

    return 0;
}


 

Author Comment

by:--TripWire--
ID: 34920225
Thanks shlomibu!

I really appreciate the info, and I do plan on learning the ins and outs of what you have above - however, for the sake of this project, I have a deadline of about another week and a half, and so I would not be able to learn the concepts that are going to be new to me by then.

Just to clarify - my program is going to be very simple.  I will be examining string values found in a text file on the local hard drive.  Do you really not suggest using my standard C vocabulary?  Or could I use the code posted by AriMc above?
 

Author Comment

by:--TripWire--
ID: 34922044
AriMc - The code you have tokenizes all words with spaces in between.
How do I get the only delimiter to be a comma (",")?

I've already tried...
while((pch=strtok(first ? str : NULL, ",")) != NULL)

 
LVL 9

Expert Comment

by:AriMc
ID: 34922102
The second argument of strtok specifies the delimiters between tokens. Your "," should work fine in that respect.

Please provide complete examples of code, input files, output and descriptions of what you expected to see.
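
To show how the delimiter argument changes the result, here is a minimal sketch that wraps std::strtok so the same text can be split with different delimiter sets. The helper name tokenizeWith is an assumption for the example; the copy into a buffer is needed because strtok overwrites delimiters in the string it scans.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split `text` with std::strtok using the given delimiter set.
// tokenizeWith is a hypothetical helper name used only for this sketch.
std::vector<std::string> tokenizeWith(const std::string& text,
                                      const char* delimiters)
{
    // strtok writes '\0' into the buffer it scans, so work on a modifiable copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* pch = std::strtok(buffer.data(), delimiters);
         pch != nullptr;
         pch = std::strtok(nullptr, delimiters))
    {
        tokens.push_back(pch);
    }
    return tokens;
}
```

Given a whole line, a delimiter set of "," splits only at commas, while " ,." also splits at spaces and periods. If the pieces still come out one word at a time with ",", the input was most likely already cut into words before strtok saw it.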



 

Author Comment

by:--TripWire--
ID: 34922125
That's what I thought.  The code I have is exactly the same as what you provided earlier, with the exception of the fact that I changed strtok's arguments to be ","

My input file is a text file in the root.  C:\myfile.txt

I typed a sample input into the text file.

This is an example of a string, to be tokenized

And what's showing up is:
This
is
an
example
of
a
string
to
be
tokenized

I want it to print out:

This is an example of a string
to be tokenized

In other words - I only want it to cut the string where a comma appears.
 
LVL 9

Accepted Solution

by:
AriMc earned 88 total points
ID: 34922184
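Ok, in that case try this. fscanf with %s stops at whitespace, so each call hands strtok a single word; fgets reads a whole line, so strtok can split it at the commas only: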


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
   char str[1024];
   char *pch;
   FILE *fp;
   int  first;
   
   fp=fopen("c:\\myfile.txt", "r");

   if (fp != NULL)
   { 
      while (fgets(str, sizeof(str), fp) != NULL)
      {
         first = 1;
         while ((pch=strtok(first ? str : NULL, ",")) != NULL)
         {
            printf("%s\n", pch);
            first=0;
         }
      }
   
      fclose(fp);
   }
   
   return 0;
}
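
For comparison, the same comma-only splitting can be sketched with C++ streams. This is an illustrative alternative, not the accepted code: the names trim and splitOnCommas are assumptions, and the trimming step is an addition that drops the space after the comma and the line ending that fgets would otherwise leave on the last piece.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Remove leading and trailing spaces, tabs and line-ending characters.
std::string trim(const std::string& s)
{
    const char* whitespace = " \t\r\n";
    std::string::size_type first = s.find_first_not_of(whitespace);
    if (first == std::string::npos)
    {
        return "";                      // nothing but whitespace
    }
    std::string::size_type last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Read `in` line by line and cut each line only where a comma appears.
std::vector<std::string> splitOnCommas(std::istream& in)
{
    std::vector<std::string> pieces;
    std::string line;
    while (std::getline(in, line))                  // one whole line at a time
    {
        std::istringstream fields(line);
        std::string piece;
        while (std::getline(fields, piece, ','))    // ',' is the only delimiter
        {
            piece = trim(piece);
            if (!piece.empty())
            {
                pieces.push_back(piece);
            }
        }
    }
    return pieces;
}
```

Because the function takes a std::istream&, it can be called with a std::ifstream opened on c:\\myfile.txt or with a std::istringstream when trying it out.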

 

Author Comment

by:--TripWire--
ID: 34922213
I'm getting that conversion from int to char* cannot be performed.
They're incompatible.
 

Author Comment

by:--TripWire--
ID: 34922220
Ignore the last statement.  I forgot to change EOF to NULL.
Thank you!
