Solved

How to extract results into a string or numeric value from boost::tokenizer

Posted on 2013-11-28
Last Modified: 2013-12-07
Below is a code snippet that uses boost::tokenizer. How can I extract each token into a string or a data structure rather than printing it out, as this line does?
      cout << *beg << "\n";


#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

int main(){
   using namespace std;
   using namespace boost;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

Question by:tommym121
19 Comments
 
LVL 86

Assisted Solution

by:jkr
jkr earned 126 total points
Just assign them to a string instead, e.g.

#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>
#include <list>

int main(){
   using namespace std;
   using namespace boost;
   list <string> results;
   string token;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";

       token = *beg;

       results.push_back(token);
   }
}
                                  

 
LVL 24

Assisted Solution

by:chaau
chaau earned 292 total points
It depends where you want to store your tokens. You can for example declare a vector of strings and store each token there, like this:
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>
#include<vector>

int main(){
   using namespace std;
   using namespace boost;
   vector<string> tokens;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       tokens.push_back(*beg);
   }
}


If you want to use the vector, just iterate through it the same way you have iterated through the tokenizer:
vector<string>::const_iterator pos;
for(pos = tokens.begin(); pos != tokens.end(); ++pos)
{
    cout << *pos << ", ";
}


However, your question makes me think that you have a comma-separated string with the elements each at its own position, like a transaction string from a bank where the elements are defined like this:
AccountNumber,AccountName,DebitAmount,CreditAmount


In this case it is wise to define a structure and move each token into the matching member. I recommend you define this structure:
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>
#include<cstdlib>   // atof

using namespace std;
using namespace boost;

struct Transaction
{
  string AccountNumber;
  string AccountName;
  double DebitAmount;
  double CreditAmount;
};
// Position of each field within a line
enum
{
  PosAccountNumber,
  PosAccountName,
  PosDebitAmount,
  PosCreditAmount
};
int main(){
   Transaction tran;
   string s = "123456789, Test Account, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   int pos = 0;
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
       switch(pos)
       {
       // beg->c_str() is required here: *beg.c_str() parses as *(beg.c_str()) and does not compile
       case PosAccountNumber: tran.AccountNumber = *beg; break;
       case PosAccountName: tran.AccountName = *beg; break;
       case PosDebitAmount: tran.DebitAmount = atof(beg->c_str()); break;
       case PosCreditAmount: tran.CreditAmount = atof(beg->c_str()); break;
       }
   }
}
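
Since the question also asks about numeric values, here is a minimal sketch of a stricter conversion than atof(), which silently returns 0.0 for text that is not a number. The helper names toDouble and fillTransaction are made up for this illustration and are not part of Boost. fillTransaction is a template, so it should accept tok.begin() and tok.end() from the tokenizer above as well as any other range of strings.

```cpp
#include <cstdlib>
#include <string>

struct Transaction {
    std::string AccountNumber;
    std::string AccountName;
    double DebitAmount;
    double CreditAmount;
};

// Hypothetical helper: converts a token to double and reports failure
// instead of silently yielding 0.0 the way atof() does.
bool toDouble(const std::string& token, double& out) {
    const char* start = token.c_str();
    char* end = 0;
    double value = std::strtod(start, &end);   // skips leading whitespace itself
    if (end == start) return false;            // nothing numeric was consumed
    while (*end == ' ' || *end == '\t') ++end; // tolerate trailing blanks
    if (*end != '\0') return false;            // trailing garbage such as "12abc"
    out = value;
    return true;
}

// Hypothetical helper: fills a Transaction from any range of string tokens.
// Returns false if a number cannot be parsed or the field count is not four.
template <typename Iterator>
bool fillTransaction(Iterator first, Iterator last, Transaction& tran) {
    int pos = 0;
    for (; first != last; ++first, ++pos) {
        const std::string token = *first;      // copy of the current token
        switch (pos) {
        case 0: tran.AccountNumber = token; break;
        case 1: tran.AccountName = token; break;
        case 2: if (!toDouble(token, tran.DebitAmount)) return false; break;
        case 3: if (!toDouble(token, tran.CreditAmount)) return false; break;
        default: return false;                 // more fields than expected
        }
    }
    return pos == 4;
}
```

With the tokenizer from the example above, the call would look like fillTransaction(tok.begin(), tok.end(), tran). Note that leading blanks in a field such as " Test Account" are kept as they are.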

 

Author Comment

by:tommym121
chaau

You are right. It is what I am looking for.

When I assign *beg to a string in the following statements, do we need to create a new string to hold the content of *beg? I am confused about when it is a copy and when it is a reference.
   case PosAccountNumber: tran.AccountNumber = *beg; break;
   case PosAccountName: tran.AccountName = *beg; break;
 
LVL 24

Assisted Solution

by:chaau
chaau earned 292 total points
There is a copy of the string in the Transaction structure.
tran.AccountNumber is a string. When you use this statement:
tran.AccountNumber = *beg

you copy the content of the current token to the string inside the structure. If you are confused about the structures and enums then you can quite simply use normal string variables, like this:
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

using namespace std;
using namespace boost;

int main(){
   string AccountNumber;
   string AccountName;
   string s = "123456789, Test Account, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   int pos = 0;
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
       switch(pos)
   {
   case 0: AccountNumber = *beg; break;
   case 1: AccountName = *beg; break;
    // etc.
   }
   }
}

 
LVL 86

Assisted Solution

by:jkr
jkr earned 126 total points
>>Do we need to create a new string to hold the content of *beg.

See the code sample I posted above (which chaau was 'kind' enough to ignore), it does exactly that:

int main(){
   using namespace std;
   using namespace boost;
   list <string> results;
   string token;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";

      token = *beg;  // assign to string

     results.push_back(token); // add to list
   }
}

 
LVL 24

Assisted Solution

by:chaau
chaau earned 292 total points
@jkr, I did not ignore your code. I was typing mine (using a vector, BTW) while yours was already posted. If you look at my answer that followed yours, you will notice that I also provided a second option with a struct and an enum. Obviously, that took more time to type, so my answer appeared after yours. But I promise you that when I started typing there were no answers whatsoever.

This is actually a problem with EE. It gives no indication of what is going on while you type an answer. If you have experience with SO, you will know that there the answers from other users "magically" appear while you type yours.

Send a note to the EE administrators with your concerns. Maybe they will implement some sort of AJAX functionality to show answers dynamically.
 

Author Comment

by:tommym121
The reason I ask is that I am planning to read a CSV file (quite similar to chaau's example). I would like to read each line into the structure and store it in a vector. I will be reading something like this, where 'line' needs to be tokenized. The code below is not complete, but it illustrates what I need to accomplish:
infile.open(inputFilename, std::fstream::in);
std::string line;
std::string token;

while (std::getline(infile, line))
{
   tokenizer<escaped_list_separator<char> > tok(line);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
       cout << *beg << "\n";

       switch(pos)
       {
       case PosAccountNumber: tran.AccountNumber = *beg; break;
       case PosAccountName: tran.AccountName = *beg; break;
       case PosDebitAmount: tran.DebitAmount = atof(beg->c_str()); break;
       case PosCreditAmount: tran.CreditAmount = atof(beg->c_str()); break;
       }
   }
}

 
LVL 24

Assisted Solution

by:chaau
chaau earned 292 total points
For your example I would definitely create a vector (or a list, with respect to jkr) of the structs you have defined, one per row. That way it will be easier to use later on in the code.
 
LVL 86

Assisted Solution

by:jkr
jkr earned 126 total points
Well, you are almost there - the only thing that is missing is to 'push_back()' the struct you filled in the loop to the vector. That works exactly the same way as with a list or as with a single string in the examples above:

vector<STRUCT_TYPE> vFileContents; // don't know the exact definition of 'tran', just a placeholder
while (std::getline(infile, line))
{
   STRUCT_TYPE tran; // local to the loop, the contents will be appended to the vector
   int pos = 0;      // restart the field position for every line
   tokenizer<escaped_list_separator<char> > tok(line);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
       cout << *beg << "\n";

       switch(pos)
       {
       case PosAccountNumber: tran.AccountNumber = *beg; break;
       case PosAccountName: tran.AccountName = *beg; break;
       case PosDebitAmount: tran.DebitAmount = atof(beg->c_str()); break;
       case PosCreditAmount: tran.CreditAmount = atof(beg->c_str()); break;
       }
   }

   vFileContents.push_back(tran); // that's all you need, once per line
}

 

Author Comment

by:tommym121
So what I want to understand is what happens when I do an assignment such as

tran.AccountNumber = *beg;

My questions are:
1. After tokenizing, does *beg hold a copy of the actual string, or just a reference to the part of 'line' that contains the token?

2. If *beg has its own copy, then when I do the above assignment, does tran.AccountNumber get a new copy or a reference to *beg?

3. If tran.AccountNumber gets a new copy and *beg also has its own copy, will this cause a memory leak, since I am reading in all the lines from the file?
 
LVL 86

Expert Comment

by:jkr
1. 'beg' is a tokenizer iterator, so it basically points to the first token found in the loop you are starting.
2. When you do the assignment, you get a new copy of the token. If you however use it with a function such as 'atof()', that function will evaluate the copy '*beg' has.
3. No, no memory leaks, all variables are 'auto' and will go out of scope.
 
LVL 86

Expert Comment

by:jkr
As a side note - only worry about leaks when you are using 'new' - if you don't, you are on the safe side.
 
LVL 24

Accepted Solution

by:chaau
chaau earned 292 total points
First of all, beg has the type "tokenizer<escaped_list_separator<char> >::iterator". According to the C++ reference, an iterator is:
... any object that, pointing to some element in a range of elements (such as an array or a container)...
As you can see, an iterator behaves like a pointer. In your case it acts like a pointer to a string, meaning:
beg behaves like string*

By writing *beg we dereference the iterator, which effectively yields a reference to the string:
*beg behaves like string&

Therefore, when you write this kind of statement:
tran.AccountNumber = *beg;

you are actually doing
string = string&

which in turn calls operator=() for a string. As you probably know, this operator copies the content of the string, not the pointer and not the reference. In this example:
string s = "hello";
string t = s;

t will have the string "hello" inside it, and it will be located at a different address than s.
I need to stress that tran.AccountNumber is a string object. It is not a pointer. It exists within the scope of the struct Transaction, and does not represent a pointer.
Please read here about dereference operator, and here about the iterators
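
To make the copy-versus-reference point concrete, here is a small sketch. It uses a std::vector<std::string> iterator as a stand-in for the tokenizer iterator, because a vector element can be modified afterwards to show the difference (a tokenizer token cannot). The function names are made up for this illustration. As far as I know, the tokenizer iterator keeps the current token as a string of its own and operator* returns a const reference to it, so the same reasoning should apply there.

```cpp
#include <string>
#include <vector>

// Hypothetical demo: assigning *beg to a std::string copies the characters.
// Expects a non-empty vector.
std::string copyThenModify(std::vector<std::string>& tokens) {
    std::vector<std::string>::iterator beg = tokens.begin();
    std::string copy = *beg;   // the string's copy constructor duplicates the text
    *beg = "changed";          // only the element inside the vector changes
    return copy;               // still holds the original text
}

// Hypothetical demo: binding *beg to a reference makes no copy at all.
// Expects a non-empty vector.
bool referenceSeesChange(std::vector<std::string>& tokens) {
    std::vector<std::string>::iterator beg = tokens.begin();
    const std::string& ref = *beg;  // ref is just another name for the element
    *beg = "changed";
    return ref == "changed";        // the change is visible through ref
}
```

So tran.AccountNumber = *beg always gives the struct its own independent string, and a reference only comes into play if you explicitly declare one.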
 

Author Comment

by:tommym121
Thanks for all the comments. I am going to post the question below as my next question.
Chaau, thank you for your educational explanation and reference articles. They are very helpful.


If tran.AccountNumber is a wchar_t *, how should I assign *beg to it?
 
LVL 24

Assisted Solution

by:chaau
chaau earned 292 total points
I recommend keeping all strings of the same type throughout the program. Say, if tran.AccountNumber is a Unicode string, then use Unicode libraries for your program: open the file as Unicode, use a Unicode version of the tokenizer, etc. BTW, there is a wstring version in the std library.
 
LVL 32

Assisted Solution

by:sarabande
sarabande earned 82 total points
>>If tran.AccountNumber is a wchar_t *, how should I assign *beg to it?
You would need to allocate storage for the member pointer and do a copy:

std::string str = *beg;
delete [] tran.AccountNumber;  // delete old storage if any.
                             // the pointer should be initialized with NULL
tran.AccountNumber = new wchar_t [str.length()+1];
mbstowcs( tran.AccountNumber, str.c_str(), str.length()+1);  // +1 so the terminating null is written too

Sara
 
LVL 32

Assisted Solution

by:sarabande
sarabande earned 82 total points
Note, the 'tran' structure should not have a wchar_t * but a fixed-size wchar_t array for 'AccountNumber', or a std::wstring as member. Then you would not have to care about allocating and deleting the pointer members, and you can assign the structure like this:

const int MAXLEN_ACCOUNTNUMBER = 16;

struct Transaction
{
     ....
     wchar_t AccountNumber[MAXLEN_ACCOUNTNUMBER]; 
};

Transaction temp = { 0 };  // makes all members zero
temp = tran;  // assuming tran is also a Transaction

The code for assigning and converting a token would turn into

std::string str = *beg;
size_t m = sizeof(tran.AccountNumber) / sizeof(wchar_t);  // number of elements, not bytes
memset(tran.AccountNumber, 0, sizeof(tran.AccountNumber));
size_t n = str.length();
if (n >= m)
{
    n = m-1;   // truncate so the terminator still fits
}
mbstowcs( tran.AccountNumber, str.c_str(), n);
tran.AccountNumber[n] = L'\0';  // terminating null character

Sara
 
LVL 24

Assisted Solution

by:chaau
chaau earned 292 total points
I just want to reiterate that if there's no particular requirement to have a wchar_t variable you can use wstring. It is much easier to use
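
For what it is worth, a rough sketch of the wstring route could look like the following. The helper name widen is made up for this illustration, and the conversion relies on mbstowcs(), so it depends on the current C locale; plain ASCII tokens such as account numbers should convert in the default "C" locale.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a narrow (multibyte) string to std::wstring.
std::wstring widen(const std::string& narrow) {
    // A multibyte string never has more characters than bytes, so
    // size() + 1 wide characters always leaves room for the terminator.
    std::vector<wchar_t> buffer(narrow.size() + 1, L'\0');
    std::size_t written = std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence in the current locale
    }
    return std::wstring(&buffer[0], written);
}
```

If AccountNumber is declared as a std::wstring member, the assignment becomes tran.AccountNumber = widen(*beg); and there is no new or delete to manage.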
 

Author Closing Comment

by:tommym121
Thanks