Solved

How to extract results into a string or numeric value from boost::tokenizer

Posted on 2013-11-28
Last Modified: 2013-12-07
Below is a code snippet that uses tokenizer. How can I extract each token into a string or a data structure rather than printing it out with
      cout << *beg << "\n";


#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

int main(){
   using namespace std;
   using namespace boost;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

Question by:tommym121
19 Comments
 
LVL 86

Assisted Solution

by:jkr
jkr earned 504 total points
ID: 39684219
Just assign them to a string instead, e.g.

#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>
#include <list>

int main(){
   using namespace std;
   using namespace boost;
   list <string> results;
   string token;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";

       token = *beg;

       results.push_back(token);
   }
}
                                  

 
LVL 25

Assisted Solution

by:chaau
chaau earned 1168 total points
ID: 39684229
It depends where you want to store your tokens. You can for example declare a vector of strings and store each token there, like this:
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>
#include<vector>

int main(){
   using namespace std;
   using namespace boost;
   vector<string> tokens;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       tokens.push_back(*beg);
   }
}


If you want to use the vector, just iterate through it the same way you have iterated through the tokenizer:
vector<string>::const_iterator pos;
for(pos = tokens.begin(); pos != tokens.end(); ++pos)
{
    cout << *pos << ", ";
}
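
As a side note on the vector approach: a vector can also be filled straight from an iterator pair, without writing the loop by hand. The sketch below assumes only that the token source exposes begin() and end() over strings, which boost::tokenizer does. The names collect_tokens and demo_collect_tokens are hypothetical, and the demo uses a std::list as a stand-in for the tokenizer so that it compiles without Boost.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Copies every token from a range into a vector using the iterator-pair
// constructor. TokenRange is any type with begin()/end(), for example
// boost::tokenizer<boost::escaped_list_separator<char> >.
template <typename TokenRange>
std::vector<std::string> collect_tokens(const TokenRange& range) {
    return std::vector<std::string>(range.begin(), range.end());
}

// Stand-in demo: a std::list plays the role of the tokenizer here.
void demo_collect_tokens() {
    std::list<std::string> fake_tokens;
    fake_tokens.push_back("Field 1");
    fake_tokens.push_back(" 123");
    std::vector<std::string> tokens = collect_tokens(fake_tokens);
    assert(tokens.size() == 2);
    assert(tokens[1] == " 123");   // the leading space is preserved
}
```

With the tokenizer from the code above this would be called as collect_tokens(tok).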


However, your question makes me think that you have a comma-separated string with the elements each at its own position, like a transaction string from a bank where the elements are defined like this:
AccountNumber,AccountName,DebitAmount,CreditAmount


In this case it would be wise to define a structure and move each token into the matching member. I recommend you define this structure:
#include<cstdlib>   // atof
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

using namespace std;
using namespace boost;

struct Transaction
{
  string AccountNumber;
  string AccountName;
  double DebitAmount;
  double CreditAmount;
};
// Position of each field within a comma-separated line
enum
{
  PosAccountNumber,
  PosAccountName,
  PosDebitAmount,
  PosCreditAmount
};
int main(){
   Transaction tran;
   string s = "123456789, Test Account, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   int pos = 0;
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
      switch(pos)
      {
      case PosAccountNumber: tran.AccountNumber = *beg; break;
      case PosAccountName: tran.AccountName = *beg; break;
      // beg->c_str(), not *beg.c_str(): '.' binds tighter than '*'
      case PosDebitAmount: tran.DebitAmount = atof(beg->c_str()); break;
      case PosCreditAmount: tran.CreditAmount = atof(beg->c_str()); break;
      }
   }
}
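
One caveat about atof: it returns 0.0 when it cannot convert its input, so a malformed field looks exactly like a genuine zero amount. A minimal sketch of a stricter conversion built on strtod follows. The helper name parse_double is hypothetical, and accepting surrounding blanks is an assumption that matches tokens such as " 34.5" in the sample line.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Converts a token to double. Returns false if the token is empty or
// contains anything other than a number (surrounding blanks are allowed).
bool parse_double(const std::string& token, double& value) {
    const char* begin = token.c_str();
    char* end = 0;
    double parsed = std::strtod(begin, &end);   // skips leading whitespace
    if (end == begin) {
        return false;              // nothing could be converted
    }
    while (*end == ' ' || *end == '\t') {
        ++end;                     // tolerate trailing blanks
    }
    if (*end != '\0') {
        return false;              // trailing garbage such as "12abc"
    }
    value = parsed;
    return true;
}

void demo_parse_double() {
    double v = 0.0;
    assert(parse_double(" 34.5", v) && v == 34.5);
    assert(!parse_double("Test Account", v));
    assert(v == 34.5);             // value is untouched on failure
}
```

In the switch above, a call such as parse_double(*beg, tran.DebitAmount) could replace the atof line, with the return value deciding how to handle a bad row.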

 

Author Comment

by:tommym121
ID: 39684273
chaau

You are right. It is what I am looking for.

When I assign *beg to a string in the following statements, do we need to create a new string to hold the content of *beg? I am confused about when it is a copy and when it is a reference.
   case PosAccountNumber: tran.AccountNumber = *beg; break;
   case PosAccountName: tran.AccountName = *beg; break;

 
LVL 25

Assisted Solution

by:chaau
chaau earned 1168 total points
ID: 39684283
There is a copy of the string in the Transaction structure.
tran.AccountNumber is a string. When you use this statement:
tran.AccountNumber = *beg


you copy the content of the current token to the string inside the structure. If you are confused about the structures and enums then you can quite simply use normal string variables, like this:
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

using namespace std;
using namespace boost;

int main(){
   string AccountNumber;
   string AccountName;
   string s = "123456789, Test Account, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   int pos = 0;
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
       switch(pos)
   {
   case 0: AccountNumber = *beg; break;
   case 1: AccountName = *beg; break;
    // etc.
   }
   }
}

 
LVL 86

Assisted Solution

by:jkr
jkr earned 504 total points
ID: 39684294
>>Do we need to create a new string to hold the content of *beg.

See the code sample I posted above (which chaau was 'kind' enough to ignore), it does exactly that:

int main(){
   using namespace std;
   using namespace boost;
   list <string> results;
   string token;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";

      token = *beg;  // assign to string

     results.push_back(token); // add to list
   }
}

 
LVL 25

Assisted Solution

by:chaau
chaau earned 1168 total points
ID: 39684307
@jkr. I did not ignore your code. I was typing mine (using a vector, BTW) while you have already had yours typed. If you have a look at my answer that followed yours, you will notice that I have also provided a second option with a struct, enum and stuff like that. Obviously, that stuff required more time for typing, thus my answer appeared after yours. But I promise you that when I started typing there were no answers whatsoever.

This is actually a problem of EE. It does not have any indication of what is going on while you type the answer. If you have an experience with SO, you would know that there the answers from other users "magically" appear while you type your answer.

Send a note to the EE administrators with your concerns. Maybe they will implement some sort of AJAX functionality to show answers dynamically
 

Author Comment

by:tommym121
ID: 39684314
The reason I ask is that I am planning to read a CSV file (quite similar to chaau's example). I would like to read each line into the structure and store it in a vector. I will be reading something like this, where 'line' will need to be tokenized. Below is not complete code, but it does illustrate what I need to accomplish:
infile.open(inputFilename, std::fstream::in);
std::string line;
std::string token;

while (std::getline(infile, line))
{
    tokenizer<escaped_list_separator<char> > tok(line);
    int pos = 0; // field position, restarted for every line
    for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
        cout << *beg << "\n";

        switch(pos)
        {
        case PosAccountNumber: tran.AccountNumber = *beg; break;
        case PosAccountName: tran.AccountName = *beg; break;
        case PosDebitAmount: tran.DebitAmount = atof(beg->c_str()); break;
        case PosCreditAmount: tran.CreditAmount = atof(beg->c_str()); break;
        }
    }
}

 
LVL 25

Assisted Solution

by:chaau
chaau earned 1168 total points
ID: 39684326
For your example I would definitely create a vector (or a list, with respect to jkr) of the structs you have defined, one for each row. That way it will be easier to use the data later on in the code.
 
LVL 86

Assisted Solution

by:jkr
jkr earned 504 total points
ID: 39684332
Well, you are almost there - the only thing that is missing is to 'push_back()' the struct you filled in the loop to the vector. That works exactly the same way as with a list or as with a single string in the examples above:

vector<STRUCT_TYPE> vFileContents; // don't know the exact definition of 'tran', just a placeholder
while (std::getline(infile, line))
{
    STRUCT_TYPE tran; // local to the loop, the contents will be appended to the vector
    tokenizer<escaped_list_separator<char> > tok(line);
    int pos = 0; // restart the field position for each line
    for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
        cout << *beg << "\n";

        switch(pos)
        {
        case PosAccountNumber: tran.AccountNumber = *beg; break;
        case PosAccountName: tran.AccountName = *beg; break;
        case PosDebitAmount: tran.DebitAmount = atof(beg->c_str()); break;
        case PosCreditAmount: tran.CreditAmount = atof(beg->c_str()); break;
        }
    }

    vFileContents.push_back(tran); // once per line, that's all you need
}
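
For readers who want to try the loop structure end to end, here is a minimal, self-contained sketch. It is not the tokenizer code from this thread: to stay free of Boost it splits fields with std::getline on ',', a simplified stand-in that does not understand quotes or escapes. The Transaction layout follows the struct posted earlier; read_transactions and demo_read_transactions are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Transaction {
    std::string AccountNumber;
    std::string AccountName;
    double DebitAmount;
    double CreditAmount;
    Transaction() : DebitAmount(0.0), CreditAmount(0.0) {}
};

// Reads one Transaction per line. Fields are split on ',' with std::getline,
// a simplified stand-in for the tokenizer (no quoting or escapes).
std::vector<Transaction> read_transactions(std::istream& in) {
    std::vector<Transaction> rows;
    std::string line;
    while (std::getline(in, line)) {
        Transaction tran;              // fresh struct for every line
        std::istringstream fields(line);
        std::string field;
        int pos = 0;                   // restarts at 0 for every line
        while (std::getline(fields, field, ',')) {
            switch (pos) {
            case 0: tran.AccountNumber = field; break;
            case 1: tran.AccountName = field; break;
            case 2: tran.DebitAmount = std::atof(field.c_str()); break;
            case 3: tran.CreditAmount = std::atof(field.c_str()); break;
            }
            ++pos;
        }
        rows.push_back(tran);          // once per line, after all fields
    }
    return rows;
}

void demo_read_transactions() {
    std::istringstream file("123456789, Test Account, 123, 34.5\n"
                            "987654321, Other Account, 0, 10\n");
    std::vector<Transaction> rows = read_transactions(file);
    assert(rows.size() == 2);
    assert(rows[0].AccountNumber == "123456789");
    assert(rows[1].AccountName == " Other Account"); // leading blank is kept
    assert(rows[0].CreditAmount == 34.5);
}
```

The two points that matter are the same as in the tokenizer version: the position counter and the struct are created inside the outer loop, and push_back runs once per line rather than once per field.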

 

Author Comment

by:tommym121
ID: 39684558
So what I want to understand, when I do such an assignment

tran.AccountNumber = *beg;

My questions are
1. When tokenizing, does *beg hold a copy of the actual string, or just a reference to the part of 'line' that contains the token?

2. If *beg has its own copy, when I do the above assignment, does tran.AccountNumber get a new copy or a reference to *beg?

3. If tran.AccountNumber gets a new copy and *beg also has its own copy, will this cause a memory leak, since I am reading in all the lines from the file?
 
LVL 86

Expert Comment

by:jkr
ID: 39684574
1. 'beg' is a tokenizer iterator, so it basically points to the first token found in the loop you are starting.
2. No, when you do the assignment, you get a new copy of the token. If you however use it with 'atof()', this function will evaluate the copy '*beg' has.
3. No, no memory leaks, all variables are 'auto' and will go out of scope.
 
LVL 86

Expert Comment

by:jkr
ID: 39684576
As a side note - only worry about leaks when you are using 'new' - if you don't, you are on the safe side.
 
LVL 25

Accepted Solution

by:
chaau earned 1168 total points
ID: 39684581
First of all, beg has the type "tokenizer<escaped_list_separator<char> >::iterator". According to the C++ reference, an iterator is:
... any object that, pointing to some element in a range of elements (such as an array or a container)...
As you can see, an iterator behaves like a pointer. In your case it behaves like a pointer to a string, meaning:
beg is like string*

By doing *beg we dereference the iterator, which effectively gives a reference to the current token:
*beg is like const string&

Therefore when you write this type of statement:
tran.AccountNumber = *beg;

You are actually doing
string = const string&

which in turn calls operator=() for a string. As you probably know, this operator copies the content of the string, not the pointer and not the reference. In this example:
string s = "hello";
string t = s;

t will have the string "hello" inside it, and it will be located at a different address than s.
I need to stress that tran.AccountNumber is a string object. It is not a pointer. It is a member of the struct Transaction and owns its own storage.
Please read here about the dereference operator, and here about iterators
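
The copy behaviour can be checked with a few lines of standard C++. This is only a sketch: the reduced Transaction struct and the names store_account_number and demo_copy_semantics are made up for illustration, and a plain std::string stands in for the token that *beg refers to.

```cpp
#include <cassert>
#include <string>

struct Transaction {
    std::string AccountNumber;
};

// Mimics "tran.AccountNumber = *beg": the token arrives as a const reference
// and the assignment copies its characters into the member.
void store_account_number(Transaction& tran, const std::string& token) {
    tran.AccountNumber = token;
}

void demo_copy_semantics() {
    std::string token = "123456789";
    Transaction tran;
    store_account_number(tran, token);

    token = "changed";                         // modify the source afterwards
    assert(tran.AccountNumber == "123456789"); // the copy is unaffected
    assert(&tran.AccountNumber != &token);     // two distinct string objects
}
```

Because the member holds its own characters, it stays valid after the iterator advances or the tokenizer goes out of scope.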
 

Author Comment

by:tommym121
ID: 39684657
Thanks for all the comments. I am going to post the question below as my next question.
Chaau, thanks for your educational explanation and reference articles. They are very helpful.

If tran.AccountNumber is wchar_t *, how should I assign *beg to it?
 
LVL 25

Assisted Solution

by:chaau
chaau earned 1168 total points
ID: 39684667
I recommend keeping all strings of the same type throughout the program. Say, if tran.AccountNumber is a Unicode string, then use Unicode libraries for your program: open the file as Unicode, use a Unicode version of the tokenizer, etc. BTW, there is a wstring type in the std library.
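
If the tokens are still produced as narrow strings, one possible bridge to a std::wstring member is a small conversion helper. The sketch below assumes the member is declared as std::wstring and that the text is valid in the current C locale; to_wide and demo_to_wide are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Converts a narrow token to std::wstring with mbstowcs, using the current
// C locale. Returns an empty wstring if the input is not valid multibyte text.
std::wstring to_wide(const std::string& narrow) {
    // A wide string never has more characters than the narrow one has bytes.
    std::vector<wchar_t> buffer(narrow.size() + 1, L'\0');
    std::size_t written = std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    return std::wstring(&buffer[0], written);
}

void demo_to_wide() {
    assert(to_wide("123456789") == L"123456789");
    assert(to_wide("").empty());
}
```

The assignment would then read tran.AccountNumber = to_wide(*beg); with no manual allocation involved.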
 
LVL 35

Assisted Solution

by:sarabande
sarabande earned 328 total points
ID: 39685007
if tran.AccountNumber is wchar_t *, how should I assign *beg to it?
you would need to allocate storage for the member pointer and do a copy:

std::string str = *beg;
delete [] tran.AccountNumber;  // delete old storage if any. 
                             // the pointer should be initialized with NULL
tran.AccountNumber = new wchar_t [str.length()+1];
// pass length()+1 so that mbstowcs also writes the terminating L'\0'
mbstowcs( tran.AccountNumber, str.c_str(), str.length()+1);



Sara
 
LVL 35

Assisted Solution

by:sarabande
sarabande earned 328 total points
ID: 39685055
Note: the 'tran' structure should not have a wchar_t * but a fixed-size wchar_t array for 'AccountNumber', or a std::wstring as member. Then you would not have to take care of allocating and deleting the pointer members, and you can assign the structure like this:

const int MAXLEN_ACCOUNTNUMBER = 16;

struct Transaction
{
     ....
     wchar_t AccountNumber[MAXLEN_ACCOUNTNUMBER]; 
};

Transaction temp = { 0 };  // makes all members zero
temp = tran;  // assuming tran is also a Transaction



the code for assigning and converting a token would turn to

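std::string str = *beg;
// number of wchar_t elements in the array (sizeof alone would give bytes)
size_t m = sizeof(tran.AccountNumber) / sizeof(tran.AccountNumber[0]);
memset(tran.AccountNumber, 0, m*sizeof(wchar_t));
size_t n = str.length();
if (n >= m)
{
    n = m-1;
}
mbstowcs( tran.AccountNumber, str.c_str(), n); 
tran.AccountNumber[n] = L'\0';  // terminating zero, not the digit L'0'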
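
One way to avoid mixing up byte counts and element counts with a fixed-size array is to let the compiler deduce the array length. The following is a sketch under that assumption, not part of the original answer; copy_to_fixed and demo_copy_to_fixed are hypothetical names, and the conversion relies on the current C locale.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <string>

// Copies a narrow token into a fixed-size wchar_t array, truncating if
// needed. N is deduced from the array, so it is always an element count.
template <std::size_t N>
bool copy_to_fixed(wchar_t (&dest)[N], const std::string& src) {
    std::fill(dest, dest + N, L'\0');          // zero the whole array
    // Convert at most N-1 characters so that dest[N-1] stays L'\0'.
    std::size_t written = std::mbstowcs(dest, src.c_str(), N - 1);
    if (written == static_cast<std::size_t>(-1)) {
        dest[0] = L'\0';                       // invalid input: leave it empty
        return false;
    }
    return true;
}

void demo_copy_to_fixed() {
    wchar_t account[4];
    assert(copy_to_fixed(account, "123456"));
    assert(std::wcscmp(account, L"123") == 0); // truncated to 3 + terminator
}
```

With the struct above, the call would be copy_to_fixed(tran.AccountNumber, *beg), and the return value tells whether the token could be converted.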



Sara
 
LVL 25

Assisted Solution

by:chaau
chaau earned 1168 total points
ID: 39685085
I just want to reiterate that if there's no particular requirement to have a wchar_t variable, you can use wstring. It is much easier to use.
 

Author Closing Comment

by:tommym121
ID: 39703285
Thanks