How to extract results into a string or number from boost::tokenizer

Below is a code snippet that uses boost::tokenizer. How can I extract each token into a string or a data structure rather than printing it out, as this line does?
      cout << *beg << "\n";


#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

int main(){
   using namespace std;
   using namespace boost;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

Asked by tommym121

chaau commented:
First of all, beg has the type "tokenizer<escaped_list_separator<char> >::iterator". An iterator, according to the C++ definition, is:
... any object that, pointing to some element in a range of elements (such as an array or a container)...
As you can see, an iterator behaves like a pointer (it does not have to be a raw pointer internally). In your case it acts like a pointer to a string, meaning:
beg behaves like string*

By doing *beg we dereference the iterator, which effectively yields a reference to the current token:
*beg yields const string&

Therefore, when you write a statement like this:
tran.AccountNumber = *beg;

You are actually doing
string = const string&

which in turn calls operator=() for a string. As you probably know, this operator copies the content of the string, not the pointer and not the reference. In this example:
string s = "hello";
string t = s;

t will have a string "hello" inside it, and it will be located at a different address than s.
I need to stress, that tran.AccountNumber is a string object. It is not a pointer. It exists within the scope of the struct Transaction, and does not represent a pointer.
Please read here about dereference operator, and here about the iterators

jkr commented:
Just assign them to a string instead, e.g.

#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>
#include <list>

int main(){
   using namespace std;
   using namespace boost;
   list <string> results;
   string token;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";

      token = *beg;

     results.push_back(token);
   }
}
                                  

chaau commented:
It depends where you want to store your tokens. You can for example declare a vector of strings and store each token there, like this:
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>
#include<vector>

int main(){
   using namespace std;
   using namespace boost;
   vector<string> tokens;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       tokens.push_back(*beg);
   }
}

If you want to use the vector, just iterate through it the same way you have iterated through the tokenizer:
vector<string>::const_iterator pos;
for(pos = tokens.begin(); pos != tokens.end(); ++pos)
{
    cout << *pos << ", ";
}

However, your question makes me think that you have a comma-separated string with the elements each at its own position, like a transaction string from a bank where the elements are defined like this:
AccountNumber,AccountName,DebitAmount,CreditAmount

In this case it would be wise to define a structure and move each token into the matching element. I recommend you define this structure:
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>
#include<cstdlib>   // atof

using namespace std;
using namespace boost;

struct Transaction
{
  string AccountNumber;
  string AccountName;
  double DebitAmount;
  double CreditAmount;
};
// Position of each field within a line
enum
{
  PosAccountNumber,
  PosAccountName,
  PosDebitAmount,
  PosCreditAmount
};
int main(){
   Transaction tran;
   string s = "123456789, Test Account, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   int pos = 0;
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
      switch(pos)
      {
      case PosAccountNumber: tran.AccountNumber = *beg; break;
      case PosAccountName: tran.AccountName = *beg; break;
      // beg->c_str() is needed here: *beg.c_str() would parse as *(beg.c_str())
      case PosDebitAmount: tran.DebitAmount = atof(beg->c_str()); break;
      case PosCreditAmount: tran.CreditAmount = atof(beg->c_str()); break;
      }
   }
}
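
One thing to keep in mind with atof is that it returns 0.0 both for the text "0" and for text that is not a number at all, so bad input goes unnoticed. Below is a rough sketch of a stricter conversion built on std::strtod. The function name parse_amount is an assumption made for this example; it is not part of Boost or the standard library.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a token to double and reports whether
// the whole token was a valid number. Leading and trailing whitespace
// is tolerated, anything else makes the conversion fail.
bool parse_amount(const std::string& token, double& out) {
    const char* begin = token.c_str();
    char* end = 0;
    double value = std::strtod(begin, &end);
    if (end == begin) {
        return false;                      // nothing could be converted
    }
    while (*end != '\0' && std::isspace(static_cast<unsigned char>(*end))) {
        ++end;                             // skip trailing whitespace
    }
    if (*end != '\0') {
        return false;                      // trailing characters such as "12x"
    }
    out = value;
    return true;
}
```

Inside the switch you could then write something like parse_amount(*beg, tran.DebitAmount) and decide what to do when it returns false.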

tommym121 (author) commented:
chaau

You are right. It is what I am looking for.

When I assign *beg to a string in the following statements, do we need to create a new string to hold the content of *beg? I am confused about when it is a copy and when it is a reference.
   case PosAccountNumber: tran.AccountNumber = *beg; break;
   case PosAccountName: tran.AccountName = *beg; break;

chaau commented:
There is a copy of the string in the Transaction structure.
tran.AccountNumber is a string. When you use this statement:
tran.AccountNumber = *beg

you copy the content of the current token to the string inside the structure. If you are confused about the structures and enums then you can quite simply use normal string variables, like this:
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

using namespace std;
using namespace boost;

int main(){
   string AccountNumber;
   string AccountName;
   string s = "123456789, Test Account, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   int pos = 0;
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
       switch(pos)
   {
   case 0: AccountNumber = *beg; break;
   case 1: AccountName = *beg; break;
    // etc.
   }
   }
}

jkr commented:
>>Do we need to create a new string to hold the content of *beg.

See the code sample I posted above (which chaau was 'kind' enough to ignore), it does exactly that:

int main(){
   using namespace std;
   using namespace boost;
   list <string> results;
   string token;
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3, 123, 34.5";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";

      token = *beg;  // assign to string

     results.push_back(token); // add to list
   }
}

chaau commented:
@jkr. I did not ignore your code. I was typing mine (using a vector, BTW) while you have already had yours typed. If you have a look at my answer that followed yours, you will notice that I have also provided a second option with a struct, enum and stuff like that. Obviously, that stuff required more time for typing, thus my answer appeared after yours. But I promise you that when I started typing there were no answers whatsoever.

This is actually a problem of EE. It does not have any indication of what is going on while you type the answer. If you have an experience with SO, you would know that there the answers from other users "magically" appear while you type your answer.

Send a note to the EE administrators with your concerns. Maybe they will implement some sort of AJAX functionality to show answers dynamically

tommym121 (author) commented:
The reason why I ask is that I am planning to read a CSV file (quite similar to chaau's example). I would like to read each line into the structure and store it in a vector. I will be reading something like the following, where 'line' needs to be tokenized. The code below is not complete, but it illustrates what I need to accomplish.
infile.open(inputFilename, std::fstream::in);
std::string line;
std::string token;

while (std::getline(infile, line))
{
    int pos = 0;   // field index, restarted for each line
    tokenizer<escaped_list_separator<char> > tok(line);
    for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
        cout << *beg << "\n";

        switch(pos)
        {
        case PosAccountNumber: tran.AccountNumber = *beg; break;
        case PosAccountName: tran.AccountName = *beg; break;
        case PosDebitAmount: tran.DebitAmount = atof(beg->c_str()); break;
        case PosCreditAmount: tran.CreditAmount = atof(beg->c_str()); break;
        }
    }
}

chaau commented:
For your example I would definitely create a vector (or list, with respect to jkr) of the structs you have defined, one per row. That way it will be easier to use later on in the code.

jkr commented:
Well, you are almost there - the only thing that is missing is to 'push_back()' the struct you filled in the loop to the vector. That works exactly the same way as with a list or as with a single string in the examples above:

vector<STRUCT_TYPE> vFileContents; // don't know the exact definition of 'tran', just a placeholder
while (std::getline(infile, line))
{
    STRUCT_TYPE tran; // local to the loop, the contents will be appended to the vector
    int pos = 0;      // restart the field index for every line
    tokenizer<escaped_list_separator<char> > tok(line);
    for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg, ++pos){
        cout << *beg << "\n";

        switch(pos)
        {
        case PosAccountNumber: tran.AccountNumber = *beg; break;
        case PosAccountName: tran.AccountName = *beg; break;
        case PosDebitAmount: tran.DebitAmount = atof(beg->c_str()); break;
        case PosCreditAmount: tran.CreditAmount = atof(beg->c_str()); break;
        }
    } // end of the token loop

    vFileContents.push_back(tran); // that's all you need: one push_back per line
}

tommym121 (author) commented:
So what I want to understand is what happens when I do an assignment such as

tran.AccountNumber = *beg;

My questions are:
1. After tokenizing, does *beg hold a copy of the actual string, or just a reference to the part of 'line' that contains the token?

2. If *beg has its own copy, when I do the above assignment, does tran.AccountNumber get a new copy or a reference to *beg?

3. If tran.AccountNumber gets a new copy and *beg also has its own copy, will this cause a memory leak, since I am reading in all the lines from the file?

jkr commented:
1. 'beg' is a tokenizer iterator, so it basically points to the first token found in the loop you are starting. The iterator keeps the current token in a string of its own, so '*beg' does not refer into 'line'.
2. No, when you do the assignment, you get a new copy of the token. If you however use it with 'atof()', this function will evaluate the copy '*beg' has.
3. No, no memory leaks, all variables are 'auto' and will go out of scope.

jkr commented:
As a side note: only worry about leaks when you are using 'new'. If you don't, you are on the safe side.
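
If you want to convince yourself of this, a small experiment like the one below can help. It is only an illustration: TrackedToken and process_lines are invented names, and the type does nothing except count how many of its instances are alive at any moment.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instrumented type: 'live' is incremented by every
// constructor and decremented by the destructor.
struct TrackedToken {
    static int live;
    std::string text;
    explicit TrackedToken(const std::string& t) : text(t) { ++live; }
    TrackedToken(const TrackedToken& other) : text(other.text) { ++live; }
    ~TrackedToken() { --live; }
};
int TrackedToken::live = 0;

// Stores a copy of every line in a local vector, without using 'new'.
std::size_t process_lines(const std::vector<std::string>& lines) {
    std::vector<TrackedToken> stored;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        TrackedToken tmp(lines[i]);   // automatic variable, destroyed every iteration
        stored.push_back(tmp);        // the vector keeps its own copy
    }
    return stored.size();             // 'stored' and its elements are destroyed on return
}
```

After process_lines returns, TrackedToken::live is back to zero: every copy that was created, including the ones held by the vector, has been destroyed without any explicit cleanup.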

tommym121 (author) commented:
Thanks for all the comments. I am going to post the question below as my next question.
Chaau, thank you for your educational explanation and reference articles. They are very helpful.


If tran.AccountNumber is wchar_t *, how should I assign *beg to it?

chaau commented:
I recommend to keep all strings the same for the program. Say, if  tran.AccountNumber is a unicode string then use unicode libraries for your program. Open file using unicode, use a unicode version for tokenizer, etc. BTW, there is a wstring version in std library

sarabande commented:
>>if tran.AccountNumber is wchar_t *, how should I assign *beg to it?
you would need to allocate storage for the member pointer and do a copy:

std::string str = *beg;
delete [] tran.AccountNumber;  // delete old storage if any. 
                             // the pointer should be initialized with NULL
tran.AccountNumber = new wchar_t [str.length()+1];
mbstowcs( tran.AccountNumber, str.c_str(), str.length()+1);  // +1 so the terminating zero is converted too


Sara

sarabande commented:
note,  the 'tran' structure should not have a wchar_t * but a fixed sized wchar_t array for 'AccountNumber' or  a std::wstring as member. then, you would not have to care for allocation and deleting of the pointer members and you can assign the structure like that:

const int MAXLEN_ACCOUNTNUMBER = 16;

struct Transaction
{
     ....
     wchar_t AccountNumber[MAXLEN_ACCOUNTNUMBER]; 
};

Transaction temp = { 0 };  // makes all members zero
temp = tran;  // assuming tran is also a Transaction


the code for assigning and converting a token would turn to

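std::string str = *beg;
size_t m = sizeof(tran.AccountNumber) / sizeof(tran.AccountNumber[0]);  // number of elements, not bytes
memset(tran.AccountNumber, 0, sizeof(tran.AccountNumber));              // size in bytes
size_t n = str.length();
if (n >= m)
{
    n = m-1;
}
mbstowcs( tran.AccountNumber, str.c_str(), n); 
tran.AccountNumber[n] = L'\0';  // terminating zero character, not the digit '0'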
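
If the fixed-size array is used in several places, the conversion can be wrapped in a small helper so the element count is deduced by the compiler instead of being computed by hand. This is just a sketch; copy_to_wide is an invented name, and it assumes the current C locale can convert the input (plain ASCII digits always can).

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: converts 'narrow' into the fixed-size wide
// array 'dest', truncating if necessary, and always zero-terminates.
// Returns the number of wide characters written (without terminator).
template <std::size_t N>
std::size_t copy_to_wide(const std::string& narrow, wchar_t (&dest)[N]) {
    std::memset(dest, 0, sizeof(dest));   // sizeof(dest) is the size in bytes
    // Convert at most N - 1 characters so one slot stays free for the terminator.
    std::size_t written = std::mbstowcs(dest, narrow.c_str(), N - 1);
    if (written == static_cast<std::size_t>(-1)) {
        dest[0] = L'\0';                  // invalid multibyte sequence: leave an empty string
        return 0;
    }
    dest[written] = L'\0';
    return written;
}
```

A call then looks like copy_to_wide(*beg, tran.AccountNumber), with no sizes to pass.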


Sara

chaau commented:
I just want to reiterate that if there's no particular requirement to have a wchar_t variable you can use wstring. It is much easier to use
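
For example, if the file is read as narrow text and only the struct member needs to be wide, a conversion along these lines would do. It is a sketch only: to_wide is a made-up name, and std::mbstowcs depends on the current C locale, so anything beyond plain ASCII needs a suitable locale to be set first.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a narrow (multibyte) string to a
// std::wstring. Returns an empty string if the input cannot be
// converted in the current locale.
std::wstring to_wide(const std::string& narrow) {
    // One wide character per input byte is an upper bound, plus the terminator.
    std::vector<wchar_t> buffer(narrow.size() + 1, L'\0');
    std::size_t written = std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    return std::wstring(&buffer[0], written);   // the wstring manages its own memory
}
```

With a std::wstring member the assignment becomes tran.AccountNumber = to_wide(*beg); and there is nothing to allocate or delete by hand.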

tommym121 (author) commented:
Thanks