Zulma9999 asked:

Split function like in Perl

In Perl you can split a string into multiple strings.

I'm looking for a method to do this in C++.

I'm hoping someone has already done this, or knows an easy method.

I want something that can split a string into a list<string> object.
thienpnguyen:

I'm not sure exactly what Perl's split function does.
I assume it separates a string into words.
Here is some code:


#include <iostream>
#include <sstream>
#include <string>
#include <list>
using namespace std;


// Splits str into whitespace-separated words and stores them in result.
void split(const string &str, list<string> &result)
{

    result.clear(); // remove this line if you want to append to the list instead

    istringstream is(str);
    string s;

    while( is >> s )   // >> skips leading whitespace and reads one word
        result.push_back(s);
}



int main()
{

    list<string> mylist;

    split("one two three four five", mylist);

    list<string>::iterator i;

    for( i = mylist.begin();  i != mylist.end(); i++ )
        cout << (*i) << endl;

    return 0;
}

The output is

one
two
three
four
five
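The same stream-based idea can be stretched to a single-character delimiter by using the three-argument form of std::getline. This is a minimal sketch; split_on_char is a hypothetical name, and it still handles only a one-character delimiter, not a token string.

```cpp
#include <list>
#include <sstream>
#include <string>

// Hypothetical helper: split str at every occurrence of the character
// delim using std::getline(stream, field, delim). Unlike the >> version,
// empty fields between adjacent delimiters are kept; a delimiter at the
// very end does not produce a final empty field.
std::list<std::string> split_on_char(const std::string& str, char delim)
{
    std::list<std::string> result;
    std::istringstream is(str);
    std::string field;
    while (std::getline(is, field, delim))
        result.push_back(field);
    return result;
}
```

For example, split_on_char("one,two,,three", ',') yields four items, the third one empty.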
Axter:
I have a StrSplit template function that I think will do what you're looking for.

     #include <algorithm>  // std::search

     // Splits Src_ at every occurrence of Token_ and appends the pieces to
     // SrcContainer. Each piece keeps its trailing token unless
     // RemoveTokenFromResult is true. An empty remainder after the last
     // token is not appended. Returns the number of pieces appended.
     template<class TC, class T1, class T2>
          int StrSplit(const T1 &Src_, const T2 &Token_, TC &SrcContainer, bool RemoveTokenFromResult)
     {
          int QtyItems = 0;
          const T1 Token = Token_;
          if (Token.empty())
               return QtyItems;

          T1 Src = Src_;   // working copy, consumed from the front
          for(;;)
          {
               typename T1::iterator pos = std::search(Src.begin(), Src.end(), Token.begin(), Token.end());
               if (pos == Src.end())
               {
                    // No token left: keep the remainder unless it is empty
                    if (Src.size())
                    {
                         SrcContainer.push_back(Src);
                         QtyItems++;
                    }
                    return QtyItems;
               }
               // The piece ends just before the token, or just after it
               typename T1::iterator PieceEnd = RemoveTokenFromResult ? pos : pos + Token.size();
               SrcContainer.push_back(T1(Src.begin(), PieceEnd));
               QtyItems++;
               Src.erase(Src.begin(), pos + Token.size());   // drop the piece and its token
          }
     }
     
I also have an overload that calls it and returns the pieces in a std::list:

     template<class T1, class T2> std::list<T1> StrSplit(const T1 &Src, const T2 &Token, bool RemoveTokenFromResult = false)
     {
          std::list<T1> SrcContainer;
          StrSplit(Src,Token, SrcContainer,RemoveTokenFromResult);
          return SrcContainer;
     }
And I have this StrCombine function that puts back together what the previous function separated (exactly, when the tokens were kept in the result).

     template<class T>
          typename T::value_type StrCombine(const T &Src)
     {
          typename T::value_type tmp;
          for(typename T::const_iterator i = Src.begin();i!=Src.end();++i)
          {
               tmp += *i;   // append each piece in order
          }
          return tmp;
     }
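The following is an example function that uses the above template functions.
In the three-argument version, the first argument receives the string, the second takes the string used as the dividing token, and the third is true or false depending on whether you want the token removed from the pieces in the target container. The four-argument version takes the target container as its third argument and the true/false flag as its fourth.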

The functions work with std::string and std::wstring.

Let me know if you have any questions.


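// Assumes the StrSplit/StrCombine templates above are declared inside
// namespace oo, and that <list>, <string> and <vector> are included.
void SomeFunct(void)
{
     std::string data = "Hello World.  How are you axter.  I'm doing find. ";
     // Tokens kept, so StrCombine rebuilds the original string
     std::list<std::string> xx1 = oo::StrSplit(data, ". ", false);
     std::string tmp1 = oo::StrCombine(xx1);
     // Tokens removed; this time the pieces go into a vector
     std::vector<std::string> xx3;
     oo::StrSplit(data, ". ", xx3, true);
     std::string tmp3 = oo::StrCombine(xx3);
}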
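When the tokens are removed (RemoveTokenFromResult set to true), StrCombine simply concatenates the pieces and the tokens are lost. A join helper in the spirit of Perl's join, which puts a separator back between the pieces, can be sketched as follows. StrJoin is a hypothetical name, not part of the code above, and a token at the very end of the original string is not restored, because StrSplit does not store an empty final piece.

```cpp
#include <list>
#include <string>

// Hypothetical StrJoin, modeled on Perl's join: concatenates the pieces
// in order with sep between each pair (no leading or trailing sep).
// sep is a non-deduced parameter, so a string literal converts to the
// container's string type.
template<class Container>
typename Container::value_type StrJoin(const Container& pieces,
                                       const typename Container::value_type& sep)
{
    typename Container::value_type result;
    for (typename Container::const_iterator i = pieces.begin(); i != pieces.end(); ++i)
    {
        if (i != pieces.begin())
            result += sep;   // separator only between pieces
        result += *i;
    }
    return result;
}
```

For example, joining the list {"a", "b", "c"} with ", " gives "a, b, c".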
Zulma9999 (asker):

thienpnguyen,
I'm looking for a split function that can take a token string to split the main string.  This is how Perl works.

I don't think I can modify your function to do that.

Axter,
Looks a little complicated, but let me give it a shot.
I'll get back to you.
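#include <stdio.h>
#include <string.h>
//#include <split.h>

// Splits pstrSource in place: strtok() overwrites each separator with '\0'
// and returns pointers into pstrSource, which are stored in pstrDest.
// pstrSource must be a writable buffer, and pstrDest must have room for
// every token. Any single character in pstrSeps is a separator, and runs
// of separators count as one. strtok() keeps hidden state between calls,
// so this function is not reentrant.
int     split(char * pstrSource, char ** pstrDest,     const char * pstrSeps)
{
     int            intNumTokens=0;
     char     *     pstrToken=NULL;
     const char *   pstrDefaultSeps="\t,; ";
     const char *   pstrSepsToUse=NULL;
     if (!pstrSource || !pstrDest)
          {
          return 0;
          }

     pstrSepsToUse=pstrSeps;
     if (!pstrSepsToUse)
          {
          pstrSepsToUse=pstrDefaultSeps;
          }

     pstrToken = strtok( pstrSource, pstrSepsToUse );
     while( pstrToken != NULL )
          {
          pstrDest[intNumTokens]=pstrToken;
          intNumTokens++;
          pstrToken = strtok( NULL, pstrSepsToUse );
          }

     return     intNumTokens;
}
...and here's the header declaration.
int     split(char * pstrSource, char ** pstrDest,     const char * pstrSeps=NULL);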
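strtok() modifies the source buffer and keeps hidden state between calls. A non-destructive sketch of the same "any of these characters" splitting, built on std::string::find_first_of and find_first_not_of, could look like this. split_any_of is a hypothetical name, and its default separators simply mirror pstrDefaultSeps above.

```cpp
#include <list>
#include <string>

// Hypothetical split_any_of: any character in seps is a separator and
// runs of separators are skipped (as strtok does), but source is left
// unchanged and no static state is used.
std::list<std::string> split_any_of(const std::string& source,
                                    const std::string& seps = "\t,; ")
{
    std::list<std::string> tokens;
    std::string::size_type start = source.find_first_not_of(seps);
    while (start != std::string::npos)
    {
        std::string::size_type end = source.find_first_of(seps, start);
        if (end == std::string::npos)
        {
            tokens.push_back(source.substr(start));   // last token runs to the end
            break;
        }
        tokens.push_back(source.substr(start, end - start));
        start = source.find_first_not_of(seps, end);  // skip the run of separators
    }
    return tokens;
}
```

Because source is taken by const reference, string literals and const strings can be passed directly, which the strtok version does not allow.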
Hi,

If you want a Perl-style regular expression library (including split), see the regex library at www.boost.org.

To tide you over, here is something to split a string on a delimiter that is either a single character or a string.

//-----Code-start-----------------------------------
#include <list>
#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <algorithm>
#include <string>

// Do not add these lines in a header file, use std::list
// and std::string explicitly
using std::list;
using std::string;

// Split source at every occurrence of the character delim.
// Adjacent delimiters, or one at either end, give empty strings.
list<string> split(string& source,char delim=':')
{
    list<string> tokens;
    string::iterator left=source.begin();
    for (;;)
    {
     string::iterator right=std::find(left, source.end(),delim);
     tokens.push_back(string(left,right));
     if (right==source.end()) break; // last token done; never step past end()
     left=right+1; // skip over delimiter
    }
    return tokens;
}

// Split source at every occurrence of the string delim.
list<string> split(string& source, string delim)
{
    list<string> tokens;
    if (delim.empty())   // std::search would match everywhere and never advance
    {
     tokens.push_back(source);
     return tokens;
    }
    string::iterator left=source.begin();
    for (;;)
    {
     string::iterator right=std::search(left, source.end(),
                 delim.begin(),delim.end());
     tokens.push_back(string(left,right));
     if (right==source.end()) break;
     left=right+delim.size(); // skip over delimiter
    }
    return tokens;
}

template<typename list_type>
void display_list(const list<list_type>& the_list, std::ostream& os)
{
    std::copy(the_list.begin(), the_list.end(),
           std::ostream_iterator<list_type>(os,"\n"));
}

int main(int argc, char* argv[])
{
    // Use split(string,char) with default delimiter
    string input("/home/me/bin:/home/me/test:/usr/bin");
    list<string> tokens=split(input);
    display_list(tokens,std::cout);

    // Use split(string,char) with ; as delim
    input="c:\\home\\me\\bin;c:\\home\\me\\test;c:\\winnt";
    tokens=split(input,';');
    display_list(tokens,std::cout);

    // Use split(string,string) with :@; as delim
    input="/home/me/bin:@;/home/me/test:@;/usr/bin";
    tokens=split(input,":@;");
    display_list(tokens,std::cout);

    return 0;
}

//-----Code-End-----------------------------------
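C++11 later added regular expressions to the standard library, so a Perl-style split on a pattern can now be sketched without Boost. Passing -1 to std::sregex_token_iterator asks for the text between matches rather than the matches themselves. split_regex is a hypothetical helper name, and this sketch compiles the pattern on every call.

```cpp
#include <list>
#include <regex>
#include <string>

// Hypothetical split_regex: pattern is a regular expression (ECMAScript
// syntax by default), and the pieces of source between matches are
// returned. Submatch index -1 selects the non-matching text.
std::list<std::string> split_regex(const std::string& source, const std::string& pattern)
{
    std::regex re(pattern);
    std::sregex_token_iterator first(source.begin(), source.end(), re, -1);
    std::sregex_token_iterator last;             // end-of-sequence iterator
    return std::list<std::string>(first, last);  // each sub_match converts to std::string
}
```

For example, split_regex("one  two\tthree", "\\s+") gives "one", "two" and "three", and the ":@;" example above can be written as split_regex(input, ":@;").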

Triskelion,
Thanks for the help, but I'm looking for something that looks more like vickirk's and Axter's functions.
As I said in my question, I want something that works with list<string>.
I'm doing STL code, and I'm looking for an STL function.

vickirk & Axter,
Both vickirk's and Axter's code seem to do what I need.
vickirk's code is simple, but Axter's code has more flexibility.

vickirk,
I looked at the link you provided, but I couldn't find the Perl-style regular expression library.

I'll wait to see if you and Axter have any further input, before I decide.
ASKER CERTIFIED SOLUTION
Axter:
Zulma9999,
You'll notice that the above function takes constant references for the two input strings.
Because they're const, you can pass both constant and non-constant input strings.
Because they're references, the original strings don't have to be copied in order to use them in the algorithm.

In both my previous function and vickirk's function, strings have to be copied before they can be used.

Using constant references can greatly improve the efficiency of the code. This matters most when you're using large strings or calling the function thousands of times.

My function also has an extra line to make sure it doesn't put an empty string into the list.
You'll notice the difference when the input string ends with the token string.

Example:
   //This input string has ":" at end
   string input("/home/me/bin:/home/me/test:/usr/bin:");
   //vickirk's split() leaves an empty string as the last item on the list
   list<string> tokens=split(input);

   tokens.clear();
   //This will not have an empty string
   StrSplit(input,":",tokens,true);
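A sketch of the approach Axter describes (const-reference inputs, scanning with find() instead of copying and erasing, and no empty piece at the end) might look like the following. It is not Axter's posted solution; StrSplitRef is a hypothetical name, and this version handles std::string only.

```cpp
#include <list>
#include <string>

// Hypothetical StrSplitRef: splits src at each occurrence of token and
// appends the pieces to result. Both strings are taken by const
// reference and src is scanned with find(), so the source string itself
// is never copied (only the pieces are). Pieces keep their token unless
// removeToken is true; an empty remainder after the last token is not
// appended. Returns the number of pieces appended.
template<class TC>
int StrSplitRef(const std::string& src, const std::string& token,
                TC& result, bool removeToken)
{
    if (token.empty())
        return 0;                          // nothing to split on
    int count = 0;
    std::string::size_type start = 0;      // beginning of the current piece
    for (;;)
    {
        std::string::size_type pos = src.find(token, start);
        if (pos == std::string::npos)
        {
            if (start < src.size())        // skip an empty remainder
            {
                result.push_back(src.substr(start));
                ++count;
            }
            return count;
        }
        std::string::size_type end = removeToken ? pos : pos + token.size();
        result.push_back(src.substr(start, end - start));
        ++count;
        start = pos + token.size();        // continue after the token
    }
}
```

With the input above, StrSplitRef(input, ":", tokens, true) gives three pieces and no trailing empty string.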
Hi,

The library I was referring to can be found at

http://www.boost.org/libs/regex/index.htm

and the split function is at

http://www.boost.org/libs/regex/template_class_ref.htm#regex_split

If you use Axter's solution (a perfectly fine one), you may wish to add a specialisation for when the delimiter is a single char, if you need that. Depending on the implementation, std::search can have more overhead than std::find.
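Following vickirk's suggestion, a single-character version can use std::find in place of std::search. StrSplitChar below is a hypothetical helper with the same argument order as StrSplit (source, delimiter, container, remove flag); it works for std::string and std::wstring because the delimiter type is taken from the string's value_type.

```cpp
#include <algorithm>
#include <list>
#include <string>

// Hypothetical StrSplitChar: like StrSplit, but the delimiter is a single
// character located with std::find instead of std::search.
// Pieces keep the delimiter unless removeToken is true; an empty
// remainder after the last delimiter is not appended.
// Returns the number of pieces appended to result.
template<class TC, class S>
int StrSplitChar(const S& src, typename S::value_type delim,
                 TC& result, bool removeToken)
{
    int count = 0;
    typename S::const_iterator left = src.begin();
    for (;;)
    {
        typename S::const_iterator right = std::find(left, src.end(), delim);
        if (right == src.end())
        {
            if (left != src.end())         // skip an empty remainder
            {
                result.push_back(S(left, right));
                ++count;
            }
            return count;
        }
        result.push_back(S(left, removeToken ? right : right + 1));
        ++count;
        left = right + 1;                  // skip over the delimiter
    }
}
```

Since S is deduced from the first argument, pass a std::string or std::wstring object rather than a bare string literal.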

Cheerio
Thanks Axter and vickirk.
Both your functions are good, but I decided to go with Axter's.

vickirk,
Thanks for the link and the help.
I'm awarding you some points.
https://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=cplusprog&qid=20189239