Create a string-splitting user-defined function in C++

We are looking for a solution that transforms these underscore (_)
separated value strings into individual “name – value” pairs. The number of
“_”-separated values in the concatenated string can change over time
and needs to be handled dynamically. We want to write this dynamic
code in C++; both the delimiter and the number of columns in the string are variable.
We will use the C++ code as a function in a Netezza database.

Source Column Value: dl_v5.2_2_9_1200_256_0_1_64_0_54_2345

Below is the output we expect:
P1   P6    P7   P8   P11
dl   256   0    1    54
coventriAsked:

peprCommented:
I suggest using the stringtok template shown at http://gcc.gnu.org/onlinedocs/libstdc++/manual/strings.html#strings.string.token, with the underscore as the delimiter argument. That way you can split the string into a standard vector of strings and then access the elements directly (zero-based indexing). The stringtok definition from that URL is:
#include <string>

template <typename Container>
void
stringtok(Container &container, std::string const &in,
          const char * const delimiters = " \t\n")
{
    const std::string::size_type len = in.length();
    std::string::size_type i = 0;

    while (i < len)
    {
        // Skip leading delimiters
        i = in.find_first_not_of(delimiters, i);
        if (i == std::string::npos)
            return;   // Nothing left but delimiters

        // Find the end of the token
        std::string::size_type j = in.find_first_of(delimiters, i);

        // Push token
        if (j == std::string::npos)
        {
            container.push_back(in.substr(i));
            return;
        }
        else
            container.push_back(in.substr(i, j - i));

        // Set up for next loop
        i = j + 1;
    }
}


You may want to modify it a bit to fit your needs. Parsing into a vector would look like this (not tested):
std::vector<std::string> vec;
stringtok(vec, "dl_v5.2_2_9_1200_256_0_1_64_0_54_2345", "_");

Then vec[0] will be "dl", etc.
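
For the "name – value" pairs asked for in the question, the tokens can be labelled by position as they are split off. The following is only a sketch in standard C++11, not Netezza code: the helper name to_name_value_pairs and the labels P1, P2, ... are assumptions taken from the expected output in the question. It uses std::getline instead of stringtok, so an empty field between two adjacent delimiters is kept rather than skipped.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: label each field P1, P2, ... in order of appearance.
std::vector<std::pair<std::string, std::string>>
to_name_value_pairs(const std::string &in, char delimiter)
{
    std::vector<std::pair<std::string, std::string>> pairs;
    std::istringstream stream(in);
    std::string field;
    int position = 0;

    // std::getline reads up to (and discards) the next delimiter.
    while (std::getline(stream, field, delimiter)) {
        ++position;
        pairs.emplace_back("P" + std::to_string(position), field);
    }
    return pairs;
}
```

Called with the sample value and '_', this should yield twelve pairs, with P1 holding "dl" and P11 holding "54".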
murugesandinsCommented:
Updated code

Rest removed, since it was basically copying what pepr had posted - jkr
coventriAuthor Commented:
Requirement: split a string.
We have string data in Netezza database tables and need to create a user-defined function to split a string. To create a function in a Netezza database, we have to write the code in C++ and compile it on Netezza. The function must be named stringsplit.
The function should accept three parameters:
stringsplit(columnname,delimiter,position)
Can anyone please provide C++ code for a function that splits a string? I am new to C++ programming.

Below is the sample data:

Source Column Value: dl_v5.2_2_9_1200_256_0_1_64_0_54_2345

select stringsplit('dl_v5.2_2_9_1200_256_0_1_64_0_54_2345','_',0)
expected output is dl

select stringsplit('dl_v5.2_2_9_1200_256_0_1_64_0_54_2345','_',1)
expected output is v5.2

Sample code for a Netezza function, customername, is attached.
In the customername example, the UDF takes a string and returns the integer 1 if the
string starts with “Customer A”; otherwise it returns the integer 0. I got the sample from the Netezza manual.
How should I add your split code to the Netezza function?
customername.cpp
earth man2Commented:
Start with something simple. I have no way to test this, but try it, and then add the strtok code to get the functionality you want.

#include <regex.h>
#include <string.h>
#include <string>
#include <sys/types.h>
#include "udxinc.h"

using namespace nz::udx_ver2;

class splitString : public nz::udx_ver2::Udf
{
  public:
    // The constructor name must match the class name exactly (C++ is case-sensitive).
    splitString(UdxInit *pInit) : Udf(pInit)
    {
    }
    static nz::udx_ver2::Udf* instantiate(UdxInit *pInit);

    virtual nz::udx_ver2::ReturnValue evaluate()
    {
        // MAKE SURE INPUTS HAVE BEEN PASSED CORRECTLY
        if ( isArgNull(0) || isArgNull(1) ) NZ_UDX_RETURN_NULL();

        // CAPTURE INPUT VARIABLES
        StringArg *inString = stringArg(0);   // STRING
        int16      inPos    = int16Arg(1);    // POSITION NUMBER
        StringReturn* ret = stringReturnInfo();

        // INPUT DATA VALIDATION
        // The argument data may not be NUL-terminated, so copy it using its length.
        std::string strData(inString->data, inString->length);

        if ( inPos < 1 ) throwUdxException("The requested position cannot be found; please supply a value of 1 or greater.");

        // SPLIT STRING
        // TODO: walk strData here and pick out the token at position inPos.

        // Placeholder result until the split logic is added.
        ret->size = 4;
        memcpy(ret->data, "v5.2", 4);

        // SANITY CHECK
        if ( ret->size == 0 ) NZ_UDX_RETURN_NULL();

        // RETURN RESULT
        NZ_UDX_RETURN_STRING(ret);
    }
};

nz::udx_ver2::Udf* splitString::instantiate(UdxInit *pInit)
{
    return new splitString(pInit);
}
earth man2Commented:
You can get the 4th token from a string with some standard function wrangling, like:

select
substr('dl_v5.2_2_9_1200_256_0_1_64_0_54_2345',
instr('dl_v5.2_2_9_1200_256_0_1_64_0_54_2345','_',1,3)+1,
instr('dl_v5.2_2_9_1200_256_0_1_64_0_54_2345','_',1,4) -
instr('dl_v5.2_2_9_1200_256_0_1_64_0_54_2345','_',1,3) - 1);
earth man2Commented:
Here's an example of using strtok:

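#include <cstdlib>
#include <cstring>
#include <iostream>
using namespace std;

// Usage: prog <string> <position>   (position is 1-based)
int main ( int argc, char *argv[] ){
    if ( argc < 3 ) {
        cerr << "usage: " << argv[0] << " <string> <position>\n";
        return 1;
    }
    int i = 0, n = atoi( argv[2] );
    // Walk the tokens until the n-th one is reached or the tokens run out.
    char *p = strtok( argv[1], "_" );
    while ( p && ( ++i < n ) )
        p = strtok( (char *) NULL, "_" );
    if ( p && i == n )
        cout << p << "\n";
    else
        cout << "Token not found\n";
    return 0;
}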
coventriAuthor Commented:
I have written a C++ program (attached). It works as expected for some records but not for others. I have attached the program and my test data; in the test data, the bad records are highlighted in red.

Below is the SQL we use in the Netezza database to get the output. We write the function in C++ and compile it in the Netezza database. Can you please correct my C++ program so that it returns the correct values? I am new to C++.
The function accepts three parameters:
stringsplit(columnname,delimiter,position)


Test SQL:
select sk_key,TRIM(dl_record) ,stringsplit(TRIM(dl_record),'_',0) as dl0,
stringsplit(TRIM(dl_record),'_',1) as dl1,stringsplit(TRIM(dl_record),'_',2) as dl2,
stringsplit(TRIM(dl_record),'_',3) as dl3,stringsplit(TRIM(dl_record),'_',4) as dl4,
stringsplit(TRIM(dl_record),'_',5) as dl5
from TEST
order by sk_key
stringsplit.cpp
bad-data.xls
earth man2Commented:
I suspect the problem comes down to the difference in string representation between Netezza and C.
C strings have to be null-terminated,
so you need to make a copy with a zero byte added on the end.
char *p = (char *) malloc( arg1str->length + 1 );
memcpy( p, arg1str->data, arg1str->length );
p[arg1str->length] = '\0';
Do the same for the delimiter string. Check that the malloc call succeeded, and call free(p) when done. Netezza may have a special function for dynamic memory allocation to use instead of malloc.
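
The same copy can be made without malloc and free by letting std::string own the buffer. This is only a sketch in standard C++: the struct LengthDelimited is a hypothetical stand-in for an argument that carries a data pointer and a length, not the real Netezza type, and whether std::string is acceptable inside a UDX is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for an argument whose bytes are NOT NUL-terminated.
struct LengthDelimited {
    const char *data;
    std::size_t length;
};

// Copy exactly `length` bytes; std::string appends the terminator itself,
// so c_str() on the result is a valid C string.
std::string to_std_string(const LengthDelimited &arg)
{
    return std::string(arg.data, arg.length);
}
```

The copy is released automatically when the std::string goes out of scope, on every return path.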
coventriAuthor Commented:
Thanks, earthman2.
Would it be possible for you to access my PC through TeamViewer? I am new to C++.
earth man2Commented:
#include "udxinc.h"
#include <string.h>
#include <cstring>
using namespace nz::udx_ver2;
class stringsplit: public nz::udx_ver2::Udf {
public:stringsplit(UdxInit *pInit) : Udf(pInit){}
static nz::udx_ver2::Udf* instantiate(UdxInit *pInit);
virtual nz::udx_ver2::ReturnValue evaluate(){
StringReturn* ret = stringReturnInfo();
StringArg *arg1 =  stringArg(0), *arg2 =  stringArg(1);
int16 num = int16Arg(2), i = 0;
char *split, *arg11 = (char *) malloc( arg1->size + 1), *arg22 = (char *) malloc( arg2->size + 1);
  if ( arg11 && arg22 ) {
    memcpy( arg11, arg1->data, arg1->size );
    arg11[arg1->size] = '\0';
    memcpy( arg22, arg2->data, arg2->size );
    arg22[arg2->size] = '\0';
    split = strtok( arg11, arg22 ); 
    while ( split ) {
      if ( ++i == num ) {
        ret->size = strlen(split);
        memcpy(ret->data,split,ret->size);
        free(arg11);
        free(arg22);
        NZ_UDX_RETURN_STRING(ret);
      }
      split = strtok(NULL,arg22);
    }
    NZ_UDX_RETURN_NULL();
  }
};
nz::udx_ver2::Udf* stringsplit::instantiate(UdxInit *pInit) {
 return new stringsplit(pInit);
}

coventriAuthor Commented:
Thanks for your time.
When I compiled, I got the error messages below. You also missed one closing bracket '}' at line 33; I added it and compiled again.

/tmp/stringsplit.cpp: In member function 'virtual nz::udx::ReturnValue stringsplit::evaluate()':
/tmp/stringsplit.cpp:13: error: 'struct nz::udx::StringArg' has no member named 'size'
/tmp/stringsplit.cpp:14: error: 'struct nz::udx::StringArg' has no member named 'size'
/tmp/stringsplit.cpp:14: error: expected ',' or ';' before ')' token
/tmp/stringsplit.cpp:16: error: expected initializer before 'int16'
/tmp/stringsplit.cpp:18: error: 'struct nz::udx::StringArg' has no member named 'size'
/tmp/stringsplit.cpp:19: error: 'struct nz::udx::StringArg' has no member named 'size'
/tmp/stringsplit.cpp:20: error: 'struct nz::udx::StringArg' has no member named 'size'
/tmp/stringsplit.cpp:21: error: 'struct nz::udx::StringArg' has no member named 'size'
/tmp/stringsplit.cpp:22: error: 'split' was not declared in this scope
/tmp/stringsplit.cpp:24: error: 'i' was not declared in this scope
earth man2Commented:
#include "udxinc.h"
#include <string.h>
#include <cstring>
using namespace nz::udx_ver2;
class stringsplit: public nz::udx_ver2::Udf {
public:stringsplit(UdxInit *pInit) : Udf(pInit){}
static nz::udx_ver2::Udf* instantiate(UdxInit *pInit);
virtual nz::udx_ver2::ReturnValue evaluate(){
  StringReturn* ret = stringReturnInfo();
  StringArg *arg1 = stringArg(0), *arg2 = stringArg(1);
  int16 i = 0, num = int16Arg(2);
  char *split, *arg11 = (char *) malloc( arg1->length + 1), *arg22 = (char *) malloc( arg2->length + 1);
    if ( arg11 && arg22 ) {
      memcpy( arg11, arg1->data, arg1->length );
      arg11[arg1->length] = '\0';
      memcpy( arg22, arg2->data, arg2->length );
      arg22[arg2->length] = '\0';
      split = strtok( arg11, arg22 ); 
      while ( split ) {
        if ( ++i == num ) {
          ret->length = strlen(split);
          memcpy(ret->data,split,ret->length);
          free(arg11);
          free(arg22);
          NZ_UDX_RETURN_STRING(ret);
        }
        split = strtok(NULL,arg22);
      }
    }
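    // Release both buffers on the not-found path too; free(NULL) is a no-op.
    free(arg11);
    free(arg22);
    NZ_UDX_RETURN_NULL();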
  }
};
nz::udx_ver2::Udf* stringsplit::instantiate(UdxInit *pInit) {
 return new stringsplit(pInit);
}

coventriAuthor Commented:
Thanks a lot. Your code is working.
I made a small change: your code treated the first value in the concatenated string as position 1, and we expect it to be position 0.
So I changed the increment in the while loop, and it is working as expected. Can you please verify the code below? I will do some more testing and then accept your solution.

#include "udxinc.h"
 #include <string.h>
 #include <cstring>
 using namespace nz::udx_ver2;
 class stringsplit: public nz::udx_ver2::Udf
 {
 public:
 stringsplit(UdxInit *pInit) : Udf(pInit)
 {
 }
 static nz::udx_ver2::Udf* instantiate(UdxInit *pInit);
 virtual nz::udx_ver2::ReturnValue evaluate()
 {
 StringReturn* ret = stringReturnInfo();
 StringArg *arg1 =  stringArg(0);
 StringArg *arg2 =  stringArg(1);
 int16 num = int16Arg(2);

char *arg11 = (char *) malloc( arg1->length + 1);
char *arg22 = (char *) malloc( arg2->length + 1);

char *split;
int16 i = 0;
 
if ( arg11 && arg22 ) {
memcpy( arg11, arg1->data, arg1->length );
arg11[arg1->length] = '\0';
memcpy( arg22, arg2->data, arg2->length );
arg22[arg2->length] = '\0';
split = strtok( arg11, arg22 );
while ( split ) {
if (i == num) {
ret->size = strlen(split);
memcpy(ret->data,split,ret->size);
NZ_UDX_RETURN_STRING(ret);
}
split = strtok(NULL,arg22);
i = i+1;
}
NZ_UDX_RETURN_NULL();
}
}
};
 nz::udx_ver2::Udf* stringsplit::instantiate(UdxInit *pInit) {
  return new stringsplit(pInit);
 }
earth man2Commented:
You need to free the allocated memory, or you will eventually run out of heap memory.

memcpy(ret->data,split,ret->size);
free(arg11);
free(arg22);
NZ_UDX_RETURN_STRING(ret);

It is also best to move the return null to the end of the function, so that it falls out gracefully, or to raise an exception as required.
earth man2Commented:
Also, strlen returns a value of type size_t, defined in stddef.h, so you should consider what happens with very large strings. Maybe that int16 value should be an unsigned integer ("uint16"). Maybe varchar strings can never get that big. Is an explicit cast required?
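
One way to make the narrowing concern concrete is a range check before a strlen result is stored in a 16-bit signed integer. A minimal sketch in standard C++11; the helper name fits_in_int16 is hypothetical, and treating Netezza's int16 as equivalent to std::int16_t is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a size_t length can be stored in a
// 16-bit signed integer without changing value.
bool fits_in_int16(std::size_t length)
{
    return length <=
           static_cast<std::size_t>(std::numeric_limits<std::int16_t>::max());
}
```

A caller could test the length with this before assigning it, and return null or raise an exception when the check fails.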
coventriAuthor Commented:
I removed the delimiter parameter, and now I get an error when compiling. Can you please take a look? The error is at line 18:
invalid conversion from 'char' to 'nz::udx::StringArg*'

#include "udxinc.h"
 #include <string.h>
 #include <cstring>
 using namespace nz::udx_ver2;
 class stringsplit: public nz::udx_ver2::Udf
 {
 public:
 stringsplit(UdxInit *pInit) : Udf(pInit)
 {
 }
 static nz::udx_ver2::Udf* instantiate(UdxInit *pInit);
 virtual nz::udx_ver2::ReturnValue evaluate()
 {
 StringReturn* ret = stringReturnInfo();
 StringArg *arg1 =  stringArg(0);
 int16 num = int16Arg(1);
 
 StringArg *arg2 =  '_';
 

char *arg11 = (char *) malloc( arg1->length + 1);
char *arg22 = (char *) malloc( arg2->length + 1);

char *split;
int16 i = 0;
 
if ( arg11 && arg22 ) {
memcpy( arg11, arg1->data, arg1->length );
arg11[arg1->length] = '\0';
memcpy( arg22, arg2->data, arg2->length );
arg22[arg2->length] = '\0';
split = strtok( arg11, arg22 );
while ( split ) {
if (i == num) {
ret->size = strlen(split);
memcpy(ret->data,split,ret->size);
free(arg11);
free(arg22);
NZ_UDX_RETURN_STRING(ret);
}
split = strtok(NULL,arg22);
i = i+1;
}
NZ_UDX_RETURN_NULL();
}
}
};
 nz::udx_ver2::Udf* stringsplit::instantiate(UdxInit *pInit) {
  return new stringsplit(pInit);
 }
earth man2Commented:
Enclose the underscore in double quotes, not single quotes: " not '.
Single quotes denote a single character constant, which is not terminated by a null character.
earth man2Commented:
There's a bit more wrong than that...

#include "udxinc.h"
#include <stdlib.h>
#include <string.h>
using namespace nz::udx_ver2;

class stringsplit: public nz::udx_ver2::Udf
{
public:
    stringsplit(UdxInit *pInit) : Udf(pInit)
    {
    }
    static nz::udx_ver2::Udf* instantiate(UdxInit *pInit);
    virtual nz::udx_ver2::ReturnValue evaluate()
    {
        StringReturn* ret = stringReturnInfo();
        StringArg *arg1 = stringArg(0);
        int16 num = int16Arg(1);

        // Make a NUL-terminated, writable copy for strtok.
        char *arg11 = (char *) malloc( arg1->length + 1);
        char *split;
        int16 i = 0;

        if ( arg11 ) {
            memcpy( arg11, arg1->data, arg1->length );
            arg11[arg1->length] = '\0';
            split = strtok( arg11, "_" );
            while ( split ) {
                if (i == num) {
                    ret->size = strlen(split);
                    memcpy(ret->data, split, ret->size);
                    free(arg11);
                    NZ_UDX_RETURN_STRING(ret);
                }
                split = strtok(NULL, "_");
                i = i+1;
            }
            free(arg11);   // position not found: release the copy
        }
        // Reached when allocation failed or the position does not exist.
        NZ_UDX_RETURN_NULL();
    }
};

nz::udx_ver2::Udf* stringsplit::instantiate(UdxInit *pInit) {
    return new stringsplit(pInit);
}
earth man2Commented:
There is a risk in using strtok, as it is not thread-safe.
It is best to use or write an alternative.
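
One possible alternative keeps no hidden state between calls and never modifies its input, so it avoids the strtok problem. This is a sketch in standard C++ rather than tested Netezza code: the function name token_at and its signature are hypothetical. It follows strtok's behaviour of treating a run of delimiters as a single separator, and it counts positions from 0 as the asker wanted.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy the token at zero-based `position` into `out`.
// Returns false when the input has fewer tokens than that.
bool token_at(const std::string &in, const std::string &delimiters,
              std::size_t position, std::string &out)
{
    // Skip any leading delimiters, as strtok does.
    std::size_t start = in.find_first_not_of(delimiters, 0);

    for (std::size_t index = 0; start != std::string::npos; ++index) {
        // The token ends at the next delimiter or at the end of the input.
        std::size_t end = in.find_first_of(delimiters, start);

        if (index == position) {
            out = (end == std::string::npos) ? in.substr(start)
                                             : in.substr(start, end - start);
            return true;
        }
        if (end == std::string::npos)
            break;                       // that was the last token

        // Skip the delimiter run before the next token.
        start = in.find_first_not_of(delimiters, end);
    }
    return false;
}
```

Inside evaluate(), the input would first be copied into a std::string using its data pointer and length; a false result would map to returning null.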