Solved

Delete the spaces in a string

Posted on 2009-07-03
31
363 Views
Last Modified: 2013-12-14
Experts,

I want to delete all the white space that is within a string so that:

"A VERY LONG STRING       ISNT       IT?" becomes

"AVERYLONGSTRINGISNTIT?

I am using string objects - is there a member function or quick way to do this?
0
Comment
Question by:simondopickup
  • 11
  • 8
  • 6
  • +3
31 Comments
 
LVL 40

Expert Comment

by:evilrix
ID: 24772531
0
 
LVL 53

Accepted Solution

by:
Infinity08 earned 500 total points
ID: 24772557
>> I am using string objects

If you mean STL strings, then here's a way :
#include <iostream>

#include <string>

#include <algorithm>

#include <cctype>
 

int main(void) {

  std::string str = "A VERY LONG STRING       ISNT       IT?";

  str.resize(remove_if(str.begin(), str.end(), isspace) - str.begin());

  std::cout << str << std::endl;

  return 0;

}

Open in new window

0
 
LVL 19

Expert Comment

by:alb66
ID: 24772579
In .net you can use String.Replace( " ", "" );

In MFC you can use CString::Replace( " ", "" );
0
 
LVL 86

Expert Comment

by:jkr
ID: 24772584
Or, in a nutshell:
void trim(string& s) {
 

  size_t pos = 0;

  while(string::npos != (pos = s.find(' ',pos))) s.erase(pos);

}
 

// ...
 

string s = "A VERY LONG STRING       ISNT       IT?";
 

trim(s);

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24772660
Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times.
0
 

Author Comment

by:simondopickup
ID: 24772686
jkr - your solution is removing everything after the first space is encountered.

i..e A VER LONG STRING becomes

"A"

0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24772696
>>>> while(string::npos != (pos = s.find(' ',pos))) s.erase(pos);
should be turned to

   while(string::npos != (pos = s.find(' ',pos))) s.erase(pos++);
0
 
LVL 86

Expert Comment

by:jkr
ID: 24772698
Ooops, soor, should have been
void trim(string& s) {

 

  size_t pos = 0;

  while(string::npos != (pos = s.find(' ',pos))) s.erase(pos,1);

}

Open in new window

0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24772705
>>>> since it needs to shift over the data many times.
It is not so bad if the pos was incremented correctly ;-)
0
 

Author Comment

by:simondopickup
ID: 24772709
Infinity yours has worked a treat - thanks.
0
 

Author Closing Comment

by:simondopickup
ID: 31599568
Many thanks, neat solution.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24772720
>>>> while(string::npos != (pos = s.find(' ',pos))) s.erase(pos,1);
Yes, it is !!!
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24772741
>> It is not so bad if the pos was incremented correctly ;-)

It is, because for every space that is erased, the rest of the string is shifted one position to the left. So, if x spaces are erased, you can expect to shift something like (x*(len/2)) characters to the left.

The remove_if approach only shifts each character once, so you can expect to shift something like (len-x) characters to the left.

And the bigger x gets (ie. the more spaces the string contains), the bigger the difference in performance will get.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24772751
Just as a follow-up... the main reason I'd probably go for find/erase over replace_if  is that it should be more efficient than replace_if since it can erase whole blocks of white space at a time rather than 1 char at a time; however, replace_if is simpler. Of course, YMMV.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24772762
>> Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times

Interesting :)

I don't have to time test this now but I might try it later when I get home.
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 53

Expert Comment

by:Infinity08
ID: 24772955
I got this on one of the machines here :
>more remover.cpp

#include <iostream>

#include <string>

#include <algorithm>

#include <cctype>

#include <ctime>
 

int main(void) {

  size_t i = 0;

  size_t cnt = 1000000;

  clock_t start = 0;

  clock_t end = 0;
 

  start = clock();

  while (i++ < cnt) {

    std::string str = "A VERY LONG STRING       ISNT       IT?";

    str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());

  }

  end = clock();

  std::cout << "remove_if method : " << (end - start) << std::endl;
 

  i = 0;
 

  start = clock();

  while (i++ < cnt) {

    std::string str = "A VERY LONG STRING       ISNT       IT?";

    size_t pos = 0;

    while(std::string::npos != (pos = str.find(' ', pos))) str.erase(pos, 1);

  }

  end = clock();
 

  std::cout << "find/erase method : " << (end - start) << std::endl;
 

  return 0;

}
 

>CC remover.cpp

>./a.out

remove_if method : 2136000

find/erase method : 3909000

>./a.out

remove_if method : 2132000

find/erase method : 3912000

>./a.out

remove_if method : 2142000

find/erase method : 3874000

>./a.out

remove_if method : 2135000

find/erase method : 3901000

>./a.out

remove_if method : 2136000

find/erase method : 3902000

>

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773002
Interestingly, the difference is smaller using MingW (which uses the Windows runtime) :

        remove_if method : 1953
        find/erase method : 2437

and it seems the isspace implementation is not very optimal on Windows. I had to write my own to make it perform well :

        remove_if method : 1437
        find/erase method : 2531

using :

        bool isSpace(char c) { return (c == ' '); }
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773221
you are erasing one item at a time. I meant find start and end of space block and erase it all in one go. I would expect that to be quicker.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24773385
>> Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times

You are right. I checked the code in STL of VC9 compiler (VS 2008) and the final function was (I made the classes and variables more readable) performing a forward loop where it tests the condition for each single char only once and doing the assignment within the same loop.

I wonder why the

     *next++ = *first;

works. If next would be a class object the post operator++ would happen before assignment. So either next is a simple char pointer or they fixed the issue that post operator on class objects works differently than on POD types.
// TEMPLATE FUNCTION remove_copy_if

template<class ForwardIterator, class OutputIterator, class Condition> inline

OutputIterator _Remove_copy_if(ForwardIterator first, ForwardIterator last, OutputIterator next, Condition cond, _Range_checked_iterator_tag)

{	

    // copy omitting each element satisfying cond

    for (; first != last; ++first)

        if (!cond(*first))

            *next++ = *first;

        return (next);

}

Open in new window

0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24773493
I added a plain C loop to the test prog of Infinity. Now we have (XP, Core2Duo 2.1)

remove_if method :   609 AVERYLONGSTRINGISNTIT?
find/erase method : 1906 AVERYLONGSTRINGISNTIT?
plain loop method :  250 AVERYLONGSTRINGISNTIT?
#include <iostream>

#include <string>

#include <algorithm>

#include <cctype>

#include <ctime>

 

int main(void) {

  size_t i = 0;

  size_t cnt = 1000000;

  clock_t start = 0;

  clock_t end = 0;

  std::string str;

 

  start = clock();   

  while (i++ < cnt) {

    str = "A VERY LONG STRING       ISNT       IT?";

    str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());

  }

  end = clock();

  std::cout << "remove_if method : " << (end - start) << " " << str << std::endl;

 

  i = 0;

 

  start = clock();

  while (i++ < cnt) {

    str = "A VERY LONG STRING       ISNT       IT?";

    size_t pos = 0;

    while(std::string::npos != (pos = str.find(' ', pos))) str.erase(pos, 1);

  }

  end = clock();

 

  std::cout << "find/erase method : " << (end - start) << " " << str << std::endl;

 

  i = 0;

 

  start = clock();

  while (i++ < cnt) {

    str = "A VERY LONG STRING       ISNT       IT?";

    size_t len = str.size();

    char * p = &str[0];

    char * n = &str[0];
 

    for (;*p;p++) 

       if (*p != ' ') *n++ = *p; 

       else           --len;

    str.resize(len); 

  }

  end = clock();

 

  std::cout << "plain loop method : " << (end - start) << " " << str << std::endl;

 

  return 0;

}

Open in new window

0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773538
I had very little time so I coded this up very quickly (using I8's original code as a starting point), so I'm not promising this is bug free...

remove_if method  : 11150000
find/erase method : 4070000
remove_if method  : 11290000
find/erase method : 4030000
remove_if method  : 11300000
find/erase method : 4050000

#include <iostream>

#include <string>

#include <algorithm>

#include <cctype>

#include <ctime>

 

int main(void) {

   for(int x = 0 ; x < 3 ; ++x)

   {

     size_t i = 0;

     size_t cnt = 1000000;

     clock_t start = 0;

     clock_t end = 0;

    

     start = clock();

     while (i++ < cnt) {

       std::string str = "A VERY LONG STRING       ISNT       IT?";

       str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());

     }

     end = clock();

     std::cout << "remove_if method  : " << (end - start) << std::endl;

    

     i = 0;

    

     start = clock();

     while (i++ < cnt) {

  	   std::string str = "A VERY LONG STRING       ISNT       IT?";

   

	   size_t spos = 0;

	   size_t epos = 0;

   

	   for(;;)

	   {

		   spos = str.find(' ', spos);

		   if(spos == std::string::npos) { break; }

		   epos = str.find_first_not_of(' ', spos);

		   if(epos == std::string::npos) { epos = str.size(); }

		   str.erase(str.begin() + spos, str.begin() + epos);

	   }

     }

     end = clock();

    

     std::cout << "find/erase method : " << (end - start) << std::endl;

    

   }

}

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773560
>> you are erasing one item at a time. I meant find start and end of space block and erase it all in one go. I would expect that to be quicker.

True, that will most likely make it faster, but it depends on the input data. If the amount of large space blocks in the input data is big, maybe you'd get an advantage out of it ... Worth testing ... maybe later :)


>> If next would be a class object the post operator++ would happen before assignment.

The operator++ is called before the assignment, but that doesn't change the fact that the un-incremented value will be used for the assignment, assuming that the overloaded operator++ is properly implemented (ie. it returns the un-incremented value).

Post increment operators guarantee that the increment happens AFTER the original value has been read, no matter if the object is a POD type or if it's a class with an overloaded post increment operator (assuming it's properly implemented).

So, first the value of 'next' is retrieved (to be used for the assignment), then 'next' is incremented. The exact moment the post increment operator is called is not important, but it is guaranteed that the original value of 'next' will be used for the assignment.


>> I added a plain C loop

That's not surprising heh ... given the overhead of the STL algorithm, and the fact that C++ compilers still have a long way to go to optimize STL functionality.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773574
>> but it depends on the input data
Yes, it very much does as. It also depends on how your string is implemented. Since string isn't bound to have contiguous memory internally erase can, in theory, be implemented very efficiently. Of course, as I said before, YMMV and as always when considering optimizing code you should profile the options using representative data.
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773604
>> and the fact that C++ compilers still have a long way to go to optimize STL functionality.

Actually, if you want to be fair, you should compile the code with optimization enabled, to give the compiler a fighting chance ;)


>> Worth testing ... maybe later :)

And I see you already did lol. But try the same with -O2 enabled, and you should see the difference shrink ... Then try it with a different string, like :

  "a s t r i n g w i t h a l o t o f s p a c e s"

(with and without optimization enabled)
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773630
>>   "a s t r i n g w i t h a l o t o f s p a c e s"

If my intuition doesn't play tricks on me, you should find that the remove_if method is a lot more consistent. No matter what input string you give it, it should not make a lot of difference in the time it takes to execute it (the more spaces the string contains, the faster it will be actually lol).
The find/erase method however should give widely varying results, depending on the input string used. Which is in my view another reason to choose the remove_if method.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773659
>> Actually, if you want to be fair, you should compile the code with optimization enabled
Muhahaha, told you I was rushing and I had that nasty feeling in my gut I'd forgotten to do something. :)
See below O2 and O3, both faster given the simple test case we've been using.

On balance, with lots of small white space they are likely to be similar in performance (replace_if will probably have an edge) but if there are blocks of white spaceI'd expect, on balance, erase to be faster. Like I said though, and I'm sure you agree, profiling is the right way to go if you need the most optimal solution.

I like Qs like this, they are simple yet thought provoking :)
$ g++ -O3 x.cpp 

$ ./a.out 

remove_if method  : 3050000

find/erase method : 2860000

remove_if method  : 3040000

find/erase method : 2830000

remove_if method  : 3120000

find/erase method : 2840000
 

$ g++ -O2 x.cpp 

$ ./a.out 

remove_if method  : 3120000

find/erase method : 3160000 <--- arrrg, I lost :)

remove_if method  : 2950000

find/erase method : 2820000

remove_if method  : 2950000

find/erase method : 2840000

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773738
>> >> Actually, if you want to be fair, you should compile the code with optimization enabled

Note that this was more intended for Alex's hand-optimized solution. It uses the same algorithm as remove_if, but remove_if has the "disadvantage" of being a generic algorithm, so it needs compiler optimization to stand a chance against Alex's very tight loop. The remove_if will likely still be slower, even with full optimization, but the difference should be smaller.


>> Like I said though, and I'm sure you agree, profiling is the right way to go if you need the most optimal solution.

Of course. That's why I focused on the algorithmically best solution, and in this case, there's nothing more straightforward than iterating over the string once, and shifting non-space characters one at a time, which is exactly what remove_if does.

The find/erase approach with space-block detection is still algorithmically inferior (although obviously less so than the standard find/erase approach).

After that it's up to the compiler to try to optimize it, and up to the profiler to point out if there's an unforeseen  problem.
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773744
>> I like Qs like this, they are simple yet thought provoking :)

Yep, me too :)
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773771
>> Note that this was more intended for Alex's hand-optimized solution
I know but it prompted me to realized I'd forgotten to. :)
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24773839
>>>> so it needs compiler optimization to stand a chance against Alex's very tight loop.

Yes, and I didn't use isspace what is unfair also:

With

       if (!isspace(*p)) *n++ = *p;

times turn to

remove_if method :   609 AVERYLONGSTRINGISNTIT?
find/erase method : 1891 AVERYLONGSTRINGISNTIT?
plain loop method :   421 AVERYLONGSTRINGISNTIT?

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773867
That isspace function behaves quite weirdly under Windows. I noticed a big increase in performance by using my own function :

        bool isSpace(char c) { return (c == ' '); }

And I only noticed that on Windows platforms ... Strange.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
How to programmatically differentiate between C and Java 10 188
Writing a parser for java language 4 71
Grammars for C C++ and java 1 113
best sources to up-to-date in C++? 8 70
IntroductionThis article is the second in a three part article series on the Visual Studio 2008 Debugger.  It provides tips in setting and using breakpoints. If not familiar with this debugger, you can find a basic introduction in the EE article loc…
C++ Properties One feature missing from standard C++ that you will find in many other Object Oriented Programming languages is something called a Property (http://www.experts-exchange.com/Programming/Languages/CPP/A_3912-Object-Properties-in-C.ht…
The viewer will learn how to use and create keystrokes in Netbeans IDE 8.0 for Windows.
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.

914 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now