We help IT Professionals succeed at work.

Check out our new AWS podcast with Certified Expert, Phil Phillips! Listen to "How to Execute a Seamless AWS Migration" on EE or on your favorite podcast platform. Listen Now

x

Delete the spaces in a string

Medium Priority
387 Views
Last Modified: 2013-12-14
Experts,

I want to delete all the white space that is within a string so that:

"A VERY LONG STRING       ISNT       IT?" becomes

"AVERYLONGSTRINGISNTIT?

I am using string objects - is there a member function or quick way to do this?
Comment
Watch Question

evilrixSenior Software Engineer (Avast)
CERTIFIED EXPERT

Commented:
CERTIFIED EXPERT
Top Expert 2009
Commented:
Unlock this solution with a free trial preview.
(No credit card required)
Get Preview

Commented:
In .net you can use String.Replace( " ", "" );

In MFC you can use CString::Replace( " ", "" );
jkr
CERTIFIED EXPERT
Top Expert 2012

Commented:
Or, in a nutshell:
void trim(string& s) {
 
  size_t pos = 0;
  while(string::npos != (pos = s.find(' ',pos))) s.erase(pos);
}
 
// ...
 
string s = "A VERY LONG STRING       ISNT       IT?";
 
trim(s);

Open in new window

CERTIFIED EXPERT
Top Expert 2009

Commented:
Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times.

Author

Commented:
jkr - your solution is removing everything after the first space is encountered.

i..e A VER LONG STRING becomes

"A"

>>>> while(string::npos != (pos = s.find(' ',pos))) s.erase(pos);
should be turned to

   while(string::npos != (pos = s.find(' ',pos))) s.erase(pos++);
jkr
CERTIFIED EXPERT
Top Expert 2012

Commented:
Ooops, soor, should have been
void trim(string& s) {
 
  size_t pos = 0;
  while(string::npos != (pos = s.find(' ',pos))) s.erase(pos,1);
}

Open in new window

>>>> since it needs to shift over the data many times.
It is not so bad if the pos was incremented correctly ;-)

Author

Commented:
Infinity yours has worked a treat - thanks.

Author

Commented:
Many thanks, neat solution.
>>>> while(string::npos != (pos = s.find(' ',pos))) s.erase(pos,1);
Yes, it is !!!
CERTIFIED EXPERT
Top Expert 2009

Commented:
>> It is not so bad if the pos was incremented correctly ;-)

It is, because for every space that is erased, the rest of the string is shifted one position to the left. So, if x spaces are erased, you can expect to shift something like (x*(len/2)) characters to the left.

The remove_if approach only shifts each character once, so you can expect to shift something like (len-x) characters to the left.

And the bigger x gets (ie. the more spaces the string contains), the bigger the difference in performance will get.
evilrixSenior Software Engineer (Avast)
CERTIFIED EXPERT

Commented:
Just as a follow-up... the main reason I'd probably go for find/erase over replace_if  is that it should be more efficient than replace_if since it can erase whole blocks of white space at a time rather than 1 char at a time; however, replace_if is simpler. Of course, YMMV.
evilrixSenior Software Engineer (Avast)
CERTIFIED EXPERT

Commented:
>> Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times

Interesting :)

I don't have to time test this now but I might try it later when I get home.
CERTIFIED EXPERT
Top Expert 2009

Commented:
I got this on one of the machines here :
>more remover.cpp
#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
#include <ctime>
 
int main(void) {
  size_t i = 0;
  size_t cnt = 1000000;
  clock_t start = 0;
  clock_t end = 0;
 
  start = clock();
  while (i++ < cnt) {
    std::string str = "A VERY LONG STRING       ISNT       IT?";
    str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());
  }
  end = clock();
  std::cout << "remove_if method : " << (end - start) << std::endl;
 
  i = 0;
 
  start = clock();
  while (i++ < cnt) {
    std::string str = "A VERY LONG STRING       ISNT       IT?";
    size_t pos = 0;
    while(std::string::npos != (pos = str.find(' ', pos))) str.erase(pos, 1);
  }
  end = clock();
 
  std::cout << "find/erase method : " << (end - start) << std::endl;
 
  return 0;
}
 
>CC remover.cpp
>./a.out
remove_if method : 2136000
find/erase method : 3909000
>./a.out
remove_if method : 2132000
find/erase method : 3912000
>./a.out
remove_if method : 2142000
find/erase method : 3874000
>./a.out
remove_if method : 2135000
find/erase method : 3901000
>./a.out
remove_if method : 2136000
find/erase method : 3902000
>

Open in new window

CERTIFIED EXPERT
Top Expert 2009

Commented:
Interestingly, the difference is smaller using MingW (which uses the Windows runtime) :

        remove_if method : 1953
        find/erase method : 2437

and it seems the isspace implementation is not very optimal on Windows. I had to write my own to make it perform well :

        remove_if method : 1437
        find/erase method : 2531

using :

        bool isSpace(char c) { return (c == ' '); }
evilrixSenior Software Engineer (Avast)
CERTIFIED EXPERT

Commented:
you are erasing one item at a time. I meant find start and end of space block and erase it all in one go. I would expect that to be quicker.
>> Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times

You are right. I checked the code in STL of VC9 compiler (VS 2008) and the final function was (I made the classes and variables more readable) performing a forward loop where it tests the condition for each single char only once and doing the assignment within the same loop.

I wonder why the

     *next++ = *first;

works. If next would be a class object the post operator++ would happen before assignment. So either next is a simple char pointer or they fixed the issue that post operator on class objects works differently than on POD types.
// TEMPLATE FUNCTION remove_copy_if
template<class ForwardIterator, class OutputIterator, class Condition> inline
OutputIterator _Remove_copy_if(ForwardIterator first, ForwardIterator last, OutputIterator next, Condition cond, _Range_checked_iterator_tag)
{	
    // copy omitting each element satisfying cond
    for (; first != last; ++first)
        if (!cond(*first))
            *next++ = *first;
        return (next);
}

Open in new window

I added a plain C loop to the test prog of Infinity. Now we have (XP, Core2Duo 2.1)

remove_if method :   609 AVERYLONGSTRINGISNTIT?
find/erase method : 1906 AVERYLONGSTRINGISNTIT?
plain loop method :  250 AVERYLONGSTRINGISNTIT?
#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
#include <ctime>
 
int main(void) {
  size_t i = 0;
  size_t cnt = 1000000;
  clock_t start = 0;
  clock_t end = 0;
  std::string str;
 
  start = clock();   
  while (i++ < cnt) {
    str = "A VERY LONG STRING       ISNT       IT?";
    str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());
  }
  end = clock();
  std::cout << "remove_if method : " << (end - start) << " " << str << std::endl;
 
  i = 0;
 
  start = clock();
  while (i++ < cnt) {
    str = "A VERY LONG STRING       ISNT       IT?";
    size_t pos = 0;
    while(std::string::npos != (pos = str.find(' ', pos))) str.erase(pos, 1);
  }
  end = clock();
 
  std::cout << "find/erase method : " << (end - start) << " " << str << std::endl;
 
  i = 0;
 
  start = clock();
  while (i++ < cnt) {
    str = "A VERY LONG STRING       ISNT       IT?";
    size_t len = str.size();
    char * p = &str[0];
    char * n = &str[0];
 
    for (;*p;p++) 
       if (*p != ' ') *n++ = *p; 
       else           --len;
    str.resize(len); 
  }
  end = clock();
 
  std::cout << "plain loop method : " << (end - start) << " " << str << std::endl;
 
  return 0;
}

Open in new window

evilrixSenior Software Engineer (Avast)
CERTIFIED EXPERT

Commented:
I had very little time so I coded this up very quickly (using I8's original code as a starting point), so I'm not promising this is bug free...

remove_if method  : 11150000
find/erase method : 4070000
remove_if method  : 11290000
find/erase method : 4030000
remove_if method  : 11300000
find/erase method : 4050000

#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
#include <ctime>
 
int main(void) {
   for(int x = 0 ; x < 3 ; ++x)
   {
     size_t i = 0;
     size_t cnt = 1000000;
     clock_t start = 0;
     clock_t end = 0;
    
     start = clock();
     while (i++ < cnt) {
       std::string str = "A VERY LONG STRING       ISNT       IT?";
       str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());
     }
     end = clock();
     std::cout << "remove_if method  : " << (end - start) << std::endl;
    
     i = 0;
    
     start = clock();
     while (i++ < cnt) {
  	   std::string str = "A VERY LONG STRING       ISNT       IT?";
   
	   size_t spos = 0;
	   size_t epos = 0;
   
	   for(;;)
	   {
		   spos = str.find(' ', spos);
		   if(spos == std::string::npos) { break; }
		   epos = str.find_first_not_of(' ', spos);
		   if(epos == std::string::npos) { epos = str.size(); }
		   str.erase(str.begin() + spos, str.begin() + epos);
	   }
     }
     end = clock();
    
     std::cout << "find/erase method : " << (end - start) << std::endl;
    
   }
}

Open in new window

CERTIFIED EXPERT
Top Expert 2009

Commented:
>> you are erasing one item at a time. I meant find start and end of space block and erase it all in one go. I would expect that to be quicker.

True, that will most likely make it faster, but it depends on the input data. If the amount of large space blocks in the input data is big, maybe you'd get an advantage out of it ... Worth testing ... maybe later :)


>> If next would be a class object the post operator++ would happen before assignment.

The operator++ is called before the assignment, but that doesn't change the fact that the un-incremented value will be used for the assignment, assuming that the overloaded operator++ is properly implemented (ie. it returns the un-incremented value).

Post increment operators guarantee that the increment happens AFTER the original value has been read, no matter if the object is a POD type or if it's a class with an overloaded post increment operator (assuming it's properly implemented).

So, first the value of 'next' is retrieved (to be used for the assignment), then 'next' is incremented. The exact moment the post increment operator is called is not important, but it is guaranteed that the original value of 'next' will be used for the assignment.


>> I added a plain C loop

That's not surprising heh ... given the overhead of the STL algorithm, and the fact that C++ compilers still have a long way to go to optimize STL functionality.
evilrixSenior Software Engineer (Avast)
CERTIFIED EXPERT

Commented:
>> but it depends on the input data
Yes, it very much does as. It also depends on how your string is implemented. Since string isn't bound to have contiguous memory internally erase can, in theory, be implemented very efficiently. Of course, as I said before, YMMV and as always when considering optimizing code you should profile the options using representative data.
CERTIFIED EXPERT
Top Expert 2009

Commented:
>> and the fact that C++ compilers still have a long way to go to optimize STL functionality.

Actually, if you want to be fair, you should compile the code with optimization enabled, to give the compiler a fighting chance ;)


>> Worth testing ... maybe later :)

And I see you already did lol. But try the same with -O2 enabled, and you should see the difference shrink ... Then try it with a different string, like :

  "a s t r i n g w i t h a l o t o f s p a c e s"

(with and without optimization enabled)
CERTIFIED EXPERT
Top Expert 2009

Commented:
>>   "a s t r i n g w i t h a l o t o f s p a c e s"

If my intuition doesn't play tricks on me, you should find that the remove_if method is a lot more consistent. No matter what input string you give it, it should not make a lot of difference in the time it takes to execute it (the more spaces the string contains, the faster it will be actually lol).
The find/erase method however should give widely varying results, depending on the input string used. Which is in my view another reason to choose the remove_if method.
evilrixSenior Software Engineer (Avast)
CERTIFIED EXPERT

Commented:
>> Actually, if you want to be fair, you should compile the code with optimization enabled
Muhahaha, told you I was rushing and I had that nasty feeling in my gut I'd forgotten to do something. :)
See below O2 and O3, both faster given the simple test case we've been using.

On balance, with lots of small white space they are likely to be similar in performance (replace_if will probably have an edge) but if there are blocks of white spaceI'd expect, on balance, erase to be faster. Like I said though, and I'm sure you agree, profiling is the right way to go if you need the most optimal solution.

I like Qs like this, they are simple yet thought provoking :)
$ g++ -O3 x.cpp 
$ ./a.out 
remove_if method  : 3050000
find/erase method : 2860000
remove_if method  : 3040000
find/erase method : 2830000
remove_if method  : 3120000
find/erase method : 2840000
 
$ g++ -O2 x.cpp 
$ ./a.out 
remove_if method  : 3120000
find/erase method : 3160000 <--- arrrg, I lost :)
remove_if method  : 2950000
find/erase method : 2820000
remove_if method  : 2950000
find/erase method : 2840000

Open in new window

CERTIFIED EXPERT
Top Expert 2009

Commented:
>> >> Actually, if you want to be fair, you should compile the code with optimization enabled

Note that this was more intended for Alex's hand-optimized solution. It uses the same algorithm as remove_if, but remove_if has the "disadvantage" of being a generic algorithm, so it needs compiler optimization to stand a chance against Alex's very tight loop. The remove_if will likely still be slower, even with full optimization, but the difference should be smaller.


>> Like I said though, and I'm sure you agree, profiling is the right way to go if you need the most optimal solution.

Of course. That's why I focused on the algorithmically best solution, and in this case, there's nothing more straightforward than iterating over the string once, and shifting non-space characters one at a time, which is exactly what remove_if does.

The find/erase approach with space-block detection is still algorithmically inferior (although obviously less so than the standard find/erase approach).

After that it's up to the compiler to try to optimize it, and up to the profiler to point out if there's an unforeseen  problem.
CERTIFIED EXPERT
Top Expert 2009

Commented:
>> I like Qs like this, they are simple yet thought provoking :)

Yep, me too :)
evilrixSenior Software Engineer (Avast)
CERTIFIED EXPERT

Commented:
>> Note that this was more intended for Alex's hand-optimized solution
I know but it prompted me to realized I'd forgotten to. :)
>>>> so it needs compiler optimization to stand a chance against Alex's very tight loop.

Yes, and I didn't use isspace what is unfair also:

With

       if (!isspace(*p)) *n++ = *p;

times turn to

remove_if method :   609 AVERYLONGSTRINGISNTIT?
find/erase method : 1891 AVERYLONGSTRINGISNTIT?
plain loop method :   421 AVERYLONGSTRINGISNTIT?

CERTIFIED EXPERT
Top Expert 2009

Commented:
That isspace function behaves quite weirdly under Windows. I noticed a big increase in performance by using my own function :

        bool isSpace(char c) { return (c == ' '); }

And I only noticed that on Windows platforms ... Strange.
Unlock the solution to this question.
Thanks for using Experts Exchange.

Please provide your email to receive a free trial preview!

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.