Delete the spaces in a string

Experts,

I want to delete all the white space that is within a string so that:

"A VERY LONG STRING       ISNT       IT?" becomes

"AVERYLONGSTRINGISNTIT?

I am using string objects - is there a member function or quick way to do this?
simondopickupAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

evilrixSenior Software Engineer (Avast)Commented:
0
Infinity08Commented:
>> I am using string objects

If you mean STL strings, then here's a way :
#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
 
int main(void) {
  std::string str = "A VERY LONG STRING       ISNT       IT?";
  str.resize(remove_if(str.begin(), str.end(), isspace) - str.begin());
  std::cout << str << std::endl;
  return 0;
}

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
alb66Commented:
In .net you can use String.Replace( " ", "" );

In MFC you can use CString::Replace( " ", "" );
0
Introduction to Web Design

Develop a strong foundation and understanding of web design by learning HTML, CSS, and additional tools to help you develop your own website.

jkrCommented:
Or, in a nutshell:
void trim(string& s) {
 
  size_t pos = 0;
  while(string::npos != (pos = s.find(' ',pos))) s.erase(pos);
}
 
// ...
 
string s = "A VERY LONG STRING       ISNT       IT?";
 
trim(s);

Open in new window

0
Infinity08Commented:
Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times.
0
simondopickupAuthor Commented:
jkr - your solution is removing everything after the first space is encountered.

i..e A VER LONG STRING becomes

"A"

0
itsmeandnobodyelseCommented:
>>>> while(string::npos != (pos = s.find(' ',pos))) s.erase(pos);
should be turned to

   while(string::npos != (pos = s.find(' ',pos))) s.erase(pos++);
0
jkrCommented:
Ooops, soor, should have been
void trim(string& s) {
 
  size_t pos = 0;
  while(string::npos != (pos = s.find(' ',pos))) s.erase(pos,1);
}

Open in new window

0
itsmeandnobodyelseCommented:
>>>> since it needs to shift over the data many times.
It is not so bad if the pos was incremented correctly ;-)
0
simondopickupAuthor Commented:
Infinity yours has worked a treat - thanks.
0
simondopickupAuthor Commented:
Many thanks, neat solution.
0
itsmeandnobodyelseCommented:
>>>> while(string::npos != (pos = s.find(' ',pos))) s.erase(pos,1);
Yes, it is !!!
0
Infinity08Commented:
>> It is not so bad if the pos was incremented correctly ;-)

It is, because for every space that is erased, the rest of the string is shifted one position to the left. So, if x spaces are erased, you can expect to shift something like (x*(len/2)) characters to the left.

The remove_if approach only shifts each character once, so you can expect to shift something like (len-x) characters to the left.

And the bigger x gets (ie. the more spaces the string contains), the bigger the difference in performance will get.
0
evilrixSenior Software Engineer (Avast)Commented:
Just as a follow-up... the main reason I'd probably go for find/erase over replace_if  is that it should be more efficient than replace_if since it can erase whole blocks of white space at a time rather than 1 char at a time; however, replace_if is simpler. Of course, YMMV.
0
evilrixSenior Software Engineer (Avast)Commented:
>> Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times

Interesting :)

I don't have to time test this now but I might try it later when I get home.
0
Infinity08Commented:
I got this on one of the machines here :
>more remover.cpp
#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
#include <ctime>
 
int main(void) {
  size_t i = 0;
  size_t cnt = 1000000;
  clock_t start = 0;
  clock_t end = 0;
 
  start = clock();
  while (i++ < cnt) {
    std::string str = "A VERY LONG STRING       ISNT       IT?";
    str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());
  }
  end = clock();
  std::cout << "remove_if method : " << (end - start) << std::endl;
 
  i = 0;
 
  start = clock();
  while (i++ < cnt) {
    std::string str = "A VERY LONG STRING       ISNT       IT?";
    size_t pos = 0;
    while(std::string::npos != (pos = str.find(' ', pos))) str.erase(pos, 1);
  }
  end = clock();
 
  std::cout << "find/erase method : " << (end - start) << std::endl;
 
  return 0;
}
 
>CC remover.cpp
>./a.out
remove_if method : 2136000
find/erase method : 3909000
>./a.out
remove_if method : 2132000
find/erase method : 3912000
>./a.out
remove_if method : 2142000
find/erase method : 3874000
>./a.out
remove_if method : 2135000
find/erase method : 3901000
>./a.out
remove_if method : 2136000
find/erase method : 3902000
>

Open in new window

0
Infinity08Commented:
Interestingly, the difference is smaller using MingW (which uses the Windows runtime) :

        remove_if method : 1953
        find/erase method : 2437

and it seems the isspace implementation is not very optimal on Windows. I had to write my own to make it perform well :

        remove_if method : 1437
        find/erase method : 2531

using :

        bool isSpace(char c) { return (c == ' '); }
0
evilrixSenior Software Engineer (Avast)Commented:
you are erasing one item at a time. I meant find start and end of space block and erase it all in one go. I would expect that to be quicker.
0
itsmeandnobodyelseCommented:
>> Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times

You are right. I checked the code in STL of VC9 compiler (VS 2008) and the final function was (I made the classes and variables more readable) performing a forward loop where it tests the condition for each single char only once and doing the assignment within the same loop.

I wonder why the

     *next++ = *first;

works. If next would be a class object the post operator++ would happen before assignment. So either next is a simple char pointer or they fixed the issue that post operator on class objects works differently than on POD types.
// TEMPLATE FUNCTION remove_copy_if
template<class ForwardIterator, class OutputIterator, class Condition> inline
OutputIterator _Remove_copy_if(ForwardIterator first, ForwardIterator last, OutputIterator next, Condition cond, _Range_checked_iterator_tag)
{	
    // copy omitting each element satisfying cond
    for (; first != last; ++first)
        if (!cond(*first))
            *next++ = *first;
        return (next);
}

Open in new window

0
itsmeandnobodyelseCommented:
I added a plain C loop to the test prog of Infinity. Now we have (XP, Core2Duo 2.1)

remove_if method :   609 AVERYLONGSTRINGISNTIT?
find/erase method : 1906 AVERYLONGSTRINGISNTIT?
plain loop method :  250 AVERYLONGSTRINGISNTIT?
#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
#include <ctime>
 
int main(void) {
  size_t i = 0;
  size_t cnt = 1000000;
  clock_t start = 0;
  clock_t end = 0;
  std::string str;
 
  start = clock();   
  while (i++ < cnt) {
    str = "A VERY LONG STRING       ISNT       IT?";
    str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());
  }
  end = clock();
  std::cout << "remove_if method : " << (end - start) << " " << str << std::endl;
 
  i = 0;
 
  start = clock();
  while (i++ < cnt) {
    str = "A VERY LONG STRING       ISNT       IT?";
    size_t pos = 0;
    while(std::string::npos != (pos = str.find(' ', pos))) str.erase(pos, 1);
  }
  end = clock();
 
  std::cout << "find/erase method : " << (end - start) << " " << str << std::endl;
 
  i = 0;
 
  start = clock();
  while (i++ < cnt) {
    str = "A VERY LONG STRING       ISNT       IT?";
    size_t len = str.size();
    char * p = &str[0];
    char * n = &str[0];
 
    for (;*p;p++) 
       if (*p != ' ') *n++ = *p; 
       else           --len;
    str.resize(len); 
  }
  end = clock();
 
  std::cout << "plain loop method : " << (end - start) << " " << str << std::endl;
 
  return 0;
}

Open in new window

0
evilrixSenior Software Engineer (Avast)Commented:
I had very little time so I coded this up very quickly (using I8's original code as a starting point), so I'm not promising this is bug free...

remove_if method  : 11150000
find/erase method : 4070000
remove_if method  : 11290000
find/erase method : 4030000
remove_if method  : 11300000
find/erase method : 4050000

#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
#include <ctime>
 
int main(void) {
   for(int x = 0 ; x < 3 ; ++x)
   {
     size_t i = 0;
     size_t cnt = 1000000;
     clock_t start = 0;
     clock_t end = 0;
    
     start = clock();
     while (i++ < cnt) {
       std::string str = "A VERY LONG STRING       ISNT       IT?";
       str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());
     }
     end = clock();
     std::cout << "remove_if method  : " << (end - start) << std::endl;
    
     i = 0;
    
     start = clock();
     while (i++ < cnt) {
  	   std::string str = "A VERY LONG STRING       ISNT       IT?";
   
	   size_t spos = 0;
	   size_t epos = 0;
   
	   for(;;)
	   {
		   spos = str.find(' ', spos);
		   if(spos == std::string::npos) { break; }
		   epos = str.find_first_not_of(' ', spos);
		   if(epos == std::string::npos) { epos = str.size(); }
		   str.erase(str.begin() + spos, str.begin() + epos);
	   }
     }
     end = clock();
    
     std::cout << "find/erase method : " << (end - start) << std::endl;
    
   }
}

Open in new window

0
Infinity08Commented:
>> you are erasing one item at a time. I meant find start and end of space block and erase it all in one go. I would expect that to be quicker.

True, that will most likely make it faster, but it depends on the input data. If the amount of large space blocks in the input data is big, maybe you'd get an advantage out of it ... Worth testing ... maybe later :)


>> If next would be a class object the post operator++ would happen before assignment.

The operator++ is called before the assignment, but that doesn't change the fact that the un-incremented value will be used for the assignment, assuming that the overloaded operator++ is properly implemented (ie. it returns the un-incremented value).

Post increment operators guarantee that the increment happens AFTER the original value has been read, no matter if the object is a POD type or if it's a class with an overloaded post increment operator (assuming it's properly implemented).

So, first the value of 'next' is retrieved (to be used for the assignment), then 'next' is incremented. The exact moment the post increment operator is called is not important, but it is guaranteed that the original value of 'next' will be used for the assignment.


>> I added a plain C loop

That's not surprising heh ... given the overhead of the STL algorithm, and the fact that C++ compilers still have a long way to go to optimize STL functionality.
0
evilrixSenior Software Engineer (Avast)Commented:
>> but it depends on the input data
Yes, it very much does as. It also depends on how your string is implemented. Since string isn't bound to have contiguous memory internally erase can, in theory, be implemented very efficiently. Of course, as I said before, YMMV and as always when considering optimizing code you should profile the options using representative data.
0
Infinity08Commented:
>> and the fact that C++ compilers still have a long way to go to optimize STL functionality.

Actually, if you want to be fair, you should compile the code with optimization enabled, to give the compiler a fighting chance ;)


>> Worth testing ... maybe later :)

And I see you already did lol. But try the same with -O2 enabled, and you should see the difference shrink ... Then try it with a different string, like :

  "a s t r i n g w i t h a l o t o f s p a c e s"

(with and without optimization enabled)
0
Infinity08Commented:
>>   "a s t r i n g w i t h a l o t o f s p a c e s"

If my intuition doesn't play tricks on me, you should find that the remove_if method is a lot more consistent. No matter what input string you give it, it should not make a lot of difference in the time it takes to execute it (the more spaces the string contains, the faster it will be actually lol).
The find/erase method however should give widely varying results, depending on the input string used. Which is in my view another reason to choose the remove_if method.
0
evilrixSenior Software Engineer (Avast)Commented:
>> Actually, if you want to be fair, you should compile the code with optimization enabled
Muhahaha, told you I was rushing and I had that nasty feeling in my gut I'd forgotten to do something. :)
See below O2 and O3, both faster given the simple test case we've been using.

On balance, with lots of small white space they are likely to be similar in performance (replace_if will probably have an edge) but if there are blocks of white spaceI'd expect, on balance, erase to be faster. Like I said though, and I'm sure you agree, profiling is the right way to go if you need the most optimal solution.

I like Qs like this, they are simple yet thought provoking :)
$ g++ -O3 x.cpp 
$ ./a.out 
remove_if method  : 3050000
find/erase method : 2860000
remove_if method  : 3040000
find/erase method : 2830000
remove_if method  : 3120000
find/erase method : 2840000
 
$ g++ -O2 x.cpp 
$ ./a.out 
remove_if method  : 3120000
find/erase method : 3160000 <--- arrrg, I lost :)
remove_if method  : 2950000
find/erase method : 2820000
remove_if method  : 2950000
find/erase method : 2840000

Open in new window

0
Infinity08Commented:
>> >> Actually, if you want to be fair, you should compile the code with optimization enabled

Note that this was more intended for Alex's hand-optimized solution. It uses the same algorithm as remove_if, but remove_if has the "disadvantage" of being a generic algorithm, so it needs compiler optimization to stand a chance against Alex's very tight loop. The remove_if will likely still be slower, even with full optimization, but the difference should be smaller.


>> Like I said though, and I'm sure you agree, profiling is the right way to go if you need the most optimal solution.

Of course. That's why I focused on the algorithmically best solution, and in this case, there's nothing more straightforward than iterating over the string once, and shifting non-space characters one at a time, which is exactly what remove_if does.

The find/erase approach with space-block detection is still algorithmically inferior (although obviously less so than the standard find/erase approach).

After that it's up to the compiler to try to optimize it, and up to the profiler to point out if there's an unforeseen  problem.
0
Infinity08Commented:
>> I like Qs like this, they are simple yet thought provoking :)

Yep, me too :)
0
evilrixSenior Software Engineer (Avast)Commented:
>> Note that this was more intended for Alex's hand-optimized solution
I know but it prompted me to realized I'd forgotten to. :)
0
itsmeandnobodyelseCommented:
>>>> so it needs compiler optimization to stand a chance against Alex's very tight loop.

Yes, and I didn't use isspace what is unfair also:

With

       if (!isspace(*p)) *n++ = *p;

times turn to

remove_if method :   609 AVERYLONGSTRINGISNTIT?
find/erase method : 1891 AVERYLONGSTRINGISNTIT?
plain loop method :   421 AVERYLONGSTRINGISNTIT?

0
Infinity08Commented:
That isspace function behaves quite weirdly under Windows. I noticed a big increase in performance by using my own function :

        bool isSpace(char c) { return (c == ' '); }

And I only noticed that on Windows platforms ... Strange.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Editors IDEs

From novice to tech pro — start learning today.