Solved

Delete the spaces in a string

Posted on 2009-07-03
31
362 Views
Last Modified: 2013-12-14
Experts,

I want to delete all the white space that is within a string so that:

"A VERY LONG STRING       ISNT       IT?" becomes

"AVERYLONGSTRINGISNTIT?

I am using string objects - is there a member function or quick way to do this?
0
Comment
Question by:simondopickup
  • 11
  • 8
  • 6
  • +3
31 Comments
 
LVL 40

Expert Comment

by:evilrix
ID: 24772531
0
 
LVL 53

Accepted Solution

by:
Infinity08 earned 500 total points
ID: 24772557
>> I am using string objects

If you mean STL strings, then here's a way :
#include <iostream>

#include <string>

#include <algorithm>

#include <cctype>
 

int main(void) {

  std::string str = "A VERY LONG STRING       ISNT       IT?";

  str.resize(remove_if(str.begin(), str.end(), isspace) - str.begin());

  std::cout << str << std::endl;

  return 0;

}

Open in new window

0
 
LVL 19

Expert Comment

by:alb66
ID: 24772579
In .net you can use String.Replace( " ", "" );

In MFC you can use CString::Replace( " ", "" );
0
 
LVL 86

Expert Comment

by:jkr
ID: 24772584
Or, in a nutshell:
void trim(string& s) {
 

  size_t pos = 0;

  while(string::npos != (pos = s.find(' ',pos))) s.erase(pos);

}
 

// ...
 

string s = "A VERY LONG STRING       ISNT       IT?";
 

trim(s);

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24772660
Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times.
0
 

Author Comment

by:simondopickup
ID: 24772686
jkr - your solution is removing everything after the first space is encountered.

i..e A VER LONG STRING becomes

"A"

0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24772696
>>>> while(string::npos != (pos = s.find(' ',pos))) s.erase(pos);
should be turned to

   while(string::npos != (pos = s.find(' ',pos))) s.erase(pos++);
0
 
LVL 86

Expert Comment

by:jkr
ID: 24772698
Ooops, soor, should have been
void trim(string& s) {

 

  size_t pos = 0;

  while(string::npos != (pos = s.find(' ',pos))) s.erase(pos,1);

}

Open in new window

0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24772705
>>>> since it needs to shift over the data many times.
It is not so bad if the pos was incremented correctly ;-)
0
 

Author Comment

by:simondopickup
ID: 24772709
Infinity yours has worked a treat - thanks.
0
 

Author Closing Comment

by:simondopickup
ID: 31599568
Many thanks, neat solution.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24772720
>>>> while(string::npos != (pos = s.find(' ',pos))) s.erase(pos,1);
Yes, it is !!!
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24772741
>> It is not so bad if the pos was incremented correctly ;-)

It is, because for every space that is erased, the rest of the string is shifted one position to the left. So, if x spaces are erased, you can expect to shift something like (x*(len/2)) characters to the left.

The remove_if approach only shifts each character once, so you can expect to shift something like (len-x) characters to the left.

And the bigger x gets (ie. the more spaces the string contains), the bigger the difference in performance will get.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24772751
Just as a follow-up... the main reason I'd probably go for find/erase over replace_if  is that it should be more efficient than replace_if since it can erase whole blocks of white space at a time rather than 1 char at a time; however, replace_if is simpler. Of course, YMMV.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24772762
>> Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times

Interesting :)

I don't have to time test this now but I might try it later when I get home.
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 53

Expert Comment

by:Infinity08
ID: 24772955
I got this on one of the machines here :
>more remover.cpp

#include <iostream>

#include <string>

#include <algorithm>

#include <cctype>

#include <ctime>
 

int main(void) {

  size_t i = 0;

  size_t cnt = 1000000;

  clock_t start = 0;

  clock_t end = 0;
 

  start = clock();

  while (i++ < cnt) {

    std::string str = "A VERY LONG STRING       ISNT       IT?";

    str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());

  }

  end = clock();

  std::cout << "remove_if method : " << (end - start) << std::endl;
 

  i = 0;
 

  start = clock();

  while (i++ < cnt) {

    std::string str = "A VERY LONG STRING       ISNT       IT?";

    size_t pos = 0;

    while(std::string::npos != (pos = str.find(' ', pos))) str.erase(pos, 1);

  }

  end = clock();
 

  std::cout << "find/erase method : " << (end - start) << std::endl;
 

  return 0;

}
 

>CC remover.cpp

>./a.out

remove_if method : 2136000

find/erase method : 3909000

>./a.out

remove_if method : 2132000

find/erase method : 3912000

>./a.out

remove_if method : 2142000

find/erase method : 3874000

>./a.out

remove_if method : 2135000

find/erase method : 3901000

>./a.out

remove_if method : 2136000

find/erase method : 3902000

>

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773002
Interestingly, the difference is smaller using MingW (which uses the Windows runtime) :

        remove_if method : 1953
        find/erase method : 2437

and it seems the isspace implementation is not very optimal on Windows. I had to write my own to make it perform well :

        remove_if method : 1437
        find/erase method : 2531

using :

        bool isSpace(char c) { return (c == ' '); }
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773221
you are erasing one item at a time. I meant find start and end of space block and erase it all in one go. I would expect that to be quicker.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24773385
>> Note that the find/erase method is not as performant as the remove_if method I posted, since it needs to shift over the data many times

You are right. I checked the code in STL of VC9 compiler (VS 2008) and the final function was (I made the classes and variables more readable) performing a forward loop where it tests the condition for each single char only once and doing the assignment within the same loop.

I wonder why the

     *next++ = *first;

works. If next would be a class object the post operator++ would happen before assignment. So either next is a simple char pointer or they fixed the issue that post operator on class objects works differently than on POD types.
// TEMPLATE FUNCTION remove_copy_if

template<class ForwardIterator, class OutputIterator, class Condition> inline

OutputIterator _Remove_copy_if(ForwardIterator first, ForwardIterator last, OutputIterator next, Condition cond, _Range_checked_iterator_tag)

{	

    // copy omitting each element satisfying cond

    for (; first != last; ++first)

        if (!cond(*first))

            *next++ = *first;

        return (next);

}

Open in new window

0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24773493
I added a plain C loop to the test prog of Infinity. Now we have (XP, Core2Duo 2.1)

remove_if method :   609 AVERYLONGSTRINGISNTIT?
find/erase method : 1906 AVERYLONGSTRINGISNTIT?
plain loop method :  250 AVERYLONGSTRINGISNTIT?
#include <iostream>

#include <string>

#include <algorithm>

#include <cctype>

#include <ctime>

 

int main(void) {

  size_t i = 0;

  size_t cnt = 1000000;

  clock_t start = 0;

  clock_t end = 0;

  std::string str;

 

  start = clock();   

  while (i++ < cnt) {

    str = "A VERY LONG STRING       ISNT       IT?";

    str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());

  }

  end = clock();

  std::cout << "remove_if method : " << (end - start) << " " << str << std::endl;

 

  i = 0;

 

  start = clock();

  while (i++ < cnt) {

    str = "A VERY LONG STRING       ISNT       IT?";

    size_t pos = 0;

    while(std::string::npos != (pos = str.find(' ', pos))) str.erase(pos, 1);

  }

  end = clock();

 

  std::cout << "find/erase method : " << (end - start) << " " << str << std::endl;

 

  i = 0;

 

  start = clock();

  while (i++ < cnt) {

    str = "A VERY LONG STRING       ISNT       IT?";

    size_t len = str.size();

    char * p = &str[0];

    char * n = &str[0];
 

    for (;*p;p++) 

       if (*p != ' ') *n++ = *p; 

       else           --len;

    str.resize(len); 

  }

  end = clock();

 

  std::cout << "plain loop method : " << (end - start) << " " << str << std::endl;

 

  return 0;

}

Open in new window

0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773538
I had very little time so I coded this up very quickly (using I8's original code as a starting point), so I'm not promising this is bug free...

remove_if method  : 11150000
find/erase method : 4070000
remove_if method  : 11290000
find/erase method : 4030000
remove_if method  : 11300000
find/erase method : 4050000

#include <iostream>

#include <string>

#include <algorithm>

#include <cctype>

#include <ctime>

 

int main(void) {

   for(int x = 0 ; x < 3 ; ++x)

   {

     size_t i = 0;

     size_t cnt = 1000000;

     clock_t start = 0;

     clock_t end = 0;

    

     start = clock();

     while (i++ < cnt) {

       std::string str = "A VERY LONG STRING       ISNT       IT?";

       str.resize(std::remove_if(str.begin(), str.end(), isspace) - str.begin());

     }

     end = clock();

     std::cout << "remove_if method  : " << (end - start) << std::endl;

    

     i = 0;

    

     start = clock();

     while (i++ < cnt) {

  	   std::string str = "A VERY LONG STRING       ISNT       IT?";

   

	   size_t spos = 0;

	   size_t epos = 0;

   

	   for(;;)

	   {

		   spos = str.find(' ', spos);

		   if(spos == std::string::npos) { break; }

		   epos = str.find_first_not_of(' ', spos);

		   if(epos == std::string::npos) { epos = str.size(); }

		   str.erase(str.begin() + spos, str.begin() + epos);

	   }

     }

     end = clock();

    

     std::cout << "find/erase method : " << (end - start) << std::endl;

    

   }

}

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773560
>> you are erasing one item at a time. I meant find start and end of space block and erase it all in one go. I would expect that to be quicker.

True, that will most likely make it faster, but it depends on the input data. If the amount of large space blocks in the input data is big, maybe you'd get an advantage out of it ... Worth testing ... maybe later :)


>> If next would be a class object the post operator++ would happen before assignment.

The operator++ is called before the assignment, but that doesn't change the fact that the un-incremented value will be used for the assignment, assuming that the overloaded operator++ is properly implemented (ie. it returns the un-incremented value).

Post increment operators guarantee that the increment happens AFTER the original value has been read, no matter if the object is a POD type or if it's a class with an overloaded post increment operator (assuming it's properly implemented).

So, first the value of 'next' is retrieved (to be used for the assignment), then 'next' is incremented. The exact moment the post increment operator is called is not important, but it is guaranteed that the original value of 'next' will be used for the assignment.


>> I added a plain C loop

That's not surprising heh ... given the overhead of the STL algorithm, and the fact that C++ compilers still have a long way to go to optimize STL functionality.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773574
>> but it depends on the input data
Yes, it very much does as. It also depends on how your string is implemented. Since string isn't bound to have contiguous memory internally erase can, in theory, be implemented very efficiently. Of course, as I said before, YMMV and as always when considering optimizing code you should profile the options using representative data.
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773604
>> and the fact that C++ compilers still have a long way to go to optimize STL functionality.

Actually, if you want to be fair, you should compile the code with optimization enabled, to give the compiler a fighting chance ;)


>> Worth testing ... maybe later :)

And I see you already did lol. But try the same with -O2 enabled, and you should see the difference shrink ... Then try it with a different string, like :

  "a s t r i n g w i t h a l o t o f s p a c e s"

(with and without optimization enabled)
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773630
>>   "a s t r i n g w i t h a l o t o f s p a c e s"

If my intuition doesn't play tricks on me, you should find that the remove_if method is a lot more consistent. No matter what input string you give it, it should not make a lot of difference in the time it takes to execute it (the more spaces the string contains, the faster it will be actually lol).
The find/erase method however should give widely varying results, depending on the input string used. Which is in my view another reason to choose the remove_if method.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773659
>> Actually, if you want to be fair, you should compile the code with optimization enabled
Muhahaha, told you I was rushing and I had that nasty feeling in my gut I'd forgotten to do something. :)
See below O2 and O3, both faster given the simple test case we've been using.

On balance, with lots of small white space they are likely to be similar in performance (replace_if will probably have an edge) but if there are blocks of white spaceI'd expect, on balance, erase to be faster. Like I said though, and I'm sure you agree, profiling is the right way to go if you need the most optimal solution.

I like Qs like this, they are simple yet thought provoking :)
$ g++ -O3 x.cpp 

$ ./a.out 

remove_if method  : 3050000

find/erase method : 2860000

remove_if method  : 3040000

find/erase method : 2830000

remove_if method  : 3120000

find/erase method : 2840000
 

$ g++ -O2 x.cpp 

$ ./a.out 

remove_if method  : 3120000

find/erase method : 3160000 <--- arrrg, I lost :)

remove_if method  : 2950000

find/erase method : 2820000

remove_if method  : 2950000

find/erase method : 2840000

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773738
>> >> Actually, if you want to be fair, you should compile the code with optimization enabled

Note that this was more intended for Alex's hand-optimized solution. It uses the same algorithm as remove_if, but remove_if has the "disadvantage" of being a generic algorithm, so it needs compiler optimization to stand a chance against Alex's very tight loop. The remove_if will likely still be slower, even with full optimization, but the difference should be smaller.


>> Like I said though, and I'm sure you agree, profiling is the right way to go if you need the most optimal solution.

Of course. That's why I focused on the algorithmically best solution, and in this case, there's nothing more straightforward than iterating over the string once, and shifting non-space characters one at a time, which is exactly what remove_if does.

The find/erase approach with space-block detection is still algorithmically inferior (although obviously less so than the standard find/erase approach).

After that it's up to the compiler to try to optimize it, and up to the profiler to point out if there's an unforeseen  problem.
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773744
>> I like Qs like this, they are simple yet thought provoking :)

Yep, me too :)
0
 
LVL 40

Expert Comment

by:evilrix
ID: 24773771
>> Note that this was more intended for Alex's hand-optimized solution
I know but it prompted me to realized I'd forgotten to. :)
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 24773839
>>>> so it needs compiler optimization to stand a chance against Alex's very tight loop.

Yes, and I didn't use isspace what is unfair also:

With

       if (!isspace(*p)) *n++ = *p;

times turn to

remove_if method :   609 AVERYLONGSTRINGISNTIT?
find/erase method : 1891 AVERYLONGSTRINGISNTIT?
plain loop method :   421 AVERYLONGSTRINGISNTIT?

0
 
LVL 53

Expert Comment

by:Infinity08
ID: 24773867
That isspace function behaves quite weirdly under Windows. I noticed a big increase in performance by using my own function :

        bool isSpace(char c) { return (c == ' '); }

And I only noticed that on Windows platforms ... Strange.
0

Featured Post

Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

Join & Write a Comment

Suggested Solutions

Templates For Beginners Or How To Encourage The Compiler To Work For You Introduction This tutorial is targeted at the reader who is, perhaps, familiar with the basics of C++ but would prefer a little slower introduction to the more ad…
Jaspersoft Studio is a plugin for Eclipse that lets you create reports from a datasource.  In this article, we'll go over creating a report from a default template and setting up a datasource that connects to your database.
The viewer will learn how to use and create keystrokes in Netbeans IDE 8.0 for Windows.
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now