Solved

URGENT:  Comparing STL string in lower case and trimmed

Posted on 2003-10-29
Last Modified: 2010-07-27
I need the most optimized way to convert and compare two strings based on lower case and no outside whitespace.

for example,

I want "THIS " and "this" and "tHis    " to all match, where those are all strings.

What is the most optimized way to do this?  I am going to be using these to access objects in a hash_map, so I want them to key correctly.  Would it be fastest to just write this into the

Question by:jjacksn
16 Comments
 
LVL 1

Expert Comment

by:c567591
ID: 9643614
If you can convert them into standard C strings, you can use stricmp for a case-insensitive comparison.
Or, if you must use a case-sensitive comparison, convert them to upper or lower case before comparing.
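
Note that stricmp is a compiler extension rather than standard C or C++ (POSIX systems offer strcasecmp instead). A minimal portable sketch of the same idea for std::string might look like the following; the name equals_ignore_case is hypothetical and not from this thread, and it only handles single-byte characters as classified by the current C locale.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): compares two std::strings
// without regard to case, one character at a time, without copying.
bool equals_ignore_case(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // Cast to unsigned char first: passing a negative char value to
        // std::tolower is undefined behavior.
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}
```

This does no trimming, so "this " and "this" still compare unequal; trimming has to happen separately.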

As far as a trim, here is the trim that I always use:
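//This comes from Bob Stout's Snippets & is Public Domain
//Needs <ctype.h> for isspace. Besides trimming both ends, it also
//collapses each run of whitespace inside the string to one space.
char *trim (char *str)
{
      char *ibuf, *obuf;

      if (str)
      {
            for (ibuf = obuf = str; *ibuf; )
            {
                  //skip a run of whitespace (the cast avoids undefined behavior on negative chars)
                  while (*ibuf && (isspace ((unsigned char)*ibuf)))
                        ibuf++;
                  //separate words with one space, but never start with one
                  if (*ibuf && (obuf != str))
                        *(obuf++) = ' ';
                  //copy the next word
                  while (*ibuf && (!isspace ((unsigned char)*ibuf)))
                        *(obuf++) = *(ibuf++);
            }
            *obuf = 0x00; //NUL;
      }
      return (str);
}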

0
 
LVL 2

Expert Comment

by:federal102
ID: 9643734
Or, if you want to stick with std::string

First strip off the spaces...

string s = "   this              ";
while ( s.size() && *(s.begin()) == ' ' )
      s.erase(s.begin());
while ( s.size() && *(s.end() - 1) == ' ' )
      s.erase(s.end()-1);

Then convert to uppercase....

string::iterator i = s.begin();
while(i != s.end())
{
      *i = toupper(*i);
      ++i;
}

Then compare.
0
 
LVL 11

Expert Comment

by:bcladd
ID: 9643751
federal102 has the best suggestion. If you are going to hash them all, you want to normalize the representation BEFORE you hash so that they hash to the same place.

-bcl
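
A rough sketch of "normalize before you hash", under some assumptions: std::unordered_map (C++11) stands in for the hash_map mentioned in the question, and the names normalize_key, CountMap and record are hypothetical. The key is trimmed and lowercased once, and only the normalized form ever reaches the container.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: trims outer whitespace and lowercases the rest.
std::string normalize_key(const std::string& raw)
{
    const char* const ws = " \t\r\n";
    const std::size_t first = raw.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();          // empty or all whitespace
    const std::size_t last = raw.find_last_not_of(ws);
    std::string key = raw.substr(first, last - first + 1);
    for (std::size_t i = 0; i < key.size(); ++i)
        key[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(key[i])));
    return key;
}

// Assumption: std::unordered_map stands in for the hash_map in the question.
typedef std::unordered_map<std::string, int> CountMap;

// Counts how often each normalized key has been seen.
void record(CountMap& counts, const std::string& raw)
{
    ++counts[normalize_key(raw)];
}
```

With this, calling record with "THIS ", "this" and "tHis    " should all land on the single key "this".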
0
 
LVL 1

Expert Comment

by:fsign21
ID: 9644066
What do you think about the following solution (I am posting it, because I think that string.erase() may be an expensive operation)?

const static string null_string("");

string stripWhiteSpaceAndConvertToUpper(const string& arg)
{
  int szlen = arg.length();
  if( szlen == 0) return null_string;

  int end = (szlen-1);
  int start = 0;
  const char* val = arg.c_str();  // 'register' dropped: deprecated in C++11, removed in C++17

  while(start <=end && (val[start] == ' ' || val[start] == '\n'))
    start++;
  if(start > end) return null_string; //only white spaces

  while(end && (val[end] == ' ' || val[end] == '\n'))
    end--;

  //result string
  string tmp(val+start, end-start+1);  
 
  //convert toupper
  string::iterator i = tmp.begin();
  while(i != tmp.end())
  {
     *i = toupper(*i);
     ++i;
  }

  return tmp;
}
0
 
LVL 1

Expert Comment

by:c567591
ID: 9645142
Don't forget to include \t as whitespace.
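
One way to avoid listing whitespace characters by hand is to let std::isspace decide. The following is only a sketch; trim_copy is a hypothetical name, and what counts as whitespace depends on the current C locale (in the default "C" locale: space, \t, \n, \v, \f and \r).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of s without leading or trailing
// whitespace, as classified by std::isspace.
std::string trim_copy(const std::string& s)
{
    std::string::const_iterator first = s.begin();
    std::string::const_iterator last = s.end();
    // Advance past leading whitespace.
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
    // Back up over trailing whitespace.
    while (last != first && std::isspace(static_cast<unsigned char>(*(last - 1))))
        --last;
    return std::string(first, last);
}
```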
0
 
LVL 3

Accepted Solution

by:
EarthQuaker earned 250 total points
ID: 9645224
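// Requires <algorithm>, <cctype> and <string>.
string& make_all_lowercase(string &s)
{
    // The cast selects the <cctype> tolower; a bare &tolower can be
    // ambiguous when the <locale> overload is also visible.
    transform(s.begin(), s.end(), s.begin(), static_cast<int (*)(int)>(&tolower));
    return s;
}

string& trim_left(string &s)
{
    // Check for empty first so an all-space string does not run off the end.
    while(!s.empty() && s[0]==' ')
    {
       s.erase(s.begin());
    }
    return s;
}

string& trim_right(string &s)
{
    // Without the empty check, s.size()-1 wraps around on an empty string.
    while(!s.empty() && s[s.size()-1]==' ')
    {
       s.erase(--s.end());
    }
    return s;
}

string& trim(string &s)
{
    return trim_right(trim_left(s));
}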
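
On the speed concern: every erase(s.begin()) shifts the rest of the string down by one character, so stripping many leading spaces one at a time repeats that work. A possible variant that erases each end in a single call is sketched below; trim_in_place is a hypothetical name, and like the functions above it only treats ' ' as whitespace.

```cpp
#include <string>

// Hypothetical variant: trims spaces in place with two erase calls.
std::string& trim_in_place(std::string& s)
{
    // Drop everything after the last non-space character. If there is no
    // such character, find_last_not_of returns npos, npos + 1 wraps to 0,
    // and the whole string is erased.
    s.erase(s.find_last_not_of(' ') + 1);
    // Drop the leading run of spaces. On an empty string this removes
    // nothing, because erase clamps the count to the string length.
    s.erase(0, s.find_first_not_of(' '));
    return s;
}
```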
0
 
LVL 5

Author Comment

by:jjacksn
ID: 9645744
EarthQuaker's code appears the cleanest, but is it just as fast as the other methods?  And what does the notation (string &s) mean?
0
 
LVL 1

Expert Comment

by:c567591
ID: 9645805
string &s declares a reference to the string, so the function works on the caller's string directly instead of receiving a copy of it.
It should be pretty quick. What are you doing that needs to be as fast as possible?
0
 
LVL 5

Author Comment

by:jjacksn
ID: 9645834
I'm scanning all of the emails in a folder and storing info by each sender and recipient of each e-mail.  I'm going to have to access the hashtable once per recipient of each email, thus I'm doing this operation roughly 100K times and I would like the operation to be as quick as possible.  So (string *s) and (string &s) are the same thing?
0
 
LVL 1

Expert Comment

by:c567591
ID: 9646231
A mathematical hash would be far faster than a string comparison for a large # of strings.
If you computed a unique hash for each email addy and stored it in a balanced binary tree, you would have a very fast search with a maximum of 17 comparisons for fewer than 131,072 email addresses. (2^17=131072).  This plus the time to create the hash would be far less than thousands of string comparisons.
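
The figure of 17 can be checked with a few lines of arithmetic. This is only an illustration of the halving argument, not code from the thread, and max_probes is a hypothetical name: each comparison in a balanced tree (or a binary search over sorted data) leaves at most half of the remaining keys.

```cpp
#include <cstddef>

// Hypothetical helper: worst-case number of comparisons a binary search
// makes over n sorted items, which is also the depth of a perfectly
// balanced binary tree holding n keys.
std::size_t max_probes(std::size_t n)
{
    std::size_t probes = 0;
    while (n > 0)
    {
        ++probes;   // one comparison...
        n /= 2;     // ...leaves at most half of the items
    }
    return probes;
}
```

For 131,071 keys (one fewer than 2^17) this yields 17, and the count only grows by one each time the number of keys doubles.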
0
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9646699
>so (string *s) and (string &s) are the same thing?  

Kind of.

In a function parameter, & declares a reference.

References are like aliases: they cannot be uninitialized, and they are as fast as pointers.

Basically, use a reference when you can and a pointer when you must.
Here is a sample to illustrate my point :

int n = 1;
int *p = &n;
*p = 2; // n == 2
int &r = n;
r = 3; // n == 3

0
 
LVL 5

Author Comment

by:jjacksn
ID: 9647483
c567591,

I'm using a hash_map keyed on e-mail addresses.  But, obviously, the e-mail addresses must all be in the same form when hashed.  Thus the need for the conversion.

EarthQuaker.  

So to make sure I am understanding correctly:
DoSomething1(string *p)
{
  modify p
}
and
DoSomething2(string &p)
{
 modify p
}

are equivalent in the sense that
*p = "test"
DoSomething1(p)
and
DoSomething2(*p)
would both mutate p in the calling function?

In other words, func1(obj a) and func(obj &a) both take the same args, but func1 would only modify its own copy on its own call stack, not its caller's?
0
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9650390
A reference is like an alias. I think you got it, but nothing can describe it more successfully than running this program:

#include <iostream>
#include <string>

using namespace std;

void foo(string &s)
{
    s="reference";    // use s normally
    cout << "in foo ref - s at address :" << &s << endl;
}
void foo(string *s)
{
    *s="pointer";  // need to dereference to reach the caller's string
    cout << "in foo pointer - s at address :" << s << endl;
}

int main()
{
     string s = "hello world";
     cout << s << endl;
     foo(s);
     cout << s << endl;
     foo(&s);
     cout << s << endl;
     return 0;
}
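
To cover the by-value case from the question as well, here is a small side-by-side sketch. The three function names are hypothetical and exist only for illustration.

```cpp
#include <string>

// Hypothetical functions for illustration only.

// By value: s is a private copy, so the caller's string is untouched.
void set_by_value(std::string s)      { s = "value"; }

// By reference: s is another name for the caller's string.
void set_by_reference(std::string& s) { s = "reference"; }

// By pointer: s holds the address of the caller's string and must be
// dereferenced to change it.
void set_by_pointer(std::string* s)   { *s = "pointer"; }
```

Calling set_by_value(str) leaves str as it was, while set_by_reference(str) and set_by_pointer(&str) both change the caller's str.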
0
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9650404
http://www.parashift.com/c++-faq-lite/references.html

This should answer all your questions about references.
0
 
LVL 11

Expert Comment

by:bcladd
ID: 9688912
Happened to be reading the C/C++ Users Journal, October 2003, and Koenig and Moo's column focused on removing spaces from a string. They present and compare three different algorithms. I like the way they write (loved _Accelerated C++_) and it happens to be on point here.

-bcl
0
