Solved

URGENT:  Comparing STL string in lower case and trimmed

Posted on 2003-10-29
Last Modified: 2010-07-27
I need the most optimized way to convert and compare two strings, ignoring case and any leading or trailing whitespace.

for example,

I want "THIS " and "this" and "tHis    " to all match, where those are all strings.

What is the most optimized way to do this?  I am going to be using these to access objects in a hash_map, so I want them to key correctly.  Would it be fastest to just write this into the

Question by:jjacksn
 
LVL 1

Expert Comment

by:c567591
Comment Utility
If you can convert them into standard C strings,
You can use stricmp to case insensitive compare.
Or if you must use a case sensitive comparison, convert them to upper or lower case before comparing.
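
Note that stricmp is a compiler extension rather than standard C or C++ (POSIX systems offer strcasecmp instead). A minimal portable sketch that works directly on std::string could look like the following; the name equals_ignore_case is made up for illustration, and it only handles single-byte characters as classified by the current C locale.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when a and b are equal ignoring case.
bool equals_ignore_case(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // Cast to unsigned char first: passing a negative char to tolower is undefined.
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb)
            return false;
    }
    return true;
}
```

This does no trimming, so "this " and "this" still compare unequal until the whitespace is removed.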

As far as a trim, here is the trim that I always use:
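//This comes from Bob Stout's Snippets & is Public Domain
//Requires <ctype.h>. Besides trimming both ends, it also collapses each
//run of inner whitespace to a single space.
char *trim (char *str)
{
      char *ibuf, *obuf;

      if (str)
      {
            for (ibuf = obuf = str; *ibuf; )
            {
                  //cast to unsigned char: isspace is undefined for negative char values
                  while (*ibuf && (isspace ((unsigned char)*ibuf)))
                        ibuf++;
                  if (*ibuf && (obuf != str))
                        *(obuf++) = ' ';
                  while (*ibuf && (!isspace ((unsigned char)*ibuf)))
                        *(obuf++) = *(ibuf++);
            }
            *obuf = 0x00; //NUL;
      }
      return (str);
}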

0
 
LVL 2

Expert Comment

by:federal102
Comment Utility
Or, if you want to stick with std::string

First strip off the spaces...

string s = "   this              ";
while ( s.size() && *(s.begin()) == ' ' )
      s.erase(s.begin());
while ( s.size() && *(s.end() - 1) == ' ' )
      s.erase(s.end()-1);

Then convert to uppercase....

string::iterator i = s.begin();
while(i != s.end())
{
      // cast so toupper never receives a negative char value
      *i = (char)toupper((unsigned char)*i);
      ++i;
}

Then compare.
0
 
LVL 11

Expert Comment

by:bcladd
Comment Utility
federal102 has the best suggestion. If you are going to hash them all, you want to normalize the representation BEFORE you hash so that they hash to the same place.
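
A rough sketch of "normalize before you hash" is below. It assumes std::unordered_map (the C++11 standard container) in place of the pre-standard hash_map from the question; normalize_key, record, lookup and CountMap are invented names for illustration only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: trims outer spaces/tabs/newlines and lowercases the rest.
std::string normalize_key(const std::string& raw)
{
    const char* const ws = " \t\r\n";
    const std::size_t first = raw.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();                // empty or all whitespace
    const std::size_t last = raw.find_last_not_of(ws);
    std::string key = raw.substr(first, last - first + 1);
    for (std::size_t i = 0; i < key.size(); ++i)
        key[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(key[i])));
    return key;
}

// Hypothetical usage: count how often each address appears.
typedef std::unordered_map<std::string, int> CountMap;

void record(CountMap& counts, const std::string& raw_address)
{
    ++counts[normalize_key(raw_address)];    // normalize on insert...
}

int lookup(const CountMap& counts, const std::string& raw_address)
{
    // ...and normalize again on lookup, so both sides hash the same form.
    CountMap::const_iterator it = counts.find(normalize_key(raw_address));
    return it == counts.end() ? 0 : it->second;
}
```

The point of the design is that the map itself never sees an unnormalized key, so the default string hash and equality can be used unchanged.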

-bcl
0
 
LVL 1

Expert Comment

by:fsign21
Comment Utility
What do you think about the following solution (I am posting it, because I think that string.erase() may be an expensive operation)?

const static string null_string("");

string stripWhiteSpaceAndConvertToUpper(const string& arg)
{
  int szlen = (int)arg.length();
  if( szlen == 0) return null_string;

  int end = (szlen-1);
  int start = 0;
  const char* val = arg.c_str(); // 'register' dropped: it is no longer valid as of C++17

  while(start <=end && (val[start] == ' ' || val[start] == '\n'))
    start++;
  if(start > end) return null_string; //only white spaces

  // val[start] is not whitespace here, so this loop stops at start at the latest
  while(end && (val[end] == ' ' || val[end] == '\n'))
    end--;

  //result string
  string tmp(val+start, end-start+1);  
 
  //convert toupper (cast so toupper never sees a negative char value)
  string::iterator i = tmp.begin();
  while(i != tmp.end())
  {
     *i = (char)toupper((unsigned char)*i);
     ++i;
  }

  return tmp;
}
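
If the concern is the cost of calling erase() once per character, another option is to keep trimming in place but call erase at most twice. This is only a sketch under that assumption; trim_in_place is a made-up name, and it treats the same two characters (space and newline) as whitespace.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: trims in place with at most two erase calls.
std::string& trim_in_place(std::string& s)
{
    const char* const ws = " \n";            // same characters checked above
    const std::size_t last = s.find_last_not_of(ws);
    if (last == std::string::npos)
    {
        s.clear();                           // empty or all whitespace
        return s;
    }
    s.erase(last + 1);                       // drop the tail first: no characters move
    s.erase(0, s.find_first_not_of(ws));     // one shift of the remaining characters
    return s;
}
```

Erasing the tail before the head means the single shift caused by the front erase moves as few characters as possible.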
0
 
LVL 1

Expert Comment

by:c567591
Comment Utility
Don't forget to include \t as whitespace.
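
One way to avoid listing whitespace characters by hand is to let isspace decide; in the default "C" locale it accepts space, \t, \n, \v, \f and \r. The sketch below assumes that is the definition of whitespace wanted here; not_space and trim_copy are illustrative names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: true for any character isspace does not classify as whitespace.
bool not_space(char c)
{
    return !std::isspace(static_cast<unsigned char>(c));
}

// Hypothetical helper: returns a copy without leading/trailing whitespace.
std::string trim_copy(const std::string& s)
{
    // First non-whitespace character from the front...
    std::string::const_iterator first = std::find_if(s.begin(), s.end(), not_space);
    // ...and one past the last non-whitespace character, found by searching backwards.
    std::string::const_iterator last = std::find_if(s.rbegin(), s.rend(), not_space).base();
    return first < last ? std::string(first, last) : std::string();
}
```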
0
 
LVL 3

Accepted Solution

by:
EarthQuaker earned 250 total points
Comment Utility
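// Requires <algorithm>, <cctype> and <string>, plus using namespace std.
// Wrapper so tolower always receives an unsigned char value
// (and so transform is not handed an ambiguous overloaded name).
char to_lower_char(char c)
{
    return static_cast<char>(tolower(static_cast<unsigned char>(c)));
}

string& make_all_lowercase(string &s)
{
    transform(s.begin(), s.end(), s.begin(), to_lower_char);
    return s;
}

string& trim_left(string &s)
{
    // Check for empty first so s[0] is never read on an empty string.
    while(!s.empty() && s[0]==' ')
    {
       s.erase(s.begin());
    }
    return s;
}

string& trim_right(string &s)
{
    // Without the empty check, s.size()-1 wraps around on an empty string.
    while(!s.empty() && s[s.size()-1]==' ')
    {
       s.erase(s.end()-1);
    }
    return s;
}

string& trim(string &s)
{
    return trim_right(trim_left(s));
}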
0
 
LVL 5

Author Comment

by:jjacksn
Comment Utility
EarthQuaker's code appears the cleanest, but is it just as fast as the other methods? And what does the notation (string &s) mean?
0
 
LVL 1

Expert Comment

by:c567591
Comment Utility
string &s is a reference to the string (similar in effect to passing a pointer), so the function receives the caller's string itself instead of a copy of it.
It should be pretty quick. What are you doing that needs to be as fast as possible?
0
 
LVL 5

Author Comment

by:jjacksn
Comment Utility
I'm scanning all of the emails in a folder and storing info by each sender and recipient of each e-mail.  I'm going to have to access the hashtable once per recipient of each email, thus I'm doing this operation roughly 100K times and I would like the operation to be as quick as possible.  So (string *s) and (string &s) are the same thing?
0
 
LVL 1

Expert Comment

by:c567591
Comment Utility
A mathematical hash would be far faster than a string comparison for a large # of strings.
If you computed a unique hash for each email addy and stored the hashes in a balanced binary tree, you would have a very fast search: at most 17 comparisons for fewer than 131,072 email addresses (2^17 = 131072).  This plus the time to create the hash would be far less than thousands of string comparisons.
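
The figure of 17 can be checked with a few lines of code. This sketch assumes the tree is balanced (or, equivalently, that the hashes sit in a sorted array searched by binary search); max_probes is a made-up helper, not part of any library.

```cpp
#include <cstddef>

// Hypothetical helper: worst-case number of probes a binary search needs
// over n sorted keys, given that each probe discards at least half of
// the remaining range.
unsigned max_probes(std::size_t n)
{
    unsigned probes = 0;
    while (n > 0)
    {
        n /= 2;        // at most floor(n/2) candidates remain after one probe
        ++probes;
    }
    return probes;
}
```

Under these assumptions max_probes(131071) is 17, matching the "fewer than 131,072" bound above, while exactly 131,072 keys could need an 18th probe.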
0
 
LVL 1

Expert Comment

by:c567591
Comment Utility
0
 
LVL 3

Expert Comment

by:EarthQuaker
Comment Utility
>so (string *s) and (string &s) are the same thing?  

Kind of.

& declares a reference when it appears in a function parameter.

References are like an alias, they cannot be uninitialized and they are as fast as pointers.

Basically, use a reference when you can and a pointer when you must.
Here is a sample to illustrate my point :

int n = 1;
int *p = &n;
*p = 2; // n == 2
int &r = n;
r = 3; // n == 3

0
 
LVL 5

Author Comment

by:jjacksn
Comment Utility
c567591,

I'm using a hash_map keyed on e-mail addresses.  But, obviously, the e-mail addresses must all be in the same form when hashed.  Thus the need for the conversion.

EarthQuaker.  

So to make sure I am understanding correctly:
DoSomething1(string *p)
{
  modify p
}
and
DoSomething2(string &p)
{
 modify p
}

are equivalent in the sense that
*p = "test"
DoSomething1(p)
and
DoSomething2(*p)
would both mutate p in the calling function?

In other words, func1(obj a) and func2(obj &a) both take the same args, but func1 would only modify it on its own call stack, not its caller's?
0
 
LVL 3

Expert Comment

by:EarthQuaker
Comment Utility
A reference is like an alias. I think you got it, but nothing can describe it more successfully than running this program:

#include <iostream>
#include <string>

using namespace std;

void foo(string &s)
{
    s="reference";    // use s normally
    cout << "in foo ref - s at address :" << &s << endl;
}
void foo(string *s)
{
    *s="pointer";  // need to dereference to reach the real s
    cout << "in foo pointer - s at address :" << s << endl;
}

int main()
{
     string s = "hello world";
     cout << s << endl;
     foo(s);
     cout << s << endl;
     foo(&s);
     cout << s << endl;
     return 0;
}
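
The program above covers the reference and pointer cases; the by-value case from the question (func1(obj a)) can be sketched the same way. The three function names below are invented for illustration.

```cpp
#include <string>

// Hypothetical functions mirroring the three parameter styles discussed above.
void by_value(std::string s)      { s = "value"; }       // changes a local copy only
void by_reference(std::string& s) { s = "reference"; }   // changes the caller's string
void by_pointer(std::string* s)   { *s = "pointer"; }    // changes the caller's string via its address
```

Given a string s, calling by_value(s) leaves s untouched in the caller, while by_reference(s) and by_pointer(&s) both overwrite it. The only difference at the call site is that the pointer version needs the & to take the address.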
0
 
LVL 3

Expert Comment

by:EarthQuaker
Comment Utility
http://www.parashift.com/c++-faq-lite/references.html

This should answer all your questions about references.
0
 
LVL 11

Expert Comment

by:bcladd
Comment Utility
I happened to be reading the C/C++ Users Journal, October 2003, and Koenig and Moo's column focused on removing spaces from a string. They present and compare three different algorithms. I like the way they write (loved _Accelerated C++_) and it happens to be on point here.

-bcl
0
