Solved

URGENT:  Comparing STL string in lower case and trimmed

Posted on 2003-10-29
Last Modified: 2010-07-27
I need the most efficient way to convert and compare two strings, ignoring case and any leading or trailing whitespace.

for example,

I want "THIS " and "this" and "tHis    " to all match, where those are all strings.

What is the most optimized way to do this?  I am going to be using these to access objects in a hash_map, so I want them to key correctly.  Would it be fastest to just write this into the

Question by:jjacksn
16 Comments
 
LVL 1

Expert Comment

by:c567591
ID: 9643614
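If you can convert them to standard C strings, you can use stricmp for a case-insensitive compare. Note that stricmp is not standard C: it is _stricmp on MSVC, and POSIX systems provide strcasecmp instead.
Or, if you must use a case-sensitive comparison, convert both strings to upper or lower case before comparing.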

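As for trimming, here is the function I always use. Note that it also collapses each internal run of whitespace to a single space:
#include <ctype.h>

//This comes from Bob Stout's Snippets & is Public Domain
char *trim (char *str)
{
      char *ibuf, *obuf;

      if (str)
      {
            for (ibuf = obuf = str; *ibuf; )
            {
                  /* skip a run of whitespace; the cast avoids undefined
                     behavior when char is signed and the value is negative */
                  while (*ibuf && (isspace ((unsigned char)*ibuf)))
                        ibuf++;
                  /* emit one separating space, except before the first word */
                  if (*ibuf && (obuf != str))
                        *(obuf++) = ' ';
                  /* copy the next word down in place */
                  while (*ibuf && (!isspace ((unsigned char)*ibuf)))
                        *(obuf++) = *(ibuf++);
            }
            *obuf = 0x00; //NUL;
      }
      return (str);
}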
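
For the comparison half of the suggestion above, a portable alternative to stricmp can be sketched in a few lines. This is only an illustration: equals_ignore_case is a hypothetical name, not something from the thread, and it assumes plain ASCII text in the default "C" locale.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): case-insensitive equality
// for ASCII text, written without the non-standard stricmp.
bool equals_ignore_case(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // Cast to unsigned char first: passing a negative char value
        // to tolower is undefined behavior.
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}
```

This compares without modifying either string, so it suits one-off comparisons. It does not help with hashing, where the key itself has to be normalized.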

 
LVL 2

Expert Comment

by:federal102
ID: 9643734
Or, if you want to stick with std::string:

First strip off the spaces...

string s = "   this              ";
while ( s.size() && *(s.begin()) == ' ' )
      s.erase(s.begin());
while ( s.size() && *(s.end() - 1) == ' ' )
      s.erase(s.end()-1);

Then convert to uppercase....

string::iterator i = s.begin();
while(i != s.end())
{
      *i = toupper(static_cast<unsigned char>(*i)); // cast avoids undefined behavior for negative char values
      ++i;
}

Then compare.
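
The steps above could be wrapped into one function, roughly as follows. This is a sketch rather than a definitive version: normalize and same_key are hypothetical names, only the space character is treated as whitespace (as in the snippets above), and guards against empty strings are included.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper combining the two steps above: strip leading and
// trailing spaces, then upper-case what is left. Takes the string by
// value so the caller's copy is untouched.
std::string normalize(std::string s)
{
    while (!s.empty() && s[0] == ' ')
        s.erase(s.begin());
    while (!s.empty() && s[s.size() - 1] == ' ')
        s.erase(s.end() - 1);
    for (std::string::iterator i = s.begin(); i != s.end(); ++i)
        *i = static_cast<char>(std::toupper(static_cast<unsigned char>(*i)));
    return s;
}

// Two strings match when their normalized forms are identical.
bool same_key(const std::string& a, const std::string& b)
{
    return normalize(a) == normalize(b);
}
```

With this, same_key("THIS ", "this") and same_key("this", "tHis    ") both hold, which is the behavior asked for in the question.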
 
LVL 11

Expert Comment

by:bcladd
ID: 9643751
federal102 has the best suggestion. If you are going to hash them all, you want to normalize the representation BEFORE you hash so that they hash to the same place.

-bcl
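
To illustrate normalizing before hashing, here is a minimal sketch. It makes several assumptions: std::unordered_map (the standard container since C++11) stands in for the non-standard hash_map, make_key and MessageCounter are hypothetical names, and only spaces are trimmed.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical key normalizer: trim surrounding spaces, then lower-case.
std::string make_key(const std::string& raw)
{
    const std::string::size_type first = raw.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();              // empty or all spaces
    const std::string::size_type last = raw.find_last_not_of(' ');
    std::string key = raw.substr(first, last - first + 1);
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return key;
}

// Hypothetical wrapper: every insert and lookup goes through make_key,
// so differently written forms of one address share one slot.
class MessageCounter
{
public:
    void record(const std::string& address) { ++counts_[make_key(address)]; }

    int count(const std::string& address) const
    {
        auto it = counts_.find(make_key(address));
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<std::string, int> counts_;
};
```

Keeping the normalization inside the wrapper means no caller can forget it, which is the main risk when raw strings are used as keys directly.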
 
LVL 1

Expert Comment

by:fsign21
ID: 9644066
What do you think about the following solution (I am posting it, because I think that string.erase() may be an expensive operation)?

const static string null_string("");

string stripWhiteSpaceAndConvertToUpper(const string& arg)
{
  int szlen = arg.length();
  if( szlen == 0) return null_string;

  int end = (szlen-1);
  int start = 0;
  const char* val = arg.c_str(); // 'register' dropped: deprecated in C++11, removed in C++17

  while(start <=end && (val[start] == ' ' || val[start] == '\n'))
    start++;
  if(start > end) return null_string; //only white spaces

  while(end && (val[end] == ' ' || val[end] == '\n'))
    end--;

  //result string
  string tmp(val+start, end-start+1);  
 
  //convert toupper
  string::iterator i = tmp.begin();
  while(i != tmp.end())
  {
     *i = toupper(*i);
     ++i;
  }

  return tmp;
}
 
LVL 1

Expert Comment

by:c567591
ID: 9645142
Don't forget to include \t as whitespace.
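
One way to cover tabs and the other whitespace characters without a growing chain of comparisons is to pass a whitespace set to find_first_not_of and find_last_not_of. A minimal sketch, where trim_copy is a hypothetical name and the exact set is an assumption to adjust for the real input:

```cpp
#include <string>

// Hypothetical variant: returns a trimmed copy and leaves the argument
// untouched. The whitespace set is an assumption.
std::string trim_copy(const std::string& s)
{
    static const char* const kWhitespace = " \t\n\r\f\v";
    const std::string::size_type first = s.find_first_not_of(kWhitespace);
    if (first == std::string::npos)
        return std::string();              // empty or whitespace only
    const std::string::size_type last = s.find_last_not_of(kWhitespace);
    return s.substr(first, last - first + 1);
}
```

Like the function posted above, this builds the result with a single copy instead of erasing one character at a time.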
 
LVL 3

Accepted Solution

by:
EarthQuaker earned 250 total points
ID: 9645224
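#include <algorithm>
#include <cctype>
#include <string>

using namespace std;

// tolower on a plain char is undefined for negative values, and passing
// &tolower directly can be ambiguous with the <locale> overload, so wrap it.
char lower_char(char c)
{
    return static_cast<char>(tolower(static_cast<unsigned char>(c)));
}

string& make_all_lowercase(string &s)
{
    transform(s.begin(), s.end(), s.begin(), lower_char);
    return s;
}

string& trim_left(string &s)
{
    // check empty() so an empty or all-space string is handled safely
    while(!s.empty() && s[0]==' ')
    {
       s.erase(s.begin());
    }
    return s;
}

string& trim_right(string &s)
{
    // without the empty() check, s.size()-1 wraps around on an empty string
    while(!s.empty() && s[s.size()-1]==' ')
    {
       s.erase(s.end()-1);
    }
    return s;
}

string& trim(string &s)
{
    return trim_right(trim_left(s));
}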
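
Since the concern in this thread is speed, note that erasing from the front one character at a time shifts the rest of the string on every call. A possible alternative, sketched here under the assumption that only spaces need trimming (trim_in_place is a hypothetical name), removes each end with a single erase:

```cpp
#include <string>

// Sketch: same result as trim() above for spaces, but each end is
// removed with one erase call instead of one call per character.
std::string& trim_in_place(std::string& s)
{
    // find_last_not_of returns npos when s is empty or all spaces;
    // npos + 1 wraps to 0, so erase(0) then clears the whole string.
    s.erase(s.find_last_not_of(' ') + 1);
    // find_first_not_of returns npos for an empty string, and
    // erase(0, npos) on an empty string is a no-op.
    s.erase(0, s.find_first_not_of(' '));
    return s;
}
```

Whether the difference matters for short keys such as e-mail addresses is something only a measurement on the real data can tell.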
 
LVL 5

Author Comment

by:jjacksn
ID: 9645744
EarthQuaker's code appears the cleanest, but is it just as fast as the other methods? And what does the notation (string &s) mean?
 
LVL 1

Expert Comment

by:c567591
ID: 9645805
(string &s) declares s as a reference to the caller's string, so the function works on the original string instead of a copy of the full string.
It should be pretty quick. What are you doing that needs to be as fast as possible?

 
LVL 5

Author Comment

by:jjacksn
ID: 9645834
I'm scanning all of the emails in a folder and storing info by each sender and recipient of each e-mail.  I'm going to have to access the hashtable once per recipient of each email, thus I'm doing this operation roughly 100K times and I would like the operation to be as quick as possible.  So (string *s) and (string &s) are the same thing?
 
LVL 1

Expert Comment

by:c567591
ID: 9646231
A numeric hash would be far faster than a string comparison for a large number of strings.
If you computed a unique hash for each email address and stored the hashes in a balanced binary tree, you would have a very fast search, with a maximum of 17 comparisons for fewer than 131,072 email addresses (2^17 = 131,072).  This, plus the time to create the hash, would be far less than thousands of string comparisons.
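
The figure of 17 can be checked with a small calculation. The sketch below assumes a perfectly balanced binary search tree, where the worst-case lookup visits one node per level; max_comparisons is a hypothetical name.

```cpp
#include <cstddef>

// Worst-case number of comparisons for a lookup in a perfectly balanced
// binary search tree holding n keys: the number of levels, which equals
// the number of times n can be halved before reaching zero.
std::size_t max_comparisons(std::size_t n)
{
    std::size_t steps = 0;
    while (n > 0)
    {
        n /= 2;
        ++steps;
    }
    return steps;
}
```

For 131,071 keys (2^17 - 1) this gives 17, matching the claim above, and one more key adds an 18th level. An unbalanced tree can be much deeper, so the bound only holds if the tree is kept balanced.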
 
LVL 1

Expert Comment

by:c567591
ID: 9646340
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9646699
>so (string *s) and (string &s) are the same thing?  

Kind of.

In a function parameter, & declares a reference.

References are like aliases: they must be initialized when declared, and they are as fast as pointers.

Basically, use a reference when you can and a pointer when you must.
Here is a sample to illustrate my point :

int n = 1;
int *p = &n;
*p = 2; // n == 2
int &r = n;
r = 3; // n == 3

 
LVL 5

Author Comment

by:jjacksn
ID: 9647483
c567591,

I'm using a hash_map keyed on e-mail addresses.  But, obviously, the e-mail addresses must all be in the same form when hashed.  Thus the need for the conversion.

EarthQuaker.  

So to make sure I am understanding correctly:
DoSomething1(string *p)
{
  modify p
}
and
DoSomething2(string &p)
{
 modify p
}

are equivalent in the sense that
*p = "test"
DoSomething1(p)
and
DoSomething2(*p)
would both mutate p in the calling function?

In other words, func1(obj a) and func(obj &a) both take the same args, but func1 would only modify it on its own call stack, not its caller's?
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9650390
A reference is like an alias. I think you got it, but nothing can describe it more successfully than running this program:

#include <iostream>
#include <string>

using namespace std;

void foo(string &s)
{
    s="reference";    // use s normally
    cout << "in foo ref - s at address :" << &s << endl;
}
void foo(string *s)
{
    *s="pointer";  // need to dereference to reach the real s
    cout << "in foo pointer - s at address :" << s << endl;
}

int main()
{
     string s = "hello world";
     cout << s << endl;
     foo(s);
     cout << s << endl;
     foo(&s);
     cout << s << endl;
     return 0;
}
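
The program above covers the reference and pointer cases. The by-value case from the question can be sketched the same way; the three function names below are hypothetical and exist only for illustration.

```cpp
#include <string>

// Hypothetical functions contrasting the three ways of passing a string.
void set_by_value(std::string s)      { s = "changed"; }   // modifies a private copy only
void set_by_reference(std::string& s) { s = "changed"; }   // modifies the caller's object
void set_by_pointer(std::string* s)   { *s = "changed"; }  // same, via explicit dereference
```

After set_by_value(x) the caller's x is unchanged, while set_by_reference(x) and set_by_pointer(&x) both change it. The call syntax for by-value and by-reference is identical, which is why the two are easy to confuse at the call site.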
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9650404
http://www.parashift.com/c++-faq-lite/references.html

This should answer all your questions about references.
 
LVL 11

Expert Comment

by:bcladd
ID: 9688912
I happened to be reading the C/C++ Users Journal, October 2003, and Koenig and Moo's column focused on removing spaces from a string. They present and compare three different algorithms. I like the way they write (loved _Accelerated C++_) and it happens to be on point here.

-bcl