Solved

URGENT: Comparing STL strings in lower case and trimmed

Posted on 2003-10-29
Last Modified: 2010-07-27
I need to write the most optimized way to convert and compare two strings, ignoring case and surrounding whitespace.

for example,

I want "THIS " and "this" and "tHis    " to all match, where those are all strings.

What is the most optimized way to do this?  I am going to be using these to access objects in a hash_map, so I want them to key correctly.  Would it be fastest to just write this into the

Question by:jjacksn
16 Comments
 
LVL 1

Expert Comment

by:c567591
ID: 9643614
If you can convert them into standard C strings, you can use stricmp for a case-insensitive comparison (stricmp is a compiler extension rather than standard C; POSIX systems call it strcasecmp).
Or, if you must use a case-sensitive comparison, convert them to upper or lower case before comparing.

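As far as a trim goes, here is the one that I always use (note that besides trimming both ends it also collapses each internal run of whitespace to a single space):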
//This comes from Bob Stout's Snippets & is Public Domain
char *trim (char *str)
{
      char *ibuf, *obuf;

      if (str)
      {
            for (ibuf = obuf = str; *ibuf; )
            {
                  while (*ibuf && (isspace (*ibuf)))
                        ibuf++;
                  if (*ibuf && (obuf != str))
                        *(obuf++) = ' ';
                  while (*ibuf && (!isspace (*ibuf)))
                        *(obuf++) = *(ibuf++);
            }
            *obuf = 0x00; //NUL;
      }
      return (str);
}

0
 
LVL 2

Expert Comment

by:federal102
ID: 9643734
Or, if you want to stick with std::string

First strip off the spaces...

string s = "   this              ";
while ( s.size() && *(s.begin()) == ' ' )
      s.erase(s.begin());
while ( s.size() && *(s.end() - 1) == ' ' )
      s.erase(s.end()-1);

Then convert to uppercase....

string::iterator i = s.begin();
while(i != s.end())
{
      *i = toupper(*i);
      ++i;
}

Then compare.
0
 
LVL 11

Expert Comment

by:bcladd
ID: 9643751
federal102 has the best suggestion. If you are going to hash them all, you want to normalize the representation BEFORE you hash so that they hash to the same place.

-bcl
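A minimal sketch of the normalize-before-hashing idea, assuming C++11 and std::unordered_map as the standard counterpart of the pre-standard hash_map mentioned in the question. The names normalize_key, lower_char, and record_sender are made up for illustration and do not come from the thread:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical helper: lower-cases one character. The cast keeps tolower
// from ever being handed a negative char value.
static char lower_char(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: returns a lower-cased copy of s with leading and
// trailing spaces, tabs, and line breaks removed.
std::string normalize_key(const std::string& s)
{
    const char* const ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();               // empty or all whitespace
    const std::string::size_type last = s.find_last_not_of(ws);
    std::string key(s, first, last - first + 1);
    std::transform(key.begin(), key.end(), key.begin(), lower_char);
    return key;
}

// Every insert and lookup goes through normalize_key, so "THIS ", "this",
// and "tHis    " all land on the same entry.
void record_sender(std::unordered_map<std::string, int>& counts,
                   const std::string& raw)
{
    ++counts[normalize_key(raw)];
}
```

Calling record_sender with each of the three example strings from the question should leave a single entry keyed "this" with a count of 3. The design point is that the container only ever sees normalized keys, so its default hash and equality can be used unchanged.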
0

 
LVL 1

Expert Comment

by:fsign21
ID: 9644066
What do you think about the following solution (I am posting it, because I think that string.erase() may be an expensive operation)?

const static string null_string("");

string stripWhiteSpaceAndConvertToUpper(const string& arg)
{
  int szlen = arg.length();
  if( szlen == 0) return null_string;

  int end = (szlen-1);
  int start = 0;
  const char* val = arg.c_str();  // "register" dropped: deprecated in C++11, removed in C++17

  while(start <=end && (val[start] == ' ' || val[start] == '\n'))
    start++;
  if(start > end) return null_string; //only white spaces

  while(end && (val[end] == ' ' || val[end] == '\n'))
    end--;

  //result string
  string tmp(val+start, end-start+1);  
 
  //convert toupper
  string::iterator i = tmp.begin();
  while(i != tmp.end())
  {
     *i = toupper(*i);
     ++i;
  }

  return tmp;
}
0
 
LVL 1

Expert Comment

by:c567591
ID: 9645142
Don't forget to include \t as whitespace.
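One way to cover tabs and other whitespace without listing each character is to test with isspace. The following is only a sketch; trim_whitespace and not_space are hypothetical names, not functions from the thread. It trims in place with two range erases instead of erasing one character per loop iteration:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: true for any character isspace does not classify
// as whitespace (isspace covers space, \t, \n, \r, \v, and \f in the "C" locale).
static bool not_space(char c)
{
    return !std::isspace(static_cast<unsigned char>(c));
}

// Hypothetical helper: trims leading and trailing whitespace in place.
std::string& trim_whitespace(std::string& s)
{
    // Trailing part first, so the front erase has fewer characters to shift.
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    return s;
}
```

An empty or all-whitespace string ends up empty, and whitespace between words is left alone.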
0
 
LVL 3

Accepted Solution

by:
EarthQuaker earned 250 total points
ID: 9645224
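// Needs <algorithm>, <cctype>, and <string>, plus "using namespace std;".
string& make_all_lowercase(string &s)
{
    // The cast selects the one-argument tolower from <cctype>; a bare &tolower
    // can be ambiguous once the <locale> overload is also visible.
    transform(s.begin(), s.end(), s.begin(), static_cast<int(*)(int)>(&tolower));
    return s;
}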

string& trim_left(string &s)
{
    while(!s.empty() && s[0]==' ')   // stop once the string is empty
    {
       s.erase(s.begin());
    }
    return s;
}

string& trim_right(string &s)
{
    while(!s.empty() && s[s.size()-1]==' ')   // size()-1 wraps around on an empty string
    {
       s.erase(--s.end());
    }
    return s;
}

string& trim(string &s)
{
    return trim_right(trim_left(s));
}
0
 
LVL 5

Author Comment

by:jjacksn
ID: 9645744
EarthQuaker's code appears the cleanest, but is it just as fast as the other methods? And what does the notation (string &s) mean?
0
 
LVL 1

Expert Comment

by:c567591
ID: 9645805
(string &s) declares s as a reference to the string, so the function works on the caller's string directly instead of receiving a copy of the full string.
It should be pretty quick. What are you doing that needs to be as fast as possible?
0
 
LVL 5

Author Comment

by:jjacksn
ID: 9645834
I'm scanning all of the e-mails in a folder and storing info by each sender and recipient of each e-mail. I'm going to have to access the hashtable once per recipient of each e-mail, thus I'm doing this operation roughly 100K times and I would like the operation to be as quick as possible. So (string *s) and (string &s) are the same thing?
0
 
LVL 1

Expert Comment

by:c567591
ID: 9646231
A mathematical hash would be far faster than a string comparison for a large # of strings.
If you computed a unique hash for each email addy and stored in a binary tree, you would have a very fast search with a maximum of 17 searches for less than 131,072 email addresses. (2^17=131072).  This plus the time to create the hash would be far less than thousands of string comparisons.
0
 
LVL 1

Expert Comment

by:c567591
ID: 9646340
0
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9646699
>so (string *s) and (string &s) are the same thing?  

Kind of.

In a function parameter, & declares a reference to the argument.

References are like an alias, they cannot be uninitialized and they are as fast as pointers.

Basically, use a reference when you can and a pointer when you must.
Here is a sample to illustrate my point :

int n = 1;
int *p = &n;
*p = 2; // n == 2
int &r = n;
r = 3; // n == 3

0
 
LVL 5

Author Comment

by:jjacksn
ID: 9647483
c567591,

I'm using a hash_map keyed on e-mail addresses.  But, obviously, the e-mail addresses must all be in the same form when hashed.  Thus the need for the conversion.

EarthQuaker.  

So to make sure I am understanding correctly:
DoSomething1(string *p)
{
  modify p
}
and
DoSomething2(string &p)
{
 modify p
}

are equivalent in the sense that
*p = "test"
DoSomething1(p)
and
DoSomething2(*p)
would both mutate p in the calling function?

In other words, func1(obj a) and func(obj &a) both take the same args, but func1 would only modify its own copy on its own call stack, not the caller's object?
0
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9650390
A reference is like an alias. I think you got it, but nothing can describe it more successfully than running this program:

#include <iostream>
#include <string>

using namespace std;

void foo(string &s)
{
    s="reference";    // use s normally
    cout << "in foo ref - s at address :" << &s << endl;
}
void foo(string *s)
{
    *s="pointer";  // need to dereference to reach the real s
    cout << "in foo pointer - s at address :" << s << endl;
}

int main()
{
     string s = "hello world";
     cout << s << endl;
     foo(s);
     cout << s << endl;
     foo(&s);
     cout << s << endl;
     return 0;
}
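The program above covers the reference and pointer cases. The by-value case from the question (func1(obj a)) can be set next to them in a small sketch; the three function names here are hypothetical and chosen only for illustration:

```cpp
#include <string>

// By value: the function receives its own copy, so the caller's string
// is left untouched.
void set_by_value(std::string s)
{
    s = "value";
}

// By reference: s is an alias for the caller's string, so the assignment
// is visible to the caller.
void set_by_reference(std::string& s)
{
    s = "reference";
}

// By pointer: the caller passes an address and the function dereferences it.
// Unlike a reference, a pointer may be null, hence the check.
void set_by_pointer(std::string* s)
{
    if (s)
        *s = "pointer";
}
```

Starting from a string holding "hello", calling set_by_value leaves it as "hello", set_by_reference changes it to "reference", and set_by_pointer (called with the string's address) changes it to "pointer".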
0
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9650404
http://www.parashift.com/c++-faq-lite/references.html

This should answer all your questions about references.
0
 
LVL 11

Expert Comment

by:bcladd
ID: 9688912
Happened to be reading the C/C++ Users Journal, October 2003, and Koenig and Moo's column focused on removing spaces from a string. They present and compare three different algorithms. I like the way they write (loved _Accelerated C++_) and it happens to be on point here.

-bcl
0
