Solved

URGENT:  Comparing STL string in lower case and trimmed

Posted on 2003-10-29
16
Medium Priority
294 Views
Last Modified: 2010-07-27
I need the most optimized way to convert and compare two strings based on lower case and no outside whitespace.

for example,

I want "THIS " and "this" and "tHis    " to all match.  where those are all strings.  

What is the most optimized way to do this?  I am going to be using these to access objects in a hash_map, so I want them to key correctly.  Would it be fastest to just write this into the

0
Comment
Question by:jjacksn
16 Comments
 
LVL 1

Expert Comment

by:c567591
ID: 9643614
If you can convert them into standard C strings,
You can use stricmp to case insensitive compare.
Or if you must use a case sensitive comparison, convert them to upper or lower case before comparing.

As far as a trim, here is the trim that I always use:
//This comes from Bob Stout's Snippets & is Public Domain
char *trim (char *str)
{
      char *ibuf, *obuf;

      if (str)
      {
            for (ibuf = obuf = str; *ibuf; )
            {
                  while (*ibuf && (isspace (*ibuf)))
                        ibuf++;
                  if (*ibuf && (obuf != str))
                        *(obuf++) = ' ';
                  while (*ibuf && (!isspace (*ibuf)))
                        *(obuf++) = *(ibuf++);
            }
            *obuf = 0x00; //NUL;
      }
      return (str);
}

0
 
LVL 2

Expert Comment

by:federal102
ID: 9643734
Or, if you want to stick with std::string

First strip off the spaces...

string s = "   this              ";
while ( s.size() && *(s.begin()) == ' ' )
      s.erase(s.begin());
while ( s.size() && *(s.end() - 1) == ' ' )
      s.erase(s.end()-1);

Then convert to uppercase....

string::iterator i = s.begin();
while(i != s.end())
{
      *i = toupper(*i);
      ++i;
}

Then compare.
0
 
LVL 11

Expert Comment

by:bcladd
ID: 9643751
federal102 has the best suggestion. If you are going to hash them all, you want to normalize the representation BEFORE you hash so that they hash to the same place.

-bcl
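A minimal sketch of normalizing before hashing, not a definitive implementation. It assumes a C++11 compiler and uses std::unordered_map as a stand-in for hash_map; normalize_key, KeyCounter and demo_same_key are hypothetical names that do not come from the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: strip outside spaces, then lower-case what is left.
std::string normalize_key(const std::string& raw)
{
    const std::string::size_type first = raw.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();                      // empty or all spaces
    const std::string::size_type last = raw.find_last_not_of(' ');
    std::string key = raw.substr(first, last - first + 1);
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return key;
}

// Hypothetical wrapper: every insert and lookup goes through normalize_key,
// so differently spelled inputs land on the same map entry.
class KeyCounter {
public:
    void add(const std::string& raw) { ++counts_[normalize_key(raw)]; }

    int count(const std::string& raw) const
    {
        const auto it = counts_.find(normalize_key(raw));
        return it == counts_.end() ? 0 : it->second;
    }

    std::size_t distinct() const { return counts_.size(); }

private:
    std::unordered_map<std::string, int> counts_;
};

// The three spellings from the question end up under one key.
void demo_same_key()
{
    KeyCounter counter;
    counter.add("THIS ");
    counter.add("this");
    counter.add("tHis    ");
    assert(counter.distinct() == 1);
    assert(counter.count("this") == 3);
}
```

The point of the wrapper is that the raw string never reaches the map, so there is no way to forget the conversion on one code path.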
0

 
LVL 1

Expert Comment

by:fsign21
ID: 9644066
What do you think about the following solution (I am posting it, because I think that string.erase() may be an expensive operation)?

const static string null_string("");

string stripWhiteSpaceAndConvertToUpper(const string& arg)
{
  int szlen = arg.length();
  if( szlen == 0) return null_string;

  int end = (szlen-1);
  int start = 0;
  const char* val = arg.c_str();  // 'register' dropped: it was only a hint and is no longer valid in C++17

  while(start <=end && (val[start] == ' ' || val[start] == '\n'))
    start++;
  if(start > end) return null_string; //only white spaces

  while(end && (val[end] == ' ' || val[end] == '\n'))
    end--;

  //result string
  string tmp(val+start, end-start+1);  
 
  //convert toupper
  string::iterator i = tmp.begin();
  while(i != tmp.end())
  {
     *i = toupper(*i);
     ++i;
  }

  return tmp;
}
0
 
LVL 1

Expert Comment

by:c567591
ID: 9645142
Don't forget to include \t as whitespace.
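One way to cover tabs and the other whitespace characters without calling erase() in a loop is to search against a whitespace set. This is only a sketch: trim_copy and demo_trim are hypothetical names, and the set below (space, tab, newline, carriage return, form feed, vertical tab) is an assumption about what should count as whitespace here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns a copy of s without leading or trailing whitespace.
std::string trim_copy(const std::string& s)
{
    static const char* const kWhitespace = " \t\n\r\f\v";
    const std::string::size_type first = s.find_first_not_of(kWhitespace);
    if (first == std::string::npos)
        return std::string();                  // empty or whitespace only
    const std::string::size_type last = s.find_last_not_of(kWhitespace);
    return s.substr(first, last - first + 1);  // interior whitespace is kept
}

void demo_trim()
{
    assert(trim_copy("\t this \n") == "this");
    assert(trim_copy("a b") == "a b");
    assert(trim_copy(" \t ") == "");
}
```

Both ends are located first and the result is copied once, so the cost is linear in the length of the string.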
0
 
LVL 3

Accepted Solution

by:
EarthQuaker earned 1000 total points
ID: 9645224
string& make_all_lowercase(string &s)
{
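    // The cast picks the int(int) overload from <cctype>; a bare &tolower can be
    // ambiguous once <locale> is pulled in.
    transform(s.begin(), s.end(), s.begin(), static_cast<int(*)(int)>(tolower));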
    return s;
}

string& trim_left(string &s)
{
    while(!s.empty() && s[0]==' ')  // stop once the string is empty
    {
       s.erase(s.begin());
    }
    return s;
}

string& trim_right(string &s)
{
    while(!s.empty() && s[s.size()-1]==' ')  // guard: size()-1 wraps around on an empty string
    {
       s.erase(--s.end());
    }
    return s;
}

string& trim(string &s)
{
    return trim_right(trim_left(s));
}
0
 
LVL 5

Author Comment

by:jjacksn
ID: 9645744
EarthQuaker's code appears the cleanest, but is it just as fast as the other methods?  And what does the notation (string &s) mean?
0
 
LVL 1

Expert Comment

by:c567591
ID: 9645805
&s is a reference to the string (similar in effect to a pointer), so only the reference is passed instead of a copy of the full string.
It should be pretty quick, what are you doing that needs to be fast as possible?
0
 
LVL 5

Author Comment

by:jjacksn
ID: 9645834
I'm scanning all of the emails in a folder and storing info by each sender and recipient of each e-mail.  I'm going to have to access the hashtable once per recipient of each email, thus I'm doing this operation roughly 100K times and I would like the operation to be as quick as possible.  so (string *s) and (string &s) are the same thing?  
0
 
LVL 1

Expert Comment

by:c567591
ID: 9646231
A mathematical hash would be far faster than a string comparison for a large # of strings.
If you computed a unique hash for each email addy and stored in a binary tree, you would have a very fast search with a maximum of 17 searches for less than 131,072 email addresses. (2^17=131072).  This plus the time to create the hash would be far less than thousands of string comparisons.
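The figure of 17 can be checked with a few lines. This sketch only counts the worst-case number of probes a binary search needs over n sorted keys; max_probes and demo_probe_count are hypothetical names, and the sketch says nothing about whether the hashes are really unique.

```cpp
#include <cassert>
#include <cstddef>

// Worst-case probes for a binary search over n sorted keys:
// floor(log2(n)) + 1 for n > 0, and 0 for an empty set.
std::size_t max_probes(std::size_t n)
{
    std::size_t probes = 0;
    while (n > 0) {
        n /= 2;      // each probe halves the remaining range
        ++probes;
    }
    return probes;
}

void demo_probe_count()
{
    assert(max_probes(131071) == 17);  // fewer than 2^17 keys: at most 17 probes
    assert(max_probes(131072) == 18);  // one more key needs an extra level
}
```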
0
 
LVL 1

Expert Comment

by:c567591
ID: 9646340
0
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9646699
>so (string *s) and (string &s) are the same thing?  

Kind of.

In a function parameter, & declares a reference.

References are like an alias, they cannot be uninitialized and they are as fast as pointers.

Basically, use a reference when you can and a pointer when you must.
Here is a sample to illustrate my point :

int n = 1;
int *p = &n;
*p = 2; // n == 2
int &r = n;
r = 3; // n == 3

0
 
LVL 5

Author Comment

by:jjacksn
ID: 9647483
c567591,

I'm using a hash_map keyed on e-mail addresses.  But, obviously, the e-mail addresses must all be in the same form when hashed.  Thus the need for the conversion.

EarthQuaker.  

So to make sure I am understanding correctly:
DoSomething1(string *p)
{
  modify p
}
and
DoSomething2(string &p)
{
 modify p
}

are equivalent in the sense that
*p = "test"
DoSomething1(p)
and
DoSomething2(*p)
would both mutate p in the calling function?

in other words, func1(obj a) and func(obj &a) both take the same args, but func1 would only modify it on its own call stack, not its caller?
0
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9650390
A reference is like an alias. I think you got it but nothing can describe it more successfully than running this program :

#include <iostream>
#include <string>

using namespace std;

void foo(string &s)
{
    s="reference";    // use s normally
    cout << "in foo ref - s at address :" << &s << endl;
}
void foo(string *s)
{
    *s="pointer";  // need to dereference to point to the real s
    cout << "in foo pointer - s at address :" << s << endl;
}

int main()
{
     string s = "hello world";
     cout << s << endl;
     foo(s);
     cout << s << endl;
     foo(&s);
     cout << s << endl;
     return 0;
}
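To cover the pass-by-value case from the question as well, here is a small sketch that puts the three parameter styles side by side. The function names are hypothetical and only meant for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical names, for illustration only.
void set_by_value(std::string s)      { s = "changed"; }          // changes a local copy
void set_by_reference(std::string& s) { s = "changed"; }          // changes the caller's string
void set_by_pointer(std::string* s)   { if (s) *s = "changed"; }  // changes the pointed-to string

void compare_passing_styles()
{
    std::string a = "original", b = "original", c = "original";
    set_by_value(a);        // a is copied; the caller's a is untouched
    set_by_reference(b);    // b itself is modified
    set_by_pointer(&c);     // the caller passes the address explicitly
    assert(a == "original");
    assert(b == "changed");
    assert(c == "changed");
}
```

So func1(obj a) works on its own copy, while the reference and pointer versions both change the caller's object. The call syntax differs (b versus &c), and the pointer version has to allow for a null argument.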
0
 
LVL 3

Expert Comment

by:EarthQuaker
ID: 9650404
http://www.parashift.com/c++-faq-lite/references.html

This should answer all your questions about references.
0
 
LVL 11

Expert Comment

by:bcladd
ID: 9688912
Happened to be reading the C/C++ Users Journal, October 2003, and Koenig and Moo's column focused on removing spaces from a string. They present and compare three different algorithms. I like the way they write (loved _Accelerated C++_) and it happens to be on point here.

-bcl
0
