Solved

EASY: function to clean e-mail addresses

Posted on 2004-07-30
Last Modified: 2010-04-01
I'm trying to take an e-mail address, remove any invalid characters from it, and make it lower case.  
The e-mail is a std::string. What is the easiest way to perform this operation? (I think the only valid symbols in an e-mail are -, _, . and @.)
Question by:jjacksn
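For reference, the same cleanup can also be written with standard algorithms instead of a hand-written loop. This is a minimal sketch that assumes a C++11 (or later) compiler for the lambdas. The names `clean_email` and `is_kept_email_char` are hypothetical, and the allow-list is the one from the question (letters, digits, `-`, `_`, `.`, `@`). The sketch uses the erase-remove idiom (`std::remove_if` followed by `std::string::erase`) and then `std::transform` with `std::tolower`.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: keeps letters, digits and the symbols from the
// question ('-', '_', '.', '@'). Everything else counts as invalid.
bool is_kept_email_char(unsigned char c) {
    return std::isalnum(c) != 0 || c == '_' || c == '.' || c == '@' || c == '-';
}

// Removes characters rejected by is_kept_email_char, then lowercases the rest.
void clean_email(std::string& addr) {
    // remove_if moves the kept characters to the front and returns the new
    // logical end; erase() then drops the leftover tail in a single call.
    addr.erase(std::remove_if(addr.begin(), addr.end(),
                              [](unsigned char c) { return !is_kept_email_char(c); }),
               addr.end());
    // The <cctype> functions need a value representable as unsigned char,
    // so each char is converted before std::tolower is called.
    std::transform(addr.begin(), addr.end(), addr.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
}
```

Because `remove_if` never erases while it iterates, there is no iterator that has to stay valid across an `erase`. One caveat on lowercasing: the mail standards allow the local part (before `@`) to be case-sensitive, although most providers treat it case-insensitively.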
20 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 11679601
#include <string>
#include <ctype>
using namespace std;

strinsg strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

  char c = *i;

   if ( !isalnum(c) && '_' != c && '@' != c) {

      i = strAddr.erase(i); --i;

  } else *i = tolower(c);
}
 
LVL 86

Expert Comment

by:jkr
ID: 11679655
Ooops, a few corrections:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

string strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i);

 } else *i = tolower(c);
}
 
LVL 86

Expert Comment

by:jkr
ID: 11679675
And, finally, to wrap that up in a function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

char c = *i;

 if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

    i = strAddr.erase(i);

 } else *i = tolower(c);
}
}
 
LVL 5

Author Comment

by:jjacksn
ID: 11680110
what will i point to after strAddr.erase?

am I correct in thinking those are the only valid symbols?

isalnum does what and is in which header?
 
LVL 86

Expert Comment

by:jkr
ID: 11680234
>> what will i point to after strAddr.erase?

Um, sorry, but I don't get what you mean...

>>am I correct in thinking those are the only valid symbols?

Yes. The underscore, the dot and the @ are, IIRC, the only valid characters.

>>isalnum does what and is in which header?

It checks whether a character is alphanumeric, i.e. A-Z, a-z or 0-9, and it resides in 'cctype'. You can compile the above directly with all the headers (I tested it), but I have to add another correction. Here's the test code:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

int main () {

string strAddr = "Joe_User@som#ese!rver.com%";
string::iterator i;
string::iterator ib = strAddr.begin();

cout << strAddr << endl;


for ( i = ib; i != strAddr.end(); ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i); --i;

 } else *i = tolower(c);
}

cout << strAddr << endl;

}
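One detail about `isalnum` and `tolower` from 'cctype' that the snippets in this thread don't handle: both take an `int` whose value must be representable as `unsigned char` (or be `EOF`). Where plain `char` is signed, a non-ASCII byte (for example 0xE9 in a Latin-1 address) arrives as a negative number, and the behavior is undefined. Here is a minimal sketch of wrappers that convert first. The names `is_alnum_char` and `to_lower_char` are hypothetical.

```cpp
#include <cctype>

// Hypothetical wrappers. std::isalnum and std::tolower expect a value that is
// representable as unsigned char (or EOF). Converting through unsigned char
// keeps the call well-defined even when plain char is signed.
bool is_alnum_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

char to_lower_char(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
```

In the loops above, `isalnum(c)` and `tolower(c)` would become `is_alnum_char(c)` and `to_lower_char(c)`. In the default "C" locale, `isalnum` accepts exactly A-Z, a-z and 0-9. Other locales may classify additional bytes as alphanumeric.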

and the function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();

for ( i = ib; i != strAddr.end(); ++i) {

char c = *i;

if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

   i = strAddr.erase(i);

} else *i = tolower(c);
}
}
 
LVL 5

Author Comment

by:jjacksn
ID: 11680407
the bottom function is then incorrect?  (it still doesn't have the --i)

What I meant before was:

what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?

In other words, will i++ work correctly after that?

I noticed you added --i. I'm guessing that's part of my answer?
 
LVL 30

Expert Comment

by:Axter
ID: 11680419
>>I'm trying to take an e-mail address, remove any invalid characters from it, and make it lower case.

What is the source of the email address?
What is the purpose of modifying the original email address?

Be aware that you can run into problems if the original address string comes in formats such as the following:
Experts Exchange [qna@experts-exchange.com]
<Experts Exchange>qna@experts-exchange.com
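If the input can look like that, one option is to pull out the bare address before cleaning it. Otherwise the cleanup loop simply deletes the spaces and brackets and glues the display name onto the local part. The following is a heuristic sketch that finds the first `@` and grows the match left and right until a delimiter is reached. The name `extract_address` and the delimiter set are assumptions. It is not a full RFC 5322 parser: quoted local parts, comments and group syntax are ignored.

```cpp
#include <string>

// Hypothetical helper: pulls the bare address out of strings such as
//   "Experts Exchange [qna@experts-exchange.com]"
//   "<Experts Exchange>qna@experts-exchange.com"
// by locating the first '@' and extending the match in both directions
// until a delimiter is hit. Returns an empty string when there is no '@'.
std::string extract_address(const std::string& raw) {
    const std::string delimiters = " \t<>[]\"(),;:";
    const std::string::size_type at = raw.find('@');
    if (at == std::string::npos) return std::string();

    // Walk left from '@' to the first delimiter (or the start of the string).
    std::string::size_type begin = at;
    while (begin > 0 && delimiters.find(raw[begin - 1]) == std::string::npos) {
        --begin;
    }
    // Walk right from '@' to the first delimiter (or the end of the string).
    std::string::size_type end = at + 1;
    while (end < raw.size() && delimiters.find(raw[end]) == std::string::npos) {
        ++end;
    }
    return raw.substr(begin, end - begin);
}
```

The returned string can then be passed through the character cleanup.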
 
LVL 86

Accepted Solution

by:jkr (earned 500 total points)
ID: 11680441
>> the bottom function is then incorrect?  (it still doesn't have the --i)

Argh, copy&paste :o)

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();

for ( i = ib; i != strAddr.end(); ++i) {

char c = *i;

if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

  i = strAddr.erase(i); --i;

} else *i = tolower(c);
}
}

>>what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?  

It points to the 1st element *after* the erased one - thus the '--i', since the '++i' in the loop would skip one character...
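There is one caveat with '--i'. When the erased character is the first one in the string (as with the "''" input discussed below), `erase` returns `begin()`, and '--i' moves the iterator before the beginning of the string. That is undefined behavior, even if it appears to work. A common alternative is to advance the iterator only when nothing was erased. Here is a minimal sketch with the same allow-list as the accepted code. The function name is hypothetical.

```cpp
#include <cctype>
#include <string>

// Same rule as the accepted code: keep letters, digits, '_', '@' and '.',
// lowercase what is kept, and erase everything else.
void canonicalize_email_addr_no_decrement(std::string& strAddr) {
    std::string::iterator i = strAddr.begin();
    // strAddr.end() is re-read on every test because erase() changes it.
    while (i != strAddr.end()) {
        const unsigned char c = static_cast<unsigned char>(*i);
        if (!std::isalnum(c) && c != '_' && c != '@' && c != '.') {
            // erase() returns an iterator to the character that followed the
            // erased one, so the next loop test examines that character.
            i = strAddr.erase(i);
        } else {
            *i = static_cast<char>(std::tolower(c));
            ++i;  // advance only when nothing was erased
        }
    }
}
```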
 
LVL 5

Author Comment

by:jjacksn
ID: 11682595
hmm, that function seems to hang forever...?

      //make the e-mail lower case and remove invalid chars
      string::iterator i;
      string::iterator ib = email->begin();
      string::iterator ie = email->end();

      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                   i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }
 
LVL 86

Expert Comment

by:jkr
ID: 11682615
What are you feeding it as arguments? From the above snippet, it is hard to tell what is going on.
 
LVL 5

Author Comment

by:jjacksn
ID: 11682631
in the debugger, the value of email is

-email      0x09b45e61 {"}      std::basic_string<char,std::char_traits<char>,std::allocator<char> > *
I can't tell if email is an empty string or a string containing two single quotes. Either way, the for loop loops infinitely on this string. Any advice? (I'm pretty sure it's not the empty string, because it gets past a check I placed above the for loop: if(email->length() > 0).)

OK, now I'm really confused: it's executing the else clause even when the "if" clause is hit? What is going on?


      if(email->length() > 0)
      {
      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                  i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }
      }
 
LVL 86

Expert Comment

by:jkr
ID: 11682661
But what is the string's value when you call the above function?
 
LVL 5

Author Comment

by:jjacksn
ID: 11682791
It's really hard to tell... but I'm guessing a blank string or two single quotes. This runs in the middle of parsing about 5,000 addresses, so it's really hard to single it out.

Can you reproduce the bad behavior with a zero-length string or with two single quotes? Maybe a single double quote?
 
LVL 5

Author Comment

by:jjacksn
ID: 11682997
also, isn't '-' valid as well?  (domains are allowed to have dashes, aren't they?).  
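On which characters are valid: the allow-list used in this thread is narrower than the standard. RFC 5322 lets an unquoted local part (the part before `@`) consist of "atext" atoms separated by dots. Atext is ASCII letters, digits and the symbols listed in the sketch below, which include `+`, `-`, `=` and the apostrophe. Stripping those can change the address: for example, `user+tag@example.com` would become `usertag@example.com`. Domain labels, as asked, may contain letters, digits and hyphens. Here is a minimal character-level predicate, assuming ASCII input. The function names are hypothetical. Quoted local parts and dot placement rules (no leading, trailing or doubled dots) are not checked.

```cpp
#include <cctype>
#include <cstring>

// RFC 5322 "atext": ASCII letters, digits and the symbols in kAtextSymbols.
// Quoted local parts and comments are not handled.
bool is_rfc5322_atext(char ch) {
    static const char kAtextSymbols[] = "!#$%&'*+-/=?^_`{|}~";
    const unsigned char c = static_cast<unsigned char>(ch);
    // ASCII only; '\0' is excluded because strchr would match the terminator.
    if (c >= 0x80 || c == '\0') return false;
    return std::isalnum(c) != 0 || std::strchr(kAtextSymbols, c) != nullptr;
}

// Hypothetical allow-list for a whole address: atext plus the '.' that
// separates atoms and the '@' that separates local part and domain.
bool is_allowed_addr_char(char ch) {
    return is_rfc5322_atext(ch) || ch == '.' || ch == '@';
}
```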
 
LVL 5

Author Comment

by:jjacksn
ID: 11683015
OK, it happens when you have two single quotes. And I'm pretty sure the problem is related to the else clause getting executed, because then *i is always being assigned an invalid char. So why is the else clause getting executed? (I feel like I'm taking crazy pills.)

      for ( i = ib; i != ie; i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }
 
LVL 86

Expert Comment

by:jkr
ID: 11685246
Tested it again with empty strings and cannot reproduce that. You are right about the dash, though; just add it to the 'if' statement.
 
LVL 5

Author Comment

by:jjacksn
ID: 11686097
not an empty string, a string with two single quotes.
 
LVL 86

Expert Comment

by:jkr
ID: 11686469
>>a string with two single quotes

Like

string strAddr = "\"\"";

?

Works also :-(
 
LVL 5

Author Comment

by:jjacksn
ID: 11686509
In mine, it doesn't work for some reason... No, like:

string strAddr = "''";

Changing the for loop to

      for ( string::iterator i = email->begin(); i != email->end(); i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c && '-' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }

has fixed the problem though
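A likely explanation for the hang: the earlier snippets compared 'i' against 'ie', an end iterator saved before the loop. `erase` shifts the remaining characters left and may invalidate iterators into the string, including that saved end iterator. Once something has been erased, 'ie' no longer marks the real end, and the loop can run past the string instead of stopping. Re-evaluating 'email->end()' on every pass avoids this. That would also explain why the test code above, which compares against 'strAddr.end()', could not reproduce the problem. The 'i--' after erasing the first character is still undefined behavior. Another way to avoid holding iterators across `erase` is to use indices. Here is a minimal sketch that keeps the `std::string*` parameter from the snippets above. The name `clean_email_ptr` is hypothetical.

```cpp
#include <cctype>
#include <string>

// Index-based cleanup. 'pos < email->size()' is re-evaluated after every
// erase, so the loop cannot run past the shrinking string, and no iterator
// is kept alive across erase().
void clean_email_ptr(std::string* email) {
    if (email == nullptr) return;  // the snippets above pass a pointer
    std::string::size_type pos = 0;
    while (pos < email->size()) {
        const unsigned char c = static_cast<unsigned char>((*email)[pos]);
        if (std::isalnum(c) || c == '_' || c == '@' || c == '.' || c == '-') {
            (*email)[pos] = static_cast<char>(std::tolower(c));
            ++pos;
        } else {
            email->erase(pos, 1);  // following characters shift into 'pos'
        }
    }
}
```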
 
LVL 86

Expert Comment

by:jkr
ID: 11686521
Gonna take a closer look tomorrow, it's approaching 5am here :o)