Solved

EASY: function to clean e-mail addresses

Posted on 2004-07-30
Last Modified: 2010-04-01
I'm trying to take an e-mail address, remove any invalid characters from it, and make it lowercase.
The e-mail is a std::string. What is the easiest way to perform this operation? (I think the only valid symbols in an e-mail are - _ . and @)
Question by:jjacksn
 
LVL 86

Expert Comment

by:jkr
ID: 11679601
#include <string>
#include <ctype>
using namespace std;

strinsg strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

  char c = *i;

   if ( !isalnum(c) && '_' != c && '@' != c) {

      i = strAddr.erase(i); --i;

  } else *i = tolower(c);
}
0
 
LVL 86

Expert Comment

by:jkr
ID: 11679655
Ooops, a few corrections:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

string strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i);

 } else *i = tolower(c);
}
0
 
LVL 86

Expert Comment

by:jkr
ID: 11679675
And, finally, to wrap that up in a function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

char c = *i;

 if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

    i = strAddr.erase(i);

 } else *i = tolower(c);
}
}
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11680110
what will i point to after strAddr.erase?

am I correct in thinking those are the only valid symbols?

isalnum does what and is in which header?
0
 
LVL 86

Expert Comment

by:jkr
ID: 11680234
>> what will i point to after strAddr.erase?

Um, sorry, but I don't get what you mean...

>>am I correct in thinking those are the only valid symbols?

Yes. The underscore, the dot and the @ IIRC are the only valid characters

>>isalnum does what and is in which header?

It checks whether a character is alphanumeric, e.g. A-Z,a-z,0-9 and resides in 'cctype' - you can directly compile the above with all the headers, I tested it - and have to add another correction. Here's the test code

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

int main () {

string strAddr = "Joe_User@som#ese!rver.com%";
string::iterator i;
string::iterator ib = strAddr.begin();

cout << strAddr << endl;


for ( i = ib; i != strAddr.end(); ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i); --i;

 } else *i = tolower(c);
}

cout << strAddr << endl;

}

and the function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();

for ( i = ib; i != strAddr.end(); ++i) {

char c = *i;

if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

   i = strAddr.erase(i);

} else *i = tolower(c);
}
}
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11680407
the bottom function is then incorrect?  (it still doesn't have the --i)

What i meant before was:

what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?  

In other words, will i++ work correctly after that?  

I noticed you added --i. I'm guessing that's part of my answer?
0
 
LVL 30

Expert Comment

by:Axter
ID: 11680419
>>I'm trying to take an e-mail address, remove any invalid characters from it, and make it lower case.

What is the source of the email address?
What is the purpose of modifying the original email address?

Be aware, that you can run into problems if the original address string comes in the following formats:
Experts Exchange [qna@experts-exchange.com]
<Experts Exchange>qna@experts-exchange.com
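To make that concern concrete, here is a minimal sketch of what a character filter like the one posted above does to those two strings. The helper name strip_and_lower is hypothetical (it is not from the thread); it keeps the same characters as the posted loop (alphanumerics, '_', '@' and '.') and returns a cleaned copy instead of editing in place.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, for illustration only: applies the same filter as the
// loop posted earlier in the thread and returns the cleaned, lower-cased copy.
std::string strip_and_lower(const std::string& in) {
    std::string out;
    for (std::string::size_type k = 0; k < in.size(); ++k) {
        // <cctype> functions expect a value representable as unsigned char
        unsigned char c = static_cast<unsigned char>(in[k]);
        if (std::isalnum(c) || c == '_' || c == '@' || c == '.') {
            out += static_cast<char>(std::tolower(c));
        }
    }
    return out;
}
```

Both sample strings come out as expertsexchangeqna@expertsexchange.com: the display name is glued onto the local part and the hyphen in the domain is dropped, so the result is a well-formed but different address.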
0
 
LVL 86

Accepted Solution

by:
jkr earned 500 total points
ID: 11680441
>> the bottom function is then incorrect?  (it still doesn't have the --i)

Argh, copy&paste :o)

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

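// Removes every character that is not alphanumeric, '_', '@' or '.',
// and lower-cases the rest, in place.
void canonicalize_email_addr ( string& strAddr) {

  string::iterator i;
  string::iterator ib = strAddr.begin();

  // end() is re-evaluated on every pass because erase() invalidates old iterators
  for ( i = ib; i != strAddr.end(); ++i) {

    char c = *i;

    if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

      // erase() returns the element after the erased one; step back so ++i lands on it
      // (caveat: if the erased character was the first one, --i steps before begin())
      i = strAddr.erase(i); --i;

    } else *i = tolower(c);
  }
}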

>>what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?  

It points to the 1st element *after* the erased one - thus the '--i', since the '++i' in the loop would skip one character...
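A possible variation, offered as a sketch rather than as the code that was tested in this thread: since erase() already hands back the next element, the loop can simply skip the increment when it erases. That avoids the '--i' altogether, which matters when the very first character is erased, because decrementing an iterator equal to begin() is undefined behavior. The function name below is made up for the example.

```cpp
#include <cctype>
#include <string>

// Hypothetical name; keeps letters, digits, '_', '@' and '.' as in the post above.
void canonicalize_email_addr_no_decrement(std::string& addr) {
    std::string::iterator i = addr.begin();
    // end() is re-read on every pass: erase() invalidates previously stored iterators
    while (i != addr.end()) {
        // <cctype> functions expect a value representable as unsigned char
        unsigned char c = static_cast<unsigned char>(*i);
        if (!std::isalnum(c) && c != '_' && c != '@' && c != '.') {
            i = addr.erase(i);   // already points at the next element: no ++ and no --
        } else {
            *i = static_cast<char>(std::tolower(c));
            ++i;                 // advance only when nothing was erased
        }
    }
}
```

Called on "Joe_User@som#ese!rver.com%" this leaves "joe_user@someserver.com" in the string.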
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11682595
hmm, that function seems to hang forever...?

      //make the e-mail lower case and remove invalid chars
      string::iterator i;
      string::iterator ib = email->begin();
      string::iterator ie = email->end();

      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                   i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }
0
 
LVL 86

Expert Comment

by:jkr
ID: 11682615
What are you feeding it as arguments? From the above snippet, it is hard to tell what is going on.
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11682631
in the debugger, the value of email is

-email      0x09b45e61 {"}      std::basic_string<char,std::char_traits<char>,std::allocator<char> > *
I can't tell whether email is an empty string or a string containing two single quotes. Either way, the for loop runs forever on this string. Any advice? (I'm pretty sure it's not the empty string, because it gets past a check I placed above the for loop: if(email->length() > 0).)

OK, now I'm really confused: it's executing the else clause even when the "if" clause is hit. What is going on?


      if(email->length() > 0)
      {
      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                  i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }
      }
0
 
LVL 86

Expert Comment

by:jkr
ID: 11682661
But, what is it when you are calling the above function?
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11682791
It's really hard to tell... but I'm guessing a blank string, or two single quotes. This runs in the middle of parsing about 5,000 addresses, so it's really hard to single it out.

Can you reproduce the bad behavior with a string of length zero or with two single quotes? Or maybe a single double quote?
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11682997
also, isn't '-' valid as well?  (domains are allowed to have dashes, aren't they?).  
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11683015
OK, it happens when you have two single quotes. And I'm pretty sure the problem is related to the fact that the else clause is getting executed, because then *i is always being assigned to an invalid char. So, why is the else clause getting executed? (I feel like I'm taking crazy pills)

      for ( i = ib; i != ie; i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }
0
 
LVL 86

Expert Comment

by:jkr
ID: 11685246
Tested it again with empty strings - cannot reproduce that - you are right about the dash though, just add it to the 'if' statement.
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11686097
not an empty string, a string with two single quotes.
0
 
LVL 86

Expert Comment

by:jkr
ID: 11686469
>>a string with two single quotes

Like

string strAddr = "\"\"";

?

Works also :-(
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11686509
in mine, it doesn't work for some reason.... no, like

string strAddr = "''";

The change to the for loop of

      for ( string::iterator i = email->begin(); i != email->end(); i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c && '-' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }

has fixed the problem though
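A likely explanation for why that change helps: the earlier snippet stored email->end() in ie before the loop, and erase() invalidates iterators at or after the erased position, including a stored end iterator. Once something has been erased, comparing against the stale ie is undefined behavior, and in practice the loop can run past the real end. Note that the 'i--' still steps before begin() when the first character is erased (as with "''"), so the loop above is not fully safe either. A sketch of an alternative that avoids manual iterator bookkeeping is the erase/remove_if idiom; the helper names here are invented for the example, and the allowed set ('_', '@', '.', '-') is simply the one used in this thread, not a complete description of valid e-mail syntax.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true for the characters this thread treats as valid.
static bool is_allowed_email_char(unsigned char c) {
    return std::isalnum(c) || c == '_' || c == '@' || c == '.' || c == '-';
}

// Predicate for remove_if; the cast keeps <cctype> calls well defined for negative char values.
static bool is_disallowed(char c) {
    return !is_allowed_email_char(static_cast<unsigned char>(c));
}

static char to_lower_char(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical name: strips disallowed characters, then lower-cases what is left.
void clean_email(std::string& email) {
    // remove_if shifts the kept characters to the front; erase trims the leftover tail
    email.erase(std::remove_if(email.begin(), email.end(), is_disallowed), email.end());
    std::transform(email.begin(), email.end(), email.begin(), to_lower_char);
}
```

With this version a string of two single quotes simply becomes empty, and no iterator is ever used after the string has been modified.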
0
 
LVL 86

Expert Comment

by:jkr
ID: 11686521
Gonna take a closer look tomorrow, it's approaching 5am here :o)
0
