EASY: function to clean e-mail addresses

Posted on 2004-07-30
Last Modified: 2010-04-01
I'm trying to take an e-mail address, remove any invalid characters from it, and make it lower case.
The e-mail is a std::string. What is the easiest way to perform this operation? (I think the only valid symbols in an e-mail are - _ . and @.)
Question by:jjacksn

Expert Comment

by:jkr
ID: 11679601
#include <string>
#include <cctype>
using namespace std;

string strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

  char c = *i;

   if ( !isalnum(c) && '_' != c && '@' != c) {

      i = strAddr.erase(i); --i;

  } else *i = tolower(c);
}

Expert Comment

by:jkr
ID: 11679655
Ooops, a few corrections:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

string strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i);

 } else *i = tolower(c);
}

Expert Comment

by:jkr
ID: 11679675
And, finally, to wrap that up in a function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

char c = *i;

 if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

    i = strAddr.erase(i);

 } else *i = tolower(c);
}
}

Author Comment

by:jjacksn
ID: 11680110
What will i point to after strAddr.erase()?

Am I correct in thinking those are the only valid symbols?

What does isalnum do, and which header is it in?

Expert Comment

by:jkr
ID: 11680234
>> what will i point to after strAddr.erase?

Um, sorry, but I don't get what you mean...

>>am I correct in thinking those are the only valid symbols?

Yes. Apart from letters and digits, the underscore, the dot and the @ are, IIRC, the only valid characters.
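For reference, RFC 5322 (section 3.2.3) is more permissive than that list. An unquoted local part (the part before the @) may contain letters, digits and any of ! # $ % & ' * + - / = ? ^ _ ` { | } ~, with single dots separating the atoms, while the domain part is normally limited to letters, digits, hyphens and dots. So the apostrophe in an address like o'brien@example.com is legal. Below is a minimal sketch of a local-part character test, assuming the "C" locale so that isalnum() only accepts ASCII letters and digits. The name is_local_part_char is hypothetical, and it checks single characters only, not where the dots are placed:

```cpp
#include <cctype>
#include <cstring>

// Hypothetical helper: true if c may appear in an unquoted local part
// (RFC 5322 "atext" plus the dot used between atoms).
bool is_local_part_char(char c) {
    static const char kSpecials[] = "!#$%&'*+-/=?^_`{|}~";
    // <cctype> functions are undefined for negative values other than EOF,
    // so convert to unsigned char first.
    unsigned char uc = static_cast<unsigned char>(c);
    if (std::isalnum(uc) || c == '.') return true;
    // strchr would match the terminating '\0', so exclude it explicitly.
    return c != '\0' && std::strchr(kSpecials, c) != nullptr;
}
```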

>>isalnum does what and is in which header?

It checks whether a character is alphanumeric, i.e. A-Z, a-z or 0-9, and it is declared in <cctype>. You can compile the code above directly with those headers (I tested it), but I have to add another correction. Here's the test code:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

int main () {

string strAddr = "Joe_User@som#ese!rver.com%";
string::iterator i;
string::iterator ib = strAddr.begin();

cout << strAddr << endl;


for ( i = ib; i != strAddr.end(); ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i); --i;

 } else *i = tolower(c);
}

cout << strAddr << endl;

}

and the function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();

for ( i = ib; i != strAddr.end(); ++i) {

char c = *i;

if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

   i = strAddr.erase(i);

} else *i = tolower(c);
}
}

Author Comment

by:jjacksn
ID: 11680407
So the bottom function is incorrect? (It still doesn't have the --i.)

What I meant before was:

what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?  

In other words, will i++ work correctly after that?  

I noticed you added --i. I'm guessing that's part of my answer?

Expert Comment

by:Axter
ID: 11680419
>>I'm trying to take an e-mail address, remove any invalid characters from it, and make it lower case.

What is the source of the email address?
What is the purpose of modifying the original email address?

Be aware that you can run into problems if the original address string comes in one of the following formats:
Experts Exchange [qna@experts-exchange.com]
<Experts Exchange>qna@experts-exchange.com
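Running the character filter over strings like these would glue the display name onto the address instead of discarding it. One option is to pull out the bare address first and clean only that. Below is a sketch with a hypothetical extract_address() helper. It is a heuristic (it takes the first '@' and the run of characters around it that contains no spaces, tabs or <>[] brackets), not a full RFC 5322 parser:

```cpp
#include <string>

// Hypothetical pre-processing step: return the token around the first '@',
// bounded by whitespace or one of the brackets <>[]. Returns "" if there is
// no '@' at all.
std::string extract_address(const std::string& s) {
    const std::string delims = " \t<>[]";
    std::string::size_type at = s.find('@');
    if (at == std::string::npos) return std::string();

    // Last delimiter before the '@' (or the start of the string).
    std::string::size_type start = s.find_last_of(delims, at);
    start = (start == std::string::npos) ? 0 : start + 1;

    // First delimiter after the '@' (or the end of the string).
    std::string::size_type end = s.find_first_of(delims, at);
    if (end == std::string::npos) end = s.size();

    return s.substr(start, end - start);
}
```

Both formats above reduce to qna@experts-exchange.com. A display name that itself contains an '@' would still confuse it.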

Accepted Solution

by:jkr
ID: 11680441
>> the bottom function is then incorrect?  (it still doesn't have the --i)

Argh, copy&paste :o)

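#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i = strAddr.begin();

// re-read strAddr.end() on every pass, since erase() shortens the string
while ( i != strAddr.end()) {

  // isalnum()/tolower() need a value representable as unsigned char
  unsigned char c = static_cast<unsigned char>(*i);

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

    i = strAddr.erase(i); // i now refers to the character after the erased one

  } else {

    *i = static_cast<char>(tolower(c));
    ++i; // advance only when nothing was erased
  }
}
}

>>what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?  

It points to the 1st element *after* the erased one, so advancing i again after an erase would skip a character. The '--i' compensates for the loop's '++i', but when the very first character is erased it steps before begin(), which is undefined. The version above avoids that by advancing i only when nothing was erased.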
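For comparison, the standard library can do the same job without a hand-written iterator loop by using the erase-remove idiom. std::remove_if moves the characters to keep to the front and returns the new logical end, a single erase() call trims the rest, and std::transform lower-cases what remains. Below is a sketch assuming C++11 (for the lambda). The names canonicalize_email_addr_std and is_invalid_email_char are hypothetical, and '-' is included because dashes come up later in the thread:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: anything other than letters, digits and - _ . @
static bool is_invalid_email_char(char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    return !(std::isalnum(uc) || c == '-' || c == '_' || c == '.' || c == '@');
}

// Same job as canonicalize_email_addr, written with the erase-remove idiom.
void canonicalize_email_addr_std(std::string& addr) {
    // remove_if compacts the valid characters; erase drops the leftover tail.
    addr.erase(std::remove_if(addr.begin(), addr.end(), is_invalid_email_char),
               addr.end());
    // Lower-case in place; the lambda takes unsigned char so tolower is well-defined.
    std::transform(addr.begin(), addr.end(), addr.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
}
```

Because no iterator is held across an erase() here, the iterator-invalidation questions discussed in this thread do not arise.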

Author Comment

by:jjacksn
ID: 11682595
Hmm, that function seems to hang forever:

      //make the e-mail lower case and remove invalid chars
      string::iterator i;
      string::iterator ib = email->begin();
      string::iterator ie = email->end();

      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                   i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }

Expert Comment

by:jkr
ID: 11682615
What are you feeding it as arguments? From the above snippet, it is hard to tell what is going on.

Author Comment

by:jjacksn
ID: 11682631
In the debugger, the value of email is

-email      0x09b45e61 {"}      std::basic_string<char,std::char_traits<char>,std::allocator<char> > *
I can't tell whether email is an empty string or a string containing two single quotes. Either way, the for loop runs forever on this string. Any advice? (I'm pretty sure it's not the empty string, because it gets past an if(email->length() > 0) check I placed above the for loop.)

OK, now I'm really confused: it's executing the else clause even when the "if" clause is hit. What is going on?


      if(email->length() > 0)
      {
      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                  i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }
      }

Expert Comment

by:jkr
ID: 11682661
But what is its value when you call the above function?

Author Comment

by:jjacksn
ID: 11682791
It's really hard to tell, but I'm guessing a blank string or two single quotes. This runs in the middle of parsing about 5,000 addresses, so it's really hard to single it out.  

Can you reproduce the bad behavior with a zero-length string or two single quotes? Maybe a single double quote?

Author Comment

by:jjacksn
ID: 11682997
Also, isn't '-' valid as well? (Domains are allowed to contain dashes, aren't they?)

Author Comment

by:jjacksn
ID: 11683015
OK, it happens when you have two single quotes. And I'm pretty sure the problem is related to the else clause getting executed, because then *i keeps being assigned an invalid char. So why is the else clause getting executed? (I feel like I'm taking crazy pills.)

      for ( i = ib; i != ie; i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }

Expert Comment

by:jkr
ID: 11685246
Tested it again with empty strings and cannot reproduce that. You are right about the dash, though; just add it to the 'if' statement.
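One way to make that kind of change less error-prone is to keep the allowed punctuation in a single string instead of chaining more `&& '...' != c` clauses. Below is a sketch with a hypothetical is_allowed_email_char() predicate. The cast to unsigned char keeps isalnum() well-defined for non-ASCII bytes:

```cpp
#include <cctype>
#include <string>

// Hypothetical predicate: letters and digits, plus the punctuation listed in
// allowed_punct. Extending the set (here with '-') is a one-character edit.
bool is_allowed_email_char(char c) {
    static const std::string allowed_punct = "-_.@";
    return std::isalnum(static_cast<unsigned char>(c)) != 0 ||
           allowed_punct.find(c) != std::string::npos;
}
```

Inside the loop, the test then becomes if (!is_allowed_email_char(c)) { ... }.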

Author Comment

by:jjacksn
ID: 11686097
Not an empty string; a string with two single quotes.

Expert Comment

by:jkr
ID: 11686469
>>a string with two single quotes

Like

string strAddr = "\"\"";

?

Works also :-(

Author Comment

by:jjacksn
ID: 11686509
No, like this (in mine, it doesn't work for some reason):

string strAddr = "''";

Changing the for loop to

      for ( string::iterator i = email->begin(); i != email->end(); i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c && '-' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }

has fixed the problem, though.
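The most likely reason the rewritten loop works is that email->end() is now re-evaluated on every pass. In the hanging versions, ie was saved before the loop. erase() shortens the string and invalidates iterators at or after the erased position, including that saved end, so `i != ie` no longer tells the loop where to stop. With "''", the second erase leaves an empty string, while ie still points two characters past begin(), so the loop can keep reading and "erasing" past the real end. That is undefined behaviour, which would also explain the seemingly impossible trips into the else branch. jkr's test code compares against strAddr.end() on each pass, which would explain why he could not reproduce it. The remaining i-- after erasing the first character still steps before begin(), which is undefined even if it happens to work. Below is a sketch that sidesteps both issues by using an index instead of an iterator. The name clean_email_by_index is hypothetical, and '-' is included as discussed above:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical index-based cleaner. erase(pos, 1) shifts the remaining
// characters left, so pos is advanced only when nothing was removed.
// email.size() is re-read every iteration, so the bound always matches
// the current length of the string.
void clean_email_by_index(std::string& email) {
    std::size_t pos = 0;
    while (pos < email.size()) {
        char c = email[pos];
        unsigned char uc = static_cast<unsigned char>(c);
        if (std::isalnum(uc) || c == '_' || c == '@' || c == '.' || c == '-') {
            email[pos] = static_cast<char>(std::tolower(uc));
            ++pos;
        } else {
            email.erase(pos, 1);  // the next character slides into pos
        }
    }
}
```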

Expert Comment

by:jkr
ID: 11686521
Gonna take a closer look tomorrow, it's approaching 5am here :o)