Solved

EASY: function to clean e-mail addresses

Posted on 2004-07-30
Last Modified: 2010-04-01
I'm trying to take an e-mail address, remove any invalid characters from it, and make it lower case.
The e-mail is a std::string. What is the easiest way to perform this operation? (I think the only valid symbols in an e-mail are - _ . and @)
Question by:jjacksn
 
LVL 86

Expert Comment

by:jkr
ID: 11679601
#include <string>
#include <ctype>
using namespace std;

strinsg strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

  char c = *i;

   if ( !isalnum(c) && '_' != c && '@' != c) {

      i = strAddr.erase(i); --i;

  } else *i = tolower(c);
}
 
LVL 86

Expert Comment

by:jkr
ID: 11679655
Ooops, a few corrections:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

string strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i);

 } else *i = tolower(c);
}
 
LVL 86

Expert Comment

by:jkr
ID: 11679675
And, finally, to wrap that up in a function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

char c = *i;

 if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

    i = strAddr.erase(i);

 } else *i = tolower(c);
}
}

 
LVL 5

Author Comment

by:jjacksn
ID: 11680110
what will i point to after strAddr.erase?

am I correct in thinking those are the only valid symbols?

isalnum does what and is in which header?
 
LVL 86

Expert Comment

by:jkr
ID: 11680234
>> what will i point to after strAddr.erase?

Um, sorry, but I don't get what you mean...

>>am I correct in thinking those are the only valid symbols?

Yes. The underscore, the dot and the @ IIRC are the only valid characters

>>isalnum does what and is in which header?

It checks whether a character is alphanumeric (A-Z, a-z, 0-9) and is declared in <cctype>. You can compile the above directly with all the headers; I tested it and have to add another correction. Here's the test code:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

int main () {

string strAddr = "Joe_User@som#ese!rver.com%";
string::iterator i;
string::iterator ib = strAddr.begin();

cout << strAddr << endl;


for ( i = ib; i != strAddr.end(); ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i); --i;

 } else *i = tolower(c);
}

cout << strAddr << endl;

}

and the function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();

for ( i = ib; i != strAddr.end(); ++i) {

char c = *i;

if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

   i = strAddr.erase(i);

} else *i = tolower(c);
}
}
 
LVL 5

Author Comment

by:jjacksn
ID: 11680407
the bottom function is then incorrect?  (it still doesn't have the --i)

What i meant before was:

what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?  

In other words, will i++ work correctly after that?  

I noticed you added --i. I'm guessing that's part of my answer?
 
LVL 30

Expert Comment

by:Axter
ID: 11680419
>>I'm trying to take an e-mail address, remove any invalid characters from it, and make it lower case.

What is the source of the email address?
What is the purpose of modifying the original email address?

Be aware that you can run into problems if the original address string comes in one of the following formats:
Experts Exchange [qna@experts-exchange.com]
<Experts Exchange>qna@experts-exchange.com
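
For strings in either of the two formats above, the bare address could be pulled out before any cleaning is done. This is only a minimal sketch: extract_bare_address is a hypothetical helper name, and it assumes the address is either enclosed in square brackets or follows the last '>'. It does not attempt to handle any other format.

```cpp
#include <string>

// Hypothetical helper: returns the bare address from the two formats
// shown above, or the input unchanged if neither pattern is found.
std::string extract_bare_address(const std::string& raw) {
    // Format 1: Display Name [user@host]
    const std::string::size_type open = raw.find('[');
    if (open != std::string::npos) {
        const std::string::size_type close = raw.find(']', open + 1);
        if (close != std::string::npos) {
            return raw.substr(open + 1, close - open - 1);
        }
    }
    // Format 2: <Display Name>user@host
    const std::string::size_type gt = raw.rfind('>');
    if (gt != std::string::npos) {
        return raw.substr(gt + 1);
    }
    // Neither pattern: assume the string is already a bare address.
    return raw;
}
```

The result can then be passed to whatever cleaning function is used on plain addresses.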
 
LVL 86

Accepted Solution

by:
jkr earned 500 total points
ID: 11680441
>> the bottom function is then incorrect?  (it still doesn't have the --i)

Argh, copy&paste :o)

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();

for ( i = ib; i != strAddr.end(); ++i) {

char c = *i;

if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

  i = strAddr.erase(i); --i;

} else *i = tolower(c);
}
}

>>what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?  

It points to the first element *after* the erased one, hence the '--i': without it, the '++i' in the loop would skip one character.
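
Since erase() already returns an iterator to the next character, an alternative is to advance the iterator only when nothing was erased, so no decrement is needed at all. The sketch below is one way to write that, not the code from this post: the function name canonicalize_no_decrement is hypothetical, and the cast to unsigned char is an added precaution because passing a negative char value to isalnum or tolower is undefined.

```cpp
#include <cctype>
#include <string>

// Sketch: erase() returns an iterator to the element after the erased one,
// so the loop increments only when the current character is kept.
void canonicalize_no_decrement(std::string& addr) {
    for (std::string::iterator i = addr.begin(); i != addr.end(); /* no ++i here */) {
        const unsigned char c = static_cast<unsigned char>(*i);
        const bool keep = std::isalnum(c) || c == '_' || c == '@' || c == '.';
        if (keep) {
            *i = static_cast<char>(std::tolower(c));
            ++i;                // move on to the next character
        } else {
            i = addr.erase(i);  // i already refers to the next character
        }
    }
}
```

Comparing against addr.end() on every pass matters here, because erase() invalidates any end iterator saved before the loop.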
 
LVL 5

Author Comment

by:jjacksn
ID: 11682595
hmm, that function seems to hang forever...?

      //make the e-mail lower case and remove invalid chars
      string::iterator i;
      string::iterator ib = email->begin();
      string::iterator ie = email->end();

      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                   i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }
 
LVL 86

Expert Comment

by:jkr
ID: 11682615
What are you feeding it as arguments? From the above snippet, it is hard to tell what is going on.
 
LVL 5

Author Comment

by:jjacksn
ID: 11682631
in the debugger, the value of email is

-email      0x09b45e61 {"}      std::basic_string<char,std::char_traits<char>,std::allocator<char> > *
I can't tell if email equals an empty string or a string with two single quotes. Either way, the for loop loops infinitely on this string. Any advice? (I'm pretty sure it's not the empty string, because it gets past a check I placed above the for loop: if(email->length() > 0))

OK, now I'm really confused: it's executing the else clause even if the "if" clause is hit? What is going on?


      if(email->length() > 0)
      {
      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                  i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }
      }
 
LVL 86

Expert Comment

by:jkr
ID: 11682661
But, what is it when you are calling the above function?
 
LVL 5

Author Comment

by:jjacksn
ID: 11682791
It's really hard to tell... but I'm guessing a blank string, or a double single quote. This is run in the middle of parsing like 5,000 addresses, so it's really hard to single it out.

Can you reproduce the bad behavior with a string of length zero or double single quotes? Maybe a single double quote?
 
LVL 5

Author Comment

by:jjacksn
ID: 11682997
also, isn't '-' valid as well?  (domains are allowed to have dashes, aren't they?).  
 
LVL 5

Author Comment

by:jjacksn
ID: 11683015
Ok, it happens when you have two single quotes. And I'm pretty sure the problem is related to the fact that the else clause is getting executed, because then *i is always being assigned to an invalid char. So, why is the else clause getting executed? (I feel like I'm taking crazy pills)

      for ( i = ib; i != ie; i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }
 
LVL 86

Expert Comment

by:jkr
ID: 11685246
Tested it again with empty strings - cannot reproduce that - you are right about the dash though, just add it to the 'if' statement.
 
LVL 5

Author Comment

by:jjacksn
ID: 11686097
not an empty string, a string with two single quotes.
 
LVL 86

Expert Comment

by:jkr
ID: 11686469
>>a string with two single quotes

Like

string strAddr = "\"\"";

?

Works also :-(
 
LVL 5

Author Comment

by:jjacksn
ID: 11686509
in mine, it doesn't work for some reason.... no, like

string strAddr = "''";

The change to the for loop of

      for ( string::iterator i = email->begin(); i != email->end(); i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c && '-' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }

has fixed the problem though
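
One plausible reading of the hang, offered as an assumption rather than something confirmed in this thread: string::erase invalidates iterators into the string, so a saved ie no longer marks the end, and when the very first character is erased, i-- steps before begin(), which is undefined behavior. A string like "''" runs into both. The sketch below avoids the iterator bookkeeping entirely by using std::remove_if and std::transform. All three function names are hypothetical, and the allowed character set simply mirrors the check in the loop above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: true for characters the loop above would erase.
bool is_disallowed_email_char(char ch) {
    const unsigned char c = static_cast<unsigned char>(ch);
    return !std::isalnum(c) && c != '_' && c != '@' && c != '.' && c != '-';
}

// Hypothetical helper: lower-cases one char without passing a negative
// value to tolower.
char to_lower_char(char ch) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// Sketch: drop disallowed characters in one pass, then lower-case the rest.
void canonicalize_with_remove_if(std::string& addr) {
    addr.erase(std::remove_if(addr.begin(), addr.end(), is_disallowed_email_char),
               addr.end());
    std::transform(addr.begin(), addr.end(), addr.begin(), to_lower_char);
}
```

With this shape there is no iterator to decrement, so strings that start with an invalid character, or consist only of invalid characters, need no special handling.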
 
LVL 86

Expert Comment

by:jkr
ID: 11686521
Gonna take a closer look tomorrow, it's approaching 5am here :o)