Solved

EASY: function to clean e-mail addresses

Posted on 2004-07-30
Last Modified: 2010-04-01
I'm trying to take an e-mail address, remove any invalid characters from it, and make it lower case.
The e-mail is a std::string.  What is the easiest way to perform this operation?  (I think the only valid symbols in an e-mail are - _ . and @.)
Question by:jjacksn
 
LVL 86

Expert Comment

by:jkr
ID: 11679601
#include <string>
#include <cctype>
using namespace std;

string strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

  char c = *i;

   if ( !isalnum(c) && '_' != c && '@' != c) {

      i = strAddr.erase(i); --i;

  } else *i = tolower(c);
}
0
 
LVL 86

Expert Comment

by:jkr
ID: 11679655
Ooops, a few corrections:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

string strAddr = "Joe_User@someserver.com";
string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i);

 } else *i = tolower(c);
}
0
 
LVL 86

Expert Comment

by:jkr
ID: 11679675
And, finally, to wrap that up in a function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();
string::iterator ie = strAddr.end();

for ( i = ib; i != ie; ++i) {

char c = *i;

 if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

    i = strAddr.erase(i);

 } else *i = tolower(c);
}
}
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11680110
what will i point to after strAddr.erase?

am I correct in thinking those are the only valid symbols?

isalnum does what and is in which header?
0
 
LVL 86

Expert Comment

by:jkr
ID: 11680234
>> what will i point to after strAddr.erase?

Um, sorry, but I don't get what you mean...

>>am I correct in thinking those are the only valid symbols?

Yes. The underscore, the dot and the @ IIRC are the only valid characters

>>isalnum does what and is in which header?

It checks whether a character is alphanumeric, i.e. A-Z, a-z, 0-9, and is declared in 'cctype'. You can directly compile the above with all the headers. I tested it and have to add another correction. Here's the test code:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

int main () {

string strAddr = "Joe_User@som#ese!rver.com%";
string::iterator i;
string::iterator ib = strAddr.begin();

cout << strAddr << endl;


for ( i = ib; i != strAddr.end(); ++i) {

 char c = *i;

  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

     i = strAddr.erase(i); --i;

 } else *i = tolower(c);
}

cout << strAddr << endl;

}

and the function:

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();

for ( i = ib; i != strAddr.end(); ++i) {

char c = *i;

if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

   i = strAddr.erase(i);

} else *i = tolower(c);
}
}
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11680407
the bottom function is then incorrect?  (it still doesn't have the --i)

What i meant before was:

what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?  

In other words, will i++ work correctly after that?  

I noticed you added --i?  I'm guessing that's part of my answer?
0
 
LVL 30

Expert Comment

by:Axter
ID: 11680419
>>I'm trying to take an e-mail address, remove any invalid characters from it, and make it lower case.

What is the source of the email address?
What is the purpose of modifying the original email address?

Be aware that you can run into problems if the original address string comes in the following formats:
Experts Exchange [qna@experts-exchange.com]
<Experts Exchange>qna@experts-exchange.com
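A minimal sketch of stripping the display name before any character filtering, assuming only the two formats above occur. The name extract_address is hypothetical and this is not a general address parser:

```cpp
#include <string>

// Hypothetical helper: pull the bare address out of the two display-name
// formats listed above. Any other input is returned unchanged.
std::string extract_address(const std::string& raw) {
    // Format 1: Display Name [user@host]
    std::string::size_type open = raw.find('[');
    std::string::size_type close = raw.rfind(']');
    if (open != std::string::npos && close != std::string::npos && open < close) {
        return raw.substr(open + 1, close - open - 1);
    }
    // Format 2: <Display Name>user@host
    std::string::size_type gt = raw.rfind('>');
    if (gt != std::string::npos && gt + 1 < raw.size()) {
        return raw.substr(gt + 1);
    }
    return raw;
}
```

Without a step like this, a filter that only drops invalid characters would glue the display name onto the address (for example "expertsexchangeqna@experts-exchange.com").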
0
 
LVL 86

Accepted Solution

by:
jkr earned 500 total points
ID: 11680441
>> the bottom function is then incorrect?  (it still doesn't have the --i)

Argh, copy&paste :o)

#include <string>
#include <cctype>
#include <iostream>
using namespace std;

void canonicalize_email_addr ( string& strAddr) {

string::iterator i;
string::iterator ib = strAddr.begin();

for ( i = ib; i != strAddr.end(); ++i) {

char c = *i;

if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {

  i = strAddr.erase(i); --i;

} else *i = tolower(c);
}
}

>>what does the string::iterator point to after calling strAddr.erase(i);  <-- what will i be pointing to?  

It points to the 1st element *after* the erased one - thus the '--i', since the '++i' in the loop would skip one character...
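One caveat with the '--i' approach: if the very first character is erased, 'i' equals begin() and decrementing it is not guaranteed to work. A sketch of an alternative that only advances the iterator when nothing was erased, so no decrement is needed. The function name is made up for illustration, and the unsigned char cast is my addition (isalnum/tolower are undefined for negative char values):

```cpp
#include <cctype>
#include <string>

// Sketch: same filtering as above, but the iterator is advanced only when
// nothing was erased, so no '--i' is needed.
void canonicalize_email_addr_noskip(std::string& strAddr) {
    std::string::iterator i = strAddr.begin();
    while (i != strAddr.end()) {          // end() is re-read on every pass
        // Cast first: isalnum/tolower are undefined for negative char values.
        unsigned char c = static_cast<unsigned char>(*i);
        if (!std::isalnum(c) && c != '_' && c != '@' && c != '.') {
            i = strAddr.erase(i);         // i now refers to the next character
        } else {
            *i = static_cast<char>(std::tolower(c));
            ++i;
        }
    }
}
```

Called on "Joe_User@som#ese!rver.com%", this should leave "joe_user@someserver.com" in the string.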
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11682595
hmm, that function seems to hang forever...?

      //make the e-mail lower case and remove invalid chars
      string::iterator i;
      string::iterator ib = email->begin();
      string::iterator ie = email->end();

      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                   i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }
0
 
LVL 86

Expert Comment

by:jkr
ID: 11682615
What are you feeding it as arguments? From the above snippet, it is hard to tell what is going on.
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11682631
in the debugger, the value of email is

-email      0x09b45e61 {"}      std::basic_string<char,std::char_traits<char>,std::allocator<char> > *
I can't tell if email equals an empty string or a string with two single quotes.  Either way, the for loop loops infinitely on this string.  Any advice?  (I'm pretty sure it's not the empty string, because it gets past a check I placed above the for loop: if(email->length() > 0).)

OK, now I'm really confused: it's executing the else clause even if the "if" clause is hit?  What is going on?


      if(email->length() > 0)
      {
      for ( i = ib; i != ie; i++) {
            char c = *i;
            if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                  i = email->erase(i);
                  i--;
            }
            else
            {
                  *i = tolower(c);
            }
      }
      }
0
 
LVL 86

Expert Comment

by:jkr
ID: 11682661
But, what is it when you are calling the above function?
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11682791
It's really hard to tell... but I'm guessing a blank string, or a double single quote.  This is run in the middle of parsing like 5,000 addresses, so it's really hard to single it out.

Can you reproduce the bad behavior with a string of length zero or double single quotes?  Maybe a single double quote?
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11682997
also, isn't '-' valid as well?  (domains are allowed to have dashes, aren't they?).  
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11683015
OK, it happens when you have two single quotes.  And I'm pretty sure the problem is related to the fact that the else clause is getting executed, because then *i is always being assigned to an invalid char.  So, why is the else clause getting executed?  (I feel like I'm taking crazy pills)

      for ( i = ib; i != ie; i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }
0
 
LVL 86

Expert Comment

by:jkr
ID: 11685246
Tested it again with empty strings - cannot reproduce that - you are right about the dash though, just add it to the 'if' statement.
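As a sketch, the character test (now including the dash) could be pulled into a small helper so the 'if' stays readable. The name is_allowed_email_char is hypothetical, and this only covers the characters discussed in this thread, not everything the mail standards permit:

```cpp
#include <cctype>

// Hypothetical helper: true for the characters this thread treats as valid
// (letters, digits, '_', '@', '.', '-'). Not a full address validity check.
bool is_allowed_email_char(char ch) {
    // Cast first: isalnum is undefined for negative char values.
    unsigned char c = static_cast<unsigned char>(ch);
    return std::isalnum(c) || c == '_' || c == '@' || c == '.' || c == '-';
}
```

The loop condition would then read: if ( !is_allowed_email_char(c) ) { ... }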
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11686097
not an empty string, a string with two single quotes.
0
 
LVL 86

Expert Comment

by:jkr
ID: 11686469
>>a string with two single quotes

Like

string strAddr = "\"\"";

?

Works also :-(
0
 
LVL 5

Author Comment

by:jjacksn
ID: 11686509
in mine, it doesn't work for some reason.... no, like

string strAddr = "''";

The change to the for loop of

      for ( string::iterator i = email->begin(); i != email->end(); i++) {
                  char c = *i;
                  if ( !isalnum(c) && '_' != c && '@' != c && '.' != c && '-' != c) {
                        i = email->erase(i);
                        i--;
                  }
                  else
                  {
                        *i = tolower(c);
                  }
            }

has fixed the problem though
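A likely reason: erase() invalidates iterators at and after the erased position, including an end() iterator stored earlier, so comparing against a cached 'ie' is not reliable, while calling email->end() on every pass is. A sketch that avoids the manual iterator bookkeeping altogether with std::remove_if and std::transform (the names clean_email, is_invalid_email_char and to_lower_char are made up for illustration):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Predicate: characters to drop (anything outside letters, digits, _ @ . -).
bool is_invalid_email_char(char ch) {
    unsigned char c = static_cast<unsigned char>(ch);
    return !(std::isalnum(c) || c == '_' || c == '@' || c == '.' || c == '-');
}

// tolower wrapper that is safe for negative char values.
char to_lower_char(char ch) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

void clean_email(std::string& email) {
    // remove_if shifts the kept characters to the front and returns the new
    // logical end; a single erase then trims the leftover tail.
    email.erase(std::remove_if(email.begin(), email.end(), is_invalid_email_char),
                email.end());
    std::transform(email.begin(), email.end(), email.begin(), to_lower_char);
}
```

With this version a string of two single quotes simply ends up empty, since no iterator is ever decremented or compared against a stale end().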
0
 
LVL 86

Expert Comment

by:jkr
ID: 11686521
Gonna take a closer look tomorrow, it's approaching 5am here :o)
0
