Solved

EMail Address parser/validator

Posted on 2001-07-19
Last Modified: 2013-11-23
I am looking for source code (preferably in Java) for a class that will parse and validate email addresses for RFC (822, I think) conformance.

Not asking for anyone to write one, just want to know if anyone knows of one.

Cheers.
Question by:ozymandias
LVL 19

Expert Comment

by:Jim Cakalic
ID: 6299893
I have one that I wrote using IBM's regex4j. Not sure that it is exactly RFC822 compliant because I borrowed the algorithm -- not the code -- from elsewhere.

Originally, I thought I would be able to use the parse method of the javax.mail.internet.InternetAddress class. The javadoc says that addresses parsed by the method must follow the RFC822 syntax. I don't believe this is true: one illegal address form that I know it parses without throwing an exception is name@domain..com. AFAIK, a subdomain cannot be empty. Unfortunately, it is not until you send a Message to the InternetAddress object on a Transport that you get an AddressException. A little too late for my tastes.
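
For illustration, the empty-subdomain case can be caught before any Transport is involved by splitting the domain part on dots. This is only a minimal sketch: the class EmptyLabelCheck and the method hasEmptyLabel are hypothetical names, not part of JavaMail or regex4j, and the check covers this one rule rather than RFC822 as a whole.

```java
public class EmptyLabelCheck {
    // Hypothetical helper: true if the domain contains an empty label,
    // i.e. a leading dot, a trailing dot, or two dots in a row.
    public static boolean hasEmptyLabel(String domain) {
        // A limit of -1 keeps trailing empty strings, so "a." yields ["a", ""].
        for (String label : domain.split("\\.", -1)) {
            if (label.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasEmptyLabel("domain..com"));
        System.out.println(hasEmptyLabel("domain.com"));
    }
}
```

A check like this could run on the text after the last '@' before the address is handed to InternetAddress.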

Anyway, holler if you want it and I'll post it. I don't think it would be too difficult to change from regex4j to some other regular expression toolkit.

Best regards,
Jim Cakalic
 
LVL 15

Author Comment

by:ozymandias
ID: 6300363
Jim, that would be cool if you could post the code.
I started reading the rfc and getting legal and illegal characters for each part of the address etc etc....and I thought, hey, this is a job for someone else....and then I thought, hey, this almost certainly has been a job for someone else, so why not find them ?

Cheers,
Ozy.
 
LVL 19

Accepted Solution

by:
Jim Cakalic earned 50 total points
ID: 6300420
OK. Here it is, for what it's worth. You'll need the regex4j package from IBM:
    http://www.alphaworks.ibm.com/tech/regex4j

I'll post as just a method that you can throw into a class of your choosing. First, though, you'll need some constant Strings:

    private static final String basicAddress = "^([[:ascii:]]+)@([[:ascii:]]+)$";
    private static final String specialChars = "\\(\\)><@,;:\\\\\\\"\\.\\[\\]";
    private static final String validChars = "[^ \f\n\r\t" + specialChars + "]";
    private static final String atom = validChars + "+";
    private static final String quotedUser = "(\"[^\"]+\")";
    private static final String word = "(" + atom + "|" + quotedUser + ")";
    private static final String validUser = "^" + word + "(\\." + word + ")*$";
    private static final String symDomain = "^" + atom + "(\\." + atom + ")+$";
    private static final String ipDomain = "^\\[(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\]$";
    private static final String knownTLDs = "^\\.(com|net|org|edu|int|mil|gov|arpa|biz|aero|name|coop|info|pro|museum)$";

And here's the method:

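    // Requires: import javax.mail.internet.AddressException; plus the regex4j RegularExpression and Match classes.
    public static void validate(String addr) throws javax.mail.internet.AddressException {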
        if (addr == null) {
            throw new AddressException("Address argument is null");
        }
        addr = addr.trim();
        if (addr.length() == 0) {
            throw new AddressException("Address argument is empty (0-length)");
        }

        // basic address check
        RegularExpression re = new RegularExpression(basicAddress);
        Match match = new Match();
        if (re.matches(addr, match) == false) {
            throw new AddressException("'" + addr + "' does not look like an internet email address: a@b.c");
        }
        String userPart = match.getCapturedText(1);
        String domainPart = match.getCapturedText(2);

        // user address check
        re.setPattern(validUser);
        if (re.matches(userPart) == false) {
            throw new AddressException("User name '" + userPart + "' violates address syntax");
        }

        // first ip domain check
        re.setPattern(ipDomain);
        if (re.matches(domainPart, match)) {
            // if the pattern matched, groups 1-4 hold the four octets (group 0 is the whole match)
            for (int i = 1; i < 5; ++i) {
                String num = match.getCapturedText(i);
                // parse once and reuse the value for the range check
                int n = Integer.parseInt(num);
                if (n > 255) {
                    throw new AddressException("The specified ip address octet '" + num + "' is not valid");
                }
            }
        } else {
            // symbolic domain check if not ip domain
            re.setPattern(symDomain);
            if (re.matches(domainPart, match)) {
                String tld = match.getCapturedText(match.getNumberOfGroups() - 1);
                re.setPattern(knownTLDs);
                // permit top-level-domains of 3 (includes dot separator) because these could be
                // country codes. perhaps add check for valid countries? lots of maintenance there.
                if (tld.length() != 3 && re.matches(tld) == false) {
                    throw new AddressException("Top level domain '" + tld + "' is not recognized");
                }
            } else {
                throw new AddressException("Domain name '" + domainPart + "' violates address syntax");
            }
        }
        // all tests passed
        return;
    }


Hope this works for you -- or is at least a start.
Jim
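
For readers without regex4j, here is a rough sketch of the same checks using java.util.regex from the standard library. It is an adaptation under assumptions, not Jim's code: the class EmailSyntaxSketch and the method check are hypothetical names, [[:ascii:]] is replaced with \p{ASCII} because java.util.regex does not accept the bracketed POSIX form, and problems are reported as a returned message instead of an AddressException so the example has no JavaMail dependency. The TLD list is copied unchanged from the constants above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailSyntaxSketch {
    // Same building blocks as the regex4j constants, written for java.util.regex.
    private static final String SPECIALS = "\\(\\)><@,;:\\\\\\\"\\.\\[\\]";
    private static final String ATOM = "[^ \\f\\n\\r\\t" + SPECIALS + "]+";
    private static final String WORD = "(?:" + ATOM + "|\"[^\"]+\")";

    private static final Pattern BASIC = Pattern.compile("^(\\p{ASCII}+)@(\\p{ASCII}+)$");
    private static final Pattern USER = Pattern.compile("^" + WORD + "(?:\\." + WORD + ")*$");
    private static final Pattern SYM_DOMAIN = Pattern.compile("^" + ATOM + "(?:\\." + ATOM + ")+$");
    private static final Pattern IP_DOMAIN =
        Pattern.compile("^\\[(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\]$");
    private static final Pattern KNOWN_TLDS = Pattern.compile(
        "^\\.(com|net|org|edu|int|mil|gov|arpa|biz|aero|name|coop|info|pro|museum)$");

    /** Returns null if the address passes every check, otherwise a short description of the first failure. */
    public static String check(String addr) {
        if (addr == null || addr.trim().isEmpty()) {
            return "address is null or empty";
        }
        Matcher basic = BASIC.matcher(addr.trim());
        if (!basic.matches()) {
            return "does not look like user@domain";
        }
        String user = basic.group(1);
        String domain = basic.group(2);
        if (!USER.matcher(user).matches()) {
            return "user part violates address syntax";
        }
        // Bracketed IP literal: each of the four octets must be at most 255.
        Matcher ip = IP_DOMAIN.matcher(domain);
        if (ip.matches()) {
            for (int i = 1; i <= 4; i++) {
                if (Integer.parseInt(ip.group(i)) > 255) {
                    return "ip octet '" + ip.group(i) + "' is out of range";
                }
            }
            return null;
        }
        if (!SYM_DOMAIN.matcher(domain).matches()) {
            return "domain violates address syntax";
        }
        // As in the original: a 3-character TLD (dot plus two letters) is let through as a possible country code.
        String tld = domain.substring(domain.lastIndexOf('.'));
        if (tld.length() != 3 && !KNOWN_TLDS.matcher(tld).matches()) {
            return "top level domain '" + tld + "' is not recognized";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check("name@domain.com"));
        System.out.println(check("name@domain..com"));
    }
}
```

Returning a message keeps the sketch self-contained; in code that already depends on JavaMail, throwing AddressException at each failure point as the original method does would be the closer match.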
 
LVL 15

Author Comment

by:ozymandias
ID: 6301389
Yup, that's a great start Jim.
Thanks for that.
I will have to get that IBM package and check it out.
I'm already getting a flashback to my days with sed, awk and perl.....mummy, I'm home......... :0)
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 6302751
Regular expressions are your friends! Can you imagine the amount of code it would take to perform this validation _without_ regex? The parse method (and submethods) of InternetAddress is somewhere on the order of 300 lines of code. Using regular expressions may not be as fast, but it definitely makes the resulting code more readable and maintainable -- IMHO. Have fun.
Jim
 
LVL 15

Author Comment

by:ozymandias
ID: 6304384
Thanks, Jim.
This will do fine for now, and I can refine it as I go along.

I completely concur on the "friendliness" of RE.
They are my friends!

<musical interlude>
    who put the RE in gREp...?
</musical interlude>

They are just old friends who I have been a bit remiss about keeping in touch with, except for the odd Christmas card.

Cheers :0)