Solved

EMail Address parser/validator

Posted on 2001-07-19
Last Modified: 2013-11-23
I am looking for source code (preferably in Java) for a class that will parse and validate email addresses for RFC (822, I think) conformance.

Not asking for anyone to write one, just want to know if anyone knows of one.

Cheers.
Question by:ozymandias
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 6299893
I have one that I wrote using IBM's regex4j. Not sure that it is exactly RFC822 compliant because I borrowed the algorithm -- not the code -- from elsewhere.

Originally, I thought I would be able to use the parse method of the javax.mail.internet.InternetAddress class. The javadoc says that addresses parsed by the method must follow the RFC822 syntax. I don't believe this to be true, as one illegal address form that I know it parses without exception is: name@domain..com. AFAIK, a subdomain cannot be empty. Unfortunately, it is not until you send a Message to the InternetAddress object on a Transport that you get an AddressException. A little too late for my tastes.
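
To make the empty-subdomain point concrete, here is a minimal sketch using java.util.regex (not regex4j, and not InternetAddress). The class name EmptySubdomainCheck, the method isDottedDomain and the simplified LABEL pattern are made up for illustration; the idea is only that each dot-separated label must contain at least one character.

```java
import java.util.regex.Pattern;

public class EmptySubdomainCheck {
    // Hypothetical simplified label: one or more characters that are not
    // whitespace, a dot, or '@'. A real RFC 822 atom excludes more characters.
    private static final String LABEL = "[^\\s.@]+";

    // A domain is a label followed by at least one ".label" group.
    private static final Pattern DOMAIN =
            Pattern.compile(LABEL + "(\\." + LABEL + ")+");

    public static boolean isDottedDomain(String domain) {
        // matches() must consume the whole string, so an empty label fails.
        return DOMAIN.matcher(domain).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDottedDomain("domain.com"));   // prints "true"
        System.out.println(isDottedDomain("domain..com"));  // prints "false"
    }
}
```

Because every label needs at least one character, the two consecutive dots in domain..com leave an empty label and the match fails.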

Anyway, holler if you want it and I'll post it. I don't think it would be too difficult to change from regex4j to some other regular expression toolkit.

Best regards,
Jim Cakalic
 
LVL 15

Author Comment

by:ozymandias
ID: 6300363
Jim, that would be cool if you could post the code.
I started reading the rfc and getting legal and illegal characters for each part of the address etc etc....and I thought, hey, this is a job for someone else....and then I thought, hey, this almost certainly has been a job for someone else, so why not find them ?

Cheers,
Ozy.
 
LVL 19

Accepted Solution

by:
Jim Cakalic earned 50 total points
ID: 6300420
OK. Here it is, for what it's worth. You'll need the regex4j package from IBM:
    http://www.alphaworks.ibm.com/tech/regex4j

I'll post as just a method that you can throw into a class of your choosing. First, though, you'll need some constant Strings:

    private static final String basicAddress = "^([[:ascii:]]+)@([[:ascii:]]+)$";
    private static final String specialChars = "\\(\\)><@,;:\\\\\\\"\\.\\[\\]";
    private static final String validChars = "[^ \f\n\r\t" + specialChars + "]";
    private static final String atom = validChars + "+";
    private static final String quotedUser = "(\"[^\"]+\")";
    private static final String word = "(" + atom + "|" + quotedUser + ")";
    private static final String validUser = "^" + word + "(\\." + word + ")*$";
    private static final String symDomain = "^" + atom + "(\\." + atom + ")+$";
    private static final String ipDomain = "^\\[(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\]$";
    private static final String knownTLDs = "^\\.(com|net|org|edu|int|mil|gov|arpa|biz|aero|name|coop|info|pro|museum)$";

And here's the method:

    // needs: import javax.mail.internet.AddressException; (plus the regex4j classes)
    public static void validate(String addr) throws AddressException {
        if (addr == null) {
            throw new AddressException("Address argument is null");
        }
        addr = addr.trim();
        if (addr.length() == 0) {
            throw new AddressException("Address argument is empty (0-length)");
        }

        // basic address check
        RegularExpression re = new RegularExpression(basicAddress);
        Match match = new Match();
        if (re.matches(addr, match) == false) {
            throw new AddressException("'" + addr + "' does not look like an internet email address: a@b.c");
        }
        String userPart = match.getCapturedText(1);
        String domainPart = match.getCapturedText(2);

        // user address check
        re.setPattern(validUser);
        if (re.matches(userPart) == false) {
            throw new AddressException("User name '" + userPart + "' violates address syntax");
        }

        // first ip domain check
        re.setPattern(ipDomain);
        if (re.matches(domainPart, match)) {
            // if the pattern matched, there _must_ be 5 groups:
            // the whole match plus the four octets in groups 1-4
            for (int i = 1; i < 5; ++i) {
                String num = match.getCapturedText(i);
                int n = Integer.parseInt(num);
                // each octet of a dotted-quad address must be in the range 0-255
                if (n > 255) {
                    throw new AddressException("The specified ip address '" + domainPart + "' is not valid: octet '" + num + "' exceeds 255");
                }
            }
        } else {
            // symbolic domain check if not ip domain
            re.setPattern(symDomain);
            if (re.matches(domainPart, match)) {
                String tld = match.getCapturedText(match.getNumberOfGroups() - 1);
                re.setPattern(knownTLDs);
                // permit top-level-domains of 3 (includes dot separator) because these could be
                // country codes. perhaps add check for valid countries? lots of maintenance there.
                if (tld.length() != 3 && re.matches(tld) == false) {
                    throw new AddressException("Top level domain '" + tld + "' is not recognized");
                }
            } else {
                throw new AddressException("Domain name '" + domainPart + "' violates address syntax");
            }
        }
        // all tests passed
        return;
    }
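
For anyone without regex4j, the same checks can probably be ported to java.util.regex along these lines. This is a rough sketch under several assumptions, not a drop-in replacement: the class name EmailSyntaxSketch and the method isValid are invented here, it returns a boolean instead of throwing AddressException so that it has no JavaMail dependency, it uses \p{ASCII} in place of [[:ascii:]] and \s for the whitespace list, and it leaves out the known-TLD check because that list needs ongoing maintenance.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class EmailSyntaxSketch {
    // An atom: one or more characters that are not whitespace or one of the
    // special characters ( ) < > @ , ; : \ " . [ ]
    private static final String ATOM = "[^\\s()<>@,;:\\\\\".\\[\\]]+";
    // A quoted user part such as "john doe"
    private static final String QUOTED = "\"[^\"]+\"";
    private static final String WORD = "(?:" + ATOM + "|" + QUOTED + ")";

    // matches() anchors at both ends, so no explicit ^ and $ are needed.
    private static final Pattern BASIC =
            Pattern.compile("(\\p{ASCII}+)@(\\p{ASCII}+)");
    private static final Pattern USER =
            Pattern.compile(WORD + "(?:\\." + WORD + ")*");
    private static final Pattern SYM_DOMAIN =
            Pattern.compile(ATOM + "(?:\\." + ATOM + ")+");
    private static final Pattern IP_DOMAIN = Pattern.compile(
            "\\[(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\]");

    private EmailSyntaxSketch() { }

    public static boolean isValid(String addr) {
        if (addr == null) {
            return false;
        }
        // Split into user part and domain part at the last '@'.
        Matcher basic = BASIC.matcher(addr.trim());
        if (!basic.matches()) {
            return false;
        }
        String userPart = basic.group(1);
        String domainPart = basic.group(2);
        if (!USER.matcher(userPart).matches()) {
            return false;
        }
        // Bracketed IP domain: every octet must be 0-255.
        Matcher ip = IP_DOMAIN.matcher(domainPart);
        if (ip.matches()) {
            for (int i = 1; i <= 4; i++) {
                if (Integer.parseInt(ip.group(i)) > 255) {
                    return false;
                }
            }
            return true;
        }
        // Otherwise require a symbolic domain with at least two labels.
        return SYM_DOMAIN.matcher(domainPart).matches();
    }
}
```

With this sketch, isValid("name@domain.com") and isValid("user@[192.168.0.1]") return true, while isValid("name@domain..com") and isValid("user@[300.1.1.1]") return false.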


Hope this works for you -- or is at least a start.
Jim
 
LVL 15

Author Comment

by:ozymandias
ID: 6301389
Yup, that's a great start Jim.
Thanks for that.
I will have to get that IBM package and check it out.
I'm already getting a flashback to my days with sed, awk and perl.....mummy, I'm home......... :0)
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 6302751
Regular expressions are your friends! Can you imagine the amount of code it would take to perform this validation _without_ regex? The parse method (and submethods) of InternetAddress is somewhere on the order of 300 lines of code. Using regular expressions may not be as fast, but it definitely makes the resulting code more readable and maintainable -- IMHO. Have fun.
Jim
 
LVL 15

Author Comment

by:ozymandias
ID: 6304384
Thanks, Jim.
This will do fine for now, and I can refine it as I go along.

I completely concur on the "friendliness" of RE.
They are my friends!

<musical interlude>
    who put the RE in gREp...?
</musical interlude>

They are just old friends who I have been a bit remiss about keeping in touch with, except for the odd Christmas card.

Cheers :0)