Email address parser/validator

I am looking for source code (preferably in Java) for a class that will parse and validate email addresses for RFC (822, I think) conformance.

Not asking for anyone to write one, just want to know if anyone knows of one.

Cheers.
Asked by: ozymandias
Jim Cakalic (Senior Developer/Architect) commented:
OK. Here it is, for what it's worth. You'll need the regex4j package from IBM:
    http://www.alphaworks.ibm.com/tech/regex4j

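I'll post it as just a method that you can throw into a class of your choosing (the class will need to import javax.mail.internet.AddressException, plus the RegularExpression and Match classes from regex4j). First, though, you'll need some constant Strings: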

    private static final String basicAddress = "^([[:ascii:]]+)@([[:ascii:]]+)$";
    private static final String specialChars = "\\(\\)><@,;:\\\\\\\"\\.\\[\\]";
    private static final String validChars = "[^ \f\n\r\t" + specialChars + "]";
    private static final String atom = validChars + "+";
    private static final String quotedUser = "(\"[^\"]+\")";
    private static final String word = "(" + atom + "|" + quotedUser + ")";
    private static final String validUser = "^" + word + "(\\." + word + ")*$";
    private static final String symDomain = "^" + atom + "(\\." + atom + ")+$";
    private static final String ipDomain = "^\\[(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\]$";
    private static final String knownTLDs = "^\\.(com|net|org|edu|int|mil|gov|arpa|biz|aero|name|coop|info|pro|museum)$";
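None of the user-part constants depends on regex4j-specific syntax, so they compose the same way in the standard java.util.regex package. The sketch below is an illustrative port, not Jim's code; UserPartCheck and isValidUser are hypothetical names. It shows how an atom, a quoted string and a word build up into the dot-separated local part:

```java
import java.util.regex.Pattern;

// Illustrative port of the posted user-part constants to java.util.regex
// (assumption: the original targets IBM's regex4j).
final class UserPartCheck {
    // RFC 822 "specials" that may not appear unquoted: ( ) < > @ , ; : \ " . [ ]
    static final String SPECIAL_CHARS = "\\(\\)><@,;:\\\\\\\"\\.\\[\\]";
    // one character that is neither whitespace nor a special
    static final String VALID_CHARS = "[^ \f\n\r\t" + SPECIAL_CHARS + "]";
    // an atom is a run of one or more such characters
    static final String ATOM = VALID_CHARS + "+";
    // a quoted string: one or more non-quote characters inside double quotes
    static final String QUOTED_USER = "(\"[^\"]+\")";
    // a word is either an atom or a quoted string
    static final String WORD = "(" + ATOM + "|" + QUOTED_USER + ")";
    // the local part is one or more words separated by single dots
    static final Pattern VALID_USER = Pattern.compile("^" + WORD + "(\\." + WORD + ")*$");

    static boolean isValidUser(String user) {
        return VALID_USER.matcher(user).matches();
    }
}
```

With this composition, john.doe and "john doe".smith pass, while .john, john..doe and john doe (unquoted space) are rejected.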

And here's the method:

    public static void validate(String addr) throws javax.mail.internet.AddressException {
        if (addr == null) {
            throw new AddressException("Address argument is null");
        }
        addr = addr.trim();
        if (addr.length() == 0) {
            throw new AddressException("Address argument is empty (0-length)");
        }

        // basic address check
        RegularExpression re = new RegularExpression(basicAddress);
        Match match = new Match();
        if (re.matches(addr, match) == false) {
            throw new AddressException("'" + addr + "' does not look like an internet email address: a@b.c");
        }
        String userPart = match.getCapturedText(1);
        String domainPart = match.getCapturedText(2);

        // user address check
        re.setPattern(validUser);
        if (re.matches(userPart) == false) {
            throw new AddressException("User name '" + userPart + "' violates address syntax");
        }

        // IP-literal domain check, e.g. [192.168.0.1]
        re.setPattern(ipDomain);
        if (re.matches(domainPart, match)) {
            // on a match, groups 1-4 hold the four octets (group 0 is the whole match)
            for (int i = 1; i <= 4; ++i) {
                String num = match.getCapturedText(i);
                // \d{1,3} admits values up to 999, so each octet needs a range check
                if (Integer.parseInt(num) > 255) {
                    throw new AddressException("IP address '" + domainPart + "' has an invalid octet '" + num + "'");
                }
            }
        } else {
            // symbolic domain check if not ip domain
            re.setPattern(symDomain);
            if (re.matches(domainPart, match)) {
                String tld = match.getCapturedText(match.getNumberOfGroups() - 1);
                re.setPattern(knownTLDs);
                // permit any top-level domain of length 3 (two characters plus the dot separator)
                // because it could be a country code. Checking against the list of country
                // codes would be stricter, but that is a lot of maintenance.
                if (tld.length() != 3 && !re.matches(tld)) {
                    throw new AddressException("Top level domain '" + tld + "' is not recognized");
                }
            } else {
                throw new AddressException("Domain name '" + domainPart + "' violates address syntax");
            }
        }
        // all tests passed
    }


Hope this works for you -- or is at least a start.
Jim
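For readers without regex4j, the domain half of the method above can be sketched with java.util.regex. This is an illustrative port under stated assumptions, not Jim's code: DomainCheck and check are hypothetical names, and it returns an error message (null on success) instead of throwing AddressException, so it has no JavaMail dependency.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative port of the domain-part checks in validate() to java.util.regex.
// Returns null when the domain passes, otherwise a message.
final class DomainCheck {
    // one or more characters that are neither whitespace nor RFC 822 specials
    private static final String ATOM = "[^ \f\n\r\t\\(\\)><@,;:\\\\\\\"\\.\\[\\]]+";
    private static final Pattern SYM_DOMAIN =
            Pattern.compile("^" + ATOM + "(\\." + ATOM + ")+$");
    private static final Pattern IP_DOMAIN =
            Pattern.compile("^\\[(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\]$");
    private static final Pattern KNOWN_TLDS = Pattern.compile(
            "^\\.(com|net|org|edu|int|mil|gov|arpa|biz|aero|name|coop|info|pro|museum)$");

    static String check(String domain) {
        Matcher ip = IP_DOMAIN.matcher(domain);
        if (ip.matches()) {
            // groups 1..4 are the octets; \d{1,3} allows up to 999, so range-check each
            for (int i = 1; i <= 4; i++) {
                if (Integer.parseInt(ip.group(i)) > 255) {
                    return "IP address '" + domain + "' has an invalid octet '" + ip.group(i) + "'";
                }
            }
            return null;
        }
        Matcher sym = SYM_DOMAIN.matcher(domain);
        if (!sym.matches()) {
            return "Domain name '" + domain + "' violates address syntax";
        }
        // for a repeated group, java.util.regex keeps the last iteration, e.g. ".com"
        String tld = sym.group(1);
        // any two-character TLD (length 3 with the dot) passes as a possible country code
        if (tld.length() != 3 && !KNOWN_TLDS.matcher(tld).matches()) {
            return "Top level domain '" + tld + "' is not recognized";
        }
        return null;
    }
}
```

Under these rules example.com, mail.example.co.uk and [192.168.0.1] pass, while [192.168.0.256], example..com and example.xyz are rejected (the last because .xyz is neither two characters long nor in the knownTLDs list).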
Jim Cakalic (Senior Developer/Architect) commented:
I have one that I wrote using IBM's regex4j. Not sure that it is exactly RFC822 compliant because I borrowed the algorithm -- not the code -- from elsewhere.

Originally, I thought I would be able to use the parse method of the javax.mail.internet.InternetAddress class. The javadoc says that addresses parsed by the method must follow the RFC822 syntax. I don't believe this is true: one illegal address form that I know it parses without an exception is name@domain..com. AFAIK, a subdomain cannot be empty. Unfortunately, you don't get an AddressException until you send a Message addressed to that InternetAddress through a Transport. A little too late for my tastes.
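The empty-subdomain case described above is easy to catch yourself before handing the address to JavaMail. A minimal sketch in plain Java (no regex library), meant to be applied to the part after the @; LabelCheck and hasEmptyLabel are hypothetical names:

```java
// Minimal check for an empty label such as the one in "domain..com".
final class LabelCheck {
    static boolean hasEmptyLabel(String domain) {
        // a negative limit keeps trailing empty strings, so "example.com." is
        // flagged as well as "domain..com" and ".com"
        for (String label : domain.split("\\.", -1)) {
            if (label.isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```

LabelCheck.hasEmptyLabel("domain..com") returns true, while LabelCheck.hasEmptyLabel("domain.com") returns false.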

Anyway, holler if you want it and I'll post it. I don't think it would be too difficult to change from regex4j to some other regular expression toolkit.
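As a rough illustration of how small that change is, here is the first step of the validate method posted above (the basic address check) redone with the standard java.util.regex package. This is a sketch, not Jim's code, and BasicSplit and split are hypothetical names. regex4j's [[:ascii:]] becomes \p{ASCII}, re.matches(addr, match) becomes matcher.matches(), and match.getCapturedText(n) becomes matcher.group(n):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the "basic address check" step using java.util.regex.
final class BasicSplit {
    private static final Pattern BASIC_ADDRESS =
            Pattern.compile("^(\\p{ASCII}+)@(\\p{ASCII}+)$");

    // Returns {userPart, domainPart}, or null if the address does not look like a@b.
    static String[] split(String addr) {
        Matcher m = BASIC_ADDRESS.matcher(addr.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

BasicSplit.split("jim@example.com") yields {"jim", "example.com"}. Because the first group is greedy, "a@b@c" splits at the last @, giving {"a@b", "c"}, and the user-part check then rejects "a@b".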

Best regards,
Jim Cakalic
ozymandias (author) commented:
Jim, that would be cool if you could post the code.
I started reading the RFC and working out the legal and illegal characters for each part of the address, etc., etc., and I thought, hey, this is a job for someone else... and then I thought, hey, this almost certainly has already been a job for someone else, so why not find them?

Cheers,
Ozy.
ozymandias (author) commented:
Yup, that's a great start Jim.
Thanks for that.
I will have to get that IBM package and check it out.
I'm already getting a flashback to my days with sed, awk and perl.....mummy, I'm home......... :0)
Jim Cakalic (Senior Developer/Architect) commented:
Regular expressions are your friends! Can you imagine the amount of code it would take to perform this validation _without_ regex? The parse method of InternetAddress (with its submethods) runs to somewhere on the order of 300 lines of code. Using regular expressions may not be as fast, but it definitely makes the resulting code more readable and maintainable -- IMHO. Have fun.
Jim
ozymandias (author) commented:
Thanks, Jim.
This will do fine for now, and I can refine it as I go along.

I completely concur on the "friendliness" of RE.
They are my friends!

<musical interlude>
    who put the RE in gREp...?
</musical interlude>

They are just old friends I have been a bit remiss about keeping in touch with, apart from the odd Christmas card.

Cheers :0)