Solved

EMail Address parser/validator

Posted on 2001-07-19
6
1,126 Views
Last Modified: 2013-11-23
I am looking for source code (preferbaly in Java) for a class that will parse and validate email addresses for RFC (822, I think) conformance.

Not asking for anyone to write one, just want to know of anyone knows of one.

Cheers.
0
Comment
Question by:ozymandias
  • 3
  • 3
6 Comments
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 6299893
I have one that I wrote using IBM's regex4j. Not sure that it is exactly RFC822 compliant because I borrowed the algorithm -- not the code -- from elsewhere.

Originally, I thought I would be able to use the javax.mail.internet.InternetAddress class parse method. The javadoc says that addresses parsed by the method must follow the RFC822 syntax. I don't believe this to be true as one illegal address form that I know it allows to parse without exception is: name@domain..com. AFAIK, a subdomain cannot be empty. Unfortunately, it is not until you send a Message to the InternetAddress object on a Transport that you get an AddressException. A little too late for my tastes.

Anyway, holler if you want it and I'll post it. I don't think it would be too difficult to change from regex4j to some other regular expression toolkit.

Best regards,
Jim Cakalic
0
 
LVL 15

Author Comment

by:ozymandias
ID: 6300363
Jim, that would be cool if you could post the code.
I started reading the rfc and getting legal and illegal characters for each part of the address etc etc....and I thought, hey, this is a job for someone else....and then I thought, hey, this almost certainly has been a job for someone else, so why not find them ?

Cheers,
Ozy.
0
 
LVL 19

Accepted Solution

by:
Jim Cakalic earned 50 total points
ID: 6300420
OK. Here it is, for what it's worth. You'll need the regex4j package from IBM:
    http://www.alphaworks.ibm.com/tech/regex4j

I'll post as just a method that you can throw into a class of your choosing. First, though, you'll need some constant Strings:

    private static final String basicAddress = "^([[:ascii:]]+)@([[:ascii:]]+)$";
    private static final String specialChars = "\\(\\)><@,;:\\\\\\\"\\.\\[\\]";
    private static final String validChars = "[^ \f\n\r\t" + specialChars + "]";
    private static final String atom = validChars + "+";
    private static final String quotedUser = "(\"[^\"]+\")";
    private static final String word = "(" + atom + "|" + quotedUser + ")";
    private static final String validUser = "^" + word + "(\\." + word + ")*$";
    private static final String symDomain = "^" + atom + "(\\." + atom + ")+$";
    private static final String ipDomain = "^\\[(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\]$";
    private static final String knownTLDs = "^\\.(com|net|org|edu|int|mil|gov|arpa|biz|aero|name|coop|info|pro|museum)$";

And here's the method:

    public static void validate(String addr) throws javax.mail.internet.AddressException {
        if (addr == null) {
            throw new AddressException("Address argument is null");
        }
        addr = addr.trim();
        if (addr.length() == 0) {
            throw new AddressException("Address argument is empty (0-length)");
        }

        // basic address check
        RegularExpression re = new RegularExpression(basicAddress);
        Match match = new Match();
        if (re.matches(addr, match) == false) {
            throw new AddressException("'" + addr + "' does not look like an internet email address: a@b.c");
        }
        String userPart = match.getCapturedText(1);
        String domainPart = match.getCapturedText(2);

        // user address check
        re.setPattern(validUser);
        if (re.matches(userPart) == false) {
            throw new AddressException("User name '" + userPart + "' violates address syntax");
        }

        // first ip domain check
        re.setPattern(ipDomain);
        if (re.matches(domainPart, match)) {
            // if the pattern matched, there _must_ be 5 groups
            for (int i = 1; i < 5; ++i) {
                String num = match.getCapturedText(i);
                int n = Integer.parseInt(num);
                if (Integer.parseInt(match.getCapturedText(i)) > 255) {
                    throw new AddressException("The specified ip address '" + num + "' is not valid");
                }
            }
        } else {
            // symbolic domain check if not ip domain
            re.setPattern(symDomain);
            if (re.matches(domainPart, match)) {
                String tld = match.getCapturedText(match.getNumberOfGroups() - 1);
                re.setPattern(knownTLDs);
                // permit top-level-domains of 3 (includes dot separator) because these could be
                // country codes. perhaps add check for valid countries? lots of maintenance there.
                if (tld.length() != 3 && re.matches(tld) == false) {
                    throw new AddressException("Top level domain '" + tld + "' is not recognized");
                }
            } else {
                throw new AddressException("Domain name '" + domainPart + "' violates address syntax");
            }
        }
        // all tests passed
        return;
    }


Hope this works for you -- or is at least a start.
Jim
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 15

Author Comment

by:ozymandias
ID: 6301389
Yup, that's a great start Jim.
Thanks for that.
I will have to get that IBM package an check it out.
I'm already getting a flashback to my days with sed, awk and perl.....mummy, I'm home......... :0)
0
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 6302751
Regular expressions are your friends! Can you imagine the amount of code it would take to perform this validation _without_ regex. The parse method (and submethods) of InternetAddress are somewhere on the order of 300 lines of code. Using regular expressions may not be as fast but definitely makes the resulting code more readable and maintainable -- IMHO. Have fun.
Jim
0
 
LVL 15

Author Comment

by:ozymandias
ID: 6304384
Thanks, Jim.
This will do fine for now, and I can refine it as I go along.

I completely concur on the "friendliness" of RE.
They are my friends!

<musical interlude>
    who put the RE in gREp...?
</musical interlude>

They are just old friends who I have been a bit remiss about keeping in touch except for the odd christmas card.

Cheers :0)
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Windows Script Host (WSH) has been part of Windows since Windows NT4. Windows Script Host provides architecture for building dynamic scripts that consist of a core object model, scripting hosts, and scripting engines. The key components of Window…
This article is meant to give a basic understanding of how to use R Sweave as a way to merge LaTeX and R code seamlessly into one presentable document.
This tutorial will introduce the viewer to VisualVM for the Java platform application. This video explains an example program and covers the Overview, Monitor, and Heap Dump tabs.
This tutorial explains how to use the VisualVM tool for the Java platform application. This video goes into detail on the Threads, Sampler, and Profiler tabs.

862 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now