Solved

EMail Address parser/validator

Posted on 2001-07-19
6
1,129 Views
Last Modified: 2013-11-23
I am looking for source code (preferbaly in Java) for a class that will parse and validate email addresses for RFC (822, I think) conformance.

Not asking for anyone to write one, just want to know of anyone knows of one.

Cheers.
0
Comment
Question by:ozymandias
  • 3
  • 3
6 Comments
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 6299893
I have one that I wrote using IBM's regex4j. Not sure that it is exactly RFC822 compliant because I borrowed the algorithm -- not the code -- from elsewhere.

Originally, I thought I would be able to use the javax.mail.internet.InternetAddress class parse method. The javadoc says that addresses parsed by the method must follow the RFC822 syntax. I don't believe this to be true as one illegal address form that I know it allows to parse without exception is: name@domain..com. AFAIK, a subdomain cannot be empty. Unfortunately, it is not until you send a Message to the InternetAddress object on a Transport that you get an AddressException. A little too late for my tastes.

Anyway, holler if you want it and I'll post it. I don't think it would be too difficult to change from regex4j to some other regular expression toolkit.

Best regards,
Jim Cakalic
0
 
LVL 15

Author Comment

by:ozymandias
ID: 6300363
Jim, that would be cool if you could post the code.
I started reading the rfc and getting legal and illegal characters for each part of the address etc etc....and I thought, hey, this is a job for someone else....and then I thought, hey, this almost certainly has been a job for someone else, so why not find them ?

Cheers,
Ozy.
0
 
LVL 19

Accepted Solution

by:
Jim Cakalic earned 50 total points
ID: 6300420
OK. Here it is, for what it's worth. You'll need the regex4j package from IBM:
    http://www.alphaworks.ibm.com/tech/regex4j

I'll post as just a method that you can throw into a class of your choosing. First, though, you'll need some constant Strings:

    private static final String basicAddress = "^([[:ascii:]]+)@([[:ascii:]]+)$";
    private static final String specialChars = "\\(\\)><@,;:\\\\\\\"\\.\\[\\]";
    private static final String validChars = "[^ \f\n\r\t" + specialChars + "]";
    private static final String atom = validChars + "+";
    private static final String quotedUser = "(\"[^\"]+\")";
    private static final String word = "(" + atom + "|" + quotedUser + ")";
    private static final String validUser = "^" + word + "(\\." + word + ")*$";
    private static final String symDomain = "^" + atom + "(\\." + atom + ")+$";
    private static final String ipDomain = "^\\[(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\]$";
    private static final String knownTLDs = "^\\.(com|net|org|edu|int|mil|gov|arpa|biz|aero|name|coop|info|pro|museum)$";

And here's the method:

    public static void validate(String addr) throws javax.mail.internet.AddressException {
        if (addr == null) {
            throw new AddressException("Address argument is null");
        }
        addr = addr.trim();
        if (addr.length() == 0) {
            throw new AddressException("Address argument is empty (0-length)");
        }

        // basic address check
        RegularExpression re = new RegularExpression(basicAddress);
        Match match = new Match();
        if (re.matches(addr, match) == false) {
            throw new AddressException("'" + addr + "' does not look like an internet email address: a@b.c");
        }
        String userPart = match.getCapturedText(1);
        String domainPart = match.getCapturedText(2);

        // user address check
        re.setPattern(validUser);
        if (re.matches(userPart) == false) {
            throw new AddressException("User name '" + userPart + "' violates address syntax");
        }

        // first ip domain check
        re.setPattern(ipDomain);
        if (re.matches(domainPart, match)) {
            // if the pattern matched, there _must_ be 5 groups
            for (int i = 1; i < 5; ++i) {
                String num = match.getCapturedText(i);
                int n = Integer.parseInt(num);
                if (Integer.parseInt(match.getCapturedText(i)) > 255) {
                    throw new AddressException("The specified ip address '" + num + "' is not valid");
                }
            }
        } else {
            // symbolic domain check if not ip domain
            re.setPattern(symDomain);
            if (re.matches(domainPart, match)) {
                String tld = match.getCapturedText(match.getNumberOfGroups() - 1);
                re.setPattern(knownTLDs);
                // permit top-level-domains of 3 (includes dot separator) because these could be
                // country codes. perhaps add check for valid countries? lots of maintenance there.
                if (tld.length() != 3 && re.matches(tld) == false) {
                    throw new AddressException("Top level domain '" + tld + "' is not recognized");
                }
            } else {
                throw new AddressException("Domain name '" + domainPart + "' violates address syntax");
            }
        }
        // all tests passed
        return;
    }


Hope this works for you -- or is at least a start.
Jim
0
Courses: Start Training Online With Pros, Today

Brush up on the basics or master the advanced techniques required to earn essential industry certifications, with Courses. Enroll in a course and start learning today. Training topics range from Android App Dev to the Xen Virtualization Platform.

 
LVL 15

Author Comment

by:ozymandias
ID: 6301389
Yup, that's a great start Jim.
Thanks for that.
I will have to get that IBM package an check it out.
I'm already getting a flashback to my days with sed, awk and perl.....mummy, I'm home......... :0)
0
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 6302751
Regular expressions are your friends! Can you imagine the amount of code it would take to perform this validation _without_ regex. The parse method (and submethods) of InternetAddress are somewhere on the order of 300 lines of code. Using regular expressions may not be as fast but definitely makes the resulting code more readable and maintainable -- IMHO. Have fun.
Jim
0
 
LVL 15

Author Comment

by:ozymandias
ID: 6304384
Thanks, Jim.
This will do fine for now, and I can refine it as I go along.

I completely concur on the "friendliness" of RE.
They are my friends!

<musical interlude>
    who put the RE in gREp...?
</musical interlude>

They are just old friends who I have been a bit remiss about keeping in touch except for the odd christmas card.

Cheers :0)
0

Featured Post

Gigs: Get Your Project Delivered by an Expert

Select from freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely and get projects done right.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Navigation is an important part of web design from a usability perspective. But it is often a pain when it comes to a developer’s perspective. By navigation, it often means menuing. This is less theory and more practical of how to get a specific gro…
Java functions are among the best things for programmers to work with as Java sites can be very easy to read and prepare. Java especially simplifies many processes in the coding industry as it helps integrate many forms of technology and different d…
This theoretical tutorial explains exceptions, reasons for exceptions, different categories of exception and exception hierarchy.
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question