Verifying email value syntax

Basssque
Basssque used Ask the Experts™
on
Can someone tell me how a column could be checked to verify that it's contents which "should" be email addresses contain the @ character and at least one period?  Thanks!!
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Quick and dirty but won't work on everything:
case when regexp_like(email_address,'^.+@.+\..+$') then 'Good' else 'Bad' end
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
There are a LOT of email address regular expression validators out there.  Just pick one and use it.  You might need to tweak it a little to make the syntax Oracle compliant but most regex's can be used.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
This one is a lot better than my first one but still won't get all bad data:
case when regexp_like(email_address,'^[^@.]+@[^@.]+\.[^@.]+$') then 'Good' else 'Bad' end
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Author

Commented:
how about this one?
case when REGEXP_LIKE (CNT1_EMAIL,'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$')
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>how about this one?

Does it work for you?

I saw that one as well but it is OLD.  There are so many new top level domains, that regex won't work for them.

http://data.iana.org/TLD/tlds-alpha-by-domain.txt

If I created:
slightwv@i_drink.beer

Would it show 'Good' or 'Bad'?

Author

Commented:
Can't get either of them to work, here is an example

SELECT
STUDENTS.STUDENT_NUMBER,
STUDENTS.LAST_NAME,
STUDENTS.FIRST_NAME,
STUDENTS.SCHOOLID,
STUDENTS.GRADE_LEVEL,

case when REGEXP_LIKE (CNT1_EMAIL,'^[^@.]+@[^@.]+\.[^@.]+$')
                        then ''
                        else CNT1_EMAIL || '<-- invalid_format' end CNT1_EMAIL

FROM
"PS"."STUDENTS" "STUDENTS"
LEFT JOIN "PS"."CONTACTS" "CONTACTS" ON "STUDENTS"."DCID" = "CONTACTS"."STUDENTSDCID"

WHERE
(
REGEXP_LIKE (CNT1_EMAIL,'^[^@.]+@[^@.]+\.[^@.]+$') AND
(CONTACTS.CNT1_EMAIL IS NOT NULL) and (CONTACTS.CNT1_EMAIL != ' ')
)
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
What isn't working?  do you get an error or just no data?

If you use the regexp_like in the where clause, then everything returned must match the pattern so, you should not get any 'invalid' ones?


>>(CONTACTS.CNT1_EMAIL IS NOT NULL) and (CONTACTS.CNT1_EMAIL != ' ')

Make it one using trim.  No need to check for a single space.  What if there are two or ten spaces in the field?
(trim(CONTACTS.CNT1_EMAIL) IS NOT NULL)

Author

Commented:
I wasn't getting any data before, that works great.  Good point!
The regex statement outputs any addresses with periods before the @ sign though which is unwanted.  Can any periods before the @ be ignored?
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>Can any periods before the @ be ignored?

You can include/ignore anything you want.

As far as periods before the '@', is this not a valid email address:
jane.doe@some.domain.com

Author

Commented:
it's valid
I'm only looking to output invalid addresses though
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
OH, I was confused.  I didn't test 'jane.doe@some.domain.com' with my regex.

I'll see what I can do.

Author

Commented:
this is what I have so far btw
just need to ignore any periods before the @ character :-)

case when REGEXP_LIKE (CNT1_EMAIL,'^[^@.]+@[^@.]+\.[^@.]+$') then ''
                        when instr(CONTACTS.CNT1_EMAIL,'@')=0 then 'missing_@'
                        when instr(CONTACTS.CNT1_EMAIL,'.')=0 then 'missing_period'
                        else CNT1_EMAIL || '<-- invalid_format' end CNT1_EMAIL
Taken from various sources:

http://emailregex.com/

General Email Regex (RFC 5322 Official Standard)
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Open in new window


Oracle PL/SQL Email Regex
SELECT email 
FROM table_name
WHERE REGEXP_LIKE (email, '[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}');

Open in new window


Other:

http://nerd.net.au/oracle/49-apex-application-express/validation-regex/108-email-regex-validation
^[a-zA-Z0-9][a-zA-Z0-9\.\-]{1,}@[a-zA-Z0-9]{1}[a-zA-Z0-9\.\-]{1,}\.{1}[a-zA-Z]{2,4}$

Open in new window


Basics:
https://galobalda.wordpress.com/2013/07/10/an-attempt-to-validate-email-addresses-with-a-regular-expression/

General (awesome site for testing regular expressions):
https://regex101.com/
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Try this one.  Again, far from perfect but I've not really seen a really good one.

case when regexp_like(email_address, '^[^@]+\.?[^@]*@[^@.]+\.[^@.]+$' ) then 'Good' else 'Bad' end
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Alex,
Most of those aren't valid.  They are all old.  I didn't test the RFC one but the others are out of date.

Author

Commented:
works well, still looking through all of the data
only found one issue so far
if the address contains more than two periods after the @ then it's included in the results which should not be the case
for example first.last@school1.miami.fl.us is valid
Most Valuable Expert 2012
Distinguished Expert 2018
Commented:
Try this tweak:
 regexp_like(email_address, '^[^@]+\.?[^@]*@[^@.]+(\.[^@.]+)+$' )

Author

Commented:
Fantastic!! :-)

This is what I have right now, I can't thank you enough for your help!

case when REGEXP_LIKE (CNT1_EMAIL, '^[^@]+\.?[^@]*@[^@.]+(\.[^@.]+)+$' ) then ''
                        when instr(CONTACTS.CNT1_EMAIL,'@')=0 then CNT1_EMAIL || ' <-- missing_@'
                        when instr(CONTACTS.CNT1_EMAIL,'@')>1 then CNT1_EMAIL || ' <-- multiple_addresses'
                        when instr(CONTACTS.CNT1_EMAIL,'.')=0 then CNT1_EMAIL || ' <-- missing_period'
                        else CNT1_EMAIL || ' <-- invalid_format' end CNT1_EMAIL

Author

Commented:
found one last issue
if the address contains multiple periods before the @ then it's output as invalid
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
There will be more, remember, what I posted won't catch everything.

I'm not going to be able to post something that will catch everything.

This appears to fix the q..@q.q issue:
regexp_like(email_address, '^([^@.]+\.?)+@[^@.]+(\.[^@.]+)+$')

Author

Commented:
Doesn't like that one
never completes and doesn't output any field values
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
I ran it against 11.2.0.2.  I can't remember your Oracle version.

I'll see if I can come up without the new group.

Author

Commented:
12
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Try my stand-alone test case.  I just tried it with 12.1.0.2 and it ran fine:
with mydata as (
select null email_address from dual
union all
select 'q@q@q.q' email_address from dual
union all
select 'q..@q.q' email_address from dual
union all
select 'q.@q.q' email_address from dual
union all
select 'q@q.q' email_address from dual
union all
select 'q.q.q@q.q.q' email_address from dual
union all
select 'q.q@q.q' email_address from dual
union all
select 'q.q.q.q.q@q.q.q.q.q' email_address from dual
union all
select 'q@q..q' email_address from dual
union all
select 'qqq' email_address from dual
)
select email_address,
case when
regexp_like(email_address, '^([^@.]+\.?)+@[^@.]+(\.[^@.]+)+$')
then 'Good' else 'Bad' end
from mydata
/

Open in new window

Author

Commented:
Can it be modified to only output the invalid values though?
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Sure.

Add it to the where clause:

select ...
where ...
and NOT regexp_like ...

Author

Commented:
Throws an invalid operational error message
where
(CONTACTS.CNT1_EMAIL NOT REGEXP_LIKE '^[^@]+\.?[^@]*@[^@.]+(\.[^@.]+)+$' )
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
You need the complete regexp_like function call...

select ...
from ...
where ...
and NOT REGEXP_LIKE (CNT1_EMAIL, '^[^@]+\.?[^@]*@[^@.]+(\.[^@.]+)+$' )

Author

Commented:
If I do that then it gives me all the null fields from the <-- missing_@ statement
even with an is not null in the where statement
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>If I do that then it gives me all the null fields from

I didn't say it was the ONLY thing in the where clause, if you don't want nulls, add to the where clause.

select ...
from ...
where ...
and NOT REGEXP_LIKE (CNT1_EMAIL, '^[^@]+\.?[^@]*@[^@.]+(\.[^@.]+)+$' )
and trim(CNT1_EMAIL) is not null
...

Author

Commented:
NICE!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial