Oracle Regular Expressions and Dates

newtoperlpgm
I want to use an Oracle regular expression to extract dates from a VARCHAR2 column in Oracle.
I need to convert the extracted text to a DATE so that I can filter out dates older than today.

regexp_like(batch_id,'[0-9]{2}.[0-9]{2}.[0-9]{2}') works in my where clause and brings back all the data, but I also need to convert it and compare it to SYSDATE to filter out old dates.

The data is in one varchar2 column and looks like this:


Student Term 01.31.19
Student Term 09.15.18
Student Term 09.30.18
Student Term 11.30.18
STAFF 08/31/18-08/15/19
EXTRA 8.31.18-12.21.18
EXTRAS END 08.31.18
AUGMENT END 08.31.18


All help is greatly appreciated.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Another problem you will have is if you have a string:  01/02/18

Is that January 2nd or February 1st?

That said, what are your expected results from the above data?

For example, what do you want back with "STAFF 08/31/18-08/15/19"?

Is a dash the only allowed separator?  What about "I started on 11/12/13 and took a break on 11/15/16 and started again on 12/31/20"?

What about data like 99999999?  It matches your regex but cannot be converted.
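
To illustrate the point about values that match the pattern but are not dates, here is a minimal sketch. It assumes Oracle 12.2 or later, where TO_DATE accepts a DEFAULT ... ON CONVERSION ERROR clause; the sample values and the column name txt are made up for the illustration.

```sql
-- Assumption: Oracle 12.2+ (DEFAULT ... ON CONVERSION ERROR is not available earlier).
-- The sample rows below are hypothetical, not taken from the real table.
with samples as (
  select '01.31.19' as txt from dual union all
  select '99999999' from dual union all
  select '02.30.19' from dual
)
select txt,
       -- The unescaped dots in the original pattern match any character,
       -- so a plain run of eight digits also passes this check.
       case when regexp_like(txt, '[0-9]{2}.[0-9]{2}.[0-9]{2}')
            then 'Y' else 'N' end as matches_regex,
       -- Returns NULL instead of raising an error when the text is not a valid date.
       to_date(txt default null on conversion error, 'MM/DD/YY') as converted
from samples;
```

All three rows should pass the regular expression check, but only the first should convert; the other two should come back as NULL rather than stopping the query with an error.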
awking00, Information Technology Specialist

Commented:
>>For example, what do you want back with "STAFF 08/31/18-08/15/19"?<<
Good question, especially since one date is prior to sysdate and one is subsequent to it. It's also the only data shown that uses a slash as the separator and not a period. How would you know which format mask to use when converting to a date for comparison purposes? I think some more detail about what you have and what you want to accomplish with your comparisons would be most helpful.

Author

Commented:
Thanks for the questions, very good observations.  The dates are in month/day/year order (MM/DD/YY in the sample data), so 01/02/18 is January 2, 2018.

Also, I want the ending date from the string.  For example, from "STAFF 08/31/18-08/15/19" I expect to get 08/15/19.

>>Is a dash the only allowed separator?  What about "I started on 11/12/13 and took a break on 11/15/16 and started again on 12/31/20"?<<
I will never have 99999999 or "I started on 11/12/13 and took a break on 11/15/16 and started again on 12/31/20", because a human enters the dates. If I ever do, those values will simply not be converted.  At this time the dash is the only separator between two dates, although it could really be anything. Again, if the date information is really bad, i.e., worse than what you see in my example, we do not expect to get that data back until it has been fixed.

Thank you so much for any help you can provide.

Most Valuable Expert 2012
Distinguished Expert 2018
Commented:
This seems to extract the dates from your sample data. You can take those dates and add any sysdate math you need.

 I cannot say it will work for all your data:
with mydata as (
select 'Student Term 01.31.19' batch_id from dual union all
select 'Student Term 09.15.18' from dual union all
select 'Student Term 09.30.18' from dual union all
select 'Student Term 11.30.18' from dual union all
select 'STAFF 08/31/18-08/15/19' from dual union all
select 'EXTRA 8.31.18-12.21.18' from dual union all
select 'EXTRAS END 08.31.18' from dual union all
select 'AUGMENT END 08.31.18' from dual
)
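select
	-- {1,2} also accepts single-digit months and days such as 8.31.18
	to_date(regexp_substr(batch_id,'[0-9]{1,2}[./][0-9]{1,2}[./][0-9]{2}',1,1),'MM/DD/YY') first_date,
	-- the fourth argument is the occurrence: 2 returns the second date, or NULL if there is none
	to_date(regexp_substr(batch_id,'[0-9]{1,2}[./][0-9]{1,2}[./][0-9]{2}',1,2),'MM/DD/YY') second_date
from mydata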
/
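
Since the author asked for the ending date and a comparison with SYSDATE, here is one possible sketch of that step. It assumes REGEXP_COUNT is available (Oracle 11g or later) and uses it to pick the last date in each string. The CTE name extracted and the alias end_date_txt are made up for the illustration, and only three of the sample rows are repeated to keep it short.

```sql
-- Sketch only: picks the LAST date in each string and keeps rows that are not in the past.
with mydata as (
  select 'Student Term 01.31.19' batch_id from dual union all
  select 'STAFF 08/31/18-08/15/19' from dual union all
  select 'EXTRA 8.31.18-12.21.18' from dual
),
extracted as (
  select batch_id,
         regexp_substr(
           batch_id,
           '[0-9]{1,2}[./][0-9]{1,2}[./][0-9]{2}',
           1,
           -- The occurrence must be at least 1, so guard against strings with no date at all.
           greatest(regexp_count(batch_id, '[0-9]{1,2}[./][0-9]{1,2}[./][0-9]{2}'), 1)
         ) as end_date_txt
  from mydata
)
select batch_id,
       to_date(end_date_txt, 'MM/DD/YY') as end_date
from extracted
-- Rows with no date yield NULL here and are filtered out by the comparison.
where to_date(end_date_txt, 'MM/DD/YY') >= trunc(sysdate);
```

Text that matches the pattern but is not a real date (13.45.18, for example) would still raise an error in TO_DATE, so the data needs to be clean or the conversion needs to be guarded.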


awking00, Information Technology Specialist

Commented:
You might consider modifying your regular expression to allow only 0 or 1 as the first digit of the month portion and only 0-3 as the first digit of the day portion, to further eliminate invalid entries (although it wouldn't prevent February 30th, for example).
regexp_substr(batch_id,'[0-1][0-9][./][0-3][0-9][./][0-9]{2}')
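
Going one step further along the same lines, alternation can limit the month to 01-12 and the day to 01-31. This is only a sketch: the sample rows are invented, and unlike the pattern in the earlier query it requires leading zeros, so a value such as 8.31.18 would not match.

```sql
-- Hypothetical sample rows to show what the tighter pattern accepts and rejects.
with samples as (
  select 'EXTRAS END 08.31.18' as batch_id from dual union all
  select 'BAD 13.45.18' from dual union all
  select 'FEB 02.30.19' from dual
)
select batch_id,
       regexp_substr(
         batch_id,
         -- month 01-12, day 01-31, two-digit year; separators may be . or /
         '(0[1-9]|1[0-2])[./](0[1-9]|[12][0-9]|3[01])[./][0-9]{2}'
       ) as candidate
from samples;
```

The 13.45.18 row should return NULL, while 02.30.19 still passes the pattern, which is the February 30th limitation mentioned above; only the date conversion itself can catch that case.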

Author

Commented:
Thank you for your help.
