Solved

Extract date from string in PHP

Posted on 2009-04-11
Last Modified: 2013-12-12
I am trying to extract dates in PHP from a large number of existing files.

The trick is this:  the dates are in many different formats, and they're embedded within a file name in different ways.  Here are a few of the real-world cases I'm trying to fit:

// Case 1:  "Morning Meeting 4-29-08.xls" : Typical format, 1-digit month
// Case 2:  "Photon_DPR_04-25-2008_Atlantic.xls" : 2-digit month
// Case 3:  "Canyon DPR April 22 08.xlsx" : Month is spelled out
// Case 4:  "CDI_Time_Ticket_April 02_08.xls" : Month is spelled out, underscore used to delimit day/year
// Case 5:  "CCS_Atlantic_Log_Sheet_5-12-08.xls" : Underscore and 1-digit month value
// Case 6:  "CCS_Atlantic_DPR_05-14-08.xls" : Underscore and 2-digit month value
// Case 7:  "Integra_Daily_Report_4-21-2008.pdf" : Underscore and 1-digit month value, with 4-digit year
// Case 8:  "05132008 Wachs DPR Atlantic.xls" : No delimiters, date at front of string
// Case 9:  "Kest-15-03292008_Injury_Report.xls" : No delimiters, date mid-string, "-" characters do not delimit dates!
// Case 10:  "Sample-032908_Report.xls" : 2-digit year, no delimiters, date mid-string, "-" characters do not delimit dates!

My current approach is this:

1.  Starting from the left and right ends of the file name, search forwards and backwards until a digit is encountered, then extract the characters in between and pass them to a simple strtotime call, as in the code section below.

The above approach fails in certain cases, though, such as:
"Sample File 04-04-2008 version 2008" :  In this case, it fails as digits are not expected to the right of the date.
"Kest-15-03292008_Injury_Report.xls":  This fails as the digits are not expected to the left of the date string.
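The scan described above can be approximated in a few lines. This is only a sketch: it uses a greedy regular expression as a stand-in for the manual forward/backward search, and `digitSpan` is a hypothetical helper name, not part of the original code.

```php
<?php
// Hypothetical helper: returns everything from the first digit to the
// last digit of the file name, or null if there are fewer than two digits.
function digitSpan($fileName)
{
    return preg_match('/\d.*\d/', $fileName, $m) ? $m[0] : null;
}

echo digitSpan('Morning Meeting 4-29-08.xls'), "\n";          // 4-29-08
echo digitSpan('Sample File 04-04-2008 version 2008'), "\n";  // 04-04-2008 version 2008
echo digitSpan('Kest-15-03292008_Injury_Report.xls'), "\n";   // 15-03292008
```

The last two results show the problem: stray digits on either side of the date end up in the extracted span.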

Any advice on a robust way to parse dates for these kinds of unstructured file names?

Many thanks,

-Kevin
echo ("Embedded date: 4-29-08: ".date("Y-m-d", strtotime("4-29-08"))."\n");

Question by:Kevin_Cain

Expert Comment

by:Hube02
ID: 24124072
I would start with a regular expression to extract the date and then parse the date from there. Try the attached code. The regular expression used will extract all of the possibilities given in your examples. I combined your examples into a single string and then pulled all of the dates out of it.

You could use the following:

preg_match($regex, $string, $matches);

and your date would be stored in $matches[2];
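For a single file name, that call might look like the sketch below. `$fileName` is a placeholder variable introduced here for illustration; the pattern is the one from the attached code.

```php
<?php
// Pattern copied from the attached code; group 2 captures the date text.
$regex = '/(:\d+-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';

// Placeholder input: Case 2 from the question.
$fileName = 'Photon_DPR_04-25-2008_Atlantic.xls';

if (preg_match($regex, $fileName, $matches)) {
    echo $matches[2], "\n"; // prints "04-25-2008"
}
```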

<?php

// All ten sample cases from the question, combined into one test string.
$string = '// Case 1:  "Morning Meeting 4-29-08.xls" : Typical format, 1-digit month
// Case 2:  "Photon_DPR_04-25-2008_Atlantic.xls" : 2-digit month
// Case 3:  "Canyon DPR April 22 08.xlsx" : Month is spelled out
// Case 4:  "CDI_Time_Ticket_April 02_08.xls" : Month is spelled out, underscore used to delimit day/year
// Case 5:  "CCS_Atlantic_Log_Sheet_5-12-08.xls" : Underscore and 1-digit month value
// Case 6:  "CCS_Atlantic_DPR_05-14-08.xls" : Underscore and 2-digit month value
// Case 7:  "Integra_Daily_Report_4-21-2008.pdf" : Underscore and 1-digit month value, with 4-digit year
// Case 8:  "05132008 Wachs DPR Atlantic.xls" : No delimiters, date at front of string
// Case 9:  "Kest-15-03292008_Injury_Report.xls" : No delimiters, date mid-string, "-" characters do not delimit dates!
// Case 10:  "Sample-032908_Report.xls" : 2-digit year, no delimiters, date mid-string, "-" characters do not delimit dates!';

// Month (1-2 digits or spelled out), day, and year, separated by any mix
// of "-", " ", or "_" (or nothing). The date text is captured in group 2.
$regex = '/(:\d+-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';

// Collect every match in the test string.
preg_match_all($regex, $string, $dates);

print_r($dates);

?>


Expert Comment

by:Hube02
ID: 24124073
I take that back; I'm having trouble with your Case 9.
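A quick check suggests where the first pattern goes wrong on Case 9. Its optional prefix `(:\d+-)?` requires a literal colon, so it never consumes the "-15-" in front of the date, and the match starts at "15" instead. A minimal sketch, reusing the pattern from the previous comment:

```php
<?php
// First-attempt pattern, copied unchanged from the earlier comment.
$regex = '/(:\d+-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';

preg_match($regex, 'Kest-15-03292008_Injury_Report.xls', $matches);

// "15" is taken as the month, "03" as the day, and "2920" as the year.
echo $matches[2], "\n"; // prints "15-032920"
```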

Accepted Solution

by:
Hube02 earned 125 total points
ID: 24124079
Got it. As long as your dates conform only to the cases given above, you can use this regex:

$regex = '/(-\d{2}-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';
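Once the date text is extracted, it still has to be turned into a real date. The sketch below is one possible way to do that, not part of the accepted answer: `extractFileDate` is a hypothetical helper, and it assumes month-day-year order and that 2-digit years belong to the 2000s.

```php
<?php
// Accepted pattern from above; group 2 captures the date text.
$regex = '/(-\d{2}-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';

// Hypothetical helper: returns the date as "Y-m-d", or null when no
// valid date is found. Assumes month-day-year order and 20xx for
// 2-digit years.
function extractFileDate($fileName, $regex)
{
    $months = array('january', 'february', 'march', 'april', 'may', 'june',
        'july', 'august', 'september', 'october', 'november', 'december');

    if (!preg_match($regex, $fileName, $m)) {
        return null;
    }
    $raw = $m[2];

    if (preg_match('/^([a-z]+)[- _]*(\d{1,2})[- _]*(\d{4}|\d{2})$/i', $raw, $p)) {
        // Spelled-out month, e.g. "April 22 08"
        $index = array_search(strtolower($p[1]), $months);
        if ($index === false) {
            return null;
        }
        $month = $index + 1;
    } elseif (preg_match('/^(\d{1,2})[- _]+(\d{1,2})[- _]+(\d{4}|\d{2})$/', $raw, $p)) {
        // Delimited digits, e.g. "4-29-08"
        $month = (int) $p[1];
    } elseif (preg_match('/^(\d{2})(\d{2})(\d{4}|\d{2})$/', $raw, $p)) {
        // Undelimited digits, e.g. "03292008" or "032908"
        $month = (int) $p[1];
    } else {
        return null;
    }

    $day  = (int) $p[2];
    $year = (int) $p[3];
    if (strlen($p[3]) === 2) {
        $year += 2000; // assumption: 2-digit years mean 20xx
    }

    // Reject impossible dates such as month 15.
    if (!checkdate($month, $day, $year)) {
        return null;
    }
    return sprintf('%04d-%02d-%02d', $year, $month, $day);
}

echo extractFileDate('Morning Meeting 4-29-08.xls', $regex), "\n";        // 2008-04-29
echo extractFileDate('Canyon DPR April 22 08.xlsx', $regex), "\n";        // 2008-04-22
echo extractFileDate('Kest-15-03292008_Injury_Report.xls', $regex), "\n"; // 2008-03-29
```

Splitting the fields by hand and validating them with checkdate avoids relying on strtotime to guess the format of each variant.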


Author Closing Comment

by:Kevin_Cain
ID: 31569249
Very fast response, and a well composed answer.

Expert Comment

by:Hube02
ID: 24124094
Thanks for the question.