I am trying to extract dates in PHP from a large number of existing files.
The trick is this: the dates are in many different formats, and they're embedded within the file names in different ways. Here are a few of the real-world cases I'm trying to handle:
// Case 1: "Morning Meeting 4-29-08.xls" : Typical format, 1-digit month
// Case 2: "Photon_DPR_04-25-2008_Atl : 2-digit month
// Case 3: "Canyon DPR April 22 08.xlsx" : Month is spelled out
// Case 4: "CDI_Time_Ticket_April 02_08.xls" : Month is spelled out, underscore used to delimit day/year
// Case 5: "CCS_Atlantic_Log_Sheet_5- : Underscore and 1-digit month value
// Case 6: "CCS_Atlantic_DPR_05-14-08.xls" : Underscore and 2-digit month value
// Case 7: "Integra_Daily_Report_4-21 : Underscore and 2-digit month value, with 4-digit year
// Case 8: "05132008 Wachs DPR Atlantic.xls" : No delimiters, date at front of string
// Case 9: "Kest-15-03292008_Injury_R : No delimiters, date mid-string, "-" characters do not delimit dates!
// Case 10: "Sample-032908_Report.xls" : 2-digit year, no delimiters, date mid-string, "-" characters do not delimit dates!
My current approach is this:
1. Starting from the left and right ends of the file name string, scan inward until a digit is encountered at each end, then extract the characters in between and parse them with a simple strtotime call, as in the code at the end of this post.
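The scanning step described above can be sketched roughly as follows. This is only an illustration of the idea, not my exact code: the helper name extractDateCandidate is made up for this example, and it uses a single greedy regular expression in place of two explicit loops, which should select the same substring (first digit through last digit).

```php
<?php
// Hypothetical helper (name invented for this sketch): returns the substring
// running from the first digit to the last digit of the file name, or null
// when the name contains no digits at all.
function extractDateCandidate(string $fileName): ?string
{
    // \d matches the first digit; the optional greedy group extends the
    // match up to the last digit found in the string.
    if (preg_match('/\d(?:.*\d)?/', $fileName, $m) !== 1) {
        return null;
    }
    return $m[0];
}

$names = [
    'Morning Meeting 4-29-08.xls',
    'Sample File 04-04-2008 version 2008',
];

foreach ($names as $name) {
    $candidate = extractDateCandidate($name);
    // strtotime returns false when it cannot interpret the string.
    $timestamp = $candidate === null ? false : strtotime($candidate);
    echo $name, ' => candidate "', $candidate, '", parsed: ',
        $timestamp === false ? 'unparseable' : date('Y-m-d', $timestamp), "\n";
}
```

For the second name the candidate comes out as "04-04-2008 version 2008", which shows how trailing digits pollute the string that gets handed to strtotime.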
The above approach fails in certain cases, though, such as:
"Sample File 04-04-2008 version 2008" : In this case, it fails as digits are not expected to the right of the date.
: This fails as the digits are not expected to the left of the date string.
Any advice on a robust way to parse dates from these kinds of unstructured file names?
echo ("Embedded date: 4-29-08: ".date("Y-m-d", strtotime("4-29-08"))."\n");