Solved

Extract date from string in PHP

Posted on 2009-04-11
857 Views
Last Modified: 2013-12-12
I am trying to extract dates in PHP from a large number of existing files.

The trick is this:  the dates are many different formats, and they're embedded within a file name in different ways.  Here are a few of the real-world cases I'm trying to fit:

// Case 1:  "Morning Meeting 4-29-08.xls" : Typical format, 1-digit month
// Case 2:  "Photon_DPR_04-25-2008_Atlantic.xls" : 2-digit month
// Case 3:  "Canyon DPR April 22 08.xlsx" : Month is spelled out
// Case 4:  "CDI_Time_Ticket_April 02_08.xls" : Month is spelled out, underscore used to delimit day/year
// Case 5:  "CCS_Atlantic_Log_Sheet_5-12-08.xls" : Underscore and 1-digit month value
// Case 6:  "CCS_Atlantic_DPR_05-14-08.xls" : Underscore and 2-digit month value
// Case 7:  "Integra_Daily_Report_4-21-2008.pdf" : Underscore and 1-digit month value, with 4-digit year
// Case 8:  "05132008 Wachs DPR Atlantic.xls" : No delimiters, date at front of string
// Case 9:  "Kest-15-03292008_Injury_Report.xls" : No delimiters, date mid-string, "-" characters do not delimit dates!
// Case 10:  "Sample-032908_Report.xls" : 2-digit year, no delimiters, date mid-string, "-" characters do not delimit dates!

My current approach is this:

1.  Starting from both ends of the file name string, scan inward until a digit is encountered, then pass the characters in between to a simple strtotime call, as in the code section below.

The above approach will fail in certain cases, though, such as:
"Sample File 04-04-2008 version 2008" :  In this case, it fails as digits are not expected to the right of the date.
"Kest-15-03292008_Injury_Report.xls":  This fails as the digits are not expected to the left of the date string.
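The two failure cases above can be reproduced with a short sketch of the trimming step alone. The helper name trim_to_outer_digits is hypothetical (it is not from the original post), and the sketch deliberately stops before the strtotime call, so it only shows which substring would be handed to the parser.

```php
<?php
// Hypothetical helper: keeps everything from the first digit in the
// name through the last digit, which is the trimming step described above.
function trim_to_outer_digits(string $name): ?string
{
    // Greedy match, so the span runs to the LAST digit in the string.
    return preg_match('/\d.*\d/', $name, $m) ? $m[0] : null;
}

$names = [
    'Morning Meeting 4-29-08.xls',
    'Sample File 04-04-2008 version 2008',
    'Kest-15-03292008_Injury_Report.xls',
];

foreach ($names as $name) {
    // Show the candidate substring that would be passed to strtotime.
    echo $name, ' => ', var_export(trim_to_outer_digits($name), true), "\n";
}
```

Only the first name yields a clean date string; the other two carry extra digits on the right or the left, which is why a pattern that describes the date itself is likely a better fit than trimming.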

Any advice on a robust way to parse dates from these kinds of unstructured file names?

Many thanks,

-Kevin
echo ("Embedded date: 4-29-08: ".date("Y-m-d", strtotime("4-29-08"))."\n");

Question by:Kevin_Cain
 
LVL 18

Expert Comment

by:Hube02
ID: 24124072
I would start with a regular expression to extract the date and then parse the date from there. Try the attached code. The regular expression is meant to match all of the possibilities given in your examples. I put your examples into a single string and then pulled all of the dates out of it.

For a single file name, you could use the following:

preg_match($regex, $string, $matches);

and your date would be stored in $matches[2].

<?php
	
$string = '// Case 1:  "Morning Meeting 4-29-08.xls" : Typical format, 1-digit month
// Case 2:  "Photon_DPR_04-25-2008_Atlantic.xls" : 2-digit month
// Case 3:  "Canyon DPR April 22 08.xlsx" : Month is spelled out
// Case 4:  "CDI_Time_Ticket_April 02_08.xls" : Month is spelled out, underscore used to delimit day/year
// Case 5:  "CCS_Atlantic_Log_Sheet_5-12-08.xls" : Underscore and 1-digit month value
// Case 6:  "CCS_Atlantic_DPR_05-14-08.xls" : Underscore and 2-digit month value
// Case 7:  "Integra_Daily_Report_4-21-2008.pdf" : Underscore and 2-digit month value, with 4-digit year
// Case 8:  "05132008 Wachs DPR Atlantic.xls" : No delimiters, date at front of string
// Case 9:  "Kest-15-03292008_Injury_Report.xls" : No delimiters, date mid-string, "-" characters do not delimit dates!
// Case 10:  "Sample-032908_Report.xls" : 2-digit year, no delimiters, date mid-string, "-" characters do not delimit dates!';
 
$regex = '/(:\d+-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';
preg_match_all($regex, $string, $dates);
print_r($dates);
	
?>

 
LVL 18

Expert Comment

by:Hube02
ID: 24124073
I take that back, I'm having trouble with your Case 9.
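A minimal sketch of where Case 9 goes wrong, assuming the first pattern posted above is applied to one file name at a time with preg_match. The wrapper name first_date_text is hypothetical; the pattern is copied as posted.

```php
<?php
// First pattern from the earlier comment, copied as posted.
$regex = '/(:\d+-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';

// Hypothetical wrapper: returns the text captured in group 2, or null.
function first_date_text(string $regex, string $name): ?string
{
    return preg_match($regex, $name, $matches) ? $matches[2] : null;
}

// Case 1 is captured cleanly.
var_dump(first_date_text($regex, 'Morning Meeting 4-29-08.xls'));

// Case 9: the match starts at the "15" before the date, so the captured
// text is a mix of the prefix and a truncated date.
var_dump(first_date_text($regex, 'Kest-15-03292008_Injury_Report.xls'));
```

For Case 9 the pattern treats "15" as the month, "03" as the day and "2920" as the year, so the leading "-15-" has to be consumed before the date group can line up.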
 
LVL 18

Accepted Solution

by:
Hube02 earned 125 total points
ID: 24124079
Got it. As long as your dates conform to only the cases given above, you can use this regex:

$regex = '/(-\d{2}-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';
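As a rough sketch of the remaining "parse the date from there" step, the pattern can be given separate capture groups for month, day and year and the result normalized to Y-m-d. This is a variant of the regex above rather than the posted one: the function name extract_file_date, the extra capture groups, the restriction to 2- or 4-digit years, and the rule that a 2-digit year means 20xx are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the embedded date as Y-m-d, or null.
function extract_file_date(string $name): ?string
{
    $months = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
               'august', 'september', 'october', 'november', 'december'];

    // Variant of the accepted pattern with month, day and year captured
    // separately. The optional "-NN-" prefix is consumed but not captured.
    $regex = '/(?:-\d{2}-)?(\d{1,2}|' . implode('|', $months) . ')'
           . '[- _]*(\d{1,2})[- _]*(\d{4}|\d{2})/i';

    if (!preg_match($regex, $name, $m)) {
        return null;
    }

    // Month is either a number or a spelled-out name.
    $month = ctype_digit($m[1])
        ? (int) $m[1]
        : array_search(strtolower($m[1]), $months, true) + 1;
    $day  = (int) $m[2];
    $year = (int) $m[3];
    if (strlen($m[3]) === 2) {
        $year += 2000; // Assumption: 2-digit years are 20xx.
    }

    // Reject impossible dates such as month 15 or day 40.
    return checkdate($month, $day, $year)
        ? sprintf('%04d-%02d-%02d', $year, $month, $day)
        : null;
}

echo extract_file_date('Kest-15-03292008_Injury_Report.xls'), "\n";
echo extract_file_date('Canyon DPR April 22 08.xlsx'), "\n";
```

Like the regex it is based on, this only covers the month-day-year layouts listed in the question; file names with other orderings or stray digit runs would need additional rules.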

 

Author Closing Comment

by:Kevin_Cain
ID: 31569249
Very fast response, and a well composed answer.
 
LVL 18

Expert Comment

by:Hube02
ID: 24124094
Thanks for the question.