Solved

Extract date from string in PHP

Posted on 2009-04-11
Last Modified: 2013-12-12
I am trying to extract dates in PHP from a large number of existing files.

The trick is this: the dates are in many different formats, and they're embedded within the file names in different ways. Here are a few of the real-world cases I'm trying to fit:

// Case 1:  "Morning Meeting 4-29-08.xls" : Typical format, 1-digit month
// Case 2:  "Photon_DPR_04-25-2008_Atlantic.xls" : 2-digit month
// Case 3:  "Canyon DPR April 22 08.xlsx" : Month is spelled out
// Case 4:  "CDI_Time_Ticket_April 02_08.xls" : Month is spelled out, underscore used to delimit day/year
// Case 5:  "CCS_Atlantic_Log_Sheet_5-12-08.xls" : Underscore and 1-digit month value
// Case 6:  "CCS_Atlantic_DPR_05-14-08.xls" : Underscore and 2-digit month value
// Case 7:  "Integra_Daily_Report_4-21-2008.pdf" : Underscore and 1-digit month value, with 4-digit year
// Case 8:  "05132008 Wachs DPR Atlantic.xls" : No delimiters, date at front of string
// Case 9:  "Kest-15-03292008_Injury_Report.xls" : No delimiters, date mid-string, "-" characters do not delimit dates!
// Case 10:  "Sample-032908_Report.xls" : 2-digit year, no delimiters, date mid-string, "-" characters do not delimit dates!

My current approach is this:

1.  Starting from both the left and the right end of the file name string, search inward until a numeric character is encountered, then extract the characters in between and pass them to a simple strtotime call, as in the code section below.

This approach fails in certain cases, though, such as:
"Sample File 04-04-2008 version 2008" :  In this case, it fails as digits are not expected to the right of the date.
"Kest-15-03292008_Injury_Report.xls":  This fails as the digits are not expected to the left of the date string.

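For illustration, here is a minimal sketch of the approach described above, under one reading of it: take everything between the first and the last digit of the name. The helper name digitSpan is hypothetical and is not from the original code.

```php
<?php
// Hypothetical helper (not from the original post): returns the substring
// that runs from the first digit to the last digit of $fileName, or null
// when the name contains no digit at all.
function digitSpan(string $fileName): ?string
{
    // "\d.*\d" is greedy, so it stretches from the first digit to the last
    // one; the "|\d" branch covers names that contain a single digit.
    if (!preg_match('/\d.*\d|\d/', $fileName, $m)) {
        return null;
    }
    return $m[0];
}

// A clean case, followed by the two failing cases listed above.
var_dump(digitSpan('Morning Meeting 4-29-08.xls'));
var_dump(digitSpan('Sample File 04-04-2008 version 2008'));
var_dump(digitSpan('Kest-15-03292008_Injury_Report.xls'));
```

For the two failing names the span comes back as "04-04-2008 version 2008" and "15-03292008", so the extra digits end up inside the text handed to the date parser.
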
Any advice on a robust way to parse dates for these kinds of unstructured file names?

Many thanks,

-Kevin
echo ("Embedded date: 4-29-08: ".date("Y-m-d", strtotime("4-29-08"))."\n");

Question by:Kevin_Cain
Expert Comment

by:Hube02
ID: 24124072
I would start with a regular expression to extract the date and then parse the date from there. Try the attached code. The regular expression used will extract all of the possibilities given in your examples. I took your examples, made them into a single string, and then grabbed all of the dates out of it.

You could use the following:

preg_match($regex, $string, $matches);

and your date would be stored in $matches[2];
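
The "parse the date from there" step could look roughly like the sketch below. It is only an illustration: the helper name normalizeDateToken is hypothetical, and it assumes month-day-year order and that two-digit years belong to the 2000s, neither of which is stated in the thread.

```php
<?php
// Hypothetical helper: turns an extracted token such as "4-29-08",
// "April 22 08" or "03292008" into "YYYY-MM-DD", or returns null.
// Assumptions (not from the thread): the order is month, day, year,
// and a two-digit year means 20xx.
function normalizeDateToken(string $token): ?string
{
    // Month (digits or a name), day, then a 4-digit or 2-digit year,
    // with optional "-", " " or "_" separators in between.
    $pattern = '/^(\d{1,2}|[a-z]+)[- _]*(\d{1,2})[- _]*(\d{4}|\d{2})$/i';
    if (!preg_match($pattern, $token, $m)) {
        return null;
    }

    $months = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
               'august', 'september', 'october', 'november', 'december'];
    if (ctype_digit($m[1])) {
        $month = (int) $m[1];
    } else {
        $index = array_search(strtolower($m[1]), $months, true);
        if ($index === false) {
            return null; // not a recognised month name
        }
        $month = $index + 1;
    }

    $day  = (int) $m[2];
    $year = (int) $m[3];
    if (strlen($m[3]) === 2) {
        $year += 2000; // assumption: two-digit years are 20xx
    }

    // checkdate() rejects impossible dates such as month 13 or day 45.
    if (!checkdate($month, $day, $year)) {
        return null;
    }
    return sprintf('%04d-%02d-%02d', $year, $month, $day);
}

var_dump(normalizeDateToken('April 22 08'));
```

Splitting the token by hand avoids relying on how strtotime interprets ambiguous strings like "4-29-08".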

<?php

// All ten sample cases from the question, combined into one test string.
$string = '// Case 1:  "Morning Meeting 4-29-08.xls" : Typical format, 1-digit month
// Case 2:  "Photon_DPR_04-25-2008_Atlantic.xls" : 2-digit month
// Case 3:  "Canyon DPR April 22 08.xlsx" : Month is spelled out
// Case 4:  "CDI_Time_Ticket_April 02_08.xls" : Month is spelled out, underscore used to delimit day/year
// Case 5:  "CCS_Atlantic_Log_Sheet_5-12-08.xls" : Underscore and 1-digit month value
// Case 6:  "CCS_Atlantic_DPR_05-14-08.xls" : Underscore and 2-digit month value
// Case 7:  "Integra_Daily_Report_4-21-2008.pdf" : Underscore and 1-digit month value, with 4-digit year
// Case 8:  "05132008 Wachs DPR Atlantic.xls" : No delimiters, date at front of string
// Case 9:  "Kest-15-03292008_Injury_Report.xls" : No delimiters, date mid-string, "-" characters do not delimit dates!
// Case 10:  "Sample-032908_Report.xls" : 2-digit year, no delimiters, date mid-string, "-" characters do not delimit dates!';

// Group 2 captures the date: a numeric or spelled-out month, then day and year.
$regex = '/(:\d+-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';

// Collect every match in the test string and dump the capture groups.
preg_match_all($regex, $string, $dates);

print_r($dates);

?>


Expert Comment

by:Hube02
ID: 24124073
I take that back, I'm having trouble with your Case 9...

Accepted Solution

by:
Hube02 earned 125 total points
ID: 24124079
Got it. As long as your dates conform only to the cases given above, you can use this regex:

$regex = '/(-\d{2}-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';
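
As a rough sketch of how this regex might be applied to one file name at a time rather than to a combined test string, the snippet below wraps it in a helper. The function name extractDateToken is hypothetical; the pattern is copied unchanged from the line above.

```php
<?php
// Hypothetical helper: returns the date token found in $fileName using the
// regex above, or null when nothing matches.
function extractDateToken(string $fileName): ?string
{
    // Pattern copied unchanged from the answer; capture group 2 holds the date.
    $regex = '/(-\d{2}-)?((\d{1,2}|january|february|march|april|may|june|july|august|september|october|november|december)[- _]*\d{1,2}[- _]*\d{2,4})/i';

    return preg_match($regex, $fileName, $matches) ? $matches[2] : null;
}

// A few of the file names from the question.
$fileNames = [
    'Morning Meeting 4-29-08.xls',
    'Canyon DPR April 22 08.xlsx',
    'Kest-15-03292008_Injury_Report.xls',
    'Sample-032908_Report.xls',
];

foreach ($fileNames as $fileName) {
    printf("%s => %s\n", $fileName, extractDateToken($fileName) ?? '(no date found)');
}
```

With this pattern, Case 9 yields "03292008" and Case 10 yields "032908". The optional (-\d{2}-) group only absorbs a two-digit prefix such as "-15-", so a name like "Kest-7-03292008" would still be misread, which is why the dates need to conform to the listed cases.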

 

Author Closing Comment

by:Kevin_Cain
ID: 31569249
Very fast response, and a well composed answer.

Expert Comment

by:Hube02
ID: 24124094
Thanks for the question.