Solved

Parsing Time String with Timezone info

Posted on 2007-03-23
Last Modified: 2012-05-05
I'm trying to parse Apache log files with Python and I'm having difficulty parsing the time zone information.

import time
t = "23/Mar/2007:02:01:14 -0500"
pdate = time.strptime(t, "%d/%b/%Y:%H:%M:%S %Z")

The above code fails.  How can I take the above string and parse it to get a unix timestamp in GMT?  I looked at mx.DateTime but the parser documentation didn't seem to say anything about timezone information.  Does mx.DateTime handle this correctly or do I need something different?
Question by:phasevar
5 Comments
 

Expert Comment

by:pepr
ID: 18782202
The problem is the %Z directive: it matches a time zone name, not a numeric offset such as -0500. Try the following:

import time
# t = "23/Mar/2007:02:01:14 -0500"
t = "23/Mar/2007:02:01:14"
pdate = time.strptime(t, "%d/%b/%Y:%H:%M:%S")
print(time.strftime("%d/%b/%Y:%H:%M:%S %Z", pdate))

I commented out your original string and dropped the offset from both the value and the strptime() format string. With that change pdate is parsed correctly (in my case). Formatting it back with strftime() and your original format string shows what %Z expects to find in the input.

In my case it looks really strange (MS Windows has its own view of how a time zone should be displayed):

23/Mar/2007:02:01:14 Střední Evropa (běžný čas)

It is in Czech and means "Central Europe (standard time)".

When I put that string back into your t, set the encoding of the Python source file correctly, and use your original code, it works. The question now is how to accept the -0500.
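For readers on a newer interpreter: as far as I know, datetime.strptime() in Python 3 (3.2 and later) accepts the lowercase %z directive, which matches a numeric offset such as -0500. That was not an option with the Python of this thread's era, so treat the following only as a sketch under that assumption. The variable names are just illustrative.

```python
from datetime import datetime, timezone

t = "23/Mar/2007:02:01:14 -0500"

# %z (lowercase) matches a numeric UTC offset; %Z (uppercase) matches a zone name.
aware = datetime.strptime(t, "%d/%b/%Y:%H:%M:%S %z")

# Convert the offset-aware value to UTC and to a Unix timestamp.
utc = aware.astimezone(timezone.utc)
unix_ts = int(aware.timestamp())

print(utc.strftime("%d/%b/%Y:%H:%M:%S"))
print(unix_ts)
```

Because the parsed value carries its own offset, timestamp() does not depend on the time zone of the machine running the script.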

Expert Comment

by:pepr
ID: 18782484
You can split off the time zone offset manually and convert it to an int:

t = "23/Mar/2007:02:01:14 -0500"
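# Separate the local time from the numeric offset.
tlocStr, zoneStr = t.split(" ")
# Signed hours only; the minutes part (zoneStr[3:]) is ignored here.
zoneOffset = int(zoneStr[:3])
print(tlocStr)
print(zoneStr)
print(zoneOffset)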

You can get the local time from tlocStr using the simplified format string ("%d/%b/%Y:%H:%M:%S") and then apply the zone offset manually (subtract it from the local time to get UTC). If the log always contains the same zone offset, you may want to ignore it.
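The int(zoneStr[:3]) trick only keeps whole hours, so an offset like +0530 would lose its 30 minutes. Here is a minimal sketch of a fuller conversion, assuming the offset always has the Apache shape of a sign followed by four digits. The function name parse_offset_seconds is made up for this example.

```python
def parse_offset_seconds(zone):
    """Convert an offset string such as "-0500" or "+0530" to signed seconds.

    Assumes the form is a sign character followed by exactly HHMM.
    """
    sign = -1 if zone[0] == "-" else 1
    hours = int(zone[1:3])
    minutes = int(zone[3:5])
    return sign * (hours * 3600 + minutes * 60)


print(parse_offset_seconds("-0500"))  # -18000
print(parse_offset_seconds("+0530"))  # 19800
```

Working in seconds keeps the later arithmetic simple, whether you subtract from a timestamp or build a timedelta(seconds=...).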

Expert Comment

by:pepr
ID: 18782523
Have a look at RFC 2822 (http://www.faqs.org/rfcs/rfc2822.html) for what -0500 means exactly. (Put simply, the local time is 5 hours behind UTC, a.k.a. Greenwich Mean Time, so UTC is the local time plus 5 hours.)
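To make the sign convention concrete, here is a small sketch that models -0500 as a fixed-offset time zone. It assumes Python 3, where datetime.timezone exists; the names est, local and utc are only illustrative.

```python
from datetime import datetime, timedelta, timezone

# "-0500" as a fixed-offset zone: local time is 5 hours behind UTC.
est = timezone(timedelta(hours=-5))

# The time from the log line, tagged with that offset.
local = datetime(2007, 3, 23, 2, 1, 14, tzinfo=est)

# Converting to UTC moves the clock forward by 5 hours.
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2007-03-23T07:01:14+00:00
```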

Author Comment

by:phasevar
ID: 18782562
Yeah, the TZ is the issue.   Without parsing the TZ info I'll have incorrect dates.

Accepted Solution

by:pepr (earned 500 total points)
ID: 18783206
Try the standard datetime module and apply the time zone offset as a timedelta. Note that the offset has to be subtracted from the local time to get UTC:

==========================================================
import datetime

t = "23/Mar/2007:02:01:14 -0500"
tloc, zone = t.split(" ")
zoneOffset = int(zone[:3])  # signed hours only; minutes in zone[3:] are ignored
print(tloc)
print(zoneOffset)

dt = datetime.datetime.strptime(tloc, "%d/%b/%Y:%H:%M:%S")
delta = datetime.timedelta(hours=zoneOffset)

print(dt.strftime("%d/%b/%Y:%H:%M:%S"))
dt = dt - delta  # UTC = local time - offset
print(dt.strftime("%d/%b/%Y:%H:%M:%S"))
==========================================================

With the input above it produces
==========================================================
23/Mar/2007:02:01:14
-5
23/Mar/2007:02:01:14
23/Mar/2007:07:01:14
==========================================================
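Since the original question asked for a Unix timestamp, here is one way the pieces could be combined, as a sketch only. It uses calendar.timegm() from the standard library, which interprets a struct_time as UTC. The function name apache_time_to_unix is invented for this example, and the offset is assumed to be a sign followed by HHMM.

```python
import calendar
import time


def apache_time_to_unix(t):
    """Return seconds since the epoch (UTC) for an Apache log time string."""
    tloc, zone = t.split(" ")
    # Signed offset in seconds, including the minutes part.
    sign = -1 if zone[0] == "-" else 1
    offset = sign * (int(zone[1:3]) * 3600 + int(zone[3:5]) * 60)
    # timegm() treats the parsed fields as if they were already UTC,
    # so the offset is subtracted afterwards to get the real UTC value.
    local_as_utc = calendar.timegm(time.strptime(tloc, "%d/%b/%Y:%H:%M:%S"))
    return local_as_utc - offset


print(apache_time_to_unix("23/Mar/2007:02:01:14 -0500"))  # 1174633274
```

Using timegm() instead of time.mktime() keeps the result independent of the local time zone of the machine that processes the log.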
