Parsing Time String with Timezone info

I'm trying to parse Apache log files with Python and I'm having difficulty parsing the time zone information.

import time
t = "23/Mar/2007:02:01:14 -0500"
pdate = time.strptime(t, "%d/%b/%Y:%H:%M:%S %Z")

The above code fails. How can I take the above string and parse it to get a Unix timestamp in GMT? I looked at mx.DateTime but the parser documentation didn't seem to say anything about timezone information. Does mx.DateTime handle this correctly or do I need something different?
phasevar Asked:

pepr Commented:
The problem is the %Z directive: it matches a time zone name, not a numeric offset such as -0500. Try the following:

import time
# t = "23/Mar/2007:02:01:14 -0500"
t = "23/Mar/2007:02:01:14"
pdate = time.strptime(t, "%d/%b/%Y:%H:%M:%S")
print(time.strftime("%d/%b/%Y:%H:%M:%S %Z", pdate))

I commented out your original string and dropped the time zone from both the value and the strptime() format string. With that change pdate is parsed correctly (in my case). Formatting it back with strftime() and your original format string shows what %Z expects as input.

In my case it looks really strange (you know MS Windows and their view on how the time zone should be displayed):

23/Mar/2007:02:01:14 Střední Evropa (běžný čas)

It is in Czech and means "Central Europe (standard time)".

When I put that string back into your t, set the encoding of the Python source file correctly, and use your original code, it works. Now the question is how to accept the -0500.
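To make the behaviour above concrete, here is a small sketch (written for Python 3, and assuming an English or C locale so that "Mar" is recognized by %b). It only illustrates that %Z accepts a zone name such as GMT but rejects a numeric offset; it does not solve the -0500 case.

```python
import time

fmt = "%d/%b/%Y:%H:%M:%S %Z"

# A zone *name* that strptime() recognizes parses fine.
parsed = time.strptime("23/Mar/2007:02:01:14 GMT", fmt)
print(parsed.tm_hour)

# A numeric offset is not a zone name, so the same format is rejected.
try:
    time.strptime("23/Mar/2007:02:01:14 -0500", fmt)
    offset_accepted = True
except ValueError as exc:
    offset_accepted = False
    print("rejected:", exc)
```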
pepr Commented:
You can split off the time zone offset manually and convert it to an int:

t = "23/Mar/2007:02:01:14 -0500"
tlocStr, zoneStr = t.split(" ")
# Signed hours only ("-05" -> -5); the minutes part of the offset is ignored.
zoneOffset = int(zoneStr[:3])
print(tlocStr)
print(zoneStr)
print(zoneOffset)

You can get the local time from tlocStr using the simplified format string ("%d/%b/%Y:%H:%M:%S") and apply the zone offset manually: subtracting the offset from the local time gives UTC. If the log always contains the same zone offset, you may want to ignore it.
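As a rough sketch of that approach, the following combines the split, the simplified format string, and the offset into one function that returns a Unix timestamp. The function name apache_time_to_epoch is made up for illustration, and the code assumes the offset always has the form sign plus four digits (for example -0500 or +0530) and that month names are in English.

```python
import calendar
import time

def apache_time_to_epoch(stamp):
    """Convert e.g. '23/Mar/2007:02:01:14 -0500' to seconds since the Unix epoch.

    Hypothetical helper: assumes the offset is always sign + HHMM.
    """
    local_str, zone_str = stamp.split(" ")
    # Parse the wall-clock fields without any zone information.
    fields = time.strptime(local_str, "%d/%b/%Y:%H:%M:%S")
    sign = -1 if zone_str[0] == "-" else 1
    offset_seconds = sign * (int(zone_str[1:3]) * 3600 + int(zone_str[3:5]) * 60)
    # timegm() treats the fields as UTC. Since local = UTC + offset,
    # the real UTC instant is local - offset.
    return calendar.timegm(fields) - offset_seconds

print(apache_time_to_epoch("23/Mar/2007:02:01:14 -0500"))
```

calendar.timegm() is used instead of time.mktime() so that the time zone of the machine running the script does not influence the result.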
pepr Commented:
Have a look at RFC2822 (http://www.faqs.org/rfcs/rfc2822.html) for what the -0500 means exactly. (To put it simply, the local time is 5 hours behind UTC, a.k.a. Greenwich Mean Time, so UTC is the local time plus 5 hours.)
phasevar (Author) Commented:
Yeah, the TZ is the issue. Without parsing the TZ info I'll have incorrect dates.
pepr Commented:
Try the standard datetime module to apply the time zone delta in hours. Since the local time is UTC plus the offset, subtract the offset to get UTC:

==========================================================
import datetime

t = "23/Mar/2007:02:01:14 -0500"
tloc, zone = t.split(" ")
zoneOffset = int(zone[:3])   # signed hours only; offset minutes are ignored
print(tloc)
print(zoneOffset)

dt = datetime.datetime.strptime(tloc, "%d/%b/%Y:%H:%M:%S")
delta = datetime.timedelta(hours=zoneOffset)

print(dt.strftime("%d/%b/%Y:%H:%M:%S"))
dt = dt - delta   # local = UTC + offset, so UTC = local - offset
print(dt.strftime("%d/%b/%Y:%H:%M:%S"))
==========================================================

For the sample string this produces
==========================================================
23/Mar/2007:02:01:14
-5
23/Mar/2007:02:01:14
23/Mar/2007:07:01:14
==========================================================
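For readers on a newer interpreter, a possible alternative is sketched below. It assumes Python 3.2 or later, where datetime.strptime() accepts the %z directive for numeric offsets such as -0500; this was not available in the Python versions discussed in this thread, so treat it as an option rather than the approach used above.

```python
from datetime import datetime, timezone

t = "23/Mar/2007:02:01:14 -0500"

# %z (lowercase) parses a numeric UTC offset and yields an aware datetime.
aware = datetime.strptime(t, "%d/%b/%Y:%H:%M:%S %z")

# Convert to UTC for display, and to a Unix timestamp for storage.
utc = aware.astimezone(timezone.utc)
print(utc.strftime("%d/%b/%Y:%H:%M:%S"))
print(int(aware.timestamp()))
```

Because the parsed value carries its own offset, no manual splitting or sign handling is needed, and offsets with a minutes part (such as +0530) are handled as well.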
