Regular Expression Needed

Posted on 2006-11-29
Last Modified: 2010-03-05
I am looking for a Perl regular expression to extract part of an HTTP request.

Basically take the following request examples:

1) GET /index.html HTTP/1.1
2) GET /dir/test.asp?param=value HTTP/1.1
3) POST /dir/dir/dir/test.php HTTP/1.1
4) GET /dir/index.js HTTP/1.1
5) POST /dir/dir/post.asp?param=value HTTP/1.1
6) GET /dir/images/index.jpg HTTP/1.1
7) GET / HTTP/1.1
8) GET /test/script HTTP/1.1

I would like a regular expression that would give me the actual page name. For each one I would like the following to be in Group 1.

1) index.html
2) test.asp
3) test.php
4) index.js
5) post.asp
6) index.jpg
7) /
8) script

I don't care about the number or names of directories unless there is no specific resource name.
Question by: mikedgibson

Expert Comment

Is the "1)", "2)", etc. part of the request examples and of the output you want?

Expert Comment

# The following regex works for your test data, except that the single '/' is returned as an empty string.
# I don't know how to return the '/' in that case with only a regex; with an if () else () I could.

# if you DO NOT want the 1), 2) etc...
if (m#/([\w\.]*)(?:\?[\w\=]*)?\sHTTP/1\.1#) { print "$1\n"; }

# if you do want them
if (m#^(\d+\)\s).*/([\w\.]*)(?:\?[\w\=]*)?\sHTTP/1\.1#) { print "$1$2\n"; }

Author Comment

No, I do not want the 1) and 2) included; I was just showing which output corresponds to which input.

Expert Comment

In my version, you may have to edit the [\w\=] to accept whatever is acceptable in "param=value".
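
One possible reading of that suggestion, as a rough sketch rather than a tested solution: let the optional query part accept any run of non-whitespace characters after the '?', so values containing '&', '%' and similar still match. The sub name page_name_loose is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: same idea as the regex above, but the optional
# query part is (?:\?\S*)? so it accepts anything up to the next space.
sub page_name_loose {
    my ($request) = @_;
    return $1 if $request =~ m#/([\w.]*)(?:\?\S*)?\sHTTP/1\.1#;
    return;    # no match
}

print page_name_loose("GET /dir/test.asp?param=value&x=a%20b HTTP/1.1"), "\n";
```

Like the original, this still returns an empty string for the bare "GET / HTTP/1.1" request.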

Author Comment

It doesn't need to be just a regex. If you need to use if () else (), that is fine as well.
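
For illustration, an if/else version might look something like the sketch below. It assumes the request line has the form "METHOD target HTTP/x.y" with no leading "1)" label, and it treats any target ending in a slash (not only the bare "/") as having no resource name; the sub name page_name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: regex plus an explicit fallback for '/'.
sub page_name {
    my ($request) = @_;
    # Second whitespace-separated field, cut off at any '?'.
    if ($request =~ m{^\S+\s+([^?\s]+)}) {
        my $path = $1;
        if ($path =~ m{/([^/]+)\z}) {
            return $1;     # last path segment, e.g. "index.html"
        }
        else {
            return '/';    # nothing after the final slash
        }
    }
    return;                # not a recognisable request line
}

for my $req ("GET / HTTP/1.1", "POST /dir/dir/post.asp?param=value HTTP/1.1") {
    print page_name($req), "\n";
}
```

With those two requests the loop prints "/" and then "post.asp".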

Accepted Solution

ozo earned 250 total points
while( <DATA> ){
    # $1 is the last path segment, or "/" when nothing follows the slash
    print "$1\n" if m#(?=/)(?:\S*/)?([^?\s]+)[\s?]#;
}
__DATA__
GET /index.html HTTP/1.1
GET /dir/test.asp?param=value HTTP/1.1
POST /dir/dir/dir/test.php HTTP/1.1
GET /dir/index.js HTTP/1.1
POST /dir/dir/post.asp?param=value HTTP/1.1
GET /dir/images/index.jpg HTTP/1.1
GET / HTTP/1.1
GET /test/script HTTP/1.1
