Solved

Regular Expression Needed

Posted on 2006-11-29
Last Modified: 2010-03-05
I am looking for a Perl regular expression that extracts part of an HTTP request line.

Basically take the following request examples:

1) GET /index.html HTTP/1.1
2) GET /dir/test.asp?param=value HTTP/1.1
3) POST /dir/dir/dir/test.php HTTP/1.1
4) GET /dir/index.js HTTP/1.1
5) POST /dir/dir/post.asp?param=value HTTP/1.1
6) GET /dir/images/index.jpg HTTP/1.1
7) GET / HTTP/1.1
8) GET /test/script HTTP/1.1


I would like a regular expression that would give me the actual page name. For each one I would like the following to be in Group 1.

1) index.html
2) test.asp
3) test.php
4) index.js
5) post.asp
6) index.jpg
7) /
8) script

I don't care about the number or names of directories unless there is no specific resource name.
Question by:mikedgibson
6 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 18039785
Are the "1)", "2)" prefixes part of the request examples and of the output you want?
 
LVL 2

Expert Comment

by:jingks03
ID: 18039863
# The following regex works for your test data. The one exception is the bare '/', which comes back as an empty string.
# I don't know how to return the '/' in that case with a regex alone; with an if () else () I could.

# If you do NOT want the 1), 2) etc.:
if (m#/([\w\.]*)(?:\?[\w\=]*)?\sHTTP/1\.1#) { print "$1\n"; }

# If you do want them:
if (m#^(\d+\)\s).*/([\w\.]*)(?:\?[\w\=]*)?\sHTTP/1\.1#) { print "$1$2\n"; }
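
The empty result for a bare '/' can be covered by the if/else fallback mentioned in the comments above. The following is only a minimal sketch: the helper name page_name is hypothetical, and the pattern is based on the first regex above with the query-string class allowed to repeat.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the page name from a
# request line, falling back to "/" when the path has no resource name.
sub page_name {
    my ($request) = @_;
    if ($request =~ m#/([\w.]*)(?:\?[\w=]*)?\sHTTP/1\.1#) {
        # An empty capture means the path ended in "/".
        return length($1) ? $1 : '/';
    }
    return;    # no match: undef in scalar context
}

print page_name('GET / HTTP/1.1'), "\n";
print page_name('GET /dir/index.js HTTP/1.1'), "\n";
```

This keeps the regex itself unchanged in spirit and moves the special case into ordinary Perl, which is usually easier to read than forcing both cases into one pattern.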
 
LVL 2

Author Comment

by:mikedgibson
ID: 18039894
No, I do not want the 1) and 2) included. I was just showing which output corresponds to which input.
 
LVL 2

Expert Comment

by:jingks03
ID: 18039930
In my version, you may have to widen the [\w\=] character class to accept whatever characters are allowed in "param=value".
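
One possible way to make that change, offered as a sketch and not as the poster's own wording: instead of listing the allowed characters, treat everything from "?" up to the next whitespace as the query string. The use of \S* here is an assumption about what counts as acceptable, and the first sample line is made up for illustration.

```perl
use strict;
use warnings;

# Assumption: the query string is whatever follows "?" up to the next space.
my $re = qr#/([\w.]*)(?:\?\S*)?\sHTTP/1\.1#;

for my $line ('GET /dir/test.asp?param=value&x=a%20b HTTP/1.1',
              'POST /dir/dir/post.asp?param=value HTTP/1.1') {
    print "$1\n" if $line =~ $re;
}
```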
 
LVL 2

Author Comment

by:mikedgibson
ID: 18040107
It doesn't need to be just a regex. If you need to use if () else (), that is fine as well.
 
LVL 84

Accepted Solution

by:ozo (earned 250 total points)
ID: 18040116
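# Print the last path segment of each request line ("/" when the path is bare).
while ( <DATA> ) {
    # (?:\S*/)? skips the directories; ([^?\s]+) captures up to "?" or whitespace
    print "$1\n" if m#(?=/)(?:\S*/)?([^?\s]+)[\s?]#;
}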
__DATA__
GET /index.html HTTP/1.1
GET /dir/test.asp?param=value HTTP/1.1
POST /dir/dir/dir/test.php HTTP/1.1
GET /dir/index.js HTTP/1.1
POST /dir/dir/post.asp?param=value HTTP/1.1
GET /dir/images/index.jpg HTTP/1.1
GET / HTTP/1.1
GET /test/script HTTP/1.1
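
The pattern above handles the eight sample requests. One case the samples do not cover is a query string that itself contains "/", where the greedy \S*/ would run into the query value. Since the asker said the answer need not be a single regex, here is a hedged sketch of a step-by-step alternative. The helper name resource_name is hypothetical, the first sample input is made up, and returning "/" for any path that ends in "/" is an assumption extended from example 7.

```perl
use strict;
use warnings;

# Hypothetical helper (name is not from the thread).
sub resource_name {
    my ($request) = @_;
    # The request target is the second whitespace-separated field.
    my (undef, $target) = split ' ', $request;
    return unless defined $target;
    $target =~ s/\?.*//s;               # drop the query string, if any
    return '/' if $target =~ m#/\z#;    # assumption: trailing "/" means no resource name
    $target =~ s#.*/##s;                # drop the directories
    return $target;
}

print resource_name('GET /dir/test.asp?path=/a/b HTTP/1.1'), "\n";
print resource_name('GET / HTTP/1.1'), "\n";
```

Removing the query string before looking for the last "/" is what keeps slashes inside parameter values from affecting the result.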
