Regular Expression Needed

Posted on 2006-11-29
Last Modified: 2010-03-05
I am looking for a Perl regular expression to pull part of an HTTP request line out.

Take the following example requests:

1) GET /index.html HTTP/1.1
2) GET /dir/test.asp?param=value HTTP/1.1
3) POST /dir/dir/dir/test.php HTTP/1.1
4) GET /dir/index.js HTTP/1.1
5) POST /dir/dir/post.asp?param=value HTTP/1.1
6) GET /dir/images/index.jpg HTTP/1.1
7) GET / HTTP/1.1
8) GET /test/script HTTP/1.1


I would like a regular expression that gives me the actual page name. For each request, I would like the following in group 1:

1) index.html
2) test.asp
3) test.php
4) index.js
5) post.asp
6) index.jpg
7) /
8) script

I don't care about the number or names of directories unless there is no specific resource name.
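Since the page name is just the last segment of the path, one alternative to a single regex is to split the request line on whitespace, drop any query string, and take whatever follows the final '/'. This is a minimal sketch, not something posted in the thread: it assumes every line has the form METHOD PATH PROTOCOL, and the sub name page_name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the resource name from a request line such as
# "GET /dir/test.asp?param=value HTTP/1.1". Assumes "METHOD PATH PROTOCOL".
sub page_name {
    my ($request_line) = @_;
    my (undef, $path) = split ' ', $request_line;   # second field is the path
    return undef unless defined $path;
    $path =~ s/\?.*//;                              # drop any query string
    my ($name) = $path =~ m{([^/]+)$};              # text after the last '/'
    return defined $name ? $name : '/';             # bare "/" has no name
}

for my $line ('GET /dir/test.asp?param=value HTTP/1.1', 'GET / HTTP/1.1') {
    print page_name($line), "\n";                   # test.asp, then /
}
```

With this sketch, a path that ends in '/', such as /dir/, also comes back as '/'.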
Question by:mikedgibson

Expert Comment

by:ozo
ID: 18039785
Are the "1)", "2)" prefixes part of the request examples and of the output you want?

Expert Comment

by:jingks03
ID: 18039863
# The following regex works for your test data. The only catch is that the bare '/' is returned as a blank.
# I personally don't know how to return the '/' in that case with a regex alone; with an if () else () I could.

# if you DO NOT want the 1), 2), etc.
if (m#/([\w\.]*)(?:\?[\w\=]*)?\sHTTP/1\.1#) { print "$1\n"; }

# if you DO want them
if (m#^(\d+\)\s).*/([\w\.]*)(?:\?[\w\=]*)?\sHTTP/1\.1#) { print "$1$2\n"; }

Author Comment

by:mikedgibson
ID: 18039894
No, I do not want the 1) and 2) included. I was just showing which output corresponded to which input.

Expert Comment

by:jingks03
ID: 18039930
In the case of mine, you may have to edit the [\w\=] class to accept whatever characters can appear in "param=value".
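To illustrate the point about "param=value": even with a * after it, a class such as [\w\=] stops at characters like '&' or '%', so a request with two parameters no longer matches at all. The sketch below assumes only that the query string contains no whitespace and therefore matches it with \S*; page_name_broad is a hypothetical name, and the pattern is an adaptation, not code from the thread.

```perl
use strict;
use warnings;

# Same idea as jingks03's pattern, but the optional query string is matched
# with \S* (anything up to the next space) instead of a fixed character class.
sub page_name_broad {
    my ($line) = @_;
    return $line =~ m#/([\w\.]*)(?:\?\S*)?\sHTTP/1\.1# ? $1 : undef;
}

my $line = 'GET /dir/post.asp?a=1&b=2 HTTP/1.1';
print page_name_broad($line), "\n";                       # post.asp

# The narrower class cannot get past the '&', so this match fails.
print "narrow class failed\n"
    unless $line =~ m#/([\w\.]*)(?:\?[\w\=]*)?\sHTTP/1\.1#;
```

The page-name part, [\w\.]*, has the same kind of limit: a name containing '-' would need that class widened too.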

Author Comment

by:mikedgibson
ID: 18040107
It doesn't need to be just a regex. If you need to use if () else (), that is fine as well.
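Since if () else () is acceptable, the blank result for a bare '/' can be patched after the match: an empty capture means the path had no resource name, so '/' is returned instead. A minimal sketch, assuming the HTTP/1.1 request lines from the question; page_or_root is a hypothetical name, and the query-string part (?:\?\S*)? rests on the assumption that parameters never contain spaces.

```perl
use strict;
use warnings;

# Match the last path segment (possibly empty), then fall back to '/'
# when the capture is empty, i.e. the request was for the root.
sub page_or_root {
    my ($line) = @_;
    if ($line =~ m#/([\w\.]*)(?:\?\S*)?\sHTTP/1\.1#) {
        return length($1) ? $1 : '/';
    }
    else {
        return undef;    # not a request line of the expected shape
    }
}

for my $line ('GET / HTTP/1.1', 'GET /dir/images/index.jpg HTTP/1.1') {
    print page_or_root($line), "\n";    # prints "/", then "index.jpg"
}
```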

Accepted Solution

by:ozo (earned 250 total points)
ID: 18040116
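use strict;
use warnings;

while ( <DATA> ) {
    # (?=/)        start matching at the first '/' (the request path)
    # (?:\S*/)?    optionally skip directories, up to the last '/' that still leaves a name
    # ([^?\s]+)    capture the resource name; it stops at '?' or whitespace
    # [\s?]        the name must be followed by a space or a query string
    print "$1\n" if m#(?=/)(?:\S*/)?([^?\s]+)[\s?]#;
}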
__DATA__
GET /index.html HTTP/1.1
GET /dir/test.asp?param=value HTTP/1.1
POST /dir/dir/dir/test.php HTTP/1.1
GET /dir/index.js HTTP/1.1
POST /dir/dir/post.asp?param=value HTTP/1.1
GET /dir/images/index.jpg HTTP/1.1
GET / HTTP/1.1
GET /test/script HTTP/1.1
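To check the accepted pattern, the eight sample lines can be run through it alongside their expected names. The sketch below does that, and also tries one hypothetical request with a '/' inside its query string (not from the question): because \S* can cross the '?', the accepted pattern returns the segment after the last '/' in the query. The variant that uses [^?\s]* for the directory part is a hypothetical tweak, not part of ozo's answer; the sub names are also hypothetical.

```perl
use strict;
use warnings;

# ozo's accepted pattern, unchanged.
sub page_accepted {
    my ($line) = @_;
    return $line =~ m#(?=/)(?:\S*/)?([^?\s]+)[\s?]# ? $1 : undef;
}

# Hypothetical variant: the directory-skipping part may not cross a '?',
# so a '/' inside the query string is never treated as a directory separator.
sub page_no_query_slash {
    my ($line) = @_;
    return $line =~ m#(?=/)(?:[^?\s]*/)?([^?\s]+)[\s?]# ? $1 : undef;
}

my @cases = (
    [ 'GET /index.html HTTP/1.1',                    'index.html' ],
    [ 'GET /dir/test.asp?param=value HTTP/1.1',      'test.asp'   ],
    [ 'POST /dir/dir/dir/test.php HTTP/1.1',         'test.php'   ],
    [ 'GET /dir/index.js HTTP/1.1',                  'index.js'   ],
    [ 'POST /dir/dir/post.asp?param=value HTTP/1.1', 'post.asp'   ],
    [ 'GET /dir/images/index.jpg HTTP/1.1',          'index.jpg'  ],
    [ 'GET / HTTP/1.1',                              '/'          ],
    [ 'GET /test/script HTTP/1.1',                   'script'     ],
);

# Print each input, the captured name, and whether it matches the expectation.
for my $case (@cases) {
    my ($line, $want) = @$case;
    my $got = page_accepted($line);
    printf "%-45s %-10s %s\n", $line, $got,
        ($got eq $want ? 'ok' : 'MISMATCH');
}

# Hypothetical request with a '/' in the query string.
my $edge = 'GET /search.asp?next=/dir/x HTTP/1.1';
print page_accepted($edge), " vs ", page_no_query_slash($edge), "\n";   # x vs search.asp
```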