Solved

Regex String Needed to Trap Certain URLs

Posted on 2013-02-04
Last Modified: 2013-02-05
I am using a program to create a sitemap.xml file for me. It crawls my site well, but I do not want it to index certain pages. Fortunately, the program lets me filter out pages using regular expressions. I am hoping someone can help me write a regex to plug into the program.

The following is an example of an .aspx page that I do not want indexed.

http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb

For the record, the URL is a profile page for a real estate listing. We only list properties in California, so /ca/ is static text. The URL structure is:
http://www.companysite.com/ca/{city}/{address}/{propertyID}/{variable}

Based on the above URL, I do not want to crawl any /ca/{city}/{address} pages, but I am okay with it crawling other city pages such as /ca/{city}/housingmarkettrends.

So, in layman's terms, below is the pattern that I think I need to trap. For ease of reading, I have put each piece of the URL on its own row:

{any string of chars, including special chars (hyphens, periods, etc.), that ends with a forward slash}
{string of chars that begins with a digit (zero through nine) and ends with a forward slash}
{string of chars that contains only digits (zero through nine) and ends with a forward slash}

I look forward to working with someone on this.  Thanks.  

Robert
Question by:PAEWINS

Accepted Solution

by:
Terry Woods earned 500 total points
ID: 38853070
This might do the trick:

http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/



Let me know how it goes.
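
To sanity-check the pattern outside the sitemap program, here is a minimal Python sketch. It assumes Python's `re` flavor (the target program may use a different engine), and the third URL is a hypothetical example built from the /ca/{city}/housingmarkettrends form mentioned in the question. The helper name `is_listing` is made up for illustration.

```python
import re

# The pattern from the answer, used verbatim. The unescaped "." matches any
# character, which is harmless here; writing "\." would make it stricter.
LISTING = re.compile(r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/")

urls = [
    # Listing pages from the thread: these should be trapped.
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb",
    "http://www.companysite.com/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb",
    # Hypothetical city page: this should NOT be trapped.
    "http://www.companysite.com/ca/anaheim/housingmarkettrends",
]


def is_listing(url):
    """Return True when the URL starts with a listing-style path."""
    return LISTING.match(url) is not None


results = [is_listing(u) for u in urls]
for url, hit in zip(urls, results):
    print("EXCLUDE" if hit else "keep   ", url)
```

The city page is kept because the segment after the city must begin with a digit, and "housingmarkettrends" does not.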

Author Closing Comment

by:PAEWINS
ID: 38857586
The regex created appears to work using an Expression Tester (http://www.regular-expressions.info/javascriptexample.html).  

Below are the testing variables.  

URL:   http://www.companysite.com/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb

REGEX:   http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/

However, when I supplied it to the program that called for it, the program did not seem to acknowledge it. Could the regex be written for a particular platform?

The program I am using is Gmapper.  It is an XML sitemap generator.  http://www.g-mapper.co.uk/download/index.aspx  

Thank you anyway.  I will need to research this further.
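
One possible explanation, offered only as a guess: a tool may require the pattern to match the entire URL, or may test only the path portion, rather than searching for the pattern anywhere in the string. How Gmapper actually applies its filters is not stated in this thread, so the sketch below just shows in Python how the same pattern behaves under each assumption. The `relaxed` pattern is a hypothetical variant, not something from the answer.

```python
import re

url = "http://www.companysite.com/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb"
path_only = "/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb"
pattern = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"

# Assumption 1: the tool searches for the pattern anywhere in the URL.
found_by_search = re.search(pattern, url) is not None

# Assumption 2: the tool requires the pattern to cover the whole URL.
# The trailing "?sorigin=hb" is not covered, so this fails.
found_by_fullmatch = re.fullmatch(pattern, url) is not None

# Assumption 3: the tool sees only the path, so the host part never matches.
found_in_path = re.search(pattern, path_only) is not None

# Hypothetical relaxed variant: drop the host and allow anything on either side.
relaxed = r".*/ca/[^?&/]+/\d[^?&/]*/\d+/.*"
relaxed_on_url = re.fullmatch(relaxed, url) is not None
relaxed_on_path = re.fullmatch(relaxed, path_only) is not None

print(found_by_search, found_by_fullmatch, found_in_path)
print(relaxed_on_url, relaxed_on_path)
```

If the relaxed variant works in the program where the original did not, that would point to whole-string or path-only matching rather than a flavor difference.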

Author Comment

by:PAEWINS
ID: 38857678
Terry,

I am trying to troubleshoot my issue and wrap my head around the regex string you provided. Can you break it down for me? I provided a URL with 5 forward slashes, and the regex you supplied has 7.

Are the two forward slashes inside the brackets metacharacters, i.e. part of the regex syntax?

Why does the expression end with a forward slash?  

/ca/[^?&/]+/\d[^?&/]*/\d+/

Thanks.
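
As a sketch of how the path portion of the expression breaks down, here it is written in Python's verbose mode, where whitespace and `#` comments inside the pattern are ignored. Five of the seven slashes are literal segment separators. The other two sit inside `[^?&/]`, a negated character class meaning "any one character that is not ?, & or /", so they are ordinary characters in a list of exclusions rather than separators. The final slash is a literal that matches the slash following the property ID. The name `LISTING_PATH` is made up for illustration.

```python
import re

LISTING_PATH = re.compile(r"""
    /ca/          # literal text: the static state segment (2 slashes)
    [^?&/]+       # city: one or more characters that are not ?, & or /
    /             # literal slash closing the city segment
    \d[^?&/]*     # address: one digit, then any characters except ?, & or /
    /             # literal slash closing the address segment
    \d+           # property ID: one or more digits
    /             # literal slash closing the property ID segment
""", re.VERBOSE)

url = "http://www.companysite.com/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb"
match = LISTING_PATH.search(url)
matched_text = match.group(0) if match else None
print(matched_text)

# Slash count in the compact form of the same pattern.
compact = r"/ca/[^?&/]+/\d[^?&/]*/\d+/"
total_slashes = compact.count("/")
slashes_in_classes = compact.count("[^?&/]")
print(total_slashes, slashes_in_classes)
```

Excluding "/" inside the brackets is what keeps each piece confined to a single URL segment, so the city part cannot swallow the address or the property ID.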