Solved

Regex String Needed to Trap Certain URLs

Posted on 2013-02-04
Last Modified: 2013-02-05
I am using a program to generate a sitemap.xml file for me. It crawls my site well, but I do not want it to index certain pages. Fortunately, the program lets me filter out pages using regular expressions. I hope someone can help me write the regex to plug into the program.

Following is an example of an aspx page that I do not want indexed.

http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb

For the record, the URL is a profile page for a real estate listing. We only list properties in California, so /ca/ can be treated as static text. The URL structure is:
http://www.companysite.com/ca/{city}/{address}/{propertyID}/{variable}

Based on the above URL, I do not want it to crawl any /ca/{city}/{address} pages. But I am okay with it crawling other city pages such as /ca/{city}/housingmarkettrends.

So in layman's terms, below is the pattern I think I need to trap. For ease of reading, I have put each piece of the URL string on its own row:

{any string of chars, including special chars (hyphens, periods, etc.), that ends with a forward slash}
{string of chars that begins with a digit (zero through nine) and ends with a forward slash}
{string of chars that contains only digits (zero through nine) and ends with a forward slash}
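
As a rough illustration, the three rows above can each be written as a regex fragment and joined after the static /ca/ prefix. This is a minimal Python sketch, not the syntax of any particular sitemap tool; the names CITY, ADDRESS, PROPERTY_ID and is_listing are made up for the example.

```python
import re

# Each row of the description becomes one fragment (names are illustrative).
CITY = r"[^/]+/"         # any characters except "/", then a closing slash
ADDRESS = r"\d[^/]*/"    # starts with a digit, then anything up to the next slash
PROPERTY_ID = r"\d+/"    # digits only, then a closing slash

# /ca/ is static text, so it is written literally in front of the fragments.
LISTING_PATH = re.compile(r"/ca/" + CITY + ADDRESS + PROPERTY_ID)

def is_listing(url: str) -> bool:
    """Return True when the URL contains a /ca/{city}/{address}/{propertyID}/ path."""
    return LISTING_PATH.search(url) is not None
```

With this sketch, the Anaheim listing URL above is treated as a listing, while /ca/{city}/housingmarkettrends is not, because "housingmarkettrends" does not begin with a digit.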

I look forward to working with someone on this.  Thanks.  

Robert
Question by:PAEWINS
LVL 35

Accepted Solution

by:
Terry Woods earned 500 total points
ID: 38853070
This might do the trick:

http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/



Let me know how it goes.
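
For anyone who wants to try the pattern outside the sitemap program, here is a small Python sketch that uses it as an exclusion filter over a list of crawled URLs. The function name keep_for_sitemap and the crawled list are assumptions for illustration; the pattern itself is copied unchanged from the answer above.

```python
import re

# Pattern from the accepted answer, unchanged.
EXCLUDE = re.compile(r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/")

def keep_for_sitemap(urls):
    """Return the URLs that do NOT match the exclusion pattern."""
    return [u for u in urls if not EXCLUDE.search(u)]

# Sample URLs taken from the thread.
crawled = [
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb",
    "http://www.companysite.com/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb",
    "http://www.companysite.com/ca/anaheim/housingmarkettrends",
]

print(keep_for_sitemap(crawled))
```

Only the housingmarkettrends URL survives the filter. Note that the dots in the host name are unescaped, so each one matches any character; that is harmless here, but writing www\.companysite\.com would be stricter.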

Author Closing Comment

by:PAEWINS
ID: 38857586
The regex created appears to work using an Expression Tester (http://www.regular-expressions.info/javascriptexample.html).  

Below are the testing variables.  

URL:   http://www.companysite.com/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb 

REGEX:   http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/

However, when I supplied it to the program that called for it, the program does not seem to acknowledge it. Could the regex have been written for a particular platform?

The program I am using is Gmapper.  It is an XML sitemap generator.  http://www.g-mapper.co.uk/download/index.aspx  

Thank you anyway.  I will need to research this further.
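
One possible explanation, and this is only a guess since nothing in this thread says how Gmapper applies its filters: some tools require a pattern to match the entire URL, while online testers usually search for a match anywhere in the string. The Python sketch below shows how the same pattern and URL behave under both rules; the variable names are illustrative.

```python
import re

PATTERN = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"
URL = "http://www.companysite.com/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb"

# Substring search: succeeds, because the pattern matches the start of the URL.
found_anywhere = re.search(PATTERN, URL) is not None

# Whole-string match: fails, because "?sorigin=hb" is left over.
matches_whole = re.fullmatch(PATTERN, URL) is not None

# Appending ".*" lets the same pattern cover the rest of the URL.
matches_whole_with_tail = re.fullmatch(PATTERN + r".*", URL) is not None
```

If the program does match whole URLs, adding .* to the end of the pattern would be worth trying. Another thing to check is whether the filter is applied to the full URL or only to the path after the host name.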

Author Comment

by:PAEWINS
ID: 38857678
Terry,

I am trying to troubleshoot my issue and understand the regex string you provided. Can you break it down for me? I provided a URL with 5 forward slashes, and the regex you supplied has 7.

Are the two forward slashes inside the brackets metacharacters, that is, part of the regex syntax?

Why does the expression end with a forward slash?

/ca/[^?&/]+/\d[^?&/]*/\d+/

Thanks.
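
A commented breakdown may help with these questions. The sketch below rewrites the path part of the pattern using Python's re.VERBOSE flag, which allows whitespace and comments inside a pattern; the names COMPACT and LISTING are illustrative. The two extra slashes sit inside the negated character classes [^?&/], where / is an ordinary literal meaning "any character except ?, & or /", so the pattern has five literal slashes, the same number as the URL path.

```python
import re

# The path part of the pattern as posted in the thread.
COMPACT = r"/ca/[^?&/]+/\d[^?&/]*/\d+/"

# The same pattern, split up with re.VERBOSE so each piece can be commented.
LISTING = re.compile(r"""
    /ca/        # literal "/ca/" (two literal slashes)
    [^?&/]+     # city: one or more chars that are NOT "?", "&" or "/"
    /           # literal slash closing the city segment
    \d          # the address must start with a digit
    [^?&/]*     # rest of the address: zero or more chars that are NOT "?", "&" or "/"
    /           # literal slash closing the address segment
    \d+         # property ID: one or more digits
    /           # literal slash closing the property ID segment
""", re.VERBOSE)

# 7 slashes in the pattern text: 5 literal ones plus 2 inside the brackets.
slashes_in_pattern = COMPACT.count("/")

# 5 slashes in the path of the sample URL.
slashes_in_path = "/ca/anaheim/6008-e.-calle-cedro/4641217/".count("/")
```

The trailing slash forces the property ID segment to consist of digits only up to its closing slash. Without it, \d+ could stop after the leading digits of a segment such as 4641217abc and the pattern would still match.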