Solved

Regex String Needed to Trap Certain URLs

Posted on 2013-02-04
Last Modified: 2013-02-05
I am using a program to create a sitemap.xml file for me. It is great at crawling my site, but I do not want it to index certain pages. Fortunately, the program lets me filter out pages using regular expressions. I hope someone can help me write a regex string to plug into the program.

Following is an example of an aspx page that I do not want indexed.

http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb

For the record, the URL is a profile page for a real estate listing. We only list properties in California, so /ca/ is considered static text. The general URL structure is:
http://www.companysite.com/ca/{city}/{address}/{propertyID}/{variable}

Based on the above URL, I do not want to crawl any /ca/{city}/{address} pages.   But I am okay with it crawling other city pages such as /ca/{city}/housingmarkettrends.  

So, in layman's terms, below is the pattern I think I need to trap. For ease of reading, I have put each piece of the URL string on its own row:

{any string of chars, including special chars (hyphens, periods, etc.), that ends with a forward slash}
{string of chars that begins with a digit (zero through nine) and ends with a forward slash}
{string of chars that contains only digits (zero through nine) and ends with a forward slash}
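
The three rows above map fairly directly onto regex fragments. Below is a minimal Python sketch, assuming the sitemap tool accepts conventional regex syntax and that "any string of chars" means any characters other than a forward slash. The names CITY, ADDRESS, PROPERTY_ID and LISTING_PATH are illustrative only, and the city-page URL is made up from the /ca/{city}/housingmarkettrends form mentioned earlier.

```python
import re

# Each fragment mirrors one row of the plain-language description.
CITY = r"[^/]+/"         # any characters except a slash, then a slash
ADDRESS = r"\d[^/]*/"    # starts with a digit, then anything up to the next slash
PROPERTY_ID = r"\d+/"    # digits only, then a slash

LISTING_PATH = re.compile(r"/ca/" + CITY + ADDRESS + PROPERTY_ID)

listing = "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb"
city_page = "http://www.companysite.com/ca/anaheim/housingmarkettrends"

print(bool(LISTING_PATH.search(listing)))    # True: should be filtered out
print(bool(LISTING_PATH.search(city_page)))  # False: still crawled
```

The city page is not trapped because the segment after the city does not begin with a digit.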

I look forward to working with someone on this.  Thanks.  

Robert
Question by:PAEWINS

Accepted Solution

by:
Terry Woods earned 2000 total points
ID: 38853070
This might do the trick:

http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/


Let me know how it goes.
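
For anyone who wants to check the pattern outside the target tool, here is a short Python sketch. The first URL is the one from the question; the other two are made-up variations for comparison, and `PATTERN` and `is_listing` are illustrative names. One thing to be aware of: the dots in the host name are unescaped, so each one matches any character. That is harmless for this site but looser than a literal match.

```python
import re

# The suggested pattern, unchanged. Each "." in the host matches any
# character, not only a literal dot.
PATTERN = re.compile(r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/")

def is_listing(url: str) -> bool:
    """Return True when the URL begins with a listing-profile path."""
    return PATTERN.match(url) is not None

urls = [
    # Listing profile from the question: expected to be trapped.
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb",
    # City page (hypothetical URL): expected to be left alone.
    "http://www.companysite.com/ca/anaheim/housingmarkettrends",
    # Listing without the trailing slash (hypothetical): not trapped,
    # because the pattern requires a slash after the property ID.
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217",
]

for url in urls:
    print(is_listing(url), url)
```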

Author Closing Comment

by:PAEWINS
ID: 38857586
The regex appears to work in an expression tester (http://www.regular-expressions.info/javascriptexample.html).

Below are the test inputs.

URL:   http://www.companysite.com/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb 

REGEX:   http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/

However, when I supplied it to the program that called for it, the program does not seem to acknowledge it. Could the regex be written for a particular platform?
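
One possible explanation, offered only as a guess since this thread does not establish how the sitemap program applies its filters: some tools require a pattern to match the entire URL, while an online tester typically reports success if the pattern matches any part of it. The Python sketch below shows the difference using the test URL and regex from this comment. The `relaxed` pattern is a hypothetical workaround, not something confirmed to work in the program.

```python
import re

url = "http://www.companysite.com/ca/corona/1518-beacon-ridge-way/4579453/?sorigin=hb"
pattern = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"

# Substring matching: succeeds, consistent with the tester result.
print(re.search(pattern, url) is not None)     # True

# Whole-string matching: fails because "?sorigin=hb" is left over.
print(re.fullmatch(pattern, url) is not None)  # False

# Hypothetical workaround: allow anything before and after the listing path.
relaxed = r".*/ca/[^?&/]+/\d[^?&/]*/\d+/.*"
print(re.fullmatch(relaxed, url) is not None)  # True
```

If the program instead matches against only the path portion of the URL, a pattern that starts with http:// would also never match, which is another reason to try a version that begins at /ca/.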

The program I am using is Gmapper.  It is an XML sitemap generator.  http://www.g-mapper.co.uk/download/index.aspx  

Thank you anyway.  I will need to research this further.

Author Comment

by:PAEWINS
ID: 38857678
Terry,

I am trying to troubleshoot my issue and wrap my head around this regex string you provided.  Can you break this down for me?  I provided a URL with 5 forward slashes, and the regex you supplied has 7.  

Are the two forward slashes inside the brackets metacharacters and part of the regex set of commands?

Why does the expression end with a forward slash?  

/ca/[^?&/]+/\d[^?&/]*/\d+/

Thanks.
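
For readers with the same questions, here is an annotated sketch in Python using verbose mode, where whitespace and # comments inside the pattern are ignored. Inside square brackets a forward slash is an ordinary literal character, so `[^?&/]` means "any one character other than ?, & or /". The trailing slash matches the slash that follows the property ID. The names `BREAKDOWN` and `FLAT` are illustrative only.

```python
import re

BREAKDOWN = re.compile(r"""
    /ca/        # literal text: the fixed state segment
    [^?&/]+     # city: one or more characters other than ?, & or /
    /           # literal slash that ends the city segment
    \d          # the address must start with a digit
    [^?&/]*     # rest of the address: zero or more characters other than ?, & or /
    /           # literal slash that ends the address segment
    \d+         # property ID: one or more digits
    /           # literal slash that ends the property ID segment
""", re.VERBOSE)

url = "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb"
match = BREAKDOWN.search(url)
print(match.group(0))             # /ca/anaheim/6008-e.-calle-cedro/4641217/

# Slash count: the flat pattern contains 7 slashes, but 2 of them sit
# inside the bracketed character classes and only exclude "/" from a segment.
FLAT = r"/ca/[^?&/]+/\d[^?&/]*/\d+/"
print(FLAT.count("/"))            # 7
print(match.group(0).count("/"))  # 5
```

So the five slashes outside the brackets line up one-to-one with the five slashes in the matched part of the URL.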