Solved

Need one Standard Regular Expression string

Posted on 2013-02-06
Last Modified: 2013-10-15
I am using a program G-mapper to create a sitemap.xml file.  However, I do not want it to index certain pages.  

The program allows the filtering out of pages using Standard Regular Expressions.  Their help page offers RegEx help links to the following sites:
   
Following is an example of an aspx page that I do not want indexed.
   http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb

For the record, the above URL is a profile page for a Real Estate listing.  We only list properties in California, so /ca/ is considered static text.  
   http://www.companysite.com/ca/{city}/{address}/{propertyID}/{variable}.  


Based on the above URL, I do not want it to crawl any /ca/{city}/{address} pages.  But I am okay with it crawling other subdirectory city pages such as /ca/{city}/housingmarkettrends.  

So in layman's terms, below is the pattern that I figure I need to trap.  For ease of reading, I have broken each piece of the URL string onto its own row below:

   

1. http://www.companysite.com/ca/
2. followed by {any string of chars, including special chars: hyphens, periods, etc. that ends with a forward slash}
3. followed by {string of chars that begin with a digit (zero thru nine) and ends with a forward slash}
4. followed by {string of chars that only contain digits (zero thru nine) and ends with a forward slash}
5. followed by {string of chars that begin with a question mark and ends with a forward slash}

FYI, I was provided an expression that seems to be legal, but the program seems to ignore it.  Maybe it's not a STANDARD Regular Expression???

   http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/
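As a quick sanity check, the expression above can be tried against the two kinds of URL described in the question. This is only a sketch: Python's `re` module stands in for whatever regex engine the sitemap program uses (which is not stated here), and the function name `is_listing_url` is made up for illustration.

```python
import re

# The expression quoted above, used unchanged. Python's `re` is only a stand-in;
# the sitemap program's own regex engine may support a different syntax.
LISTING_PATTERN = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"


def is_listing_url(url: str) -> bool:
    """Return True if the URL starts like /ca/{city}/{address}/{propertyID}/."""
    # re.match anchors at the start of the string only, so the trailing
    # "?sorigin=hb" does not need to be covered by the pattern.
    return re.match(LISTING_PATTERN, url) is not None


listing = "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb"
city_page = "http://www.companysite.com/ca/anaheim/housingmarkettrends"

print(is_listing_url(listing))    # True
print(is_listing_url(city_page))  # False
```

So the expression itself does what the five-step breakdown asks for, at least in an engine that understands `\d` and `+`. Note that the dots in the host name are unescaped and therefore match any character, and that a program which requires the expression to cover the whole URL would also need something for the query string at the end.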

I look forward to any expert advice on the topic.  Best Regards.
Question by:PAEWINS
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 100 total points
ID: 38861787
One thing you could try is changing each \d to [0-9]
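A small sketch of that substitution, in case it helps to see it spelled out. Python is used only for illustration, and the helper name `expand_digit_class` is hypothetical; the idea is simply that `[0-9]` is understood by engines that do not know the `\d` shorthand.

```python
import re


def expand_digit_class(pattern: str) -> str:
    r"""Replace each \d shorthand with the explicit class [0-9].

    Naive text substitution: fine for the pattern in this thread, but it would
    mangle a pattern containing an escaped backslash followed by "d".
    """
    return pattern.replace(r"\d", "[0-9]")


original = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"
rewritten = expand_digit_class(original)
print(rewritten)
# http://www.companysite.com/ca/[^?&/]+/[0-9][^?&/]*/[0-9]+/

# The rewritten pattern still matches the listing URL from the question.
listing = "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb"
print(re.match(rewritten, listing) is not None)  # True
```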
0
 
LVL 19

Assisted Solution

by:simon3270
simon3270 earned 100 total points
ID: 38863329
Also, the "+" (match one or more times) is not in Basic regexes; strictly, POSIX Basic regexes treat "?" (match 0 or 1 times) as a literal too, so "*" (match 0 or more times) is the only one of the three you can rely on.  You can get the effect of "+", for example in "[^?&/]+", with:
    [^?&/][^?&/]*
and "\d+" with:
    [0-9][0-9]*

So, combining Terry's suggestion and mine, you get:
     http://www.companysite.com/ca/[^?&/][^?&/]*/[0-9][^?&/]*/[0-9][0-9]*/
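To check that the rewrite does not change what gets matched, the two expressions can be compared on a few URLs. Again this is only a sketch in Python, whose `re` module accepts both spellings; it says nothing about how the sitemap program itself parses them. The third sample URL is made up for the test and is not from the question.

```python
import re

# Expression from the question, and the rewrite without \d and +.
ORIGINAL = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"
PORTABLE = r"http://www.companysite.com/ca/[^?&/][^?&/]*/[0-9][^?&/]*/[0-9][0-9]*/"

SAMPLES = [
    # Listing page from the question: should be filtered out.
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb",
    # City sub-page from the question: should be kept.
    "http://www.companysite.com/ca/anaheim/housingmarkettrends",
    # Hypothetical URL with an address but no property ID: not matched.
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/",
]


def matches(pattern: str, url: str) -> bool:
    """Return True if the pattern matches at the start of the URL."""
    return re.match(pattern, url) is not None


results = [(matches(ORIGINAL, url), matches(PORTABLE, url)) for url in SAMPLES]
for url, (old, new) in zip(SAMPLES, results):
    print(old, new, url)
```

Both expressions agree on every sample, so if the program still ignores the rewritten one, the cause is probably something other than `\d` or `+`, for example how the program anchors the expression or which part of the URL it tests.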
 

Author Comment

by:PAEWINS
ID: 39553195
I am closing this old issue.
LVL 19

Expert Comment

by:simon3270
ID: 39553785
Were our suggestions useful at the time?
0
 

Accepted Solution

by:PAEWINS
PAEWINS earned 0 total points
ID: 39553838
No.  But thanks.
0
 

Author Closing Comment

by:PAEWINS
ID: 39573143
No solution provided.  But I appreciate the attempts.