Solved

Need one Standard Regular Expression string

Posted on 2013-02-06
Last Modified: 2013-10-15
I am using a program called G-mapper to create a sitemap.xml file. However, I do not want it to index certain pages.

The program allows the filtering out of pages using Standard Regular Expressions.  Their help page offers RegEx help links to the following sites:
   
Following is an example of an aspx page that I do not want indexed.
   http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb

For the record, the above URL is a profile page for a real estate listing. We only list properties in California, so /ca/ is considered static text. The general form of these URLs is:
   http://www.companysite.com/ca/{city}/{address}/{propertyID}/{variable}


Based on the above URL, I do not want it to crawl any /ca/{city}/{address} pages. But I am okay with it crawling other subdirectory city pages such as /ca/{city}/housingmarkettrends.

So in layman's terms, below is what I figure is the pattern that I need to trap. For ease of reading, I have broken down each piece of the URL string into its own row below:

   

1. http://www.companysite.com/ca/
2. followed by {any string of chars, including special chars (hyphens, periods, etc.), that ends with a forward slash}
3. followed by {string of chars that begins with a digit (zero through nine) and ends with a forward slash}
4. followed by {string of chars that contains only digits (zero through nine) and ends with a forward slash}
5. followed by {string of chars that begins with a question mark and ends with a forward slash}

FYI, I was provided an expression that appears to be legal, but the program seems to ignore it. Maybe it's not a STANDARD Regular Expression?

   http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/
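
As a quick sanity check outside G-mapper, the supplied expression can be tried in Python's `re` module. Python is only a stand-in here: which regex flavor G-mapper uses, and whether it requires the expression to cover the whole URL, are not stated above and are assumptions. The sketch suggests the expression finds the listing URL when it is searched for, but does not cover the entire URL because of the trailing `?sorigin=hb`. The `FULL_URL` variant is a hypothetical adjustment, not a confirmed fix.

```python
import re

# Expression exactly as supplied in the question.
SUPPLIED = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"

# Hypothetical variant: the same expression plus ".*" so it can cover the
# query string if a tool insists on matching the whole URL.
FULL_URL = SUPPLIED + r".*"

listing = "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb"
# Illustrative city page built from the /ca/{city}/housingmarkettrends form.
city_page = "http://www.companysite.com/ca/anaheim/housingmarkettrends"

found_by_search = re.search(SUPPLIED, listing) is not None
covers_whole_url = re.fullmatch(SUPPLIED, listing) is not None
variant_covers_whole_url = re.fullmatch(FULL_URL, listing) is not None
city_page_excluded = re.search(SUPPLIED, city_page) is not None

print(found_by_search, covers_whole_url, variant_covers_whole_url, city_page_excluded)
```

If the tool does whole-URL matching, the difference between the second and third results may be worth testing; if it does substring matching, the expression as supplied should already catch the listing page while leaving the city page alone.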

I look forward to any expert advice on the topic.  Best Regards.
Question by:PAEWINS

Assisted Solution

by:Terry Woods
Terry Woods earned 100 total points
ID: 38861787
One thing you could try is changing each \d to [0-9].
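
To illustrate the swap, here is a small Python sketch (Python is an assumption; G-mapper's own engine is not identified in this thread). It rewrites each `\d` in the supplied expression to `[0-9]` with a plain string replace and checks the result against the listing URL. In Python 3 the two forms differ only in that `\d` also accepts non-ASCII digits, which should not matter for these URLs.

```python
import re

supplied = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"

# Plain text substitution: every "\d" becomes the explicit class "[0-9]".
explicit = supplied.replace(r"\d", "[0-9]")

listing = "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb"

# Both spellings should agree on the listing URL.
same_result = bool(re.search(supplied, listing)) == bool(re.search(explicit, listing))

# In Python 3, \d on str patterns also matches non-ASCII decimal digits,
# while [0-9] does not (U+0663 is ARABIC-INDIC DIGIT THREE).
unicode_digit_matches_d = re.fullmatch(r"\d", "\u0663") is not None
unicode_digit_matches_class = re.fullmatch(r"[0-9]", "\u0663") is not None

print(explicit)
print(same_result, unicode_digit_matches_d, unicode_digit_matches_class)
```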

Assisted Solution

by:simon3270
simon3270 earned 100 total points
ID: 38863329
Also, the "+" (match one or more times) is not in Basic regexes; you only have "*" (match 0 or more times). Strictly speaking, "?" (match 0 or 1 times) is not part of the basic syntax either. You can get the "+" effect of, for example, "[^?&/]+" with:
    [^?&/][^?&/]*
and "\d+" with:
    [0-9][0-9]*

So, combining Terry's suggestion with mine, you get:
     http://www.companysite.com/ca/[^?&/][^?&/]*/[0-9][^?&/]*/[0-9][0-9]*/
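
A rough way to confirm that this rewrite keeps the meaning of the original is to run both expressions over a few sample URLs. The sketch below uses Python's `re` module, which is not a POSIX basic-regex engine and accepts both spellings, so it only checks that the two expressions agree; it says nothing about whether G-mapper will accept either one. All sample URLs except the first are made up for illustration.

```python
import re

original = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"
rewritten = r"http://www.companysite.com/ca/[^?&/][^?&/]*/[0-9][^?&/]*/[0-9][0-9]*/"

# The first URL is from the question; the others are hypothetical and
# follow the URL forms described there.
samples = [
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb",
    "http://www.companysite.com/ca/anaheim/housingmarkettrends",
    "http://www.companysite.com/ca//6008-e.-calle-cedro/4641217/",  # empty city
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro//",  # empty ID
]

# One (original, rewritten) pair of booleans per sample URL.
results = [
    (bool(re.search(original, url)), bool(re.search(rewritten, url)))
    for url in samples
]

for url, (a, b) in zip(samples, results):
    print(a, b, url)
```

The empty-segment samples are there because "one or more" is exactly what the doubled class `X X*` has to preserve: both expressions should reject a URL with an empty city or an empty property ID.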

Author Comment

by:PAEWINS
ID: 39553195
I am closing this old issue.

Expert Comment

by:simon3270
ID: 39553785
Were our suggestions useful at the time?

Accepted Solution

by:PAEWINS
PAEWINS earned 0 total points
ID: 39553838
No.  But thanks.

Author Closing Comment

by:PAEWINS
ID: 39573143
No solution provided.  But I appreciate the attempts.