Solved

Need one Standard Regular Expression string

Posted on 2013-02-06
Last Modified: 2013-10-15
I am using a program G-mapper to create a sitemap.xml file.  However, I do not want it to index certain pages.  

The program allows the filtering out of pages using Standard Regular Expressions.  Their help page offers RegEx help links to the following sites:
   
Following is an example of an aspx page that I do not want indexed.
   http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb

For the record, the above URL is a profile page for a Real Estate listing.  We only list properties in California, so /ca/ is considered static text.  
   http://www.companysite.com/ca/{city}/{address}/{propertyID}/{variable}.  


Based on the above URL, I do not want it to crawl any /ca/{city}/{address} pages, but I am okay with it crawling other city subdirectory pages such as /ca/{city}/housingmarkettrends.

So in layman's terms, below is the pattern that I figure I need to trap.  For ease of reading, I have broken each piece of the URL string onto its own row:

1. http://www.companysite.com/ca/
2. followed by {any string of chars, including special chars (hyphens, periods, etc.), that ends with a forward slash}
3. followed by {string of chars that begins with a digit (zero thru nine) and ends with a forward slash}
4. followed by {string of chars that contains only digits (zero thru nine) and ends with a forward slash}
5. followed by {string of chars that begins with a question mark and ends with a forward slash}

FYI, I was provided an expression that seems to be legal, but the program seems to ignore it.  Maybe it's not a STANDARD Regular Expression?

   http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/

I look forward to any expert advice on the topic.  Best Regards.
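
One thing worth checking is whether the filter tests the pattern against the whole URL or just looks for it somewhere inside the URL. The sketch below uses Python's `re` module purely as a stand-in (G-mapper's own regex engine is not documented in this thread, so its behavior is an assumption), and the `trends` URL is a hypothetical example built from the /ca/{city}/housingmarkettrends shape.

```python
import re

# Pattern quoted in the question (uses \d and +, as Perl-style engines do).
PATTERN = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"

# Listing URL from the question.
listing = "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb"
# Hypothetical city sub-page that should NOT be filtered out.
trends = "http://www.companysite.com/ca/anaheim/housingmarkettrends"


def matches_anywhere(url):
    """True if the pattern is found somewhere in the URL (unanchored search)."""
    return re.search(PATTERN, url) is not None


def matches_whole(url):
    """True only if the pattern consumes the entire URL (anchored match)."""
    return re.fullmatch(PATTERN, url) is not None


for url in (listing, trends):
    print(url, matches_anywhere(url), matches_whole(url))
```

In this stand-in engine the listing URL is caught by an unanchored search but not by a whole-string match, because the pattern stops before `?sorigin=hb`. If the tool anchors its filters, appending `.*` to the pattern might be worth a try.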
Question by:PAEWINS

Assisted Solution

by:Terry Woods
Terry Woods earned 300 total points
ID: 38861787
One thing you could try is changing each \d to [0-9]

Assisted Solution

by:simon3270
simon3270 earned 300 total points
ID: 38863329
Also, the "+" (match one or more times) is not in POSIX Basic regexes; there the only plain repetition operator is "*" (match 0 or more times), while "+" and "?" (match 0 or 1 times) belong to Extended regexes.  You can get the "+" effect of, for example, "[^?&/]+" with:
    [^?&/][^?&/]*
and "\d+" with:
    [0-9][0-9]*

So, mixing Terry's suggestion with mine, you get:
     http://www.companysite.com/ca/[^?&/][^?&/]*/[0-9][^?&/]*/[0-9][0-9]*/
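
As a rough sanity check that the rewritten expression behaves like the original one, the two can be compared side by side. This is only a sketch using Python's `re` module, which is an assumption and not the engine G-mapper uses; the second and third URLs are hypothetical test cases, not URLs from the site.

```python
import re

# Original expression from the question (Perl-style \d and +).
ORIGINAL = r"http://www.companysite.com/ca/[^?&/]+/\d[^?&/]*/\d+/"
# Rewritten expression using only bracket expressions and "*".
PORTABLE = r"http://www.companysite.com/ca/[^?&/][^?&/]*/[0-9][^?&/]*/[0-9][0-9]*/"

urls = [
    # Listing URL from the question: should be filtered out.
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217/?sorigin=hb",
    # Hypothetical city sub-page: should be left alone.
    "http://www.companysite.com/ca/anaheim/housingmarkettrends",
    # Hypothetical listing URL without the trailing slash after the property ID.
    "http://www.companysite.com/ca/anaheim/6008-e.-calle-cedro/4641217",
]

# For each URL, record whether each pattern is found anywhere in it.
results = [
    (re.search(ORIGINAL, u) is not None, re.search(PORTABLE, u) is not None)
    for u in urls
]

for url, (orig_hit, portable_hit) in zip(urls, results):
    print(orig_hit, portable_hit, url)
```

Both patterns agree on these ASCII URLs. Note that both require the slash after the property ID, so a listing URL without it would slip through either version.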

Author Comment

by:PAEWINS
ID: 39553195
I am closing this old issue.

Expert Comment

by:simon3270
ID: 39553785
Were our suggestions useful at the time?

Accepted Solution

by:
PAEWINS earned 0 total points
ID: 39553838
No.  But thanks.

Author Closing Comment

by:PAEWINS
ID: 39573143
No solution provided.  But I appreciate the attempts.