
Regular expressions to include and exclude website subdirectories

I have a website content management program that can include or exclude web pages from being processed in a certain way.

The documentation says I can use "regular expressions" to do it, following the regular expression reference at http://www.regular-expressions.info/reference.html, and that I can test them at http://gskinner.com/RegExr/

However, I'm not quite sure how to do it. I have little or no experience with regular expressions; this is the first I have heard of them. I do know DOS wildcards such as *.* and ?, but that's my limit.

Here are example URLs I am trying to include and exclude using regular expressions:

http://wwwfoobar.org/fruits/
http://wwwfoobar.org/fruits/bananas/

I want "http://wwwfoobar.org/fruits/" to be EXcluded, using regular expressions, but I want its subdirectory "http://wwwfoobar.org/fruits/bananas/" to be INcluded.

So pages such as
http://wwwfoobar.org/fruits/cookingfruits.html and
http://wwwfoobar.org/fruits/eatingfruits.html
...would be EXcluded from being processed in a certain way by the content management system.

But pages such as
http://wwwfoobar.org/fruits/bananas/ripebananas.html
http://wwwfoobar.org/fruits/bananas/rottenbananas.html 
...would be INcluded to be processed in a certain way by the content management system.

Also, it would be ideal if anything in the bananas subdirectory were INcluded via a wildcard expression, so that I wouldn't have to type each page into the system manually.

Any suggestions?  

Thanks!

Rowby
 
Terry Woods (IT Guru) commented:
Using the regex tester you linked to, the pattern:
^http://wwwfoobar\.org/(?!fruits(?!/bananas)).*$

gives the result you want. You may need to drop the ^ and $ (which match the start and end of a line in multiline mode, which I used for testing), and possibly the http://wwwfoobar\.org/ part too, depending on what the software expects. You'll need to experiment to find what works, or find out more about the system you're using.
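To see what dropping those parts does, here is a rough Python sketch. The path-only form is an assumption about how a system might hand over just "fruits/..." instead of the whole URL; nothing here is taken from the Joomla extension itself.

```python
import re

# Full-URL form from the answer above.
full_url = re.compile(r"^http://wwwfoobar\.org/(?!fruits(?!/bananas)).*$")
# Hypothetical path-only form, for a system that matches against "fruits/...".
path_only = re.compile(r"^(?!fruits(?!/bananas)).*$")
# Same as path_only but with ^ and $ removed.
unanchored = re.compile(r"(?!fruits(?!/bananas)).*")

url = "http://wwwfoobar.org/fruits/eatingfruits.html"
path = "fruits/eatingfruits.html"

print(full_url.search(url))    # None, so the page is excluded
print(path_only.search(path))  # None, so the page is excluded

# Caveat 1: the path-only form does not exclude a full URL, because the
# lookahead is then tested at "http", not at "fruits".
print(path_only.search(url) is not None)

# Caveat 2: without ^, a searching engine can start the match one character
# later ("ruits/..."), so nothing is excluded any more.
print(unanchored.search(path) is not None)
```

So removing ^ and $ is only safe if the system anchors the pattern to the start of the string by itself, and removing the host part is only safe if the system matches against the path.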

I tested on the data:
Exclude:
http://wwwfoobar.org/fruits/
http://wwwfoobar.org/fruits/cookingfruits.html
http://wwwfoobar.org/fruits/eatingfruits.html

Include:
http://wwwfoobar.org/fruits/bananas/
http://wwwfoobar.org/fruits/bananas/ripebananas.html
http://wwwfoobar.org/fruits/bananas/rottenbananas.html


And the result looks like this for me:
screenshot
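The same check can be repeated outside RegExr. This is a minimal sketch in Python, whose re module supports negative lookaheads; the helper name is_included is made up for illustration.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^http://wwwfoobar\.org/(?!fruits(?!/bananas)).*$")

should_exclude = [
    "http://wwwfoobar.org/fruits/",
    "http://wwwfoobar.org/fruits/cookingfruits.html",
    "http://wwwfoobar.org/fruits/eatingfruits.html",
]
should_include = [
    "http://wwwfoobar.org/fruits/bananas/",
    "http://wwwfoobar.org/fruits/bananas/ripebananas.html",
    "http://wwwfoobar.org/fruits/bananas/rottenbananas.html",
]


def is_included(url: str) -> bool:
    """Return True when the pattern matches, i.e. the URL would be included."""
    return PATTERN.match(url) is not None


for url in should_exclude + should_include:
    print("include" if is_included(url) else "exclude", url)
```

Note that the pattern also includes every URL on the site outside fruits/ (for example a hypothetical http://wwwfoobar.org/vegetables/), as well as anything whose path merely begins with fruits/bananas, such as a hypothetical fruits/bananasplit.html.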
 
Terry Woods (IT Guru) commented:
A note about negative lookaheads: In regular expressions, the pattern
foo(?!bar)
uses a negative lookahead so that foo is matched only when it is not followed by bar.

This pattern matches foo when it is not followed by bar, and then the next 2 (non-linebreak) characters:
foo(?!bar).{2}
For example, when applied to the text:
foobar
foobaa
foobarber
food
foofighters


You'd get the following matches:
fooba (from foobaa)
foofi (from foofighters)
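
If you want to try this yourself, a short Python sketch along these lines should reproduce the two matches (Python is only used here as a convenient engine with lookahead support):

```python
import re

# The five sample lines from above, one per line.
text = "foobar\nfoobaa\nfoobarber\nfood\nfoofighters"

# "foo" not followed by "bar", then any two non-linebreak characters.
matches = re.findall(r"foo(?!bar).{2}", text)
print(matches)  # ['fooba', 'foofi']
```

"food" produces no match because only one character follows foo on that line, and . does not match the line break.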

The pattern I gave you uses a negative lookahead within a negative lookahead. Let me know if you can't get your head around the concept!

(Fingers crossed your system supports negative lookaheads; not all regex engines support them.)
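
One way to get a feel for the nested lookahead is to test the inner and outer pieces separately. The sketch below does that in Python on a few sample paths; the paths and variable names are only illustrative.

```python
import re

# Inner piece: "fruits" that is NOT followed by "/bananas".
inner = re.compile(r"fruits(?!/bananas)")
# Outer piece: succeeds only at a position where the inner piece cannot match.
outer = re.compile(r"(?!fruits(?!/bananas))")

paths = [
    "fruits/",
    "fruits/eatingfruits.html",
    "fruits/bananas/",
    "vegetables/",
]

# For each path, record whether each piece matches at the very start.
results = {
    path: (inner.match(path) is not None, outer.match(path) is not None)
    for path in paths
}

for path, (inner_hit, outer_hit) in results.items():
    print(f"{path:26} inner={inner_hit!s:5} outer={outer_hit}")
```

The outer result is always the opposite of the inner one: a path is let through exactly when it does not start with a "fruits" that lacks "/bananas" after it.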
 
Rowby Goren (Author) commented:
Thanks Terry.  

I will be testing this later today or first thing in the morning.

Thanks for helping.  

On a side note, I assume regular expressions are also used in programming, such as in PHP? BTW, the content management system I am using is written in PHP and is called Joomla. I've added an extension that uses regular expressions to fine-tune its features.

Thanks

Rowby
 
Terry Woods (IT Guru) commented:
PHP, Perl, Java and .NET have very good regex engines and would work with the pattern I provided, so you should be ok.
 
Rowby Goren (Author) commented:
Hi, sorry for the delay. I was out all weekend at a convention, but I will try out the solution this afternoon.

Rowby
 
Rowby Goren (Author) commented:
Thanks Terry, it worked fine!

Rowby