  • Status: Solved
  • Priority: Medium
  • Security: Public

Regex to extract URL subdirectories.

Hi,

I'm trying to write a regex to extract the subdirectories from a URL, for example:
http://www.example.com/dir1/subdir1/page.aspx

I need to match exactly "dir1" and "subdir1".
I think my problem is with the "/", because I've spent some time trying it without luck.

Any help is appreciated.
Asked by: fischermx

fischermx (Author) commented:
OK, here's what I've got so far:

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string myPath = "http://www.myexample.com/subdir1/subdir2/subdir3/";

            // Look for a '/', one or more letters/digits, then another '/'.
            Regex regMap = new Regex(@"\/[a-zA-Z0-9]+\/", RegexOptions.IgnoreCase);

            foreach (Match matchExpr in regMap.Matches(myPath))
            {
                Console.WriteLine('>' + matchExpr.Value);
            }
        }
    }

This outputs "/subdir1/" and "/subdir3/" (each prefixed with '>'); somehow it skips "subdir2"!
Why?

Yurich commented:
I think it comes down to your pattern: you're not searching for subdir1, subdir2, etc., but for "/subdir1/", "/subdir2/", and so on. Therefore:

once the regex has matched "/subdir1/", the next character it checks is the 's' of subdir2, which can't start a match. The '/' in front of subdir2 was already consumed as the last character of "/subdir1/", so subdir2 is skipped entirely, and the next '/' (the one after subdir2) starts the match for "/subdir3/".

regards
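
A minimal C# sketch of the explanation above, printing each match of the original pattern together with its starting index (the class and method names are made up for illustration). Regex.Matches returns non-overlapping matches, so each new search resumes right after the end of the previous match, i.e. after its trailing '/'.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SkipDemo
{
    // Hypothetical helper: returns "value@index" for each match of the question's pattern.
    public static List<string> MatchesWithIndex(string input)
    {
        var result = new List<string>();
        // Same pattern as in the question: '/', one or more letters/digits, '/'.
        var regex = new Regex(@"\/[a-zA-Z0-9]+\/", RegexOptions.IgnoreCase);
        foreach (Match m in regex.Matches(input))
        {
            // Matches do not overlap: the next attempt starts at m.Index + m.Length,
            // which is just past the trailing '/' of this match.
            result.Add(m.Value + "@" + m.Index);
        }
        return result;
    }

    static void Main()
    {
        string path = "http://www.myexample.com/subdir1/subdir2/subdir3/";
        foreach (string s in MatchesWithIndex(path))
        {
            Console.WriteLine(s);
        }
    }
}
```

The second match starts at index 40, the '/' after subdir2; the '/' at index 32 in front of subdir2 was already used up by the first match.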

Yurich commented:
And you don't have to escape the '/' char:
Regex myRegex = new Regex("/[a-zA-Z0-9]+/", RegexOptions.IgnoreCase);

works just fine

(but won't solve your problem) :(
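
A quick check of that claim, as a sketch (the names are hypothetical): '/' is not a metacharacter in .NET regular expressions, so the escaped and unescaped patterns return the same matches. The escape is only needed in languages where '/' delimits a regex literal, such as JavaScript's /.../ syntax.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Hypothetical helper: returns the matched strings for a pattern, in order.
    public static string[] Values(string pattern, string input)
    {
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        string path = "http://www.myexample.com/subdir1/subdir2/subdir3/";
        // "\/" and "/" both match a literal slash in .NET regex syntax.
        string[] escaped = Values(@"\/[a-zA-Z0-9]+\/", path);
        string[] plain = Values("/[a-zA-Z0-9]+/", path);
        Console.WriteLine(escaped.SequenceEqual(plain)); // True
    }
}
```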

Yurich commented:
try this:

Regex myRegex = new Regex("/[a-zA-Z0-9][a-zA-Z0-9]+", RegexOptions.IgnoreCase);

It'll get all the subdirs, but it will also get "www" (as "/www"); you can easily filter that out later, or think about how to modify your regex to avoid it.

regards
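
Sketch of this pattern applied to the URL from the question above (helper names are illustrative). Because the trailing '/' is no longer part of the match, each separator is still available to start the next match. Note that the pattern as written requires at least two letters or digits after the '/', so a one-character directory such as "/a/" would not be found.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NoTrailingSlashDemo
{
    // Hypothetical helper: returns every "/name" match in order.
    public static List<string> Find(string input)
    {
        var found = new List<string>();
        // A '/', then at least two letters or digits. No trailing '/' is consumed,
        // so the slash in front of the next directory can start the next match.
        var regex = new Regex("/[a-zA-Z0-9][a-zA-Z0-9]+", RegexOptions.IgnoreCase);
        foreach (Match m in regex.Matches(input))
        {
            found.Add(m.Value);
        }
        return found;
    }

    static void Main()
    {
        string path = "http://www.myexample.com/subdir1/subdir2/subdir3/";
        // Prints /www, /subdir1, /subdir2 and /subdir3 on separate lines.
        foreach (string s in Find(path))
        {
            Console.WriteLine(s);
        }
    }
}
```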

Bob Learned commented:
Slight modification to skip //www:

"/[^/a-zA-Z0-9][a-zA-Z0-9]+"

   ^^^

Bob

Yurich commented:
hmm, for some reason

"/[^/a-zA-Z0-9][a-zA-Z0-9]+"

doesn't work for me - it doesn't find anything in the string (http://...) above.

Isn't it a contradiction? We want the first character to be '/', but not to start with '/'...

regards
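
Sketch confirming this result (hypothetical names; the "/.hidden/" URL is a made-up input): [^/a-zA-Z0-9] requires one character that is neither '/' nor a letter or digit, so the pattern only matches when the '/' is followed by punctuation.

```csharp
using System;
using System.Text.RegularExpressions;

static class NegatedClassDemo
{
    // [^/a-zA-Z0-9] matches one character that is NOT '/', NOT a letter and NOT a digit,
    // so right after the '/' this pattern demands something like '.' or '-'.
    static readonly Regex Pattern =
        new Regex("/[^/a-zA-Z0-9][a-zA-Z0-9]+", RegexOptions.IgnoreCase);

    // Hypothetical helper: number of matches in the input.
    public static int CountMatches(string input) => Pattern.Matches(input).Count;

    static void Main()
    {
        Console.WriteLine(CountMatches("http://www.myexample.com/subdir1/subdir2/subdir3/")); // 0
        Console.WriteLine(CountMatches("http://www.myexample.com/.hidden/")); // 1 ("/.hidden")
    }
}
```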

Bob Learned commented:
Yeah, that makes sense:

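^ negates the entire character class, so [^/a-zA-Z0-9] matches only a character that is neither '/' nor a letter or digit. Right after a '/' in this URL there is no such character, so nothing matches.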

Let's try this instead:

(?<!/)/{1}[a-zA-Z\d][a-zA-Z\d]+

(?<!...) is a zero-width negative lookbehind.

So: a single '/' that is not preceded by another '/', followed by two or more letters or digits.

Bob
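
Sketch of this lookbehind pattern in use (class and method names are illustrative). One change that is ours rather than Bob's: the letters and digits are wrapped in a capture group so the leading '/' can be dropped from the result. Run against the URL in the original question, it also returns "page" from page.aspx, since nothing in the pattern requires a '/' after the name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // Bob's pattern plus a capture group (our addition). (?<!/) rejects a '/'
    // that directly follows another '/', e.g. the second slash of "http://".
    static readonly Regex Pattern =
        new Regex(@"(?<!/)/{1}([a-zA-Z\d][a-zA-Z\d]+)");

    // Hypothetical helper: returns the text after each matched '/'.
    public static List<string> Directories(string url)
    {
        var dirs = new List<string>();
        foreach (Match m in Pattern.Matches(url))
        {
            dirs.Add(m.Groups[1].Value);
        }
        return dirs;
    }

    static void Main()
    {
        // Prints: subdir1, subdir2, subdir3
        Console.WriteLine(string.Join(", ", Directories("http://www.myexample.com/subdir1/subdir2/subdir3/")));
        // Prints: dir1, subdir1, page  (the file name is matched too)
        Console.WriteLine(string.Join(", ", Directories("http://www.example.com/dir1/subdir1/page.aspx")));
    }
}
```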

Yurich commented:
This one works fine. Should I accept the answer? ;)

Bob Learned commented:
I would if I were you :)) Actually, I'll split the points with you ;)

Bob
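
If only the directories are wanted, as in the original question, one possible variant (an assumption, not something proposed in the thread) is to use lookarounds on both sides: a run of letters or digits preceded by '/' and followed by '/'. Because lookarounds are zero-width, no slash is consumed, which avoids the skipping problem from the first attempt, and "page.aspx" is excluded because it isn't followed by '/'. Names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Assumed variant: letters/digits preceded by '/' and followed by '/'.
    // Both lookarounds are zero-width, so adjacent directories share their slash.
    static readonly Regex Pattern = new Regex(@"(?<=/)[a-zA-Z\d]+(?=/)");

    // Hypothetical helper: returns the directory names in order.
    public static List<string> Directories(string url)
    {
        var dirs = new List<string>();
        foreach (Match m in Pattern.Matches(url))
        {
            dirs.Add(m.Value);
        }
        return dirs;
    }

    static void Main()
    {
        // Prints: dir1, subdir1
        Console.WriteLine(string.Join(", ", Directories("http://www.example.com/dir1/subdir1/page.aspx")));
        // Prints: subdir1, subdir2, subdir3
        Console.WriteLine(string.Join(", ", Directories("http://www.myexample.com/subdir1/subdir2/subdir3/")));
    }
}
```

One caveat: a host name without dots, for example "http://localhost/dir/", would be returned as a directory too, so the host part may need to be stripped first.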