Regex to extract URL subdirectories.

Hi,

I'm trying to write a regex to extract the subdirectories from a URL, for example:
http://www.example.com/dir1/subdir1/page.aspx

I need to match exactly "dir1" and "subdir1".
I think my problem is with the "/", because I've been trying for some time without luck.

Any help is appreciated.
Asked by: fischermx
 
Bob Learned commented:
Yeah, that makes sense:

^ negates the entire character class.

Let's try this instead:

(?<!/)/{1}[a-zA-Z\d][a-zA-Z\d]+

(?<! ... ) is a zero-width negative look-behind.

So: a single '/' not preceded by another '/', followed by two or more letters or digits.

Bob
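
For reference, a minimal sketch of how the look-behind pattern above could be applied in C#. The class and method names (LookBehindDemo, ExtractSubdirs) are made up for illustration and are not from the thread; the pattern is used exactly as posted. Each match includes the leading '/', so the sketch trims it off.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookBehindDemo
{
    // Hypothetical helper (not from the thread): applies the posted pattern
    // and strips the leading '/' that every match carries.
    public static List<string> ExtractSubdirs(string url)
    {
        // (?<!/) rejects a '/' that directly follows another '/',
        // which is what keeps "//www" from matching.
        var regex = new Regex(@"(?<!/)/{1}[a-zA-Z\d][a-zA-Z\d]+");
        var result = new List<string>();
        foreach (Match match in regex.Matches(url))
        {
            result.Add(match.Value.TrimStart('/'));
        }
        return result;
    }

    public static void Main()
    {
        string myPath = "http://www.myexample.com/subdir1/subdir2/subdir3/";
        foreach (string dir in ExtractSubdirs(myPath))
        {
            Console.WriteLine(dir); // subdir1, subdir2, subdir3 (one per line)
        }
    }
}
```

On the URL from the original question (.../dir1/subdir1/page.aspx) this pattern also returns "page", because "/page" satisfies it. Appending a look-ahead such as (?=/) is one possible way to require a closing slash.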

 
fischermx (author) commented:
OK, let me show you what I've got so far:

    string myPath = "http://www.myexample.com/subdir1/subdir2/subdir3/";

    string mapTo = myPath;
    Regex regMap = new Regex(@"\/[a-zA-Z0-9]+\/", RegexOptions.IgnoreCase);

    foreach (Match matchExpr in regMap.Matches(mapTo))
    {
        string expr = matchExpr.ToString();
        Console.WriteLine('>' + expr);
    }

This is outputting "subdir1" and "subdir3"; somehow it skips "subdir2"!
Why?
 
Yurich commented:
I think it comes down to what your pattern searches for: you're not searching for subdir1, subdir2, etc., but for "/subdir1/", "/subdir2/", and so on. Therefore:

When the regex finishes matching "/subdir1/", the next character to check is the 's' in subdir2, because the '/' in front of it was already consumed as the last character of the previous match. No match can start at 's', so subdir2 is missed completely. The next '/' (the one after subdir2) starts a new match, and that is how you end up with "/subdir3/".

regards
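
To make the consumed-slash effect concrete, here is a small sketch that runs the original pattern next to a variant whose trailing slash is a look-ahead, (?=/), so it is checked but not consumed. The look-ahead variant is an illustration added here, not something proposed in the thread, and the names OverlapDemo and FindAll are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Hypothetical helper: returns every match of pattern in input, in order.
    public static List<string> FindAll(string pattern, string input)
    {
        var found = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            found.Add(match.Value);
        }
        return found;
    }

    public static void Main()
    {
        string url = "http://www.myexample.com/subdir1/subdir2/subdir3/";

        // The trailing '/' is part of the match, so the next search
        // starts at the 's' of subdir2 and that segment is skipped.
        Console.WriteLine(string.Join(" ", FindAll("/[a-zA-Z0-9]+/", url)));
        // prints: /subdir1/ /subdir3/

        // The trailing '/' is only looked at, so it is still available
        // as the opening '/' of the next match.
        Console.WriteLine(string.Join(" ", FindAll("/[a-zA-Z0-9]+(?=/)", url)));
        // prints: /subdir1 /subdir2 /subdir3
    }
}
```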
 
Yurich commented:
And you don't have to escape the '/' character:
Regex myRegex = new Regex( "/[a-zA-Z0-9]+/",RegexOptions.IgnoreCase );

will do just fine

(but won't solve your problem) :(
 
Yurich commented:
Try this:

Regex myRegex = new Regex( "/[a-zA-Z0-9][a-zA-Z0-9]+",RegexOptions.IgnoreCase );

It will get all the subdirs, but it will get "www" as well. You can easily filter that out afterwards, or think about how to modify your regex to avoid it.

regards
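
As an aside, one way to sidestep the "www" match altogether is to skip the regex and let System.Uri split the path. This is a different technique from the patterns discussed in the thread, and the sketch assumes the input is always a well-formed absolute URL. The names UriSegmentsDemo and GetDirectories are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class UriSegmentsDemo
{
    // Hypothetical helper: returns the directory names of an absolute URL.
    // Uri.Segments yields "/", "dir1/", "subdir1/", "page.aspx" for the
    // question's example; directories are the segments ending in '/'.
    public static List<string> GetDirectories(string url)
    {
        var dirs = new List<string>();
        foreach (string segment in new Uri(url).Segments)
        {
            // Skip the root "/" and any final segment without a trailing slash.
            if (segment.Length > 1 && segment.EndsWith("/", StringComparison.Ordinal))
            {
                dirs.Add(segment.TrimEnd('/'));
            }
        }
        return dirs;
    }

    public static void Main()
    {
        foreach (string dir in GetDirectories("http://www.example.com/dir1/subdir1/page.aspx"))
        {
            Console.WriteLine(dir); // dir1, subdir1 (one per line)
        }
    }
}
```

The host never appears in Segments, so no extra filtering is needed. The trade-off is that a malformed URL makes the Uri constructor throw, whereas a regex simply finds no match.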
 
Bob Learned commented:
Slight modification to skip //www:

"/[^/a-zA-Z0-9][a-zA-Z0-9]+"

   ^^^

Bob
 
Yurich commented:
Hmm, for some reason

"/[^/a-zA-Z0-9][a-zA-Z0-9]+"

doesn't work for me: it doesn't find anything in the string (http://...) above.

Isn't it a contradiction? We want the match to start with '/', but the character class rules out every letter and digit that could follow it...

regards
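
A quick sketch of why that pattern comes up empty: with the ^ inside the brackets, the class matches one character that is neither '/' nor a letter or digit, so the '/' would have to be followed by something like '_' or '.'. The second URL below is invented purely to show a case that does match, and the names NegatedClassDemo and CountMatches are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Hypothetical helper: counts matches of the pattern posted above.
    public static int CountMatches(string input)
    {
        // [^/a-zA-Z0-9] means "one character that is NOT '/', a letter or a digit".
        return Regex.Matches(input, "/[^/a-zA-Z0-9][a-zA-Z0-9]+").Count;
    }

    public static void Main()
    {
        // Every '/' here is followed by '/', a letter, or the end of the
        // string, so nothing matches.
        Console.WriteLine(CountMatches("http://www.myexample.com/subdir1/subdir2/subdir3/")); // 0

        // Invented URL: "/_tmp" matches because '_' is outside the excluded set.
        Console.WriteLine(CountMatches("http://host/_tmp/")); // 1
    }
}
```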
 
Yurich commented:
This one (the look-behind version) works fine, should I accept the answer? ;)
 
Bob Learned commented:
I would if I were you :))  Actually, I'll split with you ;)

Bob
Question has a verified solution.