zhshqzyc

MSDN library address: regular expression

Hello, I want to look up and print out the addresses of all MSDN library web pages. A sample web page address looks like this:
http://msdn.microsoft.com/en-us/library/e7sf90t3.aspx

Can you please look at my code?
Thanks.
string pattern = @"http://msdn.microsoft.com/en-us/library/[a-zA-Z0-9].+.aspx";
Regex r = new Regex(pattern);
foreach (Match match in r.Matches) //not sure how
{
    Console.WriteLine("Address: {0}", match);
}

kaufmed

Corrected:
string pattern = @"http://msdn\.microsoft\.com/en-us/library/[a-zA-Z0-9]+\.aspx";

Also:
foreach (Match match in r.Matches) //not sure how
{
    Console.WriteLine("Address: {0}", match.Value);
}

zhshqzyc

ASKER

I got an error:
Foreach cannot operate on a 'method group'. 
Did you intend to invoke the 'method group'?

I want to grab all web pages.
Oops! Silly oversight on my part. You need parentheses on the call to Matches(), and you also need to pass it the string that holds the data:
foreach (Match match in r.Matches(source_data_string))
{
    Console.WriteLine("Address: {0}", match.Value);
}
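
For reference, here is a minimal, self-contained sketch that puts the corrected pattern and the corrected Matches() call together. The class name, the Extract helper, and the second sample URL are made up for illustration; only the pattern and the loop come from the posts above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MsdnLinkExtractor
{
    // Corrected pattern from above: literal dots are escaped and '+' applies to the character class.
    private static readonly Regex MsdnUrl = new Regex(
        @"http://msdn\.microsoft\.com/en-us/library/[a-zA-Z0-9]+\.aspx");

    // Returns every MSDN library URL found in the text, in order of appearance.
    public static List<string> Extract(string sourceData)
    {
        var urls = new List<string>();
        foreach (Match match in MsdnUrl.Matches(sourceData))
        {
            urls.Add(match.Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Hypothetical input; the second URL is invented purely for this example.
        string sourceData =
            "<a href=\"http://msdn.microsoft.com/en-us/library/e7sf90t3.aspx\">Regex</a> " +
            "<a href=\"http://msdn.microsoft.com/en-us/library/abc123.aspx\">Other</a>";

        foreach (string url in Extract(sourceData))
        {
            Console.WriteLine("Address: {0}", url);
        }
    }
}
```

Keeping the extraction in its own method means the same code can be reused no matter where the input string comes from.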

I want to grab all web pages.
I'm not quite sure what you mean by this. Are you saying you want to extract all the MSDN URLs from a string of data, or are you saying you want to download all of the MSDN pages in the MSDN library? You cannot do the latter with regex alone.
Yes, actually I want to download all of the MSDN pages in the MSDN library. I know regex alone is not enough, but I think I should get the addresses of all the web pages first. I saw a link about something similar: DownloadString.
So in your code
foreach (Match match in r.Matches(source_data_string))

How can I get source_data_string?
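
One possible way to obtain source_data_string, sketched here under the assumption that the DownloadString mentioned above is WebClient.DownloadString from System.Net: download the HTML of a starting page and run the regex over it. The class and helper names are hypothetical, and the starting URL is simply the sample address from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class MsdnPageScanner
{
    private const string Pattern =
        @"http://msdn\.microsoft\.com/en-us/library/[a-zA-Z0-9]+\.aspx";

    // Downloads the raw HTML of one page; the result plays the role of source_data_string.
    public static string DownloadPage(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Collects the distinct MSDN library URLs that appear in the given text.
    public static HashSet<string> FindLibraryUrls(string source_data_string)
    {
        var found = new HashSet<string>();
        foreach (Match match in Regex.Matches(source_data_string, Pattern))
        {
            found.Add(match.Value);
        }
        return found;
    }

    public static void Main()
    {
        try
        {
            // Assumed starting point: the sample address from the question.
            string source_data_string =
                DownloadPage("http://msdn.microsoft.com/en-us/library/e7sf90t3.aspx");

            foreach (string url in FindLibraryUrls(source_data_string))
            {
                Console.WriteLine("Address: {0}", url);
            }
        }
        catch (WebException ex)
        {
            // The page may have moved or the network may be unavailable.
            Console.WriteLine("Download failed: {0}", ex.Message);
        }
    }
}
```

To reach more pages, the same two steps could be repeated for each address found while keeping a set of addresses already visited. Pages may also link to each other with relative paths, which this absolute-URL pattern would not catch.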

ASKER CERTIFIED SOLUTION
kaufmed

This solution is only available to members.
I have another web page (bash command). For it, I used the pattern
string pattern = @"([a-zA-Z0-9-]+\.html)";

Is it right?
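
A quick way to check a pattern like this is to run it over a few sample strings and look at what comes back. The sketch below does that; the file names in the sample text are invented, since the actual page content is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlNamePatternCheck
{
    // The pattern from the post above, unchanged.
    private const string Pattern = @"([a-zA-Z0-9-]+\.html)";

    // Returns everything the pattern matches in the given text.
    public static List<string> FindHtmlNames(string text)
    {
        var names = new List<string>();
        foreach (Match match in Regex.Matches(text, Pattern))
        {
            names.Add(match.Value);
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample text covering a few edge cases.
        string text = "sect-1.html docs/intro.html my_file.html notes.htmlx";

        foreach (string name in FindHtmlNames(text))
        {
            Console.WriteLine(name);
        }
        // Prints:
        // sect-1.html
        // intro.html
        // file.html
        // notes.html
    }
}
```

So the pattern does find names ending in .html, but it drops any directory part, cuts names at characters outside the class (such as the underscore), and also matches inside longer extensions like .htmlx. Whether that is acceptable depends on how the links on the target page are written.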