MSDN library address: regular expression

Hello, I want to look up and print out the addresses of all MSDN library web pages. A sample page address looks like this:
http://msdn.microsoft.com/en-us/library/e7sf90t3.aspx

Can you please look at my code?
Thanks.
string pattern = @"http://msdn.microsoft.com/en-us/library/[a-zA-Z0-9].+.aspx";
Regex r = new Regex(pattern);
foreach (Match match in r.Matches) //not sure how
 {
         Console.WriteLine("Address: {0}", match);
  }

zhshqzycAsked:
käµfm³d 👽Commented:
Corrected:
string pattern = @"http://msdn\.microsoft\.com/en-us/library/[a-zA-Z0-9]+\.aspx";
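To see what the correction changes, here is a minimal sketch that runs both patterns against the same text. The input string and the `PatternComparison` / `FirstMatch` names are made up for illustration; only the two patterns come from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Pattern from the question: unescaped dots match any character, and
    // "[a-zA-Z0-9].+" means one alphanumeric followed by anything at all.
    public const string Original = @"http://msdn.microsoft.com/en-us/library/[a-zA-Z0-9].+.aspx";

    // Corrected pattern: literal dots, one or more alphanumerics before ".aspx".
    public const string Corrected = @"http://msdn\.microsoft\.com/en-us/library/[a-zA-Z0-9]+\.aspx";

    // Returns the text of the first match, or an empty string if there is none.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Hypothetical input: two links on one line.
        string input = "see http://msdn.microsoft.com/en-us/library/e7sf90t3.aspx and "
                     + "http://msdn.microsoft.com/en-us/library/ms123401.aspx here";

        // The greedy ".+" runs from the first link through the end of the second one.
        Console.WriteLine(FirstMatch(Original, input));

        // The corrected pattern stops at the end of the first link.
        Console.WriteLine(FirstMatch(Corrected, input));
    }
}
```

With the original pattern, two links on the same line are reported as one long match, which is why the character class needs the `+` and the dots need escaping.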

käµfm³d 👽Commented:
Also:
foreach (Match match in r.Matches) //not sure how
{
    Console.WriteLine("Address: {0}", match.Value);
}

zhshqzycAuthor Commented:
I got an error
Foreach cannot operate on a 'method group'. 
Did you intend to invoke the 'method group'?

zhshqzycAuthor Commented:
I want to grab all web pages.
käµfm³d 👽Commented:
Ooops! Silly oversight on my part. You need parentheses on the call to Matches(). You also need to pass the string that holds the data:
foreach (Match match in r.Matches(source_data_string))
{
    Console.WriteLine("Address: {0}", match.Value);
}
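For a complete program to experiment with, something like the following should work. It is only a sketch: the HTML fragment stands in for source_data_string, and the `ExtractLinks` / `FindAddresses` names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExtractLinks
{
    // Collects every MSDN library address found in the given text.
    public static List<string> FindAddresses(string sourceData)
    {
        Regex r = new Regex(@"http://msdn\.microsoft\.com/en-us/library/[a-zA-Z0-9]+\.aspx");
        List<string> found = new List<string>();

        // Matches is a method, so it needs parentheses and the input string.
        foreach (Match match in r.Matches(sourceData))
        {
            found.Add(match.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Made-up HTML fragment standing in for source_data_string.
        string sourceData =
            "<a href=\"http://msdn.microsoft.com/en-us/library/e7sf90t3.aspx\">one</a> " +
            "<a href=\"http://msdn.microsoft.com/en-us/library/ms123401.aspx\">two</a>";

        foreach (string address in FindAddresses(sourceData))
        {
            Console.WriteLine("Address: {0}", address);
        }
    }
}
```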

käµfm³d 👽Commented:
"I want to grab all web pages."
I'm not quite sure what you mean by this. Are you saying you want to extract all the MSDN URLs from a string of data, or are you saying you want to download all of the MSDN pages in the MSDN library? You cannot do the latter with regex alone.
zhshqzycAuthor Commented:
Yes, I actually want to download all of the MSDN pages in the MSDN library. I know regex alone is not enough, but I think I should get all of the page addresses first. I saw a related link about DownloadString.
So in your code
foreach (Match match in r.Matches(source_data_string))

how can I get source_data_string?
käµfm³d 👽Commented:
"Yes, actually I want to download all of the MSDN pages in the MSDN library."

Yikes! I hope your hard drive is big!

In any event, there is a WebClient class in the System.Net namespace that can be used to download web pages. You could scrape the returned page for links using the code above. Here is a quick (and crude) example of how to do so. I can't guarantee you won't run out of memory, though, trying to use the following. I have no clue, ultimately, how big the MSDN library is.

I included a capture group which will grab the name of the file for writing out the file.
using System;
using System.IO;
using System.Net;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApplication10
{
    class Program
    {
        static void Main(string[] args)
        {
            WebClient client = new WebClient();     // Used to download pages
            Stack<string> urls = new Stack<string>(new string[] { "http://msdn.microsoft.com/en-us/library/ms123401.aspx" });   // Used to remember where we are in the site tree
            HashSet<string> downloaded = new HashSet<string>();   // Pages link to each other, so we only want to download what hasn't already been downloaded
            string pattern = @"http://msdn\.microsoft\.com/en-us/library/([a-zA-Z0-9]+\.aspx)";

            Directory.CreateDirectory("myLocalFolder");     // Make sure the output folder exists

            while (urls.Count > 0)
            {
                string toDownload = urls.Pop();     // Grab what's on top of the stack

                if (downloaded.Add(toDownload))   // Add returns false if we have downloaded this page already
                {
                    string pageData = client.DownloadString(toDownload);    // Download

                    // Groups[1] corresponds to the stuff inside the parentheses in the pattern (the file name)
                    string fileName = Regex.Match(toDownload, pattern).Groups[1].Value;
                    File.WriteAllText(Path.Combine("myLocalFolder", fileName), pageData);   // Save each page once, under its own name

                    foreach (Match match in Regex.Matches(pageData, pattern))    // Find all links
                    {
                        Console.WriteLine("Address: {0}", match.Value);
                        urls.Push(match.Value);
                    }
                }
            }
        }
    }
}
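The capture group mentioned above can also be tried on its own, without downloading anything. This is a small sketch under that assumption; `FileNameFromUrl` and `FileNameFor` are hypothetical names, and the second URL is just an example of something that should not match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameFromUrl
{
    // Same pattern as in the crawler: the parentheses capture the file name.
    const string Pattern = @"http://msdn\.microsoft\.com/en-us/library/([a-zA-Z0-9]+\.aspx)";

    // Returns the captured file name, or null if the URL is not a library page.
    public static string FileNameFor(string url)
    {
        Match match = Regex.Match(url, Pattern);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Groups[0] is the whole match; Groups[1] is the first pair of parentheses.
        Console.WriteLine(FileNameFor("http://msdn.microsoft.com/en-us/library/e7sf90t3.aspx"));

        // A URL outside the library path gives no match, so the result is null.
        Console.WriteLine(FileNameFor("http://example.com/index.html") ?? "(no match)");
    }
}
```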

zhshqzycAuthor Commented:
I have another website (linked as "bash command"). For its pages I used the pattern
string pattern = @"([a-zA-Z0-9-]+\.html)";

Is it right?
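One way to check a pattern like this is to run it against a few sample names and look at what it captures. The sketch below does that; the sample inputs and the `HtmlNamePattern` / `Capture` names are made up, since the actual site is not shown here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlNamePattern
{
    // Pattern from the post above: letters, digits and hyphens, then ".html".
    public const string Pattern = @"([a-zA-Z0-9-]+\.html)";

    // Returns what the capture group grabbed, or null if nothing matched.
    public static string Capture(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Hypothetical file names to probe the pattern with.
        string[] samples = { "bash-command.html", "my_page.html", "index.htm" };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, Capture(sample) ?? "(no match)");
        }
    }
}
```

Hyphenated names are captured whole, but an underscore is not in the character class, so "my_page.html" yields only "page.html". The pattern is also unanchored and only captures the file name, not a full address, so whether it is right depends on how the links on that site are written.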