Solved

Download web pages quickly

Posted on 2011-03-18
Last Modified: 2012-05-11
Here is a web page with many links. I don't want to click them one by one.
Is there a way to extract the links and download them quickly?

http://msdn.microsoft.com/en-us/library/ff846392.aspx

Thanks
Question by:zhshqzyc
6 Comments
 
LVL 18

Expert Comment

by:Dennis Aries
ID: 35166242
You can download the page and parse it as an XML file.
After that, you can loop through the a-tags and download the page each href refers to.

Developer.Com has a nice article on parsing HTML to an XML-document that might be of some use to you.
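
The parse-then-loop approach can be sketched as follows. This is a minimal illustration in Python, not the code from the article mentioned above. Because real-world HTML is often not well-formed XML, the sketch swaps in the standard library's tolerant `html.parser` instead of a strict XML parser. The names `LinkCollector` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Each returned URL could then be fetched in a loop, for example with `urllib.request.urlretrieve(url, filename)`.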
 
LVL 6

Expert Comment

by:akajohn
ID: 35166249
If you want to mirror a web site (offline), try Teleport Pro.

http://www.tenmax.com/teleport/pro/home.htm

Otherwise, if I have a lot of links to download, I normally add all of them to a text file and then ask wget (www.gnu.org/software/wget/) to download them for me.
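
The text-file-plus-wget workflow might look like the sketch below, assuming wget is installed and on the PATH. The helper names (`write_url_list`, `wget_command`, `download_all`) are made up for illustration. The wget options used are `-i` (read URLs from a file) and `-P` (directory to save into).

```python
import subprocess


def write_url_list(urls, path):
    """Write one URL per line, dropping duplicates but keeping order."""
    unique = list(dict.fromkeys(urls))
    with open(path, "w", encoding="utf-8") as handle:
        for url in unique:
            handle.write(url + "\n")
    return unique


def wget_command(list_path, target_dir):
    """Build the wget invocation as an argument list."""
    # -i reads the URLs from a file; -P sets the download directory.
    return ["wget", "-i", list_path, "-P", target_dir]


def download_all(list_path, target_dir):
    """Run wget on the list; raises CalledProcessError if wget fails."""
    subprocess.run(wget_command(list_path, target_dir), check=True)
```

For example, `write_url_list(links, "links.txt")` followed by `download_all("links.txt", "pages")` would be equivalent to running `wget -i links.txt -P pages` by hand.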
 

Author Comment

by:zhshqzyc
ID: 35166410
Your methods are good, but they are not specific to these web pages. I want to write code to download them.
Please note that the links have a format like this:
<a href="http://msdn.microsoft.com/en-us/library/ff846370.aspx" title="Excel 2010 Developer Reference">Excel 2010 Developer Reference</a></div>
<a href="http://msdn.microsoft.com/en-us/library/ff846437.aspx" title="AllowEditRange Object">AllowEditRange Object</a></div>

Is there a regular expression that would collect the links?
 
LVL 6

Expert Comment

by:akajohn
ID: 35166453
So, if I understand you correctly, you want to write code in C# or VB to extract the links from a web page and then download them individually?

Thanks for clarifying.

A>
 

Accepted Solution

by:
zhshqzyc earned 0 total points
ID: 35166482
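Not sure whether this is right or not:
// Quotes are doubled inside a verbatim string, dots are escaped, and group 1 captures the URL.
Regex reg = new Regex(@"<a\s+href=""(http://msdn\.microsoft\.com/en-us/library/ff\d+\.aspx)""");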
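
A pattern of this kind can be checked against the two sample anchors from the question. The sketch below does so in Python; the regular expression syntax used here is the same in .NET, but the surrounding code and the names `LINK_PATTERN`, `SAMPLE`, and `collect_links` are only illustrative. Two things to watch in such a pattern: a leading `^` matches only at the very start of the input unless multiline mode is enabled, and an unescaped `.` matches any character rather than a literal dot.

```python
import re

# Group 1 captures the full URL of each MSDN library link.
LINK_PATTERN = re.compile(
    r'<a\s+href="(http://msdn\.microsoft\.com/en-us/library/ff\d+\.aspx)"'
)

# The two sample lines quoted in the question.
SAMPLE = '''<a href="http://msdn.microsoft.com/en-us/library/ff846370.aspx" title="Excel 2010 Developer Reference">Excel 2010 Developer Reference</a></div>
<a href="http://msdn.microsoft.com/en-us/library/ff846437.aspx" title="AllowEditRange Object">AllowEditRange Object</a></div>'''


def collect_links(html):
    """Return every URL matched by LINK_PATTERN, in document order."""
    return LINK_PATTERN.findall(html)


if __name__ == "__main__":
    for url in collect_links(SAMPLE):
        print(url)
```

Regular expressions are brittle against changes in attribute order or quoting, so this is best treated as a quick fix for pages with a known, regular layout.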

 

Author Closing Comment

by:zhshqzyc
ID: 35225389
Figured it out by myself.