Solved

Download all Zip files from a webpage

Posted on 2012-08-23
Last Modified: 2012-09-06
I need to access the website below and download all of the listed zip files to a local folder on my system. The list of files could change, so the program needs to read the current list each time it runs.

http://www.window.state.tx.us/taxinfo/requests/taxfiles.html

I did find the following code to download a single file, but I need something that builds a list of the zip files on the site.

        private void btn_download_Click(object sender, EventArgs e)
        {
            WebClient webClient = new WebClient();
            webClient.DownloadFileCompleted += new AsyncCompletedEventHandler(Completed);
            webClient.DownloadProgressChanged += new DownloadProgressChangedEventHandler(ProgressChanged);
            webClient.DownloadFileAsync(new Uri("http://www.window.state.tx.us/taxinfo/requests/stp04-02ph.zip"), @"c:\data\DL\stp04-02ph.zip");
        }

        private void ProgressChanged(object sender, DownloadProgressChangedEventArgs e)
        {
            progressBar.Value = e.ProgressPercentage;
        }

        private void Completed(object sender, AsyncCompletedEventArgs e)
        {
            MessageBox.Show("Download completed!");
        }

Question by:kwitcom
1 Comment
 

Accepted Solution

by:
käµfm³d   👽 earned 2000 total points
ID: 38325758
I'd suggest looking into the HTML Agility Pack. You can use it to parse the HTML of the page (downloaded with WebClient) and extract all of the zip file names. Then use your WebClient to download the files. For example:

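// Requires: using System; using System.Net; using HtmlAgilityPack;
using (WebClient webClient = new WebClient())
{
    HtmlDocument doc = new HtmlDocument();
    Uri baseAddress = new Uri("http://www.window.state.tx.us/taxinfo/requests/");
    Uri mainPage = new Uri(baseAddress, "taxfiles.html");
    string html = webClient.DownloadString(mainPage);

    doc.LoadHtml(html);

    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            // Test the href rather than the link text, which need not be the file name.
            string href = link.GetAttributeValue("href", string.Empty);
            if (href.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
            {
                Console.WriteLine(href);
            }
        }
    }
}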

I'm dumping to the Console, but I think you can see how to modify it to download the files. Do note that since the site uses relative URLs, you will need to combine each one with the base address (a touch of string manipulation, or the same Uri(baseAddress, ...) constructor used above) to get the correct URL to pass to the WebClient. This should be relatively straightforward, though.
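To make that last step concrete, here is a rough sketch of how the relative links could be resolved and then downloaded. The class name ZipLinkDownloader, both method names, and the target folder are my own placeholders, not anything from the site or the HTML Agility Pack. It assumes you have already collected the href value of each link found by the loop above into a list of strings.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

// Hypothetical helper; names are illustrative only.
static class ZipLinkDownloader
{
    // Turns href values taken from the page into absolute, de-duplicated zip URLs.
    public static List<Uri> ResolveZipUris(Uri baseAddress, IEnumerable<string> hrefs)
    {
        var result = new List<Uri>();
        foreach (string href in hrefs)
        {
            Uri absolute;
            // Relative hrefs are resolved against baseAddress;
            // hrefs that are already absolute pass through unchanged.
            if (!Uri.TryCreate(baseAddress, href, out absolute))
            {
                continue; // skip anything that cannot be parsed as a URL
            }

            bool isZip = absolute.AbsolutePath.EndsWith(".zip", StringComparison.OrdinalIgnoreCase);
            if (isZip && !result.Contains(absolute))
            {
                result.Add(absolute);
            }
        }
        return result;
    }

    // Downloads each file one after another into targetFolder,
    // keeping the file name from the URL.
    public static void DownloadAll(IEnumerable<Uri> zipUris, string targetFolder)
    {
        Directory.CreateDirectory(targetFolder); // no-op if the folder already exists
        using (var webClient = new WebClient())
        {
            foreach (Uri uri in zipUris)
            {
                string fileName = Path.GetFileName(uri.LocalPath);
                webClient.DownloadFile(uri, Path.Combine(targetFolder, fileName));
            }
        }
    }
}
```

You would call ResolveZipUris with the same baseAddress as in the snippet above, then pass the result to DownloadAll along with a folder such as @"c:\data\DL". I used the blocking DownloadFile here to keep the loop simple; if you want the progress bar from your original code, the DownloadFileAsync approach would need one download queued at a time, since a single WebClient does not run concurrent operations.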