Solved

c# extract html from site with console program

Posted on 2013-11-07
3
342 Views
Last Modified: 2013-11-12
I'm using a c# program to extract to a string the following page:

http://signssafety.com/signsafety/ProductDescription.aspx?productID=7

and I'm using the following code:

urlItem = "http://signssafety.com/signsafety/ProductDescription.aspx?productID=7"
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(urlItem);
request.UserAgent = "Foo";
request.ContentType = "text/html; charset=UTF-8";
Encoding wind1252 = Encoding.GetEncoding(1252);
request.UseDefaultCredentials = true;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
                        StreamReader myStreamReader = new streamReader(response.GetResponseStream(), wind1252);
 string responseString = myStreamReader.ReadToEnd();
 request.Abort();
StreamWriter swwrite = new StreamWriter(@"Items.html");
swwrite.Write(responseString);
swwrite.Close();

Open in new window


When I view the downloaded Items.html file I see that the the actual page that was downloaded was:

"http://www.signssafety.com/signsafety"

and not the page in the link above.

I want to continue using the c# console program, and don't want to use the WebBrowser object. Does anyone know what I'm doing wrong or what can be done using the C# console to download the actual page?
0
Comment
Question by:esak2000
  • 2
3 Comments
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 39630438
I would suggest using the HTML Agility Pack (available through NuGet also) if you are going to be parsing HTML. It is very flexible in terms of handling various qualities of HTML.

For your needs, you could do something like:

HtmlAgilityPack.HtmlWeb client = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = client.Load("http://signssafety.com/signsafety/ProductDescription.aspx?productID=7");

doc.Save("Items.html");

Open in new window


HAP provides both LINQ and XPath mechanisms for extracting data from HTML. Both of these would be more reliable in terms of locating data within the HTML source than would straight string searching.
0
 

Accepted Solution

by:
esak2000 earned 0 total points
ID: 39631040
Thanks for the tip. In the end I used the internet explorer object to download the html files to my local computer and used stream reader to read the files.
0
 

Author Closing Comment

by:esak2000
ID: 39641076
my comment was what the better solution for what I wanted
0

Featured Post

Announcing the Most Valuable Experts of 2016

MVEs are more concerned with the satisfaction of those they help than with the considerable points they can earn. They are the types of people you feel privileged to call colleagues. Join us in honoring this amazing group of Experts.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Calculating holidays and working days is a function that is often needed yet it is not one found within the Framework. This article presents one approach to building a working-day calculator for use in .NET.
Today, still in the boom of Apple, PC's and products, nearly 50% of the computer users use Windows as graphical operating systems. If you are among those users who love windows, but are grappling to keep the system's hard drive optimized, then you s…
Shows how to create a shortcut to site-search Experts Exchange using Google in the Chrome browser. This eliminates the need to type out site:experts-exchange.com whenever you want to search the site. Launch the Search Engine Menu: In chrome, via you…
How to create a custom search shortcut to site-search Experts Exchange using Google in the Firefox browser. This eliminates the need to type out site:experts-exchange.com whenever you want to search the site. Launch your Bookmark Menu: Press 'Ctrl +…

825 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question