Solved

C#: extract HTML from a site with a console program

Posted on 2013-11-07
345 Views
Last Modified: 2013-11-12
I'm using a C# program to download the following page into a string:

http://signssafety.com/signsafety/ProductDescription.aspx?productID=7

and I'm using the following code:

// Requires: using System.IO; using System.Net; using System.Text;
string urlItem = "http://signssafety.com/signsafety/ProductDescription.aspx?productID=7";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlItem);
request.UserAgent = "Foo";
// ContentType describes a request body; a GET request has none, so it is not set.
request.UseDefaultCredentials = true;
Encoding wind1252 = Encoding.GetEncoding(1252);

string responseString;
// Disposing the response and reader releases the connection (no request.Abort() needed).
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader myStreamReader = new StreamReader(response.GetResponseStream(), wind1252))
{
    responseString = myStreamReader.ReadToEnd();
}

using (StreamWriter swwrite = new StreamWriter(@"Items.html"))
{
    swwrite.Write(responseString);
}

When I view the downloaded Items.html file, I see that the page actually downloaded was:

"http://www.signssafety.com/signsafety"

and not the page in the link above.

I want to keep using the C# console program and don't want to use the WebBrowser object. Does anyone know what I'm doing wrong, or how I can download the actual page from a C# console program?
Question by:esak2000
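The thread never establishes why the server redirects. A common reason for an ASP.NET page to bounce to a site's home page is that it expects a session cookie or a browser-like User-Agent; that is an assumption here, not something the thread confirms. A minimal sketch that keeps cookies across redirects and detects whether the request landed somewhere other than the requested page follows. `PageFetcher`, `Download`, `LandedOnRequestedPage`, and the User-Agent string are hypothetical names and values.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: fetches a page with cookies enabled and reports
// the URI that actually answered after redirects were followed.
public static class PageFetcher
{
    // One shared container, so a cookie set by one response (for example an
    // ASP.NET session cookie, an assumption) is sent on later requests.
    private static readonly CookieContainer Cookies = new CookieContainer();

    public static string Download(string url, out Uri finalUri)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;
        // Hypothetical browser-like value; "Foo" might trip browser detection.
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64)";
        request.AllowAutoRedirect = true; // already the default, stated explicitly

        // Note: on .NET Core / .NET 5+, code page 1252 is only available after
        // registering CodePagesEncodingProvider.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding(1252)))
        {
            // ResponseUri is the URI of the resource that finally responded.
            finalUri = response.ResponseUri;
            return reader.ReadToEnd();
        }
    }

    // True when the final URI has the same path and query string as the requested
    // one. The host is ignored, so an added "www." alone does not count as a redirect.
    public static bool LandedOnRequestedPage(Uri requested, Uri final)
    {
        return string.Equals(requested.AbsolutePath, final.AbsolutePath, StringComparison.OrdinalIgnoreCase)
            && string.Equals(requested.Query, final.Query, StringComparison.Ordinal);
    }
}
```

From `Main`, one could call `PageFetcher.Download(urlItem, out Uri finalUri)` and then `PageFetcher.LandedOnRequestedPage(new Uri(urlItem), finalUri)`. A `false` result means the server sent the request elsewhere, as with the home page described above, so the saved HTML is not the product page.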

Expert Comment

by:käµfm³d 👽
ID: 39630438
I would suggest using the HTML Agility Pack (also available through NuGet) if you are going to be parsing HTML. It copes well with HTML of varying quality, including malformed markup.

For your needs, you could do something like:

HtmlAgilityPack.HtmlWeb client = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = client.Load("http://signssafety.com/signsafety/ProductDescription.aspx?productID=7");

doc.Save("Items.html");

HAP provides both LINQ and XPath mechanisms for extracting data from HTML. Either is more reliable for locating data within the HTML source than plain string searching.

Accepted Solution

by:esak2000
ID: 39631040
Thanks for the tip. In the end I used the Internet Explorer object to download the HTML files to my local computer and used a StreamReader to read the files.
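The second half of that approach, reading pages already saved to disk, can be sketched with `StreamReader`. The Internet Explorer automation step is not shown. The directory layout, the `*.html` pattern, and the UTF-8 default are assumptions, and `SavedPageReader` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: loads every saved .html file in a directory into memory.
public static class SavedPageReader
{
    // Returns file name -> page text. Files are read as UTF-8 unless a byte
    // order mark indicates another encoding (an assumption about how they were saved).
    public static Dictionary<string, string> ReadAll(string directory)
    {
        var pages = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string path in Directory.GetFiles(directory, "*.html"))
        {
            using (StreamReader reader = new StreamReader(path, Encoding.UTF8, true))
            {
                pages[Path.GetFileName(path)] = reader.ReadToEnd();
            }
        }
        return pages;
    }
}
```

Keying by file name keeps each page's text paired with the file it came from, which is convenient when the program processes many downloaded product pages in one pass.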

Author Closing Comment

by:esak2000
ID: 39641076
My own comment was the better solution for what I wanted.