Solved

C#: extract HTML from a site with a console program

Posted on 2013-11-07
Last Modified: 2013-11-12
I'm using a C# program to download the following page into a string:

http://signssafety.com/signsafety/ProductDescription.aspx?productID=7

and I'm using the following code:

// Requires: using System.IO; using System.Net; using System.Text;
string urlItem = "http://signssafety.com/signsafety/ProductDescription.aspx?productID=7";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlItem);
request.UserAgent = "Foo";
request.ContentType = "text/html; charset=UTF-8";
Encoding wind1252 = Encoding.GetEncoding(1252);
request.UseDefaultCredentials = true;

// Read the whole response body as Windows-1252 text.
string responseString;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader myStreamReader = new StreamReader(response.GetResponseStream(), wind1252))
{
    responseString = myStreamReader.ReadToEnd();
}

// Save the downloaded markup to disk.
using (StreamWriter swwrite = new StreamWriter(@"Items.html"))
{
    swwrite.Write(responseString);
}


When I view the downloaded Items.html file, I see that the page actually downloaded was:

"http://www.signssafety.com/signsafety"

and not the page in the link above.

I want to keep using the C# console program and don't want to use the WebBrowser object. Does anyone know what I'm doing wrong, or how a C# console program can download the actual page?
Question by:esak2000
3 Comments
 

Expert Comment

by:käµfm³d 👽
ID: 39630438
I would suggest using the HTML Agility Pack (HAP, also available through NuGet) if you are going to be parsing HTML. It copes well with HTML of varying quality, including markup that is not well formed.

For your needs, you could do something like:

HtmlAgilityPack.HtmlWeb client = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = client.Load("http://signssafety.com/signsafety/ProductDescription.aspx?productID=7");

doc.Save("Items.html");


HAP provides both LINQ and XPath mechanisms for extracting data from HTML. Both of these would be more reliable in terms of locating data within the HTML source than would straight string searching.
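As a rough illustration of those two mechanisms, the sketch below parses a small inline HTML string rather than the real product page. The class name, the sample markup, the `title` id, and the link paths are all made up for the example; only the HAP calls (`LoadHtml`, `SelectSingleNode`, `Descendants`, `GetAttributeValue`) come from the library, and the real page's structure would need to be inspected before writing selectors for it.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class HapExtractionSketch
{
    // Hypothetical markup standing in for a downloaded page.
    public const string SampleHtml =
        "<html><body><h1 id='title'>Sample sign</h1>" +
        "<a href='/a.aspx'>First</a><a href='/b.aspx'>Second</a></body></html>";

    // XPath: text of the first element whose id is "title", or null if absent.
    public static string TitleByXPath(HtmlDocument doc)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='title']");
        return node == null ? null : node.InnerText.Trim();
    }

    // LINQ: non-empty href values of all <a> elements, in document order.
    public static string[] LinksByLinq(HtmlDocument doc)
    {
        return doc.DocumentNode
                  .Descendants("a")
                  .Select(a => a.GetAttributeValue("href", ""))
                  .Where(href => href.Length > 0)
                  .ToArray();
    }

    public static void Main()
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        Console.WriteLine(TitleByXPath(doc));
        foreach (string href in LinksByLinq(doc))
        {
            Console.WriteLine(href);
        }
    }
}
```

Note that `SelectSingleNode` returns null when nothing matches, so the null check matters when the selector is run against a page whose layout is not known in advance.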
Accepted Solution

by:esak2000
ID: 39631040
Thanks for the tip. In the end I used the Internet Explorer object to download the HTML files to my local computer and used StreamReader to read the files.
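For anyone who wants to stay with the console-only HttpWebRequest approach from the question, one possible first step is to find out whether the server is redirecting the request. The thread never establishes why the home page came back, so the sketch below is a diagnostic under an assumption, not a confirmed fix: it assumes the site answers the first request with a redirect and possibly a cookie. The class and method names are invented for the example.

```csharp
using System;
using System.Net;

public static class RedirectProbeSketch
{
    // Builds a GET request that does NOT follow redirects automatically and
    // that stores any cookies the server sets in the supplied container.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "Foo";
        request.AllowAutoRedirect = false;
        request.CookieContainer = cookies;
        return request;
    }

    public static void Main()
    {
        string url = "http://signssafety.com/signsafety/ProductDescription.aspx?productID=7";
        CookieContainer cookies = new CookieContainer();

        // Step 1: look at the raw first answer instead of the redirect target.
        using (HttpWebResponse response = (HttpWebResponse)CreateRequest(url, cookies).GetResponse())
        {
            Console.WriteLine("Status:   " + (int)response.StatusCode);
            Console.WriteLine("Location: " + response.Headers["Location"]);
            Console.WriteLine("Cookies:  " + cookies.Count);
        }

        // Step 2: ask again with the same cookie container, this time following
        // redirects, and report which URL was finally served.
        HttpWebRequest retry = CreateRequest(url, cookies);
        retry.AllowAutoRedirect = true;
        using (HttpWebResponse response = (HttpWebResponse)retry.GetResponse())
        {
            Console.WriteLine("Final URL: " + response.ResponseUri);
        }
    }
}
```

If step 1 shows a 3xx status with a Location header, the original code was silently following that redirect. If step 2 then ends on the product page, keeping the CookieContainer on the request may be all the original code needed; if it still ends on the home page, the cause lies elsewhere.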

Author Closing Comment

by:esak2000
ID: 39641076
My own comment was the better solution for what I wanted.