Parsing an HTML page from the Internet with ASP.NET

Posted on 2009-07-07
Medium Priority
Last Modified: 2012-05-07
I need to load and parse an HTML file from a website.
I found a .NET project called "HTML Agility Pack" that lets me parse HTML files easily, but not while they're still online. In other words, I can't specify a URI or URL as the file location (the same goes for the FileInfo constructor in C#'s System.IO namespace).

So instead, I'm guessing I have to download the file first, but I need to handle the download with server-side code.

To put things into perspective, I am building a web service that must generate an XML file from an HTML site that's full of dropdowns (the web service will be used internally by the university that has asked me to do this for them). I can't have direct access to the database from which the HTML page is getting its data, hence the cumbersome workaround.

What's the best way to do this? Thank you.
Question by:uhm179

Accepted Solution

lharrispv earned 600 total points
ID: 24794970
Use an HTTP web request (HttpWebRequest). It makes an HTTP request and returns the page for you, and I think you might actually be able to get it to return in XML format already.

Expert Comment

ID: 24795037
Well, if you can't have access to the database, get in touch with the DB admin at the campus and have him create an interface/service that will allow you to get the data before you go through with what they are asking. It will be tedious.

Expert Comment

ID: 24795057
By the way, I worked at a university where I wasn't allowed to touch the DB, so I would go the route of getting the DB admin to create APIs for me to connect to and retrieve the data, as I just suggested. That's more than reasonable.

Assisted Solution

by:Todd Gerbert
Todd Gerbert earned 600 total points
ID: 24795125
// Download the page and read the whole response body into a string.
string theHtml;
System.Net.WebRequest webRequest = System.Net.WebRequest.Create("http://www.server.com/page.htm");
using (System.Net.WebResponse webResponse = webRequest.GetResponse())
using (System.IO.StreamReader reader = new System.IO.StreamReader(webResponse.GetResponseStream()))
{
    theHtml = reader.ReadToEnd();
}
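
Once the page is in a string, it can be handed to HTML Agility Pack without touching the file system. The following is only a rough sketch of how the dropdowns might then be turned into XML. It assumes the HtmlAgilityPack package is referenced, and the class name DropdownExporter, the method DropdownsToXml, and the XML element names are all made up for illustration. It reads only the value attribute of each option, because some versions of the library treat option as an empty element, in which case the option's text does not show up in its InnerText.

```csharp
using System;
using System.Xml.Linq;
using HtmlAgilityPack; // third-party library, assumed to be referenced

// Hypothetical helper, not part of HTML Agility Pack.
public static class DropdownExporter
{
    // Converts every <select> found in the HTML into a <dropdown> element.
    public static XElement DropdownsToXml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses from a string; no file or URL needed

        var root = new XElement("dropdowns");

        // SelectNodes returns null (not an empty list) when nothing matches.
        HtmlNodeCollection selects = doc.DocumentNode.SelectNodes("//select");
        if (selects == null)
        {
            return root;
        }

        foreach (HtmlNode select in selects)
        {
            var dropdown = new XElement("dropdown",
                new XAttribute("name", select.GetAttributeValue("name", "")));

            HtmlNodeCollection options = select.SelectNodes(".//option");
            if (options != null)
            {
                foreach (HtmlNode option in options)
                {
                    // Only the value attribute is copied; see the note above.
                    dropdown.Add(new XElement("option",
                        new XAttribute("value", option.GetAttributeValue("value", ""))));
                }
            }

            root.Add(dropdown);
        }

        return root;
    }
}
```

With the string from the snippet above, the call would look like DropdownExporter.DropdownsToXml(theHtml).ToString(). The library also ships an HtmlWeb class whose Load method takes a URL directly, which may make the manual download unnecessary; check that it behaves as needed in the version you have.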

Expert Comment

ID: 24795156
Hmm, looks like the same thing I said.

Expert Comment

by:Todd Gerbert
ID: 24795177

I just hadn't seen your post yet.

Expert Comment

ID: 24795214
Sorry. It's just that in the last 3 or 4 posts I have made to this group, I was the first poster, and then someone came along later, said the same thing in a different way, and wound up getting the points. Makes it hard to get your month's quota :-\

Author Closing Comment

ID: 31600765
I'm gonna split the points between iharrispv and tgerbert.

iharrispv, I would've assigned all the points to you but the page you linked to contained a bunch of code that I'd have to pick apart first to find exactly what I was looking for. tgerbert provided the bit of code that gave me an idea of what kind of code I had to look out for. Had you combined the link with a quick code example (due to the nature of the link), then it would have been perfect.
I don't know the nature of your other posts, but in this specific example, I can easily imagine someone else giving full points to tgerbert simply because many may prefer code over links (to code).

Hehe, yeah, I'd prefer direct access to the database, but my boss doesn't like the idea. I guess he has his reasons.

Thank you for your help, guys.
