Solved

Parsing an HTML page from the Internet with ASP.NET

Posted on 2009-07-07
Last Modified: 2012-05-07
Hi,
I need to load and parse an HTML file from a website.
I found a .NET project called "HTML Agility Pack" that lets me parse HTML files easily, but not while they're still online. In other words, I can't specify a URI or URL as a file location (the same goes for the FileInfo constructor in the C# System.IO namespace).

So instead, I'm guessing I have to download the file first, but I need to handle the download with server-side code.

To put things into perspective, I am building a web service that must generate an XML file from an HTML site that's full of dropdowns (the web service will be used internally by the university that has asked me to do this for them). I can't have direct access to the database from which the HTML page is getting its data, hence the cumbersome workaround.

What's the best way to do this? Thank you.
Question by:uhm179
8 Comments
 
LVL 8

Accepted Solution

by:
lharrispv earned 150 total points
ID: 24794970
Use HttpWebRequest. It sends an HTTP request and returns the page for you, and I think you might actually be able to get it to return the page in XML format already.

http://odetocode.com/articles/162.aspx
 
LVL 21

Expert Comment

by:silemone
ID: 24795037
Well, if you can't have access to the database, get in touch with the DB admin at the campus and have him create an interface/service that will allow you to get the data before you go through with what they are asking. It will be tedious...
 
LVL 21

Expert Comment

by:silemone
ID: 24795057
By the way, I worked at a university where I wasn't allowed to touch the DB, so I would go the route of getting the DB admin to create APIs for me to connect to and retrieve the data, as I just suggested. That's more than reasonable...
 
LVL 33

Assisted Solution

by:Todd Gerbert
Todd Gerbert earned 150 total points
ID: 24795125
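// Download the page's HTML into a string (uses System.Net and System.IO).
System.Net.WebRequest webRequest = System.Net.WebRequest.Create("http://www.server.com/page.htm");
string theHtml;
using (System.Net.WebResponse webResponse = webRequest.GetResponse())
using (System.IO.StreamReader reader = new System.IO.StreamReader(webResponse.GetResponseStream()))
{
    theHtml = reader.ReadToEnd(); // the response and reader are disposed when this block exits
}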
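
Once the page is in a string such as theHtml, the HTML Agility Pack can parse it straight from memory with LoadHtml, so no file location is needed. The following is only a rough sketch of how the dropdowns might then be turned into XML. The class name DropdownExporter, the method ToXml, and the XML element names (dropdowns, dropdown, option) are made up for illustration, and it assumes the HtmlAgilityPack package is referenced in the project.

```csharp
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack; // third-party package, not part of the .NET Framework

public static class DropdownExporter
{
    // Hypothetical helper: converts every <select> in the HTML into a <dropdown> element.
    public static XDocument ToXml(string html)
    {
        // Some HTML Agility Pack versions treat <option> as an empty element and
        // drop its text; removing the flag is harmless if it is not present.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses a string already in memory

        var root = new XElement("dropdowns",
            doc.DocumentNode.Descendants("select").Select(selectNode =>
                new XElement("dropdown",
                    // An empty string is used when the attribute is missing.
                    new XAttribute("name", selectNode.GetAttributeValue("name", "")),
                    selectNode.Descendants("option").Select(optionNode =>
                        new XElement("option",
                            new XAttribute("value", optionNode.GetAttributeValue("value", "")),
                            // InnerText keeps entities such as &amp; so decode them first.
                            HtmlEntity.DeEntitize(optionNode.InnerText).Trim())))));

        return new XDocument(root);
    }
}
```

Calling DropdownExporter.ToXml(theHtml) would give an XDocument that the web service can return or save. The element names should be adjusted to whatever XML layout the consumers expect.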
LVL 8

Expert Comment

by:lharrispv
ID: 24795156
Hmm looks like the same thing I said..
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 24795177
Yup.

I just hadn't seen your post yet.
 
LVL 8

Expert Comment

by:lharrispv
ID: 24795214
Sorry... it's just that for the last 3 or 4 posts I have made to this group, I have been the first poster, and then someone came along later, said the same thing in a different way, and wound up getting the points... makes it hard to get your month's quota :-\
 

Author Closing Comment

by:uhm179
ID: 31600765
I'm gonna split the points between lharrispv and tgerbert.

lharrispv, I would've assigned all the points to you, but the page you linked to contained a bunch of code that I'd have to pick apart first to find exactly what I was looking for. tgerbert provided the bit of code that gave me an idea of what kind of code I had to look out for. Had you combined the link with a quick code example (given the nature of the link), it would have been perfect.
I don't know the nature of your other posts, but in this specific example, I can easily imagine someone else giving full points to tgerbert simply because many may prefer code over links (to code).

Hehe, yeah, I'd prefer direct access to the database, but my boss doesn't like the idea. I guess he has his reasons.

Thank you for your help, guys.