MiracleByDesign
asked on
ASP.NET 2.0 Screen Scraper
Hello,
I am trying to develop a screen scraper for a client, but I am having some issues. Before I buy one, I thought I should post the question here. I need this solution as soon as possible, so any help would be appreciated! I need to go to various court websites and scrape the case data, then write that data to a CSV file. I will need to keep appending data to the same file. Can anyone help? I was told on the web that this is an easy process, but all I am getting is a copy of the screen and data. I just need the data written to a CSV file. The solution needs to be in ASP.NET/C#.NET.
Thank you in advance!
Miracle By Design
using System;
using System.IO;
using System.Net;

public partial class _Default : System.Web.UI.Page
{
    String r;

    protected void Page_Load(object sender, EventArgs e)
    {
        string str = "http://www.clerk-alachua-fl.org/pa/pa.urd/pamw2000*o_case_sum?83518636";
        home.Text = screenscrape(str);
    }

    // Downloads the page at the given URL and returns its raw HTML source.
    private string screenscrape(string url)
    {
        WebResponse obj;
        WebRequest obj1 = System.Net.HttpWebRequest.Create(url);
        obj = obj1.GetResponse();
        using (StreamReader sr = new StreamReader(obj.GetResponseStream()))
        {
            r = sr.ReadToEnd();
        }
        return r;

        // NOTE: nothing below this return ever runs, so the grid is never
        // bound and the file is never written.
        gvResults.DataSource = r;
        gvResults.DataBind();
        // Writes the raw HTML (not parsed case data) to the CSV file.
        StreamWriter sw = new StreamWriter(Server.MapPath("AlachuaCoFLCircuitCourt.csv"));
        sw.Write(r);
        sw.Close();
    }
}
ASKER
gvResults is a datagrid that I thought I could write the data to and then maybe export to a CSV file. Can you give me a coding example of how to solve this problem?
Does the information actually populate the DataGridView? I imagine not, but I might as well confirm. If it does, I can get you from there to the file.
ASKER
No, it does not populate into the grid. That is my main problem. I have exported data from a grid to Excel before but I am new to scraping a website for data only.
I'll be able to give you some help with this, but it will take some work since I'll have to parse the HTML. I'll post something to show you pretty soon.
ASKER CERTIFIED SOLUTION
ASKER
I will add this code to my project tonight and let you know what happens. Thank you very much for your help with this project.
No problem, let me know how it goes.
ASKER
BitRunner303, the solution works, but the client has another request that I am not sure you can help me with. I need to save the actual scraper to an XML file so it can be used in another program. Do you have any idea how I would do this?
thanks,
MiracleByDesign
Simple. Here's a tutorial on using the XML writer features in .NET: http://www.c-sharpcorner.com/UploadFile/mahesh/writexmlusingXmlWriter11132005233450PM/writexmlusingXmlWriter.aspx
You'll simply iterate through the rows of your final DataSet, writing out the elements of each record. I would probably do it something like so (snippet).
Basically you'd write a root element for the file, which I've called CaseFile here. Then, for each Case, write out all the XML elements. If you need some more help with this, let me know, but this should get you going.
<CaseFile>
  <Case>
    <DefendantName>John Doe</DefendantName>
    <ProsecutingAttorney>Jack McCoy</ProsecutingAttorney>
  </Case>
</CaseFile>
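For anyone following along, here is a rough sketch of how writing that structure with XmlWriter might look. It assumes the scraped cases already sit in a DataTable whose column names (DefendantName, ProsecutingAttorney, and so on) are valid XML element names. The class name CaseFileXml and its Write method are made up for illustration and are not part of the original solution.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

// Hypothetical helper: names are illustrative only.
public static class CaseFileXml
{
    // Writes every row of the table as a <Case> element under a <CaseFile> root.
    public static void Write(DataTable cases, TextWriter output)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;

        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("CaseFile");
            foreach (DataRow row in cases.Rows)
            {
                writer.WriteStartElement("Case");
                foreach (DataColumn column in cases.Columns)
                {
                    // Column names become element names, so they must be valid XML names.
                    writer.WriteElementString(column.ColumnName, Convert.ToString(row[column]));
                }
                writer.WriteEndElement(); // closes <Case>
            }
            writer.WriteEndElement(); // closes <CaseFile>
            writer.WriteEndDocument();
        }
    }
}
```

In the page you would open a StreamWriter on something like Server.MapPath("Cases.xml") (the file name is just an example) and pass it to CaseFileXml.Write along with the first table of your DataSet. XmlWriter escapes characters such as & and < in the values for you.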
ASKER
Thank you very much!!!!
Anyway, the code you posted goes out and gets the HTML source of the page, reads it into the string "r", and then writes it to a CSV file.
The problem is that HTML source is not the same as CSV. If you want CSV, you're going to have to parse the HTML source (e.g. by using regular expressions), or do it a different way and iterate through the HTML Document Object Model (DOM) for the page.
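To make the regular expression route concrete, here is a minimal sketch. It assumes the case data on the page sits in ordinary HTML table rows (tr/td), which I have not verified for the court site in question, and it will not cope with nested tables. The class HtmlTableToCsv and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

// Hypothetical helper: names are illustrative only.
public static class HtmlTableToCsv
{
    // One pattern for whole rows, one for the cells inside a row, one for leftover tags.
    private static readonly Regex RowPattern =
        new Regex(@"<tr\b[^>]*>(.*?)</tr>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    private static readonly Regex CellPattern =
        new Regex(@"<t[dh]\b[^>]*>(.*?)</t[dh]>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    private static readonly Regex TagPattern = new Regex(@"<[^>]+>");

    // Returns one list of cell texts per table row found in the HTML.
    public static List<List<string>> ExtractRows(string html)
    {
        List<List<string>> rows = new List<List<string>>();
        foreach (Match row in RowPattern.Matches(html))
        {
            List<string> cells = new List<string>();
            foreach (Match cell in CellPattern.Matches(row.Groups[1].Value))
            {
                // Strip nested tags, decode entities such as &amp;, and trim whitespace.
                string text = TagPattern.Replace(cell.Groups[1].Value, "");
                cells.Add(HttpUtility.HtmlDecode(text).Trim());
            }
            if (cells.Count > 0) rows.Add(cells);
        }
        return rows;
    }

    // Quotes a field when it contains a comma, a double quote, or a line break.
    public static string EscapeCsv(string field)
    {
        if (field.IndexOfAny(new char[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static string ToCsvLine(List<string> cells)
    {
        return string.Join(",", cells.ConvertAll<string>(EscapeCsv).ToArray());
    }

    // Opens the file in append mode, so repeated runs keep adding to the same file.
    public static void AppendRows(string csvPath, List<List<string>> rows)
    {
        using (StreamWriter sw = new StreamWriter(csvPath, true))
        {
            foreach (List<string> cells in rows) sw.WriteLine(ToCsvLine(cells));
        }
    }
}
```

With something like this in place, the page could pass the downloaded HTML to ExtractRows and hand the result to AppendRows together with Server.MapPath("AlachuaCoFLCircuitCourt.csv"). You would most likely still need to filter out layout rows and pick the columns you care about once you see what the real markup looks like.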