MiracleByDesign


ASP.NET 2.0 Screen Scraper

Hello,
I wanted to try to develop a screen scraper I can use for a client, but I am having some issues. Before I buy one, I thought I should post the question here. I need this solution as soon as possible, so any help would be appreciated! I need to go to various court websites and scrape the case data. I then need to be able to write the data to a CSV file, and I will need to keep adding data to the same file. Can anyone help? On the web, I was told this is an easy process, but all I am getting is a copy of the whole screen along with the data. I just need the data written to a CSV file. This solution needs to be in ASP.NET/C#.NET.

Thank you in advance!
Miracle By Design
using System;
using System.IO;
using System.Net;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        string str = "http://www.clerk-alachua-fl.org/pa/pa.urd/pamw2000*o_case_sum?83518636";
        home.Text = screenscrape(str);
    }

    private string screenscrape(string url)
    {
        string r;

        // Download the raw HTML of the page; both the response and the
        // reader are disposed when the using blocks end.
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
            r = sr.ReadToEnd();
        }

        // r holds the markup of the whole page, not a list of records,
        // so binding it to the grid does not produce case rows.
        gvResults.DataSource = r;
        gvResults.DataBind();

        // Append the downloaded text to the CSV file (true = append), so
        // later runs add to the same file instead of overwriting it.
        using (StreamWriter sw = new StreamWriter(Server.MapPath("AlachuaCoFLCircuitCourt.csv"), true))
        {
            sw.Write(r);
        }

        return r;
    }
}


BitRunner303

Not sure what gvResults is, since it's a partial class.

Anyway, the code it looks like you put in would go out, get the HTML source of the page, read it into the string "r", and then write it to a CSV.

The problem, though, is that HTML source is not the same as CSV. If you want it in CSV, you're going to have to parse the HTML source (e.g. by using regular expressions), or do it a different way and iterate through the HTML Document Object Model (DOM) for the page.
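A minimal sketch of the regular-expression route, assuming the case data sits in ordinary HTML table rows (`<tr>` with `<td>` or `<th>` cells); the court site's actual markup was not shown in the thread, and the class name `HtmlTableScraper` is hypothetical. It pulls each row out of the downloaded string, strips any tags nested inside a cell, and decodes entities such as `&amp;`. Regex matching like this does not cope with nested tables or unclosed cells, which is where walking the DOM becomes the safer option. It is written for a current .NET version; on the original .NET 2.0 target, `HttpUtility.HtmlDecode` (System.Web) would stand in for `WebUtility.HtmlDecode`, which arrived in .NET 4.0.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts table cells from raw HTML with regular expressions.
public static class HtmlTableScraper
{
    // Matches one <tr ...>...</tr> block (non-greedy, across line breaks).
    private static readonly Regex RowRegex =
        new Regex(@"<tr\b[^>]*>(.*?)</tr>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches one <td ...>...</td> or <th ...>...</th> cell inside a row.
    private static readonly Regex CellRegex =
        new Regex(@"<t[dh]\b[^>]*>(.*?)</t[dh]>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any tag left inside a cell, e.g. <b> or <a href=...>.
    private static readonly Regex TagRegex = new Regex(@"<[^>]+>", RegexOptions.Singleline);

    // Returns one string[] per table row and one string per cell,
    // with inner tags removed and HTML entities decoded.
    public static List<string[]> ExtractRows(string html)
    {
        var rows = new List<string[]>();
        foreach (Match row in RowRegex.Matches(html))
        {
            var cells = new List<string>();
            foreach (Match cell in CellRegex.Matches(row.Groups[1].Value))
            {
                string text = TagRegex.Replace(cell.Groups[1].Value, " ");
                text = WebUtility.HtmlDecode(text);
                // Collapse whitespace runs left over from the markup.
                text = Regex.Replace(text, @"\s+", " ").Trim();
                cells.Add(text);
            }
            if (cells.Count > 0)
                rows.Add(cells.ToArray());
        }
        return rows;
    }
}
```

In the page, the string returned by `screenscrape(str)` would be passed to `HtmlTableScraper.ExtractRows` instead of being written straight to the file.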
MiracleByDesign

ASKER

gvResults is a data grid that I thought I could write the data to and then maybe export to a CSV file. Can you give me a coding example of how to solve this problem?
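If the scraped rows end up in a `DataTable` (the kind of object a grid is usually bound to), exporting to CSV does not need the grid at all. A sketch under that assumption, with a hypothetical `CsvExport` class: fields containing commas, double quotes, or line breaks are quoted in the RFC 4180 style, and the file is opened in append mode so each run adds records to the same file, with the header line written only once.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

// Hypothetical helper: appends a DataTable to a CSV file.
public static class CsvExport
{
    // Quotes a field only when it contains a comma, a double quote or a
    // line break, doubling any embedded quotes.
    public static string Escape(string field)
    {
        if (field == null) return "";
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0)
            return "\"" + field.Replace("\"", "\"\"") + "\"";
        return field;
    }

    // Appends every row of the table to the file at path. The header line is
    // written only when the file is missing or empty, so repeated scrapes
    // keep adding records to the same file.
    public static void AppendTable(DataTable table, string path)
    {
        bool needHeader = !File.Exists(path) || new FileInfo(path).Length == 0;
        using (var writer = new StreamWriter(path, true)) // true = append
        {
            if (needHeader)
            {
                writer.WriteLine(string.Join(",",
                    table.Columns.Cast<DataColumn>().Select(c => Escape(c.ColumnName))));
            }
            foreach (DataRow row in table.Rows)
            {
                // Convert.ToString maps DBNull to an empty string.
                writer.WriteLine(string.Join(",",
                    row.ItemArray.Select(v => Escape(Convert.ToString(v)))));
            }
        }
    }
}
```

In the page, `path` would be something like `Server.MapPath("AlachuaCoFLCircuitCourt.csv")`, the file name from the original code.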
Does the information actually populate the grid? (I imagine not, but I might as well confirm.) If it does, I can get you from there to the file.
No, it does not populate the grid. That is my main problem. I have exported data from a grid to Excel before, but I am new to scraping a website for data only.
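The grid most likely stays empty because one string of HTML has no rows or columns for it to bind to. One way to give it something bindable, sketched under the assumption that the rows were already extracted as `string[]` arrays (for example by a regex pass over the `<tr>`/`<td>` markup) and that the first row holds the column headings; `CaseTableBuilder` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: turns scraped rows into a bindable DataTable.
public static class CaseTableBuilder
{
    // The first row supplies the column names; every later row becomes a
    // DataRow. Short rows are padded with "" and extra cells are ignored,
    // so ragged HTML tables do not throw.
    public static DataTable Build(IList<string[]> rows)
    {
        var table = new DataTable("Cases");
        if (rows.Count == 0) return table;

        foreach (string header in rows[0])
        {
            // Column names must be non-empty and unique within a DataTable.
            string name = string.IsNullOrEmpty(header)
                ? "Column" + (table.Columns.Count + 1)
                : header;
            while (table.Columns.Contains(name))
                name += "_" + (table.Columns.Count + 1);
            table.Columns.Add(name, typeof(string));
        }

        for (int i = 1; i < rows.Count; i++)
        {
            DataRow dataRow = table.NewRow();
            for (int c = 0; c < table.Columns.Count; c++)
                dataRow[c] = c < rows[i].Length ? rows[i][c] : "";
            table.Rows.Add(dataRow);
        }
        return table;
    }
}
```

In the page, the result would then be bound as usual with `gvResults.DataSource = table; gvResults.DataBind();`.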
I'll be able to give you some help on this, but it'll take some work since I'll have to parse the HTML. I'll post something to show you pretty soon.
ASKER CERTIFIED SOLUTION
BitRunner303

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I will add this code to my project tonight and let you know what happens.  Thank you very much for your help with this project.
No problem, let me know how it goes.
BitRunner303, the solution works, but the client has another request that I am not sure you can help me with. I need to save the actual scraper to an XML file so it can be used in another program. Do you have any idea how I would do this?

thanks,
MiracleByDesign
Simple.  Here's a tutorial on using the XML writer features in .NET: http://www.c-sharpcorner.com/UploadFile/mahesh/writexmlusingXmlWriter11132005233450PM/writexmlusingXmlWriter.aspx

You'll simply iterate through the rows of your final DataSet, writing out the elements of each record. I would probably do it something like the snippet below.

Basically, you'd write a root element for the file, which I've called CaseFile here. Then, for each case, write out a Case element containing all of its child elements. If you need more help with this, let me know, but this should get you going.
<CaseFile>
  <Case>
      <DefendantName>John Doe</DefendantName>
      <ProsecutingAttorney>Jack McCoy</ProsecutingAttorney>
  </Case>
</CaseFile>
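Along the lines of that snippet, a sketch using `XmlWriter`, assuming the final data is a `DataTable` whose column names match the element names (`DefendantName`, `ProsecutingAttorney`, and so on); the class and method names are hypothetical. Each row becomes a `<Case>` element under the `<CaseFile>` root, and `XmlConvert.EncodeLocalName` keeps column names with spaces from producing invalid element names.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

// Hypothetical helper: writes one <Case> per DataRow under a <CaseFile> root.
public static class CaseXmlExport
{
    public static void Write(DataTable table, TextWriter output)
    {
        var settings = new XmlWriterSettings { Indent = true };
        // The XmlWriter is disposed here; the TextWriter stays open
        // because CloseOutput defaults to false.
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("CaseFile");
            foreach (DataRow row in table.Rows)
            {
                writer.WriteStartElement("Case");
                foreach (DataColumn column in table.Columns)
                {
                    // One child element per column; the value text is escaped
                    // by WriteElementString, the name by EncodeLocalName
                    // (e.g. "Case Number" -> "Case_x0020_Number").
                    writer.WriteElementString(
                        XmlConvert.EncodeLocalName(column.ColumnName),
                        Convert.ToString(row[column]));
                }
                writer.WriteEndElement(); // </Case>
            }
            writer.WriteEndElement(); // </CaseFile>
            writer.WriteEndDocument();
        }
    }
}
```

To produce a file rather than a string, a `StreamWriter` opened on the target path can be passed in; `DataTable.WriteXml` is a built-in alternative when the exact element layout does not matter.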


Thank you very much!