
C# autonomous web browsing

Last Modified: 2012-05-06
I would like to mine some data from a given page.

I want to be able to download a webpage and load some values into a text box:

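Example:
      // The method wrapper and its name are only for illustration.
      static string DownloadPage() {
         using (System.Net.WebClient client = new System.Net.WebClient()) {
            try {
               // WebClient needs an absolute URL, including the scheme.
               return client.DownloadString("http://www.google.ca");
            }
            catch (System.Net.WebException) {
               return null;   // download failed
            }
         }
      }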

The thing is, I want to be able to interact with the page once it downloads. In the case of Google, there is a main text box on the page. I want to programmatically enter something into the text box, resubmit the page, and then download the resulting page using the same method as above.

The point is that I want to write a series of operations (go to google.ca, search for cats, choose the 3rd result) all programmatically. Almost like a crawler, except one that can interact with the page.

Any ideas on this would be very appreciated.

It sounds like you are looking for a macro bot, since you want it to interact with the interface.

There is a PHP class called Snoopy that emulates web browsing; you could look at it and maybe get some ideas: http://sourceforge.net/projects/snoopy/

Author

Commented:
That's cool. Any C# examples? Something more .NET-ish?
Here is a C# browser wrapper class that lets you build your own browser, which means you can control what is being displayed and returned.

http://www.codeproject.com/KB/miscctrl/csEXWB.aspx

Technical SEO Consultant
Commented:
I put this together once to test some websites. If you call it with post set to true, it returns the result of posting a form to the URL with the specified field values. Otherwise it does a GET with the fields appended to the URL.
   public string RequestText(string url, NameValueCollection fields, bool post)
    {
        if (post)
        {
            WebRequest request = WebRequest.Create(url);
 
            request.Method = "POST";
 
            request.ContentType = "application/x-www-form-urlencoded";     // content type
 
            // Write the URL-encoded form fields to the request body.
            using (StreamWriter writer = new StreamWriter(request.GetRequestStream()))
            {
                bool first = true;
                foreach (string key in fields.Keys)
                {
                    if (first)
                        first = false;
                    else
                        writer.Write("&");
 
                    writer.Write(Server.UrlEncode(key) + "=" + Server.UrlEncode(fields[key]));
                }
            }
 
            // Read the whole response body, disposing the response and reader afterwards.
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        else
        {
            string fullUrl = url;
 
            if (fields!=null && fields.Count > 0)
            {
                if (!fullUrl.Contains("?"))
                    fullUrl += "?";
 
                foreach (string key in fields.Keys)
                {
                    // Add "&" only between pairs, not directly after the "?".
                    if (!fullUrl.EndsWith("?") && !fullUrl.EndsWith("&"))
                        fullUrl += "&";
 
                    fullUrl += Server.UrlEncode(key) + "=" + Server.UrlEncode(fields[key]);
                }
            }
            return RequestText(fullUrl);
        }
    }
    public string RequestText(string url)
    {
        return this.WebClient.DownloadString(url);
    }
    private WebClient _WebClient = new WebClient();
 
    public WebClient WebClient
    {
        get { return _WebClient; }
        set { _WebClient = value; }
    }
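
For comparison, here is a shorter sketch of the POST case using WebClient.UploadValues, which sends a NameValueCollection as a form post (POST, application/x-www-form-urlencoded) without writing to the request stream by hand. This is a swapped-in alternative, not the code above: the class name FormPoster and the method PostForm are made up for illustration, and decoding the response as UTF-8 is an assumption that will not hold for every site.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;

public static class FormPoster
{
    // Posts the fields as a form and returns the response body as text.
    // Assumption: the server replies in UTF-8.
    public static string PostForm(string url, NameValueCollection fields)
    {
        using (WebClient client = new WebClient())
        {
            // UploadValues defaults to POST with a URL-encoded form body.
            byte[] responseBytes = client.UploadValues(url, fields);
            return Encoding.UTF8.GetString(responseBytes);
        }
    }
}
```

A call would look like FormPoster.PostForm(url, fields), with fields holding one entry per form field, for example fields.Add("q", "cats").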


Author

Commented:
Silly question, but is your solution dependent on whether the form's method is GET or POST? I would like this solution to work with any webpage.

I'm guessing the NameValueCollection holds the id of each field and the contents of the field, right? I also just noticed you are putting the values into the URL. What if the form method is POST, like on ASP.NET pages? Will that still work?

Ghost
Tony McCreath, Technical SEO Consultant

Commented:
The solution does a GET or a POST depending on whether you set post to true.

If it posts, the data is written to the request body, while a non-post (GET) adds the data to the URL.
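
To make the difference concrete, here is a rough sketch of the shared encoding step. The same key=value&key=value string is either appended to the URL (GET) or sent as the request body (POST). The names FormEncoding, Encode and BuildGetUrl are hypothetical, and it uses Uri.EscapeDataString instead of Server.UrlEncode so that it runs outside ASP.NET, which means spaces come out as %20 rather than +.

```csharp
using System;
using System.Collections.Specialized;
using System.Text;

public static class FormEncoding
{
    // Joins the fields as key=value pairs separated by '&', escaping both parts.
    // For a POST this string is the request body.
    public static string Encode(NameValueCollection fields)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string key in fields.AllKeys)
        {
            if (sb.Length > 0)
                sb.Append('&');
            sb.Append(Uri.EscapeDataString(key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(fields[key] ?? string.Empty));
        }
        return sb.ToString();
    }

    // For a GET the same string is appended to the URL as its query string.
    public static string BuildGetUrl(string url, NameValueCollection fields)
    {
        string query = Encode(fields);
        if (query.Length == 0)
            return url;

        string separator;
        if (!url.Contains("?"))
            separator = "?";   // no query string yet
        else if (url.EndsWith("?") || url.EndsWith("&"))
            separator = "";    // URL is already ready for another pair
        else
            separator = "&";   // extend an existing query string
        return url + separator + query;
    }
}
```

With fields q = "black cats" and hl = "en", Encode gives q=black%20cats&hl=en, and BuildGetUrl puts that after a "?" on the URL.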

The NameValueCollection works as you describe: each key identifies a field and each value is that field's contents. (Is it the 'id' or the 'name' attribute that gets posted?)
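
On the id versus name point: in a standard HTML form submission it is the name attribute that is sent, and inputs without a name are left out. As a rough sketch, the fields of a downloaded page could be collected like this before filling them in and posting them back. The class FormFields and the method ReadInputs are hypothetical, and using regular expressions on HTML is a shortcut that only copes with simple markup and quoted attribute values; a real HTML parser would be more robust.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text.RegularExpressions;

public static class FormFields
{
    // Matches each <input ...> tag in the page.
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the value of a single- or double-quoted attribute, or null if absent.
    private static string Attr(string tag, string attribute)
    {
        Match m = Regex.Match(
            tag,
            @"\s" + attribute + @"\s*=\s*(?:""([^""]*)""|'([^']*)')",
            RegexOptions.IgnoreCase);
        if (!m.Success)
            return null;
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }

    // Collects name/value pairs from the <input> tags, keyed by the name attribute.
    public static NameValueCollection ReadInputs(string html)
    {
        NameValueCollection fields = new NameValueCollection();
        foreach (Match tag in InputTag.Matches(html))
        {
            string name = Attr(tag.Value, "name");
            if (string.IsNullOrEmpty(name))
                continue; // inputs without a name are not submitted

            // Attribute values are HTML-encoded in the page source.
            string value = WebUtility.HtmlDecode(Attr(tag.Value, "value"));
            fields[name] = value ?? string.Empty;
        }
        return fields;
    }
}
```

This also picks up hidden inputs, which matters for pages that expect their hidden fields to be posted back along with the visible ones.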



Author

Commented:
I see. Very cool.

I'm going to give it a shot later in the week. I'll get back to you.