

Scraping a web page using System.Net.HttpWebRequest causes a 403 error

Posted on 2012-09-01
Last Modified: 2012-09-18
Hi, I am trying to scrape a web page, but when I run it, it fails at
 objResponse = objRequest.GetResponse();
with
The remote server returned an error: (403) Forbidden.
Any help? I have been reading up but I'm not sure what to do here. Maybe I need a user agent, but I'm not sure.

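using System;
using System.IO;
using System.Net;

private void button2_Click(object sender, EventArgs e)
        {
            string url = "http://www.clusty.com/search?input-form=clusty-simple&v%3Asources=webplus-ns-uf&v%3Aproject=clusty-original&query=dogs";
            string strResult = "";

            // UserAgent is an instance property of HttpWebRequest, not a static member of WebRequest,
            // so the request has to be cast to HttpWebRequest before the header can be set.
            HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create(url);
            // objRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5";

            // GetResponse() throws a WebException here when the server answers 403 Forbidden.
            using (WebResponse objResponse = objRequest.GetResponse())
            using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
            {
                // Leaving the using blocks closes the reader and the response; no explicit Close() is needed.
                strResult = sr.ReadToEnd();
            }

            // Display results (Response.Write only exists in ASP.NET pages, not in a WinForms click handler)
            // Response.Write(strResult);
        }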
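When GetResponse() fails with a 403, the WebException it throws still carries the server's reply in its Response property, and the body of that reply sometimes says why the request was refused. The sketch below shows one way to read it. It is illustrative only: the class name ForbiddenDiagnostics, its method names and the 15-second timeout are hypothetical choices, not part of the original post or of any library.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper for inspecting HTTP error replies; not from the original post.
public static class ForbiddenDiagnostics
{
    // Builds the request without sending it. Passing null for userAgent leaves
    // the User-Agent header out, which is what the original code does.
    public static HttpWebRequest CreateRequest(string url, string userAgent)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        request.Timeout = 15000; // milliseconds; an arbitrary value for this sketch
        return request;
    }

    // Sends the request and returns the status code and body. For an HTTP error
    // status such as 403, GetResponse() throws a WebException whose Response
    // property still holds the server's reply, so that reply is read instead.
    public static (int Status, string Body) Fetch(string url, string userAgent)
    {
        HttpWebRequest request = CreateRequest(url, userAgent);
        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                return ((int)response.StatusCode, reader.ReadToEnd());
            }
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse errorResponse)
        {
            // Network-level failures (DNS, timeouts) have no Response and are not caught here.
            using (errorResponse)
            using (StreamReader reader = new StreamReader(errorResponse.GetResponseStream()))
            {
                return ((int)errorResponse.StatusCode, reader.ReadToEnd());
            }
        }
    }
}
```

Calling Fetch once with null and once with a browser User-Agent string for the same URL would show whether that header alone changes the server's answer.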
Question by:sydneyguy

Assisted Solution

by:Miguel Oz
Miguel Oz earned 500 total points
ID: 38357126

Author Comment

by:sydneyguy
ID: 38357147
It's not a problem with parsing the HTML; it's the fact that the website's data cannot be retrieved at all and a 403 is thrown.
It works for Google but not for Clusty. Is this the same problem, or are we looking at two different problems here?

Assisted Solution

by:selvol
selvol earned 500 total points
ID: 38357264
Maybe I missed it?

What is your Referrer?

Try it as

http://www.clusty.com/search?input-form=clusty-simple&v%3Asources=webplus-ns-uf&v%3Aproject=clusty-original&query=dogs


Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 500 total points
ID: 38357312
It's entirely possible that the host uses cookies or Javascript to ensure that bots (automated  programs) don't access the pages. I would first ensure that you are not violating the site's terms of service.  Then I would suggest using a tool like Fiddler to examine the requests your browser sends to see if such behavior is occurring.
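One way to act on the Fiddler suggestion is to send the program's own traffic through Fiddler as well, so its requests appear next to the browser's and the headers (User-Agent, Accept, Cookie, Referer) can be compared side by side. A minimal sketch, assuming Fiddler is running on the same machine and listening on its default port 8888; the class and method names are hypothetical:

```csharp
using System;
using System.Net;

// Hypothetical helper: routes a request through a locally running Fiddler instance.
public static class FiddlerCapture
{
    // Assumption: Fiddler listens on 127.0.0.1:8888 (its default); adjust if changed.
    public const string FiddlerHost = "127.0.0.1";
    public const int FiddlerPort = 8888;

    // The returned request goes through Fiddler when GetResponse() is called,
    // so it shows up in Fiddler's session list alongside the browser's requests.
    public static HttpWebRequest CreateCapturedRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Proxy = new WebProxy(FiddlerHost, FiddlerPort);
        return request;
    }
}
```

Comparing the captured request with the browser's request for the same URL may reveal which header, cookie or redirect the site is checking for.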

Expert Comment

by:selvol
ID: 38357733
kaufmed suggested using Fiddler. I'd listen to him for sure.

I had actually run Fiddler briefly against Clusty.com earlier,
but I did not look deep enough to conclude anything.

Again, kaufmed mentioned cookies.

I had noticed your script did not contain a cookie jar, a referrer, or redirect-following instructions for your bot.

I do quite a bit of scraping. I'll admit not in C#,
but the process is the same.

You will find that some sites require a referrer from that site.
Some require a cookie.
Some require a redirect.
Some require 3 redirects, 2 cookies and a referrer.
Some require none.

Selvol
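The three ingredients mentioned above map onto HttpWebRequest properties: a shared CookieContainer acts as the cookie jar, Referer sets the referrer header, and AllowAutoRedirect / MaximumAutomaticRedirections control redirect following. This is a sketch under assumptions: BrowserLikeSession is a hypothetical class, the redirect cap of 10 is arbitrary, and the User-Agent string is simply the one from the original question. Which of these a given site actually needs still has to be found by comparing against the browser's traffic.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical "browser-like" session: one cookie jar shared by every request.
public class BrowserLikeSession
{
    // Assumption: the User-Agent string from the original question; any browser's string would do.
    private const string UserAgent =
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5";

    private readonly CookieContainer cookies = new CookieContainer();

    // The cookie jar; cookies stored from earlier responses are sent with later requests.
    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    public HttpWebRequest CreateRequest(string url, string referer)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;          // cookie jar shared across requests
        if (referer != null)
        {
            request.Referer = referer;              // e.g. the site's own search form page
        }
        request.AllowAutoRedirect = true;           // follow 3xx redirects (also the default)
        request.MaximumAutomaticRedirections = 10;  // arbitrary cap for this sketch
        request.UserAgent = UserAgent;
        return request;
    }

    // Downloads a page as text using the shared cookie jar.
    public string Get(string url, string referer)
    {
        HttpWebRequest request = CreateRequest(url, referer);
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A typical pattern would be to Get the site's home page first, picking up any cookies it sets, and then request the search URL with the home page as the referrer.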

Accepted Solution

by:
eguilherme earned 500 total points
ID: 38360633
Are you behind a proxy/firewall? That 403 error may mean that you are required to send an authorization header that you did not send; it could be basic authentication or another scheme. Either way, to check it, log off and back on to the machine, start Fiddler, and navigate to the website. If you are behind an HTTP proxy, you should see not only the request to the site but also a request to the proxy.
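If a proxy turns out to be the cause, a common approach with HttpWebRequest is to use the system proxy and authenticate to it with the logged-on Windows user's credentials. Note that a proxy demanding authentication normally answers 407 (Proxy Authentication Required) rather than 403, so treat this as one thing to try rather than a guaranteed fix. The class name below is hypothetical:

```csharp
using System;
using System.Net;

// Hypothetical helper: send requests through the system proxy using the
// logged-on Windows user's credentials.
public static class ProxyAwareRequest
{
    public static HttpWebRequest Create(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // The proxy configured for the machine/user (on Windows, the Internet Options settings).
        IWebProxy proxy = WebRequest.GetSystemWebProxy();

        // Authenticate to the proxy as the current Windows user (e.g. NTLM/Kerberos)
        // instead of sending no proxy credentials at all.
        proxy.Credentials = CredentialCache.DefaultCredentials;
        request.Proxy = proxy;
        return request;
    }
}
```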

Author Closing Comment

by:sydneyguy
ID: 38411707
Thanks for all your help.