Solved

Simulating javascript and cookies enabled when using the WebRequest object

Posted on 2008-10-01
3,434 Views
Last Modified: 2013-12-17
I am using WebRequest in a Windows form to return (scrape) the HTML of a web page.

The page I am connecting to with WebRequest has a check (onload, I presume) to see if javascript is enabled. If it cannot verify that javascript is enabled, it redirects to a different page, so what I get back is the HTML of the "Your javascript is not enabled" page.  If I visit the page using my javascript-enabled browser, it looks fine.

How do I simulate that javascript is enabled using the WebRequest object?
Question by:GaryRasmussen
13 Comments
 
LVL 29

Expert Comment

by:QPR
ID: 22621688
Don't quote me, but I can't think of a way around this short of forging headers, and even then the javascript check may be some sort of challenge.
Perform this javascript function and respond to me.... no response, you can't have javascript enabled. Redirect.
0
 
LVL 1

Author Comment

by:GaryRasmussen
ID: 22621929
They use a meta tag within a <noscript> tag like ...
<noscript><meta http-equiv="Refresh" content="0; URL=https://login.live.com/jsDisabled.srf?lc=1033"/></noscript>

So any scripting the page tries to do will automatically redirect you if the script cannot run.  There are so many web page scraping programs available; surely somebody has figured out a way around this type of obstacle?
0
 
LVL 1

Author Comment

by:GaryRasmussen
ID: 22621948
Oh, and not to be rude, but please don't respond just to say you don't know, because that makes other people looking at the posts think somebody has already answered the question, so they don't bother opening the thread.  I know that sounds really rude, but I have seen this happen on this website a lot.

Please understand that I hardly ever get a solution to my problems on this site, probably because I don't know the correct zones to post to, but I am desperate on this one and am running out of time.

Thanks,
0

 
LVL 29

Expert Comment

by:QPR
ID: 22622019
Well then you don't understand Experts Exchange very well.
Most of us that come here do so to help. We have unlimited points so they are of little importance.
If a question is marked as "awaiting answer" then most experts will read regardless of whether anybody else has posted a potential solution.

You may or may not know that putting a redirect within a <noscript> tag is a common way to deter screen scraping and auto site downloading.
If you or another do find a way to bypass this I'd be happy to read about it. Not that I'm saying there isn't a way.
0
 
LVL 1

Author Comment

by:GaryRasmussen
ID: 22625686
Thanks QPR,

Ok, well it just seemed like when I or others would post a question and one person would post an incorrect answer, the question would go unanswered. I figured that the experts must have assumed the question was answered or being worked on by somebody else, because they never had any input after the initial reply.

Yes, I know that the noscript tag is designed to do exactly what they are using it for.  Whether they are using it because they truly do require javascript or to deter scraping is anyone's guess.  Why all the hubbub about scraping?  I mean, I have a login and username, so I have been given access to the site so that I can access my information.  I know I wouldn't care if someone scraped their information from one of my websites as opposed to viewing it in a browser.

Does anybody know how to simulate having javascript for web pages that require javascript when using a WebRequest?  There are 3rd party screen scrapers that work.  Anybody know how they do it?

Thanks!
0
 
LVL 3

Expert Comment

by:tasky
ID: 22626594
Gary,

I really don't see the issue. WebRequest doesn't actually PARSE any HTML, so you won't be redirected. However, you may have to manually recreate what actions a browser takes, like loading an external Javascript and parsing out values to redirect yourself to another location. But, when the WebRequest makes the request, it does not tell the server whether Javascript is installed. You will have to manually follow the logic for that site (I recommend either IEWatch or Wireshark to log packets).
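
To illustrate the point that a plain HTTP client only receives markup and never acts on it, here is a minimal sketch. It is written in Python rather than the .NET code discussed in this thread, and the class and function names (NoscriptRefreshFinder, find_noscript_refresh) are made up for the example. It scans already-downloaded HTML and reports any meta refresh that sits inside a <noscript> block, which is what a script-less browser would follow but a raw request would simply ignore.

```python
from html.parser import HTMLParser


class NoscriptRefreshFinder(HTMLParser):
    """Collects <meta http-equiv="refresh"> targets found inside <noscript>."""

    def __init__(self):
        super().__init__()
        self.noscript_depth = 0
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.noscript_depth += 1
        elif tag == "meta" and self.noscript_depth > 0:
            attr = dict(attrs)
            if (attr.get("http-equiv") or "").lower() == "refresh":
                # content looks like "0; URL=https://host/page"
                _, _, rest = (attr.get("content") or "").partition(";")
                key, _, url = rest.strip().partition("=")
                if key.strip().lower() == "url":
                    self.targets.append(url.strip())

    def handle_endtag(self, tag):
        if tag == "noscript" and self.noscript_depth > 0:
            self.noscript_depth -= 1


def find_noscript_refresh(html):
    """Return the refresh URLs a script-less browser would be sent to."""
    finder = NoscriptRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.targets
```

If the HTML you downloaded still contains the <noscript> block, you received the real page; the refresh inside it is inert text unless your own code chooses to follow it.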

If you have any questions, I'll be happy to respond. I have coded a LOT of scrapers in my time.
0
 
LVL 1

Author Comment

by:GaryRasmussen
ID: 22626884
If you use WebRequest to create a connection to a web page that has some javascript embedded in it like maybe, location.href="page.htm"

The WebRequest doesn't do anything with javascript (and it shouldn't), so the page will run whatever it has between its <noscript> tags, which is to redirect the WebRequest to a different page that is not the page I want to scrape.

<noscript>
<meta http-equiv="Refresh" content="0; URL=https://login.live.com/jsDisabled.srf?lc=1033"/>
</noscript>

I don't know how to explain the issue any clearer.  Can you ask me whatever it is you need to help me explain the issue better?   I really want to figure this one out and could really use your help. If you have coded a lot of scrapers, this is probably a no-brainer for you?

Thanks!

0
 
LVL 3

Expert Comment

by:tasky
ID: 22626939
But that's the thing: the WebRequest doesn't execute ANYTHING. It just grabs the source. What you're going to have to do is parse out the location.href="" and manually grab that page next. But, if you are using a WebRequest object, it's important to know that it does not TOUCH the HTML at all. It merely grabs it.
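
The "parse out the location.href and grab that page next" step could look roughly like the sketch below. It is Python, not the .NET code from this thread, and the function name next_url_from_script and the example.invalid URL in the test are illustrative assumptions. It only handles a literal string assigned to location or location.href; a page that builds the URL dynamically would need its logic followed by hand.

```python
import re
from urllib.parse import urljoin

# Matches simple assignments such as: location.href = "page.htm"
_LOCATION_RE = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*(["'])(?P<url>.*?)\1"""
)


def next_url_from_script(page_url, html):
    """Return the absolute URL a script redirect points to, or None."""
    match = _LOCATION_RE.search(html)
    if match is None:
        return None
    # Resolve relative targets like "page.htm" against the page we fetched.
    return urljoin(page_url, match.group("url"))
```

The returned URL is what you would request next, using the same cookie container as the first request.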
0
 
LVL 1

Author Comment

by:GaryRasmussen
ID: 22627059
In order to stream the page content of the page I want back to my application, I have to connect or "grab" the page I want first.  When I "grab" the page, the page notices that javascript cannot be run so it redirects the request to some other page.

So I parse out the javascript from the page content of the page I am getting redirected to and I see what the url is that they tried to send me to.  What does that do for me?  If I go there,  I am still going to be redirected back to the "Your browser does not support javascript" page.

0
 
LVL 3

Expert Comment

by:tasky
ID: 22627100
No. The page does not "notice" that Javascript can't be run.  WebRequest does not execute anything. Ever. All WebRequest does is grab the source code to the page. The server does not know if the client has Javascript or not. That is up to the browser to decide. However, the HTML is the same nonetheless. If you are getting redirected, it's because of another issue (no cookies, etc) from the server using the "Location: x" header.
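
One way to check whether the redirect really comes from the server is to stop the client from following redirects and look at the status code and Location header yourself. The sketch below does that in Python (not the .NET code from this thread; in .NET the rough equivalent would be turning off automatic redirects on the request). The names NoRedirect, redirect_target and probe are made up for the example.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stops urllib from following 3xx responses so they can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # do not follow; urllib then raises HTTPError instead


def redirect_target(status, headers):
    """Return the Location value for a 3xx response, else None."""
    if 300 <= status < 400:
        return headers.get("Location")
    return None


def probe(url):
    """Fetch url once and report (status, redirect target or None)."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status, redirect_target(resp.status, resp.headers)
    except urllib.error.HTTPError as err:
        # Unfollowed redirects and other error statuses both arrive here.
        return err.code, redirect_target(err.code, err.headers)
```

If probe reports a 3xx status with a Location pointing at the jsDisabled page, the server is redirecting on its own (for example because a cookie is missing), and the <noscript> tag has nothing to do with it.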
0
 
LVL 3

Expert Comment

by:tasky
ID: 22627126
Now, if you are trying to pass the HTML from the WebRequest into a browser on a form, you're going to have a lot of issues. First of all, Cookies will not be shared. So any site that uses state-based logic (i.e. to log in) will be broken with this approach.
0
 
LVL 3

Expert Comment

by:tasky
ID: 22627181
If you are NOT using a browser on the form, and are trying to make subsequent requests to the same site, preserving cookies is rather easy. You need to create an instance of a CookieContainer object and assign it to the WebRequest's CookieContainer. From there, it will preserve and update cookies automatically. Take the following code example:

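' Requires: Imports System.Net
' One CookieContainer shared by every request keeps the session cookies.
Dim jar As New CookieContainer()
Dim req As HttpWebRequest = DirectCast(WebRequest.Create("https://login.live.com/"), HttpWebRequest)
req.CookieContainer = jar
Dim resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
resp.Close() ' release the connection before the next request

' The second request sends back whatever cookies the first response set.
req = DirectCast(WebRequest.Create("https://login.live.com/doSubmit"), HttpWebRequest)
req.CookieContainer = jar
req.Method = "POST"
' etc etc
resp = DirectCast(req.GetResponse(), HttpWebResponse)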
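
For comparison, the same shared-cookie-jar idea can be sketched in Python with only the standard library. This is not the .NET API used above; the helper names make_session and fetch are assumptions made for the example, and the form fields a real login expects are not known here.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Build an opener whose requests all share one cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url, data=None):
    """GET url (or POST when data is bytes) and return (final URL, body text)."""
    with opener.open(url, data=data, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        # geturl() differs from url when the server redirected the request.
        return resp.geturl(), body
```

Calling fetch twice with the same opener sends the cookies set by the first response along with the second request. Comparing the returned final URL with the one you asked for is a quick way to see whether you were redirected.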

0
 
LVL 1

Accepted Solution

by:
GaryRasmussen earned 0 total points
ID: 22627263
hmm, well, maybe I am going about this the wrong way?

I am not trying to pass the html into a browser.  I just wanted to save the page content to a string so I could parse the string to find a particular value with the following:

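// Requires: using System.IO; using System.Net;
WebRequest myReq = WebRequest.Create(url);
using (WebResponse webResponse = myReq.GetResponse())
using (StreamReader webResponseStream = new StreamReader(webResponse.GetResponseStream()))
{
    // Read the whole response body into a string for later parsing.
    string strContent = webResponseStream.ReadToEnd();
    return strContent;
}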

But when I do, the page detects that javascript is not enabled with this ...

<noscript>
<meta http-equiv="Refresh" content="0; URL=https://login.live.com/jsDisabled.srf?lc=1033"/>
</noscript>

And my string contains the html from https://login.live.com/jsDisabled.srf?lc=1033

This code works exactly as expected on web pages that don't redirect with a noscript condition.
0
