paygo
asked on
HOW TO SCREEN SCRAPE IFRAME with ASP / ASP.NET / Can't get IFRAME Contents of a non local domain - Thanks
Here's what I've heard so far:
1) I can't get to the HTML contents of an IFRAME that points to a non-local URL such as src=http://www.yahoo.com.
2) I can't use ActiveXObject("Msxml2.XMLHTTP.4.0") because a) access is denied and b) a security risk message appears.
So, is there a way to screen scrape the contents of an IFRAME using ASP.NET?
Here's an example IFRAME:
<body>
<form id="form1">
<IFRAME src="http://www.google.com" name="myIframeId"></IFRAME>
</form>
</body>
Thanks
Two problems:
1) It's tricky to scrape a dynamic page that is embedded as an IFRAME, because the server-side request always fetches the original URL (i.e. www.google.com) and not whatever has been searched for inside the frame.
2) You need to have MSXML2.XMLHTTP installed on your server to use it. If it isn't, you can't use this approach.
But here is how you scrape a page:
'trap errors so that Err.Number can be checked below
On Error Resume Next
'create an instance of the ServerXMLHTTP component
Set objXML = Server.CreateObject("MSXML2.ServerXMLHTTP")
'build up the url and store it in strURL variable
strURL = "http://www.google.com"
'open a synchronous GET request for strURL
objXML.Open "GET", strURL, False, "", ""
'send the request
objXML.Send
'we have no errors
If Err.Number = 0 Then
    'and the url is valid
    If objXML.Status = 200 Then
        'store all of the downloaded data in strFileContent
        strFileContent = objXML.ResponseText
    Else
        'bad url display a message
        Response.Write "Incorrect URL"
    End If
Else
    'if we do have an error display the description of the error
    Response.Write Err.Description
End If
'clear up
Set objXML = Nothing
So, it sounds like we are on the same page, then?
FtB
ASKER
thx - I'm writing/testing some code now - will keep you posted shortly
ASKER
Hi, I'm not seeing 'ServerXMLHTTP' in IntelliSense under MSXML2.****
Do you know which version of MSXML*.dll contains this object?
I am not sure, but just for yucks, try my code with a page that you know will work (just about any public page will do) to see if you get a response.
ASKER
Do you have this code in C# Scripting somewhere?
Thanks
I am afraid that I don't. I use this in ASP classic with VBScript.
FtB
ASKER
Thanks Fritz,
I tried this in VB and received the following message.
'Let' and 'Set' assignment statements are no longer supported.
Is there a VB workaround?
ASKER
Thanks Fritz - you answered my question about this.
I'm going to post a second question regarding the C# scripting.
I am running ASP.NET and feel comfortable with C# from the object side.
Good time for me to learn about C# scripting - it can't be too much different from what I already know.
Regards
Steve
Best of luck. If I come across a C# example, I'll post it.
FtB
Sorry about the double post, FtB.
It took me a while to write the code, and then I realized that you had posted earlier.
:) SD
Not a problem--I'll take a look at yours to see what I can learn from it.
FtB
Function GetHTML(strURL)
    Dim objXMLHTTP, strReturn
    Set objXMLHTTP = Server.CreateObject("MSXML
    objXMLHTTP.Open "GET", strURL, False
    objXMLHTTP.Send
    strReturn = objXMLHTTP.responseText
    Set objXMLHTTP = Nothing
    GetHTML = strReturn
End Function
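For the C# version asked about earlier in the thread, here is a minimal sketch of the same server-side fetch. It assumes the .NET Framework's System.Net.WebClient class is available to the ASP.NET application. The class name PageScraper and the method name GetHtml are made up for illustration and simply mirror the VBScript GetHTML function above. As with the VBScript versions, this requests the URL from the server, so it returns the page at that address and not whatever the user has navigated to inside the IFRAME.

```csharp
using System;
using System.Net;

public static class PageScraper
{
    // Hypothetical helper mirroring the VBScript GetHTML function above.
    // Downloads the page at the given URL on the server and returns its HTML,
    // or null if the request fails.
    public static string GetHtml(string url)
    {
        using (WebClient client = new WebClient())
        {
            try
            {
                // Synchronous GET; the response body is returned as a string.
                return client.DownloadString(url);
            }
            catch (WebException)
            {
                // Raised for network failures and for error responses such as 404.
                return null;
            }
        }
    }
}
```

From a page's code-behind this could be called as string html = PageScraper.GetHtml("http://www.google.com"); and the result checked for null before it is used.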