omer d
asked on
web scraping using python
Hi,
I'm using Python Browser() to download HTML pages.
It works for most sites,
but it doesn't work for: http://www.hashulchan.co.il/?CategoryID=541&ArticleID=13120
I'm getting:
<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe src="/_Incapsula_Resource?CWUDNSAI=9&xinfo=0-273981-0 0NNN RT(1427217058600 4) q(0 -1 -1 -1) r(0 -1) B12(4,315,0)&incident_id=253000020000650957-2911433792487456&edet=12&cinfo=04000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 253000020000650957-2911433792487456</iframe></body></html>
How can I download the page? Is it some kind of protection?
Thanks.
ASKER CERTIFIED SOLUTION
You're seeing that because the actual content is in the 'iframe'. An 'iframe' is a way to include the content of another page in the current page. However, the browser, or in this case the crawler, has to make a separate request for the page in the iframe.
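As a rough illustration of that separate request, here is a minimal sketch that pulls the iframe's `src` out of the returned HTML and resolves it against the page URL, using only the standard library. The names `IframeCollector`, `iframe_urls` and `looks_like_incapsula_block` are made up for this example and are not part of any library. The second helper simply checks for the "Incapsula" markers visible in the posted response, on the assumption that they indicate a protection page rather than the article itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeCollector(HTMLParser):
    """Collects the src attribute of every <iframe> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return absolute URLs for all iframes found in html (hypothetical helper)."""
    collector = IframeCollector()
    collector.feed(html)
    # The src in the posted response is relative, so resolve it
    # against the URL of the page that contained it.
    return [urljoin(page_url, src) for src in collector.sources]


def looks_like_incapsula_block(html):
    """Assumption: these markers mean a protection page was served."""
    return "_Incapsula_Resource" in html or "Incapsula incident ID" in html
```

Each resolved URL can then be requested with the same client that fetched the outer page. If `looks_like_incapsula_block` returns True for the response, the iframe request may well return another challenge instead of the article, so checking the site's terms or asking the owner for access is probably the safer route.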
Hi omer_d,
were you able to solve the issue?
ASKER
Hi Walter,
No... :/
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.
I have recommended this question be closed as follows:
Split:
-- Walter Ritzel (https:#a40685422)
-- omer d (https:#a40685489)
If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they feel fit. If no one objects, this question will be closed automatically the way described above.
suhasbharadwaj
Experts-Exchange Cleanup Volunteer
ASKER
Thanks, I have no intention of scraping the whole site or harming it...
I'm using:
[code snippet not preserved]
and yet I sometimes get the result posted above, and sometimes:
[output not preserved]