1) Why you're doing this. In other words, how the content pulled from the external site will be used.
2) Type of external site. If the external site builds its content with Javascript, then you must use a Javascript-aware scraper.
3) Whether this site provides an API to extract data. If not, most sites state in their TOS (terms of service) that scraping is disallowed, and sites that get scraped repeatedly almost surely have countermeasures built in to track and block (sometimes forever) any IP scraping them.
4) Another option is to embed the content in an <iframe>, in which case browser security (the same-origin policy) will block you from modifying any of the content.
Answer these 3 questions, then provide the URL of the page to be scraped, and you'll likely get many suggestions.
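For question 2, one rough way to tell whether you need a Javascript-aware scraper is to check whether the text you want is already in the raw HTML the server sends. Below is a minimal sketch in Python using only the standard library. The helper names (static_text, probably_needs_js) and the two sample pages are made up for illustration, and the check is only a heuristic, not a definitive test.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that a plain (non-JS) scraper would see.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Return the text present in the raw HTML, ignoring scripts and styles."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def probably_needs_js(html, expected_text):
    """Heuristic (hypothetical helper): True if the text you expect to scrape
    is missing from the raw HTML, which suggests Javascript fills it in later."""
    return expected_text not in static_text(html)


# Made-up sample pages: one serves the content directly, one builds it with JS.
STATIC_PAGE = "<html><body><h1>Price list</h1><p>Widget: $5</p></body></html>"
JS_PAGE = (
    '<html><body><div id="app"></div>'
    '<script>document.getElementById("app").textContent = "Widget: $5";</script>'
    "</body></html>"
)

if __name__ == "__main__":
    print(probably_needs_js(STATIC_PAGE, "Widget: $5"))
    print(probably_needs_js(JS_PAGE, "Widget: $5"))
```

In practice you would pass in the HTML you downloaded from the real page, plus a snippet of text you can see in the browser. If the check says the text is missing, a plain HTTP fetch likely won't be enough and a headless-browser style scraper is the safer bet.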
LeighWardle
ASKER
Thanks, Kimputer and David.
That's got me started.
Regards,
Leigh
David Favor
You're welcome!
Good luck!
If you'd like additional comments, answer the questions above.