curiouswebster
asked on
Is webpage scraping still "a thing"?
Is it normal to scrape a website, anymore?
Is it okay to do this? Is it okay for a person to log in to the "target site" and have their info scraped?
Or, is this no longer considered okay?
Is it even possible?
I saved the HTML to my disk, then opened it in a text editor. It looked pretty impossible to parse...
Thanks
It is, but it's pointless in many cases. Many sites use JavaScript and AJAX to load their content, so unless your 'scraper' can run JavaScript, you're not going to get the content.
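One way to tell which case a given page falls into is to download the HTML exactly as the server sends it and look for a value you can see in the browser. Below is a minimal Python sketch using only the standard library. The function names are made up for illustration, and the check is only a rough heuristic: a missing string suggests the content is loaded by JavaScript, but it does not prove it.

```python
from urllib.request import Request, urlopen


def fetch_static_html(url: str, timeout: float = 10.0) -> str:
    """Download the page as served, without running any JavaScript."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def needs_javascript(static_html: str, visible_text: str) -> bool:
    """True if text seen in the browser is missing from the served HTML."""
    return visible_text not in static_html
```

Usage would be something like needs_javascript(fetch_static_html(url), "$18.02"), with a URL and a visible value of your own. If it returns True, a plain HTTP download is not enough and the scraper needs a real or headless browser to render the page first.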
ASKER CERTIFIED SOLUTION
ASKER
@DaveBaldwin Isn't web scraping just parsing through the HTML? Regardless of where the content comes from, the numbers exist in the HTML. But they may be nearly impossible to extract. For example, these simple dollar values are buried.
@Mlanda T, your post is promising. I was afraid scraping was not permitted.
Are you parsing through HTML? How do you deal with the complexity of mining the HTML? Notice how the Rendered HTML DOES contain the values I seek. But, $18.02 and $10.94 are each contained by a <span> with the same class. Is this the kind of detail you need to iterate through? Am I missing something?
<div class="sc-c6d4adb7-0 cjFxcK"><li data-testid="focus-ring" class="sc-118qnmp-1 sc-8bevu2-1 jLYpnn lhwSdn"><div data-testid="listitem-container" class="sc-8bevu2-0 hrjzqn"><div aria-hidden="false" class="sc-8bevu2-3 gztdTv"><div class="sc-wee03o-0 egtzOT"><div class="sc-wee03o-0 jvjOa-D"><span class="sc-cx1xxi-0 elfRdE">4:14 PM</span><span class="sc-cx1xxi-0 csUZEd">7.46 miles • 22m 36s</span></div><div class="sc-wee03o-0 hLWqav"><div class="sc-wee03o-0 iscWNJ"><span class="sc-cx1xxi-0 elfRdE">$18.02</span>
<div data-testid="listitem-container" class="sc-8bevu2-0 hrjzqn"><div aria-hidden="false" class="sc-8bevu2-3 gztdTv"><div class="sc-wee03o-0 egtzOT"><div class="sc-wee03o-0 jvjOa-D"><span class="sc-cx1xxi-0 elfRdE">3:47 PM</span><span class="sc-cx1xxi-0 csUZEd">3.89 miles • 17m 45s</span></div><div class="sc-wee03o-0 hLWqav"><div class="sc-wee03o-0 iscWNJ"><span class="sc-cx1xxi-0 elfRdE">$10.94</span>
Thanks.
Screen-Shot-2023-01-14-at-8.49.53-AM.png
Screen-Shot-2023-01-14-at-8.53.30-AM.png
Screen-Shot-2023-01-14-at-9.05.22-AM.png
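For what it's worth, the shared span class need not be a blocker. Here is a rough Python sketch (standard library only) that ignores the class names, which look machine-generated, and instead starts a new row at each div whose data-testid is "listitem-container", then picks out the span text that looks like a dollar amount. The sample string is a trimmed stand-in for the markup above, the attribute value "listitem-container" is assumed, and the names TripParser and extract_trips are invented for illustration. It also assumes the first span in each row is the time, as in the two rows shown.

```python
import re
from html.parser import HTMLParser

# Trimmed stand-in for the posted markup (nested wrapper divs omitted).
SAMPLE = (
    '<li data-testid="focus-ring">'
    '<div data-testid="listitem-container" class="sc-8bevu2-0 hrjzqn">'
    '<span class="sc-cx1xxi-0 elfRdE">4:14 PM</span>'
    '<span class="sc-cx1xxi-0 csUZEd">7.46 miles • 22m 36s</span>'
    '<span class="sc-cx1xxi-0 elfRdE">$18.02</span>'
    '<div data-testid="listitem-container" class="sc-8bevu2-0 hrjzqn">'
    '<span class="sc-cx1xxi-0 elfRdE">3:47 PM</span>'
    '<span class="sc-cx1xxi-0 csUZEd">3.89 miles • 17m 45s</span>'
    '<span class="sc-cx1xxi-0 elfRdE">$10.94</span>'
)

MONEY = re.compile(r"^\$\d+(?:\.\d{2})?$")


class TripParser(HTMLParser):
    """Collects the text of every <span>, grouped per list-item container."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_span = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div" and attributes.get("data-testid") == "listitem-container":
            self.rows.append([])  # a new trip starts here
        elif tag == "span":
            self._in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_span and self.rows and text:
            self.rows[-1].append(text)


def extract_trips(html):
    """Return one dict per row that contains a dollar amount."""
    parser = TripParser()
    parser.feed(html)
    parser.close()
    trips = []
    for texts in parser.rows:
        fares = [t for t in texts if MONEY.match(t)]
        if fares:
            trips.append({"time": texts[0], "fare": float(fares[0][1:])})
    return trips


trips = extract_trips(SAMPLE)
```

The parser tolerates the unclosed tags in the fragment because it only reacts to start tags and span boundaries. Matching on the text pattern rather than the class means the two spans sharing "elfRdE" (time and fare) are still told apart.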
ASKER
Related question of mine...
https://www.experts-exchange.com/questions/29253057/Why-does-HtmlAgilityPack's-SelectNodes-return-null.html