  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 58

Extract links from a provided webpage

Hi All,
There is a requirement to download the PDF and HTML links from a given web page. If the linked HTML pages contain more PDF links, those need to be downloaded as well.
Is there a sample or reference Perl script, or any other script, that can be run on a Linux machine?

Thanks,
Shail
Asked: Shailesh Shinde
1 Solution
 
Benjamin Voglar (IT Pro) commented:
If I understand this correctly, you would like to download all PDF files from a site.

You can try it with PowerShell.

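# Fetch the page (ParsedHtml is only available in Windows PowerShell, where it uses the Internet Explorer engine)
$psPage = Invoke-WebRequest "http://www.powertheshell.com/cookbooks/"
# Collect the href of every anchor whose target ends in .pdf
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | Where-Object {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href

# Download each PDF into the current directory, named after the last segment of its URL
$urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}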
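
Since the question asks for something that runs on Linux and also wants PDFs from the linked HTML pages, here is a minimal Python 3 sketch of the same idea using only the standard library. It is not from the thread: the names `LinkCollector`, `split_links`, `fetch_text`, `collect_pdfs` and `download_all` are hypothetical, it assumes that "HTML links" means hrefs ending in `.html` or `.htm`, and it follows those links only one level deep.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve
import os


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative hrefs into absolute URLs
                self.links.append(urljoin(self.base_url, href))


def split_links(html, base_url):
    """Return (pdf_links, html_links) found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    pdfs, pages = [], []
    for link in collector.links:
        path = urlparse(link).path.lower()
        if path.endswith(".pdf"):
            pdfs.append(link)
        elif path.endswith((".html", ".htm")):  # assumption: suffix marks an HTML page
            pages.append(link)
    return pdfs, pages


def fetch_text(url):
    """Download a page and decode it as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_pdfs(start_url, fetch=fetch_text):
    """PDF links on start_url plus those on the HTML pages it links to (one level deep)."""
    pdfs, pages = split_links(fetch(start_url), start_url)
    for page in pages:
        more, _ = split_links(fetch(page), page)
        pdfs.extend(more)
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(pdfs))


def download_all(urls, dest="."):
    """Save each URL into dest, named after the last segment of its path."""
    for url in urls:
        name = os.path.basename(urlparse(url).path)
        urlretrieve(url, os.path.join(dest, name))
```

Calling `download_all(collect_pdfs("http://www.powertheshell.com/cookbooks/"))` would then save the PDFs into the current directory. The `fetch` parameter is there so the link logic can be tried against canned HTML before touching the network.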

 
Shailesh Shinde (Localization Engineering & Automation, Author) commented:
Hi,
I tried running this script, but I get the error below at the command prompt:
links.ps1 cannot be loaded because the execution of scripts is disabled on this system.

Thanks,
Shail
 
Benjamin Voglar (IT Pro) commented:
Open PowerShell as Administrator and enter:

 Set-ExecutionPolicy -ExecutionPolicy Unrestricted

Then try the script again.
 
Benjamin Voglar (IT Pro) commented:
Or you can use "Windows PowerShell ISE".
 
Shailesh Shinde (Localization Engineering & Automation, Author) commented:
Thanks
Question has a verified solution.