Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

Extract links form provided webpage

Posted on 2016-07-19
5
38 Views
Last Modified: 2016-08-01
Hi All,
There is an requirement to download the PDF and html links from the given web-page. if html links contains more pdf links than those were required to download as well.
Is there any sample or reference perl script or any other script which can be run on linux machine.

Thanks,
Shail
0
Comment
Question by:Shailesh Shinde
  • 3
  • 2
5 Comments
 
LVL 12

Accepted Solution

by:
Benjamin Voglar earned 500 total points
ID: 41718268
if I understand this correctly. You like to download all PDF files from a site.

You can try with powershell.

$psPage = Invoke-WebRequest "http://www.powertheshell.com/cookbooks/"
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href

$urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}

Open in new window

0
 
LVL 3

Author Comment

by:Shailesh Shinde
ID: 41718749
Hi,
I tried running this script. However, getting below error on command prompt screen
links.ps1 cannot be loaded because the execution of scripts is disabled on this system.

Thanks,
Shail
0
 
LVL 12

Expert Comment

by:Benjamin Voglar
ID: 41718959
open powershell as Admin and enter:

 Set-ExecutionPolicy -ExecutionPolicy Unrestricted

then try the script again.
0
 
LVL 12

Expert Comment

by:Benjamin Voglar
ID: 41718961
Or You can use "Windows Powershell ISE"
0
 
LVL 3

Author Closing Comment

by:Shailesh Shinde
ID: 41738491
Thanks
0

Featured Post

How Do You Stack Up Against Your Peers?

With today’s modern enterprise so dependent on digital infrastructures, the impact of major incidents has increased dramatically. Grab the report now to gain insight into how your organization ranks against your peers and learn best-in-class strategies to resolve incidents.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

In this tutorial I will focus on how to use WhizBase as a tool for sending ICQ messages to ICQ. Here I will use a new technology in WhizBase, published in WhizBase 5.1 version. In this tutorial I will use 3 files, pager.wbsp for the processing, e…
I hope you'll find this tutorial useful and interesting. So let's try to extend Tcl with a new package.  For anyone more deeply interested please check out the book "Practical Programming in Tcl and Tk". It's really one of the best written books abo…
The viewer will learn how to dynamically set the form action using jQuery.
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…

808 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question