Extract links from a provided webpage

Posted on 2016-07-19
Medium Priority
Last Modified: 2016-08-01
Hi all,
There is a requirement to download the PDF and HTML links from a given web page. If the linked HTML pages contain further PDF links, those need to be downloaded as well.
Is there a sample or reference Perl script, or any other script, that can be run on a Linux machine?

Question by:Shailesh Shinde

Accepted Solution

Benjamin Voglar earned 2000 total points
ID: 41718268
If I understand this correctly, you would like to download all PDF files from a site.

You can try it with PowerShell.

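# Fetch the page and collect the href of every anchor that ends in .pdf
$psPage = Invoke-WebRequest "http://www.powertheshell.com/cookbooks/"
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | Where-Object {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href

# Download each PDF into the current directory, named after the last URL segment
$urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}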
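
For a Linux machine, as the question asks, the same idea can be sketched in Python using only the standard library. This is a rough sketch under some assumptions, not a tested crawler: it follows HTML links only one level deep, it recognises PDF and HTML links purely by file extension, and the names LinkCollector, extract_links, collect_pdf_urls and download are made up for this example.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all anchor targets in html as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def is_pdf(url):
    # Assumption: a PDF link is one whose path ends in .pdf (any case).
    return urlparse(url).path.lower().endswith(".pdf")


def is_html(url):
    # Assumption: an HTML link is one whose path ends in .html or .htm.
    return urlparse(url).path.lower().endswith((".html", ".htm"))


def fetch_text(url):
    """Download a page and decode it leniently as UTF-8."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_pdf_urls(start_url, fetch=fetch_text):
    """PDF links on start_url plus PDF links on the HTML pages it links to."""
    first_level = extract_links(fetch(start_url), start_url)
    pdfs = [u for u in first_level if is_pdf(u)]
    for page in (u for u in first_level if is_html(u)):
        pdfs.extend(u for u in extract_links(fetch(page), page) if is_pdf(u))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(pdfs))


def download(url, dest_dir="."):
    """Save url into dest_dir, named after the last path segment."""
    name = os.path.basename(urlparse(url).path)
    with urlopen(url) as resp, open(os.path.join(dest_dir, name), "wb") as out:
        out.write(resp.read())
```

To use it, call collect_pdf_urls with the page address and pass each returned URL to download. The fetch parameter exists only so the link logic can be tried against canned HTML without touching the network. Saving the HTML pages themselves, which the question also mentions, would need one more call to download for each HTML link.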



Author Comment

by:Shailesh Shinde
ID: 41718749
I tried running this script. However, I am getting the error below at the command prompt:
links.ps1 cannot be loaded because the execution of scripts is disabled on this system.


Expert Comment

by:Benjamin Voglar
ID: 41718959
Open PowerShell as Administrator and enter:

 Set-ExecutionPolicy -ExecutionPolicy Unrestricted

Then try the script again.

Expert Comment

by:Benjamin Voglar
ID: 41718961
Or you can use "Windows PowerShell ISE".

Author Closing Comment

by:Shailesh Shinde
ID: 41738491


Question has a verified solution.
