
Extract links from a provided web page

Posted on 2016-07-19
Last Modified: 2016-08-01
Hi all,
I need to download the PDF and HTML links from a given web page. If those HTML pages contain further PDF links, those PDFs need to be downloaded as well.
Is there a sample or reference Perl script (or any other script) that can be run on a Linux machine?

Thanks,
Shail
Question by:Shailesh Shinde
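Since the question asks for something that runs on Linux, here is a minimal Python 3 sketch of the requirement (PDF links on the page, plus PDF links found inside the HTML pages it links to) using only the standard library. Several assumptions are not from the thread: a link counts as an HTML page if its path ends in `.html`/`.htm` or has no extension, only HTML pages on the same host are followed, the crawl stops after one level, and pages are decoded as UTF-8. The names `collect_pdfs`, `extract_links`, `classify` and `fetch` are illustrative, not an existing tool.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def extract_links(html, base_url):
    """Absolute http(s) URLs for every <a href>, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(url).scheme in ("http", "https"):  # drops mailto:, javascript:, ...
            links.append(url)
    return links


def classify(url):
    """'pdf', 'html' or 'other', judged only by the URL path (an assumption)."""
    path = urlparse(url).path.lower()
    last_segment = path.rsplit("/", 1)[-1]
    if path.endswith(".pdf"):
        return "pdf"
    if last_segment.endswith((".html", ".htm")) or "." not in last_segment:
        return "html"
    return "other"


def fetch(url):
    """Download a page and decode it as UTF-8 (assumed), replacing bad bytes."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_pdfs(start_url, fetcher=fetch):
    """PDF URLs linked from start_url or from same-host HTML pages it links to."""
    host = urlparse(start_url).netloc
    visited = {start_url}
    pdfs = set()
    for url in extract_links(fetcher(start_url), start_url):
        kind = classify(url)
        if kind == "pdf":
            pdfs.add(url)
        elif kind == "html" and urlparse(url).netloc == host and url not in visited:
            visited.add(url)
            try:
                sub_links = extract_links(fetcher(url), url)
            except OSError:
                continue  # unreachable page: skip it and keep going
            # One level deep only: PDFs on the sub-page are kept, its HTML links are not followed
            pdfs.update(link for link in sub_links if classify(link) == "pdf")
    return sorted(pdfs)
```

For example, `collect_pdfs("http://www.example.com/docs/")` returns a sorted list of absolute PDF URLs. The `fetcher` parameter is there so the crawl can be tried against canned pages without network access; classifying by extension is the simplest choice, and checking each response's `Content-Type` header would be more robust.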

Accepted Solution

by: Benjamin Voglar
ID: 41718268
If I understand this correctly, you would like to download all PDF files from a site.

You can try PowerShell:

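$page = "http://www.powertheshell.com/cookbooks/"
$psPage = Invoke-WebRequest $page
# Keep <a> links whose href ends in .pdf (-like is case-insensitive); .Links does not need the IE engine that ParsedHtml does
$urls = $psPage.Links | Where-Object { $_.href -like "*.pdf" } | Select-Object -ExpandProperty href
$urls | ForEach-Object {
    $uri = New-Object System.Uri ([System.Uri]$page), $_   # resolve relative hrefs against the page URL
    Invoke-WebRequest -Uri $uri -OutFile (Split-Path $uri.AbsolutePath -Leaf)   # save under the file's own name
}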
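On Linux, the download half of the PowerShell script (take the last URL segment as the file name, like `Split-Path -Leaf`, then save with `Invoke-WebRequest -OutFile`) could be sketched in Python 3 as below. This is an illustration under stated assumptions, not part of the accepted answer: `local_filename`, `unique_path` and `download_all` are hypothetical names, file names are percent-decoded, and a numeric suffix is appended instead of overwriting when two URLs end in the same file name.

```python
import os
import posixpath
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def local_filename(url, fallback="download.pdf"):
    """Last segment of the URL path, percent-decoded (analogous to Split-Path -Leaf)."""
    name = unquote(posixpath.basename(urlparse(url).path))
    # Refuse empty names and anything that could step outside the target directory
    if name in ("", ".", "..") or "/" in name or "\\" in name:
        return fallback
    return name


def unique_path(directory, name):
    """Join directory and name, appending _1, _2, ... if that file already exists."""
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(directory, name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate


def download_all(urls, directory="."):
    """Stream each URL to a file in directory; network errors propagate to the caller."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for url in urls:
        target = unique_path(directory, local_filename(url))
        with urlopen(url, timeout=60) as resp, open(target, "wb") as out:
            shutil.copyfileobj(resp, out)
        saved.append(target)
    return saved
```

Given any list of PDF URLs, `download_all(urls, "pdfs")` saves them under `./pdfs` and returns the paths written. Streaming with `shutil.copyfileobj` avoids holding a whole PDF in memory.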


Author Comment

by:Shailesh Shinde
ID: 41718749
Hi,
I tried running this script, but I get the following error at the command prompt:
links.ps1 cannot be loaded because the execution of scripts is disabled on this system.

Thanks,
Shail

Expert Comment

by:Benjamin Voglar
ID: 41718959
Open PowerShell as Administrator and enter:

 Set-ExecutionPolicy -ExecutionPolicy Unrestricted

Then try the script again.

Expert Comment

by:Benjamin Voglar
ID: 41718961
Or you can run it from Windows PowerShell ISE.

Author Closing Comment

by:Shailesh Shinde
ID: 41738491
Thanks