Extract links from provided webpage

Posted on 2016-07-19
Last Modified: 2016-08-01
Hi All,
There is a requirement to download the PDF and HTML links from a given web page. If the HTML links contain more PDF links, then those need to be downloaded as well.
Is there any sample or reference Perl script, or any other script, that can be run on a Linux machine?

Question by:Shailesh Shinde
LVL 12

Accepted Solution

Benjamin Voglar earned 500 total points
ID: 41718268
If I understand this correctly, you would like to download all PDF files from a site.

You can try with PowerShell.

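# Fetch the page; the page URL goes between the quotes
$psPage = Invoke-WebRequest ""
# Collect the href of every <a> element whose link ends in .pdf
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | Where-Object {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href

# Download each PDF into the current directory, named after the last URL segment
$urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}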
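
Since the question asks for something that runs on a Linux machine, here is a rough Python sketch of the same idea using only the standard library. It is not a translation of the PowerShell above: it also follows HTML links one level deep to look for more PDFs, as the question describes. The names `LinkCollector`, `extract_links`, `collect_pdfs` and `download_all` are made up for this example, and it assumes that "HTML links" means hrefs whose path ends in `.html` or `.htm`, so links without such an extension are not followed.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return (pdf_links, html_links) as absolute URLs found in html."""
    collector = LinkCollector()
    collector.feed(html)
    pdf_links, html_links = [], []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        path = urlparse(url).path.lower()  # ignore any query string
        if path.endswith(".pdf"):
            pdf_links.append(url)
        elif path.endswith((".html", ".htm")):
            # Assumption: only these extensions count as HTML pages.
            html_links.append(url)
    return pdf_links, html_links


def fetch(url):
    """Download a page and return its text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_pdfs(start_url, fetch_page=fetch):
    """PDF links on start_url plus those on the HTML pages it links to."""
    pdfs, pages = extract_links(fetch_page(start_url), start_url)
    for page in pages:  # one level deep only, no recursion
        more_pdfs, _ = extract_links(fetch_page(page), page)
        pdfs.extend(more_pdfs)
    return list(dict.fromkeys(pdfs))  # drop duplicates, keep order


def download_all(start_url):
    """Save every collected PDF into the current directory."""
    for pdf_url in collect_pdfs(start_url):
        filename = os.path.basename(urlparse(pdf_url).path)
        urlretrieve(pdf_url, filename)
```

To use it, call `download_all()` with the address of the page. The `fetch_page` parameter is only there so the link logic can be tried against canned HTML without touching the network.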



Author Comment

by:Shailesh Shinde
ID: 41718749
I tried running this script. However, I get the error below at the command prompt:
links.ps1 cannot be loaded because the execution of scripts is disabled on this system.

LVL 12

Expert Comment

by:Benjamin Voglar
ID: 41718959
Open PowerShell as Admin and enter:

 Set-ExecutionPolicy -ExecutionPolicy Unrestricted

Then try the script again.
LVL 12

Expert Comment

by:Benjamin Voglar
ID: 41718961
Or you can use "Windows PowerShell ISE".

Author Closing Comment

by:Shailesh Shinde
ID: 41738491
