Les Ostness (United States of America) asked:
How to read URLs from a site and pass them into a bash script

From Linux (SUSE Linux Enterprise Server 11 SP4) I want to read URLs from a site. The URLs are embedded as links on the web page.
The site is https://apps.COMPANYNAME.com/wiki/display/ABCDEF/APPLICATION+INSTANCE+INFO
I then want to pass each URL as a variable into a bash script.

Here is a sample of a URL I'm trying to get:
http://servername.companyname.com:8888/OA_HTML/AppsLogin.jsp
Note there are several URLs with different server names and ports, and the last part (AppsLogin) can also be different.
David Favor (United States of America) replied:
1) Read URLs off a page.

You may be able to use wget or curl... if the Webpage is simple HTML.

If the page uses JavaScript to generate the set of links you're trying to scrape, then you'll need a headless browser such as PhantomJS (headless WebKit) or headless Chrome, which will run all the JavaScript on the page to generate the links before you do your scrape.

2) var="https://apps.COMPANYNAME.com/wiki/display/ABCDEF/APPLICATION+INSTANCE+INFO"

Then use $var in your script, passed however you like.
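A minimal sketch of that hand-off (the function name `show_url` and its output are illustrative, not from any existing script):

```shell
#!/bin/bash
# show_url: hypothetical stand-in for the asker's script; it receives
# the URL as its first positional argument.
show_url() {
    local url="$1"
    [ -n "$url" ] || { echo "usage: show_url URL" >&2; return 1; }
    echo "Working with: $url"
}

# Caller side: put the URL in a variable, then pass it quoted so that
# special characters (like the + signs in the wiki path) survive intact.
var="https://apps.COMPANYNAME.com/wiki/display/ABCDEF/APPLICATION+INSTANCE+INFO"
show_url "$var"
```

In a real setup the function body would live in its own script file and be invoked as `./myscript.sh "$var"`; the quoting is the important part.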

Likely you're hitting some specific problem, which will require you to provide more context about your current code and the exact problem you're seeing.
Tip: if you provide the actual URL you're trying to scrape, someone can tell you whether a headless browser like PhantomJS is required or not.
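One way to check that yourself is to look at the raw HTML: if the hrefs already appear in the static source, plain curl/wget is enough. A hedged sketch (the function name and messages are made up for illustration; it reads HTML from stdin so it can be tried without network access):

```shell
#!/bin/bash
# check_static: reports whether http(s) hrefs are present in the HTML
# fed to it on stdin.  If they are missing, the links are most likely
# generated client-side by JavaScript.
check_static() {
    if grep -q 'href="https\?://'; then
        echo "static links found: curl/wget is enough"
    else
        echo "no static links: a JS-capable headless browser is needed"
    fi
}

# Example on an inline snippet (a live page would be: curl -s URL | check_static):
echo '<a href="http://example.com/x">x</a>' | check_static
```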
skullnobrains replied:
curl + sed would seem like the most straightforward approach, assuming the URLs are not split across multiple lines. A sample would help.

Skeleton for simple cases

curl -s URL | sed -ne 's/.*href="\(https\?:[^"]*\)".*/\1/p'
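A slightly more robust variant of the same idea, sketched under the assumption that the page is static HTML: `grep -o` prints every match on its own line, so several hrefs on one HTML line are all captured (the sed skeleton above keeps only one per line). The helper name `extract_links` and the loop target `./myscript.sh` are hypothetical.

```shell
#!/bin/bash
# extract_links: pull absolute http/https URLs out of HTML on stdin,
# one per line.
extract_links() {
    grep -oE 'href="https?://[^"]+"' | sed -e 's/^href="//' -e 's/"$//'
}

# Demo on an inline snippet instead of a live fetch:
html='<a href="http://servername.companyname.com:8888/OA_HTML/AppsLogin.jsp">x</a>'
printf '%s\n' "$html" | extract_links

# Against the real page (needs network access and a non-JS page):
# curl -s "https://apps.COMPANYNAME.com/wiki/display/ABCDEF/APPLICATION+INSTANCE+INFO" \
#   | extract_links | while read -r link; do
#       ./myscript.sh "$link"   # myscript.sh is a placeholder for the asker's script
#   done
```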
ASKER CERTIFIED SOLUTION
Pierre François (Belgium)

Les Ostness (asker) replied:
Thank you all. Such great input from everyone. Thanks again for all your help.