Les Ostness
asked on
How to read URLs from a site and pass them into a bash script
From Linux (SUSE Linux Enterprise Server 11 SP4) I want to read URLs from a site. The URLs are embedded as links on the web page.
The site is https://apps.COMPANYNAME.com/wiki/display/ABCDEF/APPLICATION+INSTANCE+INFO
Then I want to pass each URL as a variable into a bash script.
Here is a sample of a URL I'm trying to get:
http://servername.companyname.com:8888/OA_HTML/AppsLogin.jsp
Note there are several URLs with different server names and ports, and the last part (AppsLogin) can also be different.
Tip: If you provide the actual URL you're trying to scrape, someone can tell you for certain whether PhantomJS is required.
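curl + sed would seem the most straightforward approach, assuming the URLs are not split across multiple lines. A sample of the page source would help. Note that this prints at most one URL per input line (the last one on that line).
Skeleton for simple cases
curl -s URL | sed -ne 's/.*href="\(https\{0,1\}:[^"]*\)".*/\1/p'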
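If the page puts several links on one line, a grep -o variant may be easier than sed alone. This is only a sketch: extract_urls is a made-up helper name, and the sample HTML (including the second host, port and JSP name) is invented to stand in for the real wiki page.

```shell
#!/usr/bin/env bash
# Print every http/https href target found in the HTML on stdin, one per line.
extract_urls() {
  grep -o 'href="https\{0,1\}://[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# Hypothetical sample standing in for the downloaded page; two links on one line.
sample_html='<p><a href="http://servername.companyname.com:8888/OA_HTML/AppsLogin.jsp">PROD</a> <a href="https://other.companyname.com:8010/OA_HTML/AppsLocalLogin.jsp">TEST</a></p>'

printf '%s\n' "$sample_html" | extract_urls

# Against the real page (the wiki may also need login options):
# curl -s "$PAGE_URL" | extract_urls
```

This still assumes each href attribute sits on a single line and uses double quotes.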
ASKER
Thank you all. Such great input from everyone. Thanks again for all your help.
1) You may be able to use wget or curl, if the web page is simple HTML.
If the page uses JavaScript to generate the set of links you're trying to scrape, then you'll need a headless browser such as PhantomJS (a scriptable, headless WebKit-based browser), which runs all the JavaScript on the page to generate the links before you do your scrape.
2) var="https://apps.COMPANYNAME.com/wiki/display/ABCDEF/APPLICATION+INSTANCE+INFO"
Then use "$var" in your script, passed however you like (for example as a command-line argument or an exported environment variable).
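To make the "pass it into a script" part concrete, here is a rough sketch that reads URLs one per line and hands each to a function as $1. process_url is a hypothetical stand-in for whatever your real script does, and the host/port split assumes every URL carries an explicit port, as in the sample from the question.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the script that should receive each URL.
process_url() {
  local url="$1"
  local rest="${url#*://}"        # drop the scheme (http:// or https://)
  local hostport="${rest%%/*}"    # keep everything before the first slash
  # Assumes an explicit port is present in the URL.
  echo "host=${hostport%%:*} port=${hostport##*:} url=$url"
}

# One URL per line; in practice this would be the output of the curl pipeline.
urls='http://servername.companyname.com:8888/OA_HTML/AppsLogin.jsp'

printf '%s\n' "$urls" | while IFS= read -r url; do
  if [ -n "$url" ]; then
    process_url "$url"
  fi
done
```

If the consumer is a separate script file, the same loop could call it with "$url" as its first argument instead of calling the function.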
You're likely hitting some specific problem; to diagnose it, we'd need more context: your current code and the exact error or behavior you're seeing.