• Status: Solved

Crawl site to extract HTTP links

Hi,
I am looking for a crawler that will extract all URLs on a page and write them to a file. Any recommendations? Thanks.
Asked by: NYGiantsFan
2 Solutions
 
Scott Fell, EE MVE, Developer & EE Moderator, commented:
I used to use HTTrack (http://www.httrack.com/) when I had a PC. I don't know of anything similar for the Mac, though.
 
Giovanni Heward commented:
You can download curl and run the following command from a Windows command prompt. It prints every line of the page that contains "<a":

curl http://www.example.com 2>nul|findstr /i "<a"

You can redirect the output to a file like so:

curl http://www.example.com 2>nul|findstr /i "<a" >>links.txt

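If you want to clean up that output, try the following. It keeps only the first double-quoted string on each matching line, which is usually (but not always) the href value. In a batch file, write %%l instead of %l: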

for /f tokens^=2^ delims^=^" %l in ('curl http://www.example.com 2^>nul^|findstr /i "<a"') do echo %l >>links.txt
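
If the one-liner above picks up the wrong quoted value (for example a class attribute that comes before href), a small script may be more reliable. The following is only a sketch of an alternative approach, not part of the curl method: it assumes Python 3 is installed, and the names LinkCollector, extract_links, and save_links are made up for illustration. It parses the HTML instead of matching text, resolves relative links against the page URL, and writes one link per line.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all anchor hrefs found in an HTML string, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def save_links(url, out_path="links.txt"):
    """Download one page and write its links to out_path, one per line."""
    with urlopen(url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    links = extract_links(html, url)
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(links) + "\n")
    return links
```

Calling save_links("http://www.example.com") would fetch that single page and overwrite links.txt. Like the curl command, it does not follow links to other pages.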

 
Giovanni Heward commented:
If you're looking for a GUI method, try:

http://www.focalmedia.net/urlextract.html