Solved

How to write a crawler to download files?

Posted on 2011-09-22
Last Modified: 2012-05-12
Hi,

I will be given the URL of a directory listing on a web server.
Given that URL, I want to download everything in that folder and its subfolders.
I need to write this from scratch.
I was looking at web crawlers as a way to parse the page at the given URL and extract its links.
I am told I need to parse the URL, since a directory listing is always available.
So how do I use the directory listing to download the files?
Thanks.
Question by:dkim18
4 Comments
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 36580824
>>I need to write this from scratch.

Why is that? There are web crawlers already written.

>>I am told I need to parse the URL since there is always the directory listing available.

Then I assume you're crawling a specific site, since otherwise you can't rely on directory listings being available?
 

Author Comment

by:dkim18
ID: 36580991
Sorry, my grammar wasn't good.

What I was trying to say is that the client doesn't want us to use third-party tools.
So I was going to look at some open source code and reuse parts of it.

What I meant above is that I will be given a URL like http://mysite/website101/
In website101 there is a directory listing.
The directory listing will always be available.
In website101 there is a folder A and files f1, f2, etc.
Folder A has folders b, c, d and files f3, f4.
Folder b has some subfolders and files.

I am new to this kind of thing.
Do I still have to parse the directory listing?
Does the directory listing show those subfolders and files in an HTML page, as hyperlinks?
If so, I still need to parse that directory listing, don't I?
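
For anyone landing here with the same question: a directory listing is normally just an HTML page of links, so parsing it comes down to collecting the href values. Below is a minimal sketch in Python using only the standard library. It assumes an Apache-style listing where subfolder links end in "/" and column-sort links start with "?". The names LinkCollector and split_listing are made up for this example, and the base URL is expected to end in "/".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_listing(base_url, html):
    """Return (subfolder_urls, file_urls) found in a directory listing page.

    Assumption: base_url ends with "/" and subfolder links end with "/".
    """
    parser = LinkCollector()
    parser.feed(html)
    dirs, files = [], []
    for href in parser.hrefs:
        # Skip column-sort links and in-page anchors.
        if href.startswith("?") or href.startswith("#"):
            continue
        absolute = urljoin(base_url, href)
        # Skip the parent-directory link and anything outside the start folder.
        if absolute == base_url or not absolute.startswith(base_url):
            continue
        # Skip links that still carry a query string.
        if urlparse(absolute).query:
            continue
        if absolute.endswith("/"):
            dirs.append(absolute)
        else:
            files.append(absolute)
    return dirs, files
```

Calling split_listing again on each returned subfolder URL (after fetching its page) walks the whole tree. Other servers may format their listings differently, so the filtering rules would need checking against the real page.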
 

Author Comment

by:dkim18
ID: 36581003
So I want to download the files and keep the folder structure:
website101
website101/a
website101/f1
website101/f2
website101/a/f3
website101/a/b
...etc.
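
One way to keep that structure is to turn each file URL into a local path relative to the starting URL before saving. The sketch below is in Python and is only an illustration: local_path_for and download are hypothetical helper names, and the destination folder "out" is an assumption, not something from the question.

```python
import shutil
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def local_path_for(root_url, file_url, dest_dir):
    """Map a file URL under root_url to a path under dest_dir.

    The last folder of root_url (e.g. "website101") is kept as the top folder.
    """
    root_path = urlparse(root_url).path
    file_path = urlparse(file_url).path
    if not file_path.startswith(root_path):
        raise ValueError("URL is outside the start folder: " + file_url)
    relative = unquote(file_path[len(root_path):])
    parts = [p for p in relative.split("/") if p]
    # Refuse path segments that would escape the destination folder.
    if any(p == ".." for p in parts):
        raise ValueError("Unsafe path in URL: " + file_url)
    top = [p for p in root_path.split("/") if p][-1:]
    return Path(dest_dir, *top, *parts)


def download(file_url, target):
    """Save one file, creating the parent folders first."""
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(file_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
```

With the example above, local_path_for("http://mysite/website101/", "http://mysite/website101/a/f3", "out") gives the path out/website101/a/f3, and download then writes the file there.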
 
LVL 86

Expert Comment

by:CEHJ
ID: 36592331
:)

Question has a verified solution.