Solved

How to write a crawler to download files?

Posted on 2011-09-22
4
399 Views
Last Modified: 2012-05-12
Hi,

I will have the directory listing of a web server.
If this url is given, I want to download everything in that folder and subfolders.
I need to writ this from the scratch.
I was looking at the web crawler to parse the given URL and extract links.
I am told I need to parse the URL since there is always the directory listing available.
So how do I use the directory listing to download files?
Thanks.
0
Comment
Question by:dkim18
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
4 Comments
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 36580824
>>I need to writ this from the scratch.

Why is that? There are web crawlers already written

>>I am told I need to parse the URL since there is always the directory listing available.

Then i assume you're crawling a specific site, since you can't otherwise rely on directory listings being available?
0
 

Author Comment

by:dkim18
ID: 36580991
Sorry. My grammar wasn't good.

I was trying to say. The client doesn't want us to use those third party tool
So I was going to look at some of the open source code and copy and use them.

What I meant to say above was I will be given a URL like http://mysite/website101/
In the website101, there is a directory listing.
The directory listing will be always available.
in the website101,
There is a folder A and f1 file, f2 file..etc
A folder has b, c, d and f3 file, f4 file
b folder has some sub folders and files

So I am new to this kind of thing.
Do I still have to the parse the directory listing?
Does the directory listing list those subfolder and files in html file (and  as a hyperlink)
So I still need to parse that directory listing, don't I?
0
 

Author Comment

by:dkim18
ID: 36581003
So I want to keep the folder structure and download files.
Website101
website101/a
website101/f1
website101/f2
website101/a/f3
website101/a/b
...etc.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 36592331
:)
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Java Flight Recorder and Java Mission Control together create a complete tool chain to continuously collect low level and detailed runtime information enabling after-the-fact incident analysis. Java Flight Recorder is a profiling and event collectio…
Real-time is more about the business, not the technology. In day-to-day life, to make real-time decisions like buying or investing, business needs the latest information(e.g. Gold Rate/Stock Rate). Unlike traditional days, you need not wait for a fe…
Video by: Michael
Viewers learn about how to reduce the potential repetitiveness of coding in main by developing methods to perform specific tasks for their program. Additionally, objects are introduced for the purpose of learning how to call methods in Java. Define …
Viewers will learn how to properly install Eclipse with the necessary JDK, and will take a look at an introductory Java program. Download Eclipse installation zip file: Extract files from zip file: Download and install JDK 8: Open Eclipse and …
Suggested Courses

739 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question