Solved

How to write a crawler to download files?

Posted on 2011-09-22
Last Modified: 2012-05-12
Hi,

I will be given the directory listing of a web server.
Given its URL, I want to download everything in that folder and its subfolders.
I need to write this from scratch.
I was looking at how a web crawler parses a given URL and extracts links.
I am told I need to parse the URL since there is always the directory listing available.
So how do I use the directory listing to download files?
Thanks.
Question by:dkim18
4 Comments
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 36580824
>>I need to write this from scratch.

Why is that? There are web crawlers already written.

>>I am told I need to parse the URL since there is always the directory listing available.

Then I assume you're crawling a specific site, since otherwise you can't rely on directory listings being available?
 

Author Comment

by:dkim18
ID: 36580991
Sorry. My grammar wasn't good.

I was trying to say that the client doesn't want us to use third-party tools,
so I was going to look at some open source code and reuse it.

What I meant to say above was that I will be given a URL like http://mysite/website101/
In website101 there is a directory listing.
The directory listing will always be available.
In website101 there is a folder A and files f1, f2, etc.
Folder A contains folders b, c, d and files f3, f4.
Folder b has some subfolders and files.

I am new to this kind of thing.
Do I still have to parse the directory listing?
Does the directory listing show those subfolders and files in an HTML file (as hyperlinks)?
If so, I still need to parse that directory listing, don't I?
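
A minimal sketch of the parsing step being asked about, using only the Python standard library (no third-party tools). It assumes the server returns an Apache-style index page in which every subfolder and file is an `<a href>` link and subfolder links end in `/`; that layout is an assumption, not something confirmed in this thread. The names `ListingParser` and `list_entries` are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collects the href of every <a> tag on a directory-listing page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_entries(base_url, html):
    """Return (folder_urls, file_urls) found under base_url.

    base_url is assumed to end with "/" so relative links resolve correctly.
    """
    parser = ListingParser()
    parser.feed(html)
    folders, files = [], []
    for href in parser.hrefs:
        # Skip column-sort links (for example "?C=N;O=D") and page anchors.
        if href.startswith("?") or href.startswith("#"):
            continue
        url = urljoin(base_url, href)
        # Keep only entries below the listing; this drops the
        # "Parent Directory" link and links to other sites.
        if not url.startswith(base_url) or url == base_url:
            continue
        # Assumption: links ending in "/" are subfolders, the rest are files.
        (folders if url.endswith("/") else files).append(url)
    return folders, files
```

Each returned folder URL can then be fetched and passed through `list_entries` again to walk the subfolders.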
 

Author Comment

by:dkim18
ID: 36581003
So I want to keep the folder structure and download files.
website101
website101/a
website101/f1
website101/f2
website101/a/f3
website101/a/b
...etc.
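
One way the layout above could be reproduced on disk is a recursive walk that creates a local folder for every listing and writes each file beside it. This is a rough sketch under assumptions: the listing is plain HTML with `<a href>` links, subfolder links end in `/`, and the start URL ends in `/`. The names `fetch`, `hrefs_in`, and `mirror` are hypothetical, and error handling and retries are left out.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, unquote
from urllib.request import urlopen


def fetch(url):
    """Download a URL and return its body as bytes."""
    with urlopen(url) as response:
        return response.read()


def hrefs_in(html):
    """Return every <a href> value found in an HTML page."""
    found = []

    class Collector(HTMLParser):
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                found.extend(v for k, v in attrs if k == "href" and v)

    Collector().feed(html)
    return found


def mirror(url, dest, fetch=fetch, seen=None):
    """Recursively copy the listing at url (ending in "/") into folder dest."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against listings that link back to themselves
        return
    seen.add(url)
    os.makedirs(dest, exist_ok=True)
    page = fetch(url).decode("utf-8", errors="replace")
    for href in hrefs_in(page):
        child = urljoin(url, href)
        # Path of the link relative to this listing; empty if it points elsewhere.
        rel = child[len(url):] if child.startswith(url) else ""
        name = unquote(rel.rstrip("/"))
        # Keep direct children only: skips sort links, parent links, other hosts.
        if not name or "/" in name or "?" in rel or "#" in rel or name in (".", ".."):
            continue
        target = os.path.join(dest, name)
        if rel.endswith("/"):
            mirror(child, target, fetch, seen)  # subfolder: recurse into it
        else:
            with open(target, "wb") as out:  # file: save it next to its siblings
                out.write(fetch(child))
```

Calling `mirror("http://mysite/website101/", "website101")` would then create `website101/`, `website101/a/`, and so on locally. The `fetch` parameter is there so the download function can be swapped out, for example when trying the walk against canned pages.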
 
LVL 86

Expert Comment

by:CEHJ
ID: 36592331
:)