?
Solved

Downlload All Files From a Url

Posted on 2003-10-21
14
Medium Priority
?
254 Views
Last Modified: 2010-05-18
Hi again Experts!
I'm looking for a web site, that have a lot of subtitles (zip format), but i don't have time to download all subtitles by clicking in a php link....
so, I want download all *zip files from this url:
Something like:
http://www.somesite/subtitles/*.zip

The problem is that this url, have a blank index.htm, so i can't see the zip files, and i don't have any idea
about the file names...
maybe a really long name, as 123456_789_something_833Sl62Bb.zip

So, i need a code, that search this web directory for the files, and download all to my local hard disk.

Any idea guys?
0
Comment
Question by:Spetson
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 4
  • 2
  • +3
14 Comments
 

Expert Comment

by:OpenSourceDeveloper
ID: 9591789
well if you only want a program that can do this then I suggest "wget"

http://www.interlog.com/~tcharron/wgetwin.html
0
 

Author Comment

by:Spetson
ID: 9592044
nope...
I have tryed wget, but
I just receive a index.html file :(
0
 

Expert Comment

by:OpenSourceDeveloper
ID: 9592118
wget -r -l1 --no-parent -A.gif http://host/dir/

It is a bit of a kludge, but it works perfectly. `-r -l1' means to retrieve recursively (See section Advanced Options), with maximum depth of 1. `--no-parent' means that references to the parent directory are ignored (See section Directory-Based Limits), and `-A.gif' means to download only the GIF files. `-A "*.gif"' would have worked too.
Suppose you were in the middle of downloading, when Wget was interrupted. Now you do not want to clobber the files already present. It would be:

so let's say you want to download all zip files that are linked to from www.test.com you would do

wget -r -l1 --no-parent -A.zip http://www.test.com/
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 6

Expert Comment

by:MannSoft
ID: 9592594
You need some sort of file list, wether it be an index.html linking to the files, or a directory list.  If you don't know the names of the files, how can you download them?  You can't, unless of course you write a brute force program that tries every possible filename combination up to X amount of characters.  But the admin will probably want to kick your ass if you do that :)
0
 
LVL 5

Expert Comment

by:delphized
ID: 9592673
if it's only for a site, why don't you write them and ask for a CD???

chuss
0
 

Author Comment

by:Spetson
ID: 9594703
The screen Show:
==================================
wget -r -l1 --no-parent -A.zip http://www.website/_subtitles_/
--07:18:17--  http://www.website:80/_subtitles_/
           => `www.website/_subtitles_/index.html'
Connecting to www.website:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 266 [text/html]

    0K ->                                                        [100%]

07:18:18 (259.77 KB/s) - `www.website/_subtitles_/index.html' saved [2
66/266]


FINISHED --07:18:18--
Downloaded: 266 bytes in 1 files
===============================================
I believe that MannSoft is right!
And you know...we are not talking about delphi here anymore...hehe
So if somebody knows some delphi code to make this works, please help!!!
Otherways... i will request a cd, like delphized said!!
Thank's guys, and sorry for my english!
0
 
LVL 6

Expert Comment

by:MannSoft
ID: 9594732
I just re-read your first message...and you say:

"I'm looking for a web site, that have a lot of subtitles (zip format), but i don't have time to download all subtitles by clicking in a php link.... "

So it sounds like there IS a page that has a list of all the files you can download.  Point wget at that page and it should download all the files it links to.
0
 
LVL 17

Expert Comment

by:Wim ten Brink
ID: 9598179
MannSoft is right... The only way to get files through HTTP is when you know the name of these files. There are quite a few applications that use a simple trick to find these names. All you have to do is search for <A href="Whatever"> tags in a webpage and whatever the href is pointing to is just another URL where you can get more data from. Look for these kinds of references to build a list of pages that you want to download.
This technique is called webcrawling, btw. And you have to be careful since you can end in an endless loop if you're not careful.
0
 

Author Comment

by:Spetson
ID: 9598541
Ok...now i understood.
I believe that i cannot make this crazy idea works, because the server admin
have a "blank index.html" in the subtitles folder, accurately to protect the directory.
And maybe, is insane  try to "guess" all the possible archives names in this directory,
using any type of loop or delphi code...

But, i will keep this topic open here, cause i already see  incredible things made by Delphi programmers :)

Therefore, I will be listening, for any suggestion or comments here.

 
 
0
 
LVL 6

Expert Comment

by:MannSoft
ID: 9598617
Could you clarify what you mean by this:

"I'm looking for a web site, that have a lot of subtitles (zip format), but i don't have time to download all subtitles by clicking in a php link.... "

What php link do you have to click?  Depending on the format of the page, you should be able to build a file list from it, that you can then use to get the files with wget.
0
 

Author Comment

by:Spetson
ID: 9598771
Ok!
Sorry for my poor explanations, this is because i'm not powerfull in english...
The server, use a mysql database, so when u click in a subtitle link, this link send you to another page (a cgi)
something like this:
download.php?file=289

So, the subtitles page, have about 2000 links (all linked to the same download.php) only changing the file id.
The problem is:
Before the download.php page, redirect you to the file, you MUST WAIT (in the line) about 1 minute queue, to get every file!
I just want broke this line, and get the file.
I know the directory where all the files is, but don't know all the names (filename)
:)
0
 
LVL 6

Accepted Solution

by:
MannSoft earned 900 total points
ID: 9598984
Okay, I see what you mean now.  And yeah, I think there's not much you can do unless by chance the files were named with some sort of a pattern.  For example, if there were file1.zip, file2.zip, file3.zip then there's a good chance that there are also file4.zip, file5.zip, file6.zip.  But if the zip name describes the contents, like winamp.zip, norton-antivirus.zip, smartftp.zip, then of course that makes it near impossible to guess what else is there.
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9950750
I did something like this not too long ago. Let me see if I can find it.
0
 

Author Comment

by:Spetson
ID: 9980961
MannSoft said:
"it near impossible to guess what else is there."

Yes.. I think you're right man!
I had worked hard, and no success to find a solution to my question...hehe!

But, as you had support me with this question, i decided to give to you this 300 points ok?
Because i really hate "Just Close" a topic, so...
Thank's a lot for your comments...

See ya!
0

Featured Post

On Demand Webinar - Networking for the Cloud Era

This webinar discusses:
-Common barriers companies experience when moving to the cloud
-How SD-WAN changes the way we look at networks
-Best practices customers should employ moving forward with cloud migration
-What happens behind the scenes of SteelConnect’s one-click button

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this tutorial I will show you how to use the Windows Speech API in Delphi. I will only cover basic functions such as text to speech and controlling the speed of the speech. SAPI Installation First you need to install the SAPI type library, th…
Introduction Raise your hands if you were as upset with FireMonkey as I was when I discovered that there was no TListview.  I use TListView in almost all of my applications I've written, and I was not going to compromise by resorting to TStringGrid…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Suggested Courses
Course of the Month9 days, 22 hours left to enroll

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question