• Status: Solved
  • Priority: Medium
  • Security: Public

Download All Files From a URL

Hi again, Experts!
I'm looking at a web site that has a lot of subtitles (in zip format), but I don't have time to download all the subtitles by clicking a PHP link....
So I want to download all *.zip files from this URL. Something like:
http://www.somesite/subtitles/*.zip

The problem is that this URL has a blank index.htm, so I can't see the zip files, and I have no idea what the file names are...
Maybe something really long, like 123456_789_something_833Sl62Bb.zip.

So I need code that searches this web directory for the files and downloads them all to my local hard disk.

Any ideas, guys?
Asked by: Spetson
1 Solution
 
OpenSourceDeveloper commented:
Well, if you just want a program that can do this, then I suggest "wget":

http://www.interlog.com/~tcharron/wgetwin.html
 
Spetson (Author) commented:
Nope...
I have tried wget, but I just receive an index.html file :(
 
OpenSourceDeveloper commented:
wget -r -l1 --no-parent -A.gif http://host/dir/

It is a bit of a kludge, but it works perfectly. `-r -l1' means to retrieve recursively (see section Advanced Options), with a maximum depth of 1. `--no-parent' means that references to the parent directory are ignored (see section Directory-Based Limits), and `-A.gif' means to download only the GIF files. `-A "*.gif"' would have worked too.
Suppose you were in the middle of downloading when Wget was interrupted, and now you do not want to clobber the files already present. It would be:

wget -nc -r -l1 --no-parent -A.gif http://host/dir/

(`-nc' is wget's no-clobber option: files that already exist locally are not downloaded again.)

So, let's say you want to download all the zip files that are linked to from www.test.com. You would do:

wget -r -l1 --no-parent -A.zip http://www.test.com/
 
MannSoft commented:
You need some sort of file list, whether it be an index.html linking to the files or a directory listing. If you don't know the names of the files, how can you download them? You can't, unless of course you write a brute-force program that tries every possible filename combination up to X characters. But the admin will probably want to kick your ass if you do that :)
 
delphized commented:
If it's only one site, why don't you write to them and ask for a CD???

Bye!
 
Spetson (Author) commented:
The screen shows:
==================================
wget -r -l1 --no-parent -A.zip http://www.website/_subtitles_/
--07:18:17--  http://www.website:80/_subtitles_/
           => `www.website/_subtitles_/index.html'
Connecting to www.website:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 266 [text/html]

    0K ->                                                        [100%]

07:18:18 (259.77 KB/s) - `www.website/_subtitles_/index.html' saved [266/266]


FINISHED --07:18:18--
Downloaded: 266 bytes in 1 files
===============================================
I believe that MannSoft is right!
And you know... we are not talking about Delphi here anymore... hehe
So if somebody knows some Delphi code to make this work, please help!!!
Otherwise... I will request a CD, like delphized said!!
Thanks, guys, and sorry for my English!
 
MannSoft commented:
I just re-read your first message... and you say:

"I'm looking at a web site that has a lot of subtitles (in zip format), but I don't have time to download all the subtitles by clicking a PHP link...."

So it sounds like there IS a page that has a list of all the files you can download. Point wget at that page, and it should download all the files it links to.
 
Wim ten Brink commented:
MannSoft is right... The only way to get files over HTTP is to know their names. Quite a few applications use a simple trick to find those names: search a web page for <A href="..."> tags; whatever each href points to is just another URL you can fetch more data from. Follow these references to build the list of pages you want to download.
This technique is called webcrawling, by the way. And be careful: you can end up in an endless loop if you're not.
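A rough Delphi sketch of that idea (only an illustration: it assumes Indy's TIdHTTP component and absolute links on the page, and it goes just one level deep, so there is no endless-loop risk here):

uses
  Classes, SysUtils, StrUtils, IdHTTP;

{ Fetch a listing page, scan its HTML for href="...zip" links and save
  each linked zip into the current directory. Deliberately crude: relative
  links would still have to be resolved against the page URL. }
procedure DownloadZipLinks(const PageUrl: string);
var
  Http: TIdHTTP;
  Page, Lower, Href, FileName: string;
  P, Q: Integer;
  Fs: TFileStream;
begin
  Http := TIdHTTP.Create(nil);
  try
    Http.HandleRedirects := True;
    Page := Http.Get(PageUrl);          // the page that lists the files
    Lower := LowerCase(Page);           // lower-cased copy for case-insensitive search
    P := PosEx('href="', Lower, 1);
    while P > 0 do
    begin
      Q := PosEx('"', Lower, P + 6);    // closing quote of the href value
      if Q = 0 then Break;
      Href := Copy(Page, P + 6, Q - P - 6);
      if AnsiEndsText('.zip', Href) then
      begin
        FileName := Copy(Href, LastDelimiter('/', Href) + 1, MaxInt);
        Fs := TFileStream.Create(FileName, fmCreate);
        try
          Http.Get(Href, Fs);           // stream the zip straight to disk
        finally
          Fs.Free;
        end;
      end;
      P := PosEx('href="', Lower, Q);   // move on to the next link
    end;
  finally
    Http.Free;
  end;
end;

You would point it at whatever page actually lists the links, not at the directory with the blank index.html.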
 
Spetson (Author) commented:
OK... now I understand.
I believe I cannot make this crazy idea work, because the server admin put a blank index.html in the subtitles folder precisely to protect the directory.
And it is probably insane to try to "guess" all the possible archive names in this directory using any kind of loop or Delphi code...

But I will keep this topic open here, because I have already seen incredible things made by Delphi programmers :)

Therefore, I will be listening for any suggestions or comments here.
 
MannSoft commented:
Could you clarify what you mean by this:

"I'm looking at a web site that has a lot of subtitles (in zip format), but I don't have time to download all the subtitles by clicking a PHP link...."

What PHP link do you have to click? Depending on the format of the page, you should be able to build a file list from it, which you can then use to get the files with wget.
 
Spetson (Author) commented:
OK!
Sorry for my poor explanation; my English is not very strong...
The server uses a MySQL database, so when you click a subtitle link, the link sends you to another page (a CGI script), something like this:
download.php?file=289

So the subtitles page has about 2000 links (all pointing to the same download.php), with only the file id changing.
The problem is:
Before the download.php page redirects you to the file, you MUST WAIT in a queue for about a minute to get each file!
I just want to skip that line and get the files.
I know the directory where all the files are, but I don't know all the filenames.
:)
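For illustration, a brute-force Delphi loop over those file ids might look like the sketch below. It is only a sketch: it assumes Indy's TIdHTTP, that the ids really run from 1 to 2000, and that download.php ends in a redirect to the real file. If the one-minute wait is enforced on the server side, this loop stands in line 2000 times like everybody else; it cannot skip the queue.

uses
  Windows, Classes, SysUtils, IdHTTP;

{ Try every file id behind download.php and save whatever comes back.
  Beware: if the server answers with its "please wait" HTML page instead
  of the zip, that HTML is what lands in the .zip file. }
procedure FetchById(FirstId, LastId: Integer);
var
  Http: TIdHTTP;
  Id: Integer;
  Fs: TFileStream;
begin
  Http := TIdHTTP.Create(nil);
  try
    Http.HandleRedirects := True;       // follow download.php to the real file
    for Id := FirstId to LastId do
    begin
      Fs := TFileStream.Create(Format('file_%d.zip', [Id]), fmCreate);
      try
        try
          Http.Get(Format('http://www.website/download.php?file=%d', [Id]), Fs);
        except
          on EIdHTTPProtocolException do
            ; // this id does not exist (or was refused) -- leave the gap
        end;
      finally
        Fs.Free;
      end;
      Sleep(1000);                      // be polite: pause between requests
    end;
  finally
    Http.Free;
  end;
end;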
 
MannSoft commented:
Okay, I see what you mean now. And yeah, I think there's not much you can do unless, by chance, the files were named with some sort of pattern. For example, if there were file1.zip, file2.zip, and file3.zip, then there's a good chance that there are also file4.zip, file5.zip, and file6.zip. But if the zip name describes the contents, like winamp.zip, norton-antivirus.zip, or smartftp.zip, then of course it is nearly impossible to guess what else is there.
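If the names did follow such a pattern, you could at least probe for them without downloading anything, using one HTTP HEAD request per guessed name. A sketch, again assuming Indy's TIdHTTP and the purely hypothetical fileN.zip naming scheme:

program ProbeNames;
{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP;

{ Returns True if the server says the URL exists. HEAD transfers only the
  headers, never the file itself, so probing is cheap. }
function UrlExists(Http: TIdHTTP; const Url: string): Boolean;
begin
  try
    Http.Head(Url);
    Result := Http.Response.ResponseCode = 200;
  except
    Result := False;                    // 404 and friends raise an exception in Indy
  end;
end;

var
  Http: TIdHTTP;
  I: Integer;
begin
  Http := TIdHTTP.Create(nil);
  try
    // Probe file1.zip .. file50.zip and print the hits.
    for I := 1 to 50 do
      if UrlExists(Http, Format('http://www.website/_subtitles_/file%d.zip', [I])) then
        WriteLn(Format('found: file%d.zip', [I]));
  finally
    Http.Free;
  end;
end.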
 
Eddie Shipman (All-around developer) commented:
I did something like this not too long ago. Let me see if I can find it.
 
Spetson (Author) commented:
MannSoft said:
"...it is nearly impossible to guess what else is there."

Yes... I think you're right, man!
I worked hard on this and had no success finding a solution to my question... hehe!

But since you supported me on this question, I have decided to give you the 300 points, OK?
Because I really hate to "just close" a topic, so...
Thanks a lot for your comments...

See ya!
