
Download All Files From a URL

Posted on 2003-10-21
248 Views
Last Modified: 2010-05-18
Hi again Experts!
I'm looking at a web site that has a lot of subtitles (zip format), but I don't have time to download every subtitle by clicking a PHP link....
So I want to download all the *.zip files from a URL like this:
http://www.somesite/subtitles/*.zip

The problem is that this URL has a blank index.htm, so I can't see the zip files, and I have no idea
what the file names are...
They may be really long names, such as 123456_789_something_833Sl62Bb.zip

So I need code that searches this web directory for the files and downloads them all to my local hard disk.

Any idea guys?
Question by:Spetson
14 Comments
 

Expert Comment

by:OpenSourceDeveloper
ID: 9591789
Well, if you only want a program that can do this, then I suggest wget:

http://www.interlog.com/~tcharron/wgetwin.html
 

Author Comment

by:Spetson
ID: 9592044
Nope...
I have tried wget, but
I just get an index.html file :(
 

Expert Comment

by:OpenSourceDeveloper
ID: 9592118
wget -r -l1 --no-parent -A.gif http://host/dir/

It is a bit of a kludge, but it works perfectly. -r -l1 means to retrieve recursively with a maximum depth of 1, --no-parent means that references to the parent directory are ignored, and -A.gif means to download only the GIF files. -A "*.gif" would have worked too.
If wget is interrupted in the middle of a download and you don't want to clobber the files already present, run it again with the -nc (--no-clobber) option added.

So, say you want to download all zip files that are linked from www.test.com; you would do:

wget -r -l1 --no-parent -A.zip http://www.test.com/
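To make the -A filter concrete, here is a rough Python approximation of a suffix accept list. It is only a sketch of the idea, not wget's actual matching logic, and the function name accepted is hypothetical. In this sketch the suffix is checked against the URL path only, ignoring the query string, so a link like download.php?file=289 is rejected even if the server answers it with a zip file.

```python
from urllib.parse import urlparse


def accepted(url: str, suffixes: tuple[str, ...]) -> bool:
    """Rough approximation of an accept list such as wget's -A.zip.

    Only the path part of the URL is compared (case-insensitively);
    the query string is ignored.
    """
    path = urlparse(url).path.lower()
    return path.endswith(tuple(s.lower() for s in suffixes))


print(accepted("http://www.test.com/subs/a.ZIP", (".zip",)))             # True
print(accepted("http://www.test.com/download.php?file=289", (".zip",)))  # False
```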
 
LVL 6

Expert Comment

by:MannSoft
ID: 9592594
You need some sort of file list, whether it's an index.html linking to the files or a directory listing. If you don't know the names of the files, how can you download them? You can't, unless of course you write a brute-force program that tries every possible filename combination up to X characters long. But the admin will probably want to kick your ass if you do that :)
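To put a number on why brute force is hopeless: if filenames are drawn from an alphabet of c characters and are at most X characters long, the number of candidates is a geometric sum. The figures below assume c = 62 (A-Z, a-z, 0-9), look only at a single 8-character random-looking part of a name, and assume 1,000 requests per second; all three are illustrative assumptions, not facts about the site.

```latex
N(X) = \sum_{k=1}^{X} c^{k} = \frac{c^{X+1} - c}{c - 1}

62^{8} = 218\,340\,105\,584\,896 \approx 2.2 \times 10^{14}

\frac{2.2 \times 10^{14}\ \text{requests}}{1000\ \text{requests/s}} = 2.2 \times 10^{11}\ \text{s} \approx 7000\ \text{years}
```

And that is for one fixed length of the random part alone; names like the one in the question have more variable parts, so the real search space is larger still.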
 
LVL 5

Expert Comment

by:delphized
ID: 9592673
If it's only for one site, why don't you write to them and ask for a CD???

Bye!
 

Author Comment

by:Spetson
ID: 9594703
The screen shows:
==================================
wget -r -l1 --no-parent -A.zip http://www.website/_subtitles_/
--07:18:17--  http://www.website:80/_subtitles_/
           => `www.website/_subtitles_/index.html'
Connecting to www.website:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 266 [text/html]

    0K ->                                                        [100%]

07:18:18 (259.77 KB/s) - `www.website/_subtitles_/index.html' saved [266/266]


FINISHED --07:18:18--
Downloaded: 266 bytes in 1 files
===============================================
I believe that MannSoft is right!
And you know... we are not talking about Delphi here anymore... hehe
So if somebody knows some Delphi code to make this work, please help!!!
Otherwise... I will request a CD, like delphized said!!
Thanks, guys, and sorry for my English!
 
LVL 6

Expert Comment

by:MannSoft
ID: 9594732
I just re-read your first message...and you say:

"I'm looking at a web site that has a lot of subtitles (zip format), but I don't have time to download every subtitle by clicking a PHP link.... "

So it sounds like there IS a page that has a list of all the files you can download.  Point wget at that page and it should download all the files it links to.
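Pointing wget at a listing page only helps if that page actually contains links. Here is a minimal Python sketch of collecting those links with the standard library's html.parser; the names LinkCollector and links_on_page are hypothetical, and the sketch assumes the page HTML has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def links_on_page(html: str, base_url: str) -> list[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


page = '<A HREF="subs/a.zip">A</A> <a href="download.php?file=289">B</a>'
for link in links_on_page(page, "http://www.test.com/list.php"):
    print(link)
```

Relative links are resolved with urljoin, so the printed URLs are absolute and could be handed straight to wget.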
 
LVL 17

Expert Comment

by:Wim ten Brink
ID: 9598179
MannSoft is right... The only way to get files through HTTP is to know the names of those files. Quite a few applications use a simple trick to find these names: search a web page for <A href="Whatever"> tags, and whatever each href points to is just another URL you can get more data from. Collect these references to build a list of pages that you want to download.
This technique is called webcrawling, btw. And you have to be careful, since you can end up in an endless loop when pages link back to each other.
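A minimal breadth-first sketch of that webcrawling idea in Python. The visited set is what prevents the endless loop, and max_depth plays roughly the role of wget's -l option. Assumptions: fetch is a caller-supplied (hypothetical) function returning a page's HTML, or an empty string for non-HTML resources; the crawl stays on the starting host; there is no delay between requests and no robots.txt handling, which a real crawler should add.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class _HrefParser(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def crawl(start: str, fetch: Callable[[str], str], max_depth: int = 1) -> list[str]:
    """Breadth-first crawl from start, staying on start's host.

    Returns every URL discovered, in visiting order. The visited set ensures
    each URL is queued only once, so pages linking to each other cannot loop forever.
    """
    host = urlparse(start).netloc
    visited = {start}
    order: list[str] = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # at the depth limit: record the URL but do not follow its links
        parser = _HrefParser()
        parser.feed(fetch(url))
        parser.close()
        for href in parser.hrefs:
            link, _fragment = urldefrag(urljoin(url, href))  # drop "#anchor" parts
            if urlparse(link).netloc == host and link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order


# Hypothetical in-memory site whose pages link to each other in a cycle.
site = {
    "http://h/": '<a href="a.html">a</a> <a href="b.html">b</a> <a href="http://other/z.zip">z</a>',
    "http://h/a.html": '<a href="/">home</a> <a href="x.zip">x</a>',
    "http://h/b.html": '<a href="a.html#top">a</a>',
}
print(crawl("http://h/", lambda u: site.get(u, ""), max_depth=2))
```

Even though the root and a.html link back to each other, the crawl terminates after visiting each URL once; the off-host link is skipped.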
 

Author Comment

by:Spetson
ID: 9598541
OK... now I understand.
I believe I cannot make this crazy idea work, because the server admin
put a "blank index.html" in the subtitles folder precisely to protect the directory.
And maybe it is insane to try to "guess" all the possible file names in this directory
using any kind of loop or Delphi code...

But I will keep this topic open here, because I have already seen incredible things made by Delphi programmers :)

So I will be listening for any suggestions or comments here.

 
 
 
LVL 6

Expert Comment

by:MannSoft
ID: 9598617
Could you clarify what you mean by this:

"I'm looking at a web site that has a lot of subtitles (zip format), but I don't have time to download every subtitle by clicking a PHP link.... "

What PHP link do you have to click? Depending on the format of the page, you should be able to build a file list from it that you can then use to get the files with wget.
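One way to turn such a page into input for wget is to write the matching URLs one per line and pass that file to wget's -i (--input-file) option. A sketch in Python, assuming the hrefs have already been extracted from the page; wget_input_list and its default pattern are hypothetical.

```python
import re
from urllib.parse import urljoin


def wget_input_list(base_url: str, hrefs: list[str], pattern: str = r"\.zip$") -> str:
    """Build the text of a URL list for `wget -i`.

    Each href is resolved against base_url; only URLs matching pattern
    (case-insensitive regex search) are kept, duplicates are dropped while
    keeping first-seen order, and the result has one URL per line.
    """
    keep = re.compile(pattern, re.IGNORECASE)
    seen: set[str] = set()
    urls: list[str] = []
    for href in hrefs:
        url = urljoin(base_url, href)
        if keep.search(url) and url not in seen:
            seen.add(url)
            urls.append(url)
    return "".join(url + "\n" for url in urls)


hrefs = ["a.zip", "/subs/b.zip", "a.zip", "readme.txt", "http://www.test.com/c.ZIP"]
print(wget_input_list("http://www.test.com/subs/list.html", hrefs), end="")
```

Saving the returned text as urls.txt and running wget -i urls.txt would then fetch each listed URL in turn.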
 

Author Comment

by:Spetson
ID: 9598771
OK!
Sorry for my poor explanations; it's because my English isn't very good...
The server uses a MySQL database, so when you click a subtitle link, the link sends you to another page (a CGI),
something like this:
download.php?file=289

So the subtitles page has about 2000 links (all pointing to the same download.php), only the file id changes.
The problem is:
Before the download.php page redirects you to the file, you MUST WAIT in a queue for about 1 minute to get each file!
I just want to skip this queue and get the file.
I know the directory where all the files are, but I don't know all their names (filenames)
:)
 
LVL 6

Accepted Solution

by:MannSoft (earned 300 total points)
ID: 9598984
Okay, I see what you mean now. And yeah, I think there's not much you can do unless by chance the files were named with some sort of pattern. For example, if there were file1.zip, file2.zip and file3.zip, then there's a good chance that there are also file4.zip, file5.zip and file6.zip. But if the zip name describes the contents, like winamp.zip, norton-antivirus.zip or smartftp.zip, then of course that makes it near impossible to guess what else is there.
 
LVL 26

Expert Comment

by:EddieShipman
ID: 9950750
I did something like this not too long ago. Let me see if I can find it.
 

Author Comment

by:Spetson
ID: 9980961
MannSoft said:
"that makes it near impossible to guess what else is there."

Yes... I think you're right, man!
I have worked hard and had no success finding a solution to my question... hehe!

But since you supported me with this question, I decided to give you these 300 points, OK?
Because I really hate to "just close" a topic, so...
Thanks a lot for your comments...

See ya!
