Solved

Download All Files From a URL

Posted on 2003-10-21
Last Modified: 2010-05-18
Hi again, Experts!
I'm looking at a web site that has a lot of subtitles (in zip format), but I don't have time to download every subtitle by clicking a PHP link...
So I want to download all *.zip files from this URL,
something like:
http://www.somesite/subtitles/*.zip

The problem is that this URL has a blank index.htm, so I can't see the zip files, and I have no idea about the file names...
They may be really long names, such as 123456_789_something_833Sl62Bb.zip

So I need code that searches this web directory for the files and downloads them all to my local hard disk.

Any ideas, guys?
Question by:Spetson
 

Expert Comment

by:OpenSourceDeveloper
ID: 9591789
Well, if you only want a program that can do this, I suggest wget:

http://www.interlog.com/~tcharron/wgetwin.html
 

Author Comment

by:Spetson
ID: 9592044
Nope...
I have tried wget, but I only get an index.html file :(
 

Expert Comment

by:OpenSourceDeveloper
ID: 9592118
wget -r -l1 --no-parent -A.gif http://host/dir/

It is a bit of a kludge, but it works perfectly. `-r -l1' means to retrieve recursively (see "Advanced Options" in the wget manual), with a maximum depth of 1. `--no-parent' means that references to the parent directory are ignored (see "Directory-Based Limits"), and `-A.gif' means to download only the GIF files. `-A "*.gif"' would have worked too.
If wget is interrupted in the middle of a download and you do not want to clobber the files already present, add `-nc' (`--no-clobber') to the command.

So, to download all zip files that are linked from www.test.com, you would run:

wget -r -l1 --no-parent -A.zip http://www.test.com/
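To see why this only helps when the start page actually links to the files, here is a rough Python approximation of what `-r -l1 --no-parent -A.zip` does (an illustration, not wget's own implementation). It extracts the `<a href>` links from one page, resolves them against the page URL, skips other hosts and anything above the starting directory, and keeps paths ending in `.zip`. A page with no links, like the blank index.htm described in the question, yields an empty list. The names `zip_links` and `fetch` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def zip_links(html, base_url, suffix=".zip"):
    """Approximate `wget -r -l1 --no-parent -A.zip` on one page: follow only
    links found in `html`, stay on the same host, ignore anything above the
    starting directory, and keep only paths ending in `suffix`."""
    parser = LinkCollector()
    parser.feed(html)
    base = urlparse(base_url)
    # Directory part of the start URL, e.g. "/subtitles/".
    base_dir = base.path[: base.path.rfind("/") + 1]
    found = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(url)
        if parts.netloc != base.netloc:
            continue  # different host
        if not parts.path.startswith(base_dir):
            continue  # like --no-parent: skip links above the start directory
        if parts.path.endswith(suffix) and url not in found:
            found.append(url)
    return found


def fetch(url):
    """Download a page as text (hypothetical helper; no retries or error handling)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `zip_links(fetch("http://www.test.com/"), "http://www.test.com/")` lists the zip files the front page links to; if that list is empty, wget has nothing to follow either.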

Expert Comment

by:MannSoft
ID: 9592594
You need some sort of file list, whether it's an index.html linking to the files or a directory listing.  If you don't know the names of the files, how can you download them?  You can't, unless of course you write a brute-force program that tries every possible file name up to X characters long.  But the admin will probably want to kick your ass if you do that :)

Expert Comment

by:delphized
ID: 9592673
If it's only for one site, why don't you write to them and ask for a CD?

chuss
 

Author Comment

by:Spetson
ID: 9594703
The screen shows:
==================================
wget -r -l1 --no-parent -A.zip http://www.website/_subtitles_/
--07:18:17--  http://www.website:80/_subtitles_/
           => `www.website/_subtitles_/index.html'
Connecting to www.website:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 266 [text/html]

    0K ->                                                        [100%]

07:18:18 (259.77 KB/s) - `www.website/_subtitles_/index.html' saved [266/266]


FINISHED --07:18:18--
Downloaded: 266 bytes in 1 files
===============================================
I believe MannSoft is right!
And you know... we are not talking about Delphi here anymore... hehe
So if somebody knows some Delphi code to make this work, please help!
Otherwise... I will request a CD, like delphized said!
Thanks, guys, and sorry for my English!

Expert Comment

by:MannSoft
ID: 9594732
I just re-read your first message...and you say:

"I'm looking for a web site, that have a lot of subtitles (zip format), but i don't have time to download all subtitles by clicking in a php link.... "

So it sounds like there IS a page that has a list of all the files you can download.  Point wget at that page and it should download all the files it links to.

Expert Comment

by:Wim ten Brink
ID: 9598179
MannSoft is right... The only way to get files through HTTP is to know their names. Quite a few applications use a simple trick to find those names: search a web page for <A href="Whatever"> tags; whatever each href points to is just another URL you can fetch more data from. Collect these references to build a list of pages you want to download.
This technique is called web crawling, by the way. Be careful, though: a crawler can end up in an endless loop (page A links to B, and B links back to A) unless it remembers which pages it has already visited.
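The crawling idea, including the guard against endless loops, can be sketched in Python as a breadth-first search that keeps a `seen` set of URLs already queued. This is a minimal illustration under assumptions: `crawl`, `get_page` and `max_pages` are hypothetical names, pages come from a caller-supplied function (so it can be tried without a network), and it ignores robots.txt, request delays and content types, which a real crawler has to handle.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class HrefParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def crawl(start_url, get_page, max_pages=100):
    """Breadth-first crawl from start_url, staying on the same host.

    get_page(url) returns the HTML text of url, or None if it is not a page.
    The `seen` set is what prevents the endless loop: without it, two pages
    that link to each other would be queued again and again.
    Returns the URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        html = get_page(url)
        if html is None:
            continue  # not a page (e.g. a zip file): nothing to parse
        parser = HrefParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" so a.html#top == a.html.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `max_pages` cap is a second safety net: even with the `seen` set, a site that generates endless distinct URLs (calendars, session IDs) would otherwise keep the crawler busy forever.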
 

Author Comment

by:Spetson
ID: 9598541
OK... now I understand.
I believe I can't make this crazy idea work, because the server admin
put a "blank index.html" in the subtitles folder precisely to protect the directory.
And maybe it's insane to try to "guess" all the possible file names in this directory
with any kind of loop or Delphi code...

But I will keep this topic open, because I have already seen incredible things done by Delphi programmers :)

So I will be listening for any suggestions or comments here.

 
 

Expert Comment

by:MannSoft
ID: 9598617
Could you clarify what you mean by this:

"I'm looking for a web site, that have a lot of subtitles (zip format), but i don't have time to download all subtitles by clicking in a php link.... "

What php link do you have to click?  Depending on the format of the page, you should be able to build a file list from it, that you can then use to get the files with wget.
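A minimal Python sketch of that suggestion, assuming you have saved the listing page's HTML: collect the links you care about into a list with one absolute URL per line, which is the format `wget -i FILE` (`--input-file`) reads. The function names, the listing URL `http://www.website/subs.php` and the `download.php?file=` filter below are hypothetical. Note that wget receives whatever those links return, so if a link leads to a waiting page rather than the file itself, the waiting page is what gets saved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def build_url_list(html, page_url, keep):
    """Return absolute URLs of the links on the page for which keep(url)
    is true, de-duplicated and in page order."""
    parser = HrefParser()
    parser.feed(html)
    urls = []
    for href in parser.hrefs:
        url = urljoin(page_url, href)  # relative link -> absolute URL
        if keep(url) and url not in urls:
            urls.append(url)
    return urls


def url_list_text(urls):
    """One URL per line: the format read by `wget -i FILE`."""
    return "".join(url + "\n" for url in urls)
```

For example, `build_url_list(html, "http://www.website/subs.php", lambda u: "download.php?file=" in u)` keeps only links of that form; save `url_list_text(...)` as `list.txt` and run `wget -i list.txt`.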
 

Author Comment

by:Spetson
ID: 9598771
OK!
Sorry for my poor explanations; my English is not very good...
The server uses a MySQL database, so when you click a subtitle link, the link sends you to another page (a CGI script),
something like this:
download.php?file=289

So the subtitles page has about 2000 links, all pointing to the same download.php and differing only in the file ID.
The problem is:
before download.php redirects you to the file, you MUST WAIT in a queue for about 1 minute for every file!
I just want to skip this queue and get the files.
I know the directory where all the files are, but I don't know the file names.
:)

Accepted Solution

by:MannSoft (earned 300 total points)
ID: 9598984
Okay, I see what you mean now.  And yeah, I think there's not much you can do unless by chance the files were named with some sort of a pattern.  For example, if there were file1.zip, file2.zip, file3.zip then there's a good chance that there are also file4.zip, file5.zip, file6.zip.  But if the zip name describes the contents, like winamp.zip, norton-antivirus.zip, smartftp.zip, then of course that makes it near impossible to guess what else is there.

Expert Comment

by:EddieShipman
ID: 9950750
I did something like this not too long ago. Let me see if I can find it.
 

Author Comment

by:Spetson
ID: 9980961
MannSoft said:
"it near impossible to guess what else is there."

Yes... I think you're right, man!
I worked hard and had no success finding a solution to my question... hehe!

But since you supported me with this question, I decided to give you these 300 points, OK?
Because I really hate to "Just Close" a topic, so...
Thanks a lot for your comments...

See ya!

Question has a verified solution.
