JaffaKREE
asked on
Downloading content using wget
I subscribe to a monthly-pay multimedia content provider. This site hosts a large collection of content I'd like to download, but tends to be very slow - manually downloading is a poor option. I have attempted to use wget to download content, but I always get a 302 redirect instead of the content I'm trying to download.
here's the script I'm using -
wget --load-cookies="administrator@site[2].txt" --load-cookies="administrator@www.site[1].txt" --load-cookies="administrator@site.txt" -np -erobots=off -t10 -w2 --random-wait --waitretry=7 -U "Mozilla/4.03 [en] (X11; I; SunOS 5.5.1 sun4u)" -k --http-user=Mylogin --http-passwd=Mypassword --proxy-user=Mylogin --proxy-passwd=Mypassword -nv -A\*\.\* VideoName
The cookies were generated by logging in through IE.
The login and password work fine through IE/Opera.
Any suggestions on how to get this to work ?
Thanks.
- The site does not mind bots downloading content.
ASKER
Yes, I have copied the cookies over. I wasn't using -m, I'm just trying to hit one file right now.
if I log in through IE I can bounce page-to-page until I close the browser. Then I need to log in again.
I figured logging in to IE, copying the cookie, and using it in the wget script without closing IE would allow me to mimic the IE session. I'm using the IE user-agent string now, but I'm still getting a 302 redirect.
Ideas ?
ASKER CERTIFIED SOLUTION
1. While you are testing with single file download, perhaps it'd be better to remove -A option.
2. Assuming that authentication works, the 301 response seems to indicate that the video file is accessed differently - perhaps not via its name but via a page script that gets passed the name as a parameter. If you turn off wget's non-verbose mode (drop -nv), perhaps you can see where you are being redirected?
Oops, misread the code.
A 302 probably means that the video actually IS served under a different name, so can you see the Location: header you are getting back?
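One way to see where the 302 points, sketched under assumptions: the URL in the usage line is a placeholder, cookies.txt is a hypothetical cookie file in the format that --load-cookies reads, and the wget build is assumed to be recent enough to have --max-redirect (1.11 or later, as far as I know). The helper names extract_location and show_redirect are made up for this example.

```shell
#!/bin/sh
# Pull the first Location: header out of `wget -S` style output on stdin.
extract_location() {
    tr -d '\r' | sed -n 's/^[[:space:]]*[Ll]ocation:[[:space:]]*//p' | head -n 1
}

# Request the URL once, refuse to follow redirects, and print the target.
# -S prints the server's response headers (wget writes them to stderr),
# --max-redirect=0 stops wget from following the 302,
# -O /dev/null throws away any body that does come back.
show_redirect() {
    wget -S --max-redirect=0 --load-cookies=cookies.txt -O /dev/null "$1" 2>&1 |
        extract_location
}

# Usage (placeholder URL, not the real site):
# show_redirect "http://www.example.com/videos/VideoName"
```

If the address printed is the login page, the cookies are probably not being accepted; if it is some other file or script URL, that may be the address to hand to wget instead.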
The Referer header may be an issue as well.
Usually the easiest and quickest way to block bots is a strict check on the Referer header, since most browsers today send a referer URL.
For example, if you are the webmaster, and you know that the videos are available as links on page1 and page2, you can set up a filter in Apache to allow downloads only when the request comes from those pages, and not from others.
Many bots forget to send the right URL as the referer when downloading images, etc.
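If the site does check the referer, wget can send one with its --referer option. A minimal sketch, assuming placeholder URLs and a hypothetical cookies.txt; the function name fetch_with_referer is invented for illustration:

```shell
#!/bin/sh
# Fetch one file while claiming to have come from the page that links to it.
# $1 = URL of the referring page, $2 = URL of the file to download.
fetch_with_referer() {
    wget --referer="$1" --load-cookies=cookies.txt "$2"
}

# Usage (placeholder URLs):
# fetch_with_referer "http://www.example.com/videos/page1.html" \
#                    "http://www.example.com/videos/VideoName"
```

Whether this changes the 302 depends on how the site filters requests, so treat it as something to try rather than a known fix.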
are you specifying the full pathname to the cookie or have you copied it into the working directory?
I don't recognise (without the manpage) some of the options, are you -m mirroring the content?
Mike
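On the cookie file itself: wget's --load-cookies reads the Netscape cookies.txt format (one cookie per line, seven tab-separated fields), which, as far as I know, is not the layout IE uses for its per-site cookie files. A rough check is sketched below; the function name and file name are made up for the example.

```shell
#!/bin/sh
# Succeeds if the file has at least one non-comment line with exactly
# seven tab-separated fields (domain, subdomain flag, path, secure flag,
# expiry, name, value), which is the layout --load-cookies expects.
looks_like_netscape_cookies() {
    awk -F '\t' '!/^#/ && NF == 7 { found = 1 } END { exit !found }' "$1"
}

# Usage (hypothetical file name):
# looks_like_netscape_cookies cookies.txt && echo "format looks usable"
```

A login that only lasts until the browser is closed also suggests a session cookie, which may never be written to disk at all, so a file that passes this check could still be missing the cookie that matters.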