Solved

Downloading content using wget

Posted on 2004-08-12
Last Modified: 2012-06-27
I subscribe to a multimedia content provider that charges a monthly fee.  The site hosts a large collection of content I'd like to download, but it tends to be very slow, so downloading manually is a poor option.  I have tried using wget, but I always get a 302 redirect instead of the content I'm trying to download.

Here's the command I'm using:

wget --load-cookies="administrator@site[2].txt" --load-cookies="administrator@www.site[1].txt" --load-cookies="administrator@site.txt" -np -erobots=off -t10 -w2 --random-wait --waitretry=7 -U "Mozilla/4.03 [en] (X11; I; SunOS 5.5.1 sun4u)" -k --http-user=Mylogin --http-passwd=Mypassword --proxy-user=Mylogin --proxy-passwd=Mypassword -nv -A\*\.\* VideoName

The cookies were generated by logging in through IE.
The login and password work fine through IE/Opera.

Any suggestions on how to get this to work?

Thanks.

- The site does not mind bots downloading content.





Question by:JaffaKREE
6 Comments
 

Expert Comment

by:frugle
ID: 11797402
Looks interesting... I take it you have to log in with a cookie to get to the content.

Are you specifying the full pathname to the cookie file, or have you copied it into the working directory?

I don't recognise some of the options (without the manpage in front of me). Are you mirroring the content with -m?

Mike

Author Comment

by:JaffaKREE
ID: 11803360
Yes, I have copied the cookies over.  I wasn't using -m; I'm just trying to fetch one file right now.

If I log in through IE I can move from page to page until I close the browser.  Then I need to log in again.

I figured that logging in with IE, copying the cookie, and using it in the wget command without closing IE would let me mimic the IE session.  I'm using the IE user-agent string now, but I'm still getting a 302 redirect.

Ideas?


Accepted Solution

by:
frugle earned 500 total points
ID: 11803897
Looking a little deeper into it (you probably have already, so stop me if I'm teaching you to suck eggs)...

--load-cookies seems to work best with cookies in Netscape's cookies.txt format. If you've exported your cookies using IE's import/export routine it *may* work, but copying the cookie file in IE's native format probably won't. Getting the 302 adds weight to this.

What you CAN do as a workaround is manually add the cookie to wget using the following arguments:

wget --cookies=off --header "Cookie: name=value"
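
As a rough sketch of that workaround: the cookie names, values, and URL below are placeholders (assumptions, not taken from the real site), and the real name=value pairs would have to be copied from the browser while the session is still open. Newer wget releases spell the first option --no-cookies instead of --cookies=off.

```shell
# Placeholder cookie names and values; copy the real pairs from the browser.
cookie_pairs="SESSIONID=abc123; member=yes"
url="http://www.site.example/members/VideoName"   # hypothetical URL

# Several cookies go into one header, separated by "; ".
cookie_header="Cookie: $cookie_pairs"
echo "sending header: $cookie_header"

# --no-cookies is the current spelling of the older --cookies=off.
# Set RUN_WGET=1 in the environment to actually send the request.
if [ "${RUN_WGET:-0}" = "1" ]; then
  wget --no-cookies --header "$cookie_header" "$url"
fi
```

Turning off wget's own cookie handling keeps it from sending a second, conflicting Cookie header alongside the manual one.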


If you run wget with

 --save-cookies file

the file it writes may give you a better idea of the format your cookie file needs to be in.
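
For reference, here is a minimal sketch of what a Netscape-format cookie file looks like, written by hand. The domain, cookie name, value, and expiry are made-up placeholders, so treat this as an illustration of the layout and not as something that will log in to the real site.

```shell
# Hypothetical values throughout: site.example, SESSIONID and abc123 are placeholders.
cookie_file="cookies.txt"

# Netscape cookie file: one cookie per line, seven TAB-separated fields:
# domain, include-subdomains flag, path, secure flag, expiry (Unix time), name, value
{
  printf '# Netscape HTTP Cookie File\n'
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    '.site.example' 'TRUE' '/' 'FALSE' '2147483647' 'SESSIONID' 'abc123'
} > "$cookie_file"

# Sanity check: count the fields on each non-comment line.
field_count=$(awk -F '\t' '!/^#/ && NF { print NF }' "$cookie_file")
echo "fields per cookie line: $field_count"

# Then point wget at the file (not run here; the URL is a placeholder):
# wget --load-cookies="$cookie_file" "http://www.site.example/members/VideoName"
```

A file copied straight out of IE's cookie folder does not have this seven-column layout, which would fit the theory that wget is ignoring it.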

good luck,

Mike

Expert Comment

by:gnudiff
ID: 11892521
1. While you are testing with a single file download, it would perhaps be better to remove the -A option.
2. Assuming that authentication works, the 301 response seems to indicate that the video file is accessed differently: perhaps not via its name, but via a page script that gets passed the name as a parameter.  If you turn off wget's non-verbose mode (-nv), perhaps you can see where you are being redirected?

Expert Comment

by:gnudiff
ID: 11892536
Oops, misread the code.
302 means that the video probably IS under a different name, so you should be able to see it in the Location: header you are getting back.
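
One way to look at that header, sketched under assumptions: wget's -S option prints the server's response headers, and a small filter can pull out the Location: line. The sample response, the URL, and the cookie below are invented for illustration and were not captured from the real site; extract_location is just a helper name made up for this sketch.

```shell
# Print the value of every Location: header found on stdin (case-insensitive).
extract_location() {
  awk 'tolower($1) == "location:" { print $2 }'
}

# Illustrative input only, not captured from the real site.
sample='  HTTP/1.1 302 Found
  Location: http://www.site.example/login.asp
  Content-Length: 0'

redirect_target=$(printf '%s\n' "$sample" | extract_location)
echo "redirected to: $redirect_target"

# Against the real server (not run here; URL and cookie are placeholders).
# -S prints the headers on stderr, --spider checks the URL without saving it:
# wget -S --spider --header "Cookie: SESSIONID=abc123" \
#   "http://www.site.example/members/VideoName" 2>&1 | extract_location
```

If the target turns out to be a login page, the session cookie is not being accepted; if it is another content URL, fetching that URL directly may be worth a try.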

Expert Comment

by:eucoders
ID: 13719889
There is also a possible issue with the Referer header.

Usually the easiest and quickest way to block bots is to check the Referer header, since most browsers today send the referring URL.

For example, if you are the webmaster and you know that the videos are available as links on page1 and page2, you can set up a filter in Apache to allow downloads only when the request comes from those pages, and not from others.

Many bots forget to send the right URL as the referer when downloading images, etc.
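
On the client side, wget can send a Referer header with its --referer option. A minimal sketch, assuming the video is linked from a page such as page1; both URLs and the cookie file name are placeholders:

```shell
# Hypothetical URLs: the page that links to the video, and the video itself.
referer_page="http://www.site.example/members/page1.html"
video_url="http://www.site.example/members/VideoName"

# Collect the arguments once so the same command can be printed or run.
set -- --referer="$referer_page" --load-cookies=cookies.txt "$video_url"
echo "wget $*"

# Set RUN_WGET=1 in the environment to actually perform the download.
if [ "${RUN_WGET:-0}" = "1" ]; then
  wget "$@"
fi
```

Whether this site checks the referer at all is a guess; it is cheap to try alongside the cookie fixes.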