Solved

Module for HTTP range retrieval

Posted on 2009-07-15
Last Modified: 2012-05-07
I have been using LWP::Simple and WWW::Mechanize to retrieve files from HTTP servers.

Some of the files are very large and I only need a section of them, so I would like to use the range retrieval capability of most HTTP/1.1 servers.

Is there a Perl module that currently supports this, and if so, how?

thanks
Question by:drunnels
LVL 14

Expert Comment

by:flob9
ID: 24860368

Try "max_size":
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;
$browser->max_size(500);    # cap the response content at roughly 500 bytes
my $url = 'http://www.google.com/';
my $response = $browser->get($url);

print $response->content;

 

Author Comment

by:drunnels
ID: 24860506
Thanks, but I'm not trying to limit the size from the beginning of the file; rather, I want to be able to specify a starting point. For instance, I may have a 500 MB file and want the download to start 400 MB into it and continue to the end of the file.
 
LVL 7

Expert Comment

by:Fairlight2cx
ID: 24861091
LVL 7

Expert Comment

by:Fairlight2cx
ID: 24861214
Actually, HTTP::Range may not be exactly what you need, since it's related to segmenting. BUT... the docs show that it uses the Range and Content-Range headers of the HTTP protocol. Those are detailed in the spec (RFC 2616) at ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt. (See section 14; Range is 14.35 and Content-Range is 14.16.)

You should be able to use the header() or push_header() methods of HTTP::Request to add the appropriate information to your request and achieve your goal using full-on LWP, however.
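The header() approach described above might look roughly like the following. This is only a sketch under stated assumptions: the helper names `build_range_request` and `fetch_from_offset` are hypothetical, the URL is a placeholder, and "400 meg" is taken to mean 400 * 1024 * 1024 bytes. The last line prints the request without sending it, so nothing goes over the network unless `fetch_from_offset` is called.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Hypothetical helper: a GET request for everything from $start to the end of the file.
sub build_range_request {
    my ($url, $start) = @_;
    my $req = HTTP::Request->new(GET => $url);
    $req->header(Range => "bytes=$start-");   # open-ended range: no last-byte position
    return $req;
}

# Hypothetical helper: send the request and return the body.
sub fetch_from_offset {
    my ($url, $start) = @_;
    my $res = LWP::UserAgent->new->request(build_range_request($url, $start));
    die "Request failed: ", $res->status_line, "\n" unless $res->is_success;
    # 206 means the range was honoured; a plain 200 means the whole file came back.
    warn "Server ignored the Range header\n" unless $res->code == 206;
    return $res->content;
}

# Print the request without sending it (placeholder URL, assumed 400 MiB offset).
print build_range_request('http://example.com/big.iso', 400 * 1024 * 1024)->as_string;
```

Checking the status code matters because a server that does not support ranges is free to ignore the header and return the full file with a 200.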
 
LVL 14

Accepted Solution

by:
flob9 earned 500 total points
ID: 24861708

use HTTP::Request;
use LWP::UserAgent;

# Ask the server for only the first 100 bytes of the file
my $req = HTTP::Request->new(GET => "http://cdimage.debian.org/debian-cd/5.0.2/i386/iso-cd/debian-502-i386-netinst.iso");
$req->header(Range => "bytes=0-99");
my $res = LWP::UserAgent->new->request($req);
print $res->as_string;

Response:

HTTP/1.1 206 Partial Content
Connection: close
Date: Wed, 15 Jul 2009 17:22:13 GMT
Accept-Ranges: bytes
Age: 3481
ETag: "d871c8-9608000-46d7ab1025380"
Server: Apache/2.2.9 (Unix)
Content-Length: 100
Content-Range: bytes 0-99/157319168
Content-Type: application/octet-stream
Last-Modified: Mon, 29 Jun 2009 11:07:10 GMT
Client-Date: Wed, 15 Jul 2009 17:22:13 GMT
Client-Peer: 130.239.18.138:80
Client-Response-Num: 1
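
The Content-Range header in a 206 response like this one reports which bytes were returned and the total size of the file. A small sketch of pulling those numbers out, using the header value from the response above; the helper name `parse_content_range` is hypothetical and not part of LWP.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Content-Range value into first byte, last byte and total size.
# Returns an empty list if the value is not in the "bytes FIRST-LAST/TOTAL" form.
sub parse_content_range {
    my ($value) = @_;
    return unless defined $value
        && $value =~ m{^bytes\s+(\d+)-(\d+)/(\d+|\*)$};
    my ($first, $last, $total) = ($1, $2, $3);
    $total = undef if $total eq '*';   # server did not report the total size
    return ($first, $last, $total);
}

my ($first, $last, $total) = parse_content_range('bytes 0-99/157319168');
printf "bytes %d to %d of %d (%d bytes received)\n",
    $first, $last, $total, $last - $first + 1;
```

With a real response object, the value would come from `$res->header('Content-Range')`.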

 

Author Closing Comment

by:drunnels
ID: 31603784
Thanks. This was exactly what I needed. The only thing I'd add to your answer is that to get just the page content, one would add:
$content = $res->content;
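
The body is available through the public `content` method of the response object, which avoids depending on the object's internal hash layout. A minimal sketch of saving it to a file follows; the helper name `save_body` is hypothetical, and the response is built locally as a stand-in for a real 206 reply so the example runs without a network connection.

```perl
use strict;
use warnings;
use HTTP::Response;
use File::Temp qw(tempfile);

# Hypothetical helper: write the body of a response to a file in binary mode.
sub save_body {
    my ($res, $path) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!\n";
    binmode $fh;                      # partial downloads are usually binary data
    print {$fh} $res->content;        # public accessor for the response body
    close $fh or die "Cannot close $path: $!\n";
    return -s $path;                  # number of bytes written
}

# Stand-in for a real 206 response, built locally so the sketch runs offline.
my $res = HTTP::Response->new(
    206, 'Partial Content',
    [ 'Content-Range' => 'bytes 0-4/10' ],
    'hello',
);

my (undef, $path) = tempfile(UNLINK => 1);
my $written = save_body($res, $path);
print "Wrote $written bytes to $path\n";
```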
