Solved

Timing out on large file connections using URLConnection

Posted on 2009-07-02
Last Modified: 2012-06-27
I don't write in Java too often so this may seem a little basic.

I'm using URLConnection to scrape some sites. It works fine until it tries to scrape something like a large video. So I put a little function in to grab headers first, check for "text" in the content type, then continue from there.

This works fine for smaller items, and it catches my trap for zip files and the like that aren't that large to begin with. But I tested with one really large MOV file, and it just hung there trying to connect (just to get the headers).

I tried adding setConnectTimeout and setReadTimeout, first at 5000 ms, then 1000 ms, then 100 ms, but I must be using them incorrectly because they aren't canceling the connection (even when I brought it down to 100 ms).

Any help is appreciated, thanks.
// Getting headers (sans the function/catches etc)
u = new URL( $url );
uc = u.openConnection();
uc.setConnectTimeout( 1000 );
uc.setReadTimeout( 1000 );
header = uc.getContentType();
System.out.println( header );
return header;
 
// Get page data (sans other stuff)
String headers			=	getHeaders( $url );
if( headers.indexOf("text") < 0 ){
	System.out.print( "Bad headers : " + headers );
	return "";
}else{
	System.out.println( "Good headers, continue ");
}
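
One detail in the check above: the content type can come back as null (for example when no Content-Type header arrives), and calling indexOf on null throws a NullPointerException. A minimal null-safe sketch follows. The class name ContentTypeCheck and the method isText are made up for illustration, and matching on the "text/" prefix is an assumption about what should count as scrapeable.

```java
import java.util.Locale;

public class ContentTypeCheck {
    /** Returns true only for a non-null content type whose media type starts with "text/". */
    public static boolean isText(String contentType) {
        return contentType != null
                && contentType.trim().toLowerCase(Locale.ROOT).startsWith("text/");
    }

    public static void main(String[] args) {
        System.out.println(isText("text/html; charset=UTF-8")); // true
        System.out.println(isText("video/quicktime"));          // false
        System.out.println(isText(null));                       // false
    }
}
```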

Question by:MattKenefick
14 Comments
 
LVL 92

Expert Comment

by:objects
ID: 24769037
You can also check the content length to determine the amount of data.
There is no way to set a timeout on URLConnection; you would need to implement that yourself, for example by timing out the thread that is pulling the file.

HttpClient, though, would handle this better than URLConnection.
 
LVL 4

Author Comment

by:MattKenefick
ID: 24769652
I thought of that, but I'm not sure if I can use HttpClient because I'm sandboxed. I thought of checking content-length, but I figured that if it took that long to read the content-type header, it would be the same for content-length since they're both headers. Does it read the content-length first?

And how are the timeouts supposed to be used? I assumed it'd be the length of time until it cancels the connection, but apparently not? Unless I'm using it wrong. Any input on this? Thanks for the assistance.
 
LVL 6

Expert Comment

by:jwenting
ID: 24769691
HttpClient uses the same connections as does URLConnection, it just handles them for you.
 
LVL 4

Author Comment

by:MattKenefick
ID: 24769739
Okay, I suppose I can try rolling back to that but I'd still also like to know about content-type vs content-length and the reality behind what the timeouts are all about. Thanks jwenting!
 
LVL 92

Expert Comment

by:objects
ID: 24770055
> I thought of that, but I'm not sure if I can use HttpClient because I'm sandboxed.

sandbox does not affect what you can use

> I thought of checking content-length but I figured that if it took that long to read the content-type header that it would be same for content-length since they're both headers. Does it read the content-length first?

It doesn't need to read the content to access the headers.

> And how are the timeouts supposed to be used?

They set how long it will wait for the server to respond.


 
LVL 4

Author Comment

by:MattKenefick
ID: 24770080
@objects

I know headers are supposed to come before content. That's why I said content-type was taking a long time to load, as if it was loading content, which is exactly the problem.

I understand the connect timeout should run until the connection is established, but isn't setReadTimeout how long it has to read the content? Like if it's a 10 MB video and your setReadTimeout is 2000 ms, shouldn't it throw an error? Because mine doesn't.
 
LVL 92

Expert Comment

by:objects
ID: 24770085
so are you not even reading the content? That doesn't sound right.

> but isn't setReadTimeout how long for it to read the content?

No, it is how long it will wait for a read to respond.
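
To make the distinction concrete, here is a minimal sketch, assuming standard JDK behavior: setConnectTimeout bounds how long establishing the connection may take, and setReadTimeout bounds each individual wait for data, not the total download time. A large file that keeps sending bytes would therefore not trip a 2000 ms read timeout. The class name ReadTimeoutDemo and the silent local ServerSocket are made up for illustration. The sketch calls getInputStream() because, as far as I know, the JDK's HTTP handler lets header getters such as getContentType() return null instead of throwing when the request fails.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URLConnection;

public class ReadTimeoutDemo {
    /** Returns true if waiting for the server's response hit the timeout. */
    public static boolean timesOut(String url, int timeoutMs) throws IOException {
        URLConnection uc = URI.create(url).toURL().openConnection();
        uc.setConnectTimeout(timeoutMs); // limit for establishing the connection
        uc.setReadTimeout(timeoutMs);    // limit for each blocking read, not the whole transfer
        try {
            uc.getInputStream().close(); // sends the request and waits for the response headers
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test server: it listens but never answers, so the read must time out.
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            System.out.println("timed out: " + timesOut(url, 500));
        }
    }
}
```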
 
LVL 92

Accepted Solution

by:
objects earned 500 total points
ID: 24770100
you could send a HEAD request to just get the content type :)
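
A rough sketch of the HEAD approach, assuming the URL is HTTP so the connection can be cast to HttpURLConnection. The class name HeadRequest, the method headContentType, and the example.com URL are placeholders, not anything from the original code. A HEAD request asks the server for the headers only, so no body is transferred.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class HeadRequest {
    /** Sends a HEAD request and returns the Content-Type header, or null if absent. */
    public static String headContentType(String url, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");   // headers only, no response body
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        try {
            conn.getResponseCode();      // performs the request; throws IOException on failure or timeout
            return conn.getContentType();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass a real one as the first argument.
        String url = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println(headContentType(url, 5000));
    }
}
```

Calling getResponseCode() before reading the header means a timeout shows up as an exception rather than as a null content type.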
 
LVL 86

Expert Comment

by:CEHJ
ID: 24770356
This is ultimately a threading problem. Why?

A large file is going to have the same order of number of headers as a small one. The problem is also not to do with connect or read timeout. If it were, you'd be seeing exceptions.

You're probably experiencing other networking problems and you can certainly expect to get them from time to time. What shouldn't happen is for this to become a problem for your program, having to wait. That's the threading problem.

The fetch should be done in a separate thread so that your main thread can take the appropriate action when the worker thread is kept waiting around too long
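
One way to sketch that idea, assuming java.util.concurrent is available: run the fetch on a worker thread and have the caller wait on a Future with a deadline. The class name TimedFetch, the method callWithTimeout, and the fallback value are hypothetical. Note that cancelling only interrupts the worker, and a thread blocked in socket I/O may not react to the interrupt, so the worker is made a daemon thread so it cannot keep the JVM alive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedFetch {
    /** Runs the task on a worker thread; returns fallback if it takes longer than timeoutMs. */
    public static <T> T callWithTimeout(Callable<T> task, long timeoutMs, T fallback)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fetch-worker");
            t.setDaemon(true); // a stuck worker must not block JVM shutdown
            return t;
        });
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker
            return fallback;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a slow fetch: sleeps far longer than the allowed wait.
        String result = callWithTimeout(() -> {
            Thread.sleep(5000);
            return "text/html";
        }, 200, "timed out");
        System.out.println(result); // prints "timed out"
    }
}
```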
 
LVL 4

Author Comment

by:MattKenefick
ID: 24776330
Good little chart about the differences between HTTPClient and URLConnection http://www.innovation.ch/java/HTTPClient/urlcon_vs_httpclient.html
 
LVL 92

Expert Comment

by:objects
ID: 24776332
I was referring to Apache HttpClient:

http://hc.apache.org/httpclient-3.x/
 
LVL 4

Author Comment

by:MattKenefick
ID: 24778933
Used this to get the headers without timing out. Thanks.
// setDoOutput(true) appears unnecessary for a HEAD request, since no request body is written
this.hurlc.setDoOutput(true);
try {
    this.hurlc.setRequestMethod("HEAD");
    returnString = this.hurlc.getContentType();
    this.hurlc.disconnect();
} catch ( ProtocolException e ) {
    // ...
} catch ( RuntimeException e ) {
    // ...
}

 
LVL 4

Author Closing Comment

by:MattKenefick
ID: 31599427
Not everything I wanted to know, but the majority of it.
 
LVL 4

Author Comment

by:MattKenefick
ID: 24778937
To elaborate a little further, I switched from URLConnection to HttpURLConnection. The code that creates the "hurlc" is like this:
this.hurlc = (HttpURLConnection) new URL("http://...com").openConnection();
