noobie210 asked:

How do I set a read timeout in a Java URLConnection?

I am using Java to connect to a URL and read the contents of the page.
The code looks like what I have posted below. However, the connect and/or read timeouts don't seem to work and the program hangs. Examples of URLs that can't be read are http://www.ajc.com and http://www.statesman.com. I can supply more examples of such URLs, although they load perfectly well in a browser.

I have also tried:
System.setProperty("sun.net.client.defaultConnectTimeout", "5000");
System.setProperty("sun.net.client.defaultReadTimeout", "5000");
but these don't seem to work either.

try {
	newURL = new URL(CURLVal);
	HttpURLConnection conn = (HttpURLConnection) newURL.openConnection();
	conn.setRequestProperty("User-Agent",
			"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)");
	conn.setConnectTimeout(5000);  // max time to establish the connection (ms)
	conn.setReadTimeout(5000);     // max time any single read may block (ms)

	InputStreamReader isr = new InputStreamReader(conn.getInputStream());
	BufferedReader in = new BufferedReader(isr);
	String inputLine;

	while ((inputLine = in.readLine()) != null) {
		contents = contents + inputLine;
	}
	in.close();

} catch (SocketTimeoutException e) {
	// Log the timeout; an empty catch block makes a timeout look like a hang.
	System.err.println("Timed out reading " + CURLVal + ": " + e.getMessage());
} catch (Exception e) {
	System.err.println("Failed to read " + CURLVal + ": " + e.getMessage());
}


CEHJ

What makes you think it's a timeout issue? For me, in Java, it simply doesn't return any data. That's not unusual.
noobie210 (ASKER)
I want the program to either read the contents or time out instead of hanging.

CEHJ
It will only time out if there are connection problems or problems with reading the content. There aren't any, as far as I can see.

noobie210 (ASKER)
If there is no connection problem and the page can be read in a browser, why isn't Java able to read it?

CEHJ
There are many ways in which sites prevent non-human interaction. Often cookies are used, which are not handled by your code. You should probably use an HTTP library such as Apache HttpClient, which will handle cookies for you.

noobie210 (ASKER)
That is an interesting pointer. I wrote an HttpClient program, but it still hangs when I try to access these URLs. I could have made errors in writing the program, though. However, if I am not wrong, I can handle cookies without using HttpClient as well. Let me try whether that is possible. Thank you for pointing out that the problem was not what I thought it was. Any other suggestions would be very welcome.
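For reference, on Java 6 or later plain HttpURLConnection can pick up cookie support by installing a default CookieManager once at startup. A minimal sketch (the class and method names here are just for illustration, and ACCEPT_ALL is only one possible policy):

import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

public class CookieSetup {
	public static void install() {
		// Install a JVM-wide cookie store once; every later HttpURLConnection
		// will automatically remember cookies from Set-Cookie headers and
		// send them back on subsequent requests to the same site.
		CookieManager manager = new CookieManager();
		manager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
		CookieHandler.setDefault(manager);
	}
}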

ASKER CERTIFIED SOLUTION
CEHJ
(solution text available to Experts Exchange members only)
noobie210 (ASKER)
Thank you for saving me the time of writing cookie handling myself. I am not a very good programmer.

The HttpClient code I wrote is something like this:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class HttpClientTitle {

	// Extract the <title> of the page at CURLVal (CNAMEVal is unused here).
	public String getTitle(String CURLVal, String CNAMEVal) {

		if (CURLVal.endsWith(".pdf")) {
			return "PDF File";
		}

		// Create an instance of HttpClient.
		HttpClient client = new HttpClient();

		// Create a method instance.
		GetMethod method = new GetMethod(CURLVal);
		method.setRequestHeader("User-Agent",
				"Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");

		// Provide a custom retry handler if necessary.
		method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
				new DefaultHttpMethodRetryHandler(3, false));

		String contents = "";
		String title = "";

		try {
			// Execute the method.
			int statusCode = client.executeMethod(method);

			if (statusCode != HttpStatus.SC_OK) {
				System.err.println("Method failed: " + method.getStatusLine());
			}

			// Read the response body line by line.
			InputStream istream = method.getResponseBodyAsStream();
			InputStreamReader isr = new InputStreamReader(istream, "utf-8");
			BufferedReader in = new BufferedReader(isr);
			String inputLine;

			while ((inputLine = in.readLine()) != null) {
				contents = contents + inputLine;
			}
			istream.close();

			// Pull out whatever sits between <title> and </title>, if present.
			int start = contents.indexOf("<title>");
			int end = contents.indexOf("</title>");
			if (start >= 0 && end > start) {
				title = contents.substring(start + "<title>".length(), end).trim();
			}
		} catch (HttpException e) {
			System.err.println("Fatal protocol violation: " + e.getMessage());
		} catch (IOException e) {
			System.err.println("Fatal transport error: " + e.getMessage());
		} finally {
			// Release the connection.
			method.releaseConnection();
		}

		return title;
	}
}

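One thing that may explain why the HttpClient version still hangs: like URLConnection, Commons HttpClient 3.x waits indefinitely unless timeouts are set explicitly. A rough sketch of where they would go, assuming the same HttpClient 3.x classes imported above (the class and method names below are just for illustration):

import org.apache.commons.httpclient.HttpClient;

public class HttpClientTimeouts {
	// Configure a Commons HttpClient 3.x instance so it cannot block forever.
	public static HttpClient newClientWithTimeouts(int millis) {
		HttpClient client = new HttpClient();
		// Give up on establishing the TCP connection after 'millis' ms.
		client.getHttpConnectionManager().getParams().setConnectionTimeout(millis);
		// Give up if any single socket read blocks longer than 'millis' ms (SO_TIMEOUT).
		client.getHttpConnectionManager().getParams().setSoTimeout(millis);
		return client;
	}
}

Swapping the bare new HttpClient() in getTitle() for something like newClientWithTimeouts(5000) should at least turn a silent hang into a thrown exception.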

Older versions of the HttpURLConnection class did not provide a timeout; we implemented one in our last project by extending it to also have a timeout handler. I do not have access to the code, but the link below will give you pointers on how it's implemented.
http://www.logicamente.com/sockets.html
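That approach predates the built-in setConnectTimeout/setReadTimeout, but the idea of an external deadline is still worth keeping in mind: setReadTimeout only bounds each individual read, so a server that keeps trickling data slowly will never trip it. A rough sketch of one way to impose an overall deadline with java.util.concurrent, where fetchContents() is a hypothetical stand-in for the download code posted above:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DeadlineFetch {
	// Run the download on a worker thread and abandon it after 15 seconds,
	// regardless of how the connection itself behaves.
	public static String fetchWithDeadline(final String url) throws Exception {
		ExecutorService executor = Executors.newSingleThreadExecutor();
		Future<String> future = executor.submit(new Callable<String>() {
			public String call() throws Exception {
				return fetchContents(url); // hypothetical: the URLConnection read shown earlier
			}
		});
		try {
			return future.get(15, TimeUnit.SECONDS); // hard overall deadline
		} catch (TimeoutException e) {
			future.cancel(true); // interrupt the worker if it is still running
			return "";
		} finally {
			executor.shutdown();
		}
	}

	// Hypothetical placeholder for the HttpURLConnection download shown earlier.
	private static String fetchContents(String url) throws Exception {
		return "";
	}
}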
 
noobie210 (ASKER)
This is a problem I've been returning to off and on for a very long time. Like many others who have posted questions about it, I assumed that the fault lay with Java, in that it did not allow connection and read timeouts, or that the timeouts it allowed did not work.

I see now that the problem, for most URLs, lay elsewhere. Often it was just that the page was so large, sometimes 25,000 lines long, that it took ages just to read the file. In some cases, the error was elsewhere in the code, e.g. I was trying to follow an http-equiv="refresh" erroneously.
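On the "took ages" point, one likely contributor is that contents = contents + inputLine copies the whole accumulated string on every line, which gets quadratically slower as the page grows; a StringBuilder keeps it linear, and if only the title is needed the loop can stop early. A small sketch, assuming a BufferedReader over the response as in the code above (the class and method names are just for illustration):

import java.io.BufferedReader;
import java.io.IOException;

public class PageReader {
	// Accumulate the page with a StringBuilder (linear time) and stop as soon
	// as the closing </title> tag has been seen.
	public static String readUpToTitle(BufferedReader in) throws IOException {
		StringBuilder contents = new StringBuilder();
		String inputLine;
		while ((inputLine = in.readLine()) != null) {
			contents.append(inputLine).append('\n');
			if (contents.indexOf("</title>") >= 0) {
				break; // no need to read thousands more lines just for a title
			}
		}
		return contents.toString();
	}
}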

Thank you for showing that I was making errors, and for confidently asserting that it need not be a timeout problem at all.
:)