Read in a URL problem (sometimes)

Hokester
I am trying to read in a URL and copy it to a local file on my computer. This works most of the time, but for some reason (I am doing this about 1100 times) it gets stuck somewhere in the reading part of the code. I have tried reading one character at a time and also one line at a time, and though it almost always works, every once in a while it doesn't, and the program hangs because no error is thrown. Anyone ever seen this before?

      StringBuffer stringBuffer = new StringBuffer();
      try
      {
        URL url = new URL(urlString);
        URLConnection connection = url.openConnection();
        connection.setDoInput(true);
        InputStream inStream = connection.getInputStream();
        BufferedReader input =
           new BufferedReader(new InputStreamReader(inStream));

        int line;
        while ((line = input.read()) != -1)
        {
           stringBuffer.append((char)line);
           stringBuffer.append('\n');
        }
        input.close();
        inStream.close();
        File tempFile = new File(tempHTMLString);
        BufferedWriter bw = new BufferedWriter(new FileWriter(tempFile));
        for(int i=0;i<stringBuffer.length();i++)
        {
          bw.write(stringBuffer.charAt(i));
        }
        stringBuffer.setLength(0);
        bw.close();
      }
      catch (Exception e)
      {
        System.out.println(e.toString());
      }
Top Expert 2016

Commented:
This could simply be network problems. You seem to be writing a newline after every character at the moment btw.

Author

Commented:
Yah... sorry... I caught that too, just after I posted.
I took that out and got the same problem.

How can I detect if it loses the connection somewhere in there, so I can reconnect or restart? It seems that if it loses the connection during input.read() it doesn't give me an error, because it's stuck in a loop somewhere.
Top Expert 2016

Commented:
Yes, read will block. I think the only way to do this is with non-blocking reads. What version of Java are you using?

Author

Commented:
I'm running 1.3.1 at home and I've tried it on 1.4.0 at work, and got the problem in both places. How do I do non-blocking reads?
Top Expert 2016

Commented:
There are some examples at

http://java.sun.com/j2se/1.4/docs/guide/nio/example/index.html

but I'm not sure how relevant they'll be. Google might be good ;-)

Author

Commented:
You can't imagine how many pages I've looked at on Google...
They all say the same thing basically, and for the most part they are right, but every once in a while the read or readLine will bomb.
Top Expert 2016

Commented:

>>(I am doing this about 1100 times)

Why is this, btw?
Top Expert 2016

Commented:
What you should probably do here is have at least two threads. If you do the downloading in a separate thread, it won't block the main program execution. For something a little more sophisticated, you could have another thread checking that the reading thread has not blocked for too long.

Author

Commented:
The main program is on a thread right now that is doing the downloading.

Could you give me an example of what you mean by
"For something a little more sophisticated, you could have another thread checking that the reading thread has not blocked for too long."?

Author

Commented:
The reason I am doing this is that I am downloading data files for players from ESPN, and there are about 1100 players, so I need a file for each player.

Commented:
I'd implement a TimeoutInputStream that would have two threads, one for reading, and another one for triggering a timeout after a (configurable) while.

Clues here, no time to make a real class (rough sketch below):
- extend InputStream
- start a Thread in the constructor that waits for that period and
  after that calls a method "timeout()" or something; grab a hold of
  the currentThread in a class variable
- method "timeout()" checks if the reader thread is still alive, and
  if it is, interrupt()s it using the handle stored in the constructor
- I'm not sure if this already makes it throw an InterruptedException;
  if not, do some other violence to the stalled reader thread

HTH
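
Roughly something like this, a completely untested sketch with the class and method names invented on the spot (one timeout for the whole stream rather than per read):

  import java.io.IOException;
  import java.io.InputStream;

  // Untested sketch. The constructor remembers the thread that created the
  // stream (assumed to be the one that will read it) and starts a watchdog.
  class TimeoutInputStream extends InputStream {

    private final InputStream in;
    private final Thread readerThread;     // "grab a hold of the currentThread"
    private volatile boolean finished = false;

    TimeoutInputStream(InputStream in, final long timeoutMillis) {
      this.in = in;
      this.readerThread = Thread.currentThread();
      Thread watchdog = new Thread(new Runnable() {
        public void run() {
          try {
            Thread.sleep(timeoutMillis);
          } catch (InterruptedException e) {
            return;
          }
          timeout();
        }
      });
      watchdog.setDaemon(true);
      watchdog.start();
    }

    // Called by the watchdog once the timeout has elapsed.
    private void timeout() {
      if (!finished && readerThread.isAlive()) {
        readerThread.interrupt();          // may be ignored by a blocked socket read...
        try {
          in.close();                      // ...so also close the stream: the "other violence"
        } catch (IOException ignored) {
        }
      }
    }

    public int read() throws IOException {
      int b = in.read();
      if (b == -1) {
        finished = true;                   // normal end of stream, watchdog stands down
      }
      return b;
    }
  }

You'd wrap the stream from the connection, e.g. InputStream inStream = new TimeoutInputStream(connection.getInputStream(), 60000); and leave the rest of your code alone. A stalled read should then fail with an IOException instead of hanging forever.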
Top Expert 2016
Commented:
So, do I assume correctly that you'll have 1100 different urlStrings?

>>URL url = new URL(urlString);


arauramo - your suggestion seems vaguely familiar ;-)

>>I'm not sure if this already makes it throw an InterruptedException

If it's blocked in a read, it won't. You'd have to close the lowest stream with another thread.
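
Something along these lines, just a sketch (untested, names made up): a small helper thread that closes the stream if the download hasn't finished within a time limit, which makes the blocked read() fail.

  import java.io.IOException;
  import java.io.InputStream;

  // Closes the given stream after a delay unless cancelled first. If the main
  // loop is stuck in read(), the close() makes that read() throw an
  // IOException, which your existing catch block can treat as "retry this URL".
  class StreamCloser extends Thread {

    private final InputStream in;
    private final long delayMillis;
    private volatile boolean cancelled = false;

    StreamCloser(InputStream in, long delayMillis) {
      this.in = in;
      this.delayMillis = delayMillis;
      setDaemon(true);
    }

    public void cancel() {
      cancelled = true;
      interrupt();            // wake the sleep so this thread exits promptly
    }

    public void run() {
      try {
        Thread.sleep(delayMillis);
      } catch (InterruptedException e) {
        return;               // cancelled before the timeout fired
      }
      if (!cancelled) {
        try {
          in.close();         // forces a blocked read() to fail with IOException
        } catch (IOException ignored) {
        }
      }
    }
  }

Start it right after getInputStream() - StreamCloser closer = new StreamCloser(inStream, 60000); closer.start(); - and call closer.cancel() once the read loop finishes normally.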

Author

Commented:
Yes, I have 1100 different URL strings, and this piece of code only gets called on the next string when the current one finishes.

And yes, it's blocked in a read, so it never throws any kind of Exception during that read.

Can you go into a little more detail about how I would interrupt the reading thread? I'm not sure how I would do that. Thanks
Top Expert 2016

Commented:
>>and this piece of code only gets called on the next string when the current one finishes

In that case, you'd be better off spinning off a new thread for each URL. If there's a problem with one or more of them, it, or they, won't then interfere with the others.

Commented:
> it gets stuck somewhere in the read in part of the code.

Sounds like a communication / server-side problem. The TCP connection is not being closed normally.

Commented:
Hm, looks like the Socket beneath the URLConnection is hidden, as far as I can see.

How about you take an actual Socket, setSoTimeout on that, getInputStream() and read the data? Of course then you won't have the convenience of URLConnection, so you have to check headers yourself. Complexity depends a bit on the data you are dealing with.
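
An untested sketch of that raw-Socket approach (names invented; assumes plain HTTP, no redirects, no chunked encoding). With setSoTimeout() a blocked read throws java.io.InterruptedIOException after the timeout instead of hanging forever:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.io.OutputStreamWriter;
  import java.io.PrintWriter;
  import java.net.Socket;

  class SocketFetcher {

    // Fetches the body of http://host/path with a read timeout.
    static String fetch(String host, String path, int timeoutMillis)
        throws Exception {
      Socket socket = new Socket(host, 80);
      socket.setSoTimeout(timeoutMillis);    // read timeout in milliseconds
      try {
        PrintWriter out = new PrintWriter(
            new OutputStreamWriter(socket.getOutputStream()), true);
        out.print("GET " + path + " HTTP/1.0\r\n");
        out.print("Host: " + host + "\r\n\r\n");
        out.flush();

        BufferedReader in = new BufferedReader(
            new InputStreamReader(socket.getInputStream()));
        StringBuffer body = new StringBuffer();
        String line;
        boolean inHeaders = true;
        while ((line = in.readLine()) != null) {
          if (inHeaders) {
            if (line.length() == 0) {        // blank line ends the headers
              inHeaders = false;
            }
          } else {
            body.append(line).append('\n');
          }
        }
        return body.toString();
      } finally {
        socket.close();
      }
    }
  }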
Top Expert 2016

Commented:
It makes no sense to do this in one thread. The probability of all the rest getting held up by 'bad apples' seems to me to be quite high.

Commented:
...and from there, I found these:
http://www.innovation.ch/java/HTTPClient/
and
http://www.larsan.net/java/index.jsp?page=tricks/url.html

Both provide a URLConnection with timeout. =)
Commented:
The only possible solution seems to be

"Keep checking in a loop (with a small sleep inside) whether the stream has any data. Only when ready() becomes true do you do a read()."

Not sure if it will work.
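
A very rough sketch of that (untested, method name invented). The ugly part is that ready() == false can mean either "nothing yet" or "server closed the connection", so the idle limit ends up doing double duty as both the give-up point and the end-of-download signal, which is part of why I'm not sure it really works:

  // Read only when ready(); give up (or call it done) after a quiet period.
  StringBuffer readWithPolling(BufferedReader input, long idleLimitMillis)
      throws Exception {
    StringBuffer buffer = new StringBuffer();
    long lastData = System.currentTimeMillis();
    while (true) {
      if (input.ready()) {                  // a read() now should not block
        int c = input.read();
        if (c == -1) {
          break;                            // explicit end of stream
        }
        buffer.append((char) c);
        lastData = System.currentTimeMillis();
      } else {
        if (System.currentTimeMillis() - lastData > idleLimitMillis) {
          break;                            // finished, or dead: can't tell which
        }
        Thread.sleep(100);                  // small sleep between polls
      }
    }
    return buffer;
  }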

Commented:
> The only possible solution seems to be

if you REALLY have to use URLConnection of course :)

Commented:
And further, a direct copy from the forum:

"Just so people are aware - J2SE 1.4 does have a system property to set the timeout when reading from URL connections. It's sun.net.client.defaultReadTimeout and specifies the timeout in ms. Going forward it would be useful to have programmatic control of the timeout. "

Author

Commented:
arauramo,

Have you tried this (the direct copy from the forum):

"Just so people are aware - J2SE 1.4 does have a system property to set the timeout when reading from URL connections. It's sun.net.client.defaultReadTimeout and specifies the timeout in ms. Going forward it would be useful to have programmatic control of the timeout."

or seen any examples of it?

By the way, the thread you posted at http://forum.java.sun.com/thread.jsp?forum=11&thread=17737
seems to be what my problem is, but it doesn't seem like anyone agrees on a solution.

Author

Commented:
I found this piece of code:
System.setProperty("sun.net.client.defaultReadTimeout","30000"); where 30000 is 30 seconds. I'll give it a shot!
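
There is apparently a matching connect-timeout property as well (untested on my end, and these sun.net.* properties are Sun JDK 1.4+ only, so nothing on 1.3.1), so something like:

  // Sun JDK 1.4+ only; values are in milliseconds. Setting them early,
  // before the first connection is opened, is the safe bet.
  System.setProperty("sun.net.client.defaultConnectTimeout", "30000");
  System.setProperty("sun.net.client.defaultReadTimeout", "30000");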


Commented:
>have you tried this

No, I haven't. I thought you would try it and tell us if it worked? ;) If you use/can use J2RE 1.4, naturally...

>by the way, the thread you posted seems to be what my
>problem is, but it doesnt seem like anyone agree on a
>solution

Everyone seems to agree that that one guy should email the class to them, though. =)
Top Expert 2016

Commented:
>>System.setProperty

Sounds less painful than using the Socket directly (also an option for setting a timeout). You should still get better performance if you fire off a new thread for each URL.
Commented:
>You should still get better performance if you fire
>off a new thread for each URL.

1100 threads? Come on! Of course more than one thread, definitely, since most of the reading time goes to waiting. How about some 50-ish threads and a synchronized pool of unfetched URLs, where each thread can ask for the next one to get after finishing work on the previous one. That way, if a thread gets "broken" it won't lose anything except the one URL it was working on, since it can't ask for the next one.
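
Rough shape of what I mean (untested sketch, names invented; download() stands in for the read-and-save code from the question):

  import java.util.LinkedList;

  // Synchronized pool of unfetched URLs. Workers pull from it until it's empty.
  class UrlPool {

    private final LinkedList urls = new LinkedList();   // of String

    synchronized void add(String url) {
      urls.add(url);
    }

    // Returns null when there is nothing left to fetch.
    synchronized String next() {
      return urls.isEmpty() ? null : (String) urls.removeFirst();
    }
  }

  // Each worker loops, fetching URLs until the pool runs dry. A worker that
  // hangs on one URL costs you that one URL and one thread, nothing else.
  class Worker extends Thread {

    private final UrlPool pool;

    Worker(UrlPool pool) {
      this.pool = pool;
    }

    public void run() {
      String url;
      while ((url = pool.next()) != null) {
        try {
          download(url);                  // your existing read-and-save code
        } catch (Exception e) {
          System.out.println("failed: " + url + " : " + e);
        }
      }
    }

    private void download(String url) throws Exception {
      // ... the URLConnection code from the question goes here ...
    }
  }

Main just fills the pool with the ~1100 URLs and then starts, say, 50 workers: for (int i = 0; i < 50; i++) { new Worker(pool).start(); }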
Top Expert 2016

Commented:
Well, obviously there wouldn't be 1100 alive simultaneously ;-)
Top Expert 2016

Commented:
>>each thread can ask for the next one to get after finishing working on the previous one

and, unlike figures of legend, a thread cannot be reborn after death ;-)

Commented:
>and, unlike figures of legend, a thread cannot be
>reborn after death ;-)

Whaddya mean? I see that as a good thing only, reborn Threads might have a vague state, as typically do the legendary figures that do get reborn.

Why'd the Thread die after getting one URL? Loop, man, loop until there's no more URLs to get!

Commented:
If ESPN is updating a player profile at the exact instant that you are trying to have an open connection to the file, your reader may hang.

Listen to the comments above. What's wrong with threading the read process and running about fifty threads at a time?

With a proper thread manager, you'd be able to log the one player that threw an exception, and you'd still get the remaining players in less time.

Top Expert 2016

Commented:
>>Loop, man, loop until there's no more URLs to get!

Yes, that'd be OK. That might be getting more complex than Hokester wants though, as it would imply synchronized access to a queue or stack.

Author

Commented:
So I tried the System.setProperty for read timeouts and connect timeouts, and with both in there, I still got the same results. Anyone actually have a solution they've tried and gotten a good result with?

Commented:
> Anyone actually have a solution they've tried and gotten a good result with?

Yes, I have used a separate thread per connection. Pretty complicated and not very scalable.

Author

Commented:
Any way you can post the thread part of the code and let me take a swing at seeing if I can figure it out?

Commented:
Hokester, what's the problem, the blocking reads or the multithreading part?

Author

Commented:
Well, they are both the problem. The underlying problem is the blocking reads, and apparently a solution is threads. If the threading approach does indeed get around this problem, then any sort of example would make my life much easier.

thanks!

Commented:
I do not have my code at the moment. If nobody posts an example, I will try to dig it up tomorrow.
Top Expert 2016

Commented:
>>If the threads way does indeed get around this problem

It will help. Instead of blocking the main execution thread, bad reads will merely block a separate thread and other reads can proceed. I don't have any example at the moment unfortunately.

Author

Commented:
Does anyone have some sort of example?

Commented:
class URLReader extends Thread {

  private BufferedReader br;     // reader opened from the URLConnection
  private String htmlContent = "";
  private Exception exception;
  private boolean done;

  public void run() {
    try {
      String line;
      while ((line = br.readLine()) != null) {
        htmlContent += line;
      }
      done = true;
    }
    catch (Exception all) {
      this.exception = all;
    }
  }

}

private checkThreads() {
  //go through your Collection maybe Vector v?
  //if URLReader.getException() != null && URLReader.done()
  //else remove (e.g. v.removeElementAt(i);)
  if(v.size > 0) {
    return checkThreads();
  }
}

Commented:
Hokester

do you still need help with this one?

Author

Commented:
Yes please. Looking at this previous example, I don't see how I use this, or how the thread works with this checkThreads function. Other than the checkThreads function, this seems very similar to what I have posted as being a problem for me.
Top Expert 2016

Commented:
The problem with this answer is that instead of your main execution thread being blocked, a dedicated thread will block instead. This is progress of a sort, but you still need to solve the blocking read problem in these threads.
Commented:
Sorry,

Two Classes:

1. URLReader

2. URLManager


class URLManager extends Thread {

  private Vector threads;
  boolean done;

  public void run() {
    return checkThreads();
    done = true;
  }

  private checkThreads() {
    //go through your Collection maybe Vector v?
    //if URLReader.getException() != null && URLReader.done()
    //else remove (e.g. v.removeElementAt(i);)
    if (v.size > 0) {
      return checkThreads();
    }
  }
}

a. start URLManager and then periodically check every few seconds if it is complete.

Author

Commented:
Will done = true ever get executed in the run function? And also, checkThreads has no type; I assume it's just type void?

Where you say go through your collection:

Say I have a Vector, which I do. Do I make a URLReader for each element of the Vector, or what are you suggesting I do in this section?

Thanks again,
eric
Top Expert 2016

Commented:
The important thing is to test what happens when you assign null to a thread that's blocked in a read.

Commented:
Sorry again,

Don't return checkThreads();

After all threads are either checked or removed, done is set to true.

Yes, insert a URLReader for each element in your Vector.

If any URLReader gets 'locked up', you can get its Exception.

If any URLReader takes too long, you can time it out and remove it from the Vector:

  if(currentTime > startTime + 90000) {
    //iterate through your elements and remove the remaining items.
  }
Hokester:
This old question needs to be finalized -- accept an answer, split points, or get a refund.  For information on your options, please click here-> http:/help/closing.jsp#1 
EXPERTS:
Post your closing recommendations!  No comment means you don't care.

Commented:
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

- Split points between CEHJ, arauramo, heyhey_ and jerelw

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

Venabili
EE Cleanup Volunteer
