?
Solved

java.util.Timer shows symptoms of starvation

Posted on 2008-01-25
42
Medium Priority
?
819 Views
Last Modified: 2011-10-19
I use a java.util.Timer to send ping singnals to some other applications which are connected to mine
by a standard socket communication. The timer's goal is to show communication problems soon.
Hence, if a connected application does not receive regular pings, it closes the socket after a certain
timeout and retries later. The timer's period is 2 seconds and the timeout at 30 seconds.

It happens (seldom but from time to time), that the timer is pausing for about some minutes (!)
and restarts by catching up all missing pings as a bunch. (The task is scheduled by the method
scheduleAtFixedRate()).

Of course, I know about the inaccuracies of Java's wait() method which seems to be the key line
in the Timer's implementation. But it's not a matter of some milliseconds but of minutes.
The operating system is Windows XP and the CPU usage is really low.
All threads (and there are just a dozen of them) have standard priority, which I would like to keep.

What could be the problem? And what could be a solution?

Thanks for any assistance.

Olaf.
0
Comment
Question by:Olaf_Bloemer
  • 22
  • 11
  • 9
42 Comments
 
LVL 17

Expert Comment

by:krakatoa
ID: 20741511
Wait()? Normally the timer would run fine with a Timertask associated with it.
0
 

Author Comment

by:Olaf_Bloemer
ID: 20741664
Of course, I use a TimerTask to schedule my job, and there is no explicit invocation of wait() in my code.
But a glance at Sun's implementation of java.util.Timer shows in the private method mainLoop()
(lines 475-517 of the non-public class TimerThread inside java.util.Timer) the usage of wait() in line 509:

     queue.wait(executionTime - currentTime);

This seems to be the line, where the executing timer thread waits between its executions.
And it seems that it does not return from this wait() in time.

Or am I completely wrong?
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20742364
Thinking.
0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 
LVL 17

Expert Comment

by:krakatoa
ID: 20742580
>> It happens (seldom but from time to time) ..

How often, and when?

Would you be in a position to run an observable, non-executive parallel instance (which uses fewer threads)? You must have given quite some thought to this already concerning possible race conditions.
0
 

Author Comment

by:Olaf_Bloemer
ID: 20742847
Yes, we're running about 130 instances of this application in 130 subsidiaries on 130 local machines
the hardware equipment of which is supposed to be identical.
There is one machine showing this problem every day between two and four times.
The problem has been observed on other machines as well, but then it's really seldom.
Maybe one occurence per week among the other 129 machines.
So it could be a hardware problem in this one special case, but there seems to be a general
Java problem in the background as well which is "provoked" by this one machine more than usual.
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20742904
... add in time-stamped logging to provide documented evidence, if you are not already doing so .
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20742922
(Missed reading your last post before my last post. Reading.)
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20743078
Do you mean that the Java app sending the pings is on the local machines, pinging itself, (?), not pinging clients from a server?
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20743442
And how do you differentiate between a 'genuine' communications problem, and an apparently phantom one such as this?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20743691
>>
It happens (seldom but from time to time), that the timer is pausing for about some minutes (!)
and restarts by catching up all missing pings as a bunch. (The task is scheduled by the method
scheduleAtFixedRate()).
>>

Try to avoid the bunching by executing your application code in a separate thread created in TimerTask.run
0
 

Author Comment

by:Olaf_Bloemer
ID: 20759392
Thank you for your answers, and sorry for being late with this next contribution.

To give further explanation on the architecture: As I said, there are 130 subsidiaries each of which
has a Java application, which in turn works as a server in a LAN and serves for about five to 30
clients. These clients are primitive microcontrollers supervising some technical machines.
The microcontrollers are capable (only) of a socket communication which allows to transmit
a primitive ASCII protocol to control the microcontrollers.
Hence the pings are not sent from the Java application to localhost but to a microcontroller located
in a LAN (which means there shouldn't be to many communication problems).

I have no idea how to detect 'genuine' communication problems. There's not much traffic beside the
pings, i.e. it can happen, that there is no other signal for several hours. This is why we use a ping
signal: to become aware of problems.

In the meantime we tried a 'hand-made' timer which uses a simple Thread.sleep() to pause.
This worked better, we had no problems during the weekend. Maybe this is the work-around
for the time being, but I still would like better to understand really what's going wrong with
the java.util.Timer. It remains to be strange, doesn't it?

Thank you again.
What about the points? What do you think is a fair handling?
(I'm not very often around on EE, so I'm not really up with the common awarding)

Best regards,
Olaf


0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20759491
>>
but I still would like better to understand really what's going wrong with the java.util.Timer.
>>

When you say 'wrong'. i'd guess nothing (technically). Timer is single-threaded and expects its tasks to return fast. From the API docs:

"Corresponding to each Timer object is a single background thread that is used to execute all of the timer's tasks, sequentially. Timer tasks should complete quickly."

My guess is that on some occasions, it's just what your code *doesn't* do, hence my last suggestion. Did you try it btw?
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20760898
Agree with CEHJ.  

Some part of the process that the maverick server controls is not returning on time because it's doing something else, cf. pre-emptive multitasking, time-slicing. I guess with Thread.sleep, your code can't "not return", because it's didn't go anywhere yet, whereas with the timer, it's aw(o)l.

(It might be interesting to see what happens if you replace that one server with a multiprocessor machine).

As for the points, put it this way - the 3 of us deserve equal shares. ;) Good luck.
0
 

Author Comment

by:Olaf_Bloemer
ID: 20761221
I see that TimerTask's run() method should return quickly, but this should be okay
in my code: There is just one task for this timer, and it  just places some ping command
in a command queue. (I always prefer to use separate timers for separate tasks.
Do you think, this is good practice?)
I haven't tried your idea yet, but I will do soon.

Thanks again,

best regards,
Olaf.
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20761272
>> Do you think, this is good practice?)
 I guess that is directed at CEHJ, so I'll step aside.

But in that case, maybe you should be looking at your queue code rather than the timer. -?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20761417
>>I see that TimerTask's run() method should return quickly, but this should be okay
in my code:

That would run counter to the evidence you report - namely of 'bunching'. Including the timestamped debugging code will tell you exactly when your task returned

>>Do you think, this is good practice?

Yes, i think they should have this logical separation
0
 

Author Comment

by:Olaf_Bloemer
ID: 20769820
Hi again,

we've tried CEHJ's suggestion: 'Try to avoid the bunching by executing your application cod
 in a separate thread created in TimerTask.run'
Unfortunately, it didn't work. It really seems to be the timer itself which awakes too late.

Our 'home-made' timer (a simple loop with a Thread.sleep()) seems to be the best work-around,
although it's not beautiful code. It's interesting, that it is also possible to use a this.wait(2000)
to bridge the execution pause in this simple loop solution.

If both of you don't mind, I woud like to share the points between you to honour your efforts.
Is it possible? If not, what should I do?

Best regards,
Olaf.

0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20770114
re: points, yes, that's fine with me, thanks.

Do you not have any other apps or processes running on the server machines which could be starving your timer thread ? Is the crucial part of your programme in a synchronized block? And have you made any use of .notify() / notifyAll()?

I'd doubt CEHJ's diagnosis were wrong though.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20770845
I can't really comment further without seeing the log file from your application
0
 

Author Comment

by:Olaf_Bloemer
ID: 20771294
Thanks for the offer. Unfortunately most of the log file is written in German.
Do you understand German? ;o)

0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20771432
Ja, Deutsch kann ich allerdings. ;)

Nur nebenbei also:

I am trying to imagine your setup where each 1/130 XP machine would only be sending pings to a peripheral device, and doing nothing else. Is each 1/130 machine a standalone? On a network? On the 'net? The other 128/129 machines seem to bear out the stability of the code otherwise, so what about hardware glitches in the rogue machines' timekeeping - haben Sie das ueberhaupt kontrolliert?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20771497
>>Ja, Deutsch kann ich allerdings. ;)

Ich auch ;-)
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20771580
Ausgezeichnet !

(At *last* a breakthrough) ;)
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20772267
All i'd like to see from the log btw is the entry time to TimerTask.run and exit time
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20773394
You've probably seen Sun's ping class - http://java.sun.com/j2se/1.4.2/docs/guide/nio/example/Ping.java.
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20775492
Sorry, but asking ultra-fundamental questions can sometimes be edifying (and I guess you are still open to ideas. I should've asked already :

1) what kind of OS, environment etc., is on the microcontrollers? What kind of hardware devices are they? 1a) What kind of physical connection do they have with the XPs?

2) Can you enlighten us as to your conception of 'ping' - is it merely successfully opening a socket, or are you expecting a response as well? And ...

3) How are the microcontrollers handling Java - or are you addressing them through some native interface via the socket?

Why don't you get your XP machines to receive pings from them instead?
0
 

Author Comment

by:Olaf_Bloemer
ID: 20775814
Okay, you'll get a log file. :o)
But please give some more hours to produce a significant one...

And then  I'll give further information on your questions as well.

See you later...
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20812957
0
 

Author Comment

by:Olaf_Bloemer
ID: 20830832
Hi guys,

at first I have to apologize for this badly delayed answer, but we worked hard on it.

At last, we seem to be very close to the solution: It has nothing to do with java.util.Timer
(as assumed by all of us). Instead, something is wrong with the system time, i.e.
System.currentTimeMillis() jumps backwards from time to time.

To demonstrate this, I've attached a very simple test class as code snippet and the
log file it produced on the machine in question. There is a jump back in system time
every hour (about 20 minutes after full hour) and a correction some minutes later.

There is a legacy other application on the machine which is responsible to set the system
time every 15 minutes. The time is supplied by a central server. Normally it does just small
corrections, but here it should be definitely one half of the picture: the jump back or the correction.

The regularity of the  jump back (every 20  minutes past full hour) could fit to the 4 x 15 minutes
of this legacy application. But if this is true: Why do we have this problem on just one machine
(as I said before, all 130 machines should have identical configuirations, including the
system time management by just one central time server)?
On the other hand, if the legacy application does the correction: why is the correction not
regularly but depending on the size of the jump back (the bigger the jump, the earlier the
correction)?

This we still don't know. A virus? A broken system clock?

There is some mysterious rhythm in the size of the jumps and time it takes to correct it:

size of jump  | time until correction (minutes)
----------------------------------------------------
   12:28         |       4
     8:27         |       8
     4:28         |     12
     0:28         |     16
   12:28         |       4
     8:27         |       8
     4:28         |     12
     0:28         |     16
   12:28         |       4
     8:27         |       8

The sum of each row is nearly constant.

Do you have an idea about all of this?

Thanks again and best regards,

Olaf.


package test;
 
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
 
/**
 * test.SystemTime
 */
public class SystemTime {
    
    private static Logger LOG = Logger.getLogger("system.time");
 
    public static void main(String[] args) {
        try {
            FileHandler fh = new FileHandler("systemtime.log", true);
            fh.setFormatter(new java.util.logging.SimpleFormatter());
            LOG.addHandler(fh);
            
            long currentTimeMillis = 0;
            long delta             = 60000;
            
            while (true) {
                LOG.info("");
                try {
                    Thread.sleep(delta);
                } catch (InterruptedException e) {
                    LOG.warning(e.getMessage());
                }
                if (Math.abs(System.currentTimeMillis()-currentTimeMillis-delta) > 5000) {
                    LOG.warning("Abweichung der Systemzeit um mehr als 5 Sekunden.");
                }
                currentTimeMillis = System.currentTimeMillis();
            }
        } catch (SecurityException e) {
            LOG.warning(e.getMessage());
        } catch (IOException e) {
            LOG.warning(e.getMessage());
        }
        
    }
}

Open in new window

systemtime.log
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20830998
Well, I did mention in an earlier comment that the rogue machines' timing could be at fault (although I mentioned it in hardware terms, not legacy software), and this kind of clock resetting is one of the caveats mentioned in the Java documentation itself.

What kind of an application would get its time from elsewhere than the system anyway? I'll look at what you sent also.
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20831040
Having looked, I'd say you just can't have your cake and eat it too with timing. The clock resets are probably totally confusing the thread timings and leading to the bunching.

As an outsider to much of your thinking and certainly to the processes it is directed at, I would ask myself why, just to know peripherals are alive, you'd need to synchronize the times on all the machines in this way, UNLESS they were controlling some crucial process of manufacture or control which could only be accomplished using such fine synchronization?
0
 

Author Comment

by:Olaf_Bloemer
ID: 20831132
The legacy application is obsolete in its main purpose, but it still does some work beside
the clock corrections which cannot be replaced by another application within some weeks.
As you, I'm convinced that the clock manipulations lead to all this funny behaviour of timers.

But I don't think that the legacy application is the cause for this trouble. It works since about
five years, and we didn't face any problem like this on this on special machine.
I think it's more likely that it is a hardware problem with the system clock or some virus...
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20831155
Are you using NTP to synchronize with the server?
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20831164
If that's the case, then if you could parallel-in another nagelneue Maschine alongside the rogue, holding the same software environment as the rogue, but only reporting (at least initially) to the timing logs (not polling the peripherals for real), you may well find it revealing.
0
 

Author Comment

by:Olaf_Bloemer
ID: 20831193
Hi CEHJ,
nice that you join in again. ;o)

We don't use NTP.
It's also some hand-made thing. The legacy application (written in C++) communicates with the legacy
server (written in Java) by a line-based ASCII protocol. And there is a command from the server to the
client which reads about:

COMMAND_SET_TIME
date=06.02.2008
time=12:50:45
END_COMMAND

Hence if the time on the legacy server was wrong, it should be a problem an all 130 clients.
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 20831250
Well you should probably be using NTP to do this. It has built-in safeguards against excessive correction, so you don't get big swings.

I suspect that your rogue machine(s) behaviour could be due to network problems
0
 
LVL 17

Assisted Solution

by:krakatoa
krakatoa earned 500 total points
ID: 20832003
You probably already know this :
57 4290274 (timer) java.util.Timer.scheduleAtFixedRate() fails if the system time is changed

from:

http://bugs.sun.com/bugdatabase/top25_bugs.do
0
 

Author Comment

by:Olaf_Bloemer
ID: 20833569
Yes, I think that's exactly that issue.
Thank you again,

best regards,

Olaf.
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20833633
Thanks. k.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20833642
:-)
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20834333
Hmm - I don't wish to gild the lily, but I've been playing with a simple piece of code with timertask in, and resetting the time on my XP box whilst it's running. The result is that the timertask stops completely, only to self-right itself and run again after about a minute - without the clock being reset to the right time. Interesting also is that synchorinizing the PC's time with the internet did not have any adverse effect on the timertask, whether I ran the .scheduleAtFixedRate method or the .schedule method.

The code is ultra basic and very probably not any more edifying and I'm sure no more forensic than what you have and know already; but since the Q. is closed and I've gone this far, I may as well post it pro forma. Regards, k.

import javax.swing.*;
import java.awt.*;
import java.util.*;
import java.util.Timer;



 class CPUTest extends JFrame implements Runnable{

  private boolean ok = true;
  private int a =0;
  private TimerTask timeTask;
  private Timer myTimer;


    public CPUTest(){

      this.setSize(new Dimension(200,100));
      //this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
      this.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
      this.setVisible(true);

      myTimer = new Timer(true);// make the Timer a Daemon thread, so that there is a way of ending it!

      timeTask = new TymerTask(this);
      
      myTimer.scheduleAtFixedRate(timeTask, 1000, 1000);

      //myTimer.schedule(timeTask, 1000, 1000);

      timeTask.run();


    }



    public static void main(String args[]){

      new CPUTest().run();
      
    }



      public synchronized void run(){ //either use synchronized here on the method or as a block in the remmed alternative below

      //synchronized(this){
      
            while (ok){
      
                   try{
                  this.wait(1000);
                  }catch(InterruptedException intex){}
                  if(!(isVisible())){ok=!ok;}
                  System.out.println("Main JFrame instance running       "+a++);
                  this.notify();
                  
            }

      //}

      try{
      }
      finally{
            System.out.println("GUARANTEED CALL TO FINALLY METHOD");
      }



    }// end run


  private class TymerTask extends java.util.TimerTask{

      CPUTest mycputest;
      
      public TymerTask(CPUTest cputest){
            
            mycputest = cputest;

      }



        public synchronized void run(){
            

            if(mycputest.isVisible()==false){System.out.println(" *****   Enclosing instance is gone   ***** ");}
            try{
            this.wait(1000);
            }catch(Exception intexx){intexx.printStackTrace();}
            System.out.println("TIMERTASK HERE "+a++);

      }

  }

}
0
 
LVL 17

Expert Comment

by:krakatoa
ID: 20834351
... and, sorry ... this is superfluous : this.notify();

k.
0

Featured Post

The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Java had always been an easily readable and understandable language.  Some relatively recent changes in the language seem to be changing this pretty fast, and anyone that had not seen any Java code for the last 5 years will possibly have issues unde…
Java functions are among the best things for programmers to work with as Java sites can be very easy to read and prepare. Java especially simplifies many processes in the coding industry as it helps integrate many forms of technology and different d…
Viewers will learn about the different types of variables in Java and how to declare them. Decide the type of variable desired: Put the keyword corresponding to the type of variable in front of the variable name: Use the equal sign to assign a v…
How to fix incompatible JVM issue while installing Eclipse While installing Eclipse in windows, got one error like above and unable to proceed with the installation. This video describes how to successfully install Eclipse. How to solve incompa…
Suggested Courses
Course of the Month3 days, 7 hours left to enroll

600 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question