java.util.Timer shows symptoms of starvation

I use a java.util.Timer to send ping singnals to some other applications which are connected to mine
by a standard socket communication. The timer's goal is to show communication problems soon.
Hence, if a connected application does not receive regular pings, it closes the socket after a certain
timeout and retries later. The timer's period is 2 seconds and the timeout at 30 seconds.

It happens (seldom but from time to time), that the timer is pausing for about some minutes (!)
and restarts by catching up all missing pings as a bunch. (The task is scheduled by the method
scheduleAtFixedRate()).

Of course, I know about the inaccuracies of Java's wait() method which seems to be the key line
in the Timer's implementation. But it's not a matter of some milliseconds but of minutes.
The operating system is Windows XP and the CPU usage is really low.
All threads (and there are just a dozen of them) have standard priority, which I would like to keep.

What could be the problem? And what could be a solution?

Thanks for any assistance.

Olaf.
Olaf_BloemerAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

krakatoaCommented:
Wait()? Normally the timer would run fine with a Timertask associated with it.
0
Olaf_BloemerAuthor Commented:
Of course, I use a TimerTask to schedule my job, and there is no explicit invocation of wait() in my code.
But a glance at Sun's implementation of java.util.Timer shows in the private method mainLoop()
(lines 475-517 of the non-public class TimerThread inside java.util.Timer) the usage of wait() in line 509:

     queue.wait(executionTime - currentTime);

This seems to be the line, where the executing timer thread waits between its executions.
And it seems that it does not return from this wait() in time.

Or am I completely wrong?
0
krakatoaCommented:
Thinking.
0
Cloud Class® Course: Microsoft Azure 2017

Azure has a changed a lot since it was originally introduce by adding new services and features. Do you know everything you need to about Azure? This course will teach you about the Azure App Service, monitoring and application insights, DevOps, and Team Services.

krakatoaCommented:
>> It happens (seldom but from time to time) ..

How often, and when?

Would you be in a position to run an observable, non-executive parallel instance (which uses fewer threads)? You must have given quite some thought to this already concerning possible race conditions.
0
Olaf_BloemerAuthor Commented:
Yes, we're running about 130 instances of this application in 130 subsidiaries on 130 local machines
the hardware equipment of which is supposed to be identical.
There is one machine showing this problem every day between two and four times.
The problem has been observed on other machines as well, but then it's really seldom.
Maybe one occurence per week among the other 129 machines.
So it could be a hardware problem in this one special case, but there seems to be a general
Java problem in the background as well which is "provoked" by this one machine more than usual.
0
krakatoaCommented:
... add in time-stamped logging to provide documented evidence, if you are not already doing so .
0
krakatoaCommented:
(Missed reading your last post before my last post. Reading.)
0
krakatoaCommented:
Do you mean that the Java app sending the pings is on the local machines, pinging itself, (?), not pinging clients from a server?
0
krakatoaCommented:
And how do you differentiate between a 'genuine' communications problem, and an apparently phantom one such as this?
0
CEHJCommented:
>>
It happens (seldom but from time to time), that the timer is pausing for about some minutes (!)
and restarts by catching up all missing pings as a bunch. (The task is scheduled by the method
scheduleAtFixedRate()).
>>

Try to avoid the bunching by executing your application code in a separate thread created in TimerTask.run
0
Olaf_BloemerAuthor Commented:
Thank you for your answers, and sorry for being late with this next contribution.

To give further explanation on the architecture: As I said, there are 130 subsidiaries each of which
has a Java application, which in turn works as a server in a LAN and serves for about five to 30
clients. These clients are primitive microcontrollers supervising some technical machines.
The microcontrollers are capable (only) of a socket communication which allows to transmit
a primitive ASCII protocol to control the microcontrollers.
Hence the pings are not sent from the Java application to localhost but to a microcontroller located
in a LAN (which means there shouldn't be to many communication problems).

I have no idea how to detect 'genuine' communication problems. There's not much traffic beside the
pings, i.e. it can happen, that there is no other signal for several hours. This is why we use a ping
signal: to become aware of problems.

In the meantime we tried a 'hand-made' timer which uses a simple Thread.sleep() to pause.
This worked better, we had no problems during the weekend. Maybe this is the work-around
for the time being, but I still would like better to understand really what's going wrong with
the java.util.Timer. It remains to be strange, doesn't it?

Thank you again.
What about the points? What do you think is a fair handling?
(I'm not very often around on EE, so I'm not really up with the common awarding)

Best regards,
Olaf


0
CEHJCommented:
>>
but I still would like better to understand really what's going wrong with the java.util.Timer.
>>

When you say 'wrong'. i'd guess nothing (technically). Timer is single-threaded and expects its tasks to return fast. From the API docs:

"Corresponding to each Timer object is a single background thread that is used to execute all of the timer's tasks, sequentially. Timer tasks should complete quickly."

My guess is that on some occasions, it's just what your code *doesn't* do, hence my last suggestion. Did you try it btw?
0
krakatoaCommented:
Agree with CEHJ.  

Some part of the process that the maverick server controls is not returning on time because it's doing something else, cf. pre-emptive multitasking, time-slicing. I guess with Thread.sleep, your code can't "not return", because it's didn't go anywhere yet, whereas with the timer, it's aw(o)l.

(It might be interesting to see what happens if you replace that one server with a multiprocessor machine).

As for the points, put it this way - the 3 of us deserve equal shares. ;) Good luck.
0
Olaf_BloemerAuthor Commented:
I see that TimerTask's run() method should return quickly, but this should be okay
in my code: There is just one task for this timer, and it  just places some ping command
in a command queue. (I always prefer to use separate timers for separate tasks.
Do you think, this is good practice?)
I haven't tried your idea yet, but I will do soon.

Thanks again,

best regards,
Olaf.
0
krakatoaCommented:
>> Do you think, this is good practice?)
 I guess that is directed at CEHJ, so I'll step aside.

But in that case, maybe you should be looking at your queue code rather than the timer. -?
0
CEHJCommented:
>>I see that TimerTask's run() method should return quickly, but this should be okay
in my code:

That would run counter to the evidence you report - namely of 'bunching'. Including the timestamped debugging code will tell you exactly when your task returned

>>Do you think, this is good practice?

Yes, i think they should have this logical separation
0
Olaf_BloemerAuthor Commented:
Hi again,

we've tried CEHJ's suggestion: 'Try to avoid the bunching by executing your application cod
 in a separate thread created in TimerTask.run'
Unfortunately, it didn't work. It really seems to be the timer itself which awakes too late.

Our 'home-made' timer (a simple loop with a Thread.sleep()) seems to be the best work-around,
although it's not beautiful code. It's interesting, that it is also possible to use a this.wait(2000)
to bridge the execution pause in this simple loop solution.

If both of you don't mind, I woud like to share the points between you to honour your efforts.
Is it possible? If not, what should I do?

Best regards,
Olaf.

0
krakatoaCommented:
re: points, yes, that's fine with me, thanks.

Do you not have any other apps or processes running on the server machines which could be starving your timer thread ? Is the crucial part of your programme in a synchronized block? And have you made any use of .notify() / notifyAll()?

I'd doubt CEHJ's diagnosis were wrong though.
0
CEHJCommented:
I can't really comment further without seeing the log file from your application
0
Olaf_BloemerAuthor Commented:
Thanks for the offer. Unfortunately most of the log file is written in German.
Do you understand German? ;o)

0
krakatoaCommented:
Ja, Deutsch kann ich allerdings. ;)

Nur nebenbei also:

I am trying to imagine your setup where each 1/130 XP machine would only be sending pings to a peripheral device, and doing nothing else. Is each 1/130 machine a standalone? On a network? On the 'net? The other 128/129 machines seem to bear out the stability of the code otherwise, so what about hardware glitches in the rogue machines' timekeeping - haben Sie das ueberhaupt kontrolliert?
0
CEHJCommented:
>>Ja, Deutsch kann ich allerdings. ;)

Ich auch ;-)
0
krakatoaCommented:
Ausgezeichnet !

(At *last* a breakthrough) ;)
0
CEHJCommented:
All i'd like to see from the log btw is the entry time to TimerTask.run and exit time
0
krakatoaCommented:
You've probably seen Sun's ping class - http://java.sun.com/j2se/1.4.2/docs/guide/nio/example/Ping.java.
0
krakatoaCommented:
Sorry, but asking ultra-fundamental questions can sometimes be edifying (and I guess you are still open to ideas. I should've asked already :

1) what kind of OS, environment etc., is on the microcontrollers? What kind of hardware devices are they? 1a) What kind of physical connection do they have with the XPs?

2) Can you enlighten us as to your conception of 'ping' - is it merely successfully opening a socket, or are you expecting a response as well? And ...

3) How are the microcontrollers handling Java - or are you addressing them through some native interface via the socket?

Why don't you get your XP machines to receive pings from them instead?
0
Olaf_BloemerAuthor Commented:
Okay, you'll get a log file. :o)
But please give some more hours to produce a significant one...

And then  I'll give further information on your questions as well.

See you later...
0
krakatoaCommented:
0
Olaf_BloemerAuthor Commented:
Hi guys,

at first I have to apologize for this badly delayed answer, but we worked hard on it.

At last, we seem to be very close to the solution: It has nothing to do with java.util.Timer
(as assumed by all of us). Instead, something is wrong with the system time, i.e.
System.currentTimeMillis() jumps backwards from time to time.

To demonstrate this, I've attached a very simple test class as code snippet and the
log file it produced on the machine in question. There is a jump back in system time
every hour (about 20 minutes after full hour) and a correction some minutes later.

There is a legacy other application on the machine which is responsible to set the system
time every 15 minutes. The time is supplied by a central server. Normally it does just small
corrections, but here it should be definitely one half of the picture: the jump back or the correction.

The regularity of the  jump back (every 20  minutes past full hour) could fit to the 4 x 15 minutes
of this legacy application. But if this is true: Why do we have this problem on just one machine
(as I said before, all 130 machines should have identical configuirations, including the
system time management by just one central time server)?
On the other hand, if the legacy application does the correction: why is the correction not
regularly but depending on the size of the jump back (the bigger the jump, the earlier the
correction)?

This we still don't know. A virus? A broken system clock?

There is some mysterious rhythm in the size of the jumps and time it takes to correct it:

size of jump  | time until correction (minutes)
----------------------------------------------------
   12:28         |       4
     8:27         |       8
     4:28         |     12
     0:28         |     16
   12:28         |       4
     8:27         |       8
     4:28         |     12
     0:28         |     16
   12:28         |       4
     8:27         |       8

The sum of each row is nearly constant.

Do you have an idea about all of this?

Thanks again and best regards,

Olaf.


package test;
 
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
 
/**
 * test.SystemTime
 */
public class SystemTime {
    
    private static Logger LOG = Logger.getLogger("system.time");
 
    public static void main(String[] args) {
        try {
            FileHandler fh = new FileHandler("systemtime.log", true);
            fh.setFormatter(new java.util.logging.SimpleFormatter());
            LOG.addHandler(fh);
            
            long currentTimeMillis = 0;
            long delta             = 60000;
            
            while (true) {
                LOG.info("");
                try {
                    Thread.sleep(delta);
                } catch (InterruptedException e) {
                    LOG.warning(e.getMessage());
                }
                if (Math.abs(System.currentTimeMillis()-currentTimeMillis-delta) > 5000) {
                    LOG.warning("Abweichung der Systemzeit um mehr als 5 Sekunden.");
                }
                currentTimeMillis = System.currentTimeMillis();
            }
        } catch (SecurityException e) {
            LOG.warning(e.getMessage());
        } catch (IOException e) {
            LOG.warning(e.getMessage());
        }
        
    }
}

Open in new window

systemtime.log
0
krakatoaCommented:
Well, I did mention in an earlier comment that the rogue machines' timing could be at fault (although I mentioned it in hardware terms, not legacy software), and this kind of clock resetting is one of the caveats mentioned in the Java documentation itself.

What kind of an application would get its time from elsewhere than the system anyway? I'll look at what you sent also.
0
krakatoaCommented:
Having looked, I'd say you just can't have your cake and eat it too with timing. The clock resets are probably totally confusing the thread timings and leading to the bunching.

As an outsider to much of your thinking and certainly to the processes it is directed at, I would ask myself why, just to know peripherals are alive, you'd need to synchronize the times on all the machines in this way, UNLESS they were controlling some crucial process of manufacture or control which could only be accomplished using such fine synchronization?
0
Olaf_BloemerAuthor Commented:
The legacy application is obsolete in its main purpose, but it still does some work beside
the clock corrections which cannot be replaced by another application within some weeks.
As you, I'm convinced that the clock manipulations lead to all this funny behaviour of timers.

But I don't think that the legacy application is the cause for this trouble. It works since about
five years, and we didn't face any problem like this on this on special machine.
I think it's more likely that it is a hardware problem with the system clock or some virus...
0
CEHJCommented:
Are you using NTP to synchronize with the server?
0
krakatoaCommented:
If that's the case, then if you could parallel-in another nagelneue Maschine alongside the rogue, holding the same software environment as the rogue, but only reporting (at least initially) to the timing logs (not polling the peripherals for real), you may well find it revealing.
0
Olaf_BloemerAuthor Commented:
Hi CEHJ,
nice that you join in again. ;o)

We don't use NTP.
It's also some hand-made thing. The legacy application (written in C++) communicates with the legacy
server (written in Java) by a line-based ASCII protocol. And there is a command from the server to the
client which reads about:

COMMAND_SET_TIME
date=06.02.2008
time=12:50:45
END_COMMAND

Hence if the time on the legacy server was wrong, it should be a problem an all 130 clients.
0
CEHJCommented:
Well you should probably be using NTP to do this. It has built-in safeguards against excessive correction, so you don't get big swings.

I suspect that your rogue machine(s) behaviour could be due to network problems
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
krakatoaCommented:
You probably already know this :
57 4290274 (timer) java.util.Timer.scheduleAtFixedRate() fails if the system time is changed

from:

http://bugs.sun.com/bugdatabase/top25_bugs.do
0
Olaf_BloemerAuthor Commented:
Yes, I think that's exactly that issue.
Thank you again,

best regards,

Olaf.
0
krakatoaCommented:
Thanks. k.
0
CEHJCommented:
:-)
0
krakatoaCommented:
Hmm - I don't wish to gild the lily, but I've been playing with a simple piece of code with timertask in, and resetting the time on my XP box whilst it's running. The result is that the timertask stops completely, only to self-right itself and run again after about a minute - without the clock being reset to the right time. Interesting also is that synchorinizing the PC's time with the internet did not have any adverse effect on the timertask, whether I ran the .scheduleAtFixedRate method or the .schedule method.

The code is ultra basic and very probably not any more edifying and I'm sure no more forensic than what you have and know already; but since the Q. is closed and I've gone this far, I may as well post it pro forma. Regards, k.

import javax.swing.*;
import java.awt.*;
import java.util.*;
import java.util.Timer;



 class CPUTest extends JFrame implements Runnable{

  private boolean ok = true;
  private int a =0;
  private TimerTask timeTask;
  private Timer myTimer;


    public CPUTest(){

      this.setSize(new Dimension(200,100));
      //this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
      this.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
      this.setVisible(true);

      myTimer = new Timer(true);// make the Timer a Daemon thread, so that there is a way of ending it!

      timeTask = new TymerTask(this);
      
      myTimer.scheduleAtFixedRate(timeTask, 1000, 1000);

      //myTimer.schedule(timeTask, 1000, 1000);

      timeTask.run();


    }



    public static void main(String args[]){

      new CPUTest().run();
      
    }



      public synchronized void run(){ //either use synchronized here on the method or as a block in the remmed alternative below

      //synchronized(this){
      
            while (ok){
      
                   try{
                  this.wait(1000);
                  }catch(InterruptedException intex){}
                  if(!(isVisible())){ok=!ok;}
                  System.out.println("Main JFrame instance running       "+a++);
                  this.notify();
                  
            }

      //}

      try{
      }
      finally{
            System.out.println("GUARANTEED CALL TO FINALLY METHOD");
      }



    }// end run


  private class TymerTask extends java.util.TimerTask{

      CPUTest mycputest;
      
      public TymerTask(CPUTest cputest){
            
            mycputest = cputest;

      }



        public synchronized void run(){
            

            if(mycputest.isVisible()==false){System.out.println(" *****   Enclosing instance is gone   ***** ");}
            try{
            this.wait(1000);
            }catch(Exception intexx){intexx.printStackTrace();}
            System.out.println("TIMERTASK HERE "+a++);

      }

  }

}
0
krakatoaCommented:
... and, sorry ... this is superfluous : this.notify();

k.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Java

From novice to tech pro — start learning today.