CentOS 7 - NTP synchronized: no

Hi,

I'm using chrony as the NTP client, and it seems that every evening I lose NTP synchronization, which makes the time leap forward.

Sep 30 22:10:00 xxxxxxx chronyd[32710]: Forward time jump detected!
Sep 30 22:10:00 xxxxxxx chronyd[32710]: Can't synchronise: no selectable sources
Sep 30 22:13:13 xxxxxxx chronyd[32710]: Selected source 193.190.253.212
Sep 30 22:13:13 xxxxxxx chronyd[32710]: System clock wrong by -7439.358171 seconds, adjustment started
Sep 30 22:18:38 xxxxxxx  chronyd[32710]: Selected source 194.78.244.172

List of NTP servers I'm using:

server 0.be.pool.ntp.org iburst
server 1.be.pool.ntp.org iburst
server 2.be.pool.ntp.org iburst
server 3.be.pool.ntp.org iburst

Restarting chrony resolves the issue, but it keeps happening.

noci

Can you give more details on the config (server lines and other settings; restrict lines are not needed)?
Is this on a VM, a container, or bare metal?
Are there ANY other programs running that set or sync the time? Please also check all cron jobs.

Can you post the output of ntpq -pw and ntpq -pwn,
preferably after the system has been up for some time?

Matthias Vandercleyen

ASKER

It is a VM. We discovered that one of our ESX servers did not have NTP servers set, so we fixed that.

No cron jobs are present. We disabled ntpd because we are using chrony, so the ntpq commands won't work, but here is the output of chronyc sources and chronyc tracking.

chronyc sources
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* valoo.patate.ninja            2   7   377     2   +948us[ +732us] +/-   42ms
^+ 193.104.37.238                2   7   377   127   -641us[ -843us] +/-   36ms
^+ hades.boxed-it.com            3   7   377     2    +93us[  +93us] +/-   77ms
^- 82-64-45-50.subs.proxad.>     1   8   201   122   -181us[ -384us] +/-   20ms

chronyc tracking
Reference ID    : 33260278 (valoo.patate.ninja)                                                                                                                                                                                               
Stratum         : 3
Ref time (UTC)  : Tue Oct 01 08:54:46 2019                                                                                                                                                                                                    
System time     : 0.000058071 seconds slow of NTP time                                                                                                                                                                                        
Last offset     : +0.000307334 seconds                                                                                                                                                                                                        
RMS offset      : 0.000683662 seconds                                                                                                                                                                                                         
Frequency       : 3.101 ppm fast                                                                                                                                                                                                              
Residual freq   : +0.050 ppm                                                                                                                                                                                                                  
Skew            : 1.978 ppm                                                                                                                                                                                                                   
Root delay      : 0.035822876 seconds                                                                                                                                                                                                         
Root dispersion : 0.020035349 seconds                                                                                                                                                                                                         
Update interval : 64.9 seconds                                                                                                                                                                                                                
Leap status     : Normal

The configuration is managed by Ansible, but here is the file:

cat /etc/chrony.conf
# Ansible managed

# List of NTP servers to use.
server 0.be.pool.ntp.org iburst
server 1.be.pool.ntp.org iburst
server 2.be.pool.ntp.org iburst
server 3.be.pool.ntp.org iburst

# This directive specify the location of the file containing ID/key pairs for
# NTP authentication.
keyfile /etc/chrony.keys

# This directive specify the file into which chronyd will store the rate
# information.
driftfile /var/lib/chrony/drift

# Uncomment the following line to turn logging on.
log tracking measurements statistics

# Log files location.
logdir /var/log/chrony

# Stop bad estimates upsetting machine clock.
maxupdateskew 100.0

# This directive enables kernel synchronisation (every 11 minutes) of the
# real-time clock. Note that it can't be used along with the 'rtcfile' directive.
rtcsync

# Step the system clock instead of slewing it if the adjustment is larger than
# one second, but only in the first three clock updates.
makestep 1 3

noci

This one is useless: 82-64-45-50.subs.proxad.>
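
For context on why that source stands out: the Reach column in chronyc sources is an 8-bit reachability register printed in octal, with one bit per recent poll (1 means a valid reply arrived). A minimal Python sketch, with a hypothetical helper name, decodes two of the values from the output above:

```python
def decode_reach(reach_octal: str):
    """Return the 8-bit reachability register as a bit string, plus the
    number of polls (out of the last 8) that received a valid reply."""
    value = int(reach_octal, 8)      # chronyc prints the register in octal
    bits = format(value, "08b")      # newest poll is the rightmost bit
    return bits, bits.count("1")

# Reach values copied from the chronyc sources output in this thread.
for name, reach in [("valoo.patate.ninja", "377"),
                    ("82-64-45-50.subs.proxad.>", "201")]:
    bits, answered = decode_reach(reach)
    print(f"{name}: {bits} ({answered}/8 polls answered)")
```

A value of 377 means all of the last eight polls were answered, while 201 means only two of them were.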

I was specifically interested in jitter which is a measure of stability.
ntpd is the gold standard for time servers.

It appears to jump when the connection fails and the next clock it sees is a falseticker.

Matthias Vandercleyen

ASKER

Here is the output of chronyc ntpdata, which contains some jitter information:

chronyc ntpdata                                                                                                                                                                                                                                                                                                                                                                                                                                                 
Remote address  : 51.38.2.120 (33260278)                                                                                                                                                                                                      
Remote port     : 123                                                                                                                                                                                                                         
Local address   : 10.102.7.103 (0A660767)                                                                                                                                                                                                     
Leap status     : Normal                                                                                                                                                                                                                      
Version         : 4                                                                                                                                                                                                                           
Mode            : Server                                                                                                                                                                                                                      
Stratum         : 2                                                                                                                                                                                                                           
Poll interval   : 8 (256 seconds)                                                                                                                                                                                                             
Precision       : -24 (0.000000060 seconds)                                                                                                                                                                                                   
Root delay      : 0.006821 seconds                                                                                                                                                                                                            
Root dispersion : 0.025146 seconds                                                                                                                                                                                                            
Reference ID    : 83BC03DD ()                                                                                                                                                                                                                 
Reference time  : Tue Oct 01 09:09:26 2019                                                                                                                                                                                                    
Offset          : +0.000721286 seconds                                                                                                                                                                                                        
Peer delay      : 0.032555584 seconds                                                                                                                                                                                                         
Peer dispersion : 0.000000144 seconds                                                                                                                                                                                                         
Response time   : 0.000088594 seconds                                                                                                                                                                                                         
Jitter asymmetry: +0.00                                                                                                                                                                                                                       
NTP tests       : 111 111 1111                                                                                                                                                                                                                
Interleaved     : No                                                                                                                                                                                                                          
Authenticated   : No                                                                                                                                                                                                                          
TX timestamping : Daemon                                                                                                                                                                                                                      
RX timestamping : Kernel                                                                                                                                                                                                                      
Total TX        : 32                                                                                                                                                                                                                          
Total RX        : 32                                                                                                                                                                                                                          
Total valid RX  : 32

Remote address  : 193.104.37.238 (C16825EE)
Remote port     : 123                                                                                                                                                                                                                         
Local address   : 10.102.7.103 (0A660767)                                                                                                                                                                                                     
Leap status     : Normal                                                                                                                                                                                                                      
Version         : 4                                                                                                                                                                                                                           
Mode            : Server
Stratum         : 2
Poll interval   : 7 (128 seconds)
Precision       : -22 (0.000000238 seconds)
Root delay      : 0.004593 seconds
Root dispersion : 0.044601 seconds
Reference ID    : C1BEE642 ()
Reference time  : Tue Oct 01 08:55:21 2019
Offset          : -0.000647127 seconds
Peer delay      : 0.017280784 seconds
Peer dispersion : 0.000000299 seconds
Response time   : 0.000017454 seconds
Jitter asymmetry: +0.00
NTP tests       : 111 111 1111
Interleaved     : No
Authenticated   : No
TX timestamping : Daemon
RX timestamping : Kernel
Total TX        : 19
Total RX        : 19
Total valid RX  : 19

Remote address  : 195.200.224.66 (C3C8E042)
Remote port     : 123
Local address   : 10.102.7.103 (0A660767)
Leap status     : Normal
Version         : 4
Mode            : Server
Stratum         : 3
Poll interval   : 8 (256 seconds)
Precision       : -22 (0.000000238 seconds)
Root delay      : 0.012161 seconds
Root dispersion : 0.075363 seconds
Reference ID    : 040DCC06 ()
Reference time  : Tue Oct 01 09:05:49 2019
Offset          : +0.002876930 seconds
Peer delay      : 0.016297361 seconds
Peer dispersion : 0.000000301 seconds
Response time   : 0.000069260 seconds
Jitter asymmetry: +0.00
NTP tests       : 111 111 1111
Interleaved     : No
Authenticated   : No
TX timestamping : Daemon
RX timestamping : Kernel
Total TX        : 28
Total RX        : 28
Total valid RX  : 28

Remote address  : 82.64.45.50 (52402D32)
Remote port     : 123
Local address   : 10.102.7.103 (0A660767)
Leap status     : Normal
Version         : 4
Mode            : Server
Stratum         : 1
Poll interval   : 8 (256 seconds)
Precision       : -25 (0.000000030 seconds)
Root delay      : 0.000000 seconds
Root dispersion : 0.000000 seconds
Reference ID    : 47505300 (GPS)
Reference time  : Tue Oct 01 09:18:38 2019
Offset          : -0.000110997 seconds
Peer delay      : 0.036675438 seconds
Peer dispersion : 0.000000500 seconds
Response time   : 0.000004827 seconds
Jitter asymmetry: +0.00
NTP tests       : 111 111 1111
Interleaved     : No
Authenticated   : No
TX timestamping : Daemon
RX timestamping : Kernel
Total TX        : 22
Total RX        : 6
Total valid RX  : 6

David Favor

The message...

Sep 30 22:10:00 xxxxxxx chronyd[32710]: Can't synchronise: no selectable sources

suggests somehow your machine's connection to the Internet is sometimes interrupted.

The message...

Sep 30 22:13:13 xxxxxxx chronyd[32710]: System clock wrong by -7439.358171 seconds, adjustment started

suggests no sources could be contacted for roughly 2 hours (about 124 minutes), which seems highly suspect.

This means either this machine's Internet connection actually drops for about 2 hours at a time, or something about your NTP setup is failing.

Try posting the contents of the following, as text, like you did above.

egrep -i -e chronyd -e cron /var/log/syslog /var/log/syslog.1

Your log file name may be different, depending on your distro; on CentOS 7 the default is /var/log/messages.

noci

@David, it doesn't mean there was no connection for 2 hours (it was probably far longer).

It means errors built up until they added up to a two-hour difference.
I have seen VM clocks drift by one minute every five minutes. At that rate it would take around 10 hours without synchronisation to get there.
Are there any more chronyd messages in the log files, preferably covering a longer period of 2-4 weeks?
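
As a rough check of the 10-hour estimate above, here is a small Python sketch. It assumes a constant drift of one minute per five minutes of real time, which is a rate seen on other VMs and not something measured on this machine:

```python
OFFSET_SECONDS = 7439.358171   # from the log line "System clock wrong by -7439.358171 seconds"
DRIFT_RATE = 60 / 300          # assumption: 1 minute of error per 5 minutes of real time

def hours_to_accumulate(offset_s: float, drift_rate: float) -> float:
    """Hours of unsynchronised running needed to build up offset_s seconds of error."""
    return offset_s / drift_rate / 3600

print(f"offset  : {OFFSET_SECONDS / 60:.1f} minutes")
print(f"duration: {hours_to_accumulate(OFFSET_SECONDS, DRIFT_RATE):.1f} hours")
```

Under that assumption the logged offset of about 124 minutes corresponds to a little over 10 hours without a usable time source.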

If the clocks had been running for a long time, maybe the poll interval became too long to cope with changes...
The jitter I meant (clock jitter) is not in the statistics shown. (Jitter = 0 can only happen with an atomic clock, and you would probably need a hydrogen clock [Galileo].)

It may help to add minpoll 5 maxpoll 8 to each of the server lines.
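
The minpoll and maxpoll values are exponents of two, in seconds, which matches the "Poll interval : 8 (256 seconds)" lines in the ntpdata output above. The Python sketch below (helper names are illustrative) shows what the suggested range means and what the resulting server lines would look like:

```python
# Pool names taken from the posted chrony.conf.
SERVERS = [f"{i}.be.pool.ntp.org" for i in range(4)]

def poll_seconds(exponent: int) -> int:
    """Poll intervals are given as log2 of the interval in seconds."""
    return 2 ** exponent

def server_line(host: str, minpoll: int = 5, maxpoll: int = 8) -> str:
    """Build a chrony.conf server line with explicit polling bounds."""
    return f"server {host} iburst minpoll {minpoll} maxpoll {maxpoll}"

print(f"minpoll 5 = {poll_seconds(5)} s, maxpoll 8 = {poll_seconds(8)} s")
for host in SERVERS:
    print(server_line(host))
```

With these bounds chronyd polls each server at least every 256 seconds, compared with chrony's default range of minpoll 6 to maxpoll 10 (64 to 1024 seconds).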

Matthias Vandercleyen

ASKER

The logs don't really change: every evening between 22:10:00 and 22:20:00 a forward time jump is detected. I'm going to watch it this evening and see what happens now that NTP is configured on the ESX server. Hopefully the issue will be gone.
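
One way to confirm the time-of-day pattern over several weeks is to pull the timestamp of every detected jump out of the log. The Python sketch below assumes syslog-style lines like the excerpt in the question; the sample lines are copied from that excerpt, and the function name is made up for illustration:

```python
import re
from collections import Counter

SAMPLE_LOG = """\
Sep 30 22:10:00 xxxxxxx chronyd[32710]: Forward time jump detected!
Sep 30 22:10:00 xxxxxxx chronyd[32710]: Can't synchronise: no selectable sources
Sep 30 22:13:13 xxxxxxx chronyd[32710]: Selected source 193.190.253.212
Sep 30 22:13:13 xxxxxxx chronyd[32710]: System clock wrong by -7439.358171 seconds, adjustment started
"""

# Month, day, HH:MM:SS, host name, then the chronyd jump message.
JUMP_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}:\d{2}):\d{2}\s+\S+\s+chronyd\[\d+\]: "
    r"Forward time jump detected"
)

def jumps_by_hour_minute(log_text: str) -> Counter:
    """Count detected forward time jumps per HH:MM, ignoring the date."""
    counts = Counter()
    for line in log_text.splitlines():
        match = JUMP_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

print(jumps_by_hour_minute(SAMPLE_LOG))
```

Run against a longer log, a tight cluster of counts around the same minute each day would point to a scheduled job (backup, snapshot, cron) and not to random network loss.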

noci

Hm, let me guess: does the jump come after a backup, or some other housekeeping during which the VM is snapshotted?

Matthias Vandercleyen

ASKER

So it just happened again. Another service running on the same ESX server doesn't have that problem...
Executing hwclock shows the right time, though.

Matthias Vandercleyen

ASKER

timedatectl
      Local time: Tue 2019-10-01 23:51:15 CEST
  Universal time: Tue 2019-10-01 21:51:15 UTC
        RTC time: Tue 2019-10-01 19:55:55
       Time zone: Europe/Brussels (CEST, +0200)
     NTP enabled: yes
NTP synchronized: no
 RTC in local TZ: no
      DST active: yes
 Last DST change: DST began at
                  Sun 2019-03-31 01:59:59 CET
                  Sun 2019-03-31 03:00:00 CEST
 Next DST change: DST ends (the clock jumps one hour backwards) at
                  Sun 2019-10-27 02:59:59 CEST
                  Sun 2019-10-27 02:00:00 CET

Restarting chrony fixes the issue, but it will happen again.
I'm thinking about removing chrony and using ntpd.

noci

Then there is some tool moving the clock...
You may be able to find it by enabling SELinux in permissive mode and watching the audit trail around the time of the jump.

A two-hour shift might also be caused by some utility setting the wrong time, e.g. with hwclock, or doing date calculations the wrong way. It is close to the difference between summer time and UTC (7200 seconds). If the difference is too large (more than 1000 seconds, which is ntpd's default panic threshold), syncing stops and the clock can run astray.
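
To put numbers on that, the Python sketch below compares the two gaps visible in this thread with a whole two-hour shift: the offset from the chronyd log, and the difference between "Universal time" and "RTC time" in the timedatectl output. The variable names are illustrative.

```python
from datetime import datetime

CEST_OFFSET_S = 2 * 3600    # Europe/Brussels summer time is UTC+2, per timedatectl

log_offset_s = 7439.358171  # magnitude of "System clock wrong by -7439.358171 seconds"

# Values copied from the timedatectl output.
universal = datetime(2019, 10, 1, 21, 51, 15)
rtc = datetime(2019, 10, 1, 19, 55, 55)
rtc_gap_s = (universal - rtc).total_seconds()

for label, gap in [("chronyd log offset", log_offset_s),
                   ("system time vs RTC", rtc_gap_s)]:
    print(f"{label}: {gap:.0f} s, {gap - CEST_OFFSET_S:+.0f} s relative to 2 h")
```

Both gaps land within about five minutes of 7200 seconds, which is consistent with a UTC versus local time mix-up but does not prove it.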

hwclock can be set to read and write either UTC or local time; is that used consistently (if it is used at all)? It is wise to check whether chrony is the cause; ntpd has a better track record (IMHO).
Check cron jobs around 22:00 for time manipulation.

ntpd will not help if there is such a stray job.

Frank Helk

Hmmm - since it's a VM, let's have a look at a common cause of time havoc:

Does the host sync the VM's clock?
That MUST definitely be disabled if the VM runs its own time service.

Besides that:

It's a good idea to sync the hardware clock, but on a VM I suspect that would be useless. I presume that on bootup the host's clock is used as the source for the VM's hardware clock, so you should probably sync the host too, to be on the safe side.

Why not the "real thing", a classic NTP client? In what respect is chrony better?

P.S.: I'd kindly recommend a look at my article on NTP basics for some NTP insights ...

Matthias Vandercleyen

ASKER

We have several other VMs running on the same ESX server that don't have this issue, so we decided to just drop that one and start over. No issues so far; hopefully the problem is gone.

We will switch to ntpd in the near future, but first we want to see whether the issue is still present.

ASKER CERTIFIED SOLUTION

Matthias Vandercleyen
