Matthias Vandercleyen
asked on
CentOS 7 - NTP Synchronized: no
Hi,
I'm using Chrony to configure the ntp client and it seems like every evening I keep losing the NTP synchronization which makes the time leap forward.
Sep 30 22:10:00 xxxxxxx chronyd[32710]: Forward time jump detected!
Sep 30 22:10:00 xxxxxxx chronyd[32710]: Can't synchronise: no selectable sources
Sep 30 22:13:13 xxxxxxx chronyd[32710]: Selected source 193.190.253.212
Sep 30 22:13:13 xxxxxxx chronyd[32710]: System clock wrong by -7439.358171 seconds, adjustment started
Sep 30 22:18:38 xxxxxxx chronyd[32710]: Selected source 194.78.244.172
List of NTP servers I'm using:
server 0.be.pool.ntp.org iburst
server 1.be.pool.ntp.org iburst
server 2.be.pool.ntp.org iburst
server 3.be.pool.ntp.org iburst
Restarting chrony resolves the issue, but it keeps happening.
ASKER
VM. We discovered that one of our ESX Servers did not have NTP servers set, so we fixed that.
No cron jobs are present. And we disabled ntpd as we are using Chrony, so the commands won't work, but I will provide you with the chronyc tracking & chronyc sources output.
chronyc sources
210 Number of sources = 4
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* valoo.patate.ninja 2 7 377 2 +948us[ +732us] +/- 42ms
^+ 193.104.37.238 2 7 377 127 -641us[ -843us] +/- 36ms
^+ hades.boxed-it.com 3 7 377 2 +93us[ +93us] +/- 77ms
^- 82-64-45-50.subs.proxad.> 1 8 201 122 -181us[ -384us] +/- 20ms
chronyc tracking
Reference ID : 33260278 (valoo.patate.ninja)
Stratum : 3
Ref time (UTC) : Tue Oct 01 08:54:46 2019
System time : 0.000058071 seconds slow of NTP time
Last offset : +0.000307334 seconds
RMS offset : 0.000683662 seconds
Frequency : 3.101 ppm fast
Residual freq : +0.050 ppm
Skew : 1.978 ppm
Root delay : 0.035822876 seconds
Root dispersion : 0.020035349 seconds
Update interval : 64.9 seconds
Leap status : Normal
As for the configuration, it is done by Ansible, but here is the file:
cat /etc/chrony.conf
# Ansible managed
# List of NTP servers to use.
server 0.be.pool.ntp.org iburst
server 1.be.pool.ntp.org iburst
server 2.be.pool.ntp.org iburst
server 3.be.pool.ntp.org iburst
# This directive specify the location of the file containing ID/key pairs for
# NTP authentication.
keyfile /etc/chrony.keys
# This directive specify the file into which chronyd will store the rate
# information.
driftfile /var/lib/chrony/drift
# Uncomment the following line to turn logging on.
log tracking measurements statistics
# Log files location.
logdir /var/log/chrony
# Stop bad estimates upsetting machine clock.
maxupdateskew 100.0
# This directive enables kernel synchronisation (every 11 minutes) of the
# real-time clock. Note that it can't be used along with the 'rtcfile' directive.
rtcsync
# Step the system clock instead of slewing it if the adjustment is larger than
# one second, but only in the first three clock updates.
makestep 1 3
This one is useless: 82-64-45-50.subs.proxad.>
I was specifically interested in jitter which is a measure of stability.
ntpd is the gold standard for time servers.
It appears to jump when connection fails and the next clock it sees is a false ticker.
ASKER
Here is the output of chronyc ntpdata, which contains some jitter information:
chronyc ntpdata
Remote address : 51.38.2.120 (33260278)
Remote port : 123
Local address : 10.102.7.103 (0A660767)
Leap status : Normal
Version : 4
Mode : Server
Stratum : 2
Poll interval : 8 (256 seconds)
Precision : -24 (0.000000060 seconds)
Root delay : 0.006821 seconds
Root dispersion : 0.025146 seconds
Reference ID : 83BC03DD ()
Reference time : Tue Oct 01 09:09:26 2019
Offset : +0.000721286 seconds
Peer delay : 0.032555584 seconds
Peer dispersion : 0.000000144 seconds
Response time : 0.000088594 seconds
Jitter asymmetry: +0.00
NTP tests : 111 111 1111
Interleaved : No
Authenticated : No
TX timestamping : Daemon
RX timestamping : Kernel
Total TX : 32
Total RX : 32
Total valid RX : 32
Remote address : 193.104.37.238 (C16825EE)
Remote port : 123
Local address : 10.102.7.103 (0A660767)
Leap status : Normal
Version : 4
Mode : Server
Stratum : 2
Poll interval : 7 (128 seconds)
Precision : -22 (0.000000238 seconds)
Root delay : 0.004593 seconds
Root dispersion : 0.044601 seconds
Reference ID : C1BEE642 ()
Reference time : Tue Oct 01 08:55:21 2019
Offset : -0.000647127 seconds
Peer delay : 0.017280784 seconds
Peer dispersion : 0.000000299 seconds
Response time : 0.000017454 seconds
Jitter asymmetry: +0.00
NTP tests : 111 111 1111
Interleaved : No
Authenticated : No
TX timestamping : Daemon
RX timestamping : Kernel
Total TX : 19
Total RX : 19
Total valid RX : 19
Remote address : 195.200.224.66 (C3C8E042)
Remote port : 123
Local address : 10.102.7.103 (0A660767)
Leap status : Normal
Version : 4
Mode : Server
Stratum : 3
Poll interval : 8 (256 seconds)
Precision : -22 (0.000000238 seconds)
Root delay : 0.012161 seconds
Root dispersion : 0.075363 seconds
Reference ID : 040DCC06 ()
Reference time : Tue Oct 01 09:05:49 2019
Offset : +0.002876930 seconds
Peer delay : 0.016297361 seconds
Peer dispersion : 0.000000301 seconds
Response time : 0.000069260 seconds
Jitter asymmetry: +0.00
NTP tests : 111 111 1111
Interleaved : No
Authenticated : No
TX timestamping : Daemon
RX timestamping : Kernel
Total TX : 28
Total RX : 28
Total valid RX : 28
Remote address : 82.64.45.50 (52402D32)
Remote port : 123
Local address : 10.102.7.103 (0A660767)
Leap status : Normal
Version : 4
Mode : Server
Stratum : 1
Poll interval : 8 (256 seconds)
Precision : -25 (0.000000030 seconds)
Root delay : 0.000000 seconds
Root dispersion : 0.000000 seconds
Reference ID : 47505300 (GPS)
Reference time : Tue Oct 01 09:18:38 2019
Offset : -0.000110997 seconds
Peer delay : 0.036675438 seconds
Peer dispersion : 0.000000500 seconds
Response time : 0.000004827 seconds
Jitter asymmetry: +0.00
NTP tests : 111 111 1111
Interleaved : No
Authenticated : No
TX timestamping : Daemon
RX timestamping : Kernel
Total TX : 22
Total RX : 6
Total valid RX : 6
The message...
Sep 30 22:10:00 xxxxxxx chronyd[32710]: Can't synchronise: no selectable sources
suggests somehow your machine's connection to the Internet is sometimes interrupted.
The message...
Sep 30 22:13:13 xxxxxxx chronyd[32710]: System clock wrong by -7439.358171 seconds, adjustment started
suggests no sources could be contacted for just over 2 hours (about 124 minutes), which seems highly suspect.
This means either this machine's Internet connection actually drops for 2 hours sometimes, or something about your NTP setup is failing.
Try posting the contents of the following, as text, like you did above.
egrep -i -e chronyd -e cron /var/log/syslog /var/log/syslog.1
Your log file name may be different, depending on your Distro.
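On a stock CentOS 7 box there is typically no /var/log/syslog; chronyd messages usually land in /var/log/messages and cron messages in /var/log/cron. A minimal sketch of the same search for that layout, assuming default rsyslog paths (the function name is made up for this example):

```shell
# Sketch: print chronyd/cron lines from whichever of the given log files exist.
# time_log_lines is a hypothetical helper name, not a standard tool.
time_log_lines() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            # -h drops the filename prefix; "|| true" keeps a no-match from counting as an error
            grep -h -i -E 'chronyd|cron' "$f" || true
        fi
    done
}

# Typical CentOS 7 locations (assumed; adjust to your rsyslog setup)
time_log_lines /var/log/messages /var/log/cron
```

If the journal is in use, something like journalctl -u chronyd with --since and --until around 22:00 should show the same chronyd lines.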
@david it doesn't mean there was no connection for 2 hours (it was probably far longer...).
It means errors built up until they added up to a two-hour difference...
I have seen VM clocks drift by one minute every five minutes. At that rate it takes around 10 hours without synchronisation to get there.
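To make that estimate concrete, here is a rough sketch of the arithmetic, taking the figures in this comment at face value: a constant drift of 60 seconds per 300 seconds of real time, and the 7439.358171 second offset from the chronyd log. The helper name is made up.

```shell
# Sketch: hours of unsynchronised running needed to build up a given offset,
# assuming a constant drift of DRIFT seconds per INTERVAL seconds of real time.
# hours_to_offset is a hypothetical helper, not part of chrony.
hours_to_offset() {
    awk -v offset="$1" -v drift="$2" -v interval="$3" \
        'BEGIN { printf "%.1f\n", offset * interval / drift / 3600 }'
}

# 7439.358171 s (from the chronyd log) at 60 s of drift per 300 s
hours_to_offset 7439.358171 60 300   # prints 10.3
```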
Are there any more log messages from chrony in the log files, preferably over a longer period of 2-4 weeks?
If the clocks had been running for a long time, maybe the poll interval became too long to cope with changes...
The jitter I meant (clock jitter) is not in the statistics shown. (Jitter = 0 can only happen with an atomic clock, and you would probably need a hydrogen clock [Galileo].)
You may get some help from adding minpoll 5 maxpoll 8 to each of the server lines.
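As a sketch of what that suggestion would look like in /etc/chrony.conf, assuming the same four pool servers as in the question. The poll values are powers of two in seconds, so minpoll 5 is 32 s and maxpoll 8 is 256 s (chrony's defaults are 6 and 10):

```
# /etc/chrony.conf (fragment, sketch only)
# minpoll 5 = 2^5 = 32 s, maxpoll 8 = 2^8 = 256 s
server 0.be.pool.ntp.org iburst minpoll 5 maxpoll 8
server 1.be.pool.ntp.org iburst minpoll 5 maxpoll 8
server 2.be.pool.ntp.org iburst minpoll 5 maxpoll 8
server 3.be.pool.ntp.org iburst minpoll 5 maxpoll 8
```

Since the file is Ansible managed, the change would have to go into the template rather than the file itself, followed by a restart of chronyd.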
ASKER
The logs don't really change: every evening between 22:10:00 and 22:20:00 a forward time jump is detected. I'm going to watch it this evening and see what happens now that NTP is configured on the ESX server. Hopefully the issue will be gone.
Hm, let me guess: the jump comes after a backup, or some other housekeeping where the VM is snapshotted?
ASKER
So it just happened again, another service running on the same esx server doesn't have that problem...
executing hwclock shows the right time tho.
ASKER
timedatectl
Local time: Tue 2019-10-01 23:51:15 CEST
Universal time: Tue 2019-10-01 21:51:15 UTC
RTC time: Tue 2019-10-01 19:55:55
Time zone: Europe/Brussels (CEST, +0200)
NTP enabled: yes
NTP synchronized: no
RTC in local TZ: no
DST active: yes
Last DST change: DST began at
Sun 2019-03-31 01:59:59 CET
Sun 2019-03-31 03:00:00 CEST
Next DST change: DST ends (the clock jumps one hour backwards) at
Sun 2019-10-27 02:59:59 CEST
Sun 2019-10-27 02:00:00 CET
restarting chrony fixes the issue but it will happen again.
Thinking about removing Chrony and using ntpd
Then there is some tool moving the clocks...
You may be able to find it by enabling SELinux in permissive mode and watching its audit trail around the time of the jump.
A two-hour shift might also be caused by some utility setting the wrong time with e.g. hwclock, or doing date calculations the wrong way. It is close to the difference between summer time and UTC (7200 seconds). If the difference is too large (1000 seconds), syncing stops and the clock can run astray.
hwclock can read/write either UTC or local time; is that used consistently (if it is used at all)? It is wise to check whether chrony is the cause; ntpd has a better track record (imho).
Check cron jobs around 22:00 for time manipulation.
ntpd will not help if there is such a stray job.
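A more direct variant of the audit idea, offered as a sketch rather than a tested recipe: let auditd record every process that sets the clock through a system call. This assumes auditd is running, a 64-bit kernel, and root privileges; the function names and the "time-change" key are arbitrary choices for the example.

```shell
# Sketch only: log every process that sets the clock directly.
# Assumes auditd is running, a 64-bit kernel, and root privileges.
# Function names and the "time-change" key are arbitrary choices for this example.
watch_time_changes() {
    # adjtimex is left out on purpose: chronyd calls it routinely and would flood the log
    auditctl -a always,exit -F arch=b64 -S settimeofday -S clock_settime -k time-change
}

show_time_changes() {
    # -i translates numeric uids and syscall numbers into names
    ausearch -k time-change -i
}

# Usage (as root): run watch_time_changes before 22:00,
# then show_time_changes after the jump has happened.
```

If nothing shows up although the clock jumped, that would suggest the change comes from outside the guest (the hypervisor) rather than from a cron job or utility inside it.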
Hmmm - since it's a VM, let's have a look at a common cause of time havoc:
Does the host sync the VM's clock?
That MUST definitely be forbidden if the VM has its own time service.
Besides that:
- It's a good idea to sync the hardware clock, but on a VM I suspect that would be useless... I presume that on bootup the host's clock would be used as the source of the HW clock. You should probably sync the host too, to be on the safe side.
- Why not the "real thing", a classic NTP client? At what point is chrony better?
P.S.: I'd kindly recommend a look at my article on NTP basics for some NTP insights...
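On the host-sync question: on an ESX guest this can usually be checked from inside the VM, assuming VMware Tools or open-vm-tools is installed. A minimal sketch (the function name is made up):

```shell
# Sketch: report whether VMware Tools periodic time sync is enabled in this guest.
# Assumes VMware Tools or open-vm-tools is installed; check_vmware_timesync is a made-up name.
check_vmware_timesync() {
    if command -v vmware-toolbox-cmd >/dev/null 2>&1; then
        # "status" may exit non-zero when sync is disabled, so do not treat that as a failure
        vmware-toolbox-cmd timesync status || true
    else
        echo "vmware-toolbox-cmd not found (VMware Tools / open-vm-tools not installed?)"
    fi
}

check_vmware_timesync
# To turn periodic sync off (as root): vmware-toolbox-cmd timesync disable
```

As far as I know, this only covers the periodic sync; one-off corrections after a snapshot, resume or vMotion are controlled separately in the VM's advanced settings, which would fit the snapshot guess made earlier in the thread.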
ASKER
We have several other VMs running on the same ESX server that don't have this issue, so we decided to just drop that one and start over. No issues so far; hopefully the issue is gone.
We will swap over to ntpd in the near future, but want to see if the issue is still present first.
ASKER CERTIFIED SOLUTION
Is this on a VM, container or bare metal?
Are there ANY other programs running that set / sync time? Please also check all cron jobs.
Can you give the output of ntpq -pw & ntpq -pwn?
(Preferably after the system has been up for some time.)