A Frustrating NTP Installation

Date: 2023-03-09 00:49:48

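To deploy an OpenStack environment for experiments, I had to set up NTP along the way. It cost me an entire afternoon of fiddling, and I ran into several problems.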

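Because of our company's internal network policy, many websites are unreachable, including the public time servers: none of the ones I tried could be reached. So I decided to make the OpenStack controller node, node0, the time server and have the other nodes act as its clients.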

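The base servers for my OpenStack deployment run CentOS 7. Following the OpenStack documentation, I first installed chrony 2.1.1. The configuration is simple: per the OpenStack docs, the time server runs on the controller and the other nodes act as clients. But once everything was started, the clients never managed to synchronize with the time server.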

 chronyc> sourcestats -v    # run on the time server (node0)
Number of sources =
.- Number of sample points in measurement set.
/ .- Number of residual runs with same sign.
| / .- Length of measurement set (time).
| | / .- Est. clock freq error (ppm).
| | | / .- Est. error in freq.
| | | | / .- Est. offset.
| | | | | | On the -.
| | | | | | samples. \
| | | | | | |
Name/IP Address NP NR Span Frequency Freq Skew Offset Std Dev
==============================================================================
node0 +0.000 2000.000 +0ns 4000ms
chronyc>

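Running chronyc on the client gives the output below. Note the ? in the ^? marker on the node0 line: it means the client has not synchronized with the time server. I could not tell what was misconfigured.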

 [root@node1 tools]# chronyc -a
chrony version 2.1.
Copyright (C) -, , - Richard P. Curnow and others
chrony comes with ABSOLUTELY NO WARRANTY. This is free software, and
you are welcome to redistribute it under certain conditions. See the
GNU General Public License version for details. OK
chronyc> sourcestats -v
Number of sources =
.- Number of sample points in measurement set.
/ .- Number of residual runs with same sign.
| / .- Length of measurement set (time).
| | / .- Est. clock freq error (ppm).
| | | / .- Est. error in freq.
| | | | / .- Est. offset.
| | | | | | On the -.
| | | | | | samples. \
| | | | | | |
Name/IP Address NP NR Span Frequency Freq Skew Offset Std Dev
==============================================================================
node0 +0.000 2000.000 +0ns 4000ms
chronyc> sources
Number of sources =
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^? node0 10y +0ns[ +0ns] +/- 0ns
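To make the ^? marker easier to read, here is a small decoding sketch for the two-character column at the start of each chronyc sources line. The mode and state meanings are paraphrased from the chronyc documentation; decode_ms and the two lookup tables are hypothetical helpers written for this post, not part of chrony.

```python
from typing import Tuple

# Mode column (first character), paraphrased from the chronyc documentation.
MODES = {
    "^": "server",
    "=": "peer",
    "#": "local reference clock",
}

# State column (second character), paraphrased from the chronyc documentation.
STATES = {
    "*": "currently synchronized to this source",
    "+": "acceptable, combined with the selected source",
    "-": "acceptable, but excluded by the combining algorithm",
    "?": "connectivity lost or packets fail the tests",
    "x": "falseticker (disagrees with the other sources)",
    "~": "time appears too variable",
}


def decode_ms(line: str) -> Tuple[str, str]:
    """Return (mode, state) for one data line of `chronyc sources` output."""
    ms = line.strip()[:2]
    if len(ms) < 2:
        raise ValueError("not a chronyc sources data line: %r" % line)
    return MODES.get(ms[0], "unknown"), STATES.get(ms[1], "unknown")


# The node0 line from the client session above.
print(decode_ms("^? node0 10y +0ns[ +0ns] +/- 0ns"))
```

A healthy client would instead show ^* for the source it is synchronized to.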

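Eventually I dropped chrony and went back to the classic ntpd setup. That did not go smoothly either: I ran into the following main problems, which surfaced one at a time. These machines were PowerEdge R610 servers that had sat unused for a long time; some had been running and some had been powered off, so their hardware clocks (hwclock) were several hours apart.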

First, here is the original /etc/ntp.conf on my time master (everything not shown was left at the defaults). 192.168.1.100 is node0's IP; the other nodes are on the same subnet as node0.

 # Hosts on local network are less restricted.
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server 192.168.1.100 iburst

The client's ntp.conf is basically the same as the master's; it just lacks the line restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap.

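The first problem I hit can no longer be reproduced here: tcpdump reported bad udp cksum errors. After some googling, I found that running the two commands below on both the server and the clients makes this go away; they turn off the NIC's checksum and segmentation offloading. (With offloading enabled, the NIC fills in the UDP checksum after tcpdump has already captured an outgoing packet, so bad udp cksum on locally sent packets is usually a capture artifact rather than real corruption.)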

 ethtool --offload em1 rx off tx off   # disable RX/TX checksum offloading on em1
ethtool -K em1 gso off                 # disable generic segmentation offload on em1
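For reference, the check tcpdump reports as bad udp cksum is the RFC 768 UDP checksum: a 16-bit one's-complement sum over an IPv4 pseudo-header, the UDP header, and the payload. The sketch below (hypothetical helper names, IPv4 only, with a made-up source port 40000) builds an NTP-sized UDP segment and verifies it the way a receiver would. With transmit offloading on, the kernel leaves the final checksum for the NIC to compute, which is why an outgoing packet can look bad in a local capture.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum used by the Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol 17 (UDP), length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_len))


def build_udp(src_ip, dst_ip, sport, dport, payload: bytes) -> bytes:
    """Build a UDP segment with a correct checksum (hypothetical helper)."""
    length = 8 + len(payload)
    unsummed = struct.pack("!HHHH", sport, dport, length, 0) + payload
    csum = ~ones_complement_sum16(pseudo_header(src_ip, dst_ip, length) + unsummed) & 0xFFFF
    csum = csum or 0xFFFF  # a computed 0 is sent as 0xFFFF; 0 means "no checksum"
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def udp_checksum_ok(src_ip, dst_ip, segment: bytes) -> bool:
    """Receiver-side check: the sum over everything, checksum included, is 0xFFFF."""
    return ones_complement_sum16(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0xFFFF


# An NTP-sized client request from node3 to node0 (first byte 0x23 = NTPv4 client).
seg = build_udp("192.168.1.130", "192.168.1.100", 40000, 123, b"\x23" + bytes(47))
print(udp_checksum_ok("192.168.1.130", "192.168.1.100", seg))
```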

While debugging, I ran tcpdump on node0 to follow the packets and ran ntpdate -d node0 on node3 (a client). Here is the tcpdump output on the master first:

 [root@node0 tools]# tcpdump -vvv -i em1 host 192.168.1.130 -n
tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size bytes
::03.717606 IP (tos 0x0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.130. > 192.168.1.100.ntp: [udp sum ok] NTPv4, length
Client, Leap indicator: clock unsynchronized (), Stratum (unspecified), poll (8s), precision -
Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
Reference Timestamp: 0.000000000
Originator Timestamp: 0.000000000
Receive Timestamp: 0.000000000
Transmit Timestamp: 3663362643.719085752 (// ::)
Originator - Receive Timestamp: 0.000000000
Originator - Transmit Timestamp: 3663362643.719085752 (// ::)
::03.717667 IP (tos 0xc0, ttl , id , offset , flags [none], proto ICMP (), length )
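192.168.1.100 > 192.168.1.130: ICMP host 192.168.1.100 unreachable - admin prohibited, length 84 # this is the problem: node0 rejected the packet ("admin prohibited"), which points to a firewall rule rather than a missing host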
IP (tos 0x0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.130. > 192.168.1.100.ntp: [udp sum ok] NTPv4, length
Client, Leap indicator: clock unsynchronized (), Stratum (unspecified), poll (8s), precision -
Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
Reference Timestamp: 0.000000000
Originator Timestamp: 0.000000000
Receive Timestamp: 0.000000000
Transmit Timestamp: 3663362643.719085752 (// ::)
Originator - Receive Timestamp: 0.000000000
......
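The Transmit Timestamp values in this capture are raw NTP timestamps: seconds since 1900-01-01 UTC plus a 32-bit binary fraction (ntpdate -d prints the same thing in hex). Since the human-readable dates in these logs were lost, here is a small conversion sketch, valid for NTP era 0 (before 2036); the function names are my own.

```python
from datetime import datetime, timedelta, timezone

# Seconds from the NTP epoch (1900-01-01) to the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2_208_988_800
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ntp_to_datetime(seconds: int, fraction: int = 0) -> datetime:
    """Convert an era-0 NTP timestamp (32-bit seconds, 32-bit fraction) to UTC."""
    micros = fraction * 1_000_000 // 2**32  # binary fraction -> microseconds
    return UNIX_EPOCH + timedelta(seconds=seconds - NTP_UNIX_DELTA, microseconds=micros)


def ntp_hex_to_datetime(text: str) -> datetime:
    """Parse the 'ssssssss.ffffffff' hex form printed by ntpdate -d."""
    sec_hex, frac_hex = text.split(".")
    return ntp_to_datetime(int(sec_hex, 16), int(frac_hex, 16))


print(ntp_to_datetime(3663362643))                # whole seconds of a Transmit Timestamp above
print(ntp_hex_to_datetime("da5a7a59.b8115280"))   # transmit timestamp from node3's ntpdate run
```

ntp_to_datetime(3663362643) gives 2016-02-02 00:44:03 UTC, and the hex value from node3's ntpdate run gives 2016-02-02 00:44:09.719 UTC. ntpdate and tcpdump may print these in local time instead.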

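On node3, running ntpdate -d node0 produced the log below. Every transmit goes unanswered and the run ends with "Server dropped: no data": no data was exchanged between the two machines, so NTP traffic was not getting through.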

 [root@node3 tools]# ntpdate -d node0
Feb :: ntpdate[]: ntpdate 4.2.6p5@1.2349-o Mon Jan :: UTC ()
Looking for host node0 and service ntp
host found : node2
transmit(192.168.1.100)
transmit(192.168.1.100)
transmit(192.168.1.100)
transmit(192.168.1.100)
transmit(192.168.1.100)
192.168.1.100: Server dropped: no data
server 192.168.1.100, port
stratum , precision , leap , trust
refid [192.168.1.100], delay 0.00000, dispersion 64.00000
transmitted , in filter
reference time: 00000000.00000000 Mon, Jan ::43.000
originate timestamp: 00000000.00000000 Mon, Jan ::43.000
transmit timestamp: da5a7a59.b8115280 Tue, Feb ::09.719
filter delay: 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000
filter offset: 0.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000
delay 0.00000, dispersion 64.00000
offset 0.000000
Feb :: ntpdate[]: no server suitable for synchronization found
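To repeat this kind of reachability test without ntpdate, a minimal SNTP client is enough: send one 48-byte mode 3 (client) packet to UDP port 123 and wait for an answer. The sketch below uses hypothetical helper names and is not a replacement for ntpdate; like ntpdate here, it simply sees no reply when a firewall drops or rejects the request.

```python
import socket
import struct

NTP_PORT = 123


def build_client_request() -> bytes:
    """48-byte NTPv4 request: LI=0, VN=4, Mode=3 (client) -> first byte 0x23."""
    return bytes([0x23]) + bytes(47)


def parse_reply(data: bytes) -> dict:
    """Extract a few header fields from an NTP packet."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    tx_sec, tx_frac = struct.unpack("!II", data[40:48])  # transmit timestamp
    return {
        "leap": data[0] >> 6,
        "version": (data[0] >> 3) & 0x7,
        "mode": data[0] & 0x7,
        "stratum": data[1],  # 0 on the wire means "unspecified/unsynchronized"
        "transmit": tx_sec + tx_frac / 2**32,
    }


def probe(host: str, timeout: float = 2.0, tries: int = 4):
    """Send up to `tries` requests; return the first parsed reply, or None if
    nothing came back (compare ntpdate's 'Server dropped: no data')."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(tries):
            sock.sendto(build_client_request(), (host, NTP_PORT))
            try:
                data, _addr = sock.recvfrom(512)
            except socket.timeout:
                continue  # no answer: dropped, or rejected by a firewall
            return parse_reply(data)
    return None
```

Run from a client, probe("192.168.1.100") should return None while UDP 123 is blocked, and a dict (including the server's stratum) once replies come back.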

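While debugging I had already turned off iptables, yet I still could not find the cause. After a lot of googling I finally found a clue: CentOS 7 also runs its own firewall daemon, firewalld. Once I stopped firewalld as well, the error above disappeared, but a new kind of error showed up: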

 [root@node1 tools]# ntpdate -d 192.168.1.100
Feb :: ntpdate[]: ntpdate 4.2.6p5@1.2349-o Mon Jan :: UTC ()
Looking for host 192.168.1.100 and service ntp
host found : node0
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
192.168.1.100: Server dropped: strata too high
server 192.168.1.100, port
stratum , precision -, leap , trust
refid [192.168.1.100], delay 0.02582, dispersion 0.00000
transmitted , in filter
reference time: 00000000.00000000 Mon, Jan ::43.000
originate timestamp: da5a80c3.f225c875 Tue, Feb ::31.945
transmit timestamp: da5a80c3.fef87c8f Tue, Feb ::31.995
filter delay: 0.02585 0.02583 0.02582 0.02583
0.00000 0.00000 0.00000 0.00000
filter offset: -0.05033 -0.05031 -0.05030 -0.05030
0.000000 0.000000 0.000000 0.000000
delay 0.02582, dispersion 0.00000
offset -0.050306
Feb :: ntpdate[]: no server suitable for synchronization found
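Why strata too high? A server whose only server line points at itself never gets a usable time source, so ntpd stays unsynchronized and advertises stratum 16 (sent on the wire as 0, "unspecified"). The sketch below is a simplified model of the two ntpdate rejections seen in this post, not ntpdate's actual code; the constants follow the NTP convention that 16 means unsynchronized and 15 is the highest usable stratum, and the function names are hypothetical.

```python
STRATUM_UNSPEC = 16  # "unsynchronized"; carried on the wire as 0
MAX_STRATUM = 15     # highest stratum a client will accept


def effective_stratum(wire_stratum: int) -> int:
    """NTP sends 'unspecified/unsynchronized' as 0; treat it as 16."""
    return STRATUM_UNSPEC if wire_stratum == 0 else wire_stratum


def ntpdate_verdict(got_reply: bool, wire_stratum: int = 0) -> str:
    """Simplified model of the two server checks behind the messages above."""
    if not got_reply:
        return "Server dropped: no data"
    if effective_stratum(wire_stratum) > MAX_STRATUM:
        return "Server dropped: strata too high"
    return "ok"


print(ntpdate_verdict(False))    # firewalld blocking UDP 123
print(ntpdate_verdict(True, 0))  # node0 syncing only to itself
```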

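I kept searching and finally found the cause: a standalone time server must not list its own NIC address in its server line. Pointed at itself, ntpd never gets a usable time source, so it stays unsynchronized (stratum 16) and clients reject it with "strata too high". Instead, the server line should use 127.127.1.1. This is not an ordinary loopback address but the pseudo-address of ntpd's local clock driver (reference clock type 1, unit 1), which lets the server offer its own system clock as a time source.

That is, change server 192.168.1.100 iburst to server 127.127.1.1 iburst.

After restarting ntpd, the test passed and node1 could synchronize with node0 (ntpdate -d only goes through the motions and does not actually adjust the local clock):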

 [root@node1 tools]# ntpdate -d 192.168.1.100
Feb :: ntpdate[]: ntpdate 4.2.6p5@1.2349-o Mon Jan :: UTC ()
Looking for host 192.168.1.100 and service ntp
host found : node0
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
server 192.168.1.100, port
stratum , precision -, leap , trust
refid [192.168.1.100], delay 0.02580, dispersion 0.00000
transmitted , in filter
reference time: da5a828d.a102d9f2 Tue, Feb ::09.628
originate timestamp: da5a8299.e4141d4c Tue, Feb ::21.890
transmit timestamp: da5a8299.f02c1325 Tue, Feb ::21.938
filter delay: 0.02583 0.02583 0.02580 0.02583
0.00000 0.00000 0.00000 0.00000
filter offset: -0.04751 -0.04749 -0.04747 -0.04745
0.000000 0.000000 0.000000 0.000000
delay 0.02580, dispersion 0.00000
offset -0.047470
2 Feb 09:19:21 ntpdate[2190]: adjust time server 192.168.1.100 offset -0.047470 sec

At this point, tcpdump on node0 showed the following:

 [root@node0 etc]# tcpdump -vvv -i em1 host 192.168.1.110 -n
tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size bytes
::15.890792 IP (tos 0x0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.110. > 192.168.1.100.ntp: [udp sum ok] NTPv4, length
Client, Leap indicator: clock unsynchronized (), Stratum (unspecified), poll (8s), precision -
Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
Reference Timestamp: 0.000000000
Originator Timestamp: 0.000000000
Receive Timestamp: 0.000000000
Transmit Timestamp: 3663364755.938188076 (// ::)
Originator - Receive Timestamp: 0.000000000
Originator - Transmit Timestamp: 3663364755.938188076 (// ::)
::15.890922 IP (tos 0xc0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.100.ntp > 192.168.1.110.: [udp sum ok] NTPv4, length
Server, Leap indicator: (), Stratum (secondary reference), poll (8s), precision -
Root Delay: 0.000000, Root dispersion: 7.947586, Reference-ID: 127.127.1.1
Reference Timestamp: 3663364749.628949761 (// ::)
Originator Timestamp: 3663364755.938188076 (// ::)
Receive Timestamp: 3663364755.890792727 (// ::)
Transmit Timestamp: 3663364755.890906155 (// ::)
Originator - Receive Timestamp: -0.047395322
Originator - Transmit Timestamp: -0.047281891
::17.890788 IP (tos 0x0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.110. > 192.168.1.100.ntp: [udp sum ok] NTPv4, length
Client, Leap indicator: clock unsynchronized (), Stratum (unspecified), poll (8s), precision -
Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
Reference Timestamp: 0.000000000
Originator Timestamp: 0.000000000
Receive Timestamp: 0.000000000
Transmit Timestamp: 3663364757.938173294 (// ::)
Originator - Receive Timestamp: 0.000000000
Originator - Transmit Timestamp: 3663364757.938173294 (// ::)
::17.890901 IP (tos 0xc0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.100.ntp > 192.168.1.110.: [udp sum ok] NTPv4, length
Server, Leap indicator: (), Stratum (secondary reference), poll (8s), precision -
Root Delay: 0.000000, Root dispersion: 7.947616, Reference-ID: 127.127.1.1
Reference Timestamp: 3663364749.628949761 (// ::)
Originator Timestamp: 3663364757.938173294 (// ::)
Receive Timestamp: 3663364757.890788793 (// ::)
Transmit Timestamp: 3663364757.890883028 (// ::)
Originator - Receive Timestamp: -0.047384545
Originator - Transmit Timestamp: -0.047290302
::19.890806 IP (tos 0x0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.110. > 192.168.1.100.ntp: [udp sum ok] NTPv4, length
Client, Leap indicator: clock unsynchronized (), Stratum (unspecified), poll (8s), precision -
Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
Reference Timestamp: 0.000000000
Originator Timestamp: 0.000000000
Receive Timestamp: 0.000000000
Transmit Timestamp: 3663364759.938178777 (// ::)
Originator - Receive Timestamp: 0.000000000
Originator - Transmit Timestamp: 3663364759.938178777 (// ::)
::19.890935 IP (tos 0xc0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.100.ntp > 192.168.1.110.: [udp sum ok] NTPv4, length
Server, Leap indicator: (), Stratum (secondary reference), poll (8s), precision -
Root Delay: 0.000000, Root dispersion: 7.947647, Reference-ID: 127.127.1.1
Reference Timestamp: 3663364749.628949761 (// ::)
Originator Timestamp: 3663364759.938178777 (// ::)
Receive Timestamp: 3663364759.890806376 (// ::)
Transmit Timestamp: 3663364759.890924751 (// ::)
Originator - Receive Timestamp: -0.047372374
Originator - Transmit Timestamp: -0.047254003
::20.895650 ARP, Ethernet (len ), IPv4 (len ), Request who-has 192.168.1.100 tell 192.168.1.110, length
::20.895665 ARP, Ethernet (len ), IPv4 (len ), Reply 192.168.1.100 is-at :::f0:c3:, length
::21.890822 IP (tos 0x0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.110. > 192.168.1.100.ntp: [udp sum ok] NTPv4, length
Client, Leap indicator: clock unsynchronized (), Stratum (unspecified), poll (8s), precision -
Root Delay: 1.000000, Root dispersion: 1.000000, Reference-ID: (unspec)
Reference Timestamp: 0.000000000
Originator Timestamp: 0.000000000
Receive Timestamp: 0.000000000
Transmit Timestamp: 3663364761.938172519 (// ::)
Originator - Receive Timestamp: 0.000000000
Originator - Transmit Timestamp: 3663364761.938172519 (// ::)
::21.890948 IP (tos 0xc0, ttl , id , offset , flags [DF], proto UDP (), length )
192.168.1.100.ntp > 192.168.1.110.: [udp sum ok] NTPv4, length
Server, Leap indicator: (), Stratum (secondary reference), poll (8s), precision -
Root Delay: 0.000000, Root dispersion: 7.947677, Reference-ID: 127.127.1.1
Reference Timestamp: 3663364749.628949761 (// ::)
Originator Timestamp: 3663364761.938172519 (// ::)
Receive Timestamp: 3663364761.890823006 (// ::)
Transmit Timestamp: 3663364761.890931904 (// ::)
Originator - Receive Timestamp: -0.047349534
Originator - Transmit Timestamp: -0.047240607
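The two "Originator - ..." lines in each reply are tcpdump subtracting the client's transmit time from the server's receive and transmit times. A client combines these with its own receive time using the standard NTP on-wire formulas (RFC 5905): offset = ((t2 - t1) + (t3 - t4)) / 2 and delay = (t4 - t1) - (t3 - t2). The sketch below applies them to the first exchange above. t4 is stamped on node1 and does not appear in node0's capture, so the 0.3 ms round trip used for it is an assumption; Decimal keeps the nanosecond digits that a float would round off.

```python
from decimal import Decimal


def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation (RFC 5905).

    t1: client transmit (client clock)   t2: server receive (server clock)
    t3: server transmit (server clock)   t4: client receive (client clock)
    Returns (offset, delay); a negative offset means the client clock is ahead.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# First exchange in the node0 capture above (NTP seconds since 1900).
t1 = Decimal("3663364755.938188076")  # node1 transmit
t2 = Decimal("3663364755.890792727")  # node0 receive
t3 = Decimal("3663364755.890906155")  # node0 transmit
t4 = t1 + Decimal("0.000300")         # ASSUMED: node1 receive time is not in the capture

print("t2 - t1:", t2 - t1)  # compare "Originator - Receive Timestamp"
print("t3 - t1:", t3 - t1)  # compare "Originator - Transmit Timestamp"
offset, delay = ntp_offset_delay(t1, t2, t3, t4)
print("offset:", offset, "delay:", delay)
```

With that assumed t4 the offset comes out at about -0.0475 s, in the same range as the -0.047470 s that ntpdate reported; the delay figure depends entirely on the assumed t4.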

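Finally, to control the stratum that the local clock advertises, I added fudge 127.127.1.0 stratum 10. Note that a fudge line only applies to the reference clock with the same address, so with server 127.127.1.1 it has to read fudge 127.127.1.1 stratum 10 (the common idiom is to use 127.127.1.0 on both lines). The final server-side ntp.conf is below: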

 # For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

#server 127.127.1.1 iburst
#fudge 127.127.1.0 stratum
#multicastclient
#broadcastdelay 0.008
#authenticate no

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery

# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

# Serve the local clock (reference clock driver type 1, unit 1).
server 127.127.1.1 iburst
# fudge must name the same address as the server line above
fudge 127.127.1.1 stratum 10

#broadcast 192.168.1.255 autokey	# broadcast server
#broadcastclient			# broadcast client
#broadcast 224.0.1.1 autokey		# multicast server
#multicastclient 224.0.1.1		# multicast client
#manycastserver 239.255.254.254		# manycast server
#manycastclient 239.255.254.254 autokey	# manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey

# Specify the key identifier to use with the ntpdc utility.
#requestkey

# Specify the key identifier to use with the ntpq utility.
#controlkey

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
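Because a fudge line only takes effect for the reference clock whose address matches a server line, a pairing such as server 127.127.1.1 with fudge 127.127.1.0 has no effect. The hypothetical lint helper below flags that case; it only understands the simple one-directive-per-line layout used in this file and ignores anything after a #.

```python
def unmatched_fudges(conf_text: str) -> list:
    """Return fudge addresses that do not match any active server/peer/pool line.
    Hypothetical helper; comments (everything after '#') are ignored."""
    servers, fudges = set(), []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        words = line.split()
        if words[0] in ("server", "peer", "pool") and len(words) > 1:
            servers.add(words[1])
        elif words[0] == "fudge" and len(words) > 1:
            fudges.append(words[1])
    return [addr for addr in fudges if addr not in servers]


conf = """
server 127.127.1.1 iburst
fudge 127.127.1.0 stratum 10
"""
print(unmatched_fudges(conf))  # the mismatched fudge address
```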

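One last point, about large time differences. In my test the clocks were about a year apart (a gap of a few hours is already enough to see this). The huge offset that ntpdate reports is not an error. Keep in mind how the correction is applied, though: ntpdate slews the clock gradually only for small offsets (documented as under 0.5 s, like the "adjust time server ... offset -0.047470" result earlier), which avoids disturbing running applications; for larger offsets it steps the clock in a single jump, as the "step time server" line below shows. With -d, ntpdate only reports what it would do and leaves the clock untouched.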

 [root@node3 tools]# ntpdate -d node0
Feb :: ntpdate[]: ntpdate 4.2.6p5@1.2349-o Mon Jan :: UTC ()
Looking for host node0 and service ntp
host found : node0
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
transmit(192.168.1.100)
receive(192.168.1.100)
server 192.168.1.100, port
stratum , precision -, leap , trust
refid [192.168.1.100], delay 0.02582, dispersion 0.00000
transmitted , in filter
reference time: da5a87a5.da45128d Tue, Feb ::53.852
originate timestamp: da5a87d1.6c4aead5 Tue, Feb ::37.423
transmit timestamp: d878db68.d01598ae Mon, Feb ::44.812
filter delay: 0.02591 0.02585 0.02582 0.02583
0.00000 0.00000 0.00000 0.00000
filter offset:
0.000000 0.000000 0.000000 0.000000
delay 0.02582, dispersion 0.00000
offset 31566952.609973
Feb :: ntpdate[]: step time server 192.168.1.100 offset 31566952.609973 sec
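The two ntpdate results in this post follow its documented rule: offsets under 0.5 s are slewed (adjust time server), larger ones are stepped (step time server). Here is a minimal sketch of that rule, plus a helper that turns the offset above into days and hours. Both function names are hypothetical, and the 0.5 s constant is taken from the ntpdate man page rather than from its source code.

```python
SLEW_LIMIT = 0.5  # seconds; boundary described in the ntpdate man page


def ntpdate_action(offset: float) -> str:
    """Slew (gradual adjtime) for small offsets, step (settimeofday) otherwise."""
    return "slew" if abs(offset) < SLEW_LIMIT else "step"


def describe(offset: float) -> str:
    """Render an offset in seconds as whole days plus hours."""
    days, rest = divmod(abs(offset), 86400)
    return f"{int(days)} days {rest / 3600:.1f} hours"


print(ntpdate_action(-0.047470), ntpdate_action(31566952.609973))
print(describe(31566952.609973))
```

ntpdate_action(-0.047470) returns "slew", matching the earlier adjust time server line, while the year-sized offset returns "step"; describe shows that offset is 365 days plus about 8.6 hours.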