When dealing with NTP infrastructure, we often encounter devices that misrepresent their stratum levels. In this case, a third-party NTP device incorrectly reports stratum 2 when it should actually be stratum 4 (receiving time from a stratum 3 server). The NTP algorithm naturally prefers lower stratum sources, creating an unwanted preference cascade.
The standard NTP implementation uses these key metrics for server selection:
- Stratum level (lower is better)
- Root dispersion
- Root delay
- Clock jitter
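To see how root delay, root dispersion, and jitter combine into a single figure, here is a minimal sketch of the root synchronization distance as described in the RFC 5905 reference code. The function name, argument names, and sample values are assumptions for illustration, and a given ntpd build may differ in detail.

```python
# Constants as given in RFC 5905 (values in seconds).
MINDISP = 0.01   # floor applied to the delay term
PHI = 15e-6      # assumed frequency tolerance (15 ppm)


def root_distance(root_delay, root_disp, peer_delay, peer_disp, jitter, age):
    """Approximate root synchronization distance in seconds.

    age is the number of seconds since the peer was last updated.
    """
    return (max(MINDISP, root_delay + peer_delay) / 2
            + root_disp + peer_disp + PHI * age + jitter)


# Hypothetical measurements for two sources (not taken from a real capture).
near = root_distance(0.020, 0.030, 0.0005, 0.001, 0.0015, 48)
far = root_distance(0.040, 0.060, 0.0014, 0.001, 0.0142, 50)
print(f"near={near:.4f}s far={far:.4f}s")
```

A source with a smaller distance is the more attractive one, all else being equal, which is why a device that understates its stratum or root statistics can look better than it really is.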
The problematic device configuration looks like this:
$ ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
x10.x.x.X        69.164.222.108   3 u   48   64  177    0.501  370.029   1.530
*10.x.x.Z        10.x.x.Z         2 u   50   64  377    1.354  -23.681  14.179
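If you want to inspect this output from a script, the following is a rough sketch of a parser for the peer lines. The Peer class and parse_peers function are hypothetical names, and the parser assumes numeric output (ntpq -pn) with exactly ten whitespace-separated columns.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    tally: str      # selection code in the first column (*, +, x, ...)
    remote: str
    refid: str
    stratum: int
    reach: int      # reachability register, parsed from octal
    delay: float
    offset: float
    jitter: float


# Tally codes ntpq may print in front of the remote address.
TALLY_CODES = set(" x.-+#*o")


def parse_peers(text):
    """Parse the peer lines of `ntpq -pn` output into Peer records."""
    peers = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith(("remote", "=")):
            continue  # skip blank lines, the header, and the separator
        if line[0] in TALLY_CODES:
            tally, rest = line[0], line[1:]
        else:
            tally, rest = " ", line
        f = rest.split()
        if len(f) != 10:
            continue  # not a peer line (for example the shell prompt)
        peers.append(Peer(tally, f[0], f[1], int(f[2]), int(f[6], 8),
                          float(f[7]), float(f[8]), float(f[9])))
    return peers


SAMPLE = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
x10.x.x.X 69.164.222.108 3 u 48 64 177 0.501 370.029 1.530
*10.x.x.Z 10.x.x.Z 2 u 50 64 377 1.354 -23.681 14.179
"""

for p in parse_peers(SAMPLE):
    print(p.tally, p.remote, "stratum", p.stratum, "offset", p.offset)
```

In the sample, the x in front of 10.x.x.X marks it as a falseticker, while the * in front of 10.x.x.Z marks the misreporting device as the currently selected source.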
The fudge directive only works for reference clocks, not network peers. Here are three practical solutions:
1. Using the noselect Keyword
In your ntp.conf:
server 10.x.x.Z noselect
server 10.x.x.X prefer
This completely excludes the problematic server from selection.
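2. Adjusting Poll Intervals
Poll the preferred server more often than the problematic one:
server 10.x.x.X prefer minpoll 4 maxpoll 4
server 10.x.x.Z minpoll 6 maxpoll 6
minpoll and maxpoll are base-2 exponents, so the preferred server is polled four times as often. Polling frequency is not itself a selection weight; fresher samples only help keep that server's dispersion low. It is mainly the prefer keyword that influences the outcome, and only if the preferred server survives the falseticker check.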
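Since minpoll and maxpoll are given as powers of two rather than seconds, a quick conversion helps when reading the configuration above. This is a small sketch; poll_seconds is a hypothetical helper name.

```python
def poll_seconds(exponent: int) -> int:
    """Convert an ntp.conf minpoll/maxpoll exponent to seconds."""
    return 2 ** exponent


# Exponents used in the configuration above.
for name, exp in (("10.x.x.X", 4), ("10.x.x.Z", 6)):
    print(f"{name}: polled every {poll_seconds(exp)} s")
```

For comparison, the ntpd defaults are minpoll 6 and maxpoll 10, which correspond to 64 and 1024 seconds.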
3. Creating a Local Reference Clock
For advanced setups, you can add the undisciplined local clock driver (127.127.1.0) as a fallback reference. Because it is a reference clock, fudge works on it:
server 127.127.1.0
fudge 127.127.1.0 stratum 5
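Then add your time sources as peers. Note that peer creates a symmetric association in which either side may synchronize to the other; use server lines instead if these devices should never take time from this host: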
peer 10.x.x.X
peer 10.x.x.Z
After implementing changes, verify with:
ntpq -pn
ntpdc -c sysinfo
ntptrace
For continuous monitoring, consider adding these to your NRPE or monitoring system:
command[check_ntp_peer]=/usr/lib/nagios/plugins/check_ntp_peer -H 10.x.x.X -w 0.5 -c 1.0
command[check_ntp_stratum]=/usr/lib/nagios/plugins/check_ntp_peer -H localhost -W 5 -C 10
When dealing with NTP hierarchy, stratum levels should propagate predictably - each hop should increment the stratum by 1. However, some network devices (particularly specialized time synchronization hardware) may hardcode their stratum values, creating inconsistencies in your time synchronization topology.
First, verify the actual stratum propagation with ntpq:
ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.1.1.10       69.164.222.108   3 u   48   64  177    0.501  370.029   1.530
 10.1.1.20       10.1.1.20        2 u   50   64  377    1.354  -23.681  14.179
In this case, 10.1.1.20 (a wireless clock transmitter) incorrectly reports stratum 2 despite being downstream from our stratum 3 server.
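The "one hop, one stratum" rule is easy to check mechanically once you know which upstream a device takes its time from. The sketch below assumes you supply that topology yourself; expected_stratum and misreports are hypothetical helper names.

```python
def expected_stratum(upstream_stratum: int) -> int:
    """A client should report its upstream's stratum plus one."""
    return upstream_stratum + 1


def misreports(reported: int, upstream_stratum: int) -> bool:
    """True if the reported stratum does not match the topology."""
    return reported != expected_stratum(upstream_stratum)


# 10.1.1.20 sits behind the stratum 3 server but reports stratum 2.
upstream = 3
reported = 2
if misreports(reported, upstream):
    print(f"reported stratum {reported}, "
          f"expected {expected_stratum(upstream)}")
```

For the device in this example the check flags the mismatch: it reports stratum 2 where stratum 4 is expected.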
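The maxdist setting is sometimes suggested here, but it is a parameter of the tos command rather than a per-server option, so it applies to every source:
tos maxdist 0.5
It sets the maximum root distance (in seconds) a source may have and still be considered for selection, which means it rejects distant sources across the board instead of demoting one particular device. To single out the misreporting server, combine the following techniques: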
1. Weighting Configuration
server 10.1.1.10 prefer
server 10.1.1.20 noselect
2. Using NTP Authentication
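Authenticate the preferred server with a symmetric key. This does not change how sources are ranked, but it ensures that the server you prefer is the one actually answering: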
server 10.1.1.10 key 42
keys /etc/ntp.keys
trustedkey 42
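3. Filtering with Restrictions
These flags limit what 10.1.1.20 may do to this host (no runtime reconfiguration, no traps, no ntpq queries); they do not by themselves stop ntpd from selecting it as a source: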
restrict 10.1.1.20 nomodify notrap noquery
After making changes, monitor the selection process:
ntpq -c "peers"
ntpdate -q 10.1.1.20
The peers output should now show your preferred server with the * marker indicating it's the current synchronization source.
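To automate that check, you could scan the output for the line that starts with the * marker. This is a minimal sketch; system_peer is a hypothetical function name and the sample text mirrors the output shown earlier.

```python
def system_peer(peers_output: str):
    """Return the remote address marked with '*', or None if there is none."""
    for line in peers_output.splitlines():
        if line.startswith("*"):
            return line[1:].split()[0]
    return None


SAMPLE = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.1.1.10 69.164.222.108 3 u 48 64 177 0.501 370.029 1.530
 10.1.1.20 10.1.1.20 2 u 50 64 377 1.354 -23.681 14.179
"""

selected = system_peer(SAMPLE)
print("current synchronization source:", selected)
```

On a live system the text would come from running ntpq -pn, for example through Python's subprocess module, instead of the hardcoded sample.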
For critical systems, consider declaring a local reference clock with proper stratum:
server 127.127.1.0
fudge 127.127.1.0 stratum 8
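This gives the host a stratum 8 fallback for when no network source is usable. Because a usable source with a lower stratum is normally preferred, keep the local clock's stratum numerically higher than that of any real source so it does not interfere with normal selection.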