Critical Time-Service Bug: Windows Servers Spontaneously Jumping 55+ Days into Future (with Partial Rollbacks)



We've encountered a critical Windows Time-Service issue affecting multiple server versions (2016/2019/2022) where system clocks suddenly jump forward 55+ days, then partially correct themselves through chaotic rollback sequences. This creates cascading failures in:

  • Kerberos authentication (typical error: KRB_AP_ERR_SKEW)
  • Database transaction timestamps
  • Log aggregation systems
  • SSL certificate validation

// Event sequence from our monitoring:
2023-04-15T14:32:18Z - Clock jumps to 2023-06-09
2023-04-15T14:32:33Z - Time-Service attempts -4454176s correction (fails)
2023-04-15T14:47:18Z - Secondary jump to +12h26m43s offset
2023-04-15T14:47:33Z - Successful correction within threshold
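
As a sanity check on the figures in that log, the attempted correction can be converted into days. This is a minimal Python sketch that uses only the values shown above; the variable names are ours.

```python
from datetime import date, timedelta

# Values copied from the event sequence above
correction = timedelta(seconds=4454176)                    # size of the attempted rollback
secondary = timedelta(hours=12, minutes=26, seconds=43)    # offset after the second jump
calendar_gap = date(2023, 6, 9) - date(2023, 4, 15)        # real date vs. date jumped to

print(correction)                 # rollback as days and h:mm:ss
print(calendar_gap.days)          # whole days between the two dates
print(secondary.total_seconds())  # secondary offset in seconds
```

The rollback works out to 51 days, 13:16:16, somewhat less than the 55-day calendar gap between the two dates in the log.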

We eliminated the common suspects and then deployed a watchdog:

  1. VMware Tools: Disabled time synchronization (tools.syncTime = "0")
  2. NTP Configuration: Verified with w32tm /query /configuration
  3. Hardware Clocks: Cross-verified with Get-WmiObject Win32_UTCTime
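
# PowerShell watchdog (run as a scheduled task every 5 minutes)
$maxDeltaSeconds = 300   # 5 minutes
$source = "TimeService"

# The last line of the /dataonly output looks like "14:32:18, +00.0012345s"
$sample = (w32tm /stripchart /computer:pool.ntp.org /dataonly /samples:1)[-1]

# Act only when an offset could be parsed and it exceeds the threshold in either direction
if ($sample -match '([+-]\d+(?:\.\d+)?)s\s*$' -and [math]::Abs([double]$Matches[1]) -gt $maxDeltaSeconds) {
    Stop-Service w32time
    w32tm /unregister
    w32tm /register
    Start-Service w32time
    w32tm /resync
    # Write-EventLog fails unless the event source has been registered first
    if (-not [System.Diagnostics.EventLog]::SourceExists($source)) { New-EventLog -LogName System -Source $source }
    Write-EventLog -LogName System -Source $source -EntryType Error -EventId 1001 -Message "Clock drift detected and reset"
}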
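
The offset check in that watchdog can be tested offline against captured w32tm output. The Python sketch below assumes the last line of `w32tm /stripchart ... /dataonly` has the shape `14:32:18, +00.0012345s`; the function names are ours and the 300-second default mirrors the 5-minute threshold above.

```python
import re
from typing import Optional

# Assumed shape of a /dataonly sample line: "14:32:18, +00.0012345s"
_OFFSET_RE = re.compile(r"([+-]\d+(?:\.\d+)?)s\s*$")


def parse_offset_seconds(line: str) -> Optional[float]:
    """Return the signed offset in seconds, or None if the line carries no offset."""
    match = _OFFSET_RE.search(line)
    return float(match.group(1)) if match else None


def needs_reset(line: str, max_delta_seconds: float = 300.0) -> bool:
    """True when the reported offset exceeds the threshold in either direction."""
    offset = parse_offset_seconds(line)
    return offset is not None and abs(offset) > max_delta_seconds
```

A line such as `14:32:18, error: 0x800705B4` yields `None`, so a failed query never triggers a reset on its own.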

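The bug appears to be triggered in the Windows Time Service's drift compensation algorithm under one or more of these conditions (unconfirmed, since no root cause has been identified):

  • Processing large time differences (> 2^32 microseconds, about 71.6 minutes)
  • Interacting with virtualized hardware clocks
  • Crossing a daylight saving transition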
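
To put the 2^32-microsecond figure in perspective, here is a short Python calculation. It only restates the arithmetic and makes no claim about how the Time Service stores offsets internally.

```python
from datetime import timedelta

# 2**32 microseconds, the threshold quoted in the list above
threshold = timedelta(microseconds=2**32)
print(threshold.total_seconds())        # threshold in seconds
print(threshold.total_seconds() / 60)   # threshold in minutes

# A 55-day jump compared with that threshold
jump = timedelta(days=55)
print(jump > threshold)
```

The threshold is a little under 72 minutes, so both the 55-day jump and the later 12-hour offset are far beyond it.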

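Add these registry values to cap corrections: MaxNegPhaseCorrection and MaxPosPhaseCorrection at 300 seconds (0x12c), and MaxAllowedPhaseOffset at 15 seconds (0xf). Restart the W32Time service, or run w32tm /config /update, afterwards: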

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config]
"MaxNegPhaseCorrection"=dword:0000012c
"MaxPosPhaseCorrection"=dword:0000012c
"MaxAllowedPhaseOffset"=dword:0000000f

Sample Prometheus alert rule for time drift detection:

groups:
- name: time_monitoring
  rules:
  - alert: WindowsTimeDrift
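    # Assumes the windows_exporter time collector, which exposes the computed offset directly.
    # Subtracting the offset from time() would always exceed 300 and fire constantly.
    expr: abs(windows_time_computed_time_offset_seconds{job="windows"}) > 300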
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.instance }} has clock drift >5 minutes"

We've encountered a particularly nasty Windows Time Service issue where domain-joined servers suddenly jump forward in time (55 days in our case), then partially correct themselves in erratic patterns. This behavior was observed across:

  • Windows Server 2016 (twice on same machine)
  • Windows Server 2019 (initial observation)
  • Independent reports of Server 2022 exhibiting similar behavior

Here's what happens during an incident (based on our logging):

1. [T+0] Clock suddenly jumps forward (e.g., 2023-01-01 when actual date is 2022-08-10)
2. [T+15s] Time Service detects discrepancy with DC, attempts correction
3. [T+15m] Second time jump occurs (smaller delta)
4. [T+15m15s] Final correction within acceptable threshold
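
A sequence like this can be spotted by comparing the wall clock with a monotonic clock, which is unaffected by time steps. The Python sketch below is one way to do that over recorded samples; the function name, the sample layout and the 5-second tolerance are our assumptions, and the sample data is synthetic.

```python
from typing import List, Tuple


def find_clock_steps(
    samples: List[Tuple[float, float]], tolerance: float = 5.0
) -> List[Tuple[int, float]]:
    """Find wall-clock steps in ordered (monotonic_seconds, wall_seconds) samples.

    Returns (index, step_seconds) for every sample where the wall clock moved
    by more than `tolerance` seconds beyond what the monotonic clock measured.
    """
    steps = []
    for i in range(1, len(samples)):
        mono_elapsed = samples[i][0] - samples[i - 1][0]
        wall_elapsed = samples[i][1] - samples[i - 1][1]
        step = wall_elapsed - mono_elapsed
        if abs(step) > tolerance:
            steps.append((i, step))
    return steps


# Synthetic samples: a 55-day forward jump, then a full rollback 15 seconds later
FIFTY_FIVE_DAYS = 55 * 86400
synthetic = [(0.0, 1000.0), (15.0, 1015.0 + FIFTY_FIVE_DAYS), (30.0, 1030.0)]
print(find_clock_steps(synthetic))
```

On a live system the pairs could be collected with `time.monotonic()` and `time.time()` at a fixed interval. Steps that occur and are reversed between two samples go unnoticed, so the interval limits what can be detected.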

We've ruled out several potential causes through extensive testing, using the following commands:

# Verify NTP sources
w32tm /query /peers

# Check time service configuration
w32tm /query /configuration

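# Enable time service debug logging (requires registry tweak; FileLogEntries is a string value, not a DWORD)
reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Config /v FileLogEntries /t REG_SZ /d 0-300 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Config /v FileLogName /t REG_SZ /d "C:\time.log" /f
# Microsoft's documented procedure also sets a maximum log size in bytes
reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Config /v FileLogSize /t REG_DWORD /d 10000000 /f
# Apply the change without restarting the service
w32tm /config /update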

While we confirmed this occurs on physical hardware too, for VMware environments:

# Ensure proper time sync configuration
vim-cmd hostsvc/advopt/view CpxUseHvTimer

# Critical VMware tools settings
<vmx>
tools.syncTime = "0"
time.synchronize.continue = "FALSE"
time.synchronize.restore = "FALSE"
time.synchronize.resume.disk = "FALSE"
</vmx>
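
To audit many VMs for these settings, the .vmx text can be scanned for time-sync keys that are still enabled. This is a rough Python sketch; the function name is ours, and it assumes both "0" and "FALSE" count as disabled. It reports only keys that are present in the file, so a missing key is not flagged.

```python
from typing import List


def enabled_sync_options(vmx_text: str) -> List[str]:
    """Return time-sync keys in the .vmx text that are not set to a disabled value."""
    disabled_values = {"0", "false"}
    still_enabled = []
    for line in vmx_text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not key = "value"
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').lower()
        is_sync_key = key.lower() == "tools.synctime" or key.lower().startswith(
            "time.synchronize."
        )
        if is_sync_key and value not in disabled_values:
            still_enabled.append(key)
    return still_enabled
```

For the four settings listed above the function returns an empty list; changing any of them to "TRUE" or "1" makes that key appear in the result.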

Until Microsoft provides a proper fix, we've implemented these measures:

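# PowerShell watchdog script
# Compares the wall clock against a monotonic stopwatch, so long uptime alone never triggers it
# (comparing against LastBootUpTime would fire on any server that has been up for over 30 days)
while ($true) {
    $wallStart = Get-Date
    $timer = [System.Diagnostics.Stopwatch]::StartNew()
    Start-Sleep -Seconds 300
    # Wall-clock time elapsed minus real time elapsed = net clock step during the interval
    $delta = [math]::Abs((((Get-Date) - $wallStart) - $timer.Elapsed).TotalDays)
    if ($delta -gt 30) {
        Restart-Service W32Time
        w32tm /resync
        # -From and -SmtpServer are placeholders; Send-MailMessage requires a sender address
        Send-MailMessage -From watchdog@domain.com -SmtpServer smtp.domain.com -To admin@domain.com -Subject "Time drift detected" -Body "Delta: $delta days"
    }
}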

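These settings have reduced (but not eliminated) occurrences. Note that 0xffffffff for MaxNegPhaseCorrection and MaxPosPhaseCorrection tells the service to accept a correction of any size, so unlike the 300-second cap shown earlier, these values remove the limit rather than tighten it: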

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config]
"MaxNegPhaseCorrection"=dword:ffffffff
"MaxPosPhaseCorrection"=dword:ffffffff
"PhaseCorrectRate"=dword:00000001
"PollAdjustFactor"=dword:00000005
"SpikeWatchPeriod"=dword:00000384

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient]
"SpecialPollInterval"=dword:00000e10

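Because .reg files store DWORDs in hexadecimal, it is easy to misread a value. The Python sketch below decodes the dword lines of a .reg export into decimal; the function name is ours, and it ignores section headers, so identical value names under different keys would overwrite each other.

```python
import re
from typing import Dict

# Matches lines such as "SpikeWatchPeriod"=dword:00000384
_DWORD_RE = re.compile(r'^"(?P<name>[^"]+)"=dword:(?P<hex>[0-9a-fA-F]{1,8})$')


def decode_dwords(reg_text: str) -> Dict[str, int]:
    """Map each DWORD value name in the .reg text to its decimal value."""
    values = {}
    for line in reg_text.splitlines():
        match = _DWORD_RE.match(line.strip())
        if match:
            values[match.group("name")] = int(match.group("hex"), 16)
    return values


reg_text = '''
[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\W32Time\\Config]
"MaxNegPhaseCorrection"=dword:ffffffff
"SpikeWatchPeriod"=dword:00000384
"SpecialPollInterval"=dword:00000e10
'''
for name, value in decode_dwords(reg_text).items():
    print(f"{name} = {value}")
```

For the values above, SpikeWatchPeriod decodes to 900 seconds and SpecialPollInterval to 3600 seconds, while 0xffffffff is 4294967295.
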
Our case history with Microsoft Support:

  • 2022-09-15: Initial case opened (Reference#: SRX14589234)
  • 2022-10-03: Acknowledged as known issue, no ETA for fix
  • 2022-11-17: Suggested workarounds (registry tweaks above)
  • 2023-01-05: Case still open, no root cause identified