Troubleshooting Ansible Sudo Authentication Failures: When Playbooks Hang on Gathering Facts


In a typical Ansible sudo authentication scenario, several components interact:

Ansible Client → SSH → Remote Host → Sudo → PAM

The failure shows up during fact gathering because, when privilege escalation is enabled for the play, that is the first task Ansible runs through sudo. Let's examine the critical components.

First, verify these essential configuration points on the problematic host:

# Check sudoers configuration
$ sudo visudo -c

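# Verify PAM configuration (diff cannot read host:path, so stream the remote file over SSH)
$ ls -la /etc/pam.d/sudo
$ ssh server1 cat /etc/pam.d/sudo | diff /etc/pam.d/sudo -

# Examine SSH configuration for subtle differences (reading sshd_config may require root on both hosts)
$ ssh server1 cat /etc/ssh/sshd_config | diff /etc/ssh/sshd_config -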

When facing sudo authentication issues, these commands help isolate the problem:

# Test basic sudo functionality
$ echo "password" | sudo -S -k whoami

# Test sudo with Ansible-style command
$ echo "password" | sudo -S -k /bin/sh -c 'echo SUDO-SUCCESS-test; whoami'

# Verbose SSH logging
$ ansible-playbook playbook.yml -vvvv

Several scenarios can cause this behavior:

1. Terminal Allocation Issues

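# In ansible.cfg: -n makes sudo fail immediately instead of waiting for a prompt it cannot show.
# Use -n only where the user has NOPASSWD; if a password is required, sudo exits with an error.
[defaults]
sudo_flags = -H -S -n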

2. PAM Module Differences

# Compare PAM stack
$ sudo grep -v ^# /etc/pam.d/sudo | grep -v ^$

3. SSH Configuration Differences

# Check for subtle SSHd differences
$ sudo sshd -T | grep -Ei 'PermitEmptyPasswords|ChallengeResponseAuthentication'
# (sshd -T prints option names in lowercase, hence the case-insensitive match)
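
The PAM comparison in item 2 is easier to read when comments, blank lines, and spacing differences are removed before diffing. The following is a minimal sketch under that assumption; normalize_pam is a hypothetical helper name, and server1 is reused from the earlier examples as the host to compare against.

```shell
#!/bin/sh
# Hypothetical helper: reduce a PAM file (read from stdin) to its effective rules.
# It strips "#" comments, collapses runs of whitespace, and drops empty lines.
# Assumption: no module argument contains a literal "#".
normalize_pam() {
    sed -e 's/#.*$//' \
        -e 's/[[:space:]]\{1,\}/ /g' \
        -e 's/^ //' \
        -e 's/ $//' | grep -v '^$'
}

# Usage sketch (requires SSH access to server1):
#   normalize_pam < /etc/pam.d/sudo > /tmp/local.pam
#   ssh server1 cat /etc/pam.d/sudo | normalize_pam > /tmp/server1.pam
#   diff /tmp/local.pam /tmp/server1.pam
```

An empty diff means the two sudo PAM stacks are equivalent, so the difference between hosts has to be somewhere else, for example in an included file such as system-auth.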

For persistent cases, implement these deeper diagnostics:

# Real-time debugging with strace
$ strace -f -o /tmp/ansible-debug.log ansible-playbook playbook.yml

# PAM debugging
$ sudo tail -f /var/log/secure | grep -i pam

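Suggested ansible.cfg settings for password-based sudo (the [privilege_escalation] section requires Ansible 1.9 or later; -n is omitted because a password is requested):

[defaults]
sudo_flags = -H -S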
sudo_exe = /usr/bin/sudo
timeout = 30
gathering = smart

[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = True

When the legacy sudo keywords fail consistently, test with the become keywords instead (available from Ansible 1.9; on 1.8.2 the equivalent is sudo: yes):

# Using become instead of sudo
- hosts: problematic_host
  become: yes
  tasks:
    - name: Test privilege escalation
      command: whoami
      register: result
    
    - debug: var=result.stdout

After making changes, validate with:

# Test sudo with the exact command Ansible uses
$ echo "yourpassword" | sudo -S -k /bin/sh -c 'echo SUDO-SUCCESS-testcommand; LANG=C LC_CTYPE=C /usr/bin/python /tmp/ansible-test'

# Check environment variables
$ sudo -i env | grep -E 'LANG|LC_'

When working with Ansible 1.8.2 on RHEL 6.4 systems, you may see fact gathering succeed on one host and fail on another:

GATHERING FACTS *************************************************************** 
ok: [Server2]
failed: [Server1] => {"failed": true, "parsed": false}

The authentication fails despite correct sudo credentials being provided. Let's examine why this occurs on some servers but not others.

The problem manifests when these conditions converge:

  • RHEL 6.4 with Ansible 1.8.2
  • NIS-authenticated users
  • Sudo version 1.8.6p3
  • OpenSSH 5.3p1

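In /var/log/secure the two hosts behave differently. On Server1 the second entry arrives five minutes after the first, which is consistent with sudo waiting for a password until its default prompt timeout (passwd_timeout, 5 minutes) expires, and would explain the apparent hang: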

# Server1 (failing)
Dec 31 15:21:11 Server1 sudo: pam_unix(sudo:auth): authentication failure
Dec 31 15:26:13 Server1 sudo: pam_unix(sudo:auth): conversation failed

# Server2 (working)
Dec 31 15:21:12 Server2 sudo: User1 : TTY=pts/2 ; PWD=/home/User1 ; USER=root

Option 1: Modify Ansible Configuration

Add these settings to ansible.cfg:

[defaults]
sudo_flags = -H -S -n

The flags breakdown:

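  • -H: Set HOME to the target user's home directory
  • -S: Read the password from standard input instead of the terminal
  • -n: Never prompt; sudo exits with an error if a password is required, so combine it only with NOPASSWD rules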
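
Because -n makes sudo fail rather than prompt, it also gives a quick way to check whether a host allows passwordless sudo for the current user. This is a minimal sketch; probe_sudo is a hypothetical helper, and a non-zero exit from sudo -n can also mean the user is not listed in sudoers at all.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can run sudo
# without being prompted. "sudo -n true" never waits for input, so this
# cannot hang the way a playbook run can.
probe_sudo() {
    if ! command -v sudo >/dev/null 2>&1; then
        echo "sudo-not-installed"
    elif sudo -n true 2>/dev/null; then
        echo "passwordless"
    else
        echo "password-required-or-denied"
    fi
}

# Usage sketch, run on the remote host over SSH:
#   ssh Server1 'sudo -n true' && echo "safe to use -n in sudo_flags"
```

If the probe reports that a password is required, keeping -n in sudo_flags will turn the hang into an immediate failure, but it will not make the play succeed.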

Option 2: Update PAM Configuration

Edit /etc/pam.d/sudo on problematic servers:

# Add to the top of the file
auth sufficient pam_permit.so

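Warning: pam_permit.so always succeeds, so this line lets every user permitted by sudoers run sudo without any password check. Use it only as a temporary diagnostic in trusted environments, and remove it afterwards.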

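Option 3: Use SSH Key Authentication

Install your public key on the hosts so the SSH login no longer depends on password authentication (this does not remove the sudo password requirement by itself):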

---
- hosts: all
  become: no   # the task edits the connecting user's own authorized_keys, so sudo is not needed
  gather_facts: no
  
  tasks:
    - name: Setup SSH keys
      authorized_key:
        user: "{{ ansible_user }}"
        key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"

The ad-hoc command below succeeds on the same host because it follows a different execution path (-s enables sudo, -K prompts for the sudo password):

ansible Server1 -m file -a "dest=/tmp/ansible_test.txt state=touch" -sK

Key differences:

  • Direct command execution bypasses fact gathering
  • Different stdin handling in ad-hoc mode
  • Simpler authentication context

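For production environments, a passwordless sudoers rule for the automation account avoids the prompt entirely:

# Create /etc/sudoers.d/ansible (edit with: visudo -f /etc/sudoers.d/ansible)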
User1 ALL=(ALL) NOPASSWD: ALL

# Then in ansible.cfg
[defaults]
sudo_flags = -H -S

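This removes the password prompt from the sudo path, which makes runs reliable. Weigh that against the security cost of passwordless root for User1, and restrict the rule to specific commands where possible.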
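
A syntax error in a sudoers drop-in can lock out sudo entirely, so it is worth validating the file before installing it. The sketch below assumes it runs as root on the target host; render_sudoers_entry and install_sudoers_entry are hypothetical helper names, and User1 is the example account from above.

```shell
#!/bin/sh
# Hypothetical helper: print a NOPASSWD sudoers entry for one user.
render_sudoers_entry() {
    printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1"
}

# Hypothetical helper: write the entry to a temporary file, check its
# syntax with "visudo -cf", and install it only if the check passes.
# Assumption: run as root; the default target path comes from the text above.
install_sudoers_entry() {
    user=$1
    target=${2:-/etc/sudoers.d/ansible}
    tmp=$(mktemp) || return 1
    render_sudoers_entry "$user" > "$tmp"
    if visudo -cf "$tmp" >/dev/null; then
        # sudo ignores drop-ins with loose permissions, so install as 0440 root:root
        install -m 0440 -o root -g root "$tmp" "$target"
        status=$?
    else
        echo "sudoers syntax check failed; nothing installed" >&2
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage sketch (as root):
#   install_sudoers_entry User1
```

Validating a temporary copy first means a typo never reaches /etc/sudoers.d, where it could break sudo for every user on the host.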