Troubleshooting Ansible Stuck on Gathering Facts: SSH Connection and Setup Module Issues


When Ansible hangs during the "Gathering Facts" phase, it's typically related to SSH connectivity issues or problems with the setup module. The verbose output shows the connection attempt and command execution details, which can help diagnose where exactly the process is failing.

From the output, we can identify several potential issues:

  • SSH connection timeout (ConnectTimeout=10, Ansible's default, can be too short on slow or high-latency links)
  • Authentication problems (PreferredAuthentications settings)
  • ControlPath directory permissions
  • Network connectivity to the target host

First, try running a simple SSH command manually to verify connectivity:

ssh -vvv -p 2221 deploy@5.xxx.xxx.xxx

If that works, try the exact command Ansible is attempting. The $HOME references are escaped here so that they expand on the remote host (as user deploy) rather than in your local shell:

ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s \
-o ControlPath=/home/vagrant/.ansible/cp/ansible-ssh-%h-%p-%r \
-o Port=2221 -o KbdInteractiveAuthentication=no \
-o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey \
-o PasswordAuthentication=no -o User=deploy -o ConnectTimeout=10 \
5.xxx.xxx.xxx "/bin/sh -c 'mkdir -p \$HOME/.ansible/tmp/ansible-tmp-1411372677.18-251130781588968 && chmod a+rx \$HOME/.ansible/tmp/ansible-tmp-1411372677.18-251130781588968 && echo \$HOME/.ansible/tmp/ansible-tmp-1411372677.18-251130781588968'"

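Adjust your ansible.cfg to lengthen the connection timeout and to cache facts so they are gathered less often. Note that host_key_checking = False turns off host key verification, so it is best limited to disposable test machines: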

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
timeout = 30
host_key_checking = False

For Vagrant environments, add these settings to your inventory file (on Ansible 2.0 and later, the preferred variable names are ansible_port and ansible_user):

[vagrant]
5.xxx.xxx.xxx ansible_ssh_port=2221 ansible_ssh_user=vagrant ansible_ssh_private_key_file=.vagrant/machines/default/virtualbox/private_key

As a temporary workaround, you can disable fact gathering:

- hosts: all
  gather_facts: no
  tasks:
    - name: Your task here
      debug:
        msg: "Running without facts"
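
Disabling facts hides the hang rather than explaining it. One way to narrow it down, sketched below, is to run three ad hoc commands of increasing depth: raw (SSH only, no Python needed on the target), ping (module transfer and execution), and setup (fact collection itself). The function name fact_ladder, the ANSIBLE_BIN and STAGE_TIMEOUT variables, and the 60-second default are illustrative assumptions, and the sketch relies on the coreutils timeout utility being installed.

```shell
# Hypothetical helper: run raw -> ping -> setup against one host pattern and
# report the first stage that fails or exceeds its deadline.
fact_ladder() {
    local pattern="$1"
    local bin="${ANSIBLE_BIN:-ansible}"      # override to dry-run without Ansible
    local deadline="${STAGE_TIMEOUT:-60}"    # seconds per stage (assumed default)
    local stage rc

    for stage in raw ping setup; do
        rc=0
        # Ansible's own output goes to stderr so stdout carries only the summary
        if [ "$stage" = raw ]; then
            # raw needs a command to execute; "true" is the cheapest one
            timeout "$deadline" "$bin" "$pattern" -m raw -a true >&2 || rc=$?
        else
            timeout "$deadline" "$bin" "$pattern" -m "$stage" >&2 || rc=$?
        fi
        if [ "$rc" -eq 124 ]; then           # 124 is timeout's "timed out" status
            echo "$stage: timed out after ${deadline}s"
            return 1
        elif [ "$rc" -ne 0 ]; then
            echo "$stage: failed (exit $rc)"
            return 1
        fi
        echo "$stage: ok"
    done
}
```

Called as `STAGE_TIMEOUT=30 fact_ladder vagrant`, it stops at the first stage that does not complete. If raw succeeds but ping stalls, the problem likely lies in Python or module transfer on the target; if only setup stalls, fact collection itself is the suspect, for instance a hardware or mount probe blocking on an unresponsive filesystem.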

For long-term stability, consider these improvements:

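  1. Increase the SSH timeout to at least 30 seconds (timeout = 30 in ansible.cfg)
  2. Simplify authentication methods, for example by limiting PreferredAuthentications to publickey so SSH does not try GSSAPI first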
  3. Verify ControlPath directory exists and has correct permissions
  4. Check for network issues between your control machine and target hosts
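
Item 3 can be checked with a few lines of shell. The sketch below assumes the ControlPath directory is ~/.ansible/cp, as in the ssh command shown earlier; the function name check_control_dir is made up for illustration. It verifies that the directory exists and is usable by the current user, then lists any leftover control sockets, since a stale socket from a dead master connection is one possible cause of a hang.

```shell
# Hypothetical helper: sanity-check an SSH ControlPath directory.
check_control_dir() {
    local dir="${1:-$HOME/.ansible/cp}"

    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # ssh must be able to create and reach sockets inside the directory
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "not writable by $(id -un): $dir"
        return 1
    fi
    echo "ok: $dir"
    # List leftover control sockets (file type "s"); each one can be probed
    # with: ssh -S <socket> -O check <host>
    find "$dir" -maxdepth 1 -type s
}
```

Running `check_control_dir` with no argument inspects the default location. A socket that `ssh -O check` reports as dead can be deleted so the next run opens a fresh master connection.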

Beyond the SSH options themselves, a hang during "Gathering Facts" can also come from problems outside Ansible's configuration, anywhere between the control node and the managed hosts.

Two groups of causes are worth checking:

# Authentication and access problems
- SSH key permissions
- Changed host keys
- Network/firewall changes

# Performance bottlenecks
- High CPU usage on target
- Slow DNS resolution
- SSH connection throttling

Test SSH connectivity manually using the exact parameters from the output. With no remote command given, this should drop you into an interactive shell on the target:

ssh -C -tt -vvv -o ControlMaster=auto \
-o ControlPersist=60s \
-o ControlPath=/home/vagrant/.ansible/cp/ansible-ssh-%h-%p-%r \
-o Port=2221 \
-o KbdInteractiveAuthentication=no \
-o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey \
-o PasswordAuthentication=no \
-o User=deploy \
-o ConnectTimeout=10 \
5.xxx.xxx.xxx

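Modify your ansible.cfg with these settings. Pipelining reduces the number of SSH operations per task, but when sudo is used it requires that requiretty be disabled in the target's sudoers file: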

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
timeout = 30
forks = 5

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=60
pipelining = True

For Vagrant environments specifically, add these to your playbook:

- hosts: all
  vars:
    ansible_ssh_common_args: '-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'
    ansible_connection: ssh
    ansible_user: vagrant
    ansible_ssh_private_key_file: ./.vagrant/machines/default/virtualbox/private_key
  tasks:
    - name: Test connection
      ping:

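When troubleshooting, you can completely bypass fact gathering. Tasks that reference fact variables such as ansible_os_family will fail without them:

- hosts: webservers
  gather_facts: no
  become: yes  # package installation needs root; drop this if you connect as root
  tasks:
    - name: Install packages
      apt:
        # passing a list installs everything in a single apt transaction
        name:
          - nginx
          - postgresql
        state: present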

Check these network components:

# On control node (replace 22 with your SSH port, 2221 in the example above):
ping target_host
traceroute target_host
telnet target_host 22
nc -zv target_host 22

# On managed host (the service unit is named ssh on Debian and Ubuntu):
sudo netstat -tulnp | grep sshd
journalctl -u sshd -f
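
From the control node, nc -zv only confirms that the port accepts connections. A scripted variant that also waits for the SSH banner can help separate "port closed" from "port open but sshd not answering". The sketch below is one way to do that, assuming bash (for its /dev/tcp pseudo-device) and the coreutils timeout utility are available; the function name ssh_banner and its exit codes are this sketch's own conventions, not part of any tool.

```shell
# Hypothetical helper: print the first line the server sends on HOST:PORT.
# Returns 0 with the banner, 1 if the connect fails, 2 if the port is open
# but nothing arrives in time, 124 if the overall deadline is hit.
ssh_banner() {
    local host="$1" port="${2:-22}" wait="${3:-5}" banner rc=0
    # /dev/tcp is a bash feature, so the probe runs in a child bash; the outer
    # timeout is a hard cap in case the connect itself is silently dropped
    banner=$(timeout "$((wait + 2))" bash -c '
        exec 3<>"/dev/tcp/$0/$1" || exit 1
        IFS= read -r -t "$2" line <&3 || exit 2
        printf "%s\n" "$line"
    ' "$host" "$port" "$wait" 2>/dev/null) || rc=$?
    [ "$rc" -eq 0 ] || return "$rc"
    # the banner line ends in CRLF; drop the carriage return
    printf '%s\n' "$banner" | tr -d '\r'
}
```

For example, `ssh_banner target_host 2221` should print a line beginning with SSH-2.0 when sshd is healthy. A return code of 2 or 124 would suggest looking at the daemon or at something between the hosts rather than at Ansible.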