In configuration management systems, the push-pull dichotomy creates distinct operational patterns. Pull-based systems like Puppet and Chef implement a continuous synchronization model in which agents periodically check in with the control server (typically every 30 minutes). This creates inherent load distribution:
```
# Puppet agent's periodic run via crontab
*/30 * * * * /opt/puppetlabs/bin/puppet agent --onetime --no-daemonize
```
Push systems like Ansible operate through orchestrated execution bursts. The control node initiates SSH connections simultaneously, requiring careful thread management:
```sh
# Ansible playbook execution with fork control
ansible-playbook site.yml --forks 50
```
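Ansible defaults to 5 forks; raising the value increases the number of parallel SSH sessions at the cost of control-node CPU and memory.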
| Metric | Push Model | Pull Model |
|---|---|---|
| Connection Initiation | Controller → Nodes (bursty) | Nodes → Controller (distributed) |
| Failure Handling | Requires retry logic | Built-in through polling |
| New Node Bootstrap | Manual intervention needed | Automatic registration |
Rackspace's experience with Ansible demonstrates that push systems can scale when they implement:
- Connection pipelining (SSH multiplexing, configured below)
- Delta-based change propagation (sketched after the multiplexing example)
- Hierarchical execution topology
```ini
# SSH multiplexing configuration in ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
control_path = ~/.ssh/ansible-%%r@%%h:%%p
```
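Delta-based change propagation is commonly approximated with rsync, whose rolling-checksum algorithm transfers only the blocks that changed; a minimal sketch (the host, user, and paths are hypothetical):

```sh
# Push only the changed portions of the config tree to one node
rsync -az --delete configs/ deploy@node01.example.com:/etc/myapp/
```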
The infrastructure.org critique reveals fundamental concurrency challenges in push systems:
```sh
# Problematic push implementation pattern
for host in $(cat hostlist); do
  scp configs/* "$host":/etc/ &  # unbounded backgrounding exhausts sockets and file descriptors
done
```
Compare a properly bounded, threaded implementation (sketched here with the paramiko library; hosts and paths are illustrative):
```python
# Python example using ThreadPoolExecutor with paramiko for SSH/SFTP
from concurrent.futures import ThreadPoolExecutor
from paramiko import AutoAddPolicy, SSHClient

hostlist = ["node01.example.com", "node02.example.com"]  # illustrative hosts

def push_config(host):
    with SSHClient() as ssh:
        ssh.set_missing_host_key_policy(AutoAddPolicy())
        ssh.connect(host)
        with ssh.open_sftp() as sftp:
            sftp.put("configs/app.conf", "/etc/app.conf")  # illustrative paths

# max_workers bounds concurrency, avoiding the socket exhaustion above
with ThreadPoolExecutor(max_workers=50) as executor:
    executor.map(push_config, hostlist)
```
Modern solutions like SaltStack demonstrate hybrid approaches where minions can operate in both push and pull modes:
```sh
# SaltStack push mode: master triggers a run on all minions
salt '*' state.apply
```
```yaml
# SaltStack pull mode: scheduled highstate in the minion config
schedule:
  highstate:
    function: state.apply
    minutes: 30
```
The architectural choice ultimately depends on:
- Network topology constraints
- Change propagation urgency
- Node churn rate
- Security model requirements
In configuration management systems, the push vs. pull debate centers on how configuration updates propagate through infrastructure. Pull-based systems like Puppet and Chef have clients periodically check a central server for updates, while push-based systems like Ansible initiate changes from a control node.
The primary advantage of pull-based systems emerges in large-scale deployments:
```ini
# Example Puppet agent configuration (pull-based)
[agent]
server = puppet-master.example.com
runinterval = 1800  # check every 30 minutes (value in seconds)
```
This architecture naturally handles:
- Client-initiated connections that don't overwhelm the server
- Built-in retry mechanisms when clients are offline
- Easier horizontal scaling of masters
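As a concrete example of the load-distribution point, Puppet's splay settings add a random delay before each agent run so check-ins do not synchronize (splay and splaylimit are real puppet.conf settings; the values here are illustrative):

```ini
[agent]
splay = true
splaylimit = 600  # delay each run by up to 600 seconds
```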
While Rackspace has demonstrated that push can scale to 15k nodes, the implementation becomes complex:
```sh
# Naive push implementation that doesn't scale
for host in $(cat hostlist); do
  ssh "$host" "sudo apt-get update && sudo apt-get upgrade -y"
done
```
The problems with this approach include:
- Connection timeouts for offline nodes
- TCP socket exhaustion
- No built-in retry mechanism
- Parallel execution complexity
Here's how modern systems handle these challenges:
**Pull-Based Optimization**
```ruby
# Chef client configuration with optimized pull (chef_client_updater cookbook)
chef_client_updater 'Install latest Chef' do
  version 'latest'
  post_install_action 'kill'
end
```
**Push-Based Scaling Solutions**
```yaml
# Ansible playbook with scaling optimizations
- hosts: all
  serial: 50
  max_fail_percentage: 5
  tasks:
    - name: Apply security updates
      apt:
        update_cache: yes
        upgrade: dist
```
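Here `serial: 50` rolls the play out in batches of 50 hosts instead of hitting every node at once, and `max_fail_percentage: 5` aborts the run if more than 5% of a batch fails, preventing a bad change from propagating fleet-wide.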
Consider these factors when selecting an architecture:
| Factor | Pull Advantage | Push Advantage |
|---|---|---|
| Offline nodes | ✔️ Automatic retry | ❌ Manual handling |
| Immediate changes | ❌ Polling delay | ✔️ Instant |
| Network topology | ✔️ Works behind NAT | ❌ Requires connectivity |
Some systems combine both models effectively:
```yaml
# SaltStack hybrid example
# Master-minion (pull) configuration:
file_client: remote
# Masterless (push) configuration:
file_client: local
```
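In masterless mode, a run is triggered locally on each node; a typical invocation:

```sh
# Apply states from the node's local file system, no master required
salt-call --local state.apply
```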
The key is matching your architecture to:
- Infrastructure size
- Change frequency
- Operational team size
- Network constraints