As a sysadmin, seeing those familiar update notifications on production boxes always triggers a mental calculus:
$ sudo apt update
...
30 packages can be updated.
16 updates are security updates.
The core tension lies between maintaining security (applying patches promptly) and ensuring stability (avoiding unexpected failures). Let me share battle-tested approaches from managing hundreds of production Ubuntu servers.
For security updates, we implement a tiered response system:
- Emergency patches (CVSS ≥ 7.0): Apply within 24 hours. Preview the run, apply it, then review the log:
sudo unattended-upgrade --dry-run --debug
sudo unattended-upgrade --debug
cat /var/log/unattended-upgrades/unattended-upgrades.log
Test in staging first, then deploy with monitoring checks in place.
- High-risk updates: Weekly maintenance window procedure:
#!/bin/bash
# production_update.sh
set -euo pipefail          # stop at the first failing step
apt-mark showhold          # review held packages before changing anything
apt-get update
apt-get upgrade -s         # simulate first!
apt-get install --only-upgrade <specific-package>
systemctl restart <affected-service>
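The simulation step in the script above is only useful if someone actually reads its result. As a minimal sketch (the function name `summarize_simulation` is my own, not part of apt), the summary line that `apt-get -s upgrade` prints can be reduced to two numbers, so the window can be aborted when the plan would remove packages:

```shell
#!/bin/bash
# Sketch: reduce a simulated upgrade to "<upgraded> <to-remove>".
# Reads `apt-get -s upgrade` output on stdin. Assumes the C locale, where apt-get
# ends its plan with a line such as:
#   "30 upgraded, 0 newly installed, 0 to remove and 2 not upgraded."
summarize_simulation() {
    awk '
        / upgraded, .* to remove/ {
            up = $1
            for (i = 2; i < NF; i++)
                if ($i == "to" && $(i + 1) == "remove") rm = $(i - 1)
        }
        END {
            if (up == "") exit 1   # no summary line found
            print up, rm
        }'
}
```

Typical use would be `LC_ALL=C apt-get -s upgrade | summarize_simulation`, stopping the run if the second number is not 0.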
We use Ansible to handle config file preservation during updates:
- name: Upgrade packages
  apt:
    upgrade: dist
    update_cache: yes
    autoremove: yes
    dpkg_options: 'force-confdef,force-confold'
- name: Check for needed reboots
  stat:
    path: /var/run/reboot-required
  register: reboot_required
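For hosts that are not managed through Ansible, the same reboot check can be done by hand. This is a rough sketch, assuming the standard Ubuntu marker files; the function name `reboot_status` and the optional directory argument (there only to make it testable) are my own:

```shell
#!/bin/bash
# Sketch: report whether a reboot is pending and which packages requested it.
# The directory argument is an illustrative hook for testing; it defaults to /var/run.
reboot_status() {
    local run_dir="${1:-/var/run}"
    if [ -f "$run_dir/reboot-required" ]; then
        echo "REBOOT REQUIRED"
        # When present, reboot-required.pkgs lists the packages that asked for it.
        if [ -f "$run_dir/reboot-required.pkgs" ]; then
            sort -u "$run_dir/reboot-required.pkgs"
        fi
        return 1
    fi
    echo "OK: no reboot pending"
    return 0
}
```

The non-zero return on a pending reboot makes it easy to chain, for example `reboot_status || schedule-the-reboot`, where the second command is whatever your site uses.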
For mission-critical services, we selectively freeze versions:
# Check held packages
apt-mark showhold
# Place hold on NGINX
sudo apt-mark hold nginx
# View changelog before updating
apt-get changelog postgresql-14
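Holds are easy to forget, so it can help to check them in scripts before a targeted upgrade. A small sketch, assuming `apt-mark showhold` prints one package name per line (foreign-architecture packages may appear as `name:arch`); the helper name `is_held` is hypothetical:

```shell
#!/bin/bash
# Sketch: succeed if the given package appears in the held list read from stdin.
is_held() {
    local pkg="$1"
    # -F fixed string, -x whole-line match, -q quiet: "nginx" must not match "nginx-common".
    grep -Fxq -- "$pkg"
}
```

Usage would look like `apt-mark showhold | is_held nginx && echo "nginx is frozen, skipping"`.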
Our CI/CD pipeline includes update validation:
// In Jenkinsfile
stage('Update Test') {
    steps {
        sh 'lxc launch ubuntu:22.04 test-container'
        // Wrap in sh -c so the whole chain runs inside the container, not on the agent
        sh 'lxc exec test-container -- sh -c "apt-get update && apt-get upgrade -y"'
        sh 'lxc exec test-container -- run-parts /etc/update-test.d/'
    }
    post {
        always {
            sh 'lxc delete -f test-container'
        }
    }
}
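The pipeline runs whatever lives in /etc/update-test.d/. As an illustration of what one such check might contain (the script name `10-dpkg-state` and the function `half_installed` are hypothetical, not taken from our actual test suite), this sketch flags packages that dpkg left in an incomplete state:

```shell
#!/bin/bash
# Hypothetical /etc/update-test.d/10-dpkg-state (name is illustrative).
# Reads `dpkg -l` output on stdin and prints packages that should be installed
# (desired state i or h) but whose status is not "installed", or that carry the
# reinst-required error flag (third column character R).
half_installed() {
    awk '$1 ~ /^[ih]([^i]|iR)/ { print $2 }'
}
```

Inside the container the script body would be roughly `dpkg -l | half_installed`, failing the stage when the output is non-empty.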
Essential post-update checks:
#!/bin/bash
# Nagios custom check
services=("nginx" "postgresql" "redis")
for service in "${services[@]}"; do
    if ! systemctl is-active --quiet "$service"; then
        echo "CRITICAL: $service down after update"
        exit 2
    fi
done
echo "OK: All services running"
exit 0
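For rollbacks, we keep previous package versions available on local mirrors, so a downgrade is a matter of finding the old version and installing it explicitly:
apt-cache policy <package>
sudo apt-get install <package>=<old-version>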
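To know which version to roll back to, it helps to record it before upgrading. A minimal sketch, assuming the usual `Installed:` and `Candidate:` lines of `apt-cache policy` in the C locale; the function name `policy_versions` is my own:

```shell
#!/bin/bash
# Sketch: print "<installed> <candidate>" from `apt-cache policy <package>` output on stdin.
policy_versions() {
    awk '
        $1 == "Installed:" { installed = $2 }
        $1 == "Candidate:" { candidate = $2 }
        END {
            if (installed == "" || candidate == "") exit 1
            print installed, candidate
        }'
}
```

For example, `LC_ALL=C apt-cache policy nginx | policy_versions` gives the pair to log in the change ticket; the first value is what `<old-version>` would be after the upgrade.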
After years of production management, here's my update cadence:
- Security updates: Apply within 72 hours, after testing (emergency patches within 24 hours, as described earlier)
- Non-critical updates: Monthly maintenance windows
- Database/Kernel updates: Quarterly with full backups
The key is implementing structured processes rather than ad-hoc updates.
Every sysadmin knows that stomach-dropping moment when you see:
30 packages can be updated.
16 updates are security updates.
The core challenge: security patches arrive daily, but production environments can't tolerate daily reboots or the risk of breakage. Here's how we navigate this minefield.
We follow these principles:
- Security first: Critical CVEs get patched within 24 hours
- Stability second: Non-critical updates follow a staged rollout
- Automation with oversight: Automated checks with human approval gates
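The first two principles boil down to a triage rule. Here is a rough sketch of it, assuming the CVSS ≥ 7.0 threshold from the tiered response described earlier; the function name `patch_tier` and the labels `emergency` and `staged` are illustrative:

```shell
#!/bin/bash
# Sketch: map a CVSS base score to a patch tier.
#   emergency -> patch within 24 hours
#   staged    -> follows the normal staged rollout
patch_tier() {
    local score="$1"
    # Bash has no floating-point comparison, so delegate to awk.
    awk -v s="$score" 'BEGIN {
        if (s !~ /^[0-9]+(\.[0-9]+)?$/ || s + 0 > 10) exit 2   # not a valid CVSS score
        if (s + 0 >= 7.0) print "emergency"
        else print "staged"
    }'
}
```

A human approval gate still sits behind this; the function only decides which queue an update lands in.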
Our solution combines unattended-upgrades with custom checks:
// /etc/apt/apt.conf.d/50unattended-upgrades
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
    // "${distro_id}:${distro_codename}-updates"; // Disabled for prod
};
Unattended-Upgrade::Automatic-Reboot "false";
Unattended-Upgrade::Mail "admin@example.com";
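Because a single uncommented line changes what gets installed automatically, a guard that lists the origins actually enabled can be worth having. This is a simplified sketch that only understands `//` comments and the block layout shown above; the function name `enabled_origins` is hypothetical, and `apt-config dump` remains the authoritative view of the merged configuration:

```shell
#!/bin/bash
# Sketch: print the origin patterns enabled in an Allowed-Origins block.
# Limitation: strips everything after "//", so it assumes no "//" inside quoted values.
enabled_origins() {
    local conf="$1"
    sed -e 's://.*$::' "$conf" |
        awk '
            /Allowed-Origins/ { inside = 1; next }
            inside && /};/    { inside = 0 }
            inside && match($0, /"[^"]+"/) {
                # Print the text between the quotes.
                print substr($0, RSTART + 1, RLENGTH - 2)
            }'
}
```

On a production host, `enabled_origins /etc/apt/apt.conf.d/50unattended-upgrades | grep -- '-updates'` should print nothing.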
Updates flow through this path:
- Test VM cluster (updates applied immediately)
- Staging environment (1-week delay)
- Production canary nodes (2-week delay)
- Full production rollout
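The delays in this path can be encoded so that a deployment job refuses to run too early. A minimal sketch under stated assumptions: the stage names `test`, `staging`, and `canary` are my own labels, and full production is deliberately left out because it has no fixed delay in the list above and depends on how the canaries behave:

```shell
#!/bin/bash
# Sketch: minimum age (in days) an update must have before entering a stage.
stage_delay_days() {
    case "$1" in
        test)    echo 0 ;;    # applied immediately
        staging) echo 7 ;;    # 1-week delay
        canary)  echo 14 ;;   # 2-week delay
        *)       return 1 ;;  # unknown stage, or one without a fixed delay
    esac
}

# Succeeds when an update that is <age_days> old may enter <stage>.
# Returns 2 for stages without a defined delay.
eligible_for_stage() {
    local stage="$1" age_days="$2" delay
    delay=$(stage_delay_days "$stage") || return 2
    [ "$age_days" -ge "$delay" ]
}
```

For example, `eligible_for_stage canary 10` fails, while `eligible_for_stage staging 10` succeeds.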
We use this Nagios check to monitor pending updates:
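#!/bin/bash
# apt-check prints "<total>;<security>" on stderr, which is safer to parse than
# grepping the human-readable text (that can mention "security" even at zero)
COUNTS=$(/usr/lib/update-notifier/apt-check 2>&1)
SECURITY_UPDATES=${COUNTS##*;}
if ! [[ "$SECURITY_UPDATES" =~ ^[0-9]+$ ]]; then
    echo "UNKNOWN: could not parse apt-check output: ${COUNTS}"
    exit 3
elif [ "$SECURITY_UPDATES" -eq 0 ]; then
    echo "OK: No security updates pending"
    exit 0
else
    echo "CRITICAL: ${SECURITY_UPDATES} security updates pending"
    exit 2
fi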
Our rollback procedure:
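# Snapshot before updates (tee, because a plain ">" redirect would run without sudo)
apt-mark showauto | sudo tee /var/backups/apt-mark-auto > /dev/null
apt-mark showmanual | sudo tee /var/backups/apt-mark-manual > /dev/null
dpkg --get-selections | sudo tee /var/backups/dpkg-selections > /dev/null
# Cancel upgrades that are scheduled but not yet applied
sudo apt-get install aptitude
sudo aptitude keep-all
# Restore the saved package selections (this restores the package set, not old versions)
sudo dpkg --set-selections < /var/backups/dpkg-selections
sudo apt-get dselect-upgrade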
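The selection snapshot records package names but not versions, so it cannot tell you what to downgrade to. One way to close that gap, sketched here under the assumption that version lists are captured before and after the update with `dpkg-query -W -f='${binary:Package}=${Version}\n'`; the function name `changed_packages` and the file names are illustrative:

```shell
#!/bin/bash
# Sketch: compare two "package=version" lists and print "package=old_version"
# for every package whose version changed during the update.
changed_packages() {
    local before="$1" after="$2"
    awk -F= '
        NR == FNR { old[$1] = $0; next }                    # first file: remember old entries
        ($1 in old) && old[$1] != $0 { print old[$1] }      # second file: version differs
    ' "$before" "$after"
}
```

The output is in the form `apt-get install` expects, so a rollback could be `sudo apt-get install --allow-downgrades $(changed_packages before.list after.list)`, provided the old versions are still available from a mirror or the local cache.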
For database servers, we add these precautions:
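#!/bin/bash
# mysql-safe-update.sh
set -o pipefail  # make a mysqldump failure fail the whole backup pipeline
if ! mysql -e "SELECT 1" >/dev/null 2>&1; then
    echo "DB connection check failed - aborting update"
    exit 1
fi
# Take backups before any package changes
if ! mysqldump --all-databases | gzip > "/backups/mysql/pre-update-$(date +%s).sql.gz"; then
    echo "Backup failed - aborting update"
    exit 1
fi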
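A backup that was never verified is a hope, not a backup. This sketch adds a cheap sanity check before packages are touched; it assumes the dump was made with mysqldump's default comments enabled, so the file ends with a `-- Dump completed` trailer, and the function name `verify_backup` is my own:

```shell
#!/bin/bash
# Sketch: sanity-check a gzipped mysqldump file before trusting it.
verify_backup() {
    local file="$1"
    # Must exist and be non-empty.
    [ -s "$file" ] || { echo "backup missing or empty: $file" >&2; return 1; }
    # Must be an intact gzip stream.
    gzip -t "$file" 2>/dev/null ||
        { echo "backup is not a valid gzip stream: $file" >&2; return 1; }
    # mysqldump writes a "-- Dump completed" trailer when it finishes cleanly
    # (absent if comments were disabled, e.g. with --skip-comments).
    gzip -dc "$file" | tail -n 5 | grep -q -- '-- Dump completed' ||
        { echo "dump looks truncated: $file" >&2; return 1; }
}
```

This does not replace a periodic test restore; it only catches the empty or cut-off dumps that tend to show up at the worst moment.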
Remember: No solution is perfect, but this balanced approach has kept our uptime at 99.99% while addressing critical vulnerabilities promptly.