When running critical cron jobs across redundant Debian servers, we face a common challenge: ensuring a job runs only once despite having multiple servers. Traditional approaches like file locking don’t handle server failures elegantly. Here’s a lightweight HA solution without relying on Heartbeat/Pacemaker.
Leverage atomic filesystem operations to "claim" cron execution rights: creating a symlink either succeeds or fails atomically, so only one server can hold the lock. This script (/usr/local/bin/cron-leader) checks which server owns the lock:
#!/bin/bash
# The lock must live on storage mounted by every node (e.g. an NFS share),
# otherwise each server only ever sees its own local lock
LOCK_FILE="/mnt/nas-share/ha-cron.lock"
THIS_SERVER=$(hostname -s)

# Attempt to claim the lock: ln -s fails if the symlink already exists,
# so only one server can succeed
if ln -s "$THIS_SERVER" "$LOCK_FILE" 2>/dev/null; then
    echo "[$(date)] Lock acquired by $THIS_SERVER"
    exit 0
else
    echo "[$(date)] Lock held by $(readlink "$LOCK_FILE")"
    exit 1
fi
In /etc/cron.d/ha-job:
# Run every hour, but only on the leader
0 * * * * root /usr/local/bin/cron-leader && /path/to/actual/script.sh
For more robust failure detection, use Consul’s session-based locks:
consul lock -child-exit-code -verbose cron-jobs /path/to/script.sh
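A hedged sketch of how this could be wired into cron; the cron.d file name, consul binary path, and log location are assumptions, and -try makes a node that loses the race skip the run instead of queuing behind the current holder:
# /etc/cron.d/consul-job (example name)
# consul lock holds a session-backed lock under the KV prefix "cron-jobs";
# if the holding server dies, its session expires and another node can take over
0 * * * * root /usr/bin/consul lock -try=10s -child-exit-code cron-jobs /path/to/script.sh >> /var/log/consul-cron.log 2>&1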
- NFS Consideration: If using shared storage, ensure flock works across nodes
- Clock Drift: Synchronize time with chrony or ntpd
- Cleanup: Add trap handlers to remove stale locks (see the sketch after this list)
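As one way to implement the cleanup item, here is a minimal sketch that folds the job into the wrapper and holds the lock only for the duration of a run; paths match the earlier example:
#!/bin/bash
# Leader wrapper with automatic cleanup
LOCK_FILE="/mnt/nas-share/ha-cron.lock"
THIS_SERVER=$(hostname -s)

if ln -s "$THIS_SERVER" "$LOCK_FILE" 2>/dev/null; then
    # Release the lock on exit; the INT/TERM trap turns a kill into a normal
    # exit so the EXIT trap still fires and the lock is removed
    trap 'rm -f "$LOCK_FILE"' EXIT
    trap 'exit 1' INT TERM
    /path/to/actual/script.sh
else
    echo "[$(date)] Lock held by $(readlink "$LOCK_FILE")"
    exit 1
fi
This trades persistent leadership for per-run locking: whichever server grabs the lock first in a given hour runs the job.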
The basic leader lock above never expires, so a dead leader blocks the job entirely. The following variant tackles the "double execution" problem with a lock that times out, again using only file-based locking on shared storage and no complex HA frameworks like Heartbeat/Pacemaker.
#!/bin/bash
# /usr/local/bin/cron-failover.sh
LOCK_FILE="/mnt/nas-share/cron.lock"
SERVER_ID=$(hostname -s)

# Acquire the lock if any of the following is true:
#  - no lock exists yet (the atomic ln -s succeeds),
#  - the existing lock is older than 5 minutes and therefore stale,
#  - we already hold the lock ourselves.
# ln -sf then recreates the symlink, refreshing its mtime for the next check.
if ln -s "$SERVER_ID" "$LOCK_FILE" 2>/dev/null ||
   [ -n "$(find "$LOCK_FILE" -mmin +5 2>/dev/null)" ] ||
   [ "$(readlink "$LOCK_FILE")" = "$SERVER_ID" ]; then
    ln -sf "$SERVER_ID" "$LOCK_FILE"
else
    exit 0
fi

# Your actual cron job commands here
/path/to/your/script.sh

# Optional: Release lock when done
# rm -f "$LOCK_FILE"
For the lock file mechanism to work reliably, consider these shared storage options:
- NFS share mounted on both servers (see the example mount after this list)
- GlusterFS distributed filesystem
- DRBD block-level replication
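As a sketch, an NFS mount for the lock directory might look like this in /etc/fstab on both servers; the server name nas01 and the export path are placeholders:
# /etc/fstab on both cron servers (nas01:/export/cron-locks is a placeholder)
# A hard mount (the NFS default) makes the wrapper block rather than act on a
# missing lock if the NAS is briefly unreachable
nas01:/export/cron-locks  /mnt/nas-share  nfs  defaults,hard  0  0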
If no shared filesystem is available, a database row can serve as the lock instead:
# MySQL/MariaDB implementation example
# Assumes a table with job_name as the primary key, e.g.:
#   CREATE TABLE cron_locks (job_name VARCHAR(64) PRIMARY KEY,
#                            server VARCHAR(64), timestamp DATETIME);
# Takes the row over only if the holder's timestamp is older than 5 minutes,
# and refreshes the timestamp when we already hold it
JOB_NAME="ha-job"   # any unique identifier for this job
LOCK_QUERY="INSERT INTO cron_locks (job_name, server, timestamp)
            VALUES ('${JOB_NAME}', '${SERVER_ID}', NOW())
            ON DUPLICATE KEY UPDATE
              server    = IF(timestamp < DATE_SUB(NOW(), INTERVAL 5 MINUTE),
                             VALUES(server), server),
              timestamp = IF(server = VALUES(server),
                             VALUES(timestamp), timestamp)"
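A sketch of how the query might be driven from the wrapper; the mysql invocation, credentials, database name cron_ha, and the ownership check are assumptions, not part of the original script:
# Run the upsert, then check whether this server now owns the lock row
mysql -N -u cron -p"$DB_PASS" cron_ha -e "$LOCK_QUERY"
OWNER=$(mysql -N -u cron -p"$DB_PASS" cron_ha \
        -e "SELECT server FROM cron_locks WHERE job_name = '${JOB_NAME}'")
if [ "$OWNER" = "$SERVER_ID" ]; then
    /path/to/your/script.sh
fi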
Schedule the wrapper from cron on every server; only the current lock holder actually runs the job:
# /etc/cron.d/HA-job
* * * * * root /usr/local/bin/cron-failover.sh > /var/log/ha-cron.log 2>&1
Implement a check like the following to ensure failover reliability:
#!/bin/bash
# Monitoring example: alert if the lock symlink has not been refreshed recently
# (stat without -L reads the mtime of the symlink itself; a missing lock falls
# back to 0 and also triggers the alert)
LOCK_MTIME=$(stat -c %Y /mnt/nas-share/cron.lock 2>/dev/null || echo 0)
CURRENT_TIME=$(date +%s)
MAX_AGE=600  # 10 minutes
if [ $((CURRENT_TIME - LOCK_MTIME)) -gt "$MAX_AGE" ]; then
    echo "CRITICAL: Cron lock file stale or missing" | mail -s "Cron Failover Alert" admin@example.com
fi
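The check can itself run from cron, for example every five minutes; the script path and cron.d file name below are placeholders:
# /etc/cron.d/ha-cron-monitor (placeholder name)
*/5 * * * * root /usr/local/bin/check-cron-lock.sh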