When scheduling frequent cron jobs (like every 5 minutes) that might occasionally run longer than the interval, we face the risk of overlapping executions. This becomes particularly problematic when:
- The job performs database operations that shouldn't be concurrent
- It consumes significant system resources
- Multiple instances could cause data corruption or race conditions
The most common approach is using lock files to prevent multiple instances:
#!/bin/bash
LOCKFILE="/tmp/my_cron_job.lock"

# Check for an existing lock file whose recorded PID is still alive
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    echo "Previous instance still running - exiting"
    exit 1
fi

# Clean up lock file when script finishes or crashes
trap 'rm -f "${LOCKFILE}"; exit' INT TERM EXIT

# Create lock file containing our PID
echo $$ > "${LOCKFILE}"

# Your actual job code here
./my_script.sh

# Explicitly remove lock file when done (the EXIT trap also covers this)
rm -f "${LOCKFILE}"
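The crontab entry then invokes the wrapper rather than the job directly; a sketch assuming the wrapper above is saved as a hypothetical /path/to/cron_wrapper.sh:
*/5 * * * * /path/to/cron_wrapper.sh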
To address cases where scripts crash without cleaning up lock files:
#!/bin/bash
LOCKFILE="/tmp/my_cron_job.lock"
LOCKTIMEOUT=3600 # 1 hour in seconds

# Check for stale lock file
if [ -e "${LOCKFILE}" ]; then
    # Get PID from lock file
    PID=$(cat "${LOCKFILE}")
    # Check if process is still running
    if ! kill -0 "$PID" 2>/dev/null; then
        # Process not running - clean up stale lock
        echo "Removing stale lock file"
        rm -f "${LOCKFILE}"
    else
        # Check lock file age (stat -c %Y is GNU stat: modification time)
        LOCKAGE=$(($(date +%s) - $(stat -c %Y "${LOCKFILE}")))
        if [ "${LOCKAGE}" -gt "${LOCKTIMEOUT}" ]; then
            echo "Force removing timed-out lock (age: ${LOCKAGE}s)"
            kill -9 "$PID"
            rm -f "${LOCKFILE}"
        else
            echo "Previous instance still running (PID: $PID)"
            exit 1
        fi
    fi
fi

# Create new lock file, then continue with the trap, job commands,
# and cleanup shown in the previous example
echo $$ > "${LOCKFILE}"
For more robust solutions:
Using Flock
The flock utility provides a simpler way to handle file locking:
#!/bin/bash
(
    # Take an exclusive, non-blocking lock on FD 200; exit if already held
    flock -n 200 || exit 1
    # Your commands here
    ./my_script.sh
) 200>/tmp/my_cron_job.lock
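flock can also wrap the job directly in the crontab entry, with no wrapper script at all; a minimal sketch, reusing the lock file above and a placeholder script path:
*/5 * * * * /usr/bin/flock -n /tmp/my_cron_job.lock /path/to/my_script.sh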
Database-Based Locking
For distributed systems, consider using database locks:
#!/bin/bash
# MySQL example. GET_LOCK() is tied to the client session, so acquiring
# and releasing it in separate `mysql -e` calls would not hold the lock
# in between; keep a single connection open for the whole job instead.
coproc MYSQLCONN { mysql -N -B --unbuffered; }
echo "SELECT GET_LOCK('my_cron_job', 0);" >&"${MYSQLCONN[1]}"
read -r LOCKED <&"${MYSQLCONN[0]}"
if [ "$LOCKED" != "1" ]; then
    echo "Job is already running"
    exit 1
fi

# Job code here

# Release lock (closing the session would also release it)
echo "SELECT RELEASE_LOCK('my_cron_job');" >&"${MYSQLCONN[1]}"
For modern Linux systems, consider using systemd timers instead of cron; a timer never starts its service while a previous run is still active, so overlap protection is built in:
# myjob.service
[Unit]
Description=My Periodic Job
[Service]
Type=oneshot
ExecStart=/path/to/my_script.sh
# myjob.timer
[Unit]
Description=Run my job every 5 minutes
[Timer]
OnCalendar=*-*-* *:0/5:0
Persistent=true
[Install]
WantedBy=timers.target
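With the units saved (typically under /etc/systemd/system/), enable the timer rather than the service:
systemctl daemon-reload
systemctl enable --now myjob.timer
systemctl list-timers myjob.timer # confirm the schedule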
Implement monitoring to detect stuck jobs:
#!/bin/bash
MAX_RUNTIME=300 # 5 minutes in seconds
LOCKFILE="/tmp/my_cron_job.lock"

if [ -e "${LOCKFILE}" ]; then
    PID=$(cat "${LOCKFILE}")
    if kill -0 "$PID" 2>/dev/null; then
        # etimes = elapsed seconds since the process started
        RUNTIME=$(ps -o etimes= -p "$PID" | awk '{print $1}')
        if [ "${RUNTIME}" -gt "${MAX_RUNTIME}" ]; then
            # Send alert
            echo "Job running too long (PID: $PID, ${RUNTIME}s)" | mail -s "Cron Job Alert" admin@example.com
        fi
    fi
fi
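The watchdog itself can be scheduled from cron; a sketch assuming it is saved to a hypothetical /usr/local/bin/check_cron_job.sh:
*/5 * * * * /usr/local/bin/check_cron_job.sh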
When dealing with cron jobs that run at short intervals (like every 5 minutes), you might encounter situations where a job takes longer to complete than the interval between runs. This can lead to:
- Multiple instances of the same script running simultaneously
- Resource contention and system overload
- Race conditions in data processing
- Unexpected behavior when scripts aren't designed for concurrency
Here are three reliable methods to prevent cron job overlap:
1. File Locking Mechanism
The most common approach is to implement file-based locking:
#!/bin/bash
LOCKFILE="/tmp/myscript.lock"

# Exit if a lock exists and its recorded PID is still alive
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    echo "Script already running"
    exit
fi

trap 'rm -f "${LOCKFILE}"; exit' INT TERM EXIT
echo $$ > "${LOCKFILE}"

# Your actual script commands go here...
sleep 60 # Example long-running process

rm -f "${LOCKFILE}"
2. Process Checking
For more robust checking, you can verify running processes:
#!/bin/bash
SCRIPT_NAME=$(basename "$0")

# Count processes matching this script's name, excluding our own PID
RUNNING_PIDS=$(pgrep -f "$SCRIPT_NAME" | grep -vw "$$" | wc -l)
if [ "$RUNNING_PIDS" -gt 0 ]; then
    echo "Another instance is already running"
    exit 1
fi

# Rest of your script...
3. Database-Based Locking
For distributed systems, consider a database lock:
#!/usr/bin/env python3
import os
import sqlite3
import sys
import time

DB_FILE = "/var/locks/cron_locks.db"

def acquire_lock(job_name):
    conn = sqlite3.connect(DB_FILE)
    try:
        cursor = conn.cursor()
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS locks "
            "(job_name TEXT PRIMARY KEY, pid INTEGER, timestamp REAL)"
        )
        try:
            # Try to insert lock; the PRIMARY KEY makes this atomic
            cursor.execute(
                "INSERT INTO locks VALUES (?, ?, ?)",
                (job_name, os.getpid(), time.time()),
            )
            conn.commit()
            return True
        except sqlite3.IntegrityError:
            # Lock exists - check if the owning process is still alive
            cursor.execute("SELECT pid FROM locks WHERE job_name=?", (job_name,))
            existing_pid = cursor.fetchone()[0]
            try:
                os.kill(existing_pid, 0)  # Signal 0 checks existence only
                return False
            except ProcessLookupError:
                # Process is dead - take over the lock
                cursor.execute(
                    "UPDATE locks SET pid=?, timestamp=? WHERE job_name=?",
                    (os.getpid(), time.time(), job_name),
                )
                conn.commit()
                return True
    finally:
        conn.close()

if not acquire_lock("my_cron_job"):
    print("Previous instance still running")
    sys.exit(0)
# Main script logic here...
To address the issue of scripts crashing without cleaning up locks:
- Add timestamp checks to remove stale locks (e.g., older than 1 hour)
- Implement signal trapping to ensure cleanup on script termination
- Consider combining file locks with process checking for robustness, as in the sketch below
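A minimal sketch combining those three points, reusing the lock file from the example above and an assumed one-hour staleness cutoff:
#!/bin/bash
LOCKFILE="/tmp/myscript.lock"
MAX_AGE=3600 # assumed staleness cutoff: 1 hour

if [ -e "${LOCKFILE}" ]; then
    PID=$(cat "${LOCKFILE}")
    AGE=$(($(date +%s) - $(stat -c %Y "${LOCKFILE}")))
    # Honor the lock only if its process is alive and the file is fresh
    if kill -0 "$PID" 2>/dev/null && [ "$AGE" -le "$MAX_AGE" ]; then
        echo "Another instance is running (PID: $PID)"
        exit 1
    fi
    rm -f "${LOCKFILE}"
fi

# Ensure cleanup on normal exit and on termination signals
trap 'rm -f "${LOCKFILE}"' INT TERM EXIT
echo $$ > "${LOCKFILE}"

# Script commands here...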
For Linux systems using systemd, you can leverage its built-in locking (a timer never starts its service while a previous run is still active):
# myscript.service
[Unit]
Description=My Cron Job
After=network.target
[Service]
Type=oneshot
ExecStart=/path/to/your/script.sh
[Install]
WantedBy=multi-user.target
Then create a timer unit to replace cron:
# myscript.timer
[Unit]
Description=Run my script every 5 minutes
[Timer]
OnCalendar=*-*-* *:0/5:0
Unit=myscript.service
[Install]
WantedBy=timers.target