When scheduling frequent cron jobs (especially those running every minute), a common issue arises when the script execution time exceeds the interval between runs. This leads to multiple instances of the same script running simultaneously, potentially causing:
- Resource contention
- Data corruption
- Unpredictable script behavior
- System performance degradation
While the basic file-based locking approach mentioned (using `lockfile.txt`) works in principle, it has several weaknesses:
# Basic (flawed) implementation example:
if [ -f /tmp/lockfile.txt ]; then
    exit 0
else
    touch /tmp/lockfile.txt
    # Main script logic here
    rm /tmp/lockfile.txt
fi
The main issues with this approach are:
- No protection against stale locks (if script crashes)
- Possible race conditions between check and create (see the mkdir sketch after this list)
- No process ownership verification
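One classic way to close the check-then-create race without any locking primitives is to rely on mkdir, which either creates the directory or fails as a single atomic step. A minimal sketch (the directory name is illustrative):

#!/bin/bash
# mkdir is atomic: the existence check and the creation happen in one
# operation, so two instances can never both succeed
if mkdir /tmp/myscript.lock.d 2>/dev/null; then
    trap 'rmdir /tmp/myscript.lock.d' EXIT
    # Main script logic here
else
    exit 0
fi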
Here are three professional-grade solutions to implement proper locking:
1. Using flock (Recommended for Linux)
#!/bin/bash
(
    flock -n 200 || exit 1
    # Your script commands here
) 200>/var/lock/myscript.lock
Key advantages:
- Kernel-managed locks (released by the kernel even if the lock file remains)
- Automatic release when process ends
- Non-blocking option available (-n)
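If you prefer to keep locking out of the script itself, the same `flock` binary can wrap the command directly in the crontab entry (the paths here are illustrative):

* * * * * /usr/bin/flock -n /var/lock/myscript.lock /path/to/myscript.sh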
2. Process ID Tracking
LOCKFILE="/tmp/myscript.lock"
# If the lock file exists and the recorded PID is still alive, bail out
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    exit
fi
echo $$ > "${LOCKFILE}"
# Script contents here
rm -f "${LOCKFILE}"
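Note that this approach is not bulletproof: if the script dies without removing the lock file and the recorded PID is later reused by an unrelated process, the check will wrongly conclude the script is still running.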
3. Python Implementation
import fcntl
import sys

lock_file = '/tmp/myscript.lock'
fp = open(lock_file, 'w')
try:
    # Exclusive, non-blocking lock: raises if another instance holds it
    fcntl.lockf(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    sys.exit(0)
# Main script logic here
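The lock is tied to the open file descriptor, so keep the file object open for the script's whole lifetime; closing it (or letting it be garbage-collected) releases the lock.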
For production systems, consider these enhancements:
- Log lock acquisition failures
- Implement lock timeout mechanisms (see the `flock -w` sketch after this list)
- Include cleanup routines for unexpected exits
- Consider systemd services for long-running processes
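As a sketch of the first two points combined, flock's -w flag waits a bounded number of seconds for the lock instead of failing immediately, and a failed acquisition can be logged (the timeout and log path are illustrative):

#!/bin/bash
# Open the lock file on a dedicated file descriptor
exec 200>/var/lock/myscript.lock
# Wait up to 10 seconds for the lock, then give up and log the failure
if ! flock -w 10 200; then
    echo "$(date): myscript: could not acquire lock" >> /var/log/myscript.log
    exit 1
fi
# Your script commands here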
File locking may not be ideal when:
- Scripts run across multiple servers
- You need distributed coordination
- The jobs are mission-critical
In these cases, consider:
- Database-based locking
- Distributed coordination services (ZooKeeper, etcd)
- Queue systems (Redis, RabbitMQ); a minimal Redis-based lock is sketched below
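As an example of the last option, Redis's SET command with the NX and EX flags acts as an atomic check-and-set with a built-in expiry, which doubles as stale-lock protection. A minimal sketch, assuming a local Redis instance and an illustrative key name and expiry:

#!/bin/bash
# SET ... NX succeeds only if the key does not already exist;
# EX 120 makes the lock expire automatically if we crash
if [ "$(redis-cli SET myscript:lock "$$" NX EX 120)" != "OK" ]; then
    exit 0
fi
# Main script logic here
redis-cli DEL myscript:lock > /dev/null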
When scheduling frequent cron jobs (like every minute), you might encounter situations where a job's execution time exceeds the interval between scheduled runs. This leads to concurrent executions piling up, consuming system resources unnecessarily, and potentially causing data corruption or race conditions.
The approach of using a lock file (`lockfile.txt`) is actually a standard Unix practice, though we can implement it more robustly. The idea is that the lock file serves as a mutex: its existence (or a kernel advisory lock held on it) signals that an instance is already running.
Here's a more robust version of file locking in bash:
#!/bin/bash
LOCKFILE="/tmp/myscript.lock"

# Check for existing lock; ignore it if the recorded PID is no longer alive
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    echo "Script already running (PID: $(cat "${LOCKFILE}"))" >&2
    exit 1
fi

# Create lock file containing our PID
echo $$ > "${LOCKFILE}"

# Ensure the lock is removed when the script exits, even on failure
trap 'rm -f "${LOCKFILE}"; exit' INT TERM EXIT

# Your actual script logic here
echo "Running script..."
sleep 30  # Simulate long-running task

# Clean up (handled by the trap, but explicit is good)
rm -f "${LOCKFILE}"
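The EXIT trap is what makes this version robust: the lock file is removed even if the script is interrupted or fails partway through, which covers most stale-lock scenarios.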
Linux also provides a built-in command, flock, specifically for file locking:
#!/bin/bash
(
    flock -n 200 || exit 1
    # Your commands here
    echo "Running with flock protection"
    sleep 30
) 200>/var/lock/myscript.lock
For Python scripts, you can use the `portalocker` library:
import sys
import time

import portalocker

LOCKFILE = '/tmp/myscript.lock'

try:
    with open(LOCKFILE, 'w') as f:
        # Exclusive, non-blocking lock; raises if another instance holds it
        portalocker.lock(f, portalocker.LOCK_EX | portalocker.LOCK_NB)
        # Your script logic (keep it inside the with-block so the lock is held)
        print("Script running")
        time.sleep(30)
except (IOError, portalocker.exceptions.LockException):
    print("Script already running")
    sys.exit(1)
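Note that portalocker is a third-party package (install it with pip install portalocker); its main advantage over the standard-library fcntl approach is that it also works on Windows.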
For simple scripts, the bash implementation works well. For more complex scenarios:
- Use `flock` for command-line tools (most robust)
- Consider Python's `portalocker` for Python scripts
- For distributed systems, look into Redis or database-based locks
Avoid these mistakes when implementing cron job locks:
- Not handling stale lock files (add timeout checks; a sketch follows this list)
- Forgetting to remove lock files on script failure
- Using lock files in directories that get cleared automatically (e.g. /tmp is wiped on reboot on many systems)
- Not testing both locked and unlocked scenarios
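On the first point, a minimal sketch of a stale-lock timeout check (the 10-minute threshold is illustrative):

#!/bin/bash
LOCKFILE="/tmp/myscript.lock"
# If the lock file is older than 10 minutes, assume the previous run
# crashed without cleaning up, and discard the stale lock
if [ -e "${LOCKFILE}" ] && [ -n "$(find "${LOCKFILE}" -mmin +10)" ]; then
    rm -f "${LOCKFILE}"
fi
# ...then proceed with the normal PID check and lock creation shown earlier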