Preventing Cron Job Overlap: Best Practices for Implementing File Locking in Shell Scripts


When scheduling frequent cron jobs (especially those running every minute), a common issue arises when the script execution time exceeds the interval between runs. This leads to multiple instances of the same script running simultaneously, potentially causing:

  • Resource contention
  • Data corruption
  • Unpredictable script behavior
  • System performance degradation

While the basic file-based locking approach (checking for and creating a lockfile.txt) works in principle, it has several weaknesses:


# Basic (flawed) implementation example:
if [ -f /tmp/lockfile.txt ]; then
    exit 0
else
    touch /tmp/lockfile.txt
    # Main script logic here
    rm /tmp/lockfile.txt
fi

The main issues with this approach are:

  • No protection against stale locks (if script crashes)
  • Possible race conditions between the check and the create (an atomic alternative using noclobber is sketched below)
  • No process ownership verification
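
One way to close the check-and-create race in plain bash, sketched below with illustrative paths, is to make the creation itself atomic using the shell's noclobber option:


#!/bin/bash

LOCKFILE="/tmp/myscript.lock"

# With noclobber set, the redirection fails if the file already exists,
# so the existence test and the creation happen as one atomic step
if ( set -o noclobber; echo "$$" > "${LOCKFILE}" ) 2>/dev/null; then
    trap 'rm -f "${LOCKFILE}"' EXIT
    # Main script logic here
else
    exit 0   # another instance holds the lock
fi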

Here are three professional-grade solutions to implement proper locking:

1. Using flock (Recommended for Linux)


#!/bin/bash
(
  flock -n 200 || exit 1
  # Your script commands here
) 200>/var/lock/myscript.lock

Key advantages:

  • Kernel-managed locks (no stale files)
  • Automatic release when process ends
  • Non-blocking option available (-n)
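
Alternatively, flock can wrap the command directly in the crontab entry, so the script itself needs no changes (the paths below are illustrative):


# m h dom mon dow   command
* * * * * /usr/bin/flock -n /var/lock/myscript.lock /path/to/myscript.sh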

2. Process ID Tracking


LOCKFILE="/tmp/myscript.lock"

# Another instance is running only if the lock file exists AND the PID it
# records is still alive; a stale file left by a crashed run is ignored
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    exit 0
fi

echo $$ > "${LOCKFILE}"
# Script contents here
rm -f "${LOCKFILE}"

3. Python Implementation


import fcntl
import sys

lock_file = '/tmp/myscript.lock'

# Keep the file object open for the lifetime of the script; the kernel
# releases the lock automatically when the process exits
lock_fh = open(lock_file, 'w')
try:
    fcntl.lockf(lock_fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:
    sys.exit(0)

# Main script logic here

For production systems, consider these enhancements (a combined flock sketch follows the list):

  • Log lock acquisition failures
  • Implement lock timeout mechanisms
  • Include cleanup routines for unexpected exits
  • Consider systemd services for long-running processes
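
A rough sketch combining a lock timeout with logging of acquisition failures, built on flock (the lock path, the 10-second wait, and the syslog tag are illustrative assumptions):


#!/bin/bash

LOCKFILE="/var/lock/myscript.lock"

# Open the lock file on a spare descriptor, then wait up to 10 seconds
# for the lock instead of failing immediately
exec 200>"${LOCKFILE}"
if ! flock -w 10 200; then
    logger -t myscript "could not acquire ${LOCKFILE}; previous run still active"
    exit 1
fi

# Main script logic here; the kernel releases the lock when the process
# exits, even after a crash, so no explicit cleanup is required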

File locking may not be ideal when:

  • Scripts run across multiple servers
  • You need distributed coordination
  • The jobs are mission-critical and a lock that can go stale or be deleted is an unacceptable risk

In these cases, consider one of the following (a minimal Redis-based sketch follows the list):

  • Database-based locking
  • Distributed coordination services (ZooKeeper, etcd)
  • Queue systems (Redis, RabbitMQ)
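
For example, a minimal distributed lock can be sketched with Redis, assuming redis-cli is installed and every server can reach a shared Redis instance (the key name and TTL below are illustrative):


#!/bin/bash

LOCK_KEY="lock:myscript"
LOCK_TTL=300   # seconds; assumed upper bound on the job's runtime

# SET ... NX EX is atomic: it succeeds only if the key does not already
# exist, and the TTL makes the lock expire if the job dies mid-run
if [ "$(redis-cli SET "${LOCK_KEY}" "$(hostname)-$$" NX EX "${LOCK_TTL}")" != "OK" ]; then
    exit 0   # another host (or another run) holds the lock
fi

# Main script logic here

redis-cli DEL "${LOCK_KEY}" > /dev/null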

To restate the underlying problem: when a cron job runs frequently (such as every minute) and its execution time exceeds the interval between scheduled runs, concurrent executions pile up, consuming system resources unnecessarily and potentially causing data corruption or race conditions.

Using a lock file (lockfile.txt) is a standard Unix practice, though it can be implemented more robustly than the naive version above. The lock file acts as a mutex: its existence signals that an instance is already running, and cooperating processes agree to respect it (a form of advisory locking).

Here's a more robust version of file locking in bash:


#!/bin/bash

LOCKFILE="/tmp/myscript.lock"

# Check for an existing lock held by a live process; a stale PID is ignored
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    echo "Script already running (PID: $(cat "${LOCKFILE}"))" >&2
    exit 1
fi

# Ensure the lock is removed when the script exits, even on failure
trap 'rm -f "${LOCKFILE}"; exit' INT TERM EXIT

# Create the lock file (a small window remains between the check above
# and this write; flock, shown next, closes it entirely)
echo $$ > "${LOCKFILE}"

# Your actual script logic here
echo "Running script..."
sleep 30  # Simulate long-running task

# Clean up (handled by the trap, but being explicit does no harm)
rm -f "${LOCKFILE}"

Linux also provides a dedicated command, flock, built specifically for file locking:


#!/bin/bash

(
  flock -n 200 || exit 1   # -n: give up immediately if the lock is already held
  
  # Your commands here
  echo "Running with flock protection"
  sleep 30
  
) 200>/var/lock/myscript.lock   # fd 200 is arbitrary; the kernel drops the lock on exit

For Python scripts, you can use the third-party portalocker library (installable with pip install portalocker):


import sys
import time

import portalocker

LOCKFILE = '/tmp/myscript.lock'

try:
    with open(LOCKFILE, 'w') as f:
        # Non-blocking exclusive lock; raises if another instance holds it
        portalocker.lock(f, portalocker.LOCK_EX | portalocker.LOCK_NB)

        # Your script logic
        print("Script running")
        time.sleep(30)

except (IOError, portalocker.exceptions.LockException):
    print("Script already running")
    sys.exit(1)

For simple scripts, the bash implementation works well. For more complex scenarios:

  • Use flock for command-line tools (most robust)
  • Consider Python's portalocker for Python scripts
  • For distributed systems, look into Redis or database-based locks

Avoid these mistakes when implementing cron job locks:

  • Not handling stale lock files (add timeout checks; see the sketch after this list)
  • Forgetting to remove lock files on script failure
  • Placing lock files in temporary directories (such as /tmp) that may be cleared automatically
  • Not testing both locked and unlocked scenarios
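
One way to handle the first point is to compare the lock file's age against the job's maximum expected runtime; the sketch below assumes GNU stat and uses an illustrative 10-minute threshold:


#!/bin/bash

LOCKFILE="/tmp/myscript.lock"
MAX_AGE=600   # seconds; assumed upper bound on the job's runtime

if [ -e "${LOCKFILE}" ]; then
    # Lock file age in seconds (GNU stat; BSD/macOS would use `stat -f %m`)
    AGE=$(( $(date +%s) - $(stat -c %Y "${LOCKFILE}") ))
    if [ "${AGE}" -gt "${MAX_AGE}" ]; then
        echo "Removing stale lock (age: ${AGE}s)" >&2
        rm -f "${LOCKFILE}"
    else
        exit 0   # a recent lock exists; assume another instance is running
    fi
fi

echo $$ > "${LOCKFILE}"
trap 'rm -f "${LOCKFILE}"' EXIT

# Main script logic here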