Preventing Cron Job Overlap: Ensuring Single Instance Execution When Previous Run is Still Active



When scheduling frequent cron jobs (like every 5 minutes) that might occasionally run longer than the interval period, we face the risk of overlapping executions. This becomes particularly problematic when:

  • The job performs database operations that shouldn't be concurrent
  • It consumes significant system resources
  • Multiple instances could cause data corruption or race conditions

The most common approach is using lock files to prevent multiple instances:


#!/bin/bash

LOCKFILE="/tmp/my_cron_job.lock"

# Check for existing lock file
if [ -e ${LOCKFILE} ] && kill -0 "$(cat ${LOCKFILE})" 2>/dev/null; then
    echo "Previous instance still running - exiting"
    exit 1
fi

# Clean up lock file when script finishes or crashes
trap "rm -f ${LOCKFILE}; exit" INT TERM EXIT

# Create lock file (after setting the trap, so a crash here still cleans up)
echo $$ > ${LOCKFILE}

# Your actual job code here
./my_script.sh

# Explicitly remove lock file when done
rm -f ${LOCKFILE}

To address cases where scripts crash without cleaning up lock files:


#!/bin/bash

LOCKFILE="/tmp/my_cron_job.lock"
LOCKTIMEOUT=3600  # 1 hour in seconds

# Check for stale lock file
if [ -e ${LOCKFILE} ]; then
    # Get PID from lock file
    PID=$(cat ${LOCKFILE})
    
    # Check if process is still running
    if ! kill -0 $PID 2>/dev/null; then
        # Process not running - clean up stale lock
        echo "Removing stale lock file"
        rm -f ${LOCKFILE}
    else
        # Check lock file age (stat -c %Y is GNU stat; BSD/macOS uses stat -f %m)
        LOCKAGE=$(($(date +%s) - $(stat -c %Y ${LOCKFILE})))
        if [ ${LOCKAGE} -gt ${LOCKTIMEOUT} ]; then
            echo "Force removing timed-out lock (age: ${LOCKAGE}s)"
            # SIGKILL bypasses the script's own traps, so remove the lock here
            kill -9 $PID
            rm -f ${LOCKFILE}
        else
            echo "Previous instance still running (PID: $PID)"
            exit 1
        fi
    fi
fi

# Create new lock file
echo $$ > ${LOCKFILE}

For more robust solutions:

Using Flock

The flock utility provides a simpler way to handle file locking:


#!/bin/bash

(
  flock -n 200 || exit 1
  # Your commands here
  ./my_script.sh
) 200>/tmp/my_cron_job.lock
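
Rather than wrapping every script in a subshell, flock can also be invoked directly from the crontab entry; a minimal sketch (lock path and script path are illustrative):

```
# Illustrative crontab entry: -n makes flock exit immediately if the lock
# is already held, so an overlapping run is skipped rather than queued.
*/5 * * * * /usr/bin/flock -n /tmp/my_cron_job.lock /path/to/my_script.sh
```

The kernel releases the lock when the process exits, even after a crash, so no stale-lock cleanup is needed.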

Database-Based Locking

For distributed systems, consider using database locks:


#!/bin/bash

# MySQL example. Caveat: GET_LOCK() is scoped to the connection, and each
# `mysql` invocation below opens a separate connection, so the lock taken
# here is released as soon as the first command exits. Treat this as an
# illustration of the SQL calls - in practice the acquire, job, and release
# must share one session (e.g. via a client library that holds the
# connection open for the job's duration).
if ! mysql -N -e "SELECT GET_LOCK('my_cron_job', 0)" | grep -q '^1$'; then
    echo "Job is already running"
    exit 1
fi

# Job code here

# Release lock
mysql -N -e "SELECT RELEASE_LOCK('my_cron_job')"

For modern Linux systems, consider using systemd timers instead of cron:


# myjob.service
[Unit]
Description=My Periodic Job

[Service]
Type=oneshot
ExecStart=/path/to/my_script.sh

# myjob.timer
[Unit]
Description=Run my job every 5 minutes

[Timer]
OnCalendar=*-*-* *:0/5:0
Persistent=true

[Install]
WantedBy=timers.target
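
The units above can then be installed and activated with systemctl (names assume the files were saved as myjob.service and myjob.timer):

```
# Install the units system-wide and start the timer
sudo cp myjob.service myjob.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now myjob.timer

# Confirm the schedule and last/next run times
systemctl list-timers myjob.timer
```

Because the service is Type=oneshot, systemd will not start a second activation while the previous one is still running, which is exactly the overlap protection the lock files provide by hand.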

Implement monitoring to detect stuck jobs:


#!/bin/bash

MAX_RUNTIME=300  # 5 minutes in seconds
LOCKFILE="/tmp/my_cron_job.lock"

if [ -e ${LOCKFILE} ]; then
    PID=$(cat ${LOCKFILE})
    if kill -0 $PID 2>/dev/null; then
        RUNTIME=$(ps -o etimes= -p $PID | awk '{print $1}')
        if [ ${RUNTIME} -gt ${MAX_RUNTIME} ]; then
            # Send alert
            echo "Job running too long (PID: $PID, ${RUNTIME}s)" | mail -s "Cron Job Alert" admin@example.com
        fi
    fi
fi
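
The watchdog above only helps if something runs it; a crontab sketch (paths and the watchdog filename are assumptions) scheduling both the job and the monitor:

```
# Run the job every 5 minutes and the watchdog every minute
*/5 * * * * /path/to/my_script.sh
* * * * *   /usr/local/bin/check_cron_job.sh
```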

When dealing with cron jobs that run at short intervals (like every 5 minutes), you might encounter situations where a job takes longer to complete than the interval between runs. This can lead to:

  • Multiple instances of the same script running simultaneously
  • Resource contention and system overload
  • Race conditions in data processing
  • Unexpected behavior when scripts aren't designed for concurrency

Here are three reliable methods to prevent cron job overlap:

1. File Locking Mechanism

The most common approach is to implement file-based locking:

#!/bin/bash

LOCKFILE="/tmp/myscript.lock"

if [ -e ${LOCKFILE} ] && kill -0 "$(cat ${LOCKFILE})" 2>/dev/null; then
    echo "Script already running"
    exit 1
fi

trap "rm -f ${LOCKFILE}; exit" INT TERM EXIT
echo $$ > ${LOCKFILE}

# Your actual script commands go here...
sleep 60 # Example long-running process

rm -f ${LOCKFILE}

2. Process Checking

For more robust checking, you can verify running processes:

#!/bin/bash

SCRIPT_NAME=$(basename "$0")
# Count matching processes other than this one. pgrep -f matches the full
# command line, so keep the pattern specific to avoid false positives
# (e.g. an editor with the script open would also match).
RUNNING_PIDS=$(pgrep -f "$SCRIPT_NAME" | grep -vx "$$" | wc -l)

if [ "$RUNNING_PIDS" -gt 0 ]; then
    echo "Another instance is already running"
    exit 1
fi

# Rest of your script...

3. Database-Based Locking

For distributed systems, consider a database lock:

#!/usr/bin/env python3
import os
import sqlite3
import sys
import time

DB_FILE = "/var/locks/cron_locks.db"

def acquire_lock(job_name):
    try:
        conn = sqlite3.connect(DB_FILE)
        cursor = conn.cursor()
        cursor.execute("CREATE TABLE IF NOT EXISTS locks (job_name TEXT PRIMARY KEY, pid INTEGER, timestamp REAL)")
        
        # Try to insert lock
        cursor.execute("INSERT INTO locks VALUES (?, ?, ?)", (job_name, os.getpid(), time.time()))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Lock exists - check if process is still alive
        cursor.execute("SELECT pid FROM locks WHERE job_name=?", (job_name,))
        existing_pid = cursor.fetchone()[0]
        try:
            os.kill(existing_pid, 0)  # Check if process exists
            return False
        except ProcessLookupError:
            # Process is dead - take over the lock
            cursor.execute("UPDATE locks SET pid=?, timestamp=? WHERE job_name=?", 
                         (os.getpid(), time.time(), job_name))
            conn.commit()
            return True
    finally:
        conn.close()

if not acquire_lock("my_cron_job"):
    print("Previous instance still running")
    sys.exit(0)

# Main script logic here...

To address the issue of scripts crashing without cleaning up locks:

  • Add timestamp checks to remove stale locks (e.g., older than 1 hour)
  • Implement signal trapping to ensure cleanup on script termination
  • Consider combining file locks with process checking for robustness
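
These safeguards can be combined; a minimal sketch using flock with a bounded wait instead of an immediate failure (lock path and the 10-second timeout are illustrative):

```shell
#!/bin/bash
# Sketch: flock with a bounded wait. -w waits up to N seconds for the
# previous run to finish before giving up, which smooths over jobs that
# finish just after the next run starts. The kernel drops the lock when
# the process exits - even on a crash - so no stale-lock cleanup is needed.
LOCKFILE="${TMPDIR:-/tmp}/my_cron_job.lock"

# Open the lock file on file descriptor 200, then try to lock it
exec 200>"$LOCKFILE"
if ! flock -w 10 200; then
    echo "Could not acquire lock within 10s - exiting"
    exit 1
fi

echo "lock acquired"
# ... actual job code runs here, holding the lock until exit ...
```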

For Linux systems using systemd, overlap protection comes for free: systemd will not start a new activation of a oneshot service while the previous one is still running:

[Unit]
Description=My Cron Job
After=network.target

[Service]
Type=oneshot
ExecStart=/path/to/your/script.sh

[Install]
WantedBy=multi-user.target

Then create a timer unit to replace cron:

[Unit]
Description=Run my script every 5 minutes

[Timer]
OnCalendar=*-*-* *:0/5:0
Unit=myscript.service

[Install]
WantedBy=timers.target