Preventing Cron Job Overlap: Best Practices for Implementing File Locking in Shell Scripts


When scheduling frequent cron jobs (especially those running every minute), a common issue arises when the script execution time exceeds the interval between runs. This leads to multiple instances of the same script running simultaneously, potentially causing:

  • Resource contention
  • Data corruption
  • Unpredictable script behavior
  • System performance degradation

While the basic file-based locking approach (checking for and creating a lockfile.txt) works in principle, it has several weaknesses:


# Basic (flawed) implementation example:
if [ -f /tmp/lockfile.txt ]; then
    exit 0
else
    touch /tmp/lockfile.txt
    # Main script logic here
    rm /tmp/lockfile.txt
fi

The main issues with this approach are:

  • No protection against stale locks (if script crashes)
  • Possible race conditions between the check and the create (an atomic alternative using noclobber is sketched below)
  • No process ownership verification
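
One way to close the check-and-create race in plain bash, sketched below with illustrative paths, is to make the creation itself atomic using the shell's noclobber option:


#!/bin/bash

LOCKFILE="/tmp/myscript.lock"

# With noclobber set, the redirection fails if the file already exists,
# so the existence test and the creation happen as one atomic step
if ( set -o noclobber; echo "$$" > "${LOCKFILE}" ) 2>/dev/null; then
    trap 'rm -f "${LOCKFILE}"' EXIT
    # Main script logic here
else
    exit 0   # another instance holds the lock
fi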

Here are three professional-grade solutions to implement proper locking:

1. Using flock (Recommended for Linux)


#!/bin/bash
(
  flock -n 200 || exit 1
  # Your script commands here
) 200>/var/lock/myscript.lock

Key advantages:

  • Kernel-managed locks (no stale files)
  • Automatic release when process ends
  • Non-blocking option available (-n)
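
Alternatively, flock can wrap the command directly in the crontab entry, so the script itself needs no changes (the paths below are illustrative):


# m h dom mon dow   command
* * * * * /usr/bin/flock -n /var/lock/myscript.lock /path/to/myscript.sh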

2. Process ID Tracking


LOCKFILE="/tmp/myscript.lock"

# Another instance is running only if the lock file exists AND the PID it
# records is still alive; a stale file left by a crashed run is ignored
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    exit 0
fi

echo $$ > "${LOCKFILE}"
# Script contents here
rm -f "${LOCKFILE}"

3. Python Implementation


import fcntl
import sys

lock_file = '/tmp/myscript.lock'

# Keep the file object open for the lifetime of the script; the kernel
# releases the lock automatically when the process exits
lock_fh = open(lock_file, 'w')
try:
    fcntl.lockf(lock_fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:
    sys.exit(0)

# Main script logic here

For production systems, consider these enhancements (a combined flock sketch follows the list):

  • Log lock acquisition failures
  • Implement lock timeout mechanisms
  • Include cleanup routines for unexpected exits
  • Consider systemd services for long-running processes
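
A rough sketch combining a lock timeout with logging of acquisition failures, built on flock (the lock path, the 10-second wait, and the syslog tag are illustrative assumptions):


#!/bin/bash

LOCKFILE="/var/lock/myscript.lock"

# Open the lock file on a spare descriptor, then wait up to 10 seconds
# for the lock instead of failing immediately
exec 200>"${LOCKFILE}"
if ! flock -w 10 200; then
    logger -t myscript "could not acquire ${LOCKFILE}; previous run still active"
    exit 1
fi

# Main script logic here; the kernel releases the lock when the process
# exits, even after a crash, so no explicit cleanup is required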

File locking may not be ideal when:

  • Scripts run across multiple servers
  • You need distributed coordination
  • The jobs are mission-critical and a lock that can go stale or be deleted is an unacceptable risk

In these cases, consider one of the following (a minimal Redis-based sketch follows the list):

  • Database-based locking
  • Distributed coordination services (ZooKeeper, etcd)
  • Queue systems (Redis, RabbitMQ)
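
For example, a minimal distributed lock can be sketched with Redis, assuming redis-cli is installed and every server can reach a shared Redis instance (the key name and TTL below are illustrative):


#!/bin/bash

LOCK_KEY="lock:myscript"
LOCK_TTL=300   # seconds; assumed upper bound on the job's runtime

# SET ... NX EX is atomic: it succeeds only if the key does not already
# exist, and the TTL makes the lock expire if the job dies mid-run
if [ "$(redis-cli SET "${LOCK_KEY}" "$(hostname)-$$" NX EX "${LOCK_TTL}")" != "OK" ]; then
    exit 0   # another host (or another run) holds the lock
fi

# Main script logic here

redis-cli DEL "${LOCK_KEY}" > /dev/null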

To restate the underlying problem: when a cron job runs frequently (such as every minute) and its execution time exceeds the interval between scheduled runs, concurrent executions pile up, consuming system resources unnecessarily and potentially causing data corruption or race conditions.

Using a lock file (lockfile.txt) is a standard Unix practice, though it can be implemented more robustly than the naive version above. The lock file acts as a mutex: its existence signals that an instance is already running, and cooperating processes agree to respect it (a form of advisory locking).

Here's a more robust version of file locking in bash:


#!/bin/bash

LOCKFILE="/tmp/myscript.lock"

# Check for an existing lock held by a live process; a stale PID is ignored
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    echo "Script already running (PID: $(cat "${LOCKFILE}"))" >&2
    exit 1
fi

# Ensure the lock is removed when the script exits, even on failure
trap 'rm -f "${LOCKFILE}"; exit' INT TERM EXIT

# Create the lock file (a small window remains between the check above
# and this write; flock, shown next, closes it entirely)
echo $$ > "${LOCKFILE}"

# Your actual script logic here
echo "Running script..."
sleep 30  # Simulate long-running task

# Clean up (handled by the trap, but being explicit does no harm)
rm -f "${LOCKFILE}"

Linux also provides a dedicated command, flock, built specifically for file locking:


#!/bin/bash

(
  flock -n 200 || exit 1   # -n: give up immediately if the lock is already held
  
  # Your commands here
  echo "Running with flock protection"
  sleep 30
  
) 200>/var/lock/myscript.lock   # fd 200 is arbitrary; the kernel drops the lock on exit

For Python scripts, you can use the third-party portalocker library (installable with pip install portalocker):


import sys
import time

import portalocker

LOCKFILE = '/tmp/myscript.lock'

try:
    with open(LOCKFILE, 'w') as f:
        # Non-blocking exclusive lock; raises if another instance holds it
        portalocker.lock(f, portalocker.LOCK_EX | portalocker.LOCK_NB)

        # Your script logic
        print("Script running")
        time.sleep(30)

except (IOError, portalocker.exceptions.LockException):
    print("Script already running")
    sys.exit(1)

For simple scripts, the bash implementation works well. For more complex scenarios:

  • Use flock for command-line tools (most robust)
  • Consider Python's portalocker for Python scripts
  • For distributed systems, look into Redis or database-based locks

Avoid these mistakes when implementing cron job locks:

  • Not handling stale lock files (add timeout checks; see the sketch after this list)
  • Forgetting to remove lock files on script failure
  • Placing lock files in temporary directories (such as /tmp) that may be cleared automatically
  • Not testing both locked and unlocked scenarios
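
One way to handle the first point is to compare the lock file's age against the job's maximum expected runtime; the sketch below assumes GNU stat and uses an illustrative 10-minute threshold:


#!/bin/bash

LOCKFILE="/tmp/myscript.lock"
MAX_AGE=600   # seconds; assumed upper bound on the job's runtime

if [ -e "${LOCKFILE}" ]; then
    # Lock file age in seconds (GNU stat; BSD/macOS would use `stat -f %m`)
    AGE=$(( $(date +%s) - $(stat -c %Y "${LOCKFILE}") ))
    if [ "${AGE}" -gt "${MAX_AGE}" ]; then
        echo "Removing stale lock (age: ${AGE}s)" >&2
        rm -f "${LOCKFILE}"
    else
        exit 0   # a recent lock exists; assume another instance is running
    fi
fi

echo $$ > "${LOCKFILE}"
trap 'rm -f "${LOCKFILE}"' EXIT

# Main script logic here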