Automated Yum Repository Cleanup: How to Remove Old RPM Versions While Keeping Latest X Builds



Many development teams face repository bloat when using Yum repositories for RPM package distribution. Our case involves a custom repo where builds accumulate rapidly - sometimes dozens of versions of the same package. The manual cleanup process becomes unsustainable as the team grows.

For stable production environments, we recommend keeping:

  • 3 latest versions of each package (for rollback capability)
  • Any security-patched versions marked for special retention
  • The current production version (even if not latest)
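The retention policy above can be sketched as a small selection function. This is only an illustration: the function name select_for_removal, the pinned set, and the sample build names are hypothetical, and it assumes the builds are already ordered newest first.

```python
from typing import Iterable, List, Set


def select_for_removal(builds_newest_first: List[str],
                       keep: int = 3,
                       pinned: Iterable[str] = ()) -> List[str]:
    """Return the builds that fall outside the retention policy.

    A build is protected if it is among the newest `keep` builds or if it
    is explicitly pinned (e.g. security-patched or current production).
    """
    protected: Set[str] = set(builds_newest_first[:keep]) | set(pinned)
    return [b for b in builds_newest_first if b not in protected]


# Hypothetical sample data, newest build first
builds = [
    "app-1.5.0-1", "app-1.4.0-1", "app-1.3.0-1",
    "app-1.2.0-1", "app-1.1.0-1", "app-1.0.0-1",
]
# Assumed pins: 1.2.0 runs in production, 1.0.0 carries a security patch
pinned = {"app-1.2.0-1", "app-1.0.0-1"}

print(select_for_removal(builds, keep=3, pinned=pinned))
```

With these inputs only app-1.1.0-1 is selected: the three newest builds and both pinned builds are protected.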

Here's a Python script that prunes old RPM files and then refreshes the repository metadata with createrepo:


#!/usr/bin/env python3
import os
import re
import subprocess
from collections import defaultdict

REPO_PATH = "/var/www/html/yum-repo"
KEEP_VERSIONS = 3

def clean_repo():
    packages = defaultdict(list)

    # Scan the repository and group RPM files by package name
    for filename in os.listdir(REPO_PATH):
        if not filename.endswith('.rpm'):
            continue

        # Extract the package name (naive: assumes an x.y.z version; adjust as needed)
        pkg_name = re.sub(r'-\d+\.\d+\.\d+.*\.rpm$', '', filename)
        packages[pkg_name].append(filename)

    # Process each package group
    for pkg_name, versions in packages.items():
        if len(versions) <= KEEP_VERSIONS:
            continue

        # Sort by modification time, newest first. This only approximates
        # version order: a copied or re-uploaded file gets a fresh mtime.
        versions.sort(key=lambda x: os.path.getmtime(
            os.path.join(REPO_PATH, x)), reverse=True)

        # Delete everything beyond the N newest files
        for old_version in versions[KEEP_VERSIONS:]:
            old_path = os.path.join(REPO_PATH, old_version)
            print(f"Removing {old_path}")
            os.unlink(old_path)

    # Update repository metadata; check=True raises if createrepo fails
    subprocess.run(["createrepo", "--update", REPO_PATH], check=True)

if __name__ == "__main__":
    clean_repo()
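The name extraction in the script is deliberately naive. A somewhat more robust sketch splits the conventional name-version-release.arch.rpm filename from the right, relying on the RPM rule that version and release cannot contain hyphens. The Nvra type and parse_rpm_filename helper below are hypothetical names, and the approach assumes files follow the standard naming convention.

```python
from typing import NamedTuple, Optional


class Nvra(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


def parse_rpm_filename(filename: str) -> Optional[Nvra]:
    """Split 'name-version-release.arch.rpm' into its parts.

    Returns None when the filename does not follow that convention.
    """
    if not filename.endswith(".rpm"):
        return None
    stem = filename[:-len(".rpm")]
    try:
        # The architecture is the last dot-separated field
        rest, arch = stem.rsplit(".", 1)
        # Version and release never contain '-', so split from the right;
        # the package name itself may contain hyphens
        name, version, release = rest.rsplit("-", 2)
    except ValueError:
        return None
    return Nvra(name, version, release, arch)


print(parse_rpm_filename("my-app-1.2.3-4.el7.x86_64.rpm"))
```

Splitting from the right keeps hyphenated package names such as my-app intact, which the regex-based approach can get wrong for versions that are not strictly x.y.z.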

For those preferring existing solutions:

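  • yum-utils: The package-cleanup tool works on packages installed on a host (for example, removing old kernels); it does not prune RPM files in a repository directory
  • repomanage: Shipped with yum-utils (not createrepo); lists the oldest or newest packages in a directory, e.g. repomanage --old --keep=3 /path/to/repo. It only prints file paths, so deleting them is a separate step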
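If repomanage is available, it can be driven from Python instead of reimplementing version comparison. The sketch below is a minimal wrapper under stated assumptions: the helper names are hypothetical, repomanage is assumed to be on PATH, and its output is assumed to be one path per line (the default when its space-separated output option is not used).

```python
import subprocess
from typing import List


def repomanage_old_cmd(repo_path: str, keep: int) -> List[str]:
    """Build the command that lists packages older than the newest `keep`."""
    return ["repomanage", "--old", "--keep", str(keep), repo_path]


def parse_paths(output: str) -> List[str]:
    """Turn newline-separated output into a clean list of paths."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_old_packages(repo_path: str, keep: int = 3) -> List[str]:
    """Run repomanage and return the paths it reports as outdated."""
    result = subprocess.run(
        repomanage_old_cmd(repo_path, keep),
        check=True,            # raise if repomanage exits non-zero
        capture_output=True,
        text=True,
    )
    return parse_paths(result.stdout)
```

Calling list_old_packages("/var/www/html/yum-repo", keep=3) would return the candidate files; removing them and re-running createrepo remains a separate, explicit step.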

Schedule weekly cleanup via cron:


# Weekly repository maintenance
0 3 * * 1 /usr/local/bin/clean_yum_repo.py >> /var/log/yum-cleanup.log 2>&1

When managing a custom Yum repository for development builds, we inevitably face storage bloat from accumulating old RPM packages. Manual cleanup becomes tedious because:

  • Developers push nightly/weekly builds
  • Multiple parallel version branches coexist
  • No automatic retention policy exists

An ideal solution should:


1. Preserve the latest X versions per package
2. Handle standard version numbering (1.2.3) and release tags
3. Support cron automation
4. Maintain repo metadata integrity
5. Log deleted packages for audit
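Requirement 2 is the one most easily gotten wrong, because plain string sorting puts 1.10.0 before 1.9.0 and release tags such as 4.el7 are not pure integers. The sketch below is one way to build a sort key for such strings. It is a simplification and not RPM's own comparison algorithm: it ignores epochs and the special ~ and ^ markers, and the name version_key is hypothetical.

```python
import re
from typing import List, Tuple

# Runs of digits or runs of ASCII letters; separators like '.' are skipped
_SEGMENT = re.compile(r"\d+|[A-Za-z]+", flags=re.ASCII)


def version_key(text: str) -> List[Tuple[int, int, str]]:
    """Build a sortable key from a version or release string.

    Numeric segments compare as integers and rank above alphabetic
    segments, which compare as plain strings.
    """
    key = []
    for seg in _SEGMENT.findall(text):
        if seg.isdigit():
            key.append((1, int(seg), ""))
        else:
            key.append((0, 0, seg))
    return key


# Hypothetical (version, release) pairs for one package
builds = [("1.2.3", "10.el7"), ("1.2.3", "9.el7"), ("1.10.0", "1.el7")]
builds.sort(key=lambda b: (version_key(b[0]), version_key(b[1])))
print(builds)
```

After sorting, the oldest build comes first, so the last X entries are the ones to keep. Where exact RPM ordering matters, delegating to an RPM-aware tool is the safer choice.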

Here's a production-tested script that parses version and release from each filename and refreshes the metadata with createrepo:


#!/usr/bin/env python3
import os
import re
from collections import defaultdict
from subprocess import check_call

REPO_PATH = "/var/www/html/yum/custom"
KEEP_VERSIONS = 3  # Number of versions to retain

def get_rpm_versions():
    pkg_versions = defaultdict(list)
    for f in os.listdir(REPO_PATH):
        if not f.endswith('.rpm'):
            continue
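        # Parse name-version-release.arch.rpm. Only x.y.z versions with a
        # release that starts with digits match; other files are left untouched.
        match = re.match(r'^(.*)-(\d+\.\d+\.\d+)-(\d+)\..*\.rpm$', f)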
        if match:
            name, version, release = match.groups()
            full_ver = (version, release, f)
            pkg_versions[name].append(full_ver)
    return pkg_versions

def cleanup_repo():
    pkg_versions = get_rpm_versions()
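    for pkg, versions in pkg_versions.items():
        # Sort oldest first: numeric version tuple, then release number
        versions.sort(key=lambda x: (tuple(map(int, x[0].split('.'))), int(x[1])))
        # Everything except the newest KEEP_VERSIONS entries is stale.
        # Slicing by length (rather than [:-KEEP_VERSIONS]) also behaves
        # correctly when KEEP_VERSIONS is 0.
        stale = versions[:max(len(versions) - KEEP_VERSIONS, 0)]
        for ver in stale:
            os.remove(os.path.join(REPO_PATH, ver[2]))
            print(f"Removed: {ver[2]}")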
    # Update repo metadata
    check_call(['createrepo', '--update', REPO_PATH])

if __name__ == '__main__':
    cleanup_repo()
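Because the script deletes files immediately, it can help to route removals through a helper that supports a dry run and writes an audit trail. The following is a minimal sketch: the remove_packages function, its dry_run flag, and the logger name are hypothetical and not part of the script above.

```python
import logging
import os
from typing import Iterable, List

log = logging.getLogger("yum-cleanup")


def remove_packages(paths: Iterable[str], dry_run: bool = True) -> List[str]:
    """Delete the given files, or only log them when dry_run is set.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for path in paths:
        if dry_run:
            log.info("Would remove: %s", path)
            continue
        os.remove(path)
        log.info("Removed: %s", path)
        removed.append(path)
    return removed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(message)s")
    # Nothing is deleted here: dry_run defaults to True
    remove_packages(["/var/www/html/yum/custom/example-1.0.0-1.noarch.rpm"])
```

Defaulting dry_run to True means a misconfigured run logs its intentions instead of deleting packages; the timestamped log lines double as the audit record.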

For enterprise environments, enhance the script with:


# Retention rules in YAML config
retention_rules:
  core-package:
    keep_last: 5
    min_age_days: 7
  experimental-*:
    keep_last: 1
    
# Dry-run mode
# S3/Artifactory integration
# Email notification of deletions
# Package signature verification
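One way the retention rules could be applied is sketched below. To stay dependency-free, the rules are written as a plain dict mirroring the YAML rather than loaded from a file. Several things here are assumptions: the helper names, the default rule (taken from KEEP_VERSIONS = 3), first-match-wins pattern lookup, and the reading of min_age_days as "never delete a build younger than this".

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# Assumed fallback for packages that match no pattern
DEFAULT_RULE = {"keep_last": 3, "min_age_days": 0}

# Mirrors the YAML config; patterns are shell-style globs
RETENTION_RULES = {
    "core-package": {"keep_last": 5, "min_age_days": 7},
    "experimental-*": {"keep_last": 1},
}


def rule_for(pkg_name: str,
             rules: Dict[str, Dict[str, int]] = RETENTION_RULES) -> Dict[str, int]:
    """Return the first matching rule, with defaults filled in."""
    for pattern, rule in rules.items():
        if fnmatchcase(pkg_name, pattern):
            return {**DEFAULT_RULE, **rule}
    return dict(DEFAULT_RULE)


def expired(builds_newest_first: List[Tuple[str, float]],
            rule: Dict[str, int]) -> List[str]:
    """Pick deletable builds from (filename, age_in_days) pairs.

    The newest keep_last builds are always kept; older ones are deleted
    only once they have reached min_age_days.
    """
    candidates = builds_newest_first[rule["keep_last"]:]
    return [name for name, age in candidates if age >= rule["min_age_days"]]


print(rule_for("experimental-parser"))
```

For example, an experimental-parser package resolves to keep_last 1 with min_age_days 0, so every build except the newest becomes a deletion candidate.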

Set up daily cleanup at 2AM:


# /etc/cron.d/yum-cleanup
0 2 * * * root /usr/local/bin/yum_repo_cleanup.py >> /var/log/yum-cleanup.log 2>&1

For complex scenarios, consider:

  • Pulp: Enterprise-grade repo management
  • Nexus: Generic artifact repository
  • dirtyrepoclean: Specialized Yum cleanup tool