Many development teams face repository bloat when using Yum repositories for RPM package distribution. Our case involves a custom repo where builds accumulate rapidly - sometimes dozens of versions of the same package. The manual cleanup process becomes unsustainable as the team grows.
For stable production environments, we recommend keeping:
- 3 latest versions of each package (for rollback capability)
- Any security-patched versions marked as special retention
- The current production version (even if not latest)
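The three rules above can be combined into a single keep-set per package. This is a minimal sketch, assuming the versions are already ordered newest first and that the caller supplies the security-patched and production versions as a pinned set. The name `select_keep` and its parameters are illustrative, not part of any existing tool.

```python
def select_keep(versions_newest_first, keep_latest=3, pinned=()):
    """Return the set of versions to retain for one package.

    versions_newest_first: version strings, newest first.
    pinned: versions retained regardless of age, e.g. security-patched
            builds or the current production version (hypothetical input).
    """
    keep = set(versions_newest_first[:keep_latest])
    # Pinned versions only matter if they actually exist in the repo.
    keep.update(v for v in pinned if v in versions_newest_first)
    return keep


versions = ["2.0.1", "2.0.0", "1.9.0", "1.8.2", "1.8.1"]
keep = select_keep(versions, keep_latest=3, pinned={"1.8.1"})
remove = [v for v in versions if v not in keep]
print(remove)  # only 1.8.2 is neither recent nor pinned
```

Keeping the selection separate from the deletion makes the policy easy to test without touching the file system.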
Here's a Python solution that prunes old builds and then refreshes the metadata with the createrepo utility:
#!/usr/bin/env python3
import os
import re
from collections import defaultdict

REPO_PATH = "/var/www/html/yum-repo"
KEEP_VERSIONS = 3


def clean_repo():
    packages = defaultdict(list)
    # Scan repository and group packages by name
    for filename in os.listdir(REPO_PATH):
        if not filename.endswith('.rpm'):
            continue
        # Extract package name (naive version parsing - adjust as needed)
        pkg_name = re.sub(r'-\d+\.\d+\.\d+.*\.rpm$', '', filename)
        packages[pkg_name].append(filename)
    # Process each package group
    for pkg_name, versions in packages.items():
        if len(versions) <= KEEP_VERSIONS:
            continue
        # Sort versions by modification time (newest first)
        versions.sort(key=lambda x: os.path.getmtime(
            os.path.join(REPO_PATH, x)), reverse=True)
        # Keep only N newest versions
        for old_version in versions[KEEP_VERSIONS:]:
            old_path = os.path.join(REPO_PATH, old_version)
            print(f"Removing {old_path}")
            os.unlink(old_path)
    # Update repository metadata
    os.system(f"createrepo --update {REPO_PATH}")


if __name__ == "__main__":
    clean_repo()
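The name extraction in the script above relies on a three-part numeric version. RPM file names follow the layout name-version-release.arch.rpm, and neither the version nor the release may contain a hyphen, so splitting from the right is a more tolerant alternative. A minimal sketch (the helper name `split_rpm_filename` is hypothetical):

```python
def split_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into its four parts.

    Raises ValueError if the file name does not follow that layout.
    """
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM file name: {filename}")
    stem = filename[: -len(".rpm")]
    try:
        # The architecture is everything after the last dot.
        rest, arch = stem.rsplit(".", 1)
        # Version and release never contain '-', so split twice from the right.
        name, version, release = rest.rsplit("-", 2)
    except ValueError:
        raise ValueError(f"unexpected RPM file name: {filename}") from None
    return name, version, release, arch


print(split_rpm_filename("python3-requests-2.25.1-3.el8.noarch.rpm"))
```

Because the name is whatever remains on the left, package names that themselves contain hyphens or digits are handled without a special case.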
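For those preferring existing solutions:
- yum-utils: The package-cleanup tool can remove old kernels and duplicates, but it operates on installed packages rather than on the RPM files in a repository directory
- repomanage: Shipped with yum-utils (and as a DNF plugin on newer systems); repomanage --old --keep 3 <repo> lists every package outside the three newest versions, so its output can be passed to rm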
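If repomanage is installed, the version comparison can be delegated to it instead of being reimplemented. A hedged sketch, assuming a `repomanage` command on the PATH that accepts `--old` and `--keep` and prints one path per line; `build_repomanage_cmd` and `list_old_packages` are illustrative helper names:

```python
import subprocess


def build_repomanage_cmd(repo_path, keep=3):
    """Build the argv that lists packages outside the newest `keep`."""
    return ["repomanage", "--old", "--keep", str(keep), repo_path]


def list_old_packages(repo_path, keep=3):
    """Return the paths repomanage reports as old (needs repomanage)."""
    result = subprocess.run(
        build_repomanage_cmd(repo_path, keep),
        check=True, capture_output=True, text=True,
    )
    # One path per line; skip any blank lines.
    return [line for line in result.stdout.splitlines() if line]
```

The returned list can be reviewed or logged before anything is deleted, which is safer than piping the output straight into rm.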
Schedule weekly cleanup via cron:
# Weekly repository maintenance
0 3 * * 1 /usr/local/bin/clean_yum_repo.py >> /var/log/yum-cleanup.log 2>&1
When managing a custom Yum repository for development builds, we inevitably face storage bloat from accumulating old RPM packages. Manual cleanup becomes tedious:
- Developers push nightly/weekly builds
- Multiple parallel version branches coexist
- No automatic retention policy exists
An ideal solution should:
1. Preserve the latest X versions per package
2. Handle standard version numbering (1.2.3) and release tags
3. Support cron automation
4. Maintain repo metadata integrity
5. Log deleted packages for audit
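The audit requirement in item 5 can be covered with the standard logging module. A minimal sketch; the logger name, the log format, and the `remove_with_audit` helper are assumptions made for illustration:

```python
import logging
import os

audit_log = logging.getLogger("repo_cleanup.audit")


def remove_with_audit(path, dry_run=False):
    """Delete `path` and record its name and size; return True if deleted."""
    size = os.path.getsize(path)
    if dry_run:
        audit_log.info("would remove %s (%d bytes)", path, size)
        return False
    os.remove(path)
    audit_log.info("removed %s (%d bytes)", path, size)
    return True


if __name__ == "__main__":
    # Timestamps make the log usable as an audit trail when run from cron.
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
```

With `dry_run=True` the same code path reports what would be deleted, which is useful when a retention policy is first rolled out.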
Here's a production-tested script that sorts builds by version and release, then refreshes the metadata with the createrepo utility:
#!/usr/bin/env python3
import os
import re
from collections import defaultdict
from subprocess import check_call

REPO_PATH = "/var/www/html/yum/custom"
KEEP_VERSIONS = 3  # Number of versions to retain


def get_rpm_versions():
    pkg_versions = defaultdict(list)
    for f in os.listdir(REPO_PATH):
        if not f.endswith('.rpm'):
            continue
        # Parse name-version-release.arch.rpm
        match = re.match(r'^(.*)-(\d+\.\d+\.\d+)-(\d+)\..*\.rpm$', f)
        if match:
            name, version, release = match.groups()
            full_ver = (version, release, f)
            pkg_versions[name].append(full_ver)
    return pkg_versions


def cleanup_repo():
    pkg_versions = get_rpm_versions()
    for pkg, versions in pkg_versions.items():
        # Sort by version then release number
        versions.sort(key=lambda x: (tuple(map(int, x[0].split('.'))), int(x[1])))
        if len(versions) > KEEP_VERSIONS:
            for ver in versions[:-KEEP_VERSIONS]:
                os.remove(os.path.join(REPO_PATH, ver[2]))
                print(f"Removed: {ver[2]}")
    # Update repo metadata
    check_call(['createrepo', '--update', REPO_PATH])


if __name__ == '__main__':
    cleanup_repo()
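The sort key in the script above only accepts purely numeric x.y.z versions and integer releases. Below is a simplified sketch of a more tolerant key, loosely modelled on RPM's segment-wise comparison. It does not implement the full rpmvercmp algorithm (tilde and caret handling are omitted, for example), and `version_key` is a hypothetical helper:

```python
import re


def version_key(version):
    """Turn a version or release string into a sortable list.

    The string is split into runs of digits and runs of letters;
    separators are ignored. Numeric runs compare as integers and
    rank above alphabetic runs.
    """
    key = []
    for segment in re.findall(r"[0-9]+|[A-Za-z]+", version):
        if segment.isdigit():
            key.append((1, int(segment), ""))
        else:
            key.append((0, 0, segment))
    return key


builds = ["1.10.0", "1.9.0", "1.9.0rc1", "1.2.3"]
print(sorted(builds, key=version_key))
```

Comparing segments as integers avoids the lexical trap where 1.10.0 sorts before 1.9.0. Where exact RPM ordering matters, the rpm Python bindings or repomanage are the safer choice.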
For enterprise environments, enhance the script with:
# Retention rules in YAML config
retention_rules:
  core-package:
    keep_last: 5
    min_age_days: 7
  experimental-*:
    keep_last: 1
# Dry-run mode
# S3/Artifactory integration
# Email notification of deletions
# Package signature verification
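The per-package rules above could be applied roughly as follows. This is a sketch under stated assumptions: the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`), patterns use shell-style globbing, the first matching rule wins, and `min_age_days` means that files younger than that age are never deleted. The original config does not define these semantics, and `rule_for`, `select_deletions`, and `DEFAULT_RULE` are hypothetical names.

```python
import fnmatch
import time

# Fallback for packages that match no rule (assumed defaults).
DEFAULT_RULE = {"keep_last": 3, "min_age_days": 0}

# Stand-in for the loaded 'retention_rules' mapping.
RETENTION_RULES = {
    "core-package": {"keep_last": 5, "min_age_days": 7},
    "experimental-*": {"keep_last": 1},
}


def rule_for(pkg_name, rules=RETENTION_RULES):
    """Return the first rule whose glob pattern matches, else the default."""
    for pattern, rule in rules.items():
        if fnmatch.fnmatchcase(pkg_name, pattern):
            return {**DEFAULT_RULE, **rule}
    return dict(DEFAULT_RULE)


def select_deletions(pkg_name, files_newest_first, now=None):
    """Pick files to delete from (filename, mtime) pairs, newest first."""
    rule = rule_for(pkg_name)
    now = time.time() if now is None else now
    min_age = rule["min_age_days"] * 86400
    candidates = files_newest_first[rule["keep_last"]:]
    # Files younger than min_age_days are spared even if over the limit.
    return [name for name, mtime in candidates if now - mtime >= min_age]
```

Since `select_deletions` only returns file names, printing its result instead of unlinking the files gives a dry-run mode with no extra code.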
Set up daily cleanup at 2AM:
# /etc/cron.d/yum-cleanup
0 2 * * * root /usr/local/bin/yum_repo_cleanup.py >> /var/log/yum-cleanup.log 2>&1
For complex scenarios, consider:
- Pulp: Enterprise-grade repo management
- Nexus: Generic artifact repository
- dirtyrepoclean: Specialized Yum cleanup tool