How to Accurately Parse RPM Package Names for Version Control in Linux Systems



RPM package filenames follow a strict convention that encodes crucial information. The standard format is:

NAME-VERSION-RELEASE.ARCH.rpm

Where:

  • NAME: Package name (may contain hyphens)
  • VERSION: Upstream version (letters, digits, dots, underscores, and +; never a hyphen)
  • RELEASE: Distribution-specific release number (often contains distro tags like .el6; also never contains a hyphen)
  • ARCH: Architecture (x86_64, noarch, etc.)

Naive parsing that splits on the first hyphen or the first dot fails when the name contains hyphens or the release contains several dots. Consider these examples:

package-1.2-3.el6.x86_64.rpm       # Simple case
package-sub-1.2.3-4.el6.x86_64.rpm # Name contains a hyphen
package-1.2-3.4.el6.x86_64.rpm     # Release contains multiple dots
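
Because neither VERSION nor RELEASE may contain a hyphen, the three examples can be split from the right without any regex. The following is a minimal sketch, assuming the filename follows the NAME-VERSION-RELEASE.ARCH.rpm convention; the function name split_nvra is hypothetical and chosen only for illustration.

```python
def split_nvra(filename):
    """Split an RPM filename into (name, version, release, arch), working from the right."""
    base = filename.rsplit('/', 1)[-1]          # drop any directory part
    if not base.endswith('.rpm'):
        raise ValueError(f"not an RPM filename: {filename!r}")
    base = base[:-len('.rpm')]                  # strip only the trailing extension

    rest, dot, arch = base.rpartition('.')      # arch follows the last dot
    rest, dash1, release = rest.rpartition('-') # release follows the last hyphen
    name, dash2, version = rest.rpartition('-') # version follows the hyphen before that

    if not (dot and dash1 and dash2 and name and version and release and arch):
        raise ValueError(f"unexpected RPM filename layout: {filename!r}")
    return name, version, release, arch


for example in (
    'package-1.2-3.el6.x86_64.rpm',
    'package-sub-1.2.3-4.el6.x86_64.rpm',
    'package-1.2-3.4.el6.x86_64.rpm',
):
    print(split_nvra(example))
```

Splitting from the right means any extra hyphens stay in the name, which is the only field allowed to contain them.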

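RPM itself doesn't include a standalone filename parser. Some guides suggest an rpmdev-packagename helper from the rpmdevtools package, but that command may not be present in your rpmdevtools release, so confirm it exists before scripting against it:

# Install rpmdevtools
sudo yum install rpmdevtools

# Example usage (only if the command is available on your system):
rpmdev-packagename package-1.2-3.el6.x86_64.rpm

Even where it is available, this approach requires rpmdevtools to be installed on every machine that runs the script.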

For scriptable parsing without RPM tools, use this Python implementation:

import re

def parse_rpm_filename(filename):
    # Name is greedy and may contain hyphens; version and release may not,
    # otherwise "python-redis-2.8.0" would be split as name "python".
    pattern = r'^(.+)-((?:[0-9]+:)?[a-zA-Z0-9._+]+)-([a-zA-Z0-9._+]+)\.([a-zA-Z][a-zA-Z0-9_]*)\.rpm$'
    match = re.match(pattern, filename.split('/')[-1])
    if match:
        return {
            'name': match.group(1),
            'version': match.group(2),
            'release': match.group(3),
            'arch': match.group(4)
        }
    return None

Processing a file list with our Python function:

with open('/tmp/packages.txt') as f:
    for line in f:
        result = parse_rpm_filename(line.strip())
        if result:
            print(f"Name: {result['name']}, Version: {result['version']}")

The regex handles these complex scenarios:

sei_dnsmaster-1.0-99.el6.x86_64.rpm → Name: sei_dnsmaster, Version: 1.0
python-redis-2.8.0-2.el6.noarch.rpm → Name: python-redis, Version: 2.8.0

For your specific use case of verifying installed versions:

import subprocess

def verify_package_version(pkg_name, expected_version):
    """Return True if the installed version of pkg_name equals expected_version."""
    cmd = ['yum', 'info', 'installed', pkg_name]
    try:
        output = subprocess.check_output(cmd).decode()
    except subprocess.CalledProcessError:
        return False  # yum reported an error, e.g. the package is not installed
    for line in output.splitlines():
        if line.startswith('Version'):
            return line.split(':', 1)[1].strip() == expected_version
    return False

When working with RPM package management systems, developers often need to programmatically extract components from RPM filenames. The standard RPM naming convention follows this pattern:

NAME-VERSION-RELEASE.ARCH.rpm

However, parsing these filenames correctly can be tricky due to:

  • Hyphens in package names (python-redis, package-sub), which make the first hyphen an unreliable separator
  • Variable architecture suffixes (.x86_64, .noarch, etc.)
  • Optional epoch numbers
  • Distribution-specific tags (el6, fc28, etc.)
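
The optional epoch can be separated from the version once the version field has been extracted. This is a small sketch, assuming the epoch appears as a numeric "N:" prefix on the version and that a missing epoch should be treated as 0 (which is how RPM compares packages); split_epoch is a hypothetical helper name.

```python
def split_epoch(version):
    """Return (epoch, version); epoch defaults to 0 when no numeric 'N:' prefix exists."""
    epoch, sep, rest = version.partition(':')
    if sep and epoch.isdigit():
        return int(epoch), rest
    return 0, version


print(split_epoch('1:1.20.1'))  # version string carrying an epoch
print(split_epoch('2.8.4'))     # version string without an epoch
```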

The rpm command itself can report these fields directly from a package file:

rpm -qp --queryformat '%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n' package.rpm

However, this requires the actual RPM file, not just the filename. For filename-only parsing, we need alternative approaches.
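
When the RPM file is available, the query above can be wrapped from Python. The sketch below assumes the rpm binary is on PATH and uses tab-separated output so the fields split unambiguously; query_rpm_file and parse_query_output are hypothetical names introduced for this example.

```python
import subprocess

# Tab-separated fields avoid ambiguity if a value ever contains a space.
QUERY_FORMAT = '%{NAME}\t%{VERSION}\t%{RELEASE}\t%{ARCH}\n'


def parse_query_output(output):
    """Turn one line of tab-separated rpm query output into a dict."""
    name, version, release, arch = output.rstrip('\n').split('\t')
    return {'name': name, 'version': version, 'release': release, 'arch': arch}


def query_rpm_file(path):
    """Ask rpm for the metadata stored inside the package file at path."""
    output = subprocess.check_output(
        ['rpm', '-qp', '--queryformat', QUERY_FORMAT, path],
        text=True,
    )
    return parse_query_output(output)
```

Reading the metadata this way sidesteps filename parsing entirely, which is the more reliable option whenever the file is at hand.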

Here's a Python implementation that follows the NAME-VERSION-RELEASE.ARCH convention:

import re

def parse_rpm_filename(filename):
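    # Remove path and the trailing .rpm extension (only the suffix, not every occurrence)
    basename = filename.split('/')[-1]
    if basename.endswith('.rpm'):
        basename = basename[:-4]
    
    # Name may contain hyphens; version and release may not.
    # The arch class includes "_" so that x86_64 matches.
    pattern = r'^(.*?)-((?:[0-9]+:)?(?:[a-zA-Z0-9._+]+))-((?:[a-zA-Z0-9._+]+))\.([a-zA-Z][a-zA-Z0-9_]*)$'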
    
    match = re.match(pattern, basename)
    if match:
        return {
            'name': match.group(1),
            'version': match.group(2),
            'release': match.group(3),
            'arch': match.group(4)
        }
    return None

# Example usage
result = parse_rpm_filename('mercurial-2.8-3.el6.x86_64.rpm')
print(result)

For shell scripts, here's a bash function that strips the fields off from the right:

function parse_rpm() {
    local RPM=$1
    local B=${RPM##*/}
    B=${B%.rpm}
    local A=${B##*.}
    B=${B%.*}
    local R=${B##*-}
    B=${B%-*}
    local V=${B##*-}
    B=${B%-*}
    local N=$B
    echo "$N $V $R $A"
}

# Handle hyphens in the package name by anchoring on the last two hyphens
function parse_rpm_enhanced() {
    local fullname=${1##*/}
    fullname=${fullname%.rpm}
    local arch=${fullname##*.}
    local rest=${fullname%."$arch"}
    
    # Split name and version-release; bail out if the filename does not fit
    [[ $rest =~ (.*)-([^-]+-[^-]+)$ ]] || return 1
    local name=${BASH_REMATCH[1]}
    local version=${BASH_REMATCH[2]%-*}
    local release=${BASH_REMATCH[2]#*-}
    
    echo "$name $version $release $arch"
}

Let's examine how these solutions handle various cases:

# Standard case
python-mercurial-2.8.4-1.el8.x86_64.rpm
→ name: python-mercurial
→ version: 2.8.4
→ release: 1.el8
→ arch: x86_64

# Case with epoch (rare in real filenames; shown here in name-epoch:version form)
nginx-1:1.20.1-1.el8.ngx.x86_64.rpm
→ name: nginx
→ version: 1:1.20.1
→ release: 1.el8.ngx
→ arch: x86_64

# Complex version string
libreoffice-7.3.4.2-4.el8.x86_64.rpm
→ name: libreoffice
→ version: 7.3.4.2
→ release: 4.el8
→ arch: x86_64

To install the exact version named by a package file, you could extend the Python solution:

import subprocess

def verify_package_installation(package_path):
    """Install the name-version parsed from package_path; return True on success."""
    components = parse_rpm_filename(package_path)
    if not components:
        return False
    
    # Pass arguments as a list (no shell) so odd filenames cannot inject commands
    spec = f"{components['name']}-{components['version']}"
    result = subprocess.run(['yum', 'install', '-y', spec], capture_output=True)
    return result.returncode == 0