Advanced sosreport Analysis Techniques: Comparing Configurations Between RHEL/CentOS Systems for Troubleshooting



The sosreport utility, shipped with RHEL/CentOS since EL 4.6, is essentially a Swiss Army knife for system diagnostics. Although it is primarily designed for Red Hat support cases, its plugins collect a broad range of system information, including:

- Kernel parameters (/proc/sys)
- Package manifests (rpm -qa)
- Systemd service states
- Network configurations
- Hardware inventory (dmidecode)
- Storage configurations (multipath, lvm)

When dealing with configuration drift between supposedly identical systems, here's the workflow I use in practice:

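Step 1: Extract both reports

# Extract both sosreports into separate directories.
# --strip-components=1 drops each archive's top-level sosreport-<host>-...
# directory so both trees share identical paths. GNU tar -xf auto-detects
# bz2/xz compression (newer sos versions produce .tar.xz archives).
mkdir -p /tmp/sos_{A,B}
tar -xf sosreport-hostA.tar.bz2 -C /tmp/sos_A --strip-components=1
tar -xf sosreport-hostB.tar.bz2 -C /tmp/sos_B --strip-components=1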

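Step 2: Generate file manifests for comparison

# Create checksum lists with paths relative to each report root,
# sorted by path, so the two manifests can be diffed line by line
(cd /tmp/sos_A && find . -type f -exec sha256sum {} + | sort -k 2) > sos_A.manifest
(cd /tmp/sos_B && find . -type f -exec sha256sum {} + | sort -k 2) > sos_B.manifest
diff sos_A.manifest sos_B.manifest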
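The same manifest idea can be expressed in Python, keyed by each file's path relative to its report root so the two sides line up. This is a sketch under assumptions: build_manifest and compare_manifests are hypothetical names, and symlinks are skipped on the assumption that the convenience symlinks sos creates point at files already hashed under their real paths.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and symlinks; symlink targets are hashed
        # under their real paths instead
        if path.is_symlink() or not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(a, b):
    """Return (changed, only_in_a, only_in_b) as sorted lists of paths."""
    changed = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return changed, only_a, only_b
```

Usage: compare_manifests(build_manifest("/tmp/sos_A"), build_manifest("/tmp/sos_B")) returns three path lists you can feed into a targeted diff.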

Here's a Python script I frequently use to highlight significant differences:

#!/usr/bin/env python3
import filecmp
import os


def compare_sosreports(dir1, dir2):
    dcmp = filecmp.dircmp(dir1, dir2)

    # Entries present in only one report (skip sos's own sos_logs/ output)
    for item in dcmp.left_only:
        if not item.startswith('sos_logs'):
            print(f"only in first:  {os.path.join(dir1, item)}")
    for item in dcmp.right_only:
        if not item.startswith('sos_logs'):
            print(f"only in second: {os.path.join(dir2, item)}")

    # Files in both reports whose contents differ; keep config-like ones
    for item in dcmp.diff_files:
        if item.endswith('.conf') or 'sysctl' in item:
            print(f"diff {os.path.join(dir1, item)} {os.path.join(dir2, item)}")

    # Recurse into subdirectories present in both reports
    for subdir in dcmp.common_dirs:
        compare_sosreports(
            os.path.join(dir1, subdir),
            os.path.join(dir2, subdir)
        )


if __name__ == '__main__':
    compare_sosreports('/tmp/sos_A', '/tmp/sos_B')

Through numerous troubleshooting sessions, I've found these files most revealing:

1. /etc/sysctl.conf and /etc/sysctl.d/* 
2. /proc/cmdline (kernel boot parameters)
3. /var/log/messages patterns
4. systemd unit files under /usr/lib/systemd/system/ and local overrides under /etc/systemd/system/
5. SELinux contexts in /etc/selinux/targeted/contexts/
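For the plain-text files in this list, a targeted unified diff is often easier to read than a full tree comparison. A minimal sketch, assuming the reports were extracted without their top-level directory and that sos stored these files at etc/... and proc/cmdline inside the report (check your sos version). KEY_FILES and diff_key_files are hypothetical names; only the sysctl files and the kernel command line are included here, so extend the list as needed.

```python
import difflib
from pathlib import Path

# Hypothetical list of report-relative paths/globs; where sos stores these
# files can vary by version, so check your extracted reports first
KEY_FILES = ["etc/sysctl.conf", "etc/sysctl.d/*", "proc/cmdline"]


def diff_key_files(root_a, root_b, patterns=KEY_FILES):
    """Yield a unified diff (or a 'missing' note) for each key file that differs."""
    root_a, root_b = Path(root_a), Path(root_b)
    rel_paths = set()
    for pattern in patterns:
        for root in (root_a, root_b):
            rel_paths.update(p.relative_to(root).as_posix()
                             for p in root.glob(pattern) if p.is_file())
    for rel in sorted(rel_paths):
        a, b = root_a / rel, root_b / rel
        if not (a.is_file() and b.is_file()):
            yield f"missing in one report: {rel}\n"
            continue
        diff = "".join(difflib.unified_diff(
            a.read_text(errors="replace").splitlines(keepends=True),
            b.read_text(errors="replace").splitlines(keepends=True),
            fromfile=f"A/{rel}", tofile=f"B/{rel}"))
        if diff:
            yield diff
```

Usage: print("".join(diff_key_files("/tmp/sos_A", "/tmp/sos_B"))).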

For large sosreports, I use this grep command to find potential smoking guns:

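# Unanchored and case-insensitive (log lines start with a timestamp);
# -I skips binary files, --exclude-dir skips noisy directories by name
grep -r -I -i -E 'error|warn|fail|denied' \
    --exclude-dir={cache,logs,tmp} /tmp/sos_A /tmp/sos_B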

Pro tip: run the search once per report and pipe it through wc -l to get a rough per-host hit count when comparing multiple systems in a cluster.
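The per-host counting can also be done in Python, which yields one number per report. This is a sketch under stated assumptions: the keyword list is similar to the grep pattern above, matching is case-insensitive and unanchored, directories named cache, logs or tmp are skipped anywhere in the tree, and count_hits, PATTERN and SKIP_DIRS are hypothetical names.

```python
import re
from pathlib import Path

# Unanchored, case-insensitive keywords: log lines begin with a timestamp,
# so a ^-anchored pattern would miss most of them
PATTERN = re.compile(r"error|warn|fail|denied", re.IGNORECASE)
# Directory names to skip anywhere inside the report
SKIP_DIRS = {"cache", "logs", "tmp"}


def count_hits(root):
    """Count lines matching PATTERN in one extracted report."""
    root = Path(root)
    total = 0
    for path in root.rglob("*"):
        # Skip symlinks so sos's convenience links are not counted twice
        if path.is_symlink() or not path.is_file():
            continue
        if SKIP_DIRS.intersection(path.relative_to(root).parts[:-1]):
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            total += sum(1 for line in fh if PATTERN.search(line))
    return total
```

Usage: for host in ("/tmp/sos_A", "/tmp/sos_B"): print(host, count_hits(host)).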


The sosreport utility has been a staple in Red Hat ecosystem troubleshooting since RHEL 4.6. This diagnostic tool collects:

  • System configuration files
  • Hardware information
  • Running processes
  • Package manifests
  • Log files

# Basic sosreport generation command (run as root)
sosreport --batch --tmp-dir /var/tmp --all-logs
# On sos 4.x the equivalent is: sos report --batch --tmp-dir /var/tmp --all-logs

When dealing with configuration drift between "identical" systems, these methods prove valuable:

1. Directory Structure Comparison

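# Extract both sosreports, dropping each archive's top-level
# sosreport-<host>-... directory so the two trees line up
mkdir -p /tmp/system1 /tmp/system2
tar -xjf system1-sosreport.tar.bz2 -C /tmp/system1 --strip-components=1
tar -xjf system2-sosreport.tar.bz2 -C /tmp/system2 --strip-components=1

# Use diff recursively, hiding files that exist on only one side
diff -r /tmp/system1 /tmp/system2 | grep -v "^Only in"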

2. Key Configuration File Analysis

Focus on critical files that often reveal differences:

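# Command output file names under sos_commands/ vary between sos versions;
# adjust these paths to match your reports

# Compare kernel parameters
diff /tmp/system1/sos_commands/kernel/sysctl_-a /tmp/system2/sos_commands/kernel/sysctl_-a

# Show package lines unique to either system (needs bash for <( ))
comm -3 <(sort /tmp/system1/sos_commands/rpm/rpm_-qa) <(sort /tmp/system2/sos_commands/rpm/rpm_-qa)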
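comm -3 prints an upgraded package twice (once per version), which can be hard to scan. A small Python sketch can group rpm -qa lines by package name instead, separating packages missing on one side from packages whose versions differ. It assumes each line is a name-version-release[.arch] string and relies on RPM's rule that version and release contain no hyphens; index_packages and compare_packages are hypothetical names.

```python
from collections import defaultdict


def index_packages(lines):
    """Map package name -> set of 'version-release[.arch]' strings."""
    index = defaultdict(set)
    for line in lines:
        line = line.strip()
        if line.count("-") < 2:
            continue  # blank or not an N-V-R string
        # RPM forbids '-' in version and release, so split from the right
        name, version, release = line.rsplit("-", 2)
        index[name].add(f"{version}-{release}")
    return index


def compare_packages(lines_a, lines_b):
    """Return (only_in_a, only_in_b, {name: (versions_a, versions_b)})."""
    a, b = index_packages(lines_a), index_packages(lines_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = {name: (sorted(a[name]), sorted(b[name]))
               for name in sorted(a.keys() & b.keys()) if a[name] != b[name]}
    return only_a, only_b, changed
```

Both arguments can be open file objects for the two rpm listings, since iterating a file yields its lines. Keeping a set of versions per name also handles packages installed in several versions at once, such as multiple kernels.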

For frequent comparisons, consider this Python script:

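import filecmp
import os


def compare_sosreports(dir1, dir2, rel=''):
    # diff_files, left_only and right_only cover a single directory level,
    # so recurse explicitly; rel is the path inside the report, for output
    dcmp = filecmp.dircmp(dir1, dir2)

    for name in dcmp.diff_files:
        print(f"differs:        {os.path.join(rel, name)}")
    for name in dcmp.left_only:
        print(f"only in first:  {os.path.join(rel, name)}")
    for name in dcmp.right_only:
        print(f"only in second: {os.path.join(rel, name)}")

    for sub in dcmp.common_dirs:
        compare_sosreports(os.path.join(dir1, sub), os.path.join(dir2, sub),
                           os.path.join(rel, sub))


# Usage
compare_sosreports('/tmp/system1', '/tmp/system2')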

For deeper analysis, these methods can help:

1. SELinux and Service State Comparison

# Compare SELinux status and policy boolean settings
diff /tmp/system1/sos_commands/selinux/sestatus_-b /tmp/system2/sos_commands/selinux/sestatus_-b

# Check service states
diff /tmp/system1/sos_commands/systemd/systemctl_list-units /tmp/system2/sos_commands/systemd/systemctl_list-units
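Raw diffs of these two files mix real changes with column-alignment noise. The sketch that follows parses both outputs into dictionaries and reports only the keys whose values differ. It assumes the usual text layout of sestatus -b (a "Policy booleans:" section of name on/off pairs) and of systemctl list-units (UNIT LOAD ACTIVE SUB DESCRIPTION columns, with an optional bullet before failed units); parse_booleans, parse_units and diff_maps are hypothetical names.

```python
def parse_booleans(text):
    """Parse the 'Policy booleans:' section of `sestatus -b` output."""
    booleans, in_section = {}, False
    for line in text.splitlines():
        if line.startswith("Policy booleans:"):
            in_section = True
            continue
        fields = line.split()
        if in_section and len(fields) == 2 and fields[1] in ("on", "off"):
            booleans[fields[0]] = fields[1]
    return booleans


def parse_units(text):
    """Map unit name -> (LOAD, ACTIVE, SUB) from `systemctl list-units` output."""
    units = {}
    for line in text.splitlines():
        # Drop leading spaces and the failed-unit bullet ("\u25cf" or "*")
        fields = line.lstrip(" \u25cf*").split()
        # Unit names carry a type suffix (.service, .mount, ...); header,
        # legend and summary lines do not, so they are skipped
        if len(fields) >= 4 and "." in fields[0]:
            units[fields[0]] = tuple(fields[1:4])
    return units


def diff_maps(a, b):
    """Return {key: (value_a, value_b)}; None marks a key absent on one side."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}
```

Usage: read each system's sestatus_-b and systemctl_list-units files as text and pass the parsed results to diff_maps.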

2. Performance Tuning Comparison

# Compare vm.* and net.* sysctl settings (anchored, with the dot escaped)
grep -E '^(vm|net)\.' /tmp/system1/sos_commands/kernel/sysctl_-a > sysctl1.txt
grep -E '^(vm|net)\.' /tmp/system2/sos_commands/kernel/sysctl_-a > sysctl2.txt
diff -y --suppress-common-lines sysctl1.txt sysctl2.txt
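For scripting across many hosts, it can be handier to compare sysctl settings by key rather than by line. A minimal sketch, assuming sysctl -a style "key = value" lines; the vm./net. defaults mirror the grep filter, whitespace inside values is normalized, and parse_sysctl and diff_sysctl are hypothetical names.

```python
def parse_sysctl(text, prefixes=("vm.", "net.")):
    """Parse `sysctl -a` style 'key = value' lines, keeping chosen prefixes."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep and key.startswith(prefixes):
            # Normalize runs of tabs/spaces (e.g. net.ipv4.tcp_rmem triples)
            settings[key.strip()] = " ".join(value.split())
    return settings


def diff_sysctl(text_a, text_b, prefixes=("vm.", "net.")):
    """Return {key: (value_a, value_b)}; None marks a key absent on one side."""
    a, b = parse_sysctl(text_a, prefixes), parse_sysctl(text_b, prefixes)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}
```

Passing a different prefixes tuple, for example ("kernel.",), reuses the same comparison for other parameter families.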

For complex environments, consider these visualization approaches:

  • Generate dot graphs of service dependencies with systemd-analyze dot
  • Create side-by-side HTML reports using diff2html
  • Plot performance metric differences with gnuplot
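As a standard-library alternative to diff2html, Python's difflib.HtmlDiff can render a side-by-side HTML table of two text files, such as the sysctl extracts above. A minimal sketch; html_side_by_side is a hypothetical helper name, and context=True limits the report to changed regions plus numlines lines around each.

```python
import difflib
from pathlib import Path


def html_side_by_side(file_a, file_b, out_html, numlines=3):
    """Write a side-by-side HTML diff of two text files and return the HTML."""
    a_lines = Path(file_a).read_text(errors="replace").splitlines()
    b_lines = Path(file_b).read_text(errors="replace").splitlines()
    # context=True shows only changed regions plus `numlines` lines around them
    html = difflib.HtmlDiff(wrapcolumn=80).make_file(
        a_lines, b_lines, fromdesc=str(file_a), todesc=str(file_b),
        context=True, numlines=numlines)
    Path(out_html).write_text(html, encoding="utf-8")
    return html
```

Usage: html_side_by_side("sysctl1.txt", "sysctl2.txt", "sysctl_diff.html"), then open the HTML file in a browser.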

Remember that while sosreport provides comprehensive data, the key to effective troubleshooting lies in focusing on the most likely areas of difference based on the specific symptoms you're investigating.