How to Extract Package Lists with Versions from Puppet Catalogs for Node Management



When managing infrastructure with Puppet, there are scenarios where you need to programmatically access the compiled catalog data. The most common use case is extracting package lists with their specified versions across your node inventory. This enables version tracking, dependency analysis, and compliance reporting.

Here are three practical approaches to retrieve package data from Puppet:

1. Direct Catalog Compilation

Use Puppet's face API to compile and inspect catalogs programmatically:

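require 'puppet'
require 'puppet/face'

# Load puppet.conf settings before using the Faces API from a standalone script
Puppet.initialize_settings

def get_packages(node)
  catalog = Puppet::Face[:catalog, '0.0.1'].find(node)
  # Keep only Package resources and pull out the fields of interest
  catalog.resources.select { |r| r.type == 'Package' }.map do |p|
    {
      name: p.title,
      version: p[:ensure],
      provider: p[:provider]
    }
  end
end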
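
The same selection can also be run against a catalog that has already been serialized to JSON, without loading Puppet's libraries. The following is a minimal sketch, assuming the JSON has a top-level "resources" array whose entries carry "type", "title", and "parameters" keys. The helper name packages_from_catalog_json and the sample data are hypothetical:

```ruby
require 'json'

# Hypothetical helper: extract Package resources from a catalog serialized as JSON.
# Assumes a top-level "resources" array with "type", "title", and "parameters" keys.
def packages_from_catalog_json(json_text)
  catalog = JSON.parse(json_text)
  package_resources = catalog.fetch('resources', []).select { |r| r['type'] == 'Package' }
  package_resources.map do |r|
    params = r['parameters'] || {}
    { name: r['title'], version: params['ensure'], provider: params['provider'] }
  end
end

# Made-up sample data for illustration only.
sample = {
  'resources' => [
    { 'type' => 'Package', 'title' => 'nginx', 'parameters' => { 'ensure' => '1.18.0' } },
    { 'type' => 'File', 'title' => '/etc/motd', 'parameters' => { 'ensure' => 'file' } }
  ]
}.to_json

puts packages_from_catalog_json(sample).inspect
```

Resources without an explicit ensure or provider come back with nil in those fields, which keeps the output shape the same as get_packages above.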

2. Using PuppetDB Queries

For environments with PuppetDB, you can query the package resources directly:

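# PQL query to get all packages across nodes
# (8080 is PuppetDB's default plain-HTTP port; 8081 is the HTTPS port and normally needs client certificates)
curl -X POST http://puppetdb:8080/pdb/query/v4 \
  -H "Content-Type: application/json" \
  -d '{
    "query": "resources[certname, title, parameters] { type = \"Package\" }"
  }'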
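
The query returns one row per package resource, so the rows usually need regrouping before they are useful. A minimal sketch of that step, assuming each row has "certname", "title", and "parameters" keys as requested in the query above. The helper name group_packages_by_node and the response body are made up for illustration:

```ruby
require 'json'

# Hypothetical helper: turn PuppetDB result rows into
# { certname => { package title => ensure value } }.
def group_packages_by_node(rows)
  rows.each_with_object(Hash.new { |h, k| h[k] = {} }) do |row, by_node|
    by_node[row['certname']][row['title']] = row.dig('parameters', 'ensure')
  end
end

# Made-up response body for illustration only.
response_body = <<~JSON
  [
    {"certname": "web01.example.com", "title": "nginx", "parameters": {"ensure": "1.18.0"}},
    {"certname": "web01.example.com", "title": "curl", "parameters": {"ensure": "latest"}},
    {"certname": "db01.example.com", "title": "postgresql", "parameters": {}}
  ]
JSON

by_node = group_packages_by_node(JSON.parse(response_body))
puts JSON.pretty_generate(by_node)
```

A package declared without an ensure parameter is recorded with a nil version rather than being dropped, so the node's package list stays complete.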

3. Catalog Export via Tasks

Create a custom task to export catalog data to your preferred storage:

# export_packages.json (task metadata; the file is named after the task, not metadata.json)
{
  "description": "Export package information from catalogs",
  "input_method": "stdin",
  "parameters": {
    "output_dir": {
      "description": "Directory to save JSON exports",
      "type": "String"
    }
  }
}
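# export_packages.rb
require 'json'
require 'puppet'

# Load puppet.conf so Puppet[:certname] and the catalog terminus are configured
Puppet.initialize_settings

def task(output_dir:)
  node = Puppet[:certname]
  catalog = Puppet::Resource::Catalog.indirection.find(node)
  raise "No catalog found for #{node}" if catalog.nil?

  # Map each package title to its declared ensure value
  packages = catalog.resources.select { |r| r.type == 'Package' }.each_with_object({}) do |p, h|
    h[p.title] = p[:ensure]
  end

  File.write(File.join(output_dir, "#{node}_packages.json"), packages.to_json)
  { status: 'success', exported: packages.size }
end

# Task parameters arrive as a JSON object on stdin (input_method: stdin)
params = JSON.parse($stdin.read)
puts task(output_dir: params.fetch('output_dir')).to_json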

For environments using Hiera or complex conditional logic, consider these additional steps:

  • Include class parameters in your export to understand package inclusion triggers
  • Capture resource dependencies using catalog.relationship_graph
  • Account for dynamic version specifications (e.g., latest/pinned versions)
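
The last point matters because ensure does not always hold a version number. A minimal sketch of one way to separate pinned versions from floating ones, assuming the usual special values of the package type (present, installed, latest, absent, purged). The helper name classify_ensure and the category names are hypothetical, and any value not listed falls through to :pinned, so extend the lists for other values your manifests use:

```ruby
# Values that leave the installed version up to the package manager.
FLOATING_ENSURE = %w[present installed latest].freeze
# Values that remove the package.
REMOVED_ENSURE = %w[absent purged].freeze

# Hypothetical helper: classify a Package resource's ensure value.
# A missing ensure is treated as floating, since no version is requested.
def classify_ensure(value)
  v = value.to_s
  return :floating if v.empty? || FLOATING_ENSURE.include?(v)
  return :absent if REMOVED_ENSURE.include?(v)

  :pinned
end

# Made-up values for illustration only.
['1.18.0-0ubuntu1', 'latest', nil, 'purged'].each do |value|
  puts "#{value.inspect} => #{classify_ensure(value)}"
end
```

Reports can then flag floating packages separately, since their actual version has to come from the node itself rather than from the catalog.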

Once collected, package data can be processed for various outputs:

# Generate a markdown report
def generate_report(package_data)
  report = ["# Package Version Report", "| Node | Package | Version |", "|------|---------|---------|"]
  package_data.each do |node, packages|
    packages.each do |name, version|
      report << "| #{node} | #{name} | #{version} |"
    end
  end
  report.join("\n")
end

Consider adding catalog analysis as a pipeline step to track package version changes between deployments. This helps detect version drift and potential conflicts early in the deployment process.
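
As an illustration of such a pipeline step, the sketch below compares two package snapshots for one node and reports what was added, removed, or changed. It assumes each snapshot is a hash of package name to version, like the JSON written by the export task above. The helper name package_drift and the sample versions are hypothetical:

```ruby
# Hypothetical helper: compare two { package => version } snapshots for one node.
def package_drift(before, after)
  common = before.keys & after.keys
  changed = common.select { |name| before[name] != after[name] }.sort
  {
    added: (after.keys - before.keys).sort,
    removed: (before.keys - after.keys).sort,
    changed: changed.map { |name| [name, { from: before[name], to: after[name] }] }.to_h
  }
end

# Made-up snapshots for illustration only.
previous = { 'nginx' => '1.18.0', 'curl' => '7.68.0', 'vim' => '8.1' }
current  = { 'nginx' => '1.20.1', 'curl' => '7.68.0', 'htop' => '3.0.5' }

drift = package_drift(previous, current)
puts drift.inspect
```

A pipeline could fail the build when the changed or removed lists are non-empty for packages that are supposed to stay pinned.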

When implementing catalog exports:

  • Restrict access to sensitive catalog data
  • Sanitize outputs containing node-specific parameters
  • Implement proper authentication for PuppetDB queries
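
One simple way to sanitize exports is to keep only an explicit allowlist of parameters and drop everything else. A minimal sketch, assuming resources are hashes with "title" and "parameters" keys. The allowlist contents and the helper name sanitize_resource are hypothetical and should be adapted to your own policy:

```ruby
# Hypothetical allowlist of Package parameters considered safe to export.
SAFE_PARAMETERS = %w[ensure provider].freeze

# Hypothetical helper: keep the title and only the allowlisted parameters.
def sanitize_resource(resource)
  {
    'title' => resource['title'],
    'parameters' => (resource['parameters'] || {}).slice(*SAFE_PARAMETERS)
  }
end

# Made-up resource for illustration only; the source URL embeds a credential.
raw = {
  'title' => 'nginx',
  'parameters' => {
    'ensure' => '1.18.0',
    'provider' => 'apt',
    'source' => 'https://user:secret@repo.example.com/nginx.deb'
  }
}

puts sanitize_resource(raw).inspect
```

An allowlist is used rather than a blocklist so that parameters added later are excluded by default.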

When managing infrastructure with Puppet, you'll often need to extract specific resource data like package versions from compiled catalogs. This information becomes crucial for:

  • Auditing package versions across nodes
  • Generating compliance reports
  • Troubleshooting dependency issues
  • Maintaining version consistency

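The simplest approach is to query resources directly on nodes. Note that this reports the packages actually installed on the node, not the versions declared in the catalog: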

# Get all installed packages
puppet resource package

# Filter for specific packages
puppet resource package | grep -E 'nginx|httpd'

# YAML output format (structured output for further processing)
puppet resource package --to_yaml
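
By default, puppet resource prints Puppet code rather than structured data. The sketch below turns that text into a hash of package name to ensure value. It assumes each resource is printed as a package { 'name': line followed by an ensure => 'value' line, and it ignores ensure values that are not single quoted strings. The helper name parse_resource_output and the sample output are made up for illustration:

```ruby
# Hypothetical helper: parse the Puppet-code output of `puppet resource package`
# into { package name => ensure value }.
def parse_resource_output(text)
  packages = {}
  current = nil
  text.each_line do |line|
    if (m = line.match(/^package\s*\{\s*'([^']+)':/))
      current = m[1]
      packages[current] = nil
    elsif current && (m = line.match(/^\s*ensure\s*=>\s*'([^']*)'/))
      packages[current] = m[1]
    elsif line.strip == '}'
      current = nil
    end
  end
  packages
end

# Made-up sample output for illustration only.
sample_output = <<~'OUT'
  package { 'curl':
    ensure   => '7.68.0-1ubuntu2',
    provider => 'apt',
  }
  package { 'nginx':
    ensure   => '1.18.0-0ubuntu1',
    provider => 'apt',
  }
OUT

installed = parse_resource_output(sample_output)
puts installed.inspect
```

Text parsing like this is fragile, so a structured source such as PuppetDB is preferable when one is available.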

For centralized data collection, PuppetDB provides powerful query capabilities:

# Query PuppetDB for package resources
curl -X GET \
  --tlsv1 \
  --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem \
  --cert /etc/puppetlabs/puppet/ssl/certs/$(hostname -f).pem \
  --key /etc/puppetlabs/puppet/ssl/private_keys/$(hostname -f).pem \
  'https://puppetdb:8081/pdb/query/v4/resources/Package'

Create a custom report processor to extract package data:

# In /etc/puppetlabs/puppet/puppet.conf (the section is named [server] on Puppet 7 and later)
[master]
reports = store,package_versions

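# Custom report processor (/etc/puppetlabs/code/environments/production/modules/puppet_metrics/lib/puppet/reports/package_versions.rb)
require 'puppet'
require 'json'

Puppet::Reports.register_report(:package_versions) do
  desc 'Write the Package resources from each report to a JSON file.'

  def process
    packages = resource_statuses.values.select do |resource|
      resource.resource_type == 'Package'
    end.map do |package|
      {
        name: package.title,
        # Events exist only for resources that changed (or would change), so this may be nil
        version: package.events.last&.desired_value,
        node: host
      }
    end

    # The output directory must already exist; the host is part of the file name
    # so that reports arriving in the same second do not overwrite each other
    File.write("/opt/puppetlabs/package_versions/#{host}-#{Time.now.to_i}.json", packages.to_json)
  end
end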

Preview package changes before they're applied:

puppet agent -t --noop --evaltrace \
  | grep -A 1 'Package' \
  | grep -E 'current_value|ensure'

Combine these methods to create comprehensive reports. For example, this Ruby script uses the puppetdb-ruby gem to collect every non-exported Package resource and write the result to a JSON file:

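require 'puppetdb'
require 'json'

# HTTPS connections need client certificates; adjust these paths to your certname
ssl_dir = '/etc/puppetlabs/puppet/ssl'
certname = `hostname -f`.strip
client = PuppetDB::Client.new(
  server: 'https://puppetdb:8081',
  pem: {
    'key' => "#{ssl_dir}/private_keys/#{certname}.pem",
    'cert' => "#{ssl_dir}/certs/#{certname}.pem",
    'ca_file' => "#{ssl_dir}/certs/ca.pem"
  }
)

# Both conditions must be combined into a single query with :and
response = client.request(
  'resources',
  [:and,
   [:'=', 'type', 'Package'],
   [:'=', 'exported', false]]
)

# Group nodes by package and version so that no node is overwritten
package_data = Hash.new { |h, k| h[k] = Hash.new { |v, ver| v[ver] = [] } }
response.data.each do |resource|
  package_data[resource['title']][resource['parameters']['ensure']] << resource['certname']
end

File.write('package_report.json', JSON.pretty_generate(package_data))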
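
To act on the goal of version consistency, the collected rows can be checked for packages that are declared with different versions on different nodes. A minimal sketch, assuming rows shaped like PuppetDB resource results with "certname", "title", and "parameters" keys. The helper name inconsistent_packages and the sample rows are hypothetical:

```ruby
# Hypothetical helper: return { package => { version => [nodes] } } for every
# package that is declared with more than one distinct ensure value.
def inconsistent_packages(rows)
  versions = Hash.new { |h, k| h[k] = Hash.new { |v, ver| v[ver] = [] } }
  rows.each do |row|
    versions[row['title']][row.dig('parameters', 'ensure')] << row['certname']
  end
  versions.select { |_name, by_version| by_version.size > 1 }
end

# Made-up rows for illustration only.
rows = [
  { 'certname' => 'web01', 'title' => 'nginx', 'parameters' => { 'ensure' => '1.18.0' } },
  { 'certname' => 'web02', 'title' => 'nginx', 'parameters' => { 'ensure' => '1.20.1' } },
  { 'certname' => 'web01', 'title' => 'curl', 'parameters' => { 'ensure' => '7.68.0' } },
  { 'certname' => 'web02', 'title' => 'curl', 'parameters' => { 'ensure' => '7.68.0' } }
]

mismatches = inconsistent_packages(rows)
puts mismatches.inspect
```

Packages with a single declared version everywhere are filtered out, so the result contains only the entries that need attention.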