Best Practices for Modularizing Large Prometheus Configuration Files


Once a Prometheus configuration grows past 1,000 lines, several pain points appear:

  • Merge conflicts in version control
  • Difficulty locating specific job configurations
  • Environment-specific settings mixed together
  • Reduced readability and maintainability

While Prometheus has no general-purpose "include" directive, it offers two built-in alternatives: glob patterns in rule_files and file-based service discovery (file_sd_configs):

# Main prometheus.yml
global:
  scrape_interval: 15s

rule_files:
  - 'rules/common.rules'
  - 'rules/prod/*.rules'  # Wildcard pattern matching
  
scrape_configs:
  - job_name: 'prometheus'
    file_sd_configs:
      - files:
        - 'targets/prometheus*.yaml'  # File-based service discovery
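
A glob that matches nothing is easy to miss after files are moved or renamed. The following is a minimal Python sketch, not part of the original setup, that expands rule_files-style patterns relative to the config directory and reports the ones that match no file. The function names are hypothetical, and Python's glob module only approximates the Go globbing Prometheus uses, which is adequate for simple `*` patterns like the ones above.

```python
import glob
import os


def expand_patterns(config_dir, patterns):
    """Map each pattern to the sorted files it matches under config_dir."""
    matches = {}
    for pattern in patterns:
        full = os.path.join(config_dir, pattern)
        matches[pattern] = sorted(
            os.path.relpath(path, config_dir) for path in glob.glob(full)
        )
    return matches


def empty_patterns(config_dir, patterns):
    """Return the patterns that match no files, which usually signals a typo."""
    expanded = expand_patterns(config_dir, patterns)
    return [pattern for pattern, files in expanded.items() if not files]
```

Running `empty_patterns("config", ["rules/common.rules", "rules/prod/*.rules"])` in CI and failing on a non-empty result is one way to catch a renamed directory before deployment.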

Here's how we structured our 1400-line config into modular components:

config/
├── prometheus.yml              # Main config
├── envs/
│   ├── dev/                    # Dev environment
│   │   ├── scrape_configs.yml
│   │   └── rules.yml
│   └── prod/                   # Prod environment
│       ├── scrape_configs.yml
│       └── rules.yml
└── global/
    ├── alerts/                 # Alert rules
    │   ├── node.rules
    │   └── k8s.rules
    └── scrape/                 # Common scrape configs
        ├── blackbox.yml
        └── exporters.yml

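For maximum flexibility, we keep targets in separate files and load them through file-based service discovery. Prometheus watches these files and picks up changes without a reload: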

# scrape_configs/dev/node.yml
- targets:
  - dev-app-01:9100
  - dev-app-02:9100
  labels:
    env: dev
    role: application

# scrape_configs/prod/k8s.yml  
- targets:
  - 10.0.1.5:10250
  - 10.0.1.6:10250
  labels:
    env: prod
    cluster: primary
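
Target files like these are often generated from an inventory rather than edited by hand. The sketch below, in Python, builds one target group and writes it as JSON, which file-based discovery accepts alongside YAML as long as the file name ends in .json, .yml, or .yaml and matches the pattern listed under file_sd_configs. It writes to a temporary file and renames it into place so Prometheus never reads a half-written file. The function names and the inventory are illustrative assumptions.

```python
import json
import os
import tempfile


def build_target_groups(hosts, port, labels):
    """Return a file_sd-style list containing one target group."""
    targets = [f"{host}:{port}" for host in sorted(hosts)]
    return [{"targets": targets, "labels": dict(labels)}]


def write_targets(path, groups):
    """Write target groups as JSON, replacing the file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over the
    # destination so a reader sees either the old or the new content.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(groups, handle, indent=2)
            handle.write("\n")
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


# Hypothetical inventory mirroring the dev example above.
dev_groups = build_target_groups(
    ["dev-app-02", "dev-app-01"], 9100, {"env": "dev", "role": "application"}
)
```

The ".tmp" suffix keeps the temporary file from matching a `*.json` discovery pattern while it is being written.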

To make this work in production:

  1. Use configuration management tools (Ansible, Chef) to template environment-specific files
  2. Add a CI/CD step that validates every change with promtool check config prometheus.yml
  3. Reload without restarting: start Prometheus with --web.enable-lifecycle, then run curl -X POST http://prometheus:9090/-/reload (sending SIGHUP to the process also works)
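
Steps 2 and 3 can be chained so that a reload is only requested once validation has passed. This is a rough Python sketch under stated assumptions: promtool is on the PATH, Prometheus was started with --web.enable-lifecycle, and the function names are hypothetical. The `run` and `opener` parameters exist only so the logic can be exercised without a live server.

```python
import subprocess
import urllib.request


def check_config(config_path, run=subprocess.run):
    """Return True if `promtool check config` accepts the file."""
    result = run(["promtool", "check", "config", config_path])
    return result.returncode == 0


def reload_prometheus(base_url, opener=urllib.request.urlopen):
    """POST to the lifecycle reload endpoint and return the HTTP status."""
    request = urllib.request.Request(f"{base_url}/-/reload", method="POST")
    with opener(request, timeout=10) as response:
        return response.status


def validate_then_reload(config_path, base_url, run=subprocess.run,
                         opener=urllib.request.urlopen):
    """Reload only when validation passes, so a bad config is never applied."""
    if not check_config(config_path, run=run):
        return False
    return reload_prometheus(base_url, opener=opener) == 200
```

A deploy job could call `validate_then_reload("prometheus.yml", "http://prometheus:9090")` and fail the pipeline when it returns False.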

For complex environments, consider using Jsonnet:

// environments.libsonnet
{
  dev:: {
    scrapeInterval: '30s',
    targets: ['dev-node-01:9100', 'dev-node-02:9100'],
  },
  prod:: {
    scrapeInterval: '15s',
    targets: ['prod-node-01:9100', 'prod-node-02:9100'],
  },
}

// prometheus.jsonnet
local env = import 'environments.libsonnet';
{
  global: {
    scrape_interval: env.prod.scrapeInterval,
  },
  scrape_configs: [
    {
      job_name: 'node',
      static_configs: [
        {
          targets: env.prod.targets,
          labels: { env: 'prod' },
        },
      ],
    },
  ],
}
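
The Jsonnet above renders only the prod environment. For comparison, here is the same generation idea sketched in plain Python rather than Jsonnet, producing one document per environment and emitting JSON as the Jsonnet version does. The dictionary layout and function names are assumptions made for illustration.

```python
import json

# Hypothetical per-environment settings, mirroring environments.libsonnet.
ENVIRONMENTS = {
    "dev": {"scrape_interval": "30s",
            "targets": ["dev-node-01:9100", "dev-node-02:9100"]},
    "prod": {"scrape_interval": "15s",
             "targets": ["prod-node-01:9100", "prod-node-02:9100"]},
}


def render_config(env_name):
    """Build the Prometheus config for one environment as a dict."""
    env = ENVIRONMENTS[env_name]
    return {
        "global": {"scrape_interval": env["scrape_interval"]},
        "scrape_configs": [{
            "job_name": "node",
            "static_configs": [{
                "targets": env["targets"],
                "labels": {"env": env_name},
            }],
        }],
    }


def render_all():
    """Return {env_name: JSON text} for every known environment."""
    return {name: json.dumps(render_config(name), indent=2)
            for name in ENVIRONMENTS}
```

Whichever generator is used, the rendered output should still go through promtool check config before it is deployed.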

The built-in mechanisms above do not split everything: global settings and scrape job definitions still live in prometheus.yml, which remains hard to audit, prone to merge conflicts with multiple contributors, and liable to mix environment-specific settings. Prometheus has no general-purpose include directive (releases from 2.43 onward do accept scrape_config_files for scrape jobs), so the remaining approaches assemble the file outside the server:

1. File Concatenation with --config.file

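Use a shell script to combine partial files. Plain concatenation only works if the fragments join into one valid YAML document, so each top-level key (global, scrape_configs, rule_files, and so on) must appear in exactly one fragment. Note that dev-alerts.yml here has to hold a main-config section such as alerting or rule_files; alert rule groups themselves belong in separate rule files.

# build-config.sh
set -eu
cat base.yml dev-scrape.yml dev-alerts.yml > prometheus.yml
promtool check config prometheus.yml   # fail the build if the result is invalid

# Then run:
# prometheus --config.file=./prometheus.yml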
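
Concatenation silently produces a broken file when two fragments define the same top-level key. The sketch below is one hedged way to guard against that in Python before writing the result. It uses a line-based heuristic (a key starting at column 0) instead of a real YAML parser, so it is an assumption-laden check, not a validator, and the function names are hypothetical.

```python
import re

# A top-level mapping key: an identifier at column 0 followed by a colon.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):")


def top_level_keys(text):
    """Return top-level YAML mapping keys found at column 0, in order."""
    keys = []
    for line in text.splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            keys.append(match.group(1))
    return keys


def concatenate_fragments(fragments):
    """Join (name, text) fragments, rejecting repeated top-level keys."""
    seen = {}
    parts = []
    for name, text in fragments:
        for key in top_level_keys(text):
            if key in seen:
                raise ValueError(
                    f"top-level key {key!r} appears in both "
                    f"{seen[key]} and {name}"
                )
            seen[key] = name
        parts.append(text.rstrip("\n") + "\n")
    return "\n".join(parts)
```

The merged text should still be checked with promtool check config, since this heuristic does not catch indentation or schema errors.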

2. Symlink Strategy

Create environment-specific directories and symlink the active config:

prometheus-config/
├── base/
│   ├── alert_rules.yml
│   └── scrape_configs.yml
├── dev/
│   └── env_specific.yml
├── prod/
│   └── env_specific.yml
└── prometheus.yml -> dev/env_specific.yml
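
Switching environments then means repointing the link. Deleting and recreating it leaves a brief window with no config file, so this Python sketch creates a temporary link and renames it over the old one, which is a single step on POSIX systems. The function name and the ".tmp" suffix are assumptions for illustration.

```python
import os


def activate_config(link_path, target):
    """Point link_path at target, replacing any existing link atomically."""
    tmp_link = f"{link_path}.tmp"
    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(target, tmp_link)
    # rename(2) replaces the old link in one step on POSIX systems.
    os.replace(tmp_link, link_path)
```

A relative target is resolved against the directory that holds the link, so the environment directories can be referenced without absolute paths. Prometheus still needs a reload to pick up the newly linked file.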

Here's how we structured our 1400-line config:

# main prometheus.yml
global:
  scrape_interval: 15s

rule_files:
  - 'alerts/*.rules'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    file_sd_configs:
      - files: ['targets/node_targets.yml']

With supporting files:

# alerts/app.rules
groups:
- name: app-alerts
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m > 0.5

# targets/node_targets.yml
- labels:
    env: dev
  targets:
  - dev-node1:9100
  - dev-node2:9100

For cloud-native setups, combine file splitting with service discovery:

scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul:8500'
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: '.*,(dev|prod),.*'  # relabel regexes are fully anchored
        action: keep
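
Prometheus anchors relabel regexes at both ends, so a pattern must match the entire label value, and Consul discovery exposes tags as one string joined by the tag separator with a separator at each end (for example ",web,dev,"). The Python sketch below imitates that behaviour with re.fullmatch to show why a pattern such as ',(dev|prod),' keeps only services whose sole tag is dev or prod. It is an approximation: Prometheus uses Go's RE2 engine, and the helper name is hypothetical.

```python
import re


def keeps_target(pattern, tags):
    """Mimic a `keep` relabel rule applied to __meta_consul_tags."""
    # Consul SD joins tags with "," and wraps the result in separators.
    value = "," + ",".join(tags) + ","
    # fullmatch stands in for Prometheus anchoring the regex at both ends.
    return re.fullmatch(pattern, value) is not None


# A lone tag matches the bare pattern...
only_dev = keeps_target(",(dev|prod),", ["dev"])
# ...but an extra tag makes the anchored match fail.
web_and_dev = keeps_target(",(dev|prod),", ["web", "dev"])
# Wrapping the pattern in .* matches the tag anywhere in the list.
web_and_dev_wrapped = keeps_target(".*,(dev|prod),.*", ["web", "dev"])
```

The surrounding commas in the pattern still matter: they stop a tag such as "develop" from matching "dev".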

Tools that support this workflow:

  • promtool: Validate the assembled configuration, including the rule files it references, with promtool check config prometheus.yml
  • Jsonnet: Generate configurations programmatically for complex setups
  • Git submodules: Share common configuration across multiple repositories