When your Prometheus configuration exceeds 1,000 lines, you'll quickly encounter several pain points:
- Merge conflicts in version control
- Difficulty locating specific job configurations
- Environment-specific settings mixed together
- Reduced readability and maintainability
While Prometheus doesn't support traditional "include" directives, it offers two powerful alternatives: wildcard patterns in `rule_files` and file-based service discovery:
```yaml
# Main prometheus.yml
global:
  scrape_interval: 15s

rule_files:
  - 'rules/common.rules'
  - 'rules/prod/*.rules'        # Wildcard pattern matching

scrape_configs:
  - job_name: 'prometheus'
    file_sd_configs:
      - files:
          - 'targets/prometheus*.yaml'   # File-based service discovery
```
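Because the discovered targets live in plain files that Prometheus watches and re-reads on change, any script or tool can maintain them without touching the main config. A minimal sketch (the file name and hosts are illustrative):

```shell
#!/bin/sh
# Write a file_sd target file; Prometheus picks up changes to files
# matched by file_sd_configs without a restart.
mkdir -p targets
cat > targets/prometheus-dev.yaml <<'EOF'
- targets:
    - dev-app-01:9100
    - dev-app-02:9100
  labels:
    env: dev
EOF
echo "wrote targets/prometheus-dev.yaml"
```

Anything that can write a file — cron, a CMDB export, a deploy hook — can now manage scrape targets.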
Here's how we structured our 1400-line config into modular components:
```
config/
├── prometheus.yml              # Main config
├── envs/
│   ├── dev/                    # Dev environment
│   │   ├── scrape_configs.yml
│   │   └── rules.yml
│   └── prod/                   # Prod environment
│       ├── scrape_configs.yml
│       └── rules.yml
└── global/
    ├── alerts/                 # Alert rules
    │   ├── node.rules
    │   └── k8s.rules
    └── scrape/                 # Common scrape configs
        ├── blackbox.yml
        └── exporters.yml
```
For maximum flexibility, we leverage file-based service discovery:
```yaml
# scrape_configs/dev/node.yml
- targets:
    - dev-app-01:9100
    - dev-app-02:9100
  labels:
    env: dev
    role: application
```

```yaml
# scrape_configs/prod/k8s.yml
- targets:
    - 10.0.1.5:10250
    - 10.0.1.6:10250
  labels:
    env: prod
    cluster: primary
```
To make this work in production:
- Use configuration management tools (Ansible/Chef) to template environment-specific files
- Implement CI/CD pipelines that validate configs with `promtool check config prometheus.yml`
- Trigger hot reloads via the lifecycle endpoint (requires starting Prometheus with `--web.enable-lifecycle`): `curl -X POST http://prometheus:9090/-/reload`
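Put together, a CI validation step might look like the sketch below (the `promtool` call is guarded so the script degrades gracefully where the binary isn't installed; the config content is a stand-in):

```shell
#!/bin/sh
# Hypothetical CI gate: assemble a config, validate it, then reload.
set -e
# Stand-in config; in a real pipeline this is the assembled prometheus.yml.
printf 'global:\n  scrape_interval: 15s\n' > prometheus.yml
if command -v promtool >/dev/null 2>&1; then
  promtool check config prometheus.yml
else
  echo "promtool not installed; skipping validation"
fi
# On success, trigger a hot reload (needs --web.enable-lifecycle):
#   curl -X POST http://prometheus:9090/-/reload
```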
For complex environments, consider using Jsonnet:
```jsonnet
// environments.libsonnet
{
  dev:: {
    scrapeInterval: '30s',
    targets: ['dev-node-01:9100', 'dev-node-02:9100'],
  },
  prod:: {
    scrapeInterval: '15s',
    targets: ['prod-node-01:9100', 'prod-node-02:9100'],
  },
}
```
```jsonnet
// prometheus.jsonnet
local env = import 'environments.libsonnet';
{
  global: {
    scrape_interval: env.prod.scrapeInterval,
  },
  scrape_configs: [
    {
      job_name: 'node',
      static_configs: [
        {
          targets: env.prod.targets,
          labels: { env: 'prod' },
        },
      ],
    },
  ],
}
```

Since JSON is valid YAML, the output of `jsonnet prometheus.jsonnet` can be loaded by Prometheus directly.
Because Prometheus doesn't natively support file includes, two simpler approaches are also worth knowing:
1. File Concatenation with `--config.file`

Use shell scripts to combine partial files:

```shell
# build-config.sh
cat base.yml dev-scrape.yml dev-alerts.yml > prometheus.yml
# Then run:
# prometheus --config.file=./prometheus.yml
```
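The same idea extends to multiple environments; a parameterized sketch (the fragment names and contents are illustrative stand-ins):

```shell
#!/bin/sh
# build-config.sh <env> -- concatenate base + per-environment fragments.
set -e
ENV="${1:-dev}"
mkdir -p fragments
# Stand-in fragments; in practice these live in version control.
printf 'global:\n  scrape_interval: 15s\n' > fragments/base.yml
printf 'scrape_configs: []\n' > "fragments/${ENV}-scrape.yml"
cat fragments/base.yml "fragments/${ENV}-scrape.yml" > "prometheus-${ENV}.yml"
echo "built prometheus-${ENV}.yml"
```

Note that the fragments must have disjoint top-level keys, since `cat` does a plain concatenation rather than a YAML merge.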
2. Symlink Strategy

Create environment-specific directories and symlink the active config:

```
prometheus-config/
├── base/
│   ├── alert_rules.yml
│   └── scrape_configs.yml
├── dev/
│   └── env_specific.yml
├── prod/
│   └── env_specific.yml
└── prometheus.yml -> dev/env_specific.yml
```
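Switching environments then comes down to flipping one symlink and reloading; a runnable sketch with stand-in config files:

```shell
#!/bin/sh
# Demonstrate the symlink flip (file contents are placeholders).
set -e
mkdir -p prometheus-config/dev prometheus-config/prod
echo 'env: dev'  > prometheus-config/dev/env_specific.yml
echo 'env: prod' > prometheus-config/prod/env_specific.yml
# -sfn replaces an existing link in place.
ln -sfn dev/env_specific.yml  prometheus-config/prometheus.yml  # activate dev
ln -sfn prod/env_specific.yml prometheus-config/prometheus.yml  # flip to prod
cat prometheus-config/prometheus.yml   # prints: env: prod
```

After the flip, a reload (or restart) makes Prometheus pick up the new environment.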
With this layout, the main config file itself stays minimal:

```yaml
# main prometheus.yml
global:
  scrape_interval: 15s

rule_files:
  - 'alerts/*.rules'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    file_sd_configs:
      - files: ['targets/node_targets.yml']
```
With supporting files:

```yaml
# alerts/app.rules
groups:
  - name: app-alerts
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m > 0.5
```

```yaml
# targets/node_targets.yml
- labels:
    env: dev
  targets:
    - dev-node1:9100
    - dev-node2:9100
```
For cloud-native setups, combine file splitting with service discovery:

```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul:8500'
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: '.*,(dev|prod),.*'   # relabel regexes are fully anchored
        action: keep
```
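One subtlety here: Prometheus anchors relabel regexes to the full string, and `__meta_consul_tags` is the tag list joined with commas (e.g. `,web,prod,`), so a bare `,(dev|prod),` pattern would only match a target with exactly one tag. A quick illustration using `grep -E` with explicit anchors to emulate the full-match behavior:

```shell
#!/bin/sh
# Emulate Prometheus's fully anchored matching with ^...$ in grep -E.
tags=",web,prod,canary,"
if echo "$tags" | grep -Eq '^,(dev|prod),$'; then
  echo "bare: keep"
else
  echo "bare: drop"       # no match: the string has more than one tag
fi
if echo "$tags" | grep -Eq '^.*,(dev|prod),.*$'; then
  echo "wrapped: keep"    # matches: ',prod,' appears inside the string
else
  echo "wrapped: drop"
fi
```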
Finally, a few tools that help keep split configurations manageable:
- promtool: validate split configurations with `promtool check config prometheus.yml`
- Jsonnet: generate configurations programmatically for complex setups
- Git submodules: share common configuration across multiple repositories