When working with Graphite, many developers encounter a perplexing situation: metric names appear in the tree and whisper files get created, yet every datapoint reads "None". A typical reproduction looks like this:
# Sending test data
echo "app.metric 42 $(date +%s)" | nc localhost 2003
# Checking results shows all null values
whisper-fetch.py /opt/graphite/storage/whisper/app/metric.wsp | grep -v None | wc -l
# Output: 0
The issue stems from Graphite's multi-stage data processing:
- Carbon receiver accepts incoming metrics
- Data gets written to Whisper database files
- Aggregation occurs according to retention schemas
- Graphite-web serves aggregated data
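A quick way to find where this pipeline breaks is to check each stage in order. A minimal sketch, assuming a default /opt/graphite install with a single carbon-cache instance named "a" (adjust paths and ports to your setup):
# Stage 1: is carbon listening on the plaintext receiver port?
ss -tlnp | grep 2003
# Stage 2: did carbon create the whisper file? creates.log records every new file
tail /opt/graphite/storage/log/carbon-cache/carbon-cache-a/creates.log
# Stage 3: does the file hold any non-None points?
whisper-fetch.py /opt/graphite/storage/whisper/app/metric.wsp | grep -v None
# Stage 4: does graphite-web return data?
curl -s "http://localhost/render?target=app.metric&from=-30min&format=json"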
The root cause often lies in storage-schemas.conf misconfiguration. Consider this problematic example:
[default]
pattern = .*
retentions = 1s:30m,1m:1d,5m:2y
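To see what this schema actually allocates, count the slots per archive; whisper preallocates every slot at roughly 12 bytes per point:
# 1s:30m -> 1800 points, 1m:1d -> 1440 points, 5m:2y -> 210240 points
echo $(( (1800 + 1440 + 210240) * 12 ))
# => 2561760 bytes, about 2.4 MB per metric before headers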
This configuration creates three challenges:
- Precision mismatch: 1-second resolution expects a datapoint every second; anything slower leaves most slots empty
- Aggregation threshold: the default xFilesFactor=0.5 means at least 50% of the slots in a rollup window must hold values, or the aggregated point is stored as None
- Query windowing: the UI defaults to a 24h view, which falls back to the 1m archive because the 1s archive only spans 30 minutes
Here's a working configuration for typical application monitoring. Note that xFilesFactor is not a valid option in storage-schemas.conf; it belongs in storage-aggregation.conf:
# storage-schemas.conf
[app_metrics]
pattern = ^app\.
retentions = 10s:6h,1m:7d,10m:5y
[system]
pattern = ^system\.
retentions = 15s:24h,1m:14d,15m:5y
# storage-aggregation.conf
[app_metrics]
pattern = ^app\.
xFilesFactor = 0.1
aggregationMethod = average
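One gotcha: both files are consulted only when a whisper file is first created. Existing files keep their old retentions, so either delete them and let carbon recreate them, or rewrite them in place:
# Rewrite an existing file to the new retentions (preserves data where possible)
whisper-resize.py /opt/graphite/storage/whisper/app/metric.wsp 10s:6h 1m:7d 10m:5y --xFilesFactor=0.1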
When troubleshooting, use these commands:
# Verify carbon is receiving data
ngrep -d any port 2003
# Check whisper file structure
whisper-info.py /path/to/metric.wsp
# Fetch recent data (a short window reads from the highest-precision archive)
whisper-fetch.py --from=$(date -d "-5 min" +%s) /path/to/metric.wsp
For high-volume monitoring, consider this optimized setup. Order matters: carbon uses the first section whose pattern matches, so the catch-all must come last (and again, xFilesFactor lives in storage-aggregation.conf):
# storage-schemas.conf
[carbon]
pattern = ^carbon\.
retentions = 10s:6h,1m:7d,10m:5y
[detailed_metrics]
pattern = ^(app|service)\..*\.(latency|errors)$
retentions = 5s:1h,30s:24h,5m:7d,1h:5y
[default_1min]
pattern = .*
retentions = 1m:7d,10m:5y
# storage-aggregation.conf
[detailed_metrics]
pattern = ^(app|service)\..*\.(latency|errors)$
xFilesFactor = 0.1
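You can sanity-check which section will win by running the same regex through grep:
# Matches [detailed_metrics]
echo "app.api.latency" | grep -E '^(app|service)\..*\.(latency|errors)$'
# No output: falls through to [default_1min]
echo "app.api.requests" | grep -E '^(app|service)\..*\.(latency|errors)$'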
Whichever schema you choose, the usual causes of None values are:
- Sparse metrics getting dropped by the default xFilesFactor=0.5
- Query time ranges exceeding lowest retention period
- Clock skew between metric timestamps and server time
- Whisper file permission issues preventing writes
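The last two are easy to rule out from the shell. A quick sketch; the daemon user name varies by install, so "carbon" here is an assumption:
# Compare the timestamps you send against the server clock
date +%s
# Check ownership and whether the carbon user can write the file
ls -l /opt/graphite/storage/whisper/app/metric.wsp
sudo -u carbon test -w /opt/graphite/storage/whisper/app/metric.wsp && echo writable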
A concrete case ties these factors together. When sending metrics to Graphite via Carbon (port 2003), the whisper files are created but all data points show as "None". This occurs even when:
- Using the built-in example-client.py
- Sending manual metrics via nc
- Carbon's own internal metrics are working
Running whisper-fetch reveals null values throughout:
whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | head -n1
Sun May 4 12:19:00 2014 None
The storage-schemas.conf contains:
[default]
pattern = .*
retentions = 1s:30m,1m:1d,5m:2y
The issue stems from two critical Graphite behaviors:
Aggregation Thresholds
Graphite's default xFilesFactor=0.5 requires at least 50% of data points in an aggregation window to contain values. With 1s precision aggregating to 1m:
- 60 possible data points per minute
- Need ≥30 values to pass aggregation
- A typical application sends data every 10s, filling only 6 of the 60 slots
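Whisper's rollup test is simply known slots divided by total slots, compared against xFilesFactor. A one-liner shows why a 10s sender fails the default:
awk 'BEGIN { known=6; total=60; xff=0.5; print ((known/total >= xff) ? "aggregate" : "store None") }'
# Prints "store None": 6/60 = 0.1, which is below 0.5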
Retention Period Visibility
Graphite's UI and whisper-fetch answer a query from the highest-precision archive whose retention covers the entire requested time range:
1s:30m retention means:
- Raw 1s data only visible for queries ≤30 minutes
- Default 24h view forces aggregation to 1m precision
Option 1: Adjust Retention Schema
[custom_apps]
pattern = ^jakub\.
retentions = 10s:6h,1m:7d,10m:5y
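As noted earlier, the new schema only applies to whisper files created after the change, so remove the old test file and let carbon recreate it on the next datapoint:
# Schema changes don't touch existing files; force recreation
rm /opt/graphite/storage/whisper/jakub/test.wsp
echo "jakub.test 1 $(date +%s)" | nc localhost 2003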
Option 2: Modify Aggregation Rules
In storage-aggregation.conf:
[jakub]
pattern = ^jakub\.
xFilesFactor = 0.1
aggregationMethod = average
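Again, storage-aggregation.conf is read only at file-creation time. For files that already exist, newer Whisper releases ship a whisper-set-xfilesfactor.py script (if yours doesn't have it, whisper-resize.py with --xFilesFactor works too):
# Update xFilesFactor on an existing file, then verify
whisper-set-xfilesfactor.py /opt/graphite/storage/whisper/jakub/test.wsp 0.1
whisper-info.py /opt/graphite/storage/whisper/jakub/test.wsp | grep xFilesFactor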
Option 3: Query with Proper Timeframes
When using whisper-fetch or the API:
# View raw 1s data
whisper-fetch.py --from=$(date -d "30 min ago" +%s) /path/to/metric.wsp
# Render API example
http://graphite/render?target=jakub.test&from=-30min&format=json
After making changes:
- Restart carbon-cache
- Send test data:
echo "verify.metric 42 $(date +%s)" | nc localhost 2003
- Check immediate results:
# whisper-fetch expects Unix timestamps, not relative offsets
whisper-fetch.py --from=$(date -d "-5 min" +%s) /opt/graphite/storage/whisper/verify/metric.wsp
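A single test point can never pass a 0.5 threshold on its own, so generate a steady stream instead. A minimal loop, assuming a netcat variant that supports -q:
# Send one point every 10s so aggregation windows actually fill up
while true; do
  echo "verify.metric $RANDOM $(date +%s)" | nc -q0 localhost 2003
  sleep 10
done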
For high-precision monitoring:
- Balance retention periods with storage requirements
- Consider carbon-relay for horizontal scaling
- Monitor carbon queue sizes
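Carbon publishes its own counters under the carbon.agents namespace, so the queue and cache sizes from the last bullet can be watched through the render API:
# Watch carbon's cache size and incoming rate over the last hour
curl -s "http://localhost/render?target=carbon.agents.*.cache.size&target=carbon.agents.*.metricsReceived&from=-1h&format=json"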
Remember that each coarser precision must be a whole multiple of the finer one before it (10s→1m→10m is valid; something like 10s→15s would be rejected, since 15 is not a multiple of 10).