When working with Graphite, many developers encounter a perplexing situation: metric names appear in the tree and whisper files get created, yet every datapoint reads "None". A typical reproduction looks like this:
# Sending test data
echo "app.metric 42 $(date +%s)" | nc localhost 2003
# Checking results shows all null values
whisper-fetch.py /opt/graphite/storage/whisper/app/metric.wsp | grep -v None | wc -l
# Output: 0
The issue stems from Graphite's multi-stage data processing:
- Carbon receiver accepts incoming metrics
- Data gets written to Whisper database files
- Aggregation occurs according to retention schemas
- Graphite-web serves aggregated data
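A quick way to find where this pipeline breaks is to check each stage in order. A minimal sketch, assuming a default /opt/graphite install with a single carbon-cache instance named "a" (adjust paths and ports to your setup):
# Stage 1: is carbon listening on the plaintext receiver port?
ss -tlnp | grep 2003
# Stage 2: did carbon create the whisper file? creates.log records every new file
tail /opt/graphite/storage/log/carbon-cache/carbon-cache-a/creates.log
# Stage 3: does the file hold any non-None points?
whisper-fetch.py /opt/graphite/storage/whisper/app/metric.wsp | grep -v None
# Stage 4: does graphite-web return data?
curl -s "http://localhost/render?target=app.metric&from=-30min&format=json"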
The root cause often lies in storage-schemas.conf misconfiguration. Consider this problematic example:
[default]
pattern = .*
retentions = 1s:30m,1m:1d,5m:2y
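To see what this schema actually allocates, count the slots per archive; whisper preallocates every slot at roughly 12 bytes per point:
# 1s:30m -> 1800 points, 1m:1d -> 1440 points, 5m:2y -> 210240 points
echo $(( (1800 + 1440 + 210240) * 12 ))
# => 2561760 bytes, about 2.4 MB per metric before headers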
This configuration creates three challenges:
- Precision mismatch: 1-second resolution expects a datapoint every second; anything slower leaves most slots empty
- Aggregation threshold: the default xFilesFactor=0.5 means at least 50% of the slots in a rollup window must hold values, or the aggregated point is stored as None
- Query windowing: the UI defaults to a 24h view, which falls back to the 1m archive because the 1s archive only spans 30 minutes
Here's a working configuration for typical application monitoring. Note that xFilesFactor is not a valid option in storage-schemas.conf; it belongs in storage-aggregation.conf:
# storage-schemas.conf
[app_metrics]
pattern = ^app\.
retentions = 10s:6h,1m:7d,10m:5y
[system]
pattern = ^system\.
retentions = 15s:24h,1m:14d,15m:5y
# storage-aggregation.conf
[app_metrics]
pattern = ^app\.
xFilesFactor = 0.1
aggregationMethod = average
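One gotcha: both files are consulted only when a whisper file is first created. Existing files keep their old retentions, so either delete them and let carbon recreate them, or rewrite them in place:
# Rewrite an existing file to the new retentions (preserves data where possible)
whisper-resize.py /opt/graphite/storage/whisper/app/metric.wsp 10s:6h 1m:7d 10m:5y --xFilesFactor=0.1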
When troubleshooting, use these commands:
# Verify carbon is receiving data
ngrep -d any port 2003
# Check whisper file structure
whisper-info.py /path/to/metric.wsp
# Fetch recent data (a short window reads from the highest-precision archive)
whisper-fetch.py --from=$(date -d "-5 min" +%s) /path/to/metric.wsp
For high-volume monitoring, consider this optimized setup. Order matters: carbon uses the first section whose pattern matches, so the catch-all must come last (and again, xFilesFactor lives in storage-aggregation.conf):
# storage-schemas.conf
[carbon]
pattern = ^carbon\.
retentions = 10s:6h,1m:7d,10m:5y
[detailed_metrics]
pattern = ^(app|service)\..*\.(latency|errors)$
retentions = 5s:1h,30s:24h,5m:7d,1h:5y
[default_1min]
pattern = .*
retentions = 1m:7d,10m:5y
# storage-aggregation.conf
[detailed_metrics]
pattern = ^(app|service)\..*\.(latency|errors)$
xFilesFactor = 0.1
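You can sanity-check which section will win by running the same regex through grep:
# Matches [detailed_metrics]
echo "app.api.latency" | grep -E '^(app|service)\..*\.(latency|errors)$'
# No output: falls through to [default_1min]
echo "app.api.requests" | grep -E '^(app|service)\..*\.(latency|errors)$'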
Whichever schema you choose, the usual causes of None values are:
- Sparse metrics getting dropped by the default xFilesFactor=0.5
- Query time ranges exceeding lowest retention period
- Clock skew between metric timestamps and server time
- Whisper file permission issues preventing writes
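The last two are easy to rule out from the shell. A quick sketch; the daemon user name varies by install, so "carbon" here is an assumption:
# Compare the timestamps you send against the server clock
date +%s
# Check ownership and whether the carbon user can write the file
ls -l /opt/graphite/storage/whisper/app/metric.wsp
sudo -u carbon test -w /opt/graphite/storage/whisper/app/metric.wsp && echo writable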
A concrete case ties these factors together. When sending metrics to Graphite via Carbon (port 2003), the whisper files are created but all data points show as "None". This occurs even when:
- Using the built-in example-client.py
- Sending manual metrics via nc
- Carbon's own internal metrics are working
Running whisper-fetch reveals null values throughout:
whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | head -n1
Sun May 4 12:19:00 2014 None
The storage-schemas.conf contains:
[default]
pattern = .*
retentions = 1s:30m,1m:1d,5m:2y
The issue stems from two critical Graphite behaviors:
Aggregation Thresholds
Graphite's default xFilesFactor=0.5 requires at least 50% of data points in an aggregation window to contain values. With 1s precision aggregating to 1m:
- 60 possible data points per minute
- Need ≥30 values to pass aggregation
- A typical application sends data every 10s, filling only 6 of the 60 slots
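Whisper's rollup test is simply known slots divided by total slots, compared against xFilesFactor. A one-liner shows why a 10s sender fails the default:
awk 'BEGIN { known=6; total=60; xff=0.5; print ((known/total >= xff) ? "aggregate" : "store None") }'
# Prints "store None": 6/60 = 0.1, which is below 0.5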
Retention Period Visibility
Graphite's UI and whisper-fetch answer a query from the highest-precision archive whose retention covers the entire requested time range:
1s:30m retention means:
- Raw 1s data only visible for queries ≤30 minutes
- Default 24h view forces aggregation to 1m precision
Option 1: Adjust Retention Schema
[custom_apps]
pattern = ^jakub\.
retentions = 10s:6h,1m:7d,10m:5y
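As noted earlier, the new schema only applies to whisper files created after the change, so remove the old test file and let carbon recreate it on the next datapoint:
# Schema changes don't touch existing files; force recreation
rm /opt/graphite/storage/whisper/jakub/test.wsp
echo "jakub.test 1 $(date +%s)" | nc localhost 2003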
Option 2: Modify Aggregation Rules
In storage-aggregation.conf:
[jakub]
pattern = ^jakub\.
xFilesFactor = 0.1
aggregationMethod = average
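Again, storage-aggregation.conf is read only at file-creation time. For files that already exist, newer Whisper releases ship a whisper-set-xfilesfactor.py script (if yours doesn't have it, whisper-resize.py with --xFilesFactor works too):
# Update xFilesFactor on an existing file, then verify
whisper-set-xfilesfactor.py /opt/graphite/storage/whisper/jakub/test.wsp 0.1
whisper-info.py /opt/graphite/storage/whisper/jakub/test.wsp | grep xFilesFactor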
Option 3: Query with Proper Timeframes
When using whisper-fetch or the API:
# View raw 1s data
whisper-fetch.py --from=$(date -d "30 min ago" +%s) /path/to/metric.wsp
# Render API example
http://graphite/render?target=jakub.test&from=-30min&format=json
After making changes:
- Restart carbon-cache
- Send test data:
echo "verify.metric 42 $(date +%s)" | nc localhost 2003
- Check immediate results:
# whisper-fetch expects Unix timestamps, not relative offsets
whisper-fetch.py --from=$(date -d "-5 min" +%s) /opt/graphite/storage/whisper/verify/metric.wsp
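A single test point can never pass a 0.5 threshold on its own, so generate a steady stream instead. A minimal loop, assuming a netcat variant that supports -q:
# Send one point every 10s so aggregation windows actually fill up
while true; do
  echo "verify.metric $RANDOM $(date +%s)" | nc -q0 localhost 2003
  sleep 10
done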
For high-precision monitoring:
- Balance retention periods with storage requirements
- Consider carbon-relay for horizontal scaling
- Monitor carbon queue sizes
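Carbon publishes its own counters under the carbon.agents namespace, so the queue and cache sizes from the last bullet can be watched through the render API:
# Watch carbon's cache size and incoming rate over the last hour
curl -s "http://localhost/render?target=carbon.agents.*.cache.size&target=carbon.agents.*.metricsReceived&from=-1h&format=json"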
Remember that each coarser precision must be a whole multiple of the finer one before it (10s→1m→10m is valid; something like 10s→15s would be rejected, since 15 is not a multiple of 10).