Whisper, Graphite's time-series database, uses a fixed-size allocation scheme where each metric consumes predictable disk space. The storage formula is:
total_size = (retention_points * point_size) + (number_of_archives * whisper_header_size)
Where:
retention_points
= sum of (retention_seconds / precision_seconds) for all archivespoint_size
= 12 bytes (default for float64 + timestamp)whisper_header_size
= 16 bytes per archive
For a retention schema with:
retention = [
"10s:6h", # 6 hours at 10s resolution
"1m:7d", # 7 days at 1m resolution
"10m:30d" # 30 days at 10m resolution
]
The calculation would be:
archive1 = (6h * 3600s/h) / 10s = 2,160 points
archive2 = (7d * 86400s/d) / 60s = 10,080 points
archive3 = (30d * 86400s/d) / 600s = 4,320 points
total_points = 2,160 + 10,080 + 4,320 = 16,560
header_size = 3 archives * 16 = 48 bytes
storage = (16,560 * 12) + 48 = ~198.8KB per metric
From our monitoring cluster handling 150,000 metrics:
Retention Policy | Metrics Count | Actual Disk Usage | Per-Metric Average |
---|---|---|---|
10s:1h,1m:7d,10m:365d | 45,000 | 12.7GB | ~295KB |
30s:2h,5m:14d,1h:400d | 105,000 | 19.3GB | ~188KB |
Consider these techniques to reduce footprint:
# Use xFilesFactor to allow sparser storage
retention = [
"10s:6h 0.1", # 10% of slots can be empty
"1m:7d 0.3",
"10m:30d 0.5"
]
# Compress older data with aggregation
aggregationMethod = "average" # instead of "sum" or "max"
Use this Python snippet to audit disk usage:
import os
def get_whisper_size(storage_path):
total = 0
for root, _, files in os.walk(storage_path):
for f in files:
if f.endswith('.wsp'):
total += os.path.getsize(os.path.join(root, f))
return total
print(f"Total storage: {get_whisper_size('/opt/graphite/storage/whisper')/1024/1024:.2f}MB")
When planning disk capacity for Graphite/Whisper, we need to consider three fundamental variables:
retention = "10s:6h,60s:7d,10m:5y" # Example retention schema
xFilesFactor = 0.5 # Percentage of non-null values required
aggregationMethod = "average" # Downsampling method
The storage requirement for a single metric in Whisper can be calculated as:
def calculate_whisper_size(retentions):
total_points = 0
for precision, duration in retentions:
seconds = convert_to_seconds(duration)
total_points += seconds / precision
return total_points * 12 # 12 bytes per datapoint (float64 + timestamp)
From production environments, we observe these typical patterns:
# Small deployment (50 metrics)
50 metrics * 3.5MB = ~175MB disk space
# Medium deployment (10,000 metrics)
10,000 metrics * 3.5MB = ~35GB disk space
# Large deployment (500,000 metrics)
500,000 metrics * 3.5MB = ~1.75TB disk space
For high-density metrics, consider these tuning approaches:
# Example storage-schemas.conf
[application_metrics]
pattern = ^app\.
retentions = 10s:24h,1m:7d,10m:2y
[infrastructure_metrics]
pattern = ^sys\.
retentions = 60s:48h,5m:30d,1h:1y
Use these commands to track real disk consumption:
# Check whisper file sizes
find /opt/graphite/storage/whisper -type f -name "*.wsp" -exec du -sh {} +
# Calculate total storage
du -sh /opt/graphite/storage/whisper
For dynamic environments, implement predictive scaling:
import math
def predict_storage(metrics, retention="1m:7d,10m:1y", growth_rate=0.05, months=12):
points_per_metric = sum(convert_to_seconds(d)/prec for prec,d in parse_retention(retention))
initial = metrics * points_per_metric * 12
return initial * (1 + growth_rate) ** months