OpenTSDB vs Graphite: Key Technical Differences for Time Series Data Storage and Performance


OpenTSDB uses HBase as its backend storage, allowing theoretically unlimited retention of high-precision metrics. Here's a typical opentsdb.conf storage configuration:

tsd.core.auto_create_metrics = true
tsd.storage.hbase.data_table = tsdb
tsd.storage.hbase.uid_table = tsdb-uid

Graphite stores data in fixed-size Whisper files, so every metric needs a predefined retention policy. A sample retention schema in storage-schemas.conf:

[servers]
pattern = ^servers\.
retentions = 10s:6h,1m:7d,10m:5y
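
Those three archives translate into a fixed, predictable file size per metric. A rough sizing sketch in Python (assuming Whisper's roughly 12 bytes per stored point and ignoring the small file header):

# Approximate Whisper file size for retentions = 10s:6h,1m:7d,10m:5y
# (assumes ~12 bytes per point; real files add a small header)
archives = [
    (10, 6 * 3600),          # 10s resolution kept for 6 hours
    (60, 7 * 86400),         # 1m resolution kept for 7 days
    (600, 5 * 365 * 86400),  # 10m resolution kept for 5 years
]
points = sum(period // step for step, period in archives)
print(f"{points} points, ~{points * 12 / 2**20:.1f} MiB per metric")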

OpenTSDB supports sub-second metric collection natively, with millisecond precision expressed as 13-digit epoch timestamps. A sample put command for a point 500 ms into the second:

put sys.cpu.user 1356998400500 42 host=webserver01 cpu=0
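
If you want to script ingestion, here is a minimal sketch that pushes one such point over the telnet-style interface (assuming a TSD listening on localhost:4242):

import socket

# One datapoint with a 13-digit (millisecond) epoch timestamp
line = "put sys.cpu.user 1356998400500 42 host=webserver01 cpu=0\n"
with socket.create_connection(("localhost", 4242)) as sock:
    sock.sendall(line.encode("ascii"))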

Graphite's minimum interval is typically 1 second, though some forks have added sub-second support. The interval itself is set per metric in storage-schemas.conf; the carbon.conf [cache] section merely throttles how fast Whisper files are written and created:

[cache]
MAX_UPDATES_PER_SECOND = 1000
MAX_CREATES_PER_MINUTE = 1000

OpenTSDB excels at high-cardinality metrics. An example query that can fan out across hundreds of thousands of unique time series (in OpenTSDB 2.2+, tag values containing * are treated as wildcard filters):

{
  "start": "1h-ago",
  "queries": [
    {
      "metric": "app.requests",
      "aggregator": "sum",
      "tags": {
        "host": "*",
        "region": "us-west-*"
      }
    }
  ]
}
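
The same query can be scripted against the HTTP API using the explicit filters syntax documented for OpenTSDB 2.2+ (a sketch; assumes a TSD at localhost:4242):

import json
import urllib.request

query = {
    "start": "1h-ago",
    "queries": [{
        "metric": "app.requests",
        "aggregator": "sum",
        "filters": [
            {"type": "wildcard", "tagk": "region",
             "filter": "us-west-*", "groupBy": True},
        ],
    }],
}
req = urllib.request.Request(
    "http://localhost:4242/api/query",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))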

Graphite's query interface, the Render API, expresses rollups concisely through chained functions:

/render?target=summarize(app.requests.count,"1hour","sum")&from=-7d
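
The Render API also returns machine-readable output with format=json, which makes it easy to consume from scripts (a sketch; the Graphite host name here is a placeholder):

import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "target": 'summarize(app.requests.count,"1hour","sum")',
    "from": "-7d",
    "format": "json",  # return JSON instead of a rendered PNG graph
})
with urllib.request.urlopen(f"http://graphite.example.com/render?{params}") as resp:
    print(json.loads(resp.read()))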

Modern alternatives worth considering:

# Prometheus config example
scrape_configs:
  - job_name: 'node'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9100']

InfluxDB example (an influx CLI write plus an InfluxQL query; TSM is its underlying storage engine):

INSERT cpu,host=server01 value=0.64
SELECT mean("value") FROM "cpu" WHERE time > now() - 1h GROUP BY time(5m)
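
The same write/read pair is scriptable with the influxdb-python client (a sketch for InfluxDB 1.x; the host, port, and "metrics" database name are assumptions):

from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="metrics")
# Write one point, then read it back aggregated into 5m buckets
client.write_points([{
    "measurement": "cpu",
    "tags": {"host": "server01"},
    "fields": {"value": 0.64},
}])
result = client.query(
    'SELECT mean("value") FROM "cpu" WHERE time > now() - 1h GROUP BY time(5m)'
)
print(list(result.get_points()))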

Operationally, OpenTSDB requires HBase expertise for cluster management, while Graphite scales out with carbon-relay, which supports consistent hashing for sharding metrics across caches:

[relay]
RELAY_METHOD = consistent-hashing
DESTINATIONS = 127.0.0.1:2004:a, 127.0.0.1:2104:b
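
Conceptually, consistent hashing places each destination at many points on a hash ring and routes a metric to the nearest point clockwise, so adding or removing a destination only remaps a fraction of metrics. A simplified illustration (not carbon's actual implementation, which has its own ring code):

import bisect
import hashlib

# Toy consistent-hash ring in the spirit of carbon-relay's routing
class Ring:
    def __init__(self, nodes, replicas=100):
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes for i in range(replicas)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, metric):
        idx = bisect.bisect(self.keys, self._hash(metric)) % len(self.keys)
        return self.ring[idx][1]

ring = Ring(["127.0.0.1:2004:a", "127.0.0.1:2104:b"])
print(ring.get_node("servers.web01.cpu.user"))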

For deployments ingesting under roughly a million datapoints per day, Graphite is often the simpler choice. Beyond that scale, OpenTSDB's distributed architecture shines.


Returning to the two main contenders, the most fundamental architectural difference lies in how they handle data retention:


# Graphite's storage-schemas.conf example
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

# OpenTSDB keeps full-resolution data in HBase; only table names are configured
tsd.storage.hbase.data_table = tsdb
tsd.storage.hbase.meta_table = tsdb-meta

Graphite employs fixed-size Whisper databases that require predefined retention policies, while OpenTSDB leverages HBase's scalable storage with no built-in aging or downsampling of stored data. This means:

  • Graphite: Storage size predictable but requires upfront configuration
  • OpenTSDB: Storage scales dynamically with metrics volume

Regarding time resolution capabilities:


# Graphite: resolution is set per metric in storage-schemas.conf;
# carbon.conf's [cache] settings only throttle Whisper writes/creates
MAX_CREATES_PER_MINUTE = 10000

# OpenTSDB: millisecond timestamps are supported natively
tsd.core.auto_create_metrics = true
tsd.storage.fix_duplicates = true

While Graphite can technically handle 1s resolution, practical deployments often use 10s-60s intervals for performance reasons. OpenTSDB handles millisecond precision out of the box.
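
Millisecond writes also work over HTTP through OpenTSDB's /api/put endpoint, again using 13-digit epoch timestamps (a sketch; assumes a TSD at localhost:4242):

import json
import urllib.request

point = {
    "metric": "sys.cpu.user",
    "timestamp": 1356998400500,  # 13 digits = millisecond precision
    "value": 42,
    "tags": {"host": "webserver01", "cpu": "0"},
}
req = urllib.request.Request(
    "http://localhost:4242/api/put",
    data=json.dumps(point).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)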

The query paradigms differ significantly:


# Graphite's function chaining
aliasByNode(summarize(production.server*.cpu.load, "1h", "avg"), 2)

# OpenTSDB's metric+tag queries
{
  "start": 1672531200,
  "queries": [{
    "metric": "sys.cpu.nice",
    "tags": {
      "host": "web*",
      "dc": "lga"
    },
    "aggregator": "avg",
    "downsample": "1h-avg"
  }]
}

The two systems also scale out in different ways (a toy model of the aggregation rule follows this list):

  • Graphite: Relies on Carbon relay + aggregator patterns
    # aggregation-rules.conf: emit one averaged series every 60 seconds
    cpu.core.all (60) = avg cpu.core.*
    
  • OpenTSDB: Leverages HBase regions for horizontal scaling
    tsd.storage.enable_compactions = true
    tsd.storage.flush_interval = 1000
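
As promised above, here is a toy model of what that aggregation rule does: buffer incoming per-core values, then emit a single averaged cpu.core.all point per interval (illustrative only; carbon-aggregator drives this from aggregation-rules.conf):

from collections import defaultdict

buffers = defaultdict(list)

def receive(metric, value):
    # Anything matching cpu.core.* feeds the aggregate series
    if metric.startswith("cpu.core."):
        buffers["cpu.core.all"].append(value)

def flush():
    # Called once per aggregation interval (60s in the rule above)
    for out_metric, values in buffers.items():
        if values:
            print(out_metric, sum(values) / len(values))
    buffers.clear()

receive("cpu.core.0", 0.40)
receive("cpu.core.1", 0.60)
flush()  # -> cpu.core.all 0.5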
    

For modern deployments, the alternatives mentioned earlier deserve a closer look:


# Prometheus configuration example
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

# InfluxDB line protocol example
cpu,host=server01 value=0.64 1434055562000000000

Other notable systems include VictoriaMetrics, TimescaleDB, and M3DB, each with distinct tradeoffs in consistency models and query capabilities.