Monitoring Block Device Cache Hit/Miss Ratios in Linux: A Performance Analysis Guide


In Linux systems, analyzing cache performance for block devices is crucial for optimizing I/O operations. The kernel provides several mechanisms to measure how effectively the page cache is serving read/write requests.

The most straightforward method involves examining /proc/vmstat:

grep -E '^(pgpgin|pgpgout|pgfault|pgmajfault) ' /proc/vmstat

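Key metrics include:

  • pgpgin / pgpgout: Data read from and written to block devices, in KiB
  • pgfault: Total page faults, minor and major combined
  • pgmajfault: Major page faults, where the page was not in memory and had to be read from disk

All four counters are cumulative since boot, so compare two samples to get a rate. The fault counters only reflect faults on mapped memory; ordinary read() and write() calls that miss the page cache are not counted as faults.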
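Because these counters only grow, a single reading says little. The sketch below shows one way to diff two saved copies of /proc/vmstat. The function name vmstat_delta and the snapshot paths in the usage note are hypothetical, not part of any standard tool.

```shell
# Hypothetical helper: vmstat_delta BEFORE AFTER
# Prints how much each of the four paging counters grew between two
# saved copies of /proc/vmstat.
vmstat_delta() {
    awk '
        $1 ~ /^(pgpgin|pgpgout|pgfault|pgmajfault)$/ {
            if (NR == FNR)
                before[$1] = $2                          # first file: remember the baseline
            else
                printf "%s %d\n", $1, $2 - before[$1]    # second file: print the change
        }' "$1" "$2"
}
```

A possible use: run `cat /proc/vmstat > /tmp/before`, wait for the workload to run, run `cat /proc/vmstat > /tmp/after`, then call `vmstat_delta /tmp/before /tmp/after`. A pgmajfault delta that is large relative to the pgfault delta suggests mapped pages are often being read from disk.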

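The perf tool can also count cache events, but note what it measures:

sudo perf stat -e cache-references,cache-misses -a sleep 5

This provides:

  • cache-references: CPU hardware cache accesses (typically the last-level cache)
  • cache-misses: CPU hardware cache accesses that missed

These are processor cache events, not page cache lookups, so they do not show how often block device reads were served from memory.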

For block device-specific analysis, examine /sys/block/[device]/stat:

cat /sys/block/sda/stat

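Every field counts I/O that actually reached the device, that is, requests the page cache could not absorb. The 1st and 5th fields are completed reads and writes, the 2nd and 6th are merged read and write requests (adjacent requests combined in the queue, not cache hits), the 3rd and 7th are sectors read and written, and the 4th and 8th are the milliseconds spent on reads and writes.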
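The raw stat line is hard to read at a glance. As a minimal sketch, the helper below labels the first eight fields in the order given by the kernel's block stat documentation. The name blockstat_summary and the output labels are made up for this example.

```shell
# Hypothetical helper: reads one line in /sys/block/<dev>/stat format on
# stdin and prints the first eight fields with labels.
blockstat_summary() {
    awk '{
        # Sector counts are in 512-byte units regardless of the device sector size.
        printf "reads=%s read_merges=%s read_mib=%.1f read_ms=%s\n", $1, $2, $3 * 512 / 1048576, $4
        printf "writes=%s write_merges=%s write_mib=%.1f write_ms=%s\n", $5, $6, $7 * 512 / 1048576, $8
    }'
}
```

It can be called as `blockstat_summary < /sys/block/sda/stat`. Since the values are cumulative since boot, two readings taken some seconds apart are needed to see current activity.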

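To measure page cache hits and misses directly, install the BCC tools and run cachestat (some distributions package it as cachestat-bpfcc). The -T flag adds a timestamp column and 1 sets a one-second interval: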

sudo cachestat -T 1

Sample output shows:

  • HITS: Pages found in cache
  • MISSES: Pages not found in cache
  • HITRATIO: Percentage of cache hits

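For developers needing custom metrics, a kernel module can read the kernel's own counters. Note that struct backing_dev_info holds readahead settings (ra_pages, io_pages) rather than hit/miss counters, so it cannot yield a cache ratio. A minimal sketch that instead reads the global VM event counters behind /proc/vmstat, assuming a kernel built with CONFIG_VM_EVENT_COUNTERS:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/vmstat.h>

/* Snapshot of the global VM event counters (the source of /proc/vmstat). */
static unsigned long events[NR_VM_EVENT_ITEMS];

static int __init cache_mon_init(void)
{
    all_vm_events(events);  /* sums the per-CPU counters */
    pr_info("cache_mon: pgfault=%lu pgmajfault=%lu\n",
            events[PGFAULT], events[PGMAJFAULT]);
    return 0;
}

static void __exit cache_mon_exit(void)
{
}

module_init(cache_mon_init);
module_exit(cache_mon_exit);
MODULE_LICENSE("GPL");  /* all_vm_events() is exported GPL-only */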

Calculate hit ratio using:

hit_ratio = (total_requests - cache_misses) / total_requests * 100

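As a rough rule of thumb, read-heavy workloads whose working set fits in memory show hit ratios above 90%. Streaming or scan-heavy workloads can sit far lower without indicating a problem, so judge the number against the workload.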
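The formula can be wrapped in a small shell function. This is only a sketch: the name hit_ratio is hypothetical, and it takes hits and misses as inputs (so total_requests = hits + misses), as cachestat reports them.

```shell
# Hypothetical helper: hit_ratio HITS MISSES
# Prints the hit ratio as a percentage with two decimals, or "n/a"
# when there were no requests at all (avoids dividing by zero).
hit_ratio() {
    awk -v hits="$1" -v misses="$2" 'BEGIN {
        total = hits + misses
        if (total == 0) { print "n/a"; exit }
        printf "%.2f\n", hits / total * 100
    }'
}
```

For example, `hit_ratio 900 100` prints 90.00, and an idle interval with no hits or misses prints n/a instead of failing.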

For ongoing monitoring, write these metrics to a .prom file that Node Exporter's textfile collector exposes to Prometheus, then chart them in Grafana.
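The textfile collector reads files ending in .prom from the directory passed to node_exporter via --collector.textfile.directory. The sketch below writes such a file. The function name write_cache_metrics and the metric names pagecache_hits and pagecache_misses are assumptions made for this example, not established conventions.

```shell
# Hypothetical helper: write_cache_metrics HITS MISSES OUTFILE
# Writes two gauges in the Prometheus text format. The body runs in a
# subshell, so its variables do not leak into the caller.
write_cache_metrics() (
    hits="$1"; misses="$2"; out="$3"
    # Build the file under a temporary name in the same directory. It does
    # not end in .prom, so the collector ignores it until the rename.
    tmp="$(mktemp "${out}.XXXXXX")" || exit 1
    {
        echo '# HELP pagecache_hits Page cache hits in the last sampling interval.'
        echo '# TYPE pagecache_hits gauge'
        echo "pagecache_hits ${hits}"
        echo '# HELP pagecache_misses Page cache misses in the last sampling interval.'
        echo '# TYPE pagecache_misses gauge'
        echo "pagecache_misses ${misses}"
    } > "$tmp"
    chmod 644 "$tmp"    # mktemp creates the file as 0600; node_exporter must be able to read it
    mv "$tmp" "$out"    # rename within one filesystem, so readers never see a partial file
)
```

A cron job or loop could call it as `write_cache_metrics "$hits" "$misses" /path/to/textfile-dir/pagecache.prom`, where the directory is whatever node_exporter was configured to watch and the two values come from a sampling tool such as cachestat.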


When optimizing I/O performance in Linux systems, understanding cache behavior is crucial. The Linux kernel provides several mechanisms to monitor how effectively the page cache serves read/write requests before they reach physical storage.

Cache sizes and device-level I/O counters are available through the /proc and /sys filesystems:

# System-wide page cache size and dirty/writeback state (sizes, not hit ratios)
grep -E '^(Cached|Buffers|Dirty|Writeback):' /proc/meminfo

# Per-device I/O counters (requests that reached the device)
cat /sys/block/sda/stat

The perf tool can count CPU hardware cache events. These are distinct from the page cache, but they help rule out processor-level bottlenecks:

# Install perf if needed
sudo apt install linux-tools-common linux-tools-$(uname -r)

# Count CPU cache references and misses system-wide for 5 seconds
sudo perf stat -e cache-references,cache-misses -a sleep 5

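The BCC toolkit provides tools that trace the page cache directly:

# Install bcc-tools (Debian/Ubuntu package name)
sudo apt install bpfcc-tools

# Monitor page cache hit ratio once per second
# (the Debian/Ubuntu package installs the tool as cachestat-bpfcc)
sudo cachestat-bpfcc 1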

Typical cachestat output shows:

  • HITS: Page cache hits during the interval
  • MISSES: Page cache misses, which require disk I/O
  • DIRTIES: Pages dirtied (modified) in the cache during the interval
  • HITRATIO: Hit ratio percentage

For persistent monitoring, create a script to log cache metrics:

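#!/bin/bash
# Append one line per sample: epoch seconds, then Cached and Dirty in kB.
# Writing to /var/log usually requires root.
while true; do
    sample=$(grep -E '^(Cached|Dirty):' /proc/meminfo | tr -s ' ' | paste -sd' ' -)
    echo "$(date +%s) $sample" >> /var/log/cache_stats.log
    sleep 5
done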

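For deeper analysis, add a dynamic probe on submit_bio to count the I/O requests that actually reach the block layer, which is roughly the reads the page cache could not serve plus writeback and direct I/O:

sudo perf probe --add 'submit_bio'
sudo perf stat -e 'probe:submit_bio' -a sleep 10
sudo perf probe --del 'probe:submit_bio'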

Combine these tools with a visualization layer such as Grafana for long-term trend analysis and capacity planning.