Linux's background writeback mechanism, while essential for system performance, can become a bottleneck when it aggressively flushes dirty pages at maximum device speed. The current controls in /proc/sys/vm/ (dirty_background_ratio, dirty_expire_centisecs, and dirty_ratio) don't provide fine-grained control over the writeback rate, leading to potential I/O contention.
The kernel's flush threads (kworker processes) attempt to write dirty pages when either:
- The percentage of dirty pages exceeds dirty_background_ratio
- Pages have been dirty longer than dirty_expire_centisecs
This can create sudden bursts of I/O activity that starve other processes:
# Current default values (may vary by distribution)
$ cat /proc/sys/vm/dirty_background_ratio
10
$ cat /proc/sys/vm/dirty_expire_centisecs
3000
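To make the percentage concrete, the sketch below converts a ratio into an approximate size. It assumes the kernel applies the ratio to available (free plus reclaimable) memory, which MemAvailable in /proc/meminfo only approximates. The helper name ratio_to_kib is made up for illustration.

```shell
#!/bin/sh
# ratio_to_kib RATIO MEM_KIB
# Print RATIO percent of MEM_KIB as an integer number of KiB.
# Hypothetical helper, not part of any standard tool.
ratio_to_kib() {
    echo $(( $2 * $1 / 100 ))
}

# Fixed-number example: 10% of 16 GiB (16777216 KiB) of available memory
ratio_to_kib 10 16777216    # prints 1677721 (about 1.6 GiB)

# On a live system (an approximation, see the assumption above):
# ratio_to_kib "$(cat /proc/sys/vm/dirty_background_ratio)" \
#     "$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)"
```

On a machine with plenty of RAM, even the default 10% can mean more than a gigabyte of dirty data queued before background writeback starts, which is why the resulting bursts can be so large.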
While Linux doesn't provide direct writeback throttling controls, we can implement several workarounds:
1. Adjusting Existing Parameters
More aggressive tuning can help smooth out writeback:
# Reduce both ratio and expiration time
echo 5 > /proc/sys/vm/dirty_background_ratio
echo 1000 > /proc/sys/vm/dirty_expire_centisecs
2. Using cgroups v2 I/O Controller
For modern systems (kernel 4.19+), cgroups v2 provides better I/O control:
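# Enable the io and memory controllers for child cgroups
# (cgroup writeback needs both, plus a filesystem that supports it)
echo "+io +memory" > /sys/fs/cgroup/cgroup.subtree_control
# Create a cgroup for the processes that generate the writes
mkdir /sys/fs/cgroup/writeback_limit
# Limit writes to device 8:0 to 50 MiB/s
echo "8:0 wbps=52428800" > /sys/fs/cgroup/writeback_limit/io.max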
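The wbps value in io.max is given in bytes per second, and the device is identified by its MAJ:MIN pair. A small sketch for building such an entry follows. The function name io_max_write_limit is hypothetical, and the device path in the comment is only an example.

```shell
#!/bin/sh
# io_max_write_limit MAJ:MIN MIB_PER_SEC
# Print an io.max entry that caps write bandwidth for one device.
# Hypothetical helper; wbps is expressed in bytes per second.
io_max_write_limit() {
    printf '%s wbps=%d\n' "$1" $(( $2 * 1024 * 1024 ))
}

io_max_write_limit 8:0 50    # prints: 8:0 wbps=52428800

# Look up the MAJ:MIN pair of a block device (example path):
# lsblk -dno MAJ:MIN /dev/sda
```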
3. Kernel Parameter: dirty_writeback_centisecs
Adjust how often the kernel checks for dirty pages:
# Check every 500ms instead of default 5s (the value is in centiseconds)
echo 50 > /proc/sys/vm/dirty_writeback_centisecs
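The *_centisecs sysctls count hundredths of a second, so 100 means one second and the usual default of 500 means five seconds. A tiny conversion helper (the name ms_to_centisecs is hypothetical) helps avoid unit slips:

```shell
#!/bin/sh
# ms_to_centisecs MILLISECONDS
# Convert milliseconds to centiseconds (integer division by 10).
ms_to_centisecs() {
    echo $(( $1 / 10 ))
}

ms_to_centisecs 500     # prints 50  (half a second)
ms_to_centisecs 5000    # prints 500 (five seconds, the usual default)
```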
Use these tools to verify your changes:
# Real-time I/O monitoring
iotop -oP
# Detailed disk stats
iostat -x 1
# Dirty and writeback page counters (/proc/diskstats has no "flush" label to grep for)
grep -E 'nr_dirty|nr_writeback' /proc/vmstat
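For ultimate control, you can patch the kernel's writeback path directly:
/* Illustrative sketch, not a loadable module: the kernel offers no hook for
 * this, so the logic would have to be patched into the writeback code. */
#include <linux/sched.h>
#include <linux/writeback.h>
static long custom_writeback_rate = 1024; /* max pages per writeback pass */
static void throttle_writeback(struct writeback_control *wbc)
{
	if (wbc->nr_to_write > custom_writeback_rate) {
		wbc->nr_to_write = custom_writeback_rate;
		schedule_timeout_interruptible(1); /* sleep for one jiffy */
	}
}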
- Database servers: Lower dirty_background_ratio (3-5%) and reduce expiration time
- Media servers: Use cgroups to prioritize read I/O
- Write-heavy applications: Consider mounting filesystems with -o sync for critical writes
Linux's background flush mechanism, while efficient for write operations, can create significant I/O contention when:
- The system reaches the dirty_background_ratio threshold (default 10%)
- Dirty pages age beyond dirty_expire_centisecs (default 3000cs)
- The system has not yet hit dirty_ratio (default 20-30%)
The fundamental issue emerges when the kernel's pdflush or kworker threads engage in full-speed writeback, starving other I/O operations like synchronous writes or uncached reads.
Modern Linux kernels (4.10+) offer several controls for writeback throttling:
# View current settings
sysctl vm.dirty_bytes
sysctl vm.dirty_background_bytes
sysctl vm.dirty_writeback_centisecs
sysctl vm.dirtytime_expire_seconds
1. Dynamic Throttling with cgroups v2
For systems with cgroups v2 support (most modern distributions):
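# Create IO throttle group (needs the io and memory controllers enabled)
echo "+io +memory" > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/writeback_throttle
# Limit writes to device 8:16 to about 1MB/s
echo "8:16 wbps=1000000" > /sys/fs/cgroup/writeback_throttle/io.max
# Move the process that dirties the pages into the group, one PID per write.
# WRITER_PID is a placeholder; kworker kernel threads generally cannot be moved.
echo "$WRITER_PID" > /sys/fs/cgroup/writeback_throttle/cgroup.procs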
2. Ionice Priority Adjustment
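# Set one kworker thread to the idle I/O class. This only matters with an
# I/O scheduler that honors priorities (such as BFQ), and it affects a single thread.
ionice -c 3 -p $(pgrep kworker | head -n1)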
3. Kernel Parameter Optimization
# Start writeback earlier and in smaller batches to avoid large bursts
echo 150 > /proc/sys/vm/dirty_writeback_centisecs
echo $((4*1024*1024)) > /proc/sys/vm/dirty_background_bytes
echo $((16*1024*1024)) > /proc/sys/vm/dirty_bytes
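Values written under /proc/sys/vm/ are lost on reboot. As a sketch, the same byte limits can be generated as sysctl configuration lines. The function name dirty_sysctl_lines and the file name in the comment are assumptions for illustration. Note that the *_bytes and *_ratio settings are counterparts: writing one makes the other read back as 0.

```shell
#!/bin/sh
# dirty_sysctl_lines BACKGROUND_MIB LIMIT_MIB
# Print sysctl.conf-style lines for the byte-based dirty limits.
# Hypothetical helper; sizes are given in MiB and converted to bytes.
dirty_sysctl_lines() {
    printf 'vm.dirty_background_bytes = %d\n' $(( $1 * 1024 * 1024 ))
    printf 'vm.dirty_bytes = %d\n' $(( $2 * 1024 * 1024 ))
}

dirty_sysctl_lines 4 16
# prints:
#   vm.dirty_background_bytes = 4194304
#   vm.dirty_bytes = 16777216

# To persist across reboots (example file name, run as root):
# dirty_sysctl_lines 4 16 > /etc/sysctl.d/90-writeback.conf && sysctl --system
```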
For absolute control, consider a kernel module that hooks into the writeback mechanism:
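#include <linux/mm.h>
#include <linux/module.h>
#include <linux/writeback.h> /* struct wb_domain, global_wb_domain */
/* Percentage of the current dirty limit to keep */
static unsigned long custom_dirty_ratio = 10;
module_param(custom_dirty_ratio, ulong, 0644);
static int __init wb_throttle_init(void)
{
	/* Caveats: global_wb_domain is typically not exported to loadable
	 * modules, and the kernel recomputes dirty_limit on its own, so treat
	 * this as a sketch of the idea rather than a working throttle. */
	struct wb_domain *dom = &global_wb_domain;

	dom->dirty_limit = (dom->dirty_limit * custom_dirty_ratio) / 100;
	return 0;
}
static void __exit wb_throttle_exit(void) { }
module_init(wb_throttle_init);
module_exit(wb_throttle_exit);
MODULE_LICENSE("GPL");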
Compile with:
make -C /lib/modules/$(uname -r)/build M=$PWD modules
Essential tools for observation:
# Real-time I/O monitoring
iotop -oP
# Detailed writeback stats
grep -E "dirty|writeback" /proc/vmstat
# Per-device I/O saturation
iostat -x 1
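The vmstat counters are reported in pages, which is hard to read at a glance. The sketch below converts nr_dirty and nr_writeback to MiB. It assumes 4 KiB pages (check with getconf PAGESIZE), and the function name dirty_summary is hypothetical.

```shell
#!/bin/sh
# dirty_summary: read /proc/vmstat-style text on stdin and print the
# nr_dirty and nr_writeback counters converted from pages to MiB.
# Assumes 4 KiB pages. Hypothetical helper for illustration.
dirty_summary() {
    awk '$1 == "nr_dirty" || $1 == "nr_writeback" {
        printf "%s %.1f MiB\n", $1, $2 * 4 / 1024
    }'
}

# Sample input, so the example does not depend on the running system
printf 'nr_dirty 2560\nnr_writeback 128\nnr_free_pages 100000\n' | dirty_summary
# prints:
#   nr_dirty 10.0 MiB
#   nr_writeback 0.5 MiB

# On a live system:
# dirty_summary < /proc/vmstat
```

Running it under watch -n 1 while a write-heavy job is active is one way to see whether a tuning change actually keeps the dirty backlog small.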