How Linux sync and Drop Caches Commands Fixed Our Tomcat/Red5 Performance Bottleneck


When inheriting a legacy Tomcat application serving as a Red5 server for Flex clients, I encountered a perplexing performance degradation pattern. Under sustained load, response times would gradually inflate from under 100ms to 300-400ms. Memory leaks were suspected but never conclusively proven through heap dumps or GC analysis.

# Sample metrics showing the performance degradation:
Requests: 1500/min → Response p99: 98ms
After 4 hours:
Requests: 1500/min → Response p99: 387ms
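The p99 figures above come from per-request latencies. As a quick sketch of how such a percentile is derived (the latencies below are synthetic, not the real data), nearest-rank p99 can be computed with nothing but sort and awk:

```shell
# Nearest-rank p99 over a list of request latencies (ms) using sort + awk
printf '%s\n' 98 102 95 387 101 99 97 100 96 103 |
  sort -n |
  awk '{ a[NR] = $1 }
       END { idx = int(NR * 0.99); if (idx < NR * 0.99) idx++; print "p99:", a[idx] }'
# → p99: 387
```

With only ten samples the p99 is simply the worst observation; in practice you would feed this thousands of lines from an access log.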

During a particularly severe episode where a staging server nearly stopped responding, I executed:

sync && echo 3 > /proc/sys/vm/drop_caches

Remarkably, the server immediately returned to baseline performance. Let's break down why this worked.

Modern Linux kernels aggressively cache filesystem operations to improve performance. The drop_caches interface accepts three values, covering two kinds of cache:

  • echo 1: frees the page cache (cached file contents)
  • echo 2: frees dentries and inodes (cached directory and file metadata)
  • echo 3: frees both of the above

In Java applications handling many small real-time interactions, these caches can grow excessively:

# Check current cache usage
free -h
              total        used        free      shared  buff/cache   available
Mem:            16G        5.2G        230M        1.3G         11G        9.2G
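The buff/cache column in free lumps several caches together. /proc/meminfo breaks the figure down, which helps distinguish page-cache growth from slab (dentry/inode) growth — a quick sketch:

```shell
# Break down what free(1) reports as "buff/cache" (all values in kB)
cached_kb=$(awk '/^Cached:/  {print $2}' /proc/meminfo)
buffers_kb=$(awk '/^Buffers:/ {print $2}' /proc/meminfo)
slab_kb=$(awk '/^Slab:/      {print $2}' /proc/meminfo)  # dentries/inodes live here
echo "page cache: ${cached_kb} kB, buffers: ${buffers_kb} kB, slab: ${slab_kb} kB"
```

If Cached dominates, echo 1 is enough; a ballooning Slab points at dentry/inode pressure instead.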

The sync command flushes dirty (not-yet-written) pages to disk before the caches are cleared. Dropping caches is non-destructive either way — the kernel only discards clean, reclaimable pages — but dirty pages cannot be freed, so skipping sync simply means less memory gets reclaimed. The sequence:

  1. sync: flush dirty pages to disk
  2. echo 3: free all reclaimable cache types
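You can watch sync do its part by reading the Dirty figure from /proc/meminfo before and after — this is exactly the data drop_caches cannot touch until it has been written back:

```shell
# Dirty pages are modified file data not yet on disk; drop_caches
# skips them, which is why sync comes first in the sequence above.
dirty_before=$(awk '/^Dirty:/ {print $2}' /proc/meminfo)
sync
dirty_after=$(awk '/^Dirty:/ {print $2}' /proc/meminfo)
echo "dirty before sync: ${dirty_before} kB, after: ${dirty_after} kB"
```

On a quiet box the "after" number should fall toward zero; under heavy write load it may stay elevated because new dirty pages are created continuously.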

For production systems, consider implementing controlled cache drops during low-traffic periods:

#!/bin/bash
# Safe cache dropper for Tomcat/Java apps

# Writing to /proc/sys/vm/drop_caches requires root
if [ "$(id -u)" -ne 0 ]; then
    echo "This script must run as root" >&2
    exit 1
fi

# Abort if no Java process is running (wrong host, or the app is down)
if [ "$(pgrep -fc java)" -lt 1 ]; then
    echo "No Java processes running" >&2
    exit 1
fi

# Only proceed if the 1-minute load average is below threshold
# (awk comparison avoids a dependency on bc)
LOAD=$(awk '{print $1}' /proc/loadavg)
if awk -v l="$LOAD" 'BEGIN { exit !(l < 2.0) }'; then
    logger "Initiating safe cache drop"
    sync
    echo 3 > /proc/sys/vm/drop_caches
    logger "Cache drop completed. Current free: $(free -h | awk '/Mem/{print $4}')"
fi
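In production, a script like this usually runs from cron in a low-traffic window rather than ad hoc. A minimal /etc/crontab entry (the schedule and paths are illustrative):

```shell
# Nightly at 03:00: flush writes, then drop only the page cache (1),
# which is gentler than 3 because dentry/inode caches survive
0 3 * * * root /usr/bin/sync && echo 1 > /proc/sys/vm/drop_caches
```

The redirection works here because system crontab entries already execute in a root shell.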

For long-term stability, consider:

  • JVM tuning: reduce filesystem interactions through proper buffer sizing
  • Redis caching: offload real-time data from the filesystem
  • Kernel parameters: adjust vm.vfs_cache_pressure so the kernel reclaims dentry/inode caches more aggressively on its own

# Example kernel tuning
sysctl -w vm.vfs_cache_pressure=150
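For context, the default is 100; values above it bias reclaim toward dentries and inodes, keeping those caches smaller. The live value can be read straight from procfs, and persisting a change is a one-liner (the sysctl.conf path is the traditional location; modern distros may prefer /etc/sysctl.d/):

```shell
# Read the current reclaim bias (kernel default: 100)
cat /proc/sys/vm/vfs_cache_pressure

# Persisting a change across reboots (needs root):
#   echo 'vm.vfs_cache_pressure = 150' >> /etc/sysctl.conf && sysctl -p
```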

The cache drop technique serves as an emergency remedy rather than a permanent solution — and it is not free: expect a brief performance dip after each drop while the kernel rebuilds its caches. Still, understanding its mechanics proves invaluable when troubleshooting mysterious latency spikes in real-time systems.