Analyzing Disk Usage with Treemap Visualization on Headless Linux: CLI Tools and Remote Analysis Methods



When managing headless Linux servers, understanding disk consumption patterns is crucial for maintenance and capacity planning. Treemap visualizations provide immediate spatial awareness of storage allocation, making them superior to traditional du -h output for identifying space hogs.

For servers without GUI environments, these command-line tools create treemap-compatible data:

# ncdu - export scan results for later visualization
ncdu -o scan_results.json /path/to/scan

# dust - alternative with color-coded output
dust -d 3 / | less -R

# gdu - Go-based disk analyzer
gdu --non-interactive --output-file disk_usage.json /
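An ncdu export can also be post-processed without ncdu itself. The sketch below assumes ncdu's export layout — a top-level JSON array `[majorver, minorver, metadata, tree]`, where a directory is an array whose first element is its own info dict and a file is a plain dict; treat the field names as assumptions if your ncdu version differs:

```python
import json

def total_dsize(node):
    """Sum the 'dsize' (disk usage) fields across an ncdu export tree.
    Assumption: directories are JSON arrays (info dict first, then
    entries) and files are plain dicts, per ncdu's export format."""
    if isinstance(node, list):
        return sum(total_dsize(child) for child in node)
    return node.get('dsize', 0)

def load_ncdu_tree(path):
    """Read an ncdu export file and return the root tree (index 3)."""
    with open(path) as f:
        dump = json.load(f)  # [majorver, minorver, metadata, tree]
    return dump[3]
```

For example, `print(total_dsize(load_ncdu_tree('scan_results.json')))` reports the total disk usage recorded in the scan.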

The most effective approach combines CLI scanning with remote visualization:

  1. On the server, scan and export:

    # -x stays on one filesystem; -o writes the scan to a file
    ncdu -x -o ncdu_results.json --exclude /mnt /

  2. Transfer the results to a local machine:

    scp user@server:/path/ncdu_results.json ~/disk_analysis/

  3. Browse the scan locally, with no further load on the server:

    ncdu -f ncdu_results.json
    
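The scan-and-fetch steps above can be driven from the workstation with a small wrapper. The host name and paths below are placeholders, and key-based SSH authentication is assumed:

```python
import subprocess

def remote_scan_commands(host, remote_root='/',
                         remote_tmp='/tmp/ncdu_results.json',
                         local_dir='disk_analysis/'):
    """Build the ssh/scp command lines for a scan-then-fetch workflow."""
    scan = ['ssh', host, f'ncdu -x -o {remote_tmp} {remote_root}']
    fetch = ['scp', f'{host}:{remote_tmp}', local_dir]
    return scan, fetch

def run_remote_scan(host):
    """Run the scan on the server, then copy the export down."""
    for cmd in remote_scan_commands(host):
        subprocess.run(cmd, check=True)  # raises if either step fails
```

Calling `run_remote_scan('user@server')` leaves the export in `disk_analysis/`, ready for `ncdu -f`.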

When graphical tools aren't available, consider these structured CLI alternatives:

# Tree-like output with sizes
du -h --max-depth=3 | sort -hr | less

# Interactive navigation
ncurses-based tools:
- ncdu (already mentioned)
- dua (dua-cli; `dua i` starts the interactive mode)
- gdu (TUI interface)
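If none of these tools can be installed, the same "largest first" view can be produced with the standard library alone — a rough stand-in for `du --max-depth=1 | sort -hr`, using apparent file sizes rather than allocated blocks:

```python
import os

def child_sizes(root):
    """Apparent size of each immediate child of `root`, largest first."""
    results = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                results.append((entry.name, entry.stat(follow_symlinks=False).st_size))
            elif entry.is_dir(follow_symlinks=False):
                total = 0
                for dirpath, _, filenames in os.walk(entry.path):
                    for fn in filenames:
                        try:
                            total += os.path.getsize(os.path.join(dirpath, fn))
                        except OSError:
                            pass  # broken symlink or file removed mid-scan
                results.append((entry.name, total))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

For example, `for name, size in child_sizes('/var'): print(f'{size:>12} {name}')` mimics a sorted `du` summary.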

For programmatic analysis, this Python script generates treemap-ready JSON:

import json
import os

def build_tree(path):
    """Recursively build a {name, size, children} dict suitable for
    treemap renderers. Sizes accumulate bottom-up in a single pass;
    rescanning each subtree with os.walk would be quadratic."""
    name = os.path.basename(path) or path  # basename('/') is ''
    tree = {'name': name, 'size': 0, 'children': []}
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        child = build_tree(entry.path)
                        tree['children'].append(child)
                        tree['size'] += child['size']
                    elif entry.is_file(follow_symlinks=False):
                        size = entry.stat(follow_symlinks=False).st_size
                        tree['children'].append({'name': entry.name, 'size': size})
                        tree['size'] += size
                except OSError:
                    pass  # broken symlink or file removed mid-scan
    except (PermissionError, OSError):
        pass  # unreadable directory: report it as empty
    return tree

with open('disk_tree.json', 'w') as f:
    json.dump(build_tree('/'), f)
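Before transferring `disk_tree.json` anywhere, it is worth a quick sanity check from the shell; this small helper ranks a node's immediate children by size:

```python
def top_children(tree, n=5):
    """Return the n largest immediate children of a
    {name, size, children} node, as produced by build_tree above."""
    children = tree.get('children', [])
    return sorted(children, key=lambda c: c.get('size', 0), reverse=True)[:n]
```

For example, `top_children(json.load(open('disk_tree.json')))` lists the five biggest entries under the scan root.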

For teams needing shared access to disk analysis:

# Serve results via HTTP (Python one-liner)
python3 -m http.server 8000 --directory /path/to/exported_report

# Or use dedicated tools:
- Diskover-web (Elasticsearch backend)
- duc (index once, then serve interactive graphs via its CGI web UI)
- Custom D3.js treemap visualization

Working with headless Linux servers often means sacrificing visual tools like KDirStat or WinDirStat. Here's how to analyze disk usage effectively through SSH:

ncdu (NCurses Disk Usage) provides interactive exploration:


# Install (Debian/Ubuntu)
sudo apt install ncdu

# Scan filesystem
ncdu /path/to/scan

# Export results
ncdu -o scan_results.json /

dust offers intuitive directory summaries:


cargo install du-dust
dust /var --depth 2

For true treemap visualization, consider this pipeline:


# 1. Generate data: -k reports sizes in KiB as plain integers,
#    which is far easier to parse than human-readable suffixes
import csv
import subprocess

result = subprocess.run(['du', '-k', '--max-depth=5', '/'],
                        capture_output=True,
                        text=True)

# 2. Process for visualization (save to CSV; sizes are in KiB)
with open('disk_usage.csv', 'w', newline='') as f:
    writer = csv.writer(f)  # csv handles commas inside paths
    writer.writerow(['path', 'size'])
    for line in result.stdout.splitlines():
        size, path = line.split('\t', 1)
        writer.writerow([path, int(size)])

Transfer the CSV to a workstation and visualize with:

  • D3.js treemap (browser-based)
  • RAWGraphs (open source visualization tool)
  • Python matplotlib:

import pandas as pd
import matplotlib.pyplot as plt
import squarify

df = pd.read_csv('disk_usage.csv').nlargest(30, 'size')  # top entries keep labels legible
squarify.plot(sizes=df['size'], label=df['path'], alpha=.8)
plt.axis('off')
plt.show()
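If pulling in squarify is not an option, the classic slice-and-dice layout underlying every treemap fits in a few lines of standard-library Python — a sketch of the layout math, not a full squarified algorithm:

```python
def slice_and_dice(sizes, x=0.0, y=0.0, w=1.0, h=1.0, vertical=True):
    """Split rectangle (x, y, w, h) into strips proportional to `sizes`.
    Returns one (x, y, w, h) rectangle per size; rectangle areas match
    the size ratios, which is the defining property of a treemap."""
    total = float(sum(sizes))
    rects, offset = [], 0.0
    for s in sizes:
        frac = s / total
        if vertical:  # side-by-side vertical strips
            rects.append((x + offset, y, w * frac, h))
            offset += w * frac
        else:         # stacked horizontal strips
            rects.append((x, y + offset, w, h * frac))
            offset += h * frac
    return rects
```

Recursing into each directory's children with the orientation flipped per level produces the familiar nested treemap; the rectangles can then be drawn with `matplotlib.patches.Rectangle`.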

Web-based solutions:


# Serve directory via HTTP (Python 3)
python3 -m http.server 8000 --directory /path/to/share

Then access from any device with a browser and use web-based tools.

JSON API endpoints:


# FastAPI example
from fastapi import FastAPI
import glob
import subprocess

app = FastAPI()

@app.get("/disk-usage")
def get_disk_usage():
    # subprocess does not expand shell globs, so expand /* ourselves
    targets = glob.glob('/*')
    result = subprocess.run(['du', '-s', '--block-size=1M', *targets],
                            capture_output=True,
                            text=True)
    return {"data": result.stdout}
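Returning raw `du` text pushes parsing onto every client; a small helper on either end turns it into structured rows. This assumes GNU du's tab-separated `<size>\t<path>` lines, where the size may carry a unit letter (e.g. the `M` printed by `--block-size=1M`):

```python
def parse_du(text):
    """Parse `du` output lines ("<size>\t<path>") into a list of dicts.
    A trailing unit letter, if present, is stripped before conversion."""
    rows = []
    for line in text.splitlines():
        if '\t' not in line:
            continue  # skip warnings and blank lines
        size, path = line.split('\t', 1)
        rows.append({'path': path, 'size': int(size.rstrip('KMGTP'))})
    return rows
```

A client could then call `parse_du(response.json()['data'])` to get sortable numeric sizes instead of a text blob.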