How to Accurately Monitor Memory Usage of Single-CPU Jobs in SGE (Sun Grid Engine)



When working with SGE (particularly version 6.2u5), many users encounter confusing discrepancies between actual memory usage shown in system tools like top and the values reported by SGE utilities (qstat, qacct). Here's what I've discovered through extensive testing and discussions with cluster administrators.

Let's break down the key memory metrics you'll encounter:

# From top:
VIRT: 45.6g  # Virtual memory size (includes reserved but unused memory)
RES: 38g     # Resident memory (actual physical RAM used)
SHR: 9600    # Shared memory (portion that could be shared with other processes)

# From qacct:
mem 2768.453   # Integral memory usage in GB-seconds (memory multiplied by CPU time)
maxvmem 4.078G # Peak virtual memory usage during job execution

The maxvmem value in SGE is typically lower than what top shows because:

  1. SGE measures the process tree's memory usage differently than top
  2. SGE may not account for memory-mapped files or shared libraries properly
  3. The sampling frequency of SGE might miss short memory spikes

Here are three reliable approaches I've used to get accurate memory measurements:

1. Using time -v and RSS Measurement

Create a wrapper script that periodically samples RSS:

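#!/bin/bash
# monitor_mem.sh: log the RSS (in KiB) of process $1 every 30 s until it exits
while kill -0 "$1" 2>/dev/null; do
    ps -p "$1" -o rss= >> memory_usage.log
    sleep 30
done

# Usage (inside the job script):
# your_actual_command &
# cmd_pid=$!
# ./monitor_mem.sh "$cmd_pid" &
# wait "$cmd_pid"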
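
The same sampling idea can be sketched in Python, which makes it easier to keep a running peak instead of a raw log. This is a minimal sketch under a few assumptions: it runs on Linux (it reads /proc/<pid>/status), it watches only the direct child rather than the whole process tree, and the names vm_rss_kb and run_and_sample are made up for illustration.

```python
import subprocess
import sys
import time
from pathlib import Path


def vm_rss_kb(status_text):
    """Extract the VmRSS value (kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads and exited processes have no VmRSS line


def run_and_sample(cmd, interval=1.0):
    """Run cmd and return (exit_code, peak VmRSS in kB seen while sampling)."""
    proc = subprocess.Popen(cmd)
    peak_kb = 0
    while proc.poll() is None:
        try:
            rss = vm_rss_kb(Path(f"/proc/{proc.pid}/status").read_text())
        except OSError:
            rss = None  # process exited between poll() and the read
        if rss is not None:
            peak_kb = max(peak_kb, rss)
        time.sleep(interval)
    return proc.returncode, peak_kb


if __name__ == "__main__" and len(sys.argv) > 1:
    code, peak = run_and_sample(sys.argv[1:])
    print(f"exit={code} peak_rss={peak / 1024:.1f} MB")
```

Like the shell loop, this can miss spikes shorter than the sampling interval, so a smaller interval trades overhead for resolution.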

2. Direct cgroup Memory Stats

For newer systems using cgroups:

grep 'total_rss' /sys/fs/cgroup/memory/sge/*/memory.stat
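
If you want the numbers per job rather than a grep dump, the file can be parsed directly. The sketch below assumes cgroup v1, where memory.stat holds one "key value" pair per line with memory counters in bytes, and it reuses the /sys/fs/cgroup/memory/sge path from the command above, which depends on how your site set up its cgroups. The function names are illustrative.

```python
from pathlib import Path


def parse_memory_stat(text):
    """Parse cgroup v1 memory.stat content ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def total_rss_by_cgroup(root="/sys/fs/cgroup/memory/sge"):
    """Map each cgroup directory under root to its total_rss in bytes."""
    result = {}
    for stat_file in Path(root).glob("*/memory.stat"):
        stats = parse_memory_stat(stat_file.read_text())
        if "total_rss" in stats:
            result[stat_file.parent.name] = stats["total_rss"]
    return result


if __name__ == "__main__":
    for name, rss in sorted(total_rss_by_cgroup().items()):
        print(f"{name}: {rss / 1024**3:.2f} GiB")
```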

3. Enhanced SGE Accounting

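Configure SGE to use more accurate memory accounting (requires admin access). Parameter names differ between Grid Engine releases, so treat the settings below as a starting point and check them against sge_conf(5) for your version before applying:

# In the global configuration (edit with qconf -mconf):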
execd_params ENABLE_ADDGRP_KILL=1 MEMORY_ACCOUNTING=true
complex_values mem=virtual_free

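The mem field in qacct is a memory-time integral; the accounting(5) man page gives its unit as gigabyte CPU-seconds. To estimate average memory usage, divide it by the job's CPU time (the cpu field), which for a CPU-bound single-core job is close to end_time - start_time:

average_mem_GB = mem / cpu_seconds

For the example with mem=2768.453, assuming 1000 seconds of CPU time purely for illustration:

2768.453 GB-s / 1000 s = 2.77 GB average usage

Because this is an average over the whole run, it sits below the peak (maxvmem) whenever memory use varies during the job.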
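
The division is simple enough to script when you are checking many jobs. The helper below is a small sketch; average_memory is an illustrative name, the result is in whatever memory unit the integral uses, and the runtimes in the demo are hypothetical values rather than figures from the job.

```python
def average_memory(mem_integral, seconds):
    """Average memory = memory-time integral / seconds.

    The result is in the same memory unit as the integral.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return mem_integral / seconds


if __name__ == "__main__":
    mem = 2768.453  # mem field from the qacct example
    for seconds in (100, 1000):  # hypothetical runtimes, not taken from the job
        print(f"{seconds:>5} s -> average {average_memory(mem, seconds):.2f}")
```

The longer the job ran for a given integral, the lower the average, which is why the average alone says little about the peak.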

Always request adequate memory in your job submission:

qsub -l h_vmem=50G -l mem_free=50G my_job.sh

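h_vmem sets a hard virtual memory limit (a job that exceeds it is killed), while mem_free asks the scheduler for a host that currently reports that much free memory. Together they reduce the risk of memory-related failures, but they cannot rule them out if other jobs on the node grow later.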

When SGE reporting isn't sufficient, consider these alternatives:

  • htop - Enhanced version of top with tree view
  • smem - Provides proportional set size (PSS) measurement
  • /proc/$PID/status - Detailed memory stats for any process
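
To get a PSS figure without installing smem, you can sum the Pss: lines that the Linux kernel exposes in /proc/<pid>/smaps. This is a rough sketch of that idea, not a reimplementation of smem; pss_kb and process_pss_kb are illustrative names, and reading another user's smaps file normally requires matching ownership or root.

```python
from pathlib import Path


def pss_kb(smaps_text):
    """Sum every 'Pss:' entry (kB) in the text of /proc/<pid>/smaps."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Pss:"):
            total += int(line.split()[1])
    return total


def process_pss_kb(pid):
    """Return the proportional set size of a running process in kB."""
    return pss_kb(Path(f"/proc/{pid}/smaps").read_text())


if __name__ == "__main__":
    import os
    print(f"PSS of this interpreter: {process_pss_kb(os.getpid())} kB")
```

PSS divides each shared page among the processes mapping it, so it avoids double-counting shared libraries the way RSS does.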

The confusion between SGE's reported memory values (qacct/qstat) and system-level measurements (top) stems from fundamental differences in what these tools measure:

// Sample qacct output structure
job_number 7270916
mem        2768.453     # Total memory GB-seconds
maxvmem    4.078G       # Peak virtual memory usage
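
For scripted comparisons it helps to turn this output into numbers. The sketch below parses "key value" lines into a dict and converts a size such as 4.078G to bytes. It assumes the uppercase K/M/G suffixes are binary multiples (1024-based), which is worth confirming against sge_types(1) on your installation; parse_qacct and size_to_bytes are illustrative names.

```python
# Assumption: uppercase suffixes are 1024-based, as in Grid Engine memory specifiers.
_MULTIPLIERS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_qacct(text):
    """Turn 'key   value' lines of qacct -j output into a dict of strings."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:  # skips separator lines such as '======'
            record[parts[0]] = parts[1].strip()
    return record


def size_to_bytes(value):
    """Convert a size like '4.078G' or '512M' to an integer number of bytes."""
    value = value.strip()
    if value and value[-1] in _MULTIPLIERS:
        return int(round(float(value[:-1]) * _MULTIPLIERS[value[-1]]))
    return int(round(float(value)))  # no suffix: already bytes


if __name__ == "__main__":
    sample = "job_number 7270916\nmem        2768.453\nmaxvmem    4.078G\n"
    rec = parse_qacct(sample)
    print(rec["job_number"], float(rec["mem"]), size_to_bytes(rec["maxvmem"]))
```

If a job ID matches several records (array tasks, for example), this keeps only the last one, so split the output per record first in that case.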

top shows real-time memory allocation:

  • VIRT: Total virtual memory (45.6GB - includes reserved but unused memory)
  • RES: Resident memory actually in RAM (38GB)
  • SHR: Shared memory portions (9.6MB)

Sun Grid Engine tracks memory differently:

// Sample qstat -j output
usage 1: cpu=00:01:37, mem=168.12988 GBs, io=38.64676, vmem=1.665G, maxvmem=4.078G

Key metrics:

  1. mem: Cumulative memory-time product (GB-seconds)
  2. maxvmem: Peak virtual memory observed by SGE
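
As a sanity check on these two metrics, the usage line can be parsed and the average derived from mem and cpu. The following is a sketch that assumes the field layout shown in the sample line; parse_usage and cpu_to_seconds are illustrative helper names.

```python
import re


def parse_usage(line):
    """Parse the usage line of qstat -j into a dict of raw strings."""
    return dict(re.findall(r"(\w+)=([^,\s]+)", line))


def cpu_to_seconds(cpu):
    """Convert [[D:]HH:]MM:SS style CPU time to seconds."""
    units = (1, 60, 3600, 86400)  # seconds, minutes, hours, days
    fields = [int(part) for part in cpu.split(":")]
    return sum(value * unit for value, unit in zip(reversed(fields), units))


if __name__ == "__main__":
    line = ("usage 1: cpu=00:01:37, mem=168.12988 GBs, io=38.64676, "
            "vmem=1.665G, maxvmem=4.078G")
    usage = parse_usage(line)
    seconds = cpu_to_seconds(usage["cpu"])
    average_gb = float(usage["mem"]) / seconds
    print(f"cpu={seconds}s average={average_gb:.2f} GB peak={usage['maxvmem']}")
```

For the sample line this gives 97 CPU-seconds and an average of roughly 1.73 GB, which is in line with the current vmem of 1.665G and below the 4.078G peak.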

For precise RAM tracking in SGE 6.2u5:

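#!/bin/bash
# Note: $JOB_ID is the SGE job number, not a process ID.
# PID must hold the process ID of the command being measured (e.g. from $!).

# Method 1: Read the resident set size (kB) from /proc
awk '/^VmRSS/ {print $2 " kB"}' "/proc/$PID/status"

# Method 2: SGE accounting after the job finishes
# (mem is in GB-seconds; maxvmem already carries a unit suffix)
qacct -j "$JOB_ID" | awk '$1 == "mem" || $1 == "maxvmem" {print $1, $2}'

# Method 3: Periodic sampling until the process exits (ps reports RSS in KiB)
while kill -0 "$PID" 2>/dev/null; do
  ps -p "$PID" -o rss= | awk '{printf "%.2f GB\n", $1/1024/1024}'
  sleep 5
done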
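
Recommendations:

  • Request memory explicitly at submission, e.g. -l h_vmem=4G; -l m_mem_free=4G works only on Grid Engine variants that define that resource (such as Univa Grid Engine), and it is a resource request rather than an accounting switch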
  • Combine qacct with /proc monitoring for complete picture
  • For critical memory measurements, implement custom logging within your application

Consider these additional methods for more precise tracking:

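# Python peak-memory snippet (on Linux, ru_maxrss is reported in kilobytes)
import resource

def memory_usage():
    """Return this process's peak resident set size in GB."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024 / 1024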

Third-party tools worth exploring:

  • GNU time with -v flag
  • Valgrind massif for detailed heap analysis
  • Custom cgroups memory tracking
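
Of these, GNU time is the quickest to try. The sketch below runs a command under /usr/bin/time -v and pulls out the "Maximum resident set size (kbytes)" line that GNU time prints to stderr. It assumes GNU time is installed at /usr/bin/time (the shell's built-in time does not support -v), and the function names are illustrative.

```python
import re
import subprocess


def max_rss_kb(time_output):
    """Extract peak RSS (kB) from the verbose report of GNU time."""
    match = re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", time_output)
    return int(match.group(1)) if match else None


def run_with_gnu_time(cmd, time_bin="/usr/bin/time"):
    """Run cmd under GNU time -v and return (exit_code, peak RSS in kB)."""
    proc = subprocess.run(
        [time_bin, "-v", *cmd],
        stderr=subprocess.PIPE,  # GNU time writes its report to stderr
        text=True,
    )
    return proc.returncode, max_rss_kb(proc.stderr)


if __name__ == "__main__":
    code, peak = run_with_gnu_time(["sleep", "1"])
    print(f"exit={code} peak_rss_kb={peak}")
```

Unlike periodic sampling, this peak comes from the kernel's own accounting, so short spikes are not missed. The command's own stderr is captured together with the report, so redirect it inside the command if you need it separately.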