Understanding Wall Clock Time vs User Time vs CPU Time in GridEngine Performance Benchmarking


When analyzing job performance in GridEngine (now known as Univa Grid Engine or Altair Grid Engine), you'll encounter three fundamental time measurements:

# Sample GridEngine output format
=============================================
Wallclock Time: 00:05:23
User Time: 00:03:41
CPU Time: 00:04:12
=============================================

Wall clock time (or elapsed time) represents the actual time taken from job start to completion, as if you were timing it with a physical stopwatch. This includes:

  • All system delays (I/O waits)
  • Network latency
  • Time spent sleeping or blocked while other processes run

Note that GridEngine's wallclock accounting covers the job's execution only; time spent waiting in the queue before the job starts is tracked separately.

Example scenario where wall time differs significantly:

# Python script comparing wall time with CPU time
import time

wall_start = time.time()
cpu_start = time.process_time()
time.sleep(10)  # Artificial delay: wall time advances, CPU time barely does
wall_end = time.time()
cpu_end = time.process_time()
print(f"Wall time: {wall_end - wall_start:.2f} seconds")
print(f"CPU time:  {cpu_end - cpu_start:.2f} seconds")

User time accounts for the CPU time spent executing your application's code in user space. Key characteristics:

  • Excludes system calls and kernel operations
  • Can exceed wall time for multi-threaded programs on multi-core systems
  • Best for single-threaded performance comparison

Example using Linux time command:

$ time -p my_application
real 4.23  # Wall time
user 3.78  # User time
sys 0.32   # System time

CPU time is the total processor time consumed: user time plus system time (time spent in the kernel on the process's behalf):

  • Includes system calls and I/O operations
  • For multi-threaded apps, can exceed wall time
  • Important for analyzing system call overhead

Demonstration in C:

#include <stdio.h>
#include <sys/times.h>
#include <unistd.h>

int main(void) {
    struct tms t;
    times(&t);
    long ticks = sysconf(_SC_CLK_TCK);  // clock ticks per second
    printf("user CPU time:   %.2f s\n", (double)t.tms_utime / ticks);
    printf("system CPU time: %.2f s\n", (double)t.tms_stime / ticks);
    return 0;
}

For benchmarking applications:

Metric       Best Use Case                   Limitations
Wall Clock   End-to-end system performance   Affected by external factors
User Time    Algorithm efficiency            Misses system overhead
CPU Time     Total resource usage            Hard to isolate app-specific cost

Practical GridEngine monitoring example:

# qacct output analysis
qacct -j job_id | egrep "wallclock|ru_utime|ru_stime"
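The same ru_utime/ru_stime split that qacct reports is also visible from inside a running process via getrusage; a minimal Python sketch (the workload is a placeholder):

```python
import resource

# Placeholder user-space work so ru_utime is non-zero
total = sum(i * i for i in range(1_000_000))

usage = resource.getrusage(resource.RUSAGE_SELF)
print(f"ru_utime: {usage.ru_utime:.3f}s")  # user CPU time
print(f"ru_stime: {usage.ru_stime:.3f}s")  # system CPU time
```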

When dealing with parallel jobs:

# MPI job with 4 processes might show:
Wallclock: 60s
User Time: 220s (4 cores × 55s each)
CPU Time: 240s (includes system overhead)

For accurate comparisons:

  1. Run multiple iterations
  2. Use dedicated test nodes
  3. Monitor system load averages
  4. Consider using dedicated benchmarking tools like HPL
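Point 1 can be sketched with Python's timeit module, which repeats a measurement so outliers caused by other processes can be discarded (the workload function here is a placeholder):

```python
import timeit

def workload():
    # Placeholder for the code under test
    sum(i * i for i in range(100_000))

# Repeat the measurement; the minimum is least affected by
# interference from other processes on the node
samples = timeit.repeat(workload, number=10, repeat=5)
print(f"best of 5 runs: {min(samples):.4f}s for 10 calls")
```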

Relating these metrics back to GridEngine's own accounting records, qacct reports all three under separate fields:

# Sample GridEngine output format
qacct -j job_id | egrep "wallclock|ru_utime|ru_stime"
wallclock   00:05:23
ru_utime    00:03:45
ru_stime    00:01:12

Wall clock time (also called elapsed time) measures the total duration from job start to completion, including all system overhead and waiting periods. This is what you'd measure with a physical stopwatch.

Example scenario: A Python script that makes API calls:

import time
import requests  # third-party HTTP library

start = time.time()
# API call: wall time includes network latency, but little CPU is used
response = requests.get('https://api.example.com/data')
end = time.time()
print(f"Wall time: {end - start:.2f}s")

User CPU time (ru_utime) accounts for time spent executing application code in user space. Multiple threads/cores can cause this to exceed wall time.

// C program measuring CPU time with clock()
#include <stdio.h>
#include <time.h>

int main() {
    clock_t start = clock();
    // CPU-intensive loop; volatile keeps the compiler from optimizing it away
    for (volatile long i = 0; i < 1000000000; i++);
    clock_t end = clock();
    // clock() reports the process's CPU time, not wall time
    printf("CPU time: %.2f sec\n",
           (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}

Total CPU time (user + system time) represents all processor cycles consumed. System time (ru_stime) covers kernel operations like I/O handling.

Key differences in database operations:

  • Wall time: Includes connection latency
  • User time: Query processing only
  • System time: Disk I/O operations
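The I/O contribution to system time can be observed with os.times(); a sketch that writes a temporary file (the byte counts are arbitrary):

```python
import os
import tempfile

before = os.times()
# Writing data forces kernel work, which accrues as system time
with tempfile.NamedTemporaryFile(delete=True) as f:
    for _ in range(50):
        f.write(os.urandom(1024 * 1024))  # 1 MiB of random bytes
    f.flush()
after = os.times()

print(f"user:   {after.user - before.user:.2f}s")
print(f"system: {after.system - before.system:.2f}s")
```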

For comparing application performance:

  1. Algorithm efficiency: Use user CPU time (eliminates system variance)
  2. Real-world impact: Wall clock time matters for SLA compliance
  3. Resource usage: Total CPU time reveals optimization opportunities

Advanced analysis example using GNU time (/usr/bin/time):

$ /usr/bin/time -v python script.py
Command being timed: "python script.py"
User time (seconds): 12.45
System time (seconds): 1.87
Percent of CPU this job got: 198%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.21

When submitting jobs with qsub, be aware that:

  • Wall time limits are enforced via -l h_rt=HH:MM:SS
  • CPU time accounting depends on scheduler configuration
  • Multi-core jobs aggregate CPU time across all cores