When analyzing job performance in GridEngine (now known as Univa Grid Engine or Altair Grid Engine), you'll encounter three fundamental time measurements:
# Sample GridEngine output format
=============================================
Wallclock Time: 00:05:23
User Time: 00:03:41
CPU Time: 00:04:12
=============================================
Wall clock time (or elapsed time) represents the actual time taken from job start to completion, as if you were timing it with a physical stopwatch. This includes:
- All system delays (I/O waits)
- Network latency
- Queue waiting time
- Other processes' interference
Example scenario where wall time differs significantly from CPU time:
# Python script with sleep: the delay inflates wall time but consumes no CPU
import time

start = time.time()
# ... computation ...
time.sleep(10)  # artificial delay: counted in wall time, not CPU time
# ... more computation ...
end = time.time()
print(f"Wall time: {end - start:.2f} seconds")
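To make the gap visible directly, here is a minimal sketch contrasting `time.time()` with `time.process_time()`, which counts only CPU seconds used by the process and therefore ignores the sleep:

```python
# time.process_time() measures CPU time only, so the sleep below is
# invisible to it while the wall-clock measurement includes it fully.
import time

wall_start = time.time()
cpu_start = time.process_time()

time.sleep(2)                             # consumes wall time, no CPU
total = sum(i * i for i in range(10**6))  # consumes CPU time

wall = time.time() - wall_start
cpu = time.process_time() - cpu_start
print(f"Wall: {wall:.2f}s  CPU: {cpu:.2f}s")
```

Running this shows a wall time of at least two seconds while the CPU time stays a small fraction of that.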
User time accounts for the CPU time spent executing your application's code in user space. Key characteristics:
- Excludes time spent in the kernel (system calls, I/O handling)
- Multi-threaded jobs on multi-core systems may report values exceeding wall time
- Best for single-threaded performance comparison
Example using Linux time command:
$ time -p my_application
real 4.23 # Wall time
user 3.78 # User time
sys 0.32 # System time
CPU time (user time plus system time) covers all processor cycles charged to the process, including time spent in kernel operations on its behalf:
- Includes system calls and I/O operations
- For multi-threaded apps, can exceed wall time
- Important for analyzing system call overhead
Demonstration in C:
#include <stdio.h>
#include <sys/times.h>
#include <unistd.h>

int main(void) {
    struct tms t;
    times(&t);
    long ticks = sysconf(_SC_CLK_TCK);  /* clock ticks per second */
    printf("User CPU time:   %.2f s\n", (double)t.tms_utime / ticks);
    printf("System CPU time: %.2f s\n", (double)t.tms_stime / ticks);
    return 0;
}
For benchmarking applications:
| Metric | Best Use Case | Limitations |
|---|---|---|
| Wall Clock | End-to-end system performance | Affected by external factors |
| User Time | Algorithm efficiency | Misses system overhead |
| CPU Time | Total resource usage | Hard to isolate app-specific cost |
Practical GridEngine monitoring example:
# qacct output analysis
qacct -j job_id | egrep "wallclock|ru_utime|ru_stime"
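If you want to post-process that output programmatically, here is a small sketch; the field values below are made up for illustration (real qacct reports `ru_wallclock`, `ru_utime`, and `ru_stime` in seconds), and CPU efficiency is computed as (ru_utime + ru_stime) / ru_wallclock:

```python
# Hypothetical qacct-style "key value" lines; values are illustrative only.
sample = """\
ru_wallclock 323
ru_utime 221.45
ru_stime 72.10
"""

fields = {}
for line in sample.splitlines():
    key, value = line.split()
    fields[key] = float(value)

# Fraction of elapsed time the job actually spent on a CPU.
efficiency = (fields["ru_utime"] + fields["ru_stime"]) / fields["ru_wallclock"]
print(f"CPU efficiency: {efficiency:.1%}")
```

A low ratio here usually points at I/O waits or queue-level interference rather than a slow algorithm.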
When dealing with parallel jobs:
# MPI job with 4 processes might show:
Wallclock: 60s
User Time: 220s (4 cores × 55s each)
CPU Time: 240s (includes system overhead)
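The arithmetic for reading those numbers: dividing the summed user time by cores × wall time gives a rough parallel CPU efficiency. A sketch using the figures above:

```python
# Rough parallel-efficiency estimate for the 4-process MPI example above.
wallclock = 60.0   # elapsed seconds
user_time = 220.0  # user CPU seconds summed over all 4 ranks
cores = 4

efficiency = user_time / (cores * wallclock)  # 220 / 240
print(f"Parallel CPU efficiency: {efficiency:.1%}")
```

Values well below 100% suggest ranks are idling (load imbalance or communication waits).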
For accurate comparisons:
- Run multiple iterations
- Use dedicated test nodes
- Monitor system load averages
- Consider using dedicated benchmarking tools like HPL
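The "run multiple iterations" advice can be sketched with the standard library's `timeit`, reporting the minimum (the least-disturbed run) alongside the median:

```python
# Repeat a workload several times: the minimum approximates the
# interference-free cost, the median is a robust typical value.
import statistics
import timeit

def workload():
    return sum(i * i for i in range(200_000))

runs = timeit.repeat(workload, number=1, repeat=7)
print(f"min: {min(runs):.4f}s  median: {statistics.median(runs):.4f}s")
```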
Looking at each metric in more depth: in qacct accounting output, the three appear as separate fields:
wallclock 00:05:23
ru_utime 00:03:45
ru_stime 00:01:12
Wall clock time (also called elapsed time) measures the total duration from job start to completion, including all system overhead and waiting periods. This is what you'd measure with a physical stopwatch.
Example scenario: A Python script that makes API calls:
import time
import requests  # third-party HTTP library: pip install requests

start = time.time()
# API call with network latency
response = requests.get('https://api.example.com/data')
end = time.time()
print(f"Wall time: {end - start:.2f}s")
User CPU time (ru_utime) accounts for time spent executing application code in user space. Multiple threads/cores can cause this to exceed wall time.
// C program: clock() reports CPU time consumed by the process
#include <stdio.h>
#include <time.h>

int main(void) {
    clock_t start = clock();
    // CPU-intensive user-space work (volatile keeps the loop from
    // being optimized away)
    for (volatile long i = 0; i < 1000000000L; i++);
    clock_t end = clock();
    printf("CPU time: %.2f sec\n",
           (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}
Total CPU time (user + system time) represents all processor cycles consumed. System time (ru_stime) covers kernel operations like I/O handling.
Key differences in database operations:
- Wall time: Includes connection latency
- User time: Query processing only
- System time: Disk I/O operations
For comparing application performance:
- Algorithm efficiency: Use user CPU time (eliminates system variance)
- Real-world impact: Wall clock time matters for SLA compliance
- Resource usage: Total CPU time reveals optimization opportunities
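All three views can be captured in one place with Python's `os.times()`, whose result exposes `user`, `system`, and `elapsed` fields (a sketch; field semantics are most reliable on Unix):

```python
# Measure user, system, and wall time around one code section.
import os

t0 = os.times()
total = sum(i * i for i in range(10**6))  # workload
t1 = os.times()

print(f"user:    {t1.user - t0.user:.2f}s")
print(f"system:  {t1.system - t0.system:.2f}s")
print(f"elapsed: {t1.elapsed - t0.elapsed:.2f}s")
```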
Advanced analysis example using the GNU time command:
$ /usr/bin/time -v python script.py
Command being timed: "python script.py"
User time (seconds): 12.45
System time (seconds): 1.87
Percent of CPU this job got: 198%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.21
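As a sanity check, the "Percent of CPU" line is just (user + system) / elapsed; assuming the percentage is truncated rather than rounded, the reported figures reproduce it exactly:

```python
# Figures taken from the time -v output above.
user, system, elapsed = 12.45, 1.87, 7.21
percent = int((user + system) / elapsed * 100)  # truncated percentage
print(f"{percent}%")
```

Values near or above 200% confirm the process kept roughly two cores busy for its whole run.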
When submitting jobs with qsub, be aware that:
- Wall time limits are enforced via -l h_rt=HH:MM:SS
- CPU time accounting depends on scheduler configuration
- Multi-core jobs aggregate CPU time across all cores
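Putting those pieces together, a hypothetical submission requesting a wall-time limit and multiple slots might look like this (the parallel environment name "smp" is site-specific, not a GridEngine default):

```shell
# Request a 1-hour wall-time limit and 4 slots in a PE named "smp".
qsub -l h_rt=01:00:00 -pe smp 4 my_job.sh
```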