In Sun Grid Engine (SGE/Univa Grid Engine), job concurrency can be controlled at multiple levels. While most administrators focus on system-wide limits through queue configurations, user-specific throttling is equally important for I/O-intensive workloads.
The simplest approach is the -tc (task concurrency) option at submission time:
qsub -tc 100 massive_job_array.sh
This ensures no more than 100 tasks from the array will run simultaneously. However, this only works for job arrays.
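Note that -tc has no effect unless the job is actually an array, either via #$ -t inside the script or -t on the command line. A minimal sketch (range and script name are illustrative):
qsub -t 1-500 -tc 100 massive_job_array.sh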
For a more permanent solution, create a resource quota set (RQS), the standard SGE mechanism (6.1 and later) for per-user limits on the built-in slots consumable. Since qconf -arqs opens an interactive editor, scripted setups should write the definition to a file and load it with -Arqs:
# Quota rule capping dave at 100 concurrent slots
cat > /tmp/rqs_dave <<'EOF'
{
   name     limit_dave_slots
   enabled  TRUE
   limit    users dave to slots=100
}
EOF
qconf -Arqs /tmp/rqs_dave
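To confirm the rule is active (quota set name from the sketch above):
qconf -srqs limit_dave_slots
Once dave reaches the cap, additional jobs simply wait in qw state; if the scheduler's schedd_job_info parameter is enabled, qstat -j <job_id> names the quota rule holding them back.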
Alternatively, create a dedicated parallel environment (PE) whose slot pool caps everything running through it. Two gotchas: user_lists takes an access list name rather than a raw username, and qconf -ap opens an editor, so it is easier to clone an existing PE (the stock make PE ships with default installs) and load the result with -Ap:
# Put dave in an access list the PE can reference
qconf -au dave dave_users
# Clone the 'make' PE, rename it, cap it at 100 slots, restrict it to dave
qconf -sp make | sed -e 's/^pe_name.*/pe_name throttled_pe/' \
    -e 's/^slots.*/slots 100/' \
    -e 's/^user_lists.*/user_lists dave_users/' > /tmp/throttled_pe
qconf -Ap /tmp/throttled_pe
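A PE only throttles jobs that run inside it, so attach it to a queue and submit against it; a sketch, assuming the queue is all.q:
# Append the PE to all.q's pe_list, then request it at submission
qconf -aattr queue pe_list throttled_pe all.q
qsub -pe throttled_pe 1 job_script.sh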
For ad-hoc scenarios, a submission loop that throttles on the live job count works without any administrator configuration:
#!/bin/bash
MAX_JOBS=100
for i in {1..500}; do
    # Wait while dave already has MAX_JOBS jobs running
    # (match ' qw ' as well if you also want to bound the pending backlog)
    while [ "$(qstat -u dave | grep -c ' r ')" -ge "$MAX_JOBS" ]; do
        sleep 30
    done
    qsub "job_script_${i}.sh"
done
Check the active job count with:
qstat -u dave | grep -c ' r '
Or consult SGE's accounting for jobs that have already finished (dave's usage over the last day):
qacct -o dave -d 1
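To compare dave's consumption against everyone else's, qacct can also summarize per owner; a sketch covering the last week:
qacct -o -d 7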
The underlying risk bears emphasis: with I/O-intensive workloads, a single user submitting hundreds of jobs that simultaneously access shared storage can bottleneck the filesystem for every user on the cluster.
Administrators can also enforce slot limits with SGE's qconf -rattr command. Note that queue slots apply to the whole queue instance, not to one user, so this caps everyone scheduled onto the node; pair it with the resource quota set shown earlier when you need a true per-user limit:
# Cap all.q at 100 slots on each node
qconf -rattr queue slots 100 all.q@node001
qconf -rattr queue slots 100 all.q@node002
[... repeat for all nodes ...]
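Repeating that per node is tedious; a sketch that loops over every execution host, assuming each one hosts an all.q instance:
for host in $(qconf -sel); do
    qconf -rattr queue slots 100 "all.q@${host}"
done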
Consumable complex attributes offer more flexibility. A consumable is decremented by each running job that requests it, so a queue-level pool caps how many such jobs run at once:
# Dump the current complex list, append a consumable, and load it back
# Columns: name shortcut type relop requestable consumable default urgency
qconf -sc > /tmp/complex_attrs
echo "max_user_jobs max_user_jobs INT <= YES YES 0 0" >> /tmp/complex_attrs
qconf -Mc /tmp/complex_attrs
# Give all.q a pool of 100 units (note: the pool is queue-wide, not per user)
qconf -mattr queue complex_values max_user_jobs=100 all.q
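The pool only depletes when a job actually requests the consumable, so submissions need the matching -l flag:
# Each job below draws one unit from the 100-unit pool
qsub -l max_user_jobs=1 job_script.sh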
For users who want to self-limit their submissions, job arrays with throttling work well:
#!/bin/bash
# -tc caps this 500-task array at 100 concurrent tasks
# (keep #$ directive lines free of trailing comments; qsub parses them as options)
#$ -t 1-500
#$ -tc 100
#$ -q all.q
#$ -cwd
# Your actual job commands here
./process_data.sh "$SGE_TASK_ID"
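Some Grid Engine derivatives also accept -tc through qalter, letting you retune a live array; check whether your qalter man page lists it before relying on this:
# Tighten a running array to 50 concurrent tasks (version-dependent)
qalter -tc 50 <job_id>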
After setting limits, verify them with:
qstat -u dave -s r | tail -n +3 | wc -l   # Count dave's running jobs (skip the two header lines)
qconf -srqs                               # Review the active resource quota sets
qstat -f -explain a                       # View queue status with alarm explanations
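For a cluster-wide view, count running jobs per owner (the owner sits in column 4 of qstat's output):
qstat -u '*' -s r | tail -n +3 | awk '{print $4}' | sort | uniq -c | sort -rn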
For more sophisticated control, a cron job can tighten the quota when system load climbs, rewriting the resource quota set from earlier with qconf -Mrqs:
#!/bin/bash
# Halve dave's quota when shared-storage latency exceeds 50 ms.
# The device name and the awk field for 'await' vary between systems
# and iostat versions, so verify both on your own hosts.
FS_LATENCY=$(iostat -dx sdb 1 2 | awk '/^sdb/ {v=$10} END {print v}')
if (( $(echo "$FS_LATENCY > 50" | bc -l) )); then
    printf '{\n name limit_dave_slots\n enabled TRUE\n limit users dave to slots=50\n}\n' > /tmp/rqs_dave
    qconf -Mrqs /tmp/rqs_dave limit_dave_slots
    logger "Reduced dave's slot quota due to high I/O latency (${FS_LATENCY} ms)"
fi
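Run it from cron at whatever cadence suits your monitoring; the script path here is illustrative:
# Check filesystem latency every five minutes
*/5 * * * * /usr/local/sbin/throttle_dave_io.sh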