When your Solaris server shows high load averages (54-63) while CPU utilization remains under 5%, you're facing a classic performance puzzle. The key indicators from your diagnostic outputs reveal:
# prstat -Z shows
Total: 135 processes, 3167 lwps, load averages: 54.48, 62.50, 63.11
Yet vmstat reports 92-98% idle time. This discrepancy points to resource contention that isn't CPU-bound.
Examining your Java processes (WebLogic domains) shows interesting thread patterns:
java/225 # One domain with 225 threads
java/209 # Another with 209 threads
...
# Total LWPs (light-weight processes): 3167
The high thread count suggests possible:
- Thread pool saturation
- Lock contention (a minimal reproducer follows this list)
- Blocked I/O operations
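To make the lock-contention case concrete, here is a minimal, hypothetical reproducer (the class and thread names are invented, not taken from your domains): a couple of hundred threads serialize on a single monitor, so CPU stays nearly idle while a thread dump shows almost all of them BLOCKED on the same lock. That is the signature to look for in the thread dumps suggested further down.
// ContentionDemo.java - hypothetical reproducer, not code from your application.
// 200 threads serialize on one monitor: CPU stays near idle, but thread dumps
// show most of them BLOCKED waiting for the same lock.
public class ContentionDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 200; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        synchronized (LOCK) {
                            try {
                                Thread.sleep(50); // slow work (or blocked I/O) held under the lock
                            } catch (InterruptedException e) {
                                return;
                            }
                        }
                    }
                }
            }, "worker-" + i);
            t.setDaemon(true);
            t.start();
        }
        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive for observation
    }
}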
Your netstat -i output shows healthy network throughput without errors:
    input   aggr26    output          input  (Total)    output
packets errs  packets errs  colls   packets errs  packets errs  colls
1500233798 0  1489316495 0   0       3608008314 0  3586173708 0   0
This rules out network bottlenecks as the primary issue.
To pinpoint Java-level issues, run these commands for each WebLogic domain:
# Get thread dumps (run 3-5 times at 10s intervals)
kill -3 <java_pid>
# Check GC behavior
jstat -gcutil <java_pid> 1000 10
Example output; watch for the permanent generation (P) near 100% and a climbing full-GC count and time (FGC/FGCT):
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 96.88 63.78 41.63 99.99 1309 75.021 5 3.104 78.125
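If you want to automate what the kill -3 dumps show, the standard java.lang.management API (available in Java 1.6, no WebLogic classes needed) can count BLOCKED threads from inside the JVM. A minimal sketch, intended to run inside the JVM being diagnosed (for example from a servlet or startup class):
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Reports threads that are BLOCKED and the monitor (and owner) they wait on.
public class BlockedThreadReport {
    public static void main(String[] args) {
        ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = tmx.getThreadInfo(tmx.getAllThreadIds());
        int blocked = 0;
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
                System.out.println(info.getThreadName()
                        + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        System.out.println("BLOCKED threads: " + blocked + " of " + infos.length);
    }
}
Many threads repeatedly blocked on the same lock name is the pattern that matches the high LWP counts above.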
Common configuration issues in WebLogic 10.3.5 that could cause this:
- Work manager thread constraints (max-threads-constraint):
  <max-threads-constraint>
    <name>MaxThreadsConstraint</name>
    <count>100</count>
  </max-threads-constraint>
- Stuck Thread Detection:
  <stuck-thread-max-time>600</stuck-thread-max-time>
  <stuck-thread-timer-interval>60</stuck-thread-timer-interval>
Since your apps connect to Oracle, check connection pool usage:
# In WebLogic console:
JDBCDataSourceRuntimeMBean.getWaitingForConnectionCurrentCount()
JDBCDataSourceRuntimeMBean.getWaitingForConnectionHighCount()
Sample problematic pattern:
# High wait counts indicate pool saturation
WaitingForConnectionCurrentCount: 142
WaitingForConnectionHighCount: 150
Solaris maps each Java thread to its own LWP and enforces per-process and per-project resource limits, so with 3,167 LWPs on the box these tunables are worth reviewing:
# /etc/system parameters
set max_nprocs=30000
set maxuprc=15000
set rlim_fd_max=8192
set rlim_fd_cur=4096
Also verify project limits for WebLogic users:
prctl -n project.max-lwps -i project <project_id>
For deeper analysis, use these Solaris tools:
# Show system call activity
dtrace -n 'syscall:::entry { @[execname] = count(); }'
# Monitor thread state distribution
prstat -mLc -p <java_pid> 5
Example thread state breakdown:
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
3836 ducm0101 0.1 0.0 0.0 0.0 0.0 2.1 0.0 97.8 15k 5.2k 0 0 java/1
3836 ducm0101 0.0 0.0 0.0 0.0 0.0 98.2 0.0 1.8 2.1k 1.3k 0 0 java/225
In the output above, java/225 spends about 98% of its time in LCK (waiting on user-level locks), which is the microstate signature of lock contention rather than CPU starvation. Recommended next steps:
- Capture 5 thread dumps at 10-second intervals
- Verify JDBC connection pool sizing
- Check for filesystem waits (NFS mounts?)
- Monitor Solaris kernel dispatcher activity
- Review WebLogic work manager configurations
To restate the core symptom: load averages spiking to 50-60 while CPU usage stays below 5% is a classic case of resource contention that isn't immediately visible through standard monitoring tools. The key indicators from your diagnostic outputs:
// prstat shows multiple Java processes with hundreds of threads
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
3836 ducm0101 2119M 2074M cpu348 58 0 8:41:56 0.5% java/225
24196 ducm0101 1974M 1910M sleep 59 0 4:04:33 0.4% java/209
With 8 WebLogic domains running Java applications, thread synchronization becomes the prime suspect. The high load average indicates threads waiting on:
- Database connection pools (despite being on a separate server)
- JVM garbage collection pauses (even without visible CPU spikes; see the GC-time check after this list)
- Application-level locks or synchronized blocks
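To check the GC-pause suspicion without restarting the domains with extra flags, the cumulative collector times are exposed by the standard GarbageCollectorMXBean. A minimal sketch using only the JDK 1.6 API; run it inside the JVM in question and sample it twice a few minutes apart, comparing the deltas:
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints cumulative GC counts and times per collector. A fast-growing time for
// the old-generation collector points at GC pressure even when CPU looks idle.
public class GcTimeCheck {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
    }
}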
Generic top/vmstat-style checks won't reveal the full picture. Try these Solaris-specific approaches:
# Check for thread blocking in JVMs
dtrace -n 'profile-97 /pid == $target/ { @[ustack()] = count(); }' -p <java_pid>
# Per-CPU statistics aggregated by processor set; watch smtx (mutex spins) and icsw (involuntary context switches)
mpstat -a 15
Add these JMX queries to your monitoring:
// WebLogic JDBC pool runtime MBean; AdminServer and YOUR_POOL are placeholders,
// and mbeanServer is an MBeanServerConnection (e.g. obtained as sketched below)
ObjectName poolName = new ObjectName("com.bea:ServerRuntime=AdminServer,Name=YOUR_POOL,Type=JDBCConnectionPoolRuntime");
Integer waiting = (Integer) mbeanServer.getAttribute(poolName, "WaitingForConnectionCurrentCount");
Integer active = (Integer) mbeanServer.getAttribute(poolName, "ActiveConnectionsCurrentCount");
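The snippet above assumes an existing mbeanServer connection. One way to obtain it remotely in WebLogic 10.3 is through the Runtime MBean Server over t3; the host, port, and credentials below are placeholders, and wljmxclient.jar (or wlfullclient.jar) must be on the client classpath:
import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.naming.Context;

// Opens a JMX connection to one managed server's Runtime MBean Server.
public final class WlsRuntimeConnection {
    public static MBeanServerConnection connect(String host, int port,
            String user, String password) throws Exception {
        JMXServiceURL url = new JMXServiceURL("t3", host, port,
                "/jndi/weblogic.management.mbeanservers.runtime");
        Hashtable<String, Object> env = new Hashtable<String, Object>();
        env.put(Context.SECURITY_PRINCIPAL, user);        // e.g. the weblogic admin user
        env.put(Context.SECURITY_CREDENTIALS, password);  // placeholder credentials
        env.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES, "weblogic.management.remote");
        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        return connector.getMBeanServerConnection();
    }
}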
Your Java 1.6 setup might benefit from these JVM flags:
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
Despite netstat showing no errors, add these checks:
# Check for TCP retransmissions (the counters are named tcpRetransSegs/tcpRetransBytes)
netstat -s -P tcp | grep -i retrans
-- Average elapsed time per execution for frequently run SQL (run on the Oracle instance)
SELECT sql_id, elapsed_time/executions/1000 "ms_per_exec"
FROM v$sqlarea
WHERE executions > 1000
ORDER BY 2 DESC;
If the contention pattern is confirmed, consider these longer-term changes:
- Reduce the WebLogic domain count on this server from 8 to 4-5
- Implement request timeouts in your Java code (see the JDBC sketch below)
- Add connection pool validation queries
- Enable WebLogic work manager constraints
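For the request-timeout item above, the simplest lever in plain JDBC is a per-statement query timeout, so a slow Oracle call cannot pin a WebLogic execute thread indefinitely. A minimal sketch meant to run inside the container; the JNDI name and SQL are placeholders:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.sql.DataSource;

// Looks up a container-managed DataSource and bounds the query with a timeout
// so a slow database call releases the execute thread after 30 seconds.
public class BoundedQuery {
    public static void main(String[] args) throws Exception {
        DataSource ds = (DataSource) new InitialContext().lookup("jdbc/yourDataSource"); // placeholder JNDI name
        Connection con = ds.getConnection();
        try {
            PreparedStatement ps = con.prepareStatement("SELECT 1 FROM dual"); // placeholder query
            ps.setQueryTimeout(30); // seconds; the driver cancels the call when exceeded
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
            rs.close();
            ps.close();
        } finally {
            con.close();
        }
    }
}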