Advanced Linux System Administration Interview Questions for Senior DevOps Engineers



When testing senior candidates, I always include this classic scenario to assess their low-level system understanding: a user has launched a fork bomb. How do you contain it and prevent a repeat?

# The classic fork bomb
:(){ :|:& };:

# First response - ulimit adjustment
ulimit -u 500  # Limit user processes

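# Alternative method using cgroups (v1, libcgroup tools)
# cpu/memory limits alone do not cap the number of processes; the pids controller does
sudo cgcreate -g cpu,cpuacct,memory,pids:/forkbomb_group
sudo cgset -r cpu.shares=512 forkbomb_group
sudo cgset -r memory.limit_in_bytes=1G forkbomb_group
sudo cgset -r pids.max=500 forkbomb_group
sudo cgexec -g cpu,cpuacct,memory,pids:forkbomb_group bash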

A strong candidate should mention:

  • Process limits via ulimit (both hard and soft limits)
  • Cgroups for resource isolation
  • Kill strategies that cope with respawning (e.g., SIGSTOP every bomb process before SIGKILL)
  • Potential SELinux/AppArmor implications
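One kill strategy that copes with respawning is to send SIGSTOP to every offending process before any SIGKILL, so nothing can fork a replacement mid-cleanup. Tools such as `pkill` need a free process slot just to start, so the minimal sketch below sticks to bash builtins (globbing, `read`, `kill`). It assumes bash 4+ and a readable `/proc`; the function names and the `MATCHED_PIDS` array are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: freeze, then kill, every process whose real UID matches a target,
# using only bash builtins (globbing, read, kill) so no fork() is needed.

# Fill the global array MATCHED_PIDS with PIDs whose real UID equals $1.
collect_pids_for_uid() {
  local target_uid=$1 dir key ruid rest
  MATCHED_PIDS=()
  for dir in /proc/[0-9]*; do
    [[ -r $dir/status ]] || continue      # process may have exited already
    while read -r key ruid rest; do
      if [[ $key == Uid: ]]; then         # "Uid:" line: real, effective, saved, fs
        [[ $ruid == "$target_uid" ]] && MATCHED_PIDS+=("${dir#/proc/}")
        break
      fi
    done < "$dir/status"
  done
}

# SIGSTOP every match first so nothing can fork a replacement, then SIGKILL.
# The calling shell is skipped so it survives to finish the job.
freeze_then_kill_uid() {
  local pid targets=()
  collect_pids_for_uid "$1"
  for pid in "${MATCHED_PIDS[@]}"; do
    [[ $pid == "$$" || $pid == "$BASHPID" ]] || targets+=("$pid")
  done
  ((${#targets[@]})) || return 0
  kill -STOP "${targets[@]}" 2>/dev/null
  kill -KILL "${targets[@]}" 2>/dev/null
}
```

Run it from a root shell that is already open, e.g. `freeze_then_kill_uid 1001` with the offending user's numeric UID (1001 is a placeholder). Calling it through `$(...)` or a pipeline would reintroduce forks.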

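The next scenario, recovering an Apache log file that was deleted while the server still holds it open, tests filesystem and file descriptor knowledge: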

# Find Apache's PID
ps aux | grep apache2

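# Check open file descriptors (root is needed to inspect another user's process)
sudo ls -l /proc/<PID>/fd | grep log

# Confirm which descriptor points at the deleted file (example: PID 1234)
sudo lsof -p 1234 | grep deleted

# Recovery method (example: PID 1234, fd 7); cp follows the /proc link to the still-open inode
sudo cp /proc/1234/fd/7 /var/log/apache2/error.log.recovered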
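Where `lsof` is not installed, the same "deleted but still open" check can be run directly against `/proc`, since each `/proc/PID/fd/N` link target ends in ` (deleted)` once the file is unlinked. A minimal sketch in bash; the function name is hypothetical, and without root it only sees your own processes.

```bash
#!/usr/bin/env bash
# Sketch: list open file descriptors that point at deleted files,
# similar in spirit to `lsof +L1`, by reading the /proc/PID/fd symlinks.
# Output: PID<TAB>FD<TAB>target
list_deleted_open_files() {
  local fdpath target pid
  for fdpath in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fdpath" 2>/dev/null) || continue   # fd closed or no permission
    [[ $target == /*' (deleted)' ]] || continue             # only unlinked paths
    [[ $target == /memfd:* ]] && continue                   # skip anonymous memfd objects
    pid=${fdpath#/proc/}
    pid=${pid%%/*}
    printf '%s\t%s\t%s\n' "$pid" "${fdpath##*/}" "$target"
  done
}
```

As root, `list_deleted_open_files | grep apache` narrows the output to Apache's logs. It calls `readlink` once per descriptor, so it is slower than `lsof` on busy hosts.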

I often present this network stack challenge:

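# When TCP connections are stuck in CLOSE_WAIT
# (the peer has closed its side, but the local application has not called close())
netstat -tnp | grep CLOSE_WAIT

# Deep investigation steps
sudo ss -temop
awk '$4 == "08"' /proc/net/tcp  # "st" column: 08 = CLOSE_WAIT (01 is ESTABLISHED)
sudo strace -p <PID> -e trace=network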
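Comparing the `st` column of `/proc/net/tcp` exactly, rather than searching for a substring, avoids false matches, and decoding the hex addresses makes the output readable. A minimal sketch, assuming a little-endian host (x86-64, arm64), where IPv4 addresses appear with their bytes reversed while ports are plain big-endian hex; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list CLOSE_WAIT sockets from /proc/net/tcp with readable addresses.

# Map the hex "st" column to a TCP state name.
tcp_state_name() {
  case $1 in
    01) echo ESTABLISHED ;; 02) echo SYN_SENT ;;  03) echo SYN_RECV ;;
    04) echo FIN_WAIT1 ;;   05) echo FIN_WAIT2 ;; 06) echo TIME_WAIT ;;
    07) echo CLOSE ;;       08) echo CLOSE_WAIT ;; 09) echo LAST_ACK ;;
    0[Aa]) echo LISTEN ;;   0[Bb]) echo CLOSING ;; *) echo "UNKNOWN($1)" ;;
  esac
}

# Decode "0100007F:0050" -> "127.0.0.1:80" (little-endian host assumed).
decode_ipv4() {
  local hex=${1%%:*} port=${1##*:}
  printf '%d.%d.%d.%d:%d\n' "0x${hex:6:2}" "0x${hex:4:2}" "0x${hex:2:2}" "0x${hex:0:2}" "0x$port"
}

# Print "local -> remote inode=N" for each CLOSE_WAIT (st == 08) socket.
list_close_wait() {
  local file=${1:-/proc/net/tcp} laddr raddr inode
  awk 'NR > 1 && $4 == "08" { print $2, $3, $10 }' "$file" |
    while read -r laddr raddr inode; do
      printf '%s -> %s inode=%s\n' "$(decode_ipv4 "$laddr")" "$(decode_ipv4 "$raddr")" "$inode"
    done
}
```

The inode can then be matched against `socket:[N]` links under `/proc/*/fd` to find the owning process; `ss -tnp state close-wait` gives a similar view directly.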

Senior engineers should understand these tools:

# OOM killer analysis
dmesg | grep -i oom

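# Kernel parameter tuning: panic on a system-wide OOM instead of running the OOM killer,
# then reboot automatically 10 seconds after the panic
sysctl -w vm.panic_on_oom=1
sysctl -w kernel.panic=10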

# Advanced profiling with perf
perf top -p <PID>
perf stat -e context-switches -p <PID>
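To turn raw kernel log output into a list of OOM-killer victims, the kernel's report line can be parsed. A minimal sketch, assuming the message format `Out of memory: Killed process PID (name)` printed by recent kernels (older kernels print `Kill process`); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: extract "PID name" for every OOM-killer victim from kernel log text.
# Reads files given as arguments, or stdin (e.g. `dmesg | summarize_oom_kills`).
summarize_oom_kills() {
  sed -nE 's/.*Out of memory: Kill(ed)? process ([0-9]+) \(([^)]*)\).*/\2 \3/p' "$@"
}

# Count victims per process name, most frequent first.
oom_victims_by_name() {
  summarize_oom_kills "$@" | awk '{ count[$2]++ } END { for (n in count) print count[n], n }' | sort -rn
}
```

`journalctl -k | oom_victims_by_name` works the same way on systemd hosts and covers messages that have already rotated out of the dmesg ring buffer.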

Security checks are essential for senior roles:

# Verify ASLR status (0 = disabled, 1 = partial, 2 = full randomization)
cat /proc/sys/kernel/randomize_va_space

# List SUID binaries and audit them for unexpected entries
find / -perm -4000 -type f -exec ls -ld {} + 2>/dev/null

# SELinux troubleshooting
sealert -a /var/log/audit/audit.log
ausearch -m avc -ts recent
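A candidate might turn these checks into a repeatable audit. In the minimal sketch below, `describe_aslr` interprets the `randomize_va_space` value as described in the kernel's sysctl documentation, and `new_suid_binaries` reports SUID files missing from a previously saved baseline list; the baseline file and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: interpret the ASLR setting and diff SUID binaries against a baseline.

# Explain /proc/sys/kernel/randomize_va_space values.
describe_aslr() {
  case $1 in
    0) echo "disabled: no address randomization" ;;
    1) echo "partial: stack, mmap base and VDSO randomized" ;;
    2) echo "full: partial plus heap (brk) randomization" ;;
    *) echo "unknown value: $1" ;;
  esac
}

# Print SUID files under $1 that are missing from the baseline list in $2
# (one absolute path per line, e.g. saved from an earlier run on a clean system).
new_suid_binaries() {
  local root=$1 baseline=$2
  comm -13 <(sort "$baseline") <(find "$root" -xdev -type f -perm -4000 2>/dev/null | sort)
}
```

Typical use: save `find / -xdev -type f -perm -4000 | sort > /root/suid.baseline` once on a known-good system, then run `new_suid_binaries / /root/suid.baseline` periodically. `-xdev` keeps the scan on one filesystem, so repeat it for separately mounted ones.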

A strong candidate should be able to explain what each of these tuning changes does and when it is appropriate:

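# Disk I/O tuning
echo mq-deadline > /sys/block/sda/queue/scheduler  # "deadline" on older single-queue (non-blk-mq) kernels
blockdev --setra 4096 /dev/sda                     # readahead in 512-byte sectors: 4096 = 2 MiB

# Network stack optimization
sysctl -w net.core.somaxconn=4096
sysctl -w net.ipv4.tcp_fin_timeout=30

# Memory management
echo 1 > /proc/sys/vm/overcommit_memory  # 0 = heuristic (default), 1 = always overcommit, 2 = strict accounting
echo 80 > /proc/sys/vm/dirty_ratio       # % of available memory that may be dirty before writers must flush it themselves (default 20)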
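Two quick checks help a candidate explain the disk settings: the scheduler file lists every available scheduler with the active one in brackets, and `blockdev --setra` counts in 512-byte sectors. A minimal sketch; the function names are hypothetical and the default device `sda` is only an example.

```bash
#!/usr/bin/env bash
# Sketch: read the active I/O scheduler and convert readahead units.

# The scheduler file looks like "mq-deadline kyber [bfq] none"; print the bracketed one.
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/block/sda/queue/scheduler}"
}

# blockdev --setra/--getra use 512-byte sectors; /sys/block/*/queue/read_ahead_kb uses KiB.
ra_sectors_to_kib() {
  echo $(( $1 * 512 / 1024 ))
}
```

For example, `ra_sectors_to_kib "$(blockdev --getra /dev/sda)"` should agree with `cat /sys/block/sda/queue/read_ahead_kb`.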

When interviewing senior Linux administrators, the fork bomb question tests both theoretical knowledge and practical crisis management skills. Here's a deep dive into the solution:

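# Prevention (before it happens):
# Edit /etc/security/limits.conf to add:
*       hard    nproc    500
root    hard    nproc    500
# Note: wildcard entries never apply to root, and the kernel skips RLIMIT_NPROC for root
# (and for CAP_SYS_ADMIN/CAP_SYS_RESOURCE), so contain root-owned bombs with cgroups instead

# Recovery (when already logged in):
# Method 1: exec replaces the current shell without forking, so it works on a full process table
exec /bin/bash
ulimit -S -u 100
pkill -STOP -u [username]   # freeze first so killed processes cannot be replaced
pkill -KILL -u [username]

# Method 2: Alternative approach using cgroups; pids.max is the setting that caps process count
cgcreate -g cpu,memory,pids:/forkbomb_limit
cgset -r cpu.cfs_quota_us=50000 forkbomb_limit   # 50% of one CPU at the default 100 ms period
cgset -r memory.limit_in_bytes=512M forkbomb_limit
cgset -r pids.max=100 forkbomb_limit
cgclassify -g cpu,memory,pids:forkbomb_limit $(pgrep -u [username])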
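To confirm which process cap actually applies to the current session, a candidate can compare the rlimit with the cgroup `pids.max`. A minimal sketch, assuming a cgroup v2 (unified) hierarchy mounted at `/sys/fs/cgroup`; on cgroup v1 or hybrid hosts it reports `unknown`, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: show the effective process caps for the current shell.
show_process_caps() {
  local cg max
  # On cgroup v2, /proc/self/cgroup contains a single "0::/path" line.
  cg=$(awk -F: '$1 == "0" { print $3 }' /proc/self/cgroup)
  max=$(cat "/sys/fs/cgroup${cg}/pids.max" 2>/dev/null) || max=unknown
  printf 'rlimit nproc: soft=%s hard=%s\n' "$(ulimit -S -u)" "$(ulimit -H -u)"
  printf 'cgroup pids.max: %s\n' "$max"
}
```

On systemd hosts, `systemd-run --scope -p TasksMax=100 bash` (as root) is another way to start a shell whose scope carries a `pids.max` of 100; running `show_process_caps` inside it should then report that value.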

The deleted log scenario tests a candidate's understanding of Linux file handling and process management. The key lies in recognizing that a deleted file's data stays on disk, and its space is not freed, for as long as a process keeps an open handle to it.

# Step 1: Find the process holding the file descriptor
lsof | grep '(deleted)' | grep apache

# Sample output:
httpd   1234 apache    4w   REG    8,3    1204   123456 /var/log/apache/access.log (deleted)

# Step 2: Recover the content
# Method A: Copy directly from /proc
cat /proc/1234/fd/4 > /var/log/apache/access.log.recovered

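# Method B: Use gdb to copy the data from inside the live process.
# sendfile(out_fd, in_fd, offset, count): fd 4 sits at EOF, so rewind it first (harmless if it
# is O_APPEND, as Apache log fds are); $1 is creat()'s fd, 1204 the size reported by lsof.
gdb -p 1234
(gdb) call (int) creat("/var/log/apache/access.log.new", 0644)
(gdb) call (long) lseek(4, 0, 0)
(gdb) call (long) sendfile($1, 4, 0, 1204)
(gdb) call (int) close($1)
(gdb) detach
(gdb) quit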
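Method A can be wrapped so it first confirms that the descriptor really points at a deleted file and then reports how much was recovered. A minimal sketch with a hypothetical function name; for a log that is still being written, the copy is only a snapshot up to the moment `cp` reads it.

```bash
#!/usr/bin/env bash
# Sketch: copy a deleted-but-open file out through /proc/PID/fd/FD.
# Usage: recover_deleted_fd PID FD DEST
recover_deleted_fd() {
  local pid=$1 fd=$2 dest=$3 link
  link=$(readlink "/proc/$pid/fd/$fd") || return 1
  if [[ $link != *' (deleted)' ]]; then
    echo "fd $fd of PID $pid is not a deleted file: $link" >&2
    return 1
  fi
  # Opening the /proc link reaches the original inode even though its name is gone.
  cp "/proc/$pid/fd/$fd" "$dest" || return 1
  echo "recovered $(stat -c %s "$dest") bytes of ${link%' (deleted)'} into $dest"
}
```

With the sample `lsof` output, the call would be `recover_deleted_fd 1234 4 /var/log/apache/access.log.recovered`, run as root.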

Here are three more challenging scenarios to consider during interviews:

# 1. Debugging a stuck NFS mount
grep -i nfs /var/log/messages
umount -f -l /mnt/nfs
cat /proc/fs/nfsfs/servers
cat /proc/fs/nfsfs/volumes

# 2. Diagnosing memory leaks
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./your_program
pmap -x [PID]
grep -i commit /proc/meminfo

# 3. Troubleshooting DNS resolution
dig +trace example.com @8.8.8.8
tcpdump -i eth0 -n port 53
strace -e trace=network -p [PID_of_process]
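For the memory-leak scenario, a candidate might first sample a production process's resident set size over time before reaching for valgrind; steady growth under a constant workload is the classic leak signature. A minimal sketch reading `VmRSS` from `/proc/PID/status`; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: sample a process's resident memory to spot leak-like growth.

# Print VmRSS of PID $1 in KiB (empty for kernel threads, which have no VmRSS).
rss_kib() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print "epoch_seconds rss_kib" lines: $1 = PID, $2 = samples, $3 = interval (s).
sample_rss() {
  local pid=$1 samples=$2 interval=$3 i
  for ((i = 0; i < samples; i++)); do
    printf '%s %s\n' "$(date +%s)" "$(rss_kib "$pid")"
    if ((i + 1 < samples)); then sleep "$interval"; fi
  done
}
```

For instance, `sample_rss 1234 10 60 > rss.log` records ten minutes of samples; `pmap -x` then shows which mappings account for the growth.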

Senior admins should understand how kernel-level tuning and application settings work together. Here's an example of tuning a server running Apache and MySQL for high traffic:

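# Kernel parameters for HTTP servers
sysctl -w net.ipv4.tcp_tw_reuse=1    # reuse TIME_WAIT sockets for new outgoing connections
sysctl -w net.ipv4.tcp_fin_timeout=15
sysctl -w net.core.somaxconn=65535   # upper bound on listen() backlogs; raise Apache's ListenBacklog (default 511) too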

# Apache MPM tuning (prefork example)
<IfModule prefork.c>
StartServers            10
MinSpareServers         10
MaxSpareServers         30
ServerLimit             256
MaxRequestWorkers       256
MaxConnectionsPerChild  10000
</IfModule>

# MySQL concurrent connection handling
[mysqld]
thread_cache_size = 32
table_open_cache = 4096
max_connections = 200
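Values set with `sysctl -w` are lost at reboot unless they are also written to a file under `/etc/sysctl.d/` (loaded by `sysctl --system`), so a candidate may propose a drift check that compares live values with the intended ones. A minimal sketch that reads `/proc/sys` directly; it assumes single-value keys (multi-value keys such as `net.ipv4.tcp_rmem` are tab-separated and would need normalizing), and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare live kernel parameters with desired values.

# Map a dotted key to its /proc/sys path: net.core.somaxconn -> /proc/sys/net/core/somaxconn
sysctl_path() {
  printf '/proc/sys/%s\n' "${1//.//}"
}

# Arguments are key=value pairs; prints one DRIFT/MISSING line per mismatch.
# Returns 0 when everything matches, 1 otherwise.
check_sysctls() {
  local pair key want have path rc=0
  for pair in "$@"; do
    key=${pair%%=*}
    want=${pair#*=}
    path=$(sysctl_path "$key")
    if [[ ! -r $path ]]; then
      printf 'MISSING %s\n' "$key"
      rc=1
      continue
    fi
    have=$(<"$path")
    if [[ $have != "$want" ]]; then
      printf 'DRIFT %s: live=%s desired=%s\n' "$key" "$have" "$want"
      rc=1
    fi
  done
  return "$rc"
}
```

For the settings above: `check_sysctls net.ipv4.tcp_tw_reuse=1 net.ipv4.tcp_fin_timeout=15 net.core.somaxconn=65535`. The same key/value pairs, written as `net.core.somaxconn = 65535` lines in a file such as `/etc/sysctl.d/90-web.conf` (hypothetical name), make them persistent.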

Test candidates' security knowledge with these scenarios:

# 1. Detecting rootkits
rkhunter --check --sk
chkrootkit -q
ausearch -k suspicious --raw | aureport -f -i   # key must match an audit rule, e.g. auditctl -w /etc/passwd -p wa -k suspicious

# 2. Implementing mandatory access control
# SELinux example for Apache:
semanage fcontext -a -t httpd_sys_content_t "/web(/.*)?"
restorecon -Rv /web
setsebool -P httpd_can_network_connect_db 1

# 3. Securing SSH access
# /etc/ssh/sshd_config configuration:
# Obsolete on modern OpenSSH, which only speaks SSH-2; harmless to keep for old versions
Protocol 2
PermitRootLogin no
MaxAuthTries 3
LoginGraceTime 1m
AllowUsers adminuser deployuser
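For most keywords `sshd` uses the first value it finds, so a later duplicate line does not override an earlier one, which is easy to miss when hardening. The authoritative check is `sshd -T`, which prints the effective configuration. As a lightweight sketch, the hypothetical function below applies the same first-match rule to a config file; it ignores comments, stops at the first `Match` block, and does not handle `Include` or `Keyword=value` syntax.

```bash
#!/usr/bin/env bash
# Sketch: print the value sshd would use for a keyword, using first-match-wins.
# Usage: sshd_directive FILE KEYWORD   (keyword match is case-insensitive)
sshd_directive() {
  awk -v want="$2" '
    /^[[:space:]]*#/ || NF == 0 { next }     # skip comments and blank lines
    tolower($1) == "match" { exit }          # conditional blocks are out of scope
    tolower($1) == tolower(want) {
      $1 = ""                                # drop the keyword, keep the value(s)
      sub(/^[[:space:]]+/, "")
      print
      exit
    }
  ' "$1"
}
```

For example, `sshd_directive /etc/ssh/sshd_config PermitRootLogin` should print `no` for the configuration above; `sudo sshd -T | grep -i permitrootlogin` confirms what the running daemon will actually apply.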