Just like developers use the Joel Test to evaluate programming jobs, system administrators need a similar framework to assess potential workplaces. The original 12-question Joel Test for programmers doesn't fully address the unique challenges sysadmins face in infrastructure management, security, and operational policies.
Here are 12 yes/no questions every sysadmin should ask during interviews:
1. Do you use version control for system configurations?
2. Can you build and deploy systems with a single command/script?
3. Do you have comprehensive monitoring covering all critical systems?
4. Is there a documented disaster recovery plan tested within the last year?
5. Are system administrators involved in architecture decisions?
6. Do you follow the principle of least privilege for access control?
7. Is there a formal change management process for production systems?
8. Are systems automatically patched within 30 days of a critical update's release?
9. Do you have dedicated test/staging environments matching production?
10. Is there budget allocated for professional development/training?
11. Are on-call rotations properly compensated and sustainable?
12. Is technical debt regularly addressed and prioritized?
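Question #8 boils down to simple arithmetic: how many days passed between a critical update's release and its installation? The sketch below is a minimal example. It assumes you can already get both timestamps as Unix epoch seconds from your patch tooling or package logs. The function name `patch_within_sla` and the 30-day default are illustrations, not a standard tool.

```bash
# Hypothetical helper (not a standard tool): was a critical update
# installed within the SLA window? Default window is 30 days (question #8).
# Arguments: release_epoch install_epoch [sla_days]
# Prints the delay in whole days; returns 0 if within the SLA, 1 otherwise.
patch_within_sla() {
    local released=$1 installed=$2 sla_days=${3:-30}
    # 86400 seconds per day; integer division truncates partial days
    local delay_days=$(( (installed - released) / 86400 ))
    echo "$delay_days"
    (( delay_days <= sla_days ))
}

# Usage with epochs from your patch tooling (values are illustrative):
#   patch_within_sla 1700000000 1701000000 && echo "within SLA"
```

Partial days are truncated, so an update installed 30.5 days after release still counts as within a 30-day window. Use a smaller `sla_days` if you want a stricter policy.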
Let's examine some technical implementations that would satisfy these questions:
```bash
#!/bin/bash
# Example of infrastructure-as-code (Question #2):
# automated deployment script for web servers
set -euo pipefail  # stop at the first failing step

# Provision infrastructure
terraform apply -auto-approve

# Configure servers
ansible-playbook -i hosts.ini webserver.yml

# Verify deployment
curl -sSf http://loadbalancer/healthcheck > /dev/null || exit 1
```
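The script's single `curl` check can fail simply because a freshly provisioned load balancer needs a few seconds before it answers. The sketch below is a minimal retry wrapper, assuming a fixed delay between attempts is acceptable. `retry` is a hypothetical helper name and is not part of Terraform or Ansible.

```bash
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Arguments: max_attempts delay_seconds command [args...]
# Returns 0 as soon as the command succeeds, or 1 after the last failed attempt.
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        if "$@"; then
            return 0
        fi
        # Don't sleep after the final attempt
        if (( attempt < max_attempts )); then
            sleep "$delay"
        fi
    done
    echo "command failed after $max_attempts attempts: $*" >&2
    return 1
}

# Usage in the deployment script, replacing the single health check:
#   retry 10 3 curl -sSf http://loadbalancer/healthcheck > /dev/null || exit 1
```

A fixed delay keeps the sketch simple. For services that take a while to warm up, exponential backoff is a common alternative.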
Question #3 about monitoring might translate to a Prometheus configuration like:
```yaml
# prometheus.yml snippet showing comprehensive monitoring
scrape_configs:
  - job_name: 'node'            # node_exporter, default port 9100
    static_configs:
      - targets: ['server1:9100', 'server2:9100']
  - job_name: 'webapp'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['app1:8080', 'app2:8080']
  - job_name: 'database'        # 9187 is postgres_exporter's default port
    static_configs:
      - targets: ['db1:9187']
```
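A scrape config only answers half of question #3. The other half is confirming that no critical host is missing from it. The sketch below is minimal and assumes you can export two plain-text lists: the hosts you consider critical (for example, from your inventory) and the scrape targets, one per line, in Prometheus `host` or `host:port` form. The function name and file formats are hypothetical.

```bash
# Hypothetical helper: list critical hosts that Prometheus does not scrape.
# Arguments: critical_hosts_file targets_file
#   critical_hosts_file: one hostname per line (e.g. exported from inventory)
#   targets_file: one Prometheus target per line, "host" or "host:port"
unmonitored_hosts() {
    local critical=$1 targets=$2
    # Strip ":port" from targets, then print hosts present only in the
    # critical list (comm -23 needs both inputs sorted)
    comm -23 <(sort -u "$critical") <(sed 's/:[0-9]*$//' "$targets" | sort -u)
}

# Usage:
#   unmonitored_hosts critical.txt targets.txt
```

Empty output means every critical host has at least one target. The check doesn't verify that the target is actually up, which is a job for Prometheus's own `up` metric.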
Negative answers to certain questions reveal serious issues. For question #6 (least privilege), discovering that nearly everyone holds admin rights signals immature security practices. In PowerShell, the anti-pattern looks like this:
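```powershell
# Bad practice example - blanket admin access
# Anti-pattern: nests the entire staff group inside Domain Admins
# (Add-ADGroupMember comes from the ActiveDirectory module)
Add-ADGroupMember -Identity "Domain Admins" -Members "All Employees"
```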
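On Linux hosts, the same smell shows up as an oversized sudo group. The sketch below is a minimal way to count its members. It assumes admin rights are granted through a group such as `sudo` or `wheel` and takes one line in `getent group` format (`name:password:gid:member1,member2`). The function name is hypothetical, and the threshold of 5 is an arbitrary illustration, not a standard.

```bash
# Hypothetical helper: count members of an admin group and flag large ones.
# Arguments: "getent group" line, e.g. "sudo:x:27:alice,bob"; optional max_members
# Prints the member count; returns 1 if it exceeds max_members (default 5).
admin_group_check() {
    local line=$1 max_members=${2:-5}
    local members=${line##*:}   # text after the last colon = member list
    local count=0
    if [[ -n $members ]]; then
        local IFS=,             # split the member list on commas
        local -a list
        read -r -a list <<< "$members"
        count=${#list[@]}
    fi
    echo "$count"
    (( count <= max_members ))
}

# Usage:
#   admin_group_check "$(getent group sudo)"
```

The check only counts the secondary members listed in the group entry. Users whose primary group is the admin group, or who get rights through `/etc/sudoers` entries, need a separate look.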
While the principles are universal, implementations vary. A cloud-native answer to question #9 might be:
```hcl
// Terraform module creating identical staging/prod environments
module "production" {
  source   = "./environments"
  env_name = "prod"
}

module "staging" {
  source   = "./environments"
  env_name = "stage"
}
```
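Identical Terraform modules keep the infrastructure shaped alike, but the software installed on it can still drift apart. The sketch below is a minimal parity check. It assumes you export one inventory file per environment, for instance with `dpkg-query -W -f='${Package} ${Version}\n'` on Debian-based hosts. The function name is hypothetical.

```bash
# Hypothetical helper: report differences between two environment inventories.
# Arguments: prod_inventory stage_inventory (e.g. "package version" per line)
# Prints "< line" for prod-only entries and "> line" for staging-only entries;
# returns 0 when the environments match, 1 when they have drifted.
env_drift() {
    local prod=$1 stage=$2
    diff <(sort "$prod") <(sort "$stage") | grep '^[<>]'
    # PIPESTATUS[0] is diff's exit status: 0 = identical, 1 = differences
    return "${PIPESTATUS[0]}"
}

# Usage:
#   env_drift prod-packages.txt stage-packages.txt || echo "staging has drifted"
```

Both files are sorted before comparison, so the order in which each host lists its packages doesn't matter.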
Joel Spolsky's 12 questions give developers a quick way to evaluate software teams, and system administrators need an equally standardized way to assess IT environments. This isn't just about technical stacks; it's about identifying the organizational red flags that turn sysadmin jobs into nightmares.
Here's a complementary checklist every infrastructure professional should run through:
- Do you have automated server provisioning?
- Is there a documented disaster recovery plan tested within the last year?
- Can new hires deploy code to production without begging for access?
- Are production changes tracked in a version control system?
- Do you have centralized logging with at least 30 days of retention?
- Is there a separate staging environment matching production?
- Do developers have production access when needed?
- Are critical systems monitored with alert thresholds?
- Is there a formal change management process?
- Are security patches applied within 30 days of release?
- Does IT participate in business continuity planning?
- Are systems documented well enough to survive someone being hit by a bus?
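Being "monitored with alert thresholds" means more than collecting numbers: each check should map a reading to a severity. The sketch below is minimal and follows the Nagios plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL). The function name and the example disk-usage thresholds are hypothetical, and it assumes integer values because bash arithmetic has no floating point.

```bash
# Hypothetical helper: classify an integer metric against warn/crit thresholds.
# Arguments: value warn_threshold crit_threshold (higher = worse)
# Prints a status line and returns a Nagios-style exit code.
check_threshold() {
    local value=$1 warn=$2 crit=$3
    if (( value >= crit )); then
        echo "CRITICAL - value $value >= $crit"
        return 2
    elif (( value >= warn )); then
        echo "WARNING - value $value >= $warn"
        return 1
    fi
    echo "OK - value $value"
    return 0
}

# Usage: alert when root filesystem usage (percent) reaches 80 / 90
# (df --output is a GNU coreutils option)
#   usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
#   check_threshold "$usage" 80 90
```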
For the first item on that list (automated server provisioning), here's what good looks like in practice:
```yaml
# Ansible playbook snippet for automated web server deployment
- hosts: webservers
  become: true
  tasks:
    - name: Install Apache
      ansible.builtin.apt:
        name: apache2
        state: latest
        update_cache: true

    - name: Enable mod_rewrite
      community.general.apache2_module:
        name: rewrite
        state: present
      notify: restart apache

  handlers:
    - name: restart apache
      ansible.builtin.service:
        name: apache2
        state: restarted
```
When you hear answers like these, consider it a warning:
- "We're transitioning to automation" (usually means no automation exists)
- "Our DR plan is in the IT manager's head"
- "We document everything in SharePoint" (translation: nothing is findable)
| Yes answers (out of 12) | Workplace health |
|---|---|
| 0-3 | Run away immediately |
| 4-6 | Expect daily firefighting |
| 7-9 | Average IT shop |
| 10-12 | Rare well-managed environment |
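The scoring table is easy to apply mechanically. The sketch below is minimal and assumes you record each of the 12 answers as `y` or `n`. The function name is hypothetical, and the bands are copied from the table above.

```bash
# Hypothetical helper: tally yes answers and map them to the table's bands.
# Arguments: one "y" or "n" per question (12 expected)
workplace_score() {
    local yes=0 answer band
    for answer in "$@"; do
        # Count y/Y as yes; anything else counts as no
        if [[ $answer == [yY] ]]; then
            yes=$(( yes + 1 ))
        fi
    done
    if (( yes <= 3 )); then
        band="Run away immediately"
    elif (( yes <= 6 )); then
        band="Expect daily firefighting"
    elif (( yes <= 9 )); then
        band="Average IT shop"
    else
        band="Rare well-managed environment"
    fi
    echo "$yes/$# - $band"
}

# Usage:
#   workplace_score y y n y y y n y y y n y   # prints "9/12 - Average IT shop"
```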
The best organizations will proudly answer "yes" to most questions before you even ask. If they get defensive or vague, that tells you everything.