Best Practices for SSH Key Management in Large-Scale Linux Server Environments (2500+ Nodes)



Using a single SSH key pair across 2500+ servers creates a single point of failure. If the private key is compromised, an attacker gains access to the entire infrastructure. Common issues include:

  • No individual accountability (all admins use the same key)
  • One compromised key exposes every server
  • PCI DSS/NIST compliance violations
  • No granular access revocation (removing one person's access means replacing the key everywhere)

For our jump host scenario with 2500 target servers, we have several architectural options:

Option 1: Certificate-Based Authentication

The most scalable option is SSH certificates signed by a CA:

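# On CA server (separate CAs for host and user certificates):
ssh-keygen -t rsa -b 4096 -f host_ca    # signs host keys
ssh-keygen -t rsa -b 4096 -f user_ca    # signs user keys
# -h marks the result as a host certificate
ssh-keygen -s host_ca -I jump_host -h -n server1,server2,... server1.pub

# In sshd_config on all servers:
# TrustedUserCAKeys must point at the CA that signs user keys, not the host CA
TrustedUserCAKeys /etc/ssh/user_ca.pub
HostCertificate /etc/ssh/ssh_host_rsa_key-cert.pub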
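
The commands above sign a host key (-h). For the user side, a minimal sketch of issuing a short-lived user certificate might look like the following. The file names (user_ca, alice_key), the identity alice@example.com, the principal admin, and the 30-day validity are all illustrative assumptions, and everything is generated in a throwaway directory rather than on a real CA host.

```shell
#!/bin/sh
# Sketch: issue a short-lived user certificate from a throwaway CA.
# All names below (user_ca, alice_key, principal "admin") are hypothetical.
set -eu

workdir=$(mktemp -d)

# Generate a user CA key pair and a user key pair (no passphrases, demo only).
ssh-keygen -q -t ed25519 -N '' -C 'demo user CA' -f "$workdir/user_ca"
ssh-keygen -q -t ed25519 -N '' -C 'alice' -f "$workdir/alice_key"

# Sign the user's public key:
#   -I  identity string recorded in sshd logs when the certificate is used
#   -n  principals (login names) the certificate is valid for
#   -V  validity interval, here from now until 30 days from now
ssh-keygen -q -s "$workdir/user_ca" -I alice@example.com -n admin -V +30d \
    "$workdir/alice_key.pub"

# Inspect the certificate, which ssh-keygen writes next to the public key.
cert_info=$(ssh-keygen -L -f "$workdir/alice_key-cert.pub")
printf '%s\n' "$cert_info"
```

A server that lists user_ca.pub under TrustedUserCAKeys would accept this certificate for the admin account until it expires, without any per-user entry in authorized_keys. Remove the scratch directory afterwards.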

Option 2: HashiCorp Vault SSH Secrets Engine

One-time passwords issued by Vault (OTP mode; each target server must run vault-ssh-helper):

# Vault configuration:
vault secrets enable ssh
vault write ssh/roles/linux_admin \
  key_type=otp \
  default_user=admin \
  cidr_list=10.0.0.0/8

# Client usage (host key checking is left at its default rather than disabled,
# so the client can still detect an impersonated server):
vault ssh -role=linux_admin -mode=otp admin@server1

Option 3: Per-Service Principal Keys

For critical services, deploy a unique key per service with Ansible, keeping the private keys encrypted in Ansible Vault:

# Ansible playbook snippet:
- hosts: all_servers
  vars_files:
    - vault.yml
  tasks:
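    # vaulted_ssh_private_key must be defined per host or per service
    # (for example in host_vars); otherwise every server gets the same key.
    - name: Deploy unique SSH key
      ansible.builtin.copy:
        content: "{{ vaulted_ssh_private_key }}"
        dest: /etc/ssh/jump_key
        owner: root
        group: root
        mode: '0600'
      no_log: true  # keep the private key out of task output and logs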

For most enterprises with 2500+ nodes, we recommend a hybrid approach:

  1. Certificate-based auth for all human access
  2. Vault-managed keys for CI/CD systems
  3. Automated quarterly key rotation
  4. Centralized logging via syslog-ng/rsyslog

Ensure your solution meets:

  • NIST SP 800-53 IA-5 for authenticator management (IA-5(2) covers public key-based authentication)
  • PCI DSS v4.0 Requirement 8.2.1 for unique user IDs
  • SOX controls for access management


A single key pair shared across 2500+ servers is a single point of compromise: anyone who obtains the private key can reach every host. Deploying a unique key per server limits that exposure, but it multiplies the number of keys to distribute, track, and rotate.

Implementing an SSH Certificate Authority (CA) solves both security and management challenges:

# Example of generating a host certificate
ssh-keygen -s ca_key -I host_identifier -h -n server1.example.com server1_key.pub

# Example client configuration (~/.ssh/config)
Host *.example.com
    CertificateFile ~/.ssh/id_ecdsa-cert.pub
    IdentityFile ~/.ssh/id_ecdsa

1. Set up the CA infrastructure:

# Generate CA keys (store securely)
ssh-keygen -t ed25519 -f ~/ssh_ca/ca_key

# Sign host keys (automate this process)
for server in $(cat server_list); do
    workdir=$(mktemp -d)   # per-server scratch directory instead of a shared /tmp path
    scp "$server:/etc/ssh/ssh_host_ed25519_key.pub" "$workdir/"
    ssh-keygen -s ~/ssh_ca/ca_key -I "$server" -h -n "$server" "$workdir/ssh_host_ed25519_key.pub"
    scp "$workdir/ssh_host_ed25519_key-cert.pub" "$server:/etc/ssh/"
    rm -rf "$workdir"
done

2. Configure servers to trust the CA:

# /etc/ssh/sshd_config (ca.pub is a copy of ~/ssh_ca/ca_key.pub; reload sshd after editing)
TrustedUserCAKeys /etc/ssh/ca.pub
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub
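
Host certificates only help once clients trust the CA that signed them. One way to express that trust is a @cert-authority line in known_hosts. The helper below is a sketch that builds such a line from a CA public key file; the function name cert_authority_line and the example paths are assumptions, not part of the setup above.

```shell
#!/bin/sh
# Sketch: build a known_hosts entry that trusts a host CA for a host pattern.
# cert_authority_line is a hypothetical helper name.
set -eu

cert_authority_line() {
    pattern=$1      # host pattern, e.g. "*.example.com"
    ca_pub_file=$2  # CA public key file, e.g. ~/ssh_ca/ca_key.pub
    # A public key file holds "<type> <base64> [comment]"; keep the first two
    # fields. The fallback covers files without a trailing newline.
    read -r key_type key_blob rest < "$ca_pub_file" || [ -n "$key_type" ]
    printf '@cert-authority %s %s %s\n' "$pattern" "$key_type" "$key_blob"
}
```

For example, cert_authority_line '*.example.com' ~/ssh_ca/ca_key.pub >> ~/.ssh/known_hosts would let a client verify any host certificate signed by that CA for hosts matching the pattern, instead of prompting for each new host key.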

Implement these key management practices:

  • Use configuration management tools (Ansible/Puppet) to distribute certificates
  • Set appropriate certificate lifetimes (e.g., 30 days for user certs)
  • Implement automatic certificate renewal processes

# Example Ansible playbook snippet
- name: Deploy SSH host certificates
  hosts: all_servers
  tasks:
    - name: Copy the signed host certificate
      ansible.builtin.copy:
        src: "{{ inventory_hostname }}-cert.pub"
        dest: /etc/ssh/ssh_host_ed25519_key-cert.pub
        owner: root
        group: root
        mode: '0644'  # quoted so YAML does not reinterpret the octal value
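
Automatic renewal needs a way to tell when a certificate is close to expiry. The following is a rough sketch, assuming GNU date (for -d) and the "Valid: from ... to ..." line that ssh-keygen -L prints; the function name cert_days_left is illustrative.

```shell
#!/bin/sh
# Sketch: days remaining on a certificate, parsed from `ssh-keygen -L` output.
# Assumes GNU date; cert_days_left is a hypothetical helper name.
set -eu

cert_days_left() {
    valid_line=$1   # e.g. "Valid: from 2030-01-01T00:00:00 to 2030-01-31T00:00:00"
    now_epoch=$2    # current time as seconds since the epoch
    case $valid_line in
        *forever*) echo "forever"; return 0 ;;
    esac
    # The expiry timestamp is the last space-separated field of the line.
    expiry=${valid_line##* }
    expiry_epoch=$(date -d "$expiry" +%s)
    # Whole days, truncated toward zero; zero or negative means the
    # certificate has expired or is about to.
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}
```

A renewal job could feed it the Valid line from ssh-keygen -L -f on the deployed certificate together with the output of date +%s, and request a new certificate once the result drops below a chosen threshold, such as 7 days.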

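For environments not ready for a CA, route admin access through a bastion host that forces a wrapper command and requires both a public key and a second, interactive factor: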

# Bastion host configuration example
Match Group admins
    ForceCommand /usr/local/bin/ssh-auth-wrapper
    AuthenticationMethods publickey,keyboard-interactive
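
The configuration above does not show what /usr/local/bin/ssh-auth-wrapper does. Purely as an illustration, a wrapper of this kind might log who connected and what they asked to run before handing control back. The log format and the helper names audit_line and run_session are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a ForceCommand wrapper: log the session, then run
# the requested command or a login shell. Not the actual ssh-auth-wrapper.
set -eu

audit_line() {
    # sshd exports SSH_CONNECTION ("client_ip client_port server_ip server_port")
    # and, when ForceCommand is active, SSH_ORIGINAL_COMMAND.
    conn=${SSH_CONNECTION:-}
    client_ip=${conn%% *}
    printf 'user=%s from=%s cmd=%s\n' \
        "$(id -un)" "${client_ip:-unknown}" "${SSH_ORIGINAL_COMMAND:-<interactive>}"
}

run_session() {
    # Send the audit record to syslog, then replace this process with the
    # requested command, or with a login shell for interactive sessions.
    logger -t ssh-auth-wrapper "$(audit_line)"
    if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
        exec "${SHELL:-/bin/sh}" -c "$SSH_ORIGINAL_COMMAND"
    else
        exec "${SHELL:-/bin/sh}" -l
    fi
}

# When installed as the wrapper, the script would finish with: run_session
```

Because the records go to syslog, they can be forwarded to a central log host, which gives per-user accountability even while keys are still shared.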