Enterprise SSH Key Management: Centralized Solutions for Global Server Access Control


When managing hundreds of cloud servers across multiple regions, traditional SSH key distribution methods become unwieldy. Manual key rotation on individual EC2 instances creates security gaps and operational overhead. Consider this common pain point:

# Problematic manual process
ssh-copy-id user@server1
ssh-copy-id user@server2
...
ssh-copy-id user@serverN

An effective centralized SSH key management system should provide:

  • Real-time key propagation to all authorized servers
  • Granular access controls per user/service account
  • Automated revocation workflows
  • Audit trails for compliance
  • Integration with existing IAM systems

Here are battle-tested approaches we've evaluated:

1. Open Source: Teleport

Teleport's SSH certificate authority provides short-lived certificates instead of static keys:

# Node joining Teleport cluster
teleport start --roles=node --auth-server=teleport.example.com:3025 \
--token=xxxx --labels=env=prod,region=us-west

Example user access flow:

# User obtains certificate
tsh login --proxy=teleport.example.com

# Connect through Teleport; session recording begins automatically
tsh ssh ec2-user@production-db

2. Commercial: HashiCorp Boundary

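Boundary brokers sessions to targets and, when paired with a credential store such as Vault, can keep SSH credentials out of users' hands:

# Target definition in HCL (Boundary Terraform provider)
resource "boundary_target" "prod_web" {
  name            = "production_web"
  description     = "Prod Web Servers"
  type            = "tcp" # required by the provider
  scope_id        = boundary_scope.proj.id
  host_source_ids = [
    boundary_host_set.prod_web.id
  ]
  default_port    = 22
}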

3. Hybrid Approach: OpenSSH + LDAP

For organizations wedded to OpenSSH, this combo works:

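# sshd_config snippet
# AuthorizedKeysCommand must be a single executable given by absolute path;
# sshd does not run it through a shell, so pipes and line continuations fail.
# Put the pipeline in a wrapper script that receives the username (%u) as $1:
#   ldapsearch -x -LLL "(&(objectClass=posixAccount)(uid=$1))" sshPublicKey | \
#     sed -n 's/^sshPublicKey: //p'
AuthorizedKeysCommand /usr/local/bin/ssh-ldap-helper %u
AuthorizedKeysCommandUser nobody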

When deploying any solution:

  • Ensure zero trust principles - never store private keys centrally
  • Implement JIT (Just-In-Time) access where possible
  • Enforce MFA for all privilege escalation
  • Maintain offline emergency access methods

Example emergency breakglass procedure:

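# Encrypted backup access method (assumes emergency_access.gpg holds a private key)
# Load the decrypted key into ssh-agent for 10 minutes; piping it into ssh's
# stdin would not authenticate anything.
gpg --decrypt emergency_access.gpg | ssh-add -t 600 -
# Host key checking is disabled only because this is a last-resort path;
# pin the bastion's host key instead if you can.
ssh -o StrictHostKeyChecking=no \
    -o UserKnownHostsFile=/dev/null \
    admin@$(terraform output -raw backup_bastion_ip)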

When managing hundreds or thousands of cloud servers (EC2, GCP, or Azure instances) across multiple regions, traditional SSH key distribution becomes unmanageable. The pain points include:

  • Key revocation requires manual updates on every server
  • No audit trail for key usage
  • Difficulty rotating keys across entire infrastructure

Here are three proven approaches with their technical implementations:

1. LDAP + OpenSSH AuthorizedKeysCommand

A widely used open-source approach combines OpenLDAP with OpenSSH's built-in AuthorizedKeysCommand, so no patched sshd is needed on current OpenSSH releases:

# /etc/ssh/sshd_config
AuthorizedKeysCommand /usr/local/bin/ssh-ldap-helper
AuthorizedKeysCommandUser nobody

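# Sample ssh-ldap-helper script
#!/bin/bash
# -H replaces the deprecated -h; -LLL and ldif-wrap=no keep each key on one line.
# Prefer ldaps:// or StartTLS (-ZZ) outside a lab.
ldapsearch -x -LLL -o ldif-wrap=no -H ldap://ldap.example.com \
  -b "ou=People,dc=example,dc=com" \
  "(&(objectClass=posixAccount)(uid=$1))" sshPublicKey | \
  sed -n 's/^sshPublicKey: //p'  # awk's $2 would print only the key type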
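
The helper is only as reliable as its LDIF parsing: ldapsearch can fold long values onto continuation lines and can return base64-encoded values (the `sshPublicKey:: ` form). Below is a minimal sketch of a more defensive parser, assuming the attribute is named sshPublicKey as above. The function name extract_ssh_keys is hypothetical and not part of any package.

```shell
# extract_ssh_keys: read LDIF on stdin, print one public key per line.
# Hypothetical helper; assumes the attribute is called sshPublicKey.
extract_ssh_keys() {
  # Stage 1 (awk): unfold LDIF continuation lines. A line that starts with
  # a single space continues the previous line.
  # Stage 2 (while): print plain values as-is and base64-decode values
  # that use the "sshPublicKey:: " form.
  awk '
    /^ / { buf = buf substr($0, 2); next }
    { if (buf != "") print buf; buf = $0 }
    END { if (buf != "") print buf }
  ' | while IFS= read -r line; do
    case "$line" in
      "sshPublicKey:: "*)
        printf '%s' "${line#sshPublicKey:: }" | base64 -d
        echo
        ;;
      "sshPublicKey: "*)
        printf '%s\n' "${line#sshPublicKey: }"
        ;;
    esac
  done
}
```

In the helper script this would take the place of the final filtering stage, with the ldapsearch output piped into extract_ssh_keys. Blank lines in the output are harmless, since sshd ignores them.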

2. Commercial Solutions Feature Matrix

Product             Key Rotation       Cloud Integration  ACL Granularity
Teleport            Automatic          AWS/GCP/Azure      Per-server roles
HashiCorp Boundary  Manual             All major clouds   Session-based
Smallstep SSH       Certificate-based  Kubernetes-native  Time-bound

3. Certificate-Based SSH with Vault

Using HashiCorp Vault's SSH secrets engine:

# Configure Vault SSH engine
vault secrets enable -path=ssh-client-signer ssh
vault write ssh-client-signer/config/ca generate_signing_key=true

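# Publish the CA public key to each server
vault read -field=public_key ssh-client-signer/config/ca > /etc/ssh/trusted-user-ca-keys.pem

# Server-side configuration (in /etc/ssh/sshd_config)
TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem
RevokedKeys /etc/ssh/revoked-keys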
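
TrustedUserCAKeys tells sshd to accept any user certificate signed by the listed CA, which is the kind of certificate Vault's signer issues. To see what such a certificate contains without a Vault server, the same style of signing can be sketched locally with ssh-keygen. The file names, the principal ec2-user (borrowed from the earlier example), and the 5-minute validity are illustrative assumptions. This is a stand-in for Vault's signing step, not Vault's API.

```shell
# Work in a throwaway directory so no real keys are touched.
workdir=$(mktemp -d)

# 1. Create a CA key pair (Vault generates and holds the equivalent when
#    generate_signing_key=true) and an ordinary user key pair.
ssh-keygen -q -t ed25519 -N "" -C "demo-user-ca" -f "$workdir/user_ca"
ssh-keygen -q -t ed25519 -N "" -C "demo-user" -f "$workdir/id_demo"

# 2. Sign the user's public key: key ID "demo-user", valid only for the
#    principal ec2-user, expiring 5 minutes from now.
ssh-keygen -q -s "$workdir/user_ca" -I demo-user -n ec2-user -V +5m \
  "$workdir/id_demo.pub"

# 3. Inspect the certificate, which ssh-keygen writes to id_demo-cert.pub.
cert_info=$(ssh-keygen -L -f "$workdir/id_demo-cert.pub")
printf '%s\n' "$cert_info"

# On a server, user_ca.pub is the file TrustedUserCAKeys would point to.
```

The short validity window is what removes most of the rotation burden: an expired certificate stops working on its own, and RevokedKeys is only needed for credentials that must be cut off before they expire.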

Key metrics to evaluate when choosing a solution:

  • Latency: Global lookup performance for distributed teams
  • Compliance: Support for FIPS 140-2 or ISO 27001 requirements
  • Disaster Recovery: Key store replication across regions

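For AWS environments, a Teleport configuration along these lines enables the auth and SSH services and labels each node; just-in-time access is then layered on top through Teleport roles and access requests: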

# teleport.yaml
auth_service:
  enabled: true
  cluster_name: "production"
  listen_addr: 0.0.0.0:3025

ssh_service:
  enabled: true
  labels:
    "env": "prod"
  commands:
  - name: "instance-info"
    command: ["/usr/bin/aws", "ec2", "describe-instances"]
    period: "1h"

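This approach centralizes audit logs in the Teleport auth service, while access control stays granular through role rules that match node labels such as env=prod. Mapping those roles to existing IAM or SSO identities is configured separately.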