When managing hundreds of cloud servers across multiple regions, traditional SSH key distribution methods become unwieldy. Manual key rotation on individual EC2 instances creates security gaps and operational overhead. Consider this common pain point:
# Problematic manual process
ssh-copy-id user@server1
ssh-copy-id user@server2
...
ssh-copy-id user@serverN
An effective centralized SSH key management system should provide:
- Real-time key propagation to all authorized servers
- Granular access controls per user/service account
- Automated revocation workflows
- Audit trails for compliance
- Integration with existing IAM systems
Here are battle-tested approaches we've evaluated:
1. Open Source: Teleport
Teleport's SSH certificate authority provides short-lived certificates instead of static keys:
# Node joining Teleport cluster
teleport start --roles=node --auth-server=teleport.example.com:3025 \
--token=xxxx --labels=env=prod,region=us-west
Example user access flow:
# User obtains certificate
tsh login --proxy=teleport.example.com
# Connect through Teleport so the session is recorded automatically
tsh ssh ec2-user@production-db
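To illustrate why short-lived certificates reduce the need for key distribution and revocation, here is a minimal Python sketch of the check a server effectively performs. It is a simplified model, not Teleport's implementation: the `SshCertificate` class, the 8-hour default lifetime, and the example identity are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Tuple


@dataclass(frozen=True)
class SshCertificate:
    """Simplified stand-in for the fields an SSH user certificate carries."""
    key_id: str                  # identity that ends up in audit logs
    principals: Tuple[str, ...]  # login names the certificate may use
    valid_after: datetime
    valid_before: datetime


def issue_certificate(key_id, principals, now, ttl=timedelta(hours=8)):
    """Model a CA issuing a certificate that expires on its own after ttl."""
    return SshCertificate(key_id, tuple(principals), now, now + ttl)


def is_accepted(cert, login, now):
    """Accept only inside the validity window and only for a listed principal."""
    in_window = cert.valid_after <= now < cert.valid_before
    return in_window and login in cert.principals
```

Because the certificate stops working once `valid_before` passes, a leaked credential has a bounded lifetime even if nobody revokes it, which is the main contrast with a static key in `authorized_keys`.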
2. Commercial: HashiCorp Boundary
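Boundary focuses on session brokering: users connect to targets through Boundary instead of holding long-lived keys for each host. A target is defined with the Boundary Terraform provider: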
# Target definition in HCL (Boundary Terraform provider)
resource "boundary_target" "prod_web" {
  name        = "production_web"
  description = "Prod Web Servers"
  type        = "tcp" # required by the provider
  scope_id    = boundary_scope.proj.id
  host_source_ids = [
    boundary_host_set.prod_web.id
  ]
  default_port = 22
}
3. Hybrid Approach: OpenSSH + LDAP
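For organizations committed to stock OpenSSH, public keys can be served from LDAP through AuthorizedKeysCommand. sshd executes the command directly, without a shell, so the pipeline has to live in a wrapper script:
# sshd_config snippet
AuthorizedKeysCommand /usr/local/bin/ssh-ldap-helper %u
AuthorizedKeysCommandUser nobody
# /usr/local/bin/ssh-ldap-helper (root-owned, not writable by group or others)
#!/bin/sh
ldapsearch -x -LLL -o ldif-wrap=no \
  "(&(objectClass=posixAccount)(uid=$1))" sshPublicKey | \
  sed -n 's/^sshPublicKey: //p'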
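The LDAP lookup above returns LDIF, which may fold long lines and may base64-encode values (written as `sshPublicKey::`), two cases a plain sed filter does not handle. The following is a minimal Python sketch of a more tolerant parser; the function name `extract_ssh_keys` is hypothetical, and the code assumes the ldapsearch output has already been captured as a string.

```python
import base64
from typing import List


def extract_ssh_keys(ldif: str, attribute: str = "sshPublicKey") -> List[str]:
    """Return every value of `attribute` found in LDIF text."""
    # Unfold first: in LDIF, a line starting with one space continues the previous line.
    logical_lines: List[str] = []
    for raw in ldif.splitlines():
        if raw.startswith(" ") and logical_lines:
            logical_lines[-1] += raw[1:]
        else:
            logical_lines.append(raw)

    keys: List[str] = []
    prefix = attribute.lower() + ":"
    for line in logical_lines:
        if not line.lower().startswith(prefix):
            continue
        rest = line[len(prefix):]
        if rest.startswith(":"):
            # "attr:: value" marks a base64-encoded value.
            value = base64.b64decode(rest[1:].strip()).decode("utf-8").strip()
        else:
            value = rest.strip()
        if value:
            keys.append(value)
    return keys
```

A helper built on this would read the ldapsearch output, call `extract_ssh_keys`, and print one key per line, which is the format sshd expects from an AuthorizedKeysCommand.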
When deploying any solution:
- Follow zero-trust principles: never store private keys centrally
- Implement JIT (Just-In-Time) access where possible
- Enforce MFA for all privilege escalation
- Maintain offline emergency access methods
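As a rough illustration of the just-in-time item in the list above, the sketch below models a request that grants nothing until someone else approves it and that lapses on its own. The `AccessRequest` class, the 30-minute default window, and the no-self-approval rule are assumptions for illustration, not the behaviour of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AccessRequest:
    requester: str
    target: str
    reason: str
    approved_by: Optional[str] = None
    expires_at: Optional[datetime] = None


def approve(request, approver, now, ttl=timedelta(minutes=30)):
    """Approve a request for a limited time; self-approval is rejected."""
    if approver == request.requester:
        raise PermissionError("requester cannot approve their own request")
    request.approved_by = approver
    request.expires_at = now + ttl
    return request


def has_access(request, now):
    """Access exists only after approval and before expiry."""
    return request.expires_at is not None and now < request.expires_at
```

The point of the design is that standing access is the exception: the default answer from `has_access` is no, and every yes carries an approver and an expiry that can be audited.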
Example emergency breakglass procedure:
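# Encrypted backup access method: load the decrypted key into ssh-agent
# for 15 minutes (ssh does not read private keys from stdin)
gpg --decrypt emergency_access.gpg | ssh-add -t 900 -
# Host-key checking is disabled here, so verify the bastion fingerprint out of band
ssh -o StrictHostKeyChecking=no \
  -o UserKnownHostsFile=/dev/null \
  admin@$(terraform output -raw backup_bastion_ip)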
When managing hundreds or thousands of cloud servers (AWS EC2, GCP, or Azure instances) across multiple regions, distributing static SSH keys by hand becomes unmanageable. The pain points include:
- Key revocation requires manual updates on every server
- No audit trail for key usage
- Difficulty rotating keys across entire infrastructure
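On the audit-trail point: when sshd logs a public-key login it identifies the key by its SHA256 fingerprint, so attributing logins requires an index from fingerprint to owner. The following is a minimal Python sketch of computing that fingerprint; the function names and the shape of `keys_by_owner` are hypothetical, and the code assumes key lines without leading `authorized_keys` options.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of '<type> <base64-blob> [comment]'."""
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as unpadded base64 after a "SHA256:" prefix.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def build_fingerprint_index(keys_by_owner):
    """Map fingerprint -> owner so a fingerprint seen in a log can be attributed."""
    return {
        sha256_fingerprint(key): owner
        for owner, keys in keys_by_owner.items()
        for key in keys
    }
```

With keys scattered across per-server `authorized_keys` files there is no single place to build such an index, which is one reason a central key store or certificate authority makes auditing tractable.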
Here are three proven approaches with their technical implementations:
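1. LDAP + OpenSSH AuthorizedKeysCommand
A widely used open-source approach combines OpenLDAP with OpenSSH's AuthorizedKeysCommand directive, which has been built in since OpenSSH 6.2, so no patch is needed on current distributions: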
# /etc/ssh/sshd_config
AuthorizedKeysCommand /usr/local/bin/ssh-ldap-helper
AuthorizedKeysCommandUser nobody
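# Sample ssh-ldap-helper script (must be root-owned and not group- or world-writable)
#!/bin/bash
# -LLL and ldif-wrap=no keep each key on one unwrapped line; sed prints the whole key, not just its type
ldapsearch -x -LLL -o ldif-wrap=no -H ldap://ldap.example.com \
  -b "ou=People,dc=example,dc=com" \
  "(&(objectClass=posixAccount)(uid=$1))" sshPublicKey | \
  sed -n 's/^sshPublicKey: //p'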
2. Commercial Solutions Feature Matrix
Product | Key Rotation | Cloud Integration | ACL Granularity |
---|---|---|---|
Teleport | Automatic | AWS/GCP/Azure | Per-server roles |
HashiCorp Boundary | Manual | All major clouds | Session-based |
Smallstep SSH | Certificate-based | Kubernetes-native | Time-bound |
3. Certificate-Based SSH with Vault
Using HashiCorp Vault's SSH secrets engine:
# Configure Vault SSH engine
vault secrets enable -path=ssh-client-signer ssh
vault write ssh-client-signer/config/ca generate_signing_key=true
# Server-side configuration (in /etc/ssh/sshd_config)
TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem
RevokedKeys /etc/ssh/revoked-keys
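On the client side, a user obtains a certificate by asking Vault to sign a public key through the engine's `sign` endpoint. The sketch below only builds the HTTP request so that its shape is visible. It assumes a signing role already exists under the mount (the configuration above does not create one), and the function name, role name, and default TTL are hypothetical.

```python
import json
import urllib.request


def build_sign_request(vault_addr, token, role, public_key, principal,
                       ttl="30m", mount="ssh-client-signer"):
    """Build (but do not send) the request asking Vault to sign a public key."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/sign/{role}"
    body = {
        "public_key": public_key,       # contents of the user's .pub file
        "valid_principals": principal,  # login name(s) the certificate allows
        "ttl": ttl,                     # requested lifetime; the role may cap it
        "cert_type": "user",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` should return JSON whose `data.signed_key` field holds the certificate, which the user saves next to the private key (for example as `id_ed25519-cert.pub`) so that ssh presents it automatically.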
Key metrics to evaluate when choosing a solution:
- Latency: Global lookup performance for distributed teams
- Compliance: Support for FIPS 140-2 or ISO 27001 requirements
- Disaster Recovery: Key store replication across regions
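For AWS environments, the following Teleport configuration runs the auth service and labels each node, including a dynamic label refreshed hourly from the AWS CLI, so that role-based and just-in-time access rules can target instances by label: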
# teleport.yaml
auth_service:
  enabled: true
  cluster_name: "production"
  listen_addr: 0.0.0.0:3025
ssh_service:
  enabled: true
  labels:
    "env": "prod"
  commands:
    - name: "instance-info"
      command: ["/usr/bin/aws", "ec2", "describe-instances"]
      period: "1h"
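This approach centralizes audit logs in the auth service while keeping access control granular, because roles can match on node labels. The aws command in the dynamic label relies on the instance's IAM permissions.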