Implementing Single Outbound IP for GCP/GKE Workloads with Cloud NAT


When integrating with third-party APIs that require IP whitelisting, managing egress from dynamic GKE workloads becomes problematic. Traditional approaches such as self-managed instance-level NAT gateways introduce a single point of failure (SPOF) and maintenance overhead. The ideal solution should:

  • Provide a static external IP for all outbound traffic
  • Scale automatically with GKE cluster changes
  • Maintain high availability
  • Require minimal configuration maintenance

Google Cloud's managed NAT service solves this elegantly without maintaining standalone NAT instances. Here's the architecture breakdown:

+---------------+
|   GKE Nodes   |
| (no public IP)|
+-------┬-------+
        │
        ↓
+---------------+
|  Cloud NAT    |
| (Static IP)   |
+-------┬-------+
        │
        ↓
+---------------+
|   Internet    |
+---------------+

1. Reserve a Static IP

gcloud compute addresses create api-gateway-ip \
  --region=us-central1 \
  --network-tier=PREMIUM
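
Reserving the address only allocates it; to print the actual IP you will hand to the API provider for whitelisting:

gcloud compute addresses describe api-gateway-ip \
  --region=us-central1 \
  --format='value(address)'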

2. Configure Cloud Router

gcloud compute routers create nat-router \
  --network=default \
  --region=us-central1 \
  --asn=64512

3. Set Up Cloud NAT

gcloud compute routers nats create api-nat-config \
  --router=nat-router \
  --region=us-central1 \
  --nat-all-subnet-ip-ranges \
  --nat-external-ip-pool=api-gateway-ip
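
To confirm the NAT configuration took effect:

gcloud compute routers nats describe api-nat-config \
  --router=nat-router \
  --region=us-central1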

For GKE, Cloud NAT only applies to nodes without external IPs, so ensure your node pools run private nodes (recent gcloud versions let you set this per node pool):

gcloud container node-pools create workload-pool \
  --cluster=my-cluster \
  --enable-private-nodes \
  --machine-type=n2-standard-4
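
If you are creating the cluster from scratch instead, a private cluster achieves the same result for every node pool. A minimal sketch, assuming a regional VPC-native cluster (the control-plane CIDR below is an arbitrary example):

gcloud container clusters create my-cluster \
  --region=us-central1 \
  --enable-ip-alias \
  --enable-private-nodes \
  --master-ipv4-cidr=172.16.0.0/28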

To confirm all outbound traffic uses the static IP:

# Run a throwaway pod and print the apparent source IP:
kubectl run --rm -it curl-test --image=curlimages/curl \
  --restart=Never -- curl -s ifconfig.me

The printed address should match the reserved static IP.

For production environments, consider these enhancements:

  • Configure NAT logging for audit trails (see the example after this list)
  • Set a minimum ports-per-VM value to avoid SNAT port exhaustion
  • Remember that Cloud NAT is regional by design, so it already spans every zone in the region
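
As a sketch of the first item, logging can be enabled on the existing NAT config with a single update (ERRORS_ONLY keeps log volume down; use ALL if you need a full audit trail):

gcloud compute routers nats update api-nat-config \
  --router=nat-router \
  --region=us-central1 \
  --enable-logging \
  --log-filter=ERRORS_ONLY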

Symptom: Outbound connections failing
Check: Verify firewall rules allow egress from GKE nodes to 0.0.0.0/0 (an example allow rule follows below)

Symptom: IP mismatch
Check: Confirm no other NAT gateways or instance-level routes exist
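
If the egress rule is missing, a minimal sketch of an allow rule looks like this (the rule name allow-all-egress is arbitrary; note that the default network already permits all egress unless you have tightened it):

gcloud compute firewall-rules create allow-all-egress \
  --network=default \
  --direction=EGRESS \
  --action=ALLOW \
  --rules=all \
  --destination-ranges=0.0.0.0/0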


When integrating with third-party APIs requiring IP whitelisting in GCP, we face a fundamental architectural decision: how to maintain a consistent egress point while accommodating dynamic Kubernetes workloads. The traditional approach of assigning individual public IPs to each GCE instance becomes untenable when dealing with auto-scaling clusters and API restrictions.

GCP's Cloud NAT service solves exactly this problem without requiring manual NAT instance management. Here's how to implement it:

# Create a router for your region
gcloud compute routers create nat-router \
    --network=default \
    --region=us-central1

# Configure Cloud NAT with the reserved static IP
# (--nat-external-ip-pool takes the address resource name; it cannot
# be combined with --auto-allocate-nat-external-ips)
gcloud compute routers nats create nat-config \
    --router=nat-router \
    --region=us-central1 \
    --nat-all-subnet-ip-ranges \
    --nat-external-ip-pool=YOUR_STATIC_IP_NAME

For GKE clusters, the key additional requirement is that nodes must not have external IPs; Cloud NAT only translates traffic from instances without their own public addresses (see the node pool configuration above). If you also want to guarantee that pods reach only the whitelisted API, a standard Kubernetes NetworkPolicy expresses the deny-all-except rule. This assumes network policy enforcement is enabled on the cluster (e.g. GKE Dataplane V2) and applies per namespace; replace THIRD_PARTY_API_IP with the real address:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: force-nat-egress
  namespace: default
spec:
  # Select every pod in the namespace
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Keep DNS working so pods can resolve the API's hostname
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
    # Allow only the whitelisted third-party API; all other egress
    # is denied by default once this policy selects the pods
    - to:
        - ipBlock:
            cidr: THIRD_PARTY_API_IP/32

Cloud NAT is a distributed, software-defined regional service, so redundancy comes built in rather than from gateways you deploy yourself:

  • One NAT config covers every zone in the region where your workloads run
  • There are no NAT instances to patch, scale, or fail over
  • Monitor NAT metrics such as port usage and dropped connections through Cloud Monitoring (a quick CLI status check follows below)
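
For a quick runtime check from the CLI, the router's status report includes a NAT section (assuming the router name from earlier):

# Inspect the NAT status attached to the Cloud Router
gcloud compute routers get-status nat-router \
    --region=us-central1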

Balance performance against cost with these techniques:

# Set port allocation per VM to balance scaling and cost
# (max requires dynamic port allocation; values must be powers of two)
gcloud compute routers nats update nat-config \
    --router=nat-router \
    --region=us-central1 \
    --enable-dynamic-port-allocation \
    --min-ports-per-vm=32 \
    --max-ports-per-vm=64

When debugging NAT connectivity problems:

# Check active NAT mappings (Cloud NAT is fully managed; there is
# no gateway instance to SSH into)
gcloud compute routers get-nat-mapping-info nat-router \
    --region=us-central1

# Verify route propagation
gcloud compute routes list --filter="network=default"
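
If NAT logging was enabled earlier, translation and drop events can also be pulled straight from Cloud Logging; a sketch assuming the standard Cloud NAT log resource type:

# Tail recent Cloud NAT log entries (requires NAT logging enabled)
gcloud logging read 'resource.type="nat_gateway"' --limit=10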