RabbitMQ Docker Cluster Setup: Solving .erlang.cookie Mismatch When RABBITMQ_ERLANG_COOKIE Env Variable Fails



When deploying RabbitMQ clusters in Docker containers, one critical requirement is that all nodes share the same Erlang cookie, stored in the .erlang.cookie file. This shared secret is how Erlang nodes authenticate to one another; nodes with different cookies refuse to talk to each other. The official RabbitMQ Docker image documentation suggests using the RABBITMQ_ERLANG_COOKIE environment variable to set this value uniformly across containers.

In practice, many developers encounter a situation where:

  • The environment variable is properly set in the container
  • The actual /var/lib/rabbitmq/.erlang.cookie file contains a different value
  • Cluster formation fails due to cookie mismatch

Here's a typical scenario:

docker run -d --hostname rabbit1 \
  -e RABBITMQ_ERLANG_COOKIE="QOKWQHQKXXTBIEAOPWKE" \
  --name rabbit1 rabbitmq:3.6.9-alpine

docker exec rabbit1 cat /var/lib/rabbitmq/.erlang.cookie
# Output: AYMNAPKRPCPJVPFYAJZX (not matching our env var)
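The mismatch above can also be detected without reading the two values by eye. The following is a minimal sketch, assuming the cookie has been copied to a local file; the function name cookie_matches is hypothetical and not part of RabbitMQ or the Docker image.

```shell
#!/bin/sh
# cookie_matches FILE EXPECTED   (hypothetical helper)
# Returns 0 when the content of FILE equals EXPECTED, 1 otherwise.
cookie_matches() {
    cm_file=$1
    cm_expected=$2
    [ -r "$cm_file" ] || return 1
    # Command substitution strips the trailing newline that echo adds.
    cm_actual=$(cat "$cm_file")
    [ "$cm_actual" = "$cm_expected" ]
}

# Example use (requires a running container named rabbit1):
#   docker exec rabbit1 cat /var/lib/rabbitmq/.erlang.cookie > /tmp/rabbit1.cookie
#   cookie_matches /tmp/rabbit1.cookie "QOKWQHQKXXTBIEAOPWKE" && echo match || echo MISMATCH
```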

The issue stems from the container initialization sequence:

  1. The RabbitMQ image creates a default cookie during container startup
  2. The environment variable processing occurs too late in the boot process
  3. The cookie file permissions (400) make runtime modifications difficult

Solution 1: Pre-create the cookie file

Create a custom Dockerfile that sets the cookie before RabbitMQ starts:

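FROM rabbitmq:3.6.9-alpine

# Caveat: if the base image declares /var/lib/rabbitmq as a VOLUME, files
# written under it at build time may be discarded. Confirm the result in a
# running container with the verification commands further down.
RUN echo "QOKWQHQKXXTBIEAOPWKE" > /var/lib/rabbitmq/.erlang.cookie \
 && chmod 400 /var/lib/rabbitmq/.erlang.cookie \
 && chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie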

Solution 2: Use an entrypoint script

Create an entrypoint.sh that handles cookie initialization:

#!/bin/sh
set -e

if [ -n "$RABBITMQ_ERLANG_COOKIE" ]; then
    cookieFile="/var/lib/rabbitmq/.erlang.cookie"
    echo "$RABBITMQ_ERLANG_COOKIE" > "$cookieFile"
    chmod 400 "$cookieFile"
    chown rabbitmq:rabbitmq "$cookieFile"
fi

exec docker-entrypoint.sh "$@"

Then in your Dockerfile:

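FROM rabbitmq:3.6.9-alpine
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
# Setting ENTRYPOINT resets the CMD inherited from the base image, so restate it
CMD ["rabbitmq-server"]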

For AWS ECS deployments, you can combine these approaches with ECS task definitions:

{
  "containerDefinitions": [
    {
      "name": "rabbitmq",
      "image": "your-custom-rabbitmq-image",
      "environment": [
        {
          "name": "RABBITMQ_ERLANG_COOKIE",
          "value": "QOKWQHQKXXTBIEAOPWKE"
        }
      ],
      "essential": true
    }
  ]
}

After implementing your solution, verify with:

# Check environment variable
docker exec container_name env | grep RABBITMQ_ERLANG_COOKIE

# Check cookie file content
docker exec container_name cat /var/lib/rabbitmq/.erlang.cookie

# Check file permissions
docker exec container_name ls -la /var/lib/rabbitmq/.erlang.cookie

Remember that all nodes in the cluster must:

  • Use identical cookie values
  • Have owner-only file permissions (400 or 600; Erlang rejects a cookie file that group or others can access)
  • Have correct ownership (rabbitmq:rabbitmq)
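The permission requirement in the list above can be checked with a small script. This is a sketch only: cookie_perms_ok is a hypothetical helper name, and it checks the mode bits but not ownership, which still has to be confirmed separately with ls -la.

```shell
#!/bin/sh
# cookie_perms_ok FILE   (hypothetical helper)
# Returns 0 when FILE is a regular file with no group or other permission bits.
cookie_perms_ok() {
    cp_file=$1
    [ -f "$cp_file" ] || return 1
    # ls -ld prints a mode string such as "-r--------";
    # characters 5-10 are the group and other bits.
    cp_bits=$(ls -ld "$cp_file" | cut -c5-10)
    [ "$cp_bits" = "------" ]
}

# Example use inside a container shell:
#   cookie_perms_ok /var/lib/rabbitmq/.erlang.cookie || echo "cookie file is too permissive"
```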

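If using Kubernetes, one option is to mount the cookie from a ConfigMap (a Secret is generally a better fit for sensitive values such as this one). The mounted file must still meet the permission and ownership requirements above. The StatefulSet below is abbreviated to the fields relevant to the mount: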

apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
data:
  .erlang.cookie: |
    QOKWQHQKXXTBIEAOPWKE
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  template:
    spec:
      containers:
      - name: rabbitmq
        volumeMounts:
        - name: config-volume
          mountPath: /var/lib/rabbitmq/.erlang.cookie
          subPath: .erlang.cookie
      volumes:
      - name: config-volume
        configMap:
          name: rabbitmq-config

I hit the same problem when deploying a RabbitMQ cluster on AWS ECS: the RABBITMQ_ERLANG_COOKIE environment variable was visible when inspecting the container, but its value never reached the /var/lib/rabbitmq/.erlang.cookie file. Each node ended up with its own random cookie, so cluster formation failed.

The entrypoint script in the RabbitMQ Docker image (specifically 3.6.9-alpine) handles the cookie as follows:

  • It writes the cookie file only if the file does not already exist
  • If the file exists (even empty), it is not overwritten
  • The environment variable's value is therefore applied only when no cookie file is present
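The write-only-if-missing behaviour described in the list above can be sketched as a shell function. This is a paraphrase for illustration, not the image's actual entrypoint code; the function name init_cookie and its arguments are assumptions.

```shell
#!/bin/sh
# init_cookie FILE VALUE   (hypothetical paraphrase of the described behaviour)
# Writes VALUE to FILE only when FILE does not exist yet.
init_cookie() {
    ic_file=$1
    ic_value=$2
    # No value requested: nothing to do.
    [ -n "$ic_value" ] || return 0
    if [ -e "$ic_file" ]; then
        # An existing file, even an empty one, is left untouched;
        # a mismatch only produces a warning.
        if [ "$(cat "$ic_file" 2>/dev/null)" != "$ic_value" ]; then
            echo "warning: $ic_file does not match the requested cookie" >&2
        fi
    else
        echo "$ic_value" > "$ic_file"
        chmod 600 "$ic_file"
    fi
}
```

Under this logic, a cookie file left over in a reused /var/lib/rabbitmq volume wins over the environment variable, which matches the symptom described in this article.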

Solution 1: Remove the existing cookie file

Create a custom Dockerfile that deletes any cookie file baked into the image, so that the entrypoint applies the environment variable on first start:

FROM rabbitmq:3.6.9-alpine

# Remove any existing cookie file
RUN rm -f /var/lib/rabbitmq/.erlang.cookie

# Environment variable will be processed on container start
ENV RABBITMQ_ERLANG_COOKIE="QOKWQHQKXXTBIEAOPWKE"

Solution 2: Use a volume-mounted cookie file

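For better control, mount a pre-created cookie file. The file must contain the shared cookie and, as seen from inside the container, be accessible only by its owner, the rabbitmq user:

docker run -d \
  --hostname rabbit1 \
  --name rabbit1 \
  -v /path/to/cookie:/var/lib/rabbitmq/.erlang.cookie \
  rabbitmq:3.6.9-alpine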

Solution 3: Entrypoint override

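Create a custom entrypoint script that overwrites the cookie file on every start, then hands off to the image's own entrypoint:

#!/bin/sh
set -e

# Force write cookie from env, replacing any existing file content
if [ -n "$RABBITMQ_ERLANG_COOKIE" ]; then
  echo "$RABBITMQ_ERLANG_COOKIE" > /var/lib/rabbitmq/.erlang.cookie
  chmod 600 /var/lib/rabbitmq/.erlang.cookie
  chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
fi

# Continue with the stock entrypoint and the original arguments
exec docker-entrypoint.sh "$@"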

After implementing any solution, verify:

docker exec -it rabbit1 cat /var/lib/rabbitmq/.erlang.cookie
docker exec -it rabbit2 cat /var/lib/rabbitmq/.erlang.cookie

Both should show identical values matching your RABBITMQ_ERLANG_COOKIE.
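The comparison can be scripted so the cookie values are never printed to the terminal. A minimal sketch follows; the function names and the DOCKER variable (which only exists so the command can be swapped out for testing) are assumptions, not part of any tool.

```shell
#!/bin/sh
# Command used to reach the containers; override it to test without Docker.
DOCKER=${DOCKER:-docker}

# node_cookie NODE: print the cookie file content of container NODE.
node_cookie() {
    "$DOCKER" exec "$1" cat /var/lib/rabbitmq/.erlang.cookie
}

# cookies_identical NODE_A NODE_B
# Returns 0 if both cookies are non-empty and equal, 1 if they differ,
# 2 if either cookie could not be read.
cookies_identical() {
    ci_a=$(node_cookie "$1") || return 2
    ci_b=$(node_cookie "$2") || return 2
    [ -n "$ci_a" ] && [ "$ci_a" = "$ci_b" ]
}

# Example use:
#   cookies_identical rabbit1 rabbit2 && echo "cookies match" || echo "cookies differ"
```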

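Once cookies match, configure clustering. The containers must be able to resolve each other's hostnames, for example by sharing a user-defined Docker network: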

docker exec -it rabbit2 \
  rabbitmqctl stop_app
docker exec -it rabbit2 \
  rabbitmqctl join_cluster rabbit@rabbit1
docker exec -it rabbit2 \
  rabbitmqctl start_app

The nodes should now form a single cluster. Running rabbitmqctl cluster_status on either node should list both rabbit@rabbit1 and rabbit@rabbit2.
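For more than two nodes, the join steps can be wrapped in a function. This is a sketch under the same assumptions as the commands above (containers named after their hostnames, default node name rabbit@hostname); join_cluster and the DOCKER override variable are hypothetical names.

```shell
#!/bin/sh
# Command used to reach the containers; override it to test without Docker.
DOCKER=${DOCKER:-docker}

# join_cluster NODE SEED
# Stops the RabbitMQ app on NODE, joins it to rabbit@SEED, restarts the app,
# and prints the resulting cluster status. Stops at the first failing step.
join_cluster() {
    jc_node=$1
    jc_seed=$2
    "$DOCKER" exec "$jc_node" rabbitmqctl stop_app &&
    "$DOCKER" exec "$jc_node" rabbitmqctl join_cluster "rabbit@$jc_seed" &&
    "$DOCKER" exec "$jc_node" rabbitmqctl start_app &&
    "$DOCKER" exec "$jc_node" rabbitmqctl cluster_status
}

# Example use:
#   join_cluster rabbit2 rabbit1
#   join_cluster rabbit3 rabbit1
```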