In a ZooKeeper ensemble with three nodes (as configured in the provided zoo.cfg), one server acts as the Leader at any given time while the others become Followers. When the Leader fails, the remaining servers initiate a new leader election, as part of the Zab protocol, over the configured election port 3888.
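The two surviving servers can still elect a leader because ZooKeeper only needs a majority of the ensemble to form a quorum. A minimal sketch of that arithmetic follows; the helper names `quorum_size` and `tolerated_failures` are illustrative and not part of any ZooKeeper API, and it assumes the default majority quorum with no observers or weighted groups.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of voting servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many voting servers can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


# Print ensemble size, quorum size, and tolerated failures for common sizes.
for n in (3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

For the three-node ensemble here the quorum is two, so the cluster survives the loss of the leader but not of a second server.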
The most reliable approach is to cross-check the result with more than one method:
# Method 1: Using stat command (ZooKeeper 3.5.3+ only answers it if it is listed in 4lw.commands.whitelist)
echo stat | nc localhost 2181 | grep Mode
# Method 2: Using srvr command (more detailed)
echo srvr | nc localhost 2181 | grep "Mode:"
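# Method 3: JMX monitoring (for production environments)
# Note: 2181 is the client port, not a JMX endpoint. Connect to the JMX port
# configured for the server JVM (shown here as a placeholder).
jconsole localhost:<jmx-port>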
Here's a Python script that programmatically determines node status:
import socket


def check_zookeeper_role(host='localhost', port=2181, timeout=3):
    """Return the server's role as reported by the 'stat' four-letter command."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b'stat\n')
            # Read until the server closes the connection; a single recv()
            # can truncate the reply before the "Mode:" line.
            chunks = []
            while True:
                chunk = s.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        data = b''.join(chunks).decode('utf-8', errors='replace')
    except OSError as e:
        return f'ERROR: {e}'

    for mode in ('leader', 'follower', 'standalone'):
        if f'Mode: {mode}' in data:
            return mode.upper()
    return 'UNKNOWN'


# Example usage
print(check_zookeeper_role())
When testing leader failover in your 3-node cluster:
- Identify the current leader using the methods above
- Gracefully stop the leader:
zookeeper-server-stop
- Monitor the election (as a rough guide, allow about 2*tickTime = 4000 ms in this config; elections often finish sooner)
- Verify that one of the remaining servers reports Mode: leader within about syncLimit*tickTime = 10000 ms
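The wait-and-verify steps above can be automated with a small polling loop. This is only a sketch: `wait_for_leader` and its parameters are hypothetical names, the role lookup is passed in as a function (for example the `check_zookeeper_role` helper, assumed to return 'LEADER' for the leader), and the 10-second default simply mirrors the syncLimit*tickTime figure quoted above.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_leader(
    servers: Iterable[str],
    get_role: Callable[[str], str],
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll the servers until one reports 'LEADER'; return its name or None."""
    servers = list(servers)
    deadline = clock() + timeout_s
    while True:
        for server in servers:
            if get_role(server) == "LEADER":
                return server
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Demo with a fake role lookup: zk2 becomes leader on the third poll round.
polls = {"count": 0}


def fake_role(server: str) -> str:
    polls["count"] += 1
    return "LEADER" if server == "zk2" and polls["count"] > 4 else "FOLLOWER"


print(wait_for_leader(["zk2", "zk3"], fake_role, interval_s=0.0))
```

In a real failover test you would pass only the surviving servers and a lookup that actually queries them, such as `lambda host: check_zookeeper_role(host)`.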
For enterprise deployments, consider these additional methods:
# Kubernetes probes example
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - "echo stat | nc localhost 2181 | grep -q Mode"
  initialDelaySeconds: 20
  periodSeconds: 10
In a ZooKeeper ensemble with three nodes (as configured in the zoo.cfg), exactly one server becomes the Leader while others become Followers. The Leader handles all write requests and coordinates the consensus protocol, while Followers serve read requests and participate in elections.
The most straightforward way to check a server's role is using the stat command via nc or telnet:
echo stat | nc localhost 2181
# or for remote servers:
echo stat | nc zk-server-ip 2181
Example output showing a Leader:
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
/192.168.1.10:44852[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x200000002
Mode: leader
Node count: 5
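If you want to consume this output from a script rather than by eye, the key/value lines can be split into a dictionary. A minimal sketch, assuming the layout shown above (`parse_stat` is a hypothetical helper, and the sample text is abridged from the output above):

```python
def parse_stat(text: str) -> dict:
    """Collect 'Key: value' lines from 'stat' output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Skip per-client entries, which start with '/' (possibly indented).
        if line.lstrip().startswith("/"):
            continue
        key, sep, value = line.partition(": ")
        # Lines such as "Clients:" carry no value and are ignored.
        if sep and value:
            fields[key.strip()] = value.strip()
    return fields


SAMPLE = """\
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
/192.168.1.10:44852[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Zxid: 0x200000002
Mode: leader
Node count: 5
"""

info = parse_stat(SAMPLE)
# The Zxid is hexadecimal, so it is converted with base 16.
print(info["Mode"], int(info["Zxid"], 16))
```

All values are kept as strings, so numeric fields such as Node count need an explicit conversion by the caller.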
For more detailed monitoring, consider these approaches:
1. Using the Four Letter Words (FLW) commands:
echo mntr | nc localhost 2181 | grep zk_server_state
# Output: zk_server_state leader or zk_server_state follower
2. Listing the ensemble members with the ZooKeeper CLI (this shows the configured servers, not which one is currently the leader):
bin/zkCli.sh -server localhost:2181 get /zookeeper/config | grep server
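3. JMX monitoring: Enable JMX and inspect the server's MBeans under org.apache.ZooKeeperService; the bean for a replicated server appears as Leader or Follower according to its current role. (zk_server_state is the mntr key, not a JMX attribute.)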
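For the mntr route, the reply is a list of key/value pairs, one per line, which makes it easy to parse. A minimal sketch follows; the helper name `parse_mntr` and the sample text are illustrative, and it assumes each line holds a key followed by whitespace and then the value.

```python
def parse_mntr(text: str) -> dict:
    """Split each 'key<whitespace>value' line of mntr output into a dict."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            metrics[parts[0]] = parts[1].strip()
    return metrics


# Illustrative sample, not captured from a real server.
SAMPLE = "zk_version\t3.6.3\nzk_server_state\tleader\nzk_znode_count\t5\n"

metrics = parse_mntr(SAMPLE)
print(metrics["zk_server_state"])
```

Reading the whole reply into a dictionary, rather than grepping for one key, also gives you the other counters in the same pass.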
To verify leader election works when the current leader fails:
# 1. Identify the current leader using the methods above
# 2. Gracefully shut down the leader:
bin/zkServer.sh stop
# 3. Check the remaining servers (a new leader is usually elected within a few
#    seconds; tickTime*syncLimit is a rough upper bound to wait, not a guarantee)
for server in zk1 zk2 zk3; do
    echo "Checking $server..."
    echo stat | nc "$server" 2181 | grep Mode
done
Here's a bash script to monitor the ensemble status:
#!/bin/bash
ZK_SERVERS=("zk1" "zk2" "zk3")
ZK_PORT=2181

for server in "${ZK_SERVERS[@]}"; do
    # grep's exit status tells us whether a "Mode" line came back
    if status=$(echo stat | nc "$server" "$ZK_PORT" 2>/dev/null | grep "Mode"); then
        echo "$server: $status"
    else
        echo "$server: DOWN"
    fi
done
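Beyond printing each server's mode, a monitor usually wants a single verdict for the ensemble. The sketch below is one possible check, assuming the per-server modes have already been collected (for instance by the loop above) into a mapping where `None` means the server did not answer; `summarize_ensemble` and its result keys are hypothetical names.

```python
from typing import Dict, Optional


def summarize_ensemble(modes: Dict[str, Optional[str]]) -> dict:
    """Classify servers by mode and flag whether the ensemble looks healthy."""
    leaders = sorted(s for s, m in modes.items() if m == "leader")
    followers = sorted(s for s, m in modes.items() if m == "follower")
    down = sorted(s for s, m in modes.items() if m is None)
    up = len(leaders) + len(followers)
    return {
        "leaders": leaders,
        "followers": followers,
        "down": down,
        # Healthy here means: exactly one leader, and a majority responding.
        "healthy": len(leaders) == 1 and up > len(modes) // 2,
    }


print(summarize_ensemble({"zk1": "follower", "zk2": None, "zk3": "leader"}))
```

Treating "exactly one leader plus a responding majority" as healthy is a design choice for this sketch; a stricter monitor might also alert as soon as any server is down.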
- The time a failover takes depends on tickTime together with initLimit and syncLimit
- Network partitions can affect leader election; use ping between servers to check basic connectivity
- For production, implement proper monitoring (Prometheus + Grafana is common)
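To make the timing relationship concrete: both limits are expressed in ticks, so multiplying by tickTime gives milliseconds. A small sketch, assuming a zoo.cfg with tickTime=2000 and syncLimit=5 (the values implied by the figures quoted earlier) and initLimit=10 (an assumed value, since the actual file is not shown); `parse_zoo_cfg` and `limits_ms` are hypothetical helpers.

```python
def parse_zoo_cfg(text: str) -> dict:
    """Read simple key=value lines, skipping blanks and '#' comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def limits_ms(cfg: dict) -> dict:
    """Convert initLimit and syncLimit from ticks to milliseconds."""
    tick = int(cfg["tickTime"])
    return {
        "initLimit_ms": int(cfg["initLimit"]) * tick,
        "syncLimit_ms": int(cfg["syncLimit"]) * tick,
    }


# Assumed values; substitute those from your own zoo.cfg.
SAMPLE_CFG = """
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
"""

print(limits_ms(parse_zoo_cfg(SAMPLE_CFG)))
```

initLimit bounds how long followers may take to connect and sync to a leader, and syncLimit bounds how far a follower may fall behind it, so these products are useful waiting limits for a test rather than an election timeout in the strict sense.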