Examining the DRBD status on both nodes shows both resources stuck in the StandAlone connection
state despite repeated configuration attempts. The primary node shows:
m:res cs ro ds p mounted fstype
0:r0 StandAlone Primary/Unknown UpToDate/DUnknown r----s ext3
While the secondary reports:
m:res cs ro ds p mounted fstype
0:r0 StandAlone Secondary/Unknown Inconsistent/DUnknown r----s
The kernel log pinpoints the problem:
[2285173.099330] block drbd0: bind before connect failed, err = -99
[2285173.099346] block drbd0: conn( WFConnection -> Disconnecting )
Error code -99 is EADDRNOTAVAIL: the kernel rejects the bind() because the address DRBD has been told to listen on is not assigned to any local interface.
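This can be confirmed independently of DRBD by comparing the addresses the kernel can actually bind against what the configuration asks for; a minimal check, assuming the default /etc/drbd.conf location:
# Addresses actually assigned to local interfaces (the only ones bind() will accept)
ip addr show | grep 'inet '
# Addresses DRBD has been told to listen on
grep -n 'address' /etc/drbd.conf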
The root cause is how EC2 implements Elastic IPs: the public address is mapped to the instance by NAT and never appears on the guest's network interfaces, so nothing running on the instance can bind to it. The DRBD configuration, however, specifies the public IPs:
on drbd01 {
    address 23.XX.XX.XX:7788;
}
on drbd02 {
    address 184.XX.XX.XX:7788;
}
The actual interfaces show private IP assignments:
# Primary node
eth0: inet addr:10.28.39.17
# Secondary node
eth0: inet addr:10.160.27.107
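The EC2 instance metadata service spells out this mapping and is a safe, read-only way to double-check which address belongs where on each node:
# The Elastic/public IP exists only as a NAT mapping at the EC2 edge ...
curl -s http://169.254.169.254/latest/meta-data/public-ipv4; echo
# ... while the private IP is the one the operating system can actually bind
curl -s http://169.254.169.254/latest/meta-data/local-ipv4; echo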
For DRBD to work properly on EC2, we must use the private IPs and configure security groups:
resource r0 {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "test123";
    }
    on drbd01 {
        device    /dev/drbd0;
        disk      /dev/xvdm;
        address   10.28.39.17:7788;
        meta-disk internal;
    }
    on drbd02 {
        device    /dev/drbd0;
        disk      /dev/xvdm;
        address   10.160.27.107:7788;
        meta-disk internal;
    }
}
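Before bringing anything up, it is worth re-parsing the configuration on each node; drbdadm dump reports a remaining mismatch explicitly (the "IP ... not found on this host" error shown further down):
# Re-reads /etc/drbd.conf and complains if an address is not local
drbdadm dump all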
After correcting the IP addresses:
- Reload the configuration (run on both nodes):
drbdadm adjust r0
- Start the initial synchronization from the node whose data should be kept, here the primary, which is already UpToDate:
drbdadm -- --overwrite-data-of-peer primary r0
- Verify that the nodes have connected:
cat /proc/drbd
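For quick spot checks between these steps, drbdadm can also print the individual state fields directly (DRBD 8.x subcommands):
drbdadm cstate r0   # connection state: should leave StandAlone and reach Connected
drbdadm dstate r0   # disk states, e.g. UpToDate/Inconsistent while the sync runs
drbdadm role r0     # roles, e.g. Primary/Secondary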
Ensure your EC2 security groups allow TCP traffic on port 7788 in both directions between the nodes; add a rule like the following to each node's group:
Type: Custom TCP Rule
Protocol: TCP
Port Range: 7788
Source: [other node's security group ID]
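If the groups are managed from the command line, the equivalent AWS CLI call looks roughly like this; the sg-xxxxxxxx and sg-yyyyyyyy IDs are placeholders for the two nodes' security groups, and the rule needs to exist in both directions:
aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx \
    --protocol tcp \
    --port 7788 \
    --source-group sg-yyyyyyyy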
Once connected, monitor sync progress with:
watch -n1 cat /proc/drbd
Expect to see the connection state transition from SyncTarget to Connected, with the oos (out-of-sync) count decreasing.
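To pull just the numbers instead of watching the full file, the resync percentage and the oos counter can be grepped straight out of /proc/drbd (format shown is DRBD 8.x):
# Prints the "sync'ed: xx.x%" progress line while syncing, plus the oos counter
grep -E "sync'ed|oos:" /proc/drbd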
Implement these best practices:
- Use EC2's internal DNS names for dynamic IP environments
- Configure monitoring for the DRBD connection state
- Set up alerts for split-brain conditions (see the handler sketch after this list)
- Regularly test failover procedures
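For the split-brain alerting, DRBD can invoke a handler script the moment it detects the condition. A minimal sketch, assuming the notify-split-brain.sh helper shipped with drbd-utils (the exact path can vary by distribution); the block goes inside the existing resource definition:
# Add inside the existing "resource r0 { ... }" block in /etc/drbd.conf
handlers {
    # Mails root when DRBD detects split brain; change the recipient as needed
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
}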
The raw /proc/drbd output captures the same StandAlone picture in more detail:
# Primary node
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown
ns:0 nr:0 dw:4 dr:1073 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:262135964
# Secondary node
0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:262135964
The critical error appears when running drbdadm dump all:
# On primary
/etc/drbd.conf:19: in resource r0, on drbd01:
IP 23.XX.XX.XX not found on this host.
# On secondary
/etc/drbd.conf:25: in resource r0, on drbd02:
IP 184.XX.XX.XX not found on this host.
These messages confirm what the ifconfig output above already showed: the interfaces carry only the private 10.x.x.x addresses, while drbd.conf references the Elastic IPs, which are NATed and never bound on the instances. The fix is the corrected resource definition above, which uses the private IPs.
After the configuration change, reconnect the resource:
# On both nodes: apply the new addresses to the running resource
drbdadm adjust r0
# (use "drbdadm up r0" instead only if the resource had been taken down)
# On the secondary node, make sure it is in the secondary role:
drbdadm secondary r0
# On the primary node: force it to be the sync source; this is the newer
# spelling of the --overwrite-data-of-peer invocation shown earlier
drbdadm primary --force r0
If issues persist, check:
# Network connectivity (run from the primary against the secondary's private IP)
nc -zv 10.160.27.107 7788
# Firewall rules
iptables -L -n | grep 7788
# DRBD connection status
cat /proc/drbd
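One further check that is easy to overlook: confirm DRBD is actually listening on its replication port. A sketch using standard tools (whichever of ss or netstat the AMI ships); the socket belongs to the kernel, so no owning process is shown:
# Run on each node; expect a LISTEN entry on port 7788 once the resource is up
# and waiting for its peer (there is no listener while it sits in StandAlone)
ss -tln | grep 7788
netstat -tln | grep 7788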