When working with DRBD 8.3.13 on CentOS 5 in an OCFS2 cluster configuration, you may find that after a split brain the resource is StandAlone and the node refuses to demote to the Secondary role. The key error message is:
1: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 1 secondary' terminated with exit code 11
First verify the current DRBD status:
# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
ns:0 nr:0 dw:112281991 dr:797551 al:99 bm:6401 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:60
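To check the state from a script rather than by eye, the fields of interest can be pulled out of the /proc/drbd text. This is a minimal sketch, assuming the 8.3 layout shown above (a line starting with the minor number, followed by cs:, ro: and ds: fields). The function name drbd_state is hypothetical:

```bash
# drbd_state MINOR: read /proc/drbd-format text on stdin and print the
# connection state, roles and disk states of one minor, for example
#   StandAlone Primary/Unknown UpToDate/DUnknown
drbd_state() {
    local minor=$1
    awk -v m="$minor:" '
        $1 == m {
            cs = ro = ds = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/)      cs = substr($i, 4)   # connection state
                else if ($i ~ /^ro:/) ro = substr($i, 4)   # local/peer role
                else if ($i ~ /^ds:/) ds = substr($i, 4)   # local/peer disk state
            }
            print cs, ro, ds
        }'
}
```

On a live node, run drbd_state 1 < /proc/drbd. StandAlone combined with an Unknown peer role is the pattern shown above.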
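Then check for processes holding the DRBD device open. Because lsof | grep drbd matches on names, it also lists DRBD's own kernel threads; querying the device node directly is more precise:
# lsof /dev/drbd1
# fuser -v /dev/drbd1
# ps aux | grep drbd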
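Matching lsof output on the string drbd picks up DRBD's kernel threads by name. A more precise filter looks at the DEVICE column instead: DRBD devices use block major 147, so entries with DEVICE 147,N are either the /dev/drbdN node itself or files on a filesystem mounted from it. This is a sketch, assuming lsof's classic column layout with DEVICE as the sixth field (some newer lsof versions insert a TID column). The name drbd_lsof_holders is hypothetical:

```bash
# drbd_lsof_holders MINOR: read lsof output on stdin and print COMMAND and PID
# for every entry whose DEVICE column is 147,MINOR (147 = DRBD block major).
# Assumes DEVICE is field 6, i.e. no TID column in the lsof output.
drbd_lsof_holders() {
    local minor=$1
    # NR > 1 skips the lsof header line; sort -u collapses repeated fds.
    awk -v dev="147,$minor" 'NR > 1 && $6 == dev { print $1, $2 }' | sort -u
}
```

Usage: lsof -n | drbd_lsof_holders 1. An empty result while DRBD still reports the device as held open points to a kernel-side holder, such as an OCFS2 heartbeat region or a device-mapper mapping, which lsof cannot see.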
OCFS2 runs a disk heartbeat on every mounted volume, and the heartbeat region keeps the block device open. Verify its status:
# service ocfs2 status
# mount | grep ocfs2
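The mount check can be narrowed to OCFS2 volumes that live on DRBD. This sketch reads /proc/mounts and assumes the volume was mounted by its /dev/drbdN name (a mount made through a udev symlink may be listed under a different path). The helper name ocfs2_on_drbd is hypothetical:

```bash
# ocfs2_on_drbd: read /proc/mounts-format text on stdin and print
# "device mountpoint" for every ocfs2 filesystem mounted from /dev/drbdN.
# Fields: 1 = device, 2 = mount point, 3 = filesystem type.
ocfs2_on_drbd() {
    awk '$3 == "ocfs2" && $1 ~ /^\/dev\/drbd[0-9]+$/ { print $1, $2 }'
}
```

Usage: ocfs2_on_drbd < /proc/mounts. Any output means OCFS2 still has the DRBD device mounted and demotion will fail.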
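Even if OCFS2 appears unmounted, a heartbeat region that was never stopped can still hold the device open. Look for its kernel thread (named o2hb- followed by the region UUID) and for active heartbeat regions in configfs:
# ps aux | grep o2hb
# ls /sys/kernel/config/cluster/*/heartbeat/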
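The heartbeat threads can also be listed without grep matching its own command line. This sketch filters ps -eo pid,comm output for thread names beginning with o2hb-, the prefix OCFS2 gives its per-region heartbeat threads. The helper name is hypothetical, and since comm is truncated to 15 characters, only the start of the region UUID is visible:

```bash
# o2hb_threads: read `ps -eo pid,comm` output on stdin and print the PID and
# name of every OCFS2 disk-heartbeat thread. A thread still present after all
# OCFS2 volumes are unmounted suggests a heartbeat region that was not stopped.
o2hb_threads() {
    # NR > 1 skips the ps header line.
    awk 'NR > 1 && $2 ~ /^o2hb-/ { print $1, $2 }'
}
```

Usage: ps -eo pid,comm | o2hb_threads.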
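Square brackets around a name in ps output, as in:
root 7782 1 0 Apr22 ? 00:00:20 [drbd1_worker]
mark a kernel thread, not a zombie; zombies show Z in the STAT column and <defunct> after the name. drbd1_worker is DRBD's worker thread for minor 1. It is a kernel thread, so kill -9 will not remove it, and it is not what holds the device open. A dump of all task stacks shows where it and any other blocked tasks are waiting: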
# echo t > /proc/sysrq-trigger
# dmesg | grep -A20 "drbd1_worker"
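The distinction between kernel threads and zombies can be checked mechanically from ps -eo pid,stat,args output. This is a heuristic sketch with a hypothetical function name: a Z in STAT means zombie, while a command line consisting of a single bracketed word means the task has no argv, which on Linux normally indicates a kernel thread. A user process can disguise its argv the same way, so treat the label as a hint:

```bash
# classify_tasks: read `ps -eo pid,stat,args` output on stdin and print
# "PID KIND NAME" where KIND is zombie, kernel-thread or user.
classify_tasks() {
    awk 'NR > 1 {
        kind = "user"
        if ($2 ~ /Z/)
            kind = "zombie"                      # STAT contains Z
        else if (NF == 3 && $3 ~ /^\[.*\]$/)
            kind = "kernel-thread"               # bracketed name, no argv
        print $1, kind, $3
    }'
}
```

Usage: ps -eo pid,stat,args | classify_tasks | grep drbd.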
Check whether LVM has activated a device-mapper volume on top of the DRBD device (block major 147); an active mapping holds the device open:
# vgdisplay -v
# lvdisplay -m
# dmsetup ls --tree -o inverted
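Device-mapper devices stacked on a block device are also listed in sysfs under /sys/block/<dev>/holders. This is a sketch, assuming the running kernel exposes that directory for DRBD devices. The name drbd_holders is hypothetical, and the optional second argument exists only so the function can be pointed at a test tree instead of /sys:

```bash
# drbd_holders DEV [SYSFS]: list the devices that have claimed DEV, for
# example dm-0 when an LVM volume is active on top of it.
drbd_holders() {
    local dev=$1 sysfs=${2:-/sys}
    local dir="$sysfs/block/$dev/holders"
    if [ ! -d "$dir" ]; then
        echo "no holders directory for $dev" >&2
        return 1
    fi
    # Each entry is a device stacked on DEV; an empty list prints nothing.
    ls -1 "$dir"
}
```

Usage: drbd_holders drbd1. Mounted filesystems and OCFS2 heartbeat regions open the device without registering as holders, so an empty list does not by itself prove the device is free.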
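1. Stop OCFS2 cleanly. The o2hb-* and other OCFS2 threads are kernel threads, so killall -9 cannot remove them; unmount the volumes and stop the cluster stack instead:
# umount -a -t ocfs2
# service o2cb stop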
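2. Demote the resource. This only succeeds once nothing holds the device open:
# drbdadm secondary r0
3. Resolve the split brain manually. On the split-brain victim (the node whose changes will be discarded), reconnect with:
# drbdadm -- --discard-my-data connect r0
On the split-brain survivor, if it is also StandAlone, reconnect with:
# drbdadm connect r0
After the resync completes, verify the new state on both nodes:
# cat /proc/drbd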
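The order matters in manual split-brain recovery: the victim must be Secondary before it reconnects with --discard-my-data, while the survivor only needs to reconnect if it is StandAlone too. This small dry-run sketch prints, without running, the commands for each role so they can be reviewed first. sb_recovery_plan is a hypothetical helper, and the sequence follows the manual recovery procedure described in the DRBD 8.3 user's guide:

```bash
# sb_recovery_plan ROLE RESOURCE: print (do not run) the manual split-brain
# recovery commands for one node. ROLE is "victim" (its changes are discarded
# and it resyncs from the peer) or "survivor" (only needed if it is also
# StandAlone).
sb_recovery_plan() {
    local role=$1 res=$2
    case $role in
        victim)
            echo "drbdadm secondary $res"
            echo "drbdadm -- --discard-my-data connect $res"
            ;;
        survivor)
            echo "drbdadm connect $res"
            ;;
        *)
            echo "usage: sb_recovery_plan victim|survivor RESOURCE" >&2
            return 2
            ;;
    esac
}
```

Running sb_recovery_plan victim r0 prints the two victim commands; once they have been checked, they can be executed with sb_recovery_plan victim r0 | sh -x.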
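To let DRBD resolve some future split brains automatically, consider adding recovery policies to the net section of the resource. They automate recovery rather than prevent split brain, and when both nodes are Primary at the time, as in a dual-primary OCFS2 setup, the after-sb-2pri policy applies, which here simply disconnects: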
net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
}
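To confirm which policies DRBD actually parsed, the effective configuration can be printed with drbdadm dump r0 and filtered. This is a sketch with a hypothetical helper name; it simply reports each after-sb-0pri, after-sb-1pri and after-sb-2pri option with its value:

```bash
# after_sb_policies: read DRBD configuration text on stdin (for example the
# output of `drbdadm dump r0`) and print "option value" for each after-sb-*
# policy, with the trailing semicolon removed. Commented lines do not match.
after_sb_policies() {
    awk '$1 ~ /^after-sb-[012]pri$/ { v = $2; sub(/;$/, "", v); print $1, v }'
}
```

Usage: drbdadm dump r0 | after_sb_policies. No output means the defaults are in effect.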
When running DRBD 8.3.13 with OCFS2 on CentOS 5, you may encounter a situation where:
1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
ns:0 nr:0 dw:112281991 dr:797551 al:99 bm:6401 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:60
The failure shows up when trying to demote the node to Secondary:
drbdadm secondary r0
1: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 1 secondary' terminated with exit code 11
Our first clue comes from process examination:
# lsof | grep drbd
drbd1_wor 7782 root cwd DIR 253,0 4096 2 /
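This lsof hit is only the working directory of DRBD's worker thread (/ on device 253,0, VolGroup00-LogVol00), not /dev/drbd1. The matching ps entry is a kernel thread rather than a zombie (square brackets mean it has no command line):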
root 7782 1 0 Apr22 ? 00:00:20 [drbd1_worker]
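Checking the storage topology shows that nothing is stacked on DRBD: both logical volumes sit on device 202:2 (a Xen virtual disk, xvda2), and no DRBD device (block major 147) appears in the tree: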
# dmsetup ls --tree -o inverted
(202:2)
├─VolGroup00-LogVol01 (253:1)
└─VolGroup00-LogVol00 (253:0)
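The same check can be automated. In the inverted tree each top-level line is an underlying device and the indented lines beneath it are the device-mapper volumes built on it, so a DRBD device would appear as a top-level (147:N) entry. This sketch assumes the layout shown above; dm_on_drbd is a hypothetical name and returns non-zero when nothing is stacked on DRBD:

```bash
# dm_on_drbd: read `dmsetup ls --tree -o inverted` output on stdin and print
# each DRBD (major 147) entry together with the volumes listed under it.
# Exit status is 0 if any DRBD device carries a mapping, 1 otherwise.
dm_on_drbd() {
    awk '
        /^\(147:[0-9]+\)/ { inblk = 1; found = 1; print; next }  # DRBD root
        /^\(/             { inblk = 0 }                          # other root
        inblk             { print }                              # its children
        END               { exit !found }'
}
```

Usage: dmsetup ls --tree -o inverted | dm_on_drbd. For the output above it prints nothing and returns 1.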
A sysrq-t task dump (echo t > /proc/sysrq-trigger) shows the worker thread sleeping (state S) inside the DRBD module:
kernel: drbd1_worker S ffff81007ae21820 0 7782 1 7795 7038 (L-TLB)
kernel: ffff810055d89e00 0000000000000046 000573a8befba2d6 ffffffff8008e82f
kernel: [] :drbd:.text.lock.drbd_worker+0x2d/0x43
The following steps can release the device without a reboot, provided whatever holds it open can be released:
- First, ensure OCFS2 is completely unmounted:
umount -f /data
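- Do not try to kill drbd1_worker (PID 7782): kill -9 cannot remove a kernel thread, and this thread does not hold the device open anyway. Look for a real user-space holder instead:
fuser -v /dev/drbd1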
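- Check for kernel-side holders, such as a device-mapper volume stacked on the DRBD device, and remove any stale mapping with dmsetup remove:
ls /sys/block/drbd1/holders/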
- Finally, switch to secondary:
drbdadm secondary r0
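If demotion still fails, there is no DRBD 8.3 option to force it while the device is held open. The remaining reference is then held inside the kernel, for example by an OCFS2 heartbeat region that was never stopped, and a reboot is usually the only way to clear it.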
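Add split-brain recovery policies to the resource so DRBD can resolve some future split brains automatically; they do not prevent the "held open" error, which still requires releasing the device: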
resource r0 {
    net {
        # Add these parameters
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
}