In a recent Windows Server 2016 Failover Cluster setup with Storage Spaces Direct (S2D), I encountered a critical issue where the system automatically incorporated RAID volumes into the storage pool - despite S2D explicitly requiring direct-attached, non-RAID disks. This caused the virtual disk to go offline, bringing down the entire SQL Failover Cluster Instance (FCI).
The configuration consists of:
- 2 HPE DL380 Gen9 servers with dual 10Gb RDMA connections
- HP RAID controller (for OS and Files volumes)
- HP HBA controller (for S2D raw disks)
- Windows Server 2016 Datacenter Edition
- SQL Server 2016 Standard FCI
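With two controllers in the same chassis, the first useful step is to see how each disk enumerates. A quick inventory (stock Storage-module cmdlets only) makes the RAID/HBA split visible before any pooling decisions:
# Disks behind the RAID controller report BusType 'RAID'; the HBA's raw
# SAS disks report BusType 'SAS'. That distinction drives the filtering below.
Get-PhysicalDisk | Sort-Object BusType |
Format-Table FriendlyName, SerialNumber, BusType, CanPool, Size -AutoSize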
After extensive testing, I found these reliable methods to prevent S2D from auto-adding disks:
1. Physical Disk Masking via PowerShell
Before enabling S2D, explicitly specify which physical disks should be included:
# Get all available physical disks
$disks = Get-PhysicalDisk -CanPool $true
# Filter disks by SerialNumber or FriendlyName
$s2dDisks = $disks | Where-Object { $_.SerialNumber -in @("1234ABCD","5678EFGH") }
# Create pool only with selected disks
New-StoragePool -FriendlyName S2DPool -StorageSubsystemFriendlyName "Windows Storage*" `
-PhysicalDisks $s2dDisks -Verbose
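A quick sanity check after creation (using the pool name from the example above) confirms that only the intended disks made it in:
# Verify pool membership before putting any workload on it
Get-StoragePool -FriendlyName S2DPool | Get-PhysicalDisk |
Select-Object FriendlyName, SerialNumber, BusType, Usage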
2. Storage Pool Policy Configuration
After pool creation, modify the auto-configuration behavior:
# Lock the pool against configuration changes, including new disk additions
Set-StoragePool -FriendlyName S2DPool -IsReadOnly $true
# Alternatively, switch every pooled disk from AutoSelect to manual allocation
Get-StoragePool -FriendlyName S2DPool | Get-PhysicalDisk | Set-PhysicalDisk -Usage ManualSelect
3. Disk BusType Filtering
For HPE hardware specifically, filter by bus type to exclude RAID volumes:
# Only include raw SAS disks presented by the HBA (RAID volumes report BusType 'RAID', not 'SAS')
$s2dDisks = Get-PhysicalDisk | Where-Object {
$_.BusType -eq 'SAS' -and $_.OperationalStatus -eq 'OK'
}
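If the filter comes back empty or with surprises, group the inventory by bus type first; on hardware like this, the HBA disks should be the only 'SAS' entries:
# Count disks per bus type to validate the assumption before pooling
Get-PhysicalDisk | Group-Object BusType | Format-Table Name, Count -AutoSize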
For those already experiencing this problem, here's how to recover:
1. Force Remove Problematic Disks
# First retire the disk
Set-PhysicalDisk -FriendlyName "ProblemDisk" -Usage Retired
# Then remove it from the pool (Remove-PhysicalDisk takes disk objects, not a -FriendlyName)
$bad = Get-PhysicalDisk -FriendlyName "ProblemDisk"
Remove-PhysicalDisk -PhysicalDisks $bad -StoragePoolFriendlyName S2DPool -Confirm:$false
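After a disk leaves the pool, it is worth running an explicit optimize pass so extents are redistributed across the remaining disks:
# Rebalance the pool, then watch the resulting background job
Optimize-StoragePool -FriendlyName S2DPool
Get-StorageJob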
2. Virtual Disk Repair Commands
# First check virtual disk health and any repair jobs already in flight
Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
Get-StorageJob
# Then start the full repair as a background job
Repair-VirtualDisk -FriendlyName "ClusterDisk" -AsJob
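Repairs on a degraded pool can run for hours; a minimal polling loop over Get-StorageJob keeps the progress visible:
# Poll until no storage jobs remain in the Running state
while (Get-StorageJob | Where-Object { $_.JobState -eq 'Running' }) {
Get-StorageJob | Format-Table Name, JobState, PercentComplete -AutoSize
Start-Sleep -Seconds 30
}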
Key lessons from this incident:
- Always pre-filter disks before pool creation
- Document and validate disk serial numbers
- Implement regular health checks with:
Get-StorageSubSystem -FriendlyName Cluster* | Debug-StorageSubSystem
- Consider using the S2D cache configuration to further pin disk roles (see the sketch below)
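On that last point: the supported way to stop S2D from claiming every eligible disk is to enable it without auto-configuration, and the cache parameters can pin specific device models to the cache role. A minimal sketch (the model string is a placeholder; read the real one from Get-PhysicalDisk):
# Enable S2D without auto-building the pool, and pin a hypothetical
# device model as cache
Enable-ClusterStorageSpacesDirect -Autoconfig:$false -CacheDeviceModel "HP MO0800JEFPA"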
Stepping back to the broader pattern: in Windows Server 2016 Failover Clusters with Storage Spaces Direct (S2D), the system automatically incorporates RAID volumes into the storage pool despite S2D's explicit requirement for a JBOD (Just a Bunch Of Disks) configuration. This is a ticking time bomb: any later modification to a RAID volume can crash the entire cluster.
The issue stems from Windows Server's overly aggressive disk auto-incorporation logic. When examining failed clusters, we consistently find:
- RAID volumes appearing as PhysicalDisks in Get-PhysicalDisk output
- Storage pools containing mixed JBOD and RAID disks
- Retirement operations that fail to fully remove problematic disks
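A quick way to check whether an existing pool is already contaminated (pool name as in the earlier examples):
# Any rows returned here mean RAID volumes have crept into the pool
Get-StoragePool -FriendlyName S2DPool | Get-PhysicalDisk |
Where-Object { $_.BusType -eq 'RAID' } |
Select-Object FriendlyName, SerialNumber, Usage, OperationalStatus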
To prevent automatic disk incorporation, use these PowerShell commands during initial setup:
# First, create the pool with only your intended disks
$eligibleDisks = Get-PhysicalDisk -CanPool $true | Where-Object { $_.BusType -ne 'RAID' }
New-StoragePool -FriendlyName S2DPool -StorageSubsystemFriendlyName 'Windows Storage*' -PhysicalDisks $eligibleDisks
# Then lock down the pool to prevent auto-expansion
Set-StoragePool -FriendlyName S2DPool -IsReadOnly $true
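Keep in mind that the read-only flag blocks legitimate maintenance too, so clear it before replacing a disk or adding capacity:
# Re-enable configuration changes for a planned maintenance window
Set-StoragePool -FriendlyName S2DPool -IsReadOnly $false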
For production environments, implement these safeguards:
- Disk Filtering Policy: Restrict disk types with a registry value on each node (deployable via Group Policy Preferences; this Spaceport parameter is not widely documented, so validate it in a lab first):
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\SpacePort\Parameters" `
-Name "ExcludePhysicalDisks" -PropertyType MultiString -Value "RAID"
- Cluster Validation Rule: Add a custom cluster resource type plus a guard resource that polls every five minutes:
Add-ClusterResourceType -Name "S2DComplianceCheck" -Dll "%SystemRoot%\cluster\resrc.dll"
$guard = Add-ClusterResource -Name "S2DGuard" -Group "Cluster Group" -ResourceType "S2DComplianceCheck"
$guard.IsAlivePollInterval = 300000
If you're already in a broken state, try this recovery sequence:
# Step 1: Take the pool offline safely
Get-ClusterResource | Where-Object { $_.Name -like "S2D*" } | Stop-ClusterResource
# Step 2: Force remove problematic disks
Get-StoragePool -FriendlyName S2DPool | Get-PhysicalDisk |
Where-Object { $_.BusType -eq 'RAID' } |
Set-PhysicalDisk -Usage Retired -Description "Manual isolation"
# Step 3: Perform deep repair
$repairJob = Repair-VirtualDisk -FriendlyName ClusterVirtualDisk -AsJob
Wait-Job $repairJob
# Step 4: Validate consistency
Test-Cluster -Include "Storage Spaces Direct" -ReportName "S2DRecoveryReport"
- Always use dedicated HBA controllers (no RAID capabilities)
- Keep the OS and S2D storage on separate physical disks
- Implement regular Get-PhysicalDisk audits with this script:
$report = Get-PhysicalDisk | Select-Object FriendlyName, BusType, Size, OperationalStatus, HealthStatus, Usage
$report | Export-Csv -Path "C:\S2D_Audit_$(Get-Date -Format yyyyMMdd).csv"
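To make the audit routine rather than ad hoc, it can be registered as a daily scheduled task; the script path and task name below are placeholders:
# Run the audit script every morning at 06:00 (path and task name are hypothetical)
$action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-NoProfile -File C:\Scripts\S2D-Audit.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 6am
Register-ScheduledTask -TaskName "S2D-Disk-Audit" -Action $action -Trigger $trigger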
Add these performance counters to your monitoring solution:
\Storage Spaces Drt(*)\Pool Metadata Operation Failures
\Storage Spaces Drt(*)\Pool Unhealthy
\Cluster Disk(*)\Disk Offline
\Cluster Disk(*)\Resource Online
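Where the monitoring agent cannot scrape these directly, Get-Counter can sample them from PowerShell (the counter paths are the ones listed above; instance names vary by environment):
# One-off sample of the pool-health counter; loop or schedule it for trending
Get-Counter -Counter "\Storage Spaces Drt(*)\Pool Unhealthy" -SampleInterval 5 -MaxSamples 3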