When performing maintenance on Azure VMs running IIS (or any stateful service), simply removing the VM from the backend pool isn't enough. Unlike traditional NLB's drain stop functionality, Azure Load Balancer (ALB) operates at layer 4 and doesn't inherently support graceful connection draining. Here's how we solved this in our production environment.
We implemented a three-phase approach combining Azure native features and application-level controls:
1. Health Probe Manipulation
2. Connection Draining Timer
3. Application Warmup Verification
Azure LB uses health probes to determine VM availability. We created a custom health check endpoint:
// In Program.cs (ASP.NET Core minimal API)
app.MapGet("/health", (HttpContext context) =>
{
    // Same path the maintenance script creates on the VM
    var maintenanceMode = File.Exists(@"C:\maintenance.flag");
    return maintenanceMode
        ? Results.StatusCode(StatusCodes.Status503ServiceUnavailable)
        : Results.Ok();
});
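Returning 503 does not take the VM out of rotation instantly: the load balancer first has to see several consecutive failed probes. A rough sketch of that arithmetic in Python follows. The function names are illustrative, and the 5-second interval and threshold of 2 used in the note below are assumed example probe settings, not values taken from this environment.

```python
def probe_down_delay(interval_s: int, unhealthy_threshold: int) -> int:
    """Approximate seconds until the probe marks the VM down.

    Assumes the VM is marked down after `unhealthy_threshold` consecutive
    failed probes sent `interval_s` seconds apart.
    """
    if interval_s <= 0 or unhealthy_threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    return interval_s * unhealthy_threshold


def total_drain_wait(interval_s: int, unhealthy_threshold: int,
                     max_session_s: int) -> int:
    """Probe-down delay plus the longest session we want to let finish."""
    return probe_down_delay(interval_s, unhealthy_threshold) + max_session_s
```

With a 5-second interval, a threshold of 2 and a 300-second session budget, `total_drain_wait(5, 2, 300)` gives 310 seconds, which is the kind of value to feed into the drain timer rather than a bare 300.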
For IIS specifically, we use Application Initialization and ARR affinity:
<applicationInitialization
    remapManagedRequestsTo="Warmup.htm"
    skipManagedModules="true">
  <add initializationPage="/" />
</applicationInitialization>
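Application Initialization warms the site, but something still has to decide when the VM is ready to go back into the pool. One possible way to do that check, sketched in Python: require a few consecutive successful responses before declaring the site warm. The `check` callable is a stand-in for whatever request you make against the site (it is hypothetical, not part of the setup above), and the thresholds are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_warm(check: Callable[[], bool],
                    required_successes: int = 3,
                    max_attempts: int = 20,
                    pause_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once `check` succeeds `required_successes` times in a row.

    Gives up and returns False after `max_attempts` calls to `check`.
    """
    streak = 0
    for attempt in range(max_attempts):
        if check():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            # A single failure resets the streak: the site is not stable yet.
            streak = 0
        if attempt < max_attempts - 1:
            sleep(pause_s)
    return False
```

Requiring a streak rather than a single success guards against a first request that happens to hit a cached page while the rest of the application is still starting.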
Here's our PowerShell workflow for coordinated drain stop:
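# Step 1: Set maintenance flag
Invoke-AzVMRunCommand -ResourceGroupName $rg `
    -VMName $vmName `
    -CommandId 'RunPowerShellScript' `
    -ScriptString 'New-Item -Path C:\maintenance.flag -ItemType File'
# Step 2: Wait for connections to drain
Start-Sleep -Seconds 300 # Adjust based on average session duration
# Step 3: Remove from backend pool
$backendPool = Get-AzLoadBalancerBackendAddressPool `
    -ResourceGroupName $rg `
    -LoadBalancerName $lbName
# Remove-AzNetworkInterfaceIpConfig would delete the whole IP configuration,
# so detach only the pool reference and save the NIC.
# $nic is assumed to hold the VM's NIC (e.g. from Get-AzNetworkInterface).
$ipConfig = $nic.IpConfigurations | Where-Object { $_.Name -eq "ipconfig1" }
$attached = @($ipConfig.LoadBalancerBackendAddressPools | Where-Object { $_.Id -in $backendPool.Id })
foreach ($pool in $attached) { [void]$ipConfig.LoadBalancerBackendAddressPools.Remove($pool) }
Set-AzNetworkInterface -NetworkInterface $nic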
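Use this KQL query in Azure Monitor to confirm that the health probe has marked the VM down. DipAvailability is the load balancer's health probe status metric, so it tells you when new flows stop arriving, not how many connections are still open: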
AzureMetrics
| where ResourceProvider == "MICROSOFT.NETWORK"
| where MetricName == "DipAvailability"
| where TimeGenerated > ago(15m)
| summarize avg(Average) by bin(TimeGenerated, 1m), Resource
For complex scenarios, consider Azure Service Fabric or AKS. On AKS, Kubernetes handles graceful draining with:
- Pod Disruption Budgets
- Readiness Probes
- PreStop Hooks
When working with Azure Load Balancer (ALB) for IIS deployments, the concept of "drain stopping" isn't natively available like it was in Microsoft NLB. However, we can achieve similar functionality through proper configuration.
Here's the complete workflow to gracefully remove a VM from ALB:
# Az PowerShell example
$resourceGroup = "Your-RG-Name"
$lbName = "Your-LB-Name"
$backendPoolName = "Your-Backend-Pool"
$vmName = "Your-VM-Name"
# First, get the NIC associated with the VM
$nic = Get-AzNetworkInterface -ResourceGroupName $resourceGroup |
    Where-Object { $_.VirtualMachine.Id -like "*/virtualMachines/$vmName" }
# Get the load balancer backend pool configuration
$lb = Get-AzLoadBalancer -Name $lbName -ResourceGroupName $resourceGroup
$backendPool = $lb.BackendAddressPools | Where-Object { $_.Name -eq $backendPoolName }
# Remove the pool from the VM's IP configuration. Match by Id: the object taken
# from $lb is a different instance from the one in the NIC's list, so passing it
# straight to Remove() may not match anything.
$ipConfig = $nic.IpConfigurations[0]
$attached = $ipConfig.LoadBalancerBackendAddressPools | Where-Object { $_.Id -eq $backendPool.Id }
if ($attached) { [void]$ipConfig.LoadBalancerBackendAddressPools.Remove($attached) }
Set-AzNetworkInterface -NetworkInterface $nic
# Alternative approach using Azure CLI:
# az network nic ip-config address-pool remove --resource-group MyResourceGroup --nic-name MyNic --ip-config-name ipconfig1 --lb-name MyLb --address-pool MyBackendPool
Since Azure LB passes connections through without terminating them and offers no drain setting, we need to implement draining at the application level:
// C# example for IIS application
using System.Web;
public class DrainModeModule : IHttpModule
{
    // volatile so a change made on one request thread is seen by the others
    private static volatile bool _isDraining = false;
    public void Init(HttpApplication context)
    {
        context.BeginRequest += (sender, e) =>
        {
            if (_isDraining)
            {
                var app = (HttpApplication)sender;
                app.Response.StatusCode = 503;
                app.Response.Write("Server in maintenance mode");
                // CompleteRequest() skips the rest of the pipeline without the
                // ThreadAbortException that Response.End() raises
                app.CompleteRequest();
            }
        };
    }
    public static void EnableDrainMode() => _isDraining = true;
    public static void DisableDrainMode() => _isDraining = false;
    public void Dispose() { }
}
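Note that the module above rejects every request once drain mode is on, including requests from users who are mid-session. If the goal is closer to NLB's drain stop (finish existing work, refuse new work), one possible variation is to keep serving requests that already carry a session and fail only new visitors and the health probe. The decision logic is sketched below in Python to keep it language-neutral. The function and its three flags are hypothetical, and how you detect an existing session (for example, a session cookie) is an assumption about your application.

```python
def drain_decision(draining: bool, has_session: bool, is_health_probe: bool) -> int:
    """Return the HTTP status to answer with: 200 means "let it through"."""
    if not draining:
        return 200
    if is_health_probe:
        # Fail the probe so the load balancer stops sending new flows here.
        return 503
    # Existing sessions may finish; anyone without a session is turned away.
    return 200 if has_session else 503
```

The same three-way check would translate directly into the `BeginRequest` handler; the trade-off is that the drain only completes when the remaining sessions end or time out.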
For complete automation, combine these approaches in a maintenance script:
# PowerShell maintenance script
param(
    [string]$ResourceGroup,
    [string]$VMName,
    [string]$WebAppUrl
)
# Step 1: Enable drain mode in application
# $WebAppUrl must address this VM directly, not the load-balanced endpoint,
# otherwise the request may land on a different instance
Invoke-RestMethod -Uri "$WebAppUrl/api/maintenance/enable" -Method Post
# Step 2: Wait for active connections to complete
Start-Sleep -Seconds 300 # Adjust based on your average session duration
# Step 3: Remove from load balancer
$nic = Get-AzNetworkInterface -ResourceGroupName $ResourceGroup |
    Where-Object { $_.VirtualMachine.Id -like "*/virtualMachines/$VMName" }
# Detaches the NIC's first IP configuration from every backend pool
$nic.IpConfigurations[0].LoadBalancerBackendAddressPools = $null
Set-AzNetworkInterface -NetworkInterface $nic
# Step 4: Perform your maintenance tasks
# ...
Use this PowerShell snippet to check active IIS connections before proceeding with maintenance:
# Get IIS active connections from the Web Service performance counter
# (run on the VM itself)
$site = "Default Web Site"
$counter = "\Web Service($site)\Current Connections"
$activeConnections = (Get-Counter -Counter $counter).CounterSamples[0].CookedValue
while ($activeConnections -gt 0) {
    Write-Host "Waiting for $activeConnections active connections to complete..."
    Start-Sleep -Seconds 30
    $activeConnections = (Get-Counter -Counter $counter).CounterSamples[0].CookedValue
}
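The loop above waits indefinitely, so a single stuck or keep-alive connection can block maintenance forever. A bounded version of the same idea is sketched below in Python. `get_active` is a placeholder for whatever returns the current connection count, and the 300-second and 30-second defaults simply mirror the numbers used elsewhere in this answer.

```python
import time
from typing import Callable


def wait_for_drain(get_active: Callable[[], int],
                   timeout_s: float = 300.0,
                   poll_s: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until no connections remain; return False if the timeout hits first."""
    deadline = clock() + timeout_s
    while True:
        if get_active() <= 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A `False` result is the point to decide whether to proceed anyway and cut the remaining connections, or to abort and put the VM back into rotation.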
If you're using Application Gateway instead of Load Balancer, connection draining is built-in and is configured on the backend HTTP settings:
# Configure connection draining on the Application Gateway backend HTTP settings
# (MyHttpSettings is a placeholder for your settings name)
az network application-gateway http-settings update \
  --resource-group MyResourceGroup \
  --gateway-name MyAppGateway \
  --name MyHttpSettings \
  --connection-draining-timeout 300