How to Properly Bind and Control systemd Unit Dependencies for CoreOS Fleet Services


When working with CoreOS and fleet, managing service dependencies correctly is crucial for orchestrated container deployments. The key issue here is how BindsTo interacts with the other unit directives: it stops this unit when the bound unit goes away, but it needs After for ordering, and it does not start this unit when the bound unit starts.

[Unit]
# This is the correct way to establish a tight binding
BindsTo=firehose@%i.service
After=firehose@%i.service

Your current configuration has several subtle issues:

  • Requires is weaker than BindsTo and adds nothing once BindsTo names the same unit
  • Nothing restarts the announcer after it exits with an error, so it stays in the failed state
  • Fleet's machine condition only places the two units on the same machine; it does not start or order them

Here's the improved version that properly handles the lifecycle binding:

[Unit]
Description=Firehose etcd announcer
BindsTo=firehose@%i.service
After=firehose@%i.service
# Removed Requires as it's redundant with BindsTo

[Service]
EnvironmentFile=/etc/environment
TimeoutStartSec=30s
ExecStartPre=/bin/sh -c 'until docker inspect firehose-%i >/dev/null 2>&1; do sleep 1; done'
ExecStart=/bin/sh -c \
  "port=$(docker inspect -f '{{range $i, $e := .NetworkSettings.Ports}}{{$p := index $e 0}}{{$p.HostPort}}{{end}}' firehose-%i); \
  while [ \"$port\" ] && netstat -lnt | grep -q :$port; do \
    etcdctl set /firehose/upstream/firehose-%i $COREOS_PRIVATE_IPV4:$port --ttl 300; \
    sleep 200; \
  done"
Restart=on-failure
RestartSec=30s
StartLimitInterval=300
StartLimitBurst=5

[X-Fleet]
X-ConditionMachineOf=firehose@%i.service

The solution implements several critical fixes:

# Docker container existence check
ExecStartPre=/bin/sh -c 'until docker inspect firehose-%i >/dev/null 2>&1; do sleep 1; done'

# More robust port checking logic
while [ \"$port\" ] && netstat -lnt | grep -q :$port; do
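
One caveat with the grep above: `grep -q :$port` matches any line that contains that text, so a port of 80 would also match a listener on 8080. A minimal sketch of a stricter check follows, assuming the usual Linux `netstat -lnt` column layout (local address in the fourth column, state in the last). The function name `port_listening` is hypothetical and not part of the original unit.

```shell
# port_listening PORT
# Reads `netstat -lnt` style output on stdin. Succeeds only if a LISTEN
# line has a local address whose final ":"-separated field equals PORT.
port_listening() {
  # An empty port can never be listening.
  [ -n "$1" ] || return 1
  awk -v port="$1" '
    $NF == "LISTEN" {
      n = split($4, parts, ":")      # handles 0.0.0.0:8080 and :::8080
      if (parts[n] == port) found = 1
    }
    END { exit !found }
  '
}
```

Inside the loop it could be used as `netstat -lnt | port_listening "$port"`, which also makes the separate empty-port guard unnecessary.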

After deploying the changes, verify the behavior:

# Start the main service
fleetctl start firehose@1.service

# Check dependent service status
fleetctl status firehose-announce@1.service

# Stop the main service and verify both stop
fleetctl stop firehose@1.service
fleetctl list-units | grep firehose
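
Reading the grep output by eye works for one instance. For repeated checks, a small helper can compare the two units directly. This is a sketch only: it assumes the listing puts the unit name in the first column and the sub state in the last column, and the name `units_in_sync` is hypothetical.

```shell
# units_in_sync INSTANCE
# Reads a unit listing on stdin and succeeds when firehose@INSTANCE.service
# and firehose-announce@INSTANCE.service both appear with the same value
# in the last column (assumed to be the sub state).
units_in_sync() {
  awk -v main="firehose@$1.service" -v ann="firehose-announce@$1.service" '
    $1 == main { m = $NF }
    $1 == ann  { a = $NF }
    END { exit !(m != "" && m == a) }
  '
}
```

For example, `fleetctl list-units | units_in_sync 1 && echo "in sync"` after each start or stop.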

For more complex scenarios, consider path-based activation:

[Unit]
Description=Watch for firehose container start
BindsTo=firehose@%i.service

[Path]
PathExists=/var/run/docker.sock
Unit=firehose-announce@%i.service

[Install]
WantedBy=multi-user.target

When two systemd units need tight lifecycle synchronization, the difference between BindsTo and Requires matters. In your case, with firehose@.service and firehose-announce@.service, the current configuration does not achieve the desired start/stop synchronization.

Your unit file shows several dependency directives:

[Unit]
BindsTo=firehose@%i.service
After=firehose@%i.service
Requires=firehose@%i.service
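
Because BindsTo already implies everything Requires does, naming the same unit under both is redundant. A rough sketch of a check for that overlap is below; the helper name `redundant_requires` is hypothetical, and it assumes one directive per line with no leading whitespace.

```shell
# redundant_requires FILE
# Prints every unit that FILE lists under both Requires= and BindsTo=.
redundant_requires() {
  awk -F= '
    $1 == "BindsTo"  { n = split($2, u, " "); for (i = 1; i <= n; i++) binds[u[i]] = 1 }
    $1 == "Requires" { n = split($2, u, " "); for (i = 1; i <= n; i++) reqs[u[i]] = 1 }
    END { for (k in reqs) if (k in binds) print k }
  ' "$1"
}
```

Run against the unit file above, it would print firehose@%i.service, which is the Requires line that can be dropped.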

The observed behavior, where firehose-announce@.service enters a failed state when firehose@.service stops, suggests the BindsTo directive is working for stopping. Startup synchronization fails because these directives all point one way, from the announcer to the main unit:

  1. Requires only pulls in firehose@%i.service when the announcer is started; starting firehose@%i.service does not start the announcer
  2. BindsTo adds stop propagation but, without After, does not set startup order
  3. After only orders the two units when both are queued to start; it never starts anything by itself

Here's the corrected approach for firehose-announce.service:

[Unit]
Description=Firehose etcd announcer
BindsTo=firehose@%i.service
After=firehose@%i.service
PartOf=firehose@%i.service

[Service]
EnvironmentFile=/etc/environment
TimeoutStartSec=30s
ExecStartPre=/bin/sh -c 'until docker inspect firehose-%i >/dev/null 2>&1; do sleep 1; done'
ExecStart=/bin/sh -c "port=$(docker inspect -f '{{range $i, $e := .NetworkSettings.Ports }}{{$p := index $e 0}}{{$p.HostPort}}{{end}}' firehose-%i); \
  while [ \"$port\" ] && netstat -lnt | grep -q :$port; do \
    etcdctl set /firehose/upstream/firehose-%i $COREOS_PRIVATE_IPV4:$port --ttl 300 >/dev/null; \
    sleep 200; \
  done"
RestartSec=30s
Restart=on-failure

[X-Fleet]
X-ConditionMachineOf=firehose@%i.service

Key changes:

  • Added PartOf: stop and restart of firehose@%i.service now propagate to the announcer; PartOf does not propagate start
  • Dropped Requires: BindsTo already covers it and additionally stops the announcer when the main unit goes away
  • Better startup detection: ExecStartPre now waits for the container to exist
  • Simplified ExecStart: removed the echo command, which was redundant and could fail
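
The container wait in ExecStartPre loops until `docker inspect` succeeds and relies on TimeoutStartSec=30s to cut it off. If an explicit bound is preferred, a helper along these lines could live in a script that the unit calls. This is a sketch under that assumption, and `wait_for` is a hypothetical name, not something systemd or fleet provides.

```shell
# wait_for ATTEMPTS DELAY COMMAND...
# Runs COMMAND until it succeeds, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 after ATTEMPTS failed tries.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    attempts=$((attempts - 1))
    # Skip the sleep once no tries remain.
    [ "$attempts" -gt 0 ] && sleep "$delay"
  done
  return 1
}
```

For instance, `wait_for 30 1 docker inspect firehose-1` tries once per second and gives up after 30 tries, which keeps the limit visible next to the command it applies to.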

After implementing these changes, verify the behavior with:

# Start the main service
fleetctl start firehose@1.service

# Check status of both units
fleetctl list-units | grep firehose

# Stop the main service
fleetctl stop firehose@1.service

# Verify both services stopped
fleetctl list-units | grep firehose

For more complex scenarios, consider using path units to trigger announcements:

[Unit]
Description=Firehose port watcher

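[Path]
# Assumed cgroup path: Docker usually names scopes after the container ID, so check the host's layout
PathExists=/sys/fs/cgroup/systemd/docker-firehose-%i.scope
# Without Unit=, a path unit activates the service that shares its own name
Unit=firehose-announce@%i.service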

[Install]
WantedBy=multi-user.target