How to Properly Migrate Docker Data Containers with Persistent Storage Between Hosts



Your approach reflects a common misconception about how Docker handles data persistence. The critical issue lies in how Docker manages volumes and container commits:

# This is the problematic step in your workflow:
docker commit datatest_data myrepository:5000/datatest-data:latest

When you commit a container, Docker captures its filesystem layers but excludes anything mounted as a volume. As a result, your /datafolder content is never included in the committed image.

Docker volumes exist outside the container's writable layer and have specific characteristics:

  • Volumes persist independently of containers
  • They bypass the Union File System
  • Commit operations exclude mounted volumes
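
You can see this for yourself with a throwaway container (the names voltest and voltest-snapshot below are purely illustrative):

# Write a file into an anonymous volume, then commit the container
docker run -d --name voltest -v /datafolder ubuntu sleep infinity
docker exec voltest sh -c 'echo hello > /datafolder/file.txt'
docker commit voltest voltest-snapshot

# The committed image's /datafolder is empty: the file lived in the volume
docker run --rm voltest-snapshot ls /datafolder

# Clean up
docker rm -f voltest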

Option 1: Volume Backup and Restore

Here's how to properly migrate data between hosts:

# On Machine A
docker run --rm --volumes-from datatest_data -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /datafolder

# Transfer backup.tar to Machine B
scp backup.tar user@machineB:/path/to/backup.tar

# On Machine B
docker run --rm --volumes-from new_data_container -v $(pwd):/backup ubuntu bash -c "cd / && tar xvf /backup/backup.tar"
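
Note that new_data_container must already exist on Machine B with a volume at /datafolder before the restore step; a minimal way to create it (the image choice is arbitrary):

docker create -v /datafolder --name new_data_container ubuntu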

Option 2: Named Volumes

For more modern Docker installations:

# Create a named volume
docker volume create myapp_data

# Run your container with the named volume
docker run -d --name datatest_data -v myapp_data:/datafolder ubuntu

# Backup the volume
docker run --rm -v myapp_data:/data -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /data
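
The matching restore on the destination host is symmetric. Because tar strips the leading slash when archiving, the archive holds paths like data/..., so extracting at / puts the files back into the mounted volume:

# Restore the backup into a named volume on the new host
docker volume create myapp_data
docker run --rm -v myapp_data:/data -v $(pwd):/backup ubuntu tar xvf /backup/backup.tar -C /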

For production environments, consider this automated approach:

#!/bin/bash
# data_migration.sh - copy a data container's volume from one host to another
set -euo pipefail

# Source host
SOURCE_HOST="machineA"
SOURCE_VOL="datatest_data"
SOURCE_PATH="/datafolder"

# Destination host
DEST_HOST="machineB"
DEST_VOL="new_data_container"
DEST_PATH="/datafolder"

# Create backup on the source host (tar strips the leading "/", so the
# archive contains paths like datafolder/...)
ssh "$SOURCE_HOST" "docker run --rm --volumes-from $SOURCE_VOL -v /tmp:/backup ubuntu tar cvf /backup/data_backup.tar $SOURCE_PATH"

# Transfer backup via this machine
scp "$SOURCE_HOST:/tmp/data_backup.tar" /tmp/
scp /tmp/data_backup.tar "$DEST_HOST:/tmp/"

# Restore backup; --strip-components=1 drops the recorded top-level
# directory so the files land directly in $DEST_PATH
ssh "$DEST_HOST" "docker run --rm --volumes-from $DEST_VOL -v /tmp:/backup ubuntu tar xvf /backup/data_backup.tar -C $DEST_PATH --strip-components=1"
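
This sketch assumes passwordless SSH access to both hosts, a writable /tmp on each, and that the destination container ($DEST_VOL) already exists with its /datafolder volume.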

For cloud-native applications, consider these patterns:

  • Use cloud storage services (S3, Azure Blob) as your persistent layer
  • Implement database containers with proper replication
  • Consider orchestration tools like Kubernetes with PersistentVolumeClaims (a minimal example follows)
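
As a rough sketch of that last point, a minimal PersistentVolumeClaim could look like this (the name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datatest-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi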

The key takeaway is that Docker volumes require special handling during migration. Standard commit/push operations won't capture volume data, so you need explicit backup/restore procedures.


Many developers new to Docker make the same assumption about data persistence that you did. The key realization is that Docker images are immutable snapshots, while volumes represent mutable storage. Here's why committing a data container doesn't preserve your data:

# This DOES NOT capture volume data
docker commit container-name repository:tag

For true data persistence across deployments, you need to manage volumes separately from your containers. Here are three common approaches:

Option 1: Named Volumes

# Create named volume
docker volume create myapp_data

# Run container with volume
docker run -d \
  --name myapp \
  -v myapp_data:/data \
  myapp:latest

Option 2: Host-mounted Volumes

docker run -d \
  --name myapp \
  -v /host/path:/container/path \
  myapp:latest

Option 3: Volume Containers (Legacy Approach)

# Create data container
docker create -v /data --name data_container busybox

# Application container
docker run --volumes-from data_container myapp:latest
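
Before migrating, you can check which volume Docker actually created for the data container:

# Show the volume name and mount point backing the data container
docker inspect -f '{{ range .Mounts }}{{ .Name }} -> {{ .Destination }}{{ end }}' data_container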

Here's the corrected workflow for your specific scenario:

# 1. Create named volume
docker volume create datatest_volume

# 2. Run writer with volume
docker run -d \
  --name writer \
  -v datatest_volume:/datafolder \
  myrepository:5000/datatest-write:latest

# 3. Run reader to verify
docker run --rm \
  --name reader \
  -v datatest_volume:/datafolder \
  myrepository:5000/datatest-read:latest

# 4. To migrate to another host:
# - Use volume backup tools or
# - Mount same host directory on new machine

For actual data portability between machines, consider these techniques:

# Backup volume data
docker run --rm \
  -v datatest_volume:/volume \
  -v /backup:/backup \
  busybox \
  tar cvf /backup/backup.tar /volume

# Restore to new host
docker run --rm \
  -v new_volume:/volume \
  -v /backup:/backup \
  busybox \
  tar xvf /backup/backup.tar -C /
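
To verify the restore before attaching the volume to a container:

docker run --rm -v new_volume:/volume busybox ls -R /volume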

For production environments, consider:

  • Using Docker Compose for defining multi-container apps
  • Implementing CI/CD pipelines for container deployment
  • Exploring orchestration tools like Kubernetes or Swarm

Here's a sample docker-compose.yml for your scenario:

version: '3'
services:
  writer:
    image: myrepository:5000/datatest-write:latest
    volumes:
      - datatest_volume:/datafolder
    restart: unless-stopped
  
  reader:
    image: myrepository:5000/datatest-read:latest
    volumes:
      - datatest_volume:/datafolder
    depends_on:
      - writer

volumes:
  datatest_volume:
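
Bring the stack up with docker-compose up -d. Compose creates datatest_volume on first run, so the same file works unchanged on a new host, though the volume's contents still have to be carried over with one of the backup methods above.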