Your approach reflects a common misconception about how Docker handles data persistence. The critical issue lies in how Docker manages volumes and container commits:
# This is the problematic step in your workflow:
docker commit datatest_data myrepository:5000/datatest-data:latest
When you commit a container, Docker captures its filesystem layers, but not the contents of any mounted volumes. This means your /datafolder content isn't included in the committed image.
Docker volumes exist outside the container's writable layer and have specific characteristics:
- Volumes persist independently of containers
- They bypass the Union File System
- Commit operations exclude mounted volumes (you can verify this with docker inspect, as shown below)
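You can see exactly which paths Docker treats as volumes, and will therefore skip during a commit, before you run it; a quick check, using the datatest_data container from your workflow:
# List the mounts Docker tracks for the container; anything listed here
# lives outside the image layers and is excluded from docker commit
docker inspect -f '{{ json .Mounts }}' datatest_data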
Option 1: Volume Backup and Restore
Here's how to properly migrate data between hosts:
# On Machine A
docker run --rm --volumes-from datatest_data -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /datafolder
# Transfer backup.tar to Machine B
scp backup.tar user@machineB:/path/to/backup.tar
# On Machine B
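# The destination data container must exist before you restore into it;
# one way to create it (same data-container pattern, ubuntu assumed as base image):
docker create -v /datafolder --name new_data_container ubuntu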
docker run --rm --volumes-from new_data_container -v $(pwd):/backup ubuntu bash -c "cd / && tar xvf /backup/backup.tar"
Option 2: Named Volumes with Volume Drivers
For more modern Docker installations:
# Create a named volume
docker volume create myapp_data
# Run your container with the named volume
docker run -d --name datatest_data -v myapp_data:/datafolder ubuntu
# Backup the volume
docker run --rm -v myapp_data:/data -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /data
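To bring that data up on another machine, transfer backup.tar (for example with scp, as above), then recreate the volume there and unpack the archive into it; a sketch assuming you keep the volume name myapp_data and run it from the directory holding the archive:
# On the new host: recreate the named volume and unpack the archive into it
docker volume create myapp_data
docker run --rm -v myapp_data:/data -v $(pwd):/backup ubuntu bash -c "cd / && tar xvf /backup/backup.tar"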
For production environments, consider this automated approach:
#!/bin/bash
# data_migration.sh
set -euo pipefail
# Source host
SOURCE_HOST="machineA"
SOURCE_CONTAINER="datatest_data"
SOURCE_PATH="/datafolder"
# Destination host
DEST_HOST="machineB"
DEST_CONTAINER="new_data_container"
DEST_PATH="/datafolder"
# Create backup on the source host
ssh "$SOURCE_HOST" "docker run --rm --volumes-from $SOURCE_CONTAINER -v /tmp:/backup ubuntu tar cvf /backup/data_backup.tar $SOURCE_PATH"
# Transfer backup via the local machine
scp "$SOURCE_HOST:/tmp/data_backup.tar" /tmp/
scp /tmp/data_backup.tar "$DEST_HOST:/tmp/"
# Restore backup into the destination data container
# (--strip-components=1 drops the leading "datafolder/" prefix stored in the archive)
ssh "$DEST_HOST" "docker run --rm --volumes-from $DEST_CONTAINER -v /tmp:/backup ubuntu tar xvf /backup/data_backup.tar -C $DEST_PATH --strip-components=1"
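After the script finishes, it's worth a quick spot check that the files actually arrived on the destination; for example, reusing the hostnames and container name from the script above:
# Verify the restored data is visible through the destination data container
ssh machineB "docker run --rm --volumes-from new_data_container ubuntu ls -la /datafolder"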
For cloud-native applications, consider these patterns:
- Use cloud storage services (S3, Azure Blob) as your persistent layer
- Implement database containers with proper replication
- Consider orchestration tools like Kubernetes with PersistentVolumeClaims
The key takeaway is that Docker volumes require special handling during migration. Standard commit/push operations won't capture volume data, so you need explicit backup/restore procedures.
Many developers new to Docker make the same assumption about data persistence that you did. The key realization is that Docker images are immutable snapshots, while volumes represent mutable storage. Here's why committing a data container doesn't preserve your data:
# This DOES NOT capture volume data
docker commit container-name repository:tag
For true data persistence across deployments, you need to properly manage volumes separately from your containers. Here are three production-grade approaches:
Option 1: Named Volumes
# Create named volume
docker volume create myapp_data
# Run container with volume
docker run -d \
--name myapp \
-v myapp_data:/data \
myapp:latest
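You can confirm the volume exists and see where Docker keeps its data on the host:
# Confirm the volume was created and inspect its host mountpoint
docker volume ls
docker volume inspect myapp_data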
Option 2: Host-mounted Volumes
docker run -d \
--name myapp \
-v /host/path:/container/path \
myapp:latest
Option 3: Volume Containers (Legacy Approach)
# Create data container
docker create -v /data --name data_container busybox
# Application container
docker run --volumes-from data_container myapp:latest
Here's the corrected workflow for your specific scenario:
# 1. Create named volume
docker volume create datatest_volume
# 2. Run writer with volume
docker run -d \
--name writer \
-v datatest_volume:/datafolder \
myrepository:5000/datatest-write:latest
# 3. Run reader to verify
docker run --rm \
--name reader \
-v datatest_volume:/datafolder \
myrepository:5000/datatest-read:latest
# 4. To migrate to another host:
# - Use volume backup tools or
# - Mount same host directory on new machine
For actual data portability between machines, consider these techniques:
# Backup volume data
docker run --rm \
-v datatest_volume:/volume \
-v /backup:/backup \
busybox \
tar cvf /backup/backup.tar /volume
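# Copy the archive to the new host and create the empty target volume there
# (assumes /backup exists on both machines and SSH access as user@machineB,
#  matching the names used elsewhere in this answer)
scp /backup/backup.tar user@machineB:/backup/
ssh user@machineB "docker volume create new_volume"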
# Restore to new host
docker run --rm \
-v new_volume:/volume \
-v /backup:/backup \
busybox \
tar xvf /backup/backup.tar -C /
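A quick way to confirm the restore worked on the new host, using the new_volume name from the snippet above:
# List the restored files inside the new volume
docker run --rm -v new_volume:/volume busybox ls -la /volume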
For production environments, consider:
- Using Docker Compose for defining multi-container apps
- Implementing CI/CD pipelines for container deployment
- Exploring orchestration tools like Kubernetes or Swarm
Here's a sample docker-compose.yml for your scenario:
version: '3'
services:
  writer:
    image: myrepository:5000/datatest-write:latest
    volumes:
      - datatest_volume:/datafolder
    restart: unless-stopped
  reader:
    image: myrepository:5000/datatest-read:latest
    volumes:
      - datatest_volume:/datafolder
    depends_on:
      - writer
volumes:
  datatest_volume:
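With this file, both services share the named volume, so the data survives containers being removed and re-created; a typical workflow, assuming it is saved as docker-compose.yml (use docker-compose instead of docker compose on older installations):
# Start the stack in the background
docker compose up -d
# Stop and remove the containers; the named volume (and your data) is kept
docker compose down
# Only add -v if you really want to delete the volume and its data
# docker compose down -v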