docker-compose
Validate docker-compose.yml files for proper GPU device configuration.
Usage
If no path is provided, looks for docker-compose.yml or docker-compose.yaml in the current directory.
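The fallback lookup described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: `resolve_compose_path` and `DEFAULT_NAMES` are hypothetical names, and the search order (`.yml` before `.yaml`) is assumed from the order in which the text lists the filenames.

```python
from pathlib import Path
from typing import Optional

# Assumed search order, taken from the order the filenames are listed in the text.
DEFAULT_NAMES = ("docker-compose.yml", "docker-compose.yaml")


def resolve_compose_path(path: Optional[str] = None,
                         cwd: Optional[Path] = None) -> Optional[Path]:
    """Return the compose file to validate, or None if no file is found."""
    if path is not None:
        # An explicit path is used as given; no fallback is attempted.
        candidate = Path(path)
        return candidate if candidate.is_file() else None
    base = cwd or Path.cwd()
    for name in DEFAULT_NAMES:
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None
```

Calling `resolve_compose_path()` with no arguments searches the current directory; passing `cwd` is only a convenience for testing.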
What It Validates
GPU Device Configuration
- Checks for deploy.resources.reservations.devices
- Validates driver: nvidia is specified
- Ensures capabilities: [gpu] is set
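The three checks above could look roughly like this when applied to a single service that has already been parsed into a dictionary. This is a hedged sketch: `check_gpu_devices` and `_dig` are hypothetical helpers, and the message wording (apart from "Missing GPU device configuration", which appears in the example output) is invented for illustration.

```python
from typing import Any, Dict, List


def _dig(mapping: Any, *keys: str) -> Any:
    """Walk nested dicts, returning None as soon as a level is missing."""
    for key in keys:
        if not isinstance(mapping, dict):
            return None
        mapping = mapping.get(key)
    return mapping


def check_gpu_devices(service: Dict[str, Any]) -> List[str]:
    """Return the problems found in a service's GPU reservation (empty if OK)."""
    devices = _dig(service, "deploy", "resources", "reservations", "devices")
    if not devices:
        return ["Missing GPU device configuration"]
    problems = []
    for index, device in enumerate(devices):
        if device.get("driver") != "nvidia":
            problems.append(f"devices[{index}]: driver should be 'nvidia'")
        if "gpu" not in (device.get("capabilities") or []):
            problems.append(f"devices[{index}]: capabilities should include 'gpu'")
    return problems
```

An empty list means the service passed all three checks.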
Deprecated Syntax
- Flags old runtime: nvidia approach
- Provides migration path to new syntax
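One way to picture the deprecation check and the migration path is the sketch below. Both function names are hypothetical, the service is assumed to be a plain dictionary, and the replacement device entry mirrors the "Suggested fix" shown in the example output rather than anything the tool is documented to write back to disk.

```python
import copy
from typing import Any, Dict, Optional

# Device entry mirroring the suggested fix in the example output.
GPU_DEVICE = {"driver": "nvidia", "count": "all", "capabilities": ["gpu"]}


def deprecated_runtime_warning(name: str, service: Dict[str, Any]) -> Optional[str]:
    """Return a warning if the service uses the old runtime key, else None."""
    if service.get("runtime") == "nvidia":
        return f"Service '{name}': Deprecated 'runtime: nvidia' syntax"
    return None


def migrate_runtime(service: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the service using the device reservation syntax."""
    migrated = copy.deepcopy(service)
    if migrated.pop("runtime", None) != "nvidia":
        # Nothing to migrate; hand back an untouched copy.
        return copy.deepcopy(service)
    reservations = (
        migrated.setdefault("deploy", {})
        .setdefault("resources", {})
        .setdefault("reservations", {})
    )
    devices = reservations.setdefault("devices", [])
    if not devices:
        devices.append(copy.deepcopy(GPU_DEVICE))
    return migrated
```

The input dictionary is never modified, so the original and migrated forms can be compared side by side.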
Multi-Service Configuration
- Warns about GPU resource sharing between services
- Validates each service with GPU requirements
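The multi-service check can be approximated by collecting every service that reserves a GPU and warning when more than one does. This is a sketch under assumptions: `gpu_services` and `sharing_warning` are hypothetical names, and a service counts as "reserving a GPU" here only if one of its devices lists the `gpu` capability.

```python
from typing import Any, Dict, List, Optional


def _reserved_devices(service: Any) -> List[Dict[str, Any]]:
    """Return deploy.resources.reservations.devices, or [] if absent."""
    node = service
    for key in ("deploy", "resources", "reservations", "devices"):
        if not isinstance(node, dict):
            return []
        node = node.get(key)
    return node if isinstance(node, list) else []


def gpu_services(compose: Dict[str, Any]) -> List[str]:
    """Names of services with at least one device that requests 'gpu'."""
    names = []
    for name, service in (compose.get("services") or {}).items():
        devices = _reserved_devices(service)
        if any("gpu" in (d.get("capabilities") or []) for d in devices):
            names.append(name)
    return names


def sharing_warning(compose: Dict[str, Any]) -> Optional[str]:
    """Warn when two or more services reserve GPU resources."""
    names = gpu_services(compose)
    if len(names) > 1:
        return "Multiple services reserving GPU resources: " + ", ".join(names)
    return None
```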
Host Requirements
- Checks for nvidia-container-toolkit
- Validates Docker GPU support
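How the tool performs its host checks is not documented here. The sketch below shows one plausible heuristic: look for the toolkit's binaries on `PATH`, and ask `docker info` whether an `nvidia` runtime is registered. The function names are hypothetical, and both checks are assumptions that can produce false negatives (for example, when Docker runs on a remote host).

```python
import json
import shutil
import subprocess
from typing import Callable, Optional


def toolkit_installed(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Heuristic: the toolkit is present if one of its CLIs is on PATH."""
    # Assumption: nvidia-ctk or nvidia-container-cli is installed with the toolkit.
    return any(which(binary) for binary in ("nvidia-ctk", "nvidia-container-cli"))


def docker_lists_nvidia_runtime(run: Callable = subprocess.run) -> bool:
    """Heuristic: check whether `docker info` reports an 'nvidia' runtime."""
    try:
        result = run(
            ["docker", "info", "--format", "{{json .Runtimes}}"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return False  # docker missing, not runnable, or timed out
    if result.returncode != 0:
        return False
    try:
        runtimes = json.loads(result.stdout)
    except json.JSONDecodeError:
        return False
    return isinstance(runtimes, dict) and "nvidia" in runtimes
```

The `which` and `run` parameters exist only so the checks can be exercised without a GPU host.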
Example Output
🐳 DOCKER COMPOSE VALIDATION: docker-compose.yml
❌ ERRORS (1):
------------------------------------------------------------
Service 'ml-training':
Issue: Missing GPU device configuration
Fix: Add GPU device configuration under deploy.resources.reservations.devices
Suggested fix:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
⚠️ WARNINGS (1):
------------------------------------------------------------
Service 'legacy-app':
Issue: Deprecated 'runtime: nvidia' syntax
Fix: Use the new 'deploy.resources.reservations.devices' syntax instead
SUMMARY:
❌ Errors: 1
⚠️ Warnings: 1
ℹ️ Info: 0
Correct GPU Configuration
Modern Syntax (Docker Compose v1.28.0+)
services:
  ml-training:
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all  # or a specific number, e.g. count: 1
              capabilities: [gpu]
Device Selection
# Use all GPUs
count: all
# Use a specific number of GPUs
count: 2
# Use specific GPUs by ID (mutually exclusive with count)
device_ids: ['0', '2']
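The selection options above can be sanity-checked with a small function like the one below. This is a sketch rather than a documented feature of the validator: `check_device_selection` is a hypothetical name, and the rules it encodes (`count` is `all` or a positive integer, `device_ids` is a non-empty list of strings, and the two keys are not combined) reflect the Compose specification as understood here.

```python
from typing import Any, Dict, List


def check_device_selection(device: Dict[str, Any]) -> List[str]:
    """Return problems with the count / device_ids fields of one device entry."""
    problems = []
    has_count = "count" in device
    has_ids = "device_ids" in device
    if has_count and has_ids:
        problems.append("count and device_ids are mutually exclusive")
    if has_count:
        count = device["count"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_positive_int = (
            isinstance(count, int) and not isinstance(count, bool) and count > 0
        )
        if count != "all" and not is_positive_int:
            problems.append("count must be 'all' or a positive integer")
    if has_ids:
        ids = device["device_ids"]
        if not (isinstance(ids, list) and ids and all(isinstance(i, str) for i in ids)):
            problems.append("device_ids must be a non-empty list of strings")
    return problems
```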
Deprecated Syntax
# ❌ Old syntax (still works but deprecated)
services:
  app:
    runtime: nvidia
# ✅ New syntax
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
Common Issues
Missing nvidia-container-toolkit
❌ nvidia-container-toolkit not detected on host
💡 Installation:
# Ubuntu/Debian
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update && sudo apt install nvidia-container-toolkit
sudo systemctl restart docker
GPU Sharing Warning
⚠️ Multiple services reserving GPU resources
Services: ml-training, inference-api
💡 Note:
GPU memory is shared between containers.
Ensure total VRAM requirements don't exceed available memory.
See Also
- dockerfile - Validate Dockerfile