Skip to content

deepsquare-io/cifar-10-example

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CIFAR 10 Horovod Example

This example uses the Deep Layer Aggregation method to train on the CIFAR10 dataset.

Installation with Pipenv

  1. Install OpenMPI if you wish to be able to run a distributed workload locally.

  2. Install Pipenv which is a dependency management tool with a locking mechanism (similar to Anaconda).

  3. Clone this repository and run:

    export HOROVOD_WITH_PYTORCH=1
    export HOROVOD_WITH_MPI=1
    export HOROVOD_WITHOUT_GLOO=1
    
    # If GPU
    # export HOROVOD_CUDA_HOME=/usr/local/cuda
    # export HOROVOD_GPU=CUDA
    pipenv install

    This command creates a virtualenv based on the Pipfile and Pipfile.lock.

Usage

With Docker

Prepare the directories:

mkdir -p "$(pwd)/data"
# Download CIFAR-10 dataset
curl -fsSL https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz -o "$(pwd)/data/cifar-10-python.tar.gz"
tar -C $(pwd)/data/ -xvzf "$(pwd)/data/cifar-10-python.tar.gz"
mkdir -p "$(pwd)/checkpoint"

Run the model:

docker run \
  --rm \
  -v "$(pwd)/data:/data" \
  -v "$(pwd)/checkpoint:/checkpoint" \
  -u 1000:1000 \
  --entrypoint /bin/sh \
  ghcr.io/deepsquare-io/cifar-10-example:latest \
  -c '\
  mpirun \
  -np 4 \
  /.venv/bin/python3 \
  /app/main.py \
  --no-cuda \
  --horovod \
  --checkpoint_in=/checkpoint/ckpt.pth \
  --checkpoint_out=/checkpoint/ckpt.pth \
  --dataset=/data
'

With Pipenv

Prepare the directories:

mkdir -p "$(pwd)/data"
# Download CIFAR-10 dataset
curl -fsSL https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz -o "$(pwd)/data/cifar-10-python.tar.gz"
tar -C $(pwd)/data/ -xvzf "$(pwd)/data/cifar-10-python.tar.gz"
mkdir -p "$(pwd)/checkpoint"

Run the model:

pipenv shell
mpirun \
  -np 4 \
  python3 \
  main.py \
  --no-cuda \
  --horovod \
  --checkpoint_in="$(pwd)/checkpoint/ckpt.pth" \
  --checkpoint_out="$(pwd)/checkpoint/ckpt.pth" \
  --dataset="$(pwd)/data"
'

About

Example CIFAR 10 using Deep Layer Aggregation to be used on DeepSquare

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors