Exercise 4: OpenHPC and Containers
In this exercise we will be working with containers. Containers are an OS-level virtualization paradigm that allows multiple, isolated user environments to run concurrently. Several container technologies address a variety of use cases, and thanks to the Open Container Initiative (OCI), containers can be converted from one format to another.
The first container technology that most people learn is Docker. Docker's identity and access management (IAM) model does not map well to HPC systems: you must have escalated privileges (root) in order to use it. Because of this, most HPC systems do not support Docker.
However, there are multiple alternative container technologies that allow users to run Docker/OCI containers in userspace without admin privileges. OpenHPC has support for Singularity and Charliecloud, and both RHEL/CentOS and SuSE have support for Podman.
In this exercise, we will take Docker containers, convert them to Charliecloud containers, and then run them on our elastic OpenHPC cluster. Previous iterations of this tutorial have included discussion of other virtualization and containerization tools, including Singularity.
If you are attending this tutorial live, the containers are already available and ready to be used in $HOME/ContainersHPC. Below (and live), we will demo the typical use case where Docker is installed (and used for development) on your personal laptop/desktop and you want to run Docker images on an HPC system. At the end of this exercise are instructions for building the tutorial content in userspace using Podman.
Working with containers (45 mins)
First, we will demonstrate the typical use case.
- End users develop with Docker on their personal machines
- Docker images are converted to gzip-compressed Charliecloud images
- Images are transferred to our HPC system
- Compressed Charliecloud images are unpacked into directories
- Charliecloud containers are run on the cluster (Live tutorial users start here)
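Condensed into commands, the whole pipeline looks roughly like this (a sketch; image_name is a placeholder, user@hpc-system is a hypothetical address, and each step is covered in detail below):
$ sudo docker build -t image_name .                    # on your laptop/desktop
$ sudo ch-builder2tar image_name /dir/to/save          # on your laptop/desktop
$ scp /dir/to/save/image_name.tar.gz user@hpc-system:  # transfer to the cluster
$ ch-tar2dir image_name.tar.gz .                       # on the HPC system
$ ch-run ./image_name -- echo "hello from the container"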
Build HPC containers from Dockerfiles
On the desktop or laptop (demonstrated)
Build Docker image from Dockerfile
In the directory where the Dockerfile is located, execute the following command:
$ sudo docker build -t image_name .
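If you want to experiment but don't have a Dockerfile handy, a minimal sketch such as the following is enough to follow along (hypothetical; the tutorial's actual Dockerfiles ship with the Exercise 1 tarball and install considerably more software):
# Hypothetical minimal Dockerfile: Ubuntu base plus an MPI toolchain
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y build-essential mpich
# Copy source into the image so it is available at container runtime
COPY mpi_hello_world.c /MPI_TEST/HelloWorld/mpi_hello_world.c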
View the new Docker image in your local Docker repository:
$ sudo docker image ls
REPOSITORY    TAG       IMAGE ID       CREATED        SIZE
image_name    latest    02d6e22ec5ec   2 days ago     5.74GB
bash          latest    16463e0c481e   7 weeks ago    15.2MB
ubuntu        16.04     7e87e2b3bf7a   7 months ago   117MB
hello-world   latest    fce289e99eb9   8 months ago   1.84kB
debian        stretch   de8b49d4b0b3   8 months ago   101MB
Convert Docker image to Charliecloud on the desktop
$ sudo ch-builder2tar image_name /dir/to/save
This creates the file /dir/to/save/image_name.tar.gz
Copy the tar.gz file to the HPC system
$ scp -i .ssh/id_rsa -r image_name.tar.gz centos@ec2-54-177-114-95.us-west-1.compute.amazonaws.com:
On the HPC system (hands on)
Load the charliecloud module
$ module load charliecloud
Unpack the Charliecloud archive on the OpenHPC cluster
$ cd /home/centos/ContainersHPC
$ ch-tar2dir pico_quant.tar.gz .
Note: if you're attending this event live, the containers are already unpacked and ready to be used.
Simple container execution (Live tutorial users start here)
Now that we have …
- built our Docker image from a Dockerfile
- converted it to a gzip-compressed Charliecloud image
- transferred it to our cluster
- and unpacked our Charliecloud image
We are ready to run our containers. If you are attending this tutorial live, the containers are available in $HOME/ContainersHPC.
The first thing we will do is invoke a bash shell in our Charliecloud container. This is done via the ch-run command.
Note: the -w flag mounts the image read-write (by default, the image is mounted read-only)
If you didn't do so previously, you'll need to run ml load charliecloud before proceeding.
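To see what the -w note above means in practice, try touching a file inside the image (a sketch; the exact error text may vary by Charliecloud version):
$ ch-run ./a408704d3f3d/ -- touch /new_file
touch: cannot touch '/new_file': Read-only file system
$ ch-run -w ./a408704d3f3d/ -- touch /new_file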
Start a bash shell in the container
$ ch-run -w ./a408704d3f3d/ -- bash
$ ls
$ exit
bash: /opt/ohpc/admin/lmod/lmod/libexec/lmod: No such file or directory?
In Exercise 3, we set up module collections and added module restore to our ~/.bashrc. However, our containers are not set up to use Lmod. Future versions of this tutorial will cover integration of OpenHPC modules into container environments.
For now, we can edit our ~/.bashrc and set it up to check for the existence of the /WEIRD_AL_YANKOVIC file. Charliecloud images always have this file at the top level of their virtual file system.
$ pwd
/home/centos/ContainersHPC/intel-oneapi
$ cat WEIRD_AL_YANKOVIC
This directory is a Charliecloud image.
Edit your ~/.bashrc and replace the module restore command with this if statement (or simply remove / comment out the module command).
# User specific aliases and functions
if [ -f /WEIRD_AL_YANKOVIC ]
then
echo "inside container; not restoring lmod defaults" #NOOP
else
module restore
fi
Compile "hello world" MPI C program
Next, we will compile an MPI hello world program that lives inside the container, using the container's compiler and MPI stack.
$ ch-run -w a408704d3f3d -- mpicc -g -O3 -o /MPI_TEST/HelloWorld/mpi_hello_world /MPI_TEST/HelloWorld/mpi_hello_world.c
Execute MPI “hello world”
And now, we can run the freshly compiled MPI hello world example.
$ mpiexec -n 2 ch-run -w ./a408704d3f3d/ -- /MPI_TEST/HelloWorld/mpi_hello_world
TensorFlow example
Now that we've looked at a simple example, let's use the same container to run Horovod, Uber's open-source distributed deep learning framework built on top of TensorFlow.
Run distributed TensorFlow horovod interactively (print out horovod ranks)
$ mpiexec -n 2 ch-run -w ./a408704d3f3d/ -- python /MPI_TEST/Horovod/simple_mpi.py
Run distributed TensorFlow example on our elastic cluster via Slurm
$ sbatch slurm_sc_tutorial_charliecloud.sh
#!/bin/bash
#SBATCH --job-name="charliecloud_sc_tutorial_mpich_test"
#SBATCH --output="output_charliecloud_horovod_sc_tutorial_mpich_test.txt"
#SBATCH --error="error_charliecloud_horovod_sc_tutorial_mpich_test.txt"
#SBATCH --time=00:30:00
#SBATCH -N 2 # Request two nodes
#SBATCH -n 2 # Request 2 cores; one MPI task per node
#load charliecloud module
module load charliecloud
#tensorflow cpu best practice from:
#https://software.intel.com/content/www/us/en/develop/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference.html
#Recommended settings for CNN → OMP_NUM_THREADS = num physical cores
export OMP_NUM_THREADS=8
#Recommended affinity setting for systems with hyperthreading on (can confirm with $ cat /proc/cpuinfo | grep ht)
export KMP_AFFINITY=granularity=fine,verbose,compact,1,0
#Recommended settings (RTI) → intra_op_parallelism = # physical cores
#Recommended settings → inter_op_parallelism = 2
#Recommended settings → data_format = NCHW
#Recommended settings for CNN → KMP_BLOCKTIME=0
time prun ch-run -w /home/centos/ContainersHPC/a408704d3f3d -- \
    python /tensorflow/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    --model alexnet --batch_size 128 --data_format NCHW --num_batches 100 \
    --distortions=False --mkl=True --local_parameter_device cpu \
    --num_warmup_batches 10 --optimizer rmsprop --display_every 10 \
    --variable_update horovod --horovod_device cpu \
    --num_intra_threads 8 --kmp_blocktime 0 --num_inter_threads 2
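After submitting, you can watch the queue and inspect the job's output; the filenames follow from the #SBATCH directives above:
$ squeue -u $USER
$ cat output_charliecloud_horovod_sc_tutorial_mpich_test.txt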
Other features of Charliecloud
Mounting a directory from the host into the container
Using the -b flag, it is possible to mount directories from the host system directly into the container. This can be used to "augment" the container with files and binaries from the host system.
$ ch-run -w -b /opt/ohpc/.:/opt/ohpc/ ./a408704d3f3d/ -- bash
$ ls /opt/ohpc
$ exit
Set the environment to the Docker build environment
When a Charliecloud image is built, a special file is created in $IMAGE/ch/environment that allows you to inherit the environment specified by the builder (Docker).
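The file is plain text, one NAME=value assignment per line, so you can inspect exactly what will be set before using it:
$ cat ./intel-oneapi/ch/environment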
By default, your PATH contains the OpenHPC paths that come from your loaded modules:
$ echo $PATH
/opt/ohpc/pub/libs/charliecloud/0.15/bin:/home/centos/.local/bin:/home/centos/bin:/opt/ohpc/pub/mpi/libfabric/1.10.1/bin:/opt/ohpc/pub/mpi/mpich-ofi-gnu9-ohpc/3.3.2/bin:/opt/ohpc/pub/compiler/gcc/9.3.0/bin:/opt/ohpc/pub/utils/prun/2.0:/opt/ohpc/pub/utils/autotools/bin:/opt/ohpc/pub/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
Simply starting a container does not set your PATH to the same as the container build environment.
$ ch-run -w ./intel-oneapi -- bash
$ echo $PATH
/home/centos/.local/bin:/home/centos/bin:/opt/ohpc/pub/libs/charliecloud/0.15/bin:/home/centos/.local/bin:/home/centos/bin:/opt/ohpc/pub/mpi/libfabric/1.10.1/bin:/opt/ohpc/pub/mpi/mpich-ofi-gnu9-ohpc/3.3.2/bin:/opt/ohpc/pub/compiler/gcc/9.3.0/bin:/opt/ohpc/pub/utils/prun/2.0:/opt/ohpc/pub/utils/autotools/bin:/opt/ohpc/pub/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/bin
$ exit
However, you can use the --set-env option to modify your environment when starting the container; if you point it at $IMAGE/ch/environment, you will get the original builder (Docker) runtime environment.
$ ch-run --set-env=./intel-oneapi/ch/environment -w ./intel-oneapi -- bash
$ echo $PATH
/home/centos/.local/bin:/home/centos/bin:/opt/intel/oneapi/inspector/2021.1-beta10/bin64:/opt/intel/oneapi/itac/2021.1-beta10/bin:/opt/intel/oneapi/itac/2021.1-beta10/bin:/opt/intel/oneapi/clck/2021.1-beta10/bin/intel64:/opt/intel/oneapi/debugger/10.0-beta10/gdb/intel64/bin:/opt/intel/oneapi/dev-utilities/2021.1-beta10/bin:/opt/intel/oneapi/intelpython/latest/bin:/opt/intel/oneapi/intelpython/latest/condabin:/opt/intel/oneapi/mpi/2021.1-beta10/libfabric/bin:/opt/intel/oneapi/mpi/2021.1-beta10/bin:/opt/intel/oneapi/vtune/2021.1-beta10/bin64:/opt/intel/oneapi/mkl/2021.1-beta10/bin/intel64:/opt/intel/oneapi/compiler/2021.1-beta10/linux/lib/oclfpga/llvm/aocl-bin:/opt/intel/oneapi/compiler/2021.1-beta10/linux/lib/oclfpga/bin:/opt/intel/oneapi/compiler/2021.1-beta10/linux/bin/intel64:/opt/intel/oneapi/compiler/2021.1-beta10/linux/bin:/opt/intel/oneapi/compiler/2021.1-beta10/linux/ioc/bin:/opt/intel/oneapi/advisor/2021.1-beta10/bin64:/opt/intel/oneapi/vpl/2021.1-beta10/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
$ exit
Additional Exercise – Quantum computing simulator setup
$ ch-run --set-env=./pico_quant/ch/environment -w ./pico_quant -- bash
$ julia
julia> using Pkg
julia> Pkg.add("PicoQuant")
julia> Pkg.activate("/PicoQuant")
julia> Pkg.test("PicoQuant")
julia> exit()
$ exit
That concludes Exercise 4 and the tutorial. Below are the instructions to generate the Charliecloud image directories that you just used.
Gotchas, tips and “real world” examples
- In your Dockerfile, avoid building your application and copying/storing your data in /home: when the Charliecloud container is executed, the user's /home directory on the host will map to the container's /home directory, and you will not be able to access the data the Dockerfile build placed in /home.
- Mount your data directories, such as $SCRATCH and/or $WORK, into the container to avoid storing data in the container (see the sketch after this list).
- Use the system MPI environment (via mounting) when executing on a production HPC system.
- Install the software/drivers for the high-performance interconnects inside the container; otherwise, the transport layer can default to TCP instead of using the high-performance network.
- Keep your container as small as possible, since the container is built on your laptop or desktop and must be transferred to the HPC system.
- Use the HPC module system inside the container if possible, for optimized, tested, and/or licensed software.
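Putting several of these tips together, a production-style invocation might look like the following sketch ($SCRATCH, the image directory ./myimage, and the application path /app/my_solver are all hypothetical; adjust for your site):
$ ch-run -b $SCRATCH:/scratch -b /opt/ohpc/.:/opt/ohpc ./myimage -- /app/my_solver /scratch/input.dat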
Appendix: Building and Converting Dockerfiles in Userspace
If you'd like to build containers from Dockerfiles or Docker/OCI container registries and don't have access to a system with Docker, or you want to build them directly on an HPC system, you can use Podman (if it's installed on your HPC system).
Podman is a daemonless Docker/OCI container runtime that can run without escalated privileges. More info on Podman is available on their whatis page.
In this exercise, you were provided with Charliecloud image directories. These directories were built from Dockerfiles that were provided as part of the tarball from Exercise 1. These Dockerfiles are available in the ~/SC20/Dockerfiles directory on your “bastion”/manually launched EC2 instance.
$ cd ~/SC20/misc/Dockerfiles/
$ sh install-deps.sh
Log out and log back in (or run bash -l) to set up Lmod in your environment.
$ ml load charliecloud
$ podman build -t myoneapi -f Dockerfile_intel-oneapi
$ ch-builder2tar myoneapi .
$ podman build -t mypq -f Dockerfile_PicoQuant
$ ch-builder2tar mypq .
From here, you can transfer your gzip-compressed Charliecloud image tarballs to your HPC system, extract them, and run them.
$ ch-tar2dir image.tar.gz .
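For example, end to end with the oneAPI image built above (the host name is a placeholder; use your own cluster's address):
$ scp myoneapi.tar.gz centos@<your-hpc-system>:
$ ssh centos@<your-hpc-system>
$ ch-tar2dir myoneapi.tar.gz .
$ ch-run ./myoneapi -- bash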