From 8e7d09eecf044cb952318ba754991983b66895c7 Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Tue, 3 Dec 2024 14:06:37 -0500 Subject: [PATCH 01/12] Add some documentation for podman --- docs/Userguide.rst | 3 +- docs/Userguide_containers.rst | 164 ++++------------------------------ 2 files changed, 20 insertions(+), 147 deletions(-) diff --git a/docs/Userguide.rst b/docs/Userguide.rst index 5391eca7..733f5b89 100644 --- a/docs/Userguide.rst +++ b/docs/Userguide.rst @@ -14,8 +14,7 @@ knowledge, tips and tricks and example commands. .. include:: Userguide_login.rst .. include:: Userguide_running_code.rst .. include:: Userguide_portability.rst -.. Nope, not that one, because we'll talk about singularity instead. - .. include:: Userguide_containers.rst +.. include:: Userguide_containers.rst .. include:: Userguide_singularity.rst .. include:: Userguide_sharing_data.rst .. include:: Userguide_datasets.rst diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index b9f9515f..150191c3 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -3,170 +3,44 @@ Using containers ================ -Docker containers are now available on the local cluster with a root-less -system called Shifter integrated into SLURM. -*It is still in beta and be careful with this usage* +Docker containers are now available on the local cluster without root priviledges using `podman `_. -Initialising your Containers ----------------------------- - -To first use a container, you have to pull it to the local registry to be -converted to a Shifter-compatible image. - -.. prompt:: bash - - shifterimg pull docker:image_name:latest - - -You can list available images with - -.. prompt:: bash - - shifterimg images - - -**DO NOT USE IMAGES WITH SENSITIVE INFORMATION** yet, it will soon be possible. -For now, every image is pulled to a common registry but access-control will soon -be implemented. 
+Generally any command-line argument accepted by docker will work with podman. this means that you can mostly use docker examples in you find on the web by replacing docker with podman in the command line. +.. note:: + Complete Podman Documentation: + https://docs.podman.io/en/stable/ Using in SLURM -------------- -Containerized Batch job -^^^^^^^^^^^^^^^^^^^^^^^ +To use podman you can just use the podman command in either a batch script or an interactive job. -You must use the ``--image=docker:image_name:latest`` directive to specify -the container to use. Once the container is mounted, you are not yet -inside the container's file-system, you must use the ``shifter`` command -to execute a command in the chroot environment of the container. - -For example: +One difference in configuration is that for certain technical reasons all the storage for podman (images, containers, ...) is on a job-specific location and will be lost after the job is complete or preempted. If you have data that must be preseved across jobs, you can `mount ` a local folder inside the container, such as $SCRATCH or you home to save data. .. code-block:: bash - :linenos: - - #!/bin/bash - #SBATCH --image=docker:image_name:latest - #SBATCH --nodes=1 - #SBATCH --partition=low - - shifter python myPythonScript.py args - - - -Container Interactive job -^^^^^^^^^^^^^^^^^^^^^^^^^ - -Using the salloc command, you can request the image while getting the allocation - -.. prompt:: bash - - salloc -c2 --mem=16g --image=docker:image_name:latest - - -Once in the job, you can activate the container's environment with the -``shifter`` command - -.. prompt:: bash - - shifter /bin/bash - - + $ podman run --mount type=bind,source=$SCRATCH/exp,destination=/data/exp bash touch /data/exp/file + $ ls $SCRATCH/exp + file -Command line ------------- - -``shifter`` support various options on the command line but you should be -set with the image name and the command to execute: - -.. 
code-block:: bash - - shifter [-h|--help] [-v|--verbose] [--image=:] - [--entrypoint[=command]] [--workdir[=/path]] - [-E|--clearenv] [-e|--env==] [--env-file=/env/file - [-V|--volume=/path/to/bind:/mnt/in/image[:[,...]][;...]] - [-m|--module=[,...]] - [-- /command/to/exec/in/shifter [args...]] - - - -Volumes -------- - -``/home/yourusername``, ``/Tmp``, ``/ai`` and all ``/network/..`` sub-folders are -mounted inside the container. - +You can use multiple containers in a single job, but you have to be careful about the memory and CPU limits of the job. GPU --- -To access the GPU inside a container, you need to specify ``--module=nvidia`` on -the ``sbatch/salloc/shifter`` command line - -.. prompt:: bash - - shifter --image=centos:7 --module=nvidia bash - +To use a GPU you need to a GPU job and then use the `--device nvidia.com/gpu=all` for all GPUs allocated to the job or `--device nvidia.com/gpu=n` where n is the gpu you want in the container, starting at 0. -Following folders will be mounted in the container: - -========================== =========== ================================================== -Host Container Comment -========================== =========== ================================================== -/ai/apps/cuda/10.0 /cuda Cuda libraries and bin, added to ``PATH`` -/usr/bin /nvidia/bin To access ``nvidia-smi`` -/usr/lib/x86_64-linux-gnu/ /nvidia/lib ``LD_LIBRARY_PATH`` will be set to ``/nvidia/lib`` -========================== =========== ================================================== - - -.. note:: - - - Use image names in 3 parts to avoid confusion: ``_type:name:tag_`` - - Please keep in mind that root is squashed on Shifter images, so the - software should be installed in a way that is executable to someone with - user-level permissions. 
- - Currently the ``/etc`` and ``/var`` directories are reserved for use by the - system and will be overwritten when the image is mounted - - The container is not isolated so you share the network card and all - hardware from the host, no need to forward ports - - -Example -------- - .. code-block:: bash - username@login-2:~$ shifterimg pull docker:alpine:latest - 2019-10-11T20:12:42 Pulling Image: docker:alpine:latest, status: READY - - username@login-2:~$ salloc -c2 --gres=gpu:1 --image=docker:alpine:latest - salloc: Granted job allocation 213064 - salloc: Waiting for resource configuration - salloc: Nodes eos20 are ready for job - - username@eos20:~$ cat /etc/os-release - NAME="Ubuntu" - VERSION="18.04.2 LTS (Bionic Beaver)" - ID=ubuntu - ID_LIKE=debian - PRETTY_NAME="Ubuntu 18.04.2 LTS" - VERSION_ID="18.04" - VERSION_CODENAME=bionic - UBUNTU_CODENAME=bionic - - username@eos20:~$ shifter sh - ~ $ cat /etc/os-release - NAME="Alpine Linux" - ID=alpine - VERSION_ID=3.10.2 - PRETTY_NAME="Alpine Linux v3.10" - - ~ $ + $ nvidia-smi + $ podman run --device nvidia.com/gpu=all nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi + $ podman run --device nvidia.com/gpu=0 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi + $ podman run --device nvidia.com/gpu=1 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi +You can pass `--device` multiple times to add more than one gpus to the container. .. 
note:: - Complete Documentation: - https://docs.nersc.gov/programming/shifter/how-to-use/ + CDI (GPU) support documentation: + https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#running-a-workload-with-cdi From e39f9b72540975109a818cd7f65ded0f52750099 Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Tue, 3 Dec 2024 14:21:41 -0500 Subject: [PATCH 02/12] Fix markup --- docs/Userguide_containers.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index 150191c3..05005b37 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -16,7 +16,7 @@ Using in SLURM To use podman you can just use the podman command in either a batch script or an interactive job. -One difference in configuration is that for certain technical reasons all the storage for podman (images, containers, ...) is on a job-specific location and will be lost after the job is complete or preempted. If you have data that must be preseved across jobs, you can `mount ` a local folder inside the container, such as $SCRATCH or you home to save data. +One difference in configuration is that for certain technical reasons all the storage for podman (images, containers, ...) is on a job-specific location and will be lost after the job is complete or preempted. If you have data that must be preseved across jobs, you can `mount `_ a local folder inside the container, such as $SCRATCH or you home to save data. .. code-block:: bash @@ -29,7 +29,7 @@ You can use multiple containers in a single job, but you have to be careful abou GPU --- -To use a GPU you need to a GPU job and then use the `--device nvidia.com/gpu=all` for all GPUs allocated to the job or `--device nvidia.com/gpu=n` where n is the gpu you want in the container, starting at 0. 
+To use a GPU you need to a GPU job and then use the ``--device nvidia.com/gpu=all`` for all GPUs allocated to the job or ``--device nvidia.com/gpu=n`` where n is the gpu you want in the container, starting at 0. .. code-block:: bash @@ -39,7 +39,7 @@ To use a GPU you need to a GPU job and then use the `--device nvidia.com/gpu=all $ podman run --device nvidia.com/gpu=0 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi $ podman run --device nvidia.com/gpu=1 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi -You can pass `--device` multiple times to add more than one gpus to the container. +You can pass ``--device`` multiple times to add more than one gpus to the container. .. note:: CDI (GPU) support documentation: From f5a399b2cb24cf770ab4f66fb9888ac7173220d9 Mon Sep 17 00:00:00 2001 From: satyaog Date: Wed, 4 Dec 2024 10:00:32 -0500 Subject: [PATCH 03/12] Apply suggestions from code review --- docs/Userguide_containers.rst | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index 05005b37..ca55072d 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -3,20 +3,29 @@ Using containers ================ -Docker containers are now available on the local cluster without root priviledges using `podman `_. +Docker containers are now available on the local cluster without root +priviledges using `podman `_. -Generally any command-line argument accepted by docker will work with podman. this means that you can mostly use docker examples in you find on the web by replacing docker with podman in the command line. +Generally any command-line argument accepted by docker will work with podman. +This means that you can mostly use the docker examples you find on the web by +replacing `docker` with `podman` in the command line. .. 
note:: - Complete Podman Documentation: - https://docs.podman.io/en/stable/ + Complete Podman Documentation: https://docs.podman.io/en/stable/ Using in SLURM -------------- -To use podman you can just use the podman command in either a batch script or an interactive job. +To use podman you can just use the `podman` command in either a batch script or +an interactive job. -One difference in configuration is that for certain technical reasons all the storage for podman (images, containers, ...) is on a job-specific location and will be lost after the job is complete or preempted. If you have data that must be preseved across jobs, you can `mount `_ a local folder inside the container, such as $SCRATCH or you home to save data. +One difference in configuration is that for certain technical reasons all the +storage for podman (images, containers, ...) is on a job-specific location and +will be lost after the job is complete or preempted. If you have data that must +be preserved across jobs, you can `mount +`_ +a local folder inside the container, such as `$SCRATCH` or your home to save +data. .. code-block:: bash @@ -24,12 +33,16 @@ One difference in configuration is that for certain technical reasons all the st $ ls $SCRATCH/exp file -You can use multiple containers in a single job, but you have to be careful about the memory and CPU limits of the job. +You can use multiple containers in a single job, but you have to be careful +about the memory and CPU limits of the job. GPU --- -To use a GPU you need to a GPU job and then use the ``--device nvidia.com/gpu=all`` for all GPUs allocated to the job or ``--device nvidia.com/gpu=n`` where n is the gpu you want in the container, starting at 0. +To use a GPU in a container, you need to a GPU job and then use ``--device +nvidia.com/gpu=all`` to make all GPUs allocated available in the container or +``--device nvidia.com/gpu=N`` where `N` is the gpu index you want in the +container, starting at 0. .. 
code-block:: bash From 5270c3f7369402da44af39feb4b227029b1a67a6 Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Fri, 13 Dec 2024 11:23:31 -0500 Subject: [PATCH 04/12] Add note about warning messages --- docs/Userguide_containers.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index ca55072d..37846a86 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -36,6 +36,18 @@ data. You can use multiple containers in a single job, but you have to be careful about the memory and CPU limits of the job. +.. note:: + + Due to the cluster environment you may see warning messages like + ``` + WARN[0000] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers + ``` + or + ``` + WARN[0005] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus + ``` + but as far as we can see those can be safely ignored and should not have an impact on your images. + GPU --- From d2b3e7e37e88d2dbed1e149f3e39df5e849cdc42 Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Fri, 13 Dec 2024 11:29:56 -0500 Subject: [PATCH 05/12] Fix markup --- docs/Userguide_containers.rst | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index 37846a86..89dcb235 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -39,14 +39,11 @@ about the memory and CPU limits of the job. .. 
note:: Due to the cluster environment you may see warning messages like - ``` - WARN[0000] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers - ``` + `WARN[0000] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers` or - ``` - WARN[0005] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus - ``` - but as far as we can see those can be safely ignored and should not have an impact on your images. + `WARN[0005] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus` + but as far as we can see those can be safely ignored and should not have + an impact on your images. GPU --- From ce948e2df2b50738912ccc3e3a11e04f8e1d7f2d Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Fri, 13 Dec 2024 12:41:54 -0500 Subject: [PATCH 06/12] Add some more error messages and some examples --- docs/Userguide_containers.rst | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index 89dcb235..33836cfb 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -39,7 +39,9 @@ about the memory and CPU limits of the job. .. note:: Due to the cluster environment you may see warning messages like - `WARN[0000] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers` + `WARN[0000] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers`, + `ERRO[0000] cannot find UID/GID for user : no subuid ranges found for user "" in /etc/subuid - check rootless mode in man pages.`, + `WARN[0000] Using rootless single mapping into the namespace. This might break some images. 
Check /etc/subuid and /etc/subgid for adding sub*ids if not using a network user` or `WARN[0005] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus` but as far as we can see those can be safely ignored and should not have @@ -56,10 +58,10 @@ container, starting at 0. .. code-block:: bash - $ nvidia-smi - $ podman run --device nvidia.com/gpu=all nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi - $ podman run --device nvidia.com/gpu=0 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi - $ podman run --device nvidia.com/gpu=1 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi + $ nvidia-smi + $ podman run --device nvidia.com/gpu=all nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi + $ podman run --device nvidia.com/gpu=0 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi + $ podman run --device nvidia.com/gpu=1 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi You can pass ``--device`` multiple times to add more than one gpus to the container. From b7306ad304e2da06ae2910f6ad649b1adae49a60 Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Fri, 13 Dec 2024 12:53:38 -0500 Subject: [PATCH 07/12] Fill in expected results from GPU examples --- docs/Userguide_containers.rst | 88 +++++++++++++++++++++++++++++++++++ 1 file changed, 88 insertions(+) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index 33836cfb..efde6eb3 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -59,9 +59,97 @@ container, starting at 0. .. code-block:: bash $ nvidia-smi + Fri Dec 13 12:47:34 2024 + +-----------------------------------------------------------------------------------------+ + | NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 | + |-----------------------------------------+------------------------+----------------------+ + | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | + | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | + | | | MIG M. 
| + |=========================================+========================+======================| + | 0 NVIDIA L40S On | 00000000:4A:00.0 Off | 0 | + | N/A 25C P8 36W / 350W | 1MiB / 46068MiB | 0% Default | + | | | N/A | + +-----------------------------------------+------------------------+----------------------+ + | 1 NVIDIA L40S On | 00000000:61:00.0 Off | 0 | + | N/A 26C P8 35W / 350W | 1MiB / 46068MiB | 0% Default | + | | | N/A | + +-----------------------------------------+------------------------+----------------------+ + + +-----------------------------------------------------------------------------------------+ + | Processes: | + | GPU GI CI PID Type Process name GPU Memory | + | ID ID Usage | + |=========================================================================================| + | No running processes found | + +-----------------------------------------------------------------------------------------+ $ podman run --device nvidia.com/gpu=all nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi + Fri Dec 13 17:48:21 2024 + +-----------------------------------------------------------------------------------------+ + | NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 | + |-----------------------------------------+------------------------+----------------------+ + | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | + | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | + | | | MIG M. 
| + |=========================================+========================+======================| + | 0 NVIDIA L40S On | 00000000:4A:00.0 Off | 0 | + | N/A 25C P8 36W / 350W | 1MiB / 46068MiB | 0% Default | + | | | N/A | + +-----------------------------------------+------------------------+----------------------+ + | 1 NVIDIA L40S On | 00000000:61:00.0 Off | 0 | + | N/A 25C P8 35W / 350W | 1MiB / 46068MiB | 0% Default | + | | | N/A | + +-----------------------------------------+------------------------+----------------------+ + + +-----------------------------------------------------------------------------------------+ + | Processes: | + | GPU GI CI PID Type Process name GPU Memory | + | ID ID Usage | + |=========================================================================================| + | No running processes found | + +-----------------------------------------------------------------------------------------+ $ podman run --device nvidia.com/gpu=0 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi + Fri Dec 13 17:48:33 2024 + +-----------------------------------------------------------------------------------------+ + | NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 | + |-----------------------------------------+------------------------+----------------------+ + | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | + | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | + | | | MIG M. 
| + |=========================================+========================+======================| + | 0 NVIDIA L40S On | 00000000:4A:00.0 Off | 0 | + | N/A 25C P8 36W / 350W | 1MiB / 46068MiB | 0% Default | + | | | N/A | + +-----------------------------------------+------------------------+----------------------+ + + +-----------------------------------------------------------------------------------------+ + | Processes: | + | GPU GI CI PID Type Process name GPU Memory | + | ID ID Usage | + |=========================================================================================| + | No running processes found | + +-----------------------------------------------------------------------------------------+ $ podman run --device nvidia.com/gpu=1 nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi + Fri Dec 13 17:48:40 2024 + +-----------------------------------------------------------------------------------------+ + | NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 | + |-----------------------------------------+------------------------+----------------------+ + | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | + | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | + | | | MIG M. 
| + |=========================================+========================+======================| + | 0 NVIDIA L40S On | 00000000:61:00.0 Off | 0 | + | N/A 25C P8 35W / 350W | 1MiB / 46068MiB | 0% Default | + | | | N/A | + +-----------------------------------------+------------------------+----------------------+ + + +-----------------------------------------------------------------------------------------+ + | Processes: | + | GPU GI CI PID Type Process name GPU Memory | + | ID ID Usage | + |=========================================================================================| + | No running processes found | + +-----------------------------------------------------------------------------------------+ You can pass ``--device`` multiple times to add more than one gpus to the container. From 3816aeb1389b0eb9b357a8b280cab28754b9bc2d Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Fri, 13 Dec 2024 12:58:03 -0500 Subject: [PATCH 08/12] Remove whitespace --- docs/Userguide_containers.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index efde6eb3..17a3cb64 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -75,7 +75,7 @@ container, starting at 0. | N/A 26C P8 35W / 350W | 1MiB / 46068MiB | 0% Default | | | | N/A | +-----------------------------------------+------------------------+----------------------+ - + +-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | @@ -100,7 +100,7 @@ container, starting at 0. 
| N/A 25C P8 35W / 350W | 1MiB / 46068MiB | 0% Default | | | | N/A | +-----------------------------------------+------------------------+----------------------+ - + +-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | @@ -121,7 +121,7 @@ container, starting at 0. | N/A 25C P8 36W / 350W | 1MiB / 46068MiB | 0% Default | | | | N/A | +-----------------------------------------+------------------------+----------------------+ - + +-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | @@ -142,7 +142,7 @@ container, starting at 0. | N/A 25C P8 35W / 350W | 1MiB / 46068MiB | 0% Default | | | | N/A | +-----------------------------------------+------------------------+----------------------+ - + +-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | From 9affedec14865ce384a1c797244a88abdbb6eaae Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Fri, 13 Dec 2024 13:04:58 -0500 Subject: [PATCH 09/12] Don't specify bash because the linter fails --- docs/Userguide_containers.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index 17a3cb64..6ce1159c 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -56,7 +56,7 @@ nvidia.com/gpu=all`` to make all GPUs allocated available in the container or container, starting at 0. -.. code-block:: bash +.. 
code-block:: $ nvidia-smi Fri Dec 13 12:47:34 2024 From c0fa0b192aaf2086197cc8ecc8a689ee5fb22546 Mon Sep 17 00:00:00 2001 From: abergeron Date: Fri, 13 Dec 2024 14:21:43 -0500 Subject: [PATCH 10/12] Update docs/Userguide_containers.rst Co-authored-by: Bruno Travouillon --- docs/Userguide_containers.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index 6ce1159c..85434115 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -3,8 +3,8 @@ Using containers ================ -Docker containers are now available on the local cluster without root -priviledges using `podman `_. +Podman containers are now available as tech preview on the Mila cluster +without root privileges using `podman `_. Generally any command-line argument accepted by docker will work with podman. This means that you can mostly use the docker examples you find on the web by From f40b08b5584dd7e57f5285a1dac1be7c2887ccc4 Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Fri, 13 Dec 2024 14:23:33 -0500 Subject: [PATCH 11/12] fix typo --- docs/Userguide_containers.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index 85434115..3631a74a 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -50,7 +50,7 @@ about the memory and CPU limits of the job. GPU --- -To use a GPU in a container, you need to a GPU job and then use ``--device +To use a GPU in a container, you need a GPU job and then use ``--device nvidia.com/gpu=all`` to make all GPUs allocated available in the container or ``--device nvidia.com/gpu=N`` where `N` is the gpu index you want in the container, starting at 0. 
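Note on the GPU section touched by the patches above: the guide says ``--device`` can be passed several times to expose more than one GPU. A minimal sketch of how such a command line could be assembled is shown here. The helper name ``podman_gpu_cmd`` is hypothetical (it is not part of podman), and the image tag is the one used in the guide's own examples. The helper only prints the command, so it can be reviewed before being run inside a GPU job.

```shell
#!/bin/bash
# Hypothetical helper (not part of podman): prints a `podman run` command
# with one `--device nvidia.com/gpu=N` flag per GPU index given as argument.
podman_gpu_cmd() {
    local image="nvidia/cuda:11.6.1-base-ubuntu20.04"
    local args=()
    local idx
    for idx in "$@"; do
        args+=(--device "nvidia.com/gpu=${idx}")
    done
    # Only echo the command; nothing is executed here.
    echo podman run "${args[@]}" "$image" nvidia-smi
}

# Example: request GPUs 0 and 1 of the job's allocation.
podman_gpu_cmd 0 1
```

Indices start at 0 and refer to the GPUs allocated to the job, as described in the guide; passing ``all`` as the only argument produces the ``nvidia.com/gpu=all`` form.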
From 7412709d399b49b02f729c5f8bcf3b0ca465ac2f Mon Sep 17 00:00:00 2001 From: Arnaud Bergeron Date: Fri, 13 Dec 2024 14:41:26 -0500 Subject: [PATCH 12/12] Fix format --- docs/Userguide_containers.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/Userguide_containers.rst b/docs/Userguide_containers.rst index 3631a74a..24d02806 100644 --- a/docs/Userguide_containers.rst +++ b/docs/Userguide_containers.rst @@ -39,11 +39,11 @@ about the memory and CPU limits of the job. .. note:: Due to the cluster environment you may see warning messages like - `WARN[0000] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers`, - `ERRO[0000] cannot find UID/GID for user : no subuid ranges found for user "" in /etc/subuid - check rootless mode in man pages.`, - `WARN[0000] Using rootless single mapping into the namespace. This might break some images. Check /etc/subuid and /etc/subgid for adding sub*ids if not using a network user` + ``WARN[0000] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers``, + ``ERRO[0000] cannot find UID/GID for user : no subuid ranges found for user "" in /etc/subuid - check rootless mode in man pages.``, + ``WARN[0000] Using rootless single mapping into the namespace. This might break some images. Check /etc/subuid and /etc/subgid for adding sub*ids if not using a network user`` or - `WARN[0005] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus` + ``WARN[0005] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus`` but as far as we can see those can be safely ignored and should not have an impact on your images.
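Note on the "Using in SLURM" section of the final document: because podman storage is job-specific and lost when the job ends, results have to be written to a bind-mounted host folder. The sketch below wraps the guide's own ``--mount`` example in a small function. The names ``podman_mount_cmd`` and ``RUN_PODMAN`` are assumptions made for illustration, as is the ``/tmp/scratch`` fallback used when ``$SCRATCH`` is not set. By default the function only prints the command it would run.

```shell
#!/bin/bash
# Hypothetical helper (not part of podman): builds the bind-mount command
# from the guide and either prints it or, with RUN_PODMAN=1, executes it.
podman_mount_cmd() {
    local host_dir="$1"
    local cmd=(podman run
        --mount "type=bind,source=${host_dir},destination=/data/exp"
        bash touch /data/exp/file)
    if [ "${RUN_PODMAN:-0}" = "1" ]; then
        # The bind-mount source must exist on the host before the run.
        mkdir -p "$host_dir"
        "${cmd[@]}"
        ls "$host_dir"
    else
        # Default: show the command without running anything.
        echo "${cmd[@]}"
    fi
}

# SCRATCH is set on the cluster; the fallback is only for this sketch.
podman_mount_cmd "${SCRATCH:-/tmp/scratch}/exp"
```

Inside a batch script or interactive job, setting ``RUN_PODMAN=1`` would run the container and list the file it created in the mounted folder, which survives after the job's podman storage is discarded.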