From e19861c567b544b7b43ae2e0315371abe6d86fa0 Mon Sep 17 00:00:00 2001
From: Helin Wang
Date: Tue, 28 Mar 2017 12:06:15 -0700
Subject: [PATCH] refine doc for run paddle on docker

---
 .../build_and_install/docker_install_en.rst | 281 ++++++++++--------
 1 file changed, 159 insertions(+), 122 deletions(-)

diff --git a/doc/getstarted/build_and_install/docker_install_en.rst b/doc/getstarted/build_and_install/docker_install_en.rst
index f43e83d129..add666261f 100644
--- a/doc/getstarted/build_and_install/docker_install_en.rst
+++ b/doc/getstarted/build_and_install/docker_install_en.rst
@@ -8,199 +8,236 @@

Please be aware that you will need to change `Docker settings`_ to make
full use of your hardware resources on Mac OS X and Windows.

Working With Docker
-------------------

Here we describe the basic Docker concepts used in this tutorial; a
short command walk-through follows the list.

- A *container* is an environment for running applications.

- An *image* is an immutable snapshot of a Docker container. One can run
  a container based on an image with the command :code:`docker run
  docker_image_name`.

- By default, a Docker container has an isolated file system namespace,
  so it cannot see the files on the host file system. By using a
  *volume*, files mounted from the host become visible inside the
  container. The following command mounts the current directory into
  :code:`/data` inside a container started from the :code:`debian`
  image and runs :code:`ls /data` in it:

  .. code-block:: bash

     docker run --rm -v $(pwd):/data debian ls /data
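To make these concepts concrete, the following optional commands pull a
public image, list the images available locally, and show containers
that have already run. They use only standard Docker CLI commands and
nothing PaddlePaddle specific:

.. code-block:: bash

   docker pull debian      # download the debian image from a registry
   docker images           # list images stored on this machine
   docker ps --all         # list containers, including stopped ones

The same commands apply to the PaddlePaddle images used below.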
Usage of CPU-only and GPU Images
----------------------------------

For each version of PaddlePaddle, we release two types of Docker images:
the development image and the production image. The production image
comes in a CPU-only version, a CUDA GPU version, and their no-AVX
variants. We put the Docker images on `dockerhub.com`_; you can find the
latest versions under the "tags" tab at dockerhub.com.

1. Production images. This image might have multiple variants:

   - GPU/AVX: :code:`paddlepaddle/paddle:<version>-gpu`
   - GPU/no-AVX: :code:`paddlepaddle/paddle:<version>-gpu-noavx`
   - CPU/AVX: :code:`paddlepaddle/paddle:<version>`
   - CPU/no-AVX: :code:`paddlepaddle/paddle:<version>-noavx`

   Please be aware that both the CPU-only and the GPU images use the
   AVX instruction set, but old computers produced before 2008 do not
   support AVX. The following command checks whether your Linux
   computer supports AVX:

   .. code-block:: bash

      if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi

   If it does not, use the no-AVX images instead (see the pull example
   at the end of this section).

   To run the CPU-only image as an interactive container:

   .. code-block:: bash

      docker run -it --rm paddlepaddle/paddle:0.10.0rc2 /bin/bash

   The method above works with the GPU image too; the recommended way
   is to use `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_.
   Please install nvidia-docker first, following its `tutorial`_.

   Now you can run a GPU image:

   .. code-block:: bash

      nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash

2. Development image: :code:`paddlepaddle/paddle:<version>-dev`

   This image packs the related development tools and the runtime
   environment. Users and developers can use this image instead of
   their own local computer for development, building, releasing,
   document writing, and so on. Different versions of PaddlePaddle may
   depend on different versions of libraries and tools, so if you want
   to set up a local environment, you must pay attention to the
   versions. The development image contains:

   - gcc/clang
   - nvcc
   - Python
   - sphinx
   - woboq
   - sshd

   Many developers use servers with GPUs: they can SSH into the server
   and run :code:`docker exec` to enter the Docker container and start
   their work. They can also start a development container with the
   SSHD service, so that they can log in to the container and start
   working there.
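As a concrete illustration of picking the right production image, the
following sketch combines the AVX check above with :code:`docker pull`.
The :code:`0.10.0rc2` tag is simply the release used in the examples of
this document, so substitute the version you actually want:

.. code-block:: bash

   # pull the AVX or the no-AVX CPU-only image, depending on CPU support
   if grep -qi avx /proc/cpuinfo; then
       docker pull paddlepaddle/paddle:0.10.0rc2
   else
       docker pull paddlepaddle/paddle:0.10.0rc2-noavx
   fi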
Train Model Using Python API
----------------------------

Our official Docker images provide a runtime for PaddlePaddle programs.
The typical workflow is as follows.

Create a directory as the workspace:

.. code-block:: bash

   mkdir ~/workspace

Edit a PaddlePaddle Python program using your favourite editor:

.. code-block:: bash

   emacs ~/workspace/example.py

Run the program using Docker:

.. code-block:: bash

   docker run -it --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 python /workspace/example.py

Or, if you are using a GPU for training:

.. code-block:: bash

   nvidia-docker run -it --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu python /workspace/example.py

The commands above start a Docker container that runs :code:`python
/workspace/example.py`; the container stops once :code:`python
/workspace/example.py` finishes.

Another way is to tell Docker to start a :code:`/bin/bash` session and
run the PaddlePaddle program interactively:

.. code-block:: bash

   docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 /bin/bash
   # now we are inside the docker container
   cd /workspace
   python example.py

Running with a GPU is identical:

.. code-block:: bash

   nvidia-docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
   # now we are inside the docker container
   cd /workspace
   python example.py
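Because :code:`~/workspace` is mounted as a volume, anything the program
writes under :code:`/workspace` stays on the host after the container
exits. For example, the following sketch (the :code:`train.log` file
name is only illustrative) captures the program's output on the host:

.. code-block:: bash

   docker run -it --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 \
       /bin/bash -c "cd /workspace && python example.py > train.log 2>&1"
   cat ~/workspace/train.log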
Develop PaddlePaddle or Train Model Using C++ API
---------------------------------------------------

We will use the PaddlePaddle development image, since it contains all
the compilation tools and dependencies.

Let's clone the PaddlePaddle repo first:

.. code-block:: bash

   git clone https://github.com/PaddlePaddle/Paddle.git && cd Paddle

Mount both the workspace folder and the Paddle source folder into the
Docker container, so that we can access them inside it. There are two
ways of using the PaddlePaddle development image:

- Run an interactive bash session directly:

  .. code-block:: bash

     # use nvidia-docker instead of docker if you need to use GPU
     docker run -it -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /bin/bash
     # now we are inside the docker container

- Or run it as a daemon container:

  .. code-block:: bash

     # use nvidia-docker instead of docker if you need to use GPU
     docker run -d -p 2202:22 -p 8888:8888 -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /usr/sbin/sshd -D

  and SSH into this container using the password :code:`root`:

  .. code-block:: bash

     ssh -p 2202 root@localhost

  An advantage of this approach is that we can run the PaddlePaddle
  container on a remote server and SSH into it from a laptop.

When developing PaddlePaddle, you can edit the PaddlePaddle source code
from outside the Docker container using your favourite editor. To
compile PaddlePaddle, run the following inside the container:

.. code-block:: bash

   WITH_GPU=OFF WITH_AVX=ON WITH_TEST=ON bash /paddle/paddle/scripts/docker/build.sh

This builds everything about Paddle in :code:`/paddle/build`, and we can
run the unit tests there:

.. code-block:: bash

   cd /paddle/build
   ctest

When training a model using the C++ API, we can edit the Paddle program
in :code:`~/workspace` outside of Docker and build it from
:code:`/workspace` inside the container.
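During development it is often convenient to run only a subset of the
unit tests. The sketch below uses standard :code:`ctest` options; the
test name pattern is only an example, so replace it with the tests you
care about:

.. code-block:: bash

   cd /paddle/build
   ctest -R cross_entropy      # run only tests whose names match the pattern
   ctest --output-on-failure   # print the output of any failing test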
PaddlePaddle Book
------------------

The Jupyter Notebook is an open-source web application that allows you
to create and share documents that contain live code, equations,
visualizations, and explanatory text in a single browser.

PaddlePaddle Book is an interactive Jupyter Notebook for users and
developers. We already expose port 8888 for this book. If you want to
dig deeper into deep learning, PaddlePaddle Book is definitely your best
choice.

We provide a packaged book image; simply issue the command:

.. code-block:: bash

   docker run -p 8888:8888 paddlepaddle/book

Then, copy and paste the address into your local browser:

.. code-block:: text

   http://localhost:8888/

That's all. Enjoy your journey!