Merge pull request #1716 from helinwang/docker_work

refine doc for run paddle on docker

a470000f · helinwang · GitHub · 4c6dee93 · 90d7723a · a470000f

Showing 178 additions and 122 deletions

doc/getstarted/build_and_install/docker_install_en.rst +178 -122

--- a/doc/getstarted/build_and_install/docker_install_en.rst
+++ b/doc/getstarted/build_and_install/docker_install_en.rst
@@ -8,199 +8,255 @@ Please be aware that you will need to change `Dockers settings
 <https://github.com/PaddlePaddle/Paddle/issues/627>`_ to make full use
 of your hardware resource on Mac OS X and Windows.

+Working With Docker
+-------------------
+
+Docker is simple as long as we understand a few basic concepts:
+
+- *image*: A Docker image is a pack of software. It can contain one or more programs and all their dependencies. For example, the PaddlePaddle Docker image includes pre-built PaddlePaddle, Python, and many Python packages. We can run a Docker image directly instead of installing all this software. We can type
+
+  .. code-block:: bash
+
+     docker images
+
+  to list all images in the system. We can also run
+
+  .. code-block:: bash
+
+     docker pull paddlepaddle/paddle:0.10.0rc2
+
+  to download a Docker image, paddlepaddle/paddle in this example,
+  from Docker Hub.
+
+- *container*: If we think of a Docker image as a program, a container is a
+  "process" that runs the image. Indeed, a container is exactly an
+  operating system process, but with a virtualized filesystem, network
+  port space, and other virtualized resources. We can type
+
+  .. code-block:: bash
+
+     docker run paddlepaddle/paddle:0.10.0rc2
+
+  to start a container to run a Docker image, paddlepaddle/paddle in this example.
+
+- *volume*: By default a Docker container has an isolated file system
+  namespace, so it cannot see the files in the host file system. With a
+  *volume*, files mounted from the host become visible inside the
+  container. The following command mounts the current directory at
+  /data inside a container started from the debian image and runs
+  :code:`ls /data` in it.
+
+  .. code-block:: bash
+
+     docker run --rm -v $(pwd):/data debian ls /data
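
The volume example above passes :code:`$(pwd)` because a bare name in the host position of :code:`-v` is treated by Docker as a named volume rather than a host directory. The sketch below shows one way to build such an argument from a relative path. The function name ``volume_arg`` is hypothetical and is not part of Docker or PaddlePaddle.

```bash
# volume_arg is a hypothetical helper, not part of Docker or PaddlePaddle.
# It prints a "-v HOST:CONTAINER" argument with HOST resolved to an
# absolute path, so Docker mounts a host directory instead of creating
# a named volume.
volume_arg() {
  local host_dir
  # cd fails (and the function returns 1) if the directory does not exist.
  host_dir=$(cd "$1" && pwd) || return 1
  printf '%s %s:%s\n' -v "$host_dir" "$2"
}

# Example (needs Docker; assumes the current path contains no spaces):
#   docker run --rm $(volume_arg . /data) debian ls /data
```

Printing the argument instead of running Docker keeps the helper easy to inspect before use.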

 Usage of CPU-only and GPU Images
 ----------------------------------

-For each version of PaddlePaddle, we release 2 types of Docker images: development
-image and production image. Production image includes CPU-only version and a CUDA
-GPU version and their no-AVX versions. We put the docker images on
-`dockerhub.com <https://hub.docker.com/r/paddledev/paddle/>`_. You can find the
-latest versions under "tags" tab at dockerhub.com.
-1. development image :code:`paddlepaddle/paddle:<version>-dev`
+For each version of PaddlePaddle, we release two types of Docker images:
+a development image and a production image. The production image comes in
+a CPU-only version and a CUDA GPU version, each with a no-AVX variant. We
+put the Docker images on `dockerhub.com
+<https://hub.docker.com/r/paddledev/paddle/>`_. You can find the
+latest versions under the "tags" tab at dockerhub.com.

-    This image has packed related develop tools and runtime environment. Users and
-    developers can use this image instead of their own local computer to accomplish
-    development, build, releasing, document writing etc. While different version of
-    paddle may depends on different version of libraries and tools, if you want to
-    setup a local environment, you must pay attention to the versions.
-    The development image contains:
-    - gcc/clang
-    - nvcc
-    - Python
-    - sphinx
-    - woboq
-    - sshd
-    Many developers use servers with GPUs, they can use ssh to login to the server
-    and run :code:`docker exec` to enter the docker container and start their work.
-    Also they can start a development docker image with SSHD service, so they can login to
-    the container and start work.
+1. Production images, which come in several variants:

-    To run the CPU-only image as an interactive container:
+   - GPU/AVX: :code:`paddlepaddle/paddle:<version>-gpu`
+   - GPU/no-AVX: :code:`paddlepaddle/paddle:<version>-gpu-noavx`
+   - CPU/AVX: :code:`paddlepaddle/paddle:<version>`
+   - CPU/no-AVX: :code:`paddlepaddle/paddle:<version>-noavx`

-    .. code-block:: bash
+   Please be aware that the default CPU-only and GPU images both use
+   the AVX instruction set, which older CPUs do not support.
+   The following command checks if your Linux computer
+   supports AVX:

-        docker run -it --rm paddledev/paddle:<version> /bin/bash
+   .. code-block:: bash

-    or, we can run it as a daemon container
+      if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi

-    .. code-block:: bash
+   If the command prints No, use one of the no-AVX images.
+
+   To run the CPU-only image as an interactive container:

-        docker run -d -p 2202:22 -p 8888:8888 paddledev/paddle:<version>
+   .. code-block:: bash

-    and SSH to this container using password :code:`root`:
+      docker run -it --rm paddlepaddle/paddle:0.10.0rc2 /bin/bash

-    .. code-block:: bash
+   The above method works with the GPU image too, but the recommended
+   way to run it is with `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_.

-        ssh -p 2202 root@localhost
+   Please install nvidia-docker first following this `tutorial
+   <https://github.com/NVIDIA/nvidia-docker#quick-start>`_.

-    An advantage of using SSH is that we can connect to PaddlePaddle from
-    more than one terminals.  For example, one terminal running vi and
-    another one running Python interpreter.  Another advantage is that we
-    can run the PaddlePaddle container on a remote server and SSH to it
-    from a laptop.
+   Now you can run a GPU image:

+   .. code-block:: bash

-2. Production images, this image might have multiple variants:
-    - GPU/AVX：:code:`paddlepaddle/paddle:<version>-gpu`
-    - GPU/no-AVX：:code:`paddlepaddle/paddle:<version>-gpu-noavx`
-    - CPU/AVX：:code:`paddlepaddle/paddle:<version>`
-    - CPU/no-AVX：:code:`paddlepaddle/paddle:<version>-noavx`
+      nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash

-    Please be aware that the CPU-only and the GPU images both use the AVX
-    instruction set, but old computers produced before 2008 do not support
-    AVX.  The following command checks if your Linux computer supports
-    AVX:
+2. Development image :code:`paddlepaddle/paddle:<version>-dev`

-    .. code-block:: bash
+   This image packs the development tools and the runtime
+   environment. Users and developers can use it instead of
+   their own local computer for development, building,
+   releasing, writing documentation, and so on. Different versions of
+   PaddlePaddle may depend on different versions of libraries and
+   tools, so if you set up a local environment instead, you must pay
+   attention to those versions.  The development image contains:
+
+   - gcc/clang
+   - nvcc
+   - Python
+   - sphinx
+   - woboq
+   - sshd
+
+   Many developers use servers with GPUs. They can SSH into
+   the server and run :code:`docker exec` to enter the Docker
+   container and start their work.  They can also start a development
+   Docker container with the SSHD service running, so that they can
+   log in to the container directly and start work.
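
The production image variants listed above follow a simple naming pattern, which the following sketch spells out. It is only an illustration under stated assumptions: ``has_avx`` and ``paddle_image`` are hypothetical helper names, and the tag suffixes are taken from the variant list in this document.

```bash
# Hypothetical helpers; they are not shipped with PaddlePaddle.

# Print "yes" if the given cpuinfo file mentions avx, otherwise "no".
# Defaults to /proc/cpuinfo, which exists on Linux only.
has_avx() {
  if grep -qi avx "${1:-/proc/cpuinfo}"; then echo yes; else echo no; fi
}

# Compose an image name from a version, a GPU flag, and an AVX flag,
# following the -gpu and -noavx suffixes from the variant list.
paddle_image() {
  local version="$1" gpu="$2" avx="$3"
  local tag="$version"
  if [ "$gpu" = yes ]; then tag="$tag-gpu"; fi
  if [ "$avx" != yes ]; then tag="$tag-noavx"; fi
  echo "paddlepaddle/paddle:$tag"
}

# Example: pick the CPU image that matches this machine.
#   docker pull "$(paddle_image 0.10.0rc2 no "$(has_avx)")"
```

Whether a given version actually publishes every variant is best checked under the "tags" tab on Docker Hub.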

-       if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi

+Train Model Using Python API
+----------------------------

-       If it doesn't, we will use the non-AVX images.
+Our official docker image provides a runtime for PaddlePaddle
+programs. The typical workflow will be as follows:

-    Above methods work with the GPU image too -- just please don't forget
-    to install GPU driver. To support GPU driver, we recommend to use 
-    [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Run using
+Create a directory as workspace:

-    .. code-block:: bash
+.. code-block:: bash

-        nvidia-docker run -it --rm paddledev/paddle:0.10.0rc1-gpu /bin/bash
+   mkdir ~/workspace

-    Note: If you would have a problem running nvidia-docker, you may try the old method we have used (not recommended).
+Edit a PaddlePaddle Python program using your favorite editor:

-    .. code-block:: bash
+.. code-block:: bash

-        export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
-        export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
-        docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:<version>-gpu
+   emacs ~/workspace/example.py

+Run the program using docker:

-3. Use production image to release you AI application
-    Suppose that we have a simple application program in :code:`a.py`, we can test and run it using the production image:
+.. code-block:: bash

-    ```bash
-    docker run -it -v $PWD:/work paddle /work/a.py
-    ```
+   docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 python /workspace/example.py

-    But this works only if all dependencies of :code:`a.py` are in the production image. If this is not the case, we need to build a new Docker image from the production image and with more dependencies installs.
+Or if you are using GPU for training:

+.. code-block:: bash

-PaddlePaddle Book
------------------
+   nvidia-docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu python /workspace/example.py

-The Jupyter Notebook is an open-source web application that allows
-you to create and share documents that contain live code, equations,
-visualizations and explanatory text in a single browser.
+The above commands start a Docker container that runs :code:`python
+/workspace/example.py`. The container stops once :code:`python
+/workspace/example.py` finishes.

-PaddlePaddle Book is an interactive Jupyter Notebook for users and developers.
-We already exposed port 8888 for this book. If you want to
-dig deeper into deep learning, PaddlePaddle Book definitely is your best choice.
+Another way is to tell Docker to start a :code:`/bin/bash` session and
+run the PaddlePaddle program interactively:

-We provide a packaged book image, simply issue the command:
+.. code-block:: bash
+
+   docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 /bin/bash
+   # now we are inside docker container
+   cd /workspace
+   python example.py
+
+Running with a GPU works the same way, using nvidia-docker:

 .. code-block:: bash

-    docker run -p 8888:8888 paddlepaddle/book
+   nvidia-docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
+   # now we are inside docker container
+   cd /workspace
+   python example.py
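
The CPU and GPU commands in this section differ only in the launcher (:code:`docker` or :code:`nvidia-docker`) and the :code:`-gpu` tag suffix. As a rough sketch, a small wrapper can print the matching command for inspection. The name ``paddle_run_cmd`` and its argument order are assumptions made for this example.

```bash
# paddle_run_cmd is a hypothetical helper, not part of PaddlePaddle.
# Usage: paddle_run_cmd WORKSPACE_DIR SCRIPT VERSION [yes|no]
# It prints (does not execute) the docker command that runs SCRIPT from
# WORKSPACE_DIR, mounted at /workspace inside the container.
paddle_run_cmd() {
  local workspace="$1" script="$2" version="$3" gpu="${4:-no}"
  local launcher=docker
  local image="paddlepaddle/paddle:$version"
  if [ "$gpu" = yes ]; then
    # GPU runs use nvidia-docker and the -gpu image variant.
    launcher=nvidia-docker
    image="$image-gpu"
  fi
  echo "$launcher run --rm -v $workspace:/workspace $image python /workspace/$script"
}

# Example: show the CPU command for ~/workspace/example.py.
#   paddle_run_cmd "$HOME/workspace" example.py 0.10.0rc2
```

Because the helper only prints a string, paths containing spaces would need extra quoting before the output is executed.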

-Then, you would back and paste the address into the local browser:

-.. code-block:: text
+Develop PaddlePaddle or Train Model Using C++ API
+---------------------------------------------------

-    http://localhost:8888/
+We will use the PaddlePaddle development image, since it contains all
+the compilation tools and dependencies.

-That's all. Enjoy your journey!
+Let's clone the PaddlePaddle repo first:

-Development Using Docker
------------------------
+.. code-block:: bash

-Developers can work on PaddlePaddle using Docker.  This allows
-developers to work on different platforms -- Linux, Mac OS X, and
-Windows -- in a consistent way.
+   git clone https://github.com/PaddlePaddle/Paddle.git && cd Paddle

-1. Build the Development Docker Image
+Mount both the workspace folder and the Paddle source folder into the
+Docker container so that we can access them inside it. There are
+two ways of using the PaddlePaddle development Docker image:

-   .. code-block:: bash
+- run interactive bash directly

-      git clone --recursive https://github.com/PaddlePaddle/Paddle
-      cd Paddle
-      docker build -t paddle:dev .
+  .. code-block:: bash

-   Note that by default :code:`docker build` wouldn't import source
-   tree into the image and build it.  If we want to do that, we need docker the
-   development docker image and then run the following command:
+     # use nvidia-docker instead of docker if you need to use GPU
+     docker run -it -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /bin/bash
+     # now we are inside docker container

-   .. code-block:: bash
+- or, we can run it as a daemon container

-      docker run -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "TEST=OFF" paddle:dev
+  .. code-block:: bash

+     # use nvidia-docker instead of docker if you need to use GPU
+     docker run -d -p 2202:22 -p 8888:8888 -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /usr/sbin/sshd -D

-2. Run the Development Environment
+  and SSH to this container using password :code:`root`:

-   Once we got the image :code:`paddle:dev`, we can use it to develop
-   Paddle by mounting the local source code tree into a container that
-   runs the image:
+  .. code-block:: bash

-   .. code-block:: bash
+     ssh -p 2202 root@localhost

-      docker run -d -p 2202:22 -p 8888:8888 -v $PWD:/paddle paddle:dev sshd
+  An advantage is that we can run the PaddlePaddle container on a
+  remote server and SSH to it from a laptop.

-   This runs a container of the development environment Docker image
-   with the local source tree mounted to :code:`/paddle` of the
-   container.
+When developing PaddlePaddle, you can edit the PaddlePaddle source code
+from outside the Docker container using your favorite editor. To
+compile PaddlePaddle, run this inside the container:

-   The above :code:`docker run` commands actually starts
-   an SSHD server listening on port 2202.  This allows us to log into
-   this container with:
+.. code-block:: bash

-   .. code-block:: bash
+   WITH_GPU=OFF WITH_AVX=ON WITH_TEST=ON bash /paddle/paddle/scripts/docker/build.sh

-      ssh root@localhost -p 2202
+This builds everything about Paddle in :code:`/paddle/build`.  We
+can then run the unit tests there:

-   Usually, I run above commands on my Mac.  I can also run them on a
-   GPU server :code:`xxx.yyy.zzz.www` and ssh from my Mac to it:
+.. code-block:: bash

-   .. code-block:: bash
+   cd /paddle/build
+   ctest

-      my-mac$ ssh root@xxx.yyy.zzz.www -p 2202
+When training a model using the C++ API, we can edit the Paddle program
+in ~/workspace outside of Docker and build it from /workspace inside
+the container.

-3. Build and Install Using the Development Environment
+PaddlePaddle Book
+------------------

-   Once I am in the container, I can use
-   :code:`paddle/scripts/docker/build.sh` to build, install, and test
-   Paddle:
+The Jupyter Notebook is an open-source web application that allows
+you to create and share documents that contain live code, equations,
+visualizations and explanatory text in a single browser.

-   .. code-block:: bash
+PaddlePaddle Book is an interactive Jupyter Notebook for users and developers.
+We already exposed port 8888 for this book. If you want to
+dig deeper into deep learning, PaddlePaddle Book definitely is your best choice.

-      /paddle/paddle/scripts/docker/build.sh
+We provide a packaged book image; simply issue the command:

-   This builds everything about Paddle in :code:`/paddle/build`.  And
-   we can run unit tests there:
+.. code-block:: bash

-   .. code-block:: bash
+    docker run -p 8888:8888 paddlepaddle/book
+
+Then go back and paste the address into your local browser:
+
+.. code-block:: text
+
+    http://localhost:8888/

-      cd /paddle/build
-      ctest
+That's all. Enjoy your journey!


 Documentation