Merge pull request #1716 from helinwang/docker_work

refine doc for run paddle on docker

a470000f · helinwang · GitHub · 4c6dee93 · 90d7723a · a470000f

Showing 178 additions and 122 deletions

doc/getstarted/build_and_install/docker_install_en.rst +178 -122

--- a/doc/getstarted/build_and_install/docker_install_en.rst
+++ b/doc/getstarted/build_and_install/docker_install_en.rst
@@ -8,200 +8,256 @@ Please be aware that you will need to change `Dockers settings
 <https://github.com/PaddlePaddle/Paddle/issues/627>`_ to make full use
 of your hardware resource on Mac OS X and Windows.
+Working With Docker
+-------------------
-Usage of CPU-only and GPU Images
+Docker is simple as long as we understand a few basic concepts:
----------------------------------
-For each version of PaddlePaddle, we release 2 types of Docker images: development
+- *image*: A Docker image is a package of software. It can contain one or more programs and all their dependencies. For example, the PaddlePaddle Docker image includes pre-built PaddlePaddle, Python, and many Python packages. We can run a Docker image directly instead of installing all this software ourselves. We can type
-image and production image. Production image includes CPU-only version and a CUDA
-GPU version and their no-AVX versions. We put the docker images on
-`dockerhub.com <https://hub.docker.com/r/paddledev/paddle/>`_. You can find the
-latest versions under "tags" tab at dockerhub.com.
-1. development image :code:`paddlepaddle/paddle:<version>-dev`
-    This image has packed related develop tools and runtime environment. Users and
-    developers can use this image instead of their own local computer to accomplish
-    development, build, releasing, document writing etc. While different version of
-    paddle may depends on different version of libraries and tools, if you want to
-    setup a local environment, you must pay attention to the versions.
-    The development image contains:
-    - gcc/clang
-    - nvcc
-    - Python
-    - sphinx
-    - woboq
-    - sshd
-    Many developers use servers with GPUs, they can use ssh to login to the server
-    and run :code:`docker exec` to enter the docker container and start their work.
-    Also they can start a development docker image with SSHD service, so they can login to
-    the container and start work.
-    To run the CPU-only image as an interactive container:
+  .. code-block:: bash
+     docker images
+  to list all images in the system. We can also run
  .. code-block:: bash
-        docker run -it --rm paddledev/paddle:<version> /bin/bash
+     docker pull paddlepaddle/paddle:0.10.0rc2
-    or, we can run it as a daemon container
+  to download a Docker image, paddlepaddle/paddle in this example,
+  from Dockerhub.com.
+- *container*: if we think of a Docker image as a program, a container
+  is a "process" that runs the image. Indeed, a container is exactly an
+  operating system process, but with a virtualized filesystem, network
+  port space, and other virtualized resources. We can type
  .. code-block:: bash
-        docker run -d -p 2202:22 -p 8888:8888 paddledev/paddle:<version>
+     docker run paddlepaddle/paddle:0.10.0rc2
-    and SSH to this container using password :code:`root`:
+  to start a container to run a Docker image, paddlepaddle/paddle in this example.
+- *volume*: By default, a Docker container has an isolated file system
+  namespace, so we cannot see files in the host file system. With a
+  volume, files mounted from the host become visible inside the
+  container. The following command mounts the current directory at
+  /data inside a container started from the debian image, then runs
+  :code:`ls /data` in it.
  .. code-block:: bash
-        ssh -p 2202 root@localhost
+     docker run --rm -v $(pwd):/data debian ls /data
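As a minimal sketch of the volume concept above, the shell function below only prints the :code:`docker run` command for a given host directory instead of executing it, so the mount argument can be inspected without Docker installed. The name :code:`mount_cmd` is a hypothetical helper introduced for illustration; it is not part of Docker or PaddlePaddle.

```shell
# Hypothetical helper (not part of Docker or PaddlePaddle).
# Prints, without running it, a docker command that mounts a host
# directory at /data inside a container started from the debian image.
mount_cmd() {
  _host_dir="$1"
  shift
  echo "docker run --rm -v ${_host_dir}:/data debian $*"
}

# Show the command for the current directory.
mount_cmd "$(pwd)" ls /data
```

Passing the printed line to a shell should behave like the command in the diff, assuming Docker is installed and the directory path contains no spaces.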
-    An advantage of using SSH is that we can connect to PaddlePaddle from
+Usage of CPU-only and GPU Images
-    more than one terminals.  For example, one terminal running vi and
+----------------------------------
-    another one running Python interpreter.  Another advantage is that we
-    can run the PaddlePaddle container on a remote server and SSH to it
-    from a laptop.
+For each version of PaddlePaddle, we release two types of Docker images:
+a development image and a production image. The production image comes
+in a CPU-only version and a CUDA GPU version, each with a no-AVX variant. We
+put the Docker images on `dockerhub.com
+<https://hub.docker.com/r/paddledev/paddle/>`_. You can find the
+latest versions under the "tags" tab at dockerhub.com.
+1. Production image, which comes in multiple variants:
-2. Production images, this image might have multiple variants:
   - GPU/AVX：:code:`paddlepaddle/paddle:<version>-gpu`
   - GPU/no-AVX：:code:`paddlepaddle/paddle:<version>-gpu-noavx`
   - CPU/AVX：:code:`paddlepaddle/paddle:<version>`
   - CPU/no-AVX：:code:`paddlepaddle/paddle:<version>-noavx`
-    Please be aware that the CPU-only and the GPU images both use the AVX
+   Please be aware that the CPU-only and the GPU images both use the
-    instruction set, but old computers produced before 2008 do not support
+   AVX instruction set, but old computers produced before 2008 do not
-    AVX.  The following command checks if your Linux computer supports
+   support AVX.  The following command checks if your Linux computer
-    AVX:
+   supports AVX:
   .. code-block:: bash
      if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi
-       If it doesn't, we will use the non-AVX images.
+   To run the CPU-only image as an interactive container:
-    Above methods work with the GPU image too -- just please don't forget
-    to install GPU driver. To support GPU driver, we recommend to use 
-    [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Run using
   .. code-block:: bash
-        nvidia-docker run -it --rm paddledev/paddle:0.10.0rc1-gpu /bin/bash
+      docker run -it --rm paddlepaddle/paddle:0.10.0rc2 /bin/bash
+   The above method works with the GPU image too; the recommended way is
+   using `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_.
+   Please install nvidia-docker first following this `tutorial
+   <https://github.com/NVIDIA/nvidia-docker#quick-start>`_.
-    Note: If you would have a problem running nvidia-docker, you may try the old method we have used (not recommended).
+   Now you can run a GPU image:
   .. code-block:: bash
-        export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
+      nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
-        export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
-        docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:<version>-gpu
+2. Development image :code:`paddlepaddle/paddle:<version>-dev`
-3. Use production image to release you AI application
+   This image packs the related development tools and runtime
-    Suppose that we have a simple application program in :code:`a.py`, we can test and run it using the production image:
+   environment. Users and developers can use this image instead of
+   their own local computer for development, building,
+   releasing, documentation writing, etc. Because different versions of
+   PaddlePaddle may depend on different versions of libraries and tools,
+   you must pay attention to those versions if you want to set up a
+   local environment.  The development image contains:
-    ```bash
+   - gcc/clang
-    docker run -it -v $PWD:/work paddle /work/a.py
+   - nvcc
-    ```
+   - Python
+   - sphinx
+   - woboq
+   - sshd
-    But this works only if all dependencies of :code:`a.py` are in the production image. If this is not the case, we need to build a new Docker image from the production image and with more dependencies installs.
+   Many developers use servers with GPUs. They can use ssh to log in to
+   the server and run :code:`docker exec` to enter the Docker
+   container and start their work.  They can also start a development
+   Docker image with the SSHD service, so they can log in to the
+   container directly and start work.
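The variant list and the AVX check above can be combined into a small selection step. The sketch below assumes the tag naming shown in the diff (:code:`<version>`, :code:`-gpu`, :code:`-noavx`, :code:`-gpu-noavx`); the function name :code:`paddle_image` is hypothetical and only prints an image name, it does not pull or run anything.

```shell
# Hypothetical helper: maps AVX/GPU availability to the image names
# listed in the document. It only prints a name.
# Usage: paddle_image VERSION AVX GPU   (AVX and GPU are "yes" or "no")
paddle_image() {
  _tag="$1"
  if [ "$3" = "yes" ]; then _tag="${_tag}-gpu"; fi
  if [ "$2" != "yes" ]; then _tag="${_tag}-noavx"; fi
  echo "paddlepaddle/paddle:${_tag}"
}

# Detect AVX with the same /proc/cpuinfo check as above.
# On systems without /proc/cpuinfo this falls back to "no".
if grep -qi avx /proc/cpuinfo 2>/dev/null; then _avx=yes; else _avx=no; fi

# Print the CPU-only image name that matches this machine.
paddle_image 0.10.0rc2 "$_avx" no
```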
-PaddlePaddle Book
+Train Model Using Python API
------------------
+----------------------------
-The Jupyter Notebook is an open-source web application that allows
+Our official docker image provides a runtime for PaddlePaddle
-you to create and share documents that contain live code, equations,
+programs. The typical workflow will be as follows:
-visualizations and explanatory text in a single browser.
-PaddlePaddle Book is an interactive Jupyter Notebook for users and developers.
+Create a directory as workspace:
-We already exposed port 8888 for this book. If you want to
-dig deeper into deep learning, PaddlePaddle Book definitely is your best choice.
-We provide a packaged book image, simply issue the command:
+.. code-block:: bash
+   mkdir ~/workspace
+Edit a PaddlePaddle Python program using your favourite editor:
 .. code-block:: bash
-    docker run -p 8888:8888 paddlepaddle/book
+   emacs ~/workspace/example.py
-Then, you would back and paste the address into the local browser:
+Run the program using docker:
-.. code-block:: text
+.. code-block:: bash
-    http://localhost:8888/
+   docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 python /workspace/example.py
-That's all. Enjoy your journey!
+Or if you are using GPU for training:
-Development Using Docker
+.. code-block:: bash
------------------------
-Developers can work on PaddlePaddle using Docker.  This allows
+   nvidia-docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu python /workspace/example.py
-developers to work on different platforms -- Linux, Mac OS X, and
-Windows -- in a consistent way.
-1. Build the Development Docker Image
+The above commands start a Docker container that runs :code:`python
+/workspace/example.py`. The container stops once :code:`python
+/workspace/example.py` finishes.
-   .. code-block:: bash
+Another way is to tell docker to start a :code:`/bin/bash` session and
+run the PaddlePaddle program interactively:
-      git clone --recursive https://github.com/PaddlePaddle/Paddle
+.. code-block:: bash
-      cd Paddle
-      docker build -t paddle:dev .
-   Note that by default :code:`docker build` wouldn't import source
+   docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 /bin/bash
-   tree into the image and build it.  If we want to do that, we need docker the
+   # now we are inside docker container
-   development docker image and then run the following command:
+   cd /workspace
+   python example.py
-   .. code-block:: bash
+Running with GPU is identical:
+.. code-block:: bash
-      docker run -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "TEST=OFF" paddle:dev
+   nvidia-docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
+   # now we are inside docker container
+   cd /workspace
+   python example.py
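The CPU and GPU commands in this section differ only in the launcher (:code:`docker` or :code:`nvidia-docker`) and the image tag. A minimal sketch of that choice, assuming the :code:`0.10.0rc2` tags used in the diff and a script located in :code:`~/workspace`, is shown below. The function name :code:`train_cmd` is hypothetical, and it prints the command rather than running it.

```shell
# Hypothetical helper: prints the one-shot training command for a
# script stored in ~/workspace. Nothing is executed.
# Usage: train_cmd SCRIPT GPU   (GPU is "yes" or "no")
train_cmd() {
  if [ "$2" = "yes" ]; then
    _runner="nvidia-docker"
    _image="paddlepaddle/paddle:0.10.0rc2-gpu"
  else
    _runner="docker"
    _image="paddlepaddle/paddle:0.10.0rc2"
  fi
  echo "${_runner} run --rm -v ${HOME}/workspace:/workspace ${_image} python /workspace/$1"
}

# Print the CPU-only command for example.py.
train_cmd example.py no
```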
-2. Run the Development Environment
+Develop PaddlePaddle or Train Model Using C++ API
+---------------------------------------------------
-   Once we got the image :code:`paddle:dev`, we can use it to develop
+We will be using PaddlePaddle development image since it contains all
-   Paddle by mounting the local source code tree into a container that
+compiling tools and dependencies.
-   runs the image:
-   .. code-block:: bash
+Let's clone PaddlePaddle repo first:
+.. code-block:: bash
-      docker run -d -p 2202:22 -p 8888:8888 -v $PWD:/paddle paddle:dev sshd
+   git clone https://github.com/PaddlePaddle/Paddle.git && cd Paddle
-   This runs a container of the development environment Docker image
+Mount both workspace folder and paddle code folder into docker
-   with the local source tree mounted to :code:`/paddle` of the
+container, so we can access them inside docker container. There are
-   container.
+two ways of using PaddlePaddle development docker image:
-   The above :code:`docker run` commands actually starts
+- run interactive bash directly
-   an SSHD server listening on port 2202.  This allows us to log into
-   this container with:
  .. code-block:: bash
-      ssh root@localhost -p 2202
+     # use nvidia-docker instead of docker if you need to use GPU
+     docker run -it -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /bin/bash
+     # now we are inside docker container
-   Usually, I run above commands on my Mac.  I can also run them on a
+- or, we can run it as a daemon container
-   GPU server :code:`xxx.yyy.zzz.www` and ssh from my Mac to it:
  .. code-block:: bash
-      my-mac$ ssh root@xxx.yyy.zzz.www -p 2202
+     # use nvidia-docker instead of docker if you need to use GPU
+     docker run -d -p 2202:22 -p 8888:8888 -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /usr/sbin/sshd -D
-3. Build and Install Using the Development Environment
-   Once I am in the container, I can use
+  and SSH to this container using password :code:`root`:
-   :code:`paddle/scripts/docker/build.sh` to build, install, and test
-   Paddle:
  .. code-block:: bash
-      /paddle/paddle/scripts/docker/build.sh
+     ssh -p 2202 root@localhost
-   This builds everything about Paddle in :code:`/paddle/build`.  And
+  An advantage is that we can run the PaddlePaddle container on a
-   we can run unit tests there:
+  remote server and SSH to it from a laptop.
-   .. code-block:: bash
+When developing PaddlePaddle, you can edit PaddlePaddle source code
+from outside of the Docker container using your favorite editor. To
+compile PaddlePaddle, run inside the container:
+.. code-block:: bash
+   WITH_GPU=OFF WITH_AVX=ON WITH_TEST=ON bash /paddle/paddle/scripts/docker/build.sh
+This builds everything about Paddle in :code:`/paddle/build`.  And we
+can run unit tests there:
+.. code-block:: bash
   cd /paddle/build
   ctest
+When training a model using the C++ API, we can edit the PaddlePaddle
+program in ~/workspace outside of Docker, and build it from /workspace
+inside of Docker.
+PaddlePaddle Book
+------------------
+The Jupyter Notebook is an open-source web application that allows
+you to create and share documents that contain live code, equations,
+visualizations and explanatory text in a single browser.
+PaddlePaddle Book is an interactive Jupyter Notebook for users and developers.
+We already exposed port 8888 for this book. If you want to
+dig deeper into deep learning, PaddlePaddle Book definitely is your best choice.
+We provide a packaged book image; simply issue the command:
+.. code-block:: bash
+    docker run -p 8888:8888 paddlepaddle/book
+Then open the following address in your local browser:
+.. code-block:: text
+    http://localhost:8888/
+That's all. Enjoy your journey!
 Documentation
 -------------