Commit a470000f · Author: helinwang · Committer: GitHub

Merge pull request #1716 from helinwang/docker_work

refine doc for run paddle on docker
......@@ -8,200 +8,256 @@ Please be aware that you will need to change `Dockers settings
<https://github.com/PaddlePaddle/Paddle/issues/627>`_ to make full use
of your hardware resource on Mac OS X and Windows.
Working With Docker
-------------------
Usage of CPU-only and GPU Images
----------------------------------
Docker is simple as long as we understand a few basic concepts:
For each version of PaddlePaddle, we release two types of Docker images: a development
image and a production image. The production image comes in a CPU-only version and a CUDA
GPU version, each with a no-AVX variant. We put the docker images on
`dockerhub.com <https://hub.docker.com/r/paddledev/paddle/>`_. You can find the
latest versions under the "tags" tab at dockerhub.com.
1. development image :code:`paddlepaddle/paddle:<version>-dev`
This image packages the development tools and the runtime environment. Users and
developers can use this image instead of their own local computer for
development, building, releasing, documentation writing, etc. Because different versions of
paddle may depend on different versions of libraries and tools, you must pay
attention to these versions if you want to set up a local environment.
The development image contains:
- gcc/clang
- nvcc
- Python
- sphinx
- woboq
- sshd
Many developers use servers with GPUs. They can use ssh to log in to the server
and run :code:`docker exec` to enter the docker container and start their work.
They can also start a development docker image with the SSHD service, so that they can log in to
the container and start work.
- *image*: A Docker image is a package of software. It can contain one or more programs and all their dependencies. For example, PaddlePaddle's Docker image includes pre-built PaddlePaddle, Python, and many Python packages. We can run a Docker image directly, rather than installing all this software. We can type
To run the CPU-only image as an interactive container:
.. code-block:: bash
docker images
to list all images in the system. We can also run
.. code-block:: bash
docker run -it --rm paddledev/paddle:<version> /bin/bash
docker pull paddlepaddle/paddle:0.10.0rc2
or, we can run it as a daemon container
to download a Docker image, paddlepaddle/paddle in this example,
from Dockerhub.com.
- *container*: considering a Docker image a program, a container is a
"process" that runs the image. Indeed, a container is exactly an
operating system process, but with a virtualized filesystem, network
port space, and other virtualized environment. We can type
.. code-block:: bash
docker run -d -p 2202:22 -p 8888:8888 paddledev/paddle:<version>
docker run paddlepaddle/paddle:0.10.0rc2
and SSH to this container using password :code:`root`:
to start a container to run a Docker image, paddlepaddle/paddle in this example.
- By default, a docker container has an isolated file system namespace,
so we cannot see the files in the host file system. By using a *volume*,
files mounted from the host become visible inside the docker container.
The following command mounts the current directory into /data inside
the docker container and runs a container from the debian image with
the command :code:`ls /data`.
.. code-block:: bash
ssh -p 2202 root@localhost
docker run --rm -v $(pwd):/data debian ls /data
An advantage of using SSH is that we can connect to PaddlePaddle from
more than one terminal. For example, one terminal can run vi while
another runs the Python interpreter. Another advantage is that we
can run the PaddlePaddle container on a remote server and SSH to it
from a laptop.
Usage of CPU-only and GPU Images
----------------------------------
For each version of PaddlePaddle, we release two types of Docker images:
a development image and a production image. The production image comes in
a CPU-only version and a CUDA GPU version, each with a no-AVX variant. We
put the docker images on `dockerhub.com
<https://hub.docker.com/r/paddledev/paddle/>`_. You can find the
latest versions under the "tags" tab at dockerhub.com.
1. Production images, this image might have multiple variants:
2. Production images, this image might have multiple variants:
- GPU/AVX: :code:`paddlepaddle/paddle:<version>-gpu`
- GPU/no-AVX: :code:`paddlepaddle/paddle:<version>-gpu-noavx`
- CPU/AVX: :code:`paddlepaddle/paddle:<version>`
- CPU/no-AVX: :code:`paddlepaddle/paddle:<version>-noavx`
Please be aware that the CPU-only and the GPU images both use the AVX
instruction set, but old computers produced before 2008 do not support
AVX. The following command checks if your Linux computer supports
AVX:
Please be aware that the CPU-only and the GPU images both use the
AVX instruction set, but old computers produced before 2008 do not
support AVX. The following command checks if your Linux computer
supports AVX:
.. code-block:: bash
if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi
If it doesn't, we will use the non-AVX images.
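
The tag scheme and the AVX check can be combined in a small helper. The following is only a sketch: the function names :code:`has_avx` and :code:`paddle_image_tag` are hypothetical and not part of PaddlePaddle, and the tag suffixes are taken from the variant list above.

```shell
#!/bin/sh
# Sketch only: has_avx and paddle_image_tag are hypothetical helper names.

# Succeeds if the given cpuinfo file (default: /proc/cpuinfo) lists the avx flag.
has_avx() {
    grep -qiw avx "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Usage: paddle_image_tag <version> <gpu: yes|no> <avx: yes|no>
# Prints the production image name following the variant list above.
paddle_image_tag() {
    tag="paddlepaddle/paddle:$1"
    if [ "$2" = "yes" ]; then tag="${tag}-gpu"; fi
    if [ "$3" != "yes" ]; then tag="${tag}-noavx"; fi
    echo "$tag"
}

# Example: pick the CPU-only image that matches this machine.
if has_avx; then avx=yes; else avx=no; fi
paddle_image_tag 0.10.0rc2 no "$avx"
```

The printed name can then be passed to :code:`docker pull` or :code:`docker run`.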
The above methods work with the GPU image too; just don't forget
to install the GPU driver. To support the GPU driver, we recommend using
`nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_. Run using
To run the CPU-only image as an interactive container:
.. code-block:: bash
nvidia-docker run -it --rm paddledev/paddle:0.10.0rc1-gpu /bin/bash
docker run -it --rm paddlepaddle/paddle:0.10.0rc2 /bin/bash
The above method works with the GPU image too; the recommended way is
to use `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_.
Please install nvidia-docker first, following this `tutorial
<https://github.com/NVIDIA/nvidia-docker#quick-start>`_.
Note: If you have a problem running nvidia-docker, you may try the old method we used previously (not recommended).
Now you can run a GPU image:
.. code-block:: bash
export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:<version>-gpu
nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
2. development image :code:`paddlepaddle/paddle:<version>-dev`
3. Use the production image to release your AI application
Suppose that we have a simple application program in :code:`a.py`. We can test and run it using the production image:
This image packages the development tools and the runtime
environment. Users and developers can use this image instead of
their own local computer for development, building,
releasing, documentation writing, etc. Because different versions of paddle
may depend on different versions of libraries and tools, you must
pay attention to these versions if you want to set up a local
environment. The development image contains:
.. code-block:: bash
docker run -it -v $PWD:/work paddle /work/a.py
- gcc/clang
- nvcc
- Python
- sphinx
- woboq
- sshd
But this works only if all dependencies of :code:`a.py` are in the production image. If this is not the case, we need to build a new Docker image from the production image with the additional dependencies installed.
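
One possible way to build such an image is sketched below. The helper name :code:`write_app_dockerfile`, the placeholder package :code:`some-package`, the image name :code:`my-app`, and the :code:`/work` path are illustrative assumptions. The sketch also assumes that :code:`pip` is available in the production image.

```shell
#!/bin/sh
# Sketch only: write_app_dockerfile, some-package and my-app are
# hypothetical names chosen for illustration.

# Usage: write_app_dockerfile <base-image> <pip-package>
# Prints a Dockerfile that extends the base image with one extra dependency.
write_app_dockerfile() {
    cat <<EOF
FROM $1
# Install the extra dependency (assumes pip exists in the base image)
RUN pip install $2
# Copy the application into the image
COPY a.py /work/a.py
CMD ["python", "/work/a.py"]
EOF
}

# Print the generated Dockerfile for inspection.
write_app_dockerfile paddlepaddle/paddle:0.10.0rc2 some-package

# With Docker installed, the image could then be built and run with:
#   write_app_dockerfile paddlepaddle/paddle:0.10.0rc2 some-package > Dockerfile
#   docker build -t my-app .
#   docker run --rm my-app
```

Baking the dependencies into a derived image keeps the :code:`docker run` command short and makes the application reproducible on other machines.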
Many developers use servers with GPUs. They can use ssh to log in to
the server and run :code:`docker exec` to enter the docker
container and start their work. They can also start a development
docker image with the SSHD service, so that they can log in to the container
and start work.
PaddlePaddle Book
------------------
Train Model Using Python API
----------------------------
The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations,
visualizations and explanatory text in a single browser.
Our official docker image provides a runtime for PaddlePaddle
programs. The typical workflow will be as follows:
PaddlePaddle Book is an interactive Jupyter Notebook for users and developers.
We already exposed port 8888 for this book. If you want to
dig deeper into deep learning, PaddlePaddle Book definitely is your best choice.
Create a directory as workspace:
We provide a packaged book image, simply issue the command:
.. code-block:: bash
mkdir ~/workspace
Edit a PaddlePaddle python program using your favourite editor
.. code-block:: bash
docker run -p 8888:8888 paddlepaddle/book
emacs ~/workspace/example.py
Then, paste the following address into your local browser:
Run the program using docker:
.. code-block:: text
.. code-block:: bash
http://localhost:8888/
docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 python /workspace/example.py
That's all. Enjoy your journey!
Or if you are using GPU for training:
Development Using Docker
------------------------
.. code-block:: bash
Developers can work on PaddlePaddle using Docker. This allows
developers to work on different platforms -- Linux, Mac OS X, and
Windows -- in a consistent way.
nvidia-docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu python /workspace/example.py
1. Build the Development Docker Image
The above commands start a docker container that runs :code:`python
/workspace/example.py`. The container stops once :code:`python
/workspace/example.py` finishes.
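
The CPU and GPU commands differ only in the launcher and the image suffix, so they can be generated by one helper. This is a sketch: :code:`paddle_run_cmd` is a hypothetical name, and it only prints the command instead of executing it.

```shell
#!/bin/sh
# Sketch only: paddle_run_cmd is a hypothetical helper. It prints the
# command line so that it can be reviewed before it is executed.

# Usage: paddle_run_cmd <version> <gpu: yes|no> <script-name>
paddle_run_cmd() {
    if [ "$2" = "yes" ]; then
        runner="nvidia-docker"
        image="paddlepaddle/paddle:$1-gpu"
    else
        runner="docker"
        image="paddlepaddle/paddle:$1"
    fi
    echo "$runner run --rm -v $HOME/workspace:/workspace $image python /workspace/$3"
}

# Example: print the CPU-only command for ~/workspace/example.py.
paddle_run_cmd 0.10.0rc2 no example.py
```

To run the printed command directly, it can be wrapped as :code:`eval "$(paddle_run_cmd 0.10.0rc2 no example.py)"`.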
.. code-block:: bash
Another way is to tell docker to start a :code:`/bin/bash` session and
run PaddlePaddle program interactively:
git clone --recursive https://github.com/PaddlePaddle/Paddle
cd Paddle
docker build -t paddle:dev .
.. code-block:: bash
Note that by default :code:`docker build` does not import the source
tree into the image and build it. If we want to do that, we need to build the
development docker image first and then run the following command:
docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 /bin/bash
# now we are inside docker container
cd /workspace
python example.py
.. code-block:: bash
Running with GPU is identical:
.. code-block:: bash
docker run -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "TEST=OFF" paddle:dev
nvidia-docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
# now we are inside docker container
cd /workspace
python example.py
2. Run the Development Environment
Develop PaddlePaddle or Train Model Using C++ API
---------------------------------------------------
Once we have the image :code:`paddle:dev`, we can use it to develop
Paddle by mounting the local source code tree into a container that
runs the image:
We will be using PaddlePaddle development image since it contains all
compiling tools and dependencies.
.. code-block:: bash
Let's clone PaddlePaddle repo first:
.. code-block:: bash
docker run -d -p 2202:22 -p 8888:8888 -v $PWD:/paddle paddle:dev sshd
git clone https://github.com/PaddlePaddle/Paddle.git && cd Paddle
This runs a container of the development environment Docker image
with the local source tree mounted to :code:`/paddle` of the
container.
Mount both workspace folder and paddle code folder into docker
container, so we can access them inside docker container. There are
two ways of using PaddlePaddle development docker image:
The above :code:`docker run` command actually starts
an SSHD server listening on port 2202. This allows us to log into
this container with:
- run interactive bash directly
.. code-block:: bash
ssh root@localhost -p 2202
# use nvidia-docker instead of docker if you need to use GPU
docker run -it -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /bin/bash
# now we are inside docker container
Usually, I run above commands on my Mac. I can also run them on a
GPU server :code:`xxx.yyy.zzz.www` and ssh from my Mac to it:
- or, we can run it as a daemon container
.. code-block:: bash
my-mac$ ssh root@xxx.yyy.zzz.www -p 2202
3. Build and Install Using the Development Environment
# use nvidia-docker instead of docker if you need to use GPU
docker run -d -p 2202:22 -p 8888:8888 -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /usr/sbin/sshd -D
Once I am in the container, I can use
:code:`paddle/scripts/docker/build.sh` to build, install, and test
Paddle:
and SSH to this container using password :code:`root`:
.. code-block:: bash
/paddle/paddle/scripts/docker/build.sh
ssh -p 2202 root@localhost
This builds everything about Paddle in :code:`/paddle/build`. And
we can run unit tests there:
An advantage is that we can run the PaddlePaddle container on a
remote server and SSH to it from a laptop.
.. code-block:: bash
When developing PaddlePaddle, you can edit the PaddlePaddle source code
from outside of the docker container using your favorite editor. To
compile PaddlePaddle, run the following inside the container:
.. code-block:: bash
WITH_GPU=OFF WITH_AVX=ON WITH_TEST=ON bash /paddle/paddle/scripts/docker/build.sh
This builds everything about Paddle in :code:`/paddle/build`. And we
can run unit tests there:
.. code-block:: bash
cd /paddle/build
ctest
When training a model using the C++ API, we can edit the paddle program in
~/workspace outside of docker and build it from /workspace inside of
docker.
PaddlePaddle Book
------------------
The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations,
visualizations and explanatory text in a single browser.
PaddlePaddle Book is an interactive Jupyter Notebook for users and developers.
We already exposed port 8888 for this book. If you want to
dig deeper into deep learning, PaddlePaddle Book definitely is your best choice.
We provide a packaged book image, simply issue the command:
.. code-block:: bash
docker run -p 8888:8888 paddlepaddle/book
Then, paste the following address into your local browser:
.. code-block:: text
http://localhost:8888/
That's all. Enjoy your journey!
Documentation
-------------
......