PaddlePaddle in Docker Containers
=================================

A Docker container is currently the only officially supported way to
run PaddlePaddle.  This is reasonable because Docker now runs on all
major operating systems, including Linux, Mac OS X, and Windows.
Please be aware that you will need to change `Docker settings
<https://github.com/PaddlePaddle/Paddle/issues/627>`_ to make full use
of your hardware resources on Mac OS X and Windows.

Working With Docker
-------------------

Docker is simple as long as we understand a few basic concepts:

- *image*: A Docker image is a package of software. It can contain one
  or more programs and all their dependencies. For example, the
  PaddlePaddle Docker image includes a pre-built PaddlePaddle, Python,
  and many Python packages. We can run a Docker image directly instead
  of installing all this software ourselves. We can type

  .. code-block:: bash

     docker images

  to list all images in the system. We can also run

  .. code-block:: bash

     docker pull paddlepaddle/paddle:0.10.0rc2

  to download a Docker image, paddlepaddle/paddle in this example,
  from Docker Hub.

- *container*: If we think of a Docker image as a program, a container
  is a "process" that runs the image. Indeed, a container is exactly an
  operating system process, but with a virtualized filesystem, network
  port space, and other virtualized resources. We can type

  .. code-block:: bash

     docker run paddlepaddle/paddle:0.10.0rc2

  to start a container that runs a Docker image, paddlepaddle/paddle in
  this example.

- *volume*: By default, a Docker container has an isolated file system
  namespace, so we cannot see the files of the host file system from
  inside it. A *volume* makes mounted host files visible inside the
  container. The following command mounts the current directory at
  /data inside a container started from the debian image and runs
  the command :code:`ls /data` in it.

  .. code-block:: bash

     docker run --rm -v $(pwd):/data debian ls /data
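
The volume mechanism described above can be tried out with a small
experiment. The sketch below assumes a throwaway directory created by
:code:`mktemp` (chosen here for illustration, not part of
PaddlePaddle) and skips the container step when the :code:`docker`
command is not installed:

.. code-block:: bash

   # Create a scratch directory on the host and put one file into it.
   demo_dir="$(mktemp -d)"
   echo "hello from the host" > "$demo_dir/greeting.txt"

   if command -v docker >/dev/null 2>&1; then
       # Mount the scratch directory at /data; the container reads the host file.
       docker run --rm -v "$demo_dir":/data debian cat /data/greeting.txt
   else
       echo "docker not found; skipping the container step" >&2
   fi

If Docker is available, the container should print the line written on
the host, which shows that the mounted directory is shared with the
container rather than copied into it.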

Usage of CPU-only and GPU Images
----------------------------------

For each version of PaddlePaddle, we release two types of Docker images:
a development image and a production image. The production image comes
in a CPU-only version and a CUDA GPU version, each with a no-AVX
variant. We publish the Docker images on `Docker Hub
<https://hub.docker.com/r/paddledev/paddle/>`_. You can find the
latest versions under the "Tags" tab there.

1. Production image. This image comes in multiple variants:

   - GPU/AVX: :code:`paddlepaddle/paddle:<version>-gpu`
   - GPU/no-AVX: :code:`paddlepaddle/paddle:<version>-gpu-noavx`
   - CPU/AVX: :code:`paddlepaddle/paddle:<version>`
   - CPU/no-AVX: :code:`paddlepaddle/paddle:<version>-noavx`

   Please be aware that the default CPU-only and GPU images both use
   the AVX instruction set, which older CPUs do not support (the first
   AVX-capable processors shipped in 2011).  The following command
   checks whether your Linux computer supports AVX:

   .. code-block:: bash

      if grep -qi avx /proc/cpuinfo; then echo Yes; else echo No; fi

   To run the CPU-only image as an interactive container:

   .. code-block:: bash

      docker run -it --rm paddlepaddle/paddle:0.10.0rc2 /bin/bash

   The same method works with the GPU image, but the recommended way
   to run it is with `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_.

   Please install nvidia-docker first by following this `tutorial
   <https://github.com/NVIDIA/nvidia-docker#quick-start>`_.

   Now you can run a GPU image:

   .. code-block:: bash

      nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash

2. Development image: :code:`paddlepaddle/paddle:<version>-dev`

   This image packs the development tools and the runtime
   environment. Users and developers can use it instead of their own
   local computers for development, building, releasing, documentation
   writing, and so on. Different versions of PaddlePaddle may depend on
   different versions of libraries and tools, so if you want to set up
   a local environment instead, you must pay attention to the versions.
   The development image contains:

   - gcc/clang
   - nvcc
   - Python
   - sphinx
   - woboq
   - sshd

   Many developers use servers with GPUs. They can log in to the server
   over SSH and run :code:`docker exec` to enter the Docker container
   and start working.  Alternatively, they can start a development
   container that runs the SSHD service, so that they can log in to
   the container directly and start working.
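
The naming scheme of the production images listed above can be
captured in a small helper. This is only a sketch: the function names
:code:`paddle_image_name` and :code:`has_avx` are made up for this
example and are not part of PaddlePaddle or Docker.

.. code-block:: bash

   # Compose a production image name.
   # usage: paddle_image_name <version> <gpu: yes|no> <avx: yes|no>
   paddle_image_name() {
       local version="$1" gpu="$2" avx="$3"
       local tag="$version"
       if [ "$gpu" = "yes" ]; then tag="${tag}-gpu"; fi
       if [ "$avx" = "no" ]; then tag="${tag}-noavx"; fi
       echo "paddlepaddle/paddle:${tag}"
   }

   # Print "yes" if the given cpuinfo file (default: /proc/cpuinfo)
   # mentions AVX, and "no" otherwise (including when the file is missing).
   has_avx() {
       local cpuinfo="${1:-/proc/cpuinfo}"
       if grep -qi avx "$cpuinfo" 2>/dev/null; then echo yes; else echo no; fi
   }

   # Pick the CPU-only image that matches this machine.
   paddle_image_name 0.10.0rc2 no "$(has_avx)"

The printed name can then be passed to :code:`docker pull` or
:code:`docker run`.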

Train Model Using Python API
----------------------------

Our official Docker image provides a runtime for PaddlePaddle
programs. The typical workflow is as follows:

Create a directory to serve as the workspace:

.. code-block:: bash

   mkdir ~/workspace

Edit a PaddlePaddle Python program using your favorite editor:

.. code-block:: bash

   emacs ~/workspace/example.py

Run the program using Docker:

.. code-block:: bash

   docker run -it --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 python /workspace/example.py

Or, if you are using a GPU for training:

.. code-block:: bash

   nvidia-docker run -it --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu python /workspace/example.py

Each of the above commands starts a Docker container that runs
:code:`python /workspace/example.py`. The container stops once the
program finishes.

Another way is to tell Docker to start a :code:`/bin/bash` session and
run the PaddlePaddle program interactively:

.. code-block:: bash

   docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 /bin/bash
   # now we are inside docker container
   cd /workspace
   python example.py

Running with a GPU works the same way, with nvidia-docker in place of docker:

.. code-block:: bash

   nvidia-docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
   # now we are inside docker container
   cd /workspace
   python example.py
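
The CPU and GPU invocations above differ only in the launcher and in
the image tag. The sketch below makes that explicit with a
hypothetical helper, :code:`paddle_run_cmd`, which is not part of
PaddlePaddle. It only prints the command line and does not start a
container, so it is safe to try on a machine without Docker:

.. code-block:: bash

   # Print the command that runs a script from ~/workspace in a container.
   # usage: paddle_run_cmd <cpu|gpu> <script-name>
   paddle_run_cmd() {
       local mode="$1" script="$2"
       local runner="docker" image="paddlepaddle/paddle:0.10.0rc2"
       if [ "$mode" = "gpu" ]; then
           # GPU runs use the nvidia-docker launcher and the -gpu image.
           runner="nvidia-docker"
           image="${image}-gpu"
       fi
       echo "$runner run -it --rm -v $HOME/workspace:/workspace $image python /workspace/$script"
   }

   paddle_run_cmd cpu example.py
   paddle_run_cmd gpu example.py

Printing the command first makes it easy to review the mount and the
image tag before actually running it.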

Develop PaddlePaddle or Train Model Using C++ API
---------------------------------------------------

We will use the PaddlePaddle development image, since it contains all
the compilation tools and dependencies.

Let's clone the PaddlePaddle repository first:

.. code-block:: bash

   git clone https://github.com/PaddlePaddle/Paddle.git && cd Paddle

Mount both the workspace folder and the PaddlePaddle source folder
into the Docker container so that we can access them inside it. There
are two ways of using the PaddlePaddle development image:

- Run an interactive bash session directly:

  .. code-block:: bash

     # use nvidia-docker instead of docker if you need to use GPU
     docker run -it -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /bin/bash
     # now we are inside docker container

- Or run it as a daemon container:

  .. code-block:: bash

     # use nvidia-docker instead of docker if you need to use GPU
     docker run -d -p 2202:22 -p 8888:8888 -v ~/workspace:/workspace -v $(pwd):/paddle paddlepaddle/paddle:0.10.0rc2-dev /usr/sbin/sshd -D

  and then SSH to this container using the password :code:`root`:

  .. code-block:: bash

     ssh -p 2202 root@localhost

  An advantage is that we can run the PaddlePaddle container on a
  remote server and SSH to it from a laptop.

When developing PaddlePaddle, you can edit the PaddlePaddle source
code from outside the Docker container using your favorite editor. To
compile PaddlePaddle, run the following inside the container:

.. code-block:: bash

   WITH_GPU=OFF WITH_AVX=ON WITH_TEST=ON bash /paddle/paddle/scripts/docker/build.sh

This builds all of PaddlePaddle in :code:`/paddle/build`, where we can
then run the unit tests:

.. code-block:: bash

   cd /paddle/build
   ctest
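
The build command above sets its switches as environment variables in
front of the command, so they apply to that one invocation only. The
sketch below illustrates this shell behavior with a stand-in command
in place of :code:`build.sh`. How :code:`build.sh` itself interprets
the variables is not shown here.

.. code-block:: bash

   # Stand-in for build.sh: print the switches it receives from the environment.
   report='echo "WITH_GPU=${WITH_GPU:-unset} WITH_AVX=${WITH_AVX:-unset} WITH_TEST=${WITH_TEST:-unset}"'

   # The assignments are visible to this single child process ...
   with_flags="$(WITH_GPU=OFF WITH_AVX=ON WITH_TEST=ON bash -c "$report")"
   echo "$with_flags"

   # ... but not to a later one, unless the variables are exported elsewhere.
   without_flags="$(bash -c "$report")"
   echo "$without_flags"

Because the assignments do not persist, each rebuild has to repeat the
switches it needs.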

When training a model using the C++ API, we can edit the PaddlePaddle
program in ~/workspace outside of Docker and build it from /workspace
inside the container.

PaddlePaddle Book
------------------

The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations,
visualizations and explanatory text in a single browser.

PaddlePaddle Book is an interactive Jupyter Notebook for users and developers.
We have already exposed port 8888 for this book. If you want to
dig deeper into deep learning, PaddlePaddle Book is definitely your best choice.

We provide a packaged book image. Simply issue the command:

.. code-block:: bash

    docker run -p 8888:8888 paddlepaddle/book

Then go back to your local browser and paste in the following address:

.. code-block:: text

    http://localhost:8888/

That's all. Enjoy your journey!

Documentation
-------------

Paddle Docker images include an HTML version of C++ source code
generated using `woboq code browser
<https://github.com/woboq/woboq_codebrowser>`_.  This makes it easy
for users to browse and understand the C++ source code.

As long as we give the Paddle Docker container a name, we can run an
additional Nginx Docker container to serve the volume from the Paddle
container:

.. code-block:: bash

   docker run -d --name paddle-cpu-doc paddle:<version>
   docker run -d --volumes-from paddle-cpu-doc -p 8088:80 nginx


Then we can point our Web browser to the HTML version of the source
code at http://localhost:8088/paddle/