PaddlePaddle in Docker Containers
=================================

Docker containers are currently the only officially supported way to
run PaddlePaddle. This is reasonable because Docker now runs on all
major operating systems, including Linux, Mac OS X, and Windows.
Please be aware that you will need to change `Docker's settings
<https://github.com/PaddlePaddle/Paddle/issues/627>`_ to make full use
of your hardware resources on Mac OS X and Windows.

Working With Docker
-------------------

Docker is simple as long as we understand a few basic concepts:

- *image*: A Docker image is a package of software. It could contain
  one or more programs and all their dependencies. For example,
  PaddlePaddle's Docker image includes a pre-built PaddlePaddle, Python,
  and many Python packages. We can run a Docker image directly instead
  of installing all this software ourselves. We can type

  .. code-block:: bash

     docker images

  to list all images in the system. We can also run

  .. code-block:: bash
		  
     docker pull paddlepaddle/paddle:0.10.0

  to download a Docker image, :code:`paddlepaddle/paddle` in this
  example, from Docker Hub.

- *container*: If we consider a Docker image a program, a container is
  a "process" that runs the image. Indeed, a container is exactly an
  operating system process, but with a virtualized filesystem, network
  port space, and other virtualized resources. We can type

  .. code-block:: bash

     docker run paddlepaddle/paddle:0.10.0

  to start a container that runs a Docker image, :code:`paddlepaddle/paddle` in this example.

- *volume*: By default, a Docker container has an isolated filesystem
  namespace, so we cannot see the files in the host filesystem. By
  mounting a volume, files on the host become visible inside the
  container. The following command mounts the current directory to
  :code:`/data` inside a container started from the :code:`debian`
  image, and runs the command :code:`ls /data` in it:

  .. code-block:: bash

     docker run --rm -v $(pwd):/data debian ls /data

Usage of CPU-only and GPU Images
----------------------------------

We package PaddlePaddle's build environment into a Docker image,
called the development image, which contains all the compilation tools
that PaddlePaddle needs. We also package the compiled PaddlePaddle
program into a Docker image, called the production image, which
contains the runtime environment that PaddlePaddle needs. We release
both images for each version of PaddlePaddle. The production image
comes in a CPU-only version and a CUDA GPU version, each with a
no-AVX variant.

We publish the Docker images on `Docker Hub
<https://hub.docker.com/r/paddlepaddle/paddle/tags/>`_. You can find the
latest versions under the "Tags" tab there.

**NOTE:** If you are in China, you can use our Docker image registry mirror to speed up the download. To use it, replace :code:`paddlepaddle/paddle` in all the commands with :code:`docker.paddlepaddle.org/paddle`.


1. Development image: :code:`paddlepaddle/paddle:<version>-dev`

   This image packs the development tools and the runtime
   environment. Users and developers can use this image instead of
   their own local computers for development, building, releasing,
   documentation writing, etc. Different versions of PaddlePaddle
   may depend on different versions of libraries and tools, so if you
   want to set up a local environment instead, you must pay attention
   to these versions. The development image contains:
   
   - gcc/clang
   - nvcc
   - Python
   - sphinx
   - woboq
   - sshd
     
   Many developers work on servers with GPUs. They can use SSH to log
   in to the server and run :code:`docker exec` to enter the Docker
   container and start working. Alternatively, they can start a
   development container with the SSHD service running, so that they
   can log in to the container directly and start working.

2. Production images, which come in multiple variants:

   - GPU/AVX: :code:`paddlepaddle/paddle:<version>-gpu`
   - GPU/no-AVX: :code:`paddlepaddle/paddle:<version>-gpu-noavx`
   - CPU/AVX: :code:`paddlepaddle/paddle:<version>`
   - CPU/no-AVX: :code:`paddlepaddle/paddle:<version>-noavx`

   Please be aware that the default CPU-only and GPU images (those
   without the :code:`-noavx` suffix) use the AVX instruction set,
   which older computers, such as those produced before 2008, do not
   support. The following command checks whether your Linux computer
   supports AVX:

   .. code-block:: bash

      if grep -qi avx /proc/cpuinfo; then echo Yes; else echo No; fi

   **NOTE:** Versions after 0.10.0 automatically detect whether the
   system supports AVX, so this manual check is not needed for them.

   To run the CPU-only image as an interactive container:

   .. code-block:: bash

      docker run -it --rm paddlepaddle/paddle:0.10.0 /bin/bash

   The above method works with the GPU image too, but the recommended
   way is to use `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_.

   Please install nvidia-docker first by following this `tutorial
   <https://github.com/NVIDIA/nvidia-docker#quick-start>`_.

   Now you can run a GPU image:

   .. code-block:: bash

      nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0-gpu /bin/bash
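
Choosing among the four production tags can also be scripted. The function below is a minimal sketch, not an official PaddlePaddle tool: the name :code:`paddle_image` and its arguments are hypothetical, and it assumes the :code:`<version>[-gpu][-noavx]` tag scheme listed above and that AVX support shows up as the :code:`avx` flag in :code:`/proc/cpuinfo` on Linux.

.. code-block:: bash

   # Hypothetical helper (not part of PaddlePaddle): print the production
   # image name that matches this machine.
   # Usage: paddle_image VERSION [cpu|gpu] [CPUINFO_FILE]
   paddle_image() {
     local version="$1"
     local device="${2:-cpu}"
     local cpuinfo="${3:-/proc/cpuinfo}"
     local tag="$version"

     if [ "$device" = "gpu" ]; then
       tag="${tag}-gpu"
     fi
     # Fall back to a no-AVX image when the "avx" flag is missing
     # or the cpuinfo file cannot be read.
     if ! grep -qw avx "$cpuinfo" 2>/dev/null; then
       tag="${tag}-noavx"
     fi
     echo "paddlepaddle/paddle:${tag}"
   }

For example, :code:`docker run -it --rm "$(paddle_image 0.10.0)" /bin/bash` would start the CPU-only variant suited to the current machine; for a GPU image, pass :code:`gpu` as the second argument and launch it with :code:`nvidia-docker`.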


Train Model Using Python API
----------------------------

Our official Docker images provide a runtime for PaddlePaddle
programs. A typical workflow is as follows:

Create a directory as the workspace:

.. code-block:: bash

   mkdir ~/workspace

Edit a PaddlePaddle Python program using your favourite editor:

.. code-block:: bash

   emacs ~/workspace/example.py

Run the program using Docker:

.. code-block:: bash

   docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0 python /workspace/example.py

Or, if you are training on a GPU:

.. code-block:: bash

   nvidia-docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0-gpu python /workspace/example.py

The above commands start a Docker container that runs
:code:`python /workspace/example.py`. The container stops once the
program finishes, and the :code:`--rm` flag removes it afterwards.
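
If you switch between CPU and GPU runs often, a small wrapper can choose the launcher and image tag for you. This is a hedged sketch: the function :code:`run_paddle` and the variables :code:`PADDLE_VERSION` and :code:`DRY_RUN` are hypothetical, and it assumes the :code:`~/workspace` layout and the 0.10.0 tags used above. When :code:`DRY_RUN` is set, it only prints the command it would run.

.. code-block:: bash

   # Hypothetical wrapper (not part of PaddlePaddle): run a script from
   # ~/workspace inside the matching PaddlePaddle image.
   # Usage: run_paddle cpu|gpu SCRIPT [ARGS...]
   run_paddle() {
     local device="$1" script="$2"
     shift 2
     local image="paddlepaddle/paddle:${PADDLE_VERSION:-0.10.0}"
     local launcher=docker

     if [ "$device" = "gpu" ]; then
       # GPU images are started with nvidia-docker, as recommended above.
       launcher=nvidia-docker
       image="${image}-gpu"
     fi

     # Rebuild the argument list: launcher, docker options, image, command.
     set -- "$launcher" run --rm -v "$HOME/workspace:/workspace" \
       "$image" python "/workspace/$script" "$@"
     if [ -n "${DRY_RUN:-}" ]; then
       echo "$*"    # DRY_RUN: print the command only
     else
       "$@"
     fi
   }

For example, :code:`DRY_RUN=1 run_paddle gpu example.py` prints the same :code:`nvidia-docker` command as above, with :code:`~` expanded to your home directory.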

Another way is to tell Docker to start a :code:`/bin/bash` session and
run the PaddlePaddle program interactively:

.. code-block:: bash

   docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0 /bin/bash
   # now we are inside docker container
   cd /workspace
   python example.py

Running with a GPU is similar:

.. code-block:: bash

   nvidia-docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0-gpu /bin/bash
   # now we are inside docker container
   cd /workspace
   python example.py


Develop PaddlePaddle or Train Model Using C++ API
---------------------------------------------------

We use the PaddlePaddle development image, since it contains all the
compilation tools and dependencies.

1. Build the PaddlePaddle development image

   Use the following commands to build the development image:

   .. code-block:: bash

      git clone https://github.com/PaddlePaddle/Paddle.git && cd Paddle
      docker build -t paddle:dev .

2. Build the PaddlePaddle production image

   Building the production image takes two steps. The first step is to run the following command from the root of the Paddle repository:

   .. code-block:: bash

      docker run -v $(pwd):/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=OFF" -e "WITH_TEST=ON" paddle:dev

   The above command compiles PaddlePaddle and creates a Dockerfile for
   building the production image. All the generated files are in the
   :code:`build` directory. :code:`WITH_GPU` controls whether the
   generated production image supports GPUs, :code:`WITH_AVX` controls
   whether it supports AVX, and :code:`WITH_TEST` controls whether the
   unit tests are generated.

   The second step is to run:

   .. code-block:: bash

      docker build -t paddle:prod -f build/Dockerfile ./build

   The above command builds the production image by copying the compiled PaddlePaddle program into it.

3. Run unit tests

   The following command runs the unit tests:

   .. code-block:: bash
      
      docker run -it -v $(pwd):/paddle paddle:dev bash -c "cd /paddle/build && ctest"
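
The three steps above can be chained in one script. This is a minimal sketch under stated assumptions: it must be run from the root of the Paddle repository cloned in step 1, the names :code:`build_paddle`, :code:`_paddle_run` and :code:`DRY_RUN` are hypothetical, and it only forwards the :code:`WITH_GPU`, :code:`WITH_AVX` and :code:`WITH_TEST` switches described above, defaulting to the values used in step 2. It drops :code:`-it` from the test command so that it also works without a terminal.

.. code-block:: bash

   # Hypothetical driver (not part of PaddlePaddle) for the steps above.
   # Run it from the root of the Paddle repository.
   # Set DRY_RUN=1 to print the commands instead of running them.
   _paddle_run() {
     if [ -n "${DRY_RUN:-}" ]; then
       echo "$*"
     else
       "$@"
     fi
   }

   build_paddle() {
     local with_gpu="${WITH_GPU:-OFF}"
     local with_avx="${WITH_AVX:-OFF}"
     local with_test="${WITH_TEST:-ON}"

     # Step 1: build the development image from the repository's Dockerfile.
     _paddle_run docker build -t paddle:dev . || return 1
     # Step 2: compile inside the development image; output goes to ./build.
     _paddle_run docker run -v "$(pwd):/paddle" -e "WITH_GPU=$with_gpu" \
       -e "WITH_AVX=$with_avx" -e "WITH_TEST=$with_test" paddle:dev || return 1
     # Then package the compiled program into the production image.
     _paddle_run docker build -t paddle:prod -f build/Dockerfile ./build || return 1
     # Step 3: run the unit tests, but only if they were generated.
     if [ "$with_test" = "ON" ]; then
       _paddle_run docker run -v "$(pwd):/paddle" paddle:dev \
         bash -c "cd /paddle/build && ctest"
     fi
   }

For example, :code:`DRY_RUN=1 WITH_GPU=ON build_paddle` prints the four commands for a GPU build without running anything.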

PaddlePaddle Book
------------------

The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations,
visualizations, and explanatory text, all in a web browser.

PaddlePaddle Book is an interactive Jupyter Notebook for users and
developers. The book image already exposes port 8888 for it. If you
want to dig deeper into deep learning, PaddlePaddle Book is definitely
your best choice.

We provide a packaged book image; simply issue the command:

.. code-block:: bash

    docker run -p 8888:8888 paddlepaddle/book

Then, copy and paste the following address into your local browser:

.. code-block:: text

    http://localhost:8888/

That's all. Enjoy your journey!


Documentation
-------------

Paddle Docker images include an HTML version of C++ source code
generated using `woboq code browser
<https://github.com/woboq/woboq_codebrowser>`_.  This makes it easy
for users to browse and understand the C++ source code.

As long as we give the Paddle Docker container a name, we can run an
additional Nginx Docker container to serve the volume from the Paddle
container:

.. code-block:: bash

   docker run -d --name paddle-cpu-doc paddle:<version>
   docker run -d --volumes-from paddle-cpu-doc -p 8088:80 nginx


Then we can point our web browser to the HTML version of the source code
at http://localhost:8088/paddle/.