PaddlePaddle in Docker Containers
=================================

Running in a Docker container is currently the only officially supported
way to run PaddlePaddle.  This is reasonable because Docker now runs on
all major operating systems, including Linux, Mac OS X, and Windows.
Please be aware that you will need to change `Docker's settings
<https://github.com/PaddlePaddle/Paddle/issues/627>`_ to make full use
of your hardware resources on Mac OS X and Windows.

Working With Docker
-------------------

Docker is simple as long as we understand a few basic concepts:

- *image*: A Docker image is a package of software. It can contain one
  or more programs and all their dependencies. For example, the
  PaddlePaddle Docker image includes pre-built PaddlePaddle, Python,
  and many Python packages. We can run a Docker image directly rather
  than installing all of this software. We can type

  .. code-block:: bash

     docker images

  to list all images in the system. We can also run

  .. code-block:: bash
		  
     docker pull paddlepaddle/paddle:0.10.0rc2

  to download a Docker image, paddlepaddle/paddle in this example,
  from Docker Hub.

- *container*: If we think of a Docker image as a program, a container
  is a "process" that runs the image. Indeed, a container is exactly an
  operating system process, but with a virtualized filesystem, network
  port space, and other virtualized resources. We can type

  .. code-block:: bash

     docker run paddlepaddle/paddle:0.10.0rc2

  to start a container to run a Docker image, paddlepaddle/paddle in this example.

- *volume*: By default, a Docker container has an isolated filesystem
  namespace, so it cannot see files in the host filesystem. With a
  *volume*, files mounted from the host become visible inside the
  container. The following command mounts the current directory into
  /data inside the container and runs :code:`ls /data` in a container
  started from the debian image:

  .. code-block:: bash

     docker run --rm -v $(pwd):/data debian ls /data

Usage of CPU-only and GPU Images
----------------------------------

We package PaddlePaddle's compile environment into a Docker image
called the development image; it contains all the compiling tools that
PaddlePaddle needs. We also package the compiled PaddlePaddle program
into a Docker image called the production image; it contains the
runtime environment that PaddlePaddle needs. We release both images
for each version of PaddlePaddle. The production image comes in a
CPU-only version and a CUDA GPU version, each with a no-AVX variant.

We put the Docker images on `Docker Hub
<https://hub.docker.com/r/paddledev/paddle/>`_. You can find the
latest versions under the "Tags" tab there. If you are in China, you
can use our Docker image registry mirror to speed up the download. To
use it, replace every paddlepaddle/paddle in the commands with
docker.paddlepaddle.org/paddle.
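
As a minimal sketch of that substitution, the repository name can be
kept in one shell variable so that switching to the mirror is a
single change. The names :code:`PADDLE_REPO`, :code:`PADDLE_TAG`, and
:code:`paddle_image` are illustrative assumptions, not something
PaddlePaddle provides:

.. code-block:: bash

   # Hypothetical helper, not part of PaddlePaddle.
   # Export PADDLE_REPO=docker.paddlepaddle.org/paddle to use the mirror.
   PADDLE_REPO="${PADDLE_REPO:-paddlepaddle/paddle}"
   PADDLE_TAG="${PADDLE_TAG:-0.10.0rc2}"

   # Print the full image reference; $1 is an optional tag suffix
   # such as "-gpu" or "-noavx".
   paddle_image() {
     printf '%s:%s%s\n' "$PADDLE_REPO" "$PADDLE_TAG" "${1:-}"
   }

   # Example use (requires Docker):
   #   docker pull "$(paddle_image)"

With these defaults, :code:`paddle_image -gpu` prints
:code:`paddlepaddle/paddle:0.10.0rc2-gpu`.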

1. Production images. These come in several variants:

   - GPU/AVX: :code:`paddlepaddle/paddle:<version>-gpu`
   - GPU/no-AVX: :code:`paddlepaddle/paddle:<version>-gpu-noavx`
   - CPU/AVX: :code:`paddlepaddle/paddle:<version>`
   - CPU/no-AVX: :code:`paddlepaddle/paddle:<version>-noavx`

   Please be aware that the default CPU-only and GPU images both use
   the AVX instruction set, which older CPUs do not support (AVX first
   shipped in processors released in 2011). The following command
   checks whether your Linux computer supports AVX:

   .. code-block:: bash

      if grep -qi avx /proc/cpuinfo; then echo Yes; else echo No; fi

   To run the CPU-only image as an interactive container:

   .. code-block:: bash

      docker run -it --rm paddlepaddle/paddle:0.10.0rc2 /bin/bash

   The above method works with the GPU image too, but the recommended
   way is to use `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_.

   Please install nvidia-docker first by following this `tutorial
   <https://github.com/NVIDIA/nvidia-docker#quick-start>`_.

   Now you can run a GPU image:

   .. code-block:: bash

      nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash

2. Development image: :code:`paddlepaddle/paddle:<version>-dev`

   This image packs the related development tools and runtime
   environment. Users and developers can use this image instead of
   their own local computer for development, building, releasing,
   documentation writing, and so on. Different versions of PaddlePaddle
   may depend on different versions of libraries and tools, so if you
   want to set up a local environment instead, you must pay attention
   to the versions. The development image contains:
   
   - gcc/clang
   - nvcc
   - Python
   - sphinx
   - woboq
   - sshd
     
   Many developers use servers with GPUs. They can use ssh to log in to
   the server and run :code:`docker exec` to enter the Docker
   container and start their work. Alternatively, they can start a
   development container with the SSHD service running, so they can
   log in to the container directly and start work.
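
The production image variants listed under item 1 can be chosen
automatically from the AVX check. The following is a minimal sketch;
the function name :code:`paddle_tag_suffix` and its arguments are
illustrative assumptions, not part of PaddlePaddle:

.. code-block:: bash

   # Hypothetical helper: print the tag suffix of the matching production image.
   #   $1: "gpu" or "cpu" (default "cpu")
   #   $2: cpuinfo file to inspect (default /proc/cpuinfo)
   paddle_tag_suffix() {
     local device="${1:-cpu}" cpuinfo="${2:-/proc/cpuinfo}" suffix=""
     if [ "$device" = "gpu" ]; then
       suffix="-gpu"
     fi
     # Pick the no-AVX build when the CPU flags do not mention AVX. A missing
     # file (for example on Mac OS X) also lands here, so the result is a
     # conservative guess rather than a definitive answer.
     if ! grep -qi avx "$cpuinfo" 2>/dev/null; then
       suffix="${suffix}-noavx"
     fi
     printf '%s\n' "$suffix"
   }

   # Example use (requires Docker):
   #   docker pull "paddlepaddle/paddle:0.10.0rc2$(paddle_tag_suffix cpu)"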

Train Model Using Python API
----------------------------

Our official Docker image provides a runtime for PaddlePaddle
programs. The typical workflow is as follows:

Create a directory as the workspace:

.. code-block:: bash

   mkdir ~/workspace

Edit a PaddlePaddle Python program using your favourite editor:

.. code-block:: bash

   emacs ~/workspace/example.py

Run the program using Docker:

.. code-block:: bash

   docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 python /workspace/example.py

Or, if you are using a GPU for training:

.. code-block:: bash

   nvidia-docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu python /workspace/example.py

The above commands start a Docker container that runs :code:`python
/workspace/example.py`. The container stops once :code:`python
/workspace/example.py` finishes.

Another way is to tell Docker to start a :code:`/bin/bash` session and
run the PaddlePaddle program interactively:

.. code-block:: bash

   docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 /bin/bash
   # now we are inside docker container
   cd /workspace
   python example.py

Running with a GPU works the same way, using nvidia-docker and the GPU image:

.. code-block:: bash

   nvidia-docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
   # now we are inside docker container
   cd /workspace
   python example.py

Develop PaddlePaddle or Train Model Using C++ API
---------------------------------------------------

We will use the PaddlePaddle development image since it contains all
the compiling tools and dependencies.

1. Build the PaddlePaddle development image

   Use the following commands to build the PaddlePaddle development image:

   .. code-block:: bash

      git clone https://github.com/PaddlePaddle/Paddle.git && cd Paddle
      docker build -t paddle:dev .

2. Build the PaddlePaddle production image

   Building the production image takes two steps. The first step is to run:

   .. code-block:: bash

      docker run -v $(pwd):/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=OFF" -e "WITH_TEST=ON" paddle:dev

   The above command compiles PaddlePaddle and creates a Dockerfile
   for building the production image. All the generated files are in
   the build directory. "WITH_GPU" controls whether the generated
   production image supports GPU. "WITH_AVX" controls whether the
   generated production image supports AVX. "WITH_TEST" controls
   whether the unit tests are built.

   The second step is to run:

   .. code-block:: bash

      docker build -t paddle:prod -f build/Dockerfile .

   The above command generates the production image by copying the
   compiled PaddlePaddle program into the image.

3. Run unit tests

   The following command runs the unit tests:

   .. code-block:: bash

      docker run -it -v $(pwd):/paddle paddle:dev bash -c "cd /paddle/build && ctest"
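
The flags from step 2 can be combined to build other variants. As a
minimal sketch, the hypothetical helper :code:`paddle_build_cmd`
below (an illustrative name, not part of PaddlePaddle) only prints
the compile command for a given combination, so the command can be
reviewed before running it:

.. code-block:: bash

   # Hypothetical helper: print the step-2 compile command.
   #   $1: WITH_GPU, $2: WITH_AVX, $3: WITH_TEST (each ON or OFF)
   paddle_build_cmd() {
     local gpu="${1:-OFF}" avx="${2:-OFF}" tests="${3:-ON}"
     printf 'docker run -v $(pwd):/paddle -e "WITH_GPU=%s" -e "WITH_AVX=%s" -e "WITH_TEST=%s" paddle:dev\n' \
       "$gpu" "$avx" "$tests"
   }

   # Print the command for a GPU build with AVX and without unit tests.
   paddle_build_cmd ON ON OFF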

PaddlePaddle Book
------------------

The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations,
visualizations, and explanatory text in a single document, viewed in
a web browser.

PaddlePaddle Book is an interactive Jupyter Notebook for users and
developers. The book image already exposes port 8888 for it. If you
want to dig deeper into deep learning, PaddlePaddle Book is a good
place to start.

We provide a packaged book image. Simply issue the command:

.. code-block:: bash

    docker run -p 8888:8888 paddlepaddle/book

Then open the following address in your local browser:

.. code-block:: text

    http://localhost:8888/

That's all. Enjoy your journey!

Documentation
-------------

Paddle Docker images include an HTML version of C++ source code
generated using `woboq code browser
<https://github.com/woboq/woboq_codebrowser>`_.  This makes it easy
for users to browse and understand the C++ source code.

As long as we give the Paddle Docker container a name, we can run an
additional Nginx Docker container to serve the volume from the Paddle
container:

.. code-block:: bash

   docker run -d --name paddle-cpu-doc paddle:<version>
   docker run -d --volumes-from paddle-cpu-doc -p 8088:80 nginx


Then we can point our web browser to the HTML version of the source code
at http://localhost:8088/paddle/