# Building PaddlePaddle

## Goals

We want the build procedure to:

1. Be static and easy to reproduce.
1. Generate Python `whl` packages that can be used widely across many distributions.
1. Build different binaries per release to satisfy different environments:
    - Binaries for different CUDA and cuDNN versions, such as CUDA 7.5, 8.0, and 9.0.
    - Binaries containing only the C API (capi).
    - Binaries for Python with or without wide Unicode support.
1. Build Docker images with PaddlePaddle pre-installed, so that we can run
   PaddlePaddle applications directly in Docker or on Kubernetes clusters.

To achieve this, we created the repository https://github.com/PaddlePaddle/buildtools,
which provides several Docker images that are `manylinux1` compliant. We can
then build PaddlePaddle in these images to generate the corresponding `whl`
binaries.

## Run The Build

### Build Environments

The pre-built build environment images are:

| Image | Tag |
| ----- | --- |
| paddlepaddle/paddle_manylinux_devel | cuda7.5_cudnn5 |
| paddlepaddle/paddle_manylinux_devel | cuda8.0_cudnn5 |
| paddlepaddle/paddle_manylinux_devel | cuda7.5_cudnn7 |
| paddlepaddle/paddle_manylinux_devel | cuda9.0_cudnn7 |

### Start Build

Choose a Docker image that suits your environment and run the following
commands to start a build:

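```bash
# Fetch the source tree; it is mounted into the container below.
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
# Run the build script inside the build environment image (CPU only, AVX on, tests off).
docker run --rm -v "$PWD":/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TESTING=OFF" -e "RUN_TEST=OFF" -e "PYTHON_ABI=cp27-cp27mu" paddlepaddle/paddle_manylinux_devel /paddle/paddle/scripts/docker/build.sh
```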

After the build finishes, the output `whl` package is available under
`build/python/dist`.

This command mounts the source directory on the host into `/paddle` in the container, then runs the build script `/paddle/paddle/scripts/docker/build.sh`
in the container. Because of the mount, anything the script writes to `/paddle/build` in the container is actually written to `$PWD/build` on the host.
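
As a minimal sketch of selecting a specific build environment, the helper below assembles the same `docker run` command with an image tag from the Build Environments table appended to the image name. The function name `paddle_build_cmd` and its arguments are made up for this example and are not part of PaddlePaddle; the function only prints the command so that you can inspect it before running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the build command for a given image tag.
# $1: image tag from the Build Environments table, e.g. cuda8.0_cudnn5
# $2: value for WITH_GPU, "ON" or "OFF" (defaults to OFF)
paddle_build_cmd() {
  local tag="$1"
  local with_gpu="${2:-OFF}"
  # \$PWD is escaped so that it is expanded when the command runs, not here.
  printf '%s\n' "docker run --rm -v \"\$PWD\":/paddle -e WITH_GPU=${with_gpu} -e WITH_AVX=ON -e WITH_TESTING=OFF -e RUN_TEST=OFF -e PYTHON_ABI=cp27-cp27mu paddlepaddle/paddle_manylinux_devel:${tag} /paddle/paddle/scripts/docker/build.sh"
}

# Print the command for a GPU build in the CUDA 8.0 / cuDNN 5 environment.
paddle_build_cmd cuda8.0_cudnn5 ON
```

Run the printed command from the root of the `Paddle` checkout, for example by copying it into your shell.
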

### Build Options

You can pass the following options to the build as environment variables, using `-e` flags on `docker run`. Unless noted otherwise, each option takes the value "ON" or "OFF":

| Option | Default | Description |
| ------ | -------- | ----------- |
| `WITH_GPU` | OFF | Generates NVIDIA CUDA GPU code and relies on CUDA libraries. |
| `WITH_AVX` | OFF | Set to "ON" to enable AVX support. |
| `WITH_TESTING` | ON | Build unit test binaries. |
| `WITH_MKL` | ON | Build with [Intel® MKL](https://software.intel.com/en-us/mkl) and [Intel® MKL-DNN](https://github.com/01org/mkl-dnn) support. |
| `WITH_GOLANG` | ON | Build the fault-tolerant parameter server written in Go. |
| `WITH_SWIG_PY` | ON | Build with SWIG Python API support. |
| `WITH_C_API` | OFF | Build capi libraries for inference. |
| `WITH_PYTHON` | ON | Build with Python support. Turn this off if the build is only for capi. |
| `WITH_STYLE_CHECK` | ON | Check the code style when building. |
| `PYTHON_ABI` | "" | Python ABI to build for; can be `cp27-cp27m` or `cp27-cp27mu`. |
| `RUN_TEST` | OFF | Run unit tests immediately after the build. |
| `WITH_DOC` | OFF | Build docs after building the binaries. |
| `WOBOQ` | OFF | Generate the WOBOQ code viewer under `build/woboq_out`. |
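
To illustrate how these options combine, here is a small sketch that turns a list of `NAME=VALUE` pairs into `-e` flags for `docker run`, using a capi-only build as the example (`WITH_C_API=ON` and `WITH_PYTHON=OFF`, as the table suggests). The helper name `env_flags` is made up for this example, and whether other options should also change for such a build is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn NAME=VALUE pairs into `-e NAME=VALUE` flags.
env_flags() {
  local flags=""
  local pair
  for pair in "$@"; do
    flags="${flags} -e ${pair}"
  done
  # Strip the leading space before printing.
  printf '%s\n' "${flags# }"
}

# Options for a build that produces only the capi libraries.
env_flags WITH_C_API=ON WITH_PYTHON=OFF WITH_GPU=OFF
```

The printed flags can be spliced into the `docker run` command from the Start Build section in place of its `-e` arguments.
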


## Docker Images

You can get the latest PaddlePaddle Docker images with
`docker pull paddlepaddle/paddle:<version>`, or build one yourself.

### Official Docker Releases

Official Docker images are available on
[Docker Hub](https://hub.docker.com/r/paddlepaddle/paddle/tags/).
You can choose either `latest` or an image with a release tag such as `0.10.0`.
Currently available tags are:

|   Tag  | Description |
| ------ | --------------------- |
| latest | latest CPU only image |
| latest-gpu | latest binary with GPU support |
| 0.10.0 | release 0.10.0 CPU only binary image |
| 0.10.0-gpu | release 0.10.0 with GPU support |
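
The tag naming in this table follows a simple pattern: the version, optionally followed by `-gpu`. As a small sketch of that pattern, the helper below composes an image reference and prints the matching `docker pull` command. The function name `paddle_image` and its `yes`/`no` argument are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compose an image reference from the tag table.
# $1: version, e.g. "latest" or "0.10.0" (defaults to latest)
# $2: "yes" for the GPU variant, anything else for CPU only
paddle_image() {
  local version="${1:-latest}"
  local gpu="${2:-no}"
  if [ "${gpu}" = "yes" ]; then
    printf 'paddlepaddle/paddle:%s-gpu\n' "${version}"
  else
    printf 'paddlepaddle/paddle:%s\n' "${version}"
  fi
}

# Print the pull command for the 0.10.0 release with GPU support.
echo "docker pull $(paddle_image 0.10.0 yes)"
```
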

### Build Your Own Image

Building PaddlePaddle Docker images is quite simple, since PaddlePaddle can
be installed by just running `pip install`. A sample `Dockerfile` is:

```dockerfile
FROM nvidia/cuda:7.5-cudnn5-runtime-centos6
RUN yum install -y centos-release-SCL
RUN yum install -y python27
# This whl package is generated by previous build steps.
ADD python/dist/paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl /
RUN pip install /paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl && rm -f /*.whl
```

Then build the image by running `docker build -t [REPO]/paddle:[TAG] .` in
the directory containing your own `Dockerfile`.

- NOTE: You can choose a different base image for your environment; all available versions are listed [here](https://hub.docker.com/r/nvidia/cuda/).

### Use Docker Images

Suppose that you have written an application program `train.py` using
PaddlePaddle. You can test and run it using Docker:

```bash
docker run --rm -it -v $PWD:/work paddlepaddle/paddle /work/train.py
```

But this works only if all dependencies of `train.py` are in the production image. If this is not the case, we need to build a new Docker image from the production image with the additional dependencies installed.
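
A minimal sketch of such a derived image follows. It assumes that `pip` is available in the production image, and it uses `requests` purely as a stand-in for whatever extra packages `train.py` needs. The helper name `write_app_dockerfile`, the scratch directory, and the image name `myapp` mentioned below are likewise illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a Dockerfile that extends the production image.
# $1: directory that holds train.py; the Dockerfile is written there.
# $2...: extra Python packages to install (placeholders in this example).
write_app_dockerfile() {
  local dir="$1"
  shift
  {
    echo "FROM paddlepaddle/paddle"
    # Assumes pip exists in the base image.
    echo "RUN pip install $*"
    echo "ADD train.py /work/train.py"
  } > "${dir}/Dockerfile"
}

# Generate the Dockerfile in a scratch directory and show it.
app_dir="$(mktemp -d)"
write_app_dockerfile "${app_dir}" requests
cat "${app_dir}/Dockerfile"
```

With `train.py` placed next to the generated `Dockerfile`, running `docker build -t myapp .` in that directory would produce an image that can be started in the same way as the production image.
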

### Run PaddlePaddle Book In Docker

Our [book repo](https://github.com/paddlepaddle/book) also provides a Docker
image that starts a Jupyter notebook inside the container, so that you can run
the book using Docker:

```bash
docker run -d -p 8888:8888 paddlepaddle/book
```

Please refer to https://github.com/paddlepaddle/book if you want to build this
Docker image yourself.

### Run Distributed Applications

In our [API design doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/api.md#distributed-training), we proposed an API that starts a distributed training job on a cluster. This API needs to build a PaddlePaddle application into a Docker image as described above and call kubectl to run it on the cluster. It might need to generate a Dockerfile like the one above and call `docker build`.

Of course, we can manually build an application image and launch the job using the kubectl tool:

```bash
docker build -f some/Dockerfile -t myapp .
docker tag myapp me/myapp
docker push me/myapp
kubectl ...
```

## Docker Images for Developers

We have a special Docker image for developers:
`paddlepaddle/paddle:<version>-dev`. This image is also generated from
https://github.com/PaddlePaddle/buildtools.

This development image contains only the
development tools and standardizes the build procedure. Its users include:

- developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer).
- release engineers -- use this to build the official release from a certain branch or tag on GitHub.
- document writers / website developers -- our documents live in the source repo as .md/.rst files and as comments in the source code, so we need tools to extract the information, typeset it, and generate web pages.

Of course, developers can install build tools on their development computers. But different versions of PaddlePaddle might require a different set or different versions of build tools. Also, collaborative debugging is easier if all developers use a unified development environment.

The development image contains the following tools:

   - gcc/clang
   - nvcc
   - Python
   - sphinx
   - woboq
   - sshd

Many developers work on a remote computer with a GPU; they can ssh into the computer and then `docker exec` into the development container. Alternatively, running `sshd` in the container allows developers to ssh into the container directly.

### Development Workflow

Here we describe the development workflow, starting from our daily development environment.

Developers work on a computer, which is usually a laptop or desktop:

<img src="doc/paddle-development-environment.png" width=500 />

Or they might rely on a more powerful machine (for example, one with GPUs):

<img src="doc/paddle-development-environment-gpu.png" width=500 />

A principle here is that source code lies on the development computer (host) so that editors like Eclipse can parse the source code to support auto-completion.

### Reading source code with woboq codebrowser

Developers who are interested in the C++ source code can pass `-e "WOBOQ=ON"` to build the C++ source code into browsable HTML pages using [Woboq codebrowser](https://github.com/woboq/woboq_codebrowser).

- The following command builds PaddlePaddle, generates HTML pages from C++ source code, and writes HTML pages into `$HOME/woboq_out` on the host:

```bash
docker run -v $PWD:/paddle -v $HOME/woboq_out:/woboq_out -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TESTING=ON" -e "WOBOQ=ON" paddlepaddle/paddle:latest-dev
```

- You can open the generated HTML files in your Web browser. Or, if you want to run an Nginx container to serve them to a wider audience, you can run:

```bash
docker run -v $HOME/woboq_out:/usr/share/nginx/html -d -p 8080:80 nginx
```