提交 db045acb 编写于 作者: 王益

Improve the design doc of Docker build

上级 56fcf9c1
因为我们不提供非Ubuntu的bulid支持,所以如果用户用其他操作系统,比如CoreOS、CentOS、MacOS X、Windows,开发都得在docker里。所以需要能build本地修改后的代码。 We need to complete the initial draft https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/README.md.
我们可能需要两个 Docker images: I am recording some ideas here, and we should file a PR later.
1. development image:不包括源码,但是包括开发环境(预先安装好各种工具),也就是说Dockerfile.dev里既不需要 COPY 也不需要 RUN git clone。虽然这个image和源码无关,但是不同版本的源码需要依赖不同的第三方库,所以这个image的tag里还是要包含git branch/tag name,比如叫做 `paddlepaddle/paddle:dev-0.10.0rc1`,这里的0.10.0.rc1是一个branch name,其中rc是release candidate的意思。正是发布之后就成了master branch里的一个tag,叫做0.10.0。 ## Current Status
1. production image: 不包括编译环境,也不包括源码,只包括build好的libpaddle.so和必要的Python packages,用于在Kubernetes机群上跑应用的image。比如叫做 `paddlepaddle/paddle:0.10.0rc1` Currently, we have four sets of Dockefiles:
从1.生成2.的过程如下: 1. Kubernetes examples:
1. 在本机(host)上开发。假设源码位于 `~/work/paddle` ```
doc/howto/usage/k8s/src/Dockerfile -- based on released image but add start.sh
doc/howto/usage/k8s/src/k8s_data/Dockerfile -- contains only get_data.sh
doc/howto/usage/k8s/src/k8s_train/Dockerfile -- this duplicates with the first one.
```
1. Generate .deb packages:
```
paddle/scripts/deb/build_scripts/Dockerfile -- significantly overlaps with the `docker` directory
```
1. In the `docker` directory:
```
paddle/scripts/docker/Dockerfile
paddle/scripts/docker/Dockerfile.gpu
```
1. Document building
```
paddle/scripts/tools/build_docs/Dockerfile -- a subset of above two sets.
```
## Goal
We want two Docker images for each version of PaddlePaddle:
1. `paddle:<version>-dev`
This a development image contains only the development tools. This standardizes the building tools and procedure. Users include:
- developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer).
- release engineers -- use this to build the official release from certain branch/tag on Github.com.
- document writers / Website developers -- Our documents are in the source repo in the form of .md/.rst files and comments in source code. We need tools to extract the information, typeset, and generate Web pages.
So the development image must contain not only source code building tools, but also documentation tools:
- gcc/clang
- nvcc
- Python
- sphinx
- woboq
- sshd
where `sshd` makes it easy for developers to have multiple terminals connecting into the container.
1. `paddle:<version>`
This is the production image, generated using the development image. This image might have multiple variants:
- GPU/AVX `paddle:<version>-gpu`
- GPU/no-AVX `paddle:<version>-gpu-noavx`
- no-GPU/AVX `paddle:<version>`
- no-GPU/no-AVX `paddle:<version>-noavx`
We'd like to give users choices of GPU and no-GPU, because the GPU version image is much larger than then the no-GPU version.
We'd like to give users choices of AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs.
## Dockerfile
To realize above goals, we need only one Dockerfile for the development image. We can put it in the root source directory.
Let us go over our daily development procedure to show how developers can use this file.
1. Check out the source code
1. 用dev image build 我们的源码:
```bash ```bash
docker run -it -p 2022:22 -v $PWD:/paddle paddlepaddle/paddle:dev-0.10.0rc1 /paddle/build.sh git clone https://github.com/PaddlePaddle/Paddle paddle
``` ```
注意,这里的 `-v ` 参数把host上的源码目录里的内容映射到了container里的`/paddle` 目录;而container里的 `/paddle/build.sh` 就是源码目录里的 `build.sh`。上述命令调用了本地源码中的 bulid.sh 来build了本地源码,结果在container里的 `/paddle/build` 目录里,也就是本地的源码目录里的 `build` 子目录。
1. 我们希望上述 `build.sh` 脚本在 `build` 子目录里生成一个Dockerfile,使得我们可以运行: 1. Do something
```bash ```bash
docker build -t paddle ./build cd paddle
git checkout -b my_work
Edit some files
``` ```
来生成我们的production image。
1. 有了这个production image之后,我们可能会希望docker push 到dockerhub.com的我们自己的名下,然后可以用来启动本地或者远程(Kubernetes)jobs: 1. Build/update the development image (if not yet)
```bash ```bash
docker tag paddle yiwang/paddle:did-some-change docker build -t paddle:dev . # Suppose that the Dockerfile is in the root source directory.
docker push ```
paddlectl run yiwang/paddle:did-some-change /paddle/demo/mnist/train.py
1. Build the source code
```bash
docker run -v $PWD:/paddle -e "GPU=OFF" -e "AVX=ON" -e "TEST=ON" paddle:dev
``` ```
其中 paddlectl 应该是我们自己写的一个脚本,调用kubectl来在Kubernetes机群上启动一个job的。 This command maps the source directory on the host into `/paddle` in the container.
Please be aware that the default entrypoint of `paddle:dev` is a shell script file `build.sh`, which builds the source code, and outputs to `/paddle/build` in the container, which is actually `$PWD/build` on the host.
`build.sh` doesn't only build binaries, but also generates a `$PWD/build/Dockerfile` file, which can be used to build the production image. We will talk about it later.
1. Run on the host (Not recommended)
If the host computer happens to have all dependent libraries and Python runtimes installed, we can now run/test the built program. But the recommended way is to running in a production image.
1. Run in the development container
`build.sh` generates binary files and invokes `make install`. So we can run the built program within the development container. This is convenient for developers.
1. Build a production image
On the host, we can use the `$PWD/build/Dockerfile` to generate a production image.
```bash
docker build -t paddle --build-arg "BOOK=ON" -f build/Dockerfile .
```
1. Run the Paddle Book
Once we have the production image, we can run [Paddle Book](http://book.paddlepaddle.org/) chapters in Jupyter Notebooks (if we chose to build them)
```bash
docker run -it paddle
```
Note that the default entrypoint of the production image starts Jupyter server, if we chose to build Paddle Book.
1. Run on Kubernetes
We can push the production image to a DockerHub server, so developers can run distributed training jobs on the Kuberentes cluster:
```bash
docker tag paddle me/paddle
docker push
kubectl ...
```
曾经的讨论背景: For end users, we will provide more convinient tools to run distributed jobs.
["PR 1599"](https://github.com/PaddlePaddle/Paddle/pull/1599)
["PR 1598"](https://github.com/PaddlePaddle/Paddle/pull/1598)
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册