From db045acb219774fe001eab0c45818a596bc3354a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=8E=8B=E7=9B=8A?= Date: Wed, 15 Mar 2017 17:35:49 -0700 Subject: [PATCH] Improve the design doc of Docker build --- paddle/scripts/docker/README.md | 152 +++++++++++++++++++++++++++----- 1 file changed, 130 insertions(+), 22 deletions(-) diff --git a/paddle/scripts/docker/README.md b/paddle/scripts/docker/README.md index dd4a1d30d51..722380a0e1e 100644 --- a/paddle/scripts/docker/README.md +++ b/paddle/scripts/docker/README.md @@ -1,38 +1,146 @@ -因为我们不提供非Ubuntu的bulid支持,所以如果用户用其他操作系统,比如CoreOS、CentOS、MacOS X、Windows,开发都得在docker里。所以需要能build本地修改后的代码。 +We need to complete the initial draft https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/README.md. -我们可能需要两个 Docker images: +I am recording some ideas here, and we should file a PR later. -1. development image:不包括源码,但是包括开发环境(预先安装好各种工具),也就是说Dockerfile.dev里既不需要 COPY 也不需要 RUN git clone。虽然这个image和源码无关,但是不同版本的源码需要依赖不同的第三方库,所以这个image的tag里还是要包含git branch/tag name,比如叫做 `paddlepaddle/paddle:dev-0.10.0rc1`,这里的0.10.0.rc1是一个branch name,其中rc是release candidate的意思。正是发布之后就成了master branch里的一个tag,叫做0.10.0。 +## Current Status -1. production image: 不包括编译环境,也不包括源码,只包括build好的libpaddle.so和必要的Python packages,用于在Kubernetes机群上跑应用的image。比如叫做 `paddlepaddle/paddle:0.10.0rc1`。 +Currently, we have four sets of Dockefiles: -从1.生成2.的过程如下: +1. Kubernetes examples: -1. 在本机(host)上开发。假设源码位于 `~/work/paddle`。 + ``` + doc/howto/usage/k8s/src/Dockerfile -- based on released image but add start.sh + doc/howto/usage/k8s/src/k8s_data/Dockerfile -- contains only get_data.sh + doc/howto/usage/k8s/src/k8s_train/Dockerfile -- this duplicates with the first one. + ``` + +1. Generate .deb packages: + + ``` + paddle/scripts/deb/build_scripts/Dockerfile -- significantly overlaps with the `docker` directory + ``` + +1. In the `docker` directory: + + ``` + paddle/scripts/docker/Dockerfile + paddle/scripts/docker/Dockerfile.gpu + ``` + +1. Document building + + ``` + paddle/scripts/tools/build_docs/Dockerfile -- a subset of above two sets. + ``` + +## Goal + +We want two Docker images for each version of PaddlePaddle: + +1. `paddle:-dev` + + This a development image contains only the development tools. This standardizes the building tools and procedure. Users include: + + - developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer). + - release engineers -- use this to build the official release from certain branch/tag on Github.com. + - document writers / Website developers -- Our documents are in the source repo in the form of .md/.rst files and comments in source code. We need tools to extract the information, typeset, and generate Web pages. + + So the development image must contain not only source code building tools, but also documentation tools: + + - gcc/clang + - nvcc + - Python + - sphinx + - woboq + - sshd + + where `sshd` makes it easy for developers to have multiple terminals connecting into the container. + +1. `paddle:` + + This is the production image, generated using the development image. This image might have multiple variants: + + - GPU/AVX `paddle:-gpu` + - GPU/no-AVX `paddle:-gpu-noavx` + - no-GPU/AVX `paddle:` + - no-GPU/no-AVX `paddle:-noavx` + + We'd like to give users choices of GPU and no-GPU, because the GPU version image is much larger than then the no-GPU version. + + We'd like to give users choices of AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs. + +## Dockerfile + +To realize above goals, we need only one Dockerfile for the development image. We can put it in the root source directory. + +Let us go over our daily development procedure to show how developers can use this file. + +1. Check out the source code -1. 用dev image build 我们的源码: ```bash - docker run -it -p 2022:22 -v $PWD:/paddle paddlepaddle/paddle:dev-0.10.0rc1 /paddle/build.sh - ``` - 注意,这里的 `-v ` 参数把host上的源码目录里的内容映射到了container里的`/paddle` 目录;而container里的 `/paddle/build.sh` 就是源码目录里的 `build.sh`。上述命令调用了本地源码中的 bulid.sh 来build了本地源码,结果在container里的 `/paddle/build` 目录里,也就是本地的源码目录里的 `build` 子目录。 + git clone https://github.com/PaddlePaddle/Paddle paddle + ``` + +1. Do something -1. 我们希望上述 `build.sh` 脚本在 `build` 子目录里生成一个Dockerfile,使得我们可以运行: ```bash - docker build -t paddle ./build + cd paddle + git checkout -b my_work + Edit some files ``` - 来生成我们的production image。 - -1. 有了这个production image之后,我们可能会希望docker push 到dockerhub.com的我们自己的名下,然后可以用来启动本地或者远程(Kubernetes)jobs: + +1. Build/update the development image (if not yet) ```bash - docker tag paddle yiwang/paddle:did-some-change - docker push - paddlectl run yiwang/paddle:did-some-change /paddle/demo/mnist/train.py + docker build -t paddle:dev . # Suppose that the Dockerfile is in the root source directory. + ``` + +1. Build the source code + + ```bash + docker run -v $PWD:/paddle -e "GPU=OFF" -e "AVX=ON" -e "TEST=ON" paddle:dev + ``` + + This command maps the source directory on the host into `/paddle` in the container. + + Please be aware that the default entrypoint of `paddle:dev` is a shell script file `build.sh`, which builds the source code, and outputs to `/paddle/build` in the container, which is actually `$PWD/build` on the host. + + `build.sh` doesn't only build binaries, but also generates a `$PWD/build/Dockerfile` file, which can be used to build the production image. We will talk about it later. + +1. Run on the host (Not recommended) + + If the host computer happens to have all dependent libraries and Python runtimes installed, we can now run/test the built program. But the recommended way is to running in a production image. + +1. Run in the development container + + `build.sh` generates binary files and invokes `make install`. So we can run the built program within the development container. This is convenient for developers. + +1. Build a production image + + On the host, we can use the `$PWD/build/Dockerfile` to generate a production image. + + ```bash + docker build -t paddle --build-arg "BOOK=ON" -f build/Dockerfile . ``` - 其中 paddlectl 应该是我们自己写的一个脚本,调用kubectl来在Kubernetes机群上启动一个job的。 +1. Run the Paddle Book + + Once we have the production image, we can run [Paddle Book](http://book.paddlepaddle.org/) chapters in Jupyter Notebooks (if we chose to build them) + ```bash + docker run -it paddle + ``` + + Note that the default entrypoint of the production image starts Jupyter server, if we chose to build Paddle Book. + +1. Run on Kubernetes + + We can push the production image to a DockerHub server, so developers can run distributed training jobs on the Kuberentes cluster: + + ```bash + docker tag paddle me/paddle + docker push + kubectl ... + ``` -曾经的讨论背景: -["PR 1599"](https://github.com/PaddlePaddle/Paddle/pull/1599) -["PR 1598"](https://github.com/PaddlePaddle/Paddle/pull/1598) + For end users, we will provide more convinient tools to run distributed jobs. -- GitLab