Opened Mar 16, 2017 by saxon_zh (@saxon_zh)

Build process re-designed

Created by: wangkuiyi

We need to complete the initial draft https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/README.md.

I am recording some ideas here, and we should file a PR later.

Current Status

Currently, we have four sets of Dockerfiles:

  1. Kubernetes examples:

    doc/howto/usage/k8s/src/Dockerfile -- based on the released image, but adds start.sh
    doc/howto/usage/k8s/src/k8s_data/Dockerfile -- contains only get_data.sh
    doc/howto/usage/k8s/src/k8s_train/Dockerfile -- duplicates the first one.
  2. Generate .deb packages:

    paddle/scripts/deb/build_scripts/Dockerfile -- significantly overlaps with the `docker` directory
  3. In the docker directory:

    paddle/scripts/docker/Dockerfile
    paddle/scripts/docker/Dockerfile.gpu
  4. Document building

    paddle/scripts/tools/build_docs/Dockerfile -- a subset of the above two sets.
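
To check this inventory against a checkout, something like the following could enumerate every Dockerfile in the tree. This is only a minimal sketch: the function name `list_dockerfiles` is made up for illustration, and it assumes the files are named `Dockerfile` or `Dockerfile.<suffix>`, as in the list above.

```shell
#!/bin/sh
# Hypothetical helper: print every Dockerfile under a source tree, sorted.
# Assumes files are named "Dockerfile" or "Dockerfile.<suffix>".
list_dockerfiles() {
    root="${1:-.}"
    find "$root" -type f \( -name 'Dockerfile' -o -name 'Dockerfile.*' \) | sort
}

# Usage, from the repository root:
#   list_dockerfiles .
```

Comparing the output before and after a cleanup would show whether the number of Dockerfiles has actually gone down.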

Goal

We want two Docker images for each version of PaddlePaddle:

  1. paddle:<version>-dev

    This is a development image that contains only the development tools. It standardizes the build tools and procedure. Users include:

    • developers -- no longer need to install development tools on the host, and can build their current work on the host (development computer).
    • release engineers -- use this to build the official release from a certain branch/tag on GitHub.com.
    • document writers / Website developers -- Our documents are in the source repo in the form of .md/.rst files and comments in source code. We need tools to extract the information, typeset, and generate Web pages.

    So the development image must contain not only source code building tools, but also documentation tools:

    • gcc/clang
    • nvcc
    • Python
    • sphinx
    • woboq
    • sshd

    where sshd makes it easy for developers to have multiple terminals connecting into the container.

  2. paddle:<version>

    This is the production image, generated using the development image. This image might have multiple variants:

    • GPU/AVX paddle:<version>-gpu
    • GPU/no-AVX paddle:<version>-gpu-noavx
    • no-GPU/AVX paddle:<version>
    • no-GPU/no-AVX paddle:<version>-noavx

    We'd like to give users the choice of GPU and no-GPU, because the GPU version of the image is much larger than the no-GPU version.

    We'd like to give users choices of AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs.
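
The naming scheme for the four variants can be written down as a small function. This is a sketch for illustration only: the function name `paddle_image_tag`, its argument order, and the `ON`/`OFF` values are assumptions, not part of any existing script.

```shell
#!/bin/sh
# Hypothetical helper: derive the production image tag from a version
# and the GPU/AVX switches, following the four variants listed above.
paddle_image_tag() {
    version="$1"
    gpu="${2:-OFF}"   # ON or OFF
    avx="${3:-ON}"    # ON or OFF
    tag="paddle:${version}"
    if [ "$gpu" = "ON" ]; then
        tag="${tag}-gpu"
    fi
    if [ "$avx" != "ON" ]; then
        tag="${tag}-noavx"
    fi
    printf '%s\n' "$tag"
}

# Usage (the version string is a placeholder):
#   paddle_image_tag 1.2.3 ON OFF
```

Keeping the no-GPU/AVX case as the bare `paddle:<version>` tag means the most common configuration gets the shortest name.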

Dockerfile

To realize the above goals, we need only one Dockerfile for the development image. We can put it in the root source directory.

Let us go over our daily development procedure to show how developers can use this file.

  1. Check out the source code

    git clone https://github.com/PaddlePaddle/Paddle paddle
  2. Do something

    cd paddle
    git checkout -b my_work
    # ... edit some files ...
  3. Build/update the development image (if not yet built)

    docker build -t paddle:dev . # Suppose that the Dockerfile is in the root source directory.
  4. Build the source code

    docker run -v $PWD:/paddle -e "GPU=OFF" -e "AVX=ON" -e "TEST=ON" paddle:dev

    This command maps the source directory on the host into /paddle in the container.

    Please be aware that the default entrypoint of paddle:dev is a shell script file build.sh, which builds the source code, and outputs to /paddle/build in the container, which is actually $PWD/build on the host.

    build.sh not only builds binaries, but also generates a $PWD/build/Dockerfile file, which can be used to build the production image. We will talk about it later.

  5. Run on the host (Not recommended)

    If the host computer happens to have all dependent libraries and Python runtimes installed, we can now run/test the built program. But the recommended way is to run it in a production image.

  6. Run in the development container

    build.sh generates binary files and invokes make install. So we can run the built program within the development container. This is convenient for developers.

  7. Build a production image

    On the host, we can use the $PWD/build/Dockerfile to generate a production image.

    docker build -t paddle --build-arg "BOOK=ON" -f build/Dockerfile .
  8. Run the Paddle Book

    Once we have the production image, we can run Paddle Book chapters in Jupyter Notebooks (if we chose to build them)

    docker run -it paddle

    Note that the default entrypoint of the production image starts a Jupyter server, if we chose to build Paddle Book.

  9. Run on Kubernetes

    We can push the production image to Docker Hub, so developers can run distributed training jobs on the Kubernetes cluster:

    docker tag paddle me/paddle
    docker push me/paddle
    kubectl ...

    For end users, we will provide more convenient tools to run distributed jobs.
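
The way build.sh in step 4 might turn the `GPU`, `AVX`, and `TEST` environment variables into a build could be sketched as below. This is a dry-run sketch under stated assumptions, not the actual script: the function names, the CMake option names (`WITH_GPU`, `WITH_AVX`, `WITH_TESTING`), and the default values are assumptions, and the step that generates build/Dockerfile is left out.

```shell
#!/bin/sh
# Hypothetical sketch of the flag handling inside build.sh.
# GPU, AVX and TEST are the environment variables passed via `docker run -e`
# in step 4; the CMake option names and the defaults are assumptions.
cmake_flags() {
    printf '%s\n' "-DWITH_GPU=${GPU:-OFF} -DWITH_AVX=${AVX:-ON} -DWITH_TESTING=${TEST:-OFF}"
}

# Dry run: print the commands a build script might execute, without running them.
print_build_plan() {
    src="${1:-/paddle}"
    echo "mkdir -p ${src}/build"
    echo "cd ${src}/build"
    echo "cmake .. $(cmake_flags)"
    echo "make"
    echo "make install"
}

# Usage:
#   GPU=OFF; AVX=ON; TEST=ON; print_build_plan /paddle
```

Printing the plan rather than executing it keeps the sketch safe to run outside the development container; a real entrypoint would run these commands and then write the production Dockerfile.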

Reference: paddlepaddle/Paddle#1625