diff --git a/doc/howto/usage/cluster/k8s_aws_en.md b/doc/howto/usage/cluster/k8s_aws_en.md index ce72b0803818d5bf0c18753c421848cf2fc1b668..0dfa8237a3fa2c9c3ee11e873c9fbbed3cd6018f 100644 --- a/doc/howto/usage/cluster/k8s_aws_en.md +++ b/doc/howto/usage/cluster/k8s_aws_en.md @@ -493,7 +493,7 @@ spec: spec: containers: - name: paddle-data - image: paddledev/paddle-tutorial:k8s_data + image: paddlepaddle/paddle-tutorial:k8s_data imagePullPolicy: Always volumeMounts: - mountPath: "/efs" @@ -522,7 +522,7 @@ NAME DESIRED SUCCESSFUL AGE paddle-data 1 1 6m ``` -Data preparation is done by docker image `paddledev/paddle-tutorial:k8s_data`, see [here](src/k8s_data/README.md) for how to build this docker image and source code. +Data preparation is done by docker image `paddlepaddle/paddle-tutorial:k8s_data`, see [here](src/k8s_data/README.md) for how to build this docker image and source code. #### Start Training @@ -545,7 +545,7 @@ spec: claimName: efsvol containers: - name: trainer - image: paddledev/paddle-tutorial:k8s_train + image: paddlepaddle/paddle-tutorial:k8s_train command: ["bin/bash", "-c", "/root/start.sh"] env: - name: JOB_NAME @@ -617,7 +617,7 @@ kubectl --kubeconfig=kubeconfig log -f POD_NAME Run `kubectl --kubeconfig=kubeconfig describe job paddle-cluster-job` to check training job status. It will complete in around 20 minutes. -The details for start `pserver` and `trainer` are hidden inside docker image `paddledev/paddle-tutorial:k8s_train`, see [here](src/k8s_train/README.md) for how to build the docker image and source code. +The details for start `pserver` and `trainer` are hidden inside docker image `paddlepaddle/paddle-tutorial:k8s_train`, see [here](src/k8s_train/README.md) for how to build the docker image and source code. #### Inspect Training Output diff --git a/doc/howto/usage/cluster/k8s_cn.md b/doc/howto/usage/cluster/k8s_cn.md index 9d49d0fa8cf01bbaef15c6e13a1e906fa5a9e367..c1a11f7165a2f9da9dd044641274447e7943a597 100644 --- a/doc/howto/usage/cluster/k8s_cn.md +++ b/doc/howto/usage/cluster/k8s_cn.md @@ -4,18 +4,19 @@ ## 制作Docker镜像 -在一个功能齐全的Kubernetes机群里,通常我们会安装Ceph等分布式文件系统来存储训练数据。这样的话,一个分布式PaddlePaddle训练任务中的每个进程都可以从Ceph读取数据。在这个例子里,我们只演示一个单机作业,所以可以简化对环境的要求,把训练数据直接放在 -PaddlePaddle的Docker image里。为此,我们需要制作一个包含训练数据的PaddlePaddle镜像。 - -Paddle 的 [Quick Start Tutorial](http://www.paddlepaddle.org/docs/develop/documentation/zh/getstarted/index_cn.html) -里介绍了用Paddle源码中的脚本下载训练数据的过程。 -而 `paddledev/paddle:cpu-demo-latest` 镜像里有 PaddlePaddle 源码与demo,( 请注意,默认的 -PaddlePaddle镜像 `paddledev/paddle:cpu-latest` 是不包括源码的, PaddlePaddle的各版本镜像可以参考 [Docker installation guide](http://www.paddlepaddle.org/doc/build/docker_install.html) ),所以我们使用这个镜像来下载训练数据到Docker container中,然后把这个包含了训练数据的container保存为一个新的镜像。 - +在一个功能齐全的Kubernetes机群里,通常我们会安装Ceph等分布式文件系统来存储训练数据。这样的话,一个分布式PaddlePaddle训练任务中 +的每个进程都可以从Ceph读取数据。在这个例子里,我们只演示一个单机作业,所以可以简化对环境的要求,把训练数据直接放在 +PaddlePaddle的Docker Image里。为此,我们需要制作一个包含训练数据的PaddlePaddle镜像。 + +PaddlePaddle的 `paddlepaddle/paddle:cpu-demo-latest` 镜像里有PaddlePaddle的源码与demo, +(请注意,默认的PaddlePaddle生产环境镜像 `paddlepaddle/paddle:latest` 是不包括源码的,PaddlePaddle的各版本镜像可以参考 +[Docker Installation Guide](http://paddlepaddle.org/docs/develop/documentation/zh/getstarted/build_and_install/docker_install_cn.html)), +下面我们使用这个镜像来下载数据到Docker Container中,并把这个包含了训练数据的Container保存为一个新的镜像。 + ### 运行容器 ``` -$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest +$ docker run --name quick_start_data -it paddlepaddle/paddle:cpu-demo-latest ``` ### 下载数据 diff --git a/doc/howto/usage/cluster/k8s_distributed_cn.md b/doc/howto/usage/cluster/k8s_distributed_cn.md index 0fc9e37a990104e942636fc807f67a99f0df9da8..701a9a75d78b53d7dab94529dbd1be382ff0d04e 100644 --- a/doc/howto/usage/cluster/k8s_distributed_cn.md +++ b/doc/howto/usage/cluster/k8s_distributed_cn.md @@ -28,7 +28,7 @@ PaddlePaddle镜像需要提供`paddle pserver`与`paddle train`进程的运行 - 拷贝训练文件到容器内 - 生成`paddle pserver`与`paddle train`进程的启动参数,并且启动训练 -因为官方镜像 `paddledev/paddle:cpu-latest` 内已经包含PaddlePaddle的执行程序但是还没上述功能,所以我们可以在这个基础上,添加启动脚本,制作新镜像来完成以上的工作。参考镜像的[*Dockerfile*](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/usage/cluster/src/k8s_train/Dockerfile)。 +因为官方镜像 `paddlepaddle/paddle:latest` 内已经包含PaddlePaddle的执行程序但是还没上述功能,所以我们可以在这个基础上,添加启动脚本,制作新镜像来完成以上的工作。参考镜像的[*Dockerfile*](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/usage/cluster/src/k8s_train/Dockerfile)。 ```bash $ cd doc/howto/usage/k8s/src/k8s_train @@ -62,7 +62,7 @@ spec: hostNetwork: true containers: - name: paddle-data - image: paddledev/paddle-tutorial:k8s_data + image: paddlepaddle/paddle-tutorial:k8s_data imagePullPolicy: Always volumeMounts: - mountPath: "/mnt" diff --git a/doc/howto/usage/cluster/k8s_en.md b/doc/howto/usage/cluster/k8s_en.md index 5a3ebfd8dcd144785769d43f9bc75135b5f9b0bd..c374f00a495d705ceddf8d3d930768ceeb93282b 100644 --- a/doc/howto/usage/cluster/k8s_en.md +++ b/doc/howto/usage/cluster/k8s_en.md @@ -4,15 +4,24 @@ In this article, we will introduce how to run PaddlePaddle training job on singl ## Build Docker Image -In distributed Kubernetes cluster, we will use Ceph or other shared storage system for storing training data so that all processes in the training job can retrieve data from Ceph. In this example, we will only demo training job on single machine. In order to simplify the requirement of the environment, we will directly put training data into PaddlePaddle's Docker Image, so we need to create a PaddlePaddle Docker image that already includes the training data. +In distributed Kubernetes cluster, we will use Ceph or other distributed +storage system for storing training related data so that all processes in +PaddlePaddle training can retrieve data from Ceph. In this example, we will +only demo training job on single machine. In order to simplify the requirement +of the environment, we will directly put training data into the PaddlePaddle Docker Image, +so we need to create a PaddlePaddle Docker image that includes the training data. + +The production Docker Image `paddlepaddle/paddle:cpu-demo-latest` has the PaddlePaddle +source code and demo. (Caution: Default PaddlePaddle Docker Image `paddlepaddle/paddle:latest` doesn't include +the source code, PaddlePaddle's different versions of Docker Image can be referred here: +[Docker Installation Guide](http://paddlepaddle.org/docs/develop/documentation/zh/getstarted/build_and_install/docker_install_en.html)), +so we run this Docker Image and download the training data, and then commit the whole +Container to be a new Docker Image. -PaddlePaddle's [Quick Start Tutorial](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/index_en.html) introduces how to download and train data by using script from PaddlePaddle's source code. -And `paddledev/paddle:cpu-demo-latest` image has the PaddlePaddle source code and demo. (Caution: Default PaddlePaddle image `paddledev/paddle:cpu-latest` doesn't include the source code, PaddlePaddle's different versions of image can be referred here: [Docker installation guide](http://www.paddlepaddle.org/doc/build/docker_install.html)), so we run this container and download the training data, and then commit the whole container to be a new Docker image. - ### Run Docker Container ``` -$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest +$ docker run --name quick_start_data -it paddlepaddle/paddle:cpu-demo-latest ``` ### Download Training Data diff --git a/doc/howto/usage/cluster/src/Dockerfile b/doc/howto/usage/cluster/src/Dockerfile index 3a73606c61432329b4cc2d2f8daadc5af8735c96..e178bf4da0f32fca9586b5b69a2c7419de5d9cb1 100644 --- a/doc/howto/usage/cluster/src/Dockerfile +++ b/doc/howto/usage/cluster/src/Dockerfile @@ -1,4 +1,4 @@ -FROM paddledev/paddle:cpu-latest +FROM paddlepaddle/paddle:latest MAINTAINER zjsxzong89@gmail.com diff --git a/doc/howto/usage/cluster/src/k8s_train/Dockerfile b/doc/howto/usage/cluster/src/k8s_train/Dockerfile index c0fca1f9a945921e6e8899fee2db8845e66136a1..77f021a89a70d934bf70424eaa3c6dc3f7c93a28 100644 --- a/doc/howto/usage/cluster/src/k8s_train/Dockerfile +++ b/doc/howto/usage/cluster/src/k8s_train/Dockerfile @@ -1,4 +1,4 @@ -FROM paddledev/paddle:cpu-latest +FROM paddlepaddle/paddle:latest COPY start.sh /root/ COPY start_paddle.py /root/