Commit ae6061dd, authored by Travis CI

Deploy to GitHub Pages: a87f4963

Parent 123269f6
...@@ -493,7 +493,7 @@ spec: ...@@ -493,7 +493,7 @@ spec:
spec: spec:
containers: containers:
- name: paddle-data - name: paddle-data
image: paddledev/paddle-tutorial:k8s_data image: paddlepaddle/paddle-tutorial:k8s_data
imagePullPolicy: Always imagePullPolicy: Always
volumeMounts: volumeMounts:
- mountPath: "/efs" - mountPath: "/efs"
...@@ -522,7 +522,7 @@ NAME DESIRED SUCCESSFUL AGE ...@@ -522,7 +522,7 @@ NAME DESIRED SUCCESSFUL AGE
paddle-data 1 1 6m paddle-data 1 1 6m
``` ```
Data preparation is done by docker image `paddledev/paddle-tutorial:k8s_data`, see [here](src/k8s_data/README.md) for how to build this docker image and source code. Data preparation is done by docker image `paddlepaddle/paddle-tutorial:k8s_data`, see [here](src/k8s_data/README.md) for how to build this docker image and source code.
#### Start Training #### Start Training
...@@ -545,7 +545,7 @@ spec: ...@@ -545,7 +545,7 @@ spec:
claimName: efsvol claimName: efsvol
containers: containers:
- name: trainer - name: trainer
image: paddledev/paddle-tutorial:k8s_train image: paddlepaddle/paddle-tutorial:k8s_train
command: ["bin/bash", "-c", "/root/start.sh"] command: ["bin/bash", "-c", "/root/start.sh"]
env: env:
- name: JOB_NAME - name: JOB_NAME
...@@ -617,7 +617,7 @@ kubectl --kubeconfig=kubeconfig log -f POD_NAME ...@@ -617,7 +617,7 @@ kubectl --kubeconfig=kubeconfig log -f POD_NAME
Run `kubectl --kubeconfig=kubeconfig describe job paddle-cluster-job` to check training job status. It will complete in around 20 minutes. Run `kubectl --kubeconfig=kubeconfig describe job paddle-cluster-job` to check training job status. It will complete in around 20 minutes.
The details for start `pserver` and `trainer` are hidden inside docker image `paddledev/paddle-tutorial:k8s_train`, see [here](src/k8s_train/README.md) for how to build the docker image and source code. The details for start `pserver` and `trainer` are hidden inside docker image `paddlepaddle/paddle-tutorial:k8s_train`, see [here](src/k8s_train/README.md) for how to build the docker image and source code.
#### Inspect Training Output #### Inspect Training Output
......
...@@ -4,15 +4,24 @@ In this article, we will introduce how to run PaddlePaddle training job on singl ...@@ -4,15 +4,24 @@ In this article, we will introduce how to run PaddlePaddle training job on singl
## Build Docker Image ## Build Docker Image
In distributed Kubernetes cluster, we will use Ceph or other shared storage system for storing training data so that all processes in the training job can retrieve data from Ceph. In this example, we will only demo training job on single machine. In order to simplify the requirement of the environment, we will directly put training data into PaddlePaddle's Docker Image, so we need to create a PaddlePaddle Docker image that already includes the training data. In distributed Kubernetes cluster, we will use Ceph or other distributed
storage system for storing training related data so that all processes in
PaddlePaddle training can retrieve data from Ceph. In this example, we will
only demo training job on single machine. In order to simplify the requirement
of the environment, we will directly put training data into the PaddlePaddle Docker Image,
so we need to create a PaddlePaddle Docker image that includes the training data.
The production Docker Image `paddlepaddle/paddle:cpu-demo-latest` has the PaddlePaddle
source code and demo. (Caution: Default PaddlePaddle Docker Image `paddlepaddle/paddle:latest` doesn't include
the source code, PaddlePaddle's different versions of Docker Image can be referred here:
[Docker Installation Guide](http://paddlepaddle.org/docs/develop/documentation/zh/getstarted/build_and_install/docker_install_en.html)),
so we run this Docker Image and download the training data, and then commit the whole
Container to be a new Docker Image.
PaddlePaddle's [Quick Start Tutorial](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/index_en.html) introduces how to download and train data by using script from PaddlePaddle's source code.
And `paddledev/paddle:cpu-demo-latest` image has the PaddlePaddle source code and demo. (Caution: Default PaddlePaddle image `paddledev/paddle:cpu-latest` doesn't include the source code, PaddlePaddle's different versions of image can be referred here: [Docker installation guide](http://www.paddlepaddle.org/doc/build/docker_install.html)), so we run this container and download the training data, and then commit the whole container to be a new Docker image.
### Run Docker Container ### Run Docker Container
``` ```
$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest $ docker run --name quick_start_data -it paddlepaddle/paddle:cpu-demo-latest
``` ```
### Download Training Data ### Download Training Data
......
...@@ -649,7 +649,7 @@ ip-10-0-0-55.us-west-2.compute.internal Ready 6m ...@@ -649,7 +649,7 @@ ip-10-0-0-55.us-west-2.compute.internal Ready 6m
<span class="n">spec</span><span class="p">:</span> <span class="n">spec</span><span class="p">:</span>
<span class="n">containers</span><span class="p">:</span> <span class="n">containers</span><span class="p">:</span>
<span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">paddle</span><span class="o">-</span><span class="n">data</span> <span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">paddle</span><span class="o">-</span><span class="n">data</span>
<span class="n">image</span><span class="p">:</span> <span class="n">paddledev</span><span class="o">/</span><span class="n">paddle</span><span class="o">-</span><span class="n">tutorial</span><span class="p">:</span><span class="n">k8s_data</span> <span class="n">image</span><span class="p">:</span> <span class="n">paddlepaddle</span><span class="o">/</span><span class="n">paddle</span><span class="o">-</span><span class="n">tutorial</span><span class="p">:</span><span class="n">k8s_data</span>
<span class="n">imagePullPolicy</span><span class="p">:</span> <span class="n">Always</span> <span class="n">imagePullPolicy</span><span class="p">:</span> <span class="n">Always</span>
<span class="n">volumeMounts</span><span class="p">:</span> <span class="n">volumeMounts</span><span class="p">:</span>
<span class="o">-</span> <span class="n">mountPath</span><span class="p">:</span> <span class="s2">&quot;/efs&quot;</span> <span class="o">-</span> <span class="n">mountPath</span><span class="p">:</span> <span class="s2">&quot;/efs&quot;</span>
...@@ -676,7 +676,7 @@ NAME DESIRED SUCCESSFUL AGE ...@@ -676,7 +676,7 @@ NAME DESIRED SUCCESSFUL AGE
paddle-data 1 1 6m paddle-data 1 1 6m
</pre></div> </pre></div>
</div> </div>
<p>Data preparation is done by docker image <code class="docutils literal"><span class="pre">paddledev/paddle-tutorial:k8s_data</span></code>, see <a class="reference internal" href="src/k8s_data/README.html"><span class="doc">here</span></a> for how to build this docker image and source code.</p> <p>Data preparation is done by docker image <code class="docutils literal"><span class="pre">paddlepaddle/paddle-tutorial:k8s_data</span></code>, see <a class="reference internal" href="src/k8s_data/README.html"><span class="doc">here</span></a> for how to build this docker image and source code.</p>
</div> </div>
<div class="section" id="start-training"> <div class="section" id="start-training">
<span id="start-training"></span><h4>Start Training<a class="headerlink" href="#start-training" title="Permalink to this headline"></a></h4> <span id="start-training"></span><h4>Start Training<a class="headerlink" href="#start-training" title="Permalink to this headline"></a></h4>
...@@ -698,7 +698,7 @@ paddle-data 1 1 6m ...@@ -698,7 +698,7 @@ paddle-data 1 1 6m
<span class="n">claimName</span><span class="p">:</span> <span class="n">efsvol</span> <span class="n">claimName</span><span class="p">:</span> <span class="n">efsvol</span>
<span class="n">containers</span><span class="p">:</span> <span class="n">containers</span><span class="p">:</span>
<span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">trainer</span> <span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">trainer</span>
<span class="n">image</span><span class="p">:</span> <span class="n">paddledev</span><span class="o">/</span><span class="n">paddle</span><span class="o">-</span><span class="n">tutorial</span><span class="p">:</span><span class="n">k8s_train</span> <span class="n">image</span><span class="p">:</span> <span class="n">paddlepaddle</span><span class="o">/</span><span class="n">paddle</span><span class="o">-</span><span class="n">tutorial</span><span class="p">:</span><span class="n">k8s_train</span>
<span class="n">command</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;bin/bash&quot;</span><span class="p">,</span> <span class="s2">&quot;-c&quot;</span><span class="p">,</span> <span class="s2">&quot;/root/start.sh&quot;</span><span class="p">]</span> <span class="n">command</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;bin/bash&quot;</span><span class="p">,</span> <span class="s2">&quot;-c&quot;</span><span class="p">,</span> <span class="s2">&quot;/root/start.sh&quot;</span><span class="p">]</span>
<span class="n">env</span><span class="p">:</span> <span class="n">env</span><span class="p">:</span>
<span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">JOB_NAME</span> <span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">JOB_NAME</span>
...@@ -761,7 +761,7 @@ paddle-cluster-job-jx4xr 1/1 Running 0 9m ...@@ -761,7 +761,7 @@ paddle-cluster-job-jx4xr 1/1 Running 0 9m
</div> </div>
<p><code class="docutils literal"><span class="pre">POD_NAME</span></code>: name of any pod (e.g., <code class="docutils literal"><span class="pre">paddle-cluster-job-cm469</span></code>).</p> <p><code class="docutils literal"><span class="pre">POD_NAME</span></code>: name of any pod (e.g., <code class="docutils literal"><span class="pre">paddle-cluster-job-cm469</span></code>).</p>
<p>Run <code class="docutils literal"><span class="pre">kubectl</span> <span class="pre">--kubeconfig=kubeconfig</span> <span class="pre">describe</span> <span class="pre">job</span> <span class="pre">paddle-cluster-job</span></code> to check training job status. It will complete in around 20 minutes.</p> <p>Run <code class="docutils literal"><span class="pre">kubectl</span> <span class="pre">--kubeconfig=kubeconfig</span> <span class="pre">describe</span> <span class="pre">job</span> <span class="pre">paddle-cluster-job</span></code> to check training job status. It will complete in around 20 minutes.</p>
<p>The details for start <code class="docutils literal"><span class="pre">pserver</span></code> and <code class="docutils literal"><span class="pre">trainer</span></code> are hidden inside docker image <code class="docutils literal"><span class="pre">paddledev/paddle-tutorial:k8s_train</span></code>, see <a class="reference internal" href="src/k8s_train/README.html"><span class="doc">here</span></a> for how to build the docker image and source code.</p> <p>The details for start <code class="docutils literal"><span class="pre">pserver</span></code> and <code class="docutils literal"><span class="pre">trainer</span></code> are hidden inside docker image <code class="docutils literal"><span class="pre">paddlepaddle/paddle-tutorial:k8s_train</span></code>, see <a class="reference internal" href="src/k8s_train/README.html"><span class="doc">here</span></a> for how to build the docker image and source code.</p>
</div> </div>
<div class="section" id="inspect-training-output"> <div class="section" id="inspect-training-output">
<span id="inspect-training-output"></span><h4>Inspect Training Output<a class="headerlink" href="#inspect-training-output" title="Permalink to this headline"></a></h4> <span id="inspect-training-output"></span><h4>Inspect Training Output<a class="headerlink" href="#inspect-training-output" title="Permalink to this headline"></a></h4>
......
...@@ -220,12 +220,21 @@ ...@@ -220,12 +220,21 @@
<p>In this article, we will introduce how to run PaddlePaddle training job on single CPU machine using Kubernetes. In next article, we will introduce how to run PaddlePaddle training job on distributed cluster.</p> <p>In this article, we will introduce how to run PaddlePaddle training job on single CPU machine using Kubernetes. In next article, we will introduce how to run PaddlePaddle training job on distributed cluster.</p>
<div class="section" id="build-docker-image"> <div class="section" id="build-docker-image">
<span id="build-docker-image"></span><h2>Build Docker Image<a class="headerlink" href="#build-docker-image" title="Permalink to this headline"></a></h2> <span id="build-docker-image"></span><h2>Build Docker Image<a class="headerlink" href="#build-docker-image" title="Permalink to this headline"></a></h2>
<p>In distributed Kubernetes cluster, we will use Ceph or other shared storage system for storing training data so that all processes in the training job can retrieve data from Ceph. In this example, we will only demo training job on single machine. In order to simplify the requirement of the environment, we will directly put training data into PaddlePaddle&#8217;s Docker Image, so we need to create a PaddlePaddle Docker image that already includes the training data.</p> <p>In distributed Kubernetes cluster, we will use Ceph or other distributed
<p>PaddlePaddle&#8217;s <a class="reference external" href="http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/index_en.html">Quick Start Tutorial</a> introduces how to download and train data by using script from PaddlePaddle&#8217;s source code. storage system for storing training related data so that all processes in
And <code class="docutils literal"><span class="pre">paddledev/paddle:cpu-demo-latest</span></code> image has the PaddlePaddle source code and demo. (Caution: Default PaddlePaddle image <code class="docutils literal"><span class="pre">paddledev/paddle:cpu-latest</span></code> doesn&#8217;t include the source code, PaddlePaddle&#8217;s different versions of image can be referred here: <a class="reference external" href="http://www.paddlepaddle.org/doc/build/docker_install.html">Docker installation guide</a>), so we run this container and download the training data, and then commit the whole container to be a new Docker image.</p> PaddlePaddle training can retrieve data from Ceph. In this example, we will
only demo training job on single machine. In order to simplify the requirement
of the environment, we will directly put training data into the PaddlePaddle Docker Image,
so we need to create a PaddlePaddle Docker image that includes the training data.</p>
<p>The production Docker Image <code class="docutils literal"><span class="pre">paddlepaddle/paddle:cpu-demo-latest</span></code> has the PaddlePaddle
source code and demo. (Caution: Default PaddlePaddle Docker Image <code class="docutils literal"><span class="pre">paddlepaddle/paddle:latest</span></code> doesn&#8217;t include
the source code, PaddlePaddle&#8217;s different versions of Docker Image can be referred here:
<a class="reference external" href="http://paddlepaddle.org/docs/develop/documentation/zh/getstarted/build_and_install/docker_install_en.html">Docker Installation Guide</a>),
so we run this Docker Image and download the training data, and then commit the whole
Container to be a new Docker Image.</p>
<div class="section" id="run-docker-container"> <div class="section" id="run-docker-container">
<span id="run-docker-container"></span><h3>Run Docker Container<a class="headerlink" href="#run-docker-container" title="Permalink to this headline"></a></h3> <span id="run-docker-container"></span><h3>Run Docker Container<a class="headerlink" href="#run-docker-container" title="Permalink to this headline"></a></h3>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest <div class="highlight-default"><div class="highlight"><pre><span></span>$ docker run --name quick_start_data -it paddlepaddle/paddle:cpu-demo-latest
</pre></div> </pre></div>
</div> </div>
</div> </div>
......
The source diff for this file is too large to display. You can view the blob instead.
...@@ -493,7 +493,7 @@ spec: ...@@ -493,7 +493,7 @@ spec:
spec: spec:
containers: containers:
- name: paddle-data - name: paddle-data
image: paddledev/paddle-tutorial:k8s_data image: paddlepaddle/paddle-tutorial:k8s_data
imagePullPolicy: Always imagePullPolicy: Always
volumeMounts: volumeMounts:
- mountPath: "/efs" - mountPath: "/efs"
...@@ -522,7 +522,7 @@ NAME DESIRED SUCCESSFUL AGE ...@@ -522,7 +522,7 @@ NAME DESIRED SUCCESSFUL AGE
paddle-data 1 1 6m paddle-data 1 1 6m
``` ```
Data preparation is done by docker image `paddledev/paddle-tutorial:k8s_data`, see [here](src/k8s_data/README.md) for how to build this docker image and source code. Data preparation is done by docker image `paddlepaddle/paddle-tutorial:k8s_data`, see [here](src/k8s_data/README.md) for how to build this docker image and source code.
#### Start Training #### Start Training
...@@ -545,7 +545,7 @@ spec: ...@@ -545,7 +545,7 @@ spec:
claimName: efsvol claimName: efsvol
containers: containers:
- name: trainer - name: trainer
image: paddledev/paddle-tutorial:k8s_train image: paddlepaddle/paddle-tutorial:k8s_train
command: ["bin/bash", "-c", "/root/start.sh"] command: ["bin/bash", "-c", "/root/start.sh"]
env: env:
- name: JOB_NAME - name: JOB_NAME
...@@ -617,7 +617,7 @@ kubectl --kubeconfig=kubeconfig log -f POD_NAME ...@@ -617,7 +617,7 @@ kubectl --kubeconfig=kubeconfig log -f POD_NAME
Run `kubectl --kubeconfig=kubeconfig describe job paddle-cluster-job` to check training job status. It will complete in around 20 minutes. Run `kubectl --kubeconfig=kubeconfig describe job paddle-cluster-job` to check training job status. It will complete in around 20 minutes.
The details for start `pserver` and `trainer` are hidden inside docker image `paddledev/paddle-tutorial:k8s_train`, see [here](src/k8s_train/README.md) for how to build the docker image and source code. The details for start `pserver` and `trainer` are hidden inside docker image `paddlepaddle/paddle-tutorial:k8s_train`, see [here](src/k8s_train/README.md) for how to build the docker image and source code.
#### Inspect Training Output #### Inspect Training Output
......
...@@ -4,18 +4,19 @@ ...@@ -4,18 +4,19 @@
## 制作Docker镜像 ## 制作Docker镜像
在一个功能齐全的Kubernetes机群里,通常我们会安装Ceph等分布式文件系统来存储训练数据。这样的话,一个分布式PaddlePaddle训练任务中的每个进程都可以从Ceph读取数据。在这个例子里,我们只演示一个单机作业,所以可以简化对环境的要求,把训练数据直接放在 在一个功能齐全的Kubernetes机群里,通常我们会安装Ceph等分布式文件系统来存储训练数据。这样的话,一个分布式PaddlePaddle训练任务中
PaddlePaddle的Docker image里。为此,我们需要制作一个包含训练数据的PaddlePaddle镜像。 的每个进程都可以从Ceph读取数据。在这个例子里,我们只演示一个单机作业,所以可以简化对环境的要求,把训练数据直接放在
PaddlePaddle的Docker Image里。为此,我们需要制作一个包含训练数据的PaddlePaddle镜像。
Paddle 的 [Quick Start Tutorial](http://www.paddlepaddle.org/docs/develop/documentation/zh/getstarted/index_cn.html)
里介绍了用Paddle源码中的脚本下载训练数据的过程。 PaddlePaddle的 `paddlepaddle/paddle:cpu-demo-latest` 镜像里有PaddlePaddle的源码与demo,
而 `paddledev/paddle:cpu-demo-latest` 镜像里有 PaddlePaddle 源码与demo,( 请注意,默认的 (请注意,默认的PaddlePaddle生产环境镜像 `paddlepaddle/paddle:latest` 是不包括源码的,PaddlePaddle的各版本镜像可以参考
PaddlePaddle镜像 `paddledev/paddle:cpu-latest` 是不包括源码的, PaddlePaddle的各版本镜像可以参考 [Docker installation guide](http://www.paddlepaddle.org/doc/build/docker_install.html) ),所以我们使用这个镜像来下载训练数据到Docker container中,然后把这个包含了训练数据的container保存为一个新的镜像。 [Docker Installation Guide](http://paddlepaddle.org/docs/develop/documentation/zh/getstarted/build_and_install/docker_install_cn.html)),
下面我们使用这个镜像来下载数据到Docker Container中,并把这个包含了训练数据的Container保存为一个新的镜像。
### 运行容器 ### 运行容器
``` ```
$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest $ docker run --name quick_start_data -it paddlepaddle/paddle:cpu-demo-latest
``` ```
### 下载数据 ### 下载数据
......
...@@ -28,7 +28,7 @@ PaddlePaddle镜像需要提供`paddle pserver`与`paddle train`进程的运行 ...@@ -28,7 +28,7 @@ PaddlePaddle镜像需要提供`paddle pserver`与`paddle train`进程的运行
- 拷贝训练文件到容器内 - 拷贝训练文件到容器内
- 生成`paddle pserver`与`paddle train`进程的启动参数,并且启动训练 - 生成`paddle pserver`与`paddle train`进程的启动参数,并且启动训练
因为官方镜像 `paddledev/paddle:cpu-latest` 内已经包含PaddlePaddle的执行程序但是还没上述功能,所以我们可以在这个基础上,添加启动脚本,制作新镜像来完成以上的工作。参考镜像的[*Dockerfile*](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/usage/cluster/src/k8s_train/Dockerfile)。 因为官方镜像 `paddlepaddle/paddle:latest` 内已经包含PaddlePaddle的执行程序但是还没上述功能,所以我们可以在这个基础上,添加启动脚本,制作新镜像来完成以上的工作。参考镜像的[*Dockerfile*](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/usage/cluster/src/k8s_train/Dockerfile)。
```bash ```bash
$ cd doc/howto/usage/k8s/src/k8s_train $ cd doc/howto/usage/k8s/src/k8s_train
...@@ -62,7 +62,7 @@ spec: ...@@ -62,7 +62,7 @@ spec:
hostNetwork: true hostNetwork: true
containers: containers:
- name: paddle-data - name: paddle-data
image: paddledev/paddle-tutorial:k8s_data image: paddlepaddle/paddle-tutorial:k8s_data
imagePullPolicy: Always imagePullPolicy: Always
volumeMounts: volumeMounts:
- mountPath: "/mnt" - mountPath: "/mnt"
......
...@@ -662,7 +662,7 @@ ip-10-0-0-55.us-west-2.compute.internal Ready 6m ...@@ -662,7 +662,7 @@ ip-10-0-0-55.us-west-2.compute.internal Ready 6m
<span class="n">spec</span><span class="p">:</span> <span class="n">spec</span><span class="p">:</span>
<span class="n">containers</span><span class="p">:</span> <span class="n">containers</span><span class="p">:</span>
<span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">paddle</span><span class="o">-</span><span class="n">data</span> <span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">paddle</span><span class="o">-</span><span class="n">data</span>
<span class="n">image</span><span class="p">:</span> <span class="n">paddledev</span><span class="o">/</span><span class="n">paddle</span><span class="o">-</span><span class="n">tutorial</span><span class="p">:</span><span class="n">k8s_data</span> <span class="n">image</span><span class="p">:</span> <span class="n">paddlepaddle</span><span class="o">/</span><span class="n">paddle</span><span class="o">-</span><span class="n">tutorial</span><span class="p">:</span><span class="n">k8s_data</span>
<span class="n">imagePullPolicy</span><span class="p">:</span> <span class="n">Always</span> <span class="n">imagePullPolicy</span><span class="p">:</span> <span class="n">Always</span>
<span class="n">volumeMounts</span><span class="p">:</span> <span class="n">volumeMounts</span><span class="p">:</span>
<span class="o">-</span> <span class="n">mountPath</span><span class="p">:</span> <span class="s2">&quot;/efs&quot;</span> <span class="o">-</span> <span class="n">mountPath</span><span class="p">:</span> <span class="s2">&quot;/efs&quot;</span>
...@@ -689,7 +689,7 @@ NAME DESIRED SUCCESSFUL AGE ...@@ -689,7 +689,7 @@ NAME DESIRED SUCCESSFUL AGE
paddle-data 1 1 6m paddle-data 1 1 6m
</pre></div> </pre></div>
</div> </div>
<p>Data preparation is done by docker image <code class="docutils literal"><span class="pre">paddledev/paddle-tutorial:k8s_data</span></code>, see <a class="reference internal" href="src/k8s_data/README.html"><span class="doc">here</span></a> for how to build this docker image and source code.</p> <p>Data preparation is done by docker image <code class="docutils literal"><span class="pre">paddlepaddle/paddle-tutorial:k8s_data</span></code>, see <a class="reference internal" href="src/k8s_data/README.html"><span class="doc">here</span></a> for how to build this docker image and source code.</p>
</div> </div>
<div class="section" id="start-training"> <div class="section" id="start-training">
<span id="start-training"></span><h4>Start Training<a class="headerlink" href="#start-training" title="永久链接至标题"></a></h4> <span id="start-training"></span><h4>Start Training<a class="headerlink" href="#start-training" title="永久链接至标题"></a></h4>
...@@ -711,7 +711,7 @@ paddle-data 1 1 6m ...@@ -711,7 +711,7 @@ paddle-data 1 1 6m
<span class="n">claimName</span><span class="p">:</span> <span class="n">efsvol</span> <span class="n">claimName</span><span class="p">:</span> <span class="n">efsvol</span>
<span class="n">containers</span><span class="p">:</span> <span class="n">containers</span><span class="p">:</span>
<span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">trainer</span> <span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">trainer</span>
<span class="n">image</span><span class="p">:</span> <span class="n">paddledev</span><span class="o">/</span><span class="n">paddle</span><span class="o">-</span><span class="n">tutorial</span><span class="p">:</span><span class="n">k8s_train</span> <span class="n">image</span><span class="p">:</span> <span class="n">paddlepaddle</span><span class="o">/</span><span class="n">paddle</span><span class="o">-</span><span class="n">tutorial</span><span class="p">:</span><span class="n">k8s_train</span>
<span class="n">command</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;bin/bash&quot;</span><span class="p">,</span> <span class="s2">&quot;-c&quot;</span><span class="p">,</span> <span class="s2">&quot;/root/start.sh&quot;</span><span class="p">]</span> <span class="n">command</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;bin/bash&quot;</span><span class="p">,</span> <span class="s2">&quot;-c&quot;</span><span class="p">,</span> <span class="s2">&quot;/root/start.sh&quot;</span><span class="p">]</span>
<span class="n">env</span><span class="p">:</span> <span class="n">env</span><span class="p">:</span>
<span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">JOB_NAME</span> <span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">JOB_NAME</span>
...@@ -774,7 +774,7 @@ paddle-cluster-job-jx4xr 1/1 Running 0 9m ...@@ -774,7 +774,7 @@ paddle-cluster-job-jx4xr 1/1 Running 0 9m
</div> </div>
<p><code class="docutils literal"><span class="pre">POD_NAME</span></code>: name of any pod (e.g., <code class="docutils literal"><span class="pre">paddle-cluster-job-cm469</span></code>).</p> <p><code class="docutils literal"><span class="pre">POD_NAME</span></code>: name of any pod (e.g., <code class="docutils literal"><span class="pre">paddle-cluster-job-cm469</span></code>).</p>
<p>Run <code class="docutils literal"><span class="pre">kubectl</span> <span class="pre">--kubeconfig=kubeconfig</span> <span class="pre">describe</span> <span class="pre">job</span> <span class="pre">paddle-cluster-job</span></code> to check training job status. It will complete in around 20 minutes.</p> <p>Run <code class="docutils literal"><span class="pre">kubectl</span> <span class="pre">--kubeconfig=kubeconfig</span> <span class="pre">describe</span> <span class="pre">job</span> <span class="pre">paddle-cluster-job</span></code> to check training job status. It will complete in around 20 minutes.</p>
<p>The details for start <code class="docutils literal"><span class="pre">pserver</span></code> and <code class="docutils literal"><span class="pre">trainer</span></code> are hidden inside docker image <code class="docutils literal"><span class="pre">paddledev/paddle-tutorial:k8s_train</span></code>, see <a class="reference internal" href="src/k8s_train/README.html"><span class="doc">here</span></a> for how to build the docker image and source code.</p> <p>The details for start <code class="docutils literal"><span class="pre">pserver</span></code> and <code class="docutils literal"><span class="pre">trainer</span></code> are hidden inside docker image <code class="docutils literal"><span class="pre">paddlepaddle/paddle-tutorial:k8s_train</span></code>, see <a class="reference internal" href="src/k8s_train/README.html"><span class="doc">here</span></a> for how to build the docker image and source code.</p>
</div> </div>
<div class="section" id="inspect-training-output"> <div class="section" id="inspect-training-output">
<span id="inspect-training-output"></span><h4>Inspect Training Output<a class="headerlink" href="#inspect-training-output" title="永久链接至标题"></a></h4> <span id="inspect-training-output"></span><h4>Inspect Training Output<a class="headerlink" href="#inspect-training-output" title="永久链接至标题"></a></h4>
......
...@@ -233,15 +233,16 @@ ...@@ -233,15 +233,16 @@
<p>在这篇文档里,我们介绍如何在 Kubernetes 集群上启动一个单机使用CPU的PaddlePaddle训练作业。在下一篇中,我们将介绍如何启动分布式训练作业。</p> <p>在这篇文档里,我们介绍如何在 Kubernetes 集群上启动一个单机使用CPU的PaddlePaddle训练作业。在下一篇中,我们将介绍如何启动分布式训练作业。</p>
<div class="section" id="docker"> <div class="section" id="docker">
<span id="docker"></span><h2>制作Docker镜像<a class="headerlink" href="#docker" title="永久链接至标题"></a></h2> <span id="docker"></span><h2>制作Docker镜像<a class="headerlink" href="#docker" title="永久链接至标题"></a></h2>
<p>在一个功能齐全的Kubernetes机群里,通常我们会安装Ceph等分布式文件系统来存储训练数据。这样的话,一个分布式PaddlePaddle训练任务中的每个进程都可以从Ceph读取数据。在这个例子里,我们只演示一个单机作业,所以可以简化对环境的要求,把训练数据直接放在 <p>在一个功能齐全的Kubernetes机群里,通常我们会安装Ceph等分布式文件系统来存储训练数据。这样的话,一个分布式PaddlePaddle训练任务中
PaddlePaddle的Docker image里。为此,我们需要制作一个包含训练数据的PaddlePaddle镜像。</p> 的每个进程都可以从Ceph读取数据。在这个例子里,我们只演示一个单机作业,所以可以简化对环境的要求,把训练数据直接放在
<p>Paddle 的 <a class="reference external" href="http://www.paddlepaddle.org/docs/develop/documentation/zh/getstarted/index_cn.html">Quick Start Tutorial</a> PaddlePaddle的Docker Image里。为此,我们需要制作一个包含训练数据的PaddlePaddle镜像。</p>
里介绍了用Paddle源码中的脚本下载训练数据的过程。 <p>PaddlePaddle的 <code class="docutils literal"><span class="pre">paddlepaddle/paddle:cpu-demo-latest</span></code> 镜像里有PaddlePaddle的源码与demo,
<code class="docutils literal"><span class="pre">paddledev/paddle:cpu-demo-latest</span></code> 镜像里有 PaddlePaddle 源码与demo,( 请注意,默认的 (请注意,默认的PaddlePaddle生产环境镜像 <code class="docutils literal"><span class="pre">paddlepaddle/paddle:latest</span></code> 是不包括源码的,PaddlePaddle的各版本镜像可以参考
PaddlePaddle镜像 <code class="docutils literal"><span class="pre">paddledev/paddle:cpu-latest</span></code> 是不包括源码的, PaddlePaddle的各版本镜像可以参考 <a class="reference external" href="http://www.paddlepaddle.org/doc/build/docker_install.html">Docker installation guide</a> ),所以我们使用这个镜像来下载训练数据到Docker container中,然后把这个包含了训练数据的container保存为一个新的镜像。</p> <a class="reference external" href="http://paddlepaddle.org/docs/develop/documentation/zh/getstarted/build_and_install/docker_install_cn.html">Docker Installation Guide</a>),
下面我们使用这个镜像来下载数据到Docker Container中,并把这个包含了训练数据的Container保存为一个新的镜像。</p>
<div class="section" id=""> <div class="section" id="">
<span id="id1"></span><h3>运行容器<a class="headerlink" href="#" title="永久链接至标题"></a></h3> <span id="id1"></span><h3>运行容器<a class="headerlink" href="#" title="永久链接至标题"></a></h3>
<div class="highlight-default"><div class="highlight"><pre><span></span>$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest <div class="highlight-default"><div class="highlight"><pre><span></span>$ docker run --name quick_start_data -it paddlepaddle/paddle:cpu-demo-latest
</pre></div> </pre></div>
</div> </div>
</div> </div>
......
...@@ -252,7 +252,7 @@ ...@@ -252,7 +252,7 @@
<li>拷贝训练文件到容器内</li> <li>拷贝训练文件到容器内</li>
<li>生成<code class="docutils literal"><span class="pre">paddle</span> <span class="pre">pserver</span></code><code class="docutils literal"><span class="pre">paddle</span> <span class="pre">train</span></code>进程的启动参数,并且启动训练</li> <li>生成<code class="docutils literal"><span class="pre">paddle</span> <span class="pre">pserver</span></code><code class="docutils literal"><span class="pre">paddle</span> <span class="pre">train</span></code>进程的启动参数,并且启动训练</li>
</ul> </ul>
<p>因为官方镜像 <code class="docutils literal"><span class="pre">paddledev/paddle:cpu-latest</span></code> 内已经包含PaddlePaddle的执行程序但是还没上述功能,所以我们可以在这个基础上,添加启动脚本,制作新镜像来完成以上的工作。参考镜像的<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/usage/cluster/src/k8s_train/Dockerfile"><em>Dockerfile</em></a></p> <p>因为官方镜像 <code class="docutils literal"><span class="pre">paddlepaddle/paddle:latest</span></code> 内已经包含PaddlePaddle的执行程序但是还没上述功能,所以我们可以在这个基础上,添加启动脚本,制作新镜像来完成以上的工作。参考镜像的<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/usage/cluster/src/k8s_train/Dockerfile"><em>Dockerfile</em></a></p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> doc/howto/usage/k8s/src/k8s_train <div class="highlight-bash"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> doc/howto/usage/k8s/src/k8s_train
$ docker build -t <span class="o">[</span>YOUR_REPO<span class="o">]</span>/paddle:mypaddle . $ docker build -t <span class="o">[</span>YOUR_REPO<span class="o">]</span>/paddle:mypaddle .
</pre></div> </pre></div>
...@@ -279,7 +279,7 @@ $ docker build -t <span class="o">[</span>YOUR_REPO<span class="o">]</span>/padd ...@@ -279,7 +279,7 @@ $ docker build -t <span class="o">[</span>YOUR_REPO<span class="o">]</span>/padd
<span class="l l-Scalar l-Scalar-Plain">hostNetwork</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">true</span> <span class="l l-Scalar l-Scalar-Plain">hostNetwork</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">true</span>
<span class="l l-Scalar l-Scalar-Plain">containers</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">containers</span><span class="p p-Indicator">:</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">paddle-data</span> <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">paddle-data</span>
<span class="l l-Scalar l-Scalar-Plain">image</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">paddledev/paddle-tutorial:k8s_data</span> <span class="l l-Scalar l-Scalar-Plain">image</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">paddlepaddle/paddle-tutorial:k8s_data</span>
<span class="l l-Scalar l-Scalar-Plain">imagePullPolicy</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">Always</span> <span class="l l-Scalar l-Scalar-Plain">imagePullPolicy</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">Always</span>
<span class="l l-Scalar l-Scalar-Plain">volumeMounts</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">volumeMounts</span><span class="p p-Indicator">:</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">mountPath</span><span class="p p-Indicator">:</span> <span class="s">&quot;/mnt&quot;</span> <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">mountPath</span><span class="p p-Indicator">:</span> <span class="s">&quot;/mnt&quot;</span>
......
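The hunks above all apply the same change: image references move from the `paddledev/` namespace to `paddlepaddle/`, and the tag `paddle:cpu-latest` becomes `paddle:latest`. A minimal Python sketch of that mapping follows, assuming the renames visible in this diff are the only ones needed. The names `migrate_image`, `migrate_text`, and `TAG_RENAMES` are hypothetical and used only for illustration; they are not part of PaddlePaddle or of this commit.

```python
import re

# Repository:tag renames observed in this diff. Any reference not listed here
# keeps its repository and tag and only changes namespace (an assumption).
TAG_RENAMES = {"paddle:cpu-latest": "paddle:latest"}

OLD_NAMESPACE = "paddledev/"
NEW_NAMESPACE = "paddlepaddle/"


def migrate_image(ref: str) -> str:
    """Rewrite one `paddledev/...` image reference to its `paddlepaddle/...` form."""
    if not ref.startswith(OLD_NAMESPACE):
        return ref  # not an old-namespace reference, leave untouched
    rest = ref[len(OLD_NAMESPACE):]
    rest = TAG_RENAMES.get(rest, rest)
    return NEW_NAMESPACE + rest


def migrate_text(text: str) -> str:
    """Rewrite every `paddledev/<repo>:<tag>` reference found in a block of text."""
    pattern = r"paddledev/[\w.\-]+:[\w.\-]+"
    return re.sub(pattern, lambda m: migrate_image(m.group(0)), text)
```

For example, `migrate_text("image: paddledev/paddle-tutorial:k8s_train")` yields the `paddlepaddle/paddle-tutorial:k8s_train` form seen on the right-hand side of the hunks.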
This diff is collapsed.