Commit b47e3d9f authored by Travis CI

Deploy to GitHub Pages: fee90b50

Parent 3aa59eb2
# Command-line arguments
We'll take `doc/howto/cluster/src/word2vec` as an example to introduce distributed training using the PaddlePaddle v2 API.
## Starting parameter server
Type the command below to start a parameter server, which will wait for trainers to connect:
```bash
$ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 --nics=eth0
```
If you wish to run the parameter server in the background and save its output to a log file, you can type:
```bash
$ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 --nics=eth0 &> pserver.log &
```
Parameter Description
@@ -21,8 +22,10 @@ Parameter Description
- ports_num: **required, default 1**, total number of ports to listen on.
- ports_num_for_sparse: **required, default 0**, number of ports serving sparse parameter updates.
- num_gradient_servers: **required, default 1**, total number of gradient servers.
- nics: **optional, default xgbe0,xgbe1**, network device name(s) the parameter server will listen on.
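If you need to bring up parameter servers on several machines, a minimal launcher sketch is shown below. The host list, SSH setup, and log path are assumptions for illustration, not something this document prescribes; only the `paddle pserver` flags come from the text above.

```bash
#!/bin/bash
# Hypothetical cluster layout; replace with your own hosts.
PSERVER_HOSTS=("192.168.0.2" "192.168.0.3")   # nodes that will run `paddle pserver`
NUM_TRAINERS=2                                # value passed as --num_gradient_servers

for host in "${PSERVER_HOSTS[@]}"; do
  # Assumes passwordless SSH and that `paddle` is on PATH on every node.
  ssh "$host" "stdbuf -oL nohup paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=${NUM_TRAINERS} --nics=eth0 &> pserver.log &"
done
```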
## Starting trainer
Type the command below to start the trainer (name the file whatever you want, e.g. "train.py"):
```bash
@@ -70,7 +73,7 @@ Parameter Description
- trainer_id: **required, default 0**, ID of each trainer, starting from 0.
- pservers: **required, default 127.0.0.1**, list of parameter server IP addresses, separated by ",".
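The hunk above elides the full `paddle.init` call from `train.py`; the sketch below is only a hedged reconstruction of how the flags described on this page are usually passed on the trainer side, with every value a placeholder rather than something confirmed by this excerpt.

```python
import paddle.v2 as paddle

# Minimal sketch of distributed initialization; all values are placeholders.
paddle.init(
    use_gpu=False,
    trainer_count=1,
    port=7164,                  # must match the pserver --port
    ports_num=1,
    ports_num_for_sparse=1,
    num_gradient_servers=1,     # total number of trainers
    trainer_id=0,               # unique per trainer, starting from 0
    pservers="127.0.0.1")       # comma-separated pserver IP list
```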
## Prepare Training Dataset
Here is some example code, [prepare.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py), which downloads the public `imikolov` dataset and splits it into multiple files according to the job parallelism (the trainer count). Modify `SPLIT_COUNT` at the beginning of `prepare.py` to change the number of output files.
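The body of `prepare.py` is elided in the hunk below; as a rough illustration of the splitting step it describes, a sketch might look like the following. `SPLIT_COUNT`, the `imikolov` reader calls, and the `-00000`-style output naming mirror the surrounding text, but none of this code is taken from the actual script.

```python
import paddle.v2 as paddle

SPLIT_COUNT = 3  # number of shards per dataset, as described above


def split(reader, output_prefix, split_count=SPLIT_COUNT):
    """Write samples from `reader` round-robin into `split_count` suffixed files."""
    files = [open("%s-%05d" % (output_prefix, i), "w") for i in range(split_count)]
    for i, sample in enumerate(reader()):
        files[i % split_count].write(" ".join(str(x) for x in sample) + "\n")
    for f in files:
        f.close()


# imikolov samples are tuples of word ids built from the dataset's dictionary.
word_dict = paddle.dataset.imikolov.build_dict()
split(paddle.dataset.imikolov.train(word_dict, 5), "train.txt")
split(paddle.dataset.imikolov.test(word_dict, 5), "test.txt")
```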
@@ -88,7 +91,7 @@ for f in flist:
Example code `prepare.py` will split the training data and testing data into 3 files with numeric suffixes such as `-00000`, `-00001` and `-00002`:
```bash
train.txt
train.txt-00000
train.txt-00001
@@ -103,13 +106,13 @@ When job started, every trainer needs to get it's own part of data. In some dist
Different training jobs may have different data formats and `reader()` functions, so developers may need to write different data preparation scripts and `reader()` functions for their jobs.
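As an illustration of the per-trainer `reader()` idea above, here is a rough sketch of a reader that consumes only the shards assigned to the current trainer. The shard naming follows the `prepare.py` output shown earlier, while the `trainer_id`/`trainer_count` arguments and the parsing line are placeholders you would adapt to your own job.

```python
import glob
import os


def cluster_reader(data_dir, trainer_id, trainer_count, prefix="train.txt-"):
    """Yield samples only from the shards that belong to this trainer."""
    def reader():
        shards = sorted(glob.glob(os.path.join(data_dir, prefix + "*")))
        for i, shard in enumerate(shards):
            # Round-robin assignment: shard i belongs to trainer (i % trainer_count).
            if i % trainer_count != trainer_id:
                continue
            with open(shard) as f:
                for line in f:
                    # Placeholder parsing: adapt to your data format.
                    yield line.rstrip("\n").split()
    return reader
```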
## Prepare Training program
We'll create a *workspace* directory on each node to store your training program, its dependencies, and the mounted or downloaded dataset directory.
Your workspace may look like:
```bash
.
|-- my_lib.py
|-- word_dict.pickle
@@ -138,3 +141,21 @@ Your workspace may looks like:
- `train_data_dir`: contains the training data. Mount it from a storage service or copy the training data here.
- `test_data_dir`: contains the testing data.
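One simple way to push this workspace to every node is plain `rsync`; the node list and target path below are placeholders for illustration, not part of the original document.

```bash
#!/bin/bash
# Hypothetical trainer nodes; replace with your own.
NODES=("192.168.0.11" "192.168.0.12" "192.168.0.13")

for node in "${NODES[@]}"; do
  # Copy the whole workspace (program, dict, data shards) into ~/workspace on each node.
  rsync -az ./ "${node}:workspace/"
done
```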
## Async SGD Update
We can set some parameters of the optimizer to make it support asynchronous SGD updates.
For example, we can set the `is_async` and `async_lagged_grad_discard_ratio` options of the `AdaGrad` optimizer:
```python
adagrad = paddle.optimizer.AdaGrad(
is_async=True,
async_lagged_grad_discard_ratio=1.6,
learning_rate=3e-3,
regularization=paddle.optimizer.L2Regularization(8e-4))
```
- `is_async`: whether to use async SGD.
- `async_lagged_grad_discard_ratio`: controls gradient commits for async SGD.
  Once `async_lagged_grad_discard_ratio * num_gradient_servers` commits have passed,
  the current lagged gradient will be discarded silently.
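For context, here is a hedged sketch of how such an optimizer is typically handed to a v2 trainer; the `cost` and `parameters` objects are placeholders from your own network definition, not something defined in this excerpt.

```python
# Assumes `cost` and `parameters` come from your own network definition,
# and `adagrad` is the async-enabled optimizer created above.
trainer = paddle.trainer.SGD(
    cost=cost,
    parameters=parameters,
    update_equation=adagrad)

# Training then proceeds as usual, e.g. trainer.train(reader=..., num_passes=...).
```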
# Command-line arguments
The following uses the code in `doc/howto/cluster/src/word2vec` as an example to introduce distributed training with the PaddlePaddle v2 API.
## Starting the parameter server
Run the following command to start a parameter server, which will wait to exchange data with the trainer nodes:
```bash
$ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1
```
If you want to run the pserver program in the background and save its output to a log file, you can run:
```bash
$ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 &> pserver.log
```
@@ -20,8 +23,10 @@ $ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num
- ports_num_for_sparse: **required, default 0**, number of ports used for sparse parameter communication
- num_gradient_servers: **required, default 1**, total number of gradient servers for the current training job
## Starting the trainer
Run the following command to start the trainer program written in Python (the file can have any name, e.g. train.py):
```bash
$ python train.py
```
@@ -67,7 +72,7 @@ paddle.init(
- pservers: **required, default 127.0.0.1**, list of IP addresses of the pservers started for the current training job, separated by ","
## Preparing the dataset
Refer to the sample data preparation script [prepare.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py) to prepare the training and validation datasets. We use the paddle.dataset.imikolov dataset and, based on the parallelism of the distributed training (the number of trainer nodes), set `SPLIT_COUNT` at the beginning of `prepare.py` to split the data into multiple parts.
@@ -84,7 +89,8 @@ for f in flist:
```
The example program `prepare.py` splits the training set and the test set into multiple files (3 in this example, with suffixes `-00000`, `-00001` and `-00002`):
```bash
train.txt
train.txt-00000
train.txt-00001
@@ -99,12 +105,13 @@ test.txt-00002
For different training jobs, the training data format and the training program's `reader()` can differ greatly, so developers need to split the training data and write the `reader()` according to the actual scenario of their own training job.
## Preparing the training program
For each training job, we create a workspace on every node, containing the user's training program, its dependencies, and the mounted or downloaded training data shards.
Finally, the workspace should look as follows:
```bash
.
|-- my_lib.py
|-- word_dict.pickle
@@ -133,3 +140,21 @@ test.txt-00002
- `train_data_dir`: directory containing the training data; it can be mounted from distributed storage or downloaded locally before the job starts.
- `test_data_dir`: directory containing the test dataset.
## Async SGD Update
We can make the optimizer support asynchronous SGD updates by setting some of its parameters.
For example, set the `is_async` and `async_lagged_grad_discard_ratio` parameters of the `AdaGrad` optimizer:
```python
adagrad = paddle.optimizer.AdaGrad(
is_async=True,
async_lagged_grad_discard_ratio=1.6,
learning_rate=3e-3,
regularization=paddle.optimizer.L2Regularization(8e-4))
```
- `is_async`: whether to use the asynchronous SGD update mode.
- `async_lagged_grad_discard_ratio`: controls the gradient commits of asynchronous SGD updates.
  Once enough gradients (`async_lagged_grad_discard_ratio * num_gradient_servers`) have been received,
  subsequent lagged gradients will be discarded silently.