BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle)
Commit 6ea45823
Authored on March 14, 2018
Author: fengjiayi

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_update_reader_doc

Parents: 6519f6ca, 48f213e5

Showing 31 changed files with 898 additions and 356 deletions (+898 -356)
Changed files:

doc/fluid/design/dist_train/distributed_architecture.md (+1 -1)
doc/fluid/dev/api_doc_std_cn.md (+220 -0)
doc/fluid/dev/src/fc.py (+81 -0)
paddle/fluid/framework/operator.cc (+2 -18)
paddle/fluid/framework/reader.cc (+15 -7)
paddle/fluid/framework/reader.h (+20 -42)
paddle/fluid/operators/detail/grpc_client.cc (+15 -3)
paddle/fluid/operators/detail/grpc_client.h (+26 -10)
paddle/fluid/operators/detail/grpc_server.cc (+15 -6)
paddle/fluid/operators/detail/grpc_server.h (+1 -1)
paddle/fluid/operators/detail/sendrecvop_utils.h (+1 -0)
paddle/fluid/operators/listen_and_serv_op.cc (+2 -2)
paddle/fluid/operators/nccl_op.cc (+45 -47)
paddle/fluid/operators/nccl_op.cu.cc (+44 -95)
paddle/fluid/operators/reader/create_double_buffer_reader_op.cc (+88 -23)
paddle/fluid/operators/reader/create_random_data_generator_op.cc (+3 -2)
paddle/fluid/operators/reader/create_recordio_file_reader_op.cc (+12 -9)
paddle/fluid/operators/reader/create_shuffle_reader_op.cc (+44 -31)
paddle/fluid/operators/reader/reader_op_registry.cc (+5 -0)
paddle/fluid/operators/reader/reader_op_registry.h (+26 -0)
paddle/fluid/operators/reduce_op.cc (+12 -0)
paddle/fluid/operators/reduce_op.h (+18 -1)
paddle/fluid/operators/send_op.cc (+6 -0)
python/paddle/fluid/distribute_transpiler.py (+17 -3)
python/paddle/fluid/layers/io.py (+30 -1)
python/paddle/fluid/layers/nn.py (+73 -43)
python/paddle/fluid/recordio_writer.py (+3 -0)
python/paddle/fluid/regularizer.py (+34 -5)
python/paddle/fluid/tests/book/test_machine_translation.py (+4 -1)
python/paddle/fluid/tests/unittests/test_recordio_reader.py (+22 -5)
python/paddle/fluid/tests/unittests/test_reduce_op.py (+13 -0)
doc/fluid/design/dist_train/distributed_architecture.md
@@ -155,7 +155,7 @@ Cluster environment.
 <img src="src/remote_executor.png" width="500" align="center" />
 
 `RemoteExecutor.run` sends the `ProgramDesc` and
-[TrainingJob](https://github.com/PaddlePaddle/cloud/blob/develop/doc/autoscale/README.md#training-job-resource)
+[TrainingJob](https://github.com/PaddlePaddle/cloud/blob/unreleased-tpr/doc/autoscale/README.md#training-job-resource)
 to a server in the cluster which executes `RemoteExecutor.listen`. This server is responsible
 to start the final Kubernetes Jobs to run the different role of `ProgramDesc` from `ConfigMap`.
 
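doc/fluid/dev/api_doc_std_cn.md
new file mode 100644

# API Docstring Writing Standard

- [API Docstring Modules](#api-docstring-modules)
- [Format and Examples](#format-and-examples)
- [Complete Example](#complete-example)

## API Docstring Modules

An API document must contain the following modules (listed in the order in which they should be written):

- Python API Definition

  The code definition of the API.

- Function Description

  A description of what the API does. Describe the meaning and purpose of the API, or the operation it performs on its input, together with references and their links (if any). Give formulas where necessary and explain the key variables in them.

- Args Description

  An introduction to the API's parameters. Describe each parameter in the order of the code definition, covering its data type, default value (if any), and meaning.

- Returns

  An introduction to the API's return value. Describe what the return value means and give its shape where necessary. If the return value is a tuple containing multiple values, describe each value in order.

- Raises (if any)

  The exceptions or errors that may be raised and their likely causes. When several exceptions or errors may be raised, list them item by item.

- Note (if any)

  Points to note. When there are several, list them item by item.

- Examples

  Usage examples for the API.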
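
The module order above lends itself to a mechanical check. The following is a minimal sketch and not part of the commit: `SECTION_ORDER`, `REQUIRED`, and `check_section_order` are hypothetical names, and the check assumes that each section starts with a header such as `Args:` on a line of its own.

```python
import re

# Section headers in the order the standard asks for (hypothetical constant).
# "Raises" and "Note" are optional; this sketch treats the rest as required.
SECTION_ORDER = ["Args", "Returns", "Raises", "Note", "Examples"]
REQUIRED = {"Args", "Returns", "Examples"}


def check_section_order(docstring):
    """Return a list of problems found in the docstring (empty if none)."""
    found = []
    for line in docstring.splitlines():
        # A section header is a single word followed by a colon, alone on a line.
        match = re.match(r"\s*(\w+):\s*$", line)
        if match and match.group(1) in SECTION_ORDER:
            found.append(match.group(1))

    problems = []
    for name in sorted(REQUIRED - set(found)):
        problems.append("missing section: " + name)

    # Sections that are present must appear in the prescribed order.
    positions = [SECTION_ORDER.index(name) for name in found]
    if positions != sorted(positions):
        problems.append("sections out of order: " + ", ".join(found))
    return problems
```

A docstring that lists `Returns:` before `Args:` and has no `Examples:` section would be reported with two problems; one that follows the order above yields an empty list.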

## Format and Examples

API documents must be written in reStructuredText; see this [link](http://sphinx-doc-zh.readthedocs.io/en/latest/rest.html) for details of the format. The format of each module, together with an example, is given below (using fc as the example):

- Python API Definition

  - Format:

    [Python API Definition]

  - Example

    ```
    fc(input,
       size,
       num_flatten_dims=1,
       param_attr=None,
       bias_attr=None,
       act=None,
       name=None,
       main_program=None,
       startup_program=None)
    ```

- Function Description

  - Format

    This module should contain the following (listed in the order in which they should be written):

    [Function Description]

    [Formula]

    [Symbols' Descriptions if necessary]

    [References if necessary]

  - Example

    [Function Description]

    ```
    **Fully Connected Layer**

    The fully connected layer can take multiple tensors as its inputs. It
    creates a variable called weights for each input tensor, which represents
    a fully connected weight matrix from each input unit to each output unit.
    The fully connected layer multiplies each input tensor with its corresponding
    weight to produce an output Tensor. If multiple input tensors are given,
    the results of multiple multiplications will be summed up. If bias_attr is
    not None, a bias variable will be created and added to the output. Finally,
    if activation is not None, it will be applied to the output as well.
    ```

    [Formula]

    ```
    This process can be formulated as follows:

    .. math::

        Out = Act({\sum_{i=0}^{N-1}X_iW_i + b})
    ```

    [Symbols' Descriptions if necessary]

    ```
    In the above equation:

    * :math:`N`: The number of inputs.
    * :math:`X_i`: The i-th input tensor.
    * :math:`W_i`: The i-th weight matrix created by this layer.
    * :math:`b`: The bias parameter created by this layer (if needed).
    * :math:`Act`: The activation function.
    * :math:`Out`: The output tensor.
    ```

    [References if necessary]

    fc has no references that need to be listed, so this part is omitted. In other cases, the references and their links must be given explicitly. Taking layer_norm as an example:

    ```
    Refer to `Layer Normalization <https://arxiv.org/pdf/1607.06450v1.pdf>`_ for more details.
    ```

- Args Description

  - Format

    \[Arg's Name\][(Data Type, Default Value)][Description]

  - Example

    Some of the parameter descriptions for fc are as follows:

    ```
    Args:
        input (Variable|list of Variable): The input tensor(s) of this layer, and the dimension of
            the input tensor(s) is at least 2.
        param_attr (ParamAttr|list of ParamAttr, default None): The parameter attribute for learnable
            parameters/weights of this layer.
        name (str, default None): The name of this layer.
    ```

- Returns

  - Format

    [Name][Shape]

  - Example

    ```
    Returns:
        A tensor variable storing the transformation result.
    ```

    When the return value is a tuple containing multiple values, describe each value in order. Taking dynamic_lstm as an example:

    ```
    Returns:
        A tuple containing:
          The hidden state of LSTM whose shape is (T X D).
          The cell state of LSTM whose shape is (T X D).
    ```

- Raises

  - Format

    [Exception Type][Condition]

  - Example

    ```
    Raises:
        ValueError: If the rank of the input is less than 2.
    ```

- Note

  - Format

    [Note]

  - Example

    fc has no points to note, so this module is omitted. Where there are points to note, they must be stated explicitly, and when there are several they must be listed item by item. Taking scaled\_dot\_product\_attention as an example:

    ```
    Note:
        1. When num_heads > 1, three linear projections are learned respectively
           to map input queries, keys and values into queries', keys' and values'.
           queries', keys' and values' have the same shapes as queries, keys
           and values.
        2. When num_heads == 1, scaled_dot_product_attention has no learnable
           parameters.
    ```

- Examples

  - Format

    \[Python Code Snippet]

  - Example

    ```
    Examples:
        .. code-block:: python

            data = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
            fc = fluid.layers.fc(input=data, size=1000, act="tanh")
    ```

## Complete Example

For the complete docstring of fc, see the [example](src/fc.py).
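
The formula in the fc example, Out = Act(sum of X_i W_i, plus b), can be traced by hand on small matrices. The sketch below is an illustration only and is not how Paddle implements the layer: `matmul` and `fc_reference` are hypothetical helpers that work on nested Python lists, and each input is assumed to be already 2-dimensional.

```python
import math


def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def fc_reference(inputs, weights, bias=None, act=None):
    """Compute Out = Act(sum_i X_i W_i + b) for 2-D inputs given as nested lists."""
    products = [matmul(x, w) for x, w in zip(inputs, weights)]
    # Sum the per-input products element by element.
    out = [[sum(vals) for vals in zip(*rows)] for rows in zip(*products)]
    if bias is not None:
        out = [[v + b for v, b in zip(row, bias)] for row in out]
    if act is not None:
        out = [[act(v) for v in row] for row in out]
    return out


# Two inputs (N = 2) with one row each, mapped to two output units.
x0, w0 = [[1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]
x1, w1 = [[3.0]], [[1.0, 1.0]]
linear = fc_reference([x0, x1], [w0, w1], bias=[0.5, -0.5])
activated = fc_reference([x0, x1], [w0, w1], bias=[0.5, -0.5], act=math.tanh)
```

Here `x0 W0` is `[[1.0, 2.0]]` and `x1 W1` is `[[3.0, 3.0]]`, so the sum is `[[4.0, 5.0]]` and adding the bias gives `[[4.5, 4.5]]`; `activated` applies `tanh` to each of those entries.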
doc/fluid/dev/src/fc.py
new file mode 100644

# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


def fc(input,
       size,
       num_flatten_dims=1,
       param_attr=None,
       bias_attr=None,
       act=None,
       name=None):
    """
    **Fully Connected Layer**

    The fully connected layer can take multiple tensors as its inputs. It
    creates a variable called weights for each input tensor, which represents
    a fully connected weight matrix from each input unit to each output unit.
    The fully connected layer multiplies each input tensor with its corresponding
    weight to produce an output Tensor. If multiple input tensors are given,
    the results of multiple multiplications will be summed up. If bias_attr is
    not None, a bias variable will be created and added to the output. Finally,
    if activation is not None, it will be applied to the output as well.

    This process can be formulated as follows:

    .. math::

        Out = Act({\sum_{i=0}^{N-1}X_iW_i + b})

    In the above equation:

    * :math:`N`: The number of inputs.
    * :math:`X_i`: The i-th input tensor.
    * :math:`W_i`: The i-th weight matrix created by this layer.
    * :math:`b`: The bias parameter created by this layer (if needed).
    * :math:`Act`: The activation function.
    * :math:`Out`: The output tensor.

    Args:
        input (Variable|list of Variable): The input tensor(s) of this layer, and the dimension of
            the input tensor(s) is at least 2.
        size (int): The number of output units in this layer.
        num_flatten_dims (int, default 1): The fc layer can accept an input tensor with more than
            two dimensions. If this happens, the multidimensional tensor will first be flattened
            into a 2-dimensional matrix. The parameter `num_flatten_dims` determines how the input
            tensor is flattened: the first `num_flatten_dims` (inclusive, index starts from 1)
            dimensions will be flattened to form the first dimension of the final matrix (height of
            the matrix), and the remaining `rank(X) - num_flatten_dims` dimensions are flattened to
            form the second dimension of the final matrix (width of the matrix). For example, suppose
            `X` is a 5-dimensional tensor with a shape [2, 3, 4, 5, 6], and `num_flatten_dims` = 3.
            Then, the flattened matrix will have a shape [2 x 3 x 4, 5 x 6] = [24, 30].
        param_attr (ParamAttr|list of ParamAttr, default None): The parameter attribute for learnable
            parameters/weights of this layer.
        bias_attr (ParamAttr|list of ParamAttr, default None): The parameter attribute for the bias
            of this layer. If it is set to None, no bias will be added to the output units.
        act (str, default None): Activation to be applied to the output of this layer.
        name (str, default None): The name of this layer.

    Returns:
        A tensor variable storing the transformation result.

    Raises:
        ValueError: If rank of the input tensor is less than 2.

    Examples:
        .. code-block:: python

            data = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
            fc = fluid.layers.fc(input=data, size=1000, act="tanh")
    """
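
The `num_flatten_dims` rule in the docstring reduces to two products over the shape. The sketch below only computes the resulting 2-D shape; `flattened_shape` is a hypothetical helper, and the range check on `num_flatten_dims` is an assumption made for this sketch rather than something the docstring states.

```python
from functools import reduce
from operator import mul


def flattened_shape(shape, num_flatten_dims):
    """Return the 2-D shape that results from flattening `shape`.

    The first `num_flatten_dims` dimensions form the height of the matrix,
    the remaining ones form its width.
    """
    # Assumed constraint: both halves of the split must be non-empty.
    if not 0 < num_flatten_dims < len(shape):
        raise ValueError("num_flatten_dims must be in [1, rank - 1]")
    height = reduce(mul, shape[:num_flatten_dims], 1)
    width = reduce(mul, shape[num_flatten_dims:], 1)
    return [height, width]


print(flattened_shape([2, 3, 4, 5, 6], 3))  # prints [24, 30]
```

With the default `num_flatten_dims=1`, a `[32, 32]` input keeps its shape, because the first dimension alone forms the height.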
paddle/fluid/framework/operator.cc
@@ -445,15 +445,7 @@ class RuntimeInferShapeContext : public InferShapeContext {
   }
 
   std::vector<DDim> GetRepeatedDims(const std::string& name) const override {
-    Variable* var = scope_.FindVar(name);
-    if (var->IsType<ReaderHolder>()) {
-      return var->Get<ReaderHolder>().shapes();
-    } else {
-      PADDLE_THROW(
-          "Only ReaderHolder support 'GetRepeatedDims', but Variable %s's "
-          "type_id is %s.",
-          name, var->Type().name());
-    }
+    PADDLE_THROW("Only compile time support this method");
   }
 
   void SetDim(const std::string& name, const DDim& dim) override {
@@ -470,15 +462,7 @@ class RuntimeInferShapeContext : public InferShapeContext {
 
   void SetRepeatedDims(const std::string& name,
                        const std::vector<DDim>& dims) override {
-    Variable* var = scope_.FindVar(name);
-    if (var->IsType<ReaderHolder>()) {
-      var->GetMutable<ReaderHolder>()->set_shapes(dims);
-    } else {
-      PADDLE_THROW(
-          "Only ReaderHolder support 'SetRepeatedDims', but Variable %s's "
-          "type_id is %s.",
-          name, var->Type().name());
-    }
+    PADDLE_THROW("Only compile time support this method");
   }
 
   proto::VarType::Type GetVarType(const std::string& name) const override {
paddle/fluid/framework/reader.cc
@@ -16,14 +16,22 @@
 
 namespace paddle {
 namespace framework {
+ReaderBase::~ReaderBase() {}
 
-DDim ReaderBase::shape(size_t idx) const {
-  PADDLE_ENFORCE_LT(
-      idx, shapes_.size(),
-      "Cannot get the %d'th shape, 'shapes_' only has %d elements.", idx,
-      shapes_.size());
-  return shapes_[idx];
-}
+FileReader::FileReader(const std::vector<DDim> &dims) : dims_(dims) {}
+
+void FileReader::ReadNext(std::vector<LoDTensor> *out) {
+  ReadNextImpl(out);
+  PADDLE_ENFORCE_EQ(out->size(), dims_.size());
+  for (size_t i = 0; i < dims_.size(); ++i) {
+    auto &actual = out->at(i).dims();
+    auto &expect = dims_[i];
+    PADDLE_ENFORCE_EQ(actual.size(), expect.size());
+    for (int j = 0; j < actual.size(); ++j) {
+      PADDLE_ENFORCE(actual[i] == expect[i] || expect[i] == -1);
+    }
+  }
+}
 
 }  // namespace framework
 }  // namespace paddle
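
The shape check that the new `FileReader::ReadNext` performs after calling `ReadNextImpl` can be modelled in a few lines. This is a Python sketch of the intended logic and not a translation of Paddle's code: `check_shapes` is a hypothetical name, shapes are plain lists of ints, and failures raise `ValueError` in place of the `PADDLE_ENFORCE` macros. The sketch compares dimension `j` of each tensor, whereas the C++ hunk indexes `actual` and `expect` with `i` inside the inner loop, which looks unintended.

```python
def check_shapes(actual_shapes, expected_shapes):
    """Raise ValueError unless every tensor shape matches its expected shape.

    An expected dimension of -1 matches any actual size.
    """
    if len(actual_shapes) != len(expected_shapes):
        raise ValueError("expected %d tensors, got %d"
                         % (len(expected_shapes), len(actual_shapes)))
    for i, (actual, expect) in enumerate(zip(actual_shapes, expected_shapes)):
        if len(actual) != len(expect):
            raise ValueError("tensor %d: rank mismatch" % i)
        for j in range(len(actual)):
            # Compare dimension j of tensor i; -1 acts as a wildcard.
            if expect[j] != -1 and actual[j] != expect[j]:
                raise ValueError("tensor %d: dim %d is %d, expected %d"
                                 % (i, j, actual[j], expect[j]))


# A batch of 32 images and labels passes when the batch dimension is left open.
check_shapes([[32, 784], [32, 1]], [[-1, 784], [-1, 1]])
```

Declaring the batch dimension as `-1` lets one reader definition serve any batch size while the remaining dimensions stay fixed.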
paddle/fluid/framework/reader.h
@@ -16,51 +16,29 @@
 #include "paddle/fluid/framework/ddim.h"
 #include "paddle/fluid/framework/lod_tensor_array.h"
+#include "paddle/fluid/platform/place.h"
+
+#include <memory>
+#include <thread>
+#include <vector>
 
 namespace paddle {
 namespace framework {
 
 class ReaderBase {
  public:
-  explicit ReaderBase(const std::vector<DDim>& shapes) : shapes_(shapes) {
-    PADDLE_ENFORCE(!shapes_.empty());
-  }
   virtual void ReadNext(std::vector<LoDTensor>* out) = 0;
 
   virtual void ReInit() = 0;
 
-  DDim shape(size_t idx) const;
-  std::vector<DDim> shapes() const { return shapes_; }
-  void set_shapes(const std::vector<DDim>& shapes) { shapes_ = shapes; }
-
   virtual bool HasNext() const = 0;
 
-  virtual ~ReaderBase() {}
+  virtual ~ReaderBase();
-
- protected:
-  std::vector<DDim> shapes_;
-};
-
-class FileReader : public ReaderBase {
- public:
-  explicit FileReader(const std::vector<DDim>& shapes) : shapes_(shapes) {}
-
-  void ReadNext(std::vector<LoDTensor>* out) override final {
-    ReadNextImpl(out);
-    CheckShapes(out);
-  }
-
-  virtual void ReadNextImpl(std::vector<LoDTensor>* out) = 0;
-
- protected:
-  CheckShape(const std::vector<LoDTensor>* out);
-
-  std::vector<DDim> shapes_;
 };
 
 class DecoratedReader : public ReaderBase {
  public:
-  explicit DecoratedReader(ReaderBase* reader) : reader_(reader) {
+  explicit DecoratedReader(ReaderBase* reader) : ReaderBase(), reader_(reader) {
     PADDLE_ENFORCE_NOT_NULL(reader_);
   }
 
@@ -72,6 +50,19 @@ class DecoratedReader : public ReaderBase {
   ReaderBase* reader_;
 };
 
+class FileReader : public ReaderBase {
+ public:
+  explicit FileReader(const std::vector<DDim>& dims);
+
+  void ReadNext(std::vector<LoDTensor>* out) override;
+
+ protected:
+  virtual void ReadNextImpl(std::vector<LoDTensor>* out) = 0;
+
+ private:
+  std::vector<DDim> dims_;
+};
+
 // The ReaderHolder is used as reader' unified wrapper,
 // making it easier to access different type reader in Variables.
 class ReaderHolder {
@@ -89,19 +80,6 @@ class ReaderHolder {
     reader_->ReInit();
   }
 
-  DDim shape(size_t idx) const {
-    PADDLE_ENFORCE_NOT_NULL(reader_);
-    return reader_->shape(idx);
-  }
-
-  std::vector<DDim> shapes() const {
-    PADDLE_ENFORCE_NOT_NULL(reader_);
-    return reader_->shapes();
-  }
-  void set_shapes(const std::vector<DDim>& shapes) {
-    PADDLE_ENFORCE_NOT_NULL(reader_);
-    reader_->set_shapes(shapes);
-  }
   bool HasNext() const { return reader_->HasNext(); }
 
  private:
paddle/fluid/operators/detail/grpc_client.cc
@@ -97,7 +97,7 @@ bool RPCClient::AsyncGetVariable(const std::string& ep,
   return true;
 }
 
-bool RPCClient::AsyncSendBatchBarrier(const std::string& ep, int64_t time_out) {
+void RPCClient::AsyncSendBatchBarrier(const std::string& ep, int64_t time_out) {
   const auto ch = GetChannel(ep);
 
   BatchBarrierProcessor* s = new BatchBarrierProcessor(ch);
@@ -108,8 +108,18 @@ bool RPCClient::AsyncSendBatchBarrier(const std::string& ep, int64_t time_out) {
   auto rpc = s->stub_->AsyncSendVariable(s->context_.get(), req, &cq_);
   rpc->Finish(&s->reply_, &s->status_, (void*)s);
   req_count_++;
+}
 
-  return true;
+void RPCClient::AsyncSendFetchBarrier(const std::string& ep, int64_t time_out) {
+  const auto ch = GetChannel(ep);
+  FetchBarrierProcessor* s = new FetchBarrierProcessor(ch);
+  s->Prepare(time_out);
+
+  sendrecv::VariableMessage req;
+  req.set_varname(FETCH_BARRIER_MESSAGE);
+  auto rpc = s->stub_->AsyncGetVariable(s->context_.get(), req, &cq_);
+  rpc->Finish(&s->reply_, &s->status_, (void*)s);
+  req_count_++;
 }
 
 bool RPCClient::Wait() {
@@ -154,7 +164,7 @@ bool RPCClient::Proceed() {
   PADDLE_ENFORCE(tag);
 
   // TODO(gongwb): add more retries.
-  ClientBase* c = static_cast<ClientBase*>(tag);
+  BaseProcessor* c = static_cast<BaseProcessor*>(tag);
   if (!c->status_.ok()) {
     LOG(ERROR) << "proc param error:" << c->var_h_.String()
                << " grpc error:" << c->status_.error_message();
@@ -174,6 +184,8 @@ std::shared_ptr<grpc::Channel> RPCClient::GetChannel(const std::string& ep) {
   }
 
   grpc::ChannelArguments args;
+  args.SetInt("grpc.testing.fixed_reconnect_backoff_ms", 5000);
+  args.SetCompressionAlgorithm(GRPC_COMPRESS_NONE);
   args.SetMaxSendMessageSize(std::numeric_limits<int>::max());
   args.SetMaxReceiveMessageSize(std::numeric_limits<int>::max());
 
paddle/fluid/operators/detail/grpc_client.h
@@ -52,14 +52,14 @@ struct VarHandle {
 void ProcGetResponse(const VarHandle& var_h,
                      const sendrecv::VariableMessage& msg);
 
-class ClientBase {
+class BaseProcessor {
  public:
-  explicit ClientBase(std::shared_ptr<grpc::Channel> ch) {
+  explicit BaseProcessor(std::shared_ptr<grpc::Channel> ch) {
     stub_ = sendrecv::SendRecvService::NewStub(ch);
     context_ = NULL;
   }
 
-  virtual ~ClientBase() {}
+  virtual ~BaseProcessor() {}
 
   virtual void Prepare(const VarHandle& var_info, int64_t time_out) {
     context_.reset(new grpc::ClientContext());
@@ -91,9 +91,10 @@ class ClientBase {
 typedef std::function<void(const VarHandle&, const sendrecv::VoidMessage&)>
     RequestSendCallBack;
 
-class SendProcessor : public ClientBase {
+class SendProcessor : public BaseProcessor {
  public:
-  explicit SendProcessor(std::shared_ptr<grpc::Channel> ch) : ClientBase(ch) {}
+  explicit SendProcessor(std::shared_ptr<grpc::Channel> ch)
+      : BaseProcessor(ch) {}
 
   virtual ~SendProcessor() {}
 
@@ -110,9 +111,10 @@ class SendProcessor : public ClientBase {
 typedef std::function<void(const VarHandle&, const sendrecv::VariableMessage&)>
     RequestGetCallBack;
 
-class GetProcessor : public ClientBase {
+class GetProcessor : public BaseProcessor {
  public:
-  explicit GetProcessor(std::shared_ptr<grpc::Channel> ch) : ClientBase(ch) {}
+  explicit GetProcessor(std::shared_ptr<grpc::Channel> ch)
+      : BaseProcessor(ch) {}
 
   virtual ~GetProcessor() {}
 
@@ -126,10 +128,10 @@ class GetProcessor : public ClientBase {
   RequestGetCallBack response_call_back_ = ProcGetResponse;
 };
 
-class BatchBarrierProcessor : public ClientBase {
+class BatchBarrierProcessor : public BaseProcessor {
  public:
   explicit BatchBarrierProcessor(std::shared_ptr<grpc::Channel> ch)
-      : ClientBase(ch) {}
+      : BaseProcessor(ch) {}
 
   virtual ~BatchBarrierProcessor() {}
 
@@ -137,6 +139,17 @@ class BatchBarrierProcessor : public ClientBase {
   sendrecv::VoidMessage reply_;
 };
 
+class FetchBarrierProcessor : public BaseProcessor {
+ public:
+  explicit FetchBarrierProcessor(std::shared_ptr<grpc::Channel> ch)
+      : BaseProcessor(ch) {}
+
+  virtual ~FetchBarrierProcessor() {}
+
+  virtual void Process() {}
+  sendrecv::VariableMessage reply_;
+};
+
 class RPCClient {
  public:
   bool AsyncSendVariable(const std::string& ep,
@@ -151,7 +164,10 @@ class RPCClient {
                         const std::string& var_name,
                         int64_t time_out = 600 * 1000);
 
-  bool AsyncSendBatchBarrier(const std::string& ep,
+  void AsyncSendBatchBarrier(const std::string& ep,
+                             int64_t time_out = 600 * 1000);
+
+  void AsyncSendFetchBarrier(const std::string& ep,
                              int64_t time_out = 600 * 1000);
 
   bool Wait();
paddle/fluid/operators/detail/grpc_server.cc
@@ -84,7 +84,7 @@ class RequestGet final : public RequestBase {
   explicit RequestGet(sendrecv::SendRecvService::AsyncService* service,
                       grpc::ServerCompletionQueue* cq, framework::Scope* scope,
                       const platform::DeviceContext* dev_ctx,
-                      SimpleBlockQueue<char>* queue)
+                      SimpleBlockQueue<MessageWithName>* queue)
       : RequestBase(service, cq),
         responder_(&ctx_),
         scope_(scope),
@@ -101,11 +101,16 @@ class RequestGet final : public RequestBase {
     // proc request.
     std::string var_name = request_.varname();
     auto* var = scope_->FindVar(var_name);
-    SerializeToMessage(var_name, var, *dev_ctx_, &reply_);
+    if (var_name != FETCH_BARRIER_MESSAGE) {
+      SerializeToMessage(var_name, var, *dev_ctx_, &reply_);
+    }
     // TODO(gongwb): check var's info.
     responder_.Finish(reply_, grpc::Status::OK, this);
     status_ = FINISH;
-    queue_->Push('c');
+    MessageWithName msg_with_name =
+        // request name reply
+        std::make_pair(var_name, std::move(reply_));
+    queue_->Push(msg_with_name);
   }
 
  protected:
@@ -114,12 +119,16 @@ class RequestGet final : public RequestBase {
   ServerAsyncResponseWriter<sendrecv::VariableMessage> responder_;
   framework::Scope* scope_;
   const platform::DeviceContext* dev_ctx_;
-  SimpleBlockQueue<char>* queue_;
+  SimpleBlockQueue<MessageWithName>* queue_;
 };
 
 void AsyncGRPCServer::WaitClientGet(int count) {
-  for (int i = 0; i < count; ++i) {
-    var_get_queue_.Pop();
+  int fetch_barriers = 0;
+  while (fetch_barriers < count) {
+    auto msg = var_get_queue_.Pop();
+    if (msg.first == FETCH_BARRIER_MESSAGE) {
+      fetch_barriers++;
+    }
   }
 }
 
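
The new `WaitClientGet` loop no longer counts every get request; it waits until `count` fetch-barrier messages have arrived and skips ordinary variable requests in between. The following Python sketch models that loop under stated assumptions: `wait_client_get` is a hypothetical name, `queue.Queue` stands in for `SimpleBlockQueue` (whose `Pop` is assumed to block), and returning the names of the fetched variables is an addition for illustration, since the C++ code discards them.

```python
import queue

FETCH_BARRIER_MESSAGE = "FETCH_BARRIER@RECV"


def wait_client_get(get_queue, fan_in):
    """Block until `fan_in` fetch-barrier messages have been popped.

    Returns the names of the ordinary variables seen along the way.
    """
    fetched = []
    fetch_barriers = 0
    while fetch_barriers < fan_in:
        name, _reply = get_queue.get()  # blocks until a message is available
        if name == FETCH_BARRIER_MESSAGE:
            fetch_barriers += 1
        else:
            fetched.append(name)
    return fetched


# Two trainers: each fetches some variables, then sends one fetch barrier.
q = queue.Queue()
for item in [("w", b""), ("b", b""), (FETCH_BARRIER_MESSAGE, b""),
             ("w", b""), (FETCH_BARRIER_MESSAGE, b"")]:
    q.put(item)
names = wait_client_get(q, 2)
```

Counting barriers rather than requests means the server does not need to know how many variables each trainer fetches, only how many trainers there are, which matches the switch from `ins.size()` to `fan_in` in `listen_and_serv_op.cc`.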
paddle/fluid/operators/detail/grpc_server.h
@@ -77,7 +77,7 @@ class AsyncGRPCServer final : public sendrecv::SendRecvService::Service {
   const platform::DeviceContext* dev_ctx_;
   // received variable from RPC, operators fetch variable from this queue.
   SimpleBlockQueue<MessageWithName> var_recv_queue_;
-  SimpleBlockQueue<char> var_get_queue_;
+  SimpleBlockQueue<MessageWithName> var_get_queue_;
 
   // condition of the sub program
   std::mutex barrier_mutex_;
paddle/fluid/operators/detail/sendrecvop_utils.h
@@ -32,6 +32,7 @@ namespace detail {
 
 #define LISTEN_TERMINATE_MESSAGE "TERMINATE@RECV"
 #define BATCH_BARRIER_MESSAGE "BATCH_BARRIER@RECV"
+#define FETCH_BARRIER_MESSAGE "FETCH_BARRIER@RECV"
 
 typedef void (*DestroyCallback)(void*);
 
paddle/fluid/operators/listen_and_serv_op.cc
@@ -128,8 +128,8 @@ class ListenAndServOp : public framework::OperatorBase {
         }
       }
       if (exit_flag) {
-        rpc_service_->ShutDown();
         rpc_service_->SetCond(1);
+        rpc_service_->ShutDown();
         break;
       }
       try {
@@ -148,7 +148,7 @@ class ListenAndServOp : public framework::OperatorBase {
       }
       rpc_service_->SetCond(1);
       // FIXME(typhoonzero): use another condition to sync wait clients get.
-      rpc_service_->WaitClientGet(ins.size());
+      rpc_service_->WaitClientGet(fan_in);
       sparse_vars.clear();
     }  // while(true)
   }
paddle/fluid/operators/nccl_op.cc
浏览文件 @
6ea45823
...
@@ -104,19 +104,38 @@ class NCCLAllReduceOp : public framework::OperatorWithKernel {
...
@@ -104,19 +104,38 @@ class NCCLAllReduceOp : public framework::OperatorWithKernel {
" Input(Communicator) of AllReduce op input should not be NULL"
);
" Input(Communicator) of AllReduce op input should not be NULL"
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"Out"
),
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"Out"
),
" Output(Out) of AllReduce op output should not be NULL"
);
" Output(Out) of AllReduce op output should not be NULL"
);
auto
x_dims
=
ctx
->
GetInputsDim
(
"X"
);
std
::
string
reduction
=
ctx
->
Attrs
().
Get
<
std
::
string
>
(
"reduction"
);
std
::
string
reduction
=
ctx
->
Attrs
().
Get
<
std
::
string
>
(
"reduction"
);
PADDLE_ENFORCE
((
reduction
==
"ncclSum"
||
reduction
==
"ncclProd"
||
PADDLE_ENFORCE
((
reduction
==
"ncclSum"
||
reduction
==
"ncclProd"
||
reduction
==
"ncclMin"
||
reduction
==
"ncclMax"
),
reduction
==
"ncclMin"
||
reduction
==
"ncclMax"
),
"invalid reduction."
);
"invalid reduction."
);
auto
x_dims
=
ctx
->
GetInputsDim
(
"X"
);
ctx
->
SetOutputsDim
(
"Out"
,
x_dims
);
ctx
->
SetOutputsDim
(
"Out"
,
x_dims
);
ctx
->
ShareLoD
(
"X"
,
/*->*/
"Out"
);
ctx
->
ShareLoD
(
"X"
,
/*->*/
"Out"
);
}
}
};
};
// AllReduceOp
class NCCLAllReduceOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  NCCLAllReduceOpMaker(OpProto *proto, OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "The input of AllReduce op");
    AddInput("Communicator", "Communicator for communicating between gpus");
    AddOutput("Out", "The output of AllReduce op");
    AddAttr<std::string>("reduction",
                         "(string, default 'ncclSum') "
                         "{'ncclMin', 'ncclMax', 'ncclProd', 'ncclSum'}.")
        .SetDefault("ncclSum");
    AddComment(R"DOC(
NCCLAllReduce Operator.
AllReduce the input tensors.
)DOC");
  }
};
// ReduceOp
// ReduceOp
class
NCCLReduceOp
:
public
framework
::
OperatorWithKernel
{
class
NCCLReduceOp
:
public
framework
::
OperatorWithKernel
{
public:
public:
...
@@ -143,50 +162,6 @@ class NCCLReduceOp : public framework::OperatorWithKernel {
...
@@ -143,50 +162,6 @@ class NCCLReduceOp : public framework::OperatorWithKernel {
}
}
};
};
// BcastOp
class NCCLBcastOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

 protected:
  void InferShape(framework::InferShapeContext *ctx) const override {
    PADDLE_ENFORCE(ctx->HasInput("X"),
                   " Input(X) of Bcast op input should not be NULL");
    PADDLE_ENFORCE(ctx->HasInput("Communicator"),
                   " Input(Communicator) of Bcast op input should not be NULL");
    PADDLE_ENFORCE(ctx->HasOutput("Out"),
                   " Output(Out) of Bcast op output should not be NULL");
    int root = ctx->Attrs().Get<int>("root");
    PADDLE_ENFORCE(root != platform::kInvalidGPUId, "Bcast root must be set.");
    auto x_dims = ctx->GetInputsDim("X");
    ctx->SetOutputsDim("Out", x_dims);
    ctx->ShareLoD("X", /*->*/ "Out");
  }
};
// AllreduceOp
class NCCLAllReduceOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  NCCLAllReduceOpMaker(OpProto *proto, OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "The input of AllReduce op");
    AddInput("Communicator", "Communicator for communicating between gpus");
    AddOutput("Out", "The output of AllReduce op");
    AddAttr<std::string>("reduction",
                         "(string, default 'ncclSum') "
                         "{'ncclMin', 'ncclMax', 'ncclProd', 'ncclSum'}.")
        .SetDefault("ncclSum");
    AddComment(R"DOC(
NCCLAllReduce Operator.
AllReduce the input tensors.
)DOC");
  }
};
// ReduceOp
// ReduceOp
class
NCCLReduceOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
class
NCCLReduceOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
public:
...
@@ -213,6 +188,29 @@ Reduce the tensors.
...
@@ -213,6 +188,29 @@ Reduce the tensors.
}
}
};
};
// BcastOp
class NCCLBcastOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

 protected:
  void InferShape(framework::InferShapeContext *ctx) const override {
    PADDLE_ENFORCE(ctx->HasInput("X"),
                   " Input(X) of Bcast op input should not be NULL");
    PADDLE_ENFORCE(ctx->HasInput("Communicator"),
                   " Input(Communicator) of Bcast op input should not be NULL");
    PADDLE_ENFORCE(ctx->HasOutput("Out"),
                   " Output(Out) of Bcast op output should not be NULL");
    int root = ctx->Attrs().Get<int>("root");
    PADDLE_ENFORCE(root != platform::kInvalidGPUId, "Bcast root must be set.");
    auto x_dims = ctx->GetInputsDim("X");
    ctx->SetOutputsDim("Out", x_dims);
    ctx->ShareLoD("X", /*->*/ "Out");
  }
};
// BcastOp
// BcastOp
class
NCCLBcastOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
class
NCCLBcastOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
public:
...
...
paddle/fluid/operators/nccl_op.cu.cc
View file @ 6ea45823
...
@@ -43,13 +43,12 @@ class NCCLAllReduceKernel : public framework::OpKernel<T> {
...
@@ -43,13 +43,12 @@ class NCCLAllReduceKernel : public framework::OpKernel<T> {
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
"This kernel only runs on GPU device."
);
"This kernel only runs on GPU device."
);
auto
*
x
=
ctx
.
Input
<
LoDTensor
>
(
"X"
);
auto
ins
=
ctx
.
MultiInput
<
LoDTensor
>
(
"X"
);
auto
*
out
=
ctx
.
Output
<
LoDTensor
>
(
"Out"
);
auto
outs
=
ctx
.
MultiOutput
<
LoDTensor
>
(
"Out"
);
auto
*
comm
=
ctx
.
Input
<
Communicator
>
(
"Communicator"
);
std
::
string
reduction
=
ctx
.
Attr
<
std
::
string
>
(
"reduction"
);
std
::
string
reduction
=
ctx
.
Attr
<
std
::
string
>
(
"reduction"
);
ncclRedOp_t
reduction_op_
=
ncclSum
;
ncclRedOp_t
reduction_op_
=
ncclSum
;
if
(
reduction
==
"ncclMin"
)
{
if
(
reduction
==
"ncclMin"
)
{
reduction_op_
=
ncclMin
;
reduction_op_
=
ncclMin
;
}
else
if
(
reduction
==
"ncclMax"
)
{
}
else
if
(
reduction
==
"ncclMax"
)
{
...
@@ -61,30 +60,19 @@ class NCCLAllReduceKernel : public framework::OpKernel<T> {
...
@@ -61,30 +60,19 @@ class NCCLAllReduceKernel : public framework::OpKernel<T> {
}
else
{
}
else
{
PADDLE_THROW
(
"Invalid reduction. default ncclSum."
);
PADDLE_THROW
(
"Invalid reduction. default ncclSum."
);
}
}
auto
*
comm
=
ctx
.
Input
<
Communicator
>
(
"Communicator"
);
auto
stream
=
ctx
.
cuda_device_context
().
stream
();
// device id
// device id
int
gpu_id
=
boost
::
get
<
platform
::
CUDAPlace
>
(
ctx
.
GetPlace
()).
GetDeviceId
();
int
gpu_id
=
boost
::
get
<
platform
::
CUDAPlace
>
(
ctx
.
GetPlace
()).
GetDeviceId
();
int
idx
=
comm
->
GetCommId
(
gpu_id
);
int
idx
=
comm
->
GetCommId
(
gpu_id
);
VLOG
(
3
)
<<
"gpu : "
for
(
size_t
i
=
0
;
i
<
ins
.
size
();
++
i
)
{
<<
" invoke allreduce. send "
<<
x
->
numel
()
<<
" recv "
VLOG
(
1
)
<<
"gpu : "
<<
out
->
numel
();
<<
" invoke allreduce. send "
<<
ins
[
i
]
->
numel
()
<<
" recv "
<<
outs
[
i
]
->
numel
();
PADDLE_ENFORCE
(
platform
::
dynload
::
ncclAllReduce
(
PADDLE_ENFORCE
(
platform
::
dynload
::
ncclAllReduce
(
ins
[
i
]
->
data
<
T
>
(),
outs
[
i
]
->
mutable_data
<
T
>
(
ctx
.
GetPlace
()),
x
->
data
<
T
>
(),
out
->
mutable_data
<
T
>
(
ctx
.
GetPlace
()),
out
->
numel
(),
outs
[
i
]
->
numel
(),
NCCLTypeWrapper
<
T
>::
type
,
reduction_op_
,
NCCLTypeWrapper
<
T
>::
type
,
reduction_op_
,
comm
->
comms
().
at
(
idx
),
comm
->
comms
().
at
(
idx
),
stream
));
ctx
.
cuda_device_context
().
stream
()));
PADDLE_ENFORCE
(
cudaStreamSynchronize
(
stream
));
VLOG
(
3
)
<<
"gpu : "
<<
" finished allreduce. send "
<<
x
->
numel
()
<<
" recv "
VLOG
(
1
)
<<
"gpu : "
<<
out
->
numel
();
<<
" finished allreduce. send "
<<
ins
[
i
]
->
numel
()
<<
" recv "
<<
outs
[
i
]
->
numel
();
}
}
}
};
};
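The kernels in this file turn the `reduction` attribute string into an `ncclRedOp_t` through an if/else chain that throws on an unknown value. The following is a minimal sketch of that mapping written as a table lookup. `RedOp` and `ParseReduction` are hypothetical names introduced for illustration, and `RedOp` is only a stand-in for the NCCL enum so that the snippet builds without NCCL headers.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Stand-in for ncclRedOp_t (assumption: lets the sketch build without NCCL).
enum class RedOp { kSum, kProd, kMin, kMax };

// Hypothetical helper: maps the "reduction" attribute string to an enum value.
// Unknown strings throw, mirroring the PADDLE_THROW branch in the kernels.
inline RedOp ParseReduction(const std::string& reduction) {
  static const std::unordered_map<std::string, RedOp> kTable = {
      {"ncclSum", RedOp::kSum},
      {"ncclProd", RedOp::kProd},
      {"ncclMin", RedOp::kMin},
      {"ncclMax", RedOp::kMax},
  };
  auto it = kTable.find(reduction);
  if (it == kTable.end()) {
    throw std::invalid_argument("Invalid reduction: " + reduction);
  }
  return it->second;
}
```

A shared helper of this kind would keep the list of accepted strings in one place instead of repeating the chain in both the AllReduce and Reduce kernels; whether that is worth the indirection is a design choice, not something this commit does.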
...
@@ -94,13 +82,13 @@ class NCCLReduceKernel : public framework::OpKernel<T> {
...
@@ -94,13 +82,13 @@ class NCCLReduceKernel : public framework::OpKernel<T> {
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
"This kernel only runs on GPU device."
);
"This kernel only runs on GPU device."
);
auto
x
=
ctx
.
Input
<
LoDTensor
>
(
"X"
);
// x0, x1, x2
auto
ins
=
ctx
.
MultiInput
<
LoDTensor
>
(
"X"
);
// x0, x1, x2
auto
out
=
ctx
.
Output
<
LoDTensor
>
(
"Out"
);
auto outs = ctx.MultiOutput<LoDTensor>("Out");
auto* comm = ctx.Input<Communicator>("Communicator");
int
root
=
ctx
.
Attr
<
int
>
(
"root"
);
std
::
string
reduction
=
ctx
.
Attr
<
std
::
string
>
(
"reduction"
);
std
::
string
reduction
=
ctx
.
Attr
<
std
::
string
>
(
"reduction"
);
ncclRedOp_t
reduction_op_
=
ncclSum
;
ncclRedOp_t
reduction_op_
=
ncclSum
;
if
(
reduction
==
"ncclMin"
)
{
if
(
reduction
==
"ncclMin"
)
{
reduction_op_
=
ncclMin
;
reduction_op_
=
ncclMin
;
}
else
if
(
reduction
==
"ncclMax"
)
{
}
else
if
(
reduction
==
"ncclMax"
)
{
...
@@ -112,40 +100,21 @@ class NCCLReduceKernel : public framework::OpKernel<T> {
...
@@ -112,40 +100,21 @@ class NCCLReduceKernel : public framework::OpKernel<T> {
}
else
{
}
else
{
PADDLE_THROW
(
"Invalid reduction. default ncclSum."
);
PADDLE_THROW
(
"Invalid reduction. default ncclSum."
);
}
}
int
root
=
ctx
.
Attr
<
int
>
(
"root"
);
auto
*
comm
=
ctx
.
Input
<
Communicator
>
(
"Communicator"
);
auto
stream
=
reinterpret_cast
<
const
platform
::
CUDADeviceContext
&>
(
ctx
.
device_context
())
.
stream
();
// device id
// device id
int
gpu_id
=
boost
::
get
<
platform
::
CUDAPlace
>
(
ctx
.
GetPlace
()).
GetDeviceId
();
int
gpu_id
=
boost
::
get
<
platform
::
CUDAPlace
>
(
ctx
.
GetPlace
()).
GetDeviceId
();
int
idx
=
comm
->
GetCommId
(
gpu_id
);
int
idx
=
comm
->
GetCommId
(
gpu_id
);
auto
ins_names
=
ctx
.
Inputs
(
"X"
);
std
::
hash
<
std
::
string
>
hasher
;
for
(
size_t
i
=
0
;
i
<
ins
.
size
();
++
i
)
{
if
(
root
==
platform
::
kInvalidGPUId
)
{
root
=
hasher
(
ins_names
[
i
])
%
comm
->
comms
().
size
();
}
T
*
recvbuffer
=
nullptr
;
T
*
recvbuffer
=
nullptr
;
if
(
root
==
gpu_id
)
{
if
(
root
==
gpu_id
)
{
recvbuffer
=
outs
[
i
]
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
recvbuffer
=
out
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
}
}
VLOG
(
3
)
<<
"gpu : "
<<
gpu_id
<<
" invoke reduce. send "
<<
x
->
numel
()
VLOG
(
1
)
<<
"gpu : "
<<
gpu_id
<<
" invoke reduce. send "
<<
" recv "
<<
out
->
numel
();
<<
ins
[
i
]
->
numel
()
<<
" recv "
<<
outs
[
i
]
->
numel
();
PADDLE_ENFORCE
(
platform
::
dynload
::
ncclReduce
(
PADDLE_ENFORCE
(
platform
::
dynload
::
ncclReduce
(
ins
[
i
]
->
data
<
T
>
(),
recvbuffer
,
ins
[
i
]
->
numel
(),
x
->
data
<
T
>
(),
recvbuffer
,
x
->
numel
(),
NCCLTypeWrapper
<
T
>::
type
,
NCCLTypeWrapper
<
T
>::
type
,
reduction_op_
,
root
,
comm
->
comms
().
at
(
idx
),
reduction_op_
,
root
,
comm
->
comms
().
at
(
idx
),
stream
));
ctx
.
cuda_device_context
().
stream
()));
PADDLE_ENFORCE
(
cudaStreamSynchronize
(
stream
));
VLOG
(
3
)
<<
"gpu : "
<<
gpu_id
<<
" finished reduce. send "
<<
x
->
numel
()
<<
" recv "
<<
out
->
numel
();
VLOG
(
1
)
<<
"gpu : "
<<
gpu_id
<<
" finished reduce. send "
<<
ins
[
i
]
->
numel
()
<<
" recv "
<<
outs
[
i
]
->
numel
();
}
}
}
};
};
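The multi-input variant of the reduce kernel falls back to a hash of the input variable name, taken modulo the number of communicators, when no root was set. A minimal sketch of that selection follows. `PickRoot` and `kInvalidRoot` are hypothetical names, and the value `-1` is an assumption, since the definition of `platform::kInvalidGPUId` is not part of this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Stand-in for platform::kInvalidGPUId (assumed value; not shown in the diff).
constexpr int kInvalidRoot = -1;

// Hypothetical helper: keep an explicitly configured root, otherwise derive
// one from the variable name so every device computes the same answer.
inline int PickRoot(int root, const std::string& var_name,
                    std::size_t num_comms) {
  if (root != kInvalidRoot) return root;
  return static_cast<int>(std::hash<std::string>{}(var_name) % num_comms);
}
```

Note that in the loop as written in the diff, `root` is a local variable that is overwritten on the first iteration, so later inputs appear to reuse the root derived from the first input name rather than hashing their own.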
...
@@ -155,47 +124,27 @@ class NCCLBcastKernel : public framework::OpKernel<T> {
...
@@ -155,47 +124,27 @@ class NCCLBcastKernel : public framework::OpKernel<T> {
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
"This kernel only runs on GPU device."
);
"This kernel only runs on GPU device."
);
int
root
=
ctx
.
Attr
<
int
>
(
"root"
);
int
root
=
ctx
.
Attr
<
int
>
(
"root"
);
auto
*
comm
=
ctx
.
Input
<
Communicator
>
(
"Communicator"
);
auto
*
comm
=
ctx
.
Input
<
Communicator
>
(
"Communicator"
);
auto
stream
=
reinterpret_cast
<
const
platform
::
CUDADeviceContext
&>
(
ctx
.
device_context
())
.
stream
();
// device id
// device id
int
gpu_id
=
boost
::
get
<
platform
::
CUDAPlace
>
(
ctx
.
GetPlace
()).
GetDeviceId
();
int
gpu_id
=
boost
::
get
<
platform
::
CUDAPlace
>
(
ctx
.
GetPlace
()).
GetDeviceId
();
int
idx
=
comm
->
GetCommId
(
gpu_id
);
int
idx
=
comm
->
GetCommId
(
gpu_id
);
if
(
idx
==
root
)
{
if
(
idx
==
root
)
{
auto
ins
=
ctx
.
MultiInput
<
LoDTensor
>
(
"X"
);
auto
*
x
=
ctx
.
Input
<
LoDTensor
>
(
"X"
);
for
(
size_t
i
=
0
;
i
<
ins
.
size
();
++
i
)
{
VLOG
(
3
)
<<
"gpu : "
<<
gpu_id
<<
" invoke Bcast. send "
<<
x
->
numel
();
VLOG
(
1
)
<<
"gpu : "
<<
gpu_id
<<
" invoke Bcast. send "
<<
ins
[
i
]
->
numel
();
VLOG
(
1
)
<<
" before ncclBcast"
;
PADDLE_ENFORCE
(
platform
::
dynload
::
ncclBcast
(
PADDLE_ENFORCE
(
platform
::
dynload
::
ncclBcast
(
(
void
*
)
ins
[
i
]
->
data
<
T
>
(),
ins
[
i
]
->
numel
(),
NCCLTypeWrapper
<
T
>::
type
,
(
void
*
)
x
->
data
<
T
>
(),
x
->
numel
(),
NCCLTypeWrapper
<
T
>::
type
,
root
,
root
,
comm
->
comms
().
at
(
idx
),
stream
));
comm
->
comms
().
at
(
idx
),
ctx
.
cuda_device_context
().
stream
()));
VLOG
(
1
)
<<
" after ncclBcast"
;
VLOG
(
3
)
<<
"gpu : "
<<
gpu_id
<<
" finished Bcast."
;
PADDLE_ENFORCE
(
cudaStreamSynchronize
(
stream
));
VLOG
(
1
)
<<
"gpu : "
<<
gpu_id
<<
" finished Bcast."
;
}
}
else
{
}
else
{
auto
outs
=
ctx
.
MultiOutput
<
LoDTensor
>
(
"Out"
);
auto
*
out
=
ctx
.
Output
<
LoDTensor
>
(
"Out"
);
for
(
size_t
i
=
0
;
i
<
outs
.
size
();
++
i
)
{
VLOG
(
3
)
<<
"gpu : "
<<
gpu_id
<<
" invoke Bcast. recv buffer "
VLOG
(
1
)
<<
"gpu : "
<<
gpu_id
<<
" invoke Bcast. recv buffer "
<<
framework
::
product
(
out
->
dims
());
<<
framework
::
product
(
outs
[
i
]
->
dims
());
PADDLE_ENFORCE
(
platform
::
dynload
::
ncclBcast
(
PADDLE_ENFORCE
(
platform
::
dynload
::
ncclBcast
(
outs
[
i
]
->
mutable_data
<
T
>
(
ctx
.
GetPlace
()),
outs
[
i
]
->
numel
(),
out
->
mutable_data
<
T
>
(
ctx
.
GetPlace
()),
out
->
numel
(),
NCCLTypeWrapper
<
T
>::
type
,
root
,
comm
->
comms
().
at
(
idx
),
stream
));
NCCLTypeWrapper
<
T
>::
type
,
root
,
comm
->
comms
().
at
(
idx
),
PADDLE_ENFORCE
(
cudaStreamSynchronize
(
stream
));
ctx
.
cuda_device_context
().
stream
()));
VLOG
(
3
)
<<
"gpu : "
<<
gpu_id
<<
" finished Bcast. recv "
<<
out
->
numel
();
VLOG
(
1
)
<<
"gpu : "
<<
gpu_id
<<
" finished Bcast. recv "
<<
outs
[
i
]
->
numel
();
}
}
}
}
}
};
};
...
...
paddle/fluid/operators/reader/create_double_buffer_reader_op.cc
View file @ 6ea45823
...
@@ -24,11 +24,31 @@ static constexpr size_t kDoubleBufferSize = 2;
...
@@ -24,11 +24,31 @@ static constexpr size_t kDoubleBufferSize = 2;
class
DoubleBufferReader
:
public
framework
::
DecoratedReader
{
class
DoubleBufferReader
:
public
framework
::
DecoratedReader
{
public:
public:
explicit
DoubleBufferReader
(
ReaderBase
*
reader
)
struct
Item
{
:
DecoratedReader
(
reader
),
Item
()
:
ctx_
(
nullptr
)
{}
buffer_
(
framework
::
MakeChannel
<
std
::
vector
<
framework
::
LoDTensor
>>
(
kDoubleBufferSize
))
{
std
::
vector
<
framework
::
LoDTensor
>
payloads_
;
std
::
thread
prefetch
(
&
DoubleBufferReader
::
PrefetchThreadFunc
,
this
);
platform
::
DeviceContext
*
ctx_
;
};
explicit
DoubleBufferReader
(
ReaderBase
*
reader
,
platform
::
Place
target_place
=
platform
::
CPUPlace
())
:
DecoratedReader
(
reader
),
place_
(
target_place
)
{
for
(
size_t
i
=
0
;
i
<
kDoubleBufferSize
;
++
i
)
{
if
(
platform
::
is_gpu_place
(
place_
))
{
#ifdef PADDLE_WITH_CUDA
ctxs_
.
emplace_back
(
new
platform
::
CUDADeviceContext
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
)));
#endif
}
}
start_thread
();
}
void
start_thread
()
{
buffer_
=
framework
::
MakeChannel
<
Item
>
(
kDoubleBufferSize
);
std
::
thread
prefetch
([
this
]
{
PrefetchThreadFunc
();
});
prefetch
.
detach
();
prefetch
.
detach
();
}
}
...
@@ -42,7 +62,10 @@ class DoubleBufferReader : public framework::DecoratedReader {
...
@@ -42,7 +62,10 @@ class DoubleBufferReader : public framework::DecoratedReader {
private:
private:
void
PrefetchThreadFunc
();
void
PrefetchThreadFunc
();
framework
::
Channel
<
std
::
vector
<
framework
::
LoDTensor
>>*
buffer_
;
framework
::
Channel
<
Item
>*
buffer_
;
platform
::
Place
place_
;
std
::
vector
<
std
::
unique_ptr
<
platform
::
DeviceContext
>>
ctxs_
;
mutable
Item
local_buffer_
;
};
};
class
CreateDoubleBufferReaderOp
:
public
framework
::
OperatorBase
{
class
CreateDoubleBufferReaderOp
:
public
framework
::
OperatorBase
{
...
@@ -56,7 +79,20 @@ class CreateDoubleBufferReaderOp : public framework::OperatorBase {
...
@@ -56,7 +79,20 @@ class CreateDoubleBufferReaderOp : public framework::OperatorBase {
->
Get
<
framework
::
ReaderHolder
>
();
->
Get
<
framework
::
ReaderHolder
>
();
auto
*
out
=
scope
.
FindVar
(
Output
(
"Out"
))
auto
*
out
=
scope
.
FindVar
(
Output
(
"Out"
))
->
template
GetMutable
<
framework
::
ReaderHolder
>();
->
template
GetMutable
<
framework
::
ReaderHolder
>();
out
->
Reset
(
new
DoubleBufferReader
(
underlying_reader
.
Get
()));
auto
place_str
=
Attr
<
std
::
string
>
(
"place"
);
platform
::
Place
place
;
if
(
place_str
==
"CPU"
)
{
place
=
platform
::
CPUPlace
();
}
else
{
std
::
istringstream
sin
(
place_str
);
sin
.
seekg
(
std
::
string
(
"CUDA:"
).
size
(),
std
::
ios
::
beg
);
size_t
num
;
sin
>>
num
;
place
=
platform
::
CUDAPlace
(
static_cast
<
int
>
(
num
));
}
out
->
Reset
(
new
DoubleBufferReader
(
underlying_reader
.
Get
(),
place
));
}
}
};
};
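`CreateDoubleBufferReaderOp` decodes its `place` attribute by comparing against `"CPU"` and otherwise skipping the `"CUDA:"` prefix with `seekg` before reading the device index. The sketch below isolates that parsing step. `PlaceSketch` and `ParsePlace` are hypothetical names; the struct merely stands in for `platform::Place`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for platform::Place: device < 0 means CPU.
struct PlaceSketch {
  int device;
};

// Parses "CPU" or "CUDA:<n>". No validation is done here; in the op the
// attribute is restricted beforehand by the InEnum checker.
inline PlaceSketch ParsePlace(const std::string& place_str) {
  if (place_str == "CPU") return PlaceSketch{-1};
  std::istringstream sin(place_str);
  // Skip the "CUDA:" prefix, then read the device index.
  sin.seekg(std::string("CUDA:").size(), std::ios::beg);
  int num = 0;
  sin >> num;
  return PlaceSketch{num};
}
```

Because the parser trusts its input, it relies on the attribute checker to reject anything outside `"CPU"` and `"CUDA:0"` through `"CUDA:127"`.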
...
@@ -71,44 +107,73 @@ class CreateDoubleBufferReaderOpMaker : public DecoratedReaderMakerBase {
...
@@ -71,44 +107,73 @@ class CreateDoubleBufferReaderOpMaker : public DecoratedReaderMakerBase {
It launches another thread to execute the 'underlying reader' asynchronously,
It launches another thread to execute the 'underlying reader' asynchronously,
which prevents reading process from blocking subsequent training.
which prevents reading process from blocking subsequent training.
)DOC"
);
)DOC"
);
std::unordered_set<std::string> enum_range;
constexpr size_t kMaxCUDADevs = 128;
for (size_t i = 0; i < kMaxCUDADevs; ++i) {
  enum_range.insert(string::Sprintf("CUDA:%d", i));
}
enum_range.insert("CPU");
AddAttr<std::string>("place", "The double buffer place, default is CPU")
    .SetDefault("CPU")
    .InEnum({enum_range});
}
}
};
};
void
DoubleBufferReader
::
ReadNext
(
std
::
vector
<
framework
::
LoDTensor
>*
out
)
{
void
DoubleBufferReader
::
ReadNext
(
std
::
vector
<
framework
::
LoDTensor
>*
out
)
{
out
->
clear
();
if
(
local_buffer_
.
payloads_
.
empty
())
{
buffer_
->
Receive
(
out
);
buffer_
->
Receive
(
&
local_buffer_
);
}
*
out
=
local_buffer_
.
payloads_
;
local_buffer_
.
payloads_
.
clear
();
if
(
local_buffer_
.
ctx_
)
{
local_buffer_
.
ctx_
->
Wait
();
}
}
}
void
DoubleBufferReader
::
ReInit
()
{
void
DoubleBufferReader
::
ReInit
()
{
reader_
->
ReInit
();
reader_
->
ReInit
();
buffer_
->
Close
();
buffer_
->
Close
();
// The existing prefetch thread will terminate for the buffer_ is closed.
start_thread
();
buffer_
=
framework
::
MakeChannel
<
std
::
vector
<
framework
::
LoDTensor
>>
(
kDoubleBufferSize
);
std
::
thread
prefetch
(
&
DoubleBufferReader
::
PrefetchThreadFunc
,
this
);
prefetch
.
detach
();
}
}
void
DoubleBufferReader
::
PrefetchThreadFunc
()
{
void
DoubleBufferReader
::
PrefetchThreadFunc
()
{
VLOG
(
5
)
<<
"A new prefetch thread starts."
;
VLOG
(
5
)
<<
"A new prefetch thread starts."
;
while
(
true
)
{
size_t
gpu_ctx_offset
=
0
;
std
::
vector
<
framework
::
LoDTensor
>
batch
;
while
(
reader_
->
HasNext
())
{
reader_
->
ReadNext
(
&
batch
);
Item
batch
;
if
(
batch
.
empty
())
{
reader_
->
ReadNext
(
&
batch
.
payloads_
);
// EOF
if
(
platform
::
is_gpu_place
(
place_
))
{
buffer_
->
Close
();
std
::
vector
<
framework
::
LoDTensor
>
gpu_batch
;
VLOG
(
5
)
<<
"Reached the end of the file. The prefetch thread terminates."
;
auto
&
gpu_ctx
=
this
->
ctxs_
[
gpu_ctx_offset
++
];
break
;
gpu_ctx_offset
%=
this
->
ctxs_
.
size
();
gpu_batch
.
resize
(
batch
.
payloads_
.
size
());
for
(
size_t
i
=
0
;
i
<
batch
.
payloads_
.
size
();
++
i
)
{
framework
::
TensorCopy
(
batch
.
payloads_
[
i
],
place_
,
*
gpu_ctx
,
&
gpu_batch
[
i
]);
gpu_batch
[
i
].
set_lod
(
batch
.
payloads_
[
i
].
lod
());
}
}
batch
.
ctx_
=
gpu_ctx
.
get
();
std
::
swap
(
gpu_batch
,
batch
.
payloads_
);
}
if
(
!
buffer_
->
Send
(
&
batch
))
{
if
(
!
buffer_
->
Send
(
&
batch
))
{
VLOG
(
5
)
<<
"WARNING: The double buffer channel has been closed. The "
VLOG
(
5
)
<<
"WARNING: The double buffer channel has been closed. The "
"prefetch thread terminates."
;
"prefetch thread terminates."
;
break
;
break
;
}
}
}
}
buffer_
->
Close
();
}
}
bool DoubleBufferReader::HasNext() const { PADDLE_THROW("Not Implemented"); }
bool DoubleBufferReader::HasNext() const {
  if (local_buffer_.payloads_.empty()) {
    bool ok = buffer_->Receive(&local_buffer_);
    return ok;
  } else {
    return true;
  }
}
}
// namespace reader
}
// namespace reader
}
// namespace operators
}
// namespace operators
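The double-buffer reader runs the underlying reader on a separate thread and hands batches to the consumer through a channel of capacity `kDoubleBufferSize`. The following is a minimal sketch of that producer/consumer shape using only the standard library. `BoundedChannel` and `PrefetchAll` are hypothetical names; the channel is an assumption about how `framework::Channel` behaves (blocking send and receive, `Receive` returning false once closed and drained), not its actual implementation. Unlike the diff, the sketch joins the worker instead of detaching it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded channel standing in for framework::Channel<T>.
template <typename T>
class BoundedChannel {
 public:
  explicit BoundedChannel(std::size_t cap) : cap_(cap) {}

  // Blocks while the channel is full; returns false if it was closed.
  bool Send(T v) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] { return closed_ || q_.size() < cap_; });
    if (closed_) return false;
    q_.push(std::move(v));
    cv_.notify_all();
    return true;
  }

  // Blocks while the channel is empty; returns false once closed and drained.
  bool Receive(T* out) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] { return closed_ || !q_.empty(); });
    if (q_.empty()) return false;
    *out = std::move(q_.front());
    q_.pop();
    cv_.notify_all();
    return true;
  }

  void Close() {
    std::lock_guard<std::mutex> lk(mu_);
    closed_ = true;
    cv_.notify_all();
  }

 private:
  std::size_t cap_;
  std::queue<T> q_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool closed_ = false;
};

// Produces n integer "batches" on a worker thread while the caller drains them.
inline std::vector<int> PrefetchAll(int n) {
  BoundedChannel<int> ch(2);  // same capacity as kDoubleBufferSize
  std::thread producer([&ch, n] {
    for (int i = 0; i < n; ++i) {
      if (!ch.Send(i)) return;  // channel closed early
    }
    ch.Close();  // signals end of data to the consumer
  });
  std::vector<int> got;
  int v = 0;
  while (ch.Receive(&v)) got.push_back(v);
  producer.join();
  return got;
}
```

Calling `PrefetchAll(5)` returns the five batches in production order; the capacity of two means the producer can stay at most two batches ahead of the consumer.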
...
...
paddle/fluid/operators/reader/create_random_data_generator_op.cc
View file @ 6ea45823
...
@@ -19,11 +19,11 @@ namespace operators {
...
@@ -19,11 +19,11 @@ namespace operators {
namespace
reader
{
namespace
reader
{
template
<
typename
T
>
template
<
typename
T
>
class
RandomDataGenerator
:
public
framework
::
FileReader
{
class
RandomDataGenerator
:
public
framework
::
ReaderBase
{
public:
public:
RandomDataGenerator
(
const
std
::
vector
<
framework
::
DDim
>&
shapes
,
float
min
,
RandomDataGenerator
(
const
std
::
vector
<
framework
::
DDim
>&
shapes
,
float
min
,
float
max
)
float
max
)
:
FileReader
(
shapes
),
min_
(
min
),
max_
(
max
)
{
:
framework
::
ReaderBase
(),
min_
(
min
),
max_
(
max
),
shapes_
(
shapes
)
{
PADDLE_ENFORCE_LE
(
PADDLE_ENFORCE_LE
(
min
,
max
,
"'min' shouldn't be greater than 'max'.(%f vs %f)"
,
min
,
max
);
min
,
max
,
"'min' shouldn't be greater than 'max'.(%f vs %f)"
,
min
,
max
);
unsigned
int
seed
=
std
::
random_device
()();
unsigned
int
seed
=
std
::
random_device
()();
...
@@ -59,6 +59,7 @@ class RandomDataGenerator : public framework::FileReader {
...
@@ -59,6 +59,7 @@ class RandomDataGenerator : public framework::FileReader {
float
max_
;
float
max_
;
std
::
minstd_rand
engine_
;
std
::
minstd_rand
engine_
;
std
::
uniform_real_distribution
<
float
>
dist_
;
std
::
uniform_real_distribution
<
float
>
dist_
;
std
::
vector
<
framework
::
DDim
>
shapes_
;
};
};
template
<
typename
T
>
template
<
typename
T
>
...
...
paddle/fluid/operators/reader/create_recordio_file_reader_op.cc
View file @ 6ea45823
...
@@ -20,21 +20,22 @@ namespace operators {
...
@@ -20,21 +20,22 @@ namespace operators {
namespace
reader
{
namespace
reader
{
class
RecordIOFileReader
:
public
framework
::
FileReader
{
class
RecordIOFileReader
:
public
framework
::
FileReader
{
public:
public:
RecordIOFileReader(const std::string& filename,
explicit RecordIOFileReader(const std::string& filename,
const std::vector<framework::DDim>& shapes)
const std::vector<framework::DDim>& dims)
: FileReader(shapes),
: FileReader(dims),
scanner_
(
filename
),
scanner_
(
filename
),
dev_ctx_
(
*
platform
::
DeviceContextPool
::
Instance
().
Get
(
dev_ctx_
(
*
platform
::
DeviceContextPool
::
Instance
().
Get
(
platform
::
CPUPlace
()))
{}
platform
::
CPUPlace
()))
{}
void
ReadNext
(
std
::
vector
<
framework
::
LoDTensor
>*
out
)
override
{
*
out
=
framework
::
ReadFromRecordIO
(
scanner_
,
dev_ctx_
);
}
bool
HasNext
()
const
override
{
return
scanner_
.
HasNext
();
}
bool
HasNext
()
const
override
{
return
scanner_
.
HasNext
();
}
void
ReInit
()
override
{
scanner_
.
Reset
();
}
void
ReInit
()
override
{
scanner_
.
Reset
();
}
protected:
void
ReadNextImpl
(
std
::
vector
<
framework
::
LoDTensor
>*
out
)
override
{
*
out
=
framework
::
ReadFromRecordIO
(
scanner_
,
dev_ctx_
);
}
private:
private:
recordio
::
Scanner
scanner_
;
recordio
::
Scanner
scanner_
;
const
platform
::
DeviceContext
&
dev_ctx_
;
const
platform
::
DeviceContext
&
dev_ctx_
;
...
@@ -54,12 +55,12 @@ class CreateRecordIOReaderOp : public framework::OperatorBase {
...
@@ -54,12 +55,12 @@ class CreateRecordIOReaderOp : public framework::OperatorBase {
int
(
shape_concat
.
size
()),
int
(
shape_concat
.
size
()),
"The accumulate of all ranks should be equal to the "
"The accumulate of all ranks should be equal to the "
"shape concat's length."
);
"shape concat's length."
);
std
::
vector
<
framework
::
DDim
>
shapes
=
RestoreShapes
(
shape_concat
,
ranks
);
std
::
string
filename
=
Attr
<
std
::
string
>
(
"filename"
);
std
::
string
filename
=
Attr
<
std
::
string
>
(
"filename"
);
auto
*
out
=
scope
.
FindVar
(
Output
(
"Out"
))
auto
*
out
=
scope
.
FindVar
(
Output
(
"Out"
))
->
template
GetMutable
<
framework
::
ReaderHolder
>();
->
template
GetMutable
<
framework
::
ReaderHolder
>();
out
->
Reset
(
new
RecordIOFileReader
(
filename
,
shapes
));
out
->
Reset
(
new
RecordIOFileReader
(
filename
,
RestoreShapes
(
shape_concat
,
ranks
)));
}
}
};
};
...
@@ -85,3 +86,5 @@ namespace reader = paddle::operators::reader;
...
@@ -85,3 +86,5 @@ namespace reader = paddle::operators::reader;
REGISTER_FILE_READER_OPERATOR
(
create_recordio_file_reader
,
REGISTER_FILE_READER_OPERATOR
(
create_recordio_file_reader
,
reader
::
CreateRecordIOReaderOp
,
reader
::
CreateRecordIOReaderOp
,
reader
::
CreateRecordIOReaderOpMaker
);
reader
::
CreateRecordIOReaderOpMaker
);
REGISTER_FILE_READER
(
recordio
,
reader
::
RecordIOFileReader
);
paddle/fluid/operators/reader/create_shuffle_reader_op.cc
View file @ 6ea45823
...
@@ -12,6 +12,9 @@
...
@@ -12,6 +12,9 @@
// See the License for the specific language governing permissions and
// See the License for the specific language governing permissions and
// limitations under the License.
// limitations under the License.
#include <random>
#include "glog/logging.h"
#include "paddle/fluid/operators/detail/safe_ref.h"
#include "paddle/fluid/operators/reader/reader_op_registry.h"
#include "paddle/fluid/operators/reader/reader_op_registry.h"
namespace
paddle
{
namespace
paddle
{
...
@@ -20,43 +23,53 @@ namespace reader {
...
@@ -20,43 +23,53 @@ namespace reader {
class
ShuffleReader
:
public
framework
::
DecoratedReader
{
class
ShuffleReader
:
public
framework
::
DecoratedReader
{
public:
public:
ShuffleReader
(
ReaderBase
*
reader
,
int
buffer_size
)
ShuffleReader
(
ReaderBase
*
reader
,
size_t
buffer_size
,
size_t
seed
=
0
)
:
DecoratedReader
(
reader
),
buffer_size_
(
buffer_size
),
iteration_pos_
(
0
)
{
:
DecoratedReader
(
reader
),
buffer_size_
(
buffer_size
),
seed_
(
seed
)
{
buffer_
.
reserve
(
buffer_size
);
VLOG
(
10
)
<<
"Create shuffle reader of "
<<
reader_
;
if
(
seed_
==
0
)
{
std
::
random_device
device
;
seed_
=
device
();
}
ReadIntoBuffers
();
}
}
void
ReadNext
(
std
::
vector
<
framework
::
LoDTensor
>*
out
)
override
;
void
ReadNext
(
std
::
vector
<
framework
::
LoDTensor
>*
out
)
override
{
if
(
iteration_pos_
>=
buffer_
.
size
())
{
VLOG
(
10
)
<<
"Resetting shuffle buffer"
;
ReadIntoBuffers
();
}
*
out
=
buffer_
[
iteration_pos_
++
];
}
private:
bool
HasNext
()
const
override
{
int
buffer_size_
;
return
iteration_pos_
<
buffer_
.
size
()
||
reader_
->
HasNext
();
std
::
vector
<
std
::
vector
<
framework
::
LoDTensor
>>
buffer_
;
}
size_t
iteration_pos_
;
};
void
ShuffleReader
::
ReadNext
(
std
::
vector
<
framework
::
LoDTensor
>*
out
)
{
private:
if
(
iteration_pos_
>=
buffer_
.
size
())
{
void
ReadIntoBuffers
()
{
// Reload buffer with new data
buffer_
.
clear
();
buffer_
.
clear
();
buffer_
.
reserve
(
buffer_size_
);
buffer_
.
reserve
(
buffer_size_
);
for
(
int
i
=
0
;
i
<
buffer_size_
;
++
i
)
{
iteration_pos_
=
0
;
buffer_
.
push_back
(
std
::
vector
<
framework
::
LoDTensor
>
());
PADDLE_ENFORCE
(
reader_
->
HasNext
());
reader_
->
ReadNext
(
&
buffer_
.
back
());
for
(
size_t
i
=
0
;
i
<
buffer_size_
;
++
i
)
{
if
(
buffer_
.
back
().
empty
())
{
if
(
!
reader_
->
HasNext
())
{
buffer_
.
pop_back
();
break
;
break
;
}
}
buffer_
.
emplace_back
();
reader_
->
ReadNext
(
&
buffer_
.
back
());
}
}
// TODO(fengjiayi): 'std::random_shuffle' can be very slow. It needs to be
std
::
mt19937
g
(
seed_
);
// optimize.
std
::
shuffle
(
buffer_
.
begin
(),
buffer_
.
end
(),
g
);
std
::
random_shuffle
(
buffer_
.
begin
(),
buffer_
.
end
());
seed_
=
g
();
// update seed_;
iteration_pos_
=
0
;
VLOG
(
10
)
<<
"random buffer size = "
<<
buffer_
.
size
();
}
out
->
clear
();
if
(
!
buffer_
.
empty
())
{
std
::
swap
(
*
out
,
buffer_
[
iteration_pos_
++
]);
}
}
// if buffer_ is empty, the 'out' will return as an empty vector.
}
size_t
buffer_size_
;
std
::
vector
<
std
::
vector
<
framework
::
LoDTensor
>>
buffer_
;
size_t
iteration_pos_
;
size_t
seed_
;
};
class
CreateShuffleReaderOp
:
public
framework
::
OperatorBase
{
class
CreateShuffleReaderOp
:
public
framework
::
OperatorBase
{
public:
public:
...
@@ -67,10 +80,10 @@ class CreateShuffleReaderOp : public framework::OperatorBase {
...
@@ -67,10 +80,10 @@ class CreateShuffleReaderOp : public framework::OperatorBase {
const
platform
::
Place
&
dev_place
)
const
override
{
const
platform
::
Place
&
dev_place
)
const
override
{
const
auto
&
underlying_reader
=
scope
.
FindVar
(
Input
(
"UnderlyingReader"
))
const
auto
&
underlying_reader
=
scope
.
FindVar
(
Input
(
"UnderlyingReader"
))
->
Get
<
framework
::
ReaderHolder
>
();
->
Get
<
framework
::
ReaderHolder
>
();
auto
*
out
=
scope
.
FindVar
(
Output
(
"Out"
))
auto
&
var
=
detail
::
Ref
(
scope
.
FindVar
(
Output
(
"Out"
)));
->
template
GetMutable
<
framework
::
ReaderHolder
>();
var
.
GetMutable
<
framework
::
ReaderHolder
>
()
->
Reset
(
out
->
Reset
(
new
ShuffleReader
(
underlying_reader
.
Get
(),
new
ShuffleReader
(
underlying_reader
.
Get
(),
Attr
<
int
>
(
"buffer_size"
)));
static_cast
<
size_t
>
(
Attr
<
int
>
(
"buffer_size"
)
)));
}
}
};
};
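The seeded variant of `ShuffleReader` refills its buffer from the underlying reader, shuffles it with a `std::mt19937` seeded from `seed_`, and then replaces `seed_` with the next engine output. A minimal sketch of that refill step over plain integers follows. `FillAndShuffle` is a hypothetical function, and the `source`/`pos` pair stands in for the underlying reader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: takes up to buffer_size items from `source` starting at
// *pos, shuffles them with an engine seeded from *seed, then advances *seed.
inline std::vector<int> FillAndShuffle(const std::vector<int>& source,
                                       std::size_t* pos,
                                       std::size_t buffer_size,
                                       std::size_t* seed) {
  std::vector<int> buffer;
  buffer.reserve(buffer_size);
  for (std::size_t i = 0; i < buffer_size && *pos < source.size(); ++i) {
    buffer.push_back(source[(*pos)++]);
  }
  std::mt19937 g(static_cast<std::mt19937::result_type>(*seed));
  std::shuffle(buffer.begin(), buffer.end(), g);
  *seed = g();  // the next refill starts from a different engine state
  return buffer;
}
```

With a fixed initial seed the sequence of permutations is reproducible on a given standard library, which is presumably the point of replacing `std::random_shuffle`; the last refill may return fewer than `buffer_size` items.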
...
...
paddle/fluid/operators/reader/reader_op_registry.cc
View file @ 6ea45823
...
@@ -31,6 +31,11 @@ std::vector<framework::DDim> RestoreShapes(const std::vector<int>& shape_concat,
...
@@ -31,6 +31,11 @@ std::vector<framework::DDim> RestoreShapes(const std::vector<int>& shape_concat,
return
res
;
return
res
;
}
}
std::unordered_map<std::string, FileReaderCreator>& FileReaderRegistry() {
  static std::unordered_map<std::string, FileReaderCreator> regs;
  return regs;
}
FileReaderMakerBase
::
FileReaderMakerBase
(
FileReaderMakerBase
::
FileReaderMakerBase
(
framework
::
OpProtoAndCheckerMaker
::
OpProto
*
op_proto
,
framework
::
OpProtoAndCheckerMaker
::
OpProto
*
op_proto
,
framework
::
OpAttrChecker
*
op_checker
)
framework
::
OpAttrChecker
*
op_checker
)
...
...
paddle/fluid/operators/reader/reader_op_registry.h
View file @ 6ea45823
...
@@ -21,6 +21,20 @@ namespace paddle {
...
@@ -21,6 +21,20 @@ namespace paddle {
namespace
operators
{
namespace
operators
{
namespace
reader
{
namespace
reader
{
using FileReaderCreator = std::function<framework::ReaderBase*(
    const std::string&, const std::vector<framework::DDim>&)>;

std::unordered_map<std::string, FileReaderCreator>& FileReaderRegistry();

template <typename Reader>
int RegisterFileReader(const std::string& filetype) {
  FileReaderRegistry()[filetype] = [](
      const std::string& fn, const std::vector<paddle::framework::DDim>& dim) {
    return new Reader(fn, dim);
  };
  return 0;
}
extern
std
::
vector
<
framework
::
DDim
>
RestoreShapes
(
extern
std
::
vector
<
framework
::
DDim
>
RestoreShapes
(
const
std
::
vector
<
int
>&
shape_concat
,
const
std
::
vector
<
int
>&
ranks
);
const
std
::
vector
<
int
>&
shape_concat
,
const
std
::
vector
<
int
>&
ranks
);
...
@@ -73,3 +87,15 @@ class DecoratedReaderMakerBase : public framework::OpProtoAndCheckerMaker {
...
@@ -73,3 +87,15 @@ class DecoratedReaderMakerBase : public framework::OpProtoAndCheckerMaker {
paddle::operators::reader::DecoratedReaderInferShape, \
paddle::operators::reader::DecoratedReaderInferShape, \
paddle::framework::EmptyGradOpMaker, \
paddle::framework::EmptyGradOpMaker, \
paddle::operators::reader::DecoratedReaderInferVarType)
paddle::operators::reader::DecoratedReaderInferVarType)
#define REGISTER_FILE_READER(_filetype, _reader) \
STATIC_ASSERT_GLOBAL_NAMESPACE( \
_reg_file_reader_##_filetype, \
"Must use REGISTER_FILE_READER in global namespace"); \
int TouchFileReader##_filetype() { return 0; } \
int _reg_file_reader_entry_##_filetype = \
paddle::operators::reader::RegisterFileReader<_reader>(#_filetype)
#define USE_FILE_READER(filetype) \
extern int TouchFileReader##filetype(); \
static int _use_##filetype = TouchFileReader##filetype()
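`FileReaderRegistry()` plus `RegisterFileReader<Reader>()` form a self-registering factory: a function-local static map holds one creator per file type, and a global `int` initialized by the registration call populates it before `main` runs. The following is a minimal sketch of that pattern with simplified types. `ReaderSketch`, `Creator`, `Registry`, `RegisterReader` and `RecordIOSketch` are all hypothetical names, and the creator takes only a filename rather than the filename and dims used in the diff.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for framework::ReaderBase.
struct ReaderSketch {
  virtual ~ReaderSketch() = default;
  virtual std::string Kind() const = 0;
};

using Creator = std::function<ReaderSketch*(const std::string&)>;

// A function-local static is constructed on first use, so registration from
// other translation units cannot observe an unconstructed map.
inline std::unordered_map<std::string, Creator>& Registry() {
  static std::unordered_map<std::string, Creator> regs;
  return regs;
}

template <typename Reader>
int RegisterReader(const std::string& filetype) {
  Registry()[filetype] = [](const std::string& fn) -> ReaderSketch* {
    return new Reader(fn);
  };
  return 0;
}

struct RecordIOSketch : ReaderSketch {
  explicit RecordIOSketch(const std::string& fn) : filename(fn) {}
  std::string Kind() const override { return "recordio:" + filename; }
  std::string filename;
};

// Registration happens during static initialization, as the macro arranges.
static int reg_recordio_sketch = RegisterReader<RecordIOSketch>("recordio");
```

A caller can then look up `Registry().at("recordio")` and invoke the creator with a filename. The `TouchFileReader` function emitted by the macro plays a separate role: referencing it from `USE_FILE_READER` keeps the linker from discarding the object file that contains the registration.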
paddle/fluid/operators/reduce_op.cc
View file @ 6ea45823
...
@@ -173,6 +173,15 @@ class ReduceMinOpMaker : public ReduceOpMaker {
...
@@ -173,6 +173,15 @@ class ReduceMinOpMaker : public ReduceOpMaker {
}
}
};
};
class ReduceProdOpMaker : public ReduceOpMaker {
 public:
  ReduceProdOpMaker(OpProto *proto, OpAttrChecker *op_checker)
      : ReduceOpMaker(proto, op_checker) {
    SetComment("ReduceProd", "production");
    AddComment(comment_);
  }
};
}
// namespace operators
}
// namespace operators
}
// namespace paddle
}
// namespace paddle
...
@@ -190,6 +199,9 @@ REGISTER_OP(reduce_max, ops::ReduceOp, ops::ReduceMaxOpMaker, reduce_max_grad,
...
@@ -190,6 +199,9 @@ REGISTER_OP(reduce_max, ops::ReduceOp, ops::ReduceMaxOpMaker, reduce_max_grad,
REGISTER_OP
(
reduce_min
,
ops
::
ReduceOp
,
ops
::
ReduceMinOpMaker
,
reduce_min_grad
,
REGISTER_OP
(
reduce_min
,
ops
::
ReduceOp
,
ops
::
ReduceMinOpMaker
,
reduce_min_grad
,
ops
::
ReduceGradOp
);
ops
::
ReduceGradOp
);
REGISTER_OP
(
reduce_prod
,
ops
::
ReduceOp
,
ops
::
ReduceProdOpMaker
,
reduce_prod_grad
,
ops
::
ReduceGradOp
);
#define REGISTER_REDUCE_CPU_KERNEL(reduce_type, functor, grad_functor) \
#define REGISTER_REDUCE_CPU_KERNEL(reduce_type, functor, grad_functor) \
REGISTER_OP_CPU_KERNEL(reduce_type, \
REGISTER_OP_CPU_KERNEL(reduce_type, \
ops::ReduceKernel<paddle::platform::CPUDeviceContext, \
ops::ReduceKernel<paddle::platform::CPUDeviceContext, \
...
...
paddle/fluid/operators/reduce_op.h
View file @ 6ea45823
@@ -93,6 +93,22 @@ struct MaxOrMinGradFunctor {
   }
 };
+struct ProdFunctor {
+  template <typename DeviceContext, typename X, typename Y, typename Dim>
+  void operator()(const DeviceContext& place, X& x, Y& y, const Dim& dim) {
+    y.device(place) = x.prod(dim);
+  }
+};
+struct ProdGradFunctor {
+  template <typename DeviceContext, typename X, typename Y, typename DX,
+            typename DY, typename Dim>
+  void operator()(const DeviceContext& place, X& x, Y& y, DX& dx, DY& dy,
+                  const Dim& dim, int size) {
+    dx.device(place) = dy.broadcast(dim) * y.broadcast(dim) * x.inverse();
+  }
+};
 template <typename DeviceContext, typename T, typename Functor>
 class ReduceKernel : public framework::OpKernel<T> {
  public:
@@ -254,4 +270,5 @@ class ReduceGradKernel : public framework::OpKernel<T> {
   __macro(reduce_sum, SumFunctor, SumGradFunctor); \
   __macro(reduce_mean, MeanFunctor, MeanGradFunctor); \
   __macro(reduce_max, MaxFunctor, MaxOrMinGradFunctor); \
-  __macro(reduce_min, MinFunctor, MaxOrMinGradFunctor);
+  __macro(reduce_min, MinFunctor, MaxOrMinGradFunctor); \
+  __macro(reduce_prod, ProdFunctor, ProdGradFunctor);
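The gradient expression in `ProdGradFunctor` follows from the product rule: for `y = x_1 * ... * x_n`, the partial derivative with respect to `x_i` is `y / x_i`, so `dx = dy * y * (1 / x)`. The sketch below checks that identity against a central finite difference for a 1-D input in plain Python. The helper names (`prod_forward`, `prod_grad`, `numeric_grad`) are illustrative and are not part of Paddle. Because the formula divides by every input element, an input containing zeros would not produce a finite gradient this way.

```python
from functools import reduce
import operator


def prod_forward(xs):
    """Product of all elements (the reduce_all case)."""
    return reduce(operator.mul, xs, 1.0)


def prod_grad(xs, dy=1.0):
    """Analytic gradient dx_i = dy * y / x_i, mirroring dy * y * x.inverse()."""
    y = prod_forward(xs)
    return [dy * y / x for x in xs]


def numeric_grad(xs, eps=1e-6):
    """Central finite difference, used only to cross-check prod_grad."""
    grads = []
    for i in range(len(xs)):
        hi = list(xs)
        lo = list(xs)
        hi[i] += eps
        lo[i] -= eps
        grads.append((prod_forward(hi) - prod_forward(lo)) / (2 * eps))
    return grads


xs = [0.2, 0.3, 0.5, 0.9]
analytic = prod_grad(xs)
numeric = numeric_grad(xs)
```

The two lists agree to within floating-point error, which is the same property the `check_grad` unit test for `reduce_prod` relies on.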
paddle/fluid/operators/send_op.cc
View file @ 6ea45823
@@ -88,6 +88,12 @@ class SendOp : public framework::OperatorBase {
       rpc_client->AsyncGetVariable(epmap[i], ctx, scope, outs[i]);
     }
     PADDLE_ENFORCE(rpc_client->Wait());
+    // tell pservers that the current trainer has called fetch
+    for (auto& ep : endpoints) {
+      VLOG(3) << "send fetch barrier, ep: " << ep;
+      rpc_client->AsyncSendFetchBarrier(ep);
+    }
+    PADDLE_ENFORCE(rpc_client->Wait());
   }
 };
python/paddle/fluid/distribute_transpiler.py
View file @ 6ea45823
@@ -250,6 +250,8 @@ class DistributeTranspiler:
     def get_trainer_program(self):
         # remove optimize ops and add a send op to main_program
         self.program.global_block().delete_ops(self.optimize_ops)
+        # FIXME(typhoonzero): serializing once fixes an error that occurs when cloning.
+        self.program.__str__()
         return self.program
     def get_pserver_program(self, endpoint):
@@ -309,7 +311,8 @@ class DistributeTranspiler:
             for _, opt_op in enumerate(opt_op_on_pserver):
                 if ufind.is_connected(op, opt_op):
                     if self._is_opt_op(op):
-                        self._append_pserver_ops(optimize_block, op, endpoint)
+                        self._append_pserver_ops(optimize_block, op, endpoint,
+                                                 default_main_program())
                     else:
                         self._append_pserver_non_opt_ops(optimize_block, op)
                     break
@@ -520,7 +523,8 @@ class DistributeTranspiler:
         orig_var_name = varname[:suff_idx]
         return orig_var_name
-    def _append_pserver_ops(self, optimize_block, opt_op, endpoint):
+    def _append_pserver_ops(self, optimize_block, opt_op, endpoint,
+                            origin_program):
         program = optimize_block.program
         pserver_block = program.global_block()
         new_inputs = dict()
@@ -576,7 +580,17 @@ class DistributeTranspiler:
             elif key == "LearningRate":
                 # learning rate variable has already been created by a non-optimize op,
                 # don't create it once again.
-                new_inputs[key] = pserver_block.vars[opt_op.input(key)[0]]
+                lr_varname = opt_op.input(key)[0]
+                if pserver_block.vars.has_key(lr_varname):
+                    new_inputs[key] = pserver_block.vars[opt_op.input(key)[0]]
+                else:
+                    origin_var = origin_program.global_block().vars[lr_varname]
+                    tmpvar = pserver_block.create_var(
+                        name=origin_var.name,
+                        persistable=origin_var.persistable,
+                        dtype=origin_var.dtype,
+                        shape=origin_var.shape)
+                    new_inputs[key] = tmpvar
         for key in opt_op.input_names:
             new_shape = None
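The `LearningRate` branch in the last hunk is a "reuse if present, otherwise recreate from the original program" lookup. A minimal sketch of that control flow, using plain dictionaries in place of Paddle blocks and variables, might look as follows. The function name `resolve_learning_rate_var` and the dictionary layout are hypothetical, and storing the new entry in `pserver_vars` is an assumption about what `create_var` does to the block.

```python
def resolve_learning_rate_var(pserver_vars, origin_vars, lr_varname):
    """Return the pserver-side variable, creating it from the origin program if absent."""
    if lr_varname in pserver_vars:
        # Already created by a non-optimize op: reuse it.
        return pserver_vars[lr_varname]
    origin_var = origin_vars[lr_varname]
    # Copy only the attributes the hunk forwards to create_var.
    created = {k: origin_var[k] for k in ("name", "persistable", "dtype", "shape")}
    pserver_vars[lr_varname] = created  # assumed side effect of create_var
    return created


origin_vars = {
    "learning_rate_0": {
        "name": "learning_rate_0",
        "persistable": True,
        "dtype": "float32",
        "shape": (1,),
        "extra": "not forwarded",
    }
}
pserver_vars = {}
first = resolve_learning_rate_var(pserver_vars, origin_vars, "learning_rate_0")
second = resolve_learning_rate_var(pserver_vars, origin_vars, "learning_rate_0")
```

The second call returns the object created by the first, which corresponds to the `has_key` branch in the diff.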
python/paddle/fluid/layers/io.py
View file @ 6ea45823
@@ -21,7 +21,7 @@ from ..executor import global_scope
 __all__ = [
     'data', 'BlockGuardServ', 'ListenAndServ', 'Send', 'open_recordio_file',
-    'read_file'
+    'read_file', 'create_shuffle_reader', 'create_double_buffer_reader'
 ]
@@ -245,6 +245,8 @@ def monkey_patch_reader_methods(reader):
     reader.eof = eof
     reader.reset = reset
+    reader.stop_gradient = True
+    reader.persistable = True
     return reader
@@ -285,6 +287,33 @@ def open_recordio_file(filename, shapes, lod_levels, dtypes):
                              startup_var)
+def __create_decorated_reader__(op_type, reader, attrs):
+    var_name = unique_name(op_type)
+    startup_blk = default_startup_program().current_block()
+    startup_var = startup_blk.create_var(name=var_name)
+    startup_blk.append_op(
+        type=op_type,
+        inputs={'UnderlyingReader': reader},
+        outputs={'Out': [startup_var]},
+        attrs=attrs)
+    startup_var.persistable = True
+    return _copy_reader_var_(default_main_program().current_block(),
+                             startup_var)
+def create_shuffle_reader(reader, buffer_size):
+    return __create_decorated_reader__('create_shuffle_reader', reader,
+                                       {'buffer_size': int(buffer_size)})
+def create_double_buffer_reader(reader, place=None):
+    attrs = dict()
+    if place is not None:
+        attrs['place'] = str(place).upper()
+    return __create_decorated_reader__('create_double_buffer_reader', reader,
+                                       attrs)
 def read_file(file_obj):
     helper = LayerHelper('read_file')
     out = [
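`create_shuffle_reader` only wires a `create_shuffle_reader` op with a `buffer_size` attribute into the startup program; the operator's implementation is not part of this file's diff. As a rough illustration of what a buffered shuffle typically does, the sketch below fills a window of `buffer_size` items, shuffles it, and emits it before reading further. This is an assumed model of the behaviour, not the operator's actual code, and `buffered_shuffle` is a hypothetical name.

```python
import random


def buffered_shuffle(reader, buffer_size, seed=None):
    """Yield items from `reader`, shuffled within windows of `buffer_size` items."""
    rng = random.Random(seed)
    buf = []
    for item in reader:
        buf.append(item)
        if len(buf) >= buffer_size:
            rng.shuffle(buf)
            for x in buf:
                yield x
            buf = []
    # Flush the final, possibly shorter, window.
    rng.shuffle(buf)
    for x in buf:
        yield x


shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=0))
```

Under this model every input item is still produced exactly once per pass, which is consistent with the unit test in this commit asserting that the number of batches read equals the number written.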
python/paddle/fluid/layers/nn.py
View file @ 6ea45823
@@ -49,6 +49,7 @@ __all__ = [
     'reduce_mean',
     'reduce_max',
     'reduce_min',
+    'reduce_prod',
     'sequence_first_step',
     'sequence_last_step',
     'dropout',
@@ -84,13 +85,12 @@ def fc(input,
...
@@ -84,13 +85,12 @@ def fc(input,
**Fully Connected Layer**
**Fully Connected Layer**
The fully connected layer can take multiple tensors as its inputs. It
The fully connected layer can take multiple tensors as its inputs. It
creates a variable (one for each input tensor) called weights for each
creates a variable called weights for each input tensor, which represents
input tensor, which represents a fully connected weight matrix from
a fully connected weight matrix from each input unit to each output unit.
each input unit to each output unit. The fully connected layer
The fully connected layer multiplies each input tensor with its coresponding
multiplies each input tensor with its coresponding weight to produce
weight to produce an output Tensor. If multiple input tensors are given,
an output Tensor. If multiple input tensors are given, the results of
the results of multiple multiplications will be sumed up. If bias_attr is
multiple multiplications will be sumed up. If bias_attr is not None,
not None, a bias variable will be created and added to the output. Finally,
a biases variable will be created and added to the output. Finally,
if activation is not None, it will be applied to the output as well.
if activation is not None, it will be applied to the output as well.
This process can be formulated as follows:
This process can be formulated as follows:
...
@@ -109,44 +109,27 @@ def fc(input,
...
@@ -109,44 +109,27 @@ def fc(input,
* :math:`Out`: The output tensor.
* :math:`Out`: The output tensor.
Args:
Args:
input(Variable|list): The input tensor(s) to the fully connected layer.
input (Variable|list of Variable): The input tensor(s) of this layer, and the dimension of
size(int): The number of output units in the fully connected layer.
the input tensor(s) is at least 2.
num_flatten_dims(int): The fc layer can accept an input tensor with more
size(int): The number of output units in this layer.
than two dimensions. If this happens, the
num_flatten_dims (int, default 1): The fc layer can accept an input tensor with more than
multidimensional tensor will first be flattened
two dimensions. If this happens, the multidimensional tensor will first be flattened
into a 2-dimensional matrix. The parameter
into a 2-dimensional matrix. The parameter `num_flatten_dims` determines how the input
`num_flatten_dims` determines how the input tensor
tensor is flattened: the first `num_flatten_dims` (inclusive, index starts from 1)
is flattened: the first `num_flatten_dims`
dimensions will be flatten to form the first dimension of the final matrix (height of
(inclusive, index starts from 1) dimensions will
the matrix), and the rest `rank(X) - num_flatten_dims` dimensions are flattened to
be flatten to form the first dimension of the
form the second dimension of the final matrix (width of the matrix). For example, suppose
final matrix (height of the matrix), and the rest
`X` is a 6-dimensional tensor with a shape [2, 3, 4, 5, 6], and `num_flatten_dims` = 3.
`rank(X) - num_flatten_dims` dimensions are
Then, the flattened matrix will have a shape [2 x 3 x 4, 5 x 6] = [24, 30].
flattened to form the second dimension of the
param_attr (ParamAttr|list of ParamAttr, default None): The parameter attribute for learnable
final matrix (width of the matrix). For example,
parameters/weights of this layer.
suppose `X` is a 6-dimensional tensor with a shape
bias_attr (ParamAttr|list of ParamAttr, default None): The parameter attribute for the bias
[2, 3, 4, 5, 6], and `num_flatten_dims` = 3. Then,
of this layer. If it is set to None, no bias will be added to the output units.
the flattened matrix will have a shape
act (str, default None): Activation to be applied to the output of this layer.
[2 x 3 x 4, 5 x 6] = [24, 30]. By default,
name (str, default None): The name of this layer.
`num_flatten_dims` is set to 1.
param_attr(ParamAttr|list): The parameter attribute for learnable
parameters/weights of the fully connected
layer.
param_initializer(ParamAttr|list): The initializer used for the
weight/parameter. If set None,
XavierInitializer() will be used.
bias_attr(ParamAttr|list): The parameter attribute for the bias parameter
for this layer. If set None, no bias will be
added to the output units.
bias_initializer(ParamAttr|list): The initializer used for the bias.
If set None, then ConstantInitializer()
will be used.
act(str): Activation to be applied to the output of the fully connected
layer.
name(str): Name/alias of the fully connected layer.
Returns:
Returns:
Variable: The output tensor variable
.
A tensor variable storing the transformation result
.
Raises:
Raises:
ValueError: If rank of the input tensor is less than 2.
ValueError: If rank of the input tensor is less than 2.
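The `num_flatten_dims` rule in the `fc` docstring reduces to two products over the input shape. The sketch below reproduces the docstring's worked example in plain Python. `flattened_shape` is a hypothetical helper, not a Paddle function, and the bounds check on `num_flatten_dims` is an assumption made so that both resulting dimensions are non-empty.

```python
from functools import reduce
import operator


def flattened_shape(shape, num_flatten_dims=1):
    """Return [height, width] of the 2-D matrix fc would multiply with its weight."""
    if len(shape) < 2:
        raise ValueError("rank of the input tensor must be at least 2")
    if not 1 <= num_flatten_dims < len(shape):
        # Assumed constraint: leave at least one dimension on each side.
        raise ValueError("num_flatten_dims must be in [1, rank - 1]")
    height = reduce(operator.mul, shape[:num_flatten_dims], 1)
    width = reduce(operator.mul, shape[num_flatten_dims:], 1)
    return [height, width]


result = flattened_shape([2, 3, 4, 5, 6], num_flatten_dims=3)
default_result = flattened_shape([8, 784])
```

With the default `num_flatten_dims=1`, a `[batch, features]` input keeps its shape, which is the common case for a batch of flat feature vectors.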
@@ -2202,6 +2185,53 @@ def reduce_min(input, dim=None, keep_dim=False, name=None):
     return out
+def reduce_prod(input, dim=None, keep_dim=False, name=None):
+    """
+    Computes the product of tensor elements over the given dimension.
+    Args:
+        input (Variable): The input variable which is a Tensor or LoDTensor.
+        dim (int|None): The dimension along which the product is performed. If
+            :attr:`None`, multiply all elements of :attr:`input` and return a
+            Tensor variable with a single element, otherwise must be in the
+            range :math:`[-rank(input), rank(input))`. If :math:`dim < 0`,
+            the dimension to reduce is :math:`rank + dim`.
+        keep_dim (bool|False): Whether to reserve the reduced dimension in the
+            output Tensor. The result tensor will have one fewer dimension
+            than the :attr:`input` unless :attr:`keep_dim` is true.
+        name(str|None): A name for this layer(optional). If set None, the
+            layer will be named automatically.
+    Returns:
+        Variable: The reduced Tensor variable.
+    Examples:
+        .. code-block:: python
+            # x is a Tensor variable with following elements:
+            #    [[0.2, 0.3, 0.5, 0.9]
+            #     [0.1, 0.2, 0.6, 0.7]]
+            # Each example is followed by the corresponding output tensor.
+            fluid.layers.reduce_prod(x)  # [0.0002268]
+            fluid.layers.reduce_prod(x, dim=0)  # [0.02, 0.06, 0.3, 0.63]
+            fluid.layers.reduce_prod(x, dim=-1)  # [0.027, 0.0084]
+            fluid.layers.reduce_prod(x, dim=1,
+                                     keep_dim=True)  # [[0.027], [0.0084]]
+    """
+    helper = LayerHelper('reduce_prod', **locals())
+    out = helper.create_tmp_variable(dtype=helper.input_dtype())
+    helper.append_op(
+        type='reduce_prod',
+        inputs={'X': input},
+        outputs={'Out': out},
+        attrs={
+            'dim': dim if dim is not None else 0,
+            'keep_dim': keep_dim,
+            'reduce_all': dim is None
+        })
+    return out
 def split(input, num_or_sections, dim=-1, name=None):
     """
     Split the input tensor into multiple sub-tensors.
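The `dim` and `keep_dim` semantics documented for `reduce_prod` can be checked without Paddle. The following sketch implements them for a 2-D nested list and reproduces the values from the docstring example. `reduce_prod_2d` is an illustrative helper, restricted to rank 2 for brevity, and is not the operator's implementation.

```python
from functools import reduce
import operator


def _prod(values):
    """Multiply all values of an iterable together."""
    return reduce(operator.mul, values, 1.0)


def reduce_prod_2d(x, dim=None, keep_dim=False):
    """Product over a rank-2 nested list, following the documented dim rules."""
    rank = 2
    if dim is None:
        # reduce_all: a single-element result.
        return [_prod(v for row in x for v in row)]
    if not -rank <= dim < rank:
        raise ValueError("dim must be in [-rank, rank)")
    if dim < 0:
        dim += rank  # negative dims count from the end
    if dim == 0:
        out = [_prod(col) for col in zip(*x)]
        return [out] if keep_dim else out
    out = [_prod(row) for row in x]
    return [[v] for v in out] if keep_dim else out


x = [[0.2, 0.3, 0.5, 0.9],
     [0.1, 0.2, 0.6, 0.7]]
total = reduce_prod_2d(x)
by_column = reduce_prod_2d(x, dim=0)
by_row = reduce_prod_2d(x, dim=-1)
by_row_kept = reduce_prod_2d(x, dim=1, keep_dim=True)
```

Up to floating-point rounding, `total` is `[0.0002268]`, `by_column` is `[0.02, 0.06, 0.3, 0.63]`, and `by_row` is `[0.027, 0.0084]`, matching the docstring.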
python/paddle/fluid/recordio_writer.py
View file @ 6ea45823
@@ -36,6 +36,7 @@ def convert_reader_to_recordio_file(
         feed_order=None):
     if feed_order is None:
         feed_order = feeder.feed_names
+    counter = 0
     with create_recordio_writer(filename, compressor,
                                 max_num_records) as writer:
         for batch in reader_creator():
@@ -43,3 +44,5 @@ def convert_reader_to_recordio_file(
             for each in feed_order:
                 writer.append_tensor(res[each])
             writer.complete_append_tensor()
+            counter += 1
+    return counter
python/paddle/fluid/regularizer.py
View file @ 6ea45823
@@ -13,6 +13,7 @@
 # limitations under the License.
 import framework
+from . import core
 __all__ = [
     'append_regularization_ops',
@@ -46,9 +47,9 @@ def append_regularization_ops(parameters_and_grads, regularization=None):
         regularization_term = None
         if param.regularizer is not None:
             # Add variable for regularization term in grad block
-            regularization_term = param.regularizer(param, grad.block)
+            regularization_term = param.regularizer(param, grad, grad.block)
         elif regularization is not None:
-            regularization_term = regularization(param, grad.block)
+            regularization_term = regularization(param, grad, grad.block)
         # If no gradient or no regularization specified,
         # then we don't need to do anything
@@ -82,7 +83,7 @@ class WeightDecayRegularizer(object):
     def __init__(self):
         pass
-    def __call__(self, param, block):
+    def __call__(self, param, grad, block):
         """Add corresponding weight decay operations to the network
         """
         raise NotImplementedError()
@@ -102,7 +103,7 @@ class L2DecayRegularizer(WeightDecayRegularizer):
         super(L2DecayRegularizer, self).__init__()
         self._regularization_coeff = regularization_coeff
-    def __call__(self, param, block):
+    def __call__(self, param, grad, block):
         """Add L2 weight decay ops to network
         Adds L2 weight decay ops.
@@ -117,8 +118,23 @@ class L2DecayRegularizer(WeightDecayRegularizer):
         """
         assert isinstance(param, framework.Parameter)
         assert isinstance(block, framework.Block)
         decay = block.create_var(
             dtype="float32", shape=param.shape, lod_level=param.lod_level)
+        if grad.type == core.VarDesc.VarType.SELECTED_ROWS:
+            decay = block.create_var(
+                dtype="float32",
+                shape=param.shape,
+                type=core.VarDesc.VarType.SELECTED_ROWS)
+            block.append_op(
+                type='lookup_table',
+                inputs={'W': param,
+                        'Ids': grad},
+                outputs={'Out': decay},
+                attrs={'is_sparse': True})
+            param = decay
         # Append Op to calculate decay
         block.append_op(
             type='scale',
@@ -141,7 +157,7 @@ class L1DecayRegularizer(WeightDecayRegularizer):
         super(L1DecayRegularizer, self).__init__()
         self._regularization_coeff = regularization_coeff
-    def __call__(self, param, block):
+    def __call__(self, param, grad, block):
         """Add L1 weight decay ops to network
         Adds L1 weight decay ops.
@@ -158,6 +174,19 @@ class L1DecayRegularizer(WeightDecayRegularizer):
         assert isinstance(block, framework.Block)
         decay = block.create_var(
             dtype="float32", shape=param.shape, lod_level=param.lod_level)
+        if grad.type == core.VarDesc.VarType.SELECTED_ROWS:
+            decay = block.create_var(
+                dtype="float32",
+                shape=param.shape,
+                type=core.VarDesc.VarType.SELECTED_ROWS)
+            block.append_op(
+                type='lookup_table',
+                inputs={'W': param,
+                        'Ids': grad},
+                outputs={'Out': decay},
+                attrs={'is_sparse': True})
         # Append sign op
         block.append_op(
             type='sign', inputs={"X": param}, outputs={"Out": decay})
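In the `SELECTED_ROWS` branch of `L2DecayRegularizer`, the `lookup_table` op gathers only the parameter rows that the sparse gradient touches, and the decay is then computed on those rows rather than on the full parameter. A minimal sketch of that idea with plain lists is shown below. `sparse_l2_decay` is a hypothetical helper, and treating the `scale` step as multiplication by the regularization coefficient is an assumption, since the attributes of the `scale` op are elided in the hunk.

```python
def sparse_l2_decay(param, grad_rows, coeff):
    """Return {row_index: coeff * param[row_index]} for rows present in the sparse gradient."""
    # lookup_table step: gather only the rows referenced by the gradient.
    gathered = {r: param[r] for r in grad_rows}
    # scale step (assumed): multiply the gathered rows by the coefficient.
    return {r: [coeff * v for v in row] for r, row in gathered.items()}


param = [[1.0, 2.0],
         [3.0, 4.0],
         [5.0, 6.0]]
decay = sparse_l2_decay(param, grad_rows=[0, 2], coeff=0.1)
```

Row 1 receives no decay term in this step because no gradient touched it, which keeps the regularization update as sparse as the gradient itself.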
python/paddle/fluid/tests/book/test_machine_translation.py
View file @ 6ea45823
@@ -181,7 +181,10 @@ def train_main(use_cuda, is_sparse, is_local=True):
     cost = pd.cross_entropy(input=rnn_out, label=label)
     avg_cost = pd.mean(cost)
-    optimizer = fluid.optimizer.Adagrad(learning_rate=1e-4)
+    optimizer = fluid.optimizer.Adagrad(
+        learning_rate=1e-4,
+        regularization=fluid.regularizer.L2DecayRegularizer(
+            regularization_coeff=0.1))
     optimize_ops, params_grads = optimizer.minimize(avg_cost)
     train_data = paddle.batch(
python/paddle/fluid/tests/unittests/test_recordio_reader.py
View file @ 6ea45823
@@ -13,9 +13,10 @@
 # limitations under the License.
 import unittest
 import paddle.fluid as fluid
-import paddle.v2.dataset.mnist as mnist
 import paddle.v2 as paddle
+import paddle.v2.dataset.mnist as mnist
 class TestRecordIO(unittest.TestCase):
@@ -31,10 +32,10 @@ class TestRecordIO(unittest.TestCase):
                     name='label', shape=[1], dtype='int64'),
             ],
             place=fluid.CPUPlace())
-        fluid.recordio_writer.convert_reader_to_recordio_file(
+        self.num_batches = fluid.recordio_writer.convert_reader_to_recordio_file(
             './mnist.recordio', reader, feeder)
-    def test_main(self):
+    def test_main(self, decorator_callback=None):
         # use new program
         with fluid.program_guard(fluid.Program(), fluid.Program()):
             data_file = fluid.layers.open_recordio_file(
@@ -42,6 +43,8 @@ class TestRecordIO(unittest.TestCase):
                 shapes=[[-1, 784], [-1, 1]],
                 lod_levels=[0, 0],
                 dtypes=['float32', 'int64'])
+            if decorator_callback is not None:
+                data_file = decorator_callback(data_file)
             img, label = fluid.layers.read_file(data_file)
             hidden = fluid.layers.fc(input=img, size=100, act='tanh')
@@ -51,14 +54,28 @@ class TestRecordIO(unittest.TestCase):
             fluid.optimizer.Adam(learning_rate=1e-3).minimize(avg_loss)
-            exe = fluid.Executor(fluid.CPUPlace())
+            if fluid.core.is_compiled_with_cuda():
+                place = fluid.CUDAPlace(0)
+            else:
+                place = fluid.CPUPlace()
+            exe = fluid.Executor(place)
             exe.run(fluid.default_startup_program())
             avg_loss_np = []
             # train a pass
+            batch_id = 0
             while not data_file.eof():
                 tmp, = exe.run(fetch_list=[avg_loss])
                 avg_loss_np.append(tmp)
+                batch_id += 1
             data_file.reset()
+            self.assertEqual(batch_id, self.num_batches)
             self.assertLess(avg_loss_np[-1], avg_loss_np[0])
+    def test_shuffle_reader(self):
+        self.test_main(decorator_callback=lambda reader:
+                       fluid.layers.create_shuffle_reader(reader, buffer_size=200))
+    def test_double_buffer_reader(self):
+        self.test_main(decorator_callback=lambda reader:
+                       fluid.layers.create_double_buffer_reader(
+                           reader,
+                           place='cuda:0' if fluid.core.is_compiled_with_cuda() else 'cpu'))
python/paddle/fluid/tests/unittests/test_reduce_op.py
View file @ 6ea45823
@@ -70,6 +70,19 @@ class TestMinOp(OpTest):
         self.check_output()
+class TestProdOp(OpTest):
+    def setUp(self):
+        self.op_type = "reduce_prod"
+        self.inputs = {'X': np.random.random((5, 6, 10)).astype("float64")}
+        self.outputs = {'Out': self.inputs['X'].prod(axis=0)}
+    def test_check_output(self):
+        self.check_output()
+    def test_check_grad(self):
+        self.check_grad(['X'], 'Out')
 class TestKeepDimReduce(OpTest):
     def setUp(self):
         self.op_type = "reduce_sum"