BaiXuePrincess / Paddle · Commit 1c484ca5
Forked from PaddlePaddle / Paddle
Authored Sep 06, 2017 by Travis CI
Deploy to GitHub Pages: c6aa8c7f
Parent: 93584eab
Showing 3 changed files with 195 additions and 150 deletions (+195 −150)
develop/doc_cn/_sources/howto/dev/new_op_cn.md.txt (+102 −81)
develop/doc_cn/howto/dev/new_op_cn.html (+92 −68)
develop/doc_cn/searchindex.js (+1 −1)
develop/doc_cn/_sources/howto/dev/new_op_cn.md.txt
...
### 1. Defining the ProtoMaker class
The formula for matrix multiplication is $Out = X * Y$; as the formula shows, the computation takes two inputs and produces one output.

First, define a `ProtoMaker` to describe the Op's inputs and outputs, and add comments:
```cpp
class MulOpMaker : public framework::OpProtoAndCheckerMaker {
...
};
```
[`MulOpMaker`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L43) inherits from `framework::OpProtoAndCheckerMaker`. Its constructor takes two parameters:
- `framework::OpProto`: stores the Op's inputs, outputs, and attribute properties; it is later used to generate the Python API.
- `framework::OpAttrChecker`: checks the legality of the attribute properties.
Inside the constructor, `AddInput` adds an input parameter, `AddOutput` adds an output parameter, and `AddComment` adds the Op's documentation comment; these functions append the corresponding content to `OpProto`.
The code above adds two inputs, `X` and `Y`, and one output, `Out`, to `MulOp`, and explains the meaning of each; please follow the naming convention when choosing names.

Take [`ScaleOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37) as another example:
```cpp
template <typename AttrType>
...
};
```
This example differs in two places:

- `AddInput("X","...").NotInGradient()`: indicates that the input `X` does not take part in the computation of the gradient Op corresponding to `ScaleOp`. If an input of an Op does not participate in the backward gradient computation, mark it by calling `.NotInGradient()` explicitly.
- `AddAttr<AttrType>("scale", "...").SetDefault(1.0);`: adds a `scale` coefficient as an attribute property, with a default value of 1.0.
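For intuition, the computation that `ScaleOp` describes, including the attribute default set by `SetDefault(1.0)`, can be sketched in NumPy (an illustration only; `scale_forward` is a hypothetical helper, not Paddle code):

```python
import numpy as np

def scale_forward(x, scale=1.0):
    # Reference semantics of ScaleOp: Out = scale * X.
    # The default scale of 1.0 mirrors AddAttr(...).SetDefault(1.0).
    return scale * np.asarray(x, dtype=np.float32)

x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
print(scale_forward(x))             # default scale leaves X unchanged
print(scale_forward(x, scale=2.0))  # Out = 2.0 * X
```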
### 2. Defining the Operator class

The following code defines `MulOp`:
```cpp
class MulOp : public framework::OperatorWithKernel {
  ...
```
- 1). Check early and report errors as soon as possible: verify that input dimensions, types, and so on are legal.
- 2). Set the shape of the output Tensor.
The `OpProtoMaker` and `Op` class definitions are usually written in the `.cc` file, placed there together with the registration functions introduced below.
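The two duties listed above, validating early and then setting the output Tensor's shape, can be sketched in Python (`infer_mul_shape` is a hypothetical stand-in for `MulOp`'s C++ shape inference, shown only to make the steps concrete):

```python
def infer_mul_shape(x_shape, y_shape):
    # 1). Check early and report errors as soon as possible.
    if len(x_shape) != 2 or len(y_shape) != 2:
        raise ValueError("MulOp expects 2-D inputs")
    if x_shape[1] != y_shape[0]:
        raise ValueError("dimension mismatch: %s x %s" % (x_shape, y_shape))
    # 2). Set the shape of the output Tensor: (M, K) x (K, N) -> (M, N).
    return (x_shape[0], y_shape[1])

print(infer_mul_shape((32, 84), (84, 100)))  # (32, 100)
```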
### 3. Defining the OpKernel class
`MulKernel` inherits from `framework::OpKernel` and carries two template parameters:

- `typename Place`: the device type. When different devices (CPU, GPU) share the same Kernel, add this template parameter; when they do not share one, omit it. An example that does not share is [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43).
- `typename T`: the data type, such as `float` or `double`.

The `Compute` interface must be overridden for the `MulKernel` class:

- `Compute` takes one argument: `const framework::ExecutionContext& context`.
- Compared with `InferShapeContext`, `ExecutionContext` adds the device type; inputs, outputs, and attribute parameters are likewise available from it.
- The `Compute` function implements the `OpKernel`'s concrete computation logic.

Here is the implementation of `MulKernel`'s `Compute`:
```cpp
template <typename Place, typename T>
class MulKernel : public framework::OpKernel {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    auto* X = context.Input<Tensor>("X");
    ...
    auto* device_context =
        const_cast<platform::DeviceContext*>(context.device_context_);
    math::matmul<Place, T>(*X, false, *Y, false, 1, Z, 0, device_context);
  }
};
```
Note: **Different devices (CPU and GPU) share one Op definition; whether they share the same `OpKernel` depends on whether the functions called in `Compute` support both devices.**

`MulOp`'s CPU and GPU implementations share the same `Kernel`. For an example whose `OpKernel` is not shared, see [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43).
To keep the `OpKernel` computation simple to write, and to let CPU and GPU code be reused, we usually implement the `Compute` interface with the Eigen unsupported Tensor module. For how to use the Eigen library in PaddlePaddle, see the [usage documentation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md).

At this point the forward Op implementation is complete; next, register the op and kernel in the `.cc` file. The definitions of the backward Op class and the backward OpKernel are similar to the forward Op's, so they are not repeated here. **Note, however, that a backward Op has no `ProtoMaker`.**
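For reference, the `math::matmul<Place, T>(*X, false, *Y, false, 1, Z, 0, device_context)` call above computes `Z = 1 * X·Y + 0 * Z` with no transposes; its semantics can be sketched in NumPy (an illustration of the math, not of Paddle's API):

```python
import numpy as np

def matmul_ref(x, y, trans_x=False, trans_y=False, alpha=1.0, beta=0.0, out=None):
    # out = alpha * op(x) @ op(y) + beta * out, mirroring math::matmul's
    # (matrix, transpose-flag, ..., alpha, out, beta) argument pattern.
    a = x.T if trans_x else x
    b = y.T if trans_y else y
    z = alpha * (a @ b)
    if beta != 0.0 and out is not None:
        z = z + beta * out
    return z

x = np.random.rand(32, 84).astype("float32")
y = np.random.rand(84, 100).astype("float32")
z = matmul_ref(x, y)  # alpha=1, beta=0: a plain matrix product
print(z.shape)        # (32, 100)
```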
### 4. Registering the Operator
- Register the forward and backward Op classes and the CPU Kernel in the `.cc` file.

```cpp
namespace ops = paddle::operators;
REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad);
REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(mul_grad,
                       ops::MulGradKernel<paddle::platform::CPUPlace, float>);
```
In the code above:
- `REGISTER_OP`: registers the `ops::MulOp` class under the type name `mul`, with `ops::MulOpMaker` as its `ProtoMaker`, and registers `ops::MulOpGrad` under the type name `mul_grad`.
- `REGISTER_OP_WITHOUT_GRADIENT`: registers an Op that has no backward Op.
- `REGISTER_OP_CPU_KERNEL`: registers the `ops::MulKernel` class, specializing its template parameters to `paddle::platform::CPUPlace` and `float`; likewise it registers the `ops::MulGradKernel` class.
- Register the GPU Kernel in the `.cu` file.
- Note: if the GPU Kernel implementation is based on the Eigen unsupported module, add the macro definition `#define EIGEN_USE_GPU` at the very beginning of the `.cu` file, as in the example below:

```cpp
// if use Eigen unsupported module before include head files
#define EIGEN_USE_GPU
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(mul, ops::MulKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(mul_grad,
                       ops::MulGradKernel<paddle::platform::GPUPlace, float>);
```
### 5. Compiling
...
- Python binding

In [`paddle/pybind/pybind.cc`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/pybind.cc), use `USE_OP` to tell the compiler which Op needs to be linked; see the [code comments](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_registry.h#L81) for a detailed explanation.
```
USE_OP(mul);
...
USE_NO_KERNEL_OP(recurrent);
```
- Generating the library

[`paddle/pybind/CMakeLists.txt`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/CMakeLists.txt) needs no changes; `*_op.cc` files newly added under the `paddle/operators` directory are automatically linked into the generated library.
## Implementing unit tests
Unit tests cover comparing the forward Op's implementations across devices (CPU, GPU), comparing the backward Op's implementations across devices (CPU, GPU), and checking the backward Op's gradients. Below we walk through [the unit tests for `MulOp`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/test_mul_op.py).
### Forward Operator unit test
The forward Op unit test inherits from `unittest.TestCase` and defines the metaclass `__metaclass__ = OpTestMeta`; the concrete test steps are carried out inside `OpTestMeta`. Testing a forward Operator requires:

1. Defining the inputs, outputs, and relevant attribute parameters in the `setUp` function.
2. Generating random input data.
3. Implementing the same computation logic as the forward operator in the Python script to obtain the reference output, and comparing it with the operator's forward output.

```python
import unittest
import numpy as np
from gradient_checker import GradientChecker, create_op
from op_test_util import OpTestMeta

class TestMulOp(unittest.TestCase):
    __metaclass__ = OpTestMeta

    def setUp(self):
        ...
            'Y': np.random.random((84, 100)).astype("float32")
        }
        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
```

The code above first imports the required packages. The important variables set in the `setUp` function are explained below:

- `self.type = "mul"`: the Op type, identical to the type the operator was registered under.
- `self.inputs`: the inputs, of type `numpy.array`, with their initialization.
- `self.outputs`: the outputs; the Python script implements the same computation logic as the operator and returns the Python-side result.
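The comparison that `OpTestMeta` performs can be sketched in a self-contained way; here a plain Python function stands in for the compiled `mul` operator, and the `X` shape `(32, 84)` is chosen to be compatible with the `(84, 100)` `Y` shown above (both the stand-in and the `X` shape are assumptions for illustration):

```python
import numpy as np

def mul_op_forward(x, y):
    # Stand-in for the compiled mul operator's forward pass.
    return np.dot(x, y)

# Mirror the setUp fixture: random inputs plus a NumPy reference output.
inputs = {
    'X': np.random.random((32, 84)).astype("float32"),
    'Y': np.random.random((84, 100)).astype("float32"),
}
expected = np.dot(inputs['X'], inputs['Y'])

# The test passes when the operator output matches the reference.
actual = mul_op_forward(inputs['X'], inputs['Y'])
np.testing.assert_allclose(actual, expected, rtol=1e-5)
print("forward outputs match")
```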
### Backward Operator unit test
The backward Op unit test inherits from `GradientChecker`, which in turn inherits from `unittest.TestCase`; therefore **backward unit-test functions must start with `test_`**.
```python
class TestMulGradOp(GradientChecker):
    def setUp(self):
        self.op = create_op("mul")
        ...
            no_grad_set={"Y"})
```
Some key points in the code are explained below:

- Calling `create_op("mul")` creates the forward Op corresponding to this backward Op.
- Calling `compare_grad` compares the CPU and GPU computation results.
- `test_normal` calls `check_grad` to verify gradient correctness and stability using the numerical method.
  - The first argument, `self.op`: the forward Op.
  - The second argument, `self.inputs`: the input dict, whose keys must match the `ProtoMaker` definition.
  - The third argument, `["X", "Y"]`: specifies that gradients are checked for the input variables `X` and `Y`.
  - The fourth argument, `"Out"`: specifies `Out` as the final output variable of the forward network.
- The `test_ignore_x` and `test_ignore_y` branches test the case where only one input's gradient needs to be computed.
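The numerical method that `check_grad` relies on can be sketched with central differences; the snippet below checks the gradient of `loss = sum(X·Y)` with respect to `X` (a generic illustration of the idea, not `GradientChecker`'s actual implementation):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-4):
    # Central-difference gradient of the scalar-valued f at x.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

# For loss = sum(X @ Y), the analytic gradient w.r.t. X is ones @ Y.T.
X = np.random.rand(3, 4)
Y = np.random.rand(4, 5)
num_dX = numeric_grad(lambda x: (x @ Y).sum(), X.copy())
ana_dX = np.ones((3, 5)) @ Y.T
np.testing.assert_allclose(num_dX, ana_dX, rtol=1e-3, atol=1e-3)
print("numerical and analytic gradients agree")
```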
### Compiling and running unit tests
After the unit test is written, add the following to [`python/paddle/v2/framework/tests/CMakeLists.txt`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/CMakeLists.txt) to include it in the build:
```
py_test(test_mul_op SRCS test_mul_op.py)
```
Please note that **unlike compiling and testing an Op in isolation, running the unit tests requires building the whole project**, with `WITH_TESTING` turned on at configure time, i.e. `cmake paddle_dir -DWITH_TESTING=ON`. After a successful build, run the unit test with:
```bash
make test ARGS="-R test_mul_op -V"
...
```
develop/doc_cn/howto/dev/new_op_cn.html
浏览文件 @
1c484ca5
...
...
@@ -227,7 +227,8 @@ Kernel实现 | CPU、GPU共享Kernel实现在<code class="docutils literal
<span
id=
"c"
></span><h2>
实现C++类
<a
class=
"headerlink"
href=
"#c"
title=
"永久链接至标题"
>
¶
</a></h2>
<div
class=
"section"
id=
"protomaker"
>
<span
id=
"protomaker"
></span><h3>
1. 定义ProtoMaker类
<a
class=
"headerlink"
href=
"#protomaker"
title=
"永久链接至标题"
>
¶
</a></h3>
<p>
矩阵乘的公式:$Out = X * Y$, 可见该计算由两个输入,一个输出组成。首先定义
<code
class=
"docutils literal"
><span
class=
"pre"
>
ProtoMaker
</span></code>
来描述该Op的输入、输出及注释:
</p>
<p>
矩阵乘法的公式:$Out = X * Y$, 可见该计算由两个输入,一个输出组成。
</p>
<p>
首先定义
<code
class=
"docutils literal"
><span
class=
"pre"
>
ProtoMaker
</span></code>
来描述该Op的输入、输出,并添加注释:
</p>
<div
class=
"highlight-cpp"
><div
class=
"highlight"
><pre><span></span><span
class=
"k"
>
class
</span>
<span
class=
"nc"
>
MulOpMaker
</span>
<span
class=
"o"
>
:
</span>
<span
class=
"k"
>
public
</span>
<span
class=
"n"
>
framework
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
OpProtoAndCheckerMaker
</span>
<span
class=
"p"
>
{
</span>
<span
class=
"k"
>
public
</span><span
class=
"o"
>
:
</span>
<span
class=
"n"
>
MulOpMaker
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
framework
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
OpProto
</span>
<span
class=
"o"
>
*
</span><span
class=
"n"
>
proto
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
framework
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
OpAttrChecker
</span>
<span
class=
"o"
>
*
</span><span
class=
"n"
>
op_checker
</span><span
class=
"p"
>
)
</span>
...
...
@@ -243,14 +244,14 @@ Kernel实现 | CPU、GPU共享Kernel实现在<code class="docutils literal
<span
class=
"p"
>
};
</span>
</pre></div>
</div>
<p><a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L43"
><code
class=
"docutils literal"
><span
class=
"pre"
>
MulOpMaker
</span></code></a>
继承自
<code
class=
"docutils literal"
><span
class=
"pre"
>
framework::OpProtoAndCheckerMaker
</span></code>
,构造函数
包括
2个参数:
</p>
<p><a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc#L43"
><code
class=
"docutils literal"
><span
class=
"pre"
>
MulOpMaker
</span></code></a>
继承自
<code
class=
"docutils literal"
><span
class=
"pre"
>
framework::OpProtoAndCheckerMaker
</span></code>
,构造函数
含有
2个参数:
</p>
<ul
class=
"simple"
>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
framework::OpProto
</span></code>
: 前者存储Op的输入输出和参数属性,将用于Python API接口的生成。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
framework::OpAttrChecker
</span></code>
:后者用于检查参数属性的合法性。
</li>
</ul>
<p>
构造函数里通过
<code
class=
"docutils literal"
><span
class=
"pre"
>
AddInput
</span></code>
添加输入参数,通过
<code
class=
"docutils literal"
><span
class=
"pre"
>
AddOutput
</span></code>
添加输出参数,通过
<code
class=
"docutils literal"
><span
class=
"pre"
>
AddComment
</span></code>
添加
该Op的注释,
这些函数会将对应内容添加到
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpProto
</span></code>
中。
</p>
<p>
在
<code
class=
"docutils literal"
><span
class=
"pre"
>
MulOp
</span></code>
中添加两个输入
<code
class=
"docutils literal"
><span
class=
"pre"
>
X
</span></code>
和
<code
class=
"docutils literal"
><span
class=
"pre"
>
Y
</span></code>
,添加了一个输出
<code
class=
"docutils literal"
><span
class=
"pre"
>
Out
</span></code>
,并解释了各自含义,命名请遵守命名规范。
</p>
<p>
再
举个
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37"
><code
class=
"docutils literal"
><span
class=
"pre"
>
ScaleOp
</span></code></a>
的例子
:
</p>
<p>
构造函数里通过
<code
class=
"docutils literal"
><span
class=
"pre"
>
AddInput
</span></code>
添加输入参数,通过
<code
class=
"docutils literal"
><span
class=
"pre"
>
AddOutput
</span></code>
添加输出参数,通过
<code
class=
"docutils literal"
><span
class=
"pre"
>
AddComment
</span></code>
添加
Op的注释。
这些函数会将对应内容添加到
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpProto
</span></code>
中。
</p>
<p>
上面的代码
在
<code
class=
"docutils literal"
><span
class=
"pre"
>
MulOp
</span></code>
中添加两个输入
<code
class=
"docutils literal"
><span
class=
"pre"
>
X
</span></code>
和
<code
class=
"docutils literal"
><span
class=
"pre"
>
Y
</span></code>
,添加了一个输出
<code
class=
"docutils literal"
><span
class=
"pre"
>
Out
</span></code>
,并解释了各自含义,命名请遵守命名规范。
</p>
<p>
再
以
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37"
><code
class=
"docutils literal"
><span
class=
"pre"
>
ScaleOp
</span></code></a>
为例
:
</p>
<div
class=
"highlight-cpp"
><div
class=
"highlight"
><pre><span></span><span
class=
"k"
>
template
</span>
<span
class=
"o"
>
<
</span><span
class=
"k"
>
typename
</span>
<span
class=
"n"
>
AttrType
</span><span
class=
"o"
>
>
</span>
<span
class=
"k"
>
class
</span>
<span
class=
"nc"
>
ScaleOpMaker
</span>
<span
class=
"o"
>
:
</span>
<span
class=
"k"
>
public
</span>
<span
class=
"n"
>
framework
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
OpProtoAndCheckerMaker
</span>
<span
class=
"p"
>
{
</span>
<span
class=
"k"
>
public
</span><span
class=
"o"
>
:
</span>
...
...
@@ -268,12 +269,13 @@ Kernel实现 | CPU、GPU共享Kernel实现在<code class="docutils literal
</div>
<p>
这个例子有两处不同:
</p>
<ul
class=
"simple"
>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
AddInput(
"
X
"
,
"
...
"
).NotInGradient()
</span></code>
: 表示
<code
class=
"docutils literal"
><span
class=
"pre"
>
X
</span></code>
这个输入不参与
<code
class=
"docutils literal"
><span
class=
"pre"
>
ScaleOp
</span></code>
对应的梯度Op计算之中。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
AddInput(
"
X
"
,
"
...
"
).NotInGradient()
</span></code>
: 表示
<code
class=
"docutils literal"
><span
class=
"pre"
>
X
</span></code>
这个输入不参与
<code
class=
"docutils literal"
><span
class=
"pre"
>
ScaleOp
</span></code>
对应的梯度Op计算之中
,如果Op的某个输入不参与反向梯度的计算,请显示地调用
<code
class=
"docutils literal"
><span
class=
"pre"
>
.NotInGradient()
</span></code>
进行设置
。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
AddAttr
<
AttrType
>
(
"
scale
"
,
</span>
<span
class=
"pre"
>
"
...
"
).SetDefault(1.0);
</span></code>
: 增加
<code
class=
"docutils literal"
><span
class=
"pre"
>
scale
</span></code>
系数,作为参数属性,并且设置默认值为1.0。
</li>
</ul>
</div>
<div
class=
"section"
id=
"operator"
>
<span
id=
"id2"
></span><h3>
2. 定义Operator类
<a
class=
"headerlink"
href=
"#operator"
title=
"永久链接至标题"
>
¶
</a></h3>
<p>
下面的点实现了MulOp的定义:
</p>
<div
class=
"highlight-cpp"
><div
class=
"highlight"
><pre><span></span><span
class=
"k"
>
class
</span>
<span
class=
"nc"
>
MulOp
</span>
<span
class=
"o"
>
:
</span>
<span
class=
"k"
>
public
</span>
<span
class=
"n"
>
framework
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
OperatorWithKernel
</span>
<span
class=
"p"
>
{
</span>
<span
class=
"k"
>
public
</span><span
class=
"o"
>
:
</span>
<span
class=
"k"
>
using
</span>
<span
class=
"n"
>
framework
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
OperatorWithKernel
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
OperatorWithKernel
</span><span
class=
"p"
>
;
</span>
...
...
@@ -312,14 +314,26 @@ Kernel实现 | CPU、GPU共享Kernel实现在<code class="docutils literal
<li>
1). 做检查, 尽早报错:检查输入数据维度、类型等是否合法。
</li>
<li>
2). 设置输出Tensor的形状。
</li>
</ul>
<p>
通常
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpProtoMaker
</span></code>
和
<code
class=
"docutils literal"
><span
class=
"pre"
>
Op
</span></code>
类的定义写在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cc
</span></code>
文件中,和
要讲到
的注册函数一起放在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cc
</span></code>
中
</p>
<p>
通常
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpProtoMaker
</span></code>
和
<code
class=
"docutils literal"
><span
class=
"pre"
>
Op
</span></code>
类的定义写在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cc
</span></code>
文件中,和
下面将要介绍
的注册函数一起放在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cc
</span></code>
中
</p>
</div>
<div
class=
"section"
id=
"opkernel"
>
<span
id=
"opkernel"
></span><h3>
3. 定义OpKernel类
<a
class=
"headerlink"
href=
"#opkernel"
title=
"永久链接至标题"
>
¶
</a></h3>
<p><code
class=
"docutils literal"
><span
class=
"pre"
>
MulKernel
</span></code>
继承自
<code
class=
"docutils literal"
><span
class=
"pre"
>
framework::OpKernel
</span></code>
,带有下面两个模板参数:
</p>
<ul
class=
"simple"
>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
typename
</span>
<span
class=
"pre"
>
Place
</span></code>
: 表示设备类型,不同设备(CPU、GPU)共享同一个Kernel时,需加该模板参数,不共享则不加,一个不共享的例子是
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43"
><code
class=
"docutils literal"
><span
class=
"pre"
>
OnehotCrossEntropyOpKernel
</span></code></a>
。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
typename
</span>
<span
class=
"pre"
>
T
</span></code>
: 表示数据类型,如
<code
class=
"docutils literal"
><span
class=
"pre"
>
float
</span></code>
,
<code
class=
"docutils literal"
><span
class=
"pre"
>
double
</span></code>
等。
</li>
</ul>
<p>
需要为
<code
class=
"docutils literal"
><span
class=
"pre"
>
MulKernel
</span></code>
类重写
<code
class=
"docutils literal"
><span
class=
"pre"
>
Compute
</span></code>
接口。
</p>
<ul
class=
"simple"
>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
Compute
</span></code>
接受一个输入参数:
<code
class=
"docutils literal"
><span
class=
"pre"
>
const
</span>
<span
class=
"pre"
>
framework::ExecutionContext
&
</span>
<span
class=
"pre"
>
context
</span></code>
。
</li>
<li>
与
<code
class=
"docutils literal"
><span
class=
"pre"
>
InferShapeContext
</span></code>
相比,
<code
class=
"docutils literal"
><span
class=
"pre"
>
ExecutionContext
</span></code>
增加了设备类型,同样可获取到输入输出和属性参数。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
Compute
</span></code>
函数里实现
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpKernel
</span></code>
的具体计算逻辑。
</li>
</ul>
<p>
下面是
<code
class=
"docutils literal"
><span
class=
"pre"
>
MulKernel
</span></code>
<code
class=
"docutils literal"
><span
class=
"pre"
>
Compute
</span></code>
的实现:
</p>
<div
class=
"highlight-cpp"
><div
class=
"highlight"
><pre><span></span><span
class=
"k"
>
template
</span>
<span
class=
"o"
>
<
</span><span
class=
"k"
>
typename
</span>
<span
class=
"n"
>
Place
</span><span
class=
"p"
>
,
</span>
<span
class=
"k"
>
typename
</span>
<span
class=
"n"
>
T
</span><span
class=
"o"
>
>
</span>
<span
class=
"k"
>
class
</span>
<span
class=
"nc"
>
MulKernel
</span>
<span
class=
"o"
>
:
</span>
<span
class=
"k"
>
public
</span>
<span
class=
"n"
>
framework
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
OpKernel
</span>
<span
class=
"p"
>
{
</span>
<span
class=
"k"
>
public
</span><span
class=
"o"
>
:
</span>
<span
class=
"kt"
>
void
</span>
<span
class=
"n"
>
Compute
</span><span
class=
"p"
>
(
</span><span
class=
"k"
>
const
</span>
<span
class=
"n"
>
framework
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
ExecutionContext
</span><span
class=
"o"
>
&
</span>
<span
class=
"n"
>
context
</span><span
class=
"p"
>
)
</span>
<span
class=
"k"
>
const
</span>
<span
class=
"k"
>
override
</span>
<span
class=
"p"
>
{
</span>
<span
class=
"k"
>
public
</span><span
class=
"o"
>
:
</span>
<span
class=
"kt"
>
void
</span>
<span
class=
"n"
>
Compute
</span><span
class=
"p"
>
(
</span><span
class=
"k"
>
const
</span>
<span
class=
"n"
>
framework
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
ExecutionContext
</span><span
class=
"o"
>
&
</span>
<span
class=
"n"
>
context
</span><span
class=
"p"
>
)
</span>
<span
class=
"k"
>
const
</span>
<span
class=
"k"
>
override
</span>
<span
class=
"p"
>
{
</span>
<span
class=
"k"
>
auto
</span><span
class=
"o"
>
*
</span>
<span
class=
"n"
>
X
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
context
</span><span
class=
"p"
>
.
</span><span
class=
"n"
>
Input
</span><span
class=
"o"
>
<
</span><span
class=
"n"
>
Tensor
</span><span
class=
"o"
>
>
</span><span
class=
"p"
>
(
</span><span
class=
"s"
>
"
X
"
</span><span
class=
"p"
>
);
</span>
<span
class=
"k"
>
auto
</span><span
class=
"o"
>
*
</span>
<span
class=
"n"
>
Y
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
context
</span><span
class=
"p"
>
.
</span><span
class=
"n"
>
Input
</span><span
class=
"o"
>
<
</span><span
class=
"n"
>
Tensor
</span><span
class=
"o"
>
>
</span><span
class=
"p"
>
(
</span><span
class=
"s"
>
"
Y
"
</span><span
class=
"p"
>
);
</span>
<span
class=
"k"
>
auto
</span><span
class=
"o"
>
*
</span>
<span
class=
"n"
>
Z
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
context
</span><span
class=
"p"
>
.
</span><span
class=
"n"
>
Output
</span><span
class=
"o"
>
<
</span><span
class=
"n"
>
Tensor
</span><span
class=
"o"
>
>
</span><span
class=
"p"
>
(
</span><span
class=
"s"
>
"
Out
"
</span><span
class=
"p"
>
);
</span>
...
...
@@ -327,23 +341,20 @@ Kernel实现 | CPU、GPU共享Kernel实现在<code class="docutils literal
<span
class=
"k"
>
auto
</span><span
class=
"o"
>
*
</span>
<span
class=
"n"
>
device_context
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"k"
>
const_cast
</span><span
class=
"o"
>
<
</span><span
class=
"n"
>
platform
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
DeviceContext
</span><span
class=
"o"
>
*
>
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
context
</span><span
class=
"p"
>
.
</span><span
class=
"n"
>
device_context_
</span><span
class=
"p"
>
);
</span>
<span
class=
"n"
>
math
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
matmul
</span><span
class=
"o"
>
<
</span><span
class=
"n"
>
Place
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
T
</span><span
class=
"o"
>
>
</span><span
class=
"p"
>
(
</span><span
class=
"o"
>
*
</span><span
class=
"n"
>
X
</span><span
class=
"p"
>
,
</span>
<span
class=
"nb"
>
false
</span><span
class=
"p"
>
,
</span>
<span
class=
"o"
>
*
</span><span
class=
"n"
>
Y
</span><span
class=
"p"
>
,
</span>
<span
class=
"nb"
>
false
</span><span
class=
"p"
>
,
</span>
<span
class=
"mi"
>
1
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
Z
</span><span
class=
"p"
>
,
</span>
<span
class=
"mi"
>
0
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
device_context
</span><span
class=
"p"
>
);
</span>
<span
class=
"p"
>
}
</span>
<span
class=
"p"
>
}
</span>
<span
class=
"p"
>
};
</span>
</pre></div>
</div>
<p><code
class=
"docutils literal"
><span
class=
"pre"
>
MulKernel
</span></code>
继承自
<code
class=
"docutils literal"
><span
class=
"pre"
>
framework::OpKernel
</span></code>
,带有模板参数:
</p>
<ul
class=
"simple"
>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
typename
</span>
<span
class=
"pre"
>
Place
</span></code>
: 表示设备类型,不同设备(CPU、GPU)共享同一个Kernel时,需加该模板参数,不共享则不加,一个不共享的例子是
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43"
><code
class=
"docutils literal"
><span
class=
"pre"
>
OnehotCrossEntropyOpKernel
</span></code></a>
。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
typename
</span>
<span
class=
"pre"
>
T
</span></code>
: 表示数据类型,如
<code
class=
"docutils literal"
><span
class=
"pre"
>
float
</span></code>
,
<code
class=
"docutils literal"
><span
class=
"pre"
>
double
</span></code>
等。
</li>
</ul>
<p><code
class=
"docutils literal"
><span
class=
"pre"
>
MulKernel
</span></code>
需要重写
<code
class=
"docutils literal"
><span
class=
"pre"
>
Compute
</span></code>
接口,该接口参数为
<code
class=
"docutils literal"
><span
class=
"pre"
>
const
</span>
<span
class=
"pre"
>
framework::ExecutionContext
&
</span>
<span
class=
"pre"
>
context
</span></code>
,
<code
class=
"docutils literal"
><span
class=
"pre"
>
ExecutionContext
</span></code>
相比
<code
class=
"docutils literal"
><span
class=
"pre"
>
InferShapeContext
</span></code>
增加了设备类型,同样可获取到输入输出和属性参数,
<code
class=
"docutils literal"
><span
class=
"pre"
>
Compute
</span></code>
函数里写具体实现时。
</p>
<p>
注意,不同设备(CPU、GPU)共享一个Op定义,是否则共享同一个
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpKernel
</span></code>
,取决于
<code
class=
"docutils literal"
><span
class=
"pre"
>
Compute
</span></code>
调用的函数是否支持不同设备。
<code
class=
"docutils literal"
><span
class=
"pre"
>
MulOp
</span></code>
的CPU、GPU实现共享同一个
<code
class=
"docutils literal"
><span
class=
"pre"
>
Kernel
</span></code>
,
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpKernel
</span></code>
不共享的例子可以参考
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43"
><code
class=
"docutils literal"
><span
class=
"pre"
>
OnehotCrossEntropyOpKernel
</span></code></a>
。
</p>
<p>
为了使得
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpKernel
</span></code>
的计算过程书写较为简单,CPU、GPU的代码可以复用,我们通常借助Eigen unsupported Tensor模块来实现。关于在paddle中如何使用Eigen库,请参考对应的使用
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md"
>
文档
</a></p>
<p>
到此前向Op实现完成,需要在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cc
</span></code>
文件中注册该op和kernel。反向Op类的定义和Kernel定义与前向Op类似,这里不再重复。但注意,反向Op没有
<code
class=
"docutils literal"
><span
class=
"pre"
>
ProtoMaker
</span></code>
。
</p>
<p>
需要注意:
<strong>
不同设备(CPU、GPU)共享一个Op定义,是否则共享同一个
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpKernel
</span></code>
,取决于
<code
class=
"docutils literal"
><span
class=
"pre"
>
Compute
</span></code>
调用的函数是否支持不同设备。
</strong></p>
<p><code
class=
"docutils literal"
><span
class=
"pre"
>
MulOp
</span></code>
的CPU、GPU实现共享同一个
<code
class=
"docutils literal"
><span
class=
"pre"
>
Kernel
</span></code>
。
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpKernel
</span></code>
不共享的例子可以参考:
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43"
><code
class=
"docutils literal"
><span
class=
"pre"
>
OnehotCrossEntropyOpKernel
</span></code></a>
。
</p>
<p>
为了使
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpKernel
</span></code>
的计算过程书写更加简单,并且CPU、GPU的代码可以复用,我们通常借助 Eigen unsupported Tensor模块来实现
<code
class=
"docutils literal"
><span
class=
"pre"
>
Compute
</span></code>
接口。关于在PaddlePaddle中如何使用Eigen库,请参考
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md"
>
使用文档
</a>
。
</p>
<p>
到此,前向Op实现完成。接下来,需要在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cc
</span></code>
文件中注册该op和kernel。
反向Op类的定义,反向OpKernel的定义与前向Op类似,这里不再赘述。
<strong>
但需注意反向Op没有
<code
class=
"docutils literal"
><span
class=
"pre"
>
ProtoMaker
</span></code></strong>
。
</p>
</div>
<div
class=
"section"
id=
"operator"
>
<span
id=
"id3"
></span><h3>
4. 注册Operator
<a
class=
"headerlink"
href=
"#operator"
title=
"永久链接至标题"
>
¶
</a></h3>
<p>
在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cc
</span></code>
文件中注册前向、反向Op类,注册CPU Kernel。
</p>
<ul>
<li><p
class=
"first"
>
在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cc
</span></code>
文件中注册前向、反向Op类,注册CPU Kernel。
</p>
<div
class=
"highlight-cpp"
><div
class=
"highlight"
><pre><span></span><span
class=
"k"
>
namespace
</span>
<span
class=
"n"
>
ops
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
paddle
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
operators
</span><span
class=
"p"
>
;
</span>
<span
class=
"n"
>
REGISTER_OP
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
mul
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
ops
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
MulOp
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
ops
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
MulOpMaker
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
mul_grad
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
ops
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
MulOpGrad
</span><span
class=
"p"
>
);
</span>
<span
class=
"n"
>
REGISTER_OP_CPU_KERNEL
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
mul
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
ops
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
MulKernel
</span><span
class=
"o"
>
<
</span><span
class=
"n"
>
paddle
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
platform
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
CPUPlace
</span><span
class=
"p"
>
,
</span>
<span
class=
"kt"
>
float
</span><span
class=
"o"
>
>
</span><span
class=
"p"
>
);
</span>
...
...
<span
class=
"n"
>
ops
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
MulGradKernel
</span><span
class=
"o"
>
<
</span><span
class=
"n"
>
paddle
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
platform
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
CPUPlace
</span><span
class=
"p"
>
,
</span>
<span
class=
"kt"
>
float
</span><span
class=
"o"
>
>
</span><span
class=
"p"
>
);
</span>
</pre></div>
</div>
<p>
在上面的代码中:
</p>
<ul
class=
"simple"
>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
REGISTER_OP
</span></code>
: 注册
<code
class=
"docutils literal"
><span
class=
"pre"
>
ops::MulOp
</span></code>
类,类型名为
<code
class=
"docutils literal"
><span
class=
"pre"
>
mul
</span></code>
,该类的
<code
class=
"docutils literal"
><span
class=
"pre"
>
ProtoMaker
</span></code>
为
<code
class=
"docutils literal"
><span
class=
"pre"
>
ops::MulOpMaker
</span></code>
,注册
<code
class=
"docutils literal"
><span
class=
"pre"
>
ops::MulOpGrad
</span></code>
,类型名为
<code
class=
"docutils literal"
><span
class=
"pre"
>
mul_grad
</span></code>
。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
REGISTER_OP_WITHOUT_GRADIENT
</span></code>
: 用于注册没有反向的Op。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
REGISTER_OP_CPU_KERNEL
</span></code>
:注册
<code
class=
"docutils literal"
><span
class=
"pre"
>
ops::MulKernel
</span></code>
类,并特化模板参数为
<code
class=
"docutils literal"
><span
class=
"pre"
>
paddle::platform::CPUPlace
</span></code>
和
<code
class=
"docutils literal"
><span
class=
"pre"
>
float
</span></code>
类型,同理,注册
<code
class=
"docutils literal"
><span
class=
"pre"
>
ops::MulGradKernel
</span></code>
类。
</li>
</ul>
</li>
</ul>
<ul>
<li><p
class=
"first"
>
在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cu
</span></code>
文件中注册GPU Kernel。
</p>
<ul
class=
"simple"
>
<li>
请注意,如果GPU Kernel的实现基于Eigen unsupported模块,那么在
<code
class=
"docutils literal"
><span
class=
"pre"
>
.cu
</span></code>
的开始请加上宏定义
<code
class=
"docutils literal"
><span
class=
"pre"
>
#define
</span>
<span
class=
"pre"
>
EIGEN_USE_GPU
</span></code>
,代码示例如下:
</li>
</ul>
<div
class=
"highlight-cpp"
><div
class=
"highlight"
><pre><span></span><span
class=
"c1"
>
// if using the Eigen unsupported module, define this before including header files
</span>
<span
class=
"cp"
>
#define EIGEN_USE_GPU
</span>
...
...
<span
class=
"n"
>
ops
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
MulGradKernel
</span><span
class=
"o"
>
<
</span><span
class=
"n"
>
paddle
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
platform
</span><span
class=
"o"
>
::
</span><span
class=
"n"
>
GPUPlace
</span><span
class=
"p"
>
,
</span>
<span
class=
"kt"
>
float
</span><span
class=
"o"
>
>
</span><span
class=
"p"
>
);
</span>
</pre></div>
</div>
</li>
</ul>
</div>
<div
class=
"section"
id=
""
>
<span
id=
"id4"
></span><h3>
5. 编译
<a
class=
"headerlink"
href=
"#"
title=
"永久链接至标题"
>
¶
</a></h3>
<span
id=
"python"
></span><h2>
绑定Python
<a
class=
"headerlink"
href=
"#python"
title=
"永久链接至标题"
>
¶
</a></h2>
<ul>
<li><p
class=
"first"
>
绑定Python
</p>
<p>
在
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/pybind.cc"
><code
class=
"docutils literal"
><span
class=
"pre"
>
paddle/pybind/pybind.cc
</span></code></a>
使用
<code
class=
"docutils literal"
><span
class=
"pre"
>
USE_OP
</span></code>
告知编译器需要链接的Op,具体解释参考
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_registry.h#L81"
>
代码注释
</a>
。
</p>
<div
class=
"highlight-default"
><div
class=
"highlight"
><pre><span></span><span
class=
"n"
>
USE_OP
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
mul
</span><span
class=
"p"
>
);
</span>
</pre></div>
</div>
...
...
<div
class=
"highlight-default"
><div
class=
"highlight"
><pre><span></span><span
class=
"n"
>
USE_NO_KERNEL_OP
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
recurrent
</span><span
class=
"p"
>
);
</span>
</pre></div>
</div>
</li>
</ul>
<ul>
<li><p
class=
"first"
>
生成库
</p>
<p>
无需修改
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/CMakeLists.txt"
><code
class=
"docutils literal"
><span
class=
"pre"
>
paddle/pybind/CMakeLists.txt
</span></code></a>
文件,
<code
class=
"docutils literal"
><span
class=
"pre"
>
paddle/operators
</span></code>
目录下新增的
<code
class=
"docutils literal"
><span
class=
"pre"
>
*_op.cc
</span></code>
文件会
被自动
添加链接到生成的lib库中。
</p>
</li>
</ul>
</div>
<div
class=
"section"
id=
""
>
<span
id=
"id5"
></span><h2>
实现单元测试
<a
class=
"headerlink"
href=
"#"
title=
"永久链接至标题"
>
¶
</a></h2>
<p>
单元测试包括对比前向Op不同设备(CPU、GPU)的实现、对比反向Op不同设备(CPU、GPU)的实现,以及反向Op的梯度测试。下面介绍
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/test_mul_op.py"
><code
class=
"docutils literal"
><span
class=
"pre"
>
MulOp
</span></code>
的单
元测试
</a>
。
</p>
<div
class=
"section"
id=
"operator"
>
<span
id=
"id6"
></span><h3>
前向Operator单元测试
<a
class=
"headerlink"
href=
"#operator"
title=
"永久链接至标题"
>
¶
</a></h3>
<p>
前向Op单元测试继承自
<code
class=
"docutils literal"
><span
class=
"pre"
>
unittest.TestCase
</span></code>
,并定义元类
<code
class=
"docutils literal"
><span
class=
"pre"
>
__metaclass__
</span>
<span
class=
"pre"
>
=
</span>
<span
class=
"pre"
>
OpTestMeta
</span></code>
。各项更加具体的单元测试在
<code
class=
"docutils literal"
><span
class=
"pre"
>
OpTestMeta
</span></code>
里完成。测试前向Operator,需要:
</p>
<ol
class=
"simple"
>
<li>
在
<code
class=
"docutils literal"
><span
class=
"pre"
>
setUp
</span></code>
函数定义输入、输出,以及相关的属性参数。
</li>
<li>
生成随机的输入数据。
</li>
<li>
在Python脚本中实现与前向operator相同的计算逻辑,得到输出值,与operator前向计算的输出进行对比。
</li>
</ol>
<div
class=
"highlight-python"
><div
class=
"highlight"
><pre><span></span><span
class=
"kn"
>
import
</span>
<span
class=
"nn"
>
unittest
</span>
<span
class=
"kn"
>
import
</span>
<span
class=
"nn"
>
numpy
</span>
<span
class=
"kn"
>
as
</span>
<span
class=
"nn"
>
np
</span>
<span
class=
"kn"
>
from
</span>
<span
class=
"nn"
>
gradient_checker
</span>
<span
class=
"kn"
>
import
</span>
<span
class=
"n"
>
GradientChecker
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
create_op
</span>
...
...
<span
class=
"bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
outputs
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"p"
>
{
</span><span
class=
"s1"
>
'
Out
'
</span><span
class=
"p"
>
:
</span>
<span
class=
"n"
>
np
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
dot
</span><span
class=
"p"
>
(
</span><span
class=
"bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
inputs
</span><span
class=
"p"
>
[
</span><span
class=
"s1"
>
'
X
'
</span><span
class=
"p"
>
],
</span>
<span
class=
"bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
inputs
</span><span
class=
"p"
>
[
</span><span
class=
"s1"
>
'
Y
'
</span><span
class=
"p"
>
])}
</span>
</pre></div>
</div>
<p>
上面的代码首先导入依赖的包,下面是对
<code
class=
"docutils literal"
><span
class=
"pre"
>
setUp
</span></code>
函数中操作的重要变量的详细解释
:
</p>
<ul
class=
"simple"
>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
self.type
</span>
<span
class=
"pre"
>
=
</span>
<span
class=
"pre"
>
"
mul
"
</span></code>
: 定义类型,
与operator注册时
注册的类型一致。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
self.inputs
</span></code>
: 定义输入,类型为
<code
class=
"docutils literal"
><span
class=
"pre"
>
numpy.array
</span></code>
,并初始化。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
self.outputs
</span></code>
: 定义输出,并
在Python脚本中完成与operator同样的计算逻辑,返回Python端的计
算结果。
</li>
</ul>
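上述 setUp 中 Python 端参考输出的计算逻辑,可以用一段纯 numpy 代码示意(以下片段为本文补充的示意,沿用上文 32×84、84×100 的形状设定,不依赖 Paddle 框架):

```python
import numpy as np

# 与 setUp 中 self.inputs 的定义一致:随机初始化 X、Y
x = np.random.random((32, 84)).astype("float32")
y = np.random.random((84, 100)).astype("float32")

# Python 端参考输出,与 MulOp 前向计算逻辑相同:Out = X * Y(矩阵乘法)
out = np.dot(x, y)

print(out.shape)  # (32, 100)
```

单测框架随后会将该参考结果与 operator 在 CPU、GPU 上的前向输出进行对比。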
</div>
<div
class=
"section"
id=
"operator"
>
<span
id=
"id7"
></span><h3>
反向Operator单元测试
<a
class=
"headerlink"
href=
"#operator"
title=
"永久链接至标题"
>
¶
</a></h3>
<p>
反向Op单
元测试继承自
<code
class=
"docutils literal"
><span
class=
"pre"
>
GradientChecker
</span></code>
,而
<code
class=
"docutils literal"
><span
class=
"pre"
>
GradientChecker
</span></code>
继承自
<code
class=
"docutils literal"
><span
class=
"pre"
>
unittest.TestCase
</span></code>
,因此,
<strong>
反向单元测试函数需要以
<code
class=
"docutils literal"
><span
class=
"pre"
>
test_
</span></code>
开头
</strong>
。
</p>
<div
class=
"highlight-
python"
><div
class=
"highlight"
><pre><span></span><span
class=
"k"
>
class
</span>
<span
class=
"nc"
>
TestMulGradOp
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
GradientChecker
</span><span
class=
"p"
>
)
:
</span>
<span
class=
"
k"
>
def
</span>
<span
class=
"nf"
>
setUp
</span><span
class=
"p"
>
(
</span><span
class=
"bp"
>
self
</span><span
class=
"p"
>
)
:
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
op
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"n"
>
create_op
</span><span
class=
"p"
>
(
</span><span
class=
"s2
"
>
"
mul
"
</span><span
class=
"p"
>
)
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o
"
>
.
</span><span
class=
"n"
>
inputs
</span>
<span
class=
"o"
>
=
</span>
<span
class=
"p"
>
{
</span>
<span
class=
"s
1"
>
'
X
'
</span><span
class=
"p"
>
:
</span>
<span
class=
"n"
>
np
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
random
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
random
</span><span
class=
"p"
>
((
</span><span
class=
"mi"
>
32
</span><span
class=
"p"
>
,
</span>
<span
class=
"mi"
>
84
</span><span
class=
"p"
>
))
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
astype
</span><span
class=
"p"
>
(
</span><span
class=
"s2
"
>
"
float32
"
</span><span
class=
"p"
>
),
</span>
<span
class=
"s
1"
>
'
Y
'
</span><span
class=
"p"
>
:
</span>
<span
class=
"n"
>
np
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
random
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
random
</span><span
class=
"p"
>
((
</span><span
class=
"mi"
>
84
</span><span
class=
"p"
>
,
</span>
<span
class=
"mi"
>
100
</span><span
class=
"p"
>
))
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
astype
</span><span
class=
"p"
>
(
</span><span
class=
"s2
"
>
"
float32
"
</span><span
class=
"p"
>
)
</span>
<span
class=
"p"
>
}
</span>
<span
class=
"
k"
>
def
</span>
<span
class=
"nf"
>
test_cpu_gpu_compare
</span><span
class=
"p"
>
(
</span><span
class=
"bp"
>
self
</span><span
class=
"p"
>
)
:
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
compare_grad
</span><span
class=
"p"
>
(
</span><span
class=
"bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
op
</span><span
class=
"p"
>
,
</span>
<span
class=
"bp"
>
self
</span><span
class=
"o
"
>
.
</span><span
class=
"n"
>
inputs
</span><span
class=
"p"
>
)
</span>
<span
class=
"
k"
>
def
</span>
<span
class=
"nf"
>
test_normal
</span><span
class=
"p"
>
(
</span><span
class=
"bp"
>
self
</span><span
class=
"p"
>
)
:
</span>
<span
class=
"c
1
"
>
# mul op will enlarge the relative error
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o
"
>
.
</span><span
class=
"n"
>
check_grad
</span><span
class=
"p"
>
(
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
op
</span><span
class=
"p"
>
,
</span>
<span
class=
"bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
inputs
</span><span
class=
"p"
>
,
</span>
<span
class=
"p"
>
[
</span><span
class=
"s2"
>
"
X
"
</span><span
class=
"p"
>
,
</span>
<span
class=
"s2"
>
"
Y
"
</span><span
class=
"p"
>
],
</span>
<span
class=
"s2
"
>
"
Out
"
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
max_relative_error
</span><span
class=
"o"
>
=
</span><span
class=
"mf"
>
0.5
</span><span
class=
"p"
>
)
</span>
<span
class=
"
k"
>
def
</span>
<span
class=
"nf"
>
test_ignore_x
</span><span
class=
"p"
>
(
</span><span
class=
"bp"
>
self
</span><span
class=
"p"
>
)
:
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o
"
>
.
</span><span
class=
"n"
>
check_grad
</span><span
class=
"p"
>
(
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o
"
>
.
</span><span
class=
"n"
>
op
</span><span
class=
"p"
>
,
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
inputs
</span><span
class=
"p"
>
,
</span>
<span
class=
"p"
>
[
</span><span
class=
"s2
"
>
"
Y
"
</span><span
class=
"p"
>
],
</span>
<span
class=
"s
2
"
>
"
Out
"
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
max_relative_error
</span><span
class=
"o"
>
=
</span><span
class=
"mf"
>
0.5
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
no_grad_set
</span><span
class=
"o"
>
=
</span><span
class=
"p"
>
{
</span><span
class=
"s
2
"
>
"
X
"
</span><span
class=
"p"
>
})
</span>
<span
class=
"
k"
>
def
</span>
<span
class=
"nf"
>
test_ignore_y
</span><span
class=
"p"
>
(
</span><span
class=
"bp"
>
self
</span><span
class=
"p"
>
)
:
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o
"
>
.
</span><span
class=
"n"
>
check_grad
</span><span
class=
"p"
>
(
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o
"
>
.
</span><span
class=
"n"
>
op
</span><span
class=
"p"
>
,
</span>
<span
class=
"
bp"
>
self
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
inputs
</span><span
class=
"p"
>
,
</span>
<span
class=
"p"
>
[
</span><span
class=
"s2
"
>
"
X
"
</span><span
class=
"p"
>
],
</span>
<span
class=
"s
2
"
>
"
Out
"
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
max_relative_error
</span><span
class=
"o"
>
=
</span><span
class=
"mf"
>
0.5
</span><span
class=
"p"
>
,
</span>
<span
class=
"n"
>
no_grad_set
</span><span
class=
"o"
>
=
</span><span
class=
"p"
>
{
</span><span
class=
"s
2
"
>
"
Y
"
</span><span
class=
"p"
>
})
</span>
</pre></div>
</div>
<p>
下面解释
代码中
一些关键的地方:
</p>
<ul
class=
"simple"
>
<li>
调用
<code
class=
"docutils literal"
><span
class=
"pre"
>
create_op(
"
mul
"
)
</span></code>
创建反向Op对应的前向Op。
</li>
<li>
调用
<code
class=
"docutils literal"
><span
class=
"pre"
>
compare_grad
</span></code>
函数对比CPU、GPU计算结果。
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
test_normal
</span></code>
中调用
<code
class=
"docutils literal"
><span
class=
"pre"
>
check_grad
</span></code>
使用数值法检测梯度正确性和稳定
性。
<ul>
<li>
第一个参数
<code
class=
"docutils literal"
><span
class=
"pre"
>
self.op
</span></code>
: 前向Op。
</li>
<li>
第二个参数
<code
class=
"docutils literal"
><span
class=
"pre"
>
self.inputs
</span></code>
: 输入字典,字典的Key和
<code
class=
"docutils literal"
><span
class=
"pre"
>
ProtoMaker
</span></code>
定义保持一致。
</li>
<li>
第三个参数
<code
class=
"docutils literal"
><span
class=
"pre"
>
[
"
X
"
,
</span>
<span
class=
"pre"
>
"
Y
"
]
</span></code>
: 指定对输入变量
<code
class=
"docutils literal"
><span
class=
"pre"
>
X
</span></code>
、
<code
class=
"docutils literal"
><span
class=
"pre"
>
Y
</span></code>
做梯度检测。
</li>
<li>
第四个参数
<code
class=
"docutils literal"
><span
class=
"pre"
>
"
Out
"
</span></code>
: 指定前向网络最终的输出目标变量
<code
class=
"docutils literal"
><span
class=
"pre"
>
Out
</span></code></li>
</ul>
</li>
<li><code
class=
"docutils literal"
><span
class=
"pre"
>
test_ignore_x
</span></code>
和
<code
class=
"docutils literal"
><span
class=
"pre"
>
test_ignore_y
</span></code>
分支
用来
测试只需要计算一个输入梯度的情况。
</li>
</ul>
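check_grad 所采用的数值法梯度检测,其基本思路可以用下面的 numpy 片段示意(仅为原理性示意,与框架内部实现无关;这里以标量损失 loss = sum(X·Y) 为例,用中心差分近似梯度,并与解析梯度对比):

```python
import numpy as np

np.random.seed(0)
x = np.random.random((3, 4))
y = np.random.random((4, 5))

# 解析梯度:对 loss = sum(X.dot(Y)),有 dloss/dX = ones.dot(Y^T)
grad_x_analytic = np.ones((3, 5)).dot(y.T)

# 数值法梯度:对 X 的每个元素做中心差分
eps = 1e-5
grad_x_numeric = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        x[i, j] += eps
        loss_pos = x.dot(y).sum()
        x[i, j] -= 2 * eps
        loss_neg = x.dot(y).sum()
        x[i, j] += eps  # 恢复原值
        grad_x_numeric[i, j] = (loss_pos - loss_neg) / (2 * eps)

# 数值梯度与解析梯度的最大相对误差应当足够小
max_rel_err = (np.abs(grad_x_numeric - grad_x_analytic).max()
               / np.abs(grad_x_analytic).max())
assert max_rel_err < 1e-6
```

对线性的矩阵乘而言,中心差分的误差主要来自浮点舍入,因此相对误差非常小;对一般算子,检测阈值会相应放宽(如上文的 max_relative_error=0.5)。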
</div>
<div
class=
"section"
id=
""
>
<span
id=
"id8"
></span><h3>
编译和执行单元测试
<a
class=
"headerlink"
href=
"#"
title=
"永久链接至标题"
>
¶
</a></h3>
<p>
单
元测试编写完成之后,在
<a
class=
"reference external"
href=
"https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/CMakeLists.txt"
><code
class=
"docutils literal"
><span
class=
"pre"
>
python/paddle/v2/framework/tests/CMakeLists.txt
</span></code></a>
中添加以下内容,将单元测试加入工程
:
</p>
<div
class=
"highlight-default"
><div
class=
"highlight"
><pre><span></span><span
class=
"n"
>
py_test
</span><span
class=
"p"
>
(
</span><span
class=
"n"
>
test_mul_op
</span>
<span
class=
"n"
>
SRCS
</span>
<span
class=
"n"
>
test_mul_op
</span><span
class=
"o"
>
.
</span><span
class=
"n"
>
py
</span><span
class=
"p"
>
)
</span>
</pre></div>
</div>
<p>
请注意,
<strong>
不同于Op的编译测试,运行单元测试时需要编译整个工程
</strong>
,并且编译时需要打开
<code
class=
"docutils literal"
><span
class=
"pre"
>
WITH_TESTING
</span></code>
, 即
<code
class=
"docutils literal"
><span
class=
"pre"
>
cmake
</span>
<span
class=
"pre"
>
paddle_dir
</span>
<span
class=
"pre"
>
-DWITH_TESTING=ON
</span></code>
。编译成功后,执行下面的命令来运行单
元测试
:
</p>
<div
class=
"highlight-bash"
><div
class=
"highlight"
><pre><span></span>
make
<span
class=
"nb"
>
test
</span>
<span
class=
"nv"
>
ARGS
</span><span
class=
"o"
>
=
</span><span
class=
"s2"
>
"
-R test_mul_op -V
"
</span>
</pre></div>
</div>