机器未来 / Paddle (forked from PaddlePaddle / Paddle)
Commit 07efdb51, authored Feb 26, 2019 by tensor-tang

Merge remote-tracking branch 'ups/develop' into jit/sgd

Parents: a0c37662, a4b4ecd8

Showing 95 changed files with 1900 additions and 969 deletions (+1900 -969)
README.md  +11  -11
README_cn.md  +11  -11
paddle/fluid/API.spec  +2  -2
paddle/fluid/framework/block_desc.cc  +14  -0
paddle/fluid/framework/block_desc.h  +2  -0
paddle/fluid/framework/data_layout_transform.cc  +7  -16
paddle/fluid/framework/data_transform.cc  +24  -6
paddle/fluid/framework/details/all_reduce_deps_pass.cc  +2  -2
paddle/fluid/framework/details/build_strategy.cc  +5  -31
paddle/fluid/framework/details/build_strategy.h  +1  -1
paddle/fluid/framework/details/fast_threaded_ssa_graph_executor.cc  +4  -5
paddle/fluid/framework/details/fast_threaded_ssa_graph_executor.h  +2  -2
paddle/fluid/framework/details/memory_optimize_helper.cc  +39  -8
paddle/fluid/framework/details/memory_optimize_helper.h  +6  -4
paddle/fluid/framework/details/memory_optimize_helper_test.cc  +4  -22
paddle/fluid/framework/details/memory_optimize_pass.cc  +6  -8
paddle/fluid/framework/details/multi_devices_graph_pass.cc  +15  -3
paddle/fluid/framework/details/parallel_ssa_graph_executor.cc  +4  -11
paddle/fluid/framework/details/parallel_ssa_graph_executor.h  +2  -4
paddle/fluid/framework/details/sequential_execution_pass.cc  +2  -2
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc  +4  -5
paddle/fluid/framework/details/threaded_ssa_graph_executor.h  +2  -2
paddle/fluid/framework/ir/graph.cc  +3  -0
paddle/fluid/framework/ir/graph.h  +7  -1
paddle/fluid/framework/ir/mkldnn/conv_elementwise_add_mkldnn_fuse_pass_tester.cc  +40  -10
paddle/fluid/framework/operator.cc  +16  -2
paddle/fluid/framework/operator.h  +35  -2
paddle/fluid/framework/operator_kernel_configs.h  +118  -0
paddle/fluid/framework/parallel_executor.cc  +50  -42
paddle/fluid/framework/parallel_executor.h  +3  -3
paddle/fluid/framework/tensor.h  +33  -8
paddle/fluid/framework/var_type_traits.h  +0  -5
paddle/fluid/imperative/layer.cc  +2  -1
paddle/fluid/imperative/layer.h  +29  -18
paddle/fluid/imperative/tracer.cc  +17  -8
paddle/fluid/imperative/tracer.h  +6  -4
paddle/fluid/operators/alloc_continuous_space_op.cc  +211  -0
paddle/fluid/operators/beam_search_decode_op.cc  +1  -1
paddle/fluid/operators/beam_search_decode_op.h  +1  -1
paddle/fluid/operators/benchmark/op_tester.cc  +43  -10
paddle/fluid/operators/benchmark/op_tester.h  +1  -2
paddle/fluid/operators/benchmark/op_tester_config.cc  +83  -9
paddle/fluid/operators/benchmark/op_tester_config.h  +10  -1
paddle/fluid/operators/conv_cudnn_op.cu.cc  +14  -45
paddle/fluid/operators/conv_cudnn_op_cache.h  +1  -95
paddle/fluid/operators/conv_fusion_op.cu.cc  +9  -24
paddle/fluid/operators/conv_op.cc  +34  -5
paddle/fluid/operators/cross_entropy_op.cc  +39  -18
paddle/fluid/operators/detection/prior_box_op.h  +12  -1
paddle/fluid/operators/fake_quantize_op.cc  +2  -4
paddle/fluid/operators/mkldnn/activation_mkldnn_op.cc  +1  -1
paddle/fluid/operators/mkldnn/conv_mkldnn_op.cc  +23  -22
paddle/fluid/operators/mkldnn/gaussian_random_mkldnn_op.cc  +6  -2
paddle/fluid/operators/mkldnn/pool_mkldnn_op.cc  +3  -4
paddle/fluid/operators/mkldnn/softmax_mkldnn_op.cc  +3  -4
paddle/fluid/operators/mkldnn/sum_mkldnn_op.cc  +4  -108
paddle/fluid/operators/mkldnn/transpose_mkldnn_op.cc  +22  -3
paddle/fluid/operators/pool_op.cc  +8  -7
paddle/fluid/operators/sequence_ops/sequence_expand_op.cu  +77  -15
paddle/fluid/platform/CMakeLists.txt  +3  -3
paddle/fluid/platform/device_context.cc  +2  -2
paddle/fluid/platform/device_tracer.cc  +12  -2
paddle/fluid/platform/device_tracer.h  +1  -2
paddle/fluid/platform/enforce.h  +1  -0
paddle/fluid/platform/event.h  +68  -0
paddle/fluid/platform/mkldnn_reuse.h  +45  -38
paddle/fluid/platform/mkldnn_utils.h  +69  -0
paddle/fluid/platform/profiler.cu  +10  -10
paddle/fluid/platform/profiler.h  +5  -46
paddle/fluid/platform/profiler_test.cc  +0  -1
paddle/fluid/platform/temporary_allocator_test.cc  +4  -4
paddle/fluid/pybind/imperative.cc  +4  -4
paddle/fluid/pybind/ir.cc  +2  -1
paddle/fluid/pybind/protobuf.cc  +2  -0
paddle/fluid/pybind/pybind.cc  +4  -3
paddle/fluid/train/demo/demo_trainer.cc  +2  -2
paddle/fluid/train/test_train_recognize_digits.cc  +1  -1
paddle/scripts/paddle_build.sh  +1  -0
python/paddle/fluid/compiler.py  +49  -24
python/paddle/fluid/executor.py  +2  -0
python/paddle/fluid/framework.py  +51  -15
python/paddle/fluid/layers/nn.py  +36  -8
python/paddle/fluid/layers/ops.py  +22  -7
python/paddle/fluid/parallel_executor.py  +6  -12
python/paddle/fluid/tests/unittests/mkldnn/mkldnn_op_test.py  +72  -0
python/paddle/fluid/tests/unittests/mkldnn/test_activation_mkldnn_op.py  +17  -55
python/paddle/fluid/tests/unittests/mkldnn/test_softmax_mkldnn_op.py  +57  -0
python/paddle/fluid/tests/unittests/test_alloc_continuous_space_op.py  +74  -0
python/paddle/fluid/tests/unittests/test_fake_quantize_op.py  +35  -2
python/paddle/fluid/tests/unittests/test_imperative_optimizer.py  +59  -56
python/paddle/fluid/tests/unittests/test_imperative_resnet.py  +5  -5
python/paddle/fluid/tests/unittests/test_ir_memory_optimize_transformer.py  +33  -5
python/paddle/fluid/tests/unittests/test_softmax_op.py  +0  -10
python/paddle/utils/plot.py  +8  -7
python/paddle/utils/preprocess_img.py  +2  -2
README.md
@@ -3,8 +3,8 @@
 English | [简体中文](./README_cn.md)
 
 [badge](https://travis-ci.org/PaddlePaddle/Paddle)
-[badge](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)
-[badge](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html)
+[badge](http://paddlepaddle.org/documentation/docs/en/1.3/beginners_guide/index_en.html)
+[badge](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/index.html)
 [badge](https://github.com/PaddlePaddle/Paddle/releases)
 [badge](LICENSE)
@@ -18,7 +18,7 @@ learning to many products at Baidu.
 Our vision is to enable deep learning for everyone via PaddlePaddle.
 Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest feature of PaddlePaddle.
 
-### Latest PaddlePaddle Release: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2)
+### Latest PaddlePaddle Release: [Fluid 1.3.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.3)
 ### Install Latest Stable Release:
 ```
 # Linux CPU
@@ -26,9 +26,9 @@ pip install paddlepaddle
 # Linux GPU cuda9cudnn7
 pip install paddlepaddle-gpu
 # Linux GPU cuda8cudnn7
-pip install paddlepaddle-gpu==1.2.0.post87
+pip install paddlepaddle-gpu==1.3.0.post87
 # Linux GPU cuda8cudnn5
-pip install paddlepaddle-gpu==1.2.0.post85
+pip install paddlepaddle-gpu==1.3.0.post85
 # For installation on other platform, refer to http://paddlepaddle.org/
 ```
@@ -75,26 +75,26 @@ pip install paddlepaddle-gpu==1.2.0.post85
 ## Installation
-It is recommended to read [this doc](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html) on our website.
+It is recommended to read [this doc](http://paddlepaddle.org/documentation/docs/en/1.3/beginners_guide/index_en.html) on our website.
 
 ## Documentation
-We provide [English](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html) and
-[Chinese](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html) documentation.
+We provide [English](http://paddlepaddle.org/documentation/docs/en/1.3/beginners_guide/index_en.html) and
+[Chinese](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/index.html) documentation.
 
 - [Deep Learning 101](https://github.com/PaddlePaddle/book)
   You might want to start from this online interactive book that can run in a Jupyter Notebook.
 
-- [Distributed Training](http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/training/cluster_howto.html)
+- [Distributed Training](http://paddlepaddle.org/documentation/docs/en/1.3/user_guides/howto/training/multi_node_en.html)
   You can run distributed training jobs on MPI clusters.
 
-- [Python API](http://paddlepaddle.org/documentation/docs/zh/1.2/api_cn/index_cn.html)
+- [Python API](http://paddlepaddle.org/documentation/docs/en/1.3/api/index_en.html)
   Our new API enables much shorter programs.
 
-- [How to Contribute](http://paddlepaddle.org/documentation/docs/zh/1.2/advanced_usage/development/contribute_to_paddle/index_cn.html)
+- [How to Contribute](http://paddlepaddle.org/documentation/docs/en/1.3/advanced_usage/development/contribute_to_paddle/index_en.html)
   We appreciate your contributions!
README_cn.md
@@ -3,8 +3,8 @@
 [English](./README.md) | 简体中文
 
 [badge](https://travis-ci.org/PaddlePaddle/Paddle)
-[badge](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)
-[badge](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html)
+[badge](http://paddlepaddle.org/documentation/docs/en/1.3/beginners_guide/index_en.html)
+[badge](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/index.html)
 [badge](https://github.com/PaddlePaddle/Paddle/releases)
 [badge](LICENSE)
@@ -16,7 +16,7 @@ PaddlePaddle (PArallel Distributed Deep LEarning) 是一个简单易用、高效
 跟进PaddlePaddle最新特性请参考我们的[版本说明](https://github.com/PaddlePaddle/Paddle/releases)
 
-### PaddlePaddle最新版本: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2)
+### PaddlePaddle最新版本: [Fluid 1.3.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.3)
 ### 安装最新稳定版本:
 ```
 # Linux CPU
@@ -24,9 +24,9 @@ pip install paddlepaddle
 # Linux GPU cuda9cudnn7
 pip install paddlepaddle-gpu
 # Linux GPU cuda8cudnn7
-pip install paddlepaddle-gpu==1.2.0.post87
+pip install paddlepaddle-gpu==1.3.0.post87
 # Linux GPU cuda8cudnn5
-pip install paddlepaddle-gpu==1.2.0.post85
+pip install paddlepaddle-gpu==1.3.0.post85
 # 其他平台上的安装指引请参考 http://paddlepaddle.org/
 ```
@@ -57,26 +57,26 @@ pip install paddlepaddle-gpu==1.2.0.post85
 ## 安装
-推荐阅读官网上的[安装说明](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html)
+推荐阅读官网上的[安装说明](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html)
 
 ## 文档
-我们提供[英文](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)和
-[中文](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html)文档
+我们提供[英文](http://paddlepaddle.org/documentation/docs/en/1.3/beginners_guide/index_en.html)和
+[中文](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/index.html)文档
 
 - [深度学习101](https://github.com/PaddlePaddle/book)
   或许您想从这个在线交互式书籍开始,可以在Jupyter Notebook中运行
 
-- [分布式训练](http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/training/cluster_howto.html)
+- [分布式训练](http://paddlepaddle.org/documentation/docs/zh/1.3/user_guides/howto/training/multi_node.html)
   可以在MPI集群上运行分布式训练任务
 
-- [Python API](http://paddlepaddle.org/documentation/docs/zh/1.2/api_cn/index_cn.html)
+- [Python API](http://paddlepaddle.org/documentation/docs/zh/1.3/api_cn/index_cn.html)
   新的API支持代码更少更简洁的程序
 
-- [贡献方式](http://paddlepaddle.org/documentation/docs/zh/1.2/advanced_usage/development/contribute_to_paddle/index_cn.html)
+- [贡献方式](http://paddlepaddle.org/documentation/docs/zh/1.3/advanced_usage/development/contribute_to_paddle/index_cn.html)
   欢迎您的贡献!
paddle/fluid/API.spec
@@ -43,7 +43,7 @@ paddle.fluid.AsyncExecutor.init_worker ArgSpec(args=['self', 'dist_desc', 'start
 paddle.fluid.AsyncExecutor.run ArgSpec(args=['self', 'program', 'data_feed', 'filelist', 'thread_num', 'fetch', 'mode', 'debug'], varargs=None, keywords=None, defaults=('', False))
 paddle.fluid.AsyncExecutor.save_model ArgSpec(args=['self', 'save_path'], varargs=None, keywords=None, defaults=None)
 paddle.fluid.AsyncExecutor.stop ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
-paddle.fluid.CompiledProgram.__init__ ArgSpec(args=['self', 'program'], varargs=None, keywords=None, defaults=None)
+paddle.fluid.CompiledProgram.__init__ ArgSpec(args=['self', 'program_or_graph'], varargs=None, keywords=None, defaults=None)
 paddle.fluid.CompiledProgram.with_data_parallel ArgSpec(args=['self', 'loss_name', 'build_strategy', 'exec_strategy', 'share_vars_from'], varargs=None, keywords=None, defaults=(None, None, None, None))
 paddle.fluid.CompiledProgram.with_inference_optimize ArgSpec(args=['self', 'config'], varargs=None, keywords=None, defaults=None)
 paddle.fluid.ExecutionStrategy.__init__ __init__(self: paddle.fluid.core.ParallelExecutor.ExecutionStrategy) -> None
@@ -304,7 +304,7 @@ paddle.fluid.layers.reciprocal ArgSpec(args=['x', 'name'], varargs=None, keyword
 paddle.fluid.layers.square ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.softplus ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.softsign ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
-paddle.fluid.layers.uniform_random ArgSpec(args=['shape', 'dtype', 'min', 'max', 'seed'], varargs=None, keywords=None, defaults=(None, None, None, None))
+paddle.fluid.layers.uniform_random ArgSpec(args=['shape', 'dtype', 'min', 'max', 'seed'], varargs=None, keywords=None, defaults=('float32', -1.0, 1.0, 0))
 paddle.fluid.layers.hard_shrink ArgSpec(args=['x', 'threshold'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.cumsum ArgSpec(args=['x', 'axis', 'exclusive', 'reverse'], varargs=None, keywords=None, defaults=(None, None, None))
 paddle.fluid.layers.thresholded_relu ArgSpec(args=['x', 'threshold'], varargs=None, keywords=None, defaults=(None,))
paddle/fluid/framework/block_desc.cc
@@ -163,6 +163,20 @@ std::vector<OpDesc *> BlockDesc::AllOps() const {
   return res;
 }
 
+void BlockDesc::Clear() {
+  // clear all ops
+  ops_.clear();
+
+  // clear all vars which are not persistable
+  for (auto it = vars_.begin(); it != vars_.end();) {
+    if (it->second->Persistable()) {
+      ++it;
+    } else {
+      vars_.erase(it++);
+    }
+  }
+}
+
 void BlockDesc::Flush() {
   for (auto &op_desc : ops_) {
     op_desc->Flush();
paddle/fluid/framework/block_desc.h
@@ -97,6 +97,8 @@ class BlockDesc {
 
   std::vector<OpDesc *> AllOps() const;
 
+  void Clear();
+
   size_t OpSize() const { return ops_.size(); }
 
   OpDesc *Op(int idx) const { return ops_.at(idx).get(); }
paddle/fluid/framework/data_layout_transform.cc
@@ -134,11 +134,6 @@ void TransDataLayoutFromMKLDNN(const OpKernelType& kernel_type_for_var,
   out_layout =
       out_layout == DataLayout::kAnyLayout ? DataLayout::kNCHW : out_layout;
 
-  auto& pool = platform::DeviceContextPool::Instance();
-  auto* dev_ctx = dynamic_cast<platform::MKLDNNDeviceContext*>(
-      pool.Get(expected_kernel_type.place_));
-  auto& cpu_engine = dev_ctx->GetEngine();
-
   std::vector<int> in_tz = paddle::framework::vectorize2int(in.dims());
   std::vector<int> out_tz = in_tz;
@@ -147,29 +142,25 @@ void TransDataLayoutFromMKLDNN(const OpKernelType& kernel_type_for_var,
                  "Input tensor type is not supported: %s", in.type());
   memory::data_type out_type = in_type;
 
-  auto in_format = platform::MKLDNNFormatForSize(in_tz.size(), in.format());
-  auto out_format =
-      platform::MKLDNNFormatForSize(in_tz.size(), ToMKLDNNFormat(out_layout));
-
   // output tensor has the same dims as input. Reorder don't change dims
   out->Resize(in.dims());
 
-  if (in_format != out_format) {
+  // tempory mem pd fr out , to make reorder
+  auto out_mem_pd = paddle::platform::create_prim_desc_from_dims(
+      paddle::framework::vectorize2int(out->dims()),
+      mkldnn::memory::format::blocked, out_type);
+  if (in.get_mkldnn_prim_desc() != out_mem_pd) {
     void* in_data = GetDataFromTensor(in, in_type);
     auto out_data = out->mutable_data(expected_kernel_type.place_, in.type());
 
-    auto in_memory =
-        memory({{{in_tz}, in_type, in_format}, cpu_engine}, in_data);
-    auto out_memory =
-        memory({{{out_tz}, out_type, out_format}, cpu_engine}, out_data);
+    auto in_memory = memory(in.get_mkldnn_prim_desc(), in_data);
+    auto out_memory = memory(out_mem_pd, out_data);
 
     platform::Reorder(in_memory, out_memory);
   } else {
     out->ShareDataWith(in);
   }
   out->set_layout(out_layout);
-  // reset format since the out tensor will be feed to non-MKLDNN OPkernel
-  out->set_format(memory::format::format_undef);
 #endif
 }
paddle/fluid/framework/data_transform.cc
@@ -51,13 +51,31 @@ void TransformData(const OpKernelType &expected_kernel_type,
 #ifdef PADDLE_WITH_MKLDNN
       // Case1 - transform from Non-MKLDNN OPKernel to MKLDNN OPKernel
       // Just set layout/format. No real transform occur
-      auto out_format = platform::MKLDNNFormatForSize(in.dims().size(),
-                                                      ToMKLDNNFormat(lin));
-
       out.ShareDataWith(input_tensor);
-      out.set_layout(DataLayout::kMKLDNN);
-      out.set_format(out_format);
+      // TODO(jczaja): Remove that once all mkldnn ops
+      // are modified to work with mkldnn_blocked
+      auto mkldnn_fmt = [&](int rank) {
+        switch (rank) {
+          case 5:
+            return mkldnn::memory::format::ncdhw;
+          case 4:
+            return mkldnn::memory::format::nchw;
+          case 3:
+            return mkldnn::memory::format::ncw;
+          case 2:
+            return mkldnn::memory::format::nc;
+          case 1:
+            return mkldnn::memory::format::x;
+          default:
+            return mkldnn::memory::format::blocked;
+        }
+      };
+
+      auto out_mem_pd = paddle::platform::create_prim_desc_from_dims(
+          paddle::framework::vectorize2int(out.dims()),
+          mkldnn_fmt(out.dims().size()));
+
+      out.set_mkldnn_prim_desc(out_mem_pd);
 #endif
     } else {
       // Case2 - transfrom from MKLDNN OPKernel to Non-MKLDNN OPKernel
paddle/fluid/framework/details/all_reduce_deps_pass.cc
@@ -50,7 +50,7 @@ std::unique_ptr<ir::Graph> AllReduceDepsPass::ApplyImpl(
   std::unordered_map<std::string, int> vars;
   // TODO(gongwb): use graph topology sort to find the order of operators.
   //               Note that must assert topology sort is stable
-  auto& ops = Get<const std::vector<OpDesc*>>(kAllOpDescs);
+  auto& ops = graph->Get<const std::vector<OpDesc*>>(kStaleProgramOpDescs);
   for (auto* op_desc : ops) {
     auto outputs = op_desc->Outputs();
     for (auto& o_it : outputs) {
@@ -120,4 +120,4 @@ std::unique_ptr<ir::Graph> AllReduceDepsPass::ApplyImpl(
 
 REGISTER_PASS(all_reduce_deps_pass,
               paddle::framework::details::AllReduceDepsPass)
-    .RequirePassAttr(paddle::framework::details::kAllOpDescs);
+    .RequireGraphAttr(paddle::framework::details::kStaleProgramOpDescs);
paddle/fluid/framework/details/build_strategy.cc
@@ -135,12 +135,15 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
   void AppendMultiDevPass(const BuildStrategy &strategy) {
     ir::Pass *multi_devices_pass;
     if (strategy_.is_distribution_) {
+      VLOG(3) << "multi device parameter server mode";
       multi_devices_pass = AppendPass("dist_multi_devices_pass").get();
     } else {
       if (strategy.reduce_ == BuildStrategy::ReduceStrategy::kAllReduce) {
+        VLOG(3) << "multi devices collective mode with allreduce";
         multi_devices_pass =
             AppendPass("allreduce_mode_multi_devices_pass").get();
       } else if (strategy.reduce_ == BuildStrategy::ReduceStrategy::kReduce) {
+        VLOG(3) << "multi deivces collective mode with reduce";
         multi_devices_pass = AppendPass("reduce_mode_multi_devices_pass").get();
       } else {
         PADDLE_THROW("Unknown reduce strategy.");
@@ -171,7 +174,8 @@ bool BuildStrategy::IsMultiDevPass(const std::string &pass_name) const {
 }
 
 std::unique_ptr<ir::Graph> BuildStrategy::Apply(
-    const ProgramDesc &main_program, const std::vector<platform::Place> &places,
+    std::unique_ptr<ir::Graph> graph,
+    const std::vector<platform::Place> &places,
     const std::string &loss_var_name, const std::vector<Scope *> &local_scopes,
     const size_t &nranks,
 #if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
@@ -182,7 +186,6 @@ std::unique_ptr<ir::Graph> BuildStrategy::Apply(
   // Create a default one if not finalized by user.
   CreatePassesFromStrategy(false);
 
-  std::unique_ptr<ir::Graph> graph(new ir::Graph(main_program));
   for (std::shared_ptr<ir::Pass> &pass : pass_builder_->AllPasses()) {
     if (IsMultiDevPass(pass->Type())) {
       pass->Erase(kPlaces);
@@ -200,41 +203,12 @@ std::unique_ptr<ir::Graph> BuildStrategy::Apply(
       pass->Erase("nccl_ctxs");
       pass->SetNotOwned<platform::NCCLContextMap>("nccl_ctxs", nctx);
 #endif
-    } else if (pass->Type() == "memory_optimize_pass") {
-      if (graph->Has(kAllOpDescs)) {
-        graph->Erase(kAllOpDescs);
-      }
-      const std::vector<OpDesc *> *all_op_descs =
-          new std::vector<OpDesc *>(main_program.Block(0).AllOps());
-      graph->Set<const std::vector<OpDesc *>>(kAllOpDescs,
-                                              all_op_descs);  // take ownership
-
-      pass->Erase(kAllOpDescs);
-      pass->SetNotOwned<const std::vector<OpDesc *>>(kAllOpDescs, all_op_descs);
-
     } else if (pass->Type() == "sequential_execution_pass") {
       LOG(INFO) << "set enable_sequential_execution:"
                 << enable_sequential_execution_;
-
-      pass->Erase(kAllOpDescs);
-      pass->Set<const std::vector<OpDesc *>>(
-          kAllOpDescs,
-          new std::vector<OpDesc *>(main_program.Block(0).AllOps()));
     } else if (pass->Type() == "all_reduce_deps_pass") {
       LOG(INFO) << "SeqOnlyAllReduceOps:" << SeqOnlyAllReduceOps(*this)
                 << ", num_trainers:" << num_trainers_;
-
-      pass->Erase(kAllOpDescs);
-      pass->Set<const std::vector<OpDesc *>>(
-          kAllOpDescs,
-          new std::vector<OpDesc *>(main_program.Block(0).AllOps()));
-    } else if (pass->Type() == "inplace_pass") {
-      if (graph->Has(kAllOpDescs)) {
-        graph->Erase(kAllOpDescs);
-      }
-      graph->Set<const std::vector<OpDesc *>>(
-          kAllOpDescs,
-          new std::vector<OpDesc *>(main_program.Block(0).AllOps()));
     } else if (pass->Type() == "fuse_relu_depthwise_conv_pass") {
       if (!use_cuda) {
         LOG(WARNING) << "fuse_relu_depthwise_conv_pass is only supported on "
paddle/fluid/framework/details/build_strategy.h
@@ -114,7 +114,7 @@ struct BuildStrategy {
 
   // Apply the passes built by the pass_builder_. The passes will be
   // applied to the Program and output an ir::Graph.
-  std::unique_ptr<ir::Graph> Apply(const ProgramDesc &main_program,
+  std::unique_ptr<ir::Graph> Apply(std::unique_ptr<ir::Graph> graph,
                                    const std::vector<platform::Place> &places,
                                    const std::string &loss_var_name,
                                    const std::vector<Scope *> &local_scopes,
paddle/fluid/framework/details/fast_threaded_ssa_graph_executor.cc
@@ -24,12 +24,11 @@ namespace details {
 FastThreadedSSAGraphExecutor::FastThreadedSSAGraphExecutor(
     const ExecutionStrategy &strategy, const std::vector<Scope *> &local_scopes,
-    const std::vector<platform::Place> &places,
-    std::unique_ptr<ir::Graph> &&graph)
+    const std::vector<platform::Place> &places, ir::Graph *graph)
     : strategy_(strategy),
       local_scopes_(local_scopes),
       places_(places),
-      graph_(std::move(graph)),
+      graph_(graph),
       pool_(strategy.num_threads_),
       prepare_pool_(1),  // add one more thread for generate op_deps
       fetch_ctxs_(places) {
@@ -110,14 +109,14 @@ FeedFetchList FastThreadedSSAGraphExecutor::Run(
         }
       }
       if (exception_.IsCaught()) {
-        ClearFetchOp(graph_.get(), &fetch_ops);
+        ClearFetchOp(graph_, &fetch_ops);
         exception_.ReThrow();
       }
     }
     num_complete += num_comp;
   }
   // Wait FetchOps.
-  ClearFetchOp(graph_.get(), &fetch_ops);
+  ClearFetchOp(graph_, &fetch_ops);
   return fetches;
 }
paddle/fluid/framework/details/fast_threaded_ssa_graph_executor.h
@@ -32,7 +32,7 @@ class FastThreadedSSAGraphExecutor : public SSAGraphExecutor {
   FastThreadedSSAGraphExecutor(const ExecutionStrategy &strategy,
                                const std::vector<Scope *> &local_scopes,
                                const std::vector<platform::Place> &places,
-                               std::unique_ptr<ir::Graph> &&graph);
+                               ir::Graph *graph);
   FeedFetchList Run(const std::vector<std::string> &fetch_tensors) override;
   const ir::Graph &Graph() const override;
 
@@ -40,7 +40,7 @@ class FastThreadedSSAGraphExecutor : public SSAGraphExecutor {
   ExecutionStrategy strategy_;
   std::vector<Scope *> local_scopes_;
   std::vector<platform::Place> places_;
-  std::unique_ptr<ir::Graph> graph_;
+  ir::Graph *graph_;
 
   std::unordered_map<OpHandleBase *, int> op_deps_;
   std::vector<OpHandleBase *> bootstrap_ops_;
paddle/fluid/framework/details/memory_optimize_helper.cc
@@ -33,10 +33,10 @@ namespace details {
 using paddle::framework::VarDesc;
 
 std::vector<ir::Node*> SortOpLikeDescOrder(const ir::Graph& graph) {
-  PADDLE_ENFORCE(graph.Has(kAllOpDescs),
-                 "Graph has no attribute of kAllOpDescs.");
+  PADDLE_ENFORCE(graph.Has(kStaleProgramOpDescs),
+                 "Graph has no attribute of kStaleProgramOpDescs.");
   // 1. get op desc order
-  auto& op_descs = graph.Get<const std::vector<OpDesc*>>(kAllOpDescs);
+  auto& op_descs = graph.Get<const std::vector<OpDesc*>>(kStaleProgramOpDescs);
 
   // 2. topology sort order
   auto nodes = graph.Nodes();
@@ -461,11 +461,21 @@ void ControlFlowGraph::LiveVariableAnalysis() {
       }
     }
   }
+
+  for (auto* op : ops_) {
+    unlived_vars_[op] = std::set<std::string>();
+    for (auto& var : this->LiveIn(op)) {
+      if (!this->LiveOut(op).count(var)) {
+        unlived_vars_[op].insert(var);
+      }
+    }
+  }
 }
 
 void ControlFlowGraph::RenameVarInCFGGraph(const std::string& old_node,
                                            const std::string& new_node,
                                            int begin_idx) {
   std::vector<bool> need_update(ops_.size(), false);
   // update graph from begin idx to the end
   for (size_t i = begin_idx; i != ops_.size(); ++i) {
     auto* op = ops_[i];
@@ -480,15 +490,27 @@ void ControlFlowGraph::RenameVarInCFGGraph(const std::string& old_node,
     if (live_in_[op].find(old_node) != live_in_[op].end()) {
       live_in_[op].erase(old_node);
       live_in_[op].insert(new_node);
       need_update[i] = true;
     }
     if (live_out_[op].find(old_node) != live_out_[op].end()) {
       live_out_[op].erase(old_node);
       live_out_[op].insert(new_node);
       need_update[i] = true;
     }
   }
+
+  for (size_t i = begin_idx; i < ops_.size(); ++i) {
+    if (!need_update[i]) continue;
+    auto* op = ops_[i];
+    for (auto& var : this->LiveIn(op)) {
+      if (!this->LiveOut(op).count(var)) {
+        unlived_vars_[op].insert(var);
+      }
+    }
+  }
 }
 
-const std::set<std::string> ControlFlowGraph::LiveIn(ir::Node* op) const {
+const std::set<std::string>& ControlFlowGraph::LiveIn(ir::Node* op) const {
   auto it = live_in_.find(op);
   PADDLE_ENFORCE(it != live_in_.end(),
@@ -496,7 +518,7 @@ const std::set<std::string> ControlFlowGraph::LiveIn(ir::Node* op) const {
   return it->second;
 }
 
-const std::set<std::string> ControlFlowGraph::LiveOut(ir::Node* op) const {
+const std::set<std::string>& ControlFlowGraph::LiveOut(ir::Node* op) const {
   auto it = live_out_.find(op);
   PADDLE_ENFORCE(it != live_out_.end(),
@@ -504,15 +526,24 @@ const std::set<std::string> ControlFlowGraph::LiveOut(ir::Node* op) const {
   return it->second;
 }
 
-const std::set<std::string> ControlFlowGraph::Use(ir::Node* op) const {
+const std::set<std::string>& ControlFlowGraph::Use(ir::Node* op) const {
   auto it = uses_.find(op);
   PADDLE_ENFORCE(it != uses_.end(),
-                 string::Sprintf("Expect %s in live_out, but Not Found.", op->Name()));
+                 string::Sprintf("Expect %s in use, but Not Found.", op->Name()));
   return it->second;
 }
 
+const std::set<std::string>& ControlFlowGraph::Unlived(ir::Node* op) const {
+  auto it = unlived_vars_.find(op);
+  PADDLE_ENFORCE(it != unlived_vars_.end(),
+                 string::Sprintf("Expect %s in unlived_set, but Not Found.",
+                                 op->Name()));
+  return it->second;
+  return it->second;
+}
+
-const std::vector<ir::Node*> ControlFlowGraph::Ops() const { return ops_; }
+const std::vector<ir::Node*>& ControlFlowGraph::Ops() const { return ops_; }
 
 std::vector<ir::Node*>& ControlFlowGraph::Ops() { return ops_; }
paddle/fluid/framework/details/memory_optimize_helper.h
@@ -92,10 +92,11 @@ class ControlFlowGraph {
   void RenameVarInCFGGraph(const std::string& old_node,
                            const std::string& new_node, int begin_idx);
 
-  const std::set<std::string> LiveIn(ir::Node* op) const;
-  const std::set<std::string> LiveOut(ir::Node* op) const;
-  const std::set<std::string> Use(ir::Node* op) const;
-  const std::vector<ir::Node*> Ops() const;
+  const std::set<std::string>& LiveIn(ir::Node* op) const;
+  const std::set<std::string>& LiveOut(ir::Node* op) const;
+  const std::set<std::string>& Use(ir::Node* op) const;
+  const std::set<std::string>& Unlived(ir::Node* op) const;
+  const std::vector<ir::Node*>& Ops() const;
   std::vector<ir::Node*>& Ops();
 
   // for ssa-graph nodes
@@ -117,6 +118,7 @@ class ControlFlowGraph {
   VarSetMap live_out_;
   VarSetMap uses_;  // op inputs
   VarSetMap defs_;  // op outputs
+  std::unordered_map<ir::Node*, std::set<std::string>> unlived_vars_;
 
   std::vector<ir::Node*> ops_;  // op sequence by topology sort
 };
paddle/fluid/framework/details/memory_optimize_helper_test.cc
@@ -228,9 +228,6 @@ TEST(CFGGraph, IRGraph) {
   // prepare ir graph
   auto prog = FillProgramDesc();
   ir::Graph graph(prog);
-  const std::vector<OpDesc*>* all_op_descs =
-      new std::vector<OpDesc*>(prog.Block(0).AllOps());
-  graph.Set(details::kAllOpDescs, all_op_descs);  // take ownership
 
   ControlFlowGraph cfg(graph);
   cfg.LiveVariableAnalysis();
@@ -256,9 +253,6 @@ TEST(CFGGraph, IRGraph) {
 TEST(SortOpLikeDescOrder, NormalTest) {
   auto prog = FillProgramDesc();
   ir::Graph graph(prog);
-  const std::vector<OpDesc*>* all_op_descs =
-      new std::vector<OpDesc*>(prog.Block(0).AllOps());
-  graph.Set(details::kAllOpDescs, all_op_descs);  // take ownership
 
   auto nodes = SortOpLikeDescOrder(graph);
   auto op_descs = prog.Block(0).AllOps();
@@ -273,9 +267,6 @@ TEST(SortOpLikeDescOrder, NormalTest) {
 TEST(SortOpLikeDescOrder, RemoveOpDesc) {
   auto prog = FillProgramDesc();
   ir::Graph graph(prog);
-  const std::vector<OpDesc*>* all_op_descs =
-      new std::vector<OpDesc*>(prog.Block(0).AllOps());
-  graph.Set(details::kAllOpDescs, all_op_descs);  // take ownership
   auto nodes = graph.Nodes();
   auto op_descs = prog.Block(0).AllOps();
   ir::Node* found_node = nullptr;
@@ -324,8 +315,6 @@ TEST(SortOpLikeDescOrder, RemoveOpDesc) {
 // 3. add some op_desc
 TEST(SortOpLikeDescOrder, AddOpDesc) {
   auto prog = FillProgramDesc();
-  const std::vector<OpDesc*>* all_op_descs =
-      new std::vector<OpDesc*>(prog.Block(0).AllOps());
   ir::Graph graph(prog);
 
   auto find_node_in_graph = [&](std::string s) {
@@ -342,9 +331,7 @@ TEST(SortOpLikeDescOrder, AddOpDesc) {
   // cached desc different with real one
   // mimic the intermidiete pass modify the programdesc.
-  graph.Set(details::kAllOpDescs, all_op_descs);  // take ownership
-
-  auto op_descs = prog.Block(0).AllOps();
+  std::vector<OpDesc*> op_descs = graph.OriginProgram().Block(0).AllOps();
 
   auto op = prog.MutableBlock(0)->AppendOp();
   prog.MutableBlock(0)->Var("d1")->SetType(proto::VarType::LOD_TENSOR);
@@ -376,9 +363,6 @@ TEST(SortOpLikeDescOrder, AddOpDesc) {
 TEST(SortOpLikeDescOrder, AddAndDeleteOpDesc) {
   auto prog = FillProgramDesc();
   ir::Graph graph(prog);
-  const std::vector<OpDesc*>* all_op_descs =
-      new std::vector<OpDesc*>(prog.Block(0).AllOps());
-  graph.Set(details::kAllOpDescs, all_op_descs);  // take ownership
 
   auto find_node_in_graph = [&](std::string s) {
     ir::Node* ret = nullptr;
@@ -392,8 +376,9 @@ TEST(SortOpLikeDescOrder, AddAndDeleteOpDesc) {
     return ret;
   };
 
+  std::vector<OpDesc*> op_descs = graph.OriginProgram().Block(0).AllOps();
   // remove sum node
-  auto op_descs = prog.Block(0).AllOps();
   ir::Node* found_node = nullptr;
   auto nodes = graph.Nodes();
   for (auto node : nodes) {
@@ -454,9 +439,7 @@ TEST(SortOpLikeDescOrder, AddAndDeleteOpDesc) {
 TEST(SortOpLikeDescOrder, AddAndReplaceOpDescInplace) {
   auto prog = FillProgramDesc();
   ir::Graph graph(prog);
-  const std::vector<OpDesc*>* all_op_descs =
-      new std::vector<OpDesc*>(prog.Block(0).AllOps());
-  graph.Set(details::kAllOpDescs, all_op_descs);  // take ownership
+  std::vector<OpDesc*> op_descs = graph.OriginProgram().Block(0).AllOps();
 
   auto find_node_in_graph = [&](std::string s) {
     ir::Node* ret = nullptr;
@@ -470,7 +453,6 @@ TEST(SortOpLikeDescOrder, AddAndReplaceOpDescInplace) {
     return ret;
   };
 
-  auto op_descs = prog.Block(0).AllOps();
   // add node
   auto op = prog.MutableBlock(0)->AppendOp();
   prog.MutableBlock(0)->Var("d1")->SetType(proto::VarType::LOD_TENSOR);
paddle/fluid/framework/details/memory_optimize_pass.cc
@@ -118,13 +118,11 @@ std::unique_ptr<ir::Graph> MemoryOptimizePass::ApplyImpl(
       }
     }
     // fill the pool
-    for (auto var : cfg_->LiveIn(op)) {
-      if (cfg_->LiveOut(op).count(var) == 0) {
-        ir::Node* var_node = cfg_->GetNodeByName(var, op);
-        if (var_node == nullptr || var_node->IsCtrlVar()) continue;
-        if (NodeCanReused(var_node) && !pool_.Has(var_node)) {
-          pool_.Insert(var_node);
-        }
+    for (auto& var : cfg_->Unlived(op)) {
+      ir::Node* var_node = cfg_->GetNodeByName(var, op);
+      if (var_node == nullptr || var_node->IsCtrlVar()) continue;
+      if (NodeCanReused(var_node) && !pool_.Has(var_node)) {
+        pool_.Insert(var_node);
       }
     }
@@ -337,4 +335,4 @@ void MemoryOptimizePass::RenameVarInGraphNode(const std::string& var,
 
 REGISTER_PASS(memory_optimize_pass,
               paddle::framework::details::MemoryOptimizePass)
-    .RequireGraphAttr(paddle::framework::details::kAllOpDescs);
+    .RequireGraphAttr(paddle::framework::details::kStaleProgramOpDescs);
paddle/fluid/framework/details/multi_devices_graph_pass.cc
@@ -937,9 +937,21 @@ void DistSSAGraphBuilder::InsertCollectiveOp(ir::Graph *result,
 }
 
 void DistSSAGraphBuilder::InsertPostprocessOps(ir::Graph *result) const {
-  if (need_broadcast_var_ ||
-      (UseGPU() &&
-       strategy_.reduce_ == BuildStrategy::ReduceStrategy::kReduce)) {
-    // broad cast received parameters when training in parameter server mode.
+  // broad cast received parameters when training in parameter server mode.
+  if (need_broadcast_var_) {
+    // There are 4 conditions:
+    // 1. GPU && Reduce: Reduce gradient then broadcast gradient to other GPUS.
+    // Need to broadcast received parameters to other GPU.
+    // 2. GPU && AllReduce: AllReduce all graident to each GPU. Need to
+    // broadcast received parameters to other GPU.
+    // 3. CPU && AllReduce: AllReduce all gradient to each thread. Need to
+    // broadcast received parameters to other scope.
+    // 4. CPU && Reduce: because all parameters share the same memory, did not
+    // broadcast received parameters.
+    if (!UseGPU() &&
+        strategy_.reduce_ == BuildStrategy::ReduceStrategy::kReduce) {
+      return;
+    }
     if (strategy_.fuse_broadcast_op_) {
       CreateFusedBroadcastOp(result, bcast_var_name_set_);
     } else {
paddle/fluid/framework/details/parallel_ssa_graph_executor.cc
@@ -20,8 +20,7 @@ namespace framework {
 namespace details {
 
 std::vector<std::unique_ptr<ir::Graph>>
-ParallelSSAGraphExecutor::SeparateMultiDevicesGraph(
-    std::unique_ptr<ir::Graph> &&graph) {
+ParallelSSAGraphExecutor::SeparateMultiDevicesGraph(ir::Graph *graph) {
   std::vector<std::unique_ptr<ir::Graph>> graphs;
   graphs.reserve(places_.size());
   for (size_t i = 0; i < places_.size(); ++i) {
@@ -77,24 +76,18 @@ ParallelSSAGraphExecutor::SeparateMultiDevicesGraph(
 ParallelSSAGraphExecutor::ParallelSSAGraphExecutor(
     const ExecutionStrategy &strategy, const std::vector<Scope *> &local_scopes,
-    const std::vector<platform::Place> &places,
-    const framework::ProgramDesc &main_prog, std::unique_ptr<ir::Graph> &&graph)
+    const std::vector<platform::Place> &places, ir::Graph *graph)
     : strategy_(std::move(strategy)),
       local_scopes_(std::move(local_scopes)),
       pool_(places.size() >= 2 ? new ::ThreadPool(places.size()) : nullptr),
       places_(std::move(places)),
-      main_prog_(main_prog),
-      // TODO(Yancey1989): Copying graphs is not safely since it deleted the
-      // attrs.
-      graphs_(SeparateMultiDevicesGraph(std::move(graph))) {
+      graphs_(SeparateMultiDevicesGraph(graph)) {
   PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
 
   auto seq_allreduce_pass =
       ir::PassRegistry::Instance().Get("all_reduce_deps_pass");
-  seq_allreduce_pass->Erase(details::kAllOpDescs);
-  seq_allreduce_pass->Set<const std::vector<OpDesc *>>(
-      details::kAllOpDescs,
-      new std::vector<OpDesc *>(main_prog_.Block(0).AllOps()));
   for (size_t i = 0; i < graphs_.size(); ++i) {
     graphs_[i] = seq_allreduce_pass->Apply(std::move(graphs_[i]));
   }
@@ -107,7 +100,7 @@ ParallelSSAGraphExecutor::ParallelSSAGraphExecutor(
             << " to run the operators of the graph on each device.";
   for (size_t i = 0; i < places.size(); ++i) {
     executors_.emplace_back(new details::ThreadedSSAGraphExecutor(
-        strategy_, local_scopes_, {places_[i]}, std::move(graphs_.at(i))));
+        strategy_, local_scopes_, {places_[i]}, graphs_.at(i).get()));
   }
 }
paddle/fluid/framework/details/parallel_ssa_graph_executor.h
@@ -31,8 +31,7 @@ class ParallelSSAGraphExecutor : public SSAGraphExecutor {
   ParallelSSAGraphExecutor(const ExecutionStrategy &strategy,
                            const std::vector<Scope *> &local_scopes,
                            const std::vector<platform::Place> &places,
-                           const framework::ProgramDesc &main_prog,
-                           std::unique_ptr<ir::Graph> &&graph);
+                           ir::Graph *graph);
   ~ParallelSSAGraphExecutor() final = default;
 
   const ir::Graph &Graph() const override { return *graphs_[0]; }
@@ -41,13 +40,12 @@ class ParallelSSAGraphExecutor : public SSAGraphExecutor {
  private:
   std::vector<std::unique_ptr<ir::Graph>> SeparateMultiDevicesGraph(
-      std::unique_ptr<ir::Graph> &&graph);
+      ir::Graph *graph);
 
   ExecutionStrategy strategy_;
   std::vector<Scope *> local_scopes_;
   std::unique_ptr<::ThreadPool> pool_{nullptr};
   std::vector<platform::Place> places_;
-  framework::ProgramDesc main_prog_;
   std::vector<std::unique_ptr<ir::Graph>> graphs_;
 
   std::vector<std::unique_ptr<details::ThreadedSSAGraphExecutor>> executors_;
paddle/fluid/framework/details/sequential_execution_pass.cc
@@ -40,7 +40,7 @@ std::unique_ptr<ir::Graph> SequentialExecutionPass::ApplyImpl(
   static std::unordered_set<std::string> skip_dist_ops{
       "send", "recv", "send_barrier", "fetch_barrier"};
 
-  auto &ops = Get<const std::vector<OpDesc *>>(kAllOpDescs);
+  auto &ops = graph->Get<const std::vector<OpDesc *>>(kStaleProgramOpDescs);
   std::vector<ir::Node *> op_node_list;
   op_node_list.reserve(ops.size());
 
@@ -107,4 +107,4 @@ std::unique_ptr<ir::Graph> SequentialExecutionPass::ApplyImpl(
 
 REGISTER_PASS(sequential_execution_pass,
               paddle::framework::details::SequentialExecutionPass)
-    .RequirePassAttr(paddle::framework::details::kAllOpDescs);
+    .RequireGraphAttr(paddle::framework::details::kStaleProgramOpDescs);
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc
@@ -23,9 +23,8 @@ namespace framework {
 namespace details {
 ThreadedSSAGraphExecutor::ThreadedSSAGraphExecutor(
     const ExecutionStrategy &strategy, const std::vector<Scope *> &local_scopes,
-    const std::vector<platform::Place> &places,
-    std::unique_ptr<ir::Graph> &&graph)
-    : graph_(std::move(graph)),
+    const std::vector<platform::Place> &places, ir::Graph *graph)
+    : graph_(graph),
       pool_(strategy.num_threads_ >= 2 ? new ::ThreadPool(strategy.num_threads_)
                                        : nullptr),
       local_scopes_(local_scopes),
@@ -110,7 +109,7 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
         for (auto &run_op_future : run_op_futures_) {
           run_op_future.wait();
         }
-        ClearFetchOp(graph_.get(), &fetch_ops);
+        ClearFetchOp(graph_, &fetch_ops);
         exception_holder_.ReThrow();
       } else {
         continue;
@@ -135,7 +134,7 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
   }
   PADDLE_ENFORCE(ready_ops.empty());
   // Wait FetchOps.
-  ClearFetchOp(graph_.get(), &fetch_ops);
+  ClearFetchOp(graph_, &fetch_ops);
 
   return fetch_data;
 }
paddle/fluid/framework/details/threaded_ssa_graph_executor.h
@@ -41,7 +41,7 @@ class ThreadedSSAGraphExecutor : public SSAGraphExecutor {
   ThreadedSSAGraphExecutor(const ExecutionStrategy &strategy,
                            const std::vector<Scope *> &local_scopes,
                            const std::vector<platform::Place> &places,
-                           std::unique_ptr<ir::Graph> &&graph);
+                           ir::Graph *graph);
 
   const ir::Graph &Graph() const override { return *graph_; }
   // Run a SSAGraph by a thread pool
@@ -55,7 +55,7 @@ class ThreadedSSAGraphExecutor : public SSAGraphExecutor {
                     details::OpHandleBase *op);
 
  private:
-  std::unique_ptr<ir::Graph> graph_;
+  ir::Graph *graph_;
   std::unique_ptr<::ThreadPool> pool_;
   std::vector<Scope *> local_scopes_;
   std::vector<platform::Place> places_;
paddle/fluid/framework/ir/graph.cc
@@ -76,6 +76,9 @@ std::map<std::string, std::vector<ir::Node *>> Graph::InitFromProgram(
       var->inputs.push_back(node);
     }
   }
+  Set<const std::vector<OpDesc *>>(
+      details::kStaleProgramOpDescs,
+      new std::vector<OpDesc *>(program.Block(0).AllOps()));
   return var_nodes;
 }
paddle/fluid/framework/ir/graph.h
@@ -31,7 +31,7 @@ namespace details {
 
 // This attr is not recommended, because the graph should not dependence
 // the program once it is built.
-constexpr char kAllOpDescs[] = "all_op_descs";
+constexpr char kStaleProgramOpDescs[] = "stale_program_op_descs";
 }  // namespace details
 
 namespace ir {
@@ -195,6 +195,12 @@ class Graph {
     return nullptr;
   }
 
+  // Returns reference to the original program.
+  // WARN: After a series of passes, the current graph can be quite
+  // different from OriginProgram. Caller shouldn't assume much from
+  // the returned OriginProgram.
+  const ProgramDesc &OriginProgram() const { return program_; }
+
   // This method takes ownership of `node`.
   ir::Node *AddNode(ir::Node *node) {
     PADDLE_ENFORCE(node_set_.find(node) == node_set_.end());
paddle/fluid/framework/ir/mkldnn/conv_elementwise_add_mkldnn_fuse_pass_tester.cc
@@ -44,10 +44,14 @@ struct TestIsReachable {
   using func = std::function<bool(const std::string&, const std::string&)>;
 
   auto operator()(const std::unique_ptr<ir::Graph>& graph) -> func {
-    auto find_node = [](const std::unique_ptr<ir::Graph>& graph,
-                        const std::string& name) -> Node* {
+    auto hash = [](const Node* node) -> std::string {
+      return node->Name() + std::to_string(node->id());
+    };
+
+    auto find_node = [&](const std::unique_ptr<ir::Graph>& graph,
+                         const std::string& name) -> Node* {
       for (auto& node : GraphTraits::DFS(*graph)) {
-        if (name == node.Name()) {
+        if (name == hash(&node)) {
           return &node;
         }
       }
@@ -55,13 +59,17 @@ struct TestIsReachable {
       return nullptr;
     };
 
-    return [&](std::string from, const std::string to) -> bool {
+    // update the from and to strings to hashed equivs in loop from graph traits
+    return [&](std::string from, std::string to) -> bool {
       if (from == to) return true;
 
       std::map<std::string, bool> visited;
 
       for (auto& node : GraphTraits::DFS(*graph)) {
-        visited[node.Name()] = false;
+        auto hashed = hash(&node);
+        if (node.Name() == from) from = hashed;
+        if (node.Name() == to) to = hashed;
+        visited[hashed] = false;
       }
 
       visited[from] = true;
@@ -72,15 +80,15 @@ struct TestIsReachable {
       while (!queue.empty()) {
         auto cur = find_node(graph, queue.front());
         queue.pop_front();
         if (cur == nullptr) return false;
 
         for (auto n : cur->outputs) {
-          if (n->Name() == to) return true;
+          auto hashed_name = hash(n);
+          if (hashed_name == to) return true;
 
-          if (!visited[n->Name()]) {
-            visited[n->Name()] = true;
-            queue.push_back(n->Name());
+          if (!visited[hashed_name]) {
+            visited[hashed_name] = true;
+            queue.push_back(hashed_name);
           }
         }
@@ -166,6 +174,28 @@ TEST(ConvElementwiseAddMKLDNNFusePass, ConvolutionAsYWithElementwiseAddRelu) {
   RunPassAndAssert(&prog, "a", "relu", 1);
 }
 
+TEST(ConvElementwiseAddMKLDNNFusePass,
+     ConvolutionProjectionAsYWithElementwiseAddRelu) {
+  auto prog = BuildProgramDesc({"a", "b", "c", "d", "e", "f"},
+                               {"bias", "weights", "bias2", "weights2"});
+
+  SetOp(&prog, "sigmoid", {{"X", "a"}}, {"Out", "b"});
+  // right branch
+  SetOp(&prog, "conv2d",
+        {{"Input", "b"}, {"Bias", "bias"}, {"Filter", "weights"}},
+        {"Output", "c"});
+  // left branch
+  SetOp(&prog, "conv2d",
+        {{"Input", "a"}, {"Bias", "bias2"}, {"Filter", "weights2"}},
+        {"Output", "f"});
+  SetOp(&prog, "elementwise_add", {{"X", "f"}, {"Y", "c"}}, {"Out", "d"});
+  SetOp(&prog, "relu", {{"X", "d"}}, {"Out", "e"});
+
+  RunPassAndAssert(&prog, "a", "relu", 2);
+}
+
 TEST(ConvElementwiseAddMKLDNNFusePass, ConvolutionAsYWithElementwiseAddReluNoBias) {
   auto prog = BuildProgramDesc({"a", "b", "c", "d", "e"}, {"weights"});
paddle/fluid/framework/operator.cc
@@ -904,6 +904,16 @@ void OperatorWithKernel::RuntimeInferShape(const Scope& scope,
   this->InferShape(&infer_shape_ctx);
 }
 
+std::vector<KernelConfig>* OperatorWithKernel::GetKernelConfig(
+    const OpKernelType& key) const {
+  auto config_iter = kernel_configs_map_.find(key);
+  std::vector<KernelConfig>* kernel_configs = nullptr;
+  if (config_iter != kernel_configs_map_.end()) {
+    kernel_configs = &(config_iter->second);
+  }
+  return kernel_configs;
+}
+
 void OperatorWithKernel::RunImpl(const Scope& scope,
                                  const platform::Place& place) const {
   RuntimeContext ctx(Inputs(), Outputs(), scope);
@@ -921,7 +931,7 @@ void OperatorWithKernel::RunImpl(const Scope& scope,
   OpKernelMap& kernels = kernels_iter->second;
 
   auto expected_kernel_key = this->GetExpectedKernelType(
-      ExecutionContext(*this, scope, *dev_ctx, ctx));
+      ExecutionContext(*this, scope, *dev_ctx, ctx, nullptr));
   VLOG(3) << "expected_kernel_key:" << expected_kernel_key;
 
   auto kernel_iter = kernels.find(expected_kernel_key);
@@ -940,6 +950,9 @@ void OperatorWithKernel::RunImpl(const Scope& scope,
                  KernelTypeToString(expected_kernel_key));
   }
 
+  std::vector<KernelConfig>* kernel_configs =
+      GetKernelConfig(expected_kernel_key);
+
   // do data transformScope &transfer_scope;
   std::vector<std::string> transfered_inplace_vars;
   auto* transfer_scope =
@@ -957,7 +970,8 @@ void OperatorWithKernel::RunImpl(const Scope& scope,
   this->InferShape(&infer_shape_ctx);
   // TODO(panyx0718): ExecutionContext should only depend on RuntimeContext
   // not Scope. Imperative mode only pass inputs and get outputs.
-  kernel_iter->second(ExecutionContext(*this, exec_scope, *dev_ctx, ctx));
+  kernel_iter->second(
+      ExecutionContext(*this, exec_scope, *dev_ctx, ctx, kernel_configs));
 
   if (!transfered_inplace_vars.empty()) {
     // there is inplace variable has been transfered.
paddle/fluid/framework/operator.h
@@ -28,6 +28,7 @@ limitations under the License. */
 #include "paddle/fluid/framework/lod_tensor.h"
 #include "paddle/fluid/framework/op_info.h"
 #include "paddle/fluid/framework/op_kernel_type.h"
+#include "paddle/fluid/framework/operator_kernel_configs.h"
 #include "paddle/fluid/framework/scope.h"
 #include "paddle/fluid/framework/selected_rows.h"
 #include "paddle/fluid/framework/tensor.h"
@@ -184,12 +185,30 @@ class OperatorBase {
                        const platform::Place& place) const = 0;
 };
 
+#ifdef PADDLE_WITH_CUDA
+using KernelConfig = boost::variant<
+    std::shared_ptr<AlgorithmsCache<cudnnConvolutionFwdAlgo_t>>,
+    std::shared_ptr<AlgorithmsCache<cudnnConvolutionBwdDataAlgo_t>>,
+    std::shared_ptr<AlgorithmsCache<cudnnConvolutionBwdFilterAlgo_t>>>;
+#else
+using KernelConfig = boost::variant<boost::blank>;
+#endif
+
+using OpKernelConfigsMap =
+    std::unordered_map<OpKernelType, std::vector<KernelConfig>,
+                       OpKernelType::Hash>;
+
 class ExecutionContext {
  public:
   ExecutionContext(const OperatorBase& op, const Scope& scope,
                    const platform::DeviceContext& device_context,
-                   const RuntimeContext& ctx)
-      : op_(op), scope_(scope), device_context_(device_context), ctx_(ctx) {}
+                   const RuntimeContext& ctx,
+                   std::vector<KernelConfig>* configs)
+      : op_(op),
+        scope_(scope),
+        device_context_(device_context),
+        ctx_(ctx),
+        kernel_configs_(configs) {}
 
   const OperatorBase& op() const { return op_; }
@@ -398,11 +417,20 @@ class ExecutionContext {
     return temp_tensor;
   }
 
+  template <typename T>
+  T& GetKernelConfig(int idx) const {
+    PADDLE_ENFORCE(kernel_configs_ && kernel_configs_->size() > idx,
+                   "%s selected kernel doesn't have kernel config %lu <= %d",
+                   op_.Type().c_str(), kernel_configs_->size(), idx);
+    return *boost::get<std::shared_ptr<T>>(kernel_configs_->at(idx));
+  }
+
  private:
   const OperatorBase& op_;
   const Scope& scope_;
   const platform::DeviceContext& device_context_;
   const RuntimeContext& ctx_;
+  mutable std::vector<KernelConfig>* kernel_configs_;
 };
 
 template <>
@@ -483,6 +511,8 @@ class OperatorWithKernel : public OperatorBase {
   virtual OpKernelType GetExpectedKernelType(const ExecutionContext& ctx) const;
 
+  std::vector<KernelConfig>* GetKernelConfig(const OpKernelType& key) const;
+
  protected:
   virtual OpKernelType GetKernelTypeForVar(
       const std::string& var_name, const Tensor& tensor,
@@ -508,6 +538,9 @@ class OperatorWithKernel : public OperatorBase {
   void TransferInplaceVarsBack(const Scope& scope,
                                const std::vector<std::string>& inplace_vars,
                                const Scope& exec_scope) const;
+
+ protected:
+  mutable OpKernelConfigsMap kernel_configs_map_;
 };
 
 extern bool OpSupportGPU(const std::string& op_type);
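For orientation (not part of the commit): a minimal sketch of how a kernel body could read one of these per-kernel caches through the new ExecutionContext hook. It assumes the operator already stored an AlgorithmsCache<cudnnConvolutionFwdAlgo_t> in slot 0 of kernel_configs_map_; the kernel, the slot index, and SearchFwdAlgo() are hypothetical names used only for illustration.

    // Hypothetical CUDA conv kernel body (illustrative only).
    void Compute(const framework::ExecutionContext& ctx) {
      // Slot 0 is wherever this op registered its forward-algorithm cache.
      auto& algo_cache =
          ctx.GetKernelConfig<AlgorithmsCache<cudnnConvolutionFwdAlgo_t>>(0);
      // gen_func runs only on a cache miss, e.g. an exhaustive cudnn search.
      cudnnConvolutionFwdAlgo_t algo = algo_cache.GetAlgorithm(
          in_dims, filter_dims, strides, paddings, dilations,
          /*algorithmFlags=*/0, [&]() { return SearchFwdAlgo(ctx); });
      // ... launch the convolution with `algo` ...
    }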
paddle/fluid/framework/operator_kernel_configs.h
0 → 100644
浏览文件 @
07efdb51
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <algorithm>
#include <unordered_map>
#include <vector>

namespace paddle {
namespace framework {

// Not thread-safe. Should be owned per-kernel.
template <typename TAlgorithm>
class AlgorithmsCache {
 public:
  AlgorithmsCache() : search_times_(0) { hash_.clear(); }
  // Caches the best algorithm for a given
  // combination of tensor dimensions & compute data type.
  TAlgorithm GetAlgorithm(
      const std::vector<int64_t>& dims1, const std::vector<int64_t>& dims2,
      const std::vector<int>& strides, const std::vector<int>& paddings,
      const std::vector<int>& dilations,
      int algorithmFlags,  // can set for different data type
      std::function<TAlgorithm()> gen_func);

  TAlgorithm GetAlgorithm(int64_t area, int search_times, int algorithmFlags,
                          std::function<TAlgorithm()> gen_func);

 private:
  std::unordered_map<int64_t, TAlgorithm> hash_;
  int search_times_;
};

template <typename TAlgorithm>
TAlgorithm framework::AlgorithmsCache<TAlgorithm>::GetAlgorithm(
    const std::vector<int64_t>& dims1, const std::vector<int64_t>& dims2,
    const std::vector<int>& strides, const std::vector<int>& paddings,
    const std::vector<int>& dilations, int algorithmFlags,
    std::function<TAlgorithm()> gen_func) {
  int64_t seed = 0;
  // Hash all of the inputs, use to try and look up a previously
  // discovered algorithm, or fall back to generating a new one.
  std::hash<int64_t> hashFn;
  // do hash like boost
  // https://stackoverflow.com/questions/2590677/how-do-i-combine-hash-values-in-c0x
  for (const auto num : dims1) {
    seed ^= hashFn(num) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  }

  for (const auto num : dims2) {
    seed ^= hashFn(num) + 0x9e3779b9 + (seed << 6) + (seed >> 2) + 1;
  }

  for (const auto num : strides) {
    seed ^= hashFn(static_cast<int64_t>(num)) + 0x9e3779b9 + (seed << 6) +
            (seed >> 2) + 2;
  }

  for (const auto num : paddings) {
    seed ^= hashFn(static_cast<int64_t>(num)) + 0x9e3779b9 + (seed << 6) +
            (seed >> 2) + 3;
  }

  for (const auto num : dilations) {
    seed ^= hashFn(static_cast<int64_t>(num)) + 0x9e3779b9 + (seed << 6) +
            (seed >> 2) + 4;
  }

  seed ^= hashFn(static_cast<int64_t>(algorithmFlags)) + 0x9e3779b9 +
          (seed << 6) + (seed >> 2) + 5;

  if (seed == 0) return gen_func();

  if (hash_.find(seed) == hash_.end()) {
    TAlgorithm value = gen_func();
    hash_[seed] = value;
  }
  return hash_[seed];
}

template <typename TAlgorithm>
TAlgorithm AlgorithmsCache<TAlgorithm>::GetAlgorithm(
    int64_t area, int search_times, int algorithmFlags,
    std::function<TAlgorithm()> gen_func) {
  if (hash_.find(area) != hash_.end()) {
    return hash_[area];
  }
  if (search_times_ < search_times) {
    auto algo = gen_func();
    hash_[area] = algo;
    ++search_times_;
    return algo;
  }
  TAlgorithm algo;
  int64_t min = static_cast<uint64_t>(INT_MAX);
  for (const auto& m : hash_) {
    if (m.first < min) {
      min = m.first;
      algo = m.second;
    }
  }
  return algo;
}

}  // namespace framework
}  // namespace paddle
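As a usage note for the cache above: GetAlgorithm hashes the shape/stride/padding/dilation arguments and only invokes gen_func on a miss. A small self-contained sketch with made-up values (it assumes this header and its dependencies are on the include path, and uses int as a stand-in algorithm type):

#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>
#include "paddle/fluid/framework/operator_kernel_configs.h"

int main() {
  paddle::framework::AlgorithmsCache<int> cache;
  std::vector<int64_t> in_dims = {1, 3, 32, 32}, filter_dims = {8, 3, 3, 3};
  std::vector<int> strides = {1, 1}, paddings = {1, 1}, dilations = {1, 1};
  int searches = 0;
  // Stand-in for an expensive algorithm search; runs only on a cache miss.
  auto search = [&]() { ++searches; return 42; };
  int a = cache.GetAlgorithm(in_dims, filter_dims, strides, paddings,
                             dilations, /*algorithmFlags=*/0, search);
  int b = cache.GetAlgorithm(in_dims, filter_dims, strides, paddings,
                             dilations, 0, search);
  std::cout << a << " " << b << " searches=" << searches << "\n";  // 42 42 1
  return 0;
}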
paddle/fluid/framework/parallel_executor.cc
...
...
@@ -184,9 +184,10 @@ std::vector<Scope *> &ParallelExecutor::GetLocalScopes() {
ParallelExecutor::ParallelExecutor(
    const std::vector<platform::Place> &places,
    const std::unordered_set<std::string> &bcast_vars,
    const ProgramDesc &main_program, const std::string &loss_var_name,
    Scope *scope, const std::vector<Scope *> &local_scopes,
    const ExecutionStrategy &exec_strategy, const BuildStrategy &build_strategy)
    const std::string &loss_var_name, Scope *scope,
    const std::vector<Scope *> &local_scopes,
    const ExecutionStrategy &exec_strategy, const BuildStrategy &build_strategy,
    ir::Graph *graph)
    : member_(new ParallelExecutorPrivate(places)) {
  member_->global_scope_ = scope;
  member_->use_cuda_ = exec_strategy.use_cuda_;
...
...
@@ -216,11 +217,13 @@ ParallelExecutor::ParallelExecutor(
    }
  }

  std::unique_ptr<ir::Graph> temp_owned_graph(graph);

  // FIXME(Yancey1989): parallel graph mode get better performance
  // in GPU allreduce distributed training. Need an elegant way to
  // choice the execution strategy.
  build_strategy.enable_parallel_graph_ =
      EnableParallelGraphExecution(main_program, exec_strategy, build_strategy);
  build_strategy.enable_parallel_graph_ = EnableParallelGraphExecution(
      *temp_owned_graph, exec_strategy, build_strategy);
  if (build_strategy.enable_parallel_graph_)
    VLOG(0) << "The Executor would execute the graph by ParallelGraph "
               "Execution which can get better performance,"
...
...
@@ -254,26 +257,32 @@ ParallelExecutor::ParallelExecutor(
  if (member_->local_scopes_.size() != 1 && local_scopes.empty()) {
    BCastParamsToDevices(bcast_vars);
  }
  // Startup Program has been run. All local scopes has correct parameters.
  // Startup Program has been run. All local scopes has correct parameters.

  // Step 2. Convert main_program to SSA form and dependency graph. Also, insert
  // ncclOp
  std::unique_ptr<ir::Graph> graph;
  // Step 2. Convert main_program to SSA form and dependency graph. Also, insert
  // ncclOp
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
  graph = build_strategy.Apply(main_program, member_->places_, loss_var_name,
                               member_->local_scopes_, member_->nranks_,
                               member_->use_cuda_, member_->nccl_ctxs_.get());
  temp_owned_graph = build_strategy.Apply(
      std::move(temp_owned_graph), member_->places_, loss_var_name,
      member_->local_scopes_, member_->nranks_, member_->use_cuda_,
      member_->nccl_ctxs_.get());
#else
  graph = build_strategy.Apply(main_program, member_->places_, loss_var_name,
                               member_->local_scopes_, member_->nranks_,
                               member_->use_cuda_);
  temp_owned_graph = build_strategy.Apply(
      std::move(temp_owned_graph), member_->places_, loss_var_name,
      member_->local_scopes_, member_->nranks_, member_->use_cuda_);
#endif
  auto max_memory_size = GetEagerDeletionThreshold();
  VLOG(10) << "Eager Deletion Threshold "
           << static_cast<float>(max_memory_size) / (1 << 30);
  if (max_memory_size >= 0) {
    graph = member_->PrepareGCAndRefCnts(std::move(graph),
                                         static_cast<size_t>(max_memory_size));
    graph = member_
                ->PrepareGCAndRefCnts(std::move(temp_owned_graph),
                                      static_cast<size_t>(max_memory_size))
                .release();
  } else {
    graph = temp_owned_graph.release();
  }

  // Step 3. Create vars in each scope. Passes may also create new vars.
...
...
@@ -308,8 +317,7 @@ ParallelExecutor::ParallelExecutor(
    // TODO(Yancey1989): Remove passing in the main_program when
    // allreduce_seq_pass doesn't need it as the attr.
    member_->executor_.reset(new details::ParallelSSAGraphExecutor(
        exec_strategy, member_->local_scopes_, member_->places_, main_program,
        std::move(graph)));
        exec_strategy, member_->local_scopes_, member_->places_, graph));
#else
    PADDLE_THROW(
        "Paddle should be compiled with CUDA for ParallelGraph Execution.");
...
...
@@ -317,12 +325,10 @@ ParallelExecutor::ParallelExecutor(
  } else {
    if (exec_strategy.type_ == ExecutionStrategy::kDefault) {
      member_->executor_.reset(new details::ThreadedSSAGraphExecutor(
          exec_strategy, member_->local_scopes_, member_->places_,
          std::move(graph)));
          exec_strategy, member_->local_scopes_, member_->places_, graph));
    } else {
      member_->executor_.reset(new details::FastThreadedSSAGraphExecutor(
          exec_strategy, member_->local_scopes_, member_->places_,
          std::move(graph)));
          exec_strategy, member_->local_scopes_, member_->places_, graph));
    }
  }
...
...
@@ -452,24 +458,33 @@ void ParallelExecutor::FeedAndSplitTensorIntoLocalScopes(
  }
}

ParallelExecutor::~ParallelExecutor() {
  for (auto &p : member_->places_) {
    platform::DeviceContextPool::Instance().Get(p)->Wait();
  }
  delete member_;
}

bool ParallelExecutor::EnableParallelGraphExecution(
    const ProgramDesc &main_program, const ExecutionStrategy &exec_strategy,
    const ir::Graph &graph, const ExecutionStrategy &exec_strategy,
    const BuildStrategy &build_strategy) const {
  if (!FLAGS_enable_parallel_graph) return false;

  bool enable_parallel_graph = true;

  // TODO(Yancey1989): support sparse update in ParallelGraph mode.
  for (auto &var_desc : main_program.Block(0).AllVars()) {
    if (var_desc->GetType() == proto::VarType::SELECTED_ROWS) {
      enable_parallel_graph = false;
    }
  }

  // TODO(Yancey1989): support pserver mode
  for (auto &op_desc : main_program.Block(0).AllOps()) {
    if (op_desc->Type() == "send" || op_desc->Type() == "recv") {
      enable_parallel_graph = false;
      break;
  for (ir::Node *node : graph.Nodes()) {
    if (node->IsVar() && node->Var()) {
      // TODO(Yancey1989): support sparse update in ParallelGraph mode.
      if (node->Var()->GetType() == proto::VarType::SELECTED_ROWS) {
        enable_parallel_graph = false;
        break;
      }
    } else if (node->IsOp() && node->Op()) {
      // TODO(Yancey1989): support pserver mode
      if (node->Op()->Type() == "send" || node->Op()->Type() == "recv") {
        enable_parallel_graph = false;
        break;
      }
    }
  }
...
...
@@ -481,13 +496,6 @@ bool ParallelExecutor::EnableParallelGraphExecution(
  return enable_parallel_graph;
}

ParallelExecutor::~ParallelExecutor() {
  for (auto &p : member_->places_) {
    platform::DeviceContextPool::Instance().Get(p)->Wait();
  }
  delete member_;
}

}  // namespace framework
}  // namespace paddle
...
...
paddle/fluid/framework/parallel_executor.h
...
...
@@ -46,11 +46,11 @@ class ParallelExecutor {
 public:
  explicit ParallelExecutor(const std::vector<platform::Place> &places,
                            const std::unordered_set<std::string> &bcast_vars,
                            const ProgramDesc &main_program,
                            const std::string &loss_var_name, Scope *scope,
                            const std::vector<Scope *> &local_scopes,
                            const ExecutionStrategy &exec_strategy,
                            const BuildStrategy &build_strategy);
                            const BuildStrategy &build_strategy,
                            ir::Graph *graph);

  ~ParallelExecutor();
...
...
@@ -71,7 +71,7 @@ class ParallelExecutor {
 private:
  void BCastParamsToDevices(const std::unordered_set<std::string> &vars) const;

  bool EnableParallelGraphExecution(const ProgramDesc &main_program,
  bool EnableParallelGraphExecution(const ir::Graph &graph,
                                    const ExecutionStrategy &exec_strategy,
                                    const BuildStrategy &build_strategy) const;
...
...
paddle/fluid/framework/tensor.h
...
...
@@ -27,6 +27,10 @@ limitations under the License. */
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/platform/place.h"

#ifdef PADDLE_WITH_MKLDNN
#include "paddle/fluid/platform/mkldnn_utils.h"
#endif

namespace paddle {
namespace framework {
...
...
@@ -37,10 +41,34 @@ class Tensor {
#ifdef PADDLE_WITH_MKLDNN
 public:
  inline mkldnn::memory::format format() const { return format_; }
  // TODO(jczaja): This is depracted and will be removed
  inline mkldnn::memory::format format() const {
    if (layout_ == DataLayout::kMKLDNN) {
      return static_cast<mkldnn::memory::format>(mem_pd_.desc().data.format);
    } else {
      return mkldnn::memory::format::format_undef;
    }
  }

  inline void set_format(const mkldnn::memory::format format) {
    format_ = format;
  // TODO(jczaja): This is depracted and will be removed
  inline void set_format(
      const mkldnn::memory::format fmt,
      mkldnn::memory::data_type data_type = mkldnn::memory::f32) {
    mem_pd_ = paddle::platform::create_prim_desc_from_format(
        paddle::framework::vectorize2int(dims()), fmt, data_type);
    layout_ = DataLayout::kMKLDNN;
  }

  inline mkldnn::memory::primitive_desc get_mkldnn_prim_desc() const {
    return mem_pd_;
  }

  inline void set_mkldnn_prim_desc(
      const mkldnn::memory::primitive_desc& mem_pd) {
    // Internally MKL-DNN is just copying (increasing reference counter)
    // to shared_ptr. So asignment should be quite cheap
    mem_pd_ = mem_pd;
    layout_ = DataLayout::kMKLDNN;
  }

 protected:
...
...
@@ -48,12 +76,9 @@ class Tensor {
   * @brief the detail format of memory block which have layout as kMKLDNN
   *
   * @note MKLDNN lib support various memory format like nchw, nhwc, nChw8C,
   *       nChw16c, etc. For a MKLDNN memory block, layout will be set as
   *       DataLayout::kMKLDNN meanwhile detail memory format will be kept in
   *       this field.
   *       nChw16c, etc. For a MKLDNN memory block, we store memory descriptor
   */
  mkldnn::memory::format format_ = mkldnn::memory::format::format_undef;
  mutable mkldnn::memory::primitive_desc mem_pd_;
#endif

 public:
...
...
paddle/fluid/framework/var_type_traits.h
...
...
@@ -50,8 +50,6 @@ class Scope;
}  // namespace framework

namespace operators {
template <typename T>
class AlgorithmsCache;

class CudnnRNNCache;
...
...
@@ -144,9 +142,6 @@ using VarTypeRegistry = detail::VarTypeRegistryImpl<
#ifndef _WIN32
    ncclUniqueId, platform::Communicator,
#endif
    operators::AlgorithmsCache<cudnnConvolutionFwdAlgo_t>,
    operators::AlgorithmsCache<cudnnConvolutionBwdDataAlgo_t>,
    operators::AlgorithmsCache<cudnnConvolutionBwdFilterAlgo_t>,
    operators::CudnnRNNCache,
#endif
    int, float>;
...
...
paddle/fluid/imperative/layer.cc
...
...
@@ -249,7 +249,8 @@ std::map<std::string, std::vector<VarBase*>> OpBase::ApplyGrad() {
      framework::Scope scope;
      PreparedOp p = PreparedOp::Prepare(ctx, *op_kernel, place_);
      p.op.RuntimeInferShape(scope, place_, ctx);
      p.func(framework::ExecutionContext(p.op, scope, *p.dev_ctx, p.ctx));
      p.func(framework::ExecutionContext(p.op, scope, *p.dev_ctx, p.ctx,
                                         nullptr));
    }
  }
...
...
paddle/fluid/imperative/layer.h
...
...
@@ -44,8 +44,13 @@ class PreparedOp {
  PreparedOp(const framework::OperatorBase& op,
             const framework::RuntimeContext& ctx,
             framework::OperatorWithKernel::OpKernelFunc func,
             platform::DeviceContext* dev_ctx)
      : op(op), ctx(ctx), func(func), dev_ctx(dev_ctx) {}
             platform::DeviceContext* dev_ctx,
             std::vector<framework::KernelConfig>* kernel_configs)
      : op(op),
        ctx(ctx),
        func(func),
        dev_ctx(dev_ctx),
        kernel_configs(kernel_configs) {}

  static PreparedOp Prepare(const framework::RuntimeContext& ctx,
                            const framework::OperatorWithKernel& op,
...
...
@@ -64,8 +69,9 @@ class PreparedOp {
    framework::OperatorWithKernel::OpKernelMap& kernels = kernels_iter->second;

    auto expected_kernel_key = op.GetExpectedKernelType(
        framework::ExecutionContext(op, framework::Scope(), *dev_ctx, ctx));
    auto expected_kernel_key =
        op.GetExpectedKernelType(framework::ExecutionContext(
            op, framework::Scope(), *dev_ctx, ctx, nullptr));
    VLOG(3) << "expected_kernel_key:" << expected_kernel_key;

    auto kernel_iter = kernels.find(expected_kernel_key);
...
...
@@ -83,7 +89,9 @@ class PreparedOp {
      PADDLE_THROW("op %s does not have kernel for %s", op.Type(),
                   KernelTypeToString(expected_kernel_key));
    }
    return PreparedOp(op, ctx, kernel_iter->second, dev_ctx);
    std::vector<framework::KernelConfig>* kernel_configs =
        op.GetKernelConfig(expected_kernel_key);
    return PreparedOp(op, ctx, kernel_iter->second, dev_ctx, kernel_configs);
  }

  inline platform::DeviceContext* GetDeviceContext() const { return dev_ctx; }
...
...
@@ -92,6 +100,7 @@ class PreparedOp {
  const framework::RuntimeContext& ctx;
  framework::OperatorWithKernel::OpKernelFunc func;
  platform::DeviceContext* dev_ctx;
  std::vector<framework::KernelConfig>* kernel_configs;
};

class OpBase;
...
...
@@ -105,23 +114,23 @@ class VarBase {
 public:
  VarBase() : VarBase(new framework::Variable(), new VarBase(true)) {}

  // Owns `var` and `grad`
  explicit VarBase(bool stop_gradient)
      : VarBase(new framework::Variable(),
                stop_gradient ? nullptr : new VarBase(true), stop_gradient) {}

  VarBase(framework::Variable* var, VarBase* grad)
      : VarBase(var, grad, false) {}

 private:
  VarBase(framework::Variable* var, VarBase* grad, bool stop_gradient)
      : var_desc_(nullptr),
        var_(var),
        grads_(grad),
        stop_gradient_(false),
        pre_op_(nullptr),
        pre_op_out_idx_(-1) {}
  explicit VarBase(bool stop_gradient)
      : var_desc_(nullptr),
        var_(new framework::Variable()),
        grads_(stop_gradient ? nullptr : new VarBase(true)),
        stop_gradient_(stop_gradient),
        pre_op_(nullptr),
        pre_op_out_idx_(-1) {}

 public:
  virtual ~VarBase() {
    if (var_) {
      delete var_;
...
...
@@ -132,11 +141,13 @@ class VarBase {
    }
  }

  OpBase* PreOp() const { return pre_op_; }
  int PreOpOutIdx() const { return pre_op_out_idx_; }
  inline OpBase* PreOp() const { return pre_op_; }
  inline int PreOpOutIdx() const { return pre_op_out_idx_; }

  void SetStopGradient(bool stop_gradient) { stop_gradient_ = stop_gradient; }
  bool IsStopGradient() const { return stop_gradient_; }
  inline void SetStopGradient(bool stop_gradient) {
    stop_gradient_ = stop_gradient;
  }
  inline bool IsStopGradient() const { return stop_gradient_; }

  void RunBackward();
...
...
paddle/fluid/imperative/tracer.cc
...
...
@@ -14,6 +14,8 @@
#include "paddle/fluid/imperative/tracer.h"
#include <set>
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/enforce.h"
...
...
@@ -66,16 +68,18 @@ platform::Place GetExpectedPlace(platform::Place place, VarBasePtrMap inputs) {
  return result;
}

void Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
                   const VarBasePtrMap& outputs, framework::BlockDesc* block,
                   const platform::Place expected_place,
                   const bool stop_gradient) {
std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
                                    const VarBasePtrMap& outputs,
                                    framework::BlockDesc* block,
                                    const platform::Place expected_place,
                                    const bool stop_gradient) {
  std::map<std::string, VarBase*> vars;

  framework::OpDesc* op_desc = op->op_desc_;
  VLOG(3) << "tracer tracing " << op_desc->Type();
  op_desc->InferShape(*block);
  op_desc->InferVarType(block);

  std::unique_ptr<framework::OperatorBase> op_base =
      framework::OpRegistry::CreateOp(*op_desc);
...
...
@@ -92,7 +96,7 @@ void Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
      invars.emplace_back(inp->var_);
      vars[inp->var_desc_->Name()] = inp;
      if (inp->PreOp()) {
      if (inp->PreOp() && !inp->IsStopGradient()) {
        op->pre_ops_[it.first].push_back(inp->PreOp());
        op->pre_ops_out_idx_[it.first].push_back(inp->PreOpOutIdx());
      } else {
...
...
@@ -138,8 +142,11 @@ void Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
  op->place_ = GetExpectedPlace(expected_place, inputs);
  PreparedOp prepared_op = PreparedOp::Prepare(ctx, *op_kernel, op->place_);
  prepared_op.op.RuntimeInferShape(scope, op->place_, ctx);
  prepared_op.func(framework::ExecutionContext(
      prepared_op.op, scope, *prepared_op.dev_ctx, prepared_op.ctx));
  prepared_op.func(
      framework::ExecutionContext(prepared_op.op, scope, *prepared_op.dev_ctx,
                                  prepared_op.ctx, prepared_op.kernel_configs));

  std::set<std::string> vars_saved_for_backward;

  if (!stop_gradient) {
    std::unique_ptr<std::unordered_map<std::string, std::string>> grad_to_var(
...
...
@@ -160,6 +167,7 @@ void Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
          PADDLE_ENFORCE(fwd_var_it != vars.end());
          // Forward inputs or outputs.
          grad_in_vars.push_back(fwd_var_it->second->var_);
          vars_saved_for_backward.insert(it.first);
        } else {
          VarBase* var = vars[var_it->second];
          if (!var->grads_->var_->IsInitialized()) {
...
...
@@ -193,6 +201,7 @@ void Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
  }

  op->block_ = block;
  return vars_saved_for_backward;
}

std::vector<VarBase*> Tracer::PyTrace(OpBase* op,
...
...
@@ -202,7 +211,7 @@ std::vector<VarBase*> Tracer::PyTrace(OpBase* op,
  op->input_vars_[PyLayer::kFwdInp] = inputs;
  op->output_vars_[PyLayer::kFwdOut] = PyLayer::Apply(op->forward_id_, inputs);
  for (VarBase* inp : inputs) {
    if (inp->PreOp()) {
    if (inp->PreOp() && !inp->IsStopGradient()) {
      op->pre_ops_[PyLayer::kFwdInp].push_back(inp->PreOp());
      op->pre_ops_out_idx_[PyLayer::kFwdInp].push_back(inp->PreOpOutIdx());
    } else {
...
...
paddle/fluid/imperative/tracer.h
...
...
@@ -15,6 +15,7 @@
#pragma once

#include <map>
#include <set>
#include <string>
#include <vector>
...
...
@@ -43,10 +44,11 @@ class Tracer {
  virtual ~Tracer() {}

  void Trace(OpBase* op, const VarBasePtrMap& inputs,
             const VarBasePtrMap& outputs, framework::BlockDesc* block,
             const platform::Place expected_place,
             const bool stop_gradient = false);
  std::set<std::string> Trace(OpBase* op, const VarBasePtrMap& inputs,
                              const VarBasePtrMap& outputs,
                              framework::BlockDesc* block,
                              const platform::Place expected_place,
                              const bool stop_gradient = false);

  std::vector<VarBase*> PyTrace(OpBase* op,
                                const std::vector<VarBase*>& inputs,
                                bool stop_gradient = false);
...
...
paddle/fluid/operators/alloc_continuous_space_op.cc
0 → 100644
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/var_type.h"
#include "paddle/fluid/operators/math/math_function.h"

namespace paddle {
namespace operators {

static framework::proto::VarType::Type kDefaultDtype =
    framework::proto::VarType::Type::VarType_Type_BOOL;

template <typename DeviceContext, typename T>
class AllocContinuousSpaceKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext &context) const override {
    auto &in_var_names = context.Inputs("Input");
    auto &out_var_names = context.Outputs("Output");
    auto &in_vars = context.MultiInputVar("Input");
    auto out_vars = context.MultiOutputVar("Output");

    PADDLE_ENFORCE_GT(in_var_names.size(), static_cast<size_t>(0));
    PADDLE_ENFORCE_EQ(in_var_names.size(), out_var_names.size());

    for (size_t i = 0; i < in_var_names.size(); ++i) {
      // Only support LoDTensor
      PADDLE_ENFORCE_NOT_NULL(in_vars[i], "%s should not be nullptr,",
                              in_var_names[i]);
      PADDLE_ENFORCE_NOT_NULL(out_vars[i], "%s should not be nullptr,",
                              out_var_names[i]);
      PADDLE_ENFORCE(in_vars[i]->IsType<framework::LoDTensor>());
      PADDLE_ENFORCE(out_vars[i]->IsType<framework::LoDTensor>());
    }

    auto in_tensors = context.MultiInput<framework::LoDTensor>("Input");

    if (context.Attr<bool>("check_name")) {
      for (size_t i = 0; i < in_var_names.size(); ++i) {
        PADDLE_ENFORCE_EQ(in_var_names[i], out_var_names[i]);
      }
    } else {
      // Init the output as input
      for (size_t i = 0; i < in_tensors.size(); ++i) {
        out_vars[i]->GetMutable<framework::LoDTensor>()->Resize(
            in_tensors[i]->dims());
      }
    }

    auto &dev_ctx = context.template device_context<DeviceContext>();

    // Get numel and dtype
    size_t numel = 0;
    auto dtype = kDefaultDtype;
    GetMemSizeAndDtype(in_tensors, in_var_names, &numel, &dtype);

    // Alloc the continuous space
    auto fused_tensor = context.Output<framework::LoDTensor>("FusedOutput");
    fused_tensor->Resize(framework::make_ddim({static_cast<int64_t>(numel)}))
        .mutable_data(context.GetPlace(), dtype);

    // Init the continuous space
    auto out_tensors = context.MultiOutput<framework::LoDTensor>("Output");
    int64_t offset = 0;
    if (context.Attr<bool>("copy_data")) {
      for (size_t i = 0; i < in_var_names.size(); ++i) {
        int64_t len = out_tensors[i]->numel();
        auto sub_tensor = fused_tensor->Slice(offset, offset + len);
        offset += len;
        framework::TensorCopy(*out_tensors[i], context.GetPlace(), dev_ctx,
                              &sub_tensor);
      }
    } else if (context.Attr<bool>("set_constant")) {
      math::SetConstant<DeviceContext, T> set_constant;
      set_constant(dev_ctx, fused_tensor,
                   static_cast<T>(context.Attr<float>("constant")));
    }

    // Make the outputs point to the continuous space.
    offset = 0;
    for (size_t i = 0; i < out_tensors.size(); ++i) {
      int64_t len = out_tensors[i]->numel();
      auto dim = out_tensors[i]->dims();
      out_tensors[i]
          ->ShareDataWith(fused_tensor->Slice(offset, offset + len))
          .Resize(dim);
      offset += len;
      VLOG(10) << "alloc_space_for_vars: output(" << out_var_names[i]
               << ") ,dim:(" << dim << ")"
               << " Address: " << out_tensors[i]->data<void>();
    }
  }

  void GetMemSizeAndDtype(
      const std::vector<const framework::LoDTensor *> &lod_tensors,
      const std::vector<std::string> var_names, size_t *numel,
      framework::proto::VarType::Type *dtype) const {
    PADDLE_ENFORCE_EQ(lod_tensors.size(), var_names.size());
    *numel = 0;
    for (size_t i = 0; i < var_names.size(); ++i) {
      PADDLE_ENFORCE(lod_tensors[i]->IsInitialized(), "%s is not initialized.",
                     var_names[i]);

      auto p_dtype = lod_tensors[i]->type();
      if (*dtype == kDefaultDtype) {
        PADDLE_ENFORCE_NE(p_dtype, kDefaultDtype,
                          "%s's type should not be %s.", var_names[i],
                          kDefaultDtype);
        *dtype = p_dtype;
      }
      PADDLE_ENFORCE_EQ(p_dtype, *dtype, "Input vars is not equal.");

      auto size = lod_tensors[i]->numel();
      PADDLE_ENFORCE_GT(size, 0);
      VLOG(10) << "alloc_space_for_vars: input(" << var_names[i] << ") ,dim:("
               << lod_tensors[i]->dims() << ")";
      *numel += size;
    }
  }
};

class AllocContinuousSpaceOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

  void InferShape(framework::InferShapeContext *ctx) const override {}
};

class AllocContinuousSpaceOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() override {
    AddInput("Input",
             "(vector<LoDTensor>) The input tensors of"
             " alloc_continuous_space operator.")
        .AsDuplicable();
    AddOutput("Output",
              "(vector<LoDTensor>) The output "
              "tensors of alloc_continuous_space operator. And the address "
              "of output tensors are continuous, they are sliced from the "
              "tensor of FusedOutput.")
        .AsDuplicable();
    AddOutput("FusedOutput",
              "(LoDTensor) The output tensor "
              "of alloc_continuous_space operator. And the tensors of"
              " Output is sliced from the tensor of FusedOutput.");
    AddAttr<bool>("copy_data", "Whether to copy the Input value to Output.")
        .SetDefault(false);
    AddAttr<bool>("set_constant",
                  "Whether to set the Output with a constant value.")
        .SetDefault(false);
    AddAttr<float>("constant",
                   "If set_constant is true, the constant value will be used "
                   "to set the Output.")
        .SetDefault(0.0);
    AddAttr<bool>("check_name",
                  "Whether to check the name of Input and Output to ensure "
                  "they are the same separately.")
        .SetDefault(false);
    AddComment(R"DOC(
AllocContinuousSpace Operator.

alloc_continuous_space is used to make the address of Output
continuous according to the Input. This Op will alloc a big tensor
according to the tensors of Input, the dtype is the same with those input tensors,
the size is the sum of those input tensors' numel, and the dim of the big
tensor is {sum(numel)}. And the big tensor is stored in FusedOutput.
The tensors of Output are sliced from the tensor of FusedOutput.
Note that, the dtype of Input should be the same, and the dim of Input
and Output should equal.
The tensors of Input and Output could be the same or different. And
alloc_continuous_space allows copying the value of Input to Output, or
setting the Output with a constant value.

)DOC");
  }
};

}  // namespace operators
}  // namespace paddle

REGISTER_OPERATOR(alloc_continuous_space,
                  paddle::operators::AllocContinuousSpaceOp,
                  paddle::operators::AllocContinuousSpaceOpMaker);
namespace ops = paddle::operators;
REGISTER_OP_CPU_KERNEL(
    alloc_continuous_space,
    ops::AllocContinuousSpaceKernel<paddle::platform::CPUDeviceContext, int>,
    ops::AllocContinuousSpaceKernel<paddle::platform::CPUDeviceContext, float>,
    ops::AllocContinuousSpaceKernel<paddle::platform::CPUDeviceContext,
                                    double>);

#ifdef PADDLE_WITH_CUDA
REGISTER_OP_CUDA_KERNEL(
    alloc_continuous_space,
    ops::AllocContinuousSpaceKernel<paddle::platform::CUDADeviceContext, int>,
    ops::AllocContinuousSpaceKernel<paddle::platform::CUDADeviceContext,
                                    float>,
    ops::AllocContinuousSpaceKernel<paddle::platform::CUDADeviceContext,
                                    double>);
#endif
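To make the DOC block above concrete: FusedOutput holds sum(numel) elements and every Output becomes a view at a running offset into it. A standalone sketch of that bookkeeping with made-up sizes (not part of the operator, plain C++):

#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // Hypothetical numel of each input tensor.
  std::vector<int64_t> numels = {6, 4, 10};

  // The fused buffer length is the sum of the input numels.
  int64_t total = 0;
  for (int64_t n : numels) total += n;
  std::cout << "FusedOutput numel = " << total << "\n";  // 20

  // Each output is the [offset, offset + len) slice, mirroring ShareDataWith.
  int64_t offset = 0;
  for (size_t i = 0; i < numels.size(); ++i) {
    std::cout << "Output[" << i << "] -> slice [" << offset << ", "
              << offset + numels[i] << ")\n";
    offset += numels[i];
  }
  return 0;
}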
paddle/fluid/operators/beam_search_decode_op.cc
...
...
@@ -123,7 +123,7 @@ class BeamSearchDecodeOp : public framework::OperatorBase {
    auto& dev_ctx = *pool.Get(dev_place);

    framework::RuntimeContext run_ctx(Inputs(), Outputs(), scope);
    framework::ExecutionContext ctx(*this, scope, dev_ctx, run_ctx);
    framework::ExecutionContext ctx(*this, scope, dev_ctx, run_ctx, nullptr);

    const LoDTensorArray* ids = ctx.Input<LoDTensorArray>("Ids");
    const LoDTensorArray* scores = ctx.Input<LoDTensorArray>("Scores");
...
...
paddle/fluid/operators/beam_search_decode_op.h
...
...
@@ -122,7 +122,7 @@ void BeamSearchDecoder<T>::ConvertSentenceVectorToLodTensor(
  auto cpu_place = std::unique_ptr<paddle::platform::CPUPlace>(
      new paddle::platform::CPUPlace());
  paddle::platform::CPUDeviceContext cpu_ctx(*cpu_place.get());
  paddle::platform::CPUDeviceContext cpu_ctx(*cpu_place);

  framework::LoD lod;
  lod.push_back(source_level_lod);
...
...
paddle/fluid/operators/benchmark/op_tester.cc
...
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/benchmark/op_tester.h"
#include <fstream>
#include "gflags/gflags.h"
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_info.h"
...
...
@@ -28,6 +29,7 @@ namespace operators {
namespace benchmark {

DEFINE_string(op_config_list, "", "Path of op config file.");
DEFINE_int32(specified_config_id, -1, "Test the specified op config.");

void OpTester::Init(const std::string &filename) {
  Init(OpTesterConfig(filename));
...
...
@@ -147,7 +149,7 @@ void OpTester::CreateInputVarDesc() {
    var->SetShape(input->dims);

    op_desc_.SetInput(name, {var_name});
    inputs_.push_back(var_name);
    input_lods_[var_name] = input->lod;
  }
}
...
...
@@ -162,7 +164,6 @@ void OpTester::CreateOutputVarDesc() {
    var->SetDataType(framework::proto::VarType::FP32);

    op_desc_.SetOutput(name, {var_name});
    outputs_.push_back(var_name);
  }
}
...
...
@@ -218,16 +219,26 @@ void OpTester::CreateVariables(framework::Scope *scope) {
    }
  }

  // Allocate memory for input tensor
  for (auto &name : inputs_) {
    VLOG(3) << "Allocate memory for tensor " << name;
    auto &var_desc = vars_[name];
  for (auto &item : input_lods_) {
    // Allocate memory for input tensor
    auto &var_name = item.first;
    VLOG(3) << "Allocate memory for tensor " << var_name;

    auto &var_desc = vars_[var_name];
    std::vector<int64_t> shape = var_desc->GetShape();
    auto *var = scope->Var(name);
    auto *var = scope->Var(var_name);
    auto *tensor = var->GetMutable<framework::LoDTensor>();
    SetupTensor<float>(tensor, shape, static_cast<float>(0.0),
                       static_cast<float>(1.0));

    VLOG(3) << "Set lod for tensor " << var_name;
    std::vector<std::vector<size_t>> &lod_vec = item.second;
    framework::LoD lod;
    for (size_t i = 0; i < lod_vec.size(); ++i) {
      lod.push_back(lod_vec[i]);
    }
    tensor->set_lod(lod);
  }
}
...
...
@@ -282,10 +293,32 @@ std::string OpTester::DebugString() {
}

TEST(op_tester, base) {
  OpTester tester;
  if (!FLAGS_op_config_list.empty()) {
    tester.Init(FLAGS_op_config_list);
    std::ifstream fin(FLAGS_op_config_list, std::ios::in | std::ios::binary);
    PADDLE_ENFORCE(static_cast<bool>(fin), "Cannot open file %s",
                   FLAGS_op_config_list.c_str());
    std::vector<OpTesterConfig> op_configs;
    while (!fin.eof()) {
      OpTesterConfig config;
      bool result = config.Init(fin);
      if (result) {
        op_configs.push_back(config);
      }
    }
    if (FLAGS_specified_config_id >= 0 &&
        FLAGS_specified_config_id < static_cast<int>(op_configs.size())) {
      OpTester tester;
      tester.Init(op_configs[FLAGS_specified_config_id]);
      tester.Run();
    } else {
      for (size_t i = 0; i < op_configs.size(); ++i) {
        OpTester tester;
        tester.Init(op_configs[i]);
        tester.Run();
      }
    }
  } else {
    OpTester tester;
    OpTesterConfig config;
    config.op_type = "elementwise_add";
    config.inputs.resize(2);
...
...
@@ -294,8 +327,8 @@ TEST(op_tester, base) {
    config.inputs[1].name = "Y";
    config.inputs[1].dims = {64, 1};

    tester.Init(config);
    tester.Run();
  }
  tester.Init(config);
  tester.Run();
}
}  // namespace benchmark
...
...
paddle/fluid/operators/benchmark/op_tester.h
...
...
@@ -57,8 +57,7 @@ class OpTester {
  std::string type_;
  framework::OpDesc op_desc_;
  std::unordered_map<std::string, std::unique_ptr<framework::VarDesc>> vars_;
  std::vector<std::string> inputs_;
  std::vector<std::string> outputs_;
  std::unordered_map<std::string, std::vector<std::vector<size_t>>> input_lods_;
  std::unique_ptr<framework::OperatorBase> op_;
  platform::Place place_;
  std::unique_ptr<framework::Scope> scope_;
...
...
paddle/fluid/operators/benchmark/op_tester_config.cc
...
...
@@ -33,21 +33,64 @@ static bool EndWith(const std::string& str, const std::string& substr) {
  return str.rfind(substr) == (str.length() - substr.length());
}

static void EraseEndSep(std::string* str) {
  std::string substr = kSepBetweenItems;
static void EraseEndSep(std::string* str,
                        std::string substr = kSepBetweenItems) {
  if (EndWith(*str, substr)) {
    str->erase(str->length() - substr.length(), str->length());
  }
}

static std::vector<int64_t> ParseDims(std::string dims_str) {
  std::vector<int64_t> dims;
void OpInputConfig::ParseDims(std::istream& is) {
  std::string dims_str;
  is >> dims_str;

  dims.clear();
  std::string token;
  std::istringstream token_stream(dims_str);
  while (std::getline(token_stream, token, 'x')) {
    dims.push_back(std::stoi(token));
  }
  return dims;
}

void OpInputConfig::ParseLoD(std::istream& is) {
  std::string lod_str;
  std::string start_sep =
      std::string(kStartSeparator) + std::string(kStartSeparator);
  std::string end_sep = std::string(kEndSeparator) + std::string(kEndSeparator);

  std::string sep;
  is >> sep;
  if (StartWith(sep, start_sep)) {
    lod_str += sep;
    while (!EndWith(sep, end_sep)) {
      is >> sep;
      lod_str += sep;
    }
  }
  EraseEndSep(&lod_str);
  PADDLE_ENFORCE_GE(lod_str.length(), 4U);
  VLOG(4) << "lod: " << lod_str << ", length: " << lod_str.length();

  // Parse the lod_str
  lod.clear();
  for (size_t i = 1; i < lod_str.length() - 1;) {
    if (lod_str[i] == '{') {
      std::vector<size_t> level;
      while (lod_str[i] != '}') {
        ++i;
        std::string number;
        while (lod_str[i] >= '0' && lod_str[i] <= '9') {
          number += lod_str[i];
          ++i;
        }
        level.push_back(atoi(number.c_str()));
      }
      lod.push_back(level);
    } else if (lod_str[i] == '}') {
      ++i;
    }
  }
}

OpInputConfig::OpInputConfig(std::istream& is) {
...
...
@@ -60,9 +103,9 @@ OpInputConfig::OpInputConfig(std::istream& is) {
      is >> name;
      EraseEndSep(&name);
    } else if (sep == "dims" || sep == "dims:") {
      std::string dims_str;
      is >> dims_str;
      dims = ParseDims(dims_str);
      ParseDims(is);
    } else if (sep == "lod" || sep == "lod:") {
      ParseLoD(is);
    }
  }
}
...
...
@@ -76,7 +119,7 @@ OpTesterConfig::OpTesterConfig(const std::string& filename) {
  Init(fin);
}

void OpTesterConfig::Init(std::istream& is) {
bool OpTesterConfig::Init(std::istream& is) {
  std::string sep;
  is >> sep;
  if (sep == kStartSeparator) {
...
...
@@ -95,9 +138,40 @@ void OpTesterConfig::Init(std::istream& is) {
      } else if (sep == "input" || sep == "input:") {
        OpInputConfig input_config(is);
        inputs.push_back(input_config);
      } else if (sep == "attrs" || sep == "attrs:") {
        ParseAttrs(is);
      } else {
        if (sep != kEndSeparator) {
          return false;
        }
      }
    }
  } else {
    return false;
  }

  return true;
}

bool OpTesterConfig::ParseAttrs(std::istream& is) {
  std::string sep;
  is >> sep;
  if (sep == kStartSeparator) {
    while (true) {
      std::string key;
      is >> key;
      if (key == kEndSeparator) {
        break;
      }

      std::string value;
      is >> value;
      EraseEndSep(&key, ":");
      EraseEndSep(&value);

      attrs[key] = value;
    }
  }
  return true;
}

const OpInputConfig* OpTesterConfig::GetInput(const std::string& name) {
...
...
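For reference, the dims field that ParseDims consumes is an 'x'-separated string; the following standalone sketch repeats the same getline-with-'x' split on a made-up value (hypothetical main, no Paddle dependencies):

#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
  // Same idea as OpInputConfig::ParseDims: "2x3x4" -> {2, 3, 4}.
  std::string dims_str = "2x3x4";
  std::vector<int64_t> dims;
  std::string token;
  std::istringstream token_stream(dims_str);
  while (std::getline(token_stream, token, 'x')) {
    dims.push_back(std::stoi(token));
  }
  for (int64_t d : dims) std::cout << d << " ";  // prints: 2 3 4
  std::cout << "\n";
  return 0;
}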
paddle/fluid/operators/benchmark/op_tester_config.h
...
...
@@ -16,6 +16,7 @@ limitations under the License. */
#include <istream>
#include <string>
#include <unordered_map>
#include <vector>

namespace paddle {
...
...
@@ -26,19 +27,27 @@ struct OpInputConfig {
  OpInputConfig() {}
  explicit OpInputConfig(std::istream& is);

  void ParseDims(std::istream& is);
  void ParseLoD(std::istream& is);

  std::string name;
  std::vector<int64_t> dims;
  std::vector<std::vector<size_t>> lod;
};

struct OpTesterConfig {
  OpTesterConfig() {}
  explicit OpTesterConfig(const std::string& filename);

  void Init(std::istream& is);
  bool Init(std::istream& is);

  bool ParseAttrs(std::istream& is);

  const OpInputConfig* GetInput(const std::string& name);

  std::string op_type;
  std::vector<OpInputConfig> inputs;
  std::unordered_map<std::string, std::string> attrs;
  int device_id{-1};  // CPU: -1
  int repeat{1};
  int profile{0};
...
...
paddle/fluid/operators/conv_cudnn_op.cu.cc
...
...
@@ -42,6 +42,7 @@ using ScopedConvolutionDescriptor = platform::ScopedConvolutionDescriptor;
using DataLayout = platform::DataLayout;
template <typename T>
using ScalingParamType = typename platform::CudnnDataType<T>::ScalingParamType;
using framework::AlgorithmsCache;

template <typename T>
class CUDNNConvOpKernel : public framework::OpKernel<T> {
...
...
@@ -169,18 +170,8 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
              workspace_size_limit, &algo));
      VLOG(3) << "cuDNN forward algo " << algo;
    } else if (exhaustive_search && (!half_float)) {
      AlgorithmsCache<cudnnConvolutionFwdAlgo_t>* algo_cache = nullptr;
      if (ctx.scope().FindVar(kCUDNNFwdAlgoCache)) {
        algo_cache =
            ctx.scope()
                .FindVar(kCUDNNFwdAlgoCache)
                ->GetMutable<AlgorithmsCache<cudnnConvolutionFwdAlgo_t>>();
      } else {
        algo_cache =
            const_cast<framework::Scope&>(ctx.scope())
                .Var(kCUDNNFwdAlgoCache)
                ->GetMutable<AlgorithmsCache<cudnnConvolutionFwdAlgo_t>>();
      }
      AlgorithmsCache<cudnnConvolutionFwdAlgo_t>& algo_cache =
          ctx.GetKernelConfig<AlgorithmsCache<cudnnConvolutionFwdAlgo_t>>(0);
      cudnn_workspace =
          ctx.AllocateTmpTensor<int8_t, platform::CUDADeviceContext>(
              framework::make_ddim(
...
...
@@ -188,7 +179,7 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
              dev_ctx);
      cudnn_workspace_ptr = static_cast<void*>(cudnn_workspace.data<int8_t>());

      algo = algo_cache->GetAlgorithm(
      algo = algo_cache.GetAlgorithm(
          x_dims, f_dims, strides, paddings, dilations, 0, [&]() {
            int returned_algo_count;
            std::array<cudnnConvolutionFwdAlgoPerf_t, kNUM_CUDNN_FWD_ALGS>
...
...
@@ -382,22 +373,11 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
    if (input_grad) {
      T* input_grad_data = input_grad->mutable_data<T>(ctx.GetPlace());
      if (exhaustive_search) {
        AlgorithmsCache<cudnnConvolutionBwdDataAlgo_t>* data_algo_cache;
        if (ctx.scope().FindVar(kCUDNNBwdDataAlgoCache)) {
          data_algo_cache =
              ctx.scope()
                  .FindVar(kCUDNNBwdDataAlgoCache)
                  ->GetMutable<
                      AlgorithmsCache<cudnnConvolutionBwdDataAlgo_t>>();
        } else {
          data_algo_cache =
              const_cast<framework::Scope&>(ctx.scope())
                  .Var(kCUDNNBwdDataAlgoCache)
                  ->GetMutable<
                      AlgorithmsCache<cudnnConvolutionBwdDataAlgo_t>>();
        }
        data_algo = data_algo_cache->GetAlgorithm(
        AlgorithmsCache<cudnnConvolutionBwdDataAlgo_t>& data_algo_cache =
            ctx.GetKernelConfig<
                AlgorithmsCache<cudnnConvolutionBwdDataAlgo_t>>(0);
        data_algo = data_algo_cache.GetAlgorithm(
            x_dims, f_dims, strides, paddings, dilations, 0, [&]() {
              int returned_algo_count;
              std::array<cudnnConvolutionBwdDataAlgoPerf_t,
...
...
@@ -448,22 +428,11 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
    if (filter_grad) {
      T* filter_grad_data = filter_grad->mutable_data<T>(ctx.GetPlace());
      if (exhaustive_search) {
        AlgorithmsCache<cudnnConvolutionBwdFilterAlgo_t>* f_algo_cache;
        if (ctx.scope().FindVar(kCUDNNBwdFilterAlgoCache)) {
          f_algo_cache =
              ctx.scope()
                  .FindVar(kCUDNNBwdFilterAlgoCache)
                  ->GetMutable<
                      AlgorithmsCache<cudnnConvolutionBwdFilterAlgo_t>>();
        } else {
          f_algo_cache =
              const_cast<framework::Scope&>(ctx.scope())
                  .Var(kCUDNNBwdFilterAlgoCache)
                  ->GetMutable<
                      AlgorithmsCache<cudnnConvolutionBwdFilterAlgo_t>>();
        }
        filter_algo = f_algo_cache->GetAlgorithm(
        AlgorithmsCache<cudnnConvolutionBwdFilterAlgo_t>& f_algo_cache =
            ctx.GetKernelConfig<
                AlgorithmsCache<cudnnConvolutionBwdFilterAlgo_t>>(1);
        filter_algo = f_algo_cache.GetAlgorithm(
            x_dims, f_dims, strides, paddings, dilations, 0, [&]() {
              int returned_algo_count;
              std::array<cudnnConvolutionBwdFilterAlgoPerf_t,
...
...
paddle/fluid/operators/conv_cudnn_op_cache.h
...
...
@@ -17,6 +17,7 @@ limitations under the License. */
#include <functional>
#include <unordered_map>
#include <vector>
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/platform/cudnn_helper.h"

DECLARE_uint64(conv_workspace_size_limit);
...
...
@@ -46,100 +47,5 @@ static constexpr size_t kNUM_CUDNN_BWD_FILTER_ALGS = 4;
static constexpr size_t kNUM_CUDNN_BWD_DATA_ALGS = 5;
#endif

template <typename TAlgorithm>
class AlgorithmsCache {
 public:
  AlgorithmsCache() : search_times_(0) { hash_.clear(); }
  // Caches the best algorithm for a given
  // combination of tensor dimensions & compute data type.
  TAlgorithm GetAlgorithm(
      const std::vector<int64_t>& dims1, const std::vector<int64_t>& dims2,
      const std::vector<int>& strides, const std::vector<int>& paddings,
      const std::vector<int>& dilations,
      int algorithmFlags,  // can set for different data type
      std::function<TAlgorithm()> gen_func);

  TAlgorithm GetAlgorithm(int64_t area, int search_times, int algorithmFlags,
                          std::function<TAlgorithm()> gen_func);

 private:
  std::unordered_map<int64_t, TAlgorithm> hash_;
  std::mutex mutex_;
  int search_times_;
};

template <typename TAlgorithm>
TAlgorithm AlgorithmsCache<TAlgorithm>::GetAlgorithm(
    const std::vector<int64_t>& dims1, const std::vector<int64_t>& dims2,
    const std::vector<int>& strides, const std::vector<int>& paddings,
    const std::vector<int>& dilations, int algorithmFlags,
    std::function<TAlgorithm()> gen_func) {
  std::lock_guard<std::mutex> lock(mutex_);
  int64_t seed = 0;
  // Hash all of the inputs, use to try and look up a previously
  // discovered algorithm, or fall back to generating a new one.
  std::hash<int64_t> hashFn;
  // do hash like boost
  // https://stackoverflow.com/questions/2590677/how-do-i-combine-hash-values-in-c0x
  for (const auto num : dims1) {
    seed ^= hashFn(num) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  }

  for (const auto num : dims2) {
    seed ^= hashFn(num) + 0x9e3779b9 + (seed << 6) + (seed >> 2) + 1;
  }

  for (const auto num : strides) {
    seed ^= hashFn(static_cast<int64_t>(num)) + 0x9e3779b9 + (seed << 6) +
            (seed >> 2) + 2;
  }

  for (const auto num : paddings) {
    seed ^= hashFn(static_cast<int64_t>(num)) + 0x9e3779b9 + (seed << 6) +
            (seed >> 2) + 3;
  }

  for (const auto num : dilations) {
    seed ^= hashFn(static_cast<int64_t>(num)) + 0x9e3779b9 + (seed << 6) +
            (seed >> 2) + 4;
  }

  seed ^= hashFn(static_cast<int64_t>(algorithmFlags)) + 0x9e3779b9 +
          (seed << 6) + (seed >> 2) + 5;

  if (seed == 0) return gen_func();

  if (hash_.find(seed) == hash_.end()) {
    TAlgorithm value = gen_func();
    hash_[seed] = value;
  }
  return hash_[seed];
}

template <typename TAlgorithm>
TAlgorithm AlgorithmsCache<TAlgorithm>::GetAlgorithm(
    int64_t area, int search_times, int algorithmFlags,
    std::function<TAlgorithm()> gen_func) {
  if (hash_.find(area) != hash_.end()) {
    return hash_[area];
  }
  if (search_times_ < search_times) {
    auto algo = gen_func();
    hash_[area] = algo;
    ++search_times_;
    return algo;
  }
  TAlgorithm algo;
  int64_t min = static_cast<uint64_t>(INT_MAX);
  for (const auto& m : hash_) {
    if (m.first < min) {
      min = m.first;
      algo = m.second;
    }
  }
  return algo;
}

}  // namespace operators
}  // namespace paddle
paddle/fluid/operators/conv_fusion_op.cu.cc
...
...
@@ -30,6 +30,8 @@ using ScopedFilterDescriptor = platform::ScopedFilterDescriptor;
using ScopedConvolutionDescriptor = platform::ScopedConvolutionDescriptor;
using ScopedActivationDescriptor = platform::ScopedActivationDescriptor;
using DataLayout = platform::DataLayout;
using framework::AlgorithmsCache;

template <typename T>
using ScalingParamType = typename platform::CudnnDataType<T>::ScalingParamType;
...
...
@@ -139,38 +141,21 @@ class CUDNNConvFusionOpKernel : public framework::OpKernel<T> {
        }
        return fwd_perf_stat[0].algo;
      };
      AlgorithmsCache<cudnnConvolutionFwdAlgo_t>* algo_cache = nullptr;
      AlgorithmsCache<cudnnConvolutionFwdAlgo_t>& algo_cache =
          ctx.GetKernelConfig<AlgorithmsCache<cudnnConvolutionFwdAlgo_t>>(0);
      int search_times = ctx.Attr<int>("search_times");
      search_times = std::max(
          static_cast<int>(FLAGS_cudnn_exhaustive_search_times), search_times);
      // TODO(dangqingqing): Unify this if-else.
      if (search_times > 0) {
        // The searched algo will be cached by `search_times` times for
        // different input dimension. For other dimensions, select the algo
        // of closest area.
        auto var_name = ctx.Inputs("AlgoCache")[0];
        algo_cache =
            ctx.scope()
                .FindVar(var_name)
                ->GetMutable<AlgorithmsCache<cudnnConvolutionFwdAlgo_t>>();
        algo = algo_cache->GetAlgorithm(x_dims[2] * x_dims[3], search_times, 0,
                                        search_func);
        algo = algo_cache.GetAlgorithm(x_dims[2] * x_dims[3], search_times, 0,
                                       search_func);
      } else {
        // Cache searched algo in Var(kCUDNNFwdAlgoCache).
        // all conv ops use the same kCUDNNFwdAlgoCache variable.
        if (ctx.scope().FindVar(kCUDNNFwdAlgoCache)) {
          algo_cache =
              ctx.scope()
                  .FindVar(kCUDNNFwdAlgoCache)
                  ->GetMutable<AlgorithmsCache<cudnnConvolutionFwdAlgo_t>>();
        } else {
          // TODO(qingqing) remove const_cast
          algo_cache =
              const_cast<framework::Scope*>(ctx.scope().parent())
                  ->Var(kCUDNNFwdAlgoCache)
                  ->GetMutable<AlgorithmsCache<cudnnConvolutionFwdAlgo_t>>();
        }
        algo = algo_cache->GetAlgorithm(x_dims, f_dims, strides, paddings,
                                        dilations, 0, search_func);
        algo = algo_cache.GetAlgorithm(x_dims, f_dims, strides, paddings,
                                       dilations, 0, search_func);
      }
      VLOG(3) << "choose algo " << algo;
    }
...
...
paddle/fluid/operators/conv_op.cc
...
...
@@ -18,6 +18,7 @@ limitations under the License. */
#include <vector>

#ifdef PADDLE_WITH_CUDA
#include "paddle/fluid/operators/conv_cudnn_op_cache.h"
#include "paddle/fluid/platform/cudnn_helper.h"
#endif

#ifdef PADDLE_WITH_MKLDNN
...
...
@@ -109,8 +110,20 @@ framework::OpKernelType ConvOp::GetExpectedKernelType(
                   "float16 can only be used when CUDNN is used");
  }

  return framework::OpKernelType(input_data_type, ctx.GetPlace(), layout,
                                 library, customized_type_value);
  auto type = framework::OpKernelType(input_data_type, ctx.GetPlace(), layout,
                                      library, customized_type_value);
#ifdef PADDLE_WITH_CUDA
  std::vector<framework::KernelConfig>& configs = kernel_configs_map_[type];
  // TODO(dangqingqing): Currently conv_fusion_op use cudnn but sets use_cudnn
  // to false. It should be fixed and then here should only create if library
  // is kCUDNN.
  if (configs.empty()) {
    std::shared_ptr<framework::AlgorithmsCache<cudnnConvolutionFwdAlgo_t>> p(
        new framework::AlgorithmsCache<cudnnConvolutionFwdAlgo_t>());
    configs.push_back(p);
  }
#endif
  return type;
}

void Conv2DOpMaker::Make() {
...
...
@@ -410,9 +423,25 @@ framework::OpKernelType ConvOpGrad::GetExpectedKernelType(
  }
#endif

  return framework::OpKernelType(ctx.Input<Tensor>("Input")->type(),
                                 ctx.GetPlace(), layout_, library_,
                                 customized_type_value);
  auto type = framework::OpKernelType(ctx.Input<Tensor>("Input")->type(),
                                      ctx.GetPlace(), layout_, library_,
                                      customized_type_value);
#ifdef PADDLE_WITH_CUDA
  if (library_ == framework::LibraryType::kCUDNN) {
    std::vector<framework::KernelConfig>& configs = kernel_configs_map_[type];
    if (configs.empty()) {
      std::shared_ptr<
          framework::AlgorithmsCache<cudnnConvolutionBwdDataAlgo_t>>
          p(new framework::AlgorithmsCache<cudnnConvolutionBwdDataAlgo_t>());
      configs.push_back(p);

      std::shared_ptr<
          framework::AlgorithmsCache<cudnnConvolutionBwdFilterAlgo_t>>
          p2(new framework::AlgorithmsCache<cudnnConvolutionBwdFilterAlgo_t>());
      configs.push_back(p2);
    }
  }
#endif
  return type;
}

class Conv2dGradMaker : public framework::SingleGradOpDescMaker {
...
...
paddle/fluid/operators/cross_entropy_op.cc
...
...
@@ -32,14 +32,23 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
    int rank = x_dims.size();
    PADDLE_ENFORCE_EQ(rank, label_dims.size(),
                      "Input(X) and Input(Label) shall have the same rank.");
    PADDLE_ENFORCE_EQ(framework::slice_ddim(x_dims, 0, rank - 1),
                      framework::slice_ddim(label_dims, 0, rank - 1),
                      "Input(X) and Input(Label) shall have the same shape "
                      "except the last dimension.");
    bool check = true;
    if ((!ctx->IsRuntime()) && (framework::product(x_dims) <= 0 ||
                                framework::product(label_dims) <= 0)) {
      check = false;
    }

    if (check) {
      PADDLE_ENFORCE_EQ(framework::slice_ddim(x_dims, 0, rank - 1),
                        framework::slice_ddim(label_dims, 0, rank - 1),
                        "Input(X) and Input(Label) shall have the same shape "
                        "except the last dimension.");
    }

    if (ctx->Attrs().Get<bool>("soft_label")) {
      PADDLE_ENFORCE_EQ(x_dims[rank - 1], label_dims[rank - 1],
                        "If Attr(soft_label) == true, the last dimension of "
                        "Input(X) and Input(Label) should be equal.");
      if (check) {
        PADDLE_ENFORCE_EQ(x_dims[rank - 1], label_dims[rank - 1],
                          "If Attr(soft_label) == true, the last dimension of "
                          "Input(X) and Input(Label) should be equal.");
      }
    } else {
      PADDLE_ENFORCE_EQ(label_dims[rank - 1], 1UL,
                        "If Attr(softLabel) == false, the last dimension of "
...
...
@@ -82,20 +91,32 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
                      "Input(Y@Grad) and Input(X) should have the same rank.");
    PADDLE_ENFORCE_EQ(label_dims.size(), rank,
                      "Input(Label) and Input(X) should have the same rank.");
    PADDLE_ENFORCE_EQ(framework::slice_ddim(x_dims, 0, rank - 1),
                      framework::slice_ddim(label_dims, 0, rank - 1),
                      "The Input(X) and Input(Label) should have the same "
                      "shape except the last dimension.");
    PADDLE_ENFORCE_EQ(framework::slice_ddim(x_dims, 0, rank - 1),
                      framework::slice_ddim(dy_dims, 0, rank - 1),
                      "The Input(X) and Input(Y@Grad) should have the same "
                      "shape except the last dimension.");
    bool check = true;
    if ((!ctx->IsRuntime()) && (framework::product(x_dims) <= 0 ||
                                framework::product(label_dims) <= 0)) {
      check = false;
    }

    if (check) {
      PADDLE_ENFORCE_EQ(framework::slice_ddim(x_dims, 0, rank - 1),
                        framework::slice_ddim(label_dims, 0, rank - 1),
                        "The Input(X) and Input(Label) should have the same "
                        "shape except the last dimension.");
      PADDLE_ENFORCE_EQ(framework::slice_ddim(x_dims, 0, rank - 1),
                        framework::slice_ddim(dy_dims, 0, rank - 1),
                        "The Input(X) and Input(Y@Grad) should have the same "
                        "shape except the last dimension.");
    }
    PADDLE_ENFORCE_EQ(dy_dims[rank - 1], 1,
                      "The last dimension of Input(Y@Grad) should be 1.");
    if (ctx->Attrs().Get<bool>("soft_label")) {
      PADDLE_ENFORCE_EQ(x_dims[rank - 1], label_dims[rank - 1],
                        "When Attr(soft_label) == true, the last dimension of "
                        "Input(X) and Input(Label) should be equal.");
      if (check) {
        PADDLE_ENFORCE_EQ(x_dims[rank - 1], label_dims[rank - 1],
                          "When Attr(soft_label) == true, the last dimension of "
                          "Input(X) and Input(Label) should be equal.");
      }
    } else {
      PADDLE_ENFORCE_EQ(label_dims[rank - 1], 1,
                        "When Attr(soft_label) == false, the last dimension of "
...
...
paddle/fluid/operators/detection/prior_box_op.h
...
...
@@ -172,6 +172,10 @@ class PriorBoxOpKernel : public framework::OpKernel<T> {
        framework::make_ddim({1, static_cast<int>(variances.size())}),
        ctx.GetPlace());
    auto var_et = framework::EigenTensor<T, 2>::From(var_t);

#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for
#endif
    for (size_t i = 0; i < variances.size(); ++i) {
      var_et(0, i) = variances[i];
    }
...
...
@@ -181,8 +185,15 @@ class PriorBoxOpKernel : public framework::OpKernel<T> {
    vars->Resize({box_num, static_cast<int>(variances.size())});

    auto e_vars = framework::EigenMatrix<T, Eigen::RowMajor>::From(*vars);
    e_vars = var_et.broadcast(Eigen::DSizes<int, 2>(box_num, 1));

#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for collapse(2)
#endif
    for (int i = 0; i < box_num; ++i) {
      for (int j = 0; j < variances.size(); ++j) {
        e_vars(i, j) = variances[j];
      }
    }

    vars->Resize(var_dim);
  }
};  // namespace operators
...
...
paddle/fluid/operators/fake_quantize_op.cc
...
...
@@ -31,7 +31,7 @@ template <typename T>
struct FindAbsMaxFunctor<platform::CPUDeviceContext, T> {
  void operator()(const platform::CPUDeviceContext& ctx, const T* in,
                  const int num, T* out) {
    *out = *(std::max_element(in + 0, in + num, Compare<T>()));
    *out = std::abs(*(std::max_element(in + 0, in + num, Compare<T>())));
  }
};
...
...
@@ -46,10 +46,8 @@ struct ClipAndFakeQuantFunctor<platform::CPUDeviceContext, T> {
    platform::Transform<platform::CPUDeviceContext> trans;
    trans(ctx, in.data<T>(), in.data<T>() + in.numel(),
          out->mutable_data<T>(ctx.GetPlace()), ClipFunctor<T>(-s, s));
    auto in_e = framework::EigenVector<T>::Flatten(in);
    auto out_e = framework::EigenVector<T>::Flatten(*out);
    out_e.device(*ctx.eigen_device()) = (bin_cnt / s * in_e).round();
    out_e.device(*ctx.eigen_device()) = (bin_cnt / s * out_e).round();
  }
};
...
...
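The FindAbsMaxFunctor fix above wraps the comparator's winner in std::abs, so a negative extreme still yields a positive scale. A standalone illustration with made-up data (assuming Compare<T> orders by absolute value, as its use here implies):

#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
  std::vector<float> in = {-3.5f, 2.0f, 1.0f};
  // Orders by absolute value, in the spirit of the Compare<T> functor.
  auto abs_less = [](float a, float b) { return std::abs(a) < std::abs(b); };
  float extreme = *std::max_element(in.begin(), in.end(), abs_less);  // -3.5
  float abs_max = std::abs(extreme);                                  //  3.5
  std::cout << extreme << " " << abs_max << "\n";
  return 0;
}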
paddle/fluid/operators/mkldnn/activation_mkldnn_op.cc
...
...
@@ -225,7 +225,7 @@ void eltwise_grad(const framework::ExecutionContext &ctx,
      std::static_pointer_cast<mkldnn::memory>(dev_ctx.GetBlob(key_src_mem));
  PADDLE_ENFORCE(src_memory != nullptr,
                 "Fail to find src_memory in device context");
  src_memory->set_data_handle(*p_src_data.get());
  src_memory->set_data_handle(*p_src_data);

  std::shared_ptr<memory> diff_src_memory;
...
...
paddle/fluid/operators/mkldnn/conv_mkldnn_op.cc  (View file @ 07efdb51)

@@ -96,12 +96,8 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
    auto* bias = ctx.HasInput("Bias") ? ctx.Input<Tensor>("Bias") : nullptr;
    auto* output = ctx.Output<Tensor>("Output");

-   PADDLE_ENFORCE(input->layout() == DataLayout::kMKLDNN &&
-                      input->format() != memory::format::format_undef,
-                  "Wrong layout/format set for Input tensor");
-   PADDLE_ENFORCE(filter->layout() == DataLayout::kMKLDNN &&
-                      filter->format() != memory::format::format_undef,
-                  "Wrong layout/format set for Filter tensor");
+   PADDLE_ENFORCE(input->layout() == DataLayout::kMKLDNN);
+   PADDLE_ENFORCE(filter->layout() == DataLayout::kMKLDNN);
    PADDLE_ENFORCE(input->dims().size() == 4 || input->dims().size() == 5,
                   "Input must be with 4 or 5 dimensions, i.e. NCHW or NCDHW");
    PADDLE_ENFORCE(filter->dims().size() == 4 || filter->dims().size() == 5,

@@ -148,14 +144,19 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
    std::vector<primitive> pipeline;

-   auto src_format = input->format();
-   mkldnn::memory::format weights_format =
-       GetWeightsFormat(filter->format(), g, is_conv3d);
-
-   auto user_src_md = platform::MKLDNNMemDesc(
-       {src_tz}, platform::MKLDNNGetDataType<T>(), src_format);
-   auto user_weights_md = platform::MKLDNNMemDesc(
-       {weights_tz}, platform::MKLDNNGetDataType<T>(), weights_format);
    // For convolution with groups we need to recreate primitive descriptor
    // as Paddle tensor is not having group dims while mkldnn treats
    // group as another dimensions
+   mkldnn::memory::primitive_desc user_weights_mpd =
+       filter->get_mkldnn_prim_desc();
+   if (g > 1) {
+     mkldnn::memory::format weights_format =
+         GetWeightsFormat(filter->format(), g, is_conv3d);
+     auto user_weights_md = platform::MKLDNNMemDesc(
+         {weights_tz}, platform::MKLDNNGetDataType<T>(), weights_format);
+     user_weights_mpd =
+         mkldnn::memory::primitive_desc(user_weights_md, mkldnn_engine);
+   }

    /* create memory descriptor for convolution without specified format
     * ('any') which lets a primitive (convolution in this case) choose

@@ -165,7 +166,7 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
    auto chosen_memory_format =
        platform::data_format_to_memory_format(data_format);

-   weights_format = mkldnn::memory::format::any;
+   mkldnn::memory::format weights_format = mkldnn::memory::format::any;
    // Check the format for user's special output
    if (chosen_memory_format != mkldnn::memory::format::any) {
      if (is_conv3d) {

@@ -205,10 +206,10 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
    platform::ConvMKLDNNHandler handler(conv_pd, dev_ctx, mkldnn_engine, key);

    // create mkldnn memory from input tensors (data/weights)
-   auto user_src_memory_p =
-       handler.AcquireSrcMemory(user_src_md, to_void_cast<T>(input_data));
+   auto user_src_memory_p = handler.AcquireSrcMemory(
+       input->get_mkldnn_prim_desc(), to_void_cast<T>(input_data));
    auto user_weights_memory_p = handler.AcquireWeightsMemory(
-       user_weights_md, to_void_cast<T>(filter_data));
+       user_weights_mpd, to_void_cast<T>(filter_data));

    // create reorder primitive if the input format is not the preferred one
    auto src_memory_p =

@@ -281,8 +282,8 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
    pipeline.push_back(*conv_p);
    stream(stream::kind::eager).submit(pipeline).wait();

-   output->set_layout(DataLayout::kMKLDNN);
-   output->set_format(GetMKLDNNFormat(*dst_memory_p));
+   auto dst_mpd = dst_memory_p->get_primitive_desc();
+   output->set_mkldnn_prim_desc(dst_mpd);
  }

  void ComputeINT8(const paddle::framework::ExecutionContext& ctx) const {
    const bool is_test = ctx.Attr<bool>("is_test");

@@ -947,8 +948,8 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
      // push primitive to stream and wait until it's executed
      pipeline.push_back(*conv_bwd_weights_p);

-     filter_grad->set_layout(DataLayout::kMKLDNN);
-     filter_grad->set_format(GetMKLDNNFormat(*diff_weights_memory_p));
+     auto filter_grad_mpd = diff_weights_memory_p->get_primitive_desc();
+     filter_grad->set_mkldnn_prim_desc(filter_grad_mpd);
    }

    if (input_grad) {
paddle/fluid/operators/mkldnn/gaussian_random_mkldnn_op.cc  (View file @ 07efdb51)

@@ -42,8 +42,12 @@ class GaussianMKLDNNKernel : public paddle::framework::OpKernel<T> {
    // The format of output is set as the mkldnn's format
    // TODO(@mozga-intel) The format of matrix sets inside the another layers.
-   tensor->set_layout(DataLayout::kMKLDNN);
-   tensor->set_format(mkldnn::memory::format::oihw);
+   // TODO(jczaja): Remove this hack after checking performance on block layout
+   auto tensor_mem_pd = paddle::platform::create_prim_desc_from_dims(
+       paddle::framework::vectorize2int(tensor->dims()),
+       mkldnn::memory::format::oihw);
+   tensor->set_mkldnn_prim_desc(tensor_mem_pd);
  }
};
}  // namespace operators
paddle/fluid/operators/mkldnn/pool_mkldnn_op.cc  (View file @ 07efdb51)

@@ -198,7 +198,7 @@ class PoolMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
    }

    // push primitive to stream and wait until it's executed
-   std::vector<mkldnn::primitive> pipeline{*(pool_p.get())};
+   std::vector<mkldnn::primitive> pipeline{*pool_p};
    stream(stream::kind::eager).submit(pipeline).wait();

    output->set_layout(DataLayout::kMKLDNN);

@@ -367,8 +367,7 @@ class PoolMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
        dev_ctx.SetBlob(key_pool_diff_dst_mem_p, diff_dst_memory);

        pool_bwd_p = std::make_shared<pooling_backward>(
-           pool_bwd_pd, *(diff_dst_memory.get()), *workspace_memory,
-           *(diff_src_memory));
+           pool_bwd_pd, *diff_dst_memory, *workspace_memory, *diff_src_memory);
        dev_ctx.SetBlob(key_pool_bwd_p, pool_bwd_p);
      } else {

@@ -404,7 +403,7 @@ class PoolMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
    if (is_diff_dst_reordered) {
      pipeline.push_back(reorder_diff_dst);
    }
-   pipeline.push_back(*(pool_bwd_p.get()));
+   pipeline.push_back(*pool_bwd_p);
    mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait();

    in_x_grad->set_layout(DataLayout::kMKLDNN);
paddle/fluid/operators/mkldnn/softmax_mkldnn_op.cc  (View file @ 07efdb51)

@@ -66,8 +66,7 @@ class SoftmaxMKLDNNHandler : public platform::MKLDNNHandler {
                   "Fail to find softmax primitive in device context");
    if (softmax_p == nullptr) {
      softmax_p = std::make_shared<mkldnn::softmax_forward>(
-         *(softmax_pd_.get()),
-         *(static_cast<mkldnn::memory*>(src_memory_p.get())),
+         *softmax_pd_, *(static_cast<mkldnn::memory*>(src_memory_p.get())),
          *(static_cast<mkldnn::memory*>(dst_memory_p.get())));
      dev_ctx_.SetBlob(prim_key, softmax_p);
    } else {

@@ -88,8 +87,8 @@ class SoftmaxMKLDNNHandler : public platform::MKLDNNHandler {
                   "Fail to find softmax backward primitive in device context");
    if (softmax_bwd_p == nullptr) {
      softmax_bwd_p = std::make_shared<mkldnn::softmax_backward>(
-         *softmax_bwd_pd_, *(dst_memory_p.get()), *(diff_dst_memory_p.get()),
-         *(diff_src_memory_p.get()));
+         *softmax_bwd_pd_, *dst_memory_p, *diff_dst_memory_p,
+         *diff_src_memory_p);
      dev_ctx_.SetBlob(prim_key, softmax_bwd_p);
    } else {
      is_reusing_ = true;
paddle/fluid/operators/mkldnn/sum_mkldnn_op.cc  (View file @ 07efdb51)

@@ -79,15 +79,6 @@ class SumMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
    memory::format input_format = input0.format();

-   if (src_tz.size() == 1 &&
-       (input_format == memory::format::nchw ||
-        input_format == memory::format::nhwc)) {
-     input_format = memory::format::x;
-   }
-   if (src_tz.size() == 2 &&
-       (input_format == memory::format::nchw ||
-        input_format == memory::format::nhwc)) {
-     input_format = memory::format::nc;
-   }

    for (int i = 0; i < N; i++) {
      PADDLE_ENFORCE(in_vars[i]->IsType<LoDTensor>(),
                     "all inputs must be all LoDTensors");

@@ -147,105 +138,10 @@ class SumMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
      output->set_layout(DataLayout::kMKLDNN);
      output->set_format(output_format);
-   } else if (out_var->IsType<framework::SelectedRows>()) {
-     // TODO(@mozga-intel) Add MKLDNN SelectedRows support
-     std::unique_ptr<framework::SelectedRows> in0;
-     if (in_place) {
-       // If is in_place, we store the input[0] to in0
-       auto& in_sel0 = in_vars[0]->Get<SelectedRows>();
-       auto& rows = in_sel0.rows();
-       in0.reset(new framework::SelectedRows(rows, in_sel0.height()));
-       in0->mutable_value()->ShareDataWith(in_sel0.value());
-     }
-
-     auto get_selected_row = [&](size_t i) -> const SelectedRows& {
-       if (i == 0 && in0) {
-         return *in0.get();
-       } else {
-         return in_vars[i]->Get<SelectedRows>();
-       }
-     };
-
-     auto* out = ctx.Output<SelectedRows>("Out");
-     out->mutable_rows()->clear();
-     auto* out_value = out->mutable_value();
-
-     // Runtime InferShape
-     size_t first_dim = 0;
-     for (int i = 0; i < N; i++) {
-       auto& sel_row = get_selected_row(i);
-       first_dim += sel_row.rows().size();
-     }
-
-     std::vector<int64_t> in_dim;
-     for (int i = 0; i < N; i++) {
-       auto& sel_row = get_selected_row(i);
-       if (sel_row.rows().size() > 0) {
-         in_dim = framework::vectorize(sel_row.value().dims());
-         break;
-       }
-     }
-
-     if (in_dim.empty()) {
-       VLOG(3) << "WARNING: all the inputs are empty";
-       in_dim = framework::vectorize(get_selected_row(N - 1).value().dims());
-     } else {
-       in_dim[0] = static_cast<int64_t>(first_dim);
-     }
-     in_dim[0] = static_cast<int64_t>(first_dim);
-
-     out_value->Resize(framework::make_ddim(in_dim));
-     out_value->mutable_data<T>(ctx.GetPlace());
-
-     // if all the input sparse vars are empty, no need to
-     // merge these vars.
-     if (first_dim == 0UL) {
-       return;
-     }
-
-     math::SelectedRowsAddTo<CPUDeviceContext, T> functor;
-     int64_t offset = 0;
-     for (int i = 0; i < N; i++) {
-       auto& sel_row = get_selected_row(i);
-       if (sel_row.rows().size() == 0) {
-         continue;
-       }
-       PADDLE_ENFORCE_EQ(out->height(), sel_row.height());
-       functor(ctx.template device_context<CPUDeviceContext>(), sel_row,
-               offset, out);
-       offset += sel_row.value().numel();
-     }
-   } else if (out_var->IsType<framework::LoDTensorArray>()) {
-     // TODO(@mozga-intel) Add MKLDNN LoDTensorArray support
-     auto& out_array = *out_var->GetMutable<framework::LoDTensorArray>();
-     for (size_t i = in_place ? 1 : 0; i < in_vars.size(); ++i) {
-       PADDLE_ENFORCE(in_vars[i]->IsType<framework::LoDTensorArray>(),
-                      "Only support all inputs are TensorArray");
-       auto& in_array = in_vars[i]->Get<framework::LoDTensorArray>();
-
-       for (size_t i = 0; i < in_array.size(); ++i) {
-         if (in_array[i].numel() != 0) {
-           if (i >= out_array.size()) {
-             out_array.resize(i + 1);
-           }
-           if (out_array[i].numel() == 0) {
-             framework::TensorCopy(in_array[i], in_array[i].place(),
-                                   ctx.device_context(), &out_array[i]);
-             out_array[i].set_lod(in_array[i].lod());
-           } else {
-             PADDLE_ENFORCE(out_array[i].lod() == in_array[i].lod());
-             auto in = EigenVector<T>::Flatten(in_array[i]);
-             auto result = EigenVector<T>::Flatten(out_array[i]);
-             result.device(*ctx.template device_context<MKLDNNDeviceContext>()
-                                .eigen_device()) = result + in;
-           }
-         }
-       }
-     }
-   } else {
-     PADDLE_THROW("Unexpected branch, output variable type is %s",
-                  framework::ToTypeName(out_var->Type()));
+   } else {  // Fallback to naive version
+     // TODO(@mozga-intel) Add MKLDNN SelectedRows & LoDTensorArray support
+     SumKernel<CPUDeviceContext, T> reference_kernel;
+     reference_kernel.Compute(ctx);
    }
  }
};
paddle/fluid/operators/mkldnn/transpose_mkldnn_op.cc  (View file @ 07efdb51)

@@ -52,7 +52,7 @@ class TransposeMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
                                             mkldnn_engine, key);

    auto transpose_src_memory_p = handler.AcquireSrcMemory(
-       input->format(), platform::to_void_cast<T>(input_data));
+       input->get_mkldnn_prim_desc(), platform::to_void_cast<T>(input_data));
    auto transpose_dst_memory_p =
        handler.AcquireDstMemory(output, ctx.GetPlace());
    auto transpose_p = handler.AcquireTranspose(transpose_dst_memory_p,

@@ -61,6 +61,15 @@ class TransposeMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
    std::vector<mkldnn::primitive> pipeline;
    pipeline.push_back(*transpose_p);
    mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait();
+
+   // Transpose did change logical dimensions of Tensor, but reorder does not.
+   // Reorder does change only physical layout eg. format , strides
+   // so we need to create new primitive descriptor with changed logical layout
+   // so it match output shape
+   auto output_mem_pd = paddle::platform::create_prim_desc_from_dims(
+       paddle::framework::vectorize2int(output->dims()),
+       mkldnn::memory::format::blocked);
+   output->set_mkldnn_prim_desc(output_mem_pd);
  }
};

@@ -102,8 +111,9 @@ class TransposeMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
    platform::TransposeMKLDNNHandler handler(nchw_tz, reversed_axis, dev_ctx,
                                             mkldnn_engine, key);

-   auto transpose_src_memory_p = handler.AcquireSrcMemory(
-       out_grad->format(), platform::to_void_cast<T>(out_grad_data));
+   auto transpose_src_memory_p = handler.AcquireSrcMemory(
+       out_grad->get_mkldnn_prim_desc(),
+       platform::to_void_cast<T>(out_grad_data));
    auto transpose_dst_memory_p =
        handler.AcquireDstMemory(x_grad, ctx.GetPlace());
    auto transpose_p = handler.AcquireTranspose(transpose_dst_memory_p,

@@ -112,6 +122,15 @@ class TransposeMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
    std::vector<mkldnn::primitive> pipeline;
    pipeline.push_back(*transpose_p);
    mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait();
+
+   // Transpose did change logical dimensions of Tensor, but reorder does not.
+   // Reorder does change only physical layout eg. format , strides
+   // so we need to create new primitive descriptor with changed logical layout
+   // so it match output shape
+   auto x_grad_mem_pd = paddle::platform::create_prim_desc_from_dims(
+       paddle::framework::vectorize2int(x_grad->dims()),
+       mkldnn::memory::format::blocked);
+   x_grad->set_mkldnn_prim_desc(x_grad_mem_pd);
  }
};
paddle/fluid/operators/pool_op.cc  (View file @ 07efdb51)

@@ -168,9 +168,10 @@ void Pool2dOpMaker::Make() {
                          "be ignored.");  // TODO(Chengduo): Add checker.
                                           // (Currently,
  // TypedAttrChecker don't support vector type.)
- AddAttr<bool>("global_pooling",
-               "(bool, default false) Whether to use the global pooling. "
-               "If global_pooling = true, ksize and paddings will be ignored.")
+ AddAttr<bool>(
+     "global_pooling",
+     "(bool, default false) Whether to use the global pooling. "
+     "If global_pooling = true, kernel size and paddings will be ignored.")
      .SetDefault(false);
  AddAttr<std::vector<int>>("strides",
                            "(vector<int>, default {1, 1}), strides(height, "

@@ -182,7 +183,7 @@ void Pool2dOpMaker::Make() {
      "paddings",
      "(vector<int>, default {0,0}), paddings(height, width) of pooling "
      "operator."
-     "If global_pooling = true, paddings and ksize will be ignored.")
+     "If global_pooling = true, paddings and kernel size will be ignored.")
      .SetDefault({0, 0});
  AddAttr<bool>("exclusive",

@@ -204,7 +205,7 @@ void Pool2dOpMaker::Make() {
      .SetDefault(false);
  AddAttr<bool>(
      "ceil_mode",
-     "(bool, default false) Wether to use the ceil function to calculate "
+     "(bool, default false) Whether to use the ceil function to calculate "
      "output height and width. False is the default. If it is set to False, "
      "the floor function will be used.")
      .SetDefault(false);

@@ -333,7 +334,7 @@ void Pool3dOpMaker::Make() {
  AddAttr<bool>(
      "global_pooling",
      "(bool, default false) Whether to use the global pooling. "
-     "If global_pooling = true, ksize and paddings wille be ignored.")
+     "If global_pooling = true, kernel size and paddings will be ignored.")
      .SetDefault(false);
  AddAttr<std::vector<int>>("strides",

@@ -368,7 +369,7 @@ void Pool3dOpMaker::Make() {
      .SetDefault(false);
  AddAttr<bool>(
      "ceil_mode",
-     "(bool, default false) Wether to use the ceil function to calculate "
+     "(bool, default false) Whether to use the ceil function to calculate "
      "output height and width. False is the default. If it is set to False, "
      "the floor function will be used.")
      .SetDefault(false);
paddle/fluid/operators/sequence_ops/sequence_expand_op.cu  (View file @ 07efdb51)

@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */
#include <algorithm>
+#include "paddle/fluid/memory/memcpy.h"
#include "paddle/fluid/operators/sequence_ops/sequence_expand_op.h"
#include "paddle/fluid/platform/cuda_primitives.h"

@@ -88,6 +89,49 @@ void GetOutputOffset(const framework::Vector<size_t>& x_lod,
  }
}

template <typename T>
static int ExpandByMemoryCopy(const platform::CUDADeviceContext& context,
                              const LoDTensor& x, LoDTensor* out,
                              const framework::Vector<size_t>& x_lod,
                              const framework::Vector<size_t>& ref_lod,
                              bool do_copy) {
  auto out_data = out->data<T>();
  auto x_data = x.data<T>();

  auto& gpu_place = boost::get<platform::CUDAPlace>(context.GetPlace());

  int x_item_length = x.numel() / x.dims()[0];
  int out_offset = 0;
  int num_copys = 0;
  for (size_t i = 1; i < ref_lod.size(); ++i) {
    int repeat_num = ref_lod[i] - ref_lod[i - 1];
    int x_start = x_lod[i - 1];
    int x_end = x_lod[i];
    int x_seq_len = x_end - x_start;
    if (repeat_num > 0) {
      if (do_copy) {
        int out_start = out_offset;
        if (out->lod().size() == 1) {
          out_start = out->lod()[0][out_offset];
        }
        for (int j = 0; j < repeat_num; j++) {
          for (int k = 0; k < x_seq_len; k++) {
            memory::Copy(
                gpu_place,
                out_data + (out_start + j * x_seq_len + k) * x_item_length,
                gpu_place, x_data + (x_start + k) * x_item_length,
                sizeof(T) * x_item_length, context.stream());
          }
        }
      } else {
        num_copys += repeat_num * x_seq_len;
      }
    }
    out_offset += repeat_num;
  }
  return num_copys;
}

template <typename T>
struct SequenceExpandFunctor<platform::CUDADeviceContext, T> {
  void operator()(

@@ -95,22 +139,40 @@ struct SequenceExpandFunctor<platform::CUDADeviceContext, T> {
      const framework::Vector<size_t>& x_lod,   /*expand source lod*/
      const framework::Vector<size_t>& ref_lod, /*expand referenced lod*/
      LoDTensor* out) {
-   int x_item_length = x.numel() / x.dims()[0];
-   framework::Vector<size_t> out_offset(x_lod.size());
-   GetOutputOffset(x_lod, ref_lod, &out_offset);
-
-   int thread_x = std::min(32, std::max(static_cast<int>(ref_lod.size()), 16));
-   int thread_y = 16;
-   int thread_z = 1024 / thread_x / thread_y;
-   int block_x = static_cast<int>(ref_lod.size());
-   dim3 block_size(thread_x, thread_y, thread_z);
-   dim3 grid_size(block_x, 1);
-   sequence_expand_kernel<<<grid_size, block_size, 0, context.stream()>>>(
-       x.data<T>(), x_lod.CUDAData(context.GetPlace()),
-       ref_lod.CUDAData(context.GetPlace()),
-       out_offset.CUDAData(context.GetPlace()), x_lod.size(), x_item_length,
-       out->mutable_data<T>(context.GetPlace()));
+   int num_copys =
+       ExpandByMemoryCopy<T>(context, x, out, x_lod, ref_lod, false);
+   // Sometimes direct copies will be faster, this maybe need deeply analysis.
+   if (num_copys < 5) {
+     ExpandByMemoryCopy<T>(context, x, out, x_lod, ref_lod, true);
+   } else {
+     int x_item_length = x.numel() / x.dims()[0];
+     size_t x_lod_size = x_lod.size();
+     framework::Vector<size_t> out_offset(x_lod_size * 2 + ref_lod.size());
+     GetOutputOffset(x_lod, ref_lod, &out_offset);
+
+     for (size_t i = 0; i < x_lod_size; ++i) {
+       out_offset[x_lod_size + i] = x_lod[i];
+     }
+     for (size_t i = 0; i < ref_lod.size(); ++i) {
+       out_offset[2 * x_lod_size + i] = ref_lod[i];
+     }
+
+     const size_t* out_offset_data = out_offset.CUDAData(context.GetPlace());
+     const size_t* x_lod_data = out_offset_data + x_lod_size;
+     const size_t* ref_lod_data = out_offset_data + 2 * x_lod_size;
+
+     int thread_x =
+         std::min(32, std::max(static_cast<int>(ref_lod.size()), 16));
+     int thread_y = 16;
+     int thread_z = 1024 / thread_x / thread_y;
+     int block_x = static_cast<int>(ref_lod.size());
+     dim3 block_size(thread_x, thread_y, thread_z);
+     dim3 grid_size(block_x, 1);
+     sequence_expand_kernel<<<grid_size, block_size, 0, context.stream()>>>(
+         x.data<T>(), x_lod_data, ref_lod_data, out_offset_data, x_lod_size,
+         x_item_length, out->mutable_data<T>(context.GetPlace()));
+   }
  }
};
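The functor above first counts how many row copies the expansion would need (ExpandByMemoryCopy with do_copy = false) and only launches the CUDA kernel when that count reaches 5; for fewer copies it issues the device-to-device memcpys directly. A rough Python sketch of the same counting and decision, with made-up lod values purely for illustration:

def count_copies(x_lod, ref_lod):
    # One copy per repeated row of every source sequence.
    total = 0
    for i in range(1, len(ref_lod)):
        repeat_num = ref_lod[i] - ref_lod[i - 1]
        x_seq_len = x_lod[i] - x_lod[i - 1]
        if repeat_num > 0:
            total += repeat_num * x_seq_len
    return total

x_lod = [0, 2, 5]     # two source sequences of length 2 and 3 (hypothetical)
ref_lod = [0, 1, 3]   # repeat counts 1 and 2 (hypothetical)
num_copys = count_copies(x_lod, ref_lod)
use_direct_memcpy = num_copys < 5   # same threshold as the CUDA functor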
paddle/fluid/platform/CMakeLists.txt  (View file @ 07efdb51)

@@ -87,11 +87,11 @@ nv_test(transform_test SRCS transform_test.cu DEPS memory place device_context)
cc_library(timer SRCS timer.cc)
cc_test(timer_test SRCS timer_test.cc DEPS timer)

-cc_library(device_tracer SRCS device_tracer.cc DEPS boost profiler_proto framework_proto device_context ${GPU_CTX_DEPS})
+cc_library(device_tracer SRCS device_tracer.cc DEPS boost profiler_proto framework_proto ${GPU_CTX_DEPS})
if(WITH_GPU)
-  nv_library(profiler SRCS profiler.cc profiler.cu DEPS device_context device_tracer)
+  nv_library(profiler SRCS profiler.cc profiler.cu DEPS device_tracer gpu_info enforce)
else()
-  cc_library(profiler SRCS profiler.cc DEPS device_context device_tracer)
+  cc_library(profiler SRCS profiler.cc DEPS device_tracer enforce)
endif()
cc_test(profiler_test SRCS profiler_test.cc DEPS profiler)
paddle/fluid/platform/device_context.cc  (View file @ 07efdb51)

@@ -394,7 +394,7 @@ void MKLDNNDeviceContext::SetBlob(const std::string& name,
  int tid = platform::get_cur_thread_id();

- std::lock_guard<std::mutex> lock(*p_mutex_.get());
+ std::lock_guard<std::mutex> lock(*p_mutex_);

  // Find KeyBlob for current thread
  auto map_it = pMap->find(tid);

@@ -427,7 +427,7 @@ std::shared_ptr<void> MKLDNNDeviceContext::GetBlob(
  int tid = platform::get_cur_thread_id();

- std::lock_guard<std::mutex> lock(*p_mutex_.get());
+ std::lock_guard<std::mutex> lock(*p_mutex_);

  // Find KeyBlob for current thread firstly
  auto map_it = pMap->find(tid);
paddle/fluid/platform/device_tracer.cc  (View file @ 07efdb51)

@@ -136,7 +136,7 @@ void EnableActivity() {
  CUPTI_CALL(dynload::cuptiActivityEnable(CUPTI_ACTIVITY_KIND_DRIVER));
  CUPTI_CALL(dynload::cuptiActivityEnable(CUPTI_ACTIVITY_KIND_RUNTIME));
  // We don't track these activities for now.
- // CUPTI_CALL(dynload::cuptiActivityEnable(CUPTI_ACTIVITY_KIND_MEMSET));
+ CUPTI_CALL(dynload::cuptiActivityEnable(CUPTI_ACTIVITY_KIND_MEMSET));
  // CUPTI_CALL(dynload::cuptiActivityEnable(CUPTI_ACTIVITY_KIND_OVERHEAD));
  // CUPTI_CALL(dynload::cuptiActivityEnable(CUPTI_ACTIVITY_KIND_DEVICE));
  // CUPTI_CALL(dynload::cuptiActivityEnable(CUPTI_ACTIVITY_KIND_CONTEXT));

@@ -155,7 +155,7 @@ void DisableActivity() {
  // CUPTI_CALL(dynload::cuptiActivityDisable(CUPTI_ACTIVITY_KIND_CONTEXT));
  CUPTI_CALL(dynload::cuptiActivityDisable(CUPTI_ACTIVITY_KIND_DRIVER));
  CUPTI_CALL(dynload::cuptiActivityDisable(CUPTI_ACTIVITY_KIND_RUNTIME));
- // CUPTI_CALL(dynload::cuptiActivityDisable(CUPTI_ACTIVITY_KIND_MEMSET));
+ CUPTI_CALL(dynload::cuptiActivityDisable(CUPTI_ACTIVITY_KIND_MEMSET));
  // CUPTI_CALL(dynload::cuptiActivityDisable(CUPTI_ACTIVITY_KIND_NAME));
  // CUPTI_CALL(dynload::cuptiActivityDisable(CUPTI_ACTIVITY_KIND_MARKER));
  // CUPTI_CALL(dynload::cuptiActivityDisable(CUPTI_ACTIVITY_KIND_OVERHEAD));

@@ -212,6 +212,14 @@ void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId, uint8_t *buffer,
                               memcpy->correlationId, memcpy->bytes);
      break;
    }
    case CUPTI_ACTIVITY_KIND_MEMSET: {
      auto *memset = reinterpret_cast<const CUpti_ActivityMemset *>(record);
      tracer->AddKernelRecords("MEMSET", memset->start, memset->end,
                               memset->deviceId, memset->streamId,
                               memset->correlationId);
      break;
    }
    case CUPTI_ACTIVITY_KIND_DRIVER: {
      auto *api = reinterpret_cast<const CUpti_ActivityAPI *>(record);
      if (api->start != 0 && api->end != 0)

@@ -348,6 +356,8 @@ class DeviceTracerImpl : public DeviceTracer {
    const std::vector<int> cbids {
        CUPTI_RUNTIME_TRACE_CBID_cudaMemcpy_v3020,
        CUPTI_RUNTIME_TRACE_CBID_cudaMemcpyAsync_v3020,
        CUPTI_RUNTIME_TRACE_CBID_cudaMemset_v3020,
        CUPTI_RUNTIME_TRACE_CBID_cudaMemsetAsync_v3020,
        CUPTI_RUNTIME_TRACE_CBID_cudaLaunch_v3020,
        CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000
#if CUDA_VERSION >= 9000
paddle/fluid/platform/device_tracer.h  (View file @ 07efdb51)

@@ -17,6 +17,7 @@ limitations under the License. */
#include <string>
#include "paddle/fluid/platform/dynload/cupti.h"
+#include "paddle/fluid/platform/event.h"
#include "paddle/fluid/platform/port.h"
#include "paddle/fluid/platform/profiler.pb.h"

@@ -32,8 +33,6 @@ inline uint64_t PosixInNsec() {
  return 1000 * (static_cast<uint64_t>(tv.tv_sec) * 1000000 + tv.tv_usec);
}

class Event;

// DeviceTracer performs the following tasks:
// 1. Register cuda callbacks for various events: kernel, memcpy, etc.
// 2. Collect cuda statistics: start/end ts, memory, etc.
paddle/fluid/platform/enforce.h  (View file @ 07efdb51)

@@ -34,6 +34,7 @@ limitations under the License. */
#include <type_traits>
#include <utility>

#define GLOG_NO_ABBREVIATED_SEVERITIES  // msvc conflict logging with windows.h
#include "glog/logging.h"
#include "paddle/fluid/platform/macros.h"
#include "paddle/fluid/platform/port.h"
paddle/fluid/platform/event.h  (new file, 0 → 100644, View file @ 07efdb51)

/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.

licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <string>
#ifdef PADDLE_WITH_CUDA
#include <cuda_runtime.h>
#endif

namespace paddle {
namespace platform {

enum EventType { kMark, kPushRange, kPopRange };

class Event {
 public:
  // The DeviceContext is used to get the cuda stream.
  // If CPU profiling mode, can pass nullptr.
  Event(EventType type, std::string name, uint32_t thread_id);

  const EventType& type() const;
  std::string name() const { return name_; }
  uint32_t thread_id() const { return thread_id_; }

#ifdef PADDLE_WITH_CUDA
#ifndef PADDLE_WITH_CUPTI
  cudaEvent_t event() const { return event_; }
  int device() const { return device_; }
#endif
#endif

  double CpuElapsedMs(const Event& e) const;
  double CudaElapsedMs(const Event& e) const;

 private:
  EventType type_;
  std::string name_;
  uint32_t thread_id_;
  int64_t cpu_ns_;
#ifdef PADDLE_WITH_CUDA
#ifdef PADDLE_WITH_CUPTI
  int64_t gpu_ns_ = 0;

 public:
  void AddCudaElapsedTime(int64_t start_ns, int64_t end_ns) {
    gpu_ns_ += end_ns - start_ns;
  }

 private:
#else
  cudaEvent_t event_ = nullptr;
  int device_ = -1;
#endif
#endif
};

}  // namespace platform
}  // namespace paddle
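The Event class above records a CPU timestamp (cpu_ns_) at construction and exposes CpuElapsedMs to measure the gap between two events. A simplified Python sketch of that usage pattern, under the assumption that CpuElapsedMs is simply the timestamp difference in milliseconds (the class name and fields here are stand-ins, not the real API):

import time

class EventSketch:
    """Stand-in for platform::Event, CPU path only (assumed semantics)."""
    def __init__(self, name):
        self.name = name
        self.cpu_ns = time.time_ns()   # analogous to cpu_ns_

    def cpu_elapsed_ms(self, other):
        # Assumed to mirror CpuElapsedMs: difference of the two timestamps in ms.
        return (other.cpu_ns - self.cpu_ns) / 1e6

start = EventSketch("push")
end = EventSketch("pop")
print(start.cpu_elapsed_ms(end))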
paddle/fluid/platform/mkldnn_reuse.h  (View file @ 07efdb51)

@@ -39,6 +39,45 @@ class MKLDNNHandler {
    return this->AcquireMemory(md, ptr, "@user_src_mem_p");
  }

  // TODO(jczaja): extract common part and make AcquireMemory
  std::shared_ptr<mkldnn::memory> AcquireSrcMemory(
      const mkldnn::memory::primitive_desc& mpd, void* ptr) {
    auto local_key = key_ + "@user_src_mem_p";
    auto mem_p =
        std::static_pointer_cast<mkldnn::memory>(dev_ctx_.GetBlob(local_key));
    PADDLE_ENFORCE((mem_p != nullptr) || (is_reusing_ == false),
                   " find mem primitive in device context");
    if (mem_p == nullptr) {
      mem_p = std::make_shared<mkldnn::memory>(mpd, ptr);
      dev_ctx_.SetBlob(local_key, mem_p);
    } else {
      mem_p->set_data_handle(ptr);
      // Mark that reusing happenned. All primitives from operator instance
      // should be reused or none of them. So we check consistency
      is_reusing_ = true;
    }
    return mem_p;
  }

  std::shared_ptr<mkldnn::memory> AcquireWeightsMemory(
      const mkldnn::memory::primitive_desc& mpd, void* ptr) {
    auto local_key = key_ + "@user_weights_mem_p";
    auto mem_p =
        std::static_pointer_cast<mkldnn::memory>(dev_ctx_.GetBlob(local_key));
    PADDLE_ENFORCE((mem_p != nullptr) || (is_reusing_ == false),
                   " find mem primitive in device context");
    if (mem_p == nullptr) {
      mem_p = std::make_shared<mkldnn::memory>(mpd, ptr);
      dev_ctx_.SetBlob(local_key, mem_p);
    } else {
      mem_p->set_data_handle(ptr);
      // Mark that reusing happenned. All primitives from operator instance
      // should be reused or none of them. So we check consistency
      is_reusing_ = true;
    }
    return mem_p;
  }

  std::shared_ptr<mkldnn::memory> AcquireWeightsMemory(
      const mkldnn::memory::desc& md, void* ptr,
      user_function custom_func = {}) {

@@ -273,37 +312,7 @@ class TransposeMKLDNNHandler : public MKLDNNHandler {
                         mkldnn::engine engine, const std::string& base_key)
      : platform::MKLDNNHandler(dev_ctx, engine, base_key),
        dims_(dims),
-       axis_(axis),
-       logical_axis_(dims.size(), 0) {}
-
-  std::shared_ptr<mkldnn::memory> AcquireSrcMemory(
-      const mkldnn::memory::format& fmt, void* ptr) {
-    auto local_key = key_ + "@user_src_mem_p";
-    auto mem_p =
-        std::static_pointer_cast<mkldnn::memory>(dev_ctx_.GetBlob(local_key));
-    PADDLE_ENFORCE((mem_p != nullptr) || (is_reusing_ == false),
-                   " find mem primitive in device context");
-    if (mem_p == nullptr) {
-      // Make memory descriptor using input format, unless it
-      // cannot be trusted (nchw) then make up memory fmt manually
-      for (size_t i = 0; i < logical_axis_.size(); ++i) {
-        logical_axis_[i] = i;
-      }
-      auto src_md = fmt != mkldnn::memory::format::nchw
-                        ? platform::MKLDNNMemDesc(
-                              dims_, platform::MKLDNNGetDataType<float>(), fmt)
-                        : Axis2MemoryDesc(dims_, logical_axis_);
-      mem_p = std::make_shared<mkldnn::memory>(
-          mkldnn::memory::primitive_desc{src_md, engine_}, ptr);
-      dev_ctx_.SetBlob(local_key, mem_p);
-    } else {
-      mem_p->set_data_handle(ptr);
-      // Mark that reusing happenned. All primitives from operator instance
-      // should be reused or none of them. So we check consistency
-      is_reusing_ = true;
-    }
-    return mem_p;
-  }
+       axis_(axis) {}

  std::shared_ptr<mkldnn::memory> AcquireDstMemory(framework::Tensor* output,
                                                   platform::Place place) {

@@ -388,7 +397,6 @@ class TransposeMKLDNNHandler : public MKLDNNHandler {
 private:
  std::vector<int> dims_;
  std::vector<int> axis_;
- std::vector<int> logical_axis_;
};

template <class forward_t, class backward_data_t, class backward_weights_t>

@@ -548,9 +556,8 @@ class ConvMKLDNNTemplateHandler : public MKLDNNHandler {
    PADDLE_ENFORCE((conv_p != nullptr) || (is_reusing_ == false),
                   "Fail to find convolution primitive in device context");
    if (conv_p == nullptr) {
-     conv_p = std::make_shared<forward_t>(*conv_pd_, *(src_memory_p),
-                                          *(weights_memory_p.get()),
-                                          *(dst_memory_p.get()));
+     conv_p = std::make_shared<forward_t>(*conv_pd_, *src_memory_p,
+                                          *weights_memory_p, *dst_memory_p);

      dev_ctx_.SetBlob(prim_key, conv_p);
    } else {

@@ -570,9 +577,9 @@ class ConvMKLDNNTemplateHandler : public MKLDNNHandler {
    PADDLE_ENFORCE((conv_p != nullptr) || (is_reusing_ == false),
                   "Fail to find convolution primitive in device context");
    if (conv_p == nullptr) {
-     conv_p = std::make_shared<forward_t>(
-         *conv_pd_, *(src_memory_p), *(weights_memory_p.get()),
-         *(bias_memory_p.get()), *(dst_memory_p.get()));
+     conv_p = std::make_shared<forward_t>(*conv_pd_, *src_memory_p,
+                                          *weights_memory_p, *bias_memory_p,
+                                          *dst_memory_p);

      dev_ctx_.SetBlob(prim_key, conv_p);
    } else {
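All the Acquire* helpers above follow the same caching discipline: look up a blob in the device context under key_ + suffix, create the mkldnn memory on a miss, and on a hit only swap the data handle while recording that reuse happened. A schematic Python sketch of that pattern (the dict stands in for the device-context blob map; class and method names here are illustrative, not the real API):

class HandlerSketch:
    def __init__(self, key):
        self.key = key
        self.cache = {}          # stands in for dev_ctx_ blobs
        self.is_reusing = False

    def acquire(self, suffix, make_mem, ptr):
        local_key = self.key + suffix
        mem = self.cache.get(local_key)
        if mem is None:
            mem = make_mem(ptr)          # e.g. mkldnn::memory(mpd, ptr)
            self.cache[local_key] = mem
        else:
            mem["data_handle"] = ptr     # reuse primitive, only swap data pointer
            self.is_reusing = True       # all-or-nothing reuse per operator instance
        return mem

h = HandlerSketch("conv2d_key")
h.acquire("@user_src_mem_p", lambda p: {"data_handle": p}, ptr=0x1000)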
paddle/fluid/platform/mkldnn_utils.h  (new file, 0 → 100644, View file @ 07efdb51)

/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <mkldnn.h>
#include <string>

namespace paddle {
namespace platform {

inline mkldnn::memory::primitive_desc create_prim_desc_from_dims(
    const std::vector<int>& ltz, mkldnn::memory::format fmt,
    mkldnn::memory::data_type data_type = mkldnn::memory::data_type::f32) {
  mkldnn_memory_desc_t mem_fmt;

  mem_fmt.primitive_kind = mkldnn_memory;
  mem_fmt.ndims = ltz.size();
  for (unsigned int i = 0; i < ltz.size(); ++i) {
    mem_fmt.dims[i] = ltz[i];  // logical dimensions (nchw format,
                               // regardless physical layout)
  }
  mem_fmt.data_type = static_cast<mkldnn_data_type_t>(data_type);
  mem_fmt.format = static_cast<mkldnn_memory_format_t>(fmt);

  unsigned int total_stride = 1;
  for (int i = ltz.size() - 1; i >= 0; --i) {
    mem_fmt.layout_desc.blocking.padding_dims[i] =
        ltz[i];  // logical dimensions (nchw format, regardless physical
                 // layout)
    mem_fmt.layout_desc.blocking.block_dims[i] = 1;
    mem_fmt.layout_desc.blocking.offset_padding_to_data[i] = 0;  // no offset
    mem_fmt.layout_desc.blocking.strides[0][i] = total_stride;
    mem_fmt.layout_desc.blocking.strides[1][i] = 1;
    total_stride *= ltz[i];
  }
  mem_fmt.layout_desc.blocking.offset_padding = 0;  // no initial offset

  auto& pool = platform::DeviceContextPool::Instance();
  auto place = paddle::platform::CPUPlace();
  auto* dev_ctx = dynamic_cast<platform::MKLDNNDeviceContext*>(pool.Get(place));
  auto& cpu_engine = dev_ctx->GetEngine();
  return mkldnn::memory::primitive_desc(mem_fmt, cpu_engine);
}

inline mkldnn::memory::primitive_desc create_prim_desc_from_format(
    const std::vector<int>& ltz, const mkldnn::memory::format format,
    const mkldnn::memory::data_type data_type) {
  auto md = mkldnn::memory::desc({ltz}, data_type, format);
  auto& pool = platform::DeviceContextPool::Instance();
  auto place = paddle::platform::CPUPlace();
  auto dev_ctx = dynamic_cast<platform::MKLDNNDeviceContext*>(pool.Get(place));
  PADDLE_ENFORCE_NOT_NULL(dev_ctx, "Could not get valid device");
  auto& cpu_engine = dev_ctx->GetEngine();
  return mkldnn::memory::primitive_desc(md, cpu_engine);
}

}  // namespace platform
}  // namespace paddle
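The stride loop in create_prim_desc_from_dims builds plain row-major (nchw-style) strides from the logical dimensions, with no blocking and no padding offsets. The same computation in a few lines of Python, for illustration:

def row_major_strides(dims):
    # Mirrors the backwards loop above: the innermost dimension has stride 1.
    strides = [0] * len(dims)
    total_stride = 1
    for i in range(len(dims) - 1, -1, -1):
        strides[i] = total_stride
        total_stride *= dims[i]
    return strides

print(row_major_strides([2, 3, 4, 5]))  # [60, 20, 5, 1]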
paddle/fluid/platform/profiler.cu  (View file @ 07efdb51)

@@ -12,9 +12,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

-#include "paddle/fluid/platform/profiler.h"
-
#include <cuda.h>
+#include "paddle/fluid/platform/profiler.h"

namespace paddle {
namespace platform {

@@ -22,26 +21,27 @@ namespace platform {
__global__ void DummyKernel(int *a) { a[0] = 0; }

static void ForEachDevice(std::function<void(int)> func) {
- auto original_device = GetCurrentDeviceId();
- int count = GetCUDADeviceCount();
+ auto original_device = platform::GetCurrentDeviceId();
+ int count = platform::GetCUDADeviceCount();
  for (int i = 0; i < count; i++) {
-   SetDeviceId(i);
+   platform::SetDeviceId(i);
    func(i);
  }
- SetDeviceId(original_device);
+ platform::SetDeviceId(original_device);
}

void DummyKernelAndEvent() {
  for (int i = 0; i < 5; i++) {
    ForEachDevice([](int d) {
-     CUDADeviceContext *dev_ctx = new CUDADeviceContext(CUDAPlace(d));
+     platform::SetDeviceId(d);
+     cudaStream_t stream;
+     PADDLE_ENFORCE(cudaStreamCreate(&stream));
      Mark("_cuda_startup_");
      int *ptr;
      PADDLE_ENFORCE(cudaMalloc(&ptr, sizeof(int)));
-     DummyKernel<<<1, 1, 0, dev_ctx->stream()>>>(ptr);
-     dev_ctx->Wait();
+     DummyKernel<<<1, 1, 0, stream>>>(ptr);
+     PADDLE_ENFORCE(cudaStreamSynchronize(stream));
      PADDLE_ENFORCE(cudaFree(ptr));
-     delete dev_ctx;
    });
  }
}
paddle/fluid/platform/profiler.h  (View file @ 07efdb51)

@@ -17,54 +17,13 @@ limitations under the License. */
#include <list>
#include <string>
#include <vector>
-#include "paddle/fluid/platform/device_context.h"
+#include "paddle/fluid/platform/enforce.h"
+#include "paddle/fluid/platform/event.h"
+#ifdef PADDLE_WITH_CUDA
+#include "paddle/fluid/platform/gpu_info.h"
+#endif

-namespace paddle {
-namespace platform {
-
-enum EventType { kMark, kPushRange, kPopRange };
-
-class Event {
- public:
-  // The DeviceContext is used to get the cuda stream.
-  // If CPU profiling mode, can pass nullptr.
-  Event(EventType type, std::string name, uint32_t thread_id);
-
-  const EventType& type() const;
-  std::string name() const { return name_; }
-  uint32_t thread_id() const { return thread_id_; }
-
-#ifdef PADDLE_WITH_CUDA
-#ifndef PADDLE_WITH_CUPTI
-  cudaEvent_t event() const { return event_; }
-  int device() const { return device_; }
-#endif
-#endif
-
-  double CpuElapsedMs(const Event& e) const;
-  double CudaElapsedMs(const Event& e) const;
-
- private:
-  EventType type_;
-  std::string name_;
-  uint32_t thread_id_;
-  int64_t cpu_ns_;
-#ifdef PADDLE_WITH_CUDA
-#ifdef PADDLE_WITH_CUPTI
-  int64_t gpu_ns_ = 0;
-
- public:
-  void AddCudaElapsedTime(int64_t start_ns, int64_t end_ns) {
-    gpu_ns_ += end_ns - start_ns;
-  }
-
- private:
-#else
-  cudaEvent_t event_ = nullptr;
-  int device_ = -1;
-#endif
-#endif
-};
-
namespace paddle {
namespace platform {

enum ProfilerState {
  kDisabled,  // disabled state
paddle/fluid/platform/profiler_test.cc  (View file @ 07efdb51)

@@ -33,7 +33,6 @@ TEST(Event, CpuElapsedTime) {
}

TEST(RecordEvent, RecordEvent) {
- using paddle::platform::DeviceContext;
  using paddle::platform::Event;
  using paddle::platform::EventType;
  using paddle::platform::RecordEvent;
paddle/fluid/platform/temporary_allocator_test.cc  (View file @ 07efdb51)

@@ -141,7 +141,7 @@ TEST(temporary_allocator, create_tensor_with_allocationptr) {
    platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
    auto* dev_ctx =
        static_cast<platform::CPUDeviceContext*>(pool.Get(cpu_place));
-   framework::ExecutionContext ctx(op, scope, *dev_ctx, run_ctx);
+   framework::ExecutionContext ctx(op, scope, *dev_ctx, run_ctx, nullptr);

    int numel = memory_size / sizeof(float);
    framework::Tensor tensor =

@@ -156,7 +156,7 @@ TEST(temporary_allocator, create_tensor_with_allocationptr) {
    platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
    auto* dev_ctx =
        static_cast<platform::CUDADeviceContext*>(pool.Get(gpu_place));
-   framework::ExecutionContext ctx(op, scope, *dev_ctx, run_ctx);
+   framework::ExecutionContext ctx(op, scope, *dev_ctx, run_ctx, nullptr);

    int numel = memory_size / sizeof(float);
    framework::Tensor tensor =
        ctx.AllocateTmpTensor<float, platform::CUDADeviceContext>(

@@ -179,7 +179,7 @@ TEST(temporary_allocator, create_tensor_with_allocationptr2) {
    platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
    auto* dev_ctx =
        static_cast<platform::CPUDeviceContext*>(pool.Get(cpu_place));
-   framework::ExecutionContext ctx(op, scope, *dev_ctx, run_ctx);
+   framework::ExecutionContext ctx(op, scope, *dev_ctx, run_ctx, nullptr);

    int numel = memory_size / sizeof(float);
    framework::Tensor out_side_tensor;

@@ -200,7 +200,7 @@ TEST(temporary_allocator, create_tensor_with_allocationptr2) {
    platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
    auto* dev_ctx =
        static_cast<platform::CUDADeviceContext*>(pool.Get(gpu_place));
-   framework::ExecutionContext ctx(op, scope, *dev_ctx, run_ctx);
+   framework::ExecutionContext ctx(op, scope, *dev_ctx, run_ctx, nullptr);

    size_t memory_size = 500;
    int numel = memory_size / sizeof(float);
paddle/fluid/pybind/imperative.cc  (View file @ 07efdb51)

@@ -34,8 +34,8 @@ void BindTracer(pybind11::module* m) {
              framework::BlockDesc* block,
              const platform::CPUPlace expected_place,
              const bool stop_gradient = false) {
-            self.Trace(op, inputs, outputs, block, expected_place,
-                       stop_gradient);
+            return self.Trace(op, inputs, outputs, block, expected_place,
+                              stop_gradient);
           })
      .def("trace",
           [](imperative::Tracer& self, imperative::OpBase* op,

@@ -44,8 +44,8 @@ void BindTracer(pybind11::module* m) {
              framework::BlockDesc* block,
              const platform::CUDAPlace expected_place,
              const bool stop_gradient = false) {
-            self.Trace(op, inputs, outputs, block, expected_place,
-                       stop_gradient);
+            return self.Trace(op, inputs, outputs, block, expected_place,
+                              stop_gradient);
           })
      .def("py_trace", &imperative::Tracer::PyTrace,
           pybind11::return_value_policy::take_ownership);
paddle/fluid/pybind/ir.cc  (View file @ 07efdb51)

@@ -101,7 +101,8 @@ void BindGraph(py::module *m) {
           [](Graph &self, Node &node) { return self.RemoveNode(&node); })
      .def("retrieve_node", &Graph::RetrieveNode,
           return_value_policy::reference)
-     .def("resolve_hazard", &Graph::ResolveHazard);
+     .def("resolve_hazard", &Graph::ResolveHazard)
+     .def("origin_program_desc", &Graph::OriginProgram);
}

void BindNode(py::module *m) {
paddle/fluid/pybind/protobuf.cc  (View file @ 07efdb51)

@@ -189,6 +189,8 @@ void BindBlockDesc(pybind11::module *m) {
             return self.HasVar(name);
           },
           pybind11::return_value_policy::reference)
      .def("_clear_block",
           [](pd::BlockDesc &self) { return self.Clear(); },
           pybind11::return_value_policy::reference)
      .def("_rename_var",
           [](pd::BlockDesc &self, const pybind11::bytes &byte_name,
              const pybind11::bytes &byte_name_new) {
paddle/fluid/pybind/pybind.cc  (View file @ 07efdb51)

@@ -976,6 +976,7 @@ All parameter, weight, gradient are variables in Paddle.
       [](ir::PassBuilder &self, size_t idx) { self.RemovePass(idx); });

  // -- python binds for parallel executor.
  py::class_<ParallelExecutor> pe(m, "ParallelExecutor");
  py::class_<ExecutionStrategy> exec_strategy(pe, "ExecutionStrategy", R"DOC(
    ExecutionStrategy allows the user to more preciously control how to run

@@ -1213,9 +1214,9 @@ All parameter, weight, gradient are variables in Paddle.
        cannot be updated after being finalized.)DOC");

  pe.def(py::init<const std::vector<platform::Place> &,
-                 const std::unordered_set<std::string> &, const ProgramDesc &,
-                 const std::string &, Scope *, std::vector<Scope *> &,
-                 const ExecutionStrategy &, const BuildStrategy &>())
+                 const std::unordered_set<std::string> &, const std::string &,
+                 Scope *, std::vector<Scope *> &, const ExecutionStrategy &,
+                 const BuildStrategy &, ir::Graph *>())
      // NOTE: even we return a vec<Scope*>* to Python use reference policy.
      // We still cannot get local_scope from this vector, since the element
      // of vec<Scope*> will be freed by Python GC. We can only return Scope*
paddle/fluid/train/demo/demo_trainer.cc  (View file @ 07efdb51)

@@ -73,7 +73,7 @@ int main() {
  PADDLE_ENFORCE_NE(loss_name, "", "loss not found");

  // init all parameters
- executor.Run(*startup_program.get(), &scope, 0);
+ executor.Run(*startup_program, &scope, 0);

  // prepare data
  auto x_var = scope.Var("x");

@@ -101,7 +101,7 @@ int main() {
  clock_t t1 = clock();
  for (int i = 0; i < 10; ++i) {
-   executor.Run(*train_program.get(), &scope, 0, false, true);
+   executor.Run(*train_program, &scope, 0, false, true);
    std::cout << "step: " << i << " loss: "
              << loss_var->Get<paddle::framework::LoDTensor>().data<float>()[0]
              << std::endl;
paddle/fluid/train/test_train_recognize_digits.cc  (View file @ 07efdb51)

@@ -74,7 +74,7 @@ void Train() {
  float first_loss = 0.0;
  float last_loss = 0.0;
  for (int i = 0; i < 100; ++i) {
-   executor.Run(*train_program.get(), &scope, 0, false, true);
+   executor.Run(*train_program, &scope, 0, false, true);
    if (i == 0) {
      first_loss = loss_var->Get<framework::LoDTensor>().data<float>()[0];
    } else if (i == 99) {
paddle/scripts/paddle_build.sh  (View file @ 07efdb51)

@@ -444,6 +444,7 @@ function assert_api_spec_approvals() {
            "paddle/fluid/framework/ir/node.h"
            "paddle/fluid/framework/ir/graph.h"
            "paddle/fluid/framework/framework.proto"
            "python/paddle/fluid/compiler.py"
            "paddle/fluid/operators/distributed/send_recv.proto.in"
            )
    for API_FILE in ${API_FILES[*]}; do
      API_CHANGE=`git diff --name-only upstream/$BRANCH | grep "${API_FILE}" || true`
python/paddle/fluid/compiler.py  (View file @ 07efdb51)

@@ -17,8 +17,10 @@ import os
import six
import sys

+from .. import compat as cpt
+
+from . import framework
from . import core
-from . import framework

__all__ = ['CompiledProgram', 'ExecutionStrategy', 'BuildStrategy']

@@ -36,7 +38,7 @@ def _place_obj(place):
class CompiledProgram(object):
    """
-   Compiles a Program for execution.
+   Compiles to Graph for execution.

    1. Users first create the program with layers.
    2. Optionally, users use CompiledProgram to optimize the program before run.

@@ -51,7 +53,7 @@ class CompiledProgram(object):
    Example:
        .. code-block:: python

-           place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+           place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
            exe = fluid.Executor(place)
            exe.run(startup)
            compiled_prog = compiler.CompiledProgram(main).with_data_parallel(

@@ -62,11 +64,25 @@ class CompiledProgram(object):
                fetch_list=[loss.name])

    Args:
-       program: Program instance that contains the model logic.
+       program_or_graph (Graph|Program): If it's Program, it will be first
+           lowered to a graph for further optimizations. If it's a graph
+           (potentially optimized before), it will be directly used for
+           further optimizations. Note: graph is only supported when compiled
+           with with_data_parallel option.
    """

-   def __init__(self, program):
-       self._program = program
+   def __init__(self, program_or_graph):
+       if isinstance(program_or_graph, core.Graph):
+           self._graph = program_or_graph
+           self._program = None
+       elif isinstance(program_or_graph, framework.Program):
+           self._graph = core.Graph(program_or_graph.desc)
+           self._program = program_or_graph
+       else:
+           raise ValueError("Wrong program_to_graph type: %s" %
+                            type(program_or_graph))
+
+       self._program_desc = self._graph.origin_program_desc()
        self._scope = None
        self._place = None
        self._executor = None

@@ -101,6 +117,7 @@ class CompiledProgram(object):
            self
        """
        assert not self._is_data_parallel, "Already compiled with parallel."
+       assert not self._is_inference, "Cannot compile both data parallel and inference"
        self._is_data_parallel = True
        self._build_strategy = build_strategy
        self._exec_strategy = exec_strategy

@@ -110,6 +127,8 @@ class CompiledProgram(object):
            self._exec_strategy = ExecutionStrategy()
        if self._build_strategy is None:
            self._build_strategy = BuildStrategy()
+       self._build_strategy.is_distribution = framework.is_pserver_mode(
+           self._program)
        return self

    def with_inference_optimize(self, config):

@@ -120,11 +139,13 @@ class CompiledProgram(object):
        Returns:
            self
        """
+       assert not self._is_data_parallel, "Cannot compile both data parallel and inference"
+       assert not self._is_inference, "Already compiled with inference"
+
        assert any([
            isinstance(config, InferNativeConfig),
            isinstance(config, InferAnalysisConfig)
        ])
        self._is_data_parallel = False
        self._is_inference = True
        self._infer_config = config
        return self

@@ -173,37 +194,41 @@ class CompiledProgram(object):
                os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
            self._exec_strategy.num_threads = cpu_num * 2

-       trainers_endpoints = self._program._trainers_endpoints
-
        # FIXME(dzhwinter): enable_inplace should be after memory_optimize
        # if turn on python memory optimize, turn off the inplace_pass.
        if self._build_strategy.memory_optimize is None:
-           self._build_strategy.memory_optimize = False if self._program._is_mem_optimized else True
+           self._build_strategy.memory_optimize = False if self._program and self._program._is_mem_optimized else True
        if self._build_strategy.enable_inplace is None:
-           self._build_strategy.enable_inplace = False if self._program._is_mem_optimized else True
+           self._build_strategy.enable_inplace = False if self._program and self._program._is_mem_optimized else True

        # TODO(wuyi): trainer endpoings should be passed in through
        # build_strategy, not program.xxx.
-       if self._build_strategy.num_trainers > 1 and trainers_endpoints:
+       if self._program and self._build_strategy.num_trainers > 1 and \
+               self._program._trainers_endpoints:
+           tps = self._program._trainers_endpoints
            assert self._build_strategy.num_trainers == len(
-               trainers_endpoints), "num_trainers == len(end_points)"
-           self._build_strategy.trainers_endpoints = trainers_endpoints
-
-       self._persistable_vars = set([
-           cpt.to_text(v.name)
-           for v in [
-               var for var in self._program.list_vars()
-               if var.persistable and var.type != core.VarDesc.VarType.RAW
-           ]
-       ])
+               tps), "num_trainers == len(end_points)"
+           self._build_strategy.trainers_endpoints = tps
+
+       self._persistable_vars = []
+       for block_id in range(self._program_desc.num_blocks()):
+           bdesc = self._program_desc.block(block_id)
+           self._persistable_vars.extend([
+               cpt.to_text(v.name()) for v in bdesc.all_vars()
+               if v.persistable() and v.type() != core.VarDesc.VarType.RAW
+           ])

        places = list(map(_place_obj, self._places))

        return core.ParallelExecutor(
-           places, self._persistable_vars, self._program.desc,
+           places, set(self._persistable_vars),
            cpt.to_text(self._loss_name)
            if self._loss_name else six.u(''), self._scope, self._local_scopes,
-           self._exec_strategy, self._build_strategy)
+           self._exec_strategy, self._build_strategy, self._graph)

    def _compile_inference(self):
        assert self._is_data_parallel is False
        return core.create_paddle_predictor(self._infer_config)

    def _compile(self, scope, place):
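With this change CompiledProgram accepts either a Program or an already-built Graph, and a graph input is only supported together with with_data_parallel. A hedged usage sketch based on the docstring above (the surrounding network and the use_gpu flag are placeholders, not part of this commit):

import paddle.fluid as fluid
import paddle.fluid.compiler as compiler

# ... build a network that produces `loss` in the default main program ...

use_gpu = False  # hypothetical flag
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# Passing a Program: it is lowered to a Graph internally by CompiledProgram.
compiled_prog = compiler.CompiledProgram(fluid.default_main_program()) \
    .with_data_parallel(loss_name=loss.name)

exe.run(compiled_prog, fetch_list=[loss.name])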
python/paddle/fluid/executor.py  (View file @ 07efdb51)

@@ -538,6 +538,8 @@ class Executor(object):
        else:
            # TODO(panyx0718): Can compile program to optimize executor
            # performance.
+           # TODO(panyx0718): executor should be able to run graph.
+           assert program._program, "CompiledProgram is compiled from graph, can only run with_data_parallel."
            return self._run(
                program._program,
                self._default_executor,
python/paddle/fluid/framework.py  (View file @ 07efdb51)

@@ -87,6 +87,15 @@ def _current_expected_place():
    return _imperative_current_expected_place_


def is_pserver_mode(main_program):
    main = main_program if main_program \
        else default_main_program()
    for op in main.global_block().ops:
        if op.type in ["send", "recv"]:
            return True
    return False


class NameScope(object):
    def __init__(self, name="", parent=None):
        self._children = dict()

@@ -378,16 +387,19 @@ class Variable(object):
            # get_capacity is implemented
            pass

-       self.block.vars[name] = self
-       self.op = None
-       self.stop_gradient = stop_gradient
-       self.is_data = is_data
        if _in_imperative_mode():
+           # record vars in tracer rather than blocks
            self._ivar = kwargs.get("ivar", None)
            if not self._ivar:
-               self._ivar = core.VarBase()
+               self._ivar = core.VarBase(stop_gradient)
            self._ivar.desc = self.desc
-           self._ivar.stop_gradient = stop_gradient
+           if persistable:
+               self.block.vars[name] = self
+       else:
+           self.block.vars[name] = self
+       self.op = None
+       self.stop_gradient = stop_gradient
+       self.is_data = is_data

    def _numpy(self):
        new_ivar = self._ivar._copy_to(core.CPUPlace(), True)

@@ -723,7 +735,6 @@ class Operator(object):
            self._update_desc_attr(attr_name, attr_val)

        self.desc.check_attrs()
        if self._has_kernel(type):
            self.desc.infer_var_type(self.block.desc)
            self.desc.infer_shape(self.block.desc)

@@ -731,6 +742,7 @@ class Operator(object):
        if _in_imperative_mode():
            self.iop = core.OpBase()
            self.iop.desc = self.desc

            self.inputs = defaultdict(list)
            if inputs is not None:
                for k, v in six.iteritems(inputs):

@@ -738,6 +750,7 @@ class Operator(object):
                        self.inputs[k].append(v._ivar)
                    elif isinstance(v, list) or isinstance(v, tuple):
                        self.inputs[k].extend([var._ivar for var in v])

            self.outputs = defaultdict(list)
            if outputs is not None:
                for k, v in six.iteritems(outputs):

@@ -1187,6 +1200,15 @@ class Block(object):
        else:
            raise ValueError("Var {0} is not found recursively".format(name))

    def _clear_block(self):
        # TODO(minqiyang): move this to backward_hooks
        self.desc._clear_block()

        for name in self.vars.keys():
            assert self.vars[name].persistable

        del self.ops[:]

    def all_parameters(self):
        return list(self.iter_parameters())

@@ -1317,18 +1339,31 @@ class Block(object):
            inputs=kwargs.get("inputs", None),
            outputs=kwargs.get("outputs", None),
            attrs=kwargs.get("attrs", None))

+       if _in_imperative_mode():
+           # record ops in tracer rather than blocks
+           #
+           # TODO(minqiyang): add op stop_gradient support in static mode too.
+           # currently, we only support stop_gradient in imperative mode.
+           self._trace_op(op, kwargs.get("stop_gradient", False))
        self.ops.append(op)
-       # TODO(minqiyang): add stop_gradient support in static mode too.
-       # currently, we only support stop_gradient in imperative mode.
-       self._trace_op(op, kwargs.get("stop_gradient", False))
        return op

    def _trace_op(self, op, stop_gradient=False):
-       if _in_imperative_mode():
-           _imperative_tracer().trace(op.iop, op.inputs, op.outputs, self.desc,
-                                      _imperative_current_expected_place_,
-                                      stop_gradient)
+       backward_refs = _imperative_tracer().trace(
+           op.iop, op.inputs, op.outputs, self.desc,
+           _imperative_current_expected_place_, stop_gradient)
+
+       # TODO(minqiyang): support backward_hooks to eager remove backward_refs
+       op.backward_refs = defaultdict(list)
+       for k, v in six.iteritems(op.inputs):
+           if k in backward_refs:
+               op.backward_refs[k] = op.inputs[k]
+
+       for k, v in six.iteritems(op.outputs):
+           if k in backward_refs:
+               op.backward_refs[k] = op.outputs[k]

    def _insert_op(self, index, *args, **kwargs):
        """

@@ -1383,7 +1418,8 @@ class Block(object):
            outputs=kwargs.get("outputs", None),
            attrs=kwargs.get("attrs", None))
        self.ops.insert(0, op)
-       self._trace_op(op, kwargs.get("stop_gradient", False))
+       if _in_imperative_mode():
+           self._trace_op(op, kwargs.get("stop_gradient", False))
        return op

    def _sync_with_cpp(self):
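The new is_pserver_mode helper above decides whether a program belongs to a parameter-server setup simply by scanning its global block for send/recv ops; compiler.py uses it to set BuildStrategy.is_distribution. A small usage sketch (the program construction is a placeholder, not from this commit):

import paddle.fluid as fluid
from paddle.fluid import framework

prog = fluid.Program()          # hypothetical program; a real pserver program
                                # would contain send/recv ops after transpiling
is_distributed = framework.is_pserver_mode(prog)
print(is_distributed)           # True only if the program has send/recv ops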
python/paddle/fluid/layers/nn.py
浏览文件 @
07efdb51
...
...
@@ -2473,7 +2473,7 @@ def pool2d(input,
data = fluid.layers.data(
name='data', shape=[3, 32, 32], dtype='float32')
conv
2d = fluid.layers.pool2d(
pool
2d = fluid.layers.pool2d(
input=data,
pool_size=2,
pool_type='max',
...
...
@@ -2522,6 +2522,7 @@ def pool2d(input,
return
pool_out
@
templatedoc
()
def
pool3d
(
input
,
pool_size
=-
1
,
pool_type
=
"max"
,
...
...
@@ -2533,13 +2534,19 @@ def pool3d(input,
           name=None,
           exclusive=True):
    """
    This function adds the operator for pooling in 3-dimensions, using the
    pooling configurations mentioned in input parameters.
    ${comment}

    Args:
        input (Variable): ${input_comment}
        pool_size (int): ${ksize_comment}
        pool_type (str): ${pooling_type_comment}
        input (Variable): The input tensor of pooling operator. The format of
                          input tensor is NCDHW, where N is batch size, C is
                          the number of channels, D is the depth of the feature,
                          H is the height of the feature, and W is the width
                          of the feature.
        pool_size (int|list|tuple): The pool kernel size. If pool kernel size
                          is a tuple or list, it must contain three integers,
                          (pool_size_Depth, pool_size_Height, pool_size_Width).
                          Otherwise, the pool kernel size will be the cube of an int.
        pool_type (string): ${pooling_type_comment}
        pool_stride (int): stride of the pooling layer.
        pool_padding (int): padding size.
        global_pooling (bool): ${global_pooling_comment}
...
...
@@ -2552,6 +2559,19 @@ def pool3d(input,
    Returns:
        Variable: output of pool3d layer.

    Examples:

        .. code-block:: python

          data = fluid.layers.data(
              name='data', shape=[3, 32, 32, 32], dtype='float32')
          pool3d = fluid.layers.pool3d(
              input=data,
              pool_size=2,
              pool_type='max',
              pool_stride=1,
              global_pooling=False)
    """
    if pool_type not in ["max", "avg"]:
        raise ValueError(
...
...
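A minimal usage sketch of the documented pool3d signature (not part of the diff; the 5-D NCDHW input and the (Depth, Height, Width) tuple kernel follow the Args text above, while the variable names and sizes are illustrative):

    import paddle.fluid as fluid

    # NCDHW input: 3 channels, depth/height/width of 16
    frames = fluid.layers.data(
        name='frames', shape=[3, 16, 16, 16], dtype='float32')
    # tuple kernel = (pool_size_Depth, pool_size_Height, pool_size_Width)
    pooled = fluid.layers.pool3d(
        input=frames, pool_size=(2, 2, 2), pool_type='max', pool_stride=2)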
@@ -5921,6 +5941,8 @@ def sampled_softmax_with_cross_entropy(logits,
    sampled_logits \
        = helper.create_variable_for_type_inference(dtype=logits.dtype)
    sampled_label = helper.create_variable_for_type_inference(dtype='int64')
    sampled_softlabel = helper.create_variable_for_type_inference(
        dtype=logits.dtype)

    helper.append_op(
        type='sample_logits',
...
...
@@ -5945,14 +5967,20 @@ def sampled_softmax_with_cross_entropy(logits,
        })
    loss = helper.create_variable_for_type_inference(dtype=logits.dtype)
    softmax = helper.create_variable_for_type_inference(dtype=logits.dtype)
    helper.append_op(
        type='one_hot',
        inputs={'X': sampled_label},
        attrs={'depth': num_samples + 1},
        outputs={'Out': sampled_softlabel})

    helper.append_op(
        type='softmax_with_cross_entropy',
        inputs={'Logits': sampled_logits,
                'Label': sampled_label},
                'Label': sampled_softlabel},
        outputs={'Softmax': softmax,
                 'Loss': loss},
        attrs={
            'soft_label': False,
            'soft_label': True,
            'ignore_index': False,
            'numeric_stable_mode': False
        })
...
...
python/paddle/fluid/layers/ops.py
View file @ 07efdb51
...
...
@@ -60,7 +60,28 @@ __all__ += ["uniform_random"]
_uniform_random_ = generate_layer_fn('uniform_random')


def uniform_random(shape, dtype=None, min=None, max=None, seed=None):
def uniform_random(shape, dtype='float32', min=-1.0, max=1.0, seed=0):
    """
    This operator initializes a variable with random values sampled from a
    uniform distribution. The random result is in set [min, max].

    Args:
        shape (list): The shape of output variable.
        dtype(np.dtype|core.VarDesc.VarType|str): The type of data, such as
            float32, float64 etc. Default: float32.
        min (float): Minimum value of uniform random. Default -1.0.
        max (float): Maximum value of uniform random. Default 1.0.
        seed (int): Random seed used for generating samples. 0 means use a
            seed generated by the system. Note that if seed is not 0, this
            operator will always generate the same random numbers every time.
            Default 0.

    Examples:
        .. code-block:: python

            result = fluid.layers.uniform_random(shape=[32, 784])
    """
    locals_var = locals().keys()
    if not isinstance(dtype, core.VarDesc.VarType):
        dtype = convert_np_dtype_to_dtype_(dtype)
...
...
@@ -72,12 +93,6 @@ def uniform_random(shape, dtype=None, min=None, max=None, seed=None):
    return _uniform_random_(**kwargs)


uniform_random.__doc__ = _uniform_random_.__doc__ + """
Examples:

    >>> result = fluid.layers.uniform_random(shape=[32, 784])
"""

__all__ += ['hard_shrink']

_hard_shrink_ = generate_layer_fn('hard_shrink')
...
...
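A quick sketch of a call against the new uniform_random defaults documented above (illustrative only; the shape is the same one used in the docstring example):

    import paddle.fluid as fluid

    # equivalent to fluid.layers.uniform_random(shape=[32, 784])
    # with the new defaults spelled out explicitly
    result = fluid.layers.uniform_random(
        shape=[32, 784], dtype='float32', min=-1.0, max=1.0, seed=0)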
python/paddle/fluid/parallel_executor.py
View file @ 07efdb51
...
...
@@ -29,15 +29,6 @@ ExecutionStrategy = core.ParallelExecutor.ExecutionStrategy
BuildStrategy = core.ParallelExecutor.BuildStrategy


def _is_pserver_mode(main_program):
    main = main_program if main_program \
        else framework.default_main_program()
    for op in main.global_block().ops:
        if op.type in ["send", "recv"]:
            return True
    return False


class ParallelExecutor(object):
    """
    ParallelExecutor is designed for data parallelism, which focuses on distributing
...
...
@@ -140,7 +131,7 @@ class ParallelExecutor(object):
        # FIXME(zcd): is_distribution_ is a temporary field, because in pserver mode,
        # num_trainers is 1, so the current fields of build_strategy doesn't tell if
        # it's distributed model.
        build_strategy.is_distribution = _is_pserver_mode(
        build_strategy.is_distribution = framework.is_pserver_mode(
            main_program) or num_trainers > 1

        # step4: get main_program, scope, local_scopes
...
...
@@ -185,10 +176,13 @@ class ParallelExecutor(object):
        places = list(map(place_obj, self._places))

        # step7: init ParallelExecutor
        # ParallelExecutor API will be deprecated, don't support parallel graph.
        self._graph = core.Graph(main.desc)
        self.executor = core.ParallelExecutor(
            places, persistable_vars, main.desc,
            places, persistable_vars,
            cpt.to_text(loss_name) if loss_name else six.u(''), scope,
            local_scopes, exec_strategy, build_strategy)
            local_scopes, exec_strategy, build_strategy, self._graph)

        self.scope = scope
...
...
python/paddle/fluid/tests/unittests/mkldnn/mkldnn_op_test.py
0 → 100644
View file @ 07efdb51
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function

import numpy as np
import paddle.fluid.core as core
import paddle.fluid as fluid


def check_if_mkldnn_primitives_exist_in_bwd(test_case, op_type, x, out,
                                            out_grad, x_grad):
    def __assert_close(tensor, np_array, msg, atol=1e-4):
        test_case.assertTrue(
            np.allclose(np.array(tensor), np_array, atol=atol), msg)

    place = core.CPUPlace()

    var_dict = {'x': x, 'out': out, 'out@GRAD': out_grad, 'x@GRAD': x_grad}
    var_names = list(var_dict.keys())
    ground_truth = {name: var_dict[name] for name in var_names}

    program = fluid.Program()
    with fluid.program_guard(program):
        block = program.global_block()
        for name in ground_truth:
            block.create_var(
                name=name, dtype=np.float32, shape=ground_truth[name].shape)

        op = block.append_op(
            type=op_type,
            inputs={'X': block.var('x'), },
            outputs={'Out': block.var('out')},
            attrs={'use_mkldnn': True})

        # Generate backward op_desc
        grad_op_desc_list, op_grad_to_var = core.get_grad_op_desc(op.desc,
                                                                  set(), [])
        grad_op_desc = grad_op_desc_list[0]
        new_op_desc = block.desc.append_op()
        new_op_desc.copy_from(grad_op_desc)
        for var_name in grad_op_desc.output_arg_names():
            block.desc.var(var_name.encode('ascii'))
        grad_op_desc.infer_var_type(block.desc)
        grad_op_desc.infer_shape(block.desc)
        for arg in grad_op_desc.output_arg_names():
            grad_var = block.desc.find_var(arg.encode('ascii'))
            grad_var.set_dtype(core.VarDesc.VarType.FP32)

        exe = fluid.Executor(place)

        # Do at least 2 iterations
        for i in range(2):
            out = exe.run(
                program,
                feed={name: var_dict[name]
                      for name in ['x', 'out@GRAD']},
                fetch_list=['x@GRAD', 'out'])

        __assert_close(x_grad, out[0], 'x@GRAD')
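A sketch of how a test case is expected to drive the helper above (illustrative only; the real callers are the activation and softmax MKL-DNN tests added elsewhere in this commit, and the relu forward/backward formulas here are standard NumPy references, not code from the diff):

    import unittest
    import numpy as np
    from mkldnn_op_test import check_if_mkldnn_primitives_exist_in_bwd


    class TestReluMKLDNNBwdPrimitives(unittest.TestCase):
        def setUp(self):
            np.random.seed(123)
            self.op_type = 'relu'
            self.x = np.random.uniform(-1, 1, [2, 2]).astype(np.float32)
            self.out = np.maximum(self.x, 0)            # relu forward
            self.out_grad = np.random.random_sample(self.x.shape).astype(np.float32)
            self.x_grad = self.out_grad * (self.x > 0)  # relu backward

        def test_check(self):
            check_if_mkldnn_primitives_exist_in_bwd(
                self, self.op_type, self.x, self.out, self.out_grad, self.x_grad)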
python/paddle/fluid/tests/unittests/mkldnn/test_activation_mkldnn_op.py
View file @ 07efdb51
...
...
@@ -19,7 +19,7 @@ import numpy as np
import paddle.fluid.core as core
from paddle.fluid.tests.unittests.op_test import OpTest
from paddle.fluid.tests.unittests.test_activation_op import TestRelu, TestTanh, TestSqrt, TestAbs
import paddle.fluid as fluid
from mkldnn_op_test import check_if_mkldnn_primitives_exist_in_bwd


class TestMKLDNNReluDim2(TestRelu):
...
...
@@ -98,62 +98,24 @@ class TestMKLDNNAbsDim4(TestAbs):
# Check if primitives already exist in backward
class TestMKLDNNReluPrimitivesAlreadyExist(unittest.TestCase):
    def __assert_close(self, tensor, np_array, msg, atol=1e-4):
        self.assertTrue(np.allclose(np.array(tensor), np_array, atol=atol), msg)

    def test_check_forward_backward(self):
        place = core.CPUPlace()
class TestMKLDNNAbsPrimitivesAlreadyExist(unittest.TestCase):
    def setUp(self):
        super(TestMKLDNNAbsPrimitivesAlreadyExist, self).setUp()

        np.random.seed(123)
        x = np.random.uniform(-1, 1, [2, 2]).astype(np.float32)
        out = np.abs(x)
        out_grad = np.random.random_sample(x.shape).astype(np.float32)
        x_grad = out_grad * np.sign(x)
        # Abs grad calculation

        var_dict = {'x': x, 'out': out, 'out@GRAD': out_grad, 'x@GRAD': x_grad}
        var_names = list(var_dict.keys())
        ground_truth = {name: var_dict[name] for name in var_names}

        program = fluid.Program()
        with fluid.program_guard(program):
            block = program.global_block()
            for name in ground_truth:
                block.create_var(
                    name=name, dtype='float32', shape=ground_truth[name].shape)

            relu_op = block.append_op(
                type="abs",
                inputs={"X": block.var('x'), },
                outputs={"Out": block.var('out')},
                attrs={"use_mkldnn": True})

            # Generate backward op_desc
            grad_op_desc_list, op_grad_to_var = core.get_grad_op_desc(
                relu_op.desc, set(), [])
            grad_op_desc = grad_op_desc_list[0]
            new_op_desc = block.desc.append_op()
            new_op_desc.copy_from(grad_op_desc)
            for var_name in grad_op_desc.output_arg_names():
                block.desc.var(var_name.encode("ascii"))
            grad_op_desc.infer_var_type(block.desc)
            grad_op_desc.infer_shape(block.desc)
            for arg in grad_op_desc.output_arg_names():
                grad_var = block.desc.find_var(arg.encode("ascii"))
                grad_var.set_dtype(core.VarDesc.VarType.FP32)

            exe = fluid.Executor(place)

            # Do at least 2 iterations
            for i in range(2):
                out = exe.run(
                    program,
                    feed={name: var_dict[name]
                          for name in ['x', 'out@GRAD']},
                    fetch_list=['x@GRAD'])
                self.__assert_close(x_grad, out[0], "x@GRAD")

        self.op_type = 'abs'
        self.x = np.random.uniform(-1, 1, [2, 2]).astype(np.float32)
        self.out = np.abs(self.x)
        self.out_grad = np.random.random_sample(self.x.shape).astype(np.float32)
        self.x_grad = self.__abs_bwd(self.x, self.out_grad)

    # Abs grad calculation
    def __abs_bwd(self, x, out_grad):
        return out_grad * np.sign(x)

    def test_check(self):
        check_if_mkldnn_primitives_exist_in_bwd(
            self, self.op_type, self.x, self.out, self.out_grad, self.x_grad)


if __name__ == '__main__':
...
...
python/paddle/fluid/tests/unittests/mkldnn/test_softmax_mkldnn_op.py
0 → 100644
View file @ 07efdb51
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function

import unittest
import numpy as np
from paddle.fluid.tests.unittests.op_test import OpTest
import paddle.fluid.core as core
from paddle.fluid.tests.unittests.test_softmax_op import TestSoftmaxOp, stable_softmax
from mkldnn_op_test import check_if_mkldnn_primitives_exist_in_bwd


class TestSoftmaxMKLDNNOp(TestSoftmaxOp):
    def init_kernel_type(self):
        self.use_mkldnn = True


class TestSoftmaxMKLDNNOp2(TestSoftmaxMKLDNNOp):
    def get_x_shape(self):
        return [2, 3, 4, 5]


# Check if primitives already exist in backward
class TestSoftmaxMKLDNNPrimitivesAlreadyExist(unittest.TestCase):
    def setUp(self):
        super(TestSoftmaxMKLDNNPrimitivesAlreadyExist, self).setUp()

        np.random.seed(123)
        self.op_type = 'softmax'
        self.x = np.random.uniform(-1, 1, 2).astype(np.float32)
        self.out = stable_softmax(self.x)
        self.out_grad = np.random.random_sample(self.x.shape).astype(np.float32)
        self.x_grad = self.__softmax_bwd(self.out, self.out_grad)

    # Softmax grad calculation
    def __softmax_bwd(self, out, out_grad):
        return out * (out_grad - np.dot(out, out_grad))

    def test_check(self):
        check_if_mkldnn_primitives_exist_in_bwd(
            self, self.op_type, self.x, self.out, self.out_grad, self.x_grad)


if __name__ == '__main__':
    unittest.main()
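As a sanity check on the __softmax_bwd shortcut used above, a small NumPy sketch (not part of the commit) comparing it against the explicit Jacobian-vector product for a 1-D softmax output:

    import numpy as np

    out = np.array([0.3, 0.7], dtype=np.float32)        # softmax output y
    out_grad = np.array([0.1, -0.2], dtype=np.float32)  # upstream gradient dL/dy

    # shortcut used in the test: y * (dL/dy - y . dL/dy)
    x_grad_shortcut = out * (out_grad - np.dot(out, out_grad))

    # explicit softmax Jacobian: J = diag(y) - y y^T, then dL/dx = J^T dL/dy
    jacobian = np.diag(out) - np.outer(out, out)
    x_grad_explicit = jacobian.T.dot(out_grad)

    assert np.allclose(x_grad_shortcut, x_grad_explicit)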
python/paddle/fluid/tests/unittests/test_alloc_continuous_space_op.py
0 → 100644
View file @ 07efdb51
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function

import unittest
import numpy as np
from op_test import OpTest


class TestAllocContinuousSpace(OpTest):
    def setUp(self):
        self.op_type = "alloc_continuous_space"
        self.dtype = np.float32
        attrs = self.init_attr()
        self.copy_data = attrs["copy_data"]
        self.constant = attrs["constant"]
        self.set_constant = attrs["set_constant"]
        self.Inputs = self.init_input()
        self.FusedOutput = self.init_output(self.Inputs, self.set_constant,
                                            self.constant)
        self.inputs = {'Input': self.Inputs}
        self.attrs = attrs
        self.outputs = {'Output': self.Inputs, 'FusedOutput': self.FusedOutput}

    def init_dtype(self):
        self.dtype = np.float32

    def init_input(self):
        inputs = []
        inputs.append(("x1", np.random.random([20, 3]).astype(self.dtype)))
        inputs.append(("x2", np.random.random([20]).astype(self.dtype)))
        inputs.append(("x3", np.random.random([1]).astype(self.dtype)))
        inputs.append(("x4", np.random.random([200, 30]).astype(self.dtype)))
        inputs.append(("x5", np.random.random([30]).astype(self.dtype)))
        inputs.append(("x6", np.random.random([1]).astype(self.dtype)))
        return inputs

    def init_attr(self):
        return {"copy_data": True, "set_constant": False, "constant": 0.0}

    def init_output(self, input_list, set_constant, constant):
        inputs = [input[1].flatten() for input in input_list]
        output = np.concatenate(inputs)
        if set_constant:
            output = np.ones((len(output))) * constant
        return output

    def test_check_output(self):
        self.check_output()


class TestAllocContinuousSpace2(TestAllocContinuousSpace):
    def init_attr(self):
        return {"copy_data": False, "set_constant": True, "constant": 0.5}

    def test_check_output(self):
        self.check_output(no_check_set=["Output"])


if __name__ == '__main__':
    unittest.main()
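A stand-alone sketch (not part of the commit) of what init_output above computes, i.e. the reference layout of FusedOutput: every input flattened and concatenated in order, optionally overwritten by a constant:

    import numpy as np

    x1 = np.arange(6, dtype=np.float32).reshape(2, 3)
    x2 = np.arange(4, dtype=np.float32)

    fused = np.concatenate([x1.flatten(), x2.flatten()])
    print(fused.shape)  # (10,) -- one flat buffer holding both tensors

    # with set_constant=True the whole buffer is filled with the constant instead
    fused_const = np.ones(len(fused), dtype=np.float32) * 0.5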
python/paddle/fluid/tests/unittests/test_fake_quantize_op.py
View file @ 07efdb51
...
...
@@ -35,7 +35,7 @@ class TestFakeQuantizeOp(OpTest):
        self.check_output()


class TestFakeQuantizeOp(OpTest):
class TestFakeQuantizeRangeAbsMaxOp(OpTest):
    def setUp(self):
        self.op_type = "fake_quantize_range_abs_max"
        self.attrs = {
...
...
@@ -43,8 +43,10 @@ class TestFakeQuantizeOp(OpTest):
            'window_size': int(1),
            'is_test': False
        }
        x = (np.random.random((8, 16, 7, 7)) - 0.5) * 10
        x = x.astype("float32")
        self.inputs = {
            'X': np.random.random((8, 16, 7, 7)).astype("float32"),
            'X': x,
            'Iter': np.zeros(1).astype("int64"),
            'InScale': np.zeros(1).astype("float32")
        }
...
...
@@ -62,5 +64,36 @@ class TestFakeQuantizeOp(OpTest):
        self.check_output()


class TestFakeQuantizeRangeAbsMaxOp2(OpTest):
    def setUp(self):
        self.op_type = "fake_quantize_range_abs_max"
        self.attrs = {
            'bit_length': int(8),
            'window_size': int(1),
            'is_test': True
        }
        x = (np.random.random((8, 16, 7, 7)) - 0.5) * 10
        x = x.astype("float32")
        scale = np.max(np.abs(x)).astype("float32") - 1.0
        out_scales = np.zeros(self.attrs['window_size']).astype("float32")
        out_scales[0] = scale
        self.inputs = {
            'X': x,
            'Iter': np.zeros(1).astype("int64"),
            'InScale': scale.astype("float32")
        }
        xs = np.clip(x, -scale, scale)
        qs = np.round(xs / scale * ((1 << (self.attrs['bit_length'] - 1)) - 1))
        self.outputs = {
            'Out': qs,
            'OutScale': scale.astype("float32"),
            'OutScales': out_scales,
        }

    def test_check_output(self):
        self.check_output(no_check_set=set(['OutScale', 'OutScales']))


if __name__ == "__main__":
    unittest.main()
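For readers unfamiliar with range-abs-max fake quantization, a tiny NumPy sketch (illustrative only) of the reference computation encoded in the test above: clip to [-scale, scale], rescale to the signed bit_length range, and round:

    import numpy as np

    bit_length = 8
    x = np.array([-3.2, 0.5, 2.4], dtype=np.float32)
    scale = 2.4                                   # running abs-max in is_test mode

    xs = np.clip(x, -scale, scale)                # [-2.4, 0.5, 2.4]
    qs = np.round(xs / scale * ((1 << (bit_length - 1)) - 1))
    print(qs)                                     # [-127.  26.  127.]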
python/paddle/fluid/tests/unittests/test_imperative_optimizer.py
View file @ 07efdb51
...
...
@@ -105,7 +105,7 @@ class MNIST(fluid.imperative.Layer):
class TestImperativeMnist(unittest.TestCase):
    def test_mnist_float32(self):
        seed = 90
        batch_num = 2
        epoch_num = 1
        with fluid.imperative.guard():
            fluid.default_startup_program().random_seed = seed
            fluid.default_main_program().random_seed = seed
...
...
@@ -113,39 +113,40 @@ class TestImperativeMnist(unittest.TestCase):
            mnist = MNIST("mnist")
            sgd = SGDOptimizer(learning_rate=1e-3)
            train_reader = paddle.batch(
                paddle.dataset.mnist.train(), batch_size=128)
                paddle.dataset.mnist.train(), batch_size=128, drop_last=True)

            dy_param_init_value = {}
            for batch_id, data in enumerate(train_reader()):
                if batch_id >= batch_num:
                    break

                dy_x_data = np.array(
                    [x[0].reshape(1, 28, 28) for x in data]).astype('float32')
                y_data = np.array([x[1] for x in data]).astype('int64').reshape(
                    128, 1)

                img = to_variable(dy_x_data)
                label = to_variable(y_data)
                label._stop_gradient = True

                cost = mnist(img)
                loss = fluid.layers.cross_entropy(cost, label)
                avg_loss = fluid.layers.mean(loss)
                dy_out = avg_loss._numpy()

                if batch_id == 0:
                    for param in fluid.default_main_program().global_block(
                    ).all_parameters():
                        dy_param_init_value[param.name] = param._numpy()

                avg_loss._backward()
                sgd.minimize(avg_loss)
                mnist.clear_gradients()
            dy_param_value = {}
            for param in fluid.default_main_program().global_block(
            ).all_parameters():
                dy_param_value[param.name] = param._numpy()

            for epoch in range(epoch_num):
                for batch_id, data in enumerate(train_reader()):
                    dy_x_data = np.array(
                        [x[0].reshape(1, 28, 28)
                         for x in data]).astype('float32')
                    y_data = np.array(
                        [x[1] for x in data]).astype('int64').reshape(128, 1)

                    img = to_variable(dy_x_data)
                    label = to_variable(y_data)
                    label._stop_gradient = True

                    cost = mnist(img)
                    loss = fluid.layers.cross_entropy(cost, label)
                    avg_loss = fluid.layers.mean(loss)

                    dy_out = avg_loss._numpy()

                    if epoch == 0 and batch_id == 0:
                        for param in mnist.parameters():
                            dy_param_init_value[param.name] = param._numpy()

                    avg_loss._backward()
                    sgd.minimize(avg_loss)
                    mnist.clear_gradients()

                    fluid.default_main_program().global_block()._clear_block()

                    dy_param_value = {}
                    for param in mnist.parameters():
                        dy_param_value[param.name] = param._numpy()

        with new_program_scope():
            fluid.default_startup_program().random_seed = seed
...
...
@@ -157,7 +158,7 @@ class TestImperativeMnist(unittest.TestCase):
            mnist = MNIST("mnist")
            sgd = SGDOptimizer(learning_rate=1e-3)
            train_reader = paddle.batch(
                paddle.dataset.mnist.train(), batch_size=128)
                paddle.dataset.mnist.train(), batch_size=128, drop_last=True)

            img = fluid.layers.data(
                name='pixel', shape=[1, 28, 28], dtype='float32')
...
...
@@ -170,8 +171,7 @@ class TestImperativeMnist(unittest.TestCase):
            # initialize params and fetch them
            static_param_init_value = {}
            static_param_name_list = []
            for param in fluid.default_startup_program().global_block(
            ).all_parameters():
            for param in mnist.parameters():
                static_param_name_list.append(param.name)

            out = exe.run(fluid.default_startup_program(),
...
@@ -180,26 +180,29 @@ class TestImperativeMnist(unittest.TestCase):
            for i in range(len(static_param_name_list)):
                static_param_init_value[static_param_name_list[i]] = out[i]

            for batch_id, data in enumerate(train_reader()):
                if batch_id >= batch_num:
                    break

                static_x_data = np.array(
                    [x[0].reshape(1, 28, 28) for x in data]).astype('float32')
                y_data = np.array([x[1] for x in data]).astype('int64').reshape(
                    [128, 1])

                fetch_list = [avg_loss.name]
                fetch_list.extend(static_param_name_list)
                out = exe.run(fluid.default_main_program(),
                              feed={"pixel": static_x_data,
                                    "label": y_data},
                              fetch_list=fetch_list)

                static_param_value = {}
                static_out = out[0]
                for i in range(1, len(out)):
                    static_param_value[static_param_name_list[i - 1]] = out[i]

            for epoch in range(epoch_num):
                for batch_id, data in enumerate(train_reader()):
                    static_x_data = np.array(
                        [x[0].reshape(1, 28, 28)
                         for x in data]).astype('float32')
                    y_data = np.array(
                        [x[1] for x in data]).astype('int64').reshape([128, 1])

                    fetch_list = [avg_loss.name]
                    fetch_list.extend(static_param_name_list)
                    out = exe.run(
                        fluid.default_main_program(),
                        feed={"pixel": static_x_data,
                              "label": y_data},
                        fetch_list=fetch_list)

                    static_param_value = {}
                    static_out = out[0]
                    for i in range(1, len(out)):
                        static_param_value[static_param_name_list[i -
                                                                  1]] = out[i]

        self.assertTrue(np.allclose(dy_x_data.all(), static_x_data.all()))

        for key, value in six.iteritems(static_param_init_value):
            self.assertTrue(np.allclose(value, dy_param_init_value[key]))
...
...
@@ -207,7 +210,7 @@ class TestImperativeMnist(unittest.TestCase):
        self.assertTrue(np.allclose(static_out, dy_out))

        for key, value in six.iteritems(static_param_value):
            self.assertTrue(np.allclose(value, dy_param_value[key]))
            self.assertTrue(np.allclose(value, dy_param_value[key], atol=1e-5))


if __name__ == '__main__':
...
...
python/paddle/fluid/tests/unittests/test_imperative_resnet.py
View file @ 07efdb51
...
...
@@ -231,7 +231,7 @@ class TestImperativeResnet(unittest.TestCase):
        seed = 90

        batch_size = train_parameters["batch_size"]
        batch_num = 1
        batch_num = 2
        with fluid.imperative.guard():
            fluid.default_startup_program().random_seed = seed
            fluid.default_main_program().random_seed = seed
...
...
@@ -286,6 +286,8 @@ class TestImperativeResnet(unittest.TestCase):
                optimizer.minimize(avg_loss)
                resnet.clear_gradients()

                fluid.default_main_program().global_block()._clear_block()

                dy_param_value = {}
                for param in resnet.parameters():
                    dy_param_value[param.name] = param._numpy()
...
...
@@ -319,11 +321,9 @@ class TestImperativeResnet(unittest.TestCase):
            static_param_init_value = {}
            static_param_name_list = []
            static_grad_name_list = []
            for param in fluid.default_startup_program().global_block(
            ).all_parameters():
            for param in resnet.parameters():
                static_param_name_list.append(param.name)
            for param in fluid.default_main_program().global_block(
            ).all_parameters():
            for param in resnet.parameters():
                if not param.stop_gradient:
                    static_grad_name_list.append(param.name +
                                                 core.grad_var_suffix())
...
...
python/paddle/fluid/tests/unittests/test_ir_memory_optimize_transformer.py
View file @ 07efdb51
...
...
@@ -13,21 +13,47 @@
# limitations under the License.

import os
import sys
import unittest
from timeit import default_timer as timer
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
import paddle.dataset.wmt16 as wmt16

os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
os.environ[
    'RECORDIO_FILENAME'] = '/tmp/ir_memory_optimize_transformer.wmt16.recordio'

from test_parallel_executor_transformer import TestTransformer
from test_parallel_executor_transformer import transformer
from test_parallel_executor_transformer import transformer, ModelHyperParams, transformer_model, transformer, prepare_batch_input
from parallel_executor_test_base import TestParallelExecutorBase

# disable temporarily because of timeout.
sys.exit(0)


# NOTE(dzhwinter): test different strategy collisions.
# open the eager delete tensor strategy by default.
class TestTransformerWithIR(TestTransformer):
class TestTransformerWithIR(TestParallelExecutorBase):
    @classmethod
    def setUpClass(cls):
        os.environ['CPU_NUM'] = str(4)
        reader = paddle.batch(
            wmt16.train(ModelHyperParams.src_vocab_size,
                        ModelHyperParams.trg_vocab_size),
            batch_size=transformer_model.batch_size)
        with fluid.recordio_writer.create_recordio_writer(
                os.environ.get("RECORDIO_FILENAME")) as writer:
            for batch in reader():
                for tensor in prepare_batch_input(
                        batch, ModelHyperParams.src_pad_idx,
                        ModelHyperParams.trg_pad_idx, ModelHyperParams.n_head):
                    t = fluid.LoDTensor()
                    t.set(tensor, fluid.CPUPlace())
                    writer.append_tensor(t)
                writer.complete_append_tensor()

    def test_main(self):
        if core.is_compiled_with_cuda():
            # check python transpiler
...
...
@@ -35,13 +61,15 @@ class TestTransformerWithIR(TestTransformer):
                transformer,
                use_cuda=True,
                memory_opt=True,
                use_ir_memory_optimize=False)
                use_ir_memory_optimize=False,
                iter=2)
            # check IR memory optimize
            self.check_network_convergence(
                transformer,
                use_cuda=True,
                memory_opt=False,
                use_ir_memory_optimize=True)
                use_ir_memory_optimize=True,
                iter=2)


if __name__ == '__main__':
...
...
python/paddle/fluid/tests/unittests/test_softmax_op.py
View file @ 07efdb51
...
...
@@ -144,15 +144,5 @@ class TestSoftmaxFP16CUDNNOp2(TestSoftmaxFP16CUDNNOp):
        return [2, 3, 4, 5]


class TestSoftmaxMKLDNNOp(TestSoftmaxOp):
    def init_kernel_type(self):
        self.use_mkldnn = True


class TestSoftmaxMKLDNNOp2(TestSoftmaxMKLDNNOp):
    def get_x_shape(self):
        return [2, 3, 4, 5]


if __name__ == "__main__":
    unittest.main()
python/paddle/utils/plot.py
View file @ 07efdb51
...
...
@@ -13,6 +13,7 @@
# limitations under the License.

import os
import six


class PlotData(object):
...
...
@@ -60,9 +61,9 @@ class Ploter(object):
    def append(self, title, step, value):
        """
        Feed data
        Args:
        Feed data

        Args:
            title: assign the group data to this subtitle.
            step: the x_axis of data.
            value: the y_axis of data.
...
...
@@ -71,9 +72,9 @@ class Ploter(object):
            .. code-block:: python
            plot_curve = Ploter("Curve 1","Curve 2")
            plot_curve.append(title="Curve 1",step=1,value=1)
        """
        assert isinstance(title, basestring)
        assert self.__plot_data__.has_key(title)
        """
        assert isinstance(title, six.string_types)
        assert title in self.__plot_data__
        data = self.__plot_data__[title]
        assert isinstance(data, PlotData)
        data.append(step, value)
...
...
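The change above is the usual Python 2/3 compatibility move; a stand-alone sketch (not from the commit) of the two checks it replaces and their portable equivalents:

    import six

    title = "Curve 1"
    plot_data = {"Curve 1": [], "Curve 2": []}

    # Python 2-only spellings being removed:
    #   isinstance(title, basestring)   # basestring does not exist in Python 3
    #   plot_data.has_key(title)        # dict.has_key() was removed in Python 3

    # Portable replacements used in the diff:
    assert isinstance(title, six.string_types)
    assert title in plot_data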
@@ -89,7 +90,7 @@ class Ploter(object):
            .. code-block:: python
            plot_curve = Ploter()
            plot_cure.plot()
        """
        """
        if self.__plot_is_disabled__():
            return
...
...
python/paddle/utils/preprocess_img.py
View file @ 07efdb51
...
...
@@ -122,7 +122,7 @@ class ImageClassificationDatasetCreater(preprocess_util.DatasetCreater):
    def create_dataset_from_list(self, path):
        data = []
        label_set = []
        for line in open(file_list):
        for line in open(path):
            items = line.rstrip().split()
            image_path = items[0]
            label_name = items[1]
...
...
@@ -141,7 +141,7 @@ class ImageClassificationDatasetCreater(preprocess_util.DatasetCreater):
            path: the path of the image dataset.
        """
        if self.from_list:
            return create_dataset_from_list(path)
            return self.create_dataset_from_list(path)
        label_set = preprocess_util.get_label_set_from_dir(path)
        data = []
        for l_name in list(label_set.keys()):
...
...