PaddlePaddle / Paddle, commit f52c4f8b

Authored on Sep 21, 2020 by yaoxuefeng6

    fix conflict

Parents: cb602fce, 37f7414f

Showing 125 changed files with 4740 additions and 912 deletions.
Changed files:

  cmake/cuda.cmake  +3 -0
  paddle/fluid/framework/data_feed.cc  +58 -12
  paddle/fluid/framework/details/CMakeLists.txt  +1 -0
  paddle/fluid/framework/details/all_reduce_op_handle.cc  +63 -19
  paddle/fluid/framework/details/async_ssa_graph_executor.cc  +13 -2
  paddle/fluid/framework/details/build_strategy.h  +4 -0
  paddle/fluid/framework/details/fast_threaded_ssa_graph_executor.cc  +5 -1
  paddle/fluid/framework/details/fetch_op_handle.cc  +6 -2
  paddle/fluid/framework/details/op_handle_base.cc  +7 -0
  paddle/fluid/framework/details/op_handle_base.h  +6 -0
  paddle/fluid/framework/details/parallel_ssa_graph_executor.cc  +8 -1
  paddle/fluid/framework/details/scope_buffered_ssa_graph_executor.cc  +9 -1
  paddle/fluid/framework/details/share_tensor_buffer_functor.cc  +57 -13
  paddle/fluid/framework/details/share_tensor_buffer_functor.h  +9 -1
  paddle/fluid/framework/details/share_tensor_buffer_op_handle.cc  +20 -5
  paddle/fluid/framework/details/share_tensor_buffer_op_handle.h  +4 -1
  paddle/fluid/framework/details/ssa_graph_executor.cc  +4 -2
  paddle/fluid/framework/details/threaded_ssa_graph_executor.cc  +15 -6
  paddle/fluid/framework/details/threaded_ssa_graph_executor.h  +2 -2
  paddle/fluid/framework/details/var_handle.h  +8 -3
  paddle/fluid/framework/details/variable_visitor.cc  +44 -27
  paddle/fluid/framework/fleet/gloo_wrapper.cc  +22 -3
  paddle/fluid/framework/ir/conv_affine_channel_fuse_pass.cc  +12 -0
  paddle/fluid/framework/ir/conv_bn_fuse_pass.cc  +12 -0
  paddle/fluid/framework/ir/conv_elementwise_add2_act_fuse_pass.cc  +8 -1
  paddle/fluid/framework/ir/conv_elementwise_add_act_fuse_pass.cc  +8 -0
  paddle/fluid/framework/ir/conv_elementwise_add_fuse_pass.cc  +7 -2
  paddle/fluid/framework/ir/embedding_fc_lstm_fuse_pass.cc  +11 -1
  paddle/fluid/framework/ir/fc_fuse_pass.cc  +8 -0
  paddle/fluid/framework/ir/fc_gru_fuse_pass.cc  +21 -1
  paddle/fluid/framework/ir/fc_lstm_fuse_pass.cc  +15 -0
  paddle/fluid/framework/ir/memory_optimize_pass/CMakeLists.txt  +2 -0
  paddle/fluid/framework/ir/memory_optimize_pass/buffer_shared_inplace_op_pass.cc  +4 -2
  paddle/fluid/framework/ir/memory_optimize_pass/inplace_addto_op_pass.cc  +221 -0
  paddle/fluid/framework/ir/memory_optimize_pass/memory_reuse_pass.cc  +8 -3
  paddle/fluid/framework/ir/memory_optimize_pass/memory_reuse_pass.h  +7 -7
  paddle/fluid/framework/ir/repeated_fc_relu_fuse_pass.cc  +10 -0
  paddle/fluid/framework/ir/shuffle_channel_detect_pass.cc  +8 -0
  paddle/fluid/framework/ir/squared_mat_sub_fuse_pass.cc  +24 -6
  paddle/fluid/framework/ir/squared_mat_sub_fuse_pass.h  +1 -1
  paddle/fluid/framework/operator.h  +8 -0
  paddle/fluid/framework/parallel_executor.cc  +19 -0
  paddle/fluid/inference/api/paddle_pass_builder.cc  +2 -1
  paddle/fluid/inference/tensorrt/convert/emb_eltwise_layernorm.cc  +3 -3
  paddle/fluid/inference/tensorrt/plugin/emb_eltwise_layernorm_plugin.cu  +133 -81
  paddle/fluid/inference/tensorrt/plugin/emb_eltwise_layernorm_plugin.h  +144 -34
  paddle/fluid/inference/tests/api/trt_dynamic_shape_ernie_deserialize_test.cc  +4 -6
  paddle/fluid/operators/conv_cudnn_op.cu  +22 -5
  paddle/fluid/operators/conv_op.cc  +10 -0
  paddle/fluid/operators/cudnn_lstm_cache.h  +10 -0
  paddle/fluid/operators/elementwise/elementwise_add_op.cc  +18 -0
  paddle/fluid/operators/elementwise/elementwise_add_op.cu  +7 -0
  paddle/fluid/operators/fake_quantize_op.cc  +135 -0
  paddle/fluid/operators/fake_quantize_op.cu  +87 -2
  paddle/fluid/operators/fake_quantize_op.h  +31 -0
  paddle/fluid/operators/fused/fusion_gru_op.cc  +1 -0
  paddle/fluid/operators/optimizers/rmsprop_op.cc  +2 -1
  paddle/fluid/operators/optimizers/rmsprop_op.cu  +2 -1
  paddle/fluid/operators/top_k_v2_op.cc  +12 -3
  paddle/fluid/platform/cudnn_helper.h  +2 -0
  paddle/fluid/platform/dynload/cudnn.cc  +4 -0
  paddle/fluid/platform/dynload/cudnn.h  +13 -8
  paddle/fluid/platform/flags.cc  +15 -0
  paddle/fluid/pybind/global_value_getter_setter.cc  +2 -1
  paddle/fluid/pybind/op_function_generator.cc  +1 -0
  paddle/fluid/pybind/pybind.cc  +6 -0
  paddle/scripts/paddle_build.sh  +44 -1
  python/paddle/distributed/fleet/__init__.py  +1 -0
  python/paddle/distributed/fleet/base/fleet_base.py  +17 -27
  python/paddle/distributed/fleet/base/role_maker.py  +441 -253
  python/paddle/distributed/fleet/base/util_factory.py  +8 -37
  python/paddle/distributed/fleet/launch.py  +25 -1
  python/paddle/distributed/fleet/launch_utils.py  +8 -2
  python/paddle/distributed/fleet/meta_optimizers/common.py  +3 -3
  python/paddle/distributed/fleet/meta_optimizers/dgc_optimizer.py  +2 -2
  python/paddle/distributed/fleet/meta_optimizers/graph_execution_optimizer.py  +9 -9
  python/paddle/distributed/fleet/meta_optimizers/localsgd_optimizer.py  +5 -5
  python/paddle/distributed/fleet/meta_optimizers/parameter_server_graph_optimizer.py  +1 -1
  python/paddle/distributed/fleet/meta_optimizers/parameter_server_optimizer.py  +2 -2
  python/paddle/distributed/fleet/meta_optimizers/pipeline_optimizer.py  +4 -4
  python/paddle/distributed/fleet/runtime/parameter_server_runtime.py  +11 -10
  python/paddle/fluid/__init__.py  +1 -0
  python/paddle/fluid/backward.py  +77 -17
  python/paddle/fluid/contrib/slim/quantization/imperative/qat.py  +8 -3
  python/paddle/fluid/contrib/slim/quantization/imperative/quant_nn.py  +105 -7
  python/paddle/fluid/contrib/slim/tests/test_imperative_qat.py  +0 -1
  python/paddle/fluid/contrib/slim/tests/test_imperative_qat_channelwise.py  +428 -0
  python/paddle/fluid/incubate/fleet/parameter_server/ir/public.py  +24 -6
  python/paddle/fluid/layers/tensor.py  +2 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_conv_affine_channel_fuse_pass.py  +228 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_conv_bn_fuse_pass.py  +177 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_conv_elementwise_add2_act_fuse_pass.py  +4 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_conv_elementwise_add_act_fuse_pass.py  +4 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_conv_elementwise_add_fuse_pass.py  +3 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_fc_fuse_pass.py  +54 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_fc_gru_fuse_pass.py  +86 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_fc_lstm_fuse_pass.py  +52 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_repeated_fc_relu_fuse_pass.py  +94 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_squared_mat_sub_fuse_pass.py  +63 -0
  python/paddle/fluid/tests/unittests/ir/inference/test_transpose_flatten_concat_fuse_pass.py  +3 -1
  python/paddle/fluid/tests/unittests/ir/inference/test_trt_shuffle_channel_detect_pass.py  +51 -0
  python/paddle/fluid/tests/unittests/test_fake_quantize_op.py  +65 -0
  python/paddle/fluid/tests/unittests/test_fleet_base.py  +41 -25
  python/paddle/fluid/tests/unittests/test_fleet_rolemaker_2.py  +1 -1
  python/paddle/fluid/tests/unittests/test_fleet_rolemaker_new.py  +636 -43
  python/paddle/fluid/tests/unittests/test_fleet_util.py  +3 -94
  python/paddle/fluid/tests/unittests/test_inplace_addto_strategy.py  +114 -0
  python/paddle/fluid/tests/unittests/test_top_k_v2_op.py  +15 -4
  python/paddle/fluid/tests/unittests/test_transformer_api.py  +135 -0
  python/paddle/nn/layer/transformer.py  +82 -20
  python/paddle/optimizer/adam.py  +5 -6
  python/paddle/reader/decorator.py  +1 -1
  python/paddle/tests/test_dataset_cifar.py  +16 -8
  python/paddle/tests/test_datasets.py  +4 -2
  python/paddle/text/datasets/uci_housing.py  +5 -1
  python/paddle/utils/__init__.py  +1 -0
  python/paddle/utils/lazy_import.py  +34 -0
  python/paddle/vision/datasets/cifar.py  +1 -0
  python/paddle/vision/datasets/folder.py  +2 -1
  python/paddle/vision/datasets/mnist.py  +1 -10
  python/paddle/vision/transforms/functional.py  +35 -16
  python/paddle/vision/transforms/transforms.py  +49 -12
  python/requirements.txt  +0 -1
  python/setup.py.in  +0 -3
  tools/check_api_approvals.sh  +1 -1
cmake/cuda.cmake

@@ -107,6 +107,9 @@ function(select_nvcc_arch_flags out_variable)
   elseif(${CUDA_ARCH_NAME} STREQUAL "Maxwell")
     set(cuda_arch_bin "50")
   elseif(${CUDA_ARCH_NAME} STREQUAL "Pascal")
+    if (NOT ${CMAKE_CUDA_COMPILER_VERSION} LESS 10.0)
+      add_definitions("-DSUPPORTS_CUDA_FP16")
+    endif()
     set(cuda_arch_bin "60 61")
   elseif(${CUDA_ARCH_NAME} STREQUAL "Volta")
     if (NOT ${CMAKE_CUDA_COMPILER_VERSION} LESS 10.0)
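The three added lines define SUPPORTS_CUDA_FP16 for Pascal builds when the CUDA compiler is at least 10.0. As a minimal sketch of how a translation unit might key off such a define (the kernel and its names below are illustrative assumptions, not code from this commit):

#ifdef SUPPORTS_CUDA_FP16
#include <cuda_fp16.h>

// Hypothetical half-precision kernel guarded by the new define.
__global__ void ScaleHalf(__half* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    // Promote to float, scale, then narrow back to half precision.
    data[i] = __float2half(__half2float(data[i]) * factor);
  }
}
#endif  // SUPPORTS_CUDA_FP16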
paddle/fluid/framework/data_feed.cc

@@ -527,6 +527,8 @@ bool MultiSlotDataFeed::CheckFile(const char* filename) {
         VLOG(0) << "error: the number of ids is a negative number: " << num;
         VLOG(0) << "please check line<" << instance_cout << "> in file<"
                 << filename << ">";
+        VLOG(0) << "Error occured when parsing " << i
+                << " th slot with total slots number: " << all_slots_.size();
         return false;
       } else if (num == 0) {
         VLOG(0)
@@ -536,42 +538,66 @@ bool MultiSlotDataFeed::CheckFile(const char* filename) {
                "characters.";
         VLOG(0) << "please check line<" << instance_cout << "> in file<"
                 << filename << ">";
+        VLOG(0) << "Error occured when parsing " << i
+                << " th slot with total slots number: " << all_slots_.size();
         return false;
       } else if (errno == ERANGE || num > INT_MAX) {
         VLOG(0) << "error: the number of ids greater than INT_MAX";
         VLOG(0) << "please check line<" << instance_cout << "> in file<"
                 << filename << ">";
+        VLOG(0) << "Error occured when parsing " << i
+                << " th slot with total slots number: " << all_slots_.size();
         return false;
       }
       if (all_slots_type_[i] == "float") {
-        for (int i = 0; i < num; ++i) {
+        for (int j = 0; j < num; ++j) {
           strtof(endptr, &endptr);
           if (errno == ERANGE) {
             VLOG(0) << "error: the value is out of the range of "
                        "representable values for float";
             VLOG(0) << "please check line<" << instance_cout << "> in file<"
                     << filename << ">";
+            VLOG(0) << "Error occured when parsing " << i
+                    << " th slot with total slots number: "
+                    << all_slots_.size();
+            VLOG(0) << "and in this slot: " << j
+                    << " th id with total id number: " << num;
             return false;
           }
-          if (i + 1 != num && endptr - str == len) {
+          if (j + 1 != num && endptr - str == len) {
             VLOG(0) << "error: there is a wrong with the number of ids.";
+            VLOG(0) << "Error occured when parsing " << i
+                    << " th slot with total slots number: "
+                    << all_slots_.size();
+            VLOG(0) << "and in this slot: " << j
+                    << " th id with total id number: " << num;
             VLOG(0) << "please check line<" << instance_cout << "> in file<"
                     << filename << ">";
             return false;
           }
         }
      } else if (all_slots_type_[i] == "uint64") {
-        for (int i = 0; i < num; ++i) {
+        for (int j = 0; j < num; ++j) {
           strtoull(endptr, &endptr, 10);
           if (errno == ERANGE) {
             VLOG(0) << "error: the value is out of the range of "
                        "representable values for uint64_t";
+            VLOG(0) << "Error occured when parsing " << i
+                    << " th slot with total slots number: "
+                    << all_slots_.size();
+            VLOG(0) << "and in this slot: " << j
+                    << " th id with total id number: " << num;
             VLOG(0) << "please check line<" << instance_cout << "> in file<"
                     << filename << ">";
             return false;
           }
-          if (i + 1 != num && endptr - str == len) {
+          if (j + 1 != num && endptr - str == len) {
             VLOG(0) << "error: there is a wrong with the number of ids.";
+            VLOG(0) << "Error occured when parsing " << i
+                    << " th slot with total slots number: "
+                    << all_slots_.size();
+            VLOG(0) << "and in this slot: " << j
+                    << " th id with total id number: " << num;
             VLOG(0) << "please check line<" << instance_cout << "> in file<"
                     << filename << ">";
             return false;
@@ -632,8 +658,13 @@ bool MultiSlotDataFeed::ParseOneInstanceFromPipe(
             "The number of ids can not be zero, you need padding "
             "it in data generator; or if there is something wrong with "
             "the data, please check if the data contains unresolvable "
-            "characters.\n please check this error line: %s",
-            str));
+            "characters.\n please check this error line: %s,\n Specifically, "
+            "something wrong happened(the length of this slot's feasign is 0)"
+            "when we parse the %d th slots."
+            "Maybe something wrong around this slot"
+            "\n We detect the feasign number of this slot is %d, "
+            "which is illegal.",
+            str, i, num));
         if (idx != -1) {
           (*instance)[idx].Init(all_slots_type_[i]);
           if ((*instance)[idx].GetType()[0] == 'f') {  // float
@@ -683,8 +714,13 @@ bool MultiSlotDataFeed::ParseOneInstance(std::vector<MultiSlotType>* instance) {
             "The number of ids can not be zero, you need padding "
             "it in data generator; or if there is something wrong with "
             "the data, please check if the data contains unresolvable "
-            "characters.\n please check this error line: %s.",
-            str));
+            "characters.\n please check this error line: %s,\n Specifically, "
+            "something wrong happened(the length of this slot's feasign is 0)"
+            "when we parse the %d th slots."
+            "Maybe something wrong around this slot"
+            "\n We detect the feasign number of this slot is %d, "
+            "which is illegal.",
+            str, i, num));
         if (idx != -1) {
           (*instance)[idx].Init(all_slots_type_[i]);
@@ -916,8 +952,13 @@ bool MultiSlotInMemoryDataFeed::ParseOneInstanceFromPipe(Record* instance) {
             "The number of ids can not be zero, you need padding "
             "it in data generator; or if there is something wrong with "
             "the data, please check if the data contains unresolvable "
-            "characters.\n please check this error line: %s.",
-            str));
+            "characters.\n please check this error line: %s,\n Specifically, "
+            "something wrong happened(the length of this slot's feasign is 0)"
+            "when we parse the %d th slots."
+            "Maybe something wrong around this slot"
+            "\n We detect the feasign number of this slot is %d, "
+            "which is illegal.",
+            str, i, num));
         if (idx != -1) {
           if (all_slots_type_[i][0] == 'f') {  // float
            for (int j = 0; j < num; ++j) {
@@ -982,8 +1023,13 @@ bool MultiSlotInMemoryDataFeed::ParseOneInstance(Record* instance) {
             "The number of ids can not be zero, you need padding "
             "it in data generator; or if there is something wrong with "
             "the data, please check if the data contains unresolvable "
-            "characters.\n please check this error line: %s.",
-            str));
+            "characters.\n please check this error line: %s,\n Specifically, "
+            "something wrong happened(the length of this slot's feasign is 0)"
+            "when we parse the %d th slots."
+            "Maybe something wrong around this slot"
+            "\n We detect the feasign number of this slot is %d, "
+            "which is illegal.",
+            str, i, num));
         if (idx != -1) {
           if (all_slots_type_[i][0] == 'f') {  // float
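The CheckFile hunks above lean on the standard strto* error protocol: errno is cleared before the call and ERANGE afterwards signals overflow. A self-contained sketch of that idiom (the helper below is hypothetical, not part of the commit):

#include <cerrno>
#include <climits>
#include <cstdlib>
#include <iostream>

// Standalone illustration of the range check used in CheckFile: clear errno,
// parse, then treat ERANGE or an out-of-range value as a hard error.
bool ParseIdCount(const char* str, int* out) {
  char* endptr = nullptr;
  errno = 0;
  long num = std::strtol(str, &endptr, 10);
  if (endptr == str) return false;  // no digits were consumed
  if (errno == ERANGE || num > INT_MAX) {
    std::cerr << "error: the number of ids greater than INT_MAX\n";
    return false;
  }
  if (num < 0) {
    std::cerr << "error: the number of ids is a negative number: " << num
              << "\n";
    return false;
  }
  *out = static_cast<int>(num);
  return true;
}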
paddle/fluid/framework/details/CMakeLists.txt

@@ -74,6 +74,7 @@ set(SSA_GRAPH_EXECUTOR_DEPS graph framework_proto
     eager_deletion_pass
     buffer_shared_inplace_op_pass
     buffer_shared_cross_op_memory_reuse_pass
+    inplace_addto_op_pass
     set_reader_device_info_utils
     add_reader_dependency_pass)
 cc_library(ssa_graph_executor SRCS ssa_graph_executor.cc DEPS ${SSA_GRAPH_EXECUTOR_DEPS})
paddle/fluid/framework/details/all_reduce_op_handle.cc

@@ -12,7 +12,9 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 #include "paddle/fluid/framework/details/all_reduce_op_handle.h"
+
 #include <algorithm>
+
 #include "paddle/fluid/framework/details/container_cast.h"
 #include "paddle/fluid/framework/details/reduce_and_gather.h"
 #include "paddle/fluid/framework/details/variable_visitor.h"
@@ -34,14 +36,24 @@ AllReduceOpHandle::AllReduceOpHandle(ir::Node *node,
                                      const std::vector<platform::Place> &places,
                                      const platform::NCCLCommunicator *ctxs)
     : NCCLOpHandleBase(node, places, ctxs), local_scopes_(local_scopes) {
-  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
+  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size(),
+                    platform::errors::InvalidArgument(
+                        "The number of places and the number of local scopes "
+                        "should be equal, but got number of places is %d and "
+                        "number of local scopes is %d.",
+                        places_.size(), local_scopes_.size()));
 }
 #else
 AllReduceOpHandle::AllReduceOpHandle(ir::Node *node,
                                      const std::vector<Scope *> &local_scopes,
                                      const std::vector<platform::Place> &places)
     : OpHandleBase(node), local_scopes_(local_scopes), places_(places) {
-  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
+  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size(),
+                    platform::errors::InvalidArgument(
+                        "The number of places and the number of local scopes "
+                        "should be equal, but got number of places is %d and "
+                        "number of local scopes is %d.",
+                        places_.size(), local_scopes_.size()));
 }
 #endif
@@ -60,13 +72,25 @@ void AllReduceOpHandle::AllReduceImpl(
     const std::vector<VarHandle *> &in_var_handles,
     const std::vector<VarHandle *> &out_var_handles) {
   size_t num_places = places_.size();
-  PADDLE_ENFORCE_EQ(
-      in_var_handles.size(), num_places,
-      "The NoDummyInputSize should be equal to the number of places.");
+  PADDLE_ENFORCE_EQ(
+      in_var_handles.size(), num_places,
+      platform::errors::InvalidArgument(
+          "The NoDummyInputSize should be equal "
+          "to the number of places, but got NoDummyInputSize is "
+          "%d and the number of place is %d.",
+          in_var_handles.size(), num_places));
   PADDLE_ENFORCE_EQ(
       in_var_handles.size(), out_var_handles.size(),
-      "The NoDummyInputSize and NoDummyOutputSize should be equal.");
-  PADDLE_ENFORCE_EQ(local_exec_scopes_.size(), num_places);
+      platform::errors::InvalidArgument(
+          "The NoDummyInputSize and NoDummyOutputSize should be "
+          "equal, but got NoDummyInputSize is %d and NoDummyOutputSize is %d.",
+          in_var_handles.size(), out_var_handles.size()));
+  PADDLE_ENFORCE_EQ(
+      local_exec_scopes_.size(), num_places,
+      platform::errors::InvalidArgument(
+          "The number of local scopes should be equal "
+          "to the number of places, but got the number of local scopes is "
+          "%d and the number of place is %d.",
+          in_var_handles.size(), num_places));

   std::vector<const void *> lod_tensor_data;
   std::vector<platform::Place> places;
@@ -78,23 +102,36 @@ void AllReduceOpHandle::AllReduceImpl(
   for (size_t i = 0; i < local_exec_scopes_.size(); ++i) {
     auto &local_scope = local_exec_scopes_[i];
     auto var = local_scope->FindVar(in_var_handles[i]->name());
-    PADDLE_ENFORCE_NOT_NULL(var, "%s is not found int scope.",
-                            in_var_handles[i]->name());
+    PADDLE_ENFORCE_NOT_NULL(
+        var, platform::errors::NotFound(
+                 "Variable %s is not found in local scope.",
+                 in_var_handles[i]->name()));
     auto &lod_tensor = var->Get<LoDTensor>();

     if (i == 0) {
       numel = static_cast<int64_t>(lod_tensor.numel());
       // only enforce place0, we will enforce other palce numel == place0 numel
       PADDLE_ENFORCE_GT(
-          numel, 0, platform::errors::InvalidArgument(
-                        "The numel of tensos=[%s] must > 0. But now numel=[%d]",
-                        in_var_handles[i]->name(), numel));
+          numel, 0,
+          platform::errors::PreconditionNotMet(
+              "The numel of tensor %s should be > 0, but got numel is %d.",
+              in_var_handles[i]->name(), numel));
       dtype = lod_tensor.type();
       is_gpu_place = platform::is_gpu_place(lod_tensor.place());
     }
-    PADDLE_ENFORCE_EQ(numel, static_cast<int64_t>(lod_tensor.numel()));
-    PADDLE_ENFORCE_EQ(dtype, lod_tensor.type());
-    PADDLE_ENFORCE_EQ(is_gpu_place, platform::is_gpu_place(lod_tensor.place()));
+    PADDLE_ENFORCE_EQ(
+        numel, static_cast<int64_t>(lod_tensor.numel()),
+        platform::errors::PreconditionNotMet(
+            "The size of tensors of the same variable in different local "
+            "scopes should be equal."));
+    PADDLE_ENFORCE_EQ(
+        dtype, lod_tensor.type(),
+        platform::errors::PreconditionNotMet(
+            "The dtype of tensors of the same variable in different local "
+            "scopes should be equal."));
+    PADDLE_ENFORCE_EQ(
+        is_gpu_place, platform::is_gpu_place(lod_tensor.place()),
+        platform::errors::PreconditionNotMet(
+            "The place type of tensors of the same variable "
+            "in different local scopes should be equal."));

     lod_tensor_data.emplace_back(lod_tensor.data<void>());
     places.emplace_back(lod_tensor.place());
@@ -102,8 +139,12 @@ void AllReduceOpHandle::AllReduceImpl(
     VLOG(10) << "place:" << i << ", input_name:" << in_var_handles[i]->name()
              << ", out_name:" << out_var_handles[i]->name();
-    PADDLE_ENFORCE_EQ(in_var_handles[i]->name(), out_var_handles[i]->name(),
-                      "The name of input and output should be equal.");
+    PADDLE_ENFORCE_EQ(
+        in_var_handles[i]->name(), out_var_handles[i]->name(),
+        platform::errors::InvalidArgument(
+            "The name of input and output of all_reduce op should be equal, "
+            "but got input is %s and output is %s.",
+            in_var_handles[i]->name(), out_var_handles[i]->name()));
   }

   std::vector<std::string> grad_var_names;
@@ -122,7 +163,9 @@ void AllReduceOpHandle::AllReduceFunc(
     const std::vector<std::string> &out_var_names) {
   if (is_gpu_place(places[0])) {
 #if defined(PADDLE_WITH_NCCL)
-    PADDLE_ENFORCE_NOT_NULL(nccl_ctxs_, "nccl_ctxs should not be nullptr.");
+    PADDLE_ENFORCE_NOT_NULL(nccl_ctxs_,
+                            platform::errors::InvalidArgument(
+                                "The nccl context should not be NULL."));
     ncclDataType_t nccl_dtype = platform::ToNCCLDataType(dtype);
     std::vector<std::function<void()>> all_reduce_calls;
     for (size_t i = 0; i < local_exec_scopes_.size(); ++i) {
@@ -134,7 +177,8 @@ void AllReduceOpHandle::AllReduceFunc(
     }
     NCCLAllReduceFunc(all_reduce_calls);
 #else
-    PADDLE_THROW("Not compiled with CUDA.");
+    PADDLE_THROW(
+        platform::errors::PreconditionNotMet("Not compiled with CUDA."));
 #endif
   } else {  // Special handle CPU only Operator's gradient. Like CRF
     auto &trg = *local_exec_scopes_[0]
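Every hunk in this file applies the same migration: a PADDLE_ENFORCE_* call with a bare message string becomes one carrying a typed platform::errors payload plus the offending values. A minimal free-standing sketch of the resulting shape (this helper is illustrative, not commit code; it assumes paddle/fluid/platform/enforce.h):

#include "paddle/fluid/platform/enforce.h"

// Before the migration, the third argument was a bare printf-style string:
//   PADDLE_ENFORCE_EQ(num_places, num_scopes, "sizes must match");
// After, it is a typed error object with a detailed, value-carrying message.
void CheckScopeCount(size_t num_places, size_t num_scopes) {
  PADDLE_ENFORCE_EQ(num_places, num_scopes,
                    paddle::platform::errors::InvalidArgument(
                        "The number of places and the number of local scopes "
                        "should be equal, but got %d and %d.",
                        num_places, num_scopes));
}

The typed class (InvalidArgument, NotFound, PreconditionNotMet, ...) lets callers and logs distinguish error categories instead of parsing message text.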
paddle/fluid/framework/details/async_ssa_graph_executor.cc

@@ -89,8 +89,19 @@ AsyncSSAGraphExecutor::AsyncSSAGraphExecutor(
       places_(std::move(places)),
       graphs_(std::move(graphs)) {
   VLOG(3) << "build AsyncSSAGraphExecutor";
-  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
-  PADDLE_ENFORCE_EQ(local_scopes_.size(), local_exec_scopes_.size());
+  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size(),
+                    platform::errors::InvalidArgument(
+                        "The number of places and the number of local scopes "
+                        "should be equal, but got number of places is %d and "
+                        "number of local scopes is %d.",
+                        places_.size(), local_scopes_.size()));
+  PADDLE_ENFORCE_EQ(
+      local_scopes_.size(), local_exec_scopes_.size(),
+      platform::errors::InvalidArgument(
+          "The number of local scopes and the number of local execution scopes "
+          "should be equal, but got number of local scopes is %d and "
+          "number of local execution scopes is %d.",
+          local_scopes_.size(), local_exec_scopes_.size()));

   // set the correct size of thread pool to each device.
   strategy_.num_threads_ = strategy_.num_threads_ < places_.size()
paddle/fluid/framework/details/build_strategy.h

@@ -19,6 +19,7 @@
 #include <unordered_set>
 #include <utility>
 #include <vector>
+
 #include "boost/optional.hpp"
 #include "paddle/fluid/framework/ir/pass_builder.h"
 #include "paddle/fluid/framework/program_desc.h"
@@ -119,6 +120,9 @@ struct BuildStrategy {
   // Turn on inplace by default.
   bool enable_inplace_{true};

+  // Turn off inplace addto by default.
+  bool enable_addto_{false};
+
   // FIXME(zcd): is_distribution_ is a temporary field, because in pserver mode,
   // num_trainers is 1, so the current fields of build_strategy doesn't tell if
   // it's distributed model.
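The new enable_addto_ field is the switch that the inplace_addto_op_pass added in this commit keys on. A hedged sketch of flipping it on a BuildStrategy instance (the surrounding setup and how the strategy is then handed to an executor are assumed, not shown in this commit):

#include "paddle/fluid/framework/details/build_strategy.h"

// Illustrative only: BuildStrategy is a plain struct, so the new flag is
// set directly; it defaults to false as declared above.
void EnableInplaceAddto(paddle::framework::details::BuildStrategy* strategy) {
  strategy->enable_addto_ = true;    // opt in to inplace addto
  strategy->enable_inplace_ = true;  // general inplace reuse stays on by default
}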
paddle/fluid/framework/details/fast_threaded_ssa_graph_executor.cc

@@ -12,12 +12,14 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 #include "paddle/fluid/framework/details/fast_threaded_ssa_graph_executor.h"
+
 #include <deque>
 #include <memory>
 #include <string>
 #include <unordered_map>
 #include <unordered_set>
 #include <vector>
+
 #include "paddle/fluid/framework/details/computation_op_handle.h"
 #include "paddle/fluid/framework/details/fetch_async_op_handle.h"
 #include "paddle/fluid/framework/details/multi_devices_helper.h"
@@ -48,7 +50,9 @@ FastThreadedSSAGraphExecutor::FastThreadedSSAGraphExecutor(
       bootstrap_ops_.emplace_back(op);
     }
   }
-  PADDLE_ENFORCE_GT(op_deps_.size(), 0, "The graph doesn't have operators.");
+  PADDLE_ENFORCE_GT(op_deps_.size(), 0,
+                    platform::errors::PreconditionNotMet(
+                        "The graph doesn't have operators."));
   PrepareAtomicOpDeps();
 }
paddle/fluid/framework/details/fetch_op_handle.cc

@@ -13,9 +13,11 @@
 // limitations under the License.
 #include "paddle/fluid/framework/details/fetch_op_handle.h"
+
 #include <string>
 #include <utility>
 #include <vector>
+
 #include "paddle/fluid/platform/profiler.h"

 namespace paddle {
@@ -138,8 +140,10 @@ void FetchOpHandle::RunImpl() {
     auto *var_handle = static_cast<VarHandle *>(inputs_[i]);
     auto &scope = scopes.at(var_handle->scope_idx());
     auto *var = scope->FindVar(var_handle->name());
-    PADDLE_ENFORCE_NOT_NULL(var, "Cannot find variable %s in execution scope",
-                            var_handle->name());
+    PADDLE_ENFORCE_NOT_NULL(
+        var, platform::errors::NotFound(
+                 "Cannot find variable %s in execution scope.",
+                 var_handle->name()));

     if (var->IsType<LoDTensor>()) {
       auto &t = var->Get<framework::LoDTensor>();
paddle/fluid/framework/details/op_handle_base.cc

@@ -12,6 +12,7 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 #include "paddle/fluid/framework/details/op_handle_base.h"
+
 #include <map>
 #include <unordered_set>
@@ -88,6 +89,12 @@ void OpHandleBase::Run(bool use_cuda) {
   PADDLE_ENFORCE(!use_cuda);
 #endif

+  // skip running current op, used with inplace_addto_op_pass
+  if (skip_running_) {
+    VLOG(4) << "skip running: " << Name();
+    return;
+  }
+
   RunImpl();
 }
...
paddle/fluid/framework/details/op_handle_base.h
浏览文件 @
f52c4f8b
...
@@ -18,6 +18,7 @@
...
@@ -18,6 +18,7 @@
#include <unordered_map>
#include <unordered_map>
#include <unordered_set>
#include <unordered_set>
#include <vector>
#include <vector>
#include "paddle/fluid/framework/details/var_handle.h"
#include "paddle/fluid/framework/details/var_handle.h"
#include "paddle/fluid/framework/ir/node.h"
#include "paddle/fluid/framework/ir/node.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/device_context.h"
...
@@ -52,6 +53,10 @@ class OpHandleBase {
...
@@ -52,6 +53,10 @@ class OpHandleBase {
virtual
Priority
GetPriority
()
const
{
return
kNormal
;
}
virtual
Priority
GetPriority
()
const
{
return
kNormal
;
}
virtual
bool
GetSkipRunning
()
const
{
return
skip_running_
;
}
virtual
void
SetSkipRunning
(
bool
skip_runing
)
{
skip_running_
=
skip_runing
;
}
virtual
std
::
string
Name
()
const
=
0
;
virtual
std
::
string
Name
()
const
=
0
;
void
Run
(
bool
use_cuda
);
void
Run
(
bool
use_cuda
);
...
@@ -131,6 +136,7 @@ class OpHandleBase {
...
@@ -131,6 +136,7 @@ class OpHandleBase {
std
::
map
<
platform
::
Place
,
platform
::
DeviceContext
*>
dev_ctxes_
;
std
::
map
<
platform
::
Place
,
platform
::
DeviceContext
*>
dev_ctxes_
;
std
::
vector
<
Scope
*>
local_exec_scopes_
;
std
::
vector
<
Scope
*>
local_exec_scopes_
;
bool
skip_running_
=
false
;
#ifdef PADDLE_WITH_CUDA
#ifdef PADDLE_WITH_CUDA
std
::
unordered_map
<
int
,
cudaEvent_t
>
events_
;
std
::
unordered_map
<
int
,
cudaEvent_t
>
events_
;
...
...
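Together, the .cc and .h hunks give every op handle a skip switch: a pass sets it, and OpHandleBase::Run() returns early instead of calling RunImpl(). A small sketch of the intended call pattern (the pass-side helper below is a hypothetical name, not from this commit):

// Hypothetical pass-side helper: mark an op whose output is folded into a
// later inplace addto, so the executor skips its RunImpl().
void SkipIfFoldedIntoAddto(paddle::framework::details::OpHandleBase* op,
                           bool folded) {
  op->SetSkipRunning(folded);
  if (op->GetSkipRunning()) {
    VLOG(4) << "will skip running: " << op->Name();
  }
}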
paddle/fluid/framework/details/parallel_ssa_graph_executor.cc

@@ -13,9 +13,11 @@
 // limitations under the License.
 #include "paddle/fluid/framework/details/parallel_ssa_graph_executor.h"
+
 #include <algorithm>
 #include <memory>
 #include <utility>
+
 #include "paddle/fluid/framework/ir/graph_helper.h"

 namespace paddle {
@@ -104,7 +106,12 @@ ParallelSSAGraphExecutor::ParallelSSAGraphExecutor(
       places_(places),
       graphs_(std::move(graphs)),
       feed_status_(places.size(), FeedStatus::kNone) {
-  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
+  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size(),
+                    platform::errors::InvalidArgument(
+                        "The number of places and the number of local scopes "
+                        "should be equal, but got number of places is %d and "
+                        "number of local scopes is %d.",
+                        places_.size(), local_scopes_.size()));
   PADDLE_ENFORCE_EQ(places_.size(), graphs_.size(),
                     platform::errors::InvalidArgument(
paddle/fluid/framework/details/scope_buffered_ssa_graph_executor.cc

@@ -13,10 +13,12 @@
 // limitations under the License.
 #include "paddle/fluid/framework/details/scope_buffered_ssa_graph_executor.h"
+
 #include <stdexcept>
 #include <string>
 #include <utility>
 #include <vector>
+
 #include "paddle/fluid/framework/details/multi_devices_helper.h"
 #include "paddle/fluid/framework/op_registry.h"
 #include "paddle/fluid/framework/variable_helper.h"
@@ -37,7 +39,13 @@ ScopeBufferedSSAGraphExecutor::ScopeBufferedSSAGraphExecutor(
       var_infos_(std::move(var_infos)),
       places_(std::move(places)),
       scope_monitor_(places_, local_exec_scopes_) {
-  PADDLE_ENFORCE_EQ(local_scopes_.size(), local_exec_scopes_.size());
+  PADDLE_ENFORCE_EQ(
+      local_scopes_.size(), local_exec_scopes_.size(),
+      platform::errors::InvalidArgument(
+          "The number of local scopes and the number of local execution scopes "
+          "should be equal, but got number of local scopes is %d and "
+          "number of local execution scopes is %d.",
+          local_scopes_.size(), local_exec_scopes_.size()));
   PrepareLocalExeScopes();
 }
paddle/fluid/framework/details/share_tensor_buffer_functor.cc

@@ -13,9 +13,11 @@
 // limitations under the License.
 #include "paddle/fluid/framework/details/share_tensor_buffer_functor.h"
+
 #include <string>
 #include <unordered_map>
 #include <unordered_set>
+
 #include "paddle/fluid/framework/lod_tensor.h"
 #include "paddle/fluid/framework/selected_rows.h"
 #include "paddle/fluid/platform/enforce.h"
@@ -29,7 +31,8 @@ static inline const Tensor &GetTensorFromVar(const Variable *var) {
   if (var->IsType<LoDTensor>()) {
     return var->Get<LoDTensor>();
   } else {
-    PADDLE_THROW("Variable must be type of LoDTensor");
+    PADDLE_THROW(platform::errors::InvalidArgument(
+        "Variable must be type of LoDTensor."));
   }
 }
@@ -37,20 +40,27 @@ static inline Tensor *GetMutableTensorFromVar(Variable *var) {
   if (var->IsType<LoDTensor>()) {
     return var->GetMutable<LoDTensor>();
   } else {
-    PADDLE_THROW("Variable must be type of LoDTensor");
+    PADDLE_THROW(platform::errors::InvalidArgument(
+        "Variable must be type of LoDTensor."));
   }
 }

 ShareTensorBufferFunctor::ShareTensorBufferFunctor(
     Scope *scope, size_t scope_idx, const std::string &op_type,
     const std::vector<const ir::MemOptVarInfo *> &in_var_infos,
-    const std::vector<std::string> &out_var_names)
+    const std::vector<std::string> &out_var_names, bool share_dims)
     : scope_(scope),
       scope_idx_(scope_idx),
       op_type_(op_type),
       in_var_infos_(in_var_infos),
-      out_var_names_(out_var_names) {
-  PADDLE_ENFORCE_EQ(in_var_infos_.size(), out_var_names_.size());
+      out_var_names_(out_var_names),
+      share_dims_(share_dims) {
+  PADDLE_ENFORCE_EQ(in_var_infos_.size(), out_var_names_.size(),
+                    platform::errors::PreconditionNotMet(
+                        "The number of input variables and output variables "
+                        "should be equal, but got number of input variables is "
+                        "%d and number of output variables is %d.",
+                        in_var_infos_.size(), out_var_names_.size()));
   for (size_t i = 0; i < in_var_infos_.size(); ++i) {
     AddReuseVarPair(in_var_infos_[i], out_var_names_[i]);
   }
@@ -67,32 +77,59 @@ ShareTensorBufferFunctor::ReusedVars() const {
 void ShareTensorBufferFunctor::AddReuseVarPair(
     const ir::MemOptVarInfo *in_var_info, const std::string &out_var_name) {
-  PADDLE_ENFORCE_NOT_NULL(in_var_info, "in_var_info cannot be nullptr");
+  PADDLE_ENFORCE_NOT_NULL(
+      in_var_info,
+      platform::errors::InvalidArgument(
+          "The input variables to be inplaced should not be NULL."));
   PADDLE_ENFORCE_NE(in_var_info->Name(), out_var_name,
-                    "in/out cannot have same name: %s", out_var_name);
+                    platform::errors::InvalidArgument(
+                        "The input variable and output variable to be inplaced "
+                        "cannot have the same name: %s.",
+                        out_var_name));
   in_var_infos_.emplace_back(in_var_info);
   out_var_names_.emplace_back(out_var_name);
 }

 void ShareTensorBufferFunctor::CallOnce() {
-  PADDLE_ENFORCE(in_out_vars_.empty(), "in_out_vars_ must be initialized here");
+  PADDLE_ENFORCE(in_out_vars_.empty(),
+                 platform::errors::InvalidArgument(
+                     "The input-output variable pairs to be "
+                     "inplaced should be initialized here."));
   for (size_t i = 0; i < in_var_infos_.size(); ++i) {
     auto *in_var = exec_scope_->FindVar(in_var_infos_[i]->Name());
     auto *out_var = exec_scope_->FindVar(out_var_names_[i]);
-    PADDLE_ENFORCE_NOT_NULL(in_var);
-    PADDLE_ENFORCE_NOT_NULL(out_var);
-    PADDLE_ENFORCE_NE(in_var, out_var);
+    PADDLE_ENFORCE_NOT_NULL(
+        in_var, platform::errors::NotFound(
+                    "The input variable(%s)to be inplaced should not be NULL.",
+                    in_var_infos_[i]->Name()));
+    PADDLE_ENFORCE_NOT_NULL(
+        out_var,
+        platform::errors::NotFound(
+            "The output variable(%s) to be inplaced should not be NULL.",
+            out_var_names_[i]));
+    PADDLE_ENFORCE_NE(
+        in_var, out_var,
+        platform::errors::PreconditionNotMet(
+            "The input variable and output variable to be inplaced "
+            "cannot be the same variable(%s).",
+            out_var_names_[i]));
     in_out_vars_.emplace_back(in_var, out_var);
   }
 }

 void ShareTensorBufferFunctor::operator()(Scope *exec_scope) {
   if (!exec_scope_) {
-    PADDLE_ENFORCE_NOT_NULL(exec_scope);
+    PADDLE_ENFORCE_NOT_NULL(
+        exec_scope,
+        platform::errors::InvalidArgument(
+            "The given execution scope should not be NULL "
+            "if the cached scope is NULL."));
     exec_scope_ = exec_scope;
     CallOnce();
   } else {
-    PADDLE_ENFORCE(exec_scope_ == exec_scope, "Scope must be the same");
+    PADDLE_ENFORCE_EQ(exec_scope_, exec_scope,
+                      platform::errors::InvalidArgument(
+                          "The given execution scope and the cached execution "
+                          "scope should be the same."));
   }

   for (size_t i = 0; i < in_var_infos_.size(); ++i) {
@@ -115,6 +152,13 @@ void ShareTensorBufferFunctor::operator()(Scope *exec_scope) {
     } else {
       out_tensor->ShareBufferWith(in_tensor);

+      // NOTE(zhiqiu): In the case of inplace addto, if the operator of
+      // the in_out_vars is skipped during running, we should set the dims of
+      // output as the same as input.
+      if (share_dims_) {
+        out_tensor->Resize(in_tensor.dims());
+      }
+
       VLOG(2) << "Share tensor buffer when running " << op_type_ << " : "
               << in_var_info->Name() << " -> " << out_var_names_[i];
     }
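The share_dims_ branch exists because a skipped producer never runs the code that would normally set its output's shape, so aliasing the buffer alone would leave stale metadata. A standalone miniature of the idea (toy types, not Paddle's Tensor API):

#include <cstdint>
#include <vector>

// Toy stand-ins: sharing only the buffer leaves the output's dims stale
// when its producer op is skipped, so the dims are copied as well.
struct ToyTensor {
  float* data = nullptr;
  std::vector<int64_t> dims;
  void ShareBufferWith(const ToyTensor& other) { data = other.data; }
  void Resize(const std::vector<int64_t>& d) { dims = d; }
};

void ShareBuffer(const ToyTensor& in, ToyTensor* out, bool share_dims) {
  out->ShareBufferWith(in);
  if (share_dims) {
    out->Resize(in.dims);  // keep shape metadata consistent with the buffer
  }
}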
paddle/fluid/framework/details/share_tensor_buffer_functor.h

@@ -19,6 +19,7 @@
 #include <unordered_set>
 #include <utility>
 #include <vector>
+
 #include "paddle/fluid/framework/details/op_handle_base.h"
 #include "paddle/fluid/framework/ir/memory_optimize_pass/memory_optimization_var_info.h"
 #include "paddle/fluid/framework/scope.h"
@@ -40,11 +41,13 @@ class ShareTensorBufferFunctor {
   ShareTensorBufferFunctor(
       Scope *scope, size_t scope_idx, const std::string &op_type,
       const std::vector<const ir::MemOptVarInfo *> &in_var_infos,
-      const std::vector<std::string> &out_var_names);
+      const std::vector<std::string> &out_var_names, bool share_dims = false);

   void AddReuseVarPair(const ir::MemOptVarInfo *in_var_info,
                        const std::string &out_var_name);

+  void SetShareDims(bool share_dims) { share_dims_ = share_dims; }
+
   void operator()(Scope *exec_scope);

   std::unordered_map<std::string, std::string> ReusedVars() const;
@@ -66,6 +69,11 @@ class ShareTensorBufferFunctor {
   std::vector<std::string> out_var_names_;

   std::vector<std::pair<const Variable *, Variable *>> in_out_vars_;
+
+  // NOTE(zhiqiu): In the case of inplace addto, if the operator of
+  // the in_out_vars is skipped during running, we should set the dims of output
+  // as the same as input.
+  bool share_dims_{false};
 };

 }  // namespace details
paddle/fluid/framework/details/share_tensor_buffer_op_handle.cc
浏览文件 @
f52c4f8b
...
@@ -13,8 +13,10 @@
...
@@ -13,8 +13,10 @@
// limitations under the License.
// limitations under the License.
#include "paddle/fluid/framework/details/share_tensor_buffer_op_handle.h"
#include "paddle/fluid/framework/details/share_tensor_buffer_op_handle.h"
#include <string>
#include <string>
#include <unordered_set>
#include <unordered_set>
#include "paddle/fluid/framework/ir/memory_optimize_pass/memory_optimization_var_info.h"
#include "paddle/fluid/framework/ir/memory_optimize_pass/memory_optimization_var_info.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/framework/scope.h"
...
@@ -32,26 +34,35 @@ ComputationOpHandle *GetUniquePendingComputationOpHandle(
...
@@ -32,26 +34,35 @@ ComputationOpHandle *GetUniquePendingComputationOpHandle(
for
(
ir
::
Node
*
pending_op
:
out_var
->
outputs
)
{
for
(
ir
::
Node
*
pending_op
:
out_var
->
outputs
)
{
auto
&
op
=
pending_op
->
Wrapper
<
OpHandleBase
>
();
auto
&
op
=
pending_op
->
Wrapper
<
OpHandleBase
>
();
auto
*
compute_op
=
dynamic_cast
<
ComputationOpHandle
*>
(
&
op
);
auto
*
compute_op
=
dynamic_cast
<
ComputationOpHandle
*>
(
&
op
);
PADDLE_ENFORCE_NOT_NULL
(
compute_op
);
PADDLE_ENFORCE_NOT_NULL
(
compute_op
,
platform
::
errors
::
PreconditionNotMet
(
"The pending OpHandle should be ComputationOpHandle."
));
if
(
result_op
==
nullptr
)
{
if
(
result_op
==
nullptr
)
{
result_op
=
compute_op
;
result_op
=
compute_op
;
}
else
{
}
else
{
PADDLE_ENFORCE_EQ
(
result_op
,
compute_op
);
PADDLE_ENFORCE_EQ
(
result_op
,
compute_op
,
platform
::
errors
::
PreconditionNotMet
(
"The pending OpHandle should be the unique one."
));
}
}
}
}
}
}
PADDLE_ENFORCE_NOT_NULL
(
result_op
);
PADDLE_ENFORCE_NOT_NULL
(
result_op
,
platform
::
errors
::
PreconditionNotMet
(
"The pending OpHandle should not be NULL."
));
return
result_op
;
return
result_op
;
}
}
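The function above collapses all pending ops of a variable into a single ComputationOpHandle and fails loudly both on absence and on ambiguity. A hedged, generic sketch of the same "find the unique element or abort" idiom, using toy types rather than Paddle's op handles:

#include <stdexcept>
#include <vector>

// Returns the single value all entries agree on; throws otherwise.
template <typename T>
T GetUniqueElement(const std::vector<T>& xs) {
  if (xs.empty()) throw std::runtime_error("no pending element found");
  for (const T& x : xs) {
    if (x != xs.front()) throw std::runtime_error("pending element is not unique");
  }
  return xs.front();
}

// usage: int op = GetUniqueElement(std::vector<int>{7, 7, 7});  // ok
//        GetUniqueElement(std::vector<int>{7, 8});              // throws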
ShareTensorBufferOpHandle::ShareTensorBufferOpHandle(
    ir::Node *node, Scope *scope, size_t scope_idx, const std::string &op_type,
    const std::vector<const ir::MemOptVarInfo *> &in_var_infos,
    const std::vector<std::string> &out_var_names, bool share_dims)
    : OpHandleBase(node),
      functor_(scope, scope_idx, op_type, in_var_infos, out_var_names,
               share_dims) {}

std::unordered_map<std::string, std::string>
ShareTensorBufferOpHandle::ReusedVars() const {
...
@@ -63,6 +74,10 @@ void ShareTensorBufferOpHandle::AddReuseVarPair(
  functor_.AddReuseVarPair(in_var_info, out_var_name);
}

void ShareTensorBufferOpHandle::SetShareDims(bool share_dims) {
  functor_.SetShareDims(share_dims);
}

void ShareTensorBufferOpHandle::InitCUDA() {
#ifdef PADDLE_WITH_CUDA
  int dev_id =
...
paddle/fluid/framework/details/share_tensor_buffer_op_handle.h
...
@@ -17,6 +17,7 @@
#include <unordered_map>
#include <utility>
#include <vector>

#include "paddle/fluid/framework/details/computation_op_handle.h"
#include "paddle/fluid/framework/details/op_handle_base.h"
#include "paddle/fluid/framework/details/share_tensor_buffer_functor.h"
...
@@ -31,7 +32,7 @@ class ShareTensorBufferOpHandle : public OpHandleBase {
      ir::Node *node, Scope *scope, size_t scope_idx,
      const std::string &op_type,
      const std::vector<const ir::MemOptVarInfo *> &in_vars_infos,
      const std::vector<std::string> &out_var_names, bool share_dims = false);

  std::unordered_map<std::string, std::string> ReusedVars() const;
...
@@ -42,6 +43,8 @@ class ShareTensorBufferOpHandle : public OpHandleBase {
  void AddReuseVarPair(const ir::MemOptVarInfo *in_var_info,
                       const std::string &out_var_name);

  void SetShareDims(bool share_dims);

  const ShareTensorBufferFunctor &Functor() const { return functor_; }

 protected:
...
paddle/fluid/framework/details/ssa_graph_executor.cc
...
@@ -13,6 +13,7 @@
// limitations under the License.

#include "paddle/fluid/framework/details/ssa_graph_executor.h"

#include "paddle/fluid/framework/details/fetch_async_op_handle.h"

namespace paddle {
...
@@ -27,8 +28,9 @@ void ClearFetchOp(ir::Graph* graph, std::vector<OpHandleBase*>* fetch_ops) {
    PADDLE_ENFORCE_EQ(dynamic_cast<FetchOpHandle *>(op) != nullptr ||
                          dynamic_cast<FetchAsyncOpHandle *>(op) != nullptr,
                      true,
                      platform::errors::PreconditionNotMet(
                          "The input ops of ClearFetchOp function should be "
                          "FetchOpHandle or FetchAsyncOpHandle."));
    for (auto &out_var : op->Node()->outputs) {
      graph->RemoveNode(out_var);
    }
...
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc
...
@@ -13,6 +13,7 @@
// limitations under the License.

#include "paddle/fluid/framework/details/threaded_ssa_graph_executor.h"

#include "paddle/fluid/framework/ir/graph_helper.h"
#include "paddle/fluid/platform/profiler.h"
...
@@ -138,7 +139,10 @@ inline FetchResultType ThreadedSSAGraphExecutor::RunImpl(
      }
    }
    PADDLE_ENFORCE_EQ(ready_ops.empty(), true,
                      platform::errors::Fatal(
                          "After the execution of computation graph, "
                          "there are unexecuted operators left."));
  }

  // Wait FetchOps.
...
@@ -165,9 +169,8 @@ void ThreadedSSAGraphExecutor::InsertFetchOps(
    FetchResultType *fetch_data, bool return_merged) {
  std::unordered_map<std::string, std::vector<VarHandleBase *>> fetched_vars;
  std::unordered_set<VarHandleBase *> local_ready_vars;
  std::unordered_set<std::string> fetch_tensor_set(fetch_tensors.begin(),
                                                   fetch_tensors.end());
  for (auto &fetch_var_name : fetch_tensor_set) {
    for (auto &var_map : graph_->Get<details::GraphVars>(details::kGraphVars)) {
      auto it = var_map.find(fetch_var_name);
      if (it != var_map.end()) {
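The hunk above switches the iteration from the raw fetch_tensors list to fetch_tensor_set, so a variable requested twice is only wired up with one fetch op. A small stand-alone illustration of that de-duplication step (the names here are illustrative, not Paddle's):

#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

int main() {
  std::vector<std::string> fetch_tensors = {"loss", "acc", "loss"};  // duplicate request
  std::unordered_set<std::string> fetch_tensor_set(fetch_tensors.begin(),
                                                   fetch_tensors.end());
  // Each distinct name is processed exactly once, mirroring InsertFetchOps.
  for (const auto& name : fetch_tensor_set) std::cout << name << "\n";
  return 0;  // prints two names, not three
}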
...
@@ -231,7 +234,11 @@ void ThreadedSSAGraphExecutor::InsertFetchOps(
      ready_ops->insert(static_cast<OpHandleBase *>(op));
    }
  }
  PADDLE_ENFORCE_EQ(local_ready_vars.size(), 0,
                    platform::errors::Fatal(
                        "The number of ready variables should be 0, but got %d.",
                        local_ready_vars.size()));
}

void ThreadedSSAGraphExecutor::InsertPendingOp(
...
@@ -277,7 +284,9 @@ void ThreadedSSAGraphExecutor::PrepareOpDeps() {
    }
  }
  op_deps_->num_ops_ = ready_ops.size() + pending_ops.size();
  PADDLE_ENFORCE_GT(op_deps_->num_ops_, 0,
                    platform::errors::InvalidArgument(
                        "The graph doesn't have operators."));
  for (auto ready_var : ready_vars) {
    pending_vars.erase(ready_var);
...
paddle/fluid/framework/details/threaded_ssa_graph_executor.h
...
@@ -14,6 +14,8 @@
#pragma once

#include <ThreadPool.h>  // ThreadPool in third party

#include <deque>
#include <functional>
#include <list>
...
@@ -24,8 +26,6 @@
#include <utility>
#include <vector>

#include "paddle/fluid/framework/blocking_queue.h"
#include "paddle/fluid/framework/details/exception_holder.h"
#include "paddle/fluid/framework/details/execution_strategy.h"
...
paddle/fluid/framework/details/var_handle.h
...
@@ -54,8 +54,10 @@ struct VarHandleBase {
  void AddOutput(OpHandleBase* out, ir::Node* node) {
    if (pending_ops_.find(out) == pending_ops_.end()) {
      PADDLE_ENFORCE_NOT_NULL(
          out, platform::errors::InvalidArgument(
                   "The output added to VarHandle %s is NULL.",
                   this->Node()->Name()));
      pending_ops_.insert(out);
      node_->outputs.push_back(node);
    }
...
@@ -120,7 +122,10 @@ struct VarHandle : public VarHandleBase {
  bool HasEvent() { return has_event_; }

  const cudaEvent_t& GetEvent() {
    PADDLE_ENFORCE_EQ(
        HasEvent(), true,
        platform::errors::PreconditionNotMet(
            "The cuda event is not set, maybe InitCUDA() is not called."));
    return event_;
  }
...
paddle/fluid/framework/details/variable_visitor.cc
...
@@ -13,6 +13,7 @@
// limitations under the License.

#include "paddle/fluid/framework/details/variable_visitor.h"

#include "paddle/fluid/framework/selected_rows.h"

namespace paddle {
namespace framework {
...
@@ -24,7 +25,9 @@ static void VisitVariable(Variable* var, Func* func) {
  } else if (var->IsType<SelectedRows>()) {
    (*func)(var->GetMutable<SelectedRows>());
  } else {
    PADDLE_THROW(platform::errors::Unimplemented(
        "VisitVariable is not supported for type %s.",
        ToTypeName(var->Type())));
  }
}
...
@@ -35,7 +38,8 @@ static void VisitVariable(const Variable& var, Func* func) {
  } else if (var.IsType<SelectedRows>()) {
    (*func)(var.Get<SelectedRows>());
  } else {
    PADDLE_THROW(platform::errors::Unimplemented(
        "VisitVariable is not supported for type %s.",
        ToTypeName(var.Type())));
  }
}
...
@@ -50,7 +54,8 @@ struct TensorVisitor {
  template <typename T>
  void operator()() {
    PADDLE_THROW(platform::errors::Unimplemented(
        "Getting tensor from type %s is not supported.", typeid(T).name()));
  }
};
...
@@ -78,8 +83,8 @@ struct ShareDimsAndLoDVisitor {
  template <typename T>
  void operator()(const T&) {
    PADDLE_THROW(platform::errors::Unimplemented(
        "ShareDimsAndLoD is not supported for type %s.", typeid(T).name()));
  }
};
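The visitors in this file all share one shape: dispatch on the concrete variable type, with a catch-all template overload that raises Unimplemented. A minimal sketch of that dispatch pattern in plain C++ (std::variant instead of Paddle's Variable; purely illustrative):

#include <stdexcept>
#include <string>
#include <typeinfo>
#include <variant>

struct DenseTensor { /* ... */ };
struct SparseRows  { /* ... */ };
using Var = std::variant<DenseTensor, SparseRows, int>;

struct Visitor {
  void operator()(const DenseTensor&) { /* handle the dense case */ }
  void operator()(const SparseRows&)  { /* handle the sparse case */ }
  template <typename T>
  void operator()(const T&) {  // catch-all mirrors PADDLE_THROW(Unimplemented(...))
    throw std::runtime_error(std::string("visit not supported for ") +
                             typeid(T).name());
  }
};

// usage: std::visit(Visitor{}, Var{DenseTensor{}});  // ok
//        std::visit(Visitor{}, Var{42});             // throws via the catch-all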
...
@@ -89,42 +94,54 @@ void VariableVisitor::ShareDimsAndLoD(const Variable& src, Variable* trg) {
}

struct EnforceShapeAndDTypeEQVisitor {
  const Variable* dst_;

  void operator()(const LoDTensor& src) {
    auto& tensor = dst_->Get<LoDTensor>();
    PADDLE_ENFORCE_EQ(
        src.place().which(), tensor.place().which(),
        platform::errors::PreconditionNotMet(
            "The place type of the two variables is not equal."));
    PADDLE_ENFORCE_EQ(src.type(), tensor.type(),
                      platform::errors::PreconditionNotMet(
                          "The dtype of the two variables is not equal."));
    PADDLE_ENFORCE_EQ(src.dims(), tensor.dims(),
                      platform::errors::PreconditionNotMet(
                          "The dims of the two variables is not equal."));
    PADDLE_ENFORCE_EQ(src.lod(), tensor.lod(),
                      platform::errors::PreconditionNotMet(
                          "The lod of the two variables is not equal."));
    PADDLE_ENFORCE_EQ(src.layout(), tensor.layout(),
                      platform::errors::PreconditionNotMet(
                          "The layout of the two variables' tensors is not equal."));
  }

  void operator()(const SelectedRows& src) {
    auto& selected_rows = dst_->Get<SelectedRows>();
    PADDLE_ENFORCE_EQ(
        src.place().which(), selected_rows.place().which(),
        platform::errors::PreconditionNotMet(
            "The place type of the two variables is not equal."));
    PADDLE_ENFORCE_EQ(src.value().type(), selected_rows.value().type(),
                      platform::errors::PreconditionNotMet(
                          "The dtype of the two variables is not equal."));
    PADDLE_ENFORCE_EQ(src.value().layout(), selected_rows.value().layout(),
                      platform::errors::PreconditionNotMet(
                          "The layout of the two variables' tensors is not equal."));
    PADDLE_ENFORCE_EQ(src.height(), selected_rows.height(),
                      platform::errors::PreconditionNotMet(
                          "The height of the two variables is not equal."));
    PADDLE_ENFORCE_EQ(src.GetCompleteDims(), selected_rows.GetCompleteDims(),
                      platform::errors::PreconditionNotMet(
                          "The dims of the two variables is not equal."));
  }

  template <typename T>
  void operator()(const T&) {
    PADDLE_THROW(platform::errors::Unimplemented(
        "EnforceShapeAndDTypeEQ is not supported for type %s.",
        typeid(T).name()));
  }
};
...
paddle/fluid/framework/fleet/gloo_wrapper.cc
...
@@ -19,6 +19,8 @@ limitations under the License. */
namespace gloo {
namespace rendezvous {

constexpr int kNodeSize = 136;

HdfsStore::HdfsStore(const std::string& path) {
  path_ = path;
  wait_sleep_ms_ = 10000;
...
@@ -213,12 +215,14 @@ void ParallelConnectContext::connectFullMesh(
  storeKey << rank;
  store.set(storeKey.str(), allBytes);

  auto total_add_size = kNodeSize * (size - 1);

  std::vector<std::shared_ptr<std::thread>> connect_threads(thread_num_);
  // Connect every pair
  for (uint32_t i = 0; i < connect_threads.size(); ++i) {
    connect_threads[i].reset(new std::thread(
        [&store, &transportContext, total_add_size, this](
            size_t thread_idx, size_t thread_num) -> void {
          for (int i = thread_idx; i < size; i += thread_num) {
            if (i == rank) {
              continue;
...
@@ -226,8 +230,23 @@ void ParallelConnectContext::connectFullMesh(
            // Wait for address of other side of this pair to become available
            std::string key = std::to_string(i);
            store.wait({key}, getTimeout());

            std::vector<char> allAddrs;
            auto max_retry_times = 5;
            // Connect to other side of this pair
            while (max_retry_times > 0) {
              allAddrs = store.get(key);
              VLOG(3) << "store get all address size: " << allAddrs.size()
                      << " expect: " << total_add_size;
              if (allAddrs.size() == static_cast<size_t>(total_add_size)) {
                break;
              }
              --max_retry_times;
            }

            auto addr = extractAddress(allAddrs, i);
            transportContext->getPair(i)->connect(addr);
          }
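The rendezvous fix above polls the store until all peer addresses have landed instead of trusting the first get(). As a worked check of the expected payload: with kNodeSize = 136 and size = 4 ranks, total_add_size = 136 * (4 - 1) = 408 bytes. A self-contained sketch of the same bounded-retry idiom (the fetch callback is a stand-in, not gloo's API):

#include <cstddef>
#include <functional>
#include <vector>

// Retries fetch() until the payload reaches expected_size or attempts run out.
std::vector<char> GetWithRetry(const std::function<std::vector<char>()>& fetch,
                               size_t expected_size, int max_retry_times = 5) {
  std::vector<char> data;
  while (max_retry_times > 0) {
    data = fetch();
    if (data.size() == expected_size) break;  // all peers have written their address
    --max_retry_times;
  }
  return data;  // caller still validates the result, as connectFullMesh does
}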
...
paddle/fluid/framework/ir/conv_affine_channel_fuse_pass.cc
...
@@ -18,6 +18,7 @@
#include <string>
#include <vector>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_version_registry.h"
#include "paddle/fluid/operators/math/cpu_vec.h"
#include "paddle/fluid/platform/enforce.h"
...
@@ -225,3 +226,14 @@ REGISTER_PASS(conv_affine_channel_fuse_pass,
              paddle::framework::ir::ConvAffineChannelFusePass);
REGISTER_PASS(conv_eltwiseadd_affine_channel_fuse_pass,
              paddle::framework::ir::ConvEltwiseAddAffineChannelFusePass);
REGISTER_PASS_CAPABILITY(conv_affine_channel_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("conv2d", 0)
            .EQ("affine_channel", 0));
REGISTER_PASS_CAPABILITY(conv_eltwiseadd_affine_channel_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("conv2d", 0)
            .EQ("elementwise_add", 0)
            .EQ("affine_channel", 0));
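This REGISTER_PASS_CAPABILITY pattern, repeated for every fuse pass below, declares which op versions the pass was written against; a combination only matches when every listed op is at the stated version. A toy model of the EQ-combination semantics (simplified types, not Paddle's compatible:: API):

#include <map>
#include <string>

// Toy version table and an EQ-only combination check.
struct OpVersionCombination {
  std::map<std::string, int> required;  // op name -> exact version
  OpVersionCombination& EQ(const std::string& op, int version) {
    required[op] = version;
    return *this;  // chaining mirrors .EQ(...).EQ(...) above
  }
  bool Matches(const std::map<std::string, int>& actual) const {
    for (const auto& kv : required) {
      auto it = actual.find(kv.first);
      if (it == actual.end() || it->second != kv.second) return false;
    }
    return true;
  }
};

// e.g. OpVersionCombination().EQ("conv2d", 0).EQ("affine_channel", 0)
//        .Matches({{"conv2d", 0}, {"affine_channel", 0}}) == true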
paddle/fluid/framework/ir/conv_bn_fuse_pass.cc
...
@@ -18,6 +18,7 @@
#include <string>
#include <vector>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_version_registry.h"
#include "paddle/fluid/operators/math/cpu_vec.h"
#include "paddle/fluid/platform/enforce.h"
...
@@ -372,3 +373,14 @@ REGISTER_PASS(depthwise_conv_bn_fuse_pass,
              paddle::framework::ir::DepthwiseConvBNFusePass);
REGISTER_PASS(depthwise_conv_eltwiseadd_bn_fuse_pass,
              paddle::framework::ir::DepthwiseConvEltwiseAddBNFusePass);
REGISTER_PASS_CAPABILITY(conv_bn_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("conv2d", 0)
            .EQ("batch_norm", 0));
REGISTER_PASS_CAPABILITY(conv_eltwiseadd_bn_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("conv2d", 0)
            .EQ("elementwise_add", 0)
            .EQ("batch_norm", 0));

paddle/fluid/framework/ir/conv_elementwise_add2_act_fuse_pass.cc
...
@@ -11,9 +11,9 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "paddle/fluid/framework/ir/conv_elementwise_add2_act_fuse_pass.h"
#include <string>
#include "paddle/fluid/framework/op_version_registry.h"

namespace paddle {
namespace framework {
...
@@ -116,3 +116,10 @@ void ConvElementwiseAdd2ActFusePass::ApplyImpl(ir::Graph* graph) const {
REGISTER_PASS(conv_elementwise_add2_act_fuse_pass,
              paddle::framework::ir::ConvElementwiseAdd2ActFusePass);
REGISTER_PASS_CAPABILITY(conv_elementwise_add2_act_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("conv2d", 0)
            .EQ("elementwise_add", 0)
            .EQ("relu", 0)
            .EQ("identity", 0));

paddle/fluid/framework/ir/conv_elementwise_add_act_fuse_pass.cc
...
@@ -15,6 +15,7 @@
#include "paddle/fluid/framework/ir/conv_elementwise_add_act_fuse_pass.h"
#include <string>
#include "paddle/fluid/framework/ir/graph_viz_pass.h"
#include "paddle/fluid/framework/op_version_registry.h"

namespace paddle {
namespace framework {
...
@@ -102,3 +103,10 @@ void ConvElementwiseAddActFusePass::ApplyImpl(ir::Graph* graph) const {
REGISTER_PASS(conv_elementwise_add_act_fuse_pass,
              paddle::framework::ir::ConvElementwiseAddActFusePass);
REGISTER_PASS_CAPABILITY(conv_elementwise_add_act_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("conv2d", 0)
            .EQ("elementwise_add", 0)
            .EQ("relu", 0)
            .EQ("identity", 0));

paddle/fluid/framework/ir/conv_elementwise_add_fuse_pass.cc
...
@@ -12,10 +12,10 @@
// See the License for the specific language governing permissions and
// limitations under the License.

#include "paddle/fluid/framework/ir/conv_elementwise_add_fuse_pass.h"
#include <string>
#include "paddle/fluid/framework/ir/graph_viz_pass.h"
#include "paddle/fluid/framework/op_version_registry.h"

namespace paddle {
namespace framework {
...
@@ -89,3 +89,8 @@ void ConvElementwiseAddFusePass::ApplyImpl(ir::Graph* graph) const {
REGISTER_PASS(conv_elementwise_add_fuse_pass,
              paddle::framework::ir::ConvElementwiseAddFusePass);
REGISTER_PASS_CAPABILITY(conv_elementwise_add_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("conv2d", 0)
            .EQ("elementwise_add", 0));

paddle/fluid/framework/ir/embedding_fc_lstm_fuse_pass.cc
...
@@ -23,6 +23,8 @@
#include "paddle/fluid/operators/math/cpu_vec.h"
#include "paddle/fluid/platform/cpu_info.h"
#include "paddle/fluid/framework/op_version_registry.h"

namespace paddle {
namespace framework {
namespace ir {
...
@@ -34,7 +36,7 @@ static int BuildFusion(Graph* graph, const std::string& name_scope,
  // Build pattern
  PDNode* x = pattern->NewNode(patterns::PDNodeName(name_scope, "x"))
                  ->assert_is_op_input("lookup_table_v2")
                  ->assert_var_not_persistable();
  patterns::Embedding embedding_pattern(pattern, name_scope);
  // TODO(jczaja): Intermediate can only be for val that are not used anywhere
...
@@ -256,3 +258,11 @@ REGISTER_PASS(embedding_fc_lstm_fuse_pass,
              paddle::framework::ir::EmbeddingFCLSTMFusePass);
REGISTER_PASS_CAPABILITY(embedding_fc_lstm_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("lookup_table_v2", 0)
            .EQ("mul", 0)
            .EQ("elementwise_add", 0)
            .EQ("lstm", 0)
            .EQ("fused_embedding_fc_lstm", 0));
paddle/fluid/framework/ir/fc_fuse_pass.cc
...
@@ -18,6 +18,7 @@
#include <unordered_set>
#include <vector>

#include "paddle/fluid/framework/ir/graph_helper.h"
#include "paddle/fluid/framework/op_version_registry.h"
#include "paddle/fluid/platform/enforce.h"

namespace paddle {
...
@@ -182,3 +183,10 @@ int FCFusePass::ApplyFCPattern(Graph* graph, bool with_relu) const {
REGISTER_PASS(fc_fuse_pass, paddle::framework::ir::FCFusePass)
    .RequirePassAttr("use_gpu");
REGISTER_PASS_CAPABILITY(fc_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("mul", 0)
            .EQ("elementwise_add", 0)
            .EQ("relu", 0)
            .EQ("fc", 0));
paddle/fluid/framework/ir/fc_gru_fuse_pass.cc
...
@@ -16,6 +16,7 @@
#include <string>
#include <unordered_set>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_version_registry.h"

namespace paddle {
namespace framework {
...
@@ -125,7 +126,6 @@ static int BuildFusion(Graph* graph, const std::string& name_scope,
    auto* x_n = subgraph.at(x);
    GET_IR_NODE_FROM_SUBGRAPH(w, w, fc_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(mul, mul, fc_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(Weight, Weight, gru_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(gru, gru, gru_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(Bias, Bias, gru_pattern);
...
@@ -136,10 +136,17 @@ static int BuildFusion(Graph* graph, const std::string& name_scope,
                              gru_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(BatchHidden, BatchHidden, gru_pattern);

    // TODO(wilber): Support origin_mode=True.
    if (gru->Op()->GetAttrIfExists<bool>("origin_mode") == true) {
      LOG(INFO) << "fc_gru_fuse_pass not supported when origin_mode=True.";
      return;
    }

    if (with_fc_bias) {
      GET_IR_NODE_FROM_SUBGRAPH(mul_out, mul_out, fc_pattern);
      GET_IR_NODE_FROM_SUBGRAPH(fc_bias, bias, fc_pattern);
      GET_IR_NODE_FROM_SUBGRAPH(elementwise_add, elementwise_add, fc_pattern);
      GET_IR_NODE_FROM_SUBGRAPH(fc_out, elementwise_add_out, fc_pattern);

      gru_creater(gru, x_n, w, Weight, Bias, Hidden, fc_bias);
      // Remove unneeded nodes.
...
@@ -188,3 +195,16 @@ void FCGRUFusePass::ApplyImpl(ir::Graph* graph) const {
REGISTER_PASS(mul_gru_fuse_pass, paddle::framework::ir::MulGRUFusePass);
REGISTER_PASS(fc_gru_fuse_pass, paddle::framework::ir::FCGRUFusePass);
REGISTER_PASS_CAPABILITY(mul_gru_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("mul", 0)
            .EQ("gru", 0)
            .EQ("fusion_gru", 0));
REGISTER_PASS_CAPABILITY(fc_gru_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("mul", 0)
            .EQ("elementwise_add", 0)
            .EQ("gru", 0)
            .EQ("fusion_gru", 0));
paddle/fluid/framework/ir/fc_lstm_fuse_pass.cc
...
@@ -16,6 +16,7 @@
#include <string>
#include <unordered_set>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_version_registry.h"

namespace paddle {
namespace framework {
...
@@ -196,3 +197,17 @@ void FCLstmFusePass::ApplyImpl(ir::Graph* graph) const {
REGISTER_PASS(mul_lstm_fuse_pass, paddle::framework::ir::MulLstmFusePass);
REGISTER_PASS(fc_lstm_fuse_pass, paddle::framework::ir::FCLstmFusePass);
REGISTER_PASS_CAPABILITY(fc_lstm_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("mul", 0)
            .EQ("elementwise_add", 0)
            .EQ("lstm", 0)
            .EQ("fusion_lstm", 0));
REGISTER_PASS_CAPABILITY(mul_lstm_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("mul", 0)
            .EQ("lstm", 0)
            .EQ("fusion_lstm", 0));
paddle/fluid/framework/ir/memory_optimize_pass/CMakeLists.txt
...
@@ -13,4 +13,6 @@ cc_library(memory_reuse_pass SRCS memory_reuse_pass.cc DEPS computation_op_handl
cc_library(buffer_shared_inplace_op_pass SRCS buffer_shared_inplace_op_pass.cc DEPS memory_reuse_pass)
cc_library(buffer_shared_cross_op_memory_reuse_pass SRCS buffer_shared_cross_op_memory_reuse_pass.cc DEPS memory_reuse_pass)
cc_library(inplace_addto_op_pass SRCS inplace_addto_op_pass.cc DEPS memory_reuse_pass)

cc_test(test_reference_count_pass_last_lived_ops SRCS test_reference_count_pass_last_lived_ops.cc DEPS parallel_executor elementwise_mul_op elementwise_add_op scale_op)

paddle/fluid/framework/ir/memory_optimize_pass/buffer_shared_inplace_op_pass.cc
...
@@ -16,6 +16,7 @@
#include <unordered_map>
#include <unordered_set>
#include <vector>

#include "paddle/fluid/framework/details/computation_op_handle.h"
#include "paddle/fluid/framework/details/multi_devices_helper.h"
#include "paddle/fluid/framework/details/share_tensor_buffer_op_handle.h"
...
@@ -141,11 +142,12 @@ void BufferSharedInplaceOpPass::Run(Graph *graph) const {
        VLOG(4) << "Inplace performed in op " << op_type << ": "
                << in_var_handle_ptr->Name() << " -> "
                << out_var_handle_ptr->Name()
                << ". Debug String is: " << op->GetOp()->DebugString()
                << ". ReuseType: " << ReuseType();
      } else {
        VLOG(3) << "Inplace failed in op " << op_type << ": "
                << in_var_handle_ptr->Name() << " -> "
                << out_var_handle_ptr->Name()
                << ". ReuseType: " << ReuseType();
      }
    }
  }
...
paddle/fluid/framework/ir/memory_optimize_pass/inplace_addto_op_pass.cc
0 → 100644
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

#include "paddle/fluid/framework/details/computation_op_handle.h"
#include "paddle/fluid/framework/details/multi_devices_helper.h"
#include "paddle/fluid/framework/details/share_tensor_buffer_op_handle.h"
#include "paddle/fluid/framework/ir/memory_optimize_pass/memory_optimization_var_info.h"
#include "paddle/fluid/framework/ir/memory_optimize_pass/memory_reuse_pass.h"
#include "paddle/fluid/framework/ir/memory_optimize_pass/reference_count_pass_helper.h"
#include "paddle/fluid/framework/ir/pass.h"

namespace paddle {
namespace framework {
namespace ir {

class InplaceAddToOpPass : public MemoryReusePass {
 protected:
  std::string ReuseType() const override { return "inplace_addto"; }

  void Run(Graph *graph) const override;

 private:
  // 1. Add last living op of in_var, add any last living op of out_var
  // 2. Set reference count of in_var to be 2
  void UpdateLastLiveOpOfVar(details::ComputationOpHandle *op,
                             details::VarHandle *in_var,
                             details::VarHandle *out_var) const override {
    size_t scope_idx = op->GetScopeIdx();
    auto *last_live_ops_of_vars_ =
        &Get<std::vector<LastLiveOpsOfVars>>(kLastLiveOpsOfVars);
    auto *var_infos_ = &(Get<MemOptVarInfoMapList>(kMemOptVarInfoMapList));
    auto out_var_op_iter =
        (*last_live_ops_of_vars_)[scope_idx].find(out_var->Name());

    // In Reduce mode, some output variable(gradient of parameter) does not
    // have last live ops
    details::ComputationOpHandle *last_live_op_of_in_var = nullptr;
    if (out_var_op_iter == (*last_live_ops_of_vars_)[scope_idx].end()) {
      last_live_op_of_in_var = op;
    } else {
      PADDLE_ENFORCE_EQ(
          out_var_op_iter->second.ops().empty(), false,
          platform::errors::InvalidArgument(
              "Var(%s)'s last live op should not be empty.", out_var->Name()));
      last_live_op_of_in_var = *(out_var_op_iter->second.ops().begin());
    }

    auto *last_live_ops_of_in_var =
        (*last_live_ops_of_vars_)[scope_idx][in_var->Name()].mutable_ops();
    // last_live_ops_of_in_var->clear();
    last_live_ops_of_in_var->insert(last_live_op_of_in_var);

    auto in_var_info_iter = (*var_infos_)[scope_idx].find(in_var->Name());
    PADDLE_ENFORCE_NE(
        in_var_info_iter, (*var_infos_)[scope_idx].end(),
        platform::errors::NotFound("Cannot find variable %s.",
                                   in_var->Name()));

    in_var_info_iter->second->SetRefCnt(2);  // before inplace, it is 1
  }
};

void InplaceAddToOpPass::Run(Graph *graph) const {
  const auto &last_live_ops =
      Get<std::vector<LastLiveOpsOfVars>>(kLastLiveOpsOfVars);

  bool use_cuda = Get<bool>(kUseCuda);

  // Currently, only perform InplaceAddToOpPass on cuda place
  if (!use_cuda) {
    return;
  }

  // Step 1: Build a reverse map of last_live_ops
  // i.e.: op -> vars
  std::unordered_map<details::ComputationOpHandle *,
                     std::unordered_map<std::string, ir::Node *>>
      candidate_ops;
  for (auto &each_scope_ops : last_live_ops) {
    for (auto &pair : each_scope_ops) {
      // If variable has more than 1 last lived ops, this variable cannot
      // be inplaced.
      if (pair.second.ops().size() != 1) {
        continue;
      }
      auto *op = *(pair.second.ops().begin());
      const std::string &op_type = op->GetOp()->Type();
      const framework::OpDesc *op_desc = op->Node()->Op();
      PADDLE_ENFORCE_NOT_NULL(
          op_desc, platform::errors::NotFound("Op(%s) can not find opdesc.",
                                              op->Name()));

      // only grad op should be processed.
      if (op_type != "grad_add") {
        continue;
      }

      const std::string &var_name = pair.first;
      auto in_nodes = this->FindNodesByName(var_name, op->Node()->inputs);
      if (in_nodes.size() == 1) {
        candidate_ops[op][var_name] = *in_nodes.begin();
      }
      VLOG(4) << "Find op " << op_type << " with input(" << var_name
              << ") that can do inplace add to";
    }
  }

  // Step 2: Check which vars can be inplaced indeed
  for (auto &op_vars_pair : candidate_ops) {
    auto *op = op_vars_pair.first;

    // The original gradient accumulation is g = sum(g_0, g_1,..., g_n), and it
    // could be changed as follows if inplace addto is enabled:
    // g_sum_0 = g_0
    // g_sum_1 = grad_add(g_sum_0, g_1)
    // g_sum_2 = grad_add(g_sum_1, g_2)
    // ...
    // g_sum_n = grad_add(g_sum_n-1, g_n)

    // here we will add inplace for each grad_add, for example, for the first
    // grad_add, g_sum_0 -> g1, g_sum_1 -> g1, and set grad_add as skipped.

    const std::string &op_type = op->GetOp()->Type();

    PADDLE_ENFORCE_EQ(op->Node()->inputs.size(), 2,
                      platform::errors::InvalidArgument(
                          "The size of inputs of %s should be 2, but got %d",
                          op_type, op->Node()->inputs.size()));

    PADDLE_ENFORCE_EQ(op->Node()->outputs.size(), 1,
                      platform::errors::InvalidArgument(
                          "The size of outputs of %s should be 1, but got %d",
                          op_type, op->Node()->outputs.size()));

    auto *left_var_ptr = dynamic_cast<details::VarHandle *>(
        &(op->Node()->inputs[0]->Wrapper<details::VarHandleBase>()));
    auto *right_var_ptr = dynamic_cast<details::VarHandle *>(
        &(op->Node()->inputs[1]->Wrapper<details::VarHandleBase>()));
    auto *out_var_ptr = dynamic_cast<details::VarHandle *>(
        &(op->Node()->outputs[0]->Wrapper<details::VarHandleBase>()));

    if (left_var_ptr == nullptr || right_var_ptr == nullptr ||
        out_var_ptr == nullptr) {
      continue;
    }

    // auto *left_generated_op = dynamic_cast<details::ComputationOpHandle *>(
    //     left_var_ptr->GeneratedOp());

    auto *right_generated_op = dynamic_cast<details::ComputationOpHandle *>(
        right_var_ptr->GeneratedOp());

    auto *out_generated_op = dynamic_cast<details::ComputationOpHandle *>(
        out_var_ptr->GeneratedOp());

    // NOTE(zhiqiu): currently, only conv2d_grad supports addto strategy
    if (right_generated_op->Name() != "conv2d_grad") {
      continue;
    }

    // NOTE(zhiqiu): Normally, if we inplace a->b, we should let a generated
    // before b. However, in the situation of inplace addto, we do not care
    // the order, since a+b is equal to b+a. Is there any exception for that?

    // AddDependencyVar(right_generated_op, left_generated_op);
    // no need, as discussed above.

    // step (a): inplace right_var->left_var of grad_add
    this->AddReuseVar(right_generated_op, left_var_ptr, right_var_ptr);
    UpdateLastLiveOpOfVar(right_generated_op, left_var_ptr, right_var_ptr);

    VLOG(4) << "Inplace performed in op "
            << right_generated_op->GetOp()->Type() << ": "
            << left_var_ptr->Name() << " -> " << right_var_ptr->Name()
            << ". Debug String is: "
            << right_generated_op->GetOp()->DebugString()
            << ". ReuseType: " << ReuseType();

    // step (b): inplace out -> right_var of grad_add
    this->AddReuseVar(out_generated_op, right_var_ptr, out_var_ptr, true);

    VLOG(4) << "Inplace performed in op " << op_type << ": "
            << left_var_ptr->Name() << " -> " << out_var_ptr->Name()
            << ". Debug String is: " << op->GetOp()->DebugString()
            << ". ReuseType: " << ReuseType();

    // step (c): make right_var cannot inplace afterwards. can be done
    // automatically since CollectReusedVars is called before any reuse.

    // step (d): make right_var's generated op use addto
    right_generated_op->GetOp()->SetAttr("use_addto", true);

    // step (e): make grad_add skip running
    op->SetSkipRunning(true);
  }
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(inplace_addto_op_pass, paddle::framework::ir::InplaceAddToOpPass)
    .RequirePassAttr(paddle::framework::ir::kMemOptVarInfoMapList)
    .RequirePassAttr(paddle::framework::ir::kLastLiveOpsOfVars)
    .RequirePassAttr(paddle::framework::ir::kUseCuda);
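To see what this pass buys: without addto, each grad_add materializes a fresh sum buffer; with use_addto the producer accumulates straight into the existing buffer and the grad_add itself is skipped. A toy numeric sketch of the rewrite (plain vectors, no Paddle types):

#include <cassert>
#include <vector>

using Grad = std::vector<float>;

// Baseline: g = sum(g_0, ..., g_n) allocates a separate output per grad_add.
Grad GradAdd(const Grad& a, const Grad& b) {
  Grad out(a.size());
  for (size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  return out;
}

// Addto rewrite: the producer accumulates into the existing buffer in place,
// which is what SetAttr("use_addto", true) plus SetSkipRunning(true) achieve.
void AddTo(Grad* acc, const Grad& g) {
  for (size_t i = 0; i < acc->size(); ++i) (*acc)[i] += g[i];
}

int main() {
  Grad g0{1, 2}, g1{3, 4};
  Grad baseline = GradAdd(g0, g1);  // extra allocation
  AddTo(&g0, g1);                   // reuses g0's storage
  assert(baseline == g0);           // same result, one fewer buffer
  return 0;
}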
paddle/fluid/framework/ir/memory_optimize_pass/memory_reuse_pass.cc
...
@@ -13,6 +13,7 @@
// limitations under the License.

#include "paddle/fluid/framework/ir/memory_optimize_pass/memory_reuse_pass.h"

#include <functional>
#include <map>
#include <string>
...
@@ -73,6 +74,7 @@ bool MemoryReusePass::TryReuseVar(details::VarHandle *in_var,
                        out_var->Name()));
  if (IsVarPairReusable(*in_var, *out_var)) {
    AddReuseVar(op, in_var, out_var);
    UpdateLastLiveOpOfVar(op, in_var, out_var);
    return true;
  } else {
    return false;
...
@@ -324,7 +326,8 @@ bool MemoryReusePass::IsVarPairReusable(
void MemoryReusePass::AddReuseVar(details::ComputationOpHandle *op,
                                  details::VarHandle *in_var,
                                  details::VarHandle *out_var,
                                  bool share_dims) const {
  PADDLE_ENFORCE_GT(
      (*var_infos_)[op->GetScopeIdx()].count(in_var->Name()), 0,
      platform::errors::NotFound("Var(%s) is not in mem opt var infos.",
...
@@ -344,13 +347,15 @@ void MemoryReusePass::AddReuseVar(details::ComputationOpHandle *op,
    share_buffer_op->AddInput(in_var);
  }

  if (share_dims) {
    share_buffer_op->SetShareDims(true);
  }

  share_buffer_op->AddReuseVarPair(
      (*var_infos_)[op->GetScopeIdx()].at(in_var->Name()).get(),
      out_var->Name());
  reused_in_var_names_[op->GetScopeIdx()].insert(in_var->Name());
  reused_out_var_names_[op->GetScopeIdx()].insert(out_var->Name());
}

// 1. Set last living op of in_var to be any last living op of out_var
...
paddle/fluid/framework/ir/memory_optimize_pass/memory_reuse_pass.h
...
@@ -18,6 +18,7 @@
#include <unordered_map>
#include <unordered_set>
#include <vector>

#include "paddle/fluid/framework/details/computation_op_handle.h"
#include "paddle/fluid/framework/details/multi_devices_helper.h"
#include "paddle/fluid/framework/details/share_tensor_buffer_op_handle.h"
...
@@ -92,6 +93,12 @@ class MemoryReusePass : public Pass {
  int64_t GetMemorySize(const details::VarHandle &var) const;

  void AddReuseVar(details::ComputationOpHandle *op, details::VarHandle *in_var,
                   details::VarHandle *out_var, bool share_dims = false) const;
  virtual void UpdateLastLiveOpOfVar(details::ComputationOpHandle *op,
                                     details::VarHandle *in_var,
                                     details::VarHandle *out_var) const;

 private:
  VarDesc *GetVarDesc(const details::VarHandle &var) const;
...
@@ -109,13 +116,6 @@ class MemoryReusePass : public Pass {
  void CollectReusedVars() const;

 private:
  mutable Graph *graph_;
  mutable bool use_cuda_;
...
paddle/fluid/framework/ir/repeated_fc_relu_fuse_pass.cc
...
@@ -18,6 +18,7 @@ limitations under the License. */
#include <unordered_set>
#include <vector>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_version_registry.h"

#define MAX_NUM_FC 10
...
@@ -174,6 +175,10 @@ void BuildRepeatedFCReluPattern(PDPattern* pattern,
    if (x->outputs.size() <= 0 || x->inputs.size() <= 0U) {
      return false;
    }
    if (x->IsVar() && x->Var() && x->Var()->GetShape().size() > 2) {
      LOG(WARNING) << "repeated fc relu only supports input dims = 2";
      return false;
    }
    int fc_idx = FindFCIdx(x);
    if (fc_idx < 0) {
      return false;
...
@@ -384,3 +389,8 @@ void RepeatedFCReluFusePass::ApplyImpl(ir::Graph* graph) const {
REGISTER_PASS(repeated_fc_relu_fuse_pass,
              paddle::framework::ir::RepeatedFCReluFusePass);
REGISTER_PASS_CAPABILITY(repeated_fc_relu_fuse_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("fc", 0)
            .EQ("relu", 0));
paddle/fluid/framework/ir/shuffle_channel_detect_pass.cc
...
@@ -16,6 +16,7 @@
#include "paddle/fluid/framework/ir/graph_viz_pass.h"
#include "paddle/fluid/framework/ir/shuffle_channel_detect_pass.h"
#include "paddle/fluid/framework/op_version_registry.h"

namespace paddle {
namespace framework {
...
@@ -34,6 +35,8 @@ void ShuffleChannelDetectPass::ApplyImpl(ir::Graph* graph) const {
  const std::string pattern_name = "shufflechannel_pattern";
  FusePassBase::Init(pattern_name, graph);

  LOG(WARNING) << "There is fluid.layers.shuffle_channel API already, you can "
                  "use it instead of (reshape + transpose + reshape)";

  GraphPatternDetector gpd;
  auto* x = gpd.mutable_pattern()
                ->NewNode("x")
...
@@ -93,3 +96,8 @@ void ShuffleChannelDetectPass::ApplyImpl(ir::Graph* graph) const {
REGISTER_PASS(shuffle_channel_detect_pass,
              paddle::framework::ir::ShuffleChannelDetectPass);
REGISTER_PASS_CAPABILITY(shuffle_channel_detect_pass)
    .AddCombination(
        paddle::framework::compatible::OpVersionComparatorCombination()
            .EQ("reshape2", 0)
            .EQ("transpose2", 0));
paddle/fluid/framework/ir/squared_mat_sub_fuse_pass.cc
...
@@ -17,6 +17,7 @@
#include <unordered_set>
#include <vector>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_version_registry.h"

namespace paddle {
namespace framework {
...
@@ -77,7 +78,8 @@ PDNode* BuildSquaredMatSubPattern(PDPattern* pattern,
  };

  auto is_fusion_input_var = [=](Node* x, const std::string& arg_name) {
    bool basic = (var_is_op_input(x, "matmul_v2", arg_name) ||
                  var_is_op_input(x, "matmul", arg_name)) &&
                 var_is_op_input(x, "square", "X");
    if (!basic) {
      return false;
...
@@ -88,7 +90,8 @@ PDNode* BuildSquaredMatSubPattern(PDPattern* pattern,
  }

  auto* squared_x = squared_x_op->outputs[0];
  bool next_is_matmul_from_arg =
      (var_is_op_input(squared_x, "matmul_v2", arg_name) ||
       var_is_op_input(squared_x, "matmul", arg_name)) &&
      squared_x->outputs.size() == 1 &&
      squared_x->outputs[0]->outputs.size() == 1;
  if (!next_is_matmul_from_arg) {
...
@@ -103,7 +106,8 @@ PDNode* BuildSquaredMatSubPattern(PDPattern* pattern,
  auto is_fusion_first_mul_out = [=](Node* x) -> bool {
    bool input_is_matmul_op =
        x && x->inputs.size() == 1 && x->inputs[0]->IsOp() &&
        (x->inputs[0]->Op()->Type() == "matmul_v2" ||
         x->inputs[0]->Op()->Type() == "matmul");
    if (!input_is_matmul_op) {
      return false;
    }
...
@@ -167,7 +171,8 @@ PDNode* BuildSquaredMatSubPattern(PDPattern* pattern,
  auto* matmul_xy_op = pattern->NewNode(
      [=](Node* x) {
        return x && x->IsOp() &&
               (x->Op()->Type() == "matmul_v2" ||
                x->Op()->Type() == "matmul") &&
               is_fusion_first_mul_out(x->outputs[0]);
      },
      name_scope + "/matmul_xy_op");
...
@@ -189,7 +194,9 @@ PDNode* BuildSquaredMatSubPattern(PDPattern* pattern,
  auto is_fusion_mat_squared_x_y_op_out = [=](Node* x) -> bool {
    bool basic = x && x->IsVar() && x->inputs.size() == 1 &&
                 x->inputs[0]->IsOp() &&
                 (x->inputs[0]->Op()->Type() == "matmul_v2" ||
                  x->inputs[0]->Op()->Type() == "matmul");
    if (!basic) {
      return false;
    }
...
@@ -206,7 +213,8 @@ PDNode* BuildSquaredMatSubPattern(PDPattern* pattern,
  auto* matmul_squared_x_y_op = pattern->NewNode(
      [=](Node* x) {
        return x && x->IsOp() &&
               (x->Op()->Type() == "matmul_v2" ||
                x->Op()->Type() == "matmul") &&
               is_fusion_mat_squared_x_y_op_out(x->outputs[0]);
      },
      name_scope + "/matmul_squared_x_y_op");
...
@@ -378,3 +386,13 @@ void SquaredMatSubFusePass::ApplyImpl(ir::Graph* graph) const {
...
@@ -378,3 +386,13 @@ void SquaredMatSubFusePass::ApplyImpl(ir::Graph* graph) const {
REGISTER_PASS
(
squared_mat_sub_fuse_pass
,
REGISTER_PASS
(
squared_mat_sub_fuse_pass
,
paddle
::
framework
::
ir
::
SquaredMatSubFusePass
);
paddle
::
framework
::
ir
::
SquaredMatSubFusePass
);
REGISTER_PASS_CAPABILITY
(
squared_mat_sub_fuse_pass
)
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
EQ
(
"matmul"
,
0
)
.
EQ
(
"matmul_v2"
,
0
)
.
EQ
(
"square"
,
0
)
.
EQ
(
"elementwise_mul"
,
0
)
.
EQ
(
"elementwise_sub"
,
0
)
.
EQ
(
"fill_constant"
,
0
)
.
EQ
(
"fusion_squared_mat_sub"
,
0
));
paddle/fluid/framework/ir/squared_mat_sub_fuse_pass.h

@@ -24,7 +24,7 @@ namespace framework {
 namespace ir {
 /**
- * Fuse ( (A.^2 * B.^2) - (A * B).^2 ) .* scalar
+ * Fuse ( (A * B).^2 - (A.^2 * B.^2) ) .* scalar
  */
 class SquaredMatSubFusePass : public FusePassBase {
  public:
...
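The corrected doc comment matters because the two terms are not symmetric; a quick numeric check of the fused expression with a 1x2 by 2x1 matmul (plain C++, my own harness):

#include <iostream>

int main() {
  // A is 1x2, B is 2x1, so A * B is the scalar dot product.
  double A[2] = {1, 2}, B[2] = {3, 4};
  double ab = A[0] * B[0] + A[1] * B[1];                       // A * B = 11
  double sq_of_prod = ab * ab;                                 // (A * B).^2 = 121
  double prod_of_sq =
      A[0] * A[0] * B[0] * B[0] + A[1] * A[1] * B[1] * B[1];   // A.^2 * B.^2 = 73
  double scalar = 0.5;
  std::cout << (sq_of_prod - prod_of_sq) * scalar << "\n";     // prints 24
}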
paddle/fluid/framework/operator.h

@@ -157,6 +157,14 @@ class OperatorBase {
         platform::errors::NotFound("(%s) is not found in AttributeMap.", name));
     return BOOST_GET_CONST(T, attrs_.at(name));
   }
+  void SetAttr(const std::string& name, const Attribute& v) {
+    PADDLE_ENFORCE_EQ(
+        HasAttr(name), true,
+        platform::errors::NotFound(
+            "The attribute %s is not found in operator %s", name, Type()));
+    attrs_[name] = v;
+  }
   const AttributeMap& Attrs() const { return attrs_; }
   const VariableNameMap& Inputs() const { return inputs_; }
...
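The new SetAttr deliberately refuses to create attributes the operator was not registered with. A minimal stand-alone model of that guard (types here are my own; Paddle's real Attribute is a variant):

#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

using Attribute = int;  // stand-in for Paddle's variant attribute type

class OperatorBase {
 public:
  explicit OperatorBase(std::map<std::string, Attribute> attrs)
      : attrs_(std::move(attrs)) {}
  bool HasAttr(const std::string& name) const { return attrs_.count(name); }
  // Only updates an existing attribute; never silently inserts a new key.
  void SetAttr(const std::string& name, const Attribute& v) {
    if (!HasAttr(name))
      throw std::runtime_error("attribute " + name + " not found");
    attrs_[name] = v;
  }

 private:
  std::map<std::string, Attribute> attrs_;
};

int main() {
  OperatorBase op({{"use_addto", 0}});
  op.SetAttr("use_addto", 1);  // fine: declared at registration time
  try {
    op.SetAttr("typo_attr", 1);  // rejected instead of growing the map
  } catch (const std::exception& e) {
    std::cout << e.what() << "\n";
  }
}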
paddle/fluid/framework/parallel_executor.cc

@@ -13,12 +13,14 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #include "paddle/fluid/framework/parallel_executor.h"
 #include <algorithm>
 #include <memory>
 #include <string>
 #include <tuple>
 #include <utility>
 #include <vector>
 #include "paddle/fluid/framework/details/async_ssa_graph_executor.h"
 #include "paddle/fluid/framework/details/fast_threaded_ssa_graph_executor.h"
 #include "paddle/fluid/framework/details/multi_devices_helper.h"
...
@@ -108,6 +110,11 @@ class ParallelExecutorPrivate {
    * them.
    */
   inline void SetSkipMemoryReuse(size_t scope_idx, const std::string &name) {
+    if (mem_opt_var_infos_.size() == 0) {
+      VLOG(4) << "The mem_opt_var_infos_ is empty, maybe no memory "
+                 "optimization strategy is enabled";
+      return;
+    }
     auto iter = mem_opt_var_infos_[scope_idx].find(name);
     if (iter != mem_opt_var_infos_[scope_idx].end()) {
       iter->second->SetSkipMemoryReuse(true);
...
@@ -308,6 +315,7 @@ ir::Graph *ParallelExecutorPrivate::ApplyMemoryOptimizePass(ir::Graph *graph) {
   }
   bool need_mem_opt = build_strategy_.enable_inplace_ ||
+                      build_strategy_.enable_addto_ ||
                       build_strategy_.memory_optimize_.get() || is_gc_enabled;
   if (!need_mem_opt) return graph;
...
@@ -320,6 +328,16 @@ ir::Graph *ParallelExecutorPrivate::ApplyMemoryOptimizePass(ir::Graph *graph) {
     graph = ref_cnt_pass->Apply(graph);
     VLOG(10) << "ReferenceCountPass Applied";
+  if (build_strategy_.enable_addto_) {
+    auto addto_pass = ir::PassRegistry::Instance().Get("inplace_addto_op_pass");
+    addto_pass->SetNotOwned(ir::kMemOptVarInfoMapList, &mem_opt_var_infos_);
+    addto_pass->SetNotOwned(ir::kLastLiveOpsOfVars, &last_live_ops_of_vars);
+    addto_pass->SetNotOwned(ir::kUseCuda, &use_cuda_);
+    VLOG(10) << "Start to apply inplace_addto_op_pass";
+    graph = addto_pass->Apply(graph);
+    VLOG(10) << "inplace_addto_op_pass Applied";
+  }
   if (build_strategy_.enable_inplace_) {
     auto inplace_pass =
         ir::PassRegistry::Instance().Get("buffer_shared_inplace_pass");
...
@@ -1068,3 +1086,4 @@ USE_PASS(reference_count_pass);
 USE_PASS(eager_deletion_pass);
 USE_PASS(buffer_shared_inplace_pass);
 USE_PASS(buffer_shared_cross_op_memory_reuse_pass);
+USE_PASS(inplace_addto_op_pass);
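A compact model of how ApplyMemoryOptimizePass now sequences the optional passes (toy classes of mine; the real passes come from ir::PassRegistry): each enabled pass consumes and returns the graph, so inplace_addto_op_pass simply slots in after reference counting and before the inplace pass.

#include <iostream>
#include <string>

struct Graph {
  std::string applied;  // records the pipeline for demonstration
};

struct Pass {
  std::string name;
  Graph* Apply(Graph* g) const {
    g->applied += name + " -> ";
    return g;
  }
};

int main() {
  bool enable_addto = true, enable_inplace = true;
  Graph g;
  Graph* p = &g;
  p = Pass{"reference_count_pass"}.Apply(p);
  if (enable_addto) p = Pass{"inplace_addto_op_pass"}.Apply(p);
  if (enable_inplace) p = Pass{"buffer_shared_inplace_pass"}.Apply(p);
  std::cout << p->applied << "done\n";
}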
paddle/fluid/inference/api/paddle_pass_builder.cc

@@ -156,7 +156,8 @@ CpuPassStrategy::CpuPassStrategy() : PassStrategy({}) {
         // "seqpool_concat_fuse_pass",    //
         "seqpool_cvm_concat_fuse_pass",   //
         // "embedding_fc_lstm_fuse_pass", //
-        "fc_lstm_fuse_pass",              //
+        // TODO(wilber): fix correctness problem.
+        // "fc_lstm_fuse_pass",           //
         "mul_lstm_fuse_pass",             //
         "fc_gru_fuse_pass",               //
         "mul_gru_fuse_pass",              //
...
paddle/fluid/inference/tensorrt/convert/emb_eltwise_layernorm.cc

@@ -80,10 +80,10 @@ class EmbEltwiseLayerNormOpConverter : public OpConverter {
     nvinfer1::ILayer* layer = nullptr;
     if (engine_->with_dynamic_shape()) {
-      plugin::DynamicPluginTensorRT* plugin = nullptr;
-      plugin = new plugin::EmbEltwiseLayernormPluginDynamic<float>(
-          input_embs, bias, scale, emb_sizes, bias_size, scale_size, hidden,
-          eps);
+      auto use_fp16 = engine_->WithFp16();
+      auto plugin = new plugin::EmbEltwiseLayernormPluginDynamic(
+          input_embs, bias, scale, emb_sizes, bias_size, scale_size, hidden,
+          eps, use_fp16);
       layer = engine_->AddPluginV2(input_ids.data(), input_num, plugin);
     } else {
       PADDLE_THROW(platform::errors::Fatal(
...
paddle/fluid/inference/tensorrt/plugin/emb_eltwise_layernorm_plugin.cu

@@ -32,13 +34,32 @@ namespace plugin {
 #if IS_TRT_VERSION_GE(6000)
 template <typename T>
-int EmbEltwiseLayernormPluginDynamic<T>::initialize() {
+EmbEltwiseLayernormPluginDynamicImpl<
+    T>::~EmbEltwiseLayernormPluginDynamicImpl() {
+  this->terminate();
+}
+
+inline half fp32tofp16(float x) { return static_cast<half>(x); }
+
+template <typename T>
+int EmbEltwiseLayernormPluginDynamicImpl<T>::initialize() {
   embs_gpu_.resize(embs_.size());
   for (int i = 0; i < embs_.size(); i++) {
     if (embs_[i]) {
-      cudaMalloc(&embs_gpu_[i], sizeof(float) * emb_sizes_[i]);
-      cudaMemcpy(embs_gpu_[i], embs_[i], emb_sizes_[i] * sizeof(float),
-                 cudaMemcpyHostToDevice);
+      T *host_ptr;
+      auto size = emb_sizes_[i];
+      if (std::is_same<T, half>::value) {
+        host_ptr = new T[size];
+        std::transform(embs_[i], (embs_[i] + size), host_ptr, fp32tofp16);
+      } else {
+        host_ptr = reinterpret_cast<T *>(embs_[i]);
+      }
+      cudaMalloc(&embs_gpu_[i], sizeof(T) * size);
+      cudaMemcpy(embs_gpu_[i], host_ptr, size * sizeof(T),
+                 cudaMemcpyHostToDevice);
+      if (std::is_same<T, half>::value) {
+        delete[] host_ptr;
+      }
     }
   }
...
@@ -53,11 +74,105 @@ int EmbEltwiseLayernormPluginDynamic<T>::initialize() {
                cudaMemcpyHostToDevice);
   }
+  int input_num = embs_.size();
+  in_ptr_tensor_.Resize({input_num});
+  emb_ptr_tensor_.Resize({input_num});
+  cudaGetDevice(&device_id_);
+  auto emb_ptr_gpu_d =
+      emb_ptr_tensor_.mutable_data<int64_t>(platform::CUDAPlace(device_id_));
+  cudaMemcpy(emb_ptr_gpu_d, embs_gpu_.data(), sizeof(uintptr_t) * input_num,
+             cudaMemcpyHostToDevice);
   return 0;
 }
 template <typename T>
-nvinfer1::DimsExprs EmbEltwiseLayernormPluginDynamic<T>::getOutputDimensions(
+void EmbEltwiseLayernormPluginDynamicImpl<T>::terminate() {
+  for (int i = 0; i < embs_gpu_.size(); ++i) {
+    if (embs_gpu_[i]) {
+      cudaFree(embs_gpu_[i]);
+      embs_gpu_[i] = nullptr;
+    }
+  }
+  if (bias_gpu_) {
+    cudaFree(bias_gpu_);
+    bias_gpu_ = nullptr;
+  }
+  if (scale_gpu_) {
+    cudaFree(scale_gpu_);
+    scale_gpu_ = nullptr;
+  }
+}
+
+template <typename T>
+int EmbEltwiseLayernormPluginDynamicImpl<T>::enqueue(
+    const nvinfer1::PluginTensorDesc *input_desc,
+    const nvinfer1::PluginTensorDesc *output_desc, const void *const *inputs,
+    void *const *outputs, void *workspace, cudaStream_t stream) {
+  auto id_dims = input_desc[0].dims;
+  int batch = id_dims.d[0];
+  int seq_len = id_dims.d[1];
+  int input_num = embs_.size();
+  auto in_ptr_gpu_d =
+      in_ptr_tensor_.mutable_data<int64_t>(platform::CUDAPlace(device_id_));
+  auto emb_ptr_gpu_d =
+      emb_ptr_tensor_.mutable_data<int64_t>(platform::CUDAPlace(device_id_));
+  auto new_input_ptr = reinterpret_cast<uintptr_t>(inputs[0]);
+  if (old_input_ptr_ != new_input_ptr) {
+    old_input_ptr_ = new_input_ptr;
+    cudaMemcpyAsync(in_ptr_gpu_d, reinterpret_cast<const void *>(inputs),
+                    sizeof(uintptr_t) * input_num, cudaMemcpyHostToDevice,
+                    stream);
+  }
+  auto out_type = output_desc[0].type;
+  if (std::is_same<T, float>::value) {
+    PADDLE_ENFORCE_EQ(
+        out_type == nvinfer1::DataType::kFLOAT, true,
+        platform::errors::InvalidArgument(
+            "The EmbEltwiseLayernorm Plugin only support fp32 input."));
+  } else if (std::is_same<T, half>::value) {
+    PADDLE_ENFORCE_EQ(
+        out_type == nvinfer1::DataType::kHALF, true,
+        platform::errors::InvalidArgument(
+            "The EmbEltwiseLayernorm Plugin only support fp16 input."));
+  } else {
+    PADDLE_THROW(platform::errors::Fatal(
+        "Unsupport data type, the out type of EmbEltwiseLayernorm should be "
+        "float or half."));
+  }
+  auto *output_d = reinterpret_cast<T *>(outputs[0]);
+  operators::math::EmbEltwiseLayerNormFunctor<T> emb_eltwise_layernorm_func;
+  emb_eltwise_layernorm_func(batch, seq_len, hidden_size_, in_ptr_gpu_d,
+                             scale_gpu_, bias_gpu_, emb_ptr_gpu_d, output_d,
+                             eps_, input_num, stream);
+  return cudaGetLastError() != cudaSuccess;
+}
+
+template class EmbEltwiseLayernormPluginDynamicImpl<float>;
+#ifdef SUPPORTS_CUDA_FP16
+template class EmbEltwiseLayernormPluginDynamicImpl<half>;
+#endif  // SUPPORTS_CUDA_FP16
+
+int EmbEltwiseLayernormPluginDynamic::initialize() {
+  impl_->initialize();
+  return 0;
+}
+
+void EmbEltwiseLayernormPluginDynamic::terminate() { impl_->terminate(); }
+
+nvinfer1::DimsExprs EmbEltwiseLayernormPluginDynamic::getOutputDimensions(
     int output_index, const nvinfer1::DimsExprs *inputs, int nb_inputs,
     nvinfer1::IExprBuilder &expr_builder) {  // NOLINT
   PADDLE_ENFORCE_EQ(output_index, 0,
...
@@ -76,18 +191,7 @@ nvinfer1::DimsExprs EmbEltwiseLayernormPluginDynamic<T>::getOutputDimensions(
   return ret;
 }
-template <typename T>
-void EmbEltwiseLayernormPluginDynamic<T>::terminate() {
-  for (auto ptr : embs_gpu_) {
-    if (ptr) cudaFree(ptr);
-  }
-  if (bias_gpu_) cudaFree(bias_gpu_);
-  if (scale_gpu_) cudaFree(scale_gpu_);
-}
-
-template <typename T>
-bool EmbEltwiseLayernormPluginDynamic<T>::supportsFormatCombination(
+bool EmbEltwiseLayernormPluginDynamic::supportsFormatCombination(
     int pos, const nvinfer1::PluginTensorDesc *in_out, int nb_inputs,
     int nb_outputs) {
   PADDLE_ENFORCE_NOT_NULL(
...
@@ -98,6 +202,11 @@ bool EmbEltwiseLayernormPluginDynamic<T>::supportsFormatCombination(
                             "The EmbEltwiseLayerNorm's output should be one"
                             "but it's (%d) outputs.",
                             nb_outputs));
+  PADDLE_ENFORCE_EQ(nb_outputs, 1,
+                    platform::errors::InvalidArgument(
+                        "The EmbEltwiseLayerNorm's output should be one"
+                        "but it's (%d) outputs.",
+                        nb_outputs));
   PADDLE_ENFORCE_LT(
       pos, nb_inputs + nb_outputs,
       platform::errors::InvalidArgument("The pos(%d) should be less than the "
...
@@ -122,7 +231,7 @@ bool EmbEltwiseLayernormPluginDynamic<T>::supportsFormatCombination(
   }
   if (pos == all_nums - 1) {
-    if (sizeof(T) == sizeof(float)) {
+    if (with_fp16_ == false) {
       return desc.type == nvinfer1::DataType::kFLOAT;
     } else {
       return desc.type == nvinfer1::DataType::kHALF;
...
@@ -131,84 +240,27 @@ bool EmbEltwiseLayernormPluginDynamic<T>::supportsFormatCombination(
   return false;
 }
-template <typename T>
-nvinfer1::DataType EmbEltwiseLayernormPluginDynamic<T>::getOutputDataType(
+nvinfer1::DataType EmbEltwiseLayernormPluginDynamic::getOutputDataType(
     int index, const nvinfer1::DataType *input_types, int nb_inputs) const {
   PADDLE_ENFORCE_EQ(
       index, 0,
       platform::errors::InvalidArgument(
           "The EmbEltwiseLayernorm Plugin only has one input, so the "
          "index value should be 0, but get %d.",
          index));
-  return nvinfer1::DataType::kFLOAT;
+  if (with_fp16_)
+    return nvinfer1::DataType::kHALF;
+  else
+    return nvinfer1::DataType::kFLOAT;
 }
-template <typename T>
-int EmbEltwiseLayernormPluginDynamic<T>::enqueue(
+int EmbEltwiseLayernormPluginDynamic::enqueue(
     const nvinfer1::PluginTensorDesc *input_desc,
     const nvinfer1::PluginTensorDesc *output_desc, const void *const *inputs,
     void *const *outputs, void *workspace, cudaStream_t stream) {
-  auto id_dims = input_desc[0].dims;
-  int batch = id_dims.d[0];
-  int seq_len = id_dims.d[1];
-  int input_num = embs_.size();
-  framework::Tensor in_ptr_tensor, emb_ptr_tensor;
-  int device_id;
-  cudaGetDevice(&device_id);
-  in_ptr_tensor.Resize({input_num});
-  emb_ptr_tensor.Resize({input_num});
-  int64_t *in_ptr_gpu_d =
-      in_ptr_tensor.mutable_data<int64_t>(platform::CUDAPlace(device_id));
-  int64_t *emb_ptr_gpu_d =
-      emb_ptr_tensor.mutable_data<int64_t>(platform::CUDAPlace(device_id));
-  std::vector<uintptr_t> in_ptr, emb_ptr;
-  for (int i = 0; i < input_num; i++) {
-    in_ptr.push_back(reinterpret_cast<uintptr_t>(inputs[i]));
-    emb_ptr.push_back(reinterpret_cast<uintptr_t>(embs_gpu_[i]));
-  }
-  cudaMemcpyAsync(in_ptr_gpu_d, in_ptr.data(), sizeof(int64_t) * input_num,
-                  cudaMemcpyHostToDevice, stream);
-  cudaMemcpyAsync(emb_ptr_gpu_d, emb_ptr.data(), sizeof(int64_t) * input_num,
-                  cudaMemcpyHostToDevice, stream);
-  auto out_type = output_desc[0].type;
-  const unsigned tpb = 256;
-  const dim3 grid(seq_len, batch, 1);
-  const dim3 block(tpb, 1, 1);
-  if (sizeof(T) == sizeof(float)) {
-    PADDLE_ENFORCE_EQ(
-        out_type == nvinfer1::DataType::kFLOAT, true,
-        platform::errors::InvalidArgument(
-            "The EmbEltwiseLayernorm Plugin only support fp32 input."));
-  } else if (sizeof(T) == sizeof(int16_t)) {
-    PADDLE_ENFORCE_EQ(
-        out_type == nvinfer1::DataType::kHALF, true,
-        platform::errors::InvalidArgument(
-            "The EmbEltwiseLayernorm Plugin only support fp16 input."));
-  } else {
-    PADDLE_THROW(platform::errors::Fatal(
-        "Unsupport data type, the out type of EmbEltwiseLayernorm should be "
-        "float or half."));
-  }
-  T *output_d = static_cast<T *>(outputs[0]);
-  operators::math::EmbEltwiseLayerNormFunctor<T> emb_eltwise_layernorm_func;
-  emb_eltwise_layernorm_func(batch, seq_len, hidden_size_, in_ptr_gpu_d,
-                             scale_gpu_, bias_gpu_, emb_ptr_gpu_d, output_d,
-                             eps_, input_num, stream);
+  impl_->enqueue(input_desc, output_desc, inputs, outputs, workspace, stream);
   return cudaGetLastError() != cudaSuccess;
 }
-template class EmbEltwiseLayernormPluginDynamic<float>;
-#ifdef SUPPORTS_CUDA_FP16
-template class EmbEltwiseLayernormPluginDynamic<half>;
-#endif  // SUPPORTS_CUDA_FP16
-
 #endif
 }  // namespace plugin
...
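One detail of the new enqueue worth calling out: the table of input pointers is uploaded to the device only when the first input pointer differs from the previous launch. A host-only sketch of that caching (copy_to_device below is a hypothetical placeholder for the cudaMemcpyAsync call):

#include <cstdint>
#include <iostream>
#include <vector>

static std::uintptr_t old_input_ptr = 0;

// Placeholder for the real async host-to-device copy of the pointer table.
void copy_to_device(const void* const* inputs, int n) {
  std::cout << "uploading " << n << " pointers\n";
}

void enqueue(const void* const* inputs, int input_num) {
  auto new_input_ptr = reinterpret_cast<std::uintptr_t>(inputs[0]);
  if (old_input_ptr != new_input_ptr) {  // bindings changed since last launch
    old_input_ptr = new_input_ptr;
    copy_to_device(inputs, input_num);
  }
}

int main() {
  std::vector<float> a(4), b(4);
  const void* ins[2] = {a.data(), b.data()};
  enqueue(ins, 2);  // first launch: uploads the table
  enqueue(ins, 2);  // same bindings: upload skipped
}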
paddle/fluid/inference/tensorrt/plugin/emb_eltwise_layernorm_plugin.h

@@ -27,10 +27,25 @@ namespace tensorrt {
 namespace plugin {
 #if IS_TRT_VERSION_GE(6000)
+class EmbEltwiseLayernormPluginDynamicImplBase {
+ public:
+  EmbEltwiseLayernormPluginDynamicImplBase() {}
+  virtual ~EmbEltwiseLayernormPluginDynamicImplBase() {}
+
+  virtual int initialize() = 0;
+  virtual void terminate() = 0;
+  virtual int enqueue(const nvinfer1::PluginTensorDesc* inputDesc,
+                      const nvinfer1::PluginTensorDesc* outputDesc,
+                      const void* const* inputs, void* const* outputs,
+                      void* workspace, cudaStream_t stream) = 0;
+};
+
 template <typename T>
-class EmbEltwiseLayernormPluginDynamic : public DynamicPluginTensorRT {
+class EmbEltwiseLayernormPluginDynamicImpl
+    : public EmbEltwiseLayernormPluginDynamicImplBase {
  public:
-  explicit EmbEltwiseLayernormPluginDynamic(std::vector<float*> input_embs,
-                                            float* bias, float* scale,
-                                            std::vector<int> emb_sizes,
-                                            int bias_size, int scale_size,
+  explicit EmbEltwiseLayernormPluginDynamicImpl(std::vector<float*> input_embs,
+                                                float* bias, float* scale,
+                                                std::vector<int> emb_sizes,
+                                                int bias_size, int scale_size,
...
@@ -44,49 +59,126 @@ class EmbEltwiseLayernormPluginDynamic : public DynamicPluginTensorRT {
         hidden_size_(hidden_size),
         eps_(eps) {}
+  ~EmbEltwiseLayernormPluginDynamicImpl();
+
+  int initialize();
+  void terminate();
+  int enqueue(const nvinfer1::PluginTensorDesc* inputDesc,
+              const nvinfer1::PluginTensorDesc* outputDesc,
+              const void* const* inputs, void* const* outputs,
+              void* workspace, cudaStream_t stream);
+
+ private:
+  std::vector<float*> embs_;
+  float* bias_{nullptr};
+  float* scale_{nullptr};
+
+  // data on devices
+  float* bias_gpu_{nullptr};
+  float* scale_gpu_{nullptr};
+  std::vector<T*> embs_gpu_;
+
+  std::vector<int> emb_sizes_;
+  int bias_size_;
+  int scale_size_;
+  int hidden_size_;
+  float eps_;
+
+  framework::Tensor in_ptr_tensor_, emb_ptr_tensor_;
+  int device_id_{0};
+  uintptr_t old_input_ptr_{0};
+};
+
+class EmbEltwiseLayernormPluginDynamic : public DynamicPluginTensorRT {
+ public:
+  explicit EmbEltwiseLayernormPluginDynamic(std::vector<float*> input_embs,
+                                            float* bias, float* scale,
+                                            std::vector<int> emb_sizes,
+                                            int bias_size, int scale_size,
+                                            int hidden_size, float eps,
+                                            bool with_fp16)
+      : embs_(input_embs),
+        bias_(bias),
+        scale_(scale),
+        emb_sizes_(emb_sizes),
+        bias_size_(bias_size),
+        scale_size_(scale_size),
+        hidden_size_(hidden_size),
+        eps_(eps),
+        with_fp16_(with_fp16),
+        own_host_buff_(false) {
+    if (with_fp16) {
+#ifdef SUPPORTS_CUDA_FP16
+      impl_ = new EmbEltwiseLayernormPluginDynamicImpl<half>(
+          embs_, bias_, scale_, emb_sizes_, bias_size_, scale_size_,
+          hidden_size_, eps_);
+#else
+      PADDLE_THROW(platform::errors::Fatal(
+          "Unsupported data type, current GPU doesn't support half."));
+#endif  // SUPPORTS_CUDA_FP16
+    } else {
+      impl_ = new EmbEltwiseLayernormPluginDynamicImpl<float>(
+          embs_, bias_, scale_, emb_sizes_, bias_size_, scale_size_,
+          hidden_size_, eps_);
+    }
+  }
   EmbEltwiseLayernormPluginDynamic(void const* serial_data,
-                                   size_t serial_length) {
+                                   size_t serial_length)
+      : own_host_buff_(true) {
     DeserializeValue(&serial_data, &serial_length, &emb_sizes_);
-    embs_gpu_.resize(emb_sizes_.size());
     embs_.resize(emb_sizes_.size());
     for (size_t i = 0; i < emb_sizes_.size(); i++) {
-      cudaMalloc(&embs_gpu_[i], sizeof(float) * emb_sizes_[i]);
-      cudaMemcpy(embs_gpu_[i], serial_data, emb_sizes_[i] * sizeof(float),
-                 cudaMemcpyHostToDevice);
+      auto size = emb_sizes_[i];
+      auto ptr = new float[size];
+      memcpy(ptr, serial_data, sizeof(float) * size);
+      embs_[i] = ptr;
       reinterpret_cast<char const*&>(serial_data) +=
           emb_sizes_[i] * sizeof(float);
       serial_length -= emb_sizes_[i] * sizeof(float);
-      embs_[i] = nullptr;
     }
     DeserializeValue(&serial_data, &serial_length, &bias_size_);
     DeserializeValue(&serial_data, &serial_length, &scale_size_);
-    cudaMalloc(&bias_gpu_, sizeof(float) * bias_size_);
-    cudaMemcpy(bias_gpu_, serial_data, bias_size_ * sizeof(float),
-               cudaMemcpyHostToDevice);
-    bias_ = nullptr;
+    if (bias_size_) {
+      bias_ = new float[bias_size_];
+      memcpy(bias_, serial_data, sizeof(float) * bias_size_);
+    }
     reinterpret_cast<char const*&>(serial_data) += bias_size_ * sizeof(float);
     serial_length -= bias_size_ * sizeof(float);
-    cudaMalloc(&scale_gpu_, sizeof(float) * scale_size_);
-    cudaMemcpy(scale_gpu_, serial_data, scale_size_ * sizeof(float),
-               cudaMemcpyHostToDevice);
-    scale_ = nullptr;
+    if (scale_size_) {
+      scale_ = new float[scale_size_];
+      memcpy(scale_, serial_data, sizeof(float) * scale_size_);
+    }
     reinterpret_cast<char const*&>(serial_data) += scale_size_ * sizeof(float);
     serial_length -= scale_size_ * sizeof(float);
     DeserializeValue(&serial_data, &serial_length, &hidden_size_);
     DeserializeValue(&serial_data, &serial_length, &eps_);
+    DeserializeValue(&serial_data, &serial_length, &with_fp16_);
+
+    if (with_fp16_) {
+#ifdef SUPPORTS_CUDA_FP16
+      impl_ = new EmbEltwiseLayernormPluginDynamicImpl<half>(
+          embs_, bias_, scale_, emb_sizes_, bias_size_, scale_size_,
+          hidden_size_, eps_);
+#else
+      PADDLE_THROW(platform::errors::Fatal(
+          "Unsupported data type, current GPU doesn't support half."));
+#endif  // SUPPORTS_CUDA_FP16
+    } else {
+      impl_ = new EmbEltwiseLayernormPluginDynamicImpl<float>(
+          embs_, bias_, scale_, emb_sizes_, bias_size_, scale_size_,
+          hidden_size_, eps_);
+    }
   }
   nvinfer1::IPluginV2DynamicExt* clone() const override {
     auto ptr = new EmbEltwiseLayernormPluginDynamic(
         embs_, bias_, scale_, emb_sizes_, bias_size_, scale_size_, hidden_size_,
-        eps_);
-    ptr->embs_gpu_ = embs_gpu_;
-    ptr->bias_gpu_ = bias_gpu_;
-    ptr->scale_gpu_ = scale_gpu_;
+        eps_, with_fp16_);
     return ptr;
   }
...
@@ -95,6 +187,7 @@ class EmbEltwiseLayernormPluginDynamic : public DynamicPluginTensorRT {
   }
   int getNbOutputs() const override { return 1; }
   int initialize() override;
+  void terminate() override;
   size_t getSerializationSize() const override {
     int sum_num = 0;
...
@@ -110,24 +203,32 @@ class EmbEltwiseLayernormPluginDynamic : public DynamicPluginTensorRT {
     sum_num += (bias_size_ + scale_size_) * sizeof(float);
     sum_num += SerializedSize(hidden_size_);
     sum_num += SerializedSize(eps_);
-    // sum_num += SerializedSize(with_fp16_);
+    sum_num += SerializedSize(with_fp16_);
     return sum_num;
   }
-  void terminate() override;
   void serialize(void* buffer) const override {
-    // SerializeValue(&buffer, with_fp16_);
     SerializeValue(&buffer, emb_sizes_);
     for (size_t i = 0; i < emb_sizes_.size(); i++) {
-      SerializeCudaPointer(&buffer, embs_gpu_[i], emb_sizes_[i]);
+      auto size = emb_sizes_[i];
+      for (int j = 0; j < size; ++j) {
+        SerializeValue(&buffer, embs_[i][j]);
+      }
     }
     SerializeValue(&buffer, bias_size_);
     SerializeValue(&buffer, scale_size_);
-    SerializeCudaPointer(&buffer, bias_gpu_, bias_size_);
-    SerializeCudaPointer(&buffer, scale_gpu_, scale_size_);
+    for (int i = 0; i < bias_size_; ++i) {
+      SerializeValue(&buffer, bias_[i]);
+    }
+    for (int i = 0; i < scale_size_; ++i) {
+      SerializeValue(&buffer, scale_[i]);
+    }
     SerializeValue(&buffer, hidden_size_);
     SerializeValue(&buffer, eps_);
+    SerializeValue(&buffer, with_fp16_);
   }
   nvinfer1::DimsExprs getOutputDimensions(
...
@@ -158,23 +259,33 @@ class EmbEltwiseLayernormPluginDynamic : public DynamicPluginTensorRT {
       const nvinfer1::DataType* input_types,
       int nb_inputs) const override;
-  void destroy() override { delete this; }
+  void destroy() override {
+    if (own_host_buff_) {
+      for (auto ptr : embs_) {
+        delete[] ptr;
+      }
+      delete[] bias_;
+      delete[] scale_;
+    }
+    delete impl_;
+    delete this;
+  }
  private:
   std::vector<float*> embs_;
   float* bias_;
   float* scale_;
-  // data on devices
-  float* bias_gpu_;
-  float* scale_gpu_;
-  std::vector<float*> embs_gpu_;
-
   std::vector<int> emb_sizes_;
   int bias_size_;
   int scale_size_;
   int hidden_size_;
   float eps_;
+
+  bool with_fp16_;
+  bool own_host_buff_{false};
+  EmbEltwiseLayernormPluginDynamicImplBase* impl_{nullptr};
 };
 class EmbEltwiseLayernormPluginV2Creator : public nvinfer1::IPluginCreator {
...
@@ -198,8 +309,7 @@ class EmbEltwiseLayernormPluginV2Creator : public nvinfer1::IPluginCreator {
   nvinfer1::IPluginV2* deserializePlugin(const char* name,
                                          const void* serial_data,
                                          size_t serial_length) override {
-    return new EmbEltwiseLayernormPluginDynamic<float>(serial_data,
-                                                       serial_length);
+    return new EmbEltwiseLayernormPluginDynamic(serial_data, serial_length);
   }
   void setPluginNamespace(const char* lib_namespace) override {
...
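The restructuring above is essentially the pimpl pattern: the facade EmbEltwiseLayernormPluginDynamic is no longer a template, and the dtype decision moves from compile time to a with_fp16 flag read at construction or deserialization. A self-contained miniature of the shape of it (all names here are mine; short merely stands in for half, which is not a standard C++ type):

#include <iostream>
#include <memory>

struct ImplBase {
  virtual ~ImplBase() = default;
  virtual void enqueue() = 0;
};

template <typename T>
struct Impl : ImplBase {
  void enqueue() override {
    std::cout << "running " << sizeof(T) * 8 << "-bit kernel\n";
  }
};

// Non-template facade: the serialized form can carry a bool instead of
// baking the element type into the plugin's C++ type.
class Plugin {
 public:
  explicit Plugin(bool with_fp16) {
    if (with_fp16)
      impl_ = std::make_unique<Impl<short>>();  // 'short' stands in for half
    else
      impl_ = std::make_unique<Impl<float>>();
  }
  void enqueue() { impl_->enqueue(); }  // facade forwards to the typed impl

 private:
  std::unique_ptr<ImplBase> impl_;
};

int main() {
  Plugin(false).enqueue();  // 32-bit path
  Plugin(true).enqueue();   // 16-bit path
}

This is also why serialize() now writes host buffers element by element: the facade owns plain float arrays, and each impl re-uploads them in its own precision at initialize() time.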
paddle/fluid/inference/tests/api/trt_dynamic_shape_ernie_deserialize_test.cc

@@ -151,7 +151,7 @@ void trt_ernie(bool with_fp16, std::vector<float> result) {
   run(config, &out_data);  // serialize
   run(*config_deser, &out_data);  // deserialize
   for (size_t i = 0; i < out_data.size(); i++) {
-    EXPECT_NEAR(result[i], out_data[i], 1e-6);
+    EXPECT_NEAR(result[i], out_data[i], 1e-2);
   }
 }
...
@@ -159,13 +159,11 @@ TEST(AnalysisPredictor, no_fp16) {
   std::vector<float> result = {0.597841, 0.219972, 0.182187};
   trt_ernie(false, result);
 }
-TEST(AnalysisPredictor, fp16) {
 #ifdef SUPPORTS_CUDA_FP16
-  std::vector<float> result = {0.598336, 0.219558, 0.182106};
+TEST(AnalysisPredictor, fp16) {
+  std::vector<float> result = {0.59923654, 0.21923761, 0.18152587};
   trt_ernie(true, result);
-#endif
 }
+#endif  // SUPPORTS_CUDA_FP16
 }  // namespace inference
 }  // namespace paddle
paddle/fluid/operators/conv_cudnn_op.cu

@@ -14,6 +14,7 @@ limitations under the License. */
 #include <utility>
 #include <vector>
 #include "paddle/fluid/framework/eigen.h"
 #include "paddle/fluid/framework/op_registry.h"
 #include "paddle/fluid/framework/tensor.h"
...
@@ -287,7 +288,9 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
 #endif
     // ------------------- cudnn conv forward ---------------------
-    ScalingParamType<T> alpha = 1.0f, beta = 0.0f;
+    ScalingParamType<T> alpha = 1.0f;
+    ScalingParamType<T> beta = ctx.Attr<bool>("use_addto") ? 1.0f : 0.0f;
+    VLOG(4) << "Conv: use_addto = " << ctx.Attr<bool>("use_addto");
     for (int i = 0; i < groups; i++) {
       workspace_handle.RunFunc(
           [&](void* workspace_ptr) {
...
@@ -609,9 +612,13 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
     }
     // ------------------- cudnn conv backward data ---------------------
-    ScalingParamType<T> alpha = 1.0f, beta = 0.0f;
+    ScalingParamType<T> alpha = 1.0f;
+    ScalingParamType<T> beta = ctx.Attr<bool>("use_addto") ? 1.0f : 0.0f;
+    VLOG(4) << "Conv_grad: use_addto = " << ctx.Attr<bool>("use_addto");
     if (input_grad) {
-      // Because beta is zero, it is unnecessary to reset input_grad.
+      // When beta is 0, it is unnecessary to reset input_grad.
+      // When beta is 1, the output cannot be reset since addt strategy used.
       for (int i = 0; i < groups; i++) {
         workspace_handle.RunFunc(
             [&](void* cudnn_workspace_ptr) {
...
@@ -653,6 +660,9 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
             ctx, &transformed_input_grad_channel, input_grad);
       }
     }
+
+    // filter_grad do not use inplace addto.
+    ScalingParamType<T> beta_filter = 0.0f;
     // ------------------- cudnn conv backward filter ---------------------
     if (filter_grad) {
       // Because beta is zero, it is unnecessary to reset filter_grad.
...
@@ -665,7 +675,7 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
                 input_data + i * group_offset_in, args2.odesc.desc(),
                 output_grad_data + i * group_offset_out,
                 args2.cdesc.desc(), filter_algo, cudnn_workspace_ptr,
-                workspace_size, &beta, args2.wdesc.desc(),
+                workspace_size, &beta_filter, args2.wdesc.desc(),
                 filter_grad_data + i * group_offset_filter));
           },
           workspace_size);
...
@@ -1017,7 +1027,14 @@ class CUDNNConvDoubleGradOpKernel : public framework::OpKernel<T> {
     int group_offset_out = o_c / groups * o_h * o_w * o_d;
     int group_offset_filter = W->numel() / groups;
-    ScalingParamType<T> alpha = 1.0f, beta = 0.0f;
+    ScalingParamType<T> alpha = 1.0f;
+    ScalingParamType<T> beta = 0.0f;
+
+    // NOTE(zhiqiu): inplace addto is not supportted in double grad yet.
+    // ScalingParamType<T> beta = ctx.Attr<bool>("use_addto") ? 1.0f :
+    // 0.0f;
+    // VLOG(4) << "Conv_grad_grad: use_addto = " << ctx.Attr<bool>("use_addto");
+
     auto wkspace_handle = dev_ctx.cudnn_workspace_handle();
     if (ddO) {
...
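The key semantic here is cuDNN's blending convention, y = alpha * conv(x) + beta * y: with beta = 1 the convolution accumulates straight into the existing buffer, which is what use_addto exploits to skip a separate gradient-sum kernel. A host-side analogue (my own helper, not cuDNN):

#include <iostream>
#include <vector>

// y = alpha * x + beta * y, the same blending rule cuDNN applies to conv output.
void axpby(float alpha, const std::vector<float>& x, float beta,
           std::vector<float>& y) {
  for (size_t i = 0; i < y.size(); ++i) y[i] = alpha * x[i] + beta * y[i];
}

int main() {
  std::vector<float> new_grad = {1, 2, 3};
  std::vector<float> acc = {10, 10, 10};
  axpby(1.0f, new_grad, 1.0f, acc);  // use_addto: accumulate in place
  std::cout << acc[0] << " " << acc[1] << " " << acc[2] << "\n";  // 11 12 13
  axpby(1.0f, new_grad, 0.0f, acc);  // beta = 0: overwrite
  std::cout << acc[0] << "\n";       // 1
}

Note that the filter gradient keeps beta_filter = 0: per the diff's comment, only the data gradient participates in the inplace addto strategy.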
paddle/fluid/operators/conv_op.cc

@@ -305,6 +305,11 @@ void Conv2DOpMaker::Make() {
       .SetDefault(0.0f);
   AddAttr<float>("fuse_beta", "(float, default 0.0) Only used in mkldnn kernel")
       .SetDefault(0.0f);
+  AddAttr<bool>("use_addto",
+                "(bool, default false) If use addto strategy or not, only used in "
+                "cudnn kernel")
+      .SetDefault(false);
   AddAttr<bool>("fuse_residual_connection",
                 "(bool, default false) Only used in mkldnn kernel. Used "
                 "whenever convolution output is as an input to residual "
...
@@ -460,6 +465,11 @@ void Conv3DOpMaker::Make() {
       .SetDefault(0.0f);
   AddAttr<float>("fuse_beta", "(float, default 0.0) Only used in mkldnn kernel")
       .SetDefault(0.0f);
+  AddAttr<bool>("use_addto",
+                "(bool, default false) If use addto strategy or not, only used in "
+                "cudnn kernel")
+      .SetDefault(false);
   AddAttr<bool>("fuse_residual_connection",
                 "(bool, default false) Only used in mkldnn kernel. Used "
                 "whenever convolution output is as an input to residual "
...
paddle/fluid/operators/cudnn_lstm_cache.h

@@ -54,6 +54,8 @@ class ScopedRNNBase {
       x_descs_.emplace_back(x_desc_.descriptor<T>(dims_x, strides_x));
       y_descs_.emplace_back(y_desc_.descriptor<T>(dims_y, strides_y));
     }
+
+#if CUDNN_VERSION >= 7201
     if (!sequence_length.empty()) {
       x_seq_desc_.descriptor<T>(seq_length_, batch_size_, input_size_, true,
                                 sequence_length);
@@ -61,6 +63,7 @@ class ScopedRNNBase {
                                 hidden_size_ * numDirections, true,
                                 sequence_length);
     }
+#endif
     // ------------------- cudnn hx, hy, cx, cy descriptors----------
     std::vector<int> dims_hx = {num_layers_ * numDirections, batch_size_,
...
@@ -96,10 +99,13 @@ class ScopedRNNBase {
         is_bidirec_ ? CUDNN_BIDIRECTIONAL : CUDNN_UNIDIRECTIONAL, CUDNN_LSTM,
         cudnn_type));
 #endif
+
+#if CUDNN_VERSION >= 7201
     if (!sequence_length.empty()) {
       PADDLE_ENFORCE_CUDA_SUCCESS(platform::dynload::cudnnSetRNNPaddingMode(
           rnn_desc_.desc(), CUDNN_RNN_PADDED_IO_ENABLED));
     }
+#endif
     // ------------------- cudnn weights_size ---------------------
     size_t weights_size_;
...
@@ -125,8 +131,10 @@ class ScopedRNNBase {
   }
   cudnnTensorDescriptor_t* x_descs() { return x_descs_.data(); }
   cudnnTensorDescriptor_t* y_descs() { return y_descs_.data(); }
+#if CUDNN_VERSION >= 7201
   cudnnRNNDataDescriptor_t x_seq_desc() { return x_seq_desc_.desc(); }
   cudnnRNNDataDescriptor_t y_seq_desc() { return y_seq_desc_.desc(); }
+#endif
   cudnnTensorDescriptor_t init_h_desc() { return init_h_desc_.desc(); }
   cudnnTensorDescriptor_t init_c_desc() { return init_c_desc_.desc(); }
   cudnnTensorDescriptor_t last_h_desc() { return last_h_desc_.desc(); }
...
@@ -151,8 +159,10 @@ class ScopedRNNBase {
   platform::ScopedTensorDescriptor x_desc_;
   platform::ScopedTensorDescriptor y_desc_;
+#if CUDNN_VERSION >= 7201
   platform::ScopedRNNTensorDescriptor x_seq_desc_;
   platform::ScopedRNNTensorDescriptor y_seq_desc_;
+#endif
   platform::ScopedTensorDescriptor init_h_desc_;
   platform::ScopedTensorDescriptor init_c_desc_;
   platform::ScopedTensorDescriptor last_h_desc_;
...
paddle/fluid/operators/elementwise/elementwise_add_op.cc

@@ -13,8 +13,11 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #include "paddle/fluid/operators/elementwise/elementwise_add_op.h"
 #include <memory>
 #include <string>
+#include "paddle/fluid/framework/op_version_registry.h"
 #include "paddle/fluid/operators/elementwise/elementwise_op.h"
 namespace paddle {
...
@@ -129,3 +132,18 @@ REGISTER_OP_CPU_KERNEL(
                                         int>,
     ops::ElementwiseAddDoubleGradKernel<paddle::platform::CPUDeviceContext,
                                         int64_t>);
+
+// A specialization elementwise_add operator, used in gradient accumulation with
+// inplace addto.
+REGISTER_OPERATOR(
+    grad_add, paddle::operators::ElementwiseOp,
+    paddle::operators::ElementwiseAddOpMaker,
+    paddle::framework::EmptyGradOpMaker<paddle::framework::OpDesc>,
+    paddle::framework::EmptyGradOpMaker<paddle::imperative::OpBase>);
+
+REGISTER_OP_CPU_KERNEL(
+    grad_add,
+    ops::ElementwiseAddKernel<paddle::platform::CPUDeviceContext, float>,
+    ops::ElementwiseAddKernel<paddle::platform::CPUDeviceContext, double>,
+    ops::ElementwiseAddKernel<paddle::platform::CPUDeviceContext, int>,
+    ops::ElementwiseAddKernel<paddle::platform::CPUDeviceContext, int64_t>);
paddle/fluid/operators/elementwise/elementwise_add_op.cu

@@ -111,3 +111,10 @@ REGISTER_OP_CUDA_KERNEL(
     ops::ElementwiseAddDoubleGradKernel<plat::CUDADeviceContext, int64_t>,
     ops::ElementwiseAddDoubleGradKernel<plat::CUDADeviceContext,
                                         plat::float16>);
+
+REGISTER_OP_CUDA_KERNEL(
+    grad_add, ops::ElementwiseAddKernel<plat::CUDADeviceContext, float>,
+    ops::ElementwiseAddKernel<plat::CUDADeviceContext, double>,
+    ops::ElementwiseAddKernel<plat::CUDADeviceContext, int>,
+    ops::ElementwiseAddKernel<plat::CUDADeviceContext, int64_t>,
+    ops::ElementwiseAddKernel<plat::CUDADeviceContext, plat::float16>);
paddle/fluid/operators/fake_quantize_op.cc

@@ -174,7 +174,64 @@ struct ChannelClipAndFakeQuantFunctor<platform::CPUDeviceContext, T> {
 template struct ChannelClipAndFakeQuantFunctor<platform::CPUDeviceContext,
                                                float>;
+
+template <typename T>
+struct ChannelClipFakeQuantDequantFunctor<platform::CPUDeviceContext, T> {
+  void operator()(const platform::CPUDeviceContext& ctx,
+                  const framework::Tensor& in, const framework::Tensor& scale,
+                  const int bin_cnt, const int quant_axis,
+                  framework::Tensor* out) {
+    PADDLE_ENFORCE_EQ(
+        quant_axis == 0 || quant_axis == 1, true,
+        platform::errors::InvalidArgument("'quant_axis' should be 0 or 1, but "
+                                          "the received is %d",
+                                          quant_axis));
+
+    auto* scale_data = scale.data<T>();
+    auto* in_data = in.data<T>();
+    auto* out_data = out->mutable_data<T>(ctx.GetPlace());
+    auto in_dims = in.dims();
+    const int64_t channel = in_dims[quant_axis];
+    platform::Transform<platform::CPUDeviceContext> trans;
+    if (quant_axis == 0) {
+      const int64_t channel_size = in.numel() / channel;
+      for (int i = 0; i < channel; i++) {
+        T s = scale_data[i];
+        auto* start = in_data + i * channel_size;
+        auto* end = in_data + (i + 1) * channel_size;
+        trans(ctx, start, end, out_data + i * channel_size,
+              ClipFunctor<T>(-s, s));
+      }
+      for (int i = 0; i < channel; i++) {
+        T s = scale_data[i];
+        T inv_s = inverse(s);
+        framework::Tensor one_channel_out = out->Slice(i, i + 1);
+        auto out_e = framework::EigenVector<T>::Flatten(one_channel_out);
+        out_e.device(*ctx.eigen_device()) =
+            (bin_cnt * inv_s * out_e).round() * s / static_cast<T>(bin_cnt);
+      }
+    } else if (quant_axis == 1) {
+      const int64_t step_i = in.numel() / in_dims[0];
+      const int64_t step_j = in.numel() / (in_dims[0] * in_dims[1]);
+      for (int i = 0; i < in_dims[0]; i++) {
+        for (int j = 0; j < in_dims[1]; j++) {
+          T s = scale_data[j];
+          T inv_s = inverse(s);
+          auto* start = in_data + i * step_i + j * step_j;
+          auto* end = in_data + i * step_i + (j + 1) * step_j;
+          auto* cur_out_data = out_data + i * step_i + j * step_j;
+          trans(ctx, start, end, cur_out_data, ClipFunctor<T>(-s, s));
+          for (int k = 0; k < step_j; k++) {
+            cur_out_data[k] = std::round(bin_cnt * inv_s * cur_out_data[k]) *
+                              s / static_cast<T>(bin_cnt);
+          }
+        }
+      }
+    }
+  }
+};
+
+template struct ChannelClipFakeQuantDequantFunctor<platform::CPUDeviceContext,
+                                                   float>;
+
 template <typename T>
 struct FindRangeAbsMaxFunctor<platform::CPUDeviceContext, T> {
   void operator()(const platform::CPUDeviceContext& ctx,
...
@@ -360,6 +417,75 @@ $$0 \leq c \lt \ the\ channel\ number\ of\ X$$
   }
 };
+
+class FakeChannelWiseQuantizeDequantizeAbsMaxOp
+    : public framework::OperatorWithKernel {
+ public:
+  using framework::OperatorWithKernel::OperatorWithKernel;
+
+  void InferShape(framework::InferShapeContext* ctx) const override {
+    OP_INOUT_CHECK(ctx->HasInput("X"), "Input", "X",
+                   "FakeChannelWiseQuantizeDequantizeAbsMax");
+    OP_INOUT_CHECK(ctx->HasOutput("Out"), "Output", "Out",
+                   "FakeChannelWiseQuantizeDequantizeAbsMax");
+    OP_INOUT_CHECK(ctx->HasOutput("OutScale"), "Output", "OutScale",
+                   "FakeChannelWiseQuantizeDequantizeAbsMax");
+    int quant_axis = ctx->Attrs().Get<int>("quant_axis");
+    ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
+    ctx->SetOutputDim("OutScale", {ctx->GetInputDim("X")[quant_axis]});
+    ctx->ShareLoD("X", /*->*/ "Out");
+  }
+
+ protected:
+  framework::OpKernelType GetExpectedKernelType(
+      const framework::ExecutionContext& ctx) const override {
+    return framework::OpKernelType(
+        OperatorWithKernel::IndicateVarDataType(ctx, "X"), ctx.GetPlace());
+  }
+};
+
+class FakeChannelWiseQuantizeDequantizeAbsMaxOpMaker
+    : public framework::OpProtoAndCheckerMaker {
+ public:
+  void Make() override {
+    AddInput("X", "(Tensor) Input is float data type.");
+    AddOutput("Out",
+              "(Tensor) Output of quantized and dequantized low level tensor, "
+              "saved as float data type.");
+    AddOutput("OutScale", "(Tensor) Current channel wise scale");
+    AddAttr<int>("quant_axis",
+                 "(int, default 0) The axis for quantization. "
+                 "For conv2d, depthwise_conv2d, conv2d_transpose "
+                 "and mul, the quant_axis is equal to the cout axis.")
+        .SetDefault(0)
+        .AddCustomChecker([](const int& quant_axis) {
+          PADDLE_ENFORCE_EQ(quant_axis == 0 || quant_axis == 1, true,
+                            platform::errors::InvalidArgument(
+                                "'quant_axis' should be 0 or 1, but "
+                                "the received is %d",
+                                quant_axis));
+        });
+    AddAttr<int>("bit_length", "(int, default 8)")
+        .SetDefault(8)
+        .AddCustomChecker([](const int& bit_length) {
+          PADDLE_ENFORCE_EQ(bit_length >= 1 && bit_length <= 16, true,
+                            platform::errors::InvalidArgument(
+                                "'bit_length' should be between 1 and 16, but "
+                                "the received is %d",
+                                bit_length));
+        });
+    AddComment(R"DOC(
+The scale of FakeChannelWiseQuantize operator is a vector.
+In detail, each channel of the input X has a scale value.
+
+$$scale_c = max(abs(X_c))$$
+$$range = 2^{bit\_length - 1} - 1$$
+$$Out_c = round(\frac{X_c * range} {scale_c}) * \frac{scale_c} {range}$$
+In above three formulas, the range value of c is as follow:
+$$0 \leq c \lt \ the\ channel\ number\ of\ X$$
+)DOC");
+  }
+};
+
 class FakeQuantizeRangeAbsMaxOp : public framework::OperatorWithKernel {
  public:
   FakeQuantizeRangeAbsMaxOp(const std::string& type,
...
@@ -666,3 +792,12 @@ REGISTER_OP_CPU_KERNEL(moving_average_abs_max_scale,
 REGISTER_OPERATOR(fake_quantize_dequantize_grad, ops::FakeQuantDequantGradOp);
 REGISTER_OP_CPU_KERNEL(fake_quantize_dequantize_grad,
                        ops::FakeQuantDequantGradKernel<CPU, float>);
+
+REGISTER_OPERATOR(
+    fake_channel_wise_quantize_dequantize_abs_max,
+    ops::FakeChannelWiseQuantizeDequantizeAbsMaxOp,
+    ops::FakeChannelWiseQuantizeDequantizeAbsMaxOpMaker,
+    ops::FakeQuantDequantGradMaker<paddle::framework::OpDesc>,
+    ops::FakeQuantDequantGradMaker<paddle::imperative::OpBase>);
+REGISTER_OP_CPU_KERNEL(
+    fake_channel_wise_quantize_dequantize_abs_max,
+    ops::FakeChannelWiseQuantizeDequantizeAbsMaxKernel<CPU, float>);
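The per-channel round trip the new functors implement is easy to state in scalar form: clip x to [-s, s], quantize to bin_cnt levels, then dequantize back. A plain C++ check of the formula (my own harness, mirroring the op's doc comment; the values are illustrative):

#include <algorithm>
#include <cmath>
#include <iostream>

// Clip to [-s, s], quantize to bin_cnt levels, dequantize back.
float quant_dequant(float x, float s, int bin_cnt) {
  float clipped = std::max(-s, std::min(s, x));
  return std::round(bin_cnt * clipped / s) * s / bin_cnt;
}

int main() {
  const int bin_cnt = 127;   // 2^(8-1) - 1 for bit_length = 8
  const float scale = 2.0f;  // per-channel max(abs(x))
  const float xs[] = {0.013f, 1.5f, 3.0f};  // last value gets clipped to 2
  for (float x : xs)
    std::cout << x << " -> " << quant_dequant(x, scale, bin_cnt) << "\n";
}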
paddle/fluid/operators/fake_quantize_op.cu
浏览文件 @
f52c4f8b
...
@@ -417,7 +417,89 @@ struct FindMovingAverageAbsMaxFunctor<platform::CUDADeviceContext, T> {
...
@@ -417,7 +417,89 @@ struct FindMovingAverageAbsMaxFunctor<platform::CUDADeviceContext, T> {
}
}
};
};
template
struct
FindMovingAverageAbsMaxFunctor
<
platform
::
CUDADeviceContext
,
// ChannelClipAndQuantDequantKernel for quant_axis is 0
template
<
typename
T
>
__global__
void
ChannelClipAndQuantDequantKernelQuantAxis0
(
const
T
*
in
,
const
T
*
scale
,
const
int
bin_cnt
,
const
int
n
,
const
int
c
,
T
*
out
)
{
int
tid
=
threadIdx
.
x
;
int
channel_size
=
n
/
c
;
const
T
*
in_c
=
in
+
blockIdx
.
x
*
channel_size
;
T
*
out_c
=
out
+
blockIdx
.
x
*
channel_size
;
T
s
=
scale
[
blockIdx
.
x
];
T
inv_s
=
inverse
(
s
);
for
(
int
i
=
tid
;
i
<
channel_size
;
i
+=
blockDim
.
x
)
{
T
x
=
in_c
[
i
];
T
v
=
x
>
s
?
s
:
x
;
v
=
v
<
-
s
?
-
s
:
v
;
v
=
bin_cnt
*
inv_s
*
v
;
out_c
[
i
]
=
round
(
v
)
*
s
/
bin_cnt
;
}
}
// ChannelClipAndQuantDequantKernel for quant_axis is 1
template <typename T>
__global__ void ChannelClipAndQuantDequantKernelQuantAxis1(
    const T* in, const T* scale, const int bin_cnt, const int n, const int cin,
    const int cout, T* out) {
  T s = scale[blockIdx.x % cout];
  T inv_s = inverse(s);

  int wh_size = n / (cin * cout);
  const T* in_c = in + blockIdx.x * wh_size;
  T* out_c = out + blockIdx.x * wh_size;

  for (int i = threadIdx.x; i < wh_size; i += blockDim.x) {
    T x = in_c[i];
    T v = x > s ? s : x;
    v = v < -s ? -s : v;
    v = bin_cnt * inv_s * v;
    out_c[i] = round(v) * s / bin_cnt;
  }
}
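The two kernels differ only in how CUDA blocks map onto channels: with quant_axis 0 each of the c blocks owns a contiguous run of n / c elements and its own scale, while with quant_axis 1 the grid has cin * cout blocks and block b reuses scale[b % cout]. A rough NumPy model of that mapping (function names are illustrative; like the kernels, it assumes non-zero scales):

import numpy as np

def clip_quant_dequant_axis0(x_flat, scale, bin_cnt, c):
    # Mirrors the quant_axis == 0 grid: block b handles channel b,
    # a contiguous run of n / c elements, with its own scale[b].
    channel_size = x_flat.size // c
    out = np.empty_like(x_flat)
    for b in range(c):  # one CUDA block per channel
        s = scale[b]
        seg = np.clip(x_flat[b * channel_size:(b + 1) * channel_size], -s, s)
        out[b * channel_size:(b + 1) * channel_size] = (
            np.round(bin_cnt * seg / s) * s / bin_cnt)
    return out

def clip_quant_dequant_axis1(x_flat, scale, bin_cnt, cin, cout):
    # Mirrors the quant_axis == 1 grid: cin * cout blocks, block b covers
    # n / (cin * cout) elements and reuses scale[b % cout].
    wh_size = x_flat.size // (cin * cout)
    out = np.empty_like(x_flat)
    for b in range(cin * cout):
        s = scale[b % cout]
        seg = np.clip(x_flat[b * wh_size:(b + 1) * wh_size], -s, s)
        out[b * wh_size:(b + 1) * wh_size] = np.round(bin_cnt * seg / s) * s / bin_cnt
    return out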
template <typename T>
struct ChannelClipFakeQuantDequantFunctor<platform::CUDADeviceContext, T> {
  void operator()(const platform::CUDADeviceContext& ctx,
                  const framework::Tensor& in, const framework::Tensor& scale,
                  const int bin_cnt, const int quant_axis,
                  framework::Tensor* out) {
    // At present, channelwise quantization supports conv2d, depthwise_conv2d
    // conv2d_transpose and mul
    PADDLE_ENFORCE_EQ(
        quant_axis == 0 || quant_axis == 1, true,
        platform::errors::InvalidArgument("'quant_axis' should be 0 or 1, but "
                                          "the received is %d",
                                          quant_axis));

    int num = in.numel();
    auto in_dims = in.dims();

    const T* in_data = in.data<T>();
    const T* scale_data = scale.data<T>();
    T* out_data = out->mutable_data<T>(ctx.GetPlace());

    if (quant_axis == 0) {
      int grid = in_dims[0];
      int block = 1024;
      ChannelClipAndQuantDequantKernelQuantAxis0<
          T><<<grid, block, 0, ctx.stream()>>>(in_data, scale_data, bin_cnt,
                                               num, in_dims[0], out_data);
    } else if (quant_axis == 1) {
      int grid = in_dims[0] * in_dims[1];
      int block = 1024;
      ChannelClipAndQuantDequantKernelQuantAxis1<
          T><<<grid, block, 0, ctx.stream()>>>(
          in_data, scale_data, bin_cnt, num, in_dims[0], in_dims[1], out_data);
    }
  }
};
template struct ChannelClipFakeQuantDequantFunctor<platform::CUDADeviceContext,
                                                   float>;
}  // namespace operators
...
@@ -443,3 +525,6 @@ REGISTER_OP_CUDA_KERNEL(
    ops::FakeQuantizeDequantizeMovingAverageAbsMaxKernel<CUDA, float>);
REGISTER_OP_CUDA_KERNEL(fake_quantize_dequantize_grad,
                        ops::FakeQuantDequantGradKernel<CUDA, float>);
REGISTER_OP_CUDA_KERNEL(
    fake_channel_wise_quantize_dequantize_abs_max,
    ops::FakeChannelWiseQuantizeDequantizeAbsMaxKernel<CUDA, float>);
paddle/fluid/operators/fake_quantize_op.h
...
@@ -72,6 +72,13 @@ struct ChannelClipAndFakeQuantFunctor {
                  const int quant_axis, framework::Tensor* out);
};
template <typename DeviceContext, typename T>
struct ChannelClipFakeQuantDequantFunctor {
  void operator()(const DeviceContext& ctx, const framework::Tensor& in,
                  const framework::Tensor& scale, const int bin_cnt,
                  const int quant_axis, framework::Tensor* out);
};
template <typename DeviceContext, typename T>
struct FindMovingAverageAbsMaxFunctor {
  void operator()(const DeviceContext& ctx, const framework::Tensor& in_accum,
...
@@ -154,6 +161,30 @@ class FakeChannelWiseQuantizeAbsMaxKernel : public framework::OpKernel<T> {
  }
};
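For orientation, the moving-average scale that FindMovingAverageAbsMaxFunctor (declared above) maintains can be sketched as follows; the rho-decay update rule shown here is an assumption inferred from the accum/state tensor names, not code taken from this diff:

def moving_average_abs_max(abs_max, in_accum, in_state, rho=0.9):
    # Decay the running accumulator toward the newest per-batch abs-max,
    # and keep a matching normalizer so early steps are not biased low.
    out_accum = rho * in_accum + abs_max
    out_state = rho * in_state + 1.0
    out_scale = out_accum / out_state
    return out_accum, out_state, out_scale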
template <typename DeviceContext, typename T>
class FakeChannelWiseQuantizeDequantizeAbsMaxKernel
    : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    auto* in = context.Input<framework::Tensor>("X");
    auto* out = context.Output<framework::Tensor>("Out");
    auto* out_scale = context.Output<framework::Tensor>("OutScale");
    T* out_scale_data = out_scale->mutable_data<T>(context.GetPlace());

    auto& dev_ctx = context.template device_context<DeviceContext>();
    out->mutable_data<T>(dev_ctx.GetPlace());

    int bit_length = context.Attr<int>("bit_length");
    int bin_cnt = std::pow(2, bit_length - 1) - 1;
    int quant_axis = context.Attr<int>("quant_axis");

    FindChannelAbsMaxFunctor<DeviceContext, T>()(dev_ctx, *in, quant_axis,
                                                 out_scale_data);

    ChannelClipFakeQuantDequantFunctor<DeviceContext, T>()(
        dev_ctx, *in, *out_scale, bin_cnt, quant_axis, out);
  }
};
template <typename DeviceContext, typename T>
class FakeQuantizeRangeAbsMaxKernel : public framework::OpKernel<T> {
 public:
...
paddle/fluid/operators/fused/fusion_gru_op.cc
...
@@ -15,6 +15,7 @@ limitations under the License. */
#include "paddle/fluid/operators/fused/fusion_gru_op.h"
#include <cstring>  // for memcpy
#include <string>
#include <vector>
#include "paddle/fluid/operators/jit/kernels.h"
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/operators/math/fc.h"
...
paddle/fluid/operators/optimizers/rmsprop_op.cc
...
@@ -143,4 +143,5 @@ http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(rmsprop, ops::RmspropOp, ops::RmspropOpMaker);
REGISTER_OP_CPU_KERNEL(
    rmsprop, ops::RmspropOpKernel<paddle::platform::CPUDeviceContext, float>,
    ops::RmspropOpKernel<paddle::platform::CPUDeviceContext, double>);
paddle/fluid/operators/optimizers/rmsprop_op.cu
...
@@ -15,4 +15,5 @@ limitations under the License. */
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(
    rmsprop, ops::RmspropOpKernel<paddle::platform::CUDADeviceContext, float>,
    ops::RmspropOpKernel<paddle::platform::CUDADeviceContext, double>);
paddle/fluid/operators/top_k_v2_op.cc
...
@@ -32,7 +32,6 @@ class TopkV2Op : public framework::OperatorWithKernel {
    auto input_dims = ctx->GetInputDim("X");
    const int& dim_size = input_dims.size();
    int axis = static_cast<int>(ctx->Attrs().Get<int>("axis"));
    PADDLE_ENFORCE_EQ((axis < dim_size) && (axis >= (-1 * dim_size)), true,
                      "the axis of topk"
...
@@ -41,8 +40,18 @@ class TopkV2Op : public framework::OperatorWithKernel {
    if (axis < 0) axis += dim_size;

    int k;
    auto k_is_tensor = ctx->HasInput("K");
    if (k_is_tensor) {
      k = -1;
    } else {
      k = static_cast<int>(ctx->Attrs().Get<int>("k"));
      PADDLE_ENFORCE_EQ(k >= 1, true,
                        "the attribute of k in the topk must >= 1 or be a "
                        "Tensor, but received %d .",
                        k);
    }

    PADDLE_ENFORCE_GE(input_dims.size(), 1,
                      "input of topk must have >= 1d shape");
...
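The reworked InferShape accepts k either as the "k" attribute or as a runtime "K" input tensor, in which case the static check is skipped and k is marked -1 until execution. A small Python sketch of that decision (function and message names here are hypothetical):

def resolve_topk_k(attrs, has_k_input, dim_size, axis):
    # axis must lie in [-dim_size, dim_size)
    assert -dim_size <= axis < dim_size, "the axis of topk is out of range"
    if axis < 0:
        axis += dim_size
    if has_k_input:
        k = -1  # k comes from the runtime "K" tensor; validated at run time
    else:
        k = int(attrs["k"])
        assert k >= 1, "the attribute of k in the topk must >= 1 or be a Tensor"
    return k, axis

assert resolve_topk_k({"k": 3}, False, dim_size=2, axis=-1) == (3, 1)
assert resolve_topk_k({}, True, dim_size=2, axis=0) == (-1, 0)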
paddle/fluid/platform/cudnn_helper.h
...
@@ -294,6 +294,7 @@ class ScopedTensorDescriptor {
  DISABLE_COPY_AND_ASSIGN(ScopedTensorDescriptor);
};

#if CUDNN_VERSION >= 7201
class ScopedRNNTensorDescriptor {
 public:
  ScopedRNNTensorDescriptor() {
...
@@ -337,6 +338,7 @@ class ScopedRNNTensorDescriptor {
  cudnnRNNDataDescriptor_t desc_;
  DISABLE_COPY_AND_ASSIGN(ScopedRNNTensorDescriptor);
};
#endif

class ScopedDropoutDescriptor {
 public:
...
paddle/fluid/platform/dynload/cudnn.cc
...
@@ -46,6 +46,10 @@ CUDNN_DNN_ROUTINE_EACH_R6(DEFINE_WRAP);
CUDNN_DNN_ROUTINE_EACH_R7(DEFINE_WRAP);
#endif

#ifdef CUDNN_DNN_ROUTINE_EACH_AFTER_TWO_R7
CUDNN_DNN_ROUTINE_EACH_AFTER_TWO_R7(DEFINE_WRAP);
#endif

#ifdef CUDNN_DNN_ROUTINE_EACH_AFTER_R7
CUDNN_DNN_ROUTINE_EACH_AFTER_R7(DEFINE_WRAP);
#endif
...
paddle/fluid/platform/dynload/cudnn.h
...
@@ -101,9 +101,6 @@ extern void EnforceCUDNNLoaded(const char* fn_name);
  __macro(cudnnDropoutGetStatesSize);       \
  __macro(cudnnSetDropoutDescriptor);       \
  __macro(cudnnRestoreDropoutDescriptor);   \
  __macro(cudnnCreateRNNDescriptor);        \
  __macro(cudnnGetRNNParamsSize);           \
  __macro(cudnnGetRNNWorkspaceSize);        \
...
@@ -112,11 +109,6 @@ extern void EnforceCUDNNLoaded(const char* fn_name);
  __macro(cudnnRNNBackwardData);            \
  __macro(cudnnRNNBackwardWeights);         \
  __macro(cudnnRNNForwardInference);        \
  __macro(cudnnDestroyDropoutDescriptor);   \
  __macro(cudnnDestroyRNNDescriptor);       \
  __macro(cudnnSetTensorNdDescriptorEx);
...
@@ -188,6 +180,19 @@ CUDNN_DNN_ROUTINE_EACH_R6(DECLARE_DYNAMIC_LOAD_CUDNN_WRAP)
CUDNN_DNN_ROUTINE_EACH_R7(DECLARE_DYNAMIC_LOAD_CUDNN_WRAP)
#endif

#if CUDNN_VERSION >= 7201
#define CUDNN_DNN_ROUTINE_EACH_AFTER_TWO_R7(__macro) \
  __macro(cudnnCreateRNNDataDescriptor);             \
  __macro(cudnnDestroyRNNDataDescriptor);            \
  __macro(cudnnSetRNNDataDescriptor);                \
  __macro(cudnnSetRNNPaddingMode);                   \
  __macro(cudnnRNNForwardTrainingEx);                \
  __macro(cudnnRNNBackwardDataEx);                   \
  __macro(cudnnRNNBackwardWeightsEx);                \
  __macro(cudnnRNNForwardInferenceEx);
CUDNN_DNN_ROUTINE_EACH_AFTER_TWO_R7(DECLARE_DYNAMIC_LOAD_CUDNN_WRAP)
#endif

#if CUDNN_VERSION >= 7401
#define CUDNN_DNN_ROUTINE_EACH_AFTER_R7(__macro)                     \
  __macro(cudnnGetBatchNormalizationForwardTrainingExWorkspaceSize); \
...
paddle/fluid/platform/flags.cc
...
@@ -521,3 +521,18 @@ DEFINE_int32(
DEFINE_bool(sort_sum_gradient, false,
            "Sum gradients by the reverse order of "
            "the forward execution sequence.");

/**
 * Performance related FLAG
 * Name: max_inplace_grad_add
 * Since Version: 2.0.0
 * Value Range: int32, default=0
 * Example:
 * Note: The maximum number of inplace grad_add.
 */
DEFINE_int32(max_inplace_grad_add, 0,
             "The maximum number of inplace grad_add. When doing "
             "gradient accumulation, if the number of gradients that need "
             "to be added is less than FLAGS_max_inplace_grad_add, several "
             "grad_add ops will be used instead of one sum op. Default is 0.");
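Because this commit also adds 'max_inplace_grad_add' to read_env_flags (see the python/paddle/fluid/__init__.py change below), the flag can be toggled through the environment before paddle is imported; the value 4 here is only an example:

import os
# Flags listed in read_env_flags are picked up at bootstrap time.
os.environ["FLAGS_max_inplace_grad_add"] = "4"

import paddle.fluid as fluid  # bootstrap reads the flag on import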
paddle/fluid/pybind/global_value_getter_setter.cc
...
@@ -62,6 +62,7 @@ DECLARE_bool(use_system_allocator);
// others
DECLARE_bool(benchmark);
DECLARE_int32(inner_op_parallelism);
DECLARE_int32(max_inplace_grad_add);
DECLARE_string(tracer_profile_fname);
#ifdef PADDLE_WITH_CUDA
// cudnn
...
@@ -348,7 +349,7 @@ static void RegisterGlobalVarGetterSetter() {
      FLAGS_init_allocated_mem, FLAGS_initial_cpu_memory_in_mb,
      FLAGS_memory_fraction_of_eager_deletion, FLAGS_use_pinned_memory,
      FLAGS_benchmark, FLAGS_inner_op_parallelism, FLAGS_tracer_profile_fname,
      FLAGS_paddle_num_threads, FLAGS_use_mkldnn, FLAGS_max_inplace_grad_add);
#ifdef PADDLE_WITH_CUDA
  REGISTER_PUBLIC_GLOBAL_VAR(
...
paddle/fluid/pybind/op_function_generator.cc
...
@@ -111,6 +111,7 @@ std::map<std::string, std::set<std::string>> op_passing_outs_map = {
    {"fake_quantize_dequantize_moving_average_abs_max",
     {"Out", "OutScale", "OutAccum", "OutState"}},
    {"fake_quantize_dequantize_abs_max", {"Out", "OutScale"}},
    {"fake_channel_wise_quantize_dequantize_abs_max", {"Out", "OutScale"}},
    {"check_finite_and_unscale", {"Out", "FoundInfinite"}},
    {"update_loss_scaling",
     {"Out", "LossScaling", "OutGoodSteps", "OutBadSteps"}},
...
paddle/fluid/pybind/pybind.cc
...
@@ -12,6 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <Python.h>
#include <algorithm>
#include <cstdlib>
#include <map>
...
@@ -22,6 +23,7 @@ limitations under the License. */
#include <unordered_set>
#include <utility>
#include <vector>
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/feed_fetch_method.h"
#include "paddle/fluid/framework/feed_fetch_type.h"
...
@@ -2528,6 +2530,10 @@ All parameter, weight, gradient are variables in Paddle.
          "enable_inplace",
          [](const BuildStrategy &self) { return self.enable_inplace_; },
          [](BuildStrategy &self, bool b) { self.enable_inplace_ = b; })
      .def_property(
          "enable_addto",
          [](const BuildStrategy &self) { return self.enable_addto_; },
          [](BuildStrategy &self, bool b) { self.enable_addto_ = b; })
      .def_property(
          "fuse_all_reduce_ops",
          [](const BuildStrategy &self) {
...
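With the new binding, enable_addto is reachable from Python in the same way as the neighboring BuildStrategy switches; a minimal, illustrative snippet (the surrounding training setup is omitted):

import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = True   # existing switch, shown for contrast
build_strategy.enable_addto = True     # newly exposed by this diff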
paddle/scripts/paddle_build.sh
...
@@ -121,6 +121,18 @@ function cmake_base() {
            else
                exit 1
            fi
        elif [ "$1" == "cp38-cp38" ]; then
            if [ -d "/Library/Frameworks/Python.framework/Versions/3.8" ]; then
                export LD_LIBRARY_PATH=/Library/Frameworks/Python.framework/Versions/3.8/lib/
                export DYLD_LIBRARY_PATH=/Library/Frameworks/Python.framework/Versions/3.8/lib/
                export PATH=/Library/Frameworks/Python.framework/Versions/3.8/bin/:${PATH}
                PYTHON_FLAGS="-DPYTHON_EXECUTABLE:FILEPATH=/Library/Frameworks/Python.framework/Versions/3.8/bin/python3
                -DPYTHON_INCLUDE_DIR:PATH=/Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/
                -DPYTHON_LIBRARY:FILEPATH=/Library/Frameworks/Python.framework/Versions/3.8/lib/libpython3.8.dylib"
                pip3.8 install --user -r ${PADDLE_ROOT}/python/requirements.txt
            else
                exit 1
            fi
        fi
        # delete `gym` to avoid modifying requirements.txt in *.whl
        sed -i .bak "/^gym$/d" ${PADDLE_ROOT}/python/requirements.txt
...
@@ -176,6 +188,13 @@ function cmake_base() {
                    -DPYTHON_INCLUDE_DIR:PATH=/opt/_internal/cpython-3.7.0/include/python3.7m
                    -DPYTHON_LIBRARIES:FILEPATH=/opt/_internal/cpython-3.7.0/lib/libpython3.so"
                pip3.7 install -r ${PADDLE_ROOT}/python/requirements.txt
            elif [ "$1" == "cp38-cp38" ]; then
                export LD_LIBRARY_PATH=/opt/_internal/cpython-3.8.0/lib/:${LD_LIBRARY_PATH}
                export PATH=/opt/_internal/cpython-3.8.0/bin/:${PATH}
                export PYTHON_FLAGS="-DPYTHON_EXECUTABLE:FILEPATH=/opt/_internal/cpython-3.8.0/bin/python3.8
                -DPYTHON_INCLUDE_DIR:PATH=/opt/_internal/cpython-3.8.0/include/python3.8
                -DPYTHON_LIBRARIES:FILEPATH=/opt/_internal/cpython-3.8.0/lib/libpython3.so"
                pip3.8 install -r ${PADDLE_ROOT}/python/requirements.txt
            fi
        else
            pip install -r ${PADDLE_ROOT}/python/requirements.txt
...
@@ -514,6 +533,8 @@ EOF
            pip3.6 uninstall -y paddlepaddle
        elif [ "$1" == "cp37-cp37m" ]; then
            pip3.7 uninstall -y paddlepaddle
        elif [ "$1" == "cp38-cp38" ]; then
            pip3.8 uninstall -y paddlepaddle
        fi
        set -ex
...
@@ -527,6 +548,8 @@ EOF
            pip3.6 install --user ${INSTALL_PREFIX:-/paddle/build}/opt/paddle/share/wheels/*.whl
        elif [ "$1" == "cp37-cp37m" ]; then
            pip3.7 install --user ${INSTALL_PREFIX:-/paddle/build}/opt/paddle/share/wheels/*.whl
        elif [ "$1" == "cp38-cp38" ]; then
            pip3.8 install --user ${INSTALL_PREFIX:-/paddle/build}/opt/paddle/share/wheels/*.whl
        fi
        tmpfile_rand=`date +%s%N`
        tmpfile=$tmp_dir/$tmpfile_rand
...
@@ -666,7 +689,7 @@ function generate_api_spec() {
    awk -F '(' '{print $NF}' $spec_path > ${spec_path}.doc
    awk -F '(' '{$NF="";print $0}' $spec_path > ${spec_path}.api
    if [ "$1" == "cp35-cp35m" ] || [ "$1" == "cp36-cp36m" ] || [ "$1" == "cp37-cp37m" ] || [ "$1" == "cp38-cp38" ]; then
        # Use sed to make python2 and python3 sepc keeps the same
        sed -i 's/arg0: str/arg0: unicode/g' $spec_path
        sed -i "s/\(.*Transpiler.*\).__init__ (ArgSpec(args=\['self'].*/\1.__init__ /g" $spec_path
...
@@ -1244,21 +1267,25 @@ EOF
        ref_paddle35=paddlepaddle${install_gpu}-${PADDLE_BRANCH}-cp35-cp35m-linux_x86_64.whl
        ref_paddle36=paddlepaddle${install_gpu}-${PADDLE_BRANCH}-cp36-cp36m-linux_x86_64.whl
        ref_paddle37=paddlepaddle${install_gpu}-${PADDLE_BRANCH}-cp37-cp37m-linux_x86_64.whl
        ref_paddle38=paddlepaddle${install_gpu}-${PADDLE_BRANCH}-cp38-cp38-linux_x86_64.whl
        ref_paddle2_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}-cp27-cp27mu-linux_x86_64.whl
        ref_paddle35_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}-cp35-cp35m-linux_x86_64.whl
        ref_paddle36_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}-cp36-cp36m-linux_x86_64.whl
        ref_paddle37_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}-cp37-cp37m-linux_x86_64.whl
        ref_paddle38_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}-cp38-cp38-linux_x86_64.whl

        if [[ ${PADDLE_BRANCH} != "0.0.0" && ${WITH_MKL} == "ON" && ${WITH_GPU} == "ON" ]]; then
            ref_paddle2=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp27-cp27mu-linux_x86_64.whl
            ref_paddle35=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp35-cp35m-linux_x86_64.whl
            ref_paddle36=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp36-cp36m-linux_x86_64.whl
            ref_paddle37=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp37-cp37m-linux_x86_64.whl
            ref_paddle38=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp38-cp38-linux_x86_64.whl
            ref_paddle2_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp27-cp27mu-linux_x86_64.whl
            ref_paddle35_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp35-cp35m-linux_x86_64.whl
            ref_paddle36_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp36-cp36m-linux_x86_64.whl
            ref_paddle37_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp37-cp37m-linux_x86_64.whl
            ref_paddle38_whl=paddlepaddle${install_gpu}-${PADDLE_BRANCH}.post${ref_CUDA_MAJOR}${CUDNN_MAJOR}-cp38-cp38-linux_x86_64.whl
        fi
        #ref_paddle2_mv1=""
...
@@ -1363,6 +1390,22 @@ EOF
        apt-get clean -y && \
        rm -f ${ref_paddle37} && \
        ldconfig
EOF
    cat >> ${PADDLE_ROOT}/build/Dockerfile <<EOF
    # run paddle version to install python packages first
    RUN apt-get update && ${NCCL_DEPS}
    RUN apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
        libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
        xz-utils tk-dev libffi-dev liblzma-dev
    RUN wget -q https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tgz && \
        tar -xzf Python-3.8.0.tgz && cd Python-3.8.0 && \
        CFLAGS="-Wformat" ./configure --prefix=/usr/local/ --enable-shared > /dev/null && \
        make -j8 > /dev/null && make altinstall > /dev/null && cd ../ && rm Python-3.8.0.tgz
    RUN apt-get install -y libgtk2.0-dev dmidecode python3-tk && ldconfig && \
        pip3.8 install opencv-python && wget ${ref_web}/${ref_paddle38} && pip3.8 install ${ref_paddle38_whl}; apt-get install -f -y && \
        apt-get clean -y && \
        rm -f ${ref_paddle38} && \
        ldconfig
EOF
    cat >> ${PADDLE_ROOT}/build/Dockerfile <<EOF
    # run paddle version to install python packages first
...
python/paddle/distributed/fleet/__init__.py
...
@@ -42,6 +42,7 @@ server_num = fleet.server_num
server_index = fleet.server_index
server_endpoints = fleet.server_endpoints
is_server = fleet.is_server
set_util = fleet.set_util
util = fleet.util
barrier_worker = fleet.barrier_worker
init_worker = fleet.init_worker
...
python/paddle/distributed/fleet/base/fleet_base.py
...
@@ -180,6 +180,8 @@ class Fleet(object):
            raise ValueError(
                "`role_maker` should be subclass of `RoleMakerBase`, but got {}".
                format(type(role_maker)))
        self._role_maker._generate_role()

        self.strategy_compiler = StrategyCompiler()
        if paddle.fluid.framework.in_dygraph_mode():
            if parallel_helper._is_parallel_ctx_initialized():
...
@@ -187,7 +189,6 @@ class Fleet(object):
                    "The dygraph parallel environment has been initialized.")
            else:
                paddle.distributed.init_parallel_env()

    def is_first_worker(self):
        """
...
@@ -206,7 +207,7 @@ class Fleet(object):
            fleet.is_first_worker()
        """
        return self._role_maker._is_first_worker()

    def worker_index(self):
        """
...
@@ -223,7 +224,7 @@ class Fleet(object):
            fleet.worker_index()
        """
        return self._role_maker._worker_index()

    def worker_num(self):
        """
...
@@ -240,7 +241,7 @@ class Fleet(object):
            fleet.worker_num()
        """
        return self._role_maker._worker_num()

    def is_worker(self):
        """
...
@@ -258,7 +259,7 @@ class Fleet(object):
            fleet.is_worker()
        """
        return self._role_maker._is_worker()

    def worker_endpoints(self, to_string=False):
        """
...
@@ -275,13 +276,10 @@ class Fleet(object):
            fleet.worker_endpoints()
        """
        '''
        if to_string:
            return ",".join(self._role_maker._get_trainer_endpoints())
        else:
            return self._role_maker._get_trainer_endpoints()
        '''
        return ["127.0.0.1:1001", "127.0.0.1:1002"]

    def server_num(self):
        """
...
@@ -296,7 +294,7 @@ class Fleet(object):
            fleet.init()
            fleet.server_num()
        """
        return len(self._role_maker._get_pserver_endpoints())

    def server_index(self):
        """
...
@@ -313,7 +311,7 @@ class Fleet(object):
            fleet.server_index()
        """
        return self._role_maker._server_index()

    def server_endpoints(self, to_string=False):
        """
...
@@ -332,9 +330,9 @@ class Fleet(object):
        """
        if to_string:
            return ",".join(self._role_maker._get_pserver_endpoints())
        else:
            return self._role_maker._get_pserver_endpoints()

    def is_server(self):
        """
...
@@ -352,10 +350,12 @@ class Fleet(object):
            fleet.is_server()
        """
        return self._role_maker._is_server(
        ) or self._role_maker._is_heter_worker()

    @property
    def set_util(self, util):
        self._util = util

    def util(self):
        """
        Utility functions that can be used under certain runtime
...
@@ -376,16 +376,6 @@ class Fleet(object):
        """
        return self._util

    def barrier_worker(self):
        """
        barrier all workers
...
@@ -393,7 +383,7 @@ class Fleet(object):
        Returns:
            None
        """
        self._role_maker._barrier("worker")

    @is_non_distributed_check
    @inited_runtime_handler
...
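Taken together, these renames keep the public Fleet surface unchanged while routing every call to the underscore-prefixed RoleMaker internals. A minimal sketch assembled from the docstrings above (not new behavior added by this diff):

import paddle.distributed.fleet as fleet

fleet.init()  # sets up the role maker
if fleet.is_first_worker():
    print("worker index:", fleet.worker_index(), "of", fleet.worker_num())
if fleet.is_server():
    print("server index:", fleet.server_index())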
python/paddle/distributed/fleet/base/role_maker.py
(This diff is collapsed.)
python/paddle/distributed/fleet/base/util_factory.py
...
@@ -57,34 +57,7 @@ class UtilBase(object):
            ), "fs_client must be the instance of paddle.distributed.fleet.utils.FS"
            self.fs_client = fs_client

    def all_reduce(self, input, mode="sum", comm_world="worker"):
        """
        All reduce `input` between specified collection. This is a distributed API.
...
@@ -130,8 +103,7 @@ class UtilBase(object):
            if __name__ == "__main__":
                train()
        """
        return self.role_maker._all_reduce(input, mode, comm_world)

    def barrier(self, comm_world="worker"):
        """
...
@@ -170,8 +142,7 @@ class UtilBase(object):
            if __name__ == "__main__":
                train()
        """
        self.role_maker._barrier(comm_world)

    def all_gather(self, input, comm_world="worker"):
        """
...
@@ -219,8 +190,8 @@ class UtilBase(object):
            if __name__ == "__main__":
                train()
        """
        return self.role_maker._all_gather(input, comm_world)

    def _broadcast(self):
        pass
...
@@ -266,8 +237,8 @@ class UtilBase(object):
        if not isinstance(files, list):
            raise TypeError("files should be a list of files to be read.")

        trainer_id = self.role_maker._worker_index()
        trainers = self.role_maker._worker_num()

        remainder = len(files) % trainers
        blocksize = int(len(files) / trainers)
...
@@ -309,7 +280,7 @@ class UtilBase(object):
            fleet_util._set_role_maker(role)
            fleet_util.print_on_rank("I'm worker 0", 0)
        """
        if self.role_maker._worker_index() != rank_id:
            return
        print(message)
...
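The trainer_id/remainder/blocksize bookkeeping above splits a file list as evenly as possible across workers, with the first `remainder` trainers absorbing one extra file apiece. A standalone sketch of that scheme (the function name is illustrative, not UtilBase's exact code):

def shard_files(files, trainer_id, trainers):
    # Each trainer gets blocksize files; the first `remainder`
    # trainers absorb one extra file apiece.
    remainder = len(files) % trainers
    blocksize = len(files) // trainers
    begin = trainer_id * blocksize + min(trainer_id, remainder)
    end = begin + blocksize + (1 if trainer_id < remainder else 0)
    return files[begin:end]

# e.g. 5 files over 2 trainers -> 3 for trainer 0, 2 for trainer 1
assert shard_files(list("abcde"), 0, 2) == ["a", "b", "c"]
assert shard_files(list("abcde"), 1, 2) == ["d", "e"]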
python/paddle/distributed/fleet/launch.py
...
@@ -55,7 +55,10 @@ launch a process on each of the given gpu card or cpu machine.
"""
from __future__ import print_function

import shutil
import sys
import tempfile
from sys import version
import subprocess
import os
...
@@ -213,12 +216,20 @@ def launch_collective(args):
    cluster, pod = get_cluster_from_args(args, gpus)
    logger.debug("get cluster from args:{}".format(cluster))

    global_envs = copy.copy(os.environ.copy())
    gloo_rendezvous_dir = tempfile.mkdtemp()
    # add gloo env
    global_envs["PADDLE_WITH_GLOO"] = "1"
    global_envs["PADDLE_GLOO_RENDEZVOUS"] = "2"
    global_envs["PADDLE_GLOO_FS_PATH"] = gloo_rendezvous_dir

    procs = start_local_trainers(
        cluster,
        pod,
        training_script=args.training_script,
        training_script_args=args.training_script_args,
        log_dir=args.log_dir,
        envs=global_envs)

    while True:
        alive = watch_local_trainers(procs, cluster.trainers_nranks())
...
@@ -230,6 +241,9 @@ def launch_collective(args):
        time.sleep(3)

    if os.path.exists(gloo_rendezvous_dir):
        shutil.rmtree(gloo_rendezvous_dir)


def launch_ps(args):
    ports = None
...
@@ -315,6 +329,13 @@ def launch_ps(args):
    default_env = os.environ.copy()
    current_env = copy.copy(default_env)

    gloo_rendezvous_dir = tempfile.mkdtemp()
    # add gloo env
    current_env["PADDLE_WITH_GLOO"] = "1"
    current_env["PADDLE_GLOO_RENDEZVOUS"] = "2"
    current_env["PADDLE_GLOO_FS_PATH"] = gloo_rendezvous_dir

    current_env.pop("http_proxy", None)
    current_env.pop("https_proxy", None)

    procs = []
...
@@ -419,6 +440,9 @@ def launch_ps(args):
            procs[i].proc.terminate()
        print("all parameter server are killed", file=sys.stderr)

    if os.path.exists(gloo_rendezvous_dir):
        shutil.rmtree(gloo_rendezvous_dir)


def launch():
    args = _parse_args()
...
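Both launch paths now follow the same pattern: create a throwaway rendezvous directory, advertise it to gloo through the environment, and remove it once the child processes exit. A condensed sketch of that lifecycle (the helper name is hypothetical; the env keys and values are copied from the diff above):

import os
import shutil
import tempfile

def with_gloo_rendezvous(envs):
    # Point gloo at a fresh file-system rendezvous directory.
    gloo_rendezvous_dir = tempfile.mkdtemp()
    envs["PADDLE_WITH_GLOO"] = "1"
    envs["PADDLE_GLOO_RENDEZVOUS"] = "2"  # same hard-coded value as above
    envs["PADDLE_GLOO_FS_PATH"] = gloo_rendezvous_dir
    return gloo_rendezvous_dir

envs = os.environ.copy()
tmp_dir = with_gloo_rendezvous(envs)
try:
    pass  # spawn trainers / parameter servers with `envs` here
finally:
    if os.path.exists(tmp_dir):
        shutil.rmtree(tmp_dir)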
python/paddle/distributed/fleet/launch_utils.py
...
@@ -398,8 +398,14 @@ def start_local_trainers(cluster,
                         pod,
                         training_script,
                         training_script_args,
                         log_dir=None,
                         envs=None):

    if envs is None:
        current_env = copy.copy(os.environ.copy())
    else:
        current_env = copy.copy(envs)

    #paddle broadcast ncclUniqueId use socket, and
    #proxy maybe make trainers unreachable, so delete them.
    #if we set them to "", grpc will log error message "bad uri"
...
python/paddle/distributed/fleet/meta_optimizers/common.py
...
@@ -57,12 +57,12 @@ class CollectiveHelper(object):
        if startup_program is None:
            self.startup_program = fluid.default_startup_program()

        endpoints = self.role_maker._get_trainer_endpoints()
        current_endpoint = endpoints[self.role_maker._worker_index()]
        for ring_id in range(self.nrings):
            self._init_communicator(
                self.startup_program, current_endpoint, endpoints,
                self.role_maker._worker_index(), ring_id, self.wait_port)
        self._broadcast_params()

    def _init_communicator(self, program, current_endpoint, endpoints, rank,
...
python/paddle/distributed/fleet/meta_optimizers/dgc_optimizer.py
...
@@ -47,7 +47,7 @@ class DGCOptimizer(MetaOptimizerBase):
                sparsity=configs['sparsity'],
                parameter_list=opt._parameter_list,
                use_nesterov=opt._use_nesterov,
                num_trainers=self.role_maker._worker_num(),
                regularization=opt.regularization,
                grad_clip=opt._grad_clip,
                name=opt._name)
...
@@ -60,7 +60,7 @@ class DGCOptimizer(MetaOptimizerBase):
        if not isinstance(self.inner_opt, Momentum):
            logging.warn("dgc only works on Momentum optimizer")
            return False
        if self.role_maker._worker_num() <= 1:
            logging.warn("dgc only works on multi cards")
            return False
...
python/paddle/distributed/fleet/meta_optimizers/graph_execution_optimizer.py
...
@@ -50,12 +50,12 @@ class GraphExecutionOptimizer(MetaOptimizerBase):
    # should fix the variable
    def _setup_nccl_op(self, startup_program, main_program, build_strategy):
        trainer_endpoints = self.role_maker._get_trainer_endpoints()
        trainers = trainer_endpoints
        trainer_id = self.role_maker._worker_index()
        current_endpoint = self.role_maker._get_trainer_endpoints()[trainer_id]
        trainer_endpoints_env = ",".join(trainer_endpoints)
        trainers_num = self.role_maker._worker_num()
        nccl_id_var = startup_program.global_block().create_var(
            name="NCCLID", persistable=True, type=core.VarDesc.VarType.RAW)
        for i in range(1, build_strategy.nccl_comm_num):
...
@@ -127,8 +127,8 @@ class GraphExecutionOptimizer(MetaOptimizerBase):
        local_build_strategy.enable_sequential_execution = True

        exe_strategy = self.user_defined_strategy.execution_strategy
        worker_num = self.role_maker._worker_num()
        node_num = self.role_maker._node_num()
        if self.role_maker._is_collective:
            assert worker_num >= 1, "nccl2 worker_num must >= 1, now:{}" % worker_num
...
@@ -170,9 +170,9 @@ class GraphExecutionOptimizer(MetaOptimizerBase):
        # TODO(guru4elephant): should be an independent optimizer
        self._setup_nccl_op(startup_program, main_program, local_build_strategy)

        local_build_strategy.num_trainers = self.role_maker._worker_num()
        local_build_strategy.trainer_id = self.role_maker._worker_index()
        local_build_strategy.trainers_endpoints = self.role_maker._get_trainer_endpoints(
        )
        local_build_strategy.enable_backward_optimizer_op_deps = True
...
python/paddle/distributed/fleet/meta_optimizers/localsgd_optimizer.py
...
@@ -38,7 +38,7 @@ class LocalSGDOptimizer(MetaOptimizerBase):
        if not self.user_defined_strategy.localsgd:
            return False

        if self.role_maker._worker_num() <= 1:
            return False

        return isinstance(self.inner_opt, paddle.optimizer.momentum.Momentum) \
...
@@ -168,7 +168,7 @@ class LocalSGDOptimizer(MetaOptimizerBase):
                    inputs={'X': [param]},
                    outputs={'Out': [param]},
                    attrs={
                        'scale': 1.0 / self.role_maker._worker_num(),
                        OP_ROLE_KEY: OpRole.Optimize
                    })
                sub_block.append_op(
...
@@ -208,7 +208,7 @@ class AdaptiveLocalSGDOptimizer(MetaOptimizerBase):
        if not self.user_defined_strategy.adaptive_localsgd:
            return False

        if self.role_maker._worker_num() <= 1:
            return False

        return isinstance(self.inner_opt, paddle.optimizer.momentum.Momentum) \
...
@@ -275,7 +275,7 @@ class AdaptiveLocalSGDOptimizer(MetaOptimizerBase):
                inputs={'X': [avg_loss]},
                outputs={'Out': [avg_loss]},
                attrs={
                    'scale': 1.0 / self.role_maker._worker_num(),
                    OP_ROLE_KEY: OpRole.Optimize
                })
...
@@ -398,7 +398,7 @@ class AdaptiveLocalSGDOptimizer(MetaOptimizerBase):
                    inputs={'X': [param]},
                    outputs={'Out': [param]},
                    attrs={
                        'scale': 1.0 / self.role_maker._worker_num(),
                        OP_ROLE_KEY: OpRole.Optimize
                    })
                sub_block.append_op(
...
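Scaling each parameter (or the averaged loss) by 1.0 / worker_num after the cross-worker sum is what turns the collective sum into the element-wise average that LocalSGD synchronizes to. Numerically (the helper name is illustrative):

import numpy as np

def average_after_sum(param_sums, worker_num):
    # After summing a tensor across workers, scale by 1/worker_num to
    # recover the element-wise average (the `scale` op above).
    return param_sums * (1.0 / worker_num)

params = np.stack([np.full(3, w + 1.0) for w in range(4)])  # 4 workers
summed = params.sum(axis=0)
assert np.allclose(average_after_sum(summed, 4), np.full(3, 2.5))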
python/paddle/distributed/fleet/meta_optimizers/parameter_server_graph_optimizer.py
...
@@ -31,7 +31,7 @@ class ParameterServerGraphOptimizer(ParameterServerOptimizer):
        if k_steps < 0:
            return False

        if self.role_maker._is_server():
            return False

        if self.role_maker._is_heter_parameter_server_mode:
...
python/paddle/distributed/fleet/meta_optimizers/parameter_server_optimizer.py
...
@@ -239,10 +239,10 @@ class ParameterServerOptimizer(MetaOptimizerBase):
            strategy, self.role_maker)
        compiled_config.strategy = strategy

        if self.role_maker._is_worker() or self.role_maker._is_heter_worker():
            main_program, startup_program = self._build_trainer_programs(
                compiled_config)
        elif self.role_maker._is_server():
            main_program, startup_program = self._build_pserver_programs(
                compiled_config)
...
python/paddle/distributed/fleet/meta_optimizers/pipeline_optimizer.py
...
@@ -126,11 +126,11 @@ class PipelineOptimizer(MetaOptimizerBase):
        optimize_ops, params_grads, prog_list = \
            self.wrapped_opt.minimize(loss, startup_program,
                                      parameter_list, no_grad_set)
        if self.role_maker._worker_num() == 1:
            return optimize_ops, params_grads

        endpoints = self.role_maker._get_trainer_endpoints()
        current_endpoint = endpoints[self.role_maker._worker_index()]
        self.startup_program = startup_program
        if startup_program is None:
            self.startup_program = fluid.default_startup_program()
...
@@ -142,7 +142,7 @@ class PipelineOptimizer(MetaOptimizerBase):
        self.nranks = nranks
        self.nrings = len(self.main_program_list)

        self.rank = self.role_maker._worker_index()
        self.endpoints = endpoints
        self.current_endpoint = current_endpoint
...
python/paddle/distributed/fleet/runtime/parameter_server_runtime.py
...
@@ -104,9 +104,9 @@ class ParameterServerRuntime(RuntimeBase):
...
@@ -104,9 +104,9 @@ class ParameterServerRuntime(RuntimeBase):
def
_init_worker
(
self
):
def
_init_worker
(
self
):
def
sync_strategy_envs
():
def
sync_strategy_envs
():
kwargs
=
{}
kwargs
=
{}
kwargs
[
"pserver_endpoints"
]
=
self
.
role_maker
.
get_pserver_endpoints
(
kwargs
[
)
"pserver_endpoints"
]
=
self
.
role_maker
.
_get_pserver_endpoints
(
)
kwargs
[
"trainer_id"
]
=
self
.
role_maker
.
worker_index
()
kwargs
[
"trainer_id"
]
=
self
.
role_maker
.
_
worker_index
()
return
kwargs
return
kwargs
def
geo_strategy_envs
():
def
geo_strategy_envs
():
...
@@ -150,7 +150,7 @@ class ParameterServerRuntime(RuntimeBase):
...
@@ -150,7 +150,7 @@ class ParameterServerRuntime(RuntimeBase):
return
"#"
.
join
(
init_attrs
)
return
"#"
.
join
(
init_attrs
)
kwargs
=
{}
kwargs
=
{}
kwargs
[
"trainers"
]
=
self
.
role_maker
.
worker_num
()
kwargs
[
"trainers"
]
=
self
.
role_maker
.
_
worker_num
()
kwargs
[
"sparse_attrs"
]
=
get_sparse_attrs
()
kwargs
[
"sparse_attrs"
]
=
get_sparse_attrs
()
return
kwargs
return
kwargs
...
@@ -338,7 +338,7 @@ class ParameterServerRuntime(RuntimeBase):
...
@@ -338,7 +338,7 @@ class ParameterServerRuntime(RuntimeBase):
block
.
append_op
(
block
.
append_op
(
type
=
'recv_save'
,
type
=
'recv_save'
,
attrs
=
{
attrs
=
{
"trainer_id"
:
self
.
role_maker
.
worker_index
(),
"trainer_id"
:
self
.
role_maker
.
_
worker_index
(),
"shape"
:
var
.
shape
,
"shape"
:
var
.
shape
,
"slice_shapes"
:
"slice_shapes"
:
[
","
.
join
([
str
(
i
)
for
i
in
var
.
shape
])],
[
","
.
join
([
str
(
i
)
for
i
in
var
.
shape
])],
...
@@ -378,14 +378,15 @@ class ParameterServerRuntime(RuntimeBase):
...
@@ -378,14 +378,15 @@ class ParameterServerRuntime(RuntimeBase):
block
.
append_op
(
block
.
append_op
(
type
=
'recv_save'
,
type
=
'recv_save'
,
attrs
=
{
attrs
=
{
"trainer_id"
:
self
.
role_maker
.
worker_index
(),
"trainer_id"
:
self
.
role_maker
.
_
worker_index
(),
"shape"
:
var
.
shape
,
"shape"
:
var
.
shape
,
"slice_shapes"
:
slice_shapes
,
"slice_shapes"
:
slice_shapes
,
"slice_varnames"
:
var_ctx
.
split_varnames
(),
"slice_varnames"
:
var_ctx
.
split_varnames
(),
"remote_varnames"
:
var_ctx
.
split_varnames
(),
"remote_varnames"
:
var_ctx
.
split_varnames
(),
"is_sparse"
:
True
,
"is_sparse"
:
True
,
"endpoints"
:
var_ctx
.
split_endpoints
(),
"endpoints"
:
var_ctx
.
split_endpoints
(),
"pserver_num"
:
len
(
self
.
role_maker
.
get_pserver_endpoints
()),
"pserver_num"
:
len
(
self
.
role_maker
.
_get_pserver_endpoints
()),
"file_path"
:
os
.
path
.
join
(
dirname
,
var
.
name
)
"file_path"
:
os
.
path
.
join
(
dirname
,
var
.
name
)
})
})
...
@@ -403,7 +404,7 @@ class ParameterServerRuntime(RuntimeBase):
...
@@ -403,7 +404,7 @@ class ParameterServerRuntime(RuntimeBase):
block
.
append_op
(
block
.
append_op
(
type
=
'recv_save'
,
type
=
'recv_save'
,
attrs
=
{
attrs
=
{
"trainer_id"
:
self
.
role_maker
.
worker_index
(),
"trainer_id"
:
self
.
role_maker
.
_
worker_index
(),
"shape"
:
var
.
shape
,
"shape"
:
var
.
shape
,
"slice_shapes"
:
slice_shapes
,
"slice_shapes"
:
slice_shapes
,
"slice_varnames"
:
slice_varnames
,
"slice_varnames"
:
slice_varnames
,
...
@@ -411,7 +412,7 @@ class ParameterServerRuntime(RuntimeBase):
...
@@ -411,7 +412,7 @@ class ParameterServerRuntime(RuntimeBase):
"is_sparse"
:
True
,
"is_sparse"
:
True
,
"endpoints"
:
var_ctx
.
split_endpoints
(),
"endpoints"
:
var_ctx
.
split_endpoints
(),
"pserver_num"
:
"pserver_num"
:
len
(
self
.
role_maker
.
get_pserver_endpoints
()),
len
(
self
.
role_maker
.
_
get_pserver_endpoints
()),
"file_path"
:
os
.
path
.
join
(
dirname
,
var
.
name
)
"file_path"
:
os
.
path
.
join
(
dirname
,
var
.
name
)
})
})
...
@@ -422,7 +423,7 @@ class ParameterServerRuntime(RuntimeBase):
...
@@ -422,7 +423,7 @@ class ParameterServerRuntime(RuntimeBase):
block
.
append_op
(
block
.
append_op
(
type
=
'recv_save'
,
type
=
'recv_save'
,
attrs
=
{
attrs
=
{
"trainer_id"
:
self
.
role_maker
.
worker_index
(),
"trainer_id"
:
self
.
role_maker
.
_
worker_index
(),
"shape"
:
var
.
shape
,
"shape"
:
var
.
shape
,
"slice_shapes"
:
"slice_shapes"
:
[
","
.
join
([
str
(
i
)
for
i
in
var
.
shape
])],
[
","
.
join
([
str
(
i
)
for
i
in
var
.
shape
])],
...
...
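Every hunk in this file is the same mechanical rename: the runtime now calls the underscore-prefixed RoleMaker accessors instead of the old public ones. A minimal sketch of the resulting call pattern, using a hypothetical stub in place of fleet's real RoleMaker:

class _StubRoleMaker(object):
    """Hypothetical stand-in for fleet's RoleMaker, for illustration only."""

    def _worker_index(self):
        return 0

    def _get_pserver_endpoints(self):
        return ["127.0.0.1:6170", "127.0.0.1:6171"]


role_maker = _StubRoleMaker()
# After this commit the runtime builds its kwargs from the private accessors:
kwargs = {
    "pserver_endpoints": role_maker._get_pserver_endpoints(),
    "trainer_id": role_maker._worker_index(),
}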
python/paddle/fluid/__init__.py
View file @ f52c4f8b
...
@@ -197,6 +197,7 @@ def __bootstrap__():
         'free_when_no_cache_hit',
         'call_stack_level',
         'sort_sum_gradient',
+        'max_inplace_grad_add',
     ]
     if 'Darwin' not in sysstr:
         read_env_flags.append('use_pinned_memory')
...
...
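This registers 'max_inplace_grad_add' as an environment flag read at bootstrap, so FLAGS_max_inplace_grad_add can be controlled from the environment. A minimal sketch of setting it (the value 8 is an arbitrary example; the flag must be exported before paddle is first imported):

import os

# Flags listed in read_env_flags are consumed when paddle bootstraps,
# so set them before the first paddle import.
os.environ["FLAGS_max_inplace_grad_add"] = "8"

import paddle.fluid as fluid  # __bootstrap__() picks the flag up here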
python/paddle/fluid/backward.py
View file @ f52c4f8b
...
@@ -251,12 +251,19 @@ def _rename_arg_(op_descs, old_name, new_name, begin_idx=None, end_idx=None):
         begin_idx = 0
     if end_idx is None:
         end_idx = len(op_descs)
-    for i in range(begin_idx, end_idx):
-        op_desc = op_descs[i]
-        if isinstance(op_desc, tuple):
-            op_desc = op_desc[0]
-        op_desc._rename_input(old_name, new_name)
-        op_desc._rename_output(old_name, new_name)
+    if isinstance(op_descs, (list, tuple)):
+        for i in range(begin_idx, end_idx):
+            op_desc = op_descs[i]
+            if isinstance(op_desc, tuple):
+                op_desc = op_desc[0]
+            op_desc._rename_input(old_name, new_name)
+            op_desc._rename_output(old_name, new_name)
+    if isinstance(op_descs, collections.OrderedDict):
+        for key, value in op_descs.items():
+            if isinstance(value, (list, tuple)):
+                for op_desc in value:
+                    op_desc._rename_input(old_name, new_name)
+                    op_desc._rename_output(old_name, new_name)


 def _create_op_desc_(op_type, inputs, outputs, attrs):
...
@@ -369,6 +376,41 @@ def _append_grad_suffix_(name):
     return cpt.to_text(name) + core.grad_var_suffix()


+def _accumulate_gradients_by_sum_op_(var_name, renamed_vars, pending_sum_ops,
+                                     op_idx):
+    """
+    Use sum op to accumulate_gradients, the gradients are stored in renamed_vars.
+    """
+    if op_idx not in pending_sum_ops.keys():
+        pending_sum_ops[op_idx] = []
+    pending_sum_ops[op_idx].append(
+        _create_op_desc_("sum", {"X": renamed_vars[var_name]},
+                         {"Out": [var_name]}, {"use_mkldnn": False}))
+    renamed_vars[var_name] = [var_name]
+
+
+def _accumulate_gradients_by_add_ops_(var_name, renamed_vars, pending_sum_ops,
+                                      op_idx):
+    """
+    Use several inplace add op to accumulate_gradients, the gradients are stored in renamed_vars.
+    """
+    if op_idx not in pending_sum_ops.keys():
+        pending_sum_ops[op_idx] = []
+    out_name = renamed_vars[var_name][0]
+    for i in range(1, len(renamed_vars[var_name])):
+        x_name = out_name
+        y_name = renamed_vars[var_name][i]
+        if i != len(renamed_vars[var_name]) - 1:
+            out_name = var_name + '@ADD@' + str(i)
+        else:
+            out_name = var_name
+        pending_sum_ops[op_idx].append(
+            _create_op_desc_("grad_add", {"X": [x_name], "Y": [y_name]},
+                             {"Out": [out_name]}, {"use_mkldnn": False}))
+    renamed_vars[var_name] = [var_name]
+
+
 def _addup_repetitive_outputs_(op_descs, block_idx):
     """
     In backward part, an variable may be the output of more than one ops.
...
@@ -376,7 +418,9 @@ def _addup_repetitive_outputs_(op_descs, block_idx):
     In these cases, the variable should be the accumulation of all the outputs.
     `sum_op`s are added to implement the accumulate.
     """
-    pending_sum_ops = []
+    _MAX_ADD_NUM_ = core.globals()['FLAGS_max_inplace_grad_add']
+    #pending_sum_ops = []
+    pending_sum_ops = collections.OrderedDict()
     var_rename_count = collections.defaultdict(int)
     renamed_vars = collections.defaultdict(list)
     renamed_var_start_idx = collections.defaultdict(list)
...
@@ -385,10 +429,13 @@ def _addup_repetitive_outputs_(op_descs, block_idx):
             if "@GRAD" not in var_name:
                 continue
             if len(renamed_vars[var_name]) > 1:
-                pending_sum_ops.append((_create_op_desc_(
-                    "sum", {"X": renamed_vars[var_name]}, {"Out": [var_name]},
-                    {"use_mkldnn": False}), idx))
-                renamed_vars[var_name] = [var_name]
+                if len(renamed_vars[var_name]) > _MAX_ADD_NUM_:
+                    _accumulate_gradients_by_sum_op_(var_name, renamed_vars,
+                                                     pending_sum_ops, idx)
+                else:
+                    _accumulate_gradients_by_add_ops_(var_name, renamed_vars,
+                                                      pending_sum_ops, idx)

         for param_idx, param_name in enumerate(op_desc.output_names()):
             arg_names = op_desc.output(param_name)
             for arg_idx, var_name in enumerate(arg_names):
...
@@ -440,13 +487,26 @@ def _addup_repetitive_outputs_(op_descs, block_idx):
                     renamed_vars[var_name].append(new_name)

     for var_name, inputs in six.iteritems(renamed_vars):
-        if len(inputs) > 1:
-            pending_sum_ops.append(
-                (_create_op_desc_("sum", {"X": inputs}, {"Out": [var_name]},
-                                  {"use_mkldnn": False}), len(op_descs)))
+        if len(renamed_vars[var_name]) > 1:
+            if len(renamed_vars[var_name]) > _MAX_ADD_NUM_:
+                _accumulate_gradients_by_sum_op_(var_name, renamed_vars,
+                                                 pending_sum_ops, len(op_descs))
+            else:
+                _accumulate_gradients_by_add_ops_(var_name, renamed_vars,
+                                                  pending_sum_ops, len(op_descs))
+
     # sum_op descs are sorted according to their insert position
-    for p in reversed(pending_sum_ops):
-        op_descs.insert(p[1], p[0])
+    for key, value in collections.OrderedDict(
+            reversed(list(pending_sum_ops.items()))).items():
+        # NOTE(zhiqiu): Since reversed, the idx of op_descs to be inserted will remains correct.
+        # For example, [0, 1, 2], and we want to insert 'a' at idx 1, 'b' at idx 2, and the expected result is [0, 1, 'a', 2, 'b'].
+        # If reversed, we first insert 'b' at idx 2, it becomes [0, 1, 2, 'b'], and then insert 'a' at idx 1, it becomes [0, 1, 'a', 2, 'b'].
+        # If not reverse, we first insert 'a' at idx 1, it becomes [0, 1, 'a', 2], and then insert 'b' at idx 2, it becomes [0, 1, 'a', 'b', 2].
+        idx = key
+        for i, op in enumerate(value):
+            op_descs.insert(idx + i, op)

     return op_descs
...
...
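The effect of the new helpers: when a gradient variable is duplicated more than FLAGS_max_inplace_grad_add times, a single sum op accumulates all duplicates; otherwise a chain of inplace grad_add ops is emitted. A plain-Python sketch of just the selection logic (the scaffolding is hypothetical; only the names mirror the diff):

MAX_INPLACE_GRAD_ADD = 2  # stand-in for core.globals()['FLAGS_max_inplace_grad_add']

def accumulate(var_name, duplicates):
    """Return a description of the ops the backward pass would emit."""
    if len(duplicates) > MAX_INPLACE_GRAD_ADD:
        # One big sum: Out = sum(X[0], X[1], ..., X[n-1])
        return [("sum", duplicates, var_name)]
    # Chain of inplace adds: out_i = out_{i-1} + dup_i
    ops, out = [], duplicates[0]
    for i, dup in enumerate(duplicates[1:], start=1):
        target = var_name if i == len(duplicates) - 1 else "%s@ADD@%d" % (var_name, i)
        ops.append(("grad_add", [out, dup], target))
        out = target
    return ops

print(accumulate("x@GRAD", ["x@GRAD@0", "x@GRAD@1"]))  # few dups -> grad_add chain
print(accumulate("x@GRAD", ["a", "b", "c", "d"]))       # many dups -> single sum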
python/paddle/fluid/contrib/slim/quantization/imperative/qat.py
View file @ f52c4f8b
...
@@ -99,7 +99,12 @@ class ImperativeQuantAware(object):
         self._activation_bits = activation_bits
         self._moving_rate = moving_rate

-        quant_type = {'abs_max', 'moving_average_abs_max'}
+        quant_type = {
+            'abs_max', 'moving_average_abs_max', 'channel_wise_abs_max'
+        }
+        assert activation_quantize_type != 'channel_wise_abs_max', \
+            "The activation quantization type does not support 'channel_wise_abs_max'."
         if activation_quantize_type not in quant_type:
             raise ValueError(
                 "Unknown activation_quantize_type : '%s'. It can only be "
...
@@ -108,8 +113,8 @@ class ImperativeQuantAware(object):
         if weight_quantize_type not in quant_type:
             raise ValueError(
                 "Unknown weight_quantize_type: '%s'. It can only be "
-                "'abs_max' or 'moving_average_abs_max' now." %
-                (str(weight_quantize_type)))
+                "'abs_max' or 'moving_average_abs_max' or 'channel_wise_abs_max' now."
+                % (str(weight_quantize_type)))

         self._activation_quantize_type = activation_quantize_type
         self._weight_quantize_type = weight_quantize_type
...
...
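After this change, 'channel_wise_abs_max' becomes a legal weight quantization type, while the new assert keeps it out of activation quantization. A minimal usage sketch, assuming the module path of this commit's file is importable as-is:

from paddle.fluid.contrib.slim.quantization.imperative.qat import ImperativeQuantAware

# Per-channel abs-max is now accepted for weights only; passing it as
# activation_quantize_type would trip the new assert above.
imperative_qat = ImperativeQuantAware(
    weight_bits=8,
    activation_bits=8,
    weight_quantize_type='channel_wise_abs_max',
    activation_quantize_type='moving_average_abs_max')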
python/paddle/fluid/contrib/slim/quantization/imperative/quant_nn.py
View file @ f52c4f8b
...
@@ -24,7 +24,7 @@ from paddle.fluid.data_feeder import check_variable_and_dtype
 __all__ = [
     'FakeQuantMovingAverage', 'FakeQuantAbsMax', 'QuantizedConv2D',
-    'QuantizedLinear'
+    'QuantizedLinear', 'FakeChannelWiseQuantDequantAbsMax'
 ]
...
@@ -209,6 +209,89 @@ class FakeQuantAbsMax(layers.Layer):
         return quant_out


+class FakeChannelWiseQuantDequantAbsMax(layers.Layer):
+    def __init__(self,
+                 name=None,
+                 channel_num=None,
+                 quant_bits=8,
+                 quant_axis=0,
+                 dtype='float32',
+                 quant_on_weight=False):
+        assert quant_on_weight == True, "Channel_wise only can be used on weight quantization."
+        super(FakeChannelWiseQuantDequantAbsMax, self).__init__()
+        self._quant_bits = quant_bits
+        self._quant_axis = quant_axis
+        self._dtype = dtype
+        self._name = name
+        self._channel_num = channel_num
+        scale_prefix = "{}.scale".format(name) if name else 'quant_dequant.scale'
+        self._scale_name = unique_name.generate(scale_prefix)
+        if quant_on_weight:
+            scale_attr = ParamAttr(
+                name=self._scale_name,
+                initializer=Constant(0.0),
+                trainable=False)
+            self._scale = self.create_parameter(
+                shape=[self._channel_num], attr=scale_attr, dtype=self._dtype)
+            self._scale.stop_gradient = True
+        else:
+            self._scale = None
+
+    def forward(self, input):
+        if in_dygraph_mode():
+            attrs = ('bit_length', self._quant_bits, 'quant_axis',
+                     self._quant_axis)
+            quant_out = _varbase_creator(
+                type=input.type,
+                name="{}.quantized.dequantized".format(input.name),
+                shape=input.shape,
+                dtype=input.dtype,
+                persistable=False)
+            out_scale = self._scale
+            if out_scale is None:
+                out_scale = _varbase_creator(
+                    type=core.VarDesc.VarType.LOD_TENSOR,
+                    name=self._scale_name,
+                    shape=[self._channel_num],
+                    dtype=self._dtype,
+                    persistable=False)
+                out_scale.stop_gradient = True
+            out, _, = core.ops.fake_channel_wise_quantize_dequantize_abs_max(
+                input, quant_out, out_scale, *attrs)
+            return out
+
+        check_variable_and_dtype(input, 'input', ['float32'],
+                                 "FakeChannelWiseQuantDequantAbsMax")
+        attrs = {'bit_length': self._quant_bits, 'quant_axis': self._quant_axis}
+        inputs = {"X": [input]}
+        quant_out = self._helper.create_variable(
+            name="{}.quantized.dequantized".format(input.name),
+            dtype=input.dtype,
+            type=core.VarDesc.VarType.LOD_TENSOR,
+            persistable=False,
+            stop_gradient=False)
+        out_scale = self._scale
+        if not out_scale:
+            out_scale = self._helper.create_variable(
+                name=self._scale_name,
+                dtype=self._dtype,
+                type=core.VarDesc.VarType.LOD_TENSOR,
+                persistable=False,
+                stop_gradient=True)
+        outputs = {"Out": [quant_out], "OutScale": [out_scale]}
+
+        self._helper.append_op(
+            type="fake_channel_wise_quantize_dequantize_abs_max",
+            inputs=inputs,
+            outputs=outputs,
+            attrs=attrs)
+
+        return quant_out
+
+
 def _get_fake_quant_type(quant_type, **kwargs):
     call_args = {
         "name": kwargs.get("name", None),
...
@@ -220,10 +303,17 @@ def _get_fake_quant_type(quant_type, **kwargs):
         call_args["quant_on_weight"] = kwargs.get("quant_on_weight", False)
     elif quant_type == 'moving_average_abs_max':
         call_args["moving_rate"] = kwargs.get("moving_rate", 0.9)
+    elif quant_type == 'channel_wise_abs_max':
+        call_args["quant_on_weight"] = kwargs.get("quant_on_weight", False)
+        call_args["channel_num"] = kwargs.get("channel_num", None)
+        call_args["quant_axis"] = kwargs.get("quant_axis", 0)
+        assert call_args["channel_num"] is not None, (
+            "You need to input channel_num"
+            "when you use channel_wise_abs_max strategy.")
     fake_quant_map = {
         'abs_max': FakeQuantAbsMax,
-        'moving_average_abs_max': FakeQuantMovingAverage
+        'moving_average_abs_max': FakeQuantMovingAverage,
+        'channel_wise_abs_max': FakeChannelWiseQuantDequantAbsMax
     }

     return fake_quant_map[quant_type](**call_args)
...
@@ -255,19 +345,23 @@ class QuantizedConv2D(layers.Layer):
         self.weight = getattr(layer, 'weight')
         self.bias = getattr(layer, 'bias')
         # For FakeQuant
+        self._conv2d_quant_axis = 0
         self._fake_quant_weight = _get_fake_quant_type(
             weight_quantize_type,
             name=self.weight.name,
             moving_rate=moving_rate,
             quant_bits=weight_bits,
             dtype=self._dtype,
-            quant_on_weight=True)
+            quant_on_weight=True,
+            channel_num=self.weight.shape[self._conv2d_quant_axis],
+            quant_axis=self._conv2d_quant_axis)
         self._fake_quant_input = _get_fake_quant_type(
             activation_quantize_type,
             name=layer.full_name(),
             moving_rate=moving_rate,
             quant_bits=activation_bits,
-            dtype=self._dtype)
+            dtype=self._dtype,
+            quant_on_weight=False)

     def forward(self, input):
         quant_input = self._fake_quant_input(input)
...
@@ -341,19 +435,23 @@ class QuantizedLinear(layers.Layer):
         self.weight = getattr(layer, 'weight')
         self.bias = getattr(layer, 'bias')
         # For FakeQuant
+        self._linear_quant_axis = 1
         self._fake_quant_weight = _get_fake_quant_type(
             weight_quantize_type,
             name=self.weight.name,
             moving_rate=moving_rate,
             quant_bits=weight_bits,
             dtype=self._dtype,
-            quant_on_weight=True)
+            quant_on_weight=True,
+            channel_num=self.weight.shape[self._linear_quant_axis],
+            quant_axis=self._linear_quant_axis)
         self._fake_quant_input = _get_fake_quant_type(
             activation_quantize_type,
             name=layer.full_name(),
             moving_rate=moving_rate,
             quant_bits=activation_bits,
-            dtype=self._dtype)
+            dtype=self._dtype,
+            quant_on_weight=False)

     def forward(self, input):
         quant_input = self._fake_quant_input(input)
...
...
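For intuition, per-channel abs-max fake quant-dequant computes one scale per slice along quant_axis (axis 0 for conv weights, axis 1 for linear weights, per the diff) and rounds the weight through the quantized grid. A minimal numpy sketch of that round trip (illustrative math, not the Paddle kernel):

import numpy as np

def fake_channel_wise_quant_dequant(w, quant_axis=0, bits=8):
    # Move the channel axis to the front, then take one abs-max per channel.
    w = np.moveaxis(w, quant_axis, 0)
    scales = np.abs(w).reshape(w.shape[0], -1).max(axis=1)  # shape: [channel_num]
    qmax = (1 << (bits - 1)) - 1  # 127 for 8-bit
    s = scales.reshape((-1,) + (1,) * (w.ndim - 1))
    w_qdq = np.round(w / s * qmax) * s / qmax  # quantize, then dequantize
    return np.moveaxis(w_qdq, 0, quant_axis), scales

w = np.random.randn(4, 3, 3, 3).astype('float32')  # conv weight, channels on axis 0
w_qdq, scales = fake_channel_wise_quant_dequant(w, quant_axis=0)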
python/paddle/fluid/contrib/slim/tests/test_imperative_qat.py
View file @ f52c4f8b
...
@@ -181,7 +181,6 @@ class TestImperativeQat(unittest.TestCase):
                 img = fluid.dygraph.to_variable(x_data)
                 label = fluid.dygraph.to_variable(y_data)
                 out = lenet(img)
                 acc = fluid.layers.accuracy(out, label)
                 loss = fluid.layers.cross_entropy(out, label)
...
...
python/paddle/fluid/contrib/slim/tests/test_imperative_qat_channelwise.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/incubate/fleet/parameter_server/ir/public.py
View file @ f52c4f8b
...
@@ -170,22 +170,40 @@ class CompileTimeStrategy(object):
         return trainer.mode == DistributedMode.ASYNC

     def get_role_id(self):
-        return self.role_maker.role_id()
+        try:
+            return self.role_maker._role_id()
+        except Exception:
+            return self.role_maker.role_id()

     def get_trainers(self):
-        return self.role_maker.worker_num()
+        try:
+            return self.role_maker._worker_num()
+        except Exception:
+            return self.role_maker.worker_num()

     def get_ps_endpoint(self):
-        return self.role_maker.get_pserver_endpoints()[self.get_role_id()]
+        try:
+            return self.role_maker._get_pserver_endpoints()[self.get_role_id()]
+        except Exception:
+            return self.role_maker.get_pserver_endpoints()[self.get_role_id()]

     def get_ps_endpoints(self):
-        return self.role_maker.get_pserver_endpoints()
+        try:
+            return self.role_maker._get_pserver_endpoints()
+        except Exception:
+            return self.role_maker.get_pserver_endpoints()

     def get_heter_worker_endpoints(self):
-        return self.role_maker._get_heter_worker_endpoints()
+        try:
+            return self.role_maker._get_heter_worker_endpoints()
+        except Exception:
+            return self.role_maker.get_heter_worker_endpoints()

     def get_heter_worker_endpoint(self):
-        return self.role_maker._get_heter_worker_endpoint()
+        try:
+            return self.role_maker._get_heter_worker_endpoint()
+        except Exception:
+            return self.role_maker.get_heter_worker_endpoint()

     def get_origin_programs(self):
         return self.origin_main_program, self.origin_startup_program
...
...
python/paddle/fluid/layers/tensor.py
View file @ f52c4f8b
...
@@ -680,8 +680,10 @@ def fill_constant(shape, dtype, value, force_cpu=False, out=None, name=None):
     if not isinstance(value, Variable):
         if dtype in ['int64', 'int32']:
             attrs['str_value'] = str(int(value))
+            attrs['value'] = int(value)
         else:
             attrs['str_value'] = str(float(value))
+            attrs['value'] = float(value)

     if in_dygraph_mode():
         shape = utils.convert_shape_to_list(shape)
...
...
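The change keeps the numeric 'value' attribute in sync with 'str_value' whenever the fill value is a plain Python scalar. A quick usage sketch of the affected API:

import paddle.fluid as fluid

# fill_constant with a Python scalar now records both attrs['str_value']
# and the numeric attrs['value'], for integer and float dtypes alike.
x = fluid.layers.fill_constant(shape=[2, 3], dtype='int64', value=5)
y = fluid.layers.fill_constant(shape=[2, 3], dtype='float32', value=0.5)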
python/paddle/fluid/tests/unittests/ir/inference/test_conv_affine_channel_fuse_pass.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/ir/inference/test_conv_bn_fuse_pass.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/ir/inference/test_conv_elementwise_add2_act_fuse_pass.py
View file @ f52c4f8b
...
@@ -19,6 +19,7 @@ import numpy as np
 from inference_pass_test import InferencePassTest
 import paddle.fluid as fluid
 import paddle.fluid.core as core
+from paddle.fluid.core import PassVersionChecker
 from paddle.fluid.core import AnalysisConfig

 """Test for fusion of conv, elementwise_add and 2 act."""
...
@@ -46,6 +47,9 @@ class ConvElementwiseAdd2ActFusePassTest(InferencePassTest):
         if core.is_compiled_with_cuda():
             use_gpu = True
             self.check_output_with_option(use_gpu)
+            self.assertTrue(
+                PassVersionChecker.IsCompatible(
+                    'conv_elementwise_add2_act_fuse_pass'))

 if __name__ == "__main__":
...
...
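The same assertion is added across these inference-pass tests. Note that a bare PassVersionChecker.IsCompatible(...) call discards the returned boolean, so wrapping it in assertTrue is what makes the check effective. A minimal sketch, assuming a Paddle build that registers the pass:

from paddle.fluid.core import PassVersionChecker

# IsCompatible returns True when the registered pass version is compatible
# with what the Python side expects; assert on it so a mismatch fails loudly.
assert PassVersionChecker.IsCompatible('conv_elementwise_add2_act_fuse_pass')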
python/paddle/fluid/tests/unittests/ir/inference/test_conv_elementwise_add_act_fuse_pass.py
View file @ f52c4f8b
...
@@ -19,6 +19,7 @@ import numpy as np
 from inference_pass_test import InferencePassTest
 import paddle.fluid as fluid
 import paddle.fluid.core as core
+from paddle.fluid.core import PassVersionChecker
 from paddle.fluid.core import AnalysisConfig

 """Test for fusion of conv, elementwise_add and act."""
...
@@ -48,6 +49,9 @@ class ConvElementwiseAddActFusePassTest(InferencePassTest):
         if core.is_compiled_with_cuda():
             use_gpu = True
             self.check_output_with_option(use_gpu)
+            self.assertTrue(
+                PassVersionChecker.IsCompatible(
+                    'conv_elementwise_add_act_fuse_pass'))

 if __name__ == "__main__":
...
...
python/paddle/fluid/tests/unittests/ir/inference/test_conv_elementwise_add_fuse_pass.py
View file @ f52c4f8b
...
@@ -19,6 +19,7 @@ import numpy as np
 from inference_pass_test import InferencePassTest
 import paddle.fluid as fluid
 import paddle.fluid.core as core
+from paddle.fluid.core import PassVersionChecker
 from paddle.fluid.core import AnalysisConfig

 """Test for fusion of conv and elementwise_add."""
...
@@ -44,6 +45,8 @@ class ConvElementwiseAddFusePassTest(InferencePassTest):
         if core.is_compiled_with_cuda():
             use_gpu = True
             self.check_output_with_option(use_gpu)
+            self.assertTrue(
+                PassVersionChecker.IsCompatible('conv_elementwise_add_fuse_pass'))

 if __name__ == "__main__":
...
...
python/paddle/fluid/tests/unittests/ir/inference/test_fc_fuse_pass.py
0 → 100644
View file @ f52c4f8b
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import print_function

import unittest
import numpy as np
from inference_pass_test import InferencePassTest
import paddle.fluid as fluid
import paddle.fluid.core as core
from paddle.fluid.core import AnalysisConfig
from paddle.fluid.core import PassVersionChecker


class FcFusePassTest(InferencePassTest):
    def setUp(self):
        with fluid.program_guard(self.main_program, self.startup_program):
            data = fluid.data(
                name="data", shape=[-1, 128, 768], dtype="float32")
            data_y = fluid.data(name="y", shape=[-1, 128, 768], dtype="float32")
            fc_out1 = fluid.layers.fc(input=data,
                                      size=3072,
                                      num_flatten_dims=2,
                                      act="relu")
            fc_out2 = fluid.layers.fc(input=fc_out1,
                                      size=768,
                                      num_flatten_dims=2)

        self.feeds = {"data": np.random.random((4, 128, 768)).astype("float32")}
        self.fetch_list = [fc_out2]

    def test_check_output(self):
        use_gpu = [False]
        if core.is_compiled_with_cuda():
            use_gpu.append(True)
        for i in range(len(use_gpu)):
            self.check_output_with_option(use_gpu[i])

        self.assertTrue(PassVersionChecker.IsCompatible('fc_fuse_pass'))


if __name__ == "__main__":
    unittest.main()
python/paddle/fluid/tests/unittests/ir/inference/test_fc_gru_fuse_pass.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/ir/inference/test_fc_lstm_fuse_pass.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/ir/inference/test_repeated_fc_relu_fuse_pass.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/ir/inference/test_squared_mat_sub_fuse_pass.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/ir/inference/test_transpose_flatten_concat_fuse_pass.py
View file @ f52c4f8b
...
@@ -75,7 +75,9 @@ class TransposeFlattenConcatFusePassWithAxisTest(InferencePassTest):
             use_gpu = True
             self.check_output_with_option(use_gpu)
-            PassVersionChecker.IsCompatible('transpose_flatten_concat_fuse_pass')
+            self.assertTrue(
+                PassVersionChecker.IsCompatible(
+                    'transpose_flatten_concat_fuse_pass'))

 if __name__ == "__main__":
...
...
python/paddle/fluid/tests/unittests/ir/inference/test_trt_shuffle_channel_detect_pass.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/test_fake_quantize_op.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/test_fleet_base.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/test_fleet_rolemaker_2.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/test_fleet_rolemaker_new.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/test_fleet_util.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/test_inplace_addto_strategy.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/test_top_k_v2_op.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/fluid/tests/unittests/test_transformer_api.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/nn/layer/transformer.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/optimizer/adam.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/reader/decorator.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/tests/test_dataset_cifar.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/tests/test_datasets.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/text/datasets/uci_housing.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/utils/__init__.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/utils/lazy_import.py
0 → 100644
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/vision/datasets/cifar.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/vision/datasets/folder.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/vision/datasets/mnist.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/vision/transforms/functional.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/paddle/vision/transforms/transforms.py
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/requirements.txt
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
python/setup.py.in
View file @ f52c4f8b
This diff is collapsed. Click to expand it.
tools/check_api_approvals.sh
View file @ f52c4f8b
This diff is collapsed. Click to expand it.