Commit 2dda19f7 (Crayon鑫 / Paddle, forked from PaddlePaddle/Paddle)
Authored Dec 10, 2018 by Yancey1989

Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode

Parents: 47740ace, 943ad478
Showing 129 changed files with 3332 additions and 831 deletions.
cmake/operators.cmake  +2 -0
paddle/fluid/API.spec  +2 -0
paddle/fluid/framework/CMakeLists.txt  +3 -2
paddle/fluid/framework/data_feed.cc  +17 -30
paddle/fluid/framework/data_feed.h  +1 -30
paddle/fluid/framework/data_feed_test.cc  +4 -11
paddle/fluid/framework/details/CMakeLists.txt  +14 -2
paddle/fluid/framework/details/build_strategy.cc  +15 -4
paddle/fluid/framework/details/build_strategy.h  +2 -0
paddle/fluid/framework/details/reduce_and_gather.h  +1 -1
paddle/fluid/framework/details/reduce_op_handle.cc  +142 -2
paddle/fluid/framework/details/reduce_op_handle.h  +39 -0
paddle/fluid/framework/executor_thread_worker.cc  +1 -1
paddle/fluid/framework/ir/conv_bias_mkldnn_fuse_pass.cc  +7 -3
paddle/fluid/framework/ir/conv_bias_mkldnn_fuse_pass.h  +8 -0
paddle/fluid/framework/ir/graph.h  +3 -4
paddle/fluid/framework/ir/graph_pattern_detector.cc  +6 -5
paddle/fluid/framework/ir/graph_pattern_detector.h  +1 -1
paddle/fluid/framework/ir/mkldnn_placement_pass.cc  +11 -3
paddle/fluid/framework/op_kernel_type.cc  +54 -0
paddle/fluid/framework/op_kernel_type.h  +30 -29
paddle/fluid/framework/op_registry.h  +85 -45
paddle/fluid/framework/operator_test.cc  +42 -1
paddle/fluid/inference/analysis/argument.h  +5 -0
paddle/fluid/inference/analysis/ir_pass_manager.cc  +5 -0
paddle/fluid/inference/analysis/ir_passes/tensorrt_subgraph_pass.cc  +6 -5
paddle/fluid/inference/analysis/passes/ir_graph_build_pass.cc  +8 -3
paddle/fluid/inference/analysis/passes/ir_graph_build_pass.h  +3 -2
paddle/fluid/inference/api/analysis_config.cc  +20 -0
paddle/fluid/inference/api/analysis_predictor.cc  +22 -12
paddle/fluid/inference/api/paddle_analysis_config.h  +11 -2
paddle/fluid/inference/api/paddle_pass_builder.h  +4 -3
paddle/fluid/inference/io.cc  +23 -5
paddle/fluid/inference/io.h  +6 -1
paddle/fluid/inference/tensorrt/convert/test_prelu_op.cc  +1 -2
paddle/fluid/inference/tensorrt/plugin/CMakeLists.txt  +1 -1
paddle/fluid/inference/tensorrt/plugin/prelu_op_plugin.cu  +18 -82
paddle/fluid/inference/tests/api/analyzer_dam_tester.cc  +24 -3
paddle/fluid/inference/tests/api/analyzer_ner_tester.cc  +19 -5
paddle/fluid/inference/tests/api/config_printer.h  +7 -2
paddle/fluid/inference/utils/CMakeLists.txt  +5 -0
paddle/fluid/inference/utils/visualizer.cc  +92 -0
paddle/fluid/inference/utils/visualizer.h  +42 -0
paddle/fluid/operators/CMakeLists.txt  +1 -1
paddle/fluid/operators/activation_mkldnn_op.cc  +3 -2
paddle/fluid/operators/activation_op.cc  +2 -2
paddle/fluid/operators/activation_op.h  +96 -17
paddle/fluid/operators/attention_lstm_op.cc  +8 -8
paddle/fluid/operators/conv_fusion_op.cu.cc  +36 -16
paddle/fluid/operators/conv_mkldnn_op.cc  +103 -34
paddle/fluid/operators/conv_op.cc  +27 -5
paddle/fluid/operators/conv_op.h  +2 -0
paddle/fluid/operators/cudnn_lstm_op.cu.cc  +8 -0
paddle/fluid/operators/distributed/CMakeLists.txt  +12 -2
paddle/fluid/operators/distributed/collective_client.cc  +59 -0
paddle/fluid/operators/distributed/collective_client.h  +93 -0
paddle/fluid/operators/distributed/collective_server.cc  +74 -0
paddle/fluid/operators/distributed/collective_server.h  +110 -0
paddle/fluid/operators/distributed/collective_server_test.cc  +115 -0
paddle/fluid/operators/distributed/grpc_client.cc  +53 -6
paddle/fluid/operators/distributed/grpc_client.h  +17 -7
paddle/fluid/operators/distributed/grpc_server.cc  +100 -2
paddle/fluid/operators/distributed/grpc_service.h  +7 -1
paddle/fluid/operators/distributed/request_handler.h  +2 -0
paddle/fluid/operators/distributed/rpc_client.h  +11 -1
paddle/fluid/operators/distributed/rpc_server.cc  +90 -0
paddle/fluid/operators/distributed/rpc_server.h  +31 -0
paddle/fluid/operators/distributed/send_recv.proto.in  +3 -0
paddle/fluid/operators/elementwise/elementwise_mul_op.h  +27 -5
paddle/fluid/operators/elementwise/elementwise_op.h  +19 -12
paddle/fluid/operators/fused/fused_embedding_fc_lstm_op.cc  +3 -3
paddle/fluid/operators/fused/fusion_seqexpand_concat_fc_op.cc  +3 -3
paddle/fluid/operators/get_tensor_from_selected_rows_op.cc  +117 -0
paddle/fluid/operators/hierarchical_sigmoid_op.cc  +6 -6
paddle/fluid/operators/hierarchical_sigmoid_op.h  +0 -1
paddle/fluid/operators/load_combine_op.cc  +26 -11
paddle/fluid/operators/math/CMakeLists.txt  +1 -0
paddle/fluid/operators/math/cpu_vec.h  +72 -76
paddle/fluid/operators/math/cpu_vec_test.cc  +30 -24
paddle/fluid/operators/math/jit_code.cc  +1 -1
paddle/fluid/operators/math/jit_code.h  +1 -1
paddle/fluid/operators/math/jit_gen.cc  +1 -1
paddle/fluid/operators/math/jit_kernel.cc  +0 -2
paddle/fluid/operators/math/jit_kernel_blas.cc  +1 -2
paddle/fluid/operators/math/jit_kernel_crf_decode.cc  +11 -13
paddle/fluid/operators/math/jit_kernel_exp.cc  +0 -1
paddle/fluid/operators/math/jit_kernel_layer_norm.cc  +10 -12
paddle/fluid/operators/math/jit_kernel_macro.h  +18 -19
paddle/fluid/operators/math/jit_kernel_test.cc  +1 -1
paddle/fluid/operators/math/matrix_bit_code.cc  +41 -16
paddle/fluid/operators/math/matrix_bit_code.h  +4 -0
paddle/fluid/operators/math/prelu.cu  +148 -0
paddle/fluid/operators/math/prelu.h  +49 -0
paddle/fluid/operators/math/softmax_impl.h  +1 -0
paddle/fluid/operators/merge_selected_rows_op.cc  +72 -0
paddle/fluid/operators/merge_selected_rows_op.cu.cc  +23 -0
paddle/fluid/operators/merge_selected_rows_op.h  +36 -0
paddle/fluid/operators/prelu_op.cc  +1 -1
paddle/fluid/operators/prelu_op.cu  +64 -0
paddle/fluid/platform/cpu_info.cc  +0 -2
paddle/fluid/platform/cpu_info.h  +0 -3
paddle/fluid/platform/device_context.cc  +12 -2
paddle/fluid/platform/device_tracer.cc  +10 -9
paddle/fluid/platform/device_tracer.h  +4 -2
paddle/fluid/platform/dynload/cudnn.h  +7 -2
paddle/fluid/platform/init.cc  +7 -7
paddle/fluid/platform/mkldnn_helper.h  +12 -0
paddle/fluid/pybind/pybind.cc  +12 -0
paddle/scripts/paddle_build.sh  +2 -4
python/paddle/fluid/clip.py  +7 -1
python/paddle/fluid/data_feeder.py  +7 -4
python/paddle/fluid/framework.py  +1 -0
python/paddle/fluid/io.py  +2 -2
python/paddle/fluid/layers/nn.py  +48 -0
python/paddle/fluid/parallel_executor.py  +8 -0
python/paddle/fluid/tests/test_gradient_clip.py  +0 -84
python/paddle/fluid/tests/unittests/CMakeLists.txt  +4 -5
python/paddle/fluid/tests/unittests/dist_save_load.py  +2 -2
python/paddle/fluid/tests/unittests/test_conv2d_fusion_op.py  +6 -0
python/paddle/fluid/tests/unittests/test_conv3d_mkldnn_op.py  +59 -0
python/paddle/fluid/tests/unittests/test_conv3d_op.py  +23 -44
python/paddle/fluid/tests/unittests/test_dist_base.py  +115 -22
python/paddle/fluid/tests/unittests/test_dist_mnist.py  +13 -0
python/paddle/fluid/tests/unittests/test_dist_save_load.py  +1 -1
python/paddle/fluid/tests/unittests/test_dist_transpiler.py  +1 -0
python/paddle/fluid/tests/unittests/test_get_tensor_from_selected_rows_op.py  +65 -0
python/paddle/fluid/tests/unittests/test_gradient_clip.py  +161 -0
python/paddle/fluid/tests/unittests/test_merge_selectedrows_op.py  +73 -0
python/paddle/fluid/transpiler/distribute_transpiler.py  +8 -18
cmake/operators.cmake

@@ -166,6 +166,8 @@ function(op_library TARGET)
     # Append first implemented MKLDNN activation operator
     if (${MKLDNN_FILE} STREQUAL "activation_mkldnn_op")
       file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(relu, MKLDNN);\n")
+    elseif(${MKLDNN_FILE} STREQUAL "conv_mkldnn_op")
+      file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL_WITH_CUSTOM_TYPE(conv2d, MKLDNN, FP32);\n")
     else()
       file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(${TARGET}, MKLDNN);\n")
     endif()
paddle/fluid/API.spec

@@ -194,6 +194,8 @@ paddle.fluid.layers.grid_sampler ArgSpec(args=['x', 'grid', 'name'], varargs=Non
 paddle.fluid.layers.log_loss ArgSpec(args=['input', 'label', 'epsilon', 'name'], varargs=None, keywords=None, defaults=(0.0001, None))
 paddle.fluid.layers.add_position_encoding ArgSpec(args=['input', 'alpha', 'beta', 'name'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.bilinear_tensor_product ArgSpec(args=['x', 'y', 'size', 'act', 'name', 'param_attr', 'bias_attr'], varargs=None, keywords=None, defaults=(None, None, None, None))
+paddle.fluid.layers.merge_selected_rows ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
+paddle.fluid.layers.get_tensor_from_selected_rows ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.lstm ArgSpec(args=['input', 'init_h', 'init_c', 'max_len', 'hidden_size', 'num_layers', 'dropout_prob', 'is_bidirec', 'is_test', 'name', 'default_initializer', 'seed'], varargs=None, keywords=None, defaults=(0.0, False, False, None, None, -1))
 paddle.fluid.layers.data ArgSpec(args=['name', 'shape', 'append_batch_size', 'dtype', 'lod_level', 'type', 'stop_gradient'], varargs=None, keywords=None, defaults=(True, 'float32', 0, VarType.LOD_TENSOR, True))
 paddle.fluid.layers.open_files ArgSpec(args=['filenames', 'shapes', 'lod_levels', 'dtypes', 'thread_num', 'buffer_size', 'pass_num', 'is_test'], varargs=None, keywords=None, defaults=(None, None, 1, None))
paddle/fluid/framework/CMakeLists.txt

@@ -118,8 +118,9 @@ cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto)
 cc_library(shape_inference SRCS shape_inference.cc DEPS ddim attribute device_context)
 cc_library(transfer_scope_cache SRCS transfer_scope_cache.cc DEPS scope framework_proto device_context)
+cc_library(op_kernel_type SRCS op_kernel_type.cc DEPS device_context place)
 cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope glog
-    shape_inference data_transform lod_tensor profiler transfer_scope_cache)
+    shape_inference data_transform lod_tensor profiler transfer_scope_cache op_kernel_type)
 cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry device_context)

@@ -191,7 +192,7 @@ cc_test(var_type_inference_test SRCS var_type_inference_test.cc DEPS op_registry
 cc_library(selected_rows SRCS selected_rows.cc DEPS tensor)
 cc_test(selected_rows_test SRCS selected_rows_test.cc DEPS selected_rows)
-cc_test(op_kernel_type_test SRCS op_kernel_type_test.cc DEPS place device_context framework_proto)
+cc_test(op_kernel_type_test SRCS op_kernel_type_test.cc DEPS place device_context framework_proto op_kernel_type)
 cc_test(cow_ptr_tests SRCS details/cow_ptr_test.cc)
 cc_test(tuple_test SRCS tuple_test.cc)
paddle/fluid/framework/data_feed.cc

@@ -33,11 +33,7 @@ void DataFeed::AddFeedVar(Variable* var, const std::string& name) {
   CheckInit();
   for (size_t i = 0; i < use_slots_.size(); ++i) {
     if (name == use_slots_[i]) {
-      if (use_slots_is_dense_[i]) {
-        feed_vec_[i] = MixTensor(var->GetMutable<Tensor>());
-      } else {
-        feed_vec_[i] = MixTensor(var->GetMutable<LoDTensor>());
-      }
+      feed_vec_[i] = var->GetMutable<LoDTensor>();
     }
   }
 }

@@ -301,6 +297,7 @@ bool MultiSlotDataFeed::ParseOneInstance(std::vector<MultiSlotType>* instance) {
         "the data, please check if the data contains unresolvable "
         "characters.\nplease check this error line: %s",
         str);
+
     if (idx != -1) {
       (*instance)[idx].Init(all_slots_type_[i]);
       if ((*instance)[idx].GetType()[0] == 'f') {  // float

@@ -337,6 +334,7 @@ void MultiSlotDataFeed::AddInstanceToInsVec(
       (*ins_vec)[i].InitOffset();
     }
   }
+
   for (size_t i = 0; i < instance.size(); ++i) {
     (*ins_vec)[i].AddIns(instance[i]);
   }

@@ -348,36 +346,25 @@ void MultiSlotDataFeed::PutToFeedVec(
     const auto& type = ins_vec[i].GetType();
     const auto& offset = ins_vec[i].GetOffset();
     int total_instance = static_cast<int>(offset.back());
     if (type[0] == 'f') {  // float
       const auto& feasign = ins_vec[i].GetFloatData();
-      if (feed_vec_[i].IsDense()) {
-        int size_in_each_batch = total_instance / batch_size_;
-        float* tensor_ptr = feed_vec_[i].GetTensor()->mutable_data<float>(
-            {batch_size_, size_in_each_batch}, platform::CPUPlace());
-        memcpy(tensor_ptr, &feasign[0], total_instance * sizeof(float));
-      } else {
-        float* tensor_ptr = feed_vec_[i].GetLoDTensor()->mutable_data<float>(
-            {total_instance, 1}, platform::CPUPlace());
-        memcpy(tensor_ptr, &feasign[0], total_instance * sizeof(float));
-        LoD data_lod{offset};
-        feed_vec_[i].GetLoDTensor()->set_lod(data_lod);
-      }
+      float* tensor_ptr = feed_vec_[i]->mutable_data<float>(
+          {total_instance, 1}, platform::CPUPlace());
+      memcpy(tensor_ptr, &feasign[0], total_instance * sizeof(float));
     } else if (type[0] == 'u') {  // uint64
       // no uint64_t type in paddlepaddle
       const auto& feasign = ins_vec[i].GetUint64Data();
-      if (feed_vec_[i].IsDense()) {
-        int size_in_each_batch = total_instance / batch_size_;
-        int64_t* tensor_ptr = feed_vec_[i].GetTensor()->mutable_data<int64_t>(
-            {batch_size_, size_in_each_batch}, platform::CPUPlace());
-        memcpy(tensor_ptr, &feasign[0], total_instance * sizeof(int64_t));
-      } else {
-        int64_t* tensor_ptr =
-            feed_vec_[i].GetLoDTensor()->mutable_data<int64_t>(
-                {total_instance, 1}, platform::CPUPlace());
-        memcpy(tensor_ptr, &feasign[0], total_instance * sizeof(int64_t));
-        LoD data_lod{offset};
-        feed_vec_[i].GetLoDTensor()->set_lod(data_lod);
-      }
+      int64_t* tensor_ptr = feed_vec_[i]->mutable_data<int64_t>(
+          {total_instance, 1}, platform::CPUPlace());
+      memcpy(tensor_ptr, &feasign[0], total_instance * sizeof(int64_t));
     }
+    LoD data_lod{offset};
+    feed_vec_[i]->set_lod(data_lod);
+    if (use_slots_is_dense_[i]) {
+      int dim = total_instance / batch_size_;
+      feed_vec_[i]->Resize({batch_size_, dim});
+    }
   }
 }
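The rewrite above collapses the old dense/sparse split: every slot is now written into a LoDTensor shaped {total_instance, 1}, with per-instance offsets attached as the LoD, and dense slots are simply reshaped afterwards. Below is a minimal std-only sketch of that flatten-plus-offsets layout; the names are illustrative, not Paddle API:

#include <iostream>
#include <vector>

int main() {
  // Three instances of a variable-length slot.
  std::vector<std::vector<float>> instances = {{1, 2}, {3}, {4, 5, 6}};

  // Flatten into one buffer and record cumulative offsets (the "LoD").
  std::vector<float> buffer;
  std::vector<size_t> offset = {0};
  for (const auto& ins : instances) {
    buffer.insert(buffer.end(), ins.begin(), ins.end());
    offset.push_back(buffer.size());
  }

  // total_instance is offset.back(); instance i occupies
  // [offset[i], offset[i + 1]) inside the flat buffer.
  std::cout << "total_instance = " << offset.back() << "\n";  // 6
  for (size_t i = 0; i + 1 < offset.size(); ++i) {
    std::cout << "instance " << i << ": [" << offset[i] << ", "
              << offset[i + 1] << ")\n";
  }
}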
paddle/fluid/framework/data_feed.h

@@ -30,35 +30,6 @@ limitations under the License. */
 namespace paddle {
 namespace framework {

-// Pack Tensor type and LoDTensor type into MixTensor type, in order
-// to record either Tensor or LoDTensor information at the same time.
-class MixTensor {
- public:
-  MixTensor() {}
-  explicit MixTensor(LoDTensor* lodtensor) {
-    is_dense_ = false;
-    lodtensor_ = lodtensor;
-  }
-  explicit MixTensor(Tensor* tensor) {
-    is_dense_ = true;
-    tensor_ = tensor;
-  }
-  bool IsDense() { return is_dense_; }
-  LoDTensor* GetLoDTensor() {
-    PADDLE_ENFORCE(!is_dense_, "Let a dense var return a LoDTensor ptr.");
-    return lodtensor_;
-  }
-  Tensor* GetTensor() {
-    PADDLE_ENFORCE(is_dense_, "Let a sparse var return a Tensor ptr.");
-    return tensor_;
-  }
-
- private:
-  bool is_dense_;
-  LoDTensor* lodtensor_;
-  Tensor* tensor_;
-};
-
 // DataFeed is the base virtual class for all ohther DataFeeds.
 // It is used to read files and parse the data for subsequent trainer.
 // Example:

@@ -133,7 +104,7 @@ class DataFeed {
       use_slots_index_;  // -1: not used; >=0: the index of use_slots_

   // The data read by DataFeed will be stored here
-  std::vector<MixTensor> feed_vec_;
+  std::vector<LoDTensor*> feed_vec_;

   // the batch size defined by user
   int default_batch_size_;
paddle/fluid/framework/data_feed_test.cc

@@ -152,19 +152,13 @@ void GetElemSetFromReader(std::vector<MultiTypeSet>* reader_elem_set,
       const auto& multi_slot_desc = data_feed_desc.multi_slot_desc();
       std::map<std::string, const paddle::framework::LoDTensor*>
           lodtensor_targets;
-      std::map<std::string, const paddle::framework::Tensor*> tensor_targets;
       for (int i = 0; i < multi_slot_desc.slots_size(); ++i) {
         const auto& slot = multi_slot_desc.slots(i);
         if (slot.is_used()) {
           const auto& name = slot.name();
           readers[idx]->AddFeedVar(scope->Var(name), name);
-          if (slot.is_dense()) {
-            tensor_targets[name] =
-                &scope->FindVar(name)->Get<paddle::framework::Tensor>();
-          } else {
-            lodtensor_targets[name] =
-                &scope->FindVar(name)->Get<paddle::framework::LoDTensor>();
-          }
+          lodtensor_targets[name] =
+              &scope->FindVar(name)->Get<paddle::framework::LoDTensor>();
         }
       }
       readers[idx]->Start();

@@ -175,8 +169,9 @@ void GetElemSetFromReader(std::vector<MultiTypeSet>* reader_elem_set,
           if (!slot.is_used()) {
             continue;
           }
+          const paddle::framework::LoDTensor* tens =
+              lodtensor_targets[slot.name()];
           if (slot.is_dense()) {  // dense branch
-            const paddle::framework::Tensor* tens = tensor_targets[slot.name()];
             if (slot.type() == "uint64") {
               const int64_t* data = tens->data<int64_t>();
               int batch_size = tens->dims()[0];

@@ -202,8 +197,6 @@ void GetElemSetFromReader(std::vector<MultiTypeSet>* reader_elem_set,
               PADDLE_THROW("Error type in proto file.");
             }
           } else {  // sparse branch
-            const paddle::framework::LoDTensor* tens =
-                lodtensor_targets[slot.name()];
            if (slot.type() == "uint64") {
              const int64_t* data = tens->data<int64_t>();
              for (size_t i = 0; i < tens->NumElements(); ++i) {
paddle/fluid/framework/details/CMakeLists.txt

@@ -15,14 +15,26 @@ cc_library(variable_visitor SRCS variable_visitor.cc DEPS lod_tensor selected_ro
 if(WITH_GPU)
   nv_library(all_reduce_op_handle SRCS all_reduce_op_handle.cc DEPS op_handle_base scope lod_tensor ddim memory
     dynload_cuda variable_visitor)
-  nv_library(reduce_op_handle SRCS reduce_op_handle.cc DEPS op_handle_base variable_visitor scope ddim dynload_cuda)
+  if(WITH_DISTRIBUTE)
+    nv_library(reduce_op_handle SRCS reduce_op_handle.cc DEPS op_handle_base variable_visitor scope
+      ddim dynload_cuda selected_rows_functor sendrecvop_grpc)
+  else()
+    nv_library(reduce_op_handle SRCS reduce_op_handle.cc DEPS op_handle_base variable_visitor scope
+      ddim dynload_cuda selected_rows_functor)
+  endif()
   nv_library(broadcast_op_handle SRCS broadcast_op_handle.cc DEPS op_handle_base scope ddim memory variable_visitor dynload_cuda)
   nv_library(fused_broadcast_op_handle SRCS fused_broadcast_op_handle.cc DEPS broadcast_op_handle)
 else()
   cc_library(all_reduce_op_handle SRCS all_reduce_op_handle.cc DEPS op_handle_base scope lod_tensor ddim memory
     variable_visitor)
-  cc_library(reduce_op_handle SRCS reduce_op_handle.cc DEPS op_handle_base variable_visitor scope ddim)
+  if(WITH_DISTRIBUTE)
+    cc_library(reduce_op_handle SRCS reduce_op_handle.cc DEPS op_handle_base variable_visitor scope
+      ddim selected_rows_functor sendrecvop_grpc)
+  else()
+    cc_library(reduce_op_handle SRCS reduce_op_handle.cc DEPS op_handle_base variable_visitor scope
+      ddim selected_rows_functor)
+  endif()
   cc_library(broadcast_op_handle SRCS broadcast_op_handle.cc DEPS op_handle_base scope ddim memory variable_visitor)
   cc_library(fused_broadcast_op_handle SRCS fused_broadcast_op_handle.cc DEPS broadcast_op_handle)
 endif()
paddle/fluid/framework/details/build_strategy.cc

@@ -58,6 +58,17 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
       }
     }

+    CollectiveContext *context = CollectiveContext::GetInstance();
+    context->endpoints_ = strategy_.trainers_endpoints_;
+    context->trainer_id_ = strategy_.trainer_id_;
+    PADDLE_ENFORCE(strategy_.trainer_id_ >= 0, "trainer_id_ >= 0");
+    if (strategy_.trainer_id_ > 0) {
+      PADDLE_ENFORCE((unsigned)(strategy_.trainer_id_) <
+                         strategy_.trainers_endpoints_.size(),
+                     "trainer_id_ < endpoints_ size");
+    }
+    VLOG(1) << "CollectiveContext:" << context->String();
+
     // Convert graph to run on multi-devices.
     auto multi_devices_pass = AppendPass("multi_devices_pass");
     multi_devices_pass->SetNotOwned<const BuildStrategy>("strategy",

@@ -135,16 +146,16 @@ std::unique_ptr<ir::Graph> BuildStrategy::Apply(
       pass->SetNotOwned<platform::NCCLContextMap>("nccl_ctxs", nctx);
 #endif
     } else if (pass->Type() == "sequential_execution_pass") {
-      VLOG(1) << "set enable_sequential_execution:"
-              << enable_sequential_execution_;
+      LOG(INFO) << "set enable_sequential_execution:"
+                << enable_sequential_execution_;

       pass->Erase(kAllOpDescs);
       pass->Set<const std::vector<OpDesc *>>(
           kAllOpDescs,
           new std::vector<OpDesc *>(main_program.Block(0).AllOps()));
     } else if (pass->Type() == "all_reduce_deps_pass") {
-      VLOG(1) << "SeqOnlyAllReduceOps:" << SeqOnlyAllReduceOps(*this)
-              << ", num_trainers:" << num_trainers_;
+      LOG(INFO) << "SeqOnlyAllReduceOps:" << SeqOnlyAllReduceOps(*this)
+                << ", num_trainers:" << num_trainers_;

       pass->Erase(kAllOpDescs);
       pass->Set<const std::vector<OpDesc *>>(
paddle/fluid/framework/details/build_strategy.h

@@ -74,6 +74,8 @@ struct BuildStrategy {
   bool fuse_broadcast_op_{false};

   int num_trainers_{1};
+  int trainer_id_{0};
+  std::vector<std::string> trainers_endpoints_;
   bool remove_unnecessary_lock_{false};

   // NOTE:
paddle/fluid/framework/details/reduce_and_gather.h

@@ -53,7 +53,7 @@ struct ReduceLoDTensor {
   }
 };

-inline void GatherSelectedRows(
+inline void GatherLocalSelectedRows(
     const std::vector<const SelectedRows *> &src_selecte_rows_,
     const std::vector<platform::Place> &in_places,
     const std::map<platform::Place, platform::DeviceContext *> &dev_ctxes,
paddle/fluid/framework/details/reduce_op_handle.cc

@@ -16,6 +16,12 @@
 #include "paddle/fluid/framework/details/container_cast.h"
 #include "paddle/fluid/framework/details/reduce_and_gather.h"
 #include "paddle/fluid/framework/details/variable_visitor.h"
+#if defined PADDLE_WITH_CUDA && defined PADDLE_WITH_DISTRIBUTE
+#include "paddle/fluid/operators/distributed/collective_client.h"
+#include "paddle/fluid/operators/distributed/collective_server.h"
+#include "paddle/fluid/operators/distributed/request_handler.h"
+#endif
+#include "paddle/fluid/operators/math/selected_rows_functor.h"
 #include "paddle/fluid/platform/profiler.h"

 DEFINE_bool(

@@ -26,6 +32,112 @@ namespace paddle {
 namespace framework {
 namespace details {

+std::once_flag CollectiveContext::init_flag_;
+std::unique_ptr<CollectiveContext> CollectiveContext::context_;
+
+static inline std::string GetRemoteVarName(const std::string &var_name,
+                                           int trainer_id) {
+  return string::Sprintf("%s_merged_tmp@trainer_%d", var_name, trainer_id);
+}
+
+void ReduceOpHandle::Wait(
+    const std::map<platform::Place, platform::DeviceContext *> &dev_ctxes) {
+  // TODO(gongwb): use event wait?
+  for (auto &dev_ctx : dev_ctxes) {
+    dev_ctx.second->Wait();
+  }
+}
+
+#if defined PADDLE_WITH_CUDA && defined PADDLE_WITH_DISTRIBUTE
+template <typename DevCtx, typename DataType>
+void ReduceOpHandle::GatherSelectedRows(
+    const std::vector<const SelectedRows *> &src_selected_rows,
+    const std::vector<platform::Place> &in_places,
+    const std::map<platform::Place, platform::DeviceContext *> &dev_ctxes,
+    VarHandle *out_var_handle, const platform::Place &out_place,
+    SelectedRows *dst_selected_rows) {
+  const CollectiveContext &collective_context =
+      *CollectiveContext::GetInstance();
+
+  // 1. gather local selected rows, merge them
+  std::string gathered_var_name = out_var_handle->name_ + "_gathered_tmp";
+  auto scope = local_scopes_.at(out_var_handle->scope_idx_);
+  auto gathered_var_mid = scope->Var(gathered_var_name);
+  auto gathered_select_rows =
+      gathered_var_mid->GetMutable<framework::SelectedRows>();
+  GatherLocalSelectedRows(src_selected_rows, in_places, dev_ctxes, out_place,
+                          gathered_select_rows);
+  // FIXME(gongwb): remove this Wait.
+  Wait(dev_ctxes);
+
+  // merge them
+  auto merged_dev_ctx = dynamic_cast<DevCtx *>(dev_ctxes.at(out_place));
+  std::string merged_var_name =
+      GetRemoteVarName(out_var_handle->name_, collective_context.trainer_id_);
+  auto merged_select_rows =
+      scope->Var(merged_var_name)->GetMutable<SelectedRows>();
+  operators::math::scatter::MergeAdd<DevCtx, DataType> merge_func;
+  merge_func(*merged_dev_ctx, *gathered_select_rows, merged_select_rows);
+
+  // 2. start collective server if it doesn't exist
+  operators::distributed::CollectiveServer *server =
+      operators::distributed::CollectiveServer::GetInstance(
+          collective_context.endpoints_[collective_context.trainer_id_],
+          collective_context.endpoints_.size() - 1);
+
+  auto rpc_server = server->GetRPCServer();
+  rpc_server->RegisterVar(merged_var_name,
+                          operators::distributed::kRequestGetMonomerVariable,
+                          scope, merged_dev_ctx);
+
+  // 3. gather them from all remote nodes.
+  std::vector<const SelectedRows *> remote;
+  operators::distributed::CollectiveClient *client =
+      operators::distributed::CollectiveClient::GetInstance();
+
+  std::vector<operators::distributed::RemoteVar> vars;
+  for (unsigned int i = 0; i < collective_context.endpoints_.size(); i++) {
+    if (i == (unsigned)collective_context.trainer_id_) continue;
+
+    operators::distributed::RemoteVar var;
+    var.trainer_id_ = i;
+    var.var_name_ = GetRemoteVarName(out_var_handle->name_, i);
+    var.ep_ = collective_context.endpoints_[i];
+
+    vars.push_back(var);
+    VLOG(4) << "gather from:" << var.String();
+  }
+
+  // erase gathered vars
+  merged_dev_ctx->Wait();
+  scope->EraseVars(std::vector<std::string>{gathered_var_name});
+
+  PADDLE_ENFORCE(client->Gather(vars, &remote, *merged_dev_ctx, scope));
+  PADDLE_ENFORCE(remote.size() == vars.size());
+
+  // 4. merged local selected rows.
+  std::vector<const SelectedRows *> all;
+  all.resize(collective_context.endpoints_.size());
+  for (auto v : vars) {
+    all[v.trainer_id_] =
+        scope->FindVar(v.var_name_)->GetMutable<SelectedRows>();
+  }
+  all[collective_context.trainer_id_] = merged_select_rows;
+
+  merge_func(*merged_dev_ctx, all, dst_selected_rows);
+
+  rpc_server->WaitVarBarrier(merged_var_name);
+  rpc_server->ClearVar(merged_var_name);
+
+  // 5. clear mid vars
+  std::vector<std::string> tmp_vars{merged_var_name};
+  for (auto r : vars) {
+    tmp_vars.push_back(r.var_name_);
+  }
+  scope->EraseVars(tmp_vars);
+}
+#endif
+
 void ReduceOpHandle::RunImpl() {
   platform::RecordEvent record_event(Name(), dev_ctxes_.cbegin()->second);

@@ -90,8 +202,36 @@ void ReduceOpHandle::RunImpl() {
     this->RunAndRecordEvent([&] {
       std::vector<const SelectedRows *> in_selected_rows =
           GetInputValues<SelectedRows>(in_var_handles, var_scopes);
-      GatherSelectedRows(in_selected_rows, in_places, dev_ctxes_, t_out_p,
-                         out_var->GetMutable<framework::SelectedRows>());
+
+      const CollectiveContext &collective_context =
+          *CollectiveContext::GetInstance();
+      VLOG(10) << "GatherSelectedRows CollectiveContext:"
+               << collective_context.String();
+
+      // TODO(gongwb): add cpu support
+      if (collective_context.endpoints_.size() <= 1 ||
+          is_cpu_place(in_places[0]) || is_cpu_place(t_out_p)) {
+        GatherLocalSelectedRows(in_selected_rows, in_places, dev_ctxes_,
+                                t_out_p,
+                                out_var->GetMutable<framework::SelectedRows>());
+        return;
+      }
+
+#if defined PADDLE_WITH_CUDA && defined PADDLE_WITH_DISTRIBUTE
+      if (framework::IsType<const float>(in_selected_rows[0]->value().type())) {
+        GatherSelectedRows<platform::CUDADeviceContext, float>(
+            in_selected_rows, in_places, dev_ctxes_, out_var_handle, t_out_p,
+            out_var->GetMutable<framework::SelectedRows>());
+      } else if (framework::IsType<const double>(
+                     in_selected_rows[0]->value().type())) {
+        GatherSelectedRows<platform::CUDADeviceContext, double>(
+            in_selected_rows, in_places, dev_ctxes_, out_var_handle, t_out_p,
+            out_var->GetMutable<framework::SelectedRows>());
+      } else {
+        PADDLE_ENFORCE(false,
+                       "only support double or float when gahter SelectedRows");
+      }
+#endif
     });
   } else {
     std::vector<const LoDTensor *> lod_tensors =
paddle/fluid/framework/details/reduce_op_handle.h

@@ -30,6 +30,32 @@
 namespace paddle {
 namespace framework {
 namespace details {

+struct CollectiveContext {
+  std::vector<std::string> endpoints_;
+  int trainer_id_{0};
+
+  std::string String() const {
+    std::stringstream ss;
+    ss << "endpoints_:";
+    for (auto e : endpoints_) {
+      ss << e << ",";
+    }
+
+    ss << "trainer_id_:" << trainer_id_;
+
+    return ss.str();
+  }
+
+  static CollectiveContext *GetInstance() {
+    std::call_once(init_flag_,
+                   [&]() { context_.reset(new CollectiveContext()); });
+    return context_.get();
+  }
+
+ private:
+  static std::once_flag init_flag_;
+  static std::unique_ptr<CollectiveContext> context_;
+};
+
 struct ReduceOpHandle : public OpHandleBase {
   std::vector<Scope *> local_scopes_;

@@ -64,6 +90,19 @@ struct ReduceOpHandle : public OpHandleBase {
  protected:
   void RunImpl() override;

+#if defined PADDLE_WITH_CUDA && defined PADDLE_WITH_DISTRIBUTE
+  template <typename DevCtx, typename DataType>
+  void GatherSelectedRows(
+      const std::vector<const SelectedRows *> &src_selecte_rows_,
+      const std::vector<platform::Place> &in_places,
+      const std::map<platform::Place, platform::DeviceContext *> &dev_ctxes,
+      VarHandle *out_var_handle, const platform::Place &out_place,
+      SelectedRows *dst_selecte_rows);
+#endif
+
+  void Wait(
+      const std::map<platform::Place, platform::DeviceContext *> &dev_ctxes);
+
   template <typename T>
   std::vector<const T *> GetInputValues(
       const std::vector<VarHandle *> &in_var_handles,
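CollectiveContext above is a lazily constructed, thread-safe singleton built on the std::once_flag / std::call_once idiom. A self-contained sketch of the same pattern, using a hypothetical Context type rather than the Paddle struct:

#include <iostream>
#include <memory>
#include <mutex>

struct Context {
  int trainer_id{0};

  static Context* GetInstance() {
    // The lambda runs at most once, even if several threads race here.
    std::call_once(init_flag_, [] { instance_.reset(new Context()); });
    return instance_.get();
  }

 private:
  static std::once_flag init_flag_;
  static std::unique_ptr<Context> instance_;
};

std::once_flag Context::init_flag_;
std::unique_ptr<Context> Context::instance_;

int main() {
  Context::GetInstance()->trainer_id = 2;
  // Every call returns the same object.
  std::cout << Context::GetInstance()->trainer_id << "\n";  // prints 2
}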
paddle/fluid/framework/executor_thread_worker.cc

@@ -97,7 +97,7 @@ void ExecutorThreadWorker::SetDevice() {
   static unsigned concurrency_cap = std::thread::hardware_concurrency();
   int thread_id = this->thread_id_;

-  if (thread_id < concurrency_cap) {
+  if (static_cast<unsigned>(thread_id) < concurrency_cap) {
     unsigned proc = thread_id;

     cpu_set_t mask;
paddle/fluid/framework/ir/conv_bias_mkldnn_fuse_pass.cc

@@ -46,14 +46,16 @@ std::unique_ptr<ir::Graph> ConvBiasFusePass::ApplyImpl(
   auto* scope = param_scope();
   PADDLE_ENFORCE(scope);

+  std::string type = is_conv3d() ? "conv3d" : "conv2d";
+
   GraphPatternDetector gpd;
   auto* conv_input =
       gpd.mutable_pattern()
           ->NewNode(patterns::PDNodeName(name_scope_, "conv_input"))
           ->AsInput()
-          ->assert_is_op_input("conv2d", "Input");
+          ->assert_is_op_input(type, "Input");
   patterns::ConvBias conv_bias_pattern(gpd.mutable_pattern(), name_scope_);
-  conv_bias_pattern(conv_input);
+  conv_bias_pattern(conv_input, is_conv3d());
   int found_conv_bias_count = 0;
   auto handler = [&](const GraphPatternDetector::subgraph_t& subgraph,
                      Graph* g) {

@@ -109,7 +111,7 @@ std::unique_ptr<ir::Graph> ConvBiasFusePass::ApplyImpl(
     desc.SetInput("Filter", std::vector<std::string>({conv_weight->Name()}));
     desc.SetInput("Bias", std::vector<std::string>({eltwise_bias->Name()}));
     desc.SetOutput("Output", std::vector<std::string>({eltwise_out->Name()}));
-    desc.SetType("conv2d");
+    desc.SetType(type);

     for (auto& attr : conv->Op()->GetAttrMap()) {
       desc.SetAttr(attr.first, attr.second);

@@ -135,3 +137,5 @@ std::unique_ptr<ir::Graph> ConvBiasFusePass::ApplyImpl(
 }  // namespace paddle

 REGISTER_PASS(conv_bias_mkldnn_fuse_pass,
               paddle::framework::ir::ConvBiasFusePass);
+REGISTER_PASS(conv3d_bias_mkldnn_fuse_pass,
+              paddle::framework::ir::Conv3DBiasFusePass);
paddle/fluid/framework/ir/conv_bias_mkldnn_fuse_pass.h

@@ -26,11 +26,19 @@ namespace ir {
 class ConvBiasFusePass : public FusePassBase {
  public:
   virtual ~ConvBiasFusePass() {}
+  virtual bool is_conv3d() const { return false; }

  protected:
   std::unique_ptr<ir::Graph> ApplyImpl(std::unique_ptr<ir::Graph> graph) const;
   const std::string name_scope_{"conv_bias_mkldnn_fuse"};
 };
+/*
+ * Fuse the Conv3D and Elementwise_add to a Conv3DBiasOp.
+ */
+class Conv3DBiasFusePass : public ConvBiasFusePass {
+ public:
+  bool is_conv3d() const override { return true; }
+};
 }  // namespace ir
 }  // namespace framework
 }  // namespace paddle
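The 3D variant is added without duplicating the pass: the base class exposes a virtual is_conv3d() hook, and ApplyImpl derives the op type string ("conv2d" vs "conv3d") from it. A generic, self-contained sketch of this template-method style hook, with hypothetical names:

#include <iostream>
#include <string>

class FusePass {
 public:
  virtual ~FusePass() {}
  virtual bool is_conv3d() const { return false; }

  void Apply() const {
    // Everything else is shared; only the op type string differs.
    std::string type = is_conv3d() ? "conv3d" : "conv2d";
    std::cout << "fusing " << type << " + elementwise_add\n";
  }
};

class Conv3DFusePass : public FusePass {
 public:
  bool is_conv3d() const override { return true; }
};

int main() {
  FusePass pass2d;
  Conv3DFusePass pass3d;
  pass2d.Apply();  // fusing conv2d + elementwise_add
  pass3d.Apply();  // fusing conv3d + elementwise_add
}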
paddle/fluid/framework/ir/graph.h

@@ -177,14 +177,13 @@ class Graph {
     return nullptr;
   }

   const ProgramDesc &program() const { return program_; }

+  std::map<std::string, std::vector<ir::Node *>> InitFromProgram(
+      const ProgramDesc &program);
+
   void ResolveHazard(
       const std::map<std::string, std::vector<ir::Node *>> &var_nodes);

  private:
-  std::map<std::string, std::vector<ir::Node *>> InitFromProgram(
-      const ProgramDesc &program);
-
   // This method takes ownership of `node`.
   ir::Node *AddNode(ir::Node *node) {
     PADDLE_ENFORCE(node_set_.find(node) == node_set_.end());
paddle/fluid/framework/ir/graph_pattern_detector.cc

@@ -1030,10 +1030,11 @@ PDNode *patterns::ElewiseAddActInplaceGrad::operator()(
 }

 PDNode *patterns::ConvBias::operator()(
-    paddle::framework::ir::PDNode *conv_input) {
+    paddle::framework::ir::PDNode *conv_input, bool is_conv3d) {
+  std::string type = is_conv3d ? "conv3d" : "conv2d";
   // Create Operators
-  conv_input->assert_is_op_input("conv2d", "Input");
-  auto *conv_op = pattern->NewNode(conv_repr())->assert_is_op("conv2d");
+  conv_input->assert_is_op_input(type, "Input");
+  auto *conv_op = pattern->NewNode(conv_repr())->assert_is_op(type);
   auto *eltiwse_op =
       pattern->NewNode(eltwise_repr())->assert_is_op("elementwise_add");
   // Create variables

@@ -1041,11 +1042,11 @@ PDNode *patterns::ConvBias::operator()(
   auto *conv_weight_var = pattern->NewNode(conv_weight_repr())
                               ->AsInput()
                               ->assert_is_persistable_var()
-                              ->assert_is_op_input("conv2d", "Filter");
+                              ->assert_is_op_input(type, "Filter");
   // intermediate variable, will be removed in the IR after fuse.
   auto *conv_out_var = pattern->NewNode(conv_out_repr())
                            ->AsIntermediate()
-                           ->assert_is_only_output_of_op("conv2d")
+                           ->assert_is_only_output_of_op(type)
                            ->assert_is_op_input("elementwise_add");
   // Bias stored in elementwise_add
   auto *eltwise_bias_var = pattern->NewNode(eltwise_bias_repr())
paddle/fluid/framework/ir/graph_pattern_detector.h

@@ -623,7 +623,7 @@ struct ElewiseAddActInplaceGrad : public PatternBase {
 struct ConvBias : public PatternBase {
   ConvBias(PDPattern* pattern, const std::string& name_scope)
       : PatternBase(pattern, name_scope, "conv_bias") {}
-  PDNode* operator()(PDNode* conv_input);
+  PDNode* operator()(PDNode* conv_input, bool is_conv3d = false);
   // declare operator node's name
   PATTERN_DECL_NODE(conv);
   PATTERN_DECL_NODE(eltwise);
paddle/fluid/framework/ir/mkldnn_placement_pass.cc

@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License. */

 #include "paddle/fluid/framework/ir/mkldnn_placement_pass.h"
+#include <string>

 namespace paddle {
 namespace framework {

@@ -21,9 +22,16 @@ namespace ir {
 std::unique_ptr<ir::Graph> MKLDNNPlacementPass::ApplyImpl(
     std::unique_ptr<ir::Graph> graph) const {
   VLOG(3) << "Aplies MKL-DNN placement strategy.";
+  const auto& op_types_list =
+      Get<std::unordered_set<std::string>>("mkldnn_enabled_op_types");
   for (const Node* n : graph->Nodes()) {
     if (n->IsOp() && n->RuntimeHasAttr("use_mkldnn")) {
-      n->Op()->SetAttr("use_mkldnn", true);
+      if (op_types_list.empty()) {
+        n->Op()->SetAttr("use_mkldnn", true);
+      } else if (std::find(op_types_list.begin(), op_types_list.end(),
+                           n->Name()) != op_types_list.end()) {
+        n->Op()->SetAttr("use_mkldnn", true);
+      }
     }
   }
   return graph;

@@ -33,5 +41,5 @@ std::unique_ptr<ir::Graph> MKLDNNPlacementPass::ApplyImpl(
 }  // namespace framework
 }  // namespace paddle

-REGISTER_PASS(mkldnn_placement_pass, paddle::framework::ir::MKLDNNPlacementPass);
+REGISTER_PASS(mkldnn_placement_pass, paddle::framework::ir::MKLDNNPlacementPass)
+    .RequirePassAttr("mkldnn_enabled_op_types");
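The pass now takes an operator whitelist: an empty "mkldnn_enabled_op_types" set keeps the old enable-everywhere behavior, otherwise only listed ops get use_mkldnn=true. A std-only sketch of that filtering rule (illustrative names; the sketch uses unordered_set::count, which is equivalent to the std::find call in the diff):

#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

int main() {
  std::vector<std::string> ops = {"conv2d", "relu", "softmax"};
  std::unordered_set<std::string> whitelist = {"conv2d", "relu"};

  for (const auto& op : ops) {
    // An empty whitelist means "enable MKL-DNN for every candidate op".
    bool enable = whitelist.empty() || whitelist.count(op) > 0;
    std::cout << op << ": use_mkldnn=" << std::boolalpha << enable << "\n";
  }
}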
paddle/fluid/framework/op_kernel_type.cc
new file mode 100644

/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/framework/op_kernel_type.h"

namespace paddle {
namespace framework {

size_t OpKernelType::Hash::operator()(const OpKernelType &key) const {
  int cur_loc = 0;

  int place = key.place_.which();
  cur_loc += OpKernelType::kPlaceBits;

  int data_type = static_cast<int>(key.data_type_) << cur_loc;
  cur_loc += OpKernelType::kPrimaryDTypeBits;

  int data_layout = static_cast<int>(key.data_layout_) << cur_loc;
  cur_loc += OpKernelType::kLayoutBits;

  int library_type = static_cast<int>(key.library_type_) << cur_loc;
  cur_loc += OpKernelType::kLibBits;

  int customized_value = key.customized_type_value_;
  PADDLE_ENFORCE(customized_value < (1 << OpKernelType::kCustomizeBits));
  customized_value = customized_value << cur_loc;
  cur_loc += OpKernelType::kCustomizeBits;
  PADDLE_ENFORCE(cur_loc < 64);

  std::hash<int> hasher;
  return hasher(place + data_type + data_layout + library_type +
                customized_value);
}

bool OpKernelType::operator==(const OpKernelType &o) const {
  return platform::places_are_same_class(place_, o.place_) &&
         data_type_ == o.data_type_ && data_layout_ == o.data_layout_ &&
         library_type_ == o.library_type_ &&
         customized_type_value_ == o.customized_type_value_;
}

}  // namespace framework
}  // namespace paddle
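The new hash packs five small fields into disjoint bit ranges of a single int before hashing: bits [0,4) place, [4,12) data type, [12,16) layout, [16,20) library, [20,24) customized value, 24 bits in total, comfortably under the enforced 64. A standalone sketch of the same packing arithmetic (values are made up for illustration):

#include <functional>
#include <iostream>

int main() {
  const int kPlaceBits = 4, kPrimaryDTypeBits = 8, kLayoutBits = 4,
            kLibBits = 4, kCustomizeBits = 4;

  int place = 1, data_type = 5, data_layout = 2, library = 1, customized = 3;

  // place sits in the low bits; every later field is shifted past the
  // bits already consumed, so the ranges never overlap.
  int cur_loc = kPlaceBits;
  int packed = place;
  packed += data_type << cur_loc;
  cur_loc += kPrimaryDTypeBits;
  packed += data_layout << cur_loc;
  cur_loc += kLayoutBits;
  packed += library << cur_loc;
  cur_loc += kLibBits;
  packed += customized << cur_loc;
  cur_loc += kCustomizeBits;  // 24 bits used in total

  std::cout << "packed=" << packed << " hash=" << std::hash<int>()(packed)
            << "\n";
}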
paddle/fluid/framework/op_kernel_type.h

@@ -24,54 +24,55 @@ limitations under the License. */
 namespace paddle {
 namespace framework {

-struct OpKernelType {
-  struct Hash {
-    size_t operator()(const OpKernelType& key) const {
-      int place = key.place_.which();
-      int data_type = static_cast<int>(key.data_type_) << LEFT_SHIFT;
-      int data_layout = static_cast<int>(key.data_layout_) << (LEFT_SHIFT * 2);
-      int library_type = static_cast<int>(key.library_type_)
-                         << (LEFT_SHIFT * 3);
-
-      std::hash<int> hasher;
-      return hasher(place + data_type + data_layout + library_type);
-    }
-  };
+class OpKernelType {
+ public:
+  constexpr static int kDefaultCustomizedTypeValue = 0;

-  // place, data_type, library_type kinds less than 2^8
-  constexpr static int LEFT_SHIFT = 8;
-
-  proto::VarType::Type data_type_;
-  DataLayout data_layout_;
-  platform::Place place_;
-  LibraryType library_type_;
+  // In total should be smaller than 64.
+  constexpr static int kPlaceBits = 4;
+  constexpr static int kPrimaryDTypeBits = 8;
+  constexpr static int kLayoutBits = 4;
+  constexpr static int kLibBits = 4;
+  constexpr static int kCustomizeBits = 4;

   OpKernelType(proto::VarType::Type data_type, platform::Place place,
                DataLayout data_layout = DataLayout::kAnyLayout,
-               LibraryType library_type = LibraryType::kPlain)
+               LibraryType library_type = LibraryType::kPlain,
+               int customized_type_value = kDefaultCustomizedTypeValue)
       : data_type_(data_type),
         data_layout_(data_layout),
         place_(place),
-        library_type_(library_type) {}
+        library_type_(library_type),
+        customized_type_value_(customized_type_value) {}

   OpKernelType(proto::VarType::Type data_type,
                const platform::DeviceContext& dev_ctx,
                DataLayout data_layout = DataLayout::kAnyLayout,
-               LibraryType library_type = LibraryType::kPlain)
+               LibraryType library_type = LibraryType::kPlain,
+               int customized_type_value = kDefaultCustomizedTypeValue)
       : data_type_(data_type),
         data_layout_(data_layout),
         place_(dev_ctx.GetPlace()),
-        library_type_(library_type) {}
+        library_type_(library_type),
+        customized_type_value_(customized_type_value) {}
+
+  virtual ~OpKernelType() {}
+
+  struct Hash {
+    size_t operator()(const OpKernelType& key) const;
+  };

   size_t hash_key() const { return Hash()(*this); }

-  bool operator==(const OpKernelType& o) const {
-    return platform::places_are_same_class(place_, o.place_) &&
-           data_type_ == o.data_type_ && data_layout_ == o.data_layout_ &&
-           library_type_ == o.library_type_;
-  }
+  bool operator==(const OpKernelType& o) const;

   bool operator!=(const OpKernelType& o) const { return !(*this == o); }
+
+  proto::VarType::Type data_type_;
+  DataLayout data_layout_;
+  platform::Place place_;
+  LibraryType library_type_;
+  int customized_type_value_;
 };

 inline std::ostream& operator<<(std::ostream& os,
paddle/fluid/framework/op_registry.h
浏览文件 @
2dda19f7
...
...
@@ -35,6 +35,7 @@ limitations under the License. */
namespace
paddle
{
namespace
framework
{
class
Registrar
{
public:
// In our design, various kinds of classes, e.g., operators and kernels,
...
...
@@ -78,7 +79,7 @@ struct OpKernelRegistrarFunctor;
template
<
typename
PlaceType
,
typename
T
,
typename
Func
>
inline
void
RegisterKernelClass
(
const
char
*
op_type
,
const
char
*
library_type
,
Func
func
)
{
int
customized_type_value
,
Func
func
)
{
std
::
string
library
(
library_type
);
std
::
string
data_layout
=
"ANYLAYOUT"
;
if
(
library
==
"MKLDNN"
)
{
...
...
@@ -86,7 +87,7 @@ inline void RegisterKernelClass(const char* op_type, const char* library_type,
}
OpKernelType
key
(
ToDataType
(
std
::
type_index
(
typeid
(
T
))),
PlaceType
(),
StringToDataLayout
(
data_layout
),
StringToLibraryType
(
library_type
));
StringToLibraryType
(
library_type
)
,
customized_type_value
);
OperatorWithKernel
::
AllOpKernels
()[
op_type
][
key
]
=
func
;
}
...
...
@@ -95,22 +96,26 @@ struct OpKernelRegistrarFunctor<PlaceType, false, I, KernelTypes...> {
using
KERNEL_TYPE
=
typename
std
::
tuple_element
<
I
,
std
::
tuple
<
KernelTypes
...
>>::
type
;
void
operator
()(
const
char
*
op_type
,
const
char
*
library_type
)
const
{
void
operator
()(
const
char
*
op_type
,
const
char
*
library_type
,
int
customized_type_value
)
const
{
using
T
=
typename
KERNEL_TYPE
::
ELEMENT_TYPE
;
RegisterKernelClass
<
PlaceType
,
T
>
(
op_type
,
library_type
,
[](
const
framework
::
ExecutionContext
&
ctx
)
{
op_type
,
library_type
,
customized_type_value
,
[](
const
framework
::
ExecutionContext
&
ctx
)
{
KERNEL_TYPE
().
Compute
(
ctx
);
});
constexpr
auto
size
=
std
::
tuple_size
<
std
::
tuple
<
KernelTypes
...
>>::
value
;
OpKernelRegistrarFunctor
<
PlaceType
,
I
+
1
==
size
,
I
+
1
,
KernelTypes
...
>
func
;
func
(
op_type
,
library_type
);
func
(
op_type
,
library_type
,
customized_type_value
);
}
};
template
<
typename
PlaceType
,
size_t
I
,
typename
...
KernelType
>
struct
OpKernelRegistrarFunctor
<
PlaceType
,
true
,
I
,
KernelType
...
>
{
void
operator
()(
const
char
*
op_type
,
const
char
*
library_type
)
const
{}
void
operator
()(
const
char
*
op_type
,
const
char
*
library_type
,
int
customized_type_value
)
const
{}
};
// User can register many kernel in one place. The data type could be
...
...
@@ -118,9 +123,10 @@ struct OpKernelRegistrarFunctor<PlaceType, true, I, KernelType...> {
template
<
typename
PlaceType
,
typename
...
KernelType
>
class
OpKernelRegistrar
:
public
Registrar
{
public:
explicit
OpKernelRegistrar
(
const
char
*
op_type
,
const
char
*
library_type
)
{
explicit
OpKernelRegistrar
(
const
char
*
op_type
,
const
char
*
library_type
,
int
customized_type_value
)
{
OpKernelRegistrarFunctor
<
PlaceType
,
false
,
0
,
KernelType
...
>
func
;
func
(
op_type
,
library_type
);
func
(
op_type
,
library_type
,
customized_type_value
);
}
};
...
...
@@ -130,17 +136,19 @@ struct OpKernelRegistrarFunctorEx;
template
<
typename
PlaceType
,
typename
...
DataTypeAndKernelType
>
class
OpKernelRegistrarEx
:
public
Registrar
{
public:
explicit
OpKernelRegistrarEx
(
const
char
*
op_type
,
const
char
*
library_type
)
{
explicit
OpKernelRegistrarEx
(
const
char
*
op_type
,
const
char
*
library_type
,
int
customized_type_value
)
{
OpKernelRegistrarFunctorEx
<
PlaceType
,
false
,
0
,
DataTypeAndKernelType
...
>
func
;
func
(
op_type
,
library_type
);
func
(
op_type
,
library_type
,
customized_type_value
);
}
};
template
<
typename
PlaceType
,
size_t
I
,
typename
...
DataTypeAndKernelType
>
struct
OpKernelRegistrarFunctorEx
<
PlaceType
,
true
,
I
,
DataTypeAndKernelType
...
>
{
void
operator
()(
const
char
*
op_type
,
const
char
*
library_type
)
const
{}
void
operator
()(
const
char
*
op_type
,
const
char
*
library_type
,
int
customized_type_value
)
const
{}
};
template
<
typename
PlaceType
,
size_t
I
,
typename
...
DataTypeAndKernelType
>
...
...
@@ -153,18 +161,21 @@ struct OpKernelRegistrarFunctorEx<PlaceType, false, I,
typename
std
::
tuple_element
<
I
,
std
::
tuple
<
DataTypeAndKernelType
...
>>::
type
;
void
operator
()(
const
char
*
op_type
,
const
char
*
library_type
)
const
{
RegisterKernelClass
<
PlaceType
,
T
>
(
op_type
,
library_type
,
Functor
());
void
operator
()(
const
char
*
op_type
,
const
char
*
library_type
,
int
customized_type_value
)
const
{
RegisterKernelClass
<
PlaceType
,
T
>
(
op_type
,
library_type
,
customized_type_value
,
Functor
());
constexpr
auto
size
=
std
::
tuple_size
<
std
::
tuple
<
DataTypeAndKernelType
...
>>::
value
;
OpKernelRegistrarFunctorEx
<
PlaceType
,
I
+
2
>=
size
,
I
+
2
,
DataTypeAndKernelType
...
>
func
;
func
(
op_type
,
library_type
);
func
(
op_type
,
library_type
,
customized_type_value
);
}
};
// clang-format off
/**
* check if MACRO is used in GLOBAL NAMESPACE.
*/
...
...
@@ -199,42 +210,64 @@ struct OpKernelRegistrarFunctorEx<PlaceType, false, I,
/**
* Macro to register OperatorKernel.
*/
#define REGISTER_OP_KERNEL(op_type, library_type, place_class, ...) \
STATIC_ASSERT_GLOBAL_NAMESPACE( \
__reg_op_kernel_##op_type##_##library_type##__, \
"REGISTER_OP_KERNEL must be called in global namespace"); \
static ::paddle::framework::OpKernelRegistrar<place_class, __VA_ARGS__> \
__op_kernel_registrar_##op_type##_##library_type##__(#op_type, \
#library_type); \
int TouchOpKernelRegistrar_##op_type##_##library_type() { \
__op_kernel_registrar_##op_type##_##library_type##__.Touch(); \
return 0; \
#define REGISTER_OP_KERNEL_WITH_CUSTOM_TYPE(op_type, library_type, \
place_class, customized_name, \
customized_type_value, ...) \
STATIC_ASSERT_GLOBAL_NAMESPACE( \
__reg_op_kernel_##op_type##_##library_type##_##customized_name##__, \
"REGISTER_OP_KERNEL must be called in " \
"global namespace"); \
static ::paddle::framework::OpKernelRegistrar<place_class, \
__VA_ARGS__> \
__op_kernel_registrar_##op_type##_##library_type##_##customized_name##__(\
#op_type, #library_type, customized_type_value); \
int TouchOpKernelRegistrar_##op_type##_##library_type##_##customized_name() {\
__op_kernel_registrar_##op_type##_##library_type##_##customized_name##__ \
.Touch(); \
return 0; \
}
#define REGISTER_OP_KERNEL(op_type, library_type, place_class, ...) \
REGISTER_OP_KERNEL_WITH_CUSTOM_TYPE( \
op_type, library_type, place_class, DEFAULT_TYPE, \
::paddle::framework::OpKernelType::kDefaultCustomizedTypeValue, \
__VA_ARGS__)
#define REGISTER_OP_CUDA_KERNEL(op_type, ...) \
REGISTER_OP_KERNEL(op_type, CUDA, ::paddle::platform::CUDAPlace, __VA_ARGS__)
#define REGISTER_OP_CPU_KERNEL(op_type, ...) \
REGISTER_OP_KERNEL(op_type, CPU, ::paddle::platform::CPUPlace, __VA_ARGS__)
#define REGISTER_OP_KERNEL_EX(op_type, library_type, place_class, ...) \
STATIC_ASSERT_GLOBAL_NAMESPACE( \
__reg_op_kernel_##op_type##_##library_type##__, \
"REGISTER_OP_KERNEL_EX must be called in global namespace"); \
static ::paddle::framework::OpKernelRegistrarEx<place_class, __VA_ARGS__> \
__op_kernel_registrar_##op_type##_##library_type##__(#op_type, \
#library_type); \
int TouchOpKernelRegistrar_##op_type##_##library_type() { \
__op_kernel_registrar_##op_type##_##library_type##__.Touch(); \
return 0; \
#define REGISTER_OP_KERNEL_EX(op_type, library_type, place_class, \
customized_name, \
customized_type_value, \
...) \
STATIC_ASSERT_GLOBAL_NAMESPACE( \
__reg_op_kernel_##op_type##_##library_type##_##customized_name##__, \
"REGISTER_OP_KERNEL_EX must be called in " \
"global namespace"); \
static ::paddle::framework::OpKernelRegistrarEx<place_class, \
__VA_ARGS__> \
__op_kernel_registrar_##op_type##_##library_type##_##customized_name##__(\
#op_type, #library_type, customized_type_value); \
int TouchOpKernelRegistrar_##op_type##_##library_type##_##customized_name() {\
__op_kernel_registrar_##op_type##_##library_type##_##customized_name##__ \
.Touch(); \
return 0; \
}
#define REGISTER_OP_CUDA_KERNEL_FUNCTOR(op_type, ...) \
REGISTER_OP_KERNEL_EX(op_type, CUDA, ::paddle::platform::CUDAPlace, \
__VA_ARGS__)
REGISTER_OP_KERNEL_EX( \
op_type, CUDA, ::paddle::platform::CUDAPlace, DEFAULT_TYPE, \
::paddle::framework::OpKernelType::kDefaultCustomizedTypeValue, \
__VA_ARGS__)
#define REGISTER_OP_CPU_KERNEL_FUNCTOR(op_type, ...) \
REGISTER_OP_KERNEL_EX(op_type, CPU, ::paddle::platform::CPUPlace, __VA_ARGS__)
#define REGISTER_OP_CPU_KERNEL_FUNCTOR(op_type, ...) \
REGISTER_OP_KERNEL_EX( \
op_type, CPU, ::paddle::platform::CPUPlace, DEFAULT_TYPE, \
::paddle::framework::OpKernelType::kDefaultCustomizedTypeValue, \
__VA_ARGS__)
/**
* Macro to mark what Operator and Kernel
...
...
@@ -248,13 +281,19 @@ struct OpKernelRegistrarFunctorEx<PlaceType, false, I,
extern int TouchOpRegistrar_##op_type(); \
UNUSED static int use_op_itself_##op_type##_ = TouchOpRegistrar_##op_type()
-#define USE_OP_DEVICE_KERNEL(op_type, LIBRARY_TYPE)               \
-  STATIC_ASSERT_GLOBAL_NAMESPACE(                                 \
-      __use_op_kernel_##op_type##_##LIBRARY_TYPE##__,             \
-      "USE_OP_DEVICE_KERNEL must be in global namespace");        \
-  extern int TouchOpKernelRegistrar_##op_type##_##LIBRARY_TYPE(); \
-  UNUSED static int use_op_kernel_##op_type##_##LIBRARY_TYPE##_ = \
-      TouchOpKernelRegistrar_##op_type##_##LIBRARY_TYPE()
+#define USE_OP_DEVICE_KERNEL_WITH_CUSTOM_TYPE(op_type,                        \
+                                              LIBRARY_TYPE,                   \
+                                              customized_name)                \
+  STATIC_ASSERT_GLOBAL_NAMESPACE(                                             \
+      __use_op_kernel_##op_type##_##LIBRARY_TYPE##_##customized_name##__,     \
+      "USE_OP_DEVICE_KERNEL must be in global namespace");                    \
+  extern int                                                                  \
+      TouchOpKernelRegistrar_##op_type##_##LIBRARY_TYPE##_##customized_name();\
+  UNUSED static int use_op_kernel_##op_type##_##LIBRARY_TYPE##_##DEFAULT_TYPE##_ = /* NOLINT */ \
+      TouchOpKernelRegistrar_##op_type##_##LIBRARY_TYPE##_##customized_name()
+
+#define USE_OP_DEVICE_KERNEL(op_type, LIBRARY_TYPE) \
+  USE_OP_DEVICE_KERNEL_WITH_CUSTOM_TYPE(op_type, LIBRARY_TYPE, DEFAULT_TYPE)
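And the matching consumer side, hedged the same way (my_op and MY_SPECIAL_NAME are placeholders; the customized_name must match the one used at registration):

// Pull the registered kernels into the linking translation unit.
USE_OP_DEVICE_KERNEL(my_op, CPU);  // expands to ...WITH_CUSTOM_TYPE(..., DEFAULT_TYPE)
USE_OP_DEVICE_KERNEL_WITH_CUSTOM_TYPE(my_op, CPU, MY_SPECIAL_NAME);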
// TODO(fengjiayi): The following macros
// seems ugly, do we have better method?
...
...
@@ -280,6 +319,7 @@ struct OpKernelRegistrarFunctorEx<PlaceType, false, I,
#define USE_OP(op_type) \
USE_OP_ITSELF(op_type); \
USE_OP_KERNEL(op_type)
// clang-format off
}  // namespace framework
}  // namespace paddle
paddle/fluid/framework/operator_test.cc
...
...
@@ -50,6 +50,8 @@ class OpWithoutKernelCheckerMaker : public OpProtoAndCheckerMaker {
     AddInput("input", "input of test op");
     AddOutput("output", "output of test op");
     AddAttr<float>("scale", "scale of cosine op");
+    AddAttr<int>("kernel_sub_type", "kernels with different implementations.")
+        .SetDefault(0);
     AddComment("This is test op");
   }
 };
...
...
@@ -95,6 +97,8 @@ TEST(OperatorBase, all) {
 namespace paddle {
 namespace framework {

+static int special_type_value = 1;
+
 class OpKernelTestProtoAndCheckerMaker : public OpProtoAndCheckerMaker {
  public:
   void Make() {
...
...
@@ -103,11 +107,14 @@ class OpKernelTestProtoAndCheckerMaker : public OpProtoAndCheckerMaker {
     AddAttr<float>("scale", "scale of cosine op")
         .SetDefault(1.0)
         .GreaterThan(0.0);
+    AddAttr<int>("kernel_sub_type", "kernels with different implementations.")
+        .SetDefault(0);
     AddComment("This is test op");
   }
 };

 static int cpu_kernel_run_num = 0;
+static int cpu_kernel2_run_num = 0;

 class OpWithKernelTest : public OperatorWithKernel {
  public:
...
...
@@ -117,7 +124,10 @@ class OpWithKernelTest : public OperatorWithKernel {
   void InferShape(framework::InferShapeContext* ctx) const override {}
   OpKernelType GetExpectedKernelType(
       const ExecutionContext& ctx) const override {
-    return OpKernelType(proto::VarType::FP32, ctx.GetPlace());
+    int sub_type = ctx.Attr<int>("kernel_sub_type");
+    return OpKernelType(proto::VarType::FP32, ctx.GetPlace(),
+                        framework::DataLayout::kAnyLayout,
+                        framework::LibraryType::kPlain, sub_type);
   }
 };
...
...
@@ -132,6 +142,17 @@ class CPUKernelTest : public OpKernel<float> {
   }
 };

+template <typename T1, typename T2>
+class CPUKernel2Test : public OpKernel<float> {
+ public:
+  void Compute(const ExecutionContext& ctx) const {
+    std::cout << ctx.op().DebugString() << std::endl;
+    cpu_kernel2_run_num++;
+    ASSERT_EQ(ctx.op().Input("x"), "IN1");
+    ASSERT_EQ(ctx.op().Output("y"), "OUT1");
+  }
+};
+
 class OpKernelTestMultiInputsProtoAndCheckerMaker
     : public OpProtoAndCheckerMaker {
  public:
...
...
@@ -142,6 +163,8 @@ class OpKernelTestMultiInputsProtoAndCheckerMaker
     AddAttr<float>("scale", "scale of cosine op")
         .SetDefault(1.0)
         .GreaterThan(0.0);
+    AddAttr<int>("kernel_sub_type", "kernels with different implementations.")
+        .SetDefault(0);
     AddComment("This is test op");
   }
 };
...
...
@@ -189,9 +212,15 @@ class CPUKernalMultiInputsTest : public OpKernel<float> {
 REGISTER_OP_WITHOUT_GRADIENT(
     op_with_kernel, paddle::framework::OpWithKernelTest,
     paddle::framework::OpKernelTestProtoAndCheckerMaker);
+
 REGISTER_OP_CPU_KERNEL(op_with_kernel,
                        paddle::framework::CPUKernelTest<float, float>);

+REGISTER_OP_KERNEL_WITH_CUSTOM_TYPE(
+    op_with_kernel, CPU, paddle::platform::CPUPlace, MY_SPECIAL_NAME,
+    paddle::framework::special_type_value,
+    paddle::framework::CPUKernel2Test<float, float>);
+
 // test with single input
 TEST(OpKernel, all) {
   paddle::framework::InitDevices(true);
...
...
@@ -211,7 +240,19 @@ TEST(OpKernel, all) {
   auto op = paddle::framework::OpRegistry::CreateOp(op_desc);

   ASSERT_EQ(paddle::framework::cpu_kernel_run_num, 0);
   op->Run(scope, cpu_place);
+  // kernel_sub_type = 0, hence cpu_kernel is called, cpu_kernel2 is not called.
   ASSERT_EQ(paddle::framework::cpu_kernel_run_num, 1);
+  ASSERT_EQ(paddle::framework::cpu_kernel2_run_num, 0);
+
+  attr = op_desc.mutable_attrs()->Add();
+  attr->set_name("kernel_sub_type");
+  attr->set_type(paddle::framework::proto::AttrType::INT);
+  attr->set_i(1);
+  auto op2 = paddle::framework::OpRegistry::CreateOp(op_desc);
+  op2->Run(scope, cpu_place);
+  // kernel_sub_type = 1, hence cpu_kernel2 is called, cpu_kernel is not called.
+  ASSERT_EQ(paddle::framework::cpu_kernel_run_num, 1);
+  ASSERT_EQ(paddle::framework::cpu_kernel2_run_num, 1);
 }

 REGISTER_OP_WITHOUT_GRADIENT(
...
...
paddle/fluid/inference/analysis/argument.h
...
...
@@ -103,6 +103,7 @@ struct Argument {
   // Model specified with program and parameters files.
   DECL_ARGUMENT_FIELD(model_program_path, ModelProgramPath, std::string);
   DECL_ARGUMENT_FIELD(model_params_path, ModelParamsPath, std::string);
+  DECL_ARGUMENT_FIELD(model_from_memory, ModelFromMemory, bool);

   // The overall graph to work on.
   DECL_ARGUMENT_UNIQUE_FIELD(main_graph, MainGraph, framework::ir::Graph);
...
...
@@ -115,6 +116,10 @@ struct Argument {
   DECL_ARGUMENT_FIELD(ir_analysis_passes, IrAnalysisPasses,
                       std::vector<std::string>);

+  // Pass a set of op types to enable its mkldnn kernel
+  DECL_ARGUMENT_FIELD(mkldnn_enabled_op_types, MKLDNNEnabledOpTypes,
+                      std::unordered_set<std::string>);
+
   DECL_ARGUMENT_FIELD(use_gpu, UseGPU, bool);
   DECL_ARGUMENT_FIELD(gpu_device_id, GPUDeviceId, int);
   DECL_ARGUMENT_FIELD(use_tensorrt, UseTensorRT, bool);
...
...
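DECL_ARGUMENT_FIELD itself is not expanded anywhere in this diff; judging from its call sites elsewhere in the commit (argument->model_from_memory(), argument_.SetModelFromMemory(...), model_params_path_valid()), it appears to generate roughly a member plus accessors. A minimal sketch, assuming that expansion; the struct and member names below are hypothetical:

// Hypothetical expansion of DECL_ARGUMENT_FIELD(model_from_memory,
// ModelFromMemory, bool), inferred only from how the field is used here.
struct ArgumentFieldSketch {
  bool model_from_memory() const { return model_from_memory_; }
  void SetModelFromMemory(bool v) {
    model_from_memory_ = v;
    model_from_memory_valid_ = true;
  }
  bool model_from_memory_valid() const { return model_from_memory_valid_; }

 private:
  bool model_from_memory_{false};
  bool model_from_memory_valid_{false};  // set once the field is assigned
};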
paddle/fluid/inference/analysis/ir_pass_manager.cc
...
...
@@ -63,6 +63,11 @@ void IRPassManager::CreatePasses(Argument *argument,
       pass->Set("graph_viz_path", new std::string(std::move(dot_file_path)));
       pass_num++;
     }
+    if (pass_name == "mkldnn_placement_pass") {
+      pass->Set("mkldnn_enabled_op_types",
+                new std::unordered_set<std::string>(
+                    argument->mkldnn_enabled_op_types()));
+    }

     if (pass_name == "tensorrt_subgraph_pass") {
       PADDLE_ENFORCE(argument->tensorrt_node_teller_valid());
...
...
paddle/fluid/inference/analysis/ir_passes/tensorrt_subgraph_pass.cc
...
...
@@ -178,11 +178,12 @@ void TensorRtSubgraphPass::CreateTensorRTOp(framework::ir::Node *node,
     output_mapping.push_back(output_name_map[name]);
   }

-  *block_desc.Proto()->mutable_vars() =
-      const_cast<framework::ProgramDesc *>(&graph->program())
-          ->Proto()
-          ->blocks(0)
-          .vars();
+  auto *vars = block_desc.Proto()->mutable_vars();
+  for (framework::ir::Node *node : graph->Nodes()) {
+    if (node->IsVar() && node->Var()) {
+      *vars->Add() = *node->Var()->Proto();
+    }
+  }
   PADDLE_ENFORCE(!block_desc.Proto()->vars().empty(),
                  "the block has no var-desc");
   PADDLE_ENFORCE(!output_mapping.empty());
...
...
paddle/fluid/inference/analysis/passes/ir_graph_build_pass.cc
...
...
@@ -46,7 +46,7 @@ void IrGraphBuildPass::RunImpl(Argument *argument) {
              argument->model_params_path_valid()) {
     auto program =
         LoadModel(argument->model_program_path(), argument->model_params_path(),
-                  argument->scope_ptr(), place);
+                  argument->scope_ptr(), place, argument->model_from_memory());
     argument->SetMainProgram(program.release());
   } else {
     PADDLE_THROW(
...
...
@@ -68,9 +68,14 @@ std::unique_ptr<framework::ProgramDesc> IrGraphBuildPass::LoadModel(
 std::unique_ptr<framework::ProgramDesc> IrGraphBuildPass::LoadModel(
     const std::string &program_path, const std::string &params_path,
-    framework::Scope *scope, const platform::Place &place) {
+    framework::Scope *scope, const platform::Place &place,
+    bool model_from_memory) {
   framework::Executor exe(place);
-  return Load(&exe, scope, program_path, params_path);
+  if (!model_from_memory) {
+    return Load(&exe, scope, program_path, params_path);
+  } else {
+    return LoadFromMemory(&exe, scope, program_path, params_path);
+  }
 }

 std::string IrGraphBuildPass::repr() const { return "ir-graph-build-pass"; }
...
...
paddle/fluid/inference/analysis/passes/ir_graph_build_pass.h
...
...
@@ -24,7 +24,7 @@ namespace inference {
 namespace analysis {

 /*
- * Load program and parameter to memory from the disk.
+ * Load program and parameter to memory from the disk or directly from memory.
  */
 class IrGraphBuildPass : public AnalysisPass {
  public:
...
...
@@ -38,7 +38,8 @@ class IrGraphBuildPass : public AnalysisPass {
       const platform::Place &place);
   std::unique_ptr<framework::ProgramDesc> LoadModel(
       const std::string &program_path, const std::string &params_path,
-      framework::Scope *scope, const platform::Place &place);
+      framework::Scope *scope, const platform::Place &place,
+      bool model_from_memory);

   std::string model_binary_str_;
 };
...
...
paddle/fluid/inference/api/analysis_config.cc
...
...
@@ -49,10 +49,15 @@ contrib::AnalysisConfig::AnalysisConfig(const contrib::AnalysisConfig &other) {
   cpu_math_library_num_threads_ = other.cpu_math_library_num_threads_;
   // fields from this.
   enable_ir_optim = other.enable_ir_optim;
+  // For mkldnn
+  use_mkldnn_ = other.use_mkldnn_;
+  mkldnn_enabled_op_types_ = other.mkldnn_enabled_op_types_;
+
   use_feed_fetch_ops = other.use_feed_fetch_ops;
   use_tensorrt_ = other.use_tensorrt_;
   tensorrt_max_batchsize_ = other.tensorrt_max_batchsize_;
   tensorrt_workspace_size_ = other.tensorrt_workspace_size_;
+  model_from_memory_ = other.model_from_memory_;

   if (use_gpu) {
     pass_builder_.reset(new GpuPassStrategy(
...
...
@@ -76,10 +81,16 @@ contrib::AnalysisConfig::AnalysisConfig(contrib::AnalysisConfig &&other) {
   cpu_math_library_num_threads_ = other.cpu_math_library_num_threads_;
   // fields from this.
   enable_ir_optim = other.enable_ir_optim;
+  // For mkldnn
+  use_mkldnn_ = other.use_mkldnn_;
+  mkldnn_enabled_op_types_ = other.mkldnn_enabled_op_types_;
+
   use_feed_fetch_ops = other.use_feed_fetch_ops;
   use_tensorrt_ = other.use_tensorrt_;
   tensorrt_max_batchsize_ = other.tensorrt_max_batchsize_;
   tensorrt_workspace_size_ = other.tensorrt_workspace_size_;
+  model_from_memory_ = other.model_from_memory_;

   pass_builder_ = std::move(other.pass_builder_);
 }
...
...
@@ -102,4 +113,13 @@ void contrib::AnalysisConfig::EnableTensorRtEngine(int workspace_size,
   pass_builder()->InsertPass(1, "tensorrt_subgraph_pass");
 }

+void contrib::AnalysisConfig::SetModelBuffer(const char *prog_buffer,
+                                             size_t prog_buffer_size,
+                                             const char *param_buffer,
+                                             size_t param_buffer_size) {
+  prog_file = std::string(prog_buffer, prog_buffer + prog_buffer_size);
+  param_file = std::string(param_buffer, param_buffer + param_buffer_size);
+  model_from_memory_ = true;
+}
+
}  // namespace paddle
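A hedged usage sketch of the new SetModelBuffer(): the paths and the ReadFileToString helper below are illustrative, not Paddle APIs; the memory-load branch of analyzer_ner_tester.cc later in this commit does essentially the same with ReadBinaryFile.

#include <fstream>
#include <sstream>
#include <string>

// Read both model files into memory once, then configure the predictor so it
// never touches the filesystem again.
static std::string ReadFileToString(const std::string &path) {
  std::ifstream fin(path, std::ios::in | std::ios::binary);
  std::ostringstream buf;
  buf << fin.rdbuf();
  return buf.str();
}

void ConfigureFromMemory(paddle::contrib::AnalysisConfig *cfg) {
  std::string prog = ReadFileToString("dir/__model__");  // illustrative path
  std::string params = ReadFileToString("dir/param");    // illustrative path
  cfg->SetModelBuffer(prog.data(), prog.size(), params.data(), params.size());
  // cfg->model_from_memory() now reports true.
}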
paddle/fluid/inference/api/analysis_predictor.cc
...
...
@@ -308,6 +308,7 @@ void AnalysisPredictor::OptimizeInferenceProgram() {
   argument_.SetUseGPU(config_.use_gpu);
   argument_.SetGPUDeviceId(config_.device);
+  argument_.SetModelFromMemory(config_.model_from_memory_);
   // Analyze inference_program
   if (!config_.model_dir.empty()) {
     argument_.SetModelDir(config_.model_dir);
...
...
@@ -326,6 +327,10 @@ void AnalysisPredictor::OptimizeInferenceProgram() {
     argument_.SetTensorRtMaxBatchSize(config_.tensorrt_max_batchsize_);
   }

+  if (config_.use_mkldnn_) {
+    argument_.SetMKLDNNEnabledOpTypes(config_.mkldnn_enabled_op_types_);
+  }
+
   auto passes = config_.pass_builder()->AllPasses();
   if (!config_.enable_ir_optim) passes.clear();
   argument_.SetIrAnalysisPasses(passes);
...
...
@@ -448,20 +453,24 @@ bool AnalysisPredictor::LoadProgramDesc() {
     return false;
   }

-  std::string pb_content;
-  // Read binary
-  std::ifstream fin(filename, std::ios::in | std::ios::binary);
-  PADDLE_ENFORCE(static_cast<bool>(fin), "Cannot open file %s", filename);
-  fin.seekg(0, std::ios::end);
-  pb_content.resize(fin.tellg());
-  fin.seekg(0, std::ios::beg);
-  fin.read(&(pb_content.at(0)), pb_content.size());
-  fin.close();
-
   // Create ProgramDesc
   framework::proto::ProgramDesc proto;
-  proto.ParseFromString(pb_content);
+
+  if (!config_.model_from_memory()) {
+    std::string pb_content;
+    // Read binary
+    std::ifstream fin(filename, std::ios::in | std::ios::binary);
+    PADDLE_ENFORCE(static_cast<bool>(fin.is_open()), "Cannot open file %s",
+                   filename);
+    fin.seekg(0, std::ios::end);
+    pb_content.resize(fin.tellg());
+    fin.seekg(0, std::ios::beg);
+    fin.read(&(pb_content.at(0)), pb_content.size());
+    fin.close();
+
+    proto.ParseFromString(pb_content);
+  } else {
+    proto.ParseFromString(config_.prog_file);
+  }
+
   inference_program_.reset(new framework::ProgramDesc(proto));
   return true;
 }
...
...
@@ -469,6 +478,7 @@ bool AnalysisPredictor::LoadProgramDesc() {
 bool AnalysisPredictor::LoadParameters() {
   PADDLE_ENFORCE_NOT_NULL(inference_program_.get(),
                           "The inference program should be loaded first.");
+
   const auto &global_block = inference_program_->MutableBlock(0);

   // create a temporary program to load parameters.
...
...
paddle/fluid/inference/api/paddle_analysis_config.h
...
...
@@ -16,6 +16,7 @@
 #include <cassert>
 #include <memory>
 #include <string>
+#include <unordered_set>
 #include <vector>

 // Here we include some header files with relative paths, for that in deploy,
...
...
@@ -52,18 +53,26 @@ struct AnalysisConfig : public NativeConfig {
   bool use_tensorrt() const { return use_tensorrt_; }

   void EnableMKLDNN();
   // NOTE this is just for internal development, please not use it.
   // NOT stable yet.
   bool use_mkldnn() const { return use_mkldnn_; }
+  void SetMKLDNNOp(std::unordered_set<std::string> op_list) {
+    mkldnn_enabled_op_types_ = op_list;
+  }
+
+  // Specify the memory buffer of program and parameter
+  void SetModelBuffer(const char* prog_buffer, size_t prog_buffer_size,
+                      const char* param_buffer, size_t param_buffer_size);
+  bool model_from_memory() const { return model_from_memory_; }

   friend class ::paddle::AnalysisPredictor;

  protected:
   bool use_tensorrt_{false};
   bool use_mkldnn_{false};
+  std::unordered_set<std::string> mkldnn_enabled_op_types_;
   int tensorrt_workspace_size_;
   int tensorrt_max_batchsize_;
   std::unique_ptr<PassStrategy> pass_builder_;
+  bool model_from_memory_{false};
 };

 // Configurations for Anakin engine.
...
...
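A short usage sketch of the new MKL-DNN knobs, mirroring the testers changed later in this commit; the function name is illustrative:

#include <string>
#include <unordered_set>

// Enable MKL-DNN and restrict it to a whitelist of op types; the set is
// forwarded to mkldnn_placement_pass via Argument (see ir_pass_manager.cc).
void EnableMKLDNNForConv3d(paddle::contrib::AnalysisConfig *cfg) {
  cfg->EnableMKLDNN();
  std::unordered_set<std::string> op_list = {"conv3d"};
  cfg->SetMKLDNNOp(op_list);
}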
paddle/fluid/inference/api/paddle_pass_builder.h
...
...
@@ -98,9 +98,10 @@ class CpuPassStrategy : public PassStrategy {
     passes_.insert(passes_.begin(), "mkldnn_placement_pass");

     for (auto &pass :
-         std::vector<std::string>({"depthwise_conv_mkldnn_pass",    //
-                                   "conv_bias_mkldnn_fuse_pass",    //
-                                   "conv_relu_mkldnn_fuse_pass",    //
+         std::vector<std::string>({"depthwise_conv_mkldnn_pass",    //
+                                   "conv_bias_mkldnn_fuse_pass",    //
+                                   "conv3d_bias_mkldnn_fuse_pass",  //
+                                   "conv_relu_mkldnn_fuse_pass",    //
                                    "conv_elementwise_add_mkldnn_fuse_pass"})) {
       passes_.push_back(pass);
     }
...
...
paddle/fluid/inference/io.cc
...
...
@@ -69,7 +69,8 @@ bool IsPersistable(const framework::VarDesc* var) {
 void LoadPersistables(framework::Executor* executor, framework::Scope* scope,
                       const framework::ProgramDesc& main_program,
                       const std::string& dirname,
-                      const std::string& param_filename) {
+                      const std::string& param_filename,
+                      bool model_from_memory = false) {
   const framework::BlockDesc& global_block = main_program.Block(0);

   framework::ProgramDesc* load_program = new framework::ProgramDesc();
...
...
@@ -108,6 +109,7 @@ void LoadPersistables(framework::Executor* executor, framework::Scope* scope,
     op->SetType("load_combine");
     op->SetOutput("Out", paramlist);
     op->SetAttr("file_path", {param_filename});
+    op->SetAttr("model_from_memory", {model_from_memory});
     op->CheckAttrs();
   }
...
...
@@ -130,16 +132,17 @@ std::unique_ptr<framework::ProgramDesc> Load(framework::Executor* executor,
                 "model version %ld is not supported.",
                 main_program->Version());

-  LoadPersistables(executor, scope, *main_program, dirname, "");
+  // model_from_memory is false in separate parameters.
+  LoadPersistables(executor, scope, *main_program, dirname, "",
+                   false /* model_from_memory */);
   return main_program;
 }

 std::unique_ptr<framework::ProgramDesc> Load(
     framework::Executor* executor, framework::Scope* scope,
     const std::string& prog_filename, const std::string& param_filename) {
-  std::string model_filename = prog_filename;
   std::string program_desc_str;
-  ReadBinaryFile(model_filename, &program_desc_str);
+  ReadBinaryFile(prog_filename, &program_desc_str);
   std::unique_ptr<framework::ProgramDesc> main_program(
       new framework::ProgramDesc(program_desc_str));
...
...
@@ -147,7 +150,22 @@ std::unique_ptr<framework::ProgramDesc> Load(
                 "model version %ld is not supported.",
                 main_program->Version());

-  LoadPersistables(executor, scope, *main_program, "", param_filename);
+  LoadPersistables(executor, scope, *main_program, "", param_filename,
+                   false /* model_from_memory */);
   return main_program;
 }

+std::unique_ptr<framework::ProgramDesc> LoadFromMemory(
+    framework::Executor* executor, framework::Scope* scope,
+    const std::string& prog_buffer, const std::string& param_buffer) {
+  std::unique_ptr<framework::ProgramDesc> main_program(
+      new framework::ProgramDesc(prog_buffer));
+  PADDLE_ENFORCE(framework::IsProgramVersionSupported(main_program->Version()),
+                 "model version %ld is not supported.",
+                 main_program->Version());
+
+  LoadPersistables(executor, scope, *main_program, "", param_buffer,
+                   true /* model_from_memory */);
+  return main_program;
+}
...
...
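A hedged sketch of calling the new LoadFromMemory(); the wrapper name is illustrative, and the buffers carry the raw bytes of the serialized program and the combined parameter file:

#include <memory>
#include <string>

// Same contract as Load(), except prog_buffer/param_buffer carry raw bytes
// rather than file paths.
std::unique_ptr<paddle::framework::ProgramDesc> LoadFromBuffers(
    paddle::framework::Executor *exe, paddle::framework::Scope *scope,
    const std::string &prog_buffer, const std::string &param_buffer) {
  return paddle::inference::LoadFromMemory(exe, scope, prog_buffer,
                                           param_buffer);
}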
paddle/fluid/inference/io.h
...
...
@@ -30,7 +30,8 @@ void Init(const std::vector<std::string> argv);
 void LoadPersistables(framework::Executor* executor, framework::Scope* scope,
                       const framework::ProgramDesc& main_program,
                       const std::string& dirname,
-                      const std::string& param_filename);
+                      const std::string& param_filename,
+                      bool model_from_memory);

 std::unique_ptr<framework::ProgramDesc> Load(framework::Executor* executor,
                                              framework::Scope* scope,
...
...
@@ -41,6 +42,10 @@ std::unique_ptr<framework::ProgramDesc> Load(framework::Executor* executor,
     const std::string& prog_filename, const std::string& param_filename);

+std::unique_ptr<framework::ProgramDesc> LoadFromMemory(
+    framework::Executor* executor, framework::Scope* scope,
+    const std::string& prog_buffer, const std::string& param_buffer);
+
 // Save the variables from a scope to disk.
 void SaveVars(const framework::Scope& scope,
               const std::vector<std::string>& vars, const std::string& dirname,
...
...
paddle/fluid/inference/tensorrt/convert/test_prelu_op.cc
...
...
@@ -90,5 +90,4 @@ TEST(prelu_op, test_scalar) {
 }  // namespace inference
 }  // namespace paddle

-// USE_OP(prelu);
-USE_CPU_ONLY_OP(prelu);
+USE_OP(prelu);
paddle/fluid/inference/tensorrt/plugin/CMakeLists.txt
nv_library(tensorrt_plugin
    SRCS trt_plugin.cc split_op_plugin.cu elementwise_op_plugin.cu
    prelu_op_plugin.cu avg_pool_op_plugin.cu
-   DEPS enforce tensorrt_engine)
+   DEPS enforce tensorrt_engine prelu)
paddle/fluid/inference/tensorrt/plugin/prelu_op_plugin.cu
...
...
@@ -14,92 +14,16 @@
 #include <stdio.h>
 #include <cassert>
 #include <vector>
 #include "glog/logging.h"
 #include "paddle/fluid/inference/tensorrt/plugin/prelu_op_plugin.h"
+#include "paddle/fluid/operators/math/prelu.h"

 namespace paddle {
 namespace inference {
 namespace tensorrt {
 namespace plugin {

-static const int CUDA_NUM_THREADS = 1024;
-static const int CUDA_MAX_NUM_BLOCKS = 65535;
-inline static int GET_NUM_BLOCKS(const int N) {
-  return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
-}
-
-__global__ void PReluChannelWiseKernel(const float *input, const float *alpha,
-                                       float *output, int channel,
-                                       size_t spatial_size) {
-  size_t offset = blockIdx.x * spatial_size;
-  const float *in = input + offset;
-  float *out = output + offset;
-  float scale = alpha[blockIdx.x % channel];
-
-  for (size_t i = threadIdx.x; i < spatial_size; i += blockDim.x) {
-    float x = in[i];
-    out[i] = (x > 0) ? x : scale * x;
-  }
-}
-
-__global__ void PReluElementWiseKernel(const float *input, const float *alpha,
-                                       float *output, size_t spatial_size) {
-  size_t offset = blockIdx.x * spatial_size;
-  const float *in = input + offset;
-  const float *scale = alpha + offset;
-  float *out = output + offset;
-
-  for (size_t i = threadIdx.x; i < spatial_size; i += blockDim.x) {
-    float x = in[i];
-    out[i] = (x > 0) ? x : scale[i] * x;
-  }
-}
-
-__global__ void PReluScalarKernel(const float *input, const float *alpha,
-                                  float *output, size_t spatial_size) {
-  size_t offset = blockIdx.x * spatial_size;
-  const float *in = input + offset;
-  float scale = *alpha;
-  float *out = output + offset;
-
-  for (size_t i = threadIdx.x; i < spatial_size; i += blockDim.x) {
-    float x = in[i];
-    out[i] = (x > 0) ? x : scale * x;
-  }
-}
-
-static inline void PReluChannelWise(cudaStream_t stream, const float *input,
-                                    const float *alpha, float *output,
-                                    int batch_size,
-                                    const nvinfer1::Dims &dims) {
-  size_t unroll = batch_size * dims.d[0];
-  size_t spatial_size = dims.d[1] * dims.d[2];
-  CHECK_LT(unroll, CUDA_MAX_NUM_BLOCKS);
-  PReluChannelWiseKernel<<<unroll, CUDA_NUM_THREADS, 0, stream>>>(
-      input, alpha, output, dims.d[0], spatial_size);
-}
-
-static inline void PReluElementWise(cudaStream_t stream, const float *input,
-                                    const float *alpha, float *output,
-                                    int batch_size,
-                                    const nvinfer1::Dims &dims) {
-  size_t unroll = batch_size * dims.d[0];
-  size_t spatial_size = dims.d[1] * dims.d[2];
-  CHECK_LT(unroll, CUDA_MAX_NUM_BLOCKS);
-  PReluElementWiseKernel<<<unroll, CUDA_NUM_THREADS, 0, stream>>>(
-      input, alpha, output, spatial_size);
-}
-
-static inline void PReluScalar(cudaStream_t stream, const float *input,
-                               const float *alpha, float *output,
-                               int batch_size, const nvinfer1::Dims &dims) {
-  size_t unroll = batch_size * dims.d[0];
-  size_t spatial_size = dims.d[1] * dims.d[2];
-  CHECK_LT(unroll, CUDA_MAX_NUM_BLOCKS);
-  PReluScalarKernel<<<unroll, CUDA_NUM_THREADS, 0, stream>>>(
-      input, alpha, output, spatial_size);
-}
-
 nvinfer1::Dims PReluPlugin::getOutputDimensions(int index,
                                                 const nvinfer1::Dims *inputDims,
                                                 int nbInputs) {
...
...
@@ -110,19 +34,31 @@ nvinfer1::Dims PReluPlugin::getOutputDimensions(int index,
   return output_dims;
 }

-int PReluPlugin::enqueue(int batchSize, const void *const *inputs,
+int PReluPlugin::enqueue(int batch_size, const void *const *inputs,
                         void **outputs, void *workspace, cudaStream_t stream) {
   // input dims is CHW.
   const auto &input_dims = this->getInputDims(0);
   const float *input = reinterpret_cast<const float *>(inputs[0]);
   const float *alpha = reinterpret_cast<const float *>(alpha_.get().values);
   float *output = reinterpret_cast<float **>(outputs)[0];
+
+  std::vector<int> input_shape;
+  input_shape.push_back(batch_size);
+  for (int i = 0; i < input_dims.nbDims; i++) {
+    input_shape.push_back(input_dims.d[i]);
+  }
+
   if (mode_ == "channel") {
-    PReluChannelWise(stream, input, alpha, output, batchSize, input_dims);
+    operators::math::PreluChannelWiseDirectCUDAFunctor<float>
+        prelu_channel_wise;
+    prelu_channel_wise(stream, input, alpha, output, input_shape);
   } else if (mode_ == "element") {
-    PReluElementWise(stream, input, alpha, output, batchSize, input_dims);
+    operators::math::PreluElementWiseDirectCUDAFunctor<float>
+        prelu_element_wise;
+    prelu_element_wise(stream, input, alpha, output, input_shape);
   } else {
-    PReluScalar(stream, input, alpha, output, batchSize, input_dims);
+    operators::math::PreluScalarDirectCUDAFunctor<float> prelu_scalar;
+    prelu_scalar(stream, input, alpha, output, input_shape);
   }
   return cudaGetLastError() != cudaSuccess;
 }
...
...
paddle/fluid/inference/tests/api/analyzer_dam_tester.cc
...
...
@@ -188,10 +188,16 @@ void SetInput(std::vector<std::vector<PaddleTensor>> *inputs) {
 }

 // Easy for profiling independently.
-TEST(Analyzer_dam, profile) {
+void profile(bool use_mkldnn = false) {
   contrib::AnalysisConfig cfg;
   SetConfig(&cfg);

+  if (use_mkldnn) {
+    cfg.EnableMKLDNN();
+    std::unordered_set<std::string> op_list = {"conv3d"};
+    cfg.SetMKLDNNOp(op_list);
+  }
+
   std::vector<PaddleTensor> outputs;
   std::vector<std::vector<PaddleTensor>> input_slots_all;
   SetInput(&input_slots_all);
...
...
@@ -209,6 +215,11 @@ TEST(Analyzer_dam, profile) {
   }
 }

+TEST(Analyzer_dam, profile) { profile(); }
+#ifdef PADDLE_WITH_MKLDNN
+TEST(Analyzer_dam, profile_mkldnn) { profile(true /* use_mkldnn */); }
+#endif
+
 // Check the fuse status
 TEST(Analyzer_dam, fuse_statis) {
   contrib::AnalysisConfig cfg;
...
...
@@ -222,9 +233,14 @@ TEST(Analyzer_dam, fuse_statis) {
 }

 // Compare result of NativeConfig and AnalysisConfig
-TEST(Analyzer_dam, compare) {
-  contrib::AnalysisConfig cfg;
+void compare(bool use_mkldnn = false) {
+  AnalysisConfig cfg;
   SetConfig(&cfg);

+  if (use_mkldnn) {
+    cfg.EnableMKLDNN();
+    std::unordered_set<std::string> op_list = {"conv3d"};
+    cfg.SetMKLDNNOp(op_list);
+  }
+
   std::vector<std::vector<PaddleTensor>> input_slots_all;
   SetInput(&input_slots_all);
...
...
@@ -233,5 +249,10 @@ TEST(Analyzer_dam, compare) {
       reinterpret_cast<const PaddlePredictor::Config *>(&cfg), input_slots_all);
 }

+TEST(Analyzer_dam, compare) { compare(); }
+#ifdef PADDLE_WITH_MKLDNN
+TEST(Analyzer_dam, compare_mkldnn) { compare(true /* use_mkldnn */); }
+#endif
+
 }  // namespace inference
 }  // namespace paddle
paddle/fluid/inference/tests/api/analyzer_ner_tester.cc
...
...
@@ -93,9 +93,17 @@ void PrepareInputs(std::vector<PaddleTensor> *input_slots, DataRecord *data,
   }
 }

-void SetConfig(contrib::AnalysisConfig *cfg) {
-  cfg->prog_file = FLAGS_infer_model + "/__model__";
-  cfg->param_file = FLAGS_infer_model + "/param";
+void SetConfig(contrib::AnalysisConfig *cfg, bool memory_load = false) {
+  if (memory_load) {
+    std::string buffer_prog, buffer_param;
+    ReadBinaryFile(FLAGS_infer_model + "/__model__", &buffer_prog);
+    ReadBinaryFile(FLAGS_infer_model + "/param", &buffer_param);
+    cfg->SetModelBuffer(&buffer_prog[0], buffer_prog.size(), &buffer_param[0],
+                        buffer_param.size());
+  } else {
+    cfg->prog_file = FLAGS_infer_model + "/__model__";
+    cfg->param_file = FLAGS_infer_model + "/param";
+  }
   cfg->use_gpu = false;
   cfg->device = 0;
   cfg->specify_input_name = true;
...
...
@@ -114,9 +122,9 @@ void SetInput(std::vector<std::vector<PaddleTensor>> *inputs) {
 }

 // Easy for profiling independently.
-TEST(Analyzer_Chinese_ner, profile) {
+void profile(bool memory_load = false) {
   contrib::AnalysisConfig cfg;
-  SetConfig(&cfg);
+  SetConfig(&cfg, memory_load);
   std::vector<PaddleTensor> outputs;

   std::vector<std::vector<PaddleTensor>> input_slots_all;
...
...
@@ -138,6 +146,12 @@ TEST(Analyzer_Chinese_ner, profile) {
   }
 }

+TEST(Analyzer_Chinese_ner, profile) { profile(); }
+
+TEST(Analyzer_Chinese_ner, profile_memory_load) {
+  profile(true /* memory_load */);
+}
+
 // Check the fuse status
 TEST(Analyzer_Chinese_ner, fuse_statis) {
   contrib::AnalysisConfig cfg;
...
...
paddle/fluid/inference/tests/api/config_printer.h
...
...
@@ -49,8 +49,6 @@ std::ostream &operator<<(std::ostream &os, const NativeConfig &config) {
   os << GenSpaces(num_spaces) << "device: " << config.device << "\n";
   os << GenSpaces(num_spaces)
      << "fraction_of_gpu_memory: " << config.fraction_of_gpu_memory << "\n";
-  os << GenSpaces(num_spaces) << "prog_file: " << config.prog_file << "\n";
-  os << GenSpaces(num_spaces) << "param_file: " << config.param_file << "\n";
   os << GenSpaces(num_spaces)
      << "specify_input_name: " << config.specify_input_name << "\n";
   os << GenSpaces(num_spaces)
...
...
@@ -65,6 +63,13 @@ std::ostream &operator<<(std::ostream &os,
   os << GenSpaces(num_spaces) << "contrib::AnalysisConfig {\n";
   num_spaces++;
   os << *reinterpret_cast<const NativeConfig *>(&config);
+  if (!config.model_from_memory()) {
+    os << GenSpaces(num_spaces) << "prog_file: " << config.prog_file << "\n";
+    os << GenSpaces(num_spaces) << "param_file: " << config.param_file << "\n";
+  } else {
+    os << GenSpaces(num_spaces) << "prog_file and param_file: load from memory\n";
+  }
   os << GenSpaces(num_spaces) << "enable_ir_optim: " << config.enable_ir_optim
      << "\n";
   os << GenSpaces(num_spaces)
...
...
paddle/fluid/inference/utils/CMakeLists.txt
cc_library(benchmark SRCS benchmark.cc DEPS enforce)
cc_test(test_benchmark SRCS benchmark_tester.cc DEPS benchmark)
cc_binary(visualizer SRCS visualizer.cc DEPS analysis
          paddle_pass_builder ir_pass_manager pass graph_viz_pass analysis_passes)
if(WIN32)
  target_link_libraries(visualizer shlwapi)
endif(WIN32)
paddle/fluid/inference/utils/visualizer.cc
0 → 100644
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/inference/utils/visualizer.h"
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <fstream>
#include <memory>
#include "paddle/fluid/framework/ir/graph_viz_pass.h"
#include "paddle/fluid/inference/analysis/analyzer.h"
#include "paddle/fluid/inference/analysis/passes/ir_analysis_pass.h"
#include "paddle/fluid/platform/init.h"
DEFINE_string(model_dir, "", "model directory");
DEFINE_string(model_program_path, "", "model program path");
DEFINE_string(model_params_path, "", "model params path");

USE_PASS(graph_viz_pass);
USE_PASS(graph_to_program_pass);

using paddle::inference::analysis::Argument;

namespace paddle {
namespace inference {
namespace utils {

void Visualizer::SetArgument(Argument *argument) { argument_ = argument; }

bool Visualizer::Run() {
  paddle::framework::InitDevices(false);
  paddle::inference::analysis::Analyzer().Run(argument_);

  return true;
}

}  // namespace utils
}  // namespace inference
}  // namespace paddle

// Generate a dot file describing the structure of graph.
// To use this tool, run command: ./visualizer [options...]
// Options:
//   --model_dir: the directory of model
//   --model_program_path: the path of program
//   --model_params_path: the path of params
int main(int argc, char *argv[]) {
  gflags::ParseCommandLineFlags(&argc, &argv, true);
  google::InitGoogleLogging(argv[0]);

  paddle::inference::analysis::Argument argument;
  argument.SetUseGPU(false);
  argument.SetUseTensorRT(false);

  if (FLAGS_model_dir.empty()) {
    if (FLAGS_model_program_path.empty() || FLAGS_model_params_path.empty()) {
      LOG(ERROR) << "Please set model_dir"
                    " or model_program_path and model_params_path";
      return -1;
    } else {
      argument.SetModelProgramPath(FLAGS_model_program_path);
      argument.SetModelParamsPath(FLAGS_model_params_path);
    }
  } else {
    argument.SetModelDir(FLAGS_model_dir);
  }

  // Only 1 pass, default filename is 0_ir_origin.dot
  // For more details, looking for paddle::inference::analysis::IRPassManager
  argument.SetIrAnalysisPasses({"graph_viz_pass"});

  std::unique_ptr<paddle::framework::Scope> scope{
      new paddle::framework::Scope()};
  argument.SetScopeNotOwned(
      const_cast<paddle::framework::Scope *>(scope.get()));

  paddle::inference::utils::Visualizer visualizer;
  visualizer.SetArgument(&argument);
  visualizer.Run();

  return 0;
}
paddle/fluid/inference/utils/visualizer.h
0 → 100644
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <string>
#include "paddle/fluid/inference/analysis/argument.h"
namespace paddle {
namespace inference {
namespace utils {

using paddle::inference::analysis::Argument;

class Visualizer final {
 public:
  Visualizer() = default;
  ~Visualizer() = default;
  Visualizer(const Visualizer &) = delete;
  Visualizer &operator=(const Visualizer &) = delete;

  void SetArgument(Argument *);
  bool Run();

 private:
  Argument *argument_;
};

}  // namespace utils
}  // namespace inference
}  // namespace paddle
paddle/fluid/operators/CMakeLists.txt
...
...
@@ -70,7 +70,7 @@ endif()
 set(COMMON_OP_DEPS ${COMMON_OP_DEPS} sequence_padding sequence_scale cos_sim_functor memory jit_kernel concat_and_split cross_entropy softmax vol2col im2col sampler)
 set(COMMON_OP_DEPS ${COMMON_OP_DEPS} sequence2batch lstm_compute matrix_bit_code gru_compute activation_functions)
 if (WITH_GPU)
-  set(COMMON_OP_DEPS ${COMMON_OP_DEPS} depthwise_conv)
+  set(COMMON_OP_DEPS ${COMMON_OP_DEPS} depthwise_conv prelu)
 endif()

 # FIXME(typhoonzero): operator deps may not needed.
...
...
paddle/fluid/operators/activation_mkldnn_op.cc
...
...
@@ -100,8 +100,9 @@ void eltwise_forward(const framework::ExecutionContext &ctx,
   const T *x_data = x->data<T>();
   T *y_data = y->mutable_data<T>(ctx.GetPlace());

-  PADDLE_ENFORCE(x->dims().size() == 2 || x->dims().size() == 4,
-                 "Input dim must be with 2 or 4");
+  PADDLE_ENFORCE(
+      x->dims().size() == 2 || x->dims().size() == 3 || x->dims().size() == 4,
+      "Input dim must be with 2, 3 or 4");

   std::vector<int> src_tz = framework::vectorize2int(x->dims());
...
...
...
paddle/fluid/operators/activation_op.cc
浏览文件 @
2dda19f7
...
...
@@ -76,8 +76,8 @@ framework::OpKernelType GetKernelType(const framework::ExecutionContext& ctx,
}
#endif
return
framework
::
OpKernelType
(
framework
::
ToDataType
(
ctx
.
Input
<
framework
::
Tensor
>
(
name
)
->
type
())
,
ctx
.
GetPlace
(),
layout
,
library
);
framework
::
GetDataTypeOfVar
(
ctx
.
InputVar
(
name
)),
ctx
.
GetPlace
(),
layout
,
library
);
}
class
ActivationOp
:
public
framework
::
OperatorWithKernel
{
...
...
paddle/fluid/operators/activation_op.h
...
...
@@ -41,6 +41,12 @@ static std::unordered_set<std::string> InplaceOpSet = {
     "floor", "reciprocal", "relu6", "soft_relu", "hard_sigmoid",
 };

+/* The following operator can be used to process SelectedRows, because the
+ * output of those operator for zero is zero too.
+ */
+static std::unordered_set<std::string> CanBeUsedBySelectedRows = {
+    "abs", "abs_grad", "square", "square_grad", "sqrt", "sqrt_grad"};
+
 static bool IsInplace(std::string op) { return InplaceOpSet.count(op); }

 template <typename DeviceContext, typename Functor>
...
...
@@ -50,16 +56,38 @@ class ActivationKernel
   using T = typename Functor::ELEMENT_TYPE;

   void Compute(const framework::ExecutionContext& context) const override {
-    auto& X = detail::Ref(context.Input<framework::Tensor>("X"),
-                          "Cannot get input tensor X, variable name = %s",
-                          context.op().Input("X"));
-
-    auto& Out = detail::Ref(context.Output<framework::Tensor>("Out"),
-                            "Cannot get output tensor Out, variable name = %s",
-                            context.op().Output("Out"));
-    Out.mutable_data<T>(context.GetPlace());
+    auto x_var = context.InputVar("X");
+    auto out_var = context.OutputVar("Out");
+    PADDLE_ENFORCE(x_var != nullptr,
+                   "Cannot get input Variable X, variable name = %s",
+                   context.op().Input("X"));
+    PADDLE_ENFORCE(out_var != nullptr,
+                   "Cannot get output Variable Out, variable name = %s",
+                   context.op().Output("Out"));
+
+    framework::Tensor X, *Out;
+
+    if (CanBeUsedBySelectedRows.count(context.op().Type())) {
+      X = detail::Ref(
+          paddle::framework::GetLoDTensorOrSelectedRowsValueFromVar(*x_var),
+          "Cannot get input Tensor X, variable name = %s",
+          context.op().Input("X"));
+      Out = paddle::framework::GetMutableLoDTensorOrSelectedRowsValueFromVar(
+          out_var);
+    } else {
+      X = detail::Ref(context.Input<framework::Tensor>("X"),
+                      "Cannot get input Tensor X, variable name = %s",
+                      context.op().Input("X"));
+      Out = context.Output<framework::Tensor>("Out");
+    }
+
+    PADDLE_ENFORCE(Out != nullptr,
+                   "Cannot get output tensor Out, variable name = %s",
+                   context.op().Output("Out"));
+
+    Out->mutable_data<T>(context.GetPlace());
     auto x = framework::EigenVector<T>::Flatten(X);
-    auto out = framework::EigenVector<T>::Flatten(Out);
+    auto out = framework::EigenVector<T>::Flatten(*Out);
     auto* place =
         context.template device_context<DeviceContext>().eigen_device();
     Functor functor;
...
...
@@ -78,14 +106,54 @@ class ActivationGradKernel
  public:
   using T = typename Functor::ELEMENT_TYPE;

   void Compute(const framework::ExecutionContext& context) const override {
-    auto* Out = context.Input<framework::Tensor>("Out");
-    auto* dOut =
-        context.Input<framework::Tensor>(framework::GradVarName("Out"));
-    auto* dX = context.Output<framework::Tensor>(framework::GradVarName("X"));
+    auto out_var = context.InputVar("Out");
+    auto out_grad_var = context.InputVar(framework::GradVarName("Out"));
+    auto x_grad_var = context.OutputVar(framework::GradVarName("X"));
+    PADDLE_ENFORCE(out_var != nullptr,
+                   "Cannot get input Variable Out, variable name = %s",
+                   context.op().Input("Out"));
+    PADDLE_ENFORCE(out_grad_var != nullptr,
+                   "Cannot get input Variable %s, variable name = %s",
+                   framework::GradVarName("Out"),
+                   context.op().Input(framework::GradVarName("Out")));
+    PADDLE_ENFORCE(x_grad_var != nullptr,
+                   "Cannot get output Variable %s, variable name = %s",
+                   framework::GradVarName("X"),
+                   context.op().Output(framework::GradVarName("X")));
+
+    framework::Tensor Out, dOut, *dX;
+    if (CanBeUsedBySelectedRows.count(context.op().Type())) {
+      Out = detail::Ref(
+          paddle::framework::GetLoDTensorOrSelectedRowsValueFromVar(*out_var),
+          "Cannot get input Tensor Out, variable name = %s",
+          context.op().Input("Out"));
+      dOut = detail::Ref(
+          paddle::framework::GetLoDTensorOrSelectedRowsValueFromVar(
+              *out_grad_var),
+          "Cannot get input Tensor %s, variable name = %s",
+          framework::GradVarName("Out"),
+          context.op().Input(framework::GradVarName("Out")));
+      dX = paddle::framework::GetMutableLoDTensorOrSelectedRowsValueFromVar(
+          x_grad_var);
+    } else {
+      Out = detail::Ref(context.Input<framework::Tensor>("Out"),
+                        "Cannot get input Tensor Out, variable name = %s",
+                        context.op().Input("Out"));
+      dOut = detail::Ref(
+          context.Input<framework::Tensor>(framework::GradVarName("Out")),
+          "Cannot get input Tensor %s, variable name = %s",
+          framework::GradVarName("Out"),
+          context.op().Input(framework::GradVarName("Out")));
+      dX = context.Output<framework::Tensor>(framework::GradVarName("X"));
+    }
+    PADDLE_ENFORCE(dX != nullptr,
+                   "Cannot get output tensor %s, variable name = %s",
+                   framework::GradVarName("X"),
+                   context.op().Output(framework::GradVarName("X")));
     dX->mutable_data<T>(context.GetPlace());
-    auto dout = framework::EigenVector<T>::Flatten(*dOut);
-    auto out = framework::EigenVector<T>::Flatten(*Out);
+    auto dout = framework::EigenVector<T>::Flatten(dOut);
+    auto out = framework::EigenVector<T>::Flatten(Out);
     auto dx = framework::EigenVector<T>::Flatten(*dX);
     auto* place =
         context.template device_context<DeviceContext>().eigen_device();
...
...
@@ -96,8 +164,19 @@ class ActivationGradKernel
     }
     bool inplace = functor.Inplace();
     if (!inplace) {
-      auto* X = context.Input<framework::Tensor>("X");
-      auto x = framework::EigenVector<T>::Flatten(*X);
+      auto x_var = context.InputVar("X");
+      PADDLE_ENFORCE(x_var != nullptr,
+                     "Cannot get input tensor X, variable name = %s",
+                     context.op().Input("X"));
+      framework::Tensor X;
+      if (CanBeUsedBySelectedRows.count(context.op().Type())) {
+        X = detail::Ref(
+            paddle::framework::GetLoDTensorOrSelectedRowsValueFromVar(*x_var));
+      } else {
+        X = detail::Ref(context.Input<framework::Tensor>("X"));
+      }
+
+      auto x = framework::EigenVector<T>::Flatten(X);
       functor(*place, x, out, dout, dx);
     } else {
       VLOG(10) << " Inplace activation ";
...
...
paddle/fluid/operators/attention_lstm_op.cc
...
...
@@ -231,10 +231,10 @@ use lstm_x_t as input and compute as standard LSTM.
 template <typename T>
 inline void bias_relu(const int n, const T* x, const T* bias, T* y) {
   if (bias) {
-    math::vec_add_bias<T, platform::jit::avx>(n, *bias, x, y);
-    math::vec_relu<T, platform::jit::avx>(n, y, y);
+    math::vec_add_bias<T, platform::avx>(n, *bias, x, y);
+    math::vec_relu<T, platform::avx>(n, y, y);
   } else {
-    math::vec_relu<T, platform::jit::avx>(n, x, y);
+    math::vec_relu<T, platform::avx>(n, x, y);
   }
 }
...
...
@@ -245,8 +245,8 @@ inline void vec_softmax(const int n, const T* x, T* y) {
   for (int i = 1; i < n; ++i) {
     scalar = scalar < x[i] ? x[i] : scalar;
   }
-  math::vec_add_bias<T, platform::jit::avx>(n, -scalar, x, y);  // sub
-  math::vec_exp<T>(n, y, y);                                    // exp
+  math::vec_add_bias<T, platform::avx>(n, -scalar, x, y);  // sub
+  math::vec_exp<T>(n, y, y);                               // exp
   // sum
   scalar = T(0);
   for (int i = 0; i < n; ++i) {
...
...
@@ -302,13 +302,13 @@ class AttentionLSTMKernel : public framework::OpKernel<T> {
     auto& act_gate_str = ctx.Attr<std::string>("gate_activation");
     auto& act_cell_str = ctx.Attr<std::string>("cell_activation");
     auto& act_cand_str = ctx.Attr<std::string>("candidate_activation");
-    if (platform::jit::MayIUse(platform::jit::avx)) {
-      math::VecActivations<T, platform::jit::avx> act_functor;
+    if (platform::MayIUse(platform::avx)) {
+      math::VecActivations<T, platform::avx> act_functor;
       act_gate = act_functor(act_gate_str);
       act_cell = act_functor(act_cell_str);
       act_cand = act_functor(act_cand_str);
     } else {
-      math::VecActivations<T, platform::jit::isa_any> act_functor;
+      math::VecActivations<T, platform::isa_any> act_functor;
       act_gate = act_functor(act_gate_str);
       act_cell = act_functor(act_cell_str);
       act_cand = act_functor(act_cand_str);
...
...
paddle/fluid/operators/conv_fusion_op.cu.cc
...
...
@@ -110,11 +110,7 @@ class CUDNNConvFusionOpKernel : public framework::OpKernel<T> {
   auto x_dims = framework::vectorize(input->dims());
   auto f_dims = framework::vectorize(filter->dims());

-  if (activation == "identity") {
-    // Only the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM algo is
-    // enabled with CUDNN_ACTIVATION_IDENTITY in cuDNN lib.
-    algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
-  } else if (!exhaustive_search) {
+  if (!exhaustive_search) {
     CUDNN_ENFORCE(platform::dynload::cudnnGetConvolutionForwardAlgorithm(
         handle, cudnn_input_desc, cudnn_filter_desc, cudnn_conv_desc,
         cudnn_output_desc, CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
...
...
@@ -165,18 +161,42 @@ class CUDNNConvFusionOpKernel : public framework::OpKernel<T> {
   PADDLE_ENFORCE_LE(workspace_size_in_bytes, workspace_size_limit,
                     "workspace_size to be allocated exceeds the limit");

-  // ------------------- cudnn conv+bias+act forward --------------------
-  ScalingParamType<T> alpha1 = 1.0f;
-  ScalingParamType<T> alpha2 = residual ? 1.0f : 0.0f;
-  auto cudnn_func = [&](void* cudnn_workspace) {
-    CUDNN_ENFORCE(platform::dynload::cudnnConvolutionBiasActivationForward(
-        handle, &alpha1, cudnn_input_desc, input_data, cudnn_filter_desc,
-        filter_data, cudnn_conv_desc, algo, cudnn_workspace,
-        workspace_size_in_bytes, &alpha2, cudnn_output_desc, residual_data,
-        cudnn_bias_desc, bias_data, cudnn_act_desc, cudnn_output_desc,
-        output_data));
-  };
-  workspace_handle.RunFunc(cudnn_func, workspace_size_in_bytes);
+  if ((activation == "identity") &&
+      (algo != CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM) &&
+      (!residual)) {
+    // Only the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM algo is
+    // enabled with CUDNN_ACTIVATION_IDENTITY in cuDNN lib.
+    // But test in some case, the speed is slower, change to use
+    // cudnnConvolutionForward and cudnnAddTensor
+    // ------------- cudnn conv forward and bias add ---------------------
+    ScalingParamType<T> alpha = 1.0f, beta = 0.0f;
+    auto cudnn_func = [&](void* cudnn_workspace) {
+      CUDNN_ENFORCE(platform::dynload::cudnnConvolutionForward(
+          handle, &alpha, cudnn_input_desc, input_data, cudnn_filter_desc,
+          filter_data, cudnn_conv_desc, algo, cudnn_workspace,
+          workspace_size_in_bytes, &beta, cudnn_output_desc, output_data));
+    };
+    workspace_handle.RunFunc(cudnn_func, workspace_size_in_bytes);
+    CUDNN_ENFORCE(platform::dynload::cudnnAddTensor(
+        handle, &alpha, cudnn_bias_desc, bias_data, &alpha, cudnn_output_desc,
+        output_data));
+  } else {
+    if (activation == "identity") {
+      algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
+    }
+    // ------------------- cudnn conv+bias+act forward --------------------
+    ScalingParamType<T> alpha1 = 1.0f;
+    ScalingParamType<T> alpha2 = residual ? 1.0f : 0.0f;
+    auto cudnn_func = [&](void* cudnn_workspace) {
+      CUDNN_ENFORCE(platform::dynload::cudnnConvolutionBiasActivationForward(
+          handle, &alpha1, cudnn_input_desc, input_data, cudnn_filter_desc,
+          filter_data, cudnn_conv_desc, algo, cudnn_workspace,
+          workspace_size_in_bytes, &alpha2, cudnn_output_desc, residual_data,
+          cudnn_bias_desc, bias_data, cudnn_act_desc, cudnn_output_desc,
+          output_data));
+    };
+    workspace_handle.RunFunc(cudnn_func, workspace_size_in_bytes);
+  }
 }
};
#endif
...
...
paddle/fluid/operators/conv_mkldnn_op.cc
...
...
@@ -28,6 +28,46 @@ using mkldnn::stream;
 using platform::to_void_cast;
 using platform::GetMKLDNNFormat;

+inline void GetWeightsTz(std::vector<int>& weights_tz, int groups,  // NOLINT
+                         bool is_conv3d) {
+  if (groups > 1) {
+    if (is_conv3d) {
+      int output = weights_tz[0];
+      int input = weights_tz[1];
+      int dimension = weights_tz[2];
+      int height = weights_tz[3];
+      int width = weights_tz[4];
+      weights_tz.resize(6);
+      weights_tz[0] = groups;
+      weights_tz[1] = output / groups;
+      weights_tz[2] = input;
+      weights_tz[3] = dimension;
+      weights_tz[4] = height;
+      weights_tz[5] = width;
+    } else {
+      int output = weights_tz[0];
+      int input = weights_tz[1];
+      int height = weights_tz[2];
+      int width = weights_tz[3];
+      weights_tz.resize(5);
+      weights_tz[0] = groups;
+      weights_tz[1] = output / groups;
+      weights_tz[2] = input;
+      weights_tz[3] = height;
+      weights_tz[4] = width;
+    }
+  }
+}
+
+inline mkldnn::memory::format GetWeightsFormat(mkldnn::memory::format format,
+                                               int groups, bool is_conv3d) {
+  if (is_conv3d) {
+    return (groups == 1) ? format : mkldnn::memory::format::goidhw;
+  } else {
+    return (groups == 1) ? format : mkldnn::memory::format::goihw;
+  }
+}
+
 template <typename T>
 class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
  public:
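A worked example of the two helpers above (values are illustrative):

// groups = 2 on a 2-D conv filter: OIHW {8, 4, 3, 3} -> GOIHW {2, 4, 4, 3, 3},
// and GetWeightsFormat(fmt, 2, false) yields mkldnn::memory::format::goihw.
// For a 3-D filter: OIDHW {8, 4, 3, 3, 3} -> GOIDHW {2, 4, 4, 3, 3, 3},
// with format goidhw.
std::vector<int> weights_tz = {8, 4, 3, 3};
GetWeightsTz(weights_tz, 2 /* groups */, false /* is_conv3d */);
// weights_tz == {2, 4, 4, 3, 3}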
...
...
@@ -52,10 +92,10 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     PADDLE_ENFORCE(filter->layout() == DataLayout::kMKLDNN &&
                        filter->format() != memory::format::format_undef,
                    "Wrong layout/format set for Filter tensor");
-    PADDLE_ENFORCE(input->dims().size() == 4,
-                   "Input must be with 4 dimensions, i.e. NCHW");
-    PADDLE_ENFORCE(filter->dims().size() == 4,
-                   "Filter must be with 4 dimensions, i.e. OIHW");
+    PADDLE_ENFORCE(input->dims().size() == 4 || input->dims().size() == 5,
+                   "Input must be with 4 or 5 dimensions, i.e. NCHW or NCDHW");
+    PADDLE_ENFORCE(filter->dims().size() == 4 || filter->dims().size() == 5,
+                   "Filter must be with 4 or 5 dimensions, i.e. OIHW or OIDHW");
     if (bias) {
       PADDLE_ENFORCE(bias->layout() == DataLayout::kMKLDNN &&
                          bias->format() != memory::format::format_undef,
...
...
@@ -71,9 +111,13 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     bool fuse_residual_conn = ctx.Attr<bool>("fuse_residual_connection");
     int groups = ctx.Attr<int>("groups");

+    bool is_conv3d = strides.size() == 3U;
     // TODO(tpatejko): add support for dilation
     PADDLE_ENFORCE(
-        dilations.size() == 2 && dilations[0] == 1 && dilations[1] == 1,
+        is_conv3d
+            ? dilations.size() == 3 && dilations[0] == 1 &&
+                  dilations[1] == 1 && dilations[2] == 1
+            : dilations.size() == 2 && dilations[0] == 1 && dilations[1] == 1,
         "dilation in convolution is not implemented yet");

     const T* input_data = input->data<T>();
...
...
@@ -83,18 +127,7 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     std::vector<int> weights_tz =
         paddle::framework::vectorize2int(filter->dims());
     int g = std::max(groups, 1);
-    if (g > 1) {
-      int o = weights_tz[0];
-      int i = weights_tz[1];
-      int h = weights_tz[2];
-      int w = weights_tz[3];
-      weights_tz.resize(5);
-      weights_tz[0] = g;
-      weights_tz[1] = o / g;
-      weights_tz[2] = i;
-      weights_tz[3] = h;
-      weights_tz[4] = w;
-    }
+    GetWeightsTz(weights_tz, g, is_conv3d);
     std::vector<int> dst_tz = paddle::framework::vectorize2int(output->dims());

     // Get unique name for storing MKLDNN primitives
...
...
@@ -105,11 +138,14 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     std::vector<primitive> pipeline;

+    auto src_format = input->format();
+    mkldnn::memory::format weights_format =
+        GetWeightsFormat(filter->format(), g, is_conv3d);
+
     auto user_src_md = platform::MKLDNNMemDesc(
-        {src_tz}, platform::MKLDNNGetDataType<T>(), input->format());
+        {src_tz}, platform::MKLDNNGetDataType<T>(), src_format);
     auto user_weights_md = platform::MKLDNNMemDesc(
-        {weights_tz}, platform::MKLDNNGetDataType<T>(),
-        (g == 1) ? filter->format() : mkldnn::memory::format::goihw);
+        {weights_tz}, platform::MKLDNNGetDataType<T>(), weights_format);

     /* create memory descriptor for convolution without specified format
      * ('any') which lets a primitive (convolution in this case) choose
...
...
@@ -119,10 +155,16 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     auto chosen_memory_format =
         platform::data_format_to_memory_format(data_format);

+    if (is_conv3d) {
+      chosen_memory_format =
+          platform::MKLDNNFormatForSize(src_tz.size(), chosen_memory_format);
+    }
+    weights_format = GetWeightsFormat(chosen_memory_format, g, is_conv3d);
+
     auto src_md = platform::MKLDNNMemDesc(
         src_tz, platform::MKLDNNGetDataType<T>(), chosen_memory_format);
     auto weights_md = platform::MKLDNNMemDesc(
-        weights_tz, platform::MKLDNNGetDataType<T>(), chosen_memory_format);
+        weights_tz, platform::MKLDNNGetDataType<T>(), weights_format);
     std::vector<int> bias_tz;  // TODO(mgallus): avoid empty vector creation.
                                // Currently used whenever bias is != nullptr.
     auto dst_md = platform::MKLDNNMemDesc(
...
...
@@ -263,8 +305,8 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
       const mkldnn::engine& engine, const bool fuse_relu,
       const bool fuse_residual_conn,
       mkldnn::prop_kind fwd_prop_kind) const {
-    memory::dims stride_dims = {strides[0], strides[1]};
-    memory::dims padding_dims = {paddings[0], paddings[1]};
+    memory::dims stride_dims = strides;
+    memory::dims padding_dims = paddings;

     auto conv_desc = mkldnn::convolution_forward::desc(
         fwd_prop_kind, mkldnn::convolution_direct, src, weights, dst,
...
...
@@ -288,8 +330,8 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
       const mkldnn::engine& engine, const bool fuse_relu,
       const bool fuse_residual_conn,
       mkldnn::prop_kind fwd_prop_kind) const {
-    memory::dims stride_dims = {strides[0], strides[1]};
-    memory::dims padding_dims = {paddings[0], paddings[1]};
+    memory::dims stride_dims = strides;
+    memory::dims padding_dims = paddings;

     auto conv_desc = mkldnn::convolution_forward::desc(
         fwd_prop_kind, mkldnn::convolution_direct, src, weights, bias, dst,
...
...
@@ -349,6 +391,7 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
     std::vector<int> dilations = ctx.Attr<std::vector<int>>("dilations");
     int groups = ctx.Attr<int>("groups");

+    bool is_conv3d = strides.size() == 3U;
     const T* input_data = input->data<T>();
     const T* filter_data = filter->data<T>();
     const T* output_grad_data = output_grad->data<T>();
...
...
@@ -358,8 +401,14 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
     std::vector<int> src_tz = paddle::framework::vectorize2int(input->dims());
     std::vector<int> weights_tz =
         paddle::framework::vectorize2int(filter->dims());
+
+    int g = std::max(groups, 1);
+    GetWeightsTz(weights_tz, g, is_conv3d);
     std::vector<int> dst_tz = paddle::framework::vectorize2int(output->dims());

+    auto src_format = input->format();
+    mkldnn::memory::format weights_format =
+        GetWeightsFormat(filter->format(), g, is_conv3d);
+
     // Get an unique name from "argument" name of "Output" variable
     // as well as attributes of primitive to be created
     // This name will be used as key when saving info into device context
...
...
@@ -372,9 +421,9 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
     // Create user memory descriptors
     auto user_src_md = platform::MKLDNNMemDesc(
-        {src_tz}, platform::MKLDNNGetDataType<T>(), input->format());
+        {src_tz}, platform::MKLDNNGetDataType<T>(), src_format);
     auto user_weights_md = platform::MKLDNNMemDesc(
-        {weights_tz}, platform::MKLDNNGetDataType<T>(), filter->format());
+        {weights_tz}, platform::MKLDNNGetDataType<T>(), weights_format);
     auto user_diff_dst_md = platform::MKLDNNMemDesc(
         {dst_tz}, platform::MKLDNNGetDataType<T>(), output_grad->format());
...
...
@@ -386,14 +435,20 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
     auto chosen_memory_format =
         platform::data_format_to_memory_format(data_format);

+    if (is_conv3d) {
+      chosen_memory_format =
+          platform::MKLDNNFormatForSize(src_tz.size(), chosen_memory_format);
+    }
+    weights_format = GetWeightsFormat(chosen_memory_format, g, is_conv3d);
+
     auto src_md = platform::MKLDNNMemDesc(
         src_tz, platform::MKLDNNGetDataType<T>(), chosen_memory_format);
     auto diff_src_md = platform::MKLDNNMemDesc(
         src_tz, platform::MKLDNNGetDataType<T>(), chosen_memory_format);
     auto weights_md = platform::MKLDNNMemDesc(
-        weights_tz, platform::MKLDNNGetDataType<T>(), chosen_memory_format);
+        weights_tz, platform::MKLDNNGetDataType<T>(), weights_format);
     auto diff_weights_md = platform::MKLDNNMemDesc(
-        weights_tz, platform::MKLDNNGetDataType<T>(), chosen_memory_format);
+        weights_tz, platform::MKLDNNGetDataType<T>(), weights_format);
     auto diff_dst_md = platform::MKLDNNMemDesc(
         dst_tz, platform::MKLDNNGetDataType<T>(), chosen_memory_format);
...
...
@@ -491,8 +546,22 @@ class ConvMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
 namespace ops = paddle::operators;

-REGISTER_OP_KERNEL(conv2d, MKLDNN, ::paddle::platform::CPUPlace,
-                   ops::ConvMKLDNNOpKernel<float>);
-
-REGISTER_OP_KERNEL(conv2d_grad, MKLDNN, ::paddle::platform::CPUPlace,
-                   ops::ConvMKLDNNGradOpKernel<float>);
+REGISTER_OP_KERNEL_WITH_CUSTOM_TYPE(conv2d, MKLDNN,
+                                    ::paddle::platform::CPUPlace, FP32,
+                                    ops::kConvMKLDNNFP32,
+                                    ops::ConvMKLDNNOpKernel<float>);
+
+REGISTER_OP_KERNEL_WITH_CUSTOM_TYPE(conv2d_grad, MKLDNN,
+                                    ::paddle::platform::CPUPlace, FP32,
+                                    ops::kConvMKLDNNFP32,
+                                    ops::ConvMKLDNNGradOpKernel<float>);
+
+REGISTER_OP_KERNEL_WITH_CUSTOM_TYPE(conv3d, MKLDNN,
+                                    ::paddle::platform::CPUPlace, FP32,
+                                    ops::kConvMKLDNNFP32,
+                                    ops::ConvMKLDNNOpKernel<float>);
+
+REGISTER_OP_KERNEL_WITH_CUSTOM_TYPE(conv3d_grad, MKLDNN,
+                                    ::paddle::platform::CPUPlace, FP32,
+                                    ops::kConvMKLDNNFP32,
+                                    ops::ConvMKLDNNGradOpKernel<float>);
paddle/fluid/operators/conv_op.cc
...
@@ -74,6 +74,8 @@ void ConvOp::InferShape(framework::InferShapeContext* ctx) const {
 framework::OpKernelType ConvOp::GetExpectedKernelType(
     const framework::ExecutionContext& ctx) const {
+  int customized_type_value =
+      framework::OpKernelType::kDefaultCustomizedTypeValue;
   framework::LibraryType library{framework::LibraryType::kPlain};
   // TODO(pzelazko-intel): enable MKLDNN layout when it's ready
   std::string data_format = ctx.Attr<std::string>("data_format");
...
@@ -89,6 +91,7 @@ framework::OpKernelType ConvOp::GetExpectedKernelType(
       platform::CanMKLDNNBeUsed(ctx)) {
     library = framework::LibraryType::kMKLDNN;
     layout = framework::DataLayout::kMKLDNN;
+    customized_type_value = kConvMKLDNNFP32;
   }
 #endif
...
@@ -105,7 +108,7 @@ framework::OpKernelType ConvOp::GetExpectedKernelType(
   }
 
   return framework::OpKernelType(input_data_type, ctx.GetPlace(), layout,
-                                 library);
+                                 library, customized_type_value);
 }
 
 void Conv2DOpMaker::Make() {
...
@@ -131,14 +134,14 @@ void Conv2DOpMaker::Make() {
            "The format of output tensor is X (one-dimensional) of size equal"
            "to the number of output channels. Only used with MKL-DNN.")
       .AsDispensable();
-  AddOutput("Output",
-            "(Tensor) The output tensor of convolution operator. "
-            "The format of output tensor is also NCHW.");
   AddInput("ResidualData",
            "(Tensor) Tensor with residual data "
           "to which convolution output will be added."
           "Used with fuse_residual_connection fusion.")
       .AsDispensable();
+  AddOutput("Output",
+            "(Tensor) The output tensor of convolution operator. "
+            "The format of output tensor is also NCHW.");
   AddAttr<std::vector<int>>("strides",
                             "(vector<int> default:{1, 1}), the "
                             "strides(h_stride, w_stride) of "
...
@@ -229,6 +232,10 @@ $$
 }
 
 void Conv3DOpMaker::Make() {
+  AddAttr<bool>("is_test",
+                "(bool, default false) Set to true for inference only, false "
+                "for training. Some layers may run faster when this is true.")
+      .SetDefault(false);
   AddInput(
       "Input",
       "(Tensor) The input tensor of convolution operator. "
...
@@ -244,6 +251,11 @@ void Conv3DOpMaker::Make() {
            "is the width of the filter."
            "If the groups attribute is greater than 1, C equals the number of "
            "input image channels divided by the groups.");
+  AddInput("ResidualData",
+           "(Tensor) Tensor with residual data "
+           "to which convolution output will be added."
+           "Used with fuse_residual_connection fusion.")
+      .AsDispensable();
   AddOutput("Output",
             "(Tensor) The output tensor of convolution operator."
             "The format of output tensor is also NCDHW.");
...
@@ -277,6 +289,13 @@ void Conv3DOpMaker::Make() {
   AddAttr<bool>("use_mkldnn",
                 "(bool, default false) Only used in mkldnn kernel")
       .SetDefault(false);
+  AddAttr<bool>("fuse_relu", "(bool, default false) Only used in mkldnn kernel")
+      .SetDefault(false);
+  AddAttr<bool>("fuse_residual_connection",
+                "(bool, default false) Only used in mkldnn kernel. Used "
+                "whenever convolution output is as an input to residual "
+                "connection.")
+      .SetDefault(false);
   AddAttr<std::string>(
       "data_format",
       "(string, default NCHW) Only used in "
...
@@ -342,6 +361,8 @@ void ConvOpGrad::InferShape(framework::InferShapeContext* ctx) const {
 framework::OpKernelType ConvOpGrad::GetExpectedKernelType(
     const framework::ExecutionContext& ctx) const {
+  int customized_type_value =
+      framework::OpKernelType::kDefaultCustomizedTypeValue;
   framework::LibraryType library_{framework::LibraryType::kPlain};
   // TODO(pzelazko-intel): enable MKLDNN layout when it's ready
   std::string data_format = ctx.Attr<std::string>("data_format");
...
@@ -357,12 +378,13 @@ framework::OpKernelType ConvOpGrad::GetExpectedKernelType(
       platform::CanMKLDNNBeUsed(ctx)) {
     library_ = framework::LibraryType::kMKLDNN;
     layout_ = framework::DataLayout::kMKLDNN;
+    customized_type_value = kConvMKLDNNFP32;
   }
 #endif
 
   return framework::OpKernelType(
       framework::ToDataType(ctx.Input<Tensor>("Input")->type()), ctx.GetPlace(),
-      layout_, library_);
+      layout_, library_, customized_type_value);
 }
 
 }  // namespace operators
...
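Taken together, the conv_op.cc and conv_mkldnn_op.cc changes let one operator own several kernels per library by folding an extra integer (customized_type_value) into the kernel key. A minimal sketch of that dispatch idea follows; the KernelKey struct, the map, and the kernel functions are illustrative stand-ins for this page, not Paddle's actual OpKernelType machinery:

#include <cassert>
#include <map>
#include <string>
#include <tuple>

constexpr int kDefaultCustomizedTypeValue = 0;
constexpr int kConvMKLDNNFP32 = 1;
constexpr int kConvMKLDNNINT8 = 2;

// Hypothetical, simplified kernel key; Paddle's real OpKernelType also
// carries data type, place, and layout.
struct KernelKey {
  std::string library;
  int customized_type_value;
  bool operator<(const KernelKey& o) const {
    return std::tie(library, customized_type_value) <
           std::tie(o.library, o.customized_type_value);
  }
};

using KernelFn = const char* (*)();
const char* PlainConv() { return "plain conv"; }
const char* MkldnnFp32Conv() { return "mkldnn fp32 conv"; }

int main() {
  // Registration: the WITH_CUSTOM_TYPE macro in the diff effectively inserts
  // one entry per (library, custom type value) pair into a map like this.
  std::map<KernelKey, KernelFn> kernels{
      {{"PLAIN", kDefaultCustomizedTypeValue}, PlainConv},
      {{"MKLDNN", kConvMKLDNNFP32}, MkldnnFp32Conv},
  };

  // Dispatch: GetExpectedKernelType() chooses the key (here MKLDNN + FP32
  // tag), and the framework looks the kernel up under that exact key.
  KernelKey expected{"MKLDNN", kConvMKLDNNFP32};
  assert(kernels.at(expected)() == std::string("mkldnn fp32 conv"));
  return 0;
}

This is why GetExpectedKernelType now sets customized_type_value = kConvMKLDNNFP32 whenever the MKL-DNN path is taken: without the tag, an INT8 MKL-DNN kernel registered later would collide with the FP32 one under the same key.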
paddle/fluid/operators/conv_op.h
...
@@ -27,6 +27,8 @@ namespace paddle {
 namespace operators {
 
 using Tensor = framework::Tensor;
+constexpr int kConvMKLDNNFP32 = 1;
+constexpr int kConvMKLDNNINT8 = 2;
 
 // Base convolution operator definations for other conv
 // like operators to reuse the implementation.
...
paddle/fluid/operators/cudnn_lstm_op.cu.cc
...
@@ -177,11 +177,19 @@ struct CudnnRNNCache {
         seed_));
 
     CUDNN_ENFORCE(platform::dynload::cudnnCreateRNNDescriptor(&rnn_desc_));
+
+#if CUDNN_VERSION >= 6000
+    CUDNN_ENFORCE(platform::dynload::cudnnSetRNNDescriptor_v6(
+        handle, rnn_desc_, hidden_size_, num_layers_, dropout_desc_,
+        CUDNN_LINEAR_INPUT,
+        is_bidirec_ ? CUDNN_BIDIRECTIONAL : CUDNN_UNIDIRECTIONAL, CUDNN_LSTM,
+        CUDNN_RNN_ALGO_STANDARD, CUDNN_DATA_FLOAT));
+#else
     CUDNN_ENFORCE(platform::dynload::cudnnSetRNNDescriptor(
         rnn_desc_, hidden_size_, num_layers_, dropout_desc_, CUDNN_LINEAR_INPUT,
         is_bidirec_ ? CUDNN_BIDIRECTIONAL : CUDNN_UNIDIRECTIONAL, CUDNN_LSTM,
         CUDNN_DATA_FLOAT));
+#endif
 
     CUDNN_ENFORCE(platform::dynload::cudnnCreateFilterDescriptor(&w_desc_));
     CUDNN_ENFORCE(platform::dynload::cudnnCreateFilterDescriptor(&dw_desc_));
...
paddle/fluid/operators/distributed/CMakeLists.txt
...
@@ -13,16 +13,26 @@ set(DISTRIBUTE_COMPILE_FLAGS "-Wno-non-virtual-dtor -Wno-error=non-virtual-dtor
 if(WITH_GRPC)
   grpc_library(sendrecvop_grpc SRCS grpc_bytebuffer_stream.cc sendrecvop_utils.cc grpc_client.cc
-      request_handler_impl.cc rpc_client.cc rpc_server.cc grpc_server.cc variable_response.cc grpc_variable_response.cc grpc_serde.cc
+      request_handler_impl.cc rpc_client.cc rpc_server.cc grpc_server.cc variable_response.cc grpc_variable_response.cc grpc_serde.cc
+      collective_client.cc collective_server.cc
       PROTO send_recv.proto
-      DEPS lod_tensor selected_rows memory)
+      DEPS lod_tensor selected_rows_functor memory)
   set_source_files_properties(grpc_serde_test.cc rpc_server_test.cc PROPERTIES COMPILE_FLAGS ${DISTRIBUTE_COMPILE_FLAGS})
   cc_test(grpc_serde_test SRCS grpc_serde_test.cc
     DEPS grpc++_unsecure grpc_unsecure gpr cares zlib protobuf sendrecvop_grpc scope profiler math_function SERIAL)
   cc_test(rpc_server_test SRCS rpc_server_test.cc
     DEPS sendrecvop_grpc grpc++_unsecure grpc_unsecure gpr cares zlib protobuf executor proto_desc lookup_sparse_table_op SERIAL)
   cc_test(varhandle_test SRCS varhandle_test.cc DEPS profiler)
+  if(WITH_GPU)
+    cc_test(collective_server_test SRCS collective_server_test.cc
+      DEPS sendrecvop_grpc grpc++_unsecure grpc_unsecure gpr cares zlib protobuf executor
+      selected_rows_functor scope math_function SERIAL)
+  endif()
   cc_library(parameter_prefetch SRCS parameter_prefetch.cc DEPS sendrecvop_grpc memory)
 else()
   set_source_files_properties(brpc_server.cc brpc_client.cc rpc_server_test.cc brpc_serde_test.cc
...
paddle/fluid/operators/distributed/collective_client.cc  (new file, mode 100644)

// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <condition_variable>  // NOLINT
#include <string>

#include "gflags/gflags.h"
#include "paddle/fluid/operators/distributed/collective_client.h"

DECLARE_int32(rpc_deadline);

namespace paddle {
namespace operators {
namespace distributed {

std::once_flag CollectiveClient::init_flag_;
std::unique_ptr<CollectiveClient> CollectiveClient::client_(nullptr);

bool CollectiveClient::Gather(const std::vector<RemoteVar>& remote_vars,
                              std::vector<const framework::SelectedRows*>* dst,
                              const platform::DeviceContext& ctx,
                              framework::Scope* scope, int64_t time_out) {
  for (auto r : remote_vars) {
    VLOG(50) << "begin gather from ep:" << r.String();
    scope->Var(r.var_name_)->GetMutable<framework::SelectedRows>();
    VarHandlePtr ptr = rpc_client_->AsyncGetMonomerVariable(
        r.ep_, ctx, *scope, r.var_name_, time_out);
  }

  rpc_client_->Wait();

  for (auto r : remote_vars) {
    auto select_rows =
        scope->FindVar(r.var_name_)->GetMutable<framework::SelectedRows>();
    dst->push_back(select_rows);

    VLOG(4) << "gather from ep:" << r.String()
            << ", select_rows:" << GetSelectedRowsInfo(*select_rows);

    rpc_client_->AsyncGetMonomerBarrier(r.ep_, r.var_name_);
  }

  rpc_client_->Wait();
  return true;
}

}  // namespace distributed
}  // namespace operators
}  // namespace paddle
paddle/fluid/operators/distributed/collective_client.h  (new file, mode 100644)

// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#pragma once

#include <condition_variable>  // NOLINT
#include <string>
#include <vector>

#include "gflags/gflags.h"
#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/operators/detail/macros.h"
#include "paddle/fluid/operators/distributed/request_handler.h"

DECLARE_int32(rpc_deadline);

namespace paddle {
namespace operators {
namespace distributed {

inline std::string GetSelectedRowsInfo(const framework::SelectedRows& slr) {
  std::stringstream ss;
  ss << ", height:" << slr.height() << ", rows:[";
  for (unsigned int i = 0; i < slr.rows().size(); i++) {
    if (i != slr.rows().size() - 1) {
      ss << slr.rows()[i] << ",";
    } else {
      ss << slr.rows()[i];
    }
  }
  ss << "], dims:" << slr.value().dims();
  return ss.str();
}

struct RemoteVar {
  std::string ep_;
  std::string var_name_;
  int trainer_id_{0};

  std::string String() {
    std::stringstream ss;
    ss << "ep:" << ep_ << ", var_name:" << var_name_
       << ", trainer_id:" << trainer_id_;
    return ss.str();
  }
};

class CollectiveClient {
 public:
  CollectiveClient() {
    rpc_client_.reset(new RPCCLIENT_T());
    rpc_client_->InitImpl();
  }
  virtual ~CollectiveClient() {}

  // note this function will retain the rank order.
  bool Gather(const std::vector<RemoteVar>& remote_vars,
              std::vector<const framework::SelectedRows*>* dst,
              const platform::DeviceContext& ctx, framework::Scope* scope,
              int64_t time_out = FLAGS_rpc_deadline);

  static CollectiveClient* GetInstance() {
    std::call_once(init_flag_, [&]() {
      if (client_.get() == nullptr) {
        client_.reset(new CollectiveClient());
      }
    });
    return client_.get();
  }

 private:
  std::unique_ptr<RPCClient> rpc_client_;

  static std::once_flag init_flag_;
  static std::unique_ptr<CollectiveClient> client_;
};
}  // namespace distributed
}  // namespace operators
}  // namespace paddle
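A minimal caller sketch for the Gather API declared above, mirroring how the test file later in this commit drives it. The endpoints and variable name are placeholders, error handling is omitted, and the snippet only compiles inside the Paddle source tree:

#include <vector>
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/operators/distributed/collective_client.h"
#include "paddle/fluid/platform/device_context.h"

namespace distributed = paddle::operators::distributed;

void GatherFromPeers(paddle::platform::DeviceContext* dev_ctx) {
  // One RemoteVar per peer; per the comment in the header, Gather()
  // preserves this rank order in dst.
  distributed::RemoteVar v0;
  v0.ep_ = "127.0.0.1:7164";  // placeholder endpoint
  v0.var_name_ = "var1";
  distributed::RemoteVar v1;
  v1.ep_ = "127.0.0.1:7165";  // placeholder endpoint
  v1.var_name_ = "var1";
  v1.trainer_id_ = 1;
  std::vector<distributed::RemoteVar> peers{v0, v1};

  paddle::framework::Scope scope;
  std::vector<const paddle::framework::SelectedRows*> dst;

  auto* client = distributed::CollectiveClient::GetInstance();
  // Blocks until every peer's SelectedRows has arrived in the scope.
  client->Gather(peers, &dst, *dev_ctx, &scope);
}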
paddle/fluid/operators/distributed/collective_server.cc  (new file, mode 100644)

/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include <stdio.h>  // for removing the port file
#include <csignal>
#include <cstdlib>
#include <fstream>
#include <thread>  // NOLINT
#include <vector>

#include "paddle/fluid/operators/distributed/collective_server.h"

DEFINE_int32(collective_get_thread_num, 5, "number of threads for rpc get");

namespace paddle {
namespace operators {
namespace distributed {

std::once_flag CollectiveServer::init_flag_;
std::shared_ptr<CollectiveServer> CollectiveServer::collective_server_(nullptr);

CollectiveServer::CollectiveServer(const std::string& end_point, int fan_in) {
  VLOG(1) << "Create colllective server:" << end_point << ", fan_in:" << fan_in;
  rpc_server_.reset(new RPCSERVER_T(end_point, fan_in));
}

void CollectiveServer::Stop() {
  rpc_server_->ShutDown();
  server_thread_->join();
  loop_thread_->join();
}

void CollectiveServer::StartServer() {
  get_monomer_handler_.reset(new GetMonomerHandler());
  get_monomer_handler_->SetRPCServer(rpc_server_.get());

  get_barrier_handler_.reset(new GetMonomerBarrierHandler());
  get_barrier_handler_->SetRPCServer(rpc_server_.get());

  rpc_server_->RegisterRPC(distributed::kRequestGetMonomerVariable,
                           get_monomer_handler_.get(),
                           FLAGS_collective_get_thread_num);
  rpc_server_->RegisterRPC(distributed::kRequestGetMonomerBarrier,
                           get_barrier_handler_.get(), 1);

  server_thread_.reset(new std::thread([&]() { rpc_server_->StartServer(); }));
  rpc_server_->WaitServerReady();

  loop_thread_.reset(new std::thread([&]() {
    while (true) {
      if (rpc_server_->IsExit()) {
        LOG(WARNING) << "get exit!rpc_processor break!";
        break;
      }
      sleep(1);
    }
    VLOG(1) << "CollectiveServer loop_thread end";
  }));
}

};  // namespace distributed
};  // namespace operators
};  // namespace paddle
paddle/fluid/operators/distributed/collective_server.h  (new file, mode 100644)

/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <map>
#include <set>
#include <string>
#include <thread>  // NOLINT
#include <utility>
#include <vector>

#include "gflags/gflags.h"
#include "paddle/fluid/operators/detail/macros.h"
#include "paddle/fluid/operators/distributed/request_handler.h"
#include "paddle/fluid/operators/distributed/request_handler_impl.h"
#include "paddle/fluid/operators/distributed/rpc_server.h"

namespace paddle {
namespace operators {
namespace distributed {

class CollectiveServer;

class GetMonomerHandler final : public RequestHandler {
 public:
  GetMonomerHandler() : RequestHandler(true) {}
  virtual ~GetMonomerHandler() {}
  bool Handle(const std::string& var_name, framework::Scope* scope,
              framework::Variable* var, framework::Variable** outvar,
              const int trainer_id, const std::string& out_var_name = "",
              const std::string& table_name = "") override {
    VLOG(50) << "GetMonomerHandler recv " << var_name;

    *outvar = scope->FindVar(var_name);
    PADDLE_ENFORCE(outvar != nullptr, "%s not found", var_name);

    return true;
  }
};

class GetMonomerBarrierHandler final : public RequestHandler {
 public:
  GetMonomerBarrierHandler() : RequestHandler(true) {}
  virtual ~GetMonomerBarrierHandler() {}
  bool Handle(const std::string& var_name, framework::Scope* scope,
              framework::Variable* var, framework::Variable** outvar,
              const int trainer_id, const std::string& out_var_name = "",
              const std::string& table_name = "") override {
    VLOG(50) << "GetMonomerHandler recv " << var_name;

    rpc_server_->IncreaseVarBarrier(var_name);

    return true;
  }
};

class CollectiveServer final {
 public:
  explicit CollectiveServer(const std::string& end_point, int fan_in);

  virtual ~CollectiveServer() {}

  void StartServer();

  static CollectiveServer* GetInstance(const std::string& end_point,
                                       int fan_in) {
    std::call_once(init_flag_, [&]() {
      if (collective_server_.get() == nullptr) {
        collective_server_.reset(new CollectiveServer(end_point, fan_in));
        collective_server_->StartServer();
      }
    });

    return collective_server_.get();
  }

  std::shared_ptr<RPCServer> GetRPCServer() { return rpc_server_; }

  void Stop();

 private:
  std::unique_ptr<GetMonomerHandler> get_monomer_handler_;
  std::unique_ptr<GetMonomerBarrierHandler> get_barrier_handler_;

  std::shared_ptr<distributed::RPCServer> rpc_server_;
  std::shared_ptr<std::thread> server_thread_;
  std::shared_ptr<std::thread> loop_thread_;

  bool ready_{false};

  static std::once_flag init_flag_;
  static std::shared_ptr<CollectiveServer> collective_server_;
};

};  // namespace distributed
};  // namespace operators
};  // namespace paddle
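Putting the server pieces together, one gather round on the serving side looks roughly like the sketch below; the endpoint string and fan-in value are illustrative, and the full flow is exercised by collective_server_test.cc, the next file in this commit:

#include <string>
#include "paddle/fluid/operators/distributed/collective_server.h"

namespace distributed = paddle::operators::distributed;

void ServeOneVar(paddle::framework::Scope* scope,
                 paddle::platform::DeviceContext* dev_ctx) {
  // fan_in is the number of clients that must hit the barrier.
  auto* server = distributed::CollectiveServer::GetInstance("127.0.0.1:7164",
                                                            /*fan_in=*/2);
  auto rpc_server = server->GetRPCServer();

  // Publish "var1" so RequestGetMonomerVariable can find its scope/context.
  rpc_server->RegisterVar("var1", distributed::kRequestGetMonomerVariable,
                          scope, dev_ctx);

  // Block until fan_in clients have sent their GetMonomerBarrier.
  rpc_server->WaitVarBarrier("var1");
  rpc_server->ClearRegisteredVars();
  server->Stop();
}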
paddle/fluid/operators/distributed/collective_server_test.cc  (new file, mode 100644)

/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include <unistd.h>
#include <string>
#include <thread>  // NOLINT

#include "gtest/gtest.h"
#include "paddle/fluid/framework/block_desc.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"

#include "paddle/fluid/operators/detail/macros.h"
#include "paddle/fluid/operators/distributed/collective_client.h"
#include "paddle/fluid/operators/distributed/collective_server.h"
#include "paddle/fluid/operators/distributed/request_handler_impl.h"
#include "paddle/fluid/operators/math/math_function.h"

namespace framework = paddle::framework;
namespace platform = paddle::platform;
namespace distributed = paddle::operators::distributed;

std::unique_ptr<distributed::CollectiveServer> StartServer(
    const std::string& ep, int fan_in, framework::Scope* scope,
    platform::DeviceContext* dev_ctx) {
  distributed::CollectiveServer* server =
      distributed::CollectiveServer::GetInstance(ep, fan_in);

  auto rpc_server = server->GetRPCServer();
  rpc_server->RegisterVar("var1", distributed::kRequestGetMonomerVariable,
                          scope, dev_ctx);

  std::cout << "StartServer return" << std::endl;
  return std::unique_ptr<distributed::CollectiveServer>(server);
}

std::unique_ptr<framework::Scope> GenerateVars(platform::Place place) {
  platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
  auto& ctx = *pool.Get(place);

  framework::Scope* scope = new framework::Scope();
  framework::Variable* var = scope->Var("var1");
  auto* slr = var->GetMutable<framework::SelectedRows>();
  slr->set_height(1000);

  auto* tensor = slr->mutable_value();
  auto* rows = slr->mutable_rows();

  tensor->Resize(framework::make_ddim({3, 5}));
  tensor->mutable_data<float>(place);

  paddle::operators::math::set_constant(ctx, tensor, 32.7);
  for (int i = 0; i < 3; ++i) rows->push_back(i);

  std::cout << "src:" << distributed::GetSelectedRowsInfo(*slr);

  return std::unique_ptr<framework::Scope>(scope);
}

void Gather(const std::vector<distributed::RemoteVar>& vars,
            platform::DeviceContext* dev_ctx) {
  distributed::CollectiveClient* client =
      distributed::CollectiveClient::GetInstance();

  framework::Scope* scope = new framework::Scope();
  framework::Variable* var = scope->Var("var1");
  var->GetMutable<framework::SelectedRows>();

  std::vector<const framework::SelectedRows*> dst;
  client->Gather(vars, &dst, *dev_ctx, scope);
  std::cout << "dst:" << distributed::GetSelectedRowsInfo(*dst[0]);
}

TEST(PREFETCH, GPU) {
  platform::CUDAPlace place;
  platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
  auto& ctx = *pool.Get(place);

  std::string ep = "127.0.0.1:7164";
  auto scope = GenerateVars(place);

  auto* v1 = scope->FindVar("var1");
  std::cout << "var1:" << v1 << std::endl;

  auto server = StartServer(ep, 2, scope.get(), &ctx);
  auto rpc_server = server->GetRPCServer();

  distributed::RemoteVar var;
  var.ep_ = ep;
  var.var_name_ = "var1";
  var.trainer_id_ = 0;

  std::vector<distributed::RemoteVar> vars{var};
  Gather(vars, &ctx);
  Gather(vars, &ctx);

  std::cout << "begin WaitVarBarrier" << std::endl;
  rpc_server->WaitVarBarrier("var1");
  rpc_server->ClearRegisteredVars();
  server->Stop();

  scope.release();
  server.release();
}
paddle/fluid/operators/distributed/grpc_client.cc
...
@@ -28,11 +28,11 @@ namespace paddle {
 namespace operators {
 namespace distributed {
 
-void GRPCClient::InitImpl() { InitEventLoop(); }
-
-void GRPCClient::InitEventLoop() {
+void GRPCClient::InitImpl() {
   // start the client process thread
   // TODO(wuyi): can make this in a threadpool
   PADDLE_ENFORCE(client_thread_ == nullptr,
                  "please not re init proceed thread");
   client_thread_.reset(new std::thread(std::bind(&GRPCClient::Proceed, this)));
 }
...
@@ -106,6 +106,7 @@ VarHandlePtr GRPCClient::AsyncSendVar(const std::string& ep,
 
 void ProcGetResponse(const VarHandle& var_h,
                      const ::grpc::ByteBuffer& ret_msg) {
+  VLOG(100) << "ProcGetResponse";
   framework::Variable* outvar = nullptr;
   // get response's trainer_id is not used
   int trainer_id;
...
@@ -126,6 +127,24 @@ VarHandlePtr GRPCClient::AsyncGetVar(const std::string& ep,
                                      const framework::Scope& scope,
                                      const std::string& var_name,
                                      int64_t time_out) {
+  return _AsyncGetVar(ep, ctx, scope, var_name,
+                      "/sendrecv.SendRecvService/GetVariable", time_out);
+}
+
+VarHandlePtr GRPCClient::AsyncGetMonomerVariable(
+    const std::string& ep, const platform::DeviceContext& ctx,
+    const framework::Scope& scope, const std::string& var_name,
+    int64_t time_out) {
+  return _AsyncGetVar(ep, ctx, scope, var_name,
+                      "/sendrecv.SendRecvService/GetMonomerVariable", time_out);
+}
+
+VarHandlePtr GRPCClient::_AsyncGetVar(const std::string& ep,
+                                      const platform::DeviceContext& ctx,
+                                      const framework::Scope& scope,
+                                      const std::string& var_name,
+                                      const std::string& rpc_path,
+                                      int64_t time_out) {
   const platform::DeviceContext* p_ctx = &ctx;
   const std::string ep_val = ep;
   const std::string var_name_val = var_name;
...
@@ -136,7 +155,7 @@ VarHandlePtr GRPCClient::AsyncGetVar(const std::string& ep,
   VarHandlePtr h(new VarHandle(ep, method, var_name_val, p_ctx, p_scope));
   s->Prepare(h, time_out);
 
-  framework::AsyncIO([var_name_val, s, method, p_ctx, h, this] {
+  framework::AsyncIO([var_name_val, s, method, p_ctx, h, rpc_path, this] {
     // prepare input
     sendrecv::VariableMessage req;
     req.set_varname(var_name_val);
...
@@ -151,8 +170,8 @@ VarHandlePtr GRPCClient::AsyncGetVar(const std::string& ep,
 
     platform::RecordRPCEvent record_event(method, p_ctx);
 
-    auto call = s->stub_g_.PrepareUnaryCall(
-        s->context_.get(), "/sendrecv.SendRecvService/GetVariable", buf, &cq_);
+    auto call =
+        s->stub_g_.PrepareUnaryCall(s->context_.get(), rpc_path, buf, &cq_);
     call->StartCall();
     call->Finish(&s->reply_, &s->status_, reinterpret_cast<void*>(s));
...
@@ -268,6 +287,34 @@ VarHandlePtr GRPCClient::AsyncSendFetchBarrier(const std::string& ep,
   return h;
 }
 
+VarHandlePtr GRPCClient::AsyncGetMonomerBarrier(const std::string& ep,
+                                                const std::string& var_name,
+                                                int64_t time_out) {
+  const auto ch = GetChannel(ep);
+  BatchBarrierProcessor* s = new BatchBarrierProcessor(ch);
+  const std::string method = "SendMonomerFetchBarrierRPC";
+  VarHandlePtr h(
+      new VarHandle(ep, method, FETCH_BARRIER_MESSAGE, nullptr, nullptr));
+  s->Prepare(h, time_out);
+
+  VLOG(30) << s->GetVarHandlePtr()->String() << " begin";
+
+  sendrecv::VariableMessage req;
+  req.set_varname(var_name);
+
+  platform::RecordRPCEvent record_event(method, nullptr);
+
+  auto rpc = s->stub_->AsyncGetMonomerBarrier(s->context_.get(), req, &cq_);
+  rpc->Finish(&s->reply_, &s->status_, reinterpret_cast<void*>(s));
+  req_count_++;
+
+  if (UNLIKELY(platform::IsProfileEnabled())) {
+    h->Wait();
+  }
+
+  return h;
+}
+
 VarHandlePtr GRPCClient::AsyncSendComplete(const std::string& ep,
                                            int64_t time_out) {
   const auto ch = GetChannel(ep);
...
paddle/fluid/operators/distributed/grpc_client.h
...
@@ -189,6 +189,11 @@ class GRPCClient : public RPCClient {
                           const std::string& var_name,
                           int64_t time_out = FLAGS_rpc_deadline) override;
 
+  VarHandlePtr AsyncGetMonomerVariable(
+      const std::string& ep, const platform::DeviceContext& ctx,
+      const framework::Scope& scope, const std::string& var_name,
+      int64_t time_out = FLAGS_rpc_deadline) override;
+
   VarHandlePtr AsyncPrefetchVar(const std::string& ep,
                                 const platform::DeviceContext& ctx,
                                 const framework::Scope& scope,
...
@@ -200,8 +205,12 @@ class GRPCClient : public RPCClient {
   VarHandlePtr AsyncSendBatchBarrier(
       const std::string& ep, int64_t time_out = FLAGS_rpc_deadline) override;
 
-  VarHandlePtr AsyncSendFetchBarrier(
-      const std::string& ep, int64_t time_out = FLAGS_rpc_deadline) override;
+  VarHandlePtr AsyncSendFetchBarrier(const std::string& ep,
+                                     int64_t time_out) override;
+
+  VarHandlePtr AsyncGetMonomerBarrier(
+      const std::string& ep, const std::string& var_name,
+      int64_t time_out = FLAGS_rpc_deadline) override;
 
   VarHandlePtr AsyncCheckpointNotify(const std::string& ep,
                                      const std::string& dir,
...
@@ -214,21 +223,22 @@ class GRPCClient : public RPCClient {
   void SendComplete() override;
 
-protected:
   void InitImpl() override;
 
 private:
-  // InitEventLoop should only be called by Init()
-  void InitEventLoop();
-
  void Proceed();

  std::shared_ptr<grpc::Channel> GetChannel(const std::string& ep);
+  VarHandlePtr _AsyncGetVar(const std::string& ep,
+                            const platform::DeviceContext& ctx,
+                            const framework::Scope& scope,
+                            const std::string& var_name,
+                            const std::string& rpc, int64_t time_out);

 private:
  grpc::CompletionQueue cq_;
  std::unordered_map<std::string, std::shared_ptr<grpc::Channel>> channels_;
-  std::unique_ptr<std::thread> client_thread_;
+  std::unique_ptr<std::thread> client_thread_{nullptr};

  // mutex for Wait client sync
  std::mutex sync_mutex_;
...
paddle/fluid/operators/distributed/grpc_server.cc
...
@@ -158,6 +158,98 @@ class RequestGet final : public RequestBase {
   ServerAsyncResponseWriter<::grpc::ByteBuffer> responder_;
 };
 
+class RequestGetMonomerVariable final : public RequestBase {
+ public:
+  explicit RequestGetMonomerVariable(GrpcService::AsyncService* service,
+                                     ::grpc::ServerCompletionQueue* cq,
+                                     RequestHandler* request_handler,
+                                     int req_id, RPCServer* rpc_server)
+      : RequestBase(service, cq, request_handler, req_id),
+        responder_(&ctx_),
+        rpc_server_(rpc_server) {
+    auto method_id =
+        static_cast<int>(distributed::GrpcMethod::kGetMonomerVariable);
+    service_->RequestAsyncUnary(
+        method_id, &ctx_, &request_, &responder_, cq_, cq_,
+        reinterpret_cast<void*>(static_cast<intptr_t>(req_id)));
+  }
+
+  virtual ~RequestGetMonomerVariable() {}
+
+  std::string GetReqName() override { return request_.varname(); }
+
+  void Process() override {
+    // proc request.
+    std::string varname = request_.varname();
+
+    rpc_server_->WaitVarCond(varname);
+    MonomerHandle h = rpc_server_->GetMonomer(varname);
+
+    auto scope = h.scope_;
+    auto invar = scope->FindVar(varname);
+    framework::Variable* outvar = nullptr;
+
+    request_handler_->Handle(varname, scope, invar, &outvar,
+                             request_.trainer_id());
+
+    if (outvar) {
+      SerializeToByteBuffer(varname, outvar, *h.dev_ctx_, &reply_);
+    }
+    Finish(reply_, &responder_);
+  }
+
+ protected:
+  sendrecv::VariableMessage request_;
+  ::grpc::ByteBuffer reply_;
+  ServerAsyncResponseWriter<::grpc::ByteBuffer> responder_;
+  RPCServer* rpc_server_{nullptr};
+};
+
+class RequestGetMonomerBarrier final : public RequestBase {
+ public:
+  explicit RequestGetMonomerBarrier(GrpcService::AsyncService* service,
+                                    ::grpc::ServerCompletionQueue* cq,
+                                    RequestHandler* request_handler, int req_id,
+                                    RPCServer* rpc_server)
+      : RequestBase(service, cq, request_handler, req_id),
+        responder_(&ctx_),
+        rpc_server_(rpc_server) {
+    auto method_id =
+        static_cast<int>(distributed::GrpcMethod::kGetMonomerBarrier);
+    service_->RequestAsyncUnary(
+        method_id, &ctx_, &request_, &responder_, cq_, cq_,
+        reinterpret_cast<void*>(static_cast<intptr_t>(req_id)));
+  }
+
+  virtual ~RequestGetMonomerBarrier() {}
+
+  std::string GetReqName() override { return request_.varname(); }
+
+  void Process() override {
+    // proc request.
+    std::string varname = request_.varname();
+    VLOG(4) << "RequestGetMonomerBarrier " << varname;
+
+    rpc_server_->WaitVarCond(varname);
+    MonomerHandle h = rpc_server_->GetMonomer(varname);
+
+    framework::Scope* scope = nullptr;
+    framework::Variable* invar = nullptr;
+    framework::Variable* outvar = nullptr;
+
+    request_handler_->Handle(varname, scope, invar, &outvar,
+                             request_.trainer_id());
+
+    Finish(reply_, &responder_);
+  }
+
+ protected:
+  sendrecv::VariableMessage request_;
+  sendrecv::VoidMessage reply_;
+  ServerAsyncResponseWriter<sendrecv::VoidMessage> responder_;
+  RPCServer* rpc_server_{nullptr};
+};
+
 class RequestPrefetch final : public RequestBase {
  public:
   explicit RequestPrefetch(GrpcService::AsyncService* service,
...
@@ -249,7 +341,7 @@ class RequestCheckpointNotify final : public RequestBase {
 };
 
 void AsyncGRPCServer::WaitServerReady() {
-  VLOG(4) << "AsyncGRPCServer is wait server ready";
+  VLOG(4) << "AsyncGRPCServer is waiting server ready";
   std::unique_lock<std::mutex> lock(this->mutex_ready_);
   condition_ready_.wait(lock, [=] { return this->ready_ == 1; });
   VLOG(4) << "AsyncGRPCServer WaitSeverReady";
...
@@ -368,6 +460,12 @@ void AsyncGRPCServer::TryToRegisterNewOne(const std::string& rpc_name,
     b = new RequestSend(&service_, cq.get(), handler, req_id);
   } else if (rpc_name == kRequestGet) {
     b = new RequestGet(&service_, cq.get(), handler, req_id);
+  } else if (rpc_name == kRequestGetMonomerVariable) {
+    b = new RequestGetMonomerVariable(&service_, cq.get(), handler, req_id,
+                                      this);
+  } else if (rpc_name == kRequestGetMonomerBarrier) {
+    b = new RequestGetMonomerBarrier(&service_, cq.get(), handler, req_id,
+                                     this);
   } else if (rpc_name == kRequestPrefetch) {
     b = new RequestPrefetch(&service_, cq.get(), handler, req_id);
   } else if (rpc_name == kRequestCheckpoint) {
...
@@ -378,7 +476,7 @@ void AsyncGRPCServer::TryToRegisterNewOne(const std::string& rpc_name,
 
   reqs[req_id] = b;
 
-  VLOG(4) << "Create RequestSend status:" << b->Status();
+  VLOG(4) << "TryToRegisterNewOne status:" << b->Status();
 }
 
 void AsyncGRPCServer::HandleRequest(
...
paddle/fluid/operators/distributed/grpc_service.h
...
@@ -81,10 +81,12 @@ enum class GrpcMethod {
   kGetVariable,
   kPrefetchVariable,
   kCheckpointNotify,
+  kGetMonomerVariable,
+  kGetMonomerBarrier,
 };
 
 static const int kGrpcNumMethods =
-    static_cast<int>(GrpcMethod::kCheckpointNotify) + 1;
+    static_cast<int>(GrpcMethod::kGetMonomerBarrier) + 1;
 
 inline const char* GrpcMethodName(GrpcMethod id) {
   switch (id) {
...
@@ -92,6 +94,10 @@ inline const char* GrpcMethodName(GrpcMethod id) {
       return "/sendrecv.SendRecvService/SendVariable";
     case GrpcMethod::kGetVariable:
       return "/sendrecv.SendRecvService/GetVariable";
+    case GrpcMethod::kGetMonomerVariable:
+      return "/sendrecv.SendRecvService/GetMonomerVariable";
+    case GrpcMethod::kGetMonomerBarrier:
+      return "/sendrecv.SendRecvService/GetMonomerBarrier";
    case GrpcMethod::kPrefetchVariable:
      return "/sendrecv.SendRecvService/PrefetchVariable";
    case GrpcMethod::kCheckpointNotify:
...
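The kGrpcNumMethods update above is the fragile part of this pattern: a count derived from the last enumerator goes stale the moment someone appends a new value. A common alternative is a sentinel enumerator kept last, sketched below in isolation (the names are illustrative, not Paddle's):

#include <cassert>

enum class Method {
  kSendVariable,
  kGetVariable,
  kPrefetchVariable,
  kCheckpointNotify,
  kGetMonomerVariable,
  kGetMonomerBarrier,
  kCount  // sentinel: keep last so the count below can never go stale
};

static const int kMethodCount = static_cast<int>(Method::kCount);

int main() {
  // Any value inserted before kCount is counted automatically.
  assert(kMethodCount == 6);
  return 0;
}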
paddle/fluid/operators/distributed/request_handler.h
...
@@ -37,6 +37,8 @@ namespace distributed {
 
 constexpr char kRequestSend[] = "RequestSend";
 constexpr char kRequestGet[] = "RequestGet";
+constexpr char kRequestGetMonomerVariable[] = "RequestGetMonomerVariable";
+constexpr char kRequestGetMonomerBarrier[] = "RequestGetMonomerBarrier";
 constexpr char kRequestPrefetch[] = "RequestPrefetch";
 constexpr char kRequestCheckpoint[] = "RequestCheckpoint";
 constexpr char kRequestPassBarrier[] = "RequestPassBarrier";
...
paddle/fluid/operators/distributed/rpc_client.h
...
@@ -45,6 +45,11 @@ class RPCClient {
                                   const std::string& var_name,
                                   int64_t time_out = FLAGS_rpc_deadline) = 0;
 
+  virtual VarHandlePtr AsyncGetMonomerVariable(
+      const std::string& ep, const platform::DeviceContext& ctx,
+      const framework::Scope& scope, const std::string& var_name,
+      int64_t time_out = FLAGS_rpc_deadline) = 0;
+
   virtual VarHandlePtr AsyncPrefetchVar(const std::string& ep,
                                         const platform::DeviceContext& ctx,
                                         const framework::Scope& scope,
                                         const std::string& in_var_name,
...
@@ -57,6 +62,10 @@ class RPCClient {
   virtual VarHandlePtr AsyncSendFetchBarrier(
       const std::string& ep, int64_t time_out = FLAGS_rpc_deadline) = 0;
 
+  virtual VarHandlePtr AsyncGetMonomerBarrier(
+      const std::string& ep, const std::string& var_name,
+      int64_t time_out = FLAGS_rpc_deadline) = 0;
+
   virtual VarHandlePtr AsyncCheckpointNotify(
       const std::string& ep, const std::string& dir,
       int64_t time_out = FLAGS_rpc_deadline) = 0;
...
@@ -87,8 +96,9 @@ class RPCClient {
     }
  }
 
-protected:
  virtual void InitImpl() {}

 protected:
  // each trainer have exact one trainer id, it should be static
  static int trainer_id_;
...
paddle/fluid/operators/distributed/rpc_server.cc
...
@@ -132,6 +132,96 @@ void RPCServer::WaitCond(const std::string& rpc_name) {
       lock, [=] { return (cur_cond_.load() == cond || exit_flag_.load()); });
 }
 
+void RPCServer::RegisterVar(const std::string& var_name,
+                            const std::string& rpc_name,
+                            framework::Scope* scope,
+                            platform::DeviceContext* dev_ctx) {
+  MonomerHandle h;
+  h.var_name_ = var_name;
+  h.rpc_name_ = rpc_name;
+  h.scope_ = scope;
+  h.dev_ctx_ = dev_ctx;
+
+  {
+    std::unique_lock<std::mutex> lock(mutex_);
+    if (var_map_.find(var_name) != var_map_.end()) {
+      PADDLE_ENFORCE(false, "%s alreay in var_map", var_name);
+    }
+    var_map_[var_name] = h;
+  }
+
+  rpc_cond_.notify_all();
+  VLOG(4) << "RegisterVar context:" << h.String();
+}
+
+void RPCServer::IncreaseVarBarrier(const std::string& var_name) {
+  int b = 0;
+  MonomerHandle h;
+  {
+    std::unique_lock<std::mutex> lock(mutex_);
+    b = ++var_map_[var_name].barrier_;
+    h = var_map_[var_name];
+  }
+
+  if (b >= client_num_) {
+    barrier_cond_.notify_all();
+  }
+
+  VLOG(4) << "IncreaseVarBarrier context:" << h.String();
+}
+
+void RPCServer::WaitVarBarrier(const std::string& var_name) {
+  VLOG(4) << "WaitBarrier var_name:" << var_name;
+
+  std::unique_lock<std::mutex> lock(mutex_);
+  barrier_cond_.wait(lock, [&]() {
+    return ((var_map_[var_name].barrier_ >= client_num_ && client_num_ != 0) ||
+            exit_flag_.load());
+  });
+
+  VLOG(4) << "WaitBarrier context: " << var_map_[var_name].String();
+}
+
+void RPCServer::SetVarCond(const std::string& var_name) {
+  VLOG(4) << "SetVarCond var_name:" << var_name;
+  {
+    std::unique_lock<std::mutex> lock(mutex_);
+    if (var_map_.find(var_name) != var_map_.end()) {
+      rpc_cond_.notify_all();
+    }
+  }
+}
+
+void RPCServer::WaitVarCond(const std::string& var_name) {
+  VLOG(4) << "WaitVarCond var_name:" << var_name;
+
+  std::unique_lock<std::mutex> lock(mutex_);
+  rpc_cond_.wait(lock, [=] {
+    return (var_map_.find(var_name) != var_map_.end() || exit_flag_.load());
+  });
+
+  VLOG(4) << "WaitVarCond var_name:" << var_name << " end";
+}
+
+MonomerHandle RPCServer::GetMonomer(const std::string& var_name) {
+  MonomerHandle h;
+  {
+    std::unique_lock<std::mutex> lock(mutex_);
+    h = var_map_[var_name];
+  }
+
+  return h;
+}
+
+void RPCServer::ClearRegisteredVars() {
+  std::unique_lock<std::mutex> lock(mutex_);
+  var_map_.clear();
+}
+
+void RPCServer::ClearVar(const std::string& var_name) {
+  std::unique_lock<std::mutex> lock(mutex_);
+  var_map_.erase(var_name);
+}
+
 }  // namespace distributed
 }  // namespace operators
 }  // namespace paddle
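IncreaseVarBarrier/WaitVarBarrier above implement a simple counted barrier over one mutex and condition variable: each client's barrier RPC bumps a per-variable counter, and the server wakes once the counter reaches client_num_. Stripped of the Paddle types, the pattern is (a standalone sketch, not the actual class):

#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class CountedBarrier {
 public:
  explicit CountedBarrier(int expected) : expected_(expected) {}

  // Called once per arriving client (cf. IncreaseVarBarrier).
  void Increase() {
    std::unique_lock<std::mutex> lock(mu_);
    if (++count_ >= expected_) cv_.notify_all();
  }

  // Blocks until every expected client has arrived (cf. WaitVarBarrier).
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return count_ >= expected_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_{0};
  const int expected_;
};

int main() {
  CountedBarrier barrier(2);
  std::vector<std::thread> clients;
  for (int i = 0; i < 2; ++i)
    clients.emplace_back([&] { barrier.Increase(); });
  barrier.Wait();  // returns once both "clients" have arrived
  for (auto& t : clients) t.join();
  return 0;
}

The real code additionally folds exit_flag_ into the wait predicate so a shutting-down server does not deadlock waiting for stragglers.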
paddle/fluid/operators/distributed/rpc_server.h
...
@@ -21,12 +21,30 @@
 #include <utility>
 #include <vector>
 
 #include "paddle/fluid/framework/scope.h"
 #include "paddle/fluid/operators/distributed/request_handler.h"
+#include "paddle/fluid/platform/device_context.h"
 
 namespace paddle {
 namespace operators {
 namespace distributed {
 
+struct MonomerHandle {
+  std::string var_name_;
+  std::string rpc_name_;
+  framework::Scope* scope_{nullptr};
+  platform::DeviceContext* dev_ctx_{nullptr};
+  int64_t barrier_{0};
+
+  std::string String() {
+    std::stringstream ss;
+    ss << "var_name:" << var_name_ << ", rpc_name:" << rpc_name_
+       << ", scope:" << scope_ << ", dev_ctx:" << dev_ctx_
+       << ", barrier_:" << barrier_;
+    return ss.str();
+  }
+};
+
 class RPCServer {
  public:
   explicit RPCServer(const std::string& address, int client_num)
...
@@ -67,6 +85,16 @@ class RPCServer {
   void WaitCond(const std::string& rpc_name);
   void IncreaseBatchBarrier(const std::string rpc_name);
 
+  void RegisterVar(const std::string& var_name, const std::string& rpc_name,
+                   framework::Scope* scope, platform::DeviceContext* dev_ctx);
+  void IncreaseVarBarrier(const std::string& var_name);
+  void WaitVarBarrier(const std::string& var_name);
+  void SetVarCond(const std::string& var_name);
+  void WaitVarCond(const std::string& var_name);
+  void ClearRegisteredVars();
+  void ClearVar(const std::string& var_name);
+  MonomerHandle GetMonomer(const std::string& var_name);
+
   void Complete();
 
   void ResetBarrierCounter();
...
@@ -95,6 +123,9 @@ class RPCServer {
   std::unordered_map<std::string, RequestHandler*> rpc_call_map_;
   std::unordered_map<std::string, int> rpc_thread_num_;
   friend class RequestHandler;
+
+  // TODO(gongwb): use more cond to notify or wait;
+  std::unordered_map<std::string, MonomerHandle> var_map_;
 };
 
 };  // namespace distributed
...
paddle/fluid/operators/distributed/send_recv.proto.in
...
@@ -28,6 +28,9 @@ service SendRecvService {
   rpc PrefetchVariable(VariableMessage) returns (VariableMessage) {}
 
   rpc CheckpointNotify(VariableMessage) returns (VoidMessage) {}
+
+  rpc GetMonomerVariable(VariableMessage) returns (VariableMessage) {}
+  rpc GetMonomerBarrier(VariableMessage) returns (VoidMessage) {}
 }
 
 // VariableMessage is serialized paddle variable message.
...
paddle/fluid/operators/elementwise/elementwise_mul_op.h
...
@@ -60,15 +60,37 @@ template <typename DeviceContext, typename T>
 class ElementwiseMulKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
-    auto* x = ctx.Input<framework::LoDTensor>("X");
+    auto x_var = ctx.InputVar("X");
+    PADDLE_ENFORCE(x_var != nullptr,
+                   "Cannot get input Variable X, variable name = %s",
+                   ctx.op().Input("X"));
     auto* y = ctx.Input<framework::LoDTensor>("Y");
-    auto* z = ctx.Output<framework::LoDTensor>("Out");
+
+    framework::Tensor x, *z;
+    if (x_var->IsType<framework::SelectedRows>()) {
+      PADDLE_ENFORCE(y->dims().size() == 1 && y->dims()[0] == 1,
+                     "For elementwise_op, if X is Sparse, Y must be scalar.");
+      auto& x_sele = x_var->Get<framework::SelectedRows>();
+      auto out_sele = ctx.Output<framework::SelectedRows>("Out");
+      x = x_sele.value();
+      out_sele->set_rows(x_sele.rows());
+      out_sele->set_height(x_sele.height());
+      out_sele->mutable_value()->Resize(x_sele.value().dims());
+      out_sele->mutable_value()->mutable_data(ctx.GetPlace(), x.type());
+      z = ctx.Output<framework::SelectedRows>("Out")->mutable_value();
+    } else if (x_var->IsType<framework::LoDTensor>()) {
+      x = x_var->Get<framework::LoDTensor>();
+      z = ctx.Output<framework::LoDTensor>("Out");
+    } else {
+      PADDLE_THROW("X's type[%s] is not supported by elementwise_op.",
+                   x_var->Type().name());
+    }
 
     z->mutable_data<T>(ctx.GetPlace());
-    if (x->numel() == y->numel()) {
-      elementwise_mul<DeviceContext, T>(ctx, x, y, z);
+    if (x.numel() == y->numel()) {
+      elementwise_mul<DeviceContext, T>(ctx, &x, y, z);
     } else {
-      default_elementwise_mul<DeviceContext, T>(ctx, x, y, z);
+      default_elementwise_mul<DeviceContext, T>(ctx, &x, y, z);
     }
   }
 };
...
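The new SelectedRows branch treats X as a sparse row set and restricts Y to a single element, so the multiply reduces to scaling X's stored value rows while the row indices and height pass through unchanged. A plain sketch of those semantics, using a hypothetical stand-in type rather than the Paddle kernel:

#include <cassert>
#include <vector>

// Hypothetical stand-in for SelectedRows: a dense value block plus the
// global row indices it represents.
struct SparseRows {
  std::vector<long> rows;    // which rows of the full matrix are stored
  std::vector<float> value;  // rows.size() x width, row-major
  long height;               // row count of the full (virtual) matrix
};

// elementwise_mul with a scalar Y: rows and height are copied verbatim,
// only the stored values are scaled.
SparseRows MulByScalar(const SparseRows& x, float y) {
  SparseRows out;
  out.rows = x.rows;
  out.height = x.height;
  out.value.reserve(x.value.size());
  for (float v : x.value) out.value.push_back(v * y);
  return out;
}

int main() {
  SparseRows x{{0, 2}, {1.f, 2.f, 3.f, 4.f}, 1000};  // 2 rows, width 2
  SparseRows z = MulByScalar(x, 0.5f);
  assert(z.rows == x.rows);
  assert(z.value[3] == 2.f);
  return 0;
}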
paddle/fluid/operators/elementwise/elementwise_op.h
...
@@ -40,21 +40,28 @@ class ElementwiseOp : public framework::OperatorWithKernel {
     PADDLE_ENFORCE(ctx->HasOutput("Out"),
                    "Output(Out) of elementwise op should not be null.");
 
-    PADDLE_ENFORCE(
-        ctx->GetInputsVarType("X").front() ==
-            framework::proto::VarType::LOD_TENSOR,
-        "The input var's type should be LoDTensor, but the received is %s",
-        ctx->Inputs("X").front(), ctx->GetInputsVarType("X").front());
     PADDLE_ENFORCE(
         ctx->GetInputsVarType("Y").front() ==
             framework::proto::VarType::LOD_TENSOR,
-        "The input var's type should be LoDTensor, but the received is %s",
-        ctx->Inputs("Y").front(), ctx->GetInputsVarType("Y").front());
-
-    auto x_dim = ctx->GetInputDim("X");
-    auto y_dim = ctx->GetInputDim("Y");
-    PADDLE_ENFORCE_GE(x_dim.size(), y_dim.size(),
-                      "Rank of first input must >= rank of second input.");
+        "The input var's type should be LoDTensor, but the received is %s [%s]",
+        ctx->GetInputsVarType("Y").front(), ctx->Inputs("Y").front());
+
+    if (ctx->GetInputsVarType("X").front() ==
+        framework::proto::VarType::LOD_TENSOR) {
+      auto x_dim = ctx->GetInputDim("X");
+      auto y_dim = ctx->GetInputDim("Y");
+      PADDLE_ENFORCE_GE(x_dim.size(), y_dim.size(),
+                        "Rank of first input must >= rank of second input.");
+    } else if (ctx->GetInputsVarType("X").front() ==
+               framework::proto::VarType::SELECTED_ROWS) {
+      PADDLE_ENFORCE((ctx->GetInputDim("Y").size() == 1u) &&
+                         (ctx->GetInputDim("Y")[0] == 1),
+                     "For elementwise_op, if X is Sparse, "
+                     "Y must be scalar.");
+    } else {
+      PADDLE_THROW("X's type[%s] is not supported by elementwise_op.",
+                   ctx->GetInputsVarType("X").front());
+    }
 
     ctx->ShareDim("X", /*->*/ "Out");
     ctx->ShareLoD("X", /*->*/ "Out");
...
paddle/fluid/operators/fused/fused_embedding_fc_lstm_op.cc
...
@@ -217,13 +217,13 @@ class FusedEmbeddingFCLSTMKernel : public framework::OpKernel<T> {
   auto& act_gate_str = ctx.Attr<std::string>("gate_activation");   \
   auto& act_cell_str = ctx.Attr<std::string>("cell_activation");   \
   auto& act_cand_str = ctx.Attr<std::string>("candidate_activation"); \
-  if (platform::jit::MayIUse(platform::jit::avx)) {                \
-    math::VecActivations<T, platform::jit::avx> act_functor;       \
+  if (platform::MayIUse(platform::avx)) {                          \
+    math::VecActivations<T, platform::avx> act_functor;            \
     act_gate = act_functor(act_gate_str);                          \
     act_cell = act_functor(act_cell_str);                          \
     act_cand = act_functor(act_cand_str);                          \
   } else {                                                         \
-    math::VecActivations<T, platform::jit::isa_any> act_functor;   \
+    math::VecActivations<T, platform::isa_any> act_functor;        \
     act_gate = act_functor(act_gate_str);                          \
     act_cell = act_functor(act_cell_str);                          \
     act_cand = act_functor(act_cand_str);                          \
...
paddle/fluid/operators/fused/fusion_seqexpand_concat_fc_op.cc
...
@@ -151,11 +151,11 @@ class FusionSeqExpandConcatFCOpKernel : public framework::OpKernel<T> {
     std::function<void(const int, const T*, T*)> fc_act;
     auto& fc_act_str = ctx.Attr<std::string>("fc_activation");
-    if (platform::jit::MayIUse(platform::jit::avx)) {
-      math::VecActivations<T, platform::jit::avx> act_functor;
+    if (platform::MayIUse(platform::avx)) {
+      math::VecActivations<T, platform::avx> act_functor;
       fc_act = act_functor(fc_act_str);
     } else {
-      math::VecActivations<T, platform::jit::isa_any> act_functor;
+      math::VecActivations<T, platform::isa_any> act_functor;
       fc_act = act_functor(fc_act_str);
     }
...
paddle/fluid/operators/get_tensor_from_selected_rows_op.cc  (new file, mode 100644)

/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/tensor_util.h"

namespace paddle {
namespace operators {

class GetTensorFromSelectedRowsOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

  void InferShape(framework::InferShapeContext *ctx) const override {
    PADDLE_ENFORCE(ctx->HasInput("X"),
                   "GetTensorFromSelectedRowsOp must has input X.");
    PADDLE_ENFORCE(ctx->HasOutput("Out"),
                   "GetTensorFromSelectedRowsOp must has output Out.");
    PADDLE_ENFORCE(
        ctx->GetInputsVarType("X").front() ==
            framework::proto::VarType::SELECTED_ROWS,
        "The input X's type should be SelectedRows, but the received is %s",
        ctx->Inputs("X").front(), ctx->GetInputsVarType("X").front());
    PADDLE_ENFORCE(
        ctx->GetOutputsVarType("Out").front() ==
            framework::proto::VarType::LOD_TENSOR,
        "The output Out's type should be LoDTensor, but the received is %s",
        ctx->Outputs("Out").front(), ctx->GetOutputsVarType("Out").front());
    ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
  }

 protected:
  framework::OpKernelType GetExpectedKernelType(
      const framework::ExecutionContext &ctx) const override {
    return framework::OpKernelType(
        framework::GetDataTypeOfVar(ctx.InputVar("X")), ctx.device_context());
  }
};

class GetTensorFromSelectedRowsKernel {
 public:
  void operator()(const framework::ExecutionContext &ctx) const {
    auto *x = ctx.Input<framework::SelectedRows>("X");
    auto *out = ctx.Output<framework::LoDTensor>("Out");

    out->Resize(x->value().dims());
    out->mutable_data(ctx.GetPlace(), x->value().type());
    framework::TensorCopy(x->value(), ctx.GetPlace(), ctx.device_context(),
                          out);
  }
};

class GetTensorFromSelectedRowsOpProtoMaker
    : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() override {
    AddInput("X", "The input type is SelectedRows.");
    AddOutput("Out", "The output type is LoDTensor.");
    AddComment(
        R"DOC(
GetTensorFromSelectedRows Operator

GetTensorFromSelectedRows is used to get the tensor from SelectedRows.

)DOC");
  }
};

class GetTensorFromSelectedRowsOpVarTypeInference
    : public framework::VarTypeInference {
 public:
  void operator()(const framework::OpDesc &op_desc,
                  framework::BlockDesc *block) const final {
    auto out_var_name = op_desc.Output("Out").front();
    auto in_var_name = op_desc.Input("X").front();

    auto out_var = block->FindRecursiveOrCreateVar(out_var_name);
    auto in_var = block->FindRecursiveOrCreateVar(in_var_name);
    out_var.SetType(framework::proto::VarType::LOD_TENSOR);
    out_var.SetDataType(in_var.GetDataType());
  }
};

}  // namespace operators
}  // namespace paddle

namespace ops = paddle::operators;

REGISTER_OPERATOR(get_tensor_from_selected_rows,
                  ops::GetTensorFromSelectedRowsOp,
                  ops::GetTensorFromSelectedRowsOpProtoMaker,
                  ops::GetTensorFromSelectedRowsOpVarTypeInference);

REGISTER_OP_CPU_KERNEL_FUNCTOR(get_tensor_from_selected_rows, float,
                               ops::GetTensorFromSelectedRowsKernel, double,
                               ops::GetTensorFromSelectedRowsKernel, int,
                               ops::GetTensorFromSelectedRowsKernel, int64_t,
                               ops::GetTensorFromSelectedRowsKernel);

#ifdef PADDLE_WITH_CUDA
REGISTER_OP_CUDA_KERNEL_FUNCTOR(get_tensor_from_selected_rows, float,
                                ops::GetTensorFromSelectedRowsKernel, double,
                                ops::GetTensorFromSelectedRowsKernel, int,
                                ops::GetTensorFromSelectedRowsKernel, int64_t,
                                ops::GetTensorFromSelectedRowsKernel);
#endif
paddle/fluid/operators/hierarchical_sigmoid_op.cc
...
@@ -150,14 +150,14 @@ class HierarchicalSigmoidGradOp : public framework::OperatorWithKernel {
                    "Output(W@Grad should not be null.");
     PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("X")),
                    "Output(X@Grad should not be null.");
-    if (!ctx->Attrs().Get<bool>("is_sparse")) {
-      if (ctx->HasOutput(framework::GradVarName("Bias"))) {
-        ctx->SetOutputDim(framework::GradVarName("Bias"),
-                          ctx->GetInputDim("Bias"));
-      }
-      ctx->SetOutputDim(framework::GradVarName("W"), ctx->GetInputDim("W"));
-    }
+    if (ctx->HasOutput(framework::GradVarName("Bias"))) {
+      ctx->SetOutputDim(framework::GradVarName("Bias"),
+                        ctx->GetInputDim("Bias"));
+    }
+    ctx->SetOutputDim(framework::GradVarName("W"), ctx->GetInputDim("W"));
     ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
     ctx->ShareLoD("X", /*->*/ framework::GradVarName("X"));
   }
 
  protected:
...
paddle/fluid/operators/hierarchical_sigmoid_op.h
...
@@ -185,7 +185,6 @@ class HierarchicalSigmoidGradOpKernel : public framework::OpKernel<T> {
         ctx.Output<framework::SelectedRows>(framework::GradVarName("W"));
     w_grad->set_rows(real_rows);
     // Build a map of id -> row_index to speed up finding the index of one id
-    w_grad->SyncIndex();
     w_grad->set_height(w.dims()[0]);
     auto* w_grad_value = w_grad->mutable_value();
     framework::DDim temp_dim(w.dims());
...
paddle/fluid/operators/load_combine_op.cc
...
@@ -32,16 +32,26 @@ class LoadCombineOp : public framework::OperatorBase {
                const platform::Place& place) const override {
     auto filename = Attr<std::string>("file_path");
     auto load_as_fp16 = Attr<bool>("load_as_fp16");
-
-    std::ifstream fin(filename);
-    PADDLE_ENFORCE(static_cast<bool>(fin),
-                   "Cannot open file %s for load_combine op", filename);
+    auto model_from_memory = Attr<bool>("model_from_memory");
     auto out_var_names = Outputs("Out");
     PADDLE_ENFORCE_GT(
         static_cast<int>(out_var_names.size()), 0,
         "The number of output variables should be greater than 0.");
+    if (!model_from_memory) {
+      std::ifstream fin(filename);
+      PADDLE_ENFORCE(static_cast<bool>(fin),
+                     "Cannot open file %s for load_combine op", filename);
+      LoadParamsFromBuffer(scope, place, &fin, load_as_fp16, out_var_names);
+    } else {
+      PADDLE_ENFORCE(!filename.empty(), "Cannot load file from memory");
+      std::stringstream fin(filename);
+      LoadParamsFromBuffer(scope, place, &fin, load_as_fp16, out_var_names);
+    }
+  }
+
+  void LoadParamsFromBuffer(
+      const framework::Scope& scope, const platform::Place& place,
+      std::istream* buffer, bool load_as_fp16,
+      const std::vector<std::string>& out_var_names) const {
     platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
     auto& dev_ctx = *pool.Get(place);
...
@@ -54,11 +64,10 @@ class LoadCombineOp : public framework::OperatorBase {
       auto* tensor = out_var->GetMutable<framework::LoDTensor>();
 
       // Error checking
-      PADDLE_ENFORCE(static_cast<bool>(fin), "Cannot read more from file %s",
-                     filename);
+      PADDLE_ENFORCE(static_cast<bool>(buffer), "Cannot read more");
 
       // Get data from fin to tensor
-      DeserializeFromStream(fin, tensor, dev_ctx);
+      DeserializeFromStream(*buffer, tensor, dev_ctx);
 
       auto in_dtype = framework::ToDataType(tensor->type());
       auto out_dtype =
...
@@ -103,11 +112,17 @@ class LoadCombineOpProtoMaker : public framework::OpProtoAndCheckerMaker {
                   "LoDTensors will be loaded from \"file_path\".")
         .AddCustomChecker(
             [](const std::string& path) { return !path.empty(); });
+    AddAttr<bool>("model_from_memory",
+                  "(boolean, default false)"
+                  "If true, file_path is in memory, and LoDTensors will be "
+                  "loaded directly from memory")
+        .SetDefault(false);
     AddComment(R"DOC(
 LoadCombine Operator.
 
-LoadCombine operator loads LoDTensor variables from a file. The file should
-contain one or more LoDTensors serialized using the SaveCombine operator. The
+LoadCombine operator loads LoDTensor variables from a file, which could be
+loaded in memory already. The file should contain one or more LoDTensors
+serialized using the SaveCombine operator. The
 LoadCombine operator applies a deserialization strategy to appropriately load
 the LodTensors, and this strategy complements the serialization strategy used
 in the SaveCombine operator. Hence, the LoadCombine operator is tightly coupled
...
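The model_from_memory path reuses the existing deserialization code by switching only the stream type: an std::ifstream over file_path when reading from disk, or a string stream over the serialized bytes (smuggled in through the file_path attribute) when loading from memory; LoadParamsFromBuffer then only ever sees std::istream*. The stream-switch idea in isolation (a standalone sketch, independent of the operator):

#include <cassert>
#include <fstream>
#include <memory>
#include <sstream>
#include <string>

// Returns a stream over either a file on disk or an in-memory buffer, so
// downstream deserialization code only depends on std::istream.
std::unique_ptr<std::istream> OpenParams(const std::string& file_path,
                                         bool model_from_memory) {
  if (model_from_memory) {
    // In this mode, file_path holds the serialized bytes themselves.
    return std::unique_ptr<std::istream>(new std::istringstream(file_path));
  }
  return std::unique_ptr<std::istream>(new std::ifstream(file_path));
}

int main() {
  auto in = OpenParams("serialized-bytes-here", /*model_from_memory=*/true);
  std::string word;
  *in >> word;
  assert(word == "serialized-bytes-here");
  return 0;
}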
paddle/fluid/operators/math/CMakeLists.txt
...
@@ -59,6 +59,7 @@ math_library(matrix_bit_code)
 math_library(unpooling)
 math_library(vol2col)
 math_library(prelu)
 
 cc_test(math_function_test SRCS math_function_test.cc DEPS math_function)
+cc_test(selected_rows_functor_test SRCS selected_rows_functor_test.cc DEPS selected_rows_functor)
...
paddle/fluid/operators/math/cpu_vec.h
浏览文件 @
2dda19f7
...
...
@@ -77,7 +77,7 @@ inline void vec_scal<double>(const int n, const double a, double* x) {
 #endif

 // MKL scal only support inplace, choose this if src and dst are not equal
-template <typename T, platform::jit::cpu_isa_t isa = platform::jit::isa_any>
+template <typename T, platform::cpu_isa_t isa = platform::isa_any>
 inline void vec_scal(const int n, const T a, const T* x, T* y) {
   for (int i = 0; i < n; ++i) {
     y[i] = a * x[i];
...
...
@@ -85,12 +85,12 @@ inline void vec_scal(const int n, const T a, const T* x, T* y) {
 }

 template <>
-inline void vec_scal<float, platform::jit::avx>(const int n, const float a,
-                                                const float* x, float* y) {
+inline void vec_scal<float, platform::avx>(const int n, const float a,
+                                           const float* x, float* y) {
 #ifdef __AVX__
   constexpr int block = YMM_FLOAT_BLOCK;
   if (n < block) {
-    vec_scal<float, platform::jit::isa_any>(n, a, x, y);
+    vec_scal<float, platform::isa_any>(n, a, x, y);
     return;
   }
   const int rest = n % block;
...
...
@@ -114,24 +114,24 @@ inline void vec_scal<float, platform::jit::avx>(const int n, const float a,
     y[i] = a * x[i];
   }
 #else
-  vec_scal<float, platform::jit::isa_any>(n, a, x, y);
+  vec_scal<float, platform::isa_any>(n, a, x, y);
 #endif
 }

 template <>
-inline void vec_scal<float, platform::jit::avx2>(const int n, const float a,
-                                                 const float* x, float* y) {
-  vec_scal<float, platform::jit::avx>(n, a, x, y);
+inline void vec_scal<float, platform::avx2>(const int n, const float a,
+                                            const float* x, float* y) {
+  vec_scal<float, platform::avx>(n, a, x, y);
 }

 template <>
-inline void vec_scal<float, platform::jit::avx512f>(const int n, const float a,
-                                                    const float* x, float* y) {
+inline void vec_scal<float, platform::avx512f>(const int n, const float a,
+                                               const float* x, float* y) {
   // TODO(TJ): enable me
-  vec_scal<float, platform::jit::avx2>(n, a, x, y);
+  vec_scal<float, platform::avx2>(n, a, x, y);
 }

-template <typename T, platform::jit::cpu_isa_t isa = platform::jit::isa_any>
+template <typename T, platform::cpu_isa_t isa = platform::isa_any>
 inline void vec_bias_sub(const int n, const T a, const T* x, T* y) {
   for (int i = 0; i < n; ++i) {
     y[i] = a - x[i];
...
...
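All of these kernels share one pattern: the ISA is a compile-time template tag, and each wide specialization processes full SIMD blocks while delegating short inputs and leftover tail elements to the generic isa_any loop. A self-contained sketch of the tag-dispatch idea, independent of Paddle's actual cpu_isa_t values (the enum and block size below are illustrative stand-ins):

#include <cstdio>

enum cpu_isa_t { isa_any, avx };   // illustrative stand-in for platform::cpu_isa_t
constexpr int kBlock = 8;          // stand-in for YMM_FLOAT_BLOCK (8 floats per YMM)

template <typename T, cpu_isa_t isa = isa_any>
inline void vec_scal(const int n, const T a, const T* x, T* y) {
  for (int i = 0; i < n; ++i) y[i] = a * x[i];  // portable scalar fallback
}

template <>
inline void vec_scal<float, avx>(const int n, const float a, const float* x,
                                 float* y) {
  if (n < kBlock) {  // too short for a full SIMD block: defer to scalar code
    vec_scal<float, isa_any>(n, a, x, y);
    return;
  }
  const int rest = n % kBlock;
  const int end = n - rest;
  for (int i = 0; i < end; i += kBlock) {
    // In the real kernel this inner loop is one _mm256_mul_ps per block.
    for (int j = 0; j < kBlock; ++j) y[i + j] = a * x[i + j];
  }
  vec_scal<float, isa_any>(rest, a, x + end, y + end);  // handle the tail
}

int main() {
  float x[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, y[10];
  vec_scal<float, avx>(10, 2.f, x, y);
  std::printf("%g %g\n", y[0], y[9]);  // prints: 0 18
  return 0;
}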
@@ -139,12 +139,12 @@ inline void vec_bias_sub(const int n, const T a, const T* x, T* y) {
 }

 template <>
-inline void vec_bias_sub<float, platform::jit::avx>(const int n, const float a,
-                                                    const float* x, float* y) {
+inline void vec_bias_sub<float, platform::avx>(const int n, const float a,
+                                               const float* x, float* y) {
 #ifdef __AVX__
   constexpr int block = YMM_FLOAT_BLOCK;
   if (n < block) {
-    vec_bias_sub<float, platform::jit::isa_any>(n, a, x, y);
+    vec_bias_sub<float, platform::isa_any>(n, a, x, y);
     return;
   }
   const int rest = n % block;
...
...
@@ -168,27 +168,25 @@ inline void vec_bias_sub<float, platform::jit::avx>(const int n, const float a,
     y[i] = a - x[i];
   }
 #else
-  vec_bias_sub<float, platform::jit::isa_any>(n, a, x, y);
+  vec_bias_sub<float, platform::isa_any>(n, a, x, y);
 #endif
 }

 template <>
-inline void vec_bias_sub<float, platform::jit::avx2>(const int n, const float a,
-                                                     const float* x, float* y) {
-  vec_bias_sub<float, platform::jit::avx>(n, a, x, y);
+inline void vec_bias_sub<float, platform::avx2>(const int n, const float a,
+                                                const float* x, float* y) {
+  vec_bias_sub<float, platform::avx>(n, a, x, y);
 }

 template <>
-inline void vec_bias_sub<float, platform::jit::avx512f>(const int n,
-                                                        const float a,
-                                                        const float* x,
-                                                        float* y) {
+inline void vec_bias_sub<float, platform::avx512f>(const int n, const float a,
+                                                   const float* x, float* y) {
   // TODO(TJ): enable me
-  vec_bias_sub<float, platform::jit::avx2>(n, a, x, y);
+  vec_bias_sub<float, platform::avx2>(n, a, x, y);
 }

 // out = x*y + (1-x)*z
-template <typename T, platform::jit::cpu_isa_t isa = platform::jit::isa_any>
+template <typename T, platform::cpu_isa_t isa = platform::isa_any>
 inline void vec_cross(const int n, const T* x, const T* y, const T* z, T* out) {
   for (int i = 0; i < n; ++i) {
     out[i] = x[i] * y[i] + (static_cast<T>(1) - x[i]) * z[i];
...
...
@@ -196,13 +194,13 @@ inline void vec_cross(const int n, const T* x, const T* y, const T* z, T* out) {
 }

 template <>
-inline void vec_cross<float, platform::jit::avx>(const int n, const float* x,
-                                                 const float* y, const float* z,
-                                                 float* out) {
+inline void vec_cross<float, platform::avx>(const int n, const float* x,
+                                            const float* y, const float* z,
+                                            float* out) {
 #ifdef __AVX__
   constexpr int block = YMM_FLOAT_BLOCK;
   if (n < block) {
-    vec_cross<float, platform::jit::isa_any>(n, x, y, z, out);
+    vec_cross<float, platform::isa_any>(n, x, y, z, out);
     return;
   }
   const int rest = n % block;
...
...
@@ -228,25 +226,26 @@ inline void vec_cross<float, platform::jit::avx>(const int n, const float* x,
     out[i] = x[i] * y[i] + (1.f - x[i]) * z[i];
   }
 #else
-  vec_cross<float, platform::jit::isa_any>(n, x, y, z, out);
+  vec_cross<float, platform::isa_any>(n, x, y, z, out);
 #endif
 }

 template <>
-inline void vec_cross<float, platform::jit::avx2>(const int n, const float* x,
-                                                  const float* y,
-                                                  const float* z, float* out) {
-  vec_cross<float, platform::jit::avx>(n, x, y, z, out);
+inline void vec_cross<float, platform::avx2>(const int n, const float* x,
+                                             const float* y, const float* z,
+                                             float* out) {
+  vec_cross<float, platform::avx>(n, x, y, z, out);
 }

 template <>
-inline void vec_cross<float, platform::jit::avx512f>(const int n,
-                                                     const float* x,
-                                                     const float* y,
-                                                     const float* z,
-                                                     float* out) {
+inline void vec_cross<float, platform::avx512f>(const int n, const float* x,
+                                                const float* y, const float* z,
+                                                float* out) {
   // TODO(TJ): enable me
-  vec_cross<float, platform::jit::avx>(n, x, y, z, out);
+  vec_cross<float, platform::avx>(n, x, y, z, out);
 }

-template <typename T, platform::jit::cpu_isa_t isa = platform::jit::isa_any>
+template <typename T, platform::cpu_isa_t isa = platform::isa_any>
 inline void vec_add_bias(const int n, const T a, const T* x, T* y) {
   for (int i = 0; i < n; ++i) {
     y[i] = x[i] + a;
...
...
@@ -254,12 +253,12 @@ inline void vec_add_bias(const int n, const T a, const T* x, T* y) {
 }

 template <>
-inline void vec_add_bias<float, platform::jit::avx>(const int n, const float a,
-                                                    const float* x, float* y) {
+inline void vec_add_bias<float, platform::avx>(const int n, const float a,
+                                               const float* x, float* y) {
 #ifdef __AVX__
   constexpr int block = YMM_FLOAT_BLOCK;
   if (n < block) {
-    vec_add_bias<float, platform::jit::isa_any>(n, a, x, y);
+    vec_add_bias<float, platform::isa_any>(n, a, x, y);
     return;
   }
   const int rest = n % block;
...
...
@@ -283,32 +282,30 @@ inline void vec_add_bias<float, platform::jit::avx>(const int n, const float a,
     y[i] = x[i] + a;
   }
 #else
-  vec_add_bias<float, platform::jit::isa_any>(n, a, x, y);
+  vec_add_bias<float, platform::isa_any>(n, a, x, y);
 #endif
 }

 template <>
-inline void vec_add_bias<float, platform::jit::avx2>(const int n, const float a,
-                                                     const float* x, float* y) {
-  vec_add_bias<float, platform::jit::avx>(n, a, x, y);
+inline void vec_add_bias<float, platform::avx2>(const int n, const float a,
+                                                const float* x, float* y) {
+  vec_add_bias<float, platform::avx>(n, a, x, y);
 }

 template <>
-inline void vec_add_bias<float, platform::jit::avx512f>(const int n,
-                                                        const float a,
-                                                        const float* x,
-                                                        float* y) {
+inline void vec_add_bias<float, platform::avx512f>(const int n, const float a,
+                                                   const float* x, float* y) {
   // TODO(TJ): enable me
-  vec_add_bias<float, platform::jit::avx2>(n, a, x, y);
+  vec_add_bias<float, platform::avx2>(n, a, x, y);
 }

-template <typename T, platform::jit::cpu_isa_t isa = platform::jit::isa_any>
+template <typename T, platform::cpu_isa_t isa = platform::isa_any>
 inline void vec_identity(const int n, const T* x, T* y) {
   // do nothing
   return;
 }

-template <typename T, platform::jit::cpu_isa_t isa = platform::jit::isa_any>
+template <typename T, platform::cpu_isa_t isa = platform::isa_any>
 inline void vec_sigmoid(const int n, const T* x, T* y) {
   const T min = SIGMOID_THRESHOLD_MIN;
   const T max = SIGMOID_THRESHOLD_MAX;
...
...
@@ -323,12 +320,12 @@ inline void vec_sigmoid(const int n, const T* x, T* y) {
 }

 template <>
-inline void vec_sigmoid<float, platform::jit::avx>(const int n, const float* x,
-                                                   float* y) {
+inline void vec_sigmoid<float, platform::avx>(const int n, const float* x,
+                                              float* y) {
 #ifdef __AVX__
   constexpr int block = YMM_FLOAT_BLOCK;
   if (n < block) {
-    vec_sigmoid<float, platform::jit::isa_any>(n, x, y);
+    vec_sigmoid<float, platform::isa_any>(n, x, y);
     return;
   }
   const int rest = n % block;
...
...
@@ -377,25 +374,24 @@ inline void vec_sigmoid<float, platform::jit::avx>(const int n, const float* x,
     y[i] = 1.f / (1.f + y[i]);
   }
 #else
-  vec_sigmoid<float, platform::jit::isa_any>(n, x, y);
+  vec_sigmoid<float, platform::isa_any>(n, x, y);
 #endif
 }

 template <>
-inline void vec_sigmoid<float, platform::jit::avx2>(const int n, const float* x,
-                                                    float* y) {
-  vec_sigmoid<float, platform::jit::avx>(n, x, y);
+inline void vec_sigmoid<float, platform::avx2>(const int n, const float* x,
+                                               float* y) {
+  vec_sigmoid<float, platform::avx>(n, x, y);
 }

 template <>
-inline void vec_sigmoid<float, platform::jit::avx512f>(const int n,
-                                                       const float* x,
-                                                       float* y) {
+inline void vec_sigmoid<float, platform::avx512f>(const int n, const float* x,
+                                                  float* y) {
   // TODO(TJ): enable me
-  vec_sigmoid<float, platform::jit::avx2>(n, x, y);
+  vec_sigmoid<float, platform::avx2>(n, x, y);
 }

-template <typename T, platform::jit::cpu_isa_t isa = platform::jit::isa_any>
+template <typename T, platform::cpu_isa_t isa = platform::isa_any>
 inline void vec_tanh(const int n, const T* x, T* y) {
   vec_scal<T, isa>(n, static_cast<T>(2), x, y);
   vec_sigmoid<T, isa>(n, y, y);
...
...
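vec_tanh is composed entirely from the other primitives via the identity tanh(x) = 2*sigmoid(2x) - 1: scale by 2, apply sigmoid in place, and (presumably in the elided tail of the function) scale by 2 again and subtract 1 with vec_add_bias. A quick numeric check of the identity, as a sketch:

#include <cmath>
#include <cstdio>

int main() {
  for (double x : {-2.0, -0.5, 0.0, 0.5, 2.0}) {
    // tanh(x) rebuilt from the logistic sigmoid, mirroring vec_tanh's plan.
    double via_sigmoid = 2.0 / (1.0 + std::exp(-2.0 * x)) - 1.0;
    std::printf("x=%5.2f  tanh=%.6f  2*sigmoid(2x)-1=%.6f\n", x, std::tanh(x),
                via_sigmoid);
  }
  return 0;
}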
@@ -404,7 +400,7 @@ inline void vec_tanh(const int n, const T* x, T* y) {
 }

 // TODO(TJ): make relu clip
-template <typename T, platform::jit::cpu_isa_t isa = platform::jit::isa_any>
+template <typename T, platform::cpu_isa_t isa = platform::isa_any>
 inline void vec_relu(const int n, const T* x, T* y) {
   for (int i = 0; i < n; ++i) {
     y[i] = x[i] > 0 ? x[i] : 0;
...
...
@@ -412,12 +408,12 @@ inline void vec_relu(const int n, const T* x, T* y) {
 }

 template <>
-inline void vec_relu<float, platform::jit::avx>(const int n, const float* x,
-                                                float* y) {
+inline void vec_relu<float, platform::avx>(const int n, const float* x,
+                                           float* y) {
 #ifdef __AVX__
   constexpr int block = YMM_FLOAT_BLOCK;
   if (n < block * 4) {
-    vec_relu<float, platform::jit::isa_any>(n, x, y);
+    vec_relu<float, platform::isa_any>(n, x, y);
     return;
   }
...
...
@@ -441,26 +437,26 @@ inline void vec_relu<float, platform::jit::avx>(const int n, const float* x,
 #undef MOVE_ONE_STEP
 #else
-  vec_relu<float, platform::jit::isa_any>(n, x, y);
+  vec_relu<float, platform::isa_any>(n, x, y);
 #endif
 }

 template <>
-inline void vec_relu<float, platform::jit::avx2>(const int n, const float* x,
-                                                 float* y) {
-  vec_relu<float, platform::jit::avx>(n, x, y);
+inline void vec_relu<float, platform::avx2>(const int n, const float* x,
+                                            float* y) {
+  vec_relu<float, platform::avx>(n, x, y);
 }

 template <>
-inline void vec_relu<float, platform::jit::avx512f>(const int n, const float* x,
-                                                    float* y) {
+inline void vec_relu<float, platform::avx512f>(const int n, const float* x,
+                                               float* y) {
   // TODO(TJ): enable me
-  vec_relu<float, platform::jit::avx2>(n, x, y);
+  vec_relu<float, platform::avx2>(n, x, y);
 }

 // TODO(TJ): optimize double of sigmoid, tanh and relu if necessary
-template <typename T, platform::jit::cpu_isa_t isa = platform::jit::isa_any>
+template <typename T, platform::cpu_isa_t isa = platform::isa_any>
 class VecActivations {
  public:
  std::function<void(const int, const T*, T*)> operator()(
...
...
paddle/fluid/operators/math/cpu_vec_test.cc
...
...
@@ -104,38 +104,42 @@ void TestAndBench(const int n, std::function<void(const int, const T*, T*)> tgt,
 }

 TEST(CpuVecTest, sigmoid) {
-  namespace jit = paddle::platform::jit;
+  namespace platform = paddle::platform;
   using namespace paddle::operators::math;  // NOLINT
   for (auto sz : {1, 2, 15, 16, 30, 32, 128, 200, 512}) {
     TestAndBench<float>(sz, vec_sigmoid<float>, ref_sigmoid<float>);
-    TestAndBench<float>(sz, vec_sigmoid<float, jit::avx>, ref_sigmoid<float>);
-    TestAndBench<float>(sz, vec_sigmoid<float, jit::avx2>, ref_sigmoid<float>);
-    TestAndBench<float>(sz, vec_sigmoid<float, jit::avx512f>,
+    TestAndBench<float>(sz, vec_sigmoid<float, platform::avx>,
+                        ref_sigmoid<float>);
+    TestAndBench<float>(sz, vec_sigmoid<float, platform::avx2>,
+                        ref_sigmoid<float>);
+    TestAndBench<float>(sz, vec_sigmoid<float, platform::avx512f>,
                         ref_sigmoid<float>);
   }
   TestAndBench<double>(30, vec_sigmoid<double>, ref_sigmoid<double>);
 }

 TEST(CpuVecTest, tanh) {
-  namespace jit = paddle::platform::jit;
+  namespace platform = paddle::platform;
   using namespace paddle::operators::math;  // NOLINT
   for (auto sz : {1, 2, 15, 16, 30, 32, 128, 200, 512}) {
     TestAndBench<float>(sz, vec_tanh<float>, ref_tanh<float>);
-    TestAndBench<float>(sz, vec_tanh<float, jit::avx>, ref_tanh<float>);
-    TestAndBench<float>(sz, vec_tanh<float, jit::avx2>, ref_tanh<float>);
-    TestAndBench<float>(sz, vec_tanh<float, jit::avx512f>, ref_tanh<float>);
+    TestAndBench<float>(sz, vec_tanh<float, platform::avx>, ref_tanh<float>);
+    TestAndBench<float>(sz, vec_tanh<float, platform::avx2>, ref_tanh<float>);
+    TestAndBench<float>(sz, vec_tanh<float, platform::avx512f>,
+                        ref_tanh<float>);
   }
   TestAndBench<double>(30, vec_tanh<double>, ref_tanh<double>);
 }

 TEST(CpuVecTest, relu) {
-  namespace jit = paddle::platform::jit;
+  namespace platform = paddle::platform;
   using namespace paddle::operators::math;  // NOLINT
   for (auto sz : {1, 2, 15, 16, 30, 32, 128, 200, 512}) {
     TestAndBench<float>(sz, vec_relu<float>, ref_relu<float>);
-    TestAndBench<float>(sz, vec_relu<float, jit::avx>, ref_relu<float>);
-    TestAndBench<float>(sz, vec_relu<float, jit::avx2>, ref_relu<float>);
-    TestAndBench<float>(sz, vec_relu<float, jit::avx512f>, ref_relu<float>);
+    TestAndBench<float>(sz, vec_relu<float, platform::avx>, ref_relu<float>);
+    TestAndBench<float>(sz, vec_relu<float, platform::avx2>, ref_relu<float>);
+    TestAndBench<float>(sz, vec_relu<float, platform::avx512f>,
+                        ref_relu<float>);
   }
   TestAndBench<double>(30, vec_relu<double>, ref_relu<double>);
 }
...
...
@@ -162,38 +166,40 @@ void TestInplace(const int n, std::function<void(const int, const T*, T*)> tgt,
 }

 TEST(CpuVecTest, inplace_sigmoid) {
-  namespace jit = paddle::platform::jit;
+  namespace platform = paddle::platform;
   using namespace paddle::operators::math;  // NOLINT
   for (auto sz : {1, 2, 15, 16, 30, 32, 128, 200, 512}) {
     TestInplace<float>(sz, vec_sigmoid<float>, ref_sigmoid<float>);
-    TestInplace<float>(sz, vec_sigmoid<float, jit::avx>, ref_sigmoid<float>);
-    TestInplace<float>(sz, vec_sigmoid<float, jit::avx2>, ref_sigmoid<float>);
-    TestInplace<float>(sz, vec_sigmoid<float, jit::avx512f>,
+    TestInplace<float>(sz, vec_sigmoid<float, platform::avx>,
+                       ref_sigmoid<float>);
+    TestInplace<float>(sz, vec_sigmoid<float, platform::avx2>,
+                       ref_sigmoid<float>);
+    TestInplace<float>(sz, vec_sigmoid<float, platform::avx512f>,
                        ref_sigmoid<float>);
   }
   TestInplace<double>(30, vec_sigmoid<double>, ref_sigmoid<double>);
 }

 TEST(CpuVecTest, inplace_tanh) {
-  namespace jit = paddle::platform::jit;
+  namespace platform = paddle::platform;
   using namespace paddle::operators::math;  // NOLINT
   for (auto sz : {1, 2, 15, 16, 30, 32, 128, 200, 512}) {
     TestInplace<float>(sz, vec_tanh<float>, ref_tanh<float>);
-    TestInplace<float>(sz, vec_tanh<float, jit::avx>, ref_tanh<float>);
-    TestInplace<float>(sz, vec_tanh<float, jit::avx2>, ref_tanh<float>);
-    TestInplace<float>(sz, vec_tanh<float, jit::avx512f>, ref_tanh<float>);
+    TestInplace<float>(sz, vec_tanh<float, platform::avx>, ref_tanh<float>);
+    TestInplace<float>(sz, vec_tanh<float, platform::avx2>, ref_tanh<float>);
+    TestInplace<float>(sz, vec_tanh<float, platform::avx512f>,
+                       ref_tanh<float>);
   }
   TestInplace<double>(30, vec_tanh<double>, ref_tanh<double>);
 }

 TEST(CpuVecTest, inplace_relu) {
-  namespace jit = paddle::platform::jit;
+  namespace platform = paddle::platform;
   using namespace paddle::operators::math;  // NOLINT
   for (auto sz : {1, 2, 15, 16, 30, 32, 128, 200, 512}) {
     TestInplace<float>(sz, vec_relu<float>, ref_relu<float>);
-    TestInplace<float>(sz, vec_relu<float, jit::avx>, ref_relu<float>);
-    TestInplace<float>(sz, vec_relu<float, jit::avx2>, ref_relu<float>);
-    TestInplace<float>(sz, vec_relu<float, jit::avx512f>, ref_relu<float>);
+    TestInplace<float>(sz, vec_relu<float, platform::avx>, ref_relu<float>);
+    TestInplace<float>(sz, vec_relu<float, platform::avx2>, ref_relu<float>);
+    TestInplace<float>(sz, vec_relu<float, platform::avx512f>,
+                       ref_relu<float>);
   }
   TestInplace<double>(30, vec_relu<double>, ref_relu<double>);
 }
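Both TestAndBench and TestInplace take the kernel under test and a scalar reference as std::function and compare outputs element-wise across the size sweep. A self-contained sketch of that check-against-reference pattern (the harness below is illustrative, not the actual Paddle test code):

#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

using VecFn = std::function<void(const int, const float*, float*)>;

// Compare a vectorized kernel against a scalar reference on synthetic data.
static void CheckAgainstRef(int n, VecFn tgt, VecFn ref) {
  std::vector<float> x(n), y_tgt(n), y_ref(n);
  for (int i = 0; i < n; ++i) x[i] = 0.1f * (i - n / 2);
  tgt(n, x.data(), y_tgt.data());
  ref(n, x.data(), y_ref.data());
  for (int i = 0; i < n; ++i) assert(std::fabs(y_tgt[i] - y_ref[i]) < 1e-3f);
}

int main() {
  auto ref_relu = [](const int n, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] = x[i] > 0.f ? x[i] : 0.f;
  };
  // Stand-in "optimized" kernel; in the real test this is vec_relu<float, isa>.
  auto vec_relu = ref_relu;
  for (int sz : {1, 2, 15, 16, 30, 32, 128, 200, 512}) {
    CheckAgainstRef(sz, vec_relu, ref_relu);
  }
  return 0;
}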
paddle/fluid/operators/math/jit_code.cc
...
...
@@ -22,7 +22,7 @@ namespace math {
 namespace jitkernel {
 namespace gen {

-using namespace platform::jit;  // NOLINT
+using namespace platform;  // NOLINT

 bool VXXJitCode::init(int d, int scalar_index) {
   // It's not necessary to use avx512 since it would slow down the frequency
...
...
paddle/fluid/operators/math/jit_code.h
...
...
@@ -179,7 +179,7 @@ class VActJitCode : public JitCode {
   template <typename JMM>
   void exp_jmm(JMM& dst, JMM& src, int src_idx = 11, int fx_idx = 12,  // NOLINT
                int fy_idx = 13, int mask_idx = 14, int tmp_idx = 15) {
-    using namespace platform::jit;  // NOLINT
+    using namespace platform;  // NOLINT
     // check all idx can not equal
     JMM jmm_src = JMM(src_idx);
     JMM jmm_fx = JMM(fx_idx);
...
...
paddle/fluid/operators/math/jit_gen.cc
...
...
@@ -36,7 +36,7 @@ void JitCode::preCode() {
   for (int i = 0; i < num_g_abi_regs; ++i) {
     push(Xbyak::Reg64(g_abi_regs[i]));
   }
-  if (platform::jit::MayIUse(platform::jit::avx512f)) {
+  if (platform::MayIUse(platform::avx512f)) {
     mov(reg_EVEX_max_8b_offt, 2 * EVEX_max_8b_offt);
   }
 }
...
...
paddle/fluid/operators/math/jit_kernel.cc
...
...
@@ -21,8 +21,6 @@ namespace operators {
 namespace math {
 namespace jitkernel {

-namespace jit = platform::jit;
-
 KernelPool& KernelPool::Instance() {
   static thread_local KernelPool g_jit_kernels;
   return g_jit_kernels;
...
...
paddle/fluid/operators/math/jit_kernel_blas.cc
...
...
@@ -30,7 +30,6 @@ namespace paddle {
 namespace operators {
 namespace math {
 namespace jitkernel {
-namespace jit = platform::jit;

 #ifdef PADDLE_WITH_MKLML
 template <typename T>
...
...
@@ -125,7 +124,7 @@ bool VMulKernelImpl<float>::useJIT(int d) {
 #ifdef PADDLE_WITH_MKLML
 template <>
 bool VMulKernelImpl<float>::useMKL(int d) {
-  return jit::MayIUse(jit::avx512f) && d > 512;
+  return platform::MayIUse(platform::avx512f) && d > 512;
 }

 template <>
...
...
paddle/fluid/operators/math/jit_kernel_crf_decode.cc
...
...
@@ -25,10 +25,8 @@ namespace operators {
 namespace math {
 namespace jitkernel {

-namespace jit = platform::jit;
-
 /* CRF Decode JitKernel */
-template <typename T, platform::jit::cpu_isa_t isa, jit_block>
+template <typename T, platform::cpu_isa_t isa, jit_block>
 class CRFDecodeKernelImpl : public CRFDecodeKernel<T> {
  public:
   explicit CRFDecodeKernelImpl(int tag_num) : CRFDecodeKernel<T>() {
...
...
@@ -101,7 +99,7 @@ class CRFDecodeKernelImpl : public CRFDecodeKernel<T> {
 #define INTRIAVX_FLOAT(block)                                                \
   template <>                                                                \
-  CRFDecodeKernelImpl<float, jit::avx, block>::CRFDecodeKernelImpl(          \
+  CRFDecodeKernelImpl<float, platform::avx, block>::CRFDecodeKernelImpl(     \
       int tag_num)                                                           \
       : CRFDecodeKernel<float>() {                                           \
     this->num_ = tag_num;                                                    \
...
...
@@ -109,7 +107,7 @@ class CRFDecodeKernelImpl : public CRFDecodeKernel<T> {
     this->rest_ = this->num_ % YMM_FLOAT_BLOCK;                              \
   }                                                                          \
   template <>                                                                \
-  void CRFDecodeKernelImpl<float, jit::avx, block>::Compute(                 \
+  void CRFDecodeKernelImpl<float, platform::avx, block>::Compute(            \
       const int seq_len, const float* x, const float* w, float* alpha,       \
       int* track) const {                                                    \
     INIT_ALPHA(YMM_FLOAT_BLOCK)                                              \
...
...
@@ -204,7 +202,7 @@ class CRFDecodeKernelImpl : public CRFDecodeKernel<T> {
 #define INTRIAVX512_FLOAT(block)                                              \
   template <>                                                                 \
-  CRFDecodeKernelImpl<float, jit::avx512f, block>::CRFDecodeKernelImpl(       \
+  CRFDecodeKernelImpl<float, platform::avx512f, block>::CRFDecodeKernelImpl(  \
       int tag_num)                                                            \
       : CRFDecodeKernel<float>() {                                            \
     this->num_ = tag_num;                                                     \
...
...
@@ -212,7 +210,7 @@ class CRFDecodeKernelImpl : public CRFDecodeKernel<T> {
     this->rest_ = this->num_ % ZMM_FLOAT_BLOCK;                               \
   }                                                                           \
   template <>                                                                 \
-  void CRFDecodeKernelImpl<float, jit::avx512f, block>::Compute(              \
+  void CRFDecodeKernelImpl<float, platform::avx512f, block>::Compute(         \
       const int seq_len, const float* x, const float* w, float* alpha,        \
       int* track) const {                                                     \
     INIT_ALPHA(ZMM_FLOAT_BLOCK)                                               \
...
...
@@ -270,14 +268,14 @@ INTRIAVX_FLOAT(kEQ16);
 INTRIAVX_FLOAT(kGT16);
 #endif
 #ifdef __AVX2__
-INTRIAVX2_FLOAT(jit::avx2, kEQ8);
-INTRIAVX2_FLOAT(jit::avx2, kGT8LT16);
-INTRIAVX2_FLOAT(jit::avx2, kEQ16);
-INTRIAVX2_FLOAT(jit::avx2, kGT16);
+INTRIAVX2_FLOAT(platform::avx2, kEQ8);
+INTRIAVX2_FLOAT(platform::avx2, kGT8LT16);
+INTRIAVX2_FLOAT(platform::avx2, kEQ16);
+INTRIAVX2_FLOAT(platform::avx2, kGT16);
 #endif
 #ifdef __AVX512F__
-INTRIAVX2_FLOAT(jit::avx512f, kEQ8);
-INTRIAVX2_FLOAT(jit::avx512f, kGT8LT16);
+INTRIAVX2_FLOAT(platform::avx512f, kEQ8);
+INTRIAVX2_FLOAT(platform::avx512f, kGT8LT16);
 INTRIAVX512_FLOAT(kEQ16);
 INTRIAVX512_FLOAT(kGT16);
 #endif
...
...
paddle/fluid/operators/math/jit_kernel_exp.cc
...
...
@@ -29,7 +29,6 @@ namespace paddle {
 namespace operators {
 namespace math {
 namespace jitkernel {
-namespace jit = platform::jit;

 #ifdef PADDLE_WITH_MKLML
 // try to use MKL to speedup
...
...
paddle/fluid/operators/math/jit_kernel_layer_norm.cc
...
...
@@ -22,10 +22,8 @@ namespace operators {
 namespace math {
 namespace jitkernel {

-namespace jit = platform::jit;
-
 /* Layer Norm JitKernel */
-template <typename T, platform::jit::cpu_isa_t isa, jit_block>
+template <typename T, platform::cpu_isa_t isa, jit_block>
 class LayerNormKernelImpl : public LayerNormKernel<T> {
  public:
   explicit LayerNormKernelImpl(int right) : LayerNormKernel<T>() {
...
...
@@ -90,7 +88,7 @@ class LayerNormKernelImpl : public LayerNormKernel<T> {
     this->end_ = this->num_ - this->rest_;                                  \
   }                                                                         \
   template <>                                                               \
-  void LayerNormKernelImpl<float, jit::avx, block>::Compute(                \
+  void LayerNormKernelImpl<float, platform::avx, block>::Compute(           \
       float* x, float* out, float* mean, float* var, const float* scale,    \
       const float* bias, int height, const float epsilon) const {           \
     __m256 sum;                                                             \
...
...
@@ -219,16 +217,16 @@ class LayerNormKernelImpl : public LayerNormKernel<T> {
 }
 #ifdef __AVX__
-INTRIAVX_FLOAT(jit::avx, kEQ8);
-INTRIAVX_FLOAT(jit::avx, kGT8LT16);
-INTRIAVX_FLOAT(jit::avx, kEQ16);
-INTRIAVX_FLOAT(jit::avx, kGT16);
+INTRIAVX_FLOAT(platform::avx, kEQ8);
+INTRIAVX_FLOAT(platform::avx, kGT8LT16);
+INTRIAVX_FLOAT(platform::avx, kEQ16);
+INTRIAVX_FLOAT(platform::avx, kGT16);
 #endif
 #ifdef __AVX2__
-INTRIAVX_FLOAT(jit::avx2, kEQ8);
-INTRIAVX_FLOAT(jit::avx2, kGT8LT16);
-INTRIAVX_FLOAT(jit::avx2, kEQ16);
-INTRIAVX_FLOAT(jit::avx2, kGT16);
+INTRIAVX_FLOAT(platform::avx2, kEQ8);
+INTRIAVX_FLOAT(platform::avx2, kGT8LT16);
+INTRIAVX_FLOAT(platform::avx2, kEQ16);
+INTRIAVX_FLOAT(platform::avx2, kGT16);
 #endif
 #undef INTRIAVX_FLOAT
...
...
paddle/fluid/operators/math/jit_kernel_macro.h
...
...
@@ -92,7 +92,6 @@ namespace jitkernel {
                         JITKERNEL_DECLARE, JITKERNEL_FIND_KEY, \
                         JITKERNEL_IMPL)

-namespace jit = platform::jit;
 // TODO(TJ): below defines are deprecated, would be remove recently
 #define SEARCH_BLOCK(macro_, ker, dtype, isa) \
   if (d < YMM_FLOAT_BLOCK) {                  \
...
...
@@ -107,15 +106,15 @@ namespace jit = platform::jit;
     macro_(ker, dtype, isa, kGT16);           \
   }

-#define SEARCH_ISA_BLOCK(macro_, ker, dtype)        \
-  if (jit::MayIUse(jit::avx512f)) {                 \
-    SEARCH_BLOCK(macro_, ker, dtype, jit::avx512f); \
-  } else if (jit::MayIUse(jit::avx2)) {             \
-    SEARCH_BLOCK(macro_, ker, dtype, jit::avx2);    \
-  } else if (jit::MayIUse(jit::avx)) {              \
-    SEARCH_BLOCK(macro_, ker, dtype, jit::avx);     \
-  } else {                                          \
-    SEARCH_BLOCK(macro_, ker, dtype, jit::isa_any); \
+#define SEARCH_ISA_BLOCK(macro_, ker, dtype)             \
+  if (platform::MayIUse(platform::avx512f)) {            \
+    SEARCH_BLOCK(macro_, ker, dtype, platform::avx512f); \
+  } else if (platform::MayIUse(platform::avx2)) {        \
+    SEARCH_BLOCK(macro_, ker, dtype, platform::avx2);    \
+  } else if (platform::MayIUse(platform::avx)) {         \
+    SEARCH_BLOCK(macro_, ker, dtype, platform::avx);     \
+  } else {                                               \
+    SEARCH_BLOCK(macro_, ker, dtype, platform::isa_any); \
   }
#define JITKERNEL_KEY(ker_key, dtype_key) \
...
...
@@ -156,10 +155,10 @@ namespace jit = platform::jit;
                             marco_declare, macro_key, macro_impl)

-#define FOR_EACH_ISA(macro_, block) \
-  macro_(jit::avx512f, block);      \
-  macro_(jit::avx2, block);         \
-  macro_(jit::avx, block);          \
-  macro_(jit::isa_any, block)
+#define FOR_EACH_ISA(macro_, block)  \
+  macro_(platform::avx512f, block);  \
+  macro_(platform::avx2, block);     \
+  macro_(platform::avx, block);      \
+  macro_(platform::isa_any, block)
#define FOR_EACH_BLOCK(macro_, isa) \
macro_(isa, kLT8); \
...
...
@@ -168,11 +167,11 @@ namespace jit = platform::jit;
   macro_(isa, kEQ16); \
   macro_(isa, kGT16)

-#define FOR_EACH_ISA_BLOCK(macro_)      \
-  FOR_EACH_BLOCK(macro_, jit::avx512f); \
-  FOR_EACH_BLOCK(macro_, jit::avx2);    \
-  FOR_EACH_BLOCK(macro_, jit::avx);     \
-  FOR_EACH_BLOCK(macro_, jit::isa_any)
+#define FOR_EACH_ISA_BLOCK(macro_)           \
+  FOR_EACH_BLOCK(macro_, platform::avx512f); \
+  FOR_EACH_BLOCK(macro_, platform::avx2);    \
+  FOR_EACH_BLOCK(macro_, platform::avx);     \
+  FOR_EACH_BLOCK(macro_, platform::isa_any)
}  // namespace jitkernel
}  // namespace math
...
...
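SEARCH_ISA_BLOCK is where compile-time specialization meets runtime detection: every ISA specialization is compiled in, and MayIUse picks the widest instruction set the current CPU supports. The same cascade without macros might look like this sketch (MayIUse is stubbed out here; the real implementation queries CPUID via Xbyak):

#include <cstdio>

enum cpu_isa_t { isa_any, avx, avx2, avx512f };  // illustrative subset

// Stand-in for platform::MayIUse; pretend the host supports up to AVX2.
static bool MayIUse(cpu_isa_t isa) { return isa <= avx2; }

template <cpu_isa_t isa>
void RunKernel(int d) {
  std::printf("isa=%d, d=%d\n", static_cast<int>(isa), d);
}

// Runtime dispatch to the widest compiled-in specialization.
void Dispatch(int d) {
  if (MayIUse(avx512f)) {
    RunKernel<avx512f>(d);
  } else if (MayIUse(avx2)) {
    RunKernel<avx2>(d);
  } else if (MayIUse(avx)) {
    RunKernel<avx>(d);
  } else {
    RunKernel<isa_any>(d);
  }
}

int main() { Dispatch(256); return 0; }  // picks avx2 under the stub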
paddle/fluid/operators/math/jit_kernel_test.cc
...
...
@@ -705,7 +705,7 @@ TEST(JitKernel, pool) {
   jit::lstm_attr_t attr(frame_size, act_gate, act_cand, act_cell, false);

   // empty call it to avoid unknown flag 'use_pinned_memory' on Mac
-  paddle::platform::jit::MayIUse(paddle::platform::jit::avx);
+  paddle::platform::MayIUse(paddle::platform::avx);
   const auto& plstm1 =
       jit::KernelPool::Instance()
           .template Get<jit::LSTMKernel<float>, const jit::lstm_attr_t&>(attr);
...
...
paddle/fluid/operators/math/matrix_bit_code.cc
...
...
@@ -89,6 +89,8 @@ template <typename T>
 void MatrixBitCodeFunctor<T>::Mul(framework::Tensor* tmat,
                                   const framework::Tensor& weight,
                                   const framework::Tensor& input) {
+  auto blas =
+      GetBlas<platform::CPUDeviceContext, T>(platform::CPUDeviceContext());
   size_t num_samples = tmat->dims()[0];
   size_t tmat_width = tmat->dims()[1];
   size_t input_width = input.dims()[1];
...
...
@@ -99,13 +101,12 @@ void MatrixBitCodeFunctor<T>::Mul(framework::Tensor* tmat,
   for (size_t i = 0; i < num_samples; ++i) {
     auto code = code_table_->get_code(i);
     int code_length = code->get_length();
+    const T* input_row = input_value + input_width * i;
     for (int j = 0; j < code_length; ++j) {
       size_t index = code->calc_index(j);
+      const T* weight_row = weight_value + weight_width * index;
       T sum = static_cast<T>(0.0);
-      for (size_t k = 0; k < input_width; ++k) {
-        sum += weight_value[weight_width * index + k] *
-               input_value[input_width * i + k];
-      }
+      sum = blas.DOT(input_width, weight_row, input_row);
       tmat_value[i * tmat_width + j] += sum;
     }
   }
...
...
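The hand-written inner k-loop becomes a single blas.DOT per code bit, so the reduction over input_width runs inside the BLAS library (MKL when available) instead of scalar C++. A toy sketch of the shape of that rewrite, with a local stand-in for blas.DOT:

#include <cstdio>
#include <vector>

// Stand-in for blas.DOT: one call computes the whole inner product,
// instead of an element-by-element accumulation in the caller.
static float Dot(int n, const float* a, const float* b) {
  float s = 0.f;
  for (int i = 0; i < n; ++i) s += a[i] * b[i];
  return s;
}

int main() {
  const int input_width = 4;
  std::vector<float> weight_row = {1, 2, 3, 4};
  std::vector<float> input_row = {0.5f, 0.5f, 0.5f, 0.5f};

  // Before: the caller owned the k-loop. After: one DOT per (i, j) pair.
  float sum = Dot(input_width, weight_row.data(), input_row.data());
  std::printf("sum = %g\n", sum);  // prints: sum = 5
  return 0;
}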
@@ -115,6 +116,8 @@ template <typename T>
 void MatrixBitCodeFunctor<T>::MulGradWeight(const framework::Tensor& tmat,
                                             framework::Tensor* weight,
                                             const framework::Tensor& input) {
+  auto blas =
+      GetBlas<platform::CPUDeviceContext, T>(platform::CPUDeviceContext());
   size_t num_samples = tmat.dims()[0];
   size_t input_width = input.dims()[1];
   size_t tmat_width = tmat.dims()[1];
...
...
@@ -122,16 +125,25 @@ void MatrixBitCodeFunctor<T>::MulGradWeight(const framework::Tensor& tmat,
   auto tmat_value = tmat.data<T>();
   auto weight_value = weight->data<T>();
   auto input_value = input.data<T>();
+
+  std::unordered_map<int, std::vector<std::pair<T, const T*>>> ops;
+
   for (size_t i = 0; i < num_samples; ++i) {
     auto code = code_table_->get_code(i);
     int code_length = code->get_length();
+    const T* input_value_row = input_value + input_width * i;
+    const T* tmat_row = tmat_value + i * tmat_width;
     for (int j = 0; j < code_length; ++j) {
-      size_t index = code->calc_index(j);
-      for (size_t k = 0; k < input_width; ++k) {
-        weight_value[weight_width * index + k] +=
-            tmat_value[i * tmat_width + j] * input_value[input_width * i + k];
-      }
+      ops[code->calc_index(j)].emplace_back(tmat_row[j], input_value_row);
     }
   }
+
+  for (auto& op : ops) {
+    auto& op_in_row = op.second;
+    for (auto& pair : op_in_row) {
+      auto& scale = pair.first;
+      auto* input_row = pair.second;
+      T* weight_row = weight_value + op.first * weight_width;
+      blas.AXPY(input_width, scale, input_row, weight_row);
+    }
+  }
 }
...
...
@@ -140,6 +152,8 @@ template <typename T>
 void MatrixBitCodeFunctor<T>::MulGradWeight(const framework::Tensor& tmat,
                                             framework::SelectedRows* weight,
                                             const framework::Tensor& input) {
+  auto blas =
+      GetBlas<platform::CPUDeviceContext, T>(platform::CPUDeviceContext());
   size_t num_samples = tmat.dims()[0];
   size_t input_width = input.dims()[1];
   size_t tmat_width = tmat.dims()[1];
...
...
@@ -147,17 +161,28 @@ void MatrixBitCodeFunctor<T>::MulGradWeight(const framework::Tensor& tmat,
   auto tmat_value = tmat.data<T>();
   auto weight_value = weight->mutable_value()->data<T>();
   auto input_value = input.data<T>();
+
+  std::unordered_map<int, std::vector<std::pair<T, const T*>>> ops;
+  ops.reserve(weight->rows().size());
+
   for (size_t i = 0; i < num_samples; ++i) {
     auto code = code_table_->get_code(i);
     int code_length = code->get_length();
+    const T* input_value_row = input_value + input_width * i;
+    const T* tmat_row = tmat_value + i * tmat_width;
     for (int j = 0; j < code_length; ++j) {
-      size_t index = code->calc_index(j);
-      for (size_t k = 0; k < input_width; ++k) {
-        int64_t row_index = weight->GetIndexFromId(static_cast<int64_t>(index));
-        weight_value[row_index * weight_width + k] +=
-            tmat_value[i * tmat_width + j] * input_value[input_width * i + k];
-      }
+      ops[code->calc_index(j)].emplace_back(tmat_row[j], input_value_row);
     }
   }
+
+  for (auto& row : weight->rows()) {
+    auto& op_in_row = ops[row];
+    for (auto& pair : op_in_row) {
+      auto& scale = pair.first;
+      auto* input_row = pair.second;
+      blas.AXPY(input_width, scale, input_row, weight_value);
+    }
+    weight_value += weight_width;
+  }
 }
...
...
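The MulGradWeight change is an access-pattern optimization: instead of scattering rank-1 updates row-by-row in code order, it first buckets (scale, input_row) pairs by destination weight row, then flushes each bucket with AXPY, so every weight row is written contiguously. The bucketing skeleton, reduced to standard C++ (data here is made up for illustration):

#include <cstdio>
#include <unordered_map>
#include <utility>
#include <vector>

// y += a * x, a stand-in for blas.AXPY.
static void Axpy(int n, float a, const float* x, float* y) {
  for (int i = 0; i < n; ++i) y[i] += a * x[i];
}

int main() {
  const int width = 3;
  float weight[2][3] = {{0, 0, 0}, {0, 0, 0}};
  float input[2][3] = {{1, 1, 1}, {2, 2, 2}};

  // Phase 1: bucket pending updates by destination row index.
  std::unordered_map<int, std::vector<std::pair<float, const float*>>> ops;
  ops[0].emplace_back(0.5f, input[0]);  // row 0 gets 0.5 * input[0]
  ops[0].emplace_back(1.0f, input[1]);  // ... plus 1.0 * input[1]
  ops[1].emplace_back(2.0f, input[0]);  // row 1 gets 2.0 * input[0]

  // Phase 2: flush each bucket; each weight row is touched in one pass.
  for (auto& op : ops) {
    for (auto& pair : op.second) {
      Axpy(width, pair.first, pair.second, weight[op.first]);
    }
  }
  std::printf("%g %g\n", weight[0][0], weight[1][0]);  // prints: 2.5 2
  return 0;
}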
paddle/fluid/operators/math/matrix_bit_code.h
...
...
@@ -13,10 +13,14 @@ See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <unordered_map>
#include <utility>
#include <vector>
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/selected_rows.h"
#include "paddle/fluid/framework/tensor.h"
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/platform/device_context.h"
#if defined(_WIN32)
...
...
paddle/fluid/operators/math/prelu.cu
0 → 100644
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/math/prelu.h"
namespace paddle {
namespace operators {
namespace math {

static const int CUDA_NUM_THREADS = 1024;
static const int CUDA_MAX_NUM_BLOCKS = 65535;
inline static int GET_NUM_BLOCKS(const int N) {
  return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
}

template <typename T>
__global__ void PReluChannelWiseKernel(const T *input, const T *alpha,
                                       T *output, int channel,
                                       size_t spatial_size) {
  size_t offset = blockIdx.x * spatial_size;
  const T *in = input + offset;
  T *out = output + offset;
  T scale = alpha[blockIdx.x % channel];
  for (size_t i = threadIdx.x; i < spatial_size; i += blockDim.x) {
    T x = in[i];
    out[i] = (x > 0) ? x : scale * x;
  }
}

template <typename T>
__global__ void PReluElementWiseKernel(const T *input, const T *alpha,
                                       T *output, size_t spatial_size) {
  size_t offset = blockIdx.x * spatial_size;
  const T *in = input + offset;
  const T *scale = alpha + offset;
  T *out = output + offset;
  for (size_t i = threadIdx.x; i < spatial_size; i += blockDim.x) {
    T x = in[i];
    out[i] = (x > 0) ? x : scale[i] * x;
  }
}

template <typename T>
__global__ void PReluScalarKernel(const T *input, const T *alpha, T *output,
                                  size_t spatial_size) {
  size_t offset = blockIdx.x * spatial_size;
  const T *in = input + offset;
  T scale = *alpha;
  T *out = output + offset;
  for (size_t i = threadIdx.x; i < spatial_size; i += blockDim.x) {
    T x = in[i];
    out[i] = (x > 0) ? x : scale * x;
  }
}

template <typename T>
static inline void PReluChannelWise(cudaStream_t stream, const T *input,
                                    const T *alpha, T *output,
                                    std::vector<int> input_shape) {
  size_t unroll = input_shape[0] * input_shape[1];
  size_t spatial_size = input_shape[2] * input_shape[3];
  CHECK_LT(unroll, CUDA_MAX_NUM_BLOCKS);
  PReluChannelWiseKernel<<<unroll, CUDA_NUM_THREADS, 0, stream>>>(
      input, alpha, output, input_shape[1], spatial_size);
}

template <typename T>
static inline void PReluElementWise(cudaStream_t stream, const T *input,
                                    const T *alpha, T *output,
                                    std::vector<int> input_shape) {
  size_t unroll = input_shape[0] * input_shape[1];
  size_t spatial_size = input_shape[2] * input_shape[3];
  CHECK_LT(unroll, CUDA_MAX_NUM_BLOCKS);
  PReluElementWiseKernel<<<unroll, CUDA_NUM_THREADS, 0, stream>>>(
      input, alpha, output, spatial_size);
}

template <typename T>
static inline void PReluScalar(cudaStream_t stream, const T *input,
                               const T *alpha, T *output,
                               std::vector<int> input_shape) {
  size_t unroll = input_shape[0] * input_shape[1];
  size_t spatial_size = input_shape[2] * input_shape[3];
  CHECK_LT(unroll, CUDA_MAX_NUM_BLOCKS);
  PReluScalarKernel<<<unroll, CUDA_NUM_THREADS, 0, stream>>>(
      input, alpha, output, spatial_size);
}

template <typename T>
void PreluChannelWiseDirectCUDAFunctor<T>::operator()(
    cudaStream_t stream, const T *input, const T *alpha, T *output,
    std::vector<int> input_shape) {
  size_t unroll = input_shape[0] * input_shape[1];
  size_t spatial_size = input_shape[2] * input_shape[3];
  CHECK_LT(unroll, CUDA_MAX_NUM_BLOCKS);
  PReluChannelWiseKernel<<<unroll, CUDA_NUM_THREADS, 0, stream>>>(
      input, alpha, output, input_shape[1], spatial_size);
}

template <typename T>
void PreluElementWiseDirectCUDAFunctor<T>::operator()(
    cudaStream_t stream, const T *input, const T *alpha, T *output,
    std::vector<int> input_shape) {
  size_t unroll = input_shape[0] * input_shape[1];
  size_t spatial_size = input_shape[2] * input_shape[3];
  CHECK_LT(unroll, CUDA_MAX_NUM_BLOCKS);
  PReluElementWiseKernel<<<unroll, CUDA_NUM_THREADS, 0, stream>>>(
      input, alpha, output, spatial_size);
}

template <typename T>
void PreluScalarDirectCUDAFunctor<T>::operator()(cudaStream_t stream,
                                                 const T *input,
                                                 const T *alpha, T *output,
                                                 std::vector<int> input_shape) {
  size_t unroll = input_shape[0] * input_shape[1];
  size_t spatial_size = input_shape[2] * input_shape[3];
  CHECK_LT(unroll, CUDA_MAX_NUM_BLOCKS);
  PReluScalarKernel<<<unroll, CUDA_NUM_THREADS, 0, stream>>>(
      input, alpha, output, spatial_size);
}

template class PreluChannelWiseDirectCUDAFunctor<float>;
template class PreluChannelWiseDirectCUDAFunctor<double>;
template class PreluElementWiseDirectCUDAFunctor<float>;
template class PreluElementWiseDirectCUDAFunctor<double>;
template class PreluScalarDirectCUDAFunctor<float>;
template class PreluScalarDirectCUDAFunctor<double>;

}  // namespace math
}  // namespace operators
}  // namespace paddle
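The grid mapping is the key idea: one block per (batch, channel) slice, with threads striding over the spatial extent, and the per-channel slope recovered as alpha[blockIdx.x % channel]. A host-side sketch of driving the channel-wise functor on an NCHW tensor (device pointers and stream are assumed already set up; the shape is illustrative):

#include <vector>
#include "paddle/fluid/operators/math/prelu.h"

// Launches N*C = 6 blocks; each block handles one H*W = 16 element slice,
// reading its slope from d_alpha[c]. Error handling omitted for brevity.
void RunChannelWisePRelu(cudaStream_t stream, const float* d_x,
                         const float* d_alpha, float* d_y) {
  std::vector<int> shape = {2, 3, 4, 4};  // {N, C, H, W}
  paddle::operators::math::PreluChannelWiseDirectCUDAFunctor<float> prelu;
  prelu(stream, d_x, d_alpha, d_y, shape);
}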
paddle/fluid/operators/math/prelu.h
0 → 100644
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/cudnn_helper.h"
namespace paddle {
namespace operators {
namespace math {

#ifdef PADDLE_WITH_CUDA
template <typename T>
class PreluChannelWiseDirectCUDAFunctor {
 public:
  void operator()(cudaStream_t stream, const T *input, const T *alpha,
                  T *output, std::vector<int> input_shape);
};

template <typename T>
class PreluElementWiseDirectCUDAFunctor {
 public:
  void operator()(cudaStream_t stream, const T *input, const T *alpha,
                  T *output, std::vector<int> input_shape);
};

template <typename T>
class PreluScalarDirectCUDAFunctor {
 public:
  void operator()(cudaStream_t stream, const T *input, const T *alpha,
                  T *output, std::vector<int> input_shape);
};
#endif

}  // namespace math
}  // namespace operators
}  // namespace paddle
paddle/fluid/operators/math/softmax_impl.h
...
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/tensor.h"
...
...
paddle/fluid/operators/merge_selected_rows_op.cc
0 → 100644
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/merge_selected_rows_op.h"
namespace paddle {
namespace operators {

class MergeSelectedRowsOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

  void InferShape(framework::InferShapeContext* ctx) const override {
    PADDLE_ENFORCE(ctx->HasInput("X"),
                   "Input(X) of MergeSelectedRowsOp should not be null.");
    PADDLE_ENFORCE(ctx->HasOutput("Out"),
                   "Output(Out) of MergeSelectedRowsOp should not be null.");
    ctx->ShareDim("X", /*->*/ "Out");
  }
};

class MergeSelectedRowsOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() override {
    AddInput("X",
             "The input type is SelectedRows, and the selected rows may be "
             "duplicated.");
    AddOutput("Out",
              "The output type is SelectedRows, and the selected rows are not "
              "duplicated.");
    AddComment(
        R"DOC(
MergeSelectedRows Operator.

MergeSelectedRows is used to merge the duplicated rows of the input.
)DOC");
  }
};

class MergeSelectedRowsOpInferVarType
    : public framework::PassInDtypeAndVarTypeToOutput {
 protected:
  std::unordered_map<std::string, std::string> GetInputOutputWithSameType()
      const override {
    return std::unordered_map<std::string, std::string>{{"X", /*->*/ "Out"}};
  }
};

}  // namespace operators
}  // namespace paddle

namespace ops = paddle::operators;
namespace plat = paddle::platform;

REGISTER_OPERATOR(merge_selected_rows, ops::MergeSelectedRowsOp,
                  ops::MergeSelectedRowsOpMaker,
                  ops::MergeSelectedRowsOpInferVarType);

REGISTER_OP_CPU_KERNEL(
    merge_selected_rows,
    ops::MergeSelectedRowsKernel<plat::CPUDeviceContext, float>,
    ops::MergeSelectedRowsKernel<plat::CPUDeviceContext, double>);
paddle/fluid/operators/merge_selected_rows_op.cu.cc
0 → 100644
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/merge_selected_rows_op.h"
namespace ops = paddle::operators;
namespace plat = paddle::platform;

REGISTER_OP_CUDA_KERNEL(
    merge_selected_rows,
    ops::MergeSelectedRowsKernel<plat::CUDADeviceContext, float>,
    ops::MergeSelectedRowsKernel<plat::CUDADeviceContext, double>);
paddle/fluid/operators/merge_selected_rows_op.h
0 → 100644
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <string>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/math/selected_rows_functor.h"
namespace paddle {
namespace operators {

template <typename DeviceContext, typename T>
class MergeSelectedRowsKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    auto* x = context.Input<framework::SelectedRows>("X");
    auto* out = context.Output<framework::SelectedRows>("Out");

    math::scatter::MergeAdd<DeviceContext, T> merge_func;
    merge_func(context.template device_context<DeviceContext>(), *x, out);
  }
};

}  // namespace operators
}  // namespace paddle
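The kernel is a thin wrapper over math::scatter::MergeAdd, which accumulates values whose row indices collide so the output SelectedRows has unique rows. The semantics in miniature, on plain arrays (a sketch of the behavior, not Paddle's implementation):

#include <cstdio>
#include <map>
#include <vector>

int main() {
  // A SelectedRows-like pair: row indices (possibly duplicated) plus values.
  std::vector<long long> rows = {3, 1, 3};
  std::vector<float> values = {1.f, 2.f, 10.f};  // one scalar per row here

  // MergeAdd-style behavior: sum values that share a row index.
  std::map<long long, float> merged;
  for (size_t i = 0; i < rows.size(); ++i) merged[rows[i]] += values[i];

  for (auto& kv : merged) {
    std::printf("row %lld -> %g\n", kv.first, kv.second);  // 1 -> 2, 3 -> 11
  }
  return 0;
}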
paddle/fluid/operators/prelu_op.cc
...
...
@@ -58,7 +58,7 @@ class PReluOp : public framework::OperatorWithKernel {
       const framework::ExecutionContext& ctx) const override {
     return framework::OpKernelType(
         framework::ToDataType(ctx.Input<Tensor>("X")->type()),
-        platform::CPUPlace());
+        ctx.device_context());
   }
 };
...
...
paddle/fluid/operators/prelu_op.cu
0 → 100644
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <string>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/math/prelu.h"
#include "paddle/fluid/operators/prelu_op.h"
#include "paddle/fluid/platform/cuda_primitives.h"
namespace paddle {
namespace operators {

using Tensor = framework::Tensor;

template <typename DeviceContext, typename T>
class CUDAPReluKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    auto* x = context.Input<Tensor>("X");
    auto* alpha = context.Input<Tensor>("Alpha");
    auto* out = context.Output<Tensor>("Out");

    const T* x_ptr = x->data<T>();
    T* o_ptr = out->mutable_data<T>(context.GetPlace());
    const T* alpha_ptr = alpha->data<T>();

    auto& mode = context.Attr<std::string>("mode");

    int numel = x->numel();
    auto dim = x->dims();
    std::vector<int> input_shape = framework::vectorize2int(dim);

    if (mode == "channel") {
      math::PreluChannelWiseDirectCUDAFunctor<T> prelu_channel_wise;
      prelu_channel_wise(context.cuda_device_context().stream(), x_ptr,
                         alpha_ptr, o_ptr, input_shape);
    } else if (mode == "element") {
      math::PreluElementWiseDirectCUDAFunctor<T> prelu_element_wise;
      prelu_element_wise(context.cuda_device_context().stream(), x_ptr,
                         alpha_ptr, o_ptr, input_shape);
    } else {
      math::PreluScalarDirectCUDAFunctor<T> prelu_scalar;
      prelu_scalar(context.cuda_device_context().stream(), x_ptr, alpha_ptr,
                   o_ptr, input_shape);
    }
  }
};

}  // namespace operators
}  // namespace paddle

namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(
    prelu, ops::CUDAPReluKernel<paddle::platform::CUDADeviceContext, float>,
    ops::CUDAPReluKernel<paddle::platform::CUDADeviceContext, double>);
paddle/fluid/platform/cpu_info.cc
...
...
@@ -123,7 +123,6 @@ size_t CUDAPinnedMaxChunkSize() {
   return CUDAPinnedMaxAllocSize() / 256;
 }

-namespace jit {
 #ifdef PADDLE_WITH_XBYAK
 static Xbyak::util::Cpu cpu;
 bool MayIUse(const cpu_isa_t cpu_isa) {
...
...
@@ -165,6 +164,5 @@ bool MayIUse(const cpu_isa_t cpu_isa) {
 }
 #endif

-}  // namespace jit
 }  // namespace platform
 }  // namespace paddle
The remaining diffs in this commit are collapsed in the source view; only the affected files are listed:

paddle/fluid/platform/cpu_info.h
paddle/fluid/platform/device_context.cc
paddle/fluid/platform/device_tracer.cc
paddle/fluid/platform/device_tracer.h
paddle/fluid/platform/dynload/cudnn.h
paddle/fluid/platform/init.cc
paddle/fluid/platform/mkldnn_helper.h
paddle/fluid/pybind/pybind.cc
paddle/scripts/paddle_build.sh
python/paddle/fluid/clip.py
python/paddle/fluid/data_feeder.py
python/paddle/fluid/framework.py
python/paddle/fluid/io.py
python/paddle/fluid/layers/nn.py
python/paddle/fluid/parallel_executor.py
python/paddle/fluid/tests/test_gradient_clip.py (deleted, 100644 → 0)
python/paddle/fluid/tests/unittests/CMakeLists.txt
python/paddle/fluid/tests/unittests/dist_save_load.py
python/paddle/fluid/tests/unittests/test_conv2d_fusion_op.py
python/paddle/fluid/tests/unittests/test_conv3d_mkldnn_op.py (new file, 0 → 100644)
python/paddle/fluid/tests/unittests/test_conv3d_op.py
python/paddle/fluid/tests/unittests/test_dist_base.py
python/paddle/fluid/tests/unittests/test_dist_mnist.py
python/paddle/fluid/tests/unittests/test_dist_save_load.py
python/paddle/fluid/tests/unittests/test_dist_transpiler.py
python/paddle/fluid/tests/unittests/test_get_tensor_from_selected_rows_op.py (new file, 0 → 100644)
python/paddle/fluid/tests/unittests/test_gradient_clip.py (new file, 0 → 100644)
python/paddle/fluid/tests/unittests/test_merge_selectedrows_op.py (new file, 0 → 100644)
python/paddle/fluid/transpiler/distribute_transpiler.py