BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle)
Commit dd080b17
Authored on March 20, 2019 by phlrain
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_sequence_pad_2
Parents: 1580be5d, f26ba5bd
Showing 111 changed files with 2,957 additions and 676 deletions (+2957 / -676)
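Most of the new code in this merge wires up gradient fusion for all-reduce: alloc_continuous_space_for_grad_pass packs parameter gradients into one contiguous buffer, fuse_all_reduce_op_pass swaps the per-gradient all_reduce ops for a single fused_all_reduce op handle, and BuildStrategy gains a fuse_all_reduce_ops_ switch. A minimal sketch of how the switch might be flipped from Python, assuming the C++ field is exposed as BuildStrategy.fuse_all_reduce_ops (the attribute name is inferred from the new test_fuse_all_reduce_pass.py, whose body is not shown in this diff):

import paddle.fluid as fluid

# A tiny network; the specifics do not matter for the sketch.
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
pred = fluid.layers.fc(input=x, size=1)
loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

exe = fluid.Executor(fluid.CUDAPlace(0))
exe.run(fluid.default_startup_program())

# Assumption: the new C++ field fuse_all_reduce_ops_ is exposed under this name.
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_all_reduce_ops = True  # fuse per-gradient all-reduce ops into one

train_exe = fluid.ParallelExecutor(use_cuda=True,
                                   loss_name=loss.name,
                                   build_strategy=build_strategy)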
Changed files:
paddle/fluid/API.spec  +7  -7
paddle/fluid/framework/details/CMakeLists.txt  +9  -2
paddle/fluid/framework/details/alloc_continuous_space_for_grad_pass.cc  +393  -0
paddle/fluid/framework/details/build_strategy.cc  +63  -23
paddle/fluid/framework/details/build_strategy.h  +3  -0
paddle/fluid/framework/details/fuse_all_reduce_op_pass.cc  +195  -0
paddle/fluid/framework/details/fused_all_reduce_op_handle.cc  +248  -0
paddle/fluid/framework/details/fused_all_reduce_op_handle.h  +76  -0
paddle/fluid/framework/details/graph_test_base.h  +5  -5
paddle/fluid/framework/details/multi_devices_graph_pass.cc  +37  -19
paddle/fluid/framework/details/multi_devices_graph_pass.h  +8  -12
paddle/fluid/framework/details/multi_devices_helper.h  +23  -0
paddle/fluid/framework/details/op_registry.h  +4  -2
paddle/fluid/framework/details/reduce_and_gather.h  +25  -0
paddle/fluid/framework/ir/CMakeLists.txt  +2  -0
paddle/fluid/framework/ir/cpu_quantize_placement_pass.cc  +58  -0
paddle/fluid/framework/ir/cpu_quantize_placement_pass.h  +34  -0
paddle/fluid/framework/ir/cpu_quantize_placement_pass_tester.cc  +129  -0
paddle/fluid/framework/ir/graph_test.cc  +7  -7
paddle/fluid/framework/op_desc.cc  +3  -1
paddle/fluid/framework/parallel_executor.cc  +17  -6
paddle/fluid/framework/scope.cc  +1  -1
paddle/fluid/framework/tensor_util.cc  +5  -0
paddle/fluid/framework/type_defs.h  +2  -1
paddle/fluid/framework/var_type_inference.h  +108  -9
paddle/fluid/framework/var_type_inference_test.cc  +6  -6
paddle/fluid/imperative/layer.cc  +68  -26
paddle/fluid/imperative/layer.h  +157  -16
paddle/fluid/imperative/tracer.cc  +23  -15
paddle/fluid/imperative/tracer.h  +1  -1
paddle/fluid/imperative/type_defs.h  +1  -0
paddle/fluid/inference/tests/api/CMakeLists.txt  +1  -1
paddle/fluid/inference/tests/api/analyzer_transformer_tester.cc  +18  -2
paddle/fluid/operators/beam_search_decode_op.cc  +9  -12
paddle/fluid/operators/beam_search_op.cc  +6  -9
paddle/fluid/operators/concat_op.cc  +1  -1
paddle/fluid/operators/controlflow/get_places_op.cc  +3  -5
paddle/fluid/operators/controlflow/tensor_array_read_write_op.cc  +6  -9
paddle/fluid/operators/controlflow/while_op.cc  +7  -10
paddle/fluid/operators/conv_transpose_op.cc  +26  -3
paddle/fluid/operators/cos_sim_op.cc  +3  -0
paddle/fluid/operators/cos_sim_op.h  +13  -5
paddle/fluid/operators/distributed_ops/allreduce_op.cc  +143  -0
paddle/fluid/operators/distributed_ops/fake_init_op.cc  +1  -2
paddle/fluid/operators/distributed_ops/merge_ids_op.cc  +4  -5
paddle/fluid/operators/distributed_ops/split_ids_op.cc  +6  -5
paddle/fluid/operators/dropout_op.cc  +22  -11
paddle/fluid/operators/fc_op.cc  +14  -13
paddle/fluid/operators/fc_op.h  +16  -0
paddle/fluid/operators/fill_constant_op.cc  +4  -5
paddle/fluid/operators/fused/fused_embedding_seq_pool_op.cc  +8  -9
paddle/fluid/operators/get_tensor_from_selected_rows_op.cc  +6  -9
paddle/fluid/operators/hash_op.cc  +2  -1
paddle/fluid/operators/hierarchical_sigmoid_op.cc  +9  -15
paddle/fluid/operators/layer_norm_op.cc  +30  -2
paddle/fluid/operators/layer_norm_op.h  +7  -14
paddle/fluid/operators/lod_rank_table_op.cc  +3  -5
paddle/fluid/operators/lod_tensor_to_array_op.cc  +3  -4
paddle/fluid/operators/lookup_table_op.cc  +6  -8
paddle/fluid/operators/mkldnn/fc_mkldnn_op.cc  +16  -8
paddle/fluid/operators/mkldnn/transpose_mkldnn_op.cc  +27  -1
paddle/fluid/operators/nccl/nccl_op.cc  +3  -6
paddle/fluid/operators/nce_op.cc  +6  -8
paddle/fluid/operators/ngraph/ngraph_engine.cc  +20  -12
paddle/fluid/operators/ngraph/ngraph_engine.h  +2  -4
paddle/fluid/operators/ngraph/ngraph_engine_op.cc  +1  -2
paddle/fluid/operators/optimizers/adam_op.h  +15  -34
paddle/fluid/operators/optimizers/lars_momentum_op.cc  +3  -4
paddle/fluid/operators/optimizers/momentum_op.cc  +7  -11
paddle/fluid/operators/optimizers/momentum_op.h  +6  -13
paddle/fluid/operators/optimizers/rmsprop_op.h  +4  -14
paddle/fluid/operators/optimizers/sgd_op.cc  +6  -8
paddle/fluid/operators/py_func_op.cc  +20  -21
paddle/fluid/operators/reader/create_custom_reader_op.cc  +11  -12
paddle/fluid/operators/reader/read_op.cc  +7  -10
paddle/fluid/operators/reader/reader_op_registry.cc  +9  -12
paddle/fluid/operators/reader/reader_op_registry.h  +4  -4
paddle/fluid/operators/save_op.cc  +3  -6
paddle/fluid/operators/scale_op.cc  +6  -9
paddle/fluid/operators/sequence_ops/sequence_enumerate_op.cc  +2  -8
paddle/fluid/operators/sequence_ops/sequence_enumerate_op.h  +28  -11
paddle/fluid/operators/slice_op.cu  +7  -7
paddle/fluid/operators/softmax_with_cross_entropy_op.cc  +1  -1
paddle/fluid/operators/split_selected_rows_op.cc  +5  -4
paddle/fluid/operators/sum_op.cc  +13  -19
paddle/fluid/operators/tensor_array_to_tensor_op.cc  +3  -4
paddle/fluid/operators/tensorrt/tensorrt_engine_op.cc  +1  -2
paddle/fluid/operators/uniform_random_op.cc  +7  -8
paddle/fluid/platform/device_context.h  +1  -0
paddle/fluid/platform/nccl_helper.h  +3  -6
paddle/fluid/pybind/imperative.cc  +2  -2
paddle/fluid/pybind/pybind.cc  +5  -1
python/paddle/fluid/__init__.py  +2  -1
python/paddle/fluid/data_feeder.py  +3  -3
python/paddle/fluid/executor.py  +14  -6
python/paddle/fluid/layers/__init__.py  +1  -0
python/paddle/fluid/layers/collective.py  +47  -0
python/paddle/fluid/layers/control_flow.py  +7  -17
python/paddle/fluid/layers/nn.py  +5  -0
python/paddle/fluid/tests/unittests/dist_allreduce_op.py  +120  -0
python/paddle/fluid/tests/unittests/mkldnn/test_transpose_int8_mkldnn_op.py  +78  -0
python/paddle/fluid/tests/unittests/parallel_executor_test_base.py  +2  -0
python/paddle/fluid/tests/unittests/test_dist_allreduce_op.py  +35  -0
python/paddle/fluid/tests/unittests/test_dist_base.py  +63  -26
python/paddle/fluid/tests/unittests/test_dist_mnist.py  +1  -1
python/paddle/fluid/tests/unittests/test_fuse_all_reduce_pass.py  +121  -0
python/paddle/fluid/tests/unittests/test_layers.py  +54  -0
python/paddle/fluid/tests/unittests/test_slice_op.py  +26  -0
python/paddle/reader/__init__.py  +2  -5
python/paddle/reader/creator.py  +14  -6
python/paddle/reader/decorator.py  +13  -15
paddle/fluid/API.spec

@@ -12,7 +12,7 @@ paddle.fluid.program_guard (ArgSpec(args=['main_program', 'startup_program'], va
paddle.fluid.name_scope (ArgSpec(args=['prefix'], varargs=None, keywords=None, defaults=(None,)), ('document', '0ef753f5cec69fef9ae6ad8b867b33a2'))
paddle.fluid.Executor.__init__ (ArgSpec(args=['self', 'place'], varargs=None, keywords=None, defaults=None), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
paddle.fluid.Executor.close (ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None), ('document', 'f5369953dd0c443961cf79f7a00e1a03'))
paddle.fluid.Executor.run (ArgSpec(args=['self', 'program', 'feed', 'fetch_list', 'feed_var_name', 'fetch_var_name', 'scope', 'return_numpy', 'use_program_cache'], varargs=None, keywords=None, defaults=(None, None, None, 'feed', 'fetch', None, True, False)), ('document', 'aba8093edebf2d5c869b735b92811e45'))
paddle.fluid.Executor.run (ArgSpec(args=['self', 'program', 'feed', 'fetch_list', 'feed_var_name', 'fetch_var_name', 'scope', 'return_numpy', 'use_program_cache'], varargs=None, keywords=None, defaults=(None, None, None, 'feed', 'fetch', None, True, False)), ('document', 'f482e93b38b4018796969a2e1dde479d'))
paddle.fluid.global_scope (ArgSpec(args=[], varargs=None, keywords=None, defaults=None), ('document', 'e148d3ab1ed8edf3e928212a375959c0'))
paddle.fluid.scope_guard (ArgSpec(args=['scope'], varargs=None, keywords=None, defaults=None), ('document', 'b94d1f6bcc29c4fb58fc0058561250c2'))
paddle.fluid.DistributeTranspiler.__init__ (ArgSpec(args=['self', 'config'], varargs=None, keywords=None, defaults=(None,)), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))

@@ -277,7 +277,7 @@ paddle.fluid.layers.DynamicRNN.block (ArgSpec(args=['self'], varargs=None, keywo
paddle.fluid.layers.DynamicRNN.memory (ArgSpec(args=['self', 'init', 'shape', 'value', 'need_reorder', 'dtype'], varargs=None, keywords=None, defaults=(None, None, 0.0, False, 'float32')), ('document', 'b9174d4e91505b0c8ecc193eb51e248d'))
paddle.fluid.layers.DynamicRNN.output (ArgSpec(args=['self'], varargs='outputs', keywords=None, defaults=None), ('document', 'b439a176a3328de8a75bdc5c08eece4a'))
paddle.fluid.layers.DynamicRNN.static_input (ArgSpec(args=['self', 'x'], varargs=None, keywords=None, defaults=None), ('document', 'f29ad2478b6b2ad4f413d2936a331ea0'))
paddle.fluid.layers.DynamicRNN.step_input (ArgSpec(args=['self', 'x'], varargs=None, keywords=None, defaults=None), ('document', '169d694d2224f62b4f3afdc3dbc19e95'))
paddle.fluid.layers.DynamicRNN.step_input (ArgSpec(args=['self', 'x', 'level'], varargs=None, keywords=None, defaults=(0,)), ('document', '7568c5ac7622a10288d3307a94134655'))
paddle.fluid.layers.DynamicRNN.update_memory (ArgSpec(args=['self', 'ex_mem', 'new_mem'], varargs=None, keywords=None, defaults=None), ('document', '5d83987da13b98363d6a807a52d8024f'))
paddle.fluid.layers.StaticRNN.__init__ (ArgSpec(args=['self', 'name'], varargs=None, keywords=None, defaults=(None,)), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
paddle.fluid.layers.StaticRNN.memory (ArgSpec(args=['self', 'init', 'shape', 'batch_ref', 'init_value', 'init_batch_dim_idx', 'ref_batch_dim_idx'], varargs=None, keywords=None, defaults=(None, None, None, 0.0, 0, 1)), ('document', 'c24e368e23afac1ed91a78a639d7a9c7'))

@@ -494,7 +494,7 @@ paddle.fluid.CUDAPinnedPlace.__init__ __init__(self: paddle.fluid.core.CUDAPinne
paddle.fluid.ParamAttr.__init__ (ArgSpec(args=['self', 'name', 'initializer', 'learning_rate', 'regularizer', 'trainable', 'gradient_clip', 'do_model_average'], varargs=None, keywords=None, defaults=(None, None, 1.0, None, True, None, False)), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
paddle.fluid.WeightNormParamAttr.__init__ (ArgSpec(args=['self', 'dim', 'name', 'initializer', 'learning_rate', 'regularizer', 'trainable', 'gradient_clip', 'do_model_average'], varargs=None, keywords=None, defaults=(None, None, None, 1.0, None, True, None, False)), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
paddle.fluid.DataFeeder.__init__ (ArgSpec(args=['self', 'feed_list', 'place', 'program'], varargs=None, keywords=None, defaults=(None,)), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
paddle.fluid.DataFeeder.decorate_reader (ArgSpec(args=['self', 'reader', 'multi_devices', 'num_places', 'drop_last'], varargs=None, keywords=None, defaults=(None, True)), ('document', '0eed2f198dc73c08a41b61edbc755753'))
paddle.fluid.DataFeeder.decorate_reader (ArgSpec(args=['self', 'reader', 'multi_devices', 'num_places', 'drop_last'], varargs=None, keywords=None, defaults=(None, True)), ('document', 'f8f3df23c5633c614db781a91b81fb62'))
paddle.fluid.DataFeeder.feed (ArgSpec(args=['self', 'iterable'], varargs=None, keywords=None, defaults=None), ('document', '459e316301279dfd82001b46f0b8ffca'))
paddle.fluid.DataFeeder.feed_parallel (ArgSpec(args=['self', 'iterable', 'num_places'], varargs=None, keywords=None, defaults=(None,)), ('document', '543863d1f9d4853758adb613b8659e85'))
paddle.fluid.clip.ErrorClipByValue.__init__ (ArgSpec(args=['self', 'max', 'min'], varargs=None, keywords=None, defaults=(None,)), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))

@@ -518,11 +518,11 @@ paddle.reader.compose (ArgSpec(args=[], varargs='readers', keywords='kwargs', de
paddle.reader.chain (ArgSpec(args=[], varargs='readers', keywords=None, defaults=None), ('document', 'd22c34e379a53901ae67a6bca7f4def4'))
paddle.reader.shuffle (ArgSpec(args=['reader', 'buf_size'], varargs=None, keywords=None, defaults=None), ('document', 'e42ea6fee23ce26b23cb142cd1d6522d'))
paddle.reader.firstn (ArgSpec(args=['reader', 'n'], varargs=None, keywords=None, defaults=None), ('document', 'c5bb8f7dd4f917f1569a368aab5b8aad'))
paddle.reader.xmap_readers (ArgSpec(args=['mapper', 'reader', 'process_num', 'buffer_size', 'order'], varargs=None, keywords=None, defaults=(False,)), ('document', '283bc0b8a0e26ae186b8b9bee4aec560'))
paddle.reader.xmap_readers (ArgSpec(args=['mapper', 'reader', 'process_num', 'buffer_size', 'order'], varargs=None, keywords=None, defaults=(False,)), ('document', '9c804a42f8a4dbaa76b3c98e0ab7f796'))
paddle.reader.PipeReader.__init__ (ArgSpec(args=['self', 'command', 'bufsize', 'file_type'], varargs=None, keywords=None, defaults=(8192, 'plain')), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
paddle.reader.PipeReader.get_line (ArgSpec(args=['self', 'cut_lines', 'line_break'], varargs=None, keywords=None, defaults=(True, '\n')), ('document', '5f80a7ed70052f01665e4c74acccfa69'))
paddle.reader.PipeReader.get_line (ArgSpec(args=['self', 'cut_lines', 'line_break'], varargs=None, keywords=None, defaults=(True, '\n')), ('document', '9621ae612e595b6c34eb3bb5f3eb1a45'))
paddle.reader.multiprocess_reader (ArgSpec(args=['readers', 'use_pipe', 'queue_size'], varargs=None, keywords=None, defaults=(True, 1000)), ('document', '7d8b3a96e592107c893d5d51ce968ba0'))
paddle.reader.Fake.__init__ (ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
paddle.reader.creator.np_array (ArgSpec(args=['x'], varargs=None, keywords=None, defaults=None), ('document', '28d457fbc9a71efa4ac91a3be179cada'))
paddle.reader.creator.text_file (ArgSpec(args=['path'], varargs=None, keywords=None, defaults=None), ('document', '44fe286ab6175a5464d3a961a68c266a'))
paddle.reader.creator.recordio (ArgSpec(args=['paths', 'buf_size'], varargs=None, keywords=None, defaults=(100,)), ('document', '11b3704ea42cfd537953387a7e58dae8'))
paddle.reader.creator.text_file (ArgSpec(args=['path'], varargs=None, keywords=None, defaults=None), ('document', 'f45fcb7add066c8e042c6774fc7c3db2'))
paddle.reader.creator.recordio (ArgSpec(args=['paths', 'buf_size'], varargs=None, keywords=None, defaults=(100,)), ('document', 'b4a94ee0e2cefb495619275c2f8c61d2'))
paddle/fluid/framework/details/CMakeLists.txt

@@ -9,6 +9,7 @@ cc_library(rpc_op_handle SRCS rpc_op_handle.cc DEPS framework_proto scope place
cc_library(multi_devices_helper SRCS multi_devices_helper.cc DEPS graph graph_helper)
cc_library(multi_devices_graph_print_pass SRCS multi_devices_graph_print_pass.cc DEPS multi_devices_helper)
cc_library(multi_devices_graph_check_pass SRCS multi_devices_graph_check_pass.cc DEPS multi_devices_helper)
cc_library(alloc_continuous_space_for_grad_pass SRCS alloc_continuous_space_for_grad_pass.cc DEPS graph graph_helper)
cc_library(variable_visitor SRCS variable_visitor.cc DEPS lod_tensor selected_rows)

@@ -22,6 +23,8 @@ endif()
if(WITH_GPU)
  nv_library(all_reduce_op_handle SRCS all_reduce_op_handle.cc DEPS op_handle_base scope lod_tensor ddim memory
            dynload_cuda variable_visitor)
  nv_library(fused_all_reduce_op_handle SRCS fused_all_reduce_op_handle.cc DEPS op_handle_base scope lod_tensor ddim memory
            dynload_cuda variable_visitor)
  if(WITH_DISTRIBUTE)
    nv_library(reduce_op_handle SRCS reduce_op_handle.cc DEPS op_handle_base variable_visitor scope
              ddim dynload_cuda selected_rows_functor sendrecvop_rpc)

@@ -35,6 +38,8 @@ if(WITH_GPU)
else()
  cc_library(all_reduce_op_handle SRCS all_reduce_op_handle.cc DEPS op_handle_base scope lod_tensor ddim memory
            variable_visitor)
  cc_library(fused_all_reduce_op_handle SRCS fused_all_reduce_op_handle.cc DEPS op_handle_base scope lod_tensor ddim memory
            variable_visitor)
  if(WITH_DISTRIBUTE)
    cc_library(reduce_op_handle SRCS reduce_op_handle.cc DEPS op_handle_base variable_visitor scope
              ddim selected_rows_functor sendrecvop_rpc)

@@ -71,6 +76,8 @@ cc_library(all_reduce_deps_pass SRCS all_reduce_deps_pass.cc DEPS graph graph_he
cc_library(multi_devices_graph_pass SRCS multi_devices_graph_pass.cc DEPS multi_devices_helper computation_op_handle
           scale_loss_grad_op_handle rpc_op_handle all_reduce_op_handle reduce_op_handle broadcast_op_handle data_balance_op_handle fused_broadcast_op_handle)
cc_library(fuse_all_reduce_op_pass SRCS fuse_all_reduce_op_pass.cc DEPS graph graph_helper fused_all_reduce_op_handle)
set(SSA_GRAPH_EXECUTOR_DEPS graph framework_proto sequential_execution_pass modify_op_lock_and_record_event_pass all_reduce_deps_pass reference_count_pass eager_deletion_pass memory_optimize_pass inplace_op_pass)
if(WITH_GPU)
  list(APPEND SSA_GRAPH_EXECUTOR_DEPS reference_count_pass)

@@ -98,5 +105,5 @@ cc_library(build_strategy SRCS build_strategy.cc DEPS
            graph_viz_pass multi_devices_graph_pass
            multi_devices_graph_print_pass multi_devices_graph_check_pass
            fuse_elewise_add_act_pass multi_batch_merge_pass
            fuse_relu_depthwise_conv_pass
            memory_optimize_pass lock_free_optimize_pass)
            fuse_relu_depthwise_conv_pass
            memory_optimize_pass lock_free_optimize_pass
            alloc_continuous_space_for_grad_pass fuse_all_reduce_op_pass)
paddle/fluid/framework/details/alloc_continuous_space_for_grad_pass.cc (new file, mode 100644)

// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

#include "paddle/fluid/framework/details/build_strategy.h"
#include "paddle/fluid/framework/details/multi_devices_helper.h"
#include "paddle/fluid/framework/ir/graph_helper.h"
#include "paddle/fluid/framework/op_registry.h"

DEFINE_uint32(fuse_parameter_memory_size, 0,  // 0 KB
              "fuse_parameter_memory_size is up limited memory size "
              "of one group parameters' gradient which is the input "
              "of communication calling(e.g NCCLAllReduce). "
              "The default value is 0, it means that "
              "not set group according to memory_size.");
DEFINE_int32(
    fuse_parameter_groups_size, 3,
    "fuse_parameter_groups_size is the size of one group parameters' gradient. "
    "The default value is a experimental result. If the "
    "fuse_parameter_groups_size is 1, it means that the groups size is "
    "the number of parameters' gradient. If the fuse_parameter_groups_size is "
    "-1, it means that there are only one group. The default value is 3, it is "
    "an experimental value.");

namespace paddle {
namespace framework {
namespace details {

static const char kUnKnow[] = "@UNKNOW@";
static framework::proto::VarType::Type kDefaultDtype =
    framework::proto::VarType::Type::VarType_Type_BOOL;

class AllocContinuousSpaceForGradPass : public ir::Pass {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override {
    ir::Graph &result = *graph;

    auto &places = Get<const std::vector<platform::Place>>(kPlaces);
    auto &local_scopes = Get<const std::vector<Scope *>>(kLocalScopes);

    ResetAttribute<ParamsAndGrads>(kParamsAndGrads, &result);
    ResetAttribute<GroupGradsAndParams>(kGroupGradsAndParams, &result);

    // NOTE: The operator nodes should be in topology order.
    std::vector<ir::Node *> topo_nodes = ir::TopologySortOperations(result);
    auto &params_grads = result.Get<ParamsAndGrads>(kParamsAndGrads);
    for (auto &node : topo_nodes) {
      RecordParamsAndGrads(node, &params_grads);
    }

    if (params_grads.size() == 0) {
      VLOG(10) << "Doesn't find gradients";
      return std::move(graph);
    }

    std::unordered_map<std::string, ir::Node *> vars;
    for (ir::Node *node : result.Nodes()) {
      if (node->IsVar() && node->Var()) {
        // Note: The graph may have the same name node. For example, parameter
        // is the input of operator and it also is the output of optimizer;
        vars.emplace(node->Var()->Name(), node);
      }
    }

    auto &group_grads_params =
        result.Get<GroupGradsAndParams>(kGroupGradsAndParams);

    // Note: the order of params_grads may be changed by SetGroupGradsAndParams.
    SetGroupGradsAndParams(vars, params_grads, &group_grads_params);

    params_grads.clear();
    for (auto &group_p_g : group_grads_params) {
      params_grads.insert(params_grads.begin(), group_p_g.begin(),
                          group_p_g.end());
    }
    for (auto &p_g : params_grads) {
      std::swap(p_g.first, p_g.second);
    }

    // Set Gradients as Persistable to prevent this var becoming reusable.
    auto dtype = kDefaultDtype;
    for (auto &p_g : params_grads) {
      // Get gradient var
      auto iter = vars.find(p_g.second);
      PADDLE_ENFORCE(iter != vars.end(), "%s is not found.", p_g.second);
      iter->second->Var()->SetPersistable(true);

      PADDLE_ENFORCE(IsSupportedVarType(iter->second->Var()->GetType()));

      // Get Dtype
      auto ele_dtype = iter->second->Var()->GetDataType();
      if (dtype == kDefaultDtype) {
        dtype = ele_dtype;
        PADDLE_ENFORCE_NE(ele_dtype, kDefaultDtype);
      }
      PADDLE_ENFORCE_EQ(ele_dtype, dtype);
    }

    // Create the fused variable name.
    if (!result.Has(kFusedVars)) {
      result.Set(kFusedVars, new FusedVars);
    }
    const std::string prefix(kFusedVarNamePrefix);
    // The fused_var_name should be unique.
    auto fused_var_name = prefix + "GRAD@" + params_grads[0].second;
    auto &fused_var_set = result.Get<FusedVars>(kFusedVars);
    PADDLE_ENFORCE_EQ(fused_var_set.count(fused_var_name), 0);
    fused_var_set.insert(fused_var_name);

    InitFusedVarsAndAllocSpaceForVars(places, local_scopes, vars,
                                      fused_var_name, params_grads);

    return std::move(graph);
  }

  template <typename AttrType>
  void ResetAttribute(const std::string &attr_name, ir::Graph *graph) const {
    if (graph->Has(attr_name)) {
      VLOG(10) << attr_name << " is reset.";
      graph->Erase(attr_name);
    }
    graph->Set(attr_name, new AttrType);
  }

  void SetGroupGradsAndParams(
      const std::unordered_map<std::string, ir::Node *> &var_nodes,
      const ParamsAndGrads &params_grads,
      GroupGradsAndParams *group_grads_params) const {
    SetGroupAccordingToLayers(var_nodes, params_grads, group_grads_params);
    SetGroupAccordingToMemorySize(var_nodes, group_grads_params);
    SetGroupAccordingToGroupSize(var_nodes, group_grads_params);
  }

  void SetGroupAccordingToLayers(
      const std::unordered_map<std::string, ir::Node *> &var_nodes,
      const ParamsAndGrads &params_grads,
      GroupGradsAndParams *group_grads_params) const {
    std::unordered_map<std::string, std::vector<int>> layer_params;

    for (size_t i = 0; i < params_grads.size(); ++i) {
      auto pos = params_grads[i].first.find_first_of(".");
      if (pos == std::string::npos) {
        layer_params[std::string(kUnKnow)].emplace_back(i);
      } else {
        layer_params[params_grads[i].first.substr(0, pos)].emplace_back(i);
      }
    }

    group_grads_params->reserve(layer_params.size());
    for (size_t i = 0; i < params_grads.size(); ++i) {
      auto pos = params_grads[i].first.find_first_of(".");
      std::string key = kUnKnow;
      if (pos != std::string::npos) {
        key = params_grads[i].first.substr(0, pos);
      }
      auto iter = layer_params.find(key);
      if (iter == layer_params.end()) continue;

      group_grads_params->emplace_back();
      auto &local_group_grads_params = group_grads_params->back();

      for (auto &idx : iter->second) {
        local_group_grads_params.emplace_back(
            std::make_pair(params_grads[idx].second, params_grads[idx].first));
      }
      layer_params.erase(iter);
    }

    VLOG(10) << "SetGroupAccordingToLayers: ";
    for (size_t i = 0; i < group_grads_params->size(); ++i) {
      VLOG(10) << "group " << i;
      std::stringstream out;
      for (auto &p_g : group_grads_params->at(i)) {
        out << "(" << p_g.second << ", " << p_g.first << "), ";
      }
      VLOG(10) << out.str();
    }
  }

  void SetGroupAccordingToMemorySize(
      const std::unordered_map<std::string, ir::Node *> &var_nodes,
      GroupGradsAndParams *group_grads_params) const {
    if (FLAGS_fuse_parameter_memory_size == 0) {
      return;
    }
    size_t group_memory_size =
        static_cast<size_t>(FLAGS_fuse_parameter_memory_size);
    GroupGradsAndParams local_group_grads_params;

    size_t j = 0;
    while (j < group_grads_params->size()) {
      local_group_grads_params.emplace_back();
      auto &group_p_g = local_group_grads_params.back();
      size_t local_group_memory_size = 0;
      while (j < group_grads_params->size()) {
        std::for_each(
            group_grads_params->at(j).begin(), group_grads_params->at(j).end(),
            [&local_group_memory_size,
             &var_nodes](const std::pair<std::string, std::string> &g_p) {
              auto iter = var_nodes.find(g_p.second);
              PADDLE_ENFORCE(iter != var_nodes.end(), "%s is not found.",
                             g_p.second);
              auto shape = iter->second->Var()->GetShape();
              size_t size =
                  framework::SizeOfType(iter->second->Var()->GetDataType());
              std::for_each(shape.begin(), shape.end(),
                            [&size](const int64_t &n) { size *= n; });
              local_group_memory_size += size;
            });
        group_p_g.insert(group_p_g.end(), group_grads_params->at(j).begin(),
                         group_grads_params->at(j).end());
        ++j;
        if (local_group_memory_size >= group_memory_size) {
          break;
        }
      }
    }

    std::swap(*group_grads_params, local_group_grads_params);

    VLOG(10) << string::Sprintf(
        "SetGroupAccordingToMemorySize(memory_size: %d):",
        FLAGS_fuse_parameter_memory_size);
    for (size_t i = 0; i < group_grads_params->size(); ++i) {
      VLOG(10) << "group " << i;
      std::stringstream out;
      for (auto &g_p : group_grads_params->at(i)) {
        auto iter = var_nodes.find(g_p.second);
        PADDLE_ENFORCE(iter != var_nodes.end(), "%s is not found.", g_p.second);
        auto shape = iter->second->Var()->GetShape();
        size_t size = framework::SizeOfType(iter->second->Var()->GetDataType());
        std::for_each(shape.begin(), shape.end(),
                      [&size](const int64_t &n) { size *= n; });
        out << string::Sprintf("(%s(%d), %s)", g_p.second, size, g_p.first);
      }
      VLOG(10) << out.str();
    }
  }

  void SetGroupAccordingToGroupSize(
      const std::unordered_map<std::string, ir::Node *> &var_nodes,
      GroupGradsAndParams *group_grads_params) const {
    if (FLAGS_fuse_parameter_groups_size == 1) {
      return;
    }
    size_t group_size = static_cast<size_t>(FLAGS_fuse_parameter_groups_size);
    if (FLAGS_fuse_parameter_groups_size == -1) {
      group_size = group_grads_params->size();
    }
    PADDLE_ENFORCE_GT(group_size, 1);
    size_t groups = (group_grads_params->size() + group_size - 1) / group_size;
    GroupGradsAndParams local_group_grads_params;
    local_group_grads_params.reserve(groups);

    size_t j = 0;
    for (size_t i = 0; i < groups; ++i) {
      local_group_grads_params.emplace_back();
      auto &group_p_g = local_group_grads_params.back();
      group_p_g.reserve(group_size);
      while (j < group_grads_params->size()) {
        group_p_g.insert(group_p_g.end(), group_grads_params->at(j).begin(),
                         group_grads_params->at(j).end());
        ++j;
        if (j % group_size == 0) break;
      }
    }
    std::swap(*group_grads_params, local_group_grads_params);

    VLOG(10) << "SetGroupAccordingToGroupSize(group_size: " << group_size
             << "): ";
    for (size_t i = 0; i < group_grads_params->size(); ++i) {
      VLOG(10) << "group " << i;
      std::stringstream out;
      for (auto &p_g : group_grads_params->at(i)) {
        out << "(" << p_g.second << ", " << p_g.first << "), ";
      }
      VLOG(10) << out.str();
    }
  }

 private:
  bool IsSupportedVarType(const proto::VarType::Type &type) const {
    // Current only support LOD_TENSOR.
    return type == proto::VarType::LOD_TENSOR;
  }

  void AppendAllocSpaceForVarsOp(const std::vector<std::string> &params_name,
                                 const std::vector<std::string> &grads_name,
                                 const std::string &fused_var_name,
                                 BlockDesc *global_block) const {
    auto op_desc = global_block->AppendOp();
    op_desc->SetType("alloc_continuous_space");
    op_desc->SetInput("Input", params_name);
    op_desc->SetOutput("Output", grads_name);
    op_desc->SetOutput("FusedOutput", {fused_var_name});
  }

  void RecordParamsAndGrads(ir::Node *node,
                            ParamsAndGrads *params_grads) const {
    try {
      bool is_bk_op =
          static_cast<bool>(boost::get<int>(node->Op()->GetAttr(
                                OpProtoAndCheckerMaker::OpRoleAttrName())) &
                            static_cast<int>(OpRole::kBackward));
      if (!is_bk_op) return;

      // Currently, we assume that once gradient is generated, it can be
      // broadcast, and each gradient is only broadcast once.
      auto backward_vars =
          boost::get<std::vector<std::string>>(node->Op()->GetNullableAttr(
              OpProtoAndCheckerMaker::OpRoleVarAttrName()));
      PADDLE_ENFORCE_EQ(backward_vars.size() % 2, static_cast<size_t>(0));

      for (size_t i = 0; i < backward_vars.size(); i += 2) {
        VLOG(10) << "Trainable parameter: " << backward_vars[i]
                 << ", gradient: " << backward_vars[i + 1];
        params_grads->emplace_back(std::make_pair(
            backward_vars[i] /*param*/, backward_vars[i + 1] /*grad*/));
      }
    } catch (boost::bad_get e) {
    }
  }

  void InitFusedVarsAndAllocSpaceForVars(
      const std::vector<platform::Place> &places,
      const std::vector<Scope *> &local_scopes,
      const std::unordered_map<std::string, ir::Node *> &vars,
      const std::string &fused_var_name,
      const ParamsAndGrads &params_grads) const {
    // Init Gradients and FusedVars
    VLOG(10) << "Init FusedVars and Gradients.";
    for (auto it = local_scopes.rbegin(); it != local_scopes.rend(); ++it) {
      auto &scope = *it;

      PADDLE_ENFORCE(scope->FindVar(fused_var_name) == nullptr,
                     "%s has existed in scope.", fused_var_name);
      scope->Var(fused_var_name)->GetMutable<LoDTensor>();

      for (auto &p_g : params_grads) {
        auto iter = vars.find(p_g.second);
        PADDLE_ENFORCE(iter != vars.end());
        PADDLE_ENFORCE_NOT_NULL(iter->second->Var());
        PADDLE_ENFORCE_EQ(iter->second->Var()->GetType(),
                          proto::VarType::LOD_TENSOR);
        scope->Var(p_g.second)->GetMutable<LoDTensor>();
      }
    }

    std::vector<std::string> grads_name;
    std::vector<std::string> params_name;
    grads_name.reserve(params_grads.size());
    params_name.reserve(params_grads.size());
    for (auto &p_g : params_grads) {
      params_name.emplace_back(p_g.first);
      grads_name.emplace_back(p_g.second);
    }
    framework::ProgramDesc program_desc;
    AppendAllocSpaceForVarsOp(params_name, grads_name, fused_var_name,
                              program_desc.MutableBlock(0));

    // Run Only Once Programs
    for (size_t i = 0; i < local_scopes.size(); ++i) {
      for (auto &op_desc : program_desc.Block(0).AllOps()) {
        auto op = OpRegistry::CreateOp(*op_desc);
        op->Run(*local_scopes[i], places[i]);
      }
    }
  }
};

}  // namespace details
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(alloc_continuous_space_for_grad_pass,
              paddle::framework::details::AllocContinuousSpaceForGradPass)
    .RequirePassAttr(paddle::framework::details::kPlaces)
    .RequirePassAttr(paddle::framework::details::kLocalScopes);
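The pass above groups gradients in three stages: first by layer prefix (the parameter name up to the first '.'), then by merging consecutive groups until each reaches FLAGS_fuse_parameter_memory_size bytes (0 disables this stage), and finally by coalescing every FLAGS_fuse_parameter_groups_size groups into one (-1 collapses everything into a single group). A small Python sketch of that heuristic on hypothetical (name, size-in-bytes) pairs, for illustration only and not part of Paddle:

from collections import OrderedDict

def group_params(params, memory_size=0, groups_size=3):
    """params: list of (name, nbytes); mirrors the three-stage grouping heuristic."""
    # Stage 1: group by layer prefix (text before the first '.'), '@UNKNOW@' otherwise.
    by_layer = OrderedDict()
    for name, nbytes in params:
        key = name.split('.', 1)[0] if '.' in name else '@UNKNOW@'
        by_layer.setdefault(key, []).append((name, nbytes))
    groups = list(by_layer.values())

    # Stage 2: merge consecutive groups until each reaches memory_size bytes (0 disables).
    if memory_size > 0:
        merged, cur, cur_bytes = [], [], 0
        for g in groups:
            cur.extend(g)
            cur_bytes += sum(n for _, n in g)
            if cur_bytes >= memory_size:
                merged.append(cur)
                cur, cur_bytes = [], 0
        if cur:
            merged.append(cur)
        groups = merged

    # Stage 3: coalesce every `groups_size` groups into one (-1 means a single group).
    if groups_size != 1:
        step = len(groups) if groups_size == -1 else groups_size
        groups = [sum(groups[i:i + step], []) for i in range(0, len(groups), step)]
    return groups

print(group_params([('fc_0.w', 400), ('fc_0.b', 4), ('fc_1.w', 800), ('fc_1.b', 8)]))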
paddle/fluid/framework/details/build_strategy.cc

@@ -46,7 +46,16 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
 public:
  explicit ParallelExecutorPassBuilder(const BuildStrategy &strategy)
      : ir::PassBuilder(), strategy_(strategy) {
    // Add a graph viz pass to record a graph.
    if (!strategy_.debug_graphviz_path_.empty()) {
      auto viz_pass = AppendPass("graph_viz_pass");
      const std::string graph_path = string::Sprintf(
          "%s%s", strategy_.debug_graphviz_path_.c_str(), "_original_graph");
      viz_pass->Set<std::string>("graph_viz_path", new std::string(graph_path));
    }

    if (strategy_.enable_sequential_execution_) {
      VLOG(10) << "Add sequential_execution_pass";
      AppendPass("sequential_execution_pass");
    }

@@ -57,6 +66,7 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
    // Add op fusion.
    if (strategy.fuse_relu_depthwise_conv_) {
      VLOG(10) << "Add fuse_relu_depthwise_conv_pass";
      AppendPass("fuse_relu_depthwise_conv_pass");
    }

@@ -68,29 +78,30 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
    // Add automatically inplace.
    if (strategy_.enable_inplace_) {
      VLOG(10) << "Add inplace_pass";
      AppendPass("inplace_pass");
    }

    if (strategy.fuse_elewise_add_act_ops_) {
      VLOG(10) << "Add fuse_elewise_add_act_pass";
      AppendPass("fuse_elewise_add_act_pass");
    }

    // for single card training, fuse_all_reduce_ops is unnecessary.
    // alloc_continuous_space_for_grad_pass should be before of MultiDevPass.
    if (strategy.fuse_all_reduce_ops_) {
      VLOG(10) << "Add alloc_continuous_space_for_grad_pass";
      AppendPass("alloc_continuous_space_for_grad_pass");
    }

    // Add a graph viz pass to record a graph.
    if (!strategy_.debug_graphviz_path_.empty()) {
    if (!strategy.debug_graphviz_path_.empty()) {
      auto viz_pass = AppendPass("graph_viz_pass");
      const std::string graph_path = string::Sprintf(
          "%s%s", strategy_.debug_graphviz_path_.c_str(), "_original_graph");
          "%s%s", strategy.debug_graphviz_path_.c_str(), "_fused_graph");
      viz_pass->Set<std::string>("graph_viz_path", new std::string(graph_path));
    }

    if (strategy.fuse_elewise_add_act_ops_) {
      auto fuse_elewise_add_act_pass = AppendPass("fuse_elewise_add_act_pass");
      // Add a graph viz pass to record a graph.
      if (!strategy.debug_graphviz_path_.empty()) {
        auto viz_pass = AppendPass("graph_viz_pass");
        const std::string graph_path = string::Sprintf(
            "%s%s", strategy.debug_graphviz_path_.c_str(), "_fused_graph");
        viz_pass->Set<std::string>("graph_viz_path", new std::string(graph_path));
      }
    }

    CollectiveContext *context = CollectiveContext::GetInstance();
    context->endpoints_ = strategy_.trainers_endpoints_;
    context->trainer_id_ = strategy_.trainer_id_;

@@ -108,11 +119,19 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
    // A side-effect of that, memory optimize cannot forsee the fetched vars
    // , so fetchlist should be set persistable before call the Run interface.
    if (strategy.memory_optimize_) {
      auto memory_optimize_pass = AppendPass("memory_optimize_pass");
      VLOG(10) << "Add memory_optimize_pass";
      AppendPass("memory_optimize_pass");
    }

    AppendMultiDevPass(strategy);

    if (strategy.fuse_all_reduce_ops_) {
      // NOTE: fuse_all_reduce_ops will count the number of all_reduce operator
      // first, if the number is zero, fuse_all_reduce_ops will do nothing.
      VLOG(10) << "Add fuse_all_reduce_op_pass";
      AppendPass("fuse_all_reduce_op_pass");
    }

    // Add a graph print pass to record a graph with device info.
    if (!strategy_.debug_graphviz_path_.empty()) {
      auto multi_devices_print_pass = AppendPass("multi_devices_print_pass");

@@ -129,27 +148,29 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
    AppendPass("multi_devices_check_pass");

    if (SeqOnlyAllReduceOps(strategy)) {
      VLOG(10) << "Add all_reduce_deps_pass";
      AppendPass("all_reduce_deps_pass");
    }

    if (strategy_.remove_unnecessary_lock_) {
      VLOG(10) << "Add modify_op_lock_and_record_event_pass";
      AppendPass("modify_op_lock_and_record_event_pass");
    }
  }

  // Convert graph to run on multi-devices.
  void AppendMultiDevPass(const BuildStrategy &strategy) {
    ir::Pass *multi_devices_pass;
    ir::Pass *multi_devices_pass = nullptr;
    if (strategy_.is_distribution_) {
      VLOG(3) << "multi device parameter server mode";
      VLOG(10) << "Add dist_multi_devices_pass";
      multi_devices_pass = AppendPass("dist_multi_devices_pass").get();
    } else {
      if (strategy.reduce_ == BuildStrategy::ReduceStrategy::kAllReduce) {
        VLOG(3) << "multi devices collective mode with allreduce";
        VLOG(10) << "Add all_reduce_mode_multi_devices_pass";
        multi_devices_pass = AppendPass("allreduce_mode_multi_devices_pass").get();
        multi_devices_pass = AppendPass("all_reduce_mode_multi_devices_pass").get();
      } else if (strategy.reduce_ == BuildStrategy::ReduceStrategy::kReduce) {
        VLOG(3) << "multi deivces collective mode with reduce";
        VLOG(10) << "Add reduce_mode_multi_devices_pass";
        multi_devices_pass = AppendPass("reduce_mode_multi_devices_pass").get();
      } else {
        PADDLE_THROW("Unknown reduce strategy.");

@@ -206,9 +227,26 @@ std::unique_ptr<ir::Graph> BuildStrategy::Apply(
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
      platform::NCCLContextMap *nctx = use_cuda ? nccl_ctxs : nullptr;
      pass->Erase("nccl_ctxs");
      pass->SetNotOwned<platform::NCCLContextMap>("nccl_ctxs", nctx);
      pass->Erase(kNCCLCtxs);
      pass->SetNotOwned<platform::NCCLContextMap>(kNCCLCtxs, nctx);
#endif
    } else if (pass->Type() == "fuse_all_reduce_op_pass") {
      pass->Erase(kPlaces);
      pass->SetNotOwned<const std::vector<platform::Place>>(kPlaces, &places);
      pass->Erase(kLocalScopes);
      pass->SetNotOwned<const std::vector<Scope *>>(kLocalScopes, &local_scopes);
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
      platform::NCCLContextMap *nctx = use_cuda ? nccl_ctxs : nullptr;
      pass->Erase(kNCCLCtxs);
      pass->SetNotOwned<platform::NCCLContextMap>(kNCCLCtxs, nctx);
#endif
    } else if (pass->Type() == "alloc_continuous_space_for_grad_pass") {
      pass->Erase(kPlaces);
      pass->SetNotOwned<const std::vector<platform::Place>>(kPlaces, &places);
      pass->Erase(kLocalScopes);
      pass->SetNotOwned<const std::vector<Scope *>>(kLocalScopes, &local_scopes);
    } else if (pass->Type() == "sequential_execution_pass") {
      LOG(INFO) << "set enable_sequential_execution:"
                << enable_sequential_execution_;

@@ -239,7 +277,7 @@ USE_PASS(fuse_elewise_add_act_pass);
USE_PASS(graph_viz_pass);
USE_PASS(multi_batch_merge_pass);
USE_PASS(reduce_mode_multi_devices_pass);
USE_PASS(allreduce_mode_multi_devices_pass);
USE_PASS(all_reduce_mode_multi_devices_pass);
USE_PASS(dist_multi_devices_pass);
USE_PASS(multi_devices_check_pass);
USE_PASS(multi_devices_print_pass);

@@ -249,4 +287,6 @@ USE_PASS(all_reduce_deps_pass);
USE_PASS(modify_op_lock_and_record_event_pass);
USE_PASS(inplace_pass);
USE_PASS(lock_free_optimize_pass);
USE_PASS(alloc_continuous_space_for_grad_pass);
USE_PASS(graph_to_program_pass);
USE_PASS(fuse_all_reduce_op_pass);
paddle/fluid/framework/details/build_strategy.h

@@ -16,6 +16,7 @@
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "paddle/fluid/framework/ir/pass_builder.h"

@@ -75,6 +76,8 @@ struct BuildStrategy {
  bool fuse_elewise_add_act_ops_{false};

  bool fuse_all_reduce_ops_{false};

  bool fuse_relu_depthwise_conv_{false};

  bool sync_batch_norm_{false};
paddle/fluid/framework/details/fuse_all_reduce_op_pass.cc (new file, mode 100644)

// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <algorithm>
#include <string>
#include <vector>

#include "paddle/fluid/framework/details/all_reduce_op_handle.h"
#include "paddle/fluid/framework/details/container_cast.h"
#include "paddle/fluid/framework/details/fused_all_reduce_op_handle.h"
#include "paddle/fluid/framework/details/multi_devices_helper.h"
#include "paddle/fluid/framework/ir/graph_helper.h"

namespace paddle {
namespace framework {
namespace details {

class FuseAllReduceOpPass : public ir::Pass {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override {
    ir::Graph &result = *graph;

    auto &places = Get<const std::vector<platform::Place>>(kPlaces);
    auto &local_scopes = Get<const std::vector<Scope *>>(kLocalScopes);
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
    auto *nccl_ctxs = &Get<platform::NCCLContextMap>(kNCCLCtxs);
#endif

    std::unordered_set<std::string> grads;
    auto &params_grads = result.Get<ParamsAndGrads>(kParamsAndGrads);
    size_t num_of_all_reduce = params_grads.size();
    grads.reserve(num_of_all_reduce);
    for (auto p_g : params_grads) {
      grads.insert(p_g.second);
    }

    size_t num_place = places.size();
    std::unordered_map<std::string, ir::Node *> all_reduce_ops;
    all_reduce_ops.reserve(grads.size());
    for (auto &node : result.Nodes()) {
      if (node->IsOp()) {
        PADDLE_ENFORCE(node->IsWrappedBy<OpHandleBase>());
        auto *all_reduce_op_handle =
            dynamic_cast<AllReduceOpHandle *>(&node->Wrapper<OpHandleBase>());
        if (all_reduce_op_handle) {
          auto inputs = DynamicCast<VarHandle>(all_reduce_op_handle->Inputs());
          PADDLE_ENFORCE_EQ(inputs.size(), num_place);
          // The inputs' name should be the same.
          auto &grad_name = inputs[0]->name();
          for (size_t i = 1; i < inputs.size(); ++i) {
            PADDLE_ENFORCE_EQ(inputs[i]->name(), grad_name,
                              "The input name should be the same.");
          }
          PADDLE_ENFORCE_NE(grads.count(grad_name), static_cast<size_t>(0));
          all_reduce_ops.emplace(grad_name, node);
        }
      }
    }

    VLOG(10) << "Find all_reduce_ops: " << all_reduce_ops.size();
    if (all_reduce_ops.size() == 0) {
      return std::move(graph);
    }

    PADDLE_ENFORCE_EQ(all_reduce_ops.size(), grads.size(),
                      "The number of all_reduce OpHandle is not equal to the "
                      "number of grads. Maybe some gradients are sparse type, "
                      "it is not supported currently.");
    VLOG(10) << "Insert fused_all_reduce";

    auto &group_grads_params =
        graph->Get<GroupGradsAndParams>(kGroupGradsAndParams);

    for (auto &group_g_p : group_grads_params) {
      size_t group_size = group_g_p.size();
      PADDLE_ENFORCE_GT(group_size, static_cast<size_t>(0));
      std::vector<ir::Node *> group_all_reduce_ops;
      group_all_reduce_ops.reserve(group_size);
      for (auto &g_p : group_g_p) {
        group_all_reduce_ops.emplace_back(all_reduce_ops.at(g_p.first));
      }
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
      InsertFusedAllReduce(places, local_scopes, group_size,
                           group_all_reduce_ops, nccl_ctxs, &result);
#else
      InsertFusedAllReduce(places, local_scopes, group_size,
                           group_all_reduce_ops, &result);
#endif
    }
    return std::move(graph);
  }

  void InsertFusedAllReduce(const std::vector<platform::Place> &places,
                            const std::vector<Scope *> &local_scopes,
                            const size_t num_of_all_reduce,
                            const std::vector<ir::Node *> &all_reduce_ops,
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
                            const platform::NCCLContextMap *nccl_ctxs,
#endif
                            ir::Graph *result) const {
    std::vector<VarHandleBase *> inputs;
    std::vector<VarHandleBase *> outputs;
    for (auto &op : all_reduce_ops) {
      auto &op_handle = op->Wrapper<OpHandleBase>();
      inputs.insert(inputs.end(), op_handle.Inputs().begin(),
                    op_handle.Inputs().end());
      // Remove output
      for_each(op_handle.Inputs().begin(), op_handle.Inputs().end(),
               [&op_handle](VarHandleBase *var_handle) {
                 var_handle->RemoveOutput(&op_handle, op_handle.Node());
               });

      outputs.insert(outputs.end(), op_handle.Outputs().begin(),
                     op_handle.Outputs().end());
      // Remove Input
      for_each(op_handle.Outputs().begin(), op_handle.Outputs().end(),
               [](VarHandleBase *var_handle) { var_handle->ClearGeneratedOp(); });

      result->RemoveNode(op_handle.Node());
    }

#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
    CreateFusedAllReduceOp(inputs, outputs, num_of_all_reduce, places,
                           local_scopes, nccl_ctxs, result);
#else
    CreateFusedAllReduceOp(inputs, outputs, num_of_all_reduce, places,
                           local_scopes, result);
#endif
  }

 private:
  void CreateFusedAllReduceOp(const std::vector<VarHandleBase *> &inputs,
                              const std::vector<VarHandleBase *> &outputs,
                              const size_t num_of_all_reduce,
                              const std::vector<platform::Place> &places,
                              const std::vector<Scope *> &local_scopes,
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
                              const platform::NCCLContextMap *nccl_ctxs,
#endif
                              ir::Graph *result) const {
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
    auto *op_handle = new FusedAllReduceOpHandle(
        result->CreateEmptyNode("fused_all_reduce", ir::Node::Type::kOperation),
        local_scopes, places, num_of_all_reduce, nccl_ctxs);
#else
    auto *op_handle = new FusedAllReduceOpHandle(
        result->CreateEmptyNode("fused_all_reduce", ir::Node::Type::kOperation),
        local_scopes, places, num_of_all_reduce);
#endif

    for (auto in : inputs) {
      op_handle->AddInput(in);
    }

    for (auto out : outputs) {
      op_handle->AddOutput(out);
    }

#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
    if (!nccl_ctxs) {
      SetCommunicationContext(places, op_handle);
    }
#else
    SetCommunicationContext(places, op_handle);
#endif
  }

  void SetCommunicationContext(const std::vector<platform::Place> &places,
                               FusedAllReduceOpHandle *op_handle) const {
    for (size_t i = 0; i < places.size(); ++i) {
      op_handle->SetDeviceContext(
          places[i], platform::DeviceContextPool::Instance().Get(places[i]));
    }
  }
};

}  // namespace details
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(fuse_all_reduce_op_pass,
              paddle::framework::details::FuseAllReduceOpPass);
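Numerically, the fusion replaces one all-reduce call per gradient with a single call over the contiguous buffer prepared by alloc_continuous_space; the reduced values are unchanged. A toy NumPy illustration of that equivalence (two devices, three gradients), purely for intuition and not Paddle API:

import numpy as np

# Per-device gradients; alloc_continuous_space guarantees they sit in one contiguous buffer.
dev0 = np.concatenate([np.full(4, 1.0), np.full(2, 2.0), np.full(3, 3.0)])
dev1 = np.concatenate([np.full(4, 0.5), np.full(2, 1.5), np.full(3, 2.5)])

# Unfused: one all-reduce (here modeled as an element-wise sum) per gradient slice.
bounds = [0, 4, 6, 9]
unfused = [dev0[a:b] + dev1[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

# Fused: a single all-reduce over the whole buffer; the per-gradient results match.
fused = dev0 + dev1
assert all(np.allclose(f, u) for f, u in zip(np.split(fused, bounds[1:-1]), unfused))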
paddle/fluid/framework/details/fused_all_reduce_op_handle.cc (new file, mode 100644)

// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "paddle/fluid/framework/details/fused_all_reduce_op_handle.h"
#include <algorithm>
#include <utility>
#include "paddle/fluid/framework/details/container_cast.h"
#include "paddle/fluid/framework/details/reduce_and_gather.h"
#include "paddle/fluid/framework/details/variable_visitor.h"
#include "paddle/fluid/platform/profiler.h"

DEFINE_bool(skip_fused_all_reduce_check, false, "");

namespace paddle {
namespace framework {
namespace details {

typedef std::vector<std::vector<std::pair<std::string, const LoDTensor *>>>
    GradientAndLoDTensor;

#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
FusedAllReduceOpHandle::FusedAllReduceOpHandle(
    ir::Node *node, const std::vector<Scope *> &local_scopes,
    const std::vector<platform::Place> &places, const size_t num_of_all_reduce,
    const platform::NCCLContextMap *ctxs)
    : OpHandleBase(node),
      local_scopes_(local_scopes),
      places_(places),
      num_of_all_reduce_(num_of_all_reduce),
      nccl_ctxs_(ctxs) {
  if (nccl_ctxs_) {
    for (auto &p : places_) {
      this->SetDeviceContext(p, nccl_ctxs_->DevCtx(p));
    }
  }
  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
}
#else
FusedAllReduceOpHandle::FusedAllReduceOpHandle(
    ir::Node *node, const std::vector<Scope *> &local_scopes,
    const std::vector<platform::Place> &places, const size_t num_of_all_reduce)
    : OpHandleBase(node),
      local_scopes_(local_scopes),
      places_(places),
      num_of_all_reduce_(num_of_all_reduce) {
  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
}
#endif

void FusedAllReduceOpHandle::RunImpl() {
  platform::RecordEvent record_event(Name());

  VLOG(4) << this->DebugString();

  WaitInputVarGenerated();
  // The input: grad0(dev0), grad0(dev1), grad1(dev0), grad1(dev1)...
  // The output: grad0(dev0), grad0(dev1), grad1(dev0), grad1(dev1)...
  auto in_var_handles = DynamicCast<VarHandle>(this->Inputs());
  auto out_var_handles = DynamicCast<VarHandle>(this->Outputs());

  size_t place_num = places_.size();
  PADDLE_ENFORCE_EQ(
      in_var_handles.size(), place_num * num_of_all_reduce_,
      "The NoDummyInputSize should be equal to the number of places.");
  PADDLE_ENFORCE_EQ(
      in_var_handles.size(), out_var_handles.size(),
      "The NoDummyInputSize and NoDummyOutputSize should be equal.");

  GradientAndLoDTensor grads_tensor;
  grads_tensor.resize(place_num);

  int64_t numel = -1;
  auto dtype = static_cast<framework::proto::VarType::Type>(0);
  for (size_t scope_idx = 0; scope_idx < local_scopes_.size(); ++scope_idx) {
    auto &g_tensor = grads_tensor.at(scope_idx);
    g_tensor.reserve(num_of_all_reduce_);

    GetGradLoDTensor(scope_idx, in_var_handles, out_var_handles, &g_tensor);

    int64_t element_num = 0;
    framework::proto::VarType::Type ele_dtype =
        static_cast<framework::proto::VarType::Type>(0);
    GetDTypeAndNumel(g_tensor, &ele_dtype, &element_num);

    if (numel == -1) {
      numel = element_num;
    }
    if (dtype == static_cast<framework::proto::VarType::Type>(0)) {
      dtype = ele_dtype;
      PADDLE_ENFORCE_NE(ele_dtype,
                        static_cast<framework::proto::VarType::Type>(0));
    }
    PADDLE_ENFORCE_EQ(ele_dtype, dtype);

    // Check whether the address space is contiguous.
    std::sort(
        g_tensor.begin(), g_tensor.end(),
        [](const std::pair<std::string, const LoDTensor *> &grad1,
           const std::pair<std::string, const LoDTensor *> &grad2) -> bool {
          return grad1.second->data<void>() < grad2.second->data<void>();
        });

    for (size_t k = 1; k < g_tensor.size(); ++k) {
      const void *pre_address = g_tensor.at(k - 1).second->data<void>();
      int64_t len = g_tensor.at(k - 1).second->numel();
      auto offset = len * framework::SizeOfType(dtype);
      void *next_address = reinterpret_cast<void *>(
          reinterpret_cast<uintptr_t>(pre_address) + offset);
      const void *cur_address = g_tensor.at(k).second->data<void>();
      VLOG(10) << k << ", "
               << " pre_address(" << g_tensor.at(k - 1).first
               << "): " << pre_address << ", cur_address("
               << g_tensor.at(k).first << "): " << cur_address
               << ", offset:" << offset << ", " << next_address << ", "
               << cur_address;
      PADDLE_ENFORCE_EQ(next_address, cur_address);
    }
  }

  if (!FLAGS_skip_fused_all_reduce_check) {
    for (size_t scope_idx = 0; scope_idx < place_num; ++scope_idx) {
      for (size_t j = 1; j < num_of_all_reduce_; ++j) {
        PADDLE_ENFORCE_EQ(grads_tensor.at(0).at(j).first,
                          grads_tensor.at(scope_idx).at(j).first);
      }
    }
  }

  std::vector<const void *> lod_tensor_data;
  for (size_t scope_idx = 0; scope_idx < place_num; ++scope_idx) {
    auto data = grads_tensor.at(scope_idx).at(0).second->data<void>();
    lod_tensor_data.emplace_back(data);
  }

  if (platform::is_gpu_place(places_[0])) {
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
    PADDLE_ENFORCE(nccl_ctxs_, "nccl_ctxs should not be nullptr.");
    int nccl_dtype = platform::ToNCCLDataType(dtype);
    std::vector<std::function<void()>> all_reduce_calls;
    for (size_t i = 0; i < local_scopes_.size(); ++i) {
      auto &p = places_[i];
      void *buffer = const_cast<void *>(lod_tensor_data.at(i));

      int dev_id = boost::get<platform::CUDAPlace>(p).device;
      auto &nccl_ctx = nccl_ctxs_->at(dev_id);
      auto stream = nccl_ctx.stream();
      auto comm = nccl_ctx.comm_;
      all_reduce_calls.emplace_back([=] {
        PADDLE_ENFORCE(platform::dynload::ncclAllReduce(
            buffer, buffer, numel, static_cast<ncclDataType_t>(nccl_dtype),
            ncclSum, comm, stream));
      });
    }

    this->RunAndRecordEvent([&] {
      if (all_reduce_calls.size() == 1UL) {
        // Do not use NCCLGroup when manage NCCL by per thread per device
        all_reduce_calls[0]();
      } else {
        platform::NCCLGroupGuard guard;
        for (auto &call : all_reduce_calls) {
          call();
        }
      }
    });
#else
    PADDLE_THROW("Not compiled with CUDA");
#endif
  } else {
    // Special handle CPU only Operator's gradient. Like CRF
    auto grad_name = grads_tensor.at(0).at(0).first;
    auto &trg = *this->local_scopes_[0]
                     ->FindVar(kLocalExecScopeName)
                     ->Get<Scope *>()
                     ->FindVar(grad_name)
                     ->GetMutable<framework::LoDTensor>();

    // Reduce All data to trg in CPU
    ReduceBufferData func(lod_tensor_data, trg.data<void>(), numel);
    VisitDataType(trg.type(), func);

    for (size_t i = 1; i < local_scopes_.size(); ++i) {
      auto &scope =
          *local_scopes_[i]->FindVar(kLocalExecScopeName)->Get<Scope *>();
      auto &p = places_[i];
      auto *var = scope.FindVar(grad_name);
      auto *dev_ctx = dev_ctxes_.at(p);
      size_t size = numel * SizeOfType(trg.type());
      RunAndRecordEvent(p, [&trg, var, dev_ctx, p, size] {
        auto dst_ptr = var->GetMutable<framework::LoDTensor>()->data<void>();
        platform::CPUPlace cpu_place;
        memory::Copy(cpu_place, dst_ptr, cpu_place, trg.data<void>(), size);
      });
    }
  }
}

void FusedAllReduceOpHandle::GetGradLoDTensor(
    const size_t &scope_idx, const std::vector<VarHandle *> &in_var_handles,
    const std::vector<VarHandle *> &out_var_handles,
    std::vector<std::pair<std::string, const LoDTensor *>> *grad_tensor) const {
  auto *local_scope =
      local_scopes_.at(scope_idx)->FindVar(kLocalExecScopeName)->Get<Scope *>();
  size_t place_num = places_.size();

  for (size_t j = 0; j < in_var_handles.size(); j += place_num) {
    auto var_name = in_var_handles[j]->name();
    PADDLE_ENFORCE_EQ(var_name, out_var_handles[j]->name());
    auto &lod_tensor = local_scope->FindVar(var_name)->Get<LoDTensor>();
    PADDLE_ENFORCE_EQ(lod_tensor.place(), places_.at(scope_idx));
    grad_tensor->emplace_back(std::make_pair(var_name, &lod_tensor));
  }
}

void FusedAllReduceOpHandle::GetDTypeAndNumel
(
const
std
::
vector
<
std
::
pair
<
std
::
string
,
const
LoDTensor
*>>
&
grad_tensor
,
proto
::
VarType
::
Type
*
dtype
,
int64_t
*
numel
)
const
{
*
numel
=
0
;
for
(
size_t
i
=
0
;
i
<
grad_tensor
.
size
();
++
i
)
{
// Get element number
int64_t
len
=
grad_tensor
.
at
(
i
).
second
->
numel
();
PADDLE_ENFORCE_GT
(
len
,
0
);
*
numel
+=
len
;
// Get dtype
auto
ele_type
=
grad_tensor
.
at
(
i
).
second
->
type
();
if
(
i
==
0
)
{
*
dtype
=
ele_type
;
}
PADDLE_ENFORCE_EQ
(
ele_type
,
*
dtype
);
}
}
std
::
string
FusedAllReduceOpHandle
::
Name
()
const
{
return
"fused_all_reduce"
;
}
}
// namespace details
}
// namespace framework
}
// namespace paddle
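RunImpl above relies on the per-device gradients already having been packed back-to-back into one contiguous allocation, so one collective call can cover all of them. The snippet below is not part of the patch; it is a minimal standalone C++ sketch of the same address-arithmetic contiguity check and of the single fused reduction it enables (plain summation stands in for ncclAllReduce).

#include <cassert>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <utility>
#include <vector>

int main() {
  // Three "gradients" carved out of one contiguous backing buffer.
  std::vector<float> fused(12, 1.0f);
  std::vector<std::pair<float*, int64_t>> grads = {
      {fused.data(), 3}, {fused.data() + 3, 4}, {fused.data() + 7, 5}};

  // Contiguity check: each tensor must start exactly where the previous
  // one ends (the same arithmetic the PADDLE_ENFORCE_EQ above performs).
  for (size_t k = 1; k < grads.size(); ++k) {
    const float* expected = grads[k - 1].first + grads[k - 1].second;
    assert(expected == grads[k].first && "gradients are not contiguous");
  }

  // Because the buffers are contiguous, one reduction over the whole span
  // replaces one call per gradient.
  int64_t total = 0;
  for (auto& g : grads) total += g.second;
  float sum = std::accumulate(fused.data(), fused.data() + total, 0.0f);
  std::cout << "reduced " << total << " elements, sum = " << sum << "\n";
  return 0;
}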
paddle/fluid/framework/details/fused_all_reduce_op_handle.h
0 → 100644
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <string>
#include <utility>
#include <vector>
#include "paddle/fluid/framework/details/op_handle_base.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/scope.h"
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
#include "paddle/fluid/platform/nccl_helper.h"
#endif
namespace paddle {
namespace framework {
namespace details {

struct FusedAllReduceOpHandle : public OpHandleBase {
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
  FusedAllReduceOpHandle(ir::Node *node,
                         const std::vector<Scope *> &local_scopes,
                         const std::vector<platform::Place> &places,
                         const size_t num_of_all_reduce,
                         const platform::NCCLContextMap *ctxs);
#else
  FusedAllReduceOpHandle(ir::Node *node,
                         const std::vector<Scope *> &local_scopes,
                         const std::vector<platform::Place> &places,
                         const size_t num_of_all_reduce);
#endif
  std::string Name() const override;

  // Delay and buffer nccl_all_reduce together can significantly increase
  // performance. Disable this feature by returning false.
  bool IsMultiDeviceTransfer() override { return true; };

 protected:
  void RunImpl() override;

 private:
  std::vector<Scope *> local_scopes_;
  std::vector<platform::Place> places_;
  size_t num_of_all_reduce_;
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
  const platform::NCCLContextMap *nccl_ctxs_;
#endif

  // Check the dtype of the input
  void GetDTypeAndNumel(
      const std::vector<std::pair<std::string, const LoDTensor *>> &g_tensor,
      proto::VarType::Type *dtype, int64_t *total_num) const;

  // Get gradient's name and LoDTensor
  void GetGradLoDTensor(
      const size_t &scope_idx, const std::vector<VarHandle *> &in_var_handles,
      const std::vector<VarHandle *> &out_var_handles,
      std::vector<std::pair<std::string, const LoDTensor *>> *grad_tensor) const;
};

}  // namespace details
}  // namespace framework
}  // namespace paddle
paddle/fluid/framework/details/graph_test_base.h
...
...
@@ -68,11 +68,11 @@ class SplitOpMaker : public OpProtoAndCheckerMaker {
class DummyVarTypeInference : public VarTypeInference {
 public:
  void operator()(const OpDesc& op_desc, BlockDesc* block) const override {
    auto& inputs = op_desc.Input("X");
    auto type = block->Var(inputs.front())->GetType();
    auto out_var_name = op_desc.Output("Out").front();
    block->Var(out_var_name)->SetType(type);
  void operator()(framework::InferVarTypeContext* ctx) const override {
    auto& inputs = ctx->Input("X");
    auto type = ctx->GetType(inputs.front());
    auto out_var_name = ctx->Output("Out").front();
    ctx->SetType(out_var_name, type);
  }
};
...
...
paddle/fluid/framework/details/multi_devices_graph_pass.cc
...
...
@@ -11,18 +11,17 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/details/multi_devices_graph_pass.h"
#include <algorithm>
#include <fstream>
#include <string>
#include <utility>
#include <vector>
#include "paddle/fluid/framework/details/all_reduce_op_handle.h"
#include "paddle/fluid/framework/details/broadcast_op_handle.h"
#include "paddle/fluid/framework/details/computation_op_handle.h"
#include "paddle/fluid/framework/details/data_balance_op_handle.h"
#include "paddle/fluid/framework/details/fused_broadcast_op_handle.h"
#include "paddle/fluid/framework/details/multi_devices_graph_pass.h"
#include "paddle/fluid/framework/details/reduce_op_handle.h"
#include "paddle/fluid/framework/details/rpc_op_handle.h"
#include "paddle/fluid/framework/details/scale_loss_grad_op_handle.h"
...
...
@@ -134,21 +133,26 @@ void AddOutputToLeafOps(ir::Graph *graph) {
}
}  // namespace

void MultiDevSSAGraphBuilderBase::CheckGraph(const ir::Graph &graph) const {}

void MultiDevSSAGraphBuilderBase::Init() const {
  all_vars_.clear();

  loss_var_name_ = Get<const std::string>(kLossVarName);
  VLOG(10) << "Init MultiDevSSAGraphBuilder, loss name: " << loss_var_name_;
  places_ = Get<const std::vector<platform::Place>>(kPlaces);
  local_scopes_ = Get<const std::vector<Scope *>>(kLocalScopes);
  strategy_ = Get<const BuildStrategy>(kStrategy);
#if defined(PADDLE_WITH_CUDA) && !defined(_WIN32)
  nccl_ctxs_ = &Get<platform::NCCLContextMap>("nccl_ctxs");
  nccl_ctxs_ = &Get<platform::NCCLContextMap>(kNCCLCtxs);
#endif
  PADDLE_ENFORCE_EQ(places_.size(), local_scopes_.size());
}

std::unique_ptr<ir::Graph> MultiDevSSAGraphBuilderBase::ApplyImpl(
    std::unique_ptr<ir::Graph> graph) const {
  Init();
  CheckGraph(*graph);
  std::vector<ir::Node *> sorted_ops = SortOperations(*graph);
  auto nodes = graph->ReleaseNodes();
...
...
@@ -166,7 +170,6 @@ std::unique_ptr<ir::Graph> MultiDevSSAGraphBuilderBase::ApplyImpl(
  result.Set(kGraphOps, new GraphOps);

  bool is_forwarding = true;
  bool insert_collection_ops = NeedCollectiveOps();

  for (ir::Node *node : sorted_ops) {
    if (DealWithSpecialOp(&result, node)) {
...
...
@@ -185,8 +188,8 @@ std::unique_ptr<ir::Graph> MultiDevSSAGraphBuilderBase::ApplyImpl(
      CreateComputationalOps(&result, node, places_.size());
    }

    // Insert collection ops
    if (!is_forwarding && insert_collection_ops) {
    // Insert collective ops if nranks > 1
    if (!is_forwarding && Get<size_t>(kNRanks) > 1) {
      try {
        bool is_bk_op =
            static_cast<bool>(boost::get<int>(node->Op()->GetAttr(
...
...
@@ -200,13 +203,13 @@ std::unique_ptr<ir::Graph> MultiDevSSAGraphBuilderBase::ApplyImpl(
            boost::get<std::vector<std::string>>(node->Op()->GetNullableAttr(
                OpProtoAndCheckerMaker::OpRoleVarAttrName()));
        PADDLE_ENFORCE_EQ(backward_vars.size() % 2, 0);

        for (size_t i = 0; i < backward_vars.size(); i += 2) {
          auto &p_name = backward_vars[i];
          auto &g_name = backward_vars[i + 1];
          VLOG(10) << "Bcast " << g_name << " for parameter " << p_name;
          InsertCollectiveOp(&result, p_name, g_name);
          if (NeedCollectiveForGrad(g_name, sorted_ops)) {
            InsertCollectiveOp(&result, p_name, g_name);
          }
        }
      } catch (boost::bad_get e) {
...
...
@@ -226,6 +229,7 @@ std::unique_ptr<ir::Graph> MultiDevSSAGraphBuilderBase::ApplyImpl(
   * Only variables should be the leaves of graph.
   */
  AddOutputToLeafOps(&result);
  result.Erase(kGraphOps);

  return graph;
}
...
...
@@ -258,6 +262,11 @@ void MultiDevSSAGraphBuilderBase::InsertScaleLossGradOp(
  }
}

bool MultiDevSSAGraphBuilderBase::DealWithSpecialOp(ir::Graph *result,
                                                    ir::Node *node) const {
  return false;
}

std::vector<ir::Node *> MultiDevSSAGraphBuilderBase::SortOperations(
    const ir::Graph &graph) const {
  return ir::TopologySortOperations(graph);
...
...
@@ -271,8 +280,20 @@ bool MultiDevSSAGraphBuilderBase::UseGPU() const {
  return use_gpu;
}

bool MultiDevSSAGraphBuilderBase::NeedCollectiveOps() const {
  return Get<size_t>(kNRanks) > 1;
bool MultiDevSSAGraphBuilderBase::NeedCollectiveForGrad(
    const std::string &grad_name, std::vector<ir::Node *> ops) const {
  // if we have allreduce_op for current gradient variable in the graph,
  // then we don't need to add allreduce_op_handle for this gradient
  // NOTE: This is for the case that all gradients should add collective ops
  for (auto *node : ops) {
    if (node->Op()->Type() != "allreduce") continue;
    for (auto in_name : node->Op()->InputArgumentNames()) {
      if (in_name == grad_name) {
        return false;
      }
    }
  }
  return true;
}

void MultiDevSSAGraphBuilderBase::CreateOpHandleIOs(ir::Graph *result,
...
...
@@ -496,20 +517,17 @@ VarHandle *MultiDevSSAGraphBuilderBase::CreateReduceOp(ir::Graph *result,
}

bool MultiDevSSAGraphBuilderBase::IsScaleLossOp(ir::Node *node) const {
  return boost::get<int>(
  return !loss_var_name_.empty() && node->Op() &&
         boost::get<int>(
             node->Op()->GetAttr(OpProtoAndCheckerMaker::OpRoleAttrName())) ==
             (static_cast<int>(OpRole::kBackward) |
              static_cast<int>(OpRole::kLoss)) &&
         !loss_var_name_.empty();
  // If loss_var is empty. This is test mode
              static_cast<int>(OpRole::kLoss));
}

bool MultiDevSSAGraphBuilderBase::IsSparseGradient(const std::string &og) const {
  PADDLE_ENFORCE(all_vars_.count(og) != 0);
  if (all_vars_.at(og)->GetType() == proto::VarType::SELECTED_ROWS) {
    return true;
  }
  return false;
  return all_vars_.at(og)->GetType() == proto::VarType::SELECTED_ROWS;
}

void AllReduceSSAGraphBuilder::InsertCollectiveOp(
...
...
@@ -995,7 +1013,7 @@ static int MultiDevSSAGraphBuilderRegister(const std::string &builder_mode) {
REGISTER_MULTI_DEVICES_PASS(reduce_mode_multi_devices_pass,
                            paddle::framework::details::ReduceSSAGraphBuilder);
REGISTER_MULTI_DEVICES_PASS(allreduce_mode_multi_devices_pass,
REGISTER_MULTI_DEVICES_PASS(all_reduce_mode_multi_devices_pass,
                            paddle::framework::details::AllReduceSSAGraphBuilder);
REGISTER_MULTI_DEVICES_PASS(dist_multi_devices_pass,
                            paddle::framework::details::DistSSAGraphBuilder);
paddle/fluid/framework/details/multi_devices_graph_pass.h
...
...
@@ -14,7 +14,10 @@
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
...
...
@@ -31,12 +34,6 @@ namespace framework {
class Scope;
namespace details {

constexpr char kLossVarName[] = "loss_var_name";
constexpr char kPlaces[] = "places";
constexpr char kLocalScopes[] = "local_scopes";
constexpr char kStrategy[] = "strategy";
constexpr char kNRanks[] = "nranks";

class MultiDevSSAGraphBuilderBase : public ir::Pass {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
...
...
@@ -44,18 +41,21 @@ class MultiDevSSAGraphBuilderBase : public ir::Pass {
  virtual void Init() const;

  virtual void CheckGraph(const ir::Graph &graph) const;

  virtual std::vector<ir::Node *> SortOperations(const ir::Graph &graph) const;

  virtual void InsertCollectiveOp(ir::Graph *result, const std::string &p_name,
                                  const std::string &g_name) const = 0;

  virtual bool DealWithSpecialOp(ir::Graph *result, ir::Node *node) const = 0;
  virtual bool DealWithSpecialOp(ir::Graph *result, ir::Node *node) const;

  virtual void InsertPostprocessOps(ir::Graph *result) const = 0;

  bool UseGPU() const;

  bool NeedCollectiveOps() const;
  bool NeedCollectiveForGrad(const std::string &grad_name,
                             std::vector<ir::Node *> ops) const;

  bool IsScaleLossOp(ir::Node *node) const;
...
...
@@ -109,10 +109,6 @@ class AllReduceSSAGraphBuilder : public MultiDevSSAGraphBuilderBase {
  virtual void InsertCollectiveOp(ir::Graph *result, const std::string &p_name,
                                  const std::string &g_name) const;

  virtual bool DealWithSpecialOp(ir::Graph *result, ir::Node *node) const {
    return false;
  }

  virtual void InsertPostprocessOps(ir::Graph *result) const {}
};
...
...
paddle/fluid/framework/details/multi_devices_helper.h
...
...
@@ -16,6 +16,9 @@
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
#include "paddle/fluid/framework/details/op_handle_base.h"
...
...
@@ -44,6 +47,26 @@ const char kGraphVars[] = "vars";
typedef std::unordered_set<VarHandleBase *> GraphDepVars;
const char kGraphDepVars[] = "dep_vars";

constexpr char kNCCLCtxs[] = "nccl_ctxs";

constexpr char kLossVarName[] = "loss_var_name";
constexpr char kPlaces[] = "places";
constexpr char kLocalScopes[] = "local_scopes";
constexpr char kStrategy[] = "strategy";
constexpr char kNRanks[] = "nranks";

typedef std::unordered_set<std::string> FusedVars;
constexpr char kFusedVars[] = "fused_vars";

typedef std::vector<std::pair<std::string, std::string>> ParamsAndGrads;
constexpr char kParamsAndGrads[] = "params_grads";

typedef std::vector<std::vector<std::pair<std::string, std::string>>>
    GroupGradsAndParams;
constexpr char kGroupGradsAndParams[] = "group_grads_params";

constexpr char kFusedVarNamePrefix[] = "@FUSEDVAR@";

}  // namespace details
}  // namespace framework
}  // namespace paddle
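These keys name graph-level attributes that one pass writes and another reads. As a hedged illustration only (not code from the patch), a consumer pass might look up the parameter/gradient pairs recorded under kParamsAndGrads roughly like this, assuming the usual ir::Graph Has/Get attribute API and that an earlier pass (such as the new alloc_continuous_space_for_grad_pass) has filled the attribute in:

#include <iostream>
#include "paddle/fluid/framework/details/multi_devices_helper.h"
#include "paddle/fluid/framework/ir/graph.h"

namespace fd = paddle::framework::details;

// Print every (parameter, gradient) pair previously recorded on the graph.
void DumpParamsAndGrads(paddle::framework::ir::Graph *graph) {
  if (!graph->Has(fd::kParamsAndGrads)) return;
  const auto &params_grads =
      graph->Get<fd::ParamsAndGrads>(fd::kParamsAndGrads);
  for (const auto &pg : params_grads) {
    std::cout << "param: " << pg.first << ", grad: " << pg.second << "\n";
  }
}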
paddle/fluid/framework/details/op_registry.h
...
...
@@ -16,6 +16,8 @@ limitations under the License. */
#include <string>
#include <tuple>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include "paddle/fluid/framework/grad_op_desc_maker.h"
#include "paddle/fluid/framework/inplace_op_inference.h"
...
...
@@ -127,9 +129,9 @@ struct OpInfoFiller<T, kGradOpDescMaker> {
template <typename T>
struct OpInfoFiller<T, kVarTypeInference> {
  void operator()(const char *op_type, OpInfo *info) const {
    info->infer_var_type_ = [](const OpDesc &fwd_op, BlockDesc *block) {
    info->infer_var_type_ = [](InferVarTypeContext *context) {
      T inference;
      inference(fwd_op, block);
      inference(context);
    };
  }
};
...
...
paddle/fluid/framework/details/reduce_and_gather.h
...
...
@@ -53,6 +53,31 @@ struct ReduceLoDTensor {
  }
};

struct ReduceBufferData {
  const std::vector<const void *> &src_data_;
  void *dst_data_;
  int64_t numel_;

  ReduceBufferData(const std::vector<const void *> &src, void *dst,
                   int64_t numel)
      : src_data_(src), dst_data_(dst), numel_(numel) {}

  template <typename T>
  void apply() const {
    T *dst_data = reinterpret_cast<T *>(dst_data_);
    for (size_t i = 0; i < src_data_.size(); ++i) {
      auto srd_data = reinterpret_cast<const T *>(src_data_[i]);
      VLOG(10) << "dst: " << dst_data_ << ", " << srd_data;
      if (srd_data == dst_data_) {
        continue;
      }

      std::transform(srd_data, srd_data + numel_, dst_data, dst_data,
                     [](T a, T b) -> T { return a + b; });
    }
  }
};

inline void GatherLocalSelectedRows(
    const std::vector<const SelectedRows *> &src_selecte_rows_,
    const std::vector<platform::Place> &in_places,
...
...
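ReduceBufferData::apply() accumulates several equally-sized raw buffers into a destination buffer that may itself be one of the sources. The following is only a standalone illustration of that std::transform idiom (hard-coded to float rather than dispatched through VisitDataType), not code from the patch:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Sum every source buffer into dst, skipping dst when it appears among the
// sources (as in the fused CPU path, where the first scope's buffer is the
// reduction target).
void ReduceBuffers(const std::vector<const float*>& src, float* dst,
                   int64_t numel) {
  for (const float* s : src) {
    if (s == dst) continue;
    std::transform(s, s + numel, dst, dst,
                   [](float a, float b) { return a + b; });
  }
}

int main() {
  std::vector<float> a(4, 1.0f), b(4, 2.0f), c(4, 3.0f);
  std::vector<const float*> src = {a.data(), b.data(), c.data()};
  ReduceBuffers(src, a.data(), 4);
  std::cout << a[0] << "\n";  // prints 6 (= 1 + 2 + 3)
  return 0;
}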
paddle/fluid/framework/ir/CMakeLists.txt
...
...
@@ -46,6 +46,7 @@ cc_library(fuse_pass_base SRCS fuse_pass_base.cc DEPS pass)
pass_library(graph_to_program_pass base)
pass_library(graph_viz_pass base)
pass_library(lock_free_optimize_pass base)
pass_library(cpu_quantize_placement_pass base)
pass_library(cpu_quantize_pass inference)
pass_library(cpu_quantize_squash_pass inference)
pass_library(fc_fuse_pass inference)
...
...
@@ -103,6 +104,7 @@ cc_test(test_graph_pattern_detector SRCS graph_pattern_detector_tester.cc DEPS g
cc_test(test_fc_fuse_pass SRCS fc_fuse_pass_tester.cc DEPS fc_fuse_pass framework_proto)
cc_test(test_seqpool_concat_fuse_pass SRCS seqpool_concat_fuse_pass_tester.cc DEPS seqpool_concat_fuse_pass framework_proto)
cc_test(test_is_test_pass SRCS is_test_pass_tester.cc DEPS is_test_pass)
cc_test(test_cpu_quantize_placement_pass SRCS cpu_quantize_placement_pass_tester.cc DEPS cpu_quantize_placement_pass)
cc_test(test_cpu_quantize_pass SRCS cpu_quantize_pass_tester.cc DEPS cpu_quantize_pass naive_executor)
cc_test(test_cpu_quantize_squash_pass SRCS cpu_quantize_squash_pass_tester.cc DEPS cpu_quantize_squash_pass naive_executor)
if(NOT WIN32)
...
...
paddle/fluid/framework/ir/cpu_quantize_placement_pass.cc
0 → 100644
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/ir/cpu_quantize_placement_pass.h"
#include <string>
#include <unordered_set>
namespace paddle {
namespace framework {
namespace ir {

std::unique_ptr<ir::Graph> CPUQuantizePlacementPass::ApplyImpl(
    std::unique_ptr<ir::Graph> graph) const {
  VLOG(3) << "Marks operators which are to be quantized.";
  const auto& excluded_ids_list =
      Get<std::unordered_set<int>>("quantize_excluded_op_ids");
  const auto& op_types_list =
      Get<std::unordered_set<std::string>>("quantize_enabled_op_types");
  for (const Node* n : graph->Nodes()) {
    if (n->IsOp()) {
      if (std::find(excluded_ids_list.begin(), excluded_ids_list.end(),
                    n->id()) != excluded_ids_list.end())
        continue;
      auto* op = n->Op();
      if (op->HasAttr("use_quantizer") || op->HasProtoAttr("use_quantizer")) {
        if (op_types_list.empty()) {
          op->SetAttr("use_quantizer", true);
        } else if (std::find(op_types_list.begin(), op_types_list.end(),
                             n->Name()) != op_types_list.end()) {
          op->SetAttr("use_quantizer", true);
        }
      }
    }
  }
  return graph;
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(cpu_quantize_placement_pass,
              paddle::framework::ir::CPUQuantizePlacementPass)
    // a vector of operator type names to be quantized ("conv2d" etc.)
    .RequirePassAttr("quantize_enabled_op_types")
    // a vector of operator ids that are to be excluded from quantization
    .RequirePassAttr("quantize_excluded_op_ids");
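A condensed sketch of driving the new pass, mirroring the tester added below; the attribute names must match the RequirePassAttr calls above, and the wrapper function here is only illustrative:

#include <memory>
#include <string>
#include <unordered_set>
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/pass.h"

std::unique_ptr<paddle::framework::ir::Graph> MarkForQuantization(
    std::unique_ptr<paddle::framework::ir::Graph> graph) {
  using paddle::framework::ir::PassRegistry;
  auto pass = PassRegistry::Instance().Get("cpu_quantize_placement_pass");
  // Quantize only conv2d/pool2d ops, except the op whose node id is 4.
  pass->Set("quantize_enabled_op_types",
            new std::unordered_set<std::string>({"conv2d", "pool2d"}));
  pass->Set("quantize_excluded_op_ids", new std::unordered_set<int>({4}));
  return pass->Apply(std::move(graph));
}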
paddle/fluid/framework/ir/cpu_quantize_placement_pass.h
0 → 100644
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <memory>
#include "paddle/fluid/framework/ir/pass.h"
namespace paddle {
namespace framework {
namespace ir {

/*
 * Specifies which operators should be quantized.
 */
class CPUQuantizePlacementPass : public Pass {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override;
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle
paddle/fluid/framework/ir/cpu_quantize_placement_pass_tester.cc
0 → 100644
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/cpu_quantize_placement_pass.h"
#include <gtest/gtest.h>
#include <boost/logic/tribool.hpp>
namespace paddle {
namespace framework {
namespace ir {

void SetOp(ProgramDesc* prog, const std::string& type, const std::string& name,
           const std::vector<std::string>& inputs,
           const std::vector<std::string>& outputs,
           boost::tribool use_quantizer) {
  auto* op = prog->MutableBlock(0)->AppendOp();

  op->SetType(type);

  if (!boost::indeterminate(use_quantizer))
    op->SetAttr("use_quantizer", use_quantizer);

  if (type == "conv2d") {
    op->SetAttr("name", name);
    op->SetInput("Input", {inputs[0]});
    op->SetInput("Filter", {inputs[1]});
    op->SetInput("Bias", {inputs[2]});
  } else if (type == "relu") {
    op->SetInput("X", inputs);
  } else if (type == "concat") {
    op->SetAttr("axis", 1);
    op->SetInput("X", {inputs[0], inputs[1]});
  } else if (type == "pool2d") {
    op->SetInput("X", {inputs[0]});
  } else {
    FAIL() << "Unexpected operator type.";
  }
  op->SetOutput("Out", {outputs[0]});
}

// operator                      use_quantizer
// ---------------------------------------
// (a,b)->concat->c              none
// (c,weights,bias)->conv->f     false
// f->relu->g                    none
// g->pool->h                    false
// (h,weights2,bias2)->conv->k   false
// k->pool->l                    false
ProgramDesc BuildProgramDesc() {
  ProgramDesc prog;

  for (auto& v :
       std::vector<std::string>({"a", "b", "c", "weights", "bias", "f", "g",
                                 "h", "weights2", "bias2", "k", "l"})) {
    auto* var = prog.MutableBlock(0)->Var(v);
    var->SetType(proto::VarType::SELECTED_ROWS);
    if (v == "weights" || v == "bias") {
      var->SetPersistable(true);
    }
  }

  SetOp(&prog, "concat", "concat1", {"a", "b"}, {"c"}, boost::indeterminate);
  SetOp(&prog, "conv2d", "conv1", {"c", "weights", "bias"}, {"f"}, false);
  SetOp(&prog, "relu", "relu1", {"f"}, {"g"}, boost::indeterminate);
  SetOp(&prog, "pool2d", "pool1", {"g"}, {"h"}, false);
  SetOp(&prog, "conv2d", "conv2", {"h", "weights2", "bias2"}, {"k"}, false);
  SetOp(&prog, "pool2d", "pool2", {"k"}, {"l"}, false);

  return prog;
}

void MainTest(std::initializer_list<std::string> quantize_enabled_op_types,
              std::initializer_list<int> quantize_excluded_op_ids,
              unsigned expected_use_quantizer_true_count) {
  auto prog = BuildProgramDesc();

  std::unique_ptr<ir::Graph> graph(new ir::Graph(prog));

  auto pass = PassRegistry::Instance().Get("cpu_quantize_placement_pass");
  pass->Set("quantize_enabled_op_types",
            new std::unordered_set<std::string>(quantize_enabled_op_types));
  pass->Set("quantize_excluded_op_ids",
            new std::unordered_set<int>(quantize_excluded_op_ids));

  graph = pass->Apply(std::move(graph));

  unsigned use_quantizer_true_count = 0;

  for (auto* node : graph->Nodes()) {
    if (node->IsOp()) {
      auto* op = node->Op();
      if (op->HasAttr("use_quantizer") &&
          boost::get<bool>(op->GetAttr("use_quantizer"))) {
        ++use_quantizer_true_count;
      }
    }
  }

  EXPECT_EQ(use_quantizer_true_count, expected_use_quantizer_true_count);
}

TEST(QuantizerPlacementPass, enabled_pool) { MainTest({"pool2d"}, {}, 2); }

TEST(QuantizerPlacementPass, enabled_conv_excluded_one) {
  MainTest({"conv2d"}, {4}, 1);
}

TEST(QuantizerPlacementPass, excluded_none) {
  // 2 conv + 2 pool
  MainTest({}, {}, 4);
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

USE_PASS(cpu_quantize_placement_pass);
paddle/fluid/framework/ir/graph_test.cc
...
...
@@ -43,20 +43,20 @@ class SumOpMaker : public OpProtoAndCheckerMaker {
class SumOpVarTypeInference : public VarTypeInference {
 public:
  void operator()(const OpDesc &op_desc, BlockDesc *block) const override {
    auto &inputs = op_desc.Input("X");
  void operator()(InferVarTypeContext *ctx) const override {
    auto &inputs = ctx->Input("X");
    auto default_var_type = proto::VarType::SELECTED_ROWS;

    bool any_input_is_lod_tensor = std::any_of(
        inputs.begin(), inputs.end(), [block](const std::string &name) {
          return block->Var(name)->GetType() == proto::VarType::LOD_TENSOR;
        inputs.begin(), inputs.end(), [&ctx](const std::string &name) {
          return ctx->GetType(name) == proto::VarType::LOD_TENSOR;
        });
    if (any_input_is_lod_tensor) {
      default_var_type = proto::VarType::LOD_TENSOR;
    }

    auto out_var_name = op_desc.Output("Out").front();
    block->Var(out_var_name)->SetType(default_var_type);
    auto out_var_name = ctx->Output("Out").front();
    ctx->SetType(out_var_name, default_var_type);
  }
};
...
...
@@ -71,7 +71,7 @@ class DummyOpMaker : public OpProtoAndCheckerMaker {
class DummyOpVarTypeInference : public VarTypeInference {
 public:
  void operator()(const OpDesc &op_desc, BlockDesc *block) const override {}
  void operator()(framework::InferVarTypeContext *ctx) const override {}
};
}  // namespace framework
}  // namespace paddle
...
...
paddle/fluid/framework/op_desc.cc
...
...
@@ -24,6 +24,7 @@ limitations under the License. */
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/shape_inference.h"
#include "paddle/fluid/framework/var_type_inference.h"
namespace paddle {
namespace framework {
...
...
@@ -677,7 +678,8 @@ void OpDesc::InferVarType(BlockDesc *block) const {
  // var type inference. Hence, we don't do any "default" setting here.
  auto &info = OpInfoMap::Instance().Get(this->Type());
  if (info.infer_var_type_) {
    info.infer_var_type_(*this, block);
    InferVarTypeContext context(this, block);
    info.infer_var_type_(&context);
  }
}
...
...
paddle/fluid/framework/parallel_executor.cc
...
...
@@ -254,18 +254,29 @@ ParallelExecutor::ParallelExecutor(const std::vector<platform::Place> &places,
                                     member_->places_, nccl_id,
                                     build_strategy.num_trainers_,
                                     build_strategy.trainer_id_));
    std::unique_ptr<platform::NCCLContextMap> dev_nccl_ctxs;
    dev_nccl_ctxs.reset(new platform::NCCLContextMap(member_->places_));
    // Initialize device context's nccl comm
    // Note, more than one ParallelExecutor with same place, the nccl comm will
    // Initialize device context's nccl comm, will be used by normal
    // Operators like sync_batch_norm, and collective ops.
    // NOTE: more than one ParallelExecutor with same place, the nccl comm will
    // be rewrite and there will be some problem.
    // NOTE: NCCL group-calls and non-group-calls can not use the same
    // NCCL communicator, so for ParallelGraph and Multi-Process mode, re-use
    // same communicators.
    std::unique_ptr<platform::NCCLContextMap> dev_nccl_ctxs;
    if (nccl_id == nullptr) {
      dev_nccl_ctxs.reset(new platform::NCCLContextMap(member_->places_));
    }
    for (size_t dev_id = 0; dev_id < member_->places_.size(); ++dev_id) {
      auto &nccl_ctx = dev_nccl_ctxs->at(dev_id);
      platform::DeviceContextPool &pool =
          platform::DeviceContextPool::Instance();
      auto *dev_ctx = static_cast<platform::CUDADeviceContext *>(
          pool.Get(member_->places_[dev_id]));
      dev_ctx->set_nccl_comm(nccl_ctx.comm());
      if (nccl_id != nullptr) {
        auto &nccl_ctx = member_->nccl_ctxs_->at(member_->places_[dev_id]);
        dev_ctx->set_nccl_comm(nccl_ctx.comm());
      } else {
        auto &nccl_ctx = dev_nccl_ctxs->at(member_->places_[dev_id]);
        dev_ctx->set_nccl_comm(nccl_ctx.comm());
      }
    }
#else
    PADDLE_THROW("Not compiled with CUDA");
...
...
paddle/fluid/framework/scope.cc
...
...
@@ -34,7 +34,7 @@ DEFINE_double(
"Memory size threshold (GB) when the garbage collector clear tensors."
"Disabled when this value is less than 0"
);
DEFINE_bool(fast_eager_deletion_mode, false,
DEFINE_bool(fast_eager_deletion_mode, true,
            "Fast eager deletion mode. If enabled, memory would release "
            "immediately without waiting GPU kernel ends.");
...
...
paddle/fluid/framework/tensor_util.cc
...
...
@@ -44,6 +44,11 @@ void TensorCopy(const Tensor& src, const platform::Place& dst_place,
             << dst_place;
    return;
  }
#ifdef PADDLE_WITH_MKLDNN
  if (src.layout() == DataLayout::kMKLDNN) {
    dst->set_mkldnn_prim_desc(src.get_mkldnn_prim_desc());
  }
#endif
  memory::Copy(boost::get<platform::CPUPlace>(dst_place), dst_ptr,
               boost::get<platform::CPUPlace>(src_place), src_ptr, size);
}
...
...
paddle/fluid/framework/type_defs.h
...
...
@@ -27,6 +27,7 @@ namespace framework {
class OperatorBase;
class OpDesc;
class InferShapeContext;
class InferVarTypeContext;
class BlockDesc;
class Variable;
...
...
@@ -53,7 +54,7 @@ using GradOpMakerFN = std::function<std::vector<std::unique_ptr<OpDesc>>(
    const std::vector<BlockDesc*>& grad_block)>;

using InferVarTypeFN =
    std::function<void(const OpDesc& /*op_desc*/, BlockDesc* /*block*/)>;
    std::function<void(framework::InferVarTypeContext* /*context*/)>;

using InferShapeFN = std::function<void(InferShapeContext*)>;
...
...
paddle/fluid/framework/var_type_inference.h
...
...
@@ -14,6 +14,8 @@ limitations under the License. */
#pragma once
#include <string>
#include <unordered_map>
#include <vector>
#include "paddle/fluid/framework/block_desc.h"
#include "paddle/fluid/framework/op_desc.h"
#include "paddle/fluid/framework/type_defs.h"
...
...
@@ -21,26 +23,123 @@ limitations under the License. */
namespace paddle {
namespace framework {

class OpDesc;
class BlockDesc;

// default infer var type context
class InferVarTypeContext {
 public:
  InferVarTypeContext(const OpDesc* op, BlockDesc* block)
      : op_(op), block_(block) {}

  virtual ~InferVarTypeContext() {}

  virtual Attribute GetAttr(const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(op_);
    return op_->GetAttr(name);
  }

  virtual bool HasVar(const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(block_);
    return block_->FindVarRecursive(name) != nullptr;
  }

  virtual bool HasInput(const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(op_);
    return op_->Inputs().count(name) > 0;
  }

  virtual bool HasOutput(const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(op_);
    return op_->Outputs().count(name) > 0;
  }

  virtual const std::vector<std::string>& Input(const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(op_);
    return op_->Input(name);
  }

  virtual const std::vector<std::string>& Output(
      const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(op_);
    return op_->Output(name);
  }

  virtual proto::VarType::Type GetType(const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(block_);
    return block_->FindRecursiveOrCreateVar(name).GetType();
  }

  virtual void SetType(const std::string& name, proto::VarType::Type type) {
    PADDLE_ENFORCE_NOT_NULL(block_);
    block_->FindRecursiveOrCreateVar(name).SetType(type);
  }

  virtual proto::VarType::Type GetDataType(const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(block_);
    return block_->FindRecursiveOrCreateVar(name).GetDataType();
  }

  virtual void SetDataType(const std::string& name, proto::VarType::Type type) {
    PADDLE_ENFORCE_NOT_NULL(block_);
    block_->FindRecursiveOrCreateVar(name).SetDataType(type);
  }

  virtual std::vector<proto::VarType::Type> GetDataTypes(
      const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(block_);
    return block_->FindRecursiveOrCreateVar(name).GetDataTypes();
  }

  virtual void SetDataTypes(
      const std::string& name,
      const std::vector<proto::VarType::Type>& multiple_data_type) {
    PADDLE_ENFORCE_NOT_NULL(block_);
    block_->FindRecursiveOrCreateVar(name).SetDataTypes(multiple_data_type);
  }

  virtual std::vector<int64_t> GetShape(const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(block_);
    return block_->FindRecursiveOrCreateVar(name).GetShape();
  }

  virtual void SetShape(const std::string& name,
                        const std::vector<int64_t>& dims) {
    PADDLE_ENFORCE_NOT_NULL(block_);
    block_->FindRecursiveOrCreateVar(name).SetShape(dims);
  }

  virtual int32_t GetLoDLevel(const std::string& name) const {
    PADDLE_ENFORCE_NOT_NULL(block_);
    return block_->FindRecursiveOrCreateVar(name).GetLoDLevel();
  }

  virtual void SetLoDLevel(const std::string& name, int32_t lod_level) {
    PADDLE_ENFORCE_NOT_NULL(block_);
    block_->FindRecursiveOrCreateVar(name).SetLoDLevel(lod_level);
  }

 protected:
  const OpDesc* op_;
  BlockDesc* block_;
};

class VarTypeInference {
 public:
  virtual ~VarTypeInference() {}
  virtual void operator()(const OpDesc& op_desc, BlockDesc* block) const = 0;
  virtual void operator()(InferVarTypeContext* context) const = 0;  // NOLINT
};

class PassInDtypeAndVarTypeToOutput : public framework::VarTypeInference {
 public:
  void operator()(const framework::OpDesc& op_desc,
                  framework::BlockDesc* block) const final {
  void operator()(framework::InferVarTypeContext* ctx) const final {  // NOLINT
    auto in_out_var_names = this->GetInputOutputWithSameType();

    for (auto& i_o_n : in_out_var_names) {
      auto& x_name = op_desc.Input(i_o_n.first).at(0);
      auto& out_name = op_desc.Output(i_o_n.second).at(0);
      auto& x_name = ctx->Input(i_o_n.first).at(0);
      auto& out_name = ctx->Output(i_o_n.second).at(0);

      auto& x = block->FindRecursiveOrCreateVar(x_name);
      auto& out = block->FindRecursiveOrCreateVar(out_name);
      out.SetType(x.GetType());
      out.SetDataType(x.GetDataType());
      ctx->SetType(out_name, ctx->GetType(x_name));
      ctx->SetDataType(out_name, ctx->GetDataType(x_name));
    }
  }
...
...
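With this change every operator's var-type inference goes through InferVarTypeContext instead of touching OpDesc/BlockDesc directly. As a hedged sketch modeled on the updated tests in this commit (the class name "IdentityOpVarTypeInference" is illustrative, not an operator that exists in the repository), an inference written against the new interface looks like this:

#include "paddle/fluid/framework/var_type_inference.h"

namespace paddle {
namespace framework {

class IdentityOpVarTypeInference : public VarTypeInference {
 public:
  void operator()(InferVarTypeContext* ctx) const override {
    // Forward both the variable type and the data type of X to Out,
    // using only the context API.
    auto& x_name = ctx->Input("X").front();
    auto& out_name = ctx->Output("Out").front();
    ctx->SetType(out_name, ctx->GetType(x_name));
    ctx->SetDataType(out_name, ctx->GetDataType(x_name));
  }
};

}  // namespace framework
}  // namespace paddle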
paddle/fluid/framework/var_type_inference_test.cc
...
...
@@ -44,20 +44,20 @@ class SumOpMaker : public OpProtoAndCheckerMaker {
class SumOpVarTypeInference : public VarTypeInference {
 public:
  void operator()(const OpDesc &op_desc, BlockDesc *block) const override {
    auto &inputs = op_desc.Input("X");
  void operator()(framework::InferVarTypeContext *ctx) const override {
    auto &inputs = ctx->Input("X");
    auto default_var_type = proto::VarType::SELECTED_ROWS;

    bool any_input_is_lod_tensor = std::any_of(
        inputs.begin(), inputs.end(), [block](const std::string &name) {
          return block->Var(name)->GetType() == proto::VarType::LOD_TENSOR;
        inputs.begin(), inputs.end(), [&ctx](const std::string &name) {
          return ctx->GetType(name) == proto::VarType::LOD_TENSOR;
        });
    if (any_input_is_lod_tensor) {
      default_var_type = proto::VarType::LOD_TENSOR;
    }

    auto out_var_name = op_desc.Output("Out").front();
    block->Var(out_var_name)->SetType(default_var_type);
    auto out_var_name = ctx->Output("Out").front();
    ctx->SetType(out_var_name, default_var_type);
  }
};
}  // namespace framework
...
...
paddle/fluid/imperative/layer.cc
...
...
@@ -214,13 +214,11 @@ framework::LoDTensor& VarBase::GradValue() {
}

std::map<std::string, std::vector<VarBase*>> OpBase::ApplyGrad() {
  if (grad_op_descs_.empty() && backward_id_ <= 0) {
    VLOG(3) << "op with no grad: " << Type();
    return {};
  }
  PADDLE_ENFORCE(!grad_op_descs_.empty() || backward_id_ > 0,
                 "%s has no backward implementation", Type());

  VLOG(3) << "apply op grad: " << Type();
  std::vector<framework::VariableValueMap> tmp_grad_outputs;
  std::vector<VarBasePtrMap> tmp_grad_outputs;
  if (backward_id_ > 0) {
    VLOG(3) << "py_layer_grad";
    tmp_grad_outputs.resize(1);
...
...
@@ -243,26 +241,62 @@ std::map<std::string, std::vector<VarBase*>> OpBase::ApplyGrad() {
      auto& outputs = tmp_grad_outputs[k][it.first];
      outputs.reserve(it.second.size());
      for (size_t i = 0; i < it.second.size(); ++i) {
        VarBase* origin_grad_var_base = it.second[i];

        // Allocate a new variable
        Variable* tmp_var = new framework::Variable();
        tmp_var->GetMutable<framework::LoDTensor>();
        outputs.emplace_back(tmp_var);
        VarBase* tmp_grad_var_base = new VarBase(
            string::Sprintf("%s@IGrad", origin_grad_var_base->Name()),
            origin_grad_var_base->DataType(), origin_grad_var_base->Dims(),
            place_, true, false);
        outputs.emplace_back(tmp_grad_var_base);
      }
    }

    // Run grad op
    framework::RuntimeContext ctx(grad_input_vars_[k], tmp_grad_outputs[k]);

    // No need to do compile time infer shape here.
    // grad_op_desc_->InferShape(*block_);
    // grad_op_desc->InferVarType(block_);
    std::unique_ptr<framework::OperatorBase> opbase =
        framework::OpRegistry::CreateOp(*grad_op_desc);

    auto& info = framework::OpInfoMap::Instance().Get(grad_op_desc->Type());
    if (info.infer_var_type_) {
      RuntimeInferVarTypeContext infer_var_type_ctx(
          &grad_input_vars_[k], &tmp_grad_outputs[k], &attrs_);
      info.infer_var_type_(&infer_var_type_ctx);
    }

    framework::OperatorWithKernel* op_kernel =
        dynamic_cast<framework::OperatorWithKernel*>(opbase.get());
    PADDLE_ENFORCE_NOT_NULL(op_kernel, "only support op with kernel");

    // Run grad op
    framework::VariableValueMap grad_invars_map;
    framework::VariableValueMap grad_outvars_map;

    for (const auto& it : grad_input_vars_[k]) {
      auto& grad_invars = grad_invars_map[it.first];
      grad_invars.reserve(it.second.size());
      for (const VarBase* grad_inp : it.second) {
        PADDLE_ENFORCE_NOT_NULL(grad_inp->var_, "op %s input %s nullptr",
                                grad_op_desc->Type(), grad_inp->Name());
        grad_invars.emplace_back(grad_inp->var_);
      }
    }

    for (const auto& it : tmp_grad_outputs[k]) {
      auto& grad_outvars = grad_outvars_map[it.first];
      grad_outvars.reserve(it.second.size());
      for (VarBase* grad_out : it.second) {
        PADDLE_ENFORCE_NOT_NULL(grad_out->var_, "op %s output %s nullptr",
                                grad_op_desc->Type(), grad_out->Name());
        grad_outvars.emplace_back(grad_out->var_);
      }
    }

    framework::RuntimeContext ctx(grad_invars_map, grad_outvars_map);
    framework::Scope scope;
    PreparedOp p = PreparedOp::Prepare(ctx, *op_kernel, place_);
    p.op.RuntimeInferShape(scope, place_, ctx);
...
...
@@ -279,8 +313,8 @@ std::map<std::string, std::vector<VarBase*>> OpBase::ApplyGrad() {
    PADDLE_ENFORCE_EQ(outputs.size(), origin_outputs.size());

    for (size_t i = 0; i < outputs.size(); ++i) {
      framework::Variable* grad = outputs[i];
      framework::Variable* orig_grad = origin_outputs[i];
      framework::Variable* grad = outputs[i]->var_;
      framework::Variable* orig_grad = origin_outputs[i]->var_;
      AddTo(grad, orig_grad, place_);
      delete grad;
    }
...
...
@@ -328,28 +362,35 @@ void PyLayer::RegisterFunc(int func_id, const py::object& py_func) {
int PyLayer::NumFuncs() { return py_funcs_.size(); }

std::vector<Variable*> PyLayer::Apply(int func_id,
                                      const std::vector<VarBase*>& inputs) {
  std::vector<framework::Variable*> invars;
  for (const VarBase* in : inputs) {
    invars.push_back(in->var_);
  }
std::vector<framework::Variable*> PyLayer::Apply(
    int func_id, const std::vector<VarBase*>& inputs) {
  PADDLE_ENFORCE(py_funcs_.find(func_id) != py_funcs_.end());
  return CallPythonFunc(py_funcs_[func_id], invars);
  return CallPythonFunc(py_funcs_[func_id], inputs);
}

std::vector<Variable*> PyLayer::ApplyGrad(
    int func_id, const std::vector<framework::Variable*>& inputs) {
std::vector<VarBase*> PyLayer::ApplyGrad(int func_id,
                                         const std::vector<VarBase*>& inputs) {
  PADDLE_ENFORCE(py_funcs_.find(func_id) != py_funcs_.end());
  return CallPythonFunc(py_funcs_[func_id], inputs);
  auto rets = CallPythonFunc(py_funcs_[func_id], inputs);

  std::vector<VarBase*> outs;
  outs.reserve(rets.size());
  for (size_t i = 0U; i != rets.size(); ++i) {
    outs.emplace_back(new VarBase(
        string::Sprintf("%s_out_%d", framework::GradVarName(PyLayer::kFwdOut),
                        i),
        rets[i], nullptr, true));
  }

  return outs;
}

std::vector<framework::Variable*> PyLayer::CallPythonFunc(
    const py::object& callable, const std::vector<framework::Variable*>& ins) {
    const py::object& callable, const std::vector<VarBase*>& ins) {
  py::gil_scoped_acquire guard;
  py::tuple in_args(ins.size());
  for (size_t i = 0; i < ins.size(); ++i) {
    const framework::LoDTensor& t = ins[i]->Get<framework::LoDTensor>();
    const framework::LoDTensor& t = ins[i]->var_->Get<framework::LoDTensor>();
    in_args[i] = t.IsInitialized() ? py::cast(t) : py::cast(nullptr);
  }
  VLOG(3) << "pyfunc in " << py::len(in_args);
...
...
@@ -359,6 +400,7 @@ std::vector<framework::Variable*> PyLayer::CallPythonFunc(
  auto ret_tuple = py::cast<py::tuple>(ret);
  size_t ret_num = py::len(ret_tuple);
  std::vector<framework::Variable*> outs;
  outs.reserve(ret_num);
  VLOG(3) << "pyfunc out " << ret_num;
  for (size_t i = 0; i < ret_num; ++i) {
    try {
...
...
@@ -369,7 +411,7 @@ std::vector<framework::Variable*> PyLayer::CallPythonFunc(
      auto* tensor = var->GetMutable<framework::LoDTensor>();
      tensor->ShareDataWith(*py_out_tensor);
      tensor->set_lod(py_out_tensor->lod());
      outs.push_back(var);
      outs.emplace_back(var);
    } catch (py::cast_error&) {
      PADDLE_THROW("The %d-th output must be LoDTensor", i);
    }
...
...
paddle/fluid/imperative/layer.h
...
...
@@ -18,14 +18,16 @@
#include "paddle/fluid/framework/python_headers.h"
// clang-format on
#include <map> // NOLINT
#include <string> // NOLINT
#include <vector> // NOLINT
#include <memory> // NOLINT
#include <map> // NOLINT
#include <string> // NOLINT
#include <vector> // NOLINT
#include <memory> // NOLINT
#include <unordered_map> // NOLINT
#include "paddle/fluid/framework/op_desc.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/var_desc.h"
#include "paddle/fluid/framework/var_type_inference.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/operators/math/math_function.h"
...
...
@@ -135,13 +137,13 @@ class VarBase {
          persistable) {}

 private:
  // TODO(minqiyang): need support SelectedRows
  VarBase(const std::string& name, framework::proto::VarType::Type dtype,
          const framework::DDim& shape, const platform::Place& place,
          framework::Variable* var, VarBase* grad, bool stop_gradient,
          bool persistable)
      : name_(name),
        dtype_(dtype),
        place_(place),
        type_(framework::proto::VarType::LOD_TENSOR),
        var_(var),
        grads_(grad),
        stop_gradient_(stop_gradient),
...
...
@@ -151,10 +153,12 @@ class VarBase {
        pre_op_out_idx_(-1) {
    if (!var_) {
      var_ = new framework::Variable();
      auto tensor = var_->GetMutable<framework::LoDTensor>();
      tensor->Resize(shape);
      tensor->mutable_data(place_, dtype_);
    }
    auto tensor = var_->GetMutable<framework::LoDTensor>();
    tensor->Resize(shape);
    tensor->mutable_data(place, dtype);
    VLOG(10) << "create varbase: " << name_ << " type: " << dtype
             << " place: " << place;
  }

 public:
...
...
@@ -184,7 +188,23 @@ class VarBase {
    }
  }

  inline framework::proto::VarType::Type DType() const { return dtype_; }

  inline framework::DDim Dims() const {
    return var_->Get<framework::LoDTensor>().dims();
  }

  // data type. e.g.. FP32
  inline void SetDataType(framework::proto::VarType::Type type) {
    auto tensor = var_->GetMutable<framework::LoDTensor>();
    tensor->mutable_data(tensor->place(), type);
  }
  inline framework::proto::VarType::Type DataType() const {
    auto tensor = var_->Get<framework::LoDTensor>();
    return tensor.type();
  }

  // tensor type. e.g.. LoDTensor
  inline void SetType(framework::proto::VarType::Type type) { type_ = type; }
  inline framework::proto::VarType::Type Type() const { return type_; }

  inline void SetStopGradient(bool stop_gradient) {
    stop_gradient_ = stop_gradient;
...
...
@@ -238,7 +258,7 @@ class VarBase {
  }

  std::string name_;
  framework::proto::VarType::Type dtype_;
  framework::proto::VarType::Type type_;
  platform::Place place_;

  framework::Variable* var_;
...
...
@@ -334,11 +354,13 @@ class PYBIND11_HIDDEN OpBase {
  std::map<std::string, std::vector<int>> pre_ops_out_idx_;

  // Inputs to a vector of bwd ops.
  std::vector<framework::VariableValueMap> grad_input_vars_;
  std::vector<VarBasePtrMap> grad_input_vars_;
  // Outputs to a vector of bwd ops.
  std::vector<framework::VariableValueMap> grad_output_vars_;
  std::vector<VarBasePtrMap> grad_output_vars_;

  std::vector<py::object> backward_hooks_;

  framework::AttributeMap attrs_;
};

class Layer {
...
...
@@ -365,12 +387,131 @@ class PyLayer {
  static std::vector<framework::Variable*> Apply(
      int func_id, const std::vector<VarBase*>& inputs);

  static std::vector<framework::Variable*> ApplyGrad(
      int func_id, const std::vector<framework::Variable*>& inputs);
  static std::vector<VarBase*> ApplyGrad(int func_id,
                                         const std::vector<VarBase*>& inputs);

 private:
  static std::vector<framework::Variable*> CallPythonFunc(
      const py::object& callable, const std::vector<framework::Variable*>& ins);
      const py::object& callable, const std::vector<VarBase*>& ins);
};

// infer var type context for imperative mode
class PYBIND11_HIDDEN RuntimeInferVarTypeContext
    : public framework::InferVarTypeContext {
 public:
  RuntimeInferVarTypeContext(const imperative::VarBasePtrMap* inputs,
                             imperative::VarBasePtrMap* outputs,
                             const framework::AttributeMap* attrs_map)
      : InferVarTypeContext(nullptr, nullptr),
        inputs_(inputs),
        outputs_(outputs),
        attrs_(attrs_map),
        input_names_(),
        output_names_(),
        var_set_() {
    input_names_.reserve(inputs_->size());
    for (auto& it : *inputs_) {
      for (imperative::VarBase* var : it.second) {
        input_names_[it.first].emplace_back(var->Name());
        var_set_[var->Name()] = var;
      }
    }

    output_names_.reserve(outputs_->size());
    for (auto& it : *outputs_) {
      for (imperative::VarBase* var : it.second) {
        output_names_[it.first].emplace_back(var->Name());
        var_set_[var->Name()] = var;
      }
    }
  }

  virtual ~RuntimeInferVarTypeContext() {}

  framework::Attribute GetAttr(const std::string& name) const override {
    PADDLE_ENFORCE_NOT_NULL(attrs_);
    return attrs_->at(name);
  }

  bool HasVar(const std::string& name) const override {
    return var_set_.count(name) > 0;
  }

  bool HasInput(const std::string& name) const override {
    PADDLE_ENFORCE_NOT_NULL(inputs_);
    return inputs_->count(name) > 0;
  }

  bool HasOutput(const std::string& name) const override {
    PADDLE_ENFORCE_NOT_NULL(outputs_);
    return outputs_->count(name) > 0;
  }

  const std::vector<std::string>& Input(const std::string& name) const override {
    return input_names_.at(name);
  }

  const std::vector<std::string>& Output(
      const std::string& name) const override {
    return output_names_.at(name);
  }

  framework::proto::VarType::Type GetType(
      const std::string& name) const override {
    return var_set_.at(name)->Type();
  }

  void SetType(const std::string& name,
               framework::proto::VarType::Type type) override {
    var_set_[name]->SetType(type);
  }

  framework::proto::VarType::Type GetDataType(
      const std::string& name) const override {
    return var_set_.at(name)->DataType();
  }

  void SetDataType(const std::string& name,
                   framework::proto::VarType::Type type) override {
    var_set_[name]->SetDataType(type);
  }

  std::vector<framework::proto::VarType::Type> GetDataTypes(
      const std::string& name) const override {
    PADDLE_THROW("GetDataTypes is not supported in runtime InferVarType");
  }

  void SetDataTypes(const std::string& name,
                    const std::vector<framework::proto::VarType::Type>&
                        multiple_data_type) override {
    PADDLE_THROW("SetDataTypes is not supported in runtime InferVarType");
  }

  std::vector<int64_t> GetShape(const std::string& name) const override {
    PADDLE_THROW("Do not handle Shape in runtime InferVarType");
  }

  void SetShape(const std::string& name,
                const std::vector<int64_t>& dims) override {
    PADDLE_THROW("Do not handle Shape in runtime InferVarType");
  }

  int32_t GetLoDLevel(const std::string& name) const override {
    PADDLE_THROW("Do not handle LoDLevel in runtime InferVarType");
  }

  void SetLoDLevel(const std::string& name, int32_t lod_level) override {
    PADDLE_THROW("Do not handle LoDLevel in runtime InferVarType");
  }

 private:
  const imperative::VarBasePtrMap* inputs_;
  imperative::VarBasePtrMap* outputs_;
  const framework::AttributeMap* attrs_;
  std::unordered_map<std::string, std::vector<std::string>> input_names_;
  std::unordered_map<std::string, std::vector<std::string>> output_names_;
  std::unordered_map<std::string, imperative::VarBase*> var_set_;
};

}  // namespace imperative
...
...
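RuntimeInferVarTypeContext exists so that the imperative tracer can reuse the same infer_var_type_ functors at run time, reading and writing types directly on VarBase objects instead of a BlockDesc. A condensed sketch of the call site it is built for (the tracer change below does essentially this; the free function here is illustrative only):

#include "paddle/fluid/framework/op_info.h"
#include "paddle/fluid/imperative/layer.h"

void RunInferVarType(const paddle::framework::OpInfo& info,
                     const paddle::imperative::VarBasePtrMap& inputs,
                     paddle::imperative::VarBasePtrMap* outputs,
                     const paddle::framework::AttributeMap& attrs_map) {
  if (info.infer_var_type_) {
    paddle::imperative::RuntimeInferVarTypeContext ctx(&inputs, outputs,
                                                       &attrs_map);
    info.infer_var_type_(&ctx);  // writes types back onto the output VarBases
  }
}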
paddle/fluid/imperative/tracer.cc
...
...
@@ -19,6 +19,7 @@
#include <unordered_map>
#include <unordered_set>
#include "paddle/fluid/framework/var_type_inference.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/enforce.h"
...
...
@@ -32,11 +33,12 @@ void CreateGradOp(const framework::OpDesc& op_desc,
                   std::vector<framework::OpDesc*>* grad_op_descs,
                   std::unordered_map<std::string, std::string>* grad_to_var) {
  PADDLE_ENFORCE(grad_op_descs->empty());
  std::vector<std::unique_ptr<framework::OpDesc>> descs =
      framework::OpInfoMap::Instance()
          .Get(op_desc.Type())
          .GradOpMaker()(op_desc, no_grad_set, grad_to_var, grad_sub_block);
  const framework::OpInfo& op_info =
      framework::OpInfoMap::Instance().Get(op_desc.Type());
  if (!op_info.grad_op_maker_) return;
  std::vector<std::unique_ptr<framework::OpDesc>> descs =
      op_info.GradOpMaker()(op_desc, no_grad_set, grad_to_var, grad_sub_block);

  for (auto& desc : descs) {
    grad_op_descs->emplace_back(desc.release());
  }
...
...
@@ -134,7 +136,7 @@ framework::VariableNameMap CreateOutputVarNameMap(
Tracer::Tracer(framework::BlockDesc* root_block) : root_block_(root_block) {}

std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
                                    const VarBasePtrMap& outputs,
                                    VarBasePtrMap* outputs,
                                    framework::AttributeMap attrs_map,
                                    const platform::Place expected_place,
                                    const bool stop_gradient) {
...
...
@@ -162,7 +164,7 @@ std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
    op->TrackPreOp(it.first, it.second);
  }

  op->output_vars_ = outputs;
  op->output_vars_ = *outputs;
  for (auto it : op->output_vars_) {
    auto& outvars = outvars_map[it.first];
    const std::vector<VarBase*>& outputs = it.second;
...
...
@@ -185,7 +187,7 @@ std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
  framework::VariableNameMap invars_name_map =
      CreateInputVarNameMap(op, inputs);
  framework::VariableNameMap outvars_name_map =
      CreateOutputVarNameMap(op, outputs);
      CreateOutputVarNameMap(op, *outputs);

  auto& info = framework::OpInfoMap::Instance().Get(op->Type());
  if (info.Checker() != nullptr) {
...
...
@@ -196,6 +198,11 @@ std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
      framework::OpRegistry::CreateOp(op->Type(), invars_name_map,
                                      outvars_name_map, attrs_map);

  if (info.infer_var_type_) {
    RuntimeInferVarTypeContext infer_var_type_ctx(&inputs, outputs, &attrs_map);
    info.infer_var_type_(&infer_var_type_ctx);
  }

  // TODO(minqiyang): Support infer var type in imperative mode
  // Run forward op
  VLOG(3) << "tracer running " << op->Type();
...
...
@@ -220,6 +227,7 @@ std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
    VLOG(5) << "start construct backward op";

    // construct grad op descs
    op->attrs_ = attrs_map;
    std::unique_ptr<framework::OpDesc> fwd_op_desc(new framework::OpDesc(
        op->Type(), invars_name_map, outvars_name_map, attrs_map));
    std::unique_ptr<std::unordered_map<std::string, std::string>> grad_to_var(
...
...
@@ -246,12 +254,12 @@ std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
          auto fwd_var_it = current_vars_map.find(grad_invar);
          PADDLE_ENFORCE(fwd_var_it != current_vars_map.end());
          // Forward inputs or outputs.
          grad_in_vars.emplace_back(fwd_var_it->second->var_);
          grad_in_vars.emplace_back(fwd_var_it->second);
        } else {
          VarBase* var = current_vars_map[var_it->second];
          InitGrad(var, prepared_op.GetDeviceContext());
          // Douts.
          grad_in_vars.emplace_back(var->grads_->var_);
          grad_in_vars.emplace_back(var->grads_);
        }

        vars_saved_for_backward.insert(it.first);
...
...
@@ -268,7 +276,7 @@ std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
                       op->Type());
        VarBase* var = current_vars_map[var_it->second];
        InitGrad(var, prepared_op.GetDeviceContext());
        grad_out_vars.push_back(var->grads_->var_);
        grad_out_vars.push_back(var->grads_);
      }
    }
  }
...
...
@@ -308,23 +316,23 @@ std::vector<VarBase*> Tracer::PyTrace(OpBase* op,
    auto& grad_output_vars =
        op->grad_output_vars_[0][framework::GradVarName(PyLayer::kFwdOut)];

    for (const VarBase* inp : inputs) {
      grad_input_vars.push_back(inp->var_);
    for (VarBase* inp : inputs) {
      grad_input_vars.push_back(inp);
    }
    for (VarBase* out : outputs) {
      grad_input_vars.push_back(out->var_);
      grad_input_vars.push_back(out);
    }

    // TODO(minqiyang): Add GPU support for PyLayer, only support CPU now
    platform::CPUPlace place;
    for (VarBase* out : outputs) {
      InitGrad(out, platform::DeviceContextPool::Instance().Get(place));
      grad_input_vars.push_back(out->grads_->var_);
      grad_input_vars.push_back(out->grads_);
    }

    for (VarBase* inp : inputs) {
      InitGrad(inp, platform::DeviceContextPool::Instance().Get(place));
      grad_output_vars.push_back(inp->grads_->var_);
      grad_output_vars.push_back(inp->grads_);
    }
  }

  return outputs;
...
...
paddle/fluid/imperative/tracer.h
@@ -48,7 +48,7 @@ class Tracer {
   virtual ~Tracer() {}
 
   std::set<std::string> Trace(OpBase* op, const VarBasePtrMap& inputs,
-                              const VarBasePtrMap& outputs,
+                              VarBasePtrMap* outputs,  // NOLINT
                               framework::AttributeMap attrs_map,
                               const platform::Place expected_place,
                               const bool stop_gradient = false);
paddle/fluid/imperative/type_defs.h
@@ -25,6 +25,7 @@ class VarBase;
 class OpBase;
 
 typedef std::map<std::string, std::vector<VarBase*>> VarBasePtrMap;
+typedef std::map<std::string, std::vector<const VarBase*>> ConstVarBasePtrMap;
 typedef std::map<std::string, std::vector<OpBase*>> OpBasePtrMap;
 
 }  // namespace imperative
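With Trace now taking the output map by pointer (see the tracer.cc and tracer.h hunks above), the caller owns the VarBasePtrMap and the tracer can rewrite its entries in place, for example after running variable type inference. A minimal, hypothetical call-site fragment under that assumption (op, tracer, x_var, out_var and attrs are illustrative names, not part of the patch):

  // Hypothetical caller: the tracer may now update entries of `outputs` directly.
  imperative::VarBasePtrMap inputs;
  imperative::VarBasePtrMap outputs;
  inputs["X"] = {x_var};       // imperative::VarBase* prepared by the caller
  outputs["Out"] = {out_var};  // may be rewritten by Trace
  auto saved = tracer.Trace(op, inputs, &outputs, attrs,
                            platform::CPUPlace(), /*stop_gradient=*/false);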
paddle/fluid/inference/tests/api/CMakeLists.txt
@@ -110,7 +110,7 @@ set(TRANSFORMER_INSTALL_DIR "${INFERENCE_DEMO_INSTALL_DIR}/transformer")
 download_model_and_data(${TRANSFORMER_INSTALL_DIR} "temp%2Ftransformer_model.tar.gz"
         "temp%2Ftransformer_data.txt.tar.gz")
 inference_analysis_test(test_analyzer_transformer SRCS analyzer_transformer_tester.cc
         EXTRA_DEPS ${INFERENCE_EXTRA_DEPS}
-        ARGS --infer_model=${TRANSFORMER_INSTALL_DIR}/model --infer_data=${TRANSFORMER_INSTALL_DIR}/data.txt --batch_size=8)
+        ARGS --infer_model=${TRANSFORMER_INSTALL_DIR}/model --infer_data=${TRANSFORMER_INSTALL_DIR}/data.txt --batch_size=8
+        SERIAL)
 
 # ocr
 set(OCR_INSTALL_DIR "${INFERENCE_DEMO_INSTALL_DIR}/ocr")
paddle/fluid/inference/tests/api/analyzer_transformer_tester.cc
@@ -183,10 +183,13 @@ void SetInput(std::vector<std::vector<PaddleTensor>> *inputs) {
 }
 
 // Easy for profiling independently.
-TEST(Analyzer_Transformer, profile) {
+void profile(bool use_mkldnn = false) {
   AnalysisConfig cfg;
   SetConfig(&cfg);
   std::vector<PaddleTensor> outputs;
+  if (use_mkldnn) {
+    cfg.EnableMKLDNN();
+  }
 
   std::vector<std::vector<PaddleTensor>> input_slots_all;
   SetInput(&input_slots_all);
@@ -194,6 +197,11 @@ TEST(Analyzer_Transformer, profile) {
                  input_slots_all, &outputs, FLAGS_num_threads);
 }
 
+TEST(Analyzer_Transformer, profile) { profile(); }
+
+#ifdef PADDLE_WITH_MKLDNN
+TEST(Analyzer_Transformer, profile_mkldnn) { profile(true); }
+#endif
+
 // Check the fuse status
 TEST(Analyzer_Transformer, fuse_statis) {
   AnalysisConfig cfg;
@@ -206,9 +214,12 @@ TEST(Analyzer_Transformer, fuse_statis) {
 }
 
 // Compare result of NativeConfig and AnalysisConfig
-TEST(Analyzer_Transformer, compare) {
+void compare(bool use_mkldnn = false) {
   AnalysisConfig cfg;
   SetConfig(&cfg);
+  if (use_mkldnn) {
+    cfg.EnableMKLDNN();
+  }
 
   std::vector<std::vector<PaddleTensor>> input_slots_all;
   SetInput(&input_slots_all);
@@ -216,5 +227,10 @@ TEST(Analyzer_Transformer, compare) {
       reinterpret_cast<const PaddlePredictor::Config *>(&cfg), input_slots_all);
 }
 
+TEST(Analyzer_Transformer, compare) { compare(); }
+
+#ifdef PADDLE_WITH_MKLDNN
+TEST(Analyzer_Transformer, compare_mkldnn) { compare(true /* use_mkldnn */); }
+#endif
+
 }  // namespace inference
 }  // namespace paddle
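The tester refactor above follows a recurring pattern in these analyzer tests: fold the TEST body into a helper that takes a use_mkldnn flag, then register thin TEST wrappers for the native and MKL-DNN variants. A hedged sketch of the same pattern applied to a hypothetical analyzer target (Analyzer_Foo is not a real test; SetConfig/SetInput/TestPrediction are assumed to follow the tester_helper conventions used above):

  void profile(bool use_mkldnn = false) {
    AnalysisConfig cfg;
    SetConfig(&cfg);
    if (use_mkldnn) cfg.EnableMKLDNN();  // the only difference between variants
    std::vector<std::vector<PaddleTensor>> input_slots_all;
    SetInput(&input_slots_all);
    std::vector<PaddleTensor> outputs;
    TestPrediction(reinterpret_cast<const PaddlePredictor::Config *>(&cfg),
                   input_slots_all, &outputs, FLAGS_num_threads);
  }
  TEST(Analyzer_Foo, profile) { profile(); }
  #ifdef PADDLE_WITH_MKLDNN
  TEST(Analyzer_Foo, profile_mkldnn) { profile(true); }
  #endif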
paddle/fluid/operators/beam_search_decode_op.cc
@@ -178,10 +178,10 @@ (trailing-whitespace cleanup in the op comment)
 Beam Search Decode Operator. This Operator constructs the full hypotheses for
 each source sentence by walking back along the LoDTensorArray Input(ids)
 whose lods can be used to restore the path in the beam search tree.
 The Output(SentenceIds) and Output(SentenceScores) separately contain the
 generated id sequences and the corresponding scores. The shapes and lods of the
 two LodTensor are same. The lod level is 2 and the two levels separately
 indicate how many hypotheses each source sentence has and how many ids each
 hypothesis has.
 )DOC");
   }
@@ -203,15 +203,12 @@ class BeamSearchDecodeInferShape : public framework::InferShapeBase {
 class BeamSearchDecodeInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    for (auto& o : op_desc.Output("SentenceIds")) {
-      auto& sentence_ids = block->FindRecursiveOrCreateVar(o);
-      sentence_ids.SetType(framework::proto::VarType::LOD_TENSOR);
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    for (auto& o : ctx->Output("SentenceIds")) {
+      ctx->SetType(o, framework::proto::VarType::LOD_TENSOR);
     }
-    for (auto& o : op_desc.Output("SentenceScores")) {
-      auto& sentence_scores = block->FindRecursiveOrCreateVar(o);
-      sentence_scores.SetType(framework::proto::VarType::LOD_TENSOR);
+    for (auto& o : ctx->Output("SentenceScores")) {
+      ctx->SetType(o, framework::proto::VarType::LOD_TENSOR);
     }
   }
 };
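Most of the operator changes in this commit are the same mechanical rewrite shown above: the VarTypeInference functor no longer receives a concrete (OpDesc, BlockDesc) pair but an abstract framework::InferVarTypeContext, and all reads and writes go through that context. As a rough sketch of the interface these call sites rely on (the member list is inferred from the usages in this diff, not copied verbatim from var_type_inference.h):

  // Sketch only: names taken from how the operators in this diff use the context.
  class InferVarTypeContext {
   public:
    virtual const std::vector<std::string>& Input(const std::string& name) const;
    virtual const std::vector<std::string>& Output(const std::string& name) const;
    virtual framework::Attribute GetAttr(const std::string& name) const;
    virtual bool HasVar(const std::string& name) const;
    virtual proto::VarType::Type GetType(const std::string& name) const;
    virtual void SetType(const std::string& name, proto::VarType::Type type);
    virtual proto::VarType::Type GetDataType(const std::string& name) const;
    virtual void SetDataType(const std::string& name, proto::VarType::Type type);
  };

One implementation can be backed by OpDesc/BlockDesc for compile-time graphs, while the RuntimeInferVarTypeContext shown at the top of this diff is backed by the imperative VarBase maps, so the same InferVarType functor serves both the static and the imperative (tracer) paths.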
paddle/fluid/operators/beam_search_op.cc
@@ -65,7 +65,7 @@ class BeamSearchOpMaker : public framework::OpProtoAndCheckerMaker {
         .SetDefault(true);
 
     AddComment(R"DOC(
 This operator does the search in beams for one time step.
 Specifically, it selects the top-K candidate word ids of current step from
 Input(ids) according to their Input(scores) for all source sentences,
 where K is Attr(beam_size) and Input(ids), Input(scores) are predicted results
@@ -120,15 +120,12 @@ class BeamSearchOp : public framework::OperatorWithKernel {
 class BeamSearchInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    for (auto& o : op_desc.Output("selected_ids")) {
-      auto& selected_ids = block->FindRecursiveOrCreateVar(o);
-      selected_ids.SetType(framework::proto::VarType::LOD_TENSOR);
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    for (auto& o : ctx->Output("selected_ids")) {
+      ctx->SetType(o, framework::proto::VarType::LOD_TENSOR);
     }
-    for (auto& o : op_desc.Output("selected_scores")) {
-      auto& selected_scores = block->FindRecursiveOrCreateVar(o);
-      selected_scores.SetType(framework::proto::VarType::LOD_TENSOR);
+    for (auto& o : ctx->Output("selected_scores")) {
+      ctx->SetType(o, framework::proto::VarType::LOD_TENSOR);
     }
   }
 };
paddle/fluid/operators/concat_op.cc
@@ -56,7 +56,7 @@ class ConcatOp : public framework::OperatorWithKernel {
                           "Input tensors should have the same "
                           "elements except the specify axis.");
       } else {
-        // not check -1 in compile time
+        // not check -1 with other in compile time
         if (out_dims[j] != -1 && ins[i][j] != -1) {
           PADDLE_ENFORCE_EQ(out_dims[j], ins[i][j],
                             "Input tensors should have the same "
paddle/fluid/operators/controlflow/get_places_op.cc
@@ -93,11 +93,9 @@ execution.
 class GetPlacesInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    for (auto& o_name : op_desc.Output("Out")) {
-      block->FindRecursiveOrCreateVar(o_name).SetType(
-          framework::proto::VarType::PLACE_LIST);
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    for (auto& o_name : ctx->Output("Out")) {
+      ctx->SetType(o_name, framework::proto::VarType::PLACE_LIST);
     }
   }
 };
paddle/fluid/operators/controlflow/tensor_array_read_write_op.cc
@@ -100,16 +100,13 @@ class WriteToArrayInferShape : public framework::InferShapeBase {
 class WriteToArrayInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto x_name = op_desc.Input("X")[0];
-    auto out_name = op_desc.Output("Out")[0];
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto x_name = ctx->Input("X")[0];
+    auto out_name = ctx->Output("Out")[0];
     VLOG(10) << "Set Variable " << out_name << " as LOD_TENSOR_ARRAY";
-    auto& out = block->FindRecursiveOrCreateVar(out_name);
-    out.SetType(framework::proto::VarType::LOD_TENSOR_ARRAY);
-    auto* x = block->FindVarRecursive(x_name);
-    if (x != nullptr) {
-      out.SetDataType(x->GetDataType());
+    ctx->SetType(out_name, framework::proto::VarType::LOD_TENSOR_ARRAY);
+    if (ctx->HasVar(x_name)) {
+      ctx->SetDataType(out_name, ctx->GetDataType(x_name));
     }
   }
 };
paddle/fluid/operators/controlflow/while_op.cc
@@ -365,19 +365,16 @@ class WhileGradOpDescMaker : public framework::SingleGradOpDescMaker {
 class WhileGradOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto p_names = op_desc.Input(kX);
-    auto pg_ig_names = op_desc.Output(framework::GradVarName(kX));
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto p_names = ctx->Input(kX);
+    auto pg_ig_names = ctx->Output(framework::GradVarName(kX));
 
     for (size_t i = 0; i < p_names.size(); ++i) {
-      auto& p_var = detail::Ref(block->FindVarRecursive(p_names[i]));
-      auto* g_var = block->FindVarRecursive(pg_ig_names[i]);
-      if (g_var != nullptr) {  // Gradient could be @EMPTY@
+      if (ctx->HasVar(pg_ig_names[i])) {  // Gradient could be @EMPTY@
         VLOG(5) << "Setting " << pg_ig_names[i] << " following " << p_names[i]
-                << " type: " << p_var.GetType();
-        g_var->SetType(p_var.GetType());
-        g_var->SetDataType(p_var.GetDataType());
+                << " type: " << ctx->GetType(p_names[i]);
+        ctx->SetType(pg_ig_names[i], ctx->GetType(p_names[i]));
+        ctx->SetDataType(pg_ig_names[i], ctx->GetDataType(p_names[i]));
       }
     }
   }
paddle/fluid/operators/conv_transpose_op.cc
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 
 #include "paddle/fluid/operators/conv_transpose_op.h"
+#include <memory>
 #include <string>
 #include <vector>
@@ -344,6 +345,28 @@ framework::OpKernelType ConvTransposeOpGrad::GetExpectedKernelType(
                                  ctx.GetPlace(), layout_, library_);
 }
 
+class ConvTransposeGradOpDescMaker : public framework::SingleGradOpDescMaker {
+ public:
+  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
+
+ protected:
+  std::unique_ptr<framework::OpDesc> Apply() const override {
+    std::unique_ptr<framework::OpDesc> op(new framework::OpDesc());
+    op->SetType(ForwardOp().Type() + "_grad");
+    op->SetInput("Input", Input("Input"));
+    op->SetInput("Filter", Input("Filter"));
+    op->SetOutput(framework::GradVarName("Input"), InputGrad("Input"));
+    op->SetOutput(framework::GradVarName("Filter"), InputGrad("Filter"));
+    if (ForwardOp().Inputs().count("Bias") > 0) {
+      op->SetInput("Bias", Input("Bias"));
+      op->SetOutput(framework::GradVarName("Bias"), InputGrad("Bias"));
+    }
+    op->SetInput(framework::GradVarName("Output"), OutputGrad("Output"));
+    op->SetAttrMap(Attrs());
+    return op;
+  }
+};
+
 }  // namespace operators
 }  // namespace paddle
@@ -352,7 +375,7 @@ namespace ops = paddle::operators;
 // conv2d_transpose
 REGISTER_OPERATOR(conv2d_transpose, ops::ConvTransposeOp,
                   ops::Conv2DTransposeOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
+                  ops::ConvTransposeGradOpDescMaker);
 REGISTER_OPERATOR(conv2d_transpose_grad, ops::ConvTransposeOpGrad);
 
 REGISTER_OP_CPU_KERNEL(
@@ -368,7 +391,7 @@ REGISTER_OP_CPU_KERNEL(
 // conv3d_transpose
 REGISTER_OPERATOR(conv3d_transpose, ops::ConvTransposeOp,
                   ops::Conv3DTransposeOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
+                  ops::ConvTransposeGradOpDescMaker);
 REGISTER_OPERATOR(conv3d_transpose_grad, ops::ConvTransposeOpGrad);
 
 REGISTER_OP_CPU_KERNEL(
@@ -384,7 +407,7 @@ REGISTER_OP_CPU_KERNEL(
 // depthwise conv2d_transpose
 REGISTER_OPERATOR(depthwise_conv2d_transpose, ops::ConvTransposeOp,
                   ops::Conv2DTransposeOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
+                  ops::ConvTransposeGradOpDescMaker);
 REGISTER_OPERATOR(depthwise_conv2d_transpose_grad, ops::ConvTransposeOpGrad);
 
 REGISTER_OP_CPU_KERNEL(
paddle/fluid/operators/cos_sim_op.cc
@@ -74,6 +74,9 @@ class CosSimOpMaker : public framework::OpProtoAndCheckerMaker {
               "Norm of the second input, reduced along the 1st "
               "dimension.")
         .AsIntermediate();
+    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape,
+                  "Skip calling InferShape() function in the runtime.")
+        .SetDefault(true);
 
     AddComment(R"DOC(
 **Cosine Similarity Operator**
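The kAllKernelsMustComputeRuntimeShape attribute added here (and to fc, hash and fused_embedding_seq_pool elsewhere in this commit) tells the executor to skip InferShape at run time; the kernel then becomes responsible for sizing its own outputs, which is what the cos_sim_op.h hunk below does with Resize and set_lod. A hedged, schematic fragment of what such a kernel takes on (variable names are placeholders, not the exact cos_sim code):

  // Inside Compute(): with runtime InferShape skipped, the kernel sets shapes itself.
  auto* x = context.Input<framework::LoDTensor>("X");
  auto* out = context.Output<framework::LoDTensor>("Out");
  out->Resize({x->dims()[0], 1});            // shape derived from the real input
  out->mutable_data<T>(context.GetPlace());  // allocate only after resizing
  out->set_lod(x->lod());                    // propagate the LoD as well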
paddle/fluid/operators/cos_sim_op.h
@@ -28,17 +28,21 @@ class CosSimKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     // get Tensor
-    auto* in_x = context.Input<Tensor>("X");
+    auto* in_x = context.Input<framework::LoDTensor>("X");
     auto* in_y = context.Input<Tensor>("Y");
-    auto* out_z = context.Output<Tensor>("Out");
+    auto* out_z = context.Output<framework::LoDTensor>("Out");
     auto* out_x_norm = context.Output<Tensor>("XNorm");
     auto* out_y_norm = context.Output<Tensor>("YNorm");
-    out_z->mutable_data<T>(context.GetPlace());
-    out_x_norm->mutable_data<T>(context.GetPlace());
-    out_y_norm->mutable_data<T>(context.GetPlace());
 
     int rows_x = in_x->dims()[0];
     int rows_y = in_y->dims()[0];
+    out_z->Resize({rows_x, 1});
+    out_x_norm->Resize({rows_x, 1});
+    out_y_norm->Resize({rows_y, 1});
+    out_z->mutable_data<T>(context.GetPlace());
+    out_x_norm->mutable_data<T>(context.GetPlace());
+    out_y_norm->mutable_data<T>(context.GetPlace());
+    out_z->set_lod(in_x->lod());
 
     int cols = framework::product(in_x->dims()) / rows_x;
@@ -81,6 +85,7 @@ class CosSimGradKernel : public framework::OpKernel<T> {
       if (rows_x == rows_y) {
         if (out_grad_x) {
+          out_grad_x->Resize(in_x->dims());
           math::CosSimGradFunctor<T> functor(
               in_x_norm->data<T>(), in_y_norm->data<T>(), in_x->data<T>(),
               in_y->data<T>(), in_z->data<T>(), in_grad_z->data<T>(),
@@ -91,6 +96,7 @@ class CosSimGradKernel : public framework::OpKernel<T> {
           for_range(functor);
         }
         if (out_grad_y) {
+          out_grad_y->Resize(in_y->dims());
           math::CosSimGradFunctor<T> functor(
               in_y_norm->data<T>(), in_x_norm->data<T>(), in_y->data<T>(),
               in_x->data<T>(), in_z->data<T>(), in_grad_z->data<T>(),
@@ -102,6 +108,7 @@ class CosSimGradKernel : public framework::OpKernel<T> {
         }
       } else {
         if (out_grad_x) {
+          out_grad_x->Resize(in_x->dims());
           math::CosSimDxFunctor<T> functor(
               in_x_norm->data<T>(), in_y_norm->data<T>(), in_x->data<T>(),
               in_y->data<T>(), in_z->data<T>(), in_grad_z->data<T>(),
@@ -112,6 +119,7 @@ class CosSimGradKernel : public framework::OpKernel<T> {
           for_range(functor);
         }
         if (out_grad_y) {
+          out_grad_y->Resize(in_y->dims());
           out_grad_y->mutable_data<T>(context.GetPlace());
           math::SetConstant<DeviceContext, T> set_zero;
           auto& dev_ctx = context.template device_context<DeviceContext>();
paddle/fluid/operators/distributed_ops/allreduce_op.cc (new file, mode 100644)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include <future>  // NOLINT
#include <ostream>

#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_registry.h"

#ifdef PADDLE_WITH_CUDA
#include "paddle/fluid/platform/nccl_helper.h"
#endif

namespace paddle {
namespace operators {

struct MutableDataFunctor {
  MutableDataFunctor(void** data, framework::LoDTensor* tensor,
                     const platform::Place& place)
      : data_(data), tensor_(tensor), place_(place) {}

  template <typename T>
  void apply() {
    *data_ = tensor_->mutable_data<T>(place_);
  }

  void** data_;
  framework::LoDTensor* tensor_;
  platform::Place place_;
};

class AllReduceOp : public framework::OperatorBase {
  using OperatorBase::OperatorBase;

  void RunImpl(const framework::Scope& scope,
               const platform::Place& place) const override {
    PADDLE_ENFORCE(is_gpu_place(place),
                   "AllReduce op can run on gpu place only for now.");
#ifdef PADDLE_WITH_CUDA
    platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
    auto* ctx = pool.Get(place);
    auto in_names = Inputs("X");
    auto out_names = Outputs("Out");
    PADDLE_ENFORCE_EQ(in_names.size(), 1, "Only support one input");
    PADDLE_ENFORCE_EQ(out_names.size(), 1, "Only support one output");

    auto* in = scope.FindVar(in_names[0]);
    auto* out = scope.FindVar(out_names[0]);

    PADDLE_ENFORCE(in->IsType<framework::LoDTensor>() ||
                       out->IsType<framework::LoDTensor>(),
                   "Only support allreduce LoDTensors");

    int dtype = -1;
    auto in_tensor = in->Get<framework::LoDTensor>();
    dtype = platform::ToNCCLDataType(in_tensor.type());

    int64_t numel = in_tensor.numel();
    auto* sendbuff = in_tensor.data<void>();
    auto* out_tensor = out->GetMutable<framework::LoDTensor>();
    out_tensor->Resize(in_tensor.dims());
    void* recvbuff = nullptr;
    framework::VisitDataType(in_tensor.type(),
                             MutableDataFunctor(&recvbuff, out_tensor, place));

    auto cuda_ctx = static_cast<platform::CUDADeviceContext*>(ctx);
    auto* comm = cuda_ctx->nccl_comm();
    // FIXME(typhoonzero): should use nccl stream here.
    auto stream = cuda_ctx->stream();

    int reduce_type = Attr<int>("reduce_type");
    ncclRedOp_t red_type = ncclSum;
    switch (reduce_type) {
      case 0:
        red_type = ncclSum;
        break;
      case 1:
        red_type = ncclProd;
        break;
      case 2:
        red_type = ncclMax;
        break;
      case 3:
        red_type = ncclMin;
        break;
    }

    PADDLE_ENFORCE(platform::dynload::ncclAllReduce(
        sendbuff, recvbuff, numel, static_cast<ncclDataType_t>(dtype), red_type,
        comm, stream));
#endif
  }
};

class AllReduceOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  void Make() {
    AddInput("X", "(Tensor), tensor to be allreduced.");
    AddOutput("Out", "(Tensor) the result of allreduced.");
    AddAttr<int>("reduce_type", "(int) determin the reduce type.")
        .SetDefault(0);
    AddComment(R"DOC(
***AllReduce Operator***

Call NCCL AllReduce internally. Note that this op must be used when one
thread is managing one GPU device.

For speed reasons, reduce_type should be an integer:

0: sum
1: prod
2: max
3: min

If input and output are the same variable, in-place allreduce will be used.
)DOC");
  }
};

class AllReduceOpShapeInference : public framework::InferShapeBase {
 public:
  void operator()(framework::InferShapeContext* ctx) const override {}
};

}  // namespace operators
}  // namespace paddle

namespace ops = paddle::operators;

REGISTER_OPERATOR(allreduce, ops::AllReduceOp,
                  paddle::framework::EmptyGradOpMaker, ops::AllReduceOpMaker,
                  ops::AllReduceOpShapeInference);
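The reduce_type attribute is a bare integer, so Python-side callers and the kernel above have to agree on the 0/1/2/3 encoding documented in the op comment. If that mapping were needed in more than one place, a small helper could centralize it; the following is a hypothetical refactor of the switch in RunImpl, not part of the patch (ncclSum, ncclProd, ncclMax and ncclMin are the standard NCCL reduction ops):

  // Hypothetical helper mirroring the switch in AllReduceOp::RunImpl.
  inline ncclRedOp_t ToNcclRedType(int reduce_type) {
    switch (reduce_type) {
      case 0: return ncclSum;
      case 1: return ncclProd;
      case 2: return ncclMax;
      case 3: return ncclMin;
      default:
        PADDLE_THROW("Invalid reduce_type %d, expected 0(sum)/1(prod)/2(max)/3(min)",
                     reduce_type);
    }
  }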
paddle/fluid/operators/distributed_ops/fake_init_op.cc
@@ -56,8 +56,7 @@ class FakeInitOp : public framework::OperatorBase {
 class FakeInitOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {}
+  void operator()(framework::InferVarTypeContext* ctx) const override {}
 };
 
 class FakeInitOpMaker : public framework::OpProtoAndCheckerMaker {
paddle/fluid/operators/distributed_ops/merge_ids_op.cc
@@ -114,11 +114,10 @@ class MergeIdsOp : public framework::OperatorWithKernel {
 class MergeIdsOpInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto* input_var = block->Var(op_desc.Input("Ids")[0]);
-    for (auto& out_var : op_desc.Output("Out")) {
-      block->Var(out_var)->SetType(input_var->GetType());
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto input_type = ctx->GetType(ctx->Input("Ids")[0]);
+    for (auto& out_var : ctx->Output("Out")) {
+      ctx->SetType(out_var, input_type);
     }
   }
 };
paddle/fluid/operators/distributed_ops/split_ids_op.cc
@@ -14,6 +14,8 @@ limitations under the License. */
 
 #include "paddle/fluid/operators/distributed_ops/split_ids_op.h"
 
+#include <memory>
+
 namespace paddle {
 namespace operators {
@@ -71,11 +73,10 @@ class SplitIdsOp : public framework::OperatorWithKernel {
 class SplitIdsOpInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto* input_var = block->Var(op_desc.Input("Ids")[0]);
-    for (auto& out_var : op_desc.Output("Out")) {
-      block->Var(out_var)->SetType(input_var->GetType());
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto input_type = ctx->GetType(ctx->Input("Ids")[0]);
+    for (auto& out_var : ctx->Output("Out")) {
+      ctx->SetType(out_var, input_type);
     }
   }
 };
paddle/fluid/operators/dropout_op.cc
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 
 #include "paddle/fluid/operators/dropout_op.h"
+#include <memory>
 #include <string>
 
 namespace paddle {
@@ -106,21 +107,31 @@ class DropoutOpGrad : public framework::OperatorWithKernel {
     PADDLE_ENFORCE_EQ(ctx->Attrs().Get<bool>("is_test"), false,
                       "GradOp is only callable when is_test is false");
 
-    PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) must not be null.");
     PADDLE_ENFORCE(ctx->HasInput("Mask"), "Mask must not be null.");
     PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
                    "Input(Out@GRAD) must not be null.");
 
-    auto x_dims = ctx->GetInputDim("X");
     auto out_dims = ctx->GetInputDim(framework::GradVarName("Out"));
-    PADDLE_ENFORCE_EQ(x_dims, out_dims,
-                      "Dimensions of Input(X) and Out@Grad must be the same.");
-    auto mask_dims = ctx->GetInputDim("Mask");
-    PADDLE_ENFORCE_EQ(x_dims, mask_dims,
-                      "Dimensions of Input(X) and Mask must be the same.");
-
-    ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
-    ctx->ShareLoD("X", /*->*/ framework::GradVarName("X"));
+    ctx->SetOutputDim(framework::GradVarName("X"), out_dims);
+    ctx->ShareLoD(framework::GradVarName("Out"),
+                  /*->*/ framework::GradVarName("X"));
   }
 };
 
+class DropoutGradOpDescMaker : public framework::SingleGradOpDescMaker {
+ public:
+  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
+
+ protected:
+  std::unique_ptr<framework::OpDesc> Apply() const override {
+    std::unique_ptr<framework::OpDesc> op(new framework::OpDesc());
+    op->SetType("dropout_grad");
+    op->SetInput(framework::GradVarName("Out"), OutputGrad("Out"));
+    op->SetInput("Mask", Output("Mask"));
+    op->SetOutput(framework::GradVarName("X"), InputGrad("X"));
+    op->SetAttrMap(Attrs());
+    return op;
+  }
+};
@@ -129,7 +140,7 @@ class DropoutOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(dropout, ops::DropoutOp, ops::DropoutOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
+                  ops::DropoutGradOpDescMaker);
 REGISTER_OPERATOR(dropout_grad, ops::DropoutOpGrad);
 REGISTER_OP_CPU_KERNEL(
     dropout, ops::CPUDropoutKernel<paddle::platform::CPUDeviceContext, float>,
paddle/fluid/operators/fc_op.cc
@@ -55,17 +55,8 @@ void FCOp::InferShape(framework::InferShapeContext* ctx) const {
                  "The input tensor Input's rank of FCOp should be larger than "
                  "in_num_col_dims.");
 
-  auto in_mat_dims = framework::flatten_to_2d(in_dims, in_num_col_dims);
-  PADDLE_ENFORCE_EQ(
-      in_mat_dims[1], w_dims[0],
-      "Fully Connected input and weigth size do not match. %s, %s");
-
   std::vector<int64_t> output_dims;
-  output_dims.reserve(static_cast<size_t>(in_num_col_dims + 1));
-  for (int i = 0; i < in_num_col_dims; ++i) {
-    output_dims.push_back(in_dims[i]);
-  }
-  output_dims.push_back(w_dims[1]);
+  FCOutputSize(in_dims, w_dims, output_dims, in_num_col_dims);
 
   ctx->SetOutputDim("Out", framework::make_ddim(output_dims));
   ctx->ShareLoD("Input", "Out");
@@ -128,6 +119,9 @@ void FCOpMaker::Make() {
   AddAttr<bool>("use_mkldnn",
                 "(bool, default false) Only used in mkldnn kernel")
       .SetDefault(false);
+  AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape,
+                "Skip calling InferShape() function in the runtime.")
+      .SetDefault(true);
   AddComment(R"DOC(
   Fully Connected Operator.
@@ -142,13 +136,20 @@ class FCOpKernel : public framework::OpKernel<T> {
   void Compute(const paddle::framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
                    "It must use CPUPlace.");
-    auto input = ctx.Input<Tensor>("Input");
+    auto input = ctx.Input<framework::LoDTensor>("Input");
     auto w = ctx.Input<Tensor>("W");
     auto bias = ctx.Input<Tensor>("Bias");
-    auto output = ctx.Output<Tensor>("Out");
+    auto output = ctx.Output<framework::LoDTensor>("Out");
+    int in_num_col_dims = ctx.Attr<int>("in_num_col_dims");
+    auto w_dims = w->dims();
+
+    std::vector<int64_t> output_dims;
+    FCOutputSize(input->dims(), w_dims, output_dims, in_num_col_dims);
+    output->Resize(framework::make_ddim(output_dims));
+    output->set_lod(input->lod());
+
     auto out_dims = output->dims();
-    int M = framework::product(out_dims) / out_dims[out_dims.size() - 1];
+    int M = framework::product(out_dims) / w_dims[1];
 
     const T* input_data = input->data<T>();
     const T* w_data = w->data<T>();
paddle/fluid/operators/fc_op.h
@@ -48,5 +48,21 @@ class FCOpMaker : public framework::OpProtoAndCheckerMaker {
   void Make() override;
 };
 
+inline void FCOutputSize(const framework::DDim& in_dims,
+                         const framework::DDim& w_dims,
+                         std::vector<int64_t>& out_dims,  // NOLINT
+                         int in_num_col_dims) {
+  auto in_mat_dims = framework::flatten_to_2d(in_dims, in_num_col_dims);
+  PADDLE_ENFORCE_EQ(
+      in_mat_dims[1], w_dims[0],
+      "Fully Connected input and weigth size do not match. %s, %s");
+
+  out_dims.reserve(static_cast<size_t>(in_num_col_dims + 1));
+  for (int i = 0; i < in_num_col_dims; ++i) {
+    out_dims.push_back(in_dims[i]);
+  }
+  out_dims.push_back(w_dims[1]);
+}
+
 }  // namespace operators
 }  // namespace paddle
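To make the FCOutputSize contract concrete: with in_dims = {3, 4, 5}, w_dims = {20, 10} and in_num_col_dims = 1, the input is viewed as a 3 x 20 matrix, the 20 is checked against w_dims[0], and the output shape becomes {3, 10}. A standalone analogue using plain std::vector instead of framework::DDim (illustration only, not framework code):

  #include <cassert>
  #include <cstdint>
  #include <functional>
  #include <iostream>
  #include <numeric>
  #include <vector>

  // Standalone analogue of FCOutputSize, for illustration only.
  std::vector<int64_t> FcOutputDims(const std::vector<int64_t>& in_dims,
                                    const std::vector<int64_t>& w_dims,
                                    int in_num_col_dims) {
    // Flatten every dimension after in_num_col_dims into one "column" dimension.
    int64_t cols = std::accumulate(in_dims.begin() + in_num_col_dims, in_dims.end(),
                                   int64_t{1}, std::multiplies<int64_t>());
    assert(cols == w_dims[0] && "input and weight sizes must match");
    std::vector<int64_t> out(in_dims.begin(), in_dims.begin() + in_num_col_dims);
    out.push_back(w_dims[1]);
    return out;
  }

  int main() {
    auto out = FcOutputDims({3, 4, 5}, {20, 10}, 1);
    for (auto d : out) std::cout << d << ' ';  // prints: 3 10
    std::cout << '\n';
  }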
paddle/fluid/operators/fill_constant_op.cc
@@ -39,12 +39,11 @@ class FillConstantOp : public framework::OperatorWithKernel {
 class FillConstantOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
+  void operator()(framework::InferVarTypeContext* ctx) const override {
     auto data_type = static_cast<framework::proto::VarType::Type>(
-        boost::get<int>(op_desc.GetAttr("dtype")));
-    auto& out_var_name = op_desc.Output("Out").front();
-    block->Var(out_var_name)->SetDataType(data_type);
+        boost::get<int>(ctx->GetAttr("dtype")));
+    auto& out_var_name = ctx->Output("Out").front();
+    ctx->SetDataType(out_var_name, data_type);
   }
 };
paddle/fluid/operators/fused/fused_embedding_seq_pool_op.cc
@@ -88,7 +88,8 @@ class FusedEmbeddingSeqPoolOpMaker : public framework::OpProtoAndCheckerMaker {
                   "(boolean, default false) "
                   "Sparse update.")
         .SetDefault(false);
-    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape, "")
+    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape,
+                  "Skip calling InferShape() function in the runtime.")
         .SetDefault(true);
     AddComment(R"DOC(
 FusedEmbeddingSeqPool Operator.
@@ -137,22 +138,20 @@ class FusedEmbeddingSeqPoolOpGrad : public framework::OperatorWithKernel {
 class FusedEmbeddingSeqPoolOpGradVarTypeInference
     : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto out_var_name = op_desc.Output(framework::GradVarName("W")).front();
-    auto attr = op_desc.GetAttr("is_sparse");
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto out_var_name = ctx->Output(framework::GradVarName("W")).front();
+    auto attr = ctx->GetAttr("is_sparse");
     bool is_sparse = boost::get<bool>(attr);
     if (is_sparse) {
       VLOG(3) << "fused_embedding_seq_pool_grad op "
              << framework::GradVarName("W") << " is set to SelectedRows";
-      block->Var(out_var_name)
-          ->SetType(framework::proto::VarType::SELECTED_ROWS);
+      ctx->SetType(out_var_name, framework::proto::VarType::SELECTED_ROWS);
     } else {
       VLOG(3) << "fused_embedding_seq_pool_grad op "
              << framework::GradVarName("W") << " is set to LoDTensor";
-      block->Var(out_var_name)->SetType(framework::proto::VarType::LOD_TENSOR);
+      ctx->SetType(out_var_name, framework::proto::VarType::LOD_TENSOR);
     }
-    block->Var(out_var_name)->SetDataType(block->Var("W")->GetDataType());
+    ctx->SetDataType(out_var_name, ctx->GetDataType(ctx->Input("W")[0]));
   }
 };
paddle/fluid/operators/get_tensor_from_selected_rows_op.cc
@@ -81,15 +81,12 @@ GetTensorFromSelectedRows is used to get the tensor from SelectedRows.
 class GetTensorFromSelectedRowsOpVarTypeInference
     : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const final {
-    auto out_var_name = op_desc.Output("Out").front();
-    auto in_var_name = op_desc.Input("X").front();
-
-    auto out_var = block->FindRecursiveOrCreateVar(out_var_name);
-    auto in_var = block->FindRecursiveOrCreateVar(in_var_name);
-    out_var.SetType(framework::proto::VarType::LOD_TENSOR);
-    out_var.SetDataType(in_var.GetDataType());
+  void operator()(framework::InferVarTypeContext* ctx) const {  // NOLINT
+    auto out_var_name = ctx->Output("Out").front();
+    auto in_var_name = ctx->Input("X").front();
+    ctx->SetType(out_var_name, framework::proto::VarType::LOD_TENSOR);
+    ctx->SetDataType(out_var_name, ctx->GetDataType(in_var_name));
   }
 };
paddle/fluid/operators/hash_op.cc
@@ -54,7 +54,8 @@ $$Out = scale * X$$
 )DOC");
     AddAttr<int>("num_hash", "").SetDefault(1);
     AddAttr<int>("mod_by", "").SetDefault(100000);
-    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape, "")
+    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape,
+                  "Skip calling InferShape() function in the runtime.")
         .SetDefault(true);
   }
 };
paddle/fluid/operators/hierarchical_sigmoid_op.cc
@@ -197,38 +197,32 @@ class HierarchicalSigmoidGradOp : public framework::OperatorWithKernel {
 class HierarchicalSigmoidGradOpGradVarTypeInference
     : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto w_grad_var_name = op_desc.Output(framework::GradVarName("W")).front();
-    auto bias_grad_var_name_vec = op_desc.Output(framework::GradVarName("Bias"));
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto w_grad_var_name = ctx->Output(framework::GradVarName("W")).front();
+    auto bias_grad_var_name_vec = ctx->Output(framework::GradVarName("Bias"));
 
     std::string bias_grad_var_name;
     bool hasBias = false;
     if (bias_grad_var_name_vec.size()) {
       hasBias = true;
-      bias_grad_var_name = op_desc.Output(framework::GradVarName("Bias")).front();
+      bias_grad_var_name = ctx->Output(framework::GradVarName("Bias")).front();
     }
-    auto attr = op_desc.GetAttr("is_sparse");
+    auto attr = ctx->GetAttr("is_sparse");
     bool is_sparse = boost::get<bool>(attr);
     if (is_sparse) {
       VLOG(30) << "hierarchical_sigmoid_grad op " << framework::GradVarName("W")
               << " is set to SelectedRows";
-      block->Var(w_grad_var_name)
-          ->SetType(framework::proto::VarType::SELECTED_ROWS);
+      ctx->SetType(w_grad_var_name, framework::proto::VarType::SELECTED_ROWS);
     } else {
       VLOG(30) << "hierarchical_sigmoid_grad op " << framework::GradVarName("W")
              << " is set to LoDTensor";
-      block->Var(w_grad_var_name)->SetType(framework::proto::VarType::LOD_TENSOR);
+      ctx->SetType(w_grad_var_name, framework::proto::VarType::LOD_TENSOR);
    }
    if (hasBias) {
      VLOG(30) << "hierarchical_sigmoid_grad op "
              << framework::GradVarName("Bias") << " is set to LoDTensor";
-      block->Var(bias_grad_var_name)
-          ->SetType(framework::proto::VarType::LOD_TENSOR);
+      ctx->SetType(bias_grad_var_name, framework::proto::VarType::LOD_TENSOR);
    }
-    block->Var(w_grad_var_name)->SetDataType(block->Var("W")->GetDataType());
+    ctx->SetDataType(w_grad_var_name, ctx->GetDataType(ctx->Input("W")[0]));
   }
 };
paddle/fluid/operators/layer_norm_op.cc
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 
 #include "paddle/fluid/operators/layer_norm_op.h"
+#include <memory>
 
 namespace paddle {
 namespace operators {
@@ -133,7 +134,7 @@ class LayerNormGradOp : public framework::OperatorWithKernel {
     }
     if (ctx->HasOutput(framework::GradVarName("Bias"))) {
       ctx->SetOutputDim(framework::GradVarName("Bias"),
-                        ctx->GetInputDim("Bias"));
+                        ctx->GetInputDim("Scale"));
     }
   }
@@ -157,12 +158,39 @@ class LayerNormGradOp : public framework::OperatorWithKernel {
   }
 };
 
+class LayerNormGradOpDescMaker : public framework::SingleGradOpDescMaker {
+ public:
+  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
+
+ protected:
+  std::unique_ptr<framework::OpDesc> Apply() const override {
+    std::unique_ptr<framework::OpDesc> op(new framework::OpDesc());
+    op->SetType("layer_norm_grad");
+    op->SetInput("X", Input("X"));
+    op->SetInput("Mean", Output("Mean"));
+    op->SetInput("Variance", Output("Variance"));
+    if (ForwardOp().Inputs().count("Scale") > 0) {
+      op->SetInput("Scale", Input("Scale"));
+      op->SetOutput(framework::GradVarName("Scale"), InputGrad("Scale"));
+    }
+    if (ForwardOp().Inputs().count("Bias") > 0) {
+      op->SetOutput(framework::GradVarName("Bias"), InputGrad("Bias"));
+    }
+    op->SetInput(framework::GradVarName("Y"), OutputGrad("Y"));
+    op->SetOutput(framework::GradVarName("X"), InputGrad("X"));
+    op->SetAttrMap(Attrs());
+    return op;
+  }
+};
+
 }  // namespace operators
 }  // namespace paddle
 
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(layer_norm, ops::LayerNormOp, ops::LayerNormOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
+                  ops::LayerNormGradOpDescMaker);
 REGISTER_OPERATOR(layer_norm_grad, ops::LayerNormGradOp);
 REGISTER_OP_CPU_KERNEL(
     layer_norm, ops::LayerNormKernel<paddle::platform::CPUDeviceContext, float>,
paddle/fluid/operators/layer_norm_op.h
@@ -245,11 +245,9 @@ class LayerNormGradKernel : public framework::OpKernel<T> {
   void Compute(const framework::ExecutionContext& ctx) const override {
     const float epsilon = ctx.Attr<float>("epsilon");
     auto x = *ctx.Input<Tensor>("X");
-    auto* y = ctx.Input<Tensor>("Y");
     auto* mean = ctx.Input<Tensor>("Mean");
     auto* var = ctx.Input<Tensor>("Variance");
     auto* scale = ctx.Input<Tensor>("Scale");
-    auto* bias = ctx.Input<Tensor>("Bias");
     auto d_y = *ctx.Input<Tensor>(framework::GradVarName("Y"));
     const auto begin_norm_axis = ctx.Attr<int>("begin_norm_axis");
@@ -275,18 +273,13 @@ class LayerNormGradKernel : public framework::OpKernel<T> {
     x.Resize(matrix_shape);
     temp.mutable_data<T>(matrix_shape, ctx.GetPlace());
 
-    if (!(bias && scale)) {
-      temp_norm.ShareDataWith(*y);
-      temp_norm.Resize(matrix_shape);
-    } else {
-      temp_norm.mutable_data<T>(matrix_shape, ctx.GetPlace());
-      // get x_norm
-      ElementwiseComputeEx<SubFunctor<T>, DeviceContext, T>(
-          ctx, &x, mean, /*axis*/ 0, SubFunctor<T>(), &temp_norm);
-      ElementwiseComputeEx<DivAndSqrtFunctor<T>, DeviceContext, T>(
-          ctx, &temp_norm, var, /*axis*/ 0,
-          DivAndSqrtFunctor<T>(static_cast<T>(epsilon)), &temp_norm);
-    }
+    temp_norm.mutable_data<T>(matrix_shape, ctx.GetPlace());
+    // get x_norm
+    ElementwiseComputeEx<SubFunctor<T>, DeviceContext, T>(
+        ctx, &x, mean, /*axis*/ 0, SubFunctor<T>(), &temp_norm);
+    ElementwiseComputeEx<DivAndSqrtFunctor<T>, DeviceContext, T>(
+        ctx, &temp_norm, var, /*axis*/ 0,
+        DivAndSqrtFunctor<T>(static_cast<T>(epsilon)), &temp_norm);
     }
 
     if (d_bias) {
paddle/fluid/operators/lod_rank_table_op.cc
@@ -64,11 +64,9 @@ class LoDRankTableInferShape : public framework::InferShapeBase {
 class LoDRankTableInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    for (auto& o : op_desc.Output("Out")) {
-      block->FindRecursiveOrCreateVar(o).SetType(
-          framework::proto::VarType::LOD_RANK_TABLE);
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    for (auto& o : ctx->Output("Out")) {
+      ctx->SetType(o, framework::proto::VarType::LOD_RANK_TABLE);
     }
   }
 };
paddle/fluid/operators/lod_tensor_to_array_op.cc
@@ -201,10 +201,9 @@ class LoDTensorToArrayInferShape : public framework::InferShapeBase {
 class LoDTensorToArrayInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    for (auto& out_var : op_desc.Output("Out")) {
-      block->Var(out_var)->SetType(framework::proto::VarType::LOD_TENSOR_ARRAY);
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    for (auto& out_var : ctx->Output("Out")) {
+      ctx->SetType(out_var, framework::proto::VarType::LOD_TENSOR_ARRAY);
     }
   }
 };
paddle/fluid/operators/lookup_table_op.cc
@@ -147,22 +147,20 @@ class LookupTableOpGrad : public framework::OperatorWithKernel {
 class LookupTableOpGradVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto out_var_name = op_desc.Output(framework::GradVarName("W")).front();
-    auto attr = op_desc.GetAttr("is_sparse");
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto out_var_name = ctx->Output(framework::GradVarName("W")).front();
+    auto attr = ctx->GetAttr("is_sparse");
     bool is_sparse = boost::get<bool>(attr);
     if (is_sparse) {
       VLOG(3) << "lookup_table_grad op " << framework::GradVarName("W")
               << " is set to SelectedRows";
-      block->Var(out_var_name)
-          ->SetType(framework::proto::VarType::SELECTED_ROWS);
+      ctx->SetType(out_var_name, framework::proto::VarType::SELECTED_ROWS);
     } else {
       VLOG(3) << "lookup_table_grad op " << framework::GradVarName("W")
               << " is set to LoDTensor";
-      block->Var(out_var_name)->SetType(framework::proto::VarType::LOD_TENSOR);
+      ctx->SetType(out_var_name, framework::proto::VarType::LOD_TENSOR);
     }
-    block->Var(out_var_name)->SetDataType(block->Var("W")->GetDataType());
+    ctx->SetDataType(out_var_name, ctx->GetDataType(ctx->Input("W")[0]));
   }
 };
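The lookup_table change above repeats a pattern that also appears in nce, hierarchical_sigmoid and fused_embedding_seq_pool in this same commit: read is_sparse, set the weight gradient to SELECTED_ROWS or LOD_TENSOR, then copy the data type from the forward weight. A hypothetical helper that would factor this out, written against the InferVarTypeContext interface these ops now use (a sketch, not part of the patch):

  // Hypothetical: shared body for is_sparse-driven gradient var type inference.
  inline void SetWeightGradVarType(framework::InferVarTypeContext* ctx,
                                   const std::string& weight_name,
                                   const std::string& grad_name) {
    bool is_sparse = boost::get<bool>(ctx->GetAttr("is_sparse"));
    ctx->SetType(grad_name, is_sparse ? framework::proto::VarType::SELECTED_ROWS
                                      : framework::proto::VarType::LOD_TENSOR);
    ctx->SetDataType(grad_name, ctx->GetDataType(ctx->Input(weight_name)[0]));
  }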
paddle/fluid/operators/mkldnn/fc_mkldnn_op.cc
@@ -123,7 +123,7 @@ class FCMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     auto& dev_ctx = ctx.template device_context<MKLDNNDeviceContext>();
     const auto& mkldnn_engine = dev_ctx.GetEngine();
 
-    auto input = ctx.Input<Tensor>("Input");
+    auto input = ctx.Input<framework::LoDTensor>("Input");
     auto w = ctx.Input<Tensor>("W");
     auto bias = ctx.Input<Tensor>("Bias");
@@ -151,7 +151,13 @@ class FCMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     const T* input_data = input->data<T>();
     const T* w_data = w->data<T>();
 
-    auto output = ctx.Output<Tensor>("Out");
+    auto output = ctx.Output<framework::LoDTensor>("Out");
+    int in_num_col_dims = ctx.Attr<int>("in_num_col_dims");
+    std::vector<int64_t> output_dims;
+    FCOutputSize(input->dims(), w->dims(), output_dims, in_num_col_dims);
+    output->Resize(framework::make_ddim(output_dims));
+    output->set_lod(input->lod());
+
     T* output_data = output->mutable_data<T>(ctx.GetPlace());
 
     auto dst_memory = mem.dst(output_data);
@@ -204,19 +210,21 @@ class FCMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
     Tensor* input_grad = ctx.Output<Tensor>(framework::GradVarName("Input"));
     Tensor* w_grad = ctx.Output<Tensor>(framework::GradVarName("W"));
 
+    const Tensor* input = ctx.Input<Tensor>("Input");
+    const T* input_data = input->data<T>();
+
+    const Tensor* w = ctx.Input<Tensor>("W");
+    const T* w_data = w->data<T>();
+
     if (input_grad) {
+      input_grad->Resize(input->dims());
       input_grad_data = input_grad->mutable_data<T>(ctx.GetPlace());
     }
     if (w_grad) {
+      w_grad->Resize(w->dims());
       w_grad_data = w_grad->mutable_data<T>(ctx.GetPlace());
     }
 
-    const Tensor* input = ctx.Input<Tensor>("Input");
-    const T* input_data = input->data<T>();
-
-    const Tensor* w = ctx.Input<Tensor>("W");
-    const T* w_data = w->data<T>();
-
     const Tensor* out_grad = ctx.Input<Tensor>(framework::GradVarName("Out"));
     const T* out_grad_data = out_grad->data<T>();
paddle/fluid/operators/mkldnn/transpose_mkldnn_op.cc
@@ -73,6 +73,29 @@ class TransposeMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
   }
 };
 
+template <typename T>
+class TransposeINT8MKLDNNOpKernel : public paddle::framework::OpKernel<T> {
+ public:
+  void Compute(const paddle::framework::ExecutionContext& ctx) const override {
+    std::vector<int> axis = ctx.Attr<std::vector<int>>("axis");
+    std::vector<int> axis_int8 = {0, 2, 3, 1};
+    if (axis.size() != 1) {
+      PADDLE_ENFORCE_EQ(axis.size(), axis_int8.size());
+      for (size_t i = 0; i < axis.size(); i++) {
+        PADDLE_ENFORCE_EQ(axis[i], axis_int8[i],
+                          "Current INT8 MKLDNN Transpose kernel only surpport "
+                          "axis with [0, 2, 3, 1] due to MKL-DNN kernel "
+                          "implementation.");
+      }
+    }
+    auto* input = ctx.Input<Tensor>("X");
+    auto* output = ctx.Output<Tensor>("Out");
+    output->ShareDataWith(*input);
+    output->set_layout(DataLayout::kMKLDNN);
+    output->set_format(input->format());
+  }
+};
+
 template <typename T>
 class TransposeMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
  public:
@@ -140,7 +163,10 @@ class TransposeMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
 namespace ops = paddle::operators;
 
 REGISTER_OP_KERNEL(transpose2, MKLDNN, ::paddle::platform::CPUPlace,
-                   ops::TransposeMKLDNNOpKernel<float>);
+                   ops::TransposeMKLDNNOpKernel<float>,
+                   ops::TransposeINT8MKLDNNOpKernel<uint8_t>,
+                   ops::TransposeINT8MKLDNNOpKernel<int8_t>);
 
 REGISTER_OP_KERNEL(transpose, MKLDNN, ::paddle::platform::CPUPlace,
                    ops::TransposeMKLDNNOpKernel<float>);
paddle/fluid/operators/nccl/nccl_op.cc
@@ -60,12 +60,9 @@ class NCCLInitOp : public framework::OperatorBase {
 class NCCLInitOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto out_var_name = op_desc.Output("Communicator").front();
-    auto& out_var = block->FindRecursiveOrCreateVar(out_var_name);
-    auto var_type = framework::proto::VarType::RAW;
-    out_var.SetType(var_type);
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto out_var_name = ctx->Output("Communicator").front();
+    ctx->SetType(out_var_name, framework::proto::VarType::RAW);
   }
 };
paddle/fluid/operators/nce_op.cc
@@ -237,23 +237,21 @@ class NCEOpGrad : public framework::OperatorWithKernel {
 class NCEOpGradVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto weight_grad = op_desc.Output(framework::GradVarName("Weight")).front();
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto weight_grad = ctx->Output(framework::GradVarName("Weight")).front();
 
-    auto attr = op_desc.GetAttr("is_sparse");
+    auto attr = ctx->GetAttr("is_sparse");
     bool is_sparse = boost::get<bool>(attr);
     if (is_sparse) {
       VLOG(3) << "nce_op_grad op " << weight_grad << " and "
               << " is set to SelectedRows";
-      block->Var(weight_grad)
-          ->SetType(framework::proto::VarType::SELECTED_ROWS);
+      ctx->SetType(weight_grad, framework::proto::VarType::SELECTED_ROWS);
     } else {
       VLOG(3) << "nce_op_grad op " << weight_grad << " and "
               << " is set to LoDTensor";
-      block->Var(weight_grad)->SetType(framework::proto::VarType::LOD_TENSOR);
+      ctx->SetType(weight_grad, framework::proto::VarType::LOD_TENSOR);
     }
-    block->Var(weight_grad)->SetDataType(block->Var("Input")->GetDataType());
+    ctx->SetDataType(weight_grad, ctx->GetDataType(ctx->Input("Input")[0]));
   }
 };
paddle/fluid/operators/ngraph/ngraph_engine.cc
@@ -92,12 +92,10 @@ static std::vector<std::vector<int>> NgraphOpIntervals(
   int size = ops->size();
   int left = 0;
-  while (left < size && ops->at(left)->Type() != framework::kFeedOpType) {
+  while (left < size && ops->at(left)->Type() != framework::kFeedOpType &&
+         ops->at(left)->Type() != framework::kFetchOpType) {
     ++left;
   }
-  if (left == size) {
-    return intervals;
-  }
 
   while (left < size && ops->at(left)->Type() == framework::kFeedOpType) {
     for (auto& var_name_item : ops->at(left)->Outputs()) {
@@ -112,10 +110,6 @@ static std::vector<std::vector<int>> NgraphOpIntervals(
   while (right < size && ops->at(right)->Type() != framework::kFetchOpType) {
     ++right;
   }
-  if (right == size) {
-    return intervals;
-  }
-  if (left >= right) return intervals;
 
   int index = right;
   while (index < size && ops->at(index)->Type() == framework::kFetchOpType) {
@@ -127,6 +121,10 @@ static std::vector<std::vector<int>> NgraphOpIntervals(
     ++index;
   }
 
+  if (left == size || ops->at(left)->Type() == framework::kFetchOpType) {
+    left = 0;
+  }
+
   // (left, right - 1) represents indices between feed and fetch
   int pivot = left;
   while (pivot < right) {
@@ -234,6 +232,7 @@ NgraphEngine::NgraphEngine(const framework::Scope& scope,
 }
 
 void NgraphEngine::Prepare(const std::vector<int>& interval) {
+  bool has_fetch = false, is_full = false;
   for (auto& var : p_bdesc->AllVars()) {
     if (!(var->GetType() == framework::proto::VarType::SELECTED_ROWS ||
           var->GetType() == framework::proto::VarType::LOD_TENSOR ||
@@ -264,6 +263,9 @@ void NgraphEngine::Prepare(const std::vector<int>& interval) {
   std::vector<paddle::framework::OpDesc*> ops_desc;
   for (auto op_desc : p_bdesc->AllOps()) {
     ops_desc.emplace_back(op_desc);
+    if (op_desc->Type() == framework::kFetchOpType) {
+      has_fetch = true;
+    }
   }
 
   for (auto op_desc : ops_desc) {
@@ -276,11 +278,11 @@ void NgraphEngine::Prepare(const std::vector<int>& interval) {
   if (interval[0] > 0 &&
       ops_desc.at(interval[0] - 1)->Type() == framework::kFeedOpType &&
       interval[1] < static_cast<int>(ops_desc.size()) &&
-      ops_desc.at(interval.at(1))->Type() == framework::kFetchOpType) {
-    this->op_state_ = OpState::FULL;
+      ops_desc.at(interval[1])->Type() == framework::kFetchOpType) {
+    is_full = true;
   }
 
-  if (this->op_state_ == OpState::FULL) {
+  if (is_full) {
     this->op_state_ = this->is_test_ ? OpState::FULL_TEST : OpState::FULL_TRAIN;
   } else {
     this->op_state_ =
@@ -293,7 +295,8 @@ void NgraphEngine::Prepare(const std::vector<int>& interval) {
         framework::OpRegistry::CreateOp(*(ops_desc[idx])));
     ++idx;
   }
-  while (ops_desc.at(idx)->Type() != framework::kFetchOpType) {
+  while (idx < static_cast<int>(ops_desc.size()) &&
+         ops_desc.at(idx)->Type() != framework::kFetchOpType) {
     auto op_desc = ops_desc.at(idx);
     for (auto& var_name_item : op_desc->Inputs()) {
       for (auto& var_name : var_name_item.second) {
@@ -303,6 +306,10 @@ void NgraphEngine::Prepare(const std::vector<int>& interval) {
     ++idx;
   }
 
+  if (!has_fetch) {
+    op_state_ = OpState::UNKNOWN;
+  }
+
   BuildNgIO(ops_desc, interval);
 }
@@ -378,6 +385,7 @@ void NgraphEngine::BuildNgIO(const std::vector<framework::OpDesc*>& ops_desc,
       }
     }
   }
+
   for (size_t i = 0; i < var_in_.size(); ++i) {
     auto var_name = var_in_[i];
     if (persistables_.find(var_name) == persistables_.end()) {
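The NgraphOpIntervals and Prepare changes above mostly add bounds checks so the scan no longer assumes a feed op exists before the first interval and a fetch op after the last one; an UNKNOWN state is used when no fetch is present. A standalone, simplified sketch of the boundary search on a plain vector of op type strings (illustrative only; the real function also records feed/fetch variable names and splits intervals on unsupported ops):

  #include <string>
  #include <vector>

  // Simplified: find [left, right) covering ops strictly between feed and fetch.
  void FeedFetchBounds(const std::vector<std::string>& op_types,
                       int* left, int* right) {
    int size = static_cast<int>(op_types.size());
    int l = 0;
    while (l < size && op_types[l] != "feed" && op_types[l] != "fetch") ++l;
    while (l < size && op_types[l] == "feed") ++l;   // skip the feed block if any
    int r = l;
    while (r < size && op_types[r] != "fetch") ++r;  // stop at fetch or at the end
    *left = l;
    *right = r;
  }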
paddle/fluid/operators/ngraph/ngraph_engine.h
@@ -12,8 +12,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 
-#ifndef PADDLE_FLUID_OPERATORS_NGRAPH_NGRAPH_ENGINE_H_
-#define PADDLE_FLUID_OPERATORS_NGRAPH_NGRAPH_ENGINE_H_
+#pragma once
 
 #include <memory>
 #include <set>
 #include <string>
@@ -35,7 +35,6 @@ enum class OpState { /* nGraph support state on ops */
   PARTIAL_TRAIN, /* Support partial ops for train        */
   FULL_TEST,     /* Support full list of ops for test    */
   PARTIAL_TEST,  /* Support partial list of ops for test */
   FULL,          /* All ops supported from feed to fetch */
   UNKNOWN        /* Output all for debug purpose         */
 };
@@ -119,4 +118,3 @@ class NgraphEngine {
 }  // namespace operators
 }  // namespace paddle
-#endif  // PADDLE_FLUID_OPERATORS_NGRAPH_NGRAPH_ENGINE_H_
paddle/fluid/operators/ngraph/ngraph_engine_op.cc
@@ -37,8 +37,7 @@ class NgraphEngineOpMaker : public framework::OpProtoAndCheckerMaker {
 class NgraphEngineInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {}
+  void operator()(framework::InferVarTypeContext* ctx) const override {}
 };
 
 }  // namespace operators
paddle/fluid/operators/optimizers/adam_op.h
@@ -15,6 +15,7 @@ limitations under the License. */
 #pragma once
 #include <math.h>  // for sqrt in CPU and CUDA
 #include <Eigen/Dense>
+#include <unordered_map>
 #include <vector>
 #include "paddle/fluid/framework/op_registry.h"
 #include "paddle/fluid/framework/threadpool.h"
@@ -311,17 +312,17 @@ struct SparseAdamFunctor<T, CPUAdam> {
     T beta1_pow = *beta1_pow_;
     T beta2_pow = *beta2_pow_;
     lr *= sqrt(1 - beta2_pow) / (1 - beta1_pow);
-    size_t row_count = numel / row_numel_;
+    int64_t row_count = static_cast<int64_t>(numel / row_numel_);
 
-    for (size_t i = 0U, j = 0U; i != row_count; ++i) {
+    for (int64_t i = 0, j = 0; i != row_count; ++i) {
       if (i == *(rows_ + j)) {
-        for (size_t k = 0U; k != row_numel_; ++k) {
+        for (int64_t k = 0; k != row_numel_; ++k) {
           T g = grad_[j * row_numel_ + k];
           adam_update(i * row_numel_ + k, g);
         }
         ++j;
       } else {
-        for (size_t k = 0U; k != row_numel_; ++k) {
+        for (int64_t k = 0; k != row_numel_; ++k) {
           T mom1 = moment1_[i * row_numel_ + k];
           T mom2 = moment2_[i * row_numel_ + k];
           T p = param_[i * row_numel_ + k];
@@ -427,43 +428,23 @@ class AdamOpKernel : public framework::OpKernel<T> {
         }
       }
 
-      framework::SelectedRows cpu_grad_merge;
+      framework::SelectedRows tmp_grad_merge;
       const framework::SelectedRows* grad_merge_ptr;
       if (is_strict_sorted) {
         grad_merge_ptr = &grad;
       } else {
         // merge duplicated rows if any.
         // The rows of grad_merge have been sorted inside MergeAdd functor
-        framework::SelectedRows* grad_merge_var;
         scatter::MergeAdd<DeviceContext, T> merge_func;
-        if (platform::is_cpu_place(ctx.GetPlace())) {
-          grad_merge_var = &cpu_grad_merge;
-        } else {
-          // FIXME(qiao): GPU also need to fix this
-          grad_merge_var = const_cast<framework::Scope&>(ctx.scope())
-                               .Var()
-                               ->GetMutable<framework::SelectedRows>();
-        }
         merge_func(ctx.template device_context<DeviceContext>(), grad,
-                   grad_merge_var, true);
-        grad_merge_ptr = grad_merge_var;
+                   &tmp_grad_merge, true);
+        grad_merge_ptr = &tmp_grad_merge;
       }
 
       auto& grad_merge = *grad_merge_ptr;
       auto& grad_tensor = grad_merge.value();
       const T* grad_data = grad_tensor.template data<T>();
-      const int64_t* rows = nullptr;
-      // When compiled without CUDA, the CUDAData() interface should not be
-      // provided.
-#if defined(PADDLE_WITH_CUDA)
-      if (platform::is_gpu_place(ctx.GetPlace())) {
-        rows = grad_merge.rows().CUDAData(ctx.GetPlace());
-      } else {
-#endif
-        rows = grad_merge.rows().data();
-#if defined(PADDLE_WITH_CUDA)
-      }
-#endif
+      const int64_t* rows = grad_merge.rows().Data(ctx.GetPlace());
      auto row_numel = grad_tensor.numel() / grad_merge.rows().size();
 
       if (platform::is_cpu_place(ctx.GetPlace())) {
@@ -488,7 +469,7 @@ class AdamOpKernel : public framework::OpKernel<T> {
         }
       }
 #ifndef _WIN32
-      else if (FLAGS_inner_op_parallelism > 1 &&
+      else if (FLAGS_inner_op_parallelism > 1 &&  // NOLINT
                min_row_size_to_use_multithread > 0 &&
                param.dims()[0] > min_row_size_to_use_multithread) {
         VLOG(3) << "use multi thread, inner_op_parallelism="
@@ -516,11 +497,11 @@ class AdamOpKernel : public framework::OpKernel<T> {
         for (int i = 0; i < FLAGS_inner_op_parallelism; ++i) {
           int64_t start = i * line_in_each_thread;
           int64_t end = (i + 1) * line_in_each_thread;
-          if (start >= param_row_count) {
+          if (start >= static_cast<int64_t>(param_row_count)) {
             break;
           }
-          if (end > param_row_count) {
-            end = param_row_count;
+          if (end > static_cast<int64_t>(param_row_count)) {
+            end = static_cast<int64_t>(param_row_count);
           }
           fs.push_back(framework::Async([&functor, &row_id_to_grad_row_offset,
@@ -545,8 +526,8 @@ class AdamOpKernel : public framework::OpKernel<T> {
         }
         for (size_t i = 0; i < fs.size(); ++i) fs[i].wait();
       }
-#endif        // !_WIN32
-      else {
+#endif  // !_WIN32
+      else {  // NOLINT
         functor(param.numel());
       }
     } else if (platform::is_gpu_place(ctx.GetPlace())) {
paddle/fluid/operators/optimizers/lars_momentum_op.cc
...
@@ -56,9 +56,9 @@ This optimizer use LARS (https://arxiv.org/abs/1708.03888) to optimize each
 weight using a local learning rate:
 $$
 local\_lr = \eta *
     \frac{\left \| param \right \|}{\left \| grad \right \| + \beta * \left \| param \right \|} \\
 velocity = mu * velocity + local\_lr * (grad + \beta * param) \\
 param = param - velocity. \\
 $$
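To make the rule above concrete, here is a minimal NumPy sketch of one LARS update; the hyper-parameter values and array shapes are illustrative assumptions, not values taken from the operator itself.

import numpy as np

# Illustrative hyper-parameters (assumptions): eta, mu, and beta (weight decay).
eta, mu, beta = 0.001, 0.9, 5e-4
param = np.random.randn(1000).astype('float32')
grad = np.random.randn(1000).astype('float32')
velocity = np.zeros_like(param)

# local_lr = eta * ||param|| / (||grad|| + beta * ||param||)
local_lr = eta * np.linalg.norm(param) / (
    np.linalg.norm(grad) + beta * np.linalg.norm(param))
velocity = mu * velocity + local_lr * (grad + beta * param)
param = param - velocity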
@@ -72,8 +72,7 @@ use L2 regularizers in case of using LARS.
 class LarsMomentumOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc &op_desc,
-                  framework::BlockDesc *block) const override {}
+  void operator()(framework::InferVarTypeContext* ctx) const override {}
 };
 }  // namespace operators
 }  // namespace paddle
...
paddle/fluid/operators/optimizers/momentum_op.cc
...
@@ -21,18 +21,14 @@ using Tensor = framework::Tensor;
 class MomentumOpInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto input_var = op_desc.Input("Param")[0];
-    for (auto& out_var : op_desc.Output("ParamOut")) {
-      if (block->FindRecursiveOrCreateVar(input_var).GetType() ==
-          framework::proto::VarType::SELECTED_ROWS) {
-        block->FindRecursiveOrCreateVar(out_var).SetType(
-            framework::proto::VarType::SELECTED_ROWS);
-      } else if (block->FindRecursiveOrCreateVar(input_var).GetType() ==
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto& input_var = ctx->Input("Param")[0];
+    for (auto& out_var : ctx->Output("ParamOut")) {
+      if (ctx->GetType(input_var) == framework::proto::VarType::SELECTED_ROWS) {
+        ctx->SetType(out_var, framework::proto::VarType::SELECTED_ROWS);
+      } else if (ctx->GetType(input_var) ==
                  framework::proto::VarType::LOD_TENSOR) {
-        block->FindRecursiveOrCreateVar(out_var).SetType(
-            framework::proto::VarType::LOD_TENSOR);
+        ctx->SetType(out_var, framework::proto::VarType::LOD_TENSOR);
       } else {
         PADDLE_THROW(
             "Only support LodTensor and SelectedRows, Unexpected Input Type.");
...
paddle/fluid/operators/optimizers/momentum_op.h
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #pragma once
+#include <memory>
 #include <string>
 #include "paddle/fluid/framework/eigen.h"
 #include "paddle/fluid/framework/op_registry.h"
...
@@ -69,6 +70,7 @@ class MomentumOp : public framework::OperatorWithKernel {
     ctx->SetOutputDim("ParamOut", param_dim);
     ctx->SetOutputDim("VelocityOut", param_dim);
   }

   framework::OpKernelType GetExpectedKernelType(
       const framework::ExecutionContext& ctx) const override {
     auto input_data_type = framework::GetDataTypeOfVar(ctx.InputVar("Param"));
...
@@ -351,23 +353,14 @@ class MomentumOpKernel : public framework::OpKernel<T> {
         VLOG(3) << "Grad SelectedRows contains no data!";
         return;
       }
-      auto* merged_grad = const_cast<framework::Scope&>(ctx.scope())
-                              .Var()
-                              ->GetMutable<framework::SelectedRows>();
+      framework::SelectedRows tmp_merged_grad;
+      framework::SelectedRows* merged_grad = &tmp_merged_grad;
       math::scatter::MergeAdd<DeviceContext, T> merge_func;
       merge_func(ctx.template device_context<DeviceContext>(), *grad,
                  merged_grad);
-      const int64_t* rows = nullptr;
-#ifdef PADDLE_WITH_CUDA
-      if (platform::is_gpu_place(ctx.GetPlace())) {
-        rows = merged_grad->rows().CUDAData(ctx.GetPlace());
-      } else {
-#endif
-        rows = merged_grad->rows().data();
-#ifdef PADDLE_WITH_CUDA
-      }
-#endif
+      const int64_t* rows = merged_grad->rows().Data(ctx.GetPlace());
       int64_t row_numel =
           merged_grad->value().numel() / merged_grad->rows().size();
       platform::ForRange<DeviceContext> for_range(
...
paddle/fluid/operators/optimizers/rmsprop_op.h
...
@@ -216,24 +216,14 @@ class RmspropOpKernel : public framework::OpKernel<T> {
       }
     } else if (grad_var->IsType<framework::SelectedRows>()) {
       auto& grad = grad_var->Get<framework::SelectedRows>();
-      auto* merged_grad = const_cast<framework::Scope&>(ctx.scope())
-                              .Var()
-                              ->GetMutable<framework::SelectedRows>();
+      framework::SelectedRows tmp_merged_grad;
+      framework::SelectedRows* merged_grad = &tmp_merged_grad;
       math::scatter::MergeAdd<DeviceContext, T> merge_func;
       merge_func(dev_ctx, grad, merged_grad);

       platform::ForRange<DeviceContext> for_range(dev_ctx, limit);
-      const int64_t* rows;
-#ifdef PADDLE_WITH_CUDA
-      if (platform::is_gpu_place(ctx.GetPlace())) {
-        rows = merged_grad->rows().CUDAData(ctx.GetPlace());
-      } else {
-#endif
-        rows = merged_grad->rows().data();
-#ifdef PADDLE_WITH_CUDA
-      }
-#endif
+      const int64_t* rows = merged_grad->rows().Data(ctx.GetPlace());
       auto& merged_tensor = merged_grad->value();
       int64_t row_count = merged_grad->rows().size();
       int64_t row_numel = merged_tensor.numel() / row_count;
...
paddle/fluid/operators/optimizers/sgd_op.cc
...
@@ -50,20 +50,18 @@ class SGDOp : public framework::OperatorWithKernel {
 class SGDOpInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto input_var_n = op_desc.Input("Param")[0];
-    auto in_var_type = block->FindRecursiveOrCreateVar(input_var_n).GetType();
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto& input_var_n = ctx->Input("Param")[0];
+    auto in_var_type = ctx->GetType(input_var_n);
     PADDLE_ENFORCE(in_var_type == framework::proto::VarType::SELECTED_ROWS ||
                        in_var_type == framework::proto::VarType::LOD_TENSOR,
                    "The input Var's type should be LoDtensor or SelectedRows,"
                    " but the received var(%s)'s type is %s",
                    input_var_n, in_var_type);

-    for (auto& out_var_n : op_desc.Output("ParamOut")) {
-      auto& out_var = block->FindRecursiveOrCreateVar(out_var_n);
-      if (out_var.GetType() != in_var_type) {
-        out_var.SetType(in_var_type);
+    for (auto& out_var_n : ctx->Output("ParamOut")) {
+      if (ctx->GetType(out_var_n) != in_var_type) {
+        ctx->SetType(out_var_n, in_var_type);
       }
     }
   }
...
paddle/fluid/operators/py_func_op.cc
...
@@ -14,8 +14,11 @@
 #include "paddle/fluid/operators/py_func_op.h"
 #include <memory>
 #include <set>
 #include <string>
 #include <unordered_set>
 #include <utility>
 #include <vector>
 #include "paddle/fluid/framework/op_registry.h"
...
@@ -91,15 +94,12 @@ static void CallPythonFunc(py::object *callable,
   }
 }

-class PyFuncOpVarTypInference : public framework::VarTypeInference {
+class PyFuncOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc &op,
-                  framework::BlockDesc *block) const override {
-    auto &outs = op.Outputs();
-    bool has_out = (outs.count("Out") > 0 && !outs.at("Out").empty());
+  void operator()(framework::InferVarTypeContext *ctx) const override {
+    bool has_out = (ctx->HasOutput("Out") && !ctx->Output("Out").empty());

-    auto &ins = op.Inputs();
-    bool has_in = (ins.count("X") > 0 && !ins.at("X").empty());
+    bool has_in = (ctx->HasInput("X") && !ctx->Input("X").empty());

     /**
      * X or Out can be empty, so that py_func can be more flexible
...
@@ -107,8 +107,8 @@ class PyFuncOpVarTypInference : public framework::VarTypeInference {
      */
     PADDLE_ENFORCE(has_in || has_out, "Input(X) or Output(Out) must exist");

-    PADDLE_ENFORCE_GE(boost::get<int>(op.GetAttr(kForwardPythonCallableId)), 0,
-                      "Function id cannot be less than 0");
+    PADDLE_ENFORCE_GE(boost::get<int>(ctx->GetAttr(kForwardPythonCallableId)),
+                      0, "Function id cannot be less than 0");

     if (!has_out) return;
...
@@ -118,7 +118,7 @@ class PyFuncOpVarTypInference : public framework::VarTypeInference {
      * the corresponding forward variable
      */
     const std::string kGradVarSuffix = framework::kGradVarSuffix;
-    auto &out_var_names = outs.at("Out");
+    auto &out_var_names = ctx->Output("Out");
     for (auto &out_var_name : out_var_names) {
       if (out_var_name == framework::kEmptyVarName ||
           out_var_name.size() < kGradVarSuffix.size()) {
...
@@ -128,18 +128,17 @@ class PyFuncOpVarTypInference : public framework::VarTypeInference {
       size_t len = out_var_name.size() - kGradVarSuffix.size();
       if (out_var_name.substr(len) == kGradVarSuffix) {
         auto fwd_var_name = out_var_name.substr(0, len);
-        auto *out_var_desc = block->FindVarRecursive(out_var_name);
-        auto *fwd_var_desc = block->FindVarRecursive(fwd_var_name);
-        PADDLE_ENFORCE_NOT_NULL(out_var_desc, "Backward variable %s not found",
-                                out_var_name);
-        PADDLE_ENFORCE_NOT_NULL(fwd_var_desc, "Forward variable %s not found",
-                                fwd_var_name);
+        PADDLE_ENFORCE(ctx->HasVar(out_var_name),
+                       "Backward variable %s not found", out_var_name);
+        PADDLE_ENFORCE(ctx->HasVar(fwd_var_name),
+                       "Backward variable %s not found", fwd_var_name);
         VLOG(10) << "Infer var_desc of Output(" << out_var_name << ") as Input("
                  << fwd_var_name << ")";
-        out_var_desc->SetShape(fwd_var_desc->GetShape());
-        out_var_desc->SetDataType(fwd_var_desc->GetDataType());
-        out_var_desc->SetLoDLevel(fwd_var_desc->GetLoDLevel());
-        out_var_desc->SetType(fwd_var_desc->GetType());
+        ctx->SetShape(out_var_name, ctx->GetShape(fwd_var_name));
+        ctx->SetDataType(out_var_name, ctx->GetDataType(fwd_var_name));
+        ctx->SetLoDLevel(out_var_name, ctx->GetLoDLevel(fwd_var_name));
+        ctx->SetType(out_var_name, ctx->GetType(fwd_var_name));
       }
     }
   }
...
@@ -309,5 +308,5 @@ class PyFuncOp : public framework::OperatorBase {
 namespace ops = paddle::operators;

 REGISTER_OPERATOR(py_func, ops::PyFuncOp, ops::PyFuncOpMaker,
-                  ops::PyFuncOpVarTypInference, ops::PyFuncOpShapeInference,
+                  ops::PyFuncOpVarTypeInference, ops::PyFuncOpShapeInference,
                   ops::PyFuncOpGradDescMaker);
paddle/fluid/operators/reader/create_custom_reader_op.cc
...
@@ -85,10 +85,10 @@ class CreateCustomReaderOpMaker : public DecoratedReaderMakerBase {
     AddComment(R"DOC(
       CreateCustomReader Operator
       A custom reader can be used for input data preprocessing.
       A custom reader holds its own sub-block, which will be executed in CPU
       in its 'ReadNext()' function. Users can configurate their own
       preprocessing pipelines by inserting operators into custom reader's
       sub-block.
     )DOC");
   }
...
@@ -123,23 +123,22 @@ class CustomReaderInferShape : public framework::InferShapeBase {
 class CustomReaderInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    framework::VarDesc* out_reader = block->FindVar(op_desc.Output("Out")[0]);
-    PADDLE_ENFORCE_NOT_NULL(out_reader);
-    out_reader->SetType(framework::proto::VarType::READER);
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto& out_var_name = ctx->Output("Out")[0];
+    PADDLE_ENFORCE(ctx->HasVar(out_var_name));
+    ctx->SetType(out_var_name, framework::proto::VarType::READER);

     auto sink_var_names =
-        boost::get<std::vector<std::string>>(op_desc.GetAttr("sink_var_names"));
+        boost::get<std::vector<std::string>>(ctx->GetAttr("sink_var_names"));
     const auto* sub_block =
-        boost::get<framework::BlockDesc*>(op_desc.GetAttr("sub_block"));
+        boost::get<framework::BlockDesc*>(ctx->GetAttr("sub_block"));
     std::vector<framework::proto::VarType::Type> res_data_types;
     for (const std::string& var_name : sink_var_names) {
       framework::VarDesc* var = sub_block->FindVar(var_name);
       PADDLE_ENFORCE_NOT_NULL(var);
       res_data_types.emplace_back(var->GetDataType());
     }
-    out_reader->SetDataTypes(res_data_types);
+    ctx->SetDataTypes(out_var_name, res_data_types);
   }
 };
...
paddle/fluid/operators/reader/read_op.cc
...
@@ -51,19 +51,16 @@ class ReadInferShape : public framework::InferShapeBase {
 class ReadInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    bool infer_out = boost::get<bool>(op_desc.GetAttr("infer_out"));
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    bool infer_out = boost::get<bool>(ctx->GetAttr("infer_out"));
     if (infer_out) {
-      std::string reader_name = op_desc.Input("Reader")[0];
-      std::vector<std::string> out_names = op_desc.Output("Out");
-      framework::VarDesc* reader = block->FindVarRecursive(reader_name);
-      auto dtypes = reader->GetDataTypes();
+      std::string reader_name = ctx->Input("Reader")[0];
+      std::vector<std::string> out_names = ctx->Output("Out");
+      auto dtypes = ctx->GetDataTypes(reader_name);
       PADDLE_ENFORCE_EQ(dtypes.size(), out_names.size());
       for (size_t i = 0; i < dtypes.size(); ++i) {
-        framework::VarDesc& out = block->FindRecursiveOrCreateVar(out_names[i]);
-        out.SetType(framework::proto::VarType::LOD_TENSOR);
-        out.SetDataType(dtypes[i]);
+        ctx->SetType(out_names[i], framework::proto::VarType::LOD_TENSOR);
+        ctx->SetDataType(out_names[i], dtypes[i]);
       }
     }
   }
...
paddle/fluid/operators/reader/reader_op_registry.cc
...
@@ -98,11 +98,10 @@ void FileReaderInferShape::operator()(framework::InferShapeContext* ctx) const {
   }
 }

-void FileReaderInferVarType::operator()(const framework::OpDesc& op_desc,
-                                        framework::BlockDesc* block) const {
-  std::string reader_name = op_desc.Output("Out")[0];
-  framework::VarDesc* reader = block->FindVarRecursive(reader_name);
-  reader->SetType(framework::proto::VarType::READER);
+void FileReaderInferVarType::operator()(
+    framework::InferVarTypeContext* ctx) const {
+  std::string reader_name = ctx->Output("Out")[0];
+  ctx->SetType(reader_name, framework::proto::VarType::READER);
 }

 void DecoratedReaderInferShape::operator()(
...
@@ -125,13 +124,11 @@ void DecoratedReaderInferShape::operator()(
 }

 void DecoratedReaderInferVarType::operator()(
-    const framework::OpDesc& op_desc, framework::BlockDesc* block) const {
-  std::string in_reader_name = op_desc.Input("UnderlyingReader")[0];
-  framework::VarDesc* in_reader = block->FindVarRecursive(in_reader_name);
-  std::string out_reader_name = op_desc.Output("Out")[0];
-  framework::VarDesc* out_reader = block->FindVarRecursive(out_reader_name);
-  out_reader->SetType(framework::proto::VarType::READER);
-  out_reader->SetDataTypes(in_reader->GetDataTypes());
+    framework::InferVarTypeContext* ctx) const {
+  const std::string& in_reader_name = ctx->Input("UnderlyingReader")[0];
+  const std::string& out_reader_name = ctx->Output("Out")[0];
+  ctx->SetType(out_reader_name, framework::proto::VarType::READER);
+  ctx->SetDataTypes(out_reader_name, ctx->GetDataTypes(in_reader_name));
 }

 void DecoratedReaderMakerBase::Make() {
...
paddle/fluid/operators/reader/reader_op_registry.h
...
@@ -14,7 +14,9 @@
 #pragma once
 #include <memory>
 #include <string>
 #include <unordered_map>
 #include <vector>
 #include "paddle/fluid/framework/op_registry.h"
 #include "paddle/fluid/framework/reader.h"
...
@@ -59,8 +61,7 @@ class FileReaderInferShape : public framework::InferShapeBase {
 class FileReaderInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override;
+  void operator()(framework::InferVarTypeContext* ctx) const override;
 };

 // general infershape for decorated reader
...
@@ -72,8 +73,7 @@ class DecoratedReaderInferShape : public framework::InferShapeBase {
 // general var type inference for decorated reader
 class DecoratedReaderInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override;
+  void operator()(framework::InferVarTypeContext* ctx) const override;
 };

 class DecoratedReaderMakerBase : public framework::OpProtoAndCheckerMaker {
...
paddle/fluid/operators/save_op.cc
...
@@ -159,12 +159,9 @@ This operator will serialize and write LoDTensor / SelectedRows variable to file
 class SaveOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc &op_desc,
-                  framework::BlockDesc *block) const override {
-    auto out_var_name = op_desc.Output(LOOKUP_TABLE_PATH).front();
-    auto &out_var = block->FindRecursiveOrCreateVar(out_var_name);
-    auto var_type = framework::proto::VarType::RAW;
-    out_var.SetType(var_type);
+  void operator()(framework::InferVarTypeContext *ctx) const override {
+    auto out_var_name = ctx->Output(LOOKUP_TABLE_PATH).front();
+    ctx->SetType(out_var_name, framework::proto::VarType::RAW);
   }
 };
...
paddle/fluid/operators/scale_op.cc
...
@@ -14,6 +14,7 @@ limitations under the License. */
 #include "paddle/fluid/operators/scale_op.h"
 #include <memory>
 #include <string>
 #include "paddle/fluid/operators/detail/safe_ref.h"
...
@@ -69,17 +70,13 @@ $$Out = scale*(X + bias)$$
 class ScaleOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc &op_desc,
-                  framework::BlockDesc *block) const override {
-    auto &in_var_name = op_desc.Input("X").front();
-    auto &in_var = detail::Ref(block->FindVarRecursive(in_var_name));
-    auto out_var_name = op_desc.Output("Out").front();
-    auto *out_var = block->FindVarRecursive(out_var_name);
+  void operator()(framework::InferVarTypeContext *ctx) const override {
+    auto &in_var_name = ctx->Input("X").front();
+    auto out_var_name = ctx->Output("Out").front();

     if (in_var_name != out_var_name) {
-      out_var->SetType(in_var.GetType());
-      out_var->SetDataType(in_var.GetDataType());
+      ctx->SetType(out_var_name, ctx->GetType(in_var_name));
+      ctx->SetDataType(out_var_name, ctx->GetDataType(in_var_name));
     }
   }
 };
...
paddle/fluid/operators/sequence_ops/sequence_enumerate_op.cc
...
@@ -30,13 +30,6 @@ class SequenceEnumerateOp : public framework::OperatorWithKernel {
         "Output(X) of SequenceEnumerate operator should not be null.");
     const auto x_dims = ctx->GetInputDim("X");
-    PADDLE_ENFORCE_EQ(
-        x_dims.size(), 2,
-        "Input(X) of SequenceEnumerate operator's rank should be 2.");
-    PADDLE_ENFORCE_EQ(x_dims[1], 1,
-                      "Input(X) of SequenceEnumerate operator's 2nd "
-                      "dimension should be 1.");
     const auto win_size = ctx->Attrs().Get<int>("win_size");
     ctx->SetOutputDim("Out", {x_dims[0], win_size});
     ctx->ShareLoD("X", "Out");
...
@@ -59,7 +52,8 @@ class SequenceEnumerateOpMaker : public framework::OpProtoAndCheckerMaker {
     });
     AddAttr<int>("pad_value", "(int) The enumerate sequence padding value.")
         .SetDefault(0);
-    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape, "")
+    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape,
+                  "Skip calling InferShape() function in the runtime.")
         .SetDefault(true);
     AddComment(R"DOC(
 Sequence Enumerate Operator.
...
paddle/fluid/operators/sequence_ops/sequence_enumerate_op.h
...
@@ -27,30 +27,47 @@ class SequenceEnumerateKernel : public framework::OpKernel<T> {
     auto* in = context.Input<LoDTensor>("X");
     auto* out = context.Output<LoDTensor>("Out");
     int win_size = context.Attr<int>("win_size");
-    int pad_value = context.Attr<int>("pad_value");
+    auto pad_value = static_cast<T>(context.Attr<int>("pad_value"));

     auto in_dims = in->dims();
-    auto in_lod = in->lod();
+    auto lod0 = in->lod()[0];
     PADDLE_ENFORCE_EQ(
-        static_cast<uint64_t>(in_dims[0]), in_lod[0].back(),
+        static_cast<uint64_t>(in_dims[0]), lod0.back(),
         "The actual input data's size mismatched with LoD information.");
+    PADDLE_ENFORCE_EQ(
+        in_dims.size(), 2UL,
+        "Input(X) of SequenceEnumerate operator's rank should be 2.");
+    PADDLE_ENFORCE_EQ(in_dims[1], 1,
+                      "Input(X) of SequenceEnumerate operator's 2nd "
+                      "dimension should be 1.");

     // Generate enumerate sequence set
-    auto lod0 = in_lod[0];
     auto in_data = in->data<T>();
+    out->Resize({in_dims[0], win_size});
+    out->set_lod(in->lod());
     auto out_data = out->mutable_data<T>(context.GetPlace());
     for (size_t i = 0; i < lod0.size() - 1; ++i) {
-      for (size_t idx = lod0[i]; idx < lod0[i + 1]; ++idx) {
-        for (int word_idx = 0; word_idx < win_size; ++word_idx) {
-          size_t word_pos = idx + word_idx;
-          out_data[win_size * idx + word_idx] =
-              word_pos < lod0[i + 1] ? in_data[word_pos] : pad_value;
-        }
+      int start = lod0[i];
+      int end = lod0[i + 1];
+      int copy_size = win_size < end - start + 1 ? win_size : end - start + 1;
+      int mid = end + 1 - copy_size;
+      int pad_num = win_size - copy_size;
+      copy_size *= sizeof(T);
+      for (int idx = start; idx < mid; ++idx) {
+        std::memcpy(out_data, in_data + idx, copy_size);
+        out_data += win_size;
+      }
+      for (int idx = mid; idx < end; ++idx) {
+        copy_size -= sizeof(T);
+        pad_num++;
+        std::memcpy(out_data, in_data + idx, copy_size);
+        T* pdata = out_data + copy_size / sizeof(T);
+        for (int i = 0; i < pad_num; ++i) {
+          pdata[i] = pad_value;
+        }
+        out_data += win_size;
       }
     }
-    out->set_lod(in->lod());
   }
 };
...
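For reference, a small NumPy sketch of what the kernel above produces for a single LoD sequence; the win_size and pad_value values are illustrative.

import numpy as np

seq = np.array([1, 2, 3, 4], dtype=np.int64)
win_size, pad_value = 3, 0

# Each position yields a window of the next win_size elements,
# padded with pad_value past the end of the sequence.
out = np.full((len(seq), win_size), pad_value, dtype=seq.dtype)
for idx in range(len(seq)):
    window = seq[idx:idx + win_size]
    out[idx, :len(window)] = window
# out == [[1 2 3], [2 3 4], [3 4 0], [4 0 0]]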
paddle/fluid/operators/slice_op.cu
...
@@ -31,18 +31,18 @@ __global__ void Padding(const paddle::platform::float16* d_out,
                         paddle::platform::float16* d_in) {
   int64_t out_idx = threadIdx.x + blockDim.x * blockIdx.x;
   if (out_idx < n) {
+    int64_t out_idx_tmp = out_idx;
     int coords[D] = {0};
     for (int i = D - 1; i >= 0; --i) {
-      coords[i] = out_idx % out_dims[i];
-      out_idx /= out_dims[i];
+      coords[i] = out_idx_tmp % out_dims[i];
+      out_idx_tmp /= out_dims[i];
       coords[i] += offsets[i];
     }

     int64_t in_idx = 0;
-    for (int i = 0; i < D - 1; ++i) {
-      in_idx += coords[i] * in_dims[i + 1];
+    for (int i = 0; i < D; ++i) {
+      in_idx = in_idx * in_dims[i] + coords[i];
     }
-    in_idx += coords[D - 1];

     d_in[in_idx] = d_out[out_idx];
   }
...
@@ -80,8 +80,8 @@ class SliceGradKernel<paddle::platform::CUDADeviceContext,
     set_zero(dev_ctx, d_in, static_cast<paddle::platform::float16>(0));

     int64_t numel = d_out->numel();
-    dim3 blocks((numel - 1) / PADDLE_CUDA_NUM_THREADS + 1, 1, 1);
-    dim3 threads(PADDLE_CUDA_NUM_THREADS, 1, 1);
+    dim3 blocks((numel - 1) / PADDLE_CUDA_NUM_THREADS + 1);
+    dim3 threads(PADDLE_CUDA_NUM_THREADS);
     auto stream = ctx.cuda_device_context().stream();
     auto out_shape = framework::vectorize2int(out_dims);
...
paddle/fluid/operators/softmax_with_cross_entropy_op.cc
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #include "paddle/fluid/operators/softmax_with_cross_entropy_op.h"
+#include <memory>

 namespace paddle {
 namespace operators {
...
@@ -187,7 +188,6 @@ class SoftmaxGradMaker : public framework::SingleGradOpDescMaker {
     grad_op->SetType("softmax_with_cross_entropy_grad");
     grad_op->SetInput("Label", Input("Label"));
     grad_op->SetInput("Softmax", Output("Softmax"));
-    grad_op->SetInput("Loss", Output("Loss"));
     grad_op->SetInput(framework::GradVarName("Softmax"), OutputGrad("Softmax"));
     grad_op->SetInput(framework::GradVarName("Loss"), OutputGrad("Loss"));
     grad_op->SetOutput(framework::GradVarName("Logits"), InputGrad("Logits"));
...
paddle/fluid/operators/split_selected_rows_op.cc
...
@@ -14,6 +14,8 @@ limitations under the License. */
 #include "paddle/fluid/operators/split_selected_rows_op.h"
+#include <memory>

 namespace paddle {
 namespace operators {
...
@@ -60,10 +62,9 @@ class SplitSelectedRowsOp : public framework::OperatorWithKernel {
 class SplitSelectedRowsOpInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc &op_desc,
-                  framework::BlockDesc *block) const override {
-    for (auto &out_var : op_desc.Output("Out")) {
-      block->Var(out_var)->SetType(framework::proto::VarType::SELECTED_ROWS);
+  void operator()(framework::InferVarTypeContext *ctx) const override {
+    for (auto &out_var : ctx->Output("Out")) {
+      ctx->SetType(out_var, framework::proto::VarType::SELECTED_ROWS);
     }
   }
 };
...
paddle/fluid/operators/sum_op.cc
...
@@ -12,6 +12,7 @@ limitations under the License. */
 #include "paddle/fluid/operators/sum_op.h"
 #include <algorithm>
 #include <memory>
 #include <string>
 #include <vector>
...
@@ -159,24 +160,20 @@ the LoD information with the first input.
 class SumOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc& op_desc,
-                  framework::BlockDesc* block) const override {
-    auto& inputs = op_desc.Input("X");
+  void operator()(framework::InferVarTypeContext* ctx) const override {
+    auto& inputs = ctx->Input("X");
     auto var_type = framework::proto::VarType::SELECTED_ROWS;
-    for (auto& name : op_desc.Input("X")) {
-      VLOG(10) << name << " "
-               << block->FindRecursiveOrCreateVar(name).GetType();
+    for (auto& name : ctx->Input("X")) {
+      VLOG(10) << name << " " << ctx->GetType(name);
     }

     bool any_input_is_lod_tensor = std::any_of(
-        inputs.begin(), inputs.end(), [block](const std::string& name) {
-          return block->FindRecursiveOrCreateVar(name).GetType() ==
-                 framework::proto::VarType::LOD_TENSOR;
+        inputs.begin(), inputs.end(), [ctx](const std::string& name) {
+          return ctx->GetType(name) == framework::proto::VarType::LOD_TENSOR;
         });

-    auto is_tensor_array = [block](const std::string& name) {
-      return block->FindRecursiveOrCreateVar(name).GetType() ==
-             framework::proto::VarType::LOD_TENSOR_ARRAY;
+    auto is_tensor_array = [ctx](const std::string& name) {
+      return ctx->GetType(name) == framework::proto::VarType::LOD_TENSOR_ARRAY;
     };

     bool any_input_is_tensor_array =
...
@@ -188,8 +185,7 @@ class SumOpVarTypeInference : public framework::VarTypeInference {
       if (!all_inputs_are_tensor_array) {
         std::ostringstream os;
         for (auto& each : inputs) {
-          os << "    " << each << " type is "
-             << block->FindRecursiveOrCreateVar(each).GetType() << "\n";
+          os << "    " << each << " type is " << ctx->GetType(each) << "\n";
         }
         PADDLE_ENFORCE(all_inputs_are_tensor_array,
                        "Not all inputs are tensor array:\n%s", os.str());
...
@@ -199,11 +195,9 @@ class SumOpVarTypeInference : public framework::VarTypeInference {
       var_type = framework::proto::VarType::LOD_TENSOR;
     }

-    auto out_var_name = op_desc.Output("Out").front();
-    auto& out_var = block->FindRecursiveOrCreateVar(out_var_name);
-    out_var.SetType(var_type);
-    auto& in_var = detail::Ref(block->FindVarRecursive(inputs.front()));
-    out_var.SetDataType(in_var.GetDataType());
+    auto out_var_name = ctx->Output("Out").front();
+    ctx->SetType(out_var_name, var_type);
+    ctx->SetDataType(out_var_name, ctx->GetDataType(inputs.front()));
   }
 };
...
paddle/fluid/operators/tensor_array_to_tensor_op.cc
...
@@ -177,10 +177,9 @@ class LoDTensorArray2TensorGradInferShape : public framework::InferShapeBase {
 class LoDTensorArray2TensorGradInferVarType
     : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc &op_desc,
-                  framework::BlockDesc *block) const override {
-    for (auto &out_var : op_desc.Output(framework::GradVarName("X"))) {
-      block->Var(out_var)->SetType(framework::proto::VarType::LOD_TENSOR_ARRAY);
+  void operator()(framework::InferVarTypeContext *ctx) const override {
+    for (auto &out_var : ctx->Output(framework::GradVarName("X"))) {
+      ctx->SetType(out_var, framework::proto::VarType::LOD_TENSOR_ARRAY);
     }
   }
 };
...
paddle/fluid/operators/tensorrt/tensorrt_engine_op.cc
...
@@ -46,8 +46,7 @@ class TensorRTEngineOpMaker : public framework::OpProtoAndCheckerMaker {
 class TensorRTEngineInferVarType : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc &op_desc,
-                  framework::BlockDesc *block) const override {}
+  void operator()(framework::InferVarTypeContext *ctx) const override {}
 };
 }  // namespace operators
...
paddle/fluid/operators/uniform_random_op.cc
...
@@ -112,17 +112,16 @@ uniform distribution. The random result is in set [min, max].
 class UniformRandomOpVarTypeInference : public framework::VarTypeInference {
  public:
-  void operator()(const framework::OpDesc &op_desc,
-                  framework::BlockDesc *block) const override {
-    auto out_var_name = op_desc.Output("Out").front();
+  void operator()(framework::InferVarTypeContext *ctx) const override {
+    auto out_var_name = ctx->Output("Out").front();
     auto var_data_type = static_cast<framework::proto::VarType::Type>(
-        boost::get<int>(op_desc.GetAttr("dtype")));
+        boost::get<int>(ctx->GetAttr("dtype")));

-    auto out_var = block->FindRecursiveOrCreateVar(out_var_name);
-    if (out_var.GetType() != framework::proto::VarType::SELECTED_ROWS) {
-      out_var.SetType(framework::proto::VarType::LOD_TENSOR);
+    if (ctx->GetType(out_var_name) != framework::proto::VarType::SELECTED_ROWS) {
+      ctx->SetType(out_var_name, framework::proto::VarType::LOD_TENSOR);
     }
-    out_var.SetDataType(var_data_type);
+    ctx->SetDataType(out_var_name, var_data_type);
   }
 };
...
paddle/fluid/platform/device_context.h
...
@@ -23,6 +23,7 @@ limitations under the License. */
 #include "paddle/fluid/platform/cuda_helper.h"
 #include "paddle/fluid/platform/dynload/cublas.h"
 #include "paddle/fluid/platform/dynload/cudnn.h"
+#include "paddle/fluid/platform/dynload/nccl.h"
 #include "paddle/fluid/platform/gpu_info.h"
 #endif
...
paddle/fluid/platform/nccl_helper.h
...
@@ -22,6 +22,7 @@
 #include <typeindex>
 #include <unordered_map>
 #include <vector>

 #include "paddle/fluid/framework/data_type.h"
 #include "paddle/fluid/platform/dynload/nccl.h"
 #include "paddle/fluid/platform/enforce.h"
...
@@ -79,7 +80,6 @@ struct NCCLContext {
       : ctx_(new CUDADeviceContext(CUDAPlace(dev_id))), comm_{nullptr} {}

   cudaStream_t stream() const { return ctx_->stream(); }
   ncclComm_t comm() const { return comm_; }

   int device_id() const {
...
@@ -105,9 +105,6 @@ struct NCCLContextMap {
                       order_.size(), contexts_.size(),
                       "NCCL Context Map does not support contain two or more same device");

-    if (places.size() <= 1 && num_trainers == 1) {
-      return;
-    }
     std::unique_ptr<ncclComm_t[]> comms(new ncclComm_t[order_.size()]);
     // if num_trainers == 1, should create a new nccl id for local comms.
     if (num_trainers == 1 && nccl_id == nullptr) {
...
@@ -127,8 +124,8 @@ struct NCCLContextMap {
       } else {
         rank = trainer_id;
       }
-      VLOG(30) << "init nccl rank: " << rank << " nranks: " << nranks
-               << " gpu id: " << gpu_id;
+      VLOG(3) << "init nccl rank: " << rank << " nranks: " << nranks
+              << " gpu id: " << gpu_id;
       PADDLE_ENFORCE(cudaSetDevice(gpu_id));
       PADDLE_ENFORCE(platform::dynload::ncclCommInitRank(
           comms.get() + i, nranks, *nccl_id, rank));
...
paddle/fluid/pybind/imperative.cc
...
@@ -38,7 +38,7 @@ void BindTracer(pybind11::module* m) {
       .def("trace",
            [](imperative::Tracer& self, imperative::OpBase* op,
               const imperative::VarBasePtrMap& inputs,
-              const imperative::VarBasePtrMap& outputs,
+              imperative::VarBasePtrMap* outputs,
               framework::AttributeMap attrs_map,
               const platform::CPUPlace expected_place,
               const bool stop_gradient = false) {
...
@@ -49,7 +49,7 @@ void BindTracer(pybind11::module* m) {
       .def("trace",
            [](imperative::Tracer& self, imperative::OpBase* op,
              const imperative::VarBasePtrMap& inputs,
-             const imperative::VarBasePtrMap& outputs,
+             imperative::VarBasePtrMap* outputs,
              framework::AttributeMap attrs_map,
              const platform::CUDAPlace expected_place,
              const bool stop_gradient = false) {
...
paddle/fluid/pybind/pybind.cc
...
@@ -200,7 +200,7 @@ PYBIND11_MODULE(core, m) {
       .def_property("name", &imperative::VarBase::Name,
                     &imperative::VarBase::SetName)
       .def_property_readonly("shape", &imperative::VarBase::Shape)
-      .def_property_readonly("dtype", &imperative::VarBase::DType)
+      .def_property_readonly("dtype", &imperative::VarBase::DataType)
       .def_property("persistable", &imperative::VarBase::IsPersistable,
                     &imperative::VarBase::SetPersistable)
       .def_property("stop_gradient", &imperative::VarBase::IsStopGradient,
...
@@ -1263,6 +1263,10 @@ All parameter, weight, gradient are variables in Paddle.
           "enable_inplace",
           [](const BuildStrategy &self) { return self.enable_inplace_; },
           [](BuildStrategy &self, bool b) { self.enable_inplace_ = b; })
+      .def_property(
+          "fuse_all_reduce_ops",
+          [](const BuildStrategy &self) { return self.fuse_all_reduce_ops_; },
+          [](BuildStrategy &self, bool b) { self.fuse_all_reduce_ops_ = b; })
       .def("_finalize_strategy_and_create_passes",
            [](BuildStrategy &self) -> std::shared_ptr<ir::PassBuilder> {
              return self.CreatePassesFromStrategy(true);
...
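A hedged sketch of how the newly exposed fuse_all_reduce_ops switch is toggled from Python; the loss variable name is an assumption standing in for whatever the calling program defines.

import paddle.fluid as fluid
import paddle.fluid.compiler as compiler

build_strategy = fluid.BuildStrategy()
build_strategy.fuse_all_reduce_ops = True  # property added in this change

# 'avg_loss' is assumed to be the mean loss of an already-built main program.
compiled_prog = compiler.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(
        loss_name=avg_loss.name, build_strategy=build_strategy)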
python/paddle/fluid/__init__.py
...
@@ -132,7 +132,8 @@ def __bootstrap__():
         'allocator_strategy', 'reader_queue_speed_test_mode',
         'print_sub_graph_dir', 'pe_profile_fname', 'warpctc_dir',
         'inner_op_parallelism', 'enable_parallel_graph',
-        'multiple_of_cupti_buffer_size', 'enable_subgraph_optimize',
-        'tracer_profile_fname'
+        'fuse_parameter_groups_size', 'multiple_of_cupti_buffer_size',
+        'enable_subgraph_optimize', 'fuse_parameter_memory_size',
+        'tracer_profile_fname'
     ]
     if 'Darwin' not in sysstr:
...
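The two fusion flags added above are read by __bootstrap__ from the process environment, so one hedged way to exercise them is to set the corresponding FLAGS_ environment variables before paddle.fluid is first imported; the numeric values below are placeholders, not recommended defaults.

import os

# Placeholder values; must be set before the first `import paddle.fluid`.
os.environ['FLAGS_fuse_parameter_groups_size'] = '3'
os.environ['FLAGS_fuse_parameter_memory_size'] = '16'

import paddle.fluid as fluid  # noqa: E402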
python/paddle/fluid/data_feeder.py
...
@@ -268,8 +268,8 @@ class DataFeeder(object):
         Args:
             reader(function): the reader is the function which can generate data.
             multi_devices(bool): whether to use multiple devices or not.
-            num_places(int): if the multi_devices is True, you can specify the number
-                of GPU to use, if 'num_places' is None, the function will use all the
+            num_places(int): if multi_devices is True, you can specify the number
+                of GPU to use, if multi_devices is None, the function will use all the
                 GPU of the current machine. Default None.
             drop_last(bool): whether to drop the last batch if the
                 size of the last batch is less than batch_size. Default True.
...
@@ -278,7 +278,7 @@ class DataFeeder(object):
             dict: the result of conversion.

         Raises:
-            ValueError: If drop_last is False and the data batch which cannot fit for devices.
+            ValueError: If drop_last is False and the data batch cannot fit for devices.
         """

         def __reader_creator__():
...
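A usage sketch of decorate_reader with the arguments documented above; the feed variables, batch size and the MNIST reader are illustrative assumptions.

import paddle
import paddle.fluid as fluid

image = fluid.layers.data(name='image', shape=[784], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

feeder = fluid.DataFeeder(feed_list=[image, label], place=fluid.CPUPlace())
reader = paddle.batch(paddle.dataset.mnist.train(), batch_size=16)

# Wrap the single-device reader so each yielded item covers all devices.
multi_dev_reader = feeder.decorate_reader(
    reader, multi_devices=True, num_places=2, drop_last=True)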
python/paddle/fluid/executor.py
...
@@ -470,13 +470,21 @@ class Executor(object):
             program(Program|CompiledProgram): the program that need to run,
                 if not provided, then default_main_program (not compiled) will be used.
             feed(dict): feed variable map, e.g. {"image": ImageData, "label": LabelData}
-            fetch_list(list): a list of variable or variable names that user want to get, run will return them according to this list.
-            feed_var_name(str): the name for the input variable of feed Operator.
-            fetch_var_name(str): the name for the output variable of fetch Operator.
-            scope(Scope): the scope used to run this program, you can switch it to different scope. default is global_scope
+            fetch_list(list): a list of variable or variable names that user
+                wants to get, this method will return them according to this list.
+            feed_var_name(str): the name for the input variable of
+                feed Operator.
+            fetch_var_name(str): the name for the output variable of
+                fetch Operator.
+            scope(Scope): the scope used to run this program, you can switch
+                it to different scope. default is global_scope
             return_numpy(bool): if convert the fetched tensor to numpy
-            use_program_cache(bool): set use_program_cache to true if program not changed compare to the last step.
+            use_program_cache(bool): whether to use the cached program
+                settings across batches. Setting it be true would be faster
+                only when (1) the program is not compiled with data parallel,
+                and (2) program, feed variable names and fetch_list variable
+                names do not changed compared to the last step.

         Returns:
             list(numpy.array): fetch result according to fetch_list.
...
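A minimal sketch of Executor.run with the caching behaviour described above; the tiny network and feed data are illustrative only.

import numpy as np
import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[1], dtype='float32')
y = fluid.layers.fc(input=x, size=1)

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# Re-running the same uncompiled program with identical feed/fetch names
# can reuse cached setup when use_program_cache=True.
for _ in range(3):
    out, = exe.run(fluid.default_main_program(),
                   feed={'x': np.random.rand(4, 1).astype('float32')},
                   fetch_list=[y],
                   use_program_cache=True)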
python/paddle/fluid/layers/__init__.py
...
@@ -33,6 +33,7 @@ from .detection import *
 from . import metric_op
 from .metric_op import *
 from .learning_rate_scheduler import *
+from .collective import *

 __all__ = []
 __all__ += nn.__all__
...
python/paddle/fluid/layers/collective.py (new file, mode 100644)
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import print_function
from ..layer_helper import LayerHelper, unique_name


def _allreduce(x, out=None, reduce_type="sum"):
    helper = LayerHelper("allreduce", **locals())
    # Convert string reduce type to op int type
    red_typ_int = 0
    if reduce_type == "sum":
        red_typ_int = 0
    elif reduce_type == "prod":
        red_typ_int = 1
    elif reduce_type == "max":
        red_typ_int = 2
    elif reduce_type == "min":
        red_typ_int = 3
    else:
        raise TypeError("reduce type can only be [sum|prod|max|min]")

    if out is None:
        out = helper.create_variable(
            name=unique_name.generate(".".join([x.name, 'tmp'])),
            shape=x.shape,
            dtype=x.dtype,
            type=x.type,
            persistable=x.persistable,
            stop_gradient=True)
    helper.append_op(
        type='allreduce',
        inputs={'X': [x]},
        outputs={'Out': [out]},
        attrs={"reduce_type": red_typ_int})
    return out
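The new dist_allreduce_op.py test further below exercises this helper by all-reducing every gradient before applying it; a condensed sketch of that pattern, assuming an optimizer opt and a loss avg_cost already exist in the surrounding training script:

import paddle.fluid as fluid

# 'opt' and 'avg_cost' are assumed to come from the surrounding training script.
params_grads = opt.backward(avg_cost)
data_parallel_param_grads = []
for p, g in params_grads:
    # Sum each gradient across trainers; loss scaling by nranks happens elsewhere.
    g_reduced = fluid.layers.collective._allreduce(g, reduce_type="sum")
    data_parallel_param_grads.append([p, g_reduced])
opt.apply_gradients(data_parallel_param_grads)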
python/paddle/fluid/layers/control_flow.py
...
@@ -28,21 +28,9 @@ import six
 from functools import reduce

 __all__ = [
-    'While',
-    'Switch',
-    'increment',
-    'array_write',
-    'create_array',
-    'less_than',
-    'equal',
-    'array_read',
-    'array_length',
-    'IfElse',
-    'DynamicRNN',
-    'StaticRNN',
-    'reorder_lod_tensor_by_rank',
-    'Print',
-    'is_empty',
+    'While', 'Switch', 'increment', 'array_write', 'create_array', 'less_than',
+    'equal', 'array_read', 'array_length', 'IfElse', 'DynamicRNN', 'StaticRNN',
+    'reorder_lod_tensor_by_rank', 'Print', 'is_empty'
 ]
...
@@ -1448,12 +1436,13 @@ class DynamicRNN(object):
         self.input_array = []
         self.mem_link = []

-    def step_input(self, x):
+    def step_input(self, x, level=0):
         """
         Mark a sequence as a dynamic RNN input.

         Args:
             x(Variable): The input sequence.
+            level(int): The level of lod used to split steps. Default: 0.

         Returns:
             The current timestep in the input sequence.
...
@@ -1471,7 +1460,8 @@ class DynamicRNN(object):
             parent_block.append_op(
                 type='lod_rank_table',
                 inputs={"X": x},
-                outputs={"Out": self.lod_rank_table})
+                outputs={"Out": self.lod_rank_table},
+                attrs={"level": level})
         self.max_seq_len = parent_block.create_var(
             name=unique_name.generate('dynamic_rnn_max_seq_len'),
             dtype='int64')
...
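A hedged sketch of the new level argument on DynamicRNN.step_input; the two-level LoD input is an illustrative assumption, not taken from this change.

import paddle.fluid as fluid

# Illustrative 2-level LoD input (e.g. paragraphs made of sentences).
sentence = fluid.layers.data(
    name='sentence', shape=[32], dtype='float32', lod_level=2)

drnn = fluid.layers.DynamicRNN()
with drnn.block():
    # 'level' selects which LoD level splits the sequence into RNN steps;
    # level=0 reproduces the previous default behaviour.
    step = drnn.step_input(sentence, level=0)
    drnn.output(step)
rnn_out = drnn()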
python/paddle/fluid/layers/nn.py
...
@@ -24,6 +24,7 @@ import inspect
 from ..layer_helper import LayerHelper
 from ..initializer import Normal, Constant, NumpyArrayInitializer
 from ..framework import Variable, OpProtoHolder, _in_imperative_mode
+from ..imperative import base
 from ..param_attr import ParamAttr
 from .layer_function_generator import autodoc, templatedoc, _generate_doc_string_
 from .tensor import concat, assign
...
@@ -9138,6 +9139,10 @@ def _elementwise_op(helper):
     op_type = helper.layer_type
     x = helper.kwargs.get('x', None)
     y = helper.kwargs.get('y', None)
+    if _in_imperative_mode():
+        x = base.to_variable(x)
+        y = base.to_variable(y)
+
     assert x is not None, 'x cannot be None in {}'.format(op_type)
     assert y is not None, 'y cannot be None in {}'.format(op_type)
     axis = helper.kwargs.get('axis', -1)
...
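A small sketch of what the imperative-mode branch above enables: passing plain numpy arrays to an elementwise layer inside the imperative guard. The guard API is assumed from the same code base at this point in time.

import numpy as np
import paddle.fluid as fluid

a = np.ones([2, 2], dtype='float32')
b = np.full([2, 2], 3.0, dtype='float32')

with fluid.imperative.guard():
    # With the change above, numpy inputs are wrapped via base.to_variable()
    # before the elementwise op runs, so plain arrays can be passed directly.
    out = fluid.layers.elementwise_add(a, b)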
python/paddle/fluid/tests/unittests/dist_allreduce_op.py (new file, mode 100644)
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import print_function

import numpy as np
import argparse
import time
import math

import paddle
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler
from paddle.fluid import core
import unittest
from multiprocessing import Process
import os
import signal
from functools import reduce
from test_dist_base import TestDistRunnerBase, runtime_main

DTYPE = "float32"
paddle.dataset.mnist.fetch()

# Fix seed for test
fluid.default_startup_program().random_seed = 1
fluid.default_main_program().random_seed = 1


def cnn_model(data):
    conv_pool_1 = fluid.nets.simple_img_conv_pool(
        input=data,
        filter_size=5,
        num_filters=20,
        pool_size=2,
        pool_stride=2,
        act="relu",
        param_attr=fluid.ParamAttr(
            initializer=fluid.initializer.Constant(value=0.01)))
    conv_pool_2 = fluid.nets.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        pool_size=2,
        pool_stride=2,
        act="relu",
        param_attr=fluid.ParamAttr(
            initializer=fluid.initializer.Constant(value=0.01)))

    SIZE = 10
    input_shape = conv_pool_2.shape
    param_shape = [reduce(lambda a, b: a * b, input_shape[1:], 1)] + [SIZE]
    scale = (2.0 / (param_shape[0]**2 * SIZE))**0.5

    predict = fluid.layers.fc(
        input=conv_pool_2,
        size=SIZE,
        act="softmax",
        param_attr=fluid.param_attr.ParamAttr(
            initializer=fluid.initializer.Constant(value=0.01)))
    return predict


class TestDistMnist2x2(TestDistRunnerBase):
    def get_model(self, batch_size=2, single_device=False):
        # Input data
        images = fluid.layers.data(name='pixel', shape=[1, 28, 28], dtype=DTYPE)
        label = fluid.layers.data(name='label', shape=[1], dtype='int64')

        # Train program
        predict = cnn_model(images)
        cost = fluid.layers.cross_entropy(input=predict, label=label)
        avg_cost = fluid.layers.mean(x=cost)

        # Evaluator
        batch_size_tensor = fluid.layers.create_tensor(dtype='int64')
        batch_acc = fluid.layers.accuracy(
            input=predict, label=label, total=batch_size_tensor)

        inference_program = fluid.default_main_program().clone()

        # Reader
        train_reader = paddle.batch(
            paddle.dataset.mnist.test(), batch_size=batch_size)
        test_reader = paddle.batch(
            paddle.dataset.mnist.test(), batch_size=batch_size)

        # Optimization
        # TODO(typhoonzero): fix distributed adam optimizer
        # opt = fluid.optimizer.AdamOptimizer(
        #     learning_rate=0.001, beta1=0.9, beta2=0.999)
        opt = fluid.optimizer.Momentum(learning_rate=self.lr, momentum=0.9)

        if single_device:
            opt.minimize(avg_cost)
        else:
            # multi device or distributed multi device
            params_grads = opt.backward(avg_cost)
            data_parallel_param_grads = []
            for p, g in params_grads:
                # NOTE: scale will be done on loss scale in multi_devices_graph_pass using nranks.
                grad_reduce = fluid.layers.collective._allreduce(g)
                data_parallel_param_grads.append([p, grad_reduce])
            opt.apply_gradients(data_parallel_param_grads)

        return inference_program, avg_cost, train_reader, test_reader, batch_acc, predict


if __name__ == "__main__":
    runtime_main(TestDistMnist2x2)
python/paddle/fluid/tests/unittests/mkldnn/test_transpose_int8_mkldnn_op.py (new file, mode 100644)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import print_function

import unittest
import numpy as np
from paddle.fluid.tests.unittests.op_test import OpTest
from mkldnn_op_test import format_reorder


class TestTransposeOp(OpTest):
    def setUp(self):
        self.init_op_type()
        self.initTestCase()
        self.initInputData()
        self.use_mkldnn = True
        self.axis = (0, 2, 3, 1)

        self.inputs = {
            'X': format_reorder(self.input_data, self.shape)
        }  # transform data format to 'NHWC' for INT8 transpose specially.

        self.attrs = {
            'axis': list(self.axis),
            'use_mkldnn': self.use_mkldnn,
        }

        self.outputs = {
            'XShape': np.random.random(self.shape).astype('int8'),
            'Out': self.inputs['X'].transpose(self.axis)
        }

    def init_op_type(self):
        self.op_type = "transpose2"

    def test_check_output(self):
        self.check_output(no_check_set=['XShape'])

    def initTestCase(self):
        self.shape = (2, 3, 4, 5)

    def initInputData(self):
        self.input_data = (
            np.random.randint(0, 100, self.shape) - 50).astype('int8')


class TestINT8Case(TestTransposeOp):
    def initTestCase(self):
        self.shape = (2, 4, 6, 8)

    def initInputData(self):
        self.input_data = (
            np.random.randint(0, 100, self.shape) - 50).astype('int8')


class TestUINT8Case(TestTransposeOp):
    def initTestCase(self):
        self.shape = (1, 3, 5, 7)

    def initDataType(self):
        self.input_data = (
            np.random.randint(0, 100, self.shape)).astype('uint8')


if __name__ == '__main__':
    unittest.main()
python/paddle/fluid/tests/unittests/parallel_executor_test_base.py
...
@@ -43,6 +43,7 @@ class TestParallelExecutorBase(unittest.TestCase):
                                  use_ir_memory_optimize=True,
                                  enable_inplace=True,
                                  fuse_elewise_add_act_ops=False,
+                                 fuse_all_reduce_ops=False,
                                  fuse_relu_depthwise_conv=False,
                                  optimizer=fluid.optimizer.Adam,
                                  use_fast_executor=False,
...
@@ -80,6 +81,7 @@ class TestParallelExecutorBase(unittest.TestCase):
         build_strategy.fuse_elewise_add_act_ops = fuse_elewise_add_act_ops
         build_strategy.fuse_relu_depthwise_conv = fuse_relu_depthwise_conv
         build_strategy.memory_optimize = False if memory_opt else use_ir_memory_optimize
+        build_strategy.fuse_all_reduce_ops = fuse_all_reduce_ops
         # python memory optimization is conflict with inplace pass.
         # Use ir graph memory optimization after inplace pass is the correct way.
         build_strategy.enable_inplace = False if memory_opt else enable_inplace
...
python/paddle/fluid/tests/unittests/test_dist_allreduce_op.py (new file, mode 100644)
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import print_function
import unittest
from test_dist_base import TestDistBase


class TestDistMnistNCCL2(TestDistBase):
    def _setup_config(self):
        self._sync_mode = True
        self._use_reduce = False
        self._use_reader_alloc = False
        self._nccl2_mode = True
        self._nccl2_reduce_layer = True

    def test_dist_train(self):
        import paddle.fluid as fluid
        if fluid.core.is_compiled_with_cuda():
            self.check_with_place("dist_allreduce_op.py", delta=1e-5)


if __name__ == '__main__':
    unittest.main()
python/paddle/fluid/tests/unittests/test_dist_base.py
浏览文件 @
dd080b17
...
...
@@ -33,7 +33,10 @@ DEFAULT_BATCH_SIZE = 2
class
TestDistRunnerBase
(
object
):
def
get_model
(
self
,
batch_size
=
DEFAULT_BATCH_SIZE
,
lr
=
0.1
):
def
get_model
(
self
,
batch_size
=
DEFAULT_BATCH_SIZE
,
lr
=
0.1
,
single_device
=
False
):
raise
NotImplementedError
(
"get_model should be implemented by child classes."
)
...
...
@@ -76,8 +79,12 @@ class TestDistRunnerBase(object):
def
run_trainer
(
self
,
args
):
self
.
lr
=
args
.
lr
test_program
,
avg_cost
,
train_reader
,
test_reader
,
batch_acc
,
predict
=
\
self
.
get_model
(
batch_size
=
args
.
batch_size
)
if
args
.
nccl2_reduce_layer_local_run
:
test_program
,
avg_cost
,
train_reader
,
test_reader
,
batch_acc
,
predict
=
\
self
.
get_model
(
batch_size
=
args
.
batch_size
,
single_device
=
True
)
else
:
test_program
,
avg_cost
,
train_reader
,
test_reader
,
batch_acc
,
predict
=
\
self
.
get_model
(
batch_size
=
args
.
batch_size
)
if
args
.
mem_opt
:
fluid
.
memory_optimize
(
fluid
.
default_main_program
(),
skip_grads
=
True
)
...
...
@@ -87,7 +94,7 @@ class TestDistRunnerBase(object):
args
.
endpoints
,
args
.
trainers
,
args
.
sync_mode
,
args
.
dc_asgd
)
trainer_prog
=
t
.
get_trainer_program
()
elif
args
.
update_method
==
"nccl2"
:
elif
args
.
update_method
==
"nccl2"
or
args
.
update_method
==
"nccl2_reduce_layer"
:
# transpile for nccl2
config
=
fluid
.
DistributeTranspilerConfig
()
config
.
mode
=
"nccl2"
...
...
@@ -110,9 +117,9 @@ class TestDistRunnerBase(object):
exe
=
fluid
.
Executor
(
place
)
exe
.
run
(
fluid
.
default_startup_program
())
strategy
=
fluid
.
ExecutionStrategy
()
strategy
.
num_threads
=
1
strategy
.
allow_op_delay
=
False
exec_
strategy
=
fluid
.
ExecutionStrategy
()
exec_
strategy
.
num_threads
=
1
exec_
strategy
.
allow_op_delay
=
False
build_stra
=
fluid
.
BuildStrategy
()
# FIXME force disable enable_inplace and memory_optimize
...
...
@@ -124,23 +131,25 @@ class TestDistRunnerBase(object):
else
:
build_stra
.
reduce_strategy
=
fluid
.
BuildStrategy
.
ReduceStrategy
.
AllReduce
pass_builder
=
None
if
args
.
batch_merge_repeat
>
1
:
pass_builder
=
build_stra
.
_finalize_strategy_and_create_passes
()
mypass
=
pass_builder
.
insert_pass
(
len
(
pass_builder
.
all_passes
())
-
3
,
"multi_batch_merge_pass"
)
mypass
.
set
(
"num_repeats"
,
args
.
batch_merge_repeat
)
if
args
.
update_method
==
"nccl2"
:
if
args
.
update_method
==
"nccl2"
or
args
.
update_method
==
"nccl2_reduce_layer"
:
build_stra
.
num_trainers
=
len
(
args
.
endpoints
.
split
(
","
))
build_stra
.
trainer_id
=
args
.
trainer_id
else
:
# case args.update_method == "nccl2_reduce_layer":
build_stra
.
num_trainers
=
1
build_stra
.
trainer_id
=
0
binary
=
compiler
.
CompiledProgram
(
trainer_prog
).
with_data_parallel
(
loss_name
=
avg_cost
.
name
,
build_strategy
=
build_stra
,
exec_strategy
=
strategy
)
exec_strategy
=
exec_
strategy
)
feed_var_list
=
[
var
for
var
in
trainer_prog
.
global_block
().
vars
.
values
()
...
...
@@ -182,7 +191,7 @@ def runtime_main(test_class):
'--update_method'
,
type
=
str
,
default
=
"local"
,
choices
=
[
"pserver"
,
"nccl2"
,
"local"
])
choices
=
[
"pserver"
,
"nccl2"
,
"local"
,
"nccl2_reduce_layer"
])
parser
.
add_argument
(
'--trainer_id'
,
type
=
int
,
required
=
False
,
default
=
0
)
parser
.
add_argument
(
'--trainers'
,
type
=
int
,
required
=
False
,
default
=
1
)
parser
.
add_argument
(
...
...
@@ -198,6 +207,11 @@ def runtime_main(test_class):
parser
.
add_argument
(
'--lr'
,
required
=
False
,
type
=
float
,
default
=
0.001
)
parser
.
add_argument
(
'--batch_merge_repeat'
,
required
=
False
,
type
=
int
,
default
=
1
)
parser
.
add_argument
(
'--nccl2_reduce_layer_local_run'
,
required
=
False
,
type
=
bool
,
default
=
False
)
args
=
parser
.
parse_args
()
...
...
@@ -242,6 +256,11 @@ class TestDistBase(unittest.TestCase):
        self._dc_asgd = False  # must use with async mode
        self._use_reader_alloc = True
        self._nccl2_mode = False
        # FIXME(typhoonzero): I added this stupid argument to enable
        # testing allreduce layers, which users can call layers.allreduce
        # to accumulate tensors at anywhere. Find a better way to do this
        # test, reduce check this argument everywhere.
        self._nccl2_reduce_layer = False
        self._lr = 0.001
        self._setup_config()
        self._after_setup_config()
...
...
@@ -307,10 +326,16 @@ class TestDistBase(unittest.TestCase):
        cmd += " --batch_size %d" % batch_size
        if batch_merge_repeat > 1:
            cmd += " --batch_merge_repeat %d" % batch_merge_repeat
        if self._nccl2_reduce_layer:
            cmd += " --nccl2_reduce_layer_local_run 1"

        if self.__use_cuda:
            cmd += " --use_cuda"
            env_local = {"CUDA_VISIBLE_DEVICES": "0"}
            env_local = {
                "CUDA_VISIBLE_DEVICES": "0",
                "PADDLE_TRAINERS_NUM": "1",
                "PADDLE_TRAINER_ID": "0"
            }
        else:
            env_local = {'CPU_NUM': '1'}
...
...
@@ -427,29 +452,30 @@ class TestDistBase(unittest.TestCase):
            sys.stderr.write("ps1 stderr: %s\n" % fn.read())

        # print log
        if stat0 == 0:
            sys.stderr.write('trainer 0 stdout: %s\n' % pickle.loads(tr0_out))
        with open("/tmp/tr0_err.log", "r") as fn:
            sys.stderr.write('trainer 0 stderr: %s\n' % fn.read())
        if stat1 == 0:
            sys.stderr.write('trainer 1 stdout: %s\n' % pickle.loads(tr1_out))
        with open("/tmp/tr1_err.log", "r") as fn:
            sys.stderr.write('trainer 1 stderr: %s\n' % fn.read())

        return pickle.loads(tr0_out), pickle.loads(tr1_out)

    def _run_cluster_nccl2(self, model, envs, check_error_log):
    def _run_cluster_nccl2(self, model, envs, nccl2_reduce_layer, check_error_log):
        # NOTE: we reuse ps_endpoints as nccl2 worker endpoints
        worker_endpoints = self._ps_endpoints.split(",")
        w0_ep, w1_ep = worker_endpoints
        if nccl2_reduce_layer:
            update_method = "nccl2_reduce_layer"
        else:
            update_method = "nccl2"

        tr_cmd = "%s %s --role trainer --endpoints %s --trainer_id %d --current_endpoint %s --update_method nccl2 --lr %f"
        tr_cmd = "%s %s --role trainer --endpoints %s --trainer_id %d --current_endpoint %s --update_method %s --lr %f"

        tr0_cmd = tr_cmd % \
            (self._python_interp, model, self._ps_endpoints,
             0, w0_ep, self._lr)
             0, w0_ep, update_method, self._lr)
        tr1_cmd = tr_cmd % \
            (self._python_interp, model, self._ps_endpoints,
             1, w1_ep, self._lr)
             1, w1_ep, update_method, self._lr)

        if self._mem_opt:
            tr0_cmd += " --mem_opt"
...
...
@@ -463,8 +489,17 @@ class TestDistBase(unittest.TestCase):
        if self.__use_cuda:
            tr0_cmd += " --use_cuda"
            tr1_cmd += " --use_cuda"
            env0 = {"CUDA_VISIBLE_DEVICES": "0"}
            env1 = {"CUDA_VISIBLE_DEVICES": "1"}
            env0 = {
                "CUDA_VISIBLE_DEVICES": "0",
                # for test nccl2 layer
                "PADDLE_TRAINERS_NUM": "2",
                "PADDLE_TRAINER_ID": "0"
            }
            env1 = {
                "CUDA_VISIBLE_DEVICES": "1",
                "PADDLE_TRAINERS_NUM": "2",
                "PADDLE_TRAINER_ID": "1"
            }
        else:
            env0 = {'CPU_NUM': '1'}
            env1 = {'CPU_NUM': '1'}
...
...
@@ -498,8 +533,6 @@ class TestDistBase(unittest.TestCase):
        # print log
        sys.stderr.write('trainer 0 stderr: %s\n' % tr0_err)
        sys.stderr.write('trainer 1 stderr: %s\n' % tr1_err)
        sys.stderr.write('trainer 0 stdout: %s\n' % tr0_out)
        sys.stderr.write('trainer 1 stdout: %s\n' % tr1_out)

        return pickle.loads(tr0_out), pickle.loads(tr1_out)
...
...
@@ -528,10 +561,14 @@ class TestDistBase(unittest.TestCase):
        local_losses \
            = self._run_local(model_file, required_envs,
                              check_error_log)
                check_error_log)

        if self._nccl2_mode:
            tr0_losses, tr1_losses = self._run_cluster_nccl2(
                model_file, required_envs, check_error_log)
            if self._nccl2_reduce_layer:
                tr0_losses, tr1_losses = self._run_cluster_nccl2(
                    model_file, required_envs, True, check_error_log)
            else:
                tr0_losses, tr1_losses = self._run_cluster_nccl2(
                    model_file, required_envs, False, check_error_log)
        else:
            tr0_losses, tr1_losses = self._run_cluster(
                model_file, required_envs, check_error_log)
...
...
python/paddle/fluid/tests/unittests/test_dist_mnist.py
View file @ dd080b17
#Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
...
...
python/paddle/fluid/tests/unittests/test_fuse_all_reduce_pass.py
0 → 100644
View file @ dd080b17
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from parallel_executor_test_base import TestParallelExecutorBase
import paddle.fluid as fluid
import paddle.fluid.core as core
import numpy as np
import paddle
import paddle.dataset.mnist as mnist
import unittest
import os


def simple_fc_net(use_feed):
    img = fluid.layers.data(name='image', shape=[784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    hidden = img
    for _ in range(4):
        hidden = fluid.layers.fc(
            hidden,
            size=200,
            act='relu',
            bias_attr=fluid.ParamAttr(
                initializer=fluid.initializer.Constant(value=1.0)))
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=prediction, label=label)
    loss = fluid.layers.mean(loss)
    return loss


def fc_with_batchnorm(use_feed):
    img = fluid.layers.data(name='image', shape=[784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    hidden = img
    for _ in range(2):
        hidden = fluid.layers.fc(
            hidden,
            size=200,
            act='relu',
            bias_attr=fluid.ParamAttr(
                initializer=fluid.initializer.Constant(value=1.0)))

        hidden = fluid.layers.batch_norm(input=hidden)

    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=prediction, label=label)
    loss = fluid.layers.mean(loss)
    return loss


class TestMNIST(TestParallelExecutorBase):
    @classmethod
    def setUpClass(cls):
        os.environ['CPU_NUM'] = str(4)

    def _init_data(self, random=True):
        np.random.seed(5)
        if random:
            img = np.random.random(size=[32, 784]).astype(np.float32)
        else:
            img = np.ones(shape=[32, 784], dtype='float32')
        label = np.ones(shape=[32, 1], dtype='int64')
        return img, label

    def _compare_fuse_all_reduce_ops(self, model, use_cuda, random_data=True):
        if use_cuda and not core.is_compiled_with_cuda():
            return
        img, label = self._init_data(random_data)

        def _optimizer(learning_rate=1e-6):
            optimizer = fluid.optimizer.SGD(
                learning_rate=learning_rate,
                regularization=fluid.regularizer.L2Decay(1e-6))
            return optimizer

        not_fuse_op_first_loss, not_fuse_op_last_loss = self.check_network_convergence(
            model,
            feed_dict={"image": img,
                       "label": label},
            use_cuda=use_cuda,
            fuse_all_reduce_ops=False,
            memory_opt=False,
            optimizer=_optimizer)
        fuse_op_first_loss, fuse_op_last_loss = self.check_network_convergence(
            model,
            feed_dict={"image": img,
                       "label": label},
            use_cuda=use_cuda,
            fuse_all_reduce_ops=True,
            memory_opt=False,
            optimizer=_optimizer)

        for loss in zip(not_fuse_op_first_loss, fuse_op_first_loss):
            self.assertAlmostEquals(loss[0], loss[1], delta=1e-6)
        for loss in zip(not_fuse_op_last_loss, fuse_op_last_loss):
            self.assertAlmostEquals(loss[0], loss[1], delta=1e-6)

    def test_simple_fc_with_fuse_op(self):
        self._compare_fuse_all_reduce_ops(simple_fc_net, True)
        self._compare_fuse_all_reduce_ops(simple_fc_net, False)

    def test_batchnorm_fc_with_fuse_op(self):
        self._compare_fuse_all_reduce_ops(fc_with_batchnorm, True)
        self._compare_fuse_all_reduce_ops(fc_with_batchnorm, False)


if __name__ == '__main__':
    unittest.main()
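Outside this test harness, the same fused all-reduce path can presumably be enabled directly on a BuildStrategy before compiling a program for data-parallel execution. A minimal sketch, assuming a loss variable named loss built by one of the networks above and the new fuse_all_reduce_ops flag introduced by this change:

# Sketch only: turn on fused all-reduce when compiling a data-parallel program.
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_all_reduce_ops = True
compiled_prog = fluid.compiler.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(
        loss_name=loss.name, build_strategy=build_strategy)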
python/paddle/fluid/tests/unittests/test_layers.py
View file @ dd080b17
...
...
@@ -174,6 +174,60 @@ class TestLayer(LayerTest):
            self.assertTrue(np.allclose(static_ret[i], static_ret2[i]))
            self.assertTrue(np.allclose(static_ret[i], dy_ret[i]._numpy()))

    def test_elementwise_math(self):
        n = np.ones([3, 3], dtype='float32')
        n2 = np.ones([3, 3], dtype='float32') * 1.1
        n3 = np.ones([3, 3], dtype='float32') * 2
        n4 = np.ones([3, 3], dtype='float32') * 3
        n5 = np.ones([3, 3], dtype='float32') * 4
        n6 = np.ones([3, 3], dtype='float32') * 5

        with self.static_graph():
            t = layers.data(name='t', shape=[3, 3], dtype='float32')
            t2 = layers.data(name='t2', shape=[3, 3], dtype='float32')
            t3 = layers.data(name='t3', shape=[3, 3], dtype='float32')
            t4 = layers.data(name='t4', shape=[3, 3], dtype='float32')
            t5 = layers.data(name='t5', shape=[3, 3], dtype='float32')
            t6 = layers.data(name='t6', shape=[3, 3], dtype='float32')

            ret = layers.elementwise_add(t, t2)
            ret = layers.elementwise_pow(ret, t3)
            ret = layers.elementwise_div(ret, t4)
            ret = layers.elementwise_sub(ret, t5)
            ret = layers.elementwise_mul(ret, t6)

            static_ret = self.get_static_graph_result(
                feed={'t': n,
                      't2': n2,
                      't3': n3,
                      't4': n4,
                      't5': n5,
                      't6': n6},
                fetch_list=[ret])[0]

        with self.dynamic_graph():
            ret = layers.elementwise_add(n, n2)
            ret = layers.elementwise_pow(ret, n3)
            ret = layers.elementwise_div(ret, n4)
            ret = layers.elementwise_sub(ret, n5)
            dy_ret = layers.elementwise_mul(ret, n6)

        self.assertTrue(
            np.allclose(static_ret, dy_ret._numpy()),
            '%s vs %s' % (static_ret, dy_ret._numpy()))

    def test_elementwise_minmax(self):
        n = np.ones([3, 3], dtype='float32')
        n2 = np.ones([3, 3], dtype='float32') * 2

        with self.dynamic_graph():
            min_ret = layers.elementwise_min(n, n2)
            max_ret = layers.elementwise_max(n, n2)

        self.assertTrue(np.allclose(n, min_ret._numpy()))
        self.assertTrue(np.allclose(n2, max_ret._numpy()))


class TestBook(unittest.TestCase):
    def test_fit_a_line(self):
...
...
python/paddle/fluid/tests/unittests/test_slice_op.py
View file @ dd080b17
...
...
@@ -87,5 +87,31 @@ class TestFP16(TestSliceOp):
            place, ['Input'], 'Out', max_relative_error=0.006)


@unittest.skipIf(not core.is_compiled_with_cuda(),
                 "core is not compiled with CUDA")
class TestFP16_2(TestSliceOp):
    def config(self):
        self.dtype = "float16"
        self.input = np.random.random([3, 4, 5]).astype(self.dtype)
        self.starts = [0]
        self.ends = [1]
        self.axes = [1]
        self.out = self.input[:, 0:1, :]

    def test_check_output(self):
        place = core.CUDAPlace(0)
        if core.is_float16_supported(place):
            self.check_output_with_place(place, atol=1e-5)

    def test_check_grad_normal(self):
        place = core.CUDAPlace(0)
        if core.is_float16_supported(place):
            self.check_grad_with_place(
                place, ['Input'],
                'Out',
                max_relative_error=0.006,
                numeric_grad_delta=0.5)


if __name__ == '__main__':
    unittest.main()
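The expected output configured in TestFP16_2 (starts=[0], ends=[1], axes=[1]) is just a basic NumPy slice along axis 1; a small standalone check of that correspondence, separate from the test above:

import numpy as np

x = np.random.random([3, 4, 5]).astype("float16")
# slice indices [0, 1) along axis 1 is exactly x[:, 0:1, :]
out = x[:, 0:1, :]
print(out.shape)  # (3, 1, 5)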
python/paddle/reader/__init__.py
View file @ dd080b17
...
...
@@ -38,9 +38,8 @@ items. It can be any function with no parameter that creates a iterable
Element produced from the iterable should be a **single** entry of data,
**not** a mini batch. That entry of data could be a single item, or a tuple of
items.

Item should be of `supported type <http://www.paddlepaddle.org/doc/ui/data_provider
/pydataprovider2.html?highlight=dense_vector#input-types>`_ (e.g., numpy 1d
array of float32, int, list of int)
Item should be of supported type (e.g., numpy array or list/tuple of float
or int).

An example implementation for single item data reader creator:
...
...
@@ -62,8 +61,6 @@ An example implementation for multiple item data reader creator:
            yield numpy.random.uniform(-1, 1, size=width*height), label
        return reader

TODO(yuyang18): Should we add whole design doc here?
"""
import paddle.reader.decorator
...
...
python/paddle/reader/creator.py
View file @ dd080b17
...
...
@@ -44,8 +44,11 @@ def text_file(path):
    Creates a data reader that outputs text line by line from given text file.
    Trailing new line ('\\n') of each line will be removed.

    :path: path of the text file.
    :returns: data reader of text file

    Args:
        path (str): path of the text file.

    Returns:
        callable: data reader of text file.
    """

    def reader():
...
...
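A small usage sketch of the text_file reader documented above; the file path below is a placeholder:

import paddle.reader.creator as creator

reader = creator.text_file("/tmp/example.txt")  # placeholder path
for line in reader():
    print(line)  # one line per iteration, trailing newline removed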
@@ -59,10 +62,15 @@ def text_file(path):
def recordio(paths, buf_size=100):
    """
    Creates a data reader from given RecordIO file paths separated by ",",
    glob pattern is supported.

    :path: path of recordio files, can be a string or a string list.
    :returns: data reader of recordio files.

    Creates a data reader from given RecordIO file paths separated
    by ",", glob pattern is supported.

    Args:
        paths (str|list(str)): path of recordio files.
        buf_size (int): prefetched buffer size.

    Returns:
        callable: data reader of recordio files.
    """
    import recordio as rec
...
...
python/paddle/reader/decorator.py
View file @ dd080b17
...
...
@@ -242,20 +242,18 @@ class XmapEndSignal():
def xmap_readers(mapper, reader, process_num, buffer_size, order=False):
    """
    Use multiprocess to map samples from reader by a mapper defined by user.
    And this function contains a buffered decorator.

    :param mapper: a function to map sample.
    :type mapper: callable
    :param reader: the data reader to read from
    :type reader: callable
    :param process_num: process number to handle original sample
    :type process_num: int
    :param buffer_size: max buffer size
    :type buffer_size: int
    :param order: keep the order of reader
    :type order: bool
    :return: the decarated reader
    :rtype: callable

    Use multi-threads to map samples from reader by a mapper defined by user.

    Args:
        mapper (callable): a function to map the data from reader.
        reader (callable): a data reader which yields the data.
        process_num (int): thread number to handle original sample.
        buffer_size (int): size of the queue to read data in.
        order (bool): whether to keep the data order from original reader.
            Default False.

    Returns:
        callable: a decorated reader with data mapping.
    """
    end = XmapEndSignal()
...
...
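A small usage sketch of xmap_readers as documented above; the mapper and source reader here are made up for illustration:

import paddle.reader as reader_module

def square(x):
    return x * x

def raw_reader():
    for i in range(8):
        yield i

# Map items from raw_reader with 2 worker threads, keeping the original order.
mapped = reader_module.xmap_readers(
    square, raw_reader, process_num=2, buffer_size=4, order=True)
for item in mapped():
    print(item)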
@@ -477,7 +475,7 @@ class PipeReader:
"""
:param cut_lines: cut buffer to lines
:type cut_lines: bool
:param line_break: line break of the file, like
\n
or
\r
:param line_break: line break of the file, like
'
\\\\
n' or '
\\\\
r'
:type line_break: string
:return: one line or a buffer of bytes
...
...