Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
BaiXuePrincess
Paddle
提交
d231e550
P
Paddle
项目概览
BaiXuePrincess
/
Paddle
与 Fork 源项目一致
Fork自
PaddlePaddle / Paddle
通知
1
Star
1
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
0
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
Paddle
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
0
Issue
0
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
提交
d231e550
编写于
11月 12, 2018
作者:
S
sneaxiy
浏览文件
操作
浏览文件
下载
差异文件
merge develop
test=develop
上级
cf8d2e67
080740b3
变更
171
展开全部
显示空白变更内容
内联
并排
Showing
171 changed file
with
3318 addition
and
912 deletion
+3318
-912
paddle/fluid/API.spec
paddle/fluid/API.spec
+5
-2
paddle/fluid/framework/data_device_transform.cc
paddle/fluid/framework/data_device_transform.cc
+2
-2
paddle/fluid/framework/data_device_transform_test.cu
paddle/fluid/framework/data_device_transform_test.cu
+3
-3
paddle/fluid/framework/details/broadcast_op_handle.cc
paddle/fluid/framework/details/broadcast_op_handle.cc
+1
-1
paddle/fluid/framework/details/modify_op_lock_and_record_event_pass.cc
...framework/details/modify_op_lock_and_record_event_pass.cc
+2
-2
paddle/fluid/framework/details/multi_devices_graph_pass.cc
paddle/fluid/framework/details/multi_devices_graph_pass.cc
+6
-6
paddle/fluid/framework/details/reference_count_pass.cc
paddle/fluid/framework/details/reference_count_pass.cc
+2
-2
paddle/fluid/framework/details/scale_loss_grad_op_handle.cc
paddle/fluid/framework/details/scale_loss_grad_op_handle.cc
+1
-1
paddle/fluid/framework/details/sequential_execution_pass.cc
paddle/fluid/framework/details/sequential_execution_pass.cc
+2
-2
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc
...le/fluid/framework/details/threaded_ssa_graph_executor.cc
+4
-4
paddle/fluid/framework/executor.cc
paddle/fluid/framework/executor.cc
+11
-11
paddle/fluid/framework/feed_fetch_method.cc
paddle/fluid/framework/feed_fetch_method.cc
+3
-3
paddle/fluid/framework/ir/attention_lstm_fuse_pass.cc
paddle/fluid/framework/ir/attention_lstm_fuse_pass.cc
+14
-14
paddle/fluid/framework/ir/conv_bias_mkldnn_fuse_pass.cc
paddle/fluid/framework/ir/conv_bias_mkldnn_fuse_pass.cc
+2
-2
paddle/fluid/framework/ir/conv_bn_fuse_pass.cc
paddle/fluid/framework/ir/conv_bn_fuse_pass.cc
+3
-3
paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.cc
paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.cc
+2
-2
paddle/fluid/framework/ir/depthwise_conv_mkldnn_pass.cc
paddle/fluid/framework/ir/depthwise_conv_mkldnn_pass.cc
+1
-1
paddle/fluid/framework/ir/fc_fuse_pass.cc
paddle/fluid/framework/ir/fc_fuse_pass.cc
+1
-1
paddle/fluid/framework/ir/fuse_elewise_add_act_pass.cc
paddle/fluid/framework/ir/fuse_elewise_add_act_pass.cc
+14
-14
paddle/fluid/framework/ir/graph.cc
paddle/fluid/framework/ir/graph.cc
+2
-2
paddle/fluid/framework/ir/graph.h
paddle/fluid/framework/ir/graph.h
+1
-1
paddle/fluid/framework/ir/graph_helper.cc
paddle/fluid/framework/ir/graph_helper.cc
+10
-9
paddle/fluid/framework/ir/graph_pattern_detector.cc
paddle/fluid/framework/ir/graph_pattern_detector.cc
+12
-10
paddle/fluid/framework/ir/graph_viz_pass.cc
paddle/fluid/framework/ir/graph_viz_pass.cc
+1
-1
paddle/fluid/framework/ir/mkldnn_placement_pass.cc
paddle/fluid/framework/ir/mkldnn_placement_pass.cc
+1
-1
paddle/fluid/framework/ir/multi_batch_merge_pass.cc
paddle/fluid/framework/ir/multi_batch_merge_pass.cc
+4
-4
paddle/fluid/framework/ir/pass.h
paddle/fluid/framework/ir/pass.h
+1
-1
paddle/fluid/framework/ir/seq_concat_fc_fuse_pass.cc
paddle/fluid/framework/ir/seq_concat_fc_fuse_pass.cc
+6
-6
paddle/fluid/framework/ir/seqconv_eltadd_relu_fuse_pass.cc
paddle/fluid/framework/ir/seqconv_eltadd_relu_fuse_pass.cc
+1
-1
paddle/fluid/framework/lod_rank_table.cc
paddle/fluid/framework/lod_rank_table.cc
+1
-1
paddle/fluid/framework/mixed_vector_test.cc
paddle/fluid/framework/mixed_vector_test.cc
+1
-1
paddle/fluid/framework/naive_executor.cc
paddle/fluid/framework/naive_executor.cc
+7
-7
paddle/fluid/framework/op_desc.cc
paddle/fluid/framework/op_desc.cc
+16
-16
paddle/fluid/framework/op_registry.cc
paddle/fluid/framework/op_registry.cc
+3
-3
paddle/fluid/framework/operator.cc
paddle/fluid/framework/operator.cc
+10
-7
paddle/fluid/framework/parallel_executor.cc
paddle/fluid/framework/parallel_executor.cc
+1
-1
paddle/fluid/framework/scope.cc
paddle/fluid/framework/scope.cc
+1
-1
paddle/fluid/framework/selected_rows.cc
paddle/fluid/framework/selected_rows.cc
+1
-1
paddle/fluid/framework/tensor_util.cc
paddle/fluid/framework/tensor_util.cc
+12
-12
paddle/fluid/framework/threadpool.cc
paddle/fluid/framework/threadpool.cc
+1
-1
paddle/fluid/framework/var_desc.cc
paddle/fluid/framework/var_desc.cc
+14
-14
paddle/fluid/framework/var_type_inference.h
paddle/fluid/framework/var_type_inference.h
+25
-0
paddle/fluid/inference/analysis/analyzer.cc
paddle/fluid/inference/analysis/analyzer.cc
+5
-3
paddle/fluid/inference/analysis/argument.h
paddle/fluid/inference/analysis/argument.h
+2
-2
paddle/fluid/inference/analysis/data_flow_graph.cc
paddle/fluid/inference/analysis/data_flow_graph.cc
+5
-5
paddle/fluid/inference/analysis/data_flow_graph_to_fluid_pass.cc
...fluid/inference/analysis/data_flow_graph_to_fluid_pass.cc
+4
-3
paddle/fluid/inference/analysis/dfg_graphviz_draw_pass.cc
paddle/fluid/inference/analysis/dfg_graphviz_draw_pass.cc
+1
-1
paddle/fluid/inference/analysis/fluid_to_ir_pass.cc
paddle/fluid/inference/analysis/fluid_to_ir_pass.cc
+1
-1
paddle/fluid/inference/analysis/model_store_pass.cc
paddle/fluid/inference/analysis/model_store_pass.cc
+4
-4
paddle/fluid/inference/analysis/pass_manager.cc
paddle/fluid/inference/analysis/pass_manager.cc
+2
-2
paddle/fluid/inference/analysis/subgraph_splitter.cc
paddle/fluid/inference/analysis/subgraph_splitter.cc
+1
-1
paddle/fluid/inference/analysis/tensorrt_subgraph_pass.cc
paddle/fluid/inference/analysis/tensorrt_subgraph_pass.cc
+3
-3
paddle/fluid/inference/api/analysis_predictor.cc
paddle/fluid/inference/api/analysis_predictor.cc
+8
-8
paddle/fluid/inference/api/analysis_predictor.h
paddle/fluid/inference/api/analysis_predictor.h
+2
-0
paddle/fluid/inference/api/api.cc
paddle/fluid/inference/api/api.cc
+0
-1
paddle/fluid/inference/api/api_impl.cc
paddle/fluid/inference/api/api_impl.cc
+1
-1
paddle/fluid/inference/api/api_tensorrt_subgraph_engine.cc
paddle/fluid/inference/api/api_tensorrt_subgraph_engine.cc
+7
-7
paddle/fluid/inference/api/demo_ci/trt_mobilenet_demo.cc
paddle/fluid/inference/api/demo_ci/trt_mobilenet_demo.cc
+4
-4
paddle/fluid/inference/api/demo_ci/utils.h
paddle/fluid/inference/api/demo_ci/utils.h
+5
-5
paddle/fluid/inference/api/demo_ci/vis_demo.cc
paddle/fluid/inference/api/demo_ci/vis_demo.cc
+5
-5
paddle/fluid/inference/api/details/reset_tensor_array.cc
paddle/fluid/inference/api/details/reset_tensor_array.cc
+2
-2
paddle/fluid/inference/api/helper.h
paddle/fluid/inference/api/helper.h
+2
-1
paddle/fluid/inference/io.cc
paddle/fluid/inference/io.cc
+4
-3
paddle/fluid/inference/tensorrt/convert/concat_op.cc
paddle/fluid/inference/tensorrt/convert/concat_op.cc
+1
-1
paddle/fluid/inference/tensorrt/convert/dropout_op.cc
paddle/fluid/inference/tensorrt/convert/dropout_op.cc
+1
-1
paddle/fluid/inference/tensorrt/convert/fc_op.cc
paddle/fluid/inference/tensorrt/convert/fc_op.cc
+1
-1
paddle/fluid/inference/tensorrt/convert/mul_op.cc
paddle/fluid/inference/tensorrt/convert/mul_op.cc
+1
-1
paddle/fluid/inference/tensorrt/convert/pad_op.cc
paddle/fluid/inference/tensorrt/convert/pad_op.cc
+1
-1
paddle/fluid/inference/tensorrt/convert/pool2d_op.cc
paddle/fluid/inference/tensorrt/convert/pool2d_op.cc
+1
-1
paddle/fluid/inference/tensorrt/convert/softmax_op.cc
paddle/fluid/inference/tensorrt/convert/softmax_op.cc
+1
-1
paddle/fluid/inference/tensorrt/engine.h
paddle/fluid/inference/tensorrt/engine.h
+1
-1
paddle/fluid/inference/tests/api/analyzer_vis_tester.cc
paddle/fluid/inference/tests/api/analyzer_vis_tester.cc
+3
-3
paddle/fluid/memory/detail/buddy_allocator.cc
paddle/fluid/memory/detail/buddy_allocator.cc
+28
-28
paddle/fluid/memory/detail/meta_cache.cc
paddle/fluid/memory/detail/meta_cache.cc
+1
-1
paddle/fluid/memory/malloc.cc
paddle/fluid/memory/malloc.cc
+7
-7
paddle/fluid/operators/CMakeLists.txt
paddle/fluid/operators/CMakeLists.txt
+1
-0
paddle/fluid/operators/activation_op.cc
paddle/fluid/operators/activation_op.cc
+6
-10
paddle/fluid/operators/activation_op.h
paddle/fluid/operators/activation_op.h
+1
-1
paddle/fluid/operators/adam_op.h
paddle/fluid/operators/adam_op.h
+1
-1
paddle/fluid/operators/add_position_encoding_op.h
paddle/fluid/operators/add_position_encoding_op.h
+4
-3
paddle/fluid/operators/array_operator.h
paddle/fluid/operators/array_operator.h
+1
-1
paddle/fluid/operators/array_to_lod_tensor_op.cc
paddle/fluid/operators/array_to_lod_tensor_op.cc
+2
-2
paddle/fluid/operators/batch_norm_op.cc
paddle/fluid/operators/batch_norm_op.cc
+10
-1
paddle/fluid/operators/batch_norm_op.cu.cc
paddle/fluid/operators/batch_norm_op.cu.cc
+1
-1
paddle/fluid/operators/beam_search_op.cc
paddle/fluid/operators/beam_search_op.cc
+6
-6
paddle/fluid/operators/bilinear_interp_op.h
paddle/fluid/operators/bilinear_interp_op.h
+0
-163
paddle/fluid/operators/checkpoint_notify_op.cc
paddle/fluid/operators/checkpoint_notify_op.cc
+2
-2
paddle/fluid/operators/concat_op.cc
paddle/fluid/operators/concat_op.cc
+1
-1
paddle/fluid/operators/conv_cudnn_op.cu.cc
paddle/fluid/operators/conv_cudnn_op.cu.cc
+188
-23
paddle/fluid/operators/conv_cudnn_op_cache.h
paddle/fluid/operators/conv_cudnn_op_cache.h
+90
-0
paddle/fluid/operators/conv_op.cc
paddle/fluid/operators/conv_op.cc
+22
-1
paddle/fluid/operators/cross_entropy_op.cc
paddle/fluid/operators/cross_entropy_op.cc
+11
-0
paddle/fluid/operators/distributed/brpc_server.cc
paddle/fluid/operators/distributed/brpc_server.cc
+2
-2
paddle/fluid/operators/distributed/grpc_client.cc
paddle/fluid/operators/distributed/grpc_client.cc
+7
-7
paddle/fluid/operators/distributed/grpc_server.cc
paddle/fluid/operators/distributed/grpc_server.cc
+23
-22
paddle/fluid/operators/distributed/request_handler.h
paddle/fluid/operators/distributed/request_handler.h
+2
-2
paddle/fluid/operators/distributed/request_handler_impl.cc
paddle/fluid/operators/distributed/request_handler_impl.cc
+13
-12
paddle/fluid/operators/distributed/rpc_server.cc
paddle/fluid/operators/distributed/rpc_server.cc
+10
-10
paddle/fluid/operators/distributed/variable_response.cc
paddle/fluid/operators/distributed/variable_response.cc
+4
-4
paddle/fluid/operators/elementwise_op.h
paddle/fluid/operators/elementwise_op.h
+6
-10
paddle/fluid/operators/feed_op.cc
paddle/fluid/operators/feed_op.cc
+2
-2
paddle/fluid/operators/fetch_barrier_op.cc
paddle/fluid/operators/fetch_barrier_op.cc
+1
-1
paddle/fluid/operators/fetch_op.cc
paddle/fluid/operators/fetch_op.cc
+1
-1
paddle/fluid/operators/gen_nccl_id_op.cc
paddle/fluid/operators/gen_nccl_id_op.cc
+5
-5
paddle/fluid/operators/grid_sampler_op.h
paddle/fluid/operators/grid_sampler_op.h
+9
-2
paddle/fluid/operators/interpolate_op.cc
paddle/fluid/operators/interpolate_op.cc
+53
-27
paddle/fluid/operators/interpolate_op.cu
paddle/fluid/operators/interpolate_op.cu
+292
-0
paddle/fluid/operators/interpolate_op.h
paddle/fluid/operators/interpolate_op.h
+236
-0
paddle/fluid/operators/listen_and_serv_op.cc
paddle/fluid/operators/listen_and_serv_op.cc
+17
-17
paddle/fluid/operators/lod_rank_table_op.cc
paddle/fluid/operators/lod_rank_table_op.cc
+2
-2
paddle/fluid/operators/lookup_table_op.cc
paddle/fluid/operators/lookup_table_op.cc
+4
-4
paddle/fluid/operators/math/cpu_vec_test.cc
paddle/fluid/operators/math/cpu_vec_test.cc
+2
-2
paddle/fluid/operators/math/jit_kernel_test.cc
paddle/fluid/operators/math/jit_kernel_test.cc
+50
-39
paddle/fluid/operators/math/selected_rows_functor.cc
paddle/fluid/operators/math/selected_rows_functor.cc
+2
-2
paddle/fluid/operators/math/selected_rows_functor.cu
paddle/fluid/operators/math/selected_rows_functor.cu
+2
-2
paddle/fluid/operators/mean_op.cc
paddle/fluid/operators/mean_op.cc
+19
-2
paddle/fluid/operators/momentum_op.h
paddle/fluid/operators/momentum_op.h
+1
-1
paddle/fluid/operators/mul_op.cc
paddle/fluid/operators/mul_op.cc
+13
-4
paddle/fluid/operators/nccl_op.cu.cc
paddle/fluid/operators/nccl_op.cu.cc
+16
-15
paddle/fluid/operators/nccl_op_test.cu.cc
paddle/fluid/operators/nccl_op_test.cu.cc
+7
-7
paddle/fluid/operators/parallel_do_op.cc
paddle/fluid/operators/parallel_do_op.cc
+5
-5
paddle/fluid/operators/pool_op.cc
paddle/fluid/operators/pool_op.cc
+14
-4
paddle/fluid/operators/prefetch_op.cc
paddle/fluid/operators/prefetch_op.cc
+3
-3
paddle/fluid/operators/random_crop_op.h
paddle/fluid/operators/random_crop_op.h
+2
-2
paddle/fluid/operators/reader/blocking_queue.h
paddle/fluid/operators/reader/blocking_queue.h
+2
-2
paddle/fluid/operators/reader/create_shuffle_reader_op.cc
paddle/fluid/operators/reader/create_shuffle_reader_op.cc
+3
-3
paddle/fluid/operators/recurrent_op.cc
paddle/fluid/operators/recurrent_op.cc
+13
-13
paddle/fluid/operators/recv_op.cc
paddle/fluid/operators/recv_op.cc
+1
-1
paddle/fluid/operators/rnn_memory_helper_op.cc
paddle/fluid/operators/rnn_memory_helper_op.cc
+1
-1
paddle/fluid/operators/save_op.cc
paddle/fluid/operators/save_op.cc
+1
-1
paddle/fluid/operators/send_barrier_op.cc
paddle/fluid/operators/send_barrier_op.cc
+2
-2
paddle/fluid/operators/send_op.cc
paddle/fluid/operators/send_op.cc
+2
-2
paddle/fluid/operators/send_recv_op_test.cc
paddle/fluid/operators/send_recv_op_test.cc
+2
-2
paddle/fluid/operators/sequence_mask_op.h
paddle/fluid/operators/sequence_mask_op.h
+1
-1
paddle/fluid/operators/sgd_op.h
paddle/fluid/operators/sgd_op.h
+4
-4
paddle/fluid/operators/similarity_focus_op.cc
paddle/fluid/operators/similarity_focus_op.cc
+87
-0
paddle/fluid/operators/similarity_focus_op.h
paddle/fluid/operators/similarity_focus_op.h
+168
-0
paddle/fluid/operators/softmax_op.cc
paddle/fluid/operators/softmax_op.cc
+9
-1
paddle/fluid/operators/split_byref_op.h
paddle/fluid/operators/split_byref_op.h
+1
-1
paddle/fluid/operators/split_ids_op.h
paddle/fluid/operators/split_ids_op.h
+1
-1
paddle/fluid/operators/sum_mkldnn_op.cc
paddle/fluid/operators/sum_mkldnn_op.cc
+1
-1
paddle/fluid/operators/sum_op.cc
paddle/fluid/operators/sum_op.cc
+3
-3
paddle/fluid/operators/tensor_array_read_write_op.cc
paddle/fluid/operators/tensor_array_read_write_op.cc
+7
-7
paddle/fluid/operators/tensor_array_to_tensor_op.cc
paddle/fluid/operators/tensor_array_to_tensor_op.cc
+246
-0
paddle/fluid/operators/tensorrt_engine_op.h
paddle/fluid/operators/tensorrt_engine_op.h
+8
-8
paddle/fluid/operators/while_op.cc
paddle/fluid/operators/while_op.cc
+9
-9
paddle/fluid/platform/device_context.cc
paddle/fluid/platform/device_context.cc
+4
-1
paddle/fluid/platform/device_tracer.cc
paddle/fluid/platform/device_tracer.cc
+4
-4
paddle/fluid/platform/dynload/cudnn.h
paddle/fluid/platform/dynload/cudnn.h
+48
-45
paddle/fluid/platform/dynload/dynamic_loader.cc
paddle/fluid/platform/dynload/dynamic_loader.cc
+2
-2
paddle/fluid/platform/gpu_info.cc
paddle/fluid/platform/gpu_info.cc
+2
-2
paddle/fluid/platform/init.cc
paddle/fluid/platform/init.cc
+1
-1
paddle/fluid/platform/nccl_helper.h
paddle/fluid/platform/nccl_helper.h
+1
-1
paddle/fluid/pybind/protobuf.cc
paddle/fluid/pybind/protobuf.cc
+3
-3
paddle/fluid/train/demo/demo_trainer.cc
paddle/fluid/train/demo/demo_trainer.cc
+1
-1
paddle/scripts/paddle_build.sh
paddle/scripts/paddle_build.sh
+22
-3
paddle/testing/TestUtil.cpp
paddle/testing/TestUtil.cpp
+1
-1
python/paddle/fluid/__init__.py
python/paddle/fluid/__init__.py
+3
-1
python/paddle/fluid/distribute_lookup_table.py
python/paddle/fluid/distribute_lookup_table.py
+39
-0
python/paddle/fluid/layers/nn.py
python/paddle/fluid/layers/nn.py
+265
-22
python/paddle/fluid/layers/tensor.py
python/paddle/fluid/layers/tensor.py
+58
-4
python/paddle/fluid/optimizer.py
python/paddle/fluid/optimizer.py
+59
-7
python/paddle/fluid/tests/book/test_label_semantic_roles.py
python/paddle/fluid/tests/book/test_label_semantic_roles.py
+1
-1
python/paddle/fluid/tests/unittests/test_conv2d_op.py
python/paddle/fluid/tests/unittests/test_conv2d_op.py
+9
-1
python/paddle/fluid/tests/unittests/test_conv3d_op.py
python/paddle/fluid/tests/unittests/test_conv3d_op.py
+6
-0
python/paddle/fluid/tests/unittests/test_dist_transpiler.py
python/paddle/fluid/tests/unittests/test_dist_transpiler.py
+11
-2
python/paddle/fluid/tests/unittests/test_interpolate_op.py
python/paddle/fluid/tests/unittests/test_interpolate_op.py
+335
-0
python/paddle/fluid/tests/unittests/test_layers.py
python/paddle/fluid/tests/unittests/test_layers.py
+10
-0
python/paddle/fluid/tests/unittests/test_similarity_focus_op.py
.../paddle/fluid/tests/unittests/test_similarity_focus_op.py
+217
-0
python/paddle/fluid/tests/unittests/test_tensor_array_to_tensor.py
...ddle/fluid/tests/unittests/test_tensor_array_to_tensor.py
+142
-0
python/paddle/fluid/transpiler/distribute_transpiler.py
python/paddle/fluid/transpiler/distribute_transpiler.py
+4
-27
未找到文件。
paddle/fluid/API.spec
浏览文件 @
d231e550
...
@@ -118,9 +118,10 @@ paddle.fluid.layers.label_smooth ArgSpec(args=['label', 'prior_dist', 'epsilon',
...
@@ -118,9 +118,10 @@ paddle.fluid.layers.label_smooth ArgSpec(args=['label', 'prior_dist', 'epsilon',
paddle.fluid.layers.roi_pool ArgSpec(args=['input', 'rois', 'pooled_height', 'pooled_width', 'spatial_scale'], varargs=None, keywords=None, defaults=(1, 1, 1.0))
paddle.fluid.layers.roi_pool ArgSpec(args=['input', 'rois', 'pooled_height', 'pooled_width', 'spatial_scale'], varargs=None, keywords=None, defaults=(1, 1, 1.0))
paddle.fluid.layers.roi_align ArgSpec(args=['input', 'rois', 'pooled_height', 'pooled_width', 'spatial_scale', 'sampling_ratio', 'name'], varargs=None, keywords=None, defaults=(1, 1, 1.0, -1, None))
paddle.fluid.layers.roi_align ArgSpec(args=['input', 'rois', 'pooled_height', 'pooled_width', 'spatial_scale', 'sampling_ratio', 'name'], varargs=None, keywords=None, defaults=(1, 1, 1.0, -1, None))
paddle.fluid.layers.dice_loss ArgSpec(args=['input', 'label', 'epsilon'], varargs=None, keywords=None, defaults=(1e-05,))
paddle.fluid.layers.dice_loss ArgSpec(args=['input', 'label', 'epsilon'], varargs=None, keywords=None, defaults=(1e-05,))
paddle.fluid.layers.image_resize ArgSpec(args=['input', 'out_shape', 'scale', 'name', 'resample'
], varargs=None, keywords=None, defaults=(None, None, None, 'BILINEAR'
))
paddle.fluid.layers.image_resize ArgSpec(args=['input', 'out_shape', 'scale', 'name', 'resample'
, 'actual_shape'], varargs=None, keywords=None, defaults=(None, None, None, 'BILINEAR', None
))
paddle.fluid.layers.image_resize_short ArgSpec(args=['input', 'out_short_len', 'resample'], varargs=None, keywords=None, defaults=('BILINEAR',))
paddle.fluid.layers.image_resize_short ArgSpec(args=['input', 'out_short_len', 'resample'], varargs=None, keywords=None, defaults=('BILINEAR',))
paddle.fluid.layers.resize_bilinear ArgSpec(args=['input', 'out_shape', 'scale', 'name'], varargs=None, keywords=None, defaults=(None, None, None))
paddle.fluid.layers.resize_bilinear ArgSpec(args=['input', 'out_shape', 'scale', 'name', 'actual_shape'], varargs=None, keywords=None, defaults=(None, None, None, None))
paddle.fluid.layers.resize_nearest ArgSpec(args=['input', 'out_shape', 'scale', 'name', 'actual_shape'], varargs=None, keywords=None, defaults=(None, None, None, None))
paddle.fluid.layers.gather ArgSpec(args=['input', 'index'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.gather ArgSpec(args=['input', 'index'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.scatter ArgSpec(args=['input', 'index', 'updates', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.scatter ArgSpec(args=['input', 'index', 'updates', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.sequence_scatter ArgSpec(args=['input', 'index', 'updates', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.sequence_scatter ArgSpec(args=['input', 'index', 'updates', 'name'], varargs=None, keywords=None, defaults=(None,))
...
@@ -178,6 +179,7 @@ paddle.fluid.layers.space_to_depth ArgSpec(args=['x', 'blocksize', 'name'], vara
...
@@ -178,6 +179,7 @@ paddle.fluid.layers.space_to_depth ArgSpec(args=['x', 'blocksize', 'name'], vara
paddle.fluid.layers.affine_grid ArgSpec(args=['theta', 'out_shape', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.affine_grid ArgSpec(args=['theta', 'out_shape', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.sequence_reverse ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.sequence_reverse ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.affine_channel ArgSpec(args=['x', 'scale', 'bias', 'data_layout', 'name'], varargs=None, keywords=None, defaults=(None, None, 'NCHW', None))
paddle.fluid.layers.affine_channel ArgSpec(args=['x', 'scale', 'bias', 'data_layout', 'name'], varargs=None, keywords=None, defaults=(None, None, 'NCHW', None))
paddle.fluid.layers.similarity_focus ArgSpec(args=['input', 'axis', 'indexes', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.hash ArgSpec(args=['input', 'hash_size', 'num_hash', 'name'], varargs=None, keywords=None, defaults=(1, None))
paddle.fluid.layers.hash ArgSpec(args=['input', 'hash_size', 'num_hash', 'name'], varargs=None, keywords=None, defaults=(1, None))
paddle.fluid.layers.grid_sampler ArgSpec(args=['x', 'grid', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.grid_sampler ArgSpec(args=['x', 'grid', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.log_loss ArgSpec(args=['input', 'label', 'epsilon', 'name'], varargs=None, keywords=None, defaults=(0.0001, None))
paddle.fluid.layers.log_loss ArgSpec(args=['input', 'label', 'epsilon', 'name'], varargs=None, keywords=None, defaults=(0.0001, None))
...
@@ -200,6 +202,7 @@ paddle.fluid.layers.create_tensor ArgSpec(args=['dtype', 'name', 'persistable'],
...
@@ -200,6 +202,7 @@ paddle.fluid.layers.create_tensor ArgSpec(args=['dtype', 'name', 'persistable'],
paddle.fluid.layers.create_parameter ArgSpec(args=['shape', 'dtype', 'name', 'attr', 'is_bias', 'default_initializer'], varargs=None, keywords=None, defaults=(None, None, False, None))
paddle.fluid.layers.create_parameter ArgSpec(args=['shape', 'dtype', 'name', 'attr', 'is_bias', 'default_initializer'], varargs=None, keywords=None, defaults=(None, None, False, None))
paddle.fluid.layers.create_global_var ArgSpec(args=['shape', 'value', 'dtype', 'persistable', 'force_cpu', 'name'], varargs=None, keywords=None, defaults=(False, False, None))
paddle.fluid.layers.create_global_var ArgSpec(args=['shape', 'value', 'dtype', 'persistable', 'force_cpu', 'name'], varargs=None, keywords=None, defaults=(False, False, None))
paddle.fluid.layers.cast ArgSpec(args=['x', 'dtype'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.cast ArgSpec(args=['x', 'dtype'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.tensor_array_to_tensor ArgSpec(args=['input', 'axis', 'name'], varargs=None, keywords=None, defaults=(1, None))
paddle.fluid.layers.concat ArgSpec(args=['input', 'axis', 'name'], varargs=None, keywords=None, defaults=(0, None))
paddle.fluid.layers.concat ArgSpec(args=['input', 'axis', 'name'], varargs=None, keywords=None, defaults=(0, None))
paddle.fluid.layers.sums ArgSpec(args=['input', 'out'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.sums ArgSpec(args=['input', 'out'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.assign ArgSpec(args=['input', 'output'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.assign ArgSpec(args=['input', 'output'], varargs=None, keywords=None, defaults=(None,))
...
...
paddle/fluid/framework/data_device_transform.cc
浏览文件 @
d231e550
...
@@ -18,7 +18,7 @@ namespace framework {
...
@@ -18,7 +18,7 @@ namespace framework {
void
TransDataDevice
(
const
Tensor
&
in
,
const
platform
::
Place
&
dst_place
,
void
TransDataDevice
(
const
Tensor
&
in
,
const
platform
::
Place
&
dst_place
,
Tensor
*
out
)
{
Tensor
*
out
)
{
VLOG
(
3
)
<<
"DeviceTransform in, src_place "
<<
in
.
place
()
VLOG
(
3
0
)
<<
"DeviceTransform in, src_place "
<<
in
.
place
()
<<
" dst_place: "
<<
dst_place
;
<<
" dst_place: "
<<
dst_place
;
PADDLE_ENFORCE_NE
(
PADDLE_ENFORCE_NE
(
...
...
paddle/fluid/framework/data_device_transform_test.cu
浏览文件 @
d231e550
...
@@ -49,10 +49,10 @@ class TestOpWithKernel : public OperatorWithKernel {
...
@@ -49,10 +49,10 @@ class TestOpWithKernel : public OperatorWithKernel {
OpKernelType
GetExpectedKernelType
(
OpKernelType
GetExpectedKernelType
(
const
ExecutionContext
&
ctx
)
const
override
{
const
ExecutionContext
&
ctx
)
const
override
{
if
(
Attr
<
bool
>
(
"use_gpu"
))
{
if
(
Attr
<
bool
>
(
"use_gpu"
))
{
VLOG
(
3
)
<<
"force use gpu kernel"
;
VLOG
(
3
0
)
<<
"force use gpu kernel"
;
return
OpKernelType
(
proto
::
VarType
::
FP32
,
platform
::
CUDAPlace
(
0
));
return
OpKernelType
(
proto
::
VarType
::
FP32
,
platform
::
CUDAPlace
(
0
));
}
else
{
}
else
{
VLOG
(
3
)
<<
"use default kernel"
;
VLOG
(
3
0
)
<<
"use default kernel"
;
return
OpKernelType
(
proto
::
VarType
::
FP32
,
return
OpKernelType
(
proto
::
VarType
::
FP32
,
ctx
.
Input
<
Tensor
>
(
"input"
)
->
place
());
ctx
.
Input
<
Tensor
>
(
"input"
)
->
place
());
}
}
...
@@ -148,7 +148,7 @@ TEST(Operator, CPUtoGPU) {
...
@@ -148,7 +148,7 @@ TEST(Operator, CPUtoGPU) {
// get output
// get output
auto
*
output2
=
scope
.
Var
(
"OUT2"
);
auto
*
output2
=
scope
.
Var
(
"OUT2"
);
gpu_op
->
Run
(
scope
,
cuda_place
);
gpu_op
->
Run
(
scope
,
cuda_place
);
VLOG
(
3
)
<<
"after gpu_op run"
;
VLOG
(
3
0
)
<<
"after gpu_op run"
;
// auto* output2_ptr = output2->Get<LoDTensor>().data<float>();
// auto* output2_ptr = output2->Get<LoDTensor>().data<float>();
paddle
::
platform
::
DeviceContextPool
&
pool
=
paddle
::
platform
::
DeviceContextPool
&
pool
=
...
...
paddle/fluid/framework/details/broadcast_op_handle.cc
浏览文件 @
d231e550
...
@@ -60,7 +60,7 @@ void BroadcastOpHandle::BroadcastOneVar(
...
@@ -60,7 +60,7 @@ void BroadcastOpHandle::BroadcastOneVar(
PADDLE_ENFORCE_NOT_NULL
(
in_var
);
PADDLE_ENFORCE_NOT_NULL
(
in_var
);
Tensor
&
in_tensor
=
VariableVisitor
::
GetMutableTensor
(
in_var
);
Tensor
&
in_tensor
=
VariableVisitor
::
GetMutableTensor
(
in_var
);
if
(
UNLIKELY
(
!
in_tensor
.
IsInitialized
()))
{
if
(
UNLIKELY
(
!
in_tensor
.
IsInitialized
()))
{
VLOG
(
3
)
<<
"in var "
<<
in_var_handle
.
name_
<<
"not inited, return!"
;
VLOG
(
3
0
)
<<
"in var "
<<
in_var_handle
.
name_
<<
"not inited, return!"
;
return
;
return
;
}
}
...
...
paddle/fluid/framework/details/modify_op_lock_and_record_event_pass.cc
浏览文件 @
d231e550
...
@@ -45,7 +45,7 @@ std::unique_ptr<ir::Graph> ModifyOpLockAndRecordEventPass::ApplyImpl(
...
@@ -45,7 +45,7 @@ std::unique_ptr<ir::Graph> ModifyOpLockAndRecordEventPass::ApplyImpl(
IsLockAndRecordEventFreeComputationOpHandle
(
compute_op
,
graph_view
);
IsLockAndRecordEventFreeComputationOpHandle
(
compute_op
,
graph_view
);
compute_op
->
SetLockAndRecordEventFree
(
is_lock_and_record_event_free
);
compute_op
->
SetLockAndRecordEventFree
(
is_lock_and_record_event_free
);
if
(
is_lock_and_record_event_free
)
{
if
(
is_lock_and_record_event_free
)
{
VLOG
(
10
)
<<
"Set is_lock_and_record_event_free be true in op "
VLOG
(
10
0
)
<<
"Set is_lock_and_record_event_free be true in op "
<<
compute_op
->
DebugString
();
<<
compute_op
->
DebugString
();
}
}
}
}
...
...
paddle/fluid/framework/details/multi_devices_graph_pass.cc
浏览文件 @
d231e550
...
@@ -399,7 +399,7 @@ std::unique_ptr<ir::Graph> MultiDevSSAGraphBuilder::ApplyImpl(
...
@@ -399,7 +399,7 @@ std::unique_ptr<ir::Graph> MultiDevSSAGraphBuilder::ApplyImpl(
for
(
size_t
i
=
0
;
i
<
backward_vars
.
size
();
i
+=
2
)
{
for
(
size_t
i
=
0
;
i
<
backward_vars
.
size
();
i
+=
2
)
{
auto
&
p_name
=
backward_vars
[
i
];
auto
&
p_name
=
backward_vars
[
i
];
auto
&
g_name
=
backward_vars
[
i
+
1
];
auto
&
g_name
=
backward_vars
[
i
+
1
];
VLOG
(
10
)
<<
"Bcast "
<<
g_name
<<
" for parameter "
<<
p_name
;
VLOG
(
10
0
)
<<
"Bcast "
<<
g_name
<<
" for parameter "
<<
p_name
;
switch
(
strategy_
.
reduce_
)
{
switch
(
strategy_
.
reduce_
)
{
case
BuildStrategy
::
ReduceStrategy
::
kReduce
:
case
BuildStrategy
::
ReduceStrategy
::
kReduce
:
...
@@ -809,7 +809,7 @@ int MultiDevSSAGraphBuilder::CreateRPCOp(
...
@@ -809,7 +809,7 @@ int MultiDevSSAGraphBuilder::CreateRPCOp(
node
->
Op
()
->
GetAttr
(
OpProtoAndCheckerMaker
::
OpRoleVarAttrName
()));
node
->
Op
()
->
GetAttr
(
OpProtoAndCheckerMaker
::
OpRoleVarAttrName
()));
PADDLE_ENFORCE_EQ
(
send_param_grad
.
size
(),
2U
);
PADDLE_ENFORCE_EQ
(
send_param_grad
.
size
(),
2U
);
op_dev_id
=
GetAppropriateDeviceID
({
send_param_grad
[
1
]});
op_dev_id
=
GetAppropriateDeviceID
({
send_param_grad
[
1
]});
VLOG
(
10
)
<<
"send grad "
<<
input_var_names
[
0
]
<<
" origin "
VLOG
(
10
0
)
<<
"send grad "
<<
input_var_names
[
0
]
<<
" origin "
<<
send_param_grad
[
1
]
<<
" place: "
<<
op_dev_id
;
<<
send_param_grad
[
1
]
<<
" place: "
<<
op_dev_id
;
for
(
auto
&
varname
:
input_var_names
)
{
for
(
auto
&
varname
:
input_var_names
)
{
sharded_var_device
->
emplace
(
varname
,
op_dev_id
);
sharded_var_device
->
emplace
(
varname
,
op_dev_id
);
...
@@ -826,7 +826,7 @@ int MultiDevSSAGraphBuilder::CreateRPCOp(
...
@@ -826,7 +826,7 @@ int MultiDevSSAGraphBuilder::CreateRPCOp(
if
(
recv_param_grad
.
size
()
==
2U
)
{
if
(
recv_param_grad
.
size
()
==
2U
)
{
op_dev_id
=
op_dev_id
=
GetVarDeviceID
(
*
result
,
recv_param_grad
[
1
],
*
sharded_var_device
);
GetVarDeviceID
(
*
result
,
recv_param_grad
[
1
],
*
sharded_var_device
);
VLOG
(
10
)
<<
"recv param "
<<
recv_param_grad
[
0
]
VLOG
(
10
0
)
<<
"recv param "
<<
recv_param_grad
[
0
]
<<
" get grad place: "
<<
recv_param_grad
[
1
]
<<
" get grad place: "
<<
recv_param_grad
[
1
]
<<
" place: "
<<
op_dev_id
;
<<
" place: "
<<
op_dev_id
;
}
else
{
}
else
{
...
...
paddle/fluid/framework/details/reference_count_pass.cc
浏览文件 @
d231e550
...
@@ -140,7 +140,7 @@ std::unique_ptr<ir::Graph> ReferenceCountPass::ApplyImpl(
...
@@ -140,7 +140,7 @@ std::unique_ptr<ir::Graph> ReferenceCountPass::ApplyImpl(
if
(
next_compute_op
!=
nullptr
)
{
if
(
next_compute_op
!=
nullptr
)
{
if
(
compute_ref_cnt_map
.
count
(
next_compute_op
))
{
if
(
compute_ref_cnt_map
.
count
(
next_compute_op
))
{
compute_ref_cnt_map
[
next_compute_op
]
->
AddVar
(
var_name
);
compute_ref_cnt_map
[
next_compute_op
]
->
AddVar
(
var_name
);
VLOG
(
5
)
<<
"Add reference count of "
<<
var_name
<<
" to Operator "
VLOG
(
5
0
)
<<
"Add reference count of "
<<
var_name
<<
" to Operator "
<<
next_compute_op
->
Name
();
<<
next_compute_op
->
Name
();
}
else
{
}
else
{
// Create new reference_count_op_handle
// Create new reference_count_op_handle
...
...
paddle/fluid/framework/details/scale_loss_grad_op_handle.cc
浏览文件 @
d231e550
...
@@ -51,7 +51,7 @@ void ScaleLossGradOpHandle::RunImpl() {
...
@@ -51,7 +51,7 @@ void ScaleLossGradOpHandle::RunImpl() {
->
stream
();
->
stream
();
memory
::
Copy
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
),
tmp
,
memory
::
Copy
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place_
),
tmp
,
platform
::
CPUPlace
(),
&
coeff_
,
sizeof
(
float
),
stream
);
platform
::
CPUPlace
(),
&
coeff_
,
sizeof
(
float
),
stream
);
VLOG
(
10
)
<<
place_
<<
"RUN Scale loss grad op"
;
VLOG
(
10
0
)
<<
place_
<<
"RUN Scale loss grad op"
;
});
});
#endif
#endif
}
}
...
...
paddle/fluid/framework/details/sequential_execution_pass.cc
浏览文件 @
d231e550
...
@@ -94,7 +94,7 @@ std::unique_ptr<ir::Graph> SequentialExecutionPass::ApplyImpl(
...
@@ -94,7 +94,7 @@ std::unique_ptr<ir::Graph> SequentialExecutionPass::ApplyImpl(
op_node_list
[
i
-
1
]
->
outputs
.
push_back
(
dep_var
);
op_node_list
[
i
-
1
]
->
outputs
.
push_back
(
dep_var
);
dep_var
->
outputs
.
push_back
(
op_node_list
[
i
]);
dep_var
->
outputs
.
push_back
(
op_node_list
[
i
]);
dep_var
->
inputs
.
push_back
(
op_node_list
[
i
-
1
]);
dep_var
->
inputs
.
push_back
(
op_node_list
[
i
-
1
]);
VLOG
(
10
)
<<
"Add dependencies between "
<<
op_node_list
[
i
-
1
]
->
Name
()
VLOG
(
10
0
)
<<
"Add dependencies between "
<<
op_node_list
[
i
-
1
]
->
Name
()
<<
" and "
<<
op_node_list
[
i
]
->
Name
();
<<
" and "
<<
op_node_list
[
i
]
->
Name
();
}
}
return
graph
;
return
graph
;
...
...
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc
浏览文件 @
d231e550
...
@@ -210,16 +210,16 @@ void ThreadedSSAGraphExecutor::RunOp(
...
@@ -210,16 +210,16 @@ void ThreadedSSAGraphExecutor::RunOp(
details
::
OpHandleBase
*
op
)
{
details
::
OpHandleBase
*
op
)
{
auto
op_run
=
[
ready_var_q
,
op
,
this
]
{
auto
op_run
=
[
ready_var_q
,
op
,
this
]
{
try
{
try
{
if
(
VLOG_IS_ON
(
10
))
{
if
(
VLOG_IS_ON
(
10
0
))
{
VLOG
(
10
)
<<
op
<<
" "
<<
op
->
Name
()
<<
" : "
<<
op
->
DebugString
();
VLOG
(
10
0
)
<<
op
<<
" "
<<
op
->
Name
()
<<
" : "
<<
op
->
DebugString
();
}
}
if
(
LIKELY
(
!
strategy_
.
dry_run_
))
{
if
(
LIKELY
(
!
strategy_
.
dry_run_
))
{
op
->
Run
(
strategy_
.
use_cuda_
);
op
->
Run
(
strategy_
.
use_cuda_
);
}
}
VLOG
(
10
)
<<
op
<<
" "
<<
op
->
Name
()
<<
" Done "
;
VLOG
(
10
0
)
<<
op
<<
" "
<<
op
->
Name
()
<<
" Done "
;
running_ops_
--
;
running_ops_
--
;
ready_var_q
->
Extend
(
op
->
Outputs
());
ready_var_q
->
Extend
(
op
->
Outputs
());
VLOG
(
10
)
<<
op
<<
" "
<<
op
->
Name
()
<<
"Signal posted"
;
VLOG
(
10
0
)
<<
op
<<
" "
<<
op
->
Name
()
<<
"Signal posted"
;
}
catch
(...)
{
}
catch
(...)
{
exception_holder_
.
Catch
(
std
::
current_exception
());
exception_holder_
.
Catch
(
std
::
current_exception
());
}
}
...
...
paddle/fluid/framework/executor.cc
浏览文件 @
d231e550
...
@@ -43,7 +43,7 @@ ExecutorPrepareContext::ExecutorPrepareContext(
...
@@ -43,7 +43,7 @@ ExecutorPrepareContext::ExecutorPrepareContext(
}
}
ExecutorPrepareContext
::~
ExecutorPrepareContext
()
{
ExecutorPrepareContext
::~
ExecutorPrepareContext
()
{
VLOG
(
5
)
<<
"destroy ExecutorPrepareContext"
;
VLOG
(
5
0
)
<<
"destroy ExecutorPrepareContext"
;
}
}
template
<
typename
RefCntMap
>
template
<
typename
RefCntMap
>
...
@@ -60,7 +60,7 @@ static void DeleteUnusedTensors(const Scope& scope, const OperatorBase* op,
...
@@ -60,7 +60,7 @@ static void DeleteUnusedTensors(const Scope& scope, const OperatorBase* op,
if
((
it
->
second
)
--
==
1
)
{
if
((
it
->
second
)
--
==
1
)
{
auto
*
var
=
scope
.
FindVar
(
name
);
auto
*
var
=
scope
.
FindVar
(
name
);
if
(
var
!=
nullptr
)
{
if
(
var
!=
nullptr
)
{
VLOG
(
10
)
<<
"Erase tensor
\'
"
<<
name
<<
"
\'
"
;
VLOG
(
10
0
)
<<
"Erase tensor
\'
"
<<
name
<<
"
\'
"
;
if
(
var
->
IsType
<
LoDTensor
>
())
{
if
(
var
->
IsType
<
LoDTensor
>
())
{
erase_tensors
.
insert
(
var
->
GetMutable
<
LoDTensor
>
());
erase_tensors
.
insert
(
var
->
GetMutable
<
LoDTensor
>
());
}
else
if
(
var
->
IsType
<
SelectedRows
>
())
{
}
else
if
(
var
->
IsType
<
SelectedRows
>
())
{
...
@@ -141,12 +141,12 @@ void Executor::CreateVariables(const ProgramDesc& pdesc, Scope* scope,
...
@@ -141,12 +141,12 @@ void Executor::CreateVariables(const ProgramDesc& pdesc, Scope* scope,
if
(
var
->
Persistable
())
{
if
(
var
->
Persistable
())
{
auto
*
ptr
=
const_cast
<
Scope
*>
(
ancestor_scope
)
->
Var
(
var
->
Name
());
auto
*
ptr
=
const_cast
<
Scope
*>
(
ancestor_scope
)
->
Var
(
var
->
Name
());
InitializeVariable
(
ptr
,
var
->
GetType
());
InitializeVariable
(
ptr
,
var
->
GetType
());
VLOG
(
3
)
<<
"Create Variable "
<<
var
->
Name
()
VLOG
(
3
0
)
<<
"Create Variable "
<<
var
->
Name
()
<<
" global, which pointer is "
<<
ptr
;
<<
" global, which pointer is "
<<
ptr
;
}
else
{
}
else
{
auto
*
ptr
=
scope
->
Var
(
var
->
Name
());
auto
*
ptr
=
scope
->
Var
(
var
->
Name
());
InitializeVariable
(
ptr
,
var
->
GetType
());
InitializeVariable
(
ptr
,
var
->
GetType
());
VLOG
(
3
)
<<
"Create Variable "
<<
var
->
Name
()
VLOG
(
3
0
)
<<
"Create Variable "
<<
var
->
Name
()
<<
" locally, which pointer is "
<<
ptr
;
<<
" locally, which pointer is "
<<
ptr
;
}
}
}
}
...
@@ -154,7 +154,7 @@ void Executor::CreateVariables(const ProgramDesc& pdesc, Scope* scope,
...
@@ -154,7 +154,7 @@ void Executor::CreateVariables(const ProgramDesc& pdesc, Scope* scope,
for
(
auto
&
var
:
global_block
.
AllVars
())
{
for
(
auto
&
var
:
global_block
.
AllVars
())
{
auto
*
ptr
=
scope
->
Var
(
var
->
Name
());
auto
*
ptr
=
scope
->
Var
(
var
->
Name
());
InitializeVariable
(
ptr
,
var
->
GetType
());
InitializeVariable
(
ptr
,
var
->
GetType
());
VLOG
(
3
)
<<
"Create variable "
<<
var
->
Name
()
<<
", which pointer is "
VLOG
(
3
0
)
<<
"Create variable "
<<
var
->
Name
()
<<
", which pointer is "
<<
ptr
;
<<
ptr
;
}
}
}
}
...
@@ -286,7 +286,7 @@ void Executor::Run(const ProgramDesc& program, Scope* scope,
...
@@ -286,7 +286,7 @@ void Executor::Run(const ProgramDesc& program, Scope* scope,
int
i
=
0
;
int
i
=
0
;
for
(
auto
&
feed_target
:
(
*
feed_targets
))
{
for
(
auto
&
feed_target
:
(
*
feed_targets
))
{
std
::
string
var_name
=
feed_target
.
first
;
std
::
string
var_name
=
feed_target
.
first
;
VLOG
(
3
)
<<
"feed target's name: "
<<
var_name
;
VLOG
(
3
0
)
<<
"feed target's name: "
<<
var_name
;
// prepend feed op
// prepend feed op
auto
*
op
=
global_block
->
PrependOp
();
auto
*
op
=
global_block
->
PrependOp
();
...
@@ -309,7 +309,7 @@ void Executor::Run(const ProgramDesc& program, Scope* scope,
...
@@ -309,7 +309,7 @@ void Executor::Run(const ProgramDesc& program, Scope* scope,
int
i
=
0
;
int
i
=
0
;
for
(
auto
&
fetch_target
:
(
*
fetch_targets
))
{
for
(
auto
&
fetch_target
:
(
*
fetch_targets
))
{
std
::
string
var_name
=
fetch_target
.
first
;
std
::
string
var_name
=
fetch_target
.
first
;
VLOG
(
3
)
<<
"fetch target's name: "
<<
var_name
;
VLOG
(
3
0
)
<<
"fetch target's name: "
<<
var_name
;
// append fetch op
// append fetch op
auto
*
op
=
global_block
->
AppendOp
();
auto
*
op
=
global_block
->
AppendOp
();
...
@@ -459,7 +459,7 @@ void Executor::RunPreparedContext(
...
@@ -459,7 +459,7 @@ void Executor::RunPreparedContext(
void
Executor
::
EnableMKLDNN
(
const
ProgramDesc
&
program
)
{
void
Executor
::
EnableMKLDNN
(
const
ProgramDesc
&
program
)
{
#ifdef PADDLE_WITH_MKLDNN
#ifdef PADDLE_WITH_MKLDNN
VLOG
(
3
)
<<
"use_mkldnn=True"
;
VLOG
(
3
0
)
<<
"use_mkldnn=True"
;
for
(
size_t
bid
=
0
;
bid
<
program
.
Size
();
++
bid
)
{
for
(
size_t
bid
=
0
;
bid
<
program
.
Size
();
++
bid
)
{
auto
*
block
=
const_cast
<
ProgramDesc
&>
(
program
).
MutableBlock
(
bid
);
auto
*
block
=
const_cast
<
ProgramDesc
&>
(
program
).
MutableBlock
(
bid
);
for
(
auto
*
op
:
block
->
AllOps
())
{
for
(
auto
*
op
:
block
->
AllOps
())
{
...
...
paddle/fluid/framework/feed_fetch_method.cc
浏览文件 @
d231e550
...
@@ -25,7 +25,7 @@ void SetFeedVariable(Scope* scope, const LoDTensor& input,
...
@@ -25,7 +25,7 @@ void SetFeedVariable(Scope* scope, const LoDTensor& input,
const
std
::
string
&
var_name
,
size_t
index
)
{
const
std
::
string
&
var_name
,
size_t
index
)
{
// If var_name Variable is not found in GlobalScope, a new variable will
// If var_name Variable is not found in GlobalScope, a new variable will
// be created.
// be created.
VLOG
(
3
)
<<
"SetFeedVariable name="
<<
var_name
<<
" index="
<<
index
;
VLOG
(
3
0
)
<<
"SetFeedVariable name="
<<
var_name
<<
" index="
<<
index
;
Variable
*
g_feed_value
=
scope
->
Var
(
var_name
);
Variable
*
g_feed_value
=
scope
->
Var
(
var_name
);
auto
&
feed_inputs
=
*
(
g_feed_value
->
GetMutable
<
FeedFetchList
>
());
auto
&
feed_inputs
=
*
(
g_feed_value
->
GetMutable
<
FeedFetchList
>
());
if
(
index
>=
feed_inputs
.
size
())
{
if
(
index
>=
feed_inputs
.
size
())
{
...
@@ -47,7 +47,7 @@ LoDTensor& GetFetchVariable(const Scope& scope, const std::string& var_name,
...
@@ -47,7 +47,7 @@ LoDTensor& GetFetchVariable(const Scope& scope, const std::string& var_name,
typeid
(
FeedFetchList
).
name
());
typeid
(
FeedFetchList
).
name
());
auto
&
fetch_outputs
=
*
g_fetch_value
->
GetMutable
<
FeedFetchList
>
();
auto
&
fetch_outputs
=
*
g_fetch_value
->
GetMutable
<
FeedFetchList
>
();
auto
&
tensor
=
fetch_outputs
[
index
];
auto
&
tensor
=
fetch_outputs
[
index
];
VLOG
(
3
)
<<
"Fetch "
<<
var_name
<<
" with index "
<<
index
VLOG
(
3
0
)
<<
"Fetch "
<<
var_name
<<
" with index "
<<
index
<<
" shape= "
<<
tensor
.
dims
();
<<
" shape= "
<<
tensor
.
dims
();
PADDLE_ENFORCE_LT
(
index
,
fetch_outputs
.
size
());
PADDLE_ENFORCE_LT
(
index
,
fetch_outputs
.
size
());
return
tensor
;
return
tensor
;
...
...
paddle/fluid/framework/ir/attention_lstm_fuse_pass.cc
浏览文件 @
d231e550
...
@@ -152,11 +152,11 @@ void PrepareParameters(Graph* graph, const Param& param) {
...
@@ -152,11 +152,11 @@ void PrepareParameters(Graph* graph, const Param& param) {
auto* W_##name__##_w1 = scope->FindVar(#name__ ".w_1"); \
auto* W_##name__##_w1 = scope->FindVar(#name__ ".w_1"); \
auto* W_##name__##_b0 = scope->FindVar(#name__ ".b_0"); \
auto* W_##name__##_b0 = scope->FindVar(#name__ ".b_0"); \
CHECK_P3(W_##name__##_w0, W_##name__##_w1, W_##name__##_b0); \
CHECK_P3(W_##name__##_w0, W_##name__##_w1, W_##name__##_b0); \
VLOG(4) << #name__ "_w0" \
VLOG(4
0
) << #name__ "_w0" \
<< " shape: " << W_##name__##_w0->Get<LoDTensor>().dims(); \
<< " shape: " << W_##name__##_w0->Get<LoDTensor>().dims(); \
VLOG(4) << #name__ "_w1" \
VLOG(4
0
) << #name__ "_w1" \
<< " shape: " << W_##name__##_w1->Get<LoDTensor>().dims(); \
<< " shape: " << W_##name__##_w1->Get<LoDTensor>().dims(); \
VLOG(4) << #name__ "_b0" \
VLOG(4
0
) << #name__ "_b0" \
<< " shape: " << W_##name__##_b0->Get<LoDTensor>().dims(); \
<< " shape: " << W_##name__##_b0->Get<LoDTensor>().dims(); \
auto& W_##name__##_w0_t = W_##name__##_w0->Get<LoDTensor>(); \
auto& W_##name__##_w0_t = W_##name__##_w0->Get<LoDTensor>(); \
auto& W_##name__##_w1_t = W_##name__##_w1->Get<LoDTensor>(); \
auto& W_##name__##_w1_t = W_##name__##_w1->Get<LoDTensor>(); \
...
@@ -208,7 +208,7 @@ void PrepareLSTMWeight(const LoDTensor& W_forget_w0,
...
@@ -208,7 +208,7 @@ void PrepareLSTMWeight(const LoDTensor& W_forget_w0,
int
D
=
W_forget_w0
.
dims
()[
0
];
int
D
=
W_forget_w0
.
dims
()[
0
];
int
M
=
W_forget_w1
.
dims
()[
0
];
int
M
=
W_forget_w1
.
dims
()[
0
];
out
->
Resize
(
make_ddim
({
D
+
M
,
4
*
D
}));
out
->
Resize
(
make_ddim
({
D
+
M
,
4
*
D
}));
VLOG
(
3
)
<<
"LSTMWeight resized to "
<<
out
->
dims
();
VLOG
(
3
0
)
<<
"LSTMWeight resized to "
<<
out
->
dims
();
float
*
out_data
=
out
->
mutable_data
<
float
>
(
platform
::
CPUPlace
());
float
*
out_data
=
out
->
mutable_data
<
float
>
(
platform
::
CPUPlace
());
std
::
array
<
const
float
*
,
4
>
tensors
(
std
::
array
<
const
float
*
,
4
>
tensors
(
...
...
paddle/fluid/framework/ir/conv_bias_mkldnn_fuse_pass.cc
浏览文件 @
d231e550
...
@@ -57,7 +57,7 @@ std::unique_ptr<ir::Graph> ConvBiasFusePass::ApplyImpl(
...
@@ -57,7 +57,7 @@ std::unique_ptr<ir::Graph> ConvBiasFusePass::ApplyImpl(
int
found_conv_bias_count
=
0
;
int
found_conv_bias_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
4
)
<<
"handle ConvBias fuse"
;
VLOG
(
4
0
)
<<
"handle ConvBias fuse"
;
GET_IR_NODE_FROM_SUBGRAPH
(
conv_weight
,
conv_weight
,
GET_IR_NODE_FROM_SUBGRAPH
(
conv_weight
,
conv_weight
,
conv_bias_pattern
);
// Filter
conv_bias_pattern
);
// Filter
GET_IR_NODE_FROM_SUBGRAPH
(
conv_out
,
conv_out
,
conv_bias_pattern
);
// tmp
GET_IR_NODE_FROM_SUBGRAPH
(
conv_out
,
conv_out
,
conv_bias_pattern
);
// tmp
...
@@ -74,7 +74,7 @@ std::unique_ptr<ir::Graph> ConvBiasFusePass::ApplyImpl(
...
@@ -74,7 +74,7 @@ std::unique_ptr<ir::Graph> ConvBiasFusePass::ApplyImpl(
// check if fuse can be done and if MKL-DNN should be used
// check if fuse can be done and if MKL-DNN should be used
FuseOptions
fuse_option
=
FindFuseOption
(
*
conv
,
*
eltwise
);
FuseOptions
fuse_option
=
FindFuseOption
(
*
conv
,
*
eltwise
);
if
(
fuse_option
==
DO_NOT_FUSE
||
fuse_option
==
FUSE_NATIVE
)
{
if
(
fuse_option
==
DO_NOT_FUSE
||
fuse_option
==
FUSE_NATIVE
)
{
VLOG
(
3
)
<<
"do not perform conv+bias fuse"
;
VLOG
(
3
0
)
<<
"do not perform conv+bias fuse"
;
return
;
return
;
}
}
...
...
paddle/fluid/framework/ir/conv_bn_fuse_pass.cc
浏览文件 @
d231e550
...
@@ -121,7 +121,7 @@ std::unique_ptr<ir::Graph> ConvBNFusePass::ApplyImpl(
...
@@ -121,7 +121,7 @@ std::unique_ptr<ir::Graph> ConvBNFusePass::ApplyImpl(
int
found_conv_bn_count
=
0
;
int
found_conv_bn_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
4
)
<<
"handle ConvBN fuse"
;
VLOG
(
4
0
)
<<
"handle ConvBN fuse"
;
// conv, batch_norm,
// conv, batch_norm,
// conv_weight, conv_out,
// conv_weight, conv_out,
...
@@ -133,7 +133,7 @@ std::unique_ptr<ir::Graph> ConvBNFusePass::ApplyImpl(
...
@@ -133,7 +133,7 @@ std::unique_ptr<ir::Graph> ConvBNFusePass::ApplyImpl(
// check if fuse can be done and if MKL-DNN should be used
// check if fuse can be done and if MKL-DNN should be used
FuseOptions
fuse_option
=
FindFuseOption
(
*
conv
,
*
batch_norm
);
FuseOptions
fuse_option
=
FindFuseOption
(
*
conv
,
*
batch_norm
);
if
(
fuse_option
==
DO_NOT_FUSE
)
{
if
(
fuse_option
==
DO_NOT_FUSE
)
{
VLOG
(
3
)
<<
"do not perform conv+bn fuse"
;
VLOG
(
3
0
)
<<
"do not perform conv+bn fuse"
;
return
;
return
;
}
}
...
@@ -241,7 +241,7 @@ std::unique_ptr<ir::Graph> ConvEltwiseAddBNFusePass::ApplyImpl(
...
@@ -241,7 +241,7 @@ std::unique_ptr<ir::Graph> ConvEltwiseAddBNFusePass::ApplyImpl(
int
found_conv_bn_count
=
0
;
int
found_conv_bn_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
4
)
<<
"handle ConvBN fuse"
;
VLOG
(
4
0
)
<<
"handle ConvBN fuse"
;
// conv, batch_norm,
// conv, batch_norm,
// conv_weight, conv_out,
// conv_weight, conv_out,
...
...
paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.cc
浏览文件 @
d231e550
...
@@ -38,7 +38,7 @@ std::unique_ptr<ir::Graph> ConvReLUFusePass::ApplyImpl(
...
@@ -38,7 +38,7 @@ std::unique_ptr<ir::Graph> ConvReLUFusePass::ApplyImpl(
int
found_conv_relu_count
=
0
;
int
found_conv_relu_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
4
)
<<
"handle ConvReLU fuse"
;
VLOG
(
4
0
)
<<
"handle ConvReLU fuse"
;
GET_IR_NODE_FROM_SUBGRAPH
(
conv_weight
,
conv_weight
,
GET_IR_NODE_FROM_SUBGRAPH
(
conv_weight
,
conv_weight
,
conv_relu_pattern
);
// Filter
conv_relu_pattern
);
// Filter
GET_IR_NODE_FROM_SUBGRAPH
(
conv_out
,
conv_out
,
conv_relu_pattern
);
// tmp
GET_IR_NODE_FROM_SUBGRAPH
(
conv_out
,
conv_out
,
conv_relu_pattern
);
// tmp
...
@@ -48,7 +48,7 @@ std::unique_ptr<ir::Graph> ConvReLUFusePass::ApplyImpl(
...
@@ -48,7 +48,7 @@ std::unique_ptr<ir::Graph> ConvReLUFusePass::ApplyImpl(
FuseOptions
fuse_option
=
FindFuseOption
(
*
conv
,
*
relu
);
FuseOptions
fuse_option
=
FindFuseOption
(
*
conv
,
*
relu
);
if
(
fuse_option
==
DO_NOT_FUSE
)
{
if
(
fuse_option
==
DO_NOT_FUSE
)
{
VLOG
(
3
)
<<
"do not perform conv+relu fuse"
;
VLOG
(
3
0
)
<<
"do not perform conv+relu fuse"
;
return
;
return
;
}
}
...
...
paddle/fluid/framework/ir/depthwise_conv_mkldnn_pass.cc
浏览文件 @
d231e550
...
@@ -39,7 +39,7 @@ std::unique_ptr<ir::Graph> DepthwiseConvMKLDNNPass::ApplyImpl(
...
@@ -39,7 +39,7 @@ std::unique_ptr<ir::Graph> DepthwiseConvMKLDNNPass::ApplyImpl(
int
found_depthwise_conv_mkldnn_count
=
0
;
int
found_depthwise_conv_mkldnn_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
3
)
<<
"handle DepthwiseConvMKLDNN fuse"
;
VLOG
(
3
0
)
<<
"handle DepthwiseConvMKLDNN fuse"
;
GET_NODE
(
depthwise_conv
,
(
*
pattern
));
GET_NODE
(
depthwise_conv
,
(
*
pattern
));
depthwise_conv
->
Op
()
->
SetType
(
"conv2d"
);
depthwise_conv
->
Op
()
->
SetType
(
"conv2d"
);
found_depthwise_conv_mkldnn_count
++
;
found_depthwise_conv_mkldnn_count
++
;
...
...
paddle/fluid/framework/ir/fc_fuse_pass.cc
浏览文件 @
d231e550
...
@@ -39,7 +39,7 @@ std::unique_ptr<ir::Graph> FCFusePass::ApplyImpl(
...
@@ -39,7 +39,7 @@ std::unique_ptr<ir::Graph> FCFusePass::ApplyImpl(
int
found_fc_count
=
0
;
int
found_fc_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
4
)
<<
"handle FC fuse"
;
VLOG
(
4
0
)
<<
"handle FC fuse"
;
GET_IR_NODE_FROM_SUBGRAPH
(
w
,
w
,
fc_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
w
,
w
,
fc_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
fc_bias
,
bias
,
fc_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
fc_bias
,
bias
,
fc_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
fc_out
,
Out
,
fc_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
fc_out
,
Out
,
fc_pattern
);
...
...
paddle/fluid/framework/ir/fuse_elewise_add_act_pass.cc
浏览文件 @
d231e550
...
@@ -61,7 +61,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseElewiseAddAct(
...
@@ -61,7 +61,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseElewiseAddAct(
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
4
)
<<
"handle FuseElewiseAddAct fuse"
;
VLOG
(
4
0
)
<<
"handle FuseElewiseAddAct fuse"
;
GET_IR_NODE_FROM_SUBGRAPH
(
ele_y
,
ele_y
,
elewise_add_act_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
ele_y
,
ele_y
,
elewise_add_act_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
ele_out
,
elewise_add_out
,
GET_IR_NODE_FROM_SUBGRAPH
(
ele_out
,
elewise_add_out
,
elewise_add_act_pattern
);
elewise_add_act_pattern
);
...
@@ -77,7 +77,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseElewiseAddAct(
...
@@ -77,7 +77,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseElewiseAddAct(
Node
*
elewise_add_act_node
=
CreateFuseElewiseAddActNode
(
Node
*
elewise_add_act_node
=
CreateFuseElewiseAddActNode
(
g
,
act
,
ele_add
,
ele_x_n
,
ele_y_n
,
ele_out_n
,
act_out_n
);
g
,
act
,
ele_add
,
ele_x_n
,
ele_y_n
,
ele_out_n
,
act_out_n
);
VLOG
(
4
)
<<
"
\n\t
"
<<
ele_x_n
<<
" and "
<<
ele_y_n
<<
" -> "
VLOG
(
4
0
)
<<
"
\n\t
"
<<
ele_x_n
<<
" and "
<<
ele_y_n
<<
" -> "
<<
ele_add
->
Name
()
<<
" -> "
<<
ele_out_n
<<
"
\n
"
<<
ele_add
->
Name
()
<<
" -> "
<<
ele_out_n
<<
"
\n
"
<<
"
\t
"
<<
ele_out_n
<<
" -> "
<<
act
->
Name
()
<<
" -> "
<<
"
\t
"
<<
ele_out_n
<<
" -> "
<<
act
->
Name
()
<<
" -> "
<<
act_out_n
;
<<
act_out_n
;
...
@@ -113,7 +113,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseActElewiseAdd(
...
@@ -113,7 +113,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseActElewiseAdd(
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
4
)
<<
"handle FuseElewiseAddAct fuse"
;
VLOG
(
4
0
)
<<
"handle FuseElewiseAddAct fuse"
;
GET_IR_NODE_FROM_SUBGRAPH
(
act_out
,
act_out
,
act_elewise_add_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
act_out
,
act_out
,
act_elewise_add_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
ele_x
,
ele_x
,
act_elewise_add_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
ele_x
,
ele_x
,
act_elewise_add_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
ele_out
,
elewise_add_out
,
GET_IR_NODE_FROM_SUBGRAPH
(
ele_out
,
elewise_add_out
,
...
@@ -129,7 +129,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseActElewiseAdd(
...
@@ -129,7 +129,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseActElewiseAdd(
Node
*
elewise_add_act_node
=
CreateFuseElewiseAddActNode
(
Node
*
elewise_add_act_node
=
CreateFuseElewiseAddActNode
(
g
,
ele_add
,
act
,
elewise_add_x_n
,
act_i_n
,
act_o_n
,
elewise_add_out_n
);
g
,
ele_add
,
act
,
elewise_add_x_n
,
act_i_n
,
act_o_n
,
elewise_add_out_n
);
VLOG
(
4
)
<<
"
\n\t
"
<<
act_i_n
<<
" -> "
<<
act
->
Name
()
<<
" -> "
<<
act_o_n
VLOG
(
4
0
)
<<
"
\n\t
"
<<
act_i_n
<<
" -> "
<<
act
->
Name
()
<<
" -> "
<<
act_o_n
<<
"
\n\t
"
<<
act_o_n
<<
" and "
<<
elewise_add_x_n
<<
" -> "
<<
"
\n\t
"
<<
act_o_n
<<
" and "
<<
elewise_add_x_n
<<
" -> "
<<
ele_add
->
Name
()
<<
" -> "
<<
elewise_add_out_n
;
<<
ele_add
->
Name
()
<<
" -> "
<<
elewise_add_out_n
;
...
@@ -165,7 +165,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseElewiseAddActInplaceGrad(
...
@@ -165,7 +165,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseElewiseAddActInplaceGrad(
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
4
)
<<
"handle FuseElewiseAddActGrad1 fuse"
;
VLOG
(
4
0
)
<<
"handle FuseElewiseAddActGrad1 fuse"
;
GET_IR_NODE_FROM_SUBGRAPH
(
act_out
,
act_out
,
elewise_add_act_grad_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
act_out
,
act_out
,
elewise_add_act_grad_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
act_grad
,
act_grad
,
elewise_add_act_grad_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
act_grad
,
act_grad
,
elewise_add_act_grad_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
d_itermediate_out
,
d_itermediate_out
,
GET_IR_NODE_FROM_SUBGRAPH
(
d_itermediate_out
,
d_itermediate_out
,
...
@@ -208,7 +208,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseElewiseAddActInplaceGrad(
...
@@ -208,7 +208,7 @@ std::unique_ptr<ir::Graph> FuseElewiseAddActPass::FuseElewiseAddActInplaceGrad(
auto
fused_node
=
g
->
CreateOpNode
(
&
desc
);
auto
fused_node
=
g
->
CreateOpNode
(
&
desc
);
VLOG
(
4
)
<<
"
\n\t
"
<<
d_act_out_n
<<
" and "
<<
act_out_n
<<
" -> "
VLOG
(
4
0
)
<<
"
\n\t
"
<<
d_act_out_n
<<
" and "
<<
act_out_n
<<
" -> "
<<
act_grad
->
Name
()
<<
" -> "
<<
d_itermediate_out_n
<<
"
\n\t
"
<<
act_grad
->
Name
()
<<
" -> "
<<
d_itermediate_out_n
<<
"
\n\t
"
<<
d_itermediate_out_n
<<
" and "
<<
act_out_n
<<
" -> "
<<
d_itermediate_out_n
<<
" and "
<<
act_out_n
<<
" -> "
<<
ele_add_grad
->
Name
()
<<
" -> "
<<
d_itermediate_out_n
;
<<
ele_add_grad
->
Name
()
<<
" -> "
<<
d_itermediate_out_n
;
...
...
paddle/fluid/framework/ir/graph.cc
浏览文件 @
d231e550
...
@@ -92,7 +92,7 @@ Graph::Graph(const ProgramDesc &program) : program_(program) {
...
@@ -92,7 +92,7 @@ Graph::Graph(const ProgramDesc &program) : program_(program) {
std
::
map
<
std
::
string
,
std
::
vector
<
ir
::
Node
*>>
Graph
::
InitFromProgram
(
std
::
map
<
std
::
string
,
std
::
vector
<
ir
::
Node
*>>
Graph
::
InitFromProgram
(
const
ProgramDesc
&
program
)
{
const
ProgramDesc
&
program
)
{
VLOG
(
3
)
<<
"block in program:"
<<
program_
.
Size
();
VLOG
(
3
0
)
<<
"block in program:"
<<
program_
.
Size
();
std
::
unordered_map
<
std
::
string
,
VarDesc
*>
all_vars
;
std
::
unordered_map
<
std
::
string
,
VarDesc
*>
all_vars
;
// var nodes for each var name, will have multiple versions in SSA
// var nodes for each var name, will have multiple versions in SSA
std
::
map
<
std
::
string
,
std
::
vector
<
ir
::
Node
*>>
var_nodes
;
std
::
map
<
std
::
string
,
std
::
vector
<
ir
::
Node
*>>
var_nodes
;
...
@@ -160,7 +160,7 @@ void Graph::ResolveHazard(
...
@@ -160,7 +160,7 @@ void Graph::ResolveHazard(
auto
it_old
=
versions
.
rbegin
();
auto
it_old
=
versions
.
rbegin
();
++
it_old
;
++
it_old
;
for
(;
it_old
!=
versions
.
rend
();
it_new
=
it_old
,
++
it_old
)
{
for
(;
it_old
!=
versions
.
rend
();
it_new
=
it_old
,
++
it_old
)
{
VLOG
(
3
)
<<
"deal with var: "
<<
(
*
it_new
)
->
Name
();
VLOG
(
3
0
)
<<
"deal with var: "
<<
(
*
it_new
)
->
Name
();
ir
::
Node
*
write_op
=
ir
::
Node
*
write_op
=
(
*
it_new
)
->
inputs
.
empty
()
?
nullptr
:
(
*
it_new
)
->
inputs
[
0
];
(
*
it_new
)
->
inputs
.
empty
()
?
nullptr
:
(
*
it_new
)
->
inputs
[
0
];
const
auto
&
read_ops
=
(
*
it_old
)
->
outputs
;
const
auto
&
read_ops
=
(
*
it_old
)
->
outputs
;
...
...
paddle/fluid/framework/ir/graph.h
浏览文件 @
d231e550
...
@@ -89,7 +89,7 @@ class Graph {
...
@@ -89,7 +89,7 @@ class Graph {
attr_name
);
attr_name
);
attrs_
[
attr_name
]
=
attr
;
attrs_
[
attr_name
]
=
attr
;
attr_dels_
[
attr_name
]
=
[
attr
,
attr_name
]()
{
attr_dels_
[
attr_name
]
=
[
attr
,
attr_name
]()
{
VLOG
(
3
)
<<
"deleting "
<<
attr_name
;
VLOG
(
3
0
)
<<
"deleting "
<<
attr_name
;
delete
attr
;
delete
attr
;
};
};
}
}
...
...
paddle/fluid/framework/ir/graph_helper.cc
浏览文件 @
d231e550
...
@@ -33,8 +33,9 @@ void SortHelper(
...
@@ -33,8 +33,9 @@ void SortHelper(
}
}
}
}
VLOG
(
3
)
<<
"topology sort insert: "
<<
node
->
Name
()
VLOG
(
30
)
<<
"topology sort insert: "
<<
node
->
Name
()
<<
reinterpret_cast
<
void
*>
(
node
)
<<
" input "
<<
node
->
inputs
.
size
();
<<
reinterpret_cast
<
void
*>
(
node
)
<<
" input "
<<
node
->
inputs
.
size
();
ret
->
push_back
(
node
);
ret
->
push_back
(
node
);
}
}
...
@@ -103,7 +104,7 @@ std::map<ir::Node *, std::unordered_set<ir::Node *>> BuildOperationAdjList(
...
@@ -103,7 +104,7 @@ std::map<ir::Node *, std::unordered_set<ir::Node *>> BuildOperationAdjList(
for
(
auto
&
var
:
n
->
inputs
)
{
for
(
auto
&
var
:
n
->
inputs
)
{
for
(
auto
&
adj_n
:
var
->
inputs
)
{
for
(
auto
&
adj_n
:
var
->
inputs
)
{
PADDLE_ENFORCE
(
adj_n
->
NodeType
()
==
ir
::
Node
::
Type
::
kOperation
);
PADDLE_ENFORCE
(
adj_n
->
NodeType
()
==
ir
::
Node
::
Type
::
kOperation
);
VLOG
(
4
)
<<
"adj "
<<
adj_n
->
Name
()
<<
reinterpret_cast
<
void
*>
(
adj_n
)
VLOG
(
4
0
)
<<
"adj "
<<
adj_n
->
Name
()
<<
reinterpret_cast
<
void
*>
(
adj_n
)
<<
" -> "
<<
n
->
Name
()
<<
reinterpret_cast
<
void
*>
(
n
)
<<
" -> "
<<
n
->
Name
()
<<
reinterpret_cast
<
void
*>
(
n
)
<<
" via "
<<
var
->
Name
()
<<
reinterpret_cast
<
void
*>
(
var
);
<<
" via "
<<
var
->
Name
()
<<
reinterpret_cast
<
void
*>
(
var
);
adj_list
[
n
].
insert
(
adj_n
);
adj_list
[
n
].
insert
(
adj_n
);
...
@@ -163,10 +164,10 @@ size_t GraphNum(const Graph &graph) {
...
@@ -163,10 +164,10 @@ size_t GraphNum(const Graph &graph) {
graph_nodes
.
emplace_back
(
g_nodes
);
graph_nodes
.
emplace_back
(
g_nodes
);
}
}
if
(
VLOG_IS_ON
(
10
))
{
if
(
VLOG_IS_ON
(
10
0
))
{
VLOG
(
10
)
<<
"graph_num: "
<<
graph_nodes
.
size
();
VLOG
(
10
0
)
<<
"graph_num: "
<<
graph_nodes
.
size
();
for
(
auto
&
g_n
:
graph_nodes
)
{
for
(
auto
&
g_n
:
graph_nodes
)
{
VLOG
(
10
)
<<
"graph_nodes: "
<<
g_n
.
size
();
VLOG
(
10
0
)
<<
"graph_nodes: "
<<
g_n
.
size
();
if
(
g_n
.
size
()
<
10
)
{
if
(
g_n
.
size
()
<
10
)
{
std
::
stringstream
out
;
std
::
stringstream
out
;
for
(
auto
&
node
:
g_n
)
{
for
(
auto
&
node
:
g_n
)
{
...
@@ -180,7 +181,7 @@ size_t GraphNum(const Graph &graph) {
...
@@ -180,7 +181,7 @@ size_t GraphNum(const Graph &graph) {
}
}
out
<<
"]"
;
out
<<
"]"
;
}
}
VLOG
(
10
)
<<
out
.
str
();
VLOG
(
10
0
)
<<
out
.
str
();
}
}
}
}
}
}
...
...
paddle/fluid/framework/ir/graph_pattern_detector.cc
浏览文件 @
d231e550
...
@@ -12,6 +12,7 @@
...
@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and
// See the License for the specific language governing permissions and
// limitations under the License.
// limitations under the License.
#include <algorithm>
#include <array>
#include <array>
#include <string>
#include <string>
#include <vector>
#include <vector>
...
@@ -91,19 +92,19 @@ void GraphPatternDetector::operator()(Graph *graph,
...
@@ -91,19 +92,19 @@ void GraphPatternDetector::operator()(Graph *graph,
PrettyLogEndl
(
Style
::
detail
(),
"--- detect %d subgraphs"
,
subgraphs
.
size
());
PrettyLogEndl
(
Style
::
detail
(),
"--- detect %d subgraphs"
,
subgraphs
.
size
());
int
id
=
0
;
int
id
=
0
;
for
(
auto
&
g
:
subgraphs
)
{
for
(
auto
&
g
:
subgraphs
)
{
VLOG
(
3
)
<<
"optimizing #"
<<
id
++
<<
" subgraph"
;
VLOG
(
3
0
)
<<
"optimizing #"
<<
id
++
<<
" subgraph"
;
handler
(
g
,
graph
);
handler
(
g
,
graph
);
}
}
}
}
bool
GraphPatternDetector
::
MarkPDNodesInGraph
(
const
ir
::
Graph
&
graph
)
{
bool
GraphPatternDetector
::
MarkPDNodesInGraph
(
const
ir
::
Graph
&
graph
)
{
VLOG
(
3
)
<<
"mark pdnodes in graph"
;
VLOG
(
3
0
)
<<
"mark pdnodes in graph"
;
if
(
graph
.
Nodes
().
empty
())
return
false
;
if
(
graph
.
Nodes
().
empty
())
return
false
;
for
(
auto
&
node
:
GraphTraits
::
DFS
(
graph
))
{
for
(
auto
&
node
:
GraphTraits
::
DFS
(
graph
))
{
for
(
const
auto
&
pdnode
:
pattern_
.
nodes
())
{
for
(
const
auto
&
pdnode
:
pattern_
.
nodes
())
{
if
(
pdnode
->
Tell
(
&
node
))
{
if
(
pdnode
->
Tell
(
&
node
))
{
VLOG
(
4
)
<<
"pdnode "
<<
pdnode
->
name
()
<<
" marked"
;
VLOG
(
4
0
)
<<
"pdnode "
<<
pdnode
->
name
()
<<
" marked"
;
pdnodes2nodes_
[
pdnode
.
get
()].
insert
(
&
node
);
pdnodes2nodes_
[
pdnode
.
get
()].
insert
(
&
node
);
}
}
}
}
...
@@ -111,7 +112,7 @@ bool GraphPatternDetector::MarkPDNodesInGraph(const ir::Graph &graph) {
...
@@ -111,7 +112,7 @@ bool GraphPatternDetector::MarkPDNodesInGraph(const ir::Graph &graph) {
// Check to early stop if some PDNode can't find matched Node.
// Check to early stop if some PDNode can't find matched Node.
for
(
auto
&
pdnode
:
pattern_
.
nodes
())
{
for
(
auto
&
pdnode
:
pattern_
.
nodes
())
{
if
(
!
pdnodes2nodes_
.
count
(
pdnode
.
get
()))
{
if
(
!
pdnodes2nodes_
.
count
(
pdnode
.
get
()))
{
VLOG
(
4
)
<<
pdnode
->
name
()
<<
" can't find matched Node, early stop"
;
VLOG
(
4
0
)
<<
pdnode
->
name
()
<<
" can't find matched Node, early stop"
;
// return false;
// return false;
}
}
}
}
...
@@ -120,7 +121,7 @@ bool GraphPatternDetector::MarkPDNodesInGraph(const ir::Graph &graph) {
...
@@ -120,7 +121,7 @@ bool GraphPatternDetector::MarkPDNodesInGraph(const ir::Graph &graph) {
GetMarkedNodes
(
const_cast
<
Graph
*>
(
&
graph
)).
insert
(
n
);
GetMarkedNodes
(
const_cast
<
Graph
*>
(
&
graph
)).
insert
(
n
);
}
}
}
}
VLOG
(
3
)
<<
pdnodes2nodes_
.
size
()
<<
" nodes marked"
;
VLOG
(
3
0
)
<<
pdnodes2nodes_
.
size
()
<<
" nodes marked"
;
return
!
pdnodes2nodes_
.
empty
();
return
!
pdnodes2nodes_
.
empty
();
}
}
...
@@ -213,7 +214,7 @@ GraphPatternDetector::DetectPatterns() {
...
@@ -213,7 +214,7 @@ GraphPatternDetector::DetectPatterns() {
// Extend a PDNode to subgraphs by deducing the connection relations defined
// Extend a PDNode to subgraphs by deducing the connection relations defined
// in edges of PDNodes.
// in edges of PDNodes.
for
(
const
auto
&
edge
:
pattern_
.
edges
())
{
for
(
const
auto
&
edge
:
pattern_
.
edges
())
{
VLOG
(
4
)
<<
"check "
<<
edge
.
first
->
name
()
<<
" -> "
<<
edge
.
second
->
name
();
VLOG
(
4
0
)
<<
"check "
<<
edge
.
first
->
name
()
<<
" -> "
<<
edge
.
second
->
name
();
// TODO(Superjomn) Fix bug here, the groups might be duplicate here.
// TODO(Superjomn) Fix bug here, the groups might be duplicate here.
// Each role has two PDNodes, which indicates two roles.
// Each role has two PDNodes, which indicates two roles.
// Detect two Nodes that can match these two roles and they are connected.
// Detect two Nodes that can match these two roles and they are connected.
...
@@ -224,7 +225,7 @@ GraphPatternDetector::DetectPatterns() {
...
@@ -224,7 +225,7 @@ GraphPatternDetector::DetectPatterns() {
// source -> target
// source -> target
for
(
Node
*
source
:
pdnodes2nodes_
[
edge
.
first
])
{
for
(
Node
*
source
:
pdnodes2nodes_
[
edge
.
first
])
{
for
(
Node
*
target
:
pdnodes2nodes_
[
edge
.
second
])
{
for
(
Node
*
target
:
pdnodes2nodes_
[
edge
.
second
])
{
VLOG
(
8
)
<<
"check "
<<
source
->
id
()
<<
" -- "
<<
target
->
id
();
VLOG
(
8
0
)
<<
"check "
<<
source
->
id
()
<<
" -- "
<<
target
->
id
();
// TODO(Superjomn) add some prune strategies.
// TODO(Superjomn) add some prune strategies.
for
(
const
auto
&
group
:
pre_groups
)
{
for
(
const
auto
&
group
:
pre_groups
)
{
HitGroup
new_group
=
group
;
HitGroup
new_group
=
group
;
...
@@ -240,12 +241,13 @@ GraphPatternDetector::DetectPatterns() {
...
@@ -240,12 +241,13 @@ GraphPatternDetector::DetectPatterns() {
}
}
}
}
}
}
VLOG
(
3
)
<<
"step "
<<
step
<<
" get records: "
<<
cur_groups
.
size
();
VLOG
(
3
0
)
<<
"step "
<<
step
<<
" get records: "
<<
cur_groups
.
size
();
for
(
auto
&
group
:
cur_groups
)
{
for
(
auto
&
group
:
cur_groups
)
{
for
(
auto
&
item
:
group
.
roles
)
{
for
(
auto
&
item
:
group
.
roles
)
{
VLOG
(
4
)
<<
"node "
<<
item
.
second
->
id
()
<<
" as "
<<
item
.
first
->
name
();
VLOG
(
40
)
<<
"node "
<<
item
.
second
->
id
()
<<
" as "
<<
item
.
first
->
name
();
}
}
VLOG
(
4
)
<<
"========================================================="
;
VLOG
(
4
0
)
<<
"========================================================="
;
}
}
}
}
...
...
paddle/fluid/framework/ir/graph_viz_pass.cc
浏览文件 @
d231e550
...
@@ -41,7 +41,7 @@ std::string FormatName(const Node* node) {
...
@@ -41,7 +41,7 @@ std::string FormatName(const Node* node) {
std
::
unique_ptr
<
ir
::
Graph
>
GraphVizPass
::
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
GraphVizPass
::
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
{
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
{
const
std
::
string
graph_viz_path
=
Get
<
std
::
string
>
(
kGraphVizPath
);
const
std
::
string
graph_viz_path
=
Get
<
std
::
string
>
(
kGraphVizPath
);
VLOG
(
3
)
<<
"draw IR graph viz to "
<<
graph_viz_path
;
VLOG
(
3
0
)
<<
"draw IR graph viz to "
<<
graph_viz_path
;
std
::
unique_ptr
<
std
::
ostream
>
fout
(
new
std
::
ofstream
(
graph_viz_path
));
std
::
unique_ptr
<
std
::
ostream
>
fout
(
new
std
::
ofstream
(
graph_viz_path
));
PADDLE_ENFORCE
(
fout
->
good
());
PADDLE_ENFORCE
(
fout
->
good
());
std
::
ostream
&
sout
=
*
fout
;
std
::
ostream
&
sout
=
*
fout
;
...
...
paddle/fluid/framework/ir/mkldnn_placement_pass.cc
浏览文件 @
d231e550
...
@@ -20,7 +20,7 @@ namespace ir {
...
@@ -20,7 +20,7 @@ namespace ir {
std
::
unique_ptr
<
ir
::
Graph
>
MKLDNNPlacementPass
::
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
MKLDNNPlacementPass
::
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
{
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
{
VLOG
(
3
)
<<
"Aplies MKL-DNN placement strategy."
;
VLOG
(
3
0
)
<<
"Aplies MKL-DNN placement strategy."
;
for
(
const
Node
*
n
:
graph
->
Nodes
())
{
for
(
const
Node
*
n
:
graph
->
Nodes
())
{
if
(
n
->
IsOp
()
&&
n
->
Op
()
->
HasAttr
(
"use_mkldnn"
))
{
if
(
n
->
IsOp
()
&&
n
->
Op
()
->
HasAttr
(
"use_mkldnn"
))
{
n
->
Op
()
->
SetAttr
(
"use_mkldnn"
,
true
);
n
->
Op
()
->
SetAttr
(
"use_mkldnn"
,
true
);
...
...
paddle/fluid/framework/ir/multi_batch_merge_pass.cc
浏览文件 @
d231e550
...
@@ -62,7 +62,7 @@ VarDesc UpdateGradVarDesc(
...
@@ -62,7 +62,7 @@ VarDesc UpdateGradVarDesc(
string
::
Sprintf
(
"%s.repeat.%d"
,
var_desc
->
Name
(),
repeat
);
string
::
Sprintf
(
"%s.repeat.%d"
,
var_desc
->
Name
(),
repeat
);
VarDesc
repeated_var
=
CopyVarDesc
(
var_desc
);
VarDesc
repeated_var
=
CopyVarDesc
(
var_desc
);
repeated_var
.
SetName
(
new_gname
);
repeated_var
.
SetName
(
new_gname
);
VLOG
(
3
)
<<
"update "
<<
var_desc
->
Name
()
<<
" to repeat "
<<
repeat
;
VLOG
(
3
0
)
<<
"update "
<<
var_desc
->
Name
()
<<
" to repeat "
<<
repeat
;
return
repeated_var
;
return
repeated_var
;
}
}
return
*
var_desc
;
return
*
var_desc
;
...
@@ -78,7 +78,7 @@ std::unique_ptr<Graph> BatchMergePass::ApplyImpl(
...
@@ -78,7 +78,7 @@ std::unique_ptr<Graph> BatchMergePass::ApplyImpl(
std
::
vector
<
ir
::
Node
*>
nodes
=
TopologySortOperations
(
*
graph
);
std
::
vector
<
ir
::
Node
*>
nodes
=
TopologySortOperations
(
*
graph
);
auto
origin_nodes
=
graph
->
ReleaseNodes
();
auto
origin_nodes
=
graph
->
ReleaseNodes
();
VLOG
(
3
)
<<
"origin nodes count: "
<<
origin_nodes
.
size
();
VLOG
(
3
0
)
<<
"origin nodes count: "
<<
origin_nodes
.
size
();
ir
::
Graph
&
result
=
*
graph
;
ir
::
Graph
&
result
=
*
graph
;
// 1. record op nodes of different roles
// 1. record op nodes of different roles
...
@@ -137,7 +137,7 @@ std::unique_ptr<Graph> BatchMergePass::ApplyImpl(
...
@@ -137,7 +137,7 @@ std::unique_ptr<Graph> BatchMergePass::ApplyImpl(
"%s.repeat.%d"
,
repeated_op
.
Input
(
"Variance"
)[
0
],
i
);
"%s.repeat.%d"
,
repeated_op
.
Input
(
"Variance"
)[
0
],
i
);
bn_vars_need_rename
.
insert
(
repeated_op
.
Input
(
"Mean"
)[
0
]);
bn_vars_need_rename
.
insert
(
repeated_op
.
Input
(
"Mean"
)[
0
]);
bn_vars_need_rename
.
insert
(
repeated_op
.
Input
(
"Variance"
)[
0
]);
bn_vars_need_rename
.
insert
(
repeated_op
.
Input
(
"Variance"
)[
0
]);
VLOG
(
3
)
<<
"renaming "
<<
repeated_op
.
Input
(
"Mean"
)[
0
]
<<
" to "
VLOG
(
3
0
)
<<
"renaming "
<<
repeated_op
.
Input
(
"Mean"
)[
0
]
<<
" to "
<<
new_mean_name
;
<<
new_mean_name
;
repeated_op
.
RenameInput
(
repeated_op
.
Input
(
"Mean"
)[
0
],
new_mean_name
);
repeated_op
.
RenameInput
(
repeated_op
.
Input
(
"Mean"
)[
0
],
new_mean_name
);
repeated_op
.
RenameInput
(
repeated_op
.
Input
(
"Variance"
)[
0
],
new_var_name
);
repeated_op
.
RenameInput
(
repeated_op
.
Input
(
"Variance"
)[
0
],
new_var_name
);
...
...
paddle/fluid/framework/ir/pass.h
浏览文件 @
d231e550
...
@@ -76,7 +76,7 @@ class Pass {
...
@@ -76,7 +76,7 @@ class Pass {
attr_name
);
attr_name
);
attrs_
[
attr_name
]
=
attr
;
attrs_
[
attr_name
]
=
attr
;
attr_dels_
[
attr_name
]
=
[
attr
,
attr_name
]()
{
attr_dels_
[
attr_name
]
=
[
attr
,
attr_name
]()
{
VLOG
(
3
)
<<
"deleting "
<<
attr_name
;
VLOG
(
3
0
)
<<
"deleting "
<<
attr_name
;
delete
attr
;
delete
attr
;
};
};
}
}
...
...
paddle/fluid/framework/ir/seq_concat_fc_fuse_pass.cc
浏览文件 @
d231e550
...
@@ -12,10 +12,13 @@
...
@@ -12,10 +12,13 @@
// See the License for the specific language governing permissions and
// See the License for the specific language governing permissions and
// limitations under the License.
// limitations under the License.
#include "paddle/fluid/framework/ir/seq_concat_fc_fuse_pass.h"
#include <set>
#include <string>
#include "paddle/fluid/framework/ir/fuse_pass_base.h"
#include "paddle/fluid/framework/ir/fuse_pass_base.h"
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
#include "paddle/fluid/framework/ir/graph_viz_pass.h"
#include "paddle/fluid/framework/ir/graph_viz_pass.h"
#include "paddle/fluid/framework/ir/seq_concat_fc_fuse_pass.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/lod_tensor.h"
namespace
paddle
{
namespace
paddle
{
...
@@ -159,10 +162,7 @@ PDNode* BuildFCPattern(PDPattern* pattern, PDNode* fc_x) {
...
@@ -159,10 +162,7 @@ PDNode* BuildFCPattern(PDPattern* pattern, PDNode* fc_x) {
std
::
set
<
std
::
string
>
acts
({
"sigmoid"
,
"tanh"
,
"relu"
,
"identity"
});
std
::
set
<
std
::
string
>
acts
({
"sigmoid"
,
"tanh"
,
"relu"
,
"identity"
});
PDNode
*
act
=
pattern
->
NewNode
(
PDNode
*
act
=
pattern
->
NewNode
(
[
=
](
Node
*
x
)
{
[
=
](
Node
*
x
)
{
return
x
&&
x
->
IsOp
()
&&
acts
.
count
(
x
->
Op
()
->
Type
());
},
return
x
&&
x
->
IsOp
()
&&
acts
.
count
(
x
->
Op
()
->
Type
());
},
"act"
);
"act"
);
PDNode
*
fc_out
=
pattern
->
NewNode
(
PDNode
*
fc_out
=
pattern
->
NewNode
(
...
@@ -196,7 +196,7 @@ std::unique_ptr<ir::Graph> SeqConcatFcFusePass::ApplyImpl(
...
@@ -196,7 +196,7 @@ std::unique_ptr<ir::Graph> SeqConcatFcFusePass::ApplyImpl(
detector
(
graph
.
get
(),
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
detector
(
graph
.
get
(),
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
graph
)
{
Graph
*
graph
)
{
VLOG
(
4
)
<<
"get one concat pattern"
;
VLOG
(
4
0
)
<<
"get one concat pattern"
;
// fc
// fc
GET_NODE
(
fc_w
,
detector
.
pattern
());
GET_NODE
(
fc_w
,
detector
.
pattern
());
GET_NODE
(
fc_bias
,
detector
.
pattern
());
GET_NODE
(
fc_bias
,
detector
.
pattern
());
...
...
paddle/fluid/framework/ir/seqconv_eltadd_relu_fuse_pass.cc
浏览文件 @
d231e550
...
@@ -60,7 +60,7 @@ int BuildFusion(Graph* graph, const std::string& name_scope, Scope* scope) {
...
@@ -60,7 +60,7 @@ int BuildFusion(Graph* graph, const std::string& name_scope, Scope* scope) {
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
Graph
*
g
)
{
VLOG
(
4
)
<<
"handle SeqConv EltAdd Relu fuse"
;
VLOG
(
4
0
)
<<
"handle SeqConv EltAdd Relu fuse"
;
GET_IR_NODE_FROM_SUBGRAPH
(
seqconv
,
seqconv
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
seqconv
,
seqconv
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
seqconv_weight
,
seqconv_weight
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
seqconv_weight
,
seqconv_weight
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
seqconv_out
,
seqconv_out
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
seqconv_out
,
seqconv_out
,
fuse_pattern
);
...
...
paddle/fluid/framework/lod_rank_table.cc
浏览文件 @
d231e550
...
@@ -31,7 +31,7 @@ void LoDRankTable::Reset(const LoD& lod, size_t level) {
...
@@ -31,7 +31,7 @@ void LoDRankTable::Reset(const LoD& lod, size_t level) {
TableItem
item
;
TableItem
item
;
item
.
index
=
i
;
item
.
index
=
i
;
item
.
length
=
vec
[
i
+
1
]
-
vec
[
i
];
item
.
length
=
vec
[
i
+
1
]
-
vec
[
i
];
VLOG
(
10
)
<<
"Add item to rank table "
<<
item
.
index
<<
" "
<<
item
.
length
;
VLOG
(
10
0
)
<<
"Add item to rank table "
<<
item
.
index
<<
" "
<<
item
.
length
;
items_
.
emplace_back
(
item
);
items_
.
emplace_back
(
item
);
}
}
// NOTE(yuyang18):
// NOTE(yuyang18):
...
...
paddle/fluid/framework/mixed_vector_test.cc
浏览文件 @
d231e550
...
@@ -51,7 +51,7 @@ TEST(mixed_vector, InitWithCount) {
...
@@ -51,7 +51,7 @@ TEST(mixed_vector, InitWithCount) {
TEST
(
mixed_vector
,
ForEach
)
{
TEST
(
mixed_vector
,
ForEach
)
{
vec
<
int
>
tmp
;
vec
<
int
>
tmp
;
for
(
auto
&
v
:
tmp
)
{
for
(
auto
&
v
:
tmp
)
{
VLOG
(
3
)
<<
v
;
VLOG
(
3
0
)
<<
v
;
}
}
}
}
...
...
paddle/fluid/framework/naive_executor.cc
浏览文件 @
d231e550
...
@@ -71,7 +71,7 @@ void NaiveExecutor::Prepare(Scope *parent_scope,
...
@@ -71,7 +71,7 @@ void NaiveExecutor::Prepare(Scope *parent_scope,
void
NaiveExecutor
::
Run
()
{
void
NaiveExecutor
::
Run
()
{
for
(
auto
&
op
:
ops_
)
{
for
(
auto
&
op
:
ops_
)
{
VLOG
(
4
)
<<
"run "
<<
op
->
Type
();
VLOG
(
4
0
)
<<
"run "
<<
op
->
Type
();
op
->
Run
(
*
scope_
,
place_
);
op
->
Run
(
*
scope_
,
place_
);
}
}
}
}
...
@@ -95,12 +95,12 @@ void NaiveExecutor::CreateVariables(const ProgramDesc &desc, Scope *scope,
...
@@ -95,12 +95,12 @@ void NaiveExecutor::CreateVariables(const ProgramDesc &desc, Scope *scope,
if
(
var
->
Persistable
())
{
if
(
var
->
Persistable
())
{
auto
*
ptr
=
const_cast
<
Scope
*>
(
ancestor_scope
)
->
Var
(
var
->
Name
());
auto
*
ptr
=
const_cast
<
Scope
*>
(
ancestor_scope
)
->
Var
(
var
->
Name
());
InitializeVariable
(
ptr
,
var
->
GetType
());
InitializeVariable
(
ptr
,
var
->
GetType
());
VLOG
(
3
)
<<
"Create Variable "
<<
var
->
Name
()
VLOG
(
3
0
)
<<
"Create Variable "
<<
var
->
Name
()
<<
" global, which pointer is "
<<
ptr
;
<<
" global, which pointer is "
<<
ptr
;
}
else
{
// Create temporary variables in local scope.
}
else
{
// Create temporary variables in local scope.
auto
*
ptr
=
scope
->
Var
(
var
->
Name
());
auto
*
ptr
=
scope
->
Var
(
var
->
Name
());
InitializeVariable
(
ptr
,
var
->
GetType
());
InitializeVariable
(
ptr
,
var
->
GetType
());
VLOG
(
3
)
<<
"Create Variable "
<<
var
->
Name
()
VLOG
(
3
0
)
<<
"Create Variable "
<<
var
->
Name
()
<<
" locally, which pointer is "
<<
ptr
;
<<
" locally, which pointer is "
<<
ptr
;
}
}
}
}
...
@@ -108,7 +108,7 @@ void NaiveExecutor::CreateVariables(const ProgramDesc &desc, Scope *scope,
...
@@ -108,7 +108,7 @@ void NaiveExecutor::CreateVariables(const ProgramDesc &desc, Scope *scope,
for
(
auto
&
var
:
global_block
.
AllVars
())
{
for
(
auto
&
var
:
global_block
.
AllVars
())
{
auto
*
ptr
=
scope
->
Var
(
var
->
Name
());
auto
*
ptr
=
scope
->
Var
(
var
->
Name
());
InitializeVariable
(
ptr
,
var
->
GetType
());
InitializeVariable
(
ptr
,
var
->
GetType
());
VLOG
(
3
)
<<
"Create variable "
<<
var
->
Name
()
<<
", which pointer is "
VLOG
(
3
0
)
<<
"Create variable "
<<
var
->
Name
()
<<
", which pointer is "
<<
ptr
;
<<
ptr
;
}
}
}
}
...
...
paddle/fluid/framework/op_desc.cc
浏览文件 @
d231e550
...
@@ -82,7 +82,7 @@ class CompileTimeInferShapeContext : public InferShapeContext {
...
@@ -82,7 +82,7 @@ class CompileTimeInferShapeContext : public InferShapeContext {
auto
*
in_var
=
block_
.
FindVarRecursive
(
Inputs
(
in
)[
i
]);
auto
*
in_var
=
block_
.
FindVarRecursive
(
Inputs
(
in
)[
i
]);
auto
*
out_var
=
block_
.
FindVarRecursive
(
Outputs
(
out
)[
j
]);
auto
*
out_var
=
block_
.
FindVarRecursive
(
Outputs
(
out
)[
j
]);
if
(
in_var
->
GetType
()
!=
proto
::
VarType
::
LOD_TENSOR
)
{
if
(
in_var
->
GetType
()
!=
proto
::
VarType
::
LOD_TENSOR
)
{
VLOG
(
3
)
<<
"input "
<<
in
<<
" is not LodTensor"
;
VLOG
(
3
0
)
<<
"input "
<<
in
<<
" is not LodTensor"
;
return
;
return
;
}
}
out_var
->
SetLoDLevel
(
in_var
->
GetLoDLevel
());
out_var
->
SetLoDLevel
(
in_var
->
GetLoDLevel
());
...
@@ -241,31 +241,31 @@ void OpDesc::SetAttr(const std::string &name, const Attribute &v) {
...
@@ -241,31 +241,31 @@ void OpDesc::SetAttr(const std::string &name, const Attribute &v) {
const
proto
::
OpProto
::
Attr
&
attr
=
GetProtoAttr
(
name
);
const
proto
::
OpProto
::
Attr
&
attr
=
GetProtoAttr
(
name
);
switch
(
attr
.
type
())
{
switch
(
attr
.
type
())
{
case
proto
::
AttrType
::
BOOLEANS
:
{
case
proto
::
AttrType
::
BOOLEANS
:
{
VLOG
(
11
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
VLOG
(
11
0
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
<<
" from INTS to BOOLEANS"
;
<<
" from INTS to BOOLEANS"
;
this
->
attrs_
[
name
]
=
std
::
vector
<
bool
>
();
this
->
attrs_
[
name
]
=
std
::
vector
<
bool
>
();
break
;
break
;
}
}
case
proto
::
AttrType
::
INTS
:
{
case
proto
::
AttrType
::
INTS
:
{
VLOG
(
11
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
VLOG
(
11
0
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
<<
" from INTS to INTS"
;
<<
" from INTS to INTS"
;
this
->
attrs_
[
name
]
=
std
::
vector
<
int
>
();
this
->
attrs_
[
name
]
=
std
::
vector
<
int
>
();
break
;
break
;
}
}
case
proto
::
AttrType
::
FLOATS
:
{
case
proto
::
AttrType
::
FLOATS
:
{
VLOG
(
11
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
VLOG
(
11
0
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
<<
" from INTS to FLOATS"
;
<<
" from INTS to FLOATS"
;
this
->
attrs_
[
name
]
=
std
::
vector
<
float
>
();
this
->
attrs_
[
name
]
=
std
::
vector
<
float
>
();
break
;
break
;
}
}
case
proto
::
AttrType
::
STRINGS
:
{
case
proto
::
AttrType
::
STRINGS
:
{
VLOG
(
11
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
VLOG
(
11
0
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
<<
" from INTS to STRINGS"
;
<<
" from INTS to STRINGS"
;
this
->
attrs_
[
name
]
=
std
::
vector
<
std
::
string
>
();
this
->
attrs_
[
name
]
=
std
::
vector
<
std
::
string
>
();
break
;
break
;
}
}
case
proto
::
AttrType
::
BLOCKS
:
{
case
proto
::
AttrType
::
BLOCKS
:
{
VLOG
(
11
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
VLOG
(
11
0
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
<<
" from INTS to BLOCKS"
;
<<
" from INTS to BLOCKS"
;
this
->
SetBlocksAttr
(
name
,
std
::
vector
<
BlockDesc
*>
());
this
->
SetBlocksAttr
(
name
,
std
::
vector
<
BlockDesc
*>
());
return
;
return
;
...
@@ -499,13 +499,13 @@ void OpDesc::CheckAttrs() {
...
@@ -499,13 +499,13 @@ void OpDesc::CheckAttrs() {
}
}
void
OpDesc
::
InferShape
(
const
BlockDesc
&
block
)
const
{
void
OpDesc
::
InferShape
(
const
BlockDesc
&
block
)
const
{
VLOG
(
3
)
<<
"CompileTime infer shape on "
<<
Type
();
VLOG
(
3
0
)
<<
"CompileTime infer shape on "
<<
Type
();
InitInferShapeFuncs
();
InitInferShapeFuncs
();
auto
&
infer_shape
=
OpInfoMap
::
Instance
().
Get
(
this
->
Type
()).
infer_shape_
;
auto
&
infer_shape
=
OpInfoMap
::
Instance
().
Get
(
this
->
Type
()).
infer_shape_
;
PADDLE_ENFORCE
(
static_cast
<
bool
>
(
infer_shape
),
PADDLE_ENFORCE
(
static_cast
<
bool
>
(
infer_shape
),
"%s's infer_shape has not been registered"
,
this
->
Type
());
"%s's infer_shape has not been registered"
,
this
->
Type
());
CompileTimeInferShapeContext
ctx
(
*
this
,
block
);
CompileTimeInferShapeContext
ctx
(
*
this
,
block
);
if
(
VLOG_IS_ON
(
10
))
{
if
(
VLOG_IS_ON
(
10
0
))
{
std
::
ostringstream
sout
;
std
::
ostringstream
sout
;
auto
inames
=
this
->
InputArgumentNames
();
auto
inames
=
this
->
InputArgumentNames
();
sout
<<
" From ["
;
sout
<<
" From ["
;
...
@@ -516,7 +516,7 @@ void OpDesc::InferShape(const BlockDesc &block) const {
...
@@ -516,7 +516,7 @@ void OpDesc::InferShape(const BlockDesc &block) const {
std
::
copy
(
onames
.
begin
(),
onames
.
end
(),
std
::
copy
(
onames
.
begin
(),
onames
.
end
(),
std
::
ostream_iterator
<
std
::
string
>
(
sout
,
", "
));
std
::
ostream_iterator
<
std
::
string
>
(
sout
,
", "
));
sout
<<
"]"
;
sout
<<
"]"
;
VLOG
(
10
)
<<
sout
.
str
();
VLOG
(
10
0
)
<<
sout
.
str
();
}
}
infer_shape
(
&
ctx
);
infer_shape
(
&
ctx
);
}
}
...
@@ -607,7 +607,7 @@ DDim CompileTimeInferShapeContext::GetDim(const std::string &name) const {
...
@@ -607,7 +607,7 @@ DDim CompileTimeInferShapeContext::GetDim(const std::string &name) const {
auto
shape
=
var
->
GetShape
();
auto
shape
=
var
->
GetShape
();
res
=
shape
.
empty
()
?
make_ddim
({
0UL
})
:
make_ddim
(
shape
);
res
=
shape
.
empty
()
?
make_ddim
({
0UL
})
:
make_ddim
(
shape
);
}
catch
(...)
{
}
catch
(...)
{
VLOG
(
5
)
<<
"GetDim of variable "
<<
name
<<
" error"
;
VLOG
(
5
0
)
<<
"GetDim of variable "
<<
name
<<
" error"
;
std
::
rethrow_exception
(
std
::
current_exception
());
std
::
rethrow_exception
(
std
::
current_exception
());
}
}
return
res
;
return
res
;
...
@@ -624,7 +624,7 @@ std::vector<DDim> CompileTimeInferShapeContext::GetRepeatedDims(
...
@@ -624,7 +624,7 @@ std::vector<DDim> CompileTimeInferShapeContext::GetRepeatedDims(
res
.
push_back
(
s
.
empty
()
?
make_ddim
({
0UL
})
:
make_ddim
(
s
));
res
.
push_back
(
s
.
empty
()
?
make_ddim
({
0UL
})
:
make_ddim
(
s
));
}
}
}
catch
(...)
{
}
catch
(...)
{
VLOG
(
5
)
<<
"GetRepeatedDim of variable "
<<
name
<<
" error."
;
VLOG
(
5
0
)
<<
"GetRepeatedDim of variable "
<<
name
<<
" error."
;
std
::
rethrow_exception
(
std
::
current_exception
());
std
::
rethrow_exception
(
std
::
current_exception
());
}
}
return
res
;
return
res
;
...
...
paddle/fluid/framework/op_registry.cc
浏览文件 @
d231e550
...
@@ -46,7 +46,7 @@ static VariableNameMap ConvertOpDescVarsToVarNameMap(
...
@@ -46,7 +46,7 @@ static VariableNameMap ConvertOpDescVarsToVarNameMap(
std
::
unique_ptr
<
OperatorBase
>
OpRegistry
::
CreateOp
(
std
::
unique_ptr
<
OperatorBase
>
OpRegistry
::
CreateOp
(
const
proto
::
OpDesc
&
op_desc
)
{
const
proto
::
OpDesc
&
op_desc
)
{
VLOG
(
1
)
<<
"CreateOp directly from OpDesc is deprecated. It should only be"
VLOG
(
1
0
)
<<
"CreateOp directly from OpDesc is deprecated. It should only be"
"used in unit tests. Use CreateOp(const OpDesc& op_desc) "
"used in unit tests. Use CreateOp(const OpDesc& op_desc) "
"instead."
;
"instead."
;
VariableNameMap
inputs
=
ConvertOpDescVarsToVarNameMap
(
op_desc
.
inputs
());
VariableNameMap
inputs
=
ConvertOpDescVarsToVarNameMap
(
op_desc
.
inputs
());
...
...
paddle/fluid/framework/operator.cc
浏览文件 @
d231e550
...
@@ -140,7 +140,7 @@ static LoD GetLoD(const Scope& scope, const std::string& name) {
...
@@ -140,7 +140,7 @@ static LoD GetLoD(const Scope& scope, const std::string& name) {
}
}
void
OperatorBase
::
Run
(
const
Scope
&
scope
,
const
platform
::
Place
&
place
)
{
void
OperatorBase
::
Run
(
const
Scope
&
scope
,
const
platform
::
Place
&
place
)
{
VLOG
(
4
)
<<
place
<<
" "
<<
DebugStringEx
(
&
scope
);
VLOG
(
4
0
)
<<
place
<<
" "
<<
DebugStringEx
(
&
scope
);
if
(
platform
::
is_gpu_place
(
place
))
{
if
(
platform
::
is_gpu_place
(
place
))
{
#ifndef PADDLE_WITH_CUDA
#ifndef PADDLE_WITH_CUDA
PADDLE_THROW
(
"Cannot run operator on place %s"
,
place
);
PADDLE_THROW
(
"Cannot run operator on place %s"
,
place
);
...
@@ -160,7 +160,7 @@ void OperatorBase::Run(const Scope& scope, const platform::Place& place) {
...
@@ -160,7 +160,7 @@ void OperatorBase::Run(const Scope& scope, const platform::Place& place) {
}
else
{
}
else
{
RunImpl
(
scope
,
place
);
RunImpl
(
scope
,
place
);
}
}
VLOG
(
3
)
<<
place
<<
" "
<<
DebugStringEx
(
&
scope
);
VLOG
(
3
0
)
<<
place
<<
" "
<<
DebugStringEx
(
&
scope
);
}
}
bool
OperatorBase
::
HasInputs
(
const
std
::
string
&
name
)
const
{
bool
OperatorBase
::
HasInputs
(
const
std
::
string
&
name
)
const
{
...
@@ -259,6 +259,8 @@ std::string OperatorBase::DebugStringEx(const Scope* scope) const {
...
@@ -259,6 +259,8 @@ std::string OperatorBase::DebugStringEx(const Scope* scope) const {
if
(
row_size
>=
0
)
{
if
(
row_size
>=
0
)
{
ss
<<
"[row_size="
<<
row_size
<<
"]"
;
ss
<<
"[row_size="
<<
row_size
<<
"]"
;
}
}
std
::
string
dtype
=
GetDtype
(
*
scope
,
output
.
second
[
i
]);
ss
<<
":"
<<
dtype
;
ss
<<
"["
<<
GetDims
(
*
scope
,
var_name
,
true
)
<<
"]"
;
ss
<<
"["
<<
GetDims
(
*
scope
,
var_name
,
true
)
<<
"]"
;
ss
<<
"("
<<
GetLoD
(
*
scope
,
var_name
)
<<
")"
;
ss
<<
"("
<<
GetLoD
(
*
scope
,
var_name
)
<<
")"
;
}
}
...
@@ -715,14 +717,14 @@ void OperatorWithKernel::RunImpl(const Scope& scope,
...
@@ -715,14 +717,14 @@ void OperatorWithKernel::RunImpl(const Scope& scope,
auto
expected_kernel_key
=
auto
expected_kernel_key
=
this
->
GetExpectedKernelType
(
ExecutionContext
(
*
this
,
scope
,
*
dev_ctx
));
this
->
GetExpectedKernelType
(
ExecutionContext
(
*
this
,
scope
,
*
dev_ctx
));
VLOG
(
3
)
<<
"expected_kernel_key:"
<<
expected_kernel_key
;
VLOG
(
3
0
)
<<
"expected_kernel_key:"
<<
expected_kernel_key
;
auto
kernel_iter
=
kernels
.
find
(
expected_kernel_key
);
auto
kernel_iter
=
kernels
.
find
(
expected_kernel_key
);
#ifdef PADDLE_WITH_MKLDNN
#ifdef PADDLE_WITH_MKLDNN
// workaround for missing MKLDNN kernel when FLAGS_use_mkldnn env var is set
// workaround for missing MKLDNN kernel when FLAGS_use_mkldnn env var is set
if
(
kernel_iter
==
kernels
.
end
()
&&
if
(
kernel_iter
==
kernels
.
end
()
&&
expected_kernel_key
.
library_type_
==
LibraryType
::
kMKLDNN
)
{
expected_kernel_key
.
library_type_
==
LibraryType
::
kMKLDNN
)
{
VLOG
(
3
)
<<
"missing MKLDNN kernel: fallbacking to PLAIN one"
;
VLOG
(
3
0
)
<<
"missing MKLDNN kernel: fallbacking to PLAIN one"
;
expected_kernel_key
.
library_type_
=
LibraryType
::
kPlain
;
expected_kernel_key
.
library_type_
=
LibraryType
::
kPlain
;
expected_kernel_key
.
data_layout_
=
DataLayout
::
kAnyLayout
;
expected_kernel_key
.
data_layout_
=
DataLayout
::
kAnyLayout
;
kernel_iter
=
kernels
.
find
(
expected_kernel_key
);
kernel_iter
=
kernels
.
find
(
expected_kernel_key
);
...
@@ -774,7 +776,8 @@ void OperatorWithKernel::TransferInplaceVarsBack(
...
@@ -774,7 +776,8 @@ void OperatorWithKernel::TransferInplaceVarsBack(
const
Scope
&
scope
,
const
std
::
vector
<
std
::
string
>&
inplace_vars
,
const
Scope
&
scope
,
const
std
::
vector
<
std
::
string
>&
inplace_vars
,
const
Scope
&
transfer_scope
)
const
{
const
Scope
&
transfer_scope
)
const
{
for
(
auto
&
var_name
:
inplace_vars
)
{
for
(
auto
&
var_name
:
inplace_vars
)
{
VLOG
(
3
)
<<
"share inplace var "
+
var_name
+
" back to it's original scope"
;
VLOG
(
30
)
<<
"share inplace var "
+
var_name
+
" back to it's original scope"
;
auto
*
original_tensor
=
auto
*
original_tensor
=
GetMutableLoDTensorOrSelectedRowsValueFromVar
(
scope
.
FindVar
(
var_name
));
GetMutableLoDTensorOrSelectedRowsValueFromVar
(
scope
.
FindVar
(
var_name
));
auto
*
var
=
transfer_scope
.
FindVar
(
var_name
);
auto
*
var
=
transfer_scope
.
FindVar
(
var_name
);
...
@@ -815,7 +818,7 @@ Scope* OperatorWithKernel::TryTransferData(
...
@@ -815,7 +818,7 @@ Scope* OperatorWithKernel::TryTransferData(
transfered_inplace_vars
->
emplace_back
(
var_name
);
transfered_inplace_vars
->
emplace_back
(
var_name
);
}
}
VLOG
(
3
)
<<
"Transform Variable "
<<
var_name
<<
" from "
VLOG
(
3
0
)
<<
"Transform Variable "
<<
var_name
<<
" from "
<<
kernel_type_for_var
<<
" to "
<<
expected_kernel_key
;
<<
kernel_type_for_var
<<
" to "
<<
expected_kernel_key
;
if
(
new_scope
==
nullptr
)
{
if
(
new_scope
==
nullptr
)
{
...
...
paddle/fluid/framework/parallel_executor.cc
浏览文件 @
d231e550
...
@@ -199,7 +199,7 @@ void ParallelExecutor::BCastParamsToDevices(
...
@@ -199,7 +199,7 @@ void ParallelExecutor::BCastParamsToDevices(
auto
&
main_tensor
=
main_var
->
Get
<
LoDTensor
>
();
auto
&
main_tensor
=
main_var
->
Get
<
LoDTensor
>
();
if
(
!
main_tensor
.
IsInitialized
())
{
if
(
!
main_tensor
.
IsInitialized
())
{
VLOG
(
3
)
<<
"one in var not inited, return!"
;
VLOG
(
3
0
)
<<
"one in var not inited, return!"
;
continue
;
continue
;
}
}
auto
&
dims
=
main_tensor
.
dims
();
auto
&
dims
=
main_tensor
.
dims
();
...
...
paddle/fluid/framework/scope.cc
浏览文件 @
d231e550
...
@@ -149,7 +149,7 @@ Variable* Scope::VarInternal(const std::string& name) {
...
@@ -149,7 +149,7 @@ Variable* Scope::VarInternal(const std::string& name) {
v
=
new
Variable
();
v
=
new
Variable
();
vars_
[
name
].
reset
(
v
);
vars_
[
name
].
reset
(
v
);
VLOG
(
3
)
<<
"Create variable "
<<
name
;
VLOG
(
3
0
)
<<
"Create variable "
<<
name
;
v
->
name_
=
&
(
vars_
.
find
(
name
)
->
first
);
v
->
name_
=
&
(
vars_
.
find
(
name
)
->
first
);
return
v
;
return
v
;
}
}
...
...
paddle/fluid/framework/selected_rows.cc
浏览文件 @
d231e550
...
@@ -176,7 +176,7 @@ void SelectedRows::Get(const framework::Tensor& ids, framework::Tensor* value,
...
@@ -176,7 +176,7 @@ void SelectedRows::Get(const framework::Tensor& ids, framework::Tensor* value,
PADDLE_ENFORCE
(
value
->
IsInitialized
(),
PADDLE_ENFORCE
(
value
->
IsInitialized
(),
"The value tensor should be initialized."
);
"The value tensor should be initialized."
);
if
(
ids
.
numel
()
==
0
)
{
if
(
ids
.
numel
()
==
0
)
{
VLOG
(
3
)
<<
"keys is empty, please check data!"
;
VLOG
(
3
0
)
<<
"keys is empty, please check data!"
;
}
else
{
}
else
{
int64_t
value_width
=
value_
->
numel
()
/
value_
->
dims
()[
0
];
int64_t
value_width
=
value_
->
numel
()
/
value_
->
dims
()[
0
];
PADDLE_ENFORCE_EQ
(
value_width
,
value
->
numel
()
/
value
->
dims
()[
0
],
PADDLE_ENFORCE_EQ
(
value_width
,
value
->
numel
()
/
value
->
dims
()[
0
],
...
...
paddle/fluid/framework/tensor_util.cc
浏览文件 @
d231e550
...
@@ -23,7 +23,7 @@ namespace framework {
...
@@ -23,7 +23,7 @@ namespace framework {
void
TensorCopy
(
const
Tensor
&
src
,
const
platform
::
Place
&
dst_place
,
void
TensorCopy
(
const
Tensor
&
src
,
const
platform
::
Place
&
dst_place
,
const
platform
::
DeviceContext
&
ctx
,
Tensor
*
dst
)
{
const
platform
::
DeviceContext
&
ctx
,
Tensor
*
dst
)
{
VLOG
(
3
)
<<
"TensorCopy "
<<
src
.
dims
()
<<
" from "
<<
src
.
place
()
<<
" to "
VLOG
(
3
0
)
<<
"TensorCopy "
<<
src
.
dims
()
<<
" from "
<<
src
.
place
()
<<
" to "
<<
dst_place
;
<<
dst_place
;
src
.
check_memory_size
();
src
.
check_memory_size
();
...
@@ -38,7 +38,7 @@ void TensorCopy(const Tensor& src, const platform::Place& dst_place,
...
@@ -38,7 +38,7 @@ void TensorCopy(const Tensor& src, const platform::Place& dst_place,
if
(
platform
::
is_cpu_place
(
src_place
)
&&
platform
::
is_cpu_place
(
dst_place
))
{
if
(
platform
::
is_cpu_place
(
src_place
)
&&
platform
::
is_cpu_place
(
dst_place
))
{
if
(
src_ptr
==
dst_ptr
)
{
if
(
src_ptr
==
dst_ptr
)
{
VLOG
(
3
)
<<
"Skip copy the same data async from "
<<
src_place
<<
" to "
VLOG
(
3
0
)
<<
"Skip copy the same data async from "
<<
src_place
<<
" to "
<<
dst_place
;
<<
dst_place
;
return
;
return
;
}
}
...
@@ -78,7 +78,7 @@ void TensorCopy(const Tensor& src, const platform::Place& dst_place,
...
@@ -78,7 +78,7 @@ void TensorCopy(const Tensor& src, const platform::Place& dst_place,
reinterpret_cast
<
const
platform
::
CUDADeviceContext
&>
(
ctx
).
stream
();
reinterpret_cast
<
const
platform
::
CUDADeviceContext
&>
(
ctx
).
stream
();
if
(
platform
::
is_same_place
(
src_place
,
dst_place
))
{
if
(
platform
::
is_same_place
(
src_place
,
dst_place
))
{
if
(
src_ptr
==
dst_ptr
)
{
if
(
src_ptr
==
dst_ptr
)
{
VLOG
(
3
)
<<
"Skip copy the same data async from "
<<
src_place
<<
" to "
VLOG
(
3
0
)
<<
"Skip copy the same data async from "
<<
src_place
<<
" to "
<<
dst_place
;
<<
dst_place
;
return
;
return
;
}
}
...
@@ -115,7 +115,7 @@ void TensorCopy(const Tensor& src, const platform::Place& dst_place,
...
@@ -115,7 +115,7 @@ void TensorCopy(const Tensor& src, const platform::Place& dst_place,
void
TensorCopySync
(
const
Tensor
&
src
,
const
platform
::
Place
&
dst_place
,
void
TensorCopySync
(
const
Tensor
&
src
,
const
platform
::
Place
&
dst_place
,
Tensor
*
dst
)
{
Tensor
*
dst
)
{
VLOG
(
3
)
<<
"TensorCopySync "
<<
src
.
dims
()
<<
" from "
<<
src
.
place
()
VLOG
(
3
0
)
<<
"TensorCopySync "
<<
src
.
dims
()
<<
" from "
<<
src
.
place
()
<<
" to "
<<
dst_place
;
<<
" to "
<<
dst_place
;
src
.
check_memory_size
();
src
.
check_memory_size
();
dst
->
Resize
(
src
.
dims
());
dst
->
Resize
(
src
.
dims
());
...
@@ -126,7 +126,7 @@ void TensorCopySync(const Tensor& src, const platform::Place& dst_place,
...
@@ -126,7 +126,7 @@ void TensorCopySync(const Tensor& src, const platform::Place& dst_place,
auto
size
=
src
.
numel
()
*
SizeOfType
(
src
.
type
());
auto
size
=
src
.
numel
()
*
SizeOfType
(
src
.
type
());
if
(
platform
::
is_cpu_place
(
src_place
)
&&
platform
::
is_cpu_place
(
dst_place
))
{
if
(
platform
::
is_cpu_place
(
src_place
)
&&
platform
::
is_cpu_place
(
dst_place
))
{
if
(
src_ptr
==
dst_ptr
)
{
if
(
src_ptr
==
dst_ptr
)
{
VLOG
(
3
)
<<
"Skip copy the same data from "
<<
src_place
<<
" to "
VLOG
(
3
0
)
<<
"Skip copy the same data from "
<<
src_place
<<
" to "
<<
dst_place
;
<<
dst_place
;
return
;
return
;
}
}
...
@@ -147,7 +147,7 @@ void TensorCopySync(const Tensor& src, const platform::Place& dst_place,
...
@@ -147,7 +147,7 @@ void TensorCopySync(const Tensor& src, const platform::Place& dst_place,
}
else
if
(
platform
::
is_gpu_place
(
src_place
)
&&
}
else
if
(
platform
::
is_gpu_place
(
src_place
)
&&
platform
::
is_gpu_place
(
dst_place
))
{
platform
::
is_gpu_place
(
dst_place
))
{
if
(
src_ptr
==
dst_ptr
&&
platform
::
is_same_place
(
src_place
,
dst_place
))
{
if
(
src_ptr
==
dst_ptr
&&
platform
::
is_same_place
(
src_place
,
dst_place
))
{
VLOG
(
3
)
<<
"Skip copy the same data from "
<<
src_place
<<
" to "
VLOG
(
3
0
)
<<
"Skip copy the same data from "
<<
src_place
<<
" to "
<<
dst_place
;
<<
dst_place
;
return
;
return
;
}
}
...
...
paddle/fluid/framework/threadpool.cc
浏览文件 @
d231e550
...
@@ -39,7 +39,7 @@ void ThreadPool::Init() {
...
@@ -39,7 +39,7 @@ void ThreadPool::Init() {
int
num_threads
=
std
::
thread
::
hardware_concurrency
();
int
num_threads
=
std
::
thread
::
hardware_concurrency
();
if
(
FLAGS_dist_threadpool_size
>
0
)
{
if
(
FLAGS_dist_threadpool_size
>
0
)
{
num_threads
=
FLAGS_dist_threadpool_size
;
num_threads
=
FLAGS_dist_threadpool_size
;
VLOG
(
1
)
<<
"set dist_threadpool_size to "
<<
num_threads
;
VLOG
(
1
0
)
<<
"set dist_threadpool_size to "
<<
num_threads
;
}
}
PADDLE_ENFORCE_GT
(
num_threads
,
0
);
PADDLE_ENFORCE_GT
(
num_threads
,
0
);
threadpool_
.
reset
(
new
ThreadPool
(
num_threads
));
threadpool_
.
reset
(
new
ThreadPool
(
num_threads
));
...
...
paddle/fluid/framework/var_desc.cc
浏览文件 @
d231e550
...
@@ -61,7 +61,7 @@ size_t VarDesc::GetTensorDescNum() const {
...
@@ -61,7 +61,7 @@ size_t VarDesc::GetTensorDescNum() const {
void
VarDesc
::
SetShapes
(
void
VarDesc
::
SetShapes
(
const
std
::
vector
<
std
::
vector
<
int64_t
>>
&
multiple_dims
)
{
const
std
::
vector
<
std
::
vector
<
int64_t
>>
&
multiple_dims
)
{
if
(
multiple_dims
.
size
()
!=
GetTensorDescNum
())
{
if
(
multiple_dims
.
size
()
!=
GetTensorDescNum
())
{
VLOG
(
3
)
<<
"WARNING: The number of given shapes("
<<
multiple_dims
.
size
()
VLOG
(
3
0
)
<<
"WARNING: The number of given shapes("
<<
multiple_dims
.
size
()
<<
") doesn't match the existing tensor number("
<<
") doesn't match the existing tensor number("
<<
GetTensorDescNum
()
<<
GetTensorDescNum
()
<<
"). The Reader is going to be reinitialized."
;
<<
"). The Reader is going to be reinitialized."
;
...
@@ -94,7 +94,7 @@ void VarDesc::SetDataType(proto::VarType::Type data_type) {
...
@@ -94,7 +94,7 @@ void VarDesc::SetDataType(proto::VarType::Type data_type) {
void
VarDesc
::
SetDataTypes
(
void
VarDesc
::
SetDataTypes
(
const
std
::
vector
<
proto
::
VarType
::
Type
>
&
multiple_data_type
)
{
const
std
::
vector
<
proto
::
VarType
::
Type
>
&
multiple_data_type
)
{
if
(
multiple_data_type
.
size
()
!=
GetTensorDescNum
())
{
if
(
multiple_data_type
.
size
()
!=
GetTensorDescNum
())
{
VLOG
(
3
)
<<
"WARNING: The number of given data types("
VLOG
(
3
0
)
<<
"WARNING: The number of given data types("
<<
multiple_data_type
.
size
()
<<
multiple_data_type
.
size
()
<<
") doesn't match the existing tensor number("
<<
") doesn't match the existing tensor number("
<<
GetTensorDescNum
()
<<
GetTensorDescNum
()
...
@@ -139,7 +139,7 @@ void VarDesc::SetLoDLevel(int32_t lod_level) {
...
@@ -139,7 +139,7 @@ void VarDesc::SetLoDLevel(int32_t lod_level) {
void
VarDesc
::
SetLoDLevels
(
const
std
::
vector
<
int32_t
>
&
multiple_lod_level
)
{
void
VarDesc
::
SetLoDLevels
(
const
std
::
vector
<
int32_t
>
&
multiple_lod_level
)
{
if
(
multiple_lod_level
.
size
()
!=
GetTensorDescNum
())
{
if
(
multiple_lod_level
.
size
()
!=
GetTensorDescNum
())
{
VLOG
(
3
)
<<
"WARNING: The number of given lod_levels("
VLOG
(
3
0
)
<<
"WARNING: The number of given lod_levels("
<<
multiple_lod_level
.
size
()
<<
multiple_lod_level
.
size
()
<<
") doesn't match the existing tensor number("
<<
") doesn't match the existing tensor number("
<<
GetTensorDescNum
()
<<
GetTensorDescNum
()
...
...
paddle/fluid/framework/var_type_inference.h
浏览文件 @
d231e550
...
@@ -13,6 +13,9 @@ See the License for the specific language governing permissions and
...
@@ -13,6 +13,9 @@ See the License for the specific language governing permissions and
limitations under the License. */
limitations under the License. */
#pragma once
#pragma once
#include <string>
#include "paddle/fluid/framework/block_desc.h"
#include "paddle/fluid/framework/op_desc.h"
#include "paddle/fluid/framework/type_defs.h"
#include "paddle/fluid/framework/type_defs.h"
namespace
paddle
{
namespace
paddle
{
...
@@ -24,5 +27,27 @@ class VarTypeInference {
...
@@ -24,5 +27,27 @@ class VarTypeInference {
virtual
void
operator
()(
const
OpDesc
&
op_desc
,
BlockDesc
*
block
)
const
=
0
;
virtual
void
operator
()(
const
OpDesc
&
op_desc
,
BlockDesc
*
block
)
const
=
0
;
};
};
class
PassInDtypeAndVarTypeToOutput
:
public
framework
::
VarTypeInference
{
public:
void
operator
()(
const
framework
::
OpDesc
&
op_desc
,
framework
::
BlockDesc
*
block
)
const
final
{
auto
in_out_var_names
=
this
->
GetInputOutputWithSameType
();
for
(
auto
&
i_o_n
:
in_out_var_names
)
{
auto
&
x_name
=
op_desc
.
Input
(
i_o_n
.
first
).
at
(
0
);
auto
&
out_name
=
op_desc
.
Output
(
i_o_n
.
second
).
at
(
0
);
auto
&
x
=
block
->
FindRecursiveOrCreateVar
(
x_name
);
auto
&
out
=
block
->
FindRecursiveOrCreateVar
(
out_name
);
out
.
SetType
(
x
.
GetType
());
out
.
SetDataType
(
x
.
GetDataType
());
}
}
protected:
virtual
std
::
unordered_map
<
std
::
string
,
std
::
string
>
GetInputOutputWithSameType
()
const
=
0
;
};
}
// namespace framework
}
// namespace framework
}
// namespace paddle
}
// namespace paddle
paddle/fluid/inference/analysis/analyzer.cc
浏览文件 @
d231e550
...
@@ -60,7 +60,7 @@ class DfgPassManagerImpl final : public DfgPassManager {
...
@@ -60,7 +60,7 @@ class DfgPassManagerImpl final : public DfgPassManager {
private:
private:
void
AddPass
(
const
std
::
string
&
name
,
AnalysisPass
*
pass
)
{
void
AddPass
(
const
std
::
string
&
name
,
AnalysisPass
*
pass
)
{
VLOG
(
3
)
<<
"Adding pass "
<<
name
;
VLOG
(
3
0
)
<<
"Adding pass "
<<
name
;
Register
(
name
,
pass
);
Register
(
name
,
pass
);
AddGraphvizDebugerPass
(
pass
);
AddGraphvizDebugerPass
(
pass
);
}
}
...
@@ -104,7 +104,7 @@ void Analyzer::Run(Argument* argument) {
...
@@ -104,7 +104,7 @@ void Analyzer::Run(Argument* argument) {
passes
.
push_back
(
"graph_viz_pass"
);
// add graphviz for debug.
passes
.
push_back
(
"graph_viz_pass"
);
// add graphviz for debug.
#ifdef PADDLE_WITH_MKLDNN
#ifdef PADDLE_WITH_MKLDNN
if
(
use_mkldnn_
)
{
if
(
use_mkldnn_
)
{
VLOG
(
3
)
<<
"Adding MKL-DNN placement pass"
;
VLOG
(
3
0
)
<<
"Adding MKL-DNN placement pass"
;
passes
.
push_back
(
"mkldnn_placement_pass"
);
passes
.
push_back
(
"mkldnn_placement_pass"
);
}
}
#endif
#endif
...
@@ -113,7 +113,9 @@ void Analyzer::Run(Argument* argument) {
...
@@ -113,7 +113,9 @@ void Analyzer::Run(Argument* argument) {
passes
.
push_back
(
"infer_clean_graph_pass"
);
passes
.
push_back
(
"infer_clean_graph_pass"
);
passes
.
push_back
(
"graph_viz_pass"
);
// add graphviz for debug.
passes
.
push_back
(
"graph_viz_pass"
);
// add graphviz for debug.
for
(
auto
&
pass
:
ir_passes_
)
{
for
(
auto
&
pass
:
ir_passes_
)
{
if
(
!
disabled_ir_passes_
.
count
(
pass
))
{
// skip mkldnn pass when use_mkldnn_ = false;
bool
skip_pass
=
(
!
use_mkldnn_
)
&&
pass
.
find
(
"mkldnn"
)
!=
std
::
string
::
npos
;
if
(
!
disabled_ir_passes_
.
count
(
pass
)
&&
!
skip_pass
)
{
passes
.
push_back
(
pass
);
passes
.
push_back
(
pass
);
passes
.
push_back
(
"graph_viz_pass"
);
// add graphviz for debug.
passes
.
push_back
(
"graph_viz_pass"
);
// add graphviz for debug.
}
}
...
...
paddle/fluid/inference/analysis/argument.h
浏览文件 @
d231e550
...
@@ -68,8 +68,8 @@ struct Argument {
...
@@ -68,8 +68,8 @@ struct Argument {
key
);
key
);
attrs_
[
key
]
=
data
;
attrs_
[
key
]
=
data
;
attr_deleters_
[
key
]
=
[
data
,
key
]()
{
attr_deleters_
[
key
]
=
[
data
,
key
]()
{
VLOG
(
3
)
<<
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
;
VLOG
(
3
0
)
<<
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
;
VLOG
(
3
)
<<
"argument delete attr: "
<<
key
;
VLOG
(
3
0
)
<<
"argument delete attr: "
<<
key
;
delete
data
;
delete
data
;
};
};
}
}
...
...
paddle/fluid/inference/analysis/data_flow_graph.cc
浏览文件 @
d231e550
...
@@ -132,7 +132,7 @@ void DataFlowGraph::Build(const framework::ir::Graph &graph) {
...
@@ -132,7 +132,7 @@ void DataFlowGraph::Build(const framework::ir::Graph &graph) {
Node
*
x
{
nullptr
};
Node
*
x
{
nullptr
};
if
(
ir_node
->
IsOp
())
{
if
(
ir_node
->
IsOp
())
{
PADDLE_ENFORCE
(
ir_node
->
Op
());
PADDLE_ENFORCE
(
ir_node
->
Op
());
VLOG
(
4
)
<<
"get op "
<<
ir_node
<<
" "
<<
ir_node
->
Name
();
VLOG
(
4
0
)
<<
"get op "
<<
ir_node
<<
" "
<<
ir_node
->
Name
();
x
=
nodes
.
Create
(
Node
::
Type
::
kFunction
);
x
=
nodes
.
Create
(
Node
::
Type
::
kFunction
);
x
->
attr
(
"ir_node"
).
Pointer
()
=
ir_node
;
x
->
attr
(
"ir_node"
).
Pointer
()
=
ir_node
;
PADDLE_ENFORCE
(
ir_node
->
Op
()
->
Proto
());
PADDLE_ENFORCE
(
ir_node
->
Op
()
->
Proto
());
...
@@ -141,7 +141,7 @@ void DataFlowGraph::Build(const framework::ir::Graph &graph) {
...
@@ -141,7 +141,7 @@ void DataFlowGraph::Build(const framework::ir::Graph &graph) {
}
else
if
(
ir_node
->
IsVar
())
{
}
else
if
(
ir_node
->
IsVar
())
{
// Not create a Node for IR ControlDepVar, considering Inference currently
// Not create a Node for IR ControlDepVar, considering Inference currently
// just used in single thread scenerio.
// just used in single thread scenerio.
VLOG
(
4
)
<<
"get var "
<<
ir_node
->
Name
();
VLOG
(
4
0
)
<<
"get var "
<<
ir_node
->
Name
();
x
=
nodes
.
Create
(
Node
::
Type
::
kValue
);
x
=
nodes
.
Create
(
Node
::
Type
::
kValue
);
x
->
attr
(
"ir_node"
).
Pointer
()
=
ir_node
;
x
->
attr
(
"ir_node"
).
Pointer
()
=
ir_node
;
x
->
SetName
(
ir_node
->
Name
());
x
->
SetName
(
ir_node
->
Name
());
...
@@ -151,9 +151,9 @@ void DataFlowGraph::Build(const framework::ir::Graph &graph) {
...
@@ -151,9 +151,9 @@ void DataFlowGraph::Build(const framework::ir::Graph &graph) {
}
}
ir_node_map
.
emplace
(
ir_node
,
x
);
ir_node_map
.
emplace
(
ir_node
,
x
);
}
}
VLOG
(
4
)
<<
"finish creating Nodes"
;
VLOG
(
4
0
)
<<
"finish creating Nodes"
;
VLOG
(
4
)
<<
"to create edge"
;
VLOG
(
4
0
)
<<
"to create edge"
;
// Create links
// Create links
for
(
auto
*
ir_node
:
graph
.
Nodes
())
{
for
(
auto
*
ir_node
:
graph
.
Nodes
())
{
auto
it
=
ir_node_map
.
find
(
ir_node
);
auto
it
=
ir_node_map
.
find
(
ir_node
);
...
@@ -175,7 +175,7 @@ void DataFlowGraph::Build(const framework::ir::Graph &graph) {
...
@@ -175,7 +175,7 @@ void DataFlowGraph::Build(const framework::ir::Graph &graph) {
"Can't deduce any inputs from the graph, Is the graph empty?"
);
"Can't deduce any inputs from the graph, Is the graph empty?"
);
ir_graph
=
&
graph
;
ir_graph
=
&
graph
;
VLOG
(
3
)
<<
"finished build from IR"
;
VLOG
(
3
0
)
<<
"finished build from IR"
;
}
}
void
DataFlowGraph
::
Clean
()
{
void
DataFlowGraph
::
Clean
()
{
...
...
paddle/fluid/inference/analysis/data_flow_graph_to_fluid_pass.cc
浏览文件 @
d231e550
...
@@ -239,9 +239,10 @@ void DataFlowGraphToFluidPass::AddEngineOp(Node *node) {
...
@@ -239,9 +239,10 @@ void DataFlowGraphToFluidPass::AddEngineOp(Node *node) {
framework
::
BlockDesc
block_desc
(
nullptr
,
&
proto
);
framework
::
BlockDesc
block_desc
(
nullptr
,
&
proto
);
block_desc
.
Proto
()
->
set_parent_idx
(
-
1
);
block_desc
.
Proto
()
->
set_parent_idx
(
-
1
);
block_desc
.
Proto
()
->
set_idx
(
0
);
block_desc
.
Proto
()
->
set_idx
(
0
);
VLOG
(
4
)
<<
"origin variable size: "
VLOG
(
4
0
)
<<
"origin variable size: "
<<
argument_
->
origin_program_desc
->
blocks
(
0
).
vars
().
size
();
<<
argument_
->
origin_program_desc
->
blocks
(
0
).
vars
().
size
();
VLOG
(
4
)
<<
"transformed variable size: "
<<
block_desc
.
Proto
()
->
vars
().
size
();
VLOG
(
40
)
<<
"transformed variable size: "
<<
block_desc
.
Proto
()
->
vars
().
size
();
// copy ops.
// copy ops.
for
(
auto
*
node
:
block_node
->
subgraph
)
{
for
(
auto
*
node
:
block_node
->
subgraph
)
{
...
...
paddle/fluid/inference/analysis/dfg_graphviz_draw_pass.cc
浏览文件 @
d231e550
...
@@ -29,7 +29,7 @@ void DFG_GraphvizDrawPass::Run(DataFlowGraph *graph) {
...
@@ -29,7 +29,7 @@ void DFG_GraphvizDrawPass::Run(DataFlowGraph *graph) {
auto
png_path
=
dot_path
.
substr
(
0
,
dot_path
.
size
()
-
4
)
+
".png"
;
auto
png_path
=
dot_path
.
substr
(
0
,
dot_path
.
size
()
-
4
)
+
".png"
;
std
::
string
message
;
std
::
string
message
;
VLOG
(
3
)
<<
"draw to "
<<
png_path
;
VLOG
(
3
0
)
<<
"draw to "
<<
png_path
;
ExecShellCommand
(
"dot -Tpng "
+
dot_path
+
" -o "
+
png_path
,
&
message
);
ExecShellCommand
(
"dot -Tpng "
+
dot_path
+
" -o "
+
png_path
,
&
message
);
}
}
...
...
paddle/fluid/inference/analysis/fluid_to_ir_pass.cc
浏览文件 @
d231e550
...
@@ -29,7 +29,7 @@ void FluidToIrPass::EnableParamModify(const std::string &model_dir,
...
@@ -29,7 +29,7 @@ void FluidToIrPass::EnableParamModify(const std::string &model_dir,
PADDLE_ENFORCE
(
argument_
);
PADDLE_ENFORCE
(
argument_
);
argument_
->
Set
(
framework
::
ir
::
kParamScopeAttr
,
new
framework
::
Scope
);
argument_
->
Set
(
framework
::
ir
::
kParamScopeAttr
,
new
framework
::
Scope
);
// Load parameters.
// Load parameters.
VLOG
(
3
)
<<
"Loading parameters from "
<<
model_dir
;
VLOG
(
3
0
)
<<
"Loading parameters from "
<<
model_dir
;
LoadParams
(
&
argument_
->
Get
<
framework
::
Scope
>
(
framework
::
ir
::
kParamScopeAttr
),
LoadParams
(
&
argument_
->
Get
<
framework
::
Scope
>
(
framework
::
ir
::
kParamScopeAttr
),
model_dir
,
prog_file
,
param_file
);
model_dir
,
prog_file
,
param_file
);
}
}
...
...
paddle/fluid/inference/analysis/model_store_pass.cc
浏览文件 @
d231e550
...
@@ -35,20 +35,20 @@ void ModelStorePass::Run(DataFlowGraph *x) {
...
@@ -35,20 +35,20 @@ void ModelStorePass::Run(DataFlowGraph *x) {
std
::
stringstream
ss
;
std
::
stringstream
ss
;
// NOTE these commands only works on linux.
// NOTE these commands only works on linux.
ss
<<
"mkdir -p "
<<
*
argument_
->
model_output_store_path
;
ss
<<
"mkdir -p "
<<
*
argument_
->
model_output_store_path
;
VLOG
(
3
)
<<
"run command: "
<<
ss
.
str
();
VLOG
(
3
0
)
<<
"run command: "
<<
ss
.
str
();
PADDLE_ENFORCE_EQ
(
system
(
ss
.
str
().
c_str
()),
0
);
PADDLE_ENFORCE_EQ
(
system
(
ss
.
str
().
c_str
()),
0
);
ss
.
str
(
""
);
ss
.
str
(
""
);
ss
<<
"cp "
<<
*
argument_
->
fluid_model_dir
<<
"/*"
ss
<<
"cp "
<<
*
argument_
->
fluid_model_dir
<<
"/*"
<<
" "
<<
*
argument_
->
model_output_store_path
;
<<
" "
<<
*
argument_
->
model_output_store_path
;
VLOG
(
3
)
<<
"run command: "
<<
ss
.
str
();
VLOG
(
3
0
)
<<
"run command: "
<<
ss
.
str
();
PADDLE_ENFORCE_EQ
(
system
(
ss
.
str
().
c_str
()),
0
);
PADDLE_ENFORCE_EQ
(
system
(
ss
.
str
().
c_str
()),
0
);
// Store program
// Store program
PADDLE_ENFORCE_NOT_NULL
(
argument_
->
transformed_program_desc
,
PADDLE_ENFORCE_NOT_NULL
(
argument_
->
transformed_program_desc
,
"program desc is not transformed, should call "
"program desc is not transformed, should call "
"DataFlowGraphToFluidPass first."
);
"DataFlowGraphToFluidPass first."
);
VLOG
(
3
)
<<
"store analyzed program to "
VLOG
(
3
0
)
<<
"store analyzed program to "
<<
*
argument_
->
model_output_store_path
;
<<
*
argument_
->
model_output_store_path
;
const
std
::
string
program_output_path
=
const
std
::
string
program_output_path
=
*
argument_
->
model_output_store_path
+
"/__model__"
;
*
argument_
->
model_output_store_path
+
"/__model__"
;
...
...
paddle/fluid/inference/analysis/pass_manager.cc
浏览文件 @
d231e550
...
@@ -23,7 +23,7 @@ namespace analysis {
...
@@ -23,7 +23,7 @@ namespace analysis {
bool
PassManager
::
Initialize
(
Argument
*
argument
)
{
bool
PassManager
::
Initialize
(
Argument
*
argument
)
{
argument_
=
argument
;
argument_
=
argument
;
for
(
auto
&
pass
:
data_
)
{
for
(
auto
&
pass
:
data_
)
{
VLOG
(
3
)
<<
"Initializing pass ["
<<
pass
->
repr
()
<<
"]"
;
VLOG
(
3
0
)
<<
"Initializing pass ["
<<
pass
->
repr
()
<<
"]"
;
if
(
!
pass
->
Initialize
(
argument
))
{
if
(
!
pass
->
Initialize
(
argument
))
{
LOG
(
ERROR
)
<<
"Failed to initialize pass ["
<<
pass
->
repr
()
<<
"]"
;
LOG
(
ERROR
)
<<
"Failed to initialize pass ["
<<
pass
->
repr
()
<<
"]"
;
return
false
;
return
false
;
...
@@ -34,7 +34,7 @@ bool PassManager::Initialize(Argument* argument) {
...
@@ -34,7 +34,7 @@ bool PassManager::Initialize(Argument* argument) {
void
DfgPassManager
::
RunAll
()
{
void
DfgPassManager
::
RunAll
()
{
PADDLE_ENFORCE
(
argument_
);
PADDLE_ENFORCE
(
argument_
);
VLOG
(
3
)
<<
"Total "
<<
data_
.
size
()
<<
" Analysys passes"
;
VLOG
(
3
0
)
<<
"Total "
<<
data_
.
size
()
<<
" Analysys passes"
;
for
(
auto
&
pass
:
data_
)
{
for
(
auto
&
pass
:
data_
)
{
string
::
PrettyLogEndl
(
string
::
Style
::
H1
(),
"* Running Analysis pass [%s]"
,
string
::
PrettyLogEndl
(
string
::
Style
::
H1
(),
"* Running Analysis pass [%s]"
,
pass
->
repr
());
pass
->
repr
());
...
...
paddle/fluid/inference/analysis/subgraph_splitter.cc
浏览文件 @
d231e550
...
@@ -232,7 +232,7 @@ std::vector<std::vector<Node *>> SubGraphSplitter::ExtractSubGraphs() {
...
@@ -232,7 +232,7 @@ std::vector<std::vector<Node *>> SubGraphSplitter::ExtractSubGraphs() {
BriefNode
*
brief_node
=
itr
.
second
;
BriefNode
*
brief_node
=
itr
.
second
;
if
(
!
brief_node
->
node
->
attr
(
kMarkerAttrName
).
Bool
())
{
if
(
!
brief_node
->
node
->
attr
(
kMarkerAttrName
).
Bool
())
{
VLOG
(
4
)
<<
brief_node
->
node
->
id
()
<<
" node not a trt candicate."
;
VLOG
(
4
0
)
<<
brief_node
->
node
->
id
()
<<
" node not a trt candicate."
;
continue
;
continue
;
}
}
...
...
paddle/fluid/inference/analysis/tensorrt_subgraph_pass.cc
浏览文件 @
d231e550
...
@@ -25,7 +25,7 @@ TensorRTSubGraphPass::TensorRTSubGraphPass(
...
@@ -25,7 +25,7 @@ TensorRTSubGraphPass::TensorRTSubGraphPass(
void
TensorRTSubGraphPass
::
Run
(
DataFlowGraph
*
graph
)
{
void
TensorRTSubGraphPass
::
Run
(
DataFlowGraph
*
graph
)
{
SubGraphFuse
(
graph
,
node_inside_subgraph_teller_
,
argument_
)();
SubGraphFuse
(
graph
,
node_inside_subgraph_teller_
,
argument_
)();
VLOG
(
4
)
<<
"debug info "
VLOG
(
4
0
)
<<
"debug info "
<<
graph
->
HumanReadableInfo
(
false
/*show_values*/
,
<<
graph
->
HumanReadableInfo
(
false
/*show_values*/
,
true
/*show_functions*/
);
true
/*show_functions*/
);
}
}
...
...
paddle/fluid/inference/api/analysis_predictor.cc
浏览文件 @
d231e550
...
@@ -38,7 +38,7 @@ using contrib::AnalysisConfig;
...
@@ -38,7 +38,7 @@ using contrib::AnalysisConfig;
bool
AnalysisPredictor
::
Init
(
bool
AnalysisPredictor
::
Init
(
const
std
::
shared_ptr
<
framework
::
Scope
>
&
parent_scope
,
const
std
::
shared_ptr
<
framework
::
Scope
>
&
parent_scope
,
const
std
::
shared_ptr
<
framework
::
ProgramDesc
>
&
program
)
{
const
std
::
shared_ptr
<
framework
::
ProgramDesc
>
&
program
)
{
VLOG
(
3
)
<<
"Predictor::init()"
;
VLOG
(
3
0
)
<<
"Predictor::init()"
;
#if !defined(_WIN32)
#if !defined(_WIN32)
if
(
FLAGS_profile
)
{
if
(
FLAGS_profile
)
{
LOG
(
WARNING
)
<<
"Profiler is actived, might affect the performance"
;
LOG
(
WARNING
)
<<
"Profiler is actived, might affect the performance"
;
...
@@ -89,7 +89,7 @@ bool AnalysisPredictor::Init(
...
@@ -89,7 +89,7 @@ bool AnalysisPredictor::Init(
bool
AnalysisPredictor
::
Run
(
const
std
::
vector
<
PaddleTensor
>
&
inputs
,
bool
AnalysisPredictor
::
Run
(
const
std
::
vector
<
PaddleTensor
>
&
inputs
,
std
::
vector
<
PaddleTensor
>
*
output_data
,
std
::
vector
<
PaddleTensor
>
*
output_data
,
int
batch_size
)
{
int
batch_size
)
{
VLOG
(
3
)
<<
"Predictor::predict"
;
VLOG
(
3
0
)
<<
"Predictor::predict"
;
inference
::
Timer
timer
;
inference
::
Timer
timer
;
timer
.
tic
();
timer
.
tic
();
// set feed variable
// set feed variable
...
@@ -109,7 +109,7 @@ bool AnalysisPredictor::Run(const std::vector<PaddleTensor> &inputs,
...
@@ -109,7 +109,7 @@ bool AnalysisPredictor::Run(const std::vector<PaddleTensor> &inputs,
LOG
(
ERROR
)
<<
"fail to get fetches"
;
LOG
(
ERROR
)
<<
"fail to get fetches"
;
return
false
;
return
false
;
}
}
VLOG
(
3
)
<<
"predict cost: "
<<
timer
.
toc
()
<<
"ms"
;
VLOG
(
3
0
)
<<
"predict cost: "
<<
timer
.
toc
()
<<
"ms"
;
// Fix TensorArray reuse not cleaned bug.
// Fix TensorArray reuse not cleaned bug.
tensor_array_batch_cleaner_
.
CollectTensorArrays
(
scope_
.
get
());
tensor_array_batch_cleaner_
.
CollectTensorArrays
(
scope_
.
get
());
...
@@ -119,7 +119,7 @@ bool AnalysisPredictor::Run(const std::vector<PaddleTensor> &inputs,
...
@@ -119,7 +119,7 @@ bool AnalysisPredictor::Run(const std::vector<PaddleTensor> &inputs,
bool
AnalysisPredictor
::
SetFeed
(
const
std
::
vector
<
PaddleTensor
>
&
inputs
,
bool
AnalysisPredictor
::
SetFeed
(
const
std
::
vector
<
PaddleTensor
>
&
inputs
,
framework
::
Scope
*
scope
)
{
framework
::
Scope
*
scope
)
{
VLOG
(
3
)
<<
"Predictor::set_feed"
;
VLOG
(
3
0
)
<<
"Predictor::set_feed"
;
if
(
inputs
.
size
()
!=
feeds_
.
size
())
{
if
(
inputs
.
size
()
!=
feeds_
.
size
())
{
LOG
(
ERROR
)
<<
"wrong feed input size, need "
<<
feeds_
.
size
()
<<
" but get "
LOG
(
ERROR
)
<<
"wrong feed input size, need "
<<
feeds_
.
size
()
<<
" but get "
<<
inputs
.
size
();
<<
inputs
.
size
();
...
@@ -184,7 +184,7 @@ void AnalysisPredictor::GetFetchOne(const framework::LoDTensor &fetch,
...
@@ -184,7 +184,7 @@ void AnalysisPredictor::GetFetchOne(const framework::LoDTensor &fetch,
bool
AnalysisPredictor
::
GetFetch
(
std
::
vector
<
PaddleTensor
>
*
outputs
,
bool
AnalysisPredictor
::
GetFetch
(
std
::
vector
<
PaddleTensor
>
*
outputs
,
framework
::
Scope
*
scope
)
{
framework
::
Scope
*
scope
)
{
VLOG
(
3
)
<<
"Predictor::get_fetch"
;
VLOG
(
3
0
)
<<
"Predictor::get_fetch"
;
outputs
->
resize
(
fetchs_
.
size
());
outputs
->
resize
(
fetchs_
.
size
());
for
(
size_t
i
=
0
;
i
<
fetchs_
.
size
();
++
i
)
{
for
(
size_t
i
=
0
;
i
<
fetchs_
.
size
();
++
i
)
{
int
idx
=
boost
::
get
<
int
>
(
fetchs_
[
i
]
->
GetAttr
(
"col"
));
int
idx
=
boost
::
get
<
int
>
(
fetchs_
[
i
]
->
GetAttr
(
"col"
));
...
@@ -246,7 +246,7 @@ void AnalysisPredictor::OptimizeInferenceProgram() {
...
@@ -246,7 +246,7 @@ void AnalysisPredictor::OptimizeInferenceProgram() {
}
}
CHECK
(
argument_
.
transformed_program_desc
);
CHECK
(
argument_
.
transformed_program_desc
);
VLOG
(
5
)
<<
"to prepare executor"
;
VLOG
(
5
0
)
<<
"to prepare executor"
;
inference_program_
.
reset
(
inference_program_
.
reset
(
new
framework
::
ProgramDesc
(
*
argument_
.
transformed_program_desc
));
new
framework
::
ProgramDesc
(
*
argument_
.
transformed_program_desc
));
if
(
argument_
.
Has
(
framework
::
ir
::
kParamScopeAttr
))
{
if
(
argument_
.
Has
(
framework
::
ir
::
kParamScopeAttr
))
{
...
@@ -260,7 +260,7 @@ void AnalysisPredictor::OptimizeInferenceProgram() {
...
@@ -260,7 +260,7 @@ void AnalysisPredictor::OptimizeInferenceProgram() {
template
<
>
template
<
>
std
::
unique_ptr
<
PaddlePredictor
>
CreatePaddlePredictor
<
std
::
unique_ptr
<
PaddlePredictor
>
CreatePaddlePredictor
<
AnalysisConfig
,
PaddleEngineKind
::
kAnalysis
>
(
const
AnalysisConfig
&
config
)
{
AnalysisConfig
,
PaddleEngineKind
::
kAnalysis
>
(
const
AnalysisConfig
&
config
)
{
VLOG
(
3
)
<<
"create AnalysisConfig"
;
VLOG
(
3
0
)
<<
"create AnalysisConfig"
;
if
(
config
.
use_gpu
)
{
if
(
config
.
use_gpu
)
{
// 1. GPU memeroy
// 1. GPU memeroy
PADDLE_ENFORCE_GT
(
PADDLE_ENFORCE_GT
(
...
@@ -274,7 +274,7 @@ std::unique_ptr<PaddlePredictor> CreatePaddlePredictor<
...
@@ -274,7 +274,7 @@ std::unique_ptr<PaddlePredictor> CreatePaddlePredictor<
std
::
string
flag
=
"--fraction_of_gpu_memory_to_use="
+
std
::
string
flag
=
"--fraction_of_gpu_memory_to_use="
+
std
::
to_string
(
config
.
fraction_of_gpu_memory
);
std
::
to_string
(
config
.
fraction_of_gpu_memory
);
flags
.
push_back
(
flag
);
flags
.
push_back
(
flag
);
VLOG
(
3
)
<<
"set flag: "
<<
flag
;
VLOG
(
3
0
)
<<
"set flag: "
<<
flag
;
framework
::
InitGflags
(
flags
);
framework
::
InitGflags
(
flags
);
}
}
}
}
...
...
paddle/fluid/inference/api/analysis_predictor.h
浏览文件 @
d231e550
...
@@ -13,6 +13,8 @@
...
@@ -13,6 +13,8 @@
// limitations under the License.
// limitations under the License.
#pragma once
#pragma once
#include <algorithm>
#include <map>
#include <string>
#include <string>
#include <vector>
#include <vector>
#include "paddle/fluid/framework/naive_executor.h"
#include "paddle/fluid/framework/naive_executor.h"
...
...
paddle/fluid/inference/api/api.cc
浏览文件 @
d231e550
...
@@ -16,7 +16,6 @@
...
@@ -16,7 +16,6 @@
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/inference/api/paddle_inference_api.h"
#include "paddle/fluid/inference/api/paddle_inference_api.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle_inference_api.h"
namespace
paddle
{
namespace
paddle
{
...
...
paddle/fluid/inference/api/api_impl.cc
浏览文件 @
d231e550
...
@@ -157,7 +157,7 @@ bool NativePaddlePredictor::Run(const std::vector<PaddleTensor> &inputs,
...
@@ -157,7 +157,7 @@ bool NativePaddlePredictor::Run(const std::vector<PaddleTensor> &inputs,
LOG
(
ERROR
)
<<
"fail to get fetches"
;
LOG
(
ERROR
)
<<
"fail to get fetches"
;
return
false
;
return
false
;
}
}
VLOG
(
3
)
<<
"predict cost: "
<<
timer
.
toc
()
<<
"ms"
;
VLOG
(
3
0
)
<<
"predict cost: "
<<
timer
.
toc
()
<<
"ms"
;
// Fix TensorArray reuse not cleaned bug.
// Fix TensorArray reuse not cleaned bug.
tensor_array_batch_cleaner_
.
CollectTensorArrays
(
scope_
.
get
());
tensor_array_batch_cleaner_
.
CollectTensorArrays
(
scope_
.
get
());
...
...
paddle/fluid/inference/api/api_tensorrt_subgraph_engine.cc
浏览文件 @
d231e550
...
@@ -34,7 +34,7 @@ class TensorRTSubgraphPredictor : public NativePaddlePredictor {
...
@@ -34,7 +34,7 @@ class TensorRTSubgraphPredictor : public NativePaddlePredictor {
bool
Init
(
const
std
::
shared_ptr
<
framework
::
Scope
>&
parent_scope
)
{
bool
Init
(
const
std
::
shared_ptr
<
framework
::
Scope
>&
parent_scope
)
{
FLAGS_IA_enable_tensorrt_subgraph_engine
=
true
;
FLAGS_IA_enable_tensorrt_subgraph_engine
=
true
;
VLOG
(
3
)
<<
"Predictor::init()"
;
VLOG
(
3
0
)
<<
"Predictor::init()"
;
if
(
config_
.
use_gpu
)
{
if
(
config_
.
use_gpu
)
{
place_
=
paddle
::
platform
::
CUDAPlace
(
config_
.
device
);
place_
=
paddle
::
platform
::
CUDAPlace
(
config_
.
device
);
}
else
{
}
else
{
...
@@ -70,7 +70,7 @@ class TensorRTSubgraphPredictor : public NativePaddlePredictor {
...
@@ -70,7 +70,7 @@ class TensorRTSubgraphPredictor : public NativePaddlePredictor {
OptimizeInferenceProgram
();
OptimizeInferenceProgram
();
ctx_
=
executor_
->
Prepare
(
*
inference_program_
,
0
);
ctx_
=
executor_
->
Prepare
(
*
inference_program_
,
0
);
VLOG
(
5
)
<<
"to create variables"
;
VLOG
(
5
0
)
<<
"to create variables"
;
executor_
->
CreateVariables
(
*
inference_program_
,
executor_
->
CreateVariables
(
*
inference_program_
,
sub_scope_
?
sub_scope_
:
scope_
.
get
(),
0
);
sub_scope_
?
sub_scope_
:
scope_
.
get
(),
0
);
// Get the feed_target_names and fetch_target_names
// Get the feed_target_names and fetch_target_names
...
@@ -114,9 +114,9 @@ class TensorRTSubgraphPredictor : public NativePaddlePredictor {
...
@@ -114,9 +114,9 @@ class TensorRTSubgraphPredictor : public NativePaddlePredictor {
new
ProgramDesc
(
*
inference_program_
->
Proto
()));
new
ProgramDesc
(
*
inference_program_
->
Proto
()));
Singleton
<
Analyzer
>::
Global
().
Run
(
&
argument
);
Singleton
<
Analyzer
>::
Global
().
Run
(
&
argument
);
CHECK
(
argument
.
transformed_program_desc
);
CHECK
(
argument
.
transformed_program_desc
);
VLOG
(
5
)
<<
"transformed program:
\n
"
VLOG
(
5
0
)
<<
"transformed program:
\n
"
<<
argument
.
transformed_program_desc
->
SerializeAsString
();
<<
argument
.
transformed_program_desc
->
SerializeAsString
();
VLOG
(
5
)
<<
"to prepare executor"
;
VLOG
(
5
0
)
<<
"to prepare executor"
;
inference_program_
.
reset
(
inference_program_
.
reset
(
new
framework
::
ProgramDesc
(
*
argument
.
transformed_program_desc
));
new
framework
::
ProgramDesc
(
*
argument
.
transformed_program_desc
));
}
}
...
@@ -129,7 +129,7 @@ template <>
...
@@ -129,7 +129,7 @@ template <>
std
::
unique_ptr
<
PaddlePredictor
>
std
::
unique_ptr
<
PaddlePredictor
>
CreatePaddlePredictor
<
MixedRTConfig
,
PaddleEngineKind
::
kAutoMixedTensorRT
>
(
CreatePaddlePredictor
<
MixedRTConfig
,
PaddleEngineKind
::
kAutoMixedTensorRT
>
(
const
MixedRTConfig
&
config
)
{
const
MixedRTConfig
&
config
)
{
VLOG
(
3
)
<<
"create TensorRTSubgraphPredictor"
;
VLOG
(
3
0
)
<<
"create TensorRTSubgraphPredictor"
;
if
(
config
.
use_gpu
)
{
if
(
config
.
use_gpu
)
{
// 1. GPU memeroy
// 1. GPU memeroy
PADDLE_ENFORCE_GT
(
PADDLE_ENFORCE_GT
(
...
@@ -143,7 +143,7 @@ CreatePaddlePredictor<MixedRTConfig, PaddleEngineKind::kAutoMixedTensorRT>(
...
@@ -143,7 +143,7 @@ CreatePaddlePredictor<MixedRTConfig, PaddleEngineKind::kAutoMixedTensorRT>(
std
::
string
flag
=
"--fraction_of_gpu_memory_to_use="
+
std
::
string
flag
=
"--fraction_of_gpu_memory_to_use="
+
std
::
to_string
(
config
.
fraction_of_gpu_memory
);
std
::
to_string
(
config
.
fraction_of_gpu_memory
);
flags
.
push_back
(
flag
);
flags
.
push_back
(
flag
);
VLOG
(
3
)
<<
"set flag: "
<<
flag
;
VLOG
(
3
0
)
<<
"set flag: "
<<
flag
;
framework
::
InitGflags
(
flags
);
framework
::
InitGflags
(
flags
);
}
}
}
}
...
...
paddle/fluid/inference/api/demo_ci/trt_mobilenet_demo.cc
浏览文件 @
d231e550
...
@@ -45,7 +45,7 @@ void Main() {
...
@@ -45,7 +45,7 @@ void Main() {
config
.
fraction_of_gpu_memory
=
0.1
;
// set by yourself
config
.
fraction_of_gpu_memory
=
0.1
;
// set by yourself
predictor
=
CreatePaddlePredictor
<
paddle
::
contrib
::
MixedRTConfig
>
(
config
);
predictor
=
CreatePaddlePredictor
<
paddle
::
contrib
::
MixedRTConfig
>
(
config
);
VLOG
(
3
)
<<
"begin to process data"
;
VLOG
(
3
0
)
<<
"begin to process data"
;
// Just a single batch of data.
// Just a single batch of data.
std
::
string
line
;
std
::
string
line
;
std
::
ifstream
file
(
FLAGS_data
);
std
::
ifstream
file
(
FLAGS_data
);
...
@@ -60,13 +60,13 @@ void Main() {
...
@@ -60,13 +60,13 @@ void Main() {
PaddleBuf
(
record
.
data
.
data
(),
record
.
data
.
size
()
*
sizeof
(
float
));
PaddleBuf
(
record
.
data
.
data
(),
record
.
data
.
size
()
*
sizeof
(
float
));
input
.
dtype
=
PaddleDType
::
FLOAT32
;
input
.
dtype
=
PaddleDType
::
FLOAT32
;
VLOG
(
3
)
<<
"run executor"
;
VLOG
(
3
0
)
<<
"run executor"
;
std
::
vector
<
PaddleTensor
>
output
;
std
::
vector
<
PaddleTensor
>
output
;
predictor
->
Run
({
input
},
&
output
,
1
);
predictor
->
Run
({
input
},
&
output
,
1
);
VLOG
(
3
)
<<
"output.size "
<<
output
.
size
();
VLOG
(
3
0
)
<<
"output.size "
<<
output
.
size
();
auto
&
tensor
=
output
.
front
();
auto
&
tensor
=
output
.
front
();
VLOG
(
3
)
<<
"output: "
<<
SummaryTensor
(
tensor
);
VLOG
(
3
0
)
<<
"output: "
<<
SummaryTensor
(
tensor
);
// compare with reference result
// compare with reference result
CheckOutput
(
FLAGS_refer
,
tensor
);
CheckOutput
(
FLAGS_refer
,
tensor
);
...
...
paddle/fluid/inference/api/demo_ci/utils.h
浏览文件 @
d231e550
...
@@ -47,7 +47,7 @@ static void split(const std::string& str, char sep,
...
@@ -47,7 +47,7 @@ static void split(const std::string& str, char sep,
}
}
Record
ProcessALine
(
const
std
::
string
&
line
)
{
Record
ProcessALine
(
const
std
::
string
&
line
)
{
VLOG
(
3
)
<<
"process a line"
;
VLOG
(
3
0
)
<<
"process a line"
;
std
::
vector
<
std
::
string
>
columns
;
std
::
vector
<
std
::
string
>
columns
;
split
(
line
,
'\t'
,
&
columns
);
split
(
line
,
'\t'
,
&
columns
);
CHECK_EQ
(
columns
.
size
(),
2UL
)
CHECK_EQ
(
columns
.
size
(),
2UL
)
...
@@ -65,8 +65,8 @@ Record ProcessALine(const std::string& line) {
...
@@ -65,8 +65,8 @@ Record ProcessALine(const std::string& line) {
for
(
auto
&
s
:
shape_strs
)
{
for
(
auto
&
s
:
shape_strs
)
{
record
.
shape
.
push_back
(
std
::
stoi
(
s
));
record
.
shape
.
push_back
(
std
::
stoi
(
s
));
}
}
VLOG
(
3
)
<<
"data size "
<<
record
.
data
.
size
();
VLOG
(
3
0
)
<<
"data size "
<<
record
.
data
.
size
();
VLOG
(
3
)
<<
"data shape size "
<<
record
.
shape
.
size
();
VLOG
(
3
0
)
<<
"data shape size "
<<
record
.
shape
.
size
();
return
record
;
return
record
;
}
}
...
@@ -78,8 +78,8 @@ void CheckOutput(const std::string& referfile, const PaddleTensor& output) {
...
@@ -78,8 +78,8 @@ void CheckOutput(const std::string& referfile, const PaddleTensor& output) {
file
.
close
();
file
.
close
();
size_t
numel
=
output
.
data
.
length
()
/
PaddleDtypeSize
(
output
.
dtype
);
size_t
numel
=
output
.
data
.
length
()
/
PaddleDtypeSize
(
output
.
dtype
);
VLOG
(
3
)
<<
"predictor output numel "
<<
numel
;
VLOG
(
3
0
)
<<
"predictor output numel "
<<
numel
;
VLOG
(
3
)
<<
"reference output numel "
<<
refer
.
data
.
size
();
VLOG
(
3
0
)
<<
"reference output numel "
<<
refer
.
data
.
size
();
CHECK_EQ
(
numel
,
refer
.
data
.
size
());
CHECK_EQ
(
numel
,
refer
.
data
.
size
());
switch
(
output
.
dtype
)
{
switch
(
output
.
dtype
)
{
case
PaddleDType
::
INT64
:
{
case
PaddleDType
::
INT64
:
{
...
...
paddle/fluid/inference/api/demo_ci/vis_demo.cc
浏览文件 @
d231e550
...
@@ -49,11 +49,11 @@ void Main(bool use_gpu) {
...
@@ -49,11 +49,11 @@ void Main(bool use_gpu) {
config
.
fraction_of_gpu_memory
=
0.1
;
// set by yourself
config
.
fraction_of_gpu_memory
=
0.1
;
// set by yourself
}
}
VLOG
(
3
)
<<
"init predictor"
;
VLOG
(
3
0
)
<<
"init predictor"
;
predictor
=
CreatePaddlePredictor
<
NativeConfig
>
(
config
);
predictor
=
CreatePaddlePredictor
<
NativeConfig
>
(
config
);
analysis_predictor
=
CreatePaddlePredictor
<
AnalysisConfig
>
(
config
);
analysis_predictor
=
CreatePaddlePredictor
<
AnalysisConfig
>
(
config
);
VLOG
(
3
)
<<
"begin to process data"
;
VLOG
(
3
0
)
<<
"begin to process data"
;
// Just a single batch of data.
// Just a single batch of data.
std
::
string
line
;
std
::
string
line
;
std
::
ifstream
file
(
FLAGS_data
);
std
::
ifstream
file
(
FLAGS_data
);
...
@@ -68,13 +68,13 @@ void Main(bool use_gpu) {
...
@@ -68,13 +68,13 @@ void Main(bool use_gpu) {
PaddleBuf
(
record
.
data
.
data
(),
record
.
data
.
size
()
*
sizeof
(
float
));
PaddleBuf
(
record
.
data
.
data
(),
record
.
data
.
size
()
*
sizeof
(
float
));
input
.
dtype
=
PaddleDType
::
FLOAT32
;
input
.
dtype
=
PaddleDType
::
FLOAT32
;
VLOG
(
3
)
<<
"run executor"
;
VLOG
(
3
0
)
<<
"run executor"
;
std
::
vector
<
PaddleTensor
>
output
,
analysis_output
;
std
::
vector
<
PaddleTensor
>
output
,
analysis_output
;
predictor
->
Run
({
input
},
&
output
,
1
);
predictor
->
Run
({
input
},
&
output
,
1
);
VLOG
(
3
)
<<
"output.size "
<<
output
.
size
();
VLOG
(
3
0
)
<<
"output.size "
<<
output
.
size
();
auto
&
tensor
=
output
.
front
();
auto
&
tensor
=
output
.
front
();
VLOG
(
3
)
<<
"output: "
<<
SummaryTensor
(
tensor
);
VLOG
(
3
0
)
<<
"output: "
<<
SummaryTensor
(
tensor
);
// compare with reference result
// compare with reference result
CheckOutput
(
FLAGS_refer
,
tensor
);
CheckOutput
(
FLAGS_refer
,
tensor
);
...
...
paddle/fluid/inference/api/details/reset_tensor_array.cc
浏览文件 @
d231e550
...
@@ -26,7 +26,7 @@ void TensorArrayBatchCleaner::CollectTensorArrays(framework::Scope *scope) {
...
@@ -26,7 +26,7 @@ void TensorArrayBatchCleaner::CollectTensorArrays(framework::Scope *scope) {
// parameter.
// parameter.
if
(
var_name
==
"feed"
||
var_name
==
"fetch"
)
continue
;
if
(
var_name
==
"feed"
||
var_name
==
"fetch"
)
continue
;
if
(
var
->
Type
()
==
typeid
(
framework
::
LoDTensorArray
))
{
if
(
var
->
Type
()
==
typeid
(
framework
::
LoDTensorArray
))
{
VLOG
(
4
)
<<
"collect "
<<
var_name
;
VLOG
(
4
0
)
<<
"collect "
<<
var_name
;
arrays_
.
push_back
(
var
->
GetMutable
<
framework
::
LoDTensorArray
>
());
arrays_
.
push_back
(
var
->
GetMutable
<
framework
::
LoDTensorArray
>
());
}
}
}
}
...
@@ -34,7 +34,7 @@ void TensorArrayBatchCleaner::CollectTensorArrays(framework::Scope *scope) {
...
@@ -34,7 +34,7 @@ void TensorArrayBatchCleaner::CollectTensorArrays(framework::Scope *scope) {
CollectTensorArrays
(
kid
);
CollectTensorArrays
(
kid
);
}
}
VLOG
(
3
)
<<
"Collect "
<<
arrays_
.
size
()
<<
" arrays"
;
VLOG
(
3
0
)
<<
"Collect "
<<
arrays_
.
size
()
<<
" arrays"
;
flag_
=
false
;
flag_
=
false
;
}
}
}
}
...
...
paddle/fluid/inference/api/helper.h
浏览文件 @
d231e550
...
@@ -16,13 +16,14 @@
...
@@ -16,13 +16,14 @@
#include <glog/logging.h>
#include <glog/logging.h>
#include <sys/time.h>
#include <sys/time.h>
#include <algorithm>
#include <chrono> // NOLINT
#include <chrono> // NOLINT
#include <numeric>
#include <numeric>
#include <sstream>
#include <sstream>
#include <string>
#include <string>
#include <vector>
#include <vector>
#include "paddle/fluid/inference/api/paddle_inference_api.h"
#include "paddle/fluid/string/printf.h"
#include "paddle/fluid/string/printf.h"
#include "paddle_inference_api.h"
namespace
paddle
{
namespace
paddle
{
namespace
inference
{
namespace
inference
{
...
...
paddle/fluid/inference/io.cc
浏览文件 @
d231e550
...
@@ -59,7 +59,8 @@ void ReadBinaryFile(const std::string& filename, std::string* contents) {
...
@@ -59,7 +59,8 @@ void ReadBinaryFile(const std::string& filename, std::string* contents) {
bool
IsPersistable
(
const
framework
::
VarDesc
*
var
)
{
bool
IsPersistable
(
const
framework
::
VarDesc
*
var
)
{
if
(
var
->
Persistable
()
&&
if
(
var
->
Persistable
()
&&
var
->
GetType
()
!=
framework
::
proto
::
VarType
::
FEED_MINIBATCH
&&
var
->
GetType
()
!=
framework
::
proto
::
VarType
::
FEED_MINIBATCH
&&
var
->
GetType
()
!=
framework
::
proto
::
VarType
::
FETCH_LIST
)
{
var
->
GetType
()
!=
framework
::
proto
::
VarType
::
FETCH_LIST
&&
var
->
GetType
()
!=
framework
::
proto
::
VarType
::
RAW
)
{
return
true
;
return
true
;
}
}
return
false
;
return
false
;
...
@@ -77,7 +78,7 @@ void LoadPersistables(framework::Executor* executor, framework::Scope* scope,
...
@@ -77,7 +78,7 @@ void LoadPersistables(framework::Executor* executor, framework::Scope* scope,
for
(
auto
*
var
:
global_block
.
AllVars
())
{
for
(
auto
*
var
:
global_block
.
AllVars
())
{
if
(
IsPersistable
(
var
))
{
if
(
IsPersistable
(
var
))
{
VLOG
(
3
)
<<
"persistable variable's name: "
<<
var
->
Name
();
VLOG
(
3
0
)
<<
"persistable variable's name: "
<<
var
->
Name
();
framework
::
VarDesc
*
new_var
=
load_block
->
Var
(
var
->
Name
());
framework
::
VarDesc
*
new_var
=
load_block
->
Var
(
var
->
Name
());
new_var
->
SetShape
(
var
->
GetShape
());
new_var
->
SetShape
(
var
->
GetShape
());
...
@@ -120,7 +121,7 @@ std::unique_ptr<framework::ProgramDesc> Load(framework::Executor* executor,
...
@@ -120,7 +121,7 @@ std::unique_ptr<framework::ProgramDesc> Load(framework::Executor* executor,
const
std
::
string
&
dirname
)
{
const
std
::
string
&
dirname
)
{
std
::
string
model_filename
=
dirname
+
"/__model__"
;
std
::
string
model_filename
=
dirname
+
"/__model__"
;
std
::
string
program_desc_str
;
std
::
string
program_desc_str
;
VLOG
(
3
)
<<
"loading model from "
<<
model_filename
;
VLOG
(
3
0
)
<<
"loading model from "
<<
model_filename
;
ReadBinaryFile
(
model_filename
,
&
program_desc_str
);
ReadBinaryFile
(
model_filename
,
&
program_desc_str
);
std
::
unique_ptr
<
framework
::
ProgramDesc
>
main_program
(
std
::
unique_ptr
<
framework
::
ProgramDesc
>
main_program
(
...
...
paddle/fluid/inference/tensorrt/convert/concat_op.cc
浏览文件 @
d231e550
...
@@ -25,7 +25,7 @@ class ConcatOpConverter : public OpConverter {
...
@@ -25,7 +25,7 @@ class ConcatOpConverter : public OpConverter {
public:
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
4
)
<<
"convert a fluid mul op to tensorrt mul layer without bias"
;
VLOG
(
4
0
)
<<
"convert a fluid mul op to tensorrt mul layer without bias"
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
// Declare inputs
// Declare inputs
...
...
paddle/fluid/inference/tensorrt/convert/dropout_op.cc
浏览文件 @
d231e550
...
@@ -25,7 +25,7 @@ class DropoutOpConverter : public OpConverter {
...
@@ -25,7 +25,7 @@ class DropoutOpConverter : public OpConverter {
public:
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
4
)
<<
"convert a fluid dropout op to tensorrt dropout layer"
;
VLOG
(
4
0
)
<<
"convert a fluid dropout op to tensorrt dropout layer"
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
// Declare inputs
// Declare inputs
auto
*
input1
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"X"
)[
0
]);
auto
*
input1
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"X"
)[
0
]);
...
...
paddle/fluid/inference/tensorrt/convert/fc_op.cc
浏览文件 @
d231e550
...
@@ -52,7 +52,7 @@ class FcOpConverter : public OpConverter {
...
@@ -52,7 +52,7 @@ class FcOpConverter : public OpConverter {
public:
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
4
)
<<
"convert a fluid fc op to tensorrt fc layer without bias"
;
VLOG
(
4
0
)
<<
"convert a fluid fc op to tensorrt fc layer without bias"
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
PADDLE_ENFORCE_EQ
(
op_desc
.
Input
(
"X"
).
size
(),
1
);
PADDLE_ENFORCE_EQ
(
op_desc
.
Input
(
"X"
).
size
(),
1
);
...
...
paddle/fluid/inference/tensorrt/convert/mul_op.cc
浏览文件 @
d231e550
...
@@ -25,7 +25,7 @@ class MulOpConverter : public OpConverter {
...
@@ -25,7 +25,7 @@ class MulOpConverter : public OpConverter {
public:
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
4
)
<<
"convert a fluid mul op to tensorrt mul layer without bias"
;
VLOG
(
4
0
)
<<
"convert a fluid mul op to tensorrt mul layer without bias"
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
// Declare inputs
// Declare inputs
...
...
paddle/fluid/inference/tensorrt/convert/pad_op.cc
浏览文件 @
d231e550
...
@@ -25,7 +25,7 @@ class PadOpConverter : public OpConverter {
...
@@ -25,7 +25,7 @@ class PadOpConverter : public OpConverter {
public:
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
4
)
<<
"convert a fluid transpose op to tensorrt tranpose layer"
;
VLOG
(
4
0
)
<<
"convert a fluid transpose op to tensorrt tranpose layer"
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
// Declare inputs
// Declare inputs
...
...
paddle/fluid/inference/tensorrt/convert/pool2d_op.cc
浏览文件 @
d231e550
...
@@ -25,7 +25,7 @@ class Pool2dOpConverter : public OpConverter {
...
@@ -25,7 +25,7 @@ class Pool2dOpConverter : public OpConverter {
public:
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
4
)
VLOG
(
4
0
)
<<
"convert a fluid pool2d op to tensorrt pool2d layer without bias"
;
<<
"convert a fluid pool2d op to tensorrt pool2d layer without bias"
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
// Declare inputs
// Declare inputs
...
...
paddle/fluid/inference/tensorrt/convert/softmax_op.cc
浏览文件 @
d231e550
...
@@ -25,7 +25,7 @@ class SoftMaxOpConverter : public OpConverter {
...
@@ -25,7 +25,7 @@ class SoftMaxOpConverter : public OpConverter {
public:
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
4
)
VLOG
(
4
0
)
<<
"convert a fluid softmax op to tensorrt softmax layer without bias"
;
<<
"convert a fluid softmax op to tensorrt softmax layer without bias"
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
// Declare inputs
// Declare inputs
...
...
paddle/fluid/inference/tensorrt/engine.h
浏览文件 @
d231e550
...
@@ -134,7 +134,7 @@ class TensorRTEngine : public EngineBase {
...
@@ -134,7 +134,7 @@ class TensorRTEngine : public EngineBase {
std
::
unordered_map
<
std
::
string
/*name*/
,
std
::
unique_ptr
<
framework
::
Tensor
>>
std
::
unordered_map
<
std
::
string
/*name*/
,
std
::
unique_ptr
<
framework
::
Tensor
>>
weight_map
;
weight_map
;
// TODO
:
(NHZLX)
// TODO(NHZLX)
// In the normal case, the paddle-trt exists bug when runing the googlenet.
// In the normal case, the paddle-trt exists bug when runing the googlenet.
// When there are more than two convolutions of 1 * 1 with the same input, the
// When there are more than two convolutions of 1 * 1 with the same input, the
// paddle-tensorrt will do the merging optimization, which fuse those conv
// paddle-tensorrt will do the merging optimization, which fuse those conv
...
...
paddle/fluid/inference/tests/api/analyzer_vis_tester.cc
浏览文件 @
d231e550
...
@@ -27,7 +27,7 @@ struct Record {
...
@@ -27,7 +27,7 @@ struct Record {
};
};
Record
ProcessALine
(
const
std
::
string
&
line
)
{
Record
ProcessALine
(
const
std
::
string
&
line
)
{
VLOG
(
3
)
<<
"process a line"
;
VLOG
(
3
0
)
<<
"process a line"
;
std
::
vector
<
std
::
string
>
columns
;
std
::
vector
<
std
::
string
>
columns
;
split
(
line
,
'\t'
,
&
columns
);
split
(
line
,
'\t'
,
&
columns
);
CHECK_EQ
(
columns
.
size
(),
2UL
)
CHECK_EQ
(
columns
.
size
(),
2UL
)
...
@@ -45,8 +45,8 @@ Record ProcessALine(const std::string &line) {
...
@@ -45,8 +45,8 @@ Record ProcessALine(const std::string &line) {
for
(
auto
&
s
:
shape_strs
)
{
for
(
auto
&
s
:
shape_strs
)
{
record
.
shape
.
push_back
(
std
::
stoi
(
s
));
record
.
shape
.
push_back
(
std
::
stoi
(
s
));
}
}
VLOG
(
3
)
<<
"data size "
<<
record
.
data
.
size
();
VLOG
(
3
0
)
<<
"data size "
<<
record
.
data
.
size
();
VLOG
(
3
)
<<
"data shape size "
<<
record
.
shape
.
size
();
VLOG
(
3
0
)
<<
"data shape size "
<<
record
.
shape
.
size
();
return
record
;
return
record
;
}
}
...
...
paddle/fluid/memory/detail/buddy_allocator.cc
浏览文件 @
d231e550
...
@@ -32,11 +32,11 @@ BuddyAllocator::BuddyAllocator(
...
@@ -32,11 +32,11 @@ BuddyAllocator::BuddyAllocator(
system_allocator_
(
std
::
move
(
system_allocator
))
{}
system_allocator_
(
std
::
move
(
system_allocator
))
{}
BuddyAllocator
::~
BuddyAllocator
()
{
BuddyAllocator
::~
BuddyAllocator
()
{
VLOG
(
10
)
<<
"BuddyAllocator Disconstructor makes sure that all of these "
VLOG
(
10
0
)
<<
"BuddyAllocator Disconstructor makes sure that all of these "
"have actually been freed"
;
"have actually been freed"
;
while
(
!
pool_
.
empty
())
{
while
(
!
pool_
.
empty
())
{
auto
block
=
static_cast
<
MemoryBlock
*>
(
std
::
get
<
2
>
(
*
pool_
.
begin
()));
auto
block
=
static_cast
<
MemoryBlock
*>
(
std
::
get
<
2
>
(
*
pool_
.
begin
()));
VLOG
(
10
)
<<
"Free from block ("
<<
block
<<
", "
<<
max_chunk_size_
<<
")"
;
VLOG
(
10
0
)
<<
"Free from block ("
<<
block
<<
", "
<<
max_chunk_size_
<<
")"
;
system_allocator_
->
Free
(
block
,
max_chunk_size_
,
block
->
index
(
cache_
));
system_allocator_
->
Free
(
block
,
max_chunk_size_
,
block
->
index
(
cache_
));
cache_
.
invalidate
(
block
);
cache_
.
invalidate
(
block
);
...
@@ -57,12 +57,12 @@ void* BuddyAllocator::Alloc(size_t unaligned_size) {
...
@@ -57,12 +57,12 @@ void* BuddyAllocator::Alloc(size_t unaligned_size) {
// acquire the allocator lock
// acquire the allocator lock
std
::
lock_guard
<
std
::
mutex
>
lock
(
mutex_
);
std
::
lock_guard
<
std
::
mutex
>
lock
(
mutex_
);
VLOG
(
10
)
<<
"Allocate "
<<
unaligned_size
<<
" bytes from chunk size "
VLOG
(
10
0
)
<<
"Allocate "
<<
unaligned_size
<<
" bytes from chunk size "
<<
size
;
<<
size
;
// if the allocation is huge, send directly to the system allocator
// if the allocation is huge, send directly to the system allocator
if
(
size
>
max_chunk_size_
)
{
if
(
size
>
max_chunk_size_
)
{
VLOG
(
10
)
<<
"Allocate from system allocator."
;
VLOG
(
10
0
)
<<
"Allocate from system allocator."
;
return
SystemAlloc
(
size
);
return
SystemAlloc
(
size
);
}
}
...
@@ -77,7 +77,7 @@ void* BuddyAllocator::Alloc(size_t unaligned_size) {
...
@@ -77,7 +77,7 @@ void* BuddyAllocator::Alloc(size_t unaligned_size) {
return
nullptr
;
return
nullptr
;
}
}
}
else
{
}
else
{
VLOG
(
10
)
<<
"Allocation from existing memory block "
<<
std
::
get
<
2
>
(
*
it
)
VLOG
(
10
0
)
<<
"Allocation from existing memory block "
<<
std
::
get
<
2
>
(
*
it
)
<<
" at address "
<<
" at address "
<<
reinterpret_cast
<
MemoryBlock
*>
(
std
::
get
<
2
>
(
*
it
))
->
data
();
<<
reinterpret_cast
<
MemoryBlock
*>
(
std
::
get
<
2
>
(
*
it
))
->
data
();
}
}
...
@@ -96,10 +96,10 @@ void BuddyAllocator::Free(void* p) {
...
@@ -96,10 +96,10 @@ void BuddyAllocator::Free(void* p) {
// Acquire the allocator lock
// Acquire the allocator lock
std
::
lock_guard
<
std
::
mutex
>
lock
(
mutex_
);
std
::
lock_guard
<
std
::
mutex
>
lock
(
mutex_
);
VLOG
(
10
)
<<
"Free from address "
<<
block
;
VLOG
(
10
0
)
<<
"Free from address "
<<
block
;
if
(
block
->
type
(
cache_
)
==
MemoryBlock
::
HUGE_CHUNK
)
{
if
(
block
->
type
(
cache_
)
==
MemoryBlock
::
HUGE_CHUNK
)
{
VLOG
(
10
)
<<
"Free directly from system allocator"
;
VLOG
(
10
0
)
<<
"Free directly from system allocator"
;
system_allocator_
->
Free
(
block
,
block
->
total_size
(
cache_
),
system_allocator_
->
Free
(
block
,
block
->
total_size
(
cache_
),
block
->
index
(
cache_
));
block
->
index
(
cache_
));
...
@@ -116,7 +116,7 @@ void BuddyAllocator::Free(void* p) {
...
@@ -116,7 +116,7 @@ void BuddyAllocator::Free(void* p) {
// Trying to merge the right buddy
// Trying to merge the right buddy
if
(
block
->
has_right_buddy
(
cache_
))
{
if
(
block
->
has_right_buddy
(
cache_
))
{
VLOG
(
10
)
<<
"Merging this block "
<<
block
<<
" with its right buddy "
VLOG
(
10
0
)
<<
"Merging this block "
<<
block
<<
" with its right buddy "
<<
block
->
right_buddy
(
cache_
);
<<
block
->
right_buddy
(
cache_
);
auto
right_buddy
=
block
->
right_buddy
(
cache_
);
auto
right_buddy
=
block
->
right_buddy
(
cache_
);
...
@@ -134,7 +134,7 @@ void BuddyAllocator::Free(void* p) {
...
@@ -134,7 +134,7 @@ void BuddyAllocator::Free(void* p) {
// Trying to merge the left buddy
// Trying to merge the left buddy
if
(
block
->
has_left_buddy
(
cache_
))
{
if
(
block
->
has_left_buddy
(
cache_
))
{
VLOG
(
10
)
<<
"Merging this block "
<<
block
<<
" with its left buddy "
VLOG
(
10
0
)
<<
"Merging this block "
<<
block
<<
" with its left buddy "
<<
block
->
left_buddy
(
cache_
);
<<
block
->
left_buddy
(
cache_
);
auto
left_buddy
=
block
->
left_buddy
(
cache_
);
auto
left_buddy
=
block
->
left_buddy
(
cache_
);
...
@@ -151,7 +151,7 @@ void BuddyAllocator::Free(void* p) {
...
@@ -151,7 +151,7 @@ void BuddyAllocator::Free(void* p) {
}
}
// Dumping this block into pool
// Dumping this block into pool
VLOG
(
10
)
<<
"Inserting free block ("
<<
block
<<
", "
VLOG
(
10
0
)
<<
"Inserting free block ("
<<
block
<<
", "
<<
block
->
total_size
(
cache_
)
<<
")"
;
<<
block
->
total_size
(
cache_
)
<<
")"
;
pool_
.
insert
(
pool_
.
insert
(
IndexSizeAddress
(
block
->
index
(
cache_
),
block
->
total_size
(
cache_
),
block
));
IndexSizeAddress
(
block
->
index
(
cache_
),
block
->
total_size
(
cache_
),
block
));
...
@@ -174,7 +174,7 @@ void* BuddyAllocator::SystemAlloc(size_t size) {
...
@@ -174,7 +174,7 @@ void* BuddyAllocator::SystemAlloc(size_t size) {
size_t
index
=
0
;
size_t
index
=
0
;
void
*
p
=
system_allocator_
->
Alloc
(
&
index
,
size
);
void
*
p
=
system_allocator_
->
Alloc
(
&
index
,
size
);
VLOG
(
10
)
<<
"Allocated "
<<
p
<<
" from system allocator."
;
VLOG
(
10
0
)
<<
"Allocated "
<<
p
<<
" from system allocator."
;
if
(
p
==
nullptr
)
return
nullptr
;
if
(
p
==
nullptr
)
return
nullptr
;
...
@@ -200,7 +200,7 @@ BuddyAllocator::PoolSet::iterator BuddyAllocator::RefillPool() {
...
@@ -200,7 +200,7 @@ BuddyAllocator::PoolSet::iterator BuddyAllocator::RefillPool() {
if
(
p
==
nullptr
)
return
pool_
.
end
();
if
(
p
==
nullptr
)
return
pool_
.
end
();
VLOG
(
10
)
<<
"Creating and inserting new block "
<<
p
VLOG
(
10
0
)
<<
"Creating and inserting new block "
<<
p
<<
" from system allocator"
;
<<
" from system allocator"
;
static_cast
<
MemoryBlock
*>
(
p
)
->
init
(
&
cache_
,
MemoryBlock
::
FREE_CHUNK
,
index
,
static_cast
<
MemoryBlock
*>
(
p
)
->
init
(
&
cache_
,
MemoryBlock
::
FREE_CHUNK
,
index
,
...
@@ -245,18 +245,18 @@ void* BuddyAllocator::SplitToAlloc(BuddyAllocator::PoolSet::iterator it,
...
@@ -245,18 +245,18 @@ void* BuddyAllocator::SplitToAlloc(BuddyAllocator::PoolSet::iterator it,
auto
block
=
static_cast
<
MemoryBlock
*>
(
std
::
get
<
2
>
(
*
it
));
auto
block
=
static_cast
<
MemoryBlock
*>
(
std
::
get
<
2
>
(
*
it
));
pool_
.
erase
(
it
);
pool_
.
erase
(
it
);
VLOG
(
10
)
<<
"Split block ("
<<
block
<<
", "
<<
block
->
total_size
(
cache_
)
VLOG
(
10
0
)
<<
"Split block ("
<<
block
<<
", "
<<
block
->
total_size
(
cache_
)
<<
") into"
;
<<
") into"
;
block
->
split
(
&
cache_
,
size
);
block
->
split
(
&
cache_
,
size
);
VLOG
(
10
)
<<
"Left block ("
<<
block
<<
", "
<<
block
->
total_size
(
cache_
)
VLOG
(
10
0
)
<<
"Left block ("
<<
block
<<
", "
<<
block
->
total_size
(
cache_
)
<<
")"
;
<<
")"
;
block
->
set_type
(
&
cache_
,
MemoryBlock
::
ARENA_CHUNK
);
block
->
set_type
(
&
cache_
,
MemoryBlock
::
ARENA_CHUNK
);
// the rest of memory if exist
// the rest of memory if exist
if
(
block
->
has_right_buddy
(
cache_
))
{
if
(
block
->
has_right_buddy
(
cache_
))
{
if
(
block
->
right_buddy
(
cache_
)
->
type
(
cache_
)
==
MemoryBlock
::
FREE_CHUNK
)
{
if
(
block
->
right_buddy
(
cache_
)
->
type
(
cache_
)
==
MemoryBlock
::
FREE_CHUNK
)
{
VLOG
(
10
)
<<
"Insert right block ("
<<
block
->
right_buddy
(
cache_
)
<<
", "
VLOG
(
10
0
)
<<
"Insert right block ("
<<
block
->
right_buddy
(
cache_
)
<<
", "
<<
block
->
right_buddy
(
cache_
)
->
total_size
(
cache_
)
<<
")"
;
<<
block
->
right_buddy
(
cache_
)
->
total_size
(
cache_
)
<<
")"
;
pool_
.
insert
(
pool_
.
insert
(
...
@@ -284,7 +284,7 @@ void BuddyAllocator::CleanIdleFallBackAlloc() {
...
@@ -284,7 +284,7 @@ void BuddyAllocator::CleanIdleFallBackAlloc() {
return
;
return
;
}
}
VLOG
(
10
)
<<
"Return block "
<<
block
<<
" to fallback allocator."
;
VLOG
(
10
0
)
<<
"Return block "
<<
block
<<
" to fallback allocator."
;
system_allocator_
->
Free
(
block
,
max_chunk_size_
,
block
->
index
(
cache_
));
system_allocator_
->
Free
(
block
,
max_chunk_size_
,
block
->
index
(
cache_
));
cache_
.
invalidate
(
block
);
cache_
.
invalidate
(
block
);
...
@@ -320,7 +320,7 @@ void BuddyAllocator::CleanIdleNormalAlloc() {
...
@@ -320,7 +320,7 @@ void BuddyAllocator::CleanIdleNormalAlloc() {
MemoryBlock
*
block
=
static_cast
<
MemoryBlock
*>
(
std
::
get
<
2
>
(
*
pool
));
MemoryBlock
*
block
=
static_cast
<
MemoryBlock
*>
(
std
::
get
<
2
>
(
*
pool
));
VLOG
(
10
)
<<
"Return block "
<<
block
<<
" to base allocator."
;
VLOG
(
10
0
)
<<
"Return block "
<<
block
<<
" to base allocator."
;
system_allocator_
->
Free
(
block
,
max_chunk_size_
,
block
->
index
(
cache_
));
system_allocator_
->
Free
(
block
,
max_chunk_size_
,
block
->
index
(
cache_
));
cache_
.
invalidate
(
block
);
cache_
.
invalidate
(
block
);
...
...
paddle/fluid/memory/detail/meta_cache.cc
浏览文件 @
d231e550
...
@@ -29,7 +29,7 @@ MemoryBlock::Desc MetadataCache::load(const MemoryBlock* block) const {
...
@@ -29,7 +29,7 @@ MemoryBlock::Desc MetadataCache::load(const MemoryBlock* block) const {
return
existing_desc
->
second
;
return
existing_desc
->
second
;
}
else
{
}
else
{
auto
*
desc
=
reinterpret_cast
<
const
MemoryBlock
::
Desc
*>
(
block
);
auto
*
desc
=
reinterpret_cast
<
const
MemoryBlock
::
Desc
*>
(
block
);
VLOG
(
10
)
<<
"Load MemoryBlock::Desc type="
<<
desc
->
type
;
VLOG
(
10
0
)
<<
"Load MemoryBlock::Desc type="
<<
desc
->
type
;
PADDLE_ASSERT
(
desc
->
check_guards
());
PADDLE_ASSERT
(
desc
->
check_guards
());
return
*
reinterpret_cast
<
const
MemoryBlock
::
Desc
*>
(
block
);
return
*
reinterpret_cast
<
const
MemoryBlock
::
Desc
*>
(
block
);
}
}
...
...
paddle/fluid/memory/malloc.cc
浏览文件 @
d231e550
...
@@ -78,7 +78,7 @@ void* Alloc<platform::CPUPlace>(const platform::CPUPlace& place, size_t size) {
...
@@ -78,7 +78,7 @@ void* Alloc<platform::CPUPlace>(const platform::CPUPlace& place, size_t size) {
if
(
FLAGS_init_allocated_mem
)
{
if
(
FLAGS_init_allocated_mem
)
{
memset
(
p
,
0xEF
,
size
);
memset
(
p
,
0xEF
,
size
);
}
}
VLOG
(
10
)
<<
" pointer="
<<
p
;
VLOG
(
10
0
)
<<
" pointer="
<<
p
;
return
p
;
return
p
;
}
}
...
@@ -111,7 +111,7 @@ BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
...
@@ -111,7 +111,7 @@ BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
std
::
unique_ptr
<
detail
::
SystemAllocator
>
(
new
detail
::
GPUAllocator
(
i
)),
std
::
unique_ptr
<
detail
::
SystemAllocator
>
(
new
detail
::
GPUAllocator
(
i
)),
platform
::
GpuMinChunkSize
(),
platform
::
GpuMaxChunkSize
());
platform
::
GpuMinChunkSize
(),
platform
::
GpuMaxChunkSize
());
VLOG
(
10
)
<<
"
\n\n
NOTE: each GPU device use "
VLOG
(
10
0
)
<<
"
\n\n
NOTE: each GPU device use "
<<
FLAGS_fraction_of_gpu_memory_to_use
*
100
<<
FLAGS_fraction_of_gpu_memory_to_use
*
100
<<
"% of GPU memory.
\n
"
<<
"% of GPU memory.
\n
"
<<
"You can set GFlags environment variable '"
<<
"You can set GFlags environment variable '"
...
...
paddle/fluid/operators/CMakeLists.txt
浏览文件 @
d231e550
...
@@ -317,6 +317,7 @@ op_library(save_op DEPS lod_tensor)
...
@@ -317,6 +317,7 @@ op_library(save_op DEPS lod_tensor)
op_library
(
load_op DEPS lod_tensor
)
op_library
(
load_op DEPS lod_tensor
)
op_library
(
save_combine_op DEPS lod_tensor
)
op_library
(
save_combine_op DEPS lod_tensor
)
op_library
(
load_combine_op DEPS lod_tensor
)
op_library
(
load_combine_op DEPS lod_tensor
)
op_library
(
tensor_array_to_tensor_op DEPS concat_op
)
op_library
(
concat_op DEPS concat_and_split
)
op_library
(
concat_op DEPS concat_and_split
)
list
(
REMOVE_ITEM GENERAL_OPS
${
DEPS_OPS
}
)
list
(
REMOVE_ITEM GENERAL_OPS
${
DEPS_OPS
}
)
...
...
paddle/fluid/operators/activation_op.cc
浏览文件 @
d231e550
...
@@ -91,16 +91,12 @@ class ActivationOp : public framework::OperatorWithKernel {
...
@@ -91,16 +91,12 @@ class ActivationOp : public framework::OperatorWithKernel {
}
}
};
};
class
ActivationOpInferVarType
:
public
framework
::
VarTypeInference
{
class
ActivationOpInferVarType
public:
:
public
framework
::
PassInDtypeAndVarTypeToOutput
{
void
operator
()(
const
framework
::
OpDesc
&
op_desc
,
protected:
framework
::
BlockDesc
*
block
)
const
override
{
std
::
unordered_map
<
std
::
string
,
std
::
string
>
GetInputOutputWithSameType
()
auto
x_name
=
op_desc
.
Input
(
"X"
)[
0
];
const
override
{
auto
out_name
=
op_desc
.
Output
(
"Out"
)[
0
];
return
std
::
unordered_map
<
std
::
string
,
std
::
string
>
{{
"X"
,
/*->*/
"Out"
}};
auto
&
x
=
block
->
FindRecursiveOrCreateVar
(
x_name
);
auto
&
out
=
block
->
FindRecursiveOrCreateVar
(
out_name
);
out
.
SetType
(
x
.
GetType
());
out
.
SetDataType
(
x
.
GetDataType
());
}
}
};
};
...
...
paddle/fluid/operators/activation_op.h
浏览文件 @
d231e550
...
@@ -95,7 +95,7 @@ class ActivationGradKernel
...
@@ -95,7 +95,7 @@ class ActivationGradKernel
auto
x
=
framework
::
EigenVector
<
T
>::
Flatten
(
*
X
);
auto
x
=
framework
::
EigenVector
<
T
>::
Flatten
(
*
X
);
functor
(
*
place
,
x
,
out
,
dout
,
dx
);
functor
(
*
place
,
x
,
out
,
dout
,
dx
);
}
else
{
}
else
{
VLOG
(
10
)
<<
" Inplace activation "
;
VLOG
(
10
0
)
<<
" Inplace activation "
;
auto
x
=
framework
::
EigenVector
<
T
>::
Flatten
(
*
dX
);
auto
x
=
framework
::
EigenVector
<
T
>::
Flatten
(
*
dX
);
functor
(
*
place
,
x
,
out
,
dout
,
dx
);
functor
(
*
place
,
x
,
out
,
dout
,
dx
);
}
}
...
...
paddle/fluid/operators/adam_op.h
浏览文件 @
d231e550
...
@@ -297,7 +297,7 @@ class AdamOpKernel : public framework::OpKernel<T> {
...
@@ -297,7 +297,7 @@ class AdamOpKernel : public framework::OpKernel<T> {
auto
&
grad
=
auto
&
grad
=
Ref
(
ctx
.
Input
<
framework
::
SelectedRows
>
(
"Grad"
),
"Must set Grad"
);
Ref
(
ctx
.
Input
<
framework
::
SelectedRows
>
(
"Grad"
),
"Must set Grad"
);
if
(
grad
.
rows
().
size
()
==
0
)
{
if
(
grad
.
rows
().
size
()
==
0
)
{
VLOG
(
3
)
<<
"grad row size is 0!!"
;
VLOG
(
3
0
)
<<
"grad row size is 0!!"
;
return
;
return
;
}
}
...
...
paddle/fluid/operators/add_position_encoding_op.h
浏览文件 @
d231e550
...
@@ -66,8 +66,9 @@ class AddPositionEncodingKernel : public framework::OpKernel<T> {
...
@@ -66,8 +66,9 @@ class AddPositionEncodingKernel : public framework::OpKernel<T> {
x_lod
.
empty
()
?
max_seq_len
:
x_lod
[
0
][
i
+
1
]
-
x_lod
[
0
][
i
];
x_lod
.
empty
()
?
max_seq_len
:
x_lod
[
0
][
i
+
1
]
-
x_lod
[
0
][
i
];
for
(
int
j
=
0
;
j
<
max_length
;
++
j
)
{
for
(
int
j
=
0
;
j
<
max_length
;
++
j
)
{
for
(
int
k
=
0
;
k
<
half_size
;
++
k
)
{
for
(
int
k
=
0
;
k
<
half_size
;
++
k
)
{
const
double
val
=
(
half_size
>
1
)
const
double
val
=
?
j
/
pow
(
10000.0
,
double
(
k
)
/
(
half_size
-
1
))
(
half_size
>
1
)
?
j
/
pow
(
10000.0
,
static_cast
<
double
>
(
k
)
/
(
half_size
-
1
))
:
j
/
10000.0
;
:
j
/
10000.0
;
dst_ptr
[
k
]
=
src_ptr
[
k
]
*
alpha
+
sin
(
val
)
*
beta
;
dst_ptr
[
k
]
=
src_ptr
[
k
]
*
alpha
+
sin
(
val
)
*
beta
;
dst_ptr
[
half_size
+
k
]
=
dst_ptr
[
half_size
+
k
]
=
...
...
paddle/fluid/operators/array_operator.h
浏览文件 @
d231e550
...
@@ -49,7 +49,7 @@ class ArrayOp : public framework::OperatorBase {
...
@@ -49,7 +49,7 @@ class ArrayOp : public framework::OperatorBase {
}
else
{
}
else
{
offset
=
static_cast
<
size_t
>
(
*
i_tensor
.
data
<
int64_t
>
());
offset
=
static_cast
<
size_t
>
(
*
i_tensor
.
data
<
int64_t
>
());
}
}
VLOG
(
10
)
<<
" Offset = "
<<
offset
;
VLOG
(
10
0
)
<<
" Offset = "
<<
offset
;
return
offset
;
return
offset
;
}
}
};
};
...
...
paddle/fluid/operators/array_to_lod_tensor_op.cc
浏览文件 @
d231e550
...
@@ -148,7 +148,7 @@ class ArrayToLoDTensorOp : public framework::OperatorBase {
...
@@ -148,7 +148,7 @@ class ArrayToLoDTensorOp : public framework::OperatorBase {
size_t
start_offset
=
lod_and_offset
.
second
.
first
;
size_t
start_offset
=
lod_and_offset
.
second
.
first
;
size_t
end_offset
=
lod_and_offset
.
second
.
second
;
size_t
end_offset
=
lod_and_offset
.
second
.
second
;
VLOG
(
10
)
<<
"idx="
<<
idx
<<
" x_idx="
<<
x_idx
<<
" ["
VLOG
(
10
0
)
<<
"idx="
<<
idx
<<
" x_idx="
<<
x_idx
<<
" ["
<<
", "
<<
end_offset
<<
"]"
;
<<
", "
<<
end_offset
<<
"]"
;
// Copy data
// Copy data
PADDLE_ENFORCE_GE
(
end_offset
,
start_offset
);
PADDLE_ENFORCE_GE
(
end_offset
,
start_offset
);
...
...
paddle/fluid/operators/batch_norm_op.cc
浏览文件 @
d231e550
...
@@ -170,6 +170,15 @@ The required data format for this layer is one of the following:
...
@@ -170,6 +170,15 @@ The required data format for this layer is one of the following:
}
}
};
};
class
BatchNormOpInferVarType
:
public
framework
::
PassInDtypeAndVarTypeToOutput
{
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
GetInputOutputWithSameType
()
const
override
{
return
std
::
unordered_map
<
std
::
string
,
std
::
string
>
{{
"X"
,
/*->*/
"Y"
}};
}
};
template
<
typename
T
>
template
<
typename
T
>
class
BatchNormKernel
<
platform
::
CPUDeviceContext
,
T
>
class
BatchNormKernel
<
platform
::
CPUDeviceContext
,
T
>
:
public
framework
::
OpKernel
<
T
>
{
:
public
framework
::
OpKernel
<
T
>
{
...
@@ -525,7 +534,7 @@ class BatchNormGradMaker : public framework::SingleGradOpDescMaker {
...
@@ -525,7 +534,7 @@ class BatchNormGradMaker : public framework::SingleGradOpDescMaker {
namespace
ops
=
paddle
::
operators
;
namespace
ops
=
paddle
::
operators
;
REGISTER_OPERATOR
(
batch_norm
,
ops
::
BatchNormOp
,
ops
::
BatchNormOpMaker
,
REGISTER_OPERATOR
(
batch_norm
,
ops
::
BatchNormOp
,
ops
::
BatchNormOpMaker
,
ops
::
BatchNormGradMaker
);
ops
::
BatchNorm
OpInferVarType
,
ops
::
BatchNorm
GradMaker
);
REGISTER_OPERATOR
(
batch_norm_grad
,
ops
::
BatchNormGradOp
);
REGISTER_OPERATOR
(
batch_norm_grad
,
ops
::
BatchNormGradOp
);
REGISTER_OP_CPU_KERNEL
(
REGISTER_OP_CPU_KERNEL
(
...
...
paddle/fluid/operators/batch_norm_op.cu.cc
浏览文件 @
d231e550
...
@@ -96,7 +96,7 @@ class BatchNormKernel<platform::CUDADeviceContext, T>
...
@@ -96,7 +96,7 @@ class BatchNormKernel<platform::CUDADeviceContext, T>
mode_
=
CUDNN_BATCHNORM_SPATIAL
;
mode_
=
CUDNN_BATCHNORM_SPATIAL
;
#endif
#endif
VLOG
(
3
)
<<
"Setting descriptors."
;
VLOG
(
3
0
)
<<
"Setting descriptors."
;
std
::
vector
<
int
>
dims
;
std
::
vector
<
int
>
dims
;
std
::
vector
<
int
>
strides
;
std
::
vector
<
int
>
strides
;
if
(
data_layout
==
DataLayout
::
kNCHW
)
{
if
(
data_layout
==
DataLayout
::
kNCHW
)
{
...
...
paddle/fluid/operators/beam_search_op.cc
浏览文件 @
d231e550
...
@@ -33,11 +33,11 @@ void BeamSearch::operator()(const framework::LoDTensor &pre_ids,
...
@@ -33,11 +33,11 @@ void BeamSearch::operator()(const framework::LoDTensor &pre_ids,
auto
items
=
SelectTopBeamSizeItems
(
pre_ids
,
pre_scores
);
auto
items
=
SelectTopBeamSizeItems
(
pre_ids
,
pre_scores
);
auto
selected_items
=
ToMap
(
items
,
high_level
.
back
());
auto
selected_items
=
ToMap
(
items
,
high_level
.
back
());
VLOG
(
3
)
<<
"selected_items:"
;
VLOG
(
3
0
)
<<
"selected_items:"
;
for
(
size_t
i
=
0
;
i
<
selected_items
.
size
();
++
i
)
{
for
(
size_t
i
=
0
;
i
<
selected_items
.
size
();
++
i
)
{
VLOG
(
3
)
<<
"offset:"
<<
i
;
VLOG
(
3
0
)
<<
"offset:"
<<
i
;
for
(
auto
&
item
:
selected_items
[
i
])
{
for
(
auto
&
item
:
selected_items
[
i
])
{
VLOG
(
3
)
<<
ItemToString
(
item
);
VLOG
(
3
0
)
<<
ItemToString
(
item
);
}
}
}
}
...
@@ -138,11 +138,11 @@ std::vector<std::vector<BeamSearch::Item>> BeamSearch::SelectTopBeamSizeItems(
...
@@ -138,11 +138,11 @@ std::vector<std::vector<BeamSearch::Item>> BeamSearch::SelectTopBeamSizeItems(
}
}
result
.
emplace_back
(
items
);
result
.
emplace_back
(
items
);
}
}
VLOG
(
3
)
<<
"SelectTopBeamSizeItems result size "
<<
result
.
size
();
VLOG
(
3
0
)
<<
"SelectTopBeamSizeItems result size "
<<
result
.
size
();
for
(
auto
&
items
:
result
)
{
for
(
auto
&
items
:
result
)
{
VLOG
(
3
)
<<
"item set:"
;
VLOG
(
3
0
)
<<
"item set:"
;
for
(
auto
&
item
:
items
)
{
for
(
auto
&
item
:
items
)
{
VLOG
(
3
)
<<
ItemToString
(
item
);
VLOG
(
3
0
)
<<
ItemToString
(
item
);
}
}
}
}
...
...
paddle/fluid/operators/bilinear_interp_op.h
已删除
100644 → 0
浏览文件 @
cf8d2e67
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/math/math_function.h"
namespace
paddle
{
namespace
operators
{
using
Tensor
=
framework
::
Tensor
;
template
<
typename
T
>
class
BilinearInterpKernel
:
public
framework
::
OpKernel
<
T
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
*
input_t
=
ctx
.
Input
<
Tensor
>
(
"X"
);
// float tensor
auto
*
output_t
=
ctx
.
Output
<
Tensor
>
(
"Out"
);
// float tensor
auto
out_dims
=
output_t
->
dims
();
auto
*
input
=
input_t
->
data
<
T
>
();
int
out_h
=
ctx
.
Attr
<
int
>
(
"out_h"
);
int
out_w
=
ctx
.
Attr
<
int
>
(
"out_w"
);
auto
out_size_t
=
ctx
.
Input
<
Tensor
>
(
"OutSize"
);
if
(
out_size_t
!=
nullptr
)
{
auto
out_size_data
=
out_size_t
->
data
<
int
>
();
out_h
=
out_size_data
[
0
];
out_w
=
out_size_data
[
1
];
}
auto
*
output
=
output_t
->
mutable_data
<
T
>
(
{
out_dims
[
0
],
out_dims
[
1
],
out_h
,
out_w
},
ctx
.
GetPlace
());
int
batch_size
=
input_t
->
dims
()[
0
];
int
channels
=
input_t
->
dims
()[
1
];
int
in_h
=
input_t
->
dims
()[
2
];
int
in_w
=
input_t
->
dims
()[
3
];
int
in_hw
=
in_h
*
in_w
;
int
out_hw
=
out_h
*
out_w
;
int
in_chw
=
channels
*
in_hw
;
int
out_chw
=
channels
*
out_hw
;
float
ratio_h
=
(
out_h
>
1
)
?
static_cast
<
float
>
(
in_h
-
1
)
/
(
out_h
-
1
)
:
0.
f
;
float
ratio_w
=
(
out_w
>
1
)
?
static_cast
<
float
>
(
in_w
-
1
)
/
(
out_w
-
1
)
:
0.
f
;
if
(
in_h
==
out_h
&&
in_w
==
out_w
)
{
memcpy
(
output
,
input
,
input_t
->
numel
()
*
sizeof
(
T
));
}
else
{
for
(
int
k
=
0
;
k
<
batch_size
;
++
k
)
{
// loop for batches
for
(
int
i
=
0
;
i
<
out_h
;
++
i
)
{
// loop for images
int
h
=
ratio_h
*
i
;
int
hid
=
(
h
<
in_h
-
1
)
?
1
:
0
;
float
h1lambda
=
ratio_h
*
i
-
h
;
float
h2lambda
=
1.
f
-
h1lambda
;
for
(
int
j
=
0
;
j
<
out_w
;
++
j
)
{
int
w
=
ratio_w
*
j
;
int
wid
=
(
w
<
in_w
-
1
)
?
1
:
0
;
float
w1lambda
=
ratio_w
*
j
-
w
;
float
w2lambda
=
1.
f
-
w1lambda
;
// calculate four position for bilinear interpolation
const
T
*
in_pos
=
&
input
[
k
*
in_chw
+
h
*
in_w
+
w
];
T
*
out_pos
=
&
output
[
k
*
out_chw
+
i
*
out_w
+
j
];
for
(
int
c
=
0
;
c
<
channels
;
++
c
)
{
// loop for channels
// bilinear interpolation
out_pos
[
0
]
=
static_cast
<
T
>
(
h2lambda
*
(
w2lambda
*
in_pos
[
0
]
+
w1lambda
*
in_pos
[
wid
])
+
h1lambda
*
(
w2lambda
*
in_pos
[
hid
*
in_w
]
+
w1lambda
*
in_pos
[
hid
*
in_w
+
wid
]));
in_pos
+=
in_hw
;
out_pos
+=
out_hw
;
}
}
}
}
}
}
};
template
<
typename
T
>
class
BilinearInterpGradKernel
:
public
framework
::
OpKernel
<
T
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
*
d_input_t
=
ctx
.
Output
<
Tensor
>
(
framework
::
GradVarName
(
"X"
));
auto
*
d_output_t
=
ctx
.
Input
<
Tensor
>
(
framework
::
GradVarName
(
"Out"
));
auto
*
d_output
=
d_output_t
->
data
<
T
>
();
auto
*
d_input
=
d_input_t
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
auto
&
device_ctx
=
ctx
.
template
device_context
<
platform
::
CPUDeviceContext
>();
math
::
SetConstant
<
platform
::
CPUDeviceContext
,
T
>
zero
;
zero
(
device_ctx
,
d_input_t
,
static_cast
<
T
>
(
0.0
));
int
out_h
=
ctx
.
Attr
<
int
>
(
"out_h"
);
int
out_w
=
ctx
.
Attr
<
int
>
(
"out_w"
);
auto
out_size_t
=
ctx
.
Input
<
Tensor
>
(
"OutSize"
);
if
(
out_size_t
!=
nullptr
)
{
auto
out_size_data
=
out_size_t
->
data
<
int
>
();
out_h
=
out_size_data
[
0
];
out_w
=
out_size_data
[
1
];
}
int
batch_size
=
d_input_t
->
dims
()[
0
];
int
channels
=
d_input_t
->
dims
()[
1
];
int
in_h
=
d_input_t
->
dims
()[
2
];
int
in_w
=
d_input_t
->
dims
()[
3
];
int
in_hw
=
in_h
*
in_w
;
int
out_hw
=
out_h
*
out_w
;
int
in_chw
=
channels
*
in_hw
;
int
out_chw
=
channels
*
out_hw
;
float
ratio_h
=
(
out_h
>
1
)
?
static_cast
<
float
>
(
in_h
-
1
)
/
(
out_h
-
1
)
:
0.
f
;
float
ratio_w
=
(
out_w
>
1
)
?
static_cast
<
float
>
(
in_w
-
1
)
/
(
out_w
-
1
)
:
0.
f
;
if
(
in_h
==
out_h
&&
in_w
==
out_w
)
{
memcpy
(
d_input
,
d_output
,
d_input_t
->
numel
()
*
sizeof
(
T
));
}
else
{
for
(
int
k
=
0
;
k
<
batch_size
;
++
k
)
{
// loop for batches
for
(
int
i
=
0
;
i
<
out_h
;
++
i
)
{
// loop for images
int
h
=
ratio_h
*
i
;
int
hid
=
(
h
<
in_h
-
1
)
?
1
:
0
;
float
h1lambda
=
ratio_h
*
i
-
h
;
float
h2lambda
=
1
-
h1lambda
;
for
(
int
j
=
0
;
j
<
out_w
;
++
j
)
{
int
w
=
ratio_w
*
j
;
int
wid
=
(
w
<
in_w
-
1
)
?
1
:
0
;
float
w1lambda
=
ratio_w
*
j
-
w
;
float
w2lambda
=
1
-
w1lambda
;
T
*
in_pos
=
&
d_input
[
k
*
in_chw
+
h
*
in_w
+
w
];
const
T
*
out_pos
=
&
d_output
[
k
*
out_chw
+
i
*
out_w
+
j
];
for
(
int
c
=
0
;
c
<
channels
;
++
c
)
{
// loop for channels
in_pos
[
0
]
+=
static_cast
<
T
>
(
h2lambda
*
w2lambda
*
out_pos
[
0
]);
in_pos
[
wid
]
+=
static_cast
<
T
>
(
h2lambda
*
w1lambda
*
out_pos
[
0
]);
in_pos
[
hid
*
in_w
]
+=
static_cast
<
T
>
(
h1lambda
*
w2lambda
*
out_pos
[
0
]);
in_pos
[
hid
*
in_w
+
wid
]
+=
static_cast
<
T
>
(
h1lambda
*
w1lambda
*
out_pos
[
0
]);
in_pos
+=
in_hw
;
out_pos
+=
out_hw
;
}
}
}
}
}
}
};
}
// namespace operators
}
// namespace paddle
paddle/fluid/operators/checkpoint_notify_op.cc
浏览文件 @
d231e550
...
@@ -46,8 +46,8 @@ class CheckpointNotifyOp : public framework::OperatorBase {
...
@@ -46,8 +46,8 @@ class CheckpointNotifyOp : public framework::OperatorBase {
auto
lookup_table_save_dir
=
auto
lookup_table_save_dir
=
string
::
Sprintf
(
"%s/%s_%d"
,
dir
,
lookup_table_name
,
i
);
string
::
Sprintf
(
"%s/%s_%d"
,
dir
,
lookup_table_name
,
i
);
rpc_client
->
AsyncCheckpointNotify
(
epmap
[
i
],
lookup_table_save_dir
);
rpc_client
->
AsyncCheckpointNotify
(
epmap
[
i
],
lookup_table_save_dir
);
VLOG
(
3
)
<<
"checkpoint notify sending lookup table: "
<<
lookup_table_name
VLOG
(
3
0
)
<<
"checkpoint notify sending lookup table: "
<<
" and dir:"
<<
dir
<<
" to "
<<
epmap
[
i
];
<<
lookup_table_name
<<
" and dir:"
<<
dir
<<
" to "
<<
epmap
[
i
];
}
}
PADDLE_ENFORCE
(
rpc_client
->
Wait
(),
"internal error in RPCClient"
);
PADDLE_ENFORCE
(
rpc_client
->
Wait
(),
"internal error in RPCClient"
);
}
}
...
...
paddle/fluid/operators/concat_op.cc
浏览文件 @
d231e550
...
@@ -37,7 +37,7 @@ class ConcatOp : public framework::OperatorWithKernel {
...
@@ -37,7 +37,7 @@ class ConcatOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_GT
(
n
,
0
,
"Input tensors count should > 0."
);
PADDLE_ENFORCE_GT
(
n
,
0
,
"Input tensors count should > 0."
);
if
(
n
==
1
)
{
if
(
n
==
1
)
{
VLOG
(
3
)
<<
"Warning: concat op have only one input, may waste memory"
;
VLOG
(
3
0
)
<<
"Warning: concat op have only one input, may waste memory"
;
}
}
auto
out_dims
=
ins
[
0
];
auto
out_dims
=
ins
[
0
];
...
...
paddle/fluid/operators/conv_cudnn_op.cu.cc
浏览文件 @
d231e550
...
@@ -15,15 +15,22 @@ limitations under the License. */
...
@@ -15,15 +15,22 @@ limitations under the License. */
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/memory/memory.h"
#include "paddle/fluid/memory/memory.h"
#include "paddle/fluid/operators/conv_cudnn_op_cache.h"
#include "paddle/fluid/operators/conv_op.h"
#include "paddle/fluid/operators/conv_op.h"
#include "paddle/fluid/platform/assert.h"
#include "paddle/fluid/platform/assert.h"
#include "paddle/fluid/platform/cudnn_helper.h"
#include "paddle/fluid/platform/cudnn_helper.h"
#include "paddle/fluid/platform/float16.h"
#include "paddle/fluid/platform/float16.h"
#include "paddle/fluid/platform/profiler.h"
DEFINE_bool
(
cudnn_deterministic
,
false
,
DEFINE_bool
(
cudnn_deterministic
,
false
,
"Whether allow using an autotuning algorithm for convolution "
"Whether allow using an autotuning algorithm for convolution "
"operator. The autotuning algorithm may be non-deterministic. If "
"operator. The autotuning algorithm may be non-deterministic. If "
"true, the algorithm is deterministic."
);
"true, the algorithm is deterministic."
);
DEFINE_uint64
(
conv_workspace_size_limit
,
4096
,
"cuDNN convolution workspace limit in MB unit."
);
DEFINE_bool
(
cudnn_exhaustive_search
,
false
,
"Whether enable exhaustive search for cuDNN convolution or "
"not, defalut is False."
);
namespace
paddle
{
namespace
paddle
{
namespace
operators
{
namespace
operators
{
...
@@ -36,13 +43,25 @@ using DataLayout = platform::DataLayout;
...
@@ -36,13 +43,25 @@ using DataLayout = platform::DataLayout;
template
<
typename
T
>
template
<
typename
T
>
using
ScalingParamType
=
typename
platform
::
CudnnDataType
<
T
>::
ScalingParamType
;
using
ScalingParamType
=
typename
platform
::
CudnnDataType
<
T
>::
ScalingParamType
;
static
constexpr
char
kCUDNNFwdAlgoCache
[]
=
"kCUDNNFwdAlgoCache"
;
static
constexpr
char
kCUDNNBwdDataAlgoCache
[]
=
"kCUDNNBwdDataAlgoCache"
;
static
constexpr
char
kCUDNNBwdFilterAlgoCache
[]
=
"kCUDNNBwdFilterAlgoCache"
;
static
constexpr
size_t
kCONV_CUDNN_WORKSPACE_LIMIT_BYTES
=
static
constexpr
size_t
kCONV_CUDNN_WORKSPACE_LIMIT_BYTES
=
static_cast
<
size_t
>
(
1024
)
*
1024
*
1024
;
static_cast
<
size_t
>
(
1024
)
*
1024
*
1024
;
static
constexpr
size_t
kNUM_CUDNN_FWD_ALGS
=
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_COUNT
;
static
constexpr
size_t
kNUM_CUDNN_BWD_FILTER_ALGS
=
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_COUNT
;
static
constexpr
size_t
kNUM_CUDNN_BWD_DATA_ALGS
=
CUDNN_CONVOLUTION_BWD_DATA_ALGO_COUNT
;
template
<
typename
T
>
template
<
typename
T
>
class
CUDNNConvOpKernel
:
public
framework
::
OpKernel
<
T
>
{
class
CUDNNConvOpKernel
:
public
framework
::
OpKernel
<
T
>
{
public:
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
&
dev_ctx
=
ctx
.
template
device_context
<
platform
::
CUDADeviceContext
>();
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
"It must use CUDAPlace."
);
"It must use CUDAPlace."
);
auto
*
input
=
ctx
.
Input
<
Tensor
>
(
"Input"
);
auto
*
input
=
ctx
.
Input
<
Tensor
>
(
"Input"
);
...
@@ -55,6 +74,8 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
...
@@ -55,6 +74,8 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
int
groups
=
ctx
.
Attr
<
int
>
(
"groups"
);
int
groups
=
ctx
.
Attr
<
int
>
(
"groups"
);
int64_t
user_workspace_size
=
int64_t
user_workspace_size
=
static_cast
<
size_t
>
(
ctx
.
Attr
<
int
>
(
"workspace_size_MB"
));
static_cast
<
size_t
>
(
ctx
.
Attr
<
int
>
(
"workspace_size_MB"
));
bool
exhaustive_search
=
FLAGS_cudnn_exhaustive_search
||
ctx
.
Attr
<
bool
>
(
"exhaustive_search"
);
const
T
*
input_data
=
input
->
data
<
T
>
();
const
T
*
input_data
=
input
->
data
<
T
>
();
const
T
*
filter_data
=
filter
->
data
<
T
>
();
const
T
*
filter_data
=
filter
->
data
<
T
>
();
...
@@ -120,19 +141,19 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
...
@@ -120,19 +141,19 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
// ------------------- cudnn conv workspace ---------------------
// ------------------- cudnn conv workspace ---------------------
size_t
workspace_size_in_bytes
;
// final workspace to allocate.
size_t
workspace_size_in_bytes
;
// final workspace to allocate.
size_t
workspace_size_limit
=
kCONV_CUDNN_WORKSPACE_LIMIT_BYTES
;
size_t
workspace_size_limit
=
kCONV_CUDNN_WORKSPACE_LIMIT_BYTES
;
if
(
user_workspace_size
>
0
)
{
if
(
FLAGS_conv_workspace_size_limit
>
0
||
user_workspace_size
>
0
)
{
workspace_size_limit
=
user_workspace_size
*
1024
*
1024
;
int64_t
max_user_size
=
std
::
max
(
static_cast
<
int64_t
>
(
FLAGS_conv_workspace_size_limit
),
user_workspace_size
);
workspace_size_limit
=
max_user_size
*
1024
*
1024
;
}
}
// ------------------- cudnn conv algorithm ---------------------
// ------------------- cudnn conv algorithm ---------------------
cudnnConvolutionFwdAlgo_t
algo
;
cudnnConvolutionFwdAlgo_t
algo
;
auto
&
dev_ctx
=
ctx
.
template
device_context
<
platform
::
CUDADeviceContext
>();
auto
handle
=
dev_ctx
.
cudnn_handle
();
auto
handle
=
dev_ctx
.
cudnn_handle
();
auto
workspace_handle
=
dev_ctx
.
cudnn_workspace_handle
();
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnGetConvolutionForwardAlgorithm
(
bool
half_float
=
false
;
handle
,
cudnn_input_desc
,
cudnn_filter_desc
,
cudnn_conv_desc
,
cudnn_output_desc
,
CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT
,
workspace_size_limit
,
&
algo
));
#if CUDA_VERSION >= 9000 && CUDNN_VERSION_MIN(7, 0, 1)
#if CUDA_VERSION >= 9000 && CUDNN_VERSION_MIN(7, 0, 1)
// Tensor core is supported since the volta GPU and
// Tensor core is supported since the volta GPU and
// is only enabled when input and filter data are float16
// is only enabled when input and filter data are float16
...
@@ -143,14 +164,66 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
...
@@ -143,14 +164,66 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
cudnn_conv_desc
,
CUDNN_TENSOR_OP_MATH
));
cudnn_conv_desc
,
CUDNN_TENSOR_OP_MATH
));
// Currently tensor core is only enabled using this algo
// Currently tensor core is only enabled using this algo
algo
=
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
;
algo
=
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
;
VLOG
(
5
)
<<
"use cudnn_tensor_op_math"
;
half_float
=
true
;
VLOG
(
50
)
<<
"use cudnn_tensor_op_math"
;
}
else
{
}
else
{
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnSetConvolutionMathType
(
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnSetConvolutionMathType
(
cudnn_conv_desc
,
CUDNN_DEFAULT_MATH
));
cudnn_conv_desc
,
CUDNN_DEFAULT_MATH
));
VLOG
(
5
)
<<
"NOT use cudnn_tensor_op_math"
;
VLOG
(
5
0
)
<<
"NOT use cudnn_tensor_op_math"
;
}
}
#endif
#endif
auto
x_dims
=
framework
::
vectorize
(
input
->
dims
());
auto
f_dims
=
framework
::
vectorize
(
filter
->
dims
());
if
((
!
exhaustive_search
)
&&
(
!
half_float
))
{
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnGetConvolutionForwardAlgorithm
(
handle
,
cudnn_input_desc
,
cudnn_filter_desc
,
cudnn_conv_desc
,
cudnn_output_desc
,
CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT
,
workspace_size_limit
,
&
algo
));
VLOG
(
3
)
<<
"cuDNN forward algo "
<<
algo
;
}
else
if
(
exhaustive_search
&&
(
!
half_float
))
{
AlgorithmsCache
<
cudnnConvolutionFwdAlgo_t
>*
algo_cache
=
nullptr
;
if
(
ctx
.
scope
().
FindVar
(
kCUDNNFwdAlgoCache
))
{
algo_cache
=
ctx
.
scope
()
.
FindVar
(
kCUDNNFwdAlgoCache
)
->
GetMutable
<
AlgorithmsCache
<
cudnnConvolutionFwdAlgo_t
>>
();
}
else
{
algo_cache
=
const_cast
<
framework
::
Scope
&>
(
ctx
.
scope
())
.
Var
(
kCUDNNFwdAlgoCache
)
->
GetMutable
<
AlgorithmsCache
<
cudnnConvolutionFwdAlgo_t
>>
();
}
algo
=
algo_cache
->
GetAlgorithm
(
x_dims
,
f_dims
,
strides
,
paddings
,
dilations
,
0
,
[
&
]()
{
int
returned_algo_count
;
std
::
array
<
cudnnConvolutionFwdAlgoPerf_t
,
kNUM_CUDNN_FWD_ALGS
>
fwd_perf_stat
;
auto
cudnn_find_func
=
[
&
](
void
*
cudnn_workspace
)
{
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnFindConvolutionForwardAlgorithmEx
(
handle
,
cudnn_input_desc
,
input_data
,
cudnn_filter_desc
,
filter_data
,
cudnn_conv_desc
,
cudnn_output_desc
,
output_data
,
kNUM_CUDNN_FWD_ALGS
,
&
returned_algo_count
,
fwd_perf_stat
.
data
(),
cudnn_workspace
,
workspace_size_limit
));
};
workspace_handle
.
RunFunc
(
cudnn_find_func
,
workspace_size_limit
);
VLOG
(
3
)
<<
"Perf result: (algo: stat, time, memory)"
;
for
(
int
i
=
0
;
i
<
returned_algo_count
;
++
i
)
{
const
auto
&
stat
=
fwd_perf_stat
[
i
];
VLOG
(
3
)
<<
stat
.
algo
<<
": "
<<
stat
.
status
<<
" "
<<
stat
.
time
<<
" "
<<
stat
.
memory
;
}
return
fwd_perf_stat
[
0
].
algo
;
});
VLOG
(
3
)
<<
"choose algo "
<<
algo
;
}
else
{
PADDLE_ENFORCE
(
half_float
,
"cuDNN exhaustive search doesn't support half float."
);
}
// get workspace size able to allocate
// get workspace size able to allocate
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnGetConvolutionForwardWorkspaceSize
(
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnGetConvolutionForwardWorkspaceSize
(
handle
,
cudnn_input_desc
,
cudnn_filter_desc
,
cudnn_conv_desc
,
handle
,
cudnn_input_desc
,
cudnn_filter_desc
,
cudnn_conv_desc
,
...
@@ -162,7 +235,6 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
...
@@ -162,7 +235,6 @@ class CUDNNConvOpKernel : public framework::OpKernel<T> {
// ------------------- cudnn conv forward ---------------------
// ------------------- cudnn conv forward ---------------------
ScalingParamType
<
T
>
alpha
=
1.0
f
,
beta
=
0.0
f
;
ScalingParamType
<
T
>
alpha
=
1.0
f
,
beta
=
0.0
f
;
auto
workspace_handle
=
dev_ctx
.
cudnn_workspace_handle
();
for
(
int
i
=
0
;
i
<
groups
;
i
++
)
{
for
(
int
i
=
0
;
i
<
groups
;
i
++
)
{
auto
cudnn_func
=
[
&
](
void
*
cudnn_workspace
)
{
auto
cudnn_func
=
[
&
](
void
*
cudnn_workspace
)
{
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnConvolutionForward
(
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnConvolutionForward
(
...
@@ -180,6 +252,7 @@ template <typename T>
...
@@ -180,6 +252,7 @@ template <typename T>
class
CUDNNConvGradOpKernel
:
public
framework
::
OpKernel
<
T
>
{
class
CUDNNConvGradOpKernel
:
public
framework
::
OpKernel
<
T
>
{
public:
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
&
dev_ctx
=
ctx
.
template
device_context
<
platform
::
CUDADeviceContext
>();
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
PADDLE_ENFORCE
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()),
"It must use CUDAPlace."
);
"It must use CUDAPlace."
);
auto
input
=
ctx
.
Input
<
Tensor
>
(
"Input"
);
auto
input
=
ctx
.
Input
<
Tensor
>
(
"Input"
);
...
@@ -198,6 +271,13 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
...
@@ -198,6 +271,13 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
int
groups
=
ctx
.
Attr
<
int
>
(
"groups"
);
int
groups
=
ctx
.
Attr
<
int
>
(
"groups"
);
int64_t
user_workspace_size
=
int64_t
user_workspace_size
=
static_cast
<
size_t
>
(
ctx
.
Attr
<
int
>
(
"workspace_size_MB"
));
static_cast
<
size_t
>
(
ctx
.
Attr
<
int
>
(
"workspace_size_MB"
));
bool
exhaustive_search
=
FLAGS_cudnn_exhaustive_search
||
ctx
.
Attr
<
bool
>
(
"exhaustive_search"
);
if
(
exhaustive_search
&&
FLAGS_cudnn_deterministic
)
{
PADDLE_THROW
(
"Cann't set exhaustive_search True and "
"FLAGS_cudnn_deterministic True at same time."
);
}
// ------------------- cudnn descriptors ---------------------
// ------------------- cudnn descriptors ---------------------
ScopedTensorDescriptor
input_desc
;
ScopedTensorDescriptor
input_desc
;
...
@@ -265,14 +345,66 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
...
@@ -265,14 +345,66 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
cudnnConvolutionBwdFilterAlgo_t
filter_algo
;
cudnnConvolutionBwdFilterAlgo_t
filter_algo
;
size_t
workspace_size_in_bytes
=
0
,
tmp_size
=
0
;
size_t
workspace_size_in_bytes
=
0
,
tmp_size
=
0
;
size_t
workspace_size_limit
=
kCONV_CUDNN_WORKSPACE_LIMIT_BYTES
;
size_t
workspace_size_limit
=
kCONV_CUDNN_WORKSPACE_LIMIT_BYTES
;
if
(
user_workspace_size
>
0
)
{
if
(
FLAGS_conv_workspace_size_limit
>
0
||
user_workspace_size
>
0
)
{
workspace_size_limit
=
user_workspace_size
*
1024
*
1024
;
int64_t
max_user_size
=
std
::
max
(
static_cast
<
int64_t
>
(
FLAGS_conv_workspace_size_limit
),
user_workspace_size
);
workspace_size_limit
=
max_user_size
*
1024
*
1024
;
}
}
auto
&
dev_ctx
=
ctx
.
template
device_context
<
platform
::
CUDADeviceContext
>();
auto
x_dims
=
framework
::
vectorize
(
input
->
dims
());
auto
f_dims
=
framework
::
vectorize
(
filter
->
dims
());
auto
handle
=
dev_ctx
.
cudnn_handle
();
auto
handle
=
dev_ctx
.
cudnn_handle
();
auto
workspace_handle
=
dev_ctx
.
cudnn_workspace_handle
();
if
(
input_grad
)
{
if
(
input_grad
)
{
if
(
!
FLAGS_cudnn_deterministic
)
{
T
*
input_grad_data
=
input_grad
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
if
(
exhaustive_search
)
{
AlgorithmsCache
<
cudnnConvolutionBwdDataAlgo_t
>*
data_algo_cache
;
if
(
ctx
.
scope
().
FindVar
(
kCUDNNBwdDataAlgoCache
))
{
data_algo_cache
=
ctx
.
scope
()
.
FindVar
(
kCUDNNBwdDataAlgoCache
)
->
GetMutable
<
AlgorithmsCache
<
cudnnConvolutionBwdDataAlgo_t
>>
();
}
else
{
data_algo_cache
=
const_cast
<
framework
::
Scope
&>
(
ctx
.
scope
())
.
Var
(
kCUDNNBwdDataAlgoCache
)
->
GetMutable
<
AlgorithmsCache
<
cudnnConvolutionBwdDataAlgo_t
>>
();
}
data_algo
=
data_algo_cache
->
GetAlgorithm
(
x_dims
,
f_dims
,
strides
,
paddings
,
dilations
,
0
,
[
&
]()
{
int
returned_algo_count
;
std
::
array
<
cudnnConvolutionBwdDataAlgoPerf_t
,
kNUM_CUDNN_BWD_DATA_ALGS
>
data_perf_stat
;
auto
cudnn_find_bd_data_func
=
[
&
](
void
*
cudnn_workspace
)
{
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnFindConvolutionBackwardDataAlgorithmEx
(
handle
,
cudnn_filter_desc
,
filter_data
,
cudnn_output_grad_desc
,
output_grad_data
,
cudnn_conv_desc
,
cudnn_input_desc
,
input_grad_data
,
kNUM_CUDNN_BWD_DATA_ALGS
,
&
returned_algo_count
,
data_perf_stat
.
data
(),
cudnn_workspace
,
workspace_size_limit
));
};
workspace_handle
.
RunFunc
(
cudnn_find_bd_data_func
,
workspace_size_limit
);
VLOG
(
3
)
<<
"Perf result: (algo: stat, time, memory)"
;
for
(
int
i
=
0
;
i
<
returned_algo_count
;
++
i
)
{
const
auto
&
stat
=
data_perf_stat
[
i
];
VLOG
(
3
)
<<
stat
.
algo
<<
": "
<<
stat
.
status
<<
" "
<<
stat
.
time
<<
" "
<<
stat
.
memory
;
}
return
data_perf_stat
[
0
].
algo
;
});
VLOG
(
3
)
<<
"cuDNN backward data algo "
<<
data_algo
;
}
else
if
(
FLAGS_cudnn_deterministic
)
{
data_algo
=
CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
;
}
else
{
CUDNN_ENFORCE
(
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnGetConvolutionBackwardDataAlgorithm
(
platform
::
dynload
::
cudnnGetConvolutionBackwardDataAlgorithm
(
handle
,
cudnn_filter_desc
,
handle
,
cudnn_filter_desc
,
...
@@ -285,10 +417,7 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
...
@@ -285,10 +417,7 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
cudnn_input_desc
,
cudnn_input_desc
,
CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT
,
CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT
,
workspace_size_limit
,
&
data_algo
));
workspace_size_limit
,
&
data_algo
));
}
else
{
data_algo
=
CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
;
}
}
CUDNN_ENFORCE
(
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnGetConvolutionBackwardDataWorkspaceSize
(
platform
::
dynload
::
cudnnGetConvolutionBackwardDataWorkspaceSize
(
handle
,
cudnn_filter_desc
,
cudnn_output_grad_desc
,
handle
,
cudnn_filter_desc
,
cudnn_output_grad_desc
,
...
@@ -297,17 +426,54 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
...
@@ -297,17 +426,54 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
}
}
if
(
filter_grad
)
{
if
(
filter_grad
)
{
if
(
!
FLAGS_cudnn_deterministic
)
{
T
*
filter_grad_data
=
filter_grad
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
if
(
exhaustive_search
)
{
AlgorithmsCache
<
cudnnConvolutionBwdFilterAlgo_t
>*
f_algo_cache
;
if
(
ctx
.
scope
().
FindVar
(
kCUDNNBwdFilterAlgoCache
))
{
f_algo_cache
=
ctx
.
scope
()
.
FindVar
(
kCUDNNBwdFilterAlgoCache
)
->
GetMutable
<
AlgorithmsCache
<
cudnnConvolutionBwdFilterAlgo_t
>>
();
}
else
{
f_algo_cache
=
const_cast
<
framework
::
Scope
&>
(
ctx
.
scope
())
.
Var
(
kCUDNNBwdFilterAlgoCache
)
->
GetMutable
<
AlgorithmsCache
<
cudnnConvolutionBwdFilterAlgo_t
>>
();
}
filter_algo
=
f_algo_cache
->
GetAlgorithm
(
x_dims
,
f_dims
,
strides
,
paddings
,
dilations
,
0
,
[
&
]()
{
int
returned_algo_count
;
std
::
array
<
cudnnConvolutionBwdFilterAlgoPerf_t
,
kNUM_CUDNN_BWD_FILTER_ALGS
>
filter_perf_stat
;
auto
cudnn_find_bd_f_func
=
[
&
](
void
*
cudnn_workspace
)
{
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnFindConvolutionBackwardFilterAlgorithmEx
(
handle
,
cudnn_input_desc
,
input_data
,
cudnn_output_grad_desc
,
output_grad_data
,
cudnn_conv_desc
,
cudnn_filter_desc
,
filter_grad_data
,
kNUM_CUDNN_BWD_FILTER_ALGS
,
&
returned_algo_count
,
filter_perf_stat
.
data
(),
cudnn_workspace
,
workspace_size_limit
));
};
workspace_handle
.
RunFunc
(
cudnn_find_bd_f_func
,
workspace_size_limit
);
return
filter_perf_stat
[
0
].
algo
;
});
VLOG
(
3
)
<<
"cuDNN backward filter algo "
<<
filter_algo
;
}
else
if
(
FLAGS_cudnn_deterministic
)
{
filter_algo
=
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
;
}
else
{
CUDNN_ENFORCE
(
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnGetConvolutionBackwardFilterAlgorithm
(
platform
::
dynload
::
cudnnGetConvolutionBackwardFilterAlgorithm
(
handle
,
cudnn_input_desc
,
cudnn_output_grad_desc
,
handle
,
cudnn_input_desc
,
cudnn_output_grad_desc
,
cudnn_conv_desc
,
cudnn_filter_desc
,
cudnn_conv_desc
,
cudnn_filter_desc
,
CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT
,
CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT
,
workspace_size_limit
,
&
filter_algo
));
workspace_size_limit
,
&
filter_algo
));
}
else
{
filter_algo
=
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
;
}
}
CUDNN_ENFORCE
(
CUDNN_ENFORCE
(
platform
::
dynload
::
cudnnGetConvolutionBackwardFilterWorkspaceSize
(
platform
::
dynload
::
cudnnGetConvolutionBackwardFilterWorkspaceSize
(
handle
,
cudnn_input_desc
,
cudnn_output_grad_desc
,
cudnn_conv_desc
,
handle
,
cudnn_input_desc
,
cudnn_output_grad_desc
,
cudnn_conv_desc
,
...
@@ -317,7 +483,6 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
...
@@ -317,7 +483,6 @@ class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
// ------------------- cudnn conv backward data ---------------------
// ------------------- cudnn conv backward data ---------------------
ScalingParamType
<
T
>
alpha
=
1.0
f
,
beta
=
0.0
f
;
ScalingParamType
<
T
>
alpha
=
1.0
f
,
beta
=
0.0
f
;
auto
workspace_handle
=
dev_ctx
.
cudnn_workspace_handle
();
if
(
input_grad
)
{
if
(
input_grad
)
{
T
*
input_grad_data
=
input_grad
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
T
*
input_grad_data
=
input_grad
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
// Because beta is zero, it is unnecessary to reset input_grad.
// Because beta is zero, it is unnecessary to reset input_grad.
...
...
paddle/fluid/operators/conv_cudnn_op_cache.h
0 → 100644
浏览文件 @
d231e550
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <functional>
#include <unordered_map>
#include <vector>
namespace
paddle
{
namespace
operators
{
template
<
typename
TAlgorithm
>
class
AlgorithmsCache
{
public:
// Caches the best algorithm for a given
// combination of tensor dimensions & compute data type.
TAlgorithm
GetAlgorithm
(
const
std
::
vector
<
int64_t
>&
dims1
,
const
std
::
vector
<
int64_t
>&
dims2
,
const
std
::
vector
<
int
>&
strides
,
const
std
::
vector
<
int
>&
paddings
,
const
std
::
vector
<
int
>&
dilations
,
int
algorithmFlags
,
// can set for different data type
std
::
function
<
TAlgorithm
()
>
gen_func
);
private:
std
::
unordered_map
<
int64_t
,
TAlgorithm
>
hash_
;
std
::
mutex
mutex_
;
};
template
<
typename
TAlgorithm
>
TAlgorithm
AlgorithmsCache
<
TAlgorithm
>::
GetAlgorithm
(
const
std
::
vector
<
int64_t
>&
dims1
,
const
std
::
vector
<
int64_t
>&
dims2
,
const
std
::
vector
<
int
>&
strides
,
const
std
::
vector
<
int
>&
paddings
,
const
std
::
vector
<
int
>&
dilations
,
int
algorithmFlags
,
std
::
function
<
TAlgorithm
()
>
gen_func
)
{
std
::
lock_guard
<
std
::
mutex
>
lock
(
mutex_
);
int64_t
seed
=
0
;
// Hash all of the inputs, use to try and look up a previously
// discovered algorithm, or fall back to generating a new one.
std
::
hash
<
int64_t
>
hashFn
;
// do hash like boost
// https://stackoverflow.com/questions/2590677/how-do-i-combine-hash-values-in-c0x
for
(
const
auto
num
:
dims1
)
{
seed
^=
hashFn
(
num
)
+
0x9e3779b9
+
(
seed
<<
6
)
+
(
seed
>>
2
);
}
for
(
const
auto
num
:
dims2
)
{
seed
^=
hashFn
(
num
)
+
0x9e3779b9
+
(
seed
<<
6
)
+
(
seed
>>
2
)
+
1
;
}
for
(
const
auto
num
:
strides
)
{
seed
^=
hashFn
(
static_cast
<
int64_t
>
(
num
))
+
0x9e3779b9
+
(
seed
<<
6
)
+
(
seed
>>
2
)
+
2
;
}
for
(
const
auto
num
:
paddings
)
{
seed
^=
hashFn
(
static_cast
<
int64_t
>
(
num
))
+
0x9e3779b9
+
(
seed
<<
6
)
+
(
seed
>>
2
)
+
3
;
}
for
(
const
auto
num
:
dilations
)
{
seed
^=
hashFn
(
static_cast
<
int64_t
>
(
num
))
+
0x9e3779b9
+
(
seed
<<
6
)
+
(
seed
>>
2
)
+
4
;
}
seed
^=
hashFn
(
static_cast
<
int64_t
>
(
algorithmFlags
))
+
0x9e3779b9
+
(
seed
<<
6
)
+
(
seed
>>
2
)
+
5
;
if
(
seed
==
0
)
return
gen_func
();
if
(
hash_
.
find
(
seed
)
==
hash_
.
end
())
{
TAlgorithm
value
=
gen_func
();
hash_
[
seed
]
=
value
;
}
return
hash_
[
seed
];
}
}
// namespace operators
}
// namespace paddle
paddle/fluid/operators/conv_op.cc
浏览文件 @
d231e550
...
@@ -189,6 +189,11 @@ void Conv2DOpMaker::Make() {
...
@@ -189,6 +189,11 @@ void Conv2DOpMaker::Make() {
"workspace size can increase performance but also requires "
"workspace size can increase performance but also requires "
"better hardware. This size should be chosen carefully."
)
"better hardware. This size should be chosen carefully."
)
.
SetDefault
(
4096
);
.
SetDefault
(
4096
);
AddAttr
<
bool
>
(
"exhaustive_search"
,
"(bool, default false) cuDNN has many algorithm to calculation "
"convolution, whether enable exhaustive search "
,
"for cuDNN convolution or not, defalut is False."
)
.
SetDefault
(
false
);
AddComment
(
R"DOC(
AddComment
(
R"DOC(
Convolution Operator.
Convolution Operator.
...
@@ -219,6 +224,15 @@ $$
...
@@ -219,6 +224,15 @@ $$
)DOC"
);
)DOC"
);
}
}
class
ConvOpInferVarType
:
public
framework
::
PassInDtypeAndVarTypeToOutput
{
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
GetInputOutputWithSameType
()
const
override
{
return
std
::
unordered_map
<
std
::
string
,
std
::
string
>
{
{
"Input"
,
/*->*/
"Output"
}};
}
};
void
Conv3DOpMaker
::
Make
()
{
void
Conv3DOpMaker
::
Make
()
{
AddInput
(
AddInput
(
"Input"
,
"Input"
,
...
@@ -283,7 +297,11 @@ void Conv3DOpMaker::Make() {
...
@@ -283,7 +297,11 @@ void Conv3DOpMaker::Make() {
"workspace size can increase performance but also requires "
"workspace size can increase performance but also requires "
"better hardware. This size should be chosen carefully."
)
"better hardware. This size should be chosen carefully."
)
.
SetDefault
(
4096
);
.
SetDefault
(
4096
);
AddAttr
<
bool
>
(
"exhaustive_search"
,
"(bool, default false) cuDNN has many algorithm to calculation "
"convolution, whether enable exhaustive search "
,
"for cuDNN convolution or not, defalut is False."
)
.
SetDefault
(
false
);
AddComment
(
R"DOC(
AddComment
(
R"DOC(
Convolution3D Operator.
Convolution3D Operator.
...
@@ -356,6 +374,7 @@ framework::OpKernelType ConvOpGrad::GetExpectedKernelType(
...
@@ -356,6 +374,7 @@ framework::OpKernelType ConvOpGrad::GetExpectedKernelType(
namespace
ops
=
paddle
::
operators
;
namespace
ops
=
paddle
::
operators
;
REGISTER_OPERATOR
(
conv2d
,
ops
::
ConvOp
,
ops
::
Conv2DOpMaker
,
REGISTER_OPERATOR
(
conv2d
,
ops
::
ConvOp
,
ops
::
Conv2DOpMaker
,
ops
::
ConvOpInferVarType
,
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
REGISTER_OPERATOR
(
conv2d_grad
,
ops
::
ConvOpGrad
);
REGISTER_OPERATOR
(
conv2d_grad
,
ops
::
ConvOpGrad
);
...
@@ -363,7 +382,9 @@ REGISTER_OPERATOR(conv2d_grad, ops::ConvOpGrad);
...
@@ -363,7 +382,9 @@ REGISTER_OPERATOR(conv2d_grad, ops::ConvOpGrad);
REGISTER_OPERATOR
(
depthwise_conv2d
,
ops
::
ConvOp
,
ops
::
Conv2DOpMaker
,
REGISTER_OPERATOR
(
depthwise_conv2d
,
ops
::
ConvOp
,
ops
::
Conv2DOpMaker
,
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
REGISTER_OPERATOR
(
depthwise_conv2d_grad
,
ops
::
ConvOpGrad
);
REGISTER_OPERATOR
(
depthwise_conv2d_grad
,
ops
::
ConvOpGrad
);
REGISTER_OPERATOR
(
conv3d
,
ops
::
ConvOp
,
ops
::
Conv3DOpMaker
,
REGISTER_OPERATOR
(
conv3d
,
ops
::
ConvOp
,
ops
::
Conv3DOpMaker
,
ops
::
ConvOpInferVarType
,
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
REGISTER_OPERATOR
(
conv3d_grad
,
ops
::
ConvOpGrad
);
REGISTER_OPERATOR
(
conv3d_grad
,
ops
::
ConvOpGrad
);
...
...
paddle/fluid/operators/cross_entropy_op.cc
浏览文件 @
d231e550
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */
limitations under the License. */
#include "paddle/fluid/operators/cross_entropy_op.h"
#include "paddle/fluid/operators/cross_entropy_op.h"
#include <string>
namespace
paddle
{
namespace
paddle
{
namespace
operators
{
namespace
operators
{
...
@@ -179,6 +180,15 @@ or not. But the output only shares the LoD information with input X.
...
@@ -179,6 +180,15 @@ or not. But the output only shares the LoD information with input X.
)DOC"
);
)DOC"
);
}
}
};
};
class
CrossEntropyOpInferVarType
:
public
framework
::
PassInDtypeAndVarTypeToOutput
{
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
GetInputOutputWithSameType
()
const
override
{
return
std
::
unordered_map
<
std
::
string
,
std
::
string
>
{{
"X"
,
/*->*/
"Y"
}};
}
};
}
// namespace operators
}
// namespace operators
}
// namespace paddle
}
// namespace paddle
...
@@ -186,6 +196,7 @@ namespace ops = paddle::operators;
...
@@ -186,6 +196,7 @@ namespace ops = paddle::operators;
using
CPUCtx
=
paddle
::
platform
::
CPUDeviceContext
;
using
CPUCtx
=
paddle
::
platform
::
CPUDeviceContext
;
REGISTER_OPERATOR
(
cross_entropy
,
ops
::
CrossEntropyOp
,
ops
::
CrossEntropyOpMaker
,
REGISTER_OPERATOR
(
cross_entropy
,
ops
::
CrossEntropyOp
,
ops
::
CrossEntropyOpMaker
,
ops
::
CrossEntropyOpInferVarType
,
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
REGISTER_OPERATOR
(
cross_entropy_grad
,
ops
::
CrossEntropyGradientOp
);
REGISTER_OPERATOR
(
cross_entropy_grad
,
ops
::
CrossEntropyGradientOp
);
REGISTER_OP_CPU_KERNEL
(
cross_entropy
,
ops
::
CrossEntropyOpKernel
<
CPUCtx
,
float
>
,
REGISTER_OP_CPU_KERNEL
(
cross_entropy
,
ops
::
CrossEntropyOpKernel
<
CPUCtx
,
float
>
,
...
...
paddle/fluid/operators/distributed/brpc_server.cc
浏览文件 @
d231e550
...
@@ -133,10 +133,10 @@ void AsyncBRPCServer::StartServer() {
...
@@ -133,10 +133,10 @@ void AsyncBRPCServer::StartServer() {
void
AsyncBRPCServer
::
ShutDownImpl
()
{
server_
.
Stop
(
1000
);
}
void
AsyncBRPCServer
::
ShutDownImpl
()
{
server_
.
Stop
(
1000
);
}
void
AsyncBRPCServer
::
WaitServerReady
()
{
void
AsyncBRPCServer
::
WaitServerReady
()
{
VLOG
(
3
)
<<
"AsyncGRPCServer is wait server ready"
;
VLOG
(
3
0
)
<<
"AsyncGRPCServer is wait server ready"
;
std
::
unique_lock
<
std
::
mutex
>
lock
(
this
->
mutex_ready_
);
std
::
unique_lock
<
std
::
mutex
>
lock
(
this
->
mutex_ready_
);
condition_ready_
.
wait
(
lock
,
[
=
]
{
return
this
->
ready_
==
1
;
});
condition_ready_
.
wait
(
lock
,
[
=
]
{
return
this
->
ready_
==
1
;
});
VLOG
(
3
)
<<
"AsyncGRPCServer WaitSeverReady"
;
VLOG
(
3
0
)
<<
"AsyncGRPCServer WaitSeverReady"
;
}
}
};
// namespace distributed
};
// namespace distributed
...
...
paddle/fluid/operators/distributed/grpc_client.cc
浏览文件 @
d231e550
...
@@ -38,7 +38,7 @@ void GRPCClient::SendComplete() {
...
@@ -38,7 +38,7 @@ void GRPCClient::SendComplete() {
std
::
unique_lock
<
std
::
mutex
>
lk
(
completed_mutex_
);
std
::
unique_lock
<
std
::
mutex
>
lk
(
completed_mutex_
);
if
(
!
completed_
)
{
if
(
!
completed_
)
{
for
(
auto
&
it
:
channels_
)
{
for
(
auto
&
it
:
channels_
)
{
VLOG
(
3
)
<<
"send complete message to "
<<
it
.
first
;
VLOG
(
3
0
)
<<
"send complete message to "
<<
it
.
first
;
this
->
AsyncSendComplete
(
it
.
first
);
this
->
AsyncSendComplete
(
it
.
first
);
}
}
PADDLE_ENFORCE
(
this
->
Wait
(),
"internal grpc error"
);
PADDLE_ENFORCE
(
this
->
Wait
(),
"internal grpc error"
);
...
@@ -81,7 +81,7 @@ VarHandlePtr GRPCClient::AsyncSendVar(const std::string& ep,
...
@@ -81,7 +81,7 @@ VarHandlePtr GRPCClient::AsyncSendVar(const std::string& ep,
::
grpc
::
ByteBuffer
req
;
::
grpc
::
ByteBuffer
req
;
SerializeToByteBuffer
(
var_name_val
,
var
,
*
p_ctx
,
&
req
,
""
,
trainer_id_
);
SerializeToByteBuffer
(
var_name_val
,
var
,
*
p_ctx
,
&
req
,
""
,
trainer_id_
);
VLOG
(
3
)
<<
s
->
GetVarHandlePtr
()
->
String
()
<<
" begin"
;
VLOG
(
3
0
)
<<
s
->
GetVarHandlePtr
()
->
String
()
<<
" begin"
;
// stub context
// stub context
s
->
response_call_back_
=
nullptr
;
s
->
response_call_back_
=
nullptr
;
...
@@ -142,7 +142,7 @@ VarHandlePtr GRPCClient::AsyncGetVar(const std::string& ep,
...
@@ -142,7 +142,7 @@ VarHandlePtr GRPCClient::AsyncGetVar(const std::string& ep,
::
grpc
::
ByteBuffer
buf
;
::
grpc
::
ByteBuffer
buf
;
RequestToByteBuffer
<
sendrecv
::
VariableMessage
>
(
req
,
&
buf
);
RequestToByteBuffer
<
sendrecv
::
VariableMessage
>
(
req
,
&
buf
);
VLOG
(
3
)
<<
s
->
GetVarHandlePtr
()
->
String
()
<<
" begin"
;
VLOG
(
3
0
)
<<
s
->
GetVarHandlePtr
()
->
String
()
<<
" begin"
;
// stub context
// stub context
s
->
response_call_back_
=
ProcGetResponse
;
s
->
response_call_back_
=
ProcGetResponse
;
...
@@ -190,7 +190,7 @@ VarHandlePtr GRPCClient::AsyncPrefetchVar(const std::string& ep,
...
@@ -190,7 +190,7 @@ VarHandlePtr GRPCClient::AsyncPrefetchVar(const std::string& ep,
::
grpc
::
ByteBuffer
req
;
::
grpc
::
ByteBuffer
req
;
SerializeToByteBuffer
(
in_var_name_val
,
var
,
*
p_ctx
,
&
req
,
out_var_name_val
);
SerializeToByteBuffer
(
in_var_name_val
,
var
,
*
p_ctx
,
&
req
,
out_var_name_val
);
VLOG
(
3
)
<<
s
->
GetVarHandlePtr
()
->
String
()
<<
" begin"
;
VLOG
(
3
0
)
<<
s
->
GetVarHandlePtr
()
->
String
()
<<
" begin"
;
// stub context
// stub context
s
->
response_call_back_
=
ProcGetResponse
;
s
->
response_call_back_
=
ProcGetResponse
;
...
@@ -328,14 +328,14 @@ void GRPCClient::Proceed() {
...
@@ -328,14 +328,14 @@ void GRPCClient::Proceed() {
void
*
tag
=
nullptr
;
void
*
tag
=
nullptr
;
bool
ok
=
false
;
bool
ok
=
false
;
VLOG
(
3
)
<<
"GRPCClient Proceed begin"
;
VLOG
(
3
0
)
<<
"GRPCClient Proceed begin"
;
while
(
!
stopped_
&&
cq_
.
Next
(
&
tag
,
&
ok
))
{
while
(
!
stopped_
&&
cq_
.
Next
(
&
tag
,
&
ok
))
{
BaseProcessor
*
c
=
static_cast
<
BaseProcessor
*>
(
tag
);
BaseProcessor
*
c
=
static_cast
<
BaseProcessor
*>
(
tag
);
GPR_ASSERT
(
ok
);
GPR_ASSERT
(
ok
);
PADDLE_ENFORCE
(
c
);
PADDLE_ENFORCE
(
c
);
if
(
c
->
status_
.
ok
())
{
if
(
c
->
status_
.
ok
())
{
VLOG
(
3
)
<<
c
->
GetVarHandlePtr
()
->
String
()
<<
" process"
;
VLOG
(
3
0
)
<<
c
->
GetVarHandlePtr
()
->
String
()
<<
" process"
;
c
->
Process
();
c
->
Process
();
}
else
if
(
c
->
status_
.
error_code
()
==
grpc
::
StatusCode
::
DEADLINE_EXCEEDED
)
{
}
else
if
(
c
->
status_
.
error_code
()
==
grpc
::
StatusCode
::
DEADLINE_EXCEEDED
)
{
// FIXME(gongwb): parse error_details?
// FIXME(gongwb): parse error_details?
...
@@ -370,7 +370,7 @@ void GRPCClient::Proceed() {
...
@@ -370,7 +370,7 @@ void GRPCClient::Proceed() {
sync_cond_
.
notify_all
();
sync_cond_
.
notify_all
();
}
}
}
}
VLOG
(
3
)
<<
"GRPCClient Proceed end"
;
VLOG
(
3
0
)
<<
"GRPCClient Proceed end"
;
}
}
std
::
shared_ptr
<
grpc
::
Channel
>
GRPCClient
::
GetChannel
(
const
std
::
string
&
ep
)
{
std
::
shared_ptr
<
grpc
::
Channel
>
GRPCClient
::
GetChannel
(
const
std
::
string
&
ep
)
{
...
...
paddle/fluid/operators/distributed/grpc_server.cc
浏览文件 @
d231e550
...
@@ -98,7 +98,7 @@ class RequestSend final : public RequestBase {
...
@@ -98,7 +98,7 @@ class RequestSend final : public RequestBase {
void
Process
()
override
{
void
Process
()
override
{
std
::
string
varname
=
GetReqName
();
std
::
string
varname
=
GetReqName
();
VLOG
(
4
)
<<
"RequestSend var_name:"
<<
varname
;
VLOG
(
4
0
)
<<
"RequestSend var_name:"
<<
varname
;
auto
scope
=
request_
->
GetMutableLocalScope
();
auto
scope
=
request_
->
GetMutableLocalScope
();
auto
invar
=
request_
->
GetVar
();
auto
invar
=
request_
->
GetVar
();
...
@@ -135,7 +135,7 @@ class RequestGet final : public RequestBase {
...
@@ -135,7 +135,7 @@ class RequestGet final : public RequestBase {
// proc request.
// proc request.
std
::
string
varname
=
request_
.
varname
();
std
::
string
varname
=
request_
.
varname
();
int
trainer_id
=
request_
.
trainer_id
();
int
trainer_id
=
request_
.
trainer_id
();
VLOG
(
4
)
<<
"RequestGet "
<<
varname
;
VLOG
(
4
0
)
<<
"RequestGet "
<<
varname
;
auto
scope
=
request_handler_
->
scope
();
auto
scope
=
request_handler_
->
scope
();
auto
invar
=
scope
->
FindVar
(
varname
);
auto
invar
=
scope
->
FindVar
(
varname
);
...
@@ -182,7 +182,7 @@ class RequestPrefetch final : public RequestBase {
...
@@ -182,7 +182,7 @@ class RequestPrefetch final : public RequestBase {
std
::
string
in_var_name
=
request_
->
Varname
();
std
::
string
in_var_name
=
request_
->
Varname
();
std
::
string
out_var_name
=
request_
->
OutVarname
();
std
::
string
out_var_name
=
request_
->
OutVarname
();
int
trainer_id
=
request_
->
GetTrainerId
();
int
trainer_id
=
request_
->
GetTrainerId
();
VLOG
(
4
)
<<
"RequestPrefetch, in_var_name: "
<<
in_var_name
VLOG
(
4
0
)
<<
"RequestPrefetch, in_var_name: "
<<
in_var_name
<<
" out_var_name: "
<<
out_var_name
;
<<
" out_var_name: "
<<
out_var_name
;
auto
scope
=
request_
->
GetMutableLocalScope
();
auto
scope
=
request_
->
GetMutableLocalScope
();
...
@@ -231,7 +231,7 @@ class RequestCheckpointNotify final : public RequestBase {
...
@@ -231,7 +231,7 @@ class RequestCheckpointNotify final : public RequestBase {
std
::
string
checkpoint_dir
=
request_
->
OutVarname
();
std
::
string
checkpoint_dir
=
request_
->
OutVarname
();
int
trainer_id
=
request_
->
GetTrainerId
();
int
trainer_id
=
request_
->
GetTrainerId
();
VLOG
(
4
)
<<
"RequestCheckpointNotify notify: "
<<
checkpoint_notify
VLOG
(
4
0
)
<<
"RequestCheckpointNotify notify: "
<<
checkpoint_notify
<<
", dir: "
<<
checkpoint_dir
;
<<
", dir: "
<<
checkpoint_dir
;
request_handler_
->
Handle
(
checkpoint_notify
,
scope
,
nullptr
,
nullptr
,
request_handler_
->
Handle
(
checkpoint_notify
,
scope
,
nullptr
,
nullptr
,
...
@@ -246,10 +246,10 @@ class RequestCheckpointNotify final : public RequestBase {
...
@@ -246,10 +246,10 @@ class RequestCheckpointNotify final : public RequestBase {
};
};
void
AsyncGRPCServer
::
WaitServerReady
()
{
void
AsyncGRPCServer
::
WaitServerReady
()
{
VLOG
(
4
)
<<
"AsyncGRPCServer is wait server ready"
;
VLOG
(
4
0
)
<<
"AsyncGRPCServer is wait server ready"
;
std
::
unique_lock
<
std
::
mutex
>
lock
(
this
->
mutex_ready_
);
std
::
unique_lock
<
std
::
mutex
>
lock
(
this
->
mutex_ready_
);
condition_ready_
.
wait
(
lock
,
[
=
]
{
return
this
->
ready_
==
1
;
});
condition_ready_
.
wait
(
lock
,
[
=
]
{
return
this
->
ready_
==
1
;
});
VLOG
(
4
)
<<
"AsyncGRPCServer WaitSeverReady"
;
VLOG
(
4
0
)
<<
"AsyncGRPCServer WaitSeverReady"
;
}
}
void
AsyncGRPCServer
::
StartServer
()
{
void
AsyncGRPCServer
::
StartServer
()
{
...
@@ -282,14 +282,15 @@ void AsyncGRPCServer::StartServer() {
...
@@ -282,14 +282,15 @@ void AsyncGRPCServer::StartServer() {
reqs
.
reserve
(
kRequestBufSize
);
reqs
.
reserve
(
kRequestBufSize
);
for
(
int
i
=
0
;
i
<
kRequestBufSize
;
i
++
)
{
for
(
int
i
=
0
;
i
<
kRequestBufSize
;
i
++
)
{
VLOG
(
6
)
<<
"TryToRegisterNewOne on RPC NAME: "
<<
rpc_name
<<
" I: "
<<
i
;
VLOG
(
60
)
<<
"TryToRegisterNewOne on RPC NAME: "
<<
rpc_name
<<
" I: "
<<
i
;
TryToRegisterNewOne
(
rpc_name
,
i
);
TryToRegisterNewOne
(
rpc_name
,
i
);
}
}
for
(
int
i
=
0
;
i
<
threadnum
;
i
++
)
{
for
(
int
i
=
0
;
i
<
threadnum
;
i
++
)
{
rpc_threads_
[
rpc_name
].
emplace_back
(
new
std
::
thread
(
std
::
bind
(
rpc_threads_
[
rpc_name
].
emplace_back
(
new
std
::
thread
(
std
::
bind
(
&
AsyncGRPCServer
::
HandleRequest
,
this
,
cq
.
get
(),
rpc_name
,
f
)));
&
AsyncGRPCServer
::
HandleRequest
,
this
,
cq
.
get
(),
rpc_name
,
f
)));
VLOG
(
4
)
<<
t
.
first
<<
" creates threads!"
;
VLOG
(
4
0
)
<<
t
.
first
<<
" creates threads!"
;
}
}
}
}
...
@@ -306,7 +307,7 @@ void AsyncGRPCServer::StartServer() {
...
@@ -306,7 +307,7 @@ void AsyncGRPCServer::StartServer() {
auto
&
threads
=
t
.
second
;
auto
&
threads
=
t
.
second
;
for
(
size_t
i
=
0
;
i
<
threads
.
size
();
++
i
)
{
for
(
size_t
i
=
0
;
i
<
threads
.
size
();
++
i
)
{
threads
[
i
]
->
join
();
threads
[
i
]
->
join
();
VLOG
(
4
)
<<
t
.
first
<<
" threads ends!"
;
VLOG
(
4
0
)
<<
t
.
first
<<
" threads ends!"
;
}
}
}
}
}
}
...
@@ -314,7 +315,7 @@ void AsyncGRPCServer::StartServer() {
...
@@ -314,7 +315,7 @@ void AsyncGRPCServer::StartServer() {
void
AsyncGRPCServer
::
ShutdownQueue
()
{
void
AsyncGRPCServer
::
ShutdownQueue
()
{
for
(
auto
&
t
:
rpc_cq_
)
{
for
(
auto
&
t
:
rpc_cq_
)
{
t
.
second
->
Shutdown
();
t
.
second
->
Shutdown
();
VLOG
(
4
)
<<
t
.
first
<<
" queue shutdown!"
;
VLOG
(
4
0
)
<<
t
.
first
<<
" queue shutdown!"
;
}
}
}
}
...
@@ -323,7 +324,7 @@ void AsyncGRPCServer::ShutDownImpl() {
...
@@ -323,7 +324,7 @@ void AsyncGRPCServer::ShutDownImpl() {
is_shut_down_
=
true
;
is_shut_down_
=
true
;
ShutdownQueue
();
ShutdownQueue
();
VLOG
(
4
)
<<
"server_ shutdown!"
;
VLOG
(
4
0
)
<<
"server_ shutdown!"
;
server_
->
Shutdown
();
server_
->
Shutdown
();
}
}
...
@@ -331,11 +332,11 @@ void AsyncGRPCServer::TryToRegisterNewOne(const std::string& rpc_name,
...
@@ -331,11 +332,11 @@ void AsyncGRPCServer::TryToRegisterNewOne(const std::string& rpc_name,
int
req_id
)
{
int
req_id
)
{
std
::
unique_lock
<
std
::
mutex
>
lock
(
cq_mutex_
);
std
::
unique_lock
<
std
::
mutex
>
lock
(
cq_mutex_
);
if
(
is_shut_down_
)
{
if
(
is_shut_down_
)
{
VLOG
(
4
)
<<
"shutdown, do not TryToRegisterNewSendOne"
;
VLOG
(
4
0
)
<<
"shutdown, do not TryToRegisterNewSendOne"
;
return
;
return
;
}
}
VLOG
(
4
)
<<
"TryToRegisterNewOne on RPC NAME: "
<<
rpc_name
VLOG
(
4
0
)
<<
"TryToRegisterNewOne on RPC NAME: "
<<
rpc_name
<<
" REQ ID: "
<<
req_id
;
<<
" REQ ID: "
<<
req_id
;
auto
&
reqs
=
rpc_reqs_
[
rpc_name
];
auto
&
reqs
=
rpc_reqs_
[
rpc_name
];
...
@@ -357,7 +358,7 @@ void AsyncGRPCServer::TryToRegisterNewOne(const std::string& rpc_name,
...
@@ -357,7 +358,7 @@ void AsyncGRPCServer::TryToRegisterNewOne(const std::string& rpc_name,
reqs
[
req_id
]
=
b
;
reqs
[
req_id
]
=
b
;
VLOG
(
4
)
<<
"Create RequestSend status:"
<<
b
->
Status
();
VLOG
(
4
0
)
<<
"Create RequestSend status:"
<<
b
->
Status
();
}
}
void
AsyncGRPCServer
::
HandleRequest
(
void
AsyncGRPCServer
::
HandleRequest
(
...
@@ -367,14 +368,14 @@ void AsyncGRPCServer::HandleRequest(
...
@@ -367,14 +368,14 @@ void AsyncGRPCServer::HandleRequest(
bool
ok
=
false
;
bool
ok
=
false
;
while
(
true
)
{
while
(
true
)
{
VLOG
(
4
)
<<
"HandleRequest "
<<
rpc_name
<<
" wait next"
;
VLOG
(
4
0
)
<<
"HandleRequest "
<<
rpc_name
<<
" wait next"
;
if
(
!
cq
->
Next
(
&
tag
,
&
ok
))
{
if
(
!
cq
->
Next
(
&
tag
,
&
ok
))
{
VLOG
(
3
)
<<
"CompletionQueue "
<<
rpc_name
<<
" shutdown!"
;
VLOG
(
3
0
)
<<
"CompletionQueue "
<<
rpc_name
<<
" shutdown!"
;
break
;
break
;
}
}
int
req_id
=
static_cast
<
int
>
(
reinterpret_cast
<
intptr_t
>
(
tag
));
int
req_id
=
static_cast
<
int
>
(
reinterpret_cast
<
intptr_t
>
(
tag
));
VLOG
(
4
)
<<
"HandleRequest "
<<
rpc_name
<<
", req_id:"
<<
req_id
VLOG
(
4
0
)
<<
"HandleRequest "
<<
rpc_name
<<
", req_id:"
<<
req_id
<<
" get next"
;
<<
" get next"
;
auto
&
reqs
=
rpc_reqs_
[
rpc_name
];
auto
&
reqs
=
rpc_reqs_
[
rpc_name
];
...
@@ -385,7 +386,7 @@ void AsyncGRPCServer::HandleRequest(
...
@@ -385,7 +386,7 @@ void AsyncGRPCServer::HandleRequest(
base
=
reqs
[
req_id
];
base
=
reqs
[
req_id
];
}
}
VLOG
(
3
)
<<
base
->
Status2String
(
rpc_name
);
VLOG
(
3
0
)
<<
base
->
Status2String
(
rpc_name
);
// reference:
// reference:
// https://github.com/tensorflow/tensorflow/issues/5596
// https://github.com/tensorflow/tensorflow/issues/5596
...
...
paddle/fluid/operators/distributed/request_handler.h
浏览文件 @
d231e550
...
@@ -75,7 +75,7 @@ class VarHandle {
...
@@ -75,7 +75,7 @@ class VarHandle {
wait_cond_
.
wait
(
lk
,
[
this
]
{
return
status_
!=
kDefaultState
;
});
wait_cond_
.
wait
(
lk
,
[
this
]
{
return
status_
!=
kDefaultState
;
});
ret
=
status_
;
ret
=
status_
;
}
}
VLOG
(
7
)
<<
"VarHandle wait:"
<<
ret
;
VLOG
(
7
0
)
<<
"VarHandle wait:"
<<
ret
;
return
ret
!=
kErrorState
;
return
ret
!=
kErrorState
;
}
}
...
@@ -84,7 +84,7 @@ class VarHandle {
...
@@ -84,7 +84,7 @@ class VarHandle {
std
::
unique_lock
<
std
::
mutex
>
lk
(
sync_mutex_
);
std
::
unique_lock
<
std
::
mutex
>
lk
(
sync_mutex_
);
status_
=
ok
?
kFinishState
:
kErrorState
;
status_
=
ok
?
kFinishState
:
kErrorState
;
}
}
VLOG
(
7
)
<<
"VarHandle finish:"
<<
ok
;
VLOG
(
7
0
)
<<
"VarHandle finish:"
<<
ok
;
wait_cond_
.
notify_all
();
wait_cond_
.
notify_all
();
}
}
...
...
paddle/fluid/operators/distributed/request_handler_impl.cc
浏览文件 @
d231e550
...
@@ -38,19 +38,19 @@ bool RequestSendHandler::Handle(const std::string& varname,
...
@@ -38,19 +38,19 @@ bool RequestSendHandler::Handle(const std::string& varname,
framework
::
Variable
**
outvar
,
framework
::
Variable
**
outvar
,
const
int
trainer_id
,
const
int
trainer_id
,
const
std
::
string
&
out_var_name
)
{
const
std
::
string
&
out_var_name
)
{
VLOG
(
4
)
<<
"RequestSendHandler:"
<<
varname
;
VLOG
(
4
0
)
<<
"RequestSendHandler:"
<<
varname
;
// Sync
// Sync
if
(
varname
==
BATCH_BARRIER_MESSAGE
)
{
if
(
varname
==
BATCH_BARRIER_MESSAGE
)
{
VLOG
(
3
)
<<
"sync: recv BATCH_BARRIER_MESSAGE"
;
VLOG
(
3
0
)
<<
"sync: recv BATCH_BARRIER_MESSAGE"
;
rpc_server_
->
IncreaseBatchBarrier
(
kRequestSend
);
rpc_server_
->
IncreaseBatchBarrier
(
kRequestSend
);
}
else
if
(
varname
==
COMPLETE_MESSAGE
)
{
}
else
if
(
varname
==
COMPLETE_MESSAGE
)
{
VLOG
(
3
)
<<
"sync: recv complete message"
;
VLOG
(
3
0
)
<<
"sync: recv complete message"
;
rpc_server_
->
Complete
();
rpc_server_
->
Complete
();
}
else
{
}
else
{
// Async
// Async
if
(
!
sync_mode_
)
{
if
(
!
sync_mode_
)
{
VLOG
(
3
)
<<
"async process var: "
<<
varname
;
VLOG
(
3
0
)
<<
"async process var: "
<<
varname
;
try
{
try
{
executor_
->
RunPreparedContext
((
*
grad_to_prepared_ctx_
)[
varname
].
get
(),
executor_
->
RunPreparedContext
((
*
grad_to_prepared_ctx_
)[
varname
].
get
(),
scope
);
scope
);
...
@@ -61,7 +61,7 @@ bool RequestSendHandler::Handle(const std::string& varname,
...
@@ -61,7 +61,7 @@ bool RequestSendHandler::Handle(const std::string& varname,
return
true
;
return
true
;
}
else
{
// sync
}
else
{
// sync
rpc_server_
->
WaitCond
(
kRequestSend
);
rpc_server_
->
WaitCond
(
kRequestSend
);
VLOG
(
3
)
<<
"sync: processing received var: "
<<
varname
;
VLOG
(
3
0
)
<<
"sync: processing received var: "
<<
varname
;
if
(
invar
==
nullptr
)
{
if
(
invar
==
nullptr
)
{
LOG
(
FATAL
)
<<
"sync: Can not find server side var: "
<<
varname
;
LOG
(
FATAL
)
<<
"sync: Can not find server side var: "
<<
varname
;
...
@@ -78,10 +78,10 @@ bool RequestGetHandler::Handle(const std::string& varname,
...
@@ -78,10 +78,10 @@ bool RequestGetHandler::Handle(const std::string& varname,
framework
::
Variable
**
outvar
,
framework
::
Variable
**
outvar
,
const
int
trainer_id
,
const
int
trainer_id
,
const
std
::
string
&
out_var_name
)
{
const
std
::
string
&
out_var_name
)
{
VLOG
(
4
)
<<
"RequestGetHandler:"
<<
varname
;
VLOG
(
4
0
)
<<
"RequestGetHandler:"
<<
varname
;
if
(
sync_mode_
)
{
if
(
sync_mode_
)
{
if
(
varname
==
FETCH_BARRIER_MESSAGE
)
{
if
(
varname
==
FETCH_BARRIER_MESSAGE
)
{
VLOG
(
3
)
<<
"sync: recv fetch barrier message"
;
VLOG
(
3
0
)
<<
"sync: recv fetch barrier message"
;
rpc_server_
->
IncreaseBatchBarrier
(
kRequestGet
);
rpc_server_
->
IncreaseBatchBarrier
(
kRequestGet
);
}
else
{
}
else
{
rpc_server_
->
WaitCond
(
kRequestGet
);
rpc_server_
->
WaitCond
(
kRequestGet
);
...
@@ -93,13 +93,14 @@ bool RequestGetHandler::Handle(const std::string& varname,
...
@@ -93,13 +93,14 @@ bool RequestGetHandler::Handle(const std::string& varname,
// NOTE: the format is determined by distributed_transpiler.py
// NOTE: the format is determined by distributed_transpiler.py
std
::
string
param_bak_name
=
std
::
string
param_bak_name
=
string
::
Sprintf
(
"%s.trainer_%d_bak"
,
varname
,
trainer_id
);
string
::
Sprintf
(
"%s.trainer_%d_bak"
,
varname
,
trainer_id
);
VLOG
(
3
)
<<
"getting "
<<
param_bak_name
<<
" trainer_id "
<<
trainer_id
;
VLOG
(
30
)
<<
"getting "
<<
param_bak_name
<<
" trainer_id "
<<
trainer_id
;
auto
var
=
scope_
->
FindVar
(
varname
);
auto
var
=
scope_
->
FindVar
(
varname
);
auto
t_orig
=
var
->
Get
<
framework
::
LoDTensor
>
();
auto
t_orig
=
var
->
Get
<
framework
::
LoDTensor
>
();
auto
param_bak
=
scope_
->
Var
(
param_bak_name
);
auto
param_bak
=
scope_
->
Var
(
param_bak_name
);
auto
t
=
param_bak
->
GetMutable
<
framework
::
LoDTensor
>
();
auto
t
=
param_bak
->
GetMutable
<
framework
::
LoDTensor
>
();
t
->
mutable_data
(
dev_ctx_
->
GetPlace
(),
t_orig
.
type
());
t
->
mutable_data
(
dev_ctx_
->
GetPlace
(),
t_orig
.
type
());
VLOG
(
3
)
<<
"copying "
<<
varname
<<
" to "
<<
param_bak_name
;
VLOG
(
3
0
)
<<
"copying "
<<
varname
<<
" to "
<<
param_bak_name
;
framework
::
TensorCopy
(
t_orig
,
dev_ctx_
->
GetPlace
(),
t
);
framework
::
TensorCopy
(
t_orig
,
dev_ctx_
->
GetPlace
(),
t
);
}
}
*
outvar
=
scope_
->
FindVar
(
varname
);
*
outvar
=
scope_
->
FindVar
(
varname
);
...
@@ -114,7 +115,7 @@ bool RequestPrefetchHandler::Handle(const std::string& varname,
...
@@ -114,7 +115,7 @@ bool RequestPrefetchHandler::Handle(const std::string& varname,
framework
::
Variable
**
outvar
,
framework
::
Variable
**
outvar
,
const
int
trainer_id
,
const
int
trainer_id
,
const
std
::
string
&
out_var_name
)
{
const
std
::
string
&
out_var_name
)
{
VLOG
(
4
)
<<
"RequestPrefetchHandler "
<<
varname
;
VLOG
(
4
0
)
<<
"RequestPrefetchHandler "
<<
varname
;
auto
var_desc
=
program_
->
Block
(
0
).
FindVar
(
out_var_name
);
auto
var_desc
=
program_
->
Block
(
0
).
FindVar
(
out_var_name
);
InitializeVariable
(
*
outvar
,
var_desc
->
GetType
());
InitializeVariable
(
*
outvar
,
var_desc
->
GetType
());
...
@@ -138,7 +139,7 @@ bool RequestCheckpointHandler::Handle(const std::string& varname,
...
@@ -138,7 +139,7 @@ bool RequestCheckpointHandler::Handle(const std::string& varname,
auto
*
lt_var
=
scope_
->
FindVar
(
LOOKUP_TABLE_PATH
)
->
GetMutable
<
std
::
string
>
();
auto
*
lt_var
=
scope_
->
FindVar
(
LOOKUP_TABLE_PATH
)
->
GetMutable
<
std
::
string
>
();
lt_var
->
clear
();
lt_var
->
clear
();
lt_var
->
append
(
out_var_name
);
lt_var
->
append
(
out_var_name
);
VLOG
(
4
)
<<
"RequestCheckpointHandler update var kLookupTablePath to: "
VLOG
(
4
0
)
<<
"RequestCheckpointHandler update var kLookupTablePath to: "
<<
out_var_name
;
<<
out_var_name
;
executor_
->
RunPreparedContext
(
checkpoint_prepared_ctx_
.
get
(),
scope_
);
executor_
->
RunPreparedContext
(
checkpoint_prepared_ctx_
.
get
(),
scope_
);
return
true
;
return
true
;
...
...
paddle/fluid/operators/distributed/rpc_server.cc
浏览文件 @
d231e550
...
@@ -39,7 +39,7 @@ void RPCServer::SavePort() const {
...
@@ -39,7 +39,7 @@ void RPCServer::SavePort() const {
port_file
.
open
(
file_path
);
port_file
.
open
(
file_path
);
port_file
<<
selected_port_
;
port_file
<<
selected_port_
;
port_file
.
close
();
port_file
.
close
();
VLOG
(
4
)
<<
"selected port written to "
<<
file_path
;
VLOG
(
4
0
)
<<
"selected port written to "
<<
file_path
;
}
}
void
RPCServer
::
WaitBarrier
(
const
std
::
string
&
rpc_name
)
{
void
RPCServer
::
WaitBarrier
(
const
std
::
string
&
rpc_name
)
{
...
@@ -49,12 +49,12 @@ void RPCServer::WaitBarrier(const std::string& rpc_name) {
...
@@ -49,12 +49,12 @@ void RPCServer::WaitBarrier(const std::string& rpc_name) {
exit_flag_
.
load
());
exit_flag_
.
load
());
});
});
VLOG
(
3
)
<<
"batch_barrier_: "
<<
rpc_name
<<
" "
VLOG
(
3
0
)
<<
"batch_barrier_: "
<<
rpc_name
<<
" "
<<
barrier_counter_
[
rpc_name
];
<<
barrier_counter_
[
rpc_name
];
}
}
void
RPCServer
::
IncreaseBatchBarrier
(
const
std
::
string
rpc_name
)
{
void
RPCServer
::
IncreaseBatchBarrier
(
const
std
::
string
rpc_name
)
{
VLOG
(
4
)
<<
"RPCServer begin IncreaseBatchBarrier "
<<
rpc_name
;
VLOG
(
4
0
)
<<
"RPCServer begin IncreaseBatchBarrier "
<<
rpc_name
;
int
b
=
0
;
int
b
=
0
;
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
b
=
++
barrier_counter_
[
rpc_name
];
b
=
++
barrier_counter_
[
rpc_name
];
...
@@ -71,7 +71,7 @@ void RPCServer::Complete() {
...
@@ -71,7 +71,7 @@ void RPCServer::Complete() {
client_num_
--
;
client_num_
--
;
need_reset_all_vars_
=
true
;
need_reset_all_vars_
=
true
;
VLOG
(
4
)
<<
"decrease client_num to: "
<<
client_num_
;
VLOG
(
4
0
)
<<
"decrease client_num to: "
<<
client_num_
;
if
(
cur_cond_
.
load
()
==
rpc_cond_map_
[
kRequestGet
])
{
if
(
cur_cond_
.
load
()
==
rpc_cond_map_
[
kRequestGet
])
{
barrier_counter_
[
kRequestGet
]
--
;
barrier_counter_
[
kRequestGet
]
--
;
}
}
...
@@ -90,7 +90,7 @@ int RPCServer::GetClientNum() {
...
@@ -90,7 +90,7 @@ int RPCServer::GetClientNum() {
}
}
void
RPCServer
::
ResetBarrierCounter
()
{
void
RPCServer
::
ResetBarrierCounter
()
{
VLOG
(
3
)
<<
"RPCServer ResetBarrierCounter "
;
VLOG
(
3
0
)
<<
"RPCServer ResetBarrierCounter "
;
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
for
(
auto
&
t
:
barrier_counter_
)
{
for
(
auto
&
t
:
barrier_counter_
)
{
t
.
second
=
0
;
t
.
second
=
0
;
...
@@ -105,12 +105,12 @@ void RPCServer::RegisterRPC(const std::string& rpc_name,
...
@@ -105,12 +105,12 @@ void RPCServer::RegisterRPC(const std::string& rpc_name,
static
int
cond
=
-
1
;
static
int
cond
=
-
1
;
rpc_cond_map_
[
rpc_name
]
=
++
cond
;
rpc_cond_map_
[
rpc_name
]
=
++
cond
;
VLOG
(
4
)
<<
"RegisterRPC rpc_name:"
<<
rpc_name
<<
", handler:"
<<
handler
VLOG
(
4
0
)
<<
"RegisterRPC rpc_name:"
<<
rpc_name
<<
", handler:"
<<
handler
<<
", cond:"
<<
rpc_cond_map_
[
rpc_name
];
<<
", cond:"
<<
rpc_cond_map_
[
rpc_name
];
}
}
void
RPCServer
::
SetCond
(
const
std
::
string
&
rpc_name
)
{
void
RPCServer
::
SetCond
(
const
std
::
string
&
rpc_name
)
{
VLOG
(
3
)
<<
"RPCServer SetCond "
<<
rpc_name
;
VLOG
(
3
0
)
<<
"RPCServer SetCond "
<<
rpc_name
;
{
{
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
cur_cond_
=
rpc_cond_map_
[
rpc_name
];
cur_cond_
=
rpc_cond_map_
[
rpc_name
];
...
@@ -120,7 +120,7 @@ void RPCServer::SetCond(const std::string& rpc_name) {
...
@@ -120,7 +120,7 @@ void RPCServer::SetCond(const std::string& rpc_name) {
}
}
void
RPCServer
::
WaitCond
(
const
std
::
string
&
rpc_name
)
{
void
RPCServer
::
WaitCond
(
const
std
::
string
&
rpc_name
)
{
VLOG
(
4
)
<<
"RPCServer WaitCond "
<<
rpc_name
;
VLOG
(
4
0
)
<<
"RPCServer WaitCond "
<<
rpc_name
;
int
cond
=
0
;
int
cond
=
0
;
{
{
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
...
...
paddle/fluid/operators/distributed/variable_response.cc
浏览文件 @
d231e550
...
@@ -50,7 +50,7 @@ bool VariableResponse::ReadRaw(::google::protobuf::io::CodedInputStream* input,
...
@@ -50,7 +50,7 @@ bool VariableResponse::ReadRaw(::google::protobuf::io::CodedInputStream* input,
size_to_write
=
length
-
total_written
;
size_to_write
=
length
-
total_written
;
}
}
// This log is useful to see how long a internal block size is of rpc.
// This log is useful to see how long a internal block size is of rpc.
VLOG
(
7
)
<<
"copy "
<<
size_to_write
<<
" data to CUDAPlace"
;
VLOG
(
7
0
)
<<
"copy "
<<
size_to_write
<<
" data to CUDAPlace"
;
memory
::
Copy
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place
),
memory
::
Copy
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place
),
reinterpret_cast
<
void
*>
(
p
),
cpu
,
data
,
size_to_write
,
reinterpret_cast
<
void
*>
(
p
),
cpu
,
data
,
size_to_write
,
gpu_dev_ctx
.
stream
());
gpu_dev_ctx
.
stream
());
...
@@ -79,7 +79,7 @@ bool VariableResponse::ReadRaw(::google::protobuf::io::CodedInputStream* input,
...
@@ -79,7 +79,7 @@ bool VariableResponse::ReadRaw(::google::protobuf::io::CodedInputStream* input,
// TODO(gongwb): can we avoid copy?
// TODO(gongwb): can we avoid copy?
platform
::
CPUPlace
cpu
;
platform
::
CPUPlace
cpu
;
// This log is useful to see how long a internal block size is of rpc.
// This log is useful to see how long a internal block size is of rpc.
VLOG
(
7
)
<<
"copy "
<<
size_to_write
<<
" data to CPUPlace"
;
VLOG
(
7
0
)
<<
"copy "
<<
size_to_write
<<
" data to CPUPlace"
;
memory
::
Copy
(
cpu
,
reinterpret_cast
<
void
*>
(
p
),
cpu
,
data
,
size_to_write
);
memory
::
Copy
(
cpu
,
reinterpret_cast
<
void
*>
(
p
),
cpu
,
data
,
size_to_write
);
p
+=
size_to_write
;
p
+=
size_to_write
;
...
@@ -198,7 +198,7 @@ bool VariableResponse::ProcSerializedField(
...
@@ -198,7 +198,7 @@ bool VariableResponse::ProcSerializedField(
#endif
#endif
}
}
VLOG
(
7
)
<<
"ProcSerializedField:"
<<
meta_
.
varname
()
VLOG
(
7
0
)
<<
"ProcSerializedField:"
<<
meta_
.
varname
()
<<
", type:"
<<
meta_
.
type
()
<<
std
::
endl
;
<<
", type:"
<<
meta_
.
type
()
<<
std
::
endl
;
framework
::
DDim
dims
=
GetDims
(
meta_
.
dims
());
framework
::
DDim
dims
=
GetDims
(
meta_
.
dims
());
if
(
meta_
.
type
()
==
sendrecv
::
LOD_TENSOR
)
{
if
(
meta_
.
type
()
==
sendrecv
::
LOD_TENSOR
)
{
...
...
paddle/fluid/operators/elementwise_op.h
浏览文件 @
d231e550
...
@@ -75,16 +75,12 @@ class ElementwiseOp : public framework::OperatorWithKernel {
...
@@ -75,16 +75,12 @@ class ElementwiseOp : public framework::OperatorWithKernel {
}
}
};
};
class
ElementwiseOpInferVarType
:
public
framework
::
VarTypeInference
{
class
ElementwiseOpInferVarType
public:
:
public
framework
::
PassInDtypeAndVarTypeToOutput
{
void
operator
()(
const
framework
::
OpDesc
&
op_desc
,
protected:
framework
::
BlockDesc
*
block
)
const
override
{
std
::
unordered_map
<
std
::
string
,
std
::
string
>
GetInputOutputWithSameType
()
auto
x_name
=
op_desc
.
Input
(
"X"
)[
0
];
const
override
{
auto
out_name
=
op_desc
.
Output
(
"Out"
)[
0
];
return
std
::
unordered_map
<
std
::
string
,
std
::
string
>
{{
"X"
,
/*->*/
"Out"
}};
auto
&
x
=
block
->
FindRecursiveOrCreateVar
(
x_name
);
auto
&
out
=
block
->
FindRecursiveOrCreateVar
(
out_name
);
out
.
SetType
(
x
.
GetType
());
out
.
SetDataType
(
x
.
GetDataType
());
}
}
};
};
...
...
paddle/fluid/operators/feed_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/fetch_barrier_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/fetch_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/gen_nccl_id_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/grid_sampler_op.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/
bilinear_interp
_op.cc
→
paddle/fluid/operators/
interpolate
_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/
bilinear_interp
_op.cu
→
paddle/fluid/operators/
interpolate
_op.cu
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/interpolate_op.h
0 → 100644
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/listen_and_serv_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/lod_rank_table_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/lookup_table_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/math/cpu_vec_test.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/math/jit_kernel_test.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/math/selected_rows_functor.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/math/selected_rows_functor.cu
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/mean_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/momentum_op.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/mul_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/nccl_op.cu.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/nccl_op_test.cu.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/parallel_do_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/pool_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/prefetch_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/random_crop_op.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/reader/blocking_queue.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/reader/create_shuffle_reader_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/recurrent_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/recv_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/rnn_memory_helper_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/save_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/send_barrier_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/send_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/send_recv_op_test.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/sequence_mask_op.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/sgd_op.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/similarity_focus_op.cc
0 → 100644
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/similarity_focus_op.h
0 → 100644
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/softmax_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/split_byref_op.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/split_ids_op.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/sum_mkldnn_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/sum_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/tensor_array_read_write_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/tensor_array_to_tensor_op.cc
0 → 100644
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/tensorrt_engine_op.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/operators/while_op.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/platform/device_context.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/platform/device_tracer.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/platform/dynload/cudnn.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/platform/dynload/dynamic_loader.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/platform/gpu_info.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/platform/init.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/platform/nccl_helper.h
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/pybind/protobuf.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/fluid/train/demo/demo_trainer.cc
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/scripts/paddle_build.sh
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
paddle/testing/TestUtil.cpp
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/__init__.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/distribute_lookup_table.py
0 → 100644
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/layers/nn.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/layers/tensor.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/optimizer.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/book/test_label_semantic_roles.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_conv2d_op.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_conv3d_op.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_dist_transpiler.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_
bilinear_interp
_op.py
→
python/paddle/fluid/tests/unittests/test_
interpolate
_op.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_layers.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_similarity_focus_op.py
0 → 100755
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_tensor_array_to_tensor.py
0 → 100644
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
python/paddle/fluid/transpiler/distribute_transpiler.py
浏览文件 @
d231e550
此差异已折叠。
点击以展开。
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录