Forked from PaddlePaddle / Paddle (in sync with the fork source)
Commit 161b8ddc
Authored Mar 18, 2019 by sneaxiy

Merge develop

Parents: 4b073c95, 6429d2a8

Showing 103 changed files with 5068 additions and 707 deletions
CMakeLists.txt  +2 -0
cmake/operators.cmake  +1 -1
paddle/fluid/API.spec  +8 -7
paddle/fluid/framework/details/build_strategy.cc  +7 -0
paddle/fluid/framework/details/build_strategy.h  +2 -0
paddle/fluid/framework/details/memory_optimize_helper.cc  +0 -1
paddle/fluid/framework/grad_op_desc_maker.h  +5 -3
paddle/fluid/framework/ir/CMakeLists.txt  +8 -0
paddle/fluid/framework/ir/cpu_quantize_pass.cc  +239 -0
paddle/fluid/framework/ir/cpu_quantize_pass.h  +66 -0
paddle/fluid/framework/ir/cpu_quantize_pass_tester.cc  +211 -0
paddle/fluid/framework/ir/cpu_quantize_squash_pass.cc  +146 -0
paddle/fluid/framework/ir/cpu_quantize_squash_pass.h  +58 -0
paddle/fluid/framework/ir/cpu_quantize_squash_pass_tester.cc  +179 -0
paddle/fluid/framework/ir/graph_pattern_detector.cc  +93 -3
paddle/fluid/framework/ir/graph_pattern_detector.h  +60 -0
paddle/fluid/framework/ir/sync_batch_norm_pass.cc  +45 -0
paddle/fluid/framework/ir/sync_batch_norm_pass.h  +32 -0
paddle/fluid/framework/ir/sync_batch_norm_pass_tester.cc  +80 -0
paddle/fluid/framework/parallel_executor.cc  +16 -0
paddle/fluid/imperative/CMakeLists.txt  +1 -0
paddle/fluid/imperative/layer.cc  +5 -7
paddle/fluid/imperative/layer.h  +17 -11
paddle/fluid/imperative/profiler.cc  +62 -0
paddle/fluid/imperative/profiler.h  +25 -0
paddle/fluid/imperative/tracer.cc  +8 -41
paddle/fluid/inference/CMakeLists.txt  +1 -1
paddle/fluid/inference/analysis/argument.h  +6 -0
paddle/fluid/inference/analysis/ir_pass_manager.cc  +6 -5
paddle/fluid/inference/api/analysis_config.cc  +8 -1
paddle/fluid/operators/CMakeLists.txt  +6 -2
paddle/fluid/operators/batch_norm_op.cc  +228 -246
paddle/fluid/operators/batch_norm_op.cu  +19 -39
paddle/fluid/operators/batch_norm_op.h  +72 -2
paddle/fluid/operators/conv_op.cc  +7 -0
paddle/fluid/operators/cross_entropy_op.cc  +168 -19
paddle/fluid/operators/cross_entropy_op.cu  +10 -0
paddle/fluid/operators/cross_entropy_op.h  +120 -0
paddle/fluid/operators/detection/CMakeLists.txt  +1 -0
paddle/fluid/operators/detection/box_coder_op.cc  +8 -7
paddle/fluid/operators/detection/yolo_box_op.cc  +167 -0
paddle/fluid/operators/detection/yolo_box_op.cu  +120 -0
paddle/fluid/operators/detection/yolo_box_op.h  +149 -0
paddle/fluid/operators/detection/yolov3_loss_op.cc  +33 -0
paddle/fluid/operators/detection/yolov3_loss_op.h  +79 -26
paddle/fluid/operators/expand_op.cc  +18 -1
paddle/fluid/operators/fake_quantize_op.cc  +102 -0
paddle/fluid/operators/fake_quantize_op.cu  +38 -0
paddle/fluid/operators/fake_quantize_op.h  +58 -1
paddle/fluid/operators/fc_op.cc  +14 -13
paddle/fluid/operators/fc_op.h  +16 -0
paddle/fluid/operators/fused/fused_embedding_seq_pool_op.cc  +2 -1
paddle/fluid/operators/hash_op.cc  +2 -1
paddle/fluid/operators/math.h  +42 -0
paddle/fluid/operators/math/cross_entropy.cu  +1 -12
paddle/fluid/operators/mkldnn/conv_mkldnn_op.cc  +1 -0
paddle/fluid/operators/mkldnn/fc_mkldnn_op.cc  +16 -8
paddle/fluid/operators/mkldnn/transpose_mkldnn_op.cc  +27 -1
paddle/fluid/operators/optimizers/adam_op.h  +15 -34
paddle/fluid/operators/optimizers/momentum_op.h  +6 -13
paddle/fluid/operators/optimizers/rmsprop_op.h  +4 -14
paddle/fluid/operators/pool_op.cc  +7 -0
paddle/fluid/operators/reshape_op.cc  +0 -8
paddle/fluid/operators/selu_op.h  +2 -3
paddle/fluid/operators/sequence_ops/sequence_enumerate_op.cc  +2 -1
paddle/fluid/operators/sequence_ops/sequence_softmax_op.cu  +1 -3
paddle/fluid/operators/sigmoid_cross_entropy_with_logits_op.cu  +1 -5
paddle/fluid/operators/slice_op.cu  +122 -2
paddle/fluid/operators/softmax_with_cross_entropy_op.cu  +2 -1
paddle/fluid/operators/squeeze_op.cc  +1 -0
paddle/fluid/operators/sync_batch_norm_op.cc  +20 -0
paddle/fluid/operators/sync_batch_norm_op.cu  +452 -0
paddle/fluid/platform/device_context.cc  +3 -1
paddle/fluid/platform/device_context.h  +17 -0
paddle/fluid/platform/init.cc  +3 -0
paddle/fluid/platform/nccl_helper.h  +4 -0
paddle/fluid/pybind/CMakeLists.txt  +1 -1
paddle/fluid/pybind/imperative.cc  +2 -0
paddle/fluid/pybind/pybind.cc  +21 -0
python/paddle/fluid/__init__.py  +2 -1
python/paddle/fluid/compiler.py  +3 -0
python/paddle/fluid/contrib/quantize/quantize_transpiler.py  +74 -10
python/paddle/fluid/contrib/slim/quantization/quantization_pass.py  +81 -5
python/paddle/fluid/contrib/slim/tests/test_quantization_pass.py  +13 -0
python/paddle/fluid/contrib/utils/lookup_table_utils.py  +227 -67
python/paddle/fluid/framework.py  +5 -0
python/paddle/fluid/imperative/__init__.py  +4 -0
python/paddle/fluid/imperative/profiler.py  +30 -0
python/paddle/fluid/layers/detection.py  +109 -12
python/paddle/fluid/layers/nn.py  +81 -25
python/paddle/fluid/tests/test_detection.py  +20 -2
python/paddle/fluid/tests/unittests/mkldnn/test_transpose_int8_mkldnn_op.py  +78 -0
python/paddle/fluid/tests/unittests/test_cross_entropy2_op.py  +82 -0
python/paddle/fluid/tests/unittests/test_dist_transpiler.py  +10 -10
python/paddle/fluid/tests/unittests/test_fake_quantize_op.py  +42 -0
python/paddle/fluid/tests/unittests/test_imperative_gnn.py  +144 -0
python/paddle/fluid/tests/unittests/test_layers.py  +75 -0
python/paddle/fluid/tests/unittests/test_slice_op.py  +24 -0
python/paddle/fluid/tests/unittests/test_sync_batch_norm_op.py  +159 -0
python/paddle/fluid/tests/unittests/test_yolo_box_op.py  +117 -0
python/paddle/fluid/tests/unittests/test_yolov3_loss_op.py  +70 -28
tools/manylinux1/build_scripts/build.sh  +6 -0
tools/timeline.py  +5 -1
CMakeLists.txt
@@ -24,6 +24,8 @@ message(STATUS "CXX compiler: ${CMAKE_CXX_COMPILER}, version: "
         "${CMAKE_CXX_COMPILER_ID} ${CMAKE_CXX_COMPILER_VERSION}")
 message(STATUS "C compiler: ${CMAKE_C_COMPILER}, version: "
         "${CMAKE_C_COMPILER_ID} ${CMAKE_C_COMPILER_VERSION}")
+message(STATUS "AR tools: ${CMAKE_AR}")
 if(WIN32)
     set(CMAKE_SUPPRESS_REGENERATION ON)
     set(CMAKE_STATIC_LIBRARY_PREFIX lib)
cmake/operators.cmake
@@ -110,7 +110,7 @@ function(op_library TARGET)
     # Define operators that don't need pybind here.
     foreach(manual_pybind_op "compare_op" "logical_op" "nccl_op"
 "tensor_array_read_write_op" "tensorrt_engine_op" "conv_fusion_op"
-"fusion_transpose_flatten_concat_op" "fusion_conv_inception_op")
+"fusion_transpose_flatten_concat_op" "fusion_conv_inception_op" "sync_batch_norm_op")
         if ("${TARGET}" STREQUAL "${manual_pybind_op}")
             set(pybind_flag 1)
         endif()
paddle/fluid/API.spec
@@ -77,7 +77,7 @@ paddle.fluid.initializer.MSRAInitializer.__init__ (ArgSpec(args=['self', 'unifor
 paddle.fluid.initializer.force_init_on_cpu (ArgSpec(args=[], varargs=None, keywords=None, defaults=None), ('document', '6d0f3e22c90d9d500d36ff57daf056ee'))
 paddle.fluid.initializer.init_on_cpu (ArgSpec(args=[], varargs=None, keywords=None, defaults=None), ('document', 'a6d7011ca3d8c0d454dac3a56eae0c29'))
 paddle.fluid.initializer.NumpyArrayInitializer.__init__ (ArgSpec(args=['self', 'value'], varargs=None, keywords=None, defaults=None), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
-paddle.fluid.layers.fc (ArgSpec(args=['input', 'size', 'num_flatten_dims', 'param_attr', 'bias_attr', 'act', 'is_test', 'name'], varargs=None, keywords=None, defaults=(1, None, None, None, False, None)), ('document', '1929058262994f212620599c63aea6bd'))
+paddle.fluid.layers.fc (ArgSpec(args=['input', 'size', 'num_flatten_dims', 'param_attr', 'bias_attr', 'act', 'is_test', 'name'], varargs=None, keywords=None, defaults=(1, None, None, None, False, None)), ('document', '424e898365195e3ccbc2e7dc8b63605e'))
 paddle.fluid.layers.embedding (ArgSpec(args=['input', 'size', 'is_sparse', 'is_distributed', 'padding_idx', 'param_attr', 'dtype'], varargs=None, keywords=None, defaults=(False, False, None, None, 'float32')), ('document', '89c2c55a0b0656b106064048e068e77a'))
 paddle.fluid.layers.dynamic_lstm (ArgSpec(args=['input', 'size', 'h_0', 'c_0', 'param_attr', 'bias_attr', 'use_peepholes', 'is_reverse', 'gate_activation', 'cell_activation', 'candidate_activation', 'dtype', 'name'], varargs=None, keywords=None, defaults=(None, None, None, None, True, False, 'sigmoid', 'tanh', 'tanh', 'float32', None)), ('document', 'dfbb624f85015df29e994ca6999e8ff6'))
 paddle.fluid.layers.dynamic_lstmp (ArgSpec(args=['input', 'size', 'proj_size', 'param_attr', 'bias_attr', 'use_peepholes', 'is_reverse', 'gate_activation', 'cell_activation', 'candidate_activation', 'proj_activation', 'dtype', 'name', 'h_0', 'c_0', 'cell_clip', 'proj_clip'], varargs=None, keywords=None, defaults=(None, None, True, False, 'sigmoid', 'tanh', 'tanh', 'tanh', 'float32', None, None, None, None, None)), ('document', 'b4b608b986eb9617aa0525e1be21d32d'))
@@ -100,7 +100,7 @@ paddle.fluid.layers.pool2d (ArgSpec(args=['input', 'pool_size', 'pool_type', 'po
 paddle.fluid.layers.pool3d (ArgSpec(args=['input', 'pool_size', 'pool_type', 'pool_stride', 'pool_padding', 'global_pooling', 'use_cudnn', 'ceil_mode', 'name', 'exclusive'], varargs=None, keywords=None, defaults=(-1, 'max', 1, 0, False, True, False, None, True)), ('document', '043de7333b79ee0ac55053c14ed81625'))
 paddle.fluid.layers.adaptive_pool2d (ArgSpec(args=['input', 'pool_size', 'pool_type', 'require_index', 'name'], varargs=None, keywords=None, defaults=('max', False, None)), ('document', '859b887174d06f361658f69cb7c06d95'))
 paddle.fluid.layers.adaptive_pool3d (ArgSpec(args=['input', 'pool_size', 'pool_type', 'require_index', 'name'], varargs=None, keywords=None, defaults=('max', False, None)), ('document', '120f4323a3d7ed9c0916f15a59f0e497'))
-paddle.fluid.layers.batch_norm (ArgSpec(args=['input', 'act', 'is_test', 'momentum', 'epsilon', 'param_attr', 'bias_attr', 'data_layout', 'in_place', 'name', 'moving_mean_name', 'moving_variance_name', 'do_model_average_for_mean_and_var', 'fuse_with_relu', 'use_global_stats'], varargs=None, keywords=None, defaults=(None, False, 0.9, 1e-05, None, None, 'NCHW', False, None, None, None, False, False, False)), ('document', 'c527b71b8a4c60dca8df8a745c2b598d'))
+paddle.fluid.layers.batch_norm (ArgSpec(args=['input', 'act', 'is_test', 'momentum', 'epsilon', 'param_attr', 'bias_attr', 'data_layout', 'in_place', 'name', 'moving_mean_name', 'moving_variance_name', 'do_model_average_for_mean_and_var', 'fuse_with_relu', 'use_global_stats'], varargs=None, keywords=None, defaults=(None, False, 0.9, 1e-05, None, None, 'NCHW', False, None, None, None, False, False, False)), ('document', '320c6973b02ea179fa89fecc80796464'))
 paddle.fluid.layers.data_norm (ArgSpec(args=['input', 'act', 'epsilon', 'param_attr', 'data_layout', 'in_place', 'name', 'moving_mean_name', 'moving_variance_name', 'do_model_average_for_mean_and_var'], varargs=None, keywords=None, defaults=(None, 1e-05, None, 'NCHW', False, None, None, None, False)), ('document', 'e45e09e65a2658e07cad987222f0d9ab'))
 paddle.fluid.layers.beam_search_decode (ArgSpec(args=['ids', 'scores', 'beam_size', 'end_id', 'name'], varargs=None, keywords=None, defaults=(None,)), ('document', 'b0b8d53821716cd50c42e09b593f3feb'))
 paddle.fluid.layers.conv2d_transpose (ArgSpec(args=['input', 'num_filters', 'output_size', 'filter_size', 'padding', 'stride', 'dilation', 'groups', 'param_attr', 'bias_attr', 'use_cudnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, None, 0, 1, 1, None, None, None, True, None, None)), ('document', '03993955ab1e6d3044c44e6f17fc85e9'))
@@ -339,7 +339,8 @@ paddle.fluid.layers.generate_mask_labels (ArgSpec(args=['im_info', 'gt_classes',
 paddle.fluid.layers.iou_similarity (ArgSpec(args=['x', 'y', 'name'], varargs=None, keywords=None, defaults=(None,)), ('document', '587845f60c5d97ffdf2dfd21da52eca1'))
 paddle.fluid.layers.box_coder (ArgSpec(args=['prior_box', 'prior_box_var', 'target_box', 'code_type', 'box_normalized', 'name', 'axis'], varargs=None, keywords=None, defaults=('encode_center_size', True, None, 0)), ('document', '032d0f4b7d8f6235ee5d91e473344f0e'))
 paddle.fluid.layers.polygon_box_transform (ArgSpec(args=['input', 'name'], varargs=None, keywords=None, defaults=(None,)), ('document', '0e5ac2507723a0b5adec473f9556799b'))
-paddle.fluid.layers.yolov3_loss (ArgSpec(args=['x', 'gtbox', 'gtlabel', 'anchors', 'anchor_mask', 'class_num', 'ignore_thresh', 'downsample_ratio', 'name'], varargs=None, keywords=None, defaults=(None,)), ('document', '991e934c3e09abf0edec7c9c978b4691'))
+paddle.fluid.layers.yolov3_loss (ArgSpec(args=['x', 'gtbox', 'gtlabel', 'anchors', 'anchor_mask', 'class_num', 'ignore_thresh', 'downsample_ratio', 'gtscore', 'use_label_smooth', 'name'], varargs=None, keywords=None, defaults=(None, True, None)), ('document', '57fa96922e42db8f064c3fb77f2255e8'))
+paddle.fluid.layers.yolo_box (ArgSpec(args=['x', 'img_size', 'anchors', 'class_num', 'conf_thresh', 'downsample_ratio', 'name'], varargs=None, keywords=None, defaults=(None,)), ('document', '5566169a5ab993d177792c023c7fb340'))
 paddle.fluid.layers.box_clip (ArgSpec(args=['input', 'im_info', 'name'], varargs=None, keywords=None, defaults=(None,)), ('document', '397e9e02b451d99c56e20f268fa03f2e'))
 paddle.fluid.layers.multiclass_nms (ArgSpec(args=['bboxes', 'scores', 'score_threshold', 'nms_top_k', 'keep_top_k', 'nms_threshold', 'normalized', 'nms_eta', 'background_label', 'name'], varargs=None, keywords=None, defaults=(0.3, True, 1.0, 0, None)), ('document', 'ca7d1107b6c5d2d6d8221039a220fde0'))
 paddle.fluid.layers.distribute_fpn_proposals (ArgSpec(args=['fpn_rois', 'min_level', 'max_level', 'refer_level', 'refer_scale', 'name'], varargs=None, keywords=None, defaults=(None,)), ('document', '7bb011ec26bace2bc23235aa4a17647d'))
@@ -376,7 +377,7 @@ paddle.fluid.contrib.BeamSearchDecoder.read_array (ArgSpec(args=['self', 'init',
 paddle.fluid.contrib.BeamSearchDecoder.update_array (ArgSpec(args=['self', 'array', 'value'], varargs=None, keywords=None, defaults=None), ('document', '5754e9b3212b7c09497151516a0de5a7'))
 paddle.fluid.contrib.memory_usage (ArgSpec(args=['program', 'batch_size'], varargs=None, keywords=None, defaults=None), ('document', '8fcb2f93bb743693baa8d4860a5ccc47'))
 paddle.fluid.contrib.op_freq_statistic (ArgSpec(args=['program'], varargs=None, keywords=None, defaults=None), ('document', '4d43687113c4bf5b29d15aee2f4e4afa'))
-paddle.fluid.contrib.QuantizeTranspiler.__init__ (ArgSpec(args=['self', 'weight_bits', 'activation_bits', 'activation_quantize_type', 'weight_quantize_type', 'window_size'], varargs=None, keywords=None, defaults=(8, 8, 'abs_max', 'abs_max', 10000)), ('document', '14b39f1fcd5667ff556b1aad94357d1d'))
+paddle.fluid.contrib.QuantizeTranspiler.__init__ (ArgSpec(args=['self', 'weight_bits', 'activation_bits', 'activation_quantize_type', 'weight_quantize_type', 'window_size', 'moving_rate'], varargs=None, keywords=None, defaults=(8, 8, 'abs_max', 'abs_max', 10000, 0.9)), ('document', '14b39f1fcd5667ff556b1aad94357d1d'))
 paddle.fluid.contrib.QuantizeTranspiler.convert_to_int8 (ArgSpec(args=['self', 'program', 'place', 'scope'], varargs=None, keywords=None, defaults=(None,)), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
 paddle.fluid.contrib.QuantizeTranspiler.freeze_program (ArgSpec(args=['self', 'program', 'place', 'fuse_bn', 'scope'], varargs=None, keywords=None, defaults=(False, None)), ('document', '909675a1ab055c69b436a7893fcae4fd'))
 paddle.fluid.contrib.QuantizeTranspiler.training_transpile (ArgSpec(args=['self', 'program', 'startup_program'], varargs=None, keywords=None, defaults=(None, None)), ('document', '6dd9909f10b283ba2892a99058a72884'))
@@ -401,9 +402,9 @@ paddle.fluid.contrib.MagnitudePruner.__init__ (ArgSpec(args=['self', 'threshold'
 paddle.fluid.contrib.MagnitudePruner.prune (ArgSpec(args=['self', 'param', 'threshold'], varargs=None, keywords=None, defaults=(None,)), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
 paddle.fluid.contrib.RatioPruner.__init__ (ArgSpec(args=['self', 'ratios'], varargs=None, keywords=None, defaults=(None,)), ('document', 'e7a81a325b296a9ca502ee5adb4fc85d'))
 paddle.fluid.contrib.RatioPruner.prune (ArgSpec(args=['self', 'param', 'ratio'], varargs=None, keywords=None, defaults=(None,)), ('document', '358cbf2978c91028fb96a195a9884645'))
-paddle.fluid.contrib.load_persistables_for_increment (ArgSpec(args=['dirname', 'executor', 'program', 'lookup_table_var', 'lookup_table_var_path'], varargs=None, keywords=None, defaults=None), ('document', '11fbf7e8dd2289805de291b453a33ee7'))
-paddle.fluid.contrib.load_persistables_for_inference (ArgSpec(args=['dirname', 'executor', 'program', 'lookup_table_var_name'], varargs=None, keywords=None, defaults=None), ('document', '5b5577bb3d24070da819674255d16196'))
-paddle.fluid.contrib.convert_dist_to_sparse_program (ArgSpec(args=['program'], varargs=None, keywords=None, defaults=None), ('document', '4efbd93876832d4d35497cdbc7a1e6d8'))
+paddle.fluid.contrib.load_persistables_for_increment (ArgSpec(args=['dirname', 'executor', 'program', 'lookup_table_var', 'lookup_table_var_path'], varargs=None, keywords=None, defaults=None), ('document', '2ab36d4f7a564f5f65e455807ad06c67'))
+paddle.fluid.contrib.load_persistables_for_inference (ArgSpec(args=['dirname', 'executor', 'program', 'lookup_table_var_name'], varargs=None, keywords=None, defaults=None), ('document', '59066bac9db0ac6ce414d05780b7333f'))
+paddle.fluid.contrib.convert_dist_to_sparse_program (ArgSpec(args=['program'], varargs=None, keywords=None, defaults=None), ('document', '74c39c595dc70d6be2f16d8e462d282b'))
 paddle.fluid.contrib.HDFSClient.__init__ (ArgSpec(args=['self', 'hadoop_home', 'configs'], varargs=None, keywords=None, defaults=None), ('document', '6adf97f83acf6453d4a6a4b1070f3754'))
 paddle.fluid.contrib.HDFSClient.delete (ArgSpec(args=['self', 'hdfs_path'], varargs=None, keywords=None, defaults=None), ('document', 'c3721aa2d4d9ef5a857dd47b2681c03e'))
 paddle.fluid.contrib.HDFSClient.download (ArgSpec(args=['self', 'hdfs_path', 'local_path', 'overwrite', 'unzip'], varargs=None, keywords=None, defaults=(False, False)), ('document', 'ca55bde92184d3fd0f9f5c963b25e634'))
paddle/fluid/framework/details/build_strategy.cc
浏览文件 @
161b8ddc
...
@@ -16,6 +16,7 @@ limitations under the License. */
 #include <glog/logging.h>
 #include <memory>
+#include <utility>
 #include "paddle/fluid/framework/details/memory_optimize_helper.h"
 #include "paddle/fluid/framework/details/multi_devices_graph_pass.h"
...
@@ -49,6 +50,11 @@ class ParallelExecutorPassBuilder : public ir::PassBuilder {
       AppendPass("sequential_execution_pass");
     }
+    // Add op fusion.
+    if (strategy.sync_batch_norm_) {
+      AppendPass("sync_batch_norm_pass");
+    }
     // Add op fusion.
     if (strategy.fuse_relu_depthwise_conv_) {
       AppendPass("fuse_relu_depthwise_conv_pass");
...
@@ -227,6 +233,7 @@ std::unique_ptr<ir::Graph> BuildStrategy::Apply(
 }  // namespace framework
 }  // namespace paddle
+USE_PASS(sync_batch_norm_pass);
 USE_PASS(fuse_relu_depthwise_conv_pass);
 USE_PASS(fuse_elewise_add_act_pass);
 USE_PASS(graph_viz_pass);
...
...
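The hunk above gates a new pass on a `BuildStrategy` flag. A minimal sketch of that pattern follows, assuming nothing about the real builder beyond what the diff shows: `ToyBuildStrategy` and `CollectPasses` are hypothetical names, and a plain vector stands in for the `AppendPass` calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the two BuildStrategy flags touched by this diff.
struct ToyBuildStrategy {
  bool sync_batch_norm_{false};
  bool fuse_relu_depthwise_conv_{false};
};

// Collects pass names in the order the diff appends them:
// sync_batch_norm_pass first, then fuse_relu_depthwise_conv_pass.
std::vector<std::string> CollectPasses(const ToyBuildStrategy& strategy) {
  std::vector<std::string> passes;
  if (strategy.sync_batch_norm_) passes.push_back("sync_batch_norm_pass");
  if (strategy.fuse_relu_depthwise_conv_)
    passes.push_back("fuse_relu_depthwise_conv_pass");
  return passes;
}
```

With both flags left at their defaults the list stays empty, which matches the `{false}` initializers in `build_strategy.h`.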
paddle/fluid/framework/details/build_strategy.h
View file @
161b8ddc
...
@@ -77,6 +77,8 @@ struct BuildStrategy {
   bool fuse_relu_depthwise_conv_{false};
+  bool sync_batch_norm_{false};
   bool memory_optimize_{true};
   // TODO(dzhwinter):
   // make enable_inplace, memory_optimize_
...
...
paddle/fluid/framework/details/memory_optimize_helper.cc
View file @
161b8ddc
...
@@ -337,7 +337,6 @@ bool NodeCanReused(const VarDesc& node) {
   auto type = node.GetType();
   // only these types holds bulk of gpu memory
   if (!(type == proto::VarType::LOD_TENSOR ||
-        type == proto::VarType::SELECTED_ROWS ||
         type == proto::VarType::LOD_TENSOR_ARRAY)) {
     return false;
   }
...
...
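As a rough illustration of the type check after this hunk, the sketch below uses a hypothetical `ToyVarType` enum instead of `proto::VarType`. It covers only the condition visible in the diff; the real `NodeCanReused` may apply further checks outside this hunk.

```cpp
#include <cassert>

// Hypothetical mirror of a few proto::VarType values; not the real enum.
enum class ToyVarType { LOD_TENSOR, SELECTED_ROWS, LOD_TENSOR_ARRAY, READER };

// Type check as it reads after the hunk: SELECTED_ROWS no longer qualifies.
bool TypeCanBeReused(ToyVarType type) {
  return type == ToyVarType::LOD_TENSOR ||
         type == ToyVarType::LOD_TENSOR_ARRAY;
}
```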
paddle/fluid/framework/grad_op_desc_maker.h
View file @
161b8ddc
...
@@ -14,7 +14,9 @@ limitations under the License. */
 #pragma once
 #include <algorithm>
+#include <memory>
 #include <string>
+#include <unordered_map>
 #include <unordered_set>
 #include <vector>
 #include "paddle/fluid/framework/op_desc.h"
...
@@ -55,11 +57,11 @@ class GradOpDescMakerBase {
                    std::back_inserter(ret_val),
                    [this](const std::string& fwd_var_name) -> std::string {
                      auto g_name = GradVarName(fwd_var_name);
-                     if (no_grad_set_.count(g_name)) {
-                       return kEmptyVarName;
-                     } else {
+                     if (no_grad_set_.empty() || !no_grad_set_.count(g_name)) {
                        (*this->grad_to_var_)[g_name] = fwd_var_name;
                        return g_name;
+                     } else {
+                       return kEmptyVarName;
                      }
                    });
     if (!drop_empty_grad) {
...
...
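The reordered condition in the lambda above can be exercised in isolation. The sketch below is a simplified model, not the Paddle implementation: `ToyGradVarName`, `kToyEmptyVarName` and `ToyInputGrad` are hypothetical names, and the `"@GRAD"` suffix and `"@EMPTY@"` placeholder are assumed values chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

const char kToyEmptyVarName[] = "@EMPTY@";  // placeholder value, assumed

// Assumed naming rule for illustration: gradient name = name + "@GRAD".
std::string ToyGradVarName(const std::string& name) { return name + "@GRAD"; }

// Maps forward variable names to gradient names, recording the mapping only
// for gradients that are not listed in no_grad_set.
std::vector<std::string> ToyInputGrad(
    const std::vector<std::string>& fwd_vars,
    const std::unordered_set<std::string>& no_grad_set,
    std::unordered_map<std::string, std::string>* grad_to_var) {
  std::vector<std::string> ret;
  for (const auto& fwd : fwd_vars) {
    std::string g_name = ToyGradVarName(fwd);
    if (no_grad_set.empty() || !no_grad_set.count(g_name)) {
      (*grad_to_var)[g_name] = fwd;
      ret.push_back(g_name);
    } else {
      ret.push_back(kToyEmptyVarName);
    }
  }
  return ret;
}
```

Since `count` on an empty set already returns 0, the `empty()` test does not change the result; it presumably serves as a cheap early exit.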
paddle/fluid/framework/ir/CMakeLists.txt
View file @
161b8ddc
...
@@ -46,6 +46,8 @@ cc_library(fuse_pass_base SRCS fuse_pass_base.cc DEPS pass)
 pass_library(graph_to_program_pass base)
 pass_library(graph_viz_pass base)
 pass_library(lock_free_optimize_pass base)
+pass_library(cpu_quantize_pass inference)
+pass_library(cpu_quantize_squash_pass inference)
 pass_library(fc_fuse_pass inference)
 pass_library(attention_lstm_fuse_pass inference)
 pass_library(infer_clean_graph_pass inference)
...
@@ -66,6 +68,7 @@ pass_library(conv_elementwise_add_fuse_pass inference)
 pass_library(conv_affine_channel_fuse_pass inference)
 pass_library(transpose_flatten_concat_fuse_pass inference)
 pass_library(identity_scale_op_clean_pass base)
+pass_library(sync_batch_norm_pass base)
 # There may be many transpose-flatten structures in a model, and the output of
 # these structures will be used as inputs to the concat Op. This pattern will
...
@@ -100,6 +103,11 @@ cc_test(test_graph_pattern_detector SRCS graph_pattern_detector_tester.cc DEPS g
 cc_test(test_fc_fuse_pass SRCS fc_fuse_pass_tester.cc DEPS fc_fuse_pass framework_proto)
 cc_test(test_seqpool_concat_fuse_pass SRCS seqpool_concat_fuse_pass_tester.cc DEPS seqpool_concat_fuse_pass framework_proto)
 cc_test(test_is_test_pass SRCS is_test_pass_tester.cc DEPS is_test_pass)
+cc_test(test_cpu_quantize_pass SRCS cpu_quantize_pass_tester.cc DEPS cpu_quantize_pass naive_executor)
+cc_test(test_cpu_quantize_squash_pass SRCS cpu_quantize_squash_pass_tester.cc DEPS cpu_quantize_squash_pass naive_executor)
+if (NOT WIN32)
+  cc_test(test_sync_batch_norm_pass SRCS sync_batch_norm_pass_tester.cc DEPS sync_batch_norm_pass)
+endif ()
 if (WITH_MKLDNN)
   cc_test(test_depthwise_conv_mkldnn_pass SRCS mkldnn/depthwise_conv_mkldnn_pass_tester.cc DEPS depthwise_conv_mkldnn_pass)
   cc_test(test_conv_bias_mkldnn_fuse_pass SRCS mkldnn/conv_bias_mkldnn_fuse_pass_tester.cc DEPS conv_bias_mkldnn_fuse_pass naive_executor)
...
...
paddle/fluid/framework/ir/cpu_quantize_pass.cc
0 → 100644
View file @
161b8ddc
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/cpu_quantize_pass.h"
#include <utility>
#include <vector>
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/string/pretty_log.h"
namespace paddle {
namespace framework {
namespace ir {

namespace {

void UnlinkNodes(ir::Node* a, ir::Node* b) {
  a->outputs.erase(std::remove(a->outputs.begin(), a->outputs.end(), b),
                   a->outputs.end());
  b->inputs.erase(std::remove(b->inputs.begin(), b->inputs.end(), a),
                  b->inputs.end());
}

}  // namespace

enum { U8_MAX = 255, S8_MAX = 127 };

using EigenVectorArrayMap = Eigen::Map<Eigen::Array<double, Eigen::Dynamic, 1>>;
using string::PrettyLogDetail;

void CPUQuantizePass::QuantizeInput(Graph* g, Node* op, Node* input,
                                    std::string input_name, double scale_to_one,
                                    bool is_unsigned,
                                    std::string scale_attr_name) const {
  unsigned max = is_unsigned ? U8_MAX : S8_MAX;
  float scale = scale_to_one * max;

  // Create quantize output variable
  VarDesc quantize_out_desc(patterns::PDNodeName("quantize", "out"));
  auto* quantize_out_node = g->CreateVarNode(&quantize_out_desc);

  // create a quantize op node
  OpDesc q_desc;
  q_desc.SetType("quantize");
  q_desc.SetInput("Input", std::vector<std::string>({input->Name()}));
  q_desc.SetOutput("Output",
                   std::vector<std::string>({quantize_out_node->Name()}));
  q_desc.SetAttr("Scale", scale);
  q_desc.SetAttr("is_negative_input", !is_unsigned);
  auto quantize_op = g->CreateOpNode(&q_desc);  // OpDesc will be copied.

  // update op's input
  op->Op()->SetInput(input_name,
                     std::vector<std::string>({quantize_out_node->Name()}));

  // link quantize op
  UnlinkNodes(input, op);
  IR_NODE_LINK_TO(input, quantize_op);
  IR_NODE_LINK_TO(quantize_op, quantize_out_node);
  IR_NODE_LINK_TO(quantize_out_node, op);

  if (!scale_attr_name.empty()) op->Op()->SetAttr(scale_attr_name, scale);
}

void CPUQuantizePass::DequantizeOutput(Graph* g, Node* op, Node* output,
                                       std::string output_name,
                                       double scale_to_one, bool is_unsigned,
                                       std::string scale_attr_name) const {
  unsigned max = is_unsigned ? U8_MAX : S8_MAX;
  float scale = scale_to_one * max;

  // Create dequantize input variable
  VarDesc dequantize_in_desc(patterns::PDNodeName("dequantize", "in"));
  auto* dequantize_in_node = g->CreateVarNode(&dequantize_in_desc);

  // create a dequantize op node for output.
  OpDesc deq_desc;
  deq_desc.SetType("dequantize");
  deq_desc.SetInput("Input",
                    std::vector<std::string>({dequantize_in_node->Name()}));
  deq_desc.SetOutput("Output", std::vector<std::string>({output->Name()}));
  deq_desc.SetAttr("Scale", scale);
  auto dequantize_op = g->CreateOpNode(&deq_desc);  // OpDesc will be copied.

  // update op's output
  op->Op()->SetOutput(output_name,
                      std::vector<std::string>({dequantize_in_node->Name()}));

  // link dequantize op
  UnlinkNodes(op, output);
  IR_NODE_LINK_TO(op, dequantize_in_node);
  IR_NODE_LINK_TO(dequantize_in_node, dequantize_op);
  IR_NODE_LINK_TO(dequantize_op, output);

  if (!scale_attr_name.empty()) op->Op()->SetAttr(scale_attr_name, scale);
}

void CPUQuantizePass::QuantizeConv(Graph* graph,
                                   bool with_residual_data) const {
  GraphPatternDetector gpd;
  auto pattern = gpd.mutable_pattern();
  patterns::ConvResidual conv_pattern{pattern, name_scope_};
  conv_pattern(with_residual_data);

  int quantize_conv_count = 0;
  auto handler = [&](const GraphPatternDetector::subgraph_t& subgraph,
                     Graph* g) {
    VLOG(4) << "Quantize conv2d op";
    GET_IR_NODE_FROM_SUBGRAPH(conv_op, conv_op, conv_pattern);
    auto* conv_op_desc = conv_op->Op();

    // skip if should not be quantized
    if (!conv_op_desc->HasAttr("use_quantizer") ||
        !boost::get<bool>(conv_op_desc->GetAttr("use_quantizer")))
      return;

    GET_IR_NODE_FROM_SUBGRAPH(conv_filter, conv_filter, conv_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(conv_input, conv_input, conv_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(conv_output, conv_output, conv_pattern);

    // get scales calculated after warmup, they scale variables to MAX=1.0
    auto scales = Get<VarQuantScale>("quant_var_scales");

    auto input_scale = scales[conv_input->Name()].second.data<double>()[0];
    bool is_input_unsigned = scales[conv_input->Name()].first;
    QuantizeInput(g, conv_op, conv_input, "Input", input_scale,
                  is_input_unsigned, "Scale_in");

    auto filter_scale_tensor = scales[conv_filter->Name()].second;
    EigenVectorArrayMap eigen_tensor{filter_scale_tensor.data<double>(),
                                     filter_scale_tensor.numel(), 1};
    eigen_tensor *= static_cast<double>(S8_MAX);
    std::vector<float> filter_scale{
        filter_scale_tensor.data<double>(),
        filter_scale_tensor.data<double>() + filter_scale_tensor.numel()};

    conv_op->Op()->SetAttr("Scale_weights", filter_scale);

    if (with_residual_data) {
      GET_IR_NODE_FROM_SUBGRAPH(conv_residual_data, conv_residual_data,
                                conv_pattern);
      auto residual_scale =
          scales[conv_residual_data->Name()].second.data<double>()[0];
      bool is_residual_unsigned = scales[conv_residual_data->Name()].first;

      QuantizeInput(g, conv_op, conv_residual_data, "ResidualData",
                    residual_scale, is_residual_unsigned, "Scale_in_eltwise");
    }

    auto output_scale = scales[conv_output->Name()].second.data<double>()[0];
    bool is_output_unsigned = scales[conv_output->Name()].first;
    DequantizeOutput(g, conv_op, conv_output, "Output", output_scale,
                     is_output_unsigned, "Scale_out");

    ++quantize_conv_count;
  };

  gpd(graph, handler);
  AddStatis(quantize_conv_count);

  std::stringstream msg_ss;
  msg_ss << "--- quantized " << quantize_conv_count << " conv2d ops";
  if (with_residual_data) msg_ss << " with residual connection";
  PrettyLogDetail(msg_ss.str().c_str());
}

void CPUQuantizePass::QuantizePool(Graph* graph) const {
  GraphPatternDetector gpd;
  auto pattern = gpd.mutable_pattern();
  patterns::Pool pool_pattern{pattern, name_scope_};
  pool_pattern();

  int quantize_pool_count = 0;
  auto handler = [&](const GraphPatternDetector::subgraph_t& subgraph,
                     Graph* g) {
    VLOG(4) << "Quantize pool2d op";
    GET_IR_NODE_FROM_SUBGRAPH(pool_op, pool_op, pool_pattern);
    auto* pool_op_desc = pool_op->Op();

    // skip if should not be quantized
    if (!pool_op_desc->HasAttr("use_quantizer") ||
        !boost::get<bool>(pool_op_desc->GetAttr("use_quantizer")))
      return;

    GET_IR_NODE_FROM_SUBGRAPH(pool_input, pool_input, pool_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(pool_output, pool_output, pool_pattern);

    // get scales calculated after warmup, they scale variables to MAX=1.0
    auto scales = Get<VarQuantScale>("quant_var_scales");

    auto input_scale = scales[pool_input->Name()].second.data<double>()[0];
    bool is_input_unsigned = scales[pool_input->Name()].first;
    QuantizeInput(g, pool_op, pool_input, "X", input_scale, is_input_unsigned);

    auto output_scale = scales[pool_output->Name()].second.data<double>()[0];
    bool is_output_unsigned = scales[pool_output->Name()].first;
    DequantizeOutput(g, pool_op, pool_output, "Out", output_scale,
                     is_output_unsigned);

    ++quantize_pool_count;
  };

  gpd(graph, handler);
  AddStatis(quantize_pool_count);

  PrettyLogDetail("--- quantized %d pool2d ops", quantize_pool_count);
}

std::unique_ptr<ir::Graph> CPUQuantizePass::ApplyImpl(
    std::unique_ptr<ir::Graph> graph) const {
  VLOG(3) << "Quantizing the graph.";
  PADDLE_ENFORCE(graph.get());
  FusePassBase::Init(name_scope_, graph.get());

  PADDLE_ENFORCE(param_scope());

  QuantizeConv(graph.get(), true /* with_residual_data */);
  QuantizeConv(graph.get());
  QuantizePool(graph.get());

  return graph;
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(cpu_quantize_pass, paddle::framework::ir::CPUQuantizePass)
    .RequirePassAttr("quant_var_scales");
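Both `QuantizeInput` and `DequantizeOutput` turn a "scale to 1.0" factor into an int8 scale by multiplying it with 255 (unsigned) or 127 (signed). The sketch below reproduces that arithmetic outside Paddle. `ToyQuantScale` and `ToyQuantizeSigned` are hypothetical helpers, and the round-and-clamp step is an assumption about what a quantize kernel does, since the kernel itself is not part of this diff.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Same values as the U8_MAX / S8_MAX enum in cpu_quantize_pass.cc.
enum { kToyU8Max = 255, kToyS8Max = 127 };

// Mirrors the scale computation in QuantizeInput/DequantizeOutput.
float ToyQuantScale(double scale_to_one, bool is_unsigned) {
  unsigned max = is_unsigned ? kToyU8Max : kToyS8Max;
  return static_cast<float>(scale_to_one * max);
}

// Assumed behaviour of a signed quantize kernel (not shown in this diff):
// multiply by the scale, round to nearest, clamp to the int8 range.
std::int8_t ToyQuantizeSigned(float value, float scale) {
  float scaled = std::round(value * scale);
  if (scaled > 127.0f) scaled = 127.0f;
  if (scaled < -128.0f) scaled = -128.0f;
  return static_cast<std::int8_t>(scaled);
}
```

With the factor 2.0 used by the tester, the signed scale comes out as 254, which is the `2.0f * 127` value the test expects in `Scale_in` and `Scale_out`.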
paddle/fluid/framework/ir/cpu_quantize_pass.h
0 → 100644
View file @
161b8ddc
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include "paddle/fluid/framework/ir/fuse_pass_base.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
namespace paddle {
namespace framework {
namespace ir {

/*
 * Map variable name to tensor of scaling factors scaling it to MAX=1.0.
 * bool denotes whether quantization of the variable should be done to unsigned
 * type.
 */
using VarQuantScale =
    std::unordered_map<std::string, std::pair<bool, LoDTensor>>;

/*
 * Quantize all supported operators.
 */
class CPUQuantizePass : public FusePassBase {
 public:
  virtual ~CPUQuantizePass() {}

 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override;

  void QuantizeConv(Graph* graph, bool with_residual_data = false) const;

  void QuantizePool(Graph* graph) const;

  void QuantizeInput(Graph* g, Node* op, Node* input, std::string input_name,
                     double scale_to_one, bool is_unsigned,
                     std::string scale_attr_name = "") const;

  void DequantizeOutput(Graph* g, Node* op, Node* output,
                        std::string output_name, double scale_to_one,
                        bool is_unsigned,
                        std::string scale_attr_name = "") const;

  const std::string name_scope_{"quantize"};
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle
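`VarQuantScale` pairs a signedness flag with a tensor of scaling factors, and `QuantizeConv` converts the filter's factors into the `Scale_weights` attribute. A rough stand-alone model of that lookup and conversion is sketched below. `ToyVarQuantScale` and `ToyWeightScales` are hypothetical, and a `std::vector<double>` replaces the `LoDTensor`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy replacement for VarQuantScale: a std::vector<double> stands in for the
// LoDTensor of scaling factors.
using ToyVarQuantScale =
    std::unordered_map<std::string, std::pair<bool, std::vector<double>>>;

// Converts per-channel "scale to 1.0" factors to signed int8 scales, the way
// QuantizeConv fills Scale_weights (multiply by 127, narrow to float).
std::vector<float> ToyWeightScales(const ToyVarQuantScale& scales,
                                   const std::string& filter_name) {
  const std::vector<double>& factors = scales.at(filter_name).second;
  std::vector<float> out;
  out.reserve(factors.size());
  for (double f : factors) out.push_back(static_cast<float>(f * 127.0));
  return out;
}
```

Unlike this sketch, the pass multiplies the tensor data in place through an Eigen map before copying it into the float vector.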
paddle/fluid/framework/ir/cpu_quantize_pass_tester.cc
0 → 100644
View file @
161b8ddc
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/cpu_quantize_pass.h"
#include <gtest/gtest.h>
#include "paddle/fluid/framework/naive_executor.h"
#include "paddle/fluid/platform/place.h"
namespace paddle {
namespace framework {
namespace ir {

void SetOp(ProgramDesc* prog, const std::string& type, const std::string& name,
           const std::vector<std::string>& inputs,
           const std::vector<std::string>& outputs, bool use_mkldnn,
           bool use_quantizer = false) {
  auto* op = prog->MutableBlock(0)->AppendOp();
  op->SetType(type);
  op->SetAttr("use_mkldnn", use_mkldnn);
  op->SetAttr("name", name);
  if (type == "conv2d") {
    op->SetInput("Input", {inputs[0]});
    op->SetInput("Filter", {inputs[1]});
    if (inputs.size() > 2)
      op->SetInput("Bias", {inputs[2]});
    else
      op->SetInput("Bias", {});
    if (inputs.size() > 3) {
      op->SetInput("ResidualData", {inputs[3]});
      op->SetAttr("fuse_residual_connection", true);
    } else {
      op->SetInput("ResidualData", {});
      op->SetAttr("fuse_residual_connection", false);
    }
    op->SetOutput("Output", {outputs[0]});
    op->SetAttr("use_quantizer", use_quantizer);
    op->SetAttr("Scale_in", 1.0f);
    op->SetAttr("Scale_out", 1.0f);
    op->SetAttr("Scale_weights", std::vector<float>{1.0f});
  } else if (type == "pool2d") {
    op->SetInput("X", {inputs[0]});
    op->SetOutput("Out", {outputs[0]});
    op->SetAttr("use_quantizer", use_quantizer);
  } else if (type == "dropout") {
    op->SetInput("X", {inputs[0]});
    op->SetOutput("Out", {outputs[0]});
  } else if (type == "fc") {
    op->SetInput("Input", {inputs[0]});
    if (inputs.size() > 1) op->SetInput("W", {inputs[1]});
    if (inputs.size() > 2) op->SetInput("Bias", {inputs[2]});
    op->SetOutput("Out", {outputs[0]});
  }
}

static const std::initializer_list<std::string> variable_names{
    "a", "w1", "c", "d", "w2", "e", "f", "g",
    "h", "w3", "b1", "i", "j", "w4", "b2"};

// (a,w1)->Conv1->c and c->Pool1->d
//
// (d,w2)->Conv2->e and e->Pool2->f
//
// d->Dropout1->g and g->Fc1->h and (h,w3,b1,i)->Conv3->j
//
// (c,w4,b2)->Conv4->i
ProgramDesc BuildProgramDesc(bool use_mkldnn, bool use_quantizer) {
  ProgramDesc prog;
  for (auto& v : variable_names) {
    auto* var = prog.MutableBlock(0)->Var(v);
    if (v.find("w") == 0 || v.find("b") == 0) {
      var->SetPersistable(true);
    }
  }

  SetOp(&prog, "conv2d", "Conv1", {"a", "w1"}, {"c"}, use_mkldnn,
        use_quantizer);
  SetOp(&prog, "pool2d", "Pool1", {"c"}, {"d"}, use_mkldnn, use_quantizer);

  SetOp(&prog, "conv2d", "Conv2", {"d", "w2"}, {"e"}, use_mkldnn,
        use_quantizer);
  SetOp(&prog, "pool2d", "Pool2", {"e"}, {"f"}, use_mkldnn, use_quantizer);

  SetOp(&prog, "dropout", "Dropout1", {"d"}, {"g"}, use_mkldnn);
  SetOp(&prog, "fc", "Fc1", {"g"}, {"h"}, use_mkldnn);
  SetOp(&prog, "conv2d", "Conv3", {"h", "w3", "b1", "i"}, {"j"}, use_mkldnn,
        use_quantizer);

  SetOp(&prog, "conv2d", "Conv4", {"c", "w4", "b2"}, {"i"}, use_mkldnn,
        use_quantizer);

  return prog;
}

void InitTensorHolder(Scope* scope, const paddle::platform::Place& place,
                      const char* var_name) {
  auto x = scope->Var(var_name);
  auto tensor = x->GetMutable<LoDTensor>();
  tensor->mutable_data(place, proto::VarType::FP32,
                       ::paddle::memory::Allocator::kDefault, 1);
}

void MainTest(const ProgramDesc& prog, int conv_count, int pool_count,
              int quant_count, int dequant_count, int added_nodes_count,
              float scale) {
  std::unique_ptr<ir::Graph> graph(new ir::Graph(prog));

  // Init scope, as it is used in pass
  auto place = paddle::platform::CPUPlace();
  NaiveExecutor exe{place};
  Scope scope;
  exe.CreateVariables(prog, 0, true, &scope);

  auto* scales = new VarQuantScale();

  for (auto& v : variable_names) {
    InitTensorHolder(&scope, place, v.c_str());
    LoDTensor tensor;
    tensor.Resize({1});
    auto* ptr = tensor.mutable_data<double>(place);
    ptr[0] = 2.0;

    (*scales)[v] = std::make_pair(false, std::move(tensor));
  }

  graph->Set(kParamScopeAttr, new framework::Scope*(&scope));

  auto pass = PassRegistry::Instance().Get("cpu_quantize_pass");
  pass->Set("quant_var_scales", scales);

  int original_nodes_num = graph->Nodes().size();

  graph = pass->Apply(std::move(graph));

  int current_nodes_num = graph->Nodes().size();

  int quantize_nodes_count = 0;
  int dequantize_nodes_count = 0;
  int conv2d_nodes_count = 0;
  int pool2d_nodes_count = 0;
  for (auto* node : graph->Nodes()) {
    if (node->IsOp()) {
      auto* op = node->Op();
      if (op->Type() == "conv2d") {
        conv2d_nodes_count++;
        auto op_name = boost::get<std::string>(op->GetAttr("name"));
        EXPECT_EQ(boost::get<float>(op->GetAttr("Scale_in")), scale)
            << "Scale_in for node '" + op_name + "'.";
        EXPECT_EQ(boost::get<float>(op->GetAttr("Scale_out")), scale)
            << "Scale_out for node '" + op_name + "'.";
        EXPECT_EQ(
            boost::get<std::vector<float>>(op->GetAttr("Scale_weights"))[0],
            scale)
            << "Scale_weights for node '" + op_name + "'.";
      } else if (op->Type() == "pool2d") {
        pool2d_nodes_count++;
      } else if (op->Type() == "quantize") {
        quantize_nodes_count++;
      } else if (op->Type() == "dequantize") {
        dequantize_nodes_count++;
      }
    }
  }
  EXPECT_EQ(conv2d_nodes_count, conv_count);
  EXPECT_EQ(pool2d_nodes_count, pool_count);
  EXPECT_EQ(quantize_nodes_count, quant_count);
  EXPECT_EQ(dequantize_nodes_count, dequant_count);
  EXPECT_EQ(original_nodes_num + added_nodes_count, current_nodes_num);
}

TEST(CpuQuantizePass, quantize) {
  bool use_mkldnn = true;
  bool use_quantizer = true;
  // (a->QUANT1->IN1,w1)->Conv1->OUT1->DEQUANT1->c and
  // c->QUANT2->IN2->Pool1->OUT2->DEQUANT2->d
  //
  // (d->QUANT3->IN3,w2)->Conv2->OUT3->DEQUANT3->e and
  // e->QUANT4->IN4->Pool2->OUT4->DEQUANT4->f
  //
  // d->Dropout1->g and g->Fc1->h and
  // (h->QUANT5->IN5,w3,b1,i->QUANT6->IN6)->Conv3->OUT5->DEQUANT5->j
  //
  // (c->QUANT7->IN7,w4,b2)->Conv4->OUT6->DEQUANT6->i
  // Insert nodes: 7 Quant + 7 IN + 6 OUT + 6 DEQUANT
  int added_nodes = 7 + 7 + 6 + 6;
  MainTest(BuildProgramDesc(use_mkldnn, use_quantizer), 4, 2, 7, 6, added_nodes,
           2.0f * 127);
}

TEST(CpuQuantizePass, do_not_quantize) {
  bool use_mkldnn = true;
  bool use_quantizer = false;
  int added_nodes = 0;
  MainTest(BuildProgramDesc(use_mkldnn, use_quantizer), 4, 2, 0, 0, added_nodes,
           1.0f);
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

USE_PASS(cpu_quantize_pass);
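The expected counts passed to `MainTest` follow from how the pass rewires the graph: every quantized input adds a quantize op and one variable, and every dequantized output adds a dequantize op and one variable. The sketch below only redoes that bookkeeping. `ToyExpectedCounts` and `ToyCountAddedNodes` are hypothetical names, and it assumes every conv2d and pool2d op in the program has `use_quantizer` set.

```cpp
#include <cassert>

// Expected number of new quantize ops, dequantize ops and graph nodes.
struct ToyExpectedCounts {
  int quant_ops;
  int dequant_ops;
  int added_nodes;
};

// convs: conv2d ops in total; convs_with_residual: how many of them also
// take a ResidualData input; pools: pool2d ops.
ToyExpectedCounts ToyCountAddedNodes(int convs, int convs_with_residual,
                                     int pools) {
  int quant = convs + convs_with_residual + pools;  // one per quantized input
  int dequant = convs + pools;                      // one per op output
  // Each quantize/dequantize op comes with one new variable node.
  return {quant, dequant, 2 * quant + 2 * dequant};
}
```

For the test program (4 conv2d ops, one of them with a residual input, and 2 pool2d ops) this gives 7 quantize ops, 6 dequantize ops and 26 added nodes, matching `7 + 7 + 6 + 6`.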
paddle/fluid/framework/ir/cpu_quantize_squash_pass.cc
0 → 100644
View file @
161b8ddc
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
// implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/cpu_quantize_squash_pass.h"
#include <string>
#include <vector>
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/string/pretty_log.h"
namespace paddle {
namespace framework {
namespace ir {

using string::PrettyLogDetail;

void CPUQuantizeSquashPass::FindNodesToKeep(
    Graph* graph,
    std::unordered_map<const Node*, int>* nodes_keep_counter) const {
  GraphPatternDetector gpd;
  patterns::DequantAny deq_any_pattern{gpd.mutable_pattern(), "deqant_any"};
  deq_any_pattern();

  int found_count = 0;
  auto handler = [&](const GraphPatternDetector::subgraph_t& subgraph,
                     Graph* g) {
    GET_IR_NODE_FROM_SUBGRAPH(dequant_out, dequant_out, deq_any_pattern);

    if (nodes_keep_counter->find(dequant_out) == nodes_keep_counter->end())
      (*nodes_keep_counter)[dequant_out] = 1;
    else
      (*nodes_keep_counter)[dequant_out] += 1;

    found_count++;
  };
  gpd(graph, handler);
  AddStatis(found_count);
}

void CPUQuantizeSquashPass::Squash(
    Graph* graph,
    std::unordered_map<const Node*, int>* nodes_keep_counter) const {
  GraphPatternDetector gpd;
  patterns::DequantQuantAny squash_pattern{gpd.mutable_pattern(), "squash"};
  squash_pattern();

  int found_squash_count = 0;
  auto handler = [&](const GraphPatternDetector::subgraph_t& subgraph,
                     Graph* g) {
    VLOG(4) << "squash requantize-quantize ops pair";

    GET_IR_NODE_FROM_SUBGRAPH(dequant_in, dequant_in, squash_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(dequant_op, dequant_op, squash_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(dequant_out, dequant_out, squash_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(quant_op, quant_op, squash_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(quant_out, quant_out, squash_pattern);
    GET_IR_NODE_FROM_SUBGRAPH(next_op, next_op, squash_pattern);

    auto* next_op_desc = next_op->Op();
    float dequant_scale = boost::get<float>(dequant_op->Op()->GetAttr("Scale"));
    float quant_scale = boost::get<float>(quant_op->Op()->GetAttr("Scale"));
    PADDLE_ENFORCE(nodes_keep_counter->find(dequant_out) !=
                   nodes_keep_counter->end());

    // check if dequantize op should be kept or removed, decrease the counter
    bool keep_dequant = (*nodes_keep_counter)[dequant_out]-- > 1;

    if (dequant_scale == quant_scale) {
      // squash dequantize-quantize to nothing
      auto quant_out_var_name = quant_out->Name();
      auto next_op_inputs = next_op_desc->InputNames();
      for (const auto& name : next_op_inputs) {
        auto var_name = next_op_desc->Input(name)[0];
        if (var_name.compare(quant_out_var_name) == 0) {
          next_op_desc->SetInput(
              name, std::vector<std::string>({dequant_in->Name()}));
          break;
        }
      }

      if (keep_dequant)
        GraphSafeRemoveNodes(graph, {quant_op, quant_out});
      else
        GraphSafeRemoveNodes(graph,
                             {dequant_op, quant_op, dequant_out, quant_out});

      IR_NODE_LINK_TO(dequant_in, next_op);

      found_squash_count++;
    } else {
      // squash dequantize-quantize to requantize op
      OpDesc desc;
      desc.SetType("requantize");
      desc.SetInput("Input", std::vector<std::string>({dequant_in->Name()}));
      desc.SetOutput("Output", std::vector<std::string>({quant_out->Name()}));
      desc.SetAttr("Scale_in", dequant_scale);
      desc.SetAttr("Scale_out", quant_scale);

      auto requant_op = g->CreateOpNode(&desc);

      if (keep_dequant)
        GraphSafeRemoveNodes(graph, {quant_op});
      else
        GraphSafeRemoveNodes(graph, {dequant_op, quant_op, dequant_out});

      IR_NODE_LINK_TO(dequant_in, requant_op);
      IR_NODE_LINK_TO(requant_op, quant_out);

      found_squash_count++;
    }
  };
  gpd(graph, handler);
  AddStatis(found_squash_count);
  PrettyLogDetail("--- squashed %d dequantize-quantize pairs",
                  found_squash_count);
}

std::unique_ptr<ir::Graph> CPUQuantizeSquashPass::ApplyImpl(
    std::unique_ptr<ir::Graph> graph) const {
  PADDLE_ENFORCE(graph.get());
  FusePassBase::Init("cpu_quantize_squash_pass", graph.get());

  std::unordered_map<const Node*, int> nodes_keep_counter;
  FindNodesToKeep(graph.get(), &nodes_keep_counter);
  Squash(graph.get(), &nodes_keep_counter);

  return graph;
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(cpu_quantize_squash_pass,
              paddle::framework::ir::CPUQuantizeSquashPass);
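The handler in `Squash` makes two independent decisions for each dequantize-quantize pair: whether the pair disappears or becomes a `requantize` op, and whether the dequantize op must stay because its output still has other consumers. The sketch below isolates that logic from the graph code. `ToySquashAction`, `ToySquashDecision` and `ToyDecide` are hypothetical names.

```cpp
#include <cassert>

enum class ToySquashAction { kRemovePair, kInsertRequantize };

struct ToySquashDecision {
  ToySquashAction action;
  bool keep_dequant;  // dequantize output still feeds other operators
};

// remaining_consumers is the counter value for the dequantize output before
// this pair is processed; like the pass, the function post-decrements it.
ToySquashDecision ToyDecide(float dequant_scale, float quant_scale,
                            int* remaining_consumers) {
  bool keep_dequant = (*remaining_consumers)-- > 1;
  ToySquashAction action = dequant_scale == quant_scale
                               ? ToySquashAction::kRemovePair
                               : ToySquashAction::kInsertRequantize;
  return {action, keep_dequant};
}
```

The comparison on the two scales is an exact floating-point equality, as in the pass, so scales that differ only by rounding would take the `requantize` branch.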
paddle/fluid/framework/ir/cpu_quantize_squash_pass.h
0 → 100644
View file @
161b8ddc
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include "paddle/fluid/framework/ir/fuse_pass_base.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
#include "paddle/fluid/framework/ir/pass.h"
namespace paddle {
namespace framework {
namespace ir {

/*
 * Squash dequantize->quantize pair pattern into requantize op
 */
class CPUQuantizeSquashPass : public FusePassBase {
 public:
  virtual ~CPUQuantizeSquashPass() {}

 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override;

  /*
   * For each dequantize's output find the number of operators it is an input to
   */
  void FindNodesToKeep(
      Graph* graph,
      std::unordered_map<const Node*, int>* nodes_keep_counter) const;

  /*
   * Squash dequantize-quantize ops pairs into requantize or nothing
   */
  void Squash(Graph* graph,
              std::unordered_map<const Node*, int>* nodes_keep_counter) const;

  const std::string name_scope_{"squash"};
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle
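`FindNodesToKeep` counts, for each dequantize output, how many operators consume it. A minimal sketch of that counting step is given below. `ToyCountConsumers` is a hypothetical helper, and string names stand in for the `const Node*` keys used by the pass.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Each edge is (dequantize output name, consuming op name); names are toy data.
std::unordered_map<std::string, int> ToyCountConsumers(
    const std::vector<std::pair<std::string, std::string>>& edges) {
  std::unordered_map<std::string, int> counter;
  // operator[] value-initializes a missing int to 0, so no find() is needed.
  for (const auto& edge : edges) ++counter[edge.first];
  return counter;
}
```

The pass spells out the first-occurrence case with an explicit `find`; the shorter form here yields the same counts.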
paddle/fluid/framework/ir/cpu_quantize_squash_pass_tester.cc
0 → 100644
View file @
161b8ddc
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/cpu_quantize_squash_pass.h"
#include <gtest/gtest.h>
#include "paddle/fluid/framework/naive_executor.h"
#include "paddle/fluid/platform/place.h"

namespace paddle {
namespace framework {
namespace ir {

void SetOp(ProgramDesc* prog, const std::string& type, const std::string& name,
           const std::vector<std::string>& inputs,
           const std::vector<std::string>& outputs, bool use_mkldnn,
           float scale = 0) {
  auto* op = prog->MutableBlock(0)->AppendOp();
  op->SetType(type);
  op->SetAttr("use_mkldnn", use_mkldnn);
  op->SetAttr("name", name);
  if (type == "conv2d") {
    op->SetInput("Input", {inputs[0]});
    if (inputs.size() > 1) op->SetInput("Filter", {inputs[1]});
    if (inputs.size() > 2) op->SetInput("Bias", {inputs[2]});
    op->SetOutput("Output", {outputs[0]});
  } else if (type == "quantize") {
    op->SetInput("Input", {inputs[0]});
    op->SetOutput("Output", {outputs[0]});
    op->SetAttr("Scale", scale);
  } else if (type == "dequantize") {
    op->SetInput("Input", {inputs[0]});
    op->SetOutput("Output", {outputs[0]});
    op->SetAttr("Scale", scale);
  }
}

// (a,w1,b1)->Conv1->d
// d->Dequant->e
// e->Quant->f
// (f,w2,b2)->Conv2->i
ProgramDesc BuildProgramDesc(bool use_mkldnn, float scale1, float scale2) {
  ProgramDesc prog;
  for (auto& v : std::initializer_list<std::string>(
           {"a", "w1", "b1", "d", "e", "f", "w2", "b2", "i"})) {
    auto* var = prog.MutableBlock(0)->Var(v);
    if (v.find("w") == 0 || v.find("b") == 0) {
      var->SetPersistable(true);
    }
  }

  SetOp(&prog, "conv2d", "Conv1", {"a", "w1", "b1"}, {"d"}, use_mkldnn);
  SetOp(&prog, "dequantize", "Dequant", {"d"}, {"e"}, use_mkldnn, scale1);
  SetOp(&prog, "quantize", "Quant", {"e"}, {"f"}, use_mkldnn, scale2);
  SetOp(&prog, "conv2d", "Conv2", {"f", "w2", "b2"}, {"i"}, use_mkldnn);
  return prog;
}

static const std::initializer_list<std::string> variable_names{
    "a", "b", "c", "d", "e", "f", "g", "h"};

// a->Conv1->b
// b->Dequant->c
//
// c->Quant1->d and d->Conv2->e
//
// c->Conv3->f
//
// c->Quant2->g and g->Conv4->h
//
ProgramDesc BuildProgramDesc2(bool use_mkldnn, float scale1, float scale2,
                              float scale3) {
  ProgramDesc prog;
  for (auto& v : variable_names) {
    prog.MutableBlock(0)->Var(v);
  }

  SetOp(&prog, "conv2d", "Conv1", {"a"}, {"b"}, use_mkldnn);
  SetOp(&prog, "dequantize", "Dequant", {"b"}, {"c"}, use_mkldnn, scale1);

  SetOp(&prog, "quantize", "Quant1", {"c"}, {"d"}, use_mkldnn, scale2);
  SetOp(&prog, "conv2d", "Conv2", {"d"}, {"e"}, use_mkldnn);

  SetOp(&prog, "conv2d", "Conv3", {"c"}, {"f"}, use_mkldnn);

  SetOp(&prog, "quantize", "Quant2", {"c"}, {"g"}, use_mkldnn, scale3);
  SetOp(&prog, "conv2d", "Conv4", {"g"}, {"h"}, use_mkldnn);

  return prog;
}

void InitTensorHolder(Scope* scope, const paddle::platform::Place& place,
                      const char* var_name) {
  auto x = scope->Var(var_name);
  auto tensor = x->GetMutable<LoDTensor>();
  tensor->mutable_data(place, proto::VarType::FP32,
                       ::paddle::memory::Allocator::kDefault, 1);
}

void MainTest(const ProgramDesc& prog, int removed_nodes_num) {
  std::unique_ptr<ir::Graph> graph(new ir::Graph(prog));

  // Init scope, as it is used in pass
  auto place = paddle::platform::CPUPlace();
  NaiveExecutor exe{place};
  Scope scope;
  exe.CreateVariables(prog, 0, true, &scope);

  for (auto& v : variable_names) {
    InitTensorHolder(&scope, place, v.c_str());
  }

  graph->Set(kParamScopeAttr, new framework::Scope*(&scope));

  auto pass = PassRegistry::Instance().Get("cpu_quantize_squash_pass");

  int original_nodes_num = graph->Nodes().size();

  graph = pass->Apply(std::move(graph));

  int current_nodes_num = graph->Nodes().size();

  EXPECT_EQ(original_nodes_num - removed_nodes_num, current_nodes_num);
}

TEST(CpuQuantizeSquashPass, equal_scales) {
  auto scale = 1.2345f;
  auto use_mkldnn = true;
  // Remove 4 nodes: Dequant, Quant, e, f
  auto remove_nodes = 4;
  MainTest(BuildProgramDesc(use_mkldnn, scale, scale), remove_nodes);

  use_mkldnn = !use_mkldnn;
  MainTest(BuildProgramDesc(use_mkldnn, scale, scale), remove_nodes);
}

TEST(CpuQuantizeSquashPass, inequal_scales) {
  auto scale1 = 1.2345f;
  auto scale2 = 21.0f;
  auto use_mkldnn = true;
  // Remove 3 nodes: Dequant, Quant, e
  // Insert 1 node: requantize
  auto remove_nodes = 2;
  MainTest(BuildProgramDesc(use_mkldnn, scale1, scale2), remove_nodes);

  use_mkldnn = !use_mkldnn;
  MainTest(BuildProgramDesc(use_mkldnn, scale1, scale2), remove_nodes);
}

TEST(CpuQuantizeSquashPass, branch_to_equal_inequal_and_fp32) {
  // Delete both quantize ops,
  // bypass dequantize in both branches,
  // insert requantize on one branch
  auto scale = 1.2345f;
  auto scale2 = 21.0f;
  auto use_mkldnn = true;
  // Remove 3 nodes: Quant1, Quant2, g
  // Insert 1 node: requantize
  auto remove_nodes = 2;
  MainTest(BuildProgramDesc2(use_mkldnn, scale, scale, scale2), remove_nodes);

  use_mkldnn = !use_mkldnn;
  MainTest(BuildProgramDesc2(use_mkldnn, scale, scale, scale2), remove_nodes);
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

USE_PASS(cpu_quantize_squash_pass);
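The header for this pass describes `FindNodesToKeep` as counting, for each dequantize output, how many operators take it as an input. The following is only a toy sketch of that counting idea on plain structs, not Paddle's implementation: `ToyOp`, `CountDequantConsumers` and `BranchingGraph` are hypothetical names, and the toy graph mirrors the branching layout that `BuildProgramDesc2` builds.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy op: a type, its input variable names and one output name.
struct ToyOp {
  std::string type;
  std::vector<std::string> inputs;
  std::string output;
};

// For every dequantize output, count how many ops read it as an input.
std::unordered_map<std::string, int> CountDequantConsumers(
    const std::vector<ToyOp>& ops) {
  std::unordered_map<std::string, int> counter;
  // First pass: register each dequantize output with a zero count.
  for (const auto& op : ops) {
    if (op.type == "dequantize") counter[op.output] = 0;
  }
  // Second pass: bump the count for every op input that is such an output.
  for (const auto& op : ops) {
    for (const auto& in : op.inputs) {
      auto it = counter.find(in);
      if (it != counter.end()) ++it->second;
    }
  }
  return counter;
}

// The branching graph from BuildProgramDesc2, in toy form.
std::vector<ToyOp> BranchingGraph() {
  return {{"conv2d", {"a"}, "b"},   {"dequantize", {"b"}, "c"},
          {"quantize", {"c"}, "d"}, {"conv2d", {"d"}, "e"},
          {"conv2d", {"c"}, "f"},   {"quantize", {"c"}, "g"},
          {"conv2d", {"g"}, "h"}};
}
```

In this toy graph the dequantize output `c` feeds three ops (`Quant1`, `Conv3`, `Quant2`), which is the kind of fan-out the third test case exercises.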
paddle/fluid/framework/ir/graph_pattern_detector.cc
View file @
161b8ddc
@@ -90,7 +90,8 @@ void GraphPatternDetector::operator()(Graph *graph,
   ValidateByNodeRole(&subgraphs);
   if (subgraphs.empty()) return;
-  PrettyLogEndl(Style::detail(), "--- detect %d subgraphs", subgraphs.size());
+  PrettyLogEndl(Style::detail(), "--- detected %d subgraphs",
+                subgraphs.size());
   int id = 0;
   for (auto &g : subgraphs) {
     VLOG(3) << "optimizing #" << id++ << " subgraph";
@@ -1074,9 +1075,53 @@ PDNode *patterns::Conv::operator()() {
                         ->AsOutput()
                         ->assert_is_op_output("conv2d", "Output");
-  conv_op->LinksFrom({input_var, filter_var});
-  conv_op->LinksTo({output_var});
+  conv_op->LinksFrom({input_var, filter_var}).LinksTo({output_var});
+  return output_var;
+}
+
+PDNode *patterns::ConvResidual::operator()(bool with_residual_data) {
+  auto conv_op = pattern->NewNode(conv_op_repr())->assert_is_op("conv2d");
+  if (!with_residual_data)
+    conv_op->assert_op_attr("fuse_residual_connection", false);
+  auto input_var = pattern->NewNode(conv_input_repr())
+                       ->AsInput()
+                       ->assert_is_op_input("conv2d", "Input");
+  auto filter_var = pattern->NewNode(conv_filter_repr())
+                        ->AsInput()
+                        ->assert_is_op_input("conv2d", "Filter");
+  auto output_var = pattern->NewNode(conv_output_repr())
+                        ->AsOutput()
+                        ->assert_is_op_output("conv2d", "Output");
+  std::vector<PDNode *> links_from{input_var, filter_var};
+  if (with_residual_data) {
+    auto res_conn_var = pattern->NewNode(conv_residual_data_repr())
+                            ->AsInput()
+                            ->assert_is_op_input("conv2d", "ResidualData");
+    links_from.push_back(res_conn_var);
+  }
+  conv_op->LinksFrom(links_from).LinksTo({output_var});
+  return output_var;
+}
+
+PDNode *patterns::Pool::operator()() {
+  auto pool_op = pattern->NewNode(pool_op_repr())->assert_is_op("pool2d");
+  auto input_var = pattern->NewNode(pool_input_repr())
+                       ->AsInput()
+                       ->assert_is_op_input("pool2d", "X");
+  auto output_var = pattern->NewNode(pool_output_repr())
+                        ->AsOutput()
+                        ->assert_is_op_output("pool2d", "Out");
+  pool_op->LinksFrom({input_var}).LinksTo({output_var});
   return output_var;
 }
@@ -1301,6 +1346,51 @@ PDNode *patterns::ConvAffineChannel::operator()(
   return ac_out_var;
 }
+PDNode *patterns::DequantQuantAny::operator()() {
+  auto *dequant_in = pattern->NewNode(dequant_in_repr())
+                         ->AsInput()
+                         ->assert_is_op_input("dequantize", "Input");
+  auto *dequant_op =
+      pattern->NewNode(dequant_op_repr())->assert_is_op("dequantize");
+  auto *dequant_out = pattern->NewNode(dequant_out_repr())
+                          ->AsOutput()
+                          ->assert_is_op_output("dequantize", "Output");
+  auto *quant_op = pattern->NewNode(quant_op_repr())
+                       ->assert_is_op("quantize")
+                       ->AsIntermediate();
+  auto *quant_out = pattern->NewNode(quant_out_repr())
+                        ->AsOutput()
+                        ->assert_is_op_output("quantize");
+  auto *next_op = pattern->NewNode(next_op_repr())->assert_is_op();
+  dequant_op->LinksFrom({dequant_in}).LinksTo({dequant_out});
+  quant_op->LinksFrom({dequant_out}).LinksTo({quant_out});
+  next_op->LinksFrom({quant_out});
+  return quant_out;
+}
+
+PDNode *patterns::DequantAny::operator()() {
+  auto *dequant_op =
+      pattern->NewNode(dequant_op_repr())->assert_is_op("dequantize");
+  auto *dequant_out = pattern->NewNode(dequant_out_repr())
+                          ->AsOutput()
+                          ->assert_is_op_output("dequantize", "Output");
+  auto *next_op = pattern->NewNode(next_op_repr())->assert_is_op();
+  dequant_op->LinksTo({dequant_out});
+  next_op->LinksFrom({dequant_out});
+  return dequant_out;
+}
+
 // a -> transpose_op(1) -> transpose_out_a -> flatten_op(1) -> flatten_out_a
 // b -> transpose_op(2) -> transpose_out_b -> flatten_op(2) -> flatten_out_b
 // ...
paddle/fluid/framework/ir/graph_pattern_detector.h
View file @
161b8ddc
@@ -18,8 +18,11 @@
 #include <gtest/gtest_prod.h>
 #endif
+#include <memory>
 #include <numeric>
 #include <string>
+#include <unordered_map>
+#include <unordered_set>
 #include <utility>
 #include <vector>
 #include "paddle/fluid/framework/ir/graph.h"
@@ -656,6 +659,35 @@ struct Conv : public PatternBase {
   PATTERN_DECL_NODE(conv_output);
 };
+// Convolution op with residual data
+struct ConvResidual : public PatternBase {
+  ConvResidual(PDPattern* pattern, const std::string& name_scope)
+      : PatternBase(pattern, name_scope, "conv_residual") {}
+
+  PDNode* operator()(bool with_residual_data);
+
+  PATTERN_DECL_NODE(conv_op);
+  PATTERN_DECL_NODE(conv_input);
+  PATTERN_DECL_NODE(conv_filter);
+  PATTERN_DECL_NODE(conv_residual_data);
+  PATTERN_DECL_NODE(conv_output);
+};
+
+// Pool op
+// Forward pass for pooling.
+// pool_input is the input.
+// pool_output is a result of the operator.
+struct Pool : public PatternBase {
+  Pool(PDPattern* pattern, const std::string& name_scope)
+      : PatternBase(pattern, name_scope, "pooling") {}
+
+  PDNode* operator()();
+
+  PATTERN_DECL_NODE(pool_op);
+  PATTERN_DECL_NODE(pool_input);
+  PATTERN_DECL_NODE(pool_output);
+};
+
 // ElementwiseAdd used in residual connections.
 // y_var is used and convolution output.
 // The operator is removed, when residual
@@ -766,6 +798,34 @@ struct ConvAffineChannel : public PatternBase {
   PATTERN_DECL_NODE(ac_out);  // Out
 };
+// Dequantize + Quantize + anyOP
+// This pattern is used for squashing the dequantize-quantize pairs.
+struct DequantQuantAny : public PatternBase {
+  DequantQuantAny(PDPattern* pattern, const std::string& name_scope)
+      : PatternBase(pattern, name_scope, "dequant_quant_any") {}
+  PDNode* operator()();
+
+  PATTERN_DECL_NODE(dequant_in);
+  PATTERN_DECL_NODE(dequant_op);
+  PATTERN_DECL_NODE(dequant_out);
+  PATTERN_DECL_NODE(quant_op);
+  PATTERN_DECL_NODE(quant_out);
+  PATTERN_DECL_NODE(next_op);
+};
+
+// Dequantize + anyOP
+// This pattern is used for getting the number of ops the Dequantize's
+// output is an input to.
+struct DequantAny : public PatternBase {
+  DequantAny(PDPattern* pattern, const std::string& name_scope)
+      : PatternBase(pattern, name_scope, "dequant_any") {}
+  PDNode* operator()();
+
+  PATTERN_DECL_NODE(dequant_op);
+  PATTERN_DECL_NODE(dequant_out);
+  PATTERN_DECL_NODE(next_op);
+};
+
 struct TransposeFlattenConcat : public PatternBase {
   TransposeFlattenConcat(PDPattern* pattern, const std::string& name_scope)
       : PatternBase(pattern, name_scope, "transpose_flatten_concat") {}
paddle/fluid/framework/ir/sync_batch_norm_pass.cc
0 → 100644
View file @
161b8ddc
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/ir/sync_batch_norm_pass.h"
#include <memory>
#include <string>
#include <utility>

namespace paddle {
namespace framework {
namespace ir {

std::unique_ptr<ir::Graph> SyncBatchNormPass::ApplyImpl(
    std::unique_ptr<ir::Graph> graph) const {
  VLOG(3) << "Use synchronous batch norm";
  for (const Node* n : graph->Nodes()) {
    if (n->IsOp()) {
      auto* op = n->Op();
      if (op->Type() == "batch_norm") {
        op->SetType("sync_batch_norm");
      }
      if (op->Type() == "batch_norm_grad") {
        op->SetType("sync_batch_norm_grad");
      }
    }
  }
  return graph;
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(sync_batch_norm_pass, paddle::framework::ir::SyncBatchNormPass);
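The pass above only rewrites op type names: `batch_norm` becomes `sync_batch_norm` and `batch_norm_grad` becomes `sync_batch_norm_grad`, with every other op left alone. A minimal standalone sketch of the same renaming on a plain list of type names, assuming nothing about Paddle's graph classes (`WithSyncBatchNorm` is a hypothetical helper, not a Paddle function):

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of `op_types` with the two batch-norm
// op types replaced by their synchronous counterparts.
std::vector<std::string> WithSyncBatchNorm(std::vector<std::string> op_types) {
  for (auto& type : op_types) {
    if (type == "batch_norm") {
      type = "sync_batch_norm";
    } else if (type == "batch_norm_grad") {
      type = "sync_batch_norm_grad";
    }
  }
  return op_types;
}
```

Ops of any other type, such as `conv2d`, pass through unchanged.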
paddle/fluid/framework/ir/sync_batch_norm_pass.h
0 → 100644
View file @
161b8ddc
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <memory>
#include "paddle/fluid/framework/ir/pass.h"

namespace paddle {
namespace framework {
namespace ir {

class SyncBatchNormPass : public Pass {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override;
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle
paddle/fluid/framework/ir/sync_batch_norm_pass_tester.cc
0 → 100644
View file @
161b8ddc
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/sync_batch_norm_pass.h"
#include <gtest/gtest.h>

namespace paddle {
namespace framework {
namespace ir {

void SetOp(ProgramDesc* prog, const std::string& type, const std::string& name,
           const std::vector<std::string>& inputs,
           const std::vector<std::string>& outputs) {
  auto* op = prog->MutableBlock(0)->AppendOp();
  op->SetType(type);
  op->SetAttr("name", name);
  op->SetInput("X", inputs);
  op->SetOutput("Out", outputs);
}

// (a, conv_w)->conv2d->b
// (b, bn_scale, bn_bias, mean, var)->batch_norm
//     ->(c, mean, var, save_mean, save_inv_var)
ProgramDesc BuildProgramDesc() {
  ProgramDesc prog;
  for (auto& v : std::vector<std::string>({"a", "conv_w", "b", "bn_scale",
                                           "bn_bias", "mean", "var", "c",
                                           "save_mean", "save_inv_var"})) {
    auto* var = prog.MutableBlock(0)->Var(v);
    if (v == "conv_w" || v == "bn_scale" || v == "bn_bias" || v == "mean" ||
        v == "var") {
      var->SetPersistable(true);
    }
  }

  SetOp(&prog, "conv2d", "conv", std::vector<std::string>({"a", "conv_w"}),
        std::vector<std::string>({"b"}));
  SetOp(&prog, "batch_norm", "bn",
        std::vector<std::string>({"b", "bn_scale", "bn_bias", "mean", "var"}),
        std::vector<std::string>(
            {"c", "mean", "var", "save_mean", "save_inv_var"}));
  return prog;
}

TEST(IsTestPass, basic) {
  auto prog = BuildProgramDesc();

  std::unique_ptr<ir::Graph> graph(new ir::Graph(prog));

  auto pass = PassRegistry::Instance().Get("sync_batch_norm_pass");

  graph = pass->Apply(std::move(graph));

  for (auto* node : graph->Nodes()) {
    if (node->IsOp()) {
      auto* op = node->Op();
      auto op_name = boost::get<std::string>(op->GetAttr("name"));
      if (op_name == "bn") {
        ASSERT_EQ(op->Type(), "sync_batch_norm");
      }
    }
  }
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

USE_PASS(sync_batch_norm_pass);
paddle/fluid/framework/parallel_executor.cc
View file @
161b8ddc
@@ -14,8 +14,10 @@ limitations under the License. */
 #include "paddle/fluid/framework/parallel_executor.h"
 #include <algorithm>
+#include <memory>
 #include <string>
 #include <tuple>
+#include <utility>
 #include <vector>
 #include "paddle/fluid/framework/ir/graph_helper.h"
@@ -251,6 +253,20 @@ ParallelExecutor::ParallelExecutor(const std::vector<platform::Place> &places,
     member_->nccl_ctxs_.reset(new platform::NCCLContextMap(
         member_->places_, nccl_id, build_strategy.num_trainers_,
         build_strategy.trainer_id_));
+    std::unique_ptr<platform::NCCLContextMap> dev_nccl_ctxs;
+    dev_nccl_ctxs.reset(new platform::NCCLContextMap(member_->places_));
+    // Initialize device context's nccl comm
+    // Note: if more than one ParallelExecutor uses the same place, the nccl
+    // comm will be overwritten, which may cause problems.
+    for (size_t dev_id = 0; dev_id < member_->places_.size(); ++dev_id) {
+      auto &nccl_ctx = dev_nccl_ctxs->at(dev_id);
+      platform::DeviceContextPool &pool =
+          platform::DeviceContextPool::Instance();
+      auto *dev_ctx = static_cast<platform::CUDADeviceContext *>(
+          pool.Get(member_->places_[dev_id]));
+      dev_ctx->set_nccl_comm(nccl_ctx.comm());
+    }
 #else
     PADDLE_THROW("Not compiled with CUDA");
 #endif
paddle/fluid/imperative/CMakeLists.txt
View file @
161b8ddc
@@ -2,4 +2,5 @@ if(WITH_PYTHON)
 cc_library(layer SRCS layer.cc DEPS proto_desc operator device_context blas pybind)
 cc_library(tracer SRCS tracer.cc DEPS proto_desc device_context pybind)
 cc_library(engine SRCS engine.cc)
+cc_library(imperative_profiler SRCS profiler.cc)
 endif()
paddle/fluid/imperative/layer.cc
View file @
161b8ddc
@@ -214,10 +214,8 @@ framework::LoDTensor& VarBase::GradValue() {
 }
 std::map<std::string, std::vector<VarBase*>> OpBase::ApplyGrad() {
-  if (grad_op_descs_.empty() && backward_id_ <= 0) {
-    VLOG(3) << "op with no grad: " << Type();
-    return {};
-  }
+  PADDLE_ENFORCE(!grad_op_descs_.empty() || backward_id_ > 0,
+                 "%s has no backward implementation", Type());
   VLOG(3) << "apply op grad: " << Type();
   std::vector<framework::VariableValueMap> tmp_grad_outputs;
@@ -239,7 +237,7 @@ std::map<std::string, std::vector<VarBase*>> OpBase::ApplyGrad() {
       VLOG(3) << "apply grad op " << grad_op_desc->Type();
       // Allocate tmp grad output variable
-      for (auto it : grad_output_variable_map) {
+      for (const auto& it : grad_output_variable_map) {
         auto& outputs = tmp_grad_outputs[k][it.first];
         outputs.reserve(it.second.size());
         for (size_t i = 0; i < it.second.size(); ++i) {
@@ -273,9 +271,9 @@ std::map<std::string, std::vector<VarBase*>> OpBase::ApplyGrad() {
   // Add tmp grad outputs to original grad vars
   for (size_t k = 0; k < grad_output_vars_.size(); ++k) {
-    for (auto it : grad_output_vars_[k]) {
+    for (const auto& it : grad_output_vars_[k]) {
       auto& outputs = tmp_grad_outputs[k][it.first];
-      auto& origin_outputs = it.second;
+      const auto& origin_outputs = it.second;
       PADDLE_ENFORCE_EQ(outputs.size(), origin_outputs.size());
       for (size_t i = 0; i < outputs.size(); ++i) {
paddle/fluid/imperative/layer.h
View file @
161b8ddc
@@ -294,17 +294,23 @@ class PYBIND11_HIDDEN OpBase {
   void InvokeBackwardHooks();
-  void TrackPreOp(const VarBase* inp_var, const std::string& inp_name) {
-    if (inp_var->PreOp() && !inp_var->IsStopGradient()) {
-      VLOG(3) << "add pre op " << inp_var->PreOp()->Type() << " in slot "
-              << inp_name;
-      pre_ops_[inp_name].push_back(inp_var->PreOp());
-      pre_ops_out_idx_[inp_name].push_back(inp_var->PreOpOutIdx());
-    } else {
-      VLOG(3) << "no pre op in slot " << inp_name
-              << " input var stop_gradient: " << inp_var->IsStopGradient();
-      pre_ops_[inp_name].push_back(nullptr);
-      // pre_ops_out_idx_[inp_name].push_back(-1);
-    }
-  }
+  void TrackPreOp(const std::string& inp_name,
+                  const std::vector<VarBase*>& inputs) {
+    auto& pre_ops_list = pre_ops_[inp_name];
+    pre_ops_list.reserve(inputs.size());
+    auto& pre_ops_out_idx_list = pre_ops_out_idx_[inp_name];
+    for (VarBase* inp_var : inputs) {
+      if (inp_var->PreOp() && !inp_var->IsStopGradient()) {
+        VLOG(3) << "add pre op " << inp_var->PreOp()->Type() << " in slot "
+                << inp_name;
+        pre_ops_list.emplace_back(inp_var->PreOp());
+        pre_ops_out_idx_list.push_back(inp_var->PreOpOutIdx());
+      } else {
+        VLOG(3) << "no pre op in slot " << inp_name
+                << " input var stop_gradient: " << inp_var->IsStopGradient();
+        pre_ops_list.emplace_back(nullptr);
+        // pre_ops_out_idx_list.push_back(-1);
+      }
+    }
+  }
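The `TrackPreOp` change above moves from one call per input variable to one call per input slot: the slot's list is looked up once, reserved once, and then filled in a loop, with a placeholder for inputs that have no producing op or have stop-gradient set. A toy sketch of that shape follows, using plain structs in place of `VarBase` and `OpBase`. `ToyVar` and `TrackPreOps` are hypothetical names, and an empty string stands in for the `nullptr` placeholder.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for VarBase: the name of the op that produced the
// variable (empty if none) and its stop-gradient flag.
struct ToyVar {
  std::string pre_op;
  bool stop_gradient;
};

// Returns one entry per input: the producing op's name when gradients should
// flow through it, or an empty string as a placeholder otherwise.
std::vector<std::string> TrackPreOps(const std::vector<ToyVar>& inputs) {
  std::vector<std::string> pre_ops;
  pre_ops.reserve(inputs.size());  // one reservation for the whole slot
  for (const auto& var : inputs) {
    if (!var.pre_op.empty() && !var.stop_gradient) {
      pre_ops.emplace_back(var.pre_op);
    } else {
      pre_ops.emplace_back();  // placeholder, like nullptr in the diff above
    }
  }
  return pre_ops;
}
```

Keeping a placeholder for every input means the result always has the same length as the input list, so positions in the two stay aligned.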
paddle/fluid/imperative/profiler.cc
0 → 100644
View file @
161b8ddc
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/imperative/profiler.h"
#ifdef WITH_GPERFTOOLS
#include "gperftools/profiler.h"
#endif
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <mutex> // NOLINT
#include <thread> // NOLINT

DEFINE_string(
    tracer_profile_fname, "xxgperf",
    "Profiler filename for imperative tracer, which generated by gperftools."
    "Only valid when compiled `WITH_PROFILER=ON`. Empty if disable.");

namespace paddle {
namespace imperative {

static std::once_flag gTracerProfileOnce;
#ifdef WITH_GPERFTOOLS
static bool gTracerProfilerStarted = false;
#endif

void StartProfile() {
  if (!FLAGS_tracer_profile_fname.empty()) {
    std::call_once(gTracerProfileOnce, [] {
#ifdef WITH_GPERFTOOLS
      ProfilerStart(FLAGS_tracer_profile_fname.c_str());
      gTracerProfilerStarted = true;
#else
      LOG(WARNING) << "Paddle is not compiled with gperftools. "
                      "FLAGS_tracer_profile_fname will be ignored";
#endif
    });
  }
}

void StopProfile() {
#ifdef WITH_GPERFTOOLS
  ProfilerFlush();
#else
  LOG(WARNING) << "Paddle is not compiled with gperftools. "
                  "FLAGS_tracer_profile_fname will be ignored";
#endif
}

}  // namespace imperative
}  // namespace paddle
paddle/fluid/imperative/profiler.h
0 → 100644
View file @
161b8ddc
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once

namespace paddle {
namespace imperative {

extern void StartProfile();

extern void StopProfile();

}  // namespace imperative
}  // namespace paddle
paddle/fluid/imperative/tracer.cc
View file @
161b8ddc
@@ -23,34 +23,21 @@
 #include "paddle/fluid/platform/device_context.h"
 #include "paddle/fluid/platform/enforce.h"
-#ifdef WITH_GPERFTOOLS
-#include "gperftools/profiler.h"
-#endif
-DEFINE_string(
-    tracer_profile_fname, "",
-    "Profiler filename for imperative tracer, which generated by gperftools."
-    "Only valid when compiled `WITH_PROFILER=ON`. Empty if disable.");
 namespace paddle {
 namespace imperative {
-static std::once_flag gTracerProfileOnce;
-#ifdef WITH_GPERFTOOLS
-static bool gTracerProfilerStarted = false;
-#endif
 void CreateGradOp(const framework::OpDesc& op_desc,
                   const std::unordered_set<std::string>& no_grad_set,
                   const std::vector<framework::BlockDesc*>& grad_sub_block,
                   std::vector<framework::OpDesc*>* grad_op_descs,
                   std::unordered_map<std::string, std::string>* grad_to_var) {
   PADDLE_ENFORCE(grad_op_descs->empty());
-  std::vector<std::unique_ptr<framework::OpDesc>> descs =
-      framework::OpInfoMap::Instance()
-          .Get(op_desc.Type())
-          .GradOpMaker()(op_desc, no_grad_set, grad_to_var, grad_sub_block);
+  const framework::OpInfo& op_info =
+      framework::OpInfoMap::Instance().Get(op_desc.Type());
+  if (!op_info.grad_op_maker_) return;
+  std::vector<std::unique_ptr<framework::OpDesc>> descs =
+      op_info.GradOpMaker()(op_desc, no_grad_set, grad_to_var, grad_sub_block);
   for (auto& desc : descs) {
     grad_op_descs->emplace_back(desc.release());
   }
@@ -145,31 +132,13 @@ framework::VariableNameMap CreateOutputVarNameMap(
   return result;
 }
-Tracer::Tracer(framework::BlockDesc* root_block) : root_block_(root_block) {
-  if (!FLAGS_tracer_profile_fname.empty()) {
-    std::call_once(gTracerProfileOnce, [] {
-#ifdef WITH_GPERFTOOLS
-      ProfilerStart(FLAGS_tracer_profile_fname.c_str());
-      gTracerProfilerStarted = true;
-#else
-      LOG(WARNING) << "Paddle is not compiled with gperftools. "
-                      "FLAGS_tracer_profile_fname will be ignored";
-#endif
-    });
-  }
-}
+Tracer::Tracer(framework::BlockDesc* root_block) : root_block_(root_block) {}
 std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
                                     const VarBasePtrMap& outputs,
                                     framework::AttributeMap attrs_map,
                                     const platform::Place expected_place,
                                     const bool stop_gradient) {
-#ifdef WITH_GPERFTOOLS
-  if (gTracerProfilerStarted) {
-    ProfilerFlush();
-  }
-#endif
   framework::VariableValueMap invars_map;
   framework::VariableValueMap outvars_map;
@@ -184,7 +153,6 @@ std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
                      inp->Name());
       invars.emplace_back(inp->var_);
-      op->TrackPreOp(inp, it.first);
       if (!stop_gradient) {
         current_vars_map[inp->Name()] = inp;
       }
...
@@ -192,6 +160,7 @@ std::set<std::string> Tracer::Trace(OpBase* op, const VarBasePtrMap& inputs,
               << " inited: " << inp->var_->IsInitialized()
               << " stop_grad: " << inp->IsStopGradient();
     }
+    op->TrackPreOp(it.first, it.second);
   }
   op->output_vars_ = outputs;
...
@@ -319,9 +288,7 @@ std::vector<VarBase*> Tracer::PyTrace(OpBase* op,
   std::vector<framework::Variable*> ret_vars =
       PyLayer::Apply(op->forward_id_, inputs);
-  for (VarBase* inp : inputs) {
-    op->TrackPreOp(inp, PyLayer::kFwdInp);
-  }
+  op->TrackPreOp(PyLayer::kFwdInp, inputs);
   std::vector<VarBase*>& outputs = op->output_vars_[PyLayer::kFwdOut];
   outputs.reserve(ret_vars.size());
...
paddle/fluid/inference/CMakeLists.txt
View file @ 161b8ddc
paddle/fluid/inference/analysis/argument.h
View file @ 161b8ddc
...
@@ -27,6 +27,7 @@
 #include <string>
 #include <unordered_map>
 #include <unordered_set>
+#include <utility>
 #include <vector>
 #include "paddle/fluid/framework/ir/graph.h"
...
@@ -38,7 +39,10 @@
 namespace paddle {
 namespace inference {
 namespace analysis {
 using framework::ir::Graph;
+using VarQuantScale =
+    std::unordered_map<std::string, std::pair<bool, framework::LoDTensor>>;
 /*
  * The argument definition of both Pass and PassManagers.
...
@@ -127,6 +131,8 @@ struct Argument {
   // Pass a set of op types to enable its mkldnn kernel
   DECL_ARGUMENT_FIELD(mkldnn_enabled_op_types, MKLDNNEnabledOpTypes,
                       std::unordered_set<std::string>);
+  // Scales for variables to be quantized
+  DECL_ARGUMENT_FIELD(quant_var_scales, QuantVarScales, VarQuantScale);
   // Passed from config.
   DECL_ARGUMENT_FIELD(use_gpu, UseGPU, bool);
...
paddle/fluid/inference/analysis/ir_pass_manager.cc
View file @ 161b8ddc
...
@@ -14,6 +14,7 @@
 #include "paddle/fluid/inference/analysis/ir_pass_manager.h"
 #include <string>
+#include <unordered_map>
 #include <vector>
 #include "paddle/fluid/framework/ir/fuse_pass_base.h"
 #include "paddle/fluid/framework/ir/graph.h"
...
@@ -55,14 +56,14 @@ void IRPassManager::CreatePasses(Argument *argument,
           ".dot";
       pass->Set("graph_viz_path", new std::string(std::move(dot_file_path)));
       pass_num++;
-    }
-    if (pass_name == "mkldnn_placement_pass") {
+    } else if (pass_name == "mkldnn_placement_pass") {
       pass->Set("mkldnn_enabled_op_types",
                 new std::unordered_set<std::string>(
                     argument->mkldnn_enabled_op_types()));
-    }
-    if (pass_name == "tensorrt_subgraph_pass") {
+    } else if (pass_name == "cpu_quantize_pass") {
+      pass->Set("quant_var_scales",
+                new VarQuantScale(argument->quant_var_scales()));
+    } else if (pass_name == "tensorrt_subgraph_pass") {
       pass->Set("workspace_size", new int(argument->tensorrt_workspace_size()));
       pass->Set("max_batch_size", new int(argument->tensorrt_max_batch_size()));
       pass->Set("min_subgraph_size",
...
paddle/fluid/inference/api/analysis_config.cc
View file @ 161b8ddc
...
@@ -219,7 +219,14 @@ void AnalysisConfig::Update() {
   }
   if (enable_memory_optim_) {
-    pass_builder()->AppendAnalysisPass("memory_optimize_pass");
+    auto analysis_passes = pass_builder()->AnalysisPasses();
+    auto memory_opti_pass_name = "memory_optimize_pass";
+    bool already_exists =
+        std::find(analysis_passes.begin(), analysis_passes.end(),
+                  memory_opti_pass_name) != analysis_passes.end();
+    if (!already_exists) {
+      pass_builder()->AppendAnalysisPass(memory_opti_pass_name);
+    }
   }
   if (ir_debug_) {
...
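The `AnalysisConfig::Update()` hunk above guards the append with a `std::find` lookup, so repeated calls do not register `memory_optimize_pass` twice. A minimal sketch of that append-once pattern follows. It assumes the pass list behaves like a `std::vector<std::string>`; `AppendPassOnce` is a hypothetical helper for illustration and is not part of Paddle's API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the pass list; not Paddle's pass builder.
// Appends `name` only when it is absent. Returns true if it was appended.
bool AppendPassOnce(std::vector<std::string>* passes, const std::string& name) {
  assert(passes != nullptr);
  const bool already_exists =
      std::find(passes->begin(), passes->end(), name) != passes->end();
  if (!already_exists) {
    passes->push_back(name);
  }
  return !already_exists;
}
```

Calling the helper twice with the same name leaves a single entry in the list, which is the property the diff relies on when `Update()` runs more than once.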
paddle/fluid/operators/CMakeLists.txt
View file @ 161b8ddc
...
@@ -44,10 +44,10 @@ if (WITH_DISTRIBUTE)
     SET(OP_PREFETCH_DEPS ${OP_PREFETCH_DEPS} parameter_prefetch)
 endif()
-register_operators(EXCLUDES py_func_op warpctc_op conv_fusion_op DEPS ${OP_HEADER_DEPS} ${OP_PREFETCH_DEPS})
+register_operators(EXCLUDES py_func_op warpctc_op conv_fusion_op sync_batch_norm_op DEPS ${OP_HEADER_DEPS} ${OP_PREFETCH_DEPS})
-# warpctc_op needs cudnn 7 above
 if (WITH_GPU)
+    # warpctc_op needs cudnn 7 above
     if (${CUDNN_MAJOR_VERSION} VERSION_LESS 7)
         op_library(warpctc_op DEPS dynload_warpctc sequence_padding sequence_scale SRCS warpctc_op.cc warpctc_op.cu.cc)
     else()
...
@@ -58,6 +58,10 @@ if (WITH_GPU)
         op_library(conv_fusion_op)
         file(APPEND ${pybind_file} "USE_CUDA_ONLY_OP(conv2d_fusion);\n")
     endif()
+    if (NOT WIN32)
+        op_library(sync_batch_norm_op)
+        file(APPEND ${pybind_file} "USE_CUDA_ONLY_OP(sync_batch_norm);\n")
+    endif()
 else()
     op_library(warpctc_op DEPS dynload_warpctc sequence_padding sequence_scale)
 endif()
...
paddle/fluid/operators/batch_norm_op.cc
View file @ 161b8ddc
...
@@ -13,7 +13,9 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #include "paddle/fluid/operators/batch_norm_op.h"
+#include <memory>
 #include <string>
+#include <unordered_map>
 #include "paddle/fluid/framework/data_layout.h"
 #ifdef PADDLE_WITH_MKLDNN
 #include "paddle/fluid/platform/mkldnn_helper.h"
...
@@ -22,27 +24,34 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-class BatchNormOp : public framework::OperatorWithKernel {
- public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-  void InferShape(framework::InferShapeContext *ctx) const override {
-    PADDLE_ENFORCE(ctx->HasInput("X"), "");
-    PADDLE_ENFORCE(ctx->HasInput("Scale"), "");
-    PADDLE_ENFORCE(ctx->HasInput("Bias"), "");
-    PADDLE_ENFORCE(ctx->HasInput("Mean"), "");
-    PADDLE_ENFORCE(ctx->HasInput("Variance"), "");
-    PADDLE_ENFORCE(ctx->HasOutput("Y"), "");
-    PADDLE_ENFORCE(ctx->HasOutput("MeanOut"), "");
-    PADDLE_ENFORCE(ctx->HasOutput("VarianceOut"), "");
-    PADDLE_ENFORCE(ctx->HasOutput("SavedMean"), "");
-    PADDLE_ENFORCE(ctx->HasOutput("SavedVariance"), "");
+void BatchNormOp::InferShape(framework::InferShapeContext *ctx) const {
+  PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) of ConvOp should not be null.");
+  PADDLE_ENFORCE(ctx->HasInput("Scale"),
+                 "Input(Scale) of ConvOp should not be null.");
+  PADDLE_ENFORCE(ctx->HasInput("Bias"),
+                 "Input(Bias) of ConvOp should not be null.");
+  PADDLE_ENFORCE(ctx->HasInput("Mean"),
+                 "Input(Mean) of ConvOp should not be null.");
+  PADDLE_ENFORCE(ctx->HasInput("Variance"),
+                 "Input(Variance) of ConvOp should not be null.");
+  PADDLE_ENFORCE(ctx->HasOutput("Y"),
+                 "Output(Y) of ConvOp should not be null.");
+  bool is_test = ctx->Attrs().Get<bool>("is_test");
+  if (!is_test) {
+    PADDLE_ENFORCE(ctx->HasOutput("MeanOut"),
+                   "Output(MeanOut) of ConvOp should not be null.");
+    PADDLE_ENFORCE(ctx->HasOutput("VarianceOut"),
+                   "Output(VarianceOut) of ConvOp should not be null.");
+    PADDLE_ENFORCE(ctx->HasOutput("SavedMean"),
+                   "Output(SavedMean) of ConvOp should not be null.");
+    PADDLE_ENFORCE(ctx->HasOutput("SavedVariance"),
+                   "Output(SavedVariance) of ConvOp should not be null.");
+  }
   // make sure Mean/MeanOut and Variance/VarianceOut share memory in Python
   PADDLE_ENFORCE_EQ(ctx->Inputs("Mean")[0], ctx->Outputs("MeanOut")[0],
                     "Mean and MeanOut should share the same memory");
   PADDLE_ENFORCE_EQ(ctx->Inputs("Variance")[0],
                     ctx->Outputs("VarianceOut")[0],
                     "Variance and VarianceOut should share the same memory");
   const auto x_dims = ctx->GetInputDim("X");
...
@@ -67,11 +76,10 @@ class BatchNormOp : public framework::OperatorWithKernel {
   ctx->SetOutputDim("SavedMean", {C});
   ctx->SetOutputDim("SavedVariance", {C});
   ctx->ShareLoD("X", "Y");
 }
- protected:
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext &ctx) const override {
+framework::OpKernelType BatchNormOp::GetExpectedKernelType(
+    const framework::ExecutionContext &ctx) const {
   auto input_data_type = ctx.Input<Tensor>("X")->type();
   // By default, the type of the scale, bias, mean,
   // and var tensors should both be float. (For float or float16 input tensor)
...
@@ -102,12 +110,9 @@ class BatchNormOp : public framework::OperatorWithKernel {
   return framework::OpKernelType(input_data_type, ctx.GetPlace(), layout,
                                  library);
 }
-};
-class BatchNormOpMaker : public framework::OpProtoAndCheckerMaker {
- public:
-  void Make() override {
+void BatchNormOpMaker::Make() {
   AddAttr<bool>("is_test",
                 "(bool, default false) Set to true for inference only, false "
                 "for training. Some layers may run faster when this is true.")
...
@@ -173,17 +178,7 @@ The required data format for this layer is one of the following:
 2. NCHW `[batch, in_channels, in_height, in_width]`
 )DOC");
 }
-};
-class BatchNormOpInferVarType
-    : public framework::PassInDtypeAndVarTypeToOutput {
- protected:
-  std::unordered_map<std::string, std::string> GetInputOutputWithSameType()
-      const override {
-    return std::unordered_map<std::string, std::string>{{"X", /*->*/ "Y"}};
-  }
-};
 template <typename T>
 class BatchNormKernel<platform::CPUDeviceContext, T>
...
@@ -336,11 +331,7 @@ class BatchNormKernel<platform::CPUDeviceContext, T>
   }
 };
-class BatchNormGradOp : public framework::OperatorWithKernel {
- public:
-  using framework::OperatorWithKernel::OperatorWithKernel;
-  void InferShape(framework::InferShapeContext *ctx) const override {
+void BatchNormGradOp::InferShape(framework::InferShapeContext *ctx) const {
   // check input
   PADDLE_ENFORCE(ctx->HasInput("X"));
   PADDLE_ENFORCE(ctx->HasInput("Scale"), "Input(scale) should not be null.");
...
@@ -368,8 +359,7 @@ class BatchNormGradOp : public framework::OperatorWithKernel {
   const auto x_dims = ctx->GetInputDim("X");
   const DataLayout data_layout = framework::StringToDataLayout(
       ctx->Attrs().Get<std::string>("data_layout"));
-    const int C =
-        (data_layout == DataLayout::kNCHW ? x_dims[1]
-                                          : x_dims[x_dims.size() - 1]);
+  const int C = (data_layout == DataLayout::kNCHW ? x_dims[1]
+                                                  : x_dims[x_dims.size() - 1]);
   ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
...
@@ -377,11 +367,10 @@ class BatchNormGradOp : public framework::OperatorWithKernel {
     ctx->SetOutputDim(framework::GradVarName("Scale"), {C});
     ctx->SetOutputDim(framework::GradVarName("Bias"), {C});
   }
 }
- protected:
-  framework::OpKernelType GetExpectedKernelType(
-      const framework::ExecutionContext &ctx) const override {
+framework::OpKernelType BatchNormGradOp::GetExpectedKernelType(
+    const framework::ExecutionContext &ctx) const {
   const auto *var = ctx.InputVar(framework::GradVarName("Y"));
   if (var == nullptr) {
     PADDLE_THROW("can't find Y@GRAD");
...
@@ -408,10 +397,9 @@ class BatchNormGradOp : public framework::OperatorWithKernel {
   }
 #endif
-    return framework::OpKernelType(ctx.Input<Tensor>("X")->type(),
-                                   ctx.GetPlace(), layout, library);
+  return framework::OpKernelType(ctx.Input<Tensor>("X")->type(), ctx.GetPlace(),
+                                 layout, library);
 }
-};
 template <typename T>
 class BatchNormGradKernel<platform::CPUDeviceContext, T>
...
@@ -572,14 +560,9 @@ class BatchNormGradKernel<platform::CPUDeviceContext, T>
   }
 };
-class BatchNormGradMaker : public framework::SingleGradOpDescMaker {
- public:
-  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
- protected:
-  std::unique_ptr<framework::OpDesc> Apply() const override {
+std::unique_ptr<framework::OpDesc> BatchNormGradMaker::Apply() const {
   auto *op = new framework::OpDesc();
-    op->SetType("batch_norm_grad");
+  op->SetType(GradOpType());
   op->SetInput("X", Input("X"));
   op->SetInput(framework::GradVarName("Y"), OutputGrad("Y"));
...
@@ -601,8 +584,7 @@ class BatchNormGradMaker : public framework::SingleGradOpDescMaker {
   op->SetOutput(framework::GradVarName("Bias"), InputGrad("Bias"));
   return std::unique_ptr<framework::OpDesc>(op);
 }
-};
 class BatchNormInplaceInToOut : public framework::InplaceInToOut {
  public:
...
@@ -642,10 +624,10 @@ class BatchNormGradInplaceInToOut : public framework::InplaceInToOut {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(batch_norm, ops::BatchNormOp, ops::BatchNormOpMaker,
-                  ops::BatchNormOpInferVarType, ops::BatchNormGradMaker,
-                  ops::BatchNormInplaceInToOut);
+                  ops::BatchNormOpInferVarType, ops::BatchNormGradMaker)
+// ops::BatchNormInplaceInToOut);
-REGISTER_OPERATOR(batch_norm_grad, ops::BatchNormGradOp,
-                  ops::BatchNormGradInplaceInToOut);
+REGISTER_OPERATOR(batch_norm_grad, ops::BatchNormGradOp)
+// ops::BatchNormGradInplaceInToOut);
 REGISTER_OP_CPU_KERNEL(
     batch_norm, ops::BatchNormKernel<paddle::platform::CPUDeviceContext, float>,
...
paddle/fluid/operators/batch_norm_op.cu
View file @ 161b8ddc
...
@@ -33,26 +33,6 @@ using CudnnDataType = platform::CudnnDataType<T>;
 template <typename T>
 using BatchNormParamType = typename CudnnDataType<T>::BatchNormParamType;
-void ExtractNCWHD(const framework::DDim &dims, const DataLayout &data_layout,
-                  int *N, int *C, int *H, int *W, int *D) {
-  *N = dims[0];
-  if (dims.size() == 2) {
-    *C = dims[1];
-    *H = 1;
-    *W = 1;
-    *D = 1;
-  } else {
-    *C = data_layout == DataLayout::kNCHW ? dims[1] : dims[dims.size() - 1];
-    *H = data_layout == DataLayout::kNCHW ? dims[2] : dims[1];
-    *W = dims.size() > 3
-             ? (data_layout == DataLayout::kNCHW ? dims[3] : dims[2])
-             : 1;
-    *D = dims.size() > 4
-             ? (data_layout == DataLayout::kNCHW ? dims[4] : dims[3])
-             : 1;
-  }
-}
 template <typename T>
 class BatchNormKernel<platform::CUDADeviceContext, T>
     : public framework::OpKernel<T> {
...
@@ -196,22 +176,6 @@ class BatchNormKernel<platform::CUDADeviceContext, T>
   }
 };
-template <typename T, framework::DataLayout layout>
-static __global__ void KeBNBackwardData(const T *dy,
-                                        const BatchNormParamType<T> *scale,
-                                        const BatchNormParamType<T> *variance,
-                                        const double epsilon, const int C,
-                                        const int HxW, const int num, T *dx) {
-  int gid = blockIdx.x * blockDim.x + threadIdx.x;
-  int stride = blockDim.x * gridDim.x;
-  for (int i = gid; i < num; i += stride) {
-    const int c = layout == framework::DataLayout::kNCHW ? i / HxW % C : i % C;
-    BatchNormParamType<T> inv_var = 1.0 / sqrt(variance[c] + epsilon);
-    dx[i] = static_cast<T>(static_cast<BatchNormParamType<T>>(dy[i]) *
-                           scale[c] * inv_var);
-  }
-}
 template <typename T, int BlockDim, framework::DataLayout layout>
 static __global__ void KeBNBackwardScaleBias(
     const T *dy, const T *x, const BatchNormParamType<T> *mean,
...
@@ -248,6 +212,22 @@ static __global__ void KeBNBackwardScaleBias(
   }
 }
+template <typename T, framework::DataLayout layout>
+static __global__ void KeBNBackwardData(const T *dy,
+                                        const BatchNormParamType<T> *scale,
+                                        const BatchNormParamType<T> *variance,
+                                        const double epsilon, const int C,
+                                        const int HxW, const int num, T *dx) {
+  int gid = blockIdx.x * blockDim.x + threadIdx.x;
+  int stride = blockDim.x * gridDim.x;
+  for (int i = gid; i < num; i += stride) {
+    const int c = layout == framework::DataLayout::kNCHW ? i / HxW % C : i % C;
+    BatchNormParamType<T> inv_var = 1.0 / sqrt(variance[c] + epsilon);
+    dx[i] = static_cast<T>(static_cast<BatchNormParamType<T>>(dy[i]) *
+                           scale[c] * inv_var);
+  }
+}
 template <typename T>
 class BatchNormGradKernel<platform::CUDADeviceContext, T>
     : public framework::OpKernel<T> {
...
@@ -383,7 +363,7 @@ class BatchNormGradKernel<platform::CUDADeviceContext, T>
           KeBNBackwardScaleBias<T, block, framework::DataLayout::kNCHW><<<
               grid2, block, 0, dev_ctx.stream()>>>(
               d_y->data<T>(), x->data<T>(), running_mean_data, running_var_data,
-              epsilon, C, H * W, num, d_scale->data<BatchNormParamType<T>>(),
+              epsilon, N, C, H * W * D, d_scale->data<BatchNormParamType<T>>(),
               d_bias->data<BatchNormParamType<T>>());
         }
       } else {
...
@@ -394,10 +374,10 @@ class BatchNormGradKernel<platform::CUDADeviceContext, T>
               running_var_data, epsilon, C, H * W, num, d_x->data<T>());
         }
         if (d_scale && d_bias) {
-          KeBNBackwardScaleBias<T, block, framework::DataLayout::kNCHW><<<
+          KeBNBackwardScaleBias<T, block, framework::DataLayout::kNHWC><<<
               grid2, block, 0, dev_ctx.stream()>>>(
               d_y->data<T>(), x->data<T>(), running_mean_data, running_var_data,
-              epsilon, C, H * W, num, d_scale->data<BatchNormParamType<T>>(),
+              epsilon, N, C, H * W * D, d_scale->data<BatchNormParamType<T>>(),
               d_bias->data<BatchNormParamType<T>>());
         }
       }
...
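The `KeBNBackwardData` kernel in the hunks above maps each flat element index `i` to a channel with `i / HxW % C` for NCHW and `i % C` for NHWC, then scales `dy[i]` by `scale[c] / sqrt(variance[c] + epsilon)`. The serial sketch below mirrors that indexing on the CPU so the two layouts can be compared by hand. `Layout`, `ChannelOf`, and `BackwardDataRef` are hypothetical names for illustration; they stand in for `framework::DataLayout` and the CUDA kernel and are not Paddle code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for framework::DataLayout.
enum class Layout { kNCHW, kNHWC };

// Channel of flat element index i, mirroring `i / HxW % C` (NCHW)
// and `i % C` (NHWC) from the kernel.
inline int ChannelOf(int i, int C, int HxW, Layout layout) {
  assert(C > 0 && HxW > 0);
  return layout == Layout::kNCHW ? i / HxW % C : i % C;
}

// Serial reference of dx[i] = dy[i] * scale[c] / sqrt(variance[c] + epsilon).
std::vector<float> BackwardDataRef(const std::vector<float>& dy,
                                   const std::vector<float>& scale,
                                   const std::vector<float>& variance,
                                   double epsilon, int C, int HxW,
                                   Layout layout) {
  std::vector<float> dx(dy.size());
  for (int i = 0; i < static_cast<int>(dy.size()); ++i) {
    const int c = ChannelOf(i, C, HxW, layout);
    const float inv_var =
        static_cast<float>(1.0 / std::sqrt(variance[c] + epsilon));
    dx[i] = dy[i] * scale[c] * inv_var;
  }
  return dx;
}
```

With `C = 3` and `HxW = 4`, element 5 belongs to channel 1 in NCHW (the second block of four elements) but to channel 2 in NHWC (channels interleave), which is why the kernel takes the layout as a template parameter.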
paddle/fluid/operators/batch_norm_op.h
View file @ 161b8ddc
...
@@ -13,6 +13,9 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #pragma once
+#include <memory>
+#include <string>
+#include <unordered_map>
 #include "paddle/fluid/framework/eigen.h"
 #include "paddle/fluid/framework/op_registry.h"
...
@@ -35,17 +38,84 @@ template <typename T>
 using ConstEigenVectorArrayMap =
     Eigen::Map<const Eigen::Array<T, Eigen::Dynamic, 1>>;
+class BatchNormOp : public framework::OperatorWithKernel {
+ public:
+  using framework::OperatorWithKernel::OperatorWithKernel;
+  void InferShape(framework::InferShapeContext *ctx) const override;
+ protected:
+  framework::OpKernelType GetExpectedKernelType(
+      const framework::ExecutionContext &ctx) const override;
+};
+class BatchNormGradOp : public framework::OperatorWithKernel {
+ public:
+  using framework::OperatorWithKernel::OperatorWithKernel;
+  void InferShape(framework::InferShapeContext *ctx) const override;
+ protected:
+  framework::OpKernelType GetExpectedKernelType(
+      const framework::ExecutionContext &ctx) const override;
+};
+class BatchNormOpMaker : public framework::OpProtoAndCheckerMaker {
+ public:
+  void Make() override;
+};
+class BatchNormGradMaker : public framework::SingleGradOpDescMaker {
+ public:
+  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
+ protected:
+  std::unique_ptr<framework::OpDesc> Apply() const override;
+  virtual std::string GradOpType() const {
+    return this->ForwardOpType() + "_grad";
+  }
+};
+class BatchNormOpInferVarType
+    : public framework::PassInDtypeAndVarTypeToOutput {
+ protected:
+  std::unordered_map<std::string, std::string> GetInputOutputWithSameType()
+      const override {
+    return std::unordered_map<std::string, std::string>{{"X", /*->*/ "Y"}};
+  }
+};
 template <typename DeviceContext, typename T>
 class BatchNormKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext &ctx) const override;
 };
 template <typename DeviceContext, typename T>
 class BatchNormGradKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext &ctx) const override;
 };
+inline void ExtractNCWHD(const framework::DDim &dims,
+                         const DataLayout &data_layout, int *N, int *C, int *H,
+                         int *W, int *D) {
+  *N = dims[0];
+  if (dims.size() == 2) {
+    *C = dims[1];
+    *H = 1;
+    *W = 1;
+    *D = 1;
+  } else {
+    *C = data_layout == DataLayout::kNCHW ? dims[1] : dims[dims.size() - 1];
+    *H = data_layout == DataLayout::kNCHW ? dims[2] : dims[1];
+    *W = dims.size() > 3
+             ? (data_layout == DataLayout::kNCHW ? dims[3] : dims[2])
+             : 1;
+    *D = dims.size() > 4
+             ? (data_layout == DataLayout::kNCHW ? dims[4] : dims[3])
+             : 1;
+  }
+}
 }  // namespace operators
 }  // namespace paddle
paddle/fluid/operators/conv_op.cc
View file @ 161b8ddc
...
@@ -14,6 +14,7 @@ limitations under the License. */
 #include "paddle/fluid/operators/conv_op.h"
+#include <memory>
 #include <string>
 #include <vector>
...
@@ -194,6 +195,12 @@ void Conv2DOpMaker::Make() {
   AddAttr<bool>("use_mkldnn",
                 "(bool, default false) Only used in mkldnn kernel")
       .SetDefault(false);
+  AddAttr<bool>("use_quantizer",
+                "(bool, default false) "
+                "Set to true for operators that should be quantized and use "
+                "int8 kernel. "
+                "Only used on CPU.")
+      .SetDefault(false);
   AddAttr<bool>("fuse_relu", "(bool, default false) Only used in mkldnn kernel")
       .SetDefault(false);
   AddAttr<bool>("fuse_residual_connection",
...
paddle/fluid/operators/cross_entropy_op.cc
View file @ 161b8ddc
...
@@ -13,18 +13,21 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #include "paddle/fluid/operators/cross_entropy_op.h"
+#include <memory>
 #include <string>
+#include <unordered_map>
 namespace paddle {
 namespace operators {
-class CrossEntropyOp : public framework::OperatorWithKernel {
+class CrossEntropyOpBase : public framework::OperatorWithKernel {
  public:
   using framework::OperatorWithKernel::OperatorWithKernel;
   void InferShape(framework::InferShapeContext* ctx) const override {
     PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) should be not null.");
     PADDLE_ENFORCE(ctx->HasInput("Label"), "Input(Label) should be not null.");
     PADDLE_ENFORCE(ctx->HasOutput("Y"), "Output(Y) should be not null.");
     auto x_dims = ctx->GetInputDim("X");
...
@@ -43,7 +46,8 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
                         "Input(X) and Input(Label) shall have the same shape "
                         "except the last dimension.");
     }
-    if (ctx->Attrs().Get<bool>("soft_label")) {
+    if (IsSoftLabel(ctx)) {
       if (check) {
         PADDLE_ENFORCE_EQ(x_dims[rank - 1], label_dims[rank - 1],
                           "If Attr(soft_label) == true, the last dimension of "
...
@@ -69,21 +73,24 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
...
@@ -69,21 +73,24 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
return
framework
::
OpKernelType
(
ctx
.
Input
<
Tensor
>
(
"X"
)
->
type
(),
return
framework
::
OpKernelType
(
ctx
.
Input
<
Tensor
>
(
"X"
)
->
type
(),
ctx
.
device_context
());
ctx
.
device_context
());
}
}
virtual
bool
IsSoftLabel
(
framework
::
InferShapeContext
*
ctx
)
const
{
return
ctx
->
Attrs
().
Get
<
bool
>
(
"soft_label"
);
}
};
};
class
CrossEntropyGradientOp
:
public
framework
::
OperatorWithKernel
{
class
CrossEntropyGradientOp
Base
:
public
framework
::
OperatorWithKernel
{
public:
public:
using
framework
::
OperatorWithKernel
::
OperatorWithKernel
;
using
framework
::
OperatorWithKernel
::
OperatorWithKernel
;
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
{
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"X"
),
"Input(X) should be not null."
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Label"
),
"Input(Label) should be not null."
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"Label"
),
"Input(Label) should be not null."
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
framework
::
GradVarName
(
"Y"
)),
PADDLE_ENFORCE
(
ctx
->
HasInput
(
framework
::
GradVarName
(
"Y"
)),
"Input(Y@GRAD) should be not null."
);
"Input(Y@GRAD) should be not null."
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
framework
::
GradVarName
(
"X"
)),
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
framework
::
GradVarName
(
"X"
)),
"Output(X@GRAD) should be not null."
);
"Output(X@GRAD) should be not null."
);
auto
x_dims
=
ctx
->
GetInputDim
(
"X"
);
auto
x_dims
=
GetXDim
(
ctx
);
auto
label_dims
=
ctx
->
GetInputDim
(
"Label"
);
auto
label_dims
=
ctx
->
GetInputDim
(
"Label"
);
auto
dy_dims
=
ctx
->
GetInputDim
(
framework
::
GradVarName
(
"Y"
));
auto
dy_dims
=
ctx
->
GetInputDim
(
framework
::
GradVarName
(
"Y"
));
int
rank
=
x_dims
.
size
();
int
rank
=
x_dims
.
size
();
...
@@ -108,9 +115,7 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
...
@@ -108,9 +115,7 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
"The Input(X) and Input(Y@Grad) should have the same "
"The Input(X) and Input(Y@Grad) should have the same "
"shape except the last dimension."
);
"shape except the last dimension."
);
}
}
PADDLE_ENFORCE_EQ
(
dy_dims
[
rank
-
1
],
1
,
if
(
IsSoftLabel
(
ctx
))
{
"The last dimension of Input(Y@Grad) should be 1."
);
if
(
ctx
->
Attrs
().
Get
<
bool
>
(
"soft_label"
))
{
if
(
check
)
{
if
(
check
)
{
PADDLE_ENFORCE_EQ
(
PADDLE_ENFORCE_EQ
(
x_dims
[
rank
-
1
],
label_dims
[
rank
-
1
],
x_dims
[
rank
-
1
],
label_dims
[
rank
-
1
],
...
@@ -123,7 +128,10 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
...
@@ -123,7 +128,10 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
"Input(Label) should be 1."
);
"Input(Label) should be 1."
);
}
}
ctx
->
SetOutputDim
(
framework
::
GradVarName
(
"X"
),
x_dims
);
ctx
->
SetOutputDim
(
framework
::
GradVarName
(
"X"
),
x_dims
);
ctx
->
ShareLoD
(
"X"
,
framework
::
GradVarName
(
"X"
));
PADDLE_ENFORCE_EQ
(
dy_dims
[
rank
-
1
],
1
,
"The last dimension of Input(Y@Grad) should be 1."
);
ctx
->
SetOutputDim
(
framework
::
GradVarName
(
"X"
),
x_dims
);
ctx
->
ShareLoD
(
VarNameWithXLoD
(),
framework
::
GradVarName
(
"X"
));
}
}
protected:
protected:
...
@@ -131,9 +139,29 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
...
@@ -131,9 +139,29 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
// is determined by its input "X".
// is determined by its input "X".
framework
::
OpKernelType
GetExpectedKernelType
(
framework
::
OpKernelType
GetExpectedKernelType
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
return
framework
::
OpKernelType
(
ctx
.
Input
<
Tensor
>
(
"X"
)
->
type
(),
return
framework
::
OpKernelType
(
ctx
.
Input
<
Tensor
>
(
framework
::
GradVarName
(
"Y"
))
->
type
(),
ctx
.
device_context
());
ctx
.
device_context
());
}
}
virtual
framework
::
DDim
GetXDim
(
framework
::
InferShapeContext
*
ctx
)
const
{
return
ctx
->
GetInputDim
(
"X"
);
}
virtual
const
char
*
VarNameWithXLoD
()
const
{
return
"X"
;
}
virtual
bool
IsSoftLabel
(
framework
::
InferShapeContext
*
ctx
)
const
{
return
ctx
->
Attrs
().
Get
<
bool
>
(
"soft_label"
);
}
};
class
CrossEntropyOpInferVarType
:
public
framework
::
PassInDtypeAndVarTypeToOutput
{
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
GetInputOutputWithSameType
()
const
override
{
return
std
::
unordered_map
<
std
::
string
,
std
::
string
>
{{
"X"
,
/*->*/
"Y"
}};
}
};
};
class
CrossEntropyOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
class
CrossEntropyOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
...
@@ -200,22 +228,132 @@ or not. But the output only shares the LoD information with input X.
...
@@ -200,22 +228,132 @@ or not. But the output only shares the LoD information with input X.
}
}
};
};
class
CrossEntropyOpInferVarType
class
CrossEntropyGradientOp
:
public
CrossEntropyGradientOpBase
{
:
public
framework
::
PassInDtypeAndVarTypeToOutput
{
public:
using
CrossEntropyGradientOpBase
::
CrossEntropyGradientOpBase
;
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"X"
),
"Input(X) should be not null."
);
CrossEntropyGradientOpBase
::
InferShape
(
ctx
);
}
};
class
CrossEntropyOp2
:
public
CrossEntropyOpBase
{
public:
using
CrossEntropyOpBase
::
CrossEntropyOpBase
;
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
CrossEntropyOpBase
::
InferShape
(
ctx
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"XShape"
),
"Output(XShape) should be not null."
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"MatchX"
),
"Output(MatchX) should be not null."
);
auto
x_dims
=
ctx
->
GetInputDim
(
"X"
);
auto
x_dims_vec
=
framework
::
vectorize
(
x_dims
);
x_dims_vec
.
push_back
(
0
);
ctx
->
SetOutputDim
(
"XShape"
,
framework
::
make_ddim
(
x_dims_vec
));
x_dims
[
x_dims
.
size
()
-
1
]
=
1
;
ctx
->
SetOutputDim
(
"MatchX"
,
x_dims
);
ctx
->
ShareLoD
(
"X"
,
/*->*/
"XShape"
);
}
protected:
protected:
std
::
unordered_map
<
std
::
string
,
std
::
string
>
GetInputOutputWithSameType
()
bool
IsSoftLabel
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
const
override
{
return
false
;
return
std
::
unordered_map
<
std
::
string
,
std
::
string
>
{{
"X"
,
/*->*/
"Y"
}};
}
}
};
};
class
CrossEntropyGradientOp2
:
public
CrossEntropyGradientOpBase
{
public:
using
CrossEntropyGradientOpBase
::
CrossEntropyGradientOpBase
;
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"MatchX"
),
"Input(MatchX) must exist"
);
CrossEntropyGradientOpBase
::
InferShape
(
ctx
);
}
protected:
virtual
framework
::
DDim
GetXDim
(
framework
::
InferShapeContext
*
ctx
)
const
{
auto
x_shape
=
ctx
->
GetInputDim
(
"XShape"
);
return
framework
::
DDim
(
x_shape
.
Get
(),
x_shape
.
size
()
-
1
);
}
virtual
const
char
*
VarNameWithXLoD
()
const
{
return
"XShape"
;
}
virtual
bool
IsSoftLabel
(
framework
::
InferShapeContext
*
ctx
)
const
{
return
false
;
}
};
class
CrossEntropyOpMaker2
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
void
Make
()
override
{
AddInput
(
"X"
,
"(Tensor, default Tensor<float>), a tensor whose last dimension "
"size is equal to the number of classes. This input is a "
"probability computed by the previous operator, which is almost "
"always the result of a softmax operator."
);
AddInput
(
"Label"
,
"(Tensor), the tensor which represents the ground truth. It has the "
"same shape with 'X' except the last dimension. One hot Tensor."
);
AddOutput
(
"Y"
,
"(Tensor, default Tensor<float>), a tensor whose shape is same "
"with 'X' except that the last dimension size is 1. It "
"represents the cross entropy loss."
);
AddOutput
(
"XShape"
,
"Temporary variable to save shape and LoD of X."
);
AddOutput
(
"MatchX"
,
"X value that matches label, used for gradient computation."
);
AddAttr
<
int
>
(
"ignore_index"
,
"(int, default -100), Specifies a target value that is "
"ignored and does not contribute to the input gradient. "
"Only valid if soft_label is set to False"
)
.
SetDefault
(
-
100
);
AddComment
(
R"DOC(
Hard-label CrossEntropy Operator.
The input 'X' and 'Label' will first be logically flattened to 2-D matrices.
The matrix's second dimension (row length) is the same as the original last
dimension, and the first dimension (column length) is the product of all other
original dimensions. Then the cross-entropy computation will take place on each row
of the flattened matrices.
Only hard labels are supported.
Both the input X and Label can carry the LoD (Level of Details) information,
or not. But the output only shares the LoD information with input X.
)DOC"
);
}
};
class
CrossEntropyGradOpDescMaker2
:
public
framework
::
SingleGradOpDescMaker
{
public:
using
framework
::
SingleGradOpDescMaker
::
SingleGradOpDescMaker
;
protected:
std
::
unique_ptr
<
framework
::
OpDesc
>
Apply
()
const
override
{
std
::
unique_ptr
<
framework
::
OpDesc
>
op
(
new
framework
::
OpDesc
());
op
->
SetType
(
"cross_entropy_grad2"
);
op
->
SetInput
(
"Label"
,
Input
(
"Label"
));
op
->
SetInput
(
"MatchX"
,
Output
(
"MatchX"
));
op
->
SetInput
(
"XShape"
,
Output
(
"XShape"
));
op
->
SetInput
(
framework
::
GradVarName
(
"Y"
),
OutputGrad
(
"Y"
));
op
->
SetOutput
(
framework
::
GradVarName
(
"X"
),
InputGrad
(
"X"
));
op
->
SetAttrMap
(
Attrs
());
return
op
;
}
};
}
// namespace operators
}
// namespace operators
}
// namespace paddle
}
// namespace paddle
namespace
ops
=
paddle
::
operators
;
namespace
ops
=
paddle
::
operators
;
using
CPUCtx
=
paddle
::
platform
::
CPUDeviceContext
;
using
CPUCtx
=
paddle
::
platform
::
CPUDeviceContext
;
REGISTER_OPERATOR
(
cross_entropy
,
ops
::
CrossEntropyOp
,
ops
::
CrossEntropyOpMaker
,
REGISTER_OPERATOR
(
cross_entropy
,
ops
::
CrossEntropyOp
Base
,
ops
::
CrossEntropyOpInferVarType
,
ops
::
CrossEntropyOp
Maker
,
ops
::
CrossEntropyOp
InferVarType
,
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
paddle
::
framework
::
DefaultGradOpDescMaker
<
true
>
);
REGISTER_OPERATOR
(
cross_entropy_grad
,
ops
::
CrossEntropyGradientOp
);
REGISTER_OPERATOR
(
cross_entropy_grad
,
ops
::
CrossEntropyGradientOp
);
REGISTER_OP_CPU_KERNEL
(
cross_entropy
,
ops
::
CrossEntropyOpKernel
<
CPUCtx
,
float
>
,
REGISTER_OP_CPU_KERNEL
(
cross_entropy
,
ops
::
CrossEntropyOpKernel
<
CPUCtx
,
float
>
,
...
@@ -223,3 +361,14 @@ REGISTER_OP_CPU_KERNEL(cross_entropy, ops::CrossEntropyOpKernel<CPUCtx, float>,
...
@@ -223,3 +361,14 @@ REGISTER_OP_CPU_KERNEL(cross_entropy, ops::CrossEntropyOpKernel<CPUCtx, float>,
REGISTER_OP_CPU_KERNEL
(
cross_entropy_grad
,
REGISTER_OP_CPU_KERNEL
(
cross_entropy_grad
,
ops
::
CrossEntropyGradientOpKernel
<
CPUCtx
,
float
>
,
ops
::
CrossEntropyGradientOpKernel
<
CPUCtx
,
float
>
,
ops
::
CrossEntropyGradientOpKernel
<
CPUCtx
,
double
>
);
ops
::
CrossEntropyGradientOpKernel
<
CPUCtx
,
double
>
);
REGISTER_OPERATOR
(
cross_entropy2
,
ops
::
CrossEntropyOp2
,
ops
::
CrossEntropyOpMaker2
,
ops
::
CrossEntropyOpInferVarType
,
ops
::
CrossEntropyGradOpDescMaker2
);
REGISTER_OPERATOR
(
cross_entropy_grad2
,
ops
::
CrossEntropyGradientOp2
);
REGISTER_OP_CPU_KERNEL
(
cross_entropy2
,
ops
::
CrossEntropyOpKernel2
<
CPUCtx
,
float
>
,
ops
::
CrossEntropyOpKernel2
<
CPUCtx
,
double
>
);
REGISTER_OP_CPU_KERNEL
(
cross_entropy_grad2
,
ops
::
CrossEntropyGradientOpKernel2
<
CPUCtx
,
float
>
,
ops
::
CrossEntropyGradientOpKernel2
<
CPUCtx
,
double
>
);
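Note on the `XShape` / `MatchX` outputs added above: `CrossEntropyOp2::InferShape` appends a trailing 0 to the dims of `X` to form `XShape`, and `CrossEntropyGradientOp2::GetXDim` drops that last entry again to recover the dims of `X`. The following is a minimal sketch of that round trip only. `Dims`, `MakeXShape`, `MakeMatchXDims`, and `RecoverXDims` are hypothetical names introduced for illustration; a plain `std::vector<int64_t>` stands in for `framework::DDim`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for framework::DDim: just a list of extents.
using Dims = std::vector<int64_t>;

// Forward side: XShape is X's dims plus one trailing 0. The product of
// the extents is then 0, so the shape describes no elements.
Dims MakeXShape(const Dims& x_dims) {
  Dims x_shape = x_dims;
  x_shape.push_back(0);
  return x_shape;
}

// Forward side: MatchX has X's dims with the last extent set to 1.
Dims MakeMatchXDims(const Dims& x_dims) {
  Dims out = x_dims;
  if (!out.empty()) out.back() = 1;
  return out;
}

// Backward side: drop the trailing entry of XShape to recover X's dims.
Dims RecoverXDims(const Dims& x_shape) {
  assert(!x_shape.empty());
  return Dims(x_shape.begin(), x_shape.end() - 1);
}
```

For example, dims `{8, 16, 10}` give an `XShape` of `{8, 16, 10, 0}` and `MatchX` dims of `{8, 16, 1}`. The apparent benefit is that the gradient op can read the shape of `X` from `XShape` without taking `X` itself as an input, which matches the inputs wired up in `CrossEntropyGradOpDescMaker2`.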
paddle/fluid/operators/cross_entropy_op.cu
...
@@ -27,3 +27,13 @@ REGISTER_OP_CUDA_KERNEL(
...
@@ -27,3 +27,13 @@ REGISTER_OP_CUDA_KERNEL(
cross_entropy_grad
,
ops
::
CrossEntropyGradientOpKernel
<
CUDACtx
,
float
>
,
cross_entropy_grad
,
ops
::
CrossEntropyGradientOpKernel
<
CUDACtx
,
float
>
,
ops
::
CrossEntropyGradientOpKernel
<
CUDACtx
,
double
>
,
ops
::
CrossEntropyGradientOpKernel
<
CUDACtx
,
double
>
,
ops
::
CrossEntropyGradientOpKernel
<
CUDACtx
,
plat
::
float16
>
);
ops
::
CrossEntropyGradientOpKernel
<
CUDACtx
,
plat
::
float16
>
);
REGISTER_OP_CUDA_KERNEL
(
cross_entropy2
,
ops
::
CrossEntropyOpKernel2
<
CUDACtx
,
float
>
,
ops
::
CrossEntropyOpKernel2
<
CUDACtx
,
double
>
,
ops
::
CrossEntropyOpKernel2
<
CUDACtx
,
plat
::
float16
>
);
REGISTER_OP_CUDA_KERNEL
(
cross_entropy_grad2
,
ops
::
CrossEntropyGradientOpKernel2
<
CUDACtx
,
float
>
,
ops
::
CrossEntropyGradientOpKernel2
<
CUDACtx
,
double
>
,
ops
::
CrossEntropyGradientOpKernel2
<
CUDACtx
,
plat
::
float16
>
);
paddle/fluid/operators/cross_entropy_op.h
...
@@ -15,6 +15,7 @@ limitations under the License. */
...
@@ -15,6 +15,7 @@ limitations under the License. */
#pragma once
#pragma once
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/math.h"
#include "paddle/fluid/operators/math/cross_entropy.h"
#include "paddle/fluid/operators/math/cross_entropy.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/for_range.h"
#include "paddle/fluid/platform/for_range.h"
...
@@ -137,5 +138,124 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel<T> {
...
@@ -137,5 +138,124 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel<T> {
}
}
};
};
template
<
typename
T
>
struct
HardLabelCrossEntropyForwardFunctor
{
HardLabelCrossEntropyForwardFunctor
(
const
T
*
x
,
T
*
y
,
T
*
match_x
,
const
int64_t
*
label
,
int64_t
ignore_index
,
int64_t
feature_size
)
:
x_
(
x
),
y_
(
y
),
match_x_
(
match_x
),
label_
(
label
),
ignore_index_
(
ignore_index
),
feature_size_
(
feature_size
)
{}
HOSTDEVICE
void
operator
()(
int64_t
idx
)
const
{
auto
label
=
label_
[
idx
];
if
(
label
!=
ignore_index_
)
{
auto
match_x
=
x_
[
idx
*
feature_size_
+
label
];
y_
[
idx
]
=
-
math
::
TolerableValue
<
T
>
()(
real_log
(
match_x
));
match_x_
[
idx
]
=
match_x
;
}
else
{
y_
[
idx
]
=
0
;
match_x_
[
idx
]
=
0
;
// any value is ok
}
}
const
T
*
x_
;
T
*
y_
;
T
*
match_x_
;
const
int64_t
*
label_
;
int64_t
ignore_index_
;
int64_t
feature_size_
;
};
template
<
typename
T
>
struct
HardLabelCrossEntropyBackwardFunctor
{
HardLabelCrossEntropyBackwardFunctor
(
T
*
dx
,
const
T
*
dy
,
const
T
*
match_x
,
const
int64_t
*
label
,
int64_t
ignore_index
,
int64_t
feature_size
)
:
dx_
(
dx
),
dy_
(
dy
),
match_x_
(
match_x
),
label_
(
label
),
ignore_index_
(
ignore_index
),
feature_size_
(
feature_size
)
{}
HOSTDEVICE
void
operator
()(
int64_t
idx
)
const
{
auto
row_idx
=
idx
/
feature_size_
;
auto
col_idx
=
idx
%
feature_size_
;
auto
label
=
label_
[
row_idx
];
if
(
label
==
col_idx
&&
label
!=
ignore_index_
)
{
dx_
[
idx
]
=
-
dy_
[
row_idx
]
/
match_x_
[
row_idx
];
}
else
{
dx_
[
idx
]
=
0
;
}
}
T
*
dx_
;
const
T
*
dy_
;
const
T
*
match_x_
;
const
int64_t
*
label_
;
int64_t
ignore_index_
;
int64_t
feature_size_
;
};
template
<
typename
DeviceContext
,
typename
T
>
class
CrossEntropyOpKernel2
:
public
framework
::
OpKernel
<
T
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
*
x
=
ctx
.
Input
<
Tensor
>
(
"X"
);
auto
*
label
=
ctx
.
Input
<
Tensor
>
(
"Label"
);
auto
*
y
=
ctx
.
Output
<
Tensor
>
(
"Y"
);
auto
*
match_x
=
ctx
.
Output
<
Tensor
>
(
"MatchX"
);
auto
&
x_dims
=
x
->
dims
();
auto
feature_size
=
x_dims
[
x_dims
.
size
()
-
1
];
auto
batch_size
=
framework
::
product
(
x
->
dims
())
/
feature_size
;
auto
*
p_x
=
x
->
data
<
T
>
();
auto
*
p_label
=
label
->
data
<
int64_t
>
();
auto
*
p_y
=
y
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
auto
*
p_match_x
=
match_x
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
auto
ignore_index
=
ctx
.
Attr
<
int
>
(
"ignore_index"
);
platform
::
ForRange
<
DeviceContext
>
for_range
(
ctx
.
template
device_context
<
DeviceContext
>(),
batch_size
);
for_range
(
HardLabelCrossEntropyForwardFunctor
<
T
>
(
p_x
,
p_y
,
p_match_x
,
p_label
,
ignore_index
,
feature_size
));
}
};
template
<
typename
DeviceContext
,
typename
T
>
class
CrossEntropyGradientOpKernel2
:
public
framework
::
OpKernel
<
T
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
*
dx
=
ctx
.
Output
<
Tensor
>
(
framework
::
GradVarName
(
"X"
));
auto
*
dy
=
ctx
.
Input
<
Tensor
>
(
framework
::
GradVarName
(
"Y"
));
auto
*
match_x
=
ctx
.
Input
<
Tensor
>
(
"MatchX"
);
auto
*
label
=
ctx
.
Input
<
Tensor
>
(
"Label"
);
auto
*
p_dx
=
dx
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
auto
*
p_dy
=
dy
->
data
<
T
>
();
auto
*
p_match_x
=
match_x
->
data
<
T
>
();
auto
*
p_label
=
label
->
data
<
int64_t
>
();
int64_t
ignore_index
=
ctx
.
Attr
<
int
>
(
"ignore_index"
);
int
rank
=
dx
->
dims
().
size
();
int64_t
feature_size
=
dx
->
dims
()[
rank
-
1
];
int64_t
batch_size
=
framework
::
product
(
dx
->
dims
())
/
feature_size
;
platform
::
ForRange
<
DeviceContext
>
for_range
(
ctx
.
template
device_context
<
DeviceContext
>(),
batch_size
*
feature_size
);
for_range
(
HardLabelCrossEntropyBackwardFunctor
<
T
>
(
p_dx
,
p_dy
,
p_match_x
,
p_label
,
ignore_index
,
feature_size
));
}
};
}
// namespace operators
}
// namespace operators
}
// namespace paddle
}
// namespace paddle
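Note on the two hard-label functors above: the forward pass reads the probability at the label column of each row, stores it in `MatchX`, and emits its negative log; the backward pass writes `-dY / MatchX` at that same column and zero elsewhere. The sketch below restates this on the host with `std::vector`, as a reading aid rather than the kernel itself. `HardLabelForward` and `HardLabelBackward` are hypothetical names, the clamping done by `math::TolerableValue` is left out, and the backward loop runs per row instead of per element as `ForRange` does in the source.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Forward: for each row, pick the probability at the label column and
// take its negative log. Rows whose label equals ignore_index yield 0.
void HardLabelForward(const std::vector<double>& x,
                      const std::vector<int64_t>& label,
                      int64_t feature_size, int64_t ignore_index,
                      std::vector<double>* y, std::vector<double>* match_x) {
  const int64_t batch = static_cast<int64_t>(label.size());
  y->assign(batch, 0.0);
  match_x->assign(batch, 0.0);
  for (int64_t i = 0; i < batch; ++i) {
    if (label[i] == ignore_index) continue;  // loss and MatchX stay 0
    const double p = x[i * feature_size + label[i]];
    (*match_x)[i] = p;
    (*y)[i] = -std::log(p);  // no clamping here, unlike the source
  }
}

// Backward: dX is zero everywhere except the label column of each
// non-ignored row, where d(-log p)/dp = -1/p is scaled by dY.
void HardLabelBackward(const std::vector<double>& dy,
                       const std::vector<double>& match_x,
                       const std::vector<int64_t>& label,
                       int64_t feature_size, int64_t ignore_index,
                       std::vector<double>* dx) {
  const int64_t batch = static_cast<int64_t>(label.size());
  dx->assign(batch * feature_size, 0.0);
  for (int64_t i = 0; i < batch; ++i) {
    if (label[i] == ignore_index) continue;
    (*dx)[i * feature_size + label[i]] = -dy[i] / match_x[i];
  }
}
```

Because the backward pass only needs `MatchX`, `Label`, and `dY`, it never has to touch the full `X` tensor; that appears to be the reason the forward kernel saves `MatchX` as an extra output.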
paddle/fluid/operators/detection/CMakeLists.txt
...
@@ -33,6 +33,7 @@ detection_library(rpn_target_assign_op SRCS rpn_target_assign_op.cc)
...
@@ -33,6 +33,7 @@ detection_library(rpn_target_assign_op SRCS rpn_target_assign_op.cc)
detection_library
(
generate_proposal_labels_op SRCS generate_proposal_labels_op.cc
)
detection_library
(
generate_proposal_labels_op SRCS generate_proposal_labels_op.cc
)
detection_library
(
box_clip_op SRCS box_clip_op.cc box_clip_op.cu
)
detection_library
(
box_clip_op SRCS box_clip_op.cc box_clip_op.cu
)
detection_library
(
yolov3_loss_op SRCS yolov3_loss_op.cc
)
detection_library
(
yolov3_loss_op SRCS yolov3_loss_op.cc
)
detection_library
(
yolo_box_op SRCS yolo_box_op.cc yolo_box_op.cu
)
detection_library
(
box_decoder_and_assign_op SRCS box_decoder_and_assign_op.cc box_decoder_and_assign_op.cu
)
detection_library
(
box_decoder_and_assign_op SRCS box_decoder_and_assign_op.cc box_decoder_and_assign_op.cu
)
if
(
WITH_GPU
)
if
(
WITH_GPU
)
...
...
paddle/fluid/operators/detection/box_coder_op.cc
...
@@ -60,14 +60,15 @@ class BoxCoderOp : public framework::OperatorWithKernel {
...
@@ -60,14 +60,15 @@ class BoxCoderOp : public framework::OperatorWithKernel {
}
else
if
(
code_type
==
BoxCodeType
::
kDecodeCenterSize
)
{
}
else
if
(
code_type
==
BoxCodeType
::
kDecodeCenterSize
)
{
PADDLE_ENFORCE_EQ
(
target_box_dims
.
size
(),
3
,
PADDLE_ENFORCE_EQ
(
target_box_dims
.
size
(),
3
,
"The rank of Input TargetBox must be 3"
);
"The rank of Input TargetBox must be 3"
);
PADDLE_ENFORCE
(
axis
==
0
||
axis
==
1
,
"axis must be 0 or 1"
);
if
(
ctx
->
IsRuntime
())
{
if
(
axis
==
0
)
{
if
(
axis
==
0
)
{
PADDLE_ENFORCE_EQ
(
target_box_dims
[
1
],
prior_box_dims
[
0
]);
PADDLE_ENFORCE_EQ
(
target_box_dims
[
1
],
prior_box_dims
[
0
]);
}
else
if
(
axis
==
1
)
{
}
else
if
(
axis
==
1
)
{
PADDLE_ENFORCE_EQ
(
target_box_dims
[
0
],
prior_box_dims
[
0
]);
PADDLE_ENFORCE_EQ
(
target_box_dims
[
0
],
prior_box_dims
[
0
]);
}
else
{
PADDLE_THROW
(
"axis must be 0 or 1."
);
}
}
PADDLE_ENFORCE_EQ
(
target_box_dims
[
2
],
prior_box_dims
[
1
]);
PADDLE_ENFORCE_EQ
(
target_box_dims
[
2
],
prior_box_dims
[
1
]);
}
ctx
->
ShareDim
(
"TargetBox"
,
/*->*/
"OutputBox"
);
ctx
->
ShareDim
(
"TargetBox"
,
/*->*/
"OutputBox"
);
}
}
...
...
paddle/fluid/operators/detection/yolo_box_op.cc
0 → 100644
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/detection/yolo_box_op.h"
#include "paddle/fluid/framework/op_registry.h"
namespace
paddle
{
namespace
operators
{
using
framework
::
Tensor
;
class
YoloBoxOp
:
public
framework
::
OperatorWithKernel
{
public:
using
framework
::
OperatorWithKernel
::
OperatorWithKernel
;
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"X"
),
"Input(X) of YoloBoxOp should not be null."
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"ImgSize"
),
"Input(ImgSize) of YoloBoxOp should not be null."
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"Boxes"
),
"Output(Boxes) of YoloBoxOp should not be null."
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"Scores"
),
"Output(Scores) of YoloBoxOp should not be null."
);
auto
dim_x
=
ctx
->
GetInputDim
(
"X"
);
auto
dim_imgsize
=
ctx
->
GetInputDim
(
"ImgSize"
);
auto
anchors
=
ctx
->
Attrs
().
Get
<
std
::
vector
<
int
>>
(
"anchors"
);
int
anchor_num
=
anchors
.
size
()
/
2
;
auto
class_num
=
ctx
->
Attrs
().
Get
<
int
>
(
"class_num"
);
PADDLE_ENFORCE_EQ
(
dim_x
.
size
(),
4
,
"Input(X) should be a 4-D tensor."
);
PADDLE_ENFORCE_EQ
(
dim_x
[
1
],
anchor_num
*
(
5
+
class_num
),
"Input(X) dim[1] should be equal to (anchor_mask_number * (5 "
"+ class_num))."
);
PADDLE_ENFORCE_EQ
(
dim_imgsize
.
size
(),
2
,
"Input(ImgSize) should be a 2-D tensor."
);
PADDLE_ENFORCE_EQ
(
dim_imgsize
[
0
],
dim_x
[
0
],
"Input(ImgSize) dim[0] and Input(X) dim[0] should be same."
);
PADDLE_ENFORCE_EQ
(
dim_imgsize
[
1
],
2
,
"Input(ImgSize) dim[1] should be 2."
);
PADDLE_ENFORCE_GT
(
anchors
.
size
(),
0
,
"Attr(anchors) length should be greater than 0."
);
PADDLE_ENFORCE_EQ
(
anchors
.
size
()
%
2
,
0
,
"Attr(anchors) length should be even integer."
);
PADDLE_ENFORCE_GT
(
class_num
,
0
,
"Attr(class_num) should be an integer greater than 0."
);
int
box_num
=
dim_x
[
2
]
*
dim_x
[
3
]
*
anchor_num
;
std
::
vector
<
int64_t
>
dim_boxes
({
dim_x
[
0
],
box_num
,
4
});
ctx
->
SetOutputDim
(
"Boxes"
,
framework
::
make_ddim
(
dim_boxes
));
std
::
vector
<
int64_t
>
dim_scores
({
dim_x
[
0
],
box_num
,
class_num
});
ctx
->
SetOutputDim
(
"Scores"
,
framework
::
make_ddim
(
dim_scores
));
}
protected:
framework
::
OpKernelType
GetExpectedKernelType
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
return
framework
::
OpKernelType
(
ctx
.
Input
<
Tensor
>
(
"X"
)
->
type
(),
ctx
.
GetPlace
());
}
};
class
YoloBoxOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
void
Make
()
override
{
AddInput
(
"X"
,
"The input tensor of YoloBox operator is a 4-D tensor with "
"shape of [N, C, H, W]. The second dimension(C) stores "
"box locations, confidence score and classification one-hot "
"keys of each anchor box. Generally, X should be the output "
"of YOLOv3 network."
);
AddInput
(
"ImgSize"
,
"The image size tensor of YoloBox operator, "
"This is a 2-D tensor with shape of [N, 2]. This tensor holds "
"height and width of each input image used for resizing output "
"box in input image scale."
);
AddOutput
(
"Boxes"
,
"The output tensor of detection boxes of YoloBox operator, "
"This is a 3-D tensor with shape of [N, M, 4], N is the "
"batch num, M is output box number, and the 3rd dimension "
"stores [xmin, ymin, xmax, ymax] coordinates of boxes."
);
AddOutput
(
"Scores"
,
"The output tensor of detection boxes scores of YoloBox "
"operator, This is a 3-D tensor with shape of "
"[N, M, :attr:`class_num`], N is the batch num, M is "
"output box number."
);
AddAttr
<
int
>
(
"class_num"
,
"The number of classes to predict."
);
AddAttr
<
std
::
vector
<
int
>>
(
"anchors"
,
"The anchor width and height, "
"it will be parsed pair by pair."
)
.
SetDefault
(
std
::
vector
<
int
>
{});
AddAttr
<
int
>
(
"downsample_ratio"
,
"The downsample ratio from network input to YoloBox operator "
"input, so 32, 16, 8 should be set for the first, second, "
"and third YoloBox operators."
)
.
SetDefault
(
32
);
AddAttr
<
float
>
(
"conf_thresh"
,
"The confidence scores threshold of detection boxes. "
"Boxes with confidence scores under threshold should "
"be ignored."
)
.
SetDefault
(
0.01
);
AddComment
(
R"DOC(
This operator generates YOLO detection boxes from the output of a YOLOv3 network.
The output of the previous network has shape [N, C, H, W], where H and W
should be the same. H and W specify the grid size, and each grid point predicts
a given number of boxes; this number, denoted S below,
is specified by the number of anchors. In the second dimension (the channel
dimension), C should be equal to S * (5 + class_num), where class_num is the number
of object categories in the source dataset (such as 80 in the COCO dataset). So the
second (channel) dimension, apart from the 4 box location coordinates x, y, w, h,
also includes the confidence score of the box and the class one-hot key of each anchor
box.
Assume the 4 location coordinates are :math:`t_x, t_y, t_w, t_h`, the box
predictions should be as follows:
$$
b_x = \\sigma(t_x) + c_x
$$
$$
b_y = \\sigma(t_y) + c_y
$$
$$
b_w = p_w e^{t_w}
$$
$$
b_h = p_h e^{t_h}
$$
in the equation above, :math:`c_x, c_y` is the left top corner of current grid
and :math:`p_w, p_h` is specified by anchors.
The logistic regression value of the 5th channel of each anchor prediction box
represents the confidence score of that prediction box, and the logistic
regression values of the last :attr:`class_num` channels of each anchor prediction
box represent the classification scores. Boxes with confidence scores less than
:attr:`conf_thresh` should be ignored, and the final box score is the product of
the confidence score and the classification score.
$$
score_{pred} = score_{conf} * score_{class}
$$
)DOC"
);
}
};
}
// namespace operators
}
// namespace paddle
namespace
ops
=
paddle
::
operators
;
REGISTER_OPERATOR
(
yolo_box
,
ops
::
YoloBoxOp
,
ops
::
YoloBoxOpMaker
,
paddle
::
framework
::
EmptyGradOpMaker
);
REGISTER_OP_CPU_KERNEL
(
yolo_box
,
ops
::
YoloBoxKernel
<
float
>
,
ops
::
YoloBoxKernel
<
double
>
);
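Note on the decoding equations in the operator comment above: the sketch below applies them to a single prediction, using the rescaling to image pixels and the corner clipping that `GetYoloBox` and `CalcDetectionBox` in `yolo_box_op.h` perform. It is a host-only illustration under stated assumptions: `Box`, `Sigmoid`, `DecodeBox`, and `FinalScore` are hypothetical names, scalars replace the strided tensor reads, and the `conf_thresh` filter is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Box { double xmin, ymin, xmax, ymax; };

inline double Sigmoid(double v) { return 1.0 / (1.0 + std::exp(-v)); }

// Decode one prediction (tx, ty, tw, th) at grid cell (col, row).
// grid_size is H (assumed equal to W), input_size is
// downsample_ratio * H, and (anchor_w, anchor_h) is one anchor pair.
Box DecodeBox(double tx, double ty, double tw, double th, int col, int row,
              int grid_size, int input_size, int anchor_w, int anchor_h,
              int img_w, int img_h) {
  // Center: b = sigmoid(t) + c, rescaled from grid units to image pixels.
  const double cx = (col + Sigmoid(tx)) * img_w / grid_size;
  const double cy = (row + Sigmoid(ty)) * img_h / grid_size;
  // Size: b = p * exp(t), rescaled from network input to image pixels.
  const double bw = std::exp(tw) * anchor_w * img_w / input_size;
  const double bh = std::exp(th) * anchor_h * img_h / input_size;
  // Convert center/size to corners and clip to [0, img - 1].
  Box b;
  b.xmin = std::max(cx - bw / 2, 0.0);
  b.ymin = std::max(cy - bh / 2, 0.0);
  b.xmax = std::min(cx + bw / 2, static_cast<double>(img_w - 1));
  b.ymax = std::min(cy + bh / 2, static_cast<double>(img_h - 1));
  return b;
}

// Final per-class score: objectness confidence times class probability.
inline double FinalScore(double conf_logit, double class_logit) {
  return Sigmoid(conf_logit) * Sigmoid(class_logit);
}
```

With all four logits at zero, cell (col 1, row 2) on a 4x4 grid, a 32x64 anchor, a 128-pixel network input, and a 256x256 image, the center lands at (96, 160) with size 64x128, so the corners are (64, 96, 128, 224).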
paddle/fluid/operators/detection/yolo_box_op.cu
0 → 100644
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/detection/yolo_box_op.h"
#include "paddle/fluid/operators/math/math_function.h"
namespace
paddle
{
namespace
operators
{
using
Tensor
=
framework
::
Tensor
;
template
<
typename
T
>
__global__
void
KeYoloBoxFw
(
const
T
*
input
,
const
int
*
imgsize
,
T
*
boxes
,
T
*
scores
,
const
float
conf_thresh
,
const
int
*
anchors
,
const
int
n
,
const
int
h
,
const
int
w
,
const
int
an_num
,
const
int
class_num
,
const
int
box_num
,
int
input_size
)
{
int
tid
=
blockIdx
.
x
*
blockDim
.
x
+
threadIdx
.
x
;
int
stride
=
blockDim
.
x
*
gridDim
.
x
;
T
box
[
4
];
for
(;
tid
<
n
*
box_num
;
tid
+=
stride
)
{
int
grid_num
=
h
*
w
;
int
i
=
tid
/
box_num
;
int
j
=
(
tid
%
box_num
)
/
grid_num
;
int
k
=
(
tid
%
grid_num
)
/
w
;
int
l
=
tid
%
w
;
int
an_stride
=
(
5
+
class_num
)
*
grid_num
;
int
img_height
=
imgsize
[
2
*
i
];
int
img_width
=
imgsize
[
2
*
i
+
1
];
int
obj_idx
=
GetEntryIndex
(
i
,
j
,
k
*
w
+
l
,
an_num
,
an_stride
,
grid_num
,
4
);
T
conf
=
sigmoid
<
T
>
(
input
[
obj_idx
]);
if
(
conf
<
conf_thresh
)
{
continue
;
}
int
box_idx
=
GetEntryIndex
(
i
,
j
,
k
*
w
+
l
,
an_num
,
an_stride
,
grid_num
,
0
);
GetYoloBox
<
T
>
(
box
,
input
,
anchors
,
l
,
k
,
j
,
h
,
input_size
,
box_idx
,
grid_num
,
img_height
,
img_width
);
box_idx
=
(
i
*
box_num
+
j
*
grid_num
+
k
*
w
+
l
)
*
4
;
CalcDetectionBox
<
T
>
(
boxes
,
box
,
box_idx
,
img_height
,
img_width
);
int
label_idx
=
GetEntryIndex
(
i
,
j
,
k
*
w
+
l
,
an_num
,
an_stride
,
grid_num
,
5
);
int
score_idx
=
(
i
*
box_num
+
j
*
grid_num
+
k
*
w
+
l
)
*
class_num
;
CalcLabelScore
<
T
>
(
scores
,
input
,
label_idx
,
score_idx
,
class_num
,
conf
,
grid_num
);
}
}
template
<
typename
T
>
class
YoloBoxOpCUDAKernel
:
public
framework
::
OpKernel
<
T
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
*
input
=
ctx
.
Input
<
Tensor
>
(
"X"
);
auto
*
img_size
=
ctx
.
Input
<
Tensor
>
(
"ImgSize"
);
auto
*
boxes
=
ctx
.
Output
<
Tensor
>
(
"Boxes"
);
auto
*
scores
=
ctx
.
Output
<
Tensor
>
(
"Scores"
);
auto
anchors
=
ctx
.
Attr
<
std
::
vector
<
int
>>
(
"anchors"
);
int
class_num
=
ctx
.
Attr
<
int
>
(
"class_num"
);
float
conf_thresh
=
ctx
.
Attr
<
float
>
(
"conf_thresh"
);
int
downsample_ratio
=
ctx
.
Attr
<
int
>
(
"downsample_ratio"
);
const
int
n
=
input
->
dims
()[
0
];
const
int
h
=
input
->
dims
()[
2
];
const
int
w
=
input
->
dims
()[
3
];
const
int
box_num
=
boxes
->
dims
()[
1
];
const
int
an_num
=
anchors
.
size
()
/
2
;
int
input_size
=
downsample_ratio
*
h
;
auto
&
dev_ctx
=
ctx
.
cuda_device_context
();
auto
&
allocator
=
platform
::
DeviceTemporaryAllocator
::
Instance
().
Get
(
dev_ctx
);
int
bytes
=
sizeof
(
int
)
*
anchors
.
size
();
auto
anchors_ptr
=
allocator
.
Allocate
(
sizeof
(
int
)
*
anchors
.
size
());
int
*
anchors_data
=
reinterpret_cast
<
int
*>
(
anchors_ptr
->
ptr
());
const
auto
gplace
=
boost
::
get
<
platform
::
CUDAPlace
>
(
ctx
.
GetPlace
());
const
auto
cplace
=
platform
::
CPUPlace
();
memory
::
Copy
(
gplace
,
anchors_data
,
cplace
,
anchors
.
data
(),
bytes
,
dev_ctx
.
stream
());
const
T
*
input_data
=
input
->
data
<
T
>
();
const
int
*
imgsize_data
=
img_size
->
data
<
int
>
();
T
*
boxes_data
=
boxes
->
mutable_data
<
T
>
({
n
,
box_num
,
4
},
ctx
.
GetPlace
());
T
*
scores_data
=
scores
->
mutable_data
<
T
>
({
n
,
box_num
,
class_num
},
ctx
.
GetPlace
());
math
::
SetConstant
<
platform
::
CUDADeviceContext
,
T
>
set_zero
;
set_zero
(
dev_ctx
,
boxes
,
static_cast
<
T
>
(
0
));
set_zero
(
dev_ctx
,
scores
,
static_cast
<
T
>
(
0
));
int
grid_dim
=
(
n
*
box_num
+
512
-
1
)
/
512
;
grid_dim
=
grid_dim
>
8
?
8
:
grid_dim
;
KeYoloBoxFw
<
T
><<<
grid_dim
,
512
,
0
,
ctx
.
cuda_device_context
().
stream
()
>>>
(
input_data
,
imgsize_data
,
boxes_data
,
scores_data
,
conf_thresh
,
anchors_data
,
n
,
h
,
w
,
an_num
,
class_num
,
box_num
,
input_size
);
}
};
}
// namespace operators
}
// namespace paddle
namespace
ops
=
paddle
::
operators
;
REGISTER_OP_CUDA_KERNEL
(
yolo_box
,
ops
::
YoloBoxOpCUDAKernel
<
float
>
,
ops
::
YoloBoxOpCUDAKernel
<
double
>
);
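Note on the indexing in `KeYoloBoxFw` above: each flat id `tid` in `[0, n * box_num)` is split into batch `i`, anchor `j`, grid row `k`, and grid column `l`, and `GetEntryIndex` then locates a channel entry inside the `[N, an_num * (5 + class_num), H, W]` input. The sketch below reproduces that arithmetic on the host so it can be checked by hand. `Cell` and `Decompose` are hypothetical names; `EntryIndex` copies the formula of `GetEntryIndex` from `yolo_box_op.h`.

```cpp
#include <cassert>

struct Cell { int batch, anchor, row, col; };

// Split a flat box id into its coordinates, mirroring i, j, k, l in the
// kernel. Assumes box_num == an_num * h * w.
Cell Decompose(int tid, int box_num, int h, int w) {
  const int grid_num = h * w;
  Cell c;
  c.batch = tid / box_num;
  c.anchor = (tid % box_num) / grid_num;
  c.row = (tid % grid_num) / w;
  c.col = tid % w;
  return c;
}

// Offset of channel `entry` (0..3 box, 4 confidence, 5+ classes) for one
// anchor at grid position hw_idx. an_stride is (5 + class_num) * h * w
// and stride is h * w.
int EntryIndex(int batch, int an_idx, int hw_idx, int an_num, int an_stride,
               int stride, int entry) {
  return (batch * an_num + an_idx) * an_stride + entry * stride + hw_idx;
}
```

For instance, with `n = 2`, `an_num = 3`, `h = w = 2`, and `class_num = 1`, id 17 maps to batch 1, anchor 1, row 0, column 1, and its confidence entry (entry 4) sits at offset 113 of the 144-element input. Since the launch caps the grid at 8 blocks of 512 threads, the kernel relies on its `tid += stride` loop to cover any remaining ids.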
paddle/fluid/operators/detection/yolo_box_op.h
0 → 100644
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <algorithm>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/hostdevice.h"
namespace paddle {
namespace operators {

using Tensor = framework::Tensor;

template <typename T>
HOSTDEVICE inline T sigmoid(T x) {
  return 1.0 / (1.0 + std::exp(-x));
}
template <typename T>
HOSTDEVICE inline void GetYoloBox(T* box, const T* x, const int* anchors, int i,
                                  int j, int an_idx, int grid_size,
                                  int input_size, int index, int stride,
                                  int img_height, int img_width) {
  box[0] = (i + sigmoid<T>(x[index])) * img_width / grid_size;
  box[1] = (j + sigmoid<T>(x[index + stride])) * img_height / grid_size;
  box[2] = std::exp(x[index + 2 * stride]) * anchors[2 * an_idx] * img_width /
           input_size;
  box[3] = std::exp(x[index + 3 * stride]) * anchors[2 * an_idx + 1] *
           img_height / input_size;
}
HOSTDEVICE inline int GetEntryIndex(int batch, int an_idx, int hw_idx,
                                    int an_num, int an_stride, int stride,
                                    int entry) {
  return (batch * an_num + an_idx) * an_stride + entry * stride + hw_idx;
}
template <typename T>
HOSTDEVICE inline void CalcDetectionBox(T* boxes, T* box, const int box_idx,
                                        const int img_height,
                                        const int img_width) {
  boxes[box_idx] = box[0] - box[2] / 2;
  boxes[box_idx + 1] = box[1] - box[3] / 2;
  boxes[box_idx + 2] = box[0] + box[2] / 2;
  boxes[box_idx + 3] = box[1] + box[3] / 2;

  boxes[box_idx] = boxes[box_idx] > 0 ? boxes[box_idx] : static_cast<T>(0);
  boxes[box_idx + 1] =
      boxes[box_idx + 1] > 0 ? boxes[box_idx + 1] : static_cast<T>(0);
  boxes[box_idx + 2] = boxes[box_idx + 2] < img_width - 1
                           ? boxes[box_idx + 2]
                           : static_cast<T>(img_width - 1);
  boxes[box_idx + 3] = boxes[box_idx + 3] < img_height - 1
                           ? boxes[box_idx + 3]
                           : static_cast<T>(img_height - 1);
}
template <typename T>
HOSTDEVICE inline void CalcLabelScore(T* scores, const T* input,
                                      const int label_idx, const int score_idx,
                                      const int class_num, const T conf,
                                      const int stride) {
  for (int i = 0; i < class_num; i++) {
    scores[score_idx + i] = conf * sigmoid<T>(input[label_idx + i * stride]);
  }
}
template <typename T>
class YoloBoxKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto* input = ctx.Input<Tensor>("X");
    auto* imgsize = ctx.Input<Tensor>("ImgSize");
    auto* boxes = ctx.Output<Tensor>("Boxes");
    auto* scores = ctx.Output<Tensor>("Scores");
    auto anchors = ctx.Attr<std::vector<int>>("anchors");
    int class_num = ctx.Attr<int>("class_num");
    float conf_thresh = ctx.Attr<float>("conf_thresh");
    int downsample_ratio = ctx.Attr<int>("downsample_ratio");

    const int n = input->dims()[0];
    const int h = input->dims()[2];
    const int w = input->dims()[3];
    const int box_num = boxes->dims()[1];
    const int an_num = anchors.size() / 2;
    int input_size = downsample_ratio * h;

    const int stride = h * w;
    const int an_stride = (class_num + 5) * stride;

    Tensor anchors_;
    auto anchors_data =
        anchors_.mutable_data<int>({an_num * 2}, ctx.GetPlace());
    std::copy(anchors.begin(), anchors.end(), anchors_data);

    const T* input_data = input->data<T>();
    const int* imgsize_data = imgsize->data<int>();
    T* boxes_data = boxes->mutable_data<T>({n, box_num, 4}, ctx.GetPlace());
    memset(boxes_data, 0, boxes->numel() * sizeof(T));
    T* scores_data =
        scores->mutable_data<T>({n, box_num, class_num}, ctx.GetPlace());
    memset(scores_data, 0, scores->numel() * sizeof(T));

    T box[4];
    for (int i = 0; i < n; i++) {
      int img_height = imgsize_data[2 * i];
      int img_width = imgsize_data[2 * i + 1];

      for (int j = 0; j < an_num; j++) {
        for (int k = 0; k < h; k++) {
          for (int l = 0; l < w; l++) {
            int obj_idx =
                GetEntryIndex(i, j, k * w + l, an_num, an_stride, stride, 4);
            T conf = sigmoid<T>(input_data[obj_idx]);
            if (conf < conf_thresh) {
              continue;
            }

            int box_idx =
                GetEntryIndex(i, j, k * w + l, an_num, an_stride, stride, 0);
            GetYoloBox<T>(box, input_data, anchors_data, l, k, j, h,
                          input_size, box_idx, stride, img_height, img_width);
            box_idx = (i * box_num + j * stride + k * w + l) * 4;
            CalcDetectionBox<T>(boxes_data, box, box_idx, img_height,
                                img_width);

            int label_idx =
                GetEntryIndex(i, j, k * w + l, an_num, an_stride, stride, 5);
            int score_idx = (i * box_num + j * stride + k * w + l) * class_num;
            CalcLabelScore<T>(scores_data, input_data, label_idx, score_idx,
                              class_num, conf, stride);
          }
        }
      }
    }
  }
};

}  // namespace operators
}  // namespace paddle
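The CPU kernel above reads an NCHW tensor whose channel axis packs, for each anchor, four box terms, one objectness term and class_num class terms. Below is a minimal standalone sketch of the index arithmetic and box decoding, assuming float data. `EntryIndex`, `Sigmoid` and `DecodeBox` are illustrative names that mirror `GetEntryIndex`, `sigmoid`, `GetYoloBox` and `CalcDetectionBox` without the Paddle tensor types; they are not part of the commit.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

// Offset of channel `entry` (0-3 box, 4 objectness, 5+ classes) for anchor
// `an_idx` at flattened cell `hw_idx`; an_stride = (5 + class_num) * stride.
inline int EntryIndex(int batch, int an_idx, int hw_idx, int an_num,
                      int an_stride, int stride, int entry) {
  return (batch * an_num + an_idx) * an_stride + entry * stride + hw_idx;
}

inline float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Decodes one cell's raw prediction into [xmin, ymin, xmax, ymax] in image
// pixels, clipped to the image bounds as in CalcDetectionBox.
inline std::array<float, 4> DecodeBox(const std::vector<float>& x,
                                      const std::vector<int>& anchors, int col,
                                      int row, int an_idx, int grid_size,
                                      int input_size, int index, int stride,
                                      int img_height, int img_width) {
  // Center: cell offset plus sigmoid, rescaled from grid to image size.
  float cx = (col + Sigmoid(x[index])) * img_width / grid_size;
  float cy = (row + Sigmoid(x[index + stride])) * img_height / grid_size;
  // Size: anchor scaled by exp(prediction), rescaled from network input size.
  float bw = std::exp(x[index + 2 * stride]) * anchors[2 * an_idx] *
             img_width / input_size;
  float bh = std::exp(x[index + 3 * stride]) * anchors[2 * an_idx + 1] *
             img_height / input_size;
  return {std::max(cx - bw / 2, 0.0f), std::max(cy - bh / 2, 0.0f),
          std::min(cx + bw / 2, img_width - 1.0f),
          std::min(cy + bh / 2, img_height - 1.0f)};
}
```

With a 1x1 grid, a single 32x32 anchor, all-zero predictions and a 64x64 image, the decoded box is centered at (32, 32) with size 64x64, so clipping yields [0, 0, 63, 63].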
paddle/fluid/operators/detection/yolov3_loss_op.cc
View file @
161b8ddc
@@ -10,6 +10,7 @@
 limitations under the License. */
 #include "paddle/fluid/operators/detection/yolov3_loss_op.h"
+#include <memory>
 #include "paddle/fluid/framework/op_registry.h"
 namespace paddle {
@@ -72,6 +73,18 @@ class Yolov3LossOp : public framework::OperatorWithKernel {
     PADDLE_ENFORCE_GT(class_num, 0,
                       "Attr(class_num) should be an integer greater then 0.");
+    if (ctx->HasInput("GTScore")) {
+      auto dim_gtscore = ctx->GetInputDim("GTScore");
+      PADDLE_ENFORCE_EQ(dim_gtscore.size(), 2,
+                        "Input(GTScore) should be a 2-D tensor");
+      PADDLE_ENFORCE_EQ(
+          dim_gtscore[0], dim_gtbox[0],
+          "Input(GTBox) and Input(GTScore) dim[0] should be same");
+      PADDLE_ENFORCE_EQ(
+          dim_gtscore[1], dim_gtbox[1],
+          "Input(GTBox) and Input(GTScore) dim[1] should be same");
+    }
     std::vector<int64_t> dim_out({dim_x[0]});
     ctx->SetOutputDim("Loss", framework::make_ddim(dim_out));
@@ -112,6 +125,12 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
              "This is a 2-D tensor with shape of [N, max_box_num], "
              "and each element should be an integer to indicate the "
              "box class id.");
+    AddInput("GTScore",
+             "The score of GTLabel, This is a 2-D tensor in same shape "
+             "GTLabel, and score values should in range (0, 1). This "
+             "input is for GTLabel score can be not 1.0 in image mixup "
+             "augmentation.")
+        .AsDispensable();
     AddOutput("Loss",
               "The output yolov3 loss tensor, "
               "This is a 1-D tensor with shape of [N]");
@@ -143,6 +162,9 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
     AddAttr<float>("ignore_thresh",
                    "The ignore threshold to ignore confidence loss.")
         .SetDefault(0.7);
+    AddAttr<bool>("use_label_smooth",
+                  "Whether to use label smooth. Default True.")
+        .SetDefault(true);
     AddComment(R"DOC(
          This operator generates yolov3 loss based on given predict result and ground
          truth boxes.
@@ -204,6 +226,15 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
          loss = (loss_{xy} + loss_{wh}) * weight_{box}
               + loss_{conf} + loss_{class}
          $$
+         While :attr:`use_label_smooth` is set to be :attr:`True`, the classification
+         target will be smoothed when calculating classification loss, target of
+         positive samples will be smoothed to :math:`1.0 - 1.0 / class\_num` and target of
+         negetive samples will be smoothed to :math:`1.0 / class\_num`.
+         While :attr:`GTScore` is given, which means the mixup score of ground truth
+         boxes, all losses incured by a ground truth box will be multiplied by its
+         mixup score.
          )DOC");
   }
 };
@@ -240,6 +271,7 @@ class Yolov3LossGradMaker : public framework::SingleGradOpDescMaker {
     op->SetInput("X", Input("X"));
     op->SetInput("GTBox", Input("GTBox"));
     op->SetInput("GTLabel", Input("GTLabel"));
+    op->SetInput("GTScore", Input("GTScore"));
     op->SetInput(framework::GradVarName("Loss"), OutputGrad("Loss"));
     op->SetInput("ObjectnessMask", Output("ObjectnessMask"));
     op->SetInput("GTMatchMask", Output("GTMatchMask"));
@@ -249,6 +281,7 @@ class Yolov3LossGradMaker : public framework::SingleGradOpDescMaker {
     op->SetOutput(framework::GradVarName("X"), InputGrad("X"));
     op->SetOutput(framework::GradVarName("GTBox"), {});
     op->SetOutput(framework::GradVarName("GTLabel"), {});
+    op->SetOutput(framework::GradVarName("GTScore"), {});
     return std::unique_ptr<framework::OpDesc>(op);
   }
 };
paddle/fluid/operators/detection/yolov3_loss_op.h
View file @
161b8ddc
@@ -37,8 +37,8 @@ static T SigmoidCrossEntropy(T x, T label) {
 }
 template <typename T>
-static T L2Loss(T x, T y) {
-  return 0.5 * (y - x) * (y - x);
+static T L1Loss(T x, T y) {
+  return std::abs(y - x);
 }
 template <typename T>
@@ -47,8 +47,8 @@ static T SigmoidCrossEntropyGrad(T x, T label) {
 }
 template <typename T>
-static T L2LossGrad(T x, T y) {
-  return x - y;
+static T L1LossGrad(T x, T y) {
+  return x > y ? 1.0 : -1.0;
 }
 static int GetMaskIndex(std::vector<int> mask, int val) {
@@ -121,47 +121,49 @@ template <typename T>
 static void CalcBoxLocationLoss(T* loss, const T* input, Box<T> gt,
                                 std::vector<int> anchors, int an_idx,
                                 int box_idx, int gi, int gj, int grid_size,
-                                int input_size, int stride) {
+                                int input_size, int stride, T score) {
   T tx = gt.x * grid_size - gi;
   T ty = gt.y * grid_size - gj;
   T tw = std::log(gt.w * input_size / anchors[2 * an_idx]);
   T th = std::log(gt.h * input_size / anchors[2 * an_idx + 1]);
-  T scale = (2.0 - gt.w * gt.h);
+  T scale = (2.0 - gt.w * gt.h) * score;
   loss[0] += SigmoidCrossEntropy<T>(input[box_idx], tx) * scale;
   loss[0] += SigmoidCrossEntropy<T>(input[box_idx + stride], ty) * scale;
-  loss[0] += L2Loss<T>(input[box_idx + 2 * stride], tw) * scale;
-  loss[0] += L2Loss<T>(input[box_idx + 3 * stride], th) * scale;
+  loss[0] += L1Loss<T>(input[box_idx + 2 * stride], tw) * scale;
+  loss[0] += L1Loss<T>(input[box_idx + 3 * stride], th) * scale;
 }
 template <typename T>
 static void CalcBoxLocationLossGrad(T* input_grad, const T loss, const T* input,
                                     Box<T> gt, std::vector<int> anchors,
                                     int an_idx, int box_idx, int gi, int gj,
-                                    int grid_size, int input_size, int stride) {
+                                    int grid_size, int input_size, int stride,
+                                    T score) {
   T tx = gt.x * grid_size - gi;
   T ty = gt.y * grid_size - gj;
   T tw = std::log(gt.w * input_size / anchors[2 * an_idx]);
   T th = std::log(gt.h * input_size / anchors[2 * an_idx + 1]);
-  T scale = (2.0 - gt.w * gt.h);
+  T scale = (2.0 - gt.w * gt.h) * score;
   input_grad[box_idx] =
       SigmoidCrossEntropyGrad<T>(input[box_idx], tx) * scale * loss;
   input_grad[box_idx + stride] =
       SigmoidCrossEntropyGrad<T>(input[box_idx + stride], ty) * scale * loss;
   input_grad[box_idx + 2 * stride] =
-      L2LossGrad<T>(input[box_idx + 2 * stride], tw) * scale * loss;
+      L1LossGrad<T>(input[box_idx + 2 * stride], tw) * scale * loss;
   input_grad[box_idx + 3 * stride] =
-      L2LossGrad<T>(input[box_idx + 3 * stride], th) * scale * loss;
+      L1LossGrad<T>(input[box_idx + 3 * stride], th) * scale * loss;
 }
 template <typename T>
 static inline void CalcLabelLoss(T* loss, const T* input, const int index,
                                  const int label, const int class_num,
-                                 const int stride) {
+                                 const int stride, const T pos, const T neg,
+                                 T score) {
   for (int i = 0; i < class_num; i++) {
     T pred = input[index + i * stride];
-    loss[0] += SigmoidCrossEntropy<T>(pred, (i == label) ? 1.0 : 0.0);
+    loss[0] += SigmoidCrossEntropy<T>(pred, (i == label) ? pos : neg) * score;
   }
 }
@@ -169,11 +171,13 @@ template <typename T>
 static inline void CalcLabelLossGrad(T* input_grad, const T loss,
                                      const T* input, const int index,
                                      const int label, const int class_num,
-                                     const int stride) {
+                                     const int stride, const T pos, const T neg,
+                                     T score) {
   for (int i = 0; i < class_num; i++) {
     T pred = input[index + i * stride];
     input_grad[index + i * stride] =
-        SigmoidCrossEntropyGrad<T>(pred, (i == label) ? 1.0 : 0.0) * loss;
+        SigmoidCrossEntropyGrad<T>(pred, (i == label) ? pos : neg) * score *
+        loss;
   }
 }
@@ -188,8 +192,8 @@ static inline void CalcObjnessLoss(T* loss, const T* input, const T* objness,
         for (int l = 0; l < w; l++) {
           T obj = objness[k * w + l];
           if (obj > 1e-5) {
-            // positive sample: obj = 1
-            loss[i] += SigmoidCrossEntropy<T>(input[k * w + l], 1.0);
+            // positive sample: obj = mixup score
+            loss[i] += SigmoidCrossEntropy<T>(input[k * w + l], 1.0) * obj;
           } else if (obj > -0.5) {
             // negetive sample: obj = 0
             loss[i] += SigmoidCrossEntropy<T>(input[k * w + l], 0.0);
@@ -215,7 +219,8 @@ static inline void CalcObjnessLossGrad(T* input_grad, const T* loss,
           T obj = objness[k * w + l];
           if (obj > 1e-5) {
             input_grad[k * w + l] =
-                SigmoidCrossEntropyGrad<T>(input[k * w + l], 1.0) * loss[i];
+                SigmoidCrossEntropyGrad<T>(input[k * w + l], 1.0) * obj *
+                loss[i];
           } else if (obj > -0.5) {
             input_grad[k * w + l] =
                 SigmoidCrossEntropyGrad<T>(input[k * w + l], 0.0) * loss[i];
@@ -252,6 +257,7 @@ class Yolov3LossKernel : public framework::OpKernel<T> {
     auto* input = ctx.Input<Tensor>("X");
     auto* gt_box = ctx.Input<Tensor>("GTBox");
     auto* gt_label = ctx.Input<Tensor>("GTLabel");
+    auto* gt_score = ctx.Input<Tensor>("GTScore");
     auto* loss = ctx.Output<Tensor>("Loss");
     auto* objness_mask = ctx.Output<Tensor>("ObjectnessMask");
     auto* gt_match_mask = ctx.Output<Tensor>("GTMatchMask");
@@ -260,6 +266,7 @@ class Yolov3LossKernel : public framework::OpKernel<T> {
     int class_num = ctx.Attr<int>("class_num");
     float ignore_thresh = ctx.Attr<float>("ignore_thresh");
     int downsample_ratio = ctx.Attr<int>("downsample_ratio");
+    bool use_label_smooth = ctx.Attr<bool>("use_label_smooth");
     const int n = input->dims()[0];
     const int h = input->dims()[2];
@@ -272,6 +279,13 @@ class Yolov3LossKernel : public framework::OpKernel<T> {
     const int stride = h * w;
     const int an_stride = (class_num + 5) * stride;
+    T label_pos = 1.0;
+    T label_neg = 0.0;
+    if (use_label_smooth) {
+      label_pos = 1.0 - 1.0 / static_cast<T>(class_num);
+      label_neg = 1.0 / static_cast<T>(class_num);
+    }
     const T* input_data = input->data<T>();
     const T* gt_box_data = gt_box->data<T>();
     const int* gt_label_data = gt_label->data<int>();
@@ -283,6 +297,19 @@ class Yolov3LossKernel : public framework::OpKernel<T> {
     int* gt_match_mask_data =
         gt_match_mask->mutable_data<int>({n, b}, ctx.GetPlace());
+    const T* gt_score_data;
+    if (!gt_score) {
+      Tensor gtscore;
+      gtscore.mutable_data<T>({n, b}, ctx.GetPlace());
+      math::SetConstant<platform::CPUDeviceContext, T>()(
+          ctx.template device_context<platform::CPUDeviceContext>(), &gtscore,
+          static_cast<T>(1.0));
+      gt_score = &gtscore;
+      gt_score_data = gtscore.data<T>();
+    } else {
+      gt_score_data = gt_score->data<T>();
+    }
     // calc valid gt box mask, avoid calc duplicately in following code
     Tensor gt_valid_mask;
     bool* gt_valid_mask_data =
@@ -355,19 +382,20 @@ class Yolov3LossKernel : public framework::OpKernel<T> {
         int mask_idx = GetMaskIndex(anchor_mask, best_n);
         gt_match_mask_data[i * b + t] = mask_idx;
         if (mask_idx >= 0) {
+          T score = gt_score_data[i * b + t];
           int box_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num,
                                       an_stride, stride, 0);
           CalcBoxLocationLoss<T>(loss_data + i, input_data, gt, anchors, best_n,
-                                 box_idx, gi, gj, h, input_size, stride);
+                                 box_idx, gi, gj, h, input_size, stride, score);
           int obj_idx = (i * mask_num + mask_idx) * stride + gj * w + gi;
-          obj_mask_data[obj_idx] = 1.0;
+          obj_mask_data[obj_idx] = score;
           int label = gt_label_data[i * b + t];
           int label_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num,
                                         an_stride, stride, 5);
           CalcLabelLoss<T>(loss_data + i, input_data, label_idx, label,
-                           class_num, stride);
+                           class_num, stride, label_pos, label_neg, score);
         }
       }
     }
@@ -384,6 +412,7 @@ class Yolov3LossGradKernel : public framework::OpKernel<T> {
     auto* input = ctx.Input<Tensor>("X");
     auto* gt_box = ctx.Input<Tensor>("GTBox");
     auto* gt_label = ctx.Input<Tensor>("GTLabel");
+    auto* gt_score = ctx.Input<Tensor>("GTScore");
     auto* input_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
     auto* loss_grad = ctx.Input<Tensor>(framework::GradVarName("Loss"));
     auto* objness_mask = ctx.Input<Tensor>("ObjectnessMask");
@@ -392,6 +421,7 @@ class Yolov3LossGradKernel : public framework::OpKernel<T> {
     auto anchor_mask = ctx.Attr<std::vector<int>>("anchor_mask");
     int class_num = ctx.Attr<int>("class_num");
     int downsample_ratio = ctx.Attr<int>("downsample_ratio");
+    bool use_label_smooth = ctx.Attr<bool>("use_label_smooth");
     const int n = input_grad->dims()[0];
     const int c = input_grad->dims()[1];
@@ -404,6 +434,13 @@ class Yolov3LossGradKernel : public framework::OpKernel<T> {
     const int stride = h * w;
     const int an_stride = (class_num + 5) * stride;
+    T label_pos = 1.0;
+    T label_neg = 0.0;
+    if (use_label_smooth) {
+      label_pos = 1.0 - 1.0 / static_cast<T>(class_num);
+      label_neg = 1.0 / static_cast<T>(class_num);
+    }
     const T* input_data = input->data<T>();
     const T* gt_box_data = gt_box->data<T>();
     const int* gt_label_data = gt_label->data<int>();
@@ -414,25 +451,41 @@ class Yolov3LossGradKernel : public framework::OpKernel<T> {
         input_grad->mutable_data<T>({n, c, h, w}, ctx.GetPlace());
     memset(input_grad_data, 0, input_grad->numel() * sizeof(T));
+    const T* gt_score_data;
+    if (!gt_score) {
+      Tensor gtscore;
+      gtscore.mutable_data<T>({n, b}, ctx.GetPlace());
+      math::SetConstant<platform::CPUDeviceContext, T>()(
+          ctx.template device_context<platform::CPUDeviceContext>(), &gtscore,
+          static_cast<T>(1.0));
+      gt_score = &gtscore;
+      gt_score_data = gtscore.data<T>();
+    } else {
+      gt_score_data = gt_score->data<T>();
+    }
     for (int i = 0; i < n; i++) {
       for (int t = 0; t < b; t++) {
         int mask_idx = gt_match_mask_data[i * b + t];
         if (mask_idx >= 0) {
+          T score = gt_score_data[i * b + t];
           Box<T> gt = GetGtBox(gt_box_data, i, b, t);
           int gi = static_cast<int>(gt.x * w);
           int gj = static_cast<int>(gt.y * h);
           int box_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num,
                                       an_stride, stride, 0);
-          CalcBoxLocationLossGrad<T>(
-              input_grad_data, loss_grad_data[i], input_data, gt, anchors,
-              anchor_mask[mask_idx], box_idx, gi, gj, h, input_size, stride);
+          CalcBoxLocationLossGrad<T>(input_grad_data, loss_grad_data[i],
+                                     input_data, gt, anchors,
+                                     anchor_mask[mask_idx], box_idx, gi, gj, h,
+                                     input_size, stride, score);
           int label = gt_label_data[i * b + t];
           int label_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num,
                                         an_stride, stride, 5);
-          CalcLabelLossGrad<T>(input_grad_data, loss_grad_data[i], input_data,
-                               label_idx, label, class_num, stride);
+          CalcLabelLossGrad<T>(input_grad_data, loss_grad_data[i], input_data,
+                               label_idx, label, class_num, stride, label_pos,
+                               label_neg, score);
         }
       }
     }
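The yolov3_loss_op.h changes above combine three ideas: classification targets are smoothed to 1 - 1/class_num and 1/class_num when `use_label_smooth` is set, the width/height terms switch from an L2 to an L1 loss, and every term tied to a ground-truth box is multiplied by its mixup score. The sketch below restates that arithmetic on plain doubles. `SmoothedTargets` and `BoxSizeLoss` are hypothetical helper names introduced here, and `L1Loss` / `L1LossGrad` are simplified non-template copies of the functions in the diff.

```cpp
#include <cmath>
#include <utility>

// Returns {positive target, negative target} for the classification loss.
// Hypothetical helper: the commit computes label_pos / label_neg inline.
inline std::pair<double, double> SmoothedTargets(int class_num,
                                                 bool use_label_smooth) {
  if (!use_label_smooth) return {1.0, 0.0};
  double eps = 1.0 / class_num;
  return {1.0 - eps, eps};
}

// L1 loss and its (sub)gradient with respect to x, as in the diff.
inline double L1Loss(double x, double y) { return std::abs(y - x); }
inline double L1LossGrad(double x, double y) { return x > y ? 1.0 : -1.0; }

// Width/height part of the box loss for one ground-truth box. Small boxes get
// a larger weight through (2 - w * h); the mixup score scales the whole term.
inline double BoxSizeLoss(double pred_w, double pred_h, double tw, double th,
                          double gt_w, double gt_h, double score) {
  double scale = (2.0 - gt_w * gt_h) * score;
  return (L1Loss(pred_w, tw) + L1Loss(pred_h, th)) * scale;
}
```

With four classes and smoothing enabled the targets are 0.75 and 0.25. A score of 1.0 reproduces the unweighted loss, which matches the all-ones fallback tensor the kernels build when `GTScore` is absent.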
paddle/fluid/operators/expand_op.cc
View file @
161b8ddc
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #include "paddle/fluid/operators/expand_op.h"
+#include <memory>
 #include <vector>
 namespace paddle {
@@ -138,12 +139,28 @@ class ExpandGradOp : public framework::OperatorWithKernel {
   }
 };
+class ExpandGradOpDescMaker : public framework::SingleGradOpDescMaker {
+ public:
+  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
+ protected:
+  std::unique_ptr<framework::OpDesc> Apply() const override {
+    std::unique_ptr<framework::OpDesc> op(new framework::OpDesc());
+    op->SetType("expand_grad");
+    op->SetInput("X", Input("X"));
+    op->SetInput(framework::GradVarName("Out"), OutputGrad("Out"));
+    op->SetOutput(framework::GradVarName("X"), InputGrad("X"));
+    op->SetAttrMap(Attrs());
+    return op;
+  }
+};
 }  // namespace operators
 }  // namespace paddle
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(expand, ops::ExpandOp, ops::ExpandOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
+                  ops::ExpandGradOpDescMaker);
 REGISTER_OPERATOR(expand_grad, ops::ExpandGradOp);
 REGISTER_OP_CPU_KERNEL(
     expand, ops::ExpandKernel<paddle::platform::CPUDeviceContext, float>,
paddle/fluid/operators/fake_quantize_op.cc
View file @
161b8ddc
@@ -81,6 +81,30 @@ struct FindRangeAbsMaxFunctor<platform::CPUDeviceContext, T> {
 template struct FindRangeAbsMaxFunctor<platform::CPUDeviceContext, float>;
+template <typename T>
+struct FindMovingAverageAbsMaxFunctor<platform::CPUDeviceContext, T> {
+  void operator()(const platform::CPUDeviceContext& ctx,
+                  const framework::Tensor& in_accum,
+                  const framework::Tensor& in_state, const T* cur_scale,
+                  const float rate, framework::Tensor* out_state,
+                  framework::Tensor* out_accum, framework::Tensor* out_scale) {
+    T accum = in_accum.data<T>()[0];
+    T state = in_state.data<T>()[0];
+    T scale = cur_scale[0];
+    state = rate * state + 1;
+    accum = rate * accum + scale;
+    scale = accum / state;
+    out_state->mutable_data<T>(ctx.GetPlace())[0] = state;
+    out_accum->mutable_data<T>(ctx.GetPlace())[0] = accum;
+    out_scale->mutable_data<T>(ctx.GetPlace())[0] = scale;
+  }
+};
+template struct FindMovingAverageAbsMaxFunctor<platform::CPUDeviceContext,
+                                               float>;
 class FakeQuantizeAbsMaxOp : public framework::OperatorWithKernel {
  public:
   FakeQuantizeAbsMaxOp(const std::string& type,
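The `FindMovingAverageAbsMaxFunctor` added in this hunk keeps two running values: `state`, a decayed count of updates, and `accum`, a decayed sum of the observed scales. The output scale is their ratio. Below is a minimal sketch of that recurrence on plain floats. `MovingAverageScale` is a hypothetical struct introduced for illustration, and starting both values at zero is an assumption, since the initialization of `InState` and `InAccum` is not shown in this excerpt.

```cpp
// Hypothetical standalone version of the moving-average scale update.
struct MovingAverageScale {
  float state = 0.0f;  // decayed update count (assumed to start at 0)
  float accum = 0.0f;  // decayed sum of observed scales (assumed to start at 0)

  // Folds the current abs-max `cur_scale` into the running average and
  // returns the new scale, following the three assignments in the functor.
  float Update(float cur_scale, float rate) {
    state = rate * state + 1.0f;
    accum = rate * accum + cur_scale;
    return accum / state;
  }
};
```

Under these assumptions, with rate 0.5 the first update with scale 3 returns 3, and a second update with scale 6 returns (0.5 * 3 + 6) / (0.5 * 1 + 1) = 5. Dividing by `state` corrects the start-up bias that a plain exponential average would have.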
@@ -255,6 +279,78 @@ $$Out = round(X/scale * range)$$
   }
 };
+class FakeQuantizeMovingAverageAbsMaxOp : public framework::OperatorWithKernel {
+ public:
+  FakeQuantizeMovingAverageAbsMaxOp(const std::string& type,
+                                    const framework::VariableNameMap& inputs,
+                                    const framework::VariableNameMap& outputs,
+                                    const framework::AttributeMap& attrs)
+      : OperatorWithKernel(type, inputs, outputs, attrs) {}
+  void InferShape(framework::InferShapeContext* ctx) const override {
+    PADDLE_ENFORCE(
+        ctx->HasInput("X"),
+        "Input(X) of FakeQuantizeMovingAverageAbsMaxOp should not be null.");
+    PADDLE_ENFORCE(
+        ctx->HasOutput("Out"),
+        "Output(Out) of FakeQuantizeMovingAverageAbsMaxOp should not be null.");
+    PADDLE_ENFORCE(ctx->HasOutput("OutScale"),
+                   "Output(OutScale) of FakeQuantizeMovingAverageAbsMaxOp "
+                   "should not be null");
+    if (ctx->HasOutput("OutState")) {
+      ctx->SetOutputDim("OutState", {1});
+    }
+    if (ctx->HasOutput("OutAccum")) {
+      ctx->SetOutputDim("OutAccum", {1});
+    }
+    ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
+    ctx->SetOutputDim("OutScale", {1});
+    ctx->ShareLoD("X", /*->*/ "Out");
+  }
+ protected:
+  framework::OpKernelType GetExpectedKernelType(
+      const framework::ExecutionContext& ctx) const override {
+    return framework::OpKernelType(
+        ctx.Input<framework::LoDTensor>("X")->type(), ctx.device_context());
+  }
+};
+class FakeQuantizeMovingAverageAbsMaxOpMaker
+    : public framework::OpProtoAndCheckerMaker {
+ public:
+  void Make() override {
+    AddInput("X", "(Tensor) Input is float data type.");
+    AddInput("InScale", "Last scale.");
+    AddInput("InAccum", "Last accum.").AsDispensable();
+    AddInput("InState", "Last state.").AsDispensable();
+    AddOutput("Out", "(Tensor) Output of quantized low level tensor.");
+    AddOutput("OutScale", " Current scale");
+    AddOutput("OutState", "(Tensor) state buffer.").AsDispensable();
+    AddOutput("OutAccum", "(Tensor) accum buffer.").AsDispensable();
+    AddAttr<float>("moving_rate", "(float, default 0.9) moving rate.")
+        .SetDefault(0.9);
AddAttr
<
int
>
(
"bit_length"
,
"(int, default 8), quantization bit number."
)
.
SetDefault
(
8
)
.
AddCustomChecker
([](
const
int
&
bit_length
)
{
PADDLE_ENFORCE
(
bit_length
>=
1
&&
bit_length
<=
16
,
"'bit_length' should be between 1 and 16."
);
});
AddAttr
<
bool
>
(
"is_test"
,
"(bool, default false) Set to true for inference only, false "
"for training. Some layers may run faster when this is true."
)
.
SetDefault
(
false
);
AddComment
(
R"DOC(
FakeQuantize operator is used in static quantization.
$$scale = (0.9*max(abs(x))+accum)/(0.9*state+1)$$
$$range = 2^{bit_length - 1} - 1$$
$$Out = round(X/scale * range)$$
)DOC"
);
}
};
}
// namespace operators
}
// namespace operators
}
// namespace paddle
}
// namespace paddle
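The range and Out formulas in the operator comment above can be checked with a small standalone sketch. This is an illustration under stated assumptions, not Paddle code: BinCount and ClipAndFakeQuant are hypothetical free functions, and the clipping of X to [-scale, scale] is assumed from the name of ClipAndFakeQuantFunctor, whose body does not appear in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// range = 2^(bit_length - 1) - 1, e.g. 127 for 8 bits.
inline int BinCount(int bit_length) { return (1 << (bit_length - 1)) - 1; }

// Assumed behaviour: clip x to [-scale, scale], then compute
// round(x / scale * range). The result is an integer level kept in a float.
inline float ClipAndFakeQuant(float x, float scale, int bin_cnt) {
  const float clipped = std::max(-scale, std::min(x, scale));
  return std::round(clipped / scale * static_cast<float>(bin_cnt));
}
```

With bit_length = 8 and scale = 1.0, an input of 1.0 maps to level 127, an input of 2.0 saturates at 127, and -0.25 maps to -32.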
...
@@ -273,6 +369,12 @@ REGISTER_OPERATOR(fake_quantize_range_abs_max, ops::FakeQuantizeRangeAbsMaxOp,
 REGISTER_OP_CPU_KERNEL(fake_quantize_range_abs_max,
                        ops::FakeQuantizeRangeAbsMaxKernel<CPU, float>);
+REGISTER_OPERATOR(fake_quantize_moving_average_abs_max,
+                  ops::FakeQuantizeMovingAverageAbsMaxOp,
+                  ops::FakeQuantizeMovingAverageAbsMaxOpMaker,
+                  paddle::framework::EmptyGradOpMaker);
+REGISTER_OP_CPU_KERNEL(fake_quantize_moving_average_abs_max,
+                       ops::FakeQuantizeMovingAverageAbsMaxKernel<CPU, float>);
 REGISTER_OPERATOR(fake_channel_wise_quantize_abs_max,
                   ops::FakeChannelWiseQuantizeAbsMaxOp,
                   ops::FakeChannelWiseQuantizeAbsMaxOpMaker,
...
paddle/fluid/operators/fake_quantize_op.cu
...
@@ -147,6 +147,41 @@ struct FindRangeAbsMaxFunctor<platform::CUDADeviceContext, T> {
 template struct FindRangeAbsMaxFunctor<platform::CUDADeviceContext, float>;
+template <typename T>
+struct FindMovingAverageAbsMaxFunctor<platform::CUDADeviceContext, T> {
+  void operator()(const platform::CUDADeviceContext& ctx,
+                  const framework::Tensor& in_accum,
+                  const framework::Tensor& in_state, const T* cur_scale,
+                  const float rate, framework::Tensor* out_state,
+                  framework::Tensor* out_accum, framework::Tensor* out_scale) {
+    const auto gpu_place = boost::get<platform::CUDAPlace>(ctx.GetPlace());
+    T accum;
+    memory::Copy(platform::CPUPlace(), &accum, gpu_place, in_accum.data<T>(),
+                 sizeof(T), 0);
+    T state;
+    memory::Copy(platform::CPUPlace(), &state, gpu_place, in_state.data<T>(),
+                 sizeof(T), 0);
+    T scale;
+    memory::Copy(platform::CPUPlace(), &scale, gpu_place, cur_scale, sizeof(T),
+                 0);
+    state = rate * state + 1;
+    accum = rate * accum + scale;
+    scale = accum / state;
+    memory::Copy(gpu_place, out_accum->mutable_data<T>(gpu_place),
+                 platform::CPUPlace(), &accum, sizeof(T), 0);
+    memory::Copy(gpu_place, out_state->mutable_data<T>(gpu_place),
+                 platform::CPUPlace(), &state, sizeof(T), 0);
+    memory::Copy(gpu_place, out_scale->mutable_data<T>(gpu_place),
+                 platform::CPUPlace(), &scale, sizeof(T), 0);
+  }
+};
+template struct FindMovingAverageAbsMaxFunctor<platform::CUDADeviceContext,
+                                               float>;
 template <typename T>
 struct ClipAndFakeQuantFunctor<platform::CUDADeviceContext, T> {
   void operator()(const platform::CUDADeviceContext& ctx,
...
@@ -178,3 +213,6 @@ REGISTER_OP_CUDA_KERNEL(fake_channel_wise_quantize_abs_max,
                         ops::FakeChannelWiseQuantizeAbsMaxKernel<CUDA, float>);
 REGISTER_OP_CUDA_KERNEL(fake_quantize_range_abs_max,
                         ops::FakeQuantizeRangeAbsMaxKernel<CUDA, float>);
+REGISTER_OP_CUDA_KERNEL(
+    fake_quantize_moving_average_abs_max,
+    ops::FakeQuantizeMovingAverageAbsMaxKernel<CUDA, float>);
paddle/fluid/operators/fake_quantize_op.h
...
@@ -42,12 +42,20 @@ struct FindRangeAbsMaxFunctor {
                   framework::Tensor* scales_arr, framework::Tensor* out_scale);
 };
+template <typename DeviceContext, typename T>
+struct FindMovingAverageAbsMaxFunctor {
+  void operator()(const DeviceContext& ctx, const framework::Tensor& in_accum,
+                  const framework::Tensor& in_state,
+                  const framework::Tensor& cur_scale,
+                  framework::Tensor* out_state, framework::Tensor* out_accum,
+                  framework::Tensor* out_scale);
+};
 template <typename DeviceContext, typename T>
 class FakeQuantizeAbsMaxKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto* in = context.Input<framework::Tensor>("X");
     auto* out = context.Output<framework::Tensor>("Out");
     auto* out_scale = context.Output<framework::Tensor>("OutScale");
     T* out_s = out_scale->mutable_data<T>(context.GetPlace());
...
@@ -138,5 +146,54 @@ class FakeQuantizeRangeAbsMaxKernel : public framework::OpKernel<T> {
   }
 };
+template <typename DeviceContext, typename T>
+class FakeQuantizeMovingAverageAbsMaxKernel : public framework::OpKernel<T> {
+ public:
+  void Compute(const framework::ExecutionContext& context) const override {
+    auto* in = context.Input<framework::Tensor>("X");
+    auto* in_scale = context.Input<framework::Tensor>("InScale");
+    auto* out = context.Output<framework::Tensor>("Out");
+    out->mutable_data<T>(context.GetPlace());
+    bool is_test = context.Attr<bool>("is_test");
+    int bit_length = context.Attr<int>("bit_length");
+    int bin_cnt = std::pow(2, bit_length - 1) - 1;
+    auto& dev_ctx = context.template device_context<DeviceContext>();
+    // testing
+    if (is_test) {
+      ClipAndFakeQuantFunctor<DeviceContext, T>()(dev_ctx, *in, *in_scale,
+                                                  bin_cnt, out);
+      return;
+    }
+    // training
+    auto* in_accum = context.Input<framework::Tensor>("InAccum");
+    auto* in_state = context.Input<framework::Tensor>("InState");
+    auto& allocator =
+        platform::DeviceTemporaryAllocator::Instance().Get(dev_ctx);
+    auto cur_scale = allocator.Allocate(1 * sizeof(T));
+    T* cur_scale_data = static_cast<T*>(cur_scale->ptr());
+    FindAbsMaxFunctor<DeviceContext, T>()(dev_ctx, in->data<T>(), in->numel(),
+                                          cur_scale_data);
+    auto* out_state = context.Output<framework::Tensor>("OutState");
+    auto* out_accum = context.Output<framework::Tensor>("OutAccum");
+    auto* out_scale = context.Output<framework::Tensor>("OutScale");
+    out_state->mutable_data<T>(context.GetPlace());
+    out_accum->mutable_data<T>(context.GetPlace());
+    out_scale->mutable_data<T>(context.GetPlace());
+    float moving_rate = context.Attr<float>("moving_rate");
+    FindMovingAverageAbsMaxFunctor<DeviceContext, T>()(
+        dev_ctx, *in_accum, *in_state, cur_scale_data, moving_rate, out_state,
+        out_accum, out_scale);
+    ClipAndFakeQuantFunctor<DeviceContext, T>()(dev_ctx, *in, *out_scale,
+                                                bin_cnt, out);
+  }
+};
 }  // namespace operators
 }  // namespace paddle
paddle/fluid/operators/fc_op.cc
...
@@ -55,17 +55,8 @@ void FCOp::InferShape(framework::InferShapeContext* ctx) const {
                  "The input tensor Input's rank of FCOp should be larger than "
                  "in_num_col_dims.");
-  auto in_mat_dims = framework::flatten_to_2d(in_dims, in_num_col_dims);
-  PADDLE_ENFORCE_EQ(
-      in_mat_dims[1], w_dims[0],
-      "Fully Connected input and weigth size do not match. %s, %s");
   std::vector<int64_t> output_dims;
-  output_dims.reserve(static_cast<size_t>(in_num_col_dims + 1));
-  for (int i = 0; i < in_num_col_dims; ++i) {
-    output_dims.push_back(in_dims[i]);
-  }
-  output_dims.push_back(w_dims[1]);
+  FCOutputSize(in_dims, w_dims, output_dims, in_num_col_dims);
   ctx->SetOutputDim("Out", framework::make_ddim(output_dims));
   ctx->ShareLoD("Input", "Out");
...
@@ -128,6 +119,9 @@ void FCOpMaker::Make() {
   AddAttr<bool>("use_mkldnn",
                 "(bool, default false) Only used in mkldnn kernel")
       .SetDefault(false);
+  AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape,
+                "Skip calling InferShape() function in the runtime.")
+      .SetDefault(true);
   AddComment(R"DOC(
 Fully Connected Operator.
...
@@ -142,13 +136,20 @@ class FCOpKernel : public framework::OpKernel<T> {
   void Compute(const paddle::framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
                    "It must use CPUPlace.");
-    auto input = ctx.Input<Tensor>("Input");
+    auto input = ctx.Input<framework::LoDTensor>("Input");
     auto w = ctx.Input<Tensor>("W");
     auto bias = ctx.Input<Tensor>("Bias");
-    auto output = ctx.Output<Tensor>("Out");
+    auto output = ctx.Output<framework::LoDTensor>("Out");
+    int in_num_col_dims = ctx.Attr<int>("in_num_col_dims");
     auto w_dims = w->dims();
+    std::vector<int64_t> output_dims;
+    FCOutputSize(input->dims(), w_dims, output_dims, in_num_col_dims);
+    output->Resize(framework::make_ddim(output_dims));
+    output->set_lod(input->lod());
     auto out_dims = output->dims();
-    int M = framework::product(out_dims) / out_dims[out_dims.size() - 1];
+    int M = framework::product(out_dims) / w_dims[1];
     const T* input_data = input->data<T>();
     const T* w_data = w->data<T>();
...
paddle/fluid/operators/fc_op.h
...
@@ -48,5 +48,21 @@ class FCOpMaker : public framework::OpProtoAndCheckerMaker {
   void Make() override;
 };
+inline void FCOutputSize(const framework::DDim& in_dims,
+                         const framework::DDim& w_dims,
+                         std::vector<int64_t>& out_dims,  // NOLINT
+                         int in_num_col_dims) {
+  auto in_mat_dims = framework::flatten_to_2d(in_dims, in_num_col_dims);
+  PADDLE_ENFORCE_EQ(
+      in_mat_dims[1], w_dims[0],
+      "Fully Connected input and weight size do not match. %s, %s");
+  out_dims.reserve(static_cast<size_t>(in_num_col_dims + 1));
+  for (int i = 0; i < in_num_col_dims; ++i) {
+    out_dims.push_back(in_dims[i]);
+  }
+  out_dims.push_back(w_dims[1]);
+}
 }  // namespace operators
 }  // namespace paddle
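The shape rule that FCOutputSize encodes can be reproduced with plain vectors. The sketch below is a simplified stand-in, not the Paddle helper: FcOutputDims is a hypothetical name, std::vector<int64_t> replaces framework::DDim, and a std::invalid_argument exception replaces PADDLE_ENFORCE_EQ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Output dims = the first in_num_col_dims input dims followed by w_dims[1].
// The remaining input dims are flattened and must multiply to w_dims[0].
inline std::vector<int64_t> FcOutputDims(const std::vector<int64_t>& in_dims,
                                         const std::vector<int64_t>& w_dims,
                                         int in_num_col_dims) {
  const std::size_t split = static_cast<std::size_t>(in_num_col_dims);
  int64_t in_width = 1;  // second dimension of the flattened 2-D input
  for (std::size_t i = split; i < in_dims.size(); ++i) {
    in_width *= in_dims[i];
  }
  if (in_width != w_dims[0]) {
    throw std::invalid_argument("input and weight size do not match");
  }
  std::vector<int64_t> out_dims(in_dims.begin(), in_dims.begin() + split);
  out_dims.push_back(w_dims[1]);
  return out_dims;
}
```

For an input of shape [2, 3, 4] and a weight of shape [12, 5] with in_num_col_dims = 1, the input flattens to [2, 12] and the output shape is [2, 5].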
paddle/fluid/operators/fused/fused_embedding_seq_pool_op.cc
...
@@ -88,7 +88,8 @@ class FusedEmbeddingSeqPoolOpMaker : public framework::OpProtoAndCheckerMaker {
                   "(boolean, default false) "
                   "Sparse update.")
         .SetDefault(false);
-    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape, "")
+    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape,
+                  "Skip calling InferShape() function in the runtime.")
         .SetDefault(true);
     AddComment(R"DOC(
 FusedEmbeddingSeqPool Operator.
...
paddle/fluid/operators/hash_op.cc
...
@@ -54,7 +54,8 @@ $$Out = scale * X$$
 )DOC");
     AddAttr<int>("num_hash", "").SetDefault(1);
     AddAttr<int>("mod_by", "").SetDefault(100000);
-    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape, "")
+    AddAttr<bool>(framework::kAllKernelsMustComputeRuntimeShape,
+                  "Skip calling InferShape() function in the runtime.")
         .SetDefault(true);
   }
 };
...
paddle/fluid/operators/math.h
0 → 100644
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "paddle/fluid/platform/float16.h"
#include "paddle/fluid/platform/hostdevice.h"
#include "math.h" // NOLINT
namespace paddle {
namespace operators {
inline HOSTDEVICE platform::float16 real_exp(platform::float16 x) {
  return static_cast<platform::float16>(::expf(static_cast<float>(x)));
}
inline HOSTDEVICE float real_exp(float x) { return ::expf(x); }
inline HOSTDEVICE double real_exp(double x) { return ::exp(x); }
inline HOSTDEVICE platform::float16 real_log(platform::float16 x) {
  return static_cast<platform::float16>(::logf(static_cast<float>(x)));
}
inline HOSTDEVICE float real_log(float x) { return ::logf(x); }
inline HOSTDEVICE double real_log(double x) { return ::log(x); }
}  // namespace operators
}  // namespace paddle
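The overload set in math.h lets templated code call real_log or real_exp without naming the precision. Below is a minimal host-only sketch of that pattern, assuming only the float and double overloads: the HOSTDEVICE macro and platform::float16 are left out, and NegLog is a hypothetical caller introduced for illustration.

```cpp
#include <math.h>

#include <cassert>
#include <type_traits>

inline float real_exp(float x) { return ::expf(x); }
inline double real_exp(double x) { return ::exp(x); }
inline float real_log(float x) { return ::logf(x); }
inline double real_log(double x) { return ::log(x); }

// Hypothetical templated caller: overload resolution picks logf for float
// and log for double, so no precision is lost or silently widened.
template <typename T>
inline T NegLog(T p) {
  return -real_log(p);
}

static_assert(std::is_same<decltype(real_log(1.0f)), float>::value,
              "the float overload returns float");
static_assert(std::is_same<decltype(real_log(1.0)), double>::value,
              "the double overload returns double");
```

A kernel templated on T can then use one spelling for every supported type, which appears to be why the per-file real_log helpers in cross_entropy.cu are removed in favour of this header.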
paddle/fluid/operators/math/cross_entropy.cu
...
@@ -12,6 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
+#include "paddle/fluid/operators/math.h"
 #include "paddle/fluid/operators/math/cross_entropy.h"
 #include "paddle/fluid/platform/cuda_device_function.h"
 #include "paddle/fluid/platform/cuda_primitives.h"
...
@@ -20,17 +21,6 @@ namespace paddle {
 namespace operators {
 namespace math {
-namespace {
-__device__ __forceinline__ float real_log(float x) { return logf(x); }
-__device__ __forceinline__ double real_log(double x) { return log(x); }
-__device__ __forceinline__ platform::float16 real_log(
-    const platform::float16& val) {
-  return static_cast<platform::float16>(logf(static_cast<float>(val)));
-}
 template <typename T>
 __global__ void CrossEntropyKernel(T* Y, const T* X, const int64_t* label,
                                    const int N, const int D,
...
@@ -61,7 +51,6 @@ __global__ void SoftCrossEntropyKernel(T* Y, const T* X, const T* label,
     Y[blockIdx.x] = -val;
   }
 }
-}  // namespace
 template <typename T>
 class CrossEntropyFunctor<platform::CUDADeviceContext, T> {
...
paddle/fluid/operators/mkldnn/conv_mkldnn_op.cc
...
@@ -592,6 +592,7 @@ class ConvMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
           platform::SetDstMemoryHandler<uint8_t>(ctx, output, handler,
                                                  &dst_memory_p);
         } else {
+          need_s8_to_u8 = fuse_relu;
           platform::SetDstMemoryHandler<int8_t>(ctx, output, handler,
                                                 &dst_memory_p);
         }
...
paddle/fluid/operators/mkldnn/fc_mkldnn_op.cc
...
@@ -123,7 +123,7 @@ class FCMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     auto& dev_ctx = ctx.template device_context<MKLDNNDeviceContext>();
     const auto& mkldnn_engine = dev_ctx.GetEngine();
-    auto input = ctx.Input<Tensor>("Input");
+    auto input = ctx.Input<framework::LoDTensor>("Input");
     auto w = ctx.Input<Tensor>("W");
     auto bias = ctx.Input<Tensor>("Bias");
...
@@ -151,7 +151,13 @@ class FCMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
     const T* input_data = input->data<T>();
     const T* w_data = w->data<T>();
-    auto output = ctx.Output<Tensor>("Out");
+    auto output = ctx.Output<framework::LoDTensor>("Out");
+    int in_num_col_dims = ctx.Attr<int>("in_num_col_dims");
+    std::vector<int64_t> output_dims;
+    FCOutputSize(input->dims(), w->dims(), output_dims, in_num_col_dims);
+    output->Resize(framework::make_ddim(output_dims));
+    output->set_lod(input->lod());
     T* output_data = output->mutable_data<T>(ctx.GetPlace());
     auto dst_memory = mem.dst(output_data);
...
@@ -204,19 +210,21 @@ class FCMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
     Tensor* input_grad = ctx.Output<Tensor>(framework::GradVarName("Input"));
     Tensor* w_grad = ctx.Output<Tensor>(framework::GradVarName("W"));
+    const Tensor* input = ctx.Input<Tensor>("Input");
+    const T* input_data = input->data<T>();
+    const Tensor* w = ctx.Input<Tensor>("W");
+    const T* w_data = w->data<T>();
     if (input_grad) {
+      input_grad->Resize(input->dims());
       input_grad_data = input_grad->mutable_data<T>(ctx.GetPlace());
     }
     if (w_grad) {
+      w_grad->Resize(w->dims());
       w_grad_data = w_grad->mutable_data<T>(ctx.GetPlace());
     }
-    const Tensor* input = ctx.Input<Tensor>("Input");
-    const T* input_data = input->data<T>();
-    const Tensor* w = ctx.Input<Tensor>("W");
-    const T* w_data = w->data<T>();
     const Tensor* out_grad = ctx.Input<Tensor>(framework::GradVarName("Out"));
     const T* out_grad_data = out_grad->data<T>();
...
paddle/fluid/operators/mkldnn/transpose_mkldnn_op.cc
...
@@ -73,6 +73,29 @@ class TransposeMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
   }
 };
+template <typename T>
+class TransposeINT8MKLDNNOpKernel : public paddle::framework::OpKernel<T> {
+ public:
+  void Compute(const paddle::framework::ExecutionContext& ctx) const override {
+    std::vector<int> axis = ctx.Attr<std::vector<int>>("axis");
+    std::vector<int> axis_int8 = {0, 2, 3, 1};
+    if (axis.size() != 1) {
+      PADDLE_ENFORCE_EQ(axis.size(), axis_int8.size());
+      for (size_t i = 0; i < axis.size(); i++) {
+        PADDLE_ENFORCE_EQ(axis[i], axis_int8[i],
+                          "Current INT8 MKLDNN Transpose kernel only supports "
+                          "axis with [0, 2, 3, 1] due to MKL-DNN kernel "
+                          "implementation.");
+      }
+    }
+    auto* input = ctx.Input<Tensor>("X");
+    auto* output = ctx.Output<Tensor>("Out");
+    output->ShareDataWith(*input);
+    output->set_layout(DataLayout::kMKLDNN);
+    output->set_format(input->format());
+  }
+};
 template <typename T>
 class TransposeMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
  public:
...
@@ -140,7 +163,10 @@ class TransposeMKLDNNGradOpKernel : public paddle::framework::OpKernel<T> {
 namespace ops = paddle::operators;
 REGISTER_OP_KERNEL(transpose2, MKLDNN, ::paddle::platform::CPUPlace,
-                   ops::TransposeMKLDNNOpKernel<float>);
+                   ops::TransposeMKLDNNOpKernel<float>,
+                   ops::TransposeINT8MKLDNNOpKernel<uint8_t>,
+                   ops::TransposeINT8MKLDNNOpKernel<int8_t>);
 REGISTER_OP_KERNEL(transpose, MKLDNN, ::paddle::platform::CPUPlace,
                    ops::TransposeMKLDNNOpKernel<float>);
...
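The precondition enforced by TransposeINT8MKLDNNOpKernel can be expressed as a single predicate. This is a hedged sketch with a hypothetical helper name, IsSupportedInt8TransposeAxis: it returns a bool where the kernel raises through PADDLE_ENFORCE_EQ, and it keeps the kernel's behaviour of skipping the check when axis has exactly one element.

```cpp
#include <cassert>
#include <vector>

// True when the axis attribute would pass the checks in the INT8 kernel:
// either it has a single element (check skipped) or it equals {0, 2, 3, 1}.
inline bool IsSupportedInt8TransposeAxis(const std::vector<int>& axis) {
  static const std::vector<int> kAxisInt8 = {0, 2, 3, 1};
  if (axis.size() == 1) {
    return true;
  }
  return axis == kAxisInt8;  // compares size first, then element by element
}
```

Since the kernel body only shares the input buffer with the output and sets the layout and format, the permutation {0, 2, 3, 1} (NCHW to NHWC) is the only one it accepts when a full axis list is given.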
paddle/fluid/operators/optimizers/adam_op.h
...
@@ -15,6 +15,7 @@ limitations under the License. */
 #pragma once
 #include <math.h>  // for sqrt in CPU and CUDA
 #include <Eigen/Dense>
+#include <unordered_map>
 #include <vector>
 #include "paddle/fluid/framework/op_registry.h"
 #include "paddle/fluid/framework/threadpool.h"
...
@@ -311,17 +312,17 @@ struct SparseAdamFunctor<T, CPUAdam> {
     T beta1_pow = *beta1_pow_;
     T beta2_pow = *beta2_pow_;
     lr *= sqrt(1 - beta2_pow) / (1 - beta1_pow);
-    size_t row_count = numel / row_numel_;
-    for (size_t i = 0U, j = 0U; i != row_count; ++i) {
+    int64_t row_count = static_cast<int64_t>(numel / row_numel_);
+    for (int64_t i = 0, j = 0; i != row_count; ++i) {
       if (i == *(rows_ + j)) {
-        for (size_t k = 0U; k != row_numel_; ++k) {
+        for (int64_t k = 0; k != row_numel_; ++k) {
           T g = grad_[j * row_numel_ + k];
           adam_update(i * row_numel_ + k, g);
         }
         ++j;
       } else {
-        for (size_t k = 0U; k != row_numel_; ++k) {
+        for (int64_t k = 0; k != row_numel_; ++k) {
           T mom1 = moment1_[i * row_numel_ + k];
           T mom2 = moment2_[i * row_numel_ + k];
           T p = param_[i * row_numel_ + k];
...
@@ -427,43 +428,23 @@ class AdamOpKernel : public framework::OpKernel<T> {
         }
       }
-      framework::SelectedRows cpu_grad_merge;
+      framework::SelectedRows tmp_grad_merge;
       const framework::SelectedRows* grad_merge_ptr;
       if (is_strict_sorted) {
         grad_merge_ptr = &grad;
       } else {
         // merge duplicated rows if any.
         // The rows of grad_merge have been sorted inside MergeAdd functor
-        framework::SelectedRows* grad_merge_var;
         scatter::MergeAdd<DeviceContext, T> merge_func;
-        if (platform::is_cpu_place(ctx.GetPlace())) {
-          grad_merge_var = &cpu_grad_merge;
-        } else {
-          // FIXME(qiao): GPU also need to fix this
-          grad_merge_var = const_cast<framework::Scope&>(ctx.scope())
-                               .Var()
-                               ->GetMutable<framework::SelectedRows>();
-        }
         merge_func(ctx.template device_context<DeviceContext>(), grad,
-                   grad_merge_var, true);
-        grad_merge_ptr = grad_merge_var;
+                   &tmp_grad_merge, true);
+        grad_merge_ptr = &tmp_grad_merge;
       }
       auto& grad_merge = *grad_merge_ptr;
       auto& grad_tensor = grad_merge.value();
       const T* grad_data = grad_tensor.template data<T>();
-      const int64_t* rows = nullptr;
-// When compiled without CUDA, the CUDAData() interface should not be
-// provided.
-#if defined(PADDLE_WITH_CUDA)
-      if (platform::is_gpu_place(ctx.GetPlace())) {
-        rows = grad_merge.rows().CUDAData(ctx.GetPlace());
-      } else {
-#endif
-        rows = grad_merge.rows().data();
-#if defined(PADDLE_WITH_CUDA)
-      }
-#endif
+      const int64_t* rows = grad_merge.rows().Data(ctx.GetPlace());
       auto row_numel = grad_tensor.numel() / grad_merge.rows().size();
       if (platform::is_cpu_place(ctx.GetPlace())) {
...
@@ -488,7 +469,7 @@ class AdamOpKernel : public framework::OpKernel<T> {
           }
         }
 #ifndef _WIN32
-        else if (FLAGS_inner_op_parallelism > 1 &&
+        else if (FLAGS_inner_op_parallelism > 1 &&  // NOLINT
                  min_row_size_to_use_multithread > 0 &&
                  param.dims()[0] > min_row_size_to_use_multithread) {
           VLOG(3) << "use multi thread, inner_op_parallelism="
...
@@ -516,11 +497,11 @@ class AdamOpKernel : public framework::OpKernel<T> {
           for (int i = 0; i < FLAGS_inner_op_parallelism; ++i) {
             int64_t start = i * line_in_each_thread;
             int64_t end = (i + 1) * line_in_each_thread;
-            if (start >= param_row_count) {
+            if (start >= static_cast<int64_t>(param_row_count)) {
               break;
             }
-            if (end > param_row_count) {
-              end = param_row_count;
+            if (end > static_cast<int64_t>(param_row_count)) {
+              end = static_cast<int64_t>(param_row_count);
             }
             fs.push_back(
                 framework::Async([&functor, &row_id_to_grad_row_offset,
...
@@ -546,7 +527,7 @@ class AdamOpKernel : public framework::OpKernel<T> {
           for (size_t i = 0; i < fs.size(); ++i) fs[i].wait();
         }
 #endif  // !_WIN32
-        else {
+        else {  // NOLINT
           functor(param.numel());
         }
       } else if (platform::is_gpu_place(ctx.GetPlace())) {
...
paddle/fluid/operators/optimizers/momentum_op.h
View file @
161b8ddc
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */
limitations under the License. */
#pragma once
#pragma once
#include <memory>
#include <string>
#include <string>
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/op_registry.h"
...
@@ -69,6 +70,7 @@ class MomentumOp : public framework::OperatorWithKernel {
...
@@ -69,6 +70,7 @@ class MomentumOp : public framework::OperatorWithKernel {
ctx
->
SetOutputDim
(
"ParamOut"
,
param_dim
);
ctx
->
SetOutputDim
(
"ParamOut"
,
param_dim
);
ctx
->
SetOutputDim
(
"VelocityOut"
,
param_dim
);
ctx
->
SetOutputDim
(
"VelocityOut"
,
param_dim
);
}
}
framework
::
OpKernelType
GetExpectedKernelType
(
framework
::
OpKernelType
GetExpectedKernelType
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
input_data_type
=
framework
::
GetDataTypeOfVar
(
ctx
.
InputVar
(
"Param"
));
auto
input_data_type
=
framework
::
GetDataTypeOfVar
(
ctx
.
InputVar
(
"Param"
));
...
@@ -351,23 +353,14 @@ class MomentumOpKernel : public framework::OpKernel<T> {
...
@@ -351,23 +353,14 @@ class MomentumOpKernel : public framework::OpKernel<T> {
VLOG
(
3
)
<<
"Grad SelectedRows contains no data!"
;
VLOG
(
3
)
<<
"Grad SelectedRows contains no data!"
;
return
;
return
;
}
}
auto
*
merged_grad
=
const_cast
<
framework
::
Scope
&>
(
ctx
.
scope
())
.
Var
()
framework
::
SelectedRows
tmp_merged_grad
;
->
GetMutable
<
framework
::
SelectedRows
>
()
;
framework
::
SelectedRows
*
merged_grad
=
&
tmp_merged_grad
;
math
::
scatter
::
MergeAdd
<
DeviceContext
,
T
>
merge_func
;
math
::
scatter
::
MergeAdd
<
DeviceContext
,
T
>
merge_func
;
merge_func
(
ctx
.
template
device_context
<
DeviceContext
>(),
*
grad
,
merge_func
(
ctx
.
template
device_context
<
DeviceContext
>(),
*
grad
,
merged_grad
);
merged_grad
);
const
int64_t
*
rows
=
nullptr
;
const
int64_t
*
rows
=
merged_grad
->
rows
().
Data
(
ctx
.
GetPlace
());
#ifdef PADDLE_WITH_CUDA
if
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()))
{
rows
=
merged_grad
->
rows
().
CUDAData
(
ctx
.
GetPlace
());
}
else
{
#endif
rows
=
merged_grad
->
rows
().
data
();
#ifdef PADDLE_WITH_CUDA
}
#endif
int64_t
row_numel
=
int64_t
row_numel
=
merged_grad
->
value
().
numel
()
/
merged_grad
->
rows
().
size
();
merged_grad
->
value
().
numel
()
/
merged_grad
->
rows
().
size
();
platform
::
ForRange
<
DeviceContext
>
for_range
(
platform
::
ForRange
<
DeviceContext
>
for_range
(
...
...
paddle/fluid/operators/optimizers/rmsprop_op.h
View file @
161b8ddc
...
@@ -216,24 +216,14 @@ class RmspropOpKernel : public framework::OpKernel<T> {
...
@@ -216,24 +216,14 @@ class RmspropOpKernel : public framework::OpKernel<T> {
}
}
}
else
if
(
grad_var
->
IsType
<
framework
::
SelectedRows
>
())
{
}
else
if
(
grad_var
->
IsType
<
framework
::
SelectedRows
>
())
{
auto
&
grad
=
grad_var
->
Get
<
framework
::
SelectedRows
>
();
auto
&
grad
=
grad_var
->
Get
<
framework
::
SelectedRows
>
();
auto
*
merged_grad
=
const_cast
<
framework
::
Scope
&>
(
ctx
.
scope
())
framework
::
SelectedRows
tmp_merged_grad
;
.
Var
()
framework
::
SelectedRows
*
merged_grad
=
&
tmp_merged_grad
;
->
GetMutable
<
framework
::
SelectedRows
>
();
math
::
scatter
::
MergeAdd
<
DeviceContext
,
T
>
merge_func
;
math
::
scatter
::
MergeAdd
<
DeviceContext
,
T
>
merge_func
;
merge_func
(
dev_ctx
,
grad
,
merged_grad
);
merge_func
(
dev_ctx
,
grad
,
merged_grad
);
platform
::
ForRange
<
DeviceContext
>
for_range
(
dev_ctx
,
limit
);
platform
::
ForRange
<
DeviceContext
>
for_range
(
dev_ctx
,
limit
);
const
int64_t
*
rows
;
const
int64_t
*
rows
=
merged_grad
->
rows
().
Data
(
ctx
.
GetPlace
());
#ifdef PADDLE_WITH_CUDA
if
(
platform
::
is_gpu_place
(
ctx
.
GetPlace
()))
{
rows
=
merged_grad
->
rows
().
CUDAData
(
ctx
.
GetPlace
());
}
else
{
#endif
rows
=
merged_grad
->
rows
().
data
();
#ifdef PADDLE_WITH_CUDA
}
#endif
auto
&
merged_tensor
=
merged_grad
->
value
();
auto
&
merged_tensor
=
merged_grad
->
value
();
int64_t
row_count
=
merged_grad
->
rows
().
size
();
int64_t
row_count
=
merged_grad
->
rows
().
size
();
int64_t
row_numel
=
merged_tensor
.
numel
()
/
row_count
;
int64_t
row_numel
=
merged_tensor
.
numel
()
/
row_count
;
...
...
paddle/fluid/operators/pool_op.cc
View file @
161b8ddc
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
...
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */
limitations under the License. */
#include "paddle/fluid/operators/pool_op.h"
#include "paddle/fluid/operators/pool_op.h"
#include <unordered_map>
#ifdef PADDLE_WITH_CUDA
#ifdef PADDLE_WITH_CUDA
#include "paddle/fluid/platform/cudnn_helper.h"
#include "paddle/fluid/platform/cudnn_helper.h"
#endif
#endif
...
@@ -212,6 +213,12 @@ void Pool2dOpMaker::Make() {
...
@@ -212,6 +213,12 @@ void Pool2dOpMaker::Make() {
AddAttr
<
bool
>
(
"use_mkldnn"
,
AddAttr
<
bool
>
(
"use_mkldnn"
,
"(bool, default false) Only used in mkldnn kernel"
)
"(bool, default false) Only used in mkldnn kernel"
)
.
SetDefault
(
false
);
.
SetDefault
(
false
);
AddAttr
<
bool
>
(
"use_quantizer"
,
"(bool, default false) "
"Set to true for operators that should be quantized and use "
"int8 kernel. "
"Only used on CPU."
)
.
SetDefault
(
false
);
AddAttr
<
std
::
string
>
(
AddAttr
<
std
::
string
>
(
"data_format"
,
"data_format"
,
"(string, default NCHW) Only used in "
"(string, default NCHW) Only used in "
...
...
paddle/fluid/operators/reshape_op.cc
View file @
161b8ddc
...
@@ -219,14 +219,6 @@ class ReshapeKernel {
...
@@ -219,14 +219,6 @@ class ReshapeKernel {
std
::
vector
<
int
>
(
shape_data
,
shape_data
+
shape_tensor
->
numel
());
std
::
vector
<
int
>
(
shape_data
,
shape_data
+
shape_tensor
->
numel
());
out_dims
=
ReshapeOp
::
ValidateShape
(
shape
,
in
->
dims
());
out_dims
=
ReshapeOp
::
ValidateShape
(
shape
,
in
->
dims
());
}
}
if
(
!
in
->
lod
().
empty
())
{
PADDLE_ENFORCE_EQ
(
out_dims
[
0
],
in
->
dims
()[
0
],
"Reshape operator cannot reshape an input sequence batch "
"into an output sequence batch that has a different "
"number of time steps. Please consider using "
"sequence_reshape op."
);
}
out
->
mutable_data
(
ctx
.
GetPlace
(),
in
->
type
());
out
->
mutable_data
(
ctx
.
GetPlace
(),
in
->
type
());
framework
::
TensorCopy
(
framework
::
TensorCopy
(
...
...
paddle/fluid/operators/selu_op.h
View file @
161b8ddc
...
@@ -15,13 +15,12 @@ limitations under the License. */
...
@@ -15,13 +15,12 @@ limitations under the License. */
#pragma once
#pragma once
#include <string>
#include <string>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/math.h"
#include "paddle/fluid/platform/for_range.h"
#include "paddle/fluid/platform/for_range.h"
namespace
paddle
{
namespace
paddle
{
namespace
operators
{
namespace
operators
{
static
HOSTDEVICE
float
real_exp
(
float
x
)
{
return
expf
(
x
);
}
static
HOSTDEVICE
float
real_exp
(
double
x
)
{
return
exp
(
x
);
}
template
<
typename
T
>
template
<
typename
T
>
struct
SeluFunctor
{
struct
SeluFunctor
{
SeluFunctor
(
const
T
*
x_data_ptr
,
float
alpha
,
float
scale
,
T
*
y_data_ptr
)
SeluFunctor
(
const
T
*
x_data_ptr
,
float
alpha
,
float
scale
,
T
*
y_data_ptr
)
...
...
paddle/fluid/operators/sequence_ops/sequence_enumerate_op.cc
View file @
161b8ddc
...
@@ -59,7 +59,8 @@ class SequenceEnumerateOpMaker : public framework::OpProtoAndCheckerMaker {
...
@@ -59,7 +59,8 @@ class SequenceEnumerateOpMaker : public framework::OpProtoAndCheckerMaker {
});
});
AddAttr
<
int
>
(
"pad_value"
,
"(int) The enumerate sequence padding value."
)
AddAttr
<
int
>
(
"pad_value"
,
"(int) The enumerate sequence padding value."
)
.
SetDefault
(
0
);
.
SetDefault
(
0
);
AddAttr
<
bool
>
(
framework
::
kAllKernelsMustComputeRuntimeShape
,
""
)
AddAttr
<
bool
>
(
framework
::
kAllKernelsMustComputeRuntimeShape
,
"Skip calling InferShape() function in the runtime."
)
.
SetDefault
(
true
);
.
SetDefault
(
true
);
AddComment
(
R"DOC(
AddComment
(
R"DOC(
Sequence Enumerate Operator.
Sequence Enumerate Operator.
...
...
paddle/fluid/operators/sequence_ops/sequence_softmax_op.cu
View file @
161b8ddc
...
@@ -14,6 +14,7 @@ limitations under the License. */
...
@@ -14,6 +14,7 @@ limitations under the License. */
#include <algorithm>
#include <algorithm>
#include <cub/cub.cuh> // NOLINT
#include <cub/cub.cuh> // NOLINT
#include "paddle/fluid/operators/math.h"
#include "paddle/fluid/operators/sequence_ops/sequence_softmax_op.h"
#include "paddle/fluid/operators/sequence_ops/sequence_softmax_op.h"
namespace
paddle
{
namespace
paddle
{
...
@@ -21,9 +22,6 @@ namespace operators {
...
@@ -21,9 +22,6 @@ namespace operators {
using
LoDTensor
=
framework
::
LoDTensor
;
using
LoDTensor
=
framework
::
LoDTensor
;
__device__
__forceinline__
float
real_exp
(
float
x
)
{
return
expf
(
x
);
}
__device__
__forceinline__
double
real_exp
(
double
x
)
{
return
exp
(
x
);
}
template
<
typename
T
,
int
BlockDim
>
template
<
typename
T
,
int
BlockDim
>
using
BlockReduce
=
cub
::
BlockReduce
<
T
,
BlockDim
>
;
using
BlockReduce
=
cub
::
BlockReduce
<
T
,
BlockDim
>
;
...
...
paddle/fluid/operators/sigmoid_cross_entropy_with_logits_op.cu
View file @
161b8ddc
...
@@ -12,6 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...
@@ -12,6 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
See the License for the specific language governing permissions and
limitations under the License. */
limitations under the License. */
#include "cub/cub.cuh"
#include "cub/cub.cuh"
#include "paddle/fluid/operators/math.h"
#include "paddle/fluid/operators/sigmoid_cross_entropy_with_logits_op.h"
#include "paddle/fluid/operators/sigmoid_cross_entropy_with_logits_op.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/platform/hostdevice.h"
#include "paddle/fluid/platform/hostdevice.h"
...
@@ -21,11 +22,6 @@ namespace operators {
...
@@ -21,11 +22,6 @@ namespace operators {
using
Tensor
=
framework
::
Tensor
;
using
Tensor
=
framework
::
Tensor
;
static
HOSTDEVICE
float
real_exp
(
float
x
)
{
return
expf
(
x
);
}
static
HOSTDEVICE
float
real_exp
(
double
x
)
{
return
exp
(
x
);
}
static
HOSTDEVICE
float
real_log
(
float
x
)
{
return
logf
(
x
);
}
static
HOSTDEVICE
float
real_log
(
double
x
)
{
return
log
(
x
);
}
static
constexpr
int
kNumCUDAThreads
=
512
;
static
constexpr
int
kNumCUDAThreads
=
512
;
static
constexpr
int
kNumMaxinumNumBlocks
=
4096
;
static
constexpr
int
kNumMaxinumNumBlocks
=
4096
;
...
...
paddle/fluid/operators/slice_op.cu
View file @
161b8ddc
...
@@ -12,18 +12,138 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...
@@ -12,18 +12,138 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
See the License for the specific language governing permissions and
limitations under the License. */
limitations under the License. */
#include <thrust/device_vector.h>
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/operators/slice_op.h"
#include "paddle/fluid/operators/slice_op.h"
#include "paddle/fluid/platform/cuda_device_function.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/platform/float16.h"
namespace
paddle
{
namespace
operators
{
using
platform
::
PADDLE_CUDA_NUM_THREADS
;
template <size_t D>
__global__ void Padding(const paddle::platform::float16* d_out,
                        const int* out_dims, const int* in_dims,
                        const int* offsets, int64_t n,
                        paddle::platform::float16* d_in) {
  int64_t out_idx = threadIdx.x + blockDim.x * blockIdx.x;
  if (out_idx < n) {
    int coords[D] = {0};
    for (int i = D - 1; i >= 0; --i) {
      coords[i] = out_idx % out_dims[i];
      out_idx /= out_dims[i];
      coords[i] += offsets[i];
    }
    int64_t in_idx = 0;
    for (int i = 0; i < D - 1; ++i) {
      in_idx += coords[i] * in_dims[i + 1];
    }
    in_idx += coords[D - 1];
    d_in[in_idx] = d_out[out_idx];
  }
}
template
<
>
class
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
paddle
::
platform
::
float16
>
:
public
framework
::
OpKernel
<
paddle
::
platform
::
float16
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
*
d_out
=
ctx
.
Input
<
framework
::
Tensor
>
(
framework
::
GradVarName
(
"Out"
));
auto
*
d_in
=
ctx
.
Output
<
framework
::
Tensor
>
(
framework
::
GradVarName
(
"Input"
));
d_in
->
mutable_data
<
paddle
::
platform
::
float16
>
(
ctx
.
GetPlace
());
auto
out_dims
=
d_out
->
dims
();
auto
in_dims
=
d_in
->
dims
();
int
rank
=
out_dims
.
size
();
std
::
vector
<
int
>
offsets
(
rank
,
0
);
auto
axes
=
ctx
.
Attr
<
std
::
vector
<
int
>>
(
"axes"
);
auto
starts
=
ctx
.
Attr
<
std
::
vector
<
int
>>
(
"starts"
);
for
(
size_t
i
=
0
;
i
<
starts
.
size
();
++
i
)
{
if
(
starts
[
i
]
<
0
)
{
starts
[
i
]
+=
in_dims
[
axes
[
i
]];
}
offsets
[
axes
[
i
]]
=
std
::
max
(
starts
[
i
],
0
);
}
math
::
SetConstant
<
paddle
::
platform
::
CUDADeviceContext
,
paddle
::
platform
::
float16
>
set_zero
;
auto
&
dev_ctx
=
ctx
.
template
device_context
<
paddle
::
platform
::
CUDADeviceContext
>();
set_zero
(
dev_ctx
,
d_in
,
static_cast
<
paddle
::
platform
::
float16
>
(
0
));
int64_t
numel
=
d_out
->
numel
();
dim3
blocks
((
numel
-
1
)
/
PADDLE_CUDA_NUM_THREADS
+
1
,
1
,
1
);
dim3
threads
(
PADDLE_CUDA_NUM_THREADS
,
1
,
1
);
auto
stream
=
ctx
.
cuda_device_context
().
stream
();
auto
out_shape
=
framework
::
vectorize2int
(
out_dims
);
thrust
::
device_vector
<
int
>
out_dims_vec
(
out_shape
.
begin
(),
out_shape
.
end
());
auto
in_shape
=
framework
::
vectorize2int
(
in_dims
);
thrust
::
device_vector
<
int
>
in_dims_vec
(
in_shape
.
begin
(),
in_shape
.
end
());
thrust
::
device_vector
<
int
>
offsets_vec
(
offsets
.
begin
(),
offsets
.
end
());
const
int
*
out_dims_ptr
=
thrust
::
raw_pointer_cast
(
out_dims_vec
.
data
());
const
int
*
in_dims_ptr
=
thrust
::
raw_pointer_cast
(
in_dims_vec
.
data
());
const
int
*
offsets_ptr
=
thrust
::
raw_pointer_cast
(
offsets_vec
.
data
());
switch
(
rank
)
{
case
1
:
Padding
<
1
><<<
blocks
,
threads
,
0
,
stream
>>>
(
d_out
->
data
<
paddle
::
platform
::
float16
>
(),
out_dims_ptr
,
in_dims_ptr
,
offsets_ptr
,
numel
,
d_in
->
data
<
paddle
::
platform
::
float16
>
());
break
;
case
2
:
Padding
<
2
><<<
blocks
,
threads
,
0
,
stream
>>>
(
d_out
->
data
<
paddle
::
platform
::
float16
>
(),
out_dims_ptr
,
in_dims_ptr
,
offsets_ptr
,
numel
,
d_in
->
data
<
paddle
::
platform
::
float16
>
());
break
;
case
3
:
Padding
<
3
><<<
blocks
,
threads
,
0
,
stream
>>>
(
d_out
->
data
<
paddle
::
platform
::
float16
>
(),
out_dims_ptr
,
in_dims_ptr
,
offsets_ptr
,
numel
,
d_in
->
data
<
paddle
::
platform
::
float16
>
());
break
;
case
4
:
Padding
<
4
><<<
blocks
,
threads
,
0
,
stream
>>>
(
d_out
->
data
<
paddle
::
platform
::
float16
>
(),
out_dims_ptr
,
in_dims_ptr
,
offsets_ptr
,
numel
,
d_in
->
data
<
paddle
::
platform
::
float16
>
());
break
;
case
5
:
Padding
<
5
><<<
blocks
,
threads
,
0
,
stream
>>>
(
d_out
->
data
<
paddle
::
platform
::
float16
>
(),
out_dims_ptr
,
in_dims_ptr
,
offsets_ptr
,
numel
,
d_in
->
data
<
paddle
::
platform
::
float16
>
());
break
;
case
6
:
Padding
<
6
><<<
blocks
,
threads
,
0
,
stream
>>>
(
d_out
->
data
<
paddle
::
platform
::
float16
>
(),
out_dims_ptr
,
in_dims_ptr
,
offsets_ptr
,
numel
,
d_in
->
data
<
paddle
::
platform
::
float16
>
());
break
;
}
}
};
}
// namespace operators
}
// namespace paddle
namespace
ops
=
paddle
::
operators
;
namespace
ops
=
paddle
::
operators
;
namespace
plat
=
paddle
::
platform
;
REGISTER_OP_CUDA_KERNEL
(
REGISTER_OP_CUDA_KERNEL
(
slice
,
ops
::
SliceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
>
,
slice
,
ops
::
SliceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
>
,
ops
::
SliceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
double
>
,
ops
::
SliceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
double
>
,
ops
::
SliceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int
>
,
ops
::
SliceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int
>
,
ops
::
SliceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int64_t
>
);
ops
::
SliceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int64_t
>
,
ops
::
SliceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
plat
::
float16
>
);
REGISTER_OP_CUDA_KERNEL
(
REGISTER_OP_CUDA_KERNEL
(
slice_grad
,
slice_grad
,
ops
::
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
>
,
ops
::
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
>
,
ops
::
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
double
>
,
ops
::
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
double
>
,
ops
::
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int
>
,
ops
::
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int
>
,
ops
::
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int64_t
>
);
ops
::
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int64_t
>
,
ops
::
SliceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
plat
::
float16
>
);
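A reader's note on the `Padding` kernel in this file: each thread takes one flat index into the sliced gradient, unravels it into per-axis coordinates, shifts those by the slice offsets, and turns the result back into a flat index of the full input gradient. The host-side sketch below illustrates that mapping under stated assumptions. `SliceGradInputIndex` is a hypothetical helper, not a Paddle function. It uses the usual row-major accumulation and works on a copy of the flat index, so it is an illustration of the idea rather than a transcription of the kernel as it appears in this diff.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Paddle): maps a flat row-major index into
// the sliced gradient (shape out_dims) to the flat row-major index of the
// same element inside the full input gradient (shape in_dims).
int64_t SliceGradInputIndex(int64_t out_idx, const std::vector<int>& out_dims,
                            const std::vector<int>& in_dims,
                            const std::vector<int>& offsets) {
  const int rank = static_cast<int>(out_dims.size());
  std::vector<int64_t> coords(rank, 0);
  int64_t rest = out_idx;  // unravel a copy; out_idx itself stays untouched
  for (int i = rank - 1; i >= 0; --i) {
    coords[i] = rest % out_dims[i] + offsets[i];  // shift by the slice start
    rest /= out_dims[i];
  }
  int64_t in_idx = 0;
  for (int i = 0; i < rank; ++i) {
    in_idx = in_idx * in_dims[i] + coords[i];  // row-major ravel
  }
  return in_idx;
}
```

For a 4x5 input sliced to 2x3 starting at (1, 2), output element 4 sits at coordinates (1, 1) in the slice, hence (2, 3) in the input, which is flat index 13.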
paddle/fluid/operators/softmax_with_cross_entropy_op.cu
View file @
161b8ddc
...
@@ -439,7 +439,8 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> {
...
@@ -439,7 +439,8 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> {
context
.
Input
<
Tensor
>
(
framework
::
GradVarName
(
"Loss"
))
->
data
<
T
>
();
context
.
Input
<
Tensor
>
(
framework
::
GradVarName
(
"Loss"
))
->
data
<
T
>
();
Tensor
*
logit_grad
=
Tensor
*
logit_grad
=
context
.
Output
<
Tensor
>
(
framework
::
GradVarName
(
"Logits"
));
context
.
Output
<
Tensor
>
(
framework
::
GradVarName
(
"Logits"
));
logit_grad
->
ShareDataWith
(
*
context
.
Input
<
Tensor
>
(
"Softmax"
));
framework
::
TensorCopy
(
*
context
.
Input
<
Tensor
>
(
"Softmax"
),
context
.
GetPlace
(),
context
.
device_context
(),
logit_grad
);
T
*
logit_grad_data
=
logit_grad
->
data
<
T
>
();
T
*
logit_grad_data
=
logit_grad
->
data
<
T
>
();
const
int
batch_size
=
logit_grad
->
dims
()[
0
];
const
int
batch_size
=
logit_grad
->
dims
()[
0
];
...
...
paddle/fluid/operators/squeeze_op.cc
View file @
161b8ddc
...
@@ -94,6 +94,7 @@ class SqueezeOpInferShape : public framework::InferShapeBase {
...
@@ -94,6 +94,7 @@ class SqueezeOpInferShape : public framework::InferShapeBase {
}
}
};
};
// TODO(paddle-dev): Should use OpKernel.
class
SqueezeOp
:
public
framework
::
OperatorBase
{
class
SqueezeOp
:
public
framework
::
OperatorBase
{
public:
public:
using
OperatorBase
::
OperatorBase
;
using
OperatorBase
::
OperatorBase
;
...
...
paddle/fluid/operators/sync_batch_norm_op.cc
0 → 100644
View file @
161b8ddc
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/batch_norm_op.h"
namespace ops = paddle::operators;
REGISTER_OPERATOR(sync_batch_norm, ops::BatchNormOp, ops::BatchNormOpMaker,
                  ops::BatchNormOpInferVarType, ops::BatchNormGradMaker);
REGISTER_OPERATOR(sync_batch_norm_grad, ops::BatchNormGradOp);
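A reader's note on the CUDA file that follows: in training mode its kernels compute, per channel and per device, the mean of `x` and the mean of `x * x`, sum both across devices with an all-reduce, and divide by the device count before deriving the variance as E[x^2] - E[x]^2. The host-side sketch below illustrates only that combination step. `ChannelStats` and `CombineDeviceStats` are hypothetical names, and the sketch assumes every device contributes the same number of elements per channel, which is what a plain average over devices implies.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: synchronized statistics for one channel.
struct ChannelStats {
  double mean;
  double variance;
  double inv_std;
};

// local_mean[d] is E[x] and local_sq_mean[d] is E[x^2] on device d.
// Assumes equal element counts per device, so averaging the per-device
// means gives the global means.
ChannelStats CombineDeviceStats(const std::vector<double>& local_mean,
                                const std::vector<double>& local_sq_mean,
                                double epsilon) {
  const double num_dev = static_cast<double>(local_mean.size());
  double mean_sum = 0.0;
  double sq_sum = 0.0;
  for (std::size_t d = 0; d < local_mean.size(); ++d) {
    mean_sum += local_mean[d];   // stands in for the all-reduce sum
    sq_sum += local_sq_mean[d];
  }
  const double mean = mean_sum / num_dev;
  const double variance = sq_sum / num_dev - mean * mean;  // E[x^2] - E[x]^2
  return {mean, variance, 1.0 / std::sqrt(variance + epsilon)};
}
```

With device 0 holding {1, 3} and device 1 holding {5, 7}, the per-device inputs are (2, 5) and (6, 37), and the combined result is mean 4 and variance 5, the same as computing over all four values at once.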
paddle/fluid/operators/sync_batch_norm_op.cu
0 → 100644
View file @
161b8ddc
/* Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <algorithm>
#include <cfloat>
#include <string>
#include <vector>
#include "cub/cub.cuh"
#include "paddle/fluid/framework/data_layout.h"
#include "paddle/fluid/operators/batch_norm_op.h"
#include "paddle/fluid/platform/cudnn_helper.h"
#include "paddle/fluid/platform/float16.h"
#include "paddle/fluid/platform/nccl_helper.h"
namespace
paddle
{
namespace
operators
{
using
Tensor
=
framework
::
Tensor
;
using
DataLayout
=
framework
::
DataLayout
;
template
<
typename
T
>
using
CudnnDataType
=
platform
::
CudnnDataType
<
T
>
;
template <typename T, int BlockDim, framework::DataLayout layout>
__global__ void KeLocalStats(const T *x, int N, int M, int C, T *mean_var) {
  typedef cub::BlockReduce<T, BlockDim> BlockReduce;
  __shared__ typename BlockReduce::TempStorage temp_storage;
  for (int k = blockIdx.x; k < C; k += gridDim.x) {
    T x_sum = 0;
    T x2_sum = 0;
    for (int i = threadIdx.x; i < N * M; i += BlockDim) {
      int id = layout == framework::DataLayout::kNCHW
                   ? (i / M) * C * M + k * M + i % M
                   : i * C + k;
      T x_in = x[id];
      x_sum += x_in;
      x2_sum += x_in * x_in;
    }
    __syncthreads();
    T out = BlockReduce(temp_storage).Reduce(x_sum, cub::Sum());
    __syncthreads();
    if (threadIdx.x == 0) {
      mean_var[k] = out / (N * M);
    }
    out = BlockReduce(temp_storage).Reduce(x2_sum, cub::Sum());
    __syncthreads();
    if (threadIdx.x == 0) {
      mean_var[k + C] = out / (N * M);
    }
  }
  if (blockIdx.x == 0 && threadIdx.x == 0) {
    mean_var[2 * C] = static_cast<T>(1.0);
  }
}
template <typename T>
__global__ void KeSyncAndMovingStats(T *means, T *variances, T *num_dev,
                                     const int C, const T momentum,
                                     const double epsilon, T *sv_mean_data,
                                     T *sv_inv_var_data, T *moving_means,
                                     T *moving_variances) {
  // sync stats across multi-devices
  int gid = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  for (int i = gid; i < C; i += stride) {
    T mean = means[i] / (*num_dev);
    T var = variances[i] / (*num_dev);
    var = var - mean * mean;
    // sync stats
    sv_mean_data[i] = mean;
    sv_inv_var_data[i] = 1.0 / sqrt(var + epsilon);
    variances[i] = var;
    // moving stats
    moving_means[i] = moving_means[i] * momentum + mean * (1. - momentum);
    moving_variances[i] =
        moving_variances[i] * momentum + var * (1. - momentum);
  }
}
template <typename T, framework::DataLayout layout>
static __global__ void KeNormAffine(const T *x, const T *scale, const T *bias,
                                    const T *mean, const T *variance,
                                    const double epsilon, const int C,
                                    const int M, const int num, T *y) {
  int gid = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  for (int i = gid; i < num; i += stride) {
    const int c = layout == framework::DataLayout::kNCHW ? (i / M) % C : i % C;
    y[i] = (x[i] - mean[c]) / sqrt(variance[c] + epsilon) * scale[c] + bias[c];
  }
}
template
<
typename
DeviceContext
,
typename
T
>
class
SyncBatchNormKernel
:
public
framework
::
OpKernel
<
T
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
double
epsilon
=
static_cast
<
double
>
(
ctx
.
Attr
<
float
>
(
"epsilon"
));
const
float
momentum
=
ctx
.
Attr
<
float
>
(
"momentum"
);
const
bool
is_test
=
ctx
.
Attr
<
bool
>
(
"is_test"
);
const
std
::
string
layout_str
=
ctx
.
Attr
<
std
::
string
>
(
"data_layout"
);
const
DataLayout
layout
=
framework
::
StringToDataLayout
(
layout_str
);
const
bool
use_global_stats
=
ctx
.
Attr
<
bool
>
(
"use_global_stats"
);
PADDLE_ENFORCE
(
!
use_global_stats
,
"sync_batch_norm doesn't support to set use_global_stats True. "
,
"Please use batch_norm in this case."
);
const
auto
*
x
=
ctx
.
Input
<
Tensor
>
(
"X"
);
const
auto
&
x_dims
=
x
->
dims
();
PADDLE_ENFORCE
(
x_dims
.
size
()
>=
2
&&
x_dims
.
size
()
<=
5
,
"The Input dim size should be between 2 and 5"
);
int
N
,
C
,
H
,
W
,
D
;
ExtractNCWHD
(
x_dims
,
layout
,
&
N
,
&
C
,
&
H
,
&
W
,
&
D
);
int
x_numel
=
x
->
numel
();
const
T
*
x_d
=
x
->
data
<
T
>
();
const
T
*
s_d
=
ctx
.
Input
<
Tensor
>
(
"Scale"
)
->
data
<
T
>
();
const
T
*
b_d
=
ctx
.
Input
<
Tensor
>
(
"Bias"
)
->
data
<
T
>
();
auto
*
y
=
ctx
.
Output
<
Tensor
>
(
"Y"
);
T
*
y_d
=
y
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
const
T
*
mean_data
=
nullptr
;
const
T
*
var_data
=
nullptr
;
auto
&
dev_ctx
=
ctx
.
cuda_device_context
();
auto
stream
=
dev_ctx
.
stream
();
auto
*
comm
=
dev_ctx
.
nccl_comm
();
const
int
block
=
512
;
int
max_threads
=
dev_ctx
.
GetMaxPhysicalThreadCount
();
paddle
::
memory
::
AllocationPtr
alloc_ptr
{
nullptr
};
if
(
is_test
)
{
const
auto
*
est_mean
=
ctx
.
Input
<
Tensor
>
(
"Mean"
);
const
auto
*
est_var
=
ctx
.
Input
<
Tensor
>
(
"Variance"
);
mean_data
=
est_mean
->
data
<
T
>
();
var_data
=
est_var
->
data
<
T
>
();
}
else
{
auto
&
allocator
=
platform
::
DeviceTemporaryAllocator
::
Instance
().
Get
(
dev_ctx
);
// x, x^2, 1, here 1 is used to calc device num
// device num also can be got from platform::DeviceContextPool
const
int
bytes
=
(
C
*
2
+
1
)
*
sizeof
(
T
);
alloc_ptr
=
allocator
.
Allocate
(
bytes
);
T
*
stats
=
reinterpret_cast
<
T
*>
(
alloc_ptr
->
ptr
());
const
int
threads
=
256
;
int
grid
=
std
::
min
(
C
,
(
max_threads
+
threads
-
1
)
/
threads
);
if
(
layout
==
framework
::
DataLayout
::
kNCHW
)
{
KeLocalStats
<
T
,
threads
,
framework
::
DataLayout
::
kNCHW
><<<
grid
,
threads
,
0
,
stream
>>>
(
x_d
,
N
,
H
*
W
*
D
,
C
,
stats
);
}
else
{
KeLocalStats
<
T
,
threads
,
framework
::
DataLayout
::
kNHWC
><<<
grid
,
threads
,
0
,
stream
>>>
(
x_d
,
N
,
H
*
W
*
D
,
C
,
stats
);
}
Tensor
c_g_st
;
T
*
c_g_st_d
=
c_g_st
.
mutable_data
<
T
>
({
2
*
C
+
1
},
platform
::
CPUPlace
());
auto
gplace
=
boost
::
get
<
platform
::
CUDAPlace
>
(
ctx
.
GetPlace
());
memory
::
Copy
(
platform
::
CPUPlace
(),
c_g_st_d
,
gplace
,
stats
,
bytes
,
0
);
int
dtype
=
platform
::
ToNCCLDataType
(
x
->
type
());
// In-place operation
PADDLE_ENFORCE
(
platform
::
dynload
::
ncclAllReduce
(
stats
,
stats
,
2
*
C
+
1
,
static_cast
<
ncclDataType_t
>
(
dtype
),
ncclSum
,
comm
,
stream
));
// moving mean/variance
auto
*
mean_out
=
ctx
.
Output
<
Tensor
>
(
"MeanOut"
);
auto
*
variance_out
=
ctx
.
Output
<
Tensor
>
(
"VarianceOut"
);
T
*
est_mean_data
=
mean_out
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
T
*
est_var_data
=
variance_out
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
auto
*
saved_mean
=
ctx
.
Output
<
Tensor
>
(
"SavedMean"
);
auto
*
saved_inv_variance
=
ctx
.
Output
<
Tensor
>
(
"SavedVariance"
);
T
*
sv_mean_data
=
saved_mean
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
T
*
sv_inv_var_data
=
saved_inv_variance
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
// Note, Input('Mean')/Input('Variance') share variable with
// Output('MeanOut')/Output('VarianceOut')
KeSyncAndMovingStats
<
T
><<<
(
C
+
block
-
1
)
/
block
,
block
,
0
,
stream
>>>
(
stats
,
stats
+
C
,
stats
+
2
*
C
,
C
,
momentum
,
epsilon
,
sv_mean_data
,
sv_inv_var_data
,
est_mean_data
,
est_var_data
);
mean_data
=
sv_mean_data
;
var_data
=
stats
+
C
;
}
int
grid2
=
(
std
::
min
(
x_numel
,
max_threads
)
+
block
-
1
)
/
block
;
if
(
layout
==
framework
::
DataLayout
::
kNCHW
)
{
KeNormAffine
<
T
,
framework
::
DataLayout
::
kNCHW
><<<
grid2
,
block
,
0
,
stream
>>>
(
x_d
,
s_d
,
b_d
,
mean_data
,
var_data
,
epsilon
,
C
,
H
*
W
*
D
,
x_numel
,
y_d
);
}
else
{
KeNormAffine
<
T
,
framework
::
DataLayout
::
kNHWC
><<<
grid2
,
block
,
0
,
stream
>>>
(
x_d
,
s_d
,
b_d
,
mean_data
,
var_data
,
epsilon
,
C
,
H
*
W
*
D
,
x_numel
,
y_d
);
}
}
};
template <typename T, const int BlockDim, framework::DataLayout layout>
__global__ void KeBackwardLocalStats(const T *dy, const T *x, const T *means,
                                     int N, int M, int C, T *sum_dy_prod) {
  typedef cub::BlockReduce<double, BlockDim> BlockReduce;
  __shared__ typename BlockReduce::TempStorage temp_storage;
  for (int k = blockIdx.x; k < C; k += gridDim.x) {
    T sum1 = 0;
    T sum2 = 0;
    T mean = means[k];
    for (int i = threadIdx.x; i < N * M; i += blockDim.x) {
      int id = layout == framework::DataLayout::kNCHW
                   ? (i / M) * C * M + k * M + i % M
                   : i * C + k;
      T g = dy[id];
      sum1 += g;
      sum2 += g * (x[id] - mean);
    }
    __syncthreads();
    T out = BlockReduce(temp_storage).Reduce(sum1, cub::Sum());
    __syncthreads();
    if (threadIdx.x == 0) {
      sum_dy_prod[k] = out;
    }
    out = BlockReduce(temp_storage).Reduce(sum2, cub::Sum());
    __syncthreads();
    if (threadIdx.x == 0) {
      sum_dy_prod[k + C] = out;
    }
  }
  if (blockIdx.x == 0 && threadIdx.x == 0) {
    sum_dy_prod[2 * C] = static_cast<T>(1.0);
  }
}
template <typename T, int BlockDim, framework::DataLayout layout>
static __global__ void KeBNBackwardScaleBias(
    const T *dy, const T *x, const T *mean, const T *inv_variance,
    const double epsilon, const int N, const int C, const int HxW, T *dscale,
    T *dbias) {
  const int outer_size = C;
  const int inner_size = N * HxW;
  typedef cub::BlockReduce<double, BlockDim> BlockReduce;
  __shared__ typename BlockReduce::TempStorage temp_storage;
  for (int i = blockIdx.x; i < outer_size; i += gridDim.x) {
    T ds_sum = static_cast<T>(0);
    T db_sum = static_cast<T>(0);
    T inv_var_i = inv_variance[i];
    T mean_i = mean[i];
    for (int j = threadIdx.x; j < inner_size; j += blockDim.x) {
      const int id = layout == framework::DataLayout::kNCHW
                         ? ((j / HxW) * C + i) * HxW + (j % HxW)
                         : j * outer_size + i;
      ds_sum += dy[id] * (x[id] - mean_i);
      db_sum += dy[id];
    }
    __syncthreads();
    double os = BlockReduce(temp_storage)
                    .Reduce(static_cast<double>(ds_sum), cub::Sum());
    __syncthreads();
    double ob = BlockReduce(temp_storage)
                    .Reduce(static_cast<double>(db_sum), cub::Sum());
    __syncthreads();
    if (threadIdx.x == 0) {
      dscale[i] = static_cast<T>(os * inv_var_i);
      dbias[i] = static_cast<T>(ob);
    }
    __syncthreads();
  }
}
template <typename T, framework::DataLayout layout>
static __global__ void KeBNBackwardData(
    const T *dy, const T *x, const T *beta, const T *mean,
    const T *inv_variance, const T *g_sum_dy, const T *g_sum_dy_prod,
    const T *num_dev, const double epsilon, const int C, const int HxW,
    const int num, T *dx) {
  int gid = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  T scale = static_cast<T>(C) / num;
  T dev_num = num_dev[0];
  for (int i = gid; i < num; i += stride) {
    const int c = layout == framework::DataLayout::kNCHW ? i / HxW % C : i % C;
    T inv_var = inv_variance[c];
    T s_d = beta[c];
    T gvar = -1.0 * (g_sum_dy_prod[c] / dev_num) * s_d * inv_var *
             (inv_var * inv_var);
    T gmean = -1.0 * (g_sum_dy[c] / dev_num) * s_d * inv_var;
    dx[i] =
        dy[i] * s_d * inv_var + gmean * scale + gvar * scale * (x[i] - mean[c]);
  }
}
// Deriving the Gradient for the Backward Pass of Batch Normalization
// https://kevinzakka.github.io/2016/09/14/batch_normalization/
template <typename DeviceContext, typename T>
class SyncBatchNormGradKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext &ctx) const override {
    PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
                   "It must use CUDAPlace.");
    double epsilon = static_cast<double>(ctx.Attr<float>("epsilon"));
    const std::string layout_str = ctx.Attr<std::string>("data_layout");

    const DataLayout layout = framework::StringToDataLayout(layout_str);
    const auto *x = ctx.Input<Tensor>("X");
    const auto *d_y = ctx.Input<Tensor>(framework::GradVarName("Y"));
    const auto *scale = ctx.Input<Tensor>("Scale");

    const auto &x_dims = x->dims();

    PADDLE_ENFORCE(x_dims.size() >= 2 && x_dims.size() <= 5,
                   "The Input dim size should be between 2 and 5");
    int N, C, H, W, D;
    ExtractNCWHD(x_dims, layout, &N, &C, &H, &W, &D);

    // init output
    auto *d_x = ctx.Output<Tensor>(framework::GradVarName("X"));
    auto *d_scale = ctx.Output<Tensor>(framework::GradVarName("Scale"));
    auto *d_bias = ctx.Output<Tensor>(framework::GradVarName("Bias"));

    d_x->mutable_data<T>(ctx.GetPlace());
    if (d_scale && d_bias) {
      d_scale->mutable_data<T>(ctx.GetPlace());
      d_bias->mutable_data<T>(ctx.GetPlace());
    }
    PADDLE_ENFORCE_EQ(scale->dims().size(), 1UL);
    PADDLE_ENFORCE_EQ(scale->dims()[0], C);

    std::vector<int> dims;
    std::vector<int> strides;
    if (layout == DataLayout::kNCHW) {
      dims = {N, C, H, W, D};
      strides = {C * H * W * D, H * W * D, W * D, D, 1};
    } else {
      dims = {N, C, H, W, D};
      strides = {H * W * C * D, 1, W * D * C, D * C, C};
    }
    const T *x_d = x->data<T>();
    const T *dy_d = d_y->data<T>();

    auto &dev_ctx = ctx.cuda_device_context();
    auto stream = dev_ctx.stream();
    auto *comm = dev_ctx.nccl_comm();

    const T *saved_mean = ctx.Input<Tensor>("SavedMean")->data<T>();
    const T *saved_inv_var = ctx.Input<Tensor>("SavedVariance")->data<T>();
    auto &allocator =
        platform::DeviceTemporaryAllocator::Instance().Get(dev_ctx);
    const int bytes = (C * 2 + 1) * sizeof(T);
    auto alloc_ptr = allocator.Allocate(bytes);
    T *stats = reinterpret_cast<T *>(alloc_ptr->ptr());

    const int threads = 256;
    int max_threads = dev_ctx.GetMaxPhysicalThreadCount();
    int grid = std::min(C, (max_threads + threads - 1) / threads);
    int x_numel = x->numel();
    int fsize = H * W * D;
    if (layout == framework::DataLayout::kNCHW) {
      KeBackwardLocalStats<
          T, threads,
          framework::DataLayout::kNCHW><<<grid, threads, 0, stream>>>(
          dy_d, x_d, saved_mean, N, fsize, C, stats);
    } else {
      KeBackwardLocalStats<
          T, threads,
          framework::DataLayout::kNHWC><<<grid, threads, 0, stream>>>(
          dy_d, x_d, saved_mean, N, fsize, C, stats);
    }
    int dtype = platform::ToNCCLDataType(x->type());
    // In-place operation
    PADDLE_ENFORCE(platform::dynload::ncclAllReduce(
        stats, stats, 2 * C + 1, static_cast<ncclDataType_t>(dtype), ncclSum,
        comm, stream));
    const int block = 512;
    int grid2 = (std::min(x_numel, max_threads) + block - 1) / block;

    if (layout == framework::DataLayout::kNCHW) {
      if (d_scale && d_bias) {
        KeBNBackwardScaleBias<
            T, threads,
            framework::DataLayout::kNCHW><<<grid, threads, 0, stream>>>(
            dy_d, x_d, saved_mean, saved_inv_var, epsilon, N, C, fsize,
            d_scale->data<T>(), d_bias->data<T>());
      }
      if (d_x) {
        KeBNBackwardData<
            T, framework::DataLayout::kNCHW><<<grid2, block, 0, stream>>>(
            dy_d, x_d, scale->data<T>(), saved_mean, saved_inv_var, stats,
            stats + C, stats + 2 * C, epsilon, C, fsize, x->numel(),
            d_x->data<T>());
      }
    } else {
      if (d_scale && d_bias) {
        KeBNBackwardScaleBias<
            T, threads,
            framework::DataLayout::kNHWC><<<grid, threads, 0, stream>>>(
            dy_d, x_d, saved_mean, saved_inv_var, epsilon, N, C, fsize,
            d_scale->data<T>(), d_bias->data<T>());
      }
      if (d_x) {
        KeBNBackwardData<
            T, framework::DataLayout::kNHWC><<<grid2, block, 0, stream>>>(
            dy_d, x_d, scale->data<T>(), saved_mean, saved_inv_var, stats,
            stats + C, stats + 2 * C, epsilon, C, fsize, x->numel(),
            d_x->data<T>());
      }
    }
  }
};
}  // namespace operators
}  // namespace paddle

namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_CUDA_KERNEL(
    sync_batch_norm, ops::SyncBatchNormKernel<plat::CUDADeviceContext, float>,
    ops::SyncBatchNormKernel<plat::CUDADeviceContext, double>);
REGISTER_OP_CUDA_KERNEL(
    sync_batch_norm_grad,
    ops::SyncBatchNormGradKernel<plat::CUDADeviceContext, float>,
    ops::SyncBatchNormGradKernel<plat::CUDADeviceContext, double>);
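The gradient kernel above packs its cross-device statistics into one temporary buffer of `2 * C + 1` elements and sums it in place with `ncclAllReduce`. At the KeBNBackwardData call site the three slices are passed as `stats`, `stats + C`, and `stats + 2 * C`, which the kernel names `g_sum_dy`, `g_sum_dy_prod`, and `num_dev`. The host-side sketch below imitates that layout without NCCL. `AllReduceSumReference`, `StatsView`, and `SplitStats` are hypothetical helpers, and the idea that each device writes `1` into the last slot, so the sum yields the device count, is an assumption because the kernel that fills the buffer is outside this excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise sum over per-device buffers: a host stand-in for an
// all-reduce with a sum operator. Every device ends up with `total`.
std::vector<float> AllReduceSumReference(
    const std::vector<std::vector<float>>& per_device) {
  assert(!per_device.empty());
  std::vector<float> total(per_device.front().size(), 0.0f);
  for (const auto& buf : per_device) {
    assert(buf.size() == total.size());
    for (std::size_t k = 0; k < buf.size(); ++k) total[k] += buf[k];
  }
  return total;
}

// Views into the packed buffer: [0, C) sum of dy, [C, 2C) sum of
// dy * (x - mean), and one trailing element for the device count.
struct StatsView {
  const float* sum_dy;
  const float* sum_dy_prod;
  const float* num_dev;
};

StatsView SplitStats(const std::vector<float>& stats, int C) {
  assert(stats.size() == static_cast<std::size_t>(2 * C + 1));
  return StatsView{stats.data(), stats.data() + C, stats.data() + 2 * C};
}
```

Packing all three quantities into one buffer means a single collective call per backward pass instead of three, which is presumably why the buffer is laid out this way.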
paddle/fluid/platform/device_context.cc
@@ -57,7 +57,6 @@ DeviceContextPool::DeviceContextPool(
   for (auto& p : places) {
     set.insert(p);
   }
   for (auto& p : set) {
     if (platform::is_cpu_place(p)) {
 #ifdef PADDLE_WITH_MKLDNN
@@ -317,6 +316,9 @@ CUDADeviceContext::~CUDADeviceContext() {
   eigen_stream_.reset();
   eigen_device_.reset();
   PADDLE_ENFORCE(cudaStreamDestroy(stream_));
+#if !defined(_WIN32)
+  PADDLE_ENFORCE(dynload::ncclCommDestroy(nccl_comm_));
+#endif
 }
 Place CUDADeviceContext::GetPlace() const { return place_; }
paddle/fluid/platform/device_context.h
@@ -265,6 +265,14 @@ class CUDADeviceContext : public DeviceContext {
   /*! \brief Return cuda stream in the device context. */
   cudaStream_t stream() const;
+#if !defined(_WIN32)
+  /*! \brief Return the NCCL communicator. */
+  ncclComm_t nccl_comm() const { return nccl_comm_; }
+  /*! \brief Set the NCCL communicator. */
+  void set_nccl_comm(ncclComm_t comm) { nccl_comm_ = comm; }
+#endif
   template <typename Callback>
   void RecordEvent(cudaEvent_t ev, Callback callback) {
     callback();
@@ -289,6 +297,15 @@ class CUDADeviceContext : public DeviceContext {
   std::unique_ptr<CublasHandleHolder> cublas_handle_;
   std::unique_ptr<CublasHandleHolder> cublas_tensor_core_handle_;
+#if !defined(_WIN32)
+  // NCCL communicator (single process version) for NCCL collective operations.
+  // NCCL collective operations provide fast collectives over multiple GPUs
+  // both within and across nodes.
+  // Here, however, the communicator is only used for collectives over
+  // multiple GPUs within a single node.
+  ncclComm_t nccl_comm_{nullptr};
+#endif
   int compute_capability_;
   int runtime_version_;
   int driver_version_;
paddle/fluid/platform/init.cc
@@ -13,6 +13,8 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #include <string.h>  // for strdup
 #include <algorithm>
+#include <memory>
+#include <set>
 #include <stdexcept>
 #include <string>
@@ -140,6 +142,7 @@ void InitDevices(bool init_p2p, const std::vector<int> devices) {
   places.emplace_back(platform::CPUPlace());
   platform::DeviceContextPool::Init(places);
   platform::DeviceTemporaryAllocator::Init();
 #ifndef PADDLE_WITH_MKLDNN
   platform::SetNumThreads(FLAGS_paddle_num_threads);
 #endif
paddle/fluid/platform/nccl_helper.h
@@ -16,9 +16,11 @@
 #pragma once
 #include <stdio.h>
+#include <memory>
 #include <string>
 #include <thread>  // NOLINT
 #include <typeindex>
+#include <unordered_map>
 #include <vector>
 #include "paddle/fluid/framework/data_type.h"
 #include "paddle/fluid/platform/dynload/nccl.h"
@@ -78,6 +80,8 @@ struct NCCLContext {
   cudaStream_t stream() const { return ctx_->stream(); }
+  ncclComm_t comm() const { return comm_; }
   int device_id() const {
     return boost::get<platform::CUDAPlace>(ctx_->GetPlace()).device;
   }
paddle/fluid/pybind/CMakeLists.txt
 set(PYBIND_DEPS pybind python proto_desc memory executor async_executor prune
   feed_fetch_method pass_builder parallel_executor profiler layer scope_pool
-  tracer analysis_predictor)
+  tracer analysis_predictor imperative_profiler)
 if(WITH_PYTHON)
   list(APPEND PYBIND_DEPS py_func_op)
paddle/fluid/pybind/imperative.cc
@@ -42,6 +42,7 @@ void BindTracer(pybind11::module* m) {
               framework::AttributeMap attrs_map,
               const platform::CPUPlace expected_place,
               const bool stop_gradient = false) {
+             pybind11::gil_scoped_release release;
              return self.Trace(op, inputs, outputs, attrs_map, expected_place,
                                stop_gradient);
            })
@@ -52,6 +53,7 @@ void BindTracer(pybind11::module* m) {
               framework::AttributeMap attrs_map,
               const platform::CUDAPlace expected_place,
               const bool stop_gradient = false) {
+             pybind11::gil_scoped_release release;
              return self.Trace(op, inputs, outputs, attrs_map, expected_place,
                                stop_gradient);
            })
paddle/fluid/pybind/pybind.cc
@@ -36,6 +36,7 @@ limitations under the License. */
 #include "paddle/fluid/framework/selected_rows.h"
 #include "paddle/fluid/framework/version.h"
 #include "paddle/fluid/imperative/layer.h"
+#include "paddle/fluid/imperative/profiler.h"
 #include "paddle/fluid/memory/allocation/allocator_strategy.h"
 #include "paddle/fluid/memory/allocation/legacy_allocator.h"
 #include "paddle/fluid/operators/activation_op.h"
@@ -162,6 +163,11 @@ PYBIND11_MODULE(core, m) {
   m.def("print_mem_usage",
         []() { return memory::allocation::GPUMemMonitor.PrintMemUsage(); });
+  m.def("start_imperative_gperf_profiler",
+        []() { imperative::StartProfile(); });
+  m.def("stop_imperative_gperf_profiler", []() { imperative::StopProfile(); });
   py::class_<imperative::VarBase>(m, "VarBase", R"DOC()DOC")
       .def(
           py::init<const std::string &, paddle::framework::proto::VarType::Type,
@@ -1249,6 +1255,21 @@ All parameter, weight, gradient are variables in Paddle.
                 This option is only available on GPU devices.
                 Default False)DOC")
       .def_property(
+          "sync_batch_norm",
+          [](const BuildStrategy &self) { return self.sync_batch_norm_; },
+          [](BuildStrategy &self, bool b) {
+            PADDLE_ENFORCE(!self.IsFinalized(), "BuildStrategy is finalized.");
+            self.sync_batch_norm_ = b;
+          },
+          R"DOC(The type is BOOL. sync_batch_norm indicates whether to use
+                synchronous batch normalization, which synchronizes the mean
+                and variance across multiple devices during training.
+                The current implementation supports neither FP16 training nor
+                CPU, and synchronizes only within one machine, not across
+                machines. Default False)DOC")
+      .def_property(
           "memory_optimize",
           [](const BuildStrategy &self) { return self.memory_optimize_; },
           [](BuildStrategy &self, bool b) { self.memory_optimize_ = b; })
python/paddle/fluid/__init__.py
@@ -132,7 +132,8 @@ def __bootstrap__():
         'allocator_strategy', 'reader_queue_speed_test_mode',
         'print_sub_graph_dir', 'pe_profile_fname', 'warpctc_dir',
         'inner_op_parallelism', 'enable_parallel_graph',
-        'multiple_of_cupti_buffer_size', 'enable_subgraph_optimize'
+        'multiple_of_cupti_buffer_size', 'enable_subgraph_optimize',
+        'tracer_profile_fname'
     ]
     if 'Darwin' not in sysstr:
         read_env_flags.append('use_pinned_memory')
python/paddle/fluid/compiler.py
@@ -230,6 +230,9 @@ class CompiledProgram(object):
                 tps), "num_trainers == len(end_points)"
             self._build_strategy.trainers_endpoints = tps
+        if self._build_strategy.sync_batch_norm:
+            self._build_strategy.enable_sequential_execution = True
         self._persistable_vars = []
         for node in self._graph.nodes():
             if node.is_var() and node.var() is not None and node.var().persistable() and \
python/paddle/fluid/contrib/quantize/quantize_transpiler.py
This diff is collapsed.
python/paddle/fluid/contrib/slim/quantization/quantization_pass.py
This diff is collapsed.
python/paddle/fluid/contrib/slim/tests/test_quantization_pass.py
@@ -164,6 +164,9 @@ class TestQuantizationTransformPass(unittest.TestCase):
     def test_linear_fc_quant_range_abs_max(self):
         self.linear_fc_quant('range_abs_max', for_ci=True)
+    def test_linear_fc_quant_moving_average_abs_max(self):
+        self.linear_fc_quant('moving_average_abs_max', for_ci=True)
     def residual_block_quant(self, quant_type, for_ci=False):
         main = fluid.Program()
         startup = fluid.Program()
@@ -201,6 +204,9 @@ class TestQuantizationTransformPass(unittest.TestCase):
     def test_residual_block_range_abs_max(self):
         self.residual_block_quant('range_abs_max', for_ci=True)
+    def test_residual_block_moving_average_abs_max(self):
+        self.residual_block_quant('moving_average_abs_max', for_ci=True)
 class TestQuantizationFreezePass(unittest.TestCase):
     def freeze_graph(self, use_cuda, seed, quant_type, for_ci=False):
@@ -380,11 +386,18 @@ class TestQuantizationFreezePass(unittest.TestCase):
             with fluid.unique_name.guard():
                 self.freeze_graph(
                     True, seed=1, quant_type='range_abs_max', for_ci=True)
+                self.freeze_graph(
+                    True, seed=1, quant_type='moving_average_abs_max',
+                    for_ci=True)
     def test_freeze_graph_cpu_static(self):
         with fluid.unique_name.guard():
             self.freeze_graph(
                 False, seed=2, quant_type='range_abs_max', for_ci=True)
+            self.freeze_graph(
+                False, seed=2, quant_type='moving_average_abs_max',
+                for_ci=True)
 if __name__ == '__main__':
python/paddle/fluid/contrib/utils/lookup_table_utils.py
This diff is collapsed.
python/paddle/fluid/framework.py
@@ -515,6 +515,11 @@ class Variable(object):
         Returns:
             str: The debug string.
         """
+        if _in_imperative_mode():
+            # TODO(panyx0718): add more imperative debug info.
+            return 'name %s, dtype: %s shape: %s' % (self.name, self.dtype,
+                                                     self.shape)
         assert isinstance(throw_on_error, bool) and isinstance(with_details,
                                                                bool)
         protostr = self.desc.serialize_to_string()
python/paddle/fluid/imperative/__init__.py
This diff is collapsed.
python/paddle/fluid/imperative/profiler.py
0 → 100644
This diff is collapsed.
python/paddle/fluid/layers/detection.py
This diff is collapsed.
python/paddle/fluid/layers/nn.py
This diff is collapsed.
python/paddle/fluid/tests/test_detection.py
This diff is collapsed.
python/paddle/fluid/tests/unittests/mkldnn/test_transpose_int8_mkldnn_op.py
0 → 100644
This diff is collapsed.
python/paddle/fluid/tests/unittests/test_cross_entropy2_op.py
0 → 100644
This diff is collapsed.
python/paddle/fluid/tests/unittests/test_dist_transpiler.py
This diff is collapsed.
python/paddle/fluid/tests/unittests/test_fake_quantize_op.py
This diff is collapsed.
python/paddle/fluid/tests/unittests/test_imperative_gnn.py
0 → 100644
This diff is collapsed.
python/paddle/fluid/tests/unittests/test_layers.py
This diff is collapsed.
python/paddle/fluid/tests/unittests/test_slice_op.py
This diff is collapsed.
python/paddle/fluid/tests/unittests/test_sync_batch_norm_op.py
0 → 100644
This diff is collapsed.
python/paddle/fluid/tests/unittests/test_yolo_box_op.py
0 → 100644
This diff is collapsed.
python/paddle/fluid/tests/unittests/test_yolov3_loss_op.py
This diff is collapsed.
tools/manylinux1/build_scripts/build.sh
This diff is collapsed.
tools/timeline.py
This diff is collapsed.