BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle)
Unverified commit 65420271

Authored by 张春乔 on Dec 07, 2022; committed via GitHub on Dec 07, 2022.

[phi::DenseTensor] Replace Tensor with phi::DenseTensor (#48682)

Parent: 693de9f0
Showing 419 changed files with 2450 additions and 2880 deletions.
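The change is mechanical and repeated across the files listed below: each operator file drops its file-local alias `using Tensor = phi::DenseTensor;` and spells the type out as `phi::DenseTensor` at every use site. A minimal, self-contained C++ sketch of that pattern follows; the stand-in `DenseTensor` struct and the function name are illustrative only, since the real class lives in Paddle's phi headers.

// Minimal sketch of the refactor, assuming a stand-in DenseTensor type;
// the real class is Paddle's phi::DenseTensor and requires Paddle's headers.
#include <iostream>

namespace phi {
struct DenseTensor {  // stand-in only, not the real implementation
  int numel = 0;
};
}  // namespace phi

// Before this commit an operator file typically contained:
//   using Tensor = phi::DenseTensor;   // file-local alias
//   Tensor tmp;
// After this commit the alias is gone and the type is written out:
void AfterPattern() {
  phi::DenseTensor tmp;  // fully qualified, no per-file alias
  tmp.numel = 8;
  std::cout << "numel = " << tmp.numel << "\n";
}

int main() {
  AfterPattern();
  return 0;
}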
paddle/fluid/imperative/gradient_accumulator.cc
paddle/fluid/imperative/gradient_accumulator.cc
+5
-5
paddle/fluid/operators/abs_op_mlu.cc
paddle/fluid/operators/abs_op_mlu.cc
+1
-3
paddle/fluid/operators/abs_op_npu.cc
paddle/fluid/operators/abs_op_npu.cc
+0
-2
paddle/fluid/operators/activation_op_mlu.cc
paddle/fluid/operators/activation_op_mlu.cc
+1
-3
paddle/fluid/operators/activation_op_npu.cc
paddle/fluid/operators/activation_op_npu.cc
+108
-110
paddle/fluid/operators/affine_grid_op.cc
paddle/fluid/operators/affine_grid_op.cc
+0
-2
paddle/fluid/operators/amp/alloc_float_status_op_npu.cc
paddle/fluid/operators/amp/alloc_float_status_op_npu.cc
+0
-2
paddle/fluid/operators/amp/check_finite_and_unscale_op_mlu.cc
...le/fluid/operators/amp/check_finite_and_unscale_op_mlu.cc
+3
-5
paddle/fluid/operators/amp/check_finite_and_unscale_op_npu.cc
...le/fluid/operators/amp/check_finite_and_unscale_op_npu.cc
+4
-6
paddle/fluid/operators/amp/check_finite_and_unscale_op_npu_test.cc
...uid/operators/amp/check_finite_and_unscale_op_npu_test.cc
+1
-3
paddle/fluid/operators/amp/clear_float_status_op_npu.cc
paddle/fluid/operators/amp/clear_float_status_op_npu.cc
+1
-3
paddle/fluid/operators/amp/get_float_status_op_npu.cc
paddle/fluid/operators/amp/get_float_status_op_npu.cc
+1
-3
paddle/fluid/operators/amp/update_loss_scaling_op_npu.cc
paddle/fluid/operators/amp/update_loss_scaling_op_npu.cc
+2
-4
paddle/fluid/operators/arg_max_op_npu.cc
paddle/fluid/operators/arg_max_op_npu.cc
+1
-2
paddle/fluid/operators/arg_min_op_npu.cc
paddle/fluid/operators/arg_min_op_npu.cc
+0
-1
paddle/fluid/operators/argsort_op_npu.cc
paddle/fluid/operators/argsort_op_npu.cc
+17
-18
paddle/fluid/operators/attention_lstm_op.cc
paddle/fluid/operators/attention_lstm_op.cc
+18
-15
paddle/fluid/operators/attention_lstm_op.h
paddle/fluid/operators/attention_lstm_op.h
+0
-2
paddle/fluid/operators/batch_norm_op.cc
paddle/fluid/operators/batch_norm_op.cc
+9
-9
paddle/fluid/operators/batch_norm_op.cu
paddle/fluid/operators/batch_norm_op.cu
+0
-1
paddle/fluid/operators/batch_norm_op.h
paddle/fluid/operators/batch_norm_op.h
+0
-1
paddle/fluid/operators/batch_norm_op_mlu.cc
paddle/fluid/operators/batch_norm_op_mlu.cc
+6
-6
paddle/fluid/operators/batch_norm_op_npu.cc
paddle/fluid/operators/batch_norm_op_npu.cc
+1
-1
paddle/fluid/operators/bce_loss_op_mlu.cc
paddle/fluid/operators/bce_loss_op_mlu.cc
+0
-2
paddle/fluid/operators/bce_loss_op_npu.cc
paddle/fluid/operators/bce_loss_op_npu.cc
+0
-2
paddle/fluid/operators/cast_op.cc
paddle/fluid/operators/cast_op.cc
+1
-1
paddle/fluid/operators/cast_op_mlu.cc
paddle/fluid/operators/cast_op_mlu.cc
+0
-2
paddle/fluid/operators/cast_op_npu.cc
paddle/fluid/operators/cast_op_npu.cc
+0
-2
paddle/fluid/operators/center_loss_op.h
paddle/fluid/operators/center_loss_op.h
+1
-2
paddle/fluid/operators/clip_by_norm_op.h
paddle/fluid/operators/clip_by_norm_op.h
+0
-1
paddle/fluid/operators/clip_by_norm_op_npu.cc
paddle/fluid/operators/clip_by_norm_op_npu.cc
+3
-5
paddle/fluid/operators/clip_op_mlu.cc
paddle/fluid/operators/clip_op_mlu.cc
+4
-4
paddle/fluid/operators/clip_op_npu.cc
paddle/fluid/operators/clip_op_npu.cc
+4
-6
paddle/fluid/operators/coalesce_tensor_op.cc
paddle/fluid/operators/coalesce_tensor_op.cc
+1
-1
paddle/fluid/operators/collective/c_allreduce_op.h
paddle/fluid/operators/collective/c_allreduce_op.h
+2
-3
paddle/fluid/operators/collective/c_softmax_with_cross_entropy_op.cu
...d/operators/collective/c_softmax_with_cross_entropy_op.cu
+9
-11
paddle/fluid/operators/concat_op.cc
paddle/fluid/operators/concat_op.cc
+0
-1
paddle/fluid/operators/concat_op_mlu.cc
paddle/fluid/operators/concat_op_mlu.cc
+2
-2
paddle/fluid/operators/controlflow/logical_op_mlu.cc
paddle/fluid/operators/controlflow/logical_op_mlu.cc
+0
-2
paddle/fluid/operators/controlflow/logical_op_npu.cc
paddle/fluid/operators/controlflow/logical_op_npu.cc
+0
-2
paddle/fluid/operators/conv_op.h
paddle/fluid/operators/conv_op.h
+0
-2
paddle/fluid/operators/conv_op_mlu.cc
paddle/fluid/operators/conv_op_mlu.cc
+16
-17
paddle/fluid/operators/conv_op_npu.cc
paddle/fluid/operators/conv_op_npu.cc
+15
-14
paddle/fluid/operators/conv_transpose_op_mlu.cc
paddle/fluid/operators/conv_transpose_op_mlu.cc
+8
-9
paddle/fluid/operators/conv_transpose_op_npu.cc
paddle/fluid/operators/conv_transpose_op_npu.cc
+4
-5
paddle/fluid/operators/copy_cross_scope_op.cc
paddle/fluid/operators/copy_cross_scope_op.cc
+0
-2
paddle/fluid/operators/correlation_op.cc
paddle/fluid/operators/correlation_op.cc
+0
-2
paddle/fluid/operators/cos_sim_op.h
paddle/fluid/operators/cos_sim_op.h
+2
-4
paddle/fluid/operators/crop_op_npu.cc
paddle/fluid/operators/crop_op_npu.cc
+2
-4
paddle/fluid/operators/cross_entropy_op.h
paddle/fluid/operators/cross_entropy_op.h
+2
-4
paddle/fluid/operators/ctc_align_op.h
paddle/fluid/operators/ctc_align_op.h
+0
-2
paddle/fluid/operators/cudnn_lstm_op.cu.cc
paddle/fluid/operators/cudnn_lstm_op.cu.cc
+24
-25
paddle/fluid/operators/cumsum_op_mlu.cc
paddle/fluid/operators/cumsum_op_mlu.cc
+1
-3
paddle/fluid/operators/cumsum_op_npu.cc
paddle/fluid/operators/cumsum_op_npu.cc
+3
-5
paddle/fluid/operators/cvm_op.cc
paddle/fluid/operators/cvm_op.cc
+0
-2
paddle/fluid/operators/cvm_op.cu
paddle/fluid/operators/cvm_op.cu
+0
-1
paddle/fluid/operators/cvm_op.h
paddle/fluid/operators/cvm_op.h
+0
-2
paddle/fluid/operators/data_norm_op.cc
paddle/fluid/operators/data_norm_op.cc
+6
-7
paddle/fluid/operators/data_norm_op.cu
paddle/fluid/operators/data_norm_op.cu
+1
-2
paddle/fluid/operators/deformable_conv_op_mlu.cc
paddle/fluid/operators/deformable_conv_op_mlu.cc
+14
-16
paddle/fluid/operators/deformable_psroi_pooling_op.cu
paddle/fluid/operators/deformable_psroi_pooling_op.cu
+0
-1
paddle/fluid/operators/deformable_psroi_pooling_op.h
paddle/fluid/operators/deformable_psroi_pooling_op.h
+1
-3
paddle/fluid/operators/detection/bbox_util.cu.h
paddle/fluid/operators/detection/bbox_util.cu.h
+7
-9
paddle/fluid/operators/detection/bipartite_match_op.cc
paddle/fluid/operators/detection/bipartite_match_op.cc
+1
-3
paddle/fluid/operators/detection/box_clip_op.cu
paddle/fluid/operators/detection/box_clip_op.cu
+0
-1
paddle/fluid/operators/detection/box_clip_op.h
paddle/fluid/operators/detection/box_clip_op.h
+4
-5
paddle/fluid/operators/detection/box_coder_op_npu.cc
paddle/fluid/operators/detection/box_coder_op_npu.cc
+69
-66
paddle/fluid/operators/detection/collect_fpn_proposals_op.cc
paddle/fluid/operators/detection/collect_fpn_proposals_op.cc
+0
-1
paddle/fluid/operators/detection/collect_fpn_proposals_op.cu
paddle/fluid/operators/detection/collect_fpn_proposals_op.cu
+12
-14
paddle/fluid/operators/detection/density_prior_box_op_npu.cc
paddle/fluid/operators/detection/density_prior_box_op_npu.cc
+36
-35
paddle/fluid/operators/detection/generate_mask_labels_op.cc
paddle/fluid/operators/detection/generate_mask_labels_op.cc
+29
-29
paddle/fluid/operators/detection/generate_proposal_labels_op.cc
.../fluid/operators/detection/generate_proposal_labels_op.cc
+31
-31
paddle/fluid/operators/detection/generate_proposals_op.cc
paddle/fluid/operators/detection/generate_proposals_op.cc
+20
-22
paddle/fluid/operators/detection/generate_proposals_op.cu
paddle/fluid/operators/detection/generate_proposals_op.cu
+19
-21
paddle/fluid/operators/detection/generate_proposals_v2_op.cc
paddle/fluid/operators/detection/generate_proposals_v2_op.cc
+0
-2
paddle/fluid/operators/detection/iou_similarity_op_mlu.cc
paddle/fluid/operators/detection/iou_similarity_op_mlu.cc
+24
-26
paddle/fluid/operators/detection/iou_similarity_op_npu.cc
paddle/fluid/operators/detection/iou_similarity_op_npu.cc
+24
-26
paddle/fluid/operators/detection/locality_aware_nms_op.cc
paddle/fluid/operators/detection/locality_aware_nms_op.cc
+4
-6
paddle/fluid/operators/detection/matrix_nms_op.cc
paddle/fluid/operators/detection/matrix_nms_op.cc
+0
-2
paddle/fluid/operators/detection/multiclass_nms_op.cc
paddle/fluid/operators/detection/multiclass_nms_op.cc
+4
-6
paddle/fluid/operators/detection/polygon_box_transform_op.cc
paddle/fluid/operators/detection/polygon_box_transform_op.cc
+0
-2
paddle/fluid/operators/detection/polygon_box_transform_op.cu
paddle/fluid/operators/detection/polygon_box_transform_op.cu
+0
-1
paddle/fluid/operators/detection/prior_box_op_npu.cc
paddle/fluid/operators/detection/prior_box_op_npu.cc
+3
-5
paddle/fluid/operators/detection/retinanet_detection_output_op.cc
...luid/operators/detection/retinanet_detection_output_op.cc
+24
-23
paddle/fluid/operators/detection/roi_perspective_transform_op.cc
...fluid/operators/detection/roi_perspective_transform_op.cc
+6
-8
paddle/fluid/operators/detection/rpn_target_assign_op.cc
paddle/fluid/operators/detection/rpn_target_assign_op.cc
+71
-67
paddle/fluid/operators/detection/sigmoid_focal_loss_op.cu
paddle/fluid/operators/detection/sigmoid_focal_loss_op.cu
+10
-11
paddle/fluid/operators/detection/sigmoid_focal_loss_op.h
paddle/fluid/operators/detection/sigmoid_focal_loss_op.h
+10
-11
paddle/fluid/operators/detection/yolo_box_op_mlu.cc
paddle/fluid/operators/detection/yolo_box_op_mlu.cc
+1
-1
paddle/fluid/operators/detection_map_op.cc
paddle/fluid/operators/detection_map_op.cc
+0
-2
paddle/fluid/operators/dgc_clip_by_norm_op.h
paddle/fluid/operators/dgc_clip_by_norm_op.h
+0
-2
paddle/fluid/operators/dropout_op_mlu.cc
paddle/fluid/operators/dropout_op_mlu.cc
+3
-5
paddle/fluid/operators/dropout_op_npu.cc
paddle/fluid/operators/dropout_op_npu.cc
+7
-9
paddle/fluid/operators/elementwise/elementwise_add_op_mlu.cc
paddle/fluid/operators/elementwise/elementwise_add_op_mlu.cc
+0
-1
paddle/fluid/operators/elementwise/elementwise_add_op_npu.cc
paddle/fluid/operators/elementwise/elementwise_add_op_npu.cc
+3
-4
paddle/fluid/operators/elementwise/elementwise_div_op.h
paddle/fluid/operators/elementwise/elementwise_div_op.h
+0
-1
paddle/fluid/operators/elementwise/elementwise_div_op_mlu.cc
paddle/fluid/operators/elementwise/elementwise_div_op_mlu.cc
+3
-5
paddle/fluid/operators/elementwise/elementwise_div_op_npu.cc
paddle/fluid/operators/elementwise/elementwise_div_op_npu.cc
+10
-12
paddle/fluid/operators/elementwise/elementwise_floordiv_op_npu.cc
...luid/operators/elementwise/elementwise_floordiv_op_npu.cc
+0
-2
paddle/fluid/operators/elementwise/elementwise_max_op_npu.cc
paddle/fluid/operators/elementwise/elementwise_max_op_npu.cc
+8
-10
paddle/fluid/operators/elementwise/elementwise_min_op_mlu.cc
paddle/fluid/operators/elementwise/elementwise_min_op_mlu.cc
+0
-2
paddle/fluid/operators/elementwise/elementwise_min_op_npu.cc
paddle/fluid/operators/elementwise/elementwise_min_op_npu.cc
+7
-9
paddle/fluid/operators/elementwise/elementwise_mlu.h
paddle/fluid/operators/elementwise/elementwise_mlu.h
+3
-3
paddle/fluid/operators/elementwise/elementwise_mod_op_npu.cc
paddle/fluid/operators/elementwise/elementwise_mod_op_npu.cc
+1
-3
paddle/fluid/operators/elementwise/elementwise_mul_op.h
paddle/fluid/operators/elementwise/elementwise_mul_op.h
+0
-1
paddle/fluid/operators/elementwise/elementwise_mul_op_mlu.cc
paddle/fluid/operators/elementwise/elementwise_mul_op_mlu.cc
+2
-3
paddle/fluid/operators/elementwise/elementwise_mul_op_npu.cc
paddle/fluid/operators/elementwise/elementwise_mul_op_npu.cc
+4
-5
paddle/fluid/operators/elementwise/elementwise_npu.h
paddle/fluid/operators/elementwise/elementwise_npu.h
+4
-5
paddle/fluid/operators/elementwise/elementwise_op.h
paddle/fluid/operators/elementwise/elementwise_op.h
+0
-6
paddle/fluid/operators/elementwise/elementwise_pow_op_mlu.cc
paddle/fluid/operators/elementwise/elementwise_pow_op_mlu.cc
+5
-7
paddle/fluid/operators/elementwise/elementwise_pow_op_npu.cc
paddle/fluid/operators/elementwise/elementwise_pow_op_npu.cc
+15
-17
paddle/fluid/operators/elementwise/elementwise_sub_op_mlu.cc
paddle/fluid/operators/elementwise/elementwise_sub_op_mlu.cc
+0
-2
paddle/fluid/operators/elementwise/elementwise_sub_op_npu.cc
paddle/fluid/operators/elementwise/elementwise_sub_op_npu.cc
+3
-5
paddle/fluid/operators/expand_as_op.h
paddle/fluid/operators/expand_as_op.h
+0
-1
paddle/fluid/operators/expand_as_v2_op.h
paddle/fluid/operators/expand_as_v2_op.h
+0
-1
paddle/fluid/operators/expand_as_v2_op_mlu.cc
paddle/fluid/operators/expand_as_v2_op_mlu.cc
+0
-2
paddle/fluid/operators/expand_op.h
paddle/fluid/operators/expand_op.h
+0
-1
paddle/fluid/operators/expand_v2_op_npu.cc
paddle/fluid/operators/expand_v2_op_npu.cc
+4
-5
paddle/fluid/operators/eye_op_npu.cc
paddle/fluid/operators/eye_op_npu.cc
+0
-2
paddle/fluid/operators/fc_op.h
paddle/fluid/operators/fc_op.h
+0
-1
paddle/fluid/operators/fill_constant_batch_size_like_op_npu.cc
...e/fluid/operators/fill_constant_batch_size_like_op_npu.cc
+1
-3
paddle/fluid/operators/fill_constant_op_mlu.cc
paddle/fluid/operators/fill_constant_op_mlu.cc
+2
-1
paddle/fluid/operators/filter_by_instag_op.cu
paddle/fluid/operators/filter_by_instag_op.cu
+0
-1
paddle/fluid/operators/filter_by_instag_op.h
paddle/fluid/operators/filter_by_instag_op.h
+0
-1
paddle/fluid/operators/flatten_op.cc
paddle/fluid/operators/flatten_op.cc
+0
-2
paddle/fluid/operators/flatten_op_npu.cc
paddle/fluid/operators/flatten_op_npu.cc
+0
-2
paddle/fluid/operators/fsp_op.h
paddle/fluid/operators/fsp_op.h
+0
-2
paddle/fluid/operators/fused/attn_gemm.h
paddle/fluid/operators/fused/attn_gemm.h
+0
-1
paddle/fluid/operators/fused/attn_gemm_int8.h
paddle/fluid/operators/fused/attn_gemm_int8.h
+0
-1
paddle/fluid/operators/fused/conv_fusion_op.cu
paddle/fluid/operators/fused/conv_fusion_op.cu
+5
-5
paddle/fluid/operators/fused/cudnn_bn_add_relu_test.cc
paddle/fluid/operators/fused/cudnn_bn_add_relu_test.cc
+86
-85
paddle/fluid/operators/fused/cudnn_bn_stats_finalize.cu.h
paddle/fluid/operators/fused/cudnn_bn_stats_finalize.cu.h
+10
-11
paddle/fluid/operators/fused/cudnn_norm_conv.cu.h
paddle/fluid/operators/fused/cudnn_norm_conv.cu.h
+10
-11
paddle/fluid/operators/fused/cudnn_norm_conv_test.cc
paddle/fluid/operators/fused/cudnn_norm_conv_test.cc
+6
-7
paddle/fluid/operators/fused/cudnn_scale_bias_add_relu.cu.h
paddle/fluid/operators/fused/cudnn_scale_bias_add_relu.cu.h
+19
-20
paddle/fluid/operators/fused/fmha_ref.h
paddle/fluid/operators/fused/fmha_ref.h
+0
-2
paddle/fluid/operators/fused/fused_attention_op.cc
paddle/fluid/operators/fused/fused_attention_op.cc
+0
-2
paddle/fluid/operators/fused/fused_attention_op.cu
paddle/fluid/operators/fused/fused_attention_op.cu
+3
-5
paddle/fluid/operators/fused/fused_attention_op_xpu.cc
paddle/fluid/operators/fused/fused_attention_op_xpu.cc
+83
-74
paddle/fluid/operators/fused/fused_bias_dropout_residual_layer_norm_op.cc
...rators/fused/fused_bias_dropout_residual_layer_norm_op.cc
+0
-2
paddle/fluid/operators/fused/fused_bias_dropout_residual_layer_norm_op.cu
...rators/fused/fused_bias_dropout_residual_layer_norm_op.cu
+0
-2
paddle/fluid/operators/fused/fused_bn_activation_op.cc
paddle/fluid/operators/fused/fused_bn_activation_op.cc
+3
-3
paddle/fluid/operators/fused/fused_bn_activation_op.cu
paddle/fluid/operators/fused/fused_bn_activation_op.cu
+2
-3
paddle/fluid/operators/fused/fused_bn_activation_op.h
paddle/fluid/operators/fused/fused_bn_activation_op.h
+0
-1
paddle/fluid/operators/fused/fused_bn_add_activation_op.cc
paddle/fluid/operators/fused/fused_bn_add_activation_op.cc
+3
-3
paddle/fluid/operators/fused/fused_bn_add_activation_op.cu
paddle/fluid/operators/fused/fused_bn_add_activation_op.cu
+2
-3
paddle/fluid/operators/fused/fused_bn_add_activation_op.h
paddle/fluid/operators/fused/fused_bn_add_activation_op.h
+0
-1
paddle/fluid/operators/fused/fused_embedding_eltwise_layernorm_op.cu
...d/operators/fused/fused_embedding_eltwise_layernorm_op.cu
+0
-1
paddle/fluid/operators/fused/fused_embedding_fc_lstm_op.cc
paddle/fluid/operators/fused/fused_embedding_fc_lstm_op.cc
+12
-9
paddle/fluid/operators/fused/fused_embedding_fc_lstm_op.h
paddle/fluid/operators/fused/fused_embedding_fc_lstm_op.h
+0
-2
paddle/fluid/operators/fused/fused_embedding_seq_pool_op.h
paddle/fluid/operators/fused/fused_embedding_seq_pool_op.h
+2
-3
paddle/fluid/operators/fused/fused_feedforward_op.cc
paddle/fluid/operators/fused/fused_feedforward_op.cc
+0
-1
paddle/fluid/operators/fused/fused_feedforward_op.cu
paddle/fluid/operators/fused/fused_feedforward_op.cu
+0
-2
paddle/fluid/operators/fused/fused_feedforward_op_xpu.cc
paddle/fluid/operators/fused/fused_feedforward_op_xpu.cc
+99
-95
paddle/fluid/operators/fused/fused_gate_attention.h
paddle/fluid/operators/fused/fused_gate_attention.h
+17
-19
paddle/fluid/operators/fused/fused_gate_attention_op.cc
paddle/fluid/operators/fused/fused_gate_attention_op.cc
+0
-1
paddle/fluid/operators/fused/fused_gate_attention_op.cu
paddle/fluid/operators/fused/fused_gate_attention_op.cu
+46
-47
paddle/fluid/operators/fused/fused_gemm_epilogue_op.cc
paddle/fluid/operators/fused/fused_gemm_epilogue_op.cc
+0
-1
paddle/fluid/operators/fused/fused_gemm_epilogue_op.cu
paddle/fluid/operators/fused/fused_gemm_epilogue_op.cu
+0
-2
paddle/fluid/operators/fused/fused_gemm_epilogue_op_xpu.cc
paddle/fluid/operators/fused/fused_gemm_epilogue_op_xpu.cc
+0
-2
paddle/fluid/operators/fused/fused_multi_transformer_int8_op.cc
.../fluid/operators/fused/fused_multi_transformer_int8_op.cc
+1
-3
paddle/fluid/operators/fused/fused_multi_transformer_int8_op.cu
.../fluid/operators/fused/fused_multi_transformer_int8_op.cu
+20
-18
paddle/fluid/operators/fused/fused_multi_transformer_op.cc
paddle/fluid/operators/fused/fused_multi_transformer_op.cc
+1
-3
paddle/fluid/operators/fused/fused_multi_transformer_op.cu
paddle/fluid/operators/fused/fused_multi_transformer_op.cu
+49
-43
paddle/fluid/operators/fused/fused_multi_transformer_op.cu.h
paddle/fluid/operators/fused/fused_multi_transformer_op.cu.h
+5
-7
paddle/fluid/operators/fused/fusion_conv_inception_op.cu
paddle/fluid/operators/fused/fusion_conv_inception_op.cu
+0
-1
paddle/fluid/operators/fused/fusion_gru_op.cc
paddle/fluid/operators/fused/fusion_gru_op.cc
+16
-13
paddle/fluid/operators/fused/fusion_gru_op.h
paddle/fluid/operators/fused/fusion_gru_op.h
+0
-2
paddle/fluid/operators/fused/fusion_lstm_op.cc
paddle/fluid/operators/fused/fusion_lstm_op.cc
+12
-9
paddle/fluid/operators/fused/fusion_lstm_op.h
paddle/fluid/operators/fused/fusion_lstm_op.h
+0
-2
paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.cc
paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.cc
+5
-3
paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.h
paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.h
+0
-2
paddle/fluid/operators/fused/fusion_seqconv_eltadd_relu_op.cc
...le/fluid/operators/fused/fusion_seqconv_eltadd_relu_op.cc
+11
-9
paddle/fluid/operators/fused/fusion_seqconv_eltadd_relu_op.h
paddle/fluid/operators/fused/fusion_seqconv_eltadd_relu_op.h
+0
-2
paddle/fluid/operators/fused/fusion_seqexpand_concat_fc_op.cc
...le/fluid/operators/fused/fusion_seqexpand_concat_fc_op.cc
+4
-3
paddle/fluid/operators/fused/fusion_seqexpand_concat_fc_op.h
paddle/fluid/operators/fused/fusion_seqexpand_concat_fc_op.h
+0
-2
paddle/fluid/operators/fused/fusion_seqpool_concat_op.h
paddle/fluid/operators/fused/fusion_seqpool_concat_op.h
+0
-2
paddle/fluid/operators/fused/fusion_seqpool_cvm_concat_op.cc
paddle/fluid/operators/fused/fusion_seqpool_cvm_concat_op.cc
+2
-1
paddle/fluid/operators/fused/fusion_seqpool_cvm_concat_op.h
paddle/fluid/operators/fused/fusion_seqpool_cvm_concat_op.h
+0
-2
paddle/fluid/operators/fused/fusion_squared_mat_sub_op.cc
paddle/fluid/operators/fused/fusion_squared_mat_sub_op.cc
+6
-6
paddle/fluid/operators/fused/fusion_squared_mat_sub_op.h
paddle/fluid/operators/fused/fusion_squared_mat_sub_op.h
+0
-2
paddle/fluid/operators/fused/multihead_matmul_op.cu
paddle/fluid/operators/fused/multihead_matmul_op.cu
+5
-6
paddle/fluid/operators/fused/resnet_basic_block_op.cc
paddle/fluid/operators/fused/resnet_basic_block_op.cc
+0
-1
paddle/fluid/operators/fused/resnet_basic_block_op_xpu.cc
paddle/fluid/operators/fused/resnet_basic_block_op_xpu.cc
+0
-2
paddle/fluid/operators/fused/resnet_unit_op.cc
paddle/fluid/operators/fused/resnet_unit_op.cc
+0
-2
paddle/fluid/operators/fused/resnet_unit_op.cu
paddle/fluid/operators/fused/resnet_unit_op.cu
+71
-62
paddle/fluid/operators/fused/resnet_unit_op_xpu.cc
paddle/fluid/operators/fused/resnet_unit_op_xpu.cc
+55
-46
paddle/fluid/operators/fused/skip_layernorm_op.cu
paddle/fluid/operators/fused/skip_layernorm_op.cu
+0
-1
paddle/fluid/operators/fused/xpu_fused_common_function.h
paddle/fluid/operators/fused/xpu_fused_common_function.h
+8
-7
paddle/fluid/operators/fused/yolo_box_head_op.cu
paddle/fluid/operators/fused/yolo_box_head_op.cu
+0
-1
paddle/fluid/operators/fused/yolo_box_post_op.cu
paddle/fluid/operators/fused/yolo_box_post_op.cu
+0
-1
paddle/fluid/operators/gather_nd_op_mlu.cc
paddle/fluid/operators/gather_nd_op_mlu.cc
+0
-2
paddle/fluid/operators/gather_nd_op_npu.cc
paddle/fluid/operators/gather_nd_op_npu.cc
+0
-1
paddle/fluid/operators/gather_scatter_kernel.cc
paddle/fluid/operators/gather_scatter_kernel.cc
+11
-13
paddle/fluid/operators/gather_scatter_kernel.cu
paddle/fluid/operators/gather_scatter_kernel.cu
+12
-14
paddle/fluid/operators/gather_scatter_kernel.h
paddle/fluid/operators/gather_scatter_kernel.h
+24
-26
paddle/fluid/operators/gaussian_random_op.cc
paddle/fluid/operators/gaussian_random_op.cc
+0
-2
paddle/fluid/operators/gaussian_random_op_mlu.cc
paddle/fluid/operators/gaussian_random_op_mlu.cc
+1
-2
paddle/fluid/operators/gaussian_random_op_npu.cc
paddle/fluid/operators/gaussian_random_op_npu.cc
+1
-2
paddle/fluid/operators/gelu_op_npu.cc
paddle/fluid/operators/gelu_op_npu.cc
+0
-2
paddle/fluid/operators/graph_khop_sampler_op.cu
paddle/fluid/operators/graph_khop_sampler_op.cu
+0
-2
paddle/fluid/operators/graph_khop_sampler_op.h
paddle/fluid/operators/graph_khop_sampler_op.h
+0
-2
paddle/fluid/operators/grid_sampler_op_mlu.cc
paddle/fluid/operators/grid_sampler_op_mlu.cc
+2
-4
paddle/fluid/operators/group_norm_op.cc
paddle/fluid/operators/group_norm_op.cc
+7
-8
paddle/fluid/operators/group_norm_op.cu
paddle/fluid/operators/group_norm_op.cu
+5
-5
paddle/fluid/operators/group_norm_op.h
paddle/fluid/operators/group_norm_op.h
+0
-1
paddle/fluid/operators/group_norm_op_npu.cc
paddle/fluid/operators/group_norm_op_npu.cc
+23
-24
paddle/fluid/operators/gru_op.cc
paddle/fluid/operators/gru_op.cc
+7
-7
paddle/fluid/operators/gru_op.cu.cc
paddle/fluid/operators/gru_op.cu.cc
+5
-4
paddle/fluid/operators/gru_op.h
paddle/fluid/operators/gru_op.h
+11
-10
paddle/fluid/operators/gru_unit_op.h
paddle/fluid/operators/gru_unit_op.h
+2
-4
paddle/fluid/operators/huber_loss_op_mlu.cc
paddle/fluid/operators/huber_loss_op_mlu.cc
+12
-14
paddle/fluid/operators/huber_loss_op_npu.cc
paddle/fluid/operators/huber_loss_op_npu.cc
+2
-4
paddle/fluid/operators/im2sequence_op.h
paddle/fluid/operators/im2sequence_op.h
+18
-19
paddle/fluid/operators/index_sample_op_npu.cc
paddle/fluid/operators/index_sample_op_npu.cc
+2
-3
paddle/fluid/operators/index_select_op.h
paddle/fluid/operators/index_select_op.h
+0
-1
paddle/fluid/operators/index_select_op_npu.cc
paddle/fluid/operators/index_select_op_npu.cc
+3
-5
paddle/fluid/operators/inplace_abn_op.cc
paddle/fluid/operators/inplace_abn_op.cc
+5
-5
paddle/fluid/operators/inplace_abn_op.cu
paddle/fluid/operators/inplace_abn_op.cu
+3
-3
paddle/fluid/operators/inplace_abn_op.h
paddle/fluid/operators/inplace_abn_op.h
+0
-1
paddle/fluid/operators/instance_norm_op.cc
paddle/fluid/operators/instance_norm_op.cc
+6
-6
paddle/fluid/operators/instance_norm_op.h
paddle/fluid/operators/instance_norm_op.h
+0
-1
paddle/fluid/operators/instance_norm_op_npu.cc
paddle/fluid/operators/instance_norm_op_npu.cc
+1
-2
paddle/fluid/operators/interpolate_op.cu
paddle/fluid/operators/interpolate_op.cu
+8
-8
paddle/fluid/operators/interpolate_op.h
paddle/fluid/operators/interpolate_op.h
+1
-2
paddle/fluid/operators/interpolate_op_npu.cc
paddle/fluid/operators/interpolate_op_npu.cc
+2
-3
paddle/fluid/operators/interpolate_v2_op_mlu.cc
paddle/fluid/operators/interpolate_v2_op_mlu.cc
+2
-2
paddle/fluid/operators/interpolate_v2_op_npu.cc
paddle/fluid/operators/interpolate_v2_op_npu.cc
+38
-34
paddle/fluid/operators/jit/benchmark.cc
paddle/fluid/operators/jit/benchmark.cc
+17
-18
paddle/fluid/operators/kldiv_loss_op_npu.cc
paddle/fluid/operators/kldiv_loss_op_npu.cc
+1
-3
paddle/fluid/operators/label_smooth_op_mlu.cc
paddle/fluid/operators/label_smooth_op_mlu.cc
+0
-2
paddle/fluid/operators/label_smooth_op_npu.cc
paddle/fluid/operators/label_smooth_op_npu.cc
+4
-6
paddle/fluid/operators/layer_norm_kernel.cu.h
paddle/fluid/operators/layer_norm_kernel.cu.h
+0
-1
paddle/fluid/operators/layer_norm_op.cc
paddle/fluid/operators/layer_norm_op.cc
+3
-4
paddle/fluid/operators/layer_norm_op_mlu.cc
paddle/fluid/operators/layer_norm_op_mlu.cc
+8
-9
paddle/fluid/operators/layer_norm_op_npu.cc
paddle/fluid/operators/layer_norm_op_npu.cc
+16
-17
paddle/fluid/operators/layout_utils.h
paddle/fluid/operators/layout_utils.h
+0
-2
paddle/fluid/operators/limit_by_capacity_op.cu
paddle/fluid/operators/limit_by_capacity_op.cu
+0
-2
paddle/fluid/operators/log_loss_op_npu.cc
paddle/fluid/operators/log_loss_op_npu.cc
+0
-2
paddle/fluid/operators/log_loss_op_xpu.cc
paddle/fluid/operators/log_loss_op_xpu.cc
+0
-2
paddle/fluid/operators/lookup_table_dequant_op.h
paddle/fluid/operators/lookup_table_dequant_op.h
+0
-1
paddle/fluid/operators/lookup_table_op.h
paddle/fluid/operators/lookup_table_op.h
+0
-1
paddle/fluid/operators/lookup_table_v2_op.h
paddle/fluid/operators/lookup_table_v2_op.h
+5
-6
paddle/fluid/operators/lookup_table_v2_op_mlu.cc
paddle/fluid/operators/lookup_table_v2_op_mlu.cc
+1
-3
paddle/fluid/operators/lookup_table_v2_op_npu.cc
paddle/fluid/operators/lookup_table_v2_op_npu.cc
+5
-6
paddle/fluid/operators/lrn_op.h
paddle/fluid/operators/lrn_op.h
+0
-3
paddle/fluid/operators/lstm_op.h
paddle/fluid/operators/lstm_op.h
+19
-21
paddle/fluid/operators/lstmp_op.h
paddle/fluid/operators/lstmp_op.h
+23
-24
paddle/fluid/operators/masked_select_op_mlu.cc
paddle/fluid/operators/masked_select_op_mlu.cc
+7
-7
paddle/fluid/operators/match_matrix_tensor_op.cc
paddle/fluid/operators/match_matrix_tensor_op.cc
+1
-2
paddle/fluid/operators/match_matrix_tensor_op.h
paddle/fluid/operators/match_matrix_tensor_op.h
+0
-1
paddle/fluid/operators/math/context_project.h
paddle/fluid/operators/math/context_project.h
+23
-22
paddle/fluid/operators/math/eigen_values_vectors.h
paddle/fluid/operators/math/eigen_values_vectors.h
+13
-13
paddle/fluid/operators/math/sample_prob.cu
paddle/fluid/operators/math/sample_prob.cu
+1
-3
paddle/fluid/operators/math/sample_prob.h
paddle/fluid/operators/math/sample_prob.h
+0
-2
paddle/fluid/operators/math/sequence_pooling.cc
paddle/fluid/operators/math/sequence_pooling.cc
+2
-3
paddle/fluid/operators/math/softmax.cu
paddle/fluid/operators/math/softmax.cu
+2
-3
paddle/fluid/operators/math/tree2col.cu
paddle/fluid/operators/math/tree2col.cu
+4
-5
paddle/fluid/operators/matmul_op_mlu.cc
paddle/fluid/operators/matmul_op_mlu.cc
+4
-6
paddle/fluid/operators/matmul_op_npu.cc
paddle/fluid/operators/matmul_op_npu.cc
+13
-14
paddle/fluid/operators/matmul_v2_op_mlu.cc
paddle/fluid/operators/matmul_v2_op_mlu.cc
+4
-6
paddle/fluid/operators/matmul_v2_op_npu.cc
paddle/fluid/operators/matmul_v2_op_npu.cc
+10
-11
paddle/fluid/operators/mean_iou_op.h
paddle/fluid/operators/mean_iou_op.h
+3
-4
paddle/fluid/operators/mean_op_mlu.cc
paddle/fluid/operators/mean_op_mlu.cc
+8
-9
paddle/fluid/operators/mean_op_npu.cc
paddle/fluid/operators/mean_op_npu.cc
+10
-11
paddle/fluid/operators/meshgrid_op_mlu.cc
paddle/fluid/operators/meshgrid_op_mlu.cc
+6
-6
paddle/fluid/operators/metrics/accuracy_op_mlu.cc
paddle/fluid/operators/metrics/accuracy_op_mlu.cc
+7
-7
paddle/fluid/operators/metrics/accuracy_op_xpu.cc
paddle/fluid/operators/metrics/accuracy_op_xpu.cc
+0
-1
paddle/fluid/operators/metrics/precision_recall_op.h
paddle/fluid/operators/metrics/precision_recall_op.h
+0
-1
paddle/fluid/operators/mkldnn/dequantize_mkldnn_op.cc
paddle/fluid/operators/mkldnn/dequantize_mkldnn_op.cc
+0
-1
paddle/fluid/operators/mkldnn/matmul_v2_mkldnn_op.cc
paddle/fluid/operators/mkldnn/matmul_v2_mkldnn_op.cc
+25
-24
paddle/fluid/operators/mkldnn/quantize_mkldnn_op.cc
paddle/fluid/operators/mkldnn/quantize_mkldnn_op.cc
+0
-1
paddle/fluid/operators/mkldnn/requantize_mkldnn_op.cc
paddle/fluid/operators/mkldnn/requantize_mkldnn_op.cc
+0
-1
paddle/fluid/operators/mkldnn/reshape_mkldnn_op.cc
paddle/fluid/operators/mkldnn/reshape_mkldnn_op.cc
+1
-1
paddle/fluid/operators/mkldnn/transpose_mkldnn_op.cc
paddle/fluid/operators/mkldnn/transpose_mkldnn_op.cc
+5
-5
paddle/fluid/operators/mlu/mlu_baseop.cc
paddle/fluid/operators/mlu/mlu_baseop.cc
+64
-64
paddle/fluid/operators/mlu/mlu_baseop.h
paddle/fluid/operators/mlu/mlu_baseop.h
+6
-7
paddle/fluid/operators/modified_huber_loss_op.cu
paddle/fluid/operators/modified_huber_loss_op.cu
+0
-2
paddle/fluid/operators/modified_huber_loss_op.h
paddle/fluid/operators/modified_huber_loss_op.h
+0
-1
paddle/fluid/operators/multi_dot_op.cc
paddle/fluid/operators/multi_dot_op.cc
+0
-1
paddle/fluid/operators/multinomial_op_npu.cc
paddle/fluid/operators/multinomial_op_npu.cc
+0
-2
paddle/fluid/operators/multiplex_op.cc
paddle/fluid/operators/multiplex_op.cc
+0
-2
paddle/fluid/operators/nce_op.h
paddle/fluid/operators/nce_op.h
+5
-6
paddle/fluid/operators/norm_op_npu.cc
paddle/fluid/operators/norm_op_npu.cc
+0
-1
paddle/fluid/operators/norm_utils.cu.h
paddle/fluid/operators/norm_utils.cu.h
+14
-15
paddle/fluid/operators/number_count_op.cu
paddle/fluid/operators/number_count_op.cu
+0
-2
paddle/fluid/operators/one_hot_op.h
paddle/fluid/operators/one_hot_op.h
+0
-1
paddle/fluid/operators/one_hot_op_npu.cc
paddle/fluid/operators/one_hot_op_npu.cc
+1
-2
paddle/fluid/operators/one_hot_op_xpu.cc
paddle/fluid/operators/one_hot_op_xpu.cc
+0
-2
paddle/fluid/operators/one_hot_v2_op_mlu.cc
paddle/fluid/operators/one_hot_v2_op_mlu.cc
+7
-6
paddle/fluid/operators/one_hot_v2_op_npu.cc
paddle/fluid/operators/one_hot_v2_op_npu.cc
+1
-2
paddle/fluid/operators/optimizers/adadelta_op.cc
paddle/fluid/operators/optimizers/adadelta_op.cc
+0
-2
paddle/fluid/operators/optimizers/adagrad_op.cc
paddle/fluid/operators/optimizers/adagrad_op.cc
+0
-1
paddle/fluid/operators/optimizers/adam_op.h
paddle/fluid/operators/optimizers/adam_op.h
+0
-2
paddle/fluid/operators/optimizers/adam_op_mlu.cc
paddle/fluid/operators/optimizers/adam_op_mlu.cc
+6
-8
paddle/fluid/operators/optimizers/adam_op_npu.cc
paddle/fluid/operators/optimizers/adam_op_npu.cc
+6
-8
paddle/fluid/operators/optimizers/adamax_op.cc
paddle/fluid/operators/optimizers/adamax_op.cc
+0
-1
paddle/fluid/operators/optimizers/decayed_adagrad_op.cc
paddle/fluid/operators/optimizers/decayed_adagrad_op.cc
+0
-1
paddle/fluid/operators/optimizers/dpsgd_op.cc
paddle/fluid/operators/optimizers/dpsgd_op.cc
+0
-1
paddle/fluid/operators/optimizers/ftrl_op.cc
paddle/fluid/operators/optimizers/ftrl_op.cc
+0
-1
paddle/fluid/operators/optimizers/ftrl_op.h
paddle/fluid/operators/optimizers/ftrl_op.h
+0
-1
paddle/fluid/operators/optimizers/merged_adam_op.cc
paddle/fluid/operators/optimizers/merged_adam_op.cc
+0
-2
paddle/fluid/operators/optimizers/merged_momentum_op_mlu.cc
paddle/fluid/operators/optimizers/merged_momentum_op_mlu.cc
+3
-2
paddle/fluid/operators/optimizers/momentum_op.cc
paddle/fluid/operators/optimizers/momentum_op.cc
+6
-8
paddle/fluid/operators/optimizers/momentum_op_mlu.cc
paddle/fluid/operators/optimizers/momentum_op_mlu.cc
+2
-2
paddle/fluid/operators/optimizers/proximal_adagrad_op.cc
paddle/fluid/operators/optimizers/proximal_adagrad_op.cc
+0
-1
paddle/fluid/operators/optimizers/proximal_adagrad_op.h
paddle/fluid/operators/optimizers/proximal_adagrad_op.h
+0
-2
paddle/fluid/operators/optimizers/proximal_gd_op.cc
paddle/fluid/operators/optimizers/proximal_gd_op.cc
+0
-1
paddle/fluid/operators/optimizers/proximal_gd_op.h
paddle/fluid/operators/optimizers/proximal_gd_op.h
+0
-2
paddle/fluid/operators/optimizers/rmsprop_op_npu.cc
paddle/fluid/operators/optimizers/rmsprop_op_npu.cc
+6
-8
paddle/fluid/operators/optimizers/sparse_momentum_op.cc
paddle/fluid/operators/optimizers/sparse_momentum_op.cc
+9
-10
paddle/fluid/operators/p_norm_op_npu.cc
paddle/fluid/operators/p_norm_op_npu.cc
+10
-11
paddle/fluid/operators/pad3d_op_npu.cc
paddle/fluid/operators/pad3d_op_npu.cc
+0
-2
paddle/fluid/operators/pad_op_npu.cc
paddle/fluid/operators/pad_op_npu.cc
+0
-2
paddle/fluid/operators/partial_concat_op.cc
paddle/fluid/operators/partial_concat_op.cc
+0
-1
paddle/fluid/operators/partial_concat_op.cu
paddle/fluid/operators/partial_concat_op.cu
+1
-3
paddle/fluid/operators/partial_concat_op.h
paddle/fluid/operators/partial_concat_op.h
+0
-1
paddle/fluid/operators/partial_sum_op.cc
paddle/fluid/operators/partial_sum_op.cc
+0
-1
paddle/fluid/operators/partial_sum_op.cu
paddle/fluid/operators/partial_sum_op.cu
+2
-4
paddle/fluid/operators/partial_sum_op.h
paddle/fluid/operators/partial_sum_op.h
+0
-2
paddle/fluid/operators/pool_op.cc
paddle/fluid/operators/pool_op.cc
+4
-4
paddle/fluid/operators/pool_op.h
paddle/fluid/operators/pool_op.h
+0
-2
paddle/fluid/operators/pool_op_mlu.cc
paddle/fluid/operators/pool_op_mlu.cc
+6
-6
paddle/fluid/operators/positive_negative_pair_op.h
paddle/fluid/operators/positive_negative_pair_op.h
+0
-2
paddle/fluid/operators/prelu_op.cc
paddle/fluid/operators/prelu_op.cc
+0
-2
paddle/fluid/operators/prroi_pool_op.cc
paddle/fluid/operators/prroi_pool_op.cc
+0
-2
paddle/fluid/operators/prroi_pool_op.cu
paddle/fluid/operators/prroi_pool_op.cu
+0
-2
paddle/fluid/operators/pyramid_hash_op.cc
paddle/fluid/operators/pyramid_hash_op.cc
+0
-1
paddle/fluid/operators/random_routing_op.cu
paddle/fluid/operators/random_routing_op.cu
+0
-2
paddle/fluid/operators/rank_attention_op.cc
paddle/fluid/operators/rank_attention_op.cc
+0
-1
paddle/fluid/operators/reduce_ops/reduce_any_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_any_op_npu.cc
+0
-1
paddle/fluid/operators/reduce_ops/reduce_any_op_npu_test.cc
paddle/fluid/operators/reduce_ops/reduce_any_op_npu_test.cc
+0
-2
paddle/fluid/operators/reduce_ops/reduce_max_op_mlu.cc
paddle/fluid/operators/reduce_ops/reduce_max_op_mlu.cc
+11
-9
paddle/fluid/operators/reduce_ops/reduce_max_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_max_op_npu.cc
+7
-8
paddle/fluid/operators/reduce_ops/reduce_mean_op_mlu.cc
paddle/fluid/operators/reduce_ops/reduce_mean_op_mlu.cc
+1
-1
paddle/fluid/operators/reduce_ops/reduce_mean_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_mean_op_npu.cc
+3
-3
paddle/fluid/operators/reduce_ops/reduce_min_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_min_op_npu.cc
+2
-3
paddle/fluid/operators/reduce_ops/reduce_op.h
paddle/fluid/operators/reduce_ops/reduce_op.h
+4
-5
paddle/fluid/operators/reduce_ops/reduce_op_function.h
paddle/fluid/operators/reduce_ops/reduce_op_function.h
+0
-1
paddle/fluid/operators/reduce_ops/reduce_prod_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_prod_op_npu.cc
+0
-1
paddle/fluid/operators/reduce_ops/reduce_sum_op.h
paddle/fluid/operators/reduce_ops/reduce_sum_op.h
+1
-1
paddle/fluid/operators/reduce_ops/reduce_sum_op_mlu.cc
paddle/fluid/operators/reduce_ops/reduce_sum_op_mlu.cc
+1
-1
paddle/fluid/operators/reduce_ops/reduce_sum_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_sum_op_npu.cc
+1
-1
paddle/fluid/operators/reshape_op.cc
paddle/fluid/operators/reshape_op.cc
+3
-5
paddle/fluid/operators/rnn_op_mlu.cc
paddle/fluid/operators/rnn_op_mlu.cc
+1
-2
paddle/fluid/operators/roi_align_op.cc
paddle/fluid/operators/roi_align_op.cc
+0
-2
paddle/fluid/operators/roi_align_op_mlu.cc
paddle/fluid/operators/roi_align_op_mlu.cc
+11
-13
paddle/fluid/operators/roi_align_op_npu.cc
paddle/fluid/operators/roi_align_op_npu.cc
+5
-6
paddle/fluid/operators/roi_pool_op.cc
paddle/fluid/operators/roi_pool_op.cc
+0
-2
paddle/fluid/operators/sample_logits_op.cu
paddle/fluid/operators/sample_logits_op.cu
+7
-8
paddle/fluid/operators/sample_logits_op.h
paddle/fluid/operators/sample_logits_op.h
+7
-9
paddle/fluid/operators/sampling_id_op.cc
paddle/fluid/operators/sampling_id_op.cc
+0
-2
paddle/fluid/operators/sampling_id_op.h
paddle/fluid/operators/sampling_id_op.h
+0
-2
paddle/fluid/operators/save_combine_op.cc
paddle/fluid/operators/save_combine_op.cc
+0
-2
paddle/fluid/operators/scatter_op_mlu.cc
paddle/fluid/operators/scatter_op_mlu.cc
+1
-1
paddle/fluid/operators/scatter_op_npu.cc
paddle/fluid/operators/scatter_op_npu.cc
+4
-6
paddle/fluid/operators/search_compute.h
paddle/fluid/operators/search_compute.h
+0
-1
paddle/fluid/operators/seed_op.cc
paddle/fluid/operators/seed_op.cc
+0
-1
paddle/fluid/operators/seed_op.h
paddle/fluid/operators/seed_op.h
+0
-1
paddle/fluid/operators/set_value_op.cc
paddle/fluid/operators/set_value_op.cc
+15
-12
paddle/fluid/operators/set_value_op.h
paddle/fluid/operators/set_value_op.h
+0
-1
paddle/fluid/operators/set_value_op_mlu.cc
paddle/fluid/operators/set_value_op_mlu.cc
+4
-4
paddle/fluid/operators/set_value_op_npu.cc
paddle/fluid/operators/set_value_op_npu.cc
+3
-3
paddle/fluid/operators/shape_op_mlu.cc
paddle/fluid/operators/shape_op_mlu.cc
+1
-2
paddle/fluid/operators/shape_op_npu.cc
paddle/fluid/operators/shape_op_npu.cc
+0
-2
paddle/fluid/operators/shard_index_op_npu.cc
paddle/fluid/operators/shard_index_op_npu.cc
+5
-6
paddle/fluid/operators/shuffle_batch_op.h
paddle/fluid/operators/shuffle_batch_op.h
+0
-1
paddle/fluid/operators/shuffle_channel_op.cu
paddle/fluid/operators/shuffle_channel_op.cu
+0
-1
paddle/fluid/operators/sigmoid_cross_entropy_with_logits_op_mlu.cc
...uid/operators/sigmoid_cross_entropy_with_logits_op_mlu.cc
+0
-1
paddle/fluid/operators/sigmoid_cross_entropy_with_logits_op_npu.cc
...uid/operators/sigmoid_cross_entropy_with_logits_op_npu.cc
+0
-1
paddle/fluid/operators/similarity_focus_op.h
paddle/fluid/operators/similarity_focus_op.h
+0
-1
paddle/fluid/operators/slice_op.cc
paddle/fluid/operators/slice_op.cc
+2
-4
paddle/fluid/operators/slice_op_mlu.cc
paddle/fluid/operators/slice_op_mlu.cc
+0
-2
paddle/fluid/operators/slice_op_npu.cc
paddle/fluid/operators/slice_op_npu.cc
+1
-2
paddle/fluid/operators/smooth_l1_loss_op.h
paddle/fluid/operators/smooth_l1_loss_op.h
+3
-4
paddle/fluid/operators/smooth_l1_loss_op_npu.cc
paddle/fluid/operators/smooth_l1_loss_op_npu.cc
+11
-11
paddle/fluid/operators/softmax_with_cross_entropy_op_mlu.cc
paddle/fluid/operators/softmax_with_cross_entropy_op_mlu.cc
+0
-2
paddle/fluid/operators/softmax_with_cross_entropy_op_npu.cc
paddle/fluid/operators/softmax_with_cross_entropy_op_npu.cc
+2
-4
paddle/fluid/operators/space_to_depth_op.cc
paddle/fluid/operators/space_to_depth_op.cc
+0
-2
paddle/fluid/operators/sparse_attention_op.cu
paddle/fluid/operators/sparse_attention_op.cu
+25
-23
paddle/fluid/operators/split_op_mlu.cc
paddle/fluid/operators/split_op_mlu.cc
+0
-2
paddle/fluid/operators/split_op_npu.cc
paddle/fluid/operators/split_op_npu.cc
+1
-3
paddle/fluid/operators/squared_l2_distance_op.h
paddle/fluid/operators/squared_l2_distance_op.h
+0
-2
paddle/fluid/operators/squared_l2_norm_op_mlu.cc
paddle/fluid/operators/squared_l2_norm_op_mlu.cc
+2
-4
paddle/fluid/operators/squared_l2_norm_op_npu.cc
paddle/fluid/operators/squared_l2_norm_op_npu.cc
+3
-5
paddle/fluid/operators/stack_op_mlu.cc
paddle/fluid/operators/stack_op_mlu.cc
+4
-6
paddle/fluid/operators/stack_op_npu.cc
paddle/fluid/operators/stack_op_npu.cc
+8
-10
paddle/fluid/operators/stft_op.h
paddle/fluid/operators/stft_op.h
+7
-9
paddle/fluid/operators/strided_slice_op.cc
paddle/fluid/operators/strided_slice_op.cc
+2
-4
paddle/fluid/operators/strided_slice_op_mlu.cc
paddle/fluid/operators/strided_slice_op_mlu.cc
+4
-5
paddle/fluid/operators/strided_slice_op_npu.cc
paddle/fluid/operators/strided_slice_op_npu.cc
+15
-16
paddle/fluid/operators/sum_op_mlu.cc
paddle/fluid/operators/sum_op_mlu.cc
+1
-2
paddle/fluid/operators/sum_op_npu.cc
paddle/fluid/operators/sum_op_npu.cc
+1
-2
paddle/fluid/operators/svd_helper.h
paddle/fluid/operators/svd_helper.h
+19
-19
paddle/fluid/operators/sync_batch_norm_op_mlu.cc
paddle/fluid/operators/sync_batch_norm_op_mlu.cc
+14
-15
paddle/fluid/operators/sync_batch_norm_op_npu.cc
paddle/fluid/operators/sync_batch_norm_op_npu.cc
+70
-72
paddle/fluid/operators/take_along_axis_op_npu.cc
paddle/fluid/operators/take_along_axis_op_npu.cc
+0
-2
paddle/fluid/operators/tdm_child_op.h
paddle/fluid/operators/tdm_child_op.h
+0
-1
paddle/fluid/operators/tdm_sampler_op.h
paddle/fluid/operators/tdm_sampler_op.h
+0
-1
paddle/fluid/operators/teacher_student_sigmoid_loss_op.cc
paddle/fluid/operators/teacher_student_sigmoid_loss_op.cc
+6
-6
paddle/fluid/operators/teacher_student_sigmoid_loss_op.h
paddle/fluid/operators/teacher_student_sigmoid_loss_op.h
+0
-1
paddle/fluid/operators/temporal_shift_op.h
paddle/fluid/operators/temporal_shift_op.h
+0
-1
paddle/fluid/operators/tile_op_mlu.cc
paddle/fluid/operators/tile_op_mlu.cc
+0
-2
paddle/fluid/operators/tile_op_npu.cc
paddle/fluid/operators/tile_op_npu.cc
+0
-1
paddle/fluid/operators/top_k_op.cu
paddle/fluid/operators/top_k_op.cu
+1
-3
paddle/fluid/operators/top_k_op.h
paddle/fluid/operators/top_k_op.h
+0
-2
paddle/fluid/operators/top_k_op_npu.cc
paddle/fluid/operators/top_k_op_npu.cc
+1
-1
paddle/fluid/operators/top_k_op_xpu.cc
paddle/fluid/operators/top_k_op_xpu.cc
+0
-1
paddle/fluid/operators/tree_conv_op.h
paddle/fluid/operators/tree_conv_op.h
+6
-7
paddle/fluid/operators/truncated_gaussian_random_op_npu.cc
paddle/fluid/operators/truncated_gaussian_random_op_npu.cc
+6
-8
paddle/fluid/operators/uniform_random_op.cc
paddle/fluid/operators/uniform_random_op.cc
+1
-1
paddle/fluid/operators/uniform_random_op.cu
paddle/fluid/operators/uniform_random_op.cu
+2
-1
paddle/fluid/operators/uniform_random_op.h
paddle/fluid/operators/uniform_random_op.h
+0
-1
paddle/fluid/operators/uniform_random_op_mlu.cc
paddle/fluid/operators/uniform_random_op_mlu.cc
+3
-2
paddle/fluid/operators/uniform_random_op_npu.cc
paddle/fluid/operators/uniform_random_op_npu.cc
+3
-2
paddle/fluid/operators/var_conv_2d_op.cc
paddle/fluid/operators/var_conv_2d_op.cc
+19
-18
paddle/fluid/operators/var_conv_2d_op.h
paddle/fluid/operators/var_conv_2d_op.h
+0
-1
paddle/fluid/operators/where_index_op_mlu.cc
paddle/fluid/operators/where_index_op_mlu.cc
+2
-4
paddle/fluid/operators/where_index_op_npu.cc
paddle/fluid/operators/where_index_op_npu.cc
+5
-7
paddle/fluid/imperative/gradient_accumulator.cc
@@ -644,11 +644,11 @@ void GradientAccumulator::CallGradientHooks() {
                         true,
                         platform::errors::PreconditionNotMet(
                             "Only can call gradient hooks after sum gradient completed."));
-  PADDLE_ENFORCE_EQ(HasInnerVar(),
+  PADDLE_ENFORCE_EQ(HasInnerVar(),
                     true,
                     platform::errors::PreconditionNotMet(
-                        "Leaf Tensor's inner var is nullptr when call gradient hook."));
+                        "Leaf Tensor's inner var is nullptr when "
+                        "call gradient hook."));
   PADDLE_ENFORCE_EQ(inner_var_->Var().IsInitialized(),
                     true,
paddle/fluid/operators/abs_op_mlu.cc
@@ -18,8 +18,6 @@ limitations under the Licnse. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 template <typename T>
 class AbsMLUKernel : public framework::OpKernel<T> {
  public:
@@ -54,7 +52,7 @@ class AbsGradMLUKernel : public framework::OpKernel<T> {
     MLUCnnlOpTensorDesc mul_op_desc(
         CNNL_OP_TENSOR_MUL, ToCnnlDataType<T>(), CNNL_NOT_PROPAGATE_NAN);
-    Tensor sign_x;
+    phi::DenseTensor sign_x;
     sign_x.mutable_data<T>(x->dims(), ctx.GetPlace());
     MLUCnnl::Sign(ctx,
paddle/fluid/operators/abs_op_npu.cc
@@ -18,8 +18,6 @@ limitations under the Licnse. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class AbsNPUKernel : public framework::OpKernel<T> {
  public:
paddle/fluid/operators/activation_op_mlu.cc
@@ -21,8 +21,6 @@ limitations under the Licnse. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 template <cnnlActivationMode_t act_mode, typename T>
 class ActivationMLUKernel : public framework::OpKernel<T> {
  public:
@@ -442,7 +440,7 @@ class ReciprocalGradMLUKernel : public framework::OpKernel<T> {
     auto* dx = ctx.Output<phi::DenseTensor>(framework::GradVarName("X"));
     auto place = ctx.GetPlace();
     dx->mutable_data<T>(place);
-    Tensor square_out;
+    phi::DenseTensor square_out;
     square_out.Resize(out->dims());
     square_out.mutable_data<T>(place);
     MLUCnnlTensorDesc out_desc(*out);
paddle/fluid/operators/activation_op_npu.cc
This diff is collapsed.
paddle/fluid/operators/affine_grid_op.cc
@@ -28,8 +28,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 class AffineGridOp : public framework::OperatorWithKernel {
  public:
   using framework::OperatorWithKernel::OperatorWithKernel;
paddle/fluid/operators/amp/alloc_float_status_op_npu.cc
@@ -21,8 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class AllocFloatStatusKernel : public framework::OpKernel<T> {
  public:
paddle/fluid/operators/amp/check_finite_and_unscale_op_mlu.cc
@@ -19,8 +19,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 template <typename T>
 class CheckFiniteAndUnscaleMLUKernel : public framework::OpKernel<T> {
   using MPDType = typename details::MPTypeTrait<T>::Type;
@@ -45,7 +43,7 @@ class CheckFiniteAndUnscaleMLUKernel : public framework::OpKernel<T> {
       out->mutable_data<T>(ctx.GetPlace());
 
       // check is_finite or is_nan
-      Tensor is_finite(found_inf->type());
+      phi::DenseTensor is_finite(found_inf->type());
       if (i != 0) {
         is_finite.Resize(phi::make_ddim({1}));
         is_finite.mutable_data<bool>(ctx.GetPlace());
@@ -78,8 +76,8 @@ class CheckFiniteAndUnscaleMLUKernel : public framework::OpKernel<T> {
       // out = in/scale, if found_inf = false
       // But when found_inf is true, the data of Out should not be used.
       // So, on MLU, we always compute out with in/scale.
-      Tensor float_x;
-      Tensor float_out;
+      phi::DenseTensor float_x;
+      phi::DenseTensor float_out;
       if (std::is_same<T, paddle::platform::float16>::value) {
         float_x.Resize(x->dims());
         float_out.Resize(out->dims());
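The comments in the hunk above state what check_finite_and_unscale computes: every input value is divided by the loss scale, and a found_inf flag records whether any non-finite value was seen, in which case the caller must ignore the unscaled output. The following is a small self-contained sketch of that semantics only, not Paddle's MLU/NPU kernel; the function and variable names are illustrative.

// Sketch of check_finite_and_unscale semantics: out = in / scale, and
// found_inf is set if any unscaled value is inf or nan.
#include <cmath>
#include <iostream>
#include <vector>

void CheckFiniteAndUnscale(const std::vector<float>& in, float scale,
                           std::vector<float>* out, bool* found_inf) {
  *found_inf = false;
  out->resize(in.size());
  for (size_t i = 0; i < in.size(); ++i) {
    float v = in[i] / scale;            // out = in / scale
    if (!std::isfinite(v)) *found_inf = true;
    (*out)[i] = v;                      // still written; callers must ignore
  }                                     // out when found_inf is true
}

int main() {
  std::vector<float> grads = {1.0f, 2.0f, INFINITY};
  std::vector<float> unscaled;
  bool found_inf = false;
  CheckFiniteAndUnscale(grads, 128.0f, &unscaled, &found_inf);
  std::cout << "found_inf = " << found_inf << "\n";  // prints 1
  return 0;
}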
paddle/fluid/operators/amp/check_finite_and_unscale_op_npu.cc
@@ -22,8 +22,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 // NOTE(zhiqiu): The CheckFiniteAndUnscaleNPUKernel is different from CUDA.
 // On NPU, we do not really check the data of input tensors,
 // but use NPUGetFloatStatus to check whether the nan/inf occurs on device,
@@ -47,13 +45,13 @@ class CheckFiniteAndUnscaleNPUKernel : public framework::OpKernel<T> {
             .stream();
 
     // step1: inverse scale
-    Tensor const_tensor;
+    phi::DenseTensor const_tensor;
     const_tensor.mutable_data<T>({1}, ctx.GetPlace());
     FillNpuTensorWithConstant<T>(&const_tensor, static_cast<T>(1.0));
 
     // Inverse(1.0/scale)
     phi::DenseTensor* tmp_inverse_out = const_cast<phi::DenseTensor*>(scale);
-    Tensor inverse_out(scale->type());
+    phi::DenseTensor inverse_out(scale->type());
     inverse_out.Resize(scale->dims());
     inverse_out.mutable_data<T>(ctx.GetPlace());
     const auto& runner_inverse =
@@ -62,7 +60,7 @@ class CheckFiniteAndUnscaleNPUKernel : public framework::OpKernel<T> {
     tmp_inverse_out = &inverse_out;
 
     // NOTE(zhiqiu):
-    Tensor tmp;
+    phi::DenseTensor tmp;
     tmp.mutable_data<float>({8}, ctx.GetPlace());
     // NOTE(zhiqiu): NPUGetFloatStatus updates data on input in-place.
     // tmp is only placeholder.
@@ -73,7 +71,7 @@ class CheckFiniteAndUnscaleNPUKernel : public framework::OpKernel<T> {
                     {{"message", std::string("check_nan_and_inf")}});
     runner_float_status.Run(stream);
 
-    Tensor sum;
+    phi::DenseTensor sum;
     sum.mutable_data<float>({1}, ctx.GetPlace());
     const auto& runner_reduce_sum =
         NpuOpRunner("ReduceSumD",
paddle/fluid/operators/amp/check_finite_and_unscale_op_npu_test.cc
@@ -31,8 +31,6 @@ limitations under the License. */
 namespace f = paddle::framework;
 namespace p = paddle::platform;
 
-using Tensor = phi::DenseTensor;
-
 USE_OP_ITSELF(check_finite_and_unscale);
 USE_OP_DEVICE_KERNEL(check_finite_and_unscale, NPU);
@@ -110,7 +108,7 @@ void Compare(f::Scope *scope, const p::DeviceContext &ctx) {
   ctx.Wait();
 
   // out found_inf
-  Tensor found_inf_tensor;
+  phi::DenseTensor found_inf_tensor;
   found_inf_tensor.Resize({1});
   bool *found_inf_data =
       found_inf_tensor.mutable_data<bool>(paddle::platform::CPUPlace());
paddle/fluid/operators/amp/clear_float_status_op_npu.cc
@@ -21,8 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class ClearFloatStatusKernel : public framework::OpKernel<T> {
  public:
@@ -35,7 +33,7 @@ class ClearFloatStatusKernel : public framework::OpKernel<T> {
         platform::errors::PreconditionNotMet(
             "The input(FloatStatus) and Output(FloatStatusOut) "
             "should be the same."));
-    Tensor tmp;
+    phi::DenseTensor tmp;
     tmp.mutable_data<float>({8}, ctx.GetPlace());
     const auto& runner =
         NpuOpRunner("NPUClearFloatStatus", {tmp}, {*float_status_out});
paddle/fluid/operators/amp/get_float_status_op_npu.cc
@@ -21,8 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class GetFloatStatusKernel : public framework::OpKernel<T> {
  public:
@@ -35,7 +33,7 @@ class GetFloatStatusKernel : public framework::OpKernel<T> {
         platform::errors::PreconditionNotMet(
             "The input(FloatStatus) and Output(FloatStatusOut) "
             "should be the same."));
-    Tensor tmp;
+    phi::DenseTensor tmp;
     tmp.mutable_data<float>({8}, ctx.GetPlace());
     auto stream =
         ctx.template device_context<paddle::platform::NPUDeviceContext>()
paddle/fluid/operators/amp/update_loss_scaling_op_npu.cc
@@ -25,8 +25,6 @@ DECLARE_int32(min_loss_scaling);
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
-
 template <typename T>
 void Update(const platform::NPUDeviceContext& ctx,
             const std::vector<bool> found_inf_vec,
@@ -50,7 +48,7 @@ void Update(const platform::NPUDeviceContext& ctx,
                                  good_out_tensor->numel() * sizeof(int),
                                  stream);
     // bad_out_data = bad_in_data + 1
-    Tensor factor_tensor(bad_out_tensor->dtype());
+    phi::DenseTensor factor_tensor(bad_out_tensor->dtype());
     factor_tensor.mutable_data<int>({1}, place);
     FillNpuTensorWithConstant<int>(&factor_tensor, static_cast<int>(1));
     const auto& runner_p2 = NpuOpRunner(
@@ -106,7 +104,7 @@ void Update(const platform::NPUDeviceContext& ctx,
                                  stream);
     // good_out_data = good_in_data + 1
-    Tensor factor_tensor(good_out_tensor->dtype());
+    phi::DenseTensor factor_tensor(good_out_tensor->dtype());
     factor_tensor.mutable_data<int>({1}, place);
     FillNpuTensorWithConstant<int>(&factor_tensor, static_cast<int>(1));
     const auto& runner_p2 = NpuOpRunner(
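The "bad_out_data = bad_in_data + 1" and "good_out_data = good_in_data + 1" comments in the hunks above are the bookkeeping of dynamic loss scaling: a run of finite steps eventually grows the scale, a run of overflowed steps shrinks it. The sketch below shows that scheme under assumed parameter names (incr_every_n_steps, decr_every_n_nan_or_inf, incr_ratio, decr_ratio); it is an illustration of the update rule, not Paddle's NPU kernel.

// Sketch of dynamic loss-scaling bookkeeping: count consecutive good/bad
// steps and adjust the scale when a counter reaches its threshold.
#include <iostream>

struct LossScaleState {
  float scale = 32768.0f;
  int good_steps = 0;  // consecutive steps without inf/nan
  int bad_steps = 0;   // consecutive steps that saw inf/nan
};

void UpdateLossScaling(bool found_inf, int incr_every_n_steps,
                       int decr_every_n_nan_or_inf, float incr_ratio,
                       float decr_ratio, LossScaleState* s) {
  if (found_inf) {
    s->good_steps = 0;
    s->bad_steps += 1;                          // bad_out = bad_in + 1
    if (s->bad_steps == decr_every_n_nan_or_inf) {
      s->scale *= decr_ratio;                   // shrink the scale
      s->bad_steps = 0;
    }
  } else {
    s->bad_steps = 0;
    s->good_steps += 1;                         // good_out = good_in + 1
    if (s->good_steps == incr_every_n_steps) {
      s->scale *= incr_ratio;                   // grow the scale
      s->good_steps = 0;
    }
  }
}

int main() {
  LossScaleState state;
  UpdateLossScaling(/*found_inf=*/true, 2000, 2, 2.0f, 0.5f, &state);
  std::cout << "scale = " << state.scale << "\n";
  return 0;
}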
paddle/fluid/operators/arg_max_op_npu.cc
@@ -18,7 +18,6 @@ limitations under the Licnse. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
 using NPUDeviceContext = platform::NPUDeviceContext;
 
 template <typename T>
@@ -36,7 +35,7 @@ struct VisitDataArgNPUMaxFunctor {
     auto dtype = ctx.Attr<int>("dtype");
     const bool& flatten = ctx.Attr<bool>("flatten");
 
-    Tensor transformed_x(x.type());
+    phi::DenseTensor transformed_x(x.type());
     transformed_x.ShareDataWith(x);
     if (flatten) {
       transformed_x.Resize(phi::make_ddim({x.numel()}));
paddle/fluid/operators/arg_min_op_npu.cc
@@ -17,7 +17,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
 
 template <typename DeviceContext, typename T>
 class ArgMinNPUKernel : public framework::OpKernel<T> {
paddle/fluid/operators/argsort_op_npu.cc
@@ -18,7 +18,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 
-using Tensor = phi::DenseTensor;
 using NPUDeviceContext = platform::NPUDeviceContext;
 
 template <typename T>
@@ -79,16 +78,16 @@ class ArgsortNPUKernel : public framework::OpKernel<T> {
     framework::NPUAttributeMap attr = {{"axis", -1},
                                        {"descending", descending}};
 
-    Tensor indices_tmp(experimental::DataType::INT32);
+    phi::DenseTensor indices_tmp(experimental::DataType::INT32);
     indices_tmp.Resize(indices->dims());
 
     if (framework::TransToProtoVarType(input->dtype()) ==
         framework::proto::VarType::INT64) {
-      Tensor input_fp32(experimental::DataType::FLOAT32);
+      phi::DenseTensor input_fp32(experimental::DataType::FLOAT32);
       input_fp32.Resize(input->dims());
       CastToFP32(ctx, stream, *input, &input_fp32);
-      Tensor output_fp32(experimental::DataType::FLOAT32);
+      phi::DenseTensor output_fp32(experimental::DataType::FLOAT32);
       output_fp32.Resize(output->dims());
 
       if (axis == -1 || axis + 1 == in_dims.size()) {
@@ -112,12 +111,12 @@ class ArgsortNPUKernel : public framework::OpKernel<T> {
       }
       auto trans_dims = phi::make_ddim(shape);
 
-      Tensor trans_input(input_fp32.type());
+      phi::DenseTensor trans_input(input_fp32.type());
       trans_input.Resize(trans_dims);
       TranposeNPU<float>(ctx, stream, &perm, input_fp32, &trans_input);
-      Tensor trans_output(input_fp32.type());
-      Tensor trans_indices(experimental::DataType::INT32);
+      phi::DenseTensor trans_output(input_fp32.type());
+      phi::DenseTensor trans_indices(experimental::DataType::INT32);
       trans_output.mutable_data<float>(trans_dims, ctx.GetPlace());
       trans_indices.mutable_data<int32_t>(trans_dims, ctx.GetPlace());
@@ -150,12 +149,12 @@ class ArgsortNPUKernel : public framework::OpKernel<T> {
       }
       auto trans_dims = phi::make_ddim(shape);
 
-      Tensor trans_input(input->type());
+      phi::DenseTensor trans_input(input->type());
       trans_input.Resize(trans_dims);
       TranposeNPU<T>(ctx, stream, &perm, *input, &trans_input);
-      Tensor trans_output(input->type());
-      Tensor trans_indices(experimental::DataType::INT32);
+      phi::DenseTensor trans_output(input->type());
+      phi::DenseTensor trans_indices(experimental::DataType::INT32);
       trans_output.mutable_data<T>(trans_dims, ctx.GetPlace());
       trans_indices.mutable_data<int32_t>(trans_dims, ctx.GetPlace());
@@ -183,12 +182,12 @@ static void FullAssignNPU(const framework::ExecutionContext& ctx,
       phi::product(phi::slice_ddim(in_dims, 0, in_dims.size() - 1));
   const int64_t input_width = in_dims[in_dims.size() - 1];
 
-  Tensor input_tmp;
+  phi::DenseTensor input_tmp;
   input_tmp.ShareDataWith(input);
   input_tmp.Resize(
       phi::make_ddim(std::vector<int64_t>{input_height * input_width}));
 
-  Tensor indices_tmp;
+  phi::DenseTensor indices_tmp;
   indices_tmp.ShareDataWith(indices);
   indices_tmp.Resize(
       phi::make_ddim(std::vector<int64_t>{input_height, input_width}));
@@ -197,12 +196,12 @@ static void FullAssignNPU(const framework::ExecutionContext& ctx,
   for (Type i = 0; i < input_height; i++) {
     indexs_value.push_back(i * input_width);
   }
-  Tensor indexs_tmp(indices.type());
+  phi::DenseTensor indexs_tmp(indices.type());
   framework::TensorFromVector<int64_t>(
       indexs_value, ctx.device_context(), &indexs_tmp);
   indexs_tmp.Resize(phi::make_ddim(std::vector<int64_t>{input_height, 1}));
 
-  Tensor indices_index(indices.type());
+  phi::DenseTensor indices_index(indices.type());
   indices_index.mutable_data<int64_t>(indices_tmp.dims(), ctx.GetPlace());
   const auto& runner_add =
       NpuOpRunner("Add", {indices_tmp, indexs_tmp}, {indices_index}, {});
@@ -212,7 +211,7 @@ static void FullAssignNPU(const framework::ExecutionContext& ctx,
       phi::make_ddim(std::vector<int64_t>{input_height * input_width}));
   t_out->mutable_data<T>(ctx.GetPlace());
-  Tensor out_tmp(t_out->type());
+  phi::DenseTensor out_tmp(t_out->type());
   out_tmp.ShareDataWith(*t_out);
 
   const auto& runner =
       NpuOpRunner("TensorScatterUpdate",
@@ -252,15 +251,15 @@ class ArgsortGradNPUKernel : public framework::OpKernel<T> {
     }
     auto trans_dims = phi::make_ddim(shape);
 
-    Tensor trans_dout(dO->type());
-    Tensor trans_ids(indices->type());
+    phi::DenseTensor trans_dout(dO->type());
+    phi::DenseTensor trans_ids(indices->type());
     trans_dout.Resize(trans_dims);
     trans_ids.Resize(trans_dims);
     TranposeNPU<T>(ctx, stream, &perm, *dO, &trans_dout);
     TranposeNPU<int64_t>(ctx, stream, &perm, *indices, &trans_ids);
 
-    Tensor trans_dx(dO->type());
+    phi::DenseTensor trans_dx(dO->type());
     trans_dx.Resize(trans_dims);
     FullAssignNPU<T, int64_t>(
         ctx, stream, trans_dims, trans_dout, trans_ids, &trans_dx);
paddle/fluid/operators/attention_lstm_op.cc
@@ -212,39 +212,41 @@ void AttentionLSTMOpMaker::Make() {
           "this phi::DenseTensor is a matrix with shape (T X M), where T is the "
           "total time steps in this mini-batch, M is the dim size of x.");
  AddInput("C0",
-           "(Tensor) LSTM C0"
+           "(phi::DenseTensor) LSTM C0"
           "This is a tensor with shape (N x D), where N is the batch size, D "
           "is the gate size."
           "C0 is necessary because of attention.");
  AddInput("H0",
-           "(Tensor, optional) LSTM H0"
+           "(phi::DenseTensor, optional) LSTM H0"
           "This is a tensor with shape (N x D), where N is the "
           "batch size and D is the gate size.")
      .AsDispensable();
  AddInput("AttentionWeight",
-           "(Tensor) the weights of attention fc. Always relu the fc result."
+           "(phi::DenseTensor) the weights of attention fc. Always relu the fc "
+           "result."
           "The shape is ((M+D) x 1), where M is the dim size of x, D is the "
           "gate size of LSTM.");
  AddInput("AttentionBias",
-           "(Tensor, optional) the bias of attention fc."
+           "(phi::DenseTensor, optional) the bias of attention fc."
           "The shape is (1 x 1)")
      .AsDispensable();
  AddInput("AttentionScalar",
-           "(Tensor, optional) the scalar on the result of attentioned fc. "
+           "(phi::DenseTensor, optional) the scalar on the result of "
+           "attentioned fc. "
           "Always relu the Scalar."
           "The shape is (1 x 1)")
      .AsDispensable();
  AddInput("AttentionScalarBias",
-           "(Tensor, optional) the scalar bias of attention fc."
+           "(phi::DenseTensor, optional) the scalar bias of attention fc."
           "The shape is (1 x 1)")
      .AsDispensable();
  AddInput("LSTMWeight",
-           "(Tensor) the combined weight of LSTM"
+           "(phi::DenseTensor) the combined weight of LSTM"
           " - The shape is ((D+M) x 4D), where D is the hidden gate size, M "
           "is the dim size of x"
           " - Weight = {W_forget, W_input, W_output, W_cell}");
  AddInput("LSTMBias",
-           "(Tensor) the combined bias of LSTM, shape (1x4D)."
+           "(phi::DenseTensor) the combined bias of LSTM, shape (1x4D)."
           "Note: we should add the bias of hidden and context accorindg to "
           "the same gate: "
           "{B_forget, B_input, B_output, B_cell}");
@@ -257,20 +259,21 @@ void AttentionLSTMOpMaker::Make() {
            "(phi::DenseTensor) (same as LSTMOp) the cell state of LSTM operator. "
            "The shape is (T x D), and lod is the same with the `Input`.");
  AddOutput("AttentionedX",
-            "(Tensor) shape is (T x 1), the result after X * AttentionWeight,"
+            "(phi::DenseTensor) shape is (T x 1), the result after X * "
+            "AttentionWeight,"
            " where T is the total time steps in this mini-batch,"
            " D is the hidden size.")
      .AsIntermediate();
  AddOutput("AttentionFCOut",
-            "(Tensor) (max_seq_len, 1), compute at each step.")
+            "(phi::DenseTensor) (max_seq_len, 1), compute at each step.")
      .AsIntermediate();
  AddOutput("LSTMX",
-            "(Tensor) the input X of LSTM for each step."
+            "(phi::DenseTensor) the input X of LSTM for each step."
            "Shape is (1 x M), where M is the x frame size")
      .AsIntermediate();
-  AddOutput("LSTMOUT",
-            "(Tensor) the output of LSTM X(1*(D+M))* weight((D+M)*4D) for each step."
+  AddOutput("LSTMOUT",
+            "(phi::DenseTensor) the output of LSTM X(1*(D+M))* "
+            " weight((D+M)*4D) for each step."
            "Shape is (1 x 4D), where M is the x frame size")
      .AsIntermediate();
  AddAttr<std::string>("gate_activation",
paddle/fluid/operators/attention_lstm_op.h
@@ -18,8 +18,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
class AttentionLSTMOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;
paddle/fluid/operators/batch_norm_op.cc
@@ -207,7 +207,7 @@ framework::OpKernelType BatchNormOp::GetExpectedKernelType(
framework::OpKernelType BatchNormOp::GetKernelTypeForVar(
    const std::string &var_name,
-    const Tensor &tensor,
+    const phi::DenseTensor &tensor,
    const framework::OpKernelType &expected_kernel_type) const {
#ifdef PADDLE_WITH_MKLDNN
  // Only input require reshaping, weights and
@@ -265,7 +265,7 @@ void BatchNormOpMaker::Make() {
           "The global variance (for training) "
           "or estimated Variance (for testing)");
  AddInput("MomentumTensor",
-           "(Tensor<float32>, optional) If provided, batch_norm will "
+           "(phi::DenseTensor<float32>, optional) If provided, batch_norm will "
           "use this as momentum, this has a higher priority than "
           "attr(momentum), the shape of this tensor MUST BE [1].")
      .AsDispensable();
@@ -380,9 +380,9 @@ framework::OpKernelType BatchNormGradOp::GetExpectedKernelType(
    PADDLE_THROW(platform::errors::InvalidArgument(
        "can't find gradient variable of Y"));
  }
-  const Tensor *t = nullptr;
-  if (var->IsType<Tensor>()) {
-    t = &var->Get<Tensor>();
+  const phi::DenseTensor *t = nullptr;
+  if (var->IsType<phi::DenseTensor>()) {
+    t = &var->Get<phi::DenseTensor>();
  } else if (var->IsType<phi::DenseTensor>()) {
    t = &var->Get<phi::DenseTensor>();
  }
@@ -397,7 +397,7 @@ framework::OpKernelType BatchNormGradOp::GetExpectedKernelType(
framework::OpKernelType BatchNormGradOp::GetKernelTypeForVar(
    const std::string &var_name,
-    const Tensor &tensor,
+    const phi::DenseTensor &tensor,
    const framework::OpKernelType &expected_kernel_type) const {
#ifdef PADDLE_WITH_MKLDNN
  // Only input require reshaping, weights and
@@ -522,9 +522,9 @@ framework::OpKernelType BatchNormDoubleGradOp::GetExpectedKernelType(
    PADDLE_THROW(
        platform::errors::NotFound("cannot find gradient variable of Y"));
  }
-  const Tensor *t = nullptr;
-  if (var->IsType<Tensor>()) {
-    t = &var->Get<Tensor>();
+  const phi::DenseTensor *t = nullptr;
+  if (var->IsType<phi::DenseTensor>()) {
+    t = &var->Get<phi::DenseTensor>();
  } else if (var->IsType<phi::DenseTensor>()) {
    t = &var->Get<phi::DenseTensor>();
  }
paddle/fluid/operators/batch_norm_op.cu
@@ -34,7 +34,6 @@ DECLARE_bool(cudnn_batchnorm_spatial_persistent);
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
using DataLayout = phi::DataLayout;
template <typename T>
using CudnnDataType = platform::CudnnDataType<T>;

paddle/fluid/operators/batch_norm_op.h
@@ -27,7 +27,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
using DataLayout = phi::DataLayout;
template <typename T>
paddle/fluid/operators/batch_norm_op_mlu.cc
@@ -78,8 +78,8 @@ class MLUBatchNormOpKernel : public framework::OpKernel<T> {
    saved_mean->mutable_data<MPDType>(place);
    saved_variance->mutable_data<MPDType>(place);
-    Tensor transformed_x;
-    Tensor transformed_y;
+    phi::DenseTensor transformed_x;
+    phi::DenseTensor transformed_y;
    const int transformed_dim_size = 4;
    const int transformed_shape[transformed_dim_size] = {N, sample_size, 1, C};
    MLUCnnlTensorDesc transformed_desc(transformed_dim_size,
@@ -116,7 +116,7 @@ class MLUBatchNormOpKernel : public framework::OpKernel<T> {
      if (ctx.HasInput("MomentumTensor")) {
        const auto *mom_tensor = ctx.Input<phi::DenseTensor>("MomentumTensor");
-        Tensor mom_cpu;
+        phi::DenseTensor mom_cpu;
        framework::TensorCopySync(*mom_tensor, platform::CPUPlace(), &mom_cpu);
        momentum = mom_cpu.data<float>()[0];
      }
@@ -226,9 +226,9 @@ class MLUBatchNormGradOpKernel : public framework::OpKernel<T> {
                       : x_dims[x_dims.size() - 1]);
    const int sample_size = x->numel() / N / C;
-    Tensor transformed_d_y;
-    Tensor transformed_x;
-    Tensor transformed_d_x;
+    phi::DenseTensor transformed_d_y;
+    phi::DenseTensor transformed_x;
+    phi::DenseTensor transformed_d_x;
    const int transformed_dim_size = 4;
    const int transformed_shape[transformed_dim_size] = {N, sample_size, 1, C};
paddle/fluid/operators/batch_norm_op_npu.cc
@@ -89,7 +89,7 @@ class NPUBatchNormOpKernel : public framework::OpKernel<T> {
      // is only used in this training branch
      if (ctx.HasInput("MomentumTensor")) {
        const auto *mom_tensor = ctx.Input<phi::DenseTensor>("MomentumTensor");
-        Tensor mom_cpu;
+        phi::DenseTensor mom_cpu;
        paddle::framework::TensorCopySync(
            *mom_tensor, platform::CPUPlace(), &mom_cpu);
        momentum = mom_cpu.data<float>()[0];
paddle/fluid/operators/bce_loss_op_mlu.cc
@@ -18,8 +18,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename T>
class BCELossMLUKernel : public framework::OpKernel<T> {
 public:

paddle/fluid/operators/bce_loss_op_npu.cc
@@ -18,8 +18,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename DeviceContext, typename T>
class BCELossNPUKernel : public framework::OpKernel<T> {
 public:
paddle/fluid/operators/cast_op.cc
@@ -40,7 +40,7 @@ class CastOpProtoMaker : public framework::OpProtoAndCheckerMaker {
Cast Operator.
This Operator casts the input tensor to another data type and
-returns the Output Tensor. It's meaningless if the output dtype equals
+returns the Output phi::DenseTensor. It's meaningless if the output dtype equals
the input dtype, but it's fine if you do so.
)DOC");
paddle/fluid/operators/cast_op_mlu.cc
@@ -19,8 +19,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename T>
class CastMLUKernel : public framework::OpKernel<T> {
 public:

paddle/fluid/operators/cast_op_npu.cc
@@ -32,8 +32,6 @@ static std::map<framework::proto::VarType::Type, aclDataType>
        {framework::proto::VarType::FP64, ACL_DOUBLE},
};
-using Tensor = phi::DenseTensor;
-
template <typename DeviceContext, typename T>
class CastNPUKernel : public framework::OpKernel<T> {
 public:
paddle/fluid/operators/center_loss_op.h
@@ -26,7 +26,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
template <typename T,
          int MajorType = Eigen::RowMajor,
          typename IndexType = Eigen::DenseIndex>
@@ -81,7 +80,7 @@ class CenterLossKernel : public framework::OpKernel<T> {
    auto loss_data = out_loss->mutable_data<T>(ctx.GetPlace());
-    Tensor centers_diffacc;  // used to accumulate all diff
+    phi::DenseTensor centers_diffacc;  // used to accumulate all diff
    auto centers_diffacc_data =
        centers_diffacc.mutable_data<T>(centers_dim, ctx.GetPlace());
    int numel = centers_diffacc.numel();
paddle/fluid/operators/clip_by_norm_op.h
@@ -23,7 +23,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
// using SelectedRows = phi::SelectedRows;
template <typename T,
          int MajorType = Eigen::RowMajor,
paddle/fluid/operators/clip_by_norm_op_npu.cc
@@ -18,8 +18,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename DeviceContext, typename T>
class NPUClipByNormKernel : public framework::OpKernel<T> {
 public:
@@ -48,7 +46,7 @@ class NPUClipByNormKernel : public framework::OpKernel<T> {
            "Input(X) of ClipByNormOp should not be null. "
            "Please check if it is created correctly."));
-    Tensor square_sum(input->type());
+    phi::DenseTensor square_sum(input->type());
    square_sum.mutable_data<T>(framework::DDim({1}), place);
    const auto &x_dims = input->dims();
    std::vector<int> axis;
@@ -62,12 +60,12 @@ class NPUClipByNormKernel : public framework::OpKernel<T> {
        {{"axis", axis}, {"keep_dims", false}});
    square_sum_runner.Run(stream);
-    Tensor x_norm(input->type());
+    phi::DenseTensor x_norm(input->type());
    x_norm.mutable_data<T>(framework::DDim({1}), place);
    const auto &x_norm_runner = NpuOpRunner("Sqrt", {square_sum}, {x_norm}, {});
    x_norm_runner.Run(stream);
-    Tensor x_norm_t;
+    phi::DenseTensor x_norm_t;
    framework::TensorCopySync(x_norm, platform::CPUPlace(), &x_norm_t);
    auto x_norm_v = static_cast<float>(*x_norm_t.data<T>());
    if (x_norm_v <= max_norm) {
paddle/fluid/operators/clip_op_mlu.cc
@@ -29,7 +29,7 @@ class ClipMLUKernel : public framework::OpKernel<T> {
    auto max = static_cast<T>(ctx.Attr<float>("max"));
    if (ctx.HasInput("Min")) {
-      Tensor min_cpu;
+      phi::DenseTensor min_cpu;
      auto *min_tensor = ctx.Input<phi::DenseTensor>("Min");
      auto *min_data = min_tensor->data<T>();
      if (platform::is_mlu_place(min_tensor->place())) {
@@ -41,7 +41,7 @@ class ClipMLUKernel : public framework::OpKernel<T> {
    }
    if (ctx.HasInput("Max")) {
-      Tensor max_cpu;
+      phi::DenseTensor max_cpu;
      auto *max_tensor = ctx.Input<phi::DenseTensor>("Max");
      auto *max_data = max_tensor->data<T>();
      if (platform::is_mlu_place(max_tensor->place())) {
@@ -80,7 +80,7 @@ class ClipGradMLUKernel : public framework::OpKernel<T> {
    auto min_val = ctx.Attr<float>("min");
    if (min_tensor) {
-      Tensor min_data;
+      phi::DenseTensor min_data;
      framework::TensorCopy(
          *min_tensor,
          platform::CPUPlace(),
@@ -91,7 +91,7 @@ class ClipGradMLUKernel : public framework::OpKernel<T> {
    }
    auto max_val = ctx.Attr<float>("max");
    if (max_tensor) {
-      Tensor max_data;
+      phi::DenseTensor max_data;
      framework::TensorCopy(
          *max_tensor,
          platform::CPUPlace(),
paddle/fluid/operators/clip_op_npu.cc
@@ -18,8 +18,6 @@
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename DeviceContext, typename T>
class ClipNPUKernel : public framework::OpKernel<T> {
 public:
@@ -33,8 +31,8 @@ class ClipNPUKernel : public framework::OpKernel<T> {
    auto max_tensor =
        ctx.HasInput("Max") ? ctx.Input<phi::DenseTensor>("Max") : nullptr;
-    Tensor min_tensor_temp(x->type());
-    Tensor max_tensor_temp(x->type());
+    phi::DenseTensor min_tensor_temp(x->type());
+    phi::DenseTensor max_tensor_temp(x->type());
    if (min_tensor == nullptr) {
      auto min_value = static_cast<T>(ctx.Attr<float>("min"));
      min_tensor_temp.mutable_data<T>({1}, ctx.GetPlace());
@@ -74,7 +72,7 @@ class ClipGradNPUKernel : public framework::OpKernel<T> {
    auto min_val = ctx.Attr<float>("min");
    if (min_tensor) {
-      Tensor min_data;
+      phi::DenseTensor min_data;
      framework::TensorCopy(
          *min_tensor,
          platform::CPUPlace(),
@@ -86,7 +84,7 @@ class ClipGradNPUKernel : public framework::OpKernel<T> {
    auto max_val = ctx.Attr<float>("max");
    if (max_tensor) {
-      Tensor max_data;
+      phi::DenseTensor max_data;
      framework::TensorCopy(
          *max_tensor,
          platform::CPUPlace(),
paddle/fluid/operators/coalesce_tensor_op.cc
@@ -61,7 +61,7 @@ struct FillConstantVisitor {
          * = nullptr) const {
#ifdef PADDLE_WITH_ASCEND_CL
    if (platform::is_npu_place(dev_ctx_.GetPlace())) {
-      Tensor tensor_tmp(framework::TransToPhiDataType(dtype_));
+      phi::DenseTensor tensor_tmp(framework::TransToPhiDataType(dtype_));
      tensor_tmp.mutable_data<T>({1}, context_.GetPlace());
      FillNpuTensorWithConstant<T>(&tensor_tmp, static_cast<T>(value_));
paddle/fluid/operators/collective/c_allreduce_op.h
@@ -151,10 +151,9 @@ class CAllReduceOpCPUKernel : public framework::OpKernel<T> {
inline bool ContainsNan(const paddle::platform::NPUDeviceContext &dev_ctx,
                        aclrtStream stream,
                        const phi::DenseTensor *in) {
-  using Tensor = phi::DenseTensor;
-  Tensor out(in->type());
+  phi::DenseTensor out(in->type());
-  Tensor mean(in->type());
+  phi::DenseTensor mean(in->type());
  mean.Resize({1});
  mean.mutable_data<float>(dev_ctx.GetPlace());
  std::vector<int> axes;
paddle/fluid/operators/collective/c_softmax_with_cross_entropy_op.cu
@@ -24,8 +24,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
static constexpr int kNumCUDAThreads = 512;
static constexpr int kNumMaxinumNumBlocks = 4096;
@@ -126,7 +124,7 @@ struct CSoftmaxWithCrossEntropyFunctor<phi::GPUContext, T> {
    const int N = phi::funcs::SizeToAxis(axis, logits_dims);
    const int D = phi::funcs::SizeFromAxis(axis, logits_dims);
-    Tensor logits_2d, softmax_2d, loss_2d;
+    phi::DenseTensor logits_2d, softmax_2d, loss_2d;
    logits_2d.ShareDataWith(*logits).Resize({N, D});
    softmax_2d.ShareDataWith(*softmax).Resize({N, D});
    loss_2d.ShareDataWith(*loss).Resize({N, 1});
@@ -135,7 +133,7 @@ struct CSoftmaxWithCrossEntropyFunctor<phi::GPUContext, T> {
    auto eigen_softmax = math::EigenMatrix<T>::From(softmax_2d);
    // step 1, obtain logit_max
-    Tensor logits_max;
+    phi::DenseTensor logits_max;
    logits_max = ctx.AllocateTmpTensor<T, phi::GPUContext>({N, 1}, dev_ctx);
    void* logits_max_buff = logits_max.mutable_data<T>(place);
@@ -163,7 +161,7 @@ struct CSoftmaxWithCrossEntropyFunctor<phi::GPUContext, T> {
            .unaryExpr(math::ValueClip<T>());
    // step 3, obtain predict target
-    Tensor predicted_logits;
+    phi::DenseTensor predicted_logits;
    predicted_logits =
        ctx.AllocateTmpTensor<T, phi::GPUContext>({N, 1}, dev_ctx);
    predicted_logits.mutable_data<T>(place);
@@ -215,7 +213,7 @@ struct CSoftmaxWithCrossEntropyFunctor<phi::GPUContext, T> {
    eigen_softmax.device(*dev_ctx.eigen_device()) = eigen_softmax.exp();
    // step 5, obtain sum_exp_logits
-    Tensor sum_exp_logits;
+    phi::DenseTensor sum_exp_logits;
    sum_exp_logits = ctx.AllocateTmpTensor<T, phi::GPUContext>({N, 1}, dev_ctx);
    void* sum_exp_logits_buff = sum_exp_logits.mutable_data<T>(place);
@@ -278,7 +276,7 @@ struct CSoftmaxWithCrossEntropyProcessGroupFunctor<phi::GPUContext, T> {
    const int N = phi::funcs::SizeToAxis(axis, logits_dims);
    const int D = phi::funcs::SizeFromAxis(axis, logits_dims);
-    Tensor logits_2d, softmax_2d, loss_2d;
+    phi::DenseTensor logits_2d, softmax_2d, loss_2d;
    logits_2d.ShareDataWith(*logits).Resize({N, D});
    softmax_2d.ShareDataWith(*softmax).Resize({N, D});
    loss_2d.ShareDataWith(*loss).Resize({N, 1});
@@ -287,7 +285,7 @@ struct CSoftmaxWithCrossEntropyProcessGroupFunctor<phi::GPUContext, T> {
    auto eigen_softmax = math::EigenMatrix<T>::From(softmax_2d);
    // step 1, obtain logit_max
-    Tensor logits_max;
+    phi::DenseTensor logits_max;
    logits_max = ctx.AllocateTmpTensor<T, phi::GPUContext>({N, 1}, dev_ctx);
    auto eigen_logits_max = math::EigenMatrix<T>::From(logits_max);
@@ -309,7 +307,7 @@ struct CSoftmaxWithCrossEntropyProcessGroupFunctor<phi::GPUContext, T> {
            .unaryExpr(math::ValueClip<T>());
    // step 3, obtain predict target
-    Tensor predicted_logits;
+    phi::DenseTensor predicted_logits;
    predicted_logits =
        ctx.AllocateTmpTensor<T, phi::GPUContext>({N, 1}, dev_ctx);
    predicted_logits.mutable_data<T>(place);
@@ -355,7 +353,7 @@ struct CSoftmaxWithCrossEntropyProcessGroupFunctor<phi::GPUContext, T> {
    eigen_softmax.device(*dev_ctx.eigen_device()) = eigen_softmax.exp();
    // step 5, obtain sum_exp_logits
-    Tensor sum_exp_logits;
+    phi::DenseTensor sum_exp_logits;
    sum_exp_logits = ctx.AllocateTmpTensor<T, phi::GPUContext>({N, 1}, dev_ctx);
    void* sum_exp_logits_buff = sum_exp_logits.mutable_data<T>(place);
@@ -405,7 +403,7 @@ class CSoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> {
    const int N = phi::funcs::SizeToAxis(axis, sofrmax_dims);
    const int D = phi::funcs::SizeFromAxis(axis, sofrmax_dims);
-    Tensor logit_grad_2d;
+    phi::DenseTensor logit_grad_2d;
    logit_grad_2d.ShareDataWith(*logit_grad).Resize({N, D});
    int blocks = NumBlocks(N * D);
paddle/fluid/operators/concat_op.cc
@@ -26,7 +26,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
class ConcatOp : public framework::OperatorWithKernel {
 public:
paddle/fluid/operators/concat_op_mlu.cc
@@ -119,7 +119,7 @@ class ConcatGradMLUKernel : public framework::OpKernel<T> {
            out_grad->dims().size()));
    // get output tensor that the name is not kEmptyVarName
    std::vector<void*> outputs_vec;
-    std::vector<Tensor> tmp_outputs_vec;
+    std::vector<phi::DenseTensor> tmp_outputs_vec;
    std::vector<MLUCnnlTensorDesc> output_descs;
    std::vector<cnnlTensorDescriptor_t> descs_vec;
    for (size_t j = 0; j < outs.size(); ++j) {
@@ -129,7 +129,7 @@ class ConcatGradMLUKernel : public framework::OpKernel<T> {
        output_descs.emplace_back(MLUCnnlTensorDesc(*outs[j]));
        outputs_vec.push_back(GetBasePtr(outs[j]));
      } else {
-        Tensor tmp_tensor;
+        phi::DenseTensor tmp_tensor;
        tmp_tensor.mutable_data<T>(ins[j]->dims(), ctx.GetPlace());
        tmp_outputs_vec.push_back(tmp_tensor);
        output_descs.emplace_back(MLUCnnlTensorDesc(*ins[j]));
paddle/fluid/operators/controlflow/logical_op_mlu.cc
@@ -18,8 +18,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename T, cnnlLogicOp_t log_method>
class LogicalMLUKernel : public framework::OpKernel<T> {
 public:

paddle/fluid/operators/controlflow/logical_op_npu.cc
@@ -15,8 +15,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename DeviceContext, typename T>
class LogicalNotNPUKernel : public framework::OpKernel<T> {
 public:

paddle/fluid/operators/conv_op.h
@@ -29,8 +29,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
// Base convolution operator definations for other conv
// like operators to reuse the implementation.
inline int ConvOutputSize(
paddle/fluid/operators/conv_op_mlu.cc
@@ -18,7 +18,6 @@
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
using DataLayout = phi::DataLayout;
template <typename T>
@@ -56,8 +55,8 @@ class MLUConvOpKernel : public framework::OpKernel<T> {
    UpdatePaddingAndDilation(
        &paddings, &dilations, padding_algorithm, in_data_dims, strides, ksize);
-    Tensor input_tensor(input->type());
-    Tensor output_tensor(output->type());
+    phi::DenseTensor input_tensor(input->type());
+    phi::DenseTensor output_tensor(output->type());
    const std::vector<int> perm_to_nhwc = {0, 2, 3, 1};
    if (channel_last) {
      input_tensor.ShareDataWith(*input);
@@ -78,7 +77,7 @@ class MLUConvOpKernel : public framework::OpKernel<T> {
    output_tensor.set_layout(DataLayout::kNHWC);
    // transpose filter from MCHW to MHWC
-    Tensor trans_filter(filter->type());
+    phi::DenseTensor trans_filter(filter->type());
    TransposeFromMLUTensor<T>(ctx,
                              perm_to_nhwc,
                              filter,
@@ -166,8 +165,8 @@ class MLUConvGradOpKernel : public framework::OpKernel<T> {
    UpdatePaddingAndDilation(
        &paddings, &dilations, padding_algorithm, in_data_dims, strides, ksize);
-    Tensor input_tensor(input->type());
-    Tensor output_grad_tensor(output_grad->type());
+    phi::DenseTensor input_tensor(input->type());
+    phi::DenseTensor output_grad_tensor(output_grad->type());
    const std::vector<int> perm_to_nhwc = {0, 2, 3, 1};
    const std::vector<int> perm_to_nchw = {0, 3, 1, 2};
    if (channel_last) {
@@ -193,7 +192,7 @@ class MLUConvGradOpKernel : public framework::OpKernel<T> {
      filter_grad->mutable_data<T>(ctx.GetPlace());
      auto filter_grad_dims = filter_grad->dims();
-      Tensor temp_filter_grad(filter_grad->type());
+      phi::DenseTensor temp_filter_grad(filter_grad->type());
      temp_filter_grad.mutable_data<T>({filter_grad_dims[0],
                                        filter_grad_dims[2],
                                        filter_grad_dims[3],
@@ -234,7 +233,7 @@ class MLUConvGradOpKernel : public framework::OpKernel<T> {
    if (input_grad) {
      input_grad->mutable_data<T>(ctx.GetPlace());
-      Tensor input_grad_tensor(input_grad->type());
+      phi::DenseTensor input_grad_tensor(input_grad->type());
      if (channel_last) {
        input_grad_tensor.ShareDataWith(*input_grad);
      } else {
@@ -248,7 +247,7 @@ class MLUConvGradOpKernel : public framework::OpKernel<T> {
      input_grad_tensor.set_layout(DataLayout::kNHWC);
      // transpose filter from MCHW to MHWC
-      Tensor trans_filter(filter->type());
+      phi::DenseTensor trans_filter(filter->type());
      TransposeFromMLUTensor<T>(ctx,
                                perm_to_nhwc,
                                filter,
@@ -326,8 +325,8 @@ class MLUDepthwiseConvOpKernel : public framework::OpKernel<T> {
    UpdatePaddingAndDilation(
        &paddings, &dilations, padding_algorithm, in_data_dims, strides, ksize);
-    Tensor input_tensor(input->type());
-    Tensor output_tensor(output->type());
+    phi::DenseTensor input_tensor(input->type());
+    phi::DenseTensor output_tensor(output->type());
    const std::vector<int> perm_to_nhwc = {0, 2, 3, 1};
    if (channel_last) {
      groups = in_dims[3];
@@ -350,7 +349,7 @@ class MLUDepthwiseConvOpKernel : public framework::OpKernel<T> {
    output_tensor.set_layout(DataLayout::kNHWC);
    // transpose filter from MCHW to MHWC
-    Tensor trans_filter(filter->type());
+    phi::DenseTensor trans_filter(filter->type());
    TransposeFromMLUTensor<T>(ctx,
                              perm_to_nhwc,
                              filter,
@@ -438,8 +437,8 @@ class MLUDepthwiseConvGradOpKernel : public framework::OpKernel<T> {
    UpdatePaddingAndDilation(
        &paddings, &dilations, padding_algorithm, in_data_dims, strides, ksize);
-    Tensor input_tensor(input->type());
-    Tensor output_grad_tensor(output_grad->type());
+    phi::DenseTensor input_tensor(input->type());
+    phi::DenseTensor output_grad_tensor(output_grad->type());
    const std::vector<int> perm_to_nhwc = {0, 2, 3, 1};
    const std::vector<int> perm_to_nchw = {0, 3, 1, 2};
    const std::vector<int> perm_hwcm_to_mchw = {3, 2, 0, 1};
@@ -469,7 +468,7 @@ class MLUDepthwiseConvGradOpKernel : public framework::OpKernel<T> {
      filter_grad->mutable_data<T>(ctx.GetPlace());
      auto filter_grad_dims = filter_grad->dims();
-      Tensor temp_filter_grad(filter_grad->type());
+      phi::DenseTensor temp_filter_grad(filter_grad->type());
      // Details about setting diff_w hwcn for better performance, see the CNNL
      // documentation.
      temp_filter_grad.mutable_data<T>({filter_grad_dims[perm_mchw_to_hwcm[0]],
@@ -512,7 +511,7 @@ class MLUDepthwiseConvGradOpKernel : public framework::OpKernel<T> {
    if (input_grad) {
      input_grad->mutable_data<T>(ctx.GetPlace());
-      Tensor input_grad_tensor(input_grad->type());
+      phi::DenseTensor input_grad_tensor(input_grad->type());
      if (channel_last) {
        input_grad_tensor.ShareDataWith(*input_grad);
      } else {
@@ -526,7 +525,7 @@ class MLUDepthwiseConvGradOpKernel : public framework::OpKernel<T> {
      input_grad_tensor.set_layout(DataLayout::kNHWC);
      // transpose filter from MCHW to MHWC
-      Tensor trans_filter(filter->type());
+      phi::DenseTensor trans_filter(filter->type());
      TransposeFromMLUTensor<T>(ctx,
                                perm_to_nhwc,
                                filter,
paddle/fluid/operators/conv_op_npu.cc
@@ -18,7 +18,6 @@
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
using NPUDeviceContext = platform::NPUDeviceContext;
static void CastToFP16(const framework::ExecutionContext& ctx,
                       const aclrtStream& stream,
@@ -104,7 +103,7 @@ class DepthwiseConvNPUKernel : public framework::OpKernel<T> {
    std::vector<int> strides(4, 1);
    std::vector<int> dilations(4, 1);
-    Tensor input_tensor, output_tensor;
+    phi::DenseTensor input_tensor, output_tensor;
    input_tensor.ShareDataWith(*input);
    output_tensor.ShareDataWith(*output);
@@ -125,7 +124,7 @@ class DepthwiseConvNPUKernel : public framework::OpKernel<T> {
    auto stream = ctx.template device_context<NPUDeviceContext>().stream();
    // Transform filter (n, 1, h, w) --> (1, n, h, w)
-    Tensor transformed_filter(filter->type());
+    phi::DenseTensor transformed_filter(filter->type());
    transformed_filter.mutable_data<T>({filter->dims()[1],
                                        filter->dims()[0],
                                        filter->dims()[2],
@@ -189,7 +188,7 @@ class DepthwiseConvGradNPUKernel : public framework::OpKernel<T> {
    auto stream = ctx.template device_context<NPUDeviceContext>().stream();
    // Transform filter (n, 1, h, w) --> (1, n, h, w)
-    Tensor transformed_filter(filter->type());
+    phi::DenseTensor transformed_filter(filter->type());
    transformed_filter.mutable_data<T>({filter->dims()[1],
                                        filter->dims()[0],
                                        filter->dims()[2],
@@ -204,7 +203,7 @@ class DepthwiseConvGradNPUKernel : public framework::OpKernel<T> {
    std::vector<int> strides(4, 1);
    std::vector<int> dilations(4, 1);
-    Tensor input_tensor, output_grad_tensor;
+    phi::DenseTensor input_tensor, output_grad_tensor;
    input_tensor.ShareDataWith(*input);
    output_grad_tensor.ShareDataWith(*output_grad);
    if (channel_last) {
@@ -247,7 +246,7 @@ class DepthwiseConvGradNPUKernel : public framework::OpKernel<T> {
    }
    if (input_grad) {
      input_grad->mutable_data<T>(ctx.GetPlace());
-      Tensor input_grad_tensor;
+      phi::DenseTensor input_grad_tensor;
      input_grad_tensor.ShareDataWith(*input_grad);
      if (channel_last) {
        input_grad_tensor.set_layout(DataLayout::kNHWC);
@@ -305,7 +304,7 @@ class NPUConvOpKernel : public framework::OpKernel<T> {
    std::vector<int> strides_vec(4, 1);
    std::vector<int> dilations_vec(4, 1);
-    Tensor input_tensor, output_tensor;
+    phi::DenseTensor input_tensor, output_tensor;
    input_tensor.ShareDataWith(*input);
    output_tensor.ShareDataWith(*output);
    if (channel_last) {
@@ -378,7 +377,7 @@ class NPUConvGradOpKernel : public framework::OpKernel<T> {
    std::vector<int> strides_vec(4, 1);
    std::vector<int> dilations_vec(4, 1);
-    Tensor input_tensor, output_grad_tensor;
+    phi::DenseTensor input_tensor, output_grad_tensor;
    input_tensor.ShareDataWith(*input);
    output_grad_tensor.ShareDataWith(*output_grad);
    if (channel_last) {
@@ -400,7 +399,7 @@ class NPUConvGradOpKernel : public framework::OpKernel<T> {
      filter_grad->mutable_data<T>(ctx.GetPlace());
      std::vector<int> filter_shape_vec = phi::vectorize<int>(filter->dims());
-      Tensor filter_grad_fp32(experimental::DataType::FLOAT32);
+      phi::DenseTensor filter_grad_fp32(experimental::DataType::FLOAT32);
      filter_grad_fp32.Resize(filter_grad->dims());
      if (framework::TransToProtoVarType(input->dtype()) ==
@@ -430,7 +429,7 @@ class NPUConvGradOpKernel : public framework::OpKernel<T> {
      input_grad->mutable_data<T>(ctx.GetPlace());
      std::vector<int> input_shape_vec = phi::vectorize<int>(input->dims());
-      Tensor input_grad_tensor;
+      phi::DenseTensor input_grad_tensor;
      input_grad_tensor.ShareDataWith(*input_grad);
      if (channel_last) {
        input_grad_tensor.set_layout(DataLayout::kNHWC);
@@ -617,8 +616,9 @@ class NPUConv3dGradKernel : public framework::OpKernel<T> {
      filter_grad->mutable_data<T>(ctx.GetPlace());
      std::vector<int> filter_shape_vec = phi::vectorize<int>(filter->dims());
-      Tensor filter_grad_tensor = ctx.AllocateTmpTensor<T, NPUDeviceContext>(
-          filter_grad->dims(), dev_ctx);
+      phi::DenseTensor filter_grad_tensor =
+          ctx.AllocateTmpTensor<T, NPUDeviceContext>(filter_grad->dims(),
+                                                     dev_ctx);
      filter_grad_tensor.ShareDataWith(*filter_grad);
      filter_grad_tensor.set_layout(DataLayout::kNCDHW);
@@ -638,8 +638,9 @@ class NPUConv3dGradKernel : public framework::OpKernel<T> {
      input_grad->mutable_data<T>(ctx.GetPlace());
      std::vector<int> input_shape_vec = phi::vectorize<int>(input->dims());
-      Tensor input_grad_tensor = ctx.AllocateTmpTensor<T, NPUDeviceContext>(
-          input_grad->dims(), dev_ctx);
+      phi::DenseTensor input_grad_tensor =
+          ctx.AllocateTmpTensor<T, NPUDeviceContext>(input_grad->dims(),
+                                                     dev_ctx);
      input_grad_tensor.ShareDataWith(*input_grad);
      input_grad_tensor.set_layout(DataLayout::kNCDHW);
paddle/fluid/operators/conv_transpose_op_mlu.cc
@@ -20,7 +20,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
using DataLayout = phi::DataLayout;
template <typename T>
@@ -61,8 +60,8 @@ class Conv2DTransposeMLUKernel : public framework::OpKernel<T> {
    phi::UpdatePaddingAndDilation(
        &paddings, &dilations, padding_algorithm, in_data_dims, strides, ksize);
-    Tensor input_tensor(input->type());
-    Tensor output_tensor(output->type());
+    phi::DenseTensor input_tensor(input->type());
+    phi::DenseTensor output_tensor(output->type());
    input_tensor.set_layout(DataLayout::kNHWC);
    output_tensor.set_layout(DataLayout::kNHWC);
    const std::vector<int> perm_to_nhwc = {0, 2, 3, 1};
@@ -84,7 +83,7 @@ class Conv2DTransposeMLUKernel : public framework::OpKernel<T> {
    }
    // transpose filter from MCHW to MHWC
-    Tensor trans_filter(filter->type());
+    phi::DenseTensor trans_filter(filter->type());
    TransposeFromMLUTensor<T>(ctx,
                              perm_to_nhwc,
                              filter,
@@ -168,8 +167,8 @@ class Conv2DTransposeGradMLUKernel : public framework::OpKernel<T> {
    phi::UpdatePaddingAndDilation(
        &paddings, &dilations, padding_algorithm, in_data_dims, strides, ksize);
-    Tensor input_tensor(input->type());
-    Tensor output_grad_tensor(output_grad->type());
+    phi::DenseTensor input_tensor(input->type());
+    phi::DenseTensor output_grad_tensor(output_grad->type());
    output_grad_tensor.set_layout(DataLayout::kNHWC);
    const std::vector<int> perm_to_nhwc = {0, 2, 3, 1};
@@ -191,7 +190,7 @@ class Conv2DTransposeGradMLUKernel : public framework::OpKernel<T> {
    }
    // transpose filter from MCHW to MHWC
-    Tensor trans_filter(filter->type());
+    phi::DenseTensor trans_filter(filter->type());
    TransposeFromMLUTensor<T>(ctx,
                              perm_to_nhwc,
                              filter,
@@ -217,7 +216,7 @@ class Conv2DTransposeGradMLUKernel : public framework::OpKernel<T> {
    if (filter_grad) {
      filter_grad->mutable_data<T>(ctx.GetPlace());
-      Tensor filter_grad_tensor(filter_grad->type());
+      phi::DenseTensor filter_grad_tensor(filter_grad->type());
      // filter_grad always MCHW
      // filter_grad_tensor always MHWC
      auto filter_grad_dims = filter_grad->dims();
@@ -253,7 +252,7 @@ class Conv2DTransposeGradMLUKernel : public framework::OpKernel<T> {
    if (input_grad) {
      input_grad->mutable_data<T>(ctx.GetPlace());
-      Tensor input_grad_tensor(input_grad->type());
+      phi::DenseTensor input_grad_tensor(input_grad->type());
      input_tensor.set_layout(DataLayout::kNHWC);
      if (channel_last) {
paddle/fluid/operators/conv_transpose_op_npu.cc
@@ -20,7 +20,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
using NPUDeviceContext = platform::NPUDeviceContext;
template <typename T>
@@ -65,7 +64,7 @@ class Conv2DTransposeNPUKernel : public framework::OpKernel<T> {
    std::vector<int> strides(4, 1);
    std::vector<int> dilations(4, 1);
-    Tensor input_tensor, output_tensor;
+    phi::DenseTensor input_tensor, output_tensor;
    input_tensor.ShareDataWith(*input);
    output_tensor.ShareDataWith(*output);
@@ -148,7 +147,7 @@ class Conv2DTransposeGradNPUKernel : public framework::OpKernel<T> {
    std::vector<int> strides_vec(4, 1);
    std::vector<int> dilations_vec(4, 1);
-    Tensor input_tensor, output_grad_tensor;
+    phi::DenseTensor input_tensor, output_grad_tensor;
    input_tensor.ShareDataWith(*input);
    output_grad_tensor.ShareDataWith(*output_grad);
    if (channel_last) {
@@ -182,7 +181,7 @@ class Conv2DTransposeGradNPUKernel : public framework::OpKernel<T> {
    }
    if (input_grad) {
      input_grad->mutable_data<T>(ctx.GetPlace());
-      Tensor input_grad_tensor;
+      phi::DenseTensor input_grad_tensor;
      input_grad_tensor.ShareDataWith(*input_grad);
      if (channel_last) {
        input_grad_tensor.set_layout(DataLayout::kNHWC);
@@ -248,7 +247,7 @@ class Conv3DTransposeNPUKernel : public framework::OpKernel<T> {
    std::vector<int> strides(5, 1);
    std::vector<int> dilations(5, 1);
-    Tensor input_tensor, output_tensor, filter_tensor;
+    phi::DenseTensor input_tensor, output_tensor, filter_tensor;
    input_tensor.Resize(input->dims());
    input_tensor.ShareDataWith(*input);
    output_tensor.Resize(output->dims());
paddle/fluid/operators/copy_cross_scope_op.cc
@@ -30,8 +30,6 @@ class OpBase;
}  // namespace imperative
}  // namespace paddle
-using Tensor = phi::DenseTensor;
-
namespace paddle {
namespace operators {

paddle/fluid/operators/correlation_op.cc
@@ -22,8 +22,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
inline std::vector<int64_t> CorrelationOutputSize(int batch,
                                                  int input_height,
                                                  int input_width,
paddle/fluid/operators/cos_sim_op.h
@@ -21,13 +21,11 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename DeviceContext, typename T>
class CosSimKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
-    // get Tensor
+    // get phi::DenseTensor
    auto* in_x = context.Input<phi::DenseTensor>("X");
    auto* in_y = context.Input<phi::DenseTensor>("Y");
    auto* out_z = context.Output<phi::DenseTensor>("Out");
@@ -74,7 +72,7 @@ template <typename DeviceContext, typename T>
class CosSimGradKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
-    // get Tensor
+    // get phi::DenseTensor
    auto* in_x = context.Input<phi::DenseTensor>("X");
    auto* in_y = context.Input<phi::DenseTensor>("Y");
    auto* in_z = context.Input<phi::DenseTensor>("Out");
paddle/fluid/operators/crop_op_npu.cc
@@ -18,8 +18,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename DeviceContext, typename T>
class CropNPUKernel : public framework::OpKernel<T> {
 public:
@@ -71,7 +69,7 @@ class CropNPUKernel : public framework::OpKernel<T> {
              x->dims().size()));
      // shape memory maybe have gc.
-      Tensor tmp_shape(*shape);
+      phi::DenseTensor tmp_shape(*shape);
      tmp_shape.mutable_data<T>(ctx.GetPlace());
      const auto& runner =
@@ -90,7 +88,7 @@ class CropNPUKernel : public framework::OpKernel<T> {
              "(%d) of the Input(X).",
              shape_size.size(),
              x->dims().size()));
-      Tensor tmp_shape(x->dtype());
+      phi::DenseTensor tmp_shape(x->dtype());
      tmp_shape.Resize(phi::make_ddim(shape_size));
      tmp_shape.mutable_data<T>(ctx.GetPlace());
      const auto& runner =
paddle/fluid/operators/cross_entropy_op.h
@@ -23,8 +23,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename DeviceContext, typename T>
class CrossEntropyOpKernel : public framework::OpKernel<T> {
 public:
@@ -36,8 +34,8 @@ class CrossEntropyOpKernel : public framework::OpKernel<T> {
    int rank = x->dims().size();
    auto label_dims = labels->dims();
-    Tensor x_2d = framework::ReshapeToMatrix(*x, rank - 1);
-    Tensor labels_2d, y_2d;
+    phi::DenseTensor x_2d = framework::ReshapeToMatrix(*x, rank - 1);
+    phi::DenseTensor labels_2d, y_2d;
    if (label_dims.size() < rank) {
      labels_2d.ShareDataWith(*labels);
      labels_2d.Resize({phi::product(label_dims), 1});
paddle/fluid/operators/ctc_align_op.h
@@ -24,8 +24,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename DeviceContext, typename T>
class CTCAlignKernel : public framework::OpKernel<T> {
 public:
paddle/fluid/operators/cudnn_lstm_op.cu.cc
@@ -26,8 +26,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename T, typename Type>
bool is_continuous(const Type &weight_list) {
  bool continuous = true;
@@ -41,7 +39,7 @@ bool is_continuous(const Type &weight_list) {
  return continuous;
}
-int size_sum(const std::vector<const Tensor *> &weight_list) {
+int size_sum(const std::vector<const phi::DenseTensor *> &weight_list) {
  int size = 0;
  for (size_t i = 0; i < weight_list.size(); ++i) {
    auto in_size = weight_list[i]->numel();
@@ -53,8 +51,8 @@ int size_sum(const std::vector<const Tensor *> &weight_list) {
template <typename T>
void weight_to_tensor(const platform::Place &place,
                      gpuStream_t stream,
-                      const std::vector<const Tensor *> &weight_list,
-                      Tensor *weight) {
+                      const std::vector<const phi::DenseTensor *> &weight_list,
+                      phi::DenseTensor *weight) {
  auto weight_data = weight->data<T>();
  int weight_offset = 0;
  for (size_t i = 0; i < weight_list.size(); ++i) {
@@ -72,11 +70,12 @@ void weight_to_tensor(const platform::Place &place,
}
template <typename T>
-void weight_to_tensor_list(const platform::Place &place,
+void weight_to_tensor_list(
+    const platform::Place &place,
    gpuStream_t stream,
-    std::vector<Tensor *> *weight_grad,
-    const std::vector<const Tensor *> &weight_input,
-    const Tensor *weight) {
+    std::vector<phi::DenseTensor *> *weight_grad,
+    const std::vector<const phi::DenseTensor *> &weight_input,
+    const phi::DenseTensor *weight) {
  int weight_offset = 0;
  auto *weight_data = weight->data<T>();
  for (size_t i = 0; i < weight_input.size(); ++i) {
@@ -204,15 +203,15 @@ template <typename T>
class CudnnLSTMGPUKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext &ctx) const override {
-    const Tensor *x = ctx.Input<phi::DenseTensor>("Input");
-    const Tensor *init_h = ctx.Input<phi::DenseTensor>("InitH");
-    const Tensor *init_c = ctx.Input<phi::DenseTensor>("InitC");
+    const phi::DenseTensor *x = ctx.Input<phi::DenseTensor>("Input");
+    const phi::DenseTensor *init_h = ctx.Input<phi::DenseTensor>("InitH");
+    const phi::DenseTensor *init_c = ctx.Input<phi::DenseTensor>("InitC");
-    Tensor *out = ctx.Output<phi::DenseTensor>("Out");
-    Tensor *last_h = ctx.Output<phi::DenseTensor>("LastH");
-    Tensor *last_c = ctx.Output<phi::DenseTensor>("LastC");
-    Tensor *reserve = ctx.Output<phi::DenseTensor>("Reserve");
-    Tensor *state_out = ctx.Output<phi::DenseTensor>("StateOut");
+    phi::DenseTensor *out = ctx.Output<phi::DenseTensor>("Out");
+    phi::DenseTensor *last_h = ctx.Output<phi::DenseTensor>("LastH");
+    phi::DenseTensor *last_c = ctx.Output<phi::DenseTensor>("LastC");
+    phi::DenseTensor *reserve = ctx.Output<phi::DenseTensor>("Reserve");
+    phi::DenseTensor *state_out = ctx.Output<phi::DenseTensor>("StateOut");
    const T *x_data = x->data<T>();
    const T *init_h_data = init_h->data<T>();
@@ -256,7 +255,7 @@ class CudnnLSTMGPUKernel : public framework::OpKernel<T> {
    size_t workspace_size;
    size_t reserve_size;
-    Tensor weight_whole;
+    phi::DenseTensor weight_whole;
    T *w_data = nullptr;
    int weight_numel;
    bool w_initialized = false;
@@ -272,7 +271,7 @@ class CudnnLSTMGPUKernel : public framework::OpKernel<T> {
    if (!w_initialized) {
      auto weight_list = ctx.MultiInput<phi::DenseTensor>("WeightList");
      bool continuous =
-          is_continuous<T, std::vector<const Tensor *>>(weight_list);
+          is_continuous<T, std::vector<const phi::DenseTensor *>>(weight_list);
      weight_numel = size_sum(weight_list);
      if (!continuous) {
@@ -288,7 +287,7 @@ class CudnnLSTMGPUKernel : public framework::OpKernel<T> {
        for (size_t i = 0; i < weight_list.size(); ++i) {
          size_t len = weight_list[i]->numel();
          auto dim = weight_list[i]->dims();
-          const_cast<Tensor *>(weight_list[i])
+          const_cast<phi::DenseTensor *>(weight_list[i])
              ->ShareDataWith(
                  weight_whole.Slice(static_cast<int64_t>(offset),
                                     static_cast<int64_t>(offset + len)))
@@ -481,12 +480,12 @@ class CudnnLSTMGPUGradKernel : public framework::OpKernel<T> {
    auto place = ctx.GetPlace();
    int weight_numel = size_sum(weight_list);
    bool continuous =
-        is_continuous<T, std::vector<const Tensor *>>(weight_list);
+        is_continuous<T, std::vector<const phi::DenseTensor *>>(weight_list);
    auto stream = reinterpret_cast<const phi::GPUContext &>(ctx.device_context())
                      .stream();
-    Tensor weight_whole;
+    phi::DenseTensor weight_whole;
    T *weight_data = nullptr;
    if (!continuous) {
@@ -497,7 +496,7 @@ class CudnnLSTMGPUGradKernel : public framework::OpKernel<T> {
      weight_data = const_cast<T *>(weight_list[0]->data<T>());
    }
-    Tensor weight_grad;
+    phi::DenseTensor weight_grad;
    phi::funcs::SetConstant<phi::GPUContext, T> zero;
    weight_grad.mutable_data<T>({weight_numel}, ctx.GetPlace());
    zero(dev_ctx, &weight_grad, static_cast<T>(0.0));
@@ -559,7 +558,7 @@ class CudnnLSTMGPUGradKernel : public framework::OpKernel<T> {
        SequenceLength,
        &workspace_size,
        &reserve_size,
-        const_cast<Tensor *>(state_out));
+        const_cast<phi::DenseTensor *>(state_out));
    phi::DenseTensor workspace_data_;
    workspace_data_.mutable_data<uint8_t>(
paddle/fluid/operators/cumsum_op_mlu.cc
@@ -18,8 +18,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename T>
class CumSumMLUKernel : public framework::OpKernel<T> {
 public:
@@ -34,7 +32,7 @@ class CumSumMLUKernel : public framework::OpKernel<T> {
    out->mutable_data<T>(ctx.GetPlace());
    phi::DenseTensor* input_ptr = const_cast<phi::DenseTensor*>(x);
-    Tensor flat_x(x->type());
+    phi::DenseTensor flat_x(x->type());
    if (flatten) {
      PADDLE_ENFORCE_EQ(
          axis,
paddle/fluid/operators/cumsum_op_npu.cc
@@ -19,8 +19,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
static void CumsumImp(const phi::DenseTensor& input,
                      phi::DenseTensor* output,
                      const framework::NPUAttributeMap& attr_input,
@@ -30,7 +28,7 @@ static void CumsumImp(const phi::DenseTensor& input,
          .stream();
  if (framework::TransToProtoVarType(input.dtype()) ==
      framework::proto::VarType::INT64) {
-    Tensor tmp_input;
+    phi::DenseTensor tmp_input;
    tmp_input.mutable_data<float>(input.dims(), ctx.GetPlace());
    auto dst_acl_dtype =
        ConvertToNpuDtype(framework::TransToProtoVarType(tmp_input.type()));
@@ -41,7 +39,7 @@ static void CumsumImp(const phi::DenseTensor& input,
        {{"dst_type", static_cast<int>(dst_acl_dtype)}});
    cast_runner_1.Run(stream);
-    Tensor tmp_output;
+    phi::DenseTensor tmp_output;
    tmp_output.mutable_data<float>(output->dims(), ctx.GetPlace());
    const auto& runner = NpuOpRunner("CumsumD", {tmp_input}, {tmp_output}, attr_input);
@@ -86,7 +84,7 @@ class CumSumNPUKernel : public framework::OpKernel<T> {
              -1,
              axis));
-      Tensor new_x(x->type());
+      phi::DenseTensor new_x(x->type());
      new_x.ShareDataWith(*x);
      new_x.Resize(phi::make_ddim({x->numel()}));
paddle/fluid/operators/cvm_op.cc
@@ -21,8 +21,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
class CVMOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

paddle/fluid/operators/cvm_op.cu
@@ -22,7 +22,6 @@ namespace paddle {
namespace operators {
using phi::PADDLE_CUDA_NUM_THREADS;
-using Tensor = phi::DenseTensor;
template <typename T>
__global__ void CvmComputeKernel(const bool use_cvm,

paddle/fluid/operators/cvm_op.h
@@ -19,8 +19,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename T>
void CvmComputeKernel(const bool use_cvm,
                      const int64_t item_width,
paddle/fluid/operators/data_norm_op.cc
@@ -23,7 +23,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
using DataLayout = phi::DataLayout;
template <typename T>
@@ -483,9 +482,9 @@ class DataNormGradOp : public framework::OperatorWithKernel {
      PADDLE_THROW(platform::errors::InvalidArgument(
          "Y@GRAD can not be found for computation"));
    }
-    const Tensor *t = nullptr;
-    if (var->IsType<Tensor>()) {
-      t = &var->Get<Tensor>();
+    const phi::DenseTensor *t = nullptr;
+    if (var->IsType<phi::DenseTensor>()) {
+      t = &var->Get<phi::DenseTensor>();
    } else if (var->IsType<phi::DenseTensor>()) {
      t = &var->Get<phi::DenseTensor>();
    }
@@ -523,7 +522,7 @@ class DataNormGradKernel<phi::CPUContext, T> : public framework::OpKernel<T> {
        (data_layout == DataLayout::kNCHW ? x_dims[1]
                                          : x_dims[x_dims.size() - 1]);
    // init output
-    Tensor *d_x = nullptr;
+    phi::DenseTensor *d_x = nullptr;
    if (ctx.HasOutput(framework::GradVarName("X"))) {
      d_x = ctx.Output<phi::DenseTensor>(framework::GradVarName("X"));
    }
@@ -587,12 +586,12 @@ class DataNormGradKernel<phi::CPUContext, T> : public framework::OpKernel<T> {
    EigenVectorArrayMap<T> d_bias_arr(d_bias_data, C);
    EigenVectorArrayMap<T> d_scale_arr(d_scale_data, C);
-    Tensor dy_sum;
+    phi::DenseTensor dy_sum;
    dy_sum.Resize({C});
    dy_sum.mutable_data<T>(ctx.GetPlace());
    EigenVectorArrayMap<T> dy_sum_arr(dy_sum.mutable_data<T>(ctx.GetPlace()), C);
-    Tensor dy_mul_x_sub_mean_mul_invstd_sum;
+    phi::DenseTensor dy_mul_x_sub_mean_mul_invstd_sum;
    dy_mul_x_sub_mean_mul_invstd_sum.Resize({C});
    dy_mul_x_sub_mean_mul_invstd_sum.mutable_data<T>(ctx.GetPlace());
    EigenVectorArrayMap<T> dy_mul_x_sub_mean_mul_invstd_sum_arr(
paddle/fluid/operators/data_norm_op.cu
@@ -26,7 +26,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
using DataLayout = phi::DataLayout;
using phi::PADDLE_CUDA_NUM_THREADS;
@@ -166,7 +165,7 @@ class DataNormGradKernel<phi::GPUContext, T> : public framework::OpKernel<T> {
    const int C = x_dims[1];
    // init output
-    Tensor *d_x = nullptr;
+    phi::DenseTensor *d_x = nullptr;
    if (ctx.HasOutput(framework::GradVarName("X"))) {
      d_x = ctx.Output<phi::DenseTensor>(framework::GradVarName("X"));
    }
paddle/fluid/operators/deformable_conv_op_mlu.cc
@@ -18,8 +18,6 @@ limitations under the License. */
namespace paddle {
namespace operators {
-using Tensor = phi::DenseTensor;
-
template <typename T>
class DeformableConvMLUKernel : public framework::OpKernel<T> {
 public:
@@ -58,29 +56,29 @@ class DeformableConvMLUKernel : public framework::OpKernel<T> {
        im2col_step);
    const std::vector<int> perm_to_nhwc = {0, 2, 3, 1};
-    Tensor trans_input(input->dtype());
+    phi::DenseTensor trans_input(input->dtype());
    TransposeFromMLUTensor<T>(
        ctx, perm_to_nhwc, input, &trans_input, true /*need_reshape_or_alloc*/);
-    Tensor trans_offset(offset->dtype());
+    phi::DenseTensor trans_offset(offset->dtype());
    TransposeFromMLUTensor<T>(
        ctx, perm_to_nhwc, offset, &trans_offset, true /*need_reshape_or_alloc*/);
-    Tensor trans_mask(mask->dtype());
+    phi::DenseTensor trans_mask(mask->dtype());
    TransposeFromMLUTensor<T>(
        ctx, perm_to_nhwc, mask, &trans_mask, true /*need_reshape_or_alloc*/);
-    Tensor trans_filter(filter->dtype());
+    phi::DenseTensor trans_filter(filter->dtype());
    TransposeFromMLUTensor<T>(
        ctx, perm_to_nhwc, filter, &trans_filter, true /*need_reshape_or_alloc*/);
-    Tensor tmp_output(output->dtype());
+    phi::DenseTensor tmp_output(output->dtype());
    auto output_dims = output->dims();
    tmp_output.mutable_data<T>(
        {output_dims[0], output_dims[2], output_dims[3], output_dims[1]},
@@ -167,54 +165,54 @@ class DeformableConvGradMLUKernel : public framework::OpKernel<T> {
        groups,
        im2col_step);
-    Tensor tmp_input_grad;
+    phi::DenseTensor tmp_input_grad;
    auto input_dims = input->dims();
    tmp_input_grad.mutable_data<T>(
        {input_dims[0], input_dims[2], input_dims[3], input_dims[1]},
        ctx.GetPlace());
-    Tensor tmp_filter_grad;
+    phi::DenseTensor tmp_filter_grad;
    auto filter_dims = filter->dims();
    tmp_filter_grad.mutable_data<T>(
        {filter_dims[0], filter_dims[2], filter_dims[3], filter_dims[1]},
        ctx.GetPlace());
-    Tensor tmp_offset_grad;
+    phi::DenseTensor tmp_offset_grad;
    auto offset_dims = offset->dims();
    tmp_offset_grad.mutable_data<T>(
        {offset_dims[0], offset_dims[2], offset_dims[3], offset_dims[1]},
        ctx.GetPlace());
-    Tensor tmp_mask_grad;
+    phi::DenseTensor tmp_mask_grad;
    auto mask_dims = mask->dims();
    tmp_mask_grad.mutable_data<T>(
        {mask_dims[0], mask_dims[2], mask_dims[3], mask_dims[1]},
        ctx.GetPlace());
    const std::vector<int> perm_to_nhwc = {0, 2, 3, 1};
-    Tensor trans_output_grad(output_grad->dtype());
+    phi::DenseTensor trans_output_grad(output_grad->dtype());
    TransposeFromMLUTensor<T>(ctx,
                              perm_to_nhwc,
                              output_grad,
                              &trans_output_grad,
                              true /*need_reshape_or_alloc*/);
-    Tensor trans_input(input->dtype());
+    phi::DenseTensor trans_input(input->dtype());
    TransposeFromMLUTensor<T>(
        ctx, perm_to_nhwc, input, &trans_input, true /*need_reshape_or_alloc*/);
-    Tensor trans_offset(offset->dtype());
+    phi::DenseTensor trans_offset(offset->dtype());
    TransposeFromMLUTensor<T>(
        ctx, perm_to_nhwc, offset, &trans_offset, true /*need_reshape_or_alloc*/);
-    Tensor trans_mask(mask->dtype());
+    phi::DenseTensor trans_mask(mask->dtype());
    TransposeFromMLUTensor<T>(
        ctx, perm_to_nhwc, mask, &trans_mask, true /*need_reshape_or_alloc*/);
-    Tensor trans_filter(filter->dtype());
+    phi::DenseTensor trans_filter(filter->dtype());
    TransposeFromMLUTensor<T>(ctx,
                              perm_to_nhwc,
                              filter,
paddle/fluid/operators/deformable_psroi_pooling_op.cu
...
@@ -39,7 +39,6 @@
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 using phi::PADDLE_CUDA_NUM_THREADS;
 static inline int GET_BLOCKS(const int N) {
...
paddle/fluid/operators/deformable_psroi_pooling_op.h
...
@@ -33,8 +33,6 @@
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 template <typename T>
 T bilinear_interp(const T* data, const T x, const T y, const int width, const int height) {
...
@@ -518,7 +516,7 @@ class DeformablePSROIPoolGradCPUKernel : public framework::OpKernel<T> {
     const int num_classes = no_trans ? 1 : channels_trans / 2;
     const int channels_each_class = no_trans ? output_dim : output_dim / num_classes;
-    Tensor roi_batch_id_list;
+    phi::DenseTensor roi_batch_id_list;
     roi_batch_id_list.Resize({num_rois});
     int* roi_batch_id_data = roi_batch_id_list.mutable_data<int>(ctx.GetPlace());
...
paddle/fluid/operators/detection/bbox_util.cu.h
...
@@ -30,8 +30,6 @@ namespace cub = hipcub;
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 #define DIVUP(m, n) ((m) / (n) + ((m) % (n) > 0))
 int const kThreadsPerBlock = sizeof(uint64_t) * 8;
...
@@ -47,11 +45,11 @@ struct RangeInitFunctor {
 template <typename T>
 static void SortDescending(const phi::GPUContext& ctx,
-                           const Tensor& value,
-                           Tensor* value_out,
-                           Tensor* index_out) {
+                           const phi::DenseTensor& value,
+                           phi::DenseTensor* value_out,
+                           phi::DenseTensor* index_out) {
   int num = static_cast<int>(value.numel());
-  Tensor index_in_t;
+  phi::DenseTensor index_in_t;
   int* idx_in = index_in_t.mutable_data<int>({num}, ctx.GetPlace());
   platform::ForRange<phi::GPUContext> for_range(ctx, num);
   for_range(RangeInitFunctor{0, 1, idx_in});
...
@@ -287,10 +285,10 @@ static __global__ void NMSKernel(const int n_boxes,
 template <typename T>
 static void NMS(const phi::GPUContext& ctx,
-                const Tensor& proposals,
-                const Tensor& sorted_indices,
+                const phi::DenseTensor& proposals,
+                const phi::DenseTensor& sorted_indices,
                 const T nms_threshold,
-                Tensor* keep_out,
+                phi::DenseTensor* keep_out,
                 bool pixel_offset = true) {
   int boxes_num = proposals.dims()[0];
   const int col_blocks = DIVUP(boxes_num, kThreadsPerBlock);
...
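Free helper signatures follow the same rule: parameters that previously used the Tensor alias now name phi::DenseTensor directly, and only the spelled-out type changes. A hedged sketch with a hypothetical helper (the real SortDescending/NMS utilities are the ones shown in bbox_util.cu.h above):

// --- illustrative sketch, not part of the commit ---
#include "paddle/phi/core/dense_tensor.h"  // assumed header for phi::DenseTensor

static int PrepareKeepBuffer(const phi::DenseTensor& proposals,  // was: const Tensor&
                             phi::DenseTensor* keep_out) {       // was: Tensor*
  // Same allocation pattern as the surrounding diff: size the output
  // buffer from the first dimension of the input.
  keep_out->Resize({proposals.dims()[0]});
  return static_cast<int>(proposals.dims()[0]);
}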
paddle/fluid/operators/detection/bipartite_match_op.cc
...
@@ -18,8 +18,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 class BipartiteMatchOp : public framework::OperatorWithKernel {
  public:
  using framework::OperatorWithKernel::OperatorWithKernel;
...
@@ -234,7 +232,7 @@ class BipartiteMatchKernel : public framework::OpKernel<T> {
     auto lod = dist_mat->lod().back();
     for (size_t i = 0; i < lod.size() - 1; ++i) {
       if (lod[i + 1] > lod[i]) {
-        Tensor one_ins = dist_mat->Slice(lod[i], lod[i + 1]);
+        phi::DenseTensor one_ins = dist_mat->Slice(lod[i], lod[i + 1]);
         BipartiteMatch(one_ins, indices + i * col, dist + i * col);
         if (type == "per_prediction") {
           ArgMaxMatch(one_ins, indices + i * col, dist + i * col, threshold);
...
paddle/fluid/operators/detection/box_clip_op.cu
...
@@ -22,7 +22,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 using LoDTenso = phi::DenseTensor;
 static constexpr int ImInfoSize = 3;
...
paddle/fluid/operators/detection/box_clip_op.h
...
@@ -19,8 +19,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 template <typename DeviceContext, typename T>
 class BoxClipKernel : public framework::OpKernel<T> {
  public:
...
@@ -42,9 +40,10 @@ class BoxClipKernel : public framework::OpKernel<T> {
     auto box_lod = input_box->lod().back();
     int64_t n = static_cast<int64_t>(box_lod.size() - 1);
     for (int i = 0; i < n; ++i) {
-      Tensor im_info_slice = im_info->Slice(i, i + 1);
-      Tensor box_slice = input_box->Slice(box_lod[i], box_lod[i + 1]);
-      Tensor output_slice = output_box->Slice(box_lod[i], box_lod[i + 1]);
+      phi::DenseTensor im_info_slice = im_info->Slice(i, i + 1);
+      phi::DenseTensor box_slice = input_box->Slice(box_lod[i], box_lod[i + 1]);
+      phi::DenseTensor output_slice =
+          output_box->Slice(box_lod[i], box_lod[i + 1]);
       ClipTiledBoxes<T>(dev_ctx, im_info_slice, box_slice, &output_slice);
     }
   }
...
paddle/fluid/operators/detection/box_coder_op_npu.cc
...
@@ -18,8 +18,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 template <typename T>
 struct BoxCoderFunction {
  public:
...
@@ -28,31 +26,31 @@ struct BoxCoderFunction {
     stream = ctx.template device_context<paddle::platform::NPUDeviceContext>().stream();
   }
-  Tensor Adds(const phi::DenseTensor& x, float scalar) {
-    Tensor y;
+  phi::DenseTensor Adds(const phi::DenseTensor& x, float scalar) {
+    phi::DenseTensor y;
     y.mutable_data<T>(x.dims(), place);
     const auto& runner = NpuOpRunner("Adds", {x}, {y}, {{"value", scalar}});
     runner.Run(stream);
     return y;
   }
-  Tensor Muls(const phi::DenseTensor& x, float scalar) {
-    Tensor y;
+  phi::DenseTensor Muls(const phi::DenseTensor& x, float scalar) {
+    phi::DenseTensor y;
     y.mutable_data<T>(x.dims(), place);
     const auto& runner = NpuOpRunner("Muls", {x}, {y}, {{"value", scalar}});
     runner.Run(stream);
     return y;
   }
-  Tensor Mul(const phi::DenseTensor& x, const phi::DenseTensor& y) {
-    Tensor z;
+  phi::DenseTensor Mul(const phi::DenseTensor& x, const phi::DenseTensor& y) {
+    phi::DenseTensor z;
     z.mutable_data<T>(x.dims(), place);
     const auto& runner = NpuOpRunner("Mul", {x, y}, {z}, {});
     runner.Run(stream);
     return z;
   }
-  Tensor SubWithBroadCast(const phi::DenseTensor& x,
+  phi::DenseTensor SubWithBroadCast(const phi::DenseTensor& x,
                           const phi::DenseTensor& y,
                           const framework::DDim& shape) {
-    Tensor z;
+    phi::DenseTensor z;
     z.mutable_data<T>(shape, place);
     const auto& runner = NpuOpRunner("Sub", {x, y}, {z}, {});
     runner.Run(stream);
...
@@ -66,10 +64,10 @@ struct BoxCoderFunction {
     const auto& runner = NpuOpRunner("Div", {x, y}, {*z}, {});
     runner.Run(stream);
   }
-  Tensor DivWithBroadCast(const phi::DenseTensor& x,
+  phi::DenseTensor DivWithBroadCast(const phi::DenseTensor& x,
                           const phi::DenseTensor& y,
                           const framework::DDim& shape) {
-    Tensor z;
+    phi::DenseTensor z;
     DivWithBroadCastVoid(x, y, shape, &z);
     return z;
   }
...
@@ -81,10 +79,10 @@ struct BoxCoderFunction {
     const auto& runner = NpuOpRunner("Mul", {x, y}, {*z}, {});
     runner.Run(stream);
   }
-  Tensor MulWithBroadCast(const phi::DenseTensor& x,
+  phi::DenseTensor MulWithBroadCast(const phi::DenseTensor& x,
                           const phi::DenseTensor& y,
                           const framework::DDim& shape) {
-    Tensor z;
+    phi::DenseTensor z;
     MulWithBroadCastVoid(x, y, shape, &z);
     return z;
   }
...
@@ -96,36 +94,36 @@ struct BoxCoderFunction {
     const auto& runner = NpuOpRunner("AddV2", {x, y}, {*z}, {});
     runner.Run(stream);
   }
-  Tensor AddWithBroadCast(const phi::DenseTensor& x,
+  phi::DenseTensor AddWithBroadCast(const phi::DenseTensor& x,
                           const phi::DenseTensor& y,
                           const framework::DDim& shape) {
-    Tensor z;
+    phi::DenseTensor z;
     AddWithBroadCastVoid(x, y, shape, &z);
     return z;
   }
-  Tensor Abs(const phi::DenseTensor& x) {
-    Tensor y;
+  phi::DenseTensor Abs(const phi::DenseTensor& x) {
+    phi::DenseTensor y;
     y.mutable_data<T>(x.dims(), place);
     const auto& runner = NpuOpRunner("Abs", {x}, {y}, {});
     runner.Run(stream);
     return y;
   }
-  Tensor Log(const phi::DenseTensor& x) {
-    Tensor t_x_m1 = Adds(x, -1);
-    Tensor y;
+  phi::DenseTensor Log(const phi::DenseTensor& x) {
+    phi::DenseTensor t_x_m1 = Adds(x, -1);
+    phi::DenseTensor y;
     y.mutable_data<T>(x.dims(), place);
     const auto& runner = NpuOpRunner("Log1p", {t_x_m1}, {y}, {});
     runner.Run(stream);
     return y;
   }
-  Tensor Exp(const phi::DenseTensor& x) {
-    Tensor y;
+  phi::DenseTensor Exp(const phi::DenseTensor& x) {
+    phi::DenseTensor y;
     y.mutable_data<T>(x.dims(), place);
     const auto& runner = NpuOpRunner("Exp", {x}, {y}, {});
     runner.Run(stream);
     return y;
   }
-  Tensor Dot(const phi::DenseTensor& x, const phi::DenseTensor& y) {
+  phi::DenseTensor Dot(const phi::DenseTensor& x, const phi::DenseTensor& y) {
     auto dim_x = x.dims();
     auto dim_y = y.dims();
     PADDLE_ENFORCE_EQ(
...
@@ -145,7 +143,7 @@ struct BoxCoderFunction {
             "got dim_x[1] = %d, dim_y[0] = %d.",
             dim_x[1],
             dim_y[0]));
-    Tensor z;
+    phi::DenseTensor z;
     z.mutable_data<T>({dim_x[0], dim_y[1]}, place);
     const auto& runner = NpuOpRunner("MatMul",
...
@@ -155,7 +153,7 @@ struct BoxCoderFunction {
     runner.Run(stream);
     return z;
   }
-  void ConcatVoid(const std::vector<Tensor>& inputs,
+  void ConcatVoid(const std::vector<phi::DenseTensor>& inputs,
                   const framework::DDim& shape_out,
                   int axis,
                   phi::DenseTensor* output) {
...
@@ -172,18 +170,18 @@ struct BoxCoderFunction {
     runner.AddInputNames(names);
     runner.Run(stream);
   }
-  Tensor Concat(const std::vector<Tensor>& inputs,
+  phi::DenseTensor Concat(const std::vector<phi::DenseTensor>& inputs,
                 const framework::DDim& shape_out,
                 int axis) {
-    Tensor output;
+    phi::DenseTensor output;
     ConcatVoid(inputs, shape_out, axis, &output);
     return output;
   }
-  Tensor Slice(const phi::DenseTensor& x,
+  phi::DenseTensor Slice(const phi::DenseTensor& x,
                const std::vector<int>& offsets,
                const std::vector<int>& size,
                const framework::DDim& shape) {
-    Tensor y;
+    phi::DenseTensor y;
     y.mutable_data<T>(shape, place);
     const auto& runner = NpuOpRunner("SliceD", {x}, {y}, {{"offsets", offsets}, {"size", size}});
...
@@ -218,8 +216,8 @@ void BoxCoderEnc(const framework::ExecutionContext& ctx,
   auto M = pb->dims()[0];
   auto N = tb->dims()[0];
   auto shape_0 = phi::make_ddim({4, 2});
-  Tensor m_diff;
-  Tensor m_aver;
+  phi::DenseTensor m_diff;
+  phi::DenseTensor m_aver;
   std::vector<T> vec_diff = {static_cast<T>(-1), static_cast<T>(0), static_cast<T>(0),
...
@@ -240,10 +238,10 @@ void BoxCoderEnc(const framework::ExecutionContext& ctx,
   Vector2Tensor<T>(ctx, vec_aver, shape_0, &m_aver);
   BoxCoderFunction<T> F(ctx);
-  Tensor pb_xy = F.Adds(F.Dot(*pb, m_aver), (norm ? 0 : 0.5));
-  Tensor pb_wh = F.Adds(F.Dot(*pb, m_diff), (norm ? 0 : 1));
-  Tensor tb_xy = F.Dot(*tb, m_aver);
-  Tensor tb_wh = F.Adds(F.Dot(*tb, m_diff), (norm ? 0 : 1));
+  phi::DenseTensor pb_xy = F.Adds(F.Dot(*pb, m_aver), (norm ? 0 : 0.5));
+  phi::DenseTensor pb_wh = F.Adds(F.Dot(*pb, m_diff), (norm ? 0 : 1));
+  phi::DenseTensor tb_xy = F.Dot(*tb, m_aver);
+  phi::DenseTensor tb_wh = F.Adds(F.Dot(*tb, m_diff), (norm ? 0 : 1));
   pb_xy.Resize({1, M, 2});
   pb_wh.Resize({1, M, 2});
...
@@ -253,15 +251,16 @@ void BoxCoderEnc(const framework::ExecutionContext& ctx,
   auto shape_half = phi::make_ddim({N, M, 2});
   auto shape_full = phi::make_ddim({N, M, 4});
-  Tensor out_xy_0 = F.DivWithBroadCast(
+  phi::DenseTensor out_xy_0 = F.DivWithBroadCast(
       F.SubWithBroadCast(tb_xy, pb_xy, shape_half), pb_wh, shape_half);
-  Tensor out_wh_0 = F.Log(F.Abs(F.DivWithBroadCast(tb_wh, pb_wh, shape_half)));
-  Tensor out_0 = F.Concat({out_xy_0, out_wh_0}, shape_full, 2);
+  phi::DenseTensor out_wh_0 =
+      F.Log(F.Abs(F.DivWithBroadCast(tb_wh, pb_wh, shape_half)));
+  phi::DenseTensor out_0 = F.Concat({out_xy_0, out_wh_0}, shape_full, 2);
   if (pbv) {
     F.DivWithBroadCastVoid(out_0, *pbv, shape_full, out);
   } else {
-    Tensor t_var;
+    phi::DenseTensor t_var;
     std::vector<T> vec_var(4);
     for (auto i = 0; i < 4; i++) {
       vec_var[i] = static_cast<T>(variance[i]);
...
@@ -281,8 +280,8 @@ void BoxCoderDec(const framework::ExecutionContext& ctx,
                  int axis,
                  phi::DenseTensor* out) {
   auto shape_0 = phi::make_ddim({4, 2});
-  Tensor m_diff;
-  Tensor m_aver;
+  phi::DenseTensor m_diff;
+  phi::DenseTensor m_aver;
   std::vector<T> vec_diff = {static_cast<T>(-1), static_cast<T>(0), static_cast<T>(0),
...
@@ -303,8 +302,8 @@ void BoxCoderDec(const framework::ExecutionContext& ctx,
   Vector2Tensor<T>(ctx, vec_aver, shape_0, &m_aver);
   BoxCoderFunction<T> F(ctx);
-  Tensor pb_xy = F.Adds(F.Dot(*pb, m_aver), (norm ? 0 : 0.5));
-  Tensor pb_wh = F.Adds(F.Dot(*pb, m_diff), (norm ? 0 : 1));
+  phi::DenseTensor pb_xy = F.Adds(F.Dot(*pb, m_aver), (norm ? 0 : 0.5));
+  phi::DenseTensor pb_wh = F.Adds(F.Dot(*pb, m_diff), (norm ? 0 : 1));
   auto pb_resize_shape = axis == 0 ? phi::make_ddim({1, pb->dims()[0], 2}) : phi::make_ddim({pb->dims()[0], 1, 2});
   pb_xy.Resize(pb_resize_shape);
...
@@ -313,18 +312,22 @@ void BoxCoderDec(const framework::ExecutionContext& ctx,
   auto tbox_slice_shape = phi::make_ddim({tb->dims()[0], tb->dims()[1], 2});
   std::vector<int> tbox_slice_size = {static_cast<int>(tb->dims()[0]), static_cast<int>(tb->dims()[1]), 2};
-  Tensor tbox01 = F.Slice(*tb, {0, 0, 0}, tbox_slice_size, tbox_slice_shape);
-  Tensor tbox23 = F.Slice(*tb, {0, 0, 2}, tbox_slice_size, tbox_slice_shape);
+  phi::DenseTensor tbox01 =
+      F.Slice(*tb, {0, 0, 0}, tbox_slice_size, tbox_slice_shape);
+  phi::DenseTensor tbox23 =
+      F.Slice(*tb, {0, 0, 2}, tbox_slice_size, tbox_slice_shape);
-  Tensor tb_xy;
-  Tensor tb_wh;
+  phi::DenseTensor tb_xy;
+  phi::DenseTensor tb_wh;
   if (pbv) {
     auto pbvt_slice_shape = phi::make_ddim({pbv->dims()[0], 2});
     auto pbvt_resize_shape = axis == 0 ? phi::make_ddim({1, pbv->dims()[0], 2}) : phi::make_ddim({pbv->dims()[0], 1, 2});
     std::vector<int> pbvt_slice_size = {static_cast<int>(pbv->dims()[0]), 2};
-    Tensor pbv_t01 = F.Slice(*pbv, {0, 0}, pbvt_slice_size, pbvt_slice_shape);
-    Tensor pbv_t23 = F.Slice(*pbv, {0, 2}, pbvt_slice_size, pbvt_slice_shape);
+    phi::DenseTensor pbv_t01 =
+        F.Slice(*pbv, {0, 0}, pbvt_slice_size, pbvt_slice_shape);
+    phi::DenseTensor pbv_t23 =
+        F.Slice(*pbv, {0, 2}, pbvt_slice_size, pbvt_slice_shape);
     pbv_t01.Resize(pbvt_resize_shape);
     pbv_t23.Resize(pbvt_resize_shape);
...
@@ -345,7 +348,7 @@ void BoxCoderDec(const framework::ExecutionContext& ctx,
                            &tb_xy);
     F.MulWithBroadCastVoid(F.Exp(tbox23), pb_wh, tbox_slice_shape, &tb_wh);
   } else {
-    Tensor t_var01, t_var23;
+    phi::DenseTensor t_var01, t_var23;
     auto t_var_shape = phi::make_ddim({1, 1, 2});
     std::vector<T> vec_var01 = {static_cast<T>(variance[0]), static_cast<T>(variance[1])};
...
@@ -366,9 +369,9 @@ void BoxCoderDec(const framework::ExecutionContext& ctx,
                            tbox_slice_shape,
                            &tb_wh);
   }
-  Tensor obox01 =
+  phi::DenseTensor obox01 =
       F.AddWithBroadCast(tb_xy, F.Muls(tb_wh, -0.5), tbox_slice_shape);
-  Tensor obox23 =
+  phi::DenseTensor obox23 =
       F.Adds(F.AddWithBroadCast(tb_xy, F.Muls(tb_wh, 0.5), tbox_slice_shape), (norm ? 0 : -1));
   F.ConcatVoid({obox01, obox23}, out->dims(), 2, out);
...
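Return values and containers change in the same way: helpers that used to build std::vector<Tensor> or return Tensor by value now spell out phi::DenseTensor. A hedged sketch with a hypothetical splitter (illustrative only; the real helpers are the Concat/Slice members above):

// --- illustrative sketch, not part of the commit ---
#include <vector>
#include "paddle/phi/core/dense_tensor.h"  // assumed header for phi::DenseTensor

static std::vector<phi::DenseTensor> SplitHalves(const phi::DenseTensor& x) {
  std::vector<phi::DenseTensor> parts;      // was: std::vector<Tensor>
  int64_t mid = x.dims()[0] / 2;
  parts.emplace_back(x.Slice(0, mid));      // Slice(begin, end), as used in the diff
  parts.emplace_back(x.Slice(mid, x.dims()[0]));
  return parts;
}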
paddle/fluid/operators/detection/collect_fpn_proposals_op.cc
...
@@ -16,7 +16,6 @@ limitations under the License.*/
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 class CollectFpnProposalsOp : public framework::OperatorWithKernel {
  public:
  using framework::OperatorWithKernel::OperatorWithKernel;
...
paddle/fluid/operators/detection/collect_fpn_proposals_op.cu
...
@@ -33,8 +33,6 @@ namespace cub = hipcub;
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 static constexpr int kNumCUDAThreads = 64;
 static constexpr int kNumMaxinumNumBlocks = 4096;
...
@@ -74,13 +72,13 @@ class GPUCollectFpnProposalsOpKernel : public framework::OpKernel<T> {
     int real_post_num = min(post_nms_topN, total_roi_num);
     fpn_rois->mutable_data<T>({real_post_num, kBBoxSize}, dev_ctx.GetPlace());
-    Tensor concat_rois;
-    Tensor concat_scores;
+    phi::DenseTensor concat_rois;
+    phi::DenseTensor concat_scores;
     T* concat_rois_data = concat_rois.mutable_data<T>({total_roi_num, kBBoxSize}, dev_ctx.GetPlace());
     T* concat_scores_data = concat_scores.mutable_data<T>({total_roi_num, 1}, dev_ctx.GetPlace());
-    Tensor roi_batch_id_list;
+    phi::DenseTensor roi_batch_id_list;
     roi_batch_id_list.Resize({total_roi_num});
     int* roi_batch_id_data = roi_batch_id_list.mutable_data<int>(platform::CPUPlace());
...
@@ -130,20 +128,20 @@ class GPUCollectFpnProposalsOpKernel : public framework::OpKernel<T> {
     }
     // copy batch id list to GPU
-    Tensor roi_batch_id_list_gpu;
+    phi::DenseTensor roi_batch_id_list_gpu;
     framework::TensorCopy(roi_batch_id_list, dev_ctx.GetPlace(), &roi_batch_id_list_gpu);
-    Tensor index_in_t;
+    phi::DenseTensor index_in_t;
     int* idx_in = index_in_t.mutable_data<int>({total_roi_num}, dev_ctx.GetPlace());
     platform::ForRange<phi::GPUContext> for_range_total(dev_ctx, total_roi_num);
     for_range_total(RangeInitFunctor{0, 1, idx_in});
-    Tensor keys_out_t;
+    phi::DenseTensor keys_out_t;
     T* keys_out = keys_out_t.mutable_data<T>({total_roi_num}, dev_ctx.GetPlace());
-    Tensor index_out_t;
+    phi::DenseTensor index_out_t;
     int* idx_out = index_out_t.mutable_data<int>({total_roi_num}, dev_ctx.GetPlace());
...
@@ -175,21 +173,21 @@ class GPUCollectFpnProposalsOpKernel : public framework::OpKernel<T> {
                                               sizeof(T) * 8,
                                               dev_ctx.stream());
     index_out_t.Resize({real_post_num});
-    Tensor sorted_rois;
+    phi::DenseTensor sorted_rois;
     sorted_rois.mutable_data<T>({real_post_num, kBBoxSize}, dev_ctx.GetPlace());
-    Tensor sorted_batch_id;
+    phi::DenseTensor sorted_batch_id;
     sorted_batch_id.mutable_data<int>({real_post_num}, dev_ctx.GetPlace());
     phi::funcs::GPUGather<T>(dev_ctx, concat_rois, index_out_t, &sorted_rois);
     phi::funcs::GPUGather<int>(dev_ctx, roi_batch_id_list_gpu, index_out_t, &sorted_batch_id);
-    Tensor batch_index_t;
+    phi::DenseTensor batch_index_t;
     int* batch_idx_in = batch_index_t.mutable_data<int>({real_post_num}, dev_ctx.GetPlace());
     platform::ForRange<phi::GPUContext> for_range_post(dev_ctx, real_post_num);
     for_range_post(RangeInitFunctor{0, 1, batch_idx_in});
-    Tensor out_id_t;
+    phi::DenseTensor out_id_t;
     int* out_id_data = out_id_t.mutable_data<int>({real_post_num}, dev_ctx.GetPlace());
     // Determine temporary device storage requirements
...
@@ -222,7 +220,7 @@ class GPUCollectFpnProposalsOpKernel : public framework::OpKernel<T> {
     phi::funcs::GPUGather<T>(dev_ctx, sorted_rois, index_out_t, fpn_rois);
-    Tensor length_lod;
+    phi::DenseTensor length_lod;
     int* length_lod_data = length_lod.mutable_data<int>({lod_size}, dev_ctx.GetPlace());
     phi::funcs::SetConstant<phi::GPUContext, int> set_zero;
...
paddle/fluid/operators/detection/density_prior_box_op_npu.cc
...
@@ -15,7 +15,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 using fp16 = paddle::platform::float16;
 template <typename T>
...
@@ -89,7 +88,7 @@ struct DensityPriorBoxFunction {
     const auto& runner = NpuOpRunner("Minimum", {*x, *y}, {*z}, {});
     runner.Run(stream);
   }
-  void Concat(const std::vector<Tensor>& inputs,
+  void Concat(const std::vector<phi::DenseTensor>& inputs,
               int axis,
               phi::DenseTensor* output) {
     // output should be init first
...
@@ -131,14 +130,14 @@ struct DensityPriorBoxFunction {
   platform::Place place;
   aclrtStream stream;
   const framework::ExecutionContext& ctx;
-  Tensor t0;
-  Tensor t1;
-  Tensor tn;
+  phi::DenseTensor t0;
+  phi::DenseTensor t1;
+  phi::DenseTensor tn;
 };
 template <>
 void DensityPriorBoxFunction<fp16>::Arange(int n, phi::DenseTensor* x) {
-  Tensor x_fp32(experimental::DataType::FLOAT32);
+  phi::DenseTensor x_fp32(experimental::DataType::FLOAT32);
   x_fp32.mutable_data<float>(x->dims(), place);
   FillNpuTensorWithConstant<float>(&tn, static_cast<float>(n));
   const auto& runner = NpuOpRunner("Range", {t0, tn, t1}, {x_fp32}, {});
...
@@ -149,7 +148,7 @@ void DensityPriorBoxFunction<fp16>::Arange(int n, phi::DenseTensor* x) {
 template <>
 void DensityPriorBoxFunction<fp16>::FloatVec2Tsr(const std::vector<float>& vec, phi::DenseTensor* tsr_dst) {
-  Tensor tsr_fp32(experimental::DataType::FLOAT32);
+  phi::DenseTensor tsr_fp32(experimental::DataType::FLOAT32);
   tsr_fp32.mutable_data<float>(tsr_dst->dims(), place);
   framework::TensorFromVector<float>(vec, ctx.device_context(), &tsr_fp32);
   ctx.template device_context<paddle::platform::NPUDeviceContext>().Wait();
...
@@ -185,9 +184,9 @@ class DensityPriorBoxOpNPUKernel : public framework::OpKernel<T> {
     auto place = ctx.GetPlace();
     DensityPriorBoxFunction<T> F(ctx);
-    Tensor h(_type);
+    phi::DenseTensor h(_type);
     h.mutable_data<T>({layer_h}, place);
-    Tensor w(_type);
+    phi::DenseTensor w(_type);
     w.mutable_data<T>({layer_w}, place);
     F.Arange(layer_h, &h);
     F.Arange(layer_w, &w);
...
@@ -203,11 +202,11 @@ class DensityPriorBoxOpNPUKernel : public framework::OpKernel<T> {
     for (size_t i = 0; i < densities.size(); ++i) {
       num_priors_per_ratio += densities[i] * densities[i];
     }
-    Tensor di(_type);
-    Tensor dj(_type);
-    Tensor shifts(_type);
-    Tensor box_w_ratio(_type);
-    Tensor box_h_ratio(_type);
+    phi::DenseTensor di(_type);
+    phi::DenseTensor dj(_type);
+    phi::DenseTensor shifts(_type);
+    phi::DenseTensor box_w_ratio(_type);
+    phi::DenseTensor box_h_ratio(_type);
     di.mutable_data<T>({ratios_size * num_priors_per_ratio}, place);
     dj.mutable_data<T>({ratios_size * num_priors_per_ratio}, place);
     shifts.mutable_data<T>({ratios_size * num_priors_per_ratio}, place);
...
@@ -220,19 +219,21 @@ class DensityPriorBoxOpNPUKernel : public framework::OpKernel<T> {
       // Range = start:start+ratios_size*density_sqr, density = densities[i]
       int density_sqr = densities[i] * densities[i];
       // shifts[Range] = [step_average/density]*ratios_size*density_sqr
-      Tensor shifts_part =
+      phi::DenseTensor shifts_part =
           shifts.Slice(start, start + ratios_size * density_sqr);
       FillNpuTensorWithConstant<T>(&shifts_part, static_cast<T>(step_average / densities[i]));
       // di[Range] = [ i // density for i in range(density_sqr) ] * ratios_size
       // dj[Range] = [ i % density for i in range(density_sqr) ] * ratios_size
-      Tensor di_part = di.Slice(start, start + ratios_size * density_sqr);
-      Tensor dj_part = dj.Slice(start, start + ratios_size * density_sqr);
+      phi::DenseTensor di_part =
+          di.Slice(start, start + ratios_size * density_sqr);
+      phi::DenseTensor dj_part =
+          dj.Slice(start, start + ratios_size * density_sqr);
       if (densities[i] > 1) {
         di_part.Resize({ratios_size, densities[i], densities[i]});
         dj_part.Resize({ratios_size, densities[i], densities[i]});
-        Tensor range_n(_type);
+        phi::DenseTensor range_n(_type);
         range_n.mutable_data<T>({densities[i]}, place);
         F.Arange(densities[i], &range_n);
         range_n.Resize({1, densities[i], 1});
...
@@ -254,9 +255,9 @@ class DensityPriorBoxOpNPUKernel : public framework::OpKernel<T> {
         // Range_mini = start_box_ratio:start_box_ratio+density_sqr
         // box_h_ratio[Range_mini] = [fixed_sizes[i] * sqrt(ar)] * density_sqr
         // box_w_ratio[Range_mini] = [fixed_sizes[i] / sqrt(ar)] * density_sqr
-        Tensor box_h_ratio_part =
+        phi::DenseTensor box_h_ratio_part =
            box_h_ratio.Slice(start_box_ratio, start_box_ratio + density_sqr);
-        Tensor box_w_ratio_part =
+        phi::DenseTensor box_w_ratio_part =
            box_w_ratio.Slice(start_box_ratio, start_box_ratio + density_sqr);
         FillNpuTensorWithConstant<T>(&box_w_ratio_part, static_cast<T>(fixed_sizes[i] * sqrt(ar)));
...
@@ -274,8 +275,8 @@ class DensityPriorBoxOpNPUKernel : public framework::OpKernel<T> {
     // c_x = (w+offset)*step_w - 0.5*step_average + 0.5*shifts + dj*shifts
     // c_y = (h+offset)*step_h - 0.5*step_average + 0.5*shifts + di*shifts
-    Tensor c_x(_type);
-    Tensor c_y(_type);
+    phi::DenseTensor c_x(_type);
+    phi::DenseTensor c_y(_type);
     auto dim0 = phi::make_ddim({1, layer_w, ratios_size * num_priors_per_ratio, 1});
     auto dim1 =
...
@@ -301,17 +302,17 @@ class DensityPriorBoxOpNPUKernel : public framework::OpKernel<T> {
     F.Muls(&box_w_ratio, static_cast<float>(0.5), &box_w_ratio);
     F.Muls(&box_h_ratio, static_cast<float>(0.5), &box_h_ratio);
-    Tensor zero_t(_type);
-    Tensor one_t(_type);
+    phi::DenseTensor zero_t(_type);
+    phi::DenseTensor one_t(_type);
     zero_t.mutable_data<T>({1}, place);
     one_t.mutable_data<T>({1}, place);
     FillNpuTensorWithConstant<T>(&zero_t, static_cast<T>(0));
     FillNpuTensorWithConstant<T>(&one_t, static_cast<T>(1));
-    Tensor outbox0(_type);
-    Tensor outbox1(_type);
-    Tensor outbox2(_type);
-    Tensor outbox3(_type);
+    phi::DenseTensor outbox0(_type);
+    phi::DenseTensor outbox1(_type);
+    phi::DenseTensor outbox2(_type);
+    phi::DenseTensor outbox3(_type);
     outbox0.mutable_data<T>(dim0, place);
     outbox1.mutable_data<T>(dim1, place);
     outbox2.mutable_data<T>(dim0, place);
...
@@ -349,17 +350,17 @@ class DensityPriorBoxOpNPUKernel : public framework::OpKernel<T> {
         {layer_h, layer_w, ratios_size * num_priors_per_ratio, 4});
     boxes->mutable_data<T>(place);
     vars->mutable_data<T>(place);
-    Tensor boxes_share(_type);
-    Tensor vars_share(_type);
+    phi::DenseTensor boxes_share(_type);
+    phi::DenseTensor vars_share(_type);
     boxes_share.ShareDataWith(*boxes);
     boxes_share.Resize(out_dim);
     vars_share.ShareDataWith(*vars);
     vars_share.Resize(out_dim);
-    Tensor box0(_type);
-    Tensor box1(_type);
-    Tensor box2(_type);
-    Tensor box3(_type);
+    phi::DenseTensor box0(_type);
+    phi::DenseTensor box1(_type);
+    phi::DenseTensor box2(_type);
+    phi::DenseTensor box3(_type);
     // out_dim = {layer_h, layer_w, ratios_size*num_priors_per_ratio, 1}
     out_dim[3] = 1;
     box0.mutable_data<T>(out_dim, place);
...
@@ -377,7 +378,7 @@ class DensityPriorBoxOpNPUKernel : public framework::OpKernel<T> {
     std::vector<int> multiples = {layer_h, layer_w, ratios_size * num_priors_per_ratio, 1};
-    Tensor variances_t(_type);
+    phi::DenseTensor variances_t(_type);
     // variances.size() == 4
     variances_t.mutable_data<T>({4}, place);
     F.FloatVec2Tsr(variances, &variances_t);
...
paddle/fluid/operators/detection/generate_mask_labels_op.cc
...
@@ -25,7 +25,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 const int kBoxDim = 4;
 template <typename T>
...
@@ -151,7 +150,8 @@ static inline void ExpandMaskTarget(const phi::CPUContext& ctx,
 }
 template <typename T>
-std::vector<Tensor> SampleMaskForOneImage(const phi::CPUContext& ctx,
+std::vector<phi::DenseTensor> SampleMaskForOneImage(
+    const phi::CPUContext& ctx,
     const phi::DenseTensor& im_info,
     const phi::DenseTensor& gt_classes,
     const phi::DenseTensor& is_crowd,
...
@@ -218,15 +218,15 @@ std::vector<Tensor> SampleMaskForOneImage(const phi::CPUContext& ctx,
   int gt_num = mask_gt_inds.size();
   int fg_num = fg_inds.size();
-  Tensor boxes_from_polys;
+  phi::DenseTensor boxes_from_polys;
   boxes_from_polys.mutable_data<T>({gt_num, 4}, platform::CPUPlace());
   Poly2Boxes(gt_polys, boxes_from_polys.data<T>());
   std::vector<int> roi_has_mask = std::vector<int>(fg_inds.begin(), fg_inds.end());
-  Tensor mask_class_labels;
-  Tensor masks;
-  Tensor rois_fg;
+  phi::DenseTensor mask_class_labels;
+  phi::DenseTensor masks;
+  phi::DenseTensor rois_fg;
   auto im_scale = im_info.data<T>()[2];
   if (fg_num > 0) {
...
@@ -251,7 +251,7 @@ std::vector<Tensor> SampleMaskForOneImage(const phi::CPUContext& ctx,
       rois_fg_data[k] = rois_fg_data[k] / im_scale;
     }
-    Tensor overlaps_bbfg_bbpolys;
+    phi::DenseTensor overlaps_bbfg_bbpolys;
     overlaps_bbfg_bbpolys.mutable_data<T>({fg_num, gt_num}, ctx.GetPlace());
     BboxOverlaps<T>(rois_fg, boxes_from_polys, &overlaps_bbfg_bbpolys);
...
@@ -306,7 +306,7 @@ std::vector<Tensor> SampleMaskForOneImage(const phi::CPUContext& ctx,
     roi_has_mask = std::vector<int>(bg_inds.begin(), bg_inds.end());
   }
-  Tensor masks_expand;
+  phi::DenseTensor masks_expand;
   ExpandMaskTarget<T>(ctx, masks, mask_class_labels, resolution, num_classes, &masks_expand);
...
@@ -315,13 +315,13 @@ std::vector<Tensor> SampleMaskForOneImage(const phi::CPUContext& ctx,
     rois_fg_data[k] = rois_fg_data[k] * im_scale;
   }
-  Tensor roi_has_mask_t;
+  phi::DenseTensor roi_has_mask_t;
   int roi_has_mask_size = roi_has_mask.size();
   int* roi_has_mask_data = roi_has_mask_t.mutable_data<int>({roi_has_mask_size, 1}, ctx.GetPlace());
   std::copy(roi_has_mask.begin(), roi_has_mask.end(), roi_has_mask_data);
-  std::vector<Tensor> res;
+  std::vector<phi::DenseTensor> res;
   res.emplace_back(rois_fg);
   res.emplace_back(roi_has_mask_t);
   res.emplace_back(masks_expand);
...
@@ -405,23 +405,23 @@ class GenerateMaskLabelsKernel : public framework::OpKernel<T> {
         lod0.emplace_back(num_mask);
         continue;
       }
-      Tensor im_info_slice = im_info->Slice(i, i + 1);
-      Tensor gt_classes_slice =
+      phi::DenseTensor im_info_slice = im_info->Slice(i, i + 1);
+      phi::DenseTensor gt_classes_slice =
           gt_classes->Slice(gt_classes_lod[i], gt_classes_lod[i + 1]);
-      Tensor is_crowd_slice =
+      phi::DenseTensor is_crowd_slice =
           is_crowd->Slice(is_crowd_lod[i], is_crowd_lod[i + 1]);
-      Tensor label_int32_slice =
+      phi::DenseTensor label_int32_slice =
          label_int32->Slice(label_int32_lod[i], label_int32_lod[i + 1]);
-      Tensor rois_slice = rois->Slice(rois_lod[i], rois_lod[i + 1]);
+      phi::DenseTensor rois_slice = rois->Slice(rois_lod[i], rois_lod[i + 1]);
       auto sub_lod_and_offset = framework::GetSubLoDAndAbsoluteOffset(gt_segms_lod, i, i + 1, 0);
       auto lod_length = sub_lod_and_offset.first;
       size_t s = sub_lod_and_offset.second.first;
       size_t e = sub_lod_and_offset.second.second;
-      Tensor gt_segms_slice = gt_segms->Slice(s, e);
+      phi::DenseTensor gt_segms_slice = gt_segms->Slice(s, e);
-      std::vector<Tensor> tensor_output =
+      std::vector<phi::DenseTensor> tensor_output =
          SampleMaskForOneImage<T>(dev_ctx,
                                   im_info_slice,
                                   gt_classes_slice,
...
@@ -433,9 +433,9 @@ class GenerateMaskLabelsKernel : public framework::OpKernel<T> {
                                   resolution,
                                   lod_length);
-      Tensor sampled_mask_rois = tensor_output[0];
-      Tensor sampled_roi_has_mask_int32 = tensor_output[1];
-      Tensor sampled_mask_int32 = tensor_output[2];
+      phi::DenseTensor sampled_mask_rois = tensor_output[0];
+      phi::DenseTensor sampled_roi_has_mask_int32 = tensor_output[1];
+      phi::DenseTensor sampled_mask_int32 = tensor_output[2];
       AppendMask<T>(mask_rois, kBoxDim * num_mask, &sampled_mask_rois);
       AppendMask<int>(
...
paddle/fluid/operators/detection/generate_proposal_labels_op.cc
...
@@ -25,7 +25,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 const int kBoxDim = 4;
 template <typename T>
...
@@ -174,7 +173,7 @@ void Concat(const phi::CPUContext& context,
             const phi::DenseTensor& in_tensor_b,
             phi::DenseTensor* out_tensor) {
   int axis = 0;
-  std::vector<Tensor> inputs;
+  std::vector<phi::DenseTensor> inputs;
   inputs.emplace_back(in_tensor_a);
   inputs.emplace_back(in_tensor_b);
   math::ConcatFunctor<phi::CPUContext, T> concat_functor;
...
@@ -300,7 +299,7 @@ void GatherBoxesLabels(const phi::CPUContext& context,
                        phi::DenseTensor* sampled_max_overlap) {
   int fg_num = fg_inds.size();
   int bg_num = bg_inds.size();
-  Tensor fg_inds_t, bg_inds_t, gt_box_inds_t, gt_label_inds_t;
+  phi::DenseTensor fg_inds_t, bg_inds_t, gt_box_inds_t, gt_label_inds_t;
   int* fg_inds_data = fg_inds_t.mutable_data<int>({fg_num}, context.GetPlace());
   int* bg_inds_data = bg_inds_t.mutable_data<int>({bg_num}, context.GetPlace());
   int* gt_box_inds_data =
...
@@ -312,7 +311,7 @@ void GatherBoxesLabels(const phi::CPUContext& context,
   std::copy(gt_inds.begin(), gt_inds.end(), gt_box_inds_data);
   std::copy(gt_inds.begin(), gt_inds.end(), gt_label_inds_data);
-  Tensor fg_boxes, bg_boxes, fg_labels, bg_labels;
+  phi::DenseTensor fg_boxes, bg_boxes, fg_labels, bg_labels;
   fg_boxes.mutable_data<T>({fg_num, kBoxDim}, context.GetPlace());
   phi::funcs::CPUGather<T>(context, boxes, fg_inds_t, &fg_boxes);
   bg_boxes.mutable_data<T>({bg_num, kBoxDim}, context.GetPlace());
...
@@ -325,7 +324,7 @@ void GatherBoxesLabels(const phi::CPUContext& context,
   phi::funcs::set_constant(context, &bg_labels, 0);
   Concat<int>(context, fg_labels, bg_labels, sampled_labels);
-  Tensor fg_max_overlap, bg_max_overlap;
+  phi::DenseTensor fg_max_overlap, bg_max_overlap;
   fg_max_overlap.mutable_data<T>({fg_num}, context.GetPlace());
   phi::funcs::CPUGather<T>(context, max_overlap, fg_inds_t, &fg_max_overlap);
   bg_max_overlap.mutable_data<T>({bg_num}, context.GetPlace());
...
@@ -334,7 +333,7 @@ void GatherBoxesLabels(const phi::CPUContext& context,
 }
 template <typename T>
-std::vector<Tensor> SampleRoisForOneImage(
+std::vector<phi::DenseTensor> SampleRoisForOneImage(
     const phi::CPUContext& context,
     const phi::DenseTensor& rpn_rois_in,
     const phi::DenseTensor& gt_classes,
...
@@ -355,7 +354,7 @@ std::vector<Tensor> SampleRoisForOneImage(
     const phi::DenseTensor& max_overlap) {
   // 1.1 map to original image
   auto im_scale = im_info.data<T>()[2];
-  Tensor rpn_rois;
+  phi::DenseTensor rpn_rois;
   rpn_rois.mutable_data<T>(rpn_rois_in.dims(), context.GetPlace());
   const T* rpn_rois_in_dt = rpn_rois_in.data<T>();
   T* rpn_rois_dt = rpn_rois.data<T>();
...
@@ -367,10 +366,10 @@ std::vector<Tensor> SampleRoisForOneImage(
   int proposals_num = 1;
   if (is_cascade_rcnn) {
-    Tensor keep;
+    phi::DenseTensor keep;
     FilterRoIs<T>(context, rpn_rois, max_overlap, &keep);
-    Tensor roi_filter;
-    // Tensor box_filter;
+    phi::DenseTensor roi_filter;
+    // phi::DenseTensor box_filter;
     if (keep.numel() == 0) {
       phi::funcs::SetConstant<phi::CPUContext, T> set_zero;
       roi_filter.mutable_data<T>({proposals_num, kBoxDim}, context.GetPlace());
...
@@ -389,16 +388,16 @@ std::vector<Tensor> SampleRoisForOneImage(
   // 1.2 compute overlaps
   proposals_num += gt_boxes.dims()[0];
-  Tensor proposal_to_gt_overlaps;
+  phi::DenseTensor proposal_to_gt_overlaps;
   proposal_to_gt_overlaps.mutable_data<T>({proposals_num, gt_boxes.dims()[0]}, context.GetPlace());
-  Tensor boxes;
+  phi::DenseTensor boxes;
   boxes.mutable_data<T>({proposals_num, kBoxDim}, context.GetPlace());
   Concat<T>(context, gt_boxes, rpn_rois, &boxes);
   BboxOverlaps<T>(boxes, gt_boxes, &proposal_to_gt_overlaps);
-  Tensor proposal_with_max_overlap;
+  phi::DenseTensor proposal_with_max_overlap;
   proposal_with_max_overlap.mutable_data<T>({proposals_num}, context.GetPlace());
...
@@ -423,7 +422,8 @@ std::vector<Tensor> SampleRoisForOneImage(
   std::vector<int> mapped_gt_inds = fg_bg_gt[2];  // mapped_gt_labels
   // Gather boxes and labels
-  Tensor sampled_boxes, sampled_labels, sampled_gts, sampled_max_overlap;
+  phi::DenseTensor sampled_boxes, sampled_labels, sampled_gts,
+      sampled_max_overlap;
   int fg_num = fg_inds.size();
   int bg_num = bg_inds.size();
   int boxes_num = fg_num + bg_num;
...
@@ -446,7 +446,7 @@ std::vector<Tensor> SampleRoisForOneImage(
                      &sampled_max_overlap);
   // Compute targets
-  Tensor bbox_targets_single;
+  phi::DenseTensor bbox_targets_single;
   bbox_targets_single.mutable_data<T>(bbox_dim, context.GetPlace());
   BoxToDelta<T>(fg_num,
                 sampled_boxes,
...
@@ -456,14 +456,14 @@ std::vector<Tensor> SampleRoisForOneImage(
                 &bbox_targets_single);
   // Scale rois
-  Tensor sampled_rois;
+  phi::DenseTensor sampled_rois;
   sampled_rois.mutable_data<T>(sampled_boxes.dims(), context.GetPlace());
   auto sampled_rois_et = framework::EigenTensor<T, 2>::From(sampled_rois);
   auto sampled_boxes_et = framework::EigenTensor<T, 2>::From(sampled_boxes);
   sampled_rois_et = sampled_boxes_et * im_scale;
   // Expand box targets
-  Tensor bbox_targets, bbox_inside_weights, bbox_outside_weights;
+  phi::DenseTensor bbox_targets, bbox_inside_weights, bbox_outside_weights;
   framework::DDim bbox_expand_dim({boxes_num, kBoxDim * class_nums});
   bbox_targets.mutable_data<T>(bbox_expand_dim, context.GetPlace());
   bbox_inside_weights.mutable_data<T>(bbox_expand_dim, context.GetPlace());
...
@@ -500,7 +500,7 @@ std::vector<Tensor> SampleRoisForOneImage(
       bbox_outside_weights_data[dst_idx + 3] = 1;
     }
   }
-  std::vector<Tensor> res;
+  std::vector<phi::DenseTensor> res;
   res.emplace_back(sampled_rois);
   res.emplace_back(sampled_labels);
   res.emplace_back(bbox_targets);
...
@@ -610,16 +610,16 @@ class GenerateProposalLabelsKernel : public framework::OpKernel<T> {
         lod0.emplace_back(num_rois);
         continue;
       }
-      Tensor rpn_rois_slice =
+      phi::DenseTensor rpn_rois_slice =
          rpn_rois->Slice(rpn_rois_lod[i], rpn_rois_lod[i + 1]);
-      Tensor gt_classes_slice =
+      phi::DenseTensor gt_classes_slice =
          gt_classes->Slice(gt_classes_lod[i], gt_classes_lod[i + 1]);
-      Tensor is_crowd_slice =
+      phi::DenseTensor is_crowd_slice =
          is_crowd->Slice(is_crowd_lod[i], is_crowd_lod[i + 1]);
-      Tensor gt_boxes_slice =
+      phi::DenseTensor gt_boxes_slice =
          gt_boxes->Slice(gt_boxes_lod[i], gt_boxes_lod[i + 1]);
-      Tensor im_info_slice = im_info->Slice(i, i + 1);
-      Tensor max_overlap_slice;
+      phi::DenseTensor im_info_slice = im_info->Slice(i, i + 1);
+      phi::DenseTensor max_overlap_slice;
       if (is_cascade_rcnn) {
         auto* max_overlap = context.Input<phi::DenseTensor>("MaxOverlap");
         max_overlap_slice =
...
@@ -628,7 +628,7 @@ class GenerateProposalLabelsKernel : public framework::OpKernel<T> {
         max_overlap_slice.mutable_data<T>({rpn_rois_slice.dims()[0]}, context.GetPlace());
       }
-      std::vector<Tensor> tensor_output =
+      std::vector<phi::DenseTensor> tensor_output =
          SampleRoisForOneImage<T>(dev_ctx,
                                   rpn_rois_slice,
                                   gt_classes_slice,
...
@@ -647,12 +647,12 @@ class GenerateProposalLabelsKernel : public framework::OpKernel<T> {
                                   is_cascade_rcnn,
                                   is_cls_agnostic,
                                   max_overlap_slice);
-      Tensor sampled_rois = tensor_output[0];
-      Tensor sampled_labels_int32 = tensor_output[1];
-      Tensor sampled_bbox_targets = tensor_output[2];
-      Tensor sampled_bbox_inside_weights = tensor_output[3];
-      Tensor sampled_bbox_outside_weights = tensor_output[4];
-      Tensor sampled_max_overlap = tensor_output[5];
+      phi::DenseTensor sampled_rois = tensor_output[0];
+      phi::DenseTensor sampled_labels_int32 = tensor_output[1];
+      phi::DenseTensor sampled_bbox_targets = tensor_output[2];
+      phi::DenseTensor sampled_bbox_inside_weights = tensor_output[3];
+      phi::DenseTensor sampled_bbox_outside_weights = tensor_output[4];
+      phi::DenseTensor sampled_max_overlap = tensor_output[5];
       AppendRois<T>(rois, kBoxDim * num_rois, &sampled_rois);
       AppendRois<int>(labels_int32, num_rois, &sampled_labels_int32);
...
paddle/fluid/operators/detection/generate_proposals_op.cc
...
@@ -27,8 +27,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 class GenerateProposalsOp : public framework::OperatorWithKernel {
  public:
  using framework::OperatorWithKernel::OperatorWithKernel;
...
@@ -115,7 +113,7 @@ class GenerateProposalsKernel : public framework::OpKernel<T> {
                                 context.GetPlace());
     rpn_roi_probs->mutable_data<T>({scores->numel(), 1}, context.GetPlace());
-    Tensor bbox_deltas_swap, scores_swap;
+    phi::DenseTensor bbox_deltas_swap, scores_swap;
     bbox_deltas_swap.mutable_data<T>({num, h_bbox, w_bbox, c_bbox}, dev_ctx.GetPlace());
     scores_swap.mutable_data<T>({num, h_score, w_score, c_score},
...
@@ -136,14 +134,14 @@ class GenerateProposalsKernel : public framework::OpKernel<T> {
     int64_t num_proposals = 0;
     for (int64_t i = 0; i < num; ++i) {
-      Tensor im_info_slice = im_info->Slice(i, i + 1);
-      Tensor bbox_deltas_slice = bbox_deltas_swap.Slice(i, i + 1);
-      Tensor scores_slice = scores_swap.Slice(i, i + 1);
+      phi::DenseTensor im_info_slice = im_info->Slice(i, i + 1);
+      phi::DenseTensor bbox_deltas_slice = bbox_deltas_swap.Slice(i, i + 1);
+      phi::DenseTensor scores_slice = scores_swap.Slice(i, i + 1);
       bbox_deltas_slice.Resize({h_bbox * w_bbox * c_bbox / 4, 4});
       scores_slice.Resize({h_score * w_score * c_score, 1});
-      std::pair<Tensor, Tensor> tensor_pair =
+      std::pair<phi::DenseTensor, phi::DenseTensor> tensor_pair =
          ProposalForOneImage(dev_ctx,
                              im_info_slice,
                              anchors,
...
@@ -155,8 +153,8 @@ class GenerateProposalsKernel : public framework::OpKernel<T> {
                              nms_thresh,
                              min_size,
                              eta);
-      Tensor& proposals = tensor_pair.first;
-      Tensor& scores = tensor_pair.second;
+      phi::DenseTensor& proposals = tensor_pair.first;
+      phi::DenseTensor& scores = tensor_pair.second;
       AppendProposals(rpn_rois, 4 * num_proposals, proposals);
       AppendProposals(rpn_roi_probs, num_proposals, scores);
...
@@ -179,13 +177,13 @@ class GenerateProposalsKernel : public framework::OpKernel<T> {
     rpn_roi_probs->Resize({num_proposals, 1});
   }
-  std::pair<Tensor, Tensor> ProposalForOneImage(
+  std::pair<phi::DenseTensor, phi::DenseTensor> ProposalForOneImage(
       const phi::CPUContext& ctx,
-      const Tensor& im_info_slice,
-      const Tensor& anchors,
-      const Tensor& variances,
-      const Tensor& bbox_deltas_slice,  // [M, 4]
-      const Tensor& scores_slice,       // [N, 1]
+      const phi::DenseTensor& im_info_slice,
+      const phi::DenseTensor& anchors,
+      const phi::DenseTensor& variances,
+      const phi::DenseTensor& bbox_deltas_slice,  // [M, 4]
+      const phi::DenseTensor& scores_slice,       // [N, 1]
      int pre_nms_top_n,
      int post_nms_top_n,
      float nms_thresh,
...
@@ -194,7 +192,7 @@ class GenerateProposalsKernel : public framework::OpKernel<T> {
     auto* scores_data = scores_slice.data<T>();
     // Sort index
-    Tensor index_t;
+    phi::DenseTensor index_t;
     index_t.Resize({scores_slice.numel()});
     int* index = index_t.mutable_data<int>(ctx.GetPlace());
     for (int i = 0; i < scores_slice.numel(); ++i) {
...
@@ -212,7 +210,7 @@ class GenerateProposalsKernel : public framework::OpKernel<T> {
       index_t.Resize({pre_nms_top_n});
     }
-    Tensor scores_sel, bbox_sel, anchor_sel, var_sel;
+    phi::DenseTensor scores_sel, bbox_sel, anchor_sel, var_sel;
     scores_sel.mutable_data<T>({index_t.numel(), 1}, ctx.GetPlace());
     bbox_sel.mutable_data<T>({index_t.numel(), 4}, ctx.GetPlace());
     anchor_sel.mutable_data<T>({index_t.numel(), 4}, ctx.GetPlace());
...
@@ -223,26 +221,26 @@ class GenerateProposalsKernel : public framework::OpKernel<T> {
     phi::funcs::CPUGather<T>(ctx, anchors, index_t, &anchor_sel);
     phi::funcs::CPUGather<T>(ctx, variances, index_t, &var_sel);
-    Tensor proposals;
+    phi::DenseTensor proposals;
     proposals.mutable_data<T>({index_t.numel(), 4}, ctx.GetPlace());
     BoxCoder<T>(ctx, &anchor_sel, &bbox_sel, &var_sel, &proposals);
     ClipTiledBoxes<T>(ctx, im_info_slice, proposals, &proposals, false);
-    Tensor keep;
+    phi::DenseTensor keep;
     FilterBoxes<T>(ctx, &proposals, min_size, im_info_slice, true, &keep);
     // Handle the case when there is no keep index left
     if (keep.numel() == 0) {
       phi::funcs::SetConstant<phi::CPUContext, T> set_zero;
       bbox_sel.mutable_data<T>({1, 4}, ctx.GetPlace());
       set_zero(ctx, &bbox_sel, static_cast<T>(0));
-      Tensor scores_filter;
+      phi::DenseTensor scores_filter;
       scores_filter.mutable_data<T>({1, 1}, ctx.GetPlace());
       set_zero(ctx, &scores_filter, static_cast<T>(0));
       return std::make_pair(bbox_sel, scores_filter);
     }
-    Tensor scores_filter;
+    phi::DenseTensor scores_filter;
     bbox_sel.mutable_data<T>({keep.numel(), 4}, ctx.GetPlace());
     scores_filter.mutable_data<T>({keep.numel(), 1}, ctx.GetPlace());
     phi::funcs::CPUGather<T>(ctx, proposals, keep, &bbox_sel);
...
@@ -251,7 +249,7 @@ class GenerateProposalsKernel : public framework::OpKernel<T> {
       return std::make_pair(bbox_sel, scores_filter);
     }
-    Tensor keep_nms =
+    phi::DenseTensor keep_nms =
        phi::funcs::NMS<T>(ctx, &bbox_sel, &scores_filter, nms_thresh, eta);
     if (post_nms_top_n > 0 && post_nms_top_n < keep_nms.numel()) {
...
paddle/fluid/operators/detection/generate_proposals_op.cu
...
@@ -28,24 +28,22 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 namespace {
 template <typename T>
-static std::pair<Tensor, Tensor> ProposalForOneImage(
+static std::pair<phi::DenseTensor, phi::DenseTensor> ProposalForOneImage(
     const phi::GPUContext& ctx,
-    const Tensor& im_info,
-    const Tensor& anchors,
-    const Tensor& variances,
-    const Tensor& bbox_deltas,  // [M, 4]
-    const Tensor& scores,       // [N, 1]
+    const phi::DenseTensor& im_info,
+    const phi::DenseTensor& anchors,
+    const phi::DenseTensor& variances,
+    const phi::DenseTensor& bbox_deltas,  // [M, 4]
+    const phi::DenseTensor& scores,       // [N, 1]
     int pre_nms_top_n,
     int post_nms_top_n,
     float nms_thresh,
     float min_size,
     float eta) {
   // 1. pre nms
-  Tensor scores_sort, index_sort;
+  phi::DenseTensor scores_sort, index_sort;
   SortDescending<T>(ctx, scores, &scores_sort, &index_sort);
   int num = scores.numel();
   int pre_nms_num = (pre_nms_top_n <= 0 || pre_nms_top_n > num) ? scores.numel()
...
@@ -54,7 +52,7 @@ static std::pair<Tensor, Tensor> ProposalForOneImage(
   index_sort.Resize({pre_nms_num, 1});
   // 2. box decode and clipping
-  Tensor proposals;
+  phi::DenseTensor proposals;
   proposals.mutable_data<T>({pre_nms_num, 4}, ctx.GetPlace());
   {
...
@@ -68,7 +66,7 @@ static std::pair<Tensor, Tensor> ProposalForOneImage(
   }
   // 3. filter
-  Tensor keep_index, keep_num_t;
+  phi::DenseTensor keep_index, keep_num_t;
   keep_index.mutable_data<int>({pre_nms_num}, ctx.GetPlace());
   keep_num_t.mutable_data<int>({1}, ctx.GetPlace());
   min_size = std::max(min_size, 1.0f);
...
@@ -90,7 +88,7 @@ static std::pair<Tensor, Tensor> ProposalForOneImage(
   ctx.Wait();
   keep_index.Resize({keep_num});
-  Tensor scores_filter, proposals_filter;
+  phi::DenseTensor scores_filter, proposals_filter;
   // Handle the case when there is no keep index left
   if (keep_num == 0) {
     phi::funcs::SetConstant<phi::GPUContext, T> set_zero;
...
@@ -110,13 +108,13 @@ static std::pair<Tensor, Tensor> ProposalForOneImage(
   }
   // 4. nms
-  Tensor keep_nms;
+  phi::DenseTensor keep_nms;
   NMS<T>(ctx, proposals_filter, keep_index, nms_thresh, &keep_nms);
   if (post_nms_top_n > 0 && post_nms_top_n < keep_nms.numel()) {
     keep_nms.Resize({post_nms_top_n});
   }
-  Tensor scores_nms, proposals_nms;
+  phi::DenseTensor scores_nms, proposals_nms;
   proposals_nms.mutable_data<T>({keep_nms.numel(), 4}, ctx.GetPlace());
   scores_nms.mutable_data<T>({keep_nms.numel(), 1}, ctx.GetPlace());
   phi::funcs::GPUGather<T>(ctx, proposals_filter, keep_nms, &proposals_nms);
...
@@ -171,7 +169,7 @@ class CUDAGenerateProposalsKernel : public framework::OpKernel<T> {
     int64_t h_bbox = bbox_dim[2];
     int64_t w_bbox = bbox_dim[3];
-    Tensor bbox_deltas_swap, scores_swap;
+    phi::DenseTensor bbox_deltas_swap, scores_swap;
     bbox_deltas_swap.mutable_data<T>({num, h_bbox, w_bbox, c_bbox}, dev_ctx.GetPlace());
     scores_swap.mutable_data<T>({num, h_score, w_score, c_score},
...
@@ -200,14 +198,14 @@ class CUDAGenerateProposalsKernel : public framework::OpKernel<T> {
     std::vector<int> tmp_num;
     for (int64_t i = 0; i < num; ++i) {
-      Tensor im_info_slice = im_info->Slice(i, i + 1);
-      Tensor bbox_deltas_slice = bbox_deltas_swap.Slice(i, i + 1);
-      Tensor scores_slice = scores_swap.Slice(i, i + 1);
+      phi::DenseTensor im_info_slice = im_info->Slice(i, i + 1);
+      phi::DenseTensor bbox_deltas_slice = bbox_deltas_swap.Slice(i, i + 1);
+      phi::DenseTensor scores_slice = scores_swap.Slice(i, i + 1);
       bbox_deltas_slice.Resize({h_bbox * w_bbox * c_bbox / 4, 4});
       scores_slice.Resize({h_score * w_score * c_score, 1});
-      std::pair<Tensor, Tensor> box_score_pair =
+      std::pair<phi::DenseTensor, phi::DenseTensor> box_score_pair =
          ProposalForOneImage<T>(dev_ctx,
                                 im_info_slice,
                                 anchors,
...
@@ -220,8 +218,8 @@ class CUDAGenerateProposalsKernel : public framework::OpKernel<T> {
                                 min_size,
                                 eta);
-      Tensor& proposals = box_score_pair.first;
-      Tensor& scores = box_score_pair.second;
+      phi::DenseTensor& proposals = box_score_pair.first;
+      phi::DenseTensor& scores = box_score_pair.second;
       memory::Copy(place,
                    rpn_rois_data + num_proposals * 4,
...
paddle/fluid/operators/detection/generate_proposals_v2_op.cc
...
@@ -29,8 +29,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
-using Tensor = phi::DenseTensor;
 class GenerateProposalsV2Op : public framework::OperatorWithKernel {
  public:
  using framework::OperatorWithKernel::OperatorWithKernel;
...
paddle/fluid/operators/detection/iou_similarity_op_mlu.cc
浏览文件 @
65420271
...
...
@@ -18,8 +18,6 @@ limitations under the License. */
namespace
paddle
{
namespace
operators
{
using
Tensor
=
phi
::
DenseTensor
;
template
<
typename
T
>
struct
IouFunction {
 public:
...
@@ -182,21 +180,21 @@ class IouSimilarityMLUKernel : public framework::OpKernel<T> {
     auto M = y->dims()[0];

     out->mutable_data<T>({N, M}, place);
-    Tensor xt(_type);
-    Tensor yt(_type);
+    phi::DenseTensor xt(_type);
+    phi::DenseTensor yt(_type);
     xt.mutable_data<T>({4, N}, place);
     yt.mutable_data<T>({4, M}, place);
     std::vector<int> vec_trans = {1, 0};
     F.Transpose(x, &xt, vec_trans);
     F.Transpose(y, &yt, vec_trans);
-    Tensor xmin1 = xt.Slice(0, 1);
-    Tensor ymin1 = xt.Slice(1, 2);
-    Tensor xmax1 = xt.Slice(2, 3);
-    Tensor ymax1 = xt.Slice(3, 4);
-    Tensor xmin2 = yt.Slice(0, 1);
-    Tensor ymin2 = yt.Slice(1, 2);
-    Tensor xmax2 = yt.Slice(2, 3);
-    Tensor ymax2 = yt.Slice(3, 4);
+    phi::DenseTensor xmin1 = xt.Slice(0, 1);
+    phi::DenseTensor ymin1 = xt.Slice(1, 2);
+    phi::DenseTensor xmax1 = xt.Slice(2, 3);
+    phi::DenseTensor ymax1 = xt.Slice(3, 4);
+    phi::DenseTensor xmin2 = yt.Slice(0, 1);
+    phi::DenseTensor ymin2 = yt.Slice(1, 2);
+    phi::DenseTensor xmax2 = yt.Slice(2, 3);
+    phi::DenseTensor ymax2 = yt.Slice(3, 4);
     xmin1.Resize({N, 1});
     ymin1.Resize({N, 1});
     xmax1.Resize({N, 1});
...
@@ -206,12 +204,12 @@ class IouSimilarityMLUKernel : public framework::OpKernel<T> {
     xmax2.Resize({1, M});
     ymax2.Resize({1, M});
-    Tensor w1(_type);
-    Tensor h1(_type);
-    Tensor w2(_type);
-    Tensor h2(_type);
-    Tensor area1(_type);
-    Tensor area2(_type);
+    phi::DenseTensor w1(_type);
+    phi::DenseTensor h1(_type);
+    phi::DenseTensor w2(_type);
+    phi::DenseTensor h2(_type);
+    phi::DenseTensor area1(_type);
+    phi::DenseTensor area2(_type);
     w1.mutable_data<T>({N, 1}, place);
     h1.mutable_data<T>({N, 1}, place);
     w2.mutable_data<T>({1, M}, place);
...
@@ -231,10 +229,10 @@ class IouSimilarityMLUKernel : public framework::OpKernel<T> {
     F.Mul(&w1, &h1, &area1);
     F.Mul(&w2, &h2, &area2);
-    Tensor inter_xmax(_type);
-    Tensor inter_ymax(_type);
-    Tensor inter_xmin(_type);
-    Tensor inter_ymin(_type);
+    phi::DenseTensor inter_xmax(_type);
+    phi::DenseTensor inter_ymax(_type);
+    phi::DenseTensor inter_xmin(_type);
+    phi::DenseTensor inter_ymin(_type);
     inter_xmax.mutable_data<T>({N, M}, place);
     inter_ymax.mutable_data<T>({N, M}, place);
     inter_xmin.mutable_data<T>({N, M}, place);
...
@@ -244,8 +242,8 @@ class IouSimilarityMLUKernel : public framework::OpKernel<T> {
     F.Maximum(&xmin1, &xmin2, &inter_xmin);
     F.Maximum(&ymin1, &ymin2, &inter_ymin);
-    Tensor inter_w(_type);
-    Tensor inter_h(_type);
+    phi::DenseTensor inter_w(_type);
+    phi::DenseTensor inter_h(_type);
     inter_w.mutable_data<T>({N, M}, place);
     inter_h.mutable_data<T>({N, M}, place);
     F.Sub(&inter_xmax, &inter_xmin, &inter_w);
...
@@ -255,14 +253,14 @@ class IouSimilarityMLUKernel : public framework::OpKernel<T> {
       F.Adds(&inter_w, 1.0f, &inter_w);
       F.Adds(&inter_h, 1.0f, &inter_h);
     }
-    Tensor zeros(_type);
+    phi::DenseTensor zeros(_type);
     zeros.mutable_data<T>({1}, place);
     FillMLUTensorWithHostValue<T>(ctx, static_cast<T>(0), &zeros);
     F.Maximum(&inter_w, &zeros, &inter_w);
     F.Maximum(&inter_h, &zeros, &inter_h);
     F.Mul(&inter_w, &inter_h, out);
-    Tensor union_area(_type);
+    phi::DenseTensor union_area(_type);
     union_area.mutable_data<T>({N, M}, place);
     F.Add(&area1, &area2, &union_area);
     F.Sub(&union_area, out, &union_area);
...
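For reference, the quantity these hunks assemble (and which the NPU kernel below repeats) is the usual pairwise IoU between the N boxes in x and the M boxes in y. In the kernel's variable names, a minimal sketch of the math (the +1 offset from F.Adds applies only when coordinates are not normalized):

w^\cap_{ij} = \max\bigl(0,\ \min(x^{\max}_i, x^{\max}_j) - \max(x^{\min}_i, x^{\min}_j)\bigr),\qquad
h^\cap_{ij} = \max\bigl(0,\ \min(y^{\max}_i, y^{\max}_j) - \max(y^{\min}_i, y^{\min}_j)\bigr)

\mathrm{out}_{ij} = \frac{w^\cap_{ij}\, h^\cap_{ij}}{\mathrm{area1}_i + \mathrm{area2}_j - w^\cap_{ij}\, h^\cap_{ij}}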
paddle/fluid/operators/detection/iou_similarity_op_npu.cc
...
@@ -18,8 +18,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename T>
 struct IouFunction {
  public:
...
@@ -108,21 +106,21 @@ class IouSimilarityNPUKernel : public framework::OpKernel<T> {
     auto M = y->dims()[0];

     out->mutable_data<T>({N, M}, place);
-    Tensor xt(_type);
-    Tensor yt(_type);
+    phi::DenseTensor xt(_type);
+    phi::DenseTensor yt(_type);
     xt.mutable_data<T>({4, N}, place);
     yt.mutable_data<T>({4, M}, place);
     std::vector<int> vec_trans = {1, 0};
     F.Transpose(x, &xt, vec_trans);
     F.Transpose(y, &yt, vec_trans);
-    Tensor xmin1 = xt.Slice(0, 1);
-    Tensor ymin1 = xt.Slice(1, 2);
-    Tensor xmax1 = xt.Slice(2, 3);
-    Tensor ymax1 = xt.Slice(3, 4);
-    Tensor xmin2 = yt.Slice(0, 1);
-    Tensor ymin2 = yt.Slice(1, 2);
-    Tensor xmax2 = yt.Slice(2, 3);
-    Tensor ymax2 = yt.Slice(3, 4);
+    phi::DenseTensor xmin1 = xt.Slice(0, 1);
+    phi::DenseTensor ymin1 = xt.Slice(1, 2);
+    phi::DenseTensor xmax1 = xt.Slice(2, 3);
+    phi::DenseTensor ymax1 = xt.Slice(3, 4);
+    phi::DenseTensor xmin2 = yt.Slice(0, 1);
+    phi::DenseTensor ymin2 = yt.Slice(1, 2);
+    phi::DenseTensor xmax2 = yt.Slice(2, 3);
+    phi::DenseTensor ymax2 = yt.Slice(3, 4);
     xmin1.Resize({N, 1});
     ymin1.Resize({N, 1});
     xmax1.Resize({N, 1});
...
@@ -132,12 +130,12 @@ class IouSimilarityNPUKernel : public framework::OpKernel<T> {
     xmax2.Resize({1, M});
     ymax2.Resize({1, M});
-    Tensor w1(_type);
-    Tensor h1(_type);
-    Tensor w2(_type);
-    Tensor h2(_type);
-    Tensor area1(_type);
-    Tensor area2(_type);
+    phi::DenseTensor w1(_type);
+    phi::DenseTensor h1(_type);
+    phi::DenseTensor w2(_type);
+    phi::DenseTensor h2(_type);
+    phi::DenseTensor area1(_type);
+    phi::DenseTensor area2(_type);
     w1.mutable_data<T>({N, 1}, place);
     h1.mutable_data<T>({N, 1}, place);
     w2.mutable_data<T>({1, M}, place);
...
@@ -157,10 +155,10 @@ class IouSimilarityNPUKernel : public framework::OpKernel<T> {
     F.Mul(&w1, &h1, &area1);
     F.Mul(&w2, &h2, &area2);
-    Tensor inter_xmax(_type);
-    Tensor inter_ymax(_type);
-    Tensor inter_xmin(_type);
-    Tensor inter_ymin(_type);
+    phi::DenseTensor inter_xmax(_type);
+    phi::DenseTensor inter_ymax(_type);
+    phi::DenseTensor inter_xmin(_type);
+    phi::DenseTensor inter_ymin(_type);
     inter_xmax.mutable_data<T>({N, M}, place);
     inter_ymax.mutable_data<T>({N, M}, place);
     inter_xmin.mutable_data<T>({N, M}, place);
...
@@ -170,8 +168,8 @@ class IouSimilarityNPUKernel : public framework::OpKernel<T> {
     F.Maximum(&xmin1, &xmin2, &inter_xmin);
     F.Maximum(&ymin1, &ymin2, &inter_ymin);
-    Tensor inter_w(_type);
-    Tensor inter_h(_type);
+    phi::DenseTensor inter_w(_type);
+    phi::DenseTensor inter_h(_type);
     inter_w.mutable_data<T>({N, M}, place);
     inter_h.mutable_data<T>({N, M}, place);
     F.Sub(&inter_xmax, &inter_xmin, &inter_w);
...
@@ -181,14 +179,14 @@ class IouSimilarityNPUKernel : public framework::OpKernel<T> {
       F.Adds(&inter_w, 1.0f, &inter_w);
       F.Adds(&inter_h, 1.0f, &inter_h);
     }
-    Tensor zeros(_type);
+    phi::DenseTensor zeros(_type);
     zeros.mutable_data<T>({1}, place);
     FillNpuTensorWithConstant<T>(&zeros, static_cast<T>(0));
     F.Maximum(&inter_w, &zeros, &inter_w);
     F.Maximum(&inter_h, &zeros, &inter_h);
     F.Mul(&inter_w, &inter_h, out);
-    Tensor union_area(_type);
+    phi::DenseTensor union_area(_type);
     union_area.mutable_data<T>({N, M}, place);
     F.Add(&area1, &area2, &union_area);
     F.Sub(&union_area, out, &union_area);
...
paddle/fluid/operators/detection/locality_aware_nms_op.cc
...
@@ -19,8 +19,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 class LocalityAwareNMSOp : public framework::OperatorWithKernel {
  public:
  using framework::OperatorWithKernel::OperatorWithKernel;
...
@@ -252,7 +250,7 @@ class LocalityAwareNMSKernel : public framework::OpKernel<T> {
     int num_det = 0;

     int64_t class_num = scores->dims()[0];
-    Tensor bbox_slice, score_slice;
+    phi::DenseTensor bbox_slice, score_slice;
     for (int64_t c = 0; c < class_num; ++c) {
       if (c == background_label) continue;
...
@@ -325,7 +323,7 @@ class LocalityAwareNMSKernel : public framework::OpKernel<T> {
     auto* bboxes_data = bboxes.data<T>();
     auto* odata = outs->data<T>();
     const T* sdata;
-    Tensor bbox;
+    phi::DenseTensor bbox;
     bbox.Resize({scores.dims()[0], box_size});
     int count = 0;
     for (const auto& it : selected_indices) {
...
@@ -370,7 +368,7 @@ class LocalityAwareNMSKernel : public framework::OpKernel<T> {
     int64_t box_dim = boxes.dims()[2];
     int64_t out_dim = box_dim + 2;
     int num_nmsed_out = 0;
-    Tensor boxes_slice, scores_slice;
+    phi::DenseTensor boxes_slice, scores_slice;
     int n = batch_size;
     for (int i = 0; i < n; ++i) {
       scores_slice = scores.Slice(i, i + 1);
...
@@ -407,7 +405,7 @@ class LocalityAwareNMSKernel : public framework::OpKernel<T> {
       int64_t s = batch_starts[i];
       int64_t e = batch_starts[i + 1];
       if (e > s) {
-        Tensor out = outs->Slice(s, e);
+        phi::DenseTensor out = outs->Slice(s, e);
         LocalityAwareNMSOutput(dev_ctx,
                                scores_slice,
                                boxes_slice,
...
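A recurring idiom in these NMS kernels is taking a per-class or per-batch view of a larger tensor with Slice and then reshaping it. A minimal sketch of that pattern follows, assuming (as these kernels do) that phi::DenseTensor::Slice returns a non-owning view over rows of the same buffer; the helper name and header path are illustrative, not part of the patch.

#include "paddle/phi/core/dense_tensor.h"

// Illustrative helper: take rows [s, e) of `outs` as a view and reinterpret
// its shape. Slice does not copy, so writes through the returned tensor land
// directly in `outs`.
static phi::DenseTensor RowsView(phi::DenseTensor* outs,
                                 int64_t s, int64_t e, int64_t out_dim) {
  phi::DenseTensor out = outs->Slice(s, e);  // shares the underlying buffer
  out.Resize({e - s, out_dim});              // numel stays the same
  return out;
}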
paddle/fluid/operators/detection/matrix_nms_op.cc
...
@@ -20,8 +20,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 class MatrixNMSOp : public framework::OperatorWithKernel {
  public:
  using framework::OperatorWithKernel::OperatorWithKernel;
...
paddle/fluid/operators/detection/multiclass_nms_op.cc
...
@@ -21,8 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 inline std::vector<size_t> GetNmsLodFromRoisNum(
     const phi::DenseTensor* rois_num) {
   std::vector<size_t> rois_lod;
...
@@ -228,7 +226,7 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
     int num_det = 0;

     int64_t class_num = scores_size == 3 ? scores.dims()[0] : scores.dims()[1];
-    Tensor bbox_slice, score_slice;
+    phi::DenseTensor bbox_slice, score_slice;
     for (int64_t c = 0; c < class_num; ++c) {
       if (c == background_label) continue;
       if (scores_size == 3) {
...
@@ -319,7 +317,7 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
     auto* bboxes_data = bboxes.data<T>();
     auto* odata = outs->data<T>();
     const T* sdata;
-    Tensor bbox;
+    phi::DenseTensor bbox;
     bbox.Resize({scores.dims()[0], box_size});
     int count = 0;
     for (const auto& it : selected_indices) {
...
@@ -373,7 +371,7 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
     int64_t box_dim = boxes->dims()[2];
     int64_t out_dim = box_dim + 2;
     int num_nmsed_out = 0;
-    Tensor boxes_slice, scores_slice;
+    phi::DenseTensor boxes_slice, scores_slice;
     int n = 0;
     if (has_roisnum) {
       n = score_size == 3 ? batch_size : rois_num->numel();
...
@@ -449,7 +447,7 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
       int64_t s = batch_starts[i];
       int64_t e = batch_starts[i + 1];
       if (e > s) {
-        Tensor out = outs->Slice(s, e);
+        phi::DenseTensor out = outs->Slice(s, e);
         if (return_index) {
           int* output_idx =
               index->mutable_data<int>({num_kept, 1}, ctx.GetPlace());
...
paddle/fluid/operators/detection/polygon_box_transform_op.cc
...
@@ -17,8 +17,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class PolygonBoxTransformCPUKernel : public framework::OpKernel<T> {
  public:
...
paddle/fluid/operators/detection/polygon_box_transform_op.cu
...
@@ -19,7 +19,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
 using phi::PADDLE_CUDA_NUM_THREADS;

 #define CUDA_BLOCK_SIZE 16
...
paddle/fluid/operators/detection/prior_box_op_npu.cc
...
@@ -18,8 +18,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class PriorBoxNPUKernel : public framework::OpKernel<T> {
  public:
...
@@ -50,7 +48,7 @@ class PriorBoxNPUKernel : public framework::OpKernel<T> {
     auto place = ctx.GetPlace();

-    Tensor out(input->type());
+    phi::DenseTensor out(input->type());
     auto out_dims = phi::vectorize(boxes->dims());
     out_dims.insert(out_dims.begin(), 2);
     out.Resize(phi::make_ddim(out_dims));
...
@@ -75,8 +73,8 @@ class PriorBoxNPUKernel : public framework::OpKernel<T> {
     runner.Run(stream);

     out.Resize(phi::make_ddim({out.numel()}));
-    Tensor out_boxes = out.Slice(0, boxes->numel());
-    Tensor out_variances = out.Slice(boxes->numel(), out.numel());
+    phi::DenseTensor out_boxes = out.Slice(0, boxes->numel());
+    phi::DenseTensor out_variances = out.Slice(boxes->numel(), out.numel());
     out_boxes.Resize(boxes->dims());
     out_variances.Resize(variances->dims());
...
paddle/fluid/operators/detection/retinanet_detection_output_op.cc
...
@@ -18,8 +18,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 class RetinanetDetectionOutputOp : public framework::OperatorWithKernel {
  public:
  using framework::OperatorWithKernel::OperatorWithKernel;
...
@@ -409,9 +407,9 @@ class RetinanetDetectionOutputKernel : public framework::OpKernel<T> {
   }

   void RetinanetDetectionOutput(const framework::ExecutionContext& ctx,
-                                const std::vector<Tensor>& scores,
-                                const std::vector<Tensor>& bboxes,
-                                const std::vector<Tensor>& anchors,
+                                const std::vector<phi::DenseTensor>& scores,
+                                const std::vector<phi::DenseTensor>& bboxes,
+                                const std::vector<phi::DenseTensor>& anchors,
                                 const phi::DenseTensor& im_info,
                                 std::vector<std::vector<T>>* nmsed_out,
                                 int* num_nmsed_out) const {
...
@@ -425,11 +423,11 @@ class RetinanetDetectionOutputKernel : public framework::OpKernel<T> {
     std::map<int, std::vector<std::vector<T>>> preds;
     for (size_t l = 0; l < scores.size(); ++l) {
       // Fetch per level score
-      Tensor scores_per_level = scores[l];
+      phi::DenseTensor scores_per_level = scores[l];
       // Fetch per level bbox
-      Tensor bboxes_per_level = bboxes[l];
+      phi::DenseTensor bboxes_per_level = bboxes[l];
       // Fetch per level anchor
-      Tensor anchors_per_level = anchors[l];
+      phi::DenseTensor anchors_per_level = anchors[l];
       int64_t scores_num = scores_per_level.numel();
       int64_t bboxes_num = bboxes_per_level.numel();
...
@@ -492,9 +490,9 @@ class RetinanetDetectionOutputKernel : public framework::OpKernel<T> {
     auto* im_info = ctx.Input<phi::DenseTensor>("ImInfo");
     auto* outs = ctx.Output<phi::DenseTensor>("Out");

-    std::vector<Tensor> boxes_list(boxes.size());
-    std::vector<Tensor> scores_list(scores.size());
-    std::vector<Tensor> anchors_list(anchors.size());
+    std::vector<phi::DenseTensor> boxes_list(boxes.size());
+    std::vector<phi::DenseTensor> scores_list(scores.size());
+    std::vector<phi::DenseTensor> anchors_list(anchors.size());
     for (size_t j = 0; j < boxes_list.size(); ++j) {
       boxes_list[j] = *boxes[j];
       scores_list[j] = *scores[j];
...
@@ -512,8 +510,8 @@ class RetinanetDetectionOutputKernel : public framework::OpKernel<T> {
     std::vector<size_t> batch_starts = {0};
     for (int i = 0; i < batch_size; ++i) {
       int num_nmsed_out = 0;
-      std::vector<Tensor> box_per_batch_list(boxes_list.size());
-      std::vector<Tensor> score_per_batch_list(scores_list.size());
+      std::vector<phi::DenseTensor> box_per_batch_list(boxes_list.size());
+      std::vector<phi::DenseTensor> score_per_batch_list(scores_list.size());
       for (size_t j = 0; j < boxes_list.size(); ++j) {
         const auto& score_dims = scores_list[j].dims();
         score_per_batch_list[j] = scores_list[j].Slice(i, i + 1);
...
@@ -521,7 +519,7 @@ class RetinanetDetectionOutputKernel : public framework::OpKernel<T> {
         box_per_batch_list[j] = boxes_list[j].Slice(i, i + 1);
         box_per_batch_list[j].Resize({score_dims[1], box_dim});
       }
-      Tensor im_info_slice = im_info->Slice(i, i + 1);
+      phi::DenseTensor im_info_slice = im_info->Slice(i, i + 1);
       std::vector<std::vector<T>> nmsed_out;
       RetinanetDetectionOutput(ctx,
...
@@ -544,7 +542,7 @@ class RetinanetDetectionOutputKernel : public framework::OpKernel<T> {
       int64_t s = batch_starts[i];
       int64_t e = batch_starts[i + 1];
       if (e > s) {
-        Tensor out = outs->Slice(s, e);
+        phi::DenseTensor out = outs->Slice(s, e);
         MultiClassOutput(dev_ctx, all_nmsed_out[i], &out);
       }
     }
...
@@ -563,7 +561,8 @@ class RetinanetDetectionOutputOpMaker
   void Make() override {
     AddInput("BBoxes",
              "(List) A list of tensors from multiple FPN levels. Each "
-             "element is a 3-D Tensor with shape [N, Mi, 4] represents the "
+             "element is a 3-D phi::DenseTensor with shape [N, Mi, 4] "
+             "represents the "
              "predicted locations of Mi bounding boxes, N is the batch size. "
              "Mi is the number of bounding boxes from i-th FPN level. Each "
             "bounding box has four coordinate values and the layout is "
...
@@ -571,15 +570,17 @@ class RetinanetDetectionOutputOpMaker
         .AsDuplicable();
     AddInput("Scores",
              "(List) A list of tensors from multiple FPN levels. Each "
-             "element is a 3-D Tensor with shape [N, Mi, C] represents the "
+             "element is a 3-D phi::DenseTensor with shape [N, Mi, C] "
+             "represents the "
              "predicted confidence from its FPN level. N is the batch size, "
              "C is the class number (excluding background), Mi is the number "
              "of bounding boxes from i-th FPN level. For each bounding box, "
              "there are total C scores.")
         .AsDuplicable();
-    AddInput("Anchors",
+    AddInput("Anchors",
              "(List) A list of tensors from multiple FPN levels. Each"
-             "element is a 2-D Tensor with shape [Mi, 4] represents the "
+             "element is a 2-D phi::DenseTensor with shape [Mi, 4] represents the "
              "locations of Mi anchor boxes from i-th FPN level. Each "
              "bounding box has four coordinate values and the layout is "
              "[xmin, ymin, xmax, ymax].")
...
paddle/fluid/operators/detection/roi_perspective_transform_op.cc
...
@@ -22,8 +22,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename T>
 bool GT_E(T a, T b) {
   return (a > b) || fabs(a - b) < 1e-4;
...
@@ -600,7 +598,7 @@ class ROIPerspectiveTransformOpMaker
  public:
   void Make() override {
     AddInput("X",
-             "(Tensor), "
+             "(phi::DenseTensor), "
              "the input of ROIPerspectiveTransformOp. "
              "The format of input tensor is NCHW. Where N is batch size, "
              "C is the number of input channels, "
...
@@ -617,28 +615,28 @@ class ROIPerspectiveTransformOpMaker
              "(x4, y4) is the bottom left coordinates.");
     AddOutput("Out",
-              "(Tensor), "
+              "(phi::DenseTensor), "
               "The output of ROIPerspectiveTransformOp is a 4-D tensor with shape "
               "(num_rois, channels, transformed_h, transformed_w).");
     AddOutput("Mask",
-              "(Tensor), "
+              "(phi::DenseTensor), "
               "The output mask of ROIPerspectiveTransformOp is a 4-D tensor "
               "with shape "
               "(num_rois, 1, transformed_h, transformed_w).");
     AddOutput("TransformMatrix",
-              "(Tensor), "
+              "(phi::DenseTensor), "
               "The output transform matrix of ROIPerspectiveTransformOp is a "
               "1-D tensor with shape "
               "(num_rois, 9).");
     AddOutput("Out2InIdx",
-              "(Tensor), "
+              "(phi::DenseTensor), "
               "An intermediate tensor used to map indexes of input feature map "
               "and indexes of output feature map."
               "The shape of the tensor is [out_size, 4] and out_size is the "
               "number of elements in output feature map.")
         .AsIntermediate();
     AddOutput("Out2InWeights",
-              "(Tensor), "
+              "(phi::DenseTensor), "
               "An intermediate tensor used to record the weights of bilinear "
               "interpolatein for each element in output. The shape of the "
               "tensor is [out_size, 4] and out_size is the number of elements "
...
paddle/fluid/operators/detection/rpn_target_assign_op.cc
...
@@ -21,7 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
 template <typename T,
           int MajorType = Eigen::RowMajor,
           typename IndexType = Eigen::DenseIndex>
...
@@ -113,7 +112,8 @@ void AppendRpns(phi::DenseTensor* out,
 }

 template <typename T>
-std::vector<Tensor> FilterStraddleAnchor(const phi::CPUContext& context,
+std::vector<phi::DenseTensor> FilterStraddleAnchor(
+    const phi::CPUContext& context,
     const phi::DenseTensor* anchor,
     const float rpn_straddle_thresh,
     T im_height,
...
@@ -138,23 +138,23 @@ std::vector<Tensor> FilterStraddleAnchor(const phi::CPUContext& context,
     }
   }
   int inside_num = inds_inside.size();
-  Tensor inds_inside_t;
+  phi::DenseTensor inds_inside_t;
   int* inds_inside_data =
       inds_inside_t.mutable_data<int>({inside_num}, context.GetPlace());
   std::copy(inds_inside.begin(), inds_inside.end(), inds_inside_data);
-  Tensor inside_anchor_t;
+  phi::DenseTensor inside_anchor_t;
   T* inside_anchor_data =
       inside_anchor_t.mutable_data<T>({inside_num, 4}, context.GetPlace());
   Gather<T>(
       anchor->data<T>(), 4, inds_inside_data, inside_num, inside_anchor_data);
-  std::vector<Tensor> res;
+  std::vector<phi::DenseTensor> res;
   res.emplace_back(inds_inside_t);
   res.emplace_back(inside_anchor_t);
   return res;
 }

 template <typename T>
-Tensor FilterCrowdGt(const phi::CPUContext& context,
+phi::DenseTensor FilterCrowdGt(const phi::CPUContext& context,
                      phi::DenseTensor* gt_boxes,
                      phi::DenseTensor* is_crowd) {
   int gt_num = gt_boxes->dims()[0];
...
@@ -166,7 +166,7 @@ Tensor FilterCrowdGt(const phi::CPUContext& context,
     }
   }
   int ncrowd_num = not_crowd_inds.size();
-  Tensor ncrowd_gt_boxes;
+  phi::DenseTensor ncrowd_gt_boxes;
   T* ncrowd_gt_boxes_data =
       ncrowd_gt_boxes.mutable_data<T>({ncrowd_num, 4}, context.GetPlace());
   Gather<T>(gt_boxes->data<T>(),
...
@@ -300,7 +300,7 @@ void ScoreAssign(const T* anchor_by_gt_overlap_data,
 }

 template <typename T>
-std::vector<Tensor> SampleRpnFgBgGt(
+std::vector<phi::DenseTensor> SampleRpnFgBgGt(
     const phi::CPUContext& ctx,
     const phi::DenseTensor& anchor_by_gt_overlap,
     const int rpn_batch_size_per_im,
...
@@ -322,7 +322,7 @@ std::vector<Tensor> SampleRpnFgBgGt(
   // Calculate the max IoU between anchors and gt boxes
   // Map from anchor to gt box that has highest overlap
   auto place = ctx.GetPlace();
-  Tensor anchor_to_gt_max, anchor_to_gt_argmax, gt_to_anchor_max;
+  phi::DenseTensor anchor_to_gt_max, anchor_to_gt_argmax, gt_to_anchor_max;
   anchor_to_gt_max.mutable_data<T>({anchor_num}, place);
   int* argmax = anchor_to_gt_argmax.mutable_data<int>({anchor_num}, place);
   gt_to_anchor_max.mutable_data<T>({gt_num}, place);
...
@@ -365,7 +365,8 @@ std::vector<Tensor> SampleRpnFgBgGt(
   for (int i = 0; i < fg_fake_num; ++i) {
     gt_inds.emplace_back(argmax[fg_fake[i]]);
   }
-  Tensor loc_index_t, score_index_t, tgt_lbl_t, gt_inds_t, bbox_inside_weight_t;
+  phi::DenseTensor loc_index_t, score_index_t, tgt_lbl_t, gt_inds_t,
+      bbox_inside_weight_t;
   int* loc_index_data = loc_index_t.mutable_data<int>({fg_fake_num}, place);
   int* score_index_data =
       score_index_t.mutable_data<int>({fg_num + bg_num}, place);
...
@@ -381,7 +382,7 @@ std::vector<Tensor> SampleRpnFgBgGt(
   std::copy(bbox_inside_weight.begin(),
             bbox_inside_weight.end(),
             bbox_inside_weight_data);
-  std::vector<Tensor> loc_score_tgtlbl_gt;
+  std::vector<phi::DenseTensor> loc_score_tgtlbl_gt;
   loc_score_tgtlbl_gt.emplace_back(loc_index_t);
   loc_score_tgtlbl_gt.emplace_back(score_index_t);
   loc_score_tgtlbl_gt.emplace_back(tgt_lbl_t);
...
@@ -455,30 +456,30 @@ class RpnTargetAssignKernel : public framework::OpKernel<T> {
     auto gt_boxes_lod = gt_boxes->lod().back();
     auto is_crowd_lod = is_crowd->lod().back();
     for (int i = 0; i < batch_num; ++i) {
-      Tensor gt_boxes_slice =
+      phi::DenseTensor gt_boxes_slice =
           gt_boxes->Slice(gt_boxes_lod[i], gt_boxes_lod[i + 1]);
-      Tensor is_crowd_slice =
+      phi::DenseTensor is_crowd_slice =
           is_crowd->Slice(is_crowd_lod[i], is_crowd_lod[i + 1]);
-      Tensor im_info_slice = im_info->Slice(i, i + 1);
+      phi::DenseTensor im_info_slice = im_info->Slice(i, i + 1);
       auto* im_info_data = im_info_slice.data<T>();
       auto im_height = im_info_data[0];
       auto im_width = im_info_data[1];
       auto im_scale = im_info_data[2];

       // Filter straddle anchor
-      std::vector<Tensor> filter_output = FilterStraddleAnchor<T>(
+      std::vector<phi::DenseTensor> filter_output = FilterStraddleAnchor<T>(
           dev_ctx, anchor, rpn_straddle_thresh, im_height, im_width);
-      Tensor inds_inside = filter_output[0];
-      Tensor inside_anchor = filter_output[1];
+      phi::DenseTensor inds_inside = filter_output[0];
+      phi::DenseTensor inside_anchor = filter_output[1];

       // Filter crowd gt
-      Tensor ncrowd_gt_boxes =
+      phi::DenseTensor ncrowd_gt_boxes =
           FilterCrowdGt<T>(dev_ctx, &gt_boxes_slice, &is_crowd_slice);
       auto ncrowd_gt_boxes_et =
           framework::EigenTensor<T, 2>::From(ncrowd_gt_boxes);
       ncrowd_gt_boxes_et = ncrowd_gt_boxes_et * im_scale;

-      Tensor anchor_by_gt_overlap;
+      phi::DenseTensor anchor_by_gt_overlap;
       anchor_by_gt_overlap.mutable_data<T>(
           {inside_anchor.dims()[0], ncrowd_gt_boxes.dims()[0]}, place);
       BboxOverlaps<T>(inside_anchor, ncrowd_gt_boxes, &anchor_by_gt_overlap);
...
@@ -492,16 +493,16 @@ class RpnTargetAssignKernel : public framework::OpKernel<T> {
           engine,
           use_random);

-      Tensor sampled_loc_index = loc_score_tgtlbl_gt[0];
-      Tensor sampled_score_index = loc_score_tgtlbl_gt[1];
-      Tensor sampled_tgtlbl = loc_score_tgtlbl_gt[2];
-      Tensor sampled_gt_index = loc_score_tgtlbl_gt[3];
-      Tensor sampled_bbox_inside_weight = loc_score_tgtlbl_gt[4];
+      phi::DenseTensor sampled_loc_index = loc_score_tgtlbl_gt[0];
+      phi::DenseTensor sampled_score_index = loc_score_tgtlbl_gt[1];
+      phi::DenseTensor sampled_tgtlbl = loc_score_tgtlbl_gt[2];
+      phi::DenseTensor sampled_gt_index = loc_score_tgtlbl_gt[3];
+      phi::DenseTensor sampled_bbox_inside_weight = loc_score_tgtlbl_gt[4];

       int loc_num = sampled_loc_index.dims()[0];
       int score_num = sampled_score_index.dims()[0];
       // unmap to all anchor
-      Tensor sampled_loc_index_unmap, sampled_score_index_unmap;
+      phi::DenseTensor sampled_loc_index_unmap, sampled_score_index_unmap;
       sampled_loc_index_unmap.mutable_data<int>({loc_num}, place);
       sampled_score_index_unmap.mutable_data<int>({score_num}, place);
       Gather<int>(inds_inside.data<int>(),
...
@@ -516,7 +517,7 @@ class RpnTargetAssignKernel : public framework::OpKernel<T> {
                   sampled_score_index_unmap.data<int>());

       // get target bbox deltas
-      Tensor sampled_anchor, sampled_gt, sampled_tgt_bbox;
+      phi::DenseTensor sampled_anchor, sampled_gt, sampled_tgt_bbox;
       auto* sampled_anchor_data =
           sampled_anchor.mutable_data<T>({loc_num, 4}, place);
       auto* sampled_gt_data = sampled_gt.mutable_data<T>({loc_num, 4}, place);
...
@@ -859,7 +860,8 @@ class RetinanetTargetAssignOp : public framework::OperatorWithKernel {
 };

 template <typename T>
-std::vector<Tensor> FilterCrowdGtBoxLabel(const phi::CPUContext& context,
+std::vector<phi::DenseTensor> FilterCrowdGtBoxLabel(
+    const phi::CPUContext& context,
     phi::DenseTensor* gt_boxes,
     phi::DenseTensor* gt_labels,
     phi::DenseTensor* is_crowd) {
...
@@ -872,7 +874,7 @@ std::vector<Tensor> FilterCrowdGtBoxLabel(const phi::CPUContext& context,
     }
   }
   int ncrowd_num = not_crowd_inds.size();
-  Tensor ncrowd_gt_boxes, ncrowd_gt_labels;
+  phi::DenseTensor ncrowd_gt_boxes, ncrowd_gt_labels;
   T* ncrowd_gt_boxes_data =
       ncrowd_gt_boxes.mutable_data<T>({ncrowd_num, 4}, context.GetPlace());
   int* ncrowd_gt_labels_data =
...
@@ -887,14 +889,15 @@ std::vector<Tensor> FilterCrowdGtBoxLabel(const phi::CPUContext& context,
             not_crowd_inds.data(),
             ncrowd_num,
             ncrowd_gt_labels_data);
-  std::vector<Tensor> res;
+  std::vector<phi::DenseTensor> res;
   res.emplace_back(ncrowd_gt_boxes);
   res.emplace_back(ncrowd_gt_labels);
   return res;
 }

 template <typename T>
-std::vector<Tensor> GetAllFgBgGt(const phi::CPUContext& ctx,
+std::vector<phi::DenseTensor> GetAllFgBgGt(
+    const phi::CPUContext& ctx,
     const phi::DenseTensor& anchor_by_gt_overlap,
     const phi::DenseTensor& ncrowd_gt_labels,
     const float positive_overlap,
...
@@ -913,7 +916,7 @@ std::vector<Tensor> GetAllFgBgGt(const phi::CPUContext& ctx,
   // Calculate the max IoU between anchors and gt boxes
   // Map from anchor to gt box that has highest overlap
   auto place = ctx.GetPlace();
-  Tensor anchor_to_gt_max, anchor_to_gt_argmax, gt_to_anchor_max;
+  phi::DenseTensor anchor_to_gt_max, anchor_to_gt_argmax, gt_to_anchor_max;
   anchor_to_gt_max.mutable_data<T>({anchor_num}, place);
   int* argmax = anchor_to_gt_argmax.mutable_data<int>({anchor_num}, place);
   gt_to_anchor_max.mutable_data<T>({gt_num}, place);
...
@@ -961,8 +964,9 @@ std::vector<Tensor> GetAllFgBgGt(const phi::CPUContext& ctx,
     gt_inds.emplace_back(argmax[fg_fake[i]]);
   }

-  Tensor loc_index_t, score_index_t, tgt_lbl_t, gt_inds_t, bbox_inside_weight_t;
-  Tensor fg_num_t;
+  phi::DenseTensor loc_index_t, score_index_t, tgt_lbl_t, gt_inds_t,
+      bbox_inside_weight_t;
+  phi::DenseTensor fg_num_t;
   int* loc_index_data = loc_index_t.mutable_data<int>({fg_fake_num}, place);
   int* score_index_data =
       score_index_t.mutable_data<int>({fg_num + bg_num}, place);
...
@@ -980,7 +984,7 @@ std::vector<Tensor> GetAllFgBgGt(const phi::CPUContext& ctx,
             bbox_inside_weight.end(),
             bbox_inside_weight_data);
   fg_num_data[0] = fg_fake.size() + 1;
-  std::vector<Tensor> loc_score_tgtlbl_gt;
+  std::vector<phi::DenseTensor> loc_score_tgtlbl_gt;
   loc_score_tgtlbl_gt.emplace_back(loc_index_t);
   loc_score_tgtlbl_gt.emplace_back(score_index_t);
   loc_score_tgtlbl_gt.emplace_back(tgt_lbl_t);
...
@@ -1065,35 +1069,35 @@ class RetinanetTargetAssignKernel : public framework::OpKernel<T> {
     auto gt_labels_lod = gt_labels->lod().back();
     auto is_crowd_lod = is_crowd->lod().back();
     for (int i = 0; i < batch_num; ++i) {
-      Tensor gt_boxes_slice =
+      phi::DenseTensor gt_boxes_slice =
          gt_boxes->Slice(gt_boxes_lod[i], gt_boxes_lod[i + 1]);
-      Tensor gt_labels_slice =
+      phi::DenseTensor gt_labels_slice =
          gt_labels->Slice(gt_labels_lod[i], gt_labels_lod[i + 1]);
-      Tensor is_crowd_slice =
+      phi::DenseTensor is_crowd_slice =
          is_crowd->Slice(is_crowd_lod[i], is_crowd_lod[i + 1]);
-      Tensor im_info_slice = im_info->Slice(i, i + 1);
+      phi::DenseTensor im_info_slice = im_info->Slice(i, i + 1);
       auto* im_info_data = im_info_slice.data<T>();
       auto im_height = im_info_data[0];
       auto im_width = im_info_data[1];
       auto im_scale = im_info_data[2];

       // Filter straddle anchor
-      std::vector<Tensor> filter_output =
+      std::vector<phi::DenseTensor> filter_output =
          FilterStraddleAnchor<T>(dev_ctx, anchor, -1, im_height, im_width);
-      Tensor inds_inside = filter_output[0];
-      Tensor inside_anchor = filter_output[1];
+      phi::DenseTensor inds_inside = filter_output[0];
+      phi::DenseTensor inside_anchor = filter_output[1];

       // Filter crowd gt
-      std::vector<Tensor> ncrowd_output = FilterCrowdGtBoxLabel<T>(
+      std::vector<phi::DenseTensor> ncrowd_output = FilterCrowdGtBoxLabel<T>(
          dev_ctx, &gt_boxes_slice, &gt_labels_slice, &is_crowd_slice);
-      Tensor ncrowd_gt_boxes = ncrowd_output[0];
-      Tensor ncrowd_gt_labels = ncrowd_output[1];
+      phi::DenseTensor ncrowd_gt_boxes = ncrowd_output[0];
+      phi::DenseTensor ncrowd_gt_labels = ncrowd_output[1];

       auto ncrowd_gt_boxes_et =
          framework::EigenTensor<T, 2>::From(ncrowd_gt_boxes);
       ncrowd_gt_boxes_et = ncrowd_gt_boxes_et * im_scale;

-      Tensor anchor_by_gt_overlap;
+      phi::DenseTensor anchor_by_gt_overlap;
       anchor_by_gt_overlap.mutable_data<T>(
          {inside_anchor.dims()[0], ncrowd_gt_boxes.dims()[0]}, place);
       BboxOverlaps<T>(inside_anchor, ncrowd_gt_boxes, &anchor_by_gt_overlap);
...
@@ -1105,17 +1109,17 @@ class RetinanetTargetAssignKernel : public framework::OpKernel<T> {
          negative_overlap,
          engine);

-      Tensor sampled_loc_index = loc_score_tgtlbl_gt[0];
-      Tensor sampled_score_index = loc_score_tgtlbl_gt[1];
-      Tensor sampled_tgtlbl = loc_score_tgtlbl_gt[2];
-      Tensor sampled_gt_index = loc_score_tgtlbl_gt[3];
-      Tensor sampled_bbox_inside_weight = loc_score_tgtlbl_gt[4];
-      Tensor sampled_fg_num = loc_score_tgtlbl_gt[5];
+      phi::DenseTensor sampled_loc_index = loc_score_tgtlbl_gt[0];
+      phi::DenseTensor sampled_score_index = loc_score_tgtlbl_gt[1];
+      phi::DenseTensor sampled_tgtlbl = loc_score_tgtlbl_gt[2];
+      phi::DenseTensor sampled_gt_index = loc_score_tgtlbl_gt[3];
+      phi::DenseTensor sampled_bbox_inside_weight = loc_score_tgtlbl_gt[4];
+      phi::DenseTensor sampled_fg_num = loc_score_tgtlbl_gt[5];

       int loc_num = sampled_loc_index.dims()[0];
       int score_num = sampled_score_index.dims()[0];
       // unmap to all anchor
-      Tensor sampled_loc_index_unmap, sampled_score_index_unmap;
+      phi::DenseTensor sampled_loc_index_unmap, sampled_score_index_unmap;
       sampled_loc_index_unmap.mutable_data<int>({loc_num}, place);
       sampled_score_index_unmap.mutable_data<int>({score_num}, place);
       Gather<int>(inds_inside.data<int>(),
...
@@ -1130,7 +1134,7 @@ class RetinanetTargetAssignKernel : public framework::OpKernel<T> {
                   sampled_score_index_unmap.data<int>());

       // get target bbox deltas
-      Tensor sampled_anchor, sampled_gt, sampled_tgt_bbox;
+      phi::DenseTensor sampled_anchor, sampled_gt, sampled_tgt_bbox;
       auto* sampled_anchor_data =
          sampled_anchor.mutable_data<T>({loc_num, 4}, place);
       auto* sampled_gt_data = sampled_gt.mutable_data<T>({loc_num, 4}, place);
...
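The comments repeated in this file ("Calculate the max IoU between anchors and gt boxes", "Map from anchor to gt box that has highest overlap") correspond to simple row and column reductions over the anchor_by_gt_overlap matrix produced by BboxOverlaps; in the kernel's own variable names, roughly:

\text{anchor\_to\_gt\_max}_i = \max_j \mathrm{IoU}(a_i, g_j),\qquad
\text{anchor\_to\_gt\_argmax}_i = \arg\max_j \mathrm{IoU}(a_i, g_j),\qquad
\text{gt\_to\_anchor\_max}_j = \max_i \mathrm{IoU}(a_i, g_j)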
paddle/fluid/operators/detection/sigmoid_focal_loss_op.cu
...
@@ -19,8 +19,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 static constexpr int kNumCUDAThreads = 512;
 static constexpr int kNumMaxinumNumBlocks = 4096;
...
@@ -123,10 +121,10 @@ template <typename DeviceContext, typename T>
 class GPUSigmoidFocalLossKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
-    const Tensor* X = context.Input<phi::DenseTensor>("X");
-    const Tensor* Labels = context.Input<phi::DenseTensor>("Label");
-    const Tensor* FgNum = context.Input<phi::DenseTensor>("FgNum");
-    Tensor* Out = context.Output<phi::DenseTensor>("Out");
+    const phi::DenseTensor* X = context.Input<phi::DenseTensor>("X");
+    const phi::DenseTensor* Labels = context.Input<phi::DenseTensor>("Label");
+    const phi::DenseTensor* FgNum = context.Input<phi::DenseTensor>("FgNum");
+    phi::DenseTensor* Out = context.Output<phi::DenseTensor>("Out");
     T gamma = static_cast<T>(context.Attr<float>("gamma"));
     T alpha = static_cast<T>(context.Attr<float>("alpha"));
     auto x_dims = X->dims();
...
@@ -154,12 +152,13 @@ template <typename DeviceContext, typename T>
 class GPUSigmoidFocalLossGradKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
-    const Tensor* X = context.Input<phi::DenseTensor>("X");
-    const Tensor* Labels = context.Input<phi::DenseTensor>("Label");
-    const Tensor* FgNum = context.Input<phi::DenseTensor>("FgNum");
-    const Tensor* dOut =
+    const phi::DenseTensor* X = context.Input<phi::DenseTensor>("X");
+    const phi::DenseTensor* Labels = context.Input<phi::DenseTensor>("Label");
+    const phi::DenseTensor* FgNum = context.Input<phi::DenseTensor>("FgNum");
+    const phi::DenseTensor* dOut =
         context.Input<phi::DenseTensor>(framework::GradVarName("Out"));
-    Tensor* dX = context.Output<phi::DenseTensor>(framework::GradVarName("X"));
+    phi::DenseTensor* dX =
+        context.Output<phi::DenseTensor>(framework::GradVarName("X"));
     auto dx_data = dX->mutable_data<T>(context.GetPlace());
     T gamma = static_cast<T>(context.Attr<float>("gamma"));
     T alpha = static_cast<T>(context.Attr<float>("alpha"));
...
paddle/fluid/operators/detection/sigmoid_focal_loss_op.h
...
@@ -22,16 +22,14 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class SigmoidFocalLossKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
-    const Tensor* X = context.Input<phi::DenseTensor>("X");
-    const Tensor* Labels = context.Input<phi::DenseTensor>("Label");
-    const Tensor* FgNum = context.Input<phi::DenseTensor>("FgNum");
-    Tensor* Out = context.Output<phi::DenseTensor>("Out");
+    const phi::DenseTensor* X = context.Input<phi::DenseTensor>("X");
+    const phi::DenseTensor* Labels = context.Input<phi::DenseTensor>("Label");
+    const phi::DenseTensor* FgNum = context.Input<phi::DenseTensor>("FgNum");
+    phi::DenseTensor* Out = context.Output<phi::DenseTensor>("Out");
     T gamma = static_cast<T>(context.Attr<float>("gamma"));
     T alpha = static_cast<T>(context.Attr<float>("alpha"));
     auto out_data = Out->mutable_data<T>(context.GetPlace());
...
@@ -79,12 +77,13 @@ template <typename DeviceContext, typename T>
 class SigmoidFocalLossGradKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
-    const Tensor* X = context.Input<phi::DenseTensor>("X");
-    const Tensor* Labels = context.Input<phi::DenseTensor>("Label");
-    const Tensor* FgNum = context.Input<phi::DenseTensor>("FgNum");
-    const Tensor* dOut =
+    const phi::DenseTensor* X = context.Input<phi::DenseTensor>("X");
+    const phi::DenseTensor* Labels = context.Input<phi::DenseTensor>("Label");
+    const phi::DenseTensor* FgNum = context.Input<phi::DenseTensor>("FgNum");
+    const phi::DenseTensor* dOut =
         context.Input<phi::DenseTensor>(framework::GradVarName("Out"));
-    Tensor* dX = context.Output<phi::DenseTensor>(framework::GradVarName("X"));
+    phi::DenseTensor* dX =
+        context.Output<phi::DenseTensor>(framework::GradVarName("X"));
     auto dx_data = dX->mutable_data<T>(context.GetPlace());
     T gamma = static_cast<T>(context.Attr<float>("gamma"));
     T alpha = static_cast<T>(context.Attr<float>("alpha"));
...
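For orientation only (not part of the diff): with p = sigmoid(x) and the gamma/alpha attributes read above, the per-element quantity these kernels compute is, up to the exact label handling and the normalization by the foreground count FgNum, the standard sigmoid focal loss:

\mathrm{FL}(p, y) \;=\; -\,\alpha\,(1-p)^{\gamma}\, y \,\log p \;-\; (1-\alpha)\, p^{\gamma}\,(1-y)\,\log(1-p)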
paddle/fluid/operators/detection/yolo_box_op_mlu.cc
...
@@ -77,7 +77,7 @@ class YoloBoxMLUKernel : public framework::OpKernel<T> {
     MLUOpTensorDesc x_desc(*x, MLUOP_LAYOUT_ARRAY, ToMluOpDataType<T>());
     MLUOpTensorDesc img_size_desc(
         *img_size, MLUOP_LAYOUT_ARRAY, ToMluOpDataType<int32_t>());
-    Tensor anchors_temp(framework::TransToPhiDataType(VT::INT32));
+    phi::DenseTensor anchors_temp(framework::TransToPhiDataType(VT::INT32));
     anchors_temp.Resize({size});
     paddle::framework::TensorFromVector(
         anchors, ctx.device_context(), &anchors_temp);
...
paddle/fluid/operators/detection_map_op.cc
...
@@ -19,8 +19,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 class DetectionMAPOp : public framework::OperatorWithKernel {
  public:
  using framework::OperatorWithKernel::OperatorWithKernel;
...
paddle/fluid/operators/dgc_clip_by_norm_op.h
...
@@ -21,8 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class DGCClipByNormKernel : public framework::OpKernel<T> {
  public:
...
paddle/fluid/operators/dropout_op_mlu.cc
...
@@ -18,8 +18,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename T>
 class DropoutMLUKernel : public framework::OpKernel<T> {
  public:
...
@@ -106,8 +104,8 @@ class DropoutMLUKernel : public framework::OpKernel<T> {
     }

     // In downgrade_in_infer mode, need to multiply (1.0f - dropout_prob).
-    Tensor scale_tensor(x->dtype());
-    Tensor bias_tensor(x->dtype());
+    phi::DenseTensor scale_tensor(x->dtype());
+    phi::DenseTensor bias_tensor(x->dtype());
     scale_tensor.mutable_data<T>({1}, ctx.GetPlace());
     bias_tensor.mutable_data<T>({1}, ctx.GetPlace());
     MLUCnnlTensorDesc scale_desc(scale_tensor);
...
@@ -157,7 +155,7 @@ class DropoutGradMLUKernel : public framework::OpKernel<T> {
     }

     // cast mask from uint8 to float32/float16
-    Tensor cast_mask(grad_x->dtype());
+    phi::DenseTensor cast_mask(grad_x->dtype());
     cast_mask.Resize(mask->dims());
     cast_mask.mutable_data<T>(ctx.GetPlace());
...
paddle/fluid/operators/dropout_op_npu.cc
...
@@ -23,8 +23,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class DropoutNPUKernel : public framework::OpKernel<T> {
  public:
...
@@ -56,8 +54,8 @@ class DropoutNPUKernel : public framework::OpKernel<T> {
     // only achieve the default `upscale_in_train` method
     if (!is_test) {
-      Tensor tmp_x(x->dtype());
-      Tensor tmp_out(out->dtype());
+      phi::DenseTensor tmp_x(x->dtype());
+      phi::DenseTensor tmp_out(out->dtype());
       tmp_x.ShareDataWith(*x);
       tmp_out.ShareDataWith(*out);
       if (x->dims().size() == 1) {
...
@@ -80,7 +78,7 @@ class DropoutNPUKernel : public framework::OpKernel<T> {
         seed = ctx.Attr<bool>("fix_seed") ? ctx.Attr<int>("seed") : 0;
       }

-      Tensor keep_prob_tensor(x->dtype());
+      phi::DenseTensor keep_prob_tensor(x->dtype());
       keep_prob_tensor.mutable_data<T>({1}, ctx.GetPlace());
       FillNpuTensorWithConstant<T>(&keep_prob_tensor,
                                    static_cast<T>(keep_prob));
...
@@ -89,14 +87,14 @@ class DropoutNPUKernel : public framework::OpKernel<T> {
       // mask used in `DropOutGenMask` NPU OP is different from
       // the output `Mask`.
-      Tensor npu_mask(experimental::DataType::UINT8);
+      phi::DenseTensor npu_mask(experimental::DataType::UINT8);
       uint32_t length = (x->numel() + 128 - 1) / 128 * 128;
       npu_mask.Resize(phi::make_ddim({length / 8}));
       npu_mask.mutable_data<uint8_t>(ctx.GetPlace());

       // TODO(pangyoki): `keep_prob` used in `DropOutGenMask` NPU
       // OP must be a scalar with shape[0]. At present, the shape
-      // of the `prob` Tensor of this OP is forced to be set to 0
+      // of the `prob` phi::DenseTensor of this OP is forced to be set to 0
       // in `npu_op_runner.cc`, which needs to be optimized later.
       NpuOpRunner runner_gen_mask;
       runner_gen_mask.SetType("DropOutGenMask")
...
@@ -116,7 +114,7 @@ class DropoutNPUKernel : public framework::OpKernel<T> {
       runner_dropout.Run(stream);

       // cast `out` from float/float16 to bool
-      Tensor cast_mask(experimental::DataType::BOOL);
+      phi::DenseTensor cast_mask(experimental::DataType::BOOL);
       cast_mask.Resize(mask->dims());
       cast_mask.mutable_data<bool>(ctx.GetPlace());
       auto dst_dtype_bool =
...
@@ -176,7 +174,7 @@ class DropoutGradNPUKernel : public framework::OpKernel<T> {
     }

     // cast mask from uint8 to float32/float16
-    Tensor cast_mask(dx->dtype());
+    phi::DenseTensor cast_mask(dx->dtype());
     cast_mask.Resize(mask->dims());
     cast_mask.mutable_data<T>(ctx.GetPlace());
     auto dst_dtype =
...
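The npu_mask sizing in the hunk above rounds the element count up to a multiple of 128 bits and stores the mask as packed bytes (length / 8). A small worked example of that arithmetic, using the same rounding rule as the kernel (the numbers are illustrative):

#include <cstdint>
#include <cstdio>

int main() {
  uint32_t numel = 1000;                            // example element count
  uint32_t length = (numel + 128 - 1) / 128 * 128;  // rounded up -> 1024 bits
  uint32_t bytes = length / 8;                      // -> 128 bytes for npu_mask
  std::printf("bits=%u bytes=%u\n", length, bytes);
  return 0;
}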
paddle/fluid/operators/elementwise/elementwise_add_op_mlu.cc
...
@@ -16,7 +16,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
 template <typename T>
 class ElementwiseAddMLUKernel : public framework::OpKernel<T> {
...
paddle/fluid/operators/elementwise/elementwise_add_op_npu.cc
...
@@ -21,7 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
 template <typename T>
 class ElementwiseAddNPUKernel : public framework::OpKernel<T> {
...
@@ -53,7 +52,7 @@ class ElementwiseAddNPUKernel : public framework::OpKernel<T> {
       const auto& runner = NpuOpRunner("Add", {*x, *y}, {*out}, {});
       runner.Run(dev_ctx.stream());
     } else {
-      Tensor transformed_x, transformed_y;
+      phi::DenseTensor transformed_x, transformed_y;
       NpuElementWiseOpBroadcast<T>(
           dev_ctx, x, y, axis, &transformed_x, &transformed_y);
       const auto& runner =
...
@@ -96,7 +95,7 @@ class ElementwiseAddGradNPUKernel : public framework::OpKernel<T> {
         }
       }
       if (!reduce_axes.empty()) {
-        Tensor tmp;
+        phi::DenseTensor tmp;
         tmp.ShareDataWith(*dx);
         tmp.Resize(phi::make_ddim(dst_dims_vec));
         const auto& runner =
...
@@ -128,7 +127,7 @@ class ElementwiseAddGradNPUKernel : public framework::OpKernel<T> {
         }
       }
       if (!reduce_axes.empty()) {
-        Tensor tmp;
+        phi::DenseTensor tmp;
         tmp.ShareDataWith(*dy);
         tmp.Resize(phi::make_ddim(dst_dims_vec));
         const auto& runner =
...
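The reduce_axes handling in the gradient kernel above reflects the usual rule for a broadcast add: the incoming gradient is summed over the axes that were broadcast so that each input gradient regains its own shape. Schematically, for out = x + y with broadcasting:

\frac{\partial L}{\partial x} = \sum_{\text{axes broadcast in } x} \frac{\partial L}{\partial\, \mathrm{out}},\qquad
\frac{\partial L}{\partial y} = \sum_{\text{axes broadcast in } y} \frac{\partial L}{\partial\, \mathrm{out}}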
paddle/fluid/operators/elementwise/elementwise_div_op.h
...
@@ -24,7 +24,6 @@ namespace operators {
 class ElementwiseDivOpDoubleGrad : public framework::OperatorWithKernel {
  public:
   using framework::OperatorWithKernel::OperatorWithKernel;
-  using Tensor = phi::DenseTensor;

   void InferShape(framework::InferShapeContext* ctx) const override {
     auto y_grad_name = framework::GradVarName("Y");
...
paddle/fluid/operators/elementwise/elementwise_div_op_mlu.cc
...
@@ -21,8 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename T>
 class ElementwiseDivMLUKernel : public framework::OpKernel<T> {
  public:
...
@@ -66,7 +64,7 @@ class ElementwiseDivGradMLUKernel : public framework::OpKernel<T> {
         CNNL_OP_TENSOR_MUL, ToCnnlDataType<T>(), CNNL_NOT_PROPAGATE_NAN);

     // compute dout/y == 1/y * dout
-    Tensor dout_div_y(dout->dtype());
+    phi::DenseTensor dout_div_y(dout->dtype());
     dout_div_y.Resize(dout->dims());
     dout_div_y.mutable_data<T>(ctx.GetPlace());
     MLUBinary<DIV>(ctx,
...
@@ -110,7 +108,7 @@ class ElementwiseDivGradMLUKernel : public framework::OpKernel<T> {
     if (dy) {
       // compute dy = -out * (dout/y) = -out/y * dout
-      Tensor neg_out(out->type());
+      phi::DenseTensor neg_out(out->type());
       neg_out.mutable_data<T>(out->dims(), ctx.GetPlace());

       MLUCnnlTensorDesc out_desc(*out);
...
@@ -121,7 +119,7 @@ class ElementwiseDivGradMLUKernel : public framework::OpKernel<T> {
                 out_desc.get(),
                 GetBasePtr(&neg_out));

-      Tensor dy_temp(y->dtype());
+      phi::DenseTensor dy_temp(y->dtype());
       dy_temp.Resize(dout->dims());
       dy_temp.mutable_data<T>(ctx.GetPlace());
...
paddle/fluid/operators/elementwise/elementwise_div_op_npu.cc
...
@@ -21,8 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class ElementwiseDivNPUKernel : public framework::OpKernel<T> {
  public:
...
@@ -66,38 +64,38 @@ class ElementwiseDivGradNPUKernel : public framework::OpKernel<T> {
     if (dx) {
       dx->mutable_data<T>(place);

-      Tensor tensor_one(y->type());
+      phi::DenseTensor tensor_one(y->type());
       tensor_one.mutable_data<float>({1}, place);
       FillNpuTensorWithConstant<float>(&tensor_one, static_cast<float>(1.0));

       // Use `Div` CANN OP to achieve `1/y` instead of `Power` CANN OP.
       // Because `Power` will cause precision overflow, that is, `float_status`
       // will be set to 1.
-      Tensor y_div(y->type());
+      phi::DenseTensor y_div(y->type());
       y_div.mutable_data<T>(y->dims(), place);
       const auto& runner_one_div_y =
           NpuOpRunner("Div", {tensor_one, *y}, {y_div}, {});
       runner_one_div_y.Run(stream);

-      Tensor tensor_zeros(x->type());
+      phi::DenseTensor tensor_zeros(x->type());
       tensor_zeros.mutable_data<T>(x->dims(), place);
       const auto& runner_tensor_zeros =
           NpuOpRunner("ZerosLike", {*x}, {tensor_zeros}, {});
       runner_tensor_zeros.Run(stream);

-      Tensor x_zero(experimental::DataType::BOOL);
+      phi::DenseTensor x_zero(experimental::DataType::BOOL);
       x_zero.mutable_data<bool>(x->dims(), place);
       const auto& runner_x_zero =
           NpuOpRunner("Equal", {*x, tensor_zeros}, {x_zero}, {});
       runner_x_zero.Run(stream);

-      Tensor x_nozero(experimental::DataType::BOOL);
+      phi::DenseTensor x_nozero(experimental::DataType::BOOL);
       x_nozero.mutable_data<bool>(x->dims(), place);
       const auto& runner_x_nonzero =
           NpuOpRunner("LogicalNot", {x_zero}, {x_nozero}, {});
       runner_x_nonzero.Run(stream);

-      Tensor x_nozero_f(x->type());
+      phi::DenseTensor x_nozero_f(x->type());
       x_nozero_f.mutable_data<T>(x->dims(), place);
       const auto& runner_x_nonzero_f =
           NpuOpRunner("Cast",
...
@@ -106,7 +104,7 @@ class ElementwiseDivGradNPUKernel : public framework::OpKernel<T> {
                       {{"dst_type", static_cast<int32_t>(0)}});
       runner_x_nonzero_f.Run(stream);

-      Tensor x_grad_w(x->type());
+      phi::DenseTensor x_grad_w(x->type());
       x_grad_w.mutable_data<T>(x->dims(), place);
       const auto& runner_x_grad_w =
           NpuOpRunner("Mul", {x_nozero_f, y_div}, {x_grad_w}, {});
...
@@ -120,19 +118,19 @@ class ElementwiseDivGradNPUKernel : public framework::OpKernel<T> {
     if (dy) {
       dy->mutable_data<T>(place);

-      Tensor neg_out(out->type());
+      phi::DenseTensor neg_out(out->type());
       neg_out.mutable_data<T>(out->dims(), place);
       const auto& runner_neg_out = NpuOpRunner("Neg", {*out}, {neg_out}, {});
       runner_neg_out.Run(stream);

-      Tensor tmp_mul(out->type());
+      phi::DenseTensor tmp_mul(out->type());
       tmp_mul.mutable_data<T>(out->dims(), place);
       const auto& runner_mul =
           NpuOpRunner("Mul", {neg_out, *dout}, {tmp_mul}, {});
       runner_mul.Run(stream);

       if (dy->dims() != dout->dims()) {
-        Tensor reduced_tmp_mul(y->type());
+        phi::DenseTensor reduced_tmp_mul(y->type());
         reduced_tmp_mul.mutable_data<T>(y->dims(), place);
         std::vector<int64_t> axes;
...
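The comments in these division-grad kernels ("compute dout/y == 1/y * dout", "dy = -out * (dout/y) = -out/y * dout") follow directly from differentiating out = x / y:

\frac{\partial\, \mathrm{out}}{\partial x} = \frac{1}{y},\qquad
\frac{\partial\, \mathrm{out}}{\partial y} = -\frac{x}{y^{2}} = -\frac{\mathrm{out}}{y}
\;\Longrightarrow\;
dx = \frac{dout}{y},\qquad dy = -\,\mathrm{out}\cdot\frac{dout}{y}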
paddle/fluid/operators/elementwise/elementwise_floordiv_op_npu.cc
...
@@ -21,8 +21,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename T>
 class ElementwiseFloorDivNPUKernel : public framework::OpKernel<T> {
  public:
...
paddle/fluid/operators/elementwise/elementwise_max_op_npu.cc
...
@@ -18,8 +18,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {

-using Tensor = phi::DenseTensor;
-
 template <typename DeviceContext, typename T>
 class ElementwiseMaxNPUKernel : public framework::OpKernel<T> {
  public:
...
@@ -51,7 +49,7 @@ class ElementwiseMaxNPUKernel : public framework::OpKernel<T> {
       const auto& runner = NpuOpRunner("Maximum", {*x, *y}, {*out}, {});
       runner.Run(stream);
     } else {
-      Tensor transformed_x, transformed_y;
+      phi::DenseTensor transformed_x, transformed_y;
       NpuElementWiseOpBroadcast<T>(
           dev_ctx, x, y, axis, &transformed_x, &transformed_y);
       const auto& runner =
...
@@ -85,7 +83,7 @@ class ElementwiseMaxGradNPUKernel : public framework::OpKernel<T> {
     auto x_dims = x->dims();
     auto y_dims = y->dims();
     axis = (axis == -1 ? std::abs(x_dims.size() - y_dims.size()) : axis);
-    Tensor transformed_x, transformed_y;
+    phi::DenseTensor transformed_x, transformed_y;
     NpuElementWiseOpBroadcast<T>(
         dev_ctx, x, y, axis, &transformed_x, &transformed_y);
...
@@ -99,9 +97,9 @@ class ElementwiseMaxGradNPUKernel : public framework::OpKernel<T> {
     if (dx && dy) {
       dx->mutable_data<T>(ctx.GetPlace());
       dy->mutable_data<T>(ctx.GetPlace());
-      Tensor tmp_dx;
+      phi::DenseTensor tmp_dx;
       tmp_dx.mutable_data<T>(dout_dims, ctx.GetPlace());
-      Tensor tmp_dy;
+      phi::DenseTensor tmp_dy;
       tmp_dy.mutable_data<T>(dout_dims, ctx.GetPlace());

       const auto& runner = NpuOpRunner("MaximumGrad",
...
@@ -153,12 +151,12 @@ class ElementwiseMaxGradNPUKernel : public framework::OpKernel<T> {
       }
     } else if (dx) {
-      Tensor zero_tensor(dout->type());
+      phi::DenseTensor zero_tensor(dout->type());
       zero_tensor.mutable_data<T>(dout_dims, ctx.GetPlace());
       FillNpuTensorWithConstant<T>(&zero_tensor, static_cast<T>(0));

       dx->mutable_data<T>(ctx.GetPlace());
-      Tensor tmp_dx;
+      phi::DenseTensor tmp_dx;
       tmp_dx.mutable_data<T>(dout_dims, ctx.GetPlace());

       const auto& runner = NpuOpRunner("MaximumGrad",
...
@@ -190,12 +188,12 @@ class ElementwiseMaxGradNPUKernel : public framework::OpKernel<T> {
       }
     } else if (dy) {
-      Tensor zero_tensor(dout->type());
+      phi::DenseTensor zero_tensor(dout->type());
       zero_tensor.mutable_data<T>(dout_dims, ctx.GetPlace());
       FillNpuTensorWithConstant<T>(&zero_tensor, static_cast<T>(0));

       dy->mutable_data<T>(ctx.GetPlace());
-      Tensor tmp_dy;
+      phi::DenseTensor tmp_dy;
       tmp_dy.mutable_data<T>(dout_dims, ctx.GetPlace());

       const auto& runner = NpuOpRunner("MaximumGrad",
...
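The MaximumGrad calls above route the upstream gradient to whichever input attains the maximum; when only one of dx or dy is requested, the zero tensor simply stands in for the discarded branch. Elementwise (tie-breaking at x = y depends on the CANN op, assumed here to favor x):

\frac{\partial \max(x, y)}{\partial x} = \mathbf{1}[x \ge y],\qquad
\frac{\partial \max(x, y)}{\partial y} = \mathbf{1}[x < y]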
paddle/fluid/operators/elementwise/elementwise_min_op_mlu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_min_op_npu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_mlu.h
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_mod_op_npu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_mul_op.h
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_mul_op_mlu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_mul_op_npu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_npu.h
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_op.h
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_pow_op_mlu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_pow_op_npu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_sub_op_mlu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/elementwise/elementwise_sub_op_npu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/expand_as_op.h
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/expand_as_v2_op.h
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/expand_as_v2_op_mlu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/expand_op.h
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/expand_v2_op_npu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/eye_op_npu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/fc_op.h
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/fill_constant_batch_size_like_op_npu.cc
浏览文件 @
65420271
此差异已折叠。
点击以展开。
paddle/fluid/operators/fill_constant_op_mlu.cc
paddle/fluid/operators/filter_by_instag_op.cu
paddle/fluid/operators/filter_by_instag_op.h
paddle/fluid/operators/flatten_op.cc
paddle/fluid/operators/flatten_op_npu.cc
paddle/fluid/operators/fsp_op.h
paddle/fluid/operators/fused/attn_gemm.h
paddle/fluid/operators/fused/attn_gemm_int8.h
paddle/fluid/operators/fused/conv_fusion_op.cu
paddle/fluid/operators/fused/cudnn_bn_add_relu_test.cc
paddle/fluid/operators/fused/cudnn_bn_stats_finalize.cu.h
paddle/fluid/operators/fused/cudnn_norm_conv.cu.h
paddle/fluid/operators/fused/cudnn_norm_conv_test.cc
paddle/fluid/operators/fused/cudnn_scale_bias_add_relu.cu.h
paddle/fluid/operators/fused/fmha_ref.h
paddle/fluid/operators/fused/fused_attention_op.cc
paddle/fluid/operators/fused/fused_attention_op.cu
paddle/fluid/operators/fused/fused_attention_op_xpu.cc
paddle/fluid/operators/fused/fused_bias_dropout_residual_layer_norm_op.cc
paddle/fluid/operators/fused/fused_bias_dropout_residual_layer_norm_op.cu
paddle/fluid/operators/fused/fused_bn_activation_op.cc
paddle/fluid/operators/fused/fused_bn_activation_op.cu
paddle/fluid/operators/fused/fused_bn_activation_op.h
paddle/fluid/operators/fused/fused_bn_add_activation_op.cc
paddle/fluid/operators/fused/fused_bn_add_activation_op.cu
paddle/fluid/operators/fused/fused_bn_add_activation_op.h
paddle/fluid/operators/fused/fused_embedding_eltwise_layernorm_op.cu
paddle/fluid/operators/fused/fused_embedding_fc_lstm_op.cc
paddle/fluid/operators/fused/fused_embedding_fc_lstm_op.h
paddle/fluid/operators/fused/fused_embedding_seq_pool_op.h
paddle/fluid/operators/fused/fused_feedforward_op.cc
paddle/fluid/operators/fused/fused_feedforward_op.cu
paddle/fluid/operators/fused/fused_feedforward_op_xpu.cc
paddle/fluid/operators/fused/fused_gate_attention.h
paddle/fluid/operators/fused/fused_gate_attention_op.cc
paddle/fluid/operators/fused/fused_gate_attention_op.cu
paddle/fluid/operators/fused/fused_gemm_epilogue_op.cc
paddle/fluid/operators/fused/fused_gemm_epilogue_op.cu
paddle/fluid/operators/fused/fused_gemm_epilogue_op_xpu.cc
paddle/fluid/operators/fused/fused_multi_transformer_int8_op.cc
paddle/fluid/operators/fused/fused_multi_transformer_int8_op.cu
paddle/fluid/operators/fused/fused_multi_transformer_op.cc
paddle/fluid/operators/fused/fused_multi_transformer_op.cu
paddle/fluid/operators/fused/fused_multi_transformer_op.cu.h
paddle/fluid/operators/fused/fusion_conv_inception_op.cu
paddle/fluid/operators/fused/fusion_gru_op.cc
paddle/fluid/operators/fused/fusion_gru_op.h
paddle/fluid/operators/fused/fusion_lstm_op.cc
paddle/fluid/operators/fused/fusion_lstm_op.h
paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.cc
paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.h
paddle/fluid/operators/fused/fusion_seqconv_eltadd_relu_op.cc
paddle/fluid/operators/fused/fusion_seqconv_eltadd_relu_op.h
paddle/fluid/operators/fused/fusion_seqexpand_concat_fc_op.cc
paddle/fluid/operators/fused/fusion_seqexpand_concat_fc_op.h
paddle/fluid/operators/fused/fusion_seqpool_concat_op.h
paddle/fluid/operators/fused/fusion_seqpool_cvm_concat_op.cc
paddle/fluid/operators/fused/fusion_seqpool_cvm_concat_op.h
paddle/fluid/operators/fused/fusion_squared_mat_sub_op.cc
paddle/fluid/operators/fused/fusion_squared_mat_sub_op.h
paddle/fluid/operators/fused/multihead_matmul_op.cu
paddle/fluid/operators/fused/resnet_basic_block_op.cc
paddle/fluid/operators/fused/resnet_basic_block_op_xpu.cc
paddle/fluid/operators/fused/resnet_unit_op.cc
paddle/fluid/operators/fused/resnet_unit_op.cu
paddle/fluid/operators/fused/resnet_unit_op_xpu.cc
paddle/fluid/operators/fused/skip_layernorm_op.cu
paddle/fluid/operators/fused/xpu_fused_common_function.h
paddle/fluid/operators/fused/yolo_box_head_op.cu
paddle/fluid/operators/fused/yolo_box_post_op.cu
paddle/fluid/operators/gather_nd_op_mlu.cc
paddle/fluid/operators/gather_nd_op_npu.cc
paddle/fluid/operators/gather_scatter_kernel.cc
paddle/fluid/operators/gather_scatter_kernel.cu
paddle/fluid/operators/gather_scatter_kernel.h
paddle/fluid/operators/gaussian_random_op.cc
paddle/fluid/operators/gaussian_random_op_mlu.cc
paddle/fluid/operators/gaussian_random_op_npu.cc
paddle/fluid/operators/gelu_op_npu.cc
paddle/fluid/operators/graph_khop_sampler_op.cu
paddle/fluid/operators/graph_khop_sampler_op.h
paddle/fluid/operators/grid_sampler_op_mlu.cc
paddle/fluid/operators/group_norm_op.cc
paddle/fluid/operators/group_norm_op.cu
paddle/fluid/operators/group_norm_op.h
paddle/fluid/operators/group_norm_op_npu.cc
paddle/fluid/operators/gru_op.cc
paddle/fluid/operators/gru_op.cu.cc
paddle/fluid/operators/gru_op.h
paddle/fluid/operators/gru_unit_op.h
paddle/fluid/operators/huber_loss_op_mlu.cc
paddle/fluid/operators/huber_loss_op_npu.cc
paddle/fluid/operators/im2sequence_op.h
paddle/fluid/operators/index_sample_op_npu.cc
paddle/fluid/operators/index_select_op.h
paddle/fluid/operators/index_select_op_npu.cc
paddle/fluid/operators/inplace_abn_op.cc
paddle/fluid/operators/inplace_abn_op.cu
paddle/fluid/operators/inplace_abn_op.h
paddle/fluid/operators/instance_norm_op.cc
paddle/fluid/operators/instance_norm_op.h
paddle/fluid/operators/instance_norm_op_npu.cc
paddle/fluid/operators/interpolate_op.cu
paddle/fluid/operators/interpolate_op.h
paddle/fluid/operators/interpolate_op_npu.cc
paddle/fluid/operators/interpolate_v2_op_mlu.cc
paddle/fluid/operators/interpolate_v2_op_npu.cc
paddle/fluid/operators/jit/benchmark.cc
paddle/fluid/operators/kldiv_loss_op_npu.cc
paddle/fluid/operators/label_smooth_op_mlu.cc
paddle/fluid/operators/label_smooth_op_npu.cc
paddle/fluid/operators/layer_norm_kernel.cu.h
paddle/fluid/operators/layer_norm_op.cc
paddle/fluid/operators/layer_norm_op_mlu.cc
paddle/fluid/operators/layer_norm_op_npu.cc
paddle/fluid/operators/layout_utils.h
paddle/fluid/operators/limit_by_capacity_op.cu
paddle/fluid/operators/log_loss_op_npu.cc
paddle/fluid/operators/log_loss_op_xpu.cc
paddle/fluid/operators/lookup_table_dequant_op.h
paddle/fluid/operators/lookup_table_op.h
paddle/fluid/operators/lookup_table_v2_op.h
paddle/fluid/operators/lookup_table_v2_op_mlu.cc
paddle/fluid/operators/lookup_table_v2_op_npu.cc
paddle/fluid/operators/lrn_op.h
paddle/fluid/operators/lstm_op.h
paddle/fluid/operators/lstmp_op.h
paddle/fluid/operators/masked_select_op_mlu.cc
paddle/fluid/operators/match_matrix_tensor_op.cc
paddle/fluid/operators/match_matrix_tensor_op.h
paddle/fluid/operators/math/context_project.h
paddle/fluid/operators/math/eigen_values_vectors.h
paddle/fluid/operators/math/sample_prob.cu
paddle/fluid/operators/math/sample_prob.h
paddle/fluid/operators/math/sequence_pooling.cc
paddle/fluid/operators/math/softmax.cu
paddle/fluid/operators/math/tree2col.cu
paddle/fluid/operators/matmul_op_mlu.cc
paddle/fluid/operators/matmul_op_npu.cc
paddle/fluid/operators/matmul_v2_op_mlu.cc
paddle/fluid/operators/matmul_v2_op_npu.cc
paddle/fluid/operators/mean_iou_op.h
paddle/fluid/operators/mean_op_mlu.cc
paddle/fluid/operators/mean_op_npu.cc
paddle/fluid/operators/meshgrid_op_mlu.cc
paddle/fluid/operators/metrics/accuracy_op_mlu.cc
paddle/fluid/operators/metrics/accuracy_op_xpu.cc
paddle/fluid/operators/metrics/precision_recall_op.h
paddle/fluid/operators/mkldnn/dequantize_mkldnn_op.cc
paddle/fluid/operators/mkldnn/matmul_v2_mkldnn_op.cc
paddle/fluid/operators/mkldnn/quantize_mkldnn_op.cc
paddle/fluid/operators/mkldnn/requantize_mkldnn_op.cc
paddle/fluid/operators/mkldnn/reshape_mkldnn_op.cc
paddle/fluid/operators/mkldnn/transpose_mkldnn_op.cc
paddle/fluid/operators/mlu/mlu_baseop.cc
paddle/fluid/operators/mlu/mlu_baseop.h
paddle/fluid/operators/modified_huber_loss_op.cu
paddle/fluid/operators/modified_huber_loss_op.h
paddle/fluid/operators/multi_dot_op.cc
paddle/fluid/operators/multinomial_op_npu.cc
paddle/fluid/operators/multiplex_op.cc
paddle/fluid/operators/nce_op.h
paddle/fluid/operators/norm_op_npu.cc
paddle/fluid/operators/norm_utils.cu.h
paddle/fluid/operators/number_count_op.cu
paddle/fluid/operators/one_hot_op.h
paddle/fluid/operators/one_hot_op_npu.cc
paddle/fluid/operators/one_hot_op_xpu.cc
paddle/fluid/operators/one_hot_v2_op_mlu.cc
paddle/fluid/operators/one_hot_v2_op_npu.cc
paddle/fluid/operators/optimizers/adadelta_op.cc
paddle/fluid/operators/optimizers/adagrad_op.cc
paddle/fluid/operators/optimizers/adam_op.h
paddle/fluid/operators/optimizers/adam_op_mlu.cc
paddle/fluid/operators/optimizers/adam_op_npu.cc
paddle/fluid/operators/optimizers/adamax_op.cc
paddle/fluid/operators/optimizers/decayed_adagrad_op.cc
paddle/fluid/operators/optimizers/dpsgd_op.cc
paddle/fluid/operators/optimizers/ftrl_op.cc
paddle/fluid/operators/optimizers/ftrl_op.h
paddle/fluid/operators/optimizers/merged_adam_op.cc
paddle/fluid/operators/optimizers/merged_momentum_op_mlu.cc
paddle/fluid/operators/optimizers/momentum_op.cc
paddle/fluid/operators/optimizers/momentum_op_mlu.cc
paddle/fluid/operators/optimizers/proximal_adagrad_op.cc
paddle/fluid/operators/optimizers/proximal_adagrad_op.h
paddle/fluid/operators/optimizers/proximal_gd_op.cc
paddle/fluid/operators/optimizers/proximal_gd_op.h
paddle/fluid/operators/optimizers/rmsprop_op_npu.cc
paddle/fluid/operators/optimizers/sparse_momentum_op.cc
paddle/fluid/operators/p_norm_op_npu.cc
paddle/fluid/operators/pad3d_op_npu.cc
paddle/fluid/operators/pad_op_npu.cc
paddle/fluid/operators/partial_concat_op.cc
paddle/fluid/operators/partial_concat_op.cu
paddle/fluid/operators/partial_concat_op.h
paddle/fluid/operators/partial_sum_op.cc
paddle/fluid/operators/partial_sum_op.cu
paddle/fluid/operators/partial_sum_op.h
paddle/fluid/operators/pool_op.cc
paddle/fluid/operators/pool_op.h
paddle/fluid/operators/pool_op_mlu.cc
paddle/fluid/operators/positive_negative_pair_op.h
paddle/fluid/operators/prelu_op.cc
paddle/fluid/operators/prroi_pool_op.cc
paddle/fluid/operators/prroi_pool_op.cu
paddle/fluid/operators/pyramid_hash_op.cc
paddle/fluid/operators/random_routing_op.cu
paddle/fluid/operators/rank_attention_op.cc
paddle/fluid/operators/reduce_ops/reduce_any_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_any_op_npu_test.cc
paddle/fluid/operators/reduce_ops/reduce_max_op_mlu.cc
paddle/fluid/operators/reduce_ops/reduce_max_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_mean_op_mlu.cc
paddle/fluid/operators/reduce_ops/reduce_mean_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_min_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_op.h
paddle/fluid/operators/reduce_ops/reduce_op_function.h
paddle/fluid/operators/reduce_ops/reduce_prod_op_npu.cc
paddle/fluid/operators/reduce_ops/reduce_sum_op.h
paddle/fluid/operators/reduce_ops/reduce_sum_op_mlu.cc
paddle/fluid/operators/reduce_ops/reduce_sum_op_npu.cc
paddle/fluid/operators/reshape_op.cc
paddle/fluid/operators/rnn_op_mlu.cc
paddle/fluid/operators/roi_align_op.cc
paddle/fluid/operators/roi_align_op_mlu.cc
paddle/fluid/operators/roi_align_op_npu.cc
paddle/fluid/operators/roi_pool_op.cc
paddle/fluid/operators/sample_logits_op.cu
paddle/fluid/operators/sample_logits_op.h
paddle/fluid/operators/sampling_id_op.cc
paddle/fluid/operators/sampling_id_op.h
paddle/fluid/operators/save_combine_op.cc
paddle/fluid/operators/scatter_op_mlu.cc
paddle/fluid/operators/scatter_op_npu.cc
paddle/fluid/operators/search_compute.h
paddle/fluid/operators/seed_op.cc
paddle/fluid/operators/seed_op.h
paddle/fluid/operators/set_value_op.cc
paddle/fluid/operators/set_value_op.h
paddle/fluid/operators/set_value_op_mlu.cc
paddle/fluid/operators/set_value_op_npu.cc
paddle/fluid/operators/shape_op_mlu.cc
paddle/fluid/operators/shape_op_npu.cc
paddle/fluid/operators/shard_index_op_npu.cc
paddle/fluid/operators/shuffle_batch_op.h
paddle/fluid/operators/shuffle_channel_op.cu
paddle/fluid/operators/sigmoid_cross_entropy_with_logits_op_mlu.cc
paddle/fluid/operators/sigmoid_cross_entropy_with_logits_op_npu.cc
paddle/fluid/operators/similarity_focus_op.h
paddle/fluid/operators/slice_op.cc
paddle/fluid/operators/slice_op_mlu.cc
paddle/fluid/operators/slice_op_npu.cc
paddle/fluid/operators/smooth_l1_loss_op.h
paddle/fluid/operators/smooth_l1_loss_op_npu.cc
paddle/fluid/operators/softmax_with_cross_entropy_op_mlu.cc
paddle/fluid/operators/softmax_with_cross_entropy_op_npu.cc
paddle/fluid/operators/space_to_depth_op.cc
paddle/fluid/operators/sparse_attention_op.cu
paddle/fluid/operators/split_op_mlu.cc
paddle/fluid/operators/split_op_npu.cc
paddle/fluid/operators/squared_l2_distance_op.h
paddle/fluid/operators/squared_l2_norm_op_mlu.cc
paddle/fluid/operators/squared_l2_norm_op_npu.cc
paddle/fluid/operators/stack_op_mlu.cc
paddle/fluid/operators/stack_op_npu.cc
paddle/fluid/operators/stft_op.h
paddle/fluid/operators/strided_slice_op.cc
paddle/fluid/operators/strided_slice_op_mlu.cc
paddle/fluid/operators/strided_slice_op_npu.cc
paddle/fluid/operators/sum_op_mlu.cc
paddle/fluid/operators/sum_op_npu.cc
paddle/fluid/operators/svd_helper.h
paddle/fluid/operators/sync_batch_norm_op_mlu.cc
paddle/fluid/operators/sync_batch_norm_op_npu.cc
paddle/fluid/operators/take_along_axis_op_npu.cc
paddle/fluid/operators/tdm_child_op.h
paddle/fluid/operators/tdm_sampler_op.h
paddle/fluid/operators/teacher_student_sigmoid_loss_op.cc
paddle/fluid/operators/teacher_student_sigmoid_loss_op.h
paddle/fluid/operators/temporal_shift_op.h
paddle/fluid/operators/tile_op_mlu.cc
paddle/fluid/operators/tile_op_npu.cc
paddle/fluid/operators/top_k_op.cu
paddle/fluid/operators/top_k_op.h
paddle/fluid/operators/top_k_op_npu.cc
paddle/fluid/operators/top_k_op_xpu.cc
paddle/fluid/operators/tree_conv_op.h
paddle/fluid/operators/truncated_gaussian_random_op_npu.cc
paddle/fluid/operators/uniform_random_op.cc
paddle/fluid/operators/uniform_random_op.cu
paddle/fluid/operators/uniform_random_op.h
paddle/fluid/operators/uniform_random_op_mlu.cc
paddle/fluid/operators/uniform_random_op_npu.cc
paddle/fluid/operators/var_conv_2d_op.cc
paddle/fluid/operators/var_conv_2d_op.h
paddle/fluid/operators/where_index_op_mlu.cc
paddle/fluid/operators/where_index_op_npu.cc