Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
PaddlePaddle
Paddle
提交
ef734e84
P
Paddle
项目概览
PaddlePaddle
/
Paddle
大约 2 年 前同步成功
通知
2325
Star
20933
Fork
5424
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
1423
列表
看板
标记
里程碑
合并请求
543
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
Paddle
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
1,423
Issue
1,423
列表
看板
标记
里程碑
合并请求
543
合并请求
543
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
未验证
提交
ef734e84
编写于
4月 13, 2023
作者:
W
Wangzheee
提交者:
GitHub
4月 13, 2023
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
[Paddle-Trt] Replace fc mul matmul matmul_v2 with matrix_multiply (#52222)
* Paddle-Trt: Replace fc mul matmul matmul_v2 with matrix_multiply
上级
acf55016
变更
42
显示空白变更内容
内联
并排
Showing
42 changed file
with
1004 addition
and
2921 deletion
+1004
-2921
paddle/fluid/framework/ir/CMakeLists.txt
paddle/fluid/framework/ir/CMakeLists.txt
+1
-1
paddle/fluid/framework/ir/constant_folding_pass.cc
paddle/fluid/framework/ir/constant_folding_pass.cc
+1
-1
paddle/fluid/framework/ir/delete_weight_dequant_linear_op_pass.cc
...luid/framework/ir/delete_weight_dequant_linear_op_pass.cc
+2
-2
paddle/fluid/framework/ir/graph_pattern_detector.cc
paddle/fluid/framework/ir/graph_pattern_detector.cc
+19
-9
paddle/fluid/framework/ir/graph_pattern_detector.h
paddle/fluid/framework/ir/graph_pattern_detector.h
+11
-0
paddle/fluid/framework/ir/multihead_matmul_roformer_fuse_pass.cc
...fluid/framework/ir/multihead_matmul_roformer_fuse_pass.cc
+1
-57
paddle/fluid/framework/ir/quant_conv2d_dequant_fuse_pass.cc
paddle/fluid/framework/ir/quant_conv2d_dequant_fuse_pass.cc
+26
-107
paddle/fluid/framework/ir/quant_conv2d_dequant_fuse_pass.h
paddle/fluid/framework/ir/quant_conv2d_dequant_fuse_pass.h
+1
-1
paddle/fluid/framework/ir/remove_padding_recover_padding_pass.cc
...fluid/framework/ir/remove_padding_recover_padding_pass.cc
+100
-22
paddle/fluid/framework/ir/remove_padding_recover_padding_pass.h
.../fluid/framework/ir/remove_padding_recover_padding_pass.h
+16
-5
paddle/fluid/framework/ir/trt_cross_multihead_matmul_fuse_pass.cc
...luid/framework/ir/trt_cross_multihead_matmul_fuse_pass.cc
+2
-59
paddle/fluid/framework/ir/trt_delete_weight_dequant_linear_op_pass.cc
.../framework/ir/trt_delete_weight_dequant_linear_op_pass.cc
+0
-71
paddle/fluid/framework/ir/trt_flash_multihead_matmul_fuse_pass.cc
...luid/framework/ir/trt_flash_multihead_matmul_fuse_pass.cc
+2
-59
paddle/fluid/framework/ir/trt_map_matmul_to_mul_pass.cc
paddle/fluid/framework/ir/trt_map_matmul_to_mul_pass.cc
+0
-918
paddle/fluid/framework/ir/trt_map_matmul_to_mul_pass.h
paddle/fluid/framework/ir/trt_map_matmul_to_mul_pass.h
+0
-130
paddle/fluid/framework/ir/trt_map_ops_to_matrix_multiply_pass.cc
...fluid/framework/ir/trt_map_ops_to_matrix_multiply_pass.cc
+125
-0
paddle/fluid/framework/ir/trt_map_ops_to_matrix_multiply_pass.h
.../fluid/framework/ir/trt_map_ops_to_matrix_multiply_pass.h
+39
-0
paddle/fluid/framework/ir/trt_multihead_matmul_fuse_pass.cc
paddle/fluid/framework/ir/trt_multihead_matmul_fuse_pass.cc
+45
-165
paddle/fluid/framework/ir/trt_skip_layernorm_fuse_pass.cc
paddle/fluid/framework/ir/trt_skip_layernorm_fuse_pass.cc
+8
-1
paddle/fluid/framework/ir/vit_attention_fuse_pass.cc
paddle/fluid/framework/ir/vit_attention_fuse_pass.cc
+2
-3
paddle/fluid/inference/api/analysis_predictor.cc
paddle/fluid/inference/api/analysis_predictor.cc
+1
-3
paddle/fluid/inference/api/paddle_pass_builder.cc
paddle/fluid/inference/api/paddle_pass_builder.cc
+17
-27
paddle/fluid/inference/tensorrt/convert/CMakeLists.txt
paddle/fluid/inference/tensorrt/convert/CMakeLists.txt
+1
-3
paddle/fluid/inference/tensorrt/convert/fc_op.cc
paddle/fluid/inference/tensorrt/convert/fc_op.cc
+0
-415
paddle/fluid/inference/tensorrt/convert/matmul_op.cc
paddle/fluid/inference/tensorrt/convert/matmul_op.cc
+0
-190
paddle/fluid/inference/tensorrt/convert/matmul_v2_op.cc
paddle/fluid/inference/tensorrt/convert/matmul_v2_op.cc
+0
-126
paddle/fluid/inference/tensorrt/convert/matrix_multiply_op.cc
...le/fluid/inference/tensorrt/convert/matrix_multiply_op.cc
+273
-0
paddle/fluid/inference/tensorrt/convert/multihead_matmul_op.cc
...e/fluid/inference/tensorrt/convert/multihead_matmul_op.cc
+115
-32
paddle/fluid/inference/tensorrt/convert/one_hot_op.cc
paddle/fluid/inference/tensorrt/convert/one_hot_op.cc
+2
-0
paddle/fluid/inference/tensorrt/convert/op_converter.h
paddle/fluid/inference/tensorrt/convert/op_converter.h
+0
-13
paddle/fluid/inference/tensorrt/convert/skip_layernorm.cc
paddle/fluid/inference/tensorrt/convert/skip_layernorm.cc
+63
-6
paddle/fluid/inference/tensorrt/engine.cc
paddle/fluid/inference/tensorrt/engine.cc
+7
-0
paddle/fluid/inference/tensorrt/op_teller.cc
paddle/fluid/inference/tensorrt/op_teller.cc
+2
-121
paddle/fluid/operators/tensorrt/tensorrt_engine_op_test.cc
paddle/fluid/operators/tensorrt/tensorrt_engine_op_test.cc
+48
-44
python/paddle/fluid/tests/unittests/ir/inference/CMakeLists.txt
.../paddle/fluid/tests/unittests/ir/inference/CMakeLists.txt
+0
-2
python/paddle/fluid/tests/unittests/ir/inference/test_fc_fuse_pass.py
...e/fluid/tests/unittests/ir/inference/test_fc_fuse_pass.py
+0
-10
python/paddle/fluid/tests/unittests/ir/inference/test_multihead_matmul_roformer_fuse_pass.py
.../ir/inference/test_multihead_matmul_roformer_fuse_pass.py
+4
-1
python/paddle/fluid/tests/unittests/ir/inference/test_trt_convert_matmul_v2.py
...ests/unittests/ir/inference/test_trt_convert_matmul_v2.py
+21
-11
python/paddle/fluid/tests/unittests/ir/inference/test_trt_convert_multihead_matmul.py
...ittests/ir/inference/test_trt_convert_multihead_matmul.py
+23
-148
python/paddle/fluid/tests/unittests/ir/inference/test_trt_fc_fuse_pass.py
...uid/tests/unittests/ir/inference/test_trt_fc_fuse_pass.py
+9
-9
python/paddle/fluid/tests/unittests/ir/inference/test_trt_flatten2_matmul_fuse_pass.py
...ttests/ir/inference/test_trt_flatten2_matmul_fuse_pass.py
+0
-148
python/paddle/fluid/tests/unittests/ir/inference/test_trt_matmul_quant_dequant.py
...s/unittests/ir/inference/test_trt_matmul_quant_dequant.py
+17
-1
未找到文件。
paddle/fluid/framework/ir/CMakeLists.txt
浏览文件 @
ef734e84
...
@@ -132,7 +132,7 @@ pass_library(generate_pass DEPS pass_desc_proto)
...
@@ -132,7 +132,7 @@ pass_library(generate_pass DEPS pass_desc_proto)
target_link_libraries
(
generate_pass pass_desc_proto
)
target_link_libraries
(
generate_pass pass_desc_proto
)
if
(
WITH_TENSORRT
)
if
(
WITH_TENSORRT
)
pass_library
(
trt_map_
matmul_to_mul
_pass inference
)
pass_library
(
trt_map_
ops_to_matrix_multiply
_pass inference
)
pass_library
(
trt_multihead_matmul_fuse_pass inference
)
pass_library
(
trt_multihead_matmul_fuse_pass inference
)
pass_library
(
trt_flash_multihead_matmul_fuse_pass inference
)
pass_library
(
trt_flash_multihead_matmul_fuse_pass inference
)
pass_library
(
trt_cross_multihead_matmul_fuse_pass inference
)
pass_library
(
trt_cross_multihead_matmul_fuse_pass inference
)
...
...
paddle/fluid/framework/ir/constant_folding_pass.cc
浏览文件 @
ef734e84
...
@@ -64,7 +64,7 @@ void ConstantFoldingPass::ApplyImpl(ir::Graph *graph) const {
...
@@ -64,7 +64,7 @@ void ConstantFoldingPass::ApplyImpl(ir::Graph *graph) const {
platform
::
errors
::
Fatal
(
platform
::
errors
::
Fatal
(
"scope must not be null when applying constant floding."
));
"scope must not be null when applying constant floding."
));
std
::
vector
<
std
::
string
>
blacklist
{
"feed"
};
std
::
vector
<
std
::
string
>
blacklist
{
"feed"
,
"matrix_multiply"
};
auto
op_node_sorted
=
framework
::
ir
::
TopologyVarientSort
(
auto
op_node_sorted
=
framework
::
ir
::
TopologyVarientSort
(
*
graph
,
static_cast
<
framework
::
ir
::
SortKind
>
(
0
));
*
graph
,
static_cast
<
framework
::
ir
::
SortKind
>
(
0
));
...
...
paddle/fluid/framework/ir/delete_weight_dequant_linear_op_pass.cc
浏览文件 @
ef734e84
...
@@ -24,10 +24,10 @@ namespace ir {
...
@@ -24,10 +24,10 @@ namespace ir {
class
Graph
;
class
Graph
;
void
DeleteWeightDequantLinearOpPass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
void
DeleteWeightDequantLinearOpPass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
std
::
unordered_set
<
std
::
string
>
op_list
=
{
"matmul_v2"
,
std
::
unordered_set
<
std
::
string
>
op_list
=
{
"matrix_multiply"
,
"matmul_v2"
,
"matmul"
,
"matmul"
,
"mul"
,
"mul"
,
"fc"
,
"depthwise_conv2d"
,
"depthwise_conv2d"
,
"conv2d"
,
"conv2d"
,
"conv2d_transpose"
};
"conv2d_transpose"
};
...
...
paddle/fluid/framework/ir/graph_pattern_detector.cc
浏览文件 @
ef734e84
...
@@ -2465,7 +2465,7 @@ PDNode *patterns::ConvElementwiseaddAct::operator()(
...
@@ -2465,7 +2465,7 @@ PDNode *patterns::ConvElementwiseaddAct::operator()(
PDNode
*
patterns
::
VitAttention
::
operator
()(
PDNode
*
in
)
{
PDNode
*
patterns
::
VitAttention
::
operator
()(
PDNode
*
in
)
{
in
->
AsInput
();
in
->
AsInput
();
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
mul"
,
"matmul_v2
"
};
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
rix_multiply
"
};
auto
matmul0_op
=
auto
matmul0_op
=
pattern
->
NewNode
(
matmul0_op_repr
())
->
assert_is_ops
(
matmul_ops
);
pattern
->
NewNode
(
matmul0_op_repr
())
->
assert_is_ops
(
matmul_ops
);
...
@@ -2504,13 +2504,13 @@ PDNode *patterns::VitAttention::operator()(PDNode *in) {
...
@@ -2504,13 +2504,13 @@ PDNode *patterns::VitAttention::operator()(PDNode *in) {
auto
slice1_op
=
pattern
->
NewNode
(
slice1_op_repr
())
->
assert_is_op
(
"slice"
);
auto
slice1_op
=
pattern
->
NewNode
(
slice1_op_repr
())
->
assert_is_op
(
"slice"
);
auto
slice1_out
=
pattern
->
NewNode
(
slice1_out_repr
())
auto
slice1_out
=
pattern
->
NewNode
(
slice1_out_repr
())
->
assert_is_op_output
(
"slice"
,
"Out"
)
->
assert_is_op_output
(
"slice"
,
"Out"
)
->
assert_is_op_input
(
"mat
mul_v2
"
,
"Y"
)
->
assert_is_op_input
(
"mat
rix_multiply
"
,
"Y"
)
->
AsIntermediate
();
->
AsIntermediate
();
auto
slice2_op
=
pattern
->
NewNode
(
slice2_op_repr
())
->
assert_is_op
(
"slice"
);
auto
slice2_op
=
pattern
->
NewNode
(
slice2_op_repr
())
->
assert_is_op
(
"slice"
);
auto
slice2_out
=
pattern
->
NewNode
(
slice2_out_repr
())
auto
slice2_out
=
pattern
->
NewNode
(
slice2_out_repr
())
->
assert_is_op_output
(
"slice"
,
"Out"
)
->
assert_is_op_output
(
"slice"
,
"Out"
)
->
assert_is_op_input
(
"mat
mul_v2
"
,
"X"
)
->
assert_is_op_input
(
"mat
rix_multiply
"
,
"X"
)
->
AsIntermediate
();
->
AsIntermediate
();
auto
slice3_op
=
pattern
->
NewNode
(
slice3_op_repr
())
->
assert_is_op
(
"slice"
);
auto
slice3_op
=
pattern
->
NewNode
(
slice3_op_repr
())
->
assert_is_op
(
"slice"
);
...
@@ -2523,13 +2523,13 @@ PDNode *patterns::VitAttention::operator()(PDNode *in) {
...
@@ -2523,13 +2523,13 @@ PDNode *patterns::VitAttention::operator()(PDNode *in) {
pattern
->
NewNode
(
transpose2_op_repr
())
->
assert_is_op
(
"transpose2"
);
pattern
->
NewNode
(
transpose2_op_repr
())
->
assert_is_op
(
"transpose2"
);
auto
transpose2_out
=
pattern
->
NewNode
(
transpose2_out_repr
())
auto
transpose2_out
=
pattern
->
NewNode
(
transpose2_out_repr
())
->
assert_is_op_output
(
"transpose2"
,
"Out"
)
->
assert_is_op_output
(
"transpose2"
,
"Out"
)
->
assert_is_op_input
(
"mat
mul_v2
"
,
"Y"
)
->
assert_is_op_input
(
"mat
rix_multiply
"
,
"Y"
)
->
AsIntermediate
();
->
AsIntermediate
();
auto
matmul1_op
=
auto
matmul1_op
=
pattern
->
NewNode
(
matmul1_op_repr
())
->
assert_is_op
(
"mat
mul_v2
"
);
pattern
->
NewNode
(
matmul1_op_repr
())
->
assert_is_op
(
"mat
rix_multiply
"
);
auto
matmul1_out
=
pattern
->
NewNode
(
matmul1_out_repr
())
auto
matmul1_out
=
pattern
->
NewNode
(
matmul1_out_repr
())
->
assert_is_op_output
(
"mat
mul_v2
"
,
"Out"
)
->
assert_is_op_output
(
"mat
rix_multiply
"
,
"Out"
)
->
assert_is_op_input
(
"scale"
,
"X"
)
->
assert_is_op_input
(
"scale"
,
"X"
)
->
AsIntermediate
();
->
AsIntermediate
();
...
@@ -2543,13 +2543,13 @@ PDNode *patterns::VitAttention::operator()(PDNode *in) {
...
@@ -2543,13 +2543,13 @@ PDNode *patterns::VitAttention::operator()(PDNode *in) {
pattern
->
NewNode
(
softmax1_op_repr
())
->
assert_is_op
(
"softmax"
);
pattern
->
NewNode
(
softmax1_op_repr
())
->
assert_is_op
(
"softmax"
);
auto
softmax1_out
=
pattern
->
NewNode
(
softmax1_out_repr
())
auto
softmax1_out
=
pattern
->
NewNode
(
softmax1_out_repr
())
->
assert_is_op_output
(
"softmax"
,
"Out"
)
->
assert_is_op_output
(
"softmax"
,
"Out"
)
->
assert_is_op_input
(
"mat
mul_v2
"
,
"X"
)
->
assert_is_op_input
(
"mat
rix_multiply
"
,
"X"
)
->
AsIntermediate
();
->
AsIntermediate
();
auto
matmul2_op
=
auto
matmul2_op
=
pattern
->
NewNode
(
matmul2_op_repr
())
->
assert_is_op
(
"mat
mul_v2
"
);
pattern
->
NewNode
(
matmul2_op_repr
())
->
assert_is_op
(
"mat
rix_multiply
"
);
auto
matmul2_out
=
pattern
->
NewNode
(
matmul2_out_repr
())
auto
matmul2_out
=
pattern
->
NewNode
(
matmul2_out_repr
())
->
assert_is_op_output
(
"mat
mul_v2
"
,
"Out"
)
->
assert_is_op_output
(
"mat
rix_multiply
"
,
"Out"
)
->
assert_is_op_input
(
"transpose2"
,
"X"
)
->
assert_is_op_input
(
"transpose2"
,
"X"
)
->
AsIntermediate
();
->
AsIntermediate
();
...
@@ -4452,6 +4452,16 @@ PDNode *patterns::FusedFeedForwardBwd::operator()(
...
@@ -4452,6 +4452,16 @@ PDNode *patterns::FusedFeedForwardBwd::operator()(
return
out_grad
;
return
out_grad
;
}
}
void
patterns
::
MulMatmulMatmulV2
::
operator
()(
const
std
::
unordered_set
<
std
::
string
>
&
ops_type
)
{
auto
ops
=
pattern
->
NewNode
(
ops_repr
())
->
assert_is_ops
(
ops_type
);
auto
ops_out
=
pattern
->
NewNode
(
ops_out_repr
())
->
AsOutput
()
->
assert_is_ops_output
(
ops_type
,
"Out"
);
ops
->
LinksTo
({
ops_out
});
}
}
// namespace ir
}
// namespace ir
}
// namespace framework
}
// namespace framework
}
// namespace paddle
}
// namespace paddle
paddle/fluid/framework/ir/graph_pattern_detector.h
浏览文件 @
ef734e84
...
@@ -2146,6 +2146,17 @@ struct MergeLayernormPattern : public PatternBase {
...
@@ -2146,6 +2146,17 @@ struct MergeLayernormPattern : public PatternBase {
PATTERN_DECL_NODE
(
layernorm_40_out
);
PATTERN_DECL_NODE
(
layernorm_40_out
);
};
};
// MulMatmulMatmulV2: ops(mul, matmul, matmul_v2)
// Forward pass for ops(mul, matmul, matmul_v2) convert to matrix_multiply.
struct
MulMatmulMatmulV2
:
public
PatternBase
{
MulMatmulMatmulV2
(
PDPattern
*
pattern
,
const
std
::
string
&
name_scope
)
:
PatternBase
(
pattern
,
name_scope
,
"mul_matmul_matmul_v2"
)
{}
void
operator
()(
const
std
::
unordered_set
<
std
::
string
>&
ops_type
);
PATTERN_DECL_NODE
(
ops
);
PATTERN_DECL_NODE
(
ops_out
);
};
// Add support int8 flag
// Add support int8 flag
struct
AddSupportInt8
:
public
PatternBase
{
struct
AddSupportInt8
:
public
PatternBase
{
AddSupportInt8
(
PDPattern
*
pattern
,
const
std
::
string
&
name_scope
)
AddSupportInt8
(
PDPattern
*
pattern
,
const
std
::
string
&
name_scope
)
...
...
paddle/fluid/framework/ir/multihead_matmul_roformer_fuse_pass.cc
浏览文件 @
ef734e84
...
@@ -37,7 +37,7 @@ static void ReplaceOutputVar(Node* op, Node* old_var, Node* new_var) {
...
@@ -37,7 +37,7 @@ static void ReplaceOutputVar(Node* op, Node* old_var, Node* new_var) {
}
}
PDNode
*
MultiHeadMatmulRoformerPattern
::
operator
()()
{
PDNode
*
MultiHeadMatmulRoformerPattern
::
operator
()()
{
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
mul"
,
"matmul_v2
"
};
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
rix_multiply
"
};
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
input0
->
assert_is_ops_input
(
matmul_ops
);
input0
->
assert_is_ops_input
(
matmul_ops
);
...
@@ -313,23 +313,6 @@ PDNode* MultiHeadMatmulRoformerPattern::operator()() {
...
@@ -313,23 +313,6 @@ PDNode* MultiHeadMatmulRoformerPattern::operator()() {
}
// namespace patterns
}
// namespace patterns
MultiHeadMatmulRoformerFusePass
::
MultiHeadMatmulRoformerFusePass
()
{
MultiHeadMatmulRoformerFusePass
::
MultiHeadMatmulRoformerFusePass
()
{
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
// the shape shoule be (N*H, N*H)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumEQ
(
2
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
AddOpCompat
(
OpCompat
(
"elementwise_add"
))
AddOpCompat
(
OpCompat
(
"elementwise_add"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
// in bias, shape is (B, S, N*H),
// in bias, shape is (B, S, N*H),
...
@@ -394,43 +377,6 @@ MultiHeadMatmulRoformerFusePass::MultiHeadMatmulRoformerFusePass() {
...
@@ -394,43 +377,6 @@ MultiHeadMatmulRoformerFusePass::MultiHeadMatmulRoformerFusePass() {
// QK (B, H, S, N)*(B, H, S, N) -> (B, H, S, S)
// QK (B, H, S, N)*(B, H, S, N) -> (B, H, S, S)
// QKV (B, H, S, S)*(B, H, S, N) -> (B, H, S, N)
// QKV (B, H, S, S)*(B, H, S, N) -> (B, H, S, N)
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsType
<
float
>
()
// QK(anyvalue, will copy to new op) QKV(1.0)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
// QK(true) QKV(false)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"matmul_v2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"trans_x"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"trans_y"
)
// QK(true) QKV(false)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"softmax"
))
AddOpCompat
(
OpCompat
(
"softmax"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
.
IsTensor
()
.
IsTensor
()
...
@@ -825,6 +771,4 @@ REGISTER_PASS_CAPABILITY(multihead_matmul_roformer_fuse_pass)
...
@@ -825,6 +771,4 @@ REGISTER_PASS_CAPABILITY(multihead_matmul_roformer_fuse_pass)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"scale"
,
0
)
.
EQ
(
"scale"
,
0
)
.
LE
(
"matmul"
,
1
)
.
EQ
(
"matmul_v2"
,
0
)
.
EQ
(
"softmax"
,
0
));
.
EQ
(
"softmax"
,
0
));
paddle/fluid/framework/ir/quant_conv2d_dequant_fuse_pass.cc
浏览文件 @
ef734e84
...
@@ -202,77 +202,6 @@ QuantDequantFusePass::QuantDequantFusePass() {
...
@@ -202,77 +202,6 @@ QuantDequantFusePass::QuantDequantFusePass() {
.
AddAttr
(
"data_format"
)
.
AddAttr
(
"data_format"
)
.
IsStringIn
({
"NCHW"
,
"NHWC"
,
"AnyLayout"
})
.
IsStringIn
({
"NCHW"
,
"NHWC"
,
"AnyLayout"
})
.
End
();
.
End
();
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumGE
(
1
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
AddOpCompat
(
OpCompat
(
"matmul_v2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"trans_x"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"trans_y"
)
.
IsBoolEQ
(
false
)
.
End
();
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsNumGE
(
0.99
f
)
.
IsNumLE
(
1.01
f
)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
.
IsBoolEQ
(
false
)
.
End
();
AddOpCompat
(
OpCompat
(
"fc"
))
.
AddInput
(
"Input"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"W"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Bias"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"in_num_col_dims"
)
.
IsNumGE
(
1
)
.
End
()
.
AddAttr
(
"activation_type"
)
.
IsStringIn
({
"relu"
,
""
})
.
End
();
AddOpCompat
(
OpCompat
(
"conv2d_transpose"
))
AddOpCompat
(
OpCompat
(
"conv2d_transpose"
))
.
AddInput
(
"Input"
)
.
AddInput
(
"Input"
)
.
IsTensor
()
.
IsTensor
()
...
@@ -379,10 +308,8 @@ void QuantDequantFusePass::DeleteQuant(ir::Graph* graph,
...
@@ -379,10 +308,8 @@ void QuantDequantFusePass::DeleteQuant(ir::Graph* graph,
if
(
quantized_op_type
==
"conv2d"
||
if
(
quantized_op_type
==
"conv2d"
||
quantized_op_type
==
"conv2d_fusion"
||
quantized_op_type
==
"conv2d_fusion"
||
quantized_op_type
==
"depthwise_conv2d"
||
quantized_op_type
==
"depthwise_conv2d"
||
quantized_op_type
==
"fc"
||
quantized_op_type
==
"conv2d_transpose"
||
quantized_op_type
==
"conv2d_transpose"
||
quantized_op_type
==
"mul"
||
quantized_op_type
==
"matmul"
||
quantized_op_type
==
"matrix_multiply"
)
{
quantized_op_type
==
"matmul_v2"
)
{
op_desc
->
SetAttr
(
"Input_scale"
,
scale_value
);
op_desc
->
SetAttr
(
"Input_scale"
,
scale_value
);
}
else
{
}
else
{
PADDLE_THROW
(
platform
::
errors
::
Unimplemented
(
PADDLE_THROW
(
platform
::
errors
::
Unimplemented
(
...
@@ -416,17 +343,14 @@ void QuantDequantFusePass::FuseDequant(ir::Graph* graph,
...
@@ -416,17 +343,14 @@ void QuantDequantFusePass::FuseDequant(ir::Graph* graph,
quantized_op_type
==
"conv2d_transpose"
)
{
quantized_op_type
==
"conv2d_transpose"
)
{
weight_name
=
"Filter"
;
weight_name
=
"Filter"
;
input_name
=
"Input"
;
input_name
=
"Input"
;
}
else
if
(
quantized_op_type
==
"mul"
||
quantized_op_type
==
"matmul"
||
}
else
if
(
quantized_op_type
==
"matrix_multiply"
)
{
quantized_op_type
==
"matmul_v2"
)
{
weight_name
=
"Y"
;
weight_name
=
"Y"
;
input_name
=
"X"
;
input_name
=
"X"
;
}
else
if
(
quantized_op_type
==
"fc"
)
{
weight_name
=
"W"
;
input_name
=
"Input"
;
}
else
{
}
else
{
PADDLE_THROW
(
platform
::
errors
::
Unimplemented
(
PADDLE_THROW
(
platform
::
errors
::
Unimplemented
(
"QuantDequantFuse: We only support conv2d, conv2d_fusion, fused_conv2d,"
"QuantDequantFuse: We only support conv2d, conv2d_fusion, fused_conv2d,"
"conv2d_transpose, fc, mul, matmul, matmul_v2 for now, but received: "
"conv2d_transpose, matrix_multiply(mul/matmul/matmul_v2) for now, but "
"received: "
"%s."
,
"%s."
,
quantized_op_type
));
quantized_op_type
));
}
}
...
@@ -514,16 +438,16 @@ void QuantDequantFusePass::FuseDequant(ir::Graph* graph,
...
@@ -514,16 +438,16 @@ void QuantDequantFusePass::FuseDequant(ir::Graph* graph,
// re-write it again when this weight tensor is shared among many ops.
// re-write it again when this weight tensor is shared among many ops.
if
(
!
quantized_op_weight_node_set
.
count
(
quantized_op_weight_node
))
{
if
(
!
quantized_op_weight_node_set
.
count
(
quantized_op_weight_node
))
{
quantized_op_weight_node_set
.
insert
(
quantized_op_weight_node
);
quantized_op_weight_node_set
.
insert
(
quantized_op_weight_node
);
// If quantized op is
fc
, weight scale size = 1;
// If quantized op is
matrix_multiply
, weight scale size = 1;
// If quantized op is conv2d, weight scale size = weight dims[0]
// If quantized op is conv2d, weight scale size = weight dims[0]
// If quantized op is conv2d_transpose, weight scale size = weight dims[1]
// If quantized op is conv2d_transpose, weight scale size = weight dims[1]
if
(
quantized_op_type
==
"mul"
||
quantized_op_type
==
"matmul"
||
if
(
quantized_op_type
==
"matrix_multiply"
)
{
quantized_op_type
==
"matmul_v2"
||
quantized_op_type
==
"fc"
)
{
if
(
dequant_type
==
"fake_dequantize_max_abs"
)
{
if
(
dequant_type
==
"fake_dequantize_max_abs"
)
{
PADDLE_ENFORCE_EQ
(
weight_scale
.
size
(),
PADDLE_ENFORCE_EQ
(
weight_scale
.
size
(),
1
,
1
,
platform
::
errors
::
InvalidArgument
(
platform
::
errors
::
InvalidArgument
(
"mul/matmul/matmul_v2 op weight dequantized by "
"matrix_multiply(mul/matmul/matmul_v2) op "
"weight dequantized by "
"[fake_dequantize_max_abs] "
"[fake_dequantize_max_abs] "
"requires weight scale size = 1, but got %d."
,
"requires weight scale size = 1, but got %d."
,
weight_scale
.
size
()));
weight_scale
.
size
()));
...
@@ -538,19 +462,22 @@ void QuantDequantFusePass::FuseDequant(ir::Graph* graph,
...
@@ -538,19 +462,22 @@ void QuantDequantFusePass::FuseDequant(ir::Graph* graph,
quant_axis
==
1
,
quant_axis
==
1
,
true
,
true
,
platform
::
errors
::
InvalidArgument
(
platform
::
errors
::
InvalidArgument
(
"'quant_axis' of mul/matmul/fc/matmul_v2 op weight "
"'quant_axis' of matrix_multiply(mul/matmul/matmul_v2) op "
"weight "
"dequantized by "
"dequantized by "
"[fake_channel_wise_dequantize_max_abs]should be 1, but "
"[fake_channel_wise_dequantize_max_abs]should be 1, but "
"the received is %d"
,
"the received is %d"
,
quant_axis
));
quant_axis
));
}
}
PADDLE_ENFORCE_EQ
(
weight_scale
.
size
(),
PADDLE_ENFORCE_EQ
(
weight_scale
.
size
(),
static_cast
<
size_t
>
(
w_dims
[
1
]),
static_cast
<
size_t
>
(
w_dims
[
1
]),
platform
::
errors
::
InvalidArgument
(
platform
::
errors
::
InvalidArgument
(
"mul/matmul/matmul_v2 op weight dequantized by "
"matrix_multiply(mul/matmul/matmul_v2) op weight dequantized "
"by "
"[fake_channel_wise_dequantize_max_abs] "
"[fake_channel_wise_dequantize_max_abs] "
"requires weight scale "
"requires weight scale "
"size = 2nd dim of mul/matmul/matmul_v2
's "
"size = 2nd dim of matrix_multiply(mul/matmul/matmul_v2)
's "
"weight, which is %d, "
"weight, which is %d, "
"but got "
"but got "
"%d."
,
"%d."
,
...
@@ -650,11 +577,7 @@ void QuantDequantFusePass::FuseDequant(ir::Graph* graph,
...
@@ -650,11 +577,7 @@ void QuantDequantFusePass::FuseDequant(ir::Graph* graph,
quantized_op_type
==
"conv2d_transpose"
)
{
quantized_op_type
==
"conv2d_transpose"
)
{
new_op_desc
.
SetInput
(
"Input"
,
{
new_input
});
new_op_desc
.
SetInput
(
"Input"
,
{
new_input
});
new_op_desc
.
SetOutput
(
"Output"
,
{
new_output
});
new_op_desc
.
SetOutput
(
"Output"
,
{
new_output
});
}
else
if
(
quantized_op_type
==
"fc"
)
{
}
else
if
(
quantized_op_type
==
"matrix_multiply"
)
{
new_op_desc
.
SetInput
(
"Input"
,
{
new_input
});
new_op_desc
.
SetOutput
(
"Out"
,
{
new_output
});
}
else
if
(
quantized_op_type
==
"mul"
||
quantized_op_type
==
"matmul"
||
quantized_op_type
==
"matmul_v2"
)
{
new_op_desc
.
SetInput
(
"X"
,
{
new_input
});
new_op_desc
.
SetInput
(
"X"
,
{
new_input
});
new_op_desc
.
SetOutput
(
"Out"
,
{
new_output
});
new_op_desc
.
SetOutput
(
"Out"
,
{
new_output
});
}
}
...
@@ -682,12 +605,9 @@ void QuantDequantFusePass::ApplyImpl(ir::Graph* graph) const {
...
@@ -682,12 +605,9 @@ void QuantDequantFusePass::ApplyImpl(ir::Graph* graph) const {
std
::
unordered_set
<
std
::
string
>
quantized_op_types
=
{
std
::
unordered_set
<
std
::
string
>
quantized_op_types
=
{
"conv2d"
,
"conv2d"
,
"fused_conv2d"
,
"fused_conv2d"
,
"mul"
,
"matrix_multiply"
,
"matmul"
,
"depthwise_conv2d"
,
"depthwise_conv2d"
,
"conv2d_transpose"
,
"conv2d_transpose"
,
"fc"
,
"matmul_v2"
,
};
};
auto
*
scope
=
param_scope
();
auto
*
scope
=
param_scope
();
...
@@ -712,7 +632,6 @@ REGISTER_PASS_CAPABILITY(quant_conv2d_dequant_fuse_pass)
...
@@ -712,7 +632,6 @@ REGISTER_PASS_CAPABILITY(quant_conv2d_dequant_fuse_pass)
.
AddCombination
(
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
LE
(
"conv2d"
,
1
)
.
LE
(
"conv2d"
,
1
)
.
EQ
(
"fc"
,
0
)
.
LE
(
"conv2d_transpose"
,
2
)
.
LE
(
"conv2d_transpose"
,
2
)
.
EQ
(
"fake_quantize_abs_max"
,
0
)
.
EQ
(
"fake_quantize_abs_max"
,
0
)
.
EQ
(
"fake_quantize_range_abs_max"
,
0
)
.
EQ
(
"fake_quantize_range_abs_max"
,
0
)
...
...
paddle/fluid/framework/ir/quant_conv2d_dequant_fuse_pass.h
浏览文件 @
ef734e84
...
@@ -22,7 +22,7 @@ namespace framework {
...
@@ -22,7 +22,7 @@ namespace framework {
namespace
ir
{
namespace
ir
{
///
///
/// Fuse quant + conv2d/depthwise_conv2d/m
ul/fc
+ dequant
/// Fuse quant + conv2d/depthwise_conv2d/m
atrix_multiply
+ dequant
///
///
class
QuantDequantFusePass
:
public
FusePassBase
{
class
QuantDequantFusePass
:
public
FusePassBase
{
public:
public:
...
...
paddle/fluid/framework/ir/remove_padding_recover_padding_pass.cc
浏览文件 @
ef734e84
...
@@ -110,12 +110,14 @@ void MultiheadMatmul::operator()() {
...
@@ -110,12 +110,14 @@ void MultiheadMatmul::operator()() {
.
LinksTo
({
multihead_matmul_out
});
.
LinksTo
({
multihead_matmul_out
});
}
}
void
Fc
::
operator
()()
{
void
MatrixMultiply
::
operator
()()
{
// Create nodes for fc.
// Create nodes for matrix_multiply.
auto
*
fc_input
=
auto
*
matrix_multiply_input
=
pattern
->
NewNode
(
fc_input_repr
())
->
assert_is_op_input
(
"fc"
,
"Input"
);
pattern
->
NewNode
(
matrix_multiply_input_repr
())
auto
*
fc_op
=
pattern
->
NewNode
(
fc_op_repr
())
->
assert_is_op
(
"fc"
);
->
assert_is_op_input
(
"matrix_multiply"
,
"X"
);
fc_op
->
LinksFrom
({
fc_input
});
auto
*
matrix_multiply_op
=
pattern
->
NewNode
(
matrix_multiply_op_repr
())
->
assert_is_op
(
"matrix_multiply"
);
matrix_multiply_op
->
LinksFrom
({
matrix_multiply_input
});
}
}
void
Activation
::
operator
()()
{
void
Activation
::
operator
()()
{
...
@@ -146,6 +148,19 @@ void FusedTokenPrune::operator()() {
...
@@ -146,6 +148,19 @@ void FusedTokenPrune::operator()() {
fused_token_prune_op
->
LinksFrom
({
fused_token_prune_input
})
fused_token_prune_op
->
LinksFrom
({
fused_token_prune_input
})
.
LinksTo
({
fused_token_prune_output
});
.
LinksTo
({
fused_token_prune_output
});
}
}
void
ElementWise
::
operator
()()
{
// Create nodes for elementwise.
auto
*
elementwise_input
=
pattern
->
NewNode
(
elementwise_input_repr
())
->
assert_is_op_input
(
"elementwise_add"
,
"X"
);
auto
*
elementwise_op
=
pattern
->
NewNode
(
elementwise_op_repr
())
->
assert_is_op
(
"elementwise_add"
);
auto
*
elementwise_out
=
pattern
->
NewNode
(
elementwise_out_repr
())
->
assert_is_op_output
(
"elementwise_add"
);
// Add links for elementwise op.
elementwise_op
->
LinksFrom
({
elementwise_input
}).
LinksTo
({
elementwise_out
});
}
}
// namespace patterns
}
// namespace patterns
void
RemovePaddingRecoverPaddingPass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
void
RemovePaddingRecoverPaddingPass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
...
@@ -400,38 +415,45 @@ void RemovePaddingRecoverPaddingPass::ApplyImpl(ir::Graph* graph) const {
...
@@ -400,38 +415,45 @@ void RemovePaddingRecoverPaddingPass::ApplyImpl(ir::Graph* graph) const {
gpd2
(
graph
,
handler2
);
gpd2
(
graph
,
handler2
);
GraphPatternDetector
gpd3
;
GraphPatternDetector
gpd3
;
patterns
::
Fc
fc
(
gpd3
.
mutable_pattern
(),
patterns
::
MatrixMultiply
matrix_multiply
(
"remove_padding_recover_padding_pass"
);
gpd3
.
mutable_pattern
(),
"remove_padding_recover_padding_pass"
);
fc
();
matrix_multiply
();
auto
handler3
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
auto
handler3
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
graph
)
{
Graph
*
graph
)
{
VLOG
(
3
)
<<
"remove_padding_recover_padding_pass for transformer: fc"
;
VLOG
(
3
)
<<
"remove_padding_recover_padding_pass for transformer: "
"matrix_multiply"
;
GET_IR_NODE_FROM_SUBGRAPH
(
fc_input
,
fc_input
,
fc
);
GET_IR_NODE_FROM_SUBGRAPH
(
GET_IR_NODE_FROM_SUBGRAPH
(
fc_op
,
fc_op
,
fc
);
matrix_multiply_input
,
matrix_multiply_input
,
matrix_multiply
);
GET_IR_NODE_FROM_SUBGRAPH
(
matrix_multiply_op
,
matrix_multiply_op
,
matrix_multiply
);
std
::
vector
<
int64_t
>
fc_input_shape
=
fc_input
->
Var
()
->
GetShape
();
std
::
vector
<
int64_t
>
matrix_multiply_input_shape
=
matrix_multiply_input
->
Var
()
->
GetShape
();
check_flag
=
true
;
check_flag
=
true
;
if
((
fc_input_shape
.
size
()
!=
multihead_matmul_input_shape
.
size
())
||
if
((
matrix_multiply_input_shape
.
size
()
!=
(
fc_input_shape
.
size
()
!=
3
))
{
multihead_matmul_input_shape
.
size
())
||
(
matrix_multiply_input_shape
.
size
()
!=
3
))
{
check_flag
=
false
;
check_flag
=
false
;
VLOG
(
3
)
<<
"Transformer model remove_padding shape check failed, return "
VLOG
(
3
)
<<
"Transformer model remove_padding shape check failed, return "
"remove_padding pass."
;
"remove_padding pass."
;
return
;
return
;
}
}
if
(
fc
_input_shape
[
0
]
!=
multihead_matmul_input_shape
[
0
])
{
if
(
matrix_multiply
_input_shape
[
0
]
!=
multihead_matmul_input_shape
[
0
])
{
check_flag
=
false
;
check_flag
=
false
;
}
}
if
(
fc
_input_shape
[
1
]
!=
multihead_matmul_input_shape
[
1
])
{
if
(
matrix_multiply
_input_shape
[
1
]
!=
multihead_matmul_input_shape
[
1
])
{
check_flag
=
false
;
check_flag
=
false
;
}
}
if
((
fc_input_shape
[
2
]
!=
multihead_matmul_input_shape
[
2
])
&&
if
((
matrix_multiply_input_shape
[
2
]
!=
multihead_matmul_input_shape
[
2
])
&&
(
fc_input_shape
[
2
]
!=
4
*
multihead_matmul_input_shape
[
2
]))
{
(
matrix_multiply_input_shape
[
2
]
!=
4
*
multihead_matmul_input_shape
[
2
]))
{
check_flag
=
false
;
check_flag
=
false
;
}
}
if
(
PADDLE_GET_CONST
(
int
,
fc_op
->
Op
()
->
GetAttr
(
"in_num_col_dims"
))
!=
2
)
{
if
(
PADDLE_GET_CONST
(
int
,
matrix_multiply_op
->
Op
()
->
GetAttr
(
"x_num_col_dims"
))
!=
2
)
{
check_flag
=
false
;
check_flag
=
false
;
}
}
if
(
!
check_flag
)
{
if
(
!
check_flag
)
{
...
@@ -439,8 +461,13 @@ void RemovePaddingRecoverPaddingPass::ApplyImpl(ir::Graph* graph) const {
...
@@ -439,8 +461,13 @@ void RemovePaddingRecoverPaddingPass::ApplyImpl(ir::Graph* graph) const {
"remove_padding pass."
;
"remove_padding pass."
;
return
;
return
;
}
}
insert_remove_padding_op
(
fc_input
,
fc_op
);
insert_recover_padding_op
(
fc_op
,
fc_op
->
outputs
[
0
]);
matrix_multiply_op
->
Op
()
->
RemoveAttr
(
"x_num_col_dims"
);
matrix_multiply_op
->
Op
()
->
SetAttr
(
"x_num_col_dims"
,
1
);
insert_remove_padding_op
(
matrix_multiply_input
,
matrix_multiply_op
);
insert_recover_padding_op
(
matrix_multiply_op
,
matrix_multiply_op
->
outputs
[
0
]);
found_subgraph_count
++
;
found_subgraph_count
++
;
};
};
gpd3
(
graph
,
handler3
);
gpd3
(
graph
,
handler3
);
...
@@ -617,6 +644,57 @@ void RemovePaddingRecoverPaddingPass::ApplyImpl(ir::Graph* graph) const {
...
@@ -617,6 +644,57 @@ void RemovePaddingRecoverPaddingPass::ApplyImpl(ir::Graph* graph) const {
};
};
gpd7
(
graph
,
handler7
);
gpd7
(
graph
,
handler7
);
// Removed fc_add fuse, elementwise can be used by the optimized model
GraphPatternDetector
gpd8
;
patterns
::
ElementWise
elementwise
(
gpd8
.
mutable_pattern
(),
"remove_padding_recover_padding_pass"
);
elementwise
();
auto
handler8
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
graph
)
{
VLOG
(
3
)
<<
"remove_padding_recover_padding_pass for transformer: "
"elementwise"
;
GET_IR_NODE_FROM_SUBGRAPH
(
elementwise_input
,
elementwise_input
,
elementwise
);
GET_IR_NODE_FROM_SUBGRAPH
(
elementwise_op
,
elementwise_op
,
elementwise
);
GET_IR_NODE_FROM_SUBGRAPH
(
elementwise_out
,
elementwise_out
,
elementwise
);
std
::
vector
<
int64_t
>
elementwise_input_shape
=
elementwise_input
->
Var
()
->
GetShape
();
check_flag
=
true
;
if
(
elementwise_input_shape
.
size
()
!=
multihead_matmul_input_shape
.
size
())
{
check_flag
=
false
;
VLOG
(
3
)
<<
"Transformer model remove_padding shape check failed, return "
"remove_padding pass."
;
return
;
}
if
(
elementwise_input_shape
[
0
]
!=
multihead_matmul_input_shape
[
0
])
{
check_flag
=
false
;
}
if
(
elementwise_input_shape
[
1
]
!=
multihead_matmul_input_shape
[
1
])
{
check_flag
=
false
;
}
if
((
elementwise_input_shape
[
2
]
!=
multihead_matmul_input_shape
[
2
])
&&
(
elementwise_input_shape
[
2
]
!=
4
*
multihead_matmul_input_shape
[
2
]))
{
check_flag
=
false
;
}
if
(
!
check_flag
)
{
VLOG
(
3
)
<<
"Transformer model remove_padding shape check failed, return "
"remove_padding pass."
;
return
;
}
elementwise_op
->
Op
()
->
RemoveAttr
(
"axis"
);
elementwise_op
->
Op
()
->
SetAttr
(
"axis"
,
1
);
insert_remove_padding_op
(
elementwise_input
,
elementwise_op
);
insert_recover_padding_op
(
elementwise_op
,
elementwise_out
);
found_subgraph_count
++
;
};
gpd8
(
graph
,
handler8
);
AddStatis
(
found_subgraph_count
);
AddStatis
(
found_subgraph_count
);
}
}
...
...
paddle/fluid/framework/ir/remove_padding_recover_padding_pass.h
浏览文件 @
ef734e84
...
@@ -87,14 +87,14 @@ struct MultiheadMatmul : public PatternBase {
...
@@ -87,14 +87,14 @@ struct MultiheadMatmul : public PatternBase {
PATTERN_DECL_NODE
(
multihead_matmul_out
);
PATTERN_DECL_NODE
(
multihead_matmul_out
);
};
};
struct
Fc
:
public
PatternBase
{
struct
MatrixMultiply
:
public
PatternBase
{
Fc
(
PDPattern
*
pattern
,
const
std
::
string
&
name_scope
)
MatrixMultiply
(
PDPattern
*
pattern
,
const
std
::
string
&
name_scope
)
:
PatternBase
(
pattern
,
name_scope
,
"
fc
"
)
{}
:
PatternBase
(
pattern
,
name_scope
,
"
matrix_multiply
"
)
{}
void
operator
()();
void
operator
()();
PATTERN_DECL_NODE
(
fc
_input
);
PATTERN_DECL_NODE
(
matrix_multiply
_input
);
PATTERN_DECL_NODE
(
fc
_op
);
PATTERN_DECL_NODE
(
matrix_multiply
_op
);
};
};
struct
Activation
:
public
PatternBase
{
struct
Activation
:
public
PatternBase
{
...
@@ -118,6 +118,17 @@ struct FusedTokenPrune : public PatternBase {
...
@@ -118,6 +118,17 @@ struct FusedTokenPrune : public PatternBase {
PATTERN_DECL_NODE
(
fused_token_prune_op
);
PATTERN_DECL_NODE
(
fused_token_prune_op
);
PATTERN_DECL_NODE
(
fused_token_prune_output
);
PATTERN_DECL_NODE
(
fused_token_prune_output
);
};
};
struct
ElementWise
:
public
PatternBase
{
ElementWise
(
PDPattern
*
pattern
,
const
std
::
string
&
name_scope
)
:
PatternBase
(
pattern
,
name_scope
,
"elementwise"
)
{}
void
operator
()();
PATTERN_DECL_NODE
(
elementwise_input
);
PATTERN_DECL_NODE
(
elementwise_op
);
PATTERN_DECL_NODE
(
elementwise_out
);
};
}
// namespace patterns
}
// namespace patterns
class
RemovePaddingRecoverPaddingPass
:
public
FusePassBase
{
class
RemovePaddingRecoverPaddingPass
:
public
FusePassBase
{
...
...
paddle/fluid/framework/ir/trt_cross_multihead_matmul_fuse_pass.cc
浏览文件 @
ef734e84
...
@@ -64,8 +64,8 @@ namespace patterns {
...
@@ -64,8 +64,8 @@ namespace patterns {
// output
// output
PDNode
*
TrtCrossMultiHeadMatmulPattern
::
operator
()()
{
PDNode
*
TrtCrossMultiHeadMatmulPattern
::
operator
()()
{
std
::
unordered_set
<
std
::
string
>
mul_ops
{
"m
ul"
,
"matmul_v2
"
};
std
::
unordered_set
<
std
::
string
>
mul_ops
{
"m
atrix_multiply
"
};
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
mul"
,
"matmul_v2
"
};
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
rix_multiply
"
};
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
auto
*
input1
=
pattern
->
NewNode
(
input1_repr
());
auto
*
input1
=
pattern
->
NewNode
(
input1_repr
());
...
@@ -210,23 +210,6 @@ PDNode* TrtCrossMultiHeadMatmulPattern::operator()() {
...
@@ -210,23 +210,6 @@ PDNode* TrtCrossMultiHeadMatmulPattern::operator()() {
}
// namespace patterns
}
// namespace patterns
TrtCrossMultiHeadMatmulFusePass
::
TrtCrossMultiHeadMatmulFusePass
()
{
TrtCrossMultiHeadMatmulFusePass
::
TrtCrossMultiHeadMatmulFusePass
()
{
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
// the shape shoule be (N*H, N*H)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumEQ
(
2
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
AddOpCompat
(
OpCompat
(
"reshape2"
))
AddOpCompat
(
OpCompat
(
"reshape2"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
.
IsTensor
()
.
IsTensor
()
...
@@ -269,43 +252,6 @@ TrtCrossMultiHeadMatmulFusePass::TrtCrossMultiHeadMatmulFusePass() {
...
@@ -269,43 +252,6 @@ TrtCrossMultiHeadMatmulFusePass::TrtCrossMultiHeadMatmulFusePass() {
// QK (B, H, S, N)*(B, H, S, N) -> (B, H, S, S)
// QK (B, H, S, N)*(B, H, S, N) -> (B, H, S, S)
// QKV (B, H, S, S)*(B, H, S, N) -> (B, H, S, N)
// QKV (B, H, S, S)*(B, H, S, N) -> (B, H, S, N)
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsType
<
float
>
()
// QK(anyvalue, will copy to new op) QKV(1.0)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
// QK(true) QKV(false)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"matmul_v2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"trans_x"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"trans_y"
)
// QK(true) QKV(false)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"softmax"
))
AddOpCompat
(
OpCompat
(
"softmax"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
.
IsTensor
()
.
IsTensor
()
...
@@ -584,11 +530,8 @@ REGISTER_PASS(trt_cross_multihead_matmul_fuse_pass,
...
@@ -584,11 +530,8 @@ REGISTER_PASS(trt_cross_multihead_matmul_fuse_pass,
REGISTER_PASS_CAPABILITY
(
trt_cross_multihead_matmul_fuse_pass
)
REGISTER_PASS_CAPABILITY
(
trt_cross_multihead_matmul_fuse_pass
)
.
AddCombination
(
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
EQ
(
"mul"
,
0
)
.
LE
(
"elementwise_add"
,
1
)
.
LE
(
"elementwise_add"
,
1
)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"scale"
,
0
)
.
EQ
(
"scale"
,
0
)
.
LE
(
"matmul"
,
1
)
.
EQ
(
"matmul_v2"
,
0
)
.
EQ
(
"softmax"
,
0
));
.
EQ
(
"softmax"
,
0
));
paddle/fluid/framework/ir/trt_delete_weight_dequant_linear_op_pass.cc
浏览文件 @
ef734e84
...
@@ -156,77 +156,6 @@ TrtDeleteWeightQuantDequantLinearOpPass::
...
@@ -156,77 +156,6 @@ TrtDeleteWeightQuantDequantLinearOpPass::
.
AddAttr
(
"data_format"
)
.
AddAttr
(
"data_format"
)
.
IsStringIn
({
"NCHW"
,
"NHWC"
,
"AnyLayout"
})
.
IsStringIn
({
"NCHW"
,
"NHWC"
,
"AnyLayout"
})
.
End
();
.
End
();
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumGE
(
1
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
AddOpCompat
(
OpCompat
(
"matmul_v2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"trans_x"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"trans_y"
)
.
IsBoolEQ
(
false
)
.
End
();
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsNumGE
(
0.99
f
)
.
IsNumLE
(
1.01
f
)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
.
IsBoolEQ
(
false
)
.
End
();
AddOpCompat
(
OpCompat
(
"fc"
))
.
AddInput
(
"Input"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"W"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Bias"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"in_num_col_dims"
)
.
IsNumGE
(
1
)
.
End
()
.
AddAttr
(
"activation_type"
)
.
IsStringIn
({
"relu"
,
""
})
.
End
();
AddOpCompat
(
OpCompat
(
"conv2d_transpose"
))
AddOpCompat
(
OpCompat
(
"conv2d_transpose"
))
.
AddInput
(
"Input"
)
.
AddInput
(
"Input"
)
.
IsTensor
()
.
IsTensor
()
...
...
paddle/fluid/framework/ir/trt_flash_multihead_matmul_fuse_pass.cc
浏览文件 @
ef734e84
...
@@ -65,8 +65,8 @@ namespace patterns {
...
@@ -65,8 +65,8 @@ namespace patterns {
// output
// output
PDNode
*
TrtFlashMultiHeadMatmulPattern
::
operator
()()
{
PDNode
*
TrtFlashMultiHeadMatmulPattern
::
operator
()()
{
std
::
unordered_set
<
std
::
string
>
mul_ops
{
"m
ul"
,
"matmul_v2
"
};
std
::
unordered_set
<
std
::
string
>
mul_ops
{
"m
atrix_multiply
"
};
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
mul"
,
"matmul_v2
"
};
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
rix_multiply
"
};
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
input0
->
assert_is_ops_input
(
mul_ops
);
input0
->
assert_is_ops_input
(
mul_ops
);
VLOG
(
5
)
<<
"Start match TrtFlashMultiHeadMatmulPattern"
;
VLOG
(
5
)
<<
"Start match TrtFlashMultiHeadMatmulPattern"
;
...
@@ -209,23 +209,6 @@ PDNode* TrtFlashMultiHeadMatmulPattern::operator()() {
...
@@ -209,23 +209,6 @@ PDNode* TrtFlashMultiHeadMatmulPattern::operator()() {
}
// namespace patterns
}
// namespace patterns
TrtFlashMultiHeadMatmulFusePass
::
TrtFlashMultiHeadMatmulFusePass
()
{
TrtFlashMultiHeadMatmulFusePass
::
TrtFlashMultiHeadMatmulFusePass
()
{
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
// the shape shoule be (N*H, N*H)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumEQ
(
2
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
AddOpCompat
(
OpCompat
(
"reshape2"
))
AddOpCompat
(
OpCompat
(
"reshape2"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
.
IsTensor
()
.
IsTensor
()
...
@@ -268,43 +251,6 @@ TrtFlashMultiHeadMatmulFusePass::TrtFlashMultiHeadMatmulFusePass() {
...
@@ -268,43 +251,6 @@ TrtFlashMultiHeadMatmulFusePass::TrtFlashMultiHeadMatmulFusePass() {
// QK (B, H, S, N)*(B, H, S, N) -> (B, H, S, S)
// QK (B, H, S, N)*(B, H, S, N) -> (B, H, S, S)
// QKV (B, H, S, S)*(B, H, S, N) -> (B, H, S, N)
// QKV (B, H, S, S)*(B, H, S, N) -> (B, H, S, N)
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsType
<
float
>
()
// QK(anyvalue, will copy to new op) QKV(1.0)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
// QK(true) QKV(false)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"matmul_v2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"trans_x"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"trans_y"
)
// QK(true) QKV(false)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"softmax"
))
AddOpCompat
(
OpCompat
(
"softmax"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
.
IsTensor
()
.
IsTensor
()
...
@@ -578,11 +524,8 @@ REGISTER_PASS(trt_flash_multihead_matmul_fuse_pass,
...
@@ -578,11 +524,8 @@ REGISTER_PASS(trt_flash_multihead_matmul_fuse_pass,
REGISTER_PASS_CAPABILITY
(
trt_flash_multihead_matmul_fuse_pass
)
REGISTER_PASS_CAPABILITY
(
trt_flash_multihead_matmul_fuse_pass
)
.
AddCombination
(
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
EQ
(
"mul"
,
0
)
.
LE
(
"elementwise_add"
,
1
)
.
LE
(
"elementwise_add"
,
1
)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"scale"
,
0
)
.
EQ
(
"scale"
,
0
)
.
LE
(
"matmul"
,
1
)
.
EQ
(
"matmul_v2"
,
0
)
.
EQ
(
"softmax"
,
0
));
.
EQ
(
"softmax"
,
0
));
paddle/fluid/framework/ir/trt_map_matmul_to_mul_pass.cc
已删除
100644 → 0
浏览文件 @
acf55016
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/trt_map_matmul_to_mul_pass.h"
#include <cmath>
#include <string>
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
#include "paddle/fluid/framework/op_proto_maker.h"
#include "paddle/fluid/framework/op_version_registry.h"
#include "paddle/fluid/platform/enforce.h"
namespace
paddle
{
namespace
framework
{
namespace
ir
{
class
Node
;
TrtMapMatmul2MulPass
::
TrtMapMatmul2MulPass
()
{
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsNumGE
(
0.99
f
)
.
IsNumLE
(
1.01
f
)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumGE
(
1
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
}
TrtMapMatmulV2ToMulPass
::
TrtMapMatmulV2ToMulPass
()
{
AddOpCompat
(
OpCompat
(
"matmul_v2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"trans_x"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"trans_y"
)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumGE
(
1
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
}
TrtMapMatmulV2ToMatmulPass
::
TrtMapMatmulV2ToMatmulPass
()
{
AddOpCompat
(
OpCompat
(
"matmul_v2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"trans_x"
)
.
IsType
<
bool
>
()
.
End
()
.
AddAttr
(
"trans_y"
)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsNumEQ
(
1.0
f
)
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsType
<
bool
>
()
.
End
()
.
AddAttr
(
"transpose_Y"
)
.
IsType
<
bool
>
()
.
End
();
}
TrtFlatten2MatmulFusePass
::
TrtFlatten2MatmulFusePass
()
{
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsNumGE
(
0.99
f
)
.
IsNumLE
(
1.01
f
)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
.
IsBoolEQ
(
false
)
.
End
();
AddOpCompat
(
OpCompat
(
"flatten2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"XShape"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"axis"
)
.
IsNumEQ
(
1
)
.
End
();
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumGE
(
1
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
}
TrtSqueeze2MatmulFusePass
::
TrtSqueeze2MatmulFusePass
()
{
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsNumGE
(
0.99
f
)
.
IsNumLE
(
1.01
f
)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
.
IsBoolEQ
(
false
)
.
End
();
AddOpCompat
(
OpCompat
(
"squeeze2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"XShape"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"axes"
)
.
IsType
<
std
::
vector
<
int
>>
()
.
End
();
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
}
void
TrtMapMatmul2MulPass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
PADDLE_ENFORCE_NOT_NULL
(
graph
,
platform
::
errors
::
InvalidArgument
(
"Graph cannot be nullptr."
));
std
::
string
name_scope
=
"trt_map_matmul_to_mul_pass"
;
FusePassBase
::
Init
(
name_scope
,
graph
);
GraphPatternDetector
gpd
;
patterns
::
Matmul
matmul_pattern
(
gpd
.
mutable_pattern
(),
name_scope
);
matmul_pattern
();
int
found_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
VLOG
(
4
)
<<
"trt map matmul to mul"
;
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_in_x
,
matmul_in_x
,
matmul_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_in_y
,
matmul_in_y
,
matmul_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_op
,
matmul_op
,
matmul_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_out
,
matmul_out
,
matmul_pattern
);
bool
flag
=
true
;
bool
transpose_X
=
PADDLE_GET_CONST
(
bool
,
matmul_op
->
Op
()
->
GetAttr
(
"transpose_X"
));
float
alpha
=
PADDLE_GET_CONST
(
float
,
matmul_op
->
Op
()
->
GetAttr
(
"alpha"
));
flag
=
flag
&&
!
transpose_X
&&
std
::
abs
(
alpha
-
1.0
)
<
1e-5
;
std
::
vector
<
int64_t
>
x_shape
=
matmul_in_x
->
Var
()
->
GetShape
();
std
::
vector
<
int64_t
>
y_shape
=
matmul_in_y
->
Var
()
->
GetShape
();
size_t
x_rank
=
x_shape
.
size
();
size_t
y_rank
=
y_shape
.
size
();
flag
=
flag
&&
x_rank
>=
2
&&
y_rank
==
2
;
if
(
flag
)
{
if
(
!
IsCompat
(
subgraph
,
g
))
{
LOG
(
WARNING
)
<<
"TrtMapMatmul2MulPass in op compat failed."
;
return
;
}
OpDesc
desc
(
matmul_op
->
Op
()
->
Block
());
desc
.
SetType
(
"mul"
);
desc
.
SetInput
(
"X"
,
{
matmul_in_x
->
Name
()});
desc
.
SetInput
(
"Y"
,
{
matmul_in_y
->
Name
()});
desc
.
SetOutput
(
"Out"
,
{
matmul_out
->
Name
()});
desc
.
SetAttr
(
"x_num_col_dims"
,
static_cast
<
int
>
(
x_rank
-
1
));
desc
.
SetAttr
(
"y_num_col_dims"
,
1
);
desc
.
SetAttr
(
"transpose_Y"
,
matmul_op
->
Op
()
->
GetAttr
(
"transpose_Y"
));
if
(
matmul_op
->
Op
()
->
HasAttr
(
"enable_int8"
))
{
desc
.
SetAttr
(
"enable_int8"
,
matmul_op
->
Op
()
->
GetAttr
(
"enable_int8"
));
desc
.
SetAttr
(
"Input_scale"
,
matmul_op
->
Op
()
->
GetAttr
(
"Input_scale"
));
desc
.
SetAttr
(
"out_threshold"
,
matmul_op
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
bool
inscale_flag
=
false
;
bool
outscale_flag
=
false
;
if
(
matmul_op
->
Op
()
->
HasAttr
(
"X"
))
{
desc
.
SetAttr
(
"X"
,
matmul_op
->
Op
()
->
GetAttr
(
"X"
));
inscale_flag
=
true
;
}
if
(
matmul_op
->
Op
()
->
HasAttr
(
"Out"
))
{
desc
.
SetAttr
(
"Out"
,
matmul_op
->
Op
()
->
GetAttr
(
"Out"
));
outscale_flag
=
true
;
}
desc
.
SetAttr
(
"support_int8"
,
inscale_flag
&&
outscale_flag
);
auto
mul_node
=
g
->
CreateOpNode
(
&
desc
);
IR_NODE_LINK_TO
(
matmul_in_x
,
mul_node
);
IR_NODE_LINK_TO
(
matmul_in_y
,
mul_node
);
IR_NODE_LINK_TO
(
mul_node
,
matmul_out
);
GraphSafeRemoveNodes
(
graph
,
{
matmul_op
});
++
found_count
;
if
(
!
IsCompat
(
desc
))
{
LOG
(
WARNING
)
<<
"TrtMapMatmul2MulPass in out mul op compat failed."
;
return
;
}
}
};
gpd
(
graph
,
handler
);
AddStatis
(
found_count
);
}
void
TrtMapMatmulV2ToMulPass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
PADDLE_ENFORCE_NOT_NULL
(
graph
,
platform
::
errors
::
InvalidArgument
(
"Graph cannot be nullptr."
));
std
::
string
name_scope
=
"trt_map_matmul_v2_to_mul_pass"
;
FusePassBase
::
Init
(
name_scope
,
graph
);
GraphPatternDetector
gpd
;
patterns
::
MatmulV2Weight
matmul_v2_weight_pattern
(
gpd
.
mutable_pattern
(),
name_scope
);
matmul_v2_weight_pattern
();
int
found_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
VLOG
(
3
)
<<
"trt map matmul_v2 to mul"
;
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_v2_in_x
,
matmul_v2_in_x
,
matmul_v2_weight_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_v2_in_y
,
matmul_v2_in_y
,
matmul_v2_weight_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_v2_op
,
matmul_v2_op
,
matmul_v2_weight_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_v2_out
,
matmul_v2_out
,
matmul_v2_weight_pattern
);
bool
flag
=
true
;
bool
trans_x
=
PADDLE_GET_CONST
(
bool
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"trans_x"
));
flag
=
flag
&&
!
trans_x
;
std
::
vector
<
int64_t
>
x_shape
=
matmul_v2_in_x
->
Var
()
->
GetShape
();
std
::
vector
<
int64_t
>
y_shape
=
matmul_v2_in_y
->
Var
()
->
GetShape
();
size_t
x_rank
=
x_shape
.
size
();
size_t
y_rank
=
y_shape
.
size
();
flag
=
flag
&&
x_rank
>=
2
&&
y_rank
==
2
;
if
(
flag
)
{
if
(
!
IsCompat
(
subgraph
,
g
))
{
LOG
(
WARNING
)
<<
"TrtMapMatmulV2ToMulPass in op compat failed."
;
return
;
}
OpDesc
desc
(
matmul_v2_op
->
Op
()
->
Block
());
desc
.
SetType
(
"mul"
);
desc
.
SetInput
(
"X"
,
{
matmul_v2_in_x
->
Name
()});
desc
.
SetInput
(
"Y"
,
{
matmul_v2_in_y
->
Name
()});
desc
.
SetOutput
(
"Out"
,
{
matmul_v2_out
->
Name
()});
desc
.
SetAttr
(
"x_num_col_dims"
,
static_cast
<
int
>
(
x_rank
-
1
));
desc
.
SetAttr
(
"y_num_col_dims"
,
1
);
desc
.
SetAttr
(
"transpose_Y"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"trans_y"
));
if
(
matmul_v2_op
->
Op
()
->
HasAttr
(
"enable_int8"
))
{
desc
.
SetAttr
(
"enable_int8"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"enable_int8"
));
desc
.
SetAttr
(
"Input_scale"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"Input_scale"
));
desc
.
SetAttr
(
"out_threshold"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
bool
inscale_flag
=
false
;
bool
outscale_flag
=
false
;
if
(
matmul_v2_op
->
Op
()
->
HasAttr
(
"X"
))
{
desc
.
SetAttr
(
"X"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"X"
));
inscale_flag
=
true
;
}
if
(
matmul_v2_op
->
Op
()
->
HasAttr
(
"Out"
))
{
desc
.
SetAttr
(
"Out"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"Out"
));
outscale_flag
=
true
;
}
desc
.
SetAttr
(
"support_int8"
,
inscale_flag
&&
outscale_flag
);
auto
mul_node
=
g
->
CreateOpNode
(
&
desc
);
IR_NODE_LINK_TO
(
matmul_v2_in_x
,
mul_node
);
IR_NODE_LINK_TO
(
matmul_v2_in_y
,
mul_node
);
IR_NODE_LINK_TO
(
mul_node
,
matmul_v2_out
);
GraphSafeRemoveNodes
(
graph
,
{
matmul_v2_op
});
++
found_count
;
if
(
!
IsCompat
(
desc
))
{
LOG
(
WARNING
)
<<
"TrtMapMatmulV2ToMulPass in out mul op compat failed."
;
return
;
}
}
};
gpd
(
graph
,
handler
);
AddStatis
(
found_count
);
}
void
TrtMapMatmulV2ToMatmulPass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
PADDLE_ENFORCE_NOT_NULL
(
graph
,
platform
::
errors
::
InvalidArgument
(
"Graph cannot be nullptr."
));
std
::
string
name_scope
=
"trt_map_matmul_v2_to_matmul_pass"
;
FusePassBase
::
Init
(
name_scope
,
graph
);
GraphPatternDetector
gpd
;
patterns
::
MatmulV2
matmul_v2_pattern
(
gpd
.
mutable_pattern
(),
name_scope
);
matmul_v2_pattern
();
int
found_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
VLOG
(
4
)
<<
"trt map matmul_v2 to matmul"
;
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_v2_in_x
,
matmul_v2_in_x
,
matmul_v2_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_v2_in_y
,
matmul_v2_in_y
,
matmul_v2_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_v2_op
,
matmul_v2_op
,
matmul_v2_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_v2_out
,
matmul_v2_out
,
matmul_v2_pattern
);
if
(
!
IsCompat
(
subgraph
,
g
))
{
LOG
(
WARNING
)
<<
"TrtMapMatmulV2ToMatmulPass in op compat failed."
;
return
;
}
std
::
vector
<
int64_t
>
x_shape
=
matmul_v2_in_x
->
Var
()
->
GetShape
();
std
::
vector
<
int64_t
>
y_shape
=
matmul_v2_in_y
->
Var
()
->
GetShape
();
if
(
x_shape
.
size
()
!=
y_shape
.
size
())
{
LOG
(
WARNING
)
<<
"matmul op not support broadcast, please check inputs'shape. "
;
return
;
}
uint64_t
dims
=
2
;
for
(
size_t
i
=
0
;
i
<
x_shape
.
size
()
-
dims
;
++
i
)
{
if
(
x_shape
[
i
]
!=
y_shape
[
i
]
&&
(
x_shape
[
i
]
==
1
||
y_shape
[
i
]
==
1
))
{
LOG
(
WARNING
)
<<
"matmul op not support broadcast, please check "
"inputs'shape[i]. "
;
return
;
}
}
OpDesc
desc
(
matmul_v2_op
->
Op
()
->
Block
());
desc
.
SetType
(
"matmul"
);
desc
.
SetInput
(
"X"
,
{
matmul_v2_in_x
->
Name
()});
desc
.
SetInput
(
"Y"
,
{
matmul_v2_in_y
->
Name
()});
desc
.
SetOutput
(
"Out"
,
{
matmul_v2_out
->
Name
()});
desc
.
SetAttr
(
"transpose_X"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"trans_x"
));
desc
.
SetAttr
(
"transpose_Y"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"trans_y"
));
desc
.
SetAttr
(
"alpha"
,
1.0
f
);
if
(
matmul_v2_op
->
Op
()
->
HasAttr
(
"use_mkldnn"
))
{
desc
.
SetAttr
(
"use_mkldnn"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"use_mkldnn"
));
}
if
(
matmul_v2_op
->
Op
()
->
HasAttr
(
"enable_int8"
))
{
desc
.
SetAttr
(
"enable_int8"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"enable_int8"
));
desc
.
SetAttr
(
"Input_scale"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"Input_scale"
));
desc
.
SetAttr
(
"out_threshold"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
bool
inscale_flag
=
false
;
bool
outscale_flag
=
false
;
if
(
matmul_v2_op
->
Op
()
->
HasAttr
(
"X"
))
{
desc
.
SetAttr
(
"X"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"X"
));
inscale_flag
=
true
;
}
if
(
matmul_v2_op
->
Op
()
->
HasAttr
(
"Out"
))
{
desc
.
SetAttr
(
"Out"
,
matmul_v2_op
->
Op
()
->
GetAttr
(
"Out"
));
outscale_flag
=
true
;
}
desc
.
SetAttr
(
"support_int8"
,
inscale_flag
&&
outscale_flag
);
auto
matmul_node
=
g
->
CreateOpNode
(
&
desc
);
IR_NODE_LINK_TO
(
matmul_v2_in_x
,
matmul_node
);
IR_NODE_LINK_TO
(
matmul_v2_in_y
,
matmul_node
);
IR_NODE_LINK_TO
(
matmul_node
,
matmul_v2_out
);
GraphSafeRemoveNodes
(
graph
,
{
matmul_v2_op
});
++
found_count
;
if
(
!
IsCompat
(
desc
))
{
LOG
(
WARNING
)
<<
"TrtMapMatmulV2ToMatmulPass in out matmul op compat failed."
;
return
;
}
};
gpd
(
graph
,
handler
);
AddStatis
(
found_count
);
}
void
TrtSqueeze2MatmulFusePass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
PADDLE_ENFORCE_NOT_NULL
(
graph
,
platform
::
errors
::
InvalidArgument
(
"Graph cannot be nullptr."
));
std
::
string
name_scope
=
"trt_squeeze2_matmul_fuse_pass"
;
FusePassBase
::
Init
(
name_scope
,
graph
);
GraphPatternDetector
gpd
;
patterns
::
Squeeze2Matmul
fuse_pattern
(
gpd
.
mutable_pattern
(),
name_scope
);
fuse_pattern
();
int
found_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
VLOG
(
4
)
<<
"trt fuse squeeze2+matmul to mul"
;
GET_IR_NODE_FROM_SUBGRAPH
(
squeeze2_in_x
,
squeeze2_in_x
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
squeeze2_op
,
squeeze2_op
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_in_x
,
matmul_in_x
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_in_y
,
matmul_in_y
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_op
,
matmul_op
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_out
,
matmul_out
,
fuse_pattern
);
bool
flag
=
true
;
size_t
squeeze2_in_x_rank
=
(
squeeze2_in_x
->
Var
()
->
GetShape
()).
size
();
std
::
vector
<
int
>
squeeze2_op_axes
=
PADDLE_GET_CONST
(
std
::
vector
<
int
>
,
squeeze2_op
->
Op
()
->
GetAttr
(
"axes"
));
flag
=
flag
&&
squeeze2_in_x_rank
==
4
&&
squeeze2_op_axes
==
std
::
vector
<
int
>
{
2
,
3
}
&&
(
matmul_in_x
->
outputs
).
size
()
==
1
&&
matmul_in_y
->
Var
()
->
Persistable
();
bool
transpose_X
=
PADDLE_GET_CONST
(
bool
,
matmul_op
->
Op
()
->
GetAttr
(
"transpose_X"
));
bool
transpose_Y
=
PADDLE_GET_CONST
(
bool
,
matmul_op
->
Op
()
->
GetAttr
(
"transpose_Y"
));
float
alpha
=
PADDLE_GET_CONST
(
float
,
matmul_op
->
Op
()
->
GetAttr
(
"alpha"
));
size_t
matmul_in_x_rank
=
(
matmul_in_x
->
Var
()
->
GetShape
()).
size
();
size_t
matmul_in_y_rank
=
(
matmul_in_y
->
Var
()
->
GetShape
()).
size
();
flag
=
flag
&&
!
transpose_X
&&
!
transpose_Y
&&
std
::
abs
(
alpha
-
1.0
)
<
1e-5
&&
matmul_in_x_rank
==
2
&&
matmul_in_y_rank
==
2
;
std
::
vector
<
Node
*>&
next_ops
=
matmul_out
->
outputs
;
flag
=
flag
&&
next_ops
.
size
()
==
1
&&
next_ops
[
0
]
->
Name
()
==
"elementwise_add"
;
if
(
flag
)
{
if
(
!
IsCompat
(
subgraph
,
g
))
{
LOG
(
WARNING
)
<<
"TrtSqueeze2MatmulFusePass in op compat failed."
;
return
;
}
OpDesc
desc
(
matmul_op
->
Op
()
->
Block
());
desc
.
SetType
(
"mul"
);
desc
.
SetInput
(
"X"
,
{
squeeze2_in_x
->
Name
()});
desc
.
SetInput
(
"Y"
,
{
matmul_in_y
->
Name
()});
desc
.
SetOutput
(
"Out"
,
{
matmul_out
->
Name
()});
desc
.
SetAttr
(
"x_num_col_dims"
,
1
);
desc
.
SetAttr
(
"y_num_col_dims"
,
1
);
if
(
matmul_op
->
Op
()
->
HasAttr
(
"enable_int8"
))
{
desc
.
SetAttr
(
"enable_int8"
,
matmul_op
->
Op
()
->
GetAttr
(
"enable_int8"
));
desc
.
SetAttr
(
"Input_scale"
,
matmul_op
->
Op
()
->
GetAttr
(
"Input_scale"
));
desc
.
SetAttr
(
"out_threshold"
,
matmul_op
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
bool
inscale_flag_x
=
false
;
bool
outscale_flag
=
false
;
if
(
squeeze2_op
->
Op
()
->
HasAttr
(
"X"
))
{
desc
.
SetAttr
(
"X"
,
squeeze2_op
->
Op
()
->
GetAttr
(
"X"
));
inscale_flag_x
=
true
;
}
if
(
matmul_op
->
Op
()
->
HasAttr
(
"Out"
))
{
desc
.
SetAttr
(
"Out"
,
matmul_op
->
Op
()
->
GetAttr
(
"Out"
));
outscale_flag
=
true
;
}
desc
.
SetAttr
(
"support_int8"
,
inscale_flag_x
&&
outscale_flag
);
auto
mul_node
=
g
->
CreateOpNode
(
&
desc
);
IR_NODE_LINK_TO
(
squeeze2_in_x
,
mul_node
);
IR_NODE_LINK_TO
(
matmul_in_y
,
mul_node
);
IR_NODE_LINK_TO
(
mul_node
,
matmul_out
);
GraphSafeRemoveNodes
(
graph
,
{
squeeze2_op
,
matmul_in_x
,
matmul_op
});
++
found_count
;
if
(
!
IsCompat
(
desc
))
{
LOG
(
WARNING
)
<<
"TrtSqueeze2MatmulFusePass in out mul op compat failed."
;
return
;
}
}
};
gpd
(
graph
,
handler
);
AddStatis
(
found_count
);
}
TrtReshape2MatmulFusePass
::
TrtReshape2MatmulFusePass
()
{
AddOpCompat
(
OpCompat
(
"reshape2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Shape"
)
.
IsTensor
()
.
IsOptional
()
.
End
()
.
AddInput
(
"ShapeTensor"
)
.
IsTensor
()
.
IsOptional
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"XShape"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"shape"
)
// ints
.
IsType
<
std
::
vector
<
int
>>
()
.
End
();
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsNumGT
(
0.99999
f
)
.
IsNumLT
(
1.00001
f
)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
.
IsBoolEQ
(
false
)
.
End
();
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
}
void
TrtReshape2MatmulFusePass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
PADDLE_ENFORCE_NOT_NULL
(
graph
,
platform
::
errors
::
InvalidArgument
(
"Graph cannot be nullptr."
));
std
::
string
name_scope
=
"trt_reshape2_matmul_fuse_pass"
;
FusePassBase
::
Init
(
name_scope
,
graph
);
GraphPatternDetector
gpd
;
patterns
::
Reshape2Matmul
fuse_pattern
(
gpd
.
mutable_pattern
(),
name_scope
);
fuse_pattern
();
int
found_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
VLOG
(
4
)
<<
"trt fuse reshape2+matmul to mul"
;
GET_IR_NODE_FROM_SUBGRAPH
(
reshape2_in_x
,
reshape2_in_x
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
reshape2_op
,
reshape2_op
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_in_x
,
matmul_in_x
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_in_y
,
matmul_in_y
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_op
,
matmul_op
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_out
,
matmul_out
,
fuse_pattern
);
bool
flag
=
true
;
size_t
reshape2_in_nums
=
reshape2_op
->
inputs
.
size
();
auto
reshape2_in_x_shape
=
reshape2_in_x
->
Var
()
->
GetShape
();
size_t
reshape2_in_x_rank
=
reshape2_in_x_shape
.
size
();
std
::
vector
<
int
>
reshape2_op_shape
=
PADDLE_GET_CONST
(
std
::
vector
<
int
>
,
reshape2_op
->
Op
()
->
GetAttr
(
"shape"
));
flag
=
flag
&&
reshape2_in_nums
==
1
&&
reshape2_in_x_rank
==
4
&&
reshape2_in_x_shape
[
2
]
==
1
&&
reshape2_in_x_shape
[
3
]
==
1
&&
reshape2_op_shape
.
size
()
==
2
&&
(
matmul_in_x
->
outputs
).
size
()
==
1
;
bool
transpose_X
=
PADDLE_GET_CONST
(
bool
,
matmul_op
->
Op
()
->
GetAttr
(
"transpose_X"
));
bool
transpose_Y
=
PADDLE_GET_CONST
(
bool
,
matmul_op
->
Op
()
->
GetAttr
(
"transpose_Y"
));
float
alpha
=
PADDLE_GET_CONST
(
float
,
matmul_op
->
Op
()
->
GetAttr
(
"alpha"
));
size_t
matmul_in_x_rank
=
(
matmul_in_x
->
Var
()
->
GetShape
()).
size
();
size_t
matmul_in_y_rank
=
(
matmul_in_y
->
Var
()
->
GetShape
()).
size
();
flag
=
flag
&&
!
transpose_X
&&
!
transpose_Y
&&
std
::
abs
(
alpha
-
1.0
)
<
1e-5
&&
matmul_in_x_rank
==
2
&&
matmul_in_y_rank
==
2
&&
matmul_in_y
->
Var
()
->
Persistable
();
std
::
vector
<
Node
*>&
next_ops
=
matmul_out
->
outputs
;
flag
=
flag
&&
next_ops
.
size
()
==
1
&&
next_ops
[
0
]
->
Name
()
==
"elementwise_add"
;
if
(
flag
)
{
if
(
!
IsCompat
(
subgraph
,
g
))
{
LOG
(
WARNING
)
<<
"TrtReshape2MatmulFusePass in op compat failed."
;
return
;
}
OpDesc
desc
(
matmul_op
->
Op
()
->
Block
());
desc
.
SetType
(
"mul"
);
desc
.
SetInput
(
"X"
,
{
reshape2_in_x
->
Name
()});
desc
.
SetInput
(
"Y"
,
{
matmul_in_y
->
Name
()});
desc
.
SetOutput
(
"Out"
,
{
matmul_out
->
Name
()});
desc
.
SetAttr
(
"x_num_col_dims"
,
1
);
desc
.
SetAttr
(
"y_num_col_dims"
,
1
);
if
(
matmul_op
->
Op
()
->
HasAttr
(
"enable_int8"
))
{
desc
.
SetAttr
(
"enable_int8"
,
matmul_op
->
Op
()
->
GetAttr
(
"enable_int8"
));
desc
.
SetAttr
(
"Input_scale"
,
matmul_op
->
Op
()
->
GetAttr
(
"Input_scale"
));
desc
.
SetAttr
(
"out_threshold"
,
matmul_op
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
bool
inscale_flag_x
=
false
;
bool
outscale_flag
=
false
;
if
(
reshape2_op
->
Op
()
->
HasAttr
(
"X"
))
{
desc
.
SetAttr
(
"X"
,
reshape2_op
->
Op
()
->
GetAttr
(
"X"
));
inscale_flag_x
=
true
;
}
if
(
matmul_op
->
Op
()
->
HasAttr
(
"Out"
))
{
desc
.
SetAttr
(
"Out"
,
matmul_op
->
Op
()
->
GetAttr
(
"Out"
));
outscale_flag
=
true
;
}
desc
.
SetAttr
(
"support_int8"
,
inscale_flag_x
&&
outscale_flag
);
if
(
!
IsCompat
(
desc
))
{
LOG
(
WARNING
)
<<
"TrtReshape2MatmulFusePass in out mul op compat failed."
;
return
;
}
auto
mul_node
=
g
->
CreateOpNode
(
&
desc
);
IR_NODE_LINK_TO
(
reshape2_in_x
,
mul_node
);
IR_NODE_LINK_TO
(
matmul_in_y
,
mul_node
);
IR_NODE_LINK_TO
(
mul_node
,
matmul_out
);
GraphSafeRemoveNodes
(
graph
,
{
reshape2_op
,
matmul_in_x
,
matmul_op
});
++
found_count
;
}
};
gpd
(
graph
,
handler
);
AddStatis
(
found_count
);
}
void
TrtFlatten2MatmulFusePass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
PADDLE_ENFORCE_NOT_NULL
(
graph
,
platform
::
errors
::
InvalidArgument
(
"Graph cannot be nullptr."
));
std
::
string
name_scope
=
"trt_flatten2_matmul_fuse_pass"
;
FusePassBase
::
Init
(
name_scope
,
graph
);
GraphPatternDetector
gpd
;
patterns
::
Flatten2Matmul
fuse_pattern
(
gpd
.
mutable_pattern
(),
name_scope
);
fuse_pattern
();
int
found_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
VLOG
(
4
)
<<
"trt fuse flatten2+matmul to mul"
;
GET_IR_NODE_FROM_SUBGRAPH
(
flatten2_in_x
,
flatten2_in_x
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
flatten2_op
,
flatten2_op
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_in_x
,
matmul_in_x
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_in_y
,
matmul_in_y
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_op
,
matmul_op
,
fuse_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
matmul_out
,
matmul_out
,
fuse_pattern
);
bool
pattern_found
=
true
;
size_t
flatten2_in_nums
=
flatten2_op
->
inputs
.
size
();
auto
flatten2_in_x_shape
=
flatten2_in_x
->
Var
()
->
GetShape
();
size_t
flatten2_in_x_rank
=
flatten2_in_x_shape
.
size
();
int
flatten2_axis
=
PADDLE_GET_CONST
(
int
,
flatten2_op
->
Op
()
->
GetAttr
(
"axis"
));
// only convert matmul to mul when the flatten2 has a single input
// and the rank of input is 4 and the size of the output of matmul
// is 1.
pattern_found
=
pattern_found
&&
flatten2_in_nums
==
1
&&
flatten2_in_x_rank
==
4
&&
(
matmul_in_x
->
outputs
).
size
()
==
1
;
bool
transpose_X
=
PADDLE_GET_CONST
(
bool
,
matmul_op
->
Op
()
->
GetAttr
(
"transpose_X"
));
bool
transpose_Y
=
PADDLE_GET_CONST
(
bool
,
matmul_op
->
Op
()
->
GetAttr
(
"transpose_Y"
));
float
alpha
=
PADDLE_GET_CONST
(
float
,
matmul_op
->
Op
()
->
GetAttr
(
"alpha"
));
size_t
matmul_in_x_rank
=
(
matmul_in_x
->
Var
()
->
GetShape
()).
size
();
size_t
matmul_in_y_rank
=
(
matmul_in_y
->
Var
()
->
GetShape
()).
size
();
pattern_found
=
pattern_found
&&
!
transpose_X
&&
!
transpose_Y
&&
std
::
abs
(
alpha
-
1.0
)
<
1e-5
&&
matmul_in_x_rank
==
2
&&
matmul_in_y_rank
==
2
&&
matmul_in_y
->
Var
()
->
Persistable
();
std
::
vector
<
Node
*>&
next_ops
=
matmul_out
->
outputs
;
// we further require the matmul op is followed by one elementwise
// add op.
pattern_found
=
pattern_found
&&
next_ops
.
size
()
==
1
&&
next_ops
[
0
]
->
Name
()
==
"elementwise_add"
;
if
(
pattern_found
)
{
if
(
!
IsCompat
(
subgraph
,
g
))
{
LOG
(
WARNING
)
<<
"TrtFlatten2MatmulFusePass in op compat failed."
;
return
;
}
OpDesc
desc
(
matmul_op
->
Op
()
->
Block
());
desc
.
SetType
(
"mul"
);
desc
.
SetInput
(
"X"
,
{
flatten2_in_x
->
Name
()});
desc
.
SetInput
(
"Y"
,
{
matmul_in_y
->
Name
()});
desc
.
SetOutput
(
"Out"
,
{
matmul_out
->
Name
()});
desc
.
SetAttr
(
"x_num_col_dims"
,
flatten2_axis
);
desc
.
SetAttr
(
"y_num_col_dims"
,
1
);
if
(
matmul_op
->
Op
()
->
HasAttr
(
"enable_int8"
))
{
desc
.
SetAttr
(
"enable_int8"
,
matmul_op
->
Op
()
->
GetAttr
(
"enable_int8"
));
desc
.
SetAttr
(
"Input_scale"
,
matmul_op
->
Op
()
->
GetAttr
(
"Input_scale"
));
desc
.
SetAttr
(
"out_threshold"
,
matmul_op
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
bool
inscale_flag_x
=
false
;
bool
outscale_flag
=
false
;
if
(
flatten2_op
->
Op
()
->
HasAttr
(
"X"
))
{
desc
.
SetAttr
(
"X"
,
flatten2_op
->
Op
()
->
GetAttr
(
"X"
));
inscale_flag_x
=
true
;
}
if
(
matmul_op
->
Op
()
->
HasAttr
(
"Out"
))
{
desc
.
SetAttr
(
"Out"
,
matmul_op
->
Op
()
->
GetAttr
(
"Out"
));
outscale_flag
=
true
;
}
desc
.
SetAttr
(
"support_int8"
,
inscale_flag_x
&&
outscale_flag
);
auto
mul_node
=
g
->
CreateOpNode
(
&
desc
);
IR_NODE_LINK_TO
(
flatten2_in_x
,
mul_node
);
IR_NODE_LINK_TO
(
matmul_in_y
,
mul_node
);
IR_NODE_LINK_TO
(
mul_node
,
matmul_out
);
GraphSafeRemoveNodes
(
graph
,
{
flatten2_op
,
matmul_in_x
,
matmul_op
});
++
found_count
;
if
(
!
IsCompat
(
desc
))
{
LOG
(
WARNING
)
<<
"TrtFlatten2MatmulFusePass in out mul op compat failed."
;
return
;
}
}
};
gpd
(
graph
,
handler
);
AddStatis
(
found_count
);
}
}
// namespace ir
}
// namespace framework
}
// namespace paddle
REGISTER_PASS
(
trt_map_matmul_to_mul_pass
,
paddle
::
framework
::
ir
::
TrtMapMatmul2MulPass
);
REGISTER_PASS_CAPABILITY
(
trt_map_matmul_to_mul_pass
)
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
LE
(
"matmul"
,
1
)
.
EQ
(
"mul"
,
0
));
REGISTER_PASS
(
trt_map_matmul_v2_to_mul_pass
,
paddle
::
framework
::
ir
::
TrtMapMatmulV2ToMulPass
);
REGISTER_PASS_CAPABILITY
(
trt_map_matmul_v2_to_mul_pass
)
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
EQ
(
"matmul_v2"
,
0
)
.
EQ
(
"mul"
,
0
));
REGISTER_PASS
(
trt_map_matmul_v2_to_matmul_pass
,
paddle
::
framework
::
ir
::
TrtMapMatmulV2ToMatmulPass
);
REGISTER_PASS_CAPABILITY
(
trt_map_matmul_v2_to_matmul_pass
)
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
EQ
(
"matmul_v2"
,
0
)
.
LE
(
"matmul"
,
1
));
REGISTER_PASS
(
trt_squeeze2_matmul_fuse_pass
,
paddle
::
framework
::
ir
::
TrtSqueeze2MatmulFusePass
);
REGISTER_PASS_CAPABILITY
(
trt_squeeze2_matmul_fuse_pass
)
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
LE
(
"matmul"
,
1
)
.
EQ
(
"squeeze2"
,
0
)
.
EQ
(
"mul"
,
0
));
REGISTER_PASS
(
trt_reshape2_matmul_fuse_pass
,
paddle
::
framework
::
ir
::
TrtReshape2MatmulFusePass
);
REGISTER_PASS_CAPABILITY
(
trt_reshape2_matmul_fuse_pass
)
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
LE
(
"matmul"
,
1
)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"mul"
,
0
));
REGISTER_PASS
(
trt_flatten2_matmul_fuse_pass
,
paddle
::
framework
::
ir
::
TrtFlatten2MatmulFusePass
);
REGISTER_PASS_CAPABILITY
(
trt_flatten2_matmul_fuse_pass
)
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
LE
(
"matmul"
,
1
)
.
EQ
(
"flatten2"
,
0
)
.
EQ
(
"mul"
,
0
));
paddle/fluid/framework/ir/trt_map_matmul_to_mul_pass.h
已删除
100644 → 0
浏览文件 @
acf55016
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "paddle/fluid/framework/ir/fuse_pass_base.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
#include "paddle/fluid/framework/ir/pass.h"
namespace
paddle
{
namespace
framework
{
namespace
ir
{
class
Graph
;
class
TrtMapMatmul2MulPass
:
public
FusePassBase
{
public:
TrtMapMatmul2MulPass
();
virtual
~
TrtMapMatmul2MulPass
()
{}
protected:
void
ApplyImpl
(
Graph
*
graph
)
const
override
;
};
/*
* Map matmul_v2 to mul, the same as TrtMapMatmul2MulPass.
*/
class
TrtMapMatmulV2ToMulPass
:
public
FusePassBase
{
public:
TrtMapMatmulV2ToMulPass
();
virtual
~
TrtMapMatmulV2ToMulPass
()
{}
protected:
void
ApplyImpl
(
Graph
*
graph
)
const
override
;
};
/*
* Map matmul_v2 to matmul, not supoort broadcast.
*/
class
TrtMapMatmulV2ToMatmulPass
:
public
FusePassBase
{
public:
TrtMapMatmulV2ToMatmulPass
();
virtual
~
TrtMapMatmulV2ToMatmulPass
()
{}
protected:
void
ApplyImpl
(
Graph
*
graph
)
const
override
;
};
/*
* Fuse squeeze2+matmul to mul, so the optimization can use fc_fuse_pass.
* The squeeze2 op must satisfy the following conditions:
* 1. the rank of input X is 4
* 2. the axis attr is [2, 3]
* 3. the next op is only matmul
*
* The matmul op must satisfy the following conditions:
* 1. the transpose_X and transpose_Y attrs are false
* 2. the alpha attr is 1.0
* 3. the rank of input X and Y is 2
* 4. the next op of matmul is only elementwise_add
*
* Notice:
* the rank of input activation is obtained from var_desc,
* it maybe change in runtime. Therefore, the pass considers
* the above passes to reduce the impact on other models.
*/
class
TrtSqueeze2MatmulFusePass
:
public
FusePassBase
{
public:
TrtSqueeze2MatmulFusePass
();
virtual
~
TrtSqueeze2MatmulFusePass
()
{}
protected:
void
ApplyImpl
(
Graph
*
graph
)
const
override
;
};
/*
* Fuse reshape2+matmul to mul, so the optimization can use fc_fuse_pass.
* The reshape2 op must satisfy the following conditions:
* 1. reshape2 has one input node, which means it don't
* have Shape or ShapeTensor input
* 2. the rank of input X is 4 and the last two dims of input X is 1
* 3. the rank of shape attr is 2
* 4. the next op is only matmul
*
* The matmul op must satisfy the following conditions:
* 1. the transpose_X and transpose_Y attrs are false
* 2. the alpha attr is 1.0
* 3. the rank of input X and Y is 2
* 4. the next op of matmul is only elementwise_add
*
* Notice:
* the shape and rank of input activation is obtained from var_desc,
* they maybe change in runtime. Therefore, the pass considers
* the above passes to reduce the impact on other models.
*/
class
TrtReshape2MatmulFusePass
:
public
FusePassBase
{
public:
TrtReshape2MatmulFusePass
();
virtual
~
TrtReshape2MatmulFusePass
()
{}
protected:
void
ApplyImpl
(
Graph
*
graph
)
const
override
;
};
class
TrtFlatten2MatmulFusePass
:
public
FusePassBase
{
public:
TrtFlatten2MatmulFusePass
();
virtual
~
TrtFlatten2MatmulFusePass
()
{}
protected:
void
ApplyImpl
(
Graph
*
graph
)
const
override
;
};
}
// namespace ir
}
// namespace framework
}
// namespace paddle
paddle/fluid/framework/ir/trt_map_ops_to_matrix_multiply_pass.cc
0 → 100644
浏览文件 @
ef734e84
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/trt_map_ops_to_matrix_multiply_pass.h"
#include <cmath>
#include <string>
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
#include "paddle/fluid/framework/op_proto_maker.h"
#include "paddle/fluid/framework/op_version_registry.h"
#include "paddle/fluid/platform/enforce.h"
namespace
paddle
{
namespace
framework
{
namespace
ir
{
class
Node
;
TrtMapOpsToMatrixMultiplyPass
::
TrtMapOpsToMatrixMultiplyPass
()
{}
void
TrtMapOpsToMatrixMultiplyPass
::
ApplyImpl
(
ir
::
Graph
*
graph
)
const
{
PADDLE_ENFORCE_NOT_NULL
(
graph
,
platform
::
errors
::
InvalidArgument
(
"Graph cannot be nullptr."
));
std
::
string
name_scope
=
"trt_map_ops_to_matrix_multiply_pass"
;
FusePassBase
::
Init
(
name_scope
,
graph
);
std
::
unordered_set
<
std
::
string
>
ops_type
=
{
"mul"
,
"matmul"
,
"matmul_v2"
};
GraphPatternDetector
gpd
;
patterns
::
MulMatmulMatmulV2
mul_matmul_matmul_v2
(
gpd
.
mutable_pattern
(),
name_scope
);
mul_matmul_matmul_v2
(
ops_type
);
int
found_count
=
0
;
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
bool
with_dynamic_shape
=
Get
<
bool
>
(
"with_dynamic_shape"
);
if
(
!
with_dynamic_shape
)
{
VLOG
(
3
)
<<
"TrtMapOpsToMatrixMultiplyPass need with_dynamic_shape, stop this "
"pass."
"Please reconfig 'SetTRTDynamicShapeInfo'. You can refer to the "
"https://github.com/PaddlePaddle/Paddle-Inference-Demo/blob/"
"master/c%2B%2B/gpu/resnet50/resnet50_test.cc"
;
return
;
}
VLOG
(
4
)
<<
"trt map some ops to matrix_multiply"
;
GET_IR_NODE_FROM_SUBGRAPH
(
ops
,
ops
,
mul_matmul_matmul_v2
);
GET_IR_NODE_FROM_SUBGRAPH
(
ops_out
,
ops_out
,
mul_matmul_matmul_v2
);
OpDesc
desc
(
ops
->
Op
()
->
Block
());
desc
.
SetType
(
"matrix_multiply"
);
desc
.
SetInput
(
"X"
,
{
ops
->
Op
()
->
Input
(
"X"
).
front
()});
desc
.
SetInput
(
"Y"
,
{
ops
->
Op
()
->
Input
(
"Y"
).
front
()});
desc
.
SetOutput
(
"Out"
,
{
ops_out
->
Name
()});
if
(
ops
->
Op
()
->
HasAttr
(
"transpose_X"
)
||
ops
->
Op
()
->
HasAttr
(
"trans_x"
))
{
if
(
ops
->
Op
()
->
HasAttr
(
"transpose_X"
))
{
desc
.
SetAttr
(
"transpose_x"
,
ops
->
Op
()
->
GetAttr
(
"transpose_X"
));
}
else
{
desc
.
SetAttr
(
"transpose_x"
,
ops
->
Op
()
->
GetAttr
(
"trans_x"
));
}
}
else
{
desc
.
SetAttr
(
"transpose_x"
,
false
);
}
if
(
ops
->
Op
()
->
HasAttr
(
"transpose_Y"
)
||
ops
->
Op
()
->
HasAttr
(
"trans_y"
))
{
if
(
ops
->
Op
()
->
HasAttr
(
"transpose_Y"
))
{
desc
.
SetAttr
(
"transpose_y"
,
ops
->
Op
()
->
GetAttr
(
"transpose_Y"
));
}
else
{
desc
.
SetAttr
(
"transpose_y"
,
ops
->
Op
()
->
GetAttr
(
"trans_y"
));
}
}
else
{
desc
.
SetAttr
(
"transpose_y"
,
false
);
}
if
(
ops
->
Op
()
->
HasAttr
(
"out_threshold"
))
{
desc
.
SetAttr
(
"out_threshold"
,
ops
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
// Todo: remove attr(x_num_col_dims, y_num_col_dims, alpha)
if
(
ops
->
Op
()
->
HasAttr
(
"x_num_col_dims"
))
{
desc
.
SetAttr
(
"x_num_col_dims"
,
ops
->
Op
()
->
GetAttr
(
"x_num_col_dims"
));
}
else
{
int32_t
x_num_col_dims
=
-
1
;
desc
.
SetAttr
(
"x_num_col_dims"
,
x_num_col_dims
);
}
// op_teller: Only support y_num_col_dims == y.rank - 1;
int32_t
y_num_col_dims
=
-
1
;
desc
.
SetAttr
(
"y_num_col_dims"
,
y_num_col_dims
);
float
alpha
=
1
;
if
(
ops
->
Op
()
->
HasAttr
(
"alpha"
))
{
alpha
=
PADDLE_GET_CONST
(
float
,
ops
->
Op
()
->
GetAttr
(
"alpha"
));
}
desc
.
SetAttr
(
"alpha"
,
alpha
);
auto
matrix_multiply_node
=
g
->
CreateOpNode
(
&
desc
);
for
(
auto
node
:
ops
->
inputs
)
{
IR_NODE_LINK_TO
(
node
,
matrix_multiply_node
);
}
IR_NODE_LINK_TO
(
matrix_multiply_node
,
ops_out
);
GraphSafeRemoveNodes
(
graph
,
{
ops
});
++
found_count
;
};
gpd
(
graph
,
handler
);
AddStatis
(
found_count
);
}
}
// namespace ir
}
// namespace framework
}
// namespace paddle
REGISTER_PASS
(
trt_map_ops_to_matrix_multiply_pass
,
paddle
::
framework
::
ir
::
TrtMapOpsToMatrixMultiplyPass
);
paddle/fluid/framework/ir/trt_map_ops_to_matrix_multiply_pass.h
0 → 100644
浏览文件 @
ef734e84
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "paddle/fluid/framework/ir/fuse_pass_base.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
#include "paddle/fluid/framework/ir/pass.h"
namespace
paddle
{
namespace
framework
{
namespace
ir
{
class
Graph
;
class
TrtMapOpsToMatrixMultiplyPass
:
public
FusePassBase
{
public:
TrtMapOpsToMatrixMultiplyPass
();
virtual
~
TrtMapOpsToMatrixMultiplyPass
()
{}
protected:
void
ApplyImpl
(
Graph
*
graph
)
const
override
;
};
}
// namespace ir
}
// namespace framework
}
// namespace paddle
paddle/fluid/framework/ir/trt_multihead_matmul_fuse_pass.cc
浏览文件 @
ef734e84
...
@@ -257,18 +257,16 @@ static int BuildFusion(Graph* graph, const std::string& name_scope) {
...
@@ -257,18 +257,16 @@ static int BuildFusion(Graph* graph, const std::string& name_scope) {
}
}
PDNode
*
TrtMultiHeadMatmulPattern
::
operator
()()
{
PDNode
*
TrtMultiHeadMatmulPattern
::
operator
()()
{
std
::
unordered_set
<
std
::
string
>
mul_ops
{
"mul"
,
"matmul_v2"
};
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"matmul"
,
"matmul_v2"
};
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
input0
->
assert_is_op
s_input
(
mul_ops
);
input0
->
assert_is_op
_input
(
"matrix_multiply"
);
// First path with scale
// First path with scale
auto
*
mul0
=
pattern
->
NewNode
(
mul0_repr
())
->
assert_is_op
s
(
mul_ops
);
auto
*
mul0
=
pattern
->
NewNode
(
mul0_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
mul0_w_var
=
pattern
->
NewNode
(
mul0_w_repr
())
auto
*
mul0_w_var
=
pattern
->
NewNode
(
mul0_w_repr
())
->
AsInput
()
->
AsInput
()
->
assert_is_op
s_input
(
mul_ops
,
"Y"
);
->
assert_is_op
_input
(
"matrix_multiply"
,
"Y"
);
auto
*
mul0_out_var
=
auto
*
mul0_out_var
=
pattern
->
NewNode
(
mul0_out_repr
())
->
assert_is_op
s_output
(
mul_ops
);
pattern
->
NewNode
(
mul0_out_repr
())
->
assert_is_op
_output
(
"matrix_multiply"
);
decltype
(
mul0
)
eltadd0
;
decltype
(
mul0
)
eltadd0
;
decltype
(
mul0
)
eltadd0_b_var
;
decltype
(
mul0
)
eltadd0_b_var
;
...
@@ -301,12 +299,12 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
...
@@ -301,12 +299,12 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
auto
*
scale
=
pattern
->
NewNode
(
scale_repr
())
->
assert_is_op
(
"scale"
);
auto
*
scale
=
pattern
->
NewNode
(
scale_repr
())
->
assert_is_op
(
"scale"
);
auto
*
scale_out_var
=
auto
*
scale_out_var
=
pattern
->
NewNode
(
scale_out_repr
())
->
assert_is_op_output
(
"scale"
);
pattern
->
NewNode
(
scale_out_repr
())
->
assert_is_op_output
(
"scale"
);
scale_out_var
->
AsIntermediate
()
->
assert_is_op
s_input
(
matmul_ops
);
scale_out_var
->
AsIntermediate
()
->
assert_is_op
_input
(
"matrix_multiply"
);
auto
*
matmul_qk
=
auto
*
matmul_qk
=
pattern
->
NewNode
(
matmul_qk_repr
())
->
assert_is_op
s
(
matmul_ops
);
pattern
->
NewNode
(
matmul_qk_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
matmul_qk_out_var
=
auto
*
matmul_qk_out_var
=
pattern
->
NewNode
(
matmul_qk_out_repr
())
pattern
->
NewNode
(
matmul_qk_out_repr
())
->
assert_is_ops_output
(
matmul_ops
);
->
assert_is_op_output
(
"matrix_multiply"
);
matmul_qk_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
"elementwise_add"
);
matmul_qk_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
"elementwise_add"
);
auto
*
eltadd_qk
=
auto
*
eltadd_qk
=
...
@@ -322,12 +320,12 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
...
@@ -322,12 +320,12 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
pattern
->
NewNode
(
softmax_qk_repr
())
->
assert_is_op
(
"softmax"
);
pattern
->
NewNode
(
softmax_qk_repr
())
->
assert_is_op
(
"softmax"
);
auto
*
softmax_qk_out_var
=
auto
*
softmax_qk_out_var
=
pattern
->
NewNode
(
softmax_qk_out_repr
())
->
assert_is_op_output
(
"softmax"
);
pattern
->
NewNode
(
softmax_qk_out_repr
())
->
assert_is_op_output
(
"softmax"
);
softmax_qk_out_var
->
AsIntermediate
()
->
assert_is_op
s_input
(
matmul_ops
);
softmax_qk_out_var
->
AsIntermediate
()
->
assert_is_op
_input
(
"matrix_multiply"
);
auto
*
matmul_qkv
=
auto
*
matmul_qkv
=
pattern
->
NewNode
(
matmul_qkv_repr
())
->
assert_is_op
s
(
matmul_ops
);
pattern
->
NewNode
(
matmul_qkv_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
matmul_qkv_out_var
=
auto
*
matmul_qkv_out_var
=
pattern
->
NewNode
(
matmul_qkv_out_repr
())
pattern
->
NewNode
(
matmul_qkv_out_repr
())
->
assert_is_ops_output
(
matmul_ops
);
->
assert_is_op_output
(
"matrix_multiply"
);
matmul_qkv_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
"transpose2"
);
matmul_qkv_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
"transpose2"
);
auto
*
transpose2_qkv
=
auto
*
transpose2_qkv
=
...
@@ -340,15 +338,14 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
...
@@ -340,15 +338,14 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
pattern
->
NewNode
(
reshape2_qkv_repr
())
->
assert_is_op
(
"reshape2"
);
pattern
->
NewNode
(
reshape2_qkv_repr
())
->
assert_is_op
(
"reshape2"
);
auto
*
reshape2_qkv_out_var
=
pattern
->
NewNode
(
reshape2_qkv_out_repr
())
auto
*
reshape2_qkv_out_var
=
pattern
->
NewNode
(
reshape2_qkv_out_repr
())
->
assert_is_op_output
(
"reshape2"
);
->
assert_is_op_output
(
"reshape2"
);
reshape2_qkv_out_var
->
assert_is_ops_input
(
mul_ops
);
// Second path to matmul
// Second path to matmul
auto
*
mul1
=
pattern
->
NewNode
(
mul1_repr
())
->
assert_is_op
s
(
mul_ops
);
auto
*
mul1
=
pattern
->
NewNode
(
mul1_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
mul1_w_var
=
pattern
->
NewNode
(
mul1_w_repr
())
auto
*
mul1_w_var
=
pattern
->
NewNode
(
mul1_w_repr
())
->
AsInput
()
->
AsInput
()
->
assert_is_op
s_input
(
mul_ops
,
"Y"
);
->
assert_is_op
_input
(
"matrix_multiply"
,
"Y"
);
auto
*
mul1_out_var
=
auto
*
mul1_out_var
=
pattern
->
NewNode
(
mul1_out_repr
())
->
assert_is_op
s_output
(
mul_ops
);
pattern
->
NewNode
(
mul1_out_repr
())
->
assert_is_op
_output
(
"matrix_multiply"
);
decltype
(
mul1
)
eltadd1
;
decltype
(
mul1
)
eltadd1
;
decltype
(
mul1
)
eltadd1_b_var
;
decltype
(
mul1
)
eltadd1_b_var
;
...
@@ -375,16 +372,16 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
...
@@ -375,16 +372,16 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
pattern
->
NewNode
(
transpose2_1_repr
())
->
assert_is_op
(
"transpose2"
);
pattern
->
NewNode
(
transpose2_1_repr
())
->
assert_is_op
(
"transpose2"
);
auto
*
transpose2_1_out_var
=
pattern
->
NewNode
(
transpose2_1_out_repr
())
auto
*
transpose2_1_out_var
=
pattern
->
NewNode
(
transpose2_1_out_repr
())
->
assert_is_op_output
(
"transpose2"
);
->
assert_is_op_output
(
"transpose2"
);
transpose2_1_out_var
->
AsIntermediate
()
->
assert_is_op
s
_input
(
transpose2_1_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
matmul_ops
);
// link to matmul qk
"matrix_multiply"
);
// link to matmul qk
// Third path to matmul
// Third path to matmul
auto
*
mul2
=
pattern
->
NewNode
(
mul2_repr
())
->
assert_is_op
s
(
mul_ops
);
auto
*
mul2
=
pattern
->
NewNode
(
mul2_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
mul2_w_var
=
pattern
->
NewNode
(
mul2_w_repr
())
auto
*
mul2_w_var
=
pattern
->
NewNode
(
mul2_w_repr
())
->
AsInput
()
->
AsInput
()
->
assert_is_op
s_input
(
mul_ops
,
"Y"
);
->
assert_is_op
_input
(
"matrix_multiply"
,
"Y"
);
auto
*
mul2_out_var
=
auto
*
mul2_out_var
=
pattern
->
NewNode
(
mul2_out_repr
())
->
assert_is_op
s_output
(
mul_ops
);
pattern
->
NewNode
(
mul2_out_repr
())
->
assert_is_op
_output
(
"matrix_multiply"
);
decltype
(
mul2
)
eltadd2
;
decltype
(
mul2
)
eltadd2
;
decltype
(
mul2
)
eltadd2_b_var
;
decltype
(
mul2
)
eltadd2_b_var
;
...
@@ -411,8 +408,8 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
...
@@ -411,8 +408,8 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
pattern
->
NewNode
(
transpose2_2_repr
())
->
assert_is_op
(
"transpose2"
);
pattern
->
NewNode
(
transpose2_2_repr
())
->
assert_is_op
(
"transpose2"
);
auto
*
transpose2_2_out_var
=
pattern
->
NewNode
(
transpose2_2_out_repr
())
auto
*
transpose2_2_out_var
=
pattern
->
NewNode
(
transpose2_2_out_repr
())
->
assert_is_op_output
(
"transpose2"
);
->
assert_is_op_output
(
"transpose2"
);
transpose2_2_out_var
->
AsIntermediate
()
->
assert_is_op
s
_input
(
transpose2_2_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
matmul_ops
);
// link to matmul qkv
"matrix_multiply"
);
// link to matmul qkv
// Q path
// Q path
mul0
->
LinksFrom
({
input0
,
mul0_w_var
}).
LinksTo
({
mul0_out_var
});
mul0
->
LinksFrom
({
input0
,
mul0_w_var
}).
LinksTo
({
mul0_out_var
});
...
@@ -449,17 +446,16 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
...
@@ -449,17 +446,16 @@ PDNode* TrtMultiHeadMatmulPattern::operator()() {
}
}
PDNode
*
TrtMultiHeadMatmulV3Pattern
::
operator
()()
{
PDNode
*
TrtMultiHeadMatmulV3Pattern
::
operator
()()
{
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"matmul"
,
"matmul_v2"
};
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
auto
*
input0
=
pattern
->
NewNode
(
input0_repr
());
input0
->
assert_is_op
s_input
(
matmul_ops
);
input0
->
assert_is_op
_input
(
"matrix_multiply"
);
// First path with scale
// First path with scale
auto
*
mul0
=
pattern
->
NewNode
(
mul0_repr
())
->
assert_is_op
s
(
matmul_ops
);
auto
*
mul0
=
pattern
->
NewNode
(
mul0_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
mul0_w_var
=
pattern
->
NewNode
(
mul0_w_repr
())
auto
*
mul0_w_var
=
pattern
->
NewNode
(
mul0_w_repr
())
->
AsInput
()
->
AsInput
()
->
assert_is_op
s_input
(
matmul_ops
,
"Y"
);
->
assert_is_op
_input
(
"matrix_multiply"
,
"Y"
);
auto
*
mul0_out_var
=
auto
*
mul0_out_var
=
pattern
->
NewNode
(
mul0_out_repr
())
->
assert_is_op
s_output
(
matmul_ops
);
pattern
->
NewNode
(
mul0_out_repr
())
->
assert_is_op
_output
(
"matrix_multiply"
);
decltype
(
mul0
)
eltadd0
;
decltype
(
mul0
)
eltadd0
;
decltype
(
mul0
)
eltadd0_b_var
;
decltype
(
mul0
)
eltadd0_b_var
;
...
@@ -487,12 +483,13 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
...
@@ -487,12 +483,13 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
pattern
->
NewNode
(
transpose2_0_repr
())
->
assert_is_op
(
"transpose2"
);
pattern
->
NewNode
(
transpose2_0_repr
())
->
assert_is_op
(
"transpose2"
);
auto
*
transpose2_0_out_var
=
pattern
->
NewNode
(
transpose2_0_out_repr
())
auto
*
transpose2_0_out_var
=
pattern
->
NewNode
(
transpose2_0_out_repr
())
->
assert_is_op_output
(
"transpose2"
);
->
assert_is_op_output
(
"transpose2"
);
transpose2_0_out_var
->
AsIntermediate
()
->
assert_is_ops_input
(
matmul_ops
,
"X"
);
transpose2_0_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
"matrix_multiply"
,
"X"
);
auto
*
matmul_qk
=
auto
*
matmul_qk
=
pattern
->
NewNode
(
matmul_qk_repr
())
->
assert_is_op
s
(
matmul_ops
);
pattern
->
NewNode
(
matmul_qk_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
matmul_qk_out_var
=
auto
*
matmul_qk_out_var
=
pattern
->
NewNode
(
matmul_qk_out_repr
())
pattern
->
NewNode
(
matmul_qk_out_repr
())
->
assert_is_ops_output
(
matmul_ops
);
->
assert_is_op_output
(
"matrix_multiply"
);
matmul_qk_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
"elementwise_add"
);
matmul_qk_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
"elementwise_add"
);
auto
*
eltadd_qk
=
auto
*
eltadd_qk
=
...
@@ -508,12 +505,12 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
...
@@ -508,12 +505,12 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
pattern
->
NewNode
(
softmax_qk_repr
())
->
assert_is_op
(
"softmax"
);
pattern
->
NewNode
(
softmax_qk_repr
())
->
assert_is_op
(
"softmax"
);
auto
*
softmax_qk_out_var
=
auto
*
softmax_qk_out_var
=
pattern
->
NewNode
(
softmax_qk_out_repr
())
->
assert_is_op_output
(
"softmax"
);
pattern
->
NewNode
(
softmax_qk_out_repr
())
->
assert_is_op_output
(
"softmax"
);
softmax_qk_out_var
->
AsIntermediate
()
->
assert_is_op
s_input
(
matmul_ops
);
softmax_qk_out_var
->
AsIntermediate
()
->
assert_is_op
_input
(
"matrix_multiply"
);
auto
*
matmul_qkv
=
auto
*
matmul_qkv
=
pattern
->
NewNode
(
matmul_qkv_repr
())
->
assert_is_op
s
(
matmul_ops
);
pattern
->
NewNode
(
matmul_qkv_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
matmul_qkv_out_var
=
auto
*
matmul_qkv_out_var
=
pattern
->
NewNode
(
matmul_qkv_out_repr
())
pattern
->
NewNode
(
matmul_qkv_out_repr
())
->
assert_is_ops_output
(
matmul_ops
);
->
assert_is_op_output
(
"matrix_multiply"
);
matmul_qkv_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
"transpose2"
);
matmul_qkv_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
"transpose2"
);
auto
*
transpose2_qkv
=
auto
*
transpose2_qkv
=
...
@@ -526,14 +523,13 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
...
@@ -526,14 +523,13 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
pattern
->
NewNode
(
reshape2_qkv_repr
())
->
assert_is_op
(
"reshape2"
);
pattern
->
NewNode
(
reshape2_qkv_repr
())
->
assert_is_op
(
"reshape2"
);
auto
*
reshape2_qkv_out_var
=
pattern
->
NewNode
(
reshape2_qkv_out_repr
())
auto
*
reshape2_qkv_out_var
=
pattern
->
NewNode
(
reshape2_qkv_out_repr
())
->
assert_is_op_output
(
"reshape2"
);
->
assert_is_op_output
(
"reshape2"
);
reshape2_qkv_out_var
->
assert_is_ops_input
(
matmul_ops
);
// Second path to matmul
// Second path to matmul
auto
*
mul1
=
pattern
->
NewNode
(
mul1_repr
())
->
assert_is_op
s
(
matmul_ops
);
auto
*
mul1
=
pattern
->
NewNode
(
mul1_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
mul1_w_var
=
pattern
->
NewNode
(
mul1_w_repr
())
auto
*
mul1_w_var
=
pattern
->
NewNode
(
mul1_w_repr
())
->
AsInput
()
->
AsInput
()
->
assert_is_op
s_input
(
matmul_ops
,
"Y"
);
->
assert_is_op
_input
(
"matrix_multiply"
,
"Y"
);
auto
*
mul1_out_var
=
auto
*
mul1_out_var
=
pattern
->
NewNode
(
mul1_out_repr
())
->
assert_is_op
s_output
(
matmul_ops
);
pattern
->
NewNode
(
mul1_out_repr
())
->
assert_is_op
_output
(
"matrix_multiply"
);
decltype
(
mul1
)
eltadd1
;
decltype
(
mul1
)
eltadd1
;
decltype
(
mul1
)
eltadd1_b_var
;
decltype
(
mul1
)
eltadd1_b_var
;
...
@@ -560,16 +556,16 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
...
@@ -560,16 +556,16 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
pattern
->
NewNode
(
transpose2_1_repr
())
->
assert_is_op
(
"transpose2"
);
pattern
->
NewNode
(
transpose2_1_repr
())
->
assert_is_op
(
"transpose2"
);
auto
*
transpose2_1_out_var
=
pattern
->
NewNode
(
transpose2_1_out_repr
())
auto
*
transpose2_1_out_var
=
pattern
->
NewNode
(
transpose2_1_out_repr
())
->
assert_is_op_output
(
"transpose2"
);
->
assert_is_op_output
(
"transpose2"
);
transpose2_1_out_var
->
AsIntermediate
()
->
assert_is_op
s
_input
(
transpose2_1_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
matmul_ops
,
"Y"
);
// link to matmul qk
"matrix_multiply"
,
"Y"
);
// link to matmul qk
// Third path to matmul
// Third path to matmul
auto
*
mul2
=
pattern
->
NewNode
(
mul2_repr
())
->
assert_is_op
s
(
matmul_ops
);
auto
*
mul2
=
pattern
->
NewNode
(
mul2_repr
())
->
assert_is_op
(
"matrix_multiply"
);
auto
*
mul2_w_var
=
pattern
->
NewNode
(
mul2_w_repr
())
auto
*
mul2_w_var
=
pattern
->
NewNode
(
mul2_w_repr
())
->
AsInput
()
->
AsInput
()
->
assert_is_op
s_input
(
matmul_ops
,
"Y"
);
->
assert_is_op
_input
(
"matrix_multiply"
,
"Y"
);
auto
*
mul2_out_var
=
auto
*
mul2_out_var
=
pattern
->
NewNode
(
mul2_out_repr
())
->
assert_is_op
s_output
(
matmul_ops
);
pattern
->
NewNode
(
mul2_out_repr
())
->
assert_is_op
_output
(
"matrix_multiply"
);
decltype
(
mul2
)
eltadd2
;
decltype
(
mul2
)
eltadd2
;
decltype
(
mul2
)
eltadd2_b_var
;
decltype
(
mul2
)
eltadd2_b_var
;
...
@@ -596,8 +592,8 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
...
@@ -596,8 +592,8 @@ PDNode* TrtMultiHeadMatmulV3Pattern::operator()() {
pattern
->
NewNode
(
transpose2_2_repr
())
->
assert_is_op
(
"transpose2"
);
pattern
->
NewNode
(
transpose2_2_repr
())
->
assert_is_op
(
"transpose2"
);
auto
*
transpose2_2_out_var
=
pattern
->
NewNode
(
transpose2_2_out_repr
())
auto
*
transpose2_2_out_var
=
pattern
->
NewNode
(
transpose2_2_out_repr
())
->
assert_is_op_output
(
"transpose2"
);
->
assert_is_op_output
(
"transpose2"
);
transpose2_2_out_var
->
AsIntermediate
()
->
assert_is_op
s
_input
(
transpose2_2_out_var
->
AsIntermediate
()
->
assert_is_op_input
(
matmul_ops
);
// link to matmul qkv
"matrix_multiply"
);
// link to matmul qkv
// Q path
// Q path
mul0
->
LinksFrom
({
input0
,
mul0_w_var
}).
LinksTo
({
mul0_out_var
});
mul0
->
LinksFrom
({
input0
,
mul0_w_var
}).
LinksTo
({
mul0_out_var
});
...
@@ -642,23 +638,6 @@ void TrtMultiHeadMatmulFusePass::ApplyImpl(Graph* graph) const {
...
@@ -642,23 +638,6 @@ void TrtMultiHeadMatmulFusePass::ApplyImpl(Graph* graph) const {
}
}
TrtMultiHeadMatmulV2FusePass
::
TrtMultiHeadMatmulV2FusePass
()
{
TrtMultiHeadMatmulV2FusePass
::
TrtMultiHeadMatmulV2FusePass
()
{
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
// the shape shoule be (N*H, N*H)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumEQ
(
2
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
AddOpCompat
(
OpCompat
(
"elementwise_add"
))
AddOpCompat
(
OpCompat
(
"elementwise_add"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
// in bias, shape is (B, S, N*H),
// in bias, shape is (B, S, N*H),
...
@@ -738,45 +717,6 @@ TrtMultiHeadMatmulV2FusePass::TrtMultiHeadMatmulV2FusePass() {
...
@@ -738,45 +717,6 @@ TrtMultiHeadMatmulV2FusePass::TrtMultiHeadMatmulV2FusePass() {
.
IsType
<
bool
>
()
.
IsType
<
bool
>
()
.
End
();
.
End
();
// QK (B, H, S, N)*(B, H, S, N) -> (B, H, S, S)
// QKV (B, H, S, S)*(B, H, S, N) -> (B, H, S, N)
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsNumEQ
(
1.0
f
)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
// QK(true) QKV(false)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"matmul_v2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"trans_x"
)
.
IsType
<
bool
>
()
.
End
()
.
AddAttr
(
"trans_y"
)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"softmax"
))
AddOpCompat
(
OpCompat
(
"softmax"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
.
IsTensor
()
.
IsTensor
()
...
@@ -1187,23 +1127,6 @@ void TrtMultiHeadMatmulV2FusePass::ApplyImpl(Graph* graph) const {
...
@@ -1187,23 +1127,6 @@ void TrtMultiHeadMatmulV2FusePass::ApplyImpl(Graph* graph) const {
}
}
TrtMultiHeadMatmulV3FusePass
::
TrtMultiHeadMatmulV3FusePass
()
{
TrtMultiHeadMatmulV3FusePass
::
TrtMultiHeadMatmulV3FusePass
()
{
AddOpCompat
(
OpCompat
(
"mul"
))
.
AddInput
(
"X"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
// the shape shoule be (N*H, N*H)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
// the shape shoule be (B, S, N*H)
.
IsTensor
()
.
End
()
.
AddAttr
(
"x_num_col_dims"
)
.
IsNumEQ
(
2
)
.
End
()
.
AddAttr
(
"y_num_col_dims"
)
.
IsNumEQ
(
1
)
.
End
();
AddOpCompat
(
OpCompat
(
"elementwise_add"
))
AddOpCompat
(
OpCompat
(
"elementwise_add"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
// in bias, shape is (B, S, N*H),
// in bias, shape is (B, S, N*H),
...
@@ -1266,45 +1189,6 @@ TrtMultiHeadMatmulV3FusePass::TrtMultiHeadMatmulV3FusePass() {
...
@@ -1266,45 +1189,6 @@ TrtMultiHeadMatmulV3FusePass::TrtMultiHeadMatmulV3FusePass() {
.
IsType
<
std
::
vector
<
int
>>
()
.
IsType
<
std
::
vector
<
int
>>
()
.
End
();
.
End
();
// QK (B, H, S, N)*(B, H, S, N) -> (B, H, S, S)
// QKV (B, H, S, S)*(B, H, S, N) -> (B, H, S, N)
AddOpCompat
(
OpCompat
(
"matmul"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"alpha"
)
.
IsType
<
float
>
()
// QK(anyvalue, will copy to new op) QKV(1.0)
.
End
()
.
AddAttr
(
"transpose_X"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"transpose_Y"
)
// QK(true) QKV(false)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"matmul_v2"
))
.
AddInput
(
"X"
)
.
IsTensor
()
.
End
()
.
AddInput
(
"Y"
)
.
IsTensor
()
.
End
()
.
AddOutput
(
"Out"
)
.
IsTensor
()
.
End
()
.
AddAttr
(
"trans_x"
)
.
IsBoolEQ
(
false
)
.
End
()
.
AddAttr
(
"trans_y"
)
// QK(true) QKV(false)
.
IsType
<
bool
>
()
.
End
();
AddOpCompat
(
OpCompat
(
"softmax"
))
AddOpCompat
(
OpCompat
(
"softmax"
))
.
AddInput
(
"X"
)
.
AddInput
(
"X"
)
.
IsTensor
()
.
IsTensor
()
...
@@ -1672,12 +1556,10 @@ REGISTER_PASS(trt_multihead_matmul_fuse_pass_v3,
...
@@ -1672,12 +1556,10 @@ REGISTER_PASS(trt_multihead_matmul_fuse_pass_v3,
REGISTER_PASS_CAPABILITY
(
trt_multihead_matmul_fuse_pass_v2
)
REGISTER_PASS_CAPABILITY
(
trt_multihead_matmul_fuse_pass_v2
)
.
AddCombination
(
.
AddCombination
(
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
paddle
::
framework
::
compatible
::
OpVersionComparatorCombination
()
.
EQ
(
"mul"
,
0
)
.
LE
(
"elementwise_add"
,
1
)
.
LE
(
"elementwise_add"
,
1
)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"scale"
,
0
)
.
EQ
(
"scale"
,
0
)
.
LE
(
"matmul"
,
1
)
.
EQ
(
"softmax"
,
0
));
.
EQ
(
"softmax"
,
0
));
REGISTER_PASS_CAPABILITY
(
trt_multihead_matmul_fuse_pass_v3
)
REGISTER_PASS_CAPABILITY
(
trt_multihead_matmul_fuse_pass_v3
)
...
@@ -1687,6 +1569,4 @@ REGISTER_PASS_CAPABILITY(trt_multihead_matmul_fuse_pass_v3)
...
@@ -1687,6 +1569,4 @@ REGISTER_PASS_CAPABILITY(trt_multihead_matmul_fuse_pass_v3)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"reshape2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"scale"
,
0
)
.
EQ
(
"scale"
,
0
)
.
LE
(
"matmul"
,
1
)
.
EQ
(
"matmul_v2"
,
0
)
.
EQ
(
"softmax"
,
0
));
.
EQ
(
"softmax"
,
0
));
paddle/fluid/framework/ir/trt_skip_layernorm_fuse_pass.cc
浏览文件 @
ef734e84
...
@@ -176,10 +176,17 @@ void TrtSkipLayerNormFusePass::ApplyImpl(ir::Graph *graph) const {
...
@@ -176,10 +176,17 @@ void TrtSkipLayerNormFusePass::ApplyImpl(ir::Graph *graph) const {
new_desc
.
SetInput
(
"Bias"
,
{
layer_norm_bias
->
Name
()});
new_desc
.
SetInput
(
"Bias"
,
{
layer_norm_bias
->
Name
()});
if
(
layer_norm
->
Op
()
->
HasAttr
(
"out_threshold"
))
{
if
(
layer_norm
->
Op
()
->
HasAttr
(
"out_threshold"
))
{
new_desc
.
SetAttr
(
"enable_int8"
,
true
);
new_desc
.
SetAttr
(
"out_threshold"
,
new_desc
.
SetAttr
(
"out_threshold"
,
layer_norm
->
Op
()
->
GetAttr
(
"out_threshold"
));
layer_norm
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
}
if
(
subgraph
.
at
(
x
)
->
inputs
[
0
]
->
Op
()
->
HasAttr
(
"out_threshold"
))
{
new_desc
.
SetAttr
(
"X"
,
subgraph
.
at
(
x
)
->
inputs
[
0
]
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
if
(
subgraph
.
at
(
y
)
->
inputs
[
0
]
->
Op
()
->
HasAttr
(
"out_threshold"
))
{
new_desc
.
SetAttr
(
"Y"
,
subgraph
.
at
(
y
)
->
inputs
[
0
]
->
Op
()
->
GetAttr
(
"out_threshold"
));
}
if
(
layer_norm
->
Op
()
->
HasAttr
(
"smooth_scale"
))
{
if
(
layer_norm
->
Op
()
->
HasAttr
(
"smooth_scale"
))
{
new_desc
.
SetAttr
(
"smooth_scale"
,
new_desc
.
SetAttr
(
"smooth_scale"
,
...
...
paddle/fluid/framework/ir/vit_attention_fuse_pass.cc
浏览文件 @
ef734e84
...
@@ -79,7 +79,7 @@ void VitAttentionFusePass::ApplyImpl(ir::Graph* graph) const {
...
@@ -79,7 +79,7 @@ void VitAttentionFusePass::ApplyImpl(ir::Graph* graph) const {
auto
*
scope
=
param_scope
();
auto
*
scope
=
param_scope
();
// pattern
// pattern
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
mul"
,
"matmul_v2
"
};
std
::
unordered_set
<
std
::
string
>
matmul_ops
{
"mat
rix_multiply
"
};
PDNode
*
x
=
gpd
.
mutable_pattern
()
PDNode
*
x
=
gpd
.
mutable_pattern
()
->
NewNode
(
"x"
)
->
NewNode
(
"x"
)
->
assert_is_ops_input
(
matmul_ops
,
"X"
)
->
assert_is_ops_input
(
matmul_ops
,
"X"
)
...
@@ -173,5 +173,4 @@ REGISTER_PASS_CAPABILITY(vit_attention_fuse_pass)
...
@@ -173,5 +173,4 @@ REGISTER_PASS_CAPABILITY(vit_attention_fuse_pass)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"transpose2"
,
0
)
.
EQ
(
"slice"
,
0
)
.
EQ
(
"slice"
,
0
)
.
EQ
(
"scale"
,
0
)
.
EQ
(
"scale"
,
0
)
.
EQ
(
"softmax"
,
0
)
.
EQ
(
"softmax"
,
0
));
.
EQ
(
"matmul_v2"
,
0
));
paddle/fluid/inference/api/analysis_predictor.cc
浏览文件 @
ef734e84
...
@@ -2552,13 +2552,11 @@ USE_TRT_CONVERTER(transpose);
...
@@ -2552,13 +2552,11 @@ USE_TRT_CONVERTER(transpose);
USE_TRT_CONVERTER
(
transpose2
);
USE_TRT_CONVERTER
(
transpose2
);
USE_TRT_CONVERTER
(
flatten
);
USE_TRT_CONVERTER
(
flatten
);
USE_TRT_CONVERTER
(
flatten_contiguous_range
);
USE_TRT_CONVERTER
(
flatten_contiguous_range
);
USE_TRT_CONVERTER
(
matmul
);
USE_TRT_CONVERTER
(
matrix_multiply
);
USE_TRT_CONVERTER
(
matmul_v2
);
USE_TRT_CONVERTER
(
bmm
);
USE_TRT_CONVERTER
(
bmm
);
USE_TRT_CONVERTER
(
conv2d
);
USE_TRT_CONVERTER
(
conv2d
);
USE_TRT_CONVERTER
(
relu
);
USE_TRT_CONVERTER
(
relu
);
USE_TRT_CONVERTER
(
sigmoid
);
USE_TRT_CONVERTER
(
sigmoid
);
USE_TRT_CONVERTER
(
fc
);
USE_TRT_CONVERTER
(
pool2d
);
USE_TRT_CONVERTER
(
pool2d
);
USE_TRT_CONVERTER
(
softmax
);
USE_TRT_CONVERTER
(
softmax
);
USE_TRT_CONVERTER
(
batch_norm
);
USE_TRT_CONVERTER
(
batch_norm
);
...
...
paddle/fluid/inference/api/paddle_pass_builder.cc
浏览文件 @
ef734e84
...
@@ -87,6 +87,7 @@ void PaddlePassBuilder::ClearPasses() { passes_.clear(); }
...
@@ -87,6 +87,7 @@ void PaddlePassBuilder::ClearPasses() { passes_.clear(); }
const
std
::
vector
<
std
::
string
>
kTRTSubgraphPasses
({
const
std
::
vector
<
std
::
string
>
kTRTSubgraphPasses
({
"trt_support_nhwc_pass"
,
"trt_support_nhwc_pass"
,
"adaptive_pool2d_convert_global_pass"
,
//
"adaptive_pool2d_convert_global_pass"
,
//
"trt_map_ops_to_matrix_multiply_pass"
,
//
"shuffle_channel_detect_pass"
,
//
"shuffle_channel_detect_pass"
,
//
"quant_conv2d_dequant_fuse_pass"
,
//
"quant_conv2d_dequant_fuse_pass"
,
//
"delete_fill_constant_op_pass"
,
//
"delete_fill_constant_op_pass"
,
//
...
@@ -96,7 +97,6 @@ const std::vector<std::string> kTRTSubgraphPasses({
...
@@ -96,7 +97,6 @@ const std::vector<std::string> kTRTSubgraphPasses({
"delete_quant_dequant_linear_op_pass"
,
//
"delete_quant_dequant_linear_op_pass"
,
//
"identity_scale_op_clean_pass"
,
//
"identity_scale_op_clean_pass"
,
//
"add_support_int8_pass"
,
//
"add_support_int8_pass"
,
//
// "fc_fuse_pass", //
"simplify_with_basic_ops_pass"
,
//
"simplify_with_basic_ops_pass"
,
//
"trt_embedding_eltwise_layernorm_fuse_pass"
,
//
"trt_embedding_eltwise_layernorm_fuse_pass"
,
//
"preln_embedding_eltwise_layernorm_fuse_pass"
,
//
"preln_embedding_eltwise_layernorm_fuse_pass"
,
//
...
@@ -124,12 +124,6 @@ const std::vector<std::string> kTRTSubgraphPasses({
...
@@ -124,12 +124,6 @@ const std::vector<std::string> kTRTSubgraphPasses({
"reverse_roll_fuse_pass"
,
//
"reverse_roll_fuse_pass"
,
//
"conv_bn_fuse_pass"
,
//
"conv_bn_fuse_pass"
,
//
"unsqueeze2_eltwise_fuse_pass"
,
//
"unsqueeze2_eltwise_fuse_pass"
,
//
"trt_squeeze2_matmul_fuse_pass"
,
//
"trt_flatten2_matmul_fuse_pass"
,
//
"trt_map_matmul_v2_to_mul_pass"
,
//
"trt_map_matmul_v2_to_matmul_pass"
,
//
"trt_map_matmul_to_mul_pass"
,
//
"fc_fuse_pass"
,
//
"conv_elementwise_add_fuse_pass"
,
//
"conv_elementwise_add_fuse_pass"
,
//
#if defined _WIN32 // Windows CI is TensorRT7.0. Remove this after upgrading.
#if defined _WIN32 // Windows CI is TensorRT7.0. Remove this after upgrading.
#else
#else
...
@@ -216,10 +210,6 @@ const std::vector<std::string> kTrtLowerPrecisionPasses{
...
@@ -216,10 +210,6 @@ const std::vector<std::string> kTrtLowerPrecisionPasses{
// "conv_eltwiseadd_bn_fuse_pass",
// "conv_eltwiseadd_bn_fuse_pass",
"trt_embedding_eltwise_layernorm_fuse_pass"
,
"trt_embedding_eltwise_layernorm_fuse_pass"
,
"trt_skip_layernorm_fuse_pass"
,
"trt_skip_layernorm_fuse_pass"
,
"trt_map_matmul_v2_to_mul_pass"
,
"trt_map_matmul_v2_to_matmul_pass"
,
"trt_map_matmul_to_mul_pass"
,
"fc_fuse_pass"
,
"tensorrt_subgraph_pass"
,
"tensorrt_subgraph_pass"
,
};
};
...
...
paddle/fluid/inference/tensorrt/convert/CMakeLists.txt
浏览文件 @
ef734e84
...
@@ -2,11 +2,9 @@
...
@@ -2,11 +2,9 @@
list
(
list
(
APPEND
APPEND
CONVERT_FILES
CONVERT_FILES
matmul_op.cc
matrix_multiply_op.cc
matmul_v2_op.cc
bmm_op.cc
bmm_op.cc
conv2d_op.cc
conv2d_op.cc
fc_op.cc
pool2d_op.cc
pool2d_op.cc
elementwise_op.cc
elementwise_op.cc
batch_norm_op.cc
batch_norm_op.cc
...
...
paddle/fluid/inference/tensorrt/convert/fc_op.cc
已删除
100644 → 0
浏览文件 @
acf55016
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/inference/tensorrt/convert/op_converter.h"
namespace
paddle
{
namespace
inference
{
namespace
tensorrt
{
namespace
{
template
<
typename
T
>
void
tranpose_weight
(
const
T
*
src
,
T
*
dst
,
int
m
,
int
n
)
{
for
(
int
i
=
0
;
i
<
m
;
i
++
)
{
for
(
int
j
=
0
;
j
<
n
;
j
++
)
{
dst
[
j
*
m
+
i
]
=
src
[
i
*
n
+
j
];
}
}
}
}
// namespace
/*
* FC converter convert a MUL op in Fluid to a FC layer in TRT.
*/
class
FcOpConverter
:
public
OpConverter
{
public:
nvinfer1
::
ILayer
*
reshape_before_fc
(
nvinfer1
::
ITensor
*
before_fc
,
nvinfer1
::
Dims
x_dim
,
int
x_num_col_dims
,
std
::
string
output_name
)
{
// add shuffle before fc
nvinfer1
::
Dims
reshape_before_fc_dim
;
reshape_before_fc_dim
.
nbDims
=
x_num_col_dims
+
3
;
// padding shape "* x q x 1 x 1"
nvinfer1
::
ITensor
*
filal_reshape_before_fc_shape_tensor
=
nullptr
;
if
(
!
engine_
->
with_dynamic_shape
())
{
for
(
int
i
=
0
;
i
<
reshape_before_fc_dim
.
nbDims
;
i
++
)
{
reshape_before_fc_dim
.
d
[
i
]
=
1
;
}
for
(
int
i
=
0
;
i
<
x_dim
.
nbDims
;
i
++
)
{
if
(
i
<
x_num_col_dims
)
{
reshape_before_fc_dim
.
d
[
i
]
=
0
;
}
else
{
reshape_before_fc_dim
.
d
[
x_num_col_dims
]
*=
x_dim
.
d
[
i
];
}
}
}
else
{
std
::
vector
<
nvinfer1
::
ITensor
*>
reshape_before_fc_shape_tensor
;
nvinfer1
::
ITensor
*
input_shape_tensor
=
Shape
(
before_fc
);
for
(
int
i
=
0
;
i
<
reshape_before_fc_dim
.
nbDims
;
i
++
)
{
reshape_before_fc_shape_tensor
.
push_back
(
Add1DConstantLayer
(
1
));
}
for
(
int
i
=
0
;
i
<
x_dim
.
nbDims
;
i
++
)
{
if
(
i
<
x_num_col_dims
)
{
reshape_before_fc_shape_tensor
[
i
]
=
GetEleTensorOfShape
(
input_shape_tensor
,
i
);
}
else
{
reshape_before_fc_shape_tensor
[
x_num_col_dims
]
=
Prod
(
GetEleTensorOfShape
(
input_shape_tensor
,
i
),
reshape_before_fc_shape_tensor
[
x_num_col_dims
]);
}
}
filal_reshape_before_fc_shape_tensor
=
Concat
(
reshape_before_fc_shape_tensor
);
}
auto
*
reshape_before_fc_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
before_fc
);
if
(
!
engine_
->
with_dynamic_shape
())
{
reshape_before_fc_layer
->
setReshapeDimensions
(
reshape_before_fc_dim
);
}
else
{
reshape_before_fc_layer
->
setInput
(
1
,
*
filal_reshape_before_fc_shape_tensor
);
}
reshape_before_fc_layer
->
setName
(
(
"fc_op_reshape_before_fc: Shuffle (Output: "
+
output_name
+
")"
)
.
c_str
());
return
reshape_before_fc_layer
;
}
nvinfer1
::
ILayer
*
reshape_after_fc
(
nvinfer1
::
ITensor
*
after_fc
,
nvinfer1
::
Dims
x_dim
,
int
x_num_col_dims
)
{
// add shuffle after fc
nvinfer1
::
Dims
reshape_after_fc_dim
;
reshape_after_fc_dim
.
nbDims
=
x_num_col_dims
+
1
;
nvinfer1
::
ITensor
*
filal_reshape_after_fc_shape_tensor
=
nullptr
;
if
(
!
engine_
->
with_dynamic_shape
())
{
for
(
int
i
=
0
;
i
<
reshape_after_fc_dim
.
nbDims
;
i
++
)
{
reshape_after_fc_dim
.
d
[
i
]
=
0
;
}
}
else
{
std
::
vector
<
int
>
gather_indices
(
x_num_col_dims
+
1
);
std
::
iota
(
gather_indices
.
begin
(),
gather_indices
.
end
(),
0
);
filal_reshape_after_fc_shape_tensor
=
Gather
(
Shape
(
after_fc
),
gather_indices
);
}
auto
*
reshape_after_fc_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
after_fc
);
if
(
!
engine_
->
with_dynamic_shape
())
{
reshape_after_fc_layer
->
setReshapeDimensions
(
reshape_after_fc_dim
);
}
else
{
reshape_after_fc_layer
->
setInput
(
1
,
*
filal_reshape_after_fc_shape_tensor
);
}
return
reshape_after_fc_layer
;
}
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
3
)
<<
"convert a fc op to tensorrt fc layer without bias"
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
auto
output_name
=
op_desc
.
Output
(
"Out"
).
front
();
auto
input_names
=
op_desc
.
InputNames
();
bool
with_bias
=
input_names
.
size
()
>=
3
;
std
::
string
w_name
=
"Y"
;
std
::
string
i_name
=
"X"
;
if
(
with_bias
)
{
w_name
=
"W"
;
i_name
=
"Input"
;
}
// Declare inputs
auto
*
X
=
engine_
->
GetITensor
(
op_desc
.
Input
(
i_name
).
front
());
auto
x_dim
=
X
->
getDimensions
();
// Declare weights
auto
*
Y_v
=
scope
.
FindVar
(
op_desc
.
Input
(
w_name
).
front
());
PADDLE_ENFORCE_NOT_NULL
(
Y_v
,
platform
::
errors
::
NotFound
(
"Can not find %s presistale var of fc in scope."
,
w_name
));
auto
*
Y_t
=
Y_v
->
GetMutable
<
phi
::
DenseTensor
>
();
int
x_num_col_dims
=
op_desc
.
HasAttr
(
"x_num_col_dims"
)
?
PADDLE_GET_CONST
(
int
,
op_desc
.
GetAttr
(
"x_num_col_dims"
))
:
(
op_desc
.
HasAttr
(
"in_num_col_dims"
)
?
PADDLE_GET_CONST
(
int
,
op_desc
.
GetAttr
(
"in_num_col_dims"
))
:
1
);
const
std
::
string
activation_type
=
op_desc
.
HasAttr
(
"activation_type"
)
?
PADDLE_GET_CONST
(
std
::
string
,
op_desc
.
GetAttr
(
"activation_type"
))
:
""
;
bool
enable_int8
=
op_desc
.
HasAttr
(
"enable_int8"
);
bool
support_int8
=
false
;
if
(
op_desc
.
HasAttr
(
"support_int8"
))
{
support_int8
=
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"support_int8"
));
}
float
in_scale
=
0
;
if
(
enable_int8
||
support_int8
)
{
if
(
enable_int8
)
{
in_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"Input_scale"
));
}
else
{
in_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"X"
));
}
engine_
->
SetTensorDynamicRange
(
X
,
in_scale
);
}
PADDLE_ENFORCE_EQ
(
Y_t
->
dims
().
size
(),
2UL
,
platform
::
errors
::
InvalidArgument
(
"The fc's weight should be a matrix with 2 dims, but "
"it's %d-dimensional."
,
Y_t
->
dims
().
size
()));
// a matrix
int
m
=
Y_t
->
dims
()[
0
];
int
n
=
Y_t
->
dims
()[
1
];
auto
regist_fc
=
[
&
](
nvinfer1
::
ITensor
*
inputs
,
int
n_output
,
TensorRTEngine
::
Weight
&
weight
,
TensorRTEngine
::
Weight
&
bias
)
{
if
(
enable_int8
||
support_int8
)
{
// add conv layer
float
out_scale
=
0
;
if
(
enable_int8
)
{
PADDLE_ENFORCE_EQ
(
op_desc
.
HasAttr
(
"out_threshold"
),
true
,
platform
::
errors
::
InvalidArgument
(
"must have out threshold in fc layers in int8 mode"
));
out_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"out_threshold"
));
}
else
{
out_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"Out"
));
}
nvinfer1
::
DimsHW
nv_ksize
(
1
,
1
);
auto
*
fc_layer_int8
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Convolution
,
*
inputs
,
n_output
,
nv_ksize
,
weight
.
get
(),
bias
.
get
());
fc_layer_int8
->
setName
(
(
"fc_op_int8_conv1x1: Convolution (Output: "
+
output_name
+
")"
)
.
c_str
());
engine_
->
SetTensorDynamicRange
(
fc_layer_int8
->
getOutput
(
0
),
out_scale
);
auto
*
fc_after_reshape_int8
=
reshape_after_fc
(
fc_layer_int8
->
getOutput
(
0
),
x_dim
,
x_num_col_dims
);
if
(
activation_type
==
"relu"
)
{
fc_after_reshape_int8
->
setName
(
(
"int8_reshape_after_fc: Shuffle (Output: "
+
output_name
+
")"
)
.
c_str
());
engine_
->
SetTensorDynamicRange
(
fc_after_reshape_int8
->
getOutput
(
0
),
out_scale
);
nvinfer1
::
IActivationLayer
*
relu_layer_int8
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Activation
,
*
(
fc_after_reshape_int8
->
getOutput
(
0
)),
nvinfer1
::
ActivationType
::
kRELU
);
RreplenishLayerAndOutput
(
relu_layer_int8
,
"relu_after_fc_shuffle"
,
{
output_name
},
test_mode
);
}
else
{
RreplenishLayerAndOutput
(
fc_after_reshape_int8
,
"fc_op_int8_reshape_after_fc: Shuffle"
,
{
output_name
},
test_mode
);
}
}
else
{
// add fc layer
auto
*
fc_layer_float
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
FullyConnected
,
*
inputs
,
n_output
,
weight
.
get
(),
bias
.
get
());
fc_layer_float
->
setName
(
(
"fc_op_float: FullyConnected (Output: "
+
output_name
+
")"
)
.
c_str
());
auto
*
fc_after_reshape_float
=
reshape_after_fc
(
fc_layer_float
->
getOutput
(
0
),
x_dim
,
x_num_col_dims
);
if
(
activation_type
==
"relu"
)
{
fc_after_reshape_float
->
setName
(
(
"float_reshape_after_fc: Shuffle (Output: "
+
output_name
+
")"
)
.
c_str
());
nvinfer1
::
IActivationLayer
*
relu_layer_float
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Activation
,
*
(
fc_after_reshape_float
->
getOutput
(
0
)),
nvinfer1
::
ActivationType
::
kRELU
);
RreplenishLayerAndOutput
(
relu_layer_float
,
"relu_after_fc_shuffle"
,
{
output_name
},
test_mode
);
}
else
{
RreplenishLayerAndOutput
(
fc_after_reshape_float
,
"shuffle_after_fc"
,
{
output_name
},
test_mode
);
}
}
};
bool
transpose_y
=
false
;
if
(
op_desc
.
HasAttr
(
"transpose_Y"
))
{
transpose_y
=
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"transpose_Y"
));
}
int
weight_w
,
weight_h
;
auto
weight
=
engine_
->
GetTrtWeight
(
op_desc
.
Input
(
w_name
).
front
(),
*
Y_t
);
if
(
!
transpose_y
)
{
if
(
weight
.
get
().
type
==
nvinfer1
::
DataType
::
kFLOAT
)
{
std
::
vector
<
float
>
weight_data_tmp
;
weight_data_tmp
.
reserve
(
Y_t
->
numel
());
memcpy
(
weight_data_tmp
.
data
(),
weight
.
get
().
values
,
Y_t
->
numel
()
*
sizeof
(
float
));
tranpose_weight
(
weight_data_tmp
.
data
(),
const_cast
<
float
*>
(
static_cast
<
const
float
*>
(
weight
.
get
().
values
)),
m
,
n
);
}
else
if
(
weight
.
get
().
type
==
nvinfer1
::
DataType
::
kHALF
)
{
std
::
vector
<
float16
>
weight_data_tmp
;
weight_data_tmp
.
reserve
(
Y_t
->
numel
());
memcpy
(
weight_data_tmp
.
data
(),
weight
.
get
().
values
,
Y_t
->
numel
()
*
sizeof
(
float16
));
tranpose_weight
(
weight_data_tmp
.
data
(),
const_cast
<
float16
*>
(
static_cast
<
const
float16
*>
(
weight
.
get
().
values
)),
m
,
n
);
}
else
{
PADDLE_THROW
(
paddle
::
platform
::
errors
::
InvalidArgument
(
"Paddle-TRT fc convert not supporte dtype, now only support fp32 "
"and fp16."
));
}
weight_w
=
n
;
weight_h
=
m
;
}
else
{
weight_w
=
m
;
weight_h
=
n
;
}
size_t
n_output
=
weight_w
;
weight
.
dims
.
assign
({
weight_w
,
weight_h
});
TensorRTEngine
::
Weight
bias
{
weight
.
get
().
type
,
nullptr
,
0
};
if
(
with_bias
)
{
auto
*
b_v
=
scope
.
GetVar
(
op_desc
.
Input
(
"Bias"
).
front
());
auto
*
b_t
=
b_v
->
GetMutable
<
phi
::
DenseTensor
>
();
bias
=
engine_
->
GetTrtWeight
(
op_desc
.
Input
(
"Bias"
).
front
(),
*
b_t
);
}
// Running the TRT Static Shape mode: x_num_col_dims-1
if
(
!
engine_
->
with_dynamic_shape
())
{
x_num_col_dims
--
;
}
// If use tensorrt'oss, the x_dim and x_num_col_dims need change, and can
// not add Shuffle layer in ernie's multihead.
if
(
x_dim
.
nbDims
==
4
&&
x_dim
.
d
[
2
]
==
1
&&
x_dim
.
d
[
3
]
==
1
)
{
if
(
enable_int8
||
support_int8
)
{
// add conv1x1 layer
nvinfer1
::
DimsHW
nv_ksize
(
1
,
1
);
auto
*
fc_layer_int8
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Convolution
,
*
X
,
n_output
,
nv_ksize
,
weight
.
get
(),
bias
.
get
());
if
(
activation_type
==
"relu"
)
{
fc_layer_int8
->
setName
(
(
"ernie_fc_op_int8: Convolution (Output: "
+
output_name
+
")"
)
.
c_str
());
PADDLE_ENFORCE_EQ
(
op_desc
.
HasAttr
(
"out_threshold"
),
true
,
platform
::
errors
::
InvalidArgument
(
"must have out threshold in fc layers in int8 mode"
));
float
out_scale
=
0
;
if
(
enable_int8
)
{
out_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"out_threshold"
));
}
else
{
out_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"Out"
));
}
engine_
->
SetTensorDynamicRange
(
fc_layer_int8
->
getOutput
(
0
),
out_scale
);
nvinfer1
::
IActivationLayer
*
relu_layer_int8
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Activation
,
*
(
fc_layer_int8
->
getOutput
(
0
)),
nvinfer1
::
ActivationType
::
kRELU
);
RreplenishLayerAndOutput
(
relu_layer_int8
,
"relu_after_ernie_fc_int8"
,
{
output_name
},
test_mode
);
}
else
{
RreplenishLayerAndOutput
(
fc_layer_int8
,
"ernie_fc_op_int8: Convolution"
,
{
output_name
},
test_mode
);
}
}
else
{
// add fc layer
auto
*
fc_layer_float
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
FullyConnected
,
*
X
,
n_output
,
weight
.
get
(),
bias
.
get
());
if
(
activation_type
==
"relu"
)
{
fc_layer_float
->
setName
(
(
"ernie_fc_op_float: (Output: "
+
output_name
+
")"
).
c_str
());
nvinfer1
::
IActivationLayer
*
relu_layer_float
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Activation
,
*
(
fc_layer_float
->
getOutput
(
0
)),
nvinfer1
::
ActivationType
::
kRELU
);
RreplenishLayerAndOutput
(
relu_layer_float
,
"relu_after_ernie_fc_float"
,
{
output_name
},
test_mode
);
}
else
{
RreplenishLayerAndOutput
(
fc_layer_float
,
"ernie_fc_op_float"
,
{
output_name
},
test_mode
);
}
}
}
else
{
// need reshape input before and after fc
PADDLE_ENFORCE_GT
(
x_dim
.
nbDims
,
x_num_col_dims
,
platform
::
errors
::
InvalidArgument
(
"Params and input dims mismatch. Paddle-TRT FC "
"converter expects x_dim.nbDims > x_num_col_dims, but "
"x_dim.nbDims : %d, x_num_col_dims : %d."
,
x_dim
.
nbDims
,
x_num_col_dims
));
auto
*
reshape_before_fc_layer
=
reshape_before_fc
(
X
,
x_dim
,
x_num_col_dims
,
output_name
);
auto
*
reshape_itensor
=
reshape_before_fc_layer
->
getOutput
(
0
);
if
(
enable_int8
||
support_int8
)
{
engine_
->
SetTensorDynamicRange
(
reshape_itensor
,
in_scale
);
}
regist_fc
(
reshape_itensor
,
n_output
,
weight
,
bias
);
}
}
};
}
// namespace tensorrt
}
// namespace inference
}
// namespace paddle
REGISTER_TRT_OP_CONVERTER
(
fc
,
FcOpConverter
);
paddle/fluid/inference/tensorrt/convert/matmul_op.cc
已删除
100644 → 0
浏览文件 @
acf55016
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/inference/tensorrt/convert/op_converter.h"
#include "paddle/fluid/inference/tensorrt/plugin/matmul_op_int8_plugin.h"
namespace
paddle
{
namespace
inference
{
namespace
tensorrt
{
/*
* MatMulOp, IMatrixMultiplyLayer in TRT. This Layer doesn't has weights.
*/
class
MatMulOpConverter
:
public
OpConverter
{
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
3
)
<<
"convert a matmul op to tensorrt matmul layer "
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
nvinfer1
::
ILayer
*
layer
=
nullptr
;
// Declare inputs
auto
*
input1
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"X"
)[
0
]);
auto
*
input2
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"Y"
)[
0
]);
nvinfer1
::
Dims
dims_x
=
input1
->
getDimensions
();
nvinfer1
::
Dims
dims_y
=
input2
->
getDimensions
();
bool
transpose_X
=
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"transpose_X"
));
bool
transpose_Y
=
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"transpose_Y"
));
auto
output_name
=
op_desc
.
Output
(
"Out"
)[
0
];
float
alpha
=
1
;
if
(
op_desc
.
HasAttr
(
"alpha"
))
{
float
alpha_tem
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"alpha"
));
alpha
=
alpha_tem
;
}
nvinfer1
::
MatrixOperation
matrix_operation_X
=
transpose_X
?
nvinfer1
::
MatrixOperation
::
kTRANSPOSE
:
nvinfer1
::
MatrixOperation
::
kNONE
;
nvinfer1
::
MatrixOperation
matrix_operation_Y
=
transpose_Y
?
nvinfer1
::
MatrixOperation
::
kTRANSPOSE
:
nvinfer1
::
MatrixOperation
::
kNONE
;
if
(
op_desc
.
HasAttr
(
"support_int8"
)
&&
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"support_int8"
))
&&
engine_
->
precision
()
==
AnalysisConfig
::
Precision
::
kInt8
&&
platform
::
GetGPUComputeCapability
(
platform
::
GetCurrentDeviceId
())
>=
75
)
{
if
(
engine_
->
with_dynamic_shape
())
{
VLOG
(
3
)
<<
"Convert a fluid matmul_op_int8_dynamic to TensorRT "
"MatmulPluginLayer"
;
plugin
::
MatmulPluginDynamic
*
plugin
=
new
plugin
::
MatmulPluginDynamic
(
transpose_X
,
transpose_Y
,
alpha
);
std
::
vector
<
nvinfer1
::
ITensor
*>
inputs
{
input1
,
input2
};
layer
=
engine_
->
AddDynamicPlugin
(
inputs
.
data
(),
inputs
.
size
(),
plugin
);
RreplenishLayerAndOutput
(
layer
,
"matmul_op_int8_dynamic"
,
{
output_name
},
test_mode
);
}
else
{
VLOG
(
3
)
<<
"Convert a fluid matmul_op_int8_static to TensorRT "
"MatmulPluginLayer"
;
plugin
::
MatmulPlugin
*
plugin
=
new
plugin
::
MatmulPlugin
(
dims_x
,
dims_y
,
transpose_X
,
transpose_Y
,
alpha
);
std
::
vector
<
nvinfer1
::
ITensor
*>
inputs
{
input1
,
input2
};
layer
=
engine_
->
AddPluginV2IOExt
(
inputs
.
data
(),
inputs
.
size
(),
plugin
);
RreplenishLayerAndOutput
(
layer
,
"matmul_op_int8_static"
,
{
output_name
},
test_mode
);
}
}
else
{
VLOG
(
3
)
<<
"Convert a fluid matmul_op_float to TensorRT "
;
layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
MatrixMultiply
,
*
input1
,
matrix_operation_X
,
*
input2
,
matrix_operation_Y
);
if
(
alpha
==
1
)
{
RreplenishLayerAndOutput
(
layer
,
"matmul_op_float_no_alpha"
,
{
output_name
},
test_mode
);
}
else
{
layer
->
setName
(
(
"matmul_op_float_has_alpha: MatrixMultiplyLayer (Output: "
+
output_name
+
")"
)
.
c_str
());
// IScaleLayer requires the input must have at least
// three dimensions in static shape mode and at least
// four dimensions in dynamic shape mode.
auto
*
matmul_out
=
layer
->
getOutput
(
0
);
nvinfer1
::
Dims
out_shape
=
matmul_out
->
getDimensions
();
const
int
out_dims
=
out_shape
.
nbDims
;
bool
need_change_dim
=
false
;
if
(
engine_
->
with_dynamic_shape
())
{
if
(
out_dims
==
3
)
{
need_change_dim
=
true
;
}
}
else
{
if
(
out_dims
==
2
)
{
need_change_dim
=
true
;
}
}
if
(
need_change_dim
)
{
nvinfer1
::
Dims
reshape_dim
;
reshape_dim
.
nbDims
=
out_dims
+
1
;
reshape_dim
.
d
[
out_dims
]
=
1
;
for
(
int
i
=
0
;
i
<
out_dims
;
i
++
)
{
reshape_dim
.
d
[
i
]
=
out_shape
.
d
[
i
];
}
auto
*
reshape_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
matmul_out
);
reshape_layer
->
setReshapeDimensions
(
reshape_dim
);
matmul_out
=
reshape_layer
->
getOutput
(
0
);
reshape_layer
->
setName
((
"matmul_op_float_has_alpha_reshape_before: "
"ShuffleLayer (Output: "
+
output_name
+
")"
)
.
c_str
());
}
auto
create_weights
=
[
&
](
float
data
,
const
std
::
string
&
type
)
->
float
*
{
std
::
unique_ptr
<
phi
::
DenseTensor
>
tmp_tensor
(
new
phi
::
DenseTensor
());
tmp_tensor
->
Resize
({
1
});
auto
*
tmp_data
=
tmp_tensor
->
mutable_data
<
float
>
(
platform
::
CPUPlace
());
tmp_data
[
0
]
=
data
;
engine_
->
SetWeights
(
output_name
+
"_add_scale_op_"
+
type
,
std
::
move
(
tmp_tensor
));
return
tmp_data
;
};
float
*
alpha_data
=
create_weights
(
alpha
,
"alpha"
);
float
*
shift_data
=
create_weights
(
0.0
,
"shift"
);
float
*
power_data
=
create_weights
(
1.0
,
"power"
);
TensorRTEngine
::
Weight
nv_alpha
{
nvinfer1
::
DataType
::
kFLOAT
,
static_cast
<
void
*>
(
alpha_data
),
1
};
TensorRTEngine
::
Weight
nv_shift
{
nvinfer1
::
DataType
::
kFLOAT
,
static_cast
<
void
*>
(
shift_data
),
1
};
TensorRTEngine
::
Weight
nv_power
{
nvinfer1
::
DataType
::
kFLOAT
,
static_cast
<
void
*>
(
power_data
),
1
};
auto
*
scale_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Scale
,
*
matmul_out
,
nvinfer1
::
ScaleMode
::
kUNIFORM
,
nv_shift
.
get
(),
nv_alpha
.
get
(),
nv_power
.
get
());
auto
*
scale_out
=
scale_layer
->
getOutput
(
0
);
scale_layer
->
setName
(
(
"matmul_op_float_has_alpha: ScaleLayer (Output: "
+
output_name
+
")"
)
.
c_str
());
if
(
need_change_dim
)
{
auto
*
reshape_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
scale_out
);
reshape_layer
->
setReshapeDimensions
(
out_shape
);
scale_out
=
reshape_layer
->
getOutput
(
0
);
reshape_layer
->
setName
((
"matmul_op_float_has_alpha_reshape_after: "
"ShuffleLayer (Output: "
+
output_name
+
")"
)
.
c_str
());
}
engine_
->
SetITensor
(
output_name
,
scale_out
);
if
(
test_mode
)
{
// the test framework can not determine which is the
// output, so place the declaration inside.
engine_
->
DeclareOutput
(
output_name
);
}
}
}
}
};
}
// namespace tensorrt
}
// namespace inference
}
// namespace paddle
REGISTER_TRT_OP_CONVERTER
(
matmul
,
MatMulOpConverter
);
paddle/fluid/inference/tensorrt/convert/matmul_v2_op.cc
已删除
100644 → 0
浏览文件 @
acf55016
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/inference/tensorrt/convert/op_converter.h"
#include "paddle/fluid/inference/tensorrt/plugin/matmul_op_int8_plugin.h"
namespace
paddle
{
namespace
inference
{
namespace
tensorrt
{
/*
* MatMulV2Op, IMatrixMultiplyLayer in TRT. This Layer doesn't has weights.
*/
class
MatMulV2OpConverter
:
public
OpConverter
{
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
3
)
<<
"convert a matmul_v2 op to tensorrt IMatrixMultiplyLayer layer "
;
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
nvinfer1
::
IMatrixMultiplyLayer
*
layer
=
nullptr
;
// Declare inputs
auto
*
input1
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"X"
)[
0
]);
auto
*
input2
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"Y"
)[
0
]);
nvinfer1
::
Dims
dims_x
=
input1
->
getDimensions
();
nvinfer1
::
Dims
dims_y
=
input2
->
getDimensions
();
bool
transpose_X
=
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"trans_x"
));
bool
transpose_Y
=
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"trans_y"
));
auto
output_name
=
op_desc
.
Output
(
"Out"
)[
0
];
nvinfer1
::
MatrixOperation
matrix_operation_X
=
transpose_X
?
nvinfer1
::
MatrixOperation
::
kTRANSPOSE
:
nvinfer1
::
MatrixOperation
::
kNONE
;
nvinfer1
::
MatrixOperation
matrix_operation_Y
=
transpose_Y
?
nvinfer1
::
MatrixOperation
::
kTRANSPOSE
:
nvinfer1
::
MatrixOperation
::
kNONE
;
int
one_num
=
0
;
bool
all_matrix
=
dims_x
.
nbDims
>=
2
&&
dims_y
.
nbDims
>=
2
;
nvinfer1
::
ITensor
*
new_shape_tensor
=
nullptr
;
if
(
dims_x
.
nbDims
<
dims_y
.
nbDims
&&
all_matrix
)
{
one_num
=
dims_y
.
nbDims
-
dims_x
.
nbDims
;
new_shape_tensor
=
Shape
(
input1
);
std
::
vector
<
int32_t
>
one_vec
(
one_num
,
1
);
auto
*
one_tensor
=
Add1DConstantLayer
(
one_vec
);
new_shape_tensor
=
Concat
(
std
::
vector
<
nvinfer1
::
ITensor
*>
{
one_tensor
,
new_shape_tensor
});
auto
*
reshape_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
input1
);
reshape_layer
->
setInput
(
1
,
*
new_shape_tensor
);
layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
MatrixMultiply
,
*
reshape_layer
->
getOutput
(
0
),
matrix_operation_X
,
*
input2
,
matrix_operation_Y
);
}
else
if
(
dims_x
.
nbDims
>
dims_y
.
nbDims
&&
all_matrix
)
{
one_num
=
dims_x
.
nbDims
-
dims_y
.
nbDims
;
new_shape_tensor
=
Shape
(
input2
);
std
::
vector
<
int32_t
>
one_vec
(
one_num
,
1
);
auto
*
one_tensor
=
Add1DConstantLayer
(
one_vec
);
new_shape_tensor
=
Concat
(
std
::
vector
<
nvinfer1
::
ITensor
*>
{
one_tensor
,
new_shape_tensor
});
auto
*
reshape_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
input2
);
reshape_layer
->
setInput
(
1
,
*
new_shape_tensor
);
layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
MatrixMultiply
,
*
input1
,
matrix_operation_X
,
*
reshape_layer
->
getOutput
(
0
),
matrix_operation_Y
);
}
else
{
layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
MatrixMultiply
,
*
input1
,
matrix_operation_X
,
*
input2
,
matrix_operation_Y
);
}
if
(
dims_x
.
nbDims
==
1
)
layer
->
setOperation
(
0
,
nvinfer1
::
MatrixOperation
::
kVECTOR
);
if
(
dims_y
.
nbDims
==
1
)
layer
->
setOperation
(
1
,
nvinfer1
::
MatrixOperation
::
kVECTOR
);
nvinfer1
::
ILayer
*
final_layer
=
static_cast
<
nvinfer1
::
ILayer
*>
(
layer
);
// When vec * vec, trt produces a scalar, so to be consistent with paddle,
// we need add a reshape.
if
(
dims_x
.
nbDims
==
1
&&
dims_y
.
nbDims
==
1
)
{
auto
reshape_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
layer
->
getOutput
(
0
));
nvinfer1
::
Dims
reshape_dim
;
reshape_dim
.
nbDims
=
1
;
reshape_dim
.
d
[
0
]
=
1
;
reshape_layer
->
setReshapeDimensions
(
reshape_dim
);
final_layer
=
static_cast
<
nvinfer1
::
ILayer
*>
(
reshape_layer
);
}
VLOG
(
3
)
<<
"Convert a matmul_v2_op to TensorRT "
;
RreplenishLayerAndOutput
(
final_layer
,
"matmul_v2_op"
,
{
output_name
},
test_mode
);
}
};
}
// namespace tensorrt
}
// namespace inference
}
// namespace paddle
REGISTER_TRT_OP_CONVERTER
(
matmul_v2
,
MatMulV2OpConverter
);
paddle/fluid/inference/tensorrt/convert/matrix_multiply_op.cc
0 → 100644
浏览文件 @
ef734e84
/* Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/inference/tensorrt/convert/op_converter.h"
#include "paddle/fluid/inference/tensorrt/plugin/matmul_op_int8_plugin.h"
namespace
paddle
{
namespace
inference
{
namespace
tensorrt
{
/*
* After trt_map_ops_to_matrix_multiply_pass(mul, matmul, matmul_v2 ->
* matrix_multiply), use MatrixMultiply layer, ElementWiseOperation::kPROD
* layer.
*/
class
MatrixMultiplyOpConverter
:
public
OpConverter
{
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
const
framework
::
Scope
&
scope
,
bool
test_mode
)
override
{
VLOG
(
3
)
<<
"convert a matrix_multiply op to TensorRT MatrixMultiply layer + "
"ElementWiseOperation::kPROD layer(if alpha != 1)."
;
// Input: X, Y
// Output: Out
// Attributes: transpose_x, transpose_y, x_num_col_dims, y_num_col_dims,
// alpha. extra Attributes(for quant dequant): X, Y, Out, Input_scale,
// out_threshold.
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
// Declare inputs
auto
*
input1
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"X"
)[
0
]);
auto
*
input2
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"Y"
)[
0
]);
bool
enable_int8
=
(
engine_
->
precision
()
==
AnalysisConfig
::
Precision
::
kInt8
);
float
x_scale
=
0
;
float
y_scale
=
0
;
float
out_scale
=
0
;
if
(
enable_int8
)
{
if
(
op_desc
.
HasAttr
(
"Input_scale"
))
{
x_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"Input_scale"
));
engine_
->
SetTensorDynamicRange
(
input1
,
x_scale
);
}
if
(
op_desc
.
HasAttr
(
"X"
))
{
x_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"X"
));
engine_
->
SetTensorDynamicRange
(
input1
,
x_scale
);
}
if
(
op_desc
.
HasAttr
(
"Y"
))
{
y_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"Y"
));
engine_
->
SetTensorDynamicRange
(
input2
,
y_scale
);
}
if
(
op_desc
.
HasAttr
(
"out_threshold"
))
{
out_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"out_threshold"
));
}
if
(
op_desc
.
HasAttr
(
"Out"
))
{
out_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"Out"
));
}
}
auto
output_name
=
op_desc
.
Output
(
"Out"
)[
0
];
nvinfer1
::
Dims
dims_x
=
input1
->
getDimensions
();
int32_t
x_rank
=
dims_x
.
nbDims
;
nvinfer1
::
Dims
dims_y
=
input2
->
getDimensions
();
int32_t
y_rank
=
dims_y
.
nbDims
;
int32_t
x_num_col_dims
=
PADDLE_GET_CONST
(
int32_t
,
op_desc
.
GetAttr
(
"x_num_col_dims"
));
if
(
x_num_col_dims
<
0
)
{
x_num_col_dims
+=
x_rank
;
}
// Temporarily solve the reformat problem of matrix multiplication, make
// input.rank == 4. Possible solution in trt 8.7.
if
(
x_rank
==
2
&&
x_num_col_dims
==
1
&&
engine_
->
use_varseqlen
())
{
VLOG
(
3
)
<<
"Temporarily solve the reformat problem of matrix "
"multiplication, make input.rank == 4. "
;
auto
*
reshape_before_matrix
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
input1
);
std
::
vector
<
nvinfer1
::
ITensor
*>
reshape_before_tensor
;
reshape_before_tensor
.
push_back
(
GetEleTensorOfShape
(
Shape
(
input1
),
0
));
reshape_before_tensor
.
push_back
(
GetEleTensorOfShape
(
Shape
(
input1
),
1
));
reshape_before_tensor
.
push_back
(
Add1DConstantLayer
(
1
));
reshape_before_tensor
.
push_back
(
Add1DConstantLayer
(
1
));
reshape_before_matrix
->
setInput
(
1
,
*
Concat
(
reshape_before_tensor
));
reshape_before_matrix
->
setName
(
(
"reshape_before_matrix(Output: "
+
output_name
+
")"
).
c_str
());
input1
=
reshape_before_matrix
->
getOutput
(
0
);
dims_x
=
input1
->
getDimensions
();
x_rank
=
dims_x
.
nbDims
;
if
(
enable_int8
)
{
if
(
op_desc
.
HasAttr
(
"Input_scale"
)
||
op_desc
.
HasAttr
(
"X"
))
{
engine_
->
SetTensorDynamicRange
(
input1
,
x_scale
);
}
}
}
if
(
x_num_col_dims
!=
x_rank
-
1
)
{
std
::
vector
<
nvinfer1
::
ITensor
*>
before_shape_tensors
;
nvinfer1
::
ITensor
*
input_shape_tensor
=
Shape
(
input1
);
for
(
int
i
=
0
;
i
<
x_num_col_dims
;
++
i
)
{
before_shape_tensors
.
push_back
(
GetEleTensorOfShape
(
input_shape_tensor
,
i
));
}
nvinfer1
::
ITensor
*
producted
=
Add1DConstantLayer
(
1
);
for
(
int
i
=
x_num_col_dims
;
i
<
x_rank
;
++
i
)
{
producted
=
Prod
(
producted
,
GetEleTensorOfShape
(
input_shape_tensor
,
i
));
}
before_shape_tensors
.
push_back
(
producted
);
nvinfer1
::
ITensor
*
before_shape_tensor
=
Concat
(
before_shape_tensors
);
auto
*
reshape_before_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
input1
);
reshape_before_layer
->
setInput
(
1
,
*
before_shape_tensor
);
reshape_before_layer
->
setName
(
(
"reshape_x_before_matrix_multiply: Shuffle (Output: "
+
output_name
+
")"
)
.
c_str
());
input1
=
reshape_before_layer
->
getOutput
(
0
);
if
(
enable_int8
)
{
if
(
op_desc
.
HasAttr
(
"Input_scale"
)
||
op_desc
.
HasAttr
(
"X"
))
{
engine_
->
SetTensorDynamicRange
(
input1
,
x_scale
);
}
}
x_rank
=
x_num_col_dims
+
1
;
}
int32_t
y_num_col_dims
=
PADDLE_GET_CONST
(
int32_t
,
op_desc
.
GetAttr
(
"y_num_col_dims"
));
if
(
y_num_col_dims
<
0
)
{
y_num_col_dims
+=
y_rank
;
}
PADDLE_ENFORCE_EQ
(
y_num_col_dims
,
y_rank
-
1
,
platform
::
errors
::
InvalidArgument
(
"The matrix_multiply op'y_num_col_dims should be equal "
"to y'rank - 1, but got y_num_col_dims = %d, and y_rank = %d"
,
y_num_col_dims
,
y_rank
-
1
));
if
(
x_rank
!=
1
&&
y_rank
!=
1
&&
x_rank
!=
y_rank
)
{
if
(
x_rank
<
y_rank
)
{
std
::
vector
<
nvinfer1
::
ITensor
*>
before_shape_tensors
;
nvinfer1
::
ITensor
*
input_shape_tensor
=
Shape
(
input1
);
for
(
int
i
=
0
;
i
<
y_rank
-
x_rank
;
++
i
)
{
before_shape_tensors
.
push_back
(
Add1DConstantLayer
(
1
));
}
for
(
int
i
=
0
;
i
<
x_rank
;
++
i
)
{
before_shape_tensors
.
push_back
(
GetEleTensorOfShape
(
input_shape_tensor
,
i
));
}
nvinfer1
::
ITensor
*
before_shape_tensor
=
Concat
(
before_shape_tensors
);
auto
*
reshape_before_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
input1
);
reshape_before_layer
->
setInput
(
1
,
*
before_shape_tensor
);
reshape_before_layer
->
setName
(
(
"full_x_before_matrix_multiply: Shuffle (Output: "
+
output_name
+
")"
)
.
c_str
());
input1
=
reshape_before_layer
->
getOutput
(
0
);
if
(
enable_int8
)
{
if
(
op_desc
.
HasAttr
(
"Input_scale"
)
||
op_desc
.
HasAttr
(
"X"
))
{
engine_
->
SetTensorDynamicRange
(
input1
,
x_scale
);
}
}
x_rank
=
y_rank
;
}
else
{
std
::
vector
<
nvinfer1
::
ITensor
*>
before_shape_tensors
;
nvinfer1
::
ITensor
*
input_shape_tensor
=
Shape
(
input2
);
for
(
int
i
=
0
;
i
<
x_rank
-
y_rank
;
++
i
)
{
before_shape_tensors
.
push_back
(
Add1DConstantLayer
(
1
));
}
for
(
int
i
=
0
;
i
<
y_rank
;
++
i
)
{
before_shape_tensors
.
push_back
(
GetEleTensorOfShape
(
input_shape_tensor
,
i
));
}
nvinfer1
::
ITensor
*
before_shape_tensor
=
Concat
(
before_shape_tensors
);
auto
*
reshape_before_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
input2
);
reshape_before_layer
->
setInput
(
1
,
*
before_shape_tensor
);
reshape_before_layer
->
setName
(
(
"full_y_before_matrix_multiply: Shuffle (Output: "
+
output_name
+
")"
)
.
c_str
());
input2
=
reshape_before_layer
->
getOutput
(
0
);
if
(
enable_int8
)
{
if
(
op_desc
.
HasAttr
(
"Y"
))
{
engine_
->
SetTensorDynamicRange
(
input2
,
y_scale
);
}
}
}
y_rank
=
x_rank
;
}
nvinfer1
::
MatrixOperation
matrix_operation_x
;
nvinfer1
::
MatrixOperation
matrix_operation_y
;
if
(
x_rank
==
1
)
{
matrix_operation_x
=
nvinfer1
::
MatrixOperation
::
kVECTOR
;
}
else
{
bool
transpose_x
=
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"transpose_x"
));
matrix_operation_x
=
transpose_x
?
nvinfer1
::
MatrixOperation
::
kTRANSPOSE
:
nvinfer1
::
MatrixOperation
::
kNONE
;
}
if
(
y_rank
==
1
)
{
matrix_operation_y
=
nvinfer1
::
MatrixOperation
::
kVECTOR
;
}
else
{
bool
transpose_y
=
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"transpose_y"
));
matrix_operation_y
=
transpose_y
?
nvinfer1
::
MatrixOperation
::
kTRANSPOSE
:
nvinfer1
::
MatrixOperation
::
kNONE
;
}
nvinfer1
::
ILayer
*
layer
=
nullptr
;
layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
MatrixMultiply
,
*
input1
,
matrix_operation_x
,
*
input2
,
matrix_operation_y
);
if
(
enable_int8
)
{
if
(
op_desc
.
HasAttr
(
"out_threshold"
)
||
op_desc
.
HasAttr
(
"Out"
))
{
engine_
->
SetTensorDynamicRange
(
layer
->
getOutput
(
0
),
out_scale
);
}
}
float
alpha
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"alpha"
));
if
(
alpha
<
0.999
||
alpha
>
1.001
)
{
auto
*
alpha_tensor
=
Add1DConstantLayer
(
alpha
);
std
::
vector
<
nvinfer1
::
ITensor
*>
alpha_shape_tensors
;
for
(
int
i
=
0
;
i
<
layer
->
getOutput
(
0
)
->
getDimensions
().
nbDims
;
i
++
)
{
alpha_shape_tensors
.
push_back
(
Add1DConstantLayer
(
1
));
}
auto
*
reshape_alpha
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
alpha_tensor
);
reshape_alpha
->
setInput
(
1
,
*
Concat
(
alpha_shape_tensors
));
layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
ElementWise
,
*
layer
->
getOutput
(
0
),
*
reshape_alpha
->
getOutput
(
0
),
nvinfer1
::
ElementWiseOperation
::
kPROD
);
}
RreplenishLayerAndOutput
(
layer
,
"matrix_multiply_op"
,
{
output_name
},
test_mode
);
}
};
}
// namespace tensorrt
}
// namespace inference
}
// namespace paddle
REGISTER_TRT_OP_CONVERTER
(
matrix_multiply
,
MatrixMultiplyOpConverter
);
paddle/fluid/inference/tensorrt/convert/multihead_matmul_op.cc
浏览文件 @
ef734e84
...
@@ -71,14 +71,6 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -71,14 +71,6 @@ class MultiheadMatMulOpConverter : public OpConverter {
int
hidden_out
=
weight_dims
[
2
];
// channels_out
int
hidden_out
=
weight_dims
[
2
];
// channels_out
int
m
=
hidden_in
;
int
m
=
hidden_in
;
int
n
=
three
*
hidden_out
;
int
n
=
three
*
hidden_out
;
auto
tranpose_weight
=
[](
const
float
*
src
,
float
*
dst
,
int
m
,
int
n
)
{
for
(
int
i
=
0
;
i
<
m
;
i
++
)
{
for
(
int
j
=
0
;
j
<
n
;
j
++
)
{
dst
[
j
*
m
+
i
]
=
src
[
i
*
n
+
j
];
}
}
};
tranpose_weight
(
weight_data_tmp
.
data
(),
weight_data
,
m
,
n
);
int
head_number
=
PADDLE_GET_CONST
(
int
,
op_desc
.
GetAttr
(
"head_number"
));
int
head_number
=
PADDLE_GET_CONST
(
int
,
op_desc
.
GetAttr
(
"head_number"
));
...
@@ -102,7 +94,6 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -102,7 +94,6 @@ class MultiheadMatMulOpConverter : public OpConverter {
nvinfer1
::
ITensor
*
mask_tensor
;
nvinfer1
::
ITensor
*
mask_tensor
;
nvinfer1
::
ITensor
*
pos_id_tensor
;
nvinfer1
::
ITensor
*
pos_id_tensor
;
nvinfer1
::
ITensor
*
max_seqlen_tensor
;
nvinfer1
::
ITensor
*
max_seqlen_tensor
;
auto
*
new_input
=
input
;
if
(
flag_varseqlen
)
{
if
(
flag_varseqlen
)
{
mask_tensor
=
engine_
->
GetITensor
(
"qkv_plugin_mask"
);
mask_tensor
=
engine_
->
GetITensor
(
"qkv_plugin_mask"
);
pos_id_tensor
=
engine_
->
GetITensor
(
"pos_id"
);
pos_id_tensor
=
engine_
->
GetITensor
(
"pos_id"
);
...
@@ -188,7 +179,11 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -188,7 +179,11 @@ class MultiheadMatMulOpConverter : public OpConverter {
nvinfer1
::
ILayer
*
transformer_input_layer
=
engine_
->
AddDynamicPlugin
(
nvinfer1
::
ILayer
*
transformer_input_layer
=
engine_
->
AddDynamicPlugin
(
inputs_transformer
.
data
(),
inputs_transformer
.
size
(),
plugin
);
inputs_transformer
.
data
(),
inputs_transformer
.
size
(),
plugin
);
new_input
=
transformer_input_layer
->
getOutput
(
0
);
input
=
transformer_input_layer
->
getOutput
(
0
);
if
(
op_desc
.
HasAttr
(
"Input_scale"
))
{
in_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"Input_scale"
));
engine_
->
SetTensorDynamicRange
(
input
,
in_scale
);
}
mask_tensor
=
transformer_input_layer
->
getOutput
(
1
);
mask_tensor
=
transformer_input_layer
->
getOutput
(
1
);
pos_id_tensor
=
transformer_input_layer
->
getOutput
(
2
);
pos_id_tensor
=
transformer_input_layer
->
getOutput
(
2
);
max_seqlen_tensor
=
transformer_input_layer
->
getOutput
(
3
);
max_seqlen_tensor
=
transformer_input_layer
->
getOutput
(
3
);
...
@@ -204,7 +199,7 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -204,7 +199,7 @@ class MultiheadMatMulOpConverter : public OpConverter {
float
dp_probs
=
1.0
/
127.0
;
float
dp_probs
=
1.0
/
127.0
;
nvinfer1
::
DimsHW
nv_ksize
(
1
,
1
);
nvinfer1
::
DimsHW
nv_ksize
(
1
,
1
);
fc_layer
=
TRT_ENGINE_ADD_LAYER
(
fc_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Convolution
,
*
new_
input
,
n
,
nv_ksize
,
weight
,
bias
);
engine_
,
Convolution
,
*
input
,
n
,
nv_ksize
,
weight
,
bias
);
fc_layer
->
setName
(
fc_layer
->
setName
(
(
"Multihead: Convolution/FullyConnected: (Output: "
+
(
"Multihead: Convolution/FullyConnected: (Output: "
+
output_name
+
")"
)
output_name
+
")"
)
...
@@ -261,22 +256,42 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -261,22 +256,42 @@ class MultiheadMatMulOpConverter : public OpConverter {
RreplenishLayerAndOutput
(
RreplenishLayerAndOutput
(
plugin_layer
,
"multihead_matmul"
,
{
output_name
},
test_mode
);
plugin_layer
,
"multihead_matmul"
,
{
output_name
},
test_mode
);
}
else
{
}
else
{
auto
*
reshape_before_matrix
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
input
);
std
::
vector
<
nvinfer1
::
ITensor
*>
reshape_before_tensor_matrix
;
reshape_before_tensor_matrix
.
push_back
(
GetEleTensorOfShape
(
Shape
(
input
),
0
));
reshape_before_tensor_matrix
.
push_back
(
GetEleTensorOfShape
(
Shape
(
input
),
1
));
reshape_before_matrix
->
setInput
(
1
,
*
Concat
(
reshape_before_tensor_matrix
));
reshape_before_matrix
->
setName
(
(
"reshape_before_matrix(Output: "
+
output_name
+
")"
).
c_str
());
auto
*
input
=
reshape_before_matrix
->
getOutput
(
0
);
if
(
op_desc
.
HasAttr
(
"Input_scale"
))
{
in_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"Input_scale"
));
engine_
->
SetTensorDynamicRange
(
input
,
in_scale
);
}
int
head_size
=
hidden_out
/
head_number
;
int
head_size
=
hidden_out
/
head_number
;
// [3, head_number, head_size, hidden_in] -> [head_number, 3,
// [hidden_in, 3, head_number, head_size] -> [hidden_in, head_number,
// head_size,
// 3, head_size]
// hidden_in]
auto
transpose_weight_v2
=
[](
const
float
*
src
,
auto
transpose_weight_v2
=
[](
const
float
*
src
,
float
*
dst
,
float
*
dst
,
int
three
,
int
three
,
int
head_number
,
int
head_number
,
int
head_size
,
int
head_size
,
int
hidden_in
)
{
int
hidden_in
)
{
const
int
HH
=
head_size
*
hidden_in
;
for
(
int
i
=
0
;
i
<
hidden_in
;
++
i
)
{
for
(
int
i
=
0
;
i
<
three
;
++
i
)
{
for
(
int
j
=
0
;
j
<
three
;
++
j
)
{
for
(
int
n
=
0
;
n
<
head_number
;
++
n
)
{
for
(
int
n
=
0
;
n
<
head_number
;
++
n
)
{
for
(
int
hh
=
0
;
hh
<
HH
;
++
hh
)
{
for
(
int
m
=
0
;
m
<
head_size
;
++
m
)
{
dst
[
n
*
three
*
HH
+
i
*
HH
+
hh
]
=
dst
[
i
*
head_number
*
three
*
head_size
+
src
[
i
*
head_number
*
HH
+
n
*
HH
+
hh
];
n
*
three
*
head_size
+
j
*
head_size
+
m
]
=
src
[
i
*
three
*
head_number
*
head_size
+
j
*
head_number
*
head_size
+
n
*
head_size
+
m
];
}
}
}
}
}
}
}
...
@@ -309,16 +324,61 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -309,16 +324,61 @@ class MultiheadMatMulOpConverter : public OpConverter {
transpose_bias_v2
(
transpose_bias_v2
(
bias_data_tmp
.
data
(),
bias_data
,
head_number
,
head_size
);
bias_data_tmp
.
data
(),
bias_data
,
head_number
,
head_size
);
nvinfer1
::
ILayer
*
fc_layer
=
nullptr
;
float
dp_probs
=
1.0
/
127.0
;
float
dp_probs
=
1.0
/
127.0
;
if
(
op_desc
.
HasAttr
(
"Input_scale"
))
{
nvinfer1
::
DimsHW
nv_ksize
(
1
,
1
);
nvinfer1
::
Dims
trt_dims_weight
;
fc_layer
=
TRT_ENGINE_ADD_LAYER
(
trt_dims_weight
.
nbDims
=
2
;
engine_
,
Convolution
,
*
new_input
,
n
,
nv_ksize
,
weight
,
bias
);
trt_dims_weight
.
d
[
0
]
=
m
;
}
else
{
trt_dims_weight
.
d
[
1
]
=
n
;
fc_layer
=
TRT_ENGINE_ADD_LAYER
(
auto
*
weight_tensor
=
engine_
,
FullyConnected
,
*
new_input
,
n
,
weight
,
bias
);
TRT_ENGINE_ADD_LAYER
(
engine_
,
Constant
,
trt_dims_weight
,
weight
)
}
->
getOutput
(
0
);
bool
transpose_x
=
false
;
bool
transpose_y
=
false
;
nvinfer1
::
MatrixOperation
matrix_operation_x
=
transpose_x
?
nvinfer1
::
MatrixOperation
::
kTRANSPOSE
:
nvinfer1
::
MatrixOperation
::
kNONE
;
nvinfer1
::
MatrixOperation
matrix_operation_y
=
transpose_y
?
nvinfer1
::
MatrixOperation
::
kTRANSPOSE
:
nvinfer1
::
MatrixOperation
::
kNONE
;
auto
*
matrix_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
MatrixMultiply
,
*
input
,
matrix_operation_x
,
*
weight_tensor
,
matrix_operation_y
);
nvinfer1
::
Dims
trt_dims_bias
;
trt_dims_bias
.
nbDims
=
2
;
trt_dims_bias
.
d
[
0
]
=
1
;
trt_dims_bias
.
d
[
1
]
=
n
;
auto
*
bias_tensor
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Constant
,
trt_dims_bias
,
bias
)
->
getOutput
(
0
);
auto
*
add_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
ElementWise
,
*
matrix_layer
->
getOutput
(
0
),
*
bias_tensor
,
nvinfer1
::
ElementWiseOperation
::
kSUM
);
auto
*
reshape_before_multihead_layer
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
add_layer
->
getOutput
(
0
));
std
::
vector
<
nvinfer1
::
ITensor
*>
reshape_tensor
;
reshape_tensor
.
push_back
(
GetEleTensorOfShape
(
Shape
(
matrix_layer
->
getOutput
(
0
)),
0
));
reshape_tensor
.
push_back
(
GetEleTensorOfShape
(
Shape
(
matrix_layer
->
getOutput
(
0
)),
1
));
reshape_tensor
.
push_back
(
Add1DConstantLayer
(
1
));
reshape_tensor
.
push_back
(
Add1DConstantLayer
(
1
));
reshape_before_multihead_layer
->
setInput
(
1
,
*
Concat
(
reshape_tensor
));
reshape_before_multihead_layer
->
setName
(
(
"reshape_before_multihead_mamul(Output: "
+
output_name
+
")"
)
.
c_str
());
if
(
op_desc
.
HasAttr
(
"fc_out_threshold"
))
{
if
(
op_desc
.
HasAttr
(
"fc_out_threshold"
))
{
PADDLE_ENFORCE_EQ
(
op_desc
.
HasAttr
(
"fc_out_threshold"
),
PADDLE_ENFORCE_EQ
(
op_desc
.
HasAttr
(
"fc_out_threshold"
),
...
@@ -328,12 +388,19 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -328,12 +388,19 @@ class MultiheadMatMulOpConverter : public OpConverter {
"in int8 mode"
));
"in int8 mode"
));
float
out_scale
=
float
out_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"fc_out_threshold"
));
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"fc_out_threshold"
));
engine_
->
SetTensorDynamicRange
(
fc_layer
->
getOutput
(
0
),
out_scale
);
engine_
->
SetTensorDynamicRange
(
matrix_layer
->
getOutput
(
0
),
out_scale
);
engine_
->
SetTensorDynamicRange
(
add_layer
->
getOutput
(
0
),
out_scale
);
engine_
->
SetTensorDynamicRange
(
reshape_before_multihead_layer
->
getOutput
(
0
),
out_scale
);
if
(
qkv2context_plugin_int8
)
{
if
(
qkv2context_plugin_int8
)
{
dp_probs
=
dp_probs
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"dp_probs"
))
/
127.0
;
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"dp_probs"
))
/
127.0
;
}
}
}
}
auto
creator
=
GetPluginRegistry
()
->
getPluginCreator
(
auto
creator
=
GetPluginRegistry
()
->
getPluginCreator
(
"CustomQKVToContextPluginDynamic"
,
"2"
);
"CustomQKVToContextPluginDynamic"
,
"2"
);
assert
(
creator
!=
nullptr
);
assert
(
creator
!=
nullptr
);
...
@@ -375,7 +442,8 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -375,7 +442,8 @@ class MultiheadMatMulOpConverter : public OpConverter {
free
(
plugin_collection
);
free
(
plugin_collection
);
std
::
vector
<
nvinfer1
::
ITensor
*>
plugin_inputs
;
std
::
vector
<
nvinfer1
::
ITensor
*>
plugin_inputs
;
plugin_inputs
.
emplace_back
(
fc_layer
->
getOutput
(
0
));
plugin_inputs
.
emplace_back
(
reshape_before_multihead_layer
->
getOutput
(
0
));
plugin_inputs
.
emplace_back
(
mask_tensor
);
plugin_inputs
.
emplace_back
(
mask_tensor
);
plugin_inputs
.
emplace_back
(
pos_id_tensor
);
plugin_inputs
.
emplace_back
(
pos_id_tensor
);
plugin_inputs
.
emplace_back
(
plugin_inputs
.
emplace_back
(
...
@@ -389,7 +457,8 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -389,7 +457,8 @@ class MultiheadMatMulOpConverter : public OpConverter {
if
(
!
flag_varseqlen
)
{
if
(
!
flag_varseqlen
)
{
std
::
vector
<
nvinfer1
::
ITensor
*>
output_transformer
;
std
::
vector
<
nvinfer1
::
ITensor
*>
output_transformer
;
output_transformer
.
emplace_back
(
plugin_layer
->
getOutput
(
0
));
output_transformer
.
emplace_back
(
plugin_layer
->
getOutput
(
0
));
output_transformer
.
emplace_back
(
input
);
output_transformer
.
emplace_back
(
engine_
->
GetITensor
(
op_desc
.
Input
(
"Input"
).
front
()));
output_transformer
.
emplace_back
(
pos_id_tensor
);
output_transformer
.
emplace_back
(
pos_id_tensor
);
plugin
::
TransformerOutputConvertPlugin
*
plugin
=
plugin
::
TransformerOutputConvertPlugin
*
plugin
=
new
plugin
::
TransformerOutputConvertPlugin
();
new
plugin
::
TransformerOutputConvertPlugin
();
...
@@ -401,9 +470,23 @@ class MultiheadMatMulOpConverter : public OpConverter {
...
@@ -401,9 +470,23 @@ class MultiheadMatMulOpConverter : public OpConverter {
transformer_output_layer
->
getOutput
(
0
));
transformer_output_layer
->
getOutput
(
0
));
}
else
{
}
else
{
engine_
->
SetITensor
(
output_name
,
plugin_layer
->
getOutput
(
0
));
engine_
->
SetITensor
(
output_name
,
plugin_layer
->
getOutput
(
0
));
if
(
op_desc
.
HasAttr
(
"out_threshold"
))
{
float
out_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"out_threshold"
));
engine_
->
SetTensorDynamicRange
(
plugin_layer
->
getOutput
(
0
),
out_scale
);
}
}
}
}
}
}
else
{
}
else
{
auto
tranpose_weight
=
[](
const
float
*
src
,
float
*
dst
,
int
m
,
int
n
)
{
for
(
int
i
=
0
;
i
<
m
;
i
++
)
{
for
(
int
j
=
0
;
j
<
n
;
j
++
)
{
dst
[
j
*
m
+
i
]
=
src
[
i
*
n
+
j
];
}
}
};
tranpose_weight
(
weight_data_tmp
.
data
(),
weight_data
,
m
,
n
);
if
(
input_dims
.
d
[
1
]
<=
384
&&
!
bias_qk_attr
&&
if
(
input_dims
.
d
[
1
]
<=
384
&&
!
bias_qk_attr
&&
engine_
->
precision
()
!=
AnalysisConfig
::
Precision
::
kFloat32
&&
engine_
->
precision
()
!=
AnalysisConfig
::
Precision
::
kFloat32
&&
platform
::
GetGPUComputeCapability
(
platform
::
GetCurrentDeviceId
())
>=
platform
::
GetGPUComputeCapability
(
platform
::
GetCurrentDeviceId
())
>=
...
...
paddle/fluid/inference/tensorrt/convert/one_hot_op.cc
浏览文件 @
ef734e84
...
@@ -56,6 +56,8 @@ class OneHotOpConverter : public OpConverter {
...
@@ -56,6 +56,8 @@ class OneHotOpConverter : public OpConverter {
if
(
dtype
==
6
)
{
// int64
if
(
dtype
==
6
)
{
// int64
VLOG
(
3
)
<<
"trt not support float64, so it is converted to float32."
;
VLOG
(
3
)
<<
"trt not support float64, so it is converted to float32."
;
}
}
}
else
{
PADDLE_THROW
(
platform
::
errors
::
Fatal
(
"one_hot is not supported"
));
}
}
auto
depth_name
=
op_desc
.
Input
(
"depth_tensor"
);
auto
depth_name
=
op_desc
.
Input
(
"depth_tensor"
);
...
...
paddle/fluid/inference/tensorrt/convert/op_converter.h
浏览文件 @
ef734e84
...
@@ -59,19 +59,6 @@ class OpConverter {
...
@@ -59,19 +59,6 @@ class OpConverter {
auto
op_converter_type_map
=
OpTeller
::
Global
().
GetOpConverterTypeMap
();
auto
op_converter_type_map
=
OpTeller
::
Global
().
GetOpConverterTypeMap
();
switch
(
op_converter_type_map
.
at
(
op_desc
.
Type
()))
{
switch
(
op_converter_type_map
.
at
(
op_desc
.
Type
()))
{
case
OpConverterType
::
Default
:
case
OpConverterType
::
Default
:
if
(
op_desc
.
Type
()
==
"mul"
)
{
PADDLE_ENFORCE_EQ
(
op_desc
.
Input
(
"Y"
).
size
(),
1UL
,
platform
::
errors
::
InvalidArgument
(
"The input op mul's Input(
\"
Y
\"
)."
"size() should equal to 1, but reveceid "
"Input(
\"
Y
\"
).size() = %u."
,
op_desc
.
Input
(
"Y"
).
size
()));
std
::
string
Y
=
op_desc
.
Input
(
"Y"
)[
0
];
if
(
parameters
.
count
(
Y
))
{
it
=
Registry
<
OpConverter
>::
Global
().
Lookup
(
"fc"
);
}
}
if
(
op_desc
.
Type
().
find
(
"elementwise"
)
!=
std
::
string
::
npos
)
{
if
(
op_desc
.
Type
().
find
(
"elementwise"
)
!=
std
::
string
::
npos
)
{
static
std
::
unordered_set
<
std
::
string
>
add_tensor_op_set
{
static
std
::
unordered_set
<
std
::
string
>
add_tensor_op_set
{
"add"
,
"mul"
,
"sub"
,
"div"
,
"max"
,
"min"
,
"pow"
,
"mod"
};
"add"
,
"mul"
,
"sub"
,
"div"
,
"max"
,
"min"
,
"pow"
,
"mod"
};
...
...
paddle/fluid/inference/tensorrt/convert/skip_layernorm.cc
浏览文件 @
ef734e84
...
@@ -31,6 +31,7 @@ class SkipLayerNormOpConverter : public OpConverter {
...
@@ -31,6 +31,7 @@ class SkipLayerNormOpConverter : public OpConverter {
platform
::
errors
::
InvalidArgument
(
platform
::
errors
::
InvalidArgument
(
"Skip_layernorm must run the dynamic shape mode."
));
"Skip_layernorm must run the dynamic shape mode."
));
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
framework
::
OpDesc
op_desc
(
op
,
nullptr
);
auto
output_name
=
op_desc
.
Output
(
"Out"
)[
0
];
auto
GetWeight
=
auto
GetWeight
=
[
&
](
const
std
::
string
&
arg_name
)
->
TensorRTEngine
::
Weight
{
[
&
](
const
std
::
string
&
arg_name
)
->
TensorRTEngine
::
Weight
{
std
::
string
var_name
=
op_desc
.
Input
(
arg_name
).
front
();
std
::
string
var_name
=
op_desc
.
Input
(
arg_name
).
front
();
...
@@ -42,15 +43,72 @@ class SkipLayerNormOpConverter : public OpConverter {
...
@@ -42,15 +43,72 @@ class SkipLayerNormOpConverter : public OpConverter {
// Declare inputs
// Declare inputs
auto
*
input1
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"X"
)[
0
]);
auto
*
input1
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"X"
)[
0
]);
auto
*
input2
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"Y"
)[
0
]);
auto
*
input2
=
engine_
->
GetITensor
(
op_desc
.
Input
(
"Y"
)[
0
]);
bool
enable_int8
=
(
engine_
->
precision
()
==
AnalysisConfig
::
Precision
::
kInt8
);
float
x_scale
=
0
;
float
y_scale
=
0
;
if
(
enable_int8
)
{
if
(
op_desc
.
HasAttr
(
"X"
))
{
x_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"X"
));
engine_
->
SetTensorDynamicRange
(
input1
,
x_scale
);
}
if
(
op_desc
.
HasAttr
(
"Y"
))
{
y_scale
=
PADDLE_GET_CONST
(
float
,
op_desc
.
GetAttr
(
"Y"
));
engine_
->
SetTensorDynamicRange
(
input2
,
y_scale
);
}
}
nvinfer1
::
Dims
dims_x
=
input1
->
getDimensions
();
int32_t
x_rank
=
dims_x
.
nbDims
;
nvinfer1
::
Dims
dims_y
=
input2
->
getDimensions
();
int32_t
y_rank
=
dims_y
.
nbDims
;
if
((
x_rank
==
2
&&
y_rank
==
4
)
||
(
y_rank
==
2
&&
x_rank
==
4
))
{
if
(
x_rank
==
2
&&
y_rank
==
4
)
{
auto
*
reshape_before_skiplayn
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
input1
);
std
::
vector
<
nvinfer1
::
ITensor
*>
reshape_before_tensor
;
reshape_before_tensor
.
push_back
(
GetEleTensorOfShape
(
Shape
(
input1
),
0
));
reshape_before_tensor
.
push_back
(
GetEleTensorOfShape
(
Shape
(
input1
),
1
));
reshape_before_tensor
.
push_back
(
Add1DConstantLayer
(
1
));
reshape_before_tensor
.
push_back
(
Add1DConstantLayer
(
1
));
reshape_before_skiplayn
->
setInput
(
1
,
*
Concat
(
reshape_before_tensor
));
reshape_before_skiplayn
->
setName
(
(
"reshape_before_skiplayn(Output: "
+
output_name
+
")"
).
c_str
());
input1
=
reshape_before_skiplayn
->
getOutput
(
0
);
if
(
enable_int8
)
{
if
(
op_desc
.
HasAttr
(
"X"
))
{
engine_
->
SetTensorDynamicRange
(
input1
,
x_scale
);
}
}
}
else
{
auto
*
reshape_before_skiplayn
=
TRT_ENGINE_ADD_LAYER
(
engine_
,
Shuffle
,
*
input2
);
std
::
vector
<
nvinfer1
::
ITensor
*>
reshape_before_tensor
;
reshape_before_tensor
.
push_back
(
GetEleTensorOfShape
(
Shape
(
input2
),
0
));
reshape_before_tensor
.
push_back
(
GetEleTensorOfShape
(
Shape
(
input2
),
1
));
reshape_before_tensor
.
push_back
(
Add1DConstantLayer
(
1
));
reshape_before_tensor
.
push_back
(
Add1DConstantLayer
(
1
));
reshape_before_skiplayn
->
setInput
(
1
,
*
Concat
(
reshape_before_tensor
));
reshape_before_skiplayn
->
setName
(
(
"reshape_before_skiplayn(Output: "
+
output_name
+
")"
).
c_str
());
input2
=
reshape_before_skiplayn
->
getOutput
(
0
);
if
(
enable_int8
)
{
if
(
op_desc
.
HasAttr
(
"Y"
))
{
engine_
->
SetTensorDynamicRange
(
input2
,
y_scale
);
}
}
}
}
std
::
vector
<
nvinfer1
::
ITensor
*>
inputs
;
std
::
vector
<
nvinfer1
::
ITensor
*>
inputs
;
inputs
.
push_back
(
input1
);
inputs
.
push_back
(
input1
);
inputs
.
push_back
(
input2
);
inputs
.
push_back
(
input2
);
bool
enable_int8
=
false
;
if
(
op_desc
.
HasAttr
(
"enable_int8"
))
{
enable_int8
=
PADDLE_GET_CONST
(
bool
,
op_desc
.
GetAttr
(
"enable_int8"
));
}
std
::
vector
<
float
>
smooth_scale
;
std
::
vector
<
float
>
smooth_scale
;
bool
use_smooth
=
false
;
bool
use_smooth
=
false
;
if
(
op_desc
.
HasAttr
(
"smooth_scale"
))
{
if
(
op_desc
.
HasAttr
(
"smooth_scale"
))
{
...
@@ -199,7 +257,6 @@ class SkipLayerNormOpConverter : public OpConverter {
...
@@ -199,7 +257,6 @@ class SkipLayerNormOpConverter : public OpConverter {
layer
=
plugin_layer
;
layer
=
plugin_layer
;
}
}
}
}
auto
output_name
=
op_desc
.
Output
(
"Out"
)[
0
];
RreplenishLayerAndOutput
(
layer
,
"skip_layernorm"
,
{
output_name
},
test_mode
);
RreplenishLayerAndOutput
(
layer
,
"skip_layernorm"
,
{
output_name
},
test_mode
);
}
}
};
};
...
...
paddle/fluid/inference/tensorrt/engine.cc
浏览文件 @
ef734e84
...
@@ -157,6 +157,13 @@ void TensorRTEngine::FreezeNetwork() {
...
@@ -157,6 +157,13 @@ void TensorRTEngine::FreezeNetwork() {
#else
#else
infer_builder_config_
->
setMaxWorkspaceSize
(
max_workspace_
);
infer_builder_config_
->
setMaxWorkspaceSize
(
max_workspace_
);
#endif
#endif
#if IS_TRT_VERSION_GE(8500)
infer_builder_config_
->
setPreviewFeature
(
nvinfer1
::
PreviewFeature
::
kFASTER_DYNAMIC_SHAPES_0805
,
true
);
#else
#endif
bool
enable_fp16
=
(
precision_
==
AnalysisConfig
::
Precision
::
kHalf
);
bool
enable_fp16
=
(
precision_
==
AnalysisConfig
::
Precision
::
kHalf
);
if
(
enable_fp16
)
{
if
(
enable_fp16
)
{
bool
support_fp16
=
infer_builder_
->
platformHasFastFp16
();
bool
support_fp16
=
infer_builder_
->
platformHasFastFp16
();
...
...
paddle/fluid/inference/tensorrt/op_teller.cc
浏览文件 @
ef734e84
...
@@ -393,62 +393,6 @@ struct SimpleOpTypeSetTeller : public Teller {
...
@@ -393,62 +393,6 @@ struct SimpleOpTypeSetTeller : public Teller {
return
false
;
return
false
;
#endif
#endif
}
}
if
(
op_type
==
"matmul_v2"
)
{
if
(
!
with_dynamic_shape
)
{
return
false
;
}
auto
*
block
=
desc
.
Block
();
if
(
block
==
nullptr
)
{
VLOG
(
3
)
<<
"The block desc is nullptr, we can't continue to analyze. "
"Developers need to check whether block_desc is passed in "
"the pass."
;
return
false
;
}
return
true
;
}
if
(
op_type
==
"matmul"
)
{
auto
*
block
=
desc
.
Block
();
if
(
block
==
nullptr
)
{
VLOG
(
3
)
<<
"The block desc is nullptr, we can't continue to analyze. "
"Developers need to check whether block_desc is passed in "
"the pass."
;
return
false
;
}
// not support broadcast
auto
*
x_var_desc
=
block
->
FindVar
(
desc
.
Input
(
"X"
)[
0
]);
auto
*
y_var_desc
=
block
->
FindVar
(
desc
.
Input
(
"Y"
)[
0
]);
const
auto
x_shape
=
x_var_desc
->
GetShape
();
const
auto
y_shape
=
y_var_desc
->
GetShape
();
if
(
x_shape
.
size
()
!=
y_shape
.
size
())
{
VLOG
(
3
)
<<
"matmul op not support broadcast, please check inputs'shape. "
;
return
false
;
}
uint64_t
dims
=
2
;
for
(
size_t
i
=
0
;
i
<
x_shape
.
size
()
-
dims
;
++
i
)
{
if
(
x_shape
[
i
]
!=
y_shape
[
i
]
&&
(
x_shape
[
i
]
==
1
||
y_shape
[
i
]
==
1
))
{
VLOG
(
3
)
<<
"matmul op not support broadcast, please check "
"inputs'shape[i]. "
;
return
false
;
}
}
for
(
auto
&
param_name
:
desc
.
Inputs
())
{
for
(
auto
&
var_name
:
param_name
.
second
)
{
auto
*
var_desc
=
block
->
FindVar
(
var_name
);
const
auto
shape
=
var_desc
->
GetShape
();
if
(
shape
.
size
()
<
3
)
{
VLOG
(
3
)
<<
"matmul op dims < 3 not supported in tensorrt, but got dims "
<<
shape
.
size
()
<<
", so jump it."
;
return
false
;
}
}
}
}
if
(
op_type
==
"softmax"
)
{
if
(
op_type
==
"softmax"
)
{
auto
*
block
=
desc
.
Block
();
auto
*
block
=
desc
.
Block
();
if
(
block
==
nullptr
)
{
if
(
block
==
nullptr
)
{
...
@@ -2158,63 +2102,6 @@ struct SimpleOpTypeSetTeller : public Teller {
...
@@ -2158,63 +2102,6 @@ struct SimpleOpTypeSetTeller : public Teller {
}
}
}
}
if
(
op_type
==
"fc"
)
{
auto
*
block
=
desc
.
Block
();
if
(
block
==
nullptr
)
{
VLOG
(
3
)
<<
"The block desc is nullptr, we can't continue to analyze. "
"Developers need to check whether block_desc is passed in "
"the pass."
;
return
false
;
}
// y'shapes == 2
auto
fc_inputs
=
desc
.
Inputs
();
std
::
string
fc_y
=
""
;
if
(
fc_inputs
.
find
(
"Y"
)
!=
fc_inputs
.
end
())
{
fc_y
=
"Y"
;
}
else
if
(
fc_inputs
.
find
(
"W"
)
!=
fc_inputs
.
end
())
{
fc_y
=
"W"
;
}
else
{
VLOG
(
3
)
<<
" input_y(fc_op) must be Y or W "
;
return
false
;
}
// There is currently no input: Y(weight) more than two dimensions
/*
auto* y_var_desc = block->FindVar(desc.Input(fc_y)[0]);
const auto y_shape = y_var_desc->GetShape();
if (y_shape.size() != 2) {
VLOG(3)
<< " input_y(fc_op)'shapes must be 2, but input_y(fc_op)'shapes =
"
<< y_shape.size();
return false;
}
// y_num_col_dims ==1
if (desc.HasAttr("y_num_col_dims")) {
int y_num_col_dims =
PADDLE_GET_CONST(int, desc.GetAttr("y_num_col_dims"));
if (y_num_col_dims != 1) {
VLOG(3) << " fc_op'y_num_col_dims must be 1, but y_num_col_dims = "
<< y_num_col_dims;
return false;
}
}
*/
int
x_num_col_dims
=
desc
.
HasAttr
(
"x_num_col_dims"
)
?
PADDLE_GET_CONST
(
int
,
desc
.
GetAttr
(
"x_num_col_dims"
))
:
(
desc
.
HasAttr
(
"in_num_col_dims"
)
?
PADDLE_GET_CONST
(
int
,
desc
.
GetAttr
(
"in_num_col_dims"
))
:
1
);
if
(
x_num_col_dims
<
1
)
{
VLOG
(
3
)
<<
"fc_op expects x_num_col_dims >= 1, "
"but x_num_col_dims = "
<<
x_num_col_dims
;
return
false
;
}
}
if
(
op_type
==
"reshape"
||
op_type
==
"reshape2"
)
{
if
(
op_type
==
"reshape"
||
op_type
==
"reshape2"
)
{
if
(
!
desc
.
HasAttr
(
"shape"
))
{
if
(
!
desc
.
HasAttr
(
"shape"
))
{
return
false
;
return
false
;
...
@@ -2798,9 +2685,7 @@ struct SimpleOpTypeSetTeller : public Teller {
...
@@ -2798,9 +2685,7 @@ struct SimpleOpTypeSetTeller : public Teller {
private:
private:
// use this set for no calib int8.
// use this set for no calib int8.
std
::
unordered_set
<
std
::
string
>
int8_teller_set
{
std
::
unordered_set
<
std
::
string
>
int8_teller_set
{
"mul"
,
"matrix_multiply"
,
"matmul"
,
"matmul_v2"
,
"bmm"
,
"bmm"
,
"range"
,
"range"
,
"conv2d"
,
"conv2d"
,
...
@@ -2869,7 +2754,6 @@ struct SimpleOpTypeSetTeller : public Teller {
...
@@ -2869,7 +2754,6 @@ struct SimpleOpTypeSetTeller : public Teller {
"conv2d_transpose"
,
"conv2d_transpose"
,
"depthwise_conv2d_transpose"
,
"depthwise_conv2d_transpose"
,
"leaky_relu"
,
"leaky_relu"
,
"fc"
,
"shuffle_channel"
,
"shuffle_channel"
,
"where"
,
"where"
,
"bitwise_not"
,
"bitwise_not"
,
...
@@ -2958,9 +2842,7 @@ struct SimpleOpTypeSetTeller : public Teller {
...
@@ -2958,9 +2842,7 @@ struct SimpleOpTypeSetTeller : public Teller {
"cumsum"
};
"cumsum"
};
std
::
unordered_set
<
std
::
string
>
teller_set
{
std
::
unordered_set
<
std
::
string
>
teller_set
{
"mul"
,
"matrix_multiply"
,
"matmul"
,
"matmul_v2"
,
"bmm"
,
"bmm"
,
"range"
,
"range"
,
"conv2d"
,
"conv2d"
,
...
@@ -3029,7 +2911,6 @@ struct SimpleOpTypeSetTeller : public Teller {
...
@@ -3029,7 +2911,6 @@ struct SimpleOpTypeSetTeller : public Teller {
"conv2d_transpose"
,
"conv2d_transpose"
,
"depthwise_conv2d_transpose"
,
"depthwise_conv2d_transpose"
,
"leaky_relu"
,
"leaky_relu"
,
"fc"
,
"shuffle_channel"
,
"shuffle_channel"
,
"where"
,
"where"
,
"bitwise_not"
,
"bitwise_not"
,
...
...
paddle/fluid/operators/tensorrt/tensorrt_engine_op_test.cc
浏览文件 @
ef734e84
...
@@ -72,31 +72,39 @@ void DynamicShapeTest(bool allow_build_at_runtime) {
...
@@ -72,31 +72,39 @@ void DynamicShapeTest(bool allow_build_at_runtime) {
LOG
(
INFO
)
<<
"create block desc"
;
LOG
(
INFO
)
<<
"create block desc"
;
framework
::
BlockDesc
block_desc
(
&
program
,
block_
);
framework
::
BlockDesc
block_desc
(
&
program
,
block_
);
LOG
(
INFO
)
<<
"create fc op"
;
LOG
(
INFO
)
<<
"create elementwise_add op"
;
auto
*
fc0
=
block_desc
.
AppendOp
();
auto
*
elementwise_add0
=
block_desc
.
AppendOp
();
fc0
->
SetType
(
"fc"
);
elementwise_add0
->
SetType
(
"elementwise_add"
);
fc0
->
SetInput
(
"X"
,
std
::
vector
<
std
::
string
>
({
"x"
}));
// 4 x 1 x 1
elementwise_add0
->
SetInput
(
"X"
,
fc0
->
SetInput
(
"Y"
,
std
::
vector
<
std
::
string
>
({
"y"
}));
// 4 x 6
std
::
vector
<
std
::
string
>
({
"x"
}));
// 2 x 4 x 4 x 4
fc0
->
SetOutput
(
"Out"
,
std
::
vector
<
std
::
string
>
({
"z"
}));
// 6 x 1 x 1
elementwise_add0
->
SetInput
(
"Y"
,
std
::
vector
<
std
::
string
>
({
"y"
}));
// 1 x 4 x 1 x 1
LOG
(
INFO
)
<<
"create fc op"
;
elementwise_add0
->
SetOutput
(
auto
*
fc1
=
block_desc
.
AppendOp
();
"Out"
,
std
::
vector
<
std
::
string
>
({
"z"
}));
// 2 x 4 x 4 x 4
fc1
->
SetType
(
"fc"
);
elementwise_add0
->
SetAttr
(
"axis"
,
static_cast
<
int32_t
>
(
0
));
fc1
->
SetInput
(
"X"
,
std
::
vector
<
std
::
string
>
({
"z"
}));
fc1
->
SetInput
(
"Y"
,
std
::
vector
<
std
::
string
>
({
"y0"
}));
// 6 x 8
LOG
(
INFO
)
<<
"create elementwise_add op"
;
fc1
->
SetOutput
(
"Out"
,
std
::
vector
<
std
::
string
>
({
"z0"
}));
// 8 x 1 x 1
auto
*
elementwise_add1
=
block_desc
.
AppendOp
();
elementwise_add1
->
SetType
(
"elementwise_add"
);
elementwise_add1
->
SetInput
(
"X"
,
std
::
vector
<
std
::
string
>
({
"z"
}));
// 2 x 4 x 4 x 4
elementwise_add1
->
SetInput
(
"Y"
,
std
::
vector
<
std
::
string
>
({
"y0"
}));
// 1 x 4 x 4 x 4
elementwise_add1
->
SetOutput
(
"Out"
,
std
::
vector
<
std
::
string
>
({
"z0"
}));
// 2 x 4 x 4 x 4
elementwise_add1
->
SetAttr
(
"axis"
,
static_cast
<
int32_t
>
(
0
));
// Set inputs' variable shape in BlockDesc
// Set inputs' variable shape in BlockDesc
// the batch size is 2, so the dims of 'x' is {2, 4
, 1, 1
}
// the batch size is 2, so the dims of 'x' is {2, 4}
AddTensorToBlockDesc
(
block_
,
"x"
,
std
::
vector
<
int64_t
>
({
2
,
4
,
1
,
1
}));
AddTensorToBlockDesc
(
block_
,
"x"
,
std
::
vector
<
int64_t
>
({
2
,
4
,
4
,
4
}));
AddTensorToBlockDesc
(
block_
,
"y"
,
std
::
vector
<
int64_t
>
({
4
,
6
}));
AddTensorToBlockDesc
(
block_
,
"y"
,
std
::
vector
<
int64_t
>
({
1
,
4
,
1
,
1
}));
AddTensorToBlockDesc
(
block_
,
"y0"
,
std
::
vector
<
int64_t
>
({
6
,
8
}));
AddTensorToBlockDesc
(
block_
,
"y0"
,
std
::
vector
<
int64_t
>
({
1
,
4
,
4
,
4
}));
AddTensorToBlockDesc
(
block_
,
"z"
,
std
::
vector
<
int64_t
>
({
2
,
6
}));
AddTensorToBlockDesc
(
block_
,
"z"
,
std
::
vector
<
int64_t
>
({
2
,
4
,
4
,
4
}));
AddTensorToBlockDesc
(
block_
,
"z0"
,
std
::
vector
<
int64_t
>
({
8
,
1
,
1
}));
AddTensorToBlockDesc
(
block_
,
"z0"
,
std
::
vector
<
int64_t
>
({
2
,
4
,
4
,
4
}));
// It is wired, need to copy manually.
// It is wired, need to copy manually.
*
block_
->
add_ops
()
=
*
fc
0
->
Proto
();
*
block_
->
add_ops
()
=
*
elementwise_add
0
->
Proto
();
*
block_
->
add_ops
()
=
*
fc
1
->
Proto
();
*
block_
->
add_ops
()
=
*
elementwise_add
1
->
Proto
();
ASSERT_EQ
(
block_
->
ops_size
(),
2
);
ASSERT_EQ
(
block_
->
ops_size
(),
2
);
...
@@ -132,9 +140,9 @@ void DynamicShapeTest(bool allow_build_at_runtime) {
...
@@ -132,9 +140,9 @@ void DynamicShapeTest(bool allow_build_at_runtime) {
engine_op_desc
.
SetAttr
(
"use_static_engine"
,
true
);
engine_op_desc
.
SetAttr
(
"use_static_engine"
,
true
);
engine_op_desc
.
SetAttr
(
"dynamic_shape_names"
,
std
::
vector
<
std
::
string
>
{
"x"
});
engine_op_desc
.
SetAttr
(
"dynamic_shape_names"
,
std
::
vector
<
std
::
string
>
{
"x"
});
engine_op_desc
.
SetAttr
(
"dynamic_shape_lens"
,
std
::
vector
<
int
>
{
4
});
engine_op_desc
.
SetAttr
(
"dynamic_shape_lens"
,
std
::
vector
<
int
>
{
4
});
engine_op_desc
.
SetAttr
(
"min_input_shape"
,
std
::
vector
<
int
>
{
1
,
4
,
1
,
1
});
engine_op_desc
.
SetAttr
(
"min_input_shape"
,
std
::
vector
<
int
>
{
1
,
1
,
1
,
1
});
engine_op_desc
.
SetAttr
(
"max_input_shape"
,
std
::
vector
<
int
>
{
2
,
4
,
1
,
1
});
engine_op_desc
.
SetAttr
(
"max_input_shape"
,
std
::
vector
<
int
>
{
16
,
16
,
16
,
16
});
engine_op_desc
.
SetAttr
(
"opt_input_shape"
,
std
::
vector
<
int
>
{
2
,
4
,
1
,
1
});
engine_op_desc
.
SetAttr
(
"opt_input_shape"
,
std
::
vector
<
int
>
{
2
,
4
,
4
,
4
});
engine_op_desc
.
SetAttr
(
"model_precision"
,
engine_op_desc
.
SetAttr
(
"model_precision"
,
static_cast
<
int
>
(
phi
::
DataType
::
FLOAT32
));
static_cast
<
int
>
(
phi
::
DataType
::
FLOAT32
));
...
@@ -151,26 +159,22 @@ void DynamicShapeTest(bool allow_build_at_runtime) {
...
@@ -151,26 +159,22 @@ void DynamicShapeTest(bool allow_build_at_runtime) {
ctx
.
PartialInitWithAllocator
();
ctx
.
PartialInitWithAllocator
();
// Prepare variables.
// Prepare variables.
if
(
allow_build_at_runtime
)
if
(
allow_build_at_runtime
)
CreateCUDATensor
(
&
scope
,
"x"
,
std
::
vector
<
int64_t
>
({
3
,
4
,
1
,
1
}));
CreateCUDATensor
(
&
scope
,
"x"
,
std
::
vector
<
int64_t
>
({
3
2
,
4
,
4
,
4
}));
else
else
CreateCUDATensor
(
&
scope
,
"x"
,
std
::
vector
<
int64_t
>
({
2
,
4
,
1
,
1
}));
CreateCUDATensor
(
&
scope
,
"x"
,
std
::
vector
<
int64_t
>
({
2
,
4
,
4
,
4
}));
CreateCUDATensor
(
&
scope
,
"y"
,
std
::
vector
<
int64_t
>
({
4
,
6
}));
CreateCUDATensor
(
&
scope
,
"y"
,
std
::
vector
<
int64_t
>
({
1
,
4
,
1
,
1
}));
CreateCUDATensor
(
&
scope
,
"y0"
,
std
::
vector
<
int64_t
>
({
6
,
8
}));
CreateCUDATensor
(
&
scope
,
"y0"
,
std
::
vector
<
int64_t
>
({
1
,
4
,
4
,
4
}));
CreateCUDATensor
(
&
scope
,
"z0"
,
std
::
vector
<
int64_t
>
({
2
,
8
}));
CreateCUDATensor
(
&
scope
,
"z0"
,
std
::
vector
<
int64_t
>
({
2
,
4
,
4
,
4
}));
// Execute them.
// Execute them.
LOG
(
INFO
)
<<
"engine_op run"
;
LOG
(
INFO
)
<<
"engine_op run"
;
inference
::
tensorrt
::
OpTeller
::
Global
().
SetOpConverterType
(
inference
::
tensorrt
::
OpTeller
::
Global
().
SetOpConverterType
(
"
fc
"
,
inference
::
tensorrt
::
OpConverterType
::
Default
);
"
elementwise_add
"
,
inference
::
tensorrt
::
OpConverterType
::
Default
);
engine_op
->
Run
(
scope
,
place
);
engine_op
->
Run
(
scope
,
place
);
}
}
TEST
(
TensorRTEngineOp
,
manual
)
{
TEST
(
TensorRTEngineOp
,
manual
)
{
DynamicShapeTest
(
false
);
}
DynamicShapeTest
(
false
);
DynamicShapeTest
(
true
);
}
void
Execute
(
int
batch_size
,
int
input_dim
,
int
output_dim
,
int
nlayers
=
1
)
{
void
Execute
(
int
batch_size
,
int
input_dim
,
int
output_dim
,
int
nlayers
=
1
)
{
framework
::
ProgramDesc
program
;
framework
::
ProgramDesc
program
;
framework
::
Scope
scope
;
framework
::
Scope
scope
;
...
@@ -197,12 +201,12 @@ void Execute(int batch_size, int input_dim, int output_dim, int nlayers = 1) {
...
@@ -197,12 +201,12 @@ void Execute(int batch_size, int input_dim, int output_dim, int nlayers = 1) {
const
shape_t
&
x_shape
,
const
shape_t
&
x_shape
,
const
shape_t
&
y_shape
,
const
shape_t
&
y_shape
,
const
shape_t
&
z_shape
)
{
const
shape_t
&
z_shape
)
{
LOG
(
INFO
)
<<
"create
fc
op"
;
LOG
(
INFO
)
<<
"create
matrix_multiply
op"
;
auto
*
fc
=
block_desc
.
AppendOp
();
auto
*
matrix_multiply
=
block_desc
.
AppendOp
();
fc
->
SetType
(
"mul
"
);
matrix_multiply
->
SetType
(
"matrix_multiply
"
);
fc
->
SetInput
(
"X"
,
std
::
vector
<
std
::
string
>
({
x_name
}));
matrix_multiply
->
SetInput
(
"X"
,
std
::
vector
<
std
::
string
>
({
x_name
}));
fc
->
SetInput
(
"Y"
,
std
::
vector
<
std
::
string
>
({
y_name
}));
matrix_multiply
->
SetInput
(
"Y"
,
std
::
vector
<
std
::
string
>
({
y_name
}));
fc
->
SetOutput
(
"Out"
,
std
::
vector
<
std
::
string
>
({
z_name
}));
matrix_multiply
->
SetOutput
(
"Out"
,
std
::
vector
<
std
::
string
>
({
z_name
}));
// Set inputs' variable shape in BlockDesc
// Set inputs' variable shape in BlockDesc
if
(
!
x_created
)
{
if
(
!
x_created
)
{
...
@@ -222,7 +226,7 @@ void Execute(int batch_size, int input_dim, int output_dim, int nlayers = 1) {
...
@@ -222,7 +226,7 @@ void Execute(int batch_size, int input_dim, int output_dim, int nlayers = 1) {
CreateCUDATensor
(
&
scope
,
z_name
,
std
::
vector
<
int64_t
>
(
z_shape
));
CreateCUDATensor
(
&
scope
,
z_name
,
std
::
vector
<
int64_t
>
(
z_shape
));
// It is wired, need to copy manually.
// It is wired, need to copy manually.
*
block_
->
add_ops
()
=
*
fc
->
Proto
();
*
block_
->
add_ops
()
=
*
matrix_multiply
->
Proto
();
};
};
// Test with 4 layer FC
// Test with 4 layer FC
...
@@ -293,9 +297,9 @@ void Execute(int batch_size, int input_dim, int output_dim, int nlayers = 1) {
...
@@ -293,9 +297,9 @@ void Execute(int batch_size, int input_dim, int output_dim, int nlayers = 1) {
}
}
// Test with a larger FC layer.
// Test with a larger FC layer.
// TEST(TensorRTEngineOp,
fc
) { Execute(40, 28, 28); }
// TEST(TensorRTEngineOp,
matrix_multiply
) { Execute(40, 28, 28); }
}
// namespace operators
}
// namespace operators
}
// namespace paddle
}
// namespace paddle
USE_TRT_CONVERTER
(
fc
)
USE_TRT_CONVERTER
(
elementwise_add_weight
)
python/paddle/fluid/tests/unittests/ir/inference/CMakeLists.txt
浏览文件 @
ef734e84
...
@@ -236,8 +236,6 @@ if(WITH_GPU AND TENSORRT_FOUND)
...
@@ -236,8 +236,6 @@ if(WITH_GPU AND TENSORRT_FOUND)
set_tests_properties
(
test_reshape2_matmul_fuse_pass PROPERTIES TIMEOUT 240
)
set_tests_properties
(
test_reshape2_matmul_fuse_pass PROPERTIES TIMEOUT 240
)
set_tests_properties
(
test_preln_layernorm_x_fuse_pass PROPERTIES TIMEOUT
set_tests_properties
(
test_preln_layernorm_x_fuse_pass PROPERTIES TIMEOUT
240
)
240
)
set_tests_properties
(
test_trt_flatten2_matmul_fuse_pass PROPERTIES TIMEOUT
240
)
set_tests_properties
(
test_shuffle_channel_detect_pass PROPERTIES TIMEOUT
set_tests_properties
(
test_shuffle_channel_detect_pass PROPERTIES TIMEOUT
120
)
120
)
if
(
WIN32
)
if
(
WIN32
)
...
...
python/paddle/fluid/tests/unittests/ir/inference/test_fc_fuse_pass.py
浏览文件 @
ef734e84
...
@@ -19,8 +19,6 @@ import numpy as np
...
@@ -19,8 +19,6 @@ import numpy as np
from
auto_scan_test
import
IgnoreReasons
,
PassAutoScanTest
from
auto_scan_test
import
IgnoreReasons
,
PassAutoScanTest
from
program_config
import
OpConfig
,
ProgramConfig
,
TensorConfig
from
program_config
import
OpConfig
,
ProgramConfig
,
TensorConfig
import
paddle.inference
as
paddle_infer
class
TestFcFusePass
(
PassAutoScanTest
):
class
TestFcFusePass
(
PassAutoScanTest
):
r
"""
r
"""
...
@@ -45,14 +43,6 @@ class TestFcFusePass(PassAutoScanTest):
...
@@ -45,14 +43,6 @@ class TestFcFusePass(PassAutoScanTest):
# trt static_shape
# trt static_shape
config
=
self
.
create_trt_inference_config
()
config
=
self
.
create_trt_inference_config
()
config
.
enable_tensorrt_engine
(
max_batch_size
=
8
,
workspace_size
=
102400
,
min_subgraph_size
=
0
,
precision_mode
=
paddle_infer
.
PrecisionType
.
Float32
,
use_static
=
False
,
use_calib_mode
=
False
,
)
yield
config
,
[
'fc'
],
(
1e-5
,
1e-5
)
yield
config
,
[
'fc'
],
(
1e-5
,
1e-5
)
def
add_ignore_pass_case
(
self
):
def
add_ignore_pass_case
(
self
):
...
...
python/paddle/fluid/tests/unittests/ir/inference/test_multihead_matmul_roformer_fuse_pass.py
浏览文件 @
ef734e84
...
@@ -54,7 +54,10 @@ class TestMultiheadMatmulRoformerFusePass(PassAutoScanTest):
...
@@ -54,7 +54,10 @@ class TestMultiheadMatmulRoformerFusePass(PassAutoScanTest):
"sin_input"
:
[
1
,
12
,
128
,
64
],
"sin_input"
:
[
1
,
12
,
128
,
64
],
},
},
)
)
yield
config
,
[
"multihead_matmul_roformer"
,
"matmul"
],
(
1e-2
,
1e-3
)
yield
config
,
[
"multihead_matmul_roformer"
,
"matrix_multiply"
],
(
1e-2
,
1e-3
,
)
def
sample_program_config
(
self
,
draw
):
def
sample_program_config
(
self
,
draw
):
def
generate_mul_input
():
def
generate_mul_input
():
...
...
python/paddle/fluid/tests/unittests/ir/inference/test_trt_convert_matmul_v2.py
浏览文件 @
ef734e84
...
@@ -19,7 +19,7 @@ from typing import List
...
@@ -19,7 +19,7 @@ from typing import List
import
numpy
as
np
import
numpy
as
np
from
program_config
import
ProgramConfig
,
TensorConfig
from
program_config
import
ProgramConfig
,
TensorConfig
from
trt_layer_auto_scan_test
import
TrtLayerAutoScanTest
from
trt_layer_auto_scan_test
import
SkipReasons
,
TrtLayerAutoScanTest
import
paddle.inference
as
paddle_infer
import
paddle.inference
as
paddle_infer
...
@@ -91,17 +91,14 @@ class TrtConvertMatmulTest_dynamic(TrtLayerAutoScanTest):
...
@@ -91,17 +91,14 @@ class TrtConvertMatmulTest_dynamic(TrtLayerAutoScanTest):
]
]
# The output has little diff between gpu and trt in CI-Windows-Inference
# The output has little diff between gpu and trt in CI-Windows-Inference
tol_fp32
=
1e-5
tol_half
=
1e-5
if
os
.
name
==
'nt'
:
tol_fp32
=
1e-3
tol_fp32
=
1e-3
tol_half
=
1e-3
tol_half
=
1e-3
# for dynamic_shape
# for dynamic_shape
generate_dynamic_shape
(
attrs
)
generate_dynamic_shape
(
attrs
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
yield
self
.
create_inference_config
(),
(
1
,
3
),
tol_fp32
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
tol_fp32
,
tol_fp32
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
yield
self
.
create_inference_config
(),
(
1
,
3
),
tol_half
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
tol_half
,
tol_half
)
def
add_skip_trt_case
(
self
):
def
add_skip_trt_case
(
self
):
pass
pass
...
@@ -185,9 +182,9 @@ class TrtConvertMatmulTest_dynamic2(TrtLayerAutoScanTest):
...
@@ -185,9 +182,9 @@ class TrtConvertMatmulTest_dynamic2(TrtLayerAutoScanTest):
# for dynamic_shape
# for dynamic_shape
generate_dynamic_shape
(
attrs
)
generate_dynamic_shape
(
attrs
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
yield
self
.
create_inference_config
(),
(
1
,
3
),
tol_fp32
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
tol_fp32
,
tol_fp32
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
yield
self
.
create_inference_config
(),
(
1
,
3
),
tol_half
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
tol_half
,
tol_half
)
def
add_skip_trt_case
(
self
):
def
add_skip_trt_case
(
self
):
pass
pass
...
@@ -319,7 +316,20 @@ class TrtConvertMatmulTest_dynamic3(TrtLayerAutoScanTest):
...
@@ -319,7 +316,20 @@ class TrtConvertMatmulTest_dynamic3(TrtLayerAutoScanTest):
yield
self
.
create_inference_config
(),
(
1
,
3
),
1e-3
yield
self
.
create_inference_config
(),
(
1
,
3
),
1e-3
def
add_skip_trt_case
(
self
):
def
add_skip_trt_case
(
self
):
pass
def
teller1
(
program_config
,
predictor_config
):
inputs
=
program_config
.
inputs
if
(
len
(
inputs
[
'input1_data'
].
shape
)
==
1
and
len
(
inputs
[
'input2_data'
].
shape
)
==
1
):
return
True
return
False
self
.
add_skip_case
(
teller1
,
SkipReasons
.
TRT_NOT_IMPLEMENTED
,
"If both tensors are one-dimensional, the dot product result is obtained(Out.rank = 0)"
,
)
def
test
(
self
):
def
test
(
self
):
self
.
add_skip_trt_case
()
self
.
add_skip_trt_case
()
...
...
python/paddle/fluid/tests/unittests/ir/inference/test_trt_convert_multihead_matmul.py
浏览文件 @
ef734e84
...
@@ -18,7 +18,7 @@ from typing import List
...
@@ -18,7 +18,7 @@ from typing import List
import
numpy
as
np
import
numpy
as
np
from
program_config
import
ProgramConfig
,
TensorConfig
from
program_config
import
ProgramConfig
,
TensorConfig
from
trt_layer_auto_scan_test
import
SkipReasons
,
TrtLayerAutoScanTest
from
trt_layer_auto_scan_test
import
TrtLayerAutoScanTest
import
paddle.inference
as
paddle_infer
import
paddle.inference
as
paddle_infer
...
@@ -29,18 +29,18 @@ class TrtConvertMultiHeadMatmulTest(TrtLayerAutoScanTest):
...
@@ -29,18 +29,18 @@ class TrtConvertMultiHeadMatmulTest(TrtLayerAutoScanTest):
def
sample_program_configs
(
self
):
def
sample_program_configs
(
self
):
def
generate_input1
(
batch
,
dim1
):
def
generate_input1
(
batch
,
dim1
):
return
np
.
random
.
random
((
batch
,
dim1
,
768
)
).
astype
(
np
.
float32
)
return
np
.
full
((
batch
,
dim1
,
768
),
1
).
astype
(
np
.
float32
)
def
generate_input2
(
shape
):
def
generate_input2
(
shape
):
return
np
.
random
.
random
(
shape
).
astype
(
np
.
float32
)
return
np
.
full
(
shape
,
1
).
astype
(
np
.
float32
)
def
generate_weight1
():
def
generate_weight1
():
return
np
.
random
.
random
((
768
,
768
)
).
astype
(
np
.
float32
)
return
np
.
full
((
768
,
768
),
0.1
).
astype
(
np
.
float32
)
def
generate_weight2
():
def
generate_weight2
():
return
np
.
random
.
random
(
768
).
astype
(
np
.
float32
)
return
np
.
full
((
768
),
0.1
).
astype
(
np
.
float32
)
for
batch
in
[
1
,
2
,
4
]:
for
batch
in
[
1
,
4
]:
self
.
batch
=
batch
self
.
batch
=
batch
for
reshape_shape
in
[[
0
,
0
,
12
,
64
]]:
for
reshape_shape
in
[[
0
,
0
,
12
,
64
]]:
for
dim1
in
[
128
]:
for
dim1
in
[
128
]:
...
@@ -371,80 +371,33 @@ class TrtConvertMultiHeadMatmulTest(TrtLayerAutoScanTest):
...
@@ -371,80 +371,33 @@ class TrtConvertMultiHeadMatmulTest(TrtLayerAutoScanTest):
program_config
.
ops
[
i
].
attrs
for
i
in
range
(
len
(
program_config
.
ops
))
program_config
.
ops
[
i
].
attrs
for
i
in
range
(
len
(
program_config
.
ops
))
]
]
# for static_shape
clear_dynamic_shape
()
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
self
.
trt_param
.
workspace_size
=
2013265920
yield
self
.
create_inference_config
(),
(
1
,
4
),
(
1e-5
,
1e-5
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
yield
self
.
create_inference_config
(),
(
1
,
4
),
(
1e-3
,
1e-3
)
# for dynamic_shape
# for dynamic_shape
generate_dynamic_shape
(
attrs
)
generate_dynamic_shape
(
attrs
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
self
.
trt_param
.
workspace_size
=
2013265920
self
.
trt_param
.
workspace_size
=
2013265920
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
1e-5
,
1e-4
)
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
1e-5
,
1e-4
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
1e-3
,
1e-3
)
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
1e-3
,
1e-2
)
def
add_skip_trt_case
(
self
):
def
teller1
(
program_config
,
predictor_config
):
if
self
.
trt_param
.
precision
==
paddle_infer
.
PrecisionType
.
Half
:
return
True
return
False
self
.
add_skip_case
(
teller1
,
SkipReasons
.
TRT_NOT_IMPLEMENTED
,
"The output has diff between gpu and trt in fp16 mode."
,
)
def
teller2
(
program_config
,
predictor_config
):
if
(
self
.
trt_param
.
precision
==
paddle_infer
.
PrecisionType
.
Float32
and
len
(
self
.
dynamic_shape
.
min_input_shape
)
!=
0
and
self
.
batch
>
2
):
return
True
return
False
self
.
add_skip_case
(
teller2
,
SkipReasons
.
TRT_NOT_IMPLEMENTED
,
"The output has diff between gpu and trt when dynamic fp32 mode and batch size > 2."
,
)
def
teller3
(
program_config
,
predictor_config
):
if
self
.
trt_param
.
precision
==
paddle_infer
.
PrecisionType
.
Int8
:
return
True
return
False
self
.
add_skip_case
(
teller3
,
SkipReasons
.
TRT_NOT_IMPLEMENTED
,
"The output has diff between gpu and trt in int8 mode."
,
)
def
test
(
self
):
def
test
(
self
):
self
.
add_skip_trt_case
()
self
.
run_test
()
self
.
run_test
()
class
TrtConvertMultiHeadMatmulTestInt8
(
TrtConvertMultiHeadMatmulTest
):
class
TrtConvertMultiHeadMatmulTestInt8
(
TrtConvertMultiHeadMatmulTest
):
def
sample_program_configs
(
self
):
def
sample_program_configs
(
self
):
def
generate_input1
(
batch
,
dim1
):
def
generate_input1
(
batch
,
dim1
):
return
np
.
random
.
random
((
batch
,
dim1
,
768
)
).
astype
(
np
.
float32
)
return
np
.
full
((
batch
,
dim1
,
768
),
1
).
astype
(
np
.
float32
)
def
generate_input2
(
shape
):
def
generate_input2
(
shape
):
return
np
.
random
.
random
(
shape
).
astype
(
np
.
float32
)
return
np
.
full
(
shape
,
1
).
astype
(
np
.
float32
)
def
generate_weight1
():
def
generate_weight1
():
return
np
.
random
.
random
((
768
,
768
)
).
astype
(
np
.
float32
)
return
np
.
full
((
768
,
768
),
0.1
).
astype
(
np
.
float32
)
def
generate_weight2
():
def
generate_weight2
():
return
np
.
random
.
random
(
768
).
astype
(
np
.
float32
)
return
np
.
full
((
768
),
0.1
).
astype
(
np
.
float32
)
for
batch
in
[
1
,
2
,
4
]:
for
batch
in
[
4
]:
self
.
batch
=
batch
self
.
batch
=
batch
for
reshape_shape
in
[[
0
,
0
,
12
,
64
]]:
for
reshape_shape
in
[[
0
,
0
,
12
,
64
]]:
for
dim1
in
[
128
]:
for
dim1
in
[
128
]:
...
@@ -776,15 +729,15 @@ class TrtConvertVitToMultiHeadMatmulTest(TrtLayerAutoScanTest):
...
@@ -776,15 +729,15 @@ class TrtConvertVitToMultiHeadMatmulTest(TrtLayerAutoScanTest):
def
sample_program_configs
(
self
):
def
sample_program_configs
(
self
):
def
generate_input1
(
batch
,
length
):
def
generate_input1
(
batch
,
length
):
return
np
.
zeros
((
batch
,
length
,
768
),
dtype
=
np
.
float32
)
return
np
.
full
((
batch
,
length
,
768
),
0.1
).
astype
(
np
.
float32
)
def
generate_weight1
():
def
generate_weight1
():
return
np
.
random
.
rand
(
768
,
2304
).
astype
(
np
.
float32
)
return
np
.
full
((
768
,
2304
),
0.1
).
astype
(
np
.
float32
)
def
generate_weight2
():
def
generate_weight2
():
return
np
.
random
.
rand
(
2304
).
astype
(
np
.
float32
)
return
np
.
full
((
2304
),
0.1
).
astype
(
np
.
float32
)
for
batch
in
[
2
,
4
]:
for
batch
in
[
4
]:
self
.
batch
=
batch
self
.
batch
=
batch
for
length
in
[
197
]:
for
length
in
[
197
]:
self
.
length
=
length
self
.
length
=
length
...
@@ -989,17 +942,6 @@ class TrtConvertVitToMultiHeadMatmulTest(TrtLayerAutoScanTest):
...
@@ -989,17 +942,6 @@ class TrtConvertVitToMultiHeadMatmulTest(TrtLayerAutoScanTest):
"input_data1"
:
[
1
,
197
,
768
],
"input_data1"
:
[
1
,
197
,
768
],
}
}
def
generate_static_shape
(
attrs
):
self
.
dynamic_shape
.
min_input_shape
=
{
"input_data1"
:
[
1
,
197
,
768
],
}
self
.
dynamic_shape
.
max_input_shape
=
{
"input_data1"
:
[
16
,
197
,
768
],
}
self
.
dynamic_shape
.
opt_input_shape
=
{
"input_data1"
:
[
1
,
197
,
768
],
}
def
clear_dynamic_shape
():
def
clear_dynamic_shape
():
self
.
dynamic_shape
.
max_input_shape
=
{}
self
.
dynamic_shape
.
max_input_shape
=
{}
self
.
dynamic_shape
.
min_input_shape
=
{}
self
.
dynamic_shape
.
min_input_shape
=
{}
...
@@ -1026,22 +968,7 @@ class TrtConvertVitToMultiHeadMatmulTest(TrtLayerAutoScanTest):
...
@@ -1026,22 +968,7 @@ class TrtConvertVitToMultiHeadMatmulTest(TrtLayerAutoScanTest):
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
yield
self
.
create_inference_config
(),
generate_trt_nodes_num
(),
(
yield
self
.
create_inference_config
(),
generate_trt_nodes_num
(),
(
1e-3
,
1e-3
,
1e-3
,
2e-2
,
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
yield
self
.
create_inference_config
(),
generate_trt_nodes_num
(),
(
1e-5
,
1e-5
,
)
# for static_shape
clear_dynamic_shape
()
generate_static_shape
(
attrs
)
self
.
trt_param
.
workspace_size
=
2013265920
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
yield
self
.
create_inference_config
(),
generate_trt_nodes_num
(),
(
1e-3
,
1e-3
,
)
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Float32
yield
self
.
create_inference_config
(),
generate_trt_nodes_num
(),
(
yield
self
.
create_inference_config
(),
generate_trt_nodes_num
(),
(
...
@@ -1049,20 +976,7 @@ class TrtConvertVitToMultiHeadMatmulTest(TrtLayerAutoScanTest):
...
@@ -1049,20 +976,7 @@ class TrtConvertVitToMultiHeadMatmulTest(TrtLayerAutoScanTest):
1e-5
,
1e-5
,
)
)
def
add_skip_trt_case
(
self
):
def
teller1
(
program_config
,
predictor_config
):
if
self
.
trt_param
.
precision
==
paddle_infer
.
PrecisionType
.
Half
:
return
True
return
False
self
.
add_skip_case
(
teller1
,
SkipReasons
.
TRT_NOT_IMPLEMENTED
,
"The output has diff between gpu and trt in fp16 mode."
,
)
def
test
(
self
):
def
test
(
self
):
self
.
add_skip_trt_case
()
self
.
run_test
()
self
.
run_test
()
...
@@ -1072,19 +986,19 @@ class TrtConvertMultiHeadMatmulTest_biasqk_seqseq(TrtLayerAutoScanTest):
...
@@ -1072,19 +986,19 @@ class TrtConvertMultiHeadMatmulTest_biasqk_seqseq(TrtLayerAutoScanTest):
def
sample_program_configs
(
self
):
def
sample_program_configs
(
self
):
def
generate_input1
(
batch
,
dim1
):
def
generate_input1
(
batch
,
dim1
):
return
np
.
random
.
random
((
batch
,
dim1
,
768
)
).
astype
(
np
.
float32
)
return
np
.
full
((
batch
,
dim1
,
768
),
1
).
astype
(
np
.
float32
)
def
generate_input2
(
shape
):
def
generate_input2
(
shape
):
return
np
.
random
.
random
(
shape
).
astype
(
np
.
float32
)
return
np
.
full
(
shape
,
1
).
astype
(
np
.
float32
)
def
generate_weight1
():
def
generate_weight1
():
return
np
.
random
.
random
((
768
,
768
)
).
astype
(
np
.
float32
)
return
np
.
full
((
768
,
768
),
0.1
).
astype
(
np
.
float32
)
def
generate_weight2
():
def
generate_weight2
():
return
np
.
random
.
random
(
768
).
astype
(
np
.
float32
)
return
np
.
full
((
768
),
0.1
).
astype
(
np
.
float32
)
def
generate_weight3
():
def
generate_weight3
():
return
np
.
random
.
random
((
768
,
768
)
).
astype
(
np
.
float32
)
return
np
.
full
((
768
,
768
),
0.1
).
astype
(
np
.
float32
)
for
batch
in
[
2
]:
for
batch
in
[
2
]:
self
.
batch
=
batch
self
.
batch
=
batch
...
@@ -1423,48 +1337,9 @@ class TrtConvertMultiHeadMatmulTest_biasqk_seqseq(TrtLayerAutoScanTest):
...
@@ -1423,48 +1337,9 @@ class TrtConvertMultiHeadMatmulTest_biasqk_seqseq(TrtLayerAutoScanTest):
self
.
trt_param
.
workspace_size
=
2013265920
self
.
trt_param
.
workspace_size
=
2013265920
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
1e-5
,
1e-4
)
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
1e-5
,
1e-4
)
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
self
.
trt_param
.
precision
=
paddle_infer
.
PrecisionType
.
Half
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
1e-3
,
1e-3
)
yield
self
.
create_inference_config
(),
(
1
,
3
),
(
1e-3
,
1e-2
)
def
add_skip_trt_case
(
self
):
def
teller1
(
program_config
,
predictor_config
):
if
self
.
trt_param
.
precision
==
paddle_infer
.
PrecisionType
.
Half
:
return
True
return
False
self
.
add_skip_case
(
teller1
,
SkipReasons
.
TRT_NOT_IMPLEMENTED
,
"The output has diff between gpu and trt in fp16 mode."
,
)
def
teller2
(
program_config
,
predictor_config
):
if
(
self
.
trt_param
.
precision
==
paddle_infer
.
PrecisionType
.
Float32
and
len
(
self
.
dynamic_shape
.
min_input_shape
)
!=
0
and
self
.
batch
>
2
):
return
True
return
False
self
.
add_skip_case
(
teller2
,
SkipReasons
.
TRT_NOT_IMPLEMENTED
,
"The output has diff between gpu and trt when dynamic fp32 mode and batch size > 2."
,
)
def
teller3
(
program_config
,
predictor_config
):
if
self
.
trt_param
.
precision
==
paddle_infer
.
PrecisionType
.
Int8
:
return
True
return
False
self
.
add_skip_case
(
teller3
,
SkipReasons
.
TRT_NOT_IMPLEMENTED
,
"The output has diff between gpu and trt in int8 mode."
,
)
def
test
(
self
):
def
test
(
self
):
self
.
add_skip_trt_case
()
self
.
run_test
()
self
.
run_test
()
...
...
python/paddle/fluid/tests/unittests/ir/inference/test_trt_fc_fuse_pass.py
浏览文件 @
ef734e84
...
@@ -50,7 +50,7 @@ class FCFusePassTRTTest(InferencePassTest):
...
@@ -50,7 +50,7 @@ class FCFusePassTRTTest(InferencePassTest):
if
core
.
is_compiled_with_cuda
():
if
core
.
is_compiled_with_cuda
():
use_gpu
.
append
(
True
)
use_gpu
.
append
(
True
)
for
i
in
range
(
len
(
use_gpu
)):
for
i
in
range
(
len
(
use_gpu
)):
self
.
check_output_with_option
(
use_gpu
[
i
])
self
.
check_output_with_option
(
use_gpu
[
i
]
,
atol
=
1e-4
,
rtol
=
1e-3
)
class
FCFusePassTRTStaticDims4Cols1Test
(
InferencePassTest
):
class
FCFusePassTRTStaticDims4Cols1Test
(
InferencePassTest
):
...
@@ -78,7 +78,7 @@ class FCFusePassTRTStaticDims4Cols1Test(InferencePassTest):
...
@@ -78,7 +78,7 @@ class FCFusePassTRTStaticDims4Cols1Test(InferencePassTest):
if
core
.
is_compiled_with_cuda
():
if
core
.
is_compiled_with_cuda
():
use_gpu
.
append
(
True
)
use_gpu
.
append
(
True
)
for
i
in
range
(
len
(
use_gpu
)):
for
i
in
range
(
len
(
use_gpu
)):
self
.
check_output_with_option
(
use_gpu
[
i
])
self
.
check_output_with_option
(
use_gpu
[
i
]
,
atol
=
1e-4
,
rtol
=
1e-3
)
class
FCFusePassTRTStaticDims4Cols2Test
(
InferencePassTest
):
class
FCFusePassTRTStaticDims4Cols2Test
(
InferencePassTest
):
...
@@ -106,7 +106,7 @@ class FCFusePassTRTStaticDims4Cols2Test(InferencePassTest):
...
@@ -106,7 +106,7 @@ class FCFusePassTRTStaticDims4Cols2Test(InferencePassTest):
if
core
.
is_compiled_with_cuda
():
if
core
.
is_compiled_with_cuda
():
use_gpu
.
append
(
True
)
use_gpu
.
append
(
True
)
for
i
in
range
(
len
(
use_gpu
)):
for
i
in
range
(
len
(
use_gpu
)):
self
.
check_output_with_option
(
use_gpu
[
i
])
self
.
check_output_with_option
(
use_gpu
[
i
]
,
atol
=
1e-4
,
rtol
=
1e-3
)
class
FCFusePassTRTDynamicDims2Test
(
InferencePassTest
):
class
FCFusePassTRTDynamicDims2Test
(
InferencePassTest
):
...
@@ -140,7 +140,7 @@ class FCFusePassTRTDynamicDims2Test(InferencePassTest):
...
@@ -140,7 +140,7 @@ class FCFusePassTRTDynamicDims2Test(InferencePassTest):
if
core
.
is_compiled_with_cuda
():
if
core
.
is_compiled_with_cuda
():
use_gpu
.
append
(
True
)
use_gpu
.
append
(
True
)
for
i
in
range
(
len
(
use_gpu
)):
for
i
in
range
(
len
(
use_gpu
)):
self
.
check_output_with_option
(
use_gpu
[
i
])
self
.
check_output_with_option
(
use_gpu
[
i
]
,
atol
=
1e-4
,
rtol
=
1e-3
)
class
FCFusePassTRTDynamicDims3Cols1Test
(
InferencePassTest
):
class
FCFusePassTRTDynamicDims3Cols1Test
(
InferencePassTest
):
...
@@ -174,7 +174,7 @@ class FCFusePassTRTDynamicDims3Cols1Test(InferencePassTest):
...
@@ -174,7 +174,7 @@ class FCFusePassTRTDynamicDims3Cols1Test(InferencePassTest):
if
core
.
is_compiled_with_cuda
():
if
core
.
is_compiled_with_cuda
():
use_gpu
.
append
(
True
)
use_gpu
.
append
(
True
)
for
i
in
range
(
len
(
use_gpu
)):
for
i
in
range
(
len
(
use_gpu
)):
self
.
check_output_with_option
(
use_gpu
[
i
])
self
.
check_output_with_option
(
use_gpu
[
i
]
,
atol
=
1e-4
,
rtol
=
1e-3
)
class
FCFusePassTRTDynamicDims3Cols2Test
(
InferencePassTest
):
class
FCFusePassTRTDynamicDims3Cols2Test
(
InferencePassTest
):
...
@@ -208,7 +208,7 @@ class FCFusePassTRTDynamicDims3Cols2Test(InferencePassTest):
...
@@ -208,7 +208,7 @@ class FCFusePassTRTDynamicDims3Cols2Test(InferencePassTest):
if
core
.
is_compiled_with_cuda
():
if
core
.
is_compiled_with_cuda
():
use_gpu
.
append
(
True
)
use_gpu
.
append
(
True
)
for
i
in
range
(
len
(
use_gpu
)):
for
i
in
range
(
len
(
use_gpu
)):
self
.
check_output_with_option
(
use_gpu
[
i
])
self
.
check_output_with_option
(
use_gpu
[
i
]
,
atol
=
1e-4
,
rtol
=
1e-3
)
class
FCFusePassTRTDynamicDims4Cols1Test
(
InferencePassTest
):
class
FCFusePassTRTDynamicDims4Cols1Test
(
InferencePassTest
):
...
@@ -244,7 +244,7 @@ class FCFusePassTRTDynamicDims4Cols1Test(InferencePassTest):
...
@@ -244,7 +244,7 @@ class FCFusePassTRTDynamicDims4Cols1Test(InferencePassTest):
if
core
.
is_compiled_with_cuda
():
if
core
.
is_compiled_with_cuda
():
use_gpu
.
append
(
True
)
use_gpu
.
append
(
True
)
for
i
in
range
(
len
(
use_gpu
)):
for
i
in
range
(
len
(
use_gpu
)):
self
.
check_output_with_option
(
use_gpu
[
i
])
self
.
check_output_with_option
(
use_gpu
[
i
]
,
atol
=
1e-4
,
rtol
=
1e-3
)
class
FCFusePassTRTDynamicDims4Cols2Test
(
InferencePassTest
):
class
FCFusePassTRTDynamicDims4Cols2Test
(
InferencePassTest
):
...
@@ -280,7 +280,7 @@ class FCFusePassTRTDynamicDims4Cols2Test(InferencePassTest):
...
@@ -280,7 +280,7 @@ class FCFusePassTRTDynamicDims4Cols2Test(InferencePassTest):
if
core
.
is_compiled_with_cuda
():
if
core
.
is_compiled_with_cuda
():
use_gpu
.
append
(
True
)
use_gpu
.
append
(
True
)
for
i
in
range
(
len
(
use_gpu
)):
for
i
in
range
(
len
(
use_gpu
)):
self
.
check_output_with_option
(
use_gpu
[
i
])
self
.
check_output_with_option
(
use_gpu
[
i
]
,
atol
=
1e-4
,
rtol
=
1e-3
)
class
FCFusePassTRTDynamicDims4Cols3Test
(
InferencePassTest
):
class
FCFusePassTRTDynamicDims4Cols3Test
(
InferencePassTest
):
...
@@ -316,7 +316,7 @@ class FCFusePassTRTDynamicDims4Cols3Test(InferencePassTest):
...
@@ -316,7 +316,7 @@ class FCFusePassTRTDynamicDims4Cols3Test(InferencePassTest):
if
core
.
is_compiled_with_cuda
():
if
core
.
is_compiled_with_cuda
():
use_gpu
.
append
(
True
)
use_gpu
.
append
(
True
)
for
i
in
range
(
len
(
use_gpu
)):
for
i
in
range
(
len
(
use_gpu
)):
self
.
check_output_with_option
(
use_gpu
[
i
])
self
.
check_output_with_option
(
use_gpu
[
i
]
,
atol
=
1e-4
,
rtol
=
1e-3
)
if
__name__
==
"__main__"
:
if
__name__
==
"__main__"
:
...
...
python/paddle/fluid/tests/unittests/ir/inference/test_trt_flatten2_matmul_fuse_pass.py
已删除
100644 → 0
浏览文件 @
acf55016
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import
unittest
import
hypothesis.strategies
as
st
from
auto_scan_test
import
IgnoreReasons
,
PassAutoScanTest
from
program_config
import
OpConfig
,
ProgramConfig
,
TensorConfig
import
paddle.inference
as
paddle_infer
class
TestFlatten2MatmulFusePass
(
PassAutoScanTest
):
r
"""
x_var
|
flatten2
\
flatten2_out_var y_var
\ /
matmul bias_var
\ /
elementwise_add
"""
def
sample_predictor_configs
(
self
,
program_config
):
# TRT
config
=
self
.
create_trt_inference_config
()
config
.
enable_tensorrt_engine
(
max_batch_size
=
10
,
workspace_size
=
102400
,
min_subgraph_size
=
0
,
precision_mode
=
paddle_infer
.
PrecisionType
.
Float32
,
use_static
=
False
,
use_calib_mode
=
False
,
)
yield
config
,
[
'mul'
,
'elementwise_add'
],
(
1e-4
,
1e-1
)
def
add_ignore_pass_case
(
self
):
# Here we put some skip rules to avoid known bugs
def
teller1
(
program_config
,
predictor_config
):
y_shape
=
list
(
program_config
.
weights
[
"matmul_y"
].
shape
)
bias_shape
=
program_config
.
weights
[
"bias"
].
shape
axis
=
program_config
.
ops
[
2
].
attrs
[
"axis"
]
# bias should be [mul_y_shape[-1]]
if
axis
==
0
or
bias_shape
[
0
]
!=
y_shape
[
1
]
or
len
(
bias_shape
)
!=
1
:
return
True
return
False
self
.
add_ignore_check_case
(
teller1
,
IgnoreReasons
.
PASS_ACCURACY_ERROR
,
"The pass error on TRT while shape of bias is not [out_size]."
,
)
def
sample_program_config
(
self
,
draw
):
# 1. Generate shape and attr of flatten2
x_shape
=
draw
(
st
.
lists
(
st
.
integers
(
min_value
=
1
,
max_value
=
10
),
min_size
=
4
,
max_size
=
4
)
)
# [a, b, c, d] => [a, b*c*d]
flatten_axis
=
1
flatten_shape
=
[
x_shape
[
0
],
x_shape
[
1
]
*
x_shape
[
2
]
*
x_shape
[
3
]]
# 2. Generate attr:transpose_X/transpose_Y/alpha of matmul
alpha
=
1.0
transpose_X
=
False
transpose_Y
=
False
# 3. Generate legal shape of input:Y of matmul
y_shape
=
draw
(
st
.
lists
(
st
.
integers
(
min_value
=
1
,
max_value
=
8
),
min_size
=
2
,
max_size
=
2
)
)
y_shape
[
0
]
=
flatten_shape
[
1
]
# 4. Generate legal attr:axis of elementwise_add
axis
=
draw
(
st
.
integers
(
min_value
=-
1
,
max_value
=
1
))
if
axis
==
0
:
axis
=
-
1
bias_shape
=
[
y_shape
[
1
],
]
flatten2_op
=
OpConfig
(
"flatten2"
,
inputs
=
{
"X"
:
[
"flatten2_x"
],
},
axis
=
flatten_axis
,
outputs
=
{
"Out"
:
[
"flatten2_out"
],
"XShape"
:
[
"xshape"
]},
)
matmul_op
=
OpConfig
(
"matmul"
,
inputs
=
{
"X"
:
[
"flatten2_out"
],
"Y"
:
[
"matmul_y"
]},
outputs
=
{
"Out"
:
[
"matmul_out"
]},
alpha
=
alpha
,
transpose_X
=
transpose_X
,
transpose_Y
=
transpose_Y
,
)
add_op
=
OpConfig
(
"elementwise_add"
,
inputs
=
{
"X"
:
[
"matmul_out"
],
"Y"
:
[
"bias"
]},
outputs
=
{
"Out"
:
[
"add_out"
]},
axis
=
axis
,
)
ops
=
[
flatten2_op
,
matmul_op
,
add_op
]
program_config
=
ProgramConfig
(
ops
=
ops
,
weights
=
{
"matmul_y"
:
TensorConfig
(
shape
=
y_shape
),
"bias"
:
TensorConfig
(
shape
=
bias_shape
),
},
inputs
=
{
"flatten2_x"
:
TensorConfig
(
shape
=
x_shape
),
},
outputs
=
ops
[
-
1
].
outputs
[
"Out"
],
)
return
program_config
def
test
(
self
):
self
.
run_and_statis
(
quant
=
False
,
max_examples
=
25
,
passes
=
[
"trt_flatten2_matmul_fuse_pass"
],
)
if
__name__
==
"__main__"
:
unittest
.
main
()
python/paddle/fluid/tests/unittests/ir/inference/test_trt_matmul_quant_dequant.py
浏览文件 @
ef734e84
...
@@ -79,6 +79,14 @@ class TensorRTMatMulQuantDequantDims3Test(QuantDequantTest):
...
@@ -79,6 +79,14 @@ class TensorRTMatMulQuantDequantDims3Test(QuantDequantTest):
self
.
trt_parameters
=
TensorRTMatMulQuantDequantDims3Test
.
TensorRTParam
(
self
.
trt_parameters
=
TensorRTMatMulQuantDequantDims3Test
.
TensorRTParam
(
1
<<
30
,
32
,
0
,
AnalysisConfig
.
Precision
.
Int8
,
False
,
False
1
<<
30
,
32
,
0
,
AnalysisConfig
.
Precision
.
Int8
,
False
,
False
)
)
self
.
dynamic_shape_params
=
(
TensorRTMatMulQuantDequantDims3Test
.
DynamicShapeParam
(
{
'data'
:
[
1
,
28
,
28
]},
{
'data'
:
[
4
,
28
,
28
]},
{
'data'
:
[
3
,
28
,
28
]},
False
,
)
)
self
.
activation_quantize_type
=
'moving_average_abs_max'
self
.
activation_quantize_type
=
'moving_average_abs_max'
self
.
weight_quantize_type
=
'channel_wise_abs_max'
self
.
weight_quantize_type
=
'channel_wise_abs_max'
...
@@ -137,7 +145,7 @@ class TensorRTMatMulQuantDequantDims4Test(QuantDequantTest):
...
@@ -137,7 +145,7 @@ class TensorRTMatMulQuantDequantDims4Test(QuantDequantTest):
self
.
label
=
paddle
.
static
.
data
(
self
.
label
=
paddle
.
static
.
data
(
name
=
'label'
,
shape
=
[
1
,
1
],
dtype
=
'int64'
name
=
'label'
,
shape
=
[
1
,
1
],
dtype
=
'int64'
)
)
reshape_out
=
paddle
.
reshape
(
self
.
data
,
shape
=
[
1
,
4
,
14
,
14
])
reshape_out
=
paddle
.
reshape
(
self
.
data
,
shape
=
[
0
,
4
,
14
,
14
])
matmul_out
=
paddle
.
matmul
(
matmul_out
=
paddle
.
matmul
(
x
=
reshape_out
,
x
=
reshape_out
,
y
=
reshape_out
,
y
=
reshape_out
,
...
@@ -183,6 +191,14 @@ class TensorRTMatMulQuantDequantDims4Test(QuantDequantTest):
...
@@ -183,6 +191,14 @@ class TensorRTMatMulQuantDequantDims4Test(QuantDequantTest):
self
.
trt_parameters
=
TensorRTMatMulQuantDequantDims4Test
.
TensorRTParam
(
self
.
trt_parameters
=
TensorRTMatMulQuantDequantDims4Test
.
TensorRTParam
(
1
<<
30
,
32
,
0
,
AnalysisConfig
.
Precision
.
Int8
,
False
,
False
1
<<
30
,
32
,
0
,
AnalysisConfig
.
Precision
.
Int8
,
False
,
False
)
)
self
.
dynamic_shape_params
=
(
TensorRTMatMulQuantDequantDims4Test
.
DynamicShapeParam
(
{
'data'
:
[
1
,
28
,
28
]},
{
'data'
:
[
4
,
28
,
28
]},
{
'data'
:
[
3
,
28
,
28
]},
False
,
)
)
self
.
activation_quantize_type
=
'moving_average_abs_max'
self
.
activation_quantize_type
=
'moving_average_abs_max'
self
.
weight_quantize_type
=
'channel_wise_abs_max'
self
.
weight_quantize_type
=
'channel_wise_abs_max'
...
...
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录