Commit 23fc896b
Authored Oct 19, 2018 by tensor-tang

    Merge remote-tracking branch 'ups/develop' into fea/fusion_seqconv_add

    test=develop

Parents: 339e655a, a1d3db03
Showing 35 changed files with 3,559 additions and 319 deletions.

- README.md (+5, -5)
- cmake/generic.cmake (+4, -0)
- paddle/fluid/API.spec (+5, -5)
- paddle/fluid/framework/ir/CMakeLists.txt (+6, -4)
- paddle/fluid/framework/ir/conv_bn_fuse_pass.cc (+63, -23)
- paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.cc (+6, -0)
- paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass_tester.cc (+33, -14)
- paddle/fluid/framework/ir/fuse_pass_base.cc (+62, -0)
- paddle/fluid/framework/ir/fuse_pass_base.h (+12, -20)
- paddle/fluid/framework/ir/graph_pattern_detector.cc (+2, -0)
- paddle/fluid/framework/ir/mkldnn_placement_pass.cc (+37, -0)
- paddle/fluid/framework/ir/mkldnn_placement_pass.h (+31, -0)
- paddle/fluid/framework/op_desc.cc (+0, -4)
- paddle/fluid/inference/analysis/analyzer.cc (+20, -1)
- paddle/fluid/inference/analysis/analyzer.h (+6, -0)
- paddle/fluid/inference/api/analysis_predictor.cc (+18, -4)
- paddle/fluid/inference/api/paddle_inference_api.h (+7, -0)
- paddle/fluid/inference/tests/api/analyzer_rnn2_tester.cc (+3, -3)
- paddle/fluid/operators/detection/CMakeLists.txt (+1, -1)
- paddle/fluid/operators/detection/gpc.cc (+2201, -0)
- paddle/fluid/operators/detection/gpc.h (+246, -0)
- paddle/fluid/operators/detection/multiclass_nms_op.cc (+60, -21)
- paddle/fluid/operators/detection/poly_util.cc (+132, -0)
- paddle/fluid/operators/detection/poly_util.h (+73, -0)
- paddle/fluid/operators/detection/polygon_box_transform_op.cc (+2, -2)
- paddle/fluid/operators/detection/polygon_box_transform_op.cu (+2, -2)
- paddle/fluid/operators/math/CMakeLists.txt (+1, -1)
- paddle/fluid/operators/math/jit_kernel_exp.cc (+201, -60)
- paddle/fluid/operators/math/jit_kernel_lstm.cc (+122, -70)
- paddle/fluid/operators/roi_pool_op.cc (+1, -1)
- paddle/fluid/operators/roi_pool_op.cu (+1, -1)
- python/paddle/fluid/layers/nn.py (+176, -68)
- python/paddle/fluid/nets.py (+18, -8)
- python/paddle/fluid/regularizer.py (+1, -0)
- python/paddle/fluid/tests/unittests/test_polygon_box_transform.py (+1, -1)
README.md

````diff
@@ -2,8 +2,8 @@
 [![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle)
-[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/index_en.html)
-[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://www.paddlepaddle.org/docs/develop/documentation/zh/getstarted/index_cn.html)
+[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://paddlepaddle.org/documentation/docs/en/1.0/getstarted/index_en.html)
+[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://paddlepaddle.org/documentation/docs/zh/1.0/beginners_guide/index.html)
 [![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases)
 [![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
@@ -19,7 +19,7 @@ Our vision is to enable deep learning for everyone via PaddlePaddle.
 Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest feature of PaddlePaddle.

-### Latest PaddlePaddle Release: [Fluid 1.0.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.0.0)
+### Latest PaddlePaddle Release: [Fluid 1.0.1](https://github.com/PaddlePaddle/Paddle/tree/release/1.0.0)
 ### Install Latest Stable Release:
 ```
 # Linux CPU
@@ -27,9 +27,9 @@ pip install paddlepaddle
 # Linux GPU cuda9cudnn7
 pip install paddlepaddle-gpu
 # Linux GPU cuda8cudnn7
-pip install paddlepaddle-gpu==0.15.0.post87
+pip install paddlepaddle-gpu==1.0.1.post87
 # Linux GPU cuda8cudnn5
-pip install paddlepaddle-gpu==0.15.0.post85
+pip install paddlepaddle-gpu==1.0.1.post85
 # For installation on other platform, refer to http://paddlepaddle.org/
 ```
````
cmake/generic.cmake

```diff
@@ -311,6 +311,8 @@ function(cc_test TARGET_NAME)
     set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_cpu_deterministic=true)
     set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_init_allocated_mem=true)
     set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_cudnn_deterministic=true)
+    # No unit test should exceed 10 minutes.
+    set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 600)
   endif()
 endfunction(cc_test)
@@ -629,6 +631,8 @@ function(py_test TARGET_NAME)
       PYTHONPATH=${PADDLE_BINARY_DIR}/python ${py_test_ENVS}
       ${PYTHON_EXECUTABLE} -u ${py_test_SRCS} ${py_test_ARGS}
       WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
+    # No unit test should exceed 10 minutes.
+    set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 600)
   endif()
 endfunction()
```
paddle/fluid/API.spec

```diff
@@ -61,12 +61,12 @@ paddle.fluid.layers.cos_sim ArgSpec(args=['X', 'Y'], varargs=None, keywords=None
 paddle.fluid.layers.cross_entropy ArgSpec(args=['input', 'label', 'soft_label', 'ignore_index'], varargs=None, keywords=None, defaults=(False, -100))
 paddle.fluid.layers.square_error_cost ArgSpec(args=['input', 'label'], varargs=None, keywords=None, defaults=None)
 paddle.fluid.layers.chunk_eval ArgSpec(args=['input', 'label', 'chunk_scheme', 'num_chunk_types', 'excluded_chunk_types'], varargs=None, keywords=None, defaults=(None,))
-paddle.fluid.layers.sequence_conv ArgSpec(args=['input', 'num_filters', 'filter_size', 'filter_stride', 'padding', 'bias_attr', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(3, 1, None, None, None, None))
+paddle.fluid.layers.sequence_conv ArgSpec(args=['input', 'num_filters', 'filter_size', 'filter_stride', 'padding', 'bias_attr', 'param_attr', 'act', 'name'], varargs=None, keywords=None, defaults=(3, 1, None, None, None, None, None))
 paddle.fluid.layers.conv2d ArgSpec(args=['input', 'num_filters', 'filter_size', 'stride', 'padding', 'dilation', 'groups', 'param_attr', 'bias_attr', 'use_cudnn', 'act', 'name'], varargs=None, keywords=None, defaults=(1, 0, 1, None, None, None, True, None, None))
 paddle.fluid.layers.conv3d ArgSpec(args=['input', 'num_filters', 'filter_size', 'stride', 'padding', 'dilation', 'groups', 'param_attr', 'bias_attr', 'use_cudnn', 'act', 'name'], varargs=None, keywords=None, defaults=(1, 0, 1, None, None, None, True, None, None))
 paddle.fluid.layers.sequence_pool ArgSpec(args=['input', 'pool_type'], varargs=None, keywords=None, defaults=None)
-paddle.fluid.layers.sequence_softmax ArgSpec(args=['input', 'param_attr', 'bias_attr', 'use_cudnn'], varargs=None, keywords=None, defaults=(None, None, False))
-paddle.fluid.layers.softmax ArgSpec(args=['input', 'param_attr', 'bias_attr', 'use_cudnn', 'name'], varargs=None, keywords=None, defaults=(None, None, True, None))
+paddle.fluid.layers.sequence_softmax ArgSpec(args=['input', 'use_cudnn', 'name'], varargs=None, keywords=None, defaults=(False, None))
+paddle.fluid.layers.softmax ArgSpec(args=['input', 'use_cudnn', 'name'], varargs=None, keywords=None, defaults=(True, None))
 paddle.fluid.layers.pool2d ArgSpec(args=['input', 'pool_size', 'pool_type', 'pool_stride', 'pool_padding', 'global_pooling', 'use_cudnn', 'ceil_mode', 'name'], varargs=None, keywords=None, defaults=(-1, 'max', 1, 0, False, True, False, None))
 paddle.fluid.layers.pool3d ArgSpec(args=['input', 'pool_size', 'pool_type', 'pool_stride', 'pool_padding', 'global_pooling', 'use_cudnn', 'ceil_mode', 'name'], varargs=None, keywords=None, defaults=(-1, 'max', 1, 0, False, True, False, None))
 paddle.fluid.layers.batch_norm ArgSpec(args=['input', 'act', 'is_test', 'momentum', 'epsilon', 'param_attr', 'bias_attr', 'data_layout', 'in_place', 'name', 'moving_mean_name', 'moving_variance_name', 'do_model_average_for_mean_and_var', 'fuse_with_relu'], varargs=None, keywords=None, defaults=(None, False, 0.9, 1e-05, None, None, 'NCHW', False, None, None, None, False, False))
@@ -97,8 +97,8 @@ paddle.fluid.layers.warpctc ArgSpec(args=['input', 'label', 'blank', 'norm_by_ti
 paddle.fluid.layers.sequence_reshape ArgSpec(args=['input', 'new_dim'], varargs=None, keywords=None, defaults=None)
 paddle.fluid.layers.transpose ArgSpec(args=['x', 'perm', 'name'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.im2sequence ArgSpec(args=['input', 'filter_size', 'stride', 'padding', 'input_image_size', 'out_stride', 'name'], varargs=None, keywords=None, defaults=(1, 1, 0, None, 1, None))
-paddle.fluid.layers.nce ArgSpec(args=['input', 'label', 'num_total_classes', 'sample_weight', 'param_attr', 'bias_attr', 'num_neg_samples'], varargs=None, keywords=None, defaults=(None, None, None, None))
+paddle.fluid.layers.nce ArgSpec(args=['input', 'label', 'num_total_classes', 'sample_weight', 'param_attr', 'bias_attr', 'num_neg_samples', 'name'], varargs=None, keywords=None, defaults=(None, None, None, None, None))
-paddle.fluid.layers.hsigmoid ArgSpec(args=['input', 'label', 'num_classes', 'param_attr', 'bias_attr'], varargs=None, keywords=None, defaults=(None, None))
+paddle.fluid.layers.hsigmoid ArgSpec(args=['input', 'label', 'num_classes', 'param_attr', 'bias_attr', 'name'], varargs=None, keywords=None, defaults=(None, None, None))
 paddle.fluid.layers.beam_search ArgSpec(args=['pre_ids', 'pre_scores', 'ids', 'scores', 'beam_size', 'end_id', 'level', 'name'], varargs=None, keywords=None, defaults=(0, None))
 paddle.fluid.layers.row_conv ArgSpec(args=['input', 'future_context_size', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(None, None))
 paddle.fluid.layers.multiplex ArgSpec(args=['inputs', 'index'], varargs=None, keywords=None, defaults=None)
```
paddle/fluid/framework/ir/CMakeLists.txt

```diff
@@ -10,7 +10,7 @@ function(pass_library TARGET DEST)
   set(oneValueArgs "")
   set(multiValueArgs SRCS DEPS)
   cmake_parse_arguments(op_library "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
-  cc_library(${TARGET} SRCS ${TARGET}.cc DEPS graph_pattern_detector pass ${op_library_DEPS})
+  cc_library(${TARGET} SRCS ${TARGET}.cc DEPS graph_pattern_detector pass fuse_pass_base ${op_library_DEPS})
   # add more DEST here, such as train, dist and collect USE_PASS into a file automatically.
   if (${DEST} STREQUAL "base" OR ${DEST} STREQUAL "inference")
     message(STATUS "add pass ${TARGET} ${DEST}")
@@ -25,13 +25,11 @@ cc_library(graph_helper SRCS graph_helper.cc DEPS graph)
 cc_library(pass SRCS pass.cc DEPS graph node graph_helper)
 cc_library(graph_traits SRCS graph_traits.cc DEPS graph)
 cc_library(graph_pattern_detector SRCS graph_pattern_detector.cc DEPS graph graph_helper graph_traits)
+cc_library(fuse_pass_base SRCS fuse_pass_base.cc DEPS pass)
 pass_library(graph_to_program_pass base)
 pass_library(graph_viz_pass base)
 pass_library(fc_fuse_pass inference)
-if(WITH_MKLDNN)
-  pass_library(conv_relu_mkldnn_fuse_pass inference)
-endif()
 pass_library(attention_lstm_fuse_pass inference)
 pass_library(infer_clean_graph_pass inference)
 pass_library(fc_lstm_fuse_pass inference)
@@ -39,6 +37,10 @@ pass_library(embedding_fc_lstm_fuse_pass inference)
 pass_library(fc_gru_fuse_pass inference)
 pass_library(seq_concat_fc_fuse_pass inference)
 pass_library(conv_bn_fuse_pass inference)
+if(WITH_MKLDNN)
+  pass_library(mkldnn_placement_pass base)
+  pass_library(conv_relu_mkldnn_fuse_pass inference)
+endif()
 cc_library(fuse_elewise_add_act_pass SRCS fuse_elewise_add_act_pass.cc DEPS pass graph_pattern_detector)
```
paddle/fluid/framework/ir/conv_bn_fuse_pass.cc

```diff
@@ -126,12 +126,21 @@ std::unique_ptr<ir::Graph> ConvBNFusePass::ApplyImpl(
     // conv, batch_norm,
     // conv_weight, conv_out,
     // bn_scale, bn_bias, bn_mean, bn_variance,
-    // bn_out, bn_mean_out, bn_variance_out, bn_saved_mean, bn_saved_variance
+    // bn_out, bn_mean_out, bn_variance_out, bn_saved_mean,
+    // bn_saved_variance
     GET_CONV_BN_NODES(conv_bn_pattern);

+    // check if fuse can be done and if MKL-DNN should be used
+    FuseOptions fuse_option = FindFuseOption(*conv, *batch_norm);
+    if (fuse_option == DO_NOT_FUSE) {
+      VLOG(3) << "do not perform conv+bn fuse";
+      return;
+    }
+
     // Create eltwise_y (conv bias) variable
     VarDesc eltwise_y_in_desc(
         patterns::PDNodeName(name_scope_, "eltwise_y_in"));
+    eltwise_y_in_desc.SetPersistable(true);
     auto* eltwise_y_in_node = g->CreateVarNode(&eltwise_y_in_desc);
     auto* eltwise_y_in_tensor =
         scope->Var(eltwise_y_in_node->Name())->GetMutable<LoDTensor>();
@@ -151,27 +160,59 @@ std::unique_ptr<ir::Graph> ConvBNFusePass::ApplyImpl(
                          *bn_mean, *bn_variance, eltwise_y_in_tensor,
                          epsilon);

-    // Create an elementwise add node
-    OpDesc desc;
-    desc.SetInput("X", std::vector<std::string>({conv_out->Name()}));
-    desc.SetInput("Y", std::vector<std::string>({eltwise_y_in_node->Name()}));
-    desc.SetOutput("Out", std::vector<std::string>({bn_out->Name()}));
-    desc.SetType("elementwise_add");
-    desc.SetAttr("axis", 1);
-    auto eltwise_op = g->CreateOpNode(&desc);  // OpDesc will be copied.
-
-    GraphSafeRemoveNodes(graph.get(),
-                         {bn_scale, bn_bias, bn_mean, bn_variance, batch_norm,
-                          bn_mean_out, bn_variance_out, bn_saved_mean,
-                          bn_saved_variance});
-
-    PADDLE_ENFORCE(subgraph.count(conv_input));
-    IR_NODE_LINK_TO(conv_out, eltwise_op);
-    IR_NODE_LINK_TO(eltwise_y_in_node, eltwise_op);
-    IR_NODE_LINK_TO(eltwise_op, bn_out);
-
-    found_conv_bn_count++;
+    // with MKL-DNN fuse conv+bn into conv with bias
+    // without MKL-DNN fuse conv+bn into conv+elementwise_add
+    if (fuse_option == FUSE_MKLDNN) {
+      auto input_names = conv->Op()->InputNames();
+      bool has_bias = std::find(input_names.begin(), input_names.end(),
+                                "Bias") != input_names.end();
+      if (has_bias && conv->Op()->Input("Bias").size() > 0) {
+        // reuse existing conv bias node
+        auto conv_bias_names = conv->Op()->Input("Bias");
+        PADDLE_ENFORCE_EQ(conv_bias_names.size(), 1);
+        auto* conv_bias_var = scope->FindVar(conv_bias_names[0]);
+        auto* conv_bias_tensor = conv_bias_var->GetMutable<LoDTensor>();
+        PADDLE_ENFORCE_EQ(conv_bias_tensor->dims(),
+                          eltwise_y_in_tensor->dims());
+        auto eigen_conv_bias = EigenVector<float>::From(*conv_bias_tensor);
+        eigen_conv_bias += EigenVector<float>::From(*eltwise_y_in_tensor);
+      } else {
+        // add new conv_bias node
+        conv->Op()->SetInput(
+            "Bias", std::vector<std::string>({eltwise_y_in_node->Name()}));
+        IR_NODE_LINK_TO(eltwise_y_in_node, conv);
+      }
+      conv->Op()->SetOutput("Output",
+                            std::vector<std::string>({bn_out->Name()}));
+      GraphSafeRemoveNodes(
+          graph.get(),
+          {conv_out, bn_scale, bn_bias, bn_mean, bn_variance, batch_norm,
+           bn_mean_out, bn_variance_out, bn_saved_mean, bn_saved_variance});
+      IR_NODE_LINK_TO(conv, bn_out);
+      found_conv_bn_count++;
+    } else {  // fuse_option == FUSE_NATIVE
+      // create an elementwise add node.
+      OpDesc desc;
+      desc.SetInput("X", std::vector<std::string>({conv_out->Name()}));
+      desc.SetInput("Y", std::vector<std::string>({eltwise_y_in_node->Name()}));
+      desc.SetOutput("Out", std::vector<std::string>({bn_out->Name()}));
+      desc.SetType("elementwise_add");
+      desc.SetAttr("axis", 1);
+      bool a = boost::get<bool>(conv->Op()->GetAttr("use_mkldnn"));
+      desc.SetAttr("use_mkldnn", a);
+      auto eltwise_op = g->CreateOpNode(&desc);  // OpDesc will be copied.
+      GraphSafeRemoveNodes(
+          graph.get(),
+          {bn_scale, bn_bias, bn_mean, bn_variance, batch_norm, bn_mean_out,
+           bn_variance_out, bn_saved_mean, bn_saved_variance});
+      PADDLE_ENFORCE(subgraph.count(conv_input));
+      IR_NODE_LINK_TO(conv_out, eltwise_op);
+      IR_NODE_LINK_TO(eltwise_y_in_node, eltwise_op);
+      IR_NODE_LINK_TO(eltwise_op, bn_out);
+      found_conv_bn_count++;
+    }
   };
   gpd(graph.get(), handler);
@@ -237,7 +278,6 @@ std::unique_ptr<ir::Graph> ConvEltwiseAddBNFusePass::ApplyImpl(
         {bn_scale, bn_bias, bn_mean, bn_variance, batch_norm, bn_mean_out,
          bn_variance_out, bn_saved_mean, bn_saved_variance, eltwise_out});

-    PADDLE_ENFORCE(subgraph.count(conv_input));
     IR_NODE_LINK_TO(eltwise, bn_out);
     found_conv_bn_count++;
```
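For reference, the arithmetic behind this fold is the standard conv+batch-norm identity (background, not code from the commit). With conv output W * x, BN statistics mu (bn_mean) and sigma^2 (bn_variance), scale gamma (bn_scale), shift beta (bn_bias), and the epsilon attribute read above:

```latex
% BN applied to a convolution output:
%   y = gamma * (W * x - mu) / sqrt(sigma^2 + epsilon) + beta
% Distributing the scale turns this into a convolution with rescaled
% weights plus a constant bias -- the bias is what the pass stores in
% eltwise_y_in_tensor and either folds into the conv "Bias" input
% (MKL-DNN path) or feeds to the new elementwise_add op (native path):
\[
y \;=\; \Big(\tfrac{\gamma}{\sqrt{\sigma^{2}+\epsilon}}\Big)\,(W * x)
  \;+\; \Big(\beta \;-\; \tfrac{\gamma\,\mu}{\sqrt{\sigma^{2}+\epsilon}}\Big)
\]
```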
paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass.cc

```diff
@@ -46,6 +46,12 @@ std::unique_ptr<ir::Graph> ConvReLUFusePass::ApplyImpl(
     GET_IR_NODE_FROM_SUBGRAPH(relu_out, relu_out, conv_relu_pattern);  // Out
     GET_IR_NODE_FROM_SUBGRAPH(relu, relu, conv_relu_pattern);  // ReLU op

+    FuseOptions fuse_option = FindFuseOption(*conv, *relu);
+    if (fuse_option == DO_NOT_FUSE) {
+      VLOG(3) << "do not perform conv+relu fuse";
+      return;
+    }
+
     // Transform Conv node into ConvReLU node.
     OpDesc* desc = conv->Op();
     desc->SetOutput("Output", std::vector<std::string>({relu_out->Name()}));
```
paddle/fluid/framework/ir/conv_relu_mkldnn_fuse_pass_tester.cc

```diff
@@ -20,17 +20,19 @@ namespace paddle {
 namespace framework {
 namespace ir {

-void SetOp(ProgramDesc* prog, const std::string& type,
+void SetOp(ProgramDesc* prog, const std::string& type, const std::string& name,
            const std::vector<std::string>& inputs,
-           const std::vector<std::string>& outputs) {
+           const std::vector<std::string>& outputs, bool use_mkldnn = false) {
   auto* op = prog->MutableBlock(0)->AppendOp();
   op->SetType(type);
   if (type == "conv2d") {
-    op->SetAttr("use_mkldnn", true);
+    op->SetAttr("use_mkldnn", use_mkldnn);
+    op->SetAttr("name", name);
     op->SetInput("Input", {inputs[0]});
     op->SetInput("Filter", {inputs[1]});
     op->SetInput("Bias", {inputs[2]});
   } else if (type == "relu") {
+    op->SetAttr("use_mkldnn", use_mkldnn);
     op->SetInput("X", inputs);
   }
   op->SetOutput("Out", outputs);
@@ -43,7 +45,8 @@ void SetOp(ProgramDesc* prog, const std::string& type,
 ProgramDesc BuildProgramDesc() {
   ProgramDesc prog;
   for (auto& v :
-       std::vector<std::string>({"a", "b", "c", "weights", "bias", "f", "g"})) {
+       std::vector<std::string>({"a", "b", "c", "weights", "bias", "f", "g",
+                                 "h", "weights2", "bias2", "k", "l"})) {
     auto* var = prog.MutableBlock(0)->Var(v);
     var->SetType(proto::VarType::SELECTED_ROWS);
     if (v == "weights" || v == "bias") {
@@ -51,14 +54,24 @@ ProgramDesc BuildProgramDesc() {
     }
   }

-  SetOp(&prog, "OP0", std::vector<std::string>({"a"}),
+  SetOp(&prog, "OP0", "op0", std::vector<std::string>({"a"}),
         std::vector<std::string>({"b"}));
-  SetOp(&prog, "OP1", std::vector<std::string>({"b"}),
+  SetOp(&prog, "OP1", "op1", std::vector<std::string>({"b"}),
         std::vector<std::string>({"c"}));
-  SetOp(&prog, "conv2d", std::vector<std::string>({"c", "weights", "bias"}),
-        std::vector<std::string>({"f"}));
-  SetOp(&prog, "relu", std::vector<std::string>({"f"}),
-        std::vector<std::string>({"g"}));
+  // conv+relu, both with MKL-DNN
+  SetOp(&prog, "conv2d", "conv1",
+        std::vector<std::string>({"c", "weights", "bias"}),
+        std::vector<std::string>({"f"}), true);
+  SetOp(&prog, "relu", "relu1", std::vector<std::string>({"f"}),
+        std::vector<std::string>({"g"}), true);
+  SetOp(&prog, "OP3", "op3", std::vector<std::string>({"g"}),
+        std::vector<std::string>({"h"}));
+  // conv+relu, only one with MKL-DNN
+  SetOp(&prog, "conv2d", "conv2",
+        std::vector<std::string>({"h", "weights2", "bias2"}),
+        std::vector<std::string>({"k"}), true);
+  SetOp(&prog, "relu", "relu2", std::vector<std::string>({"k"}),
+        std::vector<std::string>({"l"}));

   return prog;
 }
@@ -88,11 +101,17 @@ TEST(ConvReLUFusePass, basic) {
       auto* op = node->Op();
       ASSERT_TRUE(op->HasAttr("use_mkldnn"));
       EXPECT_TRUE(boost::get<bool>(op->GetAttr("use_mkldnn")));
-      ASSERT_TRUE(op->HasAttr("fuse_relu"));
-      bool fuse_relu = boost::get<bool>(op->GetAttr("fuse_relu"));
-      if (fuse_relu) {
-        ++conv_relu_count;
+      // check if only "conv1" convolution is fused
+      auto op_name = boost::get<std::string>(op->GetAttr("name"));
+      if (op_name == "conv1") {
+        ASSERT_TRUE(op->HasAttr("fuse_relu"));
+        bool fuse_relu = boost::get<bool>(op->GetAttr("fuse_relu"));
+        if (fuse_relu) {
+          ++conv_relu_count;
+        }
+      } else if (op_name == "conv2") {
+        ASSERT_FALSE(op->HasAttr("fuse_relu"));
       }
     }
   }
   EXPECT_EQ(conv_relu_count, 1);
```
paddle/fluid/framework/ir/fuse_pass_base.cc (new file, mode 100644)

```cpp
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "paddle/fluid/framework/ir/fuse_pass_base.h"

namespace paddle {
namespace framework {
namespace ir {

void FusePassBase::Init(const std::string& repr, Graph* graph) const {
  repr_ = repr;
  graph_ = graph;
}

Scope* FusePassBase::param_scope() const {
  PADDLE_ENFORCE(graph_->Has(kParamScopeAttr));
  return graph_->Get<framework::Scope*>(kParamScopeAttr);
}

void FusePassBase::AddStatis(int count_of_fused) const {
  PADDLE_ENFORCE(graph_);
  PADDLE_ENFORCE(!repr_.empty());
  if (!graph_->Has(kFuseStatisAttr)) {
    graph_->Set(kFuseStatisAttr, new std::unordered_map<std::string, int>);
  }
  auto& info =
      graph_->Get<std::unordered_map<std::string, int>>(kFuseStatisAttr);
  info[repr_] = count_of_fused;
}

FuseOptions FusePassBase::FindFuseOption(const Node& node1,
                                         const Node& node2) const {
#ifdef PADDLE_WITH_MKLDNN
  bool node1_mkldnn = node1.Op()->HasAttr("use_mkldnn") &&
                      boost::get<bool>(node1.Op()->GetAttr("use_mkldnn"));
  bool node2_mkldnn = node2.Op()->HasAttr("use_mkldnn") &&
                      boost::get<bool>(node2.Op()->GetAttr("use_mkldnn"));
  if (node1_mkldnn && node2_mkldnn)
    return FUSE_MKLDNN;
  else if (!node1_mkldnn && !node2_mkldnn)
    return FUSE_NATIVE;
  else
    return DO_NOT_FUSE;
#else
  return FUSE_NATIVE;
#endif
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle
```
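A minimal sketch of how a derived fuse pass would drive this base class. The `FooFusePass` class and its pattern-matching body are hypothetical; only `Init`, `FindFuseOption`, `AddStatis`, and the `FuseOptions` values come from the file above:

```cpp
#include "paddle/fluid/framework/ir/fuse_pass_base.h"

namespace paddle {
namespace framework {
namespace ir {

// Hypothetical pass illustrating the intended FusePassBase protocol.
class FooFusePass : public FusePassBase {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override {
    Init("foo_fuse", graph.get());  // record repr_/graph_ for statistics
    int found_count = 0;
    // For each matched pair of candidate op nodes (op1, op2) the pass
    // would check compatibility before rewriting:
    //   if (FindFuseOption(*op1, *op2) == DO_NOT_FUSE) continue;
    //   ... rewrite the subgraph, then ++found_count ...
    AddStatis(found_count);  // stored on the graph under kFuseStatisAttr
    return graph;
  }
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle
```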
paddle/fluid/framework/ir/fuse_pass_base.h

```diff
@@ -25,32 +25,24 @@ namespace ir {
 static const char kParamScopeAttr[] = "__param_scope__";
 static const char kFuseStatisAttr[] = "__fuse_statis__";

+enum FuseOptions {
+  DO_NOT_FUSE,  // fusing will not be done
+  FUSE_NATIVE,  // fusing will be done without MKL-DNN
+  FUSE_MKLDNN   // fusing will be done with MKL-DNN
+};
+
 class FusePassBase : public Pass {
  public:
-  void Init(const std::string& repr, Graph* graph) const {
-    repr_ = repr;
-    graph_ = graph;
-  }
-
-  Scope* param_scope() const {
-    PADDLE_ENFORCE(graph_->Has(kParamScopeAttr));
-    return graph_->Get<framework::Scope*>(kParamScopeAttr);
-  }
-
-  void AddStatis(int count_of_fused) const {
-    PADDLE_ENFORCE(graph_);
-    PADDLE_ENFORCE(!repr_.empty());
-    if (!graph_->Has(kFuseStatisAttr)) {
-      graph_->Set(kFuseStatisAttr, new std::unordered_map<std::string, int>);
-    }
-    auto& info =
-        graph_->Get<std::unordered_map<std::string, int>>(kFuseStatisAttr);
-    info[repr_] = count_of_fused;
-  }
+  void Init(const std::string& repr, Graph* graph) const;
+  Scope* param_scope() const;
+  void AddStatis(int count_of_fused) const;

   virtual ~FusePassBase() {}

  protected:
+  virtual FuseOptions FindFuseOption(const Node& node1,
+                                     const Node& node2) const;
+
   mutable Graph* graph_;
   mutable std::string repr_;
 };
```
paddle/fluid/framework/ir/graph_pattern_detector.cc

```diff
@@ -259,6 +259,8 @@ GraphPatternDetector::DetectPatterns() {
   return result;
 }

+// TODO(Superjomn) enhance the function as it marks unique unique as duplicates
+// see https://github.com/PaddlePaddle/Paddle/issues/13550
 void GraphPatternDetector::UniquePatterns(
     std::vector<GraphPatternDetector::subgraph_t>* subgraphs) {
   if (subgraphs->empty()) return;
```
paddle/fluid/framework/ir/mkldnn_placement_pass.cc (new file, mode 100644)

```cpp
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/framework/ir/mkldnn_placement_pass.h"

namespace paddle {
namespace framework {
namespace ir {

std::unique_ptr<ir::Graph> MKLDNNPlacementPass::ApplyImpl(
    std::unique_ptr<ir::Graph> graph) const {
  VLOG(3) << "Aplies MKL-DNN placement strategy.";
  for (const Node* n : graph->Nodes()) {
    if (n->IsOp() && n->Op()->HasAttr("use_mkldnn")) {
      n->Op()->SetAttr("use_mkldnn", true);
    }
  }
  return graph;
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(mkldnn_placement_pass,
              paddle::framework::ir::MKLDNNPlacementPass);
```
paddle/fluid/framework/ir/mkldnn_placement_pass.h (new file, mode 100644)

```cpp
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include "paddle/fluid/framework/ir/pass.h"

namespace paddle {
namespace framework {
namespace ir {

class MKLDNNPlacementPass : public Pass {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override;
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle
```
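`REGISTER_PASS` makes the pass retrievable by its string name. A sketch of applying it, assuming the framework's usual `PassRegistry::Instance().Get(name)` lookup (that lookup is not shown in this diff):

```cpp
// Assumed usage: fetch the registered pass by name and run it on a graph.
auto placement_pass = paddle::framework::ir::PassRegistry::Instance().Get(
    "mkldnn_placement_pass");
// Flips use_mkldnn=true on every op that declares the attribute.
graph = placement_pass->Apply(std::move(graph));
```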
paddle/fluid/framework/op_desc.cc

```diff
@@ -85,10 +85,6 @@ class CompileTimeInferShapeContext : public InferShapeContext {
       VLOG(3) << "input " << in << " is not LodTensor";
       return;
     }
-    PADDLE_ENFORCE_EQ(in_var->GetType(), proto::VarType::LOD_TENSOR,
-                      "The %d-th output of Output(%s) must be LoDTensor.", j,
-                      out);
     out_var->SetLoDLevel(in_var->GetLoDLevel());
   }
```
paddle/fluid/inference/analysis/analyzer.cc

```diff
@@ -101,7 +101,11 @@ Analyzer::Analyzer() { Register("manager1", new DfgPassManagerImpl); }
 void Analyzer::Run(Argument* argument) {
   std::vector<std::string> passes;
-  for (auto& pass : all_ir_passes_) {
+  if (use_mkldnn_) {
+    VLOG(3) << "Adding MKL-DNN placement pass";
+    passes.push_back("mkldnn_placement_pass");
+  }
+  for (auto& pass : ir_passes_) {
     if (!disabled_ir_passes_.count(pass)) {
       passes.push_back(pass);
       passes.push_back("graph_viz_pass");  // add graphviz for debug.
@@ -117,11 +121,26 @@ void Analyzer::Run(Argument* argument) {
   }
 }

+Analyzer& Analyzer::IncludeAllIrPasses() {
+  ir_passes_ = all_ir_passes_;
+  return *this;
+}
+
 Analyzer& Analyzer::DisableIrPasses(const std::vector<std::string>& passes) {
   disabled_ir_passes_.insert(passes.begin(), passes.end());
   return *this;
 }

+Analyzer& Analyzer::IncludeIrPasses(const std::vector<std::string>& passes) {
+  ir_passes_ = passes;
+  return *this;
+}
+
+Analyzer& Analyzer::SetUseMkldnn(bool use_mkldnn) {
+  use_mkldnn_ = use_mkldnn;
+  return *this;
+}
+
 }  // namespace analysis
 }  // namespace inference
 }  // namespace paddle
```
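Each new setter returns `*this`, so the analyzer can be configured fluently. A sketch built only from the methods added above (`argument` stands in for a caller-owned `Argument`):

```cpp
// Exclude mode: start from all known IR passes, enable the MKL-DNN
// placement pass, then subtract an unwanted pass.
Analyzer()
    .IncludeAllIrPasses()
    .SetUseMkldnn(true)
    .DisableIrPasses({"embedding_fc_lstm_fuse_pass"})
    .Run(&argument);

// Include mode: run exactly the listed passes, no MKL-DNN placement.
Analyzer()
    .SetUseMkldnn(false)
    .IncludeIrPasses({"infer_clean_graph_pass", "fc_fuse_pass"})
    .Run(&argument);
```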
paddle/fluid/inference/analysis/analyzer.h

```diff
@@ -54,6 +54,9 @@ class Analyzer : public OrderedRegistry<PassManager> {
   void Run(Argument* argument);

   Analyzer& DisableIrPasses(const std::vector<std::string>& passes);
+  Analyzer& IncludeIrPasses(const std::vector<std::string>& passes);
+  Analyzer& IncludeAllIrPasses();
+  Analyzer& SetUseMkldnn(bool use_mkldnn);

   DISABLE_COPY_AND_ASSIGN(Analyzer);
@@ -81,6 +84,9 @@ class Analyzer : public OrderedRegistry<PassManager> {
   }};

   std::unordered_set<std::string> disabled_ir_passes_;
+  // Ir passes to run
+  std::vector<std::string> ir_passes_;
+  bool use_mkldnn_;
 };

 }  // namespace analysis
```
paddle/fluid/inference/api/analysis_predictor.cc

```diff
@@ -225,10 +225,24 @@ void AnalysisPredictor::OptimizeInferenceProgram() {
   argument_.origin_program_desc.reset(
       new ProgramDesc(*inference_program_->Proto()));
-  PADDLE_ENFORCE(
-      config_.ir_mode == contrib::AnalysisConfig::IrPassMode::kExclude,
-      "Only kExclude is supported yet.");
-  Analyzer().DisableIrPasses(config_.ir_passes).Run(&argument_);
+  switch (config_.ir_mode) {
+    case contrib::AnalysisConfig::IrPassMode::kExclude:
+      Analyzer()
+          .IncludeAllIrPasses()
+          .SetUseMkldnn(config_._use_mkldnn)
+          .DisableIrPasses(config_.ir_passes)
+          .Run(&argument_);
+      break;
+    case contrib::AnalysisConfig::IrPassMode::kInclude:
+      Analyzer()
+          .SetUseMkldnn(config_._use_mkldnn)
+          .IncludeIrPasses(config_.ir_passes)
+          .Run(&argument_);
+      break;
+    default:
+      LOG(ERROR) << "Only kExclude and kInclude modes are supoorted yet.";
+  }

   CHECK(argument_.transformed_program_desc);
   VLOG(5) << "to prepare executor";
```
paddle/fluid/inference/api/paddle_inference_api.h

```diff
@@ -259,10 +259,17 @@ struct AnalysisConfig : public NativeConfig {
     kExclude  // Specify the disabled passes in `ir_passes`.
   };

+  void SetIncludeMode() {
+    ir_mode = IrPassMode::kInclude;
+    // this pass has to be run at the beginning of all fuse passes
+    ir_passes = {"infer_clean_graph_pass"};
+  }
+
   // Determine whether to perform graph optimization.
   bool enable_ir_optim = true;
   // Manually determine the IR passes to run.
   IrPassMode ir_mode{IrPassMode::kExclude};
+  // passes to be excluded/included
   std::vector<std::string> ir_passes{"embedding_fc_lstm_fuse_pass"};

   // NOT stable yet.
```
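Putting the config additions together, include mode could be opted into roughly like this (a sketch; `_use_mkldnn` is the field the predictor reads above, and the extra pass name is only an example):

```cpp
contrib::AnalysisConfig config;
config.SetIncludeMode();  // ir_mode = kInclude, seeded with infer_clean_graph_pass
config.ir_passes.push_back("conv_bn_fuse_pass");  // then list the passes to run
config._use_mkldnn = true;  // forwarded to Analyzer::SetUseMkldnn above
```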
paddle/fluid/inference/tests/api/analyzer_rnn2_tester.cc

```diff
@@ -18,12 +18,12 @@ namespace paddle {
 namespace inference {

 using namespace framework;  // NOLINT
+static std::vector<float> result_data;

 struct DataRecord {
   std::vector<std::vector<std::vector<float>>> link_step_data_all;
   std::vector<size_t> lod;
   std::vector<std::vector<float>> rnn_link_data;
-  std::vector<float> result_data;
   size_t num_samples;  // total number of samples
   size_t batch_iter{0};
   size_t batch_size{1};
@@ -57,6 +57,7 @@ struct DataRecord {
     std::ifstream file(path);
     std::string line;
     int num_lines = 0;
+    result_data.clear();
     while (std::getline(file, line)) {
       num_lines++;
       std::vector<std::string> data;
@@ -135,13 +136,12 @@ TEST(Analyzer_rnn2, profile) {
   if (FLAGS_num_threads == 1 && !FLAGS_test_all_data) {
     // the first inference result
-    DataRecord data(FLAGS_infer_data, FLAGS_batch_size);
     PADDLE_ENFORCE_GT(outputs.size(), 0);
     size_t size = GetSize(outputs[0]);
     PADDLE_ENFORCE_GT(size, 0);
     float* result = static_cast<float*>(outputs[0].data.data());
     for (size_t i = 0; i < size; i++) {
-      EXPECT_NEAR(result[i], data.result_data[i], 1e-3);
+      EXPECT_NEAR(result[i], result_data[i], 1e-3);
     }
   }
 }
```
paddle/fluid/operators/detection/CMakeLists.txt

```diff
@@ -20,7 +20,7 @@ detection_library(box_coder_op SRCS box_coder_op.cc box_coder_op.cu)
 detection_library(iou_similarity_op SRCS iou_similarity_op.cc
 iou_similarity_op.cu)
 detection_library(mine_hard_examples_op SRCS mine_hard_examples_op.cc)
-detection_library(multiclass_nms_op SRCS multiclass_nms_op.cc)
+detection_library(multiclass_nms_op SRCS multiclass_nms_op.cc poly_util.cc gpc.cc)
 detection_library(prior_box_op SRCS prior_box_op.cc prior_box_op.cu)
 detection_library(anchor_generator_op SRCS anchor_generator_op.cc
 anchor_generator_op.cu)
```
paddle/fluid/operators/detection/gpc.cc
0 → 100644
浏览文件 @
23fc896b
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/**
* @file src/gpc.cpp
* @author huhan02(com@baidu.com)
* @date 2015/12/18 14:17:30
* @brief
*
* @modified by sunyipeng
* @email sunyipeng@baidu.com
* @date 2018/6/12
**/
#include "paddle/fluid/operators/detection/gpc.h"
namespace
gpc
{
typedef
struct
lmt_shape
{
/* Local minima table */
double
y
;
/* Y coordinate at local minimum */
edge_node
*
first_bound
;
/* Pointer to bound list */
struct
lmt_shape
*
next
;
/* Pointer to next local minimum */
}
lmt_node
;
typedef
struct
sbt_t_shape
{
/* Scanbeam tree */
double
y
;
/* Scanbeam node y value */
struct
sbt_t_shape
*
less
;
/* Pointer to nodes with lower y */
struct
sbt_t_shape
*
more
;
/* Pointer to nodes with higher y */
}
sb_tree
;
typedef
struct
it_shape
{
/* Intersection table */
edge_node
*
ie
[
2
];
/* Intersecting edge (bundle) pair */
gpc_vertex
point
;
/* Point of intersection */
struct
it_shape
*
next
;
/* The next intersection table node */
}
it_node
;
typedef
struct
st_shape
{
/* Sorted edge table */
edge_node
*
edge
;
/* Pointer to AET edge */
double
xb
;
/* Scanbeam bottom x coordinate */
double
xt
;
/* Scanbeam top x coordinate */
double
dx
;
/* Change in x for a unit y increase */
struct
st_shape
*
prev
;
/* Previous edge in sorted list */
}
st_node
;
typedef
struct
bbox_shape
{
/* Contour axis-aligned bounding box */
double
xmin
;
/* Minimum x coordinate */
double
ymin
;
/* Minimum y coordinate */
double
xmax
;
/* Maximum x coordinate */
double
ymax
;
/* Maximum y coordinate */
}
bbox
;
/*
===========================================================================
Global Data
===========================================================================
*/
/* Horizontal edge state transitions within scanbeam boundary */
const
h_state
next_h_state
[
3
][
6
]
=
{
/* ABOVE BELOW CROSS */
/* L R L R L R */
/* NH */
{
BH
,
TH
,
TH
,
BH
,
NH
,
NH
},
/* BH */
{
NH
,
NH
,
NH
,
NH
,
TH
,
TH
},
/* TH */
{
NH
,
NH
,
NH
,
NH
,
BH
,
BH
}};
/*
===========================================================================
Private Functions
===========================================================================
*/
static
void
reset_it
(
it_node
**
it
)
{
it_node
*
itn
;
while
(
*
it
)
{
itn
=
(
*
it
)
->
next
;
gpc_free
<
it_node
>
(
*
it
);
*
it
=
itn
;
}
}
static
void
reset_lmt
(
lmt_node
**
lmt
)
{
lmt_node
*
lmtn
;
while
(
*
lmt
)
{
lmtn
=
(
*
lmt
)
->
next
;
gpc_free
<
lmt_node
>
(
*
lmt
);
*
lmt
=
lmtn
;
}
}
static
void
insert_bound
(
edge_node
**
b
,
edge_node
*
e
)
{
edge_node
*
existing_bound
=
NULL
;
if
(
!*
b
)
{
/* Link node e to the tail of the list */
*
b
=
e
;
}
else
{
/* Do primary sort on the x field */
if
(
e
[
0
].
bot
.
x
<
(
*
b
)[
0
].
bot
.
x
)
{
/* Insert a new node mid-list */
existing_bound
=
*
b
;
*
b
=
e
;
(
*
b
)
->
next_bound
=
existing_bound
;
}
else
{
if
(
e
[
0
].
bot
.
x
==
(
*
b
)[
0
].
bot
.
x
)
{
/* Do secondary sort on the dx field */
if
(
e
[
0
].
dx
<
(
*
b
)[
0
].
dx
)
{
/* Insert a new node mid-list */
existing_bound
=
*
b
;
*
b
=
e
;
(
*
b
)
->
next_bound
=
existing_bound
;
}
else
{
/* Head further down the list */
insert_bound
(
&
((
*
b
)
->
next_bound
),
e
);
}
}
else
{
/* Head further down the list */
insert_bound
(
&
((
*
b
)
->
next_bound
),
e
);
}
}
}
}
static
edge_node
**
bound_list
(
lmt_node
**
lmt
,
double
y
)
{
lmt_node
*
existing_node
;
if
(
!*
lmt
)
{
/* Add node onto the tail end of the LMT */
gpc_malloc
<
lmt_node
>
(
*
lmt
,
sizeof
(
lmt_node
),
const_cast
<
char
*>
(
"LMT insertion"
));
(
*
lmt
)
->
y
=
y
;
(
*
lmt
)
->
first_bound
=
NULL
;
(
*
lmt
)
->
next
=
NULL
;
return
&
((
*
lmt
)
->
first_bound
);
}
else
if
(
y
<
(
*
lmt
)
->
y
)
{
/* Insert a new LMT node before the current node */
existing_node
=
*
lmt
;
gpc_malloc
<
lmt_node
>
(
*
lmt
,
sizeof
(
lmt_node
),
const_cast
<
char
*>
(
"LMT insertion"
));
(
*
lmt
)
->
y
=
y
;
(
*
lmt
)
->
first_bound
=
NULL
;
(
*
lmt
)
->
next
=
existing_node
;
return
&
((
*
lmt
)
->
first_bound
);
}
else
{
if
(
y
>
(
*
lmt
)
->
y
)
{
/* Head further up the LMT */
return
bound_list
(
&
((
*
lmt
)
->
next
),
y
);
}
else
{
/* Use this existing LMT node */
return
&
((
*
lmt
)
->
first_bound
);
}
}
}
static
void
add_to_sbtree
(
int
*
entries
,
sb_tree
**
sbtree
,
double
y
)
{
if
(
!*
sbtree
)
{
/* Add a new tree node here */
gpc_malloc
<
sb_tree
>
(
*
sbtree
,
sizeof
(
sb_tree
),
const_cast
<
char
*>
(
"scanbeam tree insertion"
));
(
*
sbtree
)
->
y
=
y
;
(
*
sbtree
)
->
less
=
NULL
;
(
*
sbtree
)
->
more
=
NULL
;
(
*
entries
)
++
;
}
else
{
if
((
*
sbtree
)
->
y
>
y
)
{
/* Head into the 'less' sub-tree */
add_to_sbtree
(
entries
,
&
((
*
sbtree
)
->
less
),
y
);
}
else
{
if
((
*
sbtree
)
->
y
<
y
)
{
/* Head into the 'more' sub-tree */
add_to_sbtree
(
entries
,
&
((
*
sbtree
)
->
more
),
y
);
}
}
}
}
static
void
build_sbt
(
int
*
entries
,
double
*
sbt
,
sb_tree
*
sbtree
)
{
if
(
sbtree
->
less
)
{
build_sbt
(
entries
,
sbt
,
sbtree
->
less
);
}
sbt
[
*
entries
]
=
sbtree
->
y
;
(
*
entries
)
++
;
if
(
sbtree
->
more
)
{
build_sbt
(
entries
,
sbt
,
sbtree
->
more
);
}
}
static
void
free_sbtree
(
sb_tree
**
sbtree
)
{
if
(
*
sbtree
)
{
free_sbtree
(
&
((
*
sbtree
)
->
less
));
free_sbtree
(
&
((
*
sbtree
)
->
more
));
gpc_free
<
sb_tree
>
(
*
sbtree
);
}
}
static
int
count_optimal_vertices
(
gpc_vertex_list
c
)
{
int
result
=
0
;
int
i
=
0
;
/* Ignore non-contributing contours */
if
(
c
.
num_vertices
>
0
)
{
for
(
i
=
0
;
i
<
c
.
num_vertices
;
i
++
)
{
/* Ignore superfluous vertices embedded in horizontal edges */
if
(
gpc_optimal
(
c
.
vertex
,
i
,
c
.
num_vertices
))
{
result
++
;
}
}
}
return
result
;
}
static edge_node *build_lmt(lmt_node **lmt, sb_tree **sbtree, int *sbt_entries,
                            gpc_polygon *p, int type, gpc_op op) {
  int c = 0;
  int i = 0;
  int min = 0;
  int max = 0;
  int num_edges = 0;
  int v = 0;
  int num_vertices = 0;
  int total_vertices = 0;
  int e_index = 0;
  edge_node *e = NULL;
  edge_node *edge_table = NULL;

  for (c = 0; c < p->num_contours; c++) {
    total_vertices += count_optimal_vertices(p->contour[c]);
  }

  /* Create the entire input polygon edge table in one go */
  gpc_malloc<edge_node>(edge_table, total_vertices * sizeof(edge_node),
                        const_cast<char *>("edge table creation"));

  for (c = 0; c < p->num_contours; c++) {
    if (p->contour[c].num_vertices < 0) {
      /* Ignore the non-contributing contour and repair the vertex count */
      p->contour[c].num_vertices = -p->contour[c].num_vertices;
    } else {
      /* Perform contour optimisation */
      num_vertices = 0;
      for (i = 0; i < p->contour[c].num_vertices; i++) {
        if (gpc_optimal(p->contour[c].vertex, i, p->contour[c].num_vertices)) {
          edge_table[num_vertices].vertex.x = p->contour[c].vertex[i].x;
          edge_table[num_vertices].vertex.y = p->contour[c].vertex[i].y;
          /* Record vertex in the scanbeam table */
          add_to_sbtree(sbt_entries, sbtree,
                        edge_table[num_vertices].vertex.y);
          num_vertices++;
        }
      }

      /* Do the contour forward pass */
      for (min = 0; min < num_vertices; min++) {
        /* If a forward local minimum... */
        if (gpc_fwd_min(edge_table, min, num_vertices)) {
          /* Search for the next local maximum... */
          num_edges = 1;
          max = gpc_next_index(min, num_vertices);
          while (gpc_not_fmax(edge_table, max, num_vertices)) {
            num_edges++;
            max = gpc_next_index(max, num_vertices);
          }

          /* Build the next edge list */
          e = &edge_table[e_index];
          e_index += num_edges;
          v = min;
          e[0].bstate[BELOW] = UNBUNDLED;
          e[0].bundle[BELOW][CLIP] = 0;
          e[0].bundle[BELOW][SUBJ] = 0;
          for (i = 0; i < num_edges; i++) {
            e[i].xb = edge_table[v].vertex.x;
            e[i].bot.x = edge_table[v].vertex.x;
            e[i].bot.y = edge_table[v].vertex.y;
            v = gpc_next_index(v, num_vertices);
            e[i].top.x = edge_table[v].vertex.x;
            e[i].top.y = edge_table[v].vertex.y;
            e[i].dx = (edge_table[v].vertex.x - e[i].bot.x) /
                      (e[i].top.y - e[i].bot.y);
            e[i].type = type;
            e[i].outp[ABOVE] = NULL;
            e[i].outp[BELOW] = NULL;
            e[i].next = NULL;
            e[i].prev = NULL;
            e[i].succ = ((num_edges > 1) && (i < (num_edges - 1)))
                            ? &(e[i + 1])
                            : NULL;
            e[i].pred = ((num_edges > 1) && (i > 0)) ? &(e[i - 1]) : NULL;
            e[i].next_bound = NULL;
            e[i].bside[CLIP] = (op == GPC_DIFF) ? RIGHT : LEFT;
            e[i].bside[SUBJ] = LEFT;
          }
          insert_bound(bound_list(lmt, edge_table[min].vertex.y), e);
        }
      }

      /* Do the contour reverse pass */
      for (min = 0; min < num_vertices; min++) {
        /* If a reverse local minimum... */
        if (gpc_rev_min(edge_table, min, num_vertices)) {
          /* Search for the previous local maximum... */
          num_edges = 1;
          max = gpc_prev_index(min, num_vertices);
          while (gpc_not_rmax(edge_table, max, num_vertices)) {
            num_edges++;
            max = gpc_prev_index(max, num_vertices);
          }

          /* Build the previous edge list */
          e = &edge_table[e_index];
          e_index += num_edges;
          v = min;
          e[0].bstate[BELOW] = UNBUNDLED;
          e[0].bundle[BELOW][CLIP] = 0;
          e[0].bundle[BELOW][SUBJ] = 0;
          for (i = 0; i < num_edges; i++) {
            e[i].xb = edge_table[v].vertex.x;
            e[i].bot.x = edge_table[v].vertex.x;
            e[i].bot.y = edge_table[v].vertex.y;
            v = gpc_prev_index(v, num_vertices);
            e[i].top.x = edge_table[v].vertex.x;
            e[i].top.y = edge_table[v].vertex.y;
            e[i].dx = (edge_table[v].vertex.x - e[i].bot.x) /
                      (e[i].top.y - e[i].bot.y);
            e[i].type = type;
            e[i].outp[ABOVE] = NULL;
            e[i].outp[BELOW] = NULL;
            e[i].next = NULL;
            e[i].prev = NULL;
            e[i].succ = ((num_edges > 1) && (i < (num_edges - 1)))
                            ? &(e[i + 1])
                            : NULL;
            e[i].pred = ((num_edges > 1) && (i > 0)) ? &(e[i - 1]) : NULL;
            e[i].next_bound = NULL;
            e[i].bside[CLIP] = (op == GPC_DIFF) ? RIGHT : LEFT;
            e[i].bside[SUBJ] = LEFT;
          }
          insert_bound(bound_list(lmt, edge_table[min].vertex.y), e);
        }
      }
    }
  }
  return edge_table;
}  // NOLINT
static void add_edge_to_aet(edge_node **aet, edge_node *edge,
                            edge_node *prev) {
  if (!*aet) {
    /* Append edge onto the tail end of the AET */
    *aet = edge;
    edge->prev = prev;
    edge->next = NULL;
  } else {
    /* Do primary sort on the xb field */
    if (edge->xb < (*aet)->xb) {
      /* Insert edge here (before the AET edge) */
      edge->prev = prev;
      edge->next = *aet;
      (*aet)->prev = edge;
      *aet = edge;
    } else {
      if (edge->xb == (*aet)->xb) {
        /* Do secondary sort on the dx field */
        if (edge->dx < (*aet)->dx) {
          /* Insert edge here (before the AET edge) */
          edge->prev = prev;
          edge->next = *aet;
          (*aet)->prev = edge;
          *aet = edge;
        } else {
          /* Head further into the AET */
          add_edge_to_aet(&((*aet)->next), edge, *aet);
        }
      } else {
        /* Head further into the AET */
        add_edge_to_aet(&((*aet)->next), edge, *aet);
      }
    }
  }
}

static void add_intersection(it_node **it, edge_node *edge0, edge_node *edge1,
                             double x, double y) {
  it_node *existing_node;

  if (!*it) {
    /* Append a new node to the tail of the list */
    gpc_malloc<it_node>(*it, sizeof(it_node),
                        const_cast<char *>("IT insertion"));
    (*it)->ie[0] = edge0;
    (*it)->ie[1] = edge1;
    (*it)->point.x = x;
    (*it)->point.y = y;
    (*it)->next = NULL;
  } else {
    if ((*it)->point.y > y) {
      /* Insert a new node mid-list */
      existing_node = *it;
      gpc_malloc<it_node>(*it, sizeof(it_node),
                          const_cast<char *>("IT insertion"));
      (*it)->ie[0] = edge0;
      (*it)->ie[1] = edge1;
      (*it)->point.x = x;
      (*it)->point.y = y;
      (*it)->next = existing_node;
    } else {
      /* Head further down the list */
      add_intersection(&((*it)->next), edge0, edge1, x, y);
    }
  }
}
static void add_st_edge(st_node **st, it_node **it, edge_node *edge,
                        double dy) {
  st_node *existing_node;
  double den = 0.0;
  double r = 0.0;
  double x = 0.0;
  double y = 0.0;

  if (!*st) {
    /* Append edge onto the tail end of the ST */
    gpc_malloc<st_node>(*st, sizeof(st_node),
                        const_cast<char *>("ST insertion"));
    (*st)->edge = edge;
    (*st)->xb = edge->xb;
    (*st)->xt = edge->xt;
    (*st)->dx = edge->dx;
    (*st)->prev = NULL;
  } else {
    den = ((*st)->xt - (*st)->xb) - (edge->xt - edge->xb);
    /* If new edge and ST edge don't cross */
    if ((edge->xt >= (*st)->xt) || (edge->dx == (*st)->dx) ||
        (fabs(den) <= DBL_EPSILON)) {
      /* No intersection - insert edge here (before the ST edge) */
      existing_node = *st;
      gpc_malloc<st_node>(*st, sizeof(st_node),
                          const_cast<char *>("ST insertion"));
      (*st)->edge = edge;
      (*st)->xb = edge->xb;
      (*st)->xt = edge->xt;
      (*st)->dx = edge->dx;
      (*st)->prev = existing_node;
    } else {
      /* Compute intersection between new edge and ST edge */
      r = (edge->xb - (*st)->xb) / den;
      x = (*st)->xb + r * ((*st)->xt - (*st)->xb);
      y = r * dy;
      /* Insert the edge pointers and the intersection point in the IT */
      add_intersection(it, (*st)->edge, edge, x, y);
      /* Head further into the ST */
      add_st_edge(&((*st)->prev), it, edge, dy);
    }
  }
}

static void build_intersection_table(it_node **it, edge_node *aet,
                                     double dy) {
  st_node *st;
  st_node *stp;
  edge_node *edge = NULL;

  /* Build intersection table for the current scanbeam */
  reset_it(it);
  st = NULL;

  /* Process each AET edge */
  for (edge = aet; edge; edge = edge->next) {
    if ((edge->bstate[ABOVE] == BUNDLE_HEAD) || edge->bundle[ABOVE][CLIP] ||
        edge->bundle[ABOVE][SUBJ]) {
      add_st_edge(&st, it, edge, dy);
    }
  }

  /* Free the sorted edge table */
  while (st) {
    stp = st->prev;
    gpc_free<st_node>(st);
    st = stp;
  }
}
static int count_contours(polygon_node *polygon) {
  int nc = 0;
  int nv = 0;
  vertex_node *v = NULL;
  vertex_node *nextv = NULL;

  for (nc = 0; polygon; polygon = polygon->next) {
    if (polygon->active) {
      /* Count the vertices in the current contour */
      nv = 0;
      for (v = polygon->proxy->v[LEFT]; v; v = v->next) {
        nv++;
      }
      /* Record valid vertex counts in the active field */
      if (nv > 2) {
        polygon->active = nv;
        nc++;
      } else {
        /* Invalid contour: just free the heap */
        for (v = polygon->proxy->v[LEFT]; v; v = nextv) {
          nextv = v->next;
          gpc_free<vertex_node>(v);
        }
        polygon->active = 0;
      }
    }
  }
  return nc;
}

static void add_left(polygon_node *p, double x, double y) {
  vertex_node *nv = NULL;

  /* Create a new vertex node and set its fields */
  gpc_malloc<vertex_node>(nv, sizeof(vertex_node),
                          const_cast<char *>("vertex node creation"));
  nv->x = x;
  nv->y = y;
  /* Add vertex nv to the left end of the polygon's vertex list */
  nv->next = p->proxy->v[LEFT];
  /* Update proxy->[LEFT] to point to nv */
  p->proxy->v[LEFT] = nv;
}

static void merge_left(polygon_node *p, polygon_node *q, polygon_node *list) {
  polygon_node *target = NULL;

  /* Label contour as a hole */
  q->proxy->hole = 1;
  if (p->proxy != q->proxy) {
    /* Assign p's vertex list to the left end of q's list */
    p->proxy->v[RIGHT]->next = q->proxy->v[LEFT];
    q->proxy->v[LEFT] = p->proxy->v[LEFT];
    /* Redirect any p->proxy references to q->proxy */
    for (target = p->proxy; list; list = list->next) {
      if (list->proxy == target) {
        list->active = 0;
        list->proxy = q->proxy;
      }
    }
  }
}
static void add_right(polygon_node *p, double x, double y) {
  vertex_node *nv = NULL;

  /* Create a new vertex node and set its fields */
  gpc_malloc<vertex_node>(nv, sizeof(vertex_node),
                          const_cast<char *>("vertex node creation"));
  nv->x = x;
  nv->y = y;
  nv->next = NULL;
  /* Add vertex nv to the right end of the polygon's vertex list */
  p->proxy->v[RIGHT]->next = nv;
  /* Update proxy->v[RIGHT] to point to nv */
  p->proxy->v[RIGHT] = nv;
}

static void merge_right(polygon_node *p, polygon_node *q, polygon_node *list) {
  polygon_node *target = NULL;

  /* Label contour as external */
  q->proxy->hole = 0;
  if (p->proxy != q->proxy) {
    /* Assign p's vertex list to the right end of q's list */
    q->proxy->v[RIGHT]->next = p->proxy->v[LEFT];
    q->proxy->v[RIGHT] = p->proxy->v[RIGHT];
    /* Redirect any p->proxy references to q->proxy */
    for (target = p->proxy; list; list = list->next) {
      if (list->proxy == target) {
        list->active = 0;
        list->proxy = q->proxy;
      }
    }
  }
}

static void add_local_min(polygon_node **p, edge_node *edge, double x,
                          double y) {
  polygon_node *existing_min = NULL;
  vertex_node *nv = NULL;

  existing_min = *p;
  gpc_malloc<polygon_node>(*p, sizeof(polygon_node),
                           const_cast<char *>("polygon node creation"));
  /* Create a new vertex node and set its fields */
  gpc_malloc<vertex_node>(nv, sizeof(vertex_node),
                          const_cast<char *>("vertex node creation"));
  nv->x = x;
  nv->y = y;
  nv->next = NULL;
  /* Initialise proxy to point to p itself */
  (*p)->proxy = (*p);
  (*p)->active = 1;
  (*p)->next = existing_min;
  /* Make v[LEFT] and v[RIGHT] point to new vertex nv */
  (*p)->v[LEFT] = nv;
  (*p)->v[RIGHT] = nv;
  /* Assign polygon p to the edge */
  edge->outp[ABOVE] = *p;
}
static int count_tristrips(polygon_node *tn) {
  int total = 0;

  for (total = 0; tn; tn = tn->next) {
    if (tn->active > 2) {
      total++;
    }
  }
  return total;
}

void add_vertex(vertex_node **t, double x, double y) {
  if (!(*t)) {
    gpc_malloc<vertex_node>(*t, sizeof(vertex_node),
                            const_cast<char *>("tristrip vertex creation"));
    (*t)->x = x;
    (*t)->y = y;
    (*t)->next = NULL;
  } else {
    /* Head further down the list */
    add_vertex(&((*t)->next), x, y);
  }
}

void gpc_vertex_create(edge_node *e, int p, int s, double x, double y) {
  add_vertex(&(e->outp[p]->v[s]), x, y);
  e->outp[p]->active++;
}

static void new_tristrip(polygon_node **tn, edge_node *edge, double x,
                         double y) {
  if (!(*tn)) {
    gpc_malloc<polygon_node>(*tn, sizeof(polygon_node),
                             const_cast<char *>("tristrip node creation"));
    (*tn)->next = NULL;
    (*tn)->v[LEFT] = NULL;
    (*tn)->v[RIGHT] = NULL;
    (*tn)->active = 1;
    add_vertex(&((*tn)->v[LEFT]), x, y);
    edge->outp[ABOVE] = *tn;
  } else {
    /* Head further down the list */
    new_tristrip(&((*tn)->next), edge, x, y);
  }
}
static bbox *create_contour_bboxes(gpc_polygon *p) {
  bbox *box;
  int c = 0;
  int v = 0;

  gpc_malloc<bbox>(box, p->num_contours * sizeof(bbox),
                   const_cast<char *>("Bounding box creation"));

  /* Construct contour bounding boxes */
  for (c = 0; c < p->num_contours; c++) {
    /* Initialise bounding box extent */
    box[c].xmin = DBL_MAX;
    box[c].ymin = DBL_MAX;
    box[c].xmax = -DBL_MAX;
    box[c].ymax = -DBL_MAX;

    for (v = 0; v < p->contour[c].num_vertices; v++) {
      /* Adjust bounding box */
      if (p->contour[c].vertex[v].x < box[c].xmin) {
        box[c].xmin = p->contour[c].vertex[v].x;
      }
      if (p->contour[c].vertex[v].y < box[c].ymin) {
        box[c].ymin = p->contour[c].vertex[v].y;
      }
      if (p->contour[c].vertex[v].x > box[c].xmax) {
        box[c].xmax = p->contour[c].vertex[v].x;
      }
      if (p->contour[c].vertex[v].y > box[c].ymax) {
        box[c].ymax = p->contour[c].vertex[v].y;
      }
    }
  }
  return box;
}

static void minimax_test(gpc_polygon *subj, gpc_polygon *clip, gpc_op op) {
  bbox *s_bbox;
  bbox *c_bbox;
  int s = 0;
  int c = 0;
  int *o_table = NULL;
  int overlap = 0;

  s_bbox = create_contour_bboxes(subj);
  c_bbox = create_contour_bboxes(clip);

  gpc_malloc<int>(o_table,
                  subj->num_contours * clip->num_contours * sizeof(int),
                  const_cast<char *>("overlap table creation"));

  /* Check all subject contour bounding boxes against clip boxes */
  for (s = 0; s < subj->num_contours; s++) {
    for (c = 0; c < clip->num_contours; c++) {
      o_table[c * subj->num_contours + s] =
          (!((s_bbox[s].xmax < c_bbox[c].xmin) ||
             (s_bbox[s].xmin > c_bbox[c].xmax))) &&
          (!((s_bbox[s].ymax < c_bbox[c].ymin) ||
             (s_bbox[s].ymin > c_bbox[c].ymax)));
    }
  }

  /* For each clip contour, search for any subject contour overlaps */
  for (c = 0; c < clip->num_contours; c++) {
    overlap = 0;
    for (s = 0; (!overlap) && (s < subj->num_contours); s++) {
      overlap = o_table[c * subj->num_contours + s];
    }
    if (!overlap) {
      /* Flag non contributing status by negating vertex count */
      clip->contour[c].num_vertices = -clip->contour[c].num_vertices;
    }
  }

  if (op == GPC_INT) {
    /* For each subject contour, search for any clip contour overlaps */
    for (s = 0; s < subj->num_contours; s++) {
      overlap = 0;
      for (c = 0; (!overlap) && (c < clip->num_contours); c++) {
        overlap = o_table[c * subj->num_contours + s];
      }
      if (!overlap) {
        /* Flag non contributing status by negating vertex count */
        subj->contour[s].num_vertices = -subj->contour[s].num_vertices;
      }
    }
  }

  gpc_free<bbox>(s_bbox);
  gpc_free<bbox>(c_bbox);
  gpc_free<int>(o_table);
}
/*
===========================================================================
Public Functions
===========================================================================
*/
void gpc_free_polygon(gpc_polygon *p) {
  int c = 0;

  for (c = 0; c < p->num_contours; c++) {
    gpc_free<gpc_vertex>(p->contour[c].vertex);
  }
  gpc_free<int>(p->hole);
  gpc_free<gpc_vertex_list>(p->contour);
  p->num_contours = 0;
}
/*
void gpc_read_polygon(FILE *fp, int read_hole_flags, gpc_polygon *p) {
int c = 0;
int v = 0;
fscanf(fp, "%d", &(p->num_contours));
gpc_malloc<int>(p->hole, p->num_contours * sizeof(int),
(char *)"hole flag array creation");
gpc_malloc<gpc_vertex_list>(p->contour,
p->num_contours * sizeof(gpc_vertex_list),
(char *)"contour creation");
for (c = 0; c < p->num_contours; c++) {
fscanf(fp, "%d", &(p->contour[c].num_vertices));
if (read_hole_flags) {
fscanf(fp, "%d", &(p->hole[c]));
} else {
p->hole[c] = 0; // Assume all contours to be external
}
gpc_malloc<gpc_vertex>(p->contour[c].vertex,
p->contour[c].num_vertices * sizeof(gpc_vertex),
(char *)"vertex creation");
for (v = 0; v < p->contour[c].num_vertices; v++) {
fscanf(fp, "%lf %lf", &(p->contour[c].vertex[v].x),
&(p->contour[c].vertex[v].y));
}
}
}
void gpc_write_polygon(FILE *fp, int write_hole_flags, gpc_polygon *p) {
int c = 0;
int v = 0;
fprintf(fp, "%d\n", p->num_contours);
for (c = 0; c < p->num_contours; c++) {
fprintf(fp, "%d\n", p->contour[c].num_vertices);
if (write_hole_flags) {
fprintf(fp, "%d\n", p->hole[c]);
}
for (v = 0; v < p->contour[c].num_vertices; v++) {
fprintf(fp, "% .*lf % .*lf\n", DBL_DIG, p->contour[c].vertex[v].x,
DBL_DIG, p->contour[c].vertex[v].y);
}
}
}
*/
void gpc_add_contour(gpc_polygon *p, gpc_vertex_list *new_contour, int hole) {
  int *extended_hole = NULL;
  int c = 0;
  int v = 0;
  gpc_vertex_list *extended_contour = NULL;

  /* Create an extended hole array */
  gpc_malloc<int>(extended_hole, (p->num_contours + 1) * sizeof(int),
                  const_cast<char *>("contour hole addition"));

  /* Create an extended contour array */
  gpc_malloc<gpc_vertex_list>(extended_contour,
                              (p->num_contours + 1) * sizeof(gpc_vertex_list),
                              const_cast<char *>("contour addition"));

  /* Copy the old contour and hole data into the extended arrays */
  for (c = 0; c < p->num_contours; c++) {
    extended_hole[c] = p->hole[c];
    extended_contour[c] = p->contour[c];
  }

  /* Copy the new contour and hole onto the end of the extended arrays */
  c = p->num_contours;
  extended_hole[c] = hole;
  extended_contour[c].num_vertices = new_contour->num_vertices;
  gpc_malloc<gpc_vertex>(extended_contour[c].vertex,
                         new_contour->num_vertices * sizeof(gpc_vertex),
                         const_cast<char *>("contour addition"));
  for (v = 0; v < new_contour->num_vertices; v++) {
    extended_contour[c].vertex[v] = new_contour->vertex[v];
  }

  /* Dispose of the old contour */
  gpc_free<gpc_vertex_list>(p->contour);
  gpc_free<int>(p->hole);

  /* Update the polygon information */
  p->num_contours++;
  p->hole = extended_hole;
  p->contour = extended_contour;
}
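/* A minimal usage sketch for gpc_add_contour (the function name below is
   illustrative, not part of the gpc API). Vertex data is copied into the
   polygon, so the local arrays need not outlive the call.

void example_build_polygon(void) {
  gpc_vertex square[4] = {{0.0, 0.0}, {1.0, 0.0}, {1.0, 1.0}, {0.0, 1.0}};
  gpc_vertex_list contour;
  contour.num_vertices = 4;
  contour.vertex = square;

  gpc_polygon poly;
  poly.num_contours = 0;  // start from an empty polygon
  poly.hole = NULL;
  poly.contour = NULL;

  gpc_add_contour(&poly, &contour, 0);  // 0 = external contour, not a hole
  gpc_free_polygon(&poly);              // releases the copied vertex data
}
*/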
// gpc_polygon_clip
void gpc_polygon_clip(gpc_op op, gpc_polygon *subj, gpc_polygon *clip,
                      gpc_polygon *result) {
  sb_tree *sbtree = NULL;
  it_node *it = NULL;
  it_node *intersect = NULL;
  edge_node *edge = NULL;
  edge_node *prev_edge = NULL;
  edge_node *next_edge = NULL;
  edge_node *succ_edge = NULL;
  edge_node *e0 = NULL;
  edge_node *e1 = NULL;
  edge_node *aet = NULL;
  edge_node *c_heap = NULL;
  edge_node *s_heap = NULL;
  lmt_node *lmt = NULL;
  lmt_node *local_min = NULL;
  polygon_node *out_poly = NULL;
  polygon_node *p = NULL;
  polygon_node *q = NULL;
  polygon_node *poly = NULL;
  polygon_node *npoly = NULL;
  polygon_node *cf = NULL;
  vertex_node *vtx = NULL;
  vertex_node *nv = NULL;
  h_state horiz[2];
  int in[2];
  int exists[2];
  int parity[2] = {LEFT, LEFT};
  int c = 0;
  int v = 0;
  int contributing = 0;
  int search = 0;
  int scanbeam = 0;
  int sbt_entries = 0;
  int vclass = 0;
  int bl = 0;
  int br = 0;
  int tl = 0;
  int tr = 0;
  double *sbt = NULL;
  double xb = 0.0;
  double px = 0.0;
  double yb = 0.0;
  double yt = 0.0;
  double dy = 0.0;
  double ix = 0.0;
  double iy = 0.0;

  /* Test for trivial NULL result cases */
  if (((subj->num_contours == 0) && (clip->num_contours == 0)) ||
      ((subj->num_contours == 0) && ((op == GPC_INT) || (op == GPC_DIFF))) ||
      ((clip->num_contours == 0) && (op == GPC_INT))) {
    result->num_contours = 0;
    result->hole = NULL;
    result->contour = NULL;
    return;
  }

  /* Identify potentialy contributing contours */
  if (((op == GPC_INT) || (op == GPC_DIFF)) && (subj->num_contours > 0) &&
      (clip->num_contours > 0)) {
    minimax_test(subj, clip, op);
  }

  /* Build LMT */
  if (subj->num_contours > 0) {
    s_heap = build_lmt(&lmt, &sbtree, &sbt_entries, subj, SUBJ, op);
  }
  if (clip->num_contours > 0) {
    c_heap = build_lmt(&lmt, &sbtree, &sbt_entries, clip, CLIP, op);
  }

  /* Return a NULL result if no contours contribute */
  if (lmt == NULL) {
    result->num_contours = 0;
    result->hole = NULL;
    result->contour = NULL;
    reset_lmt(&lmt);
    gpc_free<edge_node>(s_heap);
    gpc_free<edge_node>(c_heap);
    return;
  }

  /* Build scanbeam table from scanbeam tree */
  gpc_malloc<double>(sbt, sbt_entries * sizeof(double),
                     const_cast<char *>("sbt creation"));
  build_sbt(&scanbeam, sbt, sbtree);
  scanbeam = 0;
  free_sbtree(&sbtree);

  /* Allow pointer re-use without causing memory leak */
  if (subj == result) {
    gpc_free_polygon(subj);
  }
  if (clip == result) {
    gpc_free_polygon(clip);
  }

  /* Invert clip polygon for difference operation */
  if (op == GPC_DIFF) {
    parity[CLIP] = RIGHT;
  }
  local_min = lmt;

  // Process each scanbeam
  while (scanbeam < sbt_entries) {
    /* Set yb and yt to the bottom and top of the scanbeam */
    yb = sbt[scanbeam++];
    if (scanbeam < sbt_entries) {
      yt = sbt[scanbeam];
      dy = yt - yb;
    }
    /* === SCANBEAM BOUNDARY PROCESSING ================================ */
    /* If LMT node corresponding to yb exists */
    if (local_min) {
      if (local_min->y == yb) {
        /* Add edges starting at this local minimum to the AET */
        for (edge = local_min->first_bound; edge; edge = edge->next_bound) {
          add_edge_to_aet(&aet, edge, NULL);
        }
        local_min = local_min->next;
      }
    }
    /* Set dummy previous x value */
    px = -DBL_MAX;
    /* Create bundles within AET */
    e0 = aet;
    e1 = aet;
    /* Set up bundle fields of first edge */
    aet->bundle[ABOVE][aet->type] = (aet->top.y != yb);
    aet->bundle[ABOVE][!aet->type] = 0;
    aet->bstate[ABOVE] = UNBUNDLED;

    for (next_edge = aet->next; next_edge; next_edge = next_edge->next) {
      /* Set up bundle fields of next edge */
      next_edge->bundle[ABOVE][next_edge->type] = (next_edge->top.y != yb);
      next_edge->bundle[ABOVE][!next_edge->type] = 0;
      next_edge->bstate[ABOVE] = UNBUNDLED;
      /* Bundle edges above the scanbeam boundary if they coincide */
      if (next_edge->bundle[ABOVE][next_edge->type]) {
        if (gpc_eq(e0->xb, next_edge->xb) && gpc_eq(e0->dx, next_edge->dx) &&
            (e0->top.y != yb)) {
          next_edge->bundle[ABOVE][next_edge->type] ^=
              e0->bundle[ABOVE][next_edge->type];
          next_edge->bundle[ABOVE][!next_edge->type] =
              e0->bundle[ABOVE][!next_edge->type];
          next_edge->bstate[ABOVE] = BUNDLE_HEAD;
          e0->bundle[ABOVE][CLIP] = 0;
          e0->bundle[ABOVE][SUBJ] = 0;
          e0->bstate[ABOVE] = BUNDLE_TAIL;
        }
        e0 = next_edge;
      }
    }
    horiz[CLIP] = NH;
    horiz[SUBJ] = NH;

    // Process each edge at this scanbeam boundary
    for (edge = aet; edge; edge = edge->next) {
      exists[CLIP] =
          edge->bundle[ABOVE][CLIP] + (edge->bundle[BELOW][CLIP] << 1);
      exists[SUBJ] =
          edge->bundle[ABOVE][SUBJ] + (edge->bundle[BELOW][SUBJ] << 1);
      if (exists[CLIP] || exists[SUBJ]) {
        /* Set bundle side */
        edge->bside[CLIP] = parity[CLIP];
        edge->bside[SUBJ] = parity[SUBJ];
        /* Determine contributing status and quadrant occupancies */
        switch (op) {
          case GPC_DIFF:
          case GPC_INT:
            contributing = (exists[CLIP] && (parity[SUBJ] || horiz[SUBJ])) ||
                           (exists[SUBJ] && (parity[CLIP] || horiz[CLIP])) ||
                           (exists[CLIP] && exists[SUBJ] &&
                            (parity[CLIP] == parity[SUBJ]));
            br = (parity[CLIP]) && (parity[SUBJ]);
            bl = (parity[CLIP] ^ edge->bundle[ABOVE][CLIP]) &&
                 (parity[SUBJ] ^ edge->bundle[ABOVE][SUBJ]);
            tr = (parity[CLIP] ^ (horiz[CLIP] != NH)) &&
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH));
            tl = (parity[CLIP] ^ (horiz[CLIP] != NH) ^
                  edge->bundle[BELOW][CLIP]) &&
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH) ^
                  edge->bundle[BELOW][SUBJ]);
            break;
          case GPC_XOR:
            contributing = exists[CLIP] || exists[SUBJ];
            br = (parity[CLIP]) ^ (parity[SUBJ]);
            bl = (parity[CLIP] ^ edge->bundle[ABOVE][CLIP]) ^
                 (parity[SUBJ] ^ edge->bundle[ABOVE][SUBJ]);
            tr = (parity[CLIP] ^ (horiz[CLIP] != NH)) ^
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH));
            tl = (parity[CLIP] ^ (horiz[CLIP] != NH) ^
                  edge->bundle[BELOW][CLIP]) ^
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH) ^
                  edge->bundle[BELOW][SUBJ]);
            break;
          case GPC_UNION:
            contributing = (exists[CLIP] && (!parity[SUBJ] || horiz[SUBJ])) ||
                           (exists[SUBJ] && (!parity[CLIP] || horiz[CLIP])) ||
                           (exists[CLIP] && exists[SUBJ] &&
                            (parity[CLIP] == parity[SUBJ]));
            br = (parity[CLIP]) || (parity[SUBJ]);
            bl = (parity[CLIP] ^ edge->bundle[ABOVE][CLIP]) ||
                 (parity[SUBJ] ^ edge->bundle[ABOVE][SUBJ]);
            tr = (parity[CLIP] ^ (horiz[CLIP] != NH)) ||
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH));
            tl = (parity[CLIP] ^ (horiz[CLIP] != NH) ^
                  edge->bundle[BELOW][CLIP]) ||
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH) ^
                  edge->bundle[BELOW][SUBJ]);
            break;
        }
        // Update parity
        parity[CLIP] ^= edge->bundle[ABOVE][CLIP];
        parity[SUBJ] ^= edge->bundle[ABOVE][SUBJ];
        /* Update horizontal state */
        if (exists[CLIP]) {
          horiz[CLIP] = next_h_state[horiz[CLIP]]
                                    [((exists[CLIP] - 1) << 1) + parity[CLIP]];
        }
        if (exists[SUBJ]) {
          horiz[SUBJ] = next_h_state[horiz[SUBJ]]
                                    [((exists[SUBJ] - 1) << 1) + parity[SUBJ]];
        }
        vclass = tr + (tl << 1) + (br << 2) + (bl << 3);

        if (contributing) {
          xb = edge->xb;
          switch (vclass) {
            case EMN:
            case IMN:
              add_local_min(&out_poly, edge, xb, yb);
              px = xb;
              cf = edge->outp[ABOVE];
              break;
            case ERI:
              if (xb != px) {
                add_right(cf, xb, yb);
                px = xb;
              }
              edge->outp[ABOVE] = cf;
              cf = NULL;
              break;
            case ELI:
              add_left(edge->outp[BELOW], xb, yb);
              px = xb;
              cf = edge->outp[BELOW];
              break;
            case EMX:
              if (xb != px) {
                add_left(cf, xb, yb);
                px = xb;
              }
              merge_right(cf, edge->outp[BELOW], out_poly);
              cf = NULL;
              break;
            case ILI:
              if (xb != px) {
                add_left(cf, xb, yb);
                px = xb;
              }
              edge->outp[ABOVE] = cf;
              cf = NULL;
              break;
            case IRI:
              add_right(edge->outp[BELOW], xb, yb);
              px = xb;
              cf = edge->outp[BELOW];
              edge->outp[BELOW] = NULL;
              break;
            case IMX:
              if (xb != px) {
                add_right(cf, xb, yb);
                px = xb;
              }
              merge_left(cf, edge->outp[BELOW], out_poly);
              cf = NULL;
              edge->outp[BELOW] = NULL;
              break;
            case IMM:
              if (xb != px) {
                add_right(cf, xb, yb);
                px = xb;
              }
              merge_left(cf, edge->outp[BELOW], out_poly);
              edge->outp[BELOW] = NULL;
              add_local_min(&out_poly, edge, xb, yb);
              cf = edge->outp[ABOVE];
              break;
            case EMM:
              if (xb != px) {
                add_left(cf, xb, yb);
                px = xb;
              }
              merge_right(cf, edge->outp[BELOW], out_poly);
              edge->outp[BELOW] = NULL;
              add_local_min(&out_poly, edge, xb, yb);
              cf = edge->outp[ABOVE];
              break;
            case LED:
              if (edge->bot.y == yb) {
                add_left(edge->outp[BELOW], xb, yb);
              }
              edge->outp[ABOVE] = edge->outp[BELOW];
              px = xb;
              break;
            case RED:
              if (edge->bot.y == yb) {
                add_right(edge->outp[BELOW], xb, yb);
              }
              edge->outp[ABOVE] = edge->outp[BELOW];
              px = xb;
              break;
            default:
              break;
          } /* End of switch */
        }   /* End of contributing conditional */
      }     /* End of edge exists conditional */
    }  // End of AET loop

    /* Delete terminating edges from the AET, otherwise compute xt */
    for (edge = aet; edge; edge = edge->next) {
      if (edge->top.y == yb) {
        prev_edge = edge->prev;
        next_edge = edge->next;
        if (prev_edge) {
          prev_edge->next = next_edge;
        } else {
          aet = next_edge;
        }
        if (next_edge) {
          next_edge->prev = prev_edge;
        }
        /* Copy bundle head state to the adjacent tail edge if required */
        if ((edge->bstate[BELOW] == BUNDLE_HEAD) && prev_edge) {
          if (prev_edge->bstate[BELOW] == BUNDLE_TAIL) {
            prev_edge->outp[BELOW] = edge->outp[BELOW];
            prev_edge->bstate[BELOW] = UNBUNDLED;
            if (prev_edge->prev) {
              if (prev_edge->prev->bstate[BELOW] == BUNDLE_TAIL) {
                prev_edge->bstate[BELOW] = BUNDLE_HEAD;
              }
            }
          }
        }
      } else {
        if (edge->top.y == yt) {
          edge->xt = edge->top.x;
        } else {
          edge->xt = edge->bot.x + edge->dx * (yt - edge->bot.y);
        }
      }
    }

    if (scanbeam < sbt_entries) {
      /* === SCANBEAM INTERIOR PROCESSING ============================== */
      build_intersection_table(&it, aet, dy);
      /* Process each node in the intersection table */
      for (intersect = it; intersect; intersect = intersect->next) {
        e0 = intersect->ie[0];
        e1 = intersect->ie[1];
        /* Only generate output for contributing intersections */
        if ((e0->bundle[ABOVE][CLIP] || e0->bundle[ABOVE][SUBJ]) &&
            (e1->bundle[ABOVE][CLIP] || e1->bundle[ABOVE][SUBJ])) {
          p = e0->outp[ABOVE];
          q = e1->outp[ABOVE];
          ix = intersect->point.x;
          iy = intersect->point.y + yb;

          in[CLIP] = (e0->bundle[ABOVE][CLIP] && !e0->bside[CLIP]) ||
                     (e1->bundle[ABOVE][CLIP] && e1->bside[CLIP]) ||
                     (!e0->bundle[ABOVE][CLIP] && !e1->bundle[ABOVE][CLIP] &&
                      e0->bside[CLIP] && e1->bside[CLIP]);
          in[SUBJ] = (e0->bundle[ABOVE][SUBJ] && !e0->bside[SUBJ]) ||
                     (e1->bundle[ABOVE][SUBJ] && e1->bside[SUBJ]) ||
                     (!e0->bundle[ABOVE][SUBJ] && !e1->bundle[ABOVE][SUBJ] &&
                      e0->bside[SUBJ] && e1->bside[SUBJ]);

          // Determine quadrant occupancies
          switch (op) {
            case GPC_DIFF:
            case GPC_INT:
              tr = (in[CLIP]) && (in[SUBJ]);
              tl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP]) &&
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ]);
              br = (in[CLIP] ^ e0->bundle[ABOVE][CLIP]) &&
                   (in[SUBJ] ^ e0->bundle[ABOVE][SUBJ]);
              bl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP] ^
                    e0->bundle[ABOVE][CLIP]) &&
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ] ^
                    e0->bundle[ABOVE][SUBJ]);
              break;
            case GPC_XOR:
              tr = (in[CLIP]) ^ (in[SUBJ]);
              tl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP]) ^
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ]);
              br = (in[CLIP] ^ e0->bundle[ABOVE][CLIP]) ^
                   (in[SUBJ] ^ e0->bundle[ABOVE][SUBJ]);
              bl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP] ^
                    e0->bundle[ABOVE][CLIP]) ^
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ] ^
                    e0->bundle[ABOVE][SUBJ]);
              break;
            case GPC_UNION:
              tr = (in[CLIP]) || (in[SUBJ]);
              tl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP]) ||
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ]);
              br = (in[CLIP] ^ e0->bundle[ABOVE][CLIP]) ||
                   (in[SUBJ] ^ e0->bundle[ABOVE][SUBJ]);
              bl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP] ^
                    e0->bundle[ABOVE][CLIP]) ||
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ] ^
                    e0->bundle[ABOVE][SUBJ]);
              break;
          }
          vclass = tr + (tl << 1) + (br << 2) + (bl << 3);

          switch (vclass) {
            case EMN:
              add_local_min(&out_poly, e0, ix, iy);
              e1->outp[ABOVE] = e0->outp[ABOVE];
              break;
            case ERI:
              if (p) {
                add_right(p, ix, iy);
                e1->outp[ABOVE] = p;
                e0->outp[ABOVE] = NULL;
              }
              break;
            case ELI:
              if (q) {
                add_left(q, ix, iy);
                e0->outp[ABOVE] = q;
                e1->outp[ABOVE] = NULL;
              }
              break;
            case EMX:
              if (p && q) {
                add_left(p, ix, iy);
                merge_right(p, q, out_poly);
                e0->outp[ABOVE] = NULL;
                e1->outp[ABOVE] = NULL;
              }
              break;
            case IMN:
              add_local_min(&out_poly, e0, ix, iy);
              e1->outp[ABOVE] = e0->outp[ABOVE];
              break;
            case ILI:
              if (p) {
                add_left(p, ix, iy);
                e1->outp[ABOVE] = p;
                e0->outp[ABOVE] = NULL;
              }
              break;
            case IRI:
              if (q) {
                add_right(q, ix, iy);
                e0->outp[ABOVE] = q;
                e1->outp[ABOVE] = NULL;
              }
              break;
            case IMX:
              if (p && q) {
                add_right(p, ix, iy);
                merge_left(p, q, out_poly);
                e0->outp[ABOVE] = NULL;
                e1->outp[ABOVE] = NULL;
              }
              break;
            case IMM:
              if (p && q) {
                add_right(p, ix, iy);
                merge_left(p, q, out_poly);
                add_local_min(&out_poly, e0, ix, iy);
                e1->outp[ABOVE] = e0->outp[ABOVE];
              }
              break;
            case EMM:
              if (p && q) {
                add_left(p, ix, iy);
                merge_right(p, q, out_poly);
                add_local_min(&out_poly, e0, ix, iy);
                e1->outp[ABOVE] = e0->outp[ABOVE];
              }
              break;
            default:
              break;
          }  // End of switch
        }    /* End of contributing intersection conditional */

        /* Swap bundle sides in response to edge crossing */
        if (e0->bundle[ABOVE][CLIP]) {
          e1->bside[CLIP] = !e1->bside[CLIP];
        }
        if (e1->bundle[ABOVE][CLIP]) {
          e0->bside[CLIP] = !e0->bside[CLIP];
        }
        if (e0->bundle[ABOVE][SUBJ]) {
          e1->bside[SUBJ] = !e1->bside[SUBJ];
        }
        if (e1->bundle[ABOVE][SUBJ]) {
          e0->bside[SUBJ] = !e0->bside[SUBJ];
        }

        /* Swap e0 and e1 bundles in the AET */
        prev_edge = e0->prev;
        next_edge = e1->next;
        if (next_edge) {
          next_edge->prev = e0;
        }
        if (e0->bstate[ABOVE] == BUNDLE_HEAD) {
          search = 1;
          while (search) {
            prev_edge = prev_edge->prev;
            if (prev_edge) {
              if (prev_edge->bstate[ABOVE] != BUNDLE_TAIL) {
                search = 0;
              }
            } else {
              search = 0;
            }
          }
        }
        if (!prev_edge) {
          aet->prev = e1;
          e1->next = aet;
          aet = e0->next;
        } else {
          prev_edge->next->prev = e1;
          e1->next = prev_edge->next;
          prev_edge->next = e0->next;
        }
        e0->next->prev = prev_edge;
        e1->next->prev = e1;
        e0->next = next_edge;
      } /* End of IT loop*/

      // Prepare for next scanbeam
      for (edge = aet; edge; edge = next_edge) {
        next_edge = edge->next;
        succ_edge = edge->succ;
        if ((edge->top.y == yt) && succ_edge) {
          /* Replace AET edge by its successor */
          succ_edge->outp[BELOW] = edge->outp[ABOVE];
          succ_edge->bstate[BELOW] = edge->bstate[ABOVE];
          succ_edge->bundle[BELOW][CLIP] = edge->bundle[ABOVE][CLIP];
          succ_edge->bundle[BELOW][SUBJ] = edge->bundle[ABOVE][SUBJ];
          prev_edge = edge->prev;
          if (prev_edge) {
            prev_edge->next = succ_edge;
          } else {
            aet = succ_edge;
          }
          if (next_edge) {
            next_edge->prev = succ_edge;
          }
          succ_edge->prev = prev_edge;
          succ_edge->next = next_edge;
        } else {
          /* Update this edge */
          edge->outp[BELOW] = edge->outp[ABOVE];
          edge->bstate[BELOW] = edge->bstate[ABOVE];
          edge->bundle[BELOW][CLIP] = edge->bundle[ABOVE][CLIP];
          edge->bundle[BELOW][SUBJ] = edge->bundle[ABOVE][SUBJ];
          edge->xb = edge->xt;
        }
        edge->outp[ABOVE] = NULL;
      }
    }
  }
  /* === END OF SCANBEAM PROCESSING ================================== */

  // Generate result polygon from out_poly
  result->contour = NULL;
  result->hole = NULL;
  result->num_contours = count_contours(out_poly);
  if (result->num_contours > 0) {
    gpc_malloc<int>(result->hole, result->num_contours * sizeof(int),
                    const_cast<char *>("hole flag table creation"));
    gpc_malloc<gpc_vertex_list>(result->contour,
                                result->num_contours * sizeof(gpc_vertex_list),
                                const_cast<char *>("contour creation"));
    c = 0;
    for (poly = out_poly; poly; poly = npoly) {
      npoly = poly->next;
      if (poly->active) {
        result->hole[c] = poly->proxy->hole;
        result->contour[c].num_vertices = poly->active;
        gpc_malloc<gpc_vertex>(
            result->contour[c].vertex,
            result->contour[c].num_vertices * sizeof(gpc_vertex),
            const_cast<char *>("vertex creation"));
        v = result->contour[c].num_vertices - 1;
        for (vtx = poly->proxy->v[LEFT]; vtx; vtx = nv) {
          nv = vtx->next;
          result->contour[c].vertex[v].x = vtx->x;
          result->contour[c].vertex[v].y = vtx->y;
          gpc_free<vertex_node>(vtx);
          v--;
        }
        c++;
      }
      gpc_free<polygon_node>(poly);
    }
  } else {
    for (poly = out_poly; poly; poly = npoly) {
      npoly = poly->next;
      gpc_free<polygon_node>(poly);
    }
  }

  // Tidy up
  reset_it(&it);
  reset_lmt(&lmt);
  gpc_free<edge_node>(c_heap);
  gpc_free<edge_node>(s_heap);
  gpc_free<double>(sbt);
}  // NOLINT
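/* A minimal usage sketch for gpc_polygon_clip (an illustrative helper, not
   part of the gpc API). `subj` and `clip` are assumed to be polygons already
   populated through gpc_add_contour; the caller owns the result and must
   release it with gpc_free_polygon.

void example_clip(gpc_polygon *subj, gpc_polygon *clip) {
  gpc_polygon result;
  gpc_polygon_clip(GPC_INT, subj, clip, &result);  // or GPC_DIFF / GPC_XOR / GPC_UNION
  for (int c = 0; c < result.num_contours; c++) {
    // result.hole[c] flags whether output contour c is a hole
  }
  gpc_free_polygon(&result);
}
*/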
void gpc_free_tristrip(gpc_tristrip *t) {
  int s = 0;

  for (s = 0; s < t->num_strips; s++) {
    gpc_free<gpc_vertex>(t->strip[s].vertex);
  }
  gpc_free<gpc_vertex_list>(t->strip);
  t->num_strips = 0;
}
void gpc_polygon_to_tristrip(gpc_polygon *s, gpc_tristrip *t) {
  gpc_polygon c;

  c.num_contours = 0;
  c.hole = NULL;
  c.contour = NULL;
  gpc_tristrip_clip(GPC_DIFF, s, &c, t);
}
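/* Design note: polygon-to-tristrip conversion is a degenerate clip, a
   GPC_DIFF of the subject against an empty clip polygon, so the scanbeam
   machinery of gpc_tristrip_clip below is reused unchanged. A caller-side
   sketch (the helper name is illustrative):

void example_to_tristrip(gpc_polygon *poly) {
  gpc_tristrip strips;
  gpc_polygon_to_tristrip(poly, &strips);
  // strips.strip[i] holds the vertices of one triangle strip
  gpc_free_tristrip(&strips);
}
*/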
// gpc_tristrip_clip
void gpc_tristrip_clip(gpc_op op, gpc_polygon *subj, gpc_polygon *clip,
                       gpc_tristrip *result) {
  sb_tree *sbtree = NULL;
  it_node *it = NULL;
  it_node *intersect = NULL;
  edge_node *edge = NULL;
  edge_node *prev_edge = NULL;
  edge_node *next_edge = NULL;
  edge_node *succ_edge = NULL;
  edge_node *e0 = NULL;
  edge_node *e1 = NULL;
  edge_node *aet = NULL;
  edge_node *c_heap = NULL;
  edge_node *s_heap = NULL;
  edge_node *cf = NULL;
  lmt_node *lmt = NULL;
  lmt_node *local_min = NULL;
  polygon_node *tlist = NULL;
  polygon_node *tn = NULL;
  polygon_node *tnn = NULL;
  polygon_node *p = NULL;
  polygon_node *q = NULL;
  vertex_node *lt = NULL;
  vertex_node *ltn = NULL;
  vertex_node *rt = NULL;
  vertex_node *rtn = NULL;
  h_state horiz[2];
  vertex_type cft = NUL;
  int in[2];
  int exists[2];
  int parity[2] = {LEFT, LEFT};
  int s = 0;
  int v = 0;
  int contributing = 0;
  int search = 0;
  int scanbeam = 0;
  int sbt_entries = 0;
  int vclass = 0;
  int bl = 0;
  int br = 0;
  int tl = 0;
  int tr = 0;
  double *sbt = NULL;
  double xb = 0.0;
  double px = 0.0;
  double nx = 0.0;
  double yb = 0.0;
  double yt = 0.0;
  double dy = 0.0;
  double ix = 0.0;
  double iy = 0.0;

  /* Test for trivial NULL result cases */
  if (((subj->num_contours == 0) && (clip->num_contours == 0)) ||
      ((subj->num_contours == 0) && ((op == GPC_INT) || (op == GPC_DIFF))) ||
      ((clip->num_contours == 0) && (op == GPC_INT))) {
    result->num_strips = 0;
    result->strip = NULL;
    return;
  }

  /* Identify potentialy contributing contours */
  if (((op == GPC_INT) || (op == GPC_DIFF)) && (subj->num_contours > 0) &&
      (clip->num_contours > 0)) {
    minimax_test(subj, clip, op);
  }

  /* Build LMT */
  if (subj->num_contours > 0) {
    s_heap = build_lmt(&lmt, &sbtree, &sbt_entries, subj, SUBJ, op);
  }
  if (clip->num_contours > 0) {
    c_heap = build_lmt(&lmt, &sbtree, &sbt_entries, clip, CLIP, op);
  }

  /* Return a NULL result if no contours contribute */
  if (lmt == NULL) {
    result->num_strips = 0;
    result->strip = NULL;
    reset_lmt(&lmt);
    gpc_free<edge_node>(s_heap);
    gpc_free<edge_node>(c_heap);
    return;
  }

  /* Build scanbeam table from scanbeam tree */
  gpc_malloc<double>(sbt, sbt_entries * sizeof(double),
                     const_cast<char *>("sbt creation"));
  build_sbt(&scanbeam, sbt, sbtree);
  scanbeam = 0;
  free_sbtree(&sbtree);

  /* Invert clip polygon for difference operation */
  if (op == GPC_DIFF) {
    parity[CLIP] = RIGHT;
  }
  local_min = lmt;

  // Process each scanbeam
  while (scanbeam < sbt_entries) {
    /* Set yb and yt to the bottom and top of the scanbeam */
    yb = sbt[scanbeam++];
    if (scanbeam < sbt_entries) {
      yt = sbt[scanbeam];
      dy = yt - yb;
    }

    /* === SCANBEAM BOUNDARY PROCESSING ================================ */
    /* If LMT node corresponding to yb exists */
    if (local_min) {
      if (local_min->y == yb) {
        /* Add edges starting at this local minimum to the AET */
        for (edge = local_min->first_bound; edge; edge = edge->next_bound) {
          add_edge_to_aet(&aet, edge, NULL);
        }
        local_min = local_min->next;
      }
    }
    /* Set dummy previous x value */
    /* Create bundles within AET */
    px = -DBL_MAX;
    e0 = aet;
    e1 = aet;
    /* Set up bundle fields of first edge */
    aet->bundle[ABOVE][aet->type] = (aet->top.y != yb);
    aet->bundle[ABOVE][!aet->type] = 0;
    aet->bstate[ABOVE] = UNBUNDLED;

    for (next_edge = aet->next; next_edge; next_edge = next_edge->next) {
      /* Set up bundle fields of next edge */
      next_edge->bundle[ABOVE][next_edge->type] = (next_edge->top.y != yb);
      next_edge->bundle[ABOVE][!next_edge->type] = 0;
      next_edge->bstate[ABOVE] = UNBUNDLED;
      /* Bundle edges above the scanbeam boundary if they coincide */
      if (next_edge->bundle[ABOVE][next_edge->type]) {
        if (gpc_eq(e0->xb, next_edge->xb) && gpc_eq(e0->dx, next_edge->dx) &&
            (e0->top.y != yb)) {
          next_edge->bundle[ABOVE][next_edge->type] ^=
              e0->bundle[ABOVE][next_edge->type];
          next_edge->bundle[ABOVE][!next_edge->type] =
              e0->bundle[ABOVE][!next_edge->type];
          next_edge->bstate[ABOVE] = BUNDLE_HEAD;
          e0->bundle[ABOVE][CLIP] = 0;
          e0->bundle[ABOVE][SUBJ] = 0;
          e0->bstate[ABOVE] = BUNDLE_TAIL;
        }
        e0 = next_edge;
      }
    }
    horiz[CLIP] = NH;
    horiz[SUBJ] = NH;

    /* Process each edge at this scanbeam boundary */
    for (edge = aet; edge; edge = edge->next) {
      exists[CLIP] =
          edge->bundle[ABOVE][CLIP] + (edge->bundle[BELOW][CLIP] << 1);
      exists[SUBJ] =
          edge->bundle[ABOVE][SUBJ] + (edge->bundle[BELOW][SUBJ] << 1);
      if (exists[CLIP] || exists[SUBJ]) {
        /* Set bundle side */
        edge->bside[CLIP] = parity[CLIP];
        edge->bside[SUBJ] = parity[SUBJ];
        /* Determine contributing status and quadrant occupancies */
        switch (op) {
          case GPC_DIFF:
          case GPC_INT:
            contributing = (exists[CLIP] && (parity[SUBJ] || horiz[SUBJ])) ||
                           (exists[SUBJ] && (parity[CLIP] || horiz[CLIP])) ||
                           (exists[CLIP] && exists[SUBJ] &&
                            (parity[CLIP] == parity[SUBJ]));
            br = (parity[CLIP]) && (parity[SUBJ]);
            bl = (parity[CLIP] ^ edge->bundle[ABOVE][CLIP]) &&
                 (parity[SUBJ] ^ edge->bundle[ABOVE][SUBJ]);
            tr = (parity[CLIP] ^ (horiz[CLIP] != NH)) &&
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH));
            tl = (parity[CLIP] ^ (horiz[CLIP] != NH) ^
                  edge->bundle[BELOW][CLIP]) &&
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH) ^
                  edge->bundle[BELOW][SUBJ]);
            break;
          case GPC_XOR:
            contributing = exists[CLIP] || exists[SUBJ];
            br = (parity[CLIP]) ^ (parity[SUBJ]);
            bl = (parity[CLIP] ^ edge->bundle[ABOVE][CLIP]) ^
                 (parity[SUBJ] ^ edge->bundle[ABOVE][SUBJ]);
            tr = (parity[CLIP] ^ (horiz[CLIP] != NH)) ^
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH));
            tl = (parity[CLIP] ^ (horiz[CLIP] != NH) ^
                  edge->bundle[BELOW][CLIP]) ^
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH) ^
                  edge->bundle[BELOW][SUBJ]);
            break;
          case GPC_UNION:
            contributing = (exists[CLIP] && (!parity[SUBJ] || horiz[SUBJ])) ||
                           (exists[SUBJ] && (!parity[CLIP] || horiz[CLIP])) ||
                           (exists[CLIP] && exists[SUBJ] &&
                            (parity[CLIP] == parity[SUBJ]));
            br = (parity[CLIP]) || (parity[SUBJ]);
            bl = (parity[CLIP] ^ edge->bundle[ABOVE][CLIP]) ||
                 (parity[SUBJ] ^ edge->bundle[ABOVE][SUBJ]);
            tr = (parity[CLIP] ^ (horiz[CLIP] != NH)) ||
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH));
            tl = (parity[CLIP] ^ (horiz[CLIP] != NH) ^
                  edge->bundle[BELOW][CLIP]) ||
                 (parity[SUBJ] ^ (horiz[SUBJ] != NH) ^
                  edge->bundle[BELOW][SUBJ]);
            break;
        }
        // Update parity
        parity[CLIP] ^= edge->bundle[ABOVE][CLIP];
        parity[SUBJ] ^= edge->bundle[ABOVE][SUBJ];
        /* Update horizontal state */
        if (exists[CLIP]) {
          horiz[CLIP] = next_h_state[horiz[CLIP]]
                                    [((exists[CLIP] - 1) << 1) + parity[CLIP]];
        }
        if (exists[SUBJ]) {
          horiz[SUBJ] = next_h_state[horiz[SUBJ]]
                                    [((exists[SUBJ] - 1) << 1) + parity[SUBJ]];
        }
        vclass = tr + (tl << 1) + (br << 2) + (bl << 3);

        if (contributing) {
          xb = edge->xb;
          switch (vclass) {
            case EMN:
              new_tristrip(&tlist, edge, xb, yb);
              cf = edge;
              break;
            case ERI:
              edge->outp[ABOVE] = cf->outp[ABOVE];
              if (xb != cf->xb) {
                gpc_vertex_create(edge, ABOVE, RIGHT, xb, yb);
              }
              cf = NULL;
              break;
            case ELI:
              gpc_vertex_create(edge, BELOW, LEFT, xb, yb);
              edge->outp[ABOVE] = NULL;
              cf = edge;
              break;
            case EMX:
              if (xb != cf->xb) {
                gpc_vertex_create(edge, BELOW, RIGHT, xb, yb);
              }
              edge->outp[ABOVE] = NULL;
              cf = NULL;
              break;
            case IMN:
              if (cft == LED) {
                if (cf->bot.y != yb) {
                  gpc_vertex_create(cf, BELOW, LEFT, cf->xb, yb);
                }
                new_tristrip(&tlist, cf, cf->xb, yb);
              }
              edge->outp[ABOVE] = cf->outp[ABOVE];
              gpc_vertex_create(edge, ABOVE, RIGHT, xb, yb);
              break;
            case ILI:
              new_tristrip(&tlist, edge, xb, yb);
              cf = edge;
              cft = ILI;
              break;
            case IRI:
              if (cft == LED) {
                if (cf->bot.y != yb) {
                  gpc_vertex_create(cf, BELOW, LEFT, cf->xb, yb);
                }
                new_tristrip(&tlist, cf, cf->xb, yb);
              }
              gpc_vertex_create(edge, BELOW, RIGHT, xb, yb);
              edge->outp[ABOVE] = NULL;
              break;
            case IMX:
              gpc_vertex_create(edge, BELOW, LEFT, xb, yb);
              edge->outp[ABOVE] = NULL;
              cft = IMX;
              break;
            case IMM:
              gpc_vertex_create(edge, BELOW, LEFT, xb, yb);
              edge->outp[ABOVE] = cf->outp[ABOVE];
              if (xb != cf->xb) {
                gpc_vertex_create(cf, ABOVE, RIGHT, xb, yb);
              }
              cf = edge;
              break;
            case EMM:
              gpc_vertex_create(edge, BELOW, RIGHT, xb, yb);
              edge->outp[ABOVE] = NULL;
              new_tristrip(&tlist, edge, xb, yb);
              cf = edge;
              break;
            case LED:
              if (edge->bot.y == yb) {
                gpc_vertex_create(edge, BELOW, LEFT, xb, yb);
              }
              edge->outp[ABOVE] = edge->outp[BELOW];
              cf = edge;
              cft = LED;
              break;
            case RED:
              edge->outp[ABOVE] = cf->outp[ABOVE];
              if (cft == LED) {
                if (cf->bot.y == yb) {
                  gpc_vertex_create(edge, BELOW, RIGHT, xb, yb);
                } else {
                  if (edge->bot.y == yb) {
                    gpc_vertex_create(cf, BELOW, LEFT, cf->xb, yb);
                    gpc_vertex_create(edge, BELOW, RIGHT, xb, yb);
                  }
                }
              } else {
                gpc_vertex_create(edge, BELOW, RIGHT, xb, yb);
                gpc_vertex_create(edge, ABOVE, RIGHT, xb, yb);
              }
              cf = NULL;
              break;
            default:
              break;
          } /* End of switch */
        }   /* End of contributing conditional */
      }     /* End of edge exists conditional */
    }  // End of AET loop

    /* Delete terminating edges from the AET, otherwise compute xt */
    for (edge = aet; edge; edge = edge->next) {
      if (edge->top.y == yb) {
        prev_edge = edge->prev;
        next_edge = edge->next;
        if (prev_edge) {
          prev_edge->next = next_edge;
        } else {
          aet = next_edge;
        }
        if (next_edge) {
          next_edge->prev = prev_edge;
        }
        /* Copy bundle head state to the adjacent tail edge if required */
        if ((edge->bstate[BELOW] == BUNDLE_HEAD) && prev_edge) {
          if (prev_edge->bstate[BELOW] == BUNDLE_TAIL) {
            prev_edge->outp[BELOW] = edge->outp[BELOW];
            prev_edge->bstate[BELOW] = UNBUNDLED;
            if (prev_edge->prev) {
              if (prev_edge->prev->bstate[BELOW] == BUNDLE_TAIL) {
                prev_edge->bstate[BELOW] = BUNDLE_HEAD;
              }
            }
          }
        }
      } else {
        if (edge->top.y == yt) {
          edge->xt = edge->top.x;
        } else {
          edge->xt = edge->bot.x + edge->dx * (yt - edge->bot.y);
        }
      }
    }

    if (scanbeam < sbt_entries) {
      /* === SCANBEAM INTERIOR PROCESSING ============================== */
      build_intersection_table(&it, aet, dy);
      /* Process each node in the intersection table */
      for (intersect = it; intersect; intersect = intersect->next) {
        e0 = intersect->ie[0];
        e1 = intersect->ie[1];
        /* Only generate output for contributing intersections */
        if ((e0->bundle[ABOVE][CLIP] || e0->bundle[ABOVE][SUBJ]) &&
            (e1->bundle[ABOVE][CLIP] || e1->bundle[ABOVE][SUBJ])) {
          p = e0->outp[ABOVE];
          q = e1->outp[ABOVE];
          ix = intersect->point.x;
          iy = intersect->point.y + yb;

          in[CLIP] = (e0->bundle[ABOVE][CLIP] && !e0->bside[CLIP]) ||
                     (e1->bundle[ABOVE][CLIP] && e1->bside[CLIP]) ||
                     (!e0->bundle[ABOVE][CLIP] && !e1->bundle[ABOVE][CLIP] &&
                      e0->bside[CLIP] && e1->bside[CLIP]);
          in[SUBJ] = (e0->bundle[ABOVE][SUBJ] && !e0->bside[SUBJ]) ||
                     (e1->bundle[ABOVE][SUBJ] && e1->bside[SUBJ]) ||
                     (!e0->bundle[ABOVE][SUBJ] && !e1->bundle[ABOVE][SUBJ] &&
                      e0->bside[SUBJ] && e1->bside[SUBJ]);

          switch (op) {
            // Determine quadrant occupancies
            case GPC_DIFF:
            case GPC_INT:
              tr = (in[CLIP]) && (in[SUBJ]);
              tl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP]) &&
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ]);
              br = (in[CLIP] ^ e0->bundle[ABOVE][CLIP]) &&
                   (in[SUBJ] ^ e0->bundle[ABOVE][SUBJ]);
              bl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP] ^
                    e0->bundle[ABOVE][CLIP]) &&
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ] ^
                    e0->bundle[ABOVE][SUBJ]);
              break;
            case GPC_XOR:
              tr = (in[CLIP]) ^ (in[SUBJ]);
              tl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP]) ^
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ]);
              br = (in[CLIP] ^ e0->bundle[ABOVE][CLIP]) ^
                   (in[SUBJ] ^ e0->bundle[ABOVE][SUBJ]);
              bl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP] ^
                    e0->bundle[ABOVE][CLIP]) ^
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ] ^
                    e0->bundle[ABOVE][SUBJ]);
              break;
            case GPC_UNION:
              tr = (in[CLIP]) || (in[SUBJ]);
              tl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP]) ||
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ]);
              br = (in[CLIP] ^ e0->bundle[ABOVE][CLIP]) ||
                   (in[SUBJ] ^ e0->bundle[ABOVE][SUBJ]);
              bl = (in[CLIP] ^ e1->bundle[ABOVE][CLIP] ^
                    e0->bundle[ABOVE][CLIP]) ||
                   (in[SUBJ] ^ e1->bundle[ABOVE][SUBJ] ^
                    e0->bundle[ABOVE][SUBJ]);
              break;
          }
          vclass = tr + (tl << 1) + (br << 2) + (bl << 3);

          switch (vclass) {
            case EMN:
              new_tristrip(&tlist, e1, ix, iy);
              e0->outp[ABOVE] = e1->outp[ABOVE];
              break;
            case ERI:
              if (p) {
                gpc_p_edge(prev_edge, e0, ABOVE);
                gpc_vertex_create(prev_edge, ABOVE, LEFT, px, iy);
                gpc_vertex_create(e0, ABOVE, RIGHT, ix, iy);
                e1->outp[ABOVE] = e0->outp[ABOVE];
                e0->outp[ABOVE] = NULL;
              }
              break;
            case ELI:
              if (q) {
                gpc_n_edge(next_edge, e1, ABOVE);
                gpc_vertex_create(e1, ABOVE, LEFT, ix, iy);
                gpc_vertex_create(next_edge, ABOVE, RIGHT, nx, iy);
                e0->outp[ABOVE] = e1->outp[ABOVE];
                e1->outp[ABOVE] = NULL;
              }
              break;
            case EMX:
              if (p && q) {
                gpc_vertex_create(e0, ABOVE, LEFT, ix, iy);
                e0->outp[ABOVE] = NULL;
                e1->outp[ABOVE] = NULL;
              }
              break;
            case IMN:
              gpc_p_edge(prev_edge, e0, ABOVE);
              gpc_vertex_create(prev_edge, ABOVE, LEFT, px, iy);
              gpc_n_edge(next_edge, e1, ABOVE);
              gpc_vertex_create(next_edge, ABOVE, RIGHT, nx, iy);
              new_tristrip(&tlist, prev_edge, px, iy);
              e1->outp[ABOVE] = prev_edge->outp[ABOVE];
              gpc_vertex_create(e1, ABOVE, RIGHT, ix, iy);
              new_tristrip(&tlist, e0, ix, iy);
              next_edge->outp[ABOVE] = e0->outp[ABOVE];
              gpc_vertex_create(next_edge, ABOVE, RIGHT, nx, iy);
              break;
            case ILI:
              if (p) {
                gpc_vertex_create(e0, ABOVE, LEFT, ix, iy);
                gpc_n_edge(next_edge, e1, ABOVE);
                gpc_vertex_create(next_edge, ABOVE, RIGHT, nx, iy);
                e1->outp[ABOVE] = e0->outp[ABOVE];
                e0->outp[ABOVE] = NULL;
              }
              break;
            case IRI:
              if (q) {
                gpc_vertex_create(e1, ABOVE, RIGHT, ix, iy);
                gpc_p_edge(prev_edge, e0, ABOVE);
                gpc_vertex_create(prev_edge, ABOVE, LEFT, px, iy);
                e0->outp[ABOVE] = e1->outp[ABOVE];
                e1->outp[ABOVE] = NULL;
              }
              break;
            case IMX:
              if (p && q) {
                gpc_vertex_create(e0, ABOVE, RIGHT, ix, iy);
                gpc_vertex_create(e1, ABOVE, LEFT, ix, iy);
                e0->outp[ABOVE] = NULL;
                e1->outp[ABOVE] = NULL;
                gpc_p_edge(prev_edge, e0, ABOVE);
                gpc_vertex_create(prev_edge, ABOVE, LEFT, px, iy);
                new_tristrip(&tlist, prev_edge, px, iy);
                gpc_n_edge(next_edge, e1, ABOVE);
                gpc_vertex_create(next_edge, ABOVE, RIGHT, nx, iy);
                next_edge->outp[ABOVE] = prev_edge->outp[ABOVE];
                gpc_vertex_create(next_edge, ABOVE, RIGHT, nx, iy);
              }
              break;
            case IMM:
              if (p && q) {
                gpc_vertex_create(e0, ABOVE, RIGHT, ix, iy);
                gpc_vertex_create(e1, ABOVE, LEFT, ix, iy);
                gpc_p_edge(prev_edge, e0, ABOVE);
                gpc_vertex_create(prev_edge, ABOVE, LEFT, px, iy);
                new_tristrip(&tlist, prev_edge, px, iy);
                gpc_n_edge(next_edge, e1, ABOVE);
                gpc_vertex_create(next_edge, ABOVE, RIGHT, nx, iy);
                e1->outp[ABOVE] = prev_edge->outp[ABOVE];
                gpc_vertex_create(e1, ABOVE, RIGHT, ix, iy);
                new_tristrip(&tlist, e0, ix, iy);
                next_edge->outp[ABOVE] = e0->outp[ABOVE];
                gpc_vertex_create(next_edge, ABOVE, RIGHT, nx, iy);
              }
              break;
            case EMM:
              if (p && q) {
                gpc_vertex_create(e0, ABOVE, LEFT, ix, iy);
                new_tristrip(&tlist, e1, ix, iy);
                e0->outp[ABOVE] = e1->outp[ABOVE];
              }
              break;
            default:
              break;
          } /* End of switch */
        }   /* End of contributing intersection conditional */

        // Swap bundle sides in response to edge crossing
        if (e0->bundle[ABOVE][CLIP]) {
          e1->bside[CLIP] = !e1->bside[CLIP];
        }
        if (e1->bundle[ABOVE][CLIP]) {
          e0->bside[CLIP] = !e0->bside[CLIP];
        }
        if (e0->bundle[ABOVE][SUBJ]) {
          e1->bside[SUBJ] = !e1->bside[SUBJ];
        }
        if (e1->bundle[ABOVE][SUBJ]) {
          e0->bside[SUBJ] = !e0->bside[SUBJ];
        }

        /* Swap e0 and e1 bundles in the AET */
        prev_edge = e0->prev;
        next_edge = e1->next;
        if (e1->next) {
          e1->next->prev = e0;
        }
        if (e0->bstate[ABOVE] == BUNDLE_HEAD) {
          search = 1;
          while (search) {
            prev_edge = prev_edge->prev;
            if (prev_edge) {
              if (prev_edge->bundle[ABOVE][CLIP] ||
                  prev_edge->bundle[ABOVE][SUBJ] ||
                  (prev_edge->bstate[ABOVE] == BUNDLE_HEAD)) {
                search = 0;
              }
            } else {
              search = 0;
            }
          }
        }
        if (!prev_edge) {
          e1->next = aet;
          aet = e0->next;
        } else {
          e1->next = prev_edge->next;
          prev_edge->next = e0->next;
        }
        e0->next->prev = prev_edge;
        e1->next->prev = e1;
        e0->next = next_edge;
      } /* End of IT loop*/

      /* Prepare for next scanbeam */
      for (edge = aet; edge; edge = next_edge) {
        next_edge = edge->next;
        succ_edge = edge->succ;
        if ((edge->top.y == yt) && succ_edge) {
          /* Replace AET edge by its successor */
          succ_edge->outp[BELOW] = edge->outp[ABOVE];
          succ_edge->bstate[BELOW] = edge->bstate[ABOVE];
          succ_edge->bundle[BELOW][CLIP] = edge->bundle[ABOVE][CLIP];
          succ_edge->bundle[BELOW][SUBJ] = edge->bundle[ABOVE][SUBJ];
          prev_edge = edge->prev;
          if (prev_edge) {
            prev_edge->next = succ_edge;
          } else {
            aet = succ_edge;
          }
          if (next_edge) {
            next_edge->prev = succ_edge;
          }
          succ_edge->prev = prev_edge;
          succ_edge->next = next_edge;
        } else {
          /* Update this edge */
          edge->outp[BELOW] = edge->outp[ABOVE];
          edge->bstate[BELOW] = edge->bstate[ABOVE];
          edge->bundle[BELOW][CLIP] = edge->bundle[ABOVE][CLIP];
          edge->bundle[BELOW][SUBJ] = edge->bundle[ABOVE][SUBJ];
          edge->xb = edge->xt;
        }
        edge->outp[ABOVE] = NULL;
      }
    }
  }
  /* === END OF SCANBEAM PROCESSING ================================== */

  // Generate result tristrip from tlist
  result->strip = NULL;
  result->num_strips = count_tristrips(tlist);
  if (result->num_strips > 0) {
    gpc_malloc<gpc_vertex_list>(result->strip,
                                result->num_strips * sizeof(gpc_vertex_list),
                                const_cast<char *>("tristrip list creation"));
    s = 0;
    for (tn = tlist; tn; tn = tnn) {
      tnn = tn->next;
      if (tn->active > 2) {
        /* Valid tristrip: copy the vertices and free the heap */
        result->strip[s].num_vertices = tn->active;
        gpc_malloc<gpc_vertex>(result->strip[s].vertex,
                               tn->active * sizeof(gpc_vertex),
                               const_cast<char *>("tristrip creation"));
        v = 0;
        if (0) {
          lt = tn->v[RIGHT];
          rt = tn->v[LEFT];
        } else {
          lt = tn->v[LEFT];
          rt = tn->v[RIGHT];
        }
        while (lt || rt) {
          if (lt) {
            ltn = lt->next;
            result->strip[s].vertex[v].x = lt->x;
            result->strip[s].vertex[v].y = lt->y;
            v++;
            gpc_free<vertex_node>(lt);
            lt = ltn;
          }
          if (rt) {
            rtn = rt->next;
            result->strip[s].vertex[v].x = rt->x;
            result->strip[s].vertex[v].y = rt->y;
            v++;
            gpc_free<vertex_node>(rt);
            rt = rtn;
          }
        }
        s++;
      } else {
        /* Invalid tristrip: just free the heap */
        for (lt = tn->v[LEFT]; lt; lt = ltn) {
          ltn = lt->next;
          gpc_free<vertex_node>(lt);
        }
        for (rt = tn->v[RIGHT]; rt; rt = rtn) {
          rtn = rt->next;
          gpc_free<vertex_node>(rt);
        }
      }
      gpc_free<polygon_node>(tn);
    }
  }

  // Tidy up
  reset_it(&it);
  reset_lmt(&lmt);
  gpc_free<edge_node>(c_heap);
  gpc_free<edge_node>(s_heap);
  gpc_free<double>(sbt);
}  // NOLINT
}  // namespace gpc
/* vim: set expandtab ts=4 sw=4 sts=4 tw=100: */
paddle/fluid/operators/detection/gpc.h
0 → 100644
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/***************************************************************************
*
* Copyright (c) 2015 Baidu.com, Inc. All Rights Reserved
*
**************************************************************************/
/**
* @file include/gpc.h
* @author huhan02(com@baidu.com)
* @date 2015/12/18 13:52:10
* @brief
*
* @modified by sunyipeng
* @email sunyipeng@baidu.com
* @date 2018/6/12
**/
#ifndef PADDLE_FLUID_OPERATORS_DETECTION_GPC_H_ // GPC_H_
#define PADDLE_FLUID_OPERATORS_DETECTION_GPC_H_ // GPC_H_
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
namespace gpc {
typedef enum {  // Set operation type
  GPC_DIFF,     // Difference
  GPC_INT,      // Intersection
  GPC_XOR,      // Exclusive or
  GPC_UNION     // Union
} gpc_op;

typedef struct {  // Polygon vertex structure
  double x;       // Vertex x component
  double y;       // Vertex y component
} gpc_vertex;

typedef struct {       // Vertex list structure
  int num_vertices;    // Number of vertices in list
  gpc_vertex *vertex;  // Vertex array pointer
} gpc_vertex_list;

typedef struct {             // Polygon set structure
  int num_contours;          // Number of contours in polygon
  int *hole;                 // Hole / external contour flags
  gpc_vertex_list *contour;  // Contour array pointer
} gpc_polygon;
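/* Layout sketch (illustrative, not from the original header): a square with
   a square hole is a single gpc_polygon holding two contours, e.g.

   gpc_polygon p;
   p.num_contours = 2;
   p.hole = flags;        // flags[0] == 0 (outer ring), flags[1] == 1 (hole)
   p.contour = contours;  // contours[0] outer ring, contours[1] inner ring

   where `flags` and `contours` stand for caller-allocated arrays. */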
typedef struct {           // Tristrip set structure
  int num_strips;          // Number of tristrips
  gpc_vertex_list *strip;  // Tristrip array pointer
} gpc_tristrip;

typedef enum { LEFT, RIGHT } gpc_left_right;

typedef enum { ABOVE, BELOW } gpc_above_below;

typedef enum { CLIP, SUBJ } gpc_clip_subj;
typedef enum {
  /* Edge intersection classes */
  NUL, /* Empty non-intersection */
  EMX, /* External maximum */
  ELI, /* External left intermediate */
  TED, /* Top edge */
  ERI, /* External right intermediate */
  RED, /* Right edge */
  IMM, /* Internal maximum and minimum */
  IMN, /* Internal minimum */
  EMN, /* External minimum */
  EMM, /* External maximum and minimum */
  LED, /* Left edge */
  ILI, /* Internal left intermediate */
  BED, /* Bottom edge */
  IRI, /* Internal right intermediate */
  IMX, /* Internal maximum */
  FUL  /* Full non-intersection */
} vertex_type;

typedef enum {
  /* Horizontal edge states */
  NH, /* No horizontal edge */
  BH, /* Bottom horizontal edge */
  TH  /* Top horizontal edge */
} h_state;

typedef enum {
  /* Edge bundle state */
  UNBUNDLED,   /* Isolated edge not within a bundle */
  BUNDLE_HEAD, /* Bundle head node */
  BUNDLE_TAIL  /* Passive bundle tail node */
} bundle_state;
typedef
struct
v_shape
{
/* Internal vertex list datatype */
double
x
;
/* X coordinate component */
double
y
;
/* Y coordinate component */
struct
v_shape
*
next
;
/* Pointer to next vertex in list */
}
vertex_node
;
typedef
struct
p_shape
{
/* Internal contour / tristrip type */
int
active
;
/* Active flag / vertex count */
int
hole
;
/* Hole / external contour flag */
vertex_node
*
v
[
2
];
/* Left and right vertex list ptrs */
struct
p_shape
*
next
;
/* Pointer to next polygon contour */
struct
p_shape
*
proxy
;
/* Pointer to actual structure used */
}
polygon_node
;
typedef
struct
edge_shape
{
gpc_vertex
vertex
;
/* Piggy-backed contour vertex data */
gpc_vertex
bot
;
/* Edge lower (x, y) coordinate */
gpc_vertex
top
;
/* Edge upper (x, y) coordinate */
double
xb
;
/* Scanbeam bottom x coordinate */
double
xt
;
/* Scanbeam top x coordinate */
double
dx
;
/* Change in x for a unit y increase */
int
type
;
/* Clip / subject edge flag */
int
bundle
[
2
][
2
];
/* Bundle edge flags */
int
bside
[
2
];
/* Bundle left / right indicators */
bundle_state
bstate
[
2
];
/* Edge bundle state */
polygon_node
*
outp
[
2
];
/* Output polygon / tristrip pointer */
struct
edge_shape
*
prev
;
/* Previous edge in the AET */
struct
edge_shape
*
next
;
/* Next edge in the AET */
struct
edge_shape
*
pred
;
/* Edge connected at the lower end */
struct
edge_shape
*
succ
;
/* Edge connected at the upper end */
struct
edge_shape
*
next_bound
;
/* Pointer to next bound in LMT */
}
edge_node
;
inline
bool
gpc_eq
(
float
a
,
float
b
)
{
return
(
fabs
(
a
-
b
)
<=
1e-6
);
}
inline
bool
gpc_prev_index
(
float
a
,
float
b
)
{
return
(
fabs
(
a
-
b
)
<=
1e-6
);
}
inline
int
gpc_prev_index
(
int
i
,
int
n
)
{
return
((
i
-
1
+
n
)
%
n
);
}
inline
int
gpc_next_index
(
int
i
,
int
n
)
{
return
((
i
+
1
)
%
n
);
}
inline
int
gpc_optimal
(
gpc_vertex
*
v
,
int
i
,
int
n
)
{
return
(
v
[(
i
+
1
)
%
n
].
y
!=
v
[
i
].
y
||
v
[(
i
-
1
+
n
)
%
n
].
y
!=
v
[
i
].
y
);
}
inline
int
gpc_fwd_min
(
edge_node
*
v
,
int
i
,
int
n
)
{
return
(
v
[(
i
+
1
)
%
n
].
vertex
.
y
>
v
[
i
].
vertex
.
y
&&
v
[(
i
-
1
+
n
)
%
n
].
vertex
.
y
>=
v
[
i
].
vertex
.
y
);
}
inline
int
gpc_not_fmax
(
edge_node
*
v
,
int
i
,
int
n
)
{
return
(
v
[(
i
+
1
)
%
n
].
vertex
.
y
>
v
[
i
].
vertex
.
y
);
}
inline
int
gpc_rev_min
(
edge_node
*
v
,
int
i
,
int
n
)
{
return
(
v
[(
i
+
1
)
%
n
].
vertex
.
y
>=
v
[
i
].
vertex
.
y
&&
v
[(
i
-
1
+
n
)
%
n
].
vertex
.
y
>
v
[
i
].
vertex
.
y
);
}
inline
int
gpc_not_rmax
(
edge_node
*
v
,
int
i
,
int
n
)
{
return
(
v
[(
i
-
1
+
n
)
%
n
].
vertex
.
y
>
v
[
i
].
vertex
.
y
);
}
// inline void gpc_p_edge(edge_node *d, edge_node *e, int p, double i, double j)
// {
inline
void
gpc_p_edge
(
edge_node
*
d
,
edge_node
*
e
,
int
p
)
{
d
=
e
;
do
{
d
=
d
->
prev
;
}
while
(
!
d
->
outp
[
p
]);
// i = d->bot.x + d->dx * (j - d->bot.y);
}
// inline void gpc_n_edge(edge_node *d, edge_node *e, int p, double i, double j)
// {
inline
void
gpc_n_edge
(
edge_node
*
d
,
edge_node
*
e
,
int
p
)
{
d
=
e
;
do
{
d
=
d
->
next
;
}
while
(
!
d
->
outp
[
p
]);
// i = d->bot.x + d->dx * (j - d->bot.y);
}
template
<
typename
T
>
void
gpc_malloc
(
T
*&
p
,
int
b
,
char
*
s
)
{
if
(
b
>
0
)
{
p
=
(
T
*
)
malloc
(
b
);
if
(
!
p
)
{
fprintf
(
stderr
,
"gpc malloc failure: %s
\n
"
,
s
);
exit
(
0
);
}
}
else
{
p
=
NULL
;
}
}
template
<
typename
T
>
void
gpc_free
(
T
*&
p
)
{
if
(
p
)
{
free
(
p
);
p
=
NULL
;
}
}
/*
===========================================================================
Public Function Prototypes
===========================================================================
*/
void
add_vertex
(
vertex_node
**
t
,
double
x
,
double
y
);
void
gpc_vertex_create
(
edge_node
*
e
,
int
p
,
int
s
,
double
x
,
double
y
);
/*
void gpc_read_polygon(FILE *infile_ptr, int read_hole_flags,
gpc_polygon *polygon);
void gpc_write_polygon(FILE *outfile_ptr, int write_hole_flags,
gpc_polygon *polygon);
*/
void
gpc_add_contour
(
gpc_polygon
*
polygon
,
gpc_vertex_list
*
contour
,
int
hole
);
void
gpc_polygon_clip
(
gpc_op
set_operation
,
gpc_polygon
*
subject_polygon
,
gpc_polygon
*
clip_polygon
,
gpc_polygon
*
result_polygon
);
void
gpc_tristrip_clip
(
gpc_op
set_operation
,
gpc_polygon
*
subject_polygon
,
gpc_polygon
*
clip_polygon
,
gpc_tristrip
*
result_tristrip
);
void
gpc_polygon_to_tristrip
(
gpc_polygon
*
polygon
,
gpc_tristrip
*
tristrip
);
void
gpc_free_polygon
(
gpc_polygon
*
polygon
);
void
gpc_free_tristrip
(
gpc_tristrip
*
tristrip
);
}
// namespace gpc
#endif // PADDLE_FLUID_OPERATORS_DETECTION_GPC_H_
/* vim: set expandtab ts=4 sw=4 sts=4 tw=100: */
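The header ends with the public entry points. As a quick orientation, here is a minimal, hypothetical usage sketch (not part of the commit) that intersects two unit squares with GPC_INT; it assumes only the prototypes and structures declared above.

#include "paddle/fluid/operators/detection/gpc.h"

// Intersect two axis-aligned unit squares offset by (0.5, 0.5).
// A hole flag of 0 marks an external contour.
void IntersectExample() {
  gpc::gpc_vertex a[4] = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};
  gpc::gpc_vertex b[4] = {{0.5, 0.5}, {1.5, 0.5}, {1.5, 1.5}, {0.5, 1.5}};
  gpc::gpc_vertex_list la = {4, a};
  gpc::gpc_vertex_list lb = {4, b};

  gpc::gpc_polygon pa = {0, NULL, NULL}, pb = {0, NULL, NULL}, res;
  gpc::gpc_add_contour(&pa, &la, 0);
  gpc::gpc_add_contour(&pb, &lb, 0);

  gpc::gpc_polygon_clip(gpc::GPC_INT, &pa, &pb, &res);
  // res now holds a single contour: the 0.5 x 0.5 overlap square.

  gpc::gpc_free_polygon(&pa);
  gpc::gpc_free_polygon(&pb);
  gpc::gpc_free_polygon(&res);
}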
paddle/fluid/operators/detection/multiclass_nms_op.cc
@@ -9,10 +9,11 @@ http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
 
 #include "paddle/fluid/framework/op_registry.h"
+#include "paddle/fluid/operators/detection/poly_util.h"
 
 namespace paddle {
 namespace operators {

@@ -20,9 +21,6 @@ namespace operators {
 using Tensor = framework::Tensor;
 using LoDTensor = framework::LoDTensor;
 
-constexpr int64_t kOutputDim = 6;
-constexpr int64_t kBBoxSize = 4;
-
 class MultiClassNMSOp : public framework::OperatorWithKernel {
  public:
   using framework::OperatorWithKernel::OperatorWithKernel;

@@ -42,10 +40,15 @@ class MultiClassNMSOp : public framework::OperatorWithKernel {
                       "The rank of Input(BBoxes) must be 3.");
     PADDLE_ENFORCE_EQ(score_dims.size(), 3,
                       "The rank of Input(Scores) must be 3.");
-    PADDLE_ENFORCE_EQ(box_dims[2], 4,
-                      "The 2nd dimension of Input(BBoxes) must be 4, "
-                      "represents the layout of coordinate "
-                      "[xmin, ymin, xmax, ymax]");
+    PADDLE_ENFORCE(box_dims[2] == 4 || box_dims[2] == 8 || box_dims[2] == 16 ||
+                       box_dims[2] == 24 || box_dims[2] == 32,
+                   "The 2nd dimension of Input(BBoxes) must be 4 or 8, "
+                   "represents the layout of coordinate "
+                   "[xmin, ymin, xmax, ymax] or "
+                   "4 points: [x1, y1, x2, y2, x3, y3, x4, y4] or "
+                   "8 points: [xi, yi] i= 1,2,...,8 or "
+                   "12 points: [xi, yi] i= 1,2,...,12 or "
+                   "16 points: [xi, yi] i= 1,2,...,16");
     PADDLE_ENFORCE_EQ(box_dims[1], score_dims[2],
                       "The 1st dimension of Input(BBoxes) must be equal to "
                       "3rd dimension of Input(Scores), which represents the "

@@ -53,7 +56,7 @@ class MultiClassNMSOp : public framework::OperatorWithKernel {
     // Here the box_dims[0] is not the real dimension of output.
     // It will be rewritten in the computing kernel.
-    ctx->SetOutputDim("Out", {box_dims[1], 6});
+    ctx->SetOutputDim("Out", {box_dims[1], box_dims[2] + 2});
   }
 
  protected:

@@ -128,6 +131,21 @@ static inline T JaccardOverlap(const T* box1, const T* box2,
   }
 }
 
+template <class T>
+T PolyIoU(const T* box1, const T* box2, const size_t box_size,
+          const bool normalized) {
+  T bbox1_area = PolyArea<T>(box1, box_size, normalized);
+  T bbox2_area = PolyArea<T>(box2, box_size, normalized);
+  T inter_area = PolyOverlapArea<T>(box1, box2, box_size, normalized);
+  if (bbox1_area == 0. || bbox2_area == 0. || inter_area == 0.) {
+    // If coordinate values are invalid
+    // or any area size <= 0, return 0.
+    return T(0.);
+  } else {
+    return inter_area / (bbox1_area + bbox2_area - inter_area);
+  }
+}
+
 template <typename T>
 class MultiClassNMSKernel : public framework::OpKernel<T> {
  public:

@@ -137,6 +155,8 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
     // The total boxes for each instance.
     int64_t num_boxes = bbox.dims()[0];
     // 4: [xmin ymin xmax ymax]
+    // 8: [x1 y1 x2 y2 x3 y3 x4 y4]
+    // 16, 24, or 32: [x1 y1 x2 y2 ...  xn yn], n = 8, 12 or 16
     int64_t box_size = bbox.dims()[1];
 
     std::vector<T> scores_data(num_boxes);

@@ -154,8 +174,19 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
     for (size_t k = 0; k < selected_indices->size(); ++k) {
       if (keep) {
         const int kept_idx = (*selected_indices)[k];
-        T overlap = JaccardOverlap<T>(bbox_data + idx * box_size,
-                                      bbox_data + kept_idx * box_size, true);
+        T overlap = T(0.);
+        // 4: [xmin ymin xmax ymax]
+        if (box_size == 4) {
+          overlap = JaccardOverlap<T>(bbox_data + idx * box_size,
+                                      bbox_data + kept_idx * box_size, true);
+        }
+        // 8: [x1 y1 x2 y2 x3 y3 x4 y4] or 16, 24, 32
+        if (box_size == 8 || box_size == 16 || box_size == 24 ||
+            box_size == 32) {
+          overlap = PolyIoU<T>(bbox_data + idx * box_size,
+                               bbox_data + kept_idx * box_size, box_size,
+                               true);
+        }
         keep = overlap <= adaptive_threshold;
       } else {
         break;

@@ -228,7 +259,9 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
   void MultiClassOutput(const Tensor& scores, const Tensor& bboxes,
                         const std::map<int, std::vector<int>>& selected_indices,
                         Tensor* outs) const {
-    int predict_dim = scores.dims()[1];
+    int64_t predict_dim = scores.dims()[1];
+    int64_t box_size = bboxes.dims()[1];
+    int64_t out_dim = bboxes.dims()[1] + 2;
     auto* scores_data = scores.data<T>();
     auto* bboxes_data = bboxes.data<T>();
     auto* odata = outs->data<T>();

@@ -240,11 +273,11 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
       const std::vector<int>& indices = it.second;
       for (size_t j = 0; j < indices.size(); ++j) {
         int idx = indices[j];
-        const T* bdata = bboxes_data + idx * kBBoxSize;
-        odata[count * kOutputDim] = label;           // label
-        odata[count * kOutputDim + 1] = sdata[idx];  // score
-        // xmin, ymin, xmax, ymax
-        std::memcpy(odata + count * kOutputDim + 2, bdata, 4 * sizeof(T));
+        const T* bdata = bboxes_data + idx * box_size;
+        odata[count * out_dim] = label;           // label
+        odata[count * out_dim + 1] = sdata[idx];  // score
+        // xmin, ymin, xmax, ymax or multi-points coordinates
+        std::memcpy(odata + count * out_dim + 2, bdata, box_size * sizeof(T));
         count++;
       }
     }

@@ -261,6 +294,7 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
     int64_t class_num = score_dims[1];
     int64_t predict_dim = score_dims[2];
     int64_t box_dim = boxes->dims()[2];
+    int64_t out_dim = boxes->dims()[2] + 2;
 
     std::vector<std::map<int, std::vector<int>>> all_indices;
     std::vector<size_t> batch_starts = {0};

@@ -283,7 +317,7 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
       T* od = outs->mutable_data<T>({1}, ctx.GetPlace());
       od[0] = -1;
     } else {
-      outs->mutable_data<T>({num_kept, kOutputDim}, ctx.GetPlace());
+      outs->mutable_data<T>({num_kept, out_dim}, ctx.GetPlace());
       for (int64_t i = 0; i < batch_size; ++i) {
         Tensor ins_score = scores->Slice(i, i + 1);
         ins_score.Resize({class_num, predict_dim});

@@ -311,10 +345,11 @@ class MultiClassNMSOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
  void Make() override {
    AddInput("BBoxes",
-             "(Tensor) A 3-D Tensor with shape [N, M, 4] represents the "
+             "(Tensor) A 3-D Tensor with shape "
+             "[N, M, 4 or 8 16 24 32] represents the "
             "predicted locations of M bounding bboxes, N is the batch size. "
             "Each bounding box has four coordinate values and the layout is "
-             "[xmin, ymin, xmax, ymax].");
+             "[xmin, ymin, xmax, ymax], when box size equals to 4.");
    AddInput("Scores",
             "(Tensor) A 3-D Tensor with shape [N, C, M] represents the "
             "predicted confidence predictions. N is the batch size, C is the "

@@ -351,8 +386,12 @@ class MultiClassNMSOpMaker : public framework::OpProtoAndCheckerMaker {
    AddOutput("Out",
              "(LoDTensor) A 2-D LoDTensor with shape [No, 6] represents the "
              "detections. Each row has 6 values: "
-              "[label, confidence, xmin, ymin, xmax, ymax], No is the total "
-              "number of detections in this mini-batch. For each instance, "
+              "[label, confidence, xmin, ymin, xmax, ymax] or "
+              "(LoDTensor) A 2-D LoDTensor with shape [No, 10] represents the "
+              "detections. Each row has 10 values: "
+              "[label, confidence, x1, y1, x2, y2, x3, y3, x4, y4]. No is the "
+              "total number of detections in this mini-batch. "
+              "For each instance, "
              "the offsets in first dimension are called LoD, the number of "
              "offset is N + 1, if LoD[i + 1] - LoD[i] == 0, means there is "
              "no detected bbox.");
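As a sanity check on the PolyIoU computation added above, consider two unit squares offset by (0.5, 0.5): they overlap in a 0.5 x 0.5 region, so IoU = 0.25 / (1 + 1 - 0.25) = 1/7, roughly 0.143. The hypothetical snippet below (not part of the commit) reproduces exactly that arithmetic with the poly_util helpers PolyIoU builds on.

#include <cstdio>
#include "paddle/fluid/operators/detection/poly_util.h"

// Two quads given as [x1 y1 x2 y2 x3 y3 x4 y4]; expected iou = 1/7.
void PolyIoUExample() {
  const float box1[8] = {0, 0, 1, 0, 1, 1, 0, 1};
  const float box2[8] = {0.5, 0.5, 1.5, 0.5, 1.5, 1.5, 0.5, 1.5};
  float a1 = paddle::operators::PolyArea<float>(box1, 8, true);
  float a2 = paddle::operators::PolyArea<float>(box2, 8, true);
  float inter =
      paddle::operators::PolyOverlapArea<float>(box1, box2, 8, true);
  // a1 = a2 = 1, inter = 0.25, iou = 0.25 / 1.75
  std::printf("iou = %f\n", inter / (a1 + a2 - inter));
}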
paddle/fluid/operators/detection/poly_util.cc
0 → 100644
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef POLY_UTIL_CC_
#define POLY_UTIL_CC_

#include "paddle/fluid/operators/detection/poly_util.h"
#include "paddle/fluid/framework/op_registry.h"

namespace paddle {
namespace operators {

using gpc::gpc_polygon_clip;
using gpc::gpc_free_polygon;

template <class T>
void Array2PointVec(const T*& box, const size_t box_size,
                    std::vector<Point_<T>>& vec) {
  size_t pts_num = box_size / 2;
  vec.resize(pts_num);
  for (size_t i = 0; i < pts_num; i++) {
    vec.at(i).x = box[2 * i];
    vec.at(i).y = box[2 * i + 1];
  }
}

template <class T>
void Array2Poly(const T*& box, const size_t box_size, gpc::gpc_polygon& poly) {
  size_t pts_num = box_size / 2;
  poly.num_contours = 1;
  poly.hole = (int*)malloc(sizeof(int));
  poly.hole[0] = 0;
  poly.contour = (gpc::gpc_vertex_list*)malloc(sizeof(gpc::gpc_vertex_list));
  poly.contour->num_vertices = pts_num;
  poly.contour->vertex =
      (gpc::gpc_vertex*)malloc(sizeof(gpc::gpc_vertex) * pts_num);
  for (size_t i = 0; i < pts_num; ++i) {
    poly.contour->vertex[i].x = box[2 * i];
    poly.contour->vertex[i].y = box[2 * i + 1];
  }
}

template <class T>
void PointVec2Poly(const std::vector<Point_<T>>& vec, gpc::gpc_polygon& poly) {
  int pts_num = vec.size();
  poly.num_contours = 1;
  poly.hole = (int*)malloc(sizeof(int));
  poly.hole[0] = 0;
  poly.contour = (gpc::gpc_vertex_list*)malloc(sizeof(gpc::gpc_vertex_list));
  poly.contour->num_vertices = pts_num;
  poly.contour->vertex =
      (gpc::gpc_vertex*)malloc(sizeof(gpc::gpc_vertex) * pts_num);
  for (size_t i = 0; i < pts_num; ++i) {
    poly.contour->vertex[i].x = vec[i].x;
    poly.contour->vertex[i].y = vec[i].y;
  }
}

template <class T>
void Poly2PointVec(const gpc::gpc_vertex_list& contour,
                   std::vector<Point_<T>>& vec) {
  int pts_num = contour.num_vertices;
  vec.resize(pts_num);
  for (int i = 0; i < pts_num; i++) {
    vec.at(i).x = contour.vertex[i].x;
    vec.at(i).y = contour.vertex[i].y;
  }
}

template <class T>
T GetContourArea(std::vector<Point_<T>>& vec) {
  size_t pts_num = vec.size();
  if (pts_num < 3) return T(0.);
  T area = T(0.);
  for (size_t i = 0; i < pts_num; ++i) {
    area += vec[i].x * vec[(i + 1) % pts_num].y -
            vec[i].y * vec[(i + 1) % pts_num].x;
  }
  return std::fabs(area / 2.0);
}

template <class T>
T PolyArea(const T* box, const size_t box_size, const bool normalized) {
  // If coordinate values are invalid
  // or the area size <= 0, return 0.
  std::vector<Point_<T>> vec;
  Array2PointVec<T>(box, box_size, vec);
  return GetContourArea<T>(vec);
}

template <class T>
T PolyOverlapArea(const T* box1, const T* box2, const size_t box_size,
                  const bool normalized) {
  gpc::gpc_polygon poly1;
  gpc::gpc_polygon poly2;
  Array2Poly<T>(box1, box_size, poly1);
  Array2Poly<T>(box2, box_size, poly2);
  gpc::gpc_polygon respoly;
  gpc::gpc_op op = gpc::GPC_INT;
  gpc::gpc_polygon_clip(op, &poly2, &poly1, &respoly);

  T inter_area = T(0.);
  int contour_num = respoly.num_contours;
  for (int i = 0; i < contour_num; ++i) {
    std::vector<Point_<T>> resvec;
    Poly2PointVec<T>(respoly.contour[i], resvec);
    // inter_area += std::fabs(cv::contourArea(resvec)) + 0.5f *
    // (cv::arcLength(resvec, true));
    inter_area += GetContourArea<T>(resvec);
  }

  gpc::gpc_free_polygon(&poly1);
  gpc::gpc_free_polygon(&poly2);
  gpc::gpc_free_polygon(&respoly);
  return inter_area;
}

}  // namespace operators
}  // namespace paddle

#endif
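GetContourArea above is the standard shoelace formula. For a closed contour with vertices (x_i, y_i), i = 0, ..., n-1 and indices taken modulo n, it computes

\[ A = \frac{1}{2}\left|\sum_{i=0}^{n-1}\bigl(x_i\,y_{i+1} - y_i\,x_{i+1}\bigr)\right| \]

For the unit square (0,0), (1,0), (1,1), (0,1) the signed sum is 2, giving area 1, which matches the final fabs(area / 2.0) in the code.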
paddle/fluid/operators/detection/poly_util.h
0 → 100644
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef POLY_UTIL_H_
#define POLY_UTIL_H_

#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/detection/gpc.h"

namespace paddle {
namespace operators {

template <class T>
class Point_ {
 public:
  // default constructor
  Point_() {}
  Point_(T _x, T _y) {}
  Point_(const Point_& pt) {}

  Point_& operator=(const Point_& pt);
  // conversion to another data type
  // template<typename _T> operator Point_<_T>() const;
  // conversion to the old-style C structures
  // operator Vec<T, 2>() const;

  // checks whether the point is inside the specified rectangle
  // bool inside(const Rect_<T>& r) const;
  T x;  //!< x coordinate of the point
  T y;  //!< y coordinate of the point
};

template <class T>
void Array2PointVec(const T*& box, const size_t box_size,
                    std::vector<Point_<T>>& vec);

template <class T>
void Array2Poly(const T*& box, const size_t box_size, gpc::gpc_polygon& poly);

template <class T>
void PointVec2Poly(const std::vector<Point_<T>>& vec, gpc::gpc_polygon& poly);

template <class T>
void Poly2PointVec(const gpc::gpc_vertex_list& contour,
                   std::vector<Point_<T>>& vec);

template <class T>
T GetContourArea(std::vector<Point_<T>>& vec);

template <class T>
T PolyArea(const T* box, const size_t box_size, const bool normalized);

template <class T>
T PolyOverlapArea(const T* box1, const T* box2, const size_t box_size,
                  const bool normalized);

}  // namespace operators
}  // namespace paddle

#include "paddle/fluid/operators/detection/poly_util.cc"

#endif  // POLY_UTIL_H_
paddle/fluid/operators/detection/polygon_box_transform_op.cc
@@ -41,9 +41,9 @@ class PolygonBoxTransformCPUKernel : public framework::OpKernel<T> {
       for (int id_w = 0; id_w < width; ++id_w) {
         id = id_n * height * width + width * id_h + id_w;
         if (id_n % 2 == 0) {
-          out_data[id] = id_w - in_data[id];
+          out_data[id] = id_w * 4 - in_data[id];
         } else {
-          out_data[id] = id_h - in_data[id];
+          out_data[id] = id_h * 4 - in_data[id];
         }
       }
     }
paddle/fluid/operators/detection/polygon_box_transform_op.cu
@@ -32,9 +32,9 @@ __global__ void PolygonBoxTransformKernel(const int n, const int h, const int w,
   if (id_n < n && id_h < h && id_w < w) {
     int id = id_n * h * w + w * id_h + id_w;
     if (id_n % 2 == 0) {
-      output[id] = id_w - input[id];
+      output[id] = id_w * 4 - input[id];
     } else {
-      output[id] = id_h - input[id];
+      output[id] = id_h * 4 - input[id];
    }
  }
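Both the CPU and CUDA kernels receive the same one-line change: the grid coordinate is now scaled by 4 before the predicted offset is subtracted. A plausible reading (our assumption, not stated in the patch) is that the geometry map is produced at 1/4 of the input resolution, so the factor maps grid cells back to input pixels. The standalone sketch below reproduces the updated CPU indexing on made-up shapes.

#include <cstdio>

// Hedged sketch of the updated transform: even channels use the column
// index, odd channels the row index, each scaled by 4 before the input
// offset is subtracted. Shapes and values are hypothetical.
void PolygonBoxTransformToy() {
  const int n = 2, h = 2, w = 2;  // one (x, y) channel pair, 2x2 map
  float in[n * h * w] = {0};      // all-zero offsets for simplicity
  float out[n * h * w];
  for (int id_n = 0; id_n < n; ++id_n)
    for (int id_h = 0; id_h < h; ++id_h)
      for (int id_w = 0; id_w < w; ++id_w) {
        int id = id_n * h * w + w * id_h + id_w;
        out[id] = (id_n % 2 == 0 ? id_w * 4 : id_h * 4) - in[id];
      }
  std::printf("out[1] = %.0f\n", out[1]);  // column index 1 -> 4
}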
paddle/fluid/operators/math/CMakeLists.txt
@@ -76,5 +76,5 @@ cc_test(concat_test SRCS concat_test.cc DEPS concat)
 cc_test(cpu_vec_test SRCS cpu_vec_test.cc DEPS blas cpu_info)
 cc_library(jit_kernel
     SRCS jit_kernel.cc jit_kernel_blas.cc jit_kernel_exp.cc jit_kernel_lstm.cc
-    DEPS cpu_info cblas activation_functions)
+    DEPS cpu_info cblas)
 cc_test(jit_kernel_test SRCS jit_kernel_test.cc DEPS jit_kernel)
paddle/fluid/operators/math/jit_kernel_exp.cc
@@ -27,13 +27,6 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 namespace math {
-
-#ifdef __AVX__
-namespace detail {
-__m256 Exp(__m256 a);
-}  // namespace detail
-#endif
-
 namespace jitkernel {
 namespace jit = platform::jit;

@@ -69,37 +62,186 @@ FOR_EACH_ISA(MKL_FLOAT, kGT16);
 FOR_EACH_ISA_BLOCK(MKL_DOUBLE);
 #endif
 
+namespace detail {
+
+#ifdef __AVX__
+
+#define ALIGN32 __attribute__((aligned(32)))
+
+#define _PS256_CONST(Name, Val)                                      \
+  static const float _ps256_##Name[8] ALIGN32 = {Val, Val, Val, Val, \
+                                                 Val, Val, Val, Val}
+
+#define _PI256_CONST(Name, Val)                                    \
+  static const int _pi256_##Name[8] ALIGN32 = {Val, Val, Val, Val, \
+                                               Val, Val, Val, Val}
+
+_PI256_CONST(0x7f, 0x7f);
+_PS256_CONST(one, 1.f);
+_PS256_CONST(0p5, 0.5f);
+_PS256_CONST(exp_hi, 88.3762626647949f);
+_PS256_CONST(exp_lo, -88.3762626647949f);
+_PS256_CONST(cephes_LOG2EF, 1.44269504088896341);
+_PS256_CONST(cephes_exp_C1, 0.693359375);
+_PS256_CONST(cephes_exp_C2, -2.12194440e-4);
+_PS256_CONST(cephes_exp_p0, 1.9875691500E-4);
+_PS256_CONST(cephes_exp_p1, 1.3981999507E-3);
+_PS256_CONST(cephes_exp_p2, 8.3334519073E-3);
+_PS256_CONST(cephes_exp_p3, 4.1665795894E-2);
+_PS256_CONST(cephes_exp_p4, 1.6666665459E-1);
+_PS256_CONST(cephes_exp_p5, 5.0000001201E-1);
+
+typedef union imm_xmm_union {
+  __m256i imm;
+  __m128i xmm[2];
+} imm_xmm_union;
+
+#define COPY_IMM_TO_XMM(imm_, xmm0_, xmm1_) \
+  {                                         \
+    imm_xmm_union u ALIGN32;                \
+    u.imm = imm_;                           \
+    xmm0_ = u.xmm[0];                       \
+    xmm1_ = u.xmm[1];                       \
+  }
+
+#define COPY_XMM_TO_IMM(xmm0_, xmm1_, imm_) \
+  {                                         \
+    imm_xmm_union u ALIGN32;                \
+    u.xmm[0] = xmm0_;                       \
+    u.xmm[1] = xmm1_;                       \
+    imm_ = u.imm;                           \
+  }
+
+#define AVX2_BITOP_USING_SSE2(fn)                           \
+  static inline __m256i avx2_mm256_##fn(__m256i x, int y) { \
+    /* use SSE2 to perform the bitop AVX2 */                \
+    __m128i x1, x2;                                         \
+    __m256i ret;                                            \
+    COPY_IMM_TO_XMM(x, x1, x2);                             \
+    x1 = _mm_##fn(x1, y);                                   \
+    x2 = _mm_##fn(x2, y);                                   \
+    COPY_XMM_TO_IMM(x1, x2, ret);                           \
+    return ret;                                             \
+  }
+
+#define AVX2_INTOP_USING_SSE2(fn)                                    \
+  static inline __m256i avx2_mm256_add_epi32(__m256i x, __m256i y) { \
+    /* use SSE2 to perform the AVX2 integer operation */             \
+    __m128i x1, x2;                                                  \
+    __m128i y1, y2;                                                  \
+    __m256i ret;                                                     \
+    COPY_IMM_TO_XMM(x, x1, x2);                                      \
+    COPY_IMM_TO_XMM(y, y1, y2);                                      \
+    x1 = _mm_##fn(x1, y1);                                           \
+    x2 = _mm_##fn(x2, y2);                                           \
+    COPY_XMM_TO_IMM(x1, x2, ret);                                    \
+    return ret;                                                      \
+  }
+
+AVX2_BITOP_USING_SSE2(slli_epi32);
+AVX2_INTOP_USING_SSE2(add_epi32);
+
+#define AVXEXP_BASE                                                            \
+  __m256 tmp = _mm256_setzero_ps(), fx;                                        \
+  __m256 one = *reinterpret_cast<const __m256*>(_ps256_one);                   \
+  __m256i imm0;                                                                \
+  x = _mm256_min_ps(x, *reinterpret_cast<const __m256*>(_ps256_exp_hi));       \
+  x = _mm256_max_ps(x, *reinterpret_cast<const __m256*>(_ps256_exp_lo));       \
+  /* express exp(x) as exp(g + n*log(2)) */                                    \
+  fx = _mm256_mul_ps(x,                                                        \
+                     *reinterpret_cast<const __m256*>(_ps256_cephes_LOG2EF));  \
+  fx = _mm256_add_ps(fx, *reinterpret_cast<const __m256*>(_ps256_0p5));        \
+  tmp = _mm256_floor_ps(fx);                                                   \
+  /* if greater, subtract 1 */                                                 \
+  __m256 mask = _mm256_cmp_ps(tmp, fx, _CMP_GT_OS);                            \
+  mask = _mm256_and_ps(mask, one);                                             \
+  fx = _mm256_sub_ps(tmp, mask);                                               \
+  tmp = _mm256_mul_ps(fx,                                                      \
                      *reinterpret_cast<const __m256*>(_ps256_cephes_exp_C1)); \
+  __m256 z = _mm256_mul_ps(                                                    \
+      fx, *reinterpret_cast<const __m256*>(_ps256_cephes_exp_C2));             \
+  x = _mm256_sub_ps(x, tmp);                                                   \
+  x = _mm256_sub_ps(x, z);                                                     \
+  z = _mm256_mul_ps(x, x);                                                     \
+  __m256 y = *reinterpret_cast<const __m256*>(_ps256_cephes_exp_p0);           \
+  y = _mm256_mul_ps(y, x);                                                     \
+  y = _mm256_add_ps(y,                                                         \
                    *reinterpret_cast<const __m256*>(_ps256_cephes_exp_p1));   \
+  y = _mm256_mul_ps(y, x);                                                     \
+  y = _mm256_add_ps(y,                                                         \
                    *reinterpret_cast<const __m256*>(_ps256_cephes_exp_p2));   \
+  y = _mm256_mul_ps(y, x);                                                     \
+  y = _mm256_add_ps(y,                                                         \
                    *reinterpret_cast<const __m256*>(_ps256_cephes_exp_p3));   \
+  y = _mm256_mul_ps(y, x);                                                     \
+  y = _mm256_add_ps(y,                                                         \
                    *reinterpret_cast<const __m256*>(_ps256_cephes_exp_p4));   \
+  y = _mm256_mul_ps(y, x);                                                     \
+  y = _mm256_add_ps(y,                                                         \
                    *reinterpret_cast<const __m256*>(_ps256_cephes_exp_p5));   \
+  y = _mm256_mul_ps(y, z);                                                     \
+  y = _mm256_add_ps(y, x);                                                     \
+  y = _mm256_add_ps(y, one);                                                   \
+  /* build 2^n */                                                              \
+  imm0 = _mm256_cvttps_epi32(fx)
+
+__m256 ExpAVX(__m256 x) {
+  AVXEXP_BASE;
+  // two AVX2 instructions using SSE2
+  imm0 = avx2_mm256_add_epi32(imm0,
+                              *reinterpret_cast<const __m256i*>(_pi256_0x7f));
+  imm0 = avx2_mm256_slli_epi32(imm0, 23);
+  __m256 pow2n = _mm256_castsi256_ps(imm0);
+  y = _mm256_mul_ps(y, pow2n);
+  return y;
+}
+#endif
+
+#ifdef __AVX2__
+__m256 ExpAVX2(__m256 x) {
+  AVXEXP_BASE;
+  // two AVX2 instructions
+  imm0 = _mm256_add_epi32(imm0, *reinterpret_cast<const __m256i*>(_pi256_0x7f));
+  imm0 = _mm256_slli_epi32(imm0, 23);
+  __m256 pow2n = _mm256_castsi256_ps(imm0);
+  y = _mm256_mul_ps(y, pow2n);
+  return y;
+}
+#endif
+
+}  // namespace detail
+
-#define INTRI8_FLOAT(isa)                                                  \
+#define INTRI8_FLOAT(isa, expisa)                                          \
   template <>                                                              \
   void VExpKernelImpl<float, isa, kEQ8>::Compute(const float* x, float* y) \
       const {                                                              \
     __m256 tmp = _mm256_loadu_ps(x);                                       \
-    _mm256_storeu_ps(y, detail::Exp(tmp));                                 \
+    _mm256_storeu_ps(y, expisa(tmp));                                      \
   }
 
-#define INTRI16_FLOAT(isa)                                                  \
+#define INTRI16_FLOAT(isa, expisa)                                          \
   template <>                                                               \
   void VExpKernelImpl<float, isa, kEQ16>::Compute(const float* x, float* y) \
       const {                                                               \
     __m256 tmp0 = _mm256_loadu_ps(x);                                       \
     __m256 tmp1 = _mm256_loadu_ps(x + 8);                                   \
-    tmp0 = detail::Exp(tmp0);                                               \
-    tmp1 = detail::Exp(tmp1);                                               \
+    tmp0 = expisa(tmp0);                                                    \
+    tmp1 = expisa(tmp1);                                                    \
     _mm256_storeu_ps(y, tmp0);                                              \
     _mm256_storeu_ps(y + 8, tmp1);                                          \
   }
 
 #ifdef __AVX__
-INTRI8_FLOAT(jit::avx);
-INTRI16_FLOAT(jit::avx);
+INTRI8_FLOAT(jit::avx, detail::ExpAVX);
+INTRI16_FLOAT(jit::avx, detail::ExpAVX);
 #endif
 #ifdef __AVX2__
-INTRI8_FLOAT(jit::avx2);
-INTRI16_FLOAT(jit::avx2);
+INTRI8_FLOAT(jit::avx2, detail::ExpAVX2);
+INTRI16_FLOAT(jit::avx2, detail::ExpAVX2);
 #endif
 #ifdef __AVX512F__
-INTRI8_FLOAT(jit::avx512f);
-INTRI16_FLOAT(jit::avx512f);
+INTRI8_FLOAT(jit::avx512f, detail::ExpAVX2);
+INTRI16_FLOAT(jit::avx512f, detail::ExpAVX2);
 #endif
 // TODO(TJ): eq16 test and complete avx512

@@ -135,26 +277,27 @@ class VSigmoidKernelImpl : public VSigmoidKernel<T> {
   std::shared_ptr<const VExpKernel<T>> vexp_;
 };
 
-#define INTRI_SIGMOID(tmp, min, max)              \
+#define INTRI_SIGMOID(tmp, min, max, expisa)      \
   tmp = _mm256_max_ps(tmp, min);                  \
   tmp = _mm256_min_ps(tmp, max);                  \
   tmp = _mm256_sub_ps(_mm256_set1_ps(0.0f), tmp); \
-  tmp = detail::Exp(tmp);                         \
+  tmp = expisa(tmp);                              \
   tmp = _mm256_add_ps(_mm256_set1_ps(1.0f), tmp); \
   tmp = _mm256_div_ps(_mm256_set1_ps(1.0f), tmp)
 
-#define INTRI8_FLOAT(isa)                                                      \
+#define INTRI8_FLOAT(isa, expisa)                                              \
   template <>                                                                  \
   void VSigmoidKernelImpl<float, isa, kEQ8>::Compute(const float* x, float* y) \
       const {                                                                  \
     /* TODO(TJ): try to use static const*/                                     \
     __m256 max = _mm256_set1_ps(SIGMOID_THRESHOLD_MAX);                        \
     __m256 min = _mm256_set1_ps(SIGMOID_THRESHOLD_MIN);                        \
     __m256 tmp = _mm256_loadu_ps(x);                                           \
-    INTRI_SIGMOID(tmp, min, max);                                              \
+    INTRI_SIGMOID(tmp, min, max, expisa);                                      \
     _mm256_storeu_ps(y, tmp);                                                  \
   }
 
-#define INTRI16_FLOAT(isa)                                              \
+#define INTRI16_FLOAT(isa, expisa)                                      \
   template <>                                                           \
   void VSigmoidKernelImpl<float, isa, kEQ16>::Compute(const float* x,  \
                                                       float* y) const { \

@@ -162,13 +305,13 @@ class VSigmoidKernelImpl : public VSigmoidKernel<T> {
     __m256 min = _mm256_set1_ps(SIGMOID_THRESHOLD_MIN);                 \
     __m256 tmp0 = _mm256_loadu_ps(x);                                   \
     __m256 tmp1 = _mm256_loadu_ps(x + 8);                               \
-    INTRI_SIGMOID(tmp0, min, max);                                      \
-    INTRI_SIGMOID(tmp1, min, max);                                      \
+    INTRI_SIGMOID(tmp0, min, max, expisa);                              \
+    INTRI_SIGMOID(tmp1, min, max, expisa);                              \
     _mm256_storeu_ps(y, tmp0);                                          \
     _mm256_storeu_ps(y + 8, tmp1);                                      \
   }
 
-#define INTRI_GT8LT16_FLOAT(isa)                                        \
+#define INTRI_GT8LT16_FLOAT(isa, expisa)                                \
   template <>                                                           \
   VSigmoidKernelImpl<float, isa, kGT8LT16>::VSigmoidKernelImpl(int d)   \
       : VSigmoidKernel<float>() {                                       \

@@ -184,7 +327,7 @@ class VSigmoidKernelImpl : public VSigmoidKernel<T> {
     __m256 max = _mm256_set1_ps(SIGMOID_THRESHOLD_MAX);                 \
     __m256 min = _mm256_set1_ps(SIGMOID_THRESHOLD_MIN);                 \
     __m256 tmp = _mm256_loadu_ps(x);                                    \
-    INTRI_SIGMOID(tmp, min, max);                                       \
+    INTRI_SIGMOID(tmp, min, max, expisa);                               \
     _mm256_storeu_ps(y, tmp);                                           \
     const float min_ = SIGMOID_THRESHOLD_MIN;                           \
     const float max_ = SIGMOID_THRESHOLD_MAX;                           \

@@ -198,7 +341,7 @@ class VSigmoidKernelImpl : public VSigmoidKernel<T> {
     }                                                                   \
   }
 
-#define INTRI_GT16_FLOAT(isa)                                           \
+#define INTRI_GT16_FLOAT(isa, expisa)                                   \
   template <>                                                           \
   VSigmoidKernelImpl<float, isa, kGT16>::VSigmoidKernelImpl(int d)      \
       : VSigmoidKernel<float>() {                                       \

@@ -215,7 +358,7 @@ class VSigmoidKernelImpl : public VSigmoidKernel<T> {
     __m256 min = _mm256_set1_ps(SIGMOID_THRESHOLD_MIN);                 \
     for (int i = 0; i < this->end_; i += AVX_FLOAT_BLOCK) {             \
       __m256 tmp = _mm256_loadu_ps(x + i);                              \
-      INTRI_SIGMOID(tmp, min, max);                                     \
+      INTRI_SIGMOID(tmp, min, max, expisa);                             \
      _mm256_storeu_ps(y + i, tmp);                                     \
     }                                                                   \
     const float min_ = SIGMOID_THRESHOLD_MIN;                           \

@@ -231,22 +374,20 @@ class VSigmoidKernelImpl : public VSigmoidKernel<T> {
 }
 
 #ifdef __AVX__
-INTRI8_FLOAT(jit::avx);
-INTRI16_FLOAT(jit::avx);
-INTRI_GT8LT16_FLOAT(jit::avx);
-INTRI_GT16_FLOAT(jit::avx);
+INTRI8_FLOAT(jit::avx, detail::ExpAVX);
+INTRI16_FLOAT(jit::avx, detail::ExpAVX);
+INTRI_GT8LT16_FLOAT(jit::avx, detail::ExpAVX);
+INTRI_GT16_FLOAT(jit::avx, detail::ExpAVX);
 #endif
 #ifdef __AVX2__
-INTRI8_FLOAT(jit::avx2);
-INTRI16_FLOAT(jit::avx2);
-// INTRI_GT8LT16_FLOAT(jit::avx2);
+INTRI8_FLOAT(jit::avx2, detail::ExpAVX2);
+INTRI16_FLOAT(jit::avx2, detail::ExpAVX2);
 // maybe use avx at gt8lt16 and gt16
-// INTRI_GT16_FLOAT(jit::avx2);
 #endif
 #ifdef __AVX512F__
-INTRI8_FLOAT(jit::avx512f);
-INTRI16_FLOAT(jit::avx512f);
-// INTRI_GT8LT16_FLOAT(jit::avx512f);
+INTRI8_FLOAT(jit::avx512f, detail::ExpAVX2);
+INTRI16_FLOAT(jit::avx512f, detail::ExpAVX2);
 // maybe use avx2 at gt8lt16 and gt16
-// INTRI_GT16_FLOAT(jit::avx512f);
 #endif
 
 #undef INTRI8_FLOAT

@@ -280,36 +421,36 @@ class VTanhKernelImpl : public VTanhKernel<T> {
   std::shared_ptr<const VAddBiasKernel<T>> vaddbias_;
 };
 
-#define INTRI_VTANH(tmp)                                      \
+#define INTRI_VTANH(tmp, expisa)                              \
   tmp = _mm256_mul_ps(_mm256_set1_ps(-2.0f), tmp);            \
   tmp = _mm256_min_ps(tmp, _mm256_set1_ps(EXP_MAX_INPUT));    \
-  tmp = detail::Exp(tmp);                                     \
+  tmp = expisa(tmp);                                          \
   tmp = _mm256_add_ps(_mm256_set1_ps(1.0f), tmp);             \
   tmp = _mm256_div_ps(_mm256_set1_ps(2.0f), tmp);             \
   tmp = _mm256_sub_ps(tmp, _mm256_set1_ps(1.0f))
 
-#define INTRI8_FLOAT(isa)                                                    \
+#define INTRI8_FLOAT(isa, expisa)                                            \
   template <>                                                                \
   void VTanhKernelImpl<float, isa, kEQ8>::Compute(const float* x, float* y)  \
       const {                                                                \
     __m256 tmp = _mm256_loadu_ps(x);                                         \
-    INTRI_VTANH(tmp);                                                        \
+    INTRI_VTANH(tmp, expisa);                                                \
     _mm256_storeu_ps(y, tmp);                                                \
   }
 
-#define INTRI16_FLOAT(isa)                                                   \
+#define INTRI16_FLOAT(isa, expisa)                                           \
   template <>                                                                \
   void VTanhKernelImpl<float, isa, kEQ16>::Compute(const float* x, float* y) \
       const {                                                                \
     __m256 tmp0 = _mm256_loadu_ps(x);                                        \
     __m256 tmp1 = _mm256_loadu_ps(x + 8);                                    \
-    INTRI_VTANH(tmp0);                                                       \
-    INTRI_VTANH(tmp1);                                                       \
+    INTRI_VTANH(tmp0, expisa);                                               \
+    INTRI_VTANH(tmp1, expisa);                                               \
     _mm256_storeu_ps(y, tmp0);                                               \
     _mm256_storeu_ps(y + 8, tmp1);                                           \
   }
 
-#define INTRI_GT8LT16_FLOAT(isa)                                \
+#define INTRI_GT8LT16_FLOAT(isa, expisa)                        \
   template <>                                                   \
   VTanhKernelImpl<float, isa, kGT8LT16>::VTanhKernelImpl(int d) \
       : VTanhKernel<float>() {                                  \

@@ -327,7 +468,7 @@ class VTanhKernelImpl : public VTanhKernel<T> {
   void VTanhKernelImpl<float, isa, kGT8LT16>::Compute(const float* x,   \
                                                       float* y) const { \
     __m256 tmp = _mm256_loadu_ps(x);                                    \
-    INTRI_VTANH(tmp);                                                   \
+    INTRI_VTANH(tmp, expisa);                                           \
     _mm256_storeu_ps(y, tmp);                                           \
     x += AVX_FLOAT_BLOCK;                                               \
     y += AVX_FLOAT_BLOCK;                                               \

@@ -337,7 +478,7 @@ class VTanhKernelImpl : public VTanhKernel<T> {
     vaddbias_->Compute(-1.f, y, y);                           \
   }
 
-#define INTRI_GT16_FLOAT(isa)                                 \
+#define INTRI_GT16_FLOAT(isa, expisa)                         \
   template <>                                                 \
   VTanhKernelImpl<float, isa, kGT16>::VTanhKernelImpl(int d)  \
       : VTanhKernel<float>() {                                \

@@ -356,7 +497,7 @@ class VTanhKernelImpl : public VTanhKernel<T> {
       const {                                                 \
     for (int i = 0; i < this->end_; i += AVX_FLOAT_BLOCK) {   \
       __m256 tmp = _mm256_loadu_ps(x + i);                    \
-      INTRI_VTANH(tmp);                                       \
+      INTRI_VTANH(tmp, expisa);                               \
       _mm256_storeu_ps(y + i, tmp);                           \
     }                                                         \
     x += this->end_;                                          \

@@ -368,19 +509,19 @@ class VTanhKernelImpl : public VTanhKernel<T> {
 }
 
 #ifdef __AVX__
-INTRI8_FLOAT(jit::avx);
-INTRI16_FLOAT(jit::avx);
-INTRI_GT8LT16_FLOAT(jit::avx);
-INTRI_GT16_FLOAT(jit::avx);
+INTRI8_FLOAT(jit::avx, detail::ExpAVX);
+INTRI16_FLOAT(jit::avx, detail::ExpAVX);
+INTRI_GT8LT16_FLOAT(jit::avx, detail::ExpAVX);
+INTRI_GT16_FLOAT(jit::avx, detail::ExpAVX);
 #endif
 #ifdef __AVX2__
-INTRI8_FLOAT(jit::avx2);
-INTRI16_FLOAT(jit::avx2);
+INTRI8_FLOAT(jit::avx2, detail::ExpAVX2);
+INTRI16_FLOAT(jit::avx2, detail::ExpAVX2);
 // maybe use avx at gt8lt16 and gt16
 #endif
 #ifdef __AVX512F__
-INTRI8_FLOAT(jit::avx512f);
-INTRI16_FLOAT(jit::avx512f);
+INTRI8_FLOAT(jit::avx512f, detail::ExpAVX2);
+INTRI16_FLOAT(jit::avx512f, detail::ExpAVX2);
 // maybe use avx at gt8lt16 and gt16
 #endif
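AVXEXP_BASE above is the vectorized Cephes-style range reduction exp(x) = 2^n * exp(g) with n = round(x / ln 2): a degree-5 polynomial approximates exp(g), and 2^n is built by adding 127 (the _pi256_0x7f constant) to n and shifting it into the float exponent bits. A scalar sketch of the same steps, for reference only and not part of the patch:

#include <cmath>

// Scalar sketch of the Cephes-style exp used by ExpAVX/ExpAVX2:
// split x = g + n*log(2), evaluate a degree-5 polynomial for exp(g),
// then scale by 2^n via the float exponent (ldexp stands in for the
// add-127-and-shift trick in the vector code).
float CephesExpSketch(float x) {
  x = std::fmin(x, 88.3762626647949f);   // exp_hi clamp
  x = std::fmax(x, -88.3762626647949f);  // exp_lo clamp
  float fx = std::floor(x * 1.44269504088896341f + 0.5f);  // n = round(x/ln2)
  // remove n*log(2) in two pieces (C1 + C2 = log(2)) for extra precision
  float g = x - fx * 0.693359375f - fx * -2.12194440e-4f;
  float z = g * g;
  float y = 1.9875691500E-4f;  // cephes_exp_p0 ... p5
  y = y * g + 1.3981999507E-3f;
  y = y * g + 8.3334519073E-3f;
  y = y * g + 4.1665795894E-2f;
  y = y * g + 1.6666665459E-1f;
  y = y * g + 5.0000001201E-1f;
  y = y * z + g + 1.0f;
  return std::ldexp(y, static_cast<int>(fx));  // multiply by 2^n
}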
paddle/fluid/operators/math/jit_kernel_lstm.cc
@@ -25,13 +25,18 @@ limitations under the License. */
 namespace paddle {
 namespace operators {
 namespace math {
-namespace jitkernel {
 namespace detail {
 #ifdef __AVX__
-__m256 Exp(__m256 a);
+__m256 ExpAVX(__m256 x);
 #endif
+
+#ifdef __AVX2__
+__m256 ExpAVX2(__m256 x);
+#endif
 }  // namespace detail
 
+namespace jitkernel {
 namespace jit = platform::jit;

@@ -43,43 +48,72 @@ class AVXAct {
   virtual __m256 Compute(__m256 x) const = 0;
 };
 
-template <act_type type>
+template <act_type type, jit::cpu_isa_t isa>
 class AVXActImpl : public AVXAct {
  public:
   __m256 Compute(__m256 x) const override { PADDLE_THROW("Unknown type!"); }
 };
 
-template <>
-__m256 AVXActImpl<kSigmoid>::Compute(__m256 x) const {
-  __m256 ones = _mm256_set1_ps(1.0f);
-  x = _mm256_max_ps(x, _mm256_set1_ps(SIGMOID_THRESHOLD_MIN));
-  x = _mm256_min_ps(x, _mm256_set1_ps(SIGMOID_THRESHOLD_MAX));
-  x = _mm256_sub_ps(_mm256_set1_ps(0.0f), x);
-  x = detail::Exp(x);
-  x = _mm256_add_ps(ones, x);
-  return _mm256_div_ps(ones, x);
-}
+#define AVX_SIGMOID(isa, expisa)                                 \
+  template <>                                                    \
+  __m256 AVXActImpl<kSigmoid, isa>::Compute(__m256 x) const {    \
+    __m256 ones = _mm256_set1_ps(1.0f);                          \
+    x = _mm256_max_ps(x, _mm256_set1_ps(SIGMOID_THRESHOLD_MIN)); \
+    x = _mm256_min_ps(x, _mm256_set1_ps(SIGMOID_THRESHOLD_MAX)); \
+    x = _mm256_sub_ps(_mm256_set1_ps(0.0f), x);                  \
+    x = expisa(x);                                               \
+    x = _mm256_add_ps(ones, x);                                  \
+    return _mm256_div_ps(ones, x);                               \
+  }
 
-template <>
-__m256 AVXActImpl<kTanh>::Compute(__m256 x) const {
-  __m256 ones = _mm256_set1_ps(1.0f);
-  x = _mm256_mul_ps(_mm256_set1_ps(-2.0f), x);
-  x = _mm256_min_ps(x, _mm256_set1_ps(EXP_MAX_INPUT));
-  x = detail::Exp(x);
-  x = _mm256_add_ps(ones, x);
-  x = _mm256_div_ps(_mm256_set1_ps(2.0f), x);
-  return _mm256_sub_ps(x, ones);
-}
+#define AVX_TANH(isa, expisa)                              \
+  template <>                                              \
+  __m256 AVXActImpl<kTanh, isa>::Compute(__m256 x) const { \
+    __m256 ones = _mm256_set1_ps(1.0f);                    \
+    x = _mm256_mul_ps(_mm256_set1_ps(-2.0f), x);           \
+    x = _mm256_min_ps(x, _mm256_set1_ps(EXP_MAX_INPUT));   \
+    x = expisa(x);                                         \
+    x = _mm256_add_ps(ones, x);                            \
+    x = _mm256_div_ps(_mm256_set1_ps(2.0f), x);            \
+    return _mm256_sub_ps(x, ones);                         \
+  }
 
-template <>
-__m256 AVXActImpl<kRelu>::Compute(__m256 x) const {
-  return _mm256_max_ps(x, _mm256_setzero_ps());
-}
+#define AVX_RELU(isa)                                      \
+  template <>                                              \
+  __m256 AVXActImpl<kRelu, isa>::Compute(__m256 x) const { \
+    return _mm256_max_ps(x, _mm256_setzero_ps());          \
+  }
 
-template <>
-__m256 AVXActImpl<kIdentity>::Compute(__m256 x) const {
-  return x;
-}
+#define AVX_IDENTITY(isa)                                      \
+  template <>                                                  \
+  __m256 AVXActImpl<kIdentity, isa>::Compute(__m256 x) const { \
+    return x;                                                  \
+  }
+
+#define FOR_EACH_AVX_ISA(macro_) \
+  macro_(jit::avx);              \
+  macro_(jit::avx2);             \
+  macro_(jit::avx512f)
+
+FOR_EACH_AVX_ISA(AVX_RELU);
+FOR_EACH_AVX_ISA(AVX_IDENTITY);
+
+AVX_SIGMOID(jit::avx, detail::ExpAVX);
+AVX_TANH(jit::avx, detail::ExpAVX);
+
+#ifdef __AVX2__
+AVX_SIGMOID(jit::avx2, detail::ExpAVX2);
+AVX_SIGMOID(jit::avx512f, detail::ExpAVX2);
+AVX_TANH(jit::avx2, detail::ExpAVX2);
+AVX_TANH(jit::avx512f, detail::ExpAVX2);
+#endif
+
+#undef FOR_EACH_AVX_ISA
+#undef AVX_IDENTITY
+#undef AVX_RELU
+#undef AVX_TANH
+#undef AVX_SIGMOID
 #endif

@@ -119,23 +153,6 @@ class LSTMKernelImpl : public LSTMKernel<T> {
     act_cell_d_ = GetActKernel<T>(act_cell, d);
     vmul_d_ = KernelPool::Instance().template Get<VMulKernel<T>>(d);
     vadd_d_ = KernelPool::Instance().template Get<VAddKernel<T>>(d);
-#ifdef __AVX__
-    auto GetAVXAct = [&](const std::string& type) -> std::unique_ptr<AVXAct> {
-      if (type == "sigmoid") {
-        return std::unique_ptr<AVXAct>(new AVXActImpl<kSigmoid>());
-      } else if (type == "relu") {
-        return std::unique_ptr<AVXAct>(new AVXActImpl<kRelu>());
-      } else if (type == "tanh") {
-        return std::unique_ptr<AVXAct>(new AVXActImpl<kTanh>());
-      } else if (type == "identity" || type == "") {
-        return std::unique_ptr<AVXAct>(new AVXActImpl<kIdentity>());
-      }
-      PADDLE_THROW("Not support type: %s", type);
-    };
-    avx_act_gate_ = GetAVXAct(act_gate);
-    avx_act_cand_ = GetAVXAct(act_cand);
-    avx_act_cell_ = GetAVXAct(act_cell);
-#endif
   }
 
   void ComputeCtHt(T* gates, const T* ct_1, T* ct, T* ht, const T* wp_data,

@@ -176,6 +193,27 @@ class LSTMKernelImpl : public LSTMKernel<T> {
 };
 
 #define INTRI8_FLOAT(isa)                                                     \
+  template <>                                                                 \
+  LSTMKernelImpl<float, isa, kEQ8>::LSTMKernelImpl(                           \
+      const std::string& act_gate, const std::string& act_cand,               \
+      const std::string& act_cell, int d)                                     \
+      : LSTMKernel<float>() {                                                 \
+    auto GetAVXAct = [&](const std::string& type) -> std::unique_ptr<AVXAct> { \
+      if (type == "sigmoid") {                                                \
+        return std::unique_ptr<AVXAct>(new AVXActImpl<kSigmoid, isa>());      \
+      } else if (type == "relu") {                                            \
+        return std::unique_ptr<AVXAct>(new AVXActImpl<kRelu, isa>());         \
+      } else if (type == "tanh") {                                            \
+        return std::unique_ptr<AVXAct>(new AVXActImpl<kTanh, isa>());         \
+      } else if (type == "identity" || type == "") {                          \
+        return std::unique_ptr<AVXAct>(new AVXActImpl<kIdentity, isa>());     \
+      }                                                                       \
+      PADDLE_THROW("Not support type: %s", type);                             \
+    };                                                                        \
+    avx_act_gate_ = GetAVXAct(act_gate);                                      \
+    avx_act_cand_ = GetAVXAct(act_cand);                                      \
+    avx_act_cell_ = GetAVXAct(act_cell);                                      \
+  }                                                                           \
   template <>                                                                 \
   void LSTMKernelImpl<float, isa, kEQ8>::ComputeCtHt(                         \
       float* gates, const float* ct_1, float* ct, float* ht,                  \

@@ -195,6 +233,20 @@ class LSTMKernelImpl : public LSTMKernel<T> {
     /* H_t = act_cell(C_t) * ogated */                                        \
     o = _mm256_mul_ps(avx_act_cell_->Compute(f), avx_act_gate_->Compute(o));  \
     _mm256_storeu_ps(ht, o);                                                  \
+  }                                                                           \
+  template <>                                                                 \
+  void LSTMKernelImpl<float, isa, kEQ8>::ComputeC1H1(                         \
+      float* gates, float* ct, float* ht, const float* wp_data) const {       \
+    __m256 c, i, o;                                                           \
+    c = _mm256_loadu_ps(gates);                                               \
+    i = _mm256_loadu_ps(gates + 8);                                           \
+    o = _mm256_loadu_ps(gates + 24);                                          \
+    /* C_t = igated * cgated*/                                                \
+    c = _mm256_mul_ps(avx_act_gate_->Compute(i), avx_act_cand_->Compute(c));  \
+    _mm256_storeu_ps(ct, c);                                                  \
+    /* H_t = act_cell(C_t) * ogated */                                        \
+    o = _mm256_mul_ps(avx_act_cell_->Compute(c), avx_act_gate_->Compute(o));  \
+    _mm256_storeu_ps(ht, o);                                                  \
   }
 
 // TODO(TJ): optimize keq16
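The new kEQ8 ComputeC1H1 specialization mirrors the scalar first-step LSTM update. For reference, the per-element arithmetic implied by the comments (C_t = igated * cgated, H_t = act_cell(C_t) * ogated) looks like the sketch below, assuming the usual sigmoid gates and tanh candidate/cell activations and the [candidate | input | forget | output] gate layout implied by the offsets 0, 8, 24 in the vector code.

#include <cmath>

// Hedged scalar reference for the first-step update: there is no
// previous cell state yet, so the forget block (offset 2*d) is unused.
void ScalarC1H1(const float* gates, float* ct, float* ht, int d) {
  auto sigmoid = [](float v) { return 1.0f / (1.0f + std::exp(-v)); };
  for (int k = 0; k < d; ++k) {
    float c = std::tanh(gates[k]);        // act_cand(candidate)
    float i = sigmoid(gates[d + k]);      // act_gate(input gate)
    float o = sigmoid(gates[3 * d + k]);  // act_gate(output gate)
    ct[k] = i * c;                        // C_t = igated * cgated
    ht[k] = std::tanh(ct[k]) * o;         // H_t = act_cell(C_t) * ogated
  }
}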
paddle/fluid/operators/roi_pool_op.cc
@@ -174,4 +174,4 @@ REGISTER_OP_CPU_KERNEL(
 REGISTER_OP_CPU_KERNEL(
     roi_pool_grad,
     ops::CPUROIPoolGradOpKernel<paddle::platform::CPUDeviceContext, float>,
-    ops::CPUROIPoolOpKernel<paddle::platform::CPUDeviceContext, double>);
+    ops::CPUROIPoolGradOpKernel<paddle::platform::CPUDeviceContext, double>);
paddle/fluid/operators/roi_pool_op.cu
@@ -249,4 +249,4 @@ REGISTER_OP_CUDA_KERNEL(
 REGISTER_OP_CUDA_KERNEL(
     roi_pool_grad,
     ops::GPUROIPoolGradOpKernel<paddle::platform::CUDADeviceContext, float>,
-    ops::GPUROIPoolOpKernel<paddle::platform::CUDADeviceContext, double>);
+    ops::GPUROIPoolGradOpKernel<paddle::platform::CUDADeviceContext, double>);
python/paddle/fluid/layers/nn.py
...
@@ -355,7 +355,6 @@ def dynamic_lstm(input,
...
@@ -355,7 +355,6 @@ def dynamic_lstm(input,
c_0(Variable): The initial cell state is an optional input, default is zero.
c_0(Variable): The initial cell state is an optional input, default is zero.
This is a tensor with shape (N x D), where N is the
This is a tensor with shape (N x D), where N is the
batch size. `h_0` and `c_0` can be NULL but only at the same time.
batch size. `h_0` and `c_0` can be NULL but only at the same time.
param_attr(ParamAttr|None): The parameter attribute for the learnable
param_attr(ParamAttr|None): The parameter attribute for the learnable
hidden-hidden weights.
hidden-hidden weights.
...
@@ -363,6 +362,11 @@ def dynamic_lstm(input,
...
@@ -363,6 +362,11 @@ def dynamic_lstm(input,
W_{fh}, W_{oh}`}
W_{fh}, W_{oh}`}
- The shape is (D x 4D), where D is the hidden
- The shape is (D x 4D), where D is the hidden
size.
size.
If it is set to None or one attribute of ParamAttr,
dynamic_lstm will create ParamAttr as param_attr.
If the Initializer of the param_attr is not set, the
parameter is initialized with Xavier. Default: None.
bias_attr (ParamAttr|None): The bias attribute for the learnable bias
bias_attr (ParamAttr|None): The bias attribute for the learnable bias
weights, which contains two parts, input-hidden
weights, which contains two parts, input-hidden
bias weights and peephole connections weights if
bias weights and peephole connections weights if
...
@@ -375,6 +379,11 @@ def dynamic_lstm(input,
...
@@ -375,6 +379,11 @@ def dynamic_lstm(input,
- Biases = { :math:`b_c, b_i, b_f, b_o, W_{ic},
\
- Biases = { :math:`b_c, b_i, b_f, b_o, W_{ic},
\
W_{fc}, W_{oc}`}.
W_{fc}, W_{oc}`}.
- The shape is (1 x 7D).
- The shape is (1 x 7D).
If it is set to None or one attribute of ParamAttr,
dynamic_lstm will create ParamAttr as bias_attr.
If the Initializer of the bias_attr is not set,
the bias is initialized zero. Default: None.
use_peepholes (bool): ${use_peepholes_comment}
use_peepholes (bool): ${use_peepholes_comment}
is_reverse (bool): ${is_reverse_comment}
is_reverse (bool): ${is_reverse_comment}
gate_activation (str): ${gate_activation_comment}
gate_activation (str): ${gate_activation_comment}
...
@@ -393,11 +402,11 @@ def dynamic_lstm(input,
...
@@ -393,11 +402,11 @@ def dynamic_lstm(input,
hidden_dim = 512
hidden_dim = 512
forward_proj = fluid.layers.fc(input=input_seq, size=hidden_dim * 4,
forward_proj = fluid.layers.fc(input=input_seq, size=hidden_dim * 4,
act=None, bias_attr=Non
e)
bias_attr=Fals
e)
forward, _ = fluid.layers.dynamic_lstm(
forward, _ = fluid.layers.dynamic_lstm(
input=forward_proj, size=hidden_dim * 4, use_peepholes=False)
input=forward_proj, size=hidden_dim * 4, use_peepholes=False)
"""
"""
assert
bias_attr
is
not
False
,
"bias_attr should not be False in dynamic_lstmp."
helper
=
LayerHelper
(
'lstm'
,
**
locals
())
helper
=
LayerHelper
(
'lstm'
,
**
locals
())
size
=
size
//
4
size
=
size
//
4
weight
=
helper
.
create_parameter
(
weight
=
helper
.
create_parameter
(
...
@@ -532,6 +541,11 @@ def dynamic_lstmp(input,
...
@@ -532,6 +541,11 @@ def dynamic_lstmp(input,
size.
size.
- Projection weight = {:math:`W_{rh}`}.
- Projection weight = {:math:`W_{rh}`}.
- The shape of projection weight is (D x P).
- The shape of projection weight is (D x P).
If it is set to None or one attribute of ParamAttr,
dynamic_lstm will create ParamAttr as param_attr.
If the Initializer of the param_attr is not set, the
parameter is initialized with Xavier. Default: None.
bias_attr(ParamAttr|None): The bias attribute for the learnable bias
bias_attr(ParamAttr|None): The bias attribute for the learnable bias
weights, which contains two parts, input-hidden
weights, which contains two parts, input-hidden
bias weights and peephole connections weights if
bias weights and peephole connections weights if
...
@@ -544,6 +558,11 @@ def dynamic_lstmp(input,
...
@@ -544,6 +558,11 @@ def dynamic_lstmp(input,
                            - Biases = { :math:`b_c, b_i, b_f, b_o, W_{ic}, \
                              W_{fc}, W_{oc}`}.
                            - The shape is (1 x 7D).
+                           If it is set to None or one attribute of ParamAttr,
+                           dynamic_lstmp will create ParamAttr as bias_attr.
+                           If the Initializer of the bias_attr is not set,
+                           the bias is initialized zero. Default: None.
        use_peepholes(bool): Whether to enable diagonal/peephole connections,
                             default `True`.
        is_reverse(bool): Whether to compute reversed LSTM, default `False`.
...
@@ -588,6 +607,7 @@ def dynamic_lstmp(input,
                                     proj_activation="tanh")
    """
+    assert bias_attr is not False, "bias_attr should not be False in dynamic_lstmp."
    helper = LayerHelper('lstmp', **locals())
    size = size // 4
    weight = helper.create_parameter(
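For dynamic_lstmp the same convention applies. A short sketch (dimensions are illustrative) showing that an explicit ParamAttr with a Xavier initializer reproduces the documented default for param_attr:

    import paddle.fluid as fluid

    hidden_dim, proj_dim = 512, 256
    seq = fluid.layers.data(
        name='seq', shape=[128], dtype='float32', lod_level=1)
    proj_in = fluid.layers.fc(input=seq, size=hidden_dim * 4, bias_attr=False)
    # passing an explicit ParamAttr makes the documented default visible
    w_attr = fluid.ParamAttr(initializer=fluid.initializer.Xavier())
    proj_out, cell = fluid.layers.dynamic_lstmp(
        input=proj_in, size=hidden_dim * 4, proj_size=proj_dim,
        param_attr=w_attr, proj_activation='tanh')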
...
@@ -1269,7 +1289,8 @@ def sequence_conv(input,
                  padding=None,
                  bias_attr=None,
                  param_attr=None,
-                 act=None):
+                 act=None,
+                 name=None):
    """
    This function creates the op for sequence_conv, using the inputs and
    other convolutional configurations for the filters and stride as given
...
@@ -1281,9 +1302,19 @@ def sequence_conv(input,
        filter_size (int): the filter size (H and W).
        filter_stride (int): stride of the filter.
        padding (bool): if True, add paddings.
-       bias_attr (ParamAttr|None): attributes for bias
-       param_attr (ParamAttr|None): attributes for parameter
-       act (str): the activation type
+       bias_attr (ParamAttr|bool|None): The parameter attribute for the bias of sequence_conv.
+           If it is set to False, no bias will be added to the output units.
+           If it is set to None or one attribute of ParamAttr, sequence_conv
+           will create ParamAttr as bias_attr. If the Initializer of the bias_attr
+           is not set, the bias is initialized zero. Default: None.
+       param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
+           of sequence_conv. If it is set to None or one attribute of ParamAttr, sequence_conv
+           will create ParamAttr as param_attr. If the Initializer of the param_attr
+           is not set, the parameter is initialized with Xavier. Default: None.
+       act (str): Activation type, if it is set to None, activation is not appended.
+           Default: None.
+       name (str|None): A name for this layer(optional). If set None, the layer
+           will be named automatically. Default: None.
    Returns:
        Variable: output of sequence_conv
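A brief sketch of the widened sequence_conv interface (tensor shapes are assumptions): bias_attr=False now suppresses the bias entirely, and the new name argument pins the layer name.

    import paddle.fluid as fluid

    seq = fluid.layers.data(
        name='seq', shape=[64], dtype='float32', lod_level=1)
    # bias_attr=False: no bias parameter is created at all;
    # name='seq_conv_1': deterministic layer name instead of an auto one
    out = fluid.layers.sequence_conv(
        input=seq, num_filters=32, filter_size=3,
        bias_attr=False, act='relu', name='seq_conv_1')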
...
@@ -1312,7 +1343,7 @@ def sequence_conv(input,
    return helper.append_activation(pre_act)


-def sequence_softmax(input, param_attr=None, bias_attr=None, use_cudnn=False):
+def sequence_softmax(input, use_cudnn=False, name=None):
    """
    This function computes the softmax activation among all time-steps for each
    sequence. The dimension of each time-step should be 1. Thus, the shape of
...
@@ -1332,10 +1363,10 @@ def sequence_softmax(input, param_attr=None, bias_attr=None, use_cudnn=False):
    Args:
        input (Variable): The input variable which is a LoDTensor.
-       bias_attr (ParamAttr|None): attributes for bias
-       param_attr (ParamAttr|None): attributes for parameter
-       use_cudnn (bool): Use cudnn kernel or not, it is valid only when the cudnn \
-           library is installed. Default: False
+       use_cudnn (bool): Use cudnn kernel or not, it is valid only when the cudnn \
+           library is installed. Default: False.
+       name (str|None): A name for this layer(optional). If set None, the layer
+           will be named automatically. Default: None.
    Returns:
        Variable: output of sequence_softmax
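sequence_softmax creates no parameters, so the removed param_attr and bias_attr arguments were dead weight. A call against the new signature (the input shape is an assumption) looks like:

    import paddle.fluid as fluid

    # each time-step holds a single logit, as the docstring requires
    x = fluid.layers.data(name='x', shape=[1], dtype='float32', lod_level=1)
    probs = fluid.layers.sequence_softmax(input=x, use_cudnn=False)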
...
@@ -1359,7 +1390,7 @@ def sequence_softmax(input, param_attr=None, bias_attr=None, use_cudnn=False):
    return softmax_out


-def softmax(input, param_attr=None, bias_attr=None, use_cudnn=True, name=None):
+def softmax(input, use_cudnn=True, name=None):
    """
    The input of the softmax operator is a tensor of any rank. The output tensor
    has the same shape as the input.
...
@@ -1386,10 +1417,10 @@ def softmax(input, param_attr=None, bias_attr=None, use_cudnn=True, name=None):
    Args:
        input (Variable): The input variable.
-       bias_attr (ParamAttr): attributes for bias
-       param_attr (ParamAttr): attributes for parameter
-       use_cudnn (bool): Use cudnn kernel or not, it is valid only when the cudnn \
-           library is installed.
+       use_cudnn (bool): Use cudnn kernel or not, it is valid only when the cudnn \
+           library is installed.
+       name (str|None): A name for this layer(optional). If set None, the layer
+           will be named automatically. Default: None.
    Returns:
        Variable: output of softmax
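Likewise for softmax, which is also parameter-free; a sketch against the trimmed signature (the shape is illustrative):

    import paddle.fluid as fluid

    logits = fluid.layers.data(name='logits', shape=[10], dtype='float32')
    probs = fluid.layers.softmax(input=logits, use_cudnn=True, name='probs')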
...
@@ -1495,14 +1526,23 @@ def conv2d(input,
            convolution in Alex Krizhevsky's Deep CNN paper: when group=2,
            the first half of the filters is only connected to the first half
            of the input channels, while the second half of the filters is only
-           connected to the second half of the input channels. Default: groups=1
-       param_attr (ParamAttr): The parameters to the Conv2d Layer. Default: None
-       bias_attr (ParamAttr): Bias parameter for the Conv2d layer. Default: None
+           connected to the second half of the input channels. Default: groups=1.
+       param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
+           of conv2d. If it is set to None or one attribute of ParamAttr, conv2d
+           will create ParamAttr as param_attr. If the Initializer of the param_attr
+           is not set, the parameter is initialized with :math:`Normal(0.0, std)`,
+           and the :math:`std` is :math:`(\\frac{2.0 }{filter\_elem\_num})^{0.5}`. Default: None.
+       bias_attr (ParamAttr|bool|None): The parameter attribute for the bias of conv2d.
+           If it is set to False, no bias will be added to the output units.
+           If it is set to None or one attribute of ParamAttr, conv2d
+           will create ParamAttr as bias_attr. If the Initializer of the bias_attr
+           is not set, the bias is initialized zero. Default: None.
        use_cudnn (bool): Use cudnn kernel or not, it is valid only when the cudnn
            library is installed. Default: True
-       act (str): Activation type. Default: None
+       act (str): Activation type, if it is set to None, activation is not appended.
+           Default: None
        name (str|None): A name for this layer(optional). If set None, the layer
            will be named automatically.
+           Default: None
    Returns:
        Variable: The tensor variable storing the convolution and \
...
@@ -1520,7 +1560,7 @@ def conv2d(input,
"""
"""
num_channels
=
input
.
shape
[
1
]
num_channels
=
input
.
shape
[
1
]
assert
param_attr
is
not
False
,
"param_attr should not be False here."
l_type
=
'conv2d'
l_type
=
'conv2d'
if
(
num_channels
==
groups
and
num_filters
%
num_channels
==
0
and
if
(
num_channels
==
groups
and
num_filters
%
num_channels
==
0
and
not
use_cudnn
):
not
use_cudnn
):
...
@@ -1548,7 +1588,8 @@ def conv2d(input,
    filter_shape = [num_filters, int(num_filter_channels)] + filter_size

    def _get_default_param_initializer():
-        std = (2.0 / (filter_size[0]**2 * num_channels))**0.5
+        filter_elem_num = filter_size[0] * filter_size[1] * num_channels
+        std = (2.0 / filter_elem_num)**0.5
        return Normal(0.0, std, 0)

    filter_param = helper.create_parameter(
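The old expression squared filter_size[0], which equals the true fan-in only for square filters. A quick arithmetic check of the fix (a hypothetical 3x5 filter over 16 input channels, values rounded):

    # old: filter width ignored              new: true fan-in per filter
    old_std = (2.0 / (3 ** 2 * 16)) ** 0.5    # ~0.118
    filter_elem_num = 3 * 5 * 16              # 240 input elements per filter
    new_std = (2.0 / filter_elem_num) ** 0.5  # ~0.091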
...
@@ -1659,13 +1700,22 @@ def conv3d(input,
            the first half of the filters is only connected to the first half
            of the input channels, while the second half of the filters is only
            connected to the second half of the input channels. Default: groups=1
-       param_attr (ParamAttr): The parameters to the Conv3d Layer. Default: None
-       bias_attr (ParamAttr): Bias parameter for the Conv3d layer. Default: None
+       param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
+           of conv3d. If it is set to None or one attribute of ParamAttr, conv3d
+           will create ParamAttr as param_attr. If it is set to None, the parameter
+           is initialized with :math:`Normal(0.0, std)`, and the :math:`std` is
+           :math:`(\\frac{2.0 }{filter\_elem\_num})^{0.5}`. Default: None.
+       bias_attr (ParamAttr|bool|None): The parameter attribute for the bias of conv3d.
+           If it is set to False, no bias will be added to the output units.
+           If it is set to None or one attribute of ParamAttr, conv3d
+           will create ParamAttr as bias_attr. If the Initializer of the bias_attr
+           is not set, the bias is initialized zero. Default: None.
        use_cudnn (bool): Use cudnn kernel or not, it is valid only when the cudnn
            library is installed. Default: True
-       act (str): Activation type. Default: None
+       act (str): Activation type, if it is set to None, activation is not appended.
+           Default: None.
        name (str|None): A name for this layer(optional). If set None, the layer
            will be named automatically.
+           Default: None.
    Returns:
        Variable: The tensor variable storing the convolution and \
...
@@ -1683,7 +1733,7 @@ def conv3d(input,
"""
"""
l_type
=
'conv3d'
l_type
=
'conv3d'
assert
param_attr
is
not
False
,
"param_attr should not be False here."
helper
=
LayerHelper
(
l_type
,
**
locals
())
helper
=
LayerHelper
(
l_type
,
**
locals
())
dtype
=
helper
.
input_dtype
()
dtype
=
helper
.
input_dtype
()
...
@@ -1708,7 +1758,9 @@ def conv3d(input,
    filter_shape = [num_filters, num_filter_channels] + filter_size

    def _get_default_param_initializer():
-        std = (2.0 / (filter_size[0]**3 * num_channels))**0.5
+        filter_elem_num = filter_size[0] * filter_size[1] * filter_size[
+            2] * num_channels
+        std = (2.0 / filter_elem_num)**0.5
        return Normal(0.0, std, 0)

    filter_param = helper.create_parameter(
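The conv3d default had the same flaw (filter_size[0] cubed). As elsewhere, an explicit ParamAttr bypasses the default entirely; a sketch with illustrative shapes:

    import paddle.fluid as fluid

    vol = fluid.layers.data(
        name='vol', shape=[4, 12, 32, 32], dtype='float32')
    # an explicit initializer overrides the (now fan-in based) default
    w_attr = fluid.ParamAttr(
        initializer=fluid.initializer.Normal(loc=0.0, scale=0.01))
    out = fluid.layers.conv3d(
        input=vol, num_filters=8, filter_size=3,
        param_attr=w_attr, bias_attr=False, act='relu')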
...
@@ -2180,8 +2232,14 @@ def batch_norm(input,
        is_test(bool, Default False): Used for training or testing.
        momentum(float, Default 0.9):
        epsilon(float, Default 1e-05):
-       param_attr(ParamAttr): The parameter attribute for Parameter `scale`.
-       bias_attr(ParamAttr): The parameter attribute for Parameter `bias`.
+       param_attr(ParamAttr|None): The parameter attribute for Parameter `scale`
+           of batch_norm. If it is set to None or one attribute of ParamAttr, batch_norm
+           will create ParamAttr as param_attr. If the Initializer of the param_attr
+           is not set, the parameter is initialized with Xavier. Default: None.
+       bias_attr(ParamAttr|None): The parameter attribute for the bias of batch_norm.
+           If it is set to None or one attribute of ParamAttr, batch_norm
+           will create ParamAttr as bias_attr. If the Initializer of the bias_attr
+           is not set, the bias is initialized zero. Default: None.
        data_layout(string, default NCHW): NCHW|NHWC
        in_place(bool, Default False): Make the input and output of batch norm reuse memory.
        name(string, Default None): A name for this layer(optional). If set None, the layer
...
@@ -2201,6 +2259,7 @@ def batch_norm(input,
            hidden1 = fluid.layers.fc(input=x, size=200, param_attr='fc1.w')
            hidden2 = fluid.layers.batch_norm(input=hidden1)
    """
+    assert bias_attr is not False, "bias_attr should not be False in batch_norm."
    helper = LayerHelper('batch_norm', **locals())
    dtype = helper.input_dtype()
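A short batch_norm sketch under the new contract (shapes are illustrative): scale and bias fall back to the documented defaults, while bias_attr=False would now trip the assert instead of failing later.

    import paddle.fluid as fluid

    img = fluid.layers.data(name='img', shape=[3, 32, 32], dtype='float32')
    conv = fluid.layers.conv2d(input=img, num_filters=16, filter_size=3)
    bn = fluid.layers.batch_norm(input=conv, act='relu')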
...
@@ -2479,15 +2538,22 @@ def conv2d_transpose(input,
            when group=2, the first half of the filters is only connected to the
            first half of the input channels, while the second half of the
            filters is only connected to the second half of the input channels.
-           Default: groups=1
-       param_attr(ParamAttr): The parameters to the Conv2d_transpose Layer.
-           Default: None
-       bias_attr(ParamAttr): Bias parameter for the Conv2d layer. Default: None
+           Default: groups = 1.
+       param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
+           of conv2d_transpose. If it is set to None or one attribute of ParamAttr, conv2d_transpose
+           will create ParamAttr as param_attr. If the Initializer of the param_attr
+           is not set, the parameter is initialized with Xavier. Default: None.
+       bias_attr (ParamAttr|bool|None): The parameter attribute for the bias of conv2d_transpose.
+           If it is set to False, no bias will be added to the output units.
+           If it is set to None or one attribute of ParamAttr, conv2d_transpose
+           will create ParamAttr as bias_attr. If the Initializer of the bias_attr
+           is not set, the bias is initialized zero. Default: None.
        use_cudnn(bool): Use cudnn kernel or not, it is valid only when the cudnn
-           library is installed. Default: True
-       act(str): Activation type. Default: None
+           library is installed. Default: True.
+       act (str): Activation type, if it is set to None, activation is not appended.
+           Default: None.
        name(str|None): A name for this layer(optional). If set None, the layer
            will be named automatically.
+           Default: None.
    Returns:
        Variable: The tensor variable storing the convolution transpose result.
...
@@ -2502,7 +2568,7 @@ def conv2d_transpose(input,
           data = fluid.layers.data(name='data', shape=[3, 32, 32], dtype='float32')
           conv2d_transpose = fluid.layers.conv2d_transpose(input=data, num_filters=2, filter_size=3)
    """
+    assert param_attr is not False, "param_attr should not be False in conv2d_transpose."
    input_channel = input.shape[1]
    op_type = 'conv2d_transpose'
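A 2x-upsampling sketch using the newly documented bool option for bias_attr (shapes, stride, and padding are illustrative):

    import paddle.fluid as fluid

    feat = fluid.layers.data(name='feat', shape=[8, 16, 16], dtype='float32')
    # (16 - 1) * 2 - 2 * 1 + 4 = 32: doubles the spatial size, no bias
    up = fluid.layers.conv2d_transpose(
        input=feat, num_filters=4, filter_size=4, stride=2, padding=1,
        bias_attr=False, act='relu')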
...
@@ -2538,6 +2604,7 @@ def conv2d_transpose(input,
    else:
        filter_size = utils.convert_to_list(filter_size, 2,
                                            'conv2d_transpose.filter_size')

    if output_size is None:
        output_size = []
    elif isinstance(output_size, list) or isinstance(output_size, int):
...
@@ -2547,6 +2614,7 @@ def conv2d_transpose(input,
    padding = utils.convert_to_list(padding, 2, 'padding')

    groups = 1 if groups is None else groups
    filter_shape = [input_channel, num_filters // groups] + filter_size
    img_filter = helper.create_parameter(
        dtype=input.dtype, shape=filter_shape, attr=helper.param_attr)
...
@@ -2659,12 +2727,19 @@ def conv3d_transpose(input,
            first half of the input channels, while the second half of the
            filters is only connected to the second half of the input channels.
            Default: groups=1
-       param_attr(ParamAttr): The parameters to the Conv3d_transpose Layer.
-           Default: None
-       bias_attr(ParamAttr): Bias parameter for the Conv3d layer. Default: None
+       param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
+           of conv3d_transpose. If it is set to None or one attribute of ParamAttr, conv3d_transpose
+           will create ParamAttr as param_attr. If the Initializer of the param_attr
+           is not set, the parameter is initialized with Xavier. Default: None.
+       bias_attr (ParamAttr|bool|None): The parameter attribute for the bias of conv3d_transpose.
+           If it is set to False, no bias will be added to the output units.
+           If it is set to None or one attribute of ParamAttr, conv3d_transpose
+           will create ParamAttr as bias_attr. If the Initializer of the bias_attr
+           is not set, the bias is initialized zero. Default: None.
        use_cudnn(bool): Use cudnn kernel or not, it is valid only when the cudnn
            library is installed. Default: True
-       act(str): Activation type. Default: None
+       act (str): Activation type, if it is set to None, activation is not appended.
+           Default: None.
        name(str|None): A name for this layer(optional). If set None, the layer
            will be named automatically.
...
@@ -2681,6 +2756,7 @@ def conv3d_transpose(input,
          data = fluid.layers.data(name='data', shape=[3, 12, 32, 32], dtype='float32')
          conv3d_transpose = fluid.layers.conv3d_transpose(input=data, num_filters=2, filter_size=3)
    """
+    assert param_attr is not False, "param_attr should not be False in conv3d_transpose."
    l_type = "conv3d_transpose"
    helper = LayerHelper(l_type, **locals())
    if not isinstance(input, Variable):
...
@@ -3199,10 +3275,18 @@ def lstm_unit(x_t,
        cell_t_prev (Variable): The cell value of lstm unit, a 2-D tensor with
            shape M x S, M for batch size and S for size of lstm unit.
        forget_bias (float): The forget bias of lstm unit.
-       param_attr (ParamAttr): The attributes of parameter weights, used to set
-           initializer, name etc.
-       bias_attr (ParamAttr): The attributes of bias weights, if not False,
-           bias weights will be created and be set to default value.
+       param_attr(ParamAttr|None): The parameter attribute for the learnable
+           hidden-hidden weights.
+           If it is set to None or one attribute of ParamAttr,
+           lstm_unit will create ParamAttr as param_attr.
+           If the Initializer of the param_attr is not set, the
+           parameter is initialized with Xavier. Default: None.
+       bias_attr (ParamAttr|None): The bias attribute for the learnable bias
+           weights. If it is set to False, no bias will be added
+           to the output units. If it is set to None or one attribute of ParamAttr,
+           lstm_unit will create ParamAttr as bias_attr.
+           If the Initializer of the bias_attr is not set,
+           the bias is initialized zero. Default: None.
        name(str|None): A name for this layer(optional). If set None, the layer
            will be named automatically.
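A single-step sketch matching the clarified lstm_unit docstring (batch and unit sizes are illustrative):

    import paddle.fluid as fluid

    dim = 32
    x_t = fluid.layers.data(name='x_t', shape=[dim], dtype='float32')
    h_prev = fluid.layers.data(name='h_prev', shape=[dim], dtype='float32')
    c_prev = fluid.layers.data(name='c_prev', shape=[dim], dtype='float32')
    # defaults: hidden-hidden weights get Xavier, bias starts at zero
    h_t, c_t = fluid.layers.lstm_unit(
        x_t=x_t, hidden_t_prev=h_prev, cell_t_prev=c_prev, forget_bias=1.0)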
...
@@ -4116,7 +4200,8 @@ def nce(input,
        sample_weight=None,
        param_attr=None,
        bias_attr=None,
-       num_neg_samples=None):
+       num_neg_samples=None,
+       name=None):
    """
    ${comment}
...
@@ -4127,9 +4212,18 @@ def nce(input,
        sample_weight (Variable|None): A Variable of shape [batch_size, 1]
            storing a weight for each sample. The default weight for each
            sample is 1.0.
-       param_attr (ParamAttr|None): attributes for parameter
-       bias_attr (ParamAttr|None): attributes for bias
+       param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
+           of nce. If it is set to None or one attribute of ParamAttr, nce
+           will create ParamAttr as param_attr. If the Initializer of the param_attr
+           is not set, the parameter is initialized with Xavier. Default: None.
+       bias_attr (ParamAttr|bool|None): The parameter attribute for the bias of nce.
+           If it is set to False, no bias will be added to the output units.
+           If it is set to None or one attribute of ParamAttr, nce
+           will create ParamAttr as bias_attr. If the Initializer of the bias_attr
+           is not set, the bias is initialized zero. Default: None.
        num_neg_samples (int): ${num_neg_samples_comment}
+       name (str|None): A name for this layer(optional). If set None, the layer
+           will be named automatically. Default: None.
    Returns:
        Variable: The output nce loss.
...
@@ -4162,19 +4256,28 @@ def nce(input,
"""
"""
helper
=
LayerHelper
(
'nce'
,
**
locals
())
helper
=
LayerHelper
(
'nce'
,
**
locals
())
assert
isinstance
(
input
,
Variable
)
assert
isinstance
(
input
,
Variable
)
dim
=
input
.
shape
[
1
]
assert
isinstance
(
label
,
Variable
)
assert
isinstance
(
label
,
Variable
)
dim
=
input
.
shape
[
1
]
num_true_class
=
label
.
shape
[
1
]
num_true_class
=
label
.
shape
[
1
]
w
=
helper
.
create_parameter
(
w
=
helper
.
create_parameter
(
attr
=
helper
.
param_attr
,
attr
=
helper
.
param_attr
,
shape
=
[
num_total_classes
,
dim
],
shape
=
[
num_total_classes
,
dim
],
is_bias
=
False
,
is_bias
=
False
,
dtype
=
input
.
dtype
)
dtype
=
input
.
dtype
)
inputs
=
{
'Input'
:
input
,
'Label'
:
label
,
'Weight'
:
w
,
'SampleWeight'
:
sample_weight
if
sample_weight
is
not
None
else
[]
}
if
helper
.
bias_attr
:
b
=
helper
.
create_parameter
(
b
=
helper
.
create_parameter
(
attr
=
helper
.
bias_attr
,
attr
=
helper
.
bias_attr
,
shape
=
[
num_total_classes
,
1
],
shape
=
[
num_total_classes
,
1
],
is_bias
=
True
,
is_bias
=
True
,
dtype
=
input
.
dtype
)
dtype
=
input
.
dtype
)
inputs
[
'Bias'
]
=
b
cost
=
helper
.
create_tmp_variable
(
dtype
=
input
.
dtype
)
cost
=
helper
.
create_tmp_variable
(
dtype
=
input
.
dtype
)
sample_logits
=
helper
.
create_tmp_variable
(
dtype
=
input
.
dtype
)
sample_logits
=
helper
.
create_tmp_variable
(
dtype
=
input
.
dtype
)
sample_labels
=
helper
.
create_tmp_variable
(
dtype
=
label
.
dtype
)
sample_labels
=
helper
.
create_tmp_variable
(
dtype
=
label
.
dtype
)
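The reworked inputs dict is what makes bias_attr=False legal for nce: the 'Bias' entry is simply never added. A sketch (vocabulary size and dimensions are illustrative):

    import paddle.fluid as fluid

    emb = fluid.layers.data(name='emb', shape=[128], dtype='float32')
    word = fluid.layers.data(name='word', shape=[1], dtype='int64')
    # with bias_attr=False no bias parameter is created or passed to the op
    loss = fluid.layers.nce(
        input=emb, label=word, num_total_classes=10000,
        bias_attr=False, num_neg_samples=10)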
...
@@ -4191,13 +4294,7 @@ def nce(input,
    helper.append_op(
        type='nce',
-       inputs={
-           'Input': input,
-           'Label': label,
-           'Weight': w,
-           'Bias': b,
-           'SampleWeight': sample_weight if sample_weight is not None else []
-       },
+       inputs=inputs,
        outputs={
            'Cost': cost,
            'SampleLogits': sample_logits,
...
@@ -4207,7 +4304,12 @@ def nce(input,
    return cost / (num_neg_samples + 1)


-def hsigmoid(input, label, num_classes, param_attr=None, bias_attr=None):
+def hsigmoid(input, label, num_classes, param_attr=None, bias_attr=None,
+             name=None):
"""
"""
The hierarchical sigmoid operator is used to accelerate the training
The hierarchical sigmoid operator is used to accelerate the training
process of language model. This operator organizes the classes into a
process of language model. This operator organizes the classes into a
...
@@ -4228,11 +4330,17 @@ def hsigmoid(input, label, num_classes, param_attr=None, bias_attr=None):
        label (Variable): The tensor variable contains labels of training data.
            It's a tensor with shape :math:`[N \\times 1]`.
        num_classes: (int), The number of classes, must not be less than 2.
-       param_attr (ParamAttr|list of ParamAttr, default None): The parameter
-           attribute for learnable parameters/weights of this layer.
-       bias_attr (ParamAttr|list of ParamAttr, default None): The parameter
-           attribute for the bias of this layer. If it is set to False, no
-           bias will be applied.
+       param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
+           of hsigmoid. If it is set to None or one attribute of ParamAttr, hsigmoid
+           will create ParamAttr as param_attr. If the Initializer of the param_attr
+           is not set, the parameter is initialized with Xavier. Default: None.
+       bias_attr (ParamAttr|bool|None): The parameter attribute for the bias of hsigmoid.
+           If it is set to False, no bias will be added to the output units.
+           If it is set to None or one attribute of ParamAttr, hsigmoid
+           will create ParamAttr as bias_attr. If the Initializer of the bias_attr
+           is not set, the bias is initialized zero. Default: None.
+       name (str|None): A name for this layer(optional). If set None, the layer
+           will be named automatically. Default: None.
    Returns:
        Out: (Tensor) The cost of hierarchical sigmoid operator. The shape is [N, 1]
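A minimal hsigmoid sketch against the extended signature (class count and feature size are illustrative):

    import paddle.fluid as fluid

    feat = fluid.layers.data(name='feat', shape=[128], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    cost = fluid.layers.hsigmoid(
        input=feat, label=label, num_classes=1000, name='hsig')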
...
python/paddle/fluid/nets.py
View file @ 23fc896b
...
@@ -64,23 +64,33 @@ def simple_img_conv_pool(input,
            average-pooling. Default :math:`max`.
        global_pooling (bool): Whether to use the global pooling. If global_pooling = true,
            pool_size and pool_padding will be ignored. Default False
-       conv_stride (int|list|tuple): The stride size of the Conv2d Layer. If stride is a
+       conv_stride (int|list|tuple): The stride size of the conv2d Layer. If stride is a
            list or tuple, it must contain two integers, (conv_stride_H, conv_stride_W). Otherwise,
            the conv_stride_H = conv_stride_W = conv_stride. Default: conv_stride = 1.
-       conv_padding (int|list|tuple): The padding size of the Conv2d Layer. If padding is
+       conv_padding (int|list|tuple): The padding size of the conv2d Layer. If padding is
            a list or tuple, it must contain two integers, (conv_padding_H, conv_padding_W).
            Otherwise, the conv_padding_H = conv_padding_W = conv_padding. Default: conv_padding = 0.
-       conv_dilation (int|list|tuple): The dilation size of the Conv2d Layer. If dilation is
+       conv_dilation (int|list|tuple): The dilation size of the conv2d Layer. If dilation is
            a list or tuple, it must contain two integers, (conv_dilation_H, conv_dilation_W).
            Otherwise, the conv_dilation_H = conv_dilation_W = conv_dilation. Default: conv_dilation = 1.
-       conv_groups (int): The groups number of the Conv2d Layer. According to grouped
+       conv_groups (int): The groups number of the conv2d Layer. According to grouped
            convolution in Alex Krizhevsky's Deep CNN paper: when group=2,
            the first half of the filters is only connected to the first half
            of the input channels, while the second half of the filters is only
-           connected to the second half of the input channels. Default: groups=1
-       param_attr (ParamAttr): The parameters to the Conv2d Layer. Default: None
-       bias_attr (ParamAttr): Bias parameter for the Conv2d layer. Default: None
-       act (str): Activation type for Conv2d. Default: None
+           connected to the second half of the input channels. Default: groups=1.
+       param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
+           of conv2d. If it is set to None or one attribute of ParamAttr, conv2d
+           will create ParamAttr as param_attr. If the Initializer of the param_attr
+           is not set, the parameter is initialized with :math:`Normal(0.0, std)`,
+           and the :math:`std` is :math:`(\\frac{2.0 }{filter\_elem\_num})^{0.5}`.
+           Default: None.
+       bias_attr (ParamAttr|bool|None): The parameter attribute for the bias of conv2d.
+           If it is set to False, no bias will be added to the output units.
+           If it is set to None or one attribute of ParamAttr, conv2d
+           will create ParamAttr as bias_attr. If the Initializer of the bias_attr
+           is not set, the bias is initialized zero. Default: None.
+       act (str): Activation type for conv2d, if it is set to None, activation is not
+           appended. Default: None.
        use_cudnn (bool): Use cudnn kernel or not, it is valid only when the cudnn
            library is installed. Default: True
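A sketch of the composed helper with the clarified attributes (MNIST-like shapes assumed): param_attr and bias_attr are forwarded to the inner conv2d.

    import paddle.fluid as fluid

    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    feat = fluid.nets.simple_img_conv_pool(
        input=img, num_filters=20, filter_size=5,
        pool_size=2, pool_stride=2, act='relu', bias_attr=False)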
...
python/paddle/fluid/regularizer.py
View file @ 23fc896b
...
@@ -237,6 +237,7 @@ class L1DecayRegularizer(WeightDecayRegularizer):
                        'Ids': idx},
                outputs={'Out': decay},
                attrs={'is_sparse': True})
+           param = decay

        # Append sign op
        block.append_op(
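The added param = decay line chains the lookup_table output forward, so the sign and scale ops that follow act on the gathered rows rather than the full embedding table. Usage is unchanged; a typical setup looks like:

    import paddle.fluid as fluid

    optimizer = fluid.optimizer.SGD(
        learning_rate=0.01,
        regularization=fluid.regularizer.L1Decay(regularization_coeff=1e-4))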
...
python/paddle/fluid/tests/unittests/test_polygon_box_transform.py
View file @ 23fc896b
...
@@ -37,7 +37,7 @@ def PolygonBoxRestore(input):
    indexes = indexes.repeat(
        [batch_size], axis=0)  # [batch_size, geo_channels/2, 2, h, w]
    return indexes.reshape(
-       input.shape) - input  # [batch_size, geo_channels, h, w]
+       input.shape) * 4 - input  # [batch_size, geo_channels, h, w]


class TestPolygonBoxRestoreOp(OpTest):
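A condensed numpy sketch of the corrected reference (the (x, y) channel layout is carried over from the test; the factor 4 presumably reflects the op working on 4x-downsampled geometry maps):

    import numpy as np

    def polygon_box_restore(input):
        b, c, h, w = input.shape  # c = geo_channels, must be even
        xs = np.tile(np.arange(w), (h, 1))           # (h, w) column index
        ys = np.tile(np.arange(h)[:, None], (1, w))  # (h, w) row index
        grid = np.stack([xs, ys])                    # (2, h, w)
        grid = np.tile(grid, (b, c // 2, 1, 1, 1)).reshape(input.shape)
        return grid * 4 - input                      # matches the fixed line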
...