Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
BaiXuePrincess
Paddle
提交
ea7dc9cb
P
Paddle
项目概览
BaiXuePrincess
/
Paddle
与 Fork 源项目一致
Fork自
PaddlePaddle / Paddle
通知
1
Star
1
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
0
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
Paddle
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
0
Issue
0
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
提交
ea7dc9cb
编写于
10月 08, 2018
作者:
T
tensor-tang
浏览文件
操作
浏览文件
下载
差异文件
Merge remote-tracking branch 'ups/develop' into fea/jitkernel
test=develop
上级
2513b2cc
d3f50474
变更
114
展开全部
隐藏空白更改
内联
并排
Showing
114 changed file
with
2996 addition
and
4578 deletion
+2996
-4578
Dockerfile
Dockerfile
+12
-2
cmake/cblas.cmake
cmake/cblas.cmake
+1
-1
cmake/external/openblas.cmake
cmake/external/openblas.cmake
+4
-4
paddle/fluid/API.spec
paddle/fluid/API.spec
+31
-31
paddle/fluid/framework/CMakeLists.txt
paddle/fluid/framework/CMakeLists.txt
+18
-10
paddle/fluid/framework/channel.h
paddle/fluid/framework/channel.h
+0
-291
paddle/fluid/framework/channel_impl.h
paddle/fluid/framework/channel_impl.h
+0
-369
paddle/fluid/framework/channel_test.cc
paddle/fluid/framework/channel_test.cc
+0
-1008
paddle/fluid/framework/concurrency_test.cc
paddle/fluid/framework/concurrency_test.cc
+0
-292
paddle/fluid/framework/details/reference_count_pass.cc
paddle/fluid/framework/details/reference_count_pass.cc
+9
-9
paddle/fluid/framework/executor.cc
paddle/fluid/framework/executor.cc
+1
-4
paddle/fluid/framework/framework.proto
paddle/fluid/framework/framework.proto
+0
-7
paddle/fluid/framework/ir/CMakeLists.txt
paddle/fluid/framework/ir/CMakeLists.txt
+2
-0
paddle/fluid/framework/ir/embedding_fc_lstm_fuse_pass.cc
paddle/fluid/framework/ir/embedding_fc_lstm_fuse_pass.cc
+243
-0
paddle/fluid/framework/ir/embedding_fc_lstm_fuse_pass.h
paddle/fluid/framework/ir/embedding_fc_lstm_fuse_pass.h
+18
-17
paddle/fluid/framework/ir/graph_pattern_detector.cc
paddle/fluid/framework/ir/graph_pattern_detector.cc
+18
-0
paddle/fluid/framework/ir/graph_pattern_detector.h
paddle/fluid/framework/ir/graph_pattern_detector.h
+17
-0
paddle/fluid/framework/naive_executor.cc
paddle/fluid/framework/naive_executor.cc
+21
-4
paddle/fluid/framework/naive_executor.h
paddle/fluid/framework/naive_executor.h
+4
-0
paddle/fluid/framework/op_proto_maker.cc
paddle/fluid/framework/op_proto_maker.cc
+1
-3
paddle/fluid/framework/op_proto_maker.h
paddle/fluid/framework/op_proto_maker.h
+0
-1
paddle/fluid/framework/operator.cc
paddle/fluid/framework/operator.cc
+13
-50
paddle/fluid/framework/parallel_executor.cc
paddle/fluid/framework/parallel_executor.cc
+13
-4
paddle/fluid/framework/rw_lock.h
paddle/fluid/framework/rw_lock.h
+1
-0
paddle/fluid/framework/scope.cc
paddle/fluid/framework/scope.cc
+0
-31
paddle/fluid/framework/tuple.h
paddle/fluid/framework/tuple.h
+0
-1
paddle/fluid/framework/var_desc.cc
paddle/fluid/framework/var_desc.cc
+2
-52
paddle/fluid/framework/var_desc.h
paddle/fluid/framework/var_desc.h
+0
-4
paddle/fluid/framework/var_type.h
paddle/fluid/framework/var_type.h
+0
-6
paddle/fluid/inference/CMakeLists.txt
paddle/fluid/inference/CMakeLists.txt
+3
-1
paddle/fluid/inference/analysis/CMakeLists.txt
paddle/fluid/inference/analysis/CMakeLists.txt
+0
-2
paddle/fluid/inference/analysis/analysis_pass.h
paddle/fluid/inference/analysis/analysis_pass.h
+0
-6
paddle/fluid/inference/analysis/analyzer.h
paddle/fluid/inference/analysis/analyzer.h
+9
-8
paddle/fluid/inference/api/CMakeLists.txt
paddle/fluid/inference/api/CMakeLists.txt
+0
-1
paddle/fluid/inference/api/analysis_predictor.cc
paddle/fluid/inference/api/analysis_predictor.cc
+6
-1
paddle/fluid/inference/api/api_impl.cc
paddle/fluid/inference/api/api_impl.cc
+0
-1
paddle/fluid/inference/api/api_impl_tester.cc
paddle/fluid/inference/api/api_impl_tester.cc
+11
-5
paddle/fluid/inference/api/demo_ci/run.sh
paddle/fluid/inference/api/demo_ci/run.sh
+9
-6
paddle/fluid/inference/api/helper.h
paddle/fluid/inference/api/helper.h
+19
-126
paddle/fluid/inference/api/paddle_inference_api.h
paddle/fluid/inference/api/paddle_inference_api.h
+3
-4
paddle/fluid/inference/tests/api/CMakeLists.txt
paddle/fluid/inference/tests/api/CMakeLists.txt
+8
-0
paddle/fluid/inference/tests/api/anakin_rnn1_tester.cc
paddle/fluid/inference/tests/api/anakin_rnn1_tester.cc
+0
-1
paddle/fluid/inference/tests/api/analyzer_resnet50_tester.cc
paddle/fluid/inference/tests/api/analyzer_resnet50_tester.cc
+96
-0
paddle/fluid/inference/tests/api/analyzer_rnn1_tester.cc
paddle/fluid/inference/tests/api/analyzer_rnn1_tester.cc
+3
-3
paddle/fluid/inference/tests/api/analyzer_text_classification_tester.cc
...nference/tests/api/analyzer_text_classification_tester.cc
+13
-0
paddle/fluid/inference/tests/api/analyzer_vis_tester.cc
paddle/fluid/inference/tests/api/analyzer_vis_tester.cc
+0
-2
paddle/fluid/inference/tests/api/tester_helper.h
paddle/fluid/inference/tests/api/tester_helper.h
+123
-0
paddle/fluid/inference/tests/book/CMakeLists.txt
paddle/fluid/inference/tests/book/CMakeLists.txt
+0
-1
paddle/fluid/operators/CMakeLists.txt
paddle/fluid/operators/CMakeLists.txt
+7
-9
paddle/fluid/operators/channel_close_op.cc
paddle/fluid/operators/channel_close_op.cc
+0
-70
paddle/fluid/operators/channel_create_op.cc
paddle/fluid/operators/channel_create_op.cc
+0
-113
paddle/fluid/operators/channel_recv_op.cc
paddle/fluid/operators/channel_recv_op.cc
+0
-98
paddle/fluid/operators/channel_send_op.cc
paddle/fluid/operators/channel_send_op.cc
+0
-76
paddle/fluid/operators/concurrency/CMakeLists.txt
paddle/fluid/operators/concurrency/CMakeLists.txt
+0
-1
paddle/fluid/operators/concurrency/channel_util.cc
paddle/fluid/operators/concurrency/channel_util.cc
+0
-111
paddle/fluid/operators/conv_op.h
paddle/fluid/operators/conv_op.h
+4
-3
paddle/fluid/operators/conv_transpose_op.h
paddle/fluid/operators/conv_transpose_op.h
+4
-3
paddle/fluid/operators/cub_reduce.h
paddle/fluid/operators/cub_reduce.h
+322
-0
paddle/fluid/operators/detection/roi_perspective_transform_op.cc
...fluid/operators/detection/roi_perspective_transform_op.cc
+5
-6
paddle/fluid/operators/detection/roi_perspective_transform_op.cu
...fluid/operators/detection/roi_perspective_transform_op.cu
+3
-3
paddle/fluid/operators/distributed/grpc_client.h
paddle/fluid/operators/distributed/grpc_client.h
+1
-0
paddle/fluid/operators/distributed/request_handler.h
paddle/fluid/operators/distributed/request_handler.h
+1
-0
paddle/fluid/operators/distributed/rpc_server.h
paddle/fluid/operators/distributed/rpc_server.h
+1
-0
paddle/fluid/operators/elementwise_op.h
paddle/fluid/operators/elementwise_op.h
+1
-1
paddle/fluid/operators/fused_embedding_fc_lstm_op.cc
paddle/fluid/operators/fused_embedding_fc_lstm_op.cc
+604
-0
paddle/fluid/operators/fused_embedding_fc_lstm_op.h
paddle/fluid/operators/fused_embedding_fc_lstm_op.h
+41
-0
paddle/fluid/operators/fusion_gru_op.cc
paddle/fluid/operators/fusion_gru_op.cc
+3
-2
paddle/fluid/operators/fusion_lstm_op.cc
paddle/fluid/operators/fusion_lstm_op.cc
+2
-1
paddle/fluid/operators/math/cpu_lstm_compute.cc
paddle/fluid/operators/math/cpu_lstm_compute.cc
+26
-1
paddle/fluid/operators/math/cpu_lstm_compute.h
paddle/fluid/operators/math/cpu_lstm_compute.h
+2
-19
paddle/fluid/operators/math/depthwise_conv.cu
paddle/fluid/operators/math/depthwise_conv.cu
+323
-156
paddle/fluid/operators/math/depthwise_conv.h
paddle/fluid/operators/math/depthwise_conv.h
+4
-1
paddle/fluid/operators/math/math_function.cc
paddle/fluid/operators/math/math_function.cc
+9
-0
paddle/fluid/operators/math/math_function.h
paddle/fluid/operators/math/math_function.h
+0
-12
paddle/fluid/operators/reduce_mean_op.cu
paddle/fluid/operators/reduce_mean_op.cu
+56
-9
paddle/fluid/operators/reduce_sum_op.cu
paddle/fluid/operators/reduce_sum_op.cu
+51
-9
paddle/fluid/operators/select_op.cc
paddle/fluid/operators/select_op.cc
+0
-419
paddle/fluid/operators/sequence_erase_op.cc
paddle/fluid/operators/sequence_erase_op.cc
+2
-2
paddle/fluid/operators/tensorrt_engine_op.h
paddle/fluid/operators/tensorrt_engine_op.h
+1
-1
paddle/fluid/operators/top_k_op.cc
paddle/fluid/operators/top_k_op.cc
+0
-2
paddle/fluid/platform/dynload/cublas.h
paddle/fluid/platform/dynload/cublas.h
+1
-1
paddle/fluid/platform/dynload/cudnn.h
paddle/fluid/platform/dynload/cudnn.h
+10
-7
paddle/fluid/platform/dynload/curand.h
paddle/fluid/platform/dynload/curand.h
+1
-1
paddle/fluid/platform/dynload/dynamic_loader.cc
paddle/fluid/platform/dynload/dynamic_loader.cc
+17
-3
paddle/fluid/platform/gpu_info.cc
paddle/fluid/platform/gpu_info.cc
+5
-2
paddle/fluid/pybind/const_value.cc
paddle/fluid/pybind/const_value.cc
+0
-3
paddle/fluid/pybind/protobuf.cc
paddle/fluid/pybind/protobuf.cc
+0
-2
paddle/fluid/pybind/pybind.cc
paddle/fluid/pybind/pybind.cc
+0
-1
paddle/fluid/train/CMakeLists.txt
paddle/fluid/train/CMakeLists.txt
+0
-1
paddle/legacy/trainer/tests/CMakeLists.txt
paddle/legacy/trainer/tests/CMakeLists.txt
+5
-1
paddle/scripts/paddle_build.sh
paddle/scripts/paddle_build.sh
+16
-4
python/paddle/fluid/clip.py
python/paddle/fluid/clip.py
+3
-1
python/paddle/fluid/concurrency.py
python/paddle/fluid/concurrency.py
+0
-454
python/paddle/fluid/contrib/tests/test_quantize_transpiler.py
...on/paddle/fluid/contrib/tests/test_quantize_transpiler.py
+16
-9
python/paddle/fluid/framework.py
python/paddle/fluid/framework.py
+2
-12
python/paddle/fluid/layers/control_flow.py
python/paddle/fluid/layers/control_flow.py
+1
-1
python/paddle/fluid/layers/detection.py
python/paddle/fluid/layers/detection.py
+95
-8
python/paddle/fluid/layers/nn.py
python/paddle/fluid/layers/nn.py
+488
-179
python/paddle/fluid/layers/ops.py
python/paddle/fluid/layers/ops.py
+1
-12
python/paddle/fluid/nets.py
python/paddle/fluid/nets.py
+6
-16
python/paddle/fluid/tests/CMakeLists.txt
python/paddle/fluid/tests/CMakeLists.txt
+1
-0
python/paddle/fluid/tests/book/high-level-api/recognize_digits/CMakeLists.txt
...tests/book/high-level-api/recognize_digits/CMakeLists.txt
+13
-3
python/paddle/fluid/tests/no_test_concurrency.py
python/paddle/fluid/tests/no_test_concurrency.py
+0
-260
python/paddle/fluid/tests/notest_concurrency.py
python/paddle/fluid/tests/notest_concurrency.py
+0
-41
python/paddle/fluid/tests/unittests/CMakeLists.txt
python/paddle/fluid/tests/unittests/CMakeLists.txt
+5
-3
python/paddle/fluid/tests/unittests/dist_se_resnext.py
python/paddle/fluid/tests/unittests/dist_se_resnext.py
+1
-1
python/paddle/fluid/tests/unittests/test_conv2d_op.py
python/paddle/fluid/tests/unittests/test_conv2d_op.py
+52
-7
python/paddle/fluid/tests/unittests/test_dist_base.py
python/paddle/fluid/tests/unittests/test_dist_base.py
+30
-36
python/paddle/fluid/tests/unittests/test_dist_ctr.py
python/paddle/fluid/tests/unittests/test_dist_ctr.py
+4
-3
python/paddle/fluid/tests/unittests/test_dist_simnet_bow.py
python/paddle/fluid/tests/unittests/test_dist_simnet_bow.py
+4
-4
python/paddle/fluid/tests/unittests/test_dist_text_classification.py
...le/fluid/tests/unittests/test_dist_text_classification.py
+2
-2
python/paddle/fluid/tests/unittests/test_layers.py
python/paddle/fluid/tests/unittests/test_layers.py
+9
-0
python/paddle/fluid/tests/unittests/test_operator_desc.py
python/paddle/fluid/tests/unittests/test_operator_desc.py
+1
-1
python/paddle/fluid/transpiler/inference_transpiler.py
python/paddle/fluid/transpiler/inference_transpiler.py
+2
-2
未找到文件。
Dockerfile
浏览文件 @
ea7dc9cb
...
...
@@ -24,6 +24,7 @@ COPY ./paddle/scripts/docker/root/ /root/
RUN
apt-get update
&&
\
apt-get
install
-y
--allow-downgrades
patchelf
\
python3 python3-dev python3-pip
\
git python-pip python-dev python-opencv openssh-server bison
\
libnccl2
=
2.1.2-1+cuda8.0 libnccl-dev
=
2.1.2-1+cuda8.0
\
wget unzip unrar
tar
xz-utils bzip2
gzip
coreutils ntp
\
...
...
@@ -70,24 +71,33 @@ RUN localedef -i en_US -f UTF-8 en_US.UTF-8
# specify sphinx version as 1.5.6 and remove -U option for [pip install -U
# sphinx-rtd-theme] since -U option will cause sphinx being updated to newest
# version(1.7.1 for now), which causes building documentation failed.
RUN
easy_install
-U
pip
&&
\
RUN
pip3
install
-U
wheel
&&
\
pip3
install
-U
docopt PyYAML
sphinx
==
1.5.6
&&
\
pip3
install
sphinx-rtd-theme
==
0.1.9 recommonmark
&&
\
easy_install
-U
pip
&&
\
pip
install
-U
wheel
&&
\
pip
install
-U
docopt PyYAML
sphinx
==
1.5.6
&&
\
pip
install
sphinx-rtd-theme
==
0.1.9 recommonmark
RUN
pip
install
pre-commit
'ipython==5.3.0'
&&
\
RUN
pip3
install
pre-commit
'ipython==5.3.0'
&&
\
pip3
install
'ipykernel==4.6.0'
'jupyter==1.0.0'
&&
\
pip3
install
opencv-python
&&
\
pip
install
pre-commit
'ipython==5.3.0'
&&
\
pip
install
'ipykernel==4.6.0'
'jupyter==1.0.0'
&&
\
pip
install
opencv-python
#For docstring checker
RUN
pip3
install
pylint pytest astroid isort
RUN
pip
install
pylint pytest astroid isort LinkChecker
COPY
./python/requirements.txt /root/
RUN
pip3
install
-r
/root/requirements.txt
RUN
pip
install
-r
/root/requirements.txt
# To fix https://github.com/PaddlePaddle/Paddle/issues/1954, we use
# the solution in https://urllib3.readthedocs.io/en/latest/user-guide.html#ssl-py2
RUN
apt-get
install
-y
libssl-dev libffi-dev
RUN
pip3
install
certifi urllib3[secure]
RUN
pip
install
certifi urllib3[secure]
...
...
cmake/cblas.cmake
浏览文件 @
ea7dc9cb
...
...
@@ -40,7 +40,7 @@ set(OPENBLAS_LIB_SEARCH_PATHS
/usr/local/opt/openblas/lib
)
find_path
(
OPENBLAS_INC_DIR NAMES cblas.h
PATHS
${
OPENBLAS_INCLUDE_SEARCH_PATHS
}
)
PATHS
${
OPENBLAS_INCLUDE_SEARCH_PATHS
}
NO_DEFAULT_PATH
)
find_path
(
OPENBLAS_LAPACKE_INC_DIR NAMES lapacke.h
PATHS
${
OPENBLAS_INCLUDE_SEARCH_PATHS
}
)
find_library
(
OPENBLAS_LIB NAMES openblas
...
...
cmake/external/openblas.cmake
浏览文件 @
ea7dc9cb
...
...
@@ -27,7 +27,7 @@ IF(NOT ${CBLAS_FOUND})
SET
(
CBLAS_SOURCES_DIR
${
THIRD_PARTY_PATH
}
/openblas
)
SET
(
CBLAS_INSTALL_DIR
${
THIRD_PARTY_PATH
}
/install/openblas
)
SET
(
CBLAS_INC
LUDE
_DIR
"
${
CBLAS_INSTALL_DIR
}
/include"
CACHE PATH
"openblas include directory."
FORCE
)
SET
(
CBLAS_INC_DIR
"
${
CBLAS_INSTALL_DIR
}
/include"
CACHE PATH
"openblas include directory."
FORCE
)
SET
(
CBLAS_LIBRARIES
"
${
CBLAS_INSTALL_DIR
}
/lib/
${
CMAKE_STATIC_LIBRARY_PREFIX
}
openblas
${
CMAKE_STATIC_LIBRARY_SUFFIX
}
"
...
...
@@ -96,7 +96,7 @@ IF(NOT ${CBLAS_FOUND})
ENDIF
(
NOT WIN32
)
SET
(
CBLAS_PROVIDER openblas
)
IF
(
WITH_C_API
)
INSTALL
(
DIRECTORY
${
CBLAS_INC
LUDE
_DIR
}
DESTINATION third_party/openblas
)
INSTALL
(
DIRECTORY
${
CBLAS_INC_DIR
}
DESTINATION third_party/openblas
)
# Because libopenblas.a is a symbolic link of another library, thus need to
# install the whole directory.
IF
(
ANDROID
)
...
...
@@ -117,8 +117,8 @@ IF(NOT ${CBLAS_FOUND})
ENDIF
(
NOT
${
CBLAS_FOUND
}
)
MESSAGE
(
STATUS
"BLAS library:
${
CBLAS_LIBRARIES
}
"
)
MESSAGE
(
STATUS
"BLAS Include:
${
CBLAS_INC
LUDE
_DIR
}
"
)
INCLUDE_DIRECTORIES
(
${
CBLAS_INC
LUDE
_DIR
}
)
MESSAGE
(
STATUS
"BLAS Include:
${
CBLAS_INC_DIR
}
"
)
INCLUDE_DIRECTORIES
(
${
CBLAS_INC_DIR
}
)
# FIXME(gangliao): generate cblas target to track all high performance
# linear algebra libraries for cc_library(xxx SRCS xxx.c DEPS cblas)
...
...
paddle/fluid/API.spec
浏览文件 @
ea7dc9cb
...
...
@@ -49,7 +49,7 @@ paddle.fluid.initializer.BilinearInitializer.__init__ ArgSpec(args=['self'], var
paddle.fluid.initializer.MSRAInitializer.__init__ ArgSpec(args=['self', 'uniform', 'fan_in', 'seed'], varargs=None, keywords=None, defaults=(True, None, 0))
paddle.fluid.initializer.force_init_on_cpu ArgSpec(args=[], varargs=None, keywords=None, defaults=None)
paddle.fluid.initializer.init_on_cpu ArgSpec(args=[], varargs='args', keywords='kwds', defaults=None)
paddle.fluid.layers.fc ArgSpec(args=['input', 'size', 'num_flatten_dims', 'param_attr', 'bias_attr', '
use_mkldnn', 'act', 'is_test', 'name'], varargs=None, keywords=None, defaults=(1, None, None, Fals
e, None, False, None))
paddle.fluid.layers.fc ArgSpec(args=['input', 'size', 'num_flatten_dims', 'param_attr', 'bias_attr', '
act', 'is_test', 'name'], varargs=None, keywords=None, defaults=(1, None, Non
e, None, False, None))
paddle.fluid.layers.embedding ArgSpec(args=['input', 'size', 'is_sparse', 'is_distributed', 'padding_idx', 'param_attr', 'dtype'], varargs=None, keywords=None, defaults=(False, False, None, None, 'float32'))
paddle.fluid.layers.dynamic_lstm ArgSpec(args=['input', 'size', 'h_0', 'c_0', 'param_attr', 'bias_attr', 'use_peepholes', 'is_reverse', 'gate_activation', 'cell_activation', 'candidate_activation', 'dtype', 'name'], varargs=None, keywords=None, defaults=(None, None, None, None, True, False, 'sigmoid', 'tanh', 'tanh', 'float32', None))
paddle.fluid.layers.dynamic_lstmp ArgSpec(args=['input', 'size', 'proj_size', 'param_attr', 'bias_attr', 'use_peepholes', 'is_reverse', 'gate_activation', 'cell_activation', 'candidate_activation', 'proj_activation', 'dtype', 'name'], varargs=None, keywords=None, defaults=(None, None, True, False, 'sigmoid', 'tanh', 'tanh', 'tanh', 'float32', None))
...
...
@@ -62,14 +62,14 @@ paddle.fluid.layers.cross_entropy ArgSpec(args=['input', 'label', 'soft_label',
paddle.fluid.layers.square_error_cost ArgSpec(args=['input', 'label'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.chunk_eval ArgSpec(args=['input', 'label', 'chunk_scheme', 'num_chunk_types', 'excluded_chunk_types'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.sequence_conv ArgSpec(args=['input', 'num_filters', 'filter_size', 'filter_stride', 'padding', 'bias_attr', 'param_attr', 'act'], varargs=None, keywords=None, defaults=(3, 1, None, None, None, None))
paddle.fluid.layers.conv2d ArgSpec(args=['input', 'num_filters', 'filter_size', 'stride', 'padding', 'dilation', 'groups', 'param_attr', 'bias_attr', 'use_cudnn', '
use_mkldnn', 'act', 'name'], varargs=None, keywords=None, defaults=(1, 0, 1, None, None, None, True, Fals
e, None, None))
paddle.fluid.layers.conv3d ArgSpec(args=['input', 'num_filters', 'filter_size', 'stride', 'padding', 'dilation', 'groups', 'param_attr', 'bias_attr', 'use_cudnn', '
use_mkldnn', 'act', 'name'], varargs=None, keywords=None, defaults=(1, 0, 1, None, None, None, True, Fals
e, None, None))
paddle.fluid.layers.conv2d ArgSpec(args=['input', 'num_filters', 'filter_size', 'stride', 'padding', 'dilation', 'groups', 'param_attr', 'bias_attr', 'use_cudnn', '
act', 'name'], varargs=None, keywords=None, defaults=(1, 0, 1, None, None, None, Tru
e, None, None))
paddle.fluid.layers.conv3d ArgSpec(args=['input', 'num_filters', 'filter_size', 'stride', 'padding', 'dilation', 'groups', 'param_attr', 'bias_attr', 'use_cudnn', '
act', 'name'], varargs=None, keywords=None, defaults=(1, 0, 1, None, None, None, Tru
e, None, None))
paddle.fluid.layers.sequence_pool ArgSpec(args=['input', 'pool_type'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.sequence_softmax ArgSpec(args=['input', 'param_attr', 'bias_attr', 'use_cudnn'], varargs=None, keywords=None, defaults=(None, None, False))
paddle.fluid.layers.softmax ArgSpec(args=['input', 'param_attr', 'bias_attr', 'use_cudnn', 'name'], varargs=None, keywords=None, defaults=(None, None, True, None))
paddle.fluid.layers.pool2d ArgSpec(args=['input', 'pool_size', 'pool_type', 'pool_stride', 'pool_padding', 'global_pooling', 'use_cudnn', 'ceil_mode', '
use_mkldnn', 'name'], varargs=None, keywords=None, defaults=(-1, 'max', 1, 0, False, True, Fals
e, False, None))
paddle.fluid.layers.pool3d ArgSpec(args=['input', 'pool_size', 'pool_type', 'pool_stride', 'pool_padding', 'global_pooling', 'use_cudnn', 'ceil_mode', '
use_mkldnn', 'name'], varargs=None, keywords=None, defaults=(-1, 'max', 1, 0, False, True, Fals
e, False, None))
paddle.fluid.layers.batch_norm ArgSpec(args=['input', 'act', 'is_test', 'momentum', 'epsilon', 'param_attr', 'bias_attr', 'data_layout', 'in_place', '
use_mkldnn', 'name', 'moving_mean_name', 'moving_variance_name', 'do_model_average_for_mean_and_var', 'fuse_with_relu'], varargs=None, keywords=None, defaults=(None, False, 0.9, 1e-05, None, None, 'NCHW', False
, False, None, None, None, False, False))
paddle.fluid.layers.pool2d ArgSpec(args=['input', 'pool_size', 'pool_type', 'pool_stride', 'pool_padding', 'global_pooling', 'use_cudnn', 'ceil_mode', '
name'], varargs=None, keywords=None, defaults=(-1, 'max', 1, 0, False, Tru
e, False, None))
paddle.fluid.layers.pool3d ArgSpec(args=['input', 'pool_size', 'pool_type', 'pool_stride', 'pool_padding', 'global_pooling', 'use_cudnn', 'ceil_mode', '
name'], varargs=None, keywords=None, defaults=(-1, 'max', 1, 0, False, Tru
e, False, None))
paddle.fluid.layers.batch_norm ArgSpec(args=['input', 'act', 'is_test', 'momentum', 'epsilon', 'param_attr', 'bias_attr', 'data_layout', 'in_place', '
name', 'moving_mean_name', 'moving_variance_name', 'do_model_average_for_mean_and_var', 'fuse_with_relu'], varargs=None, keywords=None, defaults=(None, False, 0.9, 1e-05, None, None, 'NCHW'
, False, None, None, None, False, False))
paddle.fluid.layers.beam_search_decode ArgSpec(args=['ids', 'scores', 'beam_size', 'end_id', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.conv2d_transpose ArgSpec(args=['input', 'num_filters', 'output_size', 'filter_size', 'padding', 'stride', 'dilation', 'groups', 'param_attr', 'bias_attr', 'use_cudnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, None, 0, 1, 1, None, None, None, True, None, None))
paddle.fluid.layers.conv3d_transpose ArgSpec(args=['input', 'num_filters', 'output_size', 'filter_size', 'padding', 'stride', 'dilation', 'groups', 'param_attr', 'bias_attr', 'use_cudnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, None, 0, 1, 1, None, None, None, True, None, None))
...
...
@@ -145,21 +145,31 @@ paddle.fluid.layers.unstack ArgSpec(args=['x', 'axis', 'num'], varargs=None, key
paddle.fluid.layers.sequence_enumerate ArgSpec(args=['input', 'win_size', 'pad_value', 'name'], varargs=None, keywords=None, defaults=(0, None))
paddle.fluid.layers.expand ArgSpec(args=['x', 'expand_times', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.sequence_concat ArgSpec(args=['input', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.scale ArgSpec(args=['x', 'scale', 'bias', 'bias_after_scale', '
out', 'act', 'name'], varargs=None, keywords=None, defaults=(1.0, 0.0, True, Non
e, None, None))
paddle.fluid.layers.elementwise_add ArgSpec(args=['x', 'y', '
out', 'axis', 'use_mkldnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, -1, False
, None, None))
paddle.fluid.layers.elementwise_div ArgSpec(args=['x', 'y', '
out', 'axis', 'use_mkldnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, -1, False
, None, None))
paddle.fluid.layers.elementwise_sub ArgSpec(args=['x', 'y', '
out', 'axis', 'use_mkldnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, -1, False
, None, None))
paddle.fluid.layers.elementwise_mul ArgSpec(args=['x', 'y', '
out', 'axis', 'use_mkldnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, -1, False
, None, None))
paddle.fluid.layers.elementwise_max ArgSpec(args=['x', 'y', '
out', 'axis', 'use_mkldnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, -1, False
, None, None))
paddle.fluid.layers.elementwise_min ArgSpec(args=['x', 'y', '
out', 'axis', 'use_mkldnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, -1, False
, None, None))
paddle.fluid.layers.elementwise_pow ArgSpec(args=['x', 'y', '
out', 'axis', 'use_mkldnn', 'act', 'name'], varargs=None, keywords=None, defaults=(None, -1, False
, None, None))
paddle.fluid.layers.scale ArgSpec(args=['x', 'scale', 'bias', 'bias_after_scale', '
act', 'name'], varargs=None, keywords=None, defaults=(1.0, 0.0, Tru
e, None, None))
paddle.fluid.layers.elementwise_add ArgSpec(args=['x', 'y', '
axis', 'act', 'name'], varargs=None, keywords=None, defaults=(-1
, None, None))
paddle.fluid.layers.elementwise_div ArgSpec(args=['x', 'y', '
axis', 'act', 'name'], varargs=None, keywords=None, defaults=(-1
, None, None))
paddle.fluid.layers.elementwise_sub ArgSpec(args=['x', 'y', '
axis', 'act', 'name'], varargs=None, keywords=None, defaults=(-1
, None, None))
paddle.fluid.layers.elementwise_mul ArgSpec(args=['x', 'y', '
axis', 'act', 'name'], varargs=None, keywords=None, defaults=(-1
, None, None))
paddle.fluid.layers.elementwise_max ArgSpec(args=['x', 'y', '
axis', 'act', 'name'], varargs=None, keywords=None, defaults=(-1
, None, None))
paddle.fluid.layers.elementwise_min ArgSpec(args=['x', 'y', '
axis', 'act', 'name'], varargs=None, keywords=None, defaults=(-1
, None, None))
paddle.fluid.layers.elementwise_pow ArgSpec(args=['x', 'y', '
axis', 'act', 'name'], varargs=None, keywords=None, defaults=(-1
, None, None))
paddle.fluid.layers.uniform_random_batch_size_like ArgSpec(args=['input', 'shape', 'dtype', 'input_dim_idx', 'output_dim_idx', 'min', 'max', 'seed'], varargs=None, keywords=None, defaults=('float32', 0, 0, -1.0, 1.0, 0))
paddle.fluid.layers.gaussian_random ArgSpec(args=['shape', 'mean', 'std', 'seed', 'dtype'
, 'use_mkldnn'], varargs=None, keywords=None, defaults=(0.0, 1.0, 0, 'float32', False
))
paddle.fluid.layers.gaussian_random ArgSpec(args=['shape', 'mean', 'std', 'seed', 'dtype'
], varargs=None, keywords=None, defaults=(0.0, 1.0, 0, 'float32'
))
paddle.fluid.layers.sampling_id ArgSpec(args=['x', 'min', 'max', 'seed', 'dtype'], varargs=None, keywords=None, defaults=(0.0, 1.0, 0, 'float32'))
paddle.fluid.layers.gaussian_random_batch_size_like ArgSpec(args=['input', 'shape', 'input_dim_idx', 'output_dim_idx', 'mean', 'std', 'seed', 'dtype'], varargs=None, keywords=None, defaults=(0, 0, 0.0, 1.0, 0, 'float32'))
paddle.fluid.layers.sum ArgSpec(args=['x'
, 'use_mkldnn'], varargs=None, keywords=None, defaults=(False,)
)
paddle.fluid.layers.sum ArgSpec(args=['x'
], varargs=None, keywords=None, defaults=None
)
paddle.fluid.layers.slice ArgSpec(args=['input', 'axes', 'starts', 'ends'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.shape ArgSpec(args=['input'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.logical_and ArgSpec(args=['x', 'y', 'out', 'name'], varargs=None, keywords=None, defaults=(None, None))
paddle.fluid.layers.logical_or ArgSpec(args=['x', 'y', 'out', 'name'], varargs=None, keywords=None, defaults=(None, None))
paddle.fluid.layers.logical_xor ArgSpec(args=['x', 'y', 'out', 'name'], varargs=None, keywords=None, defaults=(None, None))
paddle.fluid.layers.logical_not ArgSpec(args=['x', 'out', 'name'], varargs=None, keywords=None, defaults=(None, None))
paddle.fluid.layers.clip ArgSpec(args=['x', 'min', 'max', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.clip_by_norm ArgSpec(args=['x', 'max_norm', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.mean ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.mul ArgSpec(args=['x', 'y', 'x_num_col_dims', 'y_num_col_dims', 'name'], varargs=None, keywords=None, defaults=(1, 1, None))
paddle.fluid.layers.sigmoid_cross_entropy_with_logits ArgSpec(args=['x', 'label', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.maxout ArgSpec(args=['x', 'groups', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.data ArgSpec(args=['name', 'shape', 'append_batch_size', 'dtype', 'lod_level', 'type', 'stop_gradient'], varargs=None, keywords=None, defaults=(True, 'float32', 0, VarType.LOD_TENSOR, True))
paddle.fluid.layers.open_files ArgSpec(args=['filenames', 'shapes', 'lod_levels', 'dtypes', 'thread_num', 'buffer_size', 'pass_num', 'is_test'], varargs=None, keywords=None, defaults=(None, None, 1, None))
paddle.fluid.layers.read_file ArgSpec(args=['reader'], varargs=None, keywords=None, defaults=None)
...
...
@@ -222,16 +232,6 @@ paddle.fluid.layers.StaticRNN.update_memory ArgSpec(args=['self', 'mem', 'var'],
paddle.fluid.layers.reorder_lod_tensor_by_rank ArgSpec(args=['x', 'rank_table'], varargs=None, keywords=None, defaults=None)
paddle.fluid.layers.Print ArgSpec(args=['input', 'first_n', 'message', 'summarize', 'print_tensor_name', 'print_tensor_type', 'print_tensor_shape', 'print_tensor_lod', 'print_phase'], varargs=None, keywords=None, defaults=(-1, None, -1, True, True, True, True, 'both'))
paddle.fluid.layers.is_empty ArgSpec(args=['x', 'cond'], varargs=None, keywords='ignored', defaults=(None,))
paddle.fluid.layers.mean ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.mul ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.sigmoid_cross_entropy_with_logits ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.clip ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.clip_by_norm ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.logical_and ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.logical_or ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.logical_xor ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.logical_not ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.maxout ArgSpec(args=[], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.layers.sigmoid ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.logsigmoid ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.exp ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
...
...
@@ -265,9 +265,9 @@ paddle.fluid.layers.anchor_generator ArgSpec(args=['input', 'anchor_sizes', 'asp
paddle.fluid.layers.roi_perspective_transform ArgSpec(args=['input', 'rois', 'transformed_height', 'transformed_width', 'spatial_scale'], varargs=None, keywords=None, defaults=(1.0,))
paddle.fluid.layers.generate_proposal_labels ArgSpec(args=['rpn_rois', 'gt_classes', 'is_crowd', 'gt_boxes', 'im_info', 'batch_size_per_im', 'fg_fraction', 'fg_thresh', 'bg_thresh_hi', 'bg_thresh_lo', 'bbox_reg_weights', 'class_nums', 'use_random'], varargs=None, keywords=None, defaults=(256, 0.25, 0.25, 0.5, 0.0, [0.1, 0.1, 0.2, 0.2], None, True))
paddle.fluid.layers.generate_proposals ArgSpec(args=['scores', 'bbox_deltas', 'im_info', 'anchors', 'variances', 'pre_nms_top_n', 'post_nms_top_n', 'nms_thresh', 'min_size', 'eta', 'name'], varargs=None, keywords=None, defaults=(6000, 1000, 0.5, 0.1, 1.0, None))
paddle.fluid.layers.iou_similarity ArgSpec(args=[
], varargs='args', keywords='kwargs', defaults=None
)
paddle.fluid.layers.box_coder ArgSpec(args=[
], varargs='args', keywords='kwargs', defaults=None
)
paddle.fluid.layers.polygon_box_transform ArgSpec(args=[
], varargs='args', keywords='kwargs', defaults=None
)
paddle.fluid.layers.iou_similarity ArgSpec(args=[
'x', 'y', 'name'], varargs=None, keywords=None, defaults=(None,)
)
paddle.fluid.layers.box_coder ArgSpec(args=[
'prior_box', 'prior_box_var', 'target_box', 'code_type', 'box_normalized', 'name'], varargs=None, keywords=None, defaults=('encode_center_size', True, None)
)
paddle.fluid.layers.polygon_box_transform ArgSpec(args=[
'input', 'name'], varargs=None, keywords=None, defaults=(None,)
)
paddle.fluid.layers.accuracy ArgSpec(args=['input', 'label', 'k', 'correct', 'total'], varargs=None, keywords=None, defaults=(1, None, None))
paddle.fluid.layers.auc ArgSpec(args=['input', 'label', 'curve', 'num_thresholds', 'topk', 'slide_steps'], varargs=None, keywords=None, defaults=('ROC', 4095, 1, 1))
paddle.fluid.layers.exponential_decay ArgSpec(args=['learning_rate', 'decay_steps', 'decay_rate', 'staircase'], varargs=None, keywords=None, defaults=(False,))
...
...
@@ -318,11 +318,11 @@ paddle.fluid.transpiler.RoundRobin.__init__ ArgSpec(args=['self', 'pserver_endpo
paddle.fluid.transpiler.RoundRobin.dispatch ArgSpec(args=['self', 'varlist'], varargs=None, keywords=None, defaults=None)
paddle.fluid.transpiler.RoundRobin.reset ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.transpiler.DistributeTranspilerConfig.__init__
paddle.fluid.nets.simple_img_conv_pool ArgSpec(args=['input', 'num_filters', 'filter_size', 'pool_size', 'pool_stride', 'pool_padding', 'pool_type', 'global_pooling', 'conv_stride', 'conv_padding', 'conv_dilation', 'conv_groups', 'param_attr', 'bias_attr', 'act', 'use_cudnn'
, 'use_mkldnn'], varargs=None, keywords=None, defaults=(0, 'max', False, 1, 0, 1, 1, None, None, None, True, Fals
e))
paddle.fluid.nets.simple_img_conv_pool ArgSpec(args=['input', 'num_filters', 'filter_size', 'pool_size', 'pool_stride', 'pool_padding', 'pool_type', 'global_pooling', 'conv_stride', 'conv_padding', 'conv_dilation', 'conv_groups', 'param_attr', 'bias_attr', 'act', 'use_cudnn'
], varargs=None, keywords=None, defaults=(0, 'max', False, 1, 0, 1, 1, None, None, None, Tru
e))
paddle.fluid.nets.sequence_conv_pool ArgSpec(args=['input', 'num_filters', 'filter_size', 'param_attr', 'act', 'pool_type'], varargs=None, keywords=None, defaults=(None, 'sigmoid', 'max'))
paddle.fluid.nets.glu ArgSpec(args=['input', 'dim'], varargs=None, keywords=None, defaults=(-1,))
paddle.fluid.nets.scaled_dot_product_attention ArgSpec(args=['queries', 'keys', 'values', 'num_heads', 'dropout_rate'], varargs=None, keywords=None, defaults=(1, 0.0))
paddle.fluid.nets.img_conv_group ArgSpec(args=['input', 'conv_num_filter', 'pool_size', 'conv_padding', 'conv_filter_size', 'conv_act', 'param_attr', 'conv_with_batchnorm', 'conv_batchnorm_drop_rate', 'pool_stride', 'pool_type', 'use_cudnn'
, 'use_mkldnn'], varargs=None, keywords=None, defaults=(1, 3, None, None, False, 0.0, 1, 'max', True, Fals
e))
paddle.fluid.nets.img_conv_group ArgSpec(args=['input', 'conv_num_filter', 'pool_size', 'conv_padding', 'conv_filter_size', 'conv_act', 'param_attr', 'conv_with_batchnorm', 'conv_batchnorm_drop_rate', 'pool_stride', 'pool_type', 'use_cudnn'
], varargs=None, keywords=None, defaults=(1, 3, None, None, False, 0.0, 1, 'max', Tru
e))
paddle.fluid.optimizer.SGDOptimizer.__init__ ArgSpec(args=['self', 'learning_rate', 'regularization', 'name'], varargs=None, keywords=None, defaults=(None, None))
paddle.fluid.optimizer.SGDOptimizer.minimize ArgSpec(args=['self', 'loss', 'startup_program', 'parameter_list', 'no_grad_set'], varargs=None, keywords=None, defaults=(None, None, None))
paddle.fluid.optimizer.MomentumOptimizer.__init__ ArgSpec(args=['self', 'learning_rate', 'momentum', 'use_nesterov', 'regularization', 'name'], varargs=None, keywords=None, defaults=(False, None, None))
...
...
paddle/fluid/framework/CMakeLists.txt
浏览文件 @
ea7dc9cb
# windows treat symbolic file as a real file, which is different with unix
# We create a hidden file and compile it instead of origin source file.
function
(
windows_symbolic TARGET
)
...
...
@@ -9,11 +10,23 @@ function(windows_symbolic TARGET)
if
(
NOT EXISTS
${
CMAKE_CURRENT_SOURCE_DIR
}
/
${
src
}
.cc OR NOT EXISTS
${
CMAKE_CURRENT_SOURCE_DIR
}
/
${
src
}
.cu
)
message
(
FATAL
"
${
src
}
.cc and
${
src
}
.cu must exsits, and
${
src
}
.cu must be symbolic file."
)
endif
()
add_custom_command
(
OUTPUT .
${
src
}
.cu
# only copy the xx.cu to .xx.cu when the content are modified
set
(
copy_flag 1
)
if
(
EXISTS
${
CMAKE_CURRENT_SOURCE_DIR
}
/.
${
src
}
.cu
)
file
(
READ
${
CMAKE_CURRENT_SOURCE_DIR
}
/
${
src
}
.cc SOURCE_STR
)
file
(
READ
${
CMAKE_CURRENT_SOURCE_DIR
}
/.
${
src
}
.cu TARGET_STR
)
if
(
SOURCE_STR STREQUAL TARGET_STR
)
set
(
copy_flag 0
)
endif
()
endif
()
if
(
copy_flag
)
add_custom_command
(
OUTPUT .
${
src
}
.cu
COMMAND
${
CMAKE_COMMAND
}
-E remove
${
CMAKE_CURRENT_SOURCE_DIR
}
/.
${
src
}
.cu
COMMAND
${
CMAKE_COMMAND
}
-E copy
"
${
CMAKE_CURRENT_SOURCE_DIR
}
/
${
src
}
.cc"
"
${
CMAKE_CURRENT_SOURCE_DIR
}
/.
${
src
}
.cu"
COMMENT
"create hidden file of
${
src
}
.cu"
)
add_custom_target
(
${
TARGET
}
ALL DEPENDS .
${
src
}
.cu
)
endif
(
copy_flag
)
add_custom_target
(
${
TARGET
}
ALL DEPENDS .
${
src
}
.cu
)
endforeach
()
endfunction
()
...
...
@@ -81,6 +94,8 @@ nv_test(data_device_transform_test SRCS data_device_transform_test.cu
if
(
WITH_GPU
)
if
(
WIN32
)
# windows treat symbolic file as a real file, which is different with unix
# We create a hidden file and compile it instead of origin source file.
windows_symbolic
(
hidden_file SRCS data_type_transform.cu
)
nv_library
(
data_type_transform SRCS .data_type_transform.cu DEPS tensor
)
add_dependencies
(
data_type_transform hidden_file
)
...
...
@@ -149,7 +164,7 @@ if(WITH_DISTRIBUTE)
set_source_files_properties
(
executor.cc PROPERTIES COMPILE_FLAGS
${
DISTRIBUTE_COMPILE_FLAGS
}
)
else
()
cc_library
(
executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass
)
cc_test
(
test_naive_executor SRCS naive_executor_test.cc DEPS naive_executor
op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass
elementwise_add_op
)
cc_test
(
test_naive_executor SRCS naive_executor_test.cc DEPS naive_executor elementwise_add_op
)
endif
()
if
(
NOT WIN32
)
...
...
@@ -169,15 +184,8 @@ cc_test(selected_rows_test SRCS selected_rows_test.cc DEPS selected_rows)
cc_test
(
op_kernel_type_test SRCS op_kernel_type_test.cc DEPS place device_context framework_proto
)
cc_test
(
cow_ptr_tests SRCS details/cow_ptr_test.cc
)
# cc_test(channel_test SRCS channel_test.cc)
cc_test
(
tuple_test SRCS tuple_test.cc
)
if
(
NOT WIN32
)
cc_test
(
rw_lock_test SRCS rw_lock_test.cc
)
endif
(
NOT WIN32
)
# disable test temporarily.
# TODO https://github.com/PaddlePaddle/Paddle/issues/11971
# cc_test(concurrency_test SRCS concurrency_test.cc DEPS go_op channel_close_op channel_create_op
# channel_send_op channel_recv_op sum_op select_op elementwise_add_op compare_op
# conditional_block_op while_op assign_op print_op executor proto_desc)
paddle/fluid/framework/channel.h
已删除
100644 → 0
浏览文件 @
2513b2cc
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <stddef.h> // for size_t
#include <condition_variable> // NOLINT
#include <typeindex>
#include "paddle/fluid/platform/enforce.h"
namespace
paddle
{
namespace
framework
{
enum
class
ChannelAction
{
SEND
=
0
,
RECEIVE
=
1
,
CLOSE
=
2
,
};
// Channel is the abstract class of buffered and un-buffered channels.
template
<
typename
T
>
class
Channel
{
public:
virtual
bool
CanSend
()
=
0
;
virtual
bool
CanReceive
()
=
0
;
virtual
void
Send
(
T
*
)
=
0
;
virtual
bool
Receive
(
T
*
)
=
0
;
virtual
size_t
Cap
()
=
0
;
virtual
void
Lock
()
=
0
;
virtual
void
Unlock
()
=
0
;
virtual
bool
IsClosed
()
=
0
;
virtual
void
Close
()
=
0
;
virtual
~
Channel
()
{}
virtual
void
AddToSendQ
(
const
void
*
referrer
,
T
*
data
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
ChannelAction
)
>
cb
)
=
0
;
virtual
void
AddToReceiveQ
(
const
void
*
referrer
,
T
*
data
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
ChannelAction
)
>
cb
)
=
0
;
virtual
void
RemoveFromSendQ
(
const
void
*
referrer
)
=
0
;
virtual
void
RemoveFromReceiveQ
(
const
void
*
referrer
)
=
0
;
};
// Forward declaration of channel implementations.
template
<
typename
T
>
class
ChannelImpl
;
template
<
typename
T
>
Channel
<
T
>*
MakeChannel
(
size_t
buffer_size
)
{
return
new
ChannelImpl
<
T
>
(
buffer_size
);
}
template
<
typename
T
>
void
CloseChannel
(
Channel
<
T
>*
ch
)
{
ch
->
Close
();
}
/*
* The ChannelHolder class serves two main purposes:
* 1. It acts as a unified wrapper for the different kinds of
* channels, i.e. Buffered and Unbuffered channels. This is
* similar to the ReaderHolder class.
* 2. It also helps us in TypeHiding. This is similar to the
* PlaceHolder implementations in variable.h and tensor.h.
*/
class
ChannelHolder
{
public:
template
<
typename
T
>
void
Reset
(
size_t
buffer_size
)
{
holder_
.
reset
(
new
PlaceholderImpl
<
T
>
(
buffer_size
));
}
template
<
typename
T
>
void
Send
(
T
*
data
)
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
PADDLE_ENFORCE_EQ
(
holder_
->
Type
(),
std
::
type_index
(
typeid
(
T
)),
"Channel type is not same as the type of the data being sent"
);
// Static cast should be safe because we have ensured that types are same
Channel
<
T
>*
channel
=
static_cast
<
Channel
<
T
>*>
(
holder_
->
Ptr
());
PADDLE_ENFORCE_EQ
(
channel
!=
nullptr
,
true
,
"Channel should not be null."
);
channel
->
Send
(
data
);
}
template
<
typename
T
>
bool
Receive
(
T
*
data
)
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
PADDLE_ENFORCE_EQ
(
holder_
->
Type
(),
std
::
type_index
(
typeid
(
T
)),
"Channel type is not same as the type of the data being sent"
);
Channel
<
T
>*
channel
=
static_cast
<
Channel
<
T
>*>
(
holder_
->
Ptr
());
PADDLE_ENFORCE_EQ
(
channel
!=
nullptr
,
true
,
"Channel should not be null."
);
return
channel
->
Receive
(
data
);
}
bool
IsClosed
()
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
return
holder_
->
IsClosed
();
}
bool
CanSend
()
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
return
holder_
->
CanSend
();
}
bool
CanReceive
()
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
return
holder_
->
CanReceive
();
}
void
close
()
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
holder_
->
Close
();
}
size_t
Cap
()
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
return
holder_
->
Cap
();
}
void
Lock
()
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
holder_
->
Lock
();
}
void
Unlock
()
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
holder_
->
Unlock
();
}
template
<
typename
T
>
void
AddToSendQ
(
const
void
*
referrer
,
T
*
data
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
ChannelAction
)
>
cb
)
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
Channel
<
T
>*
channel
=
static_cast
<
Channel
<
T
>*>
(
holder_
->
Ptr
());
if
(
channel
!=
nullptr
)
{
channel
->
AddToSendQ
(
referrer
,
data
,
cond
,
cb
);
}
}
template
<
typename
T
>
void
AddToReceiveQ
(
const
void
*
referrer
,
T
*
data
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
ChannelAction
)
>
cb
)
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
Channel
<
T
>*
channel
=
static_cast
<
Channel
<
T
>*>
(
holder_
->
Ptr
());
if
(
channel
!=
nullptr
)
{
channel
->
AddToReceiveQ
(
referrer
,
data
,
cond
,
cb
);
}
}
void
RemoveFromSendQ
(
const
void
*
referrer
)
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
holder_
->
RemoveFromSendQ
(
referrer
);
}
void
RemoveFromReceiveQ
(
const
void
*
referrer
)
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
holder_
->
RemoveFromReceiveQ
(
referrer
);
}
inline
bool
IsInitialized
()
const
{
return
holder_
!=
nullptr
;
}
inline
const
std
::
type_index
Type
()
{
PADDLE_ENFORCE_EQ
(
IsInitialized
(),
true
,
"The Channel hasn't been initialized"
);
return
holder_
->
Type
();
}
private:
/**
* @note Placeholder hides type T, so it doesn't appear as a template
* parameter of ChannelHolder.
*/
struct
Placeholder
{
virtual
~
Placeholder
()
{}
virtual
const
std
::
type_index
Type
()
const
=
0
;
virtual
void
*
Ptr
()
const
=
0
;
virtual
bool
IsClosed
()
=
0
;
virtual
bool
CanSend
()
=
0
;
virtual
bool
CanReceive
()
=
0
;
virtual
void
RemoveFromSendQ
(
const
void
*
referrer
)
=
0
;
virtual
void
RemoveFromReceiveQ
(
const
void
*
referrer
)
=
0
;
virtual
void
Close
()
=
0
;
virtual
void
Lock
()
=
0
;
virtual
void
Unlock
()
=
0
;
virtual
size_t
Cap
()
=
0
;
};
template
<
typename
T
>
struct
PlaceholderImpl
:
public
Placeholder
{
explicit
PlaceholderImpl
(
size_t
buffer_size
)
:
type_
(
std
::
type_index
(
typeid
(
T
)))
{
channel_
.
reset
(
MakeChannel
<
T
>
(
buffer_size
));
}
virtual
const
std
::
type_index
Type
()
const
{
return
type_
;
}
virtual
void
*
Ptr
()
const
{
return
static_cast
<
void
*>
(
channel_
.
get
());
}
virtual
bool
IsClosed
()
{
if
(
channel_
)
{
return
channel_
->
IsClosed
();
}
return
false
;
}
virtual
bool
CanSend
()
{
if
(
channel_
)
{
return
channel_
->
CanSend
();
}
return
false
;
}
virtual
bool
CanReceive
()
{
if
(
channel_
)
{
return
channel_
->
CanReceive
();
}
return
false
;
}
virtual
void
RemoveFromSendQ
(
const
void
*
referrer
)
{
if
(
channel_
)
{
channel_
->
RemoveFromSendQ
(
referrer
);
}
}
virtual
void
RemoveFromReceiveQ
(
const
void
*
referrer
)
{
if
(
channel_
)
{
channel_
->
RemoveFromReceiveQ
(
referrer
);
}
}
virtual
void
Close
()
{
if
(
channel_
)
channel_
->
Close
();
}
virtual
size_t
Cap
()
{
if
(
channel_
)
return
channel_
->
Cap
();
else
return
-
1
;
}
virtual
void
Lock
()
{
if
(
channel_
)
channel_
->
Lock
();
}
virtual
void
Unlock
()
{
if
(
channel_
)
channel_
->
Unlock
();
}
std
::
unique_ptr
<
Channel
<
T
>>
channel_
;
const
std
::
type_index
type_
;
};
// Pointer to a PlaceholderImpl object
std
::
unique_ptr
<
Placeholder
>
holder_
;
};
}
// namespace framework
}
// namespace paddle
#include "paddle/fluid/framework/channel_impl.h"
paddle/fluid/framework/channel_impl.h
已删除
100644 → 0
浏览文件 @
2513b2cc
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <stddef.h> // for size_t
#include <atomic>
#include <condition_variable> // NOLINT
#include <deque>
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/platform/enforce.h"
namespace
paddle
{
namespace
framework
{
template
<
typename
T
>
class
ChannelImpl
:
public
paddle
::
framework
::
Channel
<
T
>
{
friend
Channel
<
T
>
*
paddle
::
framework
::
MakeChannel
<
T
>
(
size_t
);
friend
void
paddle
::
framework
::
CloseChannel
<
T
>
(
Channel
<
T
>
*
);
public:
virtual
bool
CanSend
();
virtual
bool
CanReceive
();
virtual
void
Send
(
T
*
);
virtual
bool
Receive
(
T
*
);
virtual
size_t
Cap
()
{
return
cap_
;
}
virtual
void
Lock
();
virtual
void
Unlock
();
virtual
bool
IsClosed
();
virtual
void
Close
();
explicit
ChannelImpl
(
size_t
);
virtual
~
ChannelImpl
();
virtual
void
AddToSendQ
(
const
void
*
referrer
,
T
*
data
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
ChannelAction
)
>
cb
);
virtual
void
AddToReceiveQ
(
const
void
*
referrer
,
T
*
data
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
ChannelAction
)
>
cb
);
virtual
void
RemoveFromSendQ
(
const
void
*
referrer
);
virtual
void
RemoveFromReceiveQ
(
const
void
*
referrer
);
private:
struct
QueueMessage
{
T
*
data
;
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
;
bool
chan_closed
=
false
;
bool
completed
=
false
;
const
void
*
referrer
;
// TODO(thuan): figure out better way to do this
std
::
function
<
bool
(
ChannelAction
)
>
callback
;
explicit
QueueMessage
(
T
*
item
)
:
data
(
item
),
cond
(
std
::
make_shared
<
std
::
condition_variable_any
>
())
{}
QueueMessage
(
T
*
item
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
)
:
data
(
item
),
cond
(
cond
)
{}
void
Wait
(
std
::
unique_lock
<
std
::
recursive_mutex
>
&
lock
)
{
cond
->
wait
(
lock
,
[
this
]()
{
return
completed
;
});
}
void
Notify
()
{
completed
=
true
;
cond
->
notify_all
();
}
};
void
send_return
()
{
send_ctr
--
;
destructor_cond_
.
notify_all
();
}
bool
recv_return
(
bool
value
)
{
recv_ctr
--
;
destructor_cond_
.
notify_all
();
return
value
;
}
std
::
shared_ptr
<
QueueMessage
>
get_first_message
(
std
::
deque
<
std
::
shared_ptr
<
QueueMessage
>>
*
queue
,
ChannelAction
action
)
{
while
(
!
queue
->
empty
())
{
// Check whether this message was added by Select
// If this was added by Select then execute the callback
// to check if you can execute this message. The callback
// can return false if some other case was executed in Select.
// In that case just discard this QueueMessage and process next.
std
::
shared_ptr
<
QueueMessage
>
m
=
queue
->
front
();
queue
->
pop_front
();
if
(
m
->
callback
==
nullptr
||
m
->
callback
(
action
))
return
m
;
}
return
nullptr
;
}
size_t
cap_
;
std
::
recursive_mutex
mu_
;
bool
closed_
;
std
::
deque
<
T
>
buf_
;
std
::
deque
<
std
::
shared_ptr
<
QueueMessage
>>
recvq
;
std
::
deque
<
std
::
shared_ptr
<
QueueMessage
>>
sendq
;
std
::
atomic
<
unsigned
>
send_ctr
{
0
};
std
::
atomic
<
unsigned
>
recv_ctr
{
0
};
std
::
condition_variable_any
destructor_cond_
;
};
template
<
typename
T
>
ChannelImpl
<
T
>::
ChannelImpl
(
size_t
capacity
)
:
cap_
(
capacity
),
closed_
(
false
),
send_ctr
(
0
),
recv_ctr
(
0
)
{
PADDLE_ENFORCE_GE
(
capacity
,
0
);
}
template
<
typename
T
>
bool
ChannelImpl
<
T
>::
CanSend
()
{
std
::
lock_guard
<
std
::
recursive_mutex
>
lock
{
mu_
};
return
!
closed_
&&
(
!
recvq
.
empty
()
||
buf_
.
size
()
<
cap_
);
}
template
<
typename
T
>
bool
ChannelImpl
<
T
>::
CanReceive
()
{
std
::
lock_guard
<
std
::
recursive_mutex
>
lock
{
mu_
};
return
!
(
closed_
&&
buf_
.
empty
())
&&
(
!
sendq
.
empty
()
||
buf_
.
size
()
>
0
);
}
template
<
typename
T
>
void
ChannelImpl
<
T
>::
Send
(
T
*
item
)
{
send_ctr
++
;
std
::
unique_lock
<
std
::
recursive_mutex
>
lock
{
mu_
};
// If channel is closed, throw exception
if
(
closed_
)
{
send_return
();
lock
.
unlock
();
PADDLE_THROW
(
"Cannot send on closed channel"
);
}
// If there is a receiver, directly pass the value we want
// to send to the receiver, bypassing the channel buffer if any
if
(
!
recvq
.
empty
())
{
std
::
shared_ptr
<
QueueMessage
>
m
=
get_first_message
(
&
recvq
,
ChannelAction
::
SEND
);
if
(
m
!=
nullptr
)
{
*
(
m
->
data
)
=
std
::
move
(
*
item
);
m
->
Notify
();
send_return
();
return
;
}
else
{
Send
(
item
);
send_return
();
return
;
}
}
// Unbuffered channel will always bypass this
// If buffered channel has space in buffer,
// write the element to the buffer.
if
(
buf_
.
size
()
<
cap_
)
{
// Copy to buffer
buf_
.
push_back
(
std
::
move
(
*
item
));
send_return
();
return
;
}
// Block on channel, because some receiver will complete
// the operation for us
auto
m
=
std
::
make_shared
<
QueueMessage
>
(
item
);
sendq
.
push_back
(
m
);
m
->
Wait
(
lock
);
if
(
m
->
chan_closed
)
{
send_return
();
lock
.
unlock
();
PADDLE_THROW
(
"Cannot send on closed channel"
);
}
send_return
();
}
template
<
typename
T
>
bool
ChannelImpl
<
T
>::
Receive
(
T
*
item
)
{
recv_ctr
++
;
std
::
unique_lock
<
std
::
recursive_mutex
>
lock
{
mu_
};
// If channel is closed and buffer is empty or
// channel is unbuffered
if
(
closed_
&&
buf_
.
empty
())
return
recv_return
(
false
);
// If there is a sender, directly receive the value we want
// from the sender. In case of a buffered channel, read from
// buffer and move front of send queue to the buffer
if
(
!
sendq
.
empty
())
{
std
::
shared_ptr
<
QueueMessage
>
m
=
get_first_message
(
&
sendq
,
ChannelAction
::
RECEIVE
);
if
(
buf_
.
size
()
>
0
)
{
// Case 1 : Channel is Buffered
// Do Data transfer from front of buffer
// and add a QueueMessage to the buffer
*
item
=
std
::
move
(
buf_
.
front
());
buf_
.
pop_front
();
// If first message from sendq is not null
// add it to the buffer and notify it
if
(
m
!=
nullptr
)
{
// Copy to buffer
buf_
.
push_back
(
std
::
move
(
*
(
m
->
data
)));
m
->
Notify
();
}
// Ignore if there is no first message
}
else
{
// Case 2: Channel is Unbuffered
// Do data transfer from front of SendQ
// If front is nullptr, then recursively call itself
if
(
m
!=
nullptr
)
{
*
item
=
std
::
move
(
*
(
m
->
data
));
m
->
Notify
();
}
else
{
return
recv_return
(
Receive
(
item
));
}
}
return
recv_return
(
true
);
}
// If this is a buffered channel and there are items in buffer
if
(
buf_
.
size
()
>
0
)
{
// Directly read from buffer
*
item
=
std
::
move
(
buf_
.
front
());
buf_
.
pop_front
();
// return true
return
recv_return
(
true
);
}
// No sender available, block on this channel
// Some receiver will complete the option for us
auto
m
=
std
::
make_shared
<
QueueMessage
>
(
item
);
recvq
.
push_back
(
m
);
m
->
Wait
(
lock
);
return
recv_return
(
!
m
->
chan_closed
);
}
template
<
typename
T
>
void
ChannelImpl
<
T
>::
Lock
()
{
mu_
.
lock
();
}
template
<
typename
T
>
void
ChannelImpl
<
T
>::
Unlock
()
{
mu_
.
unlock
();
}
template
<
typename
T
>
bool
ChannelImpl
<
T
>::
IsClosed
()
{
std
::
lock_guard
<
std
::
recursive_mutex
>
lock
{
mu_
};
return
closed_
;
}
template
<
typename
T
>
void
ChannelImpl
<
T
>::
Close
()
{
std
::
unique_lock
<
std
::
recursive_mutex
>
lock
{
mu_
};
if
(
closed_
)
{
// TODO(abhinavarora): closing an already closed channel should panic
lock
.
unlock
();
return
;
}
closed_
=
true
;
// Empty the readers
while
(
!
recvq
.
empty
())
{
std
::
shared_ptr
<
QueueMessage
>
m
=
recvq
.
front
();
recvq
.
pop_front
();
m
->
chan_closed
=
true
;
// Execute callback function (if any)
if
(
m
->
callback
!=
nullptr
)
{
m
->
callback
(
ChannelAction
::
CLOSE
);
}
m
->
Notify
();
}
// Empty the senders
while
(
!
sendq
.
empty
())
{
std
::
shared_ptr
<
QueueMessage
>
m
=
sendq
.
front
();
sendq
.
pop_front
();
m
->
chan_closed
=
true
;
// Execute callback function (if any)
if
(
m
->
callback
!=
nullptr
)
{
m
->
callback
(
ChannelAction
::
CLOSE
);
}
m
->
Notify
();
}
}
template
<
typename
T
>
void
ChannelImpl
<
T
>::
AddToSendQ
(
const
void
*
referrer
,
T
*
data
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
ChannelAction
)
>
cb
)
{
std
::
lock_guard
<
std
::
recursive_mutex
>
lock
{
mu_
};
auto
m
=
std
::
make_shared
<
QueueMessage
>
(
data
,
cond
);
m
->
referrer
=
referrer
;
m
->
callback
=
cb
;
sendq
.
push_back
(
m
);
}
template
<
typename
T
>
void
ChannelImpl
<
T
>::
AddToReceiveQ
(
const
void
*
referrer
,
T
*
data
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
ChannelAction
)
>
cb
)
{
std
::
lock_guard
<
std
::
recursive_mutex
>
lock
{
mu_
};
auto
m
=
std
::
make_shared
<
QueueMessage
>
(
data
,
cond
);
m
->
referrer
=
referrer
;
m
->
callback
=
cb
;
recvq
.
push_back
(
m
);
}
template
<
typename
T
>
void
ChannelImpl
<
T
>::
RemoveFromSendQ
(
const
void
*
referrer
)
{
std
::
lock_guard
<
std
::
recursive_mutex
>
lock
{
mu_
};
for
(
auto
it
=
sendq
.
begin
();
it
!=
sendq
.
end
();)
{
std
::
shared_ptr
<
QueueMessage
>
sendMsg
=
(
std
::
shared_ptr
<
QueueMessage
>
)
*
it
;
if
(
sendMsg
->
referrer
==
referrer
)
{
it
=
sendq
.
erase
(
it
);
}
else
{
++
it
;
}
}
}
template
<
typename
T
>
void
ChannelImpl
<
T
>::
RemoveFromReceiveQ
(
const
void
*
referrer
)
{
std
::
lock_guard
<
std
::
recursive_mutex
>
lock
{
mu_
};
for
(
auto
it
=
recvq
.
begin
();
it
!=
recvq
.
end
();)
{
std
::
shared_ptr
<
QueueMessage
>
recvMsg
=
(
std
::
shared_ptr
<
QueueMessage
>
)
*
it
;
if
(
recvMsg
->
referrer
==
referrer
)
{
it
=
recvq
.
erase
(
it
);
}
else
{
++
it
;
}
}
}
template
<
typename
T
>
ChannelImpl
<
T
>::~
ChannelImpl
()
{
Close
();
// The destructor must wait for all readers and writers to complete their task
// The channel has been closed, so we will not accept new readers and writers
std
::
unique_lock
<
std
::
recursive_mutex
>
lock
{
mu_
};
destructor_cond_
.
wait
(
lock
,
[
this
]()
{
return
send_ctr
==
0
&&
recv_ctr
==
0
;
});
}
}
// namespace framework
}
// namespace paddle
paddle/fluid/framework/channel_test.cc
已删除
100644 → 0
浏览文件 @
2513b2cc
此差异已折叠。
点击以展开。
paddle/fluid/framework/concurrency_test.cc
已删除
100644 → 0
浏览文件 @
2513b2cc
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <thread> // NOLINT
#include "gtest/gtest.h"
#include "paddle/fluid/framework/block_desc.h"
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/op_registry.h"
USE_NO_KERNEL_OP
(
go
);
USE_NO_KERNEL_OP
(
channel_close
);
USE_NO_KERNEL_OP
(
channel_create
);
USE_NO_KERNEL_OP
(
channel_recv
);
USE_NO_KERNEL_OP
(
channel_send
);
USE_NO_KERNEL_OP
(
elementwise_add
);
USE_NO_KERNEL_OP
(
select
);
USE_NO_KERNEL_OP
(
conditional_block
);
USE_NO_KERNEL_OP
(
equal
);
USE_NO_KERNEL_OP
(
assign
);
USE_NO_KERNEL_OP
(
while
);
USE_NO_KERNEL_OP
(
print
);
namespace
f
=
paddle
::
framework
;
namespace
p
=
paddle
::
platform
;
namespace
paddle
{
namespace
framework
{
template
<
typename
T
>
LoDTensor
*
CreateVariable
(
Scope
*
scope
,
const
p
::
CPUPlace
&
place
,
std
::
string
name
,
T
value
)
{
// Create LoDTensor<int> of dim [1]
auto
var
=
scope
->
Var
(
name
);
auto
tensor
=
var
->
GetMutable
<
LoDTensor
>
();
tensor
->
Resize
({
1
});
T
*
expect
=
tensor
->
mutable_data
<
T
>
(
place
);
expect
[
0
]
=
value
;
return
tensor
;
}
void
AddOp
(
const
std
::
string
&
type
,
const
VariableNameMap
&
inputs
,
const
VariableNameMap
&
outputs
,
AttributeMap
attrs
,
BlockDesc
*
block
)
{
// insert op
auto
op
=
block
->
AppendOp
();
op
->
SetType
(
type
);
for
(
auto
&
kv
:
inputs
)
{
op
->
SetInput
(
kv
.
first
,
kv
.
second
);
}
for
(
auto
&
kv
:
outputs
)
{
op
->
SetOutput
(
kv
.
first
,
kv
.
second
);
}
op
->
SetAttrMap
(
attrs
);
}
void
AddCase
(
ProgramDesc
*
program
,
Scope
*
scope
,
p
::
CPUPlace
*
place
,
BlockDesc
*
casesBlock
,
int
caseId
,
int
caseType
,
std
::
string
caseChannel
,
std
::
string
caseVarName
,
std
::
function
<
void
(
BlockDesc
*
,
Scope
*
)
>
func
)
{
std
::
string
caseCondName
=
std
::
string
(
"caseCond"
)
+
std
::
to_string
(
caseId
);
std
::
string
caseCondXVarName
=
std
::
string
(
"caseCondX"
)
+
std
::
to_string
(
caseId
);
BlockDesc
*
caseBlock
=
program
->
AppendBlock
(
*
casesBlock
);
func
(
caseBlock
,
scope
);
CreateVariable
(
scope
,
*
place
,
caseCondName
,
false
);
CreateVariable
(
scope
,
*
place
,
caseCondXVarName
,
caseId
);
CreateVariable
(
scope
,
*
place
,
caseVarName
,
caseId
);
scope
->
Var
(
"step_scope"
);
AddOp
(
"equal"
,
{{
"X"
,
{
caseCondXVarName
}},
{
"Y"
,
{
"caseToExecute"
}}},
{{
"Out"
,
{
caseCondName
}}},
{},
casesBlock
);
AddOp
(
"conditional_block"
,
{{
"X"
,
{
caseCondName
}},
{
"Params"
,
{}}},
{{
"Out"
,
{}},
{
"Scope"
,
{
"step_scope"
}}},
{{
"sub_block"
,
caseBlock
},
{
"is_scalar_condition"
,
true
}},
casesBlock
);
}
void
AddFibonacciSelect
(
Scope
*
scope
,
p
::
CPUPlace
*
place
,
ProgramDesc
*
program
,
BlockDesc
*
parentBlock
,
std
::
string
dataChanName
,
std
::
string
quitChanName
)
{
BlockDesc
*
whileBlock
=
program
->
AppendBlock
(
*
parentBlock
);
CreateVariable
(
scope
,
*
place
,
"whileExitCond"
,
true
);
CreateVariable
(
scope
,
*
place
,
"caseToExecute"
,
-
1
);
CreateVariable
(
scope
,
*
place
,
"case1var"
,
0
);
CreateVariable
(
scope
,
*
place
,
"xtemp"
,
0
);
// TODO(thuan): Need to create fibXToSend, since channel send moves the actual
// data,
// which causes the data to be no longer accessible to do the fib calculation
// TODO(abhinav): Change channel send to do a copy instead of a move!
CreateVariable
(
scope
,
*
place
,
"fibXToSend"
,
0
);
CreateVariable
(
scope
,
*
place
,
"fibX"
,
0
);
CreateVariable
(
scope
,
*
place
,
"fibY"
,
1
);
CreateVariable
(
scope
,
*
place
,
"quitVar"
,
0
);
BlockDesc
*
casesBlock
=
program
->
AppendBlock
(
*
whileBlock
);
std
::
function
<
void
(
BlockDesc
*
caseBlock
)
>
f
=
[](
BlockDesc
*
caseBlock
)
{};
// TODO(thuan): Remove this once we change channel send to do a copy instead
// of move
AddOp
(
"assign"
,
{{
"X"
,
{
"fibX"
}}},
{{
"Out"
,
{
"fibXToSend"
}}},
{},
whileBlock
);
// Case 0: Send to dataChanName
std
::
function
<
void
(
BlockDesc
*
caseBlock
,
Scope
*
scope
)
>
case0Func
=
[
&
](
BlockDesc
*
caseBlock
,
Scope
*
scope
)
{
AddOp
(
"assign"
,
{{
"X"
,
{
"fibX"
}}},
{{
"Out"
,
{
"xtemp"
}}},
{},
caseBlock
);
AddOp
(
"assign"
,
{{
"X"
,
{
"fibY"
}}},
{{
"Out"
,
{
"fibX"
}}},
{},
caseBlock
);
AddOp
(
"elementwise_add"
,
{{
"X"
,
{
"xtemp"
}},
{
"Y"
,
{
"fibY"
}}},
{{
"Out"
,
{
"fibY"
}}},
{},
caseBlock
);
};
AddCase
(
program
,
scope
,
place
,
casesBlock
,
0
,
1
,
dataChanName
,
"fibXToSend"
,
case0Func
);
std
::
string
case0Config
=
std
::
string
(
"0,1,"
)
+
dataChanName
+
std
::
string
(
",fibXToSend"
);
// Case 1: Receive from quitChanName
std
::
function
<
void
(
BlockDesc
*
caseBlock
,
Scope
*
scope
)
>
case2Func
=
[
&
](
BlockDesc
*
caseBlock
,
Scope
*
scope
)
{
// Exit the while loop after we receive from quit channel.
// We assign a false to "whileExitCond" variable, which will
// break out of while_op loop
CreateVariable
(
scope
,
*
place
,
"whileFalse"
,
false
);
AddOp
(
"assign"
,
{{
"X"
,
{
"whileFalse"
}}},
{{
"Out"
,
{
"whileExitCond"
}}},
{},
caseBlock
);
};
AddCase
(
program
,
scope
,
place
,
casesBlock
,
1
,
2
,
quitChanName
,
"quitVar"
,
case2Func
);
std
::
string
case1Config
=
std
::
string
(
"1,2,"
)
+
quitChanName
+
std
::
string
(
",quitVar"
);
// Select block
AddOp
(
"select"
,
{{
"X"
,
{
dataChanName
,
quitChanName
}},
{
"case_to_execute"
,
{
"caseToExecute"
}}},
{{
"Out"
,
{}}},
{{
"sub_block"
,
casesBlock
},
{
"cases"
,
std
::
vector
<
std
::
string
>
{
case0Config
,
case1Config
}}},
whileBlock
);
scope
->
Var
(
"stepScopes"
);
AddOp
(
"while"
,
{{
"X"
,
{
dataChanName
,
quitChanName
}},
{
"Condition"
,
{
"whileExitCond"
}}},
{{
"Out"
,
{}},
{
"StepScopes"
,
{
"stepScopes"
}}},
{{
"sub_block"
,
whileBlock
}},
parentBlock
);
}
TEST
(
Concurrency
,
Go_Op
)
{
Scope
scope
;
p
::
CPUPlace
place
;
// Initialize scope variables
p
::
CPUDeviceContext
ctx
(
place
);
// Create channel variable
scope
.
Var
(
"Channel"
);
// Create Variables, x0 will be put into channel,
// result will be pulled from channel
CreateVariable
(
&
scope
,
place
,
"Status"
,
false
);
CreateVariable
(
&
scope
,
place
,
"x0"
,
99
);
CreateVariable
(
&
scope
,
place
,
"result"
,
0
);
framework
::
Executor
executor
(
place
);
ProgramDesc
program
;
BlockDesc
*
block
=
program
.
MutableBlock
(
0
);
// Create channel OP
AddOp
(
"channel_create"
,
{},
{{
"Out"
,
{
"Channel"
}}},
{{
"capacity"
,
10
},
{
"data_type"
,
f
::
proto
::
VarType
::
LOD_TENSOR
}},
block
);
// Create Go Op routine
BlockDesc
*
goOpBlock
=
program
.
AppendBlock
(
program
.
Block
(
0
));
AddOp
(
"channel_send"
,
{{
"Channel"
,
{
"Channel"
}},
{
"X"
,
{
"x0"
}}},
{{
"Status"
,
{
"Status"
}}},
{},
goOpBlock
);
// Create Go Op
AddOp
(
"go"
,
{{
"X"
,
{
"Channel"
,
"x0"
}}},
{},
{{
"sub_block"
,
goOpBlock
}},
block
);
// Create Channel Receive Op
AddOp
(
"channel_recv"
,
{{
"Channel"
,
{
"Channel"
}}},
{{
"Status"
,
{
"Status"
}},
{
"Out"
,
{
"result"
}}},
{},
block
);
// Create Channel Close Op
AddOp
(
"channel_close"
,
{{
"Channel"
,
{
"Channel"
}}},
{},
{},
block
);
// Check the result tensor to make sure it is set to 0
const
LoDTensor
&
tensor
=
(
scope
.
FindVar
(
"result"
))
->
Get
<
LoDTensor
>
();
auto
*
initialData
=
tensor
.
data
<
int
>
();
EXPECT_EQ
(
initialData
[
0
],
0
);
executor
.
Run
(
program
,
&
scope
,
0
,
true
,
true
);
// After we call executor.run, the Go operator should do a channel_send to
// set the "result" variable to 99.
auto
*
finalData
=
tensor
.
data
<
int
>
();
EXPECT_EQ
(
finalData
[
0
],
99
);
}
/**
* This test implements the fibonacci function using go_op and select_op
*/
TEST
(
Concurrency
,
Select
)
{
Scope
scope
;
p
::
CPUPlace
place
;
// Initialize scope variables
p
::
CPUDeviceContext
ctx
(
place
);
CreateVariable
(
&
scope
,
place
,
"Status"
,
false
);
CreateVariable
(
&
scope
,
place
,
"result"
,
0
);
CreateVariable
(
&
scope
,
place
,
"currentXFib"
,
0
);
framework
::
Executor
executor
(
place
);
ProgramDesc
program
;
BlockDesc
*
block
=
program
.
MutableBlock
(
0
);
// Create channel OP
std
::
string
dataChanName
=
"Channel"
;
scope
.
Var
(
dataChanName
);
AddOp
(
"channel_create"
,
{},
{{
"Out"
,
{
dataChanName
}}},
{{
"capacity"
,
0
},
{
"data_type"
,
f
::
proto
::
VarType
::
LOD_TENSOR
}},
block
);
std
::
string
quitChanName
=
"Quit"
;
scope
.
Var
(
quitChanName
);
AddOp
(
"channel_create"
,
{},
{{
"Out"
,
{
quitChanName
}}},
{{
"capacity"
,
0
},
{
"data_type"
,
f
::
proto
::
VarType
::
LOD_TENSOR
}},
block
);
// Create Go Op routine, which loops 10 times over fibonacci sequence
CreateVariable
(
&
scope
,
place
,
"xReceiveVar"
,
0
);
BlockDesc
*
goOpBlock
=
program
.
AppendBlock
(
program
.
Block
(
0
));
for
(
int
i
=
0
;
i
<
10
;
++
i
)
{
AddOp
(
"channel_recv"
,
{{
"Channel"
,
{
dataChanName
}}},
{{
"Status"
,
{
"Status"
}},
{
"Out"
,
{
"currentXFib"
}}},
{},
goOpBlock
);
AddOp
(
"print"
,
{{
"In"
,
{
"currentXFib"
}}},
{{
"Out"
,
{
"currentXFib"
}}},
{{
"first_n"
,
100
},
{
"summarize"
,
-
1
},
{
"print_tensor_name"
,
false
},
{
"print_tensor_type"
,
true
},
{
"print_tensor_shape"
,
false
},
{
"print_tensor_lod"
,
false
},
{
"print_phase"
,
std
::
string
(
"FORWARD"
)},
{
"message"
,
std
::
string
(
"X: "
)}},
goOpBlock
);
}
CreateVariable
(
&
scope
,
place
,
"quitSignal"
,
0
);
AddOp
(
"channel_send"
,
{{
"Channel"
,
{
quitChanName
}},
{
"X"
,
{
"quitSignal"
}}},
{{
"Status"
,
{
"Status"
}}},
{},
goOpBlock
);
// Create Go Op
AddOp
(
"go"
,
{{
"X"
,
{
dataChanName
,
quitChanName
}}},
{},
{{
"sub_block"
,
goOpBlock
}},
block
);
AddFibonacciSelect
(
&
scope
,
&
place
,
&
program
,
block
,
dataChanName
,
quitChanName
);
// Create Channel Close Op
AddOp
(
"channel_close"
,
{{
"Channel"
,
{
dataChanName
}}},
{},
{},
block
);
AddOp
(
"channel_close"
,
{{
"Channel"
,
{
quitChanName
}}},
{},
{},
block
);
executor
.
Run
(
program
,
&
scope
,
0
,
true
,
true
);
// After we call executor.run, "result" variable should be equal to 34
// (which is 10 loops through fibonacci sequence)
const
LoDTensor
&
tensor
=
(
scope
.
FindVar
(
"currentXFib"
))
->
Get
<
LoDTensor
>
();
auto
*
finalData
=
tensor
.
data
<
int
>
();
EXPECT_EQ
(
finalData
[
0
],
34
);
}
}
// namespace framework
}
// namespace paddle
paddle/fluid/framework/details/reference_count_pass.cc
浏览文件 @
ea7dc9cb
...
...
@@ -80,15 +80,15 @@ std::unique_ptr<ir::Graph> ReferenceCountPass::ApplyImpl(
// This is weird but there is really some variables without var_desc
// in computation_op
if
(
var_desc
==
nullptr
)
{
if
(
compute_op
->
Node
()
->
Op
()
->
Block
()
->
FindVar
(
var_name
)
==
nullptr
)
continue
;
}
else
{
if
(
var_desc
->
Persistable
())
continue
;
auto
var_type
=
var_desc
->
Proto
()
->
type
().
type
()
;
if
(
var_type
!=
proto
::
VarType
::
LOD_TENSOR
&&
var_type
!=
proto
::
VarType
::
SELECTED_ROWS
)
{
continue
;
}
var_desc
=
compute_op
->
Node
()
->
Op
()
->
Block
()
->
FindVar
(
var_name
);
if
(
var_desc
==
nullptr
)
continue
;
}
if
(
var_desc
->
Persistable
())
continue
;
auto
var_type
=
var_desc
->
Proto
()
->
type
().
type
();
if
(
var_type
!=
proto
::
VarType
::
LOD_TENSOR
&&
var_type
!=
proto
::
VarType
::
SELECTED_ROWS
)
{
continue
;
}
// compute op only runs in one device
...
...
paddle/fluid/framework/executor.cc
浏览文件 @
ea7dc9cb
...
...
@@ -14,7 +14,6 @@ limitations under the License. */
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/feed_fetch_method.h"
#include "paddle/fluid/framework/lod_rank_table.h"
#include "paddle/fluid/framework/lod_tensor_array.h"
...
...
@@ -76,15 +75,13 @@ void InitializeVariable(Variable* var, proto::VarType::Type var_type) {
var
->
GetMutable
<
platform
::
PlaceList
>
();
}
else
if
(
var_type
==
proto
::
VarType
::
READER
)
{
var
->
GetMutable
<
ReaderHolder
>
();
}
else
if
(
var_type
==
proto
::
VarType
::
CHANNEL
)
{
var
->
GetMutable
<
ChannelHolder
>
();
}
else
if
(
var_type
==
proto
::
VarType
::
RAW
)
{
// GetMutable will be called in operator
}
else
{
PADDLE_THROW
(
"Variable type %d is not in "
"[LOD_TENSOR, SELECTED_ROWS, FEED_MINIBATCH, FETCH_LIST, "
"LOD_RANK_TABLE, PLACE_LIST, READER,
CHANNEL,
RAW]"
,
"LOD_RANK_TABLE, PLACE_LIST, READER, RAW]"
,
var_type
);
}
}
...
...
paddle/fluid/framework/framework.proto
浏览文件 @
ea7dc9cb
...
...
@@ -126,7 +126,6 @@ message VarType {
LOD_TENSOR_ARRAY
=
13
;
PLACE_LIST
=
14
;
READER
=
15
;
CHANNEL
=
16
;
// Any runtime decided variable type is raw
// raw variables should manage their own allocations
// in operators like nccl_op
...
...
@@ -158,12 +157,6 @@ message VarType {
message
ReaderDesc
{
repeated
LoDTensorDesc
lod_tensor
=
1
;
}
optional
ReaderDesc
reader
=
5
;
message
ChannelDesc
{
required
Type
data_type
=
1
;
required
int64
capacity
=
2
;
}
optional
ChannelDesc
channel
=
6
;
message
Tuple
{
repeated
Type
element_type
=
1
;
}
optional
Tuple
tuple
=
7
;
}
...
...
paddle/fluid/framework/ir/CMakeLists.txt
浏览文件 @
ea7dc9cb
set
(
pass_file
${
PADDLE_BINARY_DIR
}
/paddle/fluid/inference/api/paddle_inference_pass.h
)
file
(
WRITE
${
pass_file
}
"// Generated by the paddle/fluid/framework/ir/CMakeLists.txt. DO NOT EDIT!
\n\n
"
)
file
(
APPEND
${
pass_file
}
"
\#
pragma once
\n
"
)
file
(
APPEND
${
pass_file
}
"
\#
include
\"
paddle/fluid/framework/ir/pass.h
\"\n
"
)
...
...
@@ -34,6 +35,7 @@ endif ()
pass_library
(
attention_lstm_fuse_pass inference
)
pass_library
(
infer_clean_graph_pass inference
)
pass_library
(
fc_lstm_fuse_pass inference
)
pass_library
(
embedding_fc_lstm_fuse_pass inference
)
pass_library
(
fc_gru_fuse_pass inference
)
pass_library
(
seq_concat_fc_fuse_pass inference
)
...
...
paddle/fluid/framework/ir/embedding_fc_lstm_fuse_pass.cc
0 → 100644
浏览文件 @
ea7dc9cb
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/embedding_fc_lstm_fuse_pass.h"
#include <algorithm>
#include <string>
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/operators/math/cpu_vec.h"
#include "paddle/fluid/operators/math/fc_compute.h"
#include "paddle/fluid/platform/cpu_info.h"
namespace
paddle
{
namespace
framework
{
namespace
ir
{
static
int
BuildFusion
(
Graph
*
graph
,
const
std
::
string
&
name_scope
,
Scope
*
scope
,
bool
with_fc_bias
)
{
GraphPatternDetector
gpd
;
auto
*
pattern
=
gpd
.
mutable_pattern
();
// Build pattern
PDNode
*
x
=
pattern
->
NewNode
(
patterns
::
PDNodeName
(
name_scope
,
"x"
))
->
assert_is_op_input
(
"lookup_table"
)
->
assert_var_not_persistable
();
patterns
::
Embedding
embedding_pattern
(
pattern
,
name_scope
);
// TODO(jczaja): Intermediate can only be for val that are not used anywhere
// but lookup table output may go into other LSTM (for reverse
// direction)
auto
*
embedding_out
=
embedding_pattern
(
x
);
patterns
::
FC
fc_pattern
(
pattern
,
name_scope
);
// fc_out is a tmp var, will be removed after fuse, so marked as intermediate.
auto
*
fc_out
=
fc_pattern
(
embedding_out
,
with_fc_bias
)
->
AsIntermediate
();
patterns
::
LSTM
lstm_pattern
(
pattern
,
name_scope
);
lstm_pattern
(
fc_out
);
// Create New OpDesc
auto
embedding_lstm_creator
=
[
&
](
Node
*
embedding
,
Node
*
W
,
Node
*
lstm
,
Node
*
input
,
Node
*
weight_x
,
Node
*
weight_h
,
Node
*
bias
,
Node
*
hidden
,
Node
*
cell
,
Node
*
xx
,
Node
*
fc_bias
)
{
OpDesc
op_desc
;
op_desc
.
SetType
(
"fused_embedding_fc_lstm"
);
#define SET_IN(Key, node__) op_desc.SetInput(#Key, {node__->Name()});
SET_IN
(
Ids
,
input
);
SET_IN
(
WeightH
,
weight_h
);
// Neet to have this passed as We need Wc data for peephole connections
SET_IN
(
Bias
,
bias
);
#undef SET_IN
// Multiply embeddings with Weights
PADDLE_ENFORCE
(
scope
);
const
std
::
string
&
embeddings
=
patterns
::
UniqueKey
(
"Embeddings"
);
auto
*
embeddings_var
=
scope
->
Var
(
embeddings
);
PADDLE_ENFORCE
(
embeddings_var
);
auto
*
embeddings_tensor
=
embeddings_var
->
GetMutable
<
framework
::
LoDTensor
>
();
// Get WeightX size: [single_embedding, fc_size]
// and embedding size: [dict_size, single_embedding]
// and create new size of embeddings eg. [dict_size , hidden_size]
auto
*
embedding_var
=
scope
->
FindVar
(
W
->
Name
());
PADDLE_ENFORCE
(
embedding_var
);
const
auto
&
embedding_tensor
=
embedding_var
->
Get
<
framework
::
LoDTensor
>
();
const
auto
&
weightx_tensor
=
scope
->
FindVar
(
weight_x
->
Name
())
->
Get
<
framework
::
LoDTensor
>
();
embeddings_tensor
->
Resize
(
{
embedding_tensor
.
dims
()[
0
],
weightx_tensor
.
dims
()[
1
]});
// Multiplie embeddings via WeightsX and add bias
auto
embedding_data
=
embedding_tensor
.
data
<
float
>
();
auto
weightx_data
=
weightx_tensor
.
data
<
float
>
();
auto
embeddings_data
=
embeddings_tensor
->
mutable_data
<
float
>
(
platform
::
CPUPlace
());
// Adding biases to GEMM result to be
auto
*
lstm_bias_var
=
scope
->
FindVar
(
bias
->
Name
());
PADDLE_ENFORCE
(
lstm_bias_var
);
const
auto
&
lstm_bias_tensor
=
lstm_bias_var
->
Get
<
framework
::
LoDTensor
>
();
auto
alpha
=
1.0
f
;
auto
beta
=
1.0
f
;
int
m
=
embedding_tensor
.
dims
()[
0
];
int
n
=
weightx_tensor
.
dims
()[
1
];
int
k
=
embedding_tensor
.
dims
()[
1
];
// Copy only gate biases values (only actual bias data, not peephole
// weights)
std
::
vector
<
float
>
combined_biases
;
combined_biases
.
reserve
(
n
);
std
::
copy_n
(
lstm_bias_tensor
.
data
<
float
>
(),
n
,
std
::
back_inserter
(
combined_biases
));
if
(
with_fc_bias
)
{
// Add FC-bias with LSTM-bias (into GEMM result to be)
auto
*
fc_bias_var
=
scope
->
FindVar
(
fc_bias
->
Name
());
const
auto
&
fc_bias_tensor
=
fc_bias_var
->
Get
<
framework
::
LoDTensor
>
();
for
(
int
i
=
0
;
i
<
fc_bias_tensor
.
numel
();
i
++
)
{
combined_biases
[
i
]
+=
fc_bias_tensor
.
data
<
float
>
()[
i
];
}
}
// broadcast biases
std
::
vector
<
float
>
ones
(
m
,
1.0
f
);
paddle
::
operators
::
math
::
CBlas
<
float
>::
GEMM
(
CblasRowMajor
,
CblasNoTrans
,
CblasNoTrans
,
m
,
n
,
1
,
alpha
,
&
ones
[
0
],
1
,
&
combined_biases
[
0
],
n
,
0.0
f
,
embeddings_data
,
n
);
// Wx*embeddings + biases
paddle
::
operators
::
math
::
CBlas
<
float
>::
GEMM
(
CblasRowMajor
,
CblasNoTrans
,
CblasNoTrans
,
m
,
n
,
k
,
alpha
,
embedding_data
,
k
,
weightx_data
,
n
,
beta
,
embeddings_data
,
n
);
op_desc
.
SetInput
(
"Embeddings"
,
{
embeddings
});
// Create temp variables.
const
std
::
string
BatchedInput
=
patterns
::
UniqueKey
(
"BatchedInput"
);
const
std
::
string
BatchedCellPreAct
=
patterns
::
UniqueKey
(
"BatchedCellPreAct"
);
const
std
::
string
BatchedGate
=
patterns
::
UniqueKey
(
"BatchedGate"
);
scope
->
Var
(
BatchedInput
)
->
GetMutable
<
framework
::
LoDTensor
>
();
scope
->
Var
(
BatchedCellPreAct
)
->
GetMutable
<
framework
::
LoDTensor
>
();
scope
->
Var
(
BatchedGate
)
->
GetMutable
<
framework
::
LoDTensor
>
();
op_desc
.
SetInput
(
"H0"
,
{});
op_desc
.
SetInput
(
"C0"
,
{});
op_desc
.
SetOutput
(
"Hidden"
,
{
hidden
->
Name
()});
op_desc
.
SetOutput
(
"Cell"
,
{
cell
->
Name
()});
op_desc
.
SetOutput
(
"XX"
,
{
xx
->
Name
()});
op_desc
.
SetOutput
(
"BatchedGate"
,
{
BatchedGate
});
op_desc
.
SetOutput
(
"BatchCellPreAct"
,
{
BatchedCellPreAct
});
op_desc
.
SetOutput
(
"BatchedInput"
,
{
BatchedInput
});
op_desc
.
SetAttr
(
"is_reverse"
,
lstm
->
Op
()
->
GetAttr
(
"is_reverse"
));
op_desc
.
SetAttr
(
"use_peepholes"
,
lstm
->
Op
()
->
GetAttr
(
"use_peepholes"
));
// TODO(TJ): get from attr
op_desc
.
SetAttr
(
"use_seq"
,
true
);
PADDLE_ENFORCE
(
graph
->
Has
(
kParamScopeAttr
));
auto
*
scope
=
graph
->
Get
<
Scope
*>
(
kParamScopeAttr
);
#define OP_SET_OUT(x) \
const std::string x = patterns::UniqueKey(#x); \
op_desc.SetOutput(#x, {x}); \
scope->Var(x)->GetMutable<LoDTensor>()
OP_SET_OUT
(
BatchedCell
);
OP_SET_OUT
(
BatchedHidden
);
OP_SET_OUT
(
ReorderedH0
);
OP_SET_OUT
(
ReorderedC0
);
#undef OP_SET_OUT
auto
*
op
=
graph
->
CreateOpNode
(
&
op_desc
);
IR_NODE_LINK_TO
(
input
,
op
);
IR_NODE_LINK_TO
(
weight_x
,
op
);
IR_NODE_LINK_TO
(
weight_h
,
op
);
IR_NODE_LINK_TO
(
bias
,
op
);
IR_NODE_LINK_TO
(
op
,
hidden
);
return
op
;
};
int
fusion_count
{
0
};
auto
handler
=
[
&
](
const
GraphPatternDetector
::
subgraph_t
&
subgraph
,
Graph
*
g
)
{
GET_IR_NODE_FROM_SUBGRAPH
(
lstm
,
lstm
,
lstm_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
Weight
,
Weight
,
lstm_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
Bias
,
Bias
,
lstm_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
Cell
,
Cell
,
lstm_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
Hidden
,
Hidden
,
lstm_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
lookup_table
,
lookup_table
,
embedding_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
W
,
W
,
embedding_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
w
,
w
,
fc_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
mul
,
mul
,
fc_pattern
);
// TODO(jczaja): Add support for is_sparse / is_distributed
auto
is_sparse
=
boost
::
get
<
bool
>
(
lookup_table
->
Op
()
->
GetAttr
(
"is_sparse"
));
auto
is_distributed
=
boost
::
get
<
bool
>
(
lookup_table
->
Op
()
->
GetAttr
(
"is_distributed"
));
if
(
is_sparse
==
true
||
is_distributed
==
true
)
{
return
;
}
if
(
with_fc_bias
)
{
GET_IR_NODE_FROM_SUBGRAPH
(
fc_out
,
Out
,
fc_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
fc_bias
,
bias
,
fc_pattern
);
GET_IR_NODE_FROM_SUBGRAPH
(
elementwise_add
,
elementwise_add
,
fc_pattern
);
embedding_lstm_creator
(
lookup_table
,
W
,
lstm
,
subgraph
.
at
(
x
),
w
,
Weight
,
Bias
,
Hidden
,
Cell
,
fc_out
,
fc_bias
);
// Remove unneeded nodes.
// TODO(jczaja): Proper removing of lookup table
std
::
unordered_set
<
const
Node
*>
marked_nodes
(
//{lookup_table, mul, lstm, elementwise_add, fc_bias, W});
{
mul
,
lstm
,
elementwise_add
,
fc_bias
});
GraphSafeRemoveNodes
(
graph
,
marked_nodes
);
}
else
{
GET_IR_NODE_FROM_SUBGRAPH
(
fc_out
,
mul_out
,
fc_pattern
);
embedding_lstm_creator
(
lookup_table
,
W
,
lstm
,
subgraph
.
at
(
x
),
w
,
Weight
,
Bias
,
Hidden
,
Cell
,
fc_out
,
nullptr
);
// Remove unneeded nodes.
// TODO(jczaja): Proper removing of lookup table
// std::unordered_set<const Node*> marked_nodes({lookup_table, W, mul,
// lstm});
std
::
unordered_set
<
const
Node
*>
marked_nodes
({
mul
,
lstm
});
GraphSafeRemoveNodes
(
graph
,
marked_nodes
);
}
++
fusion_count
;
};
gpd
(
graph
,
handler
);
return
fusion_count
;
}
std
::
unique_ptr
<
ir
::
Graph
>
EmbeddingFCLSTMFusePass
::
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
{
FusePassBase
::
Init
(
name_scope_
,
graph
.
get
());
int
fusion_count
=
BuildFusion
(
graph
.
get
(),
name_scope_
,
param_scope
(),
true
/*with_fc_bias*/
);
AddStatis
(
fusion_count
);
return
graph
;
}
}
// namespace ir
}
// namespace framework
}
// namespace paddle
REGISTER_PASS
(
embedding_fc_lstm_fuse_pass
,
paddle
::
framework
::
ir
::
EmbeddingFCLSTMFusePass
);
paddle/fluid/
inference/api/timer
.h
→
paddle/fluid/
framework/ir/embedding_fc_lstm_fuse_pass
.h
浏览文件 @
ea7dc9cb
...
...
@@ -11,29 +11,30 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <chrono> // NOLINT
#include "paddle/fluid/framework/ir/fuse_pass_base.h"
#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
namespace
paddle
{
namespace
inference
{
namespace
framework
{
namespace
ir
{
// Fusing of Embedding , FC and LSTM op
//
Timer for timer
class
Timer
{
//
Just FC without bias
class
EmbeddingFCLSTMFusePass
:
public
FusePassBase
{
public:
std
::
chrono
::
high_resolution_clock
::
time_point
start
;
std
::
chrono
::
high_resolution_clock
::
time_point
startu
;
void
tic
()
{
start
=
std
::
chrono
::
high_resolution_clock
::
now
();
}
double
toc
()
{
startu
=
std
::
chrono
::
high_resolution_clock
::
now
();
std
::
chrono
::
duration
<
double
>
time_span
=
std
::
chrono
::
duration_cast
<
std
::
chrono
::
duration
<
double
>>
(
startu
-
start
);
double
used_time_ms
=
static_cast
<
double
>
(
time_span
.
count
())
*
1000.0
;
return
used_time_ms
;
}
virtual
~
EmbeddingFCLSTMFusePass
()
{}
protected:
std
::
unique_ptr
<
ir
::
Graph
>
ApplyImpl
(
std
::
unique_ptr
<
ir
::
Graph
>
graph
)
const
;
const
std
::
string
name_scope_
{
"embedding_fc_lstm_fuse"
};
};
}
// namespace inference
}
// namespace ir
}
// namespace framework
}
// namespace paddle
paddle/fluid/framework/ir/graph_pattern_detector.cc
浏览文件 @
ea7dc9cb
...
...
@@ -692,6 +692,24 @@ PDNode *patterns::FC::operator()(paddle::framework::ir::PDNode *x,
}
}
PDNode
*
patterns
::
Embedding
::
operator
()(
PDNode
*
x
)
{
x
->
assert_is_op_input
(
"lookup_table"
,
"Ids"
);
auto
*
lookup_table_op
=
pattern
->
NewNode
(
lookup_table_repr
())
->
assert_is_op
(
"lookup_table"
);
#define NEW_NODE(arg__, io__) \
auto *arg__ = pattern->NewNode(arg__##_repr()) \
->assert_is_op_##io__("lookup_table", #arg__);
NEW_NODE
(
W
,
input
);
NEW_NODE
(
Out
,
output
);
#undef NEW_NODE
lookup_table_op
->
LinksFrom
({
x
,
W
});
lookup_table_op
->
LinksTo
({
Out
});
return
Out
;
}
PDNode
*
patterns
::
LSTM
::
operator
()(
PDNode
*
x
)
{
x
->
assert_is_op_input
(
"lstm"
,
"Input"
);
auto
*
lstm_op
=
pattern
->
NewNode
(
lstm_repr
())
->
assert_is_op
(
"lstm"
);
...
...
paddle/fluid/framework/ir/graph_pattern_detector.h
浏览文件 @
ea7dc9cb
...
...
@@ -418,6 +418,23 @@ struct FC : public PatternBase {
PATTERN_DECL_NODE
(
Out
);
};
// Embedding
struct
Embedding
:
public
PatternBase
{
Embedding
(
PDPattern
*
pattern
,
const
std
::
string
&
name_scope
)
:
PatternBase
(
pattern
,
name_scope
,
"embedding"
)
{}
PDNode
*
operator
()(
PDNode
*
x
);
// declare operator node's name
PATTERN_DECL_NODE
(
lookup_table
);
// Inputs
//
PATTERN_DECL_NODE
(
Ids
);
PATTERN_DECL_NODE
(
W
);
// embeddings
// Outputs
PATTERN_DECL_NODE
(
Out
);
};
struct
LSTM
:
public
PatternBase
{
LSTM
(
PDPattern
*
pattern
,
const
std
::
string
&
name_scope
)
:
PatternBase
(
pattern
,
name_scope
,
"lstm"
)
{}
...
...
paddle/fluid/framework/naive_executor.cc
浏览文件 @
ea7dc9cb
...
...
@@ -12,11 +12,13 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/naive_executor.h"
#include "paddle/fluid/framework/channel.h"
#include <string>
#include <vector>
#include "paddle/fluid/framework/feed_fetch_method.h"
#include "paddle/fluid/framework/lod_rank_table.h"
#include "paddle/fluid/framework/lod_tensor_array.h"
#include "paddle/fluid/framework/naive_executor.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/reader.h"
#include "paddle/fluid/string/pretty_log.h"
...
...
@@ -44,8 +46,6 @@ static void InitializeVariable(Variable *var, proto::VarType::Type var_type) {
var
->
GetMutable
<
platform
::
PlaceList
>
();
}
else
if
(
var_type
==
proto
::
VarType
::
READER
)
{
var
->
GetMutable
<
ReaderHolder
>
();
}
else
if
(
var_type
==
proto
::
VarType
::
CHANNEL
)
{
var
->
GetMutable
<
ChannelHolder
>
();
}
else
if
(
var_type
==
proto
::
VarType
::
RAW
)
{
// GetMutable will be called in operator
}
else
{
...
...
@@ -146,5 +146,22 @@ void NaiveExecutor::CleanFeedFetchOps() {
ops_
.
swap
(
ops
);
}
void
NaiveExecutor
::
EnableMKLDNN
(
const
ProgramDesc
&
program
)
{
#ifdef PADDLE_WITH_MKLDNN
VLOG
(
3
)
<<
"use_mkldnn=True"
;
for
(
size_t
block_id
=
0
;
block_id
<
program
.
Size
();
++
block_id
)
{
auto
*
block
=
const_cast
<
ProgramDesc
&>
(
program
).
MutableBlock
(
block_id
);
for
(
auto
*
op
:
block
->
AllOps
())
{
if
(
op
->
HasAttr
(
"use_mkldnn"
))
{
op
->
SetAttr
(
"use_mkldnn"
,
true
);
}
}
}
#else
LOG
(
WARNING
)
<<
"'MKLDNN' is not supported, Please re-compile with WITH_MKLDNN option"
;
#endif
}
}
// namespace framework
}
// namespace paddle
paddle/fluid/framework/naive_executor.h
浏览文件 @
ea7dc9cb
...
...
@@ -14,6 +14,8 @@
#pragma once
#include <string>
#include <vector>
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/scope.h"
...
...
@@ -46,6 +48,8 @@ class NaiveExecutor {
void
CleanFeedFetchOps
();
void
EnableMKLDNN
(
const
ProgramDesc
&
program
);
protected:
void
CreateVariables
(
const
ProgramDesc
&
desc
,
Scope
*
scope
,
int
block_id
);
...
...
paddle/fluid/framework/op_proto_maker.cc
浏览文件 @
ea7dc9cb
...
...
@@ -132,9 +132,7 @@ void OpProtoAndCheckerMaker::operator()(proto::OpProto* proto,
AddAttr
<
std
::
string
>
(
OpNamescopeAttrName
(),
"Operator name with namesope."
)
.
SetDefault
(
""
);
AddAttr
<
std
::
vector
<
std
::
string
>>
(
OpCreationCallstackAttrName
(),
"Callstack for Op Creatation."
)
.
SetDefault
({});
Validate
();
}
...
...
paddle/fluid/framework/op_proto_maker.h
浏览文件 @
ea7dc9cb
...
...
@@ -46,7 +46,6 @@ class OpProtoAndCheckerMaker {
static
const
char
*
OpRoleAttrName
()
{
return
"op_role"
;
}
static
const
char
*
OpRoleVarAttrName
()
{
return
"op_role_var"
;
}
static
const
char
*
OpNamescopeAttrName
()
{
return
"op_namescope"
;
}
static
const
char
*
OpCreationCallstackAttrName
()
{
return
"op_callstack"
;
}
void
operator
()(
proto
::
OpProto
*
proto
,
OpAttrChecker
*
attr_checker
);
...
...
paddle/fluid/framework/operator.cc
浏览文件 @
ea7dc9cb
...
...
@@ -14,17 +14,15 @@ limitations under the License. */
#define GLOG_NO_ABBREVIATED_SEVERITIES
#define GOOGLE_GLOG_DLL_DECL
#include "paddle/fluid/framework/operator.h"
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>
#include "paddle/fluid/framework/data_transform.h"
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op
_proto_make
r.h"
#include "paddle/fluid/framework/op
erato
r.h"
#include "paddle/fluid/framework/shape_inference.h"
#include "paddle/fluid/framework/var_type.h"
#include "paddle/fluid/platform/profiler.h"
...
...
@@ -142,54 +140,19 @@ static LoD GetLoD(const Scope& scope, const std::string& name) {
}
void
OperatorBase
::
Run
(
const
Scope
&
scope
,
const
platform
::
Place
&
place
)
{
try
{
if
(
VLOG_IS_ON
(
4
))
{
VLOG
(
4
)
<<
place
<<
" "
<<
DebugStringEx
(
&
scope
);
}
if
(
platform
::
is_gpu_place
(
place
))
{
VLOG
(
4
)
<<
place
<<
" "
<<
DebugStringEx
(
&
scope
);
if
(
platform
::
is_gpu_place
(
place
))
{
#ifndef PADDLE_WITH_CUDA
PADDLE_THROW
(
"Cannot run operator on place %s"
,
place
);
PADDLE_THROW
(
"Cannot run operator on place %s"
,
place
);
#else
auto
dev_id
=
boost
::
get
<
platform
::
CUDAPlace
>
(
place
).
device
;
platform
::
SetDeviceId
(
dev_id
);
auto
dev_id
=
boost
::
get
<
platform
::
CUDAPlace
>
(
place
).
device
;
platform
::
SetDeviceId
(
dev_id
);
#endif
}
if
(
platform
::
IsProfileEnabled
())
{
platform
::
DeviceContextPool
&
pool
=
platform
::
DeviceContextPool
::
Instance
();
platform
::
RecordEvent
record_event
(
Type
(),
pool
.
Get
(
place
));
}
RunImpl
(
scope
,
place
);
if
(
VLOG_IS_ON
(
3
))
{
VLOG
(
3
)
<<
place
<<
" "
<<
DebugStringEx
(
&
scope
);
}
}
catch
(
platform
::
EnforceNotMet
exception
)
{
if
(
Attrs
().
count
(
"sub_block"
)
!=
0
)
{
throw
exception
;
}
auto
&
callstack
=
Attr
<
std
::
vector
<
std
::
string
>>
(
OpProtoAndCheckerMaker
::
OpCreationCallstackAttrName
());
if
(
callstack
.
empty
())
{
throw
exception
;
}
std
::
ostringstream
sout
;
sout
<<
"Invoke operator "
<<
Type
()
<<
" error.
\n
"
;
sout
<<
"Python Callstacks:
\n
"
;
for
(
auto
&
line
:
callstack
)
{
sout
<<
line
;
}
sout
<<
"C++ Callstacks:
\n
"
;
sout
<<
exception
.
err_str_
;
exception
.
err_str_
=
sout
.
str
();
throw
exception
;
}
catch
(...)
{
std
::
rethrow_exception
(
std
::
current_exception
());
}
platform
::
DeviceContextPool
&
pool
=
platform
::
DeviceContextPool
::
Instance
();
platform
::
RecordEvent
record_event
(
Type
(),
pool
.
Get
(
place
));
RunImpl
(
scope
,
place
);
VLOG
(
3
)
<<
place
<<
" "
<<
DebugStringEx
(
&
scope
);
}
bool
OperatorBase
::
HasInputs
(
const
std
::
string
&
name
)
const
{
...
...
@@ -217,7 +180,7 @@ const std::vector<std::string>& OperatorBase::Inputs(
}
bool
OperatorBase
::
HasOutputs
(
const
std
::
string
&
name
)
const
{
if
(
outputs_
.
end
()
!=
outputs_
.
find
(
name
))
{
if
(
outputs_
.
find
(
name
)
!=
outputs_
.
end
(
))
{
return
true
;
}
else
{
return
false
;
...
...
paddle/fluid/framework/parallel_executor.cc
浏览文件 @
ea7dc9cb
...
...
@@ -156,10 +156,12 @@ ParallelExecutor::ParallelExecutor(
params
,
member_
->
local_scopes_
,
member_
->
use_cuda_
);
#endif
// If the loss_var_name is given, the number of graph should be only one.
if
(
loss_var_name
.
size
())
{
PADDLE_ENFORCE_EQ
(
ir
::
GraphNum
(
*
graph
),
1
,
"The number of graph should be only one"
);
if
(
VLOG_IS_ON
(
5
))
{
// If the loss_var_name is given, the number of graph should be only one.
if
(
loss_var_name
.
size
())
{
PADDLE_ENFORCE_EQ
(
ir
::
GraphNum
(
*
graph
),
1
,
"The number of graph should be only one"
);
}
}
if
(
exec_strategy
.
type_
==
ExecutionStrategy
::
kDefault
)
{
...
...
@@ -248,6 +250,13 @@ void ParallelExecutor::Run(const std::vector<std::string> &fetch_tensors,
#ifdef PADDLE_WITH_CUDA
if
(
!
gcs_
.
empty
())
{
ResetReferenceCount
();
for
(
auto
&
pair
:
cur_ref_cnts_
)
{
auto
&
name_map
=
*
(
pair
.
second
);
for
(
auto
&
fetch_name
:
fetch_tensors
)
{
name_map
.
erase
(
fetch_name
);
}
name_map
.
erase
(
fetched_var_name
);
}
}
#endif
auto
fetch_data
=
member_
->
executor_
->
Run
(
fetch_tensors
);
...
...
paddle/fluid/framework/rw_lock.h
浏览文件 @
ea7dc9cb
...
...
@@ -46,6 +46,7 @@ struct RWLock {
private:
pthread_rwlock_t
lock_
;
};
// TODO(paddle-dev): Support RWLock for WIN32 for correctness.
#else
// https://stackoverflow.com/questions/7125250/making-pthread-rwlock-wrlock-recursive
// In windows, rw_lock seems like a hack. Use empty object and do nothing.
...
...
paddle/fluid/framework/scope.cc
浏览文件 @
ea7dc9cb
...
...
@@ -20,13 +20,6 @@ limitations under the License. */
#include "paddle/fluid/framework/threadpool.h"
#include "paddle/fluid/string/printf.h"
// The mutex is not needed by training and inference, only for distribution.
#if PADDLE_WITH_DISTRIBUTE
#define WITH_LOCK 1
#else
#define WITH_LOCK 0
#endif
DEFINE_bool
(
benchmark
,
false
,
"Doing memory benchmark. It will make deleting scope synchronized, "
"and add some memory usage logs."
...
...
@@ -56,24 +49,18 @@ int64_t GetEagerDeletionThreshold() {
Scope
::~
Scope
()
{
DropKids
();
}
Scope
&
Scope
::
NewScope
()
const
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
kids_
.
push_back
(
new
Scope
(
this
));
return
*
kids_
.
back
();
}
Variable
*
Scope
::
Var
(
const
std
::
string
&
name
)
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
return
VarInternal
(
name
);
}
Variable
*
Scope
::
Var
(
std
::
string
*
name
)
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
auto
new_name
=
string
::
Sprintf
(
"%p.%d"
,
this
,
vars_
.
size
());
if
(
name
!=
nullptr
)
{
*
name
=
new_name
;
...
...
@@ -82,39 +69,29 @@ Variable* Scope::Var(std::string* name) {
}
Variable
*
Scope
::
FindVar
(
const
std
::
string
&
name
)
const
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
return
FindVarInternal
(
name
);
}
const
Scope
*
Scope
::
FindScope
(
const
Variable
*
var
)
const
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
return
FindScopeInternal
(
var
);
}
void
Scope
::
DropKids
()
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
for
(
Scope
*
s
:
kids_
)
delete
s
;
kids_
.
clear
();
}
bool
Scope
::
HasKid
(
const
Scope
*
scope
)
const
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
auto
it
=
std
::
find
(
this
->
kids_
.
begin
(),
this
->
kids_
.
end
(),
scope
);
return
it
!=
this
->
kids_
.
end
();
}
std
::
vector
<
std
::
string
>
Scope
::
LocalVarNames
()
const
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
std
::
vector
<
std
::
string
>
known_vars
;
known_vars
.
reserve
(
this
->
vars_
.
size
());
for
(
auto
&
p
:
vars_
)
{
...
...
@@ -124,9 +101,7 @@ std::vector<std::string> Scope::LocalVarNames() const {
}
void
Scope
::
DeleteScope
(
Scope
*
scope
)
const
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
auto
it
=
std
::
find
(
this
->
kids_
.
begin
(),
this
->
kids_
.
end
(),
scope
);
PADDLE_ENFORCE
(
it
!=
this
->
kids_
.
end
(),
"Cannot find %p as kid scope"
,
scope
);
this
->
kids_
.
erase
(
it
);
...
...
@@ -139,9 +114,7 @@ void Scope::DeleteScope(Scope* scope) const {
}
void
Scope
::
EraseVars
(
const
std
::
vector
<
std
::
string
>&
var_names
)
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
std
::
set
<
std
::
string
>
var_set
(
var_names
.
begin
(),
var_names
.
end
());
for
(
auto
it
=
vars_
.
begin
();
it
!=
vars_
.
end
();)
{
if
(
var_set
.
find
(
it
->
first
)
!=
var_set
.
end
())
{
...
...
@@ -154,16 +127,12 @@ void Scope::EraseVars(const std::vector<std::string>& var_names) {
void
Scope
::
Rename
(
const
std
::
string
&
origin_name
,
const
std
::
string
&
new_name
)
const
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
RenameInternal
(
origin_name
,
new_name
);
}
std
::
string
Scope
::
Rename
(
const
std
::
string
&
origin_name
)
const
{
#if WITH_LOCK
std
::
unique_lock
<
std
::
mutex
>
lock
(
mutex_
);
#endif
auto
new_name
=
string
::
Sprintf
(
"%p.%d"
,
this
,
vars_
.
size
());
RenameInternal
(
origin_name
,
new_name
);
return
new_name
;
...
...
paddle/fluid/framework/tuple.h
浏览文件 @
ea7dc9cb
...
...
@@ -17,7 +17,6 @@ limitations under the License. */
#include <stdexcept>
#include <string>
#include <vector>
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/tensor.h"
#include "paddle/fluid/framework/var_desc.h"
...
...
paddle/fluid/framework/var_desc.cc
浏览文件 @
ea7dc9cb
...
...
@@ -88,13 +88,7 @@ std::vector<std::vector<int64_t>> VarDesc::GetShapes() const {
}
void
VarDesc
::
SetDataType
(
proto
::
VarType
::
Type
data_type
)
{
switch
(
desc_
.
type
().
type
())
{
case
proto
::
VarType
::
CHANNEL
:
mutable_channel_desc
()
->
set_data_type
(
data_type
);
break
;
default:
mutable_tensor_desc
()
->
set_data_type
(
data_type
);
}
mutable_tensor_desc
()
->
set_data_type
(
data_type
);
}
void
VarDesc
::
SetDataTypes
(
...
...
@@ -115,13 +109,7 @@ void VarDesc::SetDataTypes(
}
proto
::
VarType
::
Type
VarDesc
::
GetDataType
()
const
{
switch
(
desc_
.
type
().
type
())
{
case
proto
::
VarType
::
CHANNEL
:
return
channel_desc
().
data_type
();
break
;
default:
return
tensor_desc
().
data_type
();
}
return
tensor_desc
().
data_type
();
}
std
::
vector
<
proto
::
VarType
::
Type
>
VarDesc
::
GetDataTypes
()
const
{
...
...
@@ -134,17 +122,6 @@ std::vector<proto::VarType::Type> VarDesc::GetDataTypes() const {
return
res
;
}
void
VarDesc
::
SetCapacity
(
int64_t
capacity
)
{
switch
(
desc_
.
type
().
type
())
{
case
proto
::
VarType
::
CHANNEL
:
desc_
.
mutable_type
()
->
mutable_channel
()
->
set_capacity
(
capacity
);
break
;
default:
PADDLE_THROW
(
"Setting 'capacity' is not supported by the type of var %s."
,
this
->
Name
());
}
}
void
VarDesc
::
SetLoDLevel
(
int32_t
lod_level
)
{
switch
(
desc_
.
type
().
type
())
{
case
proto
::
VarType
::
LOD_TENSOR
:
...
...
@@ -214,19 +191,6 @@ std::vector<int32_t> VarDesc::GetLoDLevels() const {
}
}
const
proto
::
VarType
::
ChannelDesc
&
VarDesc
::
channel_desc
()
const
{
PADDLE_ENFORCE
(
desc_
.
has_type
(),
"The var's type hasn't been set."
);
PADDLE_ENFORCE
(
desc_
.
type
().
has_type
(),
"The var type hasn't been set."
);
switch
(
desc_
.
type
().
type
())
{
case
proto
::
VarType
::
CHANNEL
:
return
desc_
.
type
().
channel
();
default:
PADDLE_THROW
(
"Getting 'channel_desc' is not supported by the type of var %s."
,
this
->
Name
());
}
}
const
proto
::
VarType
::
TensorDesc
&
VarDesc
::
tensor_desc
()
const
{
PADDLE_ENFORCE
(
desc_
.
has_type
(),
"The var's type hasn't been set."
);
PADDLE_ENFORCE
(
desc_
.
type
().
has_type
(),
"The var type hasn't been set."
);
...
...
@@ -262,20 +226,6 @@ std::vector<proto::VarType::TensorDesc> VarDesc::tensor_descs() const {
}
}
proto
::
VarType
::
ChannelDesc
*
VarDesc
::
mutable_channel_desc
()
{
PADDLE_ENFORCE
(
desc_
.
has_type
(),
"The var type hasn't been set."
);
PADDLE_ENFORCE
(
desc_
.
type
().
has_type
(),
"The var type hasn't been set."
);
switch
(
desc_
.
type
().
type
())
{
case
proto
::
VarType
::
CHANNEL
:
return
desc_
.
mutable_type
()
->
mutable_channel
();
default:
PADDLE_THROW
(
"Getting 'mutable_channel_desc' is not supported by the type of var "
"%s."
,
this
->
Name
());
}
}
proto
::
VarType
::
TensorDesc
*
VarDesc
::
mutable_tensor_desc
()
{
PADDLE_ENFORCE
(
desc_
.
has_type
(),
"The var type hasn't been set."
);
PADDLE_ENFORCE
(
desc_
.
type
().
has_type
(),
"The var type hasn't been set."
);
...
...
paddle/fluid/framework/var_desc.h
浏览文件 @
ea7dc9cb
...
...
@@ -87,8 +87,6 @@ class VarDesc {
void
SetDataTypes
(
const
std
::
vector
<
proto
::
VarType
::
Type
>
&
multiple_data_type
);
void
SetCapacity
(
int64_t
capacity
);
proto
::
VarType
::
Type
GetDataType
()
const
;
std
::
vector
<
proto
::
VarType
::
Type
>
GetDataTypes
()
const
;
...
...
@@ -110,10 +108,8 @@ class VarDesc {
void
SetPersistable
(
bool
persistable
)
{
desc_
.
set_persistable
(
persistable
);
}
private:
const
proto
::
VarType
::
ChannelDesc
&
channel_desc
()
const
;
const
proto
::
VarType
::
TensorDesc
&
tensor_desc
()
const
;
std
::
vector
<
proto
::
VarType
::
TensorDesc
>
tensor_descs
()
const
;
proto
::
VarType
::
ChannelDesc
*
mutable_channel_desc
();
proto
::
VarType
::
TensorDesc
*
mutable_tensor_desc
();
std
::
vector
<
proto
::
VarType
::
TensorDesc
*>
mutable_tensor_descs
();
...
...
paddle/fluid/framework/var_type.h
浏览文件 @
ea7dc9cb
...
...
@@ -13,7 +13,6 @@ See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/framework.pb.h"
#include "paddle/fluid/framework/lod_rank_table.h"
#include "paddle/fluid/framework/lod_tensor.h"
...
...
@@ -41,8 +40,6 @@ inline proto::VarType::Type ToVarType(std::type_index type) {
return
proto
::
VarType_Type_SELECTED_ROWS
;
}
else
if
(
IsType
<
ReaderHolder
>
(
type
))
{
return
proto
::
VarType_Type_READER
;
}
else
if
(
IsType
<
ChannelHolder
>
(
type
))
{
return
proto
::
VarType_Type_CHANNEL
;
}
else
{
PADDLE_THROW
(
"ToVarType:Unsupported type %s"
,
type
.
name
());
}
...
...
@@ -66,9 +63,6 @@ inline void VisitVarType(const framework::Variable& var, Visitor visitor) {
case
proto
::
VarType_Type_READER
:
visitor
(
var
.
Get
<
ReaderHolder
>
());
return
;
case
proto
::
VarType_Type_CHANNEL
:
visitor
(
var
.
Get
<
ChannelHolder
>
());
return
;
default:
PADDLE_THROW
(
"Not supported visit type, %d"
,
ToVarType
(
var
.
Type
()));
}
...
...
paddle/fluid/inference/CMakeLists.txt
浏览文件 @
ea7dc9cb
...
...
@@ -20,7 +20,8 @@ cc_library(paddle_fluid_origin DEPS ${fluid_modules} paddle_fluid_api)
add_subdirectory
(
api
)
# Create static library
cc_library
(
paddle_fluid DEPS
${
fluid_modules
}
paddle_fluid_api paddle_inference_api analysis_predictor
)
cc_library
(
paddle_fluid DEPS
${
fluid_modules
}
paddle_fluid_api paddle_inference_api
analysis_predictor zero_copy_tensor
)
if
(
NOT APPLE
)
# TODO(liuyiqu: Temporarily disable the link flag because it is not support on Mac.
set
(
LINK_FLAGS
"-Wl,--retain-symbols-file
${
CMAKE_CURRENT_SOURCE_DIR
}
/paddle_fluid.sym"
)
...
...
@@ -31,6 +32,7 @@ endif()
cc_library
(
paddle_fluid_shared SHARED
SRCS io.cc
${
CMAKE_CURRENT_SOURCE_DIR
}
/api/api.cc
${
CMAKE_CURRENT_SOURCE_DIR
}
/api/api_impl.cc
${
CMAKE_CURRENT_SOURCE_DIR
}
/api/analysis_predictor.cc
${
CMAKE_CURRENT_SOURCE_DIR
}
/api/details/zero_copy_tensor.cc
DEPS
${
fluid_modules
}
paddle_fluid_api
)
set_target_properties
(
paddle_fluid_shared PROPERTIES OUTPUT_NAME paddle_fluid
)
...
...
paddle/fluid/inference/analysis/CMakeLists.txt
浏览文件 @
ea7dc9cb
...
...
@@ -20,8 +20,6 @@ cc_test(test_node SRCS node_tester.cc DEPS analysis)
cc_test
(
test_dot SRCS dot_tester.cc DEPS analysis
)
cc_binary
(
inference_analyzer SRCS analyzer_main.cc DEPS analysis paddle_fluid
)
set
(
PYTHON_TESTS_DIR
${
PADDLE_BINARY_DIR
}
/python/paddle/fluid/tests
)
function
(
inference_analysis_test TARGET
)
if
(
WITH_TESTING
)
set
(
options
""
)
...
...
paddle/fluid/inference/analysis/analysis_pass.h
浏览文件 @
ea7dc9cb
...
...
@@ -41,12 +41,6 @@ class AnalysisPass {
// all passes have run.
virtual
bool
Finalize
()
{
return
false
;
}
// Get a Pass appropriate to print the Node this pass operates on.
virtual
AnalysisPass
*
CreatePrinterPass
(
std
::
ostream
&
os
,
const
std
::
string
&
banner
)
const
{
return
nullptr
;
}
// Create a debugger Pass that draw the DFG by graphviz toolkit.
virtual
AnalysisPass
*
CreateGraphvizDebugerPass
()
const
{
return
nullptr
;
}
...
...
paddle/fluid/inference/analysis/analyzer.h
浏览文件 @
ea7dc9cb
...
...
@@ -64,14 +64,15 @@ class Analyzer : public OrderedRegistry<PassManager> {
// larger fusion.
const
std
::
vector
<
std
::
string
>
all_ir_passes_
{{
// Manual update the passes here.
"infer_clean_graph_pass"
,
//
"attention_lstm_fuse_pass"
,
//
"fc_lstm_fuse_pass"
,
//
"mul_lstm_fuse_pass"
,
//
"fc_gru_fuse_pass"
,
//
"mul_gru_fuse_pass"
,
//
"seq_concat_fc_fuse_pass"
,
//
"fc_fuse_pass"
,
//
"infer_clean_graph_pass"
,
//
"attention_lstm_fuse_pass"
,
//
"embedding_fc_lstm_fuse_pass"
,
//
"fc_lstm_fuse_pass"
,
//
"mul_lstm_fuse_pass"
,
//
"fc_gru_fuse_pass"
,
//
"mul_gru_fuse_pass"
,
//
"seq_concat_fc_fuse_pass"
,
//
"fc_fuse_pass"
,
//
#ifdef PADDLE_WITH_MKLDNN
"conv_relu_mkldnn_fuse_pass"
,
//
#endif
...
...
paddle/fluid/inference/api/CMakeLists.txt
浏览文件 @
ea7dc9cb
...
...
@@ -31,7 +31,6 @@ function(inference_api_test TARGET_NAME)
set
(
multiValueArgs ARGS
)
cmake_parse_arguments
(
inference_test
"
${
options
}
"
"
${
oneValueArgs
}
"
"
${
multiValueArgs
}
"
${
ARGN
}
)
set
(
PYTHON_TESTS_DIR
${
PADDLE_BINARY_DIR
}
/python/paddle/fluid/tests
)
cc_test
(
${
TARGET_NAME
}
SRCS
${
inference_test_SRC
}
DEPS
"
${
inference_deps
}
"
...
...
paddle/fluid/inference/api/analysis_predictor.cc
浏览文件 @
ea7dc9cb
...
...
@@ -24,7 +24,6 @@
#include "paddle/fluid/inference/api/helper.h"
#include "paddle/fluid/inference/api/paddle_inference_api.h"
#include "paddle/fluid/inference/api/paddle_inference_pass.h"
#include "paddle/fluid/inference/api/timer.h"
#include "paddle/fluid/inference/utils/singleton.h"
#include "paddle/fluid/platform/profiler.h"
...
...
@@ -72,6 +71,11 @@ bool AnalysisPredictor::Init(
}
else
{
inference_program_
=
program
;
}
if
(
config_
.
_use_mkldnn
)
{
executor_
->
EnableMKLDNN
(
*
inference_program_
);
}
executor_
->
Prepare
(
scope_
.
get
(),
*
inference_program_
,
0
,
config_
.
use_feed_fetch_ops
);
...
...
@@ -93,6 +97,7 @@ bool AnalysisPredictor::Run(const std::vector<PaddleTensor> &inputs,
LOG
(
ERROR
)
<<
"fail to set feed"
;
return
false
;
}
// Run the inference program
// if share variables, we need not create variables
executor_
->
Run
();
...
...
paddle/fluid/inference/api/api_impl.cc
浏览文件 @
ea7dc9cb
...
...
@@ -23,7 +23,6 @@ limitations under the License. */
#include "paddle/fluid/framework/feed_fetch_method.h"
#include "paddle/fluid/inference/api/api_impl.h"
#include "paddle/fluid/inference/api/helper.h"
#include "paddle/fluid/inference/api/timer.h"
#include "paddle/fluid/platform/profiler.h"
DEFINE_bool
(
profile
,
false
,
"Turn on profiler for fluid"
);
...
...
paddle/fluid/inference/api/api_impl_tester.cc
浏览文件 @
ea7dc9cb
...
...
@@ -21,6 +21,12 @@ limitations under the License. */
#include "paddle/fluid/inference/api/api_impl.h"
#include "paddle/fluid/inference/tests/test_helper.h"
#ifdef __clang__
#define ACC_DIFF 4e-3
#else
#define ACC_DIFF 1e-3
#endif
DEFINE_string
(
dirname
,
""
,
"Directory of the inference model."
);
namespace
paddle
{
...
...
@@ -99,8 +105,8 @@ void MainWord2Vec(bool use_gpu) {
float
*
lod_data
=
output1
.
data
<
float
>
();
for
(
int
i
=
0
;
i
<
output1
.
numel
();
++
i
)
{
EXPECT_LT
(
lod_data
[
i
]
-
data
[
i
],
1e-3
);
EXPECT_GT
(
lod_data
[
i
]
-
data
[
i
],
-
1e-3
);
EXPECT_LT
(
lod_data
[
i
]
-
data
[
i
],
ACC_DIFF
);
EXPECT_GT
(
lod_data
[
i
]
-
data
[
i
],
-
ACC_DIFF
);
}
}
...
...
@@ -144,7 +150,7 @@ void MainImageClassification(bool use_gpu) {
float
*
data
=
static_cast
<
float
*>
(
outputs
[
0
].
data
.
data
());
float
*
lod_data
=
output1
.
data
<
float
>
();
for
(
size_t
j
=
0
;
j
<
len
/
sizeof
(
float
);
++
j
)
{
EXPECT_NEAR
(
lod_data
[
j
],
data
[
j
],
1e-3
);
EXPECT_NEAR
(
lod_data
[
j
],
data
[
j
],
ACC_DIFF
);
}
}
...
...
@@ -199,7 +205,7 @@ void MainThreadsWord2Vec(bool use_gpu) {
float
*
ref_data
=
refs
[
tid
].
data
<
float
>
();
EXPECT_EQ
(
refs
[
tid
].
numel
(),
static_cast
<
int64_t
>
(
len
/
sizeof
(
float
)));
for
(
int
i
=
0
;
i
<
refs
[
tid
].
numel
();
++
i
)
{
EXPECT_NEAR
(
ref_data
[
i
],
data
[
i
],
1e-3
);
EXPECT_NEAR
(
ref_data
[
i
],
data
[
i
],
ACC_DIFF
);
}
});
}
...
...
@@ -251,7 +257,7 @@ void MainThreadsImageClassification(bool use_gpu) {
float
*
ref_data
=
refs
[
tid
].
data
<
float
>
();
EXPECT_EQ
((
size_t
)
refs
[
tid
].
numel
(),
len
/
sizeof
(
float
));
for
(
int
i
=
0
;
i
<
refs
[
tid
].
numel
();
++
i
)
{
EXPECT_NEAR
(
ref_data
[
i
],
data
[
i
],
1e-3
);
EXPECT_NEAR
(
ref_data
[
i
],
data
[
i
],
ACC_DIFF
);
}
});
}
...
...
paddle/fluid/inference/api/demo_ci/run.sh
浏览文件 @
ea7dc9cb
...
...
@@ -2,6 +2,9 @@ set -x
PADDLE_ROOT
=
$1
TURN_ON_MKL
=
$2
# use MKL or Openblas
TEST_GPU_CPU
=
$3
# test both GPU/CPU mode or only CPU mode
DATA_DIR
=
$4
# dataset
cd
`
dirname
$0
`
current_dir
=
`
pwd
`
if
[
$2
==
ON
]
;
then
# You can export yourself if move the install path
MKL_LIB
=
${
PADDLE_ROOT
}
/build/fluid_install_dir/third_party/install/mklml/lib
...
...
@@ -29,15 +32,15 @@ function download() {
fi
cd
..
}
mkdir
-p
data
cd
data
mkdir
-p
$DATA_DIR
cd
$DATA_DIR
vis_demo_list
=
'se_resnext50 ocr mobilenet'
for
vis_demo_name
in
$vis_demo_list
;
do
download
$vis_demo_name
done
cd
..
# compile and test the demo
cd
$current_dir
mkdir
-p
build
cd
build
...
...
@@ -73,9 +76,9 @@ for WITH_STATIC_LIB in ON OFF; do
for
use_gpu
in
$use_gpu_list
;
do
for
vis_demo_name
in
$vis_demo_list
;
do
./vis_demo
\
--modeldir
=
../data
/
$vis_demo_name
/model
\
--data
=
../data
/
$vis_demo_name
/data.txt
\
--refer
=
../data
/
$vis_demo_name
/result.txt
\
--modeldir
=
$DATA_DIR
/
$vis_demo_name
/model
\
--data
=
$DATA_DIR
/
$vis_demo_name
/data.txt
\
--refer
=
$DATA_DIR
/
$vis_demo_name
/result.txt
\
--use_gpu
=
$use_gpu
if
[
$?
-ne
0
]
;
then
echo
"vis demo
$vis_demo_name
runs fail."
...
...
paddle/fluid/inference/api/helper.h
浏览文件 @
ea7dc9cb
...
...
@@ -16,19 +16,34 @@
#include <glog/logging.h>
#include <sys/time.h>
#include <
algorithm>
#include <
chrono> // NOLINT
#include <numeric>
#include <sstream>
#include <string>
#include <vector>
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/inference/api/paddle_inference_api.h"
#include "paddle/fluid/inference/api/timer.h"
#include "paddle/fluid/string/printf.h"
#include "paddle_inference_api.h"
namespace
paddle
{
namespace
inference
{
// Timer for timer
class
Timer
{
public:
std
::
chrono
::
high_resolution_clock
::
time_point
start
;
std
::
chrono
::
high_resolution_clock
::
time_point
startu
;
void
tic
()
{
start
=
std
::
chrono
::
high_resolution_clock
::
now
();
}
double
toc
()
{
startu
=
std
::
chrono
::
high_resolution_clock
::
now
();
std
::
chrono
::
duration
<
double
>
time_span
=
std
::
chrono
::
duration_cast
<
std
::
chrono
::
duration
<
double
>>
(
startu
-
start
);
double
used_time_ms
=
static_cast
<
double
>
(
time_span
.
count
())
*
1000.0
;
return
used_time_ms
;
}
};
static
void
split
(
const
std
::
string
&
str
,
char
sep
,
std
::
vector
<
std
::
string
>
*
pieces
)
{
pieces
->
clear
();
...
...
@@ -154,127 +169,5 @@ static void PrintTime(int batch_size, int repeat, int num_threads, int tid,
}
}
template
<
typename
T
>
std
::
string
LoDTensorSummary
(
const
framework
::
LoDTensor
&
tensor
)
{
std
::
stringstream
ss
;
ss
<<
"
\n
---- tensor ---"
<<
'\n'
;
ss
<<
"lod: ["
;
for
(
const
auto
&
level
:
tensor
.
lod
())
{
ss
<<
"[ "
;
for
(
auto
i
:
level
)
{
ss
<<
i
<<
", "
;
}
ss
<<
"]"
;
}
ss
<<
"]
\n
"
;
ss
<<
"shape: ["
;
int
size
=
1
;
for
(
int
i
=
0
;
i
<
tensor
.
dims
().
size
();
i
++
)
{
int
dim
=
tensor
.
dims
()[
i
];
ss
<<
dim
<<
", "
;
size
*=
dim
;
}
ss
<<
"]
\n
"
;
ss
<<
"data: "
;
for
(
int
i
=
0
;
i
<
std
::
min
(
20
,
size
);
i
++
)
{
ss
<<
tensor
.
data
<
T
>
()[
i
]
<<
" "
;
}
ss
<<
"
\n
"
;
return
ss
.
str
();
}
static
bool
CompareLoD
(
const
framework
::
LoD
&
a
,
const
framework
::
LoD
&
b
)
{
if
(
a
.
size
()
!=
b
.
size
())
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"lod size not match %d != %d"
,
a
.
size
(),
b
.
size
());
return
false
;
}
for
(
size_t
i
=
0
;
i
<
a
.
size
();
i
++
)
{
auto
&
al
=
a
[
i
];
auto
&
bl
=
b
[
i
];
if
(
al
.
size
()
!=
bl
.
size
())
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"level size %d != %d"
,
al
.
size
(),
bl
.
size
());
return
false
;
}
}
return
true
;
}
static
bool
CompareShape
(
const
std
::
vector
<
int64_t
>
&
a
,
const
std
::
vector
<
int64_t
>
&
b
)
{
if
(
a
.
size
()
!=
b
.
size
())
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"shape size not match %d != %d"
,
a
.
size
(),
b
.
size
());
return
false
;
}
for
(
size_t
i
=
0
;
i
<
a
.
size
();
i
++
)
{
if
(
a
[
i
]
!=
b
[
i
])
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"shape %d-th element not match %d != %d"
,
i
,
a
[
i
],
b
[
i
]);
return
false
;
}
}
return
true
;
}
static
bool
CompareTensorData
(
const
framework
::
LoDTensor
&
a
,
const
framework
::
LoDTensor
&
b
)
{
auto
a_shape
=
framework
::
vectorize
(
a
.
dims
());
auto
b_shape
=
framework
::
vectorize
(
b
.
dims
());
size_t
a_size
=
std
::
accumulate
(
a_shape
.
begin
(),
a_shape
.
end
(),
1
,
[](
int
a
,
int
b
)
{
return
a
*
b
;
});
size_t
b_size
=
std
::
accumulate
(
b_shape
.
begin
(),
b_shape
.
end
(),
1
,
[](
int
a
,
int
b
)
{
return
a
*
b
;
});
if
(
a_size
!=
b_size
)
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"tensor data size not match, %d != %d"
,
a_size
,
b_size
);
}
for
(
size_t
i
=
0
;
i
<
a_size
;
i
++
)
{
if
(
a
.
type
()
==
typeid
(
float
))
{
const
auto
*
a_data
=
a
.
data
<
float
>
();
const
auto
*
b_data
=
b
.
data
<
float
>
();
if
(
std
::
abs
(
a_data
[
i
]
-
b_data
[
i
])
>
1e-3
)
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"tensor data %d-th element not match, %f != %f"
,
i
,
a_data
[
i
],
b_data
[
i
]);
return
false
;
}
}
else
if
(
a
.
type
()
==
typeid
(
int64_t
))
{
const
auto
*
a_data
=
a
.
data
<
int64_t
>
();
const
auto
*
b_data
=
b
.
data
<
int64_t
>
();
if
(
std
::
abs
(
a_data
[
i
]
-
b_data
[
i
])
>
1e-3
)
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"tensor data %d-th element not match, %f != %f"
,
i
,
a_data
[
i
],
b_data
[
i
]);
return
false
;
}
}
}
return
true
;
}
static
bool
CompareTensor
(
const
framework
::
LoDTensor
&
a
,
const
framework
::
LoDTensor
&
b
)
{
if
(
!
CompareLoD
(
a
.
lod
(),
b
.
lod
()))
{
return
false
;
}
if
(
!
CompareShape
(
framework
::
vectorize
(
a
.
dims
()),
framework
::
vectorize
(
b
.
dims
())))
{
return
false
;
}
if
(
!
CompareTensorData
(
a
,
b
))
{
return
false
;
}
return
true
;
}
}
// namespace inference
}
// namespace paddle
paddle/fluid/inference/api/paddle_inference_api.h
浏览文件 @
ea7dc9cb
...
...
@@ -263,14 +263,13 @@ struct AnalysisConfig : public NativeConfig {
bool
enable_ir_optim
=
true
;
// Manually determine the IR passes to run.
IrPassMode
ir_mode
{
IrPassMode
::
kExclude
};
std
::
vector
<
std
::
string
>
ir_passes
;
std
::
vector
<
std
::
string
>
ir_passes
{
"embedding_fc_lstm_fuse_pass"
}
;
// NOT stable yet.
bool
use_feed_fetch_ops
{
true
};
// NOTE this is just for internal development, please not use it. NOT
// stable
// yet.
// NOTE this is just for internal development, please not use it.
// NOT stable yet.
bool
_use_mkldnn
{
false
};
};
...
...
paddle/fluid/inference/tests/api/CMakeLists.txt
浏览文件 @
ea7dc9cb
...
...
@@ -70,6 +70,14 @@ if (NOT EXISTS ${OCR_INSTALL_DIR})
endif
()
inference_analysis_api_test
(
test_analyzer_ocr
${
OCR_INSTALL_DIR
}
analyzer_vis_tester.cc
)
# resnet50
set
(
RESNET50_INSTALL_DIR
"
${
INFERENCE_DEMO_INSTALL_DIR
}
/resnet50"
)
if
(
NOT EXISTS
${
RESNET50_INSTALL_DIR
}
)
inference_download_and_uncompress
(
${
RESNET50_INSTALL_DIR
}
${
INFERENCE_URL
}
"resnet50_model.tar.gz"
)
endif
()
inference_analysis_test
(
test_analyzer_resnet50 SRCS analyzer_resnet50_tester.cc
EXTRA_DEPS
${
INFERENCE_EXTRA_DEPS
}
ARGS --infer_model=
${
RESNET50_INSTALL_DIR
}
/model
)
# anakin
if
(
WITH_ANAKIN AND WITH_MKL
)
# only needed in CI
# anakin rnn1
...
...
paddle/fluid/inference/tests/api/anakin_rnn1_tester.cc
浏览文件 @
ea7dc9cb
...
...
@@ -22,7 +22,6 @@ limitations under the License. */
#include <vector>
#include "paddle/fluid/inference/api/helper.h"
#include "paddle/fluid/inference/api/paddle_inference_api.h"
#include "paddle/fluid/inference/api/timer.h"
#include "utils/logger/logger.h"
DEFINE_string
(
model
,
""
,
"Directory of the inference model."
);
...
...
paddle/fluid/inference/tests/api/analyzer_resnet50_tester.cc
0 → 100644
浏览文件 @
ea7dc9cb
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <fstream>
#include <iostream>
#include "paddle/fluid/inference/tests/api/tester_helper.h"
namespace
paddle
{
namespace
inference
{
namespace
analysis
{
void
SetConfig
(
AnalysisConfig
*
cfg
)
{
cfg
->
param_file
=
FLAGS_infer_model
+
"/params"
;
cfg
->
prog_file
=
FLAGS_infer_model
+
"/model"
;
cfg
->
use_gpu
=
false
;
cfg
->
device
=
0
;
cfg
->
enable_ir_optim
=
true
;
cfg
->
specify_input_name
=
true
;
}
void
SetInput
(
std
::
vector
<
std
::
vector
<
PaddleTensor
>>
*
inputs
)
{
PADDLE_ENFORCE_EQ
(
FLAGS_test_all_data
,
0
,
"Only have single batch of data."
);
PaddleTensor
input
;
// channel=3, height/width=318
std
::
vector
<
int
>
shape
({
FLAGS_batch_size
,
3
,
318
,
318
});
input
.
shape
=
shape
;
input
.
dtype
=
PaddleDType
::
FLOAT32
;
// fill input data, for profile easily, do not use random data here.
size_t
size
=
FLAGS_batch_size
*
3
*
318
*
318
;
input
.
data
.
Resize
(
size
*
sizeof
(
float
));
float
*
input_data
=
static_cast
<
float
*>
(
input
.
data
.
data
());
for
(
size_t
i
=
0
;
i
<
size
;
i
++
)
{
*
(
input_data
+
i
)
=
static_cast
<
float
>
(
i
)
/
size
;
}
std
::
vector
<
PaddleTensor
>
input_slots
;
input_slots
.
assign
({
input
});
(
*
inputs
).
emplace_back
(
input_slots
);
}
// Easy for profiling independently.
TEST
(
Analyzer_resnet50
,
profile
)
{
AnalysisConfig
cfg
;
SetConfig
(
&
cfg
);
std
::
vector
<
PaddleTensor
>
outputs
;
std
::
vector
<
std
::
vector
<
PaddleTensor
>>
input_slots_all
;
SetInput
(
&
input_slots_all
);
TestPrediction
(
cfg
,
input_slots_all
,
&
outputs
,
FLAGS_num_threads
);
if
(
FLAGS_num_threads
==
1
&&
!
FLAGS_test_all_data
)
{
PADDLE_ENFORCE_EQ
(
outputs
.
size
(),
1UL
);
size_t
size
=
GetSize
(
outputs
[
0
]);
// output is a 512-dimension feature
EXPECT_EQ
(
size
,
512
*
FLAGS_batch_size
);
}
}
// Check the fuse status
TEST
(
Analyzer_resnet50
,
fuse_statis
)
{
AnalysisConfig
cfg
;
SetConfig
(
&
cfg
);
int
num_ops
;
auto
predictor
=
CreatePaddlePredictor
<
AnalysisConfig
>
(
cfg
);
auto
fuse_statis
=
GetFuseStatis
(
static_cast
<
AnalysisPredictor
*>
(
predictor
.
get
()),
&
num_ops
);
ASSERT_TRUE
(
fuse_statis
.
count
(
"fc_fuse"
));
EXPECT_EQ
(
fuse_statis
.
at
(
"fc_fuse"
),
1
);
}
// Compare result of NativeConfig and AnalysisConfig
TEST
(
Analyzer_resnet50
,
compare
)
{
AnalysisConfig
cfg
;
SetConfig
(
&
cfg
);
std
::
vector
<
std
::
vector
<
PaddleTensor
>>
input_slots_all
;
SetInput
(
&
input_slots_all
);
CompareNativeAndAnalysis
(
cfg
,
input_slots_all
);
}
}
// namespace analysis
}
// namespace inference
}
// namespace paddle
paddle/fluid/inference/tests/api/analyzer_rnn1_tester.cc
浏览文件 @
ea7dc9cb
...
...
@@ -12,7 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/inference/api/analysis_predictor.h"
#include "paddle/fluid/inference/tests/api/tester_helper.h"
DEFINE_bool
(
with_precision_check
,
true
,
"turn on test"
);
...
...
@@ -271,10 +270,11 @@ TEST(Analyzer_rnn1, multi_thread) {
std
::
vector
<
std
::
vector
<
PaddleTensor
>>
input_slots_all
;
SetInput
(
&
input_slots_all
);
TestPrediction
(
cfg
,
input_slots_all
,
&
outputs
,
FLAGS_num_threads
);
TestPrediction
(
cfg
,
input_slots_all
,
&
outputs
,
4
/* multi_thread */
);
}
bool
CompareTensors
(
framework
::
Scope
&
a_scope
,
framework
::
Scope
&
b_scope
,
bool
CompareTensors
(
const
framework
::
Scope
&
a_scope
,
const
framework
::
Scope
&
b_scope
,
const
std
::
vector
<
std
::
string
>
&
tensors
)
{
for
(
auto
&
x
:
tensors
)
{
auto
*
a_var
=
a_scope
.
FindVar
(
x
);
...
...
paddle/fluid/inference/tests/api/analyzer_text_classification_tester.cc
浏览文件 @
ea7dc9cb
...
...
@@ -104,5 +104,18 @@ TEST(Analyzer_Text_Classification, compare) {
CompareNativeAndAnalysis
(
cfg
,
input_slots_all
);
}
TEST
(
Analyzer_Text_Classification
,
compare_against_embedding_fc_lstm_fused
)
{
AnalysisConfig
cfg
;
SetConfig
(
&
cfg
);
// Enable embedding_fc_lstm_fuse_pass (disabled by default)
auto
it
=
std
::
find
(
cfg
.
ir_passes
.
begin
(),
cfg
.
ir_passes
.
end
(),
"embedding_fc_lstm_fuse_pass"
);
if
(
it
!=
cfg
.
ir_passes
.
end
())
cfg
.
ir_passes
.
erase
(
it
);
std
::
vector
<
std
::
vector
<
PaddleTensor
>>
input_slots_all
;
SetInput
(
&
input_slots_all
);
CompareNativeAndAnalysis
(
cfg
,
input_slots_all
);
}
}
// namespace inference
}
// namespace paddle
paddle/fluid/inference/tests/api/analyzer_vis_tester.cc
浏览文件 @
ea7dc9cb
...
...
@@ -61,8 +61,6 @@ void SetConfig(AnalysisConfig *cfg) {
cfg
->
ir_passes
.
push_back
(
"fc_gru_fuse_pass"
);
#ifdef PADDLE_WITH_MKLDNN
cfg
->
_use_mkldnn
=
true
;
// disable mkldnn fuse since it should have some bugs
cfg
->
ir_passes
.
push_back
(
"conv_relu_mkldnn_fuse_pass"
);
#endif
}
...
...
paddle/fluid/inference/tests/api/tester_helper.h
浏览文件 @
ea7dc9cb
...
...
@@ -15,6 +15,7 @@
#pragma once
#include <gtest/gtest.h>
#include <algorithm>
#include <string>
#include <thread> // NOLINT
#include <vector>
...
...
@@ -182,5 +183,127 @@ void CompareNativeAndAnalysis(
CompareResult
(
analysis_outputs
,
native_outputs
);
}
template
<
typename
T
>
std
::
string
LoDTensorSummary
(
const
framework
::
LoDTensor
&
tensor
)
{
std
::
stringstream
ss
;
ss
<<
"
\n
---- tensor ---"
<<
'\n'
;
ss
<<
"lod: ["
;
for
(
const
auto
&
level
:
tensor
.
lod
())
{
ss
<<
"[ "
;
for
(
auto
i
:
level
)
{
ss
<<
i
<<
", "
;
}
ss
<<
"]"
;
}
ss
<<
"]
\n
"
;
ss
<<
"shape: ["
;
int
size
=
1
;
for
(
int
i
=
0
;
i
<
tensor
.
dims
().
size
();
i
++
)
{
int
dim
=
tensor
.
dims
()[
i
];
ss
<<
dim
<<
", "
;
size
*=
dim
;
}
ss
<<
"]
\n
"
;
ss
<<
"data: "
;
for
(
int
i
=
0
;
i
<
std
::
min
(
20
,
size
);
i
++
)
{
ss
<<
tensor
.
data
<
T
>
()[
i
]
<<
" "
;
}
ss
<<
"
\n
"
;
return
ss
.
str
();
}
static
bool
CompareLoD
(
const
framework
::
LoD
&
a
,
const
framework
::
LoD
&
b
)
{
if
(
a
.
size
()
!=
b
.
size
())
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"lod size not match %d != %d"
,
a
.
size
(),
b
.
size
());
return
false
;
}
for
(
size_t
i
=
0
;
i
<
a
.
size
();
i
++
)
{
auto
&
al
=
a
[
i
];
auto
&
bl
=
b
[
i
];
if
(
al
.
size
()
!=
bl
.
size
())
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"level size %d != %d"
,
al
.
size
(),
bl
.
size
());
return
false
;
}
}
return
true
;
}
static
bool
CompareShape
(
const
std
::
vector
<
int64_t
>
&
a
,
const
std
::
vector
<
int64_t
>
&
b
)
{
if
(
a
.
size
()
!=
b
.
size
())
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"shape size not match %d != %d"
,
a
.
size
(),
b
.
size
());
return
false
;
}
for
(
size_t
i
=
0
;
i
<
a
.
size
();
i
++
)
{
if
(
a
[
i
]
!=
b
[
i
])
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"shape %d-th element not match %d != %d"
,
i
,
a
[
i
],
b
[
i
]);
return
false
;
}
}
return
true
;
}
static
bool
CompareTensorData
(
const
framework
::
LoDTensor
&
a
,
const
framework
::
LoDTensor
&
b
)
{
auto
a_shape
=
framework
::
vectorize
(
a
.
dims
());
auto
b_shape
=
framework
::
vectorize
(
b
.
dims
());
size_t
a_size
=
std
::
accumulate
(
a_shape
.
begin
(),
a_shape
.
end
(),
1
,
[](
int
a
,
int
b
)
{
return
a
*
b
;
});
size_t
b_size
=
std
::
accumulate
(
b_shape
.
begin
(),
b_shape
.
end
(),
1
,
[](
int
a
,
int
b
)
{
return
a
*
b
;
});
if
(
a_size
!=
b_size
)
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"tensor data size not match, %d != %d"
,
a_size
,
b_size
);
}
for
(
size_t
i
=
0
;
i
<
a_size
;
i
++
)
{
if
(
a
.
type
()
==
typeid
(
float
))
{
const
auto
*
a_data
=
a
.
data
<
float
>
();
const
auto
*
b_data
=
b
.
data
<
float
>
();
if
(
std
::
abs
(
a_data
[
i
]
-
b_data
[
i
])
>
1e-3
)
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"tensor data %d-th element not match, %f != %f"
,
i
,
a_data
[
i
],
b_data
[
i
]);
return
false
;
}
}
else
if
(
a
.
type
()
==
typeid
(
int64_t
))
{
const
auto
*
a_data
=
a
.
data
<
int64_t
>
();
const
auto
*
b_data
=
b
.
data
<
int64_t
>
();
if
(
std
::
abs
(
a_data
[
i
]
-
b_data
[
i
])
>
1e-3
)
{
LOG
(
ERROR
)
<<
string
::
Sprintf
(
"tensor data %d-th element not match, %f != %f"
,
i
,
a_data
[
i
],
b_data
[
i
]);
return
false
;
}
}
}
return
true
;
}
static
bool
CompareTensor
(
const
framework
::
LoDTensor
&
a
,
const
framework
::
LoDTensor
&
b
)
{
if
(
!
CompareLoD
(
a
.
lod
(),
b
.
lod
()))
{
return
false
;
}
if
(
!
CompareShape
(
framework
::
vectorize
(
a
.
dims
()),
framework
::
vectorize
(
b
.
dims
())))
{
return
false
;
}
if
(
!
CompareTensorData
(
a
,
b
))
{
return
false
;
}
return
true
;
}
}
// namespace inference
}
// namespace paddle
paddle/fluid/inference/tests/book/CMakeLists.txt
浏览文件 @
ea7dc9cb
...
...
@@ -4,7 +4,6 @@ function(inference_test TARGET_NAME)
set
(
multiValueArgs ARGS
)
cmake_parse_arguments
(
inference_test
"
${
options
}
"
"
${
oneValueArgs
}
"
"
${
multiValueArgs
}
"
${
ARGN
}
)
set
(
PYTHON_TESTS_DIR
${
PADDLE_BINARY_DIR
}
/python/paddle/fluid/tests
)
set
(
arg_list
""
)
if
(
inference_test_ARGS
)
foreach
(
arg
${
inference_test_ARGS
}
)
...
...
paddle/fluid/operators/CMakeLists.txt
浏览文件 @
ea7dc9cb
...
...
@@ -82,10 +82,11 @@ function(op_library TARGET)
if
(
${
cc_srcs_len
}
EQUAL 0
)
message
(
FATAL_ERROR
"The op library
${
TARGET
}
should contains at least one .cc file"
)
endif
()
#remove windows unsupported op
if
(
WIN32
)
foreach
(
windows_unsupport_op
"nccl_op"
"gen_nccl_id_op"
"warpctc_op"
)
# remove windows unsupported op, because windows has no nccl, no warpctc such ops.
foreach
(
windows_unsupport_op
"nccl_op"
"gen_nccl_id_op"
"warpctc_op"
"hierarchical_sigmoid_op"
"crf_decoding_op"
"select_op"
"lstmp_op"
"gru_op"
"fusion_gru_op"
"lstm_op"
"fusion_lstm_op"
"cumsum_op"
"channel_send_op"
"channel_create_op"
"channel_close_op"
"channel_recv_op"
)
if
(
"
${
TARGET
}
"
STREQUAL
"
${
windows_unsupport_op
}
"
)
return
()
endif
()
...
...
@@ -281,10 +282,12 @@ op_library(array_to_lod_tensor_op DEPS lod_rank_table_op)
op_library
(
max_sequence_len_op DEPS lod_rank_table
)
op_library
(
sequence_conv_op DEPS context_project
)
op_library
(
sequence_pool_op DEPS sequence_pooling
)
if
(
NOT WIN32
)
op_library
(
lstm_op DEPS sequence2batch lstm_compute
)
op_library
(
hierarchical_sigmoid_op DEPS matrix_bit_code
)
op_library
(
lstmp_op DEPS sequence2batch lstm_compute
)
op_library
(
gru_op DEPS sequence2batch gru_compute
)
endif
(
NOT WIN32
)
op_library
(
recurrent_op DEPS executor
)
op_library
(
warpctc_op DEPS dynload_warpctc sequence_padding sequence_scale
)
op_library
(
cos_sim_op DEPS cos_sim_functor
)
...
...
@@ -297,10 +300,10 @@ op_library(sequence_pad_op DEPS sequence_padding)
op_library
(
unstack_op DEPS stack_op
)
op_library
(
fake_quantize_op DEPS memory
)
op_library
(
fusion_lstm_op DEPS cpu_lstm_compute
)
if
(
WITH_GPU
)
op_library
(
conv_op DEPS vol2col depthwise_conv im2col
)
op_library
(
layer_norm_op DEPS cub
)
op_library
(
reduce_mean_op DEPS cub
)
else
()
op_library
(
conv_op DEPS vol2col im2col
)
endif
()
...
...
@@ -313,11 +316,6 @@ op_library(save_combine_op DEPS lod_tensor)
op_library
(
load_combine_op DEPS lod_tensor
)
op_library
(
concat_op DEPS concat
)
# FIXME(thuan): Move CSP operators to paddle/fluid/framework/operators/concurrency
add_subdirectory
(
concurrency
)
op_library
(
channel_send_op DEPS concurrency
)
op_library
(
channel_recv_op DEPS concurrency
)
list
(
REMOVE_ITEM GENERAL_OPS
${
DEPS_OPS
}
)
foreach
(
src
${
GENERAL_OPS
}
)
...
...
paddle/fluid/operators/channel_close_op.cc
已删除
100644 → 0
浏览文件 @
2513b2cc
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/op_registry.h"
namespace
pf
=
paddle
::
framework
;
static
constexpr
char
kChannel
[]
=
"Channel"
;
namespace
paddle
{
namespace
operators
{
class
ChannelCloseOp
:
public
framework
::
OperatorBase
{
public:
ChannelCloseOp
(
const
std
::
string
&
type
,
const
framework
::
VariableNameMap
&
inputs
,
const
framework
::
VariableNameMap
&
outputs
,
const
framework
::
AttributeMap
&
attrs
)
:
framework
::
OperatorBase
(
type
,
inputs
,
outputs
,
attrs
)
{}
private:
void
RunImpl
(
const
framework
::
Scope
&
scope
,
const
platform
::
Place
&
dev_place
)
const
override
{
auto
&
inp
=
*
scope
.
FindVar
(
Input
(
kChannel
));
// Get the mutable version of the channel variable and closes it.
pf
::
ChannelHolder
*
ch
=
inp
.
GetMutable
<
framework
::
ChannelHolder
>
();
ch
->
close
();
}
};
class
ChannelCloseOpOpInferShape
:
public
framework
::
InferShapeBase
{
public:
void
operator
()(
framework
::
InferShapeContext
*
context
)
const
override
{
PADDLE_ENFORCE
(
context
->
HasInput
(
"Channel"
),
"The input of ChannelClose op must be set"
);
}
};
class
ChannelCloseOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
void
Make
()
override
{
AddInput
(
kChannel
,
"The Channel Variable that should be closed by"
" the ChannelClose Op."
);
AddComment
(
R"DOC(
Channel Close Operator.
This operator closes an open channel.
)DOC"
);
}
};
}
// namespace operators
}
// namespace paddle
REGISTER_OPERATOR
(
channel_close
,
paddle
::
operators
::
ChannelCloseOp
,
paddle
::
framework
::
EmptyGradOpMaker
,
paddle
::
operators
::
ChannelCloseOpMaker
);
paddle/fluid/operators/channel_create_op.cc
已删除
100644 → 0
浏览文件 @
2513b2cc
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/lod_rank_table.h"
#include "paddle/fluid/framework/lod_tensor_array.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/reader.h"
namespace
pf
=
paddle
::
framework
;
static
constexpr
char
kOutput
[]
=
"Out"
;
namespace
paddle
{
namespace
operators
{
class
ChannelCreateOp
:
public
framework
::
OperatorBase
{
public:
ChannelCreateOp
(
const
std
::
string
&
type
,
const
framework
::
VariableNameMap
&
inputs
,
const
framework
::
VariableNameMap
&
outputs
,
const
framework
::
AttributeMap
&
attrs
)
:
framework
::
OperatorBase
(
type
,
inputs
,
outputs
,
attrs
)
{}
private:
void
RunImpl
(
const
framework
::
Scope
&
scope
,
const
platform
::
Place
&
dev_place
)
const
override
{
auto
&
out
=
*
scope
.
FindVar
(
Output
(
kOutput
));
// Determine the datatype and capacity of the channel to be created
// from the attributes provided.
auto
dtype
=
static_cast
<
framework
::
proto
::
VarType
::
Type
>
(
Attr
<
int
>
(
"data_type"
));
auto
capacity
=
Attr
<
int
>
(
"capacity"
);
// Based on the datatype, create a new channel holder initialized with
// the given capacity. When capacity is 0, an unbuffered channel is
// created.
pf
::
ChannelHolder
*
ch
=
out
.
GetMutable
<
framework
::
ChannelHolder
>
();
if
(
dtype
==
framework
::
proto
::
VarType
::
LOD_TENSOR
)
{
ch
->
Reset
<
pf
::
LoDTensor
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
SELECTED_ROWS
)
{
ch
->
Reset
<
pf
::
SelectedRows
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
LOD_RANK_TABLE
)
{
ch
->
Reset
<
pf
::
LoDRankTable
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
LOD_TENSOR_ARRAY
)
{
ch
->
Reset
<
pf
::
LoDTensorArray
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
READER
)
{
ch
->
Reset
<
pf
::
ReaderHolder
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
CHANNEL
)
{
ch
->
Reset
<
pf
::
ChannelHolder
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
BOOL
)
{
ch
->
Reset
<
bool
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
INT32
)
{
ch
->
Reset
<
int
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
INT64
)
{
ch
->
Reset
<
int64_t
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
FP32
)
{
ch
->
Reset
<
float
>
(
capacity
);
}
else
if
(
dtype
==
framework
::
proto
::
VarType
::
FP64
)
{
ch
->
Reset
<
double
>
(
capacity
);
}
else
{
PADDLE_THROW
(
"Data type %d is not in "
"[LOD_TENSOR, SELECTED_ROWS, LOD_RANK_TABLE, LOD_TENSOR_ARRAY, "
"READER, CHANNEL, BOOL, INT32, INT64, FP32, FP64]"
,
dtype
);
}
}
};
class
ChannelCreateOpOpInferShape
:
public
framework
::
InferShapeBase
{
public:
void
operator
()(
framework
::
InferShapeContext
*
context
)
const
override
{
PADDLE_ENFORCE
(
context
->
HasOutput
(
kOutput
),
"The output of ChannelCreate op must be set"
);
context
->
SetOutputDim
(
kOutput
,
{
1
});
}
};
class
ChannelCreateOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
void
Make
()
override
{
AddOutput
(
kOutput
,
"The object of a Channel type created by ChannelCreate Op."
);
AddAttr
<
int
>
(
"capacity"
,
"The size of the buffer of Channel."
)
.
SetDefault
(
0
);
AddAttr
<
int
>
(
"data_type"
,
"The data type of elements inside the Channel."
);
AddComment
(
R"DOC(
Channel Create Operator.
This operator creates an object of the VarType Channel and returns it.
)DOC"
);
}
};
}
// namespace operators
}
// namespace paddle
REGISTER_OPERATOR
(
channel_create
,
paddle
::
operators
::
ChannelCreateOp
,
paddle
::
framework
::
EmptyGradOpMaker
,
paddle
::
operators
::
ChannelCreateOpMaker
);
paddle/fluid/operators/channel_recv_op.cc
已删除
100644 → 0
浏览文件 @
2513b2cc
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/channel.h"
#include <paddle/fluid/framework/lod_rank_table.h>
#include <paddle/fluid/framework/lod_tensor_array.h>
#include <paddle/fluid/framework/reader.h>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/var_type.h"
#include "paddle/fluid/operators/concurrency/channel_util.h"
#include "paddle/fluid/operators/math/math_function.h"
static
constexpr
char
Channel
[]
=
"Channel"
;
static
constexpr
char
Status
[]
=
"Status"
;
static
constexpr
char
Out
[]
=
"Out"
;
namespace
paddle
{
namespace
operators
{
void
SetReceiveStatus
(
const
platform
::
Place
&
dev_place
,
framework
::
Variable
*
status_var
,
bool
status
)
{
auto
cpu
=
platform
::
CPUPlace
();
auto
status_tensor
=
status_var
->
GetMutable
<
framework
::
LoDTensor
>
()
->
mutable_data
<
bool
>
({
1
},
cpu
);
status_tensor
[
0
]
=
status
;
}
class
ChannelRecvOp
:
public
framework
::
OperatorBase
{
public:
ChannelRecvOp
(
const
std
::
string
&
type
,
const
framework
::
VariableNameMap
&
inputs
,
const
framework
::
VariableNameMap
&
outputs
,
const
framework
::
AttributeMap
&
attrs
)
:
framework
::
OperatorBase
(
type
,
inputs
,
outputs
,
attrs
)
{}
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
{
PADDLE_ENFORCE
(
ctx
->
HasInput
(
Channel
),
"Input(Channel) of ChannelRecvOp should not be null."
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
Out
),
"Input(Channel) of ChannelRecvOp should not be null."
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
Status
),
"Output(Status) of ChannelRecvOp should not be null."
);
ctx
->
SetOutputDim
(
"Status"
,
{
1
});
}
private:
void
RunImpl
(
const
framework
::
Scope
&
scope
,
const
platform
::
Place
&
dev_place
)
const
override
{
// Get the channel holder created by channel_create op, passed as input.
framework
::
ChannelHolder
*
ch
=
scope
.
FindVar
(
Input
(
Channel
))
->
GetMutable
<
framework
::
ChannelHolder
>
();
auto
output_var
=
scope
.
FindVar
(
Output
(
Out
));
// Receive the data from the channel.
bool
ok
=
concurrency
::
ChannelReceive
(
ch
,
output_var
);
// Set the status output of the `ChannelReceive` call.
SetReceiveStatus
(
dev_place
,
scope
.
FindVar
(
Output
(
Status
)),
ok
);
}
};
class
ChannelRecvOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
void
Make
()
override
{
AddInput
(
Channel
,
"(Channel) A variable which
\"
receives
\"
the a value sent"
"to it by a channel_send op."
)
.
AsDuplicable
();
AddOutput
(
Out
,
"(Variable) Output Variable that will hold the data received"
" from the Channel"
)
.
AsDuplicable
();
AddOutput
(
Status
,
"(Tensor) An LoD Tensor that returns a boolean status of the"
"result of the receive operation."
)
.
AsDuplicable
();
AddComment
(
R"DOC(
)DOC"
);
}
};
}
// namespace operators
}
// namespace paddle
REGISTER_OPERATOR
(
channel_recv
,
paddle
::
operators
::
ChannelRecvOp
,
paddle
::
framework
::
EmptyGradOpMaker
,
paddle
::
operators
::
ChannelRecvOpMaker
);
paddle/fluid/operators/channel_send_op.cc
已删除
100644 → 0
浏览文件 @
2513b2cc
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/channel.h"
#include <paddle/fluid/framework/lod_rank_table.h>
#include <paddle/fluid/framework/lod_tensor_array.h>
#include <paddle/fluid/framework/reader.h>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/var_type.h"
#include "paddle/fluid/operators/concurrency/channel_util.h"
#include "paddle/fluid/operators/math/math_function.h"
static
constexpr
char
Channel
[]
=
"Channel"
;
static
constexpr
char
X
[]
=
"X"
;
namespace
paddle
{
namespace
operators
{
class
ChannelSendOp
:
public
framework
::
OperatorBase
{
public:
ChannelSendOp
(
const
std
::
string
&
type
,
const
framework
::
VariableNameMap
&
inputs
,
const
framework
::
VariableNameMap
&
outputs
,
const
framework
::
AttributeMap
&
attrs
)
:
framework
::
OperatorBase
(
type
,
inputs
,
outputs
,
attrs
)
{}
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
{
PADDLE_ENFORCE
(
ctx
->
HasInput
(
Channel
),
"Input(Channel) of ChannelSendOp should not be null."
);
PADDLE_ENFORCE
(
ctx
->
HasInput
(
X
),
"Input(X) of ChannelSendOp should not be null."
);
}
private:
void
RunImpl
(
const
framework
::
Scope
&
scope
,
const
platform
::
Place
&
dev_place
)
const
override
{
// Get the channel holder created by channel_create op, passed as input.
framework
::
ChannelHolder
*
ch
=
scope
.
FindVar
(
Input
(
Channel
))
->
GetMutable
<
framework
::
ChannelHolder
>
();
auto
input_var
=
scope
.
FindVar
(
Input
(
X
));
// Send the input data through the channel.
concurrency
::
ChannelSend
(
ch
,
input_var
);
}
};
class
ChannelSendOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
void
Make
()
override
{
AddInput
(
Channel
,
"(Channel) A variable which
\"
sends
\"
the passed in value to "
"a listening receiver."
)
.
AsDuplicable
();
AddInput
(
X
,
"(Variable) The value which gets sent by the channel."
)
.
AsDuplicable
();
AddComment
(
R"DOC(
)DOC"
);
}
};
}
// namespace operators
}
// namespace paddle
REGISTER_OPERATOR
(
channel_send
,
paddle
::
operators
::
ChannelSendOp
,
paddle
::
framework
::
EmptyGradOpMaker
,
paddle
::
operators
::
ChannelSendOpMaker
);
paddle/fluid/operators/concurrency/CMakeLists.txt
已删除
100644 → 0
浏览文件 @
2513b2cc
cc_library
(
concurrency SRCS channel_util.cc DEPS device_context framework_proto boost eigen3
)
paddle/fluid/operators/concurrency/channel_util.cc
已删除
100644 → 0
浏览文件 @
2513b2cc
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/concurrency/channel_util.h"
#include "paddle/fluid/framework/var_type.h"
namespace
poc
=
paddle
::
operators
::
concurrency
;
void
poc
::
ChannelSend
(
framework
::
ChannelHolder
*
ch
,
framework
::
Variable
*
var
)
{
auto
type
=
framework
::
ToVarType
(
var
->
Type
());
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_TENSOR
)
ch
->
Send
(
var
->
GetMutable
<
framework
::
LoDTensor
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_RANK_TABLE
)
ch
->
Send
(
var
->
GetMutable
<
framework
::
LoDRankTable
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_TENSOR_ARRAY
)
ch
->
Send
(
var
->
GetMutable
<
framework
::
LoDTensorArray
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_SELECTED_ROWS
)
ch
->
Send
(
var
->
GetMutable
<
framework
::
SelectedRows
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_READER
)
ch
->
Send
(
var
->
GetMutable
<
framework
::
ReaderHolder
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_CHANNEL
)
ch
->
Send
(
var
->
GetMutable
<
framework
::
ChannelHolder
>
());
else
PADDLE_THROW
(
"ChannelSend:Unsupported type"
);
}
bool
poc
::
ChannelReceive
(
framework
::
ChannelHolder
*
ch
,
framework
::
Variable
*
var
)
{
// Get type of channel and use that to call mutable data for Variable
auto
type
=
framework
::
ToVarType
(
ch
->
Type
());
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_TENSOR
)
return
ch
->
Receive
(
var
->
GetMutable
<
framework
::
LoDTensor
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_RANK_TABLE
)
return
ch
->
Receive
(
var
->
GetMutable
<
framework
::
LoDRankTable
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_TENSOR_ARRAY
)
return
ch
->
Receive
(
var
->
GetMutable
<
framework
::
LoDTensorArray
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_SELECTED_ROWS
)
return
ch
->
Receive
(
var
->
GetMutable
<
framework
::
SelectedRows
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_READER
)
return
ch
->
Receive
(
var
->
GetMutable
<
framework
::
ReaderHolder
>
());
else
if
(
type
==
framework
::
proto
::
VarType_Type_CHANNEL
)
return
ch
->
Receive
(
var
->
GetMutable
<
framework
::
ChannelHolder
>
());
else
PADDLE_THROW
(
"ChannelReceive:Unsupported type"
);
}
void
poc
::
ChannelAddToSendQ
(
framework
::
ChannelHolder
*
ch
,
const
void
*
referrer
,
framework
::
Variable
*
var
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
framework
::
ChannelAction
)
>
cb
)
{
auto
type
=
framework
::
ToVarType
(
var
->
Type
());
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_TENSOR
)
{
ch
->
AddToSendQ
(
referrer
,
var
->
GetMutable
<
framework
::
LoDTensor
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_RANK_TABLE
)
{
ch
->
AddToSendQ
(
referrer
,
var
->
GetMutable
<
framework
::
LoDRankTable
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_TENSOR_ARRAY
)
{
ch
->
AddToSendQ
(
referrer
,
var
->
GetMutable
<
framework
::
LoDTensorArray
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_SELECTED_ROWS
)
{
ch
->
AddToSendQ
(
referrer
,
var
->
GetMutable
<
framework
::
SelectedRows
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_READER
)
{
ch
->
AddToSendQ
(
referrer
,
var
->
GetMutable
<
framework
::
ReaderHolder
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_CHANNEL
)
{
ch
->
AddToSendQ
(
referrer
,
var
->
GetMutable
<
framework
::
ChannelHolder
>
(),
cond
,
cb
);
}
else
{
PADDLE_THROW
(
"ChannelAddToSendQ:Unsupported type"
);
}
}
void
poc
::
ChannelAddToReceiveQ
(
framework
::
ChannelHolder
*
ch
,
const
void
*
referrer
,
framework
::
Variable
*
var
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
framework
::
ChannelAction
)
>
cb
)
{
auto
type
=
framework
::
ToVarType
(
var
->
Type
());
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_TENSOR
)
{
ch
->
AddToReceiveQ
(
referrer
,
var
->
GetMutable
<
framework
::
LoDTensor
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_RANK_TABLE
)
{
ch
->
AddToReceiveQ
(
referrer
,
var
->
GetMutable
<
framework
::
LoDRankTable
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_LOD_TENSOR_ARRAY
)
{
ch
->
AddToReceiveQ
(
referrer
,
var
->
GetMutable
<
framework
::
LoDTensorArray
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_SELECTED_ROWS
)
{
ch
->
AddToReceiveQ
(
referrer
,
var
->
GetMutable
<
framework
::
SelectedRows
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_READER
)
{
ch
->
AddToReceiveQ
(
referrer
,
var
->
GetMutable
<
framework
::
ReaderHolder
>
(),
cond
,
cb
);
}
else
if
(
type
==
framework
::
proto
::
VarType_Type_CHANNEL
)
{
ch
->
AddToReceiveQ
(
referrer
,
var
->
GetMutable
<
framework
::
ChannelHolder
>
(),
cond
,
cb
);
}
else
{
PADDLE_THROW
(
"ChannelAddToReceiveQ:Unsupported type"
);
}
}
paddle/fluid/operators/conv_op.h
浏览文件 @
ea7dc9cb
...
...
@@ -380,7 +380,8 @@ class DepthwiseConvKernel : public framework::OpKernel<T> {
math
::
DepthwiseConvFunctor
<
DeviceContext
,
T
>
depthwiseConv
;
auto
&
dev_ctx
=
context
.
template
device_context
<
DeviceContext
>();
depthwiseConv
(
dev_ctx
,
*
input
,
filter
,
strides
,
paddings
,
output
);
depthwiseConv
(
dev_ctx
,
*
input
,
filter
,
strides
,
paddings
,
dilations
,
output
);
}
};
...
...
@@ -415,14 +416,14 @@ class DepthwiseConvGradKernel : public framework::OpKernel<T> {
input_grad
->
mutable_data
<
T
>
(
context
.
GetPlace
());
set_zero
(
dev_ctx
,
input_grad
,
static_cast
<
T
>
(
0
));
depthwiseConvInputGrad
(
dev_ctx
,
*
input
,
filter
,
*
output_grad
,
strides
,
paddings
,
input_grad
);
paddings
,
dilations
,
input_grad
);
}
if
(
filter_grad
)
{
filter_grad
->
mutable_data
<
T
>
(
context
.
GetPlace
());
set_zero
(
dev_ctx
,
filter_grad
,
static_cast
<
T
>
(
0
));
depthwiseConvFilterGrad
(
dev_ctx
,
*
input
,
*
output_grad
,
strides
,
paddings
,
filter_grad
);
dilations
,
filter_grad
);
}
}
};
...
...
paddle/fluid/operators/conv_transpose_op.h
浏览文件 @
ea7dc9cb
...
...
@@ -345,7 +345,7 @@ class DepthwiseConvTransposeKernel : public framework::OpKernel<T> {
math
::
DepthwiseConvInputGradFunctor
<
DeviceContext
,
T
>
depthwiseConvInputGrad
;
depthwiseConvInputGrad
(
dev_ctx
,
*
output
,
filter
,
*
input
,
strides
,
paddings
,
output
);
dilations
,
output
);
}
};
...
...
@@ -367,10 +367,11 @@ class DepthwiseConvTransposeGradKernel : public framework::OpKernel<T> {
auto
&
dev_ctx
=
context
.
template
device_context
<
DeviceContext
>();
std
::
vector
<
int
>
strides
=
context
.
Attr
<
std
::
vector
<
int
>>
(
"strides"
);
std
::
vector
<
int
>
paddings
=
context
.
Attr
<
std
::
vector
<
int
>>
(
"paddings"
);
std
::
vector
<
int
>
dilations
=
context
.
Attr
<
std
::
vector
<
int
>>
(
"dilations"
);
if
(
input_grad
)
{
math
::
DepthwiseConvFunctor
<
DeviceContext
,
T
>
depthwiseConv
;
depthwiseConv
(
dev_ctx
,
*
output_grad
,
filter
,
strides
,
paddings
,
depthwiseConv
(
dev_ctx
,
*
output_grad
,
filter
,
strides
,
paddings
,
dilations
,
input_grad
);
}
...
...
@@ -382,7 +383,7 @@ class DepthwiseConvTransposeGradKernel : public framework::OpKernel<T> {
math
::
DepthwiseConvFilterGradFunctor
<
DeviceContext
,
T
>
depthwiseConvFilterGrad
;
depthwiseConvFilterGrad
(
dev_ctx
,
*
output_grad
,
*
input
,
strides
,
paddings
,
filter_grad
);
dilations
,
filter_grad
);
}
}
};
...
...
paddle/fluid/operators/cub_reduce.h
0 → 100644
浏览文件 @
ea7dc9cb
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <algorithm>
#include <cmath>
#include <numeric>
#include <set>
#include <vector>
#include <cub/cub.cuh> // NOLINT
#include "paddle/fluid/framework/tensor.h"
namespace
paddle
{
namespace
operators
{
namespace
detail
{
template
<
typename
T
,
size_t
ElementCount
>
struct
Array
{
public:
HOSTDEVICE
inline
Array
()
{}
HOSTDEVICE
inline
T
&
operator
[](
size_t
index
)
{
return
data_
[
index
];
}
HOSTDEVICE
inline
const
T
&
operator
[](
size_t
index
)
const
{
return
data_
[
index
];
}
HOSTDEVICE
constexpr
inline
size_t
size
()
const
{
return
ElementCount
;
}
template
<
typename
VectorLikeType
>
static
inline
Array
<
T
,
ElementCount
>
From
(
const
VectorLikeType
&
vec
)
{
PADDLE_ENFORCE_EQ
(
vec
.
size
(),
ElementCount
,
"size not match"
);
size_t
n
=
static_cast
<
size_t
>
(
vec
.
size
());
Array
<
T
,
ElementCount
>
ret
;
for
(
size_t
i
=
0
;
i
<
n
;
++
i
)
ret
[
i
]
=
vec
[
i
];
return
ret
;
}
private:
T
data_
[
ElementCount
];
};
// reduce the last axis of 2d array
template
<
typename
Tx
,
typename
Ty
,
typename
ReduceOp
,
typename
TransformOp
,
int
BlockDim
>
__global__
void
ReduceKernel2D
(
const
Tx
*
x
,
Ty
*
y
,
ReduceOp
reducer
,
TransformOp
transformer
,
Ty
init
,
int
reduce_num
)
{
__shared__
typename
cub
::
BlockReduce
<
Ty
,
BlockDim
>::
TempStorage
temp_storage
;
int
idx_x
=
blockIdx
.
x
*
reduce_num
;
int
idx_y
=
threadIdx
.
x
;
Ty
reduce_var
=
init
;
for
(
int
idx_y
=
threadIdx
.
x
;
idx_y
<
reduce_num
;
idx_y
+=
BlockDim
)
reduce_var
=
reducer
(
reduce_var
,
transformer
(
x
[
idx_x
+
idx_y
]));
reduce_var
=
cub
::
BlockReduce
<
Ty
,
BlockDim
>
(
temp_storage
).
Reduce
(
reduce_var
,
reducer
);
if
(
threadIdx
.
x
==
0
)
{
y
[
blockIdx
.
x
]
=
reduce_var
;
}
}
template
<
typename
Tx
,
typename
Ty
,
typename
ReduceOp
,
typename
TransformOp
,
int
BlockDim
,
int
Rank
,
int
ReduceRank
>
__global__
void
ReduceKernel
(
const
Tx
*
x
,
Ty
*
y
,
ReduceOp
reducer
,
TransformOp
transformer
,
Ty
init
,
int
reduce_num
,
Array
<
int
,
Rank
>
x_strides
,
Array
<
int
,
ReduceRank
>
reduce_dim
,
Array
<
int
,
ReduceRank
>
reduce_strides
,
Array
<
int
,
Rank
-
ReduceRank
>
left_dim
,
Array
<
int
,
Rank
-
ReduceRank
>
left_strides
)
{
__shared__
typename
cub
::
BlockReduce
<
Ty
,
BlockDim
>::
TempStorage
temp_storage
;
Array
<
int
,
Rank
>
sub_index
;
int
left_idx
=
blockIdx
.
x
;
for
(
int
i
=
0
;
i
<
Rank
-
ReduceRank
;
++
i
)
{
sub_index
[
left_dim
[
i
]]
=
left_idx
/
left_strides
[
i
];
left_idx
%=
left_strides
[
i
];
}
int
reduce_idx
=
threadIdx
.
x
;
for
(
int
j
=
0
;
j
<
ReduceRank
;
++
j
)
{
sub_index
[
reduce_dim
[
j
]]
=
reduce_idx
/
reduce_strides
[
j
];
reduce_idx
%=
reduce_strides
[
j
];
}
int
idx_x
=
0
;
for
(
int
k
=
0
;
k
<
Rank
;
++
k
)
idx_x
+=
(
sub_index
[
k
]
*
x_strides
[
k
]);
Ty
reduce_var
=
static_cast
<
Ty
>
(
transformer
(
x
[
idx_x
]));
for
(
int
i
=
threadIdx
.
x
+
BlockDim
;
i
<
reduce_num
;
i
+=
BlockDim
)
{
int
reduce_idx
=
i
;
for
(
int
j
=
0
;
j
<
ReduceRank
;
++
j
)
{
sub_index
[
reduce_dim
[
j
]]
=
reduce_idx
/
reduce_strides
[
j
];
reduce_idx
%=
reduce_strides
[
j
];
}
int
idx_x
=
0
;
for
(
int
k
=
0
;
k
<
Rank
;
++
k
)
idx_x
+=
(
sub_index
[
k
]
*
x_strides
[
k
]);
reduce_var
=
static_cast
<
Ty
>
(
reducer
(
reduce_var
,
transformer
(
x
[
idx_x
])));
}
reduce_var
=
cub
::
BlockReduce
<
Ty
,
BlockDim
>
(
temp_storage
).
Reduce
(
reduce_var
,
reducer
);
if
(
threadIdx
.
x
==
0
)
{
y
[
blockIdx
.
x
]
=
reduce_var
;
}
}
static
inline
std
::
vector
<
int
>
GetStrides
(
const
std
::
vector
<
int
>&
dims
)
{
int
n
=
static_cast
<
int
>
(
dims
.
size
());
if
(
n
==
0
)
return
std
::
vector
<
int
>
();
std
::
vector
<
int
>
strides
(
n
);
strides
.
back
()
=
1
;
for
(
int
i
=
n
-
2
;
i
>=
0
;
--
i
)
{
strides
[
i
]
=
strides
[
i
+
1
]
*
dims
[
i
+
1
];
}
return
strides
;
}
static
inline
std
::
vector
<
int
>
GetStrides
(
const
std
::
vector
<
int
>&
dims
,
const
std
::
vector
<
int
>&
idx
)
{
int
n
=
static_cast
<
int
>
(
idx
.
size
());
if
(
n
==
0
)
return
std
::
vector
<
int
>
();
std
::
vector
<
int
>
strides
(
n
);
strides
.
back
()
=
1
;
for
(
int
i
=
n
-
2
;
i
>=
0
;
--
i
)
{
strides
[
i
]
=
strides
[
i
+
1
]
*
dims
[
idx
[
i
+
1
]];
}
return
strides
;
}
constexpr
int
kMaxBlockDim
=
512
;
static
inline
int
GetDesiredBlockDim
(
int
block_dim
)
{
return
block_dim
>=
kMaxBlockDim
?
kMaxBlockDim
:
(
1
<<
static_cast
<
int
>
(
std
::
log2
(
block_dim
)));
}
template
<
typename
Tx
,
typename
Ty
,
int
BlockDim
,
typename
ReduceOp
,
typename
TransformOp
>
static
void
TensorReduceImpl
(
const
Tx
*
x_data
,
Ty
*
y_data
,
const
platform
::
Place
&
place
,
const
ReduceOp
&
reducer
,
const
TransformOp
&
transformer
,
const
Ty
&
init
,
int
left_num
,
int
reduce_num
,
const
std
::
vector
<
int
>&
x_strides
,
const
std
::
vector
<
int
>&
reduce_dim
,
const
std
::
vector
<
int
>&
reduce_strides
,
const
std
::
vector
<
int
>&
left_dim
,
const
std
::
vector
<
int
>&
left_strides
,
cudaStream_t
stream
)
{
#define CUB_RANK_CASE(i, ...) \
case i: { \
constexpr auto kRank = i; \
switch (reduce_rank) { __VA_ARGS__; } \
} break
#define CUB_REDUCE_RANK_CASE(i, ...) \
case i: { \
constexpr auto kReduceRank = i; \
ReduceKernel<Tx, Ty, ReduceOp, TransformOp, BlockDim, kRank, \
kReduceRank><<<left_num, BlockDim, 0, stream>>>( \
x_data, y_data, reducer, transformer, init, reduce_num, \
Array<int, kRank>::From(x_strides), \
Array<int, kReduceRank>::From(reduce_dim), \
Array<int, kReduceRank>::From(reduce_strides), \
Array<int, kRank - kReduceRank>::From(left_dim), \
Array<int, kRank - kReduceRank>::From(left_strides)); \
} break
int
rank
=
x_strides
.
size
();
int
reduce_rank
=
reduce_strides
.
size
();
if
(
rank
==
reduce_rank
)
{
cub
::
TransformInputIterator
<
Ty
,
TransformOp
,
const
Tx
*>
trans_x
(
x_data
,
transformer
);
size_t
temp_storage_bytes
=
0
;
cub
::
DeviceReduce
::
Reduce
(
nullptr
,
temp_storage_bytes
,
trans_x
,
y_data
,
reduce_num
,
reducer
,
init
,
stream
);
framework
::
Tensor
tmp
;
auto
*
temp_storage
=
tmp
.
mutable_data
<
uint8_t
>
(
framework
::
make_ddim
({
static_cast
<
int64_t
>
(
temp_storage_bytes
)}),
place
);
cub
::
DeviceReduce
::
Reduce
(
temp_storage
,
temp_storage_bytes
,
trans_x
,
y_data
,
reduce_num
,
reducer
,
init
,
stream
);
return
;
}
if
(
rank
==
2
&&
reduce_rank
==
1
&&
reduce_dim
[
0
]
==
1
)
{
ReduceKernel2D
<
Tx
,
Ty
,
ReduceOp
,
TransformOp
,
BlockDim
><<<
left_num
,
BlockDim
,
0
,
stream
>>>
(
x_data
,
y_data
,
reducer
,
transformer
,
init
,
reduce_num
);
return
;
}
/*
if (rank == 3 && reduce_rank == 1 && reduce_dim[0] == 1) {
// TODO(liangdun): we can optimize 3d case which the 2nd axis is reduced.
// Currently, it is handled by code below, but inefficient
return;
}
*/
switch
(
rank
)
{
CUB_RANK_CASE
(
2
,
CUB_REDUCE_RANK_CASE
(
1
););
CUB_RANK_CASE
(
3
,
CUB_REDUCE_RANK_CASE
(
1
);
CUB_REDUCE_RANK_CASE
(
2
););
CUB_RANK_CASE
(
4
,
CUB_REDUCE_RANK_CASE
(
1
);
CUB_REDUCE_RANK_CASE
(
2
);
CUB_REDUCE_RANK_CASE
(
3
););
CUB_RANK_CASE
(
5
,
CUB_REDUCE_RANK_CASE
(
1
);
CUB_REDUCE_RANK_CASE
(
2
);
CUB_REDUCE_RANK_CASE
(
3
);
CUB_REDUCE_RANK_CASE
(
4
););
CUB_RANK_CASE
(
6
,
CUB_REDUCE_RANK_CASE
(
1
);
CUB_REDUCE_RANK_CASE
(
2
);
CUB_REDUCE_RANK_CASE
(
3
);
CUB_REDUCE_RANK_CASE
(
4
);
CUB_REDUCE_RANK_CASE
(
5
););
CUB_RANK_CASE
(
7
,
CUB_REDUCE_RANK_CASE
(
1
);
CUB_REDUCE_RANK_CASE
(
2
);
CUB_REDUCE_RANK_CASE
(
3
);
CUB_REDUCE_RANK_CASE
(
4
);
CUB_REDUCE_RANK_CASE
(
5
);
CUB_REDUCE_RANK_CASE
(
6
););
CUB_RANK_CASE
(
8
,
CUB_REDUCE_RANK_CASE
(
1
);
CUB_REDUCE_RANK_CASE
(
2
);
CUB_REDUCE_RANK_CASE
(
3
);
CUB_REDUCE_RANK_CASE
(
4
);
CUB_REDUCE_RANK_CASE
(
5
);
CUB_REDUCE_RANK_CASE
(
6
););
CUB_RANK_CASE
(
9
,
CUB_REDUCE_RANK_CASE
(
1
);
CUB_REDUCE_RANK_CASE
(
2
);
CUB_REDUCE_RANK_CASE
(
3
);
CUB_REDUCE_RANK_CASE
(
4
);
CUB_REDUCE_RANK_CASE
(
5
);
CUB_REDUCE_RANK_CASE
(
6
);
CUB_REDUCE_RANK_CASE
(
7
);
CUB_REDUCE_RANK_CASE
(
8
););
}
#undef CUB_REDUCE_RANK_CASE
#undef CUB_RANK_CASE
}
}
// namespace detail
template
<
typename
Tx
,
typename
Ty
,
typename
ReduceOp
,
typename
TransformOp
>
void
TensorReduce
(
const
framework
::
Tensor
&
x
,
framework
::
Tensor
*
y
,
std
::
vector
<
int
>
origin_reduce_dims
,
const
Ty
&
init
,
const
ReduceOp
&
reducer
,
const
TransformOp
&
transformer
,
cudaStream_t
stream
)
{
auto
x_dim
=
framework
::
vectorize2int
(
x
.
dims
());
std
::
vector
<
int
>
new_x_dim
,
new_reduce_dims
;
int
is_reduced
=
0
;
for
(
auto
e
:
origin_reduce_dims
)
{
auto
pos
=
e
>=
0
?
e
:
e
+
x_dim
.
size
();
is_reduced
|=
1
<<
e
;
}
for
(
int
i
=
0
;
i
<
x_dim
.
size
();
i
++
)
{
if
((
i
==
0
)
||
(((
is_reduced
>>
i
)
^
(
is_reduced
>>
(
i
-
1
)))
&
1
))
{
new_x_dim
.
push_back
(
x_dim
[
i
]);
if
((
is_reduced
>>
i
)
&
1
)
new_reduce_dims
.
push_back
(
new_x_dim
.
size
()
-
1
);
}
else
{
new_x_dim
[
new_x_dim
.
size
()
-
1
]
*=
x_dim
[
i
];
}
}
x_dim
=
new_x_dim
;
origin_reduce_dims
=
new_reduce_dims
;
int
x_rank
=
static_cast
<
int
>
(
x_dim
.
size
());
std
::
set
<
int
>
left_set
,
reduce_set
;
for
(
int
i
=
0
;
i
<
x_rank
;
++
i
)
left_set
.
insert
(
i
);
for
(
auto
e
:
origin_reduce_dims
)
{
left_set
.
erase
(
e
);
reduce_set
.
insert
(
e
);
}
std
::
vector
<
int
>
reduce_dim
(
reduce_set
.
begin
(),
reduce_set
.
end
());
std
::
vector
<
int
>
left_dim
(
left_set
.
begin
(),
left_set
.
end
());
std
::
vector
<
int
>
x_strides
=
detail
::
GetStrides
(
x_dim
);
std
::
vector
<
int
>
reduce_strides
=
detail
::
GetStrides
(
x_dim
,
reduce_dim
);
std
::
vector
<
int
>
left_strides
=
detail
::
GetStrides
(
x_dim
,
left_dim
);
int
reduce_num
=
reduce_strides
[
0
]
*
x_dim
[
reduce_dim
[
0
]];
int
left_num
=
1
;
if
(
left_dim
.
size
())
left_num
=
left_strides
[
0
]
*
x_dim
[
left_dim
[
0
]];
std
::
vector
<
int
>
y_dim
(
left_dim
.
size
());
for
(
int
i
=
0
;
i
<
left_dim
.
size
();
++
i
)
{
y_dim
[
i
]
=
x_dim
[
left_dim
[
i
]];
}
auto
x_data
=
x
.
data
<
Tx
>
();
auto
y_data
=
y
->
mutable_data
<
Ty
>
(
x
.
place
());
if
(
reduce_num
==
1
)
return
;
#define CUB_BLOCK_DIM_CASE(block_dim) \
case block_dim: { \
constexpr auto kBlockDim = block_dim; \
detail::TensorReduceImpl<Tx, Ty, block_dim, ReduceOp, TransformOp>( \
x_data, y_data, x.place(), reducer, transformer, init, left_num, \
reduce_num, x_strides, reduce_dim, reduce_strides, left_dim, \
left_strides, stream); \
} break
switch
(
detail
::
GetDesiredBlockDim
(
reduce_num
))
{
CUB_BLOCK_DIM_CASE
(
512
);
CUB_BLOCK_DIM_CASE
(
256
);
CUB_BLOCK_DIM_CASE
(
128
);
CUB_BLOCK_DIM_CASE
(
64
);
CUB_BLOCK_DIM_CASE
(
32
);
CUB_BLOCK_DIM_CASE
(
16
);
CUB_BLOCK_DIM_CASE
(
8
);
CUB_BLOCK_DIM_CASE
(
4
);
CUB_BLOCK_DIM_CASE
(
2
);
}
#undef CUB_BLOCK_DIM_CASE
}
}
// namespace operators
}
// namespace paddle
paddle/fluid/operators/detection/roi_perspective_transform_op.cc
浏览文件 @
ea7dc9cb
...
...
@@ -104,7 +104,6 @@ bool in_quad(T x, T y, T roi_x[], T roi_y[]) {
* a31 = (dx3 * dy2 - dx2 * dy3) / (dx1 * dy2 - dx2 * dy1) / (w - 1)
* a32 = (dx1 * dy3 - dx3 * dy1) / (dx1 * dy2 - dx2 * dy1) / (h - 1)
* a33 = 1
*
*/
template
<
typename
T
>
void
get_transform_matrix
(
const
int
transformed_width
,
...
...
@@ -260,8 +259,8 @@ class CPUROIPerspectiveTransformOpKernel : public framework::OpKernel<T> {
roi2image
.
Resize
({
rois_num
});
int
*
roi2image_data
=
roi2image
.
mutable_data
<
int
>
(
ctx
.
GetPlace
());
auto
lod
=
rois
->
lod
().
back
();
for
(
in
t
i
=
0
;
i
<
lod
.
size
()
-
1
;
++
i
)
{
for
(
in
t
j
=
lod
[
i
];
j
<
lod
[
i
+
1
];
++
j
)
{
for
(
size_
t
i
=
0
;
i
<
lod
.
size
()
-
1
;
++
i
)
{
for
(
size_
t
j
=
lod
[
i
];
j
<
lod
[
i
+
1
];
++
j
)
{
roi2image_data
[
j
]
=
i
;
}
}
...
...
@@ -393,8 +392,8 @@ class CPUROIPerspectiveTransformGradOpKernel : public framework::OpKernel<T> {
roi2image
.
Resize
({
rois_num
});
int
*
roi2image_data
=
roi2image
.
mutable_data
<
int
>
(
ctx
.
GetPlace
());
auto
lod
=
rois
->
lod
().
back
();
for
(
in
t
i
=
0
;
i
<
lod
.
size
()
-
1
;
++
i
)
{
for
(
in
t
j
=
lod
[
i
];
j
<
lod
[
i
+
1
];
++
j
)
{
for
(
size_
t
i
=
0
;
i
<
lod
.
size
()
-
1
;
++
i
)
{
for
(
size_
t
j
=
lod
[
i
];
j
<
lod
[
i
+
1
];
++
j
)
{
roi2image_data
[
j
]
=
i
;
}
}
...
...
@@ -404,7 +403,7 @@ class CPUROIPerspectiveTransformGradOpKernel : public framework::OpKernel<T> {
for
(
int
in_h
=
0
;
in_h
<
in_height
;
++
in_h
)
{
for
(
int
in_w
=
0
;
in_w
<
in_width
;
++
in_w
)
{
T
gradient
=
0.0
;
for
(
in
t
roi_idx
=
lod
[
n
];
roi_idx
<
lod
[
n
+
1
];
++
roi_idx
)
{
for
(
size_
t
roi_idx
=
lod
[
n
];
roi_idx
<
lod
[
n
+
1
];
++
roi_idx
)
{
const
T
*
rois
=
rois_data
+
roi_idx
*
8
;
T
roi_x
[
4
];
T
roi_y
[
4
];
...
...
paddle/fluid/operators/detection/roi_perspective_transform_op.cu
浏览文件 @
ea7dc9cb
...
...
@@ -345,8 +345,8 @@ class CUDAROIPerspectiveTransformOpKernel : public framework::OpKernel<T> {
roi2image
.
Resize
({
rois_num
});
int
*
roi2image_data
=
roi2image
.
mutable_data
<
int
>
(
platform
::
CPUPlace
());
auto
lod
=
rois
->
lod
().
back
();
for
(
in
t
i
=
0
;
i
<
lod
.
size
()
-
1
;
++
i
)
{
for
(
in
t
j
=
lod
[
i
];
j
<
lod
[
i
+
1
];
++
j
)
{
for
(
size_
t
i
=
0
;
i
<
lod
.
size
()
-
1
;
++
i
)
{
for
(
size_
t
j
=
lod
[
i
];
j
<
lod
[
i
+
1
];
++
j
)
{
roi2image_data
[
j
]
=
i
;
}
}
...
...
@@ -432,7 +432,7 @@ __global__ void RoiTransformGradKernel(
T
gradient
=
0.0
;
// Accumulate gradient over all RoIs that interpolated this element
for
(
in
t
roi_idx
=
lod
[
n
];
roi_idx
<
lod
[
n
+
1
];
++
roi_idx
)
{
for
(
size_
t
roi_idx
=
lod
[
n
];
roi_idx
<
lod
[
n
+
1
];
++
roi_idx
)
{
const
T
*
rois
=
rois_data
+
roi_idx
*
8
;
T
roi_x
[
4
];
T
roi_y
[
4
];
...
...
paddle/fluid/operators/distributed/grpc_client.h
浏览文件 @
ea7dc9cb
...
...
@@ -15,6 +15,7 @@ limitations under the License. */
#pragma once
#include <time.h>
#include <atomic>
#include <chrono> // NOLINT
#include <condition_variable> // NOLINT
...
...
paddle/fluid/operators/distributed/request_handler.h
浏览文件 @
ea7dc9cb
...
...
@@ -15,6 +15,7 @@
#pragma once
#include <time.h>
#include <condition_variable> // NOLINT
#include <functional>
#include <string>
...
...
paddle/fluid/operators/distributed/rpc_server.h
浏览文件 @
ea7dc9cb
...
...
@@ -14,6 +14,7 @@
#pragma once
#include <atomic>
#include <set>
#include <string>
#include <thread> // NOLINT
...
...
paddle/fluid/operators/elementwise_op.h
浏览文件 @
ea7dc9cb
...
...
@@ -89,7 +89,7 @@ class ElementwiseOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr
<
bool
>
(
"use_mkldnn"
,
"(bool, default false). Used by MKLDNN."
)
.
SetDefault
(
false
);
AddComment
(
string
::
Sprintf
(
R"DOC(
Limited
Elementwise %s Operator
Elementwise %s Operator
The equation is:
...
...
paddle/fluid/operators/fused_embedding_fc_lstm_op.cc
0 → 100644
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/operators/
concurrency/channel_util
.h
→
paddle/fluid/operators/
fused_embedding_fc_lstm_op
.h
浏览文件 @
ea7dc9cb
...
...
@@ -4,7 +4,7 @@ Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
...
...
@@ -13,26 +13,29 @@ See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/variable.h"
#include "paddle/fluid/framework/op_registry.h"
namespace
paddle
{
namespace
operators
{
namespace
concurrency
{
void
ChannelSend
(
framework
::
ChannelHolder
*
ch
,
framework
::
Variable
*
var
);
bool
ChannelReceive
(
framework
::
ChannelHolder
*
ch
,
framework
::
Variable
*
var
);
using
LoDTensor
=
framework
::
LoDTensor
;
using
Tensor
=
framework
::
Tensor
;
class
FusedEmbeddingFCLSTMOp
:
public
framework
::
OperatorWithKernel
{
public:
using
framework
::
OperatorWithKernel
::
OperatorWithKernel
;
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
;
protected:
framework
::
OpKernelType
GetExpectedKernelType
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
;
};
void
ChannelAddToSendQ
(
framework
::
ChannelHolder
*
ch
,
const
void
*
referrer
,
framework
::
Variable
*
var
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
framework
::
ChannelAction
)
>
cb
);
void
ChannelAddToReceiveQ
(
framework
::
ChannelHolder
*
ch
,
const
void
*
referrer
,
framework
::
Variable
*
var
,
std
::
shared_ptr
<
std
::
condition_variable_any
>
cond
,
std
::
function
<
bool
(
framework
::
ChannelAction
)
>
cb
);
class
FusedEmbeddingFCLSTMOpMaker
:
public
framework
::
OpProtoAndCheckerMaker
{
public:
void
Make
()
override
;
};
}
// namespace concurrency
}
// namespace operators
}
// namespace paddle
paddle/fluid/operators/fusion_gru_op.cc
浏览文件 @
ea7dc9cb
...
...
@@ -290,12 +290,13 @@ class FusionGRUKernel : public framework::OpKernel<T> {
void
BatchCompute
(
const
framework
::
ExecutionContext
&
ctx
)
const
{
using
DeviceContext
=
paddle
::
platform
::
CPUDeviceContext
;
auto
*
x
=
ctx
.
Input
<
LoDTensor
>
(
"X"
);
INIT_BASE_INPUT_OUTPUT
INIT_BASE_SIZES
if
(
x
->
lod
()[
0
].
size
()
==
2
)
{
xx
->
Resize
({
total_T
,
D3
});
SeqCompute
(
ctx
);
return
;
}
INIT_BASE_INPUT_OUTPUT
INIT_BASE_SIZES
INIT_VEC_FUNC
auto
*
reordered_h0
=
ctx
.
Output
<
Tensor
>
(
"ReorderedH0"
);
...
...
paddle/fluid/operators/fusion_lstm_op.cc
浏览文件 @
ea7dc9cb
...
...
@@ -432,11 +432,12 @@ class FuisonLSTMKernel : public framework::OpKernel<T> {
void
BatchCompute
(
const
framework
::
ExecutionContext
&
ctx
)
const
{
using
DeviceContext
=
platform
::
CPUDeviceContext
;
INIT_BASE_INPUT_OUTPUT
INIT_BASE_SIZES
if
(
x
->
lod
()[
0
].
size
()
==
2
)
{
xx
->
Resize
({
x_dims
[
0
],
D4
});
SeqCompute
(
ctx
);
return
;
}
INIT_BASE_SIZES
INIT_VEC_FUNC
INIT_BASE_INPUT_DATAS
...
...
paddle/fluid/operators/math/cpu_lstm_compute.cc
浏览文件 @
ea7dc9cb
...
...
@@ -13,6 +13,31 @@ limitations under the License. */
namespace
paddle
{
namespace
operators
{
namespace
math
{}
// namespace math
namespace
math
{
#ifdef __AVX__
template
<
>
void
lstm_compute_ctht
<
float
>
(
float
*
gates
,
const
float
*
ct_1
,
float
*
ct
,
float
*
ht
)
{
namespace
act
=
detail
::
forward
::
avx
;
// gates: W_ch, W_ih, W_fh, W_oh
__m256
c
,
i
,
f
,
o
;
c
=
_mm256_loadu_ps
(
gates
);
i
=
_mm256_loadu_ps
(
gates
+
8
);
f
=
_mm256_loadu_ps
(
gates
+
16
);
o
=
_mm256_loadu_ps
(
gates
+
24
);
/* C_t = C_t-1 * fgated + cand_gated * igated*/
c
=
_mm256_mul_ps
(
act
::
Tanh
(
c
),
act
::
Sigmoid
(
i
));
i
=
_mm256_loadu_ps
(
ct_1
);
f
=
_mm256_mul_ps
(
i
,
act
::
Sigmoid
(
f
));
f
=
_mm256_add_ps
(
c
,
f
);
_mm256_storeu_ps
(
ct
,
f
);
/* H_t = act_cell(C_t) * ogated */
o
=
_mm256_mul_ps
(
act
::
Tanh
(
f
),
act
::
Sigmoid
(
o
));
_mm256_storeu_ps
(
ht
,
o
);
}
#endif
}
// namespace math
}
// namespace operators
}
// namespace paddle
paddle/fluid/operators/math/cpu_lstm_compute.h
浏览文件 @
ea7dc9cb
...
...
@@ -48,32 +48,15 @@ namespace forward {
namespace
avx
{
__m256
Sigmoid
(
const
__m256
a
);
__m256
Tanh
(
const
__m256
a
);
}
// namespace avx
}
// namespace forward
}
// namespace detail
template
<
>
void
lstm_compute_ctht
<
float
>
(
float
*
gates
,
const
float
*
ct_1
,
float
*
ct
,
float
*
ht
)
{
namespace
act
=
detail
::
forward
::
avx
;
// gates: W_ch, W_ih, W_fh, W_oh
__m256
c
,
i
,
f
,
o
;
c
=
_mm256_loadu_ps
(
gates
);
i
=
_mm256_loadu_ps
(
gates
+
8
);
f
=
_mm256_loadu_ps
(
gates
+
16
);
o
=
_mm256_loadu_ps
(
gates
+
24
);
float
*
ht
);
/* C_t = C_t-1 * fgated + cand_gated * igated*/
c
=
_mm256_mul_ps
(
act
::
Tanh
(
c
),
act
::
Sigmoid
(
i
));
i
=
_mm256_loadu_ps
(
ct_1
);
f
=
_mm256_mul_ps
(
i
,
act
::
Sigmoid
(
f
));
f
=
_mm256_add_ps
(
c
,
f
);
_mm256_storeu_ps
(
ct
,
f
);
/* H_t = act_cell(C_t) * ogated */
o
=
_mm256_mul_ps
(
act
::
Tanh
(
f
),
act
::
Sigmoid
(
o
));
_mm256_storeu_ps
(
ht
,
o
);
}
#endif
}
// namespace math
...
...
paddle/fluid/operators/math/depthwise_conv.cu
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/operators/math/depthwise_conv.h
浏览文件 @
ea7dc9cb
...
...
@@ -32,7 +32,8 @@ class DepthwiseConvFunctor {
void
operator
()(
const
DeviceContext
&
context
,
const
framework
::
Tensor
&
input
,
const
framework
::
Tensor
&
filter
,
const
std
::
vector
<
int
>&
strides
,
const
std
::
vector
<
int
>&
paddings
,
framework
::
Tensor
*
output
);
const
std
::
vector
<
int
>&
paddings
,
const
std
::
vector
<
int
>&
dilations
,
framework
::
Tensor
*
output
);
};
template
<
typename
DeviceContext
,
typename
T
>
...
...
@@ -43,6 +44,7 @@ class DepthwiseConvInputGradFunctor {
const
framework
::
Tensor
&
output_grad
,
const
std
::
vector
<
int
>&
strides
,
const
std
::
vector
<
int
>&
paddings
,
const
std
::
vector
<
int
>&
dilations
,
framework
::
Tensor
*
input_grad
);
};
...
...
@@ -53,6 +55,7 @@ class DepthwiseConvFilterGradFunctor {
const
framework
::
Tensor
&
output_grad
,
const
std
::
vector
<
int
>&
strides
,
const
std
::
vector
<
int
>&
paddings
,
const
std
::
vector
<
int
>&
dilations
,
framework
::
Tensor
*
filter_grad
);
};
...
...
paddle/fluid/operators/math/math_function.cc
浏览文件 @
ea7dc9cb
...
...
@@ -13,6 +13,15 @@ See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/math/math_function.h"
#ifdef PADDLE_WITH_MKLML
#include "paddle/fluid/platform/dynload/mklml.h"
#endif
#ifdef PADDLE_USE_OPENBLAS
#include <cblas.h>
#endif
#include <vector>
#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/operators/math/math_function_impl.h"
...
...
paddle/fluid/operators/math/math_function.h
浏览文件 @
ea7dc9cb
...
...
@@ -13,18 +13,6 @@ See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#ifdef PADDLE_WITH_MKLML
#include "paddle/fluid/platform/dynload/mklml.h"
#endif
#ifdef PADDLE_USE_OPENBLAS
#include <cblas.h>
// remove typedef in openblas
#undef FLOAT
#undef INT
#undef SIZE
#endif
#include <cmath>
#include <vector>
...
...
paddle/fluid/operators/reduce_mean_op.cu
浏览文件 @
ea7dc9cb
...
...
@@ -12,17 +12,64 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include <vector>
#include "paddle/fluid/operators/cub_reduce.h"
#include "paddle/fluid/operators/reduce_mean_op.h"
REGISTER_OP_CUDA_KERNEL
(
reduce_mean
,
ops
::
ReduceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
,
ops
::
MeanFunctor
>
,
ops
::
ReduceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
double
,
ops
::
MeanFunctor
>
,
ops
::
ReduceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int
,
ops
::
MeanFunctor
>
,
ops
::
ReduceKernel
<
paddle
::
platform
::
CUDADeviceContext
,
int64_t
,
ops
::
MeanFunctor
>
);
namespace
paddle
{
namespace
operators
{
template
<
typename
T
>
struct
DivideFunctor
{
HOSTDEVICE
explicit
inline
DivideFunctor
(
int
n
)
:
n_inv
((
T
)(
1.0
/
n
))
{}
HOSTDEVICE
inline
T
operator
()(
const
T
&
x
)
const
{
return
x
*
n_inv
;
}
private:
T
n_inv
;
};
template
<
typename
T
>
class
ReduceMeanKernel
:
public
framework
::
OpKernel
<
T
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
context
)
const
override
{
bool
reduce_all
=
context
.
Attr
<
bool
>
(
"reduce_all"
);
auto
*
input
=
context
.
Input
<
Tensor
>
(
"X"
);
auto
*
output
=
context
.
Output
<
Tensor
>
(
"Out"
);
auto
dims
=
context
.
Attr
<
std
::
vector
<
int
>>
(
"dim"
);
bool
keep_dim
=
context
.
Attr
<
bool
>
(
"keep_dim"
);
std
::
vector
<
int
>
reduce_dims
;
if
(
reduce_all
)
{
reduce_dims
.
resize
(
input
->
dims
().
size
());
for
(
int
i
=
0
;
i
<
reduce_dims
.
size
();
++
i
)
reduce_dims
[
i
]
=
i
;
}
else
{
for
(
auto
e
:
dims
)
{
reduce_dims
.
push_back
(
e
>=
0
?
e
:
e
+
input
->
dims
().
size
());
}
}
int
reduce_num
=
1
;
for
(
int
i
=
0
;
i
<
reduce_dims
.
size
();
++
i
)
{
reduce_num
*=
input
->
dims
()[
reduce_dims
[
i
]];
}
auto
stream
=
context
.
cuda_device_context
().
stream
();
TensorReduce
<
T
,
T
,
cub
::
Sum
,
DivideFunctor
<
T
>>
(
*
input
,
output
,
reduce_dims
,
static_cast
<
T
>
(
0
),
cub
::
Sum
(),
DivideFunctor
<
T
>
(
reduce_num
),
stream
);
}
};
}
// namespace operators
}
// namespace paddle
REGISTER_OP_CUDA_KERNEL
(
reduce_mean
,
ops
::
ReduceMeanKernel
<
float
>
,
ops
::
ReduceMeanKernel
<
double
>
,
ops
::
ReduceMeanKernel
<
int
>
,
ops
::
ReduceMeanKernel
<
int64_t
>
);
REGISTER_OP_CUDA_KERNEL
(
reduce_mean_grad
,
ops
::
ReduceGradKernel
<
paddle
::
platform
::
CUDADeviceContext
,
float
,
ops
::
MeanGradFunctor
>
,
...
...
paddle/fluid/operators/reduce_sum_op.cu
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/operators/select_op.cc
已删除
100644 → 0
浏览文件 @
2513b2cc
此差异已折叠。
点击以展开。
paddle/fluid/operators/sequence_erase_op.cc
浏览文件 @
ea7dc9cb
...
...
@@ -24,9 +24,9 @@ class SequenceEraseOp : public framework::OperatorWithKernel {
void
InferShape
(
framework
::
InferShapeContext
*
ctx
)
const
override
{
PADDLE_ENFORCE
(
ctx
->
HasInput
(
"X"
),
"Input(X) of SequenceErase
Op
should not be null."
);
"Input(X) of SequenceErase
operator
should not be null."
);
PADDLE_ENFORCE
(
ctx
->
HasOutput
(
"Out"
),
"Output(Out) of SequenceErase
Op
should not be null."
);
"Output(Out) of SequenceErase
operator
should not be null."
);
auto
x_dims
=
ctx
->
GetInputDim
(
"X"
);
PADDLE_ENFORCE
(
x_dims
.
size
()
==
2
&&
x_dims
[
1
]
==
1
,
"Input(X) of SequenceEraseOp should be a 2-D LoDTensor "
...
...
paddle/fluid/operators/tensorrt_engine_op.h
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/operators/top_k_op.cc
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/platform/dynload/cublas.h
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/platform/dynload/cudnn.h
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/platform/dynload/curand.h
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/platform/dynload/dynamic_loader.cc
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/platform/gpu_info.cc
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/pybind/const_value.cc
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/pybind/protobuf.cc
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/pybind/pybind.cc
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/fluid/train/CMakeLists.txt
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/legacy/trainer/tests/CMakeLists.txt
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
paddle/scripts/paddle_build.sh
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/clip.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/concurrency.py
已删除
100644 → 0
浏览文件 @
2513b2cc
此差异已折叠。
点击以展开。
python/paddle/fluid/contrib/tests/test_quantize_transpiler.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/framework.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/layers/control_flow.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/layers/detection.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/layers/nn.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/layers/ops.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/nets.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/CMakeLists.txt
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/book/high-level-api/recognize_digits/CMakeLists.txt
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/no_test_concurrency.py
已删除
100644 → 0
浏览文件 @
2513b2cc
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/notest_concurrency.py
已删除
100644 → 0
浏览文件 @
2513b2cc
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/CMakeLists.txt
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/dist_se_resnext.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_conv2d_op.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_dist_base.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_dist_ctr.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_dist_simnet_bow.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_dist_text_classification.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_layers.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/tests/unittests/test_operator_desc.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
python/paddle/fluid/transpiler/inference_transpiler.py
浏览文件 @
ea7dc9cb
此差异已折叠。
点击以展开。
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录