Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
BaiXuePrincess
Paddle
提交
5e0bc130
P
Paddle
项目概览
BaiXuePrincess
/
Paddle
与 Fork 源项目一致
Fork自
PaddlePaddle / Paddle
通知
1
Star
1
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
0
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
Paddle
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
0
Issue
0
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
提交
5e0bc130
编写于
11月 23, 2018
作者:
Z
ZhenWang
浏览文件
操作
浏览文件
下载
差异文件
Merge branch 'develop' of
https://github.com/PaddlePaddle/Paddle
into add-code-exp
上级
43b9202d
64ca3d17
变更
45
隐藏空白更改
内联
并排
Showing
45 changed file
with
1480 addition
and
254 deletion
+1480
-254
.github/ISSUE_TEMPLATE/---feature-request-.md
.github/ISSUE_TEMPLATE/---feature-request-.md
+27
-0
.github/ISSUE_TEMPLATE/---inference-issue-.md
.github/ISSUE_TEMPLATE/---inference-issue-.md
+40
-0
.github/ISSUE_TEMPLATE/---installation-issue-.md
.github/ISSUE_TEMPLATE/---installation-issue-.md
+40
-0
.github/ISSUE_TEMPLATE/---model-issue-.md
.github/ISSUE_TEMPLATE/---model-issue-.md
+36
-0
.github/ISSUE_TEMPLATE/---others-.md
.github/ISSUE_TEMPLATE/---others-.md
+33
-0
.github/ISSUE_TEMPLATE/---training-issue-.md
.github/ISSUE_TEMPLATE/---training-issue-.md
+38
-0
CMakeLists.txt
CMakeLists.txt
+2
-0
cmake/external/dlpack.cmake
cmake/external/dlpack.cmake
+31
-0
cmake/external/eigen.cmake
cmake/external/eigen.cmake
+1
-1
cmake/external/mkldnn.cmake
cmake/external/mkldnn.cmake
+1
-1
cmake/external/rocprim.cmake
cmake/external/rocprim.cmake
+44
-0
cmake/flags.cmake
cmake/flags.cmake
+3
-0
cmake/generic.cmake
cmake/generic.cmake
+15
-11
cmake/hip.cmake
cmake/hip.cmake
+27
-5
paddle/fluid/API.spec
paddle/fluid/API.spec
+2
-2
paddle/fluid/framework/CMakeLists.txt
paddle/fluid/framework/CMakeLists.txt
+3
-0
paddle/fluid/framework/dlpack_tensor.cc
paddle/fluid/framework/dlpack_tensor.cc
+127
-0
paddle/fluid/framework/dlpack_tensor.h
paddle/fluid/framework/dlpack_tensor.h
+45
-0
paddle/fluid/framework/dlpack_tensor_test.cc
paddle/fluid/framework/dlpack_tensor_test.cc
+113
-0
paddle/fluid/framework/op_desc.cc
paddle/fluid/framework/op_desc.cc
+6
-0
paddle/fluid/inference/tensorrt/convert/split_op.cc
paddle/fluid/inference/tensorrt/convert/split_op.cc
+2
-10
paddle/fluid/inference/tensorrt/convert/test_split_op.cc
paddle/fluid/inference/tensorrt/convert/test_split_op.cc
+75
-13
paddle/fluid/inference/tensorrt/plugin/split_op_plugin.cu
paddle/fluid/inference/tensorrt/plugin/split_op_plugin.cu
+127
-31
paddle/fluid/inference/tensorrt/plugin/split_op_plugin.h
paddle/fluid/inference/tensorrt/plugin/split_op_plugin.h
+7
-2
paddle/fluid/operators/detection/CMakeLists.txt
paddle/fluid/operators/detection/CMakeLists.txt
+1
-1
paddle/fluid/operators/detection/density_prior_box_op.cc
paddle/fluid/operators/detection/density_prior_box_op.cc
+21
-15
paddle/fluid/operators/detection/density_prior_box_op.cu
paddle/fluid/operators/detection/density_prior_box_op.cu
+170
-0
paddle/fluid/operators/detection/density_prior_box_op.h
paddle/fluid/operators/detection/density_prior_box_op.h
+35
-38
paddle/fluid/operators/math/blas_impl.cu.h
paddle/fluid/operators/math/blas_impl.cu.h
+172
-34
paddle/fluid/platform/device_context.h
paddle/fluid/platform/device_context.h
+47
-0
paddle/fluid/platform/dynload/cublas.h
paddle/fluid/platform/dynload/cublas.h
+12
-14
paddle/fluid/platform/gpu_info.cc
paddle/fluid/platform/gpu_info.cc
+20
-0
paddle/fluid/platform/gpu_info.h
paddle/fluid/platform/gpu_info.h
+3
-0
paddle/fluid/pybind/CMakeLists.txt
paddle/fluid/pybind/CMakeLists.txt
+2
-2
paddle/scripts/paddle_build.sh
paddle/scripts/paddle_build.sh
+37
-1
paddle/testing/paddle_gtest_main.cc
paddle/testing/paddle_gtest_main.cc
+1
-1
python/paddle/fluid/__init__.py
python/paddle/fluid/__init__.py
+2
-1
python/paddle/fluid/layers/detection.py
python/paddle/fluid/layers/detection.py
+20
-23
python/paddle/fluid/layers/nn.py
python/paddle/fluid/layers/nn.py
+9
-4
python/paddle/fluid/nets.py
python/paddle/fluid/nets.py
+8
-1
python/paddle/fluid/tests/test_detection.py
python/paddle/fluid/tests/test_detection.py
+32
-28
python/paddle/fluid/tests/unittests/test_density_prior_box_op.py
...paddle/fluid/tests/unittests/test_density_prior_box_op.py
+19
-11
python/paddle/fluid/tests/unittests/test_layers.py
python/paddle/fluid/tests/unittests/test_layers.py
+11
-0
tools/manylinux1/Dockerfile.x64
tools/manylinux1/Dockerfile.x64
+1
-1
tools/manylinux1/build_scripts/build_utils.sh
tools/manylinux1/build_scripts/build_utils.sh
+12
-3
未找到文件。
.github/ISSUE_TEMPLATE/---feature-request-.md
0 → 100644
浏览文件 @
5e0bc130
---
name
:
建议(Feature request)
about
:
您可以提出您的建议。 You could use this template for reporting a suggestion issue.
---
欢迎您对PaddlePaddle提出建议,非常感谢您对PaddlePaddle的贡献!
在留下您的建议时,辛苦您同步提供如下信息:
-
版本、环境信息
1)PaddlePaddle版本:请提供您的PaddlePaddle版本号,例如1.1
2)CPU/GPU:您是否使用GPU进行训练,如是,请提供您的CUDA和cuDNN版本号
3)系统环境:请您描述系统类型、版本,例如Mac OS 10.14
-
复现信息:如为报错,请给出复现环境、复现步骤
-
建议描述:请您详细描述,您认为需优化的功能
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search issue in the github in case that there was a similar issue submitted or resolved before.
Please make sure that this is a feature request.
**System information**
-PaddlePaddle version (eg.1.1)or CommitID
-CPU: including CPUMKL/OpenBlas/MKLDNN version
-GPU: including CUDA/CUDNN version
-OS Platform (eg.Mac OS 10.14)
**To Reproduce**
Steps to reproduce the behavior
**Describe the feature and the current behavior/state.**
**Any Other info.**
.github/ISSUE_TEMPLATE/---inference-issue-.md
0 → 100644
浏览文件 @
5e0bc130
---
name
:
预测(Inference Issue)
about
:
您可以提问预测中报错、应用等问题。 You could use this template for reporting an inference issue.
---
为使您的问题得到快速解决,在建立Issue前,请您先通过如下方式搜索是否有相似问题:【搜索issue关键字】【使用labels筛选】【官方文档】
如果您没有查询到相似问题,为快速解决您的提问,建立issue时请提供如下细节信息:
-
标题:简洁、精准描述您的问题,例如“最新预测库的API文档在哪儿 ”
-
版本、环境信息:
1)PaddlePaddle版本:请提供您的PaddlePaddle版本号(如1.1)或CommitID
2)CPU:预测若用CPU,请提供CPU型号,MKL/OpenBlas/MKLDNN/等数学库使用情况
3)GPU:预测若用GPU,请提供GPU型号、CUDA和CUDNN版本号
4)系统环境:请您描述系统类型、版本(如Mac OS 10.14),Python版本
-预测信息
1)C++预测:请您提供预测库安装包的版本信息,及其中的version.txt文件
2)CMake包含路径的完整命令
3)API信息(如调用请提供)
4)预测库来源:官网下载/特殊环境(如BCLOUD编译)
-
复现信息:如为报错,请给出复现环境、复现步骤
-
问题描述:请详细描述您的问题,同步贴出报错信息、日志/代码关键片段
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search issue in the github in case that th
If there is no solution,please make sure that this is an inference issue including the following details :
**System information**
-PaddlePaddle version (eg.1.1)or CommitID
-CPU: including CPUMKL/OpenBlas/MKLDNN version
-GPU: including CUDA/CUDNN version
-OS Platform (eg.Mac OS 10.14)
-Python version
-Cmake orders
-C++version.txt
-API information
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
.github/ISSUE_TEMPLATE/---installation-issue-.md
0 → 100644
浏览文件 @
5e0bc130
---
name
:
安装(Installation Issue)
about
:
您可以提问安装、编译出现报错等问题。 You could use this template for reporting an installation
issue.
---
为使您的问题得到快速解决,在建立Issue前,请您先通过如下方式搜索是否有相似问题:【搜索issue关键字】【使用labels筛选】【官方文档】
建立issue时,为快速解决问题,请您根据使用情况给出如下信息:
-
标题:请包含关键词“安装错误”/“编译错误”,例如“Mac编译错误”
-
版本、环境信息:
1)PaddlePaddle版本:请提供您的PaddlePaddle版本号(如1.1)或CommitID
2)CPU:请提供CPU型号,MKL/OpenBlas/MKLDNN/等数学库的使用情况
3)GPU:请提供GPU型号,CUDA和CUDNN版本号
4)系统环境:请说明系统类型、版本(如Mac OS 10.14)、Python版本
-
安装方式信息:
1)pip安装/docker安装
2)本地编译:请提供cmake命令,编译命令
3)docker编译:请提供docker镜像,编译命令
特殊环境请注明:如离线安装等
-
复现信息:如为报错,请给出复现环境、复现步骤
-
问题描述:请详细描述您的问题,同步贴出报错信息、日志/代码关键片段
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search issue in Github in case that there was a similar issue submitted or resolved before.
If there is no solution,please make sure that this is an installation issue including the following details:
**System information**
-PaddlePaddle version (eg.1.1)or CommitID
-CPU: including CPUMKL/OpenBlas/MKLDNN version
-GPU: including CUDA/CUDNN version
-OS Platform (eg. Mac OS 10.14)
-Python version
-
Install method: pip install/install with docker/build from source(without docker)/build within docker
-
Other special cases that you think may be related to this problem, eg. offline install, special internet condition
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
.github/ISSUE_TEMPLATE/---model-issue-.md
0 → 100644
浏览文件 @
5e0bc130
---
name
:
模型(Model Issue)
about
:
您可以提问模型、算法、数据集方向的使用报错等问题。You could use this template for reporting a model/
algorithm/dataset issue.
---
为使您的问题得到快速解决,在建立Issue前,请您先通过如下方式搜索是否有相似问题:【搜索issue关键字】【使用labels筛选】【官方文档】
建立issue时,为快速解决问题,请您根据使用情况给出如下信息:
-
标题:简洁、精准描述您的问题,例如“ssd 模型前置lstm报错 ”
-
版本、环境信息:
1)PaddlePaddle版本:请提供PaddlePaddle版本号,例如1.1或CommitID
2)CPU:请提供CPU型号,MKL/OpenBlas/MKLDNN/等数学库的使用情况
3)GPU:请提供GPU型号,CUDA和CUDNN版本号
4)系统环境:请说明系统类型、版本(例如Mac OS 10.14),Python版本
-
模型信息
1)模型名称 2)使用数据集名称 3)使用算法名称 4)模型链接
-
复现信息:如为报错,请给出复现环境、复现步骤
-
问题描述:请详细描述您的问题,同步贴出报错信息、日志/代码关键片段
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search issue in the github.Probably there was a similar issue submitted or resolved before.
If there is no solution,please make sure that this is a issue of models including the following details:
**System information**
-PaddlePaddle version (eg.1.1)or CommitID
-CPU: including CPUMKL/OpenBlas/MKLDNN version
-GPU: including CUDA/CUDNN version
-OS Platform (eg.Mac OS 10.14)
-Python version
-Name of Models&Dataset/details of operator
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
.github/ISSUE_TEMPLATE/---others-.md
0 → 100644
浏览文件 @
5e0bc130
---
name
:
其他(Others)
about
:
如上述分类未包含您的问题,可在此提出。 You could use this template for reporting other issues
---
为使您的问题得到快速解决,在建立Issues前,请您先通过如下方式搜索是否有相似问题:【搜索issue关键字】【使用labels筛选】【官方文档】
如果您没有查询到相似问题,为快速解决您的提问,建立issue时请提供如下细节信息:
-
标题:简洁、精准概括您的问题
-
版本、环境信息:
1)PaddlePaddle版本:请提供您的PaddlePaddle版本号,例如1.1或CommitID
2)CPU/GPU:如果您使用GPU训练,请提供GPU驱动版本、CUDA和cuDNN版本号
3)系统环境:请您描述系统类型、版本,例如Mac OS 10.14
4)Python版本号
5)显存信息
-
复现信息:如为报错,请给出复现环境、复现步骤
-
问题描述:请详细描述您的问题,同步贴出报错信息、日志/代码关键片段
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search issue in the github in case that there was a similar issue submitted or resolved before.
If there is no solution,please provide us with the following details :
**System information**
-PaddlePaddle version (eg.1.1)or CommitID
-CPU: including CPUMKL/OpenBlas/MKLDNN version
-GPU: including CUDA/cuDNN version
-OS Platform and Distribution(eg.Mac OS 10.14)
-Python version
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
.github/ISSUE_TEMPLATE/---training-issue-.md
0 → 100644
浏览文件 @
5e0bc130
---
name
:
训练(Training issue)
about
:
您可以提问训练中报错、应用、出core等问题。 You could use this template for reporting an training
issue.
---
为使您的问题得到快速解决,在建立Issues前,请您先通过如下方式搜索是否有相似问题:【搜索issue关键字】【使用labels筛选】【官方文档】
如果您没有查询到相似问题,为快速解决您的提问,建立issue时请提供如下细节信息:
-
标题:简洁、精准概括您的问题,例如“Insufficient Memory xxx" ”
-
版本、环境信息:
1)PaddlePaddle版本:请提供您的PaddlePaddle版本号,例如1.1或CommitID
2)CPU:预测若用CPU,请提供CPU型号,MKL/OpenBlas/MKLDNN/等数学库使用情况
3)GPU:预测若用GPU,请提供GPU型号、CUDA和CUDNN版本号
4)系统环境:请您描述系统类型、版本,例如Mac OS 10.14,Python版本
-
训练信息
1)单机/多机,单卡/多卡
2)显存信息
3)Operator信息
-
复现信息:如为报错,请给出复现环境、复现步骤
-
问题描述:请详细描述您的问题,同步贴出报错信息、日志、可复现的代码片段
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search issue in the github in case that there was a similar issue submitted or resolved before.
If there is no solution,please make sure that this is a training issue including the following details:
**System information**
-PaddlePaddle version (eg.1.1)or CommitID
-CPU: including CPUMKL/OpenBlas/MKLDNN version
-GPU: including CUDA/CUDNN version
-OS Platform (eg.Mac OS 10.14)
-Other imformation: Distriuted training/informantion of operator/
Graphics card storage
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
CMakeLists.txt
浏览文件 @
5e0bc130
...
...
@@ -204,7 +204,9 @@ include(external/eigen) # download eigen3
include
(
external/pybind11
)
# download pybind11
include
(
external/cares
)
include
(
external/cub
)
include
(
external/rocprim
)
include
(
external/xxhash
)
# download xxhash
include
(
external/dlpack
)
include
(
external/snappy
)
# download snappy
include
(
external/snappystream
)
# download snappystream
...
...
cmake/external/dlpack.cmake
0 → 100644
浏览文件 @
5e0bc130
include
(
ExternalProject
)
set
(
DLPACK_SOURCE_DIR
${
THIRD_PARTY_PATH
}
/dlpack
)
set
(
DLPACK_INCLUDE_DIR
${
DLPACK_SOURCE_DIR
}
/src/extern_dlpack/include
)
include_directories
(
${
DLPACK_INCLUDE_DIR
}
)
ExternalProject_Add
(
extern_dlpack
${
EXTERNAL_PROJECT_LOG_ARGS
}
GIT_REPOSITORY
"https://github.com/dmlc/dlpack.git"
GIT_TAG
"v0.2"
PREFIX
${
DLPACK_SOURCE_DIR
}
UPDATE_COMMAND
""
CONFIGURE_COMMAND
""
BUILD_COMMAND
""
INSTALL_COMMAND
""
TEST_COMMAND
""
)
if
(
${
CMAKE_VERSION
}
VERSION_LESS
"3.3.0"
)
set
(
dummyfile
${
CMAKE_CURRENT_BINARY_DIR
}
/dlpack_dummy.c
)
file
(
WRITE
${
dummyfile
}
"const char *dummy =
\"
${
dummyfile
}
\"
;"
)
add_library
(
dlpack STATIC
${
dummyfile
}
)
else
()
add_library
(
dlpack INTERFACE
)
endif
()
add_dependencies
(
dlpack extern_dlpack
)
LIST
(
APPEND externl_project_dependencies dlpack
)
cmake/external/eigen.cmake
浏览文件 @
5e0bc130
...
...
@@ -17,7 +17,7 @@ if(WITH_AMD_GPU)
extern_eigen3
${
EXTERNAL_PROJECT_LOG_ARGS
}
GIT_REPOSITORY
"https://github.com/sabreshao/hipeigen.git"
GIT_TAG
0cba03ff9f8f9f70bbd92ac5857b031aa8fed6f9
GIT_TAG
7cb2b6e5a4b4a1efe658abb215cd866c6fb2275e
PREFIX
${
EIGEN_SOURCE_DIR
}
UPDATE_COMMAND
""
CONFIGURE_COMMAND
""
...
...
cmake/external/mkldnn.cmake
浏览文件 @
5e0bc130
...
...
@@ -53,7 +53,7 @@ ExternalProject_Add(
${
EXTERNAL_PROJECT_LOG_ARGS
}
DEPENDS
${
MKLDNN_DEPENDS
}
GIT_REPOSITORY
"https://github.com/01org/mkl-dnn.git"
GIT_TAG
"
21fb5f2af1dd14e132af4f1b79160977ee487818
"
GIT_TAG
"
830a10059a018cd2634d94195140cf2d8790a75a
"
PREFIX
${
MKLDNN_SOURCES_DIR
}
UPDATE_COMMAND
""
CMAKE_ARGS -DCMAKE_CXX_COMPILER=
${
CMAKE_CXX_COMPILER
}
...
...
cmake/external/rocprim.cmake
0 → 100644
浏览文件 @
5e0bc130
if
(
NOT WITH_AMD_GPU
)
return
()
endif
()
# rocprim is "ROCm Parallel Primitives" for short.
# It is a header-only library providing HIP and HC parallel primitives
# for developing performant GPU-accelerated code on AMD ROCm platform.
if
(
"x
${
HCC_HOME
}
"
STREQUAL
"x"
)
set
(
HCC_HOME
"/opt/rocm/hcc"
)
endif
()
INCLUDE
(
ExternalProject
)
SET
(
ROCPRIM_SOURCE_DIR
${
THIRD_PARTY_PATH
}
/rocprim
)
SET
(
ROCPRIM_INSTALL_DIR
${
THIRD_PARTY_PATH
}
/install/rocprim
)
SET
(
ROCPRIM_INCLUDE_DIR
${
ROCPRIM_INSTALL_DIR
}
/include
)
ExternalProject_Add
(
extern_rocprim
GIT_REPOSITORY
"https://github.com/ROCmSoftwarePlatform/rocPRIM.git"
GIT_TAG 5bd41b96ab8d8343330fb2c3e1b96775bde3b3fc
PREFIX
${
ROCPRIM_SOURCE_DIR
}
UPDATE_COMMAND
""
CMAKE_ARGS -DCMAKE_CXX_COMPILER=
${
HCC_HOME
}
/bin/hcc
CMAKE_ARGS -DONLY_INSTALL=ON
CMAKE_ARGS -DBUILD_TEST=OFF
CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=
${
ROCPRIM_INSTALL_DIR
}
INSTALL_DIR
${
ROCPRIM_INSTALL_DIR
}
${
EXTERNAL_PROJECT_LOG_ARGS
}
)
INCLUDE_DIRECTORIES
(
${
ROCPRIM_INCLUDE_DIR
}
)
if
(
${
CMAKE_VERSION
}
VERSION_LESS
"3.3.0"
)
set
(
dummyfile
${
CMAKE_CURRENT_BINARY_DIR
}
/rocprim_dummy.c
)
file
(
WRITE
${
dummyfile
}
"const char *dummy_rocprim =
\"
${
dummyfile
}
\"
;"
)
add_library
(
rocprim STATIC
${
dummyfile
}
)
else
()
add_library
(
rocprim INTERFACE
)
endif
()
add_dependencies
(
rocprim extern_rocprim
)
cmake/flags.cmake
浏览文件 @
5e0bc130
...
...
@@ -129,6 +129,9 @@ set(COMMON_FLAGS
-Wno-error=parentheses-equality
# Warnings in pybind11
-Wno-error=ignored-attributes
# Warnings in Eigen, gcc 6.3
-Wno-error=terminate
# Warning in PADDLE_ENFORCE
-Wno-error=int-in-bool-context
# Warning in Eigen gcc 7.2
-Wimplicit-fallthrough=0
# Warning in tinyformat.h
-Wno-error=maybe-uninitialized
# Warning in boost gcc 7.2
)
set
(
GPU_COMMON_FLAGS
...
...
cmake/generic.cmake
浏览文件 @
5e0bc130
...
...
@@ -454,25 +454,29 @@ function(hip_library TARGET_NAME)
else
()
add_library
(
${
TARGET_NAME
}
STATIC
${
_cmake_options
}
${
_generated_files
}
${
_sources
}
)
set_target_properties
(
${
TARGET_NAME
}
PROPERTIES LINKER_LANGUAGE CXX
)
target_link_libraries
(
${
TARGET_NAME
}
/opt/rocm/hip/lib/libhip_hcc.so /opt/rocm/hip/lib/libhip_device.a
)
find_fluid_modules
(
${
TARGET_NAME
}
)
target_link_libraries
(
${
TARGET_NAME
}
/opt/rocm/hip/lib/libhip_hcc.so /opt/rocm/hip/lib/libhip_device.a
/opt/rocm/rccl/lib/librccl.so /opt/rocm/hiprand/lib/libhiprand.so
)
find_fluid_modules
(
${
TARGET_NAME
}
)
endif
()
if
(
hip_library_DEPS
)
add_dependencies
(
${
TARGET_NAME
}
${
hip_library_DEPS
}
)
target_link_libraries
(
${
TARGET_NAME
}
${
hip_library_DEPS
}
)
if
(
"
${
hip_library_DEPS
}
"
MATCHES
"ARCHIVE_START"
)
# Support linking flags: --whole-archive (Linux) / -force_load (MacOS).
# WARNING: Please don't use ARCHIVE_START&ARCHIVE_END if TARGET_NAME will be linked by other libraries.
target_circle_link_libraries
(
${
TARGET_NAME
}
${
hip_library_DEPS
}
)
list
(
REMOVE_ITEM hip_library_DEPS ARCHIVE_START ARCHIVE_END
)
else
()
target_link_libraries
(
${
TARGET_NAME
}
${
hip_library_DEPS
}
)
endif
()
# cpplint code style
foreach
(
source_file
${
hip_library_SRCS
}
)
string
(
REGEX REPLACE
"
\\
.[^.]*$"
""
source
${
source_file
}
)
if
(
EXISTS
${
CMAKE_CURRENT_SOURCE_DIR
}
/
${
source
}
.h
)
list
(
APPEND hip_library_HEADERS
${
CMAKE_CURRENT_SOURCE_DIR
}
/
${
source
}
.h
)
endif
()
string
(
REGEX REPLACE
"
\\
.[^.]*$"
""
source
${
source_file
}
)
if
(
EXISTS
${
CMAKE_CURRENT_SOURCE_DIR
}
/
${
source
}
.h
)
list
(
APPEND hip_library_HEADERS
${
CMAKE_CURRENT_SOURCE_DIR
}
/
${
source
}
.h
)
endif
()
endforeach
()
else
(
hip_library_SRCS
)
if
(
hip_library_DEPS
)
merge_static_libs
(
${
TARGET_NAME
}
${
hip_library_DEPS
}
)
merge_static_libs
(
${
TARGET_NAME
}
${
hip_library_DEPS
}
)
else
()
message
(
FATAL
"Please specify source file or library in nv_library."
)
message
(
FATAL
"Please specify source file or library in nv_library."
)
endif
()
endif
(
hip_library_SRCS
)
endif
()
...
...
cmake/hip.cmake
浏览文件 @
5e0bc130
...
...
@@ -3,6 +3,8 @@ if(NOT WITH_AMD_GPU)
endif
()
include_directories
(
"/opt/rocm/include"
)
include_directories
(
"/opt/rocm/hip/include"
)
include_directories
(
"/opt/rocm/miopen/include"
)
include_directories
(
"/opt/rocm/hipblas/include"
)
include_directories
(
"/opt/rocm/hiprand/include"
)
include_directories
(
"/opt/rocm/rocrand/include"
)
...
...
@@ -11,20 +13,40 @@ include_directories("/opt/rocm/thrust")
list
(
APPEND EXTERNAL_LIBS
"-L/opt/rocm/lib/ -lhip_hcc"
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-fPIC -DPADDLE_WITH_HIP -std=c++1
4
"
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-fPIC -DPADDLE_WITH_HIP -std=c++1
1
"
)
if
(
WITH_DSO
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-DPADDLE_USE_DSO"
)
endif
(
WITH_DSO
)
if
(
WITH_DOUBLE
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-DPADDLE_TYPE_DOUBLE"
)
endif
(
WITH_DOUBLE
)
if
(
WITH_TESTING
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-DPADDLE_WITH_TESTING"
)
endif
(
WITH_TESTING
)
if
(
WITH_DISTRIBUTE
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-DPADDLE_WITH_DISTRIBUTE"
)
endif
(
WITH_DISTRIBUTE
)
if
(
WITH_GRPC
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-DPADDLE_WITH_GRPC"
)
endif
(
WITH_GRPC
)
if
(
NOT WITH_GOLANG
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-DPADDLE_WITHOUT_GOLANG"
)
endif
(
NOT WITH_GOLANG
)
if
(
WITH_MKLDNN
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-DPADDLE_WITH_MKLDNN"
)
endif
(
WITH_MKLDNN
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-DANY_IMPL_ANY_CAST_MOVEABLE"
)
if
(
NOT WITH_RDMA
)
set
(
HIP_HCC_FLAGS
"
${
HIP_HCC_FLAGS
}
-DPADDLE_DISABLE_RDMA"
)
endif
(
NOT WITH_RDMA
)
if
(
CMAKE_BUILD_TYPE STREQUAL
"Debug"
)
list
(
APPEND HIP_HCC_FLAGS
${
CMAKE_CXX_FLAGS_DEBUG
}
)
elseif
(
CMAKE_BUILD_TYPE STREQUAL
"RelWithDebInfo"
)
...
...
paddle/fluid/API.spec
浏览文件 @
5e0bc130
...
...
@@ -276,7 +276,7 @@ paddle.fluid.layers.hard_shrink ArgSpec(args=['x', 'threshold'], varargs=None, k
paddle.fluid.layers.cumsum ArgSpec(args=['x', 'axis', 'exclusive', 'reverse'], varargs=None, keywords=None, defaults=(None, None, None))
paddle.fluid.layers.thresholded_relu ArgSpec(args=['x', 'threshold'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.prior_box ArgSpec(args=['input', 'image', 'min_sizes', 'max_sizes', 'aspect_ratios', 'variance', 'flip', 'clip', 'steps', 'offset', 'name', 'min_max_aspect_ratios_order'], varargs=None, keywords=None, defaults=(None, [1.0], [0.1, 0.1, 0.2, 0.2], False, False, [0.0, 0.0], 0.5, None, False))
paddle.fluid.layers.density_prior_box ArgSpec(args=['input', 'image', 'densities', 'fixed_sizes', 'fixed_ratios', 'variance', 'clip', 'steps', 'offset', '
name'], varargs=None, keywords=None, defaults=(None, None, None, [0.1, 0.1, 0.2, 0.2], False, [0.0, 0.0], 0.5
, None))
paddle.fluid.layers.density_prior_box ArgSpec(args=['input', 'image', 'densities', 'fixed_sizes', 'fixed_ratios', 'variance', 'clip', 'steps', 'offset', '
flatten_to_2d', 'name'], varargs=None, keywords=None, defaults=(None, None, None, [0.1, 0.1, 0.2, 0.2], False, [0.0, 0.0], 0.5, False
, None))
paddle.fluid.layers.multi_box_head ArgSpec(args=['inputs', 'image', 'base_size', 'num_classes', 'aspect_ratios', 'min_ratio', 'max_ratio', 'min_sizes', 'max_sizes', 'steps', 'step_w', 'step_h', 'offset', 'variance', 'flip', 'clip', 'kernel_size', 'pad', 'stride', 'name', 'min_max_aspect_ratios_order'], varargs=None, keywords=None, defaults=(None, None, None, None, None, None, None, 0.5, [0.1, 0.1, 0.2, 0.2], True, False, 1, 0, 1, None, False))
paddle.fluid.layers.bipartite_match ArgSpec(args=['dist_matrix', 'match_type', 'dist_threshold', 'name'], varargs=None, keywords=None, defaults=(None, None, None))
paddle.fluid.layers.target_assign ArgSpec(args=['input', 'matched_indices', 'negative_indices', 'mismatch_value', 'name'], varargs=None, keywords=None, defaults=(None, None, None))
...
...
@@ -342,7 +342,7 @@ paddle.fluid.transpiler.RoundRobin.dispatch ArgSpec(args=['self', 'varlist'], va
paddle.fluid.transpiler.RoundRobin.reset ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.transpiler.DistributeTranspilerConfig.__init__
paddle.fluid.nets.simple_img_conv_pool ArgSpec(args=['input', 'num_filters', 'filter_size', 'pool_size', 'pool_stride', 'pool_padding', 'pool_type', 'global_pooling', 'conv_stride', 'conv_padding', 'conv_dilation', 'conv_groups', 'param_attr', 'bias_attr', 'act', 'use_cudnn'], varargs=None, keywords=None, defaults=(0, 'max', False, 1, 0, 1, 1, None, None, None, True))
paddle.fluid.nets.sequence_conv_pool ArgSpec(args=['input', 'num_filters', 'filter_size', 'param_attr', 'act', 'pool_type'
], varargs=None, keywords=None, defaults=(None, 'sigmoid', 'max'
))
paddle.fluid.nets.sequence_conv_pool ArgSpec(args=['input', 'num_filters', 'filter_size', 'param_attr', 'act', 'pool_type'
, 'bias_attr'], varargs=None, keywords=None, defaults=(None, 'sigmoid', 'max', None
))
paddle.fluid.nets.glu ArgSpec(args=['input', 'dim'], varargs=None, keywords=None, defaults=(-1,))
paddle.fluid.nets.scaled_dot_product_attention ArgSpec(args=['queries', 'keys', 'values', 'num_heads', 'dropout_rate'], varargs=None, keywords=None, defaults=(1, 0.0))
paddle.fluid.nets.img_conv_group ArgSpec(args=['input', 'conv_num_filter', 'pool_size', 'conv_padding', 'conv_filter_size', 'conv_act', 'param_attr', 'conv_with_batchnorm', 'conv_batchnorm_drop_rate', 'pool_stride', 'pool_type', 'use_cudnn'], varargs=None, keywords=None, defaults=(1, 3, None, None, False, 0.0, 1, 'max', True))
...
...
paddle/fluid/framework/CMakeLists.txt
浏览文件 @
5e0bc130
...
...
@@ -192,3 +192,6 @@ cc_test(tuple_test SRCS tuple_test.cc )
if
(
NOT WIN32
)
cc_test
(
rw_lock_test SRCS rw_lock_test.cc
)
endif
(
NOT WIN32
)
cc_library
(
dlpack_tensor SRCS dlpack_tensor.cc DEPS tensor dlpack
)
cc_test
(
dlpack_tensor_test SRCS dlpack_tensor_test.cc DEPS dlpack_tensor glog
)
paddle/fluid/framework/dlpack_tensor.cc
0 → 100644
浏览文件 @
5e0bc130
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/dlpack_tensor.h"
namespace
paddle
{
namespace
framework
{
namespace
internal
{
template
<
typename
T
>
static
::
DLDataType
GetDLDataTypeCode
()
{
::
DLDataType
dtype
;
if
(
std
::
is_same
<
T
,
platform
::
float16
>::
value
||
std
::
is_floating_point
<
T
>::
value
)
{
dtype
.
code
=
kDLFloat
;
}
else
if
(
std
::
is_unsigned
<
T
>::
value
)
{
dtype
.
code
=
kDLUInt
;
}
else
if
(
std
::
is_integral
<
T
>::
value
)
{
dtype
.
code
=
kDLInt
;
}
else
{
PADDLE_THROW
(
"Unsupported data type %s"
,
typeid
(
T
).
name
());
}
dtype
.
bits
=
8
*
sizeof
(
T
);
dtype
.
lanes
=
1
;
return
dtype
;
}
static
DLDataType
GetDLDataTypeFromTypeIndex
(
const
std
::
type_index
&
type
)
{
#define REG_DL_DATA_TYPE(type) \
{ std::type_index(typeid(type)), GetDLDataTypeCode<type>() }
static
const
std
::
unordered_map
<
std
::
type_index
,
::
DLDataType
>
type_to_dtype_map
({
REG_DL_DATA_TYPE
(
platform
::
float16
),
// NOLINT
REG_DL_DATA_TYPE
(
float
),
// NOLINT
REG_DL_DATA_TYPE
(
double
),
// NOLINT
REG_DL_DATA_TYPE
(
int
),
// NOLINT
REG_DL_DATA_TYPE
(
int64_t
),
// NOLINT
REG_DL_DATA_TYPE
(
bool
),
// NOLINT
REG_DL_DATA_TYPE
(
size_t
),
// NOLINT
REG_DL_DATA_TYPE
(
int16_t
),
// NOLINT
REG_DL_DATA_TYPE
(
uint8_t
),
// NOLINT
REG_DL_DATA_TYPE
(
int8_t
)
// NOLINT
});
static
auto
type_to_dtype_map_end_it
=
type_to_dtype_map
.
end
();
auto
it
=
type_to_dtype_map
.
find
(
type
);
PADDLE_ENFORCE
(
it
!=
type_to_dtype_map_end_it
,
"Unsupported data type %s"
,
type
.
name
());
return
it
->
second
;
#undef REG_DL_DATA_TYPE
}
struct
DLContextVisitor
:
public
boost
::
static_visitor
<::
DLContext
>
{
inline
::
DLContext
operator
()(
const
platform
::
CPUPlace
&
place
)
const
{
DLContext
ctx
;
ctx
.
device_type
=
kDLCPU
;
ctx
.
device_id
=
0
;
return
ctx
;
}
inline
::
DLContext
operator
()(
const
platform
::
CUDAPlace
&
place
)
const
{
#ifdef PADDLE_WITH_CUDA
DLContext
ctx
;
ctx
.
device_type
=
kDLGPU
;
ctx
.
device_id
=
place
.
device
;
return
ctx
;
#else
PADDLE_THROW
(
"platform::CUDAPlace is not supported in CPU only version"
);
#endif
}
inline
::
DLContext
operator
()(
const
platform
::
CUDAPinnedPlace
&
place
)
const
{
#ifdef PADDLE_WITH_CUDA
DLContext
ctx
;
ctx
.
device_type
=
kDLCPUPinned
;
ctx
.
device_id
=
0
;
return
ctx
;
#else
PADDLE_THROW
(
"platform::CUDAPinnedPlace is not supported in CPU only version"
);
#endif
}
};
}
// namespace internal
DLPackTensor
::
DLPackTensor
(
const
Tensor
&
tensor
,
LaneType
lanes
)
{
// init data, data buffer
t_
.
data
=
const_cast
<
void
*>
(
tensor
.
data
<
void
>
());
// init ctx, DLContext type with device_type and device_id
auto
place
=
tensor
.
place
();
t_
.
ctx
=
boost
::
apply_visitor
(
internal
::
DLContextVisitor
(),
place
);
// init dtype
t_
.
dtype
=
internal
::
GetDLDataTypeFromTypeIndex
(
tensor
.
type
());
t_
.
dtype
.
lanes
=
lanes
;
// init ndim, tensor rank
auto
&
dims
=
tensor
.
dims
();
using
DimType
=
decltype
(
t_
.
ndim
);
// int
t_
.
ndim
=
static_cast
<
DimType
>
(
dims
.
size
());
// init shape, tensor dims
t_
.
shape
=
shape_
;
for
(
DimType
i
=
0
;
i
<
t_
.
ndim
;
++
i
)
{
t_
.
shape
[
i
]
=
dims
[
i
];
}
// init strides, nullptr means the tensor is compact
t_
.
strides
=
nullptr
;
// init byte_offset
t_
.
byte_offset
=
0
;
}
}
// namespace framework
}
// namespace paddle
paddle/fluid/framework/dlpack_tensor.h
0 → 100644
浏览文件 @
5e0bc130
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <dlpack/dlpack.h>
#include "paddle/fluid/framework/tensor.h"
namespace
paddle
{
namespace
framework
{
class
DLPackTensor
{
public:
using
LaneType
=
decltype
(
::
DLTensor
::
dtype
.
lanes
);
// uint16_t
using
ShapeType
=
std
::
remove_reference
<
decltype
(
::
DLTensor
::
shape
[
0
])
>::
type
;
// int64_t
// lanes is only used in CPU to enable vectorization
explicit
DLPackTensor
(
const
Tensor
&
tensor
,
LaneType
lanes
=
1
);
inline
operator
const
::
DLTensor
&
()
const
{
return
t_
;
}
inline
operator
::
DLTensor
&
()
{
return
t_
;
}
private:
::
DLTensor
t_
;
// The shape in DLTensor is defined as int64_t*
// Add this member to make TVMTensor init without heap allocation
ShapeType
shape_
[
9
];
};
}
// namespace framework
}
// namespace paddle
paddle/fluid/framework/dlpack_tensor_test.cc
0 → 100644
浏览文件 @
5e0bc130
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/dlpack_tensor.h"
#include <glog/logging.h>
#include <gtest/gtest.h>
#include <vector>
namespace
paddle
{
namespace
framework
{
namespace
{
// NOLINT
template
<
typename
T
>
constexpr
uint8_t
GetDLDataTypeCode
()
{
return
std
::
is_same
<
platform
::
float16
,
T
>::
value
||
std
::
is_floating_point
<
T
>::
value
?
static_cast
<
uint8_t
>
(
kDLFloat
)
:
(
std
::
is_unsigned
<
T
>::
value
?
static_cast
<
uint8_t
>
(
kDLUInt
)
:
(
std
::
is_integral
<
T
>::
value
?
static_cast
<
uint8_t
>
(
kDLInt
)
:
static_cast
<
uint8_t
>
(
-
1
)));
}
}
// NOLINT
template
<
typename
T
>
void
TestMain
(
const
platform
::
Place
&
place
,
uint16_t
lanes
)
{
DDim
dims
{
4
,
5
,
6
,
7
};
Tensor
tensor
;
tensor
.
Resize
(
dims
);
void
*
p
=
tensor
.
mutable_data
<
T
>
(
place
);
DLPackTensor
dlpack_tensor
(
tensor
,
lanes
);
::
DLTensor
&
dl_tensor
=
dlpack_tensor
;
CHECK_EQ
(
p
,
dl_tensor
.
data
);
if
(
platform
::
is_cpu_place
(
place
))
{
CHECK_EQ
(
kDLCPU
,
dl_tensor
.
ctx
.
device_type
);
CHECK_EQ
(
0
,
dl_tensor
.
ctx
.
device_id
);
}
else
if
(
platform
::
is_gpu_place
(
place
))
{
CHECK_EQ
(
kDLGPU
,
dl_tensor
.
ctx
.
device_type
);
CHECK_EQ
(
boost
::
get
<
platform
::
CUDAPlace
>
(
place
).
device
,
dl_tensor
.
ctx
.
device_id
);
}
else
if
(
platform
::
is_cuda_pinned_place
(
place
))
{
CHECK_EQ
(
kDLCPUPinned
,
dl_tensor
.
ctx
.
device_type
);
CHECK_EQ
(
0
,
dl_tensor
.
ctx
.
device_id
);
}
else
{
CHECK_EQ
(
false
,
true
);
}
CHECK_EQ
(
dims
.
size
(),
dl_tensor
.
ndim
);
for
(
auto
i
=
0
;
i
<
dims
.
size
();
++
i
)
{
CHECK_EQ
(
dims
[
i
],
dl_tensor
.
shape
[
i
]);
}
CHECK_EQ
(
dl_tensor
.
strides
==
nullptr
,
true
);
CHECK_EQ
(
static_cast
<
uint64_t
>
(
0
),
dl_tensor
.
byte_offset
);
CHECK_EQ
(
lanes
,
dl_tensor
.
dtype
.
lanes
);
CHECK_EQ
(
sizeof
(
T
)
*
8
,
dl_tensor
.
dtype
.
bits
);
CHECK_EQ
(
GetDLDataTypeCode
<
T
>
(),
dl_tensor
.
dtype
.
code
);
}
template
<
typename
T
>
void
TestMainLoop
()
{
#ifdef PADDLE_WITH_CUDA
std
::
vector
<
platform
::
Place
>
places
{
platform
::
CPUPlace
(),
platform
::
CUDAPlace
(
0
),
platform
::
CUDAPinnedPlace
()};
if
(
platform
::
GetCUDADeviceCount
()
>
1
)
{
places
.
emplace_back
(
platform
::
CUDAPlace
(
1
));
}
#else
std
::
vector
<
platform
::
Place
>
places
{
platform
::
CPUPlace
()};
#endif
std
::
vector
<
uint16_t
>
lanes
{
1
,
2
};
for
(
auto
&
p
:
places
)
{
for
(
auto
&
l
:
lanes
)
{
TestMain
<
T
>
(
p
,
l
);
}
}
}
#define PADDLE_DLPACK_TEST(type) \
TEST(dlpack, test_##type) { TestMainLoop<type>(); }
using
float16
=
platform
::
float16
;
PADDLE_DLPACK_TEST
(
float16
);
PADDLE_DLPACK_TEST
(
float
);
PADDLE_DLPACK_TEST
(
double
);
PADDLE_DLPACK_TEST
(
int
);
PADDLE_DLPACK_TEST
(
int64_t
);
PADDLE_DLPACK_TEST
(
bool
);
PADDLE_DLPACK_TEST
(
size_t
);
PADDLE_DLPACK_TEST
(
int16_t
);
PADDLE_DLPACK_TEST
(
uint8_t
);
PADDLE_DLPACK_TEST
(
int8_t
);
#undef PADDLE_DLPACK_TEST
}
// namespace framework
}
// namespace paddle
paddle/fluid/framework/op_desc.cc
浏览文件 @
5e0bc130
...
...
@@ -252,6 +252,12 @@ void OpDesc::SetAttr(const std::string &name, const Attribute &v) {
this
->
attrs_
[
name
]
=
std
::
vector
<
int
>
();
break
;
}
case
proto
::
AttrType
::
LONGS
:
{
VLOG
(
110
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
<<
" from LONGS to LONGS"
;
this
->
attrs_
[
name
]
=
std
::
vector
<
int64_t
>
();
break
;
}
case
proto
::
AttrType
::
FLOATS
:
{
VLOG
(
110
)
<<
"SetAttr: "
<<
Type
()
<<
", "
<<
name
<<
" from INTS to FLOATS"
;
...
...
paddle/fluid/inference/tensorrt/convert/split_op.cc
浏览文件 @
5e0bc130
...
...
@@ -19,9 +19,6 @@ namespace paddle {
namespace
inference
{
namespace
tensorrt
{
/*
* SplitOp.
*/
class
SplitOpConverter
:
public
OpConverter
{
public:
void
operator
()(
const
framework
::
proto
::
OpDesc
&
op
,
...
...
@@ -40,16 +37,11 @@ class SplitOpConverter : public OpConverter {
int
axis
=
boost
::
get
<
int
>
(
op_desc
.
GetAttr
(
"axis"
));
std
::
vector
<
int
>
output_lengths
=
boost
::
get
<
std
::
vector
<
int
>>
(
op_desc
.
GetAttr
(
"sections"
));
// split on batch is not supported in TensorRT
PADDLE_ENFORCE
(
axis
!=
0
);
if
(
axis
<
0
)
{
axis
+=
input_dims
.
nbDims
;
}
else
{
axis
-=
1
;
}
axis
+=
(
axis
<
0
)
?
input_dims
.
nbDims
:
-
1
;
PADDLE_ENFORCE
(
output_lengths
.
size
()
==
output_num
);
//
plugin
::
SplitPlugin
*
plugin
=
new
plugin
::
SplitPlugin
(
axis
,
output_lengths
);
nvinfer1
::
IPluginLayer
*
layer
=
engine_
->
AddPlugin
(
&
input
,
input_num
,
plugin
);
...
...
paddle/fluid/inference/tensorrt/convert/test_split_op.cc
浏览文件 @
5e0bc130
...
...
@@ -20,30 +20,92 @@ namespace paddle {
namespace
inference
{
namespace
tensorrt
{
TEST
(
split_op
,
test
)
{
template
<
int
BatchSize
,
int
Axis
>
void
TensorRTSplitTest
(
const
std
::
vector
<
int
>
&
in_shape
,
const
std
::
vector
<
int
>
&
sections
)
{
std
::
unordered_set
<
std
::
string
>
parameters
({
""
});
framework
::
Scope
scope
;
TRTConvertValidation
validator
(
10
,
parameters
,
scope
,
1000
);
validator
.
DeclInputVar
(
"split_input"
,
nvinfer1
::
DimsCHW
(
3
,
2
,
2
));
validator
.
DeclOutputVar
(
"split_out1"
,
nvinfer1
::
DimsCHW
(
2
,
2
,
2
));
validator
.
DeclOutputVar
(
"split_out2"
,
nvinfer1
::
DimsCHW
(
1
,
2
,
2
));
TRTConvertValidation
validator
(
BatchSize
+
1
,
parameters
,
scope
,
10000
);
auto
make_dim
=
[](
const
std
::
vector
<
int
>
&
shape
)
{
nvinfer1
::
DimsCHW
dim
;
dim
.
c
()
=
shape
[
0
];
dim
.
h
()
=
shape
[
1
];
dim
.
w
()
=
shape
[
2
];
return
dim
;
};
validator
.
DeclInputVar
(
"split_input"
,
make_dim
(
in_shape
));
std
::
vector
<
std
::
string
>
output_vars
;
for
(
size_t
i
=
0
;
i
<
sections
.
size
();
++
i
)
{
auto
out_shape
=
in_shape
;
out_shape
[
Axis
-
1
]
=
sections
[
i
];
std
::
string
output_name
=
"split_out"
+
std
::
to_string
(
i
);
validator
.
DeclOutputVar
(
output_name
,
make_dim
(
out_shape
));
output_vars
.
push_back
(
output_name
);
}
// Prepare Op description
framework
::
OpDesc
desc
;
desc
.
SetType
(
"split"
);
desc
.
SetInput
(
"X"
,
{
"split_input"
});
desc
.
SetOutput
(
"Out"
,
{
"split_out1"
,
"split_out2"
}
);
desc
.
SetOutput
(
"Out"
,
output_vars
);
int
num
=
0
;
int
axis
=
1
;
std
::
vector
<
int
>
output_lengths
=
{
2
,
1
};
desc
.
SetAttr
(
"axis"
,
axis
);
desc
.
SetAttr
(
"num"
,
num
);
desc
.
SetAttr
(
"sections"
,
output_lengths
);
desc
.
SetAttr
(
"axis"
,
Axis
);
desc
.
SetAttr
(
"num"
,
0
);
desc
.
SetAttr
(
"sections"
,
sections
);
validator
.
SetOp
(
*
desc
.
Proto
());
validator
.
Execute
(
1
);
validator
.
Execute
(
BatchSize
);
}
// batch = 0, axis = 1, same shape
TEST
(
split_op
,
test_same_shape_axis1_batch1
)
{
TensorRTSplitTest
<
1
,
1
>
({
4
,
2
,
2
},
{
2
,
2
});
}
// batch = 0, axis = 1, different shape
TEST
(
split_op
,
test_different_shape_axis1_batch1
)
{
TensorRTSplitTest
<
1
,
1
>
({
3
,
2
,
2
},
{
2
,
1
});
}
// batch = 10, axis = 1, same shape
TEST
(
split_op
,
test_same_shape_axis1_batch10
)
{
TensorRTSplitTest
<
10
,
1
>
({
4
,
2
,
2
},
{
2
,
2
});
}
// batch = 10, axis = 1, different shape
TEST
(
split_op
,
test_different_shape_axis1_batch10
)
{
TensorRTSplitTest
<
10
,
1
>
({
3
,
2
,
2
},
{
2
,
1
});
}
// batch = 0, axis = 2, same shape
TEST
(
split_op
,
test_same_shape_axis2_batch1
)
{
TensorRTSplitTest
<
1
,
2
>
({
3
,
4
,
2
},
{
2
,
2
});
}
// batch = 0, axis = 2, different shape
TEST
(
split_op
,
test_different_shape_axis2_batch1
)
{
TensorRTSplitTest
<
1
,
2
>
({
3
,
3
,
2
},
{
2
,
1
});
}
// batch = 10, axis = 2, same shape
TEST
(
split_op
,
test_same_shape_axis2_batch10
)
{
TensorRTSplitTest
<
10
,
2
>
({
3
,
4
,
2
},
{
2
,
2
});
}
// batch = 10, axis = 2, different shape
TEST
(
split_op
,
test_different_shape_axis2_batch10
)
{
TensorRTSplitTest
<
10
,
2
>
({
3
,
3
,
2
},
{
2
,
1
});
}
// batch = 0, axis = 3, same shape
TEST
(
split_op
,
test_same_shape_axis3_batch1
)
{
TensorRTSplitTest
<
1
,
3
>
({
3
,
2
,
4
},
{
2
,
2
});
}
// batch = 0, axis = 3, different shape
TEST
(
split_op
,
test_different_shape_axis3_batch1
)
{
TensorRTSplitTest
<
1
,
3
>
({
3
,
2
,
3
},
{
2
,
1
});
}
// batch = 10, axis = 3, same shape
TEST
(
split_op
,
test_same_shape_axis3_batch10
)
{
TensorRTSplitTest
<
10
,
3
>
({
3
,
2
,
4
},
{
2
,
2
});
}
// batch = 10, axis = 3, different shape
TEST
(
split_op
,
test_different_shape_axis3_batch10
)
{
TensorRTSplitTest
<
10
,
3
>
({
3
,
2
,
3
},
{
2
,
1
});
}
}
// namespace tensorrt
...
...
paddle/fluid/inference/tensorrt/plugin/split_op_plugin.cu
浏览文件 @
5e0bc130
...
...
@@ -12,6 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include <cuda_fp16.h>
#include <algorithm>
#include "paddle/fluid/inference/tensorrt/plugin/split_op_plugin.h"
namespace
paddle
{
...
...
@@ -19,6 +21,52 @@ namespace inference {
namespace
tensorrt
{
namespace
plugin
{
// copied from operators::math::SplitFunctor
template
<
typename
T
>
__global__
void
SplitKernel
(
const
T
*
input_data
,
const
int
in_row
,
const
int
in_col
,
const
int
*
out_cols
,
int
out_cols_size
,
T
**
outputs_data
)
{
int
tid_x
=
blockIdx
.
x
*
blockDim
.
x
+
threadIdx
.
x
;
int
curr_segment
=
0
;
int
curr_offset
=
out_cols
[
0
];
for
(;
tid_x
<
in_col
;
tid_x
+=
blockDim
.
x
*
gridDim
.
x
)
{
int
curr_col_offset
=
out_cols
[
curr_segment
+
1
];
while
(
curr_col_offset
<=
tid_x
)
{
curr_offset
=
curr_col_offset
;
++
curr_segment
;
curr_col_offset
=
out_cols
[
curr_segment
+
1
];
}
int
local_col
=
tid_x
-
curr_offset
;
int
segment_width
=
curr_col_offset
-
curr_offset
;
T
*
output_ptr
=
outputs_data
[
curr_segment
];
if
(
output_ptr
!=
nullptr
)
{
int
tid_y
=
blockIdx
.
y
*
blockDim
.
y
+
threadIdx
.
y
;
for
(;
tid_y
<
in_row
;
tid_y
+=
blockDim
.
y
*
gridDim
.
y
)
output_ptr
[
tid_y
*
segment_width
+
local_col
]
=
input_data
[
tid_y
*
in_col
+
tid_x
];
}
}
}
template
<
typename
T
>
__global__
void
SplitKernel
(
const
T
*
input_data
,
const
int
in_row
,
const
int
in_col
,
const
int
fixed_out_col
,
T
**
outputs_data
)
{
int
tid_x
=
blockIdx
.
x
*
blockDim
.
x
+
threadIdx
.
x
;
for
(;
tid_x
<
in_col
;
tid_x
+=
blockDim
.
x
*
gridDim
.
x
)
{
int
split
=
tid_x
/
fixed_out_col
;
int
in_offset
=
tid_x
-
split
*
fixed_out_col
;
T
*
output_ptr
=
outputs_data
[
split
];
if
(
output_ptr
!=
nullptr
)
{
int
tid_y
=
blockIdx
.
y
*
blockDim
.
y
+
threadIdx
.
y
;
for
(;
tid_y
<
in_row
;
tid_y
+=
blockDim
.
y
*
gridDim
.
y
)
output_ptr
[
tid_y
*
fixed_out_col
+
in_offset
]
=
input_data
[
tid_y
*
in_col
+
tid_x
];
}
}
}
nvinfer1
::
Dims
SplitPlugin
::
getOutputDimensions
(
int
index
,
const
nvinfer1
::
Dims
*
input_dims
,
int
num_inputs
)
{
PADDLE_ENFORCE_EQ
(
num_inputs
,
1
);
...
...
@@ -31,48 +79,96 @@ nvinfer1::Dims SplitPlugin::getOutputDimensions(
int
SplitPlugin
::
initialize
()
{
PADDLE_ENFORCE_LE
(
axis_
,
nvinfer1
::
Dims
::
MAX_DIMS
);
// notice input dims is [C, H, W]
nvinfer1
::
Dims
dims
=
this
->
getInputDims
(
0
);
outer_rows_
=
1
;
inner_cols_
=
1
;
for
(
int
i
=
0
;
i
<
axis_
;
++
i
)
{
outer_rows_
*=
dims
.
d
[
i
];
}
for
(
int
i
=
axis_
+
1
;
i
<
dims
.
nbDims
;
++
i
)
{
inner_cols_
*=
dims
.
d
[
i
];
}
same_shape_
=
true
;
std
::
vector
<
int
>
segment_offsets
(
1
,
0
);
for
(
int
i
=
0
;
i
<
this
->
getNbOutputs
();
++
i
)
{
segment_offsets
.
push_back
(
segment_offsets
.
back
()
+
output_length_
[
i
]);
if
(
output_length_
[
i
]
!=
output_length_
[
0
])
{
same_shape_
=
false
;
}
segment_offsets
.
push_back
(
segment_offsets
.
back
()
+
output_length_
[
i
]
*
inner_cols_
);
}
segment_offsets_
=
segment_offsets
;
nvinfer1
::
Dims
dims
=
this
->
getInputDims
(
0
);
nx_
=
1
;
for
(
int
i
=
dims
.
nbDims
-
1
;
i
>
axis_
;
--
i
)
{
nx_
*=
dims
.
d
[
i
];
inner_cols_
*=
dims
.
d
[
axis_
];
d_segment_offsets_
=
segment_offsets
;
segment_offsets_
=
std
::
move
(
segment_offsets
);
d_output_ptrs_
.
resize
(
this
->
getNbOutputs
(),
nullptr
);
return
0
;
}
template
<
typename
T
>
inline
void
Split
(
cudaStream_t
stream
,
const
bool
same_shape
,
const
int
outer_rows
,
const
int
inner_cols
,
const
std
::
vector
<
int
>&
segment_offsets
,
const
int
*
d_segment_offsets
,
const
T
*
input
,
T
**
outputs
)
{
const
int
kThreadsPerBlock
=
1024
;
const
int
kMaxBlocks
=
65535
;
int
block_cols
=
kThreadsPerBlock
;
if
(
inner_cols
<
kThreadsPerBlock
)
{
// block_cols is aligned by 32.
block_cols
=
((
inner_cols
+
31
)
>>
5
)
<<
5
;
}
ny_
=
dims
.
d
[
axis_
];
nz_
=
1
;
for
(
int
i
=
axis_
-
1
;
i
>=
0
;
--
i
)
{
nz_
*=
dims
.
d
[
i
];
int
block_rows
=
kThreadsPerBlock
/
block_cols
;
dim3
block_size
=
dim3
(
block_cols
,
block_rows
,
1
);
int
grid_cols
=
std
::
min
((
inner_cols
+
block_cols
-
1
)
/
block_cols
,
kMaxBlocks
);
int
grid_rows
=
std
::
min
(
kMaxBlocks
/
grid_cols
,
std
::
max
(
outer_rows
/
block_rows
,
1
));
dim3
grid_size
=
dim3
(
grid_cols
,
grid_rows
,
1
);
if
(
same_shape
)
{
SplitKernel
<<<
grid_size
,
block_size
,
0
,
stream
>>>
(
input
,
outer_rows
,
inner_cols
,
segment_offsets
[
1
],
outputs
);
}
else
{
SplitKernel
<<<
grid_size
,
block_size
,
0
,
stream
>>>
(
input
,
outer_rows
,
inner_cols
,
d_segment_offsets
,
static_cast
<
int
>
(
segment_offsets
.
size
()),
outputs
);
}
return
0
;
}
int
SplitPlugin
::
enqueue
(
int
batchSize
,
const
void
*
const
*
inputs
,
void
**
outputs
,
void
*
workspace
,
cudaStream_t
stream
)
{
auto
const
&
input_dims
=
this
->
getInputDims
(
0
);
int
input_size
=
0
;
float
const
*
idata
=
reinterpret_cast
<
float
const
*>
(
inputs
[
0
]);
float
**
odatas
=
reinterpret_cast
<
float
**>
(
outputs
);
// kernel impl here.
int
inputBatchOffset
=
nx_
*
ny_
*
nz_
;
for
(
size_t
i
=
0
;
i
<
this
->
getNbOutputs
();
i
++
)
{
for
(
size_t
j
=
0
;
j
<
batchSize
;
j
++
)
{
cudaMemcpyAsync
(
odatas
[
i
]
+
j
*
(
segment_offsets_
[
i
+
1
]
-
segment_offsets_
[
i
])
*
nx_
*
sizeof
(
float
),
inputs
[
0
]
+
(
inputBatchOffset
*
j
+
segment_offsets_
[
i
]
*
nx_
)
*
sizeof
(
float
),
(
segment_offsets_
[
i
+
1
]
-
segment_offsets_
[
i
])
*
nx_
*
sizeof
(
float
),
cudaMemcpyDeviceToDevice
,
stream
);
float
const
*
input_ptr
=
reinterpret_cast
<
float
const
*>
(
inputs
[
0
]);
if
(((
batchSize
==
1
&&
axis_
==
0
)
||
axis_
==
-
1
)
&&
this
->
getNbOutputs
()
<
10
)
{
float
**
output_ptrs
=
reinterpret_cast
<
float
**>
(
outputs
);
int
data_type_size
=
(
this
->
getDataType
()
==
nvinfer1
::
DataType
::
kFLOAT
)
?
sizeof
(
float
)
:
sizeof
(
__half
);
for
(
int
i
=
0
;
i
<
this
->
getNbOutputs
();
++
i
)
{
PADDLE_ENFORCE
(
cudaMemcpyAsync
(
output_ptrs
[
i
],
input_ptr
+
segment_offsets_
[
i
],
(
segment_offsets_
[
i
+
1
]
-
segment_offsets_
[
i
])
*
data_type_size
,
cudaMemcpyDeviceToDevice
,
stream
)
==
cudaSuccess
);
}
}
else
{
outer_rows_
*=
batchSize
;
const
int
*
d_segment_offsets_ptr
=
thrust
::
raw_pointer_cast
(
&
d_segment_offsets_
[
0
]);
float
**
output_ptrs
=
thrust
::
raw_pointer_cast
(
&
d_output_ptrs_
[
0
]);
PADDLE_ENFORCE
(
cudaMemcpyAsync
(
output_ptrs
,
outputs
,
this
->
getNbOutputs
()
*
sizeof
(
float
*
),
cudaMemcpyHostToDevice
,
stream
)
==
cudaSuccess
);
if
(
this
->
getDataType
()
==
nvinfer1
::
DataType
::
kFLOAT
)
{
Split
(
stream
,
same_shape_
,
outer_rows_
,
inner_cols_
,
segment_offsets_
,
d_segment_offsets_ptr
,
input_ptr
,
output_ptrs
);
}
else
{
Split
(
stream
,
same_shape_
,
outer_rows_
,
inner_cols_
,
segment_offsets_
,
d_segment_offsets_ptr
,
(
__half
*
)
input_ptr
,
// NOLINT
(
__half
**
)
output_ptrs
);
// NOLINT
}
}
return
cudaGetLastError
()
!=
cudaSuccess
;
}
...
...
paddle/fluid/inference/tensorrt/plugin/split_op_plugin.h
浏览文件 @
5e0bc130
...
...
@@ -14,6 +14,7 @@
#pragma once
#include <thrust/device_vector.h>
#include <vector>
#include "paddle/fluid/inference/tensorrt/plugin/trt_plugin.h"
...
...
@@ -25,7 +26,7 @@ namespace plugin {
class
SplitPlugin
:
public
PluginTensorRT
{
public:
SplitPlugin
(
int
axis
,
std
::
vector
<
int
>
const
&
output_lengths
)
:
axis_
(
axis
),
output_length_
(
output_lengths
)
{}
:
axis_
(
axis
),
same_shape_
(
true
),
output_length_
(
output_lengths
)
{}
SplitPlugin
(
void
const
*
serial_data
,
size_t
serial_length
)
{
deserializeBase
(
serial_data
,
serial_length
);
...
...
@@ -60,9 +61,13 @@ class SplitPlugin : public PluginTensorRT {
}
int
axis_
;
int
outer_rows_
;
int
inner_cols_
;
bool
same_shape_
;
std
::
vector
<
int
>
output_length_
;
int
nx_
,
ny_
,
nz_
;
std
::
vector
<
int
>
segment_offsets_
;
thrust
::
device_vector
<
int
>
d_segment_offsets_
;
thrust
::
device_vector
<
float
*>
d_output_ptrs_
;
};
}
// namespace plugin
...
...
paddle/fluid/operators/detection/CMakeLists.txt
浏览文件 @
5e0bc130
...
...
@@ -22,7 +22,7 @@ iou_similarity_op.cu)
detection_library
(
mine_hard_examples_op SRCS mine_hard_examples_op.cc
)
detection_library
(
multiclass_nms_op SRCS multiclass_nms_op.cc poly_util.cc gpc.cc
)
detection_library
(
prior_box_op SRCS prior_box_op.cc prior_box_op.cu
)
detection_library
(
density_prior_box_op SRCS density_prior_box_op.cc
)
detection_library
(
density_prior_box_op SRCS density_prior_box_op.cc
density_prior_box_op.cu
)
detection_library
(
anchor_generator_op SRCS anchor_generator_op.cc
anchor_generator_op.cu
)
detection_library
(
target_assign_op SRCS target_assign_op.cc
...
...
paddle/fluid/operators/detection/density_prior_box_op.cc
浏览文件 @
5e0bc130
...
...
@@ -39,24 +39,27 @@ class DensityPriorBoxOp : public framework::OperatorWithKernel {
auto
fixed_sizes
=
ctx
->
Attrs
().
Get
<
std
::
vector
<
float
>>
(
"fixed_sizes"
);
auto
fixed_ratios
=
ctx
->
Attrs
().
Get
<
std
::
vector
<
float
>>
(
"fixed_ratios"
);
auto
densities
=
ctx
->
Attrs
().
Get
<
std
::
vector
<
int
>>
(
"densities"
);
bool
flatten
=
ctx
->
Attrs
().
Get
<
bool
>
(
"flatten_to_2d"
);
PADDLE_ENFORCE_EQ
(
fixed_sizes
.
size
(),
densities
.
size
(),
"The number of fixed_sizes and densities must be equal."
);
size_t
num_priors
=
0
;
if
((
fixed_sizes
.
size
()
>
0
)
&&
(
densities
.
size
()
>
0
))
{
for
(
size_t
i
=
0
;
i
<
densities
.
size
();
++
i
)
{
if
(
fixed_ratios
.
size
()
>
0
)
{
num_priors
+=
(
fixed_ratios
.
size
())
*
(
pow
(
densities
[
i
],
2
));
}
}
for
(
size_t
i
=
0
;
i
<
densities
.
size
();
++
i
)
{
num_priors
+=
(
fixed_ratios
.
size
())
*
(
pow
(
densities
[
i
],
2
));
}
if
(
!
flatten
)
{
std
::
vector
<
int64_t
>
dim_vec
(
4
);
dim_vec
[
0
]
=
input_dims
[
2
];
dim_vec
[
1
]
=
input_dims
[
3
];
dim_vec
[
2
]
=
num_priors
;
dim_vec
[
3
]
=
4
;
ctx
->
SetOutputDim
(
"Boxes"
,
framework
::
make_ddim
(
dim_vec
));
ctx
->
SetOutputDim
(
"Variances"
,
framework
::
make_ddim
(
dim_vec
));
}
else
{
int64_t
dim0
=
input_dims
[
2
]
*
input_dims
[
3
]
*
num_priors
;
ctx
->
SetOutputDim
(
"Boxes"
,
{
dim0
,
4
});
ctx
->
SetOutputDim
(
"Variances"
,
{
dim0
,
4
});
}
std
::
vector
<
int64_t
>
dim_vec
(
4
);
dim_vec
[
0
]
=
input_dims
[
2
];
dim_vec
[
1
]
=
input_dims
[
3
];
dim_vec
[
2
]
=
num_priors
;
dim_vec
[
3
]
=
4
;
ctx
->
SetOutputDim
(
"Boxes"
,
framework
::
make_ddim
(
dim_vec
));
ctx
->
SetOutputDim
(
"Variances"
,
framework
::
make_ddim
(
dim_vec
));
}
protected:
...
...
@@ -64,7 +67,7 @@ class DensityPriorBoxOp : public framework::OperatorWithKernel {
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
return
framework
::
OpKernelType
(
framework
::
ToDataType
(
ctx
.
Input
<
framework
::
Tensor
>
(
"Input"
)
->
type
()),
platform
::
CPU
Place
());
ctx
.
Get
Place
());
}
};
...
...
@@ -101,7 +104,10 @@ class DensityPriorBoxOpMaker : public framework::OpProtoAndCheckerMaker {
});
AddAttr
<
bool
>
(
"clip"
,
"(bool) Whether to clip out-of-boundary boxes."
)
.
SetDefault
(
true
);
AddAttr
<
bool
>
(
"flatten_to_2d"
,
"(bool) Whether to flatten to 2D and "
"the second dim is 4."
)
.
SetDefault
(
false
);
AddAttr
<
float
>
(
"step_w"
,
"Density prior boxes step across width, 0.0 for auto calculation."
)
...
...
paddle/fluid/operators/detection/density_prior_box_op.cu
0 → 100644
浏览文件 @
5e0bc130
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/detection/density_prior_box_op.h"
namespace
paddle
{
namespace
operators
{
template
<
typename
T
>
static
__device__
inline
T
Clip
(
T
in
)
{
return
min
(
max
(
in
,
0.
),
1.
);
}
template
<
typename
T
>
static
__global__
void
GenDensityPriorBox
(
const
int
height
,
const
int
width
,
const
int
im_height
,
const
int
im_width
,
const
T
offset
,
const
T
step_width
,
const
T
step_height
,
const
int
num_priors
,
const
T
*
ratios_shift
,
bool
is_clip
,
const
T
var_xmin
,
const
T
var_ymin
,
const
T
var_xmax
,
const
T
var_ymax
,
T
*
out
,
T
*
var
)
{
int
gidx
=
blockIdx
.
x
*
blockDim
.
x
+
threadIdx
.
x
;
int
gidy
=
blockIdx
.
y
*
blockDim
.
y
+
threadIdx
.
y
;
int
step_x
=
blockDim
.
x
*
gridDim
.
x
;
int
step_y
=
blockDim
.
y
*
gridDim
.
y
;
const
T
*
width_ratio
=
ratios_shift
;
const
T
*
height_ratio
=
ratios_shift
+
num_priors
;
const
T
*
width_shift
=
ratios_shift
+
2
*
num_priors
;
const
T
*
height_shift
=
ratios_shift
+
3
*
num_priors
;
for
(
int
j
=
gidy
;
j
<
height
;
j
+=
step_y
)
{
for
(
int
i
=
gidx
;
i
<
width
*
num_priors
;
i
+=
step_x
)
{
int
h
=
j
;
int
w
=
i
/
num_priors
;
int
k
=
i
%
num_priors
;
T
center_x
=
(
w
+
offset
)
*
step_width
;
T
center_y
=
(
h
+
offset
)
*
step_height
;
T
center_x_temp
=
center_x
+
width_shift
[
k
];
T
center_y_temp
=
center_y
+
height_shift
[
k
];
T
box_width_ratio
=
width_ratio
[
k
]
/
2.
;
T
box_height_ratio
=
height_ratio
[
k
]
/
2.
;
T
xmin
=
max
((
center_x_temp
-
box_width_ratio
)
/
im_width
,
0.
);
T
ymin
=
max
((
center_y_temp
-
box_height_ratio
)
/
im_height
,
0.
);
T
xmax
=
min
((
center_x_temp
+
box_width_ratio
)
/
im_width
,
1.
);
T
ymax
=
min
((
center_y_temp
+
box_height_ratio
)
/
im_height
,
1.
);
int
out_offset
=
(
j
*
width
*
num_priors
+
i
)
*
4
;
out
[
out_offset
]
=
is_clip
?
Clip
<
T
>
(
xmin
)
:
xmin
;
out
[
out_offset
+
1
]
=
is_clip
?
Clip
<
T
>
(
ymin
)
:
ymin
;
out
[
out_offset
+
2
]
=
is_clip
?
Clip
<
T
>
(
xmax
)
:
xmax
;
out
[
out_offset
+
3
]
=
is_clip
?
Clip
<
T
>
(
ymax
)
:
ymax
;
var
[
out_offset
]
=
var_xmin
;
var
[
out_offset
+
1
]
=
var_ymin
;
var
[
out_offset
+
2
]
=
var_xmax
;
var
[
out_offset
+
3
]
=
var_ymax
;
}
}
}
template
<
typename
T
>
class
DensityPriorBoxOpCUDAKernel
:
public
framework
::
OpKernel
<
T
>
{
public:
void
Compute
(
const
framework
::
ExecutionContext
&
ctx
)
const
override
{
auto
*
input
=
ctx
.
Input
<
paddle
::
framework
::
Tensor
>
(
"Input"
);
auto
*
image
=
ctx
.
Input
<
paddle
::
framework
::
Tensor
>
(
"Image"
);
auto
*
boxes
=
ctx
.
Output
<
paddle
::
framework
::
Tensor
>
(
"Boxes"
);
auto
*
vars
=
ctx
.
Output
<
paddle
::
framework
::
Tensor
>
(
"Variances"
);
auto
variances
=
ctx
.
Attr
<
std
::
vector
<
float
>>
(
"variances"
);
auto
is_clip
=
ctx
.
Attr
<
bool
>
(
"clip"
);
auto
fixed_sizes
=
ctx
.
Attr
<
std
::
vector
<
float
>>
(
"fixed_sizes"
);
auto
fixed_ratios
=
ctx
.
Attr
<
std
::
vector
<
float
>>
(
"fixed_ratios"
);
auto
densities
=
ctx
.
Attr
<
std
::
vector
<
int
>>
(
"densities"
);
T
step_w
=
static_cast
<
T
>
(
ctx
.
Attr
<
float
>
(
"step_w"
));
T
step_h
=
static_cast
<
T
>
(
ctx
.
Attr
<
float
>
(
"step_h"
));
T
offset
=
static_cast
<
T
>
(
ctx
.
Attr
<
float
>
(
"offset"
));
auto
img_width
=
image
->
dims
()[
3
];
auto
img_height
=
image
->
dims
()[
2
];
auto
feature_width
=
input
->
dims
()[
3
];
auto
feature_height
=
input
->
dims
()[
2
];
T
step_width
,
step_height
;
if
(
step_w
==
0
||
step_h
==
0
)
{
step_width
=
static_cast
<
T
>
(
img_width
)
/
feature_width
;
step_height
=
static_cast
<
T
>
(
img_height
)
/
feature_height
;
}
else
{
step_width
=
step_w
;
step_height
=
step_h
;
}
int
num_priors
=
0
;
for
(
size_t
i
=
0
;
i
<
densities
.
size
();
++
i
)
{
num_priors
+=
(
fixed_ratios
.
size
())
*
(
pow
(
densities
[
i
],
2
));
}
int
step_average
=
static_cast
<
int
>
((
step_width
+
step_height
)
*
0.5
);
framework
::
Tensor
h_temp
;
T
*
tdata
=
h_temp
.
mutable_data
<
T
>
({
num_priors
*
4
},
platform
::
CPUPlace
());
int
idx
=
0
;
for
(
size_t
s
=
0
;
s
<
fixed_sizes
.
size
();
++
s
)
{
auto
fixed_size
=
fixed_sizes
[
s
];
int
density
=
densities
[
s
];
for
(
size_t
r
=
0
;
r
<
fixed_ratios
.
size
();
++
r
)
{
float
ar
=
fixed_ratios
[
r
];
int
shift
=
step_average
/
density
;
float
box_width_ratio
=
fixed_size
*
sqrt
(
ar
);
float
box_height_ratio
=
fixed_size
/
sqrt
(
ar
);
for
(
int
di
=
0
;
di
<
density
;
++
di
)
{
for
(
int
dj
=
0
;
dj
<
density
;
++
dj
)
{
float
center_x_temp
=
shift
/
2.
+
dj
*
shift
-
step_average
/
2.
;
float
center_y_temp
=
shift
/
2.
+
di
*
shift
-
step_average
/
2.
;
tdata
[
idx
]
=
box_width_ratio
;
tdata
[
num_priors
+
idx
]
=
box_height_ratio
;
tdata
[
2
*
num_priors
+
idx
]
=
center_x_temp
;
tdata
[
3
*
num_priors
+
idx
]
=
center_y_temp
;
idx
++
;
}
}
}
}
boxes
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
vars
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
framework
::
Tensor
d_temp
;
framework
::
TensorCopySync
(
h_temp
,
ctx
.
GetPlace
(),
&
d_temp
);
// At least use 32 threads, at most 512 threads.
// blockx is multiple of 32.
int
blockx
=
std
::
min
(((
feature_width
*
num_priors
+
31
)
>>
5
)
<<
5
,
512L
);
int
gridx
=
(
feature_width
*
num_priors
+
blockx
-
1
)
/
blockx
;
dim3
threads
(
blockx
,
1
);
dim3
grids
(
gridx
,
feature_height
);
auto
stream
=
ctx
.
template
device_context
<
platform
::
CUDADeviceContext
>().
stream
();
GenDensityPriorBox
<
T
><<<
grids
,
threads
,
0
,
stream
>>>
(
feature_height
,
feature_width
,
img_height
,
img_width
,
offset
,
step_width
,
step_height
,
num_priors
,
d_temp
.
data
<
T
>
(),
is_clip
,
variances
[
0
],
variances
[
1
],
variances
[
2
],
variances
[
3
],
boxes
->
data
<
T
>
(),
vars
->
data
<
T
>
());
}
};
// namespace operators
}
// namespace operators
}
// namespace paddle
namespace
ops
=
paddle
::
operators
;
REGISTER_OP_CUDA_KERNEL
(
density_prior_box
,
ops
::
DensityPriorBoxOpCUDAKernel
<
float
>
,
ops
::
DensityPriorBoxOpCUDAKernel
<
double
>
);
paddle/fluid/operators/detection/density_prior_box_op.h
浏览文件 @
5e0bc130
/* Copyright (c) 201
6
PaddlePaddle Authors. All Rights Reserved.
/* Copyright (c) 201
8
PaddlePaddle Authors. All Rights Reserved.
licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
...
...
@@ -52,18 +52,16 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
step_height
=
step_h
;
}
int
num_priors
=
0
;
if
(
fixed_sizes
.
size
()
>
0
&&
densities
.
size
()
>
0
)
{
for
(
size_t
i
=
0
;
i
<
densities
.
size
();
++
i
)
{
if
(
fixed_ratios
.
size
()
>
0
)
{
num_priors
+=
(
fixed_ratios
.
size
())
*
(
pow
(
densities
[
i
],
2
));
}
}
for
(
size_t
i
=
0
;
i
<
densities
.
size
();
++
i
)
{
num_priors
+=
(
fixed_ratios
.
size
())
*
(
pow
(
densities
[
i
],
2
));
}
boxes
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
vars
->
mutable_data
<
T
>
(
ctx
.
GetPlace
());
auto
e_boxes
=
framework
::
EigenTensor
<
T
,
4
>::
From
(
*
boxes
).
setConstant
(
0.0
);
auto
box_dim
=
vars
->
dims
();
boxes
->
Resize
({
feature_height
,
feature_width
,
num_priors
,
4
});
auto
e_boxes
=
framework
::
EigenTensor
<
T
,
4
>::
From
(
*
boxes
).
setConstant
(
0.0
);
int
step_average
=
static_cast
<
int
>
((
step_width
+
step_height
)
*
0.5
);
for
(
int
h
=
0
;
h
<
feature_height
;
++
h
)
{
...
...
@@ -76,36 +74,34 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
auto
fixed_size
=
fixed_sizes
[
s
];
int
density
=
densities
[
s
];
// Generate density prior boxes with fixed ratios.
if
(
fixed_ratios
.
size
()
>
0
)
{
for
(
size_t
r
=
0
;
r
<
fixed_ratios
.
size
();
++
r
)
{
float
ar
=
fixed_ratios
[
r
];
int
shift
=
step_average
/
density
;
float
box_width_ratio
=
fixed_size
*
sqrt
(
ar
);
float
box_height_ratio
=
fixed_size
/
sqrt
(
ar
);
for
(
int
di
=
0
;
di
<
density
;
++
di
)
{
for
(
int
dj
=
0
;
dj
<
density
;
++
dj
)
{
float
center_x_temp
=
center_x
-
step_average
/
2.
+
shift
/
2.
+
dj
*
shift
;
float
center_y_temp
=
center_y
-
step_average
/
2.
+
shift
/
2.
+
di
*
shift
;
e_boxes
(
h
,
w
,
idx
,
0
)
=
(
center_x_temp
-
box_width_ratio
/
2.
)
/
img_width
>=
0
?
(
center_x_temp
-
box_width_ratio
/
2.
)
/
img_width
:
0
;
e_boxes
(
h
,
w
,
idx
,
1
)
=
(
center_y_temp
-
box_height_ratio
/
2.
)
/
img_height
>=
0
?
(
center_y_temp
-
box_height_ratio
/
2.
)
/
img_height
:
0
;
e_boxes
(
h
,
w
,
idx
,
2
)
=
(
center_x_temp
+
box_width_ratio
/
2.
)
/
img_width
<=
1
?
(
center_x_temp
+
box_width_ratio
/
2.
)
/
img_width
:
1
;
e_boxes
(
h
,
w
,
idx
,
3
)
=
(
center_y_temp
+
box_height_ratio
/
2.
)
/
img_height
<=
1
?
(
center_y_temp
+
box_height_ratio
/
2.
)
/
img_height
:
1
;
idx
++
;
}
for
(
size_t
r
=
0
;
r
<
fixed_ratios
.
size
();
++
r
)
{
float
ar
=
fixed_ratios
[
r
];
int
shift
=
step_average
/
density
;
float
box_width_ratio
=
fixed_size
*
sqrt
(
ar
);
float
box_height_ratio
=
fixed_size
/
sqrt
(
ar
);
for
(
int
di
=
0
;
di
<
density
;
++
di
)
{
for
(
int
dj
=
0
;
dj
<
density
;
++
dj
)
{
float
center_x_temp
=
center_x
-
step_average
/
2.
+
shift
/
2.
+
dj
*
shift
;
float
center_y_temp
=
center_y
-
step_average
/
2.
+
shift
/
2.
+
di
*
shift
;
e_boxes
(
h
,
w
,
idx
,
0
)
=
(
center_x_temp
-
box_width_ratio
/
2.
)
/
img_width
>=
0
?
(
center_x_temp
-
box_width_ratio
/
2.
)
/
img_width
:
0
;
e_boxes
(
h
,
w
,
idx
,
1
)
=
(
center_y_temp
-
box_height_ratio
/
2.
)
/
img_height
>=
0
?
(
center_y_temp
-
box_height_ratio
/
2.
)
/
img_height
:
0
;
e_boxes
(
h
,
w
,
idx
,
2
)
=
(
center_x_temp
+
box_width_ratio
/
2.
)
/
img_width
<=
1
?
(
center_x_temp
+
box_width_ratio
/
2.
)
/
img_width
:
1
;
e_boxes
(
h
,
w
,
idx
,
3
)
=
(
center_y_temp
+
box_height_ratio
/
2.
)
/
img_height
<=
1
?
(
center_y_temp
+
box_height_ratio
/
2.
)
/
img_height
:
1
;
idx
++
;
}
}
}
...
...
@@ -139,6 +135,7 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
e_vars
=
var_et
.
broadcast
(
Eigen
::
DSizes
<
int
,
2
>
(
box_num
,
1
));
vars
->
Resize
(
var_dim
);
boxes
->
Resize
(
box_dim
);
}
};
// namespace operators
...
...
paddle/fluid/operators/math/blas_impl.cu.h
浏览文件 @
5e0bc130
...
...
@@ -16,6 +16,9 @@
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/dynload/cublas.h"
#include "paddle/fluid/platform/gpu_info.h"
DECLARE_bool
(
enable_cublas_tensor_op_math
);
namespace
paddle
{
namespace
operators
{
...
...
@@ -42,11 +45,44 @@ struct CUBlas<float> {
}
template
<
typename
...
ARGS
>
static
void
GEMM_BATCH
(
ARGS
...
args
)
{
static
void
GEMM_
STRIDED_
BATCH
(
ARGS
...
args
)
{
#if CUDA_VERSION >= 8000
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasSgemmStridedBatched
(
args
...));
#else
PADDLE_THROW
(
"SgemmStridedBatched is not supported on cuda <= 7.5"
);
#endif
}
// NOTES: GEMM_EX can use Tensor Core to accelerate matrix multiply.
// https://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode
template
<
typename
...
ARGS
>
static
void
GEMM_EX
(
platform
::
CUDADeviceContext
*
dev_ctx
,
cublasOperation_t
transa
,
cublasOperation_t
transb
,
int
m
,
int
n
,
int
k
,
const
float
*
alpha
,
const
void
*
A
,
cudaDataType_t
Atype
,
int
lda
,
const
void
*
B
,
cudaDataType_t
Btype
,
int
ldb
,
const
float
*
beta
,
void
*
C
,
cudaDataType_t
Ctype
,
int
ldc
)
{
// Because the gcc 4.8 doesn't expand template parameter pack that
// appears in a lambda-expression, I can not use template parameter pack
// here.
auto
cublas_call
=
[
&
]()
{
#if CUDA_VERSION >= 8000
VLOG
(
5
)
<<
"use_tensor_op_math: "
<<
(
platform
::
TensorCoreAvailable
()
?
"True"
:
"False"
);
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasSgemmEx
(
dev_ctx
->
cublas_handle
(),
transa
,
transb
,
m
,
n
,
k
,
alpha
,
A
,
Atype
,
lda
,
B
,
Btype
,
ldb
,
beta
,
C
,
Ctype
,
ldc
));
#else
PADDLE_THROW
(
"cublasSgemmEx is supported on cuda >= 8.0"
);
#endif
};
#if CUDA_VERSION >= 9000
// NOTES: To use Tensor Core, we should change the cublas config,
// but the cublas may be hold by multi-thread.
dev_ctx
->
CublasCall
(
cublas_call
,
CUBLAS_TENSOR_OP_MATH
);
#else
cublas_call
();
#endif
}
};
...
...
@@ -69,13 +105,18 @@ struct CUBlas<double> {
}
template
<
typename
...
ARGS
>
static
void
GEMM_BATCH
(
ARGS
...
args
)
{
static
void
GEMM_
STRIDED_
BATCH
(
ARGS
...
args
)
{
#if CUDA_VERSION >= 8000
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasDgemmStridedBatched
(
args
...));
#else
PADDLE_THROW
(
"DgemmStridedBatched is not supported on cuda <= 7.5"
);
#endif
}
template
<
typename
...
ARGS
>
static
void
GEMM_EX
(
ARGS
...
args
)
{
PADDLE_THROW
(
"Currently there are not cublasDgemmEx."
);
}
};
template
<
>
...
...
@@ -96,14 +137,16 @@ struct CUBlas<platform::float16> {
reinterpret_cast
<
__half
*>
(
C
),
ldc
));
}
static
void
GEMM_BATCH
(
cublasHandle_t
handle
,
cublasOperation_t
transa
,
cublasOperation_t
transb
,
int
m
,
int
n
,
int
k
,
const
float16
*
alpha
,
const
float16
*
A
,
int
lda
,
long
long
int
strideA
,
const
float16
*
B
,
// NOLINT
int
ldb
,
long
long
int
strideB
,
// NOLINT
const
float16
*
beta
,
float16
*
C
,
int
ldc
,
long
long
int
strideC
,
// NOLINT
int
batchCount
)
{
static
void
GEMM_STRIDED_BATCH
(
cublasHandle_t
handle
,
cublasOperation_t
transa
,
cublasOperation_t
transb
,
int
m
,
int
n
,
int
k
,
const
float16
*
alpha
,
const
float16
*
A
,
int
lda
,
long
long
int
strideA
,
// NOLINT
const
float16
*
B
,
// NOLINT
int
ldb
,
long
long
int
strideB
,
// NOLINT
const
float16
*
beta
,
float16
*
C
,
int
ldc
,
long
long
int
strideC
,
// NOLINT
int
batchCount
)
{
#if CUDA_VERSION >= 8000
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasHgemmStridedBatched
(
handle
,
transa
,
transb
,
m
,
n
,
k
,
...
...
@@ -114,6 +157,45 @@ struct CUBlas<platform::float16> {
ldc
,
strideC
,
batchCount
));
#else
PADDLE_THROW
(
"HgemmStridedBatched is not supported on cuda <= 7.5"
);
#endif
}
// NOTES: GEMM_EX can use Tensor Core to accelerate matrix multiply.
// https://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode
template
<
typename
...
ARGS
>
static
void
GEMM_EX
(
platform
::
CUDADeviceContext
*
dev_ctx
,
cublasOperation_t
transa
,
cublasOperation_t
transb
,
int
m
,
int
n
,
int
k
,
const
void
*
alpha
,
const
void
*
A
,
cudaDataType_t
Atype
,
int
lda
,
const
void
*
B
,
cudaDataType_t
Btype
,
int
ldb
,
const
void
*
beta
,
void
*
C
,
cudaDataType_t
Ctype
,
int
ldc
,
cudaDataType_t
computeType
)
{
auto
cublas_call
=
[
&
]()
{
#if CUDA_VERSION >= 8000
cublasGemmAlgo_t
algo
=
CUBLAS_GEMM_DFALT
;
#if CUDA_VERSION >= 9000
bool
use_tensor_op_math
=
platform
::
TensorCoreAvailable
();
if
(
use_tensor_op_math
)
{
algo
=
CUBLAS_GEMM_DFALT_TENSOR_OP
;
}
VLOG
(
5
)
<<
"use_tensor_op_math: "
<<
(
use_tensor_op_math
?
"True"
:
"False"
);
#endif // CUDA_VERSION >= 9000
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasGemmEx
(
dev_ctx
->
cublas_handle
(),
transa
,
transb
,
m
,
n
,
k
,
alpha
,
A
,
Atype
,
lda
,
B
,
Btype
,
ldb
,
beta
,
C
,
Ctype
,
ldc
,
computeType
,
algo
));
#else
PADDLE_THROW
(
"cublasGemmEx is supported on cuda >= 8.0"
);
#endif
};
#if CUDA_VERSION >= 9000
// NOTES: To use Tensor Core, we should change the cublas config,
// but the cublas may be hold by multi-thread.
dev_ctx
->
CublasCall
(
cublas_call
,
CUBLAS_TENSOR_OP_MATH
);
#else
cublas_call
();
#endif
}
};
...
...
@@ -133,8 +215,21 @@ void Blas<platform::CUDADeviceContext>::GEMM(CBLAS_TRANSPOSE transA,
cublasOperation_t
cuTransB
=
(
transB
==
CblasNoTrans
)
?
CUBLAS_OP_N
:
CUBLAS_OP_T
;
CUBlas
<
T
>::
GEMM
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
ldb
,
A
,
lda
,
&
beta
,
C
,
N
);
#if CUDA_VERSION >= 8000
if
(
FLAGS_enable_cublas_tensor_op_math
&&
std
::
is_same
<
T
,
float
>::
value
)
{
auto
&
cuda_ctx
=
const_cast
<
platform
::
CUDADeviceContext
&>
(
context_
);
CUBlas
<
T
>::
GEMM_EX
(
&
cuda_ctx
,
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
CUDA_R_32F
,
ldb
,
A
,
CUDA_R_32F
,
lda
,
&
beta
,
C
,
CUDA_R_32F
,
N
);
}
else
{
#endif // CUDA_VERSION >= 8000
CUBlas
<
T
>::
GEMM
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
ldb
,
A
,
lda
,
&
beta
,
C
,
N
);
#if CUDA_VERSION >= 8000
}
#endif // CUDA_VERSION >= 8000
}
template
<
>
...
...
@@ -157,30 +252,18 @@ inline void Blas<platform::CUDADeviceContext>::GEMM(
PADDLE_ENFORCE_GE
(
context_
.
GetComputeCapability
(),
53
,
"cublas fp16 gemm requires GPU compute capability >= 53"
);
#if CUDA_VERSION >= 8000
float
h_alpha
=
static_cast
<
float
>
(
alpha
);
float
h_beta
=
static_cast
<
float
>
(
beta
);
cublasGemmAlgo_t
algo
=
CUBLAS_GEMM_DFALT
;
#if CUDA_VERSION >= 9000
if
(
context_
.
GetComputeCapability
()
>=
70
)
{
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasSetMathMode
(
context_
.
cublas_handle
(),
CUBLAS_TENSOR_OP_MATH
));
algo
=
CUBLAS_GEMM_DFALT_TENSOR_OP
;
}
else
{
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasSetMathMode
(
context_
.
cublas_handle
(),
CUBLAS_DEFAULT_MATH
));
}
#endif // CUDA_VERSION >= 9000
#if CUDA_VERSION >= 8000
// cublasHgemm does true FP16 computation which is slow for non-Volta
// GPUs. So use cublasGemmEx instead which does pesudo FP16 computation:
// input/output in fp16, computation in fp32, which can also be accelerated
// using tensor cores in volta GPUs.
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasGemmEx
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
h_alpha
,
B
,
CUDA_R_16F
,
ldb
,
A
,
CUDA_R_16F
,
lda
,
&
h_beta
,
C
,
CUDA_R_16F
,
N
,
CUDA_R_
32F
,
algo
)
);
auto
&
cuda_ctx
=
const_cast
<
platform
::
CUDADeviceContext
&>
(
context_
);
CUBlas
<
platform
::
float16
>::
GEMM_EX
(
&
cuda_ctx
,
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
h_alpha
,
B
,
CUDA_R_16F
,
ldb
,
A
,
CUDA_R_
16F
,
lda
,
&
h_beta
,
C
,
CUDA_R_16F
,
N
,
CUDA_R_32F
);
#else
// CUDA 7.5 does not support cublasGemmEx, hence we fall back to use hgemm
CUBlas
<
platform
::
float16
>::
GEMM
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
...
...
@@ -199,8 +282,38 @@ void Blas<platform::CUDADeviceContext>::GEMM(bool transA, bool transB, int M,
// the cblas convention.
cublasOperation_t
cuTransA
=
transA
?
CUBLAS_OP_T
:
CUBLAS_OP_N
;
cublasOperation_t
cuTransB
=
transB
?
CUBLAS_OP_T
:
CUBLAS_OP_N
;
CUBlas
<
T
>::
GEMM
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
ldb
,
A
,
lda
,
&
beta
,
C
,
ldc
);
#if CUDA_VERSION >= 8000
if
(
FLAGS_enable_cublas_tensor_op_math
&&
std
::
is_same
<
T
,
float
>::
value
)
{
auto
&
cuda_ctx
=
const_cast
<
platform
::
CUDADeviceContext
&>
(
context_
);
CUBlas
<
T
>::
GEMM_EX
(
&
cuda_ctx
,
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
CUDA_R_32F
,
ldb
,
A
,
CUDA_R_32F
,
lda
,
&
beta
,
C
,
CUDA_R_32F
,
ldc
);
}
else
{
#endif // CUDA_VERSION >= 8000
CUBlas
<
T
>::
GEMM
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
ldb
,
A
,
lda
,
&
beta
,
C
,
ldc
);
#if CUDA_VERSION >= 8000
}
#endif // CUDA_VERSION >= 8000
}
template
<
>
template
<
>
inline
void
Blas
<
platform
::
CUDADeviceContext
>::
GEMM
(
bool
transA
,
bool
transB
,
int
M
,
int
N
,
int
K
,
platform
::
float16
alpha
,
const
platform
::
float16
*
A
,
int
lda
,
const
platform
::
float16
*
B
,
int
ldb
,
platform
::
float16
beta
,
platform
::
float16
*
C
,
int
ldc
)
const
{
// Note that cublas follows fortran order, so the order is different from
// the cblas convention.
cublasOperation_t
cuTransA
=
transA
?
CUBLAS_OP_T
:
CUBLAS_OP_N
;
cublasOperation_t
cuTransB
=
transB
?
CUBLAS_OP_T
:
CUBLAS_OP_N
;
CUBlas
<
platform
::
float16
>::
GEMM
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
ldb
,
A
,
lda
,
&
beta
,
C
,
ldc
);
}
template
<
>
...
...
@@ -238,9 +351,34 @@ void Blas<platform::CUDADeviceContext>::BatchedGEMM(
(
transB
==
CblasNoTrans
)
?
CUBLAS_OP_N
:
CUBLAS_OP_T
;
const
int64_t
strideC
=
M
*
N
;
CUBlas
<
T
>::
GEMM_BATCH
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
ldb
,
strideB
,
A
,
lda
,
strideA
,
&
beta
,
C
,
ldc
,
strideC
,
batchCount
);
#if CUDA_VERSION >= 9010
if
(
FLAGS_enable_cublas_tensor_op_math
&&
std
::
is_same
<
T
,
float
>::
value
)
{
auto
cublas_call
=
[
&
]()
{
cublasGemmAlgo_t
algo
=
CUBLAS_GEMM_DFALT
;
bool
use_tensor_op_math
=
platform
::
TensorCoreAvailable
();
if
(
use_tensor_op_math
)
{
algo
=
CUBLAS_GEMM_DFALT_TENSOR_OP
;
}
VLOG
(
5
)
<<
"use_tensor_op_math: "
<<
(
use_tensor_op_math
?
"True"
:
"False"
);
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasGemmStridedBatchedEx
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
CUDA_R_32F
,
ldb
,
strideB
,
A
,
CUDA_R_32F
,
lda
,
strideA
,
&
beta
,
C
,
CUDA_R_32F
,
ldc
,
strideC
,
batchCount
,
CUDA_R_32F
,
algo
));
};
auto
&
dev_ctx
=
const_cast
<
platform
::
CUDADeviceContext
&>
(
context_
);
dev_ctx
.
CublasCall
(
cublas_call
,
CUBLAS_TENSOR_OP_MATH
);
}
else
{
#endif // CUDA_VERSION >= 9010
CUBlas
<
T
>::
GEMM_STRIDED_BATCH
(
context_
.
cublas_handle
(),
cuTransB
,
cuTransA
,
N
,
M
,
K
,
&
alpha
,
B
,
ldb
,
strideB
,
A
,
lda
,
strideA
,
&
beta
,
C
,
ldc
,
strideC
,
batchCount
);
#if CUDA_VERSION >= 9010
}
#endif // CUDA_VERSION >= 9010
}
}
// namespace math
...
...
paddle/fluid/platform/device_context.h
浏览文件 @
5e0bc130
...
...
@@ -143,6 +143,39 @@ class CudnnWorkspaceHandle {
std
::
unique_ptr
<
std
::
lock_guard
<
std
::
mutex
>>
guard_
;
};
#if CUDA_VERSION >= 9000
class
ScopedCublasMathMode
{
public:
ScopedCublasMathMode
(
cublasHandle_t
handle
,
cublasMath_t
new_math_mode
)
:
handle_
(
handle
)
{
need_reset
=
false
;
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasGetMathMode
(
handle_
,
&
old_math_mode_
),
"Failed to get old cublas math mode"
);
if
(
old_math_mode_
!=
new_math_mode
)
{
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasSetMathMode
(
handle_
,
new_math_mode
),
"Failed to set old cublas math mode"
);
need_reset
=
true
;
}
}
~
ScopedCublasMathMode
()
{
if
(
need_reset
)
{
PADDLE_ENFORCE
(
platform
::
dynload
::
cublasSetMathMode
(
handle_
,
old_math_mode_
),
"Failed to set old cublas math mode"
);
}
}
private:
cublasHandle_t
handle_
;
cublasMath_t
old_math_mode_
;
bool
need_reset
;
};
#endif
class
CUDADeviceContext
:
public
DeviceContext
{
public:
explicit
CUDADeviceContext
(
CUDAPlace
place
);
...
...
@@ -199,6 +232,18 @@ class CUDADeviceContext : public DeviceContext {
callback_manager_
->
Wait
();
}
#if CUDA_VERSION >= 9000
/*! \brief CublasCall may need to change cublas's config,
* but the cublas may be hold by multi-thread, so we should
* add lock here. */
template
<
typename
Callback
>
void
CublasCall
(
Callback
callback
,
cublasMath_t
new_math
)
{
std
::
lock_guard
<
std
::
mutex
>
guard
(
cublas_mtx_
);
ScopedCublasMathMode
scoped_cublas_math
(
cublas_handle_
,
new_math
);
callback
();
}
#endif
private:
CUDAPlace
place_
;
...
...
@@ -220,6 +265,8 @@ class CUDADeviceContext : public DeviceContext {
// If we use mtx_ for StreamCallbackManager, deadlock may occur sometimes
mutable
std
::
mutex
callback_mtx_
;
std
::
unique_ptr
<
StreamCallbackManager
>
callback_manager_
;
mutable
std
::
mutex
cublas_mtx_
;
};
template
<
>
...
...
paddle/fluid/platform/dynload/cublas.h
浏览文件 @
5e0bc130
...
...
@@ -61,9 +61,6 @@ extern void *cublas_dso_handle;
extern DynLoad__##__name __name
#endif
#define DECLARE_DYNAMIC_LOAD_CUBLAS_V2_WRAP(__name) \
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP(__name)
#define CUBLAS_BLAS_ROUTINE_EACH(__macro) \
__macro(cublasSaxpy_v2); \
__macro(cublasDaxpy_v2); \
...
...
@@ -93,22 +90,23 @@ CUBLAS_BLAS_ROUTINE_EACH(DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP)
// APIs available after CUDA 8.0
#if CUDA_VERSION >= 8000
#define CUBLAS_BLAS_ROUTINE_EACH_R2(__macro) \
__macro(cublasGemmEx); \
__macro(cublasSgemmStridedBatched); \
__macro(cublasDgemmStridedBatched); \
__macro(cublasCgemmStridedBatched); \
__macro(cublasZgemmStridedBatched); \
__macro(cublasHgemmStridedBatched);
CUBLAS_BLAS_ROUTINE_EACH_R2
(
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
)
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasGemmEx
);
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasSgemmStridedBatched
);
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasDgemmStridedBatched
);
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasCgemmStridedBatched
);
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasZgemmStridedBatched
);
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasHgemmStridedBatched
);
#endif
// APIs available after CUDA 9.0
#if CUDA_VERSION >= 9000
#define CUBLAS_BLAS_ROUTINE_EACH_R3(__macro) __macro(cublasSetMathMode);
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasSetMathMode
);
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasGetMathMode
);
#endif
CUBLAS_BLAS_ROUTINE_EACH_R3
(
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
)
#if CUDA_VERSION >= 9010
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasGemmBatchedEx
);
DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
(
cublasGemmStridedBatchedEx
);
#endif
#undef DECLARE_DYNAMIC_LOAD_CUBLAS_WRAP
...
...
paddle/fluid/platform/gpu_info.cc
浏览文件 @
5e0bc130
...
...
@@ -26,6 +26,16 @@ DEFINE_double(fraction_of_gpu_memory_to_use, 0.92,
"additional trunks of the same size will be requested from gpu "
"until the gpu has no memory left for another trunk."
);
DEFINE_bool
(
enable_cublas_tensor_op_math
,
false
,
"The enable_cublas_tensor_op_math indicate whether to use Tensor Core, "
"but it may loss precision. Currently, There are two CUDA libraries that"
" use Tensor Cores, cuBLAS and cuDNN. cuBLAS uses Tensor Cores to speed up"
" GEMM computations(the matrices must be either half precision or single "
"precision); cuDNN uses Tensor Cores to speed up both convolutions(the "
"input and output must be half precision) and recurrent neural networks "
"(RNNs)."
);
namespace
paddle
{
namespace
platform
{
...
...
@@ -64,6 +74,16 @@ int GetCUDADriverVersion(int id) {
return
driver_version
;
}
bool
TensorCoreAvailable
()
{
#if CUDA_VERSION >= 9000
int
device
=
GetCurrentDeviceId
();
int
driver_version
=
GetCUDAComputeCapability
(
device
);
return
driver_version
>=
70
;
#else
return
false
;
#endif
}
int
GetCUDAMultiProcessors
(
int
id
)
{
PADDLE_ENFORCE_LT
(
id
,
GetCUDADeviceCount
(),
"id must less than GPU count"
);
int
count
;
...
...
paddle/fluid/platform/gpu_info.h
浏览文件 @
5e0bc130
...
...
@@ -35,6 +35,9 @@ int GetCUDARuntimeVersion(int id);
//! Get the driver version of the ith GPU
int
GetCUDADriverVersion
(
int
id
);
//! Wheter the current device support TensorCore
bool
TensorCoreAvailable
();
//! Get the MultiProcessors of the ith GPU.
int
GetCUDAMultiProcessors
(
int
i
);
...
...
paddle/fluid/pybind/CMakeLists.txt
浏览文件 @
5e0bc130
...
...
@@ -5,8 +5,8 @@ if(WITH_PYTHON)
if
(
WITH_AMD_GPU
)
hip_library
(
paddle_pybind SHARED
SRCS
${
PYBIND_SRCS
}
DEPS
${
PYBIND_DEPS
}
${
GLOB_OP_LIB
}
${
GLOB_OPERATOR_DEPS
}
)
DEPS
ARCHIVE_START
${
PYBIND_DEPS
}
${
GLOB_OP_LIB
}
${
GLOB_OPERATOR_DEPS
}
ARCHIVE_END
)
else
()
cc_library
(
paddle_pybind SHARED
SRCS
${
PYBIND_SRCS
}
...
...
paddle/scripts/paddle_build.sh
浏览文件 @
5e0bc130
...
...
@@ -94,6 +94,30 @@ function cmake_gen() {
else
exit
1
fi
elif
[
"
$1
"
==
"cp36-cp36m"
]
;
then
if
[
-d
"/Library/Frameworks/Python.framework/Versions/3.6"
]
;
then
export
LD_LIBRARY_PATH
=
/Library/Frameworks/Python.framework/Versions/3.6/lib/
export
DYLD_LIBRARY_PATH
=
/Library/Frameworks/Python.framework/Versions/3.6/lib/
export
PATH
=
/Library/Frameworks/Python.framework/Versions/3.6/bin/:
${
PATH
}
PYTHON_FLAGS
=
"-DPYTHON_EXECUTABLE:FILEPATH=/Library/Frameworks/Python.framework/Versions/3.6/bin/python3
-DPYTHON_INCLUDE_DIR:PATH=/Library/Frameworks/Python.framework/Versions/3.6/include/python3.6m/
-DPYTHON_LIBRARY:FILEPATH=/Library/Frameworks/Python.framework/Versions/3.6/lib/libpython3.6m.dylib"
WITH_FLUID_ONLY
=
${
WITH_FLUID_ONLY
:-
ON
}
else
exit
1
fi
elif
[
"
$1
"
==
"cp37-cp37m"
]
;
then
if
[
-d
"/Library/Frameworks/Python.framework/Versions/3.7"
]
;
then
export
LD_LIBRARY_PATH
=
/Library/Frameworks/Python.framework/Versions/3.7/lib/
export
DYLD_LIBRARY_PATH
=
/Library/Frameworks/Python.framework/Versions/3.7/lib/
export
PATH
=
/Library/Frameworks/Python.framework/Versions/3.7/bin/:
${
PATH
}
PYTHON_FLAGS
=
"-DPYTHON_EXECUTABLE:FILEPATH=/Library/Frameworks/Python.framework/Versions/3.7/bin/python3
-DPYTHON_INCLUDE_DIR:PATH=/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m/
-DPYTHON_LIBRARY:FILEPATH=/Library/Frameworks/Python.framework/Versions/3.7/lib/libpython3.7m.dylib"
WITH_FLUID_ONLY
=
${
WITH_FLUID_ONLY
:-
ON
}
else
exit
1
fi
fi
else
if
[
"
$1
"
!=
""
]
;
then
...
...
@@ -116,6 +140,18 @@ function cmake_gen() {
export
PYTHON_FLAGS
=
"-DPYTHON_EXECUTABLE:FILEPATH=/opt/_internal/cpython-3.5.1/bin/python3
-DPYTHON_INCLUDE_DIR:PATH=/opt/_internal/cpython-3.5.1/include/python3.5m
-DPYTHON_LIBRARIES:FILEPATH=/opt/_internal/cpython-3.5.1/lib/libpython3.so"
elif
[
"
$1
"
==
"cp36-cp36m"
]
;
then
export
LD_LIBRARY_PATH
=
/opt/_internal/cpython-3.6.0/lib/:
${
LD_LIBRARY_PATH
}
export
PATH
=
/opt/_internal/cpython-3.6.0/bin/:
${
PATH
}
export
PYTHON_FLAGS
=
"-DPYTHON_EXECUTABLE:FILEPATH=/opt/_internal/cpython-3.6.0/bin/python3
-DPYTHON_INCLUDE_DIR:PATH=/opt/_internal/cpython-3.6.0/include/python3.6m
-DPYTHON_LIBRARIES:FILEPATH=/opt/_internal/cpython-3.6.0/lib/libpython3.so"
elif
[
"
$1
"
==
"cp37-cp37m"
]
;
then
export
LD_LIBRARY_PATH
=
/opt/_internal/cpython-3.7.0/lib/:
${
LD_LIBRARY_PATH
}
export
PATH
=
/opt/_internal/cpython-3.7.0/bin/:
${
PATH
}
export
PYTHON_FLAGS
=
"-DPYTHON_EXECUTABLE:FILEPATH=/opt/_internal/cpython-3.7.0/bin/python3
-DPYTHON_INCLUDE_DIR:PATH=/opt/_internal/cpython-3.7.0/include/python3.7m
-DPYTHON_LIBRARIES:FILEPATH=/opt/_internal/cpython-3.7.0/lib/libpython3.so"
fi
fi
fi
...
...
@@ -419,7 +455,7 @@ function assert_api_not_changed() {
source
.env/bin/activate
pip
install
${
PADDLE_ROOT
}
/build/python/dist/
*
whl
python
${
PADDLE_ROOT
}
/tools/print_signatures.py paddle.fluid
>
new.spec
if
[
"
$1
"
==
"cp35-cp35m"
]
;
then
if
[
"
$1
"
==
"cp35-cp35m"
]
||
[
"
$1
"
==
"cp36-cp36m"
]
||
[
"
$1
"
==
"cp37-cp37m"
]
;
then
# Use sed to make python2 and python3 sepc keeps the same
sed
-i
's/arg0: str/arg0: unicode/g'
new.spec
sed
-i
"s/
\(
.*Transpiler.*
\)
.__init__ ArgSpec(args=
\[
'self'].*/
\1
.__init__ /g"
new.spec
...
...
paddle/testing/paddle_gtest_main.cc
浏览文件 @
5e0bc130
...
...
@@ -28,7 +28,7 @@ int main(int argc, char** argv) {
for
(
int
i
=
0
;
i
<
argc
;
++
i
)
{
new_argv
.
push_back
(
argv
[
i
]);
}
#if
def PADDLE_WITH_CUDA
#if
defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
new_argv
.
push_back
(
strdup
(
"--tryfromenv=fraction_of_gpu_memory_to_use,allocator_strategy"
));
#else
...
...
python/paddle/fluid/__init__.py
浏览文件 @
5e0bc130
...
...
@@ -133,7 +133,8 @@ def __bootstrap__():
if
core
.
is_compiled_with_cuda
():
read_env_flags
+=
[
'fraction_of_gpu_memory_to_use'
,
'cudnn_deterministic'
,
'conv_workspace_size_limit'
,
'cudnn_exhaustive_search'
'enable_cublas_tensor_op_math'
,
'conv_workspace_size_limit'
,
'cudnn_exhaustive_search'
]
core
.
init_gflags
([
sys
.
argv
[
0
]]
+
[
"--tryfromenv="
+
","
.
join
(
read_env_flags
)])
...
...
python/paddle/fluid/layers/detection.py
浏览文件 @
5e0bc130
...
...
@@ -1029,6 +1029,7 @@ def density_prior_box(input,
clip
=
False
,
steps
=
[
0.0
,
0.0
],
offset
=
0.5
,
flatten_to_2d
=
False
,
name
=
None
):
"""
**Density Prior Box Operator**
...
...
@@ -1065,22 +1066,24 @@ def density_prior_box(input,
height/weight of the input will be automatically calculated.
Default: [0., 0.]
offset(float): Prior boxes center offset. Default: 0.5
flatten_to_2d(bool): Whether to flatten output prior boxes and variance
to 2D shape, the second dim is 4. Default: False.
name(str): Name of the density prior box op. Default: None.
Returns:
tuple: A tuple with two Variable (boxes, variances)
boxes: the output density prior boxes of PriorBox.
The layout is [H, W, num_priors, 4]
.
H is the height of input, W is the width of input,
num_priors is the total
box count of each position of input.
The layout is [H, W, num_priors, 4] when flatten_to_2d is False
.
The layout is [H * W * num_priors, 4] when flatten_to_2d is True.
H is the height of input, W is the width of input,
num_priors is the total
box count of each position of input.
variances: the expanded variances of PriorBox.
The layout is [H, W, num_priors, 4]
.
H is the height of input, W is the width of input
num_priors is the total
box count of each position of input
The layout is [H, W, num_priors, 4] when flatten_to_2d is False
.
The layout is [H * W * num_priors, 4] when flatten_to_2d is True.
H is the height of input, W is the width of input
num_priors is the total box count of each position of input.
Examples:
...
...
@@ -1089,14 +1092,11 @@ def density_prior_box(input,
box, var = fluid.layers.density_prior_box(
input=conv1,
image=images,
min_sizes=[100.],
max_sizes=[200.],
aspect_ratios=[1.0, 1.0 / 2.0, 2.0],
densities=[3, 4],
fixed_sizes=[50., 60.],
fixed_ratios=[1.0, 3.0, 1.0 / 3.0],
flip=True,
clip=True)
densities=[4, 2, 1],
fixed_sizes=[32.0, 64.0, 128.0],
fixed_ratios=[1.],
clip=True,
flatten_to_2d=True)
"""
helper
=
LayerHelper
(
"density_prior_box"
,
**
locals
())
dtype
=
helper
.
input_dtype
()
...
...
@@ -1127,14 +1127,11 @@ def density_prior_box(input,
'step_w'
:
steps
[
0
],
'step_h'
:
steps
[
1
],
'offset'
:
offset
,
'densities'
:
densities
,
'fixed_sizes'
:
fixed_sizes
,
'fixed_ratios'
:
fixed_ratios
,
'flatten_to_2d'
:
flatten_to_2d
,
}
if
densities
is
not
None
and
len
(
densities
)
>
0
:
attrs
[
'densities'
]
=
densities
if
fixed_sizes
is
not
None
and
len
(
fixed_sizes
)
>
0
:
attrs
[
'fixed_sizes'
]
=
fixed_sizes
if
fixed_ratios
is
not
None
and
len
(
fixed_ratios
)
>
0
:
attrs
[
'fixed_ratios'
]
=
fixed_ratios
box
=
helper
.
create_variable_for_type_inference
(
dtype
)
var
=
helper
.
create_variable_for_type_inference
(
dtype
)
helper
.
append_op
(
...
...
python/paddle/fluid/layers/nn.py
浏览文件 @
5e0bc130
...
...
@@ -2134,11 +2134,16 @@ def pool2d(input,
input tensor is NCHW, where N is batch size, C is
the number of channels, H is the height of the
feature, and W is the width of the feature.
pool_size (int): The side length of pooling windows. All pooling
windows are squares with pool_size on a side.
pool_size (int|list|tuple): The pool kernel size. If pool kernel size is a tuple or list,
it must contain two integers, (pool_size_Height, pool_size_Width).
Otherwise, the pool kernel size will be a square of an int.
pool_type: ${pooling_type_comment}
pool_stride (int): stride of the pooling layer.
pool_padding (int): padding size.
pool_stride (int|list|tuple): The pool stride size. If pool stride size is a tuple or list,
it must contain two integers, (pool_stride_Height, pool_stride_Width).
Otherwise, the pool stride size will be a square of an int.
pool_padding (int|list|tuple): The pool padding size. If pool padding size is a tuple,
it must contain two integers, (pool_padding_on_Height, pool_padding_on_Width).
Otherwise, the pool padding size will be a square of an int.
global_pooling (bool): ${global_pooling_comment}
use_cudnn (bool): ${use_cudnn_comment}
ceil_mode (bool): ${ceil_mode_comment}
...
...
python/paddle/fluid/nets.py
浏览文件 @
5e0bc130
...
...
@@ -250,7 +250,8 @@ def sequence_conv_pool(input,
filter_size
,
param_attr
=
None
,
act
=
"sigmoid"
,
pool_type
=
"max"
):
pool_type
=
"max"
,
bias_attr
=
None
):
"""
The sequence_conv_pool is composed with Sequence Convolution and Pooling.
...
...
@@ -266,6 +267,11 @@ def sequence_conv_pool(input,
pool_type (str): Pooling type can be :math:`max` for max-pooling, :math:`average` for
average-pooling, :math:`sum` for sum-pooling, :math:`sqrt` for sqrt-pooling.
Default :math:`max`.
bias_attr (ParamAttr|bool|None): The parameter attribute for the bias of sequence_conv.
If it is set to False, no bias will be added to the output units.
If it is set to None or one attribute of ParamAttr, sequence_conv
will create ParamAttr as bias_attr. If the Initializer of the bias_attr
is not set, the bias is initialized zero. Default: None.
Return:
Variable: The final result after Sequence Convolution and Pooling.
...
...
@@ -289,6 +295,7 @@ def sequence_conv_pool(input,
num_filters
=
num_filters
,
filter_size
=
filter_size
,
param_attr
=
param_attr
,
bias_attr
=
bias_attr
,
act
=
act
)
pool_out
=
layers
.
sequence_pool
(
input
=
conv_out
,
pool_type
=
pool_type
)
...
...
python/paddle/fluid/tests/test_detection.py
浏览文件 @
5e0bc130
...
...
@@ -112,38 +112,42 @@ class TestDetection(unittest.TestCase):
class
TestPriorBox
(
unittest
.
TestCase
):
def
test_prior_box
(
self
):
data_shape
=
[
3
,
224
,
224
]
images
=
fluid
.
layers
.
data
(
name
=
'pixel'
,
shape
=
data_shape
,
dtype
=
'float32'
)
conv1
=
fluid
.
layers
.
conv2d
(
images
,
3
,
3
,
2
)
box
,
var
=
layers
.
prior_box
(
input
=
conv1
,
image
=
images
,
min_sizes
=
[
100.0
],
aspect_ratios
=
[
1.
],
flip
=
True
,
clip
=
True
)
assert
len
(
box
.
shape
)
==
4
assert
box
.
shape
==
var
.
shape
assert
box
.
shape
[
3
]
==
4
program
=
Program
()
with
program_guard
(
program
):
data_shape
=
[
3
,
224
,
224
]
images
=
fluid
.
layers
.
data
(
name
=
'pixel'
,
shape
=
data_shape
,
dtype
=
'float32'
)
conv1
=
fluid
.
layers
.
conv2d
(
images
,
3
,
3
,
2
)
box
,
var
=
layers
.
prior_box
(
input
=
conv1
,
image
=
images
,
min_sizes
=
[
100.0
],
aspect_ratios
=
[
1.
],
flip
=
True
,
clip
=
True
)
assert
len
(
box
.
shape
)
==
4
assert
box
.
shape
==
var
.
shape
assert
box
.
shape
[
3
]
==
4
class
TestDensityPriorBox
(
unittest
.
TestCase
):
def
test_density_prior_box
(
self
):
data_shape
=
[
3
,
224
,
224
]
images
=
fluid
.
layers
.
data
(
name
=
'pixel'
,
shape
=
data_shape
,
dtype
=
'float32'
)
conv1
=
fluid
.
layers
.
conv2d
(
images
,
3
,
3
,
2
)
box
,
var
=
layers
.
density_prior_box
(
input
=
conv1
,
image
=
images
,
densities
=
[
3
,
4
],
fixed_sizes
=
[
50.
,
60.
],
fixed_ratios
=
[
1.0
],
clip
=
True
)
assert
len
(
box
.
shape
)
==
4
assert
box
.
shape
==
var
.
shape
assert
box
.
shape
[
3
]
==
4
program
=
Program
()
with
program_guard
(
program
):
data_shape
=
[
3
,
224
,
224
]
images
=
fluid
.
layers
.
data
(
name
=
'pixel'
,
shape
=
data_shape
,
dtype
=
'float32'
)
conv1
=
fluid
.
layers
.
conv2d
(
images
,
3
,
3
,
2
)
box
,
var
=
layers
.
density_prior_box
(
input
=
conv1
,
image
=
images
,
densities
=
[
3
,
4
],
fixed_sizes
=
[
50.
,
60.
],
fixed_ratios
=
[
1.0
],
clip
=
True
)
assert
len
(
box
.
shape
)
==
4
assert
box
.
shape
==
var
.
shape
assert
box
.
shape
[
-
1
]
==
4
class
TestAnchorGenerator
(
unittest
.
TestCase
):
...
...
python/paddle/fluid/tests/unittests/test_density_prior_box_op.py
浏览文件 @
5e0bc130
...
...
@@ -36,7 +36,8 @@ class TestDensityPriorBoxOp(OpTest):
'offset'
:
self
.
offset
,
'densities'
:
self
.
densities
,
'fixed_sizes'
:
self
.
fixed_sizes
,
'fixed_ratios'
:
self
.
fixed_ratios
'fixed_ratios'
:
self
.
fixed_ratios
,
'flatten_to_2d'
:
self
.
flatten_to_2d
}
self
.
outputs
=
{
'Boxes'
:
self
.
out_boxes
,
'Variances'
:
self
.
out_var
}
...
...
@@ -48,16 +49,17 @@ class TestDensityPriorBoxOp(OpTest):
self
.
set_data
()
def
set_density
(
self
):
self
.
densities
=
[]
self
.
fixed_sizes
=
[]
self
.
fixed_ratios
=
[]
self
.
densities
=
[
4
,
2
,
1
]
self
.
fixed_sizes
=
[
32.0
,
64.0
,
128.0
]
self
.
fixed_ratios
=
[
1.0
]
self
.
layer_w
=
17
self
.
layer_h
=
17
self
.
image_w
=
533
self
.
image_h
=
533
self
.
flatten_to_2d
=
False
def
init_test_params
(
self
):
self
.
layer_w
=
32
self
.
layer_h
=
32
self
.
image_w
=
40
self
.
image_h
=
40
self
.
set_density
()
self
.
step_w
=
float
(
self
.
image_w
)
/
float
(
self
.
layer_w
)
self
.
step_h
=
float
(
self
.
image_h
)
/
float
(
self
.
layer_h
)
...
...
@@ -69,8 +71,6 @@ class TestDensityPriorBoxOp(OpTest):
self
.
variances
=
[
0.1
,
0.1
,
0.2
,
0.2
]
self
.
variances
=
np
.
array
(
self
.
variances
,
dtype
=
np
.
float
).
flatten
()
self
.
set_density
()
self
.
clip
=
True
self
.
num_priors
=
0
if
len
(
self
.
fixed_sizes
)
>
0
and
len
(
self
.
densities
)
>
0
:
...
...
@@ -129,6 +129,9 @@ class TestDensityPriorBoxOp(OpTest):
(
self
.
layer_h
,
self
.
layer_w
,
self
.
num_priors
,
1
))
self
.
out_boxes
=
out_boxes
.
astype
(
'float32'
)
self
.
out_var
=
out_var
.
astype
(
'float32'
)
if
self
.
flatten_to_2d
:
self
.
out_boxes
=
self
.
out_boxes
.
reshape
((
-
1
,
4
))
self
.
out_var
=
self
.
out_var
.
reshape
((
-
1
,
4
))
class
TestDensityPriorBox
(
TestDensityPriorBoxOp
):
...
...
@@ -136,6 +139,11 @@ class TestDensityPriorBox(TestDensityPriorBoxOp):
self
.
densities
=
[
3
,
4
]
self
.
fixed_sizes
=
[
1.0
,
2.0
]
self
.
fixed_ratios
=
[
1.0
]
self
.
layer_w
=
32
self
.
layer_h
=
32
self
.
image_w
=
40
self
.
image_h
=
40
self
.
flatten_to_2d
=
True
if
__name__
==
'__main__'
:
...
...
python/paddle/fluid/tests/unittests/test_layers.py
浏览文件 @
5e0bc130
...
...
@@ -202,6 +202,17 @@ class TestBook(unittest.TestCase):
self
.
assertIsNotNone
(
layers
.
sequence_unpad
(
x
=
x
,
length
=
length
))
print
(
str
(
program
))
def
test_pool2d
(
self
):
program
=
Program
()
with
program_guard
(
program
):
x
=
layers
.
data
(
name
=
'x'
,
shape
=
[
3
,
224
,
224
],
dtype
=
'float32'
)
self
.
assertIsNotNone
(
layers
.
pool2d
(
x
,
pool_size
=
[
5
,
3
],
pool_stride
=
[
1
,
2
],
pool_padding
=
(
2
,
1
)))
def
test_lstm_unit
(
self
):
program
=
Program
()
with
program_guard
(
program
):
...
...
tools/manylinux1/Dockerfile.x64
浏览文件 @
5e0bc130
...
...
@@ -16,7 +16,7 @@ ENV PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
RUN yum install -y sqlite-devel zlib-devel openssl-devel pcre-devel vim tk-devel tkinter libtool xz graphviz
COPY build_scripts /build_scripts
RUN bash build_scripts/build.sh && \
bash build_scripts/install_nccl2.sh && rm -r build_scripts
bash build_scripts/install_nccl2.sh && rm -r
f
build_scripts
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
...
...
tools/manylinux1/build_scripts/build_utils.sh
浏览文件 @
5e0bc130
...
...
@@ -50,6 +50,15 @@ function do_cpython_build {
mkdir
-p
${
prefix
}
/lib
# -Wformat added for https://bugs.python.org/issue17547 on Python 2.6
if
[
$(
lex_pyver
$py_ver
)
-eq
$(
lex_pyver 3.6
)
]
;
then
wget https://www.sqlite.org/2018/sqlite-autoconf-3250300.tar.gz
tar
-zxf
sqlite-autoconf-3250300.tar.gz
cd
sqlite-autoconf-3250300
./configure
--prefix
=
/usr/local
make
-j8
&&
make
install
cd
../
&&
rm
sqlite-autoconf-3250300.tar.gz
fi
# NOTE --enable-shared for generating libpython shared library needed for
# linking of some of the nupic.core test executables.
if
[
$(
lex_pyver
$py_ver
)
-ge
$(
lex_pyver 3.7
)
]
;
then
...
...
@@ -59,9 +68,9 @@ function do_cpython_build {
make
-j8
>
/dev/null
make altinstall
>
/dev/null
else
CFLAGS
=
"-Wformat"
./configure
--prefix
=
${
prefix
}
--enable-shared
$unicode_flags
>
/dev/null
make
-j8
>
/dev/null
make
install
>
/dev/null
LD_LIBRARY_PATH
=
/usr/local/lib:
${
LD_LIBRARY_PATH
}
CFLAGS
=
"-Wformat"
./configure
--prefix
=
${
prefix
}
--enable-shared
$unicode_flags
>
/dev/null
LD_LIBRARY_PATH
=
/usr/local/lib:
${
LD_LIBRARY_PATH
}
make
-j8
>
/dev/null
LD_LIBRARY_PATH
=
/usr/local/lib:
${
LD_LIBRARY_PATH
}
make
install
>
/dev/null
fi
popd
echo
"ZZZ looking for libpython"
...
...
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录