提交 77340348 编写于 作者: T tangwei12

Merge branch 'develop' of github.com:PaddlePaddle/Paddle into new_api_about_cpkt

...@@ -4,6 +4,7 @@ ...@@ -4,6 +4,7 @@
| backyes | Yan-Fei Wang | | backyes | Yan-Fei Wang |
| baiyfbupt | Yi-Fan Bai | | baiyfbupt | Yi-Fan Bai |
| beckett1124 | Bin Qi | | beckett1124 | Bin Qi |
| ChengduoZH | Cheng-Duo Zhao|
| chengxiaohua1105 | Xiao-Hua Cheng | | chengxiaohua1105 | Xiao-Hua Cheng |
| cxwangyi, yiwangbaidu, wangkuiyi | Yi Wang | | cxwangyi, yiwangbaidu, wangkuiyi | Yi Wang |
| cxysteven | Xing-Yi Cheng | | cxysteven | Xing-Yi Cheng |
......
...@@ -4,5 +4,5 @@ ...@@ -4,5 +4,5 @@
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
inference/index_cn.rst
optimization/index_cn.rst optimization/index_cn.rst
inference/inference_support_in_fluid.md
...@@ -5,4 +5,3 @@ HOW TO ...@@ -5,4 +5,3 @@ HOW TO
:maxdepth: 1 :maxdepth: 1
optimization/index_en.rst optimization/index_en.rst
inference/inference_support_in_fluid.md
安装与编译C++预测库
===========================
直接下载安装
-------------
====================== ========================================
版本说明 C++预测库
====================== ========================================
cpu_avx_mkl `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuAvxCp27cp27mu/.lastSuccessful/fluid.tgz>`_
cpu_avx_openblas `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuAvxOpenblas/.lastSuccessful/fluid.tgz>`_
cpu_noavx_openblas `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/fluid.tgz>`_
cuda7.5_cudnn5_avx_mkl `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda75cudnn5cp27cp27mu/.lastSuccessful/fluid.tgz>`_
cuda8.0_cudnn5_avx_mkl `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/fluid.tgz>`_
cuda8.0_cudnn7_avx_mkl `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/fluid.tgz>`_
====================== ========================================
从源码编译
----------
用户也可以从 PaddlePaddle 核心代码编译C++预测库,只需在编译时配制下面这些编译选项:
================= =========
选项 值
================= =========
CMAKE_BUILD_TYPE Release
FLUID_INSTALL_DIR 安装路径
WITH_FLUID_ONLY ON(推荐)
WITH_SWIG_PY OFF(推荐
WITH_PYTHON OFF(推荐)
WITH_GPU ON/OFF
WITH_MKL ON/OFF
================= =========
建议按照推荐值设置,以避免链接不必要的库。其它可选编译选项按需进行设定。
下面的代码片段从github拉取最新代码,配制编译选项(需要将PADDLE_ROOT替换为PaddlePaddle预测库的安装路径):
.. code-block:: bash
pip install paddlepaddle-gpu
PADDLE_ROOT=/path/of/capi
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
mkdir build
cd build
cmake -DFLUID_INSTALL_DIR=$PADDLE_ROOT \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_FLUID_ONLY=ON \
-DWITH_SWIG_PY=OFF \
-DWITH_PYTHON=OFF \
-DWITH_MKL=OFF \
-DWITH_GPU=OFF \
..
make
make inference_lib_dist
成功编译后,使用C++预测库所需的依赖(包括:(1)编译出的PaddlePaddle预测库和头文件;(2)第三方链接库和头文件;(3)版本信息与编译选项信息)
均会存放于PADDLE_ROOT目录中。目录结构如下:
.. code-block:: text
PaddleRoot/
├── CMakeCache.txt
├── paddle
│   └── fluid
│   ├── framework
│   ├── inference
│   ├── memory
│   ├── platform
│   ├── pybind
│   └── string
├── third_party
│   ├── boost
│   │   └── boost
│   ├── eigen3
│   │   ├── Eigen
│   │   └── unsupported
│   └── install
│   ├── gflags
│   ├── glog
│   ├── mklml
│   ├── protobuf
│   ├── snappy
│   ├── snappystream
│   └── zlib
└── version.txt
version.txt 中记录了该预测库的版本信息,包括Git Commit ID、使用OpenBlas或MKL数学库、CUDA/CUDNN版本号,如:
.. code-block:: text
GIT COMMIT ID: c95cd4742f02bb009e651a00b07b21c979637dc8
WITH_MKL: ON
WITH_GPU: ON
CUDA version: 8.0
CUDNN version: v5
预测库
------------
.. toctree::
:maxdepth: 1
build_and_install_lib_cn.rst
inference_support_in_fluid_cn.md
# Fluid Inference使用指南 # 使用指南
## 目录: ## 目录:
- Python Inference API - Python Inference API
- 编译Fluid Inference库
- Inference C++ API - Inference C++ API
- Inference实例 - Inference实例
- Inference计算优化 - Inference计算优化
...@@ -55,62 +54,6 @@ ...@@ -55,62 +54,6 @@
return [program, feed_target_names, fetch_targets] return [program, feed_target_names, fetch_targets]
``` ```
## 编译Fluid Inference库
- **不需要额外的CMake选项**
- 1、 配置CMake命令,更多配置请参考[源码编译PaddlePaddle](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/build_from_source_cn.html)
```bash
$ git clone https://github.com/PaddlePaddle/Paddle.git
$ cd Paddle
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX=your/path/to/paddle_inference_lib \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_PYTHON=ON \
-DWITH_MKL=OFF \
-DWITH_GPU=OFF \
..
```
- 2、 编译PaddlePaddle
```bash
$ make
```
- 3、 部署。执行如下命令将PaddlePaddle Fluid Inference库部署到`your/path/to/paddle_inference_lib`目录。
```bash
$ make inference_lib_dist
```
- 目录结构
```bash
$ cd your/path/to/paddle_inference_lib
$ tree
.
|-- paddle
| `-- fluid
| |-- framework
| |-- inference
| | |-- io.h
| | `-- libpaddle_fluid.so
| |-- memory
| |-- platform
| `-- string
|-- third_party
| |-- eigen3
| `-- install
| |-- gflags
| |-- glog
| `-- protobuf
`-- ...
```
假设`PADDLE_ROOT=your/path/to/paddle_inference_lib`
## 链接Fluid Inference库 ## 链接Fluid Inference库
- 示例项目([链接](https://github.com/luotao1/fluid_inference_example.git)) - 示例项目([链接](https://github.com/luotao1/fluid_inference_example.git))
......
...@@ -17,32 +17,21 @@ if(APPLE) ...@@ -17,32 +17,21 @@ if(APPLE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=pessimizing-move") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=pessimizing-move")
endif(APPLE) endif(APPLE)
function(inference_api_test TARGET_NAME TEST_SRC) function(inference_api_test TARGET_NAME)
set(options "") set(options "")
set(oneValueArgs "") set(oneValueArgs "")
set(multiValueArgs ARGS) set(multiValueArgs ARGS)
cmake_parse_arguments(inference_test "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN}) cmake_parse_arguments(inference_test "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
set(PYTHON_TESTS_DIR ${PADDLE_BINARY_DIR}/python/paddle/fluid/tests) set(PYTHON_TESTS_DIR ${PADDLE_BINARY_DIR}/python/paddle/fluid/tests)
set(arg_list "") cc_test(test_paddle_inference_${TARGET_NAME}
if(inference_test_ARGS) SRCS test_paddle_inference_${TARGET_NAME}.cc
foreach(arg ${inference_test_ARGS})
list(APPEND arg_list "_${arg}")
endforeach()
else()
list(APPEND arg_list "_")
endif()
foreach(arg ${arg_list})
string(REGEX REPLACE "^_$" "" arg "${arg}")
cc_test(${TARGET_NAME}
SRCS ${TEST_SRC}
DEPS paddle_fluid_api paddle_inference_api DEPS paddle_fluid_api paddle_inference_api
ARGS --dirname=${PYTHON_TESTS_DIR}/book/) ARGS --dirname=${PYTHON_TESTS_DIR}/book/)
# TODO(panyx0178): Figure out how to add word2vec and image_classification if(inference_test_ARGS)
# as deps. set_tests_properties(test_paddle_inference_${TARGET_NAME}
# set_tests_properties(${TARGET_NAME} PROPERTIES DEPENDS "${inference_test_ARGS}")
# PROPERTIES DEPENDS ${DEP_TEST}) endif()
endforeach()
endfunction(inference_api_test) endfunction(inference_api_test)
...@@ -50,9 +39,11 @@ cc_library(paddle_inference_api ...@@ -50,9 +39,11 @@ cc_library(paddle_inference_api
SRCS paddle_inference_api.cc paddle_inference_api_impl.cc SRCS paddle_inference_api.cc paddle_inference_api_impl.cc
DEPS ${FLUID_CORE_MODULES} ${GLOB_OP_LIB}) DEPS ${FLUID_CORE_MODULES} ${GLOB_OP_LIB})
cc_test(test_paddle_inference_api if(WITH_TESTING)
cc_test(test_paddle_inference_api
SRCS test_paddle_inference_api.cc SRCS test_paddle_inference_api.cc
DEPS paddle_inference_api) DEPS paddle_inference_api)
inference_api_test(test_paddle_inference_api_impl inference_api_test(api_impl
test_paddle_inference_api_impl.cc) ARGS test_word2vec test_image_classification)
endif()
...@@ -200,7 +200,7 @@ BlockDesc::BlockDesc(ProgramDesc *prog, proto::BlockDesc *desc) ...@@ -200,7 +200,7 @@ BlockDesc::BlockDesc(ProgramDesc *prog, proto::BlockDesc *desc)
vars_[var_desc.name()].reset(new VarDesc(var_desc)); vars_[var_desc.name()].reset(new VarDesc(var_desc));
} }
for (const proto::OpDesc &op_desc : desc_->ops()) { for (const proto::OpDesc &op_desc : desc_->ops()) {
ops_.emplace_back(new OpDesc(op_desc, prog, this)); ops_.emplace_back(new OpDesc(op_desc, this));
} }
} }
...@@ -209,7 +209,7 @@ BlockDesc::BlockDesc(const BlockDesc &other, proto::BlockDesc *desc, ...@@ -209,7 +209,7 @@ BlockDesc::BlockDesc(const BlockDesc &other, proto::BlockDesc *desc,
: prog_(prog), desc_(desc) { : prog_(prog), desc_(desc) {
need_update_ = true; need_update_ = true;
for (auto &op : other.ops_) { for (auto &op : other.ops_) {
ops_.emplace_back(new OpDesc(*op->Proto(), prog, this)); ops_.emplace_back(new OpDesc(*op, this));
} }
for (auto &it : other.vars_) { for (auto &it : other.vars_) {
auto *var = new VarDesc(*it.second); auto *var = new VarDesc(*it.second);
......
...@@ -105,7 +105,7 @@ class BlockDesc { ...@@ -105,7 +105,7 @@ class BlockDesc {
size_t OpSize() const { return ops_.size(); } size_t OpSize() const { return ops_.size(); }
OpDesc *Op(int idx) { return ops_.at(idx).get(); } OpDesc *Op(int idx) const { return ops_.at(idx).get(); }
void Flush(); void Flush();
......
...@@ -11,11 +11,15 @@ ...@@ -11,11 +11,15 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
#include "paddle/fluid/framework/details/multi_devices_graph_builder.h" #include <algorithm>
#include <fstream> #include <fstream>
#include <string>
#include <utility> #include <utility>
#include <vector>
#include "paddle/fluid/framework/details/broadcast_op_handle.h" #include "paddle/fluid/framework/details/broadcast_op_handle.h"
#include "paddle/fluid/framework/details/computation_op_handle.h" #include "paddle/fluid/framework/details/computation_op_handle.h"
#include "paddle/fluid/framework/details/multi_devices_graph_builder.h"
#include "paddle/fluid/framework/details/reduce_op_handle.h" #include "paddle/fluid/framework/details/reduce_op_handle.h"
#include "paddle/fluid/framework/details/rpc_op_handle.h" #include "paddle/fluid/framework/details/rpc_op_handle.h"
#include "paddle/fluid/framework/details/scale_loss_grad_op_handle.h" #include "paddle/fluid/framework/details/scale_loss_grad_op_handle.h"
...@@ -26,9 +30,6 @@ ...@@ -26,9 +30,6 @@
#include "paddle/fluid/framework/details/nccl_all_reduce_op_handle.h" #include "paddle/fluid/framework/details/nccl_all_reduce_op_handle.h"
#endif #endif
#include <string>
#include <vector>
DEFINE_string(ssa_graph_path, "/tmp/ssa_graph.dot", DEFINE_string(ssa_graph_path, "/tmp/ssa_graph.dot",
"the ssa graph path only print with GLOG_v=10," "the ssa graph path only print with GLOG_v=10,"
"default /tmp/graph.dot"); "default /tmp/graph.dot");
...@@ -148,9 +149,9 @@ bool MultiDevSSAGraphBuilder::IsDistTrainOp( ...@@ -148,9 +149,9 @@ bool MultiDevSSAGraphBuilder::IsDistTrainOp(
std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build( std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build(
const ProgramDesc &program) const { const ProgramDesc &program) const {
std::unordered_map<std::string, proto::VarType::Type> var_types; std::unordered_map<std::string, VarDesc *> all_vars;
for (auto *var : program.Block(0).AllVars()) { for (auto *var : program.Block(0).AllVars()) {
var_types[var->Name()] = var->GetType(); all_vars[var->Name()] = var;
} }
auto graph = new SSAGraph(); auto graph = new SSAGraph();
...@@ -167,12 +168,28 @@ std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build( ...@@ -167,12 +168,28 @@ std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build(
auto send_vars = FindDistTrainSendVars(program); auto send_vars = FindDistTrainSendVars(program);
auto recv_vars = FindDistTrainRecvVars(program); auto recv_vars = FindDistTrainRecvVars(program);
size_t cur_device_id = 0;
std::vector<std::unordered_set<std::string>> var_name_on_devices; std::vector<std::unordered_set<std::string>> var_name_on_devices;
std::vector<std::unordered_set<std::string>> bcast_var_name_set; std::vector<std::unordered_set<std::string>> bcast_var_name_set;
var_name_on_devices.resize(places_.size()); var_name_on_devices.resize(places_.size());
bcast_var_name_set.resize(places_.size()); bcast_var_name_set.resize(places_.size());
size_t cur_device_id = 0;
std::vector<int64_t> balance_grads(places_.size(), 0);
auto get_appropriate_dev = [&](std::string &g_name) -> size_t {
auto var_desc = all_vars.at(g_name);
PADDLE_ENFORCE_NOT_NULL(var_desc);
auto dim = framework::make_ddim(var_desc->GetShape());
int64_t numel = framework::product(dim);
PADDLE_ENFORCE_GE(numel, 0);
auto smallest =
std::min_element(std::begin(balance_grads), std::end(balance_grads));
size_t dev_id =
static_cast<size_t>(std::distance(std::begin(balance_grads), smallest));
balance_grads[dev_id] += numel;
return dev_id;
};
bool is_forwarding = true; bool is_forwarding = true;
for (auto *op : program.Block(0).AllOps()) { for (auto *op : program.Block(0).AllOps()) {
if (boost::get<int>( if (boost::get<int>(
...@@ -220,13 +237,13 @@ std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build( ...@@ -220,13 +237,13 @@ std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build(
switch (strategy_.reduce_) { switch (strategy_.reduce_) {
case BuildStrategy::ReduceStrategy::kReduce: case BuildStrategy::ReduceStrategy::kReduce:
cur_device_id = get_appropriate_dev(g_name);
CreateReduceOp(&result, g_name, cur_device_id); CreateReduceOp(&result, g_name, cur_device_id);
var_name_on_devices[cur_device_id].emplace(g_name); var_name_on_devices[cur_device_id].emplace(g_name);
bcast_var_name_set[cur_device_id].emplace(p_name); bcast_var_name_set[cur_device_id].emplace(p_name);
cur_device_id = (cur_device_id + 1) % places_.size();
break; break;
case BuildStrategy::ReduceStrategy::kAllReduce: case BuildStrategy::ReduceStrategy::kAllReduce:
if (IsSparseGradient(var_types, g_name)) { if (IsSparseGradient(all_vars, g_name)) {
CreateReduceOp(&result, g_name, 0); CreateReduceOp(&result, g_name, 0);
CreateBroadcastOp(&result, g_name, 0); CreateBroadcastOp(&result, g_name, 0);
} else { } else {
...@@ -269,10 +286,10 @@ std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build( ...@@ -269,10 +286,10 @@ std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build(
} }
bool MultiDevSSAGraphBuilder::IsSparseGradient( bool MultiDevSSAGraphBuilder::IsSparseGradient(
const std::unordered_map<std::string, proto::VarType::Type> &var_types, const std::unordered_map<std::string, VarDesc *> &all_vars,
const std::string &og) const { const std::string &og) const {
PADDLE_ENFORCE(var_types.count(og) != 0); PADDLE_ENFORCE(all_vars.count(og) != 0);
if (var_types.at(og) == proto::VarType::SELECTED_ROWS) { if (all_vars.at(og)->GetType() == proto::VarType::SELECTED_ROWS) {
return true; return true;
} }
return false; return false;
......
...@@ -106,7 +106,7 @@ class MultiDevSSAGraphBuilder : public SSAGraphBuilder { ...@@ -106,7 +106,7 @@ class MultiDevSSAGraphBuilder : public SSAGraphBuilder {
size_t src_dev_id) const; size_t src_dev_id) const;
bool IsSparseGradient( bool IsSparseGradient(
const std::unordered_map<std::string, proto::VarType::Type> &var_types, const std::unordered_map<std::string, VarDesc *> &all_vars,
const std::string &og) const; const std::string &og) const;
private: private:
......
...@@ -103,7 +103,7 @@ void OpDesc::CopyFrom(const OpDesc &op_desc) { ...@@ -103,7 +103,7 @@ void OpDesc::CopyFrom(const OpDesc &op_desc) {
need_update_ = true; need_update_ = true;
} }
OpDesc::OpDesc(const proto::OpDesc &desc, ProgramDesc *prog, BlockDesc *block) OpDesc::OpDesc(const proto::OpDesc &desc, BlockDesc *block)
: desc_(desc), need_update_(false) { : desc_(desc), need_update_(false) {
// restore inputs_ // restore inputs_
int input_size = desc_.inputs_size(); int input_size = desc_.inputs_size();
......
...@@ -33,13 +33,14 @@ class OpDesc { ...@@ -33,13 +33,14 @@ class OpDesc {
OpDesc(const std::string &type, const VariableNameMap &inputs, OpDesc(const std::string &type, const VariableNameMap &inputs,
const VariableNameMap &outputs, const AttributeMap &attrs); const VariableNameMap &outputs, const AttributeMap &attrs);
OpDesc(const proto::OpDesc &desc, ProgramDesc *prog, BlockDesc *block); OpDesc(const proto::OpDesc &desc, BlockDesc *block);
explicit OpDesc(BlockDesc *block) : block_(block) {} explicit OpDesc(BlockDesc *block) : block_(block) {}
OpDesc(const OpDesc &other, BlockDesc *block) { OpDesc(const OpDesc &other, BlockDesc *block) {
*this = other; *this = other;
block_ = block; block_ = block;
need_update_ = true;
} }
void CopyFrom(const OpDesc &op_desc); void CopyFrom(const OpDesc &op_desc);
......
...@@ -51,12 +51,15 @@ ProgramDesc::ProgramDesc(const ProgramDesc &o) { ...@@ -51,12 +51,15 @@ ProgramDesc::ProgramDesc(const ProgramDesc &o) {
auto *block = desc_.mutable_blocks(i); auto *block = desc_.mutable_blocks(i);
blocks_.emplace_back(new BlockDesc(*o.blocks_[i], block, this)); blocks_.emplace_back(new BlockDesc(*o.blocks_[i], block, this));
} }
for (auto &block : blocks_) { for (size_t block_id = 0; block_id < blocks_.size(); ++block_id) {
for (auto *op : block->AllOps()) { auto all_ops = blocks_[block_id]->AllOps();
for (const auto &attr : op->Proto()->attrs()) { for (size_t op_id = 0; op_id < all_ops.size(); ++op_id) {
if (attr.type() == proto::AttrType::BLOCK) { auto &op = all_ops[op_id];
size_t blk_idx = attr.block_idx(); for (const std::string &attr_name : op->AttrNames()) {
op->SetBlockAttr(attr.name(), this->MutableBlock(blk_idx)); if (op->GetAttrType(attr_name) == proto::AttrType::BLOCK) {
int sub_block_id =
o.Block(block_id).Op(op_id)->GetBlockAttr(attr_name);
op->SetBlockAttr(attr_name, MutableBlock(sub_block_id));
} }
} }
} }
...@@ -86,6 +89,16 @@ ProgramDesc::ProgramDesc(const std::string &binary_str) { ...@@ -86,6 +89,16 @@ ProgramDesc::ProgramDesc(const std::string &binary_str) {
for (auto &block_desc : *desc_.mutable_blocks()) { for (auto &block_desc : *desc_.mutable_blocks()) {
blocks_.emplace_back(new BlockDesc(this, &block_desc)); blocks_.emplace_back(new BlockDesc(this, &block_desc));
} }
for (auto &block : blocks_) {
for (auto *op : block->AllOps()) {
for (const auto &attr : op->Proto()->attrs()) {
if (attr.type() == proto::AttrType::BLOCK) {
size_t blk_idx = attr.block_idx();
op->SetBlockAttr(attr.name(), this->MutableBlock(blk_idx));
}
}
}
}
} }
const std::vector<std::string> ProgramDesc::GetFeedTargetNames() { const std::vector<std::string> ProgramDesc::GetFeedTargetNames() {
......
...@@ -24,7 +24,7 @@ class ReluOpConverter : public OpConverter { ...@@ -24,7 +24,7 @@ class ReluOpConverter : public OpConverter {
void operator()(const framework::proto::OpDesc& op) override { void operator()(const framework::proto::OpDesc& op) override {
// Here the two nullptr looks strange, that's because the // Here the two nullptr looks strange, that's because the
// framework::OpDesc's constructor is strange. // framework::OpDesc's constructor is strange.
framework::OpDesc op_desc(op, nullptr, nullptr); framework::OpDesc op_desc(op, nullptr);
LOG(INFO) << "convert a fluid relu op to tensorrt activation layer whose " LOG(INFO) << "convert a fluid relu op to tensorrt activation layer whose "
"type is Relu"; "type is Relu";
const nvinfer1::ITensor* input_tensor = const nvinfer1::ITensor* input_tensor =
......
...@@ -27,7 +27,7 @@ class MulOpConverter : public OpConverter { ...@@ -27,7 +27,7 @@ class MulOpConverter : public OpConverter {
void operator()(const framework::proto::OpDesc& op) override { void operator()(const framework::proto::OpDesc& op) override {
VLOG(4) << "convert a fluid mul op to tensorrt fc layer without bias"; VLOG(4) << "convert a fluid mul op to tensorrt fc layer without bias";
framework::OpDesc op_desc(op, nullptr, nullptr); framework::OpDesc op_desc(op, nullptr);
// Declare inputs // Declare inputs
auto* input1 = engine_->GetITensor(op_desc.Input("X")[0]); auto* input1 = engine_->GetITensor(op_desc.Input("X")[0]);
auto* input2 = engine_->GetITensor(op_desc.Input("Y")[0]); auto* input2 = engine_->GetITensor(op_desc.Input("Y")[0]);
......
...@@ -104,7 +104,7 @@ class TRTConvertValidation { ...@@ -104,7 +104,7 @@ class TRTConvertValidation {
engine_->FreezeNetwork(); engine_->FreezeNetwork();
// Declare outputs. // Declare outputs.
op_desc_.reset(new framework::OpDesc(desc, nullptr, nullptr)); op_desc_.reset(new framework::OpDesc(desc, nullptr));
// Set Inputs. // Set Inputs.
for (const auto& input : op_desc_->InputArgumentNames()) { for (const auto& input : op_desc_->InputArgumentNames()) {
......
...@@ -34,9 +34,22 @@ class BilinearInterpOp : public framework::OperatorWithKernel { ...@@ -34,9 +34,22 @@ class BilinearInterpOp : public framework::OperatorWithKernel {
int out_w = ctx->Attrs().Get<int>("out_w"); int out_w = ctx->Attrs().Get<int>("out_w");
PADDLE_ENFORCE_EQ(dim_x.size(), 4, "X's dimension must be 4"); PADDLE_ENFORCE_EQ(dim_x.size(), 4, "X's dimension must be 4");
if (ctx->HasInput("OutSize")) {
auto out_size_dim = ctx->GetInputDim("OutSize");
PADDLE_ENFORCE_EQ(out_size_dim.size(), 1,
"OutSize's dimension size must be 1");
PADDLE_ENFORCE_EQ(out_size_dim[0], 2, "OutSize's dim[0] must be 2");
}
std::vector<int64_t> dim_out({dim_x[0], dim_x[1], out_h, out_w}); std::vector<int64_t> dim_out({dim_x[0], dim_x[1], out_h, out_w});
ctx->SetOutputDim("Out", framework::make_ddim(dim_out)); ctx->SetOutputDim("Out", framework::make_ddim(dim_out));
} }
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
framework::ToDataType(ctx.Input<Tensor>("X")->type()), ctx.GetPlace());
}
}; };
class BilinearInterpOpMaker : public framework::OpProtoAndCheckerMaker { class BilinearInterpOpMaker : public framework::OpProtoAndCheckerMaker {
...@@ -45,6 +58,10 @@ class BilinearInterpOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -45,6 +58,10 @@ class BilinearInterpOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput("X", AddInput("X",
"(Tensor) The input tensor of bilinear interpolation, " "(Tensor) The input tensor of bilinear interpolation, "
"This is a 4-D tensor with shape of (N x C x h x w)"); "This is a 4-D tensor with shape of (N x C x h x w)");
AddInput("OutSize",
"(Tensor) This is a 1-D tensor with two number. "
"The first number is height and the second number is width.")
.AsDispensable();
AddOutput("Out", AddOutput("Out",
"(Tensor) The dimension of output is (N x C x out_h x out_w]"); "(Tensor) The dimension of output is (N x C x out_h x out_w]");
...@@ -78,6 +95,12 @@ class BilinearInterpOpGrad : public framework::OperatorWithKernel { ...@@ -78,6 +95,12 @@ class BilinearInterpOpGrad : public framework::OperatorWithKernel {
ctx->SetOutputDim(framework::GradVarName("X"), dim_x); ctx->SetOutputDim(framework::GradVarName("X"), dim_x);
} }
} }
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
framework::ToDataType(ctx.Input<Tensor>("X")->type()), ctx.GetPlace());
}
}; };
} // namespace operators } // namespace operators
......
...@@ -102,10 +102,21 @@ class BilinearInterpOpCUDAKernel : public framework::OpKernel<T> { ...@@ -102,10 +102,21 @@ class BilinearInterpOpCUDAKernel : public framework::OpKernel<T> {
auto* input_t = ctx.Input<Tensor>("X"); // float tensor auto* input_t = ctx.Input<Tensor>("X"); // float tensor
auto* output_t = ctx.Output<Tensor>("Out"); // float tensor auto* output_t = ctx.Output<Tensor>("Out"); // float tensor
auto* input = input_t->data<T>(); auto* input = input_t->data<T>();
auto* output = output_t->mutable_data<T>(ctx.GetPlace());
int out_h = ctx.Attr<int>("out_h"); int out_h = ctx.Attr<int>("out_h");
int out_w = ctx.Attr<int>("out_w"); int out_w = ctx.Attr<int>("out_w");
auto out_dims = output_t->dims();
auto out_size_t = ctx.Input<Tensor>("OutSize");
if (out_size_t != nullptr) {
Tensor sizes;
framework::TensorCopy(*out_size_t, platform::CPUPlace(), &sizes);
auto size_data = sizes.data<int>();
out_h = size_data[0];
out_w = size_data[1];
}
auto* output = output_t->mutable_data<T>(
{out_dims[0], out_dims[1], out_h, out_w}, ctx.GetPlace());
int batch_size = input_t->dims()[0]; int batch_size = input_t->dims()[0];
int channels = input_t->dims()[1]; int channels = input_t->dims()[1];
int in_h = input_t->dims()[2]; int in_h = input_t->dims()[2];
...@@ -139,8 +150,8 @@ class BilinearInterpGradOpCUDAKernel : public framework::OpKernel<T> { ...@@ -139,8 +150,8 @@ class BilinearInterpGradOpCUDAKernel : public framework::OpKernel<T> {
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* d_input_t = ctx.Output<Tensor>(framework::GradVarName("X")); auto* d_input_t = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* d_output_t = ctx.Input<Tensor>(framework::GradVarName("Out")); auto* d_output_t = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* d_input = d_input_t->mutable_data<T>(ctx.GetPlace());
auto* d_output = d_output_t->data<T>(); auto* d_output = d_output_t->data<T>();
auto* d_input = d_input_t->mutable_data<T>(ctx.GetPlace());
auto& device_ctx = auto& device_ctx =
ctx.template device_context<platform::CUDADeviceContext>(); ctx.template device_context<platform::CUDADeviceContext>();
...@@ -149,6 +160,16 @@ class BilinearInterpGradOpCUDAKernel : public framework::OpKernel<T> { ...@@ -149,6 +160,16 @@ class BilinearInterpGradOpCUDAKernel : public framework::OpKernel<T> {
int out_h = ctx.Attr<int>("out_h"); int out_h = ctx.Attr<int>("out_h");
int out_w = ctx.Attr<int>("out_w"); int out_w = ctx.Attr<int>("out_w");
auto out_size_t = ctx.Input<Tensor>("OutSize");
if (out_size_t != nullptr) {
Tensor sizes;
framework::TensorCopy(*out_size_t, platform::CPUPlace(), &sizes);
auto size_data = sizes.data<int>();
out_h = size_data[0];
out_w = size_data[1];
}
int batch_size = d_input_t->dims()[0]; int batch_size = d_input_t->dims()[0];
int channels = d_input_t->dims()[1]; int channels = d_input_t->dims()[1];
int in_h = d_input_t->dims()[2]; int in_h = d_input_t->dims()[2];
......
...@@ -24,11 +24,18 @@ class BilinearInterpKernel : public framework::OpKernel<T> { ...@@ -24,11 +24,18 @@ class BilinearInterpKernel : public framework::OpKernel<T> {
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* input_t = ctx.Input<Tensor>("X"); // float tensor auto* input_t = ctx.Input<Tensor>("X"); // float tensor
auto* output_t = ctx.Output<Tensor>("Out"); // float tensor auto* output_t = ctx.Output<Tensor>("Out"); // float tensor
auto out_dims = output_t->dims();
auto* input = input_t->data<T>(); auto* input = input_t->data<T>();
auto* output = output_t->mutable_data<T>(ctx.GetPlace());
int out_h = ctx.Attr<int>("out_h"); int out_h = ctx.Attr<int>("out_h");
int out_w = ctx.Attr<int>("out_w"); int out_w = ctx.Attr<int>("out_w");
auto out_size_t = ctx.Input<Tensor>("OutSize");
if (out_size_t != nullptr) {
auto out_size_data = out_size_t->data<int>();
out_h = out_size_data[0];
out_w = out_size_data[1];
}
auto* output = output_t->mutable_data<T>(
{out_dims[0], out_dims[1], out_h, out_w}, ctx.GetPlace());
int batch_size = input_t->dims()[0]; int batch_size = input_t->dims()[0];
int channels = input_t->dims()[1]; int channels = input_t->dims()[1];
int in_h = input_t->dims()[2]; int in_h = input_t->dims()[2];
...@@ -83,9 +90,8 @@ class BilinearInterpGradKernel : public framework::OpKernel<T> { ...@@ -83,9 +90,8 @@ class BilinearInterpGradKernel : public framework::OpKernel<T> {
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* d_input_t = ctx.Output<Tensor>(framework::GradVarName("X")); auto* d_input_t = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* d_output_t = ctx.Input<Tensor>(framework::GradVarName("Out")); auto* d_output_t = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* d_input = d_input_t->mutable_data<T>(ctx.GetPlace());
auto* d_output = d_output_t->data<T>(); auto* d_output = d_output_t->data<T>();
auto* d_input = d_input_t->mutable_data<T>(ctx.GetPlace());
auto& device_ctx = auto& device_ctx =
ctx.template device_context<platform::CPUDeviceContext>(); ctx.template device_context<platform::CPUDeviceContext>();
math::SetConstant<platform::CPUDeviceContext, T> zero; math::SetConstant<platform::CPUDeviceContext, T> zero;
...@@ -93,6 +99,14 @@ class BilinearInterpGradKernel : public framework::OpKernel<T> { ...@@ -93,6 +99,14 @@ class BilinearInterpGradKernel : public framework::OpKernel<T> {
int out_h = ctx.Attr<int>("out_h"); int out_h = ctx.Attr<int>("out_h");
int out_w = ctx.Attr<int>("out_w"); int out_w = ctx.Attr<int>("out_w");
auto out_size_t = ctx.Input<Tensor>("OutSize");
if (out_size_t != nullptr) {
auto out_size_data = out_size_t->data<int>();
out_h = out_size_data[0];
out_w = out_size_data[1];
}
int batch_size = d_input_t->dims()[0]; int batch_size = d_input_t->dims()[0];
int channels = d_input_t->dims()[1]; int channels = d_input_t->dims()[1];
int in_h = d_input_t->dims()[2]; int in_h = d_input_t->dims()[2];
......
...@@ -3944,7 +3944,7 @@ def upsampling_bilinear2d(input, out_shape=None, scale=None, name=None): ...@@ -3944,7 +3944,7 @@ def upsampling_bilinear2d(input, out_shape=None, scale=None, name=None):
input (Variable): The input tensor of bilinear interpolation, input (Variable): The input tensor of bilinear interpolation,
This is a 4-D tensor of the shape This is a 4-D tensor of the shape
(num_batches, channels, in_h, in_w). (num_batches, channels, in_h, in_w).
out_shape(list|tuple|None): Output shape of bilinear interpolation out_shape(list|tuple|Variable|None): Output shape of bilinear interpolation
layer, the shape is (out_h, out_w). layer, the shape is (out_h, out_w).
Default: None Default: None
scale(int|None): The multiplier for the input height or width. scale(int|None): The multiplier for the input height or width.
...@@ -3971,13 +3971,20 @@ def upsampling_bilinear2d(input, out_shape=None, scale=None, name=None): ...@@ -3971,13 +3971,20 @@ def upsampling_bilinear2d(input, out_shape=None, scale=None, name=None):
def _is_list_or_turple_(data): def _is_list_or_turple_(data):
return (isinstance(data, list) or isinstance(data, tuple)) return (isinstance(data, list) or isinstance(data, tuple))
out_h = 0
out_w = 0
inputs = {"X": input}
if out_shape is not None: if out_shape is not None:
if not (_is_list_or_turple_(out_shape) and len(out_shape) == 2): if not (_is_list_or_turple_(out_shape) and len(out_shape) == 2) and (
out_shape is not Variable):
raise ValueError('out_shape should be a list or tuple ', raise ValueError('out_shape should be a list or tuple ',
'with length 2, (out_h, out_w).') 'with length 2, (out_h, out_w).')
if _is_list_or_turple_(out_shape):
out_shape = list(map(int, out_shape)) out_shape = list(map(int, out_shape))
out_h = out_shape[0] out_h = out_shape[0]
out_w = out_shape[1] out_w = out_shape[1]
else:
inputs['OutSize'] = out_shape
else: else:
out_h = int(input.shape[2] * scale) out_h = int(input.shape[2] * scale)
out_w = int(input.shape[3] * scale) out_w = int(input.shape[3] * scale)
...@@ -3985,7 +3992,7 @@ def upsampling_bilinear2d(input, out_shape=None, scale=None, name=None): ...@@ -3985,7 +3992,7 @@ def upsampling_bilinear2d(input, out_shape=None, scale=None, name=None):
out = helper.create_tmp_variable(dtype) out = helper.create_tmp_variable(dtype)
helper.append_op( helper.append_op(
type="bilinear_interp", type="bilinear_interp",
inputs={"X": input}, inputs=inputs,
outputs={"Out": out}, outputs={"Out": out},
attrs={"out_h": out_h, attrs={"out_h": out_h,
"out_w": out_w}) "out_w": out_w})
......
...@@ -17,7 +17,10 @@ import numpy as np ...@@ -17,7 +17,10 @@ import numpy as np
from op_test import OpTest from op_test import OpTest
def bilinear_interp_np(input, out_h, out_w): def bilinear_interp_np(input, out_h, out_w, out_size):
if out_size is not None:
out_h = out_size[0]
out_w = out_size[1]
batch_size, channel, in_h, in_w = input.shape batch_size, channel, in_h, in_w = input.shape
if out_h > 1: if out_h > 1:
ratio_h = (in_h - 1.0) / (out_h - 1.0) ratio_h = (in_h - 1.0) / (out_h - 1.0)
...@@ -49,12 +52,15 @@ def bilinear_interp_np(input, out_h, out_w): ...@@ -49,12 +52,15 @@ def bilinear_interp_np(input, out_h, out_w):
class TestBilinearInterpOp(OpTest): class TestBilinearInterpOp(OpTest):
def setUp(self): def setUp(self):
self.out_size = None
self.init_test_case() self.init_test_case()
self.op_type = "bilinear_interp" self.op_type = "bilinear_interp"
input_np = np.random.random(self.input_shape).astype("float32") input_np = np.random.random(self.input_shape).astype("float32")
output_np = bilinear_interp_np(input_np, self.out_h, self.out_w) output_np = bilinear_interp_np(input_np, self.out_h, self.out_w,
self.out_size)
self.inputs = {'X': input_np} self.inputs = {'X': input_np}
if self.out_size is not None:
self.inputs['OutSize'] = self.out_size
self.attrs = {'out_h': self.out_h, 'out_w': self.out_w} self.attrs = {'out_h': self.out_h, 'out_w': self.out_w}
self.outputs = {'Out': output_np} self.outputs = {'Out': output_np}
...@@ -68,6 +74,7 @@ class TestBilinearInterpOp(OpTest): ...@@ -68,6 +74,7 @@ class TestBilinearInterpOp(OpTest):
self.input_shape = [2, 3, 4, 4] self.input_shape = [2, 3, 4, 4]
self.out_h = 2 self.out_h = 2
self.out_w = 2 self.out_w = 2
self.out_size = np.array([3, 3]).astype("int32")
class TestCase1(TestBilinearInterpOp): class TestCase1(TestBilinearInterpOp):
...@@ -91,5 +98,29 @@ class TestCase3(TestBilinearInterpOp): ...@@ -91,5 +98,29 @@ class TestCase3(TestBilinearInterpOp):
self.out_w = 128 self.out_w = 128
class TestCase4(TestBilinearInterpOp):
def init_test_case(self):
self.input_shape = [4, 1, 7, 8]
self.out_h = 1
self.out_w = 1
self.out_size = np.array([2, 2]).astype("int32")
class TestCase5(TestBilinearInterpOp):
def init_test_case(self):
self.input_shape = [3, 3, 9, 6]
self.out_h = 12
self.out_w = 12
self.out_size = np.array([11, 11]).astype("int32")
class TestCase6(TestBilinearInterpOp):
def init_test_case(self):
self.input_shape = [1, 1, 128, 64]
self.out_h = 64
self.out_w = 128
self.out_size = np.array([65, 129]).astype("int32")
if __name__ == "__main__": if __name__ == "__main__":
unittest.main() unittest.main()
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册