diff --git a/benchmark/IntelOptimizedPaddle.md b/benchmark/IntelOptimizedPaddle.md new file mode 100644 index 0000000000000000000000000000000000000000..1bf9ea9df02a1f0e0b71400207a9f375a2b3d25b --- /dev/null +++ b/benchmark/IntelOptimizedPaddle.md @@ -0,0 +1,48 @@ +# Benchmark + +Machine: + +- Server + - Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 2 Sockets, 20 Cores per socket +- Laptop + - DELL XPS15-9560-R1745: i7-7700HQ 8G 256GSSD + - i5 MacBook Pro (Retina, 13-inch, Early 2015) +- Desktop + - i7-6700k + +System: CentOS release 6.3 (Final), Docker 1.12.1. + +PaddlePaddle: paddlepaddle/paddle:latest (TODO: will rerun after 0.11.0) + +- MKL-DNN tag v0.10 +- MKLML 2018.0.20170720 +- OpenBLAS v0.2.20 + +On each machine, we will test and compare the performance of training on single node using MKL-DNN / MKLML / OpenBLAS respectively. + +## Benchmark Model + +### Server +Test on batch size 64, 128, 256 on Intel(R) Xeon(R) Gold 6148M CPU @ 2.40GHz + +Input image size - 3 * 224 * 224, Time: images/second + +- VGG-19 + +| BatchSize | 64 | 128 | 256 | +|--------------|-------| -----| --------| +| OpenBLAS | 7.82 | 8.62 | 10.34 | +| MKLML | 11.02 | 12.86 | 15.33 | +| MKL-DNN | 27.69 | 28.8 | 29.27 | + + +chart on batch size 128 +TBD + + - ResNet + - GoogLeNet + +### Laptop +TBD +### Desktop +TBD diff --git a/doc/howto/cross_compiling/cross_compiling_for_ios_cn.md b/doc/howto/cross_compiling/cross_compiling_for_ios_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..32c490d9aa4202e17aa1784a45a317c5307b98ea --- /dev/null +++ b/doc/howto/cross_compiling/cross_compiling_for_ios_cn.md @@ -0,0 +1,99 @@ +# 构建iOS平台上的PaddlePaddle库 +交叉编译iOS平台上适用的PaddlePaddle库,需要在MacOS系统上进行。本文的将介绍在MacOS上,从源码交叉编译iOS平台上适用的PaddlePaddle库。 + +## 准备交叉编译环境 +Apple官方为iOS开发提供了完整的交叉编译工具和集成开发环境,用户从App Store下载安装Xcode即可。也可自行前往官网下载,[Xcode](https://developer.apple.com/cn/xcode/)。安装完成之后,可在命令行执行`xcodebuild -version`,判断是否安装成功。 + +```bash +$ xcodebuild -version +Xcode 9.0 +Build version 9A235 +``` + +## 配置交叉编译参数 + +PaddlePaddle为交叉编译提供了工具链配置文档[cmake/cross_compiling/ios.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/ios.cmake),以提供一些默认的编译器和编译参数配置。 + +交叉编译iOS版本的PaddlePaddle库时,有一些必须配置的参数: + +- `CMAKE_SYSTEM_NAME`,CMake编译的目标平台,必须设置为`iOS`。在设置`CMAKE_SYSTEM_NAME=iOS`后,PaddlePaddle的CMake系统会自动编译所有的第三方依赖库,并且强制设置一些PaddlePaddle参数的值(`WITH_C_API=ON`、`WITH_GPU=OFF`、`WITH_AVX=OFF`、`WITH_PYTHON=OFF`、`WITH_RDMA=OFF`)。 +- `WITH_C_API`,是否编译C-API预测库,必须设置为ON。在iOS平台上只支持使用C-API来预测。 +- `WITH_SWIG_PY`,必须设置为ON。在iOS平台上不支持通过swig调用来训练或者预测。 + +iOS平台可选配置参数: + +- `IOS_PLATFORM`,可设置为`OS/SIMULATOR`,默认值为`OS`。 + - `OS`,构建目标为`arm`架构的iPhone或者iPad等物理设备。 + - `SIMULATOR`,构建目标为`x86`架构的模拟器平台。 +- `IOS_ARCH`,目标架构。针对不同的`IOS_PLATFORM`,可设置的目标架构如下表所示: + + | IOS_PLATFORM | IOS_ARCH | + |--------------|----------------------| + | OS | armv7, armv7s, arm64 (默认) | + | SIMULATOR | i386, x86_64 (默认) | + +- `IOS_DEPLOYMENT_TARGET`,最小的iOS部署版本,默认值为`7.0`。 +- `IOS_ENABLE_BITCODE`,是否使能[Bitcode](https://developer.apple.com/library/content/documentation/IDEs/Conceptual/AppDistributionGuide/AppThinning/AppThinning.html#//apple_ref/doc/uid/TP40012582-CH35-SW3),可设置`ON/OFF`,默认值为`ON`。 +- `IOS_USE_VECLIB_FOR_BLAS`,是否使用[vecLib](https://developer.apple.com/documentation/accelerate/veclib)框架进行BLAS矩阵计算,可设置`ON/OFF`,默认值为`OFF`。 +- `IOS_DEVELOPMENT_ROOT`,`Developer`目录,可显式指定为`/path/to/platform/Developer`。若未显式指定,PaddlePaddle将会根据`IOS_PLATFORM`自动选择`Xcode`对应`platform`的`Developer`目录。 +- `IOS_SDK_ROOT`,所使用`SDK`的根目录,可显式指定为`/path/to/platform/Developer/SDKs/SDK`。若未显式指定,PaddlePaddle将会自动选择`IOS_DEVELOPMENT_ROOT`目录下最新的`SDK`版本。 + +其他配置参数: + +- `USE_EIGEN_FOR_BLAS`,是否使用Eigen库进行矩阵计算,在`IOS_USE_VECLIB_FOR_BLAS=OFF`时有效。可设置`ON/OFF`,默认值为`OFF`。 +- `HOST_C/CXX_COMPILER`,宿主机的C/C++编译器。默认值为环境变量`CC/CXX`的值;若环境变量`CC/CXX`未设置,则使用`cc/c++`编译器。 + +常用的cmake配置如下: + +```bash +cmake -DCMAKE_SYSTEM_NAME=iOS \ + -DIOS_PLATFORM=OS \ + -DIOS_ARCH="arm64" \ + -DIOS_ENABLE_BITCODE=ON \ + -DIOS_USE_VECLIB_FOR_BLAS=ON \ + -DCMAKE_INSTALL_PREFIX=your/path/to/install \ + -DWITH_C_API=ON \ + -DWITH_TESTING=OFF \ + -DWITH_SWIG_PY=OFF \ + .. +``` + +```bash +cmake -DCMAKE_SYSTEM_NAME=iOS \ + -DIOS_PLATFORM=SIMULATOR \ + -DIOS_ARCH="x86_64" \ + -DIOS_USE_VECLIB_FOR_BLAS=ON \ + -DCMAKE_INSTALL_PREFIX=your/path/to/install \ + -DWITH_C_API=ON \ + -DWITH_TESTING=OFF \ + -DWITH_SWIG_PY=OFF \ + .. +``` + +用户还可根据自己的需求设置其他编译参数。比如希望最小化生成库的大小,可以设置`CMAKE_BUILD_TYPE`为`MinSizeRel`;若希望得到最快的执行速度,则可设置`CMAKE_BUILD_TYPE`为`Release`。亦可以通过手动设置`CMAKE_C/CXX_FLAGS`来影响PaddlePaddle的编译过程。 + +**性能TIPS**,为了达到最快的计算速度,在CMake参数配置上,有以下建议: + +- 设置`CMAKE_BUILD_TYPE`为`Release` +- 设置`IOS_USE_VECLIB_FOR_BLAS=ON`,调用`vecLib`框架提供的BLAS函数进行矩阵计算。 + +## 编译和安装 + +CMake配置完成后,执行以下命令,PaddlePaddle将自动下载和编译所有第三方依赖库、编译和安装PaddlePaddle预测库。 + +``` +$ make +$ make install +``` + +注意:如果你曾在源码目录下编译过其他平台的PaddlePaddle库,请先使用`rm -rf`命令删除`third_party`目录和`build`目录,以确保所有的第三方依赖库和PaddlePaddle代码都是针对新的CMake配置重新编译的。 + +执行完安装命令后,`your/path/to/install`目录中会包含以下内容: + +- `include`目录,其中包含所有C-API的头文件 +- `lib`目录,其中包含PaddlePaddle的C-API静态库 +- `third_party`目录,其中包含所依赖的所有第三方库 + +注意,不同架构的PaddlePaddle库建议安装到不同的目录下,然后使用`lipo`工具将多个静态库合并成一个支持多个架构的fat库。 + +自此,PaddlePaddle库已经安装完成,用户可将合成的fat库用于深度学习相关的iOS App中,调用方法见C-API文档。 diff --git a/doc/howto/cross_compiling/cross_compiling_for_raspberry_cn.md b/doc/howto/cross_compiling/cross_compiling_for_raspberry_cn.md index 026c0c6f3b2a2ca322d063f38e1736a010e1197e..6e983645faaed1f67edaeeb82ddbef9cef6bb85f 100644 --- a/doc/howto/cross_compiling/cross_compiling_for_raspberry_cn.md +++ b/doc/howto/cross_compiling/cross_compiling_for_raspberry_cn.md @@ -59,4 +59,4 @@ make install 注意:如果你曾经在源码目录下编译过其他平台的PaddlePaddle库,请先使用`rm -rf`命令删除`third_party`目录和`build`目录,以确保所有的第三方依赖库和PaddlePaddle代码都是针对新的CMake配置重新编译的。 -执行完安装命令后,,`your/path/to/install`目录中会包含`include`和`lib`目录,其中`include`中包含C-API的头文件,`lib`中包含一个Raspberry Pi版本的库。 +执行完安装命令后,`your/path/to/install`目录中会包含`include`和`lib`目录,其中`include`中包含C-API的头文件,`lib`中包含一个Raspberry Pi版本的库。 diff --git a/doc/howto/cross_compiling/cross_compiling_for_raspberry_en.md b/doc/howto/cross_compiling/cross_compiling_for_raspberry_en.md index 09ac4733ec98c598dfd62f22beaf838320dc7531..3c1a5950ff9553bb725d5a96e3fdf2e5e9f6f95c 100644 --- a/doc/howto/cross_compiling/cross_compiling_for_raspberry_en.md +++ b/doc/howto/cross_compiling/cross_compiling_for_raspberry_en.md @@ -44,7 +44,7 @@ cmake -DCMAKE_SYSTEM_NAME=RPi \ .. ``` -To build the inference library, please set the argument WITH_API to ON: `WITH_C_API=ON`. +To build the inference library, please set the argument WITH\_C\_API to ON: `WITH_C_API=ON`. You can add more arguments. For example, to minimize the size of the generated inference library, you may use `CMAKE_BUILD_TYPE=MinSizeRel`. For performance optimization, you may use `CMAKE_BUILD_TYPE=Release`. diff --git a/paddle/framework/attribute.cc b/paddle/framework/attribute.cc index 29fe352ca450740e55ee87b63392e3aabac8aa40..b1e17936417e4ce09bace1d1a5d346d1c9cfa710 100644 --- a/paddle/framework/attribute.cc +++ b/paddle/framework/attribute.cc @@ -19,7 +19,7 @@ limitations under the License. */ namespace paddle { namespace framework { -Attribute GetAttrValue(const OpDesc::Attr& attr_desc, ProgramDesc* program) { +Attribute GetAttrValue(const OpDesc::Attr& attr_desc) { switch (attr_desc.type()) { case framework::AttrType::BOOLEAN: { return attr_desc.b(); @@ -61,13 +61,9 @@ Attribute GetAttrValue(const OpDesc::Attr& attr_desc, ProgramDesc* program) { } return val; } - case framework::AttrType::BLOCK: { - PADDLE_ENFORCE(program != nullptr, - "Need to specify ProgramDesc when get a block attr"); - return program->mutable_blocks(attr_desc.block_idx()); - } + default: + PADDLE_THROW("Unsupport attr type %d", attr_desc.type()); } - PADDLE_ENFORCE(false, "Unknown OpDesc::AttrDesc::type !"); return boost::blank(); } diff --git a/paddle/framework/attribute.h b/paddle/framework/attribute.h index 9744662b8f7229b0b17e910ae5cd997fa7d31e06..0641907d6ff7546df1601d3b0263ff42f4186968 100644 --- a/paddle/framework/attribute.h +++ b/paddle/framework/attribute.h @@ -32,7 +32,7 @@ inline AttrType AttrTypeID() { return static_cast(tmp.which() - 1); } -Attribute GetAttrValue(const OpDesc::Attr& attr_desc, ProgramDesc* desc); +Attribute GetAttrValue(const OpDesc::Attr& attr_desc); class AttrReader { public: diff --git a/paddle/framework/backward.cc b/paddle/framework/backward.cc index 150c152367e1bcdc095bce6f77fafdef601e1c47..dbd5a14f9f3b681f0b77b9bd507b34edfaa78766 100644 --- a/paddle/framework/backward.cc +++ b/paddle/framework/backward.cc @@ -18,6 +18,7 @@ #include #include #include +#include #include "paddle/framework/block_desc.h" #include "paddle/framework/op_registry.h" @@ -285,6 +286,15 @@ static bool AllGradInSet(const std::vector& names, return true; } +static std::string FwdName(const std::string& grad_name) { + auto pos = grad_name.find("@GRAD"); + if (pos == std::string::npos) { + return ""; + } else { + return grad_name.substr(0, pos); + } +} + static void CreateGradVarInBlock( size_t grad_op_start_index, const std::unordered_map& param_name_map, @@ -294,6 +304,7 @@ static void CreateGradVarInBlock( for (size_t op_index = grad_op_start_index; op_index < ops.size(); ++op_index) { bool need_infer_shape = false; + std::unordered_set new_vars; ForEachVarName(ops[op_index]->Outputs(), [&](const std::string& grad_var_name) { if (block_desc->HasVar(grad_var_name)) { @@ -301,8 +312,7 @@ static void CreateGradVarInBlock( } need_infer_shape = true; auto var = block_desc->Var(grad_var_name); - // FIXME(qiao) infer the datatype - var->SetDataType(framework::DataType::FP32); + new_vars.insert(var->Name()); auto it = param_name_map.find(grad_var_name); if (it == param_name_map.end()) { return false; @@ -316,6 +326,21 @@ static void CreateGradVarInBlock( }); if (need_infer_shape) { ops[op_index]->InferVarType(block_desc); + for (auto& arg : ops[op_index]->OutputArgumentNames()) { + if (new_vars.find(arg) == new_vars.end()) { + continue; + } + auto pname = FwdName(arg); + auto* param = block_desc->FindVar(pname); + auto* grad = block_desc->FindVar(arg); + if (param == nullptr) { + LOG(WARNING) << "Cannot find forward variable of " << arg + << ". Set its gradient to FP32"; + grad->SetDataType(DataType::FP32); + } else { + grad->SetDataType(param->GetDataType()); + } + } ops[op_index]->InferShape(*block_desc); } } @@ -368,7 +393,7 @@ std::vector> MakeBlockBackward( ProgramDescBind& program_desc, int block_idx, std::unordered_set* no_grad_vars, std::unordered_map* grad_to_var) { - BlockDescBind* cur_block = program_desc.Block(block_idx); + BlockDescBind* cur_block = program_desc.MutableBlock(block_idx); std::vector op_descs = cur_block->AllOps(); std::unordered_map> dup_out_ops; size_t grad_desc_idx = 0; @@ -443,7 +468,7 @@ ParamGradInfoMap AppendBackward( } const int root_block_idx = 0; - auto root_block = program_desc.Block(root_block_idx); + auto root_block = program_desc.MutableBlock(root_block_idx); // insert fill one op for target // TODO(qiao) add some check to the target. @@ -492,7 +517,7 @@ ParamGradInfoMap AppendBackward( CreateGradVarInBlock(forward_op_num, grad_to_var, root_block, &retv); for (size_t block_index = forward_block_num; block_index < program_desc.Size(); ++block_index) { - CreateGradVarInBlock(0, grad_to_var, program_desc.Block(block_index), + CreateGradVarInBlock(0, grad_to_var, program_desc.MutableBlock(block_index), &retv); } return retv; diff --git a/paddle/framework/backward_test.cc b/paddle/framework/backward_test.cc index 421f1321948235aa0c1acd2e24037b34716e449a..4e8d630c2634682ff63b38182108eadebb5c7ff9 100644 --- a/paddle/framework/backward_test.cc +++ b/paddle/framework/backward_test.cc @@ -499,7 +499,7 @@ TEST(Backward, linear_net_intermediate_variable_has_no_grad) { TEST(Backward, simple_single_op) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); f::OpDescBind *op = block->AppendOp(); op->SetType("rowwise_add"); @@ -535,7 +535,7 @@ TEST(Backward, simple_single_op) { TEST(Backward, default_attribute) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); f::OpDescBind *op = block->AppendOp(); op->SetType("mul"); op->SetInput("X", {"x"}); @@ -561,7 +561,7 @@ TEST(Backward, default_attribute) { TEST(Backward, simple_mult_op) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); f::OpDescBind *op1 = block->AppendOp(); op1->SetType("rowwise_add"); op1->SetInput("X", {"x1"}); @@ -644,7 +644,7 @@ TEST(Backward, simple_mult_op) { TEST(Backward, intermedia_var_no_grad) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); f::OpDescBind *op1 = block->AppendOp(); op1->SetType("rowwise_add"); op1->SetInput("X", {"x1"}); @@ -714,7 +714,7 @@ TEST(Backward, intermedia_var_no_grad) { TEST(Backward, var_no_grad) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); f::OpDescBind *op1 = block->AppendOp(); op1->SetType("mult_in_out"); op1->SetInput("X", {"x1"}); @@ -790,7 +790,7 @@ TEST(Backward, var_no_grad) { TEST(Backward, shared_var) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); f::OpDescBind *op1 = block->AppendOp(); op1->SetType("rowwise_add"); op1->SetInput("X", {"x1"}); @@ -880,7 +880,7 @@ TEST(Backward, shared_var) { TEST(Backward, half_backward) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); auto *op1 = block->AppendOp(); op1->SetType("minus"); op1->SetInput("X", {"a"}); diff --git a/paddle/framework/block_desc.cc b/paddle/framework/block_desc.cc index b73a20cc89d936c2beee6a39cdf71cda3915bcdc..9e3d597f3a2c84623a1ce9e4b6f4b956cffde211 100644 --- a/paddle/framework/block_desc.cc +++ b/paddle/framework/block_desc.cc @@ -113,7 +113,7 @@ BlockDescBind *BlockDescBind::ParentBlock() const { if (this->desc_->parent_idx() == kNoneBlockIndex) { return nullptr; } - return prog_->Block(static_cast(this->desc_->parent_idx())); + return prog_->MutableBlock(static_cast(this->desc_->parent_idx())); } BlockDesc *BlockDescBind::Proto() { diff --git a/paddle/framework/executor.cc b/paddle/framework/executor.cc index 3e9d8b3084e8a76f3d5b8367b0ec45ed74dec42f..9bf2311dc835c701c9311880b8adba486a7d446c 100644 --- a/paddle/framework/executor.cc +++ b/paddle/framework/executor.cc @@ -73,33 +73,32 @@ static void CreateTensor(Variable* var, VarDesc::VarType var_type) { } } -void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) { +void Executor::Run(const ProgramDescBind& pdesc, Scope* scope, int block_id) { // TODO(tonyyang-svail): // - only runs on the first device (i.e. no interdevice communication) // - will change to use multiple blocks for RNN op and Cond Op - PADDLE_ENFORCE_GT(pdesc.blocks_size(), block_id); - auto& block = pdesc.blocks(block_id); + PADDLE_ENFORCE_LT(block_id, pdesc.Size()); + auto& block = pdesc.Block(block_id); auto& device = device_contexts_[0]; Scope& local_scope = scope->NewScope(); - for (auto& var : block.vars()) { - if (var.persistable()) { - auto* ptr = scope->Var(var.name()); - CreateTensor(ptr, var.type()); - VLOG(3) << "Create Variable " << var.name() + for (auto& var : block.AllVars()) { + if (var->Persistable()) { + auto* ptr = scope->Var(var->Name()); + CreateTensor(ptr, var->GetType()); + VLOG(3) << "Create Variable " << var->Name() << " global, which pointer is " << ptr; } else { - auto* ptr = local_scope.Var(var.name()); - CreateTensor(ptr, var.type()); - VLOG(3) << "Create Variable " << var.name() + auto* ptr = local_scope.Var(var->Name()); + CreateTensor(ptr, var->GetType()); + VLOG(3) << "Create Variable " << var->Name() << " locally, which pointer is " << ptr; } } - for (auto& op_desc : block.ops()) { - auto op = paddle::framework::OpRegistry::CreateOp( - op_desc, const_cast(&pdesc)); + for (auto& op_desc : block.AllOps()) { + auto op = paddle::framework::OpRegistry::CreateOp(*op_desc); op->Run(local_scope, *device); } diff --git a/paddle/framework/executor.h b/paddle/framework/executor.h index 793ee954e25f7da6c9d04ea6acc2ad78812e8329..c78bfe8f9f07f1324515f0baaca4a94cc0fe844e 100644 --- a/paddle/framework/executor.h +++ b/paddle/framework/executor.h @@ -14,8 +14,8 @@ limitations under the License. */ #pragma once -#include "paddle/framework/framework.pb.h" #include "paddle/framework/op_info.h" +#include "paddle/framework/program_desc.h" #include "paddle/framework/scope.h" #include "paddle/framework/tensor.h" @@ -34,7 +34,7 @@ class Executor { * ProgramDesc * Scope */ - void Run(const ProgramDesc&, Scope*, int); + void Run(const ProgramDescBind&, Scope*, int); private: std::vector device_contexts_; diff --git a/paddle/framework/op_desc.cc b/paddle/framework/op_desc.cc index c2d6f124ad292bf46b4e7e9a1dcc2984aae7fcda..0779137639e6cd9f6ecf3bbbc24d081cae3de9c0 100644 --- a/paddle/framework/op_desc.cc +++ b/paddle/framework/op_desc.cc @@ -52,6 +52,22 @@ class CompileTimeInferShapeContext : public InferShapeContext { const std::vector &Outputs( const std::string &name) const override; + void ShareLoD(const std::string &in, const std::string &out, size_t i = 0, + size_t j = 0) const override { + PADDLE_ENFORCE_LT(i, Inputs(in).size()); + PADDLE_ENFORCE_LT(j, Outputs(out).size()); + auto *in_var = block_.FindVarRecursive(Inputs(in)[i]); + auto *out_var = block_.FindVarRecursive(Outputs(out)[j]); + if (in_var->GetType() != VarDesc::LOD_TENSOR) { + VLOG(3) << "input " << in << "is not LodTensor"; + return; + } + PADDLE_ENFORCE_EQ(in_var->GetType(), VarDesc::LOD_TENSOR, + "The %d-th output of Output(%s) must be LoDTensor.", j, + out); + in_var->SetLoDLevel(out_var->GetLodLevel()); + } + private: DDim GetDim(const std::string &name) const override; @@ -98,7 +114,12 @@ OpDescBind::OpDescBind(const OpDesc &desc, ProgramDescBind *prog) // restore attrs_ for (const OpDesc::Attr &attr : desc_.attrs()) { std::string attr_name = attr.name(); - attrs_[attr_name] = GetAttrValue(attr, prog->Proto()); + if (attr.type() != AttrType::BLOCK) { + attrs_[attr_name] = GetAttrValue(attr); + } else { + auto bid = attr.block_idx(); + attrs_[attr_name] = prog->MutableBlock(bid); + } } } @@ -172,8 +193,7 @@ void OpDescBind::SetAttr(const std::string &name, const Attribute &v) { } void OpDescBind::SetBlockAttr(const std::string &name, BlockDescBind &block) { - BlockDesc *desc = block.Proto(); - this->attrs_[name] = desc; + this->attrs_[name] = █ need_update_ = true; } @@ -192,7 +212,7 @@ Attribute OpDescBind::GetAttr(const std::string &name) const { int OpDescBind::GetBlockAttr(const std::string &name) const { auto it = attrs_.find(name); PADDLE_ENFORCE(it != attrs_.end(), "Attribute %s is not found", name); - return boost::get(it->second)->idx(); + return boost::get(it->second)->ID(); } const std::unordered_map &OpDescBind::GetAttrMap() diff --git a/paddle/framework/op_registry.cc b/paddle/framework/op_registry.cc index c2f2438edf6daadf26cbc6db37f6668739ab1726..8dedd873aad648174b770b84e5232cd17b577e72 100644 --- a/paddle/framework/op_registry.cc +++ b/paddle/framework/op_registry.cc @@ -43,13 +43,15 @@ static VariableNameMap ConvertOpDescVarsToVarNameMap( return ret_val; } -std::unique_ptr OpRegistry::CreateOp(const OpDesc& op_desc, - ProgramDesc* program) { +std::unique_ptr OpRegistry::CreateOp(const OpDesc& op_desc) { + VLOG(1) << "CreateOp directly from OpDesc is deprecated. It should only be" + "used in unit tests. Use CreateOp(const OpDescBind& op_desc) " + "instead."; VariableNameMap inputs = ConvertOpDescVarsToVarNameMap(op_desc.inputs()); VariableNameMap outputs = ConvertOpDescVarsToVarNameMap(op_desc.outputs()); AttributeMap attrs; for (auto& attr : op_desc.attrs()) { - attrs[attr.name()] = GetAttrValue(attr, program); + attrs[attr.name()] = GetAttrValue(attr); } return CreateOp(op_desc.type(), inputs, outputs, attrs); diff --git a/paddle/framework/op_registry.h b/paddle/framework/op_registry.h index 19a9fc3802a2f2348ad7d50a267615ed70bbc4fe..2bb5e0e8ec29fb2df81549650aa0c65bc1e51c49 100644 --- a/paddle/framework/op_registry.h +++ b/paddle/framework/op_registry.h @@ -77,8 +77,7 @@ class OpRegistry { const VariableNameMap& outputs, AttributeMap attrs); - static std::unique_ptr CreateOp(const OpDesc& op_desc, - ProgramDesc* program); + static std::unique_ptr CreateOp(const OpDesc& op_desc); static std::unique_ptr CreateOp(const OpDescBind& op_desc); }; diff --git a/paddle/framework/op_registry_test.cc b/paddle/framework/op_registry_test.cc index 6289125d7c782e542e5c55e1d4403836351b7e05..b860fe6cac773d1e85adecc43f5dfec42b6c7661 100644 --- a/paddle/framework/op_registry_test.cc +++ b/paddle/framework/op_registry_test.cc @@ -74,7 +74,7 @@ TEST(OpRegistry, CreateOp) { attr->set_type(paddle::framework::AttrType::FLOAT); attr->set_f(scale); - auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + auto op = paddle::framework::OpRegistry::CreateOp(op_desc); paddle::framework::Scope scope; paddle::platform::CPUDeviceContext dev_ctx; op->Run(scope, dev_ctx); @@ -95,7 +95,7 @@ TEST(OpRegistry, IllegalAttr) { bool caught = false; try { - paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + paddle::framework::OpRegistry::CreateOp(op_desc); } catch (paddle::platform::EnforceNotMet err) { caught = true; std::string msg = "larger_than check fail"; @@ -115,7 +115,7 @@ TEST(OpRegistry, DefaultValue) { ASSERT_TRUE(op_desc.IsInitialized()); - auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + auto op = paddle::framework::OpRegistry::CreateOp(op_desc); paddle::framework::Scope scope; paddle::platform::CPUDeviceContext dev_ctx; op->Run(scope, dev_ctx); @@ -131,7 +131,7 @@ TEST(OpRegistry, CustomChecker) { // attr 'test_attr' is not set bool caught = false; try { - paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + paddle::framework::OpRegistry::CreateOp(op_desc); } catch (paddle::platform::EnforceNotMet err) { caught = true; std::string msg = "Attribute 'test_attr' is required!"; @@ -149,7 +149,7 @@ TEST(OpRegistry, CustomChecker) { attr->set_i(3); caught = false; try { - paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + paddle::framework::OpRegistry::CreateOp(op_desc); } catch (paddle::platform::EnforceNotMet err) { caught = true; std::string msg = "'test_attr' must be even!"; @@ -166,7 +166,7 @@ TEST(OpRegistry, CustomChecker) { attr->set_name("test_attr"); attr->set_type(paddle::framework::AttrType::INT); attr->set_i(4); - auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + auto op = paddle::framework::OpRegistry::CreateOp(op_desc); paddle::platform::CPUDeviceContext dev_ctx; paddle::framework::Scope scope; op->Run(scope, dev_ctx); diff --git a/paddle/framework/operator.cc b/paddle/framework/operator.cc index 222a252dc409bf30d5d6abea95156b41cfcd221a..3be26fdc4fb6ebdd0ec427a2248b0f97d9edff01 100644 --- a/paddle/framework/operator.cc +++ b/paddle/framework/operator.cc @@ -37,32 +37,32 @@ ExecutionContext::GetEigenDevice() const { std::string OperatorBase::Input(const std::string& name) const { auto& ins = Inputs(name); PADDLE_ENFORCE_LE(ins.size(), 1UL, - "Op %s input %s should contain only one variable", type_, - name); + "Operator %s's input %s should contain only one variable.", + type_, name); return ins.empty() ? kEmptyVarName : ins[0]; } const std::vector& OperatorBase::Inputs( const std::string& name) const { auto it = inputs_.find(name); - PADDLE_ENFORCE(it != inputs_.end(), "Op %s do not have input %s", type_, - name); + PADDLE_ENFORCE(it != inputs_.end(), "Operator %s does not have the input %s.", + type_, name); return it->second; } std::string OperatorBase::Output(const std::string& name) const { auto& outs = Outputs(name); PADDLE_ENFORCE_LE(outs.size(), 1UL, - "Op %s output %s should contain only one variable", type_, - name); + "Operator %s's output %s should contain only one variable.", + type_, name); return outs.empty() ? kEmptyVarName : outs[0]; } const std::vector& OperatorBase::Outputs( const std::string& name) const { auto it = outputs_.find(name); - PADDLE_ENFORCE(it != outputs_.end(), "Op %s does not have output called %s", - type_, name); + PADDLE_ENFORCE(it != outputs_.end(), + "Operator %s does not have an output called %s.", type_, name); return it->second; } @@ -351,6 +351,20 @@ class RuntimeInferShapeContext : public InferShapeContext { return op_.Outputs(name); } + void ShareLoD(const std::string& in, const std::string& out, size_t i = 0, + size_t j = 0) const override { + PADDLE_ENFORCE_LT(i, Inputs(in).size()); + PADDLE_ENFORCE_LT(j, Outputs(out).size()); + Variable* in_var = scope_.FindVar(Inputs(in)[i]); + Variable* out_var = scope_.FindVar(Outputs(out)[j]); + if (!in_var->IsType()) return; + PADDLE_ENFORCE(out_var->IsType(), + "The %d-th output of Output(%s) must be LoDTensor.", j, out); + auto in_tensor = in_var->Get(); + auto* out_tensor = out_var->GetMutable(); + out_tensor->set_lod(in_tensor.lod()); + } + private: DDim GetDim(const std::string& name) const override { Variable* var = scope_.FindVar(name); diff --git a/paddle/framework/operator.h b/paddle/framework/operator.h index 93885fa3028e072bc0bd021ea9287087678f3621..b8a7040ed024fc7b19980beef3d8b367dfdd7f50 100644 --- a/paddle/framework/operator.h +++ b/paddle/framework/operator.h @@ -427,7 +427,8 @@ class OperatorWithKernel : public OperatorBase { int tmp = static_cast(ToDataType(t->type())); VLOG(3) << "Input " << ipt_name << " with data_type " << tmp; PADDLE_ENFORCE(tmp == data_type || data_type == -1, - "DataType of Paddle Op %s must be same.", Type()); + "DataType of Paddle Op %s must be the same.", + Type()); data_type = tmp; } } diff --git a/paddle/framework/operator_test.cc b/paddle/framework/operator_test.cc index 3c07621293389fc7803b0295d9d30b2c12d6e327..42e0d52eed3911d8e684e76a88bc690ca0783ce5 100644 --- a/paddle/framework/operator_test.cc +++ b/paddle/framework/operator_test.cc @@ -83,7 +83,7 @@ TEST(OperatorBase, all) { paddle::platform::CPUDeviceContext device_context; paddle::framework::Scope scope; - auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + auto op = paddle::framework::OpRegistry::CreateOp(op_desc); scope.Var("OUT1"); ASSERT_EQ(paddle::framework::op_run_num, 0); op->Run(scope, device_context); @@ -208,7 +208,7 @@ TEST(OpKernel, all) { paddle::platform::CPUDeviceContext cpu_device_context; paddle::framework::Scope scope; - auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + auto op = paddle::framework::OpRegistry::CreateOp(op_desc); ASSERT_EQ(paddle::framework::cpu_kernel_run_num, 0); op->Run(scope, cpu_device_context); ASSERT_EQ(paddle::framework::cpu_kernel_run_num, 1); @@ -244,7 +244,7 @@ TEST(OpKernel, multi_inputs) { scope.Var("y0")->GetMutable(); scope.Var("y1")->GetMutable(); - auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + auto op = paddle::framework::OpRegistry::CreateOp(op_desc); op->Run(scope, cpu_device_context); } diff --git a/paddle/framework/program_desc.h b/paddle/framework/program_desc.h index ce1721472d9046f50b7fc88253fa3f2dbaaf51a8..b1cb086de4345902482d8254b8aeec041ecf81bc 100644 --- a/paddle/framework/program_desc.h +++ b/paddle/framework/program_desc.h @@ -37,7 +37,9 @@ class ProgramDescBind { BlockDescBind *AppendBlock(const BlockDescBind &parent); - BlockDescBind *Block(size_t idx) { return blocks_[idx].get(); } + BlockDescBind *MutableBlock(size_t idx) { return blocks_[idx].get(); } + + const BlockDescBind &Block(size_t idx) const { return *blocks_[idx]; } size_t Size() const { return blocks_.size(); } diff --git a/paddle/framework/program_desc_test.cc b/paddle/framework/program_desc_test.cc index d28c2a0bff932f5aa37c69231495895dacb07bb3..83e7286e0ec3639fa589b0958922543a3ba16a00 100644 --- a/paddle/framework/program_desc_test.cc +++ b/paddle/framework/program_desc_test.cc @@ -20,7 +20,7 @@ namespace paddle { namespace framework { TEST(ProgramDesc, copy_ctor) { ProgramDescBind program; - auto* global_block = program.Block(0); + auto* global_block = program.MutableBlock(0); auto* x = global_block->Var("X"); x->SetType(VarDesc_VarType_LOD_TENSOR); x->SetLoDLevel(0); @@ -44,7 +44,7 @@ TEST(ProgramDesc, copy_ctor) { ProgramDescBind program_copy(program); - auto* global_block_copy = program_copy.Block(0); + auto* global_block_copy = program_copy.MutableBlock(0); ASSERT_NE(global_block, global_block_copy); auto assert_same_var = [&](const std::string& name, VarDescBind* var_before) { @@ -82,7 +82,7 @@ TEST(ProgramDesc, copy_ctor) { TEST(ProgramDescBind, serialize_and_deserialize) { ProgramDescBind program_origin; - auto* global_block = program_origin.Block(0); + auto* global_block = program_origin.MutableBlock(0); auto* x = global_block->Var("X"); x->SetType(VarDesc_VarType_LOD_TENSOR); x->SetLoDLevel(0); @@ -108,7 +108,7 @@ TEST(ProgramDescBind, serialize_and_deserialize) { program_origin.Proto()->SerializeToString(&binary_str); ProgramDescBind program_restored(binary_str); - auto* global_block_restored = program_restored.Block(0); + auto* global_block_restored = program_restored.MutableBlock(0); ASSERT_NE(global_block, global_block_restored); auto assert_same_var = [&](const std::string& name, VarDescBind* var_before) { diff --git a/paddle/framework/prune_test.cc b/paddle/framework/prune_test.cc index cadd114fbc3de897a13504e665ce464e83d312ff..5988874809f51c09b3d3d279be6c1e8d43d7a782 100644 --- a/paddle/framework/prune_test.cc +++ b/paddle/framework/prune_test.cc @@ -52,7 +52,7 @@ void AddOp(const std::string &type, const f::VariableNameMap &inputs, TEST(Prune, one_operator) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); AddOp("one_one", {{"input", {"a"}}}, {{"output", {"b"}}}, {}, block); @@ -69,7 +69,7 @@ TEST(Prune, one_operator) { TEST(Prune, forward) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); AddOp("one_one", {{"input", {"a"}}}, {{"output", {"b"}}}, {}, block); AddOp("one_one", {{"input", {"b"}}}, {{"output", {"c"}}}, {}, block); @@ -88,7 +88,7 @@ TEST(Prune, forward) { TEST(Prune, multi_input_op) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); AddOp("one_one", {{"input", {"a0"}}}, {{"output", {"b0"}}}, {}, block); AddOp("one_one", {{"input", {"a1"}}}, {{"output", {"b1"}}}, {}, block); @@ -106,7 +106,7 @@ TEST(Prune, multi_input_op) { TEST(Prune, multi_output_op) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); AddOp("one_two", {{"input", {"a"}}}, {{"output", {"b", "c"}}}, {}, block); AddOp("one_one", {{"input", {"b"}}}, {{"output", {"b1"}}}, {}, block); @@ -122,7 +122,7 @@ TEST(Prune, multi_output_op) { TEST(Prune, multi_target) { f::ProgramDescBind program; - f::BlockDescBind *block = program.Block(0); + f::BlockDescBind *block = program.MutableBlock(0); AddOp("one_two", {{"input", {"a"}}}, {{"output", {"b", "c"}}}, {}, block); AddOp("one_one", {{"input", {"b"}}}, {{"output", {"b1"}}}, {}, block); diff --git a/paddle/framework/shape_inference.cc b/paddle/framework/shape_inference.cc index 33a1d0b9b217c5d2a4b0fb63f427529e7988b24e..8169df8e4629e2d02d3dabcd6a8a102ad0077a81 100644 --- a/paddle/framework/shape_inference.cc +++ b/paddle/framework/shape_inference.cc @@ -28,9 +28,6 @@ void InferShapeContext::SetOutputsDim( SetDims(names, dims); } -void InferShapeContext::ShareLoD(const std::string &in, const std::string &out, - size_t i, size_t j) const {} - std::vector InferShapeContext::GetDims( const std::vector &names) const { std::vector ret; diff --git a/paddle/framework/shape_inference.h b/paddle/framework/shape_inference.h index f1f1e44bccd771be81cad7c28efe9b1b885eef6b..6f19900ef1a3e88fe78d457a03c344ea586ab551 100644 --- a/paddle/framework/shape_inference.h +++ b/paddle/framework/shape_inference.h @@ -43,9 +43,8 @@ class InferShapeContext { virtual const std::vector &Outputs( const std::string &name) const = 0; - // TODO(qiao) implement this function - void ShareLoD(const std::string &in, const std::string &out, size_t i = 0, - size_t j = 0) const; + virtual void ShareLoD(const std::string &in, const std::string &out, + size_t i = 0, size_t j = 0) const = 0; protected: virtual framework::DDim GetDim(const std::string &name) const = 0; diff --git a/paddle/framework/tensor.h b/paddle/framework/tensor.h index 7b9a5b75e1087a1cc3b6c6c7a6e4dc185c32dd42..9eab67561a42b3fb4e22d8475ad5eeb146a72f1c 100644 --- a/paddle/framework/tensor.h +++ b/paddle/framework/tensor.h @@ -118,10 +118,12 @@ class Tensor { const platform::DeviceContext& ctx); /** - * @brief Return the slice of the tensor. + * @brief Return a sub-tensor of the given tensor. * - * @param[in] begin_idx The begin index of the slice. - * @param[in] end_idx The end index of the slice. + * @param[in] begin_idx The index of the start row(inclusive) to slice. + * The index number begins from 0. + * @param[in] end_idx The index of the end row(exclusive) to slice. + * The index number begins from 0. */ inline Tensor Slice(const int& begin_idx, const int& end_idx) const; diff --git a/paddle/framework/tensor_impl.h b/paddle/framework/tensor_impl.h index 29ac683f48fcde4dd3b5ad7f04b5d1d7434706ba..bcccdd5881775e199297dce7e70aaf6aae62d95a 100644 --- a/paddle/framework/tensor_impl.h +++ b/paddle/framework/tensor_impl.h @@ -112,9 +112,10 @@ inline void* Tensor::mutable_data(platform::Place place, std::type_index type) { if (holder_ != nullptr) { holder_->set_type(type); } - PADDLE_ENFORCE_GT(numel(), 0, - "Tensor's numel must be larger than zero to call " - "Tensor::mutable_data. Call Tensor::set_dim first."); + PADDLE_ENFORCE_GT( + numel(), 0, + "When calling this method, the Tensor's numel must be larger than zero. " + "Please check Tensor::Resize has been called first."); int64_t size = numel() * SizeOfType(type); /* some versions of boost::variant don't have operator!= */ if (holder_ == nullptr || !(holder_->place() == place) || @@ -229,10 +230,12 @@ inline void Tensor::CopyFromVector(const std::vector& src, inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const { check_memory_size(); - PADDLE_ENFORCE_GE(begin_idx, 0, "Slice begin index is less than zero."); - PADDLE_ENFORCE_LE(end_idx, dims_[0], "Slice end index is out of bound."); - PADDLE_ENFORCE_LT(begin_idx, end_idx, - "Begin index must be less than end index."); + PADDLE_ENFORCE_GE(begin_idx, 0, + "The start row index must be greater than 0."); + PADDLE_ENFORCE_LE(end_idx, dims_[0], "The end row index is out of bound."); + PADDLE_ENFORCE_LT( + begin_idx, end_idx, + "The start row index must be lesser than the end row index."); if (dims_[0] == 1) { return *this; diff --git a/paddle/framework/type_defs.h b/paddle/framework/type_defs.h index c38c4a8ae9a46c8bda913e7643e812592de68e6e..afeeb1914ac30188b93c3b9da30bb5ceaf74416e 100644 --- a/paddle/framework/type_defs.h +++ b/paddle/framework/type_defs.h @@ -36,7 +36,7 @@ using VariableNameMap = std::map>; using Attribute = boost::variant, std::vector, std::vector, bool, - std::vector, BlockDesc*>; + std::vector, BlockDescBind*>; using AttributeMap = std::unordered_map; diff --git a/paddle/framework/var_type_inference_test.cc b/paddle/framework/var_type_inference_test.cc index 918de1fd055e32888f71ffea1f33993ba1210e86..9035e63fa48ffdf7c72061b0a4248538d7a357e4 100644 --- a/paddle/framework/var_type_inference_test.cc +++ b/paddle/framework/var_type_inference_test.cc @@ -63,41 +63,43 @@ namespace framework { TEST(InferVarType, sum_op) { ProgramDescBind prog; - auto *op = prog.Block(0)->AppendOp(); + auto *op = prog.MutableBlock(0)->AppendOp(); op->SetType("sum"); op->SetInput("X", {"test_a", "test_b", "test_c"}); op->SetOutput("Out", {"test_out"}); - prog.Block(0)->Var("test_a")->SetType(VarDesc::SELECTED_ROWS); - prog.Block(0)->Var("test_b")->SetType(VarDesc::SELECTED_ROWS); - prog.Block(0)->Var("test_c")->SetType(VarDesc::SELECTED_ROWS); - prog.Block(0)->Var("test_out"); + prog.MutableBlock(0)->Var("test_a")->SetType(VarDesc::SELECTED_ROWS); + prog.MutableBlock(0)->Var("test_b")->SetType(VarDesc::SELECTED_ROWS); + prog.MutableBlock(0)->Var("test_c")->SetType(VarDesc::SELECTED_ROWS); + prog.MutableBlock(0)->Var("test_out"); - op->InferVarType(prog.Block(0)); + op->InferVarType(prog.MutableBlock(0)); - ASSERT_EQ(VarDesc::SELECTED_ROWS, prog.Block(0)->Var("test_out")->GetType()); + ASSERT_EQ(VarDesc::SELECTED_ROWS, + prog.MutableBlock(0)->Var("test_out")->GetType()); - prog.Block(0)->Var("test_b")->SetType(VarDesc::LOD_TENSOR); - op->InferVarType(prog.Block(0)); - ASSERT_EQ(VarDesc::LOD_TENSOR, prog.Block(0)->Var("test_out")->GetType()); + prog.MutableBlock(0)->Var("test_b")->SetType(VarDesc::LOD_TENSOR); + op->InferVarType(prog.MutableBlock(0)); + ASSERT_EQ(VarDesc::LOD_TENSOR, + prog.MutableBlock(0)->Var("test_out")->GetType()); } TEST(InferVarType, sum_op_without_infer_var_type) { ProgramDescBind prog; - auto *op = prog.Block(0)->AppendOp(); + auto *op = prog.MutableBlock(0)->AppendOp(); op->SetType("sum_without_infer_var_type"); op->SetInput("X", {"test2_a", "test2_b", "test2_c"}); op->SetOutput("Out", {"test2_out"}); - prog.Block(0)->Var("test2_a")->SetType(VarDesc::SELECTED_ROWS); - prog.Block(0)->Var("test2_b")->SetType(VarDesc::SELECTED_ROWS); - prog.Block(0)->Var("test2_c")->SetType(VarDesc::SELECTED_ROWS); - prog.Block(0)->Var("test2_out"); + prog.MutableBlock(0)->Var("test2_a")->SetType(VarDesc::SELECTED_ROWS); + prog.MutableBlock(0)->Var("test2_b")->SetType(VarDesc::SELECTED_ROWS); + prog.MutableBlock(0)->Var("test2_c")->SetType(VarDesc::SELECTED_ROWS); + prog.MutableBlock(0)->Var("test2_out"); - op->InferVarType(prog.Block(0)); + op->InferVarType(prog.MutableBlock(0)); ASSERT_EQ(VarDesc_VarType_LOD_TENSOR, - prog.Block(0)->Var("test2_out")->GetType()); + prog.MutableBlock(0)->Var("test2_out")->GetType()); } } // namespace framework diff --git a/paddle/gserver/layers/CRFLayer.cpp b/paddle/gserver/layers/CRFLayer.cpp index 0b544420097e9150f8489731b6379dea633e992c..867303b4fa0d490297ab152fc2ad266e92e29baf 100644 --- a/paddle/gserver/layers/CRFLayer.cpp +++ b/paddle/gserver/layers/CRFLayer.cpp @@ -101,8 +101,10 @@ void CRFLayer::backward(const UpdateCallback& callback) { : real(1.0f); instanceWeight *= coeff_; - MatrixPtr grad = output.grad->subRowMatrix(starts[i], starts[i + 1]); - grad->add(*crfs_[i].getXGrad(), real(1.0f), instanceWeight); + if (output.grad) { + MatrixPtr grad = output.grad->subRowMatrix(starts[i], starts[i + 1]); + grad->add(*crfs_[i].getXGrad(), real(1.0f), instanceWeight); + } if (needWGrad) { weight_->getWGrad()->add( *crfs_[i].getWGrad(), real(1.0f), instanceWeight); diff --git a/paddle/gserver/layers/LinearChainCRF.cpp b/paddle/gserver/layers/LinearChainCRF.cpp index dc3dc156792bdf32c3b948a292597d0e9eca5d8b..abaa1802b763a49f748214dbd4dec1d2bac53b59 100644 --- a/paddle/gserver/layers/LinearChainCRF.cpp +++ b/paddle/gserver/layers/LinearChainCRF.cpp @@ -102,7 +102,6 @@ real LinearChainCRF::forward(real* x, int* s, int length) { } void LinearChainCRF::backward(real* x, int* s, int length, bool needWGrad) { - MatrixPtr matX = Matrix::create(x, length, numClasses_); Matrix::resizeOrCreate(matGrad_, length, numClasses_); Matrix::resizeOrCreate(beta_, length, numClasses_); real* b = b_->getData(); diff --git a/paddle/gserver/layers/SequenceReshapeLayer.cpp b/paddle/gserver/layers/SequenceReshapeLayer.cpp index 433592953b220eda4db4634124a57a2074cef4c0..822974407283c9ee6d0efee71bc945bc418b1942 100644 --- a/paddle/gserver/layers/SequenceReshapeLayer.cpp +++ b/paddle/gserver/layers/SequenceReshapeLayer.cpp @@ -70,11 +70,23 @@ void SequenceReshapeLayer::forward(PassType passType) { size_t outDim = getSize(); size_t numSequences = input.getNumSequences(); - auto startPositions = input.sequenceStartPositions->getVector(false); - const int* starts = startPositions->getData(); - CHECK_EQ(starts[numSequences], input.getBatchSize()); - CHECK_EQ(numSequences, startPositions->getSize() - 1); + // by default, we assume each instance as a sequence + IVectorPtr seqStarts; + IVector::resizeOrCreate(seqStarts, input.getBatchSize() + 1, false); + int* startsData = seqStarts->getData(); + for (int i = 0; i < input.getBatchSize() + 1; i++) { + startsData[i] = i; + } + const int* starts = startsData; + + // if there is sequence, then use start positions + if (input.sequenceStartPositions) { + auto startPositions = input.sequenceStartPositions->getVector(false); + starts = startPositions->getData(); + CHECK_EQ(starts[numSequences], input.getBatchSize()); + CHECK_EQ(numSequences, startPositions->getSize() - 1); + } for (size_t seqID = 0; seqID < numSequences; seqID++) { size_t inNumIns = starts[seqID + 1] - starts[seqID]; diff --git a/paddle/gserver/tests/MKLDNNTester.cpp b/paddle/gserver/tests/MKLDNNTester.cpp index 73b7e8857f35d194e71b2b5b341f89b77fd1f8b0..7670cb88fb67dec0ab1d170458d102da166dc7b6 100644 --- a/paddle/gserver/tests/MKLDNNTester.cpp +++ b/paddle/gserver/tests/MKLDNNTester.cpp @@ -273,31 +273,37 @@ void MKLDNNTester::printVector(const VectorPtr& v) { VLOG(MKLDNN_ALL) << std::endl << ostr.str(); } -double MKLDNNTester::getDelta(const real* d1, - const real* d2, +double MKLDNNTester::getDelta(const real* refer, + const real* value, size_t len, const float failRate, const float thres) { double delta = 0, sum = 0; int failCnt = 0; const double eps = 1e-5; - double maxOut = 0; + double maxRatio = 0; for (size_t i = 0; i < len; ++i) { - double ref = fabs(d2[i]); - double diff = fabs(d1[i] - d2[i]); + double ref = fabs(refer[i]); + double val = fabs(value[i]); + double diff = fabs(refer[i] - value[i]); delta += diff; sum += ref; - if (ref > eps && fabs(d1[i]) > eps && diff / ref > thres) { - maxOut = std::max(maxOut, diff / ref); + if (ref < eps && val < eps) { // both values are very small + continue; + } + double ratio = diff / ref; + if (ratio > thres) { + maxRatio = std::max(maxRatio, ratio); failCnt++; } } - EXPECT_TRUE(std::isnormal(sum)); EXPECT_FALSE(std::isinf(sum)); + EXPECT_FALSE(std::isnan(sum)); EXPECT_FALSE(std::isnan(delta)); VLOG(MKLDNN_ALL) << "reference avg data: " << sum / len << ", delta: " << delta / sum << ", failCnt:" << failCnt; - return (failCnt / (float)len) > failRate ? maxOut : delta / sum; + double res = sum > eps ? delta / sum : eps; + return (failCnt / (float)len) > failRate ? maxRatio : res; } double MKLDNNTester::compareMatrix(const MatrixPtr& m1, const MatrixPtr& m2) { @@ -515,12 +521,16 @@ void MKLDNNTester::getOutResult(const std::string& configPath, gradientMachine->forward(in.inArgs[i], &outArgs, PASS_TRAIN); // save forward result for (size_t k = 0; k < outArgs.size(); k++) { - MatrixPtr value = Matrix::create(outArgs[k].value->getHeight(), - outArgs[k].value->getWidth(), - false, - false); - value->copyFrom(*outArgs[k].value); - out.outValues.push_back(value); + const MatrixPtr& src = outArgs[k].value; + MatrixPtr dst = + Matrix::create(src->getHeight(), src->getWidth(), false, false); + if (typeid(*src) == typeid(MKLDNNMatrix)) { + MKLDNNMatrixPtr dnnSrc = std::dynamic_pointer_cast(src); + dnnSrc->copyTo(*dst); + } else { + dst->copyFrom(*src); + } + out.outValues.push_back(dst); } // random backward input @@ -543,19 +553,19 @@ void MKLDNNTester::getOutResult(const std::string& configPath, void MKLDNNTester::compareResult(DataOut& ref, DataOut& dnn, float eps) { CHECK_EQ(ref.outValues.size(), dnn.outValues.size()); CHECK_EQ(ref.paraValues.size(), dnn.paraValues.size()); - VLOG(MKLDNN_TESTS) << "compare value size: " << ref.outValues.size(); for (size_t i = 0; i < ref.outValues.size(); i++) { + VLOG(MKLDNN_TESTS) << "compare value index: " << i; EXPECT_LE(fabs(compareMatrix(ref.outValues[i], dnn.outValues[i])), eps); } - VLOG(MKLDNN_TESTS) << "compare param size: " << ref.outValues.size(); for (size_t i = 0; i < ref.paraValues.size(); i++) { + VLOG(MKLDNN_TESTS) << "compare param index: " << i; EXPECT_LE(fabs(compareVector(ref.paraValues[i], dnn.paraValues[i])), eps); } } -void MKLDNNTester::runBranchesTest(const std::string& configPath, - size_t iter, - float eps) { +void MKLDNNTester::runNetTest(const std::string& configPath, + size_t iter, + float eps) { DataIn in; initArgument(in, configPath, iter); DataOut outCpu, outDnn; diff --git a/paddle/gserver/tests/MKLDNNTester.h b/paddle/gserver/tests/MKLDNNTester.h index 19d8848f74f2ee4a809e42164a0eb180abd2a4e1..ca55a45bc77b4e171619ab788d7c7dfeefcd036a 100644 --- a/paddle/gserver/tests/MKLDNNTester.h +++ b/paddle/gserver/tests/MKLDNNTester.h @@ -85,17 +85,17 @@ public: bool printDetails = false, size_t iter = 3, float epsilon = 1e-4); - static void runBranchesTest(const std::string& configPath, - size_t iter = 3, - float eps = 1e-4); + static void runNetTest(const std::string& configPath, + size_t iter = 2, + float eps = 1e-4); static void initArgument(DataIn& data, const std::string& configPath, - size_t iter = 3); + size_t iter = 2); static void getOutResult(const std::string& configPath, DataIn& in, DataOut& out, bool use_mkldnn, - size_t iter = 3); + size_t iter = 2); private: void reset(const TestConfig& dnn, const TestConfig& ref, size_t batchSize); @@ -128,13 +128,13 @@ private: /** * Get delta percent - * if many(>failRate) wrong(abs(dnn-ref)/abs(ref)>thres) points return the - * max(diff/ref) - * else return sum(abs(a-b)) / sum(abs(b)) + * if many(>failRate) wrong(abs(val-ref)/abs(ref) > thres) points + * return the max(diff/ref) + * else return sum(abs(diff)) / sum(abs(ref)) * The return value should be smaller than eps when passing. */ - static double getDelta(const real* d1, - const real* d2, + static double getDelta(const real* refer, + const real* value, size_t len, const float failRate = 1e-3, const float thres = 0.1); diff --git a/paddle/gserver/tests/mkldnn_branch_net.conf b/paddle/gserver/tests/mkldnn_branch_net.conf new file mode 100644 index 0000000000000000000000000000000000000000..8d5146abb0ebd7f5d6c512457f3cb5c84eac20f5 --- /dev/null +++ b/paddle/gserver/tests/mkldnn_branch_net.conf @@ -0,0 +1,142 @@ +# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from paddle.trainer_config_helpers import * + +settings(batch_size=16) +channels = get_config_arg("channels", int, 2) + +def two_conv(input, group_name): + out1 = img_conv_layer(input=input, + name=group_name+'_conv1_', + filter_size=1, + num_filters=channels, + padding=0, + shared_biases=True, + act=ReluActivation()) + + out2 = img_conv_layer(input=input, + name=group_name+'_conv2_', + filter_size=3, + num_filters=channels, + padding=1, + shared_biases=True, + act=ReluActivation()) + return out1, out2 + +def two_conv_bn(input, group_name): + out1, out2 = two_conv(input, group_name) + out1 = batch_norm_layer(input=out1, + name=group_name+'_bn1_', + use_global_stats=False, + act=ReluActivation()) + + out2 = batch_norm_layer(input=out2, + name=group_name+'_bn2_', + use_global_stats=False, + act=ReluActivation()) + return out1, out2 + +def two_conv_pool(input, group_name): + out1, out2 = two_conv(input, group_name) + out1 = img_pool_layer(input=out1, + name=group_name+'_pool1_', + pool_size=3, + stride=2, + padding=0, + pool_type=MaxPooling()) + + out2 = img_pool_layer(input=out2, + name=group_name+'_pool2_', + pool_size=5, + stride=2, + padding=1, + pool_type=MaxPooling()) + return out1, out2 + +def two_fc(input, group_name): + out1 = fc_layer(input=input, + name=group_name+'_fc1_', + size=channels, + bias_attr=False, + act=LinearActivation()) + + out2 = fc_layer(input=input, + name=group_name+'_fc2_', + size=channels, + bias_attr=False, + act=LinearActivation()) + return out1, out2 + +data = data_layer(name ="input", size=channels*16*16) + +tmp = img_conv_layer(input=data, + num_channels=channels, + filter_size=3, + num_filters=channels, + padding=1, + shared_biases=True, + act=ReluActivation()) + +a1, a2 = two_conv(tmp, 'conv_branch') +tmp = addto_layer(input=[a1, a2], + act=ReluActivation(), + bias_attr=False) + +tmp = img_pool_layer(input=tmp, + pool_size=3, + stride=2, + padding=1, + pool_type=AvgPooling()) + +b1, b2 = two_conv_pool(tmp, 'pool_branch') +tmp = concat_layer(input=[b1, b2]) + +tmp = img_pool_layer(input=tmp, + num_channels=channels*2, + pool_size=3, + stride=2, + padding=1, + pool_type=MaxPooling()) + +tmp = img_conv_layer(input=tmp, + filter_size=3, + num_filters=channels, + padding=1, + stride=2, + shared_biases=True, + act=LinearActivation(), + bias_attr=False) + +tmp = batch_norm_layer(input=tmp, + use_global_stats=False, + act=ReluActivation()) + +c1, c2 = two_conv_bn(tmp, 'bn_branch') +tmp = addto_layer(input=[c1, c2], + act=ReluActivation(), + bias_attr=False) + +tmp = fc_layer(input=tmp, size=channels, + bias_attr=True, + act=ReluActivation()) + +d1, d2 = two_fc(tmp, 'fc_branch') +tmp = addto_layer(input=[d1, d2]) + +out = fc_layer(input=tmp, size=10, + bias_attr=True, + act=SoftmaxActivation()) + +outputs(out) diff --git a/paddle/gserver/tests/mkldnn_branches_fc.conf b/paddle/gserver/tests/mkldnn_branches_fc.conf deleted file mode 100644 index fb85425c2b63c7604d636e2b0c5d20d91fb5de1b..0000000000000000000000000000000000000000 --- a/paddle/gserver/tests/mkldnn_branches_fc.conf +++ /dev/null @@ -1,58 +0,0 @@ -# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from paddle.trainer_config_helpers import * - -settings(batch_size=16) -channels = get_config_arg("channels", int, 2) - -def two_fc(input, group_name): - out1 = fc_layer(input=input, - name=group_name+'_fc1', - size=channels, - bias_attr=False, - act=LinearActivation()) - - out2 = fc_layer(input=input, - name=group_name+'_fc2', - size=channels, - bias_attr=False, - act=LinearActivation()) - return out1, out2 - -data = data_layer(name ="input", size=channels*16*16) - -conv = img_conv_layer(input=data, - num_channels=channels, - filter_size=3, - num_filters=channels, - padding=1, - shared_biases=True, - act=LinearActivation()) - -pool = img_pool_layer(input=conv, - pool_size=3, - stride=2, - padding=1, - pool_type=AvgPooling()) - -a1, a2 = two_fc(input=pool, group_name='a') - -concat = concat_layer(input=[a1, a2]) - -b1, b2 = two_fc(input=pool, group_name='b') - -addto = addto_layer(input=[b1, b2]) - -outputs([concat, addto]) diff --git a/paddle/gserver/tests/mkldnn_branches_pool.conf b/paddle/gserver/tests/mkldnn_branches_pool.conf deleted file mode 100644 index ca17c74752ab0777a69f818d9f43275a6140cb4c..0000000000000000000000000000000000000000 --- a/paddle/gserver/tests/mkldnn_branches_pool.conf +++ /dev/null @@ -1,60 +0,0 @@ -# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from paddle.trainer_config_helpers import * - -settings(batch_size=16) -channels = get_config_arg("channels", int, 2) - -def two_pool(input, group_name): - out1 = img_pool_layer(input=input, - name=group_name+'_pool1', - pool_size=3, - stride=2, - padding=0, - pool_type=MaxPooling()) - - out2 = img_pool_layer(input=input, - name=group_name+'_pool2', - pool_size=5, - stride=2, - padding=1, - pool_type=MaxPooling()) - return out1, out2 - -data = data_layer(name ="input", size=channels*16*16) - -conv = img_conv_layer(input=data, - num_channels=channels, - filter_size=3, - num_filters=channels, - padding=1, - shared_biases=True, - act=LinearActivation()) - -pool = img_pool_layer(input=conv, - pool_size=3, - stride=1, - padding=1, - pool_type=AvgPooling()) - -a1, a2 = two_pool(input=pool, group_name='a') - -concat = concat_layer(input=[a1, a2]) - -b1, b2 = two_pool(input=pool, group_name='b') - -addto = addto_layer(input=[b1, b2]) - -outputs([concat, addto]) diff --git a/paddle/gserver/tests/mkldnn_branches_conv.conf b/paddle/gserver/tests/mkldnn_simple_net.conf similarity index 64% rename from paddle/gserver/tests/mkldnn_branches_conv.conf rename to paddle/gserver/tests/mkldnn_simple_net.conf index 2628509db43e6a5f69a4f5ea956bffdc2837e32a..8bbe91e56d0ba6da06475ad16f3162ee1103ee02 100644 --- a/paddle/gserver/tests/mkldnn_branches_conv.conf +++ b/paddle/gserver/tests/mkldnn_simple_net.conf @@ -17,40 +17,48 @@ from paddle.trainer_config_helpers import * settings(batch_size=16) channels = get_config_arg("channels", int, 2) -def two_conv(input, group_name): - out1 = img_conv_layer(input=input, - name=group_name+'_conv1', - filter_size=1, - num_filters=channels, - padding=0, - shared_biases=True, - act=ReluActivation()) +data = data_layer(name ="input", size=channels*16*16) - out2 = img_conv_layer(input=input, - name=group_name+'_conv2', +tmp = img_conv_layer(input=data, + num_channels=channels, filter_size=3, num_filters=channels, padding=1, shared_biases=True, act=ReluActivation()) - return out1, out2 -data = data_layer(name ="input", size=channels*16*16) +tmp = img_pool_layer(input=tmp, + pool_size=3, + stride=1, + padding=0, + pool_type=AvgPooling()) -conv = img_conv_layer(input=data, - num_channels=channels, +tmp = img_conv_layer(input=tmp, filter_size=3, num_filters=channels, padding=1, shared_biases=True, - act=ReluActivation()) + act=LinearActivation(), + bias_attr=False) -a1, a2 = two_conv(input=conv, group_name='a') +tmp = batch_norm_layer(input=tmp, + use_global_stats=False, + act=ReluActivation()) -concat = concat_layer(input=[a1, a2]) +tmp = img_pool_layer(input=tmp, + pool_size=3, + stride=2, + padding=1, + pool_type=MaxPooling()) -b1, b2 = two_conv(input=conv, group_name='b') +tmp = fc_layer(input=tmp, + size=channels, + bias_attr=False, + act=ReluActivation()) -addto = addto_layer(input=[b1, b2]) +out = fc_layer(input=tmp, + size=10, + bias_attr=True, + act=SoftmaxActivation()) -outputs([concat, addto]) +outputs(out) diff --git a/paddle/gserver/tests/test_MKLDNN.cpp b/paddle/gserver/tests/test_MKLDNN.cpp index 85d4f437c2664135a7975c6ed3270d8f1ddbeaf4..d60b0f04a1613acc3711e711cfe18ced5f0f924d 100644 --- a/paddle/gserver/tests/test_MKLDNN.cpp +++ b/paddle/gserver/tests/test_MKLDNN.cpp @@ -234,8 +234,7 @@ static void getMKLDNNBatchNormConfig(TestConfig& cfg, cfg.inputDefs.push_back({INPUT_DATA, "layer_2_moving_var", 1, size_t(pm.ic)}); cfg.inputDefs.back().isStatic = true; LayerInputConfig* input = cfg.layerConfig.add_inputs(); - // TODO(TJ): uncomment me when refine and support comparing all zeroes vector - // cfg.layerConfig.set_active_type("relu"); + cfg.layerConfig.set_active_type("relu"); cfg.layerConfig.add_inputs(); cfg.layerConfig.add_inputs(); ImageConfig* img_conf = input->mutable_image_conf(); @@ -309,15 +308,15 @@ TEST(MKLDNNActivation, Activations) { } DECLARE_string(config_args); -TEST(MKLDNNLayer, branches) { - std::vector cases = {"conv", "pool", "fc"}; +TEST(MKLDNNNet, net) { + std::vector cases = {"simple", "branch"}; for (auto name : cases) { - std::string config = "./gserver/tests/mkldnn_branches_" + name + ".conf"; + std::string config = "./gserver/tests/mkldnn_" + name + "_net.conf"; for (auto channels : {2, 32}) { std::ostringstream oss; oss << "channels=" << channels; FLAGS_config_args = oss.str(); - MKLDNNTester::runBranchesTest(config); + MKLDNNTester::runNetTest(config); } } } diff --git a/paddle/math/MKLDNNMatrix.h b/paddle/math/MKLDNNMatrix.h index 5f5b819017b83579ce58522198b3f13311297d42..54cfefe23b3dc70fd12fd2ca8886c941047b59f7 100644 --- a/paddle/math/MKLDNNMatrix.h +++ b/paddle/math/MKLDNNMatrix.h @@ -102,6 +102,11 @@ public: m_->copyFrom(src); } + void copyTo(Matrix& dst) { + // TODO(TJ): reorder data if this format is not nchw or x + dst.copyFrom(*m_); + } + public: /** * Reorder this MKLDNNMatrix from other format. diff --git a/paddle/memory/detail/buddy_allocator.cc b/paddle/memory/detail/buddy_allocator.cc index e212f7737a4093125857126cabb5b1a7b3e055b1..64ee53803891f192302bb915027f0499dfa36411 100644 --- a/paddle/memory/detail/buddy_allocator.cc +++ b/paddle/memory/detail/buddy_allocator.cc @@ -27,11 +27,11 @@ BuddyAllocator::BuddyAllocator(SystemAllocator* system_allocator, system_allocator_(std::move(system_allocator)) {} BuddyAllocator::~BuddyAllocator() { - VLOG(3) << "BuddyAllocator Disconstructor makes sure that all of these " - "have actually been freed"; + VLOG(10) << "BuddyAllocator Disconstructor makes sure that all of these " + "have actually been freed"; while (!pool_.empty()) { auto block = static_cast(std::get<2>(*pool_.begin())); - VLOG(3) << "Free from block (" << block << ", " << max_chunk_size_ << ")"; + VLOG(10) << "Free from block (" << block << ", " << max_chunk_size_ << ")"; system_allocator_->Free(block, max_chunk_size_, block->index(cache_)); cache_.invalidate(block); @@ -51,11 +51,12 @@ void* BuddyAllocator::Alloc(size_t unaligned_size) { // acquire the allocator lock std::lock_guard lock(mutex_); - VLOG(3) << "Allocate " << unaligned_size << " bytes from chunk size " << size; + VLOG(10) << "Allocate " << unaligned_size << " bytes from chunk size " + << size; // if the allocation is huge, send directly to the system allocator if (size > max_chunk_size_) { - VLOG(3) << "Allocate from system allocator."; + VLOG(10) << "Allocate from system allocator."; return SystemAlloc(size); } @@ -70,9 +71,9 @@ void* BuddyAllocator::Alloc(size_t unaligned_size) { return nullptr; } } else { - VLOG(3) << "Allocation from existing memory block " << std::get<2>(*it) - << " at address " - << reinterpret_cast(std::get<2>(*it))->data(); + VLOG(10) << "Allocation from existing memory block " << std::get<2>(*it) + << " at address " + << reinterpret_cast(std::get<2>(*it))->data(); } total_used_ += size; @@ -89,10 +90,10 @@ void BuddyAllocator::Free(void* p) { // Acquire the allocator lock std::lock_guard lock(mutex_); - VLOG(3) << "Free from address " << block; + VLOG(10) << "Free from address " << block; if (block->type(cache_) == MemoryBlock::HUGE_CHUNK) { - VLOG(3) << "Free directly from system allocator"; + VLOG(10) << "Free directly from system allocator"; system_allocator_->Free(block, block->total_size(cache_), block->index(cache_)); @@ -109,8 +110,8 @@ void BuddyAllocator::Free(void* p) { // Trying to merge the right buddy if (block->has_right_buddy(cache_)) { - VLOG(3) << "Merging this block " << block << " with its right buddy " - << block->right_buddy(cache_); + VLOG(10) << "Merging this block " << block << " with its right buddy " + << block->right_buddy(cache_); auto right_buddy = block->right_buddy(cache_); @@ -127,8 +128,8 @@ void BuddyAllocator::Free(void* p) { // Trying to merge the left buddy if (block->has_left_buddy(cache_)) { - VLOG(3) << "Merging this block " << block << " with its left buddy " - << block->left_buddy(cache_); + VLOG(10) << "Merging this block " << block << " with its left buddy " + << block->left_buddy(cache_); auto left_buddy = block->left_buddy(cache_); @@ -144,8 +145,8 @@ void BuddyAllocator::Free(void* p) { } // Dumping this block into pool - VLOG(3) << "Inserting free block (" << block << ", " - << block->total_size(cache_) << ")"; + VLOG(10) << "Inserting free block (" << block << ", " + << block->total_size(cache_) << ")"; pool_.insert( IndexSizeAddress(block->index(cache_), block->total_size(cache_), block)); @@ -164,7 +165,7 @@ void* BuddyAllocator::SystemAlloc(size_t size) { size_t index = 0; void* p = system_allocator_->Alloc(index, size); - VLOG(3) << "Allocated " << p << " from system allocator."; + VLOG(10) << "Allocated " << p << " from system allocator."; if (p == nullptr) return nullptr; @@ -190,8 +191,8 @@ BuddyAllocator::PoolSet::iterator BuddyAllocator::RefillPool() { if (p == nullptr) return pool_.end(); - VLOG(3) << "Creating and inserting new block " << p - << " from system allocator"; + VLOG(10) << "Creating and inserting new block " << p + << " from system allocator"; static_cast(p)->init(cache_, MemoryBlock::FREE_CHUNK, index, max_chunk_size_, nullptr, nullptr); @@ -235,19 +236,19 @@ void* BuddyAllocator::SplitToAlloc(BuddyAllocator::PoolSet::iterator it, auto block = static_cast(std::get<2>(*it)); pool_.erase(it); - VLOG(3) << "Split block (" << block << ", " << block->total_size(cache_) - << ") into"; + VLOG(10) << "Split block (" << block << ", " << block->total_size(cache_) + << ") into"; block->split(cache_, size); - VLOG(3) << "Left block (" << block << ", " << block->total_size(cache_) - << ")"; + VLOG(10) << "Left block (" << block << ", " << block->total_size(cache_) + << ")"; block->set_type(cache_, MemoryBlock::ARENA_CHUNK); // the rest of memory if exist if (block->has_right_buddy(cache_)) { if (block->right_buddy(cache_)->type(cache_) == MemoryBlock::FREE_CHUNK) { - VLOG(3) << "Insert right block (" << block->right_buddy(cache_) << ", " - << block->right_buddy(cache_)->total_size(cache_) << ")"; + VLOG(10) << "Insert right block (" << block->right_buddy(cache_) << ", " + << block->right_buddy(cache_)->total_size(cache_) << ")"; pool_.insert( IndexSizeAddress(block->right_buddy(cache_)->index(cache_), @@ -274,7 +275,7 @@ void BuddyAllocator::CleanIdleFallBackAlloc() { return; } - VLOG(3) << "Return block " << block << " to fallback allocator."; + VLOG(10) << "Return block " << block << " to fallback allocator."; system_allocator_->Free(block, max_chunk_size_, block->index(cache_)); cache_.invalidate(block); @@ -310,7 +311,7 @@ void BuddyAllocator::CleanIdleNormalAlloc() { MemoryBlock* block = static_cast(std::get<2>(*pool)); - VLOG(3) << "Return block " << block << " to base allocator."; + VLOG(10) << "Return block " << block << " to base allocator."; system_allocator_->Free(block, max_chunk_size_, block->index(cache_)); cache_.invalidate(block); diff --git a/paddle/memory/detail/meta_cache.cc b/paddle/memory/detail/meta_cache.cc index f0721c3b94b74eed3a02e4bc744c24b97ac170a9..7e2f92b00ca5d787c1114176c5dc3304ca3ebe26 100644 --- a/paddle/memory/detail/meta_cache.cc +++ b/paddle/memory/detail/meta_cache.cc @@ -30,7 +30,7 @@ Metadata MetadataCache::load(const MemoryBlock* block) { return existing_metadata->second; } else { auto* meta = reinterpret_cast(block); - VLOG(3) << "Load MetaData type=" << meta->type; + VLOG(10) << "Load MetaData type=" << meta->type; PADDLE_ASSERT(meta->check_guards()); return *reinterpret_cast(block); } diff --git a/paddle/memory/detail/system_allocator.cc b/paddle/memory/detail/system_allocator.cc index 33166d9ce23a4a345fc00a65adf63281b13643c3..6b4e46f56a0c9c9836c5b353ec9c554454ab0491 100644 --- a/paddle/memory/detail/system_allocator.cc +++ b/paddle/memory/detail/system_allocator.cc @@ -41,7 +41,16 @@ void* CPUAllocator::Alloc(size_t& index, size_t size) { index = 0; // unlock memory - void* p = malloc(size); + void* p; + +#ifdef PADDLE_USE_MKLDNN + // refer to https://github.com/01org/mkl-dnn/blob/master/include/mkldnn.hpp + // memory alignment + PADDLE_ENFORCE_EQ(posix_memalign(&p, 4096ul, size), 0); +#else + PADDLE_ENFORCE_EQ(posix_memalign(&p, 32ul, size), 0); +#endif + PADDLE_ENFORCE(p, "Fail to allocate CPU memory: size = %d .", size); if (p != nullptr) { if (FLAGS_use_pinned_memory) { diff --git a/paddle/memory/memory.cc b/paddle/memory/memory.cc index 0b648642f90a09db7452cce97eb04cedfcf55f4f..5eb1c44eb6fc45db31ef44bf79e74b79193e08aa 100644 --- a/paddle/memory/memory.cc +++ b/paddle/memory/memory.cc @@ -39,15 +39,15 @@ BuddyAllocator* GetCPUBuddyAllocator() { template <> void* Alloc(platform::CPUPlace place, size_t size) { - VLOG(3) << "Allocate " << size << " bytes on " << platform::Place(place); + VLOG(10) << "Allocate " << size << " bytes on " << platform::Place(place); void* p = GetCPUBuddyAllocator()->Alloc(size); - VLOG(3) << " pointer=" << p; + VLOG(10) << " pointer=" << p; return p; } template <> void Free(platform::CPUPlace place, void* p) { - VLOG(3) << "Free pointer=" << p << " on " << platform::Place(place); + VLOG(10) << "Free pointer=" << p << " on " << platform::Place(place); GetCPUBuddyAllocator()->Free(p); } @@ -69,11 +69,12 @@ BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) { platform::GpuMinChunkSize(), platform::GpuMaxChunkSize()); } - VLOG(3) << "\n\nNOTE: each GPU device use " - << FLAGS_fraction_of_gpu_memory_to_use * 100 << "% of GPU memory.\n" - << "You can set environment variable '" - << platform::kEnvFractionGpuMemoryToUse - << "' to change the fraction of GPU usage.\n\n"; + VLOG(10) << "\n\nNOTE: each GPU device use " + << FLAGS_fraction_of_gpu_memory_to_use * 100 + << "% of GPU memory.\n" + << "You can set environment variable '" + << platform::kEnvFractionGpuMemoryToUse + << "' to change the fraction of GPU usage.\n\n"; } platform::SetDeviceId(gpu_id); return as[gpu_id]; diff --git a/paddle/operators/cross_entropy_op.cc b/paddle/operators/cross_entropy_op.cc index d94b96200c2a5cd112b17e45aa6cd4a63bdd04d0..39df19da677a7dee7d0989d491f8d5511f73a9c7 100644 --- a/paddle/operators/cross_entropy_op.cc +++ b/paddle/operators/cross_entropy_op.cc @@ -28,8 +28,9 @@ class CrossEntropyOp : public framework::OperatorWithKernel { auto x_dims = ctx->GetInputDim("X"); auto label_dims = ctx->GetInputDim("Label"); - PADDLE_ENFORCE_EQ(x_dims.size(), 2, "Input(X)'s rank should be 2."); - PADDLE_ENFORCE_EQ(label_dims.size(), 2, "Input(Label)'s rank should be 2."); + PADDLE_ENFORCE_EQ(x_dims.size(), 2UL, "Input(X)'s rank should be 2."); + PADDLE_ENFORCE_EQ(label_dims.size(), 2UL, + "Input(Label)'s rank should be 2."); PADDLE_ENFORCE_EQ(x_dims[0], label_dims[0], "The 1st dimension of Input(X) and Input(Label) should " "be equal."); @@ -38,8 +39,8 @@ class CrossEntropyOp : public framework::OperatorWithKernel { "If Attr(soft_label) == true, the 2nd dimension of " "Input(X) and Input(Label) should be equal."); } else { - PADDLE_ENFORCE_EQ(label_dims[1], 1, - "If Attr(soft_label) == false, the 2nd dimension of " + PADDLE_ENFORCE_EQ(label_dims[1], 1UL, + "If Attr(softLabel) == false, the 2nd dimension of " "Input(Label) should be 1."); } @@ -48,7 +49,8 @@ class CrossEntropyOp : public framework::OperatorWithKernel { } protected: - // CrossEntropy's data type just determined by "X" + // Explicitly set that data type of the output of the cross_entropy operator + // is determined by its input "X". framework::DataType IndicateDataType( const framework::ExecutionContext& ctx) const override { return framework::ToDataType(ctx.Input("X")->type()); diff --git a/paddle/operators/dynamic_recurrent_op_test.cc b/paddle/operators/dynamic_recurrent_op_test.cc index fff63efb24c70b7e864e2d5b011a22883c13dede..8d840e259b190ead86a66df8ab31c5170db4d824 100644 --- a/paddle/operators/dynamic_recurrent_op_test.cc +++ b/paddle/operators/dynamic_recurrent_op_test.cc @@ -51,7 +51,7 @@ class RNNAlgorithmTestHelper : public ::testing::Test { CreateGlobalVariables(); auto op_desc = CreateOpDesc(); - op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); + op = paddle::framework::OpRegistry::CreateOp(op_desc); dop = &(dynamic_cast(op.get())->rnn); InitCacheManually(); InitStepNet(); diff --git a/paddle/operators/gaussian_random_op.cc b/paddle/operators/gaussian_random_op.cc index 04dfdf7c48381240108cf924979764966599151f..be7f542a7a274d88d2dac953995d6a83a6ce022d 100644 --- a/paddle/operators/gaussian_random_op.cc +++ b/paddle/operators/gaussian_random_op.cc @@ -45,14 +45,14 @@ class GaussianRandomOp : public framework::OperatorWithKernel { void InferShape(framework::InferShapeContext* ctx) const override { PADDLE_ENFORCE(ctx->HasOutput("Out"), "Output(Out) of GaussianRandomOp should not be null."); - auto dims = ctx->Attrs().Get>("dims"); + auto shape = ctx->Attrs().Get>("shape"); std::vector temp; - temp.reserve(dims.size()); - for (auto dim : dims) { + temp.reserve(shape.size()); + for (auto dim : shape) { temp.push_back(static_cast(dim)); } - PADDLE_ENFORCE(dims.size() > 0UL, - "dims can be one int or array. dims must be set."); + PADDLE_ENFORCE(shape.size() > 0UL, + "shape can be one int or array. shape must be set."); ctx->SetOutputDim("Out", framework::make_ddim(temp)); } @@ -74,7 +74,7 @@ GaussianRandom operator. Use to initialize tensor with gaussian random generator. )DOC"); - AddAttr>("dims", "The dimension of random tensor."); + AddAttr>("shape", "The dimension of random tensor."); AddAttr("mean", "mean of random tensor.").SetDefault(.0f); AddAttr("std", "std of random tensor.").SetDefault(1.0f); AddAttr("seed", diff --git a/paddle/operators/linear_chain_crf_op.cc b/paddle/operators/linear_chain_crf_op.cc new file mode 100644 index 0000000000000000000000000000000000000000..605dbba5af1bb8b0d718833be6af45fdaeac70ac --- /dev/null +++ b/paddle/operators/linear_chain_crf_op.cc @@ -0,0 +1,261 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. */ + +#include "paddle/operators/linear_chain_crf_op.h" + +namespace paddle { +namespace operators { + +class LinearChainCRFOpMaker : public framework::OpProtoAndCheckerMaker { + public: + LinearChainCRFOpMaker(framework::OpProto* proto, + framework::OpAttrChecker* op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddInput( + "Emission", + "(LoDTensor, default: LoDTensor). " + "The unscaled emission weight matrix for the linear chain CRF. " + "This input is a LoDTensor with shape [N x D] where N is the size of " + "the mini-batch and D is the total tag number."); + AddInput( + "Transition", + "(Tensor, default: Tensor). A Tensor with shape [(D + 2) x D]. " + "The learnable parameter for the linear_chain_crf operator. " + "See more details in the operator's comments."); + AddInput( + "Label", + "(LoDTensor, default: LoDTensor). The ground truth which is a 2-D " + "LoDTensor with shape [N x 1], where N is the total element number in " + "a mini-batch."); + AddOutput( + "Alpha", + "Tensor, default: Tensor. The forward vectors for the entire " + "batch. A two dimensional tensor with shape [N x D], " + "denoted as \f$\alpha\f$. \f$\alpha$\f is a memo table used to " + "calculate the normalization factor in CRF. \f$\alpha[k, v]$\f stores " + "the unnormalized probabilites of all possible unfinished sequences of " + "tags that end at position \f$k$\f with tag \f$v$\f. For each \f$k$\f, " + "\f$\alpha[k, v]$\f is a vector of length \f$D$\f with a component for " + "each tag value \f$v$\f. This vector is called a forward vecotr and " + "will also be used in backward computations.") + .AsIntermediate(); + AddOutput("EmissionExps", + "The exponentials of Input(Emission). This is an intermediate " + "computational result in forward computation, and will be reused " + "in backward computation.") + .AsIntermediate(); + AddOutput("TransitionExps", + "The exponentials of Input(Transition). This is an intermediate " + "computational result in forward computation, and will be reused " + "in backward computation.") + .AsIntermediate(); + AddOutput( + "LogLikelihood", + "(Tensor, default: Tensor). The logarithm of the conditional " + "likelihood of each training sample in a mini-batch. This is a 2-D " + "tensor with shape [S x 1], where S is the sequence number in a " + "mini-batch. Note: S is equal to the sequence number in a mini-batch. " + "The output is no longer a LoDTensor."); + AddComment(R"DOC( +Conditional Random Field defines an undirected probabilistic graph with nodes +denoting random variables and edges denoting dependencies between these +variables. CRF learns the conditional probability \f$P(Y|X)\f$, where +\f$X = (x_1, x_2, ... , x_n)\f$ are structured inputs and +\f$Y = (y_1, y_2, ... , y_n)\f$ are labels for the inputs. + +Linear chain CRF is a special case of CRF that is useful for sequence labeling +task. Sequence labeling tasks do not assume a lot of conditional +independences among inputs. The only constraint they impose is that the input +and output must be linear sequences. Thus, the graph of such a CRF is a simple +chain or a line, which results in the linear chain CRF. + +This operator implements the Forward-Backward algorithm for the linear chain +CRF. Please see http://www.cs.columbia.edu/~mcollins/fb.pdf and +http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for reference. + +Equation: + +- Denote Input(Emission) to this operator as \f$x\f$ here. +- The first D values of Input(Transition) to this operator are for starting +weights, denoted as \f$a\f$ here. +- The next D values of Input(Transition) of this operator are for ending +weights, denoted as \f$b\f$ here. +- The remaning values of Input(Transition) are for transition weights, +denoted as \f$w\f$ here. +- Denote Input(Label) as \f$s\f$ here. + +The probability of a sequence \f$s\f$ of length \f$L\f$ is defined as: +\f$P(s) = (1/Z) exp(a_{s_1} + b_{s_L} + + \sum_{l=1}^L x_{s_l} + + \sum_{l=2}^L w_{s_{l-1},s_l})\f$ +where \f$Z\f$ is a normalization value so that the sum of \f$P(s)\f$ over +all possible sequences is \f$1\f$, and \f$x\f$ is the emission feature weight +to the linear chain CRF. + +Finaly, the linear chain CRF operator outputs the logarithm of the conditional +likelihood of each training sample in a mini-batch. + +NOTE: +1. The feature function for a CRF is made up of the emission features and the +transition features. The emission feature weights are NOT computed in +this operator. They MUST be computed first before this operator is called. + +2. Because this operator performs global normalization over all possible +sequences internally, it expects UNSCALED emission feature weights. +Please do not call this op with the emission feature being output of any +nonlinear activation. + +3. The 2nd dimension of Input(Emission) MUST be equal to the tag number. + +)DOC"); + } +}; + +class LinearChainCRFOp : public framework::OperatorWithKernel { + public: + using framework::OperatorWithKernel::OperatorWithKernel; + + void InferShape(framework::InferShapeContext* ctx) const override { + PADDLE_ENFORCE(ctx->HasInput("Emission"), + "Input(Emission) should be not null."); + PADDLE_ENFORCE(ctx->HasInput("Transition"), + "Input(Transition) should be not null."); + PADDLE_ENFORCE(ctx->HasInput("Label"), "Input(Label) should be not null."); + + PADDLE_ENFORCE(ctx->HasOutput("Alpha"), + "Output(Alpha) should be not null."); + PADDLE_ENFORCE(ctx->HasOutput("EmissionExps"), + "Output(EmissionExps) should be not null."); + PADDLE_ENFORCE(ctx->HasOutput("TransitionExps"), + "Output(TransitionExps) should be not null."); + PADDLE_ENFORCE(ctx->HasOutput("LogLikelihood"), + "Output(LogLikelihood) should be not null."); + + auto emission_dims = ctx->GetInputDim("Emission"); + PADDLE_ENFORCE_EQ(emission_dims.size(), 2UL, + "The Input(Emission) should be a 2-D tensor."); + PADDLE_ENFORCE(emission_dims[0], "An empty mini-batch is not allowed."); + + auto transition_dims = ctx->GetInputDim("Transition"); + PADDLE_ENFORCE_EQ(transition_dims.size(), 2UL, + "The Input(Transition) should be a 2-D tensor."); + PADDLE_ENFORCE_EQ( + transition_dims[0] - 2, transition_dims[1], + "An invalid dimension for the Input(Transition), which should " + "be a 2-D tensor with shape [(D + 2) x D]."); + PADDLE_ENFORCE_EQ( + emission_dims[1], transition_dims[1], + "The 2nd dimension of the Input(Emission) and the Input(Transition) " + "should be equal to the tag number."); + + auto label_dims = ctx->GetInputDim("Label"); + PADDLE_ENFORCE(label_dims.size() == 2UL && label_dims[1] == 1UL, + "The Input(Label) should be a 2-D tensor with the 2nd " + "dimensions fixed to 1."); + PADDLE_ENFORCE_EQ( + emission_dims[0], label_dims[0], + "The height of Input(Emission) and the height of Input(Label) " + "should be the same."); + + ctx->SetOutputDim("Alpha", emission_dims); + ctx->SetOutputDim("EmissionExps", emission_dims); + ctx->SetOutputDim("TransitionExps", transition_dims); + // TODO(caoying) This is tricky. The 1st dimension of Output(LogLikelihood) + // is the sequence number in a mini-batch. The dimension set here should be + // resized to its correct size in the function Compute. Fix this once we can + // get LoD information in the InferShape interface. + ctx->SetOutputDim("LogLikelihood", {emission_dims[0], 1}); + } + + protected: + // Explicitly set that the data type of output of the linear_chain_crf + // operator is determined by its input "Emission". + framework::DataType IndicateDataType( + const framework::ExecutionContext& ctx) const override { + return framework::ToDataType(ctx.Input("Emission")->type()); + } +}; + +class LinearChainCRFGradOp : public framework::OperatorWithKernel { + public: + using framework::OperatorWithKernel::OperatorWithKernel; + + void InferShape(framework::InferShapeContext* ctx) const override { + PADDLE_ENFORCE(ctx->HasInput("EmissionExps"), + "Input(EmissionExps) should be not null."); + PADDLE_ENFORCE(ctx->HasInput("TransitionExps"), + "Input(TransitionExps) should be not null."); + PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("LogLikelihood")), + "Input(LogLikelihood@GRAD) shoudl be not null."); + + auto emission_exps_dims = ctx->GetInputDim("EmissionExps"); + PADDLE_ENFORCE_EQ(emission_exps_dims.size(), 2UL, + "The Input(EmissionExps) should be a 2-D tensor."); + PADDLE_ENFORCE(emission_exps_dims[0], + "An empty mini-batch is not allowed."); + + auto transition_exps_dims = ctx->GetInputDim("TransitionExps"); + PADDLE_ENFORCE_EQ(transition_exps_dims.size(), 2UL, + "The Input(TransitionExps) should be a 2-D tensor."); + PADDLE_ENFORCE_EQ( + transition_exps_dims[0] - 2, transition_exps_dims[1], + "An invalid dimension for the Input(TransitionExps), which should " + "be a 2-D tensor with shape [(D + 2) x D]."); + PADDLE_ENFORCE_EQ( + emission_exps_dims[1], transition_exps_dims[1], + "The 2nd dimension of the Input(EmissionExps) and the " + "Input(TransitionExps) should be equal to the tag number."); + + auto label_dims = ctx->GetInputDim("Label"); + PADDLE_ENFORCE(label_dims.size() == 2UL && label_dims[1] == 1UL, + "The Input(Label) should be a 2-D tensor with the 2nd " + "dimensions fixed to 1."); + PADDLE_ENFORCE_EQ( + emission_exps_dims[0], label_dims[0], + "The height of Input(EmissionExps) and the height of Input(Label) " + "should be the same."); + + if (ctx->HasOutput(framework::GradVarName("Emission"))) { + ctx->SetOutputDim(framework::GradVarName("Emission"), emission_exps_dims); + } + if (ctx->HasOutput(framework::GradVarName("Transition"))) { + ctx->SetOutputDim(framework::GradVarName("Transition"), + transition_exps_dims); + } + } + + protected: + // Explicitly set that the data type of output of the linear_chain_crf_grad + // operator is determined by its input: gradients of LogLikelihood. + framework::DataType IndicateDataType( + const framework::ExecutionContext& ctx) const override { + return framework::ToDataType( + ctx.Input(framework::GradVarName("LogLikelihood"))->type()); + } +}; + +} // namespace operators +} // namespace paddle + +namespace ops = paddle::operators; +REGISTER_OP(linear_chain_crf, ops::LinearChainCRFOp, ops::LinearChainCRFOpMaker, + linear_chain_crf_grad, ops::LinearChainCRFGradOp); +REGISTER_OP_CPU_KERNEL( + linear_chain_crf, + ops::LinearChainCRFOpKernel, + ops::LinearChainCRFOpKernel); +REGISTER_OP_CPU_KERNEL( + linear_chain_crf_grad, + ops::LinearChainCRFGradOpKernel, + ops::LinearChainCRFGradOpKernel); diff --git a/paddle/operators/linear_chain_crf_op.cu b/paddle/operators/linear_chain_crf_op.cu new file mode 100644 index 0000000000000000000000000000000000000000..6fc8995f4c2ce05f89ffb58129695113f89159fa --- /dev/null +++ b/paddle/operators/linear_chain_crf_op.cu @@ -0,0 +1,26 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. */ + +#include "paddle/operators/linear_chain_crf_op.h" + +namespace ops = paddle::operators; + +REGISTER_OP_GPU_KERNEL( + linear_chain_crf, + ops::LinearChainCRFOpKernel, + ops::LinearChainCRFOpKernel); +REGISTER_OP_GPU_KERNEL( + linear_chain_crf_grad, + ops::LinearChainCRFGradOpKernel, + ops::LinearChainCRFGradOpKernel); diff --git a/paddle/operators/linear_chain_crf_op.h b/paddle/operators/linear_chain_crf_op.h new file mode 100644 index 0000000000000000000000000000000000000000..56fb0c9102bee6e2fefd1180ef20237891573f70 --- /dev/null +++ b/paddle/operators/linear_chain_crf_op.h @@ -0,0 +1,543 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. */ + +#pragma once +#include "paddle/framework/eigen.h" +#include "paddle/framework/op_registry.h" +#include "paddle/operators/math/math_function.h" + +namespace paddle { +namespace operators { + +template +static inline T NormalizeL1(T* x, size_t len) { + T sum = 0.; + for (size_t i = 0; i < len; ++i) sum += x[i]; + // (This comment is from the old LinearChainCRFLayer.) + // Right now, we just bet that sum won't be zero. If this really happens, we + // will figure out what should be done then. + PADDLE_ENFORCE(sum, + "The unnormalized probabilities of all possible unfinished " + "sequences must be greater than 0."); + T s = 1. / sum; + for (size_t i = 0; i < len; ++i) x[i] *= s; + return sum; +} + +template +struct ScalarMul { + explicit ScalarMul(const T& scalar) : scalar(scalar) {} + T operator()(const T& val) const { return val * scalar; } + + T scalar; +}; + +using framework::LoDTensor; +using framework::LoD; +using framework::Tensor; +template +using EigenMatrix = framework::EigenMatrix; + +template +class LinearChainCRFOpKernel : public framework::OpKernel { + public: + void Compute(const framework::ExecutionContext& ctx) const override { + // TODO(caoying) The checks related to LoD information should be + // moved into InferShape once after the InferShape is refactored. + PADDLE_ENFORCE_EQ(ctx.Input("Emission")->NumLevels(), 1UL, + "The Input(Emission) should be a sequence."); + PADDLE_ENFORCE_EQ(ctx.Input("Label")->NumLevels(), 1UL, + "The Input(Label) should be a sequence."); + auto in_lod = ctx.Input("Label")->lod(); + PADDLE_ENFORCE(in_lod.size(), "Input(Label) must be a sequence."); + const size_t level = 0; + const size_t seq_num = in_lod[level].size() - 1; + + // These local variables hold the inputs and outputs, garanteeing them on + // CPU memory, to provide a consistent reference. + // TODO(caoying) Fix this by moving all these local variables into the + // class's data members once we can profile the whole training process. + LoDTensor* emission_weights = nullptr; + LoDTensor emission_weight_tensor; + Tensor* transition_weights = nullptr; + Tensor transition_weight_tensor; + LoDTensor* label = nullptr; + LoDTensor label_tensor; + + Tensor* emission_exps = nullptr; + Tensor emission_exps_tensor; + Tensor* transition_exps = nullptr; + Tensor transition_exps_tensor; + Tensor* alpha = nullptr; + Tensor alpha_tensor; + Tensor* ll = nullptr; + Tensor ll_tensor; + + if (platform::is_gpu_place(ctx.GetPlace())) { + emission_weights = &emission_weight_tensor; + transition_weights = &transition_weight_tensor; + label = &label_tensor; + + CopyInputsToCpuMemory( + ctx.device_context(), *ctx.Input("Emission"), + *ctx.Input("Transition"), *ctx.Input("Label"), + emission_weights, transition_weights, label); + + emission_exps = &emission_exps_tensor; + emission_exps->Resize(emission_weights->dims()); + + transition_exps = &transition_exps_tensor; + transition_exps->Resize(transition_weights->dims()); + + alpha = &alpha_tensor; + alpha->Resize(ctx.Output("Alpha")->dims()); + + ll = &ll_tensor; + } else { + emission_weights = + const_cast(ctx.Input("Emission")); + transition_weights = const_cast(ctx.Input("Transition")); + label = const_cast(ctx.Input("Label")); + + emission_exps = ctx.Output("EmissionExps"); + transition_exps = ctx.Output("TransitionExps"); + alpha = ctx.Output("Alpha"); + ll = ctx.Output("LogLikelihood"); + } + + // Because the computation codes only runs on CPU, here the memory for all + // the outputs is FIXED to be allocated on the CPU memory. + emission_exps->mutable_data(platform::CPUPlace()); + transition_exps->mutable_data(platform::CPUPlace()); + alpha->mutable_data(platform::CPUPlace()); + + // Resize the output tensor to its correct dimension. + ll->Resize({static_cast(seq_num), 1}); + ll->mutable_data(platform::CPUPlace()); + + // Now, all the inputs and outputs should be on the CPU memory. + auto emission_dims = emission_weights->dims(); + const size_t batch_size = emission_dims[0]; + const size_t tag_num = emission_dims[1]; + + Tensor emission_row_max; + emission_row_max.mutable_data( + framework::make_ddim({static_cast(batch_size), 1}), + platform::CPUPlace()); + + auto place = ctx.GetEigenDevice(); + auto x = EigenMatrix::From(*emission_weights); + auto x_row_max = EigenMatrix::From(emission_row_max); + x_row_max.device(place) = + x.maximum(Eigen::DSizes(1)) + .reshape(Eigen::DSizes(int(batch_size), 1)); + + auto x_exps = EigenMatrix::From(*emission_exps); + x_exps.device(place) = + (x - x_row_max.broadcast(Eigen::DSizes(1, tag_num))).exp(); + + auto w = EigenMatrix::From(*transition_weights); + auto w_exps = EigenMatrix::From(*transition_exps); + w_exps.device(place) = w.exp(); + + T* log_likelihood = ll->data(); + for (size_t i = 0; i < seq_num; ++i) { + int start_pos = static_cast(in_lod[level][i]); + int end_pos = static_cast(in_lod[level][i + 1]); + if (end_pos == start_pos) { + // If an empty input sequence is given, pad 0 for its cost. + log_likelihood[i] = 0.; + continue; + } + + const Tensor one_seq = emission_weights->Slice(start_pos, end_pos); + Tensor one_seq_row_max = emission_row_max.Slice(start_pos, end_pos); + Tensor one_seq_exps = emission_exps->Slice(start_pos, end_pos); + const Tensor one_seq_label = label->Slice(start_pos, end_pos); + Tensor one_seq_alpha = alpha->Slice(start_pos, end_pos); + + log_likelihood[i] = ForwardOneSequence( + one_seq, one_seq_row_max, one_seq_exps, *transition_weights, + *transition_exps, one_seq_label, &one_seq_alpha); + } + + if (platform::is_gpu_place(ctx.GetPlace())) { + CopyOutputsToGpuMemory( + ctx.device_context(), *emission_exps, *transition_exps, *alpha, *ll, + ctx.Output("EmissionExps"), + ctx.Output("TransitionExps"), ctx.Output("Alpha"), + ctx.Output("LogLikelihood")); + } + }; + + private: + void CopyInputsToCpuMemory(const platform::DeviceContext& ctx, + const LoDTensor& emission_weights_src, + const Tensor& transition_weights_src, + const LoDTensor& label_src, + LoDTensor* emission_weights_dst, + Tensor* transition_weights_dst, + LoDTensor* label_dst) const { + // Copy the inputs from GPU memory to CPU memory if this operators runs on + // GPU device. + auto copyLoDTensor = [](const platform::DeviceContext& ctx, + const LoDTensor& src, LoDTensor* dst) { + dst->mutable_data(src.dims(), platform::CPUPlace()); + dst->CopyFrom(src, platform::CPUPlace(), ctx); + }; + + copyLoDTensor(ctx, emission_weights_src, emission_weights_dst); + copyLoDTensor(ctx, label_src, label_dst); + + transition_weights_dst->mutable_data(transition_weights_src.dims(), + platform::CPUPlace()); + transition_weights_dst->CopyFrom(transition_weights_src, + platform::CPUPlace(), ctx); + } + + void CopyOutputsToGpuMemory(const platform::DeviceContext& ctx, + const Tensor& emission_exps_src, + const Tensor& transition_exps_src, + const Tensor& alpha_src, const Tensor& ll_src, + Tensor* emission_exps_dst, + Tensor* transition_exps_dst, Tensor* alpha_dst, + Tensor* ll_dst) const { + // Copy the forward results from CPU memory to GPU memory if this + // operators runs on GPU device. + auto copyTensor = [](const platform::DeviceContext& ctx, const Tensor& src, + Tensor* dst) { + dst->mutable_data(platform::GPUPlace()); + dst->CopyFrom(src, platform::GPUPlace(), ctx); + }; + copyTensor(ctx, emission_exps_src, emission_exps_dst); + copyTensor(ctx, transition_exps_src, transition_exps_dst); + copyTensor(ctx, alpha_src, alpha_dst); + copyTensor(ctx, ll_src, ll_dst); + } + + T ForwardOneSequence(const Tensor& emission, const Tensor& emission_row_max, + const Tensor& emission_exps, const Tensor& trans_weights, + const Tensor& trans_weight_exps, const Tensor& label, + Tensor* alpha) const { + const T* x = emission.data(); + const T* x_row_max = emission_row_max.data(); + const T* x_exps = emission_exps.data(); + const T* w = trans_weights.data(); + const T* w_exps = trans_weight_exps.data(); + T* alpha_value = alpha->data(); + + auto x_dims = emission.dims(); + const size_t seq_length = x_dims[0]; + const size_t tag_num = x_dims[1]; + // The 1st row of w are transition weights for start mask. + // The 2nd row of w are transition weights for end mask. + // Transition weights between other tags begin from the 3rd row of w. + const size_t state_trans_base_idx = 2; + + for (size_t i = 0; i < tag_num; ++i) { + alpha_value[i] = w_exps[i] * x_exps[i]; + } + T ll = -x_row_max[0] - std::log(NormalizeL1(alpha_value, tag_num)); + + for (size_t k = 1; k < seq_length; ++k) { + for (size_t i = 0; i < tag_num; ++i) { + T sum = 0.; + for (size_t j = 0; j < tag_num; ++j) { + sum += alpha_value[(k - 1) * tag_num + j] * // (*) + w_exps[(j + state_trans_base_idx) * tag_num + i]; + } + alpha_value[k * tag_num + i] = x_exps[k * tag_num + i] * sum; + } + // NormalizeL1 is to avoid underflow or overflow at (*). + ll -= x_row_max[k] + + std::log(NormalizeL1(alpha_value + k * tag_num, tag_num)); + } + T sum = 0.; + for (size_t i = 0; i < tag_num; ++i) { + sum += alpha_value[(seq_length - 1) * tag_num + i] * w_exps[tag_num + i]; + } + ll -= std::log(sum); + // Now ll is equal to -log(Z). + + const int* lbl = label.data(); + PADDLE_ENFORCE_LT( + *std::max_element(lbl, lbl + seq_length), tag_num, + "An invalid tag label that execesses the largest tag number."); + + // Calculate the nominator part, which depends on the label sequence. + ll += w[lbl[0]] /*start transition*/ + x[lbl[0]] + + w[tag_num + lbl[seq_length - 1]] /*end transition*/; + for (size_t k = 1; k < seq_length; ++k) { + ll += x[k * tag_num + lbl[k]] + + w[(lbl[k - 1] + state_trans_base_idx) * tag_num + lbl[k]]; + } + return -ll; + } +}; + +template +class LinearChainCRFGradOpKernel : public framework::OpKernel { + public: + void Compute(const framework::ExecutionContext& ctx) const override { + const size_t level = 0; // currently, only support sequence. + auto lod = ctx.Input("Label")->lod(); + PADDLE_ENFORCE(lod.size(), "Input(Label) must be a sequence."); + + // These local variables hold the inputs and outputs, garanteeing them on + // CPU memory, to provide a consistent reference. + // TODO(caoying) Fix this by moving all these local variables into the + // class's data members once we can profile the training process, or + // implementing a real GPU kernel for CRF. + Tensor* label = nullptr; + Tensor label_tensor; + Tensor* emission_exps = nullptr; + Tensor emission_exps_tensor; + Tensor* transition_exps = nullptr; + Tensor transition_exps_tensor; + Tensor* alpha = nullptr; + Tensor alpha_tensor; + Tensor ll_grad_tensor; + T* ll_grad = nullptr; + + Tensor* emission_grad = nullptr; + Tensor emission_grad_tensor; + Tensor* transition_grad = nullptr; + Tensor transition_grad_tensor; + + if (platform::is_gpu_place(ctx.GetPlace())) { + label = &label_tensor; + emission_exps = &emission_exps_tensor; + transition_exps = &transition_exps_tensor; + alpha = &alpha_tensor; + CopyInputsToCpuMemory( + ctx.device_context(), *ctx.Input("Label"), + *ctx.Input("EmissionExps"), + *ctx.Input("TransitionExps"), *ctx.Input("Alpha"), + *ctx.Input(framework::GradVarName("LogLikelihood")), label, + emission_exps, transition_exps, alpha, &ll_grad_tensor); + ll_grad = ll_grad_tensor.data(); + + if (ctx.Output(framework::GradVarName("Emission"))) { + emission_grad = &emission_grad_tensor; + emission_grad->Resize(emission_exps->dims()); + } + + if (ctx.Output(framework::GradVarName("Transition"))) { + transition_grad = &transition_grad_tensor; + transition_grad->Resize(transition_exps->dims()); + } + } else { + label = const_cast(ctx.Input("Label")); + emission_exps = const_cast(ctx.Input("EmissionExps")); + transition_exps = + const_cast(ctx.Input("TransitionExps")); + alpha = const_cast(ctx.Input("Alpha")); + ll_grad = const_cast( + ctx.Input(framework::GradVarName("LogLikelihood"))) + ->data(); + + emission_grad = ctx.Output(framework::GradVarName("Emission")); + transition_grad = + ctx.Output(framework::GradVarName("Transition")); + } + + // TODO(caoying) Fix this constraint. When the Input(Emission) is from the + // data reader operator, it can have no gradients. + PADDLE_ENFORCE(emission_grad, "Output(Emission@Grad) should not be null."); + emission_grad->mutable_data(platform::CPUPlace()); + if (transition_grad) { + transition_grad->mutable_data(platform::CPUPlace()); + math::SetConstant()(ctx.device_context(), + transition_grad, 0.); + } + // Now, all the inputs and outputs should be on the CPU memory. + + auto emission_dims = emission_exps->dims(); + // Beta is the memo table used in dynamic programming to calculate the + // backwark vectors. For a backward vector i (the i-th row of beta), it + // captures the unnormalized probabilities of partial sequences starting + // at position i. + Tensor beta; + beta.mutable_data(emission_dims, platform::CPUPlace()); + + for (size_t i = 0; i < lod[level].size() - 1; ++i) { + int start_pos = static_cast(lod[level][i]); + int end_pos = static_cast(lod[level][i + 1]); + if (end_pos == start_pos) continue; + + const Tensor one_seq_emission_exps = + emission_exps->Slice(start_pos, end_pos); + const Tensor one_seq_label = label->Slice(start_pos, end_pos); + const Tensor one_seq_alpha = alpha->Slice(start_pos, end_pos); + Tensor one_seq_beta = beta.Slice(start_pos, end_pos); + Tensor one_seq_emission_grad = emission_grad->Slice(start_pos, end_pos); + + BackwardOneSequence(ctx.device_context(), ll_grad[i], + one_seq_emission_exps, *transition_exps, + one_seq_alpha, one_seq_label, &one_seq_beta, + transition_grad, &one_seq_emission_grad); + } + + if (platform::is_gpu_place(ctx.GetPlace())) { + CopyOutputsToGpuMemory( + ctx.device_context(), emission_grad, transition_grad, + ctx.Output(framework::GradVarName("Emission")), + ctx.Output(framework::GradVarName("Transition"))); + } + }; + + private: + void CopyInputsToCpuMemory(const platform::DeviceContext& ctx, + const LoDTensor& label_src, + const Tensor& emission_exps_src, + const Tensor& transition_exps_src, + const Tensor& alpha_src, const Tensor& ll_grad_src, + Tensor* label_dst, Tensor* emission_exps_dst, + Tensor* transition_exps_dst, Tensor* alpha_dst, + Tensor* ll_grad_dst) const { + // Copy the inputs from GPU memory to CPU memory when this operators runs on + // GPU device. + label_dst->mutable_data(label_src.dims(), platform::CPUPlace()); + label_dst->CopyFrom(label_src, platform::CPUPlace(), ctx); + + auto copyTensor = [](const platform::DeviceContext& ctx, const Tensor& src, + Tensor* dst) { + dst->mutable_data(src.dims(), platform::CPUPlace()); + dst->CopyFrom(src, platform::CPUPlace(), ctx); + }; + copyTensor(ctx, emission_exps_src, emission_exps_dst); + copyTensor(ctx, transition_exps_src, transition_exps_dst); + copyTensor(ctx, alpha_src, alpha_dst); + copyTensor(ctx, ll_grad_src, ll_grad_dst); + } + + void CopyOutputsToGpuMemory(const platform::DeviceContext& ctx, + const Tensor* emission_grad_src, + const Tensor* transition_grad_src, + Tensor* emission_grad_dst, + Tensor* transition_grad_dst) const { + // Copy the backward results from CPU memory to GPU + // memory if this operators runs on GPU device. + auto copyTensor = [](const platform::DeviceContext& ctx, const Tensor* src, + Tensor* dst) { + if (src && dst) { + dst->mutable_data(platform::GPUPlace()); + dst->CopyFrom(*src, platform::GPUPlace(), ctx); + } + }; + copyTensor(ctx, emission_grad_src, emission_grad_dst); + copyTensor(ctx, transition_grad_src, transition_grad_dst); + } + + void BackwardOneSequence(const platform::DeviceContext& ctx, const T ll_grad, + const Tensor& emission_exps, + const Tensor& transition_exps, const Tensor& alpha, + const Tensor& label, Tensor* beta, + Tensor* transition_grad, + Tensor* emission_grad) const { + const T* w_exps = transition_exps.data(); + const T* x_exps = emission_exps.data(); + const int* label_value = label.data(); + T* beta_value = beta->data(); + + auto x_dims = emission_exps.dims(); + const size_t seq_length = x_dims[0]; + const size_t tag_num = x_dims[1]; + const size_t state_trans_base_idx = 2; + + // Calculate the backward vectors: beta. + // First, calculate the initialition state. + for (size_t i = 0; i < tag_num; ++i) { + beta_value[(seq_length - 1) * tag_num + i] = w_exps[tag_num + i]; + } + NormalizeL1(beta_value + (seq_length - 1) * tag_num, tag_num); + for (int k = static_cast(seq_length) - 2; k >= 0; --k) { + for (size_t i = 0; i < tag_num; ++i) { + T sum = 0.; + for (size_t j = 0; j < tag_num; ++j) { + sum += w_exps[(i + state_trans_base_idx) * tag_num + j] * // (**) + x_exps[(k + 1) * tag_num + j] * + beta_value[(k + 1) * tag_num + j]; + } + beta_value[k * tag_num + i] = sum; + } + // NormalizeL1 is to avoid underflow or overflow at (**). + NormalizeL1(beta_value + k * tag_num, tag_num); + } + + auto x_grad_mat = EigenMatrix::From(*emission_grad); + auto alpha_mat = EigenMatrix::From(alpha); + auto beta_mat = EigenMatrix::From(*beta); + + auto* place = ctx.GetEigenDevice(); + auto prob = alpha_mat * beta_mat; + auto row_sum = prob.sum(Eigen::DSizes(1)) + .reshape(Eigen::DSizes(seq_length, 1)) + .broadcast(Eigen::DSizes(1, tag_num)); + x_grad_mat.device(*place) = + (prob / row_sum).unaryExpr(ScalarMul(ll_grad)); + + for (size_t k = 0; k < seq_length; ++k) { + x_grad_mat(k, label_value[k]) -= static_cast(ll_grad); + } + + if (transition_grad) { + T* trans_grad = transition_grad->data(); + for (size_t k = 0; k < tag_num; ++k) { + // Do not multiply by the output gradient here, because x_grad_mat has + // alrealy done this. + trans_grad[k] += x_grad_mat(/*from start state*/ 0, k); + trans_grad[tag_num + k] += + x_grad_mat(/*to end state*/ seq_length - 1, k); + } + + auto x_exps_mat = EigenMatrix::From(emission_exps); + + // TODO(caoying): Fix this to avoid using this local variable if we can + // profile the training process. + Tensor tmp; + tmp.mutable_data(beta->dims(), platform::CPUPlace()); + auto tmp_mat = EigenMatrix::From(tmp); + auto prob = beta_mat * x_exps_mat; + auto row_sum = prob.sum(Eigen::DSizes(1)) + .reshape(Eigen::DSizes(seq_length, 1)) + .broadcast(Eigen::DSizes(1, tag_num)); + tmp_mat.device(*place) = prob / row_sum; + + for (size_t k = 1; k < seq_length; ++k) { + T sum = 0.; + for (size_t i = 0; i < tag_num; ++i) { + for (size_t j = 0; j < tag_num; ++j) { + sum += w_exps[(i + state_trans_base_idx) * tag_num + j] * // (**) + alpha_mat(k - 1, i) * tmp_mat(k, j); + } + } + sum = 1. / sum; + for (size_t i = 0; i < tag_num; ++i) { + for (size_t j = 0; j < tag_num; ++j) { + trans_grad[(i + state_trans_base_idx) * tag_num + j] += + sum * w_exps[(i + state_trans_base_idx) * tag_num + j] * + alpha_mat(k - 1, i) * tmp_mat(k, j) * ll_grad; + } + } + trans_grad[(label_value[k - 1] + state_trans_base_idx) * tag_num + + label_value[k]] -= static_cast(ll_grad); + } + } + } +}; + +} // namespace operators +} // namespace paddle diff --git a/paddle/operators/lookup_table_op.cc b/paddle/operators/lookup_table_op.cc index 8fdd42352e5e6857e4bf0e4645f82c8e2fcdc6fd..0b361e20f2037b9b75bc8670488dff1c50fb689c 100644 --- a/paddle/operators/lookup_table_op.cc +++ b/paddle/operators/lookup_table_op.cc @@ -43,7 +43,7 @@ class LookupTableOp : public framework::OperatorWithKernel { protected: framework::DataType IndicateDataType( const framework::ExecutionContext& ctx) const override { - return framework::ToDataType(ctx.Input("W")->type()); + return framework::ToDataType(ctx.Input("W")->type()); } }; @@ -93,7 +93,7 @@ class LookupTableOpGrad : public framework::OperatorWithKernel { protected: framework::DataType IndicateDataType( const framework::ExecutionContext& ctx) const override { - return framework::ToDataType(ctx.Input("W")->type()); + return framework::ToDataType(ctx.Input("W")->type()); } }; diff --git a/paddle/operators/lookup_table_op.cu b/paddle/operators/lookup_table_op.cu index 837b2a1f4c94f201c0ab498671f936aab6c7a811..c7ba1720662fe80c945f2b4aa19745e408d40948 100644 --- a/paddle/operators/lookup_table_op.cu +++ b/paddle/operators/lookup_table_op.cu @@ -61,16 +61,16 @@ template class LookupTableCUDAKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { - auto table_t = context.Input("W"); - auto ids_t = context.Input("Ids"); - auto output_t = context.Output("Out"); + auto* table_t = context.Input("W"); + auto* ids_t = context.Input("Ids"); + auto* output_t = context.Output("Out"); size_t N = table_t->dims()[0]; size_t D = table_t->dims()[1]; size_t K = ids_t->numel(); - auto ids = ids_t->data(); - auto table = table_t->data(); - auto output = output_t->mutable_data(context.GetPlace()); + auto* ids = ids_t->data(); + auto* table = table_t->data(); + auto* output = output_t->mutable_data(context.GetPlace()); dim3 threads(128, 8); dim3 grids(8, 1); @@ -87,9 +87,9 @@ class LookupTableGradCUDAKernel : public framework::OpKernel { void Compute(const framework::ExecutionContext& context) const override { bool is_sparse = context.Attr("is_sparse"); if (is_sparse) { - auto* ids = context.Input("Ids"); - auto* table = context.Input("W"); - auto* d_output = context.Input(framework::GradVarName("Out")); + auto* ids = context.Input("Ids"); + auto* table = context.Input("W"); + auto* d_output = context.Input(framework::GradVarName("Out")); auto* d_table = context.Output(framework::GradVarName("W")); auto* ids_data = ids->data(); @@ -116,12 +116,12 @@ class LookupTableGradCUDAKernel : public framework::OpKernel { auto* d_output_data = d_output->data(); PADDLE_ENFORCE_EQ(d_table_value->dims(), d_output->dims()); memory::Copy(gpu_place, d_table_data, gpu_place, d_output_data, - d_output->numel(), stream); + d_output->numel() * sizeof(T), stream); } else { - auto ids_t = context.Input("Ids"); - auto d_output_t = context.Input(framework::GradVarName("Out")); - auto d_table_t = context.Output(framework::GradVarName("W")); + auto ids_t = context.Input("Ids"); + auto d_output_t = context.Input(framework::GradVarName("Out")); + auto d_table_t = context.Output(framework::GradVarName("W")); int N = d_table_t->dims()[0]; int D = d_table_t->dims()[1]; diff --git a/paddle/operators/lookup_table_op.h b/paddle/operators/lookup_table_op.h index 54067cd01d3ef35a050a3c2565ea19cb6520bcec..ea3289d2731a4b2098c3a199464559b0a0ce7202 100644 --- a/paddle/operators/lookup_table_op.h +++ b/paddle/operators/lookup_table_op.h @@ -19,22 +19,22 @@ namespace paddle { namespace operators { -using Tensor = framework::Tensor; +using LoDTensor = framework::LoDTensor; using SelectedRows = framework::SelectedRows; template class LookupTableKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { - auto table_t = context.Input("W"); // float tensor - auto ids_t = context.Input("Ids"); // int tensor - auto output_t = context.Output("Out"); // float tensor + auto* table_t = context.Input("W"); // float tensor + auto* ids_t = context.Input("Ids"); // int tensor + auto* output_t = context.Output("Out"); // float tensor int N = table_t->dims()[0]; int D = table_t->dims()[1]; - auto ids = ids_t->data(); - auto table = table_t->data(); - auto output = output_t->mutable_data(context.GetPlace()); + auto* ids = ids_t->data(); + auto* table = table_t->data(); + auto* output = output_t->mutable_data(context.GetPlace()); for (int64_t i = 0; i < ids_t->numel(); ++i) { PADDLE_ENFORCE_LT(ids[i], N); PADDLE_ENFORCE_GE(ids[i], 0); @@ -49,9 +49,9 @@ class LookupTableGradKernel : public framework::OpKernel { void Compute(const framework::ExecutionContext& context) const override { bool is_sparse = context.Attr("is_sparse"); if (is_sparse) { - auto* ids = context.Input("Ids"); - auto* table = context.Input("W"); - auto* d_output = context.Input(framework::GradVarName("Out")); + auto* ids = context.Input("Ids"); + auto* table = context.Input("W"); + auto* d_output = context.Input(framework::GradVarName("Out")); auto* d_table = context.Output(framework::GradVarName("W")); auto* ids_data = ids->data(); @@ -76,10 +76,10 @@ class LookupTableGradKernel : public framework::OpKernel { PADDLE_ENFORCE_EQ(d_table_value->dims(), d_output->dims()); memcpy(d_table_data, d_output_data, sizeof(T) * d_output->numel()); } else { - auto* ids = context.Input("Ids"); - auto* d_output = context.Input(framework::GradVarName("Out")); - auto* d_table = context.Output(framework::GradVarName("W")); - auto* table = context.Input("W"); + auto* ids = context.Input("Ids"); + auto* d_output = context.Input(framework::GradVarName("Out")); + auto* d_table = context.Output(framework::GradVarName("W")); + auto* table = context.Input("W"); auto* ids_data = ids->data(); auto ids_dim = ids->dims(); diff --git a/paddle/operators/nccl_op_test.cu b/paddle/operators/nccl_op_test.cu index 80c50a28a9e5d560fc693c518b9e62091ddc5724..e5927d56ae7cfbd09e941c993041af46ecd8d70d 100644 --- a/paddle/operators/nccl_op_test.cu +++ b/paddle/operators/nccl_op_test.cu @@ -185,7 +185,7 @@ TEST_F(NCCLTester, ncclAllReduceOp) { recv_tensor.numel() * sizeof(float), static_cast(dev_ctxs[i])->stream()); - for (size_t j = 0; j < f::product(kDims); ++j) { + for (int64_t j = 0; j < f::product(kDims); ++j) { ASSERT_NEAR(ct[j], result, 1e-5); } } @@ -234,7 +234,7 @@ TEST_F(NCCLTester, ncclReduceOp) { recv_tensor.numel() * sizeof(float), static_cast(dev_ctxs[kRoot])->stream()); - for (int j = 0; j < f::product(kDims); ++j) { + for (int64_t j = 0; j < f::product(kDims); ++j) { ASSERT_NEAR(ct[j], result, 1e-5); } } @@ -282,7 +282,7 @@ TEST_F(NCCLTester, ncclBcastOp) { recv_tensor.numel() * sizeof(float), static_cast(dev_ctxs[idx])->stream()); - for (size_t j = 0; j < f::product(kDims); ++j) { + for (int64_t j = 0; j < f::product(kDims); ++j) { ASSERT_NEAR(ct[j], result, 1e-5); } } diff --git a/paddle/operators/reshape_op.cc b/paddle/operators/reshape_op.cc index eda8226480a66ae1a631391e9335db04604039c5..9213cc7a85822e4c78ef72aec2bf86d2edac023a 100644 --- a/paddle/operators/reshape_op.cc +++ b/paddle/operators/reshape_op.cc @@ -36,7 +36,7 @@ class ReshapeOp : public framework::OperatorWithKernel { PADDLE_ENFORCE(shape.size() > 0, "Attr(shape) shouldn't be empty."); auto x_dims = ctx->GetInputDim("X"); // TODO(qiao) change batch_size - for (int i = 1; i < shape.size(); ++i) { + for (size_t i = 1; i < shape.size(); ++i) { PADDLE_ENFORCE(shape[i] > 0, "Each dimension of shape " "must be positiv except the first."); diff --git a/paddle/operators/save_load_op_test.cc b/paddle/operators/save_load_op_test.cc index fe2b15ec09c6d29ad5f78e5c36f534c6a88497e6..a57466a48d4d6016fe2618d19fdca4c4f667124a 100644 --- a/paddle/operators/save_load_op_test.cc +++ b/paddle/operators/save_load_op_test.cc @@ -34,7 +34,7 @@ TEST(SaveLoadOp, CPU) { tensor->set_lod(expect_lod); int* expect = tensor->mutable_data(place); - for (size_t i = 0; i < paddle::framework::product(tensor->dims()); ++i) { + for (int64_t i = 0; i < tensor->numel(); ++i) { expect[i] = static_cast(i); } paddle::framework::AttributeMap attrs; @@ -50,7 +50,7 @@ TEST(SaveLoadOp, CPU) { "load", {}, {{"Out", {"out_var"}}}, attrs); load_op->Run(scope, ctx); int* actual = target->data(); - for (size_t i = 0; i < paddle::framework::product(tensor->dims()); ++i) { + for (int64_t i = 0; i < tensor->numel(); ++i) { EXPECT_EQ(expect[i], actual[i]); } auto& actual_lod = target->lod(); @@ -60,4 +60,4 @@ TEST(SaveLoadOp, CPU) { EXPECT_EQ(expect_lod[i][j], actual_lod[i][j]); } } -} \ No newline at end of file +} diff --git a/paddle/operators/sequence_conv_op.cc b/paddle/operators/sequence_conv_op.cc index bdb52265a529f560b4622ee037dcb3160ac90dec..a3f2ed14439572e9723c3057d212bb773b2a4e44 100644 --- a/paddle/operators/sequence_conv_op.cc +++ b/paddle/operators/sequence_conv_op.cc @@ -89,7 +89,7 @@ class SequenceConvGradOp : public framework::OperatorWithKernel { } if (ctx->HasOutput(framework::GradVarName("X"))) { ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X")); - ctx->ShareLoD(framework::GradVarName("X"), "X"); + ctx->ShareLoD("X", framework::GradVarName("X")); } if (ctx->HasOutput(framework::GradVarName("Filter"))) { ctx->SetOutputDim(framework::GradVarName("Filter"), diff --git a/paddle/operators/sequence_pool_op.cc b/paddle/operators/sequence_pool_op.cc index 6d600c27271c660f0cf933e8bd05455df61740ec..29d19df10898634dd433abc1263fefe169de6f08 100644 --- a/paddle/operators/sequence_pool_op.cc +++ b/paddle/operators/sequence_pool_op.cc @@ -39,15 +39,14 @@ class SequencePoolOpMaker : public framework::OpProtoAndCheckerMaker { AddOutput("Out", "(Tensor), output of SequencePoolOp, which does not contain LoD " "infomation."); - AddAttr( - "strategy", - "(int, default AVERAGE) the pooling strategy of SequencePoolOp.") - .SetDefault(AVERAGE) - .InEnum({AVERAGE, SUM, SQRT, MAX, LAST, FIRST}); + AddAttr( + "pooltype", + "(int, default AVERAGE) the pooling pooltype of SequencePoolOp.") + .SetDefault("AVERAGE"); AddComment(R"DOC( SequencePoolOp pools features of all time-steps of each instance. - It supports six pooling strategy: + It supports six pooling pooltype: - AVERAGE: Out[i] = average_{for each instance in i-th sequence}{X[i]} - SUM: Out[i] = sum_{for each instance in i-th sequence}{X[i]} - SQRT: Out[i] = sum_{for each instance in i-th sequence}{X[i]} @@ -63,7 +62,7 @@ class SequencePoolOpMaker : public framework::OpProtoAndCheckerMaker { and the value of X = [[1, 3], [2, 4, 6], [5, 1]]. Thus, Out is a [3,1,1] Tensor without LoD infomation. - And for different strategy, the value of Out is as follows: + And for different pooltype, the value of Out is as follows: - AVERAGE: [2, 4, 3], where 2=(1+3)/2, 4=(2+4+6)/3, 3=(5+1)/2 - SUM: [4, 12, 6], where 4=1+3, 12=2+4+6, 6=5+1 diff --git a/paddle/operators/sequence_pool_op.h b/paddle/operators/sequence_pool_op.h index 07bf61df45bf51c8648180ffc9eb97306865fab6..e0e0493fe0ef7e1963ce5c2e3f37c164a605809b 100644 --- a/paddle/operators/sequence_pool_op.h +++ b/paddle/operators/sequence_pool_op.h @@ -29,22 +29,13 @@ template using EigenMatrix = framework::EigenMatrix; -enum SeqPoolType { - AVERAGE = 0, - SUM = 1, - SQRT = 2, // square_root_n - MAX = 3, - LAST = 4, - FIRST = 5 -}; - template class SequencePoolKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in = context.Input("X"); auto* out = context.Output("Out"); - int strategy = context.Attr("strategy"); + std::string pooltype = context.Attr("pooltype"); auto dims = in->dims(); auto lod = in->lod(); @@ -71,28 +62,21 @@ class SequencePoolKernel : public framework::OpKernel { auto in_e = EigenMatrix::From(in_t, framework::make_ddim({h, w})); auto out_e = EigenVector::Flatten(out_t); - switch (strategy) { - case AVERAGE: - out_e.device(place) = in_e.mean(Eigen::array({{0}})); - break; - case SUM: - out_e.device(place) = in_e.sum(Eigen::array({{0}})); - break; - case SQRT: - out_e.device(place) = in_e.sum(Eigen::array({{0}})) / - std::sqrt(static_cast(h)); - break; - case MAX: - out_e.device(place) = in_e.maximum(Eigen::array({{0}})); - break; - case LAST: - out_e.device(place) = in_e.chip(h - 1, 0); - break; - case FIRST: - out_e.device(place) = in_e.chip(0, 0); - break; - default: - PADDLE_THROW("unsupported pooling strategy"); + if (pooltype == "AVERAGE") { + out_e.device(place) = in_e.mean(Eigen::array({{0}})); + } else if (pooltype == "SUM") { + out_e.device(place) = in_e.sum(Eigen::array({{0}})); + } else if (pooltype == "SQRT") { + out_e.device(place) = in_e.sum(Eigen::array({{0}})) / + std::sqrt(static_cast(h)); + } else if (pooltype == "MAX") { + out_e.device(place) = in_e.maximum(Eigen::array({{0}})); + } else if (pooltype == "LAST") { + out_e.device(place) = in_e.chip(h - 1, 0); + } else if (pooltype == "FIRST") { + out_e.device(place) = in_e.chip(0, 0); + } else { + PADDLE_THROW("unsupported pooling pooltype"); } } } @@ -105,15 +89,15 @@ class SequencePoolGradKernel : public framework::OpKernel { auto* in = context.Input("X"); auto* in_g = context.Output(framework::GradVarName("X")); auto* out_g = context.Input(framework::GradVarName("Out")); - int strategy = context.Attr("strategy"); + std::string pooltype = context.Attr("pooltype"); auto dims = in->dims(); auto lod = in->lod()[0]; int64_t w = in->numel() / dims[0]; in_g->mutable_data(context.GetPlace()); - if (strategy == LAST || strategy == FIRST) { - // set X@Grad be zero at first when strategy is LAST/FIRST + if (pooltype == "LAST" || pooltype == "FIRST") { + // set X@Grad be zero at first when pooltype is LAST/FIRST math::SetConstant functor; functor(context.device_context(), in_g, 0); } @@ -127,41 +111,33 @@ class SequencePoolGradKernel : public framework::OpKernel { auto out_g_e = EigenMatrix::From(out_g_t, {1, w}); Eigen::DSizes bcast(h, 1); - switch (strategy) { - case AVERAGE: - in_g_e.device(place) = (out_g_e / static_cast(h)).broadcast(bcast); - break; - case SUM: - in_g_e.device(place) = (out_g_e).broadcast(bcast); - break; - case SQRT: - in_g_e.device(place) = - (out_g_e / std::sqrt(static_cast(h))).broadcast(bcast); - break; - case MAX: { - auto in_t = - in->Slice(static_cast(lod[i]), static_cast(lod[i + 1])); - Eigen::Map> - in_t_map(in_t.data(), h, w); - int row_id; - Eigen::array extents{{1, 1}}; - for (int col_id = 0; col_id < w; col_id++) { - in_t_map.col(col_id).maxCoeff(&row_id); - Eigen::array in_offsets{{row_id, col_id}}; - Eigen::array out_offsets{{0, col_id}}; - in_g_e.slice(in_offsets, extents).device(place) = - out_g_e.slice(out_offsets, extents); - } - break; + if (pooltype == "AVERAGE") { + in_g_e.device(place) = (out_g_e / static_cast(h)).broadcast(bcast); + } else if (pooltype == "SUM") { + in_g_e.device(place) = (out_g_e).broadcast(bcast); + } else if (pooltype == "SQRT") { + in_g_e.device(place) = + (out_g_e / std::sqrt(static_cast(h))).broadcast(bcast); + } else if (pooltype == "MAX") { + auto in_t = + in->Slice(static_cast(lod[i]), static_cast(lod[i + 1])); + Eigen::Map> + in_t_map(in_t.data(), h, w); + int row_id; + Eigen::array extents{{1, 1}}; + for (int col_id = 0; col_id < w; col_id++) { + in_t_map.col(col_id).maxCoeff(&row_id); + Eigen::array in_offsets{{row_id, col_id}}; + Eigen::array out_offsets{{0, col_id}}; + in_g_e.slice(in_offsets, extents).device(place) = + out_g_e.slice(out_offsets, extents); } - case LAST: - in_g_e.chip(h - 1, 0).device(place) = out_g_e; - break; - case FIRST: - in_g_e.chip(0, 0).device(place) = out_g_e; - break; - default: - PADDLE_THROW("unsupported pooling strategy"); + } else if (pooltype == "LAST") { + in_g_e.chip(h - 1, 0).device(place) = out_g_e; + } else if (pooltype == "FIRST") { + in_g_e.chip(0, 0).device(place) = out_g_e; + } else { + PADDLE_THROW("unsupported pooling pooltype"); } } } diff --git a/paddle/operators/softmax_with_cross_entropy_op.cc b/paddle/operators/softmax_with_cross_entropy_op.cc index 942fbb42df8bb90b86bd097832a15b320a857750..50497da1b70d39d2638240dd91035c9181124af9 100644 --- a/paddle/operators/softmax_with_cross_entropy_op.cc +++ b/paddle/operators/softmax_with_cross_entropy_op.cc @@ -32,9 +32,9 @@ class SoftmaxWithCrossEntropyOpMaker AddInput("Label", "(Tensor, default: Tensor), The ground truth which is a 2-D " "tensor. " - "If softLable is set to 0, Label is a Tensor with shape [N x " - "1]. " - "If softLable is set to 1, Label is a Tensor " + "If softLabel is set to false, Label is a Tensor with shape " + "[N x 1]." + "If softLabel is set to true, Label is a Tensor " "with shape [N x K]."); AddOutput( "Softmax", @@ -60,19 +60,23 @@ Because this operators performs a softmax on logits internally, it expects unscaled logits. Please do not call this op with the output of softmax operator, which will produce incorrect results. -This operators expects mutually exclusive hard labels, each sample in a batch -is in exactly one class with probabilities 1. Each sample in the batch with one -and only one label. +When the attribute softLabel is set false, this operators expects mutually +exclusive hard labels, each sample in a batch is in exactly one class with +probabilities 1. Each sample in the batch with one and only one label. Equation: 1) hard label (one-hot label) -Loss_j = -\text{Logit}_{Label_j} + \log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right), j = 1, ..., K +Loss_j = \f$ -\text{Logit}_{Label_j} + +\log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right), +j = 1, ..., K $\f 2) soft label (a distribution over all classes) -Loss_j = -\sum_{i=0}^{K}\text{Label}_i\left(\text{Logit}_i-\log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right)\right), j = 1,...,K +Loss_j = \f$ -\sum_{i=0}^{K}\text{Label}_i\left(\text{Logit}_i - +\log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right)\right), +j = 1,...,K $\f )DOC"); } diff --git a/paddle/pybind/protobuf.cc b/paddle/pybind/protobuf.cc index 14adfa1f35225ca5bf0c093dcf75d1c21af69676..dcae426c7e231757d796c5a84cc5a1c2b0d6763b 100644 --- a/paddle/pybind/protobuf.cc +++ b/paddle/pybind/protobuf.cc @@ -129,7 +129,8 @@ void BindProgramDesc(py::module &m) { } return retv; }) - .def("block", &ProgramDescBind::Block, py::return_value_policy::reference) + .def("block", &ProgramDescBind::MutableBlock, + py::return_value_policy::reference) .def("num_blocks", &ProgramDescBind::Size) .def("serialize_to_string", [](ProgramDescBind &program_desc) -> py::bytes { diff --git a/paddle/pybind/pybind.cc b/paddle/pybind/pybind.cc index 2a0075356ed1e0f0b3501ac681c5e3a1bf37e2ca..881df6ad3244e597f1c0f0121ccec08d50dd851e 100644 --- a/paddle/pybind/pybind.cc +++ b/paddle/pybind/pybind.cc @@ -275,7 +275,7 @@ All parameter, weight, gradient are variables in Paddle. const std::vector> &targets) { ProgramDescBind prog_with_targets(origin); for (const auto &t : targets) { - prog_with_targets.Block(t[0])->Op(t[1])->MarkAsTarget(); + prog_with_targets.MutableBlock(t[0])->Op(t[1])->MarkAsTarget(); } ProgramDesc pruned_desc; Prune(*prog_with_targets.Proto(), &pruned_desc); @@ -335,7 +335,7 @@ All parameter, weight, gradient are variables in Paddle. PADDLE_ENFORCE(desc.IsInitialized(), "User OpDesc is not initialized, reason %s", desc.InitializationErrorString()); - return OpRegistry::CreateOp(desc, nullptr); + return OpRegistry::CreateOp(desc); }) .def("backward", [](const OperatorBase &forwardOp, @@ -439,7 +439,7 @@ All parameter, weight, gradient are variables in Paddle. PADDLE_ENFORCE(desc.IsInitialized(), "User OpDesc is not initialized, reason %s", desc.InitializationErrorString()); - auto rnn_op = OpRegistry::CreateOp(desc, nullptr); + auto rnn_op = OpRegistry::CreateOp(desc); return static_cast(rnn_op.release()); }) .def("set_stepnet", [](operators::RecurrentOp &self, @@ -457,7 +457,7 @@ All parameter, weight, gradient are variables in Paddle. PADDLE_ENFORCE(desc.IsInitialized(), "User OpDesc is not initialized, reason %s", desc.InitializationErrorString()); - auto rnn_op = OpRegistry::CreateOp(desc, nullptr); + auto rnn_op = OpRegistry::CreateOp(desc); return static_cast( rnn_op.release()); }) @@ -484,7 +484,7 @@ All parameter, weight, gradient are variables in Paddle. PADDLE_ENFORCE(desc.IsInitialized(), "User OpDesc is not initialized, reason %s", desc.InitializationErrorString()); - auto cond_op = OpRegistry::CreateOp(desc, nullptr); + auto cond_op = OpRegistry::CreateOp(desc); return static_cast(cond_op.release()); }) .def("set_truenet", @@ -498,10 +498,7 @@ All parameter, weight, gradient are variables in Paddle. py::class_(m, "Executor") .def(py::init &>()) - .def("run", [](Executor &self, ProgramDescBind *program_bind, - Scope *scope, int block_id) { - self.Run(*program_bind->Proto(), scope, block_id); - }); + .def("run", &Executor::Run); m.def("unique_integer", UniqueIntegerGenerator); m.def("init_gflags", InitGflags); diff --git a/paddle/scripts/docker/build_android.sh b/paddle/scripts/docker/build_android.sh index 11612ad4bed0afa8496087605afaefbd0420d5ce..6ef45d33d8c9e32e564555854c10a6fe15e4fd9f 100644 --- a/paddle/scripts/docker/build_android.sh +++ b/paddle/scripts/docker/build_android.sh @@ -4,6 +4,10 @@ set -xe if [ $ANDROID_ABI == "arm64-v8a" ]; then ANDROID_ARCH=arm64 + if [ $ANDROID_API -lt 21 ]; then + echo "Warning: arm64-v8a requires ANDROID_API >= 21." + ANDROID_API=21 + fi else # armeabi, armeabi-v7a ANDROID_ARCH=arm fi diff --git a/paddle/trainer/MergeModel.cpp b/paddle/trainer/MergeModel.cpp index a70673ffec8812d86b9a0c13f15ef0b378dcf3ce..f3cfd9f97fea837e8f666f2eabee5a75659a4e42 100644 --- a/paddle/trainer/MergeModel.cpp +++ b/paddle/trainer/MergeModel.cpp @@ -27,6 +27,13 @@ using namespace paddle; // NOLINT using namespace std; // NOLINT int main(int argc, char** argv) { + if (FLAGS_model_dir.empty() || FLAGS_config_file.empty() || + FLAGS_model_file.empty()) { + LOG(INFO) << "Usage: ./paddle_merge_model --model_dir=pass-00000 " + "--config_file=config.py --model_file=out.paddle"; + return 0; + } + initMain(argc, argv); initPython(argc, argv); diff --git a/paddle/trainer/tests/CMakeLists.txt b/paddle/trainer/tests/CMakeLists.txt index 5ebbb99c94bce45d295ae0bf585f2cf864bfc4d4..f01ad4142d4fe7c7f7d7aac60d967ea114b93e56 100644 --- a/paddle/trainer/tests/CMakeLists.txt +++ b/paddle/trainer/tests/CMakeLists.txt @@ -37,22 +37,6 @@ add_test(NAME test_CompareTwoNets --config_file_a=trainer/tests/sample_trainer_config_qb_rnn.conf --config_file_b=trainer/tests/sample_trainer_config_rnn.conf WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle/) -################ test_CompareMKLDNNandCPU ###################### -if(WITH_MKLDNN) - macro(gen_command VAR_NAME CONFIG_FILE) - set(${VAR_NAME} "${PADDLE_SOURCE_DIR}/paddle/.set_python_path.sh" "-d" "${PADDLE_SOURCE_DIR}/python/" - "${CMAKE_CURRENT_BINARY_DIR}/test_CompareMKLDNNandCPU --use_gpu=False" - "--config_file_a=trainer/tests/${CONFIG_FILE} --use_mkldnn_a=True" - "--config_file_b=trainer/tests/${CONFIG_FILE} --use_mkldnn_b=False" - "WORKING_DIRECTORY" "${PADDLE_SOURCE_DIR}/paddle/") - endmacro() - add_unittest_without_exec(test_CompareMKLDNNandCPU test_CompareTwoNets.cpp) - gen_command(compare_simple_net "sample_trainer_config_simple_net.conf") - gen_command(compare_branch_net "sample_trainer_config_branch_net.conf") - add_test(NAME test_CompareMKLDNNandCPU_simple_net COMMAND ${compare_simple_net}) - add_test(NAME test_CompareMKLDNNandCPU_branch_net COMMAND ${compare_branch_net}) -endif() - ############### test_CompareTwoOpts ################### add_unittest_without_exec(test_CompareTwoOpts test_CompareTwoOpts.cpp) diff --git a/paddle/trainer/tests/sample_trainer_config_branch_net.conf b/paddle/trainer/tests/sample_trainer_config_branch_net.conf deleted file mode 100644 index 3d8fb77a11958218091d2ee72e1d5a40ad1d9f5b..0000000000000000000000000000000000000000 --- a/paddle/trainer/tests/sample_trainer_config_branch_net.conf +++ /dev/null @@ -1,133 +0,0 @@ -# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from paddle.trainer_config_helpers import * - -################################### Data Configuration ################################### -TrainData(ProtoData(files = "trainer/tests/mnist.list")) -################################### Algorithm Configuration ################################### -settings(batch_size = 128, - learning_method = MomentumOptimizer(momentum=0.5, sparse=False)) -################################### Network Configuration ################################### -data = data_layer(name ="input", size=784) - -tmp = img_conv_layer(input=data, - num_channels=1, - filter_size=3, - num_filters=32, - padding=1, - shared_biases=True, - act=ReluActivation()) - -a1 = img_conv_layer(input=tmp, - filter_size=1, - num_filters=32, - padding=0, - shared_biases=True, - act=ReluActivation()) - -a2 = img_conv_layer(input=tmp, - filter_size=3, - num_filters=32, - padding=1, - shared_biases=True, - act=ReluActivation()) - -tmp = addto_layer(input=[a1, a2], - act=ReluActivation(), - bias_attr=False) - -tmp = img_pool_layer(input=tmp, - pool_size=3, - stride=2, - padding=1, - pool_type=AvgPooling()) - -b1 = img_conv_layer(input=tmp, - filter_size=3, - num_filters=32, - padding=1, - shared_biases=True, - act=ReluActivation()) - -b1 = img_pool_layer(input=b1, - pool_size=3, - stride=2, - padding=0, - pool_type=MaxPooling()) - -b2 = img_conv_layer(input=tmp, - filter_size=3, - num_filters=64, - padding=1, - shared_biases=True, - act=ReluActivation()) - -b2 = img_pool_layer(input=b2, - pool_size=5, - stride=2, - padding=1, - pool_type=MaxPooling()) - -tmp = concat_layer(input=[b1, b2]) - -tmp = img_pool_layer(input=tmp, - num_channels=96, - pool_size=3, - stride=2, - padding=1, - pool_type=MaxPooling()) - -tmp = img_conv_layer(input=tmp, - filter_size=3, - num_filters=32, - padding=1, - shared_biases=True, - act=LinearActivation(), - bias_attr=False) - -tmp = batch_norm_layer(input=tmp, - use_global_stats=False, - act=ReluActivation()) - -c1 = img_conv_layer(input=tmp, - filter_size=1, - num_filters=32, - padding=0, - shared_biases=True, - act=ReluActivation()) - -c2 = img_conv_layer(input=tmp, - filter_size=3, - num_filters=32, - padding=1, - shared_biases=True, - act=ReluActivation()) - -tmp = addto_layer(input=[c1, c2], - act=ReluActivation(), - bias_attr=False) - -tmp = fc_layer(input=tmp, size=64, - bias_attr=False, - act=TanhActivation()) - -output = fc_layer(input=tmp, size=10, - bias_attr=True, - act=SoftmaxActivation()) - -lbl = data_layer(name ="label", size=10) - -cost = classification_cost(input=output, label=lbl) -outputs(cost) diff --git a/paddle/trainer/tests/sample_trainer_config_simple_net.conf b/paddle/trainer/tests/sample_trainer_config_simple_net.conf deleted file mode 100644 index c615b5622b7e50b7aa99a9fcf9f63d7b4351417c..0000000000000000000000000000000000000000 --- a/paddle/trainer/tests/sample_trainer_config_simple_net.conf +++ /dev/null @@ -1,68 +0,0 @@ -# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from paddle.trainer_config_helpers import * - -################################### Data Configuration ################################### -TrainData(ProtoData(files = "trainer/tests/mnist.list")) -################################### Algorithm Configuration ################################### -settings(batch_size = 128, - learning_method = MomentumOptimizer(momentum=0.5, sparse=False)) -################################### Network Configuration ################################### -data = data_layer(name ="input", size=784) - -tmp = img_conv_layer(input=data, - num_channels=1, - filter_size=3, - num_filters=32, - padding=1, - shared_biases=True, - act=ReluActivation()) - -tmp = img_pool_layer(input=tmp, - pool_size=3, - stride=2, - padding=1, - pool_type=AvgPooling()) - -tmp = img_conv_layer(input=tmp, - filter_size=3, - num_filters=32, - padding=1, - shared_biases=True, - act=LinearActivation(), - bias_attr=False) - -tmp = batch_norm_layer(input=tmp, - use_global_stats=False, - act=ReluActivation()) - -tmp = img_pool_layer(input=tmp, - pool_size=3, - stride=2, - padding=1, - pool_type=MaxPooling()) - -tmp = fc_layer(input=tmp, size=64, - bias_attr=True, - act=ReluActivation()) - -output = fc_layer(input=tmp, size=10, - bias_attr=True, - act=SoftmaxActivation()) - -lbl = data_layer(name ="label", size=10) - -cost = classification_cost(input=output, label=lbl) -outputs(cost) diff --git a/paddle/trainer/tests/test_CompareTwoNets.cpp b/paddle/trainer/tests/test_CompareTwoNets.cpp index 307645d2c3d21d954371fcedb5f95a2536a0183e..94f65e545d116c802fb4877dc14f07aaaf83a4fb 100644 --- a/paddle/trainer/tests/test_CompareTwoNets.cpp +++ b/paddle/trainer/tests/test_CompareTwoNets.cpp @@ -26,15 +26,12 @@ DECLARE_int32(gpu_id); DECLARE_bool(local); DECLARE_bool(use_gpu); -DECLARE_bool(use_mkldnn); DECLARE_string(config); DECLARE_string(nics); DEFINE_string(config_file_a, "", "config of one network to compare"); DEFINE_string(config_file_b, "", "config of another network to compare"); -DEFINE_bool(use_mkldnn_a, false, "whether to use mkldnn to run config_file_a"); -DEFINE_bool(use_mkldnn_b, false, "whether to use mkldnn to run config_file_b"); DEFINE_bool(need_high_accuracy, false, "whether need to run in double accuracy"); @@ -131,12 +128,6 @@ void compareGradient(ComData& comDataA, ComData& comDataB) { matA.getWidth()); } - if (FLAGS_use_mkldnn_a || FLAGS_use_mkldnn_b) { - // some format of mkldnn parameter is different with cpu - // test_MKLDNN will check the parameters - return; - } - vector& parametersA = comDataA.parameters; vector& parametersB = comDataB.parameters; @@ -176,12 +167,10 @@ void compareGradient(ComData& comDataA, ComData& comDataB) { TEST(Trainer, create) { ComData dataA; - FLAGS_use_mkldnn = FLAGS_use_mkldnn_a; calcGradient(dataA, FLAGS_config_file_a); LOG(INFO) << "\n\nforwardBackward of Network A is finished\n\n"; ComData dataB; - FLAGS_use_mkldnn = FLAGS_use_mkldnn_b; calcGradient(dataB, FLAGS_config_file_b); LOG(INFO) << "\n\nforwardBackward of the Network B is finished\n\n"; diff --git a/python/paddle/v2/dataset/imdb.py b/python/paddle/v2/dataset/imdb.py index 93dd3e8f7d3a569eaf56335f0f92bed04c0ee26c..cfc1c886e1389c15e3f803c341b6f62dd7b4bf41 100644 --- a/python/paddle/v2/dataset/imdb.py +++ b/python/paddle/v2/dataset/imdb.py @@ -116,7 +116,7 @@ def reader_creator(pos_pattern, neg_pattern, word_idx, buffer_size): yield [word_idx.get(w, UNK) for w in doc], i % 2 doc = qs[i % 2].get() - return reader() + return reader def train(word_idx): diff --git a/python/paddle/v2/framework/framework.py b/python/paddle/v2/framework/framework.py index f8d2f67410a6c06a1642180d2d62c881ec6bda3d..b3493fc378f4d866d16a6a0a739e0ccc8e44d97c 100644 --- a/python/paddle/v2/framework/framework.py +++ b/python/paddle/v2/framework/framework.py @@ -354,8 +354,8 @@ class Block(object): def create_var(self, *args, **kwargs): var = Variable(self, *args, **kwargs) - if 'init_attr' in kwargs: - self._prepend_initialize_ops_(var, kwargs['init_attr']) + if 'initializer' in kwargs: + kwargs['initializer'](var, self) return var def has_var(self, name): @@ -364,8 +364,8 @@ class Block(object): def create_parameter(self, *args, **kwargs): global_block = self.program.global_block() param = Parameter(global_block, *args, **kwargs) - if 'init_attr' in kwargs: - self._prepend_initialize_ops_(param, kwargs['init_attr']) + if 'initializer' in kwargs: + kwargs['initializer'](param, self) return param def append_op(self, *args, **kwargs): @@ -424,17 +424,6 @@ class Block(object): for index in range(len(self.ops)): assert self.ops[index].desc == ops_in_cpp[index] - def _prepend_initialize_ops_(self, param, init_attr): - op_type = init_attr['type'] - init_attr['shape'] = param.shape - init_attr['data_type'] = int(param.data_type) - op = self.prepend_op( - type=op_type, - inputs=None, - outputs={'Out': [param]}, - attrs=init_attr) - param.op = op - class Program(object): def __init__(self): diff --git a/python/paddle/v2/framework/initializer.py b/python/paddle/v2/framework/initializer.py new file mode 100644 index 0000000000000000000000000000000000000000..507fd16062af1e2458eb9b45407e91a8d29ea9ce --- /dev/null +++ b/python/paddle/v2/framework/initializer.py @@ -0,0 +1,158 @@ +import paddle.v2.framework.framework as framework + +__all__ = ['ConstantInitializer', 'UniformInitializer'] + + +class Initializer(object): + """Base class for variable initializers + + Defines the common interface of variable initializers. + They add operations to the init program that are used + to initialize variables. Users should not use this class + directly, but need to use one of its implementations. + """ + + def __init_(self): + pass + + def __call__(self, param, block): + """Add corresponding initialization operations to the network + """ + raise NotImplementedError() + + +class ConstantInitializer(Initializer): + """Implements the constant initializer + """ + + def __init__(self, value=0.0): + """Constructor for ConstantInitializer + + Args: + value: constant value to initialize the variable + """ + assert value is not None + super(ConstantInitializer, self).__init__() + self._value = value + + def __call__(self, var, block): + """Add constant initialization ops for a variable + + Args: + var: Variable that needs to be initialized + block: The block in which initialization ops + should be added + + Returns: + the initialization op + """ + assert isinstance(var, framework.Variable) + assert isinstance(block, framework.Block) + # Initialization Ops should be prepended and not appended + op = block.prepend_op( + type="fill_constant", + outputs={"Out": var}, + attrs={ + "shape": var.shape, + "data_type": int(var.data_type), + "value": self._value + }) + var.op = op + return op + + +class UniformInitializer(Initializer): + """Implements the random uniform distribution initializer + """ + + def __init__(self, low=-1.0, high=1.0, seed=0): + """Constructor for UniformInitializer + + Args: + low: lower boundary of the uniform distribution + high: upper boundary of the uniform distribution + seed: random seed + """ + assert low is not None + assert high is not None + assert high >= low + assert seed is not None + super(UniformInitializer, self).__init__() + self._low = low + self._high = high + self._seed = seed + + def __call__(self, var, block): + """Add uniform distribution initialization ops for a variable + + Args: + var: Variable that needs to be initialized + block: The block in which initialization ops + should be added + + Returns: + the initialization op + """ + assert isinstance(var, framework.Variable) + assert isinstance(block, framework.Block) + # Initialization Ops should be prepended and not appended + op = block.prepend_op( + type="uniform_random", + outputs={"Out": var}, + attrs={ + "shape": var.shape, + "data_type": int(var.data_type), + "min": self._low, + "max": self._high, + "seed": self._seed + }) + var.op = op + return op + + +class NormalInitializer(Initializer): + """Implements the random Normal(Gaussian) distribution initializer + """ + + def __init__(self, loc=0.0, scale=1.0, seed=0): + """Constructor for NormalInitializer + + Args: + loc: mean of the normal distribution + scale: standard deviation of the normal distribution + seed: random seed + """ + assert loc is not None + assert scale is not None + assert seed is not None + super(NormalInitializer, self).__init__() + self._mean = loc + self._std_dev = scale + self._seed = seed + + def __call__(self, var, block): + """Add normal distribution initialization ops for a variable + + Args: + var: Variable that needs to be initialized + block: The block in which initialization ops + should be added + + Returns: + the initialization op + """ + assert isinstance(var, framework.Variable) + assert isinstance(block, framework.Block) + # Initialization Ops should be prepended and not appended + op = block.prepend_op( + type="gaussian_random", + outputs={"Out": var}, + attrs={ + "shape": var.shape, + "data_type": int(var.data_type), + "mean": self._mean, + "std": self._std_dev, + "seed": self._seed + }) + var.op = op + return op diff --git a/python/paddle/v2/framework/layer_helper.py b/python/paddle/v2/framework/layer_helper.py index d96dbe172c22617182e7ebf4aab175c6142352b7..45d9cf3f4827a3c31005f9650240bcdba47696e7 100644 --- a/python/paddle/v2/framework/layer_helper.py +++ b/python/paddle/v2/framework/layer_helper.py @@ -5,6 +5,8 @@ import paddle.v2.framework.core as core from paddle.v2.framework.framework import Variable, g_program, \ g_init_program +from paddle.v2.framework.initializer import ConstantInitializer, \ + UniformInitializer def unique_name(prefix): @@ -66,14 +68,7 @@ class LayerHelper(object): @property def param_attr(self): - default = { - 'name': None, - 'init_attr': { - 'type': 'uniform_random', - 'min': -1.0, - 'max': 1.0 - } - } + default = {'name': None, 'initializer': UniformInitializer()} actual = self.kwargs.get('param_attr', None) if actual is None: actual = default @@ -83,13 +78,7 @@ class LayerHelper(object): return actual def bias_attr(self): - default = { - 'name': None, - 'init_attr': { - 'type': 'fill_constant', - 'value': 0.0 - } - } + default = {'name': None, 'initializer': ConstantInitializer()} bias_attr = self.kwargs.get('bias_attr', None) if bias_attr is True: bias_attr = default @@ -153,8 +142,24 @@ class LayerHelper(object): return self.program.global_block().create_var( *args, persistable=False, **kwargs) - def append_bias_op(self, input_var): - size = list(input_var.shape[1:]) + def append_bias_op(self, input_var, num_flatten_dims=None): + """ + Append bias operator and return its output. If the user does not set + bias_attr, append_bias_op will return input_var + + :param input_var: the input variable. The len(input_var.shape) is larger + or equal than 2. + :param num_flatten_dims: The input tensor will be flatten as a matrix + when adding bias. + `matrix.shape = product(input_var.shape[0:num_flatten_dims]), product( + input_var.shape[num_flatten_dims:])` + """ + if num_flatten_dims is None: + num_flatten_dims = self.kwargs.get('num_flatten_dims', None) + if num_flatten_dims is None: + num_flatten_dims = 1 + + size = list(input_var.shape[num_flatten_dims:]) bias_attr = self.bias_attr() if not bias_attr: return input_var diff --git a/python/paddle/v2/framework/layers.py b/python/paddle/v2/framework/layers.py index 6451d11e2b68692527addb424b0cd716f23bd77a..86a2c7bf08b09638fd065f96300fc4f7ffc332d5 100644 --- a/python/paddle/v2/framework/layers.py +++ b/python/paddle/v2/framework/layers.py @@ -1,11 +1,13 @@ from paddle.v2.framework.layer_helper import LayerHelper, unique_name import paddle.v2.framework.core as core from paddle.v2.framework.framework import OpProtoHolder, Variable, Program +from paddle.v2.framework.initializer import ConstantInitializer import re __all__ = [ 'fc', 'data', 'cross_entropy', 'conv2d', 'pool2d', 'embedding', 'concat', - 'StaticRNN', 'cast', 'sequence_conv', 'sequence_pool', 'accuracy' + 'StaticRNN', 'cast', 'sequence_conv', 'sequence_pool', 'sums', 'cos_sim', + 'batch_norm', 'accuracy' ] @@ -165,18 +167,6 @@ _create_op_func_('dropout') _create_op_func_('reshape') -def cast(x, data_type, program=None): - helper = LayerHelper('cast', **locals()) - out = helper.create_tmp_variable(dtype=data_type) - helper.append_op( - type='cast', - inputs={'X': [x]}, - outputs={'Out': [out]}, - attrs={'in_data_type': x.data_type, - 'out_data_type': out.data_type}) - return out - - def cast(x, data_type, program=None): helper = LayerHelper('cast', **locals()) out = helper.create_tmp_variable(dtype=data_type) @@ -191,9 +181,7 @@ def cast(x, data_type, program=None): def concat(input, axis, program=None, init_program=None): helper = LayerHelper('concat', **locals()) - if not isinstance(input, list) and not isinstance(input, tuple): - input = [input] - out = helper.create_tmp_variable(dtype=input[0].data_type) + out = helper.create_tmp_variable(dtype=helper.input_dtype()) helper.append_op( type='concat', inputs={'X': input}, @@ -202,6 +190,28 @@ def concat(input, axis, program=None, init_program=None): return out +def sums(input, program=None, init_program=None): + helper = LayerHelper('sum', **locals()) + out = helper.create_tmp_variable(dtype=helper.input_dtype()) + helper.append_op(type='sum', inputs={'X': [input]}, outputs={'Out': out}) + return out + + +def cos_sim(X, Y, program=None, init_program=None): + helper = LayerHelper('cos_sim', **locals()) + out = helper.create_tmp_variable(dtype=helper.input_dtype("X")) + xnorm = helper.create_tmp_variable(dtype=helper.input_dtype("X")) + ynorm = helper.create_tmp_variable(dtype=helper.input_dtype("X")) + helper.append_op( + type='cos_sim', + inputs={'X': [X], + 'Y': [Y]}, + outputs={'Out': [out], + 'XNorm': [xnorm], + 'YNorm': [ynorm]}) + return out, xnorm, ynorm + + def cross_entropy(input, label, **kwargs): helper = LayerHelper('cross_entropy', **kwargs) out = helper.create_tmp_variable(dtype=input.data_type) @@ -254,9 +264,7 @@ def accuracy(input, label, k=1, **kwargs): def sequence_conv(input, num_filters, - name=None, filter_size=3, - act=None, stride=1, padding=None, bias_attr=None, @@ -270,7 +278,7 @@ def sequence_conv(input, helper = LayerHelper('sequence_conv', **locals()) dtype = helper.input_dtype() - filter_shape = [num_filters, filter_size] + filter_shape = [filter_size * input.shape[1], num_filters] filter = helper.create_parameter( attr=helper.param_attr, shape=filter_shape, dtype=dtype) pre_bias = helper.create_tmp_variable(dtype) @@ -279,7 +287,7 @@ def sequence_conv(input, type='sequence_conv', inputs={ 'X': [input], - 'Filter': filter, + 'Filter': [filter], }, outputs={"Out": pre_bias}, attrs={ @@ -287,7 +295,6 @@ def sequence_conv(input, 'context_start': 0, 'context_length': filter_size }) - pre_act = helper.append_bias_op(pre_bias) return helper.append_activation(pre_act) @@ -344,31 +351,21 @@ def conv2d(input, return helper.append_activation(pre_act) -def sequence_pool(input, - pool_size, - pool_type, - pool_stride=1, - pool_padding=0, - global_pooling=False, - program=None, - init_program=None): - # FIXME(dzh) : want to unify the argument of python layer - # function. So we ignore some unecessary attributes - - ENUM_POOL_TYPE = set(["max", "avg", "sqrt", "last", "first"]) - if pool_type not in ENUM_POOL_TYPE: +def sequence_pool(input, pool_type, **kwargs): + ENUM_POOL_TYPE = set(["MAX", "AVG", "SQRT", "LAST", "FIRST"]) + if pool_type.upper() not in ENUM_POOL_TYPE: raise ValueError("Unknown pool_type: '%s'. It can only be %s.", str(pool_type), " ".join(ENUM_POOL_TYPE)) - helper = LayerHelper('sequence_pool', **locals()) + helper = LayerHelper('sequence_pool', **kwargs) dtype = helper.input_dtype() pool_out = helper.create_tmp_variable(dtype) helper.append_op( type="sequence_pool", inputs={"X": [input]}, - outputs={"Out": pool_out}, - attrs={"strategy": pool_type}) + outputs={"Out": [pool_out]}, + attrs={"pooltype": pool_type.upper()}) return pool_out @@ -433,26 +430,12 @@ def batch_norm(input, else: raise ValueError("unsupported data layout:" + data_layout) - def get_init_attr(value): - if not isinstance(value, float): - raise ValueError("attr value should be a float") - return {'type': 'fill_constant', 'value': value} - - def prepend_init_op(var, init_attr): - assert isinstance(var, Variable) - op_type = init_attr['type'] - init_attr['shape'] = var.shape - init_attr['data_type'] = int(var.data_type) - op = var.block.prepend_op( - type=op_type, inputs=None, outputs={'Out': [var]}, attrs=init_attr) - return op - - def create_persistable_var(dtype, shape, init_attr=None): + def create_persistable_var(dtype, shape, initializer=None): name = unique_name(".".join([helper.name, "xxxx"])) var = init_program.global_block().create_var( dtype=dtype, shape=shape, name=name, persistable=True) - if 'init_attr' is not None: - prepend_init_op(var, init_attr) + if initializer is not None: + initializer(var, var.block) return program.global_block().create_var( name=name, dtype=dtype, shape=shape, persistable=True) @@ -465,8 +448,9 @@ def batch_norm(input, attr=helper.param_attr, shape=param_shape, dtype=dtype) # create input - mean = create_persistable_var(dtype, param_shape, get_init_attr(0.0)) - variance = create_persistable_var(dtype, param_shape, get_init_attr(1.0)) + mean = create_persistable_var(dtype, param_shape, ConstantInitializer(0.0)) + variance = create_persistable_var(dtype, param_shape, + ConstantInitializer(1.0)) # create output # mean and mean_out share the same memory diff --git a/python/paddle/v2/framework/nets.py b/python/paddle/v2/framework/nets.py index a9998073e164a223e5d99fc26146ba48027d7a3e..8191b5ef44de840c90087ff4b9eb6923b3f1fbae 100644 --- a/python/paddle/v2/framework/nets.py +++ b/python/paddle/v2/framework/nets.py @@ -101,24 +101,19 @@ def img_conv_group(input, def sequence_conv_pool(input, num_filters, filter_size, - pool_size, - pool_stride, - act, + pool_type="max", program=None, init_program=None): conv_out = layers.sequence_conv( input=input, num_filters=num_filters, filter_size=filter_size, - act=act, program=program, init_program=init_program) pool_out = layers.sequence_pool( input=conv_out, - pool_size=pool_size, - pool_type='max', - pool_stride=pool_stride, + pool_type=pool_type, program=program, init_program=init_program) return pool_out diff --git a/python/paddle/v2/framework/tests/test_gaussian_random_op.py b/python/paddle/v2/framework/tests/test_gaussian_random_op.py index 8b7779667d5e806c06b333527f774c7987ce7e73..0dc7e091a5c8dd046f36cab7f79a15b2281cdd90 100644 --- a/python/paddle/v2/framework/tests/test_gaussian_random_op.py +++ b/python/paddle/v2/framework/tests/test_gaussian_random_op.py @@ -19,7 +19,7 @@ class TestGaussianRandomOp(unittest.TestCase): op = Operator( "gaussian_random", Out='Out', - dims=[1000, 784], + shape=[1000, 784], mean=.0, std=1., seed=10) diff --git a/python/paddle/v2/framework/tests/test_initializer.py b/python/paddle/v2/framework/tests/test_initializer.py new file mode 100644 index 0000000000000000000000000000000000000000..f28fc8a86c7c8e683e00249a2f73dbbe6d7be27c --- /dev/null +++ b/python/paddle/v2/framework/tests/test_initializer.py @@ -0,0 +1,120 @@ +import unittest + +import paddle.v2.framework.framework as framework +import paddle.v2.framework.initializer as initializer + +DELTA = 0.00001 + + +class TestConstantInitializer(unittest.TestCase): + def test_constant_initializer_default_value(self): + """Test the constant initializer with default value + """ + program = framework.Program() + block = program.global_block() + block.create_parameter( + dtype="float32", + shape=[5, 10], + lod_level=0, + name="param", + initializer=initializer.ConstantInitializer()) + self.assertEqual(len(block.ops), 1) + init_op = block.ops[0] + self.assertEqual(init_op.type, 'fill_constant') + self.assertAlmostEqual(init_op.attr('value'), 0.0, delta=DELTA) + + def test_constant_initializer(self): + """Test constant initializer with supplied value + """ + program = framework.Program() + block = program.global_block() + block.create_parameter( + dtype="float32", + shape=[5, 10], + lod_level=0, + name="param", + initializer=initializer.ConstantInitializer(2.3)) + self.assertEqual(len(block.ops), 1) + init_op = block.ops[0] + self.assertEqual(init_op.type, 'fill_constant') + self.assertAlmostEqual(init_op.attr('value'), 2.3, delta=DELTA) + + +class TestUniformInitializer(unittest.TestCase): + def test_uniform_initializer_default_value(self): + """Test the uniform initializer with default value + """ + program = framework.Program() + block = program.global_block() + block.create_parameter( + dtype="float32", + shape=[5, 10], + lod_level=0, + name="param", + initializer=initializer.UniformInitializer()) + self.assertEqual(len(block.ops), 1) + init_op = block.ops[0] + self.assertEqual(init_op.type, 'uniform_random') + self.assertAlmostEqual(init_op.attr('min'), -1.0, delta=DELTA) + self.assertAlmostEqual(init_op.attr('max'), 1.0, delta=DELTA) + self.assertEqual(init_op.attr('seed'), 0) + + def test_uniform_initializer(self): + """Test uniform initializer with supplied attributes + """ + program = framework.Program() + block = program.global_block() + block.create_parameter( + dtype="float32", + shape=[5, 10], + lod_level=0, + name="param", + initializer=initializer.UniformInitializer(-4.2, 3.1, 123)) + self.assertEqual(len(block.ops), 1) + init_op = block.ops[0] + self.assertEqual(init_op.type, 'uniform_random') + self.assertAlmostEqual(init_op.attr('min'), -4.2, delta=DELTA) + self.assertAlmostEqual(init_op.attr('max'), 3.1, delta=DELTA) + self.assertEqual(init_op.attr('seed'), 123) + + +class TestNormalInitializer(unittest.TestCase): + def test_normal_initializer_default_value(self): + """Test the normal initializer with default value + """ + program = framework.Program() + block = program.global_block() + block.create_parameter( + dtype="float32", + shape=[5, 10], + lod_level=0, + name="param", + initializer=initializer.NormalInitializer()) + self.assertEqual(len(block.ops), 1) + init_op = block.ops[0] + self.assertEqual(init_op.type, 'gaussian_random') + self.assertAlmostEqual(init_op.attr('mean'), 0.0, delta=DELTA) + self.assertAlmostEqual(init_op.attr('std'), 1.0, delta=DELTA) + self.assertEqual(init_op.attr('seed'), 0) + + def test_normal_initializer(self): + """Test normal initializer with supplied attributes + """ + program = framework.Program() + block = program.global_block() + block.create_parameter( + dtype="float32", + shape=[5, 10], + lod_level=0, + name="param", + initializer=initializer.NormalInitializer(2.3, 1.9, 123)) + self.assertEqual(len(block.ops), 1) + init_op = block.ops[0] + self.assertEqual(init_op.type, 'gaussian_random') + self.assertAlmostEqual(init_op.attr('mean'), 2.3, delta=DELTA) + self.assertAlmostEqual(init_op.attr('std'), 1.9, delta=DELTA) + self.assertEqual(init_op.attr('seed'), 123) + + +if __name__ == '__main__': + unittest.main() diff --git a/python/paddle/v2/framework/tests/test_linear_chain_crf_op.py b/python/paddle/v2/framework/tests/test_linear_chain_crf_op.py new file mode 100644 index 0000000000000000000000000000000000000000..6f06a66c825b37ee91214efc0a29a58f0b9057f9 --- /dev/null +++ b/python/paddle/v2/framework/tests/test_linear_chain_crf_op.py @@ -0,0 +1,142 @@ +import unittest +import random +import numpy as np + +from op_test import OpTest + + +class LinearChainCrfForward(object): + def __init__(self, seq_start_positions, emission_weights, emission_row_max, + emission_exps, transition_weights, transition_exps, labels): + self.tag_num = emission_weights.shape[1] + self.seq_num = len(seq_start_positions) - 1 + + self.seq_start_positions = seq_start_positions + self.labels = labels + self.x = emission_weights + + self.x_row_max = emission_row_max + self.x_exps = emission_exps + + # unnormalized logits of the transition weights for the start mark. + self.a = transition_weights[0, :] + self.a_exps = transition_exps[0, :] + # unnormalized logits of the transition weights for the end mark. + self.b = transition_weights[1, :] + self.b_exps = transition_exps[1, :] + # unnormalized logits of the transition weights for all the other tags. + self.w = transition_weights[2:, :] + self.w_exps = transition_exps[2:, :] + + # The output of linear chain crf operator. + # alpha is a memo table in dynamic programming to caculate + # nomalization factor. + self.alpha = np.zeros( + (seq_start_positions[-1], self.tag_num), dtype="float64") + self.log_likelihood = np.zeros((self.seq_num, 1)) + + def _l1_norm(self, x): + s = np.sum(x) + x /= s + return s + + def _forward_a_sequence(self, x, x_row_max, x_exps, label, alpha): + seq_len = x_row_max.shape[0] + log_likelihood = 0. + + for i in range(self.tag_num): + alpha[0, i] = self.a_exps[i] * x_exps[0, i] + log_likelihood = -x_row_max[0] - np.log(self._l1_norm(alpha[0, :])) + + # calculate the unnormalized logits of the normalization factor. + for k in range(1, seq_len): + for i in range(self.tag_num): + s = 0. + for j in range(self.tag_num): + s += alpha[k - 1, j] * self.w_exps[j, i] + alpha[k, i] = x_exps[k, i] * s + log_likelihood -= x_row_max[k] + np.log(self._l1_norm(alpha[k, :])) + s = 0. + for i in range(self.tag_num): + s += alpha[-1, i] * self.b_exps[i] + log_likelihood -= np.log(s) + + # calculate the nominator part. + log_likelihood += ( + self.a[label[0]] + x[0, label[0]] + self.b[label[-1]]) + + for k in range(1, seq_len): + log_likelihood += (x[k, label[k]] + self.w[label[k - 1], label[k]]) + return -log_likelihood + + def crf_forward_compute(self): + for i in range(self.seq_num): + start = self.seq_start_positions[i] + end = self.seq_start_positions[i + 1] + + self.log_likelihood[i] = self._forward_a_sequence( + self.x[start:end, :], self.x_row_max[start:end, :], + self.x_exps[start:end, :], self.labels[start:end, :], + self.alpha[start:end, :]) + return self.alpha, self.log_likelihood + + +class TestLinearChainCrfOp(OpTest): + def set_test_data(self): + # TODO(caoying) Fix the unittest by: add the boundary cases when + # sequence lengths are 1, 2, and 3. + + SEQ_NUM = 3 + TAG_NUM = 17 + MAX_SEQ_LEN = 5 + + # the linear_chain_crf operator only supports sequence (LoD level = 1) + lod = [[0]] + for i in range(SEQ_NUM): + lod[-1].append(lod[-1][-1] + random.randint(1, MAX_SEQ_LEN)) + emission = np.random.uniform(-1, 1, + [lod[-1][-1], TAG_NUM]).astype("float64") + emission_row_max = np.amax(emission, axis=1, keepdims=True) + emission_exps = np.exp(emission - emission_row_max) + + transition = np.random.uniform(-0.5, 0.5, + [TAG_NUM + 2, TAG_NUM]).astype("float64") + transition_exps = np.exp(transition) + + labels = np.random.randint( + low=0, high=TAG_NUM, size=(lod[-1][-1], 1), dtype="int32") + + self.inputs = { + "Emission": (emission, lod), + "Transition": transition, + "Label": (labels, lod) + } + crf = LinearChainCrfForward(lod[0], emission, emission_row_max, + emission_exps, transition, transition_exps, + labels) + alpha, log_likelihood = crf.crf_forward_compute() + + self.outputs = { + "Alpha": alpha, + "EmissionExps": emission_exps, + "TransitionExps": transition_exps, + "LogLikelihood": log_likelihood + } + + def setUp(self): + self.op_type = "linear_chain_crf" + self.set_test_data() + + def test_check_output(self): + self.check_output() + + def test_check_grad(self): + self.check_grad(["Emission", "Transition"], "LogLikelihood") + + def test_check_grad_ignore_transition(self): + self.check_grad( + ["Emission"], "LogLikelihood", no_grad_set=set("Transition")) + + +if __name__ == "__main__": + unittest.main() diff --git a/python/paddle/v2/framework/tests/test_recognize_digits_mlp.py b/python/paddle/v2/framework/tests/test_recognize_digits_mlp.py index a8a34b2a952c8d374089ab8142b530610b2afe59..9916569d042d5d5077732281487ca3ae4366b39c 100644 --- a/python/paddle/v2/framework/tests/test_recognize_digits_mlp.py +++ b/python/paddle/v2/framework/tests/test_recognize_digits_mlp.py @@ -3,9 +3,10 @@ import paddle.v2.framework.layers as layers import paddle.v2.framework.core as core import paddle.v2.framework.optimizer as optimizer -from paddle.v2.framework.framework import Program, g_program +from paddle.v2.framework.framework import Program from paddle.v2.framework.executor import Executor from paddle.v2.framework.regularizer import L2DecayRegularizer +from paddle.v2.framework.initializer import UniformInitializer import numpy as np @@ -21,11 +22,8 @@ image = layers.data( param_attr = { 'name': None, - 'init_attr': { - 'type': 'uniform_random', - 'min': -1.0, - 'max': 1.0 - }, + 'initializer': UniformInitializer( + low=-1.0, high=1.0), 'regularization': L2DecayRegularizer(0.0005 * BATCH_SIZE) } diff --git a/python/paddle/v2/framework/tests/test_seq_pool.py b/python/paddle/v2/framework/tests/test_seq_pool.py index 56602c57e6b63b71d6b089e774a876ad6164040e..efc4920124afb539017a3b3f211c7320da68ffef 100644 --- a/python/paddle/v2/framework/tests/test_seq_pool.py +++ b/python/paddle/v2/framework/tests/test_seq_pool.py @@ -3,15 +3,6 @@ import numpy as np from op_test import OpTest -class SeqPoolType(OpTest): - AVERAGE = 0 - SUM = 1 - SQRT = 2 - MAX = 3 - LAST = 4 - FIRST = 5 - - class TestSeqAvgPool(OpTest): def set_data(self): self.op_type = 'sequence_pool' @@ -25,7 +16,7 @@ class TestSeqAvgPool(OpTest): return x, lod, out def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.AVERAGE} + self.attrs = {'pooltype': "AVERAGE"} for i in range(4): sub_x = x[lod[0][i]:lod[0][i + 1], :] out[i] = sub_x.mean(axis=0) @@ -54,7 +45,7 @@ class TestSeqAvgPool2D(TestSeqAvgPool): return x, lod, out def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.AVERAGE} + self.attrs = {'pooltype': "AVERAGE"} for i in range(4): sub_x = np.reshape(x[lod[0][i]:lod[0][i + 1], :], (-1, 3 * 17)) out[i] = np.reshape(sub_x.mean(axis=0), (3, 17)) @@ -62,7 +53,7 @@ class TestSeqAvgPool2D(TestSeqAvgPool): class TestSeqSumPool(TestSeqAvgPool): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.SUM} + self.attrs = {'pooltype': "SUM"} for i in range(4): sub_x = x[lod[0][i]:lod[0][i + 1], :] out[i] = sub_x.sum(axis=0) @@ -70,7 +61,7 @@ class TestSeqSumPool(TestSeqAvgPool): class TestSeqSumPool2D(TestSeqAvgPool2D): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.SUM} + self.attrs = {'pooltype': "SUM"} for i in range(4): sub_x = np.reshape(x[lod[0][i]:lod[0][i + 1], :], (-1, 3 * 17)) out[i] = np.reshape(sub_x.sum(axis=0), (3, 17)) @@ -78,7 +69,7 @@ class TestSeqSumPool2D(TestSeqAvgPool2D): class TestSeqSqrtPool(TestSeqAvgPool): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.SQRT} + self.attrs = {'pooltype': "SQRT"} for i in range(4): sub_x = x[lod[0][i]:lod[0][i + 1], :] len = lod[0][i + 1] - lod[0][i] @@ -87,7 +78,7 @@ class TestSeqSqrtPool(TestSeqAvgPool): class TestSeqSqrtPool2D(TestSeqAvgPool2D): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.SQRT} + self.attrs = {'pooltype': "SQRT"} for i in range(4): sub_x = np.reshape(x[lod[0][i]:lod[0][i + 1], :], (-1, 3 * 17)) len = lod[0][i + 1] - lod[0][i] @@ -99,7 +90,7 @@ class TestSeqSqrtPool2D(TestSeqAvgPool2D): class TestSeqMaxPool(TestSeqAvgPool): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.MAX} + self.attrs = {'pooltype': "MAX"} for i in range(4): sub_x = x[lod[0][i]:lod[0][i + 1], :] out[i] = np.amax(sub_x, axis=0) @@ -111,7 +102,7 @@ class TestSeqMaxPool(TestSeqAvgPool): class TestSeqMaxPool2D(TestSeqAvgPool2D): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.MAX} + self.attrs = {'pooltype': "MAX"} for i in range(4): sub_x = np.reshape(x[lod[0][i]:lod[0][i + 1], :], (-1, 3 * 17)) out[i] = np.reshape(np.amax(sub_x, axis=0), (3, 17)) @@ -123,7 +114,7 @@ class TestSeqMaxPool2D(TestSeqAvgPool2D): class TestSeqLastPool(TestSeqAvgPool): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.LAST} + self.attrs = {'pooltype': "LAST"} for i in range(4): sub_x = x[lod[0][i]:lod[0][i + 1], :] out[i] = sub_x[-1, :] @@ -131,7 +122,7 @@ class TestSeqLastPool(TestSeqAvgPool): class TestSeqLastPool2D(TestSeqAvgPool2D): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.LAST} + self.attrs = {'pooltype': "LAST"} for i in range(4): sub_x = np.reshape(x[lod[0][i]:lod[0][i + 1], :], (-1, 3 * 17)) out[i] = np.reshape(sub_x[-1, :], (3, 17)) @@ -139,7 +130,7 @@ class TestSeqLastPool2D(TestSeqAvgPool2D): class TestSeqFirstPool(TestSeqAvgPool): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.FIRST} + self.attrs = {'pooltype': "FIRST"} for i in range(4): sub_x = x[lod[0][i]:lod[0][i + 1], :] out[i] = sub_x[0, :] @@ -147,7 +138,7 @@ class TestSeqFirstPool(TestSeqAvgPool): class TestSeqFirstPool2D(TestSeqAvgPool2D): def compute(self, x, lod, out): - self.attrs = {'strategy': SeqPoolType.FIRST} + self.attrs = {'pooltype': "FIRST"} for i in range(4): sub_x = np.reshape(x[lod[0][i]:lod[0][i + 1], :], (-1, 3 * 17)) out[i] = np.reshape(sub_x[0, :], (3, 17))