diff --git a/README.md b/README.md index 32a302cc5431a62b310d4812b545bd929f090e0a..68421cf177f4cd15f8f44e8d00a27cafb5a13b91 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,6 @@ # PaddlePaddle +English | [简体中文](./README_cn.md) [![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle) [![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html) @@ -7,7 +8,6 @@ [![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases) [![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE) - Welcome to the PaddlePaddle GitHub. PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use, @@ -18,16 +18,6 @@ learning to many products at Baidu. Our vision is to enable deep learning for everyone via PaddlePaddle. Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest feature of PaddlePaddle. - -欢迎来到 PaddlePaddle GitHub - -PaddlePaddle (PArallel Distributed Deep LEarning) 是一个简单易用、高效灵活、可扩展的深度学习平台,最初由百度科学家和工程师共同开发,目的是将深度学习技术应用到百度的众多产品中。 - -我们的愿景是让每个人都能通过PaddlePaddle接触深度学习 - -跟进PaddlePaddle最新特性请参考我们的[版本说明](https://github.com/PaddlePaddle/Paddle/releases) - - ### Latest PaddlePaddle Release: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2) ### Install Latest Stable Release: ``` @@ -43,23 +33,6 @@ pip install paddlepaddle-gpu==1.2.0.post85 # For installation on other platform, refer to http://paddlepaddle.org/ ``` - -### PaddlePaddle最新版本: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2) -### 安装最新稳定版本: -``` -# Linux CPU -pip install paddlepaddle -# Linux GPU cuda9cudnn7 -pip install paddlepaddle-gpu -# Linux GPU cuda8cudnn7 -pip install paddlepaddle-gpu==1.2.0.post87 -# Linux GPU cuda8cudnn5 -pip install paddlepaddle-gpu==1.2.0.post85 - -# 其他平台上的安装指引请参考 http://paddlepaddle.org/ -``` - - ## Features - **Flexibility** @@ -100,38 +73,10 @@ pip install paddlepaddle-gpu==1.2.0.post85 Baidu and it has achieved a significant impact. We hope you can also explore the capability of PaddlePaddle to make an impact on your product. -## 特点 - -- **灵活性** - - PaddlePaddle支持丰富的神经网络架构和优化算法。易于配置复杂模型,例如带有注意力机制或复杂记忆连接的神经网络机器翻译模型。 - -- **高效性** - - 为了高效使用异步计算资源,PaddlePaddle对框架的不同层进行优化,包括计算、存储、架构和通信。下面是一些样例: - - - 通过SSE/AVX 内置函数、BLAS库(例如MKL、OpenBLAS、cuBLAS)或定制的CPU/GPU内核优化数学操作。 - - 通过MKL-DNN库优化CNN网络 - - 高度优化循环网络,无需执行 `padding` 操作即可处理 **变长** 序列 - - 针对高维稀疏数据模型,优化了局部和分布式训练。 - - -- **稳定性** - - 有了 PaddlePaddle,使得利用各种CPU/GPU和机器来加速训练变得简单。PaddlePaddle 通过优化通信可以实现巨大吞吐量和快速执行。 - -- **连接产品** - - 另外,PaddlePaddle 的设计也易于部署。在百度,PaddlePaddle 已经部署到含有巨大用户量的产品和服务上,包括广告点击率(CTR)预测、大规模图像分类、光学字符识别(OCR)、搜索排序,计算机病毒检测、推荐系统等等。PaddlePaddle广泛应用于百度产品中,产生了非常重要的影响。我们希望您也能探索 PaddlePaddle 的能力,为您的产品创造新的影响力和效果。 - ## Installation It is recommended to read [this doc](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html) on our website. -## 安装 - -推荐阅读官网上的[安装说明](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html) - ## Documentation We provide [English](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html) and @@ -153,37 +98,9 @@ We provide [English](http://paddlepaddle.org/documentation/docs/en/1.2/getstarte We appreciate your contributions! 
-## 文档 - -我们提供[英文](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)和 -[中文](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html) 文档 - -- [深度学习101](https://github.com/PaddlePaddle/book) - - 或许您想从这个在线交互式书籍开始,可以在Jupyter Notebook中运行 - -- [分布式训练](http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/training/cluster_howto.html) - - 可以在MPI集群上运行分布式训练任务 - -- [Python API](http://paddlepaddle.org/documentation/docs/zh/1.2/api_cn/index_cn.html) - - 新的API支持代码更少更简洁的程序 - -- [贡献方式](http://paddlepaddle.org/documentation/docs/zh/1.2/advanced_usage/development/contribute_to_paddle/index_cn.html) - - 欢迎您的贡献! - ## Ask Questions You are welcome to submit questions and bug reports as [Github Issues](https://github.com/PaddlePaddle/Paddle/issues). -## 答疑 - -欢迎您将问题和bug报告以[Github Issues](https://github.com/PaddlePaddle/Paddle/issues)的形式提交 - ## Copyright and License PaddlePaddle is provided under the [Apache-2.0 license](LICENSE). - -## 版权和许可证 -PaddlePaddle由[Apache-2.0 license](LICENSE)提供 diff --git a/README_cn.md b/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..dfb55b17ca4fd05ce5b7b85b2e26e4f7f7229763 --- /dev/null +++ b/README_cn.md @@ -0,0 +1,88 @@ +# PaddlePaddle + +[English](./README.md) | 简体中文 + +[![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle) +[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html) +[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html) +[![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases) +[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE) + +欢迎来到 PaddlePaddle GitHub + +PaddlePaddle (PArallel Distributed Deep LEarning) 是一个简单易用、高效灵活、可扩展的深度学习平台,最初由百度科学家和工程师共同开发,目的是将深度学习技术应用到百度的众多产品中。 + +我们的愿景是让每个人都能通过PaddlePaddle接触深度学习 + +跟进PaddlePaddle最新特性请参考我们的[版本说明](https://github.com/PaddlePaddle/Paddle/releases) + +### PaddlePaddle最新版本: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2) +### 安装最新稳定版本: +``` +# Linux CPU +pip install paddlepaddle +# Linux GPU cuda9cudnn7 +pip install paddlepaddle-gpu +# Linux GPU cuda8cudnn7 +pip install paddlepaddle-gpu==1.2.0.post87 +# Linux GPU cuda8cudnn5 +pip install paddlepaddle-gpu==1.2.0.post85 + +# 其他平台上的安装指引请参考 http://paddlepaddle.org/ +``` + +## 特性 + +- **灵活性** + + PaddlePaddle支持丰富的神经网络架构和优化算法。易于配置复杂模型,例如带有注意力机制或复杂记忆连接的神经网络机器翻译模型。 + +- **高效性** + + 为了高效使用异步计算资源,PaddlePaddle对框架的不同层进行优化,包括计算、存储、架构和通信。下面是一些样例: + + - 通过SSE/AVX 内置函数、BLAS库(例如MKL、OpenBLAS、cuBLAS)或定制的CPU/GPU内核优化数学操作。 + - 通过MKL-DNN库优化CNN网络 + - 高度优化循环网络,无需执行 `padding` 操作即可处理 **变长** 序列 + - 针对高维稀疏数据模型,优化了局部和分布式训练。 + + +- **稳定性** + + 有了 PaddlePaddle,使得利用各种CPU/GPU和机器来加速训练变得简单。PaddlePaddle 通过优化通信可以实现巨大吞吐量和快速执行。 + +- **与产品相连** + + 另外,PaddlePaddle 的设计也易于部署。在百度,PaddlePaddle 已经部署到含有巨大用户量的产品和服务上,包括广告点击率(CTR)预测、大规模图像分类、光学字符识别(OCR)、搜索排序,计算机病毒检测、推荐系统等等。PaddlePaddle广泛应用于百度产品中,产生了非常重要的影响。我们希望您也能探索 PaddlePaddle 的能力,为您的产品创造新的影响力和效果。 + +## 安装 + +推荐阅读官网上的[安装说明](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html) + +## 文档 + +我们提供[英文](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)和 
+[中文](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html) 文档 + +- [深度学习101](https://github.com/PaddlePaddle/book) + + 或许您想从这个在线交互式书籍开始,可以在Jupyter Notebook中运行 + +- [分布式训练](http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/training/cluster_howto.html) + + 可以在MPI集群上运行分布式训练任务 + +- [Python API](http://paddlepaddle.org/documentation/docs/zh/1.2/api_cn/index_cn.html) + + 新的API支持代码更少更简洁的程序 + +- [贡献方式](http://paddlepaddle.org/documentation/docs/zh/1.2/advanced_usage/development/contribute_to_paddle/index_cn.html) + + 欢迎您的贡献! + +## 答疑 + +欢迎您将问题和bug报告以[Github Issues](https://github.com/PaddlePaddle/Paddle/issues)的形式提交 + +## 版权和许可证 +PaddlePaddle由[Apache-2.0 license](LICENSE)提供 diff --git a/paddle/fluid/API.spec b/paddle/fluid/API.spec index fe8d6dd42595dfc9dc4bf17e336df801b582703e..f50a38842a21c795c979f859e88a9b16c3e54bd8 100644 --- a/paddle/fluid/API.spec +++ b/paddle/fluid/API.spec @@ -324,7 +324,8 @@ paddle.fluid.layers.generate_mask_labels ArgSpec(args=['im_info', 'gt_classes', paddle.fluid.layers.iou_similarity ArgSpec(args=['x', 'y', 'name'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.layers.box_coder ArgSpec(args=['prior_box', 'prior_box_var', 'target_box', 'code_type', 'box_normalized', 'name', 'axis'], varargs=None, keywords=None, defaults=('encode_center_size', True, None, 0)) paddle.fluid.layers.polygon_box_transform ArgSpec(args=['input', 'name'], varargs=None, keywords=None, defaults=(None,)) -paddle.fluid.layers.yolov3_loss ArgSpec(args=['x', 'gtbox', 'gtlabel', 'anchors', 'class_num', 'ignore_thresh', 'loss_weight_xy', 'loss_weight_wh', 'loss_weight_conf_target', 'loss_weight_conf_notarget', 'loss_weight_class', 'name'], varargs=None, keywords=None, defaults=(None, None, None, None, None, None)) +paddle.fluid.layers.yolov3_loss ArgSpec(args=['x', 'gtbox', 'gtlabel', 'anchors', 'anchor_mask', 'class_num', 'ignore_thresh', 'downsample_ratio', 'name'], varargs=None, keywords=None, defaults=(None,)) +paddle.fluid.layers.box_clip ArgSpec(args=['input', 'im_info', 'name'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.layers.multiclass_nms ArgSpec(args=['bboxes', 'scores', 'score_threshold', 'nms_top_k', 'keep_top_k', 'nms_threshold', 'normalized', 'nms_eta', 'background_label', 'name'], varargs=None, keywords=None, defaults=(0.3, True, 1.0, 0, None)) paddle.fluid.layers.accuracy ArgSpec(args=['input', 'label', 'k', 'correct', 'total'], varargs=None, keywords=None, defaults=(1, None, None)) paddle.fluid.layers.auc ArgSpec(args=['input', 'label', 'curve', 'num_thresholds', 'topk', 'slide_steps'], varargs=None, keywords=None, defaults=('ROC', 4095, 1, 1)) @@ -361,6 +362,9 @@ paddle.fluid.contrib.QuantizeTranspiler.__init__ ArgSpec(args=['self', 'weight_b paddle.fluid.contrib.QuantizeTranspiler.convert_to_int8 ArgSpec(args=['self', 'program', 'place', 'scope'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.contrib.QuantizeTranspiler.freeze_program ArgSpec(args=['self', 'program', 'place', 'fuse_bn', 'scope'], varargs=None, keywords=None, defaults=(False, None)) paddle.fluid.contrib.QuantizeTranspiler.training_transpile ArgSpec(args=['self', 'program', 'startup_program'], varargs=None, keywords=None, defaults=(None, None)) +paddle.fluid.contrib.Calibrator.__init__ ArgSpec(args=['self'], varargs='args', keywords='kwargs', defaults=None) +paddle.fluid.contrib.Calibrator.sample_data ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None) +paddle.fluid.contrib.Calibrator.save_int8_model 
ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None) paddle.fluid.contrib.reader.ctr_reader.ctr_reader ArgSpec(args=['feed_dict', 'file_type', 'file_format', 'dense_slot_index', 'sparse_slot_index', 'capacity', 'thread_num', 'batch_size', 'file_list', 'slots', 'name'], varargs=None, keywords=None, defaults=(None,)) paddle.fluid.contrib.build_compressor ArgSpec(args=['place', 'data_reader', 'data_feeder', 'scope', 'metrics', 'epoch', 'config'], varargs=None, keywords=None, defaults=(None, None, None, None, None, None, None)) paddle.fluid.contrib.CompressPass.__init__ ArgSpec(args=['self', 'place', 'data_reader', 'data_feeder', 'scope', 'metrics', 'epoch', 'program_exe'], varargs=None, keywords=None, defaults=(None, None, None, None, None, None, None)) diff --git a/paddle/fluid/framework/ir/CMakeLists.txt b/paddle/fluid/framework/ir/CMakeLists.txt index 914bcce7755bcf0651da29dd669f5d6d14e081d2..07c2c970d4de3cecf03e4cf80e60e81e7a9595a8 100644 --- a/paddle/fluid/framework/ir/CMakeLists.txt +++ b/paddle/fluid/framework/ir/CMakeLists.txt @@ -65,6 +65,7 @@ pass_library(conv_elementwise_add2_act_fuse_pass inference) pass_library(conv_elementwise_add_fuse_pass inference) pass_library(conv_affine_channel_fuse_pass inference) pass_library(transpose_flatten_concat_fuse_pass inference) +pass_library(identity_scale_op_clean_pass base) # There may be many transpose-flatten structures in a model, and the output of # these structures will be used as inputs to the concat Op. This pattern will diff --git a/paddle/fluid/framework/ir/graph_pattern_detector.cc b/paddle/fluid/framework/ir/graph_pattern_detector.cc index 6282ced1e47329915bb3626b410e55ad8251071d..9ea0729e1f3339c2f17371ecc8fa51325b9629bb 100644 --- a/paddle/fluid/framework/ir/graph_pattern_detector.cc +++ b/paddle/fluid/framework/ir/graph_pattern_detector.cc @@ -117,11 +117,6 @@ bool GraphPatternDetector::MarkPDNodesInGraph(const ir::Graph &graph) { // return false; } } - for (auto &item : pdnodes2nodes_) { - for (auto &n : item.second) { - GetMarkedNodes(const_cast(&graph)).insert(n); - } - } VLOG(3) << pdnodes2nodes_.size() << " nodes marked"; return !pdnodes2nodes_.empty(); diff --git a/paddle/fluid/framework/ir/identity_scale_op_clean_pass.cc b/paddle/fluid/framework/ir/identity_scale_op_clean_pass.cc new file mode 100644 index 0000000000000000000000000000000000000000..3b738aa159ebfd77f00c9e532fbd94542e2097db --- /dev/null +++ b/paddle/fluid/framework/ir/identity_scale_op_clean_pass.cc @@ -0,0 +1,80 @@ +// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +#include "paddle/fluid/framework/ir/identity_scale_op_clean_pass.h" +#include +#include "paddle/fluid/framework/ir/graph_pattern_detector.h" + +namespace paddle { +namespace framework { +namespace ir { + +std::unique_ptr IdentityScaleOpCleanPass::ApplyImpl( + std::unique_ptr graph) const { + FusePassBase::Init("identity_scale_op_clean", graph.get()); + + // pre_op -> scale_in -> scale_op -> scale_out + // -> + // pre_op -> scale_out + GraphPatternDetector detector; + auto pre_op = detector.mutable_pattern()->NewNode("pre_op")->assert_is_op(); + auto scale_in = detector.mutable_pattern() + ->NewNode("scale_in") + ->assert_is_op_input("scale") + ->AsIntermediate(); + auto scale_op = detector.mutable_pattern() + ->NewNode("scale_fuse") + ->assert_is_op("scale") + ->assert_op_attr("scale", 1.) + ->assert_op_attr("bias", 0.); + auto scale_out = detector.mutable_pattern() + ->NewNode("scale_out") + ->assert_is_op_output("scale"); + + pre_op->LinksTo({scale_in}); + scale_op->LinksFrom({scale_in}).LinksTo({scale_out}); + + GraphPatternDetector::handle_t handler = [&]( + const GraphPatternDetector::subgraph_t& subgraph, Graph* graph) { + Node* scale_op_var = subgraph.at(scale_op); + Node* scale_in_var = subgraph.at(scale_in); + Node* scale_out_var = subgraph.at(scale_out); + Node* pre_op_var = subgraph.at(pre_op); + // Link pre_op directly to scale_out + const std::string scale_in_name = scale_in_var->Name(); + const std::string scale_out_name = scale_out_var->Name(); + // Remove links in graph + GraphSafeRemoveNodes(graph, {scale_in_var, scale_op_var}); + // Modify proto message + auto* pre_op_desc = pre_op_var->Op(); + for (auto& parameter : *pre_op_desc->Proto()->mutable_outputs()) { + auto* arguments = parameter.mutable_arguments(); + auto it = std::find(arguments->begin(), arguments->end(), scale_in_name); + PADDLE_ENFORCE(it != arguments->end()); + *it = scale_out_name; + } + + IR_NODE_LINK_TO(pre_op_var, scale_out_var); + }; + + detector(graph.get(), handler); + return graph; +} + +} // namespace ir +} // namespace framework +} // namespace paddle + +REGISTER_PASS(identity_scale_op_clean_pass, + paddle::framework::ir::IdentityScaleOpCleanPass); diff --git a/paddle/fluid/framework/ir/identity_scale_op_clean_pass.h b/paddle/fluid/framework/ir/identity_scale_op_clean_pass.h new file mode 100644 index 0000000000000000000000000000000000000000..50a654d82f0e4fb7e8e91c665397716407e6d2a5 --- /dev/null +++ b/paddle/fluid/framework/ir/identity_scale_op_clean_pass.h @@ -0,0 +1,33 @@ +// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +#pragma once + +#include "paddle/fluid/framework/ir/fuse_pass_base.h" + +namespace paddle { +namespace framework { +namespace ir { + +class IdentityScaleOpCleanPass : public FusePassBase { + protected: + std::unique_ptr ApplyImpl(std::unique_ptr graph) const; + + private: + virtual ~IdentityScaleOpCleanPass() = default; +}; + +} // namespace ir +} // namespace framework +} // namespace paddle diff --git a/paddle/fluid/framework/scope.cc b/paddle/fluid/framework/scope.cc index 953618560913229cd1e47659ad61e621efc10ed1..87f0f307d30bc90a43a698c3766b16c975f0635e 100644 --- a/paddle/fluid/framework/scope.cc +++ b/paddle/fluid/framework/scope.cc @@ -22,11 +22,7 @@ limitations under the License. */ #include "paddle/fluid/framework/threadpool.h" #include "paddle/fluid/string/printf.h" -DEFINE_bool(benchmark, false, - "Doing memory benchmark. It will make deleting scope synchronized, " - "and add some memory usage logs." - "Default cuda is asynchronous device, set to True will" - "force op run in synchronous mode."); +DECLARE_bool(benchmark); DEFINE_bool( eager_delete_scope, true, diff --git a/paddle/fluid/inference/analysis/ir_pass_manager.cc b/paddle/fluid/inference/analysis/ir_pass_manager.cc index fe3c841186c35ea28c1d44007d91de5b997c1388..7476c199cfd073ec0962fa9a48f24750a6484bb5 100644 --- a/paddle/fluid/inference/analysis/ir_pass_manager.cc +++ b/paddle/fluid/inference/analysis/ir_pass_manager.cc @@ -83,7 +83,6 @@ void IRPassManager::CreatePasses(Argument *argument, new std::string(GetOrCreateModelOptCacheDir(model_opt_cache_dir))); } - // graph_ = pass->Apply(std::move(graph_)); pre_pass = pass_name; passes_.emplace_back(std::move(pass)); @@ -97,8 +96,9 @@ std::unique_ptr IRPassManager::Apply(std::unique_ptr graph) { PADDLE_ENFORCE(graph.get()); // Apply all the passes for (const auto &pass : passes_) { - if (pass->Type() == "graph_viz_pass") continue; - PrettyLogEndl(Style::H2(), "--- Running IR pass [%s]", pass->Type()); + if (pass->Type() != "graph_viz_pass") { + PrettyLogEndl(Style::H2(), "--- Running IR pass [%s]", pass->Type()); + } graph = pass->Apply(std::move(graph)); } return std::move(graph); diff --git a/paddle/fluid/inference/api/analysis_config.cc b/paddle/fluid/inference/api/analysis_config.cc index eecab238a88e90399eb70f17caa57633af4e2a69..e92273b4dd94f11e0e90c91fd82dafe42bf158f3 100644 --- a/paddle/fluid/inference/api/analysis_config.cc +++ b/paddle/fluid/inference/api/analysis_config.cc @@ -318,4 +318,9 @@ NativeConfig AnalysisConfig::ToNativeConfig() const { return config; } +void AnalysisConfig::SwitchIrDebug(int x) { + ir_debug_ = x; + Update(); +} + } // namespace paddle diff --git a/paddle/fluid/inference/api/analysis_predictor.cc b/paddle/fluid/inference/api/analysis_predictor.cc index 14d6ba8c56dc3fe04e27bccadd5a5155547398a4..da2e9803f0467f2b83d79cdd06d4317d41630b04 100644 --- a/paddle/fluid/inference/api/analysis_predictor.cc +++ b/paddle/fluid/inference/api/analysis_predictor.cc @@ -58,7 +58,8 @@ namespace { bool IsPersistable(const framework::VarDesc *var) { if (var->Persistable() && var->GetType() != framework::proto::VarType::FEED_MINIBATCH && - var->GetType() != framework::proto::VarType::FETCH_LIST) { + var->GetType() != framework::proto::VarType::FETCH_LIST && + var->GetType() != framework::proto::VarType::RAW) { return true; } return false; diff --git a/paddle/fluid/inference/api/analysis_predictor_tester.cc b/paddle/fluid/inference/api/analysis_predictor_tester.cc index 6d11b461082d0ed8ba08c9e280bba86737b86e71..002ba90e40e69d565f5a54e374a3f0083b84273f 
100644 --- a/paddle/fluid/inference/api/analysis_predictor_tester.cc +++ b/paddle/fluid/inference/api/analysis_predictor_tester.cc @@ -196,7 +196,7 @@ TEST(AnalysisPredictor, memory_optim) { AnalysisConfig config(FLAGS_dirname); config.DisableGpu(); config.EnableMemoryOptim(true); - config.pass_builder()->TurnOnDebug(); + config.SwitchIrDebug(); auto native_predictor = CreatePaddlePredictor(config.ToNativeConfig()); diff --git a/paddle/fluid/inference/api/paddle_analysis_config.h b/paddle/fluid/inference/api/paddle_analysis_config.h index 9d9ed6a39d8324002a8850deae9bb8dd5af7ef9b..47361b3279e14dd65a0e6e7f864e508ef1183045 100644 --- a/paddle/fluid/inference/api/paddle_analysis_config.h +++ b/paddle/fluid/inference/api/paddle_analysis_config.h @@ -140,9 +140,12 @@ struct AnalysisConfig { */ bool tensorrt_engine_enabled() const { return use_tensorrt_; } - /** Control whther to debug IR graph analysis phase. + /** \brief Control whether to debug IR graph analysis phase. + * + * This will generate DOT files for visualizing the computation graph after + * each analysis pass applied. */ - void SwitchIrDebug(int x = true) { ir_debug_ = x; } + void SwitchIrDebug(int x = true); /** Turn on MKLDNN. */ diff --git a/paddle/fluid/inference/api/paddle_pass_builder.h b/paddle/fluid/inference/api/paddle_pass_builder.h index 391932a1ee018c45818457c55fd8f82a22ab7405..aa353f12ca7333713e2d640cce6b2dfbea3c4e26 100644 --- a/paddle/fluid/inference/api/paddle_pass_builder.h +++ b/paddle/fluid/inference/api/paddle_pass_builder.h @@ -117,6 +117,7 @@ class CpuPassStrategy : public PassStrategy { "conv_bn_fuse_pass", // "conv_eltwiseadd_bn_fuse_pass", // "is_test_pass", // + "identity_scale_op_clean_pass", // }); use_gpu_ = false; } @@ -155,6 +156,7 @@ class GpuPassStrategy : public PassStrategy { GpuPassStrategy() : PassStrategy({}) { passes_.assign({ "infer_clean_graph_pass", // + "identity_scale_op_clean_pass", // "conv_affine_channel_fuse_pass", // "conv_eltwiseadd_affine_channel_fuse_pass", // "conv_bn_fuse_pass", // diff --git a/paddle/fluid/inference/tests/api/CMakeLists.txt b/paddle/fluid/inference/tests/api/CMakeLists.txt index aa3da397ff67dd06dd750d336a49056baedaaab6..7ecd9e35332843e3a391cdad5ce32220d890abd1 100644 --- a/paddle/fluid/inference/tests/api/CMakeLists.txt +++ b/paddle/fluid/inference/tests/api/CMakeLists.txt @@ -128,9 +128,9 @@ inference_analysis_api_test_with_fake_data(test_analyzer_resnet50 inference_analysis_api_test_with_fake_data(test_analyzer_mobilenet_depthwise_conv "${INFERENCE_DEMO_INSTALL_DIR}/mobilenet_depthwise_conv" analyzer_resnet50_tester.cc "mobilenet_model.tar.gz" SERIAL) -# bert, max_len=20 -set(BERT_INSTALL_DIR "${INFERENCE_DEMO_INSTALL_DIR}/bert20") -download_model_and_data(${BERT_INSTALL_DIR} "bert_model.tar.gz" "bert_data_len20.txt.tar.gz") +# bert, max_len=20, embedding_dim=128 +set(BERT_INSTALL_DIR "${INFERENCE_DEMO_INSTALL_DIR}/bert_emb128") +download_model_and_data(${BERT_INSTALL_DIR} "bert_emb128_model.tar.gz" "bert_data_len20.txt.tar.gz") inference_analysis_api_test(test_analyzer_bert ${BERT_INSTALL_DIR} analyzer_bert_tester.cc SERIAL) # anakin diff --git a/paddle/fluid/inference/tests/api/analyzer_seq_pool1_tester.cc b/paddle/fluid/inference/tests/api/analyzer_seq_pool1_tester.cc index 8be2a6d79b2ede2c149aa523e38c3960ab30acb1..dd953e0dccbb3749bfcc87966453c6976dfefa10 100644 --- a/paddle/fluid/inference/tests/api/analyzer_seq_pool1_tester.cc +++ b/paddle/fluid/inference/tests/api/analyzer_seq_pool1_tester.cc @@ -142,7 +142,7 @@ void SetConfig(AnalysisConfig *cfg, bool 
use_mkldnn = false) { cfg->SetModel(FLAGS_infer_model + "/model", FLAGS_infer_model + "/params"); cfg->DisableGpu(); cfg->SwitchSpecifyInputNames(); - cfg->pass_builder()->TurnOnDebug(); + cfg->SwitchIrDebug(); cfg->SetCpuMathLibraryNumThreads(FLAGS_paddle_num_threads); if (use_mkldnn) { cfg->EnableMKLDNN(); diff --git a/paddle/fluid/inference/tests/api/analyzer_text_classification_tester.cc b/paddle/fluid/inference/tests/api/analyzer_text_classification_tester.cc index 2db297e2005c6b657259187d6b6b76657d9e4388..2003be82019333ca97b9fa8ef83668825fe5710d 100644 --- a/paddle/fluid/inference/tests/api/analyzer_text_classification_tester.cc +++ b/paddle/fluid/inference/tests/api/analyzer_text_classification_tester.cc @@ -69,7 +69,7 @@ void SetInput(std::vector> *inputs) { TEST(Analyzer_Text_Classification, profile) { AnalysisConfig cfg; SetConfig(&cfg); - cfg.pass_builder()->TurnOnDebug(); + cfg.SwitchIrDebug(); std::vector outputs; std::vector> input_slots_all; diff --git a/paddle/fluid/inference/utils/CMakeLists.txt b/paddle/fluid/inference/utils/CMakeLists.txt index a7b239731b9a2e876c16d9ff84dfb8ac3df7b82e..c43eaf7f9849ee4a88ed95bdb8b6966da8760435 100644 --- a/paddle/fluid/inference/utils/CMakeLists.txt +++ b/paddle/fluid/inference/utils/CMakeLists.txt @@ -1,4 +1,4 @@ cc_library(benchmark SRCS benchmark.cc DEPS enforce) cc_test(test_benchmark SRCS benchmark_tester.cc DEPS benchmark) -#cc_binary(visualizer SRCS visualizer.cc DEPS analysis -# paddle_pass_builder ir_pass_manager pass graph_viz_pass analysis_passes) +cc_binary(visualizer SRCS visualizer.cc DEPS analysis + paddle_pass_builder ir_pass_manager pass graph_viz_pass analysis_passes) diff --git a/paddle/fluid/memory/allocation/legacy_allocator.cc b/paddle/fluid/memory/allocation/legacy_allocator.cc index 8759ec8096cf102ab85d2c2a91eddc23a6ed0e50..ef62f758e37f28ab826faac84fd1276b14de7980 100644 --- a/paddle/fluid/memory/allocation/legacy_allocator.cc +++ b/paddle/fluid/memory/allocation/legacy_allocator.cc @@ -35,6 +35,7 @@ DEFINE_bool(init_allocated_mem, false, "To find this error in time, we use init_allocated_mem to indicate " "that initializing the allocated memory with a small value " "during unit testing."); +DECLARE_bool(benchmark); DECLARE_double(fraction_of_gpu_memory_to_use); namespace paddle { @@ -59,11 +60,6 @@ size_t memory_usage(const platform::Place &p); using BuddyAllocator = detail::BuddyAllocator; -std::unordered_map> - gpu_mem_info; - BuddyAllocator *GetCPUBuddyAllocator() { // We tried thread_local for inference::RNN1 model, but that not works much // for multi-thread test. 
@@ -144,6 +140,8 @@ BuddyAllocator *GetGPUBuddyAllocator(int gpu_id) { devices = platform::GetSelectedDevices(); int gpu_num = devices.size(); + allocation::GPUMemMonitor.Initialize(devices.size()); + a_arr = new BuddyAllocator *[gpu_num]; for (size_t i = 0; i < devices.size(); ++i) { int dev_id = devices[i]; @@ -204,12 +202,7 @@ void *Alloc(const platform::CUDAPlace &place, << string::HumanReadableSize(Used(place)); platform::SetDeviceId(cur_dev); } else { - gpu_mem_info[place.device].first += size; - if (gpu_mem_info[place.device].first > gpu_mem_info[place.device].second) { - gpu_mem_info[place.device].second = gpu_mem_info[place.device].first; - VLOG(3) << "device: " << place.device << " peak memory usage : " - << (gpu_mem_info[place.device].second >> 20) << " MiB"; - } + if (FLAGS_benchmark) allocation::GPUMemMonitor.Add(place.device, size); if (FLAGS_init_allocated_mem) { cudaMemset(ptr, 0xEF, size); } @@ -225,7 +218,7 @@ void Free(const platform::CUDAPlace &place, void *p, size_t size) { #ifdef PADDLE_WITH_CUDA GetGPUBuddyAllocator(place.device)->Free(p); - gpu_mem_info[place.device].first -= size; + if (FLAGS_benchmark) allocation::GPUMemMonitor.Minus(place.device, size); #else PADDLE_THROW("'CUDAPlace' is not supported in CPU only device."); #endif @@ -335,6 +328,8 @@ size_t Usage::operator()(const platform::CUDAPinnedPlace &cuda_pinned) const { namespace allocation { +LegacyMemMonitor GPUMemMonitor; + Allocation *LegacyAllocator::AllocateImpl(size_t size, Allocator::Attr attr) { void *ptr = boost::apply_visitor(legacy::AllocVisitor(size), place_); return new Allocation(ptr, size, place_); @@ -346,6 +341,63 @@ void LegacyAllocator::Free(Allocation *allocation) { allocation->place()); delete allocation; } + +bool MemInfo::Add(const size_t &size) { + std::lock_guard lock(mutex_); + usage_ += size; + bool peak_point = usage_ > peak_usage_; + if (peak_point) peak_usage_ = usage_; + return peak_point; +} + +void MemInfo::Minus(const size_t &size) { + std::lock_guard lock(mutex_); + usage_ -= size; +} + +uint64_t MemInfo::GetPeakUsage() { return peak_usage_; } + +LegacyMemMonitor::~LegacyMemMonitor() { + for (auto &item : gpu_mem_info_) delete item.second; +} + +void LegacyMemMonitor::Initialize(const int &device_num) { + for (auto i = 0; i < device_num; ++i) { + gpu_mem_info_[i] = new MemInfo(); + } +} + +void LegacyMemMonitor::Add(const int &device, const size_t &size) { + if (gpu_mem_info_[device]->Add(size)) { + VLOG(3) << "#LegacyMemMonitor# device: " << device + << " peak memory usage : " + << (gpu_mem_info_[device]->GetPeakUsage() >> 20) << " MiB"; + } +} + +void LegacyMemMonitor::Minus(const int &device, const size_t &size) { + gpu_mem_info_[device]->Minus(size); +} + +uint64_t LegacyMemMonitor::GetMemUsage(const int &device) { + return gpu_mem_info_.find(device) == gpu_mem_info_.end() + ? 
0 + : gpu_mem_info_[device]->GetPeakUsage(); +} + +void LegacyMemMonitor::PrintMemUsage() { + std::vector devices; + for (const auto &item : gpu_mem_info_) { + devices.emplace_back(item.first); + } + std::sort(devices.begin(), devices.end()); + for (const auto &device : devices) { + std::cout << "Device : " << device << " Peak Memory Usage : " + << (gpu_mem_info_[device]->GetPeakUsage() >> 20) << " MiB" + << std::endl; + } +} + } // namespace allocation } // namespace memory } // namespace paddle diff --git a/paddle/fluid/memory/allocation/legacy_allocator.h b/paddle/fluid/memory/allocation/legacy_allocator.h index 503a7a685cb9d8dbbbbd6c23b5b82c383893e3d8..ccbc8c70d8e9a16e7edb1be54bf80bec3b368eca 100644 --- a/paddle/fluid/memory/allocation/legacy_allocator.h +++ b/paddle/fluid/memory/allocation/legacy_allocator.h @@ -13,12 +13,59 @@ // limitations under the License. #pragma once +#include +#include // NOLINT +#include +#include +#include #include "paddle/fluid/memory/allocation/allocator.h" #include "paddle/fluid/platform/place.h" namespace paddle { namespace memory { namespace allocation { +class MemInfo { + public: + MemInfo() : usage_(0), peak_usage_(0) {} + MemInfo(const MemInfo &) = delete; + MemInfo &operator=(const MemInfo &) = delete; + + // return a flag to indicate current operation will create a peak point or not + bool Add(const size_t &); + void Minus(const size_t &); + + uint64_t GetPeakUsage(); + + private: + /* current memory usage*/ + uint64_t usage_; + uint64_t peak_usage_; + std::mutex mutex_; +}; + +class LegacyMemMonitor { + public: + // used to store the GPU memory usage of each devices + using MemUsage = std::unordered_map; + + MemUsage GetMemUsageInfo() { return gpu_mem_info_; } + ~LegacyMemMonitor(); + + void Initialize(const int &); + void Add(const int &, const size_t &); + void Minus(const int &, const size_t &); + + uint64_t GetMemUsage(const int &); + + void PrintMemUsage(); + + protected: + MemUsage gpu_mem_info_; +}; + +extern LegacyMemMonitor GPUMemMonitor; + class LegacyAllocatorPrivate; class LegacyAllocator : public Allocator { public: diff --git a/paddle/fluid/operators/batch_norm_op.cc b/paddle/fluid/operators/batch_norm_op.cc index 8b672e09b2c5c203c1a1447fbbd14a45ef7ba257..0736bd4d20eb60df4a1cb23aeec92dbe7f7495bd 100644 --- a/paddle/fluid/operators/batch_norm_op.cc +++ b/paddle/fluid/operators/batch_norm_op.cc @@ -589,8 +589,10 @@ class BatchNormGradMaker : public framework::SingleGradOpDescMaker { op->SetInput("SavedVariance", Output("SavedVariance")); // used when setting use_global_stats True during training - op->SetInput("Mean", Output("MeanOut")); - op->SetInput("Variance", Output("VarianceOut")); + if (boost::get(GetAttr("use_global_stats"))) { + op->SetInput("Mean", Output("MeanOut")); + op->SetInput("Variance", Output("VarianceOut")); + } op->SetAttrMap(Attrs()); diff --git a/paddle/fluid/operators/detection/CMakeLists.txt b/paddle/fluid/operators/detection/CMakeLists.txt index d3a61dc367c3642b8faa9085a470a302712395e5..f6fbe97565c43c306ea885c765c0a665492fa317 100644 --- a/paddle/fluid/operators/detection/CMakeLists.txt +++ b/paddle/fluid/operators/detection/CMakeLists.txt @@ -31,6 +31,8 @@ detection_library(polygon_box_transform_op SRCS polygon_box_transform_op.cc polygon_box_transform_op.cu) detection_library(rpn_target_assign_op SRCS rpn_target_assign_op.cc) detection_library(generate_proposal_labels_op SRCS generate_proposal_labels_op.cc) +detection_library(box_clip_op SRCS box_clip_op.cc box_clip_op.cu) +detection_library(yolov3_loss_op 
SRCS yolov3_loss_op.cc) if(WITH_GPU) detection_library(generate_proposals_op SRCS generate_proposals_op.cc generate_proposals_op.cu DEPS memory cub) diff --git a/paddle/fluid/operators/detection/bbox_util.h b/paddle/fluid/operators/detection/bbox_util.h index b99edb5bf05f94e762b377a8882e4c3fcdb5afad..a7bc3e027229884e78721d29428a8ab3f08a6ebc 100644 --- a/paddle/fluid/operators/detection/bbox_util.h +++ b/paddle/fluid/operators/detection/bbox_util.h @@ -99,5 +99,29 @@ void BboxOverlaps(const framework::Tensor& r_boxes, } } +template +void ClipTiledBoxes(const platform::DeviceContext& ctx, + const framework::Tensor& im_info, + const framework::Tensor& input_boxes, + framework::Tensor* out) { + T* out_data = out->mutable_data(ctx.GetPlace()); + const T* im_info_data = im_info.data(); + const T* input_boxes_data = input_boxes.data(); + T zero(0); + T im_w = round(im_info_data[1] / im_info_data[2]); + T im_h = round(im_info_data[0] / im_info_data[2]); + for (int64_t i = 0; i < input_boxes.numel(); ++i) { + if (i % 4 == 0) { + out_data[i] = std::max(std::min(input_boxes_data[i], im_w - 1), zero); + } else if (i % 4 == 1) { + out_data[i] = std::max(std::min(input_boxes_data[i], im_h - 1), zero); + } else if (i % 4 == 2) { + out_data[i] = std::max(std::min(input_boxes_data[i], im_w - 1), zero); + } else { + out_data[i] = std::max(std::min(input_boxes_data[i], im_h - 1), zero); + } + } +} + } // namespace operators } // namespace paddle diff --git a/paddle/fluid/operators/detection/box_clip_op.cc b/paddle/fluid/operators/detection/box_clip_op.cc new file mode 100644 index 0000000000000000000000000000000000000000..3aa766559a530bc31fbb277f2bcd474da776e63b --- /dev/null +++ b/paddle/fluid/operators/detection/box_clip_op.cc @@ -0,0 +1,86 @@ +/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/ + +#include "paddle/fluid/operators/detection/box_clip_op.h" +#include "paddle/fluid/framework/op_registry.h" + +namespace paddle { +namespace operators { + +class BoxClipOp : public framework::OperatorWithKernel { + public: + using framework::OperatorWithKernel::OperatorWithKernel; + + protected: + void InferShape(framework::InferShapeContext* ctx) const override { + PADDLE_ENFORCE(ctx->HasInput("Input"), + "Input(Input) of BoxClipOp should not be null."); + PADDLE_ENFORCE(ctx->HasInput("ImInfo"), + "Input(ImInfo) of BoxClipOp should not be null."); + + auto input_box_dims = ctx->GetInputDim("Input"); + auto im_info_dims = ctx->GetInputDim("ImInfo"); + + if (ctx->IsRuntime()) { + auto input_box_size = input_box_dims.size(); + PADDLE_ENFORCE_EQ(input_box_dims[input_box_size - 1], 4, + "The last dimension of Input must be 4"); + PADDLE_ENFORCE_EQ(im_info_dims.size(), 2, + "The rank of Input(Input) in BoxClipOp must be 2"); + PADDLE_ENFORCE_EQ(im_info_dims[1], 3, + "The last dimension of ImInfo must be 3"); + } + ctx->ShareDim("Input", /*->*/ "Output"); + ctx->ShareLoD("Input", /*->*/ "Output"); + } +}; + +class BoxClipOpMaker : public framework::OpProtoAndCheckerMaker { + public: + void Make() override { + AddInput("Input", + "(LoDTensor) " + "Input is a LoDTensor with shape [..., 4] holds 4 points" + "in last dimension in format [xmin, ymin, xmax, ymax]"); + AddInput("ImInfo", + "(Tensor) Information for image reshape is in shape (N, 3), " + "in format (height, width, im_scale)"); + AddOutput("Output", + "(LoDTensor) " + "Output is a LoDTensor with the same shape as Input" + "and it is the result after clip"); + AddComment(R"DOC( +This operator clips input boxes to original input images. + +For each input box, The formula is given as follows: + + $$xmin = \max(\min(xmin, im_w - 1), 0)$$ + $$ymin = \max(\min(ymin, im_h - 1), 0)$$ + $$xmax = \max(\min(xmax, im_w - 1), 0)$$ + $$ymax = \max(\min(ymax, im_h - 1), 0)$$ + +where im_w and im_h are computed from ImInfo, the formula is given as follows: + + $$im_w = \round(width / im_scale)$$ + $$im_h = \round(height / im_scale)$$ +)DOC"); + } +}; + +} // namespace operators +} // namespace paddle + +namespace ops = paddle::operators; +REGISTER_OPERATOR(box_clip, ops::BoxClipOp, ops::BoxClipOpMaker, + paddle::framework::EmptyGradOpMaker); +REGISTER_OP_CPU_KERNEL( + box_clip, ops::BoxClipKernel, + ops::BoxClipKernel); diff --git a/paddle/fluid/operators/detection/box_clip_op.cu b/paddle/fluid/operators/detection/box_clip_op.cu new file mode 100644 index 0000000000000000000000000000000000000000..b727da5f7b736b6f22407d1dfbca708ed0cf04d9 --- /dev/null +++ b/paddle/fluid/operators/detection/box_clip_op.cu @@ -0,0 +1,74 @@ +/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/ + +#include +#include "paddle/fluid/framework/op_registry.h" +#include "paddle/fluid/operators/detection/box_clip_op.h" +#include "paddle/fluid/operators/math/math_function.h" +#include "paddle/fluid/platform/cuda_primitives.h" +#include "paddle/fluid/platform/hostdevice.h" + +namespace paddle { +namespace operators { + +using Tensor = framework::Tensor; +using LoDTenso = framework::LoDTensor; + +static constexpr int ImInfoSize = 3; + +template +static __global__ void GPUBoxClip(const T *input, const size_t *lod, + const size_t width, const T *im_info, + T *output) { + T im_w = round(im_info[blockIdx.x * ImInfoSize + 1] / + im_info[blockIdx.x * ImInfoSize + 2]); + T im_h = round(im_info[blockIdx.x * ImInfoSize] / + im_info[blockIdx.x * ImInfoSize + 2]); + for (int i = threadIdx.x; i < (lod[blockIdx.x + 1] - lod[blockIdx.x]) * width; + i += BlockSize) { + int idx = lod[blockIdx.x] * width + i; + T im_size = (idx % 2 == 0) ? im_w : im_h; + output[idx] = max(min(input[idx], im_size - 1), T(0.)); + } +} + +template +class GPUBoxClipKernel : public framework::OpKernel { + public: + void Compute(const framework::ExecutionContext &context) const override { + PADDLE_ENFORCE(platform::is_gpu_place(context.GetPlace()), + "This kernel only runs on GPU device."); + auto *input = context.Input("Input"); + auto *im_info = context.Input("ImInfo"); + auto *output = context.Output("Output"); + const int64_t num = input->dims()[0]; + const int64_t bbox_width = input->numel() / num; + auto lod = input->lod(); + framework::LoD abs_offset_lod = framework::ToAbsOffset(lod); + auto &dev_ctx = context.template device_context(); + auto stream = dev_ctx.stream(); + const size_t batch_size = lod.back().size() - 1; + T *output_data = output->mutable_data(dev_ctx.GetPlace()); + GPUBoxClip<<>>( + input->data(), abs_offset_lod[0].CUDAMutableData(dev_ctx.GetPlace()), + bbox_width, im_info->data(), output_data); + } +}; + +} // namespace operators +} // namespace paddle + +namespace ops = paddle::operators; +REGISTER_OP_CUDA_KERNEL( + box_clip, ops::GPUBoxClipKernel, + ops::GPUBoxClipKernel); diff --git a/paddle/fluid/operators/detection/box_clip_op.h b/paddle/fluid/operators/detection/box_clip_op.h new file mode 100644 index 0000000000000000000000000000000000000000..74e1f88f8d8b28e490d170934760bd9bffc807bc --- /dev/null +++ b/paddle/fluid/operators/detection/box_clip_op.h @@ -0,0 +1,50 @@ +/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/ + +#pragma once +#include +#include "paddle/fluid/framework/op_registry.h" +#include "paddle/fluid/operators/detection/bbox_util.h" +#include "paddle/fluid/operators/math/math_function.h" + +namespace paddle { +namespace operators { + +using Tensor = framework::Tensor; +using LoDTensor = framework::LoDTensor; + +template +class BoxClipKernel : public framework::OpKernel { + public: + void Compute(const framework::ExecutionContext& context) const override { + auto* input_box = context.Input("Input"); + auto* im_info = context.Input("ImInfo"); + auto* output_box = context.Output("Output"); + auto& dev_ctx = + context.template device_context(); + output_box->mutable_data(context.GetPlace()); + if (input_box->lod().size()) { + PADDLE_ENFORCE_EQ(input_box->lod().size(), 1UL, + "Only support 1 level of LoD."); + } + auto box_lod = input_box->lod().back(); + int64_t n = static_cast(box_lod.size() - 1); + for (int i = 0; i < n; ++i) { + Tensor im_info_slice = im_info->Slice(i, i + 1); + Tensor box_slice = input_box->Slice(box_lod[i], box_lod[i + 1]); + Tensor output_slice = output_box->Slice(box_lod[i], box_lod[i + 1]); + ClipTiledBoxes(dev_ctx, im_info_slice, box_slice, &output_slice); + } + } +}; + +} // namespace operators +} // namespace paddle diff --git a/paddle/fluid/operators/detection/density_prior_box_op.h b/paddle/fluid/operators/detection/density_prior_box_op.h index ed2f5df80cf4d7a5a44af9b09f3b048b1b14cdb9..3591681fc3f6951dfc8d73e8edce38180b771eaf 100644 --- a/paddle/fluid/operators/detection/density_prior_box_op.h +++ b/paddle/fluid/operators/detection/density_prior_box_op.h @@ -52,6 +52,10 @@ class DensityPriorBoxOpKernel : public framework::OpKernel { step_height = step_h; } int num_priors = 0; + +#ifdef PADDLE_WITH_MKLML +#pragma omp parallel for reduction(+ : num_priors) +#endif for (size_t i = 0; i < densities.size(); ++i) { num_priors += (fixed_ratios.size()) * (pow(densities[i], 2)); } @@ -64,6 +68,17 @@ class DensityPriorBoxOpKernel : public framework::OpKernel { auto e_boxes = framework::EigenTensor::From(*boxes).setConstant(0.0); int step_average = static_cast((step_width + step_height) * 0.5); + std::vector sqrt_fixed_ratios; +#ifdef PADDLE_WITH_MKLML +#pragma omp parallel for +#endif + for (int i = 0; i < fixed_ratios.size(); i++) { + sqrt_fixed_ratios.push_back(sqrt(fixed_ratios[i])); + } + +#ifdef PADDLE_WITH_MKLML +#pragma omp parallel for collapse(2) +#endif for (int h = 0; h < feature_height; ++h) { for (int w = 0; w < feature_width; ++w) { T center_x = (w + offset) * step_width; @@ -73,34 +88,25 @@ class DensityPriorBoxOpKernel : public framework::OpKernel { for (size_t s = 0; s < fixed_sizes.size(); ++s) { auto fixed_size = fixed_sizes[s]; int density = densities[s]; + int shift = step_average / density; // Generate density prior boxes with fixed ratios. for (size_t r = 0; r < fixed_ratios.size(); ++r) { - float ar = fixed_ratios[r]; - int shift = step_average / density; - float box_width_ratio = fixed_size * sqrt(ar); - float box_height_ratio = fixed_size / sqrt(ar); + float box_width_ratio = fixed_size * sqrt_fixed_ratios[r]; + float box_height_ratio = fixed_size / sqrt_fixed_ratios[r]; + float density_center_x = center_x - step_average / 2. + shift / 2.; + float density_center_y = center_y - step_average / 2. + shift / 2.; for (int di = 0; di < density; ++di) { for (int dj = 0; dj < density; ++dj) { - float center_x_temp = - center_x - step_average / 2. + shift / 2. + dj * shift; - float center_y_temp = - center_y - step_average / 2. + shift / 2. 
+ di * shift; - e_boxes(h, w, idx, 0) = - (center_x_temp - box_width_ratio / 2.) / img_width >= 0 - ? (center_x_temp - box_width_ratio / 2.) / img_width - : 0; - e_boxes(h, w, idx, 1) = - (center_y_temp - box_height_ratio / 2.) / img_height >= 0 - ? (center_y_temp - box_height_ratio / 2.) / img_height - : 0; - e_boxes(h, w, idx, 2) = - (center_x_temp + box_width_ratio / 2.) / img_width <= 1 - ? (center_x_temp + box_width_ratio / 2.) / img_width - : 1; - e_boxes(h, w, idx, 3) = - (center_y_temp + box_height_ratio / 2.) / img_height <= 1 - ? (center_y_temp + box_height_ratio / 2.) / img_height - : 1; + float center_x_temp = density_center_x + dj * shift; + float center_y_temp = density_center_y + di * shift; + e_boxes(h, w, idx, 0) = std::max( + (center_x_temp - box_width_ratio / 2.) / img_width, 0.); + e_boxes(h, w, idx, 1) = std::max( + (center_y_temp - box_height_ratio / 2.) / img_height, 0.); + e_boxes(h, w, idx, 2) = std::min( + (center_x_temp + box_width_ratio / 2.) / img_width, 1.); + e_boxes(h, w, idx, 3) = std::min( + (center_y_temp + box_height_ratio / 2.) / img_height, 1.); idx++; } } @@ -131,8 +137,14 @@ class DensityPriorBoxOpKernel : public framework::OpKernel { vars->Resize({box_num, static_cast(variances.size())}); auto e_vars = framework::EigenMatrix::From(*vars); - - e_vars = var_et.broadcast(Eigen::DSizes(box_num, 1)); +#ifdef PADDLE_WITH_MKLML +#pragma omp parallel for collapse(2) +#endif + for (int i = 0; i < box_num; ++i) { + for (int j = 0; j < variances.size(); ++j) { + e_vars(i, j) = variances[j]; + } + } vars->Resize(var_dim); boxes->Resize(box_dim); diff --git a/paddle/fluid/operators/yolov3_loss_op.cc b/paddle/fluid/operators/detection/yolov3_loss_op.cc similarity index 69% rename from paddle/fluid/operators/yolov3_loss_op.cc rename to paddle/fluid/operators/detection/yolov3_loss_op.cc index 60508f7ab871910c38f1e4aa04c2035075d37df5..2a69ad4b53c26f5e2e0547e75e0d9c6518a8bcba 100644 --- a/paddle/fluid/operators/yolov3_loss_op.cc +++ b/paddle/fluid/operators/detection/yolov3_loss_op.cc @@ -9,7 +9,7 @@ See the License for the specific language governing permissions and limitations under the License. 
*/ -#include "paddle/fluid/operators/yolov3_loss_op.h" +#include "paddle/fluid/operators/detection/yolov3_loss_op.h" #include "paddle/fluid/framework/op_registry.h" namespace paddle { @@ -29,23 +29,33 @@ class Yolov3LossOp : public framework::OperatorWithKernel { "Input(GTLabel) of Yolov3LossOp should not be null."); PADDLE_ENFORCE(ctx->HasOutput("Loss"), "Output(Loss) of Yolov3LossOp should not be null."); + PADDLE_ENFORCE( + ctx->HasOutput("ObjectnessMask"), + "Output(ObjectnessMask) of Yolov3LossOp should not be null."); + PADDLE_ENFORCE(ctx->HasOutput("GTMatchMask"), + "Output(GTMatchMask) of Yolov3LossOp should not be null."); auto dim_x = ctx->GetInputDim("X"); auto dim_gtbox = ctx->GetInputDim("GTBox"); auto dim_gtlabel = ctx->GetInputDim("GTLabel"); auto anchors = ctx->Attrs().Get>("anchors"); + int anchor_num = anchors.size() / 2; + auto anchor_mask = ctx->Attrs().Get>("anchor_mask"); + int mask_num = anchor_mask.size(); auto class_num = ctx->Attrs().Get("class_num"); + PADDLE_ENFORCE_EQ(dim_x.size(), 4, "Input(X) should be a 4-D tensor."); PADDLE_ENFORCE_EQ(dim_x[2], dim_x[3], "Input(X) dim[3] and dim[4] should be euqal."); - PADDLE_ENFORCE_EQ(dim_x[1], anchors.size() / 2 * (5 + class_num), - "Input(X) dim[1] should be equal to (anchor_number * (5 " - "+ class_num))."); + PADDLE_ENFORCE_EQ( + dim_x[1], mask_num * (5 + class_num), + "Input(X) dim[1] should be equal to (anchor_mask_number * (5 " + "+ class_num))."); PADDLE_ENFORCE_EQ(dim_gtbox.size(), 3, "Input(GTBox) should be a 3-D tensor"); PADDLE_ENFORCE_EQ(dim_gtbox[2], 4, "Input(GTBox) dim[2] should be 5"); PADDLE_ENFORCE_EQ(dim_gtlabel.size(), 2, - "Input(GTBox) should be a 2-D tensor"); + "Input(GTLabel) should be a 2-D tensor"); PADDLE_ENFORCE_EQ(dim_gtlabel[0], dim_gtbox[0], "Input(GTBox) and Input(GTLabel) dim[0] should be same"); PADDLE_ENFORCE_EQ(dim_gtlabel[1], dim_gtbox[1], @@ -54,11 +64,22 @@ class Yolov3LossOp : public framework::OperatorWithKernel { "Attr(anchors) length should be greater then 0."); PADDLE_ENFORCE_EQ(anchors.size() % 2, 0, "Attr(anchors) length should be even integer."); + for (size_t i = 0; i < anchor_mask.size(); i++) { + PADDLE_ENFORCE_LT( + anchor_mask[i], anchor_num, + "Attr(anchor_mask) should not crossover Attr(anchors)."); + } PADDLE_ENFORCE_GT(class_num, 0, "Attr(class_num) should be an integer greater then 0."); - std::vector dim_out({1}); + std::vector dim_out({dim_x[0]}); ctx->SetOutputDim("Loss", framework::make_ddim(dim_out)); + + std::vector dim_obj_mask({dim_x[0], mask_num, dim_x[2], dim_x[3]}); + ctx->SetOutputDim("ObjectnessMask", framework::make_ddim(dim_obj_mask)); + + std::vector dim_gt_match_mask({dim_gtbox[0], dim_gtbox[1]}); + ctx->SetOutputDim("GTMatchMask", framework::make_ddim(dim_gt_match_mask)); } protected: @@ -73,11 +94,11 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker { public: void Make() override { AddInput("X", - "The input tensor of YOLO v3 loss operator, " + "The input tensor of YOLOv3 loss operator, " "This is a 4-D tensor with shape of [N, C, H, W]." 
"H and W should be same, and the second dimention(C) stores" "box locations, confidence score and classification one-hot" - "key of each anchor box"); + "keys of each anchor box"); AddInput("GTBox", "The input tensor of ground truth boxes, " "This is a 3-D tensor with shape of [N, max_box_num, 5], " @@ -89,32 +110,39 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker { AddInput("GTLabel", "The input tensor of ground truth label, " "This is a 2-D tensor with shape of [N, max_box_num], " - "and each element shoudl be an integer to indicate the " + "and each element should be an integer to indicate the " "box class id."); AddOutput("Loss", "The output yolov3 loss tensor, " - "This is a 1-D tensor with shape of [1]"); + "This is a 1-D tensor with shape of [N]"); + AddOutput("ObjectnessMask", + "This is an intermediate tensor with shape of [N, M, H, W], " + "M is the number of anchor masks. This parameter caches the " + "mask for calculate objectness loss in gradient kernel.") + .AsIntermediate(); + AddOutput("GTMatchMask", + "This is an intermediate tensor with shape of [N, B], " + "B is the max box number of GT boxes. This parameter caches " + "matched mask index of each GT boxes for gradient calculate.") + .AsIntermediate(); AddAttr("class_num", "The number of classes to predict."); AddAttr>("anchors", "The anchor width and height, " - "it will be parsed pair by pair."); + "it will be parsed pair by pair.") + .SetDefault(std::vector{}); + AddAttr>("anchor_mask", + "The mask index of anchors used in " + "current YOLOv3 loss calculation.") + .SetDefault(std::vector{}); + AddAttr("downsample_ratio", + "The downsample ratio from network input to YOLOv3 loss " + "input, so 32, 16, 8 should be set for the first, second, " + "and thrid YOLOv3 loss operators.") + .SetDefault(32); AddAttr("ignore_thresh", - "The ignore threshold to ignore confidence loss."); - AddAttr("loss_weight_xy", "The weight of x, y location loss.") - .SetDefault(1.0); - AddAttr("loss_weight_wh", "The weight of w, h location loss.") - .SetDefault(1.0); - AddAttr( - "loss_weight_conf_target", - "The weight of confidence score loss in locations with target object.") - .SetDefault(1.0); - AddAttr("loss_weight_conf_notarget", - "The weight of confidence score loss in locations without " - "target object.") - .SetDefault(1.0); - AddAttr("loss_weight_class", "The weight of classification loss.") - .SetDefault(1.0); + "The ignore threshold to ignore confidence loss.") + .SetDefault(0.7); AddComment(R"DOC( This operator generate yolov3 loss by given predict result and ground truth boxes. @@ -147,17 +175,28 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker { thresh, the confidence score loss of this anchor box will be ignored. Therefore, the yolov3 loss consist of three major parts, box location loss, - confidence score loss, and classification loss. The MSE loss is used for - box location, and binary cross entropy loss is used for confidence score - loss and classification loss. + confidence score loss, and classification loss. The L2 loss is used for + box coordinates (w, h), and sigmoid cross entropy loss is used for box + coordinates (x, y), confidence score loss and classification loss. + + Each groud truth box find a best matching anchor box in all anchors, + prediction of this anchor box will incur all three parts of losses, and + prediction of anchor boxes with no GT box matched will only incur objectness + loss. 
+ + In order to trade off box coordinate losses between big boxes and small + boxes, box coordinate losses will be mutiplied by scale weight, which is + calculated as follow. + + $$ + weight_{box} = 2.0 - t_w * t_h + $$ Final loss will be represented as follow. $$ - loss = \loss_weight_{xy} * loss_{xy} + \loss_weight_{wh} * loss_{wh} - + \loss_weight_{conf_target} * loss_{conf_target} - + \loss_weight_{conf_notarget} * loss_{conf_notarget} - + \loss_weight_{class} * loss_{class} + loss = (loss_{xy} + loss_{wh}) * weight_{box} + + loss_{conf} + loss_{class} $$ )DOC"); } @@ -196,6 +235,8 @@ class Yolov3LossGradMaker : public framework::SingleGradOpDescMaker { op->SetInput("GTBox", Input("GTBox")); op->SetInput("GTLabel", Input("GTLabel")); op->SetInput(framework::GradVarName("Loss"), OutputGrad("Loss")); + op->SetInput("ObjectnessMask", Output("ObjectnessMask")); + op->SetInput("GTMatchMask", Output("GTMatchMask")); op->SetAttrMap(Attrs()); diff --git a/paddle/fluid/operators/detection/yolov3_loss_op.h b/paddle/fluid/operators/detection/yolov3_loss_op.h new file mode 100644 index 0000000000000000000000000000000000000000..8407d4e6e8f87a2e8d073c4fbda5691abe1bba68 --- /dev/null +++ b/paddle/fluid/operators/detection/yolov3_loss_op.h @@ -0,0 +1,447 @@ +/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve. + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. */ + +#pragma once +#include +#include +#include "paddle/fluid/framework/op_registry.h" +#include "paddle/fluid/operators/math/math_function.h" + +namespace paddle { +namespace operators { + +using Tensor = framework::Tensor; +template +using EigenTensor = framework::EigenTensor; +template +using EigenVector = framework::EigenVector; + +template +static inline bool LessEqualZero(T x) { + return x < 1e-6; +} + +template +static T SigmoidCrossEntropy(T x, T label) { + return (x > 0 ? 
x : 0.0) - x * label + std::log(1.0 + std::exp(-std::abs(x))); +} + +template +static T L2Loss(T x, T y) { + return 0.5 * (y - x) * (y - x); +} + +template +static T SigmoidCrossEntropyGrad(T x, T label) { + return 1.0 / (1.0 + std::exp(-x)) - label; +} + +template +static T L2LossGrad(T x, T y) { + return x - y; +} + +static int GetMaskIndex(std::vector mask, int val) { + for (size_t i = 0; i < mask.size(); i++) { + if (mask[i] == val) { + return i; + } + } + return -1; +} + +template +struct Box { + T x, y, w, h; +}; + +template +static inline T sigmoid(T x) { + return 1.0 / (1.0 + std::exp(-x)); +} + +template +static inline Box GetYoloBox(const T* x, std::vector anchors, int i, + int j, int an_idx, int grid_size, + int input_size, int index, int stride) { + Box b; + b.x = (i + sigmoid(x[index])) / grid_size; + b.y = (j + sigmoid(x[index + stride])) / grid_size; + b.w = std::exp(x[index + 2 * stride]) * anchors[2 * an_idx] / input_size; + b.h = std::exp(x[index + 3 * stride]) * anchors[2 * an_idx + 1] / input_size; + return b; +} + +template +static inline Box GetGtBox(const T* gt, int batch, int max_boxes, int idx) { + Box b; + b.x = gt[(batch * max_boxes + idx) * 4]; + b.y = gt[(batch * max_boxes + idx) * 4 + 1]; + b.w = gt[(batch * max_boxes + idx) * 4 + 2]; + b.h = gt[(batch * max_boxes + idx) * 4 + 3]; + return b; +} + +template +static inline T BoxOverlap(T c1, T w1, T c2, T w2) { + T l1 = c1 - w1 / 2.0; + T l2 = c2 - w2 / 2.0; + T left = l1 > l2 ? l1 : l2; + T r1 = c1 + w1 / 2.0; + T r2 = c2 + w2 / 2.0; + T right = r1 < r2 ? r1 : r2; + return right - left; +} + +template +static inline T CalcBoxIoU(Box b1, Box b2) { + T w = BoxOverlap(b1.x, b1.w, b2.x, b2.w); + T h = BoxOverlap(b1.y, b1.h, b2.y, b2.h); + T inter_area = (w < 0 || h < 0) ? 0.0 : w * h; + T union_area = b1.w * b1.h + b2.w * b2.h - inter_area; + return inter_area / union_area; +} + +static inline int GetEntryIndex(int batch, int an_idx, int hw_idx, int an_num, + int an_stride, int stride, int entry) { + return (batch * an_num + an_idx) * an_stride + entry * stride + hw_idx; +} + +template +static void CalcBoxLocationLoss(T* loss, const T* input, Box gt, + std::vector anchors, int an_idx, + int box_idx, int gi, int gj, int grid_size, + int input_size, int stride) { + T tx = gt.x * grid_size - gi; + T ty = gt.y * grid_size - gj; + T tw = std::log(gt.w * input_size / anchors[2 * an_idx]); + T th = std::log(gt.h * input_size / anchors[2 * an_idx + 1]); + + T scale = (2.0 - gt.w * gt.h); + loss[0] += SigmoidCrossEntropy(input[box_idx], tx) * scale; + loss[0] += SigmoidCrossEntropy(input[box_idx + stride], ty) * scale; + loss[0] += L2Loss(input[box_idx + 2 * stride], tw) * scale; + loss[0] += L2Loss(input[box_idx + 3 * stride], th) * scale; +} + +template +static void CalcBoxLocationLossGrad(T* input_grad, const T loss, const T* input, + Box gt, std::vector anchors, + int an_idx, int box_idx, int gi, int gj, + int grid_size, int input_size, int stride) { + T tx = gt.x * grid_size - gi; + T ty = gt.y * grid_size - gj; + T tw = std::log(gt.w * input_size / anchors[2 * an_idx]); + T th = std::log(gt.h * input_size / anchors[2 * an_idx + 1]); + + T scale = (2.0 - gt.w * gt.h); + input_grad[box_idx] = + SigmoidCrossEntropyGrad(input[box_idx], tx) * scale * loss; + input_grad[box_idx + stride] = + SigmoidCrossEntropyGrad(input[box_idx + stride], ty) * scale * loss; + input_grad[box_idx + 2 * stride] = + L2LossGrad(input[box_idx + 2 * stride], tw) * scale * loss; + input_grad[box_idx + 3 * stride] = + L2LossGrad(input[box_idx + 3 
* stride], th) * scale * loss; +} + +template <typename T> +static inline void CalcLabelLoss(T* loss, const T* input, const int index, + const int label, const int class_num, + const int stride) { + for (int i = 0; i < class_num; i++) { + T pred = input[index + i * stride]; + loss[0] += SigmoidCrossEntropy<T>(pred, (i == label) ? 1.0 : 0.0); + } +} + +template <typename T> +static inline void CalcLabelLossGrad(T* input_grad, const T loss, + const T* input, const int index, + const int label, const int class_num, + const int stride) { + for (int i = 0; i < class_num; i++) { + T pred = input[index + i * stride]; + input_grad[index + i * stride] = + SigmoidCrossEntropyGrad<T>(pred, (i == label) ? 1.0 : 0.0) * loss; + } +} + +template <typename T> +static inline void CalcObjnessLoss(T* loss, const T* input, const T* objness, + const int n, const int an_num, const int h, + const int w, const int stride, + const int an_stride) { + for (int i = 0; i < n; i++) { + for (int j = 0; j < an_num; j++) { + for (int k = 0; k < h; k++) { + for (int l = 0; l < w; l++) { + T obj = objness[k * w + l]; + if (obj > 1e-5) { + // positive sample: obj = 1 + loss[i] += SigmoidCrossEntropy<T>(input[k * w + l], 1.0); + } else if (obj > -0.5) { + // negative sample: obj = 0 + loss[i] += SigmoidCrossEntropy<T>(input[k * w + l], 0.0); + } + } + } + objness += stride; + input += an_stride; + } + } +} + +template <typename T> +static inline void CalcObjnessLossGrad(T* input_grad, const T* loss, + const T* input, const T* objness, + const int n, const int an_num, + const int h, const int w, + const int stride, const int an_stride) { + for (int i = 0; i < n; i++) { + for (int j = 0; j < an_num; j++) { + for (int k = 0; k < h; k++) { + for (int l = 0; l < w; l++) { + T obj = objness[k * w + l]; + if (obj > 1e-5) { + input_grad[k * w + l] = + SigmoidCrossEntropyGrad<T>(input[k * w + l], 1.0) * loss[i]; + } else if (obj > -0.5) { + input_grad[k * w + l] = + SigmoidCrossEntropyGrad<T>(input[k * w + l], 0.0) * loss[i]; + } + } + } + objness += stride; + input += an_stride; + input_grad += an_stride; + } + } +} + +template <typename T> +static void inline GtValid(bool* valid, const T* gtbox, const int n, + const int b) { + for (int i = 0; i < n; i++) { + for (int j = 0; j < b; j++) { + if (LessEqualZero(gtbox[j * 4 + 2]) || LessEqualZero(gtbox[j * 4 + 3])) { + valid[j] = false; + } else { + valid[j] = true; + } + } + valid += b; + gtbox += b * 4; + } +} + +template <typename T> +class Yolov3LossKernel : public framework::OpKernel<T> { + public: + void Compute(const framework::ExecutionContext& ctx) const override { + auto* input = ctx.Input<Tensor>("X"); + auto* gt_box = ctx.Input<Tensor>("GTBox"); + auto* gt_label = ctx.Input<Tensor>("GTLabel"); + auto* loss = ctx.Output<Tensor>("Loss"); + auto* objness_mask = ctx.Output<Tensor>("ObjectnessMask"); + auto* gt_match_mask = ctx.Output<Tensor>("GTMatchMask"); + auto anchors = ctx.Attr<std::vector<int>>("anchors"); + auto anchor_mask = ctx.Attr<std::vector<int>>("anchor_mask"); + int class_num = ctx.Attr<int>("class_num"); + float ignore_thresh = ctx.Attr<float>("ignore_thresh"); + int downsample_ratio = ctx.Attr<int>("downsample_ratio"); + + const int n = input->dims()[0]; + const int h = input->dims()[2]; + const int w = input->dims()[3]; + const int an_num = anchors.size() / 2; + const int mask_num = anchor_mask.size(); + const int b = gt_box->dims()[1]; + int input_size = downsample_ratio * h; + + const int stride = h * w; + const int an_stride = (class_num + 5) * stride; + + const T* input_data = input->data<T>(); + const T* gt_box_data = gt_box->data<T>(); + const int* gt_label_data = gt_label->data<int>(); + T* loss_data = loss->mutable_data<T>({n}, ctx.GetPlace()); +
memset(loss_data, 0, loss->numel() * sizeof(T)); + T* obj_mask_data = + objness_mask->mutable_data<T>({n, mask_num, h, w}, ctx.GetPlace()); + memset(obj_mask_data, 0, objness_mask->numel() * sizeof(T)); + int* gt_match_mask_data = + gt_match_mask->mutable_data<int>({n, b}, ctx.GetPlace()); + + // calculate the valid gt box mask, to avoid duplicate calculation below + Tensor gt_valid_mask; + bool* gt_valid_mask_data = + gt_valid_mask.mutable_data<bool>({n, b}, ctx.GetPlace()); + GtValid(gt_valid_mask_data, gt_box_data, n, b); + + for (int i = 0; i < n; i++) { + for (int j = 0; j < mask_num; j++) { + for (int k = 0; k < h; k++) { + for (int l = 0; l < w; l++) { + // each predicted box finds its best matching gt box; if the overlap + // is greater than ignore_thresh, the objectness loss is ignored. + int box_idx = + GetEntryIndex(i, j, k * w + l, mask_num, an_stride, stride, 0); + Box<T> pred = GetYoloBox(input_data, anchors, l, k, anchor_mask[j], + h, input_size, box_idx, stride); + T best_iou = 0; + for (int t = 0; t < b; t++) { + if (!gt_valid_mask_data[i * b + t]) { + continue; + } + Box<T> gt = GetGtBox(gt_box_data, i, b, t); + T iou = CalcBoxIoU(pred, gt); + if (iou > best_iou) { + best_iou = iou; + } + } + + // If the best IoU is greater than ignore_thresh, + // ignore the objectness loss. + if (best_iou > ignore_thresh) { + int obj_idx = (i * mask_num + j) * stride + k * w + l; + obj_mask_data[obj_idx] = static_cast<T>(-1); + } + // all losses should be calculated if the best IoU + // is greater than the truth thresh, but currently the + // truth thresh is fixed at the unreachable value of 1.0. + } + } + } + for (int t = 0; t < b; t++) { + if (!gt_valid_mask_data[i * b + t]) { + gt_match_mask_data[i * b + t] = -1; + continue; + } + Box<T> gt = GetGtBox(gt_box_data, i, b, t); + int gi = static_cast<int>(gt.x * w); + int gj = static_cast<int>(gt.y * h); + Box<T> gt_shift = gt; + gt_shift.x = 0.0; + gt_shift.y = 0.0; + T best_iou = 0.0; + int best_n = 0; + // each gt box finds its best matching anchor box as the positive sample; + // for the positive sample all losses should be calculated, and for + // other samples only the objectness loss is required.
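+ // note: the anchor match below compares shapes only: both the gt box and + // each candidate anchor box are placed at the origin (x = y = 0) before + // CalcBoxIoU, so the best anchor is chosen purely by width/height overlap.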
+ for (int an_idx = 0; an_idx < an_num; an_idx++) { + Box<T> an_box; + an_box.x = 0.0; + an_box.y = 0.0; + an_box.w = anchors[2 * an_idx] / static_cast<T>(input_size); + an_box.h = anchors[2 * an_idx + 1] / static_cast<T>(input_size); + float iou = CalcBoxIoU(an_box, gt_shift); + if (iou > best_iou) { + best_iou = iou; + best_n = an_idx; + } + } + + int mask_idx = GetMaskIndex(anchor_mask, best_n); + gt_match_mask_data[i * b + t] = mask_idx; + if (mask_idx >= 0) { + int box_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num, + an_stride, stride, 0); + CalcBoxLocationLoss(loss_data + i, input_data, gt, anchors, best_n, + box_idx, gi, gj, h, input_size, stride); + + int obj_idx = (i * mask_num + mask_idx) * stride + gj * w + gi; + obj_mask_data[obj_idx] = 1.0; + + int label = gt_label_data[i * b + t]; + int label_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num, + an_stride, stride, 5); + CalcLabelLoss(loss_data + i, input_data, label_idx, label, + class_num, stride); + } + } + } + + CalcObjnessLoss(loss_data, input_data + 4 * stride, obj_mask_data, n, + mask_num, h, w, stride, an_stride); + } +}; + +template <typename T> +class Yolov3LossGradKernel : public framework::OpKernel<T> { + public: + void Compute(const framework::ExecutionContext& ctx) const override { + auto* input = ctx.Input<Tensor>("X"); + auto* gt_box = ctx.Input<Tensor>("GTBox"); + auto* gt_label = ctx.Input<Tensor>("GTLabel"); + auto* input_grad = ctx.Output<Tensor>(framework::GradVarName("X")); + auto* loss_grad = ctx.Input<Tensor>(framework::GradVarName("Loss")); + auto* objness_mask = ctx.Input<Tensor>("ObjectnessMask"); + auto* gt_match_mask = ctx.Input<Tensor>("GTMatchMask"); + auto anchors = ctx.Attr<std::vector<int>>("anchors"); + auto anchor_mask = ctx.Attr<std::vector<int>>("anchor_mask"); + int class_num = ctx.Attr<int>("class_num"); + int downsample_ratio = ctx.Attr<int>("downsample_ratio"); + + const int n = input_grad->dims()[0]; + const int c = input_grad->dims()[1]; + const int h = input_grad->dims()[2]; + const int w = input_grad->dims()[3]; + const int mask_num = anchor_mask.size(); + const int b = gt_match_mask->dims()[1]; + int input_size = downsample_ratio * h; + + const int stride = h * w; + const int an_stride = (class_num + 5) * stride; + + const T* input_data = input->data<T>(); + const T* gt_box_data = gt_box->data<T>(); + const int* gt_label_data = gt_label->data<int>(); + const T* loss_grad_data = loss_grad->data<T>(); + const T* obj_mask_data = objness_mask->data<T>(); + const int* gt_match_mask_data = gt_match_mask->data<int>(); + T* input_grad_data = + input_grad->mutable_data<T>({n, c, h, w}, ctx.GetPlace()); + memset(input_grad_data, 0, input_grad->numel() * sizeof(T)); + + for (int i = 0; i < n; i++) { + for (int t = 0; t < b; t++) { + int mask_idx = gt_match_mask_data[i * b + t]; + if (mask_idx >= 0) { + Box<T> gt = GetGtBox(gt_box_data, i, b, t); + int gi = static_cast<int>(gt.x * w); + int gj = static_cast<int>(gt.y * h); + + int box_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num, + an_stride, stride, 0); + CalcBoxLocationLossGrad( + input_grad_data, loss_grad_data[i], input_data, gt, anchors, + anchor_mask[mask_idx], box_idx, gi, gj, h, input_size, stride); + + int label = gt_label_data[i * b + t]; + int label_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num, + an_stride, stride, 5); + CalcLabelLossGrad(input_grad_data, loss_grad_data[i], input_data, + label_idx, label, class_num, stride); + } + } + } + + CalcObjnessLossGrad(input_grad_data + 4 * stride, loss_grad_data, + input_data + 4 * stride, obj_mask_data, n, mask_num, + h, w, stride, an_stride); + } +}; + +} // namespace operators +} // namespace paddle diff --git
a/paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.cc b/paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.cc index a35ee8a09ed5ddcc4ac465d200b84358fa65b2f3..e9e2a3b1f5c1c00bb2e95b6171ecd09bfe7a0d21 100644 --- a/paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.cc +++ b/paddle/fluid/operators/fused/fusion_repeated_fc_relu_op.cc @@ -79,17 +79,17 @@ void FusionRepeatedFCReluOpMaker::Make() { } template -static void fc_relu(const T* x, const T* w, const T* b, T* y, int m, int n, - int k) { +static void fc_relu(const T* x, const T* w, const T* b, T* y, + const jit::matmul_attr_t& attr) { auto matmul = - jit::Get, platform::CPUPlace>(k); + jit::Get, platform::CPUPlace>(attr); auto addbias_relu = - jit::Get, platform::CPUPlace>(n); - matmul(x, w, y, m, n, k); + jit::Get, platform::CPUPlace>(attr.n); + matmul(x, w, y, &attr); T* dst = y; - for (int i = 0; i < m; ++i) { - addbias_relu(b, dst, dst, n); - dst += n; + for (int i = 0; i < attr.m; ++i) { + addbias_relu(b, dst, dst, attr.n); + dst += attr.n; } } @@ -107,32 +107,33 @@ class FusionRepeatedFCReluKernel : public framework::OpKernel { auto i_dims = in->dims(); auto w_dims = weights[0]->dims(); - int m = i_dims[0]; - int n = w_dims[1]; - int k = w_dims[0]; - relus[0]->Resize({m, n}); + jit::matmul_attr_t attr; + attr.m = i_dims[0]; + attr.n = w_dims[1]; + attr.k = w_dims[0]; + relus[0]->Resize({attr.m, attr.n}); fc_relu(in->data(), weights[0]->data(), biases[0]->data(), - relus[0]->mutable_data(place), m, n, k); + relus[0]->mutable_data(place), attr); for (int i = 1; i < weight_sz - 1; ++i) { auto i_dims = relus[i - 1]->dims(); auto w_dims = weights[i]->dims(); - int m = i_dims[0]; - int n = w_dims[1]; - int k = w_dims[0]; - relus[i]->Resize({m, n}); + attr.m = i_dims[0]; + attr.n = w_dims[1]; + attr.k = w_dims[0]; + relus[i]->Resize({attr.m, attr.n}); fc_relu(relus[i - 1]->data(), weights[i]->data(), - biases[i]->data(), relus[i]->mutable_data(place), m, n, k); + biases[i]->data(), relus[i]->mutable_data(place), attr); } auto i_dims_last = relus[weight_sz - 2]->dims(); auto w_dims_last = weights[weight_sz - 1]->dims(); - m = i_dims_last[0]; - n = w_dims_last[1]; - k = w_dims_last[0]; + attr.m = i_dims_last[0]; + attr.n = w_dims_last[1]; + attr.k = w_dims_last[0]; fc_relu(relus[weight_sz - 2]->data(), weights[weight_sz - 1]->data(), - biases[weight_sz - 1]->data(), out->mutable_data(place), m, n, - k); + biases[weight_sz - 1]->data(), out->mutable_data(place), + attr); } }; diff --git a/paddle/fluid/operators/fused/fusion_squared_mat_sub_op.cc b/paddle/fluid/operators/fused/fusion_squared_mat_sub_op.cc index 00dafdead53bbd4614c70875441c565724fca46d..8c8b079633aacb711aa304ec7016c37c6bec61ce 100644 --- a/paddle/fluid/operators/fused/fusion_squared_mat_sub_op.cc +++ b/paddle/fluid/operators/fused/fusion_squared_mat_sub_op.cc @@ -87,15 +87,18 @@ class FusionSquaredMatSubKernel : public framework::OpKernel { auto x_dims = x->dims(); auto y_dims = y->dims(); - int m = x_dims[0]; - int k = x_dims[1]; - int n = y_dims[1]; - int o_numel = m * n; + jit::matmul_attr_t attr; + attr.m = x_dims[0]; + attr.k = x_dims[1]; + attr.n = y_dims[1]; + int o_numel = attr.m * attr.n; auto vsquare_x = - jit::Get, platform::CPUPlace>(m * k); + jit::Get, platform::CPUPlace>(attr.m * + attr.k); auto vsquare_y = - jit::Get, platform::CPUPlace>(k * n); + jit::Get, platform::CPUPlace>(attr.k * + attr.n); auto vsquare_xy = jit::Get, platform::CPUPlace>(o_numel); auto vsub = @@ -103,7 +106,7 @@ class FusionSquaredMatSubKernel : public framework::OpKernel { 
auto vscal = jit::Get, platform::CPUPlace>(o_numel); auto matmul = - jit::Get, platform::CPUPlace>(k); + jit::Get, platform::CPUPlace>(attr); const T* x_data = x->data(); const T* y_data = y->data(); @@ -112,12 +115,12 @@ class FusionSquaredMatSubKernel : public framework::OpKernel { T* squared_xy_data = squared_xy->mutable_data(place); T* o_data = out->mutable_data(place); - matmul(x_data, y_data, squared_xy_data, m, n, k); + matmul(x_data, y_data, squared_xy_data, &attr); vsquare_xy(squared_xy_data, squared_xy_data, o_numel); - vsquare_x(x_data, squared_x_data, m * k); - vsquare_y(y_data, squared_y_data, k * n); - matmul(squared_x_data, squared_y_data, o_data, m, n, k); + vsquare_x(x_data, squared_x_data, attr.m * attr.k); + vsquare_y(y_data, squared_y_data, attr.k * attr.n); + matmul(squared_x_data, squared_y_data, o_data, &attr); vsub(squared_xy_data, o_data, o_data, o_numel); vscal(&scalar, o_data, o_data, o_numel); diff --git a/paddle/fluid/operators/jit/benchmark.cc b/paddle/fluid/operators/jit/benchmark.cc index 5c5a61f64093802697eb21452267471129c7fcf3..97ddf223aefcdfaf8a488f93a152336c1ed458f4 100644 --- a/paddle/fluid/operators/jit/benchmark.cc +++ b/paddle/fluid/operators/jit/benchmark.cc @@ -93,6 +93,7 @@ std::vector TestSizes() { template struct BenchFunc { // return this function avg time + // TODO(TJ): clear cache every time double operator()(const typename KernelTuples::func_type tgt, Args... args) { for (int i = 0; i < FLAGS_burning; ++i) { tgt(args...); @@ -172,6 +173,9 @@ void BenchXYZNKernel() { RandomVec(d, y_data); BenchAllImpls, PlaceType>(d, x.data(), y.data(), z_data, d); + // test inplace + BenchAllImpls, PlaceType>(d, x.data(), z_data, + z_data, d); } } @@ -311,8 +315,9 @@ void BenchMatMulKernel() { const T* a_data = a.data(); const T* b_data = b.data(); T* c_data = c.mutable_data(PlaceType()); - BenchAllImpls, PlaceType>(k, a_data, b_data, - c_data, m, n, k); + const jit::matmul_attr_t attr{m, n, k}; + BenchAllImpls, PlaceType>(attr, a_data, b_data, + c_data, &attr); } } } diff --git a/paddle/fluid/operators/jit/gen/CMakeLists.txt b/paddle/fluid/operators/jit/gen/CMakeLists.txt index 2ea8f927e1a13867fa2065841fac05e766735237..efc7eb79d36c5cf9fac4ac40db4e2e28cb242e22 100644 --- a/paddle/fluid/operators/jit/gen/CMakeLists.txt +++ b/paddle/fluid/operators/jit/gen/CMakeLists.txt @@ -9,6 +9,7 @@ function(USE_JITKERNEL_GEN TARGET) endfunction() # use gen jitcode kernel by name +USE_JITKERNEL_GEN(kMatMul) USE_JITKERNEL_GEN(kVMul) USE_JITKERNEL_GEN(kVAdd) USE_JITKERNEL_GEN(kVSub) diff --git a/paddle/fluid/operators/jit/gen/blas.cc b/paddle/fluid/operators/jit/gen/blas.cc index dee6c7b9d3ee9756c1b11d10d55fdca341cbee85..5da24c359edd2df93333fe0ca8a18cdc7385aadb 100644 --- a/paddle/fluid/operators/jit/gen/blas.cc +++ b/paddle/fluid/operators/jit/gen/blas.cc @@ -155,7 +155,7 @@ class NCHW16CMulNCCreator : public JitCodeCreator { class name##Creator : public JitCodeCreator { \ public: \ bool UseMe(const int& attr) const override { \ - return platform::MayIUse(platform::avx); \ + return platform::MayIUse(platform::avx) && attr <= 1024; \ } \ size_t CodeSize(const int& d) const override { \ return 96 + d / YMM_FLOAT_BLOCK * 4 * 8; \ diff --git a/paddle/fluid/operators/jit/gen/blas.h b/paddle/fluid/operators/jit/gen/blas.h index de6b33f467279124d7acd97709516c31706ec4f9..66a97c1be503b0fa983f9a7ec3b61c986774f16b 100644 --- a/paddle/fluid/operators/jit/gen/blas.h +++ b/paddle/fluid/operators/jit/gen/blas.h @@ -61,6 +61,7 @@ class VXXJitCode : public JitCode { base += "_Vec"; } base 
+= (with_relu_ ? "_Relu" : ""); + base += "_D" + std::to_string(num_); return base.c_str(); } void genCode() override; diff --git a/paddle/fluid/operators/jit/gen/matmul.cc b/paddle/fluid/operators/jit/gen/matmul.cc new file mode 100644 index 0000000000000000000000000000000000000000..ae3858eab20aeb80553d8fcec4088a6632c9c17d --- /dev/null +++ b/paddle/fluid/operators/jit/gen/matmul.cc @@ -0,0 +1,128 @@ +/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. */ + +#include "paddle/fluid/operators/jit/gen/matmul.h" +#include // offsetof +#include + +#include "paddle/fluid/operators/jit/registry.h" +#include "paddle/fluid/platform/cpu_info.h" + +namespace paddle { +namespace operators { +namespace jit { +namespace gen { + +void MatMulJitCode::genCode() { + preCode(); + int block, rest; + const auto groups = packed_groups(n_, k_, &block, &rest); + PADDLE_ENFORCE_GT(groups.front(), 0); + + const int block_len = sizeof(float) * block; + const int x_reg_idx = (block == ZMM_FLOAT_BLOCK ? 32 : 16) - 1; + const int w_reg_idx = x_reg_idx - 1; + // from packed mov(reg_ptr_wgt, ptr[param_attr + offsetof(matmul_attr_t, + // packed_weight)]); + mov(reg_ptr_wgt, param_y); + size_t z_offset = 0; + size_t wgt_offset = 0; + for (size_t g = 0; g < groups.size(); ++g) { + size_t x_offset = 0; + for (int k = 0; k < k_; ++k) { + vbroadcastss(zmm_t(x_reg_idx), ptr[param_x + x_offset]); + // clean + if (k == 0) { + for (int i = 0; i < groups[g]; ++i) { + vxorps(zmm_t(i), zmm_t(i), zmm_t(i)); + } + } + for (int i = 0; i < groups[g]; ++i) { + vmovups(zmm_t(w_reg_idx), ptr[reg_ptr_wgt + wgt_offset]); + vfmadd231ps(zmm_t(i), zmm_t(w_reg_idx), zmm_t(x_reg_idx)); + wgt_offset += block_len; + } + // last one, save + if (k == k_ - 1) { + for (int i = 0; i < groups[g]; ++i) { + // only rest save should be careful + if (rest != 0 && g == groups.size() - 1 && i == groups[g] - 1) { + break; + } + vmovups(ptr[param_z + z_offset + i * block_len], zmm_t(i)); + } + } + x_offset += sizeof(float); + } + z_offset += block_len * groups[g]; + } + + if (rest != 0) { + // below should refine with mask + int reg_idx = groups.back() - 1; + z_offset = (n_ - rest) * sizeof(float); + int inner_block = 8; + while (rest > 0) { + if (rest >= 8) { + inner_block = 8; + vmovups(ptr[param_z + z_offset], ymm_t(reg_idx)); + // shift zmm of inner_block, change reg_idx if update + } else if (rest >= 4) { + inner_block = 4; + vmovups(ptr[param_z + z_offset], xmm_t(reg_idx)); + } else if (rest >= 2) { + inner_block = 2; + vmovq(ptr[param_z + z_offset], xmm_t(reg_idx)); + } else { + inner_block = 1; + vmovss(ptr[param_z + z_offset], xmm_t(reg_idx)); + } + z_offset += inner_block * sizeof(float); + rest -= inner_block; + } + } + + postCode(); +} + +class MatMulCreator : public JitCodeCreator { + public: + bool UseMe(const matmul_attr_t& attr) const override { + return attr.m == 1 && platform::MayIUse(platform::avx512f) && + attr.n % ZMM_FLOAT_BLOCK == 0 && attr.k < 512; + } + size_t 
CodeSize(const matmul_attr_t& attr) const override { + int block = YMM_FLOAT_BLOCK; + if (platform::MayIUse(platform::avx512f)) { + block = ZMM_FLOAT_BLOCK; + } + return 96 + 4 * attr.k * (attr.n / block + 1) * 8; + } + std::unique_ptr CreateJitCode( + const matmul_attr_t& attr) const override { + PADDLE_ENFORCE_GT(attr.m, 0); + PADDLE_ENFORCE_GT(attr.n, 0); + PADDLE_ENFORCE_GT(attr.k, 0); + return make_unique(attr, CodeSize(attr)); + } +}; + +} // namespace gen +} // namespace jit +} // namespace operators +} // namespace paddle + +namespace gen = paddle::operators::jit::gen; + +REGISTER_JITKERNEL_GEN(kMatMul, gen::MatMulCreator); diff --git a/paddle/fluid/operators/jit/gen/matmul.h b/paddle/fluid/operators/jit/gen/matmul.h new file mode 100644 index 0000000000000000000000000000000000000000..626baa8f738bf0395f3c7f1700610d0a9075879b --- /dev/null +++ b/paddle/fluid/operators/jit/gen/matmul.h @@ -0,0 +1,62 @@ +/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. */ + +#pragma once + +#include // for malloc and free +#include +#include +#include "glog/logging.h" +#include "paddle/fluid/operators/jit/gen/jitcode.h" +#include "paddle/fluid/platform/enforce.h" + +namespace paddle { +namespace operators { +namespace jit { +namespace gen { + +class MatMulJitCode : public JitCode { + public: + explicit MatMulJitCode(const matmul_attr_t& attr, + size_t code_size = 256 * 1024, + void* code_ptr = nullptr) + : JitCode(code_size, code_ptr), m_(attr.m), n_(attr.n), k_(attr.k) { + PADDLE_ENFORCE_EQ(m_, 1, "Only support m==1 yet"); + this->genCode(); + } + + virtual const char* name() const { + std::string base = "MatMulJitCode"; + base = base + "_M" + std::to_string(m_) + "_N" + std::to_string(n_) + "_K" + + std::to_string(k_); + return base.c_str(); + } + void genCode() override; + + private: + int m_, n_, k_; + + reg64_t param_x{abi_param1}; + reg64_t param_y{abi_param2}; + reg64_t param_z{abi_param3}; + reg64_t param_attr{abi_param4}; + reg64_t reg_tmp{rax}; + + reg64_t reg_ptr_wgt{r10}; +}; + +} // namespace gen +} // namespace jit +} // namespace operators +} // namespace paddle diff --git a/paddle/fluid/operators/jit/gen_base.cc b/paddle/fluid/operators/jit/gen_base.cc index 310da0c76f1ab251d788e54f2305f375f3fb4838..3cd5f6554bdc188ce9ea0c0b85c84d032c509600 100644 --- a/paddle/fluid/operators/jit/gen_base.cc +++ b/paddle/fluid/operators/jit/gen_base.cc @@ -16,6 +16,8 @@ #include #include #include +#include +#include "paddle/fluid/platform/cpu_info.h" DEFINE_bool(dump_jitcode, false, "Whether to dump the jitcode to file"); @@ -38,6 +40,35 @@ void GenBase::dumpCode(const unsigned char* code) const { } } +std::vector packed_groups(int n, int k, int* block_out, int* rest_out) { + int block; + int max_num_regs; + if (platform::MayIUse(platform::avx512f)) { + block = ZMM_FLOAT_BLOCK; + max_num_regs = 32; + } else { + block = YMM_FLOAT_BLOCK; + max_num_regs = 16; + } + // one for x, one for y, others for z + const int max_used_regs_for_n = 
max_num_regs - 2; + const int aligned_n = n % block == 0 ? n : (n / block + 1) * block; + const int num_block = aligned_n / block; + const int num_groups = num_block / max_used_regs_for_n; + std::vector groups(num_groups, max_used_regs_for_n); + int rest_num_regs = num_block % max_used_regs_for_n; + if (rest_num_regs != 0) { + groups.push_back(rest_num_regs); + } + if (block_out) { + *block_out = block; + } + if (rest_out) { + *rest_out = n % block; + } + return groups; +} + } // namespace jit } // namespace operators } // namespace paddle diff --git a/paddle/fluid/operators/jit/gen_base.h b/paddle/fluid/operators/jit/gen_base.h index 4af01a437670aa6a07d370ff23ed2abd369f69a3..d808a332472ae86240cb63356cb417123523366a 100644 --- a/paddle/fluid/operators/jit/gen_base.h +++ b/paddle/fluid/operators/jit/gen_base.h @@ -16,6 +16,7 @@ #include #include // for unique_ptr +#include #include "paddle/fluid/operators/jit/kernel_base.h" DECLARE_bool(dump_jitcode); @@ -67,6 +68,11 @@ class JitCodeCreator : public GenCreator { virtual std::unique_ptr CreateJitCode(const Attr& attr) const = 0; }; +// unify the method of packed groups +// output the packed groups which used in weights, the block size and rest size +std::vector packed_groups(int n, int k, int* block = nullptr, + int* rest = nullptr); + } // namespace jit } // namespace operators } // namespace paddle diff --git a/paddle/fluid/operators/jit/helper.cc b/paddle/fluid/operators/jit/helper.cc index 4dac2f2460f72c7da63f48c82549b948cc253153..e7292fe2bd8031aa5bbff68e7c2305a238085bf1 100644 --- a/paddle/fluid/operators/jit/helper.cc +++ b/paddle/fluid/operators/jit/helper.cc @@ -14,6 +14,8 @@ #include "paddle/fluid/operators/jit/helper.h" #include // tolower +#include +#include #include "paddle/fluid/platform/enforce.h" namespace paddle { @@ -91,6 +93,41 @@ KernelType to_kerneltype(const std::string& act) { return kNone; } +template <> +void pack_weights(const float* src, float* dst, int n, int k) { + int block, rest; + const auto groups = packed_groups(n, k, &block, &rest); + std::for_each(groups.begin(), groups.end(), [&](int i) { + PADDLE_ENFORCE_GT(i, 0, "each element of groups should be larger than 0."); + }); + int sum = std::accumulate(groups.begin(), groups.end(), 0); + std::memset(dst, 0, k * sum * block * sizeof(float)); + PADDLE_ENFORCE_GE(sum * block, n, + "The packed n should be equal to or larger than n"); + + const int block_len = sizeof(float) * block; + int n_offset = 0; + + for (size_t g = 0; g < groups.size(); ++g) { + const float* from = src + n_offset; + for (int j = 0; j < k; ++j) { + size_t copy_sz = groups[g] * block_len; + if (g == groups.size() - 1 && rest != 0) { + copy_sz = (groups[g] - 1) * block_len + rest * sizeof(float); + } + std::memcpy(dst, from + j * n, copy_sz); + dst += groups[g] * block; + } + n_offset += groups[g] * block; + } +} + +template +typename std::enable_if::value>::type pack_weights( + const T* src, T* dst, int n, int k) { + PADDLE_THROW("Only support pack with float type."); +} + } // namespace jit } // namespace operators } // namespace paddle diff --git a/paddle/fluid/operators/jit/helper.h b/paddle/fluid/operators/jit/helper.h index 7bdc45779b7d39d36db0d52ca9361943cdcdef3e..d5773d65940127ea0a9b77ed2760bd371b778f4c 100644 --- a/paddle/fluid/operators/jit/helper.h +++ b/paddle/fluid/operators/jit/helper.h @@ -118,26 +118,33 @@ typename KernelTuples::func_type Get( return GetRefer(); } -template -class KernelFuncsCache { +template +class KernelFuncs { public: - KernelFuncsCache() = default; - static 
KernelFuncsCache& Instance() { - static thread_local KernelFuncsCache g_func_cache; + KernelFuncs() = default; + static KernelFuncs& Cache() { + static thread_local KernelFuncs g_func_cache; return g_func_cache; } bool Has(int key) const { return funcs_.find(key) != funcs_.end(); } - typename KernelTuples::func_type At(int key) { return funcs_.at(key); } - void Insert(int key, typename KernelTuples::func_type func) { funcs_.emplace(key, func); } + typename KernelTuples::func_type At(int key) { + if (Has(key)) { + return funcs_.at(key); + } + auto func = Get(key); + Insert(key, func); + return func; + } + private: std::unordered_map funcs_; - DISABLE_COPY_AND_ASSIGN(KernelFuncsCache); + DISABLE_COPY_AND_ASSIGN(KernelFuncs); }; const char* to_string(KernelType kt); @@ -152,17 +159,28 @@ inline std::ostream& operator<<(std::ostream& os, const lstm_attr_t& attr) { << (attr.use_peephole ? "True" : "False") << "]"; return os; } + inline std::ostream& operator<<(std::ostream& os, const gru_attr_t& attr) { os << "dim_size[" << attr.d << "],act_gate[" << to_string(attr.act_gate) << "],act_cand[" << to_string(attr.act_cand) << "]"; return os; } + inline std::ostream& operator<<(std::ostream& os, const seq_pool_attr_t& attr) { os << "height_size[" << attr.h << "],width_size[" << attr.w << "],pool_type[" << to_string(attr.type) << "]"; return os; } +inline std::ostream& operator<<(std::ostream& os, const matmul_attr_t& attr) { + os << "M[" << attr.m << "],N[" << attr.n << "],K[" << attr.k << "]"; + return os; +} + +// expose the method to pack matmul weight +template +void pack_weights(const T* src, T* dst, int n, int k); + } // namespace jit } // namespace operators } // namespace paddle diff --git a/paddle/fluid/operators/jit/kernel_base.h b/paddle/fluid/operators/jit/kernel_base.h index 42a58580f7b1e0832af57398ba9c29882b6cc6fb..4a8f61146a1921fa1d5f6b7e15af40cd45d31a22 100644 --- a/paddle/fluid/operators/jit/kernel_base.h +++ b/paddle/fluid/operators/jit/kernel_base.h @@ -145,11 +145,19 @@ struct SeqPoolTuples { typedef void (*func_type)(const T*, T*, const seq_pool_attr_t*); }; +typedef struct matmul_attr_s { + int m, n, k; + void* packed_weight{nullptr}; + matmul_attr_s() = default; + explicit matmul_attr_s(int m_, int n_, int k_, void* packed_weight_ = nullptr) + : m(m_), n(n_), k(k_), packed_weight(packed_weight_) {} +} matmul_attr_t; + template struct MatMulTuples { typedef T data_type; - typedef int attr_type; - typedef void (*func_type)(const T*, const T*, T*, int, int, int); + typedef matmul_attr_t attr_type; + typedef void (*func_type)(const T*, const T*, T*, const matmul_attr_t*); }; template diff --git a/paddle/fluid/operators/jit/kernel_key.cc b/paddle/fluid/operators/jit/kernel_key.cc index 61de38688664f83775c0c4e5aa6f7e06c3602ddb..1e4a8884e78c5d3c1748988f05ecf461a6f0eb94 100644 --- a/paddle/fluid/operators/jit/kernel_key.cc +++ b/paddle/fluid/operators/jit/kernel_key.cc @@ -49,6 +49,13 @@ size_t JitCodeKey(const seq_pool_attr_t& attr) { return (key << pool_type_shift) + static_cast(attr.type); } +template <> +size_t JitCodeKey(const matmul_attr_t& attr) { + size_t key = attr.m; + constexpr int shift = 21; + return (key << shift * 2) + ((static_cast(attr.n)) << shift) + attr.k; +} + } // namespace jit } // namespace operators } // namespace paddle diff --git a/paddle/fluid/operators/jit/more/mix/mix.cc b/paddle/fluid/operators/jit/more/mix/mix.cc index 0f42ac158ca7926981df55936cb903d5f4ae4806..0036d1c238b17768c4df61af22a85588990e1815 100644 --- 
a/paddle/fluid/operators/jit/more/mix/mix.cc +++ b/paddle/fluid/operators/jit/more/mix/mix.cc @@ -49,49 +49,16 @@ void VTanh(const T* x, T* y, int n) { } void Softmax(const T* x, T* y, int n, int bs) { - typename XRNTuples::func_type compute_hmax{nullptr}; - typename XRNTuples::func_type compute_hsum{nullptr}; - typename AXYNTuples::func_type compute_vscal{nullptr}; - typename AXYNTuples::func_type compute_vaddbias{nullptr}; - typename XYNTuples::func_type compute_vexp{nullptr}; - - if (!KernelFuncsCache>::Instance().Has(n)) { - compute_hmax = Get, platform::CPUPlace>(n); - KernelFuncsCache>::Instance().Insert(n, compute_hmax); - } else { - compute_hmax = KernelFuncsCache>::Instance().At(n); - } - - if (!KernelFuncsCache>::Instance().Has(n)) { - compute_hsum = Get, platform::CPUPlace>(n); - KernelFuncsCache>::Instance().Insert(n, compute_hsum); - } else { - compute_hsum = KernelFuncsCache>::Instance().At(n); - } - - if (!KernelFuncsCache>::Instance().Has(n)) { - compute_vscal = Get, platform::CPUPlace>(n); - KernelFuncsCache>::Instance().Insert(n, - compute_vscal); - } else { - compute_vscal = KernelFuncsCache>::Instance().At(n); - } - - if (!KernelFuncsCache>::Instance().Has(n)) { - compute_vaddbias = Get, platform::CPUPlace>(n); - KernelFuncsCache>::Instance().Insert( - n, compute_vaddbias); - } else { - compute_vaddbias = - KernelFuncsCache>::Instance().At(n); - } - - if (!KernelFuncsCache>::Instance().Has(n)) { - compute_vexp = Get, platform::CPUPlace>(n); - KernelFuncsCache>::Instance().Insert(n, compute_vexp); - } else { - compute_vexp = KernelFuncsCache>::Instance().At(n); - } + auto compute_hmax = + KernelFuncs, platform::CPUPlace>::Cache().At(n); + auto compute_hsum = + KernelFuncs, platform::CPUPlace>::Cache().At(n); + auto compute_vscal = + KernelFuncs, platform::CPUPlace>::Cache().At(n); + auto compute_vaddbias = + KernelFuncs, platform::CPUPlace>::Cache().At(n); + auto compute_vexp = + KernelFuncs, platform::CPUPlace>::Cache().At(n); for (int i = 0; i < bs; ++i) { T scalar; diff --git a/paddle/fluid/operators/jit/more/mkl/mkl.cc b/paddle/fluid/operators/jit/more/mkl/mkl.cc index 28a37198dae19a57509934ec784746bc23436e7a..4c999131ab116ebe3484355158993558b02cc4b2 100644 --- a/paddle/fluid/operators/jit/more/mkl/mkl.cc +++ b/paddle/fluid/operators/jit/more/mkl/mkl.cc @@ -25,17 +25,19 @@ namespace more { namespace mkl { template <> -void MatMul(const float* a, const float* b, float* c, int m, int n, - int k) { - platform::dynload::cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, - n, k, 1.f, a, k, b, n, 0.f, c, n); +void MatMul(const float* a, const float* b, float* c, + const matmul_attr_t* attr) { + platform::dynload::cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, + attr->m, attr->n, attr->k, 1.f, a, attr->k, b, + attr->n, 0.f, c, attr->n); } template <> -void MatMul(const double* a, const double* b, double* c, int m, int n, - int k) { - platform::dynload::cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, - n, k, 1.0, a, k, b, n, 0.0, c, n); +void MatMul(const double* a, const double* b, double* c, + const matmul_attr_t* attr) { + platform::dynload::cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, + attr->m, attr->n, attr->k, 1.0, a, attr->k, b, + attr->n, 0.0, c, attr->n); } template <> @@ -127,11 +129,6 @@ void ASum(const double* x, double* res, int n) { } // TODO(TJ): tuning me carefully on AVX, AVX2 and AVX512 -template <> -bool MatMulKernel::UseMe(const int& d) const { - return platform::MayIUse(platform::avx); -} - template <> bool 
VMulKernel::UseMe(const int& d) const { return platform::MayIUse(platform::avx512f) && d > 512; @@ -139,7 +136,7 @@ bool VMulKernel::UseMe(const int& d) const { template <> bool VAddKernel::UseMe(const int& d) const { - return platform::MayIUse(platform::avx512f) && d > 512; + return platform::MayIUse(platform::avx) && d > 512; } template <> @@ -177,6 +174,16 @@ bool SeqPoolKernel::UseMe(const seq_pool_attr_t& attr) const { return true; } +template <> +bool MatMulKernel::UseMe(const matmul_attr_t& attr) const { + return platform::MayIUse(platform::avx); +} + +template <> +bool MatMulKernel::UseMe(const matmul_attr_t& attr) const { + return true; +} + template <> bool SoftmaxKernel::UseMe(const int& d) const { // tuned on avx2 @@ -189,7 +196,6 @@ bool SoftmaxKernel::UseMe(const int& d) const { return true; \ } -AWALYS_USE_ME_WITH_DOUBLE(MatMul); AWALYS_USE_ME_WITH_DOUBLE(VMul); AWALYS_USE_ME_WITH_DOUBLE(VAdd); AWALYS_USE_ME_WITH_DOUBLE(VScal); diff --git a/paddle/fluid/operators/jit/more/mkl/mkl.h b/paddle/fluid/operators/jit/more/mkl/mkl.h index 6b95b9c872dc12cccaef0b0737edd760447a47d0..8130b87326f1887f232022ab30fa7bf42b0723e7 100644 --- a/paddle/fluid/operators/jit/more/mkl/mkl.h +++ b/paddle/fluid/operators/jit/more/mkl/mkl.h @@ -26,7 +26,7 @@ namespace more { namespace mkl { template -void MatMul(const T* a, const T* b, T* c, int m, int n, int k); +void MatMul(const T* a, const T* b, T* c, const matmul_attr_t* attr); template void VMul(const T* x, const T* y, T* z, int n); diff --git a/paddle/fluid/operators/jit/refer/refer.h b/paddle/fluid/operators/jit/refer/refer.h index 5a074db7e0e8ab49dc281e1809edef23e6a25c42..0c4a985f8e8ece0a6169478fa3a9b111f5a6f3b4 100644 --- a/paddle/fluid/operators/jit/refer/refer.h +++ b/paddle/fluid/operators/jit/refer/refer.h @@ -363,17 +363,19 @@ void SeqPool(const T* x, T* y, const seq_pool_attr_t* attr) { // A(M,K) * B(K,N) = C(M,N) template -void MatMul(const T* A, const T* B, T* C, int M, int N, int K) { +void MatMul(const T* A, const T* B, T* C, const matmul_attr_t* attr) { + int M = attr->m; + int N = attr->n; + int K = attr->k; for (int m = 0; m < M; ++m) { const T* pa = A + m * K; T* pc = C + m * N; for (int n = 0; n < N; ++n) { const T* pb = B + n; - T sum = static_cast(0); - for (int k = 0; k < K; ++k) { - sum += (pa[k] * pb[k * N]); + pc[n] = pa[0] * pb[0]; + for (int k = 1; k < K; ++k) { + pc[n] += pa[k] * pb[k * N]; } - *(pc + n) = sum; } } } diff --git a/paddle/fluid/operators/jit/test.cc b/paddle/fluid/operators/jit/test.cc index cc461552898fc68661ce548a520d65215d3572b4..237e588d35cc3b33658a830db34676967818aab6 100644 --- a/paddle/fluid/operators/jit/test.cc +++ b/paddle/fluid/operators/jit/test.cc @@ -22,7 +22,7 @@ #include "paddle/fluid/platform/cpu_info.h" #include "paddle/fluid/platform/place.h" -static double acc = 1e-5; +DEFINE_double(acc, 1e-5, "Test accuracy threshold."); template void RandomVec(const int n, T* a, const T lower = static_cast(-20.f), @@ -39,7 +39,7 @@ template void ExpectEQ(const T* target, const T* refer, int n) { if (std::is_floating_point::value) { for (int i = 0; i < n; ++i) { - EXPECT_NEAR(target[i], refer[i], acc); + EXPECT_NEAR(target[i], refer[i], FLAGS_acc); } } else { for (int i = 0; i < n; ++i) { @@ -272,21 +272,23 @@ struct TestFuncWithRefer, std::vector, std::vector, template struct TestFuncWithRefer, std::vector, std::vector, - std::vector, int, int, int> { + std::vector, + typename jit::MatMulTuples::attr_type> { void operator()(const typename jit::MatMulTuples::func_type tgt, const std::vector& a, const 
std::vector& b, - const std::vector& cref, int m, int n, int k) { + const std::vector& cref, + const typename jit::MatMulTuples::attr_type& attr) { EXPECT_TRUE(tgt != nullptr); - EXPECT_EQ(a.size(), static_cast(m * k)); - EXPECT_EQ(b.size(), static_cast(k * n)); - EXPECT_EQ(cref.size(), static_cast(m * n)); + EXPECT_EQ(a.size(), static_cast(attr.m * attr.k)); + EXPECT_EQ(b.size(), static_cast(attr.k * attr.n)); + EXPECT_EQ(cref.size(), static_cast(attr.m * attr.n)); std::vector c(cref.size()); const T* a_data = a.data(); const T* b_data = b.data(); const T* cref_data = cref.data(); T* c_data = c.data(); - tgt(a_data, b_data, c_data, m, n, k); - ExpectEQ(c_data, cref_data, m * n); + tgt(a_data, b_data, c_data, &attr); + ExpectEQ(c_data, cref_data, attr.m * attr.n); } }; @@ -383,8 +385,8 @@ void TestAXYNKernel() { template void TestXRNKernel() { VLOG(10) << "===== Test JITKernel " << jit::to_string(KT); - auto last_acc = acc; - acc = 1e-4; + auto last_acc = FLAGS_acc; + FLAGS_acc = 1e-4; for (int d : TestSizes()) { auto ref = jit::GetRefer>(); EXPECT_TRUE(ref != nullptr); @@ -395,7 +397,7 @@ void TestXRNKernel() { TestAllImpls, PlaceType, std::vector, T>(d, x, ref_res); } - acc = last_acc; + FLAGS_acc = last_acc; } template @@ -535,9 +537,10 @@ void TestSeqPoolKernel() { template void TestMatMulKernel() { VLOG(10) << "===== Test JITKernel " << jit::to_string(KT); - auto last_acc = acc; - // TODO(intel): this should be acc issue of MKL - acc = 1e-3; + auto last_acc = FLAGS_acc; + // TODO(intel): fix MKL acc issue + // https://github.com/PaddlePaddle/Paddle/issues/15447 + FLAGS_acc = 1e-3; for (int m : {1, 2, 3, 4}) { for (int n : {1, 2, 3, 4}) { for (int k : TestSizes()) { @@ -549,13 +552,14 @@ void TestMatMulKernel() { const T* a_data = a.data(); const T* b_data = b.data(); T* c_data = c.data(); - ref(a_data, b_data, c_data, m, n, k); + const jit::matmul_attr_t attr{m, n, k}; + ref(a_data, b_data, c_data, &attr); TestAllImpls, PlaceType, std::vector, - std::vector, std::vector>(k, a, b, c, m, n, k); + std::vector, std::vector>(attr, a, b, c, attr); } } } - acc = last_acc; + FLAGS_acc = last_acc; } template diff --git a/paddle/fluid/operators/math/fc_compute.h b/paddle/fluid/operators/math/fc_compute.h index cddd0a18db53a7ddf9ca14d5f373180586ef6a31..0ad57c51be79cd3577b43c9af777bff710308fac 100644 --- a/paddle/fluid/operators/math/fc_compute.h +++ b/paddle/fluid/operators/math/fc_compute.h @@ -30,15 +30,17 @@ inline void FCCompute(const BlasT& blas, const int M, return; } if (relu) { - auto compute = - jit::Get, platform::CPUPlace>(N); + auto compute = jit::KernelFuncs, + platform::CPUPlace>::Cache() + .At(N); for (int i = 0; i < M; i++) { T* dst = Y + i * N; compute(B, dst, dst, N); } } else { - auto compute = - jit::Get, platform::CPUPlace>(N); + auto compute = jit::KernelFuncs, + platform::CPUPlace>::Cache() + .At(N); #ifdef PADDLE_WITH_MKLML #pragma omp parallel for #endif diff --git a/paddle/fluid/operators/math/softmax_impl.h b/paddle/fluid/operators/math/softmax_impl.h index 1ff9ff684fc8001afb0f768a033b4c5bd1592702..a1cb3f972826a67721b00ce6df0ec48cc34d6e03 100644 --- a/paddle/fluid/operators/math/softmax_impl.h +++ b/paddle/fluid/operators/math/softmax_impl.h @@ -82,8 +82,9 @@ class SoftmaxFunctor> { const int kClassDim = 1; // 2D data. 
Batch x C auto compute_softmax = - jit::Get, platform::CPUPlace>( - in_dims[kClassDim]); + jit::KernelFuncs, + platform::CPUPlace>::Cache() + .At(in_dims[kClassDim]); compute_softmax(in_data, out_data, in_dims[kClassDim], in_dims[kBatchDim]); } }; diff --git a/paddle/fluid/operators/ngraph/ngraph_bridge.cc b/paddle/fluid/operators/ngraph/ngraph_bridge.cc index d6e897ed4666261cdd0bd6565f61abb218d971e5..9f92bc01befb496c103bcd367ae9cfc5c8f402b0 100644 --- a/paddle/fluid/operators/ngraph/ngraph_bridge.cc +++ b/paddle/fluid/operators/ngraph/ngraph_bridge.cc @@ -31,6 +31,8 @@ std::map>>)>> NgraphBridge::NG_NODE_MAP = { + {"conv2d", NG_OPS::BuildConv2dNode}, + {"conv2d_grad", NG_OPS::BuildConv2dGradNode}, {"elementwise_add", NG_OPS::BuildElementwiseAddNode}, {"elementwise_add_grad", NG_OPS::BuildElementwiseAddGradNode}, {"fill_constant", NG_OPS::BuildFillConstantNode}, @@ -38,6 +40,8 @@ std::map +#include +#include "ngraph/ngraph.hpp" +#include "paddle/fluid/platform/ngraph_helper.h" + +namespace paddle { +namespace operators { +namespace ngraphs { + +std::shared_ptr GroupedConvolution( + const std::shared_ptr& data_batch, + const std::shared_ptr& filters, const ngraph::Strides strides, + const ngraph::Strides dilations, const ngraph::CoordinateDiff& paddings, + size_t groups) { + auto& data_shape = data_batch->get_shape(); + auto& filter_shape = filters->get_shape(); + ngraph::NodeVector ng_slices; + + for (size_t i = 0; i < groups; ++i) { + size_t channel_step = filter_shape.at(1); + const std::vector lower_bound{0, i * channel_step, 0, 0}; + const std::vector upper_bound{data_shape.at(0), + (i + 1) * channel_step, + data_shape.at(2), data_shape.at(3)}; + auto data_slice = std::make_shared( + data_batch, lower_bound, upper_bound); + + size_t filter_step = filter_shape.at(0) / groups; + const std::vector filter_lower_bound{i * filter_step, 0, 0, 0}; + const std::vector filter_upper_bound{ + (i + 1) * filter_step, filter_shape.at(1), filter_shape.at(2), + filter_shape.at(3)}; + auto filter_slice = std::make_shared( + filters, filter_lower_bound, filter_upper_bound); + auto ng_conv = std::make_shared( + data_slice, filter_slice, strides, dilations, paddings, paddings); + ng_slices.push_back(ng_conv); + } + + size_t concat_axis = 1; + return std::make_shared(ng_slices, concat_axis); +} + +std::shared_ptr GroupedGradConvolutionFilter( + const std::shared_ptr& data_batch, + const std::shared_ptr& filters, + const std::shared_ptr& doutput, const ngraph::Strides strides, + const ngraph::Strides dilations, const ngraph::CoordinateDiff& paddings, + size_t groups) { + auto& data_shape = data_batch->get_shape(); + auto& filter_shape = filters->get_shape(); + auto& out_shape = doutput->get_shape(); + ngraph::NodeVector ng_slices; + + for (size_t i = 0; i < groups; ++i) { + size_t channel_step = filter_shape.at(1); + const std::vector lower_bound{0, i * channel_step, 0, 0}; + const std::vector upper_bound{data_shape.at(0), + (i + 1) * channel_step, + data_shape.at(2), data_shape.at(3)}; + auto data_slice = std::make_shared( + data_batch, lower_bound, upper_bound); + + size_t filter_step = data_shape.at(0); + + const std::vector filter_lower_bound{i * filter_step, 0, 0, 0}; + const std::vector filter_upper_bound{ + (i + 1) * filter_step, filter_shape.at(1), filter_shape.at(2), + filter_shape.at(3)}; + auto filter_slice = std::make_shared( + filters, filter_lower_bound, filter_upper_bound); + + const std::vector olower_bound{0, i * filter_step, 0, 0}; + const std::vector oupper_bound{out_shape.at(0), + (i + 1) 
* filter_step, + out_shape.at(2), out_shape.at(3)}; + auto out_slice = std::make_shared(doutput, olower_bound, + oupper_bound); + + auto ng_conv = std::make_shared( + data_slice, filter_slice->get_shape(), out_slice, strides, dilations, + paddings, paddings, ngraph::Strides{1, 1}); + + ng_slices.push_back(ng_conv); + } + + size_t concat_axis = 0; + return std::make_shared(ng_slices, concat_axis); +} + +std::shared_ptr GroupedGradConvolutionData( + const std::shared_ptr& data_batch, + const std::shared_ptr& filters, + const std::shared_ptr& doutput, const ngraph::Strides strides, + const ngraph::Strides dilations, const ngraph::CoordinateDiff& paddings, + size_t groups) { + auto& data_shape = data_batch->get_shape(); + auto& filter_shape = filters->get_shape(); + auto& out_shape = doutput->get_shape(); + ngraph::NodeVector ng_slices; + + for (size_t i = 0; i < groups; ++i) { + size_t channel_step = filter_shape.at(1); + const std::vector lower_bound{0, i * channel_step, 0, 0}; + const std::vector upper_bound{data_shape.at(0), + (i + 1) * channel_step, + data_shape.at(2), data_shape.at(3)}; + auto data_slice = std::make_shared( + data_batch, lower_bound, upper_bound); + + size_t filter_step = data_shape.at(0); + + const std::vector filter_lower_bound{i * filter_step, 0, 0, 0}; + const std::vector filter_upper_bound{ + (i + 1) * filter_step, filter_shape.at(1), filter_shape.at(2), + filter_shape.at(3)}; + auto filter_slice = std::make_shared( + filters, filter_lower_bound, filter_upper_bound); + + const std::vector olower_bound{0, i * filter_step, 0, 0}; + const std::vector oupper_bound{out_shape.at(0), + (i + 1) * filter_step, + out_shape.at(2), out_shape.at(3)}; + auto out_slice = std::make_shared(doutput, olower_bound, + oupper_bound); + + auto ng_conv = std::make_shared( + data_slice->get_shape(), filter_slice, out_slice, strides, dilations, + paddings, paddings, ngraph::Strides{1, 1}); + ng_slices.push_back(ng_conv); + } + + size_t concat_axis = 1; + return std::make_shared(ng_slices, concat_axis); +} + +void BuildConv2dNode( + const std::shared_ptr& op, + std::shared_ptr< + std::unordered_map>> + ngb_node_map) { + auto op_attrs = paddle::framework::AttrReader(op->Attrs()); + auto filters = paddle::platform::GetInputNode(op, "Filter", ngb_node_map); + auto input = paddle::platform::GetInputNode(op, "Input", ngb_node_map); + + std::vector strides = op_attrs.Get>("strides"); + std::vector paddings = op_attrs.Get>("paddings"); + std::vector dilations = op_attrs.Get>("dilations"); + + const ngraph::Strides ng_strides{static_cast(strides.at(0)), + static_cast(strides.at(1))}; + const ngraph::Strides ng_dilations{static_cast(dilations.at(0)), + static_cast(dilations.at(1))}; + const ngraph::CoordinateDiff ng_paddings{ + static_cast(paddings.at(0)), + static_cast(paddings.at(1))}; + + int groups = static_cast(op_attrs.Get("groups")); + PADDLE_ENFORCE_GE(groups, 1, "conv groups needs be no less than 1"); + + std::shared_ptr result; + if (groups == 1) { + result = std::make_shared( + input, filters, ng_strides, ng_dilations, ng_paddings, ng_paddings); + } else { + result = GroupedConvolution(input, filters, ng_strides, ng_dilations, + ng_paddings, groups); + } + paddle::platform::SetOutputNode(op, "Output", result, ngb_node_map); +} + +void BuildConv2dGradNode( + const std::shared_ptr& op, + std::shared_ptr< + std::unordered_map>> + ngb_node_map) { + auto op_attrs = paddle::framework::AttrReader(op->Attrs()); + auto filter = paddle::platform::GetInputNode(op, "Filter", ngb_node_map); + auto input 
= paddle::platform::GetInputNode(op, "Input", ngb_node_map); + auto doutput = + paddle::platform::GetInputNode(op, "Output@GRAD", ngb_node_map); + + int groups = op_attrs.Get("groups"); + std::vector strides = op_attrs.Get>("strides"); + std::vector paddings = op_attrs.Get>("paddings"); + std::vector dilations = op_attrs.Get>("dilations"); + + const ngraph::Strides ng_strides{static_cast(strides.at(0)), + static_cast(strides.at(1))}; + const ngraph::Strides ng_dilations{static_cast(dilations.at(0)), + static_cast(dilations.at(1))}; + const ngraph::CoordinateDiff ng_paddings{ + static_cast(paddings.at(0)), + static_cast(paddings.at(1))}; + + std::shared_ptr dfilter; + std::shared_ptr dinput; + if (groups == 1) { + dfilter = std::make_shared( + input, filter->get_shape(), doutput, ng_strides, ng_dilations, + ng_paddings, ng_paddings, ngraph::Strides{1, 1}); + + dinput = std::make_shared( + input->get_shape(), filter, doutput, ng_strides, ng_dilations, + ng_paddings, ng_paddings, ngraph::Strides{1, 1}); + + } else { + dfilter = GroupedGradConvolutionFilter(input, filter, doutput, ng_strides, + ng_dilations, ng_paddings, groups); + dinput = GroupedGradConvolutionData(input, filter, doutput, ng_strides, + ng_dilations, ng_paddings, groups); + } + + paddle::platform::SetOutputNode(op, "Filter@GRAD", dfilter, ngb_node_map); + paddle::platform::SetOutputNode(op, "Input@GRAD", dinput, ngb_node_map); +} +} // namespace ngraphs +} // namespace operators +} // namespace paddle diff --git a/paddle/fluid/operators/ngraph/ops/pool2d_op.h b/paddle/fluid/operators/ngraph/ops/pool2d_op.h new file mode 100644 index 0000000000000000000000000000000000000000..836c9d6c185b305d3dd4c9e9d30e23abb0c1431c --- /dev/null +++ b/paddle/fluid/operators/ngraph/ops/pool2d_op.h @@ -0,0 +1,174 @@ +/*Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/ + +#pragma once + +#include +#include + +#include "ngraph/ngraph.hpp" +#include "paddle/fluid/platform/ngraph_helper.h" + +namespace paddle { +namespace operators { +namespace ngraphs { + +void BuildPool2dNode( + const std::shared_ptr& op, + std::shared_ptr< + std::unordered_map>> + ngb_node_map) { + auto op_attrs = paddle::framework::AttrReader(op->Attrs()); + auto x = paddle::platform::GetInputNode(op, "X", ngb_node_map); + auto x_shape = x->get_shape(); + + std::string pooling_type = op_attrs.Get("pooling_type"); + std::vector ksize = op_attrs.Get>("ksize"); + std::vector strides = op_attrs.Get>("strides"); + std::vector paddings = op_attrs.Get>("paddings"); + + PADDLE_ENFORCE_EQ(x_shape.size() - 2, ksize.size(), + "Handling 2d pooling only"); + + if (op_attrs.Get("global_pooling")) { + for (size_t i = 0; i < ksize.size(); ++i) { + paddings[i] = 0; + ksize[i] = static_cast(x_shape.at(i + 2)); + } + } + + ngraph::Shape ng_padding_below{static_cast(paddings.at(0)), + static_cast(paddings.at(1))}; + ngraph::Shape ng_padding_above{static_cast(paddings.at(0)), + static_cast(paddings.at(1))}; + ngraph::Shape ng_ksize_shape{static_cast(ksize.at(0)), + static_cast(ksize.at(1))}; + ngraph::Strides ng_strides{static_cast(strides.at(0)), + static_cast(strides.at(1))}; + + auto ComputeCeiledOutput = [](size_t in, size_t k, size_t p, size_t s) { + return (in - k + 2 * p) / s + 1; + }; + + if (op_attrs.Get("ceil_mode")) { + auto dummy_out = paddle::platform::GetOutputNode(op, "Out", ngb_node_map); + auto dummpy_shape = dummy_out->get_shape(); + for (size_t i = 0; i < ng_padding_above.size(); ++i) { + auto desired_size = ComputeCeiledOutput(x_shape[i + 2], ksize[i], + paddings[i], strides[i]); + if (desired_size != dummpy_shape[i + 2]) { + ng_padding_above[i] += strides[i]; + } + } + } + + bool padding_exclusive = op_attrs.Get("exclusive"); + if (pooling_type == "max") { + auto pool2d = std::make_shared( + x, ng_ksize_shape, ng_strides, ng_padding_below, ng_padding_above); + paddle::platform::SetOutputNode(op, "Out", pool2d, ngb_node_map); + } else if (pooling_type == "avg") { + std::shared_ptr pool2d; + if (op_attrs.Get("adaptive")) { + auto ComputeAdaptive = [](size_t in, size_t k) { + return std::floor(in / k); + }; + ng_strides[0] = x_shape.size() == 4 + ? ComputeAdaptive(x_shape[3], ksize[0]) + : ng_strides[0]; + ng_strides[1] = x_shape.size() == 4 + ? 
ComputeAdaptive(x_shape[3], ksize[0]) + : ng_strides[1]; + pool2d = + std::make_shared(x, ng_ksize_shape, ng_strides); + } else { + pool2d = std::make_shared( + x, ng_ksize_shape, ng_strides, ng_padding_below, ng_padding_above, + !padding_exclusive); + } + paddle::platform::SetOutputNode(op, "Out", pool2d, ngb_node_map); + } else { + PADDLE_THROW("Support max and avg pooling only"); + } +} + +void BuildPool2dGradNode( + const std::shared_ptr& op, + std::shared_ptr< + std::unordered_map>> + ngb_node_map) { + auto op_attrs = paddle::framework::AttrReader(op->Attrs()); + auto out = paddle::platform::GetInputNode(op, "Out", ngb_node_map); + auto dout = paddle::platform::GetInputNode(op, "Out@GRAD", ngb_node_map); + auto x = paddle::platform::GetInputNode(op, "X", ngb_node_map); + auto x_shape = x->get_shape(); + + std::string pooling_type = op_attrs.Get("pooling_type"); + std::vector ksize = op_attrs.Get>("ksize"); + std::vector strides = op_attrs.Get>("strides"); + std::vector paddings = op_attrs.Get>("paddings"); + + PADDLE_ENFORCE_EQ(x_shape.size() - 2, ksize.size(), + "Handling 2d pooling only"); + + if (op_attrs.Get("global_pooling")) { + for (size_t i = 0; i < ksize.size(); ++i) { + paddings[i] = 0; + ksize[i] = static_cast(x_shape.at(i + 2)); + } + } + + ngraph::Shape ng_padding_below{static_cast(paddings.at(0)), + static_cast(paddings.at(1))}; + ngraph::Shape ng_padding_above{static_cast(paddings.at(0)), + static_cast(paddings.at(1))}; + ngraph::Shape ng_ksize_shape{static_cast(ksize.at(0)), + static_cast(ksize.at(1))}; + ngraph::Strides ng_strides{static_cast(strides.at(0)), + static_cast(strides.at(1))}; + + bool padding_exclusive = op_attrs.Get("exclusive"); + if (pooling_type == "max") { + auto pool2d_grad = std::make_shared( + x, dout, out, ng_ksize_shape, ng_strides, ng_padding_below, + ng_padding_above); + paddle::platform::SetOutputNode(op, "X@GRAD", pool2d_grad, ngb_node_map); + } else if (pooling_type == "avg") { + std::shared_ptr pool2d_grad; + if (op_attrs.Get("adaptive")) { + auto ComputeAdaptive = [](size_t in, size_t k) { + return std::floor(in / k); + }; + ng_strides[0] = x_shape.size() == 4 + ? ComputeAdaptive(x_shape[3], ksize[0]) + : ng_strides[0]; + ng_strides[1] = x_shape.size() == 4 + ? ComputeAdaptive(x_shape[3], ksize[0]) + : ng_strides[1]; + pool2d_grad = std::make_shared( + x->get_shape(), dout, ng_ksize_shape, ng_strides, ng_padding_below, + ng_padding_above, !padding_exclusive); + } else { + pool2d_grad = std::make_shared( + x->get_shape(), dout, ng_ksize_shape, ng_strides, ng_padding_below, + ng_padding_above, !padding_exclusive); + } + paddle::platform::SetOutputNode(op, "X@GRAD", pool2d_grad, ngb_node_map); + } else { + PADDLE_THROW("Support max and avg pooling only"); + } +} +} // namespace ngraphs +} // namespace operators +} // namespace paddle diff --git a/paddle/fluid/operators/yolov3_loss_op.h b/paddle/fluid/operators/yolov3_loss_op.h deleted file mode 100644 index 0bb285722ddedf721d98237760ec9868e2134442..0000000000000000000000000000000000000000 --- a/paddle/fluid/operators/yolov3_loss_op.h +++ /dev/null @@ -1,483 +0,0 @@ -/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve. - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. 
- You may obtain a copy of the License at - http://www.apache.org/licenses/LICENSE-2.0 - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. */ - -#pragma once -#include -#include -#include "paddle/fluid/framework/op_registry.h" - -namespace paddle { -namespace operators { - -using Tensor = framework::Tensor; -template -using EigenTensor = framework::EigenTensor; -template -using EigenVector = framework::EigenVector; - -using Array5 = Eigen::DSizes; - -template -static inline bool isZero(T x) { - return fabs(x) < 1e-6; -} - -template -static inline T sigmoid(T x) { - return 1.0 / (exp(-1.0 * x) + 1.0); -} - -template -static inline T CalcMaskPointNum(const Tensor& mask) { - auto mask_t = EigenVector::Flatten(mask); - T count = 0.0; - for (int i = 0; i < mask_t.dimensions()[0]; i++) { - if (mask_t(i)) { - count += 1.0; - } - } - return count; -} - -template -static inline T CalcMSEWithMask(const Tensor& x, const Tensor& y, - const Tensor& mask) { - auto x_t = EigenVector::Flatten(x); - auto y_t = EigenVector::Flatten(y); - auto mask_t = EigenVector::Flatten(mask); - - T error_sum = 0.0; - T points = 0.0; - for (int i = 0; i < x_t.dimensions()[0]; i++) { - if (mask_t(i)) { - error_sum += pow(x_t(i) - y_t(i), 2); - points += 1; - } - } - return (error_sum / points); -} - -template -static void CalcMSEGradWithMask(Tensor* grad, const Tensor& x, const Tensor& y, - const Tensor& mask, T mf) { - auto grad_t = EigenVector::Flatten(*grad).setConstant(0.0); - auto x_t = EigenVector::Flatten(x); - auto y_t = EigenVector::Flatten(y); - auto mask_t = EigenVector::Flatten(mask); - - for (int i = 0; i < x_t.dimensions()[0]; i++) { - if (mask_t(i)) { - grad_t(i) = 2.0 * (x_t(i) - y_t(i)) / mf; - } - } -} - -template -static inline T CalcBCEWithMask(const Tensor& x, const Tensor& y, - const Tensor& mask) { - auto x_t = EigenVector::Flatten(x); - auto y_t = EigenVector::Flatten(y); - auto mask_t = EigenVector::Flatten(mask); - - T error_sum = 0.0; - T points = 0.0; - for (int i = 0; i < x_t.dimensions()[0]; i++) { - if (mask_t(i)) { - error_sum += - -1.0 * (y_t(i) * log(x_t(i)) + (1.0 - y_t(i)) * log(1.0 - x_t(i))); - points += 1; - } - } - return (error_sum / points); -} - -template -static inline void CalcBCEGradWithMask(Tensor* grad, const Tensor& x, - const Tensor& y, const Tensor& mask, - T mf) { - auto grad_t = EigenVector::Flatten(*grad).setConstant(0.0); - auto x_t = EigenVector::Flatten(x); - auto y_t = EigenVector::Flatten(y); - auto mask_t = EigenVector::Flatten(mask); - - for (int i = 0; i < x_t.dimensions()[0]; i++) { - if (mask_t(i)) { - grad_t(i) = ((1.0 - y_t(i)) / (1.0 - x_t(i)) - y_t(i) / x_t(i)) / mf; - } - } -} - -template -static void CalcPredResult(const Tensor& input, Tensor* pred_conf, - Tensor* pred_class, Tensor* pred_x, Tensor* pred_y, - Tensor* pred_w, Tensor* pred_h, const int anchor_num, - const int class_num) { - const int n = input.dims()[0]; - const int h = input.dims()[2]; - const int w = input.dims()[3]; - const int box_attr_num = 5 + class_num; - - auto input_t = EigenTensor::From(input); - auto pred_conf_t = EigenTensor::From(*pred_conf); - auto pred_class_t = EigenTensor::From(*pred_class); - auto pred_x_t = EigenTensor::From(*pred_x); - auto pred_y_t = EigenTensor::From(*pred_y); - auto pred_w_t = 
EigenTensor::From(*pred_w); - auto pred_h_t = EigenTensor::From(*pred_h); - - for (int i = 0; i < n; i++) { - for (int an_idx = 0; an_idx < anchor_num; an_idx++) { - for (int j = 0; j < h; j++) { - for (int k = 0; k < w; k++) { - pred_x_t(i, an_idx, j, k) = - sigmoid(input_t(i, box_attr_num * an_idx, j, k)); - pred_y_t(i, an_idx, j, k) = - sigmoid(input_t(i, box_attr_num * an_idx + 1, j, k)); - pred_w_t(i, an_idx, j, k) = - input_t(i, box_attr_num * an_idx + 2, j, k); - pred_h_t(i, an_idx, j, k) = - input_t(i, box_attr_num * an_idx + 3, j, k); - - pred_conf_t(i, an_idx, j, k) = - sigmoid(input_t(i, box_attr_num * an_idx + 4, j, k)); - - for (int c = 0; c < class_num; c++) { - pred_class_t(i, an_idx, j, k, c) = - sigmoid(input_t(i, box_attr_num * an_idx + 5 + c, j, k)); - } - } - } - } - } -} - -template -static T CalcBoxIoU(std::vector box1, std::vector box2) { - T b1_x1 = box1[0] - box1[2] / 2; - T b1_x2 = box1[0] + box1[2] / 2; - T b1_y1 = box1[1] - box1[3] / 2; - T b1_y2 = box1[1] + box1[3] / 2; - T b2_x1 = box2[0] - box2[2] / 2; - T b2_x2 = box2[0] + box2[2] / 2; - T b2_y1 = box2[1] - box2[3] / 2; - T b2_y2 = box2[1] + box2[3] / 2; - - T b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1); - T b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1); - - T inter_rect_x1 = std::max(b1_x1, b2_x1); - T inter_rect_y1 = std::max(b1_y1, b2_y1); - T inter_rect_x2 = std::min(b1_x2, b2_x2); - T inter_rect_y2 = std::min(b1_y2, b2_y2); - T inter_area = std::max(inter_rect_x2 - inter_rect_x1, static_cast(0.0)) * - std::max(inter_rect_y2 - inter_rect_y1, static_cast(0.0)); - - return inter_area / (b1_area + b2_area - inter_area); -} - -template -static void PreProcessGTBox(const Tensor& gt_box, const Tensor& gt_label, - const float ignore_thresh, std::vector anchors, - const int grid_size, Tensor* obj_mask, - Tensor* noobj_mask, Tensor* tx, Tensor* ty, - Tensor* tw, Tensor* th, Tensor* tconf, - Tensor* tclass) { - const int n = gt_box.dims()[0]; - const int b = gt_box.dims()[1]; - const int anchor_num = anchors.size() / 2; - auto gt_box_t = EigenTensor::From(gt_box); - auto gt_label_t = EigenTensor::From(gt_label); - auto obj_mask_t = EigenTensor::From(*obj_mask).setConstant(0); - auto noobj_mask_t = EigenTensor::From(*noobj_mask).setConstant(1); - auto tx_t = EigenTensor::From(*tx).setConstant(0.0); - auto ty_t = EigenTensor::From(*ty).setConstant(0.0); - auto tw_t = EigenTensor::From(*tw).setConstant(0.0); - auto th_t = EigenTensor::From(*th).setConstant(0.0); - auto tconf_t = EigenTensor::From(*tconf).setConstant(0.0); - auto tclass_t = EigenTensor::From(*tclass).setConstant(0.0); - - for (int i = 0; i < n; i++) { - for (int j = 0; j < b; j++) { - if (isZero(gt_box_t(i, j, 0)) && isZero(gt_box_t(i, j, 1)) && - isZero(gt_box_t(i, j, 2)) && isZero(gt_box_t(i, j, 3))) { - continue; - } - - int cur_label = gt_label_t(i, j); - T gx = gt_box_t(i, j, 0) * grid_size; - T gy = gt_box_t(i, j, 1) * grid_size; - T gw = gt_box_t(i, j, 2) * grid_size; - T gh = gt_box_t(i, j, 3) * grid_size; - int gi = static_cast(gx); - int gj = static_cast(gy); - - T max_iou = static_cast(0); - T iou; - int best_an_index = -1; - std::vector gt_box_shape({0, 0, gw, gh}); - for (int an_idx = 0; an_idx < anchor_num; an_idx++) { - std::vector anchor_shape({0, 0, static_cast(anchors[2 * an_idx]), - static_cast(anchors[2 * an_idx + 1])}); - iou = CalcBoxIoU(gt_box_shape, anchor_shape); - if (iou > max_iou) { - max_iou = iou; - best_an_index = an_idx; - } - if (iou > ignore_thresh) { - noobj_mask_t(i, an_idx, gj, gi) = 0; - } - } - obj_mask_t(i, 
best_an_index, gj, gi) = 1; - noobj_mask_t(i, best_an_index, gj, gi) = 0; - tx_t(i, best_an_index, gj, gi) = gx - gi; - ty_t(i, best_an_index, gj, gi) = gy - gj; - tw_t(i, best_an_index, gj, gi) = log(gw / anchors[2 * best_an_index]); - th_t(i, best_an_index, gj, gi) = log(gh / anchors[2 * best_an_index + 1]); - tclass_t(i, best_an_index, gj, gi, cur_label) = 1; - tconf_t(i, best_an_index, gj, gi) = 1; - } - } -} - -static void ExpandObjMaskByClassNum(Tensor* obj_mask_expand, - const Tensor& obj_mask) { - const int n = obj_mask_expand->dims()[0]; - const int an_num = obj_mask_expand->dims()[1]; - const int h = obj_mask_expand->dims()[2]; - const int w = obj_mask_expand->dims()[3]; - const int class_num = obj_mask_expand->dims()[4]; - auto obj_mask_expand_t = EigenTensor::From(*obj_mask_expand); - auto obj_mask_t = EigenTensor::From(obj_mask); - - obj_mask_expand_t = obj_mask_t.reshape(Array5(n, an_num, h, w, 1)) - .broadcast(Array5(1, 1, 1, 1, class_num)); -} - -template -static void AddAllGradToInputGrad( - Tensor* grad, T loss, const Tensor& pred_x, const Tensor& pred_y, - const Tensor& pred_conf, const Tensor& pred_class, const Tensor& grad_x, - const Tensor& grad_y, const Tensor& grad_w, const Tensor& grad_h, - const Tensor& grad_conf_target, const Tensor& grad_conf_notarget, - const Tensor& grad_class, const int class_num, const float loss_weight_xy, - const float loss_weight_wh, const float loss_weight_conf_target, - const float loss_weight_conf_notarget, const float loss_weight_class) { - const int n = pred_x.dims()[0]; - const int an_num = pred_x.dims()[1]; - const int h = pred_x.dims()[2]; - const int w = pred_x.dims()[3]; - const int attr_num = class_num + 5; - auto grad_t = EigenTensor::From(*grad).setConstant(0.0); - auto pred_x_t = EigenTensor::From(pred_x); - auto pred_y_t = EigenTensor::From(pred_y); - auto pred_conf_t = EigenTensor::From(pred_conf); - auto pred_class_t = EigenTensor::From(pred_class); - auto grad_x_t = EigenTensor::From(grad_x); - auto grad_y_t = EigenTensor::From(grad_y); - auto grad_w_t = EigenTensor::From(grad_w); - auto grad_h_t = EigenTensor::From(grad_h); - auto grad_conf_target_t = EigenTensor::From(grad_conf_target); - auto grad_conf_notarget_t = EigenTensor::From(grad_conf_notarget); - auto grad_class_t = EigenTensor::From(grad_class); - - for (int i = 0; i < n; i++) { - for (int j = 0; j < an_num; j++) { - for (int k = 0; k < h; k++) { - for (int l = 0; l < w; l++) { - grad_t(i, j * attr_num, k, l) = - grad_x_t(i, j, k, l) * pred_x_t(i, j, k, l) * - (1.0 - pred_x_t(i, j, k, l)) * loss * loss_weight_xy; - grad_t(i, j * attr_num + 1, k, l) = - grad_y_t(i, j, k, l) * pred_y_t(i, j, k, l) * - (1.0 - pred_y_t(i, j, k, l)) * loss * loss_weight_xy; - grad_t(i, j * attr_num + 2, k, l) = - grad_w_t(i, j, k, l) * loss * loss_weight_wh; - grad_t(i, j * attr_num + 3, k, l) = - grad_h_t(i, j, k, l) * loss * loss_weight_wh; - grad_t(i, j * attr_num + 4, k, l) = - grad_conf_target_t(i, j, k, l) * pred_conf_t(i, j, k, l) * - (1.0 - pred_conf_t(i, j, k, l)) * loss * loss_weight_conf_target; - grad_t(i, j * attr_num + 4, k, l) += - grad_conf_notarget_t(i, j, k, l) * pred_conf_t(i, j, k, l) * - (1.0 - pred_conf_t(i, j, k, l)) * loss * - loss_weight_conf_notarget; - - for (int c = 0; c < class_num; c++) { - grad_t(i, j * attr_num + 5 + c, k, l) = - grad_class_t(i, j, k, l, c) * pred_class_t(i, j, k, l, c) * - (1.0 - pred_class_t(i, j, k, l, c)) * loss * loss_weight_class; - } - } - } - } - } -} - -template -class Yolov3LossKernel : public framework::OpKernel { - 
public: - void Compute(const framework::ExecutionContext& ctx) const override { - auto* input = ctx.Input("X"); - auto* gt_box = ctx.Input("GTBox"); - auto* gt_label = ctx.Input("GTLabel"); - auto* loss = ctx.Output("Loss"); - auto anchors = ctx.Attr>("anchors"); - int class_num = ctx.Attr("class_num"); - float ignore_thresh = ctx.Attr("ignore_thresh"); - float loss_weight_xy = ctx.Attr("loss_weight_xy"); - float loss_weight_wh = ctx.Attr("loss_weight_wh"); - float loss_weight_conf_target = ctx.Attr("loss_weight_conf_target"); - float loss_weight_conf_notarget = - ctx.Attr("loss_weight_conf_notarget"); - float loss_weight_class = ctx.Attr("loss_weight_class"); - - const int n = input->dims()[0]; - const int h = input->dims()[2]; - const int w = input->dims()[3]; - const int an_num = anchors.size() / 2; - - Tensor pred_x, pred_y, pred_w, pred_h; - Tensor pred_conf, pred_class; - pred_x.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_y.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_w.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_h.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_conf.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_class.mutable_data({n, an_num, h, w, class_num}, ctx.GetPlace()); - CalcPredResult(*input, &pred_conf, &pred_class, &pred_x, &pred_y, - &pred_w, &pred_h, an_num, class_num); - - Tensor obj_mask, noobj_mask; - Tensor tx, ty, tw, th, tconf, tclass; - obj_mask.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - noobj_mask.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - tx.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - ty.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - tw.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - th.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - tconf.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - tclass.mutable_data({n, an_num, h, w, class_num}, ctx.GetPlace()); - PreProcessGTBox(*gt_box, *gt_label, ignore_thresh, anchors, h, &obj_mask, - &noobj_mask, &tx, &ty, &tw, &th, &tconf, &tclass); - - Tensor obj_mask_expand; - obj_mask_expand.mutable_data({n, an_num, h, w, class_num}, - ctx.GetPlace()); - ExpandObjMaskByClassNum(&obj_mask_expand, obj_mask); - - T loss_x = CalcMSEWithMask(pred_x, tx, obj_mask); - T loss_y = CalcMSEWithMask(pred_y, ty, obj_mask); - T loss_w = CalcMSEWithMask(pred_w, tw, obj_mask); - T loss_h = CalcMSEWithMask(pred_h, th, obj_mask); - T loss_conf_target = CalcBCEWithMask(pred_conf, tconf, obj_mask); - T loss_conf_notarget = CalcBCEWithMask(pred_conf, tconf, noobj_mask); - T loss_class = CalcBCEWithMask(pred_class, tclass, obj_mask_expand); - - auto* loss_data = loss->mutable_data({1}, ctx.GetPlace()); - loss_data[0] = loss_weight_xy * (loss_x + loss_y) + - loss_weight_wh * (loss_w + loss_h) + - loss_weight_conf_target * loss_conf_target + - loss_weight_conf_notarget * loss_conf_notarget + - loss_weight_class * loss_class; - } -}; - -template -class Yolov3LossGradKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& ctx) const override { - auto* input = ctx.Input("X"); - auto* gt_box = ctx.Input("GTBox"); - auto* gt_label = ctx.Input("GTLabel"); - auto anchors = ctx.Attr>("anchors"); - int class_num = ctx.Attr("class_num"); - float ignore_thresh = ctx.Attr("ignore_thresh"); - auto* input_grad = ctx.Output(framework::GradVarName("X")); - auto* output_grad = ctx.Input(framework::GradVarName("Loss")); - const T loss = output_grad->data()[0]; - float loss_weight_xy = ctx.Attr("loss_weight_xy"); - float loss_weight_wh = 
ctx.Attr("loss_weight_wh"); - float loss_weight_conf_target = ctx.Attr("loss_weight_conf_target"); - float loss_weight_conf_notarget = - ctx.Attr("loss_weight_conf_notarget"); - float loss_weight_class = ctx.Attr("loss_weight_class"); - - const int n = input->dims()[0]; - const int c = input->dims()[1]; - const int h = input->dims()[2]; - const int w = input->dims()[3]; - const int an_num = anchors.size() / 2; - - Tensor pred_x, pred_y, pred_w, pred_h; - Tensor pred_conf, pred_class; - pred_x.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_y.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_w.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_h.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_conf.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - pred_class.mutable_data({n, an_num, h, w, class_num}, ctx.GetPlace()); - CalcPredResult(*input, &pred_conf, &pred_class, &pred_x, &pred_y, - &pred_w, &pred_h, an_num, class_num); - - Tensor obj_mask, noobj_mask; - Tensor tx, ty, tw, th, tconf, tclass; - obj_mask.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - noobj_mask.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - tx.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - ty.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - tw.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - th.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - tconf.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - tclass.mutable_data({n, an_num, h, w, class_num}, ctx.GetPlace()); - PreProcessGTBox(*gt_box, *gt_label, ignore_thresh, anchors, h, &obj_mask, - &noobj_mask, &tx, &ty, &tw, &th, &tconf, &tclass); - - Tensor obj_mask_expand; - obj_mask_expand.mutable_data({n, an_num, h, w, class_num}, - ctx.GetPlace()); - ExpandObjMaskByClassNum(&obj_mask_expand, obj_mask); - - Tensor grad_x, grad_y, grad_w, grad_h; - Tensor grad_conf_target, grad_conf_notarget, grad_class; - grad_x.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - grad_y.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - grad_w.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - grad_h.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - grad_conf_target.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - grad_conf_notarget.mutable_data({n, an_num, h, w}, ctx.GetPlace()); - grad_class.mutable_data({n, an_num, h, w, class_num}, ctx.GetPlace()); - T obj_mf = CalcMaskPointNum(obj_mask); - T noobj_mf = CalcMaskPointNum(noobj_mask); - T obj_expand_mf = CalcMaskPointNum(obj_mask_expand); - CalcMSEGradWithMask(&grad_x, pred_x, tx, obj_mask, obj_mf); - CalcMSEGradWithMask(&grad_y, pred_y, ty, obj_mask, obj_mf); - CalcMSEGradWithMask(&grad_w, pred_w, tw, obj_mask, obj_mf); - CalcMSEGradWithMask(&grad_h, pred_h, th, obj_mask, obj_mf); - CalcBCEGradWithMask(&grad_conf_target, pred_conf, tconf, obj_mask, - obj_mf); - CalcBCEGradWithMask(&grad_conf_notarget, pred_conf, tconf, noobj_mask, - noobj_mf); - CalcBCEGradWithMask(&grad_class, pred_class, tclass, obj_mask_expand, - obj_expand_mf); - - input_grad->mutable_data({n, c, h, w}, ctx.GetPlace()); - AddAllGradToInputGrad( - input_grad, loss, pred_x, pred_y, pred_conf, pred_class, grad_x, grad_y, - grad_w, grad_h, grad_conf_target, grad_conf_notarget, grad_class, - class_num, loss_weight_xy, loss_weight_wh, loss_weight_conf_target, - loss_weight_conf_notarget, loss_weight_class); - } -}; - -} // namespace operators -} // namespace paddle diff --git a/paddle/fluid/platform/cuda_device_function.h b/paddle/fluid/platform/cuda_device_function.h index 
2ce8f141d3c51661305f4952479cf2889fc4f396..31b6c38d613cf9df8fa7e8f6a8e1cfa310280968 100644 --- a/paddle/fluid/platform/cuda_device_function.h +++ b/paddle/fluid/platform/cuda_device_function.h @@ -53,10 +53,12 @@ inline static int RoundToPowerOfTwo(int dim) { __VA_ARGS__; \ } break -#define CUDA_LAUNCH_KERNEL_HELPER(...) \ - CUDA_LAUNCH_KERNEL_BASE(256, ##__VA_ARGS__); \ - CUDA_LAUNCH_KERNEL_BASE(128, ##__VA_ARGS__); \ - CUDA_LAUNCH_KERNEL_BASE(64, ##__VA_ARGS__); \ +#define CUDA_LAUNCH_KERNEL_HELPER(...) \ + CUDA_LAUNCH_KERNEL_BASE(1024, ##__VA_ARGS__); \ + CUDA_LAUNCH_KERNEL_BASE(512, ##__VA_ARGS__); \ + CUDA_LAUNCH_KERNEL_BASE(256, ##__VA_ARGS__); \ + CUDA_LAUNCH_KERNEL_BASE(128, ##__VA_ARGS__); \ + CUDA_LAUNCH_KERNEL_BASE(64, ##__VA_ARGS__); \ CUDA_LAUNCH_KERNEL_BASE(32, ##__VA_ARGS__); template diff --git a/paddle/fluid/platform/place.cc b/paddle/fluid/platform/place.cc index 655ce8485d4584aa0955315b045da6bf541f7fe2..60b2d83f15746eab0a4d29c7965c064690b6d46d 100644 --- a/paddle/fluid/platform/place.cc +++ b/paddle/fluid/platform/place.cc @@ -14,6 +14,12 @@ limitations under the License. */ #include "paddle/fluid/platform/place.h" +DEFINE_bool(benchmark, false, + "Doing memory benchmark. It will make deleting scope synchronized, " + "and add some memory usage logs." + "Default cuda is asynchronous device, set to True will" + "force op run in synchronous mode."); + namespace paddle { namespace platform { diff --git a/paddle/fluid/pybind/pybind.cc b/paddle/fluid/pybind/pybind.cc index 97e5bbaaccaf7c702a324abd708a314c72ece004..4dcec21952f2eba72574c95303ba728df8746401 100644 --- a/paddle/fluid/pybind/pybind.cc +++ b/paddle/fluid/pybind/pybind.cc @@ -37,6 +37,7 @@ limitations under the License. */ #include "paddle/fluid/framework/version.h" #include "paddle/fluid/imperative/layer.h" #include "paddle/fluid/memory/allocation/allocator_strategy.h" +#include "paddle/fluid/memory/allocation/legacy_allocator.h" #include "paddle/fluid/operators/activation_op.h" #include "paddle/fluid/operators/py_func_op.h" #include "paddle/fluid/operators/reader/lod_tensor_blocking_queue.h" @@ -127,6 +128,13 @@ PYBIND11_MODULE(core, m) { m.add_object("_cleanup", py::capsule([]() { ScopePool::Instance().Clear(); })); + m.def("get_mem_usage", [](int device) { + return memory::allocation::GPUMemMonitor.GetMemUsage(device); + }); + + m.def("print_mem_usage", + []() { return memory::allocation::GPUMemMonitor.PrintMemUsage(); }); + py::class_(m, "VarBase", R"DOC()DOC") // .def(py::init<>()) .def(py::init(), py::arg("stop_gradient") = false) diff --git a/paddle/scripts/fast_install.sh b/paddle/scripts/fast_install.sh new file mode 100644 index 0000000000000000000000000000000000000000..b960d0f00a26196c827053c41a3b35b97e7cdb07 --- /dev/null +++ b/paddle/scripts/fast_install.sh @@ -0,0 +1,923 @@ +#!/bin/bash + +path='http://paddlepaddle.org/download?url=' +#release_version=`curl -s https://pypi.org/project/paddlepaddle/|grep -E "/project/paddlepaddle/"|grep "release"|awk -F '/' '{print $(NF-1)}'|head -1` +release_version=1.2.0 +python_list=( +"27" +"35" +"36" +"37" +) + + +function use_cpu(){ + while true + do + read -p "是否安装CPU版本的PaddlePaddle?(y/n)" cpu_option + cpu_option=`echo $cpu_option | tr 'A-Z' 'a-z'` + if [[ "$cpu_option" == "" || "$cpu_option" == "n" ]];then + echo "退出安装中..." + exit + else + GPU='cpu' + echo "将为您安装CPU版本的PaddlePaddle" + break + fi + done +} + +function checkLinuxCUDNN(){ + echo + read -n1 -p "请按回车键进行下一步..." 
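+  # The loop below detects the installed cuDNN by reading CUDNN_MAJOR from
+  # /usr/local/cuda/include/cudnn.h, falling back to `sudo find /usr -name cudnn.h`
+  # or a user-supplied path. Only cuDNN 5 and 7 are accepted, CUDA 9 additionally
+  # requires cuDNN 7, and in all other cases the CPU-only wheel is offered instead.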
+ echo + while true + do + version_file='/usr/local/cuda/include/cudnn.h' + if [ -f "$version_file" ];then + CUDNN=`cat $version_file | grep CUDNN_MAJOR |awk 'NR==1{print $NF}'` + fi + if [ "$CUDNN" == "" ];then + version_file=`sudo find /usr -name "cudnn.h"|head -1` + if [ "$version_file" != "" ];then + CUDNN=`cat ${version_file} | grep CUDNN_MAJOR -A 2|awk 'NR==1{print $NF}'` + else + echo "检测结果:未在常规路径下找到cuda/include/cudnn.h文件" + while true + do + read -p "请核实cudnn.h位置,并在此输入路径(请注意,路径需要输入到“cudnn.h”这一级):" cudnn_version + echo + if [ "$cudnn_version" == "" ] || [ ! -f "$cudnn_version" ];then + read -p "仍未找到cuDNN,输入y将安装CPU版本的PaddlePaddle,输入n可重新录入cuDNN路径,请输入(y/n)" cpu_option + echo + cpu_option=`echo $cpu_option | tr 'A-Z' 'a-z'` + if [ "$cpu_option" == "y" -o "$cpu_option" == "" ];then + GPU='cpu' + break + else + echo "请重新输入" + echo + fi + else + CUDNN=`cat $cudnn_version | grep CUDNN_MAJOR |awk 'NR==1{print $NF}'` + echo "检测结果:找到cudnn.h" + break + fi + done + if [ "$GPU" == "cpu" ];then + break + fi + fi + fi + if [ "$CUDA" == "9" -a "$CUDNN" != "7" ];then + echo + echo "目前CUDA9下仅支持cuDNN7,暂不支持您机器上的CUDNN${CUDNN}。您可以访问NVIDIA官网下载适合版本的CUDNN,请ctrl+c退出安装进程。按回车键将为您安装CPU版本的PaddlePaddle" + echo + use_cpu() + if [ "$GPU"=="cpu" ];then + break + fi + fi + + if [ "$CUDNN" == 5 ] || [ "$CUDNN" == 7 ];then + echo + echo "您的CUDNN版本是: CUDNN$CUDNN" + break + else + echo + read -n1 -p "目前支持的CUDNN版本为5和7,暂不支持您机器上的CUDNN${CUDNN},将为您安装CPU版本的PaddlePaddle,请按回车键开始安装" + echo + use_cpu + if [ "$GPU"=="cpu" ];then + break + fi + fi + done +} + +function checkLinuxCUDA(){ + while true + do + CUDA=`echo ${CUDA_VERSION}|awk -F "[ .]" '{print $1}'` + if [ "$CUDA" == "" ];then + if [ -f "/usr/local/cuda/version.txt" ];then + CUDA=`cat /usr/local/cuda/version.txt | grep 'CUDA Version'|awk -F '[ .]' '{print $3}'` + tmp_cuda=$CUDA + fi + if [ -f "/usr/local/cuda8/version.txt" ];then + CUDA=`cat /usr/local/cuda8/version.txt | grep 'CUDA Version'|awk -F '[ .]' '{print $3}'` + tmp_cuda8=$CUDA + fi + if [ -f "/usr/local/cuda9/version.txt" ];then + CUDA=`cat /usr/local/cuda9/version.txt | grep 'CUDA Version'|awk -F '[ .]' '{print $3}'` + tmp_cuda9=$CUDA + fi + fi + + if [ "$tmp_cuda" != "" ];then + echo "检测结果:找到CUDA $tmp_cuda" + fi + if [ "$tmp_cudai8" != "" ];then + echo "检测结果:找到CUDA $tmp_cuda8" + fi + if [ "$tmp_cuda9" != "" ];then + echo "检测结果:找到CUDA $tmp_cuda9" + fi + + if [ "$CUDA" == "" ];then + echo "检测结果:没有在常规路径下找到cuda/version.txt文件" + while true + do + read -p "请输入cuda/version.txt的路径:" cuda_version + if [ "$cuda_version" == "" || ! -f "$cuda_version" ];then + read -p "仍未找到CUDA,输入y将安装CPU版本的PaddlePaddle,输入n可重新录入CUDA路径,请输入(y/n)" cpu_option + cpu_option=`echo $cpu_option | tr 'A-Z' 'a-z'` + if [ "$cpu_option" == "y" || "$cpu_option" == "" ];then + GPU='cpu' + break + else + echo "重新输入..." + fi + else + CUDA=`cat $cuda_version | grep 'CUDA Version'|awk -F '[ .]' '{print $3}'` + if [ "$CUDA" == "" ];then + echo "未能在version.txt中找到CUDA相关信息" + else + break + fi + fi + done + if [ "$GPU" == "cpu" ];then + break + fi + fi + + if [ "$CUDA" == "8" ] || [ "$CUDA" == "9" ];then + echo "您的CUDA版本是${CUDA}" + break + else + echo "目前支持CUDA8/9,暂不支持您的CUDA${CUDA},将为您安装CPU版本的PaddlePaddle" + echo + use_cpu + fi + + if [ "$GPU" == "cpu" ];then + break + fi + done +} + +function checkLinuxMathLibrary(){ + while true + do + if [ "$AVX" == "" ];then + echo "正在检测您环境中是否存在AVX指令集..." 
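+    # $AVX was recorded earlier by checkLinuxGPU (grep avx /proc/cpuinfo).
+    # Machines without AVX only get the MKL build, GPU machines default to MKL,
+    # and otherwise the user chooses between openblas (1) and mkl (2, default) below.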
+ echo + echo "检测结果:您电脑上没有AVX指令集,目前针对无AVX指令集的环境,我们仅提供支持mkl数学库的PaddlePaddle,将为您安装此版本的PaddlePaddle" + math='mkl' + break + elif [ "$GPU" == "gpu" ];then + math='mkl' + echo "检测到您的机器上配备GPU,推荐您使用mkl数学库" + break + else + read -p "请输入您希望使用的数学库: + 1:openblas 一个高性能多核 BLAS 库 + 2:mkl(推荐) 英特尔数学核心函数库 + => 请输入数字1或2。如输入其他字符或直接回车,将会默认选择【 2. mkl 】 。请在这里输入并回车:" math + if [ "$math" == "" ];then + math="mkl" + echo "您选择了数字【2】" + break + fi + if [ "$math" == "1" ];then + math=openblas + echo "您选择了数字【1】" + break + elif [ "$math" == "2" ];then + math=mkl + echo "您选择了数字【2】" + break + fi + echo "输入错误,请再次输入" + fi + done +} + +function checkLinuxPaddleVersion(){ + read -n1 -p "请按回车键继续..." + while true + do + read -p " + 1. 开发版:对应Github上develop分支,如您需要开发、或希望使用PaddlePaddle最新功能,请选用此版本 + 2. 稳定版(推荐):如您无特殊开发需求,建议使用此版本,目前最新的版本号为 ${release_version} + => 请输入数字1或2。如输入其他字符或直接回车,将会默认选择【 2. 稳定版 】 。请在这里输入并回车:" paddle_version + if [ "$paddle_version" == "" ];then + paddle_version="release-${release_version}" + echo "您选择了数字【2】,为您安装release-${release_version}" + break + fi + if [ "$paddle_version" == "1" ];then + echo "您选择了数字【1】,将为您安装开发版" + break + elif [ "$paddle_version" == "2" ];then + echo "您选择了数字【2】,为您安装release-${release_version}" + break + fi + echo "输入错误,请再次输入" + done +} + +function checkLinuxPip(){ + while true + do + echo "请输入您要使用的pip目录(您可以另起终端,并使用which pip来查看):" + read -p "" pip_path + if [ "$pip_path" == "" -o ! -f "$pip_path" ];then + echo "检测结果:pip不存在,请重新输入" + continue + fi + python_version=`$pip_path --version|awk -F "[ |)]" '{print $6}'|sed 's#\.##g'` + if [ "$python_version" == "27" ];then + uncode=`python -c "import pip._internal;print(pip._internal.pep425tags.get_supported())"|grep "cp27mu"` + if [[ "$uncode" == "" ]];then + uncode= + else + uncode=u + fi + fi + if [ "$python_version" == "" ];then + echo "检测结果:pip不存在,请重新输入" + else + version_list=`echo "${python_list[@]}" | grep "$python_version" ` + if [ "$version_list" != "" ];then + echo "检测结果:找到python${python_version}版本" + break + else + echo "检测结果:找不到可用的 pip, 我们只支持Python27/35/36/37及其对应的pip, 请重新输入, 或使用ctrl + c退出 " + fi + fi + done +} + +function checkLinuxAVX(){ + while true + do + if [[ "$AVX" != "" ]];then + AVX="avx" + break + else + if [ "$CUDA" == "8" -a "$CUDNN" == "7" ] || [ "$GPU" == "cpu" ];then + AVX="noavx" + break + else + echo "Step 6. 
检测是否有avx" + echo + echo "检测结果:未能找到avx,我们仅提供CPU版本或配置为CUDA8 cuDNN7的GPU版本的安装包" + break + fi + fi + done +} + +function PipLinuxInstall(){ + wheel_cpu_release="http://paddle-wheel.bj.bcebos.com/${release_version}-${GPU}-${AVX}-${math}/paddlepaddle-${release_version}-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl" + wheel_gpu_release="http://paddle-wheel.bj.bcebos.com/${release_version}-gpu-cuda${CUDA}-cudnn${CUDNN}-${AVX}-${math}/paddlepaddle_gpu-${release_version}.post${CUDA}${CUDNN}-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl" + wheel_gpu_release_noavx="http://paddle-wheel.bj.bcebos.com/${release_version}-gpu-cuda${CUDA}-cudnn${CUDNN}-${AVX}-${math}/paddlepaddle_gpu-${release_version}-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl" + wheel_cpu_develop="http://paddle-wheel.bj.bcebos.com/latest-cpu-${AVX}-${math}/paddlepaddle-latest-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl" + wheel_gpu_develop="http://paddle-wheel.bj.bcebos.com/latest-gpu-cuda${CUDA}-cudnn${CUDNN}-${AVX}-${math}/paddlepaddle_gpu-latest-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl" + + + if [[ "$paddle_version" == "2" ]];then + if [[ "$GPU" == "gpu" ]];then + if [[ ${AVX} == "avx" ]];then + rm -rf `echo $wheel_gpu_release|awk -F '/' '{print $NF}'` + wget -q $wheel_gpu_release + if [ "$?" == "0" ];then + $pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_gpu_release + else + echo "paddlepaddle whl包下载失败" + exit 1 + fi + else + rm -rf `echo $wheel_gpu_release_novax|awk -F '/' '{print $NF}'` + wget -q $wheel_gpu_release_novax + if [ "$?" == "0" ];then + $pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_gpu_release_noavx + else + echo "paddlepaddle whl包下载失败" + exit 1 + fi + fi + else + rm -rf `echo $wheel_cpu_release|awk -F '/' '{print $NF}'` + wget -q $wheel_cpu_release + if [ "$?" == "0" ];then + $pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_cpu_release + else + echo "paddlepaddle whl包下载失败" + exit 1 + fi + fi + else + if [[ "$GPU" == "gpu" ]];then + rm -rf `echo $wheel_gpu_develop|awk -F '/' '{print $NF}'` + wget -q $wheel_gpu_develop + if [ "$?" == "0" ];then + $pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_gpu_develop + else + echo "paddlepaddle whl包下载失败" + exit 1 + fi + else + rm -rf `echo $wheel_cpu_develop|awk -F '/' '{print $NF}'` + wget -q $wheel_cpu_develop + if [ "$?" == "0" ];then + $pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_cpu_develop + else + echo "paddlepaddle whl包下载失败" + exit 1 + fi + fi + fi +} + + +function checkLinuxGPU(){ + read -n1 -p "即将检测您的机器是否含GPU,请按回车键继续..." + echo + AVX=`cat /proc/cpuinfo |grep avx|tail -1|grep avx` + which nvidia-smi >/dev/null 2>&1 + if [ "$?" != "0" ];then + GPU='cpu' + echo "未在机器上找到GPU,或PaddlePaddle暂不支持此型号的GPU" + else + GPU='gpu' + echo "已在您的机器上找到GPU,即将确认CUDA和CUDNN版本..." 
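+    # GPU presence is decided by `which nvidia-smi` above; for GPU machines the
+    # CUDA and cuDNN checks below select the matching wheel.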
+ echo + fi + if [ "$GPU" == 'gpu' ];then + checkLinuxCUDA + checkLinuxCUDNN + fi +} + +function linux(){ +gpu_list=( +"GeForce 410M" +"GeForce 610M" +"GeForce 705M" +"GeForce 710M" +"GeForce 800M" +"GeForce 820M" +"GeForce 830M" +"GeForce 840M" +"GeForce 910M" +"GeForce 920M" +"GeForce 930M" +"GeForce 940M" +"GeForce GT 415M" +"GeForce GT 420M" +"GeForce GT 430" +"GeForce GT 435M" +"GeForce GT 440" +"GeForce GT 445M" +"GeForce GT 520" +"GeForce GT 520M" +"GeForce GT 520MX" +"GeForce GT 525M" +"GeForce GT 540M" +"GeForce GT 550M" +"GeForce GT 555M" +"GeForce GT 610" +"GeForce GT 620" +"GeForce GT 620M" +"GeForce GT 625M" +"GeForce GT 630" +"GeForce GT 630M" +"GeForce GT 635M" +"GeForce GT 640" +"GeForce GT 640 (GDDR5)" +"GeForce GT 640M" +"GeForce GT 640M LE" +"GeForce GT 645M" +"GeForce GT 650M" +"GeForce GT 705" +"GeForce GT 720" +"GeForce GT 720M" +"GeForce GT 730" +"GeForce GT 730M" +"GeForce GT 735M" +"GeForce GT 740" +"GeForce GT 740M" +"GeForce GT 745M" +"GeForce GT 750M" +"GeForce GTS 450" +"GeForce GTX 1050" +"GeForce GTX 1060" +"GeForce GTX 1070" +"GeForce GTX 1080" +"GeForce GTX 1080 Ti" +"GeForce GTX 460" +"GeForce GTX 460M" +"GeForce GTX 465" +"GeForce GTX 470" +"GeForce GTX 470M" +"GeForce GTX 480" +"GeForce GTX 480M" +"GeForce GTX 485M" +"GeForce GTX 550 Ti" +"GeForce GTX 560M" +"GeForce GTX 560 Ti" +"GeForce GTX 570" +"GeForce GTX 570M" +"GeForce GTX 580" +"GeForce GTX 580M" +"GeForce GTX 590" +"GeForce GTX 650" +"GeForce GTX 650 Ti" +"GeForce GTX 650 Ti BOOST" +"GeForce GTX 660" +"GeForce GTX 660M" +"GeForce GTX 660 Ti" +"GeForce GTX 670" +"GeForce GTX 670M" +"GeForce GTX 670MX" +"GeForce GTX 675M" +"GeForce GTX 675MX" +"GeForce GTX 680" +"GeForce GTX 680M" +"GeForce GTX 680MX" +"GeForce GTX 690" +"GeForce GTX 750" +"GeForce GTX 750 Ti" +"GeForce GTX 760" +"GeForce GTX 760M" +"GeForce GTX 765M" +"GeForce GTX 770" +"GeForce GTX 770M" +"GeForce GTX 780" +"GeForce GTX 780M" +"GeForce GTX 780 Ti" +"GeForce GTX 850M" +"GeForce GTX 860M" +"GeForce GTX 870M" +"GeForce GTX 880M" +"GeForce GTX 950" +"GeForce GTX 950M" +"GeForce GTX 960" +"GeForce GTX 960M" +"GeForce GTX 965M" +"GeForce GTX 970" +"GeForce GTX 970M" +"GeForce GTX 980" +"GeForce GTX 980M" +"GeForce GTX 980 Ti" +"GeForce GTX TITAN" +"GeForce GTX TITAN Black" +"GeForce GTX TITAN X" +"GeForce GTX TITAN Z" +"Jetson TK1" +"Jetson TX1" +"Jetson TX2" +"Mobile Products" +"NVIDIA NVS 310" +"NVIDIA NVS 315" +"NVIDIA NVS 510" +"NVIDIA NVS 810" +"NVIDIA TITAN V" +"NVIDIA TITAN X" +"NVIDIA TITAN Xp" +"NVS 4200M" +"NVS 5200M" +"NVS 5400M" +"Quadro 410" +"Quadro GP100" +"Quadro K1100M" +"Quadro K1200" +"Quadro K2000" +"Quadro K2000D" +"Quadro K2100M" +"Quadro K2200" +"Quadro K2200M" +"Quadro K3100M" +"Quadro K4000" +"Quadro K4100M" +"Quadro K420" +"Quadro K4200" +"Quadro K4200M" +"Quadro K5000" +"Quadro K500M" +"Quadro K5100M" +"Quadro K510M" +"Quadro K5200" +"Quadro K5200M" +"Quadro K600" +"Quadro K6000" +"Quadro K6000M" +"Quadro K610M" +"Quadro K620" +"Quadro K620M" +"Quadro M1000M" +"Quadro M1200" +"Quadro M2000" +"Quadro M2000M" +"Quadro M2200" +"Quadro M3000M" +"Quadro M4000" +"Quadro M4000M" +"Quadro M5000" +"Quadro M5000M" +"Quadro M500M" +"Quadro M520" +"Quadro M5500M" +"Quadro M6000" +"Quadro M6000 24GB" +"Quadro M600M" +"Quadro M620" +"Quadro Mobile Products" +"Quadro P1000" +"Quadro P2000" +"Quadro P3000" +"Quadro P400" +"Quadro P4000" +"Quadro P5000" +"Quadro P600" +"Quadro P6000" +"Quadro Plex 7000" +"Tegra K1" +"Tegra X1" +"Tesla C2050/C2070" +"Tesla C2075" +"Tesla Data Center Products" +"Tesla K10" +"Tesla K20" 
+"Tesla K40" +"Tesla K80" +"Tesla M40" +"Tesla M60" +"Tesla P100" +"Tesla P4" +"Tesla P40" +"Tesla V100") + + echo "Step 2. 检测GPU型号和CUDA/cuDNN版本" + echo + checkLinuxGPU + echo + echo "Step 3. 检测数学库" + echo + checkLinuxMathLibrary + echo + echo "Step 4. 选择要安装的PaddlePaddle版本" + echo + checkLinuxPaddleVersion + echo + echo "Step 5. 检测pip版本" + echo + checkLinuxPip + echo + checkLinuxAVX + echo "*********************2. 开始安装*****************************" + PipLinuxInstall +} + +function checkMacPython2(){ + while true + do + read -p " + => 未能在常规路径下找到Python2,请使用ctrl+c命令退出安装程序,并使用brew或pypi.org下载安装Python2(注意Python版本不能低于2.7.15) + 如希望自定义Python路径,请输入路径:" python_root + echo + python_version=`$python_root --version 2>&1 1>&1` + if [ $? == "0" ];then + : + else + python_version="" + fi + check_python=`echo $python_version | grep "Python 2"` + if [ "$python_version" == "" ] || [ "$python_root" == "/usr/bin/python" -a "$python_version" == "Python 2.7.10" ] ;then + python_version="" + elif [ -n "$check_python" ];then + while true + do + read -p " + => 在您的环境中找到 $python_version, 确认使用此版本请输入y;如您希望自定义Python路径请输入n。请在这里输入(y/n)并回车: " use_python + echo + use_python=`echo $use_python | tr 'A-Z' 'a-z'` + if [ "$use_python" == "y" ]||[ "$use_python" == "" ];then + use_python="y" + break + elif [ "$use_python" == "n" ];then + python_root="" + break + else + echo "输入错误,请重新输入(y/n)" + fi + done + if [ "$use_python" == "y" ];then + break + fi + else + echo "您输入Python的不是Python2" + python_version="" + fi + done +} + +function checkMacPython3(){ + while true + do + read -p " + => 未能在常规路径下找到Python3,请使用ctrl+c命令退出安装程序,并使用brew或pypi.org下载Python3 + 如希望自定义Python路径,请输入路径:" python_root + python_version=`$python_root --version 2>&1 1>&1` + if [ $? == "0" ];then + : + else + python_version="" + fi + check_python=`echo $python_version | grep "Python 3"` + if [ "$python_version" == "" ] || [ "$python_root" == "/usr/bin/python" -a "$python_version" == "Python 2.7.10" ] ;then + python_version="" + elif [ -n "$check_python" ] ;then + while true + do + read -p " + => 在您的环境中找到 $python_version, 确认使用此版本请输入y;如您希望自定义Python路径请输入n。请在这里输入(y/n)并回车: " use_python + echo + use_python=`echo $use_python | tr 'A-Z' 'a-z'` + if [ "$use_python" == "y" ]||[ "$use_python" == "" ];then + use_python="y" + break + elif [ "$use_python" == "n" ];then + python_root="" + break + else + echo "输入错误,请重新输入(y/n)" + fi + done + if [ "$use_python" == "y" ];then + break + fi + else + echo "您输入Python的不是Python3" + python_version="" + fi + done +} + +function checkMacPaddleVersion(){ + while true + do + read -n1 -p "Step 2. 选择PaddlePaddle的版本,请按回车键继续..." + echo + read -p " + 1. 开发版:对应Github上develop分支,如您需要开发、或希望使用PaddlePaddle最新功能,请选用此版本 + 2. 稳定版(推荐):如您无特殊开发需求,建议使用此版本,目前最新的版本号为 ${release_version} + + => 请输入数字1或2。如输入其他字符或直接回车,将会默认选择【 2. 稳定版 】 。请在这里输入并回车:" paddle_version + if [ "$paddle_version" == "1" ]||[ "$paddle_version" == "2" ];then + echo + echo "您选择了数字【"$paddle_version" 】" + echo + break + else + paddle_version="2" + echo + echo "您选择了数字【2】" + echo + break + fi + done +} + +function checkMacPythonVersion(){ + while true + do + read -n1 -p "Step 3. 选择Python版本,请按回车键继续..." + read -p " + 2. 使用python 2.x + 3. 使用python 3.x + + => 请输入数字2或3。如输入其他字符或直接回车,将会默认使用【Python 2 】。请在这里输入并回车:" python_V + echo + if [ "$python_V" == "" ];then + python_V="2" + fi + echo "您选择了数字【"$python_V"】,正在寻找符合您要求的Python版本,请按回车键继续..." 
+ echo + if [ "$python_V" == "2" ];then + python_root=`which python2.7` + if [ "$python_root" == "" ];then + python_root=`which python` + fi + python_version=`$python_root --version 2>&1 1>&1` + if [ $? == "0" ];then + : + else + python_version="" + fi + if [ "$python_root" == "" ]||[ "$python_root" == "/usr/bin/python" -a "$python_version" == "Python 2.7.10" ]||[ "$python_root" == "/usr/bin/python2.7" -a "$python_version" == "Python 2.7.10" ];then + checkMacPython2 + fi + while true + do + read -p " + => 在您的环境中找到 $python_version, 确认使用此版本请输入y;如您希望自定义Python路径请输入n。请在这里输入(y/n)并回车:" use_python + echo + use_python=`echo $use_python | tr 'A-Z' 'a-z'` + if [ "$use_python" == "y" ]||[ "$use_python" == "" ];then + break + elif [ "$use_python" == "n" ];then + python_root="" + checkMacPython2 + break + else + echo "输入错误,请重新输入(y/n)" + fi + done + + elif [ "$python_V" == "3" ];then + python_root=`which python3` + python_version=`$python_root --version 2>&1 1>&1` + if [ $? == "0" ];then + : + else + python_version="" + fi + if [ "$python_root" == "" ]||[ "$python_root" == "/usr/bin/python" -a "$python_version" == "Python 2.7.10" ];then + checkMacPython3 + fi + while true + do + read -p " + => 在您的环境中找到 $python_version, 确认使用此版本请输入y;如您希望自定义Python路径请输入n。请在这里输入(y/n)并回车:" use_python + echo + use_python=`echo $use_python | tr 'A-Z' 'a-z'` + if [ "$use_python" == "y" ]||[ "$use_python" == "" ];then + break + elif [ "$use_python" == "n" ];then + checkMacPython3 + break + else + echo "输入错误,请重新输入(y/n)" + fi + done + else + : + fi + + + if [ "$python_V" == "2" ]||[ "$python_V" == "3" ];then + python_brief_version=`$python_root -m pip -V |awk -F "[ |)]" '{print $6}'|sed 's#\.##g'` + if [[ $python_brief_version == "27" ]];then + uncode=`python -c "import pip._internal;print(pip._internal.pep425tags.get_supported())"|grep "cp27"` + if [[ $uncode == "" ]];then + uncode="mu" + else + uncode="m" + fi + fi + version_list=`echo "${python_list[@]}" | grep "$python_brief_version" ` + if [ "$version_list" != "" ];then + break + else + echo "未找到可用的pip或pip3。PaddlePaddle目前支持:Python2.7/3.5/3.6/3.7及其对应的pip, 请重新输入,或使用ctrl + c退出" + fi + else + echo "输入错误,请重新输入" + fi + done +} + +function checkMacAVX(){ + read -n1 -p "Step 4. 检测您的Mac是否支持AVX指令集,请按回车键继续..." + echo + if [[ $AVX != "" ]];then + AVX="avx" + echo "检测结果:支持" + else + read -n1 -p "检测结果:不支持。非常抱歉,PaddlePaddle在Mac系统暂不提供no_avx类型的安装包,您可以选择在Linux系统中安装no_avx版的PaddlePaddle, 请按回车键退出..." + exit + fi + echo +} + +function checkMacGPU(){ + read -n1 -p "Step 5. 选择CPU/GPU版本,请按回车键继续..." + echo + if [[ $GPU != "" ]];then + echo "MacOS环境下,暂未提供GPU版本的PaddlePaddle安装包,将为您安装CPU版本的PaddlePaddle" + else + echo "MacOS环境下,暂未提供GPU版本的PaddlePaddle安装包,将为您安装CPU版本的PaddlePaddle" + GPU=cpu + fi + echo +} + +function macos() { + path='http://paddlepaddle.org/download?url=' + AVX=`sysctl -a | grep cpu | grep AVX1.0 | tail -1 | grep AVX` + + while true + do + checkMacPaddleVersion + checkMacPythonVersion + checkMacAVX + checkMacGPU + + + echo "*********************2. 开始安装*****************************" + echo + read -n1 -p "即将为您下载并安装PaddlePaddle,请按回车键继续..." + echo + if [[ $paddle_version == "2" ]];then + $python_root -m pip install paddlepaddle + if [ $? 
== "0" ];then + echo "安装成功,可以使用: ${python_root} 来启动安装了PaddlePaddle的Python解释器" + break + else + rm $whl_cpu_release + echo "未能正常安装PaddlePaddle,请尝试更换您输入的python路径,或者ctrl + c退出后请检查您使用的python对应的pip或pip源是否可用" + echo"" + echo "==========================================================================================" + echo"" + exit 1 + fi + else + if [ -f $whl_cpu_develop ];then + $python_root -m pip install $whl_cpu_develop + if [ $? == "0" ];then + rm -rf $whl_cpu_develop + echo "安装成功!小提示:可以使用: ${python_root} 来启动安装了PaddlePaddle的Python解释器" + break + else + echo "未能正常安装PaddlePaddle,请尝试更换您输入的python路径,或者ctrl + c退出后请检查您使用的python对应的pip或pip源是否可用" + echo"" + echo "==========================================================================================" + echo"" + exit 1 + fi + else + wget ${path}$whl_cpu_develop -O $whl_cpu_develop + if [ $? == "0" ];then + $python_root -m pip install $whl_cpu_develop + if [ $? == "0" ];then + rm $wheel_cpu_develop + echo "安装成功,可以使用: ${python_root} 来启动安装了PaddlePaddle的Python解释器" + break + else + rm $whl_cpu_release + echo "未能正常安装PaddlePaddle,请尝试更换您输入的python路径,或者ctrl + c退出后请检查您使用的python对应的pip或pip源是否可用" + echo"" + echo "==========================================================================================" + echo"" + exit 1 + fi + else + rm $whl_cpu_develop + echo "未能正常安装PaddlePaddle,请检查您的网络 或者确认您是否安装有 wget,或者ctrl + c退出后反馈至https://github.com/PaddlePaddle/Paddle/issues" + echo"" + echo "==========================================================================================" + echo"" + exit 1 + fi + fi + fi + done +} + +function main() { + echo "*********************************" + echo "欢迎使用PaddlePaddle快速安装脚本" + echo "*********************************" + echo + echo "如果您在安装过程中遇到任何问题,请在https://github.com/PaddlePaddle/Paddle/issues反馈,我们的工作人员将会帮您答疑解惑" + echo + echo "本安装包将帮助您在Linux或Mac系统下安装PaddlePaddle,包括 1)安装前的准备和 2)开始安装 两部分" + echo + read -n1 -p "请按回车键进行下一步..." + echo + echo + echo "*********************1. 安装前的准备*****************************" + echo + echo "Step 1. 正在检测您的操作系统信息..." + echo + SYSTEM=`uname -s` + if [ "$SYSTEM" == "Darwin" ];then + echo "您的系统为:MAC OSX" + echo + macos + else + echo "您的系统为:Linux" + echo + OS=`cat /etc/issue|awk 'NR==1 {print $1}'` + if [ $OS == "\S" ] || [ "$OS" == "CentOS" ] || [ $OS == "Ubuntu" ];then + linux + else + echo "您的系统不在本安装包的支持范围,如您需要在windows环境下安装PaddlePaddle,请您参考PaddlePaddle官网的windows安装文档" + fi + fi +} +main diff --git a/python/paddle/fluid/contrib/__init__.py b/python/paddle/fluid/contrib/__init__.py index 6127ca8a3eacd013dd258a02b9f3cc792b634137..870c57e54011361caae5265201d19f58830a87bc 100644 --- a/python/paddle/fluid/contrib/__init__.py +++ b/python/paddle/fluid/contrib/__init__.py @@ -22,6 +22,8 @@ from . import op_frequence from .op_frequence import * from . import quantize from .quantize import * +from . import int8_inference +from .int8_inference import * from . import reader from .reader import * from . 
 import slim
@@ -34,6 +36,7 @@ __all__ += decoder.__all__
 __all__ += memory_usage_calc.__all__
 __all__ += op_frequence.__all__
 __all__ += quantize.__all__
+__all__ += int8_inference.__all__
 __all__ += reader.__all__
 __all__ += slim.__all__
 __all__ += utils.__all__
diff --git a/python/paddle/fluid/contrib/int8_inference/README.md b/python/paddle/fluid/contrib/int8_inference/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a9691dad4494f5eacf427b2806b2393baa57dc1e
--- /dev/null
+++ b/python/paddle/fluid/contrib/int8_inference/README.md
@@ -0,0 +1,72 @@
+# Offline INT8 Calibration Tool
+
+PaddlePaddle supports offline INT8 calibration to accelerate inference. This document explains how to enable INT8 calibration and reports the resulting ResNet-50 and MobileNet-V1 accuracy.
+
+## 0. Prerequisite
+You need at least the PaddlePaddle 1.3 Python package: `pip install paddlepaddle==1.3`.
+
+## 1. How to generate an INT8 model
+You can refer to the unit test in [test_calibration.py](../tests/test_calibration.py). Basically, there are three steps:
+* Construct the calibration object.
+
+```python
+calibrator = int8_utility.Calibrator( # Step 1
+    program=infer_program, # required, FP32 program
+    pretrained_model=model_path, # required, FP32 pretrained model
+    algo=algo, # required, calibration algorithm; default is max, the alternative is KL (Kullback–Leibler divergence)
+    exe=exe, # required, executor
+    output=int8_model, # required, INT8 model
+    feed_var_names=feed_dict, # required, feed dict
+    fetch_list=fetch_targets) # required, fetch targets
+```
+
+* Call calibrator.sample_data() after each executor run.
+```python
+_, acc1, _ = exe.run(
+    program,
+    feed={feed_dict[0]: image,
+          feed_dict[1]: label},
+    fetch_list=fetch_targets)
+
+calibrator.sample_data() # Step 2
+```
+
+* Call calibrator.save_int8_model() after sampling over the specified number of iterations (e.g., iterations = 50).
+```python
+calibrator.save_int8_model() # Step 3
+```
+
+## 2. How to run the INT8 model
+You can load the INT8 model with the load_inference_model [API](https://github.com/PaddlePaddle/Paddle/blob/8b50ad80ff6934512d3959947ac1e71ea3fb9ea3/python/paddle/fluid/io.py#L991) and run INT8 inference the same way as [FP32](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/eval.py "FP32").
+
+```python
+[infer_program, feed_dict,
+ fetch_targets] = fluid.io.load_inference_model(model_path, exe)
+```
+
+## 3. Results
+We report the accuracy measured on an [Intel® Xeon® Gold 6148 Processor](https://ark.intel.com/products/120489/Intel-Xeon-Gold-6148-Processor-27-5M-Cache-2-40-GHz- "Intel® Xeon® Gold 6148 Processor") (also known as Intel® Xeon® Skylake 6148).
+
+| Model | Dataset | FP32 Accuracy | INT8 Accuracy | Accuracy Diff |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+| ResNet-50 | Small | 72.00% | 72.00% | 0.00% |
+| MobileNet-V1 | Small | 62.00% | 62.00% | 0.00% |
+| ResNet-50 | Full ImageNet Val | 76.63% | 76.17% | 0.46% |
+| MobileNet-V1 | Full ImageNet Val | 70.78% | 70.49% | 0.29% |
+
+Please note that [Small](http://paddle-inference-dist.cdn.bcebos.com/int8/calibration_test_data.tar.gz "Small") is a subset of the [full ImageNet validation dataset](http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_val.tar "full ImageNet validation dataset").
+
+Notes:
+* The accuracy measurement requires a model with `label`.
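+* For reference, the three calibration steps above can be combined into a single
+  loop. A minimal sketch (here `model_path`, `int8_model`, `val_reader`, `image`
+  and `label` are illustrative placeholders, not part of the API):
+
+```python
+import paddle.fluid as fluid
+import paddle.fluid.contrib.int8_inference.utility as int8_utility
+
+# model_path, int8_model and val_reader below are placeholders for your own
+# FP32 model directory, INT8 output directory and validation data reader.
+exe = fluid.Executor(fluid.CPUPlace())
+[infer_program, feed_dict,
+ fetch_targets] = fluid.io.load_inference_model(model_path, exe)
+
+calibrator = int8_utility.Calibrator(
+    program=infer_program,
+    pretrained_model=model_path,
+    algo='KL',                          # or the default 'max'
+    exe=exe,
+    output=int8_model,
+    feed_var_names=feed_dict,
+    fetch_list=fetch_targets)
+
+for i, (image, label) in enumerate(val_reader()):
+    exe.run(infer_program,
+            feed={feed_dict[0]: image,
+                  feed_dict[1]: label},
+            fetch_list=fetch_targets)
+    calibrator.sample_data()            # sample after every executor run
+    if i + 1 == 50:                     # e.g. 50 calibration iterations
+        break
+
+calibrator.save_int8_model()            # writes the INT8 model to int8_model
+```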
+* The INT8 theoretical speedup is ~1.33X on Intel® Xeon® Skylake Server (please refer to `This allows for 4x more input at the cost of 3x more instructions or 33.33% more compute` in [Reference](https://software.intel.com/en-us/articles/lower-numerical-precision-deep-learning-inference-and-training "Reference")).
+
+## 4. How to reproduce the results
+* Small dataset
+```bash
+python python/paddle/fluid/contrib/tests/test_calibration.py
+```
+
+* Full dataset
+```bash
+DATASET=full python python/paddle/fluid/contrib/tests/test_calibration.py
+```
diff --git a/python/paddle/fluid/contrib/int8_inference/__init__.py b/python/paddle/fluid/contrib/int8_inference/__init__.py
index eca2dce114b069bf9b455d77ce670d73b5047fd2..45547201d598c809f7dcf3a1a09103ae5de3e4c6 100644
--- a/python/paddle/fluid/contrib/int8_inference/__init__.py
+++ b/python/paddle/fluid/contrib/int8_inference/__init__.py
@@ -11,3 +11,10 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
+from __future__ import print_function
+
+from . import utility
+from .utility import *
+
+__all__ = utility.__all__
diff --git a/python/paddle/fluid/contrib/int8_inference/utility.py b/python/paddle/fluid/contrib/int8_inference/utility.py
index 40de038f28a83738e6e6cd8c77c0a9916ce68b4f..b35d9f2424ccf093f70e75b13e23f6c5ad59e859 100644
--- a/python/paddle/fluid/contrib/int8_inference/utility.py
+++ b/python/paddle/fluid/contrib/int8_inference/utility.py
@@ -11,11 +11,15 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-import paddle.fluid.core as core
+
+from paddle.fluid import core
 import numpy as np
 import math
 import os
-import paddle.fluid as fluid
+from paddle.fluid.executor import global_scope
+from paddle.fluid import io
+
+__all__ = ['Calibrator']
 
 
 class Calibrator(object):
@@ -76,8 +80,7 @@ class Calibrator(object):
         '''
         for i in self.sampling_program.list_vars():
             if i.name in self.sampling_vars:
-                np_data = np.array(fluid.global_scope().find_var(i.name)
-                                   .get_tensor())
+                np_data = np.array(global_scope().find_var(i.name).get_tensor())
                 if i.name not in self._sampling_data:
                     self._sampling_data[i.name] = []
                 self._sampling_data[i.name].append(np_data)
@@ -86,9 +89,9 @@ class Calibrator(object):
         '''
         Save the quantized model to the disk.
''' - fluid.io.save_inference_model(self.output, self.feed_var_names, - self.fetch_list, self.exe, - self.sampling_program) + io.save_inference_model(self.output, self.feed_var_names, + self.fetch_list, self.exe, + self.sampling_program) def __display_debug(self): if self.debug: diff --git a/python/paddle/fluid/contrib/tests/test_calibration.py b/python/paddle/fluid/contrib/tests/test_calibration.py index cd6b7ba1661a4614b3b645a687e0a1eab3cb60f8..424ea245a0f2dff0d437ace386f2e4e0fa6b517d 100644 --- a/python/paddle/fluid/contrib/tests/test_calibration.py +++ b/python/paddle/fluid/contrib/tests/test_calibration.py @@ -24,8 +24,7 @@ import contextlib from paddle.dataset.common import download from PIL import Image, ImageEnhance import math -sys.path.append('..') -import int8_inference.utility as int8_utility +import paddle.fluid.contrib.int8_inference.utility as int8_utility random.seed(0) np.random.seed(0) diff --git a/python/paddle/fluid/io.py b/python/paddle/fluid/io.py index 6b1d4cc34f3cd40c878740f28618f26d5e89a6bd..95cc05ac7191783969ff0fbf286c17bd1cfd6c7d 100644 --- a/python/paddle/fluid/io.py +++ b/python/paddle/fluid/io.py @@ -21,9 +21,10 @@ import shutil import six from functools import reduce +from paddle.fluid import layers from paddle.fluid.executor import Executor from paddle.fluid.evaluator import Evaluator -from paddle.fluid.framework import Program, Parameter, default_main_program, default_startup_program, Variable +from paddle.fluid.framework import Program, Parameter, default_main_program, default_startup_program, Variable, program_guard from . import core __all__ = [ @@ -931,6 +932,17 @@ def save_inference_model(dirname, if main_program is None: main_program = default_main_program() + # fix the bug that the activation op's output as target will be pruned. + # will affect the inference performance. + # TODO(Superjomn) add an IR pass to remove 1-scale op. + with program_guard(main_program): + uniq_target_vars = [] + for var in target_vars: + if isinstance(var, Variable): + var1 = layers.scale(var, 1.) + uniq_target_vars.append(var1) + target_vars = uniq_target_vars + # when a pserver and a trainer running on the same machine, mkdir may conflict try: os.makedirs(dirname) diff --git a/python/paddle/fluid/layers/detection.py b/python/paddle/fluid/layers/detection.py index bf2f587e67099699e0f987ca248cf682ab0c22ce..c983e2a44b25c5943df5e822e2e363b2557a6ac3 100644 --- a/python/paddle/fluid/layers/detection.py +++ b/python/paddle/fluid/layers/detection.py @@ -49,6 +49,7 @@ __all__ = [ 'box_coder', 'polygon_box_transform', 'yolov3_loss', + 'box_clip', 'multiclass_nms', ] @@ -508,13 +509,10 @@ def yolov3_loss(x, gtbox, gtlabel, anchors, + anchor_mask, class_num, ignore_thresh, - loss_weight_xy=None, - loss_weight_wh=None, - loss_weight_conf_target=None, - loss_weight_conf_notarget=None, - loss_weight_class=None, + downsample_ratio, name=None): """ ${comment} @@ -526,16 +524,13 @@ def yolov3_loss(x, and x, y, w, h should be relative value of input image. N is the batch number and B is the max box number in an image. - gtlabel (Variable): class id of ground truth boxes, shoud be ins shape + gtlabel (Variable): class id of ground truth boxes, shoud be in shape of [N, B]. 
anchors (list|tuple): ${anchors_comment} + anchor_mask (list|tuple): ${anchor_mask_comment} class_num (int): ${class_num_comment} ignore_thresh (float): ${ignore_thresh_comment} - loss_weight_xy (float|None): ${loss_weight_xy_comment} - loss_weight_wh (float|None): ${loss_weight_wh_comment} - loss_weight_conf_target (float|None): ${loss_weight_conf_target_comment} - loss_weight_conf_notarget (float|None): ${loss_weight_conf_notarget_comment} - loss_weight_class (float|None): ${loss_weight_class_comment} + downsample_ratio (int): ${downsample_ratio_comment} name (string): the name of yolov3 loss Returns: @@ -555,9 +550,10 @@ def yolov3_loss(x, x = fluid.layers.data(name='x', shape=[255, 13, 13], dtype='float32') gtbox = fluid.layers.data(name='gtbox', shape=[6, 5], dtype='float32') gtlabel = fluid.layers.data(name='gtlabel', shape=[6, 1], dtype='int32') - anchors = [10, 13, 16, 30, 33, 23] - loss = fluid.layers.yolov3_loss(x=x, gtbox=gtbox, class_num=80 - anchors=anchors, ignore_thresh=0.5) + anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326] + anchors = [0, 1, 2] + loss = fluid.layers.yolov3_loss(x=x, gtbox=gtbox, class_num=80, anchors=anchors, + ignore_thresh=0.5, downsample_ratio=32) """ helper = LayerHelper('yolov3_loss', **locals()) @@ -569,6 +565,8 @@ def yolov3_loss(x, raise TypeError("Input gtlabel of yolov3_loss must be Variable") if not isinstance(anchors, list) and not isinstance(anchors, tuple): raise TypeError("Attr anchors of yolov3_loss must be list or tuple") + if not isinstance(anchor_mask, list) and not isinstance(anchor_mask, tuple): + raise TypeError("Attr anchor_mask of yolov3_loss must be list or tuple") if not isinstance(class_num, int): raise TypeError("Attr class_num of yolov3_loss must be an integer") if not isinstance(ignore_thresh, float): @@ -581,31 +579,29 @@ def yolov3_loss(x, loss = helper.create_variable( name=name, dtype=x.dtype, persistable=False) + objectness_mask = helper.create_variable_for_type_inference(dtype='int32') + gt_match_mask = helper.create_variable_for_type_inference(dtype='int32') + attrs = { "anchors": anchors, + "anchor_mask": anchor_mask, "class_num": class_num, "ignore_thresh": ignore_thresh, + "downsample_ratio": downsample_ratio, } - if loss_weight_xy is not None and isinstance(loss_weight_xy, float): - self.attrs['loss_weight_xy'] = loss_weight_xy - if loss_weight_wh is not None and isinstance(loss_weight_wh, float): - self.attrs['loss_weight_wh'] = loss_weight_wh - if loss_weight_conf_target is not None and isinstance( - loss_weight_conf_target, float): - self.attrs['loss_weight_conf_target'] = loss_weight_conf_target - if loss_weight_conf_notarget is not None and isinstance( - loss_weight_conf_notarget, float): - self.attrs['loss_weight_conf_notarget'] = loss_weight_conf_notarget - if loss_weight_class is not None and isinstance(loss_weight_class, float): - self.attrs['loss_weight_class'] = loss_weight_class - helper.append_op( type='yolov3_loss', - inputs={"X": x, - "GTBox": gtbox, - "GTLabel": gtlabel}, - outputs={'Loss': loss}, + inputs={ + "X": x, + "GTBox": gtbox, + "GTLabel": gtlabel, + }, + outputs={ + 'Loss': loss, + 'ObjectnessMask': objectness_mask, + 'GTMatchMask': gt_match_mask + }, attrs=attrs) return loss @@ -2060,6 +2056,54 @@ def generate_proposals(scores, return rpn_rois, rpn_roi_probs +def box_clip(input, im_info, name=None): + """ + Clip the box into the size given by im_info + For each input box, The formula is given as follows: + + .. 
code-block:: text + + xmin = max(min(xmin, im_w - 1), 0) + ymin = max(min(ymin, im_h - 1), 0) + xmax = max(min(xmax, im_w - 1), 0) + ymax = max(min(ymax, im_h - 1), 0) + + where im_w and im_h are computed from im_info: + + .. code-block:: text + + im_h = round(height / scale) + im_w = round(weight / scale) + + Args: + input(variable): The input box, the last dimension is 4. + im_info(variable): The information of image with shape [N, 3] with + layout (height, width, scale). height and width + is the input size and scale is the ratio of input + size and original size. + name (str): The name of this layer. It is optional. + + Returns: + Variable: The cliped tensor variable. + + Examples: + .. code-block:: python + + boxes = fluid.layers.data( + name='data', shape=[8, 4], dtype='float32', lod_level=1) + im_info = fluid.layers.data(name='im_info', shape=[3]) + out = fluid.layers.box_clip( + input=boxes, im_info=im_info, inplace=True) + """ + + helper = LayerHelper("box_clip", **locals()) + output = helper.create_variable_for_type_inference(dtype=input.dtype) + inputs = {"Input": input, "ImInfo": im_info} + helper.append_op(type="box_clip", inputs=inputs, outputs={"Output": output}) + + return output + + def multiclass_nms(bboxes, scores, score_threshold, @@ -2137,9 +2181,11 @@ def multiclass_nms(bboxes, (After version 1.3, when no boxes detected, the lod is changed from {0} to {1}) + Examples: .. code-block:: python + boxes = fluid.layers.data(name='bboxes', shape=[81, 4], dtype='float32', lod_level=1) scores = fluid.layers.data(name='scores', shape=[81], diff --git a/python/paddle/fluid/tests/test_detection.py b/python/paddle/fluid/tests/test_detection.py index 3e71584a5d1b1a42b2b916d2df33a55c34557046..0d39a139eed87f900b1f59fd0569b6acaec0962b 100644 --- a/python/paddle/fluid/tests/test_detection.py +++ b/python/paddle/fluid/tests/test_detection.py @@ -476,12 +476,23 @@ class TestYoloDetection(unittest.TestCase): x = layers.data(name='x', shape=[30, 7, 7], dtype='float32') gtbox = layers.data(name='gtbox', shape=[10, 4], dtype='float32') gtlabel = layers.data(name='gtlabel', shape=[10], dtype='int32') - loss = layers.yolov3_loss(x, gtbox, gtlabel, [10, 13, 30, 13], 10, - 0.5) + loss = layers.yolov3_loss(x, gtbox, gtlabel, [10, 13, 30, 13], + [0, 1], 10, 0.7, 32) self.assertIsNotNone(loss) +class TestBoxClip(unittest.TestCase): + def test_box_clip(self): + program = Program() + with program_guard(program): + input_box = layers.data( + name='input_box', shape=[7, 4], dtype='float32', lod_level=1) + im_info = layers.data(name='im_info', shape=[3], dtype='float32') + out = layers.box_clip(input_box, im_info) + self.assertIsNotNone(out) + + class TestMulticlassNMS(unittest.TestCase): def test_multiclass_nms(self): program = Program() diff --git a/python/paddle/fluid/tests/unittests/ngraph/test_conv2d_ngraph_op.py b/python/paddle/fluid/tests/unittests/ngraph/test_conv2d_ngraph_op.py new file mode 100644 index 0000000000000000000000000000000000000000..e5424e8a6e615820b4a1a5f2ee7e7e87dd0b22af --- /dev/null +++ b/python/paddle/fluid/tests/unittests/ngraph/test_conv2d_ngraph_op.py @@ -0,0 +1,52 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import print_function + +import unittest +from paddle.fluid.tests.unittests.test_conv2d_op import * + + +class TestNGRAPH(TestConv2dOp): + def init_kernel_type(self): + super(TestNGRAPH, self).init_kernel_type() + + +class TestNGRAPHWithPad(TestWithPad): + def init_kernel_type(self): + super(TestNGRAPHWithPad, self).init_kernel_type() + + +class TestNGRAPHWithStride(TestWithStride): + def init_kernel_type(self): + super(TestNGRAPHWithStride, self).init_kernel_type() + + +class TestNGRAPHWithGroup(TestWithGroup): + def init_kernel_type(self): + super(TestNGRAPHWithGroup, self).init_kernel_type() + + +class TestNGRAPHWith1x1(TestWith1x1): + def init_kernel_type(self): + super(TestNGRAPHWith1x1, self).init_kernel_type() + + +class TestNGRAPHWithInput1x1Filter1x1(TestWithInput1x1Filter1x1): + def init_kernel_type(self): + super(TestNGRAPHWithInput1x1Filter1x1, self).init_kernel_type() + + +if __name__ == '__main__': + unittest.main() diff --git a/python/paddle/fluid/tests/unittests/ngraph/test_pool2d_ngraph_op.py b/python/paddle/fluid/tests/unittests/ngraph/test_pool2d_ngraph_op.py new file mode 100644 index 0000000000000000000000000000000000000000..95e592e8ec036ad231ed57ddbc706683cb7aa153 --- /dev/null +++ b/python/paddle/fluid/tests/unittests/ngraph/test_pool2d_ngraph_op.py @@ -0,0 +1,51 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
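+
+# The classes below simply subclass the generic pool2d operator tests
+# (TestPool2D_Op and TestCase1-5 from test_pool2d_op), overriding only the
+# init hooks, so that the same forward/backward checks also cover the
+# nGraph build.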
+ +from __future__ import print_function + +from paddle.fluid.tests.unittests.test_pool2d_op import * + + +class TestNGRAPHPool2D_Op(TestPool2D_Op): + def init_test_case(self): + super(TestNGRAPHPool2D_Op, self).init_test_case() + + +class TestNGRAPHCase1(TestCase1): + def init_test_case(self): + super(TestNGRAPHCase1, self).init_test_case() + + +class TestNGRAPHCase2(TestCase2): + def init_test_case(self): + super(TestNGRAPHCase2, self).init_test_case() + + +class TestNGRAPHCase3(TestCase3): + def init_pool_type(self): + super(TestNGRAPHCase3, self).init_pool_type() + + +class TestNGRAPHCase4(TestCase4): + def init_pool_type(self): + super(TestNGRAPHCase4, self).init_pool_type() + + +class TestNGRAPHCase5(TestCase5): + def init_pool_type(self): + super(TestNGRAPHCase5, self).init_pool_type() + + +if __name__ == '__main__': + unittest.main() diff --git a/python/paddle/fluid/tests/unittests/test_box_clip_op.py b/python/paddle/fluid/tests/unittests/test_box_clip_op.py new file mode 100644 index 0000000000000000000000000000000000000000..b2b0598f31dd27e12e5ce329129129b5e0f1caf0 --- /dev/null +++ b/python/paddle/fluid/tests/unittests/test_box_clip_op.py @@ -0,0 +1,70 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
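The unit test that follows validates the new `box_clip` operator against a NumPy reference. The rule it implements is the one stated in the layer docstring earlier in this patch: each coordinate is clamped to `[0, im_w - 1]` x `[0, im_h - 1]`, where `im_h = round(height / scale)` and `im_w = round(width / scale)` come from `im_info`. A small stand-alone sketch of that rule on one box (the numbers are illustrative only):

```python
import numpy as np

# One image, encoded as im_info = [height, width, scale].
im_info = np.array([5., 8., 1.])
im_h = round(im_info[0] / im_info[2])  # 5
im_w = round(im_info[1] / im_info[2])  # 8

# A box in (xmin, ymin, xmax, ymax) layout that sticks out of the image.
box = np.array([-2.0, 1.0, 9.5, 4.0])

clipped = np.array([
    max(min(box[0], im_w - 1), 0),  # xmin -> 0.0
    max(min(box[1], im_h - 1), 0),  # ymin -> 1.0
    max(min(box[2], im_w - 1), 0),  # xmax -> 7.0
    max(min(box[3], im_h - 1), 0),  # ymax -> 4.0
])
print(clipped)  # [0. 1. 7. 4.]
```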
+ +from __future__ import print_function + +import unittest +import numpy as np +import sys +import math +from op_test import OpTest +import copy + + +def box_clip(input_box, im_info, output_box): + im_w = round(im_info[1] / im_info[2]) + im_h = round(im_info[0] / im_info[2]) + output_box[:, :, 0] = np.maximum( + np.minimum(input_box[:, :, 0], im_w - 1), 0) + output_box[:, :, 1] = np.maximum( + np.minimum(input_box[:, :, 1], im_h - 1), 0) + output_box[:, :, 2] = np.maximum( + np.minimum(input_box[:, :, 2], im_w - 1), 0) + output_box[:, :, 3] = np.maximum( + np.minimum(input_box[:, :, 3], im_h - 1), 0) + + +def batch_box_clip(input_boxes, im_info, lod): + n = input_boxes.shape[0] + m = input_boxes.shape[1] + output_boxes = np.zeros((n, m, 4), dtype=np.float32) + cur_offset = 0 + for i in range(len(lod)): + box_clip(input_boxes[cur_offset:(cur_offset + lod[i]), :, :], + im_info[i, :], + output_boxes[cur_offset:(cur_offset + lod[i]), :, :]) + cur_offset += lod[i] + return output_boxes + + +class TestBoxClipOp(OpTest): + def test_check_output(self): + self.check_output() + + def setUp(self): + self.op_type = "box_clip" + lod = [[1, 2, 3]] + input_boxes = np.random.random((6, 10, 4)) * 5 + im_info = np.array([[5, 8, 1.], [6, 6, 1.], [7, 5, 1.]]) + output_boxes = batch_box_clip(input_boxes, im_info, lod[0]) + + self.inputs = { + 'Input': (input_boxes.astype('float32'), lod), + 'ImInfo': im_info.astype('float32'), + } + self.outputs = {'Output': output_boxes} + + +if __name__ == '__main__': + unittest.main() diff --git a/python/paddle/fluid/tests/unittests/test_imperative_ptb_rnn.py b/python/paddle/fluid/tests/unittests/test_imperative_ptb_rnn.py index 5877e91f92e642e69265104c6728cd9bd41c41cd..afe990e74ff96dfbca4f335b561f9bbe7d295246 100644 --- a/python/paddle/fluid/tests/unittests/test_imperative_ptb_rnn.py +++ b/python/paddle/fluid/tests/unittests/test_imperative_ptb_rnn.py @@ -268,9 +268,6 @@ class TestImperativePtbRnn(unittest.TestCase): sgd.minimize(dy_loss) for param in ptb_model.parameters(): dy_param_updated[param.name] = param._numpy() - # print("dy_loss is {}".format(dy_loss._numpy())) - # print("last_hidden is {}".format(last_hidden._numpy())) - # print("last_cell is {}".format(last_cell._numpy())) with new_program_scope(): fluid.default_startup_program().random_seed = seed diff --git a/python/paddle/fluid/tests/unittests/test_inference_model_io.py b/python/paddle/fluid/tests/unittests/test_inference_model_io.py index 9962702f69644b7aef7d868f086abb390441f617..3b54827dd2e5ba177cb1a91019581c3fb6f63bb5 100644 --- a/python/paddle/fluid/tests/unittests/test_inference_model_io.py +++ b/python/paddle/fluid/tests/unittests/test_inference_model_io.py @@ -82,7 +82,8 @@ class TestBook(unittest.TestCase): self.assertEqual(feed_var_names, ["x", "y"]) self.assertEqual(len(fetch_vars), 1) - self.assertEqual(str(fetch_vars[0]), str(avg_cost)) + print("fetch %s" % str(fetch_vars[0])) + self.assertTrue("scale" in str(fetch_vars[0])) self.assertEqual(expected, actual) diff --git a/python/paddle/fluid/tests/unittests/test_peak_gpumem_monitor.py b/python/paddle/fluid/tests/unittests/test_peak_gpumem_monitor.py new file mode 100644 index 0000000000000000000000000000000000000000..3673fd10c4d14ba8d7a9644dcc024f58cf92a099 --- /dev/null +++ b/python/paddle/fluid/tests/unittests/test_peak_gpumem_monitor.py @@ -0,0 +1,59 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import print_function + +import unittest +import os +os.environ['FLAGS_benchmark'] = 'True' + +import numpy +import paddle.fluid.core as core +from paddle.fluid.executor import Executor +from paddle.fluid.layers import mul, data + + +class TestPeakMemoryMonitoring(unittest.TestCase): + def test_mul(self): + + a = data(name='a', shape=[784], dtype='float32') + b = data( + name='b', + shape=[784, 100], + dtype='float32', + append_batch_size=False) + out = mul(x=a, y=b) + + if core.is_compiled_with_cuda(): + place = core.CUDAPlace(0) + + a_np = numpy.random.random((100, 784)).astype('float32') + b_np = numpy.random.random((784, 100)).astype('float32') + self.assertEqual(0, core.get_mem_usage(0)) + exe = Executor(place) + outs = exe.run(feed={'a': a_np, 'b': b_np}, fetch_list=[out]) + out = outs[0] + #disable this assert since ctest will ignore the os.environ setting + #self.assertGreater(core.get_mem_usage(0), 0) + + raised = False + try: + core.print_mem_usage() + except: + raised = True + self.assertFalse(raised, 'Exception raised') + + +if __name__ == '__main__': + unittest.main() diff --git a/python/paddle/fluid/tests/unittests/test_yolov3_loss_op.py b/python/paddle/fluid/tests/unittests/test_yolov3_loss_op.py index 544fe4b4f81909b69a05d9751316e3d3137fdc45..020c1139230a9177c4d7765367359d91839d7d46 100644 --- a/python/paddle/fluid/tests/unittests/test_yolov3_loss_op.py +++ b/python/paddle/fluid/tests/unittests/test_yolov3_loss_op.py @@ -16,174 +16,179 @@ from __future__ import division import unittest import numpy as np +from scipy.special import logit +from scipy.special import expit from op_test import OpTest from paddle.fluid import core -def sigmoid(x): - return 1.0 / (1.0 + np.exp(-1.0 * x)) +def l2loss(x, y): + return 0.5 * (y - x) * (y - x) -def mse(x, y, num): - return ((y - x)**2).sum() / num +def sce(x, label): + sigmoid_x = expit(x) + term1 = label * np.log(sigmoid_x) + term2 = (1.0 - label) * np.log(1.0 - sigmoid_x) + return -term1 - term2 -def bce(x, y, mask): - x = x.reshape((-1)) - y = y.reshape((-1)) - mask = mask.reshape((-1)) +def sigmoid(x): + return 1.0 / (1.0 + np.exp(-1.0 * x)) - error_sum = 0.0 - count = 0 - for i in range(x.shape[0]): - if mask[i] > 0: - error_sum += y[i] * np.log(x[i]) + (1 - y[i]) * np.log(1 - x[i]) - count += 1 - return error_sum / (-1.0 * count) +def batch_xywh_box_iou(box1, box2): + b1_left = box1[:, :, 0] - box1[:, :, 2] / 2 + b1_right = box1[:, :, 0] + box1[:, :, 2] / 2 + b1_top = box1[:, :, 1] - box1[:, :, 3] / 2 + b1_bottom = box1[:, :, 1] + box1[:, :, 3] / 2 -def box_iou(box1, box2): - b1_x1 = box1[0] - box1[2] / 2 - b1_x2 = box1[0] + box1[2] / 2 - b1_y1 = box1[1] - box1[3] / 2 - b1_y2 = box1[1] + box1[3] / 2 - b2_x1 = box2[0] - box2[2] / 2 - b2_x2 = box2[0] + box2[2] / 2 - b2_y1 = box2[1] - box2[3] / 2 - b2_y2 = box2[1] + box2[3] / 2 + b2_left = box2[:, :, 0] - box2[:, :, 2] / 2 + b2_right = box2[:, :, 0] + box2[:, :, 2] / 2 + b2_top = box2[:, :, 1] - box2[:, :, 3] / 2 + 
b2_bottom = box2[:, :, 1] + box2[:, :, 3] / 2 - b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1) - b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) + left = np.maximum(b1_left[:, :, np.newaxis], b2_left[:, np.newaxis, :]) + right = np.minimum(b1_right[:, :, np.newaxis], b2_right[:, np.newaxis, :]) + top = np.maximum(b1_top[:, :, np.newaxis], b2_top[:, np.newaxis, :]) + bottom = np.minimum(b1_bottom[:, :, np.newaxis], + b2_bottom[:, np.newaxis, :]) - inter_rect_x1 = max(b1_x1, b2_x1) - inter_rect_y1 = max(b1_y1, b2_y1) - inter_rect_x2 = min(b1_x2, b2_x2) - inter_rect_y2 = min(b1_y2, b2_y2) - inter_area = max(inter_rect_x2 - inter_rect_x1, 0) * max( - inter_rect_y2 - inter_rect_y1, 0) + inter_w = np.clip(right - left, 0., 1.) + inter_h = np.clip(bottom - top, 0., 1.) + inter_area = inter_w * inter_h - return inter_area / (b1_area + b2_area + inter_area) + b1_area = (b1_right - b1_left) * (b1_bottom - b1_top) + b2_area = (b2_right - b2_left) * (b2_bottom - b2_top) + union = b1_area[:, :, np.newaxis] + b2_area[:, np.newaxis, :] - inter_area + return inter_area / union -def build_target(gtboxs, gtlabel, attrs, grid_size): - n, b, _ = gtboxs.shape - ignore_thresh = attrs["ignore_thresh"] - anchors = attrs["anchors"] - class_num = attrs["class_num"] - an_num = len(anchors) // 2 - obj_mask = np.zeros((n, an_num, grid_size, grid_size)).astype('float32') - noobj_mask = np.ones((n, an_num, grid_size, grid_size)).astype('float32') - tx = np.zeros((n, an_num, grid_size, grid_size)).astype('float32') - ty = np.zeros((n, an_num, grid_size, grid_size)).astype('float32') - tw = np.zeros((n, an_num, grid_size, grid_size)).astype('float32') - th = np.zeros((n, an_num, grid_size, grid_size)).astype('float32') - tconf = np.zeros((n, an_num, grid_size, grid_size)).astype('float32') - tcls = np.zeros( - (n, an_num, grid_size, grid_size, class_num)).astype('float32') +def YOLOv3Loss(x, gtbox, gtlabel, attrs): + n, c, h, w = x.shape + b = gtbox.shape[1] + anchors = attrs['anchors'] + an_num = len(anchors) // 2 + anchor_mask = attrs['anchor_mask'] + mask_num = len(anchor_mask) + class_num = attrs["class_num"] + ignore_thresh = attrs['ignore_thresh'] + downsample = attrs['downsample'] + input_size = downsample * h + x = x.reshape((n, mask_num, 5 + class_num, h, w)).transpose((0, 1, 3, 4, 2)) + loss = np.zeros((n)).astype('float32') + + pred_box = x[:, :, :, :, :4].copy() + grid_x = np.tile(np.arange(w).reshape((1, w)), (h, 1)) + grid_y = np.tile(np.arange(h).reshape((h, 1)), (1, w)) + pred_box[:, :, :, :, 0] = (grid_x + sigmoid(pred_box[:, :, :, :, 0])) / w + pred_box[:, :, :, :, 1] = (grid_y + sigmoid(pred_box[:, :, :, :, 1])) / h + + x[:, :, :, :, 5:] = np.where(x[:, :, :, :, 5:] < -0.5, x[:, :, :, :, 5:], + np.ones_like(x[:, :, :, :, 5:]) * 1.0 / + class_num) + + mask_anchors = [] + for m in anchor_mask: + mask_anchors.append((anchors[2 * m], anchors[2 * m + 1])) + anchors_s = np.array( + [(an_w / input_size, an_h / input_size) for an_w, an_h in mask_anchors]) + anchor_w = anchors_s[:, 0:1].reshape((1, mask_num, 1, 1)) + anchor_h = anchors_s[:, 1:2].reshape((1, mask_num, 1, 1)) + pred_box[:, :, :, :, 2] = np.exp(pred_box[:, :, :, :, 2]) * anchor_w + pred_box[:, :, :, :, 3] = np.exp(pred_box[:, :, :, :, 3]) * anchor_h + + pred_box = pred_box.reshape((n, -1, 4)) + pred_obj = x[:, :, :, :, 4].reshape((n, -1)) + objness = np.zeros(pred_box.shape[:2]).astype('float32') + ious = batch_xywh_box_iou(pred_box, gtbox) + ious_max = np.max(ious, axis=-1) + objness = np.where(ious_max > ignore_thresh, -np.ones_like(objness), + objness) 
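+    # At this point `objness` is -1 for every predicted box whose best IoU with
+    # any ground-truth box exceeds `ignore_thresh`; such entries are skipped by
+    # the objectness loss at the end of this function, which only penalises
+    # targets that are exactly 0 or 1 (matched cells are set back to 1.0 below).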
+ + gtbox_shift = gtbox.copy() + gtbox_shift[:, :, 0] = 0 + gtbox_shift[:, :, 1] = 0 + + anchors = [(anchors[2 * i], anchors[2 * i + 1]) for i in range(0, an_num)] + anchors_s = np.array( + [(an_w / input_size, an_h / input_size) for an_w, an_h in anchors]) + anchor_boxes = np.concatenate( + [np.zeros_like(anchors_s), anchors_s], axis=-1) + anchor_boxes = np.tile(anchor_boxes[np.newaxis, :, :], (n, 1, 1)) + ious = batch_xywh_box_iou(gtbox_shift, anchor_boxes) + iou_matches = np.argmax(ious, axis=-1) + gt_matches = iou_matches.copy() for i in range(n): for j in range(b): - if gtboxs[i, j, :].sum() == 0: + if gtbox[i, j, 2:].sum() == 0: + gt_matches[i, j] = -1 continue + if iou_matches[i, j] not in anchor_mask: + gt_matches[i, j] = -1 + continue + an_idx = anchor_mask.index(iou_matches[i, j]) + gt_matches[i, j] = an_idx + gi = int(gtbox[i, j, 0] * w) + gj = int(gtbox[i, j, 1] * h) - gt_label = gtlabel[i, j] - gx = gtboxs[i, j, 0] * grid_size - gy = gtboxs[i, j, 1] * grid_size - gw = gtboxs[i, j, 2] * grid_size - gh = gtboxs[i, j, 3] * grid_size - - gi = int(gx) - gj = int(gy) - - gtbox = [0, 0, gw, gh] - max_iou = 0 - for k in range(an_num): - anchor_box = [0, 0, anchors[2 * k], anchors[2 * k + 1]] - iou = box_iou(gtbox, anchor_box) - if iou > max_iou: - max_iou = iou - best_an_index = k - if iou > ignore_thresh: - noobj_mask[i, best_an_index, gj, gi] = 0 - - obj_mask[i, best_an_index, gj, gi] = 1 - noobj_mask[i, best_an_index, gj, gi] = 0 - tx[i, best_an_index, gj, gi] = gx - gi - ty[i, best_an_index, gj, gi] = gy - gj - tw[i, best_an_index, gj, gi] = np.log(gw / anchors[2 * - best_an_index]) - th[i, best_an_index, gj, gi] = np.log( - gh / anchors[2 * best_an_index + 1]) - tconf[i, best_an_index, gj, gi] = 1 - tcls[i, best_an_index, gj, gi, gt_label] = 1 - - return (tx, ty, tw, th, tconf, tcls, obj_mask, noobj_mask) - - -def YoloV3Loss(x, gtbox, gtlabel, attrs): - n, c, h, w = x.shape - an_num = len(attrs['anchors']) // 2 - class_num = attrs["class_num"] - x = x.reshape((n, an_num, 5 + class_num, h, w)).transpose((0, 1, 3, 4, 2)) - pred_x = sigmoid(x[:, :, :, :, 0]) - pred_y = sigmoid(x[:, :, :, :, 1]) - pred_w = x[:, :, :, :, 2] - pred_h = x[:, :, :, :, 3] - pred_conf = sigmoid(x[:, :, :, :, 4]) - pred_cls = sigmoid(x[:, :, :, :, 5:]) - - tx, ty, tw, th, tconf, tcls, obj_mask, noobj_mask = build_target( - gtbox, gtlabel, attrs, x.shape[2]) - - obj_mask_expand = np.tile( - np.expand_dims(obj_mask, 4), (1, 1, 1, 1, int(attrs['class_num']))) - loss_x = mse(pred_x * obj_mask, tx * obj_mask, obj_mask.sum()) - loss_y = mse(pred_y * obj_mask, ty * obj_mask, obj_mask.sum()) - loss_w = mse(pred_w * obj_mask, tw * obj_mask, obj_mask.sum()) - loss_h = mse(pred_h * obj_mask, th * obj_mask, obj_mask.sum()) - loss_conf_target = bce(pred_conf * obj_mask, tconf * obj_mask, obj_mask) - loss_conf_notarget = bce(pred_conf * noobj_mask, tconf * noobj_mask, - noobj_mask) - loss_class = bce(pred_cls * obj_mask_expand, tcls * obj_mask_expand, - obj_mask_expand) - - return attrs['loss_weight_xy'] * (loss_x + loss_y) \ - + attrs['loss_weight_wh'] * (loss_w + loss_h) \ - + attrs['loss_weight_conf_target'] * loss_conf_target \ - + attrs['loss_weight_conf_notarget'] * loss_conf_notarget \ - + attrs['loss_weight_class'] * loss_class + tx = gtbox[i, j, 0] * w - gi + ty = gtbox[i, j, 1] * w - gj + tw = np.log(gtbox[i, j, 2] * input_size / mask_anchors[an_idx][0]) + th = np.log(gtbox[i, j, 3] * input_size / mask_anchors[an_idx][1]) + scale = (2.0 - gtbox[i, j, 2] * gtbox[i, j, 3]) + loss[i] += sce(x[i, an_idx, gj, gi, 
0], tx) * scale + loss[i] += sce(x[i, an_idx, gj, gi, 1], ty) * scale + loss[i] += l2loss(x[i, an_idx, gj, gi, 2], tw) * scale + loss[i] += l2loss(x[i, an_idx, gj, gi, 3], th) * scale + + objness[i, an_idx * h * w + gj * w + gi] = 1.0 + + for label_idx in range(class_num): + loss[i] += sce(x[i, an_idx, gj, gi, 5 + label_idx], + float(label_idx == gtlabel[i, j])) + + for j in range(mask_num * h * w): + if objness[i, j] > 0: + loss[i] += sce(pred_obj[i, j], 1.0) + elif objness[i, j] == 0: + loss[i] += sce(pred_obj[i, j], 0.0) + + return (loss, objness.reshape((n, mask_num, h, w)).astype('float32'), \ + gt_matches.astype('int32')) class TestYolov3LossOp(OpTest): def setUp(self): - self.loss_weight_xy = 1.0 - self.loss_weight_wh = 1.0 - self.loss_weight_conf_target = 1.0 - self.loss_weight_conf_notarget = 1.0 - self.loss_weight_class = 1.0 self.initTestCase() self.op_type = 'yolov3_loss' - x = np.random.random(size=self.x_shape).astype('float32') + x = logit(np.random.uniform(0, 1, self.x_shape).astype('float32')) gtbox = np.random.random(size=self.gtbox_shape).astype('float32') - gtlabel = np.random.randint(0, self.class_num, - self.gtbox_shape[:2]).astype('int32') + gtlabel = np.random.randint(0, self.class_num, self.gtbox_shape[:2]) + gtmask = np.random.randint(0, 2, self.gtbox_shape[:2]) + gtbox = gtbox * gtmask[:, :, np.newaxis] + gtlabel = gtlabel * gtmask self.attrs = { "anchors": self.anchors, + "anchor_mask": self.anchor_mask, "class_num": self.class_num, "ignore_thresh": self.ignore_thresh, - "loss_weight_xy": self.loss_weight_xy, - "loss_weight_wh": self.loss_weight_wh, - "loss_weight_conf_target": self.loss_weight_conf_target, - "loss_weight_conf_notarget": self.loss_weight_conf_notarget, - "loss_weight_class": self.loss_weight_class, + "downsample": self.downsample, } - self.inputs = {'X': x, 'GTBox': gtbox, 'GTLabel': gtlabel} + self.inputs = { + 'X': x, + 'GTBox': gtbox.astype('float32'), + 'GTLabel': gtlabel.astype('int32'), + } + loss, objness, gt_matches = YOLOv3Loss(x, gtbox, gtlabel, self.attrs) self.outputs = { - 'Loss': np.array( - [YoloV3Loss(x, gtbox, gtlabel, self.attrs)]).astype('float32') + 'Loss': loss, + 'ObjectnessMask': objness, + "GTMatchMask": gt_matches } def test_check_output(self): @@ -196,19 +201,16 @@ class TestYolov3LossOp(OpTest): place, ['X'], 'Loss', no_grad_set=set(["GTBox", "GTLabel"]), - max_relative_error=0.06) + max_relative_error=0.3) def initTestCase(self): - self.anchors = [10, 13, 12, 12] - self.class_num = 10 + self.anchors = [10, 13, 16, 30, 33, 23] + self.anchor_mask = [1, 2] + self.class_num = 5 self.ignore_thresh = 0.5 - self.x_shape = (5, len(self.anchors) // 2 * (5 + self.class_num), 7, 7) - self.gtbox_shape = (5, 10, 4) - self.loss_weight_xy = 2.5 - self.loss_weight_wh = 0.8 - self.loss_weight_conf_target = 1.5 - self.loss_weight_conf_notarget = 0.5 - self.loss_weight_class = 1.2 + self.downsample = 32 + self.x_shape = (3, len(self.anchor_mask) * (5 + self.class_num), 5, 5) + self.gtbox_shape = (3, 5, 4) if __name__ == "__main__": diff --git a/python/paddle/fluid/transpiler/details/__init__.py b/python/paddle/fluid/transpiler/details/__init__.py index f33c05ed2f48c2498b98fc486d6ff7471088d77e..82d0d336e523ec48c5ceca3b92ff0963c4499123 100644 --- a/python/paddle/fluid/transpiler/details/__init__.py +++ b/python/paddle/fluid/transpiler/details/__init__.py @@ -17,3 +17,4 @@ from __future__ import print_function from .program_utils import * from .ufind import * from .checkport import * +from .vars_distributed import * diff --git 
a/python/paddle/fluid/transpiler/details/vars_distributed.py b/python/paddle/fluid/transpiler/details/vars_distributed.py new file mode 100644 index 0000000000000000000000000000000000000000..05e7f6e3e706376efc8af870a780d96c45642514 --- /dev/null +++ b/python/paddle/fluid/transpiler/details/vars_distributed.py @@ -0,0 +1,269 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import print_function +from paddle.fluid.framework import Variable + + +class VarStruct(object): + """ + record part properties of a Variable in python. + """ + + def __init__(self, name, shape, dtype, type, lod_level, persistable): + self.name = name + self.shape = shape + self.dtype = dtype + self.type = type + self.lod_level = lod_level + self.persistable = persistable + + +class VarDistributed(object): + """ + a class to record the var distributed on parameter servers. + the class will record the relationship between origin var and slice var. + the slice var's properties, such as type/shape/offset/endpoint. + """ + + def __init__(self, + origin_var, + slice_var, + is_slice=None, + block_id=None, + offset=None, + vtype=None, + endpoint=None): + """ + Args: + origin_var(Variable|VarStruct): origin var properties + slice_var(Variable|VarStruct): slice var properties + is_slice(bool|None): slice or not, slice_var=True/False and its block size > 8192 are the judgement standard. + block_id(int|None): the number about the slice var. + offset(int|None): if the slice var is sliced, offset is the numel before the var. + vtype(str|None): a tag, such as Optimizer/Param/RemoteProfetch. + endpoint(str|None): which parameter the slice var on, such as "127.0.0.1:1001" + """ + + if isinstance(origin_var, Variable): + self.origin = self.__create_var_struct(origin_var) + else: + self.origin = origin_var + + if isinstance(slice_var, Variable): + self.slice = self.__create_var_struct(slice_var) + else: + self.slice = slice_var + + if self.equal(self.origin, self.slice): + self.is_slice = False + self.block_id = 0 + self.offset = 0 + else: + self.is_slice = True + self.block_id = 0 + self.offset = 0 + + if is_slice is not None: + self.is_slice = is_slice + if block_id is not None: + self.block_id = block_id + if offset is not None: + self.offset = offset + + self.vtype = vtype + self.endpoint = endpoint + + @staticmethod + def __create_var_struct(var): + return VarStruct(var.name, var.shape, var.dtype, var.type, + var.lod_level, var.persistable) + + @staticmethod + def equal(var1, var2): + """ + the two var is equal or not. + Returns: + bool: equal will return True else False + """ + assert isinstance(var1, VarStruct) and isinstance(var2, VarStruct) + + return var1.name == var2.name and \ + var1.type == var2.type and \ + var1.shape == var2.shape and \ + var1.dtype == var2.dtype and \ + var1.lod_level == var2.lod_level and \ + var1.persistable == var2.persistable + + def __str__(self): + origin_var_str = "{name} : fluid.{type}.shape{shape}.astype({dtype})". 
\ + format(i="{", e="}", name=self.origin.name, type=self.origin.type, + shape=self.origin.shape, dtype=self.origin.dtype) + + slice_var_str = "{name} : fluid.{type}.shape{shape}.astype({dtype})" \ + ".slice({is_slice}).block({block_id}).offset({offset})". \ + format(i="{", e="}", name=self.slice.name, type=self.slice.type, + shape=self.slice.shape, dtype=self.slice.dtype, + is_slice=self.is_slice, block_id=self.block_id, offset=self.offset) + + return "var owned: {}, origin var: ( {} ), slice var: ( {} ), endpoint: {} ".format( + self.vtype, origin_var_str, slice_var_str, self.endpoint) + + +class VarsDistributed(object): + """ + a gather about VarDistributed with many methods to find distributed vars. + through the class, we can get overview about the distributed parameters on parameter servers. + this class may centralized and convenient for developer to manage and get variable's distribute. + other module can also use this to find variables such io.py. + """ + + def __init__(self): + self.distributed_vars = [] + + def add_distributed_var(self, + origin_var, + slice_var, + is_slice=None, + block_id=None, + offset=None, + vtype=None, + endpoint=None): + """ + add distributed var in this. + + Args: + origin_var(Variable|VarStruct): origin var properties + slice_var(Variable|VarStruct): slice var properties + is_slice(bool|None): slice or not, slice_var=True/False and its block size > 8192 are the judgement standard. + block_id(int|None): the number about the slice var. + offset(int|None): if the slice var is sliced, offset is the numel before the var. + vtype(str|None): a tag, such as Optimizer/Param/RemoteProfetch. + endpoint(str|None): which parameter the slice var on, such as "127.0.0.1:1001" + Returns: + None + """ + self.distributed_vars.append( + VarDistributed(origin_var, slice_var, is_slice, block_id, offset, + vtype, endpoint)) + + def get_distributed_var_by_slice(self, var_name): + """ + get distributed var by conditions. + + Args: + var_name(str): slice var name, such as "w.traier0.block1" + Returns: + VarDistributed: distributed var. + """ + for dist_var in self.distributed_vars: + if dist_var.slice.name == var_name: + return dist_var + return None + + @staticmethod + def equal(var1, var2): + """ + the two var is equal or not. + Returns: + bool: equal will return True else False + """ + return var1.name == var2.name and \ + var1.type == var2.type and \ + var1.shape == var2.shape and \ + var1.dtype == var2.dtype and \ + var1.lod_level == var2.lod_level and \ + var1.persistable == var2.persistable + + def get_distributed_var_by_origin_and_ep(self, origin_var_name, endpoint): + """ + get distributed var by conditions. + + Args: + origin_var_name(str): + endpoint(str): the parameter endpoint, such as "127.0.0.1:1001" + Returns: + VarDistributed: distributed var. + """ + for dist_var in self.distributed_vars: + if dist_var.origin.name == origin_var_name and dist_var.endpoint == endpoint: + return dist_var + return None + + def get_distributed_vars_by_vtypes(self, vtypes, groupby=False): + """ + get distributed vars by conditions. + + Args: + vtype(str|None): distributed var's vtype, such as "Optimizer", "RemotePrefetch" + groupby(bool|False): group by origin var or not. + + Returns: + list: distributed var list. 
+ dict: distributed var map when groupby=True + """ + vtype_vars = [] + for var in self.distributed_vars: + if var.vtype in vtypes: + vtype_vars.append(var) + if not groupby: + return vtype_vars + + params_map = {} + for var in vtype_vars: + origin_var_name = var.origin.name + + if origin_var_name in params_map.keys(): + optimizers = params_map.get(origin_var_name) + else: + optimizers = [] + optimizers.append(var) + params_map[origin_var_name] = optimizers + return params_map + + def get_distributed_vars_by_ep(self, endpoint, vtype=None): + """ + get distributed vars by conditions. + + Args: + endpoint(str): the parameter server endpoint, such as "127.0.0.1:2001" + vtype(str|None): distributed var's vtype, such as "Optimizer", "RemotePrefetch" + + Returns: + list: distributed var list. + """ + endpoint_vars = [] + for var in self.distributed_vars: + if var.endpoint == endpoint: + endpoint_vars.append(var) + if not vtype: + return endpoint_vars + + vtype_vars = [] + for var in endpoint_vars: + if var.vtype == vtype: + vtype_vars.append(var) + return vtype_vars + + def overview(self): + """ + get the overview string about all params on all parameter servers. + + Returns: + Str: overview string. + + """ + vars_str = [] + for var in self.distributed_vars: + vars_str.append(str(var)) + return "\n".join(vars_str) diff --git a/python/paddle/fluid/transpiler/distribute_transpiler.py b/python/paddle/fluid/transpiler/distribute_transpiler.py index e58f34e3750803669149685003ea5858fa775ed7..a3293afbbd7cef8470c808e98ae88a05f2e492f4 100644 --- a/python/paddle/fluid/transpiler/distribute_transpiler.py +++ b/python/paddle/fluid/transpiler/distribute_transpiler.py @@ -30,19 +30,23 @@ Steps to transpile pserver: 5. add listen_and_serv op """ +import sys import math -import numpy as np +from functools import reduce + import collections +import six import logging +import numpy as np + from .ps_dispatcher import RoundRobin, PSDispatcher from .. import core, framework, unique_name from ..framework import Program, default_main_program, \ - default_startup_program, Block, \ - Parameter, Variable, grad_var_name -from .details import * + default_startup_program, Block, Parameter, grad_var_name +from .details import wait_server_ready, UnionFind, VarStruct, VarsDistributed +from .details import delete_ops, find_op_by_output_arg from ..distribute_lookup_table import find_distributed_lookup_table -from functools import reduce LOOKUP_TABLE_TYPE = "lookup_table" LOOKUP_TABLE_GRAD_TYPE = "lookup_table_grad" @@ -62,260 +66,6 @@ def log(*args): print(args) -class VarStruct(object): - """ - record part properties of a Variable in python. - """ - - def __init__(self, name, shape, dtype, type, lod_level, persistable): - self.name = name - self.shape = shape - self.dtype = dtype - self.type = type - self.lod_level = lod_level - self.persistable = persistable - - -class VarDistributed(object): - """ - a class to record the var distributed on parameter servers. - the class will record the relationship between origin var and slice var. - the slice var's properties, such as type/shape/offset/endpoint. - """ - - def __init__(self, - origin_var, - slice_var, - is_slice=None, - block_id=None, - offset=None, - vtype=None, - endpoint=None): - """ - Args: - origin_var(Variable|VarStruct): origin var properties - slice_var(Variable|VarStruct): slice var properties - is_slice(bool|None): slice or not, slice_var=True/False and its block size > 8192 are the judgement standard. - block_id(int|None): the number about the slice var. 
- offset(int|None): if the slice var is sliced, offset is the numel before the var. - vtype(str|None): a tag, such as Optimizer/Param/RemoteProfetch. - endpoint(str|None): which parameter the slice var on, such as "127.0.0.1:1001" - """ - - if isinstance(origin_var, Variable): - self.origin = self.__create_var_struct(origin_var) - else: - self.origin = origin_var - - if isinstance(slice_var, Variable): - self.slice = self.__create_var_struct(slice_var) - else: - self.slice = slice_var - - if self.equal(self.origin, self.slice): - self.is_slice = False - self.block_id = 0 - self.offset = 0 - else: - self.is_slice = True - self.block_id = 0 - self.offset = 0 - - if is_slice is not None: - self.is_slice = is_slice - if block_id is not None: - self.block_id = block_id - if offset is not None: - self.offset = offset - - self.vtype = vtype - self.endpoint = endpoint - - @staticmethod - def __create_var_struct(var): - return VarStruct(var.name, var.shape, var.dtype, var.type, - var.lod_level, var.persistable) - - @staticmethod - def equal(var1, var2): - """ - the two var is equal or not. - Returns: - bool: equal will return True else False - """ - assert isinstance(var1, VarStruct) and isinstance(var2, VarStruct) - - return var1.name == var2.name and \ - var1.type == var2.type and \ - var1.shape == var2.shape and \ - var1.dtype == var2.dtype and \ - var1.lod_level == var2.lod_level and \ - var1.persistable == var2.persistable - - def __str__(self): - origin_var_str = "{name} : fluid.{type}.shape{shape}.astype({dtype})". \ - format(i="{", e="}", name=self.origin.name, type=self.origin.type, - shape=self.origin.shape, dtype=self.origin.dtype) - - slice_var_str = "{name} : fluid.{type}.shape{shape}.astype({dtype})" \ - ".slice({is_slice}).block({block_id}).offset({offset})". \ - format(i="{", e="}", name=self.slice.name, type=self.slice.type, - shape=self.slice.shape, dtype=self.slice.dtype, - is_slice=self.is_slice, block_id=self.block_id, offset=self.offset) - - return "var owned: {}, origin var: ( {} ), slice var: ( {} ), endpoint: {} ".format( - self.vtype, origin_var_str, slice_var_str, self.endpoint) - - -class VarsDistributed(object): - """ - a gather about VarDistributed with many methods to find distributed vars. - through the class, we can get overview about the distributed parameters on parameter servers. - this class may centralized and convenient for developer to manage and get variable's distribute. - other module can also use this to find variables such io.py. - """ - - def __init__(self): - self.distributed_vars = [] - - def add_distributed_var(self, - origin_var, - slice_var, - is_slice=None, - block_id=None, - offset=None, - vtype=None, - endpoint=None): - """ - add distributed var in this. - - Args: - origin_var(Variable|VarStruct): origin var properties - slice_var(Variable|VarStruct): slice var properties - is_slice(bool|None): slice or not, slice_var=True/False and its block size > 8192 are the judgement standard. - block_id(int|None): the number about the slice var. - offset(int|None): if the slice var is sliced, offset is the numel before the var. - vtype(str|None): a tag, such as Optimizer/Param/RemoteProfetch. - endpoint(str|None): which parameter the slice var on, such as "127.0.0.1:1001" - Returns: - None - """ - self.distributed_vars.append( - VarDistributed(origin_var, slice_var, is_slice, block_id, offset, - vtype, endpoint)) - - def get_distributed_var_by_slice(self, var_name): - """ - get distributed var by conditions. 
- - Args: - var_name(str): slice var name, such as "w.traier0.block1" - Returns: - VarDistributed: distributed var. - """ - for dist_var in self.distributed_vars: - if dist_var.slice.name == var_name: - return dist_var - return None - - @staticmethod - def equal(var1, var2): - """ - the two var is equal or not. - Returns: - bool: equal will return True else False - """ - return var1.name == var2.name and \ - var1.type == var2.type and \ - var1.shape == var2.shape and \ - var1.dtype == var2.dtype and \ - var1.lod_level == var2.lod_level and \ - var1.persistable == var2.persistable - - def get_distributed_var_by_origin_and_ep(self, origin_var_name, endpoint): - """ - get distributed var by conditions. - - Args: - origin_var_name(str): - endpoint(str): the parameter endpoint, such as "127.0.0.1:1001" - Returns: - VarDistributed: distributed var. - """ - for dist_var in self.distributed_vars: - if dist_var.origin.name == origin_var_name and dist_var.endpoint == endpoint: - return dist_var - return None - - def get_distributed_vars_by_vtypes(self, vtypes, groupby=False): - """ - get distributed vars by conditions. - - Args: - vtype(str|None): distributed var's vtype, such as "Optimizer", "RemotePrefetch" - groupby(bool|False): group by origin var or not. - - Returns: - list: distributed var list. - dict: distributed var map when groupby=True - """ - vtype_vars = [] - for var in self.distributed_vars: - if var.vtype in vtypes: - vtype_vars.append(var) - if not groupby: - return vtype_vars - - params_map = {} - for var in vtype_vars: - origin_var_name = var.origin.name - - if origin_var_name in params_map.keys(): - optimizers = params_map.get(origin_var_name) - else: - optimizers = [] - optimizers.append(var) - params_map[origin_var_name] = optimizers - return params_map - - def get_distributed_vars_by_ep(self, endpoint, vtype=None): - """ - get distributed vars by conditions. - - Args: - endpoint(str): the parameter server endpoint, such as "127.0.0.1:2001" - vtype(str|None): distributed var's vtype, such as "Optimizer", "RemotePrefetch" - - Returns: - list: distributed var list. - """ - endpoint_vars = [] - for var in self.distributed_vars: - if var.endpoint == endpoint: - endpoint_vars.append(var) - if not vtype: - return endpoint_vars - - vtype_vars = [] - for var in endpoint_vars: - if var.vtype == vtype: - vtype_vars.append(var) - return vtype_vars - - def overview(self): - """ - get the overview string about all params on all parameter servers. - - Returns: - Str: overview string. - - """ - vars_str = [] - for var in self.distributed_vars: - vars_str.append(str(var)) - return "\n".join(vars_str) - - class VarBlock: def __init__(self, varname, offset, size): self.varname = varname diff --git a/python/setup.py.in b/python/setup.py.in index c947785cbf7517be56c3e43120db65284ab22d10..f93f0cd130e33311bade2b15726c3eff37546214 100644 --- a/python/setup.py.in +++ b/python/setup.py.in @@ -109,6 +109,7 @@ packages=['paddle', 'paddle.fluid.contrib', 'paddle.fluid.contrib.decoder', 'paddle.fluid.contrib.quantize', + 'paddle.fluid.contrib.int8_inference', 'paddle.fluid.contrib.reader', 'paddle.fluid.contrib.slim', 'paddle.fluid.contrib.slim.core',
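For reference, the `VarStruct` / `VarDistributed` / `VarsDistributed` helpers that this patch moves into `python/paddle/fluid/transpiler/details/vars_distributed.py` form a small registry describing how parameters are sliced across parameter servers. A rough usage sketch follows; the variable names, shapes and endpoint are made up for illustration, and the import path assumes this patch is applied.

```python
from paddle.fluid.transpiler.details import VarStruct, VarsDistributed

# Describe an origin parameter and one slice of it placed on a parameter server.
origin = VarStruct(
    name="fc_w", shape=(1000, 784), dtype="float32",
    type="LOD_TENSOR", lod_level=0, persistable=True)
slice0 = VarStruct(
    name="fc_w.block0", shape=(500, 784), dtype="float32",
    type="LOD_TENSOR", lod_level=0, persistable=True)

registry = VarsDistributed()
registry.add_distributed_var(
    origin_var=origin, slice_var=slice0,
    is_slice=True, block_id=0, offset=0,
    vtype="Param", endpoint="127.0.0.1:6170")

# Query the registry by endpoint or by slice name, or dump an overview string.
params_on_ep = registry.get_distributed_vars_by_ep("127.0.0.1:6170", vtype="Param")
w_block0 = registry.get_distributed_var_by_slice("fc_w.block0")
print(registry.overview())
```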