Commit 40289431, authored by xuezhong

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_bug_for_lstmp

# PaddlePaddle
English | [简体中文](./README_cn.md)
[![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle)
[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)
@@ -7,7 +8,6 @@
[![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases)
[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
Welcome to the PaddlePaddle GitHub.
PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use,
@@ -18,16 +18,6 @@ learning to many products at Baidu.
Our vision is to enable deep learning for everyone via PaddlePaddle.
Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest feature of PaddlePaddle.
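Welcome to the PaddlePaddle GitHub
PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use, efficient, flexible and scalable deep learning platform, originally developed by Baidu scientists and engineers to apply deep learning to many products at Baidu.
Our vision is to enable everyone to access deep learning through PaddlePaddle.
To keep up with the latest PaddlePaddle features, please refer to our [release notes](https://github.com/PaddlePaddle/Paddle/releases).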
### Latest PaddlePaddle Release: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2)
### Install Latest Stable Release:
```
@@ -43,23 +33,6 @@ pip install paddlepaddle-gpu==1.2.0.post85
# For installation on other platform, refer to http://paddlepaddle.org/
```
### Latest PaddlePaddle release: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2)
### Install the latest stable release:
```
# Linux CPU
pip install paddlepaddle
# Linux GPU cuda9cudnn7
pip install paddlepaddle-gpu
# Linux GPU cuda8cudnn7
pip install paddlepaddle-gpu==1.2.0.post87
# Linux GPU cuda8cudnn5
pip install paddlepaddle-gpu==1.2.0.post85
# For installation instructions on other platforms, refer to http://paddlepaddle.org/
```
## Features
- **Flexibility**
@@ -100,38 +73,10 @@ pip install paddlepaddle-gpu==1.2.0.post85
Baidu and it has achieved a significant impact. We hope you can also explore
the capability of PaddlePaddle to make an impact on your product.
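## Features
- **Flexibility**
PaddlePaddle supports a wide range of neural network architectures and optimization algorithms. It is easy to configure complex models, such as neural machine translation models with an attention mechanism or complex memory connections.
- **Efficiency**
To make efficient use of heterogeneous computing resources, PaddlePaddle optimizes the framework at different levels, including computation, memory, architecture and communication. Some examples:
- Optimized math operations through SSE/AVX intrinsics, BLAS libraries (e.g. MKL, OpenBLAS, cuBLAS) or customized CPU/GPU kernels.
- Optimized CNN networks through the MKL-DNN library.
- Highly optimized recurrent networks that can handle **variable-length** sequences without `padding`.
- Optimized local and distributed training for models with high-dimensional sparse data.
- **Scalability**
With PaddlePaddle, it is easy to use many CPUs/GPUs and machines to speed up training. PaddlePaddle achieves high throughput and fast execution through optimized communication.
- **Connected to Products**
In addition, PaddlePaddle is designed to be easy to deploy. At Baidu, PaddlePaddle has been deployed in products and services with a vast number of users, including ad click-through rate (CTR) prediction, large-scale image classification, optical character recognition (OCR), search ranking, computer virus detection and recommendation. PaddlePaddle is widely used in Baidu products and has had a significant impact. We hope you can also explore the capabilities of PaddlePaddle to create new impact for your products.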
## Installation
It is recommended to read [this doc](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html) on our website.
## Installation
We recommend reading the [installation guide](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html) on our website.
## Documentation
We provide [English](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html) and
@@ -153,37 +98,9 @@ We provide [English](http://paddlepaddle.org/documentation/docs/en/1.2/getstarte
We appreciate your contributions!
## Documentation
We provide documentation in [English](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html) and
[Chinese](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html).
- [Deep Learning 101](https://github.com/PaddlePaddle/book)
You might want to start with this online interactive book, which can be run in Jupyter Notebook.
- [Distributed Training](http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/training/cluster_howto.html)
You can run distributed training jobs on MPI clusters.
- [Python API](http://paddlepaddle.org/documentation/docs/zh/1.2/api_cn/index_cn.html)
The new API lets you write shorter, more concise programs.
- [How to Contribute](http://paddlepaddle.org/documentation/docs/zh/1.2/advanced_usage/development/contribute_to_paddle/index_cn.html)
We appreciate your contributions!
## Ask Questions
You are welcome to submit questions and bug reports as [Github Issues](https://github.com/PaddlePaddle/Paddle/issues).
## Ask Questions
You are welcome to submit questions and bug reports as [Github Issues](https://github.com/PaddlePaddle/Paddle/issues).
## Copyright and License
PaddlePaddle is provided under the [Apache-2.0 license](LICENSE).
## Copyright and License
PaddlePaddle is provided under the [Apache-2.0 license](LICENSE).
# PaddlePaddle
[English](./README.md) | 简体中文
[![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle)
[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html)
[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html)
[![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases)
[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
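Welcome to the PaddlePaddle GitHub
PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use, efficient, flexible and scalable deep learning platform, originally developed by Baidu scientists and engineers to apply deep learning to many products at Baidu.
Our vision is to enable everyone to access deep learning through PaddlePaddle.
To keep up with the latest PaddlePaddle features, please refer to our [release notes](https://github.com/PaddlePaddle/Paddle/releases).
### Latest PaddlePaddle release: [Fluid 1.2.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.2)
### Install the latest stable release: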
```
# Linux CPU
pip install paddlepaddle
# Linux GPU cuda9cudnn7
pip install paddlepaddle-gpu
# Linux GPU cuda8cudnn7
pip install paddlepaddle-gpu==1.2.0.post87
# Linux GPU cuda8cudnn5
pip install paddlepaddle-gpu==1.2.0.post85
# For installation instructions on other platforms, refer to http://paddlepaddle.org/
```
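## Features
- **Flexibility**
PaddlePaddle supports a wide range of neural network architectures and optimization algorithms. It is easy to configure complex models, such as neural machine translation models with an attention mechanism or complex memory connections.
- **Efficiency**
To make efficient use of heterogeneous computing resources, PaddlePaddle optimizes the framework at different levels, including computation, memory, architecture and communication. Some examples:
- Optimized math operations through SSE/AVX intrinsics, BLAS libraries (e.g. MKL, OpenBLAS, cuBLAS) or customized CPU/GPU kernels.
- Optimized CNN networks through the MKL-DNN library.
- Highly optimized recurrent networks that can handle **variable-length** sequences without `padding`.
- Optimized local and distributed training for models with high-dimensional sparse data.
- **Scalability**
With PaddlePaddle, it is easy to use many CPUs/GPUs and machines to speed up training. PaddlePaddle achieves high throughput and fast execution through optimized communication.
- **Connected to Products**
In addition, PaddlePaddle is designed to be easy to deploy. At Baidu, PaddlePaddle has been deployed in products and services with a vast number of users, including ad click-through rate (CTR) prediction, large-scale image classification, optical character recognition (OCR), search ranking, computer virus detection and recommendation. PaddlePaddle is widely used in Baidu products and has had a significant impact. We hope you can also explore the capabilities of PaddlePaddle to create new impact for your products.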
## Installation
We recommend reading the [installation guide](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html) on our website.
## Documentation
We provide documentation in [English](http://paddlepaddle.org/documentation/docs/en/1.2/getstarted/index_en.html) and
[Chinese](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/index.html).
- [Deep Learning 101](https://github.com/PaddlePaddle/book)
You might want to start with this online interactive book, which can be run in Jupyter Notebook.
- [Distributed Training](http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/training/cluster_howto.html)
You can run distributed training jobs on MPI clusters.
- [Python API](http://paddlepaddle.org/documentation/docs/zh/1.2/api_cn/index_cn.html)
The new API lets you write shorter, more concise programs.
- [How to Contribute](http://paddlepaddle.org/documentation/docs/zh/1.2/advanced_usage/development/contribute_to_paddle/index_cn.html)
We appreciate your contributions!
## Ask Questions
You are welcome to submit questions and bug reports as [Github Issues](https://github.com/PaddlePaddle/Paddle/issues).
## Copyright and License
PaddlePaddle is provided under the [Apache-2.0 license](LICENSE).
@@ -324,7 +324,8 @@ paddle.fluid.layers.generate_mask_labels ArgSpec(args=['im_info', 'gt_classes',
paddle.fluid.layers.iou_similarity ArgSpec(args=['x', 'y', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.box_coder ArgSpec(args=['prior_box', 'prior_box_var', 'target_box', 'code_type', 'box_normalized', 'name', 'axis'], varargs=None, keywords=None, defaults=('encode_center_size', True, None, 0))
paddle.fluid.layers.polygon_box_transform ArgSpec(args=['input', 'name'], varargs=None, keywords=None, defaults=(None,))
-paddle.fluid.layers.yolov3_loss ArgSpec(args=['x', 'gtbox', 'gtlabel', 'anchors', 'class_num', 'ignore_thresh', 'loss_weight_xy', 'loss_weight_wh', 'loss_weight_conf_target', 'loss_weight_conf_notarget', 'loss_weight_class', 'name'], varargs=None, keywords=None, defaults=(None, None, None, None, None, None))
+paddle.fluid.layers.yolov3_loss ArgSpec(args=['x', 'gtbox', 'gtlabel', 'anchors', 'anchor_mask', 'class_num', 'ignore_thresh', 'downsample_ratio', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.box_clip ArgSpec(args=['input', 'im_info', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.layers.multiclass_nms ArgSpec(args=['bboxes', 'scores', 'score_threshold', 'nms_top_k', 'keep_top_k', 'nms_threshold', 'normalized', 'nms_eta', 'background_label', 'name'], varargs=None, keywords=None, defaults=(0.3, True, 1.0, 0, None))
paddle.fluid.layers.accuracy ArgSpec(args=['input', 'label', 'k', 'correct', 'total'], varargs=None, keywords=None, defaults=(1, None, None))
paddle.fluid.layers.auc ArgSpec(args=['input', 'label', 'curve', 'num_thresholds', 'topk', 'slide_steps'], varargs=None, keywords=None, defaults=('ROC', 4095, 1, 1))
@@ -361,6 +362,9 @@ paddle.fluid.contrib.QuantizeTranspiler.__init__ ArgSpec(args=['self', 'weight_b
paddle.fluid.contrib.QuantizeTranspiler.convert_to_int8 ArgSpec(args=['self', 'program', 'place', 'scope'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.contrib.QuantizeTranspiler.freeze_program ArgSpec(args=['self', 'program', 'place', 'fuse_bn', 'scope'], varargs=None, keywords=None, defaults=(False, None))
paddle.fluid.contrib.QuantizeTranspiler.training_transpile ArgSpec(args=['self', 'program', 'startup_program'], varargs=None, keywords=None, defaults=(None, None))
paddle.fluid.contrib.Calibrator.__init__ ArgSpec(args=['self'], varargs='args', keywords='kwargs', defaults=None)
paddle.fluid.contrib.Calibrator.sample_data ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.Calibrator.save_int8_model ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None)
paddle.fluid.contrib.reader.ctr_reader.ctr_reader ArgSpec(args=['feed_dict', 'file_type', 'file_format', 'dense_slot_index', 'sparse_slot_index', 'capacity', 'thread_num', 'batch_size', 'file_list', 'slots', 'name'], varargs=None, keywords=None, defaults=(None,))
paddle.fluid.contrib.build_compressor ArgSpec(args=['place', 'data_reader', 'data_feeder', 'scope', 'metrics', 'epoch', 'config'], varargs=None, keywords=None, defaults=(None, None, None, None, None, None, None))
paddle.fluid.contrib.CompressPass.__init__ ArgSpec(args=['self', 'place', 'data_reader', 'data_feeder', 'scope', 'metrics', 'epoch', 'program_exe'], varargs=None, keywords=None, defaults=(None, None, None, None, None, None, None))
......
@@ -65,6 +65,7 @@ pass_library(conv_elementwise_add2_act_fuse_pass inference)
pass_library(conv_elementwise_add_fuse_pass inference)
pass_library(conv_affine_channel_fuse_pass inference)
pass_library(transpose_flatten_concat_fuse_pass inference)
pass_library(identity_scale_op_clean_pass base)
# There may be many transpose-flatten structures in a model, and the output of
# these structures will be used as inputs to the concat Op. This pattern will
......
@@ -117,11 +117,6 @@ bool GraphPatternDetector::MarkPDNodesInGraph(const ir::Graph &graph) {
// return false;
}
}
for (auto &item : pdnodes2nodes_) {
for (auto &n : item.second) {
GetMarkedNodes(const_cast<Graph *>(&graph)).insert(n);
}
}
VLOG(3) << pdnodes2nodes_.size() << " nodes marked";
return !pdnodes2nodes_.empty();
......
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/framework/ir/identity_scale_op_clean_pass.h"
#include <string>
#include "paddle/fluid/framework/ir/graph_pattern_detector.h"
namespace paddle {
namespace framework {
namespace ir {

std::unique_ptr<ir::Graph> IdentityScaleOpCleanPass::ApplyImpl(
    std::unique_ptr<ir::Graph> graph) const {
  FusePassBase::Init("identity_scale_op_clean", graph.get());

  // pre_op -> scale_in -> scale_op -> scale_out
  // ->
  // pre_op -> scale_out
  GraphPatternDetector detector;
  auto pre_op = detector.mutable_pattern()->NewNode("pre_op")->assert_is_op();
  auto scale_in = detector.mutable_pattern()
                      ->NewNode("scale_in")
                      ->assert_is_op_input("scale")
                      ->AsIntermediate();
  auto scale_op = detector.mutable_pattern()
                      ->NewNode("scale_fuse")
                      ->assert_is_op("scale")
                      ->assert_op_attr<float>("scale", 1.)
                      ->assert_op_attr<float>("bias", 0.);
  auto scale_out = detector.mutable_pattern()
                       ->NewNode("scale_out")
                       ->assert_is_op_output("scale");

  pre_op->LinksTo({scale_in});
  scale_op->LinksFrom({scale_in}).LinksTo({scale_out});

  GraphPatternDetector::handle_t handler = [&](
      const GraphPatternDetector::subgraph_t& subgraph, Graph* graph) {
    Node* scale_op_var = subgraph.at(scale_op);
    Node* scale_in_var = subgraph.at(scale_in);
    Node* scale_out_var = subgraph.at(scale_out);
    Node* pre_op_var = subgraph.at(pre_op);
    // Link pre_op directly to scale_out
    const std::string scale_in_name = scale_in_var->Name();
    const std::string scale_out_name = scale_out_var->Name();
    // Remove links in graph
    GraphSafeRemoveNodes(graph, {scale_in_var, scale_op_var});
    // Modify proto message
    auto* pre_op_desc = pre_op_var->Op();
    for (auto& parameter : *pre_op_desc->Proto()->mutable_outputs()) {
      auto* arguments = parameter.mutable_arguments();
      auto it = std::find(arguments->begin(), arguments->end(), scale_in_name);
      PADDLE_ENFORCE(it != arguments->end());
      *it = scale_out_name;
    }

    IR_NODE_LINK_TO(pre_op_var, scale_out_var);
  };

  detector(graph.get(), handler);
  return graph;
}

}  // namespace ir
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(identity_scale_op_clean_pass,
              paddle::framework::ir::IdentityScaleOpCleanPass);
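The rewrite this pass performs can be illustrated outside Paddle. The following is a minimal sketch, not the pass itself: it assumes a hypothetical flat `Op` record (type, input and output variable names, `scale`/`bias` attributes) listed in topological order instead of Paddle's `ir::Graph`, and it mimics the `AsIntermediate()` constraint on `scale_in` by only folding a `scale` op whose input has no other reader. `Op`, `CountReaders` and `CleanIdentityScale` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical op record for illustration only; this is not Paddle's OpDesc
// or ir::Node API. `scale` and `bias` are only read when type == "scale".
struct Op {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
  float scale;
  float bias;
};

// Number of ops that read variable `name`.
int CountReaders(const std::vector<Op>& ops, const std::string& name) {
  int readers = 0;
  for (const Op& op : ops) {
    readers += static_cast<int>(
        std::count(op.inputs.begin(), op.inputs.end(), name));
  }
  return readers;
}

// Folds pre_op -> scale_in -> scale(scale=1, bias=0) -> scale_out into
// pre_op -> scale_out. Ops are assumed to be in topological order.
std::vector<Op> CleanIdentityScale(const std::vector<Op>& ops) {
  std::vector<Op> kept;
  for (const Op& op : ops) {
    const bool identity = op.type == "scale" && op.scale == 1.f &&
                          op.bias == 0.f && op.inputs.size() == 1 &&
                          op.outputs.size() == 1 &&
                          CountReaders(ops, op.inputs[0]) == 1;
    bool folded = false;
    if (identity) {
      // Find the op that produces scale_in and make it write scale_out.
      for (Op& producer : kept) {
        auto it = std::find(producer.outputs.begin(), producer.outputs.end(),
                            op.inputs[0]);
        if (it != producer.outputs.end()) {
          *it = op.outputs[0];
          folded = true;
          break;
        }
      }
    }
    if (!folded) kept.push_back(op);  // non-identity op, or no producer found
  }
  return kept;
}
```

Requiring a single reader is what makes the renaming safe: if another op also consumed `scale_in`, rewriting the producer's output name would leave that op reading a variable that nothing writes.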
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once

#include "paddle/fluid/framework/ir/fuse_pass_base.h"

namespace paddle {
namespace framework {
namespace ir {

class IdentityScaleOpCleanPass : public FusePassBase {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(std::unique_ptr<ir::Graph> graph) const;

 private:
  virtual ~IdentityScaleOpCleanPass() = default;
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle
@@ -22,11 +22,7 @@ limitations under the License. */
#include "paddle/fluid/framework/threadpool.h"
#include "paddle/fluid/string/printf.h"
-DEFINE_bool(benchmark, false,
-"Doing memory benchmark. It will make deleting scope synchronized, "
-"and add some memory usage logs."
-"Default cuda is asynchronous device, set to True will"
-"force op run in synchronous mode.");
+DECLARE_bool(benchmark);
DEFINE_bool(
eager_delete_scope, true,
......
@@ -83,7 +83,6 @@ void IRPassManager::CreatePasses(Argument *argument,
new std::string(GetOrCreateModelOptCacheDir(model_opt_cache_dir)));
}
// graph_ = pass->Apply(std::move(graph_));
pre_pass = pass_name;
passes_.emplace_back(std::move(pass));
@@ -97,8 +96,9 @@ std::unique_ptr<Graph> IRPassManager::Apply(std::unique_ptr<Graph> graph) {
PADDLE_ENFORCE(graph.get());
// Apply all the passes
for (const auto &pass : passes_) {
-if (pass->Type() == "graph_viz_pass") continue;
+if (pass->Type() != "graph_viz_pass") {
PrettyLogEndl(Style::H2(), "--- Running IR pass [%s]", pass->Type());
+}
graph = pass->Apply(std::move(graph));
}
return std::move(graph);
......
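The behavioural change in the `Apply` loop above can be sketched with plain strings. This is a hypothetical model, not Paddle code: `ToyGraph` and `ApplyPasses` are illustrative names, and appending the pass type stands in for `pass->Apply()`. Before the change, `graph_viz_pass` was skipped with `continue`; after it, the pass runs and only its progress log line is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ir::Graph: it only records which passes ran.
struct ToyGraph {
  std::vector<std::string> applied;
};

// Runs every pass in order. "graph_viz_pass" is applied as well; it is
// merely left out of the progress log.
ToyGraph ApplyPasses(const std::vector<std::string>& pass_types,
                     std::vector<std::string>* log) {
  ToyGraph graph;
  for (const std::string& type : pass_types) {
    if (type != "graph_viz_pass") {
      log->push_back("--- Running IR pass [" + type + "]");
    }
    graph.applied.push_back(type);  // stands in for pass->Apply(graph)
  }
  return graph;
}
```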
@@ -318,4 +318,9 @@ NativeConfig AnalysisConfig::ToNativeConfig() const {
return config;
}
void AnalysisConfig::SwitchIrDebug(int x) {
ir_debug_ = x;
Update();
}
} // namespace paddle
@@ -58,7 +58,8 @@ namespace {
bool IsPersistable(const framework::VarDesc *var) {
if (var->Persistable() &&
var->GetType() != framework::proto::VarType::FEED_MINIBATCH &&
-var->GetType() != framework::proto::VarType::FETCH_LIST) {
+var->GetType() != framework::proto::VarType::FETCH_LIST &&
+var->GetType() != framework::proto::VarType::RAW) {
return true;
}
return false;
......
@@ -196,7 +196,7 @@ TEST(AnalysisPredictor, memory_optim) {
AnalysisConfig config(FLAGS_dirname);
config.DisableGpu();
config.EnableMemoryOptim(true);
-config.pass_builder()->TurnOnDebug();
+config.SwitchIrDebug();
auto native_predictor =
CreatePaddlePredictor<NativeConfig>(config.ToNativeConfig());
......
@@ -140,9 +140,12 @@ struct AnalysisConfig {
*/
bool tensorrt_engine_enabled() const { return use_tensorrt_; }
-/** Control whther to debug IR graph analysis phase.
+/** \brief Control whether to debug IR graph analysis phase.
+*
+* This will generate DOT files for visualizing the computation graph after
+* each analysis pass applied.
*/
-void SwitchIrDebug(int x = true) { ir_debug_ = x; }
+void SwitchIrDebug(int x = true);
/** Turn on MKLDNN.
*/
......
@@ -117,6 +117,7 @@ class CpuPassStrategy : public PassStrategy {
"conv_bn_fuse_pass", //
"conv_eltwiseadd_bn_fuse_pass", //
"is_test_pass", //
"identity_scale_op_clean_pass", //
});
use_gpu_ = false;
}
@@ -155,6 +156,7 @@ class GpuPassStrategy : public PassStrategy {
GpuPassStrategy() : PassStrategy({}) {
passes_.assign({
"infer_clean_graph_pass", //
"identity_scale_op_clean_pass", //
"conv_affine_channel_fuse_pass", //
"conv_eltwiseadd_affine_channel_fuse_pass", //
"conv_bn_fuse_pass", //
......
@@ -128,9 +128,9 @@ inference_analysis_api_test_with_fake_data(test_analyzer_resnet50
inference_analysis_api_test_with_fake_data(test_analyzer_mobilenet_depthwise_conv
"${INFERENCE_DEMO_INSTALL_DIR}/mobilenet_depthwise_conv" analyzer_resnet50_tester.cc "mobilenet_model.tar.gz" SERIAL)
-# bert, max_len=20
-set(BERT_INSTALL_DIR "${INFERENCE_DEMO_INSTALL_DIR}/bert20")
-download_model_and_data(${BERT_INSTALL_DIR} "bert_model.tar.gz" "bert_data_len20.txt.tar.gz")
+# bert, max_len=20, embedding_dim=128
+set(BERT_INSTALL_DIR "${INFERENCE_DEMO_INSTALL_DIR}/bert_emb128")
+download_model_and_data(${BERT_INSTALL_DIR} "bert_emb128_model.tar.gz" "bert_data_len20.txt.tar.gz")
inference_analysis_api_test(test_analyzer_bert ${BERT_INSTALL_DIR} analyzer_bert_tester.cc SERIAL)
# anakin
......
@@ -142,7 +142,7 @@ void SetConfig(AnalysisConfig *cfg, bool use_mkldnn = false) {
cfg->SetModel(FLAGS_infer_model + "/model", FLAGS_infer_model + "/params");
cfg->DisableGpu();
cfg->SwitchSpecifyInputNames();
-cfg->pass_builder()->TurnOnDebug();
+cfg->SwitchIrDebug();
cfg->SetCpuMathLibraryNumThreads(FLAGS_paddle_num_threads);
if (use_mkldnn) {
cfg->EnableMKLDNN();
......
@@ -69,7 +69,7 @@ void SetInput(std::vector<std::vector<PaddleTensor>> *inputs) {
TEST(Analyzer_Text_Classification, profile) {
AnalysisConfig cfg;
SetConfig(&cfg);
-cfg.pass_builder()->TurnOnDebug();
+cfg.SwitchIrDebug();
std::vector<PaddleTensor> outputs;
std::vector<std::vector<PaddleTensor>> input_slots_all;
......
cc_library(benchmark SRCS benchmark.cc DEPS enforce)
cc_test(test_benchmark SRCS benchmark_tester.cc DEPS benchmark)
-#cc_binary(visualizer SRCS visualizer.cc DEPS analysis
-# paddle_pass_builder ir_pass_manager pass graph_viz_pass analysis_passes)
+cc_binary(visualizer SRCS visualizer.cc DEPS analysis
+paddle_pass_builder ir_pass_manager pass graph_viz_pass analysis_passes)
@@ -35,6 +35,7 @@ DEFINE_bool(init_allocated_mem, false,
"To find this error in time, we use init_allocated_mem to indicate "
"that initializing the allocated memory with a small value "
"during unit testing.");
DECLARE_bool(benchmark);
DECLARE_double(fraction_of_gpu_memory_to_use);
namespace paddle {
@@ -59,11 +60,6 @@ size_t memory_usage(const platform::Place &p);
using BuddyAllocator = detail::BuddyAllocator;
std::unordered_map</*device id*/ int,
std::pair</*current memory usage*/ uint64_t,
/*peak memory usage*/ uint64_t>>
gpu_mem_info;
BuddyAllocator *GetCPUBuddyAllocator() {
// We tried thread_local for inference::RNN1 model, but that not works much
// for multi-thread test.
@@ -144,6 +140,8 @@ BuddyAllocator *GetGPUBuddyAllocator(int gpu_id) {
devices = platform::GetSelectedDevices();
int gpu_num = devices.size();
allocation::GPUMemMonitor.Initialize(devices.size());
a_arr = new BuddyAllocator *[gpu_num];
for (size_t i = 0; i < devices.size(); ++i) {
int dev_id = devices[i];
@@ -204,12 +202,7 @@ void *Alloc<platform::CUDAPlace>(const platform::CUDAPlace &place,
<< string::HumanReadableSize(Used<platform::CUDAPlace>(place));
platform::SetDeviceId(cur_dev);
} else {
-gpu_mem_info[place.device].first += size;
-if (gpu_mem_info[place.device].first > gpu_mem_info[place.device].second) {
-gpu_mem_info[place.device].second = gpu_mem_info[place.device].first;
-VLOG(3) << "device: " << place.device << " peak memory usage : "
-<< (gpu_mem_info[place.device].second >> 20) << " MiB";
-}
+if (FLAGS_benchmark) allocation::GPUMemMonitor.Add(place.device, size);
if (FLAGS_init_allocated_mem) {
cudaMemset(ptr, 0xEF, size);
}
@@ -225,7 +218,7 @@ void Free<platform::CUDAPlace>(const platform::CUDAPlace &place, void *p,
size_t size) {
#ifdef PADDLE_WITH_CUDA
GetGPUBuddyAllocator(place.device)->Free(p);
-gpu_mem_info[place.device].first -= size;
+if (FLAGS_benchmark) allocation::GPUMemMonitor.Minus(place.device, size);
#else
PADDLE_THROW("'CUDAPlace' is not supported in CPU only device.");
#endif
@@ -335,6 +328,8 @@ size_t Usage::operator()(const platform::CUDAPinnedPlace &cuda_pinned) const {
namespace allocation {
LegacyMemMonitor GPUMemMonitor;
Allocation *LegacyAllocator::AllocateImpl(size_t size, Allocator::Attr attr) {
void *ptr = boost::apply_visitor(legacy::AllocVisitor(size), place_);
return new Allocation(ptr, size, place_);
@@ -346,6 +341,63 @@ void LegacyAllocator::Free(Allocation *allocation) {
allocation->place());
delete allocation;
}
bool MemInfo::Add(const size_t &size) {
std::lock_guard<std::mutex> lock(mutex_);
usage_ += size;
bool peak_point = usage_ > peak_usage_;
if (peak_point) peak_usage_ = usage_;
return peak_point;
}
void MemInfo::Minus(const size_t &size) {
std::lock_guard<std::mutex> lock(mutex_);
usage_ -= size;
}
uint64_t MemInfo::GetPeakUsage() {
std::lock_guard<std::mutex> lock(mutex_);
return peak_usage_;
}
LegacyMemMonitor::~LegacyMemMonitor() {
for (auto &item : gpu_mem_info_) delete item.second;
}
void LegacyMemMonitor::Initialize(const int &device_num) {
for (auto i = 0; i < device_num; ++i) {
gpu_mem_info_[i] = new MemInfo();
}
}
void LegacyMemMonitor::Add(const int &device, const size_t &size) {
if (gpu_mem_info_[device]->Add(size)) {
VLOG(3) << "#LegacyMemMonitor# device: " << device
<< " peak memory usage : "
<< (gpu_mem_info_[device]->GetPeakUsage() >> 20) << " MiB";
}
}
void LegacyMemMonitor::Minus(const int &device, const size_t &size) {
gpu_mem_info_[device]->Minus(size);
}
uint64_t LegacyMemMonitor::GetMemUsage(const int &device) {
return gpu_mem_info_.find(device) == gpu_mem_info_.end()
? 0
: gpu_mem_info_[device]->GetPeakUsage();
}
void LegacyMemMonitor::PrintMemUsage() {
std::vector<int> devices;
for (const auto &item : gpu_mem_info_) {
devices.emplace_back(item.first);
}
std::sort(devices.begin(), devices.end());
for (const auto &device : devices) {
std::cout << "Device : " << device << " Peak Memory Usage : "
<< (gpu_mem_info_[device]->GetPeakUsage() >> 20) << " MiB"
<< std::endl;
}
}
} // namespace allocation } // namespace allocation
} // namespace memory } // namespace memory
} // namespace paddle } // namespace paddle
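The per-device bookkeeping above comes down to a mutex-guarded pair of counters, where `Add` reports whether a new high-water mark was reached and `Minus` never lowers the peak. The standalone sketch below shows that pattern. `PeakCounter` is a hypothetical class, not part of Paddle, and it reads both counters under the same lock that the writers take.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for MemInfo (not a Paddle class): tracks the current
// and peak number of bytes allocated on one device.
class PeakCounter {
 public:
  // Adds `size` bytes and returns true if this call set a new peak.
  bool Add(uint64_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    usage_ += size;
    if (usage_ > peak_) {
      peak_ = usage_;
      return true;
    }
    return false;
  }

  // Releases `size` bytes; the recorded peak is never lowered.
  void Minus(uint64_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(size <= usage_ && "freeing more than was allocated");
    usage_ -= size;
  }

  // Readers take the lock too, so they never race with Add/Minus.
  uint64_t Usage() {
    std::lock_guard<std::mutex> lock(mutex_);
    return usage_;
  }
  uint64_t Peak() {
    std::lock_guard<std::mutex> lock(mutex_);
    return peak_;
  }

 private:
  uint64_t usage_ = 0;
  uint64_t peak_ = 0;
  std::mutex mutex_;
};
```

For example, allocating 64 MiB and then 32 MiB, freeing 64 MiB, and allocating 16 MiB leaves usage at 48 MiB and the peak at 96 MiB. Only the first two `Add` calls report a new peak. Locking the readers costs little here, because the monitor is only updated when `FLAGS_benchmark` is set.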
...@@ -13,12 +13,59 @@ ...@@ -13,12 +13,59 @@
// limitations under the License. // limitations under the License.
#pragma once #pragma once
#include <algorithm>
#include <mutex> // NOLINT
#include <unordered_map>
#include <utility>
#include <vector>
#include "paddle/fluid/memory/allocation/allocator.h" #include "paddle/fluid/memory/allocation/allocator.h"
#include "paddle/fluid/platform/place.h" #include "paddle/fluid/platform/place.h"
namespace paddle { namespace paddle {
namespace memory { namespace memory {
namespace allocation { namespace allocation {
class MemInfo {
public:
MemInfo() : usage_(0), peak_usage_(0) {}
MemInfo(const MemInfo &) = delete;
MemInfo &operator=(const MemInfo &) = delete;
// Returns true if this addition sets a new peak usage.
bool Add(const size_t &);
void Minus(const size_t &);
uint64_t GetPeakUsage();
private:
// Current memory usage in bytes.
uint64_t usage_;
// Highest memory usage observed so far, in bytes.
uint64_t peak_usage_;
std::mutex mutex_;
};
class LegacyMemMonitor {
public:
// Maps each GPU device id to its memory usage info.
using MemUsage = std::unordered_map</*device id*/ int,
/*mem usage info node*/ MemInfo *>;
MemUsage GetMemUsageInfo() { return gpu_mem_info_; }
~LegacyMemMonitor();
void Initialize(const int &);
void Add(const int &, const size_t &);
void Minus(const int &, const size_t &);
uint64_t GetMemUsage(const int &);
void PrintMemUsage();
protected:
MemUsage gpu_mem_info_;
};
extern LegacyMemMonitor GPUMemMonitor;
class LegacyAllocatorPrivate; class LegacyAllocatorPrivate;
class LegacyAllocator : public Allocator { class LegacyAllocator : public Allocator {
public: public:
......
...@@ -589,8 +589,10 @@ class BatchNormGradMaker : public framework::SingleGradOpDescMaker { ...@@ -589,8 +589,10 @@ class BatchNormGradMaker : public framework::SingleGradOpDescMaker {
op->SetInput("SavedVariance", Output("SavedVariance")); op->SetInput("SavedVariance", Output("SavedVariance"));
// used when setting use_global_stats True during training // used when setting use_global_stats True during training
op->SetInput("Mean", Output("MeanOut")); if (boost::get<bool>(GetAttr("use_global_stats"))) {
op->SetInput("Variance", Output("VarianceOut")); op->SetInput("Mean", Output("MeanOut"));
op->SetInput("Variance", Output("VarianceOut"));
}
op->SetAttrMap(Attrs()); op->SetAttrMap(Attrs());
......
...@@ -31,6 +31,8 @@ detection_library(polygon_box_transform_op SRCS polygon_box_transform_op.cc ...@@ -31,6 +31,8 @@ detection_library(polygon_box_transform_op SRCS polygon_box_transform_op.cc
polygon_box_transform_op.cu) polygon_box_transform_op.cu)
detection_library(rpn_target_assign_op SRCS rpn_target_assign_op.cc) detection_library(rpn_target_assign_op SRCS rpn_target_assign_op.cc)
detection_library(generate_proposal_labels_op SRCS generate_proposal_labels_op.cc) detection_library(generate_proposal_labels_op SRCS generate_proposal_labels_op.cc)
detection_library(box_clip_op SRCS box_clip_op.cc box_clip_op.cu)
detection_library(yolov3_loss_op SRCS yolov3_loss_op.cc)
if(WITH_GPU) if(WITH_GPU)
detection_library(generate_proposals_op SRCS generate_proposals_op.cc generate_proposals_op.cu DEPS memory cub) detection_library(generate_proposals_op SRCS generate_proposals_op.cc generate_proposals_op.cu DEPS memory cub)
......
...@@ -99,5 +99,29 @@ void BboxOverlaps(const framework::Tensor& r_boxes, ...@@ -99,5 +99,29 @@ void BboxOverlaps(const framework::Tensor& r_boxes,
} }
} }
template <class T>
void ClipTiledBoxes(const platform::DeviceContext& ctx,
const framework::Tensor& im_info,
const framework::Tensor& input_boxes,
framework::Tensor* out) {
T* out_data = out->mutable_data<T>(ctx.GetPlace());
const T* im_info_data = im_info.data<T>();
const T* input_boxes_data = input_boxes.data<T>();
T zero(0);
T im_w = round(im_info_data[1] / im_info_data[2]);
T im_h = round(im_info_data[0] / im_info_data[2]);
for (int64_t i = 0; i < input_boxes.numel(); ++i) {
// Boxes are stored as [xmin, ymin, xmax, ymax]: even offsets are x
// coordinates clipped to the width, odd offsets are y coordinates
// clipped to the height.
T upper = (i % 2 == 0) ? im_w - 1 : im_h - 1;
out_data[i] = std::max(std::min(input_boxes_data[i], upper), zero);
}
}
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
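`ClipTiledBoxes` treats the box tensor as a flat array of `[xmin, ymin, xmax, ymax]` quadruples. Even offsets are clipped against the original image width and odd offsets against the height, after undoing the resize through `im_info`. The host-side sketch below applies the same rule to `std::vector`. `ClipBoxesToImage` is a hypothetical helper, and `im_info` is assumed to be `(height, width, im_scale)` as in the operator description.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not a Paddle API): clips flat [xmin, ymin, xmax, ymax]
// boxes to the original image described by im_info = {height, width, scale}.
template <typename T>
std::vector<T> ClipBoxesToImage(const std::vector<T>& boxes,
                                const std::vector<T>& im_info) {
  assert(boxes.size() % 4 == 0 && im_info.size() == 3);
  // Undo the resize: the original image is (height / scale, width / scale).
  const T im_h = std::round(im_info[0] / im_info[2]);
  const T im_w = std::round(im_info[1] / im_info[2]);
  std::vector<T> out(boxes.size());
  for (size_t i = 0; i < boxes.size(); ++i) {
    // Even offsets hold x coordinates, odd offsets hold y coordinates.
    const T upper = (i % 2 == 0) ? im_w - 1 : im_h - 1;
    out[i] = std::max(std::min(boxes[i], upper), T(0));
  }
  return out;
}
```

For example, with `im_info = {600, 800, 2}` the original image is 300 pixels high and 400 wide, so the box `{-5, 10, 900, 520}` becomes `{0, 10, 399, 299}`.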
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/detection/box_clip_op.h"
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
class BoxClipOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Input"),
"Input(Input) of BoxClipOp should not be null.");
PADDLE_ENFORCE(ctx->HasInput("ImInfo"),
"Input(ImInfo) of BoxClipOp should not be null.");
auto input_box_dims = ctx->GetInputDim("Input");
auto im_info_dims = ctx->GetInputDim("ImInfo");
if (ctx->IsRuntime()) {
auto input_box_size = input_box_dims.size();
PADDLE_ENFORCE_EQ(input_box_dims[input_box_size - 1], 4,
"The last dimension of Input must be 4");
PADDLE_ENFORCE_EQ(im_info_dims.size(), 2,
"The rank of Input(ImInfo) in BoxClipOp must be 2");
PADDLE_ENFORCE_EQ(im_info_dims[1], 3,
"The last dimension of ImInfo must be 3");
}
ctx->ShareDim("Input", /*->*/ "Output");
ctx->ShareLoD("Input", /*->*/ "Output");
}
};
class BoxClipOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("Input",
"(LoDTensor) "
"Input is a LoDTensor with shape [..., 4] that holds 4 coordinates "
"in the last dimension, in format [xmin, ymin, xmax, ymax].");
AddInput("ImInfo",
"(Tensor) Image information with shape (N, 3), "
"in format (height, width, im_scale).");
AddOutput("Output",
"(LoDTensor) "
"Output is a LoDTensor with the same shape as Input "
"that holds the clipped boxes.");
AddComment(R"DOC(
This operator clips input boxes to the boundaries of the original input images.
For each input box, the formula is given as follows:
$$xmin = \max(\min(xmin, im_w - 1), 0)$$
$$ymin = \max(\min(ymin, im_h - 1), 0)$$
$$xmax = \max(\min(xmax, im_w - 1), 0)$$
$$ymax = \max(\min(ymax, im_h - 1), 0)$$
where im_w and im_h are computed from ImInfo as follows:
$$im_w = \operatorname{round}(width / im\_scale)$$
$$im_h = \operatorname{round}(height / im\_scale)$$
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(box_clip, ops::BoxClipOp, ops::BoxClipOpMaker,
paddle::framework::EmptyGradOpMaker);
REGISTER_OP_CPU_KERNEL(
box_clip, ops::BoxClipKernel<paddle::platform::CPUDeviceContext, float>,
ops::BoxClipKernel<paddle::platform::CPUDeviceContext, double>);
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <algorithm>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/detection/box_clip_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/cuda_primitives.h"
#include "paddle/fluid/platform/hostdevice.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
static constexpr int ImInfoSize = 3;
template <typename T, int BlockSize>
static __global__ void GPUBoxClip(const T *input, const size_t *lod,
const size_t width, const T *im_info,
T *output) {
T im_w = round(im_info[blockIdx.x * ImInfoSize + 1] /
im_info[blockIdx.x * ImInfoSize + 2]);
T im_h = round(im_info[blockIdx.x * ImInfoSize] /
im_info[blockIdx.x * ImInfoSize + 2]);
for (int i = threadIdx.x; i < (lod[blockIdx.x + 1] - lod[blockIdx.x]) * width;
i += BlockSize) {
int idx = lod[blockIdx.x] * width + i;
T im_size = (idx % 2 == 0) ? im_w : im_h;
output[idx] = max(min(input[idx], im_size - 1), T(0.));
}
}
template <typename DeviceContext, typename T>
class GPUBoxClipKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &context) const override {
PADDLE_ENFORCE(platform::is_gpu_place(context.GetPlace()),
"This kernel only runs on GPU device.");
auto *input = context.Input<LoDTensor>("Input");
auto *im_info = context.Input<Tensor>("ImInfo");
auto *output = context.Output<LoDTensor>("Output");
const int64_t num = input->dims()[0];
const int64_t bbox_width = input->numel() / num;
auto lod = input->lod();
framework::LoD abs_offset_lod = framework::ToAbsOffset(lod);
auto &dev_ctx = context.template device_context<DeviceContext>();
auto stream = dev_ctx.stream();
const size_t batch_size = lod.back().size() - 1;
T *output_data = output->mutable_data<T>(dev_ctx.GetPlace());
GPUBoxClip<T, 512><<<batch_size, 512, 0, stream>>>(
input->data<T>(), abs_offset_lod[0].CUDAMutableData(dev_ctx.GetPlace()),
bbox_width, im_info->data<T>(), output_data);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(
box_clip, ops::GPUBoxClipKernel<paddle::platform::CUDADeviceContext, float>,
ops::GPUBoxClipKernel<paddle::platform::CUDADeviceContext, double>);
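`GPUBoxClip` launches one block per image (`batch_size` blocks of 512 threads). Each thread starts at `threadIdx.x` and strides by `BlockSize` over that image's `(lod[b + 1] - lod[b]) * width` values, offset by the absolute LoD start. The serial C++ sketch below replays that schedule to check that every element of every image is visited exactly once, including for images with no boxes. `SimulateBoxClipSchedule` is a hypothetical name, and block and thread ids are ordinary loop variables rather than CUDA built-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial replay of the GPUBoxClip launch: returns how many times
// each flat element index is touched. `lod` holds absolute box offsets (one
// entry per image plus a final end offset); `width` is values per LoD row.
std::vector<int> SimulateBoxClipSchedule(const std::vector<size_t>& lod,
                                         size_t width, int block_size) {
  std::vector<int> visits(lod.back() * width, 0);
  const size_t num_blocks = lod.size() - 1;  // one block per image
  for (size_t block = 0; block < num_blocks; ++block) {
    const size_t count = (lod[block + 1] - lod[block]) * width;
    for (int thread = 0; thread < block_size; ++thread) {
      // Same strided loop as the kernel: i = threadIdx.x; i += BlockSize.
      for (size_t i = static_cast<size_t>(thread); i < count;
           i += static_cast<size_t>(block_size)) {
        const size_t idx = lod[block] * width + i;
        ++visits[idx];
      }
    }
  }
  return visits;
}
```

Each image's slice starts at `lod[b] * width`, which is a multiple of 4 because the last input dimension is 4. As a result, the kernel's `idx % 2` test selects the same x/y alternation as a per-image index would.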
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <string>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/detection/bbox_util.h"
#include "paddle/fluid/operators/math/math_function.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
template <typename DeviceContext, typename T>
class BoxClipKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto* input_box = context.Input<LoDTensor>("Input");
auto* im_info = context.Input<LoDTensor>("ImInfo");
auto* output_box = context.Output<LoDTensor>("Output");
auto& dev_ctx =
context.template device_context<platform::CPUDeviceContext>();
output_box->mutable_data<T>(context.GetPlace());
if (input_box->lod().size()) {
PADDLE_ENFORCE_EQ(input_box->lod().size(), 1UL,
"Only support 1 level of LoD.");
}
auto box_lod = input_box->lod().back();
int64_t n = static_cast<int64_t>(box_lod.size() - 1);
for (int i = 0; i < n; ++i) {
Tensor im_info_slice = im_info->Slice(i, i + 1);
Tensor box_slice = input_box->Slice(box_lod[i], box_lod[i + 1]);
Tensor output_slice = output_box->Slice(box_lod[i], box_lod[i + 1]);
ClipTiledBoxes<T>(dev_ctx, im_info_slice, box_slice, &output_slice);
}
}
};
} // namespace operators
} // namespace paddle
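On the CPU, `BoxClipKernel` walks the single LoD level. Rows `[box_lod[i], box_lod[i + 1])` belong to image `i` and are clipped with row `i` of `ImInfo`. The sketch below reproduces that loop on plain arrays. `ClipByLoD`, `Box` and `ImInfo` are hypothetical names, and `ImInfo` rows are assumed to be `(height, width, im_scale)`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Box = std::array<double, 4>;     // [xmin, ymin, xmax, ymax]
using ImInfo = std::array<double, 3>;  // [height, width, im_scale]

// Hypothetical sketch of BoxClipKernel's loop: rows [lod[i], lod[i + 1]) of
// `boxes` belong to image i and are clipped with that image's im_info row.
std::vector<Box> ClipByLoD(const std::vector<Box>& boxes,
                           const std::vector<size_t>& lod,
                           const std::vector<ImInfo>& im_info) {
  assert(!lod.empty() && lod.back() == boxes.size());
  assert(im_info.size() == lod.size() - 1);
  std::vector<Box> out(boxes.size());
  for (size_t i = 0; i + 1 < lod.size(); ++i) {
    const double im_h = std::round(im_info[i][0] / im_info[i][2]);
    const double im_w = std::round(im_info[i][1] / im_info[i][2]);
    for (size_t r = lod[i]; r < lod[i + 1]; ++r) {
      for (size_t k = 0; k < 4; ++k) {
        // x coordinates (k = 0, 2) use the width, y coordinates the height.
        const double upper = (k % 2 == 0) ? im_w - 1 : im_h - 1;
        out[r][k] = std::max(std::min(boxes[r][k], upper), 0.0);
      }
    }
  }
  return out;
}
```

With `lod = {0, 2, 3}`, the first two boxes are clipped against image 0 and the third against image 1. The same out-of-range box therefore ends up with different bounds depending on which image it belongs to.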
...@@ -52,6 +52,10 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> { ...@@ -52,6 +52,10 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
step_height = step_h; step_height = step_h;
} }
int num_priors = 0; int num_priors = 0;
#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for reduction(+ : num_priors)
#endif
for (size_t i = 0; i < densities.size(); ++i) { for (size_t i = 0; i < densities.size(); ++i) {
num_priors += (fixed_ratios.size()) * (pow(densities[i], 2)); num_priors += (fixed_ratios.size()) * (pow(densities[i], 2));
} }
...@@ -64,6 +68,17 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> { ...@@ -64,6 +68,17 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
auto e_boxes = framework::EigenTensor<T, 4>::From(*boxes).setConstant(0.0); auto e_boxes = framework::EigenTensor<T, 4>::From(*boxes).setConstant(0.0);
int step_average = static_cast<int>((step_width + step_height) * 0.5); int step_average = static_cast<int>((step_width + step_height) * 0.5);
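// Pre-size the vector so each OpenMP thread writes only its own slot;
// concurrent push_back calls on a shared vector are a data race.
std::vector<float> sqrt_fixed_ratios(fixed_ratios.size());
#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for
#endif
for (int i = 0; i < static_cast<int>(fixed_ratios.size()); i++) {
sqrt_fixed_ratios[i] = sqrt(fixed_ratios[i]);
}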
#ifdef PADDLE_WITH_MKLML
#pragma omp parallel for collapse(2)
#endif
for (int h = 0; h < feature_height; ++h) { for (int h = 0; h < feature_height; ++h) {
for (int w = 0; w < feature_width; ++w) { for (int w = 0; w < feature_width; ++w) {
T center_x = (w + offset) * step_width; T center_x = (w + offset) * step_width;
...@@ -73,34 +88,25 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> { ...@@ -73,34 +88,25 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
for (size_t s = 0; s < fixed_sizes.size(); ++s) { for (size_t s = 0; s < fixed_sizes.size(); ++s) {
auto fixed_size = fixed_sizes[s]; auto fixed_size = fixed_sizes[s];
int density = densities[s]; int density = densities[s];
int shift = step_average / density;
// Generate density prior boxes with fixed ratios. // Generate density prior boxes with fixed ratios.
for (size_t r = 0; r < fixed_ratios.size(); ++r) { for (size_t r = 0; r < fixed_ratios.size(); ++r) {
float ar = fixed_ratios[r]; float box_width_ratio = fixed_size * sqrt_fixed_ratios[r];
int shift = step_average / density; float box_height_ratio = fixed_size / sqrt_fixed_ratios[r];
float box_width_ratio = fixed_size * sqrt(ar); float density_center_x = center_x - step_average / 2. + shift / 2.;
float box_height_ratio = fixed_size / sqrt(ar); float density_center_y = center_y - step_average / 2. + shift / 2.;
for (int di = 0; di < density; ++di) { for (int di = 0; di < density; ++di) {
for (int dj = 0; dj < density; ++dj) { for (int dj = 0; dj < density; ++dj) {
float center_x_temp = float center_x_temp = density_center_x + dj * shift;
center_x - step_average / 2. + shift / 2. + dj * shift; float center_y_temp = density_center_y + di * shift;
float center_y_temp = e_boxes(h, w, idx, 0) = std::max(
center_y - step_average / 2. + shift / 2. + di * shift; (center_x_temp - box_width_ratio / 2.) / img_width, 0.);
e_boxes(h, w, idx, 0) = e_boxes(h, w, idx, 1) = std::max(
(center_x_temp - box_width_ratio / 2.) / img_width >= 0 (center_y_temp - box_height_ratio / 2.) / img_height, 0.);
? (center_x_temp - box_width_ratio / 2.) / img_width e_boxes(h, w, idx, 2) = std::min(
: 0; (center_x_temp + box_width_ratio / 2.) / img_width, 1.);
e_boxes(h, w, idx, 1) = e_boxes(h, w, idx, 3) = std::min(
(center_y_temp - box_height_ratio / 2.) / img_height >= 0 (center_y_temp + box_height_ratio / 2.) / img_height, 1.);
? (center_y_temp - box_height_ratio / 2.) / img_height
: 0;
e_boxes(h, w, idx, 2) =
(center_x_temp + box_width_ratio / 2.) / img_width <= 1
? (center_x_temp + box_width_ratio / 2.) / img_width
: 1;
e_boxes(h, w, idx, 3) =
(center_y_temp + box_height_ratio / 2.) / img_height <= 1
? (center_y_temp + box_height_ratio / 2.) / img_height
: 1;
idx++; idx++;
} }
} }
...@@ -131,8 +137,14 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> { ...@@ -131,8 +137,14 @@ class DensityPriorBoxOpKernel : public framework::OpKernel<T> {
vars->Resize({box_num, static_cast<int>(variances.size())}); vars->Resize({box_num, static_cast<int>(variances.size())});
auto e_vars = framework::EigenMatrix<T, Eigen::RowMajor>::From(*vars); auto e_vars = framework::EigenMatrix<T, Eigen::RowMajor>::From(*vars);
#ifdef PADDLE_WITH_MKLML
e_vars = var_et.broadcast(Eigen::DSizes<int, 2>(box_num, 1)); #pragma omp parallel for collapse(2)
#endif
for (int i = 0; i < box_num; ++i) {
for (int j = 0; j < variances.size(); ++j) {
e_vars(i, j) = variances[j];
}
}
vars->Resize(var_dim); vars->Resize(var_dim);
boxes->Resize(box_dim); boxes->Resize(box_dim);
......
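Inside each feature-map cell, the kernel places `density x density` sub-centers spaced `shift = step_average / density` apart. The first sub-center sits half a shift inside the cell's top-left corner (`center - step_average / 2`). Each box is then normalized by the image size and clipped to `[0, 1]`. The sketch below covers one cell, one fixed size and one ratio. `DensityBoxes` and its parameter names are hypothetical, chosen to mirror the kernel's variables.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// [xmin, ymin, xmax, ymax], normalized by the image size.
using NormBox = std::array<float, 4>;

// Hypothetical sketch: boxes generated for one feature-map cell centered at
// (center_x, center_y) for a single fixed size, density and aspect ratio.
std::vector<NormBox> DensityBoxes(float center_x, float center_y,
                                  int step_average, int density,
                                  float fixed_size, float ratio,
                                  float img_width, float img_height) {
  std::vector<NormBox> boxes;
  const int shift = step_average / density;  // integer spacing, as in the op
  const float box_w = fixed_size * std::sqrt(ratio);
  const float box_h = fixed_size / std::sqrt(ratio);
  // Top-left sub-center sits half a shift inside the cell.
  const float start_x = center_x - step_average / 2.f + shift / 2.f;
  const float start_y = center_y - step_average / 2.f + shift / 2.f;
  for (int di = 0; di < density; ++di) {
    for (int dj = 0; dj < density; ++dj) {
      const float cx = start_x + dj * shift;
      const float cy = start_y + di * shift;
      boxes.push_back({std::max((cx - box_w / 2.f) / img_width, 0.f),
                       std::max((cy - box_h / 2.f) / img_height, 0.f),
                       std::min((cx + box_w / 2.f) / img_width, 1.f),
                       std::min((cy + box_h / 2.f) / img_height, 1.f)});
    }
  }
  return boxes;
}
```

For a cell centered at (8, 8) with `step_average = 16` and `density = 2`, the sub-centers land at 4 and 12 on each axis. Precomputing `sqrt(fixed_ratios[r])` in the kernel only hoists work out of the inner loops and does not change these coordinates.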
...@@ -9,7 +9,7 @@ ...@@ -9,7 +9,7 @@
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/fluid/operators/yolov3_loss_op.h" #include "paddle/fluid/operators/detection/yolov3_loss_op.h"
#include "paddle/fluid/framework/op_registry.h" #include "paddle/fluid/framework/op_registry.h"
namespace paddle { namespace paddle {
...@@ -29,23 +29,33 @@ class Yolov3LossOp : public framework::OperatorWithKernel { ...@@ -29,23 +29,33 @@ class Yolov3LossOp : public framework::OperatorWithKernel {
"Input(GTLabel) of Yolov3LossOp should not be null."); "Input(GTLabel) of Yolov3LossOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Loss"), PADDLE_ENFORCE(ctx->HasOutput("Loss"),
"Output(Loss) of Yolov3LossOp should not be null."); "Output(Loss) of Yolov3LossOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("ObjectnessMask"),
"Output(ObjectnessMask) of Yolov3LossOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("GTMatchMask"),
"Output(GTMatchMask) of Yolov3LossOp should not be null.");
auto dim_x = ctx->GetInputDim("X"); auto dim_x = ctx->GetInputDim("X");
auto dim_gtbox = ctx->GetInputDim("GTBox"); auto dim_gtbox = ctx->GetInputDim("GTBox");
auto dim_gtlabel = ctx->GetInputDim("GTLabel"); auto dim_gtlabel = ctx->GetInputDim("GTLabel");
auto anchors = ctx->Attrs().Get<std::vector<int>>("anchors"); auto anchors = ctx->Attrs().Get<std::vector<int>>("anchors");
int anchor_num = anchors.size() / 2;
auto anchor_mask = ctx->Attrs().Get<std::vector<int>>("anchor_mask");
int mask_num = anchor_mask.size();
auto class_num = ctx->Attrs().Get<int>("class_num"); auto class_num = ctx->Attrs().Get<int>("class_num");
PADDLE_ENFORCE_EQ(dim_x.size(), 4, "Input(X) should be a 4-D tensor."); PADDLE_ENFORCE_EQ(dim_x.size(), 4, "Input(X) should be a 4-D tensor.");
PADDLE_ENFORCE_EQ(dim_x[2], dim_x[3], PADDLE_ENFORCE_EQ(dim_x[2], dim_x[3],
"Input(X) dim[2] and dim[3] should be equal."); "Input(X) dim[2] and dim[3] should be equal.");
PADDLE_ENFORCE_EQ(dim_x[1], anchors.size() / 2 * (5 + class_num), PADDLE_ENFORCE_EQ(
"Input(X) dim[1] should be equal to (anchor_number * (5 " dim_x[1], mask_num * (5 + class_num),
"+ class_num))."); "Input(X) dim[1] should be equal to (anchor_mask_number * (5 "
"+ class_num)).");
PADDLE_ENFORCE_EQ(dim_gtbox.size(), 3, PADDLE_ENFORCE_EQ(dim_gtbox.size(), 3,
"Input(GTBox) should be a 3-D tensor"); "Input(GTBox) should be a 3-D tensor");
PADDLE_ENFORCE_EQ(dim_gtbox[2], 4, "Input(GTBox) dim[2] should be 4"); PADDLE_ENFORCE_EQ(dim_gtbox[2], 4, "Input(GTBox) dim[2] should be 4");
PADDLE_ENFORCE_EQ(dim_gtlabel.size(), 2, PADDLE_ENFORCE_EQ(dim_gtlabel.size(), 2,
"Input(GTBox) should be a 2-D tensor"); "Input(GTLabel) should be a 2-D tensor");
PADDLE_ENFORCE_EQ(dim_gtlabel[0], dim_gtbox[0], PADDLE_ENFORCE_EQ(dim_gtlabel[0], dim_gtbox[0],
"Input(GTBox) and Input(GTLabel) dim[0] should be same"); "Input(GTBox) and Input(GTLabel) dim[0] should be same");
PADDLE_ENFORCE_EQ(dim_gtlabel[1], dim_gtbox[1], PADDLE_ENFORCE_EQ(dim_gtlabel[1], dim_gtbox[1],
...@@ -54,11 +64,22 @@ class Yolov3LossOp : public framework::OperatorWithKernel { ...@@ -54,11 +64,22 @@ class Yolov3LossOp : public framework::OperatorWithKernel {
"Attr(anchors) length should be greater than 0."); "Attr(anchors) length should be greater than 0.");
PADDLE_ENFORCE_EQ(anchors.size() % 2, 0, PADDLE_ENFORCE_EQ(anchors.size() % 2, 0,
"Attr(anchors) length should be an even integer."); "Attr(anchors) length should be an even integer.");
for (size_t i = 0; i < anchor_mask.size(); i++) {
PADDLE_ENFORCE_LT(
anchor_mask[i], anchor_num,
"Attr(anchor_mask) values should be less than the anchor number in Attr(anchors).");
}
PADDLE_ENFORCE_GT(class_num, 0, PADDLE_ENFORCE_GT(class_num, 0,
"Attr(class_num) should be an integer greater than 0."); "Attr(class_num) should be an integer greater than 0.");
std::vector<int64_t> dim_out({1}); std::vector<int64_t> dim_out({dim_x[0]});
ctx->SetOutputDim("Loss", framework::make_ddim(dim_out)); ctx->SetOutputDim("Loss", framework::make_ddim(dim_out));
std::vector<int64_t> dim_obj_mask({dim_x[0], mask_num, dim_x[2], dim_x[3]});
ctx->SetOutputDim("ObjectnessMask", framework::make_ddim(dim_obj_mask));
std::vector<int64_t> dim_gt_match_mask({dim_gtbox[0], dim_gtbox[1]});
ctx->SetOutputDim("GTMatchMask", framework::make_ddim(dim_gt_match_mask));
} }
protected: protected:
...@@ -73,11 +94,11 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -73,11 +94,11 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
void Make() override { void Make() override {
AddInput("X", AddInput("X",
"The input tensor of YOLO v3 loss operator, " "The input tensor of YOLOv3 loss operator, "
"This is a 4-D tensor with shape of [N, C, H, W]. " "This is a 4-D tensor with shape of [N, C, H, W]. "
"H and W should be the same, and the second dimension (C) stores " "H and W should be the same, and the second dimension (C) stores "
"box locations, confidence score and classification one-hot " "box locations, confidence score and classification one-hot "
"key of each anchor box"); "keys of each anchor box");
AddInput("GTBox", AddInput("GTBox",
"The input tensor of ground truth boxes, " "The input tensor of ground truth boxes, "
"This is a 3-D tensor with shape of [N, max_box_num, 4], " "This is a 3-D tensor with shape of [N, max_box_num, 4], "
...@@ -89,32 +110,39 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -89,32 +110,39 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput("GTLabel", AddInput("GTLabel",
"The input tensor of ground truth label, " "The input tensor of ground truth label, "
"This is a 2-D tensor with shape of [N, max_box_num], " "This is a 2-D tensor with shape of [N, max_box_num], "
"and each element shoudl be an integer to indicate the " "and each element should be an integer to indicate the "
"box class id."); "box class id.");
AddOutput("Loss", AddOutput("Loss",
"The output yolov3 loss tensor, " "The output yolov3 loss tensor, "
"This is a 1-D tensor with shape of [1]"); "This is a 1-D tensor with shape of [N]");
AddOutput("ObjectnessMask",
"This is an intermediate tensor with shape of [N, M, H, W], "
"where M is the number of anchor masks. It caches the mask "
"used to calculate the objectness loss in the gradient kernel.")
.AsIntermediate();
AddOutput("GTMatchMask",
"This is an intermediate tensor with shape of [N, B], "
"where B is the max number of GT boxes. It caches the matched "
"mask index of each GT box for the gradient calculation.")
.AsIntermediate();
AddAttr<int>("class_num", "The number of classes to predict."); AddAttr<int>("class_num", "The number of classes to predict.");
AddAttr<std::vector<int>>("anchors", AddAttr<std::vector<int>>("anchors",
"The anchor width and height, " "The anchor width and height, "
"it will be parsed pair by pair."); "it will be parsed pair by pair.")
.SetDefault(std::vector<int>{});
AddAttr<std::vector<int>>("anchor_mask",
"The mask index of anchors used in "
"current YOLOv3 loss calculation.")
.SetDefault(std::vector<int>{});
AddAttr<int>("downsample_ratio",
"The downsample ratio from network input to YOLOv3 loss "
"input, so 32, 16, 8 should be set for the first, second, "
"and third YOLOv3 loss operators.")
.SetDefault(32);
AddAttr<float>("ignore_thresh", AddAttr<float>("ignore_thresh",
"The ignore threshold to ignore confidence loss."); "The ignore threshold to ignore confidence loss.")
AddAttr<float>("loss_weight_xy", "The weight of x, y location loss.") .SetDefault(0.7);
.SetDefault(1.0);
AddAttr<float>("loss_weight_wh", "The weight of w, h location loss.")
.SetDefault(1.0);
AddAttr<float>(
"loss_weight_conf_target",
"The weight of confidence score loss in locations with target object.")
.SetDefault(1.0);
AddAttr<float>("loss_weight_conf_notarget",
"The weight of confidence score loss in locations without "
"target object.")
.SetDefault(1.0);
AddAttr<float>("loss_weight_class", "The weight of classification loss.")
.SetDefault(1.0);
AddComment(R"DOC( AddComment(R"DOC(
This operator generates the YOLOv3 loss from the given prediction results and ground This operator generates the YOLOv3 loss from the given prediction results and ground
truth boxes. truth boxes.
...@@ -147,17 +175,28 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -147,17 +175,28 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
thresh, the confidence score loss of this anchor box will be ignored. thresh, the confidence score loss of this anchor box will be ignored.
Therefore, the yolov3 loss consists of three major parts: box location loss, Therefore, the yolov3 loss consists of three major parts: box location loss,
confidence score loss, and classification loss. The MSE loss is used for confidence score loss, and classification loss. The L2 loss is used for
box location, and binary cross entropy loss is used for confidence score box coordinates (w, h), and sigmoid cross entropy loss is used for box
loss and classification loss. coordinates (x, y), confidence score loss and classification loss.
Each ground truth box finds the best matching anchor box among all anchors.
The prediction of this anchor box incurs all three parts of the loss, while
predictions of anchor boxes with no matched GT box incur only the objectness
loss.
In order to trade off box coordinate losses between big boxes and small
boxes, box coordinate losses are multiplied by a scale weight, which is
calculated as follows.
$$
weight_{box} = 2.0 - t_w * t_h
$$
The final loss is represented as follows. The final loss is represented as follows.
$$ $$
loss = \loss_weight_{xy} * loss_{xy} + \loss_weight_{wh} * loss_{wh} loss = (loss_{xy} + loss_{wh}) * weight_{box}
+ \loss_weight_{conf_target} * loss_{conf_target} + loss_{conf} + loss_{class}
+ \loss_weight_{conf_notarget} * loss_{conf_notarget}
+ \loss_weight_{class} * loss_{class}
$$ $$
)DOC"); )DOC");
} }
...@@ -196,6 +235,8 @@ class Yolov3LossGradMaker : public framework::SingleGradOpDescMaker { ...@@ -196,6 +235,8 @@ class Yolov3LossGradMaker : public framework::SingleGradOpDescMaker {
op->SetInput("GTBox", Input("GTBox")); op->SetInput("GTBox", Input("GTBox"));
op->SetInput("GTLabel", Input("GTLabel")); op->SetInput("GTLabel", Input("GTLabel"));
op->SetInput(framework::GradVarName("Loss"), OutputGrad("Loss")); op->SetInput(framework::GradVarName("Loss"), OutputGrad("Loss"));
op->SetInput("ObjectnessMask", Output("ObjectnessMask"));
op->SetInput("GTMatchMask", Output("GTMatchMask"));
op->SetAttrMap(Attrs()); op->SetAttrMap(Attrs());
......
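This operator involves two pieces of arithmetic that are easy to check by hand. The first is the channel count enforced in `InferShape`, `mask_num * (5 + class_num)`: each selected anchor contributes x, y, w, h, objectness, and one score per class. The second is the box-size weight `2.0 - t_w * t_h` from the operator description, with the ground-truth width and height normalized to `[0, 1]`. The sketch below uses hypothetical helper names. It also writes out the numerically stable sigmoid cross entropy formula used by `SigmoidCrossEntropy` in `yolov3_loss_op.h` (here via `std::log1p`) and compares it with the naive definition.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Channels expected in Input(X) dim[1]: per selected anchor, 4 box offsets,
// 1 objectness score and class_num class scores.
int ExpectedChannels(const std::vector<int>& anchor_mask, int class_num) {
  return static_cast<int>(anchor_mask.size()) * (5 + class_num);
}

// Weight applied to the coordinate losses; gt_w and gt_h are normalized to
// [0, 1], so small boxes get a weight close to 2 and large boxes close to 1.
double BoxWeight(double gt_w, double gt_h) { return 2.0 - gt_w * gt_h; }

// Numerically stable sigmoid cross entropy:
// max(x, 0) - x * label + log(1 + exp(-|x|)).
double SigmoidCrossEntropy(double x, double label) {
  return std::max(x, 0.0) - x * label + std::log1p(std::exp(-std::fabs(x)));
}

// Naive form -label * log(p) - (1 - label) * log(1 - p) with p = sigmoid(x),
// used only to cross-check the stable form for moderate x.
double NaiveSigmoidCrossEntropy(double x, double label) {
  const double p = 1.0 / (1.0 + std::exp(-x));
  return -label * std::log(p) - (1.0 - label) * std::log(1.0 - p);
}
```

For example, with nine anchors, `anchor_mask = {6, 7, 8}` and `class_num = 80`, `Input(X)` needs 3 * 85 = 255 channels. A ground-truth box covering a quarter of the image (`t_w = t_h = 0.5`) gets a coordinate-loss weight of 1.75.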
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <algorithm>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/math/math_function.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, size_t D, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenTensor = framework::EigenTensor<T, D, MajorType, IndexType>;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
template <typename T>
static inline bool LessEqualZero(T x) {
return x < 1e-6;
}
template <typename T>
static T SigmoidCrossEntropy(T x, T label) {
return (x > 0 ? x : 0.0) - x * label + std::log(1.0 + std::exp(-std::abs(x)));
}
template <typename T>
static T L2Loss(T x, T y) {
return 0.5 * (y - x) * (y - x);
}
template <typename T>
static T SigmoidCrossEntropyGrad(T x, T label) {
return 1.0 / (1.0 + std::exp(-x)) - label;
}
template <typename T>
static T L2LossGrad(T x, T y) {
return x - y;
}
static int GetMaskIndex(const std::vector<int>& mask, int val) {
for (size_t i = 0; i < mask.size(); i++) {
if (mask[i] == val) {
return i;
}
}
return -1;
}
template <typename T>
struct Box {
T x, y, w, h;
};
template <typename T>
static inline T sigmoid(T x) {
return 1.0 / (1.0 + std::exp(-x));
}
template <typename T>
static inline Box<T> GetYoloBox(const T* x, std::vector<int> anchors, int i,
int j, int an_idx, int grid_size,
int input_size, int index, int stride) {
Box<T> b;
b.x = (i + sigmoid<T>(x[index])) / grid_size;
b.y = (j + sigmoid<T>(x[index + stride])) / grid_size;
b.w = std::exp(x[index + 2 * stride]) * anchors[2 * an_idx] / input_size;
b.h = std::exp(x[index + 3 * stride]) * anchors[2 * an_idx + 1] / input_size;
return b;
}
template <typename T>
static inline Box<T> GetGtBox(const T* gt, int batch, int max_boxes, int idx) {
Box<T> b;
b.x = gt[(batch * max_boxes + idx) * 4];
b.y = gt[(batch * max_boxes + idx) * 4 + 1];
b.w = gt[(batch * max_boxes + idx) * 4 + 2];
b.h = gt[(batch * max_boxes + idx) * 4 + 3];
return b;
}
template <typename T>
static inline T BoxOverlap(T c1, T w1, T c2, T w2) {
T l1 = c1 - w1 / 2.0;
T l2 = c2 - w2 / 2.0;
T left = l1 > l2 ? l1 : l2;
T r1 = c1 + w1 / 2.0;
T r2 = c2 + w2 / 2.0;
T right = r1 < r2 ? r1 : r2;
return right - left;
}
template <typename T>
static inline T CalcBoxIoU(Box<T> b1, Box<T> b2) {
T w = BoxOverlap(b1.x, b1.w, b2.x, b2.w);
T h = BoxOverlap(b1.y, b1.h, b2.y, b2.h);
T inter_area = (w < 0 || h < 0) ? 0.0 : w * h;
T union_area = b1.w * b1.h + b2.w * b2.h - inter_area;
return inter_area / union_area;
}
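The box helpers work on center-format boxes `(x, y, w, h)` in normalized coordinates. The following standalone sketch re-implements `BoxOverlap` and `CalcBoxIoU` in double precision with a worked example: two 0.4 x 0.4 boxes whose centers are 0.1 apart overlap by 0.3 x 0.4 = 0.12, the union is 0.16 + 0.16 - 0.12 = 0.20, so the IoU is 0.6.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Box { double x, y, w, h; };  // center x/y, width, height

// 1-D overlap of two segments given by center c and width w;
// negative when the segments are disjoint.
double BoxOverlap(double c1, double w1, double c2, double w2) {
  double left = std::max(c1 - w1 / 2.0, c2 - w2 / 2.0);
  double right = std::min(c1 + w1 / 2.0, c2 + w2 / 2.0);
  return right - left;
}

double CalcBoxIoU(const Box& b1, const Box& b2) {
  double w = BoxOverlap(b1.x, b1.w, b2.x, b2.w);
  double h = BoxOverlap(b1.y, b1.h, b2.y, b2.h);
  // Both overlaps negative would give a positive product, hence the guard.
  double inter = (w < 0 || h < 0) ? 0.0 : w * h;
  double uni = b1.w * b1.h + b2.w * b2.h - inter;
  return inter / uni;
}
```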
static inline int GetEntryIndex(int batch, int an_idx, int hw_idx, int an_num,
int an_stride, int stride, int entry) {
return (batch * an_num + an_idx) * an_stride + entry * stride + hw_idx;
}
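`GetEntryIndex` treats the NCHW input `[N, mask_num * (5 + class_num), H, W]` as a 5-D array `[N, A, 5 + class_num, H, W]`, where entries 0 to 3 are the box terms, entry 4 is the objectness and entries from 5 on are the class scores (matching the `4 * stride` and `entry = 5` offsets used by the kernels). The sketch below checks that reading by comparing against a plain row-major 5-D offset; `FlatIndex5D` and `EntryIndexMatches` are hypothetical helpers.

```cpp
#include <cassert>

// Copy of GetEntryIndex above.
int GetEntryIndex(int batch, int an_idx, int hw_idx, int an_num, int an_stride,
                  int stride, int entry) {
  return (batch * an_num + an_idx) * an_stride + entry * stride + hw_idx;
}

// Row-major offset of element [b][a][e][k][l] in an array of shape
// [N, A, E, H, W]; N does not enter the offset.
int FlatIndex5D(int b, int a, int e, int k, int l, int A, int E, int H, int W) {
  return (((b * A + a) * E + e) * H + k) * W + l;
}

// Checks GetEntryIndex against the 5-D view for every element.
bool EntryIndexMatches(int N, int A, int class_num, int H, int W) {
  const int E = 5 + class_num;
  const int stride = H * W;
  const int an_stride = E * stride;
  for (int b = 0; b < N; ++b)
    for (int a = 0; a < A; ++a)
      for (int e = 0; e < E; ++e)
        for (int k = 0; k < H; ++k)
          for (int l = 0; l < W; ++l)
            if (GetEntryIndex(b, a, k * W + l, A, an_stride, stride, e) !=
                FlatIndex5D(b, a, e, k, l, A, E, H, W))
              return false;
  return true;
}
```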
template <typename T>
static void CalcBoxLocationLoss(T* loss, const T* input, Box<T> gt,
const std::vector<int>& anchors, int an_idx,
int box_idx, int gi, int gj, int grid_size,
int input_size, int stride) {
T tx = gt.x * grid_size - gi;
T ty = gt.y * grid_size - gj;
T tw = std::log(gt.w * input_size / anchors[2 * an_idx]);
T th = std::log(gt.h * input_size / anchors[2 * an_idx + 1]);
T scale = (2.0 - gt.w * gt.h);
loss[0] += SigmoidCrossEntropy<T>(input[box_idx], tx) * scale;
loss[0] += SigmoidCrossEntropy<T>(input[box_idx + stride], ty) * scale;
loss[0] += L2Loss<T>(input[box_idx + 2 * stride], tw) * scale;
loss[0] += L2Loss<T>(input[box_idx + 3 * stride], th) * scale;
}
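The regression targets in `CalcBoxLocationLoss` are the inverse of the decoding in `GetYoloBox`: `tx, ty` are the offsets of the gt center inside its grid cell (the raw prediction is trained so that its sigmoid matches them), and `tw, th` are log ratios to the anchor size. A minimal round-trip sketch, assuming a square grid and with `EncodeTargets`, `Decode` and `Logit` as hypothetical helpers:

```cpp
#include <cassert>
#include <cmath>

struct Box { double x, y, w, h; };

double Sigmoid(double v) { return 1.0 / (1.0 + std::exp(-v)); }
double Logit(double p) { return std::log(p / (1.0 - p)); }

// Targets as computed in CalcBoxLocationLoss (gi, gj: cell of the gt center).
void EncodeTargets(const Box& gt, double anchor_w, double anchor_h,
                   int grid_size, int input_size, double t[4]) {
  int gi = static_cast<int>(gt.x * grid_size);
  int gj = static_cast<int>(gt.y * grid_size);
  t[0] = gt.x * grid_size - gi;  // offset inside the cell, in [0, 1)
  t[1] = gt.y * grid_size - gj;
  t[2] = std::log(gt.w * input_size / anchor_w);
  t[3] = std::log(gt.h * input_size / anchor_h);
}

// Decoding as in GetYoloBox for raw predictions p at cell (gi, gj).
Box Decode(const double p[4], int gi, int gj, double anchor_w, double anchor_h,
           int grid_size, int input_size) {
  Box b;
  b.x = (gi + Sigmoid(p[0])) / grid_size;
  b.y = (gj + Sigmoid(p[1])) / grid_size;
  b.w = std::exp(p[2]) * anchor_w / input_size;
  b.h = std::exp(p[3]) * anchor_h / input_size;
  return b;
}
```

Feeding `Logit(tx)`, `Logit(ty)`, `tw`, `th` back into the decoder recovers the gt box; the kernel additionally weights these terms by `2 - w * h` so that small boxes count more.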
template <typename T>
static void CalcBoxLocationLossGrad(T* input_grad, const T loss, const T* input,
Box<T> gt, const std::vector<int>& anchors,
int an_idx, int box_idx, int gi, int gj,
int grid_size, int input_size, int stride) {
T tx = gt.x * grid_size - gi;
T ty = gt.y * grid_size - gj;
T tw = std::log(gt.w * input_size / anchors[2 * an_idx]);
T th = std::log(gt.h * input_size / anchors[2 * an_idx + 1]);
T scale = (2.0 - gt.w * gt.h);
input_grad[box_idx] =
SigmoidCrossEntropyGrad<T>(input[box_idx], tx) * scale * loss;
input_grad[box_idx + stride] =
SigmoidCrossEntropyGrad<T>(input[box_idx + stride], ty) * scale * loss;
input_grad[box_idx + 2 * stride] =
L2LossGrad<T>(input[box_idx + 2 * stride], tw) * scale * loss;
input_grad[box_idx + 3 * stride] =
L2LossGrad<T>(input[box_idx + 3 * stride], th) * scale * loss;
}
template <typename T>
static inline void CalcLabelLoss(T* loss, const T* input, const int index,
const int label, const int class_num,
const int stride) {
for (int i = 0; i < class_num; i++) {
T pred = input[index + i * stride];
loss[0] += SigmoidCrossEntropy<T>(pred, (i == label) ? 1.0 : 0.0);
}
}
template <typename T>
static inline void CalcLabelLossGrad(T* input_grad, const T loss,
const T* input, const int index,
const int label, const int class_num,
const int stride) {
for (int i = 0; i < class_num; i++) {
T pred = input[index + i * stride];
input_grad[index + i * stride] =
SigmoidCrossEntropyGrad<T>(pred, (i == label) ? 1.0 : 0.0) * loss;
}
}
template <typename T>
static inline void CalcObjnessLoss(T* loss, const T* input, const T* objness,
const int n, const int an_num, const int h,
const int w, const int stride,
const int an_stride) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < an_num; j++) {
for (int k = 0; k < h; k++) {
for (int l = 0; l < w; l++) {
T obj = objness[k * w + l];
if (obj > 1e-5) {
// positive sample: obj = 1
loss[i] += SigmoidCrossEntropy<T>(input[k * w + l], 1.0);
} else if (obj > -0.5) {
// negative sample: obj = 0
loss[i] += SigmoidCrossEntropy<T>(input[k * w + l], 0.0);
}
}
}
objness += stride;
input += an_stride;
}
}
}
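`ObjectnessMask` encodes three cases: `1` for a cell matched to a gt box (positive), `-1` for a cell whose best IoU exceeds `ignore_thresh` (ignored, no loss) and `0` otherwise (negative). A single-map sketch of the same branching, with `ObjnessLoss` as a hypothetical helper:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

double SigmoidCrossEntropy(double x, double label) {
  return (x > 0 ? x : 0.0) - x * label + std::log(1.0 + std::exp(-std::abs(x)));
}

// Objectness loss over one feature map, following CalcObjnessLoss:
// mask 1 -> positive, mask 0 -> negative, mask -1 -> ignored.
double ObjnessLoss(const std::vector<double>& logits,
                   const std::vector<double>& mask) {
  double loss = 0.0;
  for (size_t i = 0; i < logits.size(); ++i) {
    if (mask[i] > 1e-5) {
      loss += SigmoidCrossEntropy(logits[i], 1.0);
    } else if (mask[i] > -0.5) {
      loss += SigmoidCrossEntropy(logits[i], 0.0);
    }  // otherwise the cell is ignored and contributes nothing
  }
  return loss;
}
```

The gradient kernel uses the same branching, so ignored cells also receive a zero gradient.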
template <typename T>
static inline void CalcObjnessLossGrad(T* input_grad, const T* loss,
const T* input, const T* objness,
const int n, const int an_num,
const int h, const int w,
const int stride, const int an_stride) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < an_num; j++) {
for (int k = 0; k < h; k++) {
for (int l = 0; l < w; l++) {
T obj = objness[k * w + l];
if (obj > 1e-5) {
input_grad[k * w + l] =
SigmoidCrossEntropyGrad<T>(input[k * w + l], 1.0) * loss[i];
} else if (obj > -0.5) {
input_grad[k * w + l] =
SigmoidCrossEntropyGrad<T>(input[k * w + l], 0.0) * loss[i];
}
}
}
objness += stride;
input += an_stride;
input_grad += an_stride;
}
}
}
template <typename T>
static inline void GtValid(bool* valid, const T* gtbox, const int n,
const int b) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < b; j++) {
if (LessEqualZero(gtbox[j * 4 + 2]) || LessEqualZero(gtbox[j * 4 + 3])) {
valid[j] = false;
} else {
valid[j] = true;
}
}
valid += b;
gtbox += b * 4;
}
}
template <typename T>
class Yolov3LossKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<Tensor>("X");
auto* gt_box = ctx.Input<Tensor>("GTBox");
auto* gt_label = ctx.Input<Tensor>("GTLabel");
auto* loss = ctx.Output<Tensor>("Loss");
auto* objness_mask = ctx.Output<Tensor>("ObjectnessMask");
auto* gt_match_mask = ctx.Output<Tensor>("GTMatchMask");
auto anchors = ctx.Attr<std::vector<int>>("anchors");
auto anchor_mask = ctx.Attr<std::vector<int>>("anchor_mask");
int class_num = ctx.Attr<int>("class_num");
float ignore_thresh = ctx.Attr<float>("ignore_thresh");
int downsample_ratio = ctx.Attr<int>("downsample_ratio");
const int n = input->dims()[0];
const int h = input->dims()[2];
const int w = input->dims()[3];
const int an_num = anchors.size() / 2;
const int mask_num = anchor_mask.size();
const int b = gt_box->dims()[1];
int input_size = downsample_ratio * h;
const int stride = h * w;
const int an_stride = (class_num + 5) * stride;
const T* input_data = input->data<T>();
const T* gt_box_data = gt_box->data<T>();
const int* gt_label_data = gt_label->data<int>();
T* loss_data = loss->mutable_data<T>({n}, ctx.GetPlace());
memset(loss_data, 0, loss->numel() * sizeof(T));
T* obj_mask_data =
objness_mask->mutable_data<T>({n, mask_num, h, w}, ctx.GetPlace());
memset(obj_mask_data, 0, objness_mask->numel() * sizeof(T));
int* gt_match_mask_data =
gt_match_mask->mutable_data<int>({n, b}, ctx.GetPlace());
// compute the valid gt box mask once so the loops below do not recompute it
Tensor gt_valid_mask;
bool* gt_valid_mask_data =
gt_valid_mask.mutable_data<bool>({n, b}, ctx.GetPlace());
GtValid<T>(gt_valid_mask_data, gt_box_data, n, b);
for (int i = 0; i < n; i++) {
for (int j = 0; j < mask_num; j++) {
for (int k = 0; k < h; k++) {
for (int l = 0; l < w; l++) {
// for each predicted box, find the best-matching gt box; if the overlap is
// larger than ignore_thresh, ignore the objectness loss.
int box_idx =
GetEntryIndex(i, j, k * w + l, mask_num, an_stride, stride, 0);
Box<T> pred = GetYoloBox(input_data, anchors, l, k, anchor_mask[j],
h, input_size, box_idx, stride);
T best_iou = 0;
for (int t = 0; t < b; t++) {
if (!gt_valid_mask_data[i * b + t]) {
continue;
}
Box<T> gt = GetGtBox(gt_box_data, i, b, t);
T iou = CalcBoxIoU(pred, gt);
if (iou > best_iou) {
best_iou = iou;
}
}
// If the best IoU is larger than ignore_thresh,
// ignore the objectness loss.
if (best_iou > ignore_thresh) {
int obj_idx = (i * mask_num + j) * stride + k * w + l;
obj_mask_data[obj_idx] = static_cast<T>(-1);
}
// all losses should be calculated if the best IoU
// is larger than truth_thresh, but currently
// truth_thresh is fixed at the unreachable value 1.0.
}
}
}
for (int t = 0; t < b; t++) {
if (!gt_valid_mask_data[i * b + t]) {
gt_match_mask_data[i * b + t] = -1;
continue;
}
Box<T> gt = GetGtBox(gt_box_data, i, b, t);
int gi = static_cast<int>(gt.x * w);
int gj = static_cast<int>(gt.y * h);
Box<T> gt_shift = gt;
gt_shift.x = 0.0;
gt_shift.y = 0.0;
T best_iou = 0.0;
int best_n = 0;
// each gt box picks its best-matching anchor box as the positive sample;
// for positive samples all losses are calculated, while for the
// other samples only the objectness loss is required.
for (int an_idx = 0; an_idx < an_num; an_idx++) {
Box<T> an_box;
an_box.x = 0.0;
an_box.y = 0.0;
an_box.w = anchors[2 * an_idx] / static_cast<T>(input_size);
an_box.h = anchors[2 * an_idx + 1] / static_cast<T>(input_size);
T iou = CalcBoxIoU<T>(an_box, gt_shift);
if (iou > best_iou) {
best_iou = iou;
best_n = an_idx;
}
}
int mask_idx = GetMaskIndex(anchor_mask, best_n);
gt_match_mask_data[i * b + t] = mask_idx;
if (mask_idx >= 0) {
int box_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num,
an_stride, stride, 0);
CalcBoxLocationLoss<T>(loss_data + i, input_data, gt, anchors, best_n,
box_idx, gi, gj, h, input_size, stride);
int obj_idx = (i * mask_num + mask_idx) * stride + gj * w + gi;
obj_mask_data[obj_idx] = 1.0;
int label = gt_label_data[i * b + t];
int label_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num,
an_stride, stride, 5);
CalcLabelLoss<T>(loss_data + i, input_data, label_idx, label,
class_num, stride);
}
}
}
CalcObjnessLoss<T>(loss_data, input_data + 4 * stride, obj_mask_data, n,
mask_num, h, w, stride, an_stride);
}
};
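The anchor assignment in `Yolov3LossKernel` compares shapes only: both the gt box and each anchor are moved to the origin, so the IoU reduces to `min(w) * min(h) / union`. The best anchor is then mapped through `anchor_mask`; `-1` means this output layer does not own that anchor and the gt box gets no location or class loss here. A sketch with hypothetical helpers `ShapeIoU`, `BestAnchor` and `MaskIndex`, using the commonly used YOLOv3 anchor list as sample data:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// IoU of two boxes that share the same center (only shapes matter).
double ShapeIoU(double w1, double h1, double w2, double h2) {
  double inter = std::min(w1, w2) * std::min(h1, h2);
  return inter / (w1 * h1 + w2 * h2 - inter);
}

// Anchor whose shape best matches the gt box. Anchors are in pixels,
// the gt size is normalized by input_size, as in the kernel loop.
int BestAnchor(const std::vector<int>& anchors, double gt_w, double gt_h,
               int input_size) {
  double best_iou = 0.0;
  int best_n = 0;
  for (size_t a = 0; a < anchors.size() / 2; ++a) {
    double iou = ShapeIoU(anchors[2 * a] / static_cast<double>(input_size),
                          anchors[2 * a + 1] / static_cast<double>(input_size),
                          gt_w, gt_h);
    if (iou > best_iou) {
      best_iou = iou;
      best_n = static_cast<int>(a);
    }
  }
  return best_n;
}

// Position of val in mask, or -1 (same contract as GetMaskIndex).
int MaskIndex(const std::vector<int>& mask, int val) {
  auto it = std::find(mask.begin(), mask.end(), val);
  return it == mask.end() ? -1 : static_cast<int>(it - mask.begin());
}
```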
template <typename T>
class Yolov3LossGradKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<Tensor>("X");
auto* gt_box = ctx.Input<Tensor>("GTBox");
auto* gt_label = ctx.Input<Tensor>("GTLabel");
auto* input_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* loss_grad = ctx.Input<Tensor>(framework::GradVarName("Loss"));
auto* objness_mask = ctx.Input<Tensor>("ObjectnessMask");
auto* gt_match_mask = ctx.Input<Tensor>("GTMatchMask");
auto anchors = ctx.Attr<std::vector<int>>("anchors");
auto anchor_mask = ctx.Attr<std::vector<int>>("anchor_mask");
int class_num = ctx.Attr<int>("class_num");
int downsample_ratio = ctx.Attr<int>("downsample_ratio");
const int n = input_grad->dims()[0];
const int c = input_grad->dims()[1];
const int h = input_grad->dims()[2];
const int w = input_grad->dims()[3];
const int mask_num = anchor_mask.size();
const int b = gt_match_mask->dims()[1];
int input_size = downsample_ratio * h;
const int stride = h * w;
const int an_stride = (class_num + 5) * stride;
const T* input_data = input->data<T>();
const T* gt_box_data = gt_box->data<T>();
const int* gt_label_data = gt_label->data<int>();
const T* loss_grad_data = loss_grad->data<T>();
const T* obj_mask_data = objness_mask->data<T>();
const int* gt_match_mask_data = gt_match_mask->data<int>();
T* input_grad_data =
input_grad->mutable_data<T>({n, c, h, w}, ctx.GetPlace());
memset(input_grad_data, 0, input_grad->numel() * sizeof(T));
for (int i = 0; i < n; i++) {
for (int t = 0; t < b; t++) {
int mask_idx = gt_match_mask_data[i * b + t];
if (mask_idx >= 0) {
Box<T> gt = GetGtBox(gt_box_data, i, b, t);
int gi = static_cast<int>(gt.x * w);
int gj = static_cast<int>(gt.y * h);
int box_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num,
an_stride, stride, 0);
CalcBoxLocationLossGrad<T>(
input_grad_data, loss_grad_data[i], input_data, gt, anchors,
anchor_mask[mask_idx], box_idx, gi, gj, h, input_size, stride);
int label = gt_label_data[i * b + t];
int label_idx = GetEntryIndex(i, mask_idx, gj * w + gi, mask_num,
an_stride, stride, 5);
CalcLabelLossGrad<T>(input_grad_data, loss_grad_data[i], input_data,
label_idx, label, class_num, stride);
}
}
}
CalcObjnessLossGrad<T>(input_grad_data + 4 * stride, loss_grad_data,
input_data + 4 * stride, obj_mask_data, n, mask_num,
h, w, stride, an_stride);
}
};
} // namespace operators
} // namespace paddle
...@@ -79,17 +79,17 @@ void FusionRepeatedFCReluOpMaker::Make() { ...@@ -79,17 +79,17 @@ void FusionRepeatedFCReluOpMaker::Make() {
} }
template <typename T> template <typename T>
static void fc_relu(const T* x, const T* w, const T* b, T* y, int m, int n, static void fc_relu(const T* x, const T* w, const T* b, T* y,
int k) { const jit::matmul_attr_t& attr) {
auto matmul = auto matmul =
jit::Get<jit::kMatMul, jit::MatMulTuples<T>, platform::CPUPlace>(k); jit::Get<jit::kMatMul, jit::MatMulTuples<T>, platform::CPUPlace>(attr);
auto addbias_relu = auto addbias_relu =
jit::Get<jit::kVAddRelu, jit::XYZNTuples<T>, platform::CPUPlace>(n); jit::Get<jit::kVAddRelu, jit::XYZNTuples<T>, platform::CPUPlace>(attr.n);
matmul(x, w, y, m, n, k); matmul(x, w, y, &attr);
T* dst = y; T* dst = y;
for (int i = 0; i < m; ++i) { for (int i = 0; i < attr.m; ++i) {
addbias_relu(b, dst, dst, n); addbias_relu(b, dst, dst, attr.n);
dst += n; dst += attr.n;
} }
} }
...@@ -107,32 +107,33 @@ class FusionRepeatedFCReluKernel : public framework::OpKernel<T> { ...@@ -107,32 +107,33 @@ class FusionRepeatedFCReluKernel : public framework::OpKernel<T> {
auto i_dims = in->dims(); auto i_dims = in->dims();
auto w_dims = weights[0]->dims(); auto w_dims = weights[0]->dims();
int m = i_dims[0]; jit::matmul_attr_t attr;
int n = w_dims[1]; attr.m = i_dims[0];
int k = w_dims[0]; attr.n = w_dims[1];
relus[0]->Resize({m, n}); attr.k = w_dims[0];
relus[0]->Resize({attr.m, attr.n});
fc_relu(in->data<T>(), weights[0]->data<T>(), biases[0]->data<T>(), fc_relu(in->data<T>(), weights[0]->data<T>(), biases[0]->data<T>(),
relus[0]->mutable_data<T>(place), m, n, k); relus[0]->mutable_data<T>(place), attr);
for (int i = 1; i < weight_sz - 1; ++i) { for (int i = 1; i < weight_sz - 1; ++i) {
auto i_dims = relus[i - 1]->dims(); auto i_dims = relus[i - 1]->dims();
auto w_dims = weights[i]->dims(); auto w_dims = weights[i]->dims();
int m = i_dims[0]; attr.m = i_dims[0];
int n = w_dims[1]; attr.n = w_dims[1];
int k = w_dims[0]; attr.k = w_dims[0];
relus[i]->Resize({m, n}); relus[i]->Resize({attr.m, attr.n});
fc_relu(relus[i - 1]->data<T>(), weights[i]->data<T>(), fc_relu(relus[i - 1]->data<T>(), weights[i]->data<T>(),
biases[i]->data<T>(), relus[i]->mutable_data<T>(place), m, n, k); biases[i]->data<T>(), relus[i]->mutable_data<T>(place), attr);
} }
auto i_dims_last = relus[weight_sz - 2]->dims(); auto i_dims_last = relus[weight_sz - 2]->dims();
auto w_dims_last = weights[weight_sz - 1]->dims(); auto w_dims_last = weights[weight_sz - 1]->dims();
m = i_dims_last[0]; attr.m = i_dims_last[0];
n = w_dims_last[1]; attr.n = w_dims_last[1];
k = w_dims_last[0]; attr.k = w_dims_last[0];
fc_relu(relus[weight_sz - 2]->data<T>(), weights[weight_sz - 1]->data<T>(), fc_relu(relus[weight_sz - 2]->data<T>(), weights[weight_sz - 1]->data<T>(),
biases[weight_sz - 1]->data<T>(), out->mutable_data<T>(place), m, n, biases[weight_sz - 1]->data<T>(), out->mutable_data<T>(place),
k); attr);
} }
}; };
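Each `fc_relu` call computes `y = relu(x * W + b)` with `x` of shape `m x k`, `W` of shape `k x n` and `b` of length `n`; the kernel chains these layers, taking `attr.m` from the previous output and `attr.n`, `attr.k` from the current weight. A plain reference sketch (the name `FcRelu` is hypothetical) that can be used to cross-check the JIT path:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// y[m x n] = relu(x[m x k] * w[k x n] + b[n]), all row-major.
std::vector<float> FcRelu(const std::vector<float>& x,
                          const std::vector<float>& w,
                          const std::vector<float>& b, int m, int n, int k) {
  std::vector<float> y(static_cast<size_t>(m) * n, 0.f);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      float acc = 0.f;
      for (int p = 0; p < k; ++p) acc += x[i * k + p] * w[p * n + j];
      y[i * n + j] = std::max(acc + b[j], 0.f);  // add bias, then ReLU
    }
  return y;
}
```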
...@@ -87,15 +87,18 @@ class FusionSquaredMatSubKernel : public framework::OpKernel<T> { ...@@ -87,15 +87,18 @@ class FusionSquaredMatSubKernel : public framework::OpKernel<T> {
auto x_dims = x->dims(); auto x_dims = x->dims();
auto y_dims = y->dims(); auto y_dims = y->dims();
int m = x_dims[0]; jit::matmul_attr_t attr;
int k = x_dims[1]; attr.m = x_dims[0];
int n = y_dims[1]; attr.k = x_dims[1];
int o_numel = m * n; attr.n = y_dims[1];
int o_numel = attr.m * attr.n;
auto vsquare_x = auto vsquare_x =
jit::Get<jit::kVSquare, jit::XYNTuples<T>, platform::CPUPlace>(m * k); jit::Get<jit::kVSquare, jit::XYNTuples<T>, platform::CPUPlace>(attr.m *
attr.k);
auto vsquare_y = auto vsquare_y =
jit::Get<jit::kVSquare, jit::XYNTuples<T>, platform::CPUPlace>(k * n); jit::Get<jit::kVSquare, jit::XYNTuples<T>, platform::CPUPlace>(attr.k *
attr.n);
auto vsquare_xy = auto vsquare_xy =
jit::Get<jit::kVSquare, jit::XYNTuples<T>, platform::CPUPlace>(o_numel); jit::Get<jit::kVSquare, jit::XYNTuples<T>, platform::CPUPlace>(o_numel);
auto vsub = auto vsub =
...@@ -103,7 +106,7 @@ class FusionSquaredMatSubKernel : public framework::OpKernel<T> { ...@@ -103,7 +106,7 @@ class FusionSquaredMatSubKernel : public framework::OpKernel<T> {
auto vscal = auto vscal =
jit::Get<jit::kVScal, jit::AXYNTuples<T>, platform::CPUPlace>(o_numel); jit::Get<jit::kVScal, jit::AXYNTuples<T>, platform::CPUPlace>(o_numel);
auto matmul = auto matmul =
jit::Get<jit::kMatMul, jit::MatMulTuples<T>, platform::CPUPlace>(k); jit::Get<jit::kMatMul, jit::MatMulTuples<T>, platform::CPUPlace>(attr);
const T* x_data = x->data<T>(); const T* x_data = x->data<T>();
const T* y_data = y->data<T>(); const T* y_data = y->data<T>();
...@@ -112,12 +115,12 @@ class FusionSquaredMatSubKernel : public framework::OpKernel<T> { ...@@ -112,12 +115,12 @@ class FusionSquaredMatSubKernel : public framework::OpKernel<T> {
T* squared_xy_data = squared_xy->mutable_data<T>(place); T* squared_xy_data = squared_xy->mutable_data<T>(place);
T* o_data = out->mutable_data<T>(place); T* o_data = out->mutable_data<T>(place);
matmul(x_data, y_data, squared_xy_data, m, n, k); matmul(x_data, y_data, squared_xy_data, &attr);
vsquare_xy(squared_xy_data, squared_xy_data, o_numel); vsquare_xy(squared_xy_data, squared_xy_data, o_numel);
vsquare_x(x_data, squared_x_data, m * k); vsquare_x(x_data, squared_x_data, attr.m * attr.k);
vsquare_y(y_data, squared_y_data, k * n); vsquare_y(y_data, squared_y_data, attr.k * attr.n);
matmul(squared_x_data, squared_y_data, o_data, m, n, k); matmul(squared_x_data, squared_y_data, o_data, &attr);
vsub(squared_xy_data, o_data, o_data, o_numel); vsub(squared_xy_data, o_data, o_data, o_numel);
vscal(&scalar, o_data, o_data, o_numel); vscal(&scalar, o_data, o_data, o_numel);
...@@ -93,6 +93,7 @@ std::vector<int> TestSizes() { ...@@ -93,6 +93,7 @@ std::vector<int> TestSizes() {
template <typename KernelTuples, typename... Args> template <typename KernelTuples, typename... Args>
struct BenchFunc { struct BenchFunc {
// return this function avg time // return this function avg time
// TODO(TJ): clear cache every time
double operator()(const typename KernelTuples::func_type tgt, Args... args) { double operator()(const typename KernelTuples::func_type tgt, Args... args) {
for (int i = 0; i < FLAGS_burning; ++i) { for (int i = 0; i < FLAGS_burning; ++i) {
tgt(args...); tgt(args...);
...@@ -172,6 +173,9 @@ void BenchXYZNKernel() { ...@@ -172,6 +173,9 @@ void BenchXYZNKernel() {
RandomVec<T>(d, y_data); RandomVec<T>(d, y_data);
BenchAllImpls<KT, jit::XYZNTuples<T>, PlaceType>(d, x.data<T>(), BenchAllImpls<KT, jit::XYZNTuples<T>, PlaceType>(d, x.data<T>(),
y.data<T>(), z_data, d); y.data<T>(), z_data, d);
// test inplace
BenchAllImpls<KT, jit::XYZNTuples<T>, PlaceType>(d, x.data<T>(), z_data,
z_data, d);
} }
} }
...@@ -311,8 +315,9 @@ void BenchMatMulKernel() { ...@@ -311,8 +315,9 @@ void BenchMatMulKernel() {
const T* a_data = a.data<T>(); const T* a_data = a.data<T>();
const T* b_data = b.data<T>(); const T* b_data = b.data<T>();
T* c_data = c.mutable_data<T>(PlaceType()); T* c_data = c.mutable_data<T>(PlaceType());
BenchAllImpls<KT, jit::MatMulTuples<T>, PlaceType>(k, a_data, b_data, const jit::matmul_attr_t attr{m, n, k};
c_data, m, n, k); BenchAllImpls<KT, jit::MatMulTuples<T>, PlaceType>(attr, a_data, b_data,
c_data, &attr);
} }
} }
} }
...@@ -9,6 +9,7 @@ function(USE_JITKERNEL_GEN TARGET) ...@@ -9,6 +9,7 @@ function(USE_JITKERNEL_GEN TARGET)
endfunction() endfunction()
# use gen jitcode kernel by name # use gen jitcode kernel by name
USE_JITKERNEL_GEN(kMatMul)
USE_JITKERNEL_GEN(kVMul) USE_JITKERNEL_GEN(kVMul)
USE_JITKERNEL_GEN(kVAdd) USE_JITKERNEL_GEN(kVAdd)
USE_JITKERNEL_GEN(kVSub) USE_JITKERNEL_GEN(kVSub)
...@@ -155,7 +155,7 @@ class NCHW16CMulNCCreator : public JitCodeCreator<int> { ...@@ -155,7 +155,7 @@ class NCHW16CMulNCCreator : public JitCodeCreator<int> {
class name##Creator : public JitCodeCreator<int> { \ class name##Creator : public JitCodeCreator<int> { \
public: \ public: \
bool UseMe(const int& attr) const override { \ bool UseMe(const int& attr) const override { \
return platform::MayIUse(platform::avx); \ return platform::MayIUse(platform::avx) && attr <= 1024; \
} \ } \
size_t CodeSize(const int& d) const override { \ size_t CodeSize(const int& d) const override { \
return 96 + d / YMM_FLOAT_BLOCK * 4 * 8; \ return 96 + d / YMM_FLOAT_BLOCK * 4 * 8; \
...@@ -61,6 +61,7 @@ class VXXJitCode : public JitCode { ...@@ -61,6 +61,7 @@ class VXXJitCode : public JitCode {
base += "_Vec"; base += "_Vec";
} }
base += (with_relu_ ? "_Relu" : ""); base += (with_relu_ ? "_Relu" : "");
base += "_D" + std::to_string(num_);
return base.c_str(); return base.c_str();
} }
void genCode() override; void genCode() override;
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License. */
#include "paddle/fluid/operators/jit/gen/matmul.h"
#include <stddef.h> // offsetof
#include <vector>
#include "paddle/fluid/operators/jit/registry.h"
#include "paddle/fluid/platform/cpu_info.h"
namespace paddle {
namespace operators {
namespace jit {
namespace gen {
void MatMulJitCode::genCode() {
preCode();
int block, rest;
const auto groups = packed_groups(n_, k_, &block, &rest);
PADDLE_ENFORCE_GT(groups.front(), 0);
const int block_len = sizeof(float) * block;
const int x_reg_idx = (block == ZMM_FLOAT_BLOCK ? 32 : 16) - 1;
const int w_reg_idx = x_reg_idx - 1;
// alternative that loads the pre-packed weights from the attr:
// mov(reg_ptr_wgt, ptr[param_attr + offsetof(matmul_attr_t, packed_weight)]);
mov(reg_ptr_wgt, param_y);
size_t z_offset = 0;
size_t wgt_offset = 0;
for (size_t g = 0; g < groups.size(); ++g) {
size_t x_offset = 0;
for (int k = 0; k < k_; ++k) {
vbroadcastss(zmm_t(x_reg_idx), ptr[param_x + x_offset]);
// zero the accumulators at the start of each group
if (k == 0) {
for (int i = 0; i < groups[g]; ++i) {
vxorps(zmm_t(i), zmm_t(i), zmm_t(i));
}
}
for (int i = 0; i < groups[g]; ++i) {
vmovups(zmm_t(w_reg_idx), ptr[reg_ptr_wgt + wgt_offset]);
vfmadd231ps(zmm_t(i), zmm_t(w_reg_idx), zmm_t(x_reg_idx));
wgt_offset += block_len;
}
// after the last k, store the accumulators to z
if (k == k_ - 1) {
for (int i = 0; i < groups[g]; ++i) {
// a partially filled last register (rest != 0) is stored by the tail code
if (rest != 0 && g == groups.size() - 1 && i == groups[g] - 1) {
break;
}
vmovups(ptr[param_z + z_offset + i * block_len], zmm_t(i));
}
}
x_offset += sizeof(float);
}
z_offset += block_len * groups[g];
}
if (rest != 0) {
// TODO: refine the tail store below with a mask register
int reg_idx = groups.back() - 1;
z_offset = (n_ - rest) * sizeof(float);
int inner_block = 8;
while (rest > 0) {
if (rest >= 8) {
inner_block = 8;
vmovups(ptr[param_z + z_offset], ymm_t(reg_idx));
// TODO: shift the register by inner_block lanes (or advance reg_idx)
// before the next store; every store currently writes the low lanes
} else if (rest >= 4) {
inner_block = 4;
vmovups(ptr[param_z + z_offset], xmm_t(reg_idx));
} else if (rest >= 2) {
inner_block = 2;
vmovq(ptr[param_z + z_offset], xmm_t(reg_idx));
} else {
inner_block = 1;
vmovss(ptr[param_z + z_offset], xmm_t(reg_idx));
}
z_offset += inner_block * sizeof(float);
rest -= inner_block;
}
}
postCode();
}
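For `m == 1`, `genCode` walks the weight buffer strictly sequentially: per register group it zeroes the accumulators, then for every `k` broadcasts `x[k]` and multiply-adds it with the next `groups[g] * block` weights. That order matches the packed layout written by `pack_weights`. The scalar model below is a sketch of that loop, not the generated code; `Pack`, `PackedMatVec` and `MaxErrorVsNaive` are hypothetical helpers, and the group list is passed explicitly instead of being taken from the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Packs row-major W (k x n): per group, k rows of groups[g] * block floats,
// zero-padded past column n, mirroring pack_weights.
std::vector<float> Pack(const std::vector<float>& w, int n, int k,
                        const std::vector<int>& groups, int block) {
  std::vector<float> packed;
  int col0 = 0;
  for (int g : groups) {
    for (int j = 0; j < k; ++j)
      for (int c = 0; c < g * block; ++c)
        packed.push_back(col0 + c < n ? w[j * n + col0 + c] : 0.f);
    col0 += g * block;
  }
  return packed;
}

// Scalar model of the generated m == 1 kernel.
std::vector<float> PackedMatVec(const std::vector<float>& x,
                                const std::vector<float>& packed, int n, int k,
                                const std::vector<int>& groups, int block) {
  std::vector<float> z(n, 0.f);
  size_t wgt = 0;
  int col0 = 0;
  for (int g : groups) {
    std::vector<float> acc(static_cast<size_t>(g) * block, 0.f);  // registers
    for (int j = 0; j < k; ++j)  // broadcast x[j], FMA with next weights
      for (int c = 0; c < g * block; ++c) acc[c] += x[j] * packed[wgt++];
    for (int c = 0; c < g * block && col0 + c < n; ++c) z[col0 + c] = acc[c];
    col0 += g * block;
  }
  return z;
}

// Largest |PackedMatVec - naive x * W| over all n outputs.
float MaxErrorVsNaive(const std::vector<float>& x, const std::vector<float>& w,
                      int n, int k, const std::vector<int>& groups, int block) {
  std::vector<float> z =
      PackedMatVec(x, Pack(w, n, k, groups, block), n, k, groups, block);
  float err = 0.f;
  for (int c = 0; c < n; ++c) {
    float ref = 0.f;
    for (int j = 0; j < k; ++j) ref += x[j] * w[j * n + c];
    err = std::max(err, std::abs(z[c] - ref));
  }
  return err;
}
```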
class MatMulCreator : public JitCodeCreator<matmul_attr_t> {
public:
bool UseMe(const matmul_attr_t& attr) const override {
return attr.m == 1 && platform::MayIUse(platform::avx512f) &&
attr.n % ZMM_FLOAT_BLOCK == 0 && attr.k < 512;
}
size_t CodeSize(const matmul_attr_t& attr) const override {
int block = YMM_FLOAT_BLOCK;
if (platform::MayIUse(platform::avx512f)) {
block = ZMM_FLOAT_BLOCK;
}
return 96 + 4 * attr.k * (attr.n / block + 1) * 8;
}
std::unique_ptr<GenBase> CreateJitCode(
const matmul_attr_t& attr) const override {
PADDLE_ENFORCE_GT(attr.m, 0);
PADDLE_ENFORCE_GT(attr.n, 0);
PADDLE_ENFORCE_GT(attr.k, 0);
return make_unique<MatMulJitCode>(attr, CodeSize(attr));
}
};
} // namespace gen
} // namespace jit
} // namespace operators
} // namespace paddle
namespace gen = paddle::operators::jit::gen;
REGISTER_JITKERNEL_GEN(kMatMul, gen::MatMulCreator);
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License. */
#pragma once
#include <stdlib.h> // for malloc and free
#include <string>
#include <vector>
#include "glog/logging.h"
#include "paddle/fluid/operators/jit/gen/jitcode.h"
#include "paddle/fluid/platform/enforce.h"
namespace paddle {
namespace operators {
namespace jit {
namespace gen {
class MatMulJitCode : public JitCode {
public:
explicit MatMulJitCode(const matmul_attr_t& attr,
size_t code_size = 256 * 1024,
void* code_ptr = nullptr)
: JitCode(code_size, code_ptr), m_(attr.m), n_(attr.n), k_(attr.k) {
PADDLE_ENFORCE_EQ(m_, 1, "Only m == 1 is supported for now");
this->genCode();
}
virtual const char* name() const {
// Build the name in a member so the returned pointer stays valid after
// this call; c_str() of a local std::string would dangle.
name_ = "MatMulJitCode_M" + std::to_string(m_) + "_N" + std::to_string(n_) +
"_K" + std::to_string(k_);
return name_.c_str();
}
void genCode() override;
private:
int m_, n_, k_;
mutable std::string name_;
reg64_t param_x{abi_param1};
reg64_t param_y{abi_param2};
reg64_t param_z{abi_param3};
reg64_t param_attr{abi_param4};
reg64_t reg_tmp{rax};
reg64_t reg_ptr_wgt{r10};
};
} // namespace gen
} // namespace jit
} // namespace operators
} // namespace paddle
...@@ -16,6 +16,8 @@ ...@@ -16,6 +16,8 @@
#include <fstream> #include <fstream>
#include <iostream> #include <iostream>
#include <sstream> #include <sstream>
#include <vector>
#include "paddle/fluid/platform/cpu_info.h"
DEFINE_bool(dump_jitcode, false, "Whether to dump the jitcode to file"); DEFINE_bool(dump_jitcode, false, "Whether to dump the jitcode to file");
...@@ -38,6 +40,35 @@ void GenBase::dumpCode(const unsigned char* code) const { ...@@ -38,6 +40,35 @@ void GenBase::dumpCode(const unsigned char* code) const {
} }
} }
std::vector<int> packed_groups(int n, int k, int* block_out, int* rest_out) {
int block;
int max_num_regs;
if (platform::MayIUse(platform::avx512f)) {
block = ZMM_FLOAT_BLOCK;
max_num_regs = 32;
} else {
block = YMM_FLOAT_BLOCK;
max_num_regs = 16;
}
// one for x, one for y, others for z
const int max_used_regs_for_n = max_num_regs - 2;
const int aligned_n = n % block == 0 ? n : (n / block + 1) * block;
const int num_block = aligned_n / block;
const int num_groups = num_block / max_used_regs_for_n;
std::vector<int> groups(num_groups, max_used_regs_for_n);
int rest_num_regs = num_block % max_used_regs_for_n;
if (rest_num_regs != 0) {
groups.push_back(rest_num_regs);
}
if (block_out) {
*block_out = block;
}
if (rest_out) {
*rest_out = n % block;
}
return groups;
}
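`packed_groups` splits `ceil(n / block)` register-wide blocks into groups of at most `max_num_regs - 2` registers (one register is reserved for the broadcast `x`, one for the weights), and reports `n % block` as the tail. A standalone sketch with the CPU query replaced by explicit parameters (`PackedGroups` is a hypothetical name; the `k` argument of the original is unused and omitted here). For AVX-512 (`block = 16`, 32 registers) and `n = 1000` this gives 63 blocks, grouped as `{30, 30, 3}`, with a tail of 8 floats.

```cpp
#include <cassert>
#include <vector>

// block = 16, max_num_regs = 32 for AVX-512; block = 8, 16 regs for AVX.
std::vector<int> PackedGroups(int n, int block, int max_num_regs,
                              int* rest_out) {
  const int max_used_regs_for_n = max_num_regs - 2;  // one for x, one for w
  const int num_block = (n + block - 1) / block;     // ceil(n / block)
  std::vector<int> groups(num_block / max_used_regs_for_n, max_used_regs_for_n);
  if (num_block % max_used_regs_for_n != 0) {
    groups.push_back(num_block % max_used_regs_for_n);
  }
  if (rest_out) *rest_out = n % block;
  return groups;
}
```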
} // namespace jit } // namespace jit
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
...@@ -16,6 +16,7 @@ ...@@ -16,6 +16,7 @@
#include <gflags/gflags.h> #include <gflags/gflags.h>
#include <memory> // for unique_ptr #include <memory> // for unique_ptr
#include <vector>
#include "paddle/fluid/operators/jit/kernel_base.h" #include "paddle/fluid/operators/jit/kernel_base.h"
DECLARE_bool(dump_jitcode); DECLARE_bool(dump_jitcode);
...@@ -67,6 +68,11 @@ class JitCodeCreator : public GenCreator { ...@@ -67,6 +68,11 @@ class JitCodeCreator : public GenCreator {
virtual std::unique_ptr<GenBase> CreateJitCode(const Attr& attr) const = 0; virtual std::unique_ptr<GenBase> CreateJitCode(const Attr& attr) const = 0;
}; };
// unified helper for packed groups: returns the register groups used for the
// packed weights and outputs the block size and the rest (tail) size
std::vector<int> packed_groups(int n, int k, int* block = nullptr,
int* rest = nullptr);
} // namespace jit } // namespace jit
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
...@@ -14,6 +14,8 @@ ...@@ -14,6 +14,8 @@
#include "paddle/fluid/operators/jit/helper.h" #include "paddle/fluid/operators/jit/helper.h"
#include <algorithm> // tolower #include <algorithm> // tolower
#include <cstring>  // memcpy, memset
#include <numeric>
#include <string>
#include "paddle/fluid/platform/enforce.h" #include "paddle/fluid/platform/enforce.h"
namespace paddle { namespace paddle {
...@@ -91,6 +93,41 @@ KernelType to_kerneltype(const std::string& act) { ...@@ -91,6 +93,41 @@ KernelType to_kerneltype(const std::string& act) {
return kNone; return kNone;
} }
template <>
void pack_weights<float>(const float* src, float* dst, int n, int k) {
int block, rest;
const auto groups = packed_groups(n, k, &block, &rest);
std::for_each(groups.begin(), groups.end(), [&](int i) {
PADDLE_ENFORCE_GT(i, 0, "each element of groups should be larger than 0.");
});
int sum = std::accumulate(groups.begin(), groups.end(), 0);
std::memset(dst, 0, k * sum * block * sizeof(float));
PADDLE_ENFORCE_GE(sum * block, n,
"The packed n should be equal to or larger than n");
const int block_len = sizeof(float) * block;
int n_offset = 0;
for (size_t g = 0; g < groups.size(); ++g) {
const float* from = src + n_offset;
for (int j = 0; j < k; ++j) {
size_t copy_sz = groups[g] * block_len;
if (g == groups.size() - 1 && rest != 0) {
copy_sz = (groups[g] - 1) * block_len + rest * sizeof(float);
}
std::memcpy(dst, from + j * n, copy_sz);
dst += groups[g] * block;
}
n_offset += groups[g] * block;
}
}
template <typename T>
typename std::enable_if<!std::is_same<T, float>::value>::type pack_weights(
const T* src, T* dst, int n, int k) {
PADDLE_THROW("Only support pack with float type.");
}
} // namespace jit } // namespace jit
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
...@@ -118,26 +118,33 @@ typename KernelTuples::func_type Get( ...@@ -118,26 +118,33 @@ typename KernelTuples::func_type Get(
return GetRefer<KT, KernelTuples>(); return GetRefer<KT, KernelTuples>();
} }
template <KernelType KT, typename KernelTuples> template <KernelType KT, typename KernelTuples, typename PlaceType>
class KernelFuncsCache { class KernelFuncs {
public: public:
KernelFuncsCache() = default; KernelFuncs() = default;
static KernelFuncsCache& Instance() { static KernelFuncs& Cache() {
static thread_local KernelFuncsCache<KT, KernelTuples> g_func_cache; static thread_local KernelFuncs<KT, KernelTuples, PlaceType> g_func_cache;
return g_func_cache; return g_func_cache;
} }
bool Has(int key) const { return funcs_.find(key) != funcs_.end(); } bool Has(int key) const { return funcs_.find(key) != funcs_.end(); }
typename KernelTuples::func_type At(int key) { return funcs_.at(key); }
void Insert(int key, typename KernelTuples::func_type func) { void Insert(int key, typename KernelTuples::func_type func) {
funcs_.emplace(key, func); funcs_.emplace(key, func);
} }
typename KernelTuples::func_type At(int key) {
if (Has(key)) {
return funcs_.at(key);
}
auto func = Get<KT, KernelTuples, PlaceType>(key);
Insert(key, func);
return func;
}
private: private:
std::unordered_map<int, typename KernelTuples::func_type> funcs_; std::unordered_map<int, typename KernelTuples::func_type> funcs_;
DISABLE_COPY_AND_ASSIGN(KernelFuncsCache); DISABLE_COPY_AND_ASSIGN(KernelFuncs);
}; };
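With this change `KernelFuncs::At` memoizes: on a miss it calls `Get` once and caches the result in a per-thread singleton returned by `Cache()`. The generic sketch below shows the same pattern; `Generate`, `Twice` and `FuncCache` are hypothetical stand-ins, not Paddle APIs.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for jit::Get that counts generations.
static int g_generated = 0;
using Func = int (*)(int);
int Twice(int v) { return 2 * v; }
Func Generate(int /*key*/) {
  ++g_generated;
  return &Twice;
}

// Per-thread memoizing cache in the style of KernelFuncs::Cache()/At().
class FuncCache {
 public:
  static FuncCache& Cache() {
    static thread_local FuncCache cache;  // one instance per thread
    return cache;
  }
  Func At(int key) {
    auto it = funcs_.find(key);
    if (it != funcs_.end()) return it->second;  // hit: reuse
    Func f = Generate(key);                       // miss: generate once
    funcs_.emplace(key, f);
    return f;
  }
  FuncCache(const FuncCache&) = delete;
  FuncCache& operator=(const FuncCache&) = delete;

 private:
  FuncCache() = default;
  std::unordered_map<int, Func> funcs_;
};
```

Because the cache is `thread_local`, no locking is needed, at the cost of each thread generating its own copy of a kernel on first use.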
const char* to_string(KernelType kt); const char* to_string(KernelType kt);
...@@ -152,17 +159,28 @@ inline std::ostream& operator<<(std::ostream& os, const lstm_attr_t& attr) { ...@@ -152,17 +159,28 @@ inline std::ostream& operator<<(std::ostream& os, const lstm_attr_t& attr) {
<< (attr.use_peephole ? "True" : "False") << "]"; << (attr.use_peephole ? "True" : "False") << "]";
return os; return os;
} }
inline std::ostream& operator<<(std::ostream& os, const gru_attr_t& attr) { inline std::ostream& operator<<(std::ostream& os, const gru_attr_t& attr) {
os << "dim_size[" << attr.d << "],act_gate[" << to_string(attr.act_gate) os << "dim_size[" << attr.d << "],act_gate[" << to_string(attr.act_gate)
<< "],act_cand[" << to_string(attr.act_cand) << "]"; << "],act_cand[" << to_string(attr.act_cand) << "]";
return os; return os;
} }
inline std::ostream& operator<<(std::ostream& os, const seq_pool_attr_t& attr) { inline std::ostream& operator<<(std::ostream& os, const seq_pool_attr_t& attr) {
os << "height_size[" << attr.h << "],width_size[" << attr.w << "],pool_type[" os << "height_size[" << attr.h << "],width_size[" << attr.w << "],pool_type["
<< to_string(attr.type) << "]"; << to_string(attr.type) << "]";
return os; return os;
} }
inline std::ostream& operator<<(std::ostream& os, const matmul_attr_t& attr) {
os << "M[" << attr.m << "],N[" << attr.n << "],K[" << attr.k << "]";
return os;
}
// expose the method to pack matmul weight
template <typename T>
void pack_weights(const T* src, T* dst, int n, int k);
} // namespace jit } // namespace jit
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
...@@ -145,11 +145,19 @@ struct SeqPoolTuples { ...@@ -145,11 +145,19 @@ struct SeqPoolTuples {
typedef void (*func_type)(const T*, T*, const seq_pool_attr_t*); typedef void (*func_type)(const T*, T*, const seq_pool_attr_t*);
}; };
typedef struct matmul_attr_s {
int m, n, k;
void* packed_weight{nullptr};
matmul_attr_s() = default;
explicit matmul_attr_s(int m_, int n_, int k_, void* packed_weight_ = nullptr)
: m(m_), n(n_), k(k_), packed_weight(packed_weight_) {}
} matmul_attr_t;
template <typename T> template <typename T>
struct MatMulTuples { struct MatMulTuples {
typedef T data_type; typedef T data_type;
typedef int attr_type; typedef matmul_attr_t attr_type;
typedef void (*func_type)(const T*, const T*, T*, int, int, int); typedef void (*func_type)(const T*, const T*, T*, const matmul_attr_t*);
}; };
template <typename T> template <typename T>
......
...@@ -49,6 +49,13 @@ size_t JitCodeKey<seq_pool_attr_t>(const seq_pool_attr_t& attr) { ...@@ -49,6 +49,13 @@ size_t JitCodeKey<seq_pool_attr_t>(const seq_pool_attr_t& attr) {
return (key << pool_type_shift) + static_cast<int>(attr.type); return (key << pool_type_shift) + static_cast<int>(attr.type);
} }
template <>
size_t JitCodeKey<matmul_attr_t>(const matmul_attr_t& attr) {
size_t key = attr.m;
constexpr int shift = 21;
return (key << shift * 2) + ((static_cast<size_t>(attr.n)) << shift) + attr.k;
}
} // namespace jit } // namespace jit
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
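`JitCodeKey<matmul_attr_t>` packs the three GEMM dimensions into one `size_t`. It shifts `m` left by 42 bits and `n` by 21 bits, and `k` occupies the low 21 bits. The sketch below reproduces that layout with an explicit 64-bit type and adds a decoder. It shows that the key is collision-free only while each dimension stays below 2^21 (2,097,152). `MatMulKey` and `DecodeMatMulKey` are hypothetical helpers, not part of the jit module. The sketch also assumes 64-bit keys, because shifting a 32-bit `size_t` by 42 bits would be undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kShift = 21;
constexpr std::uint64_t kFieldMask = (std::uint64_t{1} << kShift) - 1;

// Same layout as JitCodeKey<matmul_attr_t>: [ m | n | k ], 21 bits each.
// Because the fields do not overlap, OR gives the same result as the
// original's addition.
std::uint64_t MatMulKey(int m, int n, int k) {
  assert(m >= 0 && n >= 0 && k >= 0);
  assert(static_cast<std::uint64_t>(m) <= kFieldMask &&
         static_cast<std::uint64_t>(n) <= kFieldMask &&
         static_cast<std::uint64_t>(k) <= kFieldMask);
  return (static_cast<std::uint64_t>(m) << (kShift * 2)) |
         (static_cast<std::uint64_t>(n) << kShift) |
         static_cast<std::uint64_t>(k);
}

// Recovers the dimensions, demonstrating that distinct in-range triples
// map to distinct keys.
void DecodeMatMulKey(std::uint64_t key, int* m, int* n, int* k) {
  *k = static_cast<int>(key & kFieldMask);
  *n = static_cast<int>((key >> kShift) & kFieldMask);
  *m = static_cast<int>(key >> (kShift * 2));
}
```

The original has no range check. There, `m=0, n=0, k=2^21` and `m=0, n=1, k=0` produce the same key.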
...@@ -49,49 +49,16 @@ void VTanh(const T* x, T* y, int n) { ...@@ -49,49 +49,16 @@ void VTanh(const T* x, T* y, int n) {
} }
void Softmax(const T* x, T* y, int n, int bs) { void Softmax(const T* x, T* y, int n, int bs) {
typename XRNTuples<T>::func_type compute_hmax{nullptr}; auto compute_hmax =
typename XRNTuples<T>::func_type compute_hsum{nullptr}; KernelFuncs<kHMax, XRNTuples<T>, platform::CPUPlace>::Cache().At(n);
typename AXYNTuples<T>::func_type compute_vscal{nullptr}; auto compute_hsum =
typename AXYNTuples<T>::func_type compute_vaddbias{nullptr}; KernelFuncs<kHSum, XRNTuples<T>, platform::CPUPlace>::Cache().At(n);
typename XYNTuples<T>::func_type compute_vexp{nullptr}; auto compute_vscal =
KernelFuncs<kVScal, AXYNTuples<T>, platform::CPUPlace>::Cache().At(n);
if (!KernelFuncsCache<kHMax, XRNTuples<T>>::Instance().Has(n)) { auto compute_vaddbias =
compute_hmax = Get<kHMax, XRNTuples<T>, platform::CPUPlace>(n); KernelFuncs<kVAddBias, AXYNTuples<T>, platform::CPUPlace>::Cache().At(n);
KernelFuncsCache<kHMax, XRNTuples<T>>::Instance().Insert(n, compute_hmax); auto compute_vexp =
} else { KernelFuncs<kVExp, XYNTuples<T>, platform::CPUPlace>::Cache().At(n);
compute_hmax = KernelFuncsCache<kHMax, XRNTuples<T>>::Instance().At(n);
}
if (!KernelFuncsCache<kHSum, XRNTuples<T>>::Instance().Has(n)) {
compute_hsum = Get<kHSum, XRNTuples<T>, platform::CPUPlace>(n);
KernelFuncsCache<kHSum, XRNTuples<T>>::Instance().Insert(n, compute_hsum);
} else {
compute_hsum = KernelFuncsCache<kHSum, XRNTuples<T>>::Instance().At(n);
}
if (!KernelFuncsCache<kVScal, AXYNTuples<T>>::Instance().Has(n)) {
compute_vscal = Get<kVScal, AXYNTuples<T>, platform::CPUPlace>(n);
KernelFuncsCache<kVScal, AXYNTuples<T>>::Instance().Insert(n,
compute_vscal);
} else {
compute_vscal = KernelFuncsCache<kVScal, AXYNTuples<T>>::Instance().At(n);
}
if (!KernelFuncsCache<kVAddBias, AXYNTuples<T>>::Instance().Has(n)) {
compute_vaddbias = Get<kVAddBias, AXYNTuples<T>, platform::CPUPlace>(n);
KernelFuncsCache<kVAddBias, AXYNTuples<T>>::Instance().Insert(
n, compute_vaddbias);
} else {
compute_vaddbias =
KernelFuncsCache<kVAddBias, AXYNTuples<T>>::Instance().At(n);
}
if (!KernelFuncsCache<kVExp, XYNTuples<T>>::Instance().Has(n)) {
compute_vexp = Get<KernelType::kVExp, XYNTuples<T>, platform::CPUPlace>(n);
KernelFuncsCache<kVExp, XYNTuples<T>>::Instance().Insert(n, compute_vexp);
} else {
compute_vexp = KernelFuncsCache<kVExp, XYNTuples<T>>::Instance().At(n);
}
for (int i = 0; i < bs; ++i) { for (int i = 0; i < bs; ++i) {
T scalar; T scalar;
......
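`Softmax` now fetches its five helper kernels (`kHMax`, `kHSum`, `kVScal`, `kVAddBias`, `kVExp`) through the cache, replacing five hand-written has/insert branches. The per-row loop is elided in this view. The following is therefore only a plain-loop sketch of how such primitives are usually combined into a numerically stable row-wise softmax, assuming this ordering: subtract the row maximum, exponentiate, sum, then scale by the reciprocal of the sum. `SoftmaxSketch` is a hypothetical name, not the jit kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Row-wise softmax of a row-major bs x n matrix, written as the five primitive
// steps whose kernels Softmax() fetches. Subtracting the row maximum first
// keeps std::exp from overflowing for large inputs.
template <typename T>
void SoftmaxSketch(const T* x, T* y, int n, int bs) {
  assert(n > 0);
  for (int i = 0; i < bs; ++i) {
    const T* in = x + i * n;
    T* out = y + i * n;
    const T row_max = *std::max_element(in, in + n);         // like kHMax
    for (int j = 0; j < n; ++j) out[j] = in[j] - row_max;    // like kVAddBias(-max)
    for (int j = 0; j < n; ++j) out[j] = std::exp(out[j]);   // like kVExp
    T sum = static_cast<T>(0);
    for (int j = 0; j < n; ++j) sum += out[j];               // like kHSum
    const T inv = static_cast<T>(1) / sum;
    for (int j = 0; j < n; ++j) out[j] *= inv;               // like kVScal
  }
}
```

A row such as `{1000, 1000}` would overflow `exp` without the max subtraction. With it, the row yields `{0.5, 0.5}`.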
...@@ -25,17 +25,19 @@ namespace more { ...@@ -25,17 +25,19 @@ namespace more {
namespace mkl { namespace mkl {
template <> template <>
void MatMul<float>(const float* a, const float* b, float* c, int m, int n, void MatMul<float>(const float* a, const float* b, float* c,
int k) { const matmul_attr_t* attr) {
platform::dynload::cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, platform::dynload::cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
n, k, 1.f, a, k, b, n, 0.f, c, n); attr->m, attr->n, attr->k, 1.f, a, attr->k, b,
attr->n, 0.f, c, attr->n);
} }
template <> template <>
void MatMul<double>(const double* a, const double* b, double* c, int m, int n, void MatMul<double>(const double* a, const double* b, double* c,
int k) { const matmul_attr_t* attr) {
platform::dynload::cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, platform::dynload::cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
n, k, 1.0, a, k, b, n, 0.0, c, n); attr->m, attr->n, attr->k, 1.0, a, attr->k, b,
attr->n, 0.0, c, attr->n);
} }
template <> template <>
...@@ -127,11 +129,6 @@ void ASum<double>(const double* x, double* res, int n) { ...@@ -127,11 +129,6 @@ void ASum<double>(const double* x, double* res, int n) {
} }
// TODO(TJ): tuning me carefully on AVX, AVX2 and AVX512 // TODO(TJ): tuning me carefully on AVX, AVX2 and AVX512
template <>
bool MatMulKernel<float>::UseMe(const int& d) const {
return platform::MayIUse(platform::avx);
}
template <> template <>
bool VMulKernel<float>::UseMe(const int& d) const { bool VMulKernel<float>::UseMe(const int& d) const {
return platform::MayIUse(platform::avx512f) && d > 512; return platform::MayIUse(platform::avx512f) && d > 512;
...@@ -139,7 +136,7 @@ bool VMulKernel<float>::UseMe(const int& d) const { ...@@ -139,7 +136,7 @@ bool VMulKernel<float>::UseMe(const int& d) const {
template <> template <>
bool VAddKernel<float>::UseMe(const int& d) const { bool VAddKernel<float>::UseMe(const int& d) const {
return platform::MayIUse(platform::avx512f) && d > 512; return platform::MayIUse(platform::avx) && d > 512;
} }
template <> template <>
...@@ -177,6 +174,16 @@ bool SeqPoolKernel<double>::UseMe(const seq_pool_attr_t& attr) const { ...@@ -177,6 +174,16 @@ bool SeqPoolKernel<double>::UseMe(const seq_pool_attr_t& attr) const {
return true; return true;
} }
template <>
bool MatMulKernel<float>::UseMe(const matmul_attr_t& attr) const {
return platform::MayIUse(platform::avx);
}
template <>
bool MatMulKernel<double>::UseMe(const matmul_attr_t& attr) const {
return true;
}
template <> template <>
bool SoftmaxKernel<float>::UseMe(const int& d) const { bool SoftmaxKernel<float>::UseMe(const int& d) const {
// tuned on avx2 // tuned on avx2
...@@ -189,7 +196,6 @@ bool SoftmaxKernel<float>::UseMe(const int& d) const { ...@@ -189,7 +196,6 @@ bool SoftmaxKernel<float>::UseMe(const int& d) const {
return true; \ return true; \
} }
AWALYS_USE_ME_WITH_DOUBLE(MatMul);
AWALYS_USE_ME_WITH_DOUBLE(VMul); AWALYS_USE_ME_WITH_DOUBLE(VMul);
AWALYS_USE_ME_WITH_DOUBLE(VAdd); AWALYS_USE_ME_WITH_DOUBLE(VAdd);
AWALYS_USE_ME_WITH_DOUBLE(VScal); AWALYS_USE_ME_WITH_DOUBLE(VScal);
......
...@@ -26,7 +26,7 @@ namespace more { ...@@ -26,7 +26,7 @@ namespace more {
namespace mkl { namespace mkl {
template <typename T> template <typename T>
void MatMul(const T* a, const T* b, T* c, int m, int n, int k); void MatMul(const T* a, const T* b, T* c, const matmul_attr_t* attr);
template <typename T> template <typename T>
void VMul(const T* x, const T* y, T* z, int n); void VMul(const T* x, const T* y, T* z, int n);
......
...@@ -363,17 +363,19 @@ void SeqPool(const T* x, T* y, const seq_pool_attr_t* attr) { ...@@ -363,17 +363,19 @@ void SeqPool(const T* x, T* y, const seq_pool_attr_t* attr) {
// A(M,K) * B(K,N) = C(M,N) // A(M,K) * B(K,N) = C(M,N)
template <typename T> template <typename T>
void MatMul(const T* A, const T* B, T* C, int M, int N, int K) { void MatMul(const T* A, const T* B, T* C, const matmul_attr_t* attr) {
int M = attr->m;
int N = attr->n;
int K = attr->k;
for (int m = 0; m < M; ++m) { for (int m = 0; m < M; ++m) {
const T* pa = A + m * K; const T* pa = A + m * K;
T* pc = C + m * N; T* pc = C + m * N;
for (int n = 0; n < N; ++n) { for (int n = 0; n < N; ++n) {
const T* pb = B + n; const T* pb = B + n;
T sum = static_cast<T>(0); pc[n] = pa[0] * pb[0];
for (int k = 0; k < K; ++k) { for (int k = 1; k < K; ++k) {
sum += (pa[k] * pb[k * N]); pc[n] += pa[k] * pb[k * N];
} }
*(pc + n) = sum;
} }
} }
} }
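The reference kernel computes `C(M,N) = A(M,K) * B(K,N)` with all three matrices row-major and densely packed. The leading dimensions are therefore `K` for `A` and `N` for `B` and `C`, the same `lda`, `ldb` and `ldc` values that the MKL specializations pass to `cblas_sgemm`/`cblas_dgemm`. The standalone sketch below uses a hypothetical `MatMulAttr` struct in place of `jit::matmul_attr_t`. It starts from a zero accumulator, so it stays in bounds when `K == 0`. The refactored loop does not, because it seeds `pc[n]` with `pa[0] * pb[0]`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for jit::matmul_attr_t (dimensions only).
struct MatMulAttr {
  int m, n, k;
};

// C(M,N) = A(M,K) * B(K,N), all row-major and densely packed.
template <typename T>
void RefMatMul(const T* a, const T* b, T* c, const MatMulAttr* attr) {
  const int M = attr->m, N = attr->n, K = attr->k;
  for (int i = 0; i < M; ++i) {
    const T* pa = a + i * K;  // row i of A (leading dimension K)
    T* pc = c + i * N;        // row i of C (leading dimension N)
    for (int j = 0; j < N; ++j) {
      T sum = static_cast<T>(0);  // zero start also covers K == 0
      for (int p = 0; p < K; ++p) sum += pa[p] * b[p * N + j];
      pc[j] = sum;
    }
  }
}
```

For example, multiplying the 2x3 matrix `{1,2,3,4,5,6}` by the 3x2 matrix `{7,8,9,10,11,12}` gives `{58,64,139,154}`.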
......
...@@ -22,7 +22,7 @@ ...@@ -22,7 +22,7 @@
#include "paddle/fluid/platform/cpu_info.h" #include "paddle/fluid/platform/cpu_info.h"
#include "paddle/fluid/platform/place.h" #include "paddle/fluid/platform/place.h"
static double acc = 1e-5; DEFINE_double(acc, 1e-5, "Test accuracy threshold.");
template <typename T> template <typename T>
void RandomVec(const int n, T* a, const T lower = static_cast<T>(-20.f), void RandomVec(const int n, T* a, const T lower = static_cast<T>(-20.f),
...@@ -39,7 +39,7 @@ template <typename T> ...@@ -39,7 +39,7 @@ template <typename T>
void ExpectEQ(const T* target, const T* refer, int n) { void ExpectEQ(const T* target, const T* refer, int n) {
if (std::is_floating_point<T>::value) { if (std::is_floating_point<T>::value) {
for (int i = 0; i < n; ++i) { for (int i = 0; i < n; ++i) {
EXPECT_NEAR(target[i], refer[i], acc); EXPECT_NEAR(target[i], refer[i], FLAGS_acc);
} }
} else { } else {
for (int i = 0; i < n; ++i) { for (int i = 0; i < n; ++i) {
...@@ -272,21 +272,23 @@ struct TestFuncWithRefer<jit::SeqPoolTuples<T>, std::vector<T>, std::vector<T>, ...@@ -272,21 +272,23 @@ struct TestFuncWithRefer<jit::SeqPoolTuples<T>, std::vector<T>, std::vector<T>,
template <typename T> template <typename T>
struct TestFuncWithRefer<jit::MatMulTuples<T>, std::vector<T>, std::vector<T>, struct TestFuncWithRefer<jit::MatMulTuples<T>, std::vector<T>, std::vector<T>,
std::vector<T>, int, int, int> { std::vector<T>,
typename jit::MatMulTuples<T>::attr_type> {
void operator()(const typename jit::MatMulTuples<T>::func_type tgt, void operator()(const typename jit::MatMulTuples<T>::func_type tgt,
const std::vector<T>& a, const std::vector<T>& b, const std::vector<T>& a, const std::vector<T>& b,
const std::vector<T>& cref, int m, int n, int k) { const std::vector<T>& cref,
const typename jit::MatMulTuples<T>::attr_type& attr) {
EXPECT_TRUE(tgt != nullptr); EXPECT_TRUE(tgt != nullptr);
EXPECT_EQ(a.size(), static_cast<size_t>(m * k)); EXPECT_EQ(a.size(), static_cast<size_t>(attr.m * attr.k));
EXPECT_EQ(b.size(), static_cast<size_t>(k * n)); EXPECT_EQ(b.size(), static_cast<size_t>(attr.k * attr.n));
EXPECT_EQ(cref.size(), static_cast<size_t>(m * n)); EXPECT_EQ(cref.size(), static_cast<size_t>(attr.m * attr.n));
std::vector<T> c(cref.size()); std::vector<T> c(cref.size());
const T* a_data = a.data(); const T* a_data = a.data();
const T* b_data = b.data(); const T* b_data = b.data();
const T* cref_data = cref.data(); const T* cref_data = cref.data();
T* c_data = c.data(); T* c_data = c.data();
tgt(a_data, b_data, c_data, m, n, k); tgt(a_data, b_data, c_data, &attr);
ExpectEQ<T>(c_data, cref_data, m * n); ExpectEQ<T>(c_data, cref_data, attr.m * attr.n);
} }
}; };
...@@ -383,8 +385,8 @@ void TestAXYNKernel() { ...@@ -383,8 +385,8 @@ void TestAXYNKernel() {
template <jit::KernelType KT, typename T, typename PlaceType> template <jit::KernelType KT, typename T, typename PlaceType>
void TestXRNKernel() { void TestXRNKernel() {
VLOG(10) << "===== Test JITKernel " << jit::to_string(KT); VLOG(10) << "===== Test JITKernel " << jit::to_string(KT);
auto last_acc = acc; auto last_acc = FLAGS_acc;
acc = 1e-4; FLAGS_acc = 1e-4;
for (int d : TestSizes()) { for (int d : TestSizes()) {
auto ref = jit::GetRefer<KT, jit::XRNTuples<T>>(); auto ref = jit::GetRefer<KT, jit::XRNTuples<T>>();
EXPECT_TRUE(ref != nullptr); EXPECT_TRUE(ref != nullptr);
...@@ -395,7 +397,7 @@ void TestXRNKernel() { ...@@ -395,7 +397,7 @@ void TestXRNKernel() {
TestAllImpls<KT, jit::XRNTuples<T>, PlaceType, std::vector<T>, T>(d, x, TestAllImpls<KT, jit::XRNTuples<T>, PlaceType, std::vector<T>, T>(d, x,
ref_res); ref_res);
} }
acc = last_acc; FLAGS_acc = last_acc;
} }
template <jit::KernelType KT, typename T, typename PlaceType> template <jit::KernelType KT, typename T, typename PlaceType>
...@@ -535,9 +537,10 @@ void TestSeqPoolKernel() { ...@@ -535,9 +537,10 @@ void TestSeqPoolKernel() {
template <jit::KernelType KT, typename T, typename PlaceType> template <jit::KernelType KT, typename T, typename PlaceType>
void TestMatMulKernel() { void TestMatMulKernel() {
VLOG(10) << "===== Test JITKernel " << jit::to_string(KT); VLOG(10) << "===== Test JITKernel " << jit::to_string(KT);
auto last_acc = acc; auto last_acc = FLAGS_acc;
// TODO(intel): this should be acc issue of MKL // TODO(intel): fix MKL acc issue
acc = 1e-3; // https://github.com/PaddlePaddle/Paddle/issues/15447
FLAGS_acc = 1e-3;
for (int m : {1, 2, 3, 4}) { for (int m : {1, 2, 3, 4}) {
for (int n : {1, 2, 3, 4}) { for (int n : {1, 2, 3, 4}) {
for (int k : TestSizes()) { for (int k : TestSizes()) {
...@@ -549,13 +552,14 @@ void TestMatMulKernel() { ...@@ -549,13 +552,14 @@ void TestMatMulKernel() {
const T* a_data = a.data(); const T* a_data = a.data();
const T* b_data = b.data(); const T* b_data = b.data();
T* c_data = c.data(); T* c_data = c.data();
ref(a_data, b_data, c_data, m, n, k); const jit::matmul_attr_t attr{m, n, k};
ref(a_data, b_data, c_data, &attr);
TestAllImpls<KT, jit::MatMulTuples<T>, PlaceType, std::vector<T>, TestAllImpls<KT, jit::MatMulTuples<T>, PlaceType, std::vector<T>,
std::vector<T>, std::vector<T>>(k, a, b, c, m, n, k); std::vector<T>, std::vector<T>>(attr, a, b, c, attr);
} }
} }
} }
acc = last_acc; FLAGS_acc = last_acc;
} }
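Switching the tolerance from a `static double` to `DEFINE_double(acc, ...)` turns it into a gflags flag that the tests read as `FLAGS_acc`. It can then be set with `--acc` when the test binary parses its command line. `TestXRNKernel` and `TestMatMulKernel` still loosen the tolerance by hand and restore it at the end. That restore would be skipped if the body ever returned early or threw. One alternative is a small scope guard. The sketch below uses a plain global `g_acc` as a hypothetical stand-in for `FLAGS_acc`, and `ScopedValue` is not an existing Paddle or gflags utility.

```cpp
#include <cassert>

// Hypothetical stand-in for FLAGS_acc.
double g_acc = 1e-5;

// Overrides *target for the lifetime of the guard, then restores the old value.
template <typename T>
class ScopedValue {
 public:
  ScopedValue(T* target, T value) : target_(target), saved_(*target) {
    *target_ = value;
  }
  ~ScopedValue() { *target_ = saved_; }
  ScopedValue(const ScopedValue&) = delete;
  ScopedValue& operator=(const ScopedValue&) = delete;

 private:
  T* target_;
  T saved_;
};
```

With such a guard, the save/set/restore lines in `TestMatMulKernel` would collapse to a single `ScopedValue<double> loose(&FLAGS_acc, 1e-3);` at the top of the function.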
template <jit::KernelType KT, typename T, typename PlaceType> template <jit::KernelType KT, typename T, typename PlaceType>
......
...@@ -30,15 +30,17 @@ inline void FCCompute(const BlasT<DeviceContext, T>& blas, const int M, ...@@ -30,15 +30,17 @@ inline void FCCompute(const BlasT<DeviceContext, T>& blas, const int M,
return; return;
} }
if (relu) { if (relu) {
auto compute = auto compute = jit::KernelFuncs<jit::kVAddRelu, jit::XYZNTuples<T>,
jit::Get<jit::kVAddRelu, jit::XYZNTuples<T>, platform::CPUPlace>(N); platform::CPUPlace>::Cache()
.At(N);
for (int i = 0; i < M; i++) { for (int i = 0; i < M; i++) {
T* dst = Y + i * N; T* dst = Y + i * N;
compute(B, dst, dst, N); compute(B, dst, dst, N);
} }
} else { } else {
auto compute = auto compute = jit::KernelFuncs<jit::kVAdd, jit::XYZNTuples<T>,
jit::Get<jit::kVAdd, jit::XYZNTuples<T>, platform::CPUPlace>(N); platform::CPUPlace>::Cache()
.At(N);
#ifdef PADDLE_WITH_MKLML #ifdef PADDLE_WITH_MKLML
#pragma omp parallel for #pragma omp parallel for
#endif #endif
......
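After the GEMM, `FCCompute` adds the bias `B` to every row of the `M x N` output and fuses a ReLU when `relu` is set. It now fetches the `kVAddRelu`/`kVAdd` kernel once via `KernelFuncs<...>::Cache().At(N)`. The kernel is called as `compute(B, dst, dst, N)`, so it reads and writes the same row. That aliasing is safe for an element-wise operation. Below is a plain-loop sketch of the bias step; `AddBiasRows` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Adds a length-N bias to each of the M rows of the row-major M x N matrix Y
// in place, optionally followed by ReLU. Plain loops stand in for the JIT
// VAdd / VAddRelu kernels.
template <typename T>
void AddBiasRows(const T* bias, T* Y, int M, int N, bool relu) {
  for (int i = 0; i < M; ++i) {
    T* dst = Y + i * N;  // row i, updated in place like compute(B, dst, dst, N)
    for (int j = 0; j < N; ++j) {
      const T v = dst[j] + bias[j];
      dst[j] = relu ? std::max(v, static_cast<T>(0)) : v;
    }
  }
}
```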
...@@ -82,8 +82,9 @@ class SoftmaxFunctor<DeviceContext, float, true, enable_if_CPU<DeviceContext>> { ...@@ -82,8 +82,9 @@ class SoftmaxFunctor<DeviceContext, float, true, enable_if_CPU<DeviceContext>> {
const int kClassDim = 1; const int kClassDim = 1;
// 2D data. Batch x C // 2D data. Batch x C
auto compute_softmax = auto compute_softmax =
jit::Get<jit::kSoftmax, jit::SoftmaxTuples<float>, platform::CPUPlace>( jit::KernelFuncs<jit::kSoftmax, jit::SoftmaxTuples<float>,
in_dims[kClassDim]); platform::CPUPlace>::Cache()
.At(in_dims[kClassDim]);
compute_softmax(in_data, out_data, in_dims[kClassDim], in_dims[kBatchDim]); compute_softmax(in_data, out_data, in_dims[kClassDim], in_dims[kBatchDim]);
} }
}; };
......
...@@ -31,6 +31,8 @@ std::map<std::string, ...@@ -31,6 +31,8 @@ std::map<std::string,
std::shared_ptr<std::unordered_map< std::shared_ptr<std::unordered_map<
std::string, std::shared_ptr<ngraph::Node>>>)>> std::string, std::shared_ptr<ngraph::Node>>>)>>
NgraphBridge::NG_NODE_MAP = { NgraphBridge::NG_NODE_MAP = {
{"conv2d", NG_OPS::BuildConv2dNode},
{"conv2d_grad", NG_OPS::BuildConv2dGradNode},
{"elementwise_add", NG_OPS::BuildElementwiseAddNode}, {"elementwise_add", NG_OPS::BuildElementwiseAddNode},
{"elementwise_add_grad", NG_OPS::BuildElementwiseAddGradNode}, {"elementwise_add_grad", NG_OPS::BuildElementwiseAddGradNode},
{"fill_constant", NG_OPS::BuildFillConstantNode}, {"fill_constant", NG_OPS::BuildFillConstantNode},
...@@ -38,6 +40,8 @@ std::map<std::string, ...@@ -38,6 +40,8 @@ std::map<std::string,
{"mean_grad", NG_OPS::BuildMeanGradNode}, {"mean_grad", NG_OPS::BuildMeanGradNode},
{"mul", NG_OPS::BuildMulNode}, {"mul", NG_OPS::BuildMulNode},
{"mul_grad", NG_OPS::BuildMulGradNode}, {"mul_grad", NG_OPS::BuildMulGradNode},
{"pool2d", NG_OPS::BuildPool2dNode},
{"pool2d_grad", NG_OPS::BuildPool2dGradNode},
{"softmax", NG_OPS::BuildSoftmaxNode}, {"softmax", NG_OPS::BuildSoftmaxNode},
{"softmax_grad", NG_OPS::BuildSoftmaxGradNode}, {"softmax_grad", NG_OPS::BuildSoftmaxGradNode},
{"scale", NG_OPS::BuildScaleNode}, {"scale", NG_OPS::BuildScaleNode},
......
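This hunk registers the new `conv2d`/`pool2d` builders, and their `_grad` counterparts, in `NgraphBridge::NG_NODE_MAP`. That map goes from a Paddle op type to the function that emits the ngraph nodes for it. The sketch below is a simplified, hypothetical analogue of that dispatch. Its builders append a label to a log instead of creating `ngraph::Node`s, and it reports an op type missing from the table by returning `false`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical builder signature; the real one receives the op and node map.
using Builder = std::function<void(std::vector<std::string>*)>;

// Op type -> builder, built once on first use.
const std::map<std::string, Builder>& BuilderTable() {
  static const std::map<std::string, Builder> table = {
      {"conv2d", [](std::vector<std::string>* log) { log->push_back("Conv2d"); }},
      {"pool2d", [](std::vector<std::string>* log) { log->push_back("Pool2d"); }},
  };
  return table;
}

// Runs the builder registered for op_type; returns false if there is none.
bool BuildOp(const std::string& op_type, std::vector<std::string>* log) {
  const auto& table = BuilderTable();
  auto it = table.find(op_type);
  if (it == table.end()) return false;
  it->second(log);
  return true;
}
```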
...@@ -22,10 +22,12 @@ limitations under the License. */ ...@@ -22,10 +22,12 @@ limitations under the License. */
#pragma once #pragma once
#include "ops/binary_unnary_op.h" #include "ops/binary_unnary_op.h"
#include "ops/conv2d_op.h"
#include "ops/elementwise_add_op.h" #include "ops/elementwise_add_op.h"
#include "ops/fill_constant_op.h" #include "ops/fill_constant_op.h"
#include "ops/mean_op.h" #include "ops/mean_op.h"
#include "ops/mul_op.h" #include "ops/mul_op.h"
#include "ops/pool2d_op.h"
#include "ops/scale_op.h" #include "ops/scale_op.h"
#include "ops/softmax_op.h" #include "ops/softmax_op.h"
#include "ops/top_k_op.h" #include "ops/top_k_op.h"
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>
#include "ngraph/ngraph.hpp"
#include "paddle/fluid/platform/ngraph_helper.h"
namespace paddle {
namespace operators {
namespace ngraphs {
std::shared_ptr<ngraph::Node> GroupedConvolution(
const std::shared_ptr<ngraph::Node>& data_batch,
const std::shared_ptr<ngraph::Node>& filters, const ngraph::Strides strides,
const ngraph::Strides dilations, const ngraph::CoordinateDiff& paddings,
size_t groups) {
auto& data_shape = data_batch->get_shape();
auto& filter_shape = filters->get_shape();
ngraph::NodeVector ng_slices;
for (size_t i = 0; i < groups; ++i) {
size_t channel_step = filter_shape.at(1);
const std::vector<size_t> lower_bound{0, i * channel_step, 0, 0};
const std::vector<size_t> upper_bound{data_shape.at(0),
(i + 1) * channel_step,
data_shape.at(2), data_shape.at(3)};
auto data_slice = std::make_shared<ngraph::op::Slice>(
data_batch, lower_bound, upper_bound);
size_t filter_step = filter_shape.at(0) / groups;
const std::vector<size_t> filter_lower_bound{i * filter_step, 0, 0, 0};
const std::vector<size_t> filter_upper_bound{
(i + 1) * filter_step, filter_shape.at(1), filter_shape.at(2),
filter_shape.at(3)};
auto filter_slice = std::make_shared<ngraph::op::Slice>(
filters, filter_lower_bound, filter_upper_bound);
auto ng_conv = std::make_shared<ngraph::op::Convolution>(
data_slice, filter_slice, strides, dilations, paddings, paddings);
ng_slices.push_back(ng_conv);
}
size_t concat_axis = 1;
return std::make_shared<ngraph::op::Concat>(ng_slices, concat_axis);
}
std::shared_ptr<ngraph::Node> GroupedGradConvolutionFilter(
const std::shared_ptr<ngraph::Node>& data_batch,
const std::shared_ptr<ngraph::Node>& filters,
const std::shared_ptr<ngraph::Node>& doutput, const ngraph::Strides strides,
const ngraph::Strides dilations, const ngraph::CoordinateDiff& paddings,
size_t groups) {
auto& data_shape = data_batch->get_shape();
auto& filter_shape = filters->get_shape();
auto& out_shape = doutput->get_shape();
ngraph::NodeVector ng_slices;
for (size_t i = 0; i < groups; ++i) {
size_t channel_step = filter_shape.at(1);
const std::vector<size_t> lower_bound{0, i * channel_step, 0, 0};
const std::vector<size_t> upper_bound{data_shape.at(0),
(i + 1) * channel_step,
data_shape.at(2), data_shape.at(3)};
auto data_slice = std::make_shared<ngraph::op::Slice>(
data_batch, lower_bound, upper_bound);
// output channels per group, as in GroupedConvolution (not the batch size)
size_t filter_step = filter_shape.at(0) / groups;
const std::vector<size_t> filter_lower_bound{i * filter_step, 0, 0, 0};
const std::vector<size_t> filter_upper_bound{
(i + 1) * filter_step, filter_shape.at(1), filter_shape.at(2),
filter_shape.at(3)};
auto filter_slice = std::make_shared<ngraph::op::Slice>(
filters, filter_lower_bound, filter_upper_bound);
const std::vector<size_t> olower_bound{0, i * filter_step, 0, 0};
const std::vector<size_t> oupper_bound{out_shape.at(0),
(i + 1) * filter_step,
out_shape.at(2), out_shape.at(3)};
auto out_slice = std::make_shared<ngraph::op::Slice>(doutput, olower_bound,
oupper_bound);
auto ng_conv = std::make_shared<ngraph::op::ConvolutionBackpropFilters>(
data_slice, filter_slice->get_shape(), out_slice, strides, dilations,
paddings, paddings, ngraph::Strides{1, 1});
ng_slices.push_back(ng_conv);
}
size_t concat_axis = 0;
return std::make_shared<ngraph::op::Concat>(ng_slices, concat_axis);
}
std::shared_ptr<ngraph::Node> GroupedGradConvolutionData(
const std::shared_ptr<ngraph::Node>& data_batch,
const std::shared_ptr<ngraph::Node>& filters,
const std::shared_ptr<ngraph::Node>& doutput, const ngraph::Strides strides,
const ngraph::Strides dilations, const ngraph::CoordinateDiff& paddings,
size_t groups) {
auto& data_shape = data_batch->get_shape();
auto& filter_shape = filters->get_shape();
auto& out_shape = doutput->get_shape();
ngraph::NodeVector ng_slices;
for (size_t i = 0; i < groups; ++i) {
size_t channel_step = filter_shape.at(1);
const std::vector<size_t> lower_bound{0, i * channel_step, 0, 0};
const std::vector<size_t> upper_bound{data_shape.at(0),
(i + 1) * channel_step,
data_shape.at(2), data_shape.at(3)};
auto data_slice = std::make_shared<ngraph::op::Slice>(
data_batch, lower_bound, upper_bound);
// output channels per group, as in GroupedConvolution (not the batch size)
size_t filter_step = filter_shape.at(0) / groups;
const std::vector<size_t> filter_lower_bound{i * filter_step, 0, 0, 0};
const std::vector<size_t> filter_upper_bound{
(i + 1) * filter_step, filter_shape.at(1), filter_shape.at(2),
filter_shape.at(3)};
auto filter_slice = std::make_shared<ngraph::op::Slice>(
filters, filter_lower_bound, filter_upper_bound);
const std::vector<size_t> olower_bound{0, i * filter_step, 0, 0};
const std::vector<size_t> oupper_bound{out_shape.at(0),
(i + 1) * filter_step,
out_shape.at(2), out_shape.at(3)};
auto out_slice = std::make_shared<ngraph::op::Slice>(doutput, olower_bound,
oupper_bound);
auto ng_conv = std::make_shared<ngraph::op::ConvolutionBackpropData>(
data_slice->get_shape(), filter_slice, out_slice, strides, dilations,
paddings, paddings, ngraph::Strides{1, 1});
ng_slices.push_back(ng_conv);
}
size_t concat_axis = 1;
return std::make_shared<ngraph::op::Concat>(ng_slices, concat_axis);
}
void BuildConv2dNode(
const std::shared_ptr<paddle::framework::OperatorBase>& op,
std::shared_ptr<
std::unordered_map<std::string, std::shared_ptr<ngraph::Node>>>
ngb_node_map) {
auto op_attrs = paddle::framework::AttrReader(op->Attrs());
auto filters = paddle::platform::GetInputNode(op, "Filter", ngb_node_map);
auto input = paddle::platform::GetInputNode(op, "Input", ngb_node_map);
std::vector<int> strides = op_attrs.Get<std::vector<int>>("strides");
std::vector<int> paddings = op_attrs.Get<std::vector<int>>("paddings");
std::vector<int> dilations = op_attrs.Get<std::vector<int>>("dilations");
const ngraph::Strides ng_strides{static_cast<size_t>(strides.at(0)),
static_cast<size_t>(strides.at(1))};
const ngraph::Strides ng_dilations{static_cast<size_t>(dilations.at(0)),
static_cast<size_t>(dilations.at(1))};
const ngraph::CoordinateDiff ng_paddings{
static_cast<std::ptrdiff_t>(paddings.at(0)),
static_cast<std::ptrdiff_t>(paddings.at(1))};
int groups = op_attrs.Get<int>("groups");
PADDLE_ENFORCE_GE(groups, 1, "conv groups must be no less than 1");
std::shared_ptr<ngraph::Node> result;
if (groups == 1) {
result = std::make_shared<ngraph::op::Convolution>(
input, filters, ng_strides, ng_dilations, ng_paddings, ng_paddings);
} else {
result = GroupedConvolution(input, filters, ng_strides, ng_dilations,
ng_paddings, groups);
}
paddle::platform::SetOutputNode(op, "Output", result, ngb_node_map);
}
void BuildConv2dGradNode(
const std::shared_ptr<paddle::framework::OperatorBase>& op,
std::shared_ptr<
std::unordered_map<std::string, std::shared_ptr<ngraph::Node>>>
ngb_node_map) {
auto op_attrs = paddle::framework::AttrReader(op->Attrs());
auto filter = paddle::platform::GetInputNode(op, "Filter", ngb_node_map);
auto input = paddle::platform::GetInputNode(op, "Input", ngb_node_map);
auto doutput =
paddle::platform::GetInputNode(op, "Output@GRAD", ngb_node_map);
int groups = op_attrs.Get<int>("groups");
std::vector<int> strides = op_attrs.Get<std::vector<int>>("strides");
std::vector<int> paddings = op_attrs.Get<std::vector<int>>("paddings");
std::vector<int> dilations = op_attrs.Get<std::vector<int>>("dilations");
const ngraph::Strides ng_strides{static_cast<size_t>(strides.at(0)),
static_cast<size_t>(strides.at(1))};
const ngraph::Strides ng_dilations{static_cast<size_t>(dilations.at(0)),
static_cast<size_t>(dilations.at(1))};
const ngraph::CoordinateDiff ng_paddings{
static_cast<std::ptrdiff_t>(paddings.at(0)),
static_cast<std::ptrdiff_t>(paddings.at(1))};
std::shared_ptr<ngraph::Node> dfilter;
std::shared_ptr<ngraph::Node> dinput;
if (groups == 1) {
dfilter = std::make_shared<ngraph::op::ConvolutionBackpropFilters>(
input, filter->get_shape(), doutput, ng_strides, ng_dilations,
ng_paddings, ng_paddings, ngraph::Strides{1, 1});
dinput = std::make_shared<ngraph::op::ConvolutionBackpropData>(
input->get_shape(), filter, doutput, ng_strides, ng_dilations,
ng_paddings, ng_paddings, ngraph::Strides{1, 1});
} else {
dfilter = GroupedGradConvolutionFilter(input, filter, doutput, ng_strides,
ng_dilations, ng_paddings, groups);
dinput = GroupedGradConvolutionData(input, filter, doutput, ng_strides,
ng_dilations, ng_paddings, groups);
}
paddle::platform::SetOutputNode(op, "Filter@GRAD", dfilter, ngb_node_map);
paddle::platform::SetOutputNode(op, "Input@GRAD", dinput, ngb_node_map);
}
} // namespace ngraphs
} // namespace operators
} // namespace paddle
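`GroupedConvolution` emulates a grouped convolution by slicing. For group `i`, it takes input channels `[i * C_g, (i + 1) * C_g)`, where `C_g = filter_shape[1]`, and filters `[i * O_g, (i + 1) * O_g)`, where `O_g = filter_shape[0] / groups`. It runs an ordinary `ngraph::op::Convolution` on each pair and concatenates the results along the channel axis (axis 1). The gradient helpers slice the same way, concatenating filter gradients on axis 0 and data gradients on axis 1. The sketch below only computes the per-group ranges for NCHW data and OIHW filters. `GroupSlices` and `SliceForGroup` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Channel ranges [begin, end) handled by one group.
struct GroupSlices {
  std::size_t in_begin, in_end;    // input channels (axis 1 of NCHW data)
  std::size_t out_begin, out_end;  // filters / output channels (axis 0 of OIHW)
};

// filter_shape is OIHW, where I is already the per-group input channel count.
GroupSlices SliceForGroup(std::size_t g, std::size_t groups,
                          const std::array<std::size_t, 4>& filter_shape) {
  assert(groups > 0 && g < groups && filter_shape[0] % groups == 0);
  const std::size_t in_step = filter_shape[1];            // C_g
  const std::size_t out_step = filter_shape[0] / groups;  // O_g
  return {g * in_step, (g + 1) * in_step, g * out_step, (g + 1) * out_step};
}
```

Consider filters of shape `{8, 3, 3, 3}` with `groups = 2`, meaning 6 input channels. Group 1 then reads input channels `[3, 6)` and produces output channels `[4, 8)`.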
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <cmath>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>
#include "ngraph/ngraph.hpp"
#include "paddle/fluid/platform/ngraph_helper.h"
namespace paddle {
namespace operators {
namespace ngraphs {
void BuildPool2dNode(
const std::shared_ptr<paddle::framework::OperatorBase>& op,
std::shared_ptr<
std::unordered_map<std::string, std::shared_ptr<ngraph::Node>>>
ngb_node_map) {
auto op_attrs = paddle::framework::AttrReader(op->Attrs());
auto x = paddle::platform::GetInputNode(op, "X", ngb_node_map);
auto x_shape = x->get_shape();
std::string pooling_type = op_attrs.Get<std::string>("pooling_type");
std::vector<int> ksize = op_attrs.Get<std::vector<int>>("ksize");
std::vector<int> strides = op_attrs.Get<std::vector<int>>("strides");
std::vector<int> paddings = op_attrs.Get<std::vector<int>>("paddings");
PADDLE_ENFORCE_EQ(x_shape.size() - 2, ksize.size(),
"Only 2D pooling is supported");
if (op_attrs.Get<bool>("global_pooling")) {
for (size_t i = 0; i < ksize.size(); ++i) {
paddings[i] = 0;
ksize[i] = static_cast<int>(x_shape.at(i + 2));
}
}
ngraph::Shape ng_padding_below{static_cast<size_t>(paddings.at(0)),
static_cast<size_t>(paddings.at(1))};
ngraph::Shape ng_padding_above{static_cast<size_t>(paddings.at(0)),
static_cast<size_t>(paddings.at(1))};
ngraph::Shape ng_ksize_shape{static_cast<size_t>(ksize.at(0)),
static_cast<size_t>(ksize.at(1))};
ngraph::Strides ng_strides{static_cast<size_t>(strides.at(0)),
static_cast<size_t>(strides.at(1))};
auto ComputeCeiledOutput = [](size_t in, size_t k, size_t p, size_t s) {
return (in - k + 2 * p) / s + 1;
};
if (op_attrs.Get<bool>("ceil_mode")) {
auto dummy_out = paddle::platform::GetOutputNode(op, "Out", ngb_node_map);
auto dummy_shape = dummy_out->get_shape();
for (size_t i = 0; i < ng_padding_above.size(); ++i) {
auto desired_size = ComputeCeiledOutput(x_shape[i + 2], ksize[i],
paddings[i], strides[i]);
if (desired_size != dummy_shape[i + 2]) {
ng_padding_above[i] += strides[i];
}
}
}
bool padding_exclusive = op_attrs.Get<bool>("exclusive");
if (pooling_type == "max") {
auto pool2d = std::make_shared<ngraph::op::MaxPool>(
x, ng_ksize_shape, ng_strides, ng_padding_below, ng_padding_above);
paddle::platform::SetOutputNode(op, "Out", pool2d, ngb_node_map);
} else if (pooling_type == "avg") {
std::shared_ptr<ngraph::Node> pool2d;
if (op_attrs.Get<bool>("adaptive")) {
// adaptive: ksize holds the output size, so stride = input / output per axis
auto ComputeAdaptive = [](size_t in, size_t k) {
return in / k;  // integer division already rounds down
};
ng_strides[0] = x_shape.size() == 4
? ComputeAdaptive(x_shape[2], ksize[0])
: ng_strides[0];
ng_strides[1] = x_shape.size() == 4
? ComputeAdaptive(x_shape[3], ksize[1])
: ng_strides[1];
pool2d =
std::make_shared<ngraph::op::AvgPool>(x, ng_ksize_shape, ng_strides);
} else {
pool2d = std::make_shared<ngraph::op::AvgPool>(
x, ng_ksize_shape, ng_strides, ng_padding_below, ng_padding_above,
!padding_exclusive);
}
paddle::platform::SetOutputNode(op, "Out", pool2d, ngb_node_map);
} else {
PADDLE_THROW("Only max and avg pooling are supported");
}
}
void BuildPool2dGradNode(
const std::shared_ptr<paddle::framework::OperatorBase>& op,
std::shared_ptr<
std::unordered_map<std::string, std::shared_ptr<ngraph::Node>>>
ngb_node_map) {
auto op_attrs = paddle::framework::AttrReader(op->Attrs());
auto out = paddle::platform::GetInputNode(op, "Out", ngb_node_map);
auto dout = paddle::platform::GetInputNode(op, "Out@GRAD", ngb_node_map);
auto x = paddle::platform::GetInputNode(op, "X", ngb_node_map);
auto x_shape = x->get_shape();
std::string pooling_type = op_attrs.Get<std::string>("pooling_type");
std::vector<int> ksize = op_attrs.Get<std::vector<int>>("ksize");
std::vector<int> strides = op_attrs.Get<std::vector<int>>("strides");
std::vector<int> paddings = op_attrs.Get<std::vector<int>>("paddings");
PADDLE_ENFORCE_EQ(x_shape.size() - 2, ksize.size(),
"Only 2D pooling is supported");
if (op_attrs.Get<bool>("global_pooling")) {
for (size_t i = 0; i < ksize.size(); ++i) {
paddings[i] = 0;
ksize[i] = static_cast<int>(x_shape.at(i + 2));
}
}
ngraph::Shape ng_padding_below{static_cast<size_t>(paddings.at(0)),
static_cast<size_t>(paddings.at(1))};
ngraph::Shape ng_padding_above{static_cast<size_t>(paddings.at(0)),
static_cast<size_t>(paddings.at(1))};
ngraph::Shape ng_ksize_shape{static_cast<size_t>(ksize.at(0)),
static_cast<size_t>(ksize.at(1))};
ngraph::Strides ng_strides{static_cast<size_t>(strides.at(0)),
static_cast<size_t>(strides.at(1))};
bool padding_exclusive = op_attrs.Get<bool>("exclusive");
if (pooling_type == "max") {
auto pool2d_grad = std::make_shared<ngraph::op::MaxPoolBackprop>(
x, dout, out, ng_ksize_shape, ng_strides, ng_padding_below,
ng_padding_above);
paddle::platform::SetOutputNode(op, "X@GRAD", pool2d_grad, ngb_node_map);
} else if (pooling_type == "avg") {
std::shared_ptr<ngraph::Node> pool2d_grad;
if (op_attrs.Get<bool>("adaptive")) {
auto ComputeAdaptive = [](size_t in, size_t k) {
return std::floor(in / k);
};
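// NCHW input: H is x_shape[2] and W is x_shape[3]; pair each spatial size
// with its own ksize entry instead of reusing the width for both strides.
ng_strides[0] = x_shape.size() == 4
? ComputeAdaptive(x_shape[2], ksize[0])
: ng_strides[0];
ng_strides[1] = x_shape.size() == 4
? ComputeAdaptive(x_shape[3], ksize[1])
: ng_strides[1];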
pool2d_grad = std::make_shared<ngraph::op::AvgPoolBackprop>(
x->get_shape(), dout, ng_ksize_shape, ng_strides, ng_padding_below,
ng_padding_above, !padding_exclusive);
} else {
pool2d_grad = std::make_shared<ngraph::op::AvgPoolBackprop>(
x->get_shape(), dout, ng_ksize_shape, ng_strides, ng_padding_below,
ng_padding_above, !padding_exclusive);
}
paddle::platform::SetOutputNode(op, "X@GRAD", pool2d_grad, ngb_node_map);
} else {
PADDLE_THROW("Support max and avg pooling only");
}
}
} // namespace ngraphs
} // namespace operators
} // namespace paddle
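The adaptive branches above approximate adaptive average pooling with a single fixed stride of floor(in / k). For comparison, here is a minimal, self-contained sketch of how adaptive average pooling is commonly defined in one dimension, assuming the floor/ceil window convention (start = floor(i * in / out), end = ceil((i + 1) * in / out)) used by several frameworks. `AdaptiveAvgPool1D` is a hypothetical helper, not part of nGraph or Paddle. When `in` is a multiple of `out`, every window has length and stride in / out, so a fixed-stride pooling with that kernel and stride gives the same result; otherwise window lengths can vary and neighbouring windows can overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 1-D adaptive average pooling of `x` down to `out`
// values. Window i covers [floor(i*in/out), ceil((i+1)*in/out)).
std::vector<double> AdaptiveAvgPool1D(const std::vector<double>& x,
                                      std::size_t out) {
  const std::size_t in = x.size();
  assert(out > 0 && out <= in);
  std::vector<double> y(out, 0.0);
  for (std::size_t i = 0; i < out; ++i) {
    const std::size_t start = (i * in) / out;                // floor division
    const std::size_t end = ((i + 1) * in + out - 1) / out;  // ceil division
    double sum = 0.0;
    for (std::size_t j = start; j < end; ++j) {
      sum += x[j];
    }
    y[i] = sum / static_cast<double>(end - start);
  }
  return y;
}
```

For example, pooling {1, 2, 3, 4, 5} to two outputs uses the windows [0, 3) and [2, 5), which overlap at index 2, and yields {2, 4}.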
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <algorithm>
#include <cmath>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, size_t D, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenTensor = framework::EigenTensor<T, D, MajorType, IndexType>;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
using Array5 = Eigen::DSizes<int64_t, 5>;
template <typename T>
static inline bool isZero(T x) {
return fabs(x) < 1e-6;
}
template <typename T>
static inline T sigmoid(T x) {
return 1.0 / (exp(-1.0 * x) + 1.0);
}
template <typename T>
static inline T CalcMaskPointNum(const Tensor& mask) {
auto mask_t = EigenVector<int>::Flatten(mask);
T count = 0.0;
for (int i = 0; i < mask_t.dimensions()[0]; i++) {
if (mask_t(i)) {
count += 1.0;
}
}
return count;
}
template <typename T>
static inline T CalcMSEWithMask(const Tensor& x, const Tensor& y,
const Tensor& mask) {
auto x_t = EigenVector<T>::Flatten(x);
auto y_t = EigenVector<T>::Flatten(y);
auto mask_t = EigenVector<int>::Flatten(mask);
T error_sum = 0.0;
T points = 0.0;
for (int i = 0; i < x_t.dimensions()[0]; i++) {
if (mask_t(i)) {
error_sum += pow(x_t(i) - y_t(i), 2);
points += 1;
}
}
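// Guard against 0 / 0 (NaN) when the mask selects no elements, e.g. a
// batch with no ground-truth boxes.
return points > 0 ? error_sum / points : static_cast<T>(0);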
}
template <typename T>
static void CalcMSEGradWithMask(Tensor* grad, const Tensor& x, const Tensor& y,
const Tensor& mask, T mf) {
auto grad_t = EigenVector<T>::Flatten(*grad).setConstant(0.0);
auto x_t = EigenVector<T>::Flatten(x);
auto y_t = EigenVector<T>::Flatten(y);
auto mask_t = EigenVector<int>::Flatten(mask);
for (int i = 0; i < x_t.dimensions()[0]; i++) {
if (mask_t(i)) {
grad_t(i) = 2.0 * (x_t(i) - y_t(i)) / mf;
}
}
}
template <typename T>
static inline T CalcBCEWithMask(const Tensor& x, const Tensor& y,
const Tensor& mask) {
auto x_t = EigenVector<T>::Flatten(x);
auto y_t = EigenVector<T>::Flatten(y);
auto mask_t = EigenVector<int>::Flatten(mask);
T error_sum = 0.0;
T points = 0.0;
for (int i = 0; i < x_t.dimensions()[0]; i++) {
if (mask_t(i)) {
error_sum +=
-1.0 * (y_t(i) * log(x_t(i)) + (1.0 - y_t(i)) * log(1.0 - x_t(i)));
points += 1;
}
}
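// Guard against 0 / 0 (NaN) when the mask selects no elements, e.g. a
// batch with no ground-truth boxes.
return points > 0 ? error_sum / points : static_cast<T>(0);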
}
template <typename T>
static inline void CalcBCEGradWithMask(Tensor* grad, const Tensor& x,
const Tensor& y, const Tensor& mask,
T mf) {
auto grad_t = EigenVector<T>::Flatten(*grad).setConstant(0.0);
auto x_t = EigenVector<T>::Flatten(x);
auto y_t = EigenVector<T>::Flatten(y);
auto mask_t = EigenVector<int>::Flatten(mask);
for (int i = 0; i < x_t.dimensions()[0]; i++) {
if (mask_t(i)) {
grad_t(i) = ((1.0 - y_t(i)) / (1.0 - x_t(i)) - y_t(i) / x_t(i)) / mf;
}
}
}
template <typename T>
static void CalcPredResult(const Tensor& input, Tensor* pred_conf,
Tensor* pred_class, Tensor* pred_x, Tensor* pred_y,
Tensor* pred_w, Tensor* pred_h, const int anchor_num,
const int class_num) {
const int n = input.dims()[0];
const int h = input.dims()[2];
const int w = input.dims()[3];
const int box_attr_num = 5 + class_num;
auto input_t = EigenTensor<T, 4>::From(input);
auto pred_conf_t = EigenTensor<T, 4>::From(*pred_conf);
auto pred_class_t = EigenTensor<T, 5>::From(*pred_class);
auto pred_x_t = EigenTensor<T, 4>::From(*pred_x);
auto pred_y_t = EigenTensor<T, 4>::From(*pred_y);
auto pred_w_t = EigenTensor<T, 4>::From(*pred_w);
auto pred_h_t = EigenTensor<T, 4>::From(*pred_h);
for (int i = 0; i < n; i++) {
for (int an_idx = 0; an_idx < anchor_num; an_idx++) {
for (int j = 0; j < h; j++) {
for (int k = 0; k < w; k++) {
pred_x_t(i, an_idx, j, k) =
sigmoid(input_t(i, box_attr_num * an_idx, j, k));
pred_y_t(i, an_idx, j, k) =
sigmoid(input_t(i, box_attr_num * an_idx + 1, j, k));
pred_w_t(i, an_idx, j, k) =
input_t(i, box_attr_num * an_idx + 2, j, k);
pred_h_t(i, an_idx, j, k) =
input_t(i, box_attr_num * an_idx + 3, j, k);
pred_conf_t(i, an_idx, j, k) =
sigmoid(input_t(i, box_attr_num * an_idx + 4, j, k));
for (int c = 0; c < class_num; c++) {
pred_class_t(i, an_idx, j, k, c) =
sigmoid(input_t(i, box_attr_num * an_idx + 5 + c, j, k));
}
}
}
}
}
}
template <typename T>
static T CalcBoxIoU(std::vector<T> box1, std::vector<T> box2) {
T b1_x1 = box1[0] - box1[2] / 2;
T b1_x2 = box1[0] + box1[2] / 2;
T b1_y1 = box1[1] - box1[3] / 2;
T b1_y2 = box1[1] + box1[3] / 2;
T b2_x1 = box2[0] - box2[2] / 2;
T b2_x2 = box2[0] + box2[2] / 2;
T b2_y1 = box2[1] - box2[3] / 2;
T b2_y2 = box2[1] + box2[3] / 2;
T b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1);
T b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1);
T inter_rect_x1 = std::max(b1_x1, b2_x1);
T inter_rect_y1 = std::max(b1_y1, b2_y1);
T inter_rect_x2 = std::min(b1_x2, b2_x2);
T inter_rect_y2 = std::min(b1_y2, b2_y2);
T inter_area = std::max(inter_rect_x2 - inter_rect_x1, static_cast<T>(0.0)) *
std::max(inter_rect_y2 - inter_rect_y1, static_cast<T>(0.0));
return inter_area / (b1_area + b2_area - inter_area);
}
template <typename T>
static void PreProcessGTBox(const Tensor& gt_box, const Tensor& gt_label,
const float ignore_thresh, std::vector<int> anchors,
const int grid_size, Tensor* obj_mask,
Tensor* noobj_mask, Tensor* tx, Tensor* ty,
Tensor* tw, Tensor* th, Tensor* tconf,
Tensor* tclass) {
const int n = gt_box.dims()[0];
const int b = gt_box.dims()[1];
const int anchor_num = anchors.size() / 2;
auto gt_box_t = EigenTensor<T, 3>::From(gt_box);
auto gt_label_t = EigenTensor<int, 2>::From(gt_label);
auto obj_mask_t = EigenTensor<int, 4>::From(*obj_mask).setConstant(0);
auto noobj_mask_t = EigenTensor<int, 4>::From(*noobj_mask).setConstant(1);
auto tx_t = EigenTensor<T, 4>::From(*tx).setConstant(0.0);
auto ty_t = EigenTensor<T, 4>::From(*ty).setConstant(0.0);
auto tw_t = EigenTensor<T, 4>::From(*tw).setConstant(0.0);
auto th_t = EigenTensor<T, 4>::From(*th).setConstant(0.0);
auto tconf_t = EigenTensor<T, 4>::From(*tconf).setConstant(0.0);
auto tclass_t = EigenTensor<T, 5>::From(*tclass).setConstant(0.0);
for (int i = 0; i < n; i++) {
for (int j = 0; j < b; j++) {
if (isZero<T>(gt_box_t(i, j, 0)) && isZero<T>(gt_box_t(i, j, 1)) &&
isZero<T>(gt_box_t(i, j, 2)) && isZero<T>(gt_box_t(i, j, 3))) {
continue;
}
int cur_label = gt_label_t(i, j);
T gx = gt_box_t(i, j, 0) * grid_size;
T gy = gt_box_t(i, j, 1) * grid_size;
T gw = gt_box_t(i, j, 2) * grid_size;
T gh = gt_box_t(i, j, 3) * grid_size;
int gi = static_cast<int>(gx);
int gj = static_cast<int>(gy);
T max_iou = static_cast<T>(0);
T iou;
int best_an_index = -1;
std::vector<T> gt_box_shape({0, 0, gw, gh});
for (int an_idx = 0; an_idx < anchor_num; an_idx++) {
std::vector<T> anchor_shape({0, 0, static_cast<T>(anchors[2 * an_idx]),
static_cast<T>(anchors[2 * an_idx + 1])});
iou = CalcBoxIoU<T>(gt_box_shape, anchor_shape);
if (iou > max_iou) {
max_iou = iou;
best_an_index = an_idx;
}
if (iou > ignore_thresh) {
noobj_mask_t(i, an_idx, gj, gi) = 0;
}
}
obj_mask_t(i, best_an_index, gj, gi) = 1;
noobj_mask_t(i, best_an_index, gj, gi) = 0;
tx_t(i, best_an_index, gj, gi) = gx - gi;
ty_t(i, best_an_index, gj, gi) = gy - gj;
tw_t(i, best_an_index, gj, gi) = log(gw / anchors[2 * best_an_index]);
th_t(i, best_an_index, gj, gi) = log(gh / anchors[2 * best_an_index + 1]);
tclass_t(i, best_an_index, gj, gi, cur_label) = 1;
tconf_t(i, best_an_index, gj, gi) = 1;
}
}
}
static void ExpandObjMaskByClassNum(Tensor* obj_mask_expand,
const Tensor& obj_mask) {
const int n = obj_mask_expand->dims()[0];
const int an_num = obj_mask_expand->dims()[1];
const int h = obj_mask_expand->dims()[2];
const int w = obj_mask_expand->dims()[3];
const int class_num = obj_mask_expand->dims()[4];
auto obj_mask_expand_t = EigenTensor<int, 5>::From(*obj_mask_expand);
auto obj_mask_t = EigenTensor<int, 4>::From(obj_mask);
obj_mask_expand_t = obj_mask_t.reshape(Array5(n, an_num, h, w, 1))
.broadcast(Array5(1, 1, 1, 1, class_num));
}
template <typename T>
static void AddAllGradToInputGrad(
Tensor* grad, T loss, const Tensor& pred_x, const Tensor& pred_y,
const Tensor& pred_conf, const Tensor& pred_class, const Tensor& grad_x,
const Tensor& grad_y, const Tensor& grad_w, const Tensor& grad_h,
const Tensor& grad_conf_target, const Tensor& grad_conf_notarget,
const Tensor& grad_class, const int class_num, const float loss_weight_xy,
const float loss_weight_wh, const float loss_weight_conf_target,
const float loss_weight_conf_notarget, const float loss_weight_class) {
const int n = pred_x.dims()[0];
const int an_num = pred_x.dims()[1];
const int h = pred_x.dims()[2];
const int w = pred_x.dims()[3];
const int attr_num = class_num + 5;
auto grad_t = EigenTensor<T, 4>::From(*grad).setConstant(0.0);
auto pred_x_t = EigenTensor<T, 4>::From(pred_x);
auto pred_y_t = EigenTensor<T, 4>::From(pred_y);
auto pred_conf_t = EigenTensor<T, 4>::From(pred_conf);
auto pred_class_t = EigenTensor<T, 5>::From(pred_class);
auto grad_x_t = EigenTensor<T, 4>::From(grad_x);
auto grad_y_t = EigenTensor<T, 4>::From(grad_y);
auto grad_w_t = EigenTensor<T, 4>::From(grad_w);
auto grad_h_t = EigenTensor<T, 4>::From(grad_h);
auto grad_conf_target_t = EigenTensor<T, 4>::From(grad_conf_target);
auto grad_conf_notarget_t = EigenTensor<T, 4>::From(grad_conf_notarget);
auto grad_class_t = EigenTensor<T, 5>::From(grad_class);
for (int i = 0; i < n; i++) {
for (int j = 0; j < an_num; j++) {
for (int k = 0; k < h; k++) {
for (int l = 0; l < w; l++) {
grad_t(i, j * attr_num, k, l) =
grad_x_t(i, j, k, l) * pred_x_t(i, j, k, l) *
(1.0 - pred_x_t(i, j, k, l)) * loss * loss_weight_xy;
grad_t(i, j * attr_num + 1, k, l) =
grad_y_t(i, j, k, l) * pred_y_t(i, j, k, l) *
(1.0 - pred_y_t(i, j, k, l)) * loss * loss_weight_xy;
grad_t(i, j * attr_num + 2, k, l) =
grad_w_t(i, j, k, l) * loss * loss_weight_wh;
grad_t(i, j * attr_num + 3, k, l) =
grad_h_t(i, j, k, l) * loss * loss_weight_wh;
grad_t(i, j * attr_num + 4, k, l) =
grad_conf_target_t(i, j, k, l) * pred_conf_t(i, j, k, l) *
(1.0 - pred_conf_t(i, j, k, l)) * loss * loss_weight_conf_target;
grad_t(i, j * attr_num + 4, k, l) +=
grad_conf_notarget_t(i, j, k, l) * pred_conf_t(i, j, k, l) *
(1.0 - pred_conf_t(i, j, k, l)) * loss *
loss_weight_conf_notarget;
for (int c = 0; c < class_num; c++) {
grad_t(i, j * attr_num + 5 + c, k, l) =
grad_class_t(i, j, k, l, c) * pred_class_t(i, j, k, l, c) *
(1.0 - pred_class_t(i, j, k, l, c)) * loss * loss_weight_class;
}
}
}
}
}
}
template <typename T>
class Yolov3LossKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<Tensor>("X");
auto* gt_box = ctx.Input<Tensor>("GTBox");
auto* gt_label = ctx.Input<Tensor>("GTLabel");
auto* loss = ctx.Output<Tensor>("Loss");
auto anchors = ctx.Attr<std::vector<int>>("anchors");
int class_num = ctx.Attr<int>("class_num");
float ignore_thresh = ctx.Attr<float>("ignore_thresh");
float loss_weight_xy = ctx.Attr<float>("loss_weight_xy");
float loss_weight_wh = ctx.Attr<float>("loss_weight_wh");
float loss_weight_conf_target = ctx.Attr<float>("loss_weight_conf_target");
float loss_weight_conf_notarget =
ctx.Attr<float>("loss_weight_conf_notarget");
float loss_weight_class = ctx.Attr<float>("loss_weight_class");
const int n = input->dims()[0];
const int h = input->dims()[2];
const int w = input->dims()[3];
const int an_num = anchors.size() / 2;
Tensor pred_x, pred_y, pred_w, pred_h;
Tensor pred_conf, pred_class;
pred_x.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_y.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_w.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_h.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_conf.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_class.mutable_data<T>({n, an_num, h, w, class_num}, ctx.GetPlace());
CalcPredResult<T>(*input, &pred_conf, &pred_class, &pred_x, &pred_y,
&pred_w, &pred_h, an_num, class_num);
Tensor obj_mask, noobj_mask;
Tensor tx, ty, tw, th, tconf, tclass;
obj_mask.mutable_data<int>({n, an_num, h, w}, ctx.GetPlace());
noobj_mask.mutable_data<int>({n, an_num, h, w}, ctx.GetPlace());
tx.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
ty.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
tw.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
th.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
tconf.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
tclass.mutable_data<T>({n, an_num, h, w, class_num}, ctx.GetPlace());
PreProcessGTBox<T>(*gt_box, *gt_label, ignore_thresh, anchors, h, &obj_mask,
&noobj_mask, &tx, &ty, &tw, &th, &tconf, &tclass);
Tensor obj_mask_expand;
obj_mask_expand.mutable_data<int>({n, an_num, h, w, class_num},
ctx.GetPlace());
ExpandObjMaskByClassNum(&obj_mask_expand, obj_mask);
T loss_x = CalcMSEWithMask<T>(pred_x, tx, obj_mask);
T loss_y = CalcMSEWithMask<T>(pred_y, ty, obj_mask);
T loss_w = CalcMSEWithMask<T>(pred_w, tw, obj_mask);
T loss_h = CalcMSEWithMask<T>(pred_h, th, obj_mask);
T loss_conf_target = CalcBCEWithMask<T>(pred_conf, tconf, obj_mask);
T loss_conf_notarget = CalcBCEWithMask<T>(pred_conf, tconf, noobj_mask);
T loss_class = CalcBCEWithMask<T>(pred_class, tclass, obj_mask_expand);
auto* loss_data = loss->mutable_data<T>({1}, ctx.GetPlace());
loss_data[0] = loss_weight_xy * (loss_x + loss_y) +
loss_weight_wh * (loss_w + loss_h) +
loss_weight_conf_target * loss_conf_target +
loss_weight_conf_notarget * loss_conf_notarget +
loss_weight_class * loss_class;
}
};
template <typename T>
class Yolov3LossGradKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<Tensor>("X");
auto* gt_box = ctx.Input<Tensor>("GTBox");
auto* gt_label = ctx.Input<Tensor>("GTLabel");
auto anchors = ctx.Attr<std::vector<int>>("anchors");
int class_num = ctx.Attr<int>("class_num");
float ignore_thresh = ctx.Attr<float>("ignore_thresh");
auto* input_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* output_grad = ctx.Input<Tensor>(framework::GradVarName("Loss"));
const T loss = output_grad->data<T>()[0];
float loss_weight_xy = ctx.Attr<float>("loss_weight_xy");
float loss_weight_wh = ctx.Attr<float>("loss_weight_wh");
float loss_weight_conf_target = ctx.Attr<float>("loss_weight_conf_target");
float loss_weight_conf_notarget =
ctx.Attr<float>("loss_weight_conf_notarget");
float loss_weight_class = ctx.Attr<float>("loss_weight_class");
const int n = input->dims()[0];
const int c = input->dims()[1];
const int h = input->dims()[2];
const int w = input->dims()[3];
const int an_num = anchors.size() / 2;
Tensor pred_x, pred_y, pred_w, pred_h;
Tensor pred_conf, pred_class;
pred_x.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_y.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_w.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_h.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_conf.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
pred_class.mutable_data<T>({n, an_num, h, w, class_num}, ctx.GetPlace());
CalcPredResult<T>(*input, &pred_conf, &pred_class, &pred_x, &pred_y,
&pred_w, &pred_h, an_num, class_num);
Tensor obj_mask, noobj_mask;
Tensor tx, ty, tw, th, tconf, tclass;
obj_mask.mutable_data<int>({n, an_num, h, w}, ctx.GetPlace());
noobj_mask.mutable_data<int>({n, an_num, h, w}, ctx.GetPlace());
tx.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
ty.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
tw.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
th.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
tconf.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
tclass.mutable_data<T>({n, an_num, h, w, class_num}, ctx.GetPlace());
PreProcessGTBox<T>(*gt_box, *gt_label, ignore_thresh, anchors, h, &obj_mask,
&noobj_mask, &tx, &ty, &tw, &th, &tconf, &tclass);
Tensor obj_mask_expand;
obj_mask_expand.mutable_data<int>({n, an_num, h, w, class_num},
ctx.GetPlace());
ExpandObjMaskByClassNum(&obj_mask_expand, obj_mask);
Tensor grad_x, grad_y, grad_w, grad_h;
Tensor grad_conf_target, grad_conf_notarget, grad_class;
grad_x.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
grad_y.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
grad_w.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
grad_h.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
grad_conf_target.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
grad_conf_notarget.mutable_data<T>({n, an_num, h, w}, ctx.GetPlace());
grad_class.mutable_data<T>({n, an_num, h, w, class_num}, ctx.GetPlace());
T obj_mf = CalcMaskPointNum<int>(obj_mask);
T noobj_mf = CalcMaskPointNum<int>(noobj_mask);
T obj_expand_mf = CalcMaskPointNum<int>(obj_mask_expand);
CalcMSEGradWithMask<T>(&grad_x, pred_x, tx, obj_mask, obj_mf);
CalcMSEGradWithMask<T>(&grad_y, pred_y, ty, obj_mask, obj_mf);
CalcMSEGradWithMask<T>(&grad_w, pred_w, tw, obj_mask, obj_mf);
CalcMSEGradWithMask<T>(&grad_h, pred_h, th, obj_mask, obj_mf);
CalcBCEGradWithMask<T>(&grad_conf_target, pred_conf, tconf, obj_mask,
obj_mf);
CalcBCEGradWithMask<T>(&grad_conf_notarget, pred_conf, tconf, noobj_mask,
noobj_mf);
CalcBCEGradWithMask<T>(&grad_class, pred_class, tclass, obj_mask_expand,
obj_expand_mf);
input_grad->mutable_data<T>({n, c, h, w}, ctx.GetPlace());
AddAllGradToInputGrad<T>(
input_grad, loss, pred_x, pred_y, pred_conf, pred_class, grad_x, grad_y,
grad_w, grad_h, grad_conf_target, grad_conf_notarget, grad_class,
class_num, loss_weight_xy, loss_weight_wh, loss_weight_conf_target,
loss_weight_conf_notarget, loss_weight_class);
}
};
} // namespace operators
} // namespace paddle
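PreProcessGTBox assigns each ground-truth box to an anchor by calling CalcBoxIoU with both centres set to zero, so only widths and heights are compared. The sketch below isolates that matching step under the same assumption that anchors are the flat [w0, h0, w1, h1, ...] list the kernel reads; `ShapeIoU` and `BestAnchor` are hypothetical names. With coincident centres the intersection reduces to min(w1, w2) * min(h1, h2), which is what CalcBoxIoU computes for boxes centred at (0, 0).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// IoU of two boxes sharing the same centre, described only by (w, h).
double ShapeIoU(double w1, double h1, double w2, double h2) {
  const double inter = std::min(w1, w2) * std::min(h1, h2);
  return inter / (w1 * h1 + w2 * h2 - inter);
}

// Index of the anchor with the highest shape IoU, or -1 if no anchor has
// a strictly positive IoU (same strict '>' comparison as PreProcessGTBox).
int BestAnchor(double gw, double gh, const std::vector<int>& anchors) {
  assert(anchors.size() % 2 == 0);
  double max_iou = 0.0;
  int best = -1;
  for (std::size_t a = 0; a < anchors.size() / 2; ++a) {
    const double iou = ShapeIoU(gw, gh, anchors[2 * a], anchors[2 * a + 1]);
    if (iou > max_iou) {
      max_iou = iou;
      best = static_cast<int>(a);
    }
  }
  return best;
}
```

With the anchors {10, 13, 16, 30, 33, 23}, a 16x30 box matches anchor 1 exactly and a 30x20 box matches anchor 2. A box whose IoU with every anchor is 0 (for example zero width and height with a non-zero centre, which the all-zero padding check in PreProcessGTBox does not skip) leaves the index at -1; the kernel then uses that value directly as a tensor index, so a guard there may be worth considering.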
@@ -53,10 +53,12 @@ inline static int RoundToPowerOfTwo(int dim) {
       __VA_ARGS__; \
     } break
 #define CUDA_LAUNCH_KERNEL_HELPER(...) \
-  CUDA_LAUNCH_KERNEL_BASE(256, ##__VA_ARGS__); \
-  CUDA_LAUNCH_KERNEL_BASE(128, ##__VA_ARGS__); \
-  CUDA_LAUNCH_KERNEL_BASE(64, ##__VA_ARGS__); \
+  CUDA_LAUNCH_KERNEL_BASE(1024, ##__VA_ARGS__); \
+  CUDA_LAUNCH_KERNEL_BASE(512, ##__VA_ARGS__); \
+  CUDA_LAUNCH_KERNEL_BASE(256, ##__VA_ARGS__); \
+  CUDA_LAUNCH_KERNEL_BASE(128, ##__VA_ARGS__); \
+  CUDA_LAUNCH_KERNEL_BASE(64, ##__VA_ARGS__); \
   CUDA_LAUNCH_KERNEL_BASE(32, ##__VA_ARGS__);
 template <typename T>
......
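This hunk adds 1024- and 512-thread cases to CUDA_LAUNCH_KERNEL_HELPER (1024 is the per-block thread limit on current CUDA GPUs). The body of RoundToPowerOfTwo is not shown here; assuming it rounds a length up to the next power of two and clamps the result to the supported cases, the host-side selection can be sketched as follows. `PickBlockSize` is a hypothetical name, not Paddle's function.

```cpp
#include <cassert>

// Hypothetical stand-in for RoundToPowerOfTwo: round `dim` up to the next
// power of two, clamped to the block sizes CUDA_LAUNCH_KERNEL_HELPER
// dispatches on after this change (32, 64, 128, 256, 512, 1024).
inline int PickBlockSize(int dim) {
  assert(dim >= 0);
  constexpr int kMinBlock = 32;
  constexpr int kMaxBlock = 1024;
  int block = kMinBlock;
  while (block < dim && block < kMaxBlock) {
    block *= 2;  // next power of two
  }
  return block;
}
```

Under that assumption, a length of 300 selects 512 threads and any length above 512 selects 1024, whereas before this change the largest case was 256.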
@@ -14,6 +14,12 @@ limitations under the License. */
 #include "paddle/fluid/platform/place.h"
+DEFINE_bool(benchmark, false,
+            "Doing memory benchmark. It will make deleting scope synchronized, "
+            "and add some memory usage logs. "
+            "CUDA is asynchronous by default; setting this to true will "
+            "force ops to run in synchronous mode.");
 namespace paddle {
 namespace platform {
......
@@ -37,6 +37,7 @@ limitations under the License. */
 #include "paddle/fluid/framework/version.h"
 #include "paddle/fluid/imperative/layer.h"
 #include "paddle/fluid/memory/allocation/allocator_strategy.h"
+#include "paddle/fluid/memory/allocation/legacy_allocator.h"
 #include "paddle/fluid/operators/activation_op.h"
 #include "paddle/fluid/operators/py_func_op.h"
 #include "paddle/fluid/operators/reader/lod_tensor_blocking_queue.h"
@@ -127,6 +128,13 @@ PYBIND11_MODULE(core, m) {
   m.add_object("_cleanup",
                py::capsule([]() { ScopePool::Instance().Clear(); }));
+  m.def("get_mem_usage", [](int device) {
+    return memory::allocation::GPUMemMonitor.GetMemUsage(device);
+  });
+  m.def("print_mem_usage",
+        []() { return memory::allocation::GPUMemMonitor.PrintMemUsage(); });
   py::class_<imperative::VarBase>(m, "VarBase", R"DOC()DOC")
       // .def(py::init<>())
       .def(py::init<bool>(), py::arg("stop_gradient") = false)
......
#!/bin/bash
path='http://paddlepaddle.org/download?url='
#release_version=`curl -s https://pypi.org/project/paddlepaddle/|grep -E "/project/paddlepaddle/"|grep "release"|awk -F '/' '{print $(NF-1)}'|head -1`
release_version=1.2.0
python_list=(
"27"
"35"
"36"
"37"
)
function use_cpu(){
while true
do
read -p "Install the CPU version of PaddlePaddle? (y/n) " cpu_option
cpu_option=`echo $cpu_option | tr 'A-Z' 'a-z'`
if [[ "$cpu_option" == "" || "$cpu_option" == "n" ]];then
echo "Exiting installation..."
exit
else
GPU='cpu'
echo "The CPU version of PaddlePaddle will be installed"
break
fi
done
}
function checkLinuxCUDNN(){
echo
read -n1 -p "Press Enter to continue to the next step..."
echo
while true
do
version_file='/usr/local/cuda/include/cudnn.h'
if [ -f "$version_file" ];then
CUDNN=`cat $version_file | grep CUDNN_MAJOR |awk 'NR==1{print $NF}'`
fi
if [ "$CUDNN" == "" ];then
version_file=`sudo find /usr -name "cudnn.h"|head -1`
if [ "$version_file" != "" ];then
CUDNN=`cat ${version_file} | grep CUDNN_MAJOR -A 2|awk 'NR==1{print $NF}'`
else
echo "Result: cuda/include/cudnn.h was not found in the usual locations"
while true
do
read -p "Please locate cudnn.h and enter its full path here (note: the path must end with the file name cudnn.h): " cudnn_version
echo
if [ "$cudnn_version" == "" ] || [ ! -f "$cudnn_version" ];then
read -p "cuDNN still not found. Enter y to install the CPU version of PaddlePaddle, or n to re-enter the cuDNN path (y/n): " cpu_option
echo
cpu_option=`echo $cpu_option | tr 'A-Z' 'a-z'`
if [ "$cpu_option" == "y" -o "$cpu_option" == "" ];then
GPU='cpu'
break
else
echo "Please try again"
echo
fi
else
CUDNN=`cat $cudnn_version | grep CUDNN_MAJOR |awk 'NR==1{print $NF}'`
echo "Result: found cudnn.h"
break
fi
done
if [ "$GPU" == "cpu" ];then
break
fi
fi
fi
if [ "$CUDA" == "9" -a "$CUDNN" != "7" ];then
echo
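echo "With CUDA 9, only cuDNN 7 is currently supported; cuDNN ${CUDNN} on your machine is not supported. You can download a suitable cuDNN version from the NVIDIA website and press Ctrl+C to exit the installer. Otherwise, you can choose to install the CPU version of PaddlePaddle below"
echo
use_cpu
if [ "$GPU" == "cpu" ];then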
break
fi
fi
if [ "$CUDNN" == 5 ] || [ "$CUDNN" == 7 ];then
echo
echo "Your cuDNN version is: cuDNN $CUDNN"
break
else
echo
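read -n1 -p "Currently supported cuDNN versions are 5 and 7; cuDNN ${CUDNN} on your machine is not supported, so the CPU version of PaddlePaddle will be installed. Press Enter to start the installation"
echo
use_cpu
if [ "$GPU" == "cpu" ];then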
break
fi
fi
done
}
function checkLinuxCUDA(){
while true
do
CUDA=`echo ${CUDA_VERSION}|awk -F "[ .]" '{print $1}'`
if [ "$CUDA" == "" ];then
if [ -f "/usr/local/cuda/version.txt" ];then
CUDA=`cat /usr/local/cuda/version.txt | grep 'CUDA Version'|awk -F '[ .]' '{print $3}'`
tmp_cuda=$CUDA
fi
if [ -f "/usr/local/cuda8/version.txt" ];then
CUDA=`cat /usr/local/cuda8/version.txt | grep 'CUDA Version'|awk -F '[ .]' '{print $3}'`
tmp_cuda8=$CUDA
fi
if [ -f "/usr/local/cuda9/version.txt" ];then
CUDA=`cat /usr/local/cuda9/version.txt | grep 'CUDA Version'|awk -F '[ .]' '{print $3}'`
tmp_cuda9=$CUDA
fi
fi
if [ "$tmp_cuda" != "" ];then
echo "Result: found CUDA $tmp_cuda"
fi
if [ "$tmp_cuda8" != "" ];then
echo "Result: found CUDA $tmp_cuda8"
fi
if [ "$tmp_cuda9" != "" ];then
echo "Result: found CUDA $tmp_cuda9"
fi
if [ "$CUDA" == "" ];then
echo "Result: cuda/version.txt was not found in the usual locations"
while true
do
read -p "Please enter the path to cuda/version.txt: " cuda_version
if [ "$cuda_version" == "" ] || [ ! -f "$cuda_version" ];then
read -p "CUDA still not found. Enter y to install the CPU version of PaddlePaddle, or n to re-enter the CUDA path (y/n): " cpu_option
cpu_option=`echo $cpu_option | tr 'A-Z' 'a-z'`
if [ "$cpu_option" == "y" ] || [ "$cpu_option" == "" ];then
GPU='cpu'
break
else
echo "Please enter the path again..."
fi
else
CUDA=`cat $cuda_version | grep 'CUDA Version'|awk -F '[ .]' '{print $3}'`
if [ "$CUDA" == "" ];then
echo "No CUDA version information was found in version.txt"
else
break
fi
fi
done
if [ "$GPU" == "cpu" ];then
break
fi
fi
if [ "$CUDA" == "8" ] || [ "$CUDA" == "9" ];then
echo "Your CUDA version is ${CUDA}"
break
else
echo "Currently only CUDA 8 and 9 are supported; your CUDA ${CUDA} is not supported, so the CPU version of PaddlePaddle will be installed"
echo
use_cpu
fi
if [ "$GPU" == "cpu" ];then
break
fi
done
}
function checkLinuxMathLibrary(){
while true
do
if [ "$AVX" == "" ];then
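echo "Checking whether your environment supports the AVX instruction set..."
echo
echo "Result: this machine does not support AVX. For environments without AVX we currently provide PaddlePaddle only with the MKL math library, so that version will be installed"
math='mkl'
break
elif [ "$GPU" == "gpu" ];then
math='mkl'
echo "A GPU was detected on this machine; the MKL math library is recommended"
break
else
read -p "Please choose the math library to use:
1: openblas  a high-performance multi-core BLAS library
2: mkl (recommended)  Intel Math Kernel Library
=> Enter 1 or 2. Pressing Enter without input selects [2. mkl] by default. Type your choice here and press Enter: " math
if [ "$math" == "" ];then
math="mkl"
echo "You selected [2]"
break
fi
if [ "$math" == "1" ];then
math=openblas
echo "You selected [1]"
break
elif [ "$math" == "2" ];then
math=mkl
echo "You selected [2]"
break
fi
echo "Invalid input, please try again"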
fi
done
}
function checkLinuxPaddleVersion(){
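read -n1 -p "Press Enter to continue..."
while true
do
read -p "
1. Development build: tracks the develop branch on GitHub. Choose this if you are developing PaddlePaddle or want its newest features.
2. Stable release (recommended): choose this unless you have specific development needs. The latest release is ${release_version}.
=> Enter 1 or 2. Pressing Enter without input selects [2. Stable release] by default. Type your choice here and press Enter: " paddle_version
if [ "$paddle_version" == "" ];then
paddle_version="2"
echo "You selected [2]; release-${release_version} will be installed"
break
fi
if [ "$paddle_version" == "1" ];then
echo "You selected [1]; the development build will be installed"
break
elif [ "$paddle_version" == "2" ];then
echo "You selected [2]; release-${release_version} will be installed"
break
fi
echo "Invalid input, please try again"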
done
}
function checkLinuxPip(){
while true
do
echo "Please enter the path of the pip you want to use (you can find it by running 'which pip' in another terminal):"
read -p "" pip_path
if [ "$pip_path" == "" -o ! -f "$pip_path" ];then
echo "Result: pip not found, please try again"
continue
fi
python_version=`$pip_path --version|awk -F "[ |)]" '{print $6}'|sed 's#\.##g'`
if [ "$python_version" == "27" ];then
uncode=`python -c "import pip._internal;print(pip._internal.pep425tags.get_supported())"|grep "cp27mu"`
if [[ "$uncode" == "" ]];then
uncode=
else
uncode=u
fi
fi
if [ "$python_version" == "" ];then
echo "Result: pip not found, please try again"
else
version_list=`echo "${python_list[@]}" | grep "$python_version" `
if [ "$version_list" != "" ];then
echo "Result: found Python ${python_version}"
break
else
echo "Result: no usable pip found. Only Python 2.7/3.5/3.6/3.7 and their corresponding pip are supported; please try again, or press Ctrl+C to exit"
fi
fi
done
}
function checkLinuxAVX(){
while true
do
if [[ "$AVX" != "" ]];then
AVX="avx"
break
else
if [ "$CUDA" == "8" -a "$CUDNN" == "7" ] || [ "$GPU" == "cpu" ];then
AVX="noavx"
break
else
echo "Step 6. Checking for AVX support"
echo
echo "Result: AVX was not found. Without AVX we only provide the CPU package or the GPU package built for CUDA 8 with cuDNN 7"
break
fi
fi
done
}
function PipLinuxInstall(){
wheel_cpu_release="http://paddle-wheel.bj.bcebos.com/${release_version}-${GPU}-${AVX}-${math}/paddlepaddle-${release_version}-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl"
wheel_gpu_release="http://paddle-wheel.bj.bcebos.com/${release_version}-gpu-cuda${CUDA}-cudnn${CUDNN}-${AVX}-${math}/paddlepaddle_gpu-${release_version}.post${CUDA}${CUDNN}-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl"
wheel_gpu_release_noavx="http://paddle-wheel.bj.bcebos.com/${release_version}-gpu-cuda${CUDA}-cudnn${CUDNN}-${AVX}-${math}/paddlepaddle_gpu-${release_version}-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl"
wheel_cpu_develop="http://paddle-wheel.bj.bcebos.com/latest-cpu-${AVX}-${math}/paddlepaddle-latest-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl"
wheel_gpu_develop="http://paddle-wheel.bj.bcebos.com/latest-gpu-cuda${CUDA}-cudnn${CUDNN}-${AVX}-${math}/paddlepaddle_gpu-latest-cp${python_version}-cp${python_version}m${uncode}-linux_x86_64.whl"
if [[ "$paddle_version" == "2" ]];then
if [[ "$GPU" == "gpu" ]];then
if [[ ${AVX} == "avx" ]];then
rm -rf `echo $wheel_gpu_release|awk -F '/' '{print $NF}'`
wget -q $wheel_gpu_release
if [ "$?" == "0" ];then
$pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_gpu_release
else
echo "Failed to download the paddlepaddle wheel package"
exit 1
fi
else
rm -rf `echo $wheel_gpu_release_noavx|awk -F '/' '{print $NF}'`
wget -q $wheel_gpu_release_noavx
if [ "$?" == "0" ];then
$pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_gpu_release_noavx
else
echo "Failed to download the paddlepaddle wheel package"
exit 1
fi
fi
else
rm -rf `echo $wheel_cpu_release|awk -F '/' '{print $NF}'`
wget -q $wheel_cpu_release
if [ "$?" == "0" ];then
$pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_cpu_release
else
echo "Failed to download the paddlepaddle wheel package"
exit 1
fi
fi
else
if [[ "$GPU" == "gpu" ]];then
rm -rf `echo $wheel_gpu_develop|awk -F '/' '{print $NF}'`
wget -q $wheel_gpu_develop
if [ "$?" == "0" ];then
$pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_gpu_develop
else
echo "Failed to download the paddlepaddle wheel package"
exit 1
fi
else
rm -rf `echo $wheel_cpu_develop|awk -F '/' '{print $NF}'`
wget -q $wheel_cpu_develop
if [ "$?" == "0" ];then
$pip_path install --user -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com $wheel_cpu_develop
else
echo "Failed to download the paddlepaddle wheel package"
exit 1
fi
fi
fi
}
function checkLinuxGPU(){
read -n1 -p "About to check whether this machine has a GPU; press Enter to continue..."
echo
AVX=`cat /proc/cpuinfo |grep avx|tail -1|grep avx`
which nvidia-smi >/dev/null 2>&1
if [ "$?" != "0" ];then
GPU='cpu'
echo "No GPU was found on this machine, or PaddlePaddle does not yet support this GPU model"
else
GPU='gpu'
echo "A GPU was found on this machine; checking the CUDA and cuDNN versions next..."
echo
fi
if [ "$GPU" == 'gpu' ];then
checkLinuxCUDA
checkLinuxCUDNN
fi
}
function linux(){
gpu_list=(
"GeForce 410M"
"GeForce 610M"
"GeForce 705M"
"GeForce 710M"
"GeForce 800M"
"GeForce 820M"
"GeForce 830M"
"GeForce 840M"
"GeForce 910M"
"GeForce 920M"
"GeForce 930M"
"GeForce 940M"
"GeForce GT 415M"
"GeForce GT 420M"
"GeForce GT 430"
"GeForce GT 435M"
"GeForce GT 440"
"GeForce GT 445M"
"GeForce GT 520"
"GeForce GT 520M"
"GeForce GT 520MX"
"GeForce GT 525M"
"GeForce GT 540M"
"GeForce GT 550M"
"GeForce GT 555M"
"GeForce GT 610"
"GeForce GT 620"
"GeForce GT 620M"
"GeForce GT 625M"
"GeForce GT 630"
"GeForce GT 630M"
"GeForce GT 635M"
"GeForce GT 640"
"GeForce GT 640 (GDDR5)"
"GeForce GT 640M"
"GeForce GT 640M LE"
"GeForce GT 645M"
"GeForce GT 650M"
"GeForce GT 705"
"GeForce GT 720"
"GeForce GT 720M"
"GeForce GT 730"
"GeForce GT 730M"
"GeForce GT 735M"
"GeForce GT 740"
"GeForce GT 740M"
"GeForce GT 745M"
"GeForce GT 750M"
"GeForce GTS 450"
"GeForce GTX 1050"
"GeForce GTX 1060"
"GeForce GTX 1070"
"GeForce GTX 1080"
"GeForce GTX 1080 Ti"
"GeForce GTX 460"
"GeForce GTX 460M"
"GeForce GTX 465"
"GeForce GTX 470"
"GeForce GTX 470M"
"GeForce GTX 480"
"GeForce GTX 480M"
"GeForce GTX 485M"
"GeForce GTX 550 Ti"
"GeForce GTX 560M"
"GeForce GTX 560 Ti"
"GeForce GTX 570"
"GeForce GTX 570M"
"GeForce GTX 580"
"GeForce GTX 580M"
"GeForce GTX 590"
"GeForce GTX 650"
"GeForce GTX 650 Ti"
"GeForce GTX 650 Ti BOOST"
"GeForce GTX 660"
"GeForce GTX 660M"
"GeForce GTX 660 Ti"
"GeForce GTX 670"
"GeForce GTX 670M"
"GeForce GTX 670MX"
"GeForce GTX 675M"
"GeForce GTX 675MX"
"GeForce GTX 680"
"GeForce GTX 680M"
"GeForce GTX 680MX"
"GeForce GTX 690"
"GeForce GTX 750"
"GeForce GTX 750 Ti"
"GeForce GTX 760"
"GeForce GTX 760M"
"GeForce GTX 765M"
"GeForce GTX 770"
"GeForce GTX 770M"
"GeForce GTX 780"
"GeForce GTX 780M"
"GeForce GTX 780 Ti"
"GeForce GTX 850M"
"GeForce GTX 860M"
"GeForce GTX 870M"
"GeForce GTX 880M"
"GeForce GTX 950"
"GeForce GTX 950M"
"GeForce GTX 960"
"GeForce GTX 960M"
"GeForce GTX 965M"
"GeForce GTX 970"
"GeForce GTX 970M"
"GeForce GTX 980"
"GeForce GTX 980M"
"GeForce GTX 980 Ti"
"GeForce GTX TITAN"
"GeForce GTX TITAN Black"
"GeForce GTX TITAN X"
"GeForce GTX TITAN Z"
"Jetson TK1"
"Jetson TX1"
"Jetson TX2"
"Mobile Products"
"NVIDIA NVS 310"
"NVIDIA NVS 315"
"NVIDIA NVS 510"
"NVIDIA NVS 810"
"NVIDIA TITAN V"
"NVIDIA TITAN X"
"NVIDIA TITAN Xp"
"NVS 4200M"
"NVS 5200M"
"NVS 5400M"
"Quadro 410"
"Quadro GP100"
"Quadro K1100M"
"Quadro K1200"
"Quadro K2000"
"Quadro K2000D"
"Quadro K2100M"
"Quadro K2200"
"Quadro K2200M"
"Quadro K3100M"
"Quadro K4000"
"Quadro K4100M"
"Quadro K420"
"Quadro K4200"
"Quadro K4200M"
"Quadro K5000"
"Quadro K500M"
"Quadro K5100M"
"Quadro K510M"
"Quadro K5200"
"Quadro K5200M"
"Quadro K600"
"Quadro K6000"
"Quadro K6000M"
"Quadro K610M"
"Quadro K620"
"Quadro K620M"
"Quadro M1000M"
"Quadro M1200"
"Quadro M2000"
"Quadro M2000M"
"Quadro M2200"
"Quadro M3000M"
"Quadro M4000"
"Quadro M4000M"
"Quadro M5000"
"Quadro M5000M"
"Quadro M500M"
"Quadro M520"
"Quadro M5500M"
"Quadro M6000"
"Quadro M6000 24GB"
"Quadro M600M"
"Quadro M620"
"Quadro Mobile Products"
"Quadro P1000"
"Quadro P2000"
"Quadro P3000"
"Quadro P400"
"Quadro P4000"
"Quadro P5000"
"Quadro P600"
"Quadro P6000"
"Quadro Plex 7000"
"Tegra K1"
"Tegra X1"
"Tesla C2050/C2070"
"Tesla C2075"
"Tesla Data Center Products"
"Tesla K10"
"Tesla K20"
"Tesla K40"
"Tesla K80"
"Tesla M40"
"Tesla M60"
"Tesla P100"
"Tesla P4"
"Tesla P40"
"Tesla V100")
echo "Step 2. Check the GPU model and CUDA/cuDNN versions"
echo
checkLinuxGPU
echo
echo "Step 3. Check the math library"
echo
checkLinuxMathLibrary
echo
echo "Step 4. Choose the PaddlePaddle version to install"
echo
checkLinuxPaddleVersion
echo
echo "Step 5. Check the pip version"
echo
checkLinuxPip
echo
checkLinuxAVX
echo "*********************2. Start installation*****************************"
PipLinuxInstall
}
function checkMacPython2(){
while true
do
read -p "
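=> Python 2 was not found in the usual locations. Press ctrl+c to exit the installer, then install Python 2 with brew or from python.org (Python 2.7.15 or later is required)
To use a custom Python path, enter it here:" python_root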
echo
python_version=`$python_root --version 2>&1 1>&1`
if [ $? == "0" ];then
:
else
python_version=""
fi
check_python=`echo $python_version | grep "Python 2"`
if [ "$python_version" == "" ] || [ "$python_root" == "/usr/bin/python" -a "$python_version" == "Python 2.7.10" ] ;then
python_version=""
elif [ -n "$check_python" ];then
while true
do
read -p "
=> Found $python_version in your environment. Enter y to use this version, or n to specify a custom Python path. Enter y/n and press Enter: " use_python
echo
use_python=`echo $use_python | tr 'A-Z' 'a-z'`
if [ "$use_python" == "y" ]||[ "$use_python" == "" ];then
use_python="y"
break
elif [ "$use_python" == "n" ];then
python_root=""
break
else
echo "Invalid input, please enter y or n"
fi
done
if [ "$use_python" == "y" ];then
break
fi
else
echo "The Python you entered is not Python 2"
python_version=""
fi
done
}
function checkMacPython3(){
while true
do
read -p "
=> Python 3 was not found in the usual locations. Press ctrl+c to exit the installer, then install Python 3 with brew or from python.org
To use a custom Python path, enter it here:" python_root
python_version=`$python_root --version 2>&1 1>&1`
if [ $? == "0" ];then
:
else
python_version=""
fi
check_python=`echo $python_version | grep "Python 3"`
if [ "$python_version" == "" ] || [ "$python_root" == "/usr/bin/python" -a "$python_version" == "Python 2.7.10" ] ;then
python_version=""
elif [ -n "$check_python" ] ;then
while true
do
read -p "
=> Found $python_version in your environment. Enter y to use this version, or n to specify a custom Python path. Enter y/n and press Enter: " use_python
echo
use_python=`echo $use_python | tr 'A-Z' 'a-z'`
if [ "$use_python" == "y" ]||[ "$use_python" == "" ];then
use_python="y"
break
elif [ "$use_python" == "n" ];then
python_root=""
break
else
echo "Invalid input, please enter y or n"
fi
done
if [ "$use_python" == "y" ];then
break
fi
else
echo "The Python you entered is not Python 3"
python_version=""
fi
done
}
function checkMacPaddleVersion(){
while true
do
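read -n1 -p "Step 2. Choose the PaddlePaddle version. Press Enter to continue..."
echo
read -p "
1. Development version: tracks the develop branch on GitHub. Choose this if you are developing PaddlePaddle or want its latest features
2. Stable version (recommended): choose this unless you have special development needs. The latest release is ${release_version}
=> Enter 1 or 2 and press Enter. Any other input, or pressing Enter alone, selects [ 2. Stable version ] by default:" paddle_version
if [ "$paddle_version" == "1" ]||[ "$paddle_version" == "2" ];then
echo
echo "You selected option ["$paddle_version"]"
echo
break
else
paddle_version="2"
echo
echo "You selected option [2]"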
echo
break
fi
done
}
function checkMacPythonVersion(){
while true
do
read -n1 -p "Step 3. Choose the Python version. Press Enter to continue..."
read -p "
2. Use Python 2.x
3. Use Python 3.x
=> Enter 2 or 3 and press Enter. Pressing Enter alone selects [Python 2] by default:" python_V
echo
if [ "$python_V" == "" ];then
python_V="2"
fi
echo "You selected option ["$python_V"]. Looking for a matching Python installation..."
echo
if [ "$python_V" == "2" ];then
python_root=`which python2.7`
if [ "$python_root" == "" ];then
python_root=`which python`
fi
python_version=`$python_root --version 2>&1 1>&1`
if [ $? == "0" ];then
:
else
python_version=""
fi
if [ "$python_root" == "" ]||[ "$python_root" == "/usr/bin/python" -a "$python_version" == "Python 2.7.10" ]||[ "$python_root" == "/usr/bin/python2.7" -a "$python_version" == "Python 2.7.10" ];then
checkMacPython2
fi
while true
do
read -p "
=> Found $python_version in your environment. Enter y to use this version, or n to specify a custom Python path. Enter y/n and press Enter:" use_python
echo
use_python=`echo $use_python | tr 'A-Z' 'a-z'`
if [ "$use_python" == "y" ]||[ "$use_python" == "" ];then
break
elif [ "$use_python" == "n" ];then
python_root=""
checkMacPython2
break
else
echo "Invalid input, please enter y or n"
fi
done
elif [ "$python_V" == "3" ];then
python_root=`which python3`
python_version=`$python_root --version 2>&1 1>&1`
if [ $? == "0" ];then
:
else
python_version=""
fi
if [ "$python_root" == "" ]||[ "$python_root" == "/usr/bin/python" -a "$python_version" == "Python 2.7.10" ];then
checkMacPython3
fi
while true
do
read -p "
=> Found $python_version in your environment. Enter y to use this version, or n to specify a custom Python path. Enter y/n and press Enter:" use_python
echo
use_python=`echo $use_python | tr 'A-Z' 'a-z'`
if [ "$use_python" == "y" ]||[ "$use_python" == "" ];then
break
elif [ "$use_python" == "n" ];then
checkMacPython3
break
else
echo "Invalid input, please enter y or n"
fi
done
else
:
fi
if [ "$python_V" == "2" ]||[ "$python_V" == "3" ];then
python_brief_version=`$python_root -m pip -V |awk -F "[ |)]" '{print $6}'|sed 's#\.##g'`
if [[ $python_brief_version == "27" ]];then
uncode=`$python_root -c "import pip._internal;print(pip._internal.pep425tags.get_supported())"|grep "cp27"`
if [[ $uncode == "" ]];then
uncode="mu"
else
uncode="m"
fi
fi
version_list=`echo "${python_list[@]}" | grep "$python_brief_version" `
if [ "$version_list" != "" ];then
break
else
echo "No usable pip or pip3 was found. PaddlePaddle currently supports Python 2.7/3.5/3.6/3.7 with the corresponding pip. Please try again, or press ctrl+c to exit"
fi
else
echo "Invalid input, please try again"
fi
done
}
function checkMacAVX(){
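read -n1 -p "Step 4. Check whether your Mac supports the AVX instruction set. Press Enter to continue..."
echo
if [[ $AVX != "" ]];then
AVX="avx"
echo "Result: supported"
else
read -n1 -p "Result: not supported. Sorry, PaddlePaddle does not provide no_avx packages for macOS; you can install the no_avx version of PaddlePaddle on Linux instead. Press Enter to exit..."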
exit
fi
echo
}
function checkMacGPU(){
read -n1 -p "Step 5. Choose the CPU/GPU version. Press Enter to continue..."
echo
echo "GPU builds of PaddlePaddle are not yet available for macOS; the CPU version will be installed"
GPU=cpu
echo
}
function macos() {
path='http://paddlepaddle.org/download?url='
AVX=`sysctl -a | grep cpu | grep AVX1.0 | tail -1 | grep AVX`
while true
do
checkMacPaddleVersion
checkMacPythonVersion
checkMacAVX
checkMacGPU
echo "*********************2. Start installation*****************************"
echo
read -n1 -p "About to download and install PaddlePaddle. Press Enter to continue..."
echo
if [[ $paddle_version == "2" ]];then
$python_root -m pip install paddlepaddle
if [ $? == "0" ];then
echo "Installation succeeded. You can run ${python_root} to start a Python interpreter with PaddlePaddle installed"
break
else
rm $whl_cpu_release
echo "Failed to install PaddlePaddle. Try a different Python path, or press ctrl+c to exit and check that the pip for your Python and its package index are reachable"
echo""
echo "=========================================================================================="
echo""
exit 1
fi
else
if [ -f $whl_cpu_develop ];then
$python_root -m pip install $whl_cpu_develop
if [ $? == "0" ];then
rm -rf $whl_cpu_develop
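echo "Installation succeeded! Tip: you can run ${python_root} to start a Python interpreter with PaddlePaddle installed"
break
else
echo "Failed to install PaddlePaddle. Try a different Python path, or press ctrl+c to exit and check that the pip for your Python and its package index are reachable"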
echo""
echo "=========================================================================================="
echo""
exit 1
fi
else
wget ${path}$whl_cpu_develop -O $whl_cpu_develop
if [ $? == "0" ];then
$python_root -m pip install $whl_cpu_develop
if [ $? == "0" ];then
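rm -f $whl_cpu_develop
echo "Installation succeeded. You can run ${python_root} to start a Python interpreter with PaddlePaddle installed"
break
else
rm -f $whl_cpu_develop
echo "Failed to install PaddlePaddle. Try a different Python path, or press ctrl+c to exit and check that the pip for your Python and its package index are reachable"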
echo""
echo "=========================================================================================="
echo""
exit 1
fi
else
rm $whl_cpu_develop
echo "Failed to install PaddlePaddle. Check your network connection and whether wget is installed, or press ctrl+c to exit and report the problem at https://github.com/PaddlePaddle/Paddle/issues"
echo""
echo "=========================================================================================="
echo""
exit 1
fi
fi
fi
done
}
function main() {
echo "*********************************"
echo "Welcome to the PaddlePaddle quick installation script"
echo "*********************************"
echo
echo "If you run into any problems during installation, please report them at https://github.com/PaddlePaddle/Paddle/issues and our team will help you"
echo
echo "This script installs PaddlePaddle on Linux or macOS in two parts: 1) preparation and 2) installation"
echo
read -n1 -p "Press Enter to continue..."
echo
echo
echo "*********************1. Preparation*****************************"
echo
echo "Step 1. Detecting your operating system..."
echo
SYSTEM=`uname -s`
if [ "$SYSTEM" == "Darwin" ];then
echo "Your system: macOS"
echo
macos
else
echo "Your system: Linux"
echo
OS=`cat /etc/issue|awk 'NR==1 {print $1}'`
if [ "$OS" == "\S" ] || [ "$OS" == "CentOS" ] || [ "$OS" == "Ubuntu" ];then
linux
else
echo "Your system is not supported by this installer. To install PaddlePaddle on Windows, please refer to the Windows installation guide on the PaddlePaddle website"
fi
fi
}
main
@@ -22,6 +22,8 @@ from . import op_frequence
from .op_frequence import *
from . import quantize
from .quantize import *
from . import int8_inference
from .int8_inference import *
from . import reader
from .reader import *
from . import slim
@@ -34,6 +36,7 @@ __all__ += decoder.__all__
__all__ += memory_usage_calc.__all__
__all__ += op_frequence.__all__
__all__ += quantize.__all__
__all__ += int8_inference.__all__
__all__ += reader.__all__
__all__ += slim.__all__
__all__ += utils.__all__
# Offline INT8 Calibration Tool
PaddlePaddle supports offline INT8 calibration to accelerate inference. This document explains how to generate and run an INT8 model with calibration, and reports ResNet-50 and MobileNet-V1 accuracy results.
## 0. Prerequisite
You need the PaddlePaddle Python package, version 1.3 or later (for example, `pip install paddlepaddle==1.3`).
## 1. How to generate an INT8 model
See the unit test [test_calibration.py](../tests/test_calibration.py) for a complete example. There are three steps:
* Construct a calibration object.
```python
import paddle.fluid.contrib.int8_inference.utility as int8_utility

calibrator = int8_utility.Calibrator(  # Step 1
    program=infer_program,  # required, FP32 program
    pretrained_model=model_path,  # required, FP32 pretrained model
    algo=algo,  # required, calibration algorithm: max (default) or KL (Kullback–Leibler divergence)
    exe=exe,  # required, executor
    output=int8_model,  # required, directory where the INT8 model is saved
    feed_var_names=feed_dict,  # required, names of the feed variables
    fetch_list=fetch_targets)  # required, fetch targets
```
* Call `calibrator.sample_data()` after each executor run.
```python
_, acc1, _ = exe.run(
program,
feed={feed_dict[0]: image,
feed_dict[1]: label},
fetch_list=fetch_targets)
calibrator.sample_data() # Step 2
```
* After sampling for the desired number of iterations (e.g., 50), call `calibrator.save_int8_model()`.
```python
calibrator.save_int8_model() # Step 3
```
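To make the `max` calibration option more concrete, here is a minimal, hypothetical sketch in plain Python of max-abs calibration, assuming a symmetric INT8 range of [-127, 127]: each `sample_data()` call stores one batch of FP32 values per variable, and the scale maps the largest observed magnitude onto 127. `MaxAbsCalibrator` and `quantize` are illustrative names, not part of the PaddlePaddle API, and the real `Calibrator` may compute its scales differently.

```python
# Hypothetical sketch of max-abs ("max") calibration; not PaddlePaddle's implementation.


class MaxAbsCalibrator(object):
    """Collects sampled FP32 values per variable and derives INT8 scales."""

    def __init__(self):
        self._sampling_data = {}  # variable name -> list of sampled batches

    def sample_data(self, name, values):
        # Store one batch of FP32 values per call, keyed by variable name.
        self._sampling_data.setdefault(name, []).append(list(values))

    def scales(self):
        # Map the largest magnitude seen across all batches onto 127.
        result = {}
        for name, batches in self._sampling_data.items():
            max_abs = max(abs(v) for batch in batches for v in batch)
            result[name] = 127.0 / max_abs if max_abs > 0 else 1.0
        return result


def quantize(values, scale):
    # Round to the nearest integer and saturate to [-127, 127].
    return [max(-127, min(127, int(round(v * scale)))) for v in values]


calib = MaxAbsCalibrator()
calib.sample_data("conv1_out", [0.5, -2.0, 1.0])  # batch 1
calib.sample_data("conv1_out", [4.0, -1.0])       # batch 2
scale = calib.scales()["conv1_out"]
print(scale)                              # 31.75
print(quantize([4.0, -3.0, 0.1], scale))  # [127, -95, 3]
```

The KL option typically searches for a clipping threshold below the observed maximum that minimizes the Kullback–Leibler divergence between the FP32 and quantized value distributions; it is not shown here.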
## 2. How to run an INT8 model
You can load the INT8 model with the `load_inference_model` [API](https://github.com/PaddlePaddle/Paddle/blob/8b50ad80ff6934512d3959947ac1e71ea3fb9ea3/python/paddle/fluid/io.py#L991) and run INT8 inference the same way as [FP32](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/eval.py "FP32") inference.
```python
[infer_program, feed_dict,
fetch_targets] = fluid.io.load_inference_model(model_path, exe)
```
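As a rough illustration of the arithmetic behind INT8 inference, the sketch below quantizes an activation vector and a weight vector with per-tensor scales, multiplies and accumulates them as integers, and divides by both scales to return to FP32 units. It is plain Python with hypothetical helper names, assuming symmetric max-abs scales; it is not PaddlePaddle's INT8 kernel code, which performs the integer accumulation in 32-bit registers.

```python
# Hypothetical sketch of INT8 dot-product arithmetic; not PaddlePaddle's INT8 kernels.


def quantize(values, scale):
    # Round to nearest and saturate to the symmetric INT8 range [-127, 127].
    return [max(-127, min(127, int(round(v * scale)))) for v in values]


def int8_dot(x, w, x_scale, w_scale):
    qx = quantize(x, x_scale)
    qw = quantize(w, w_scale)
    # Integer multiply-accumulate (Python ints stand in for int32 accumulators).
    acc = sum(a * b for a, b in zip(qx, qw))
    # Dequantize by dividing out both scales.
    return acc / (x_scale * w_scale)


x = [0.5, -1.0, 2.0]   # activations; max |x| = 2.0
w = [1.0, 0.25, -0.5]  # weights; max |w| = 1.0
fp32 = sum(a * b for a, b in zip(x, w))
approx = int8_dot(x, w, 127.0 / 2.0, 127.0 / 1.0)
print(fp32)    # -0.75
print(approx)  # close to -0.75
```

The small gap between `approx` and `fp32` is the rounding error that the accuracy comparison below measures at the model level.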
## 3. Result
We report accuracy measured on an [Intel® Xeon® Gold 6148 Processor](https://ark.intel.com/products/120489/Intel-Xeon-Gold-6148-Processor-27-5M-Cache-2-40-GHz- "Intel® Xeon® Gold 6148 Processor") (also known as Intel® Xeon® Skylake 6148).
| Model | Dataset | FP32 Accuracy | INT8 Accuracy | Accuracy Diff |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| ResNet-50 | Small | 72.00% | 72.00% | 0.00% |
| MobileNet-V1 | Small | 62.00% | 62.00% | 0.00% |
| ResNet-50 | Full ImageNet Val | 76.63% | 76.17% | 0.46% |
| MobileNet-V1 | Full ImageNet Val | 70.78% | 70.49% | 0.29% |
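The Accuracy Diff column is the FP32 accuracy minus the INT8 accuracy, in percentage points. This short Python check recomputes it from the table values above:

```python
# FP32 and INT8 accuracy (%) copied from the table above.
results = [
    ("ResNet-50", "Small", 72.00, 72.00),
    ("MobileNet-V1", "Small", 62.00, 62.00),
    ("ResNet-50", "Full ImageNet Val", 76.63, 76.17),
    ("MobileNet-V1", "Full ImageNet Val", 70.78, 70.49),
]

# Accuracy Diff = FP32 accuracy - INT8 accuracy, rounded to two decimals.
diffs = {}
for model, dataset, fp32, int8 in results:
    diffs[(model, dataset)] = round(fp32 - int8, 2)
    print("%-12s %-17s %.2f%%" % (model, dataset, diffs[(model, dataset)]))
```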
Note that [Small](http://paddle-inference-dist.cdn.bcebos.com/int8/calibration_test_data.tar.gz "Small") is a subset of the [full ImageNet validation dataset](http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_val.tar "full ImageNet validation dataset").
Notes:
* Measuring accuracy requires a model that takes `label` as an input.
* The theoretical INT8 speedup on Intel® Xeon® Skylake servers is about 1.33X (see the statement `This allows for 4x more input at the cost of 3x more instructions or 33.33% more compute` in [Reference](https://software.intel.com/en-us/articles/lower-numerical-precision-deep-learning-inference-and-training "Reference")).
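The ~1.33X figure follows directly from the quoted statement: processing 4x as much input at the cost of 3x as many instructions gives

$$
\text{speedup} \approx \frac{4}{3} \approx 1.33, \qquad \frac{4}{3} - 1 \approx 33.33\%\ \text{more effective compute.}
$$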
## 4. How to reproduce the results
* Small dataset
```bash
python python/paddle/fluid/contrib/tests/test_calibration.py
```
* Full dataset
```bash
DATASET=full python python/paddle/fluid/contrib/tests/test_calibration.py
```
@@ -11,3 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from . import utility
from .utility import *
__all__ = utility.__all__
@@ -11,11 +11,15 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid.core as core
from paddle.fluid import core
import numpy as np
import math
import os
import paddle.fluid as fluid
from paddle.fluid.executor import global_scope
from paddle.fluid import io
__all__ = ['Calibrator']
class Calibrator(object):
@@ -76,8 +80,7 @@ class Calibrator(object):
'''
for i in self.sampling_program.list_vars():
if i.name in self.sampling_vars:
np_data = np.array(fluid.global_scope().find_var(i.name)
.get_tensor())
np_data = np.array(global_scope().find_var(i.name).get_tensor())
if i.name not in self._sampling_data:
self._sampling_data[i.name] = []
self._sampling_data[i.name].append(np_data)
@@ -86,9 +89,9 @@ class Calibrator(object):
'''
Save the quantized model to the disk.
'''
fluid.io.save_inference_model(self.output, self.feed_var_names,
self.fetch_list, self.exe,
self.sampling_program)
io.save_inference_model(self.output, self.feed_var_names,
self.fetch_list, self.exe,
self.sampling_program)
def __display_debug(self):
if self.debug:
......
@@ -24,8 +24,7 @@ import contextlib
from paddle.dataset.common import download
from PIL import Image, ImageEnhance
import math
sys.path.append('..')
import int8_inference.utility as int8_utility
import paddle.fluid.contrib.int8_inference.utility as int8_utility
random.seed(0)
np.random.seed(0)
......
@@ -21,9 +21,10 @@ import shutil
import six
from functools import reduce
from paddle.fluid import layers
from paddle.fluid.executor import Executor
from paddle.fluid.evaluator import Evaluator
from paddle.fluid.framework import Program, Parameter, default_main_program, default_startup_program, Variable
from paddle.fluid.framework import Program, Parameter, default_main_program, default_startup_program, Variable, program_guard
from . import core
__all__ = [
@@ -931,6 +932,17 @@ def save_inference_model(dirname,
if main_program is None:
main_program = default_main_program()
# Work around a bug where an activation op whose output is a target var gets pruned.
# The extra scale op added here may affect inference performance.
# TODO(Superjomn) add an IR pass to remove 1-scale op.
with program_guard(main_program):
uniq_target_vars = []
for var in target_vars:
if isinstance(var, Variable):
var1 = layers.scale(var, 1.)
uniq_target_vars.append(var1)
target_vars = uniq_target_vars
# when a pserver and a trainer running on the same machine, mkdir may conflict
try:
os.makedirs(dirname)
......
@@ -49,6 +49,7 @@ __all__ = [
'box_coder',
'polygon_box_transform',
'yolov3_loss',
'box_clip',
'multiclass_nms',
]
@@ -508,13 +509,10 @@ def yolov3_loss(x,
gtbox,
gtlabel,
anchors,
anchor_mask,
class_num,
ignore_thresh,
loss_weight_xy=None,
loss_weight_wh=None,
loss_weight_conf_target=None,
loss_weight_conf_notarget=None,
loss_weight_class=None,
downsample_ratio,
name=None):
"""
${comment}
@@ -526,16 +524,13 @@ def yolov3_loss(x,
and x, y, w, h should be relative value of input image.
N is the batch number and B is the max box number in
an image.
gtlabel (Variable): class id of ground truth boxes, shoud be ins shape
gtlabel (Variable): class id of ground truth boxes, should be in shape
of [N, B].
anchors (list|tuple): ${anchors_comment}
anchor_mask (list|tuple): ${anchor_mask_comment}
class_num (int): ${class_num_comment}
ignore_thresh (float): ${ignore_thresh_comment}
loss_weight_xy (float|None): ${loss_weight_xy_comment}
loss_weight_wh (float|None): ${loss_weight_wh_comment}
loss_weight_conf_target (float|None): ${loss_weight_conf_target_comment}
loss_weight_conf_notarget (float|None): ${loss_weight_conf_notarget_comment}
loss_weight_class (float|None): ${loss_weight_class_comment}
downsample_ratio (int): ${downsample_ratio_comment}
name (string): the name of yolov3 loss
Returns:
@@ -555,9 +550,10 @@ def yolov3_loss(x,
x = fluid.layers.data(name='x', shape=[255, 13, 13], dtype='float32')
gtbox = fluid.layers.data(name='gtbox', shape=[6, 5], dtype='float32')
gtlabel = fluid.layers.data(name='gtlabel', shape=[6, 1], dtype='int32')
anchors = [10, 13, 16, 30, 33, 23]
loss = fluid.layers.yolov3_loss(x=x, gtbox=gtbox, class_num=80
anchors=anchors, ignore_thresh=0.5)
anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
anchor_mask = [0, 1, 2]
loss = fluid.layers.yolov3_loss(x=x, gtbox=gtbox, gtlabel=gtlabel, anchors=anchors,
anchor_mask=anchor_mask, class_num=80,
ignore_thresh=0.5, downsample_ratio=32)
"""
helper = LayerHelper('yolov3_loss', **locals())
@@ -569,6 +565,8 @@ def yolov3_loss(x,
raise TypeError("Input gtlabel of yolov3_loss must be Variable")
if not isinstance(anchors, list) and not isinstance(anchors, tuple):
raise TypeError("Attr anchors of yolov3_loss must be list or tuple")
if not isinstance(anchor_mask, list) and not isinstance(anchor_mask, tuple):
raise TypeError("Attr anchor_mask of yolov3_loss must be list or tuple")
if not isinstance(class_num, int):
raise TypeError("Attr class_num of yolov3_loss must be an integer")
if not isinstance(ignore_thresh, float):
@@ -581,31 +579,29 @@ def yolov3_loss(x,
loss = helper.create_variable(
name=name, dtype=x.dtype, persistable=False)
objectness_mask = helper.create_variable_for_type_inference(dtype='int32')
gt_match_mask = helper.create_variable_for_type_inference(dtype='int32')
attrs = {
"anchors": anchors,
"anchor_mask": anchor_mask,
"class_num": class_num,
"ignore_thresh": ignore_thresh,
"downsample_ratio": downsample_ratio,
}
if loss_weight_xy is not None and isinstance(loss_weight_xy, float):
self.attrs['loss_weight_xy'] = loss_weight_xy
if loss_weight_wh is not None and isinstance(loss_weight_wh, float):
self.attrs['loss_weight_wh'] = loss_weight_wh
if loss_weight_conf_target is not None and isinstance(
loss_weight_conf_target, float):
self.attrs['loss_weight_conf_target'] = loss_weight_conf_target
if loss_weight_conf_notarget is not None and isinstance(
loss_weight_conf_notarget, float):
self.attrs['loss_weight_conf_notarget'] = loss_weight_conf_notarget
if loss_weight_class is not None and isinstance(loss_weight_class, float):
self.attrs['loss_weight_class'] = loss_weight_class
helper.append_op(
type='yolov3_loss',
inputs={"X": x,
"GTBox": gtbox,
"GTLabel": gtlabel},
outputs={'Loss': loss},
inputs={
"X": x,
"GTBox": gtbox,
"GTLabel": gtlabel,
},
outputs={
'Loss': loss,
'ObjectnessMask': objectness_mask,
'GTMatchMask': gt_match_mask
},
attrs=attrs)
return loss
@@ -2060,6 +2056,54 @@ def generate_proposals(scores,
return rpn_rois, rpn_roi_probs
def box_clip(input, im_info, name=None):
"""
Clip boxes to the image size given by im_info.
For each input box, the formula is as follows:
.. code-block:: text
xmin = max(min(xmin, im_w - 1), 0)
ymin = max(min(ymin, im_h - 1), 0)
xmax = max(min(xmax, im_w - 1), 0)
ymax = max(min(ymax, im_h - 1), 0)
where im_w and im_h are computed from im_info:
.. code-block:: text
im_h = round(height / scale)
im_w = round(width / scale)
Args:
input (Variable): The input boxes; the last dimension is 4.
im_info (Variable): The image information with shape [N, 3] and
layout (height, width, scale). height and width
are the input size, and scale is the ratio of the
input size to the original size.
name (str): The name of this layer. It is optional.
Returns:
Variable: The clipped tensor variable.
Examples:
.. code-block:: python
boxes = fluid.layers.data(
name='data', shape=[8, 4], dtype='float32', lod_level=1)
im_info = fluid.layers.data(name='im_info', shape=[3])
out = fluid.layers.box_clip(
input=boxes, im_info=im_info)
"""
helper = LayerHelper("box_clip", **locals())
output = helper.create_variable_for_type_inference(dtype=input.dtype)
inputs = {"Input": input, "ImInfo": im_info}
helper.append_op(type="box_clip", inputs=inputs, outputs={"Output": output})
return output
def multiclass_nms(bboxes,
scores,
score_threshold,
@@ -2137,9 +2181,11 @@ def multiclass_nms(bboxes,
(After version 1.3, when no boxes detected, the lod is changed
from {0} to {1})
Examples:
.. code-block:: python
boxes = fluid.layers.data(name='bboxes', shape=[81, 4],
dtype='float32', lod_level=1)
scores = fluid.layers.data(name='scores', shape=[81],
......
@@ -476,12 +476,23 @@ class TestYoloDetection(unittest.TestCase):
x = layers.data(name='x', shape=[30, 7, 7], dtype='float32')
gtbox = layers.data(name='gtbox', shape=[10, 4], dtype='float32')
gtlabel = layers.data(name='gtlabel', shape=[10], dtype='int32')
loss = layers.yolov3_loss(x, gtbox, gtlabel, [10, 13, 30, 13], 10,
0.5)
loss = layers.yolov3_loss(x, gtbox, gtlabel, [10, 13, 30, 13],
[0, 1], 10, 0.7, 32)
self.assertIsNotNone(loss)
class TestBoxClip(unittest.TestCase):
def test_box_clip(self):
program = Program()
with program_guard(program):
input_box = layers.data(
name='input_box', shape=[7, 4], dtype='float32', lod_level=1)
im_info = layers.data(name='im_info', shape=[3], dtype='float32')
out = layers.box_clip(input_box, im_info)
self.assertIsNotNone(out)
class TestMulticlassNMS(unittest.TestCase):
def test_multiclass_nms(self):
program = Program()
......
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import unittest
from paddle.fluid.tests.unittests.test_conv2d_op import *
class TestNGRAPH(TestConv2dOp):
def init_kernel_type(self):
super(TestNGRAPH, self).init_kernel_type()
class TestNGRAPHWithPad(TestWithPad):
def init_kernel_type(self):
super(TestNGRAPHWithPad, self).init_kernel_type()
class TestNGRAPHWithStride(TestWithStride):
def init_kernel_type(self):
super(TestNGRAPHWithStride, self).init_kernel_type()
class TestNGRAPHWithGroup(TestWithGroup):
def init_kernel_type(self):
super(TestNGRAPHWithGroup, self).init_kernel_type()
class TestNGRAPHWith1x1(TestWith1x1):
def init_kernel_type(self):
super(TestNGRAPHWith1x1, self).init_kernel_type()
class TestNGRAPHWithInput1x1Filter1x1(TestWithInput1x1Filter1x1):
def init_kernel_type(self):
super(TestNGRAPHWithInput1x1Filter1x1, self).init_kernel_type()
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from paddle.fluid.tests.unittests.test_pool2d_op import *
class TestNGRAPHPool2D_Op(TestPool2D_Op):
def init_test_case(self):
super(TestNGRAPHPool2D_Op, self).init_test_case()
class TestNGRAPHCase1(TestCase1):
def init_test_case(self):
super(TestNGRAPHCase1, self).init_test_case()
class TestNGRAPHCase2(TestCase2):
def init_test_case(self):
super(TestNGRAPHCase2, self).init_test_case()
class TestNGRAPHCase3(TestCase3):
def init_pool_type(self):
super(TestNGRAPHCase3, self).init_pool_type()
class TestNGRAPHCase4(TestCase4):
def init_pool_type(self):
super(TestNGRAPHCase4, self).init_pool_type()
class TestNGRAPHCase5(TestCase5):
def init_pool_type(self):
super(TestNGRAPHCase5, self).init_pool_type()
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import unittest
import numpy as np
import sys
import math
from op_test import OpTest
import copy
def box_clip(input_box, im_info, output_box):
im_w = round(im_info[1] / im_info[2])
im_h = round(im_info[0] / im_info[2])
output_box[:, :, 0] = np.maximum(
np.minimum(input_box[:, :, 0], im_w - 1), 0)
output_box[:, :, 1] = np.maximum(
np.minimum(input_box[:, :, 1], im_h - 1), 0)
output_box[:, :, 2] = np.maximum(
np.minimum(input_box[:, :, 2], im_w - 1), 0)
output_box[:, :, 3] = np.maximum(
np.minimum(input_box[:, :, 3], im_h - 1), 0)
def batch_box_clip(input_boxes, im_info, lod):
n = input_boxes.shape[0]
m = input_boxes.shape[1]
output_boxes = np.zeros((n, m, 4), dtype=np.float32)
cur_offset = 0
for i in range(len(lod)):
box_clip(input_boxes[cur_offset:(cur_offset + lod[i]), :, :],
im_info[i, :],
output_boxes[cur_offset:(cur_offset + lod[i]), :, :])
cur_offset += lod[i]
return output_boxes
class TestBoxClipOp(OpTest):
def test_check_output(self):
self.check_output()
def setUp(self):
self.op_type = "box_clip"
lod = [[1, 2, 3]]
input_boxes = np.random.random((6, 10, 4)) * 5
im_info = np.array([[5, 8, 1.], [6, 6, 1.], [7, 5, 1.]])
output_boxes = batch_box_clip(input_boxes, im_info, lod[0])
self.inputs = {
'Input': (input_boxes.astype('float32'), lod),
'ImInfo': im_info.astype('float32'),
}
self.outputs = {'Output': output_boxes}
if __name__ == '__main__':
unittest.main()
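The reference `box_clip` above clamps every box coordinate into the original image, whose size it recovers from `im_info` as `round(height / scale)` by `round(width / scale)`. Below is a minimal standalone sketch of the same clipping for a single image. It assumes, as the test's indexing suggests, that `im_info` is `[height, width, scale]` and that boxes are `[x1, y1, x2, y2]`. The helper name `clip_boxes` is hypothetical and is not the operator's API.

```python
import numpy as np


def clip_boxes(boxes, im_info):
    """Clamp [x1, y1, x2, y2] boxes of one image to its valid pixel range.

    boxes:   array of shape (..., 4)
    im_info: (height, width, scale); the original image size is
             round(height / scale) x round(width / scale).
    """
    im_h = round(im_info[0] / im_info[2])
    im_w = round(im_info[1] / im_info[2])
    out = np.empty_like(boxes)
    # x coordinates (columns 0 and 2) are limited to [0, im_w - 1]
    out[..., 0::2] = np.clip(boxes[..., 0::2], 0, im_w - 1)
    # y coordinates (columns 1 and 3) are limited to [0, im_h - 1]
    out[..., 1::2] = np.clip(boxes[..., 1::2], 0, im_h - 1)
    return out


boxes = np.array([[-2.0, 1.0, 9.0, 7.5]])
print(clip_boxes(boxes, np.array([5.0, 8.0, 1.0])))
# values: [[0, 1, 7, 4]]  (x clamped to [0, 7], y clamped to [0, 4])
```

Slicing with `0::2` and `1::2` handles both x columns and both y columns at once. This makes the sketch work for any leading shape, including the `(boxes_per_image, 10, 4)` slices that `batch_box_clip` passes in.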
...@@ -268,9 +268,6 @@ class TestImperativePtbRnn(unittest.TestCase): ...@@ -268,9 +268,6 @@ class TestImperativePtbRnn(unittest.TestCase):
sgd.minimize(dy_loss) sgd.minimize(dy_loss)
for param in ptb_model.parameters(): for param in ptb_model.parameters():
dy_param_updated[param.name] = param._numpy() dy_param_updated[param.name] = param._numpy()
# print("dy_loss is {}".format(dy_loss._numpy()))
# print("last_hidden is {}".format(last_hidden._numpy()))
# print("last_cell is {}".format(last_cell._numpy()))
with new_program_scope(): with new_program_scope():
fluid.default_startup_program().random_seed = seed fluid.default_startup_program().random_seed = seed
......
...@@ -82,7 +82,8 @@ class TestBook(unittest.TestCase): ...@@ -82,7 +82,8 @@ class TestBook(unittest.TestCase):
self.assertEqual(feed_var_names, ["x", "y"]) self.assertEqual(feed_var_names, ["x", "y"])
self.assertEqual(len(fetch_vars), 1) self.assertEqual(len(fetch_vars), 1)
self.assertEqual(str(fetch_vars[0]), str(avg_cost)) print("fetch %s" % str(fetch_vars[0]))
self.assertTrue("scale" in str(fetch_vars[0]))
self.assertEqual(expected, actual) self.assertEqual(expected, actual)
......
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import unittest
import os
os.environ['FLAGS_benchmark'] = 'True'
import numpy
import paddle.fluid.core as core
from paddle.fluid.executor import Executor
from paddle.fluid.layers import mul, data
class TestPeakMemoryMonitoring(unittest.TestCase):
def test_mul(self):
a = data(name='a', shape=[784], dtype='float32')
b = data(
name='b',
shape=[784, 100],
dtype='float32',
append_batch_size=False)
out = mul(x=a, y=b)
if core.is_compiled_with_cuda():
place = core.CUDAPlace(0)
a_np = numpy.random.random((100, 784)).astype('float32')
b_np = numpy.random.random((784, 100)).astype('float32')
self.assertEqual(0, core.get_mem_usage(0))
exe = Executor(place)
outs = exe.run(feed={'a': a_np, 'b': b_np}, fetch_list=[out])
out = outs[0]
# Disabled: ctest ignores the os.environ setting above.
# self.assertGreater(core.get_mem_usage(0), 0)
raised = False
try:
core.print_mem_usage()
except Exception:
raised = True
self.assertFalse(raised, 'Exception raised')
if __name__ == '__main__':
unittest.main()
...@@ -16,174 +16,179 @@ from __future__ import division ...@@ -16,174 +16,179 @@ from __future__ import division
import unittest import unittest
import numpy as np import numpy as np
from scipy.special import logit
from scipy.special import expit
from op_test import OpTest from op_test import OpTest
from paddle.fluid import core from paddle.fluid import core
def sigmoid(x): def l2loss(x, y):
return 1.0 / (1.0 + np.exp(-1.0 * x)) return 0.5 * (y - x) * (y - x)
def mse(x, y, num): def sce(x, label):
return ((y - x)**2).sum() / num sigmoid_x = expit(x)
term1 = label * np.log(sigmoid_x)
term2 = (1.0 - label) * np.log(1.0 - sigmoid_x)
return -term1 - term2
def bce(x, y, mask): def sigmoid(x):
x = x.reshape((-1)) return 1.0 / (1.0 + np.exp(-1.0 * x))
y = y.reshape((-1))
mask = mask.reshape((-1))
error_sum = 0.0
count = 0
for i in range(x.shape[0]):
if mask[i] > 0:
error_sum += y[i] * np.log(x[i]) + (1 - y[i]) * np.log(1 - x[i])
count += 1
return error_sum / (-1.0 * count)
def batch_xywh_box_iou(box1, box2):
b1_left = box1[:, :, 0] - box1[:, :, 2] / 2
b1_right = box1[:, :, 0] + box1[:, :, 2] / 2
b1_top = box1[:, :, 1] - box1[:, :, 3] / 2
b1_bottom = box1[:, :, 1] + box1[:, :, 3] / 2
def box_iou(box1, box2): b2_left = box2[:, :, 0] - box2[:, :, 2] / 2
b1_x1 = box1[0] - box1[2] / 2 b2_right = box2[:, :, 0] + box2[:, :, 2] / 2
b1_x2 = box1[0] + box1[2] / 2 b2_top = box2[:, :, 1] - box2[:, :, 3] / 2
b1_y1 = box1[1] - box1[3] / 2 b2_bottom = box2[:, :, 1] + box2[:, :, 3] / 2
b1_y2 = box1[1] + box1[3] / 2
b2_x1 = box2[0] - box2[2] / 2
b2_x2 = box2[0] + box2[2] / 2
b2_y1 = box2[1] - box2[3] / 2
b2_y2 = box2[1] + box2[3] / 2
b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1) left = np.maximum(b1_left[:, :, np.newaxis], b2_left[:, np.newaxis, :])
b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) right = np.minimum(b1_right[:, :, np.newaxis], b2_right[:, np.newaxis, :])
top = np.maximum(b1_top[:, :, np.newaxis], b2_top[:, np.newaxis, :])
bottom = np.minimum(b1_bottom[:, :, np.newaxis],
b2_bottom[:, np.newaxis, :])
inter_rect_x1 = max(b1_x1, b2_x1) inter_w = np.clip(right - left, 0., 1.)
inter_rect_y1 = max(b1_y1, b2_y1) inter_h = np.clip(bottom - top, 0., 1.)
inter_rect_x2 = min(b1_x2, b2_x2) inter_area = inter_w * inter_h
inter_rect_y2 = min(b1_y2, b2_y2)
inter_area = max(inter_rect_x2 - inter_rect_x1, 0) * max(
inter_rect_y2 - inter_rect_y1, 0)
return inter_area / (b1_area + b2_area + inter_area) b1_area = (b1_right - b1_left) * (b1_bottom - b1_top)
b2_area = (b2_right - b2_left) * (b2_bottom - b2_top)
union = b1_area[:, :, np.newaxis] + b2_area[:, np.newaxis, :] - inter_area
return inter_area / union
def build_target(gtboxs, gtlabel, attrs, grid_size):
n, b, _ = gtboxs.shape
ignore_thresh = attrs["ignore_thresh"]
anchors = attrs["anchors"]
class_num = attrs["class_num"]
an_num = len(anchors) // 2
obj_mask = np.zeros((n, an_num, grid_size, grid_size)).astype('float32')
noobj_mask = np.ones((n, an_num, grid_size, grid_size)).astype('float32')
tx = np.zeros((n, an_num, grid_size, grid_size)).astype('float32')
ty = np.zeros((n, an_num, grid_size, grid_size)).astype('float32')
tw = np.zeros((n, an_num, grid_size, grid_size)).astype('float32')
th = np.zeros((n, an_num, grid_size, grid_size)).astype('float32')
tconf = np.zeros((n, an_num, grid_size, grid_size)).astype('float32')
tcls = np.zeros(
(n, an_num, grid_size, grid_size, class_num)).astype('float32')
def YOLOv3Loss(x, gtbox, gtlabel, attrs):
n, c, h, w = x.shape
b = gtbox.shape[1]
anchors = attrs['anchors']
an_num = len(anchors) // 2
anchor_mask = attrs['anchor_mask']
mask_num = len(anchor_mask)
class_num = attrs["class_num"]
ignore_thresh = attrs['ignore_thresh']
downsample = attrs['downsample']
input_size = downsample * h
x = x.reshape((n, mask_num, 5 + class_num, h, w)).transpose((0, 1, 3, 4, 2))
loss = np.zeros((n)).astype('float32')
pred_box = x[:, :, :, :, :4].copy()
grid_x = np.tile(np.arange(w).reshape((1, w)), (h, 1))
grid_y = np.tile(np.arange(h).reshape((h, 1)), (1, w))
pred_box[:, :, :, :, 0] = (grid_x + sigmoid(pred_box[:, :, :, :, 0])) / w
pred_box[:, :, :, :, 1] = (grid_y + sigmoid(pred_box[:, :, :, :, 1])) / h
x[:, :, :, :, 5:] = np.where(x[:, :, :, :, 5:] < -0.5, x[:, :, :, :, 5:],
np.ones_like(x[:, :, :, :, 5:]) * 1.0 /
class_num)
mask_anchors = []
for m in anchor_mask:
mask_anchors.append((anchors[2 * m], anchors[2 * m + 1]))
anchors_s = np.array(
[(an_w / input_size, an_h / input_size) for an_w, an_h in mask_anchors])
anchor_w = anchors_s[:, 0:1].reshape((1, mask_num, 1, 1))
anchor_h = anchors_s[:, 1:2].reshape((1, mask_num, 1, 1))
pred_box[:, :, :, :, 2] = np.exp(pred_box[:, :, :, :, 2]) * anchor_w
pred_box[:, :, :, :, 3] = np.exp(pred_box[:, :, :, :, 3]) * anchor_h
pred_box = pred_box.reshape((n, -1, 4))
pred_obj = x[:, :, :, :, 4].reshape((n, -1))
objness = np.zeros(pred_box.shape[:2]).astype('float32')
ious = batch_xywh_box_iou(pred_box, gtbox)
ious_max = np.max(ious, axis=-1)
objness = np.where(ious_max > ignore_thresh, -np.ones_like(objness),
objness)
gtbox_shift = gtbox.copy()
gtbox_shift[:, :, 0] = 0
gtbox_shift[:, :, 1] = 0
anchors = [(anchors[2 * i], anchors[2 * i + 1]) for i in range(0, an_num)]
anchors_s = np.array(
[(an_w / input_size, an_h / input_size) for an_w, an_h in anchors])
anchor_boxes = np.concatenate(
[np.zeros_like(anchors_s), anchors_s], axis=-1)
anchor_boxes = np.tile(anchor_boxes[np.newaxis, :, :], (n, 1, 1))
ious = batch_xywh_box_iou(gtbox_shift, anchor_boxes)
iou_matches = np.argmax(ious, axis=-1)
gt_matches = iou_matches.copy()
for i in range(n): for i in range(n):
for j in range(b): for j in range(b):
if gtboxs[i, j, :].sum() == 0: if gtbox[i, j, 2:].sum() == 0:
gt_matches[i, j] = -1
continue continue
if iou_matches[i, j] not in anchor_mask:
gt_matches[i, j] = -1
continue
an_idx = anchor_mask.index(iou_matches[i, j])
gt_matches[i, j] = an_idx
gi = int(gtbox[i, j, 0] * w)
gj = int(gtbox[i, j, 1] * h)
gt_label = gtlabel[i, j] tx = gtbox[i, j, 0] * w - gi
gx = gtboxs[i, j, 0] * grid_size ty = gtbox[i, j, 1] * h - gj
gy = gtboxs[i, j, 1] * grid_size tw = np.log(gtbox[i, j, 2] * input_size / mask_anchors[an_idx][0])
gw = gtboxs[i, j, 2] * grid_size th = np.log(gtbox[i, j, 3] * input_size / mask_anchors[an_idx][1])
gh = gtboxs[i, j, 3] * grid_size scale = (2.0 - gtbox[i, j, 2] * gtbox[i, j, 3])
loss[i] += sce(x[i, an_idx, gj, gi, 0], tx) * scale
gi = int(gx) loss[i] += sce(x[i, an_idx, gj, gi, 1], ty) * scale
gj = int(gy) loss[i] += l2loss(x[i, an_idx, gj, gi, 2], tw) * scale
loss[i] += l2loss(x[i, an_idx, gj, gi, 3], th) * scale
gtbox = [0, 0, gw, gh]
max_iou = 0 objness[i, an_idx * h * w + gj * w + gi] = 1.0
for k in range(an_num):
anchor_box = [0, 0, anchors[2 * k], anchors[2 * k + 1]] for label_idx in range(class_num):
iou = box_iou(gtbox, anchor_box) loss[i] += sce(x[i, an_idx, gj, gi, 5 + label_idx],
if iou > max_iou: float(label_idx == gtlabel[i, j]))
max_iou = iou
best_an_index = k for j in range(mask_num * h * w):
if iou > ignore_thresh: if objness[i, j] > 0:
noobj_mask[i, best_an_index, gj, gi] = 0 loss[i] += sce(pred_obj[i, j], 1.0)
elif objness[i, j] == 0:
obj_mask[i, best_an_index, gj, gi] = 1 loss[i] += sce(pred_obj[i, j], 0.0)
noobj_mask[i, best_an_index, gj, gi] = 0
tx[i, best_an_index, gj, gi] = gx - gi return (loss, objness.reshape((n, mask_num, h, w)).astype('float32'), \
ty[i, best_an_index, gj, gi] = gy - gj gt_matches.astype('int32'))
tw[i, best_an_index, gj, gi] = np.log(gw / anchors[2 *
best_an_index])
th[i, best_an_index, gj, gi] = np.log(
gh / anchors[2 * best_an_index + 1])
tconf[i, best_an_index, gj, gi] = 1
tcls[i, best_an_index, gj, gi, gt_label] = 1
return (tx, ty, tw, th, tconf, tcls, obj_mask, noobj_mask)
def YoloV3Loss(x, gtbox, gtlabel, attrs):
n, c, h, w = x.shape
an_num = len(attrs['anchors']) // 2
class_num = attrs["class_num"]
x = x.reshape((n, an_num, 5 + class_num, h, w)).transpose((0, 1, 3, 4, 2))
pred_x = sigmoid(x[:, :, :, :, 0])
pred_y = sigmoid(x[:, :, :, :, 1])
pred_w = x[:, :, :, :, 2]
pred_h = x[:, :, :, :, 3]
pred_conf = sigmoid(x[:, :, :, :, 4])
pred_cls = sigmoid(x[:, :, :, :, 5:])
tx, ty, tw, th, tconf, tcls, obj_mask, noobj_mask = build_target(
gtbox, gtlabel, attrs, x.shape[2])
obj_mask_expand = np.tile(
np.expand_dims(obj_mask, 4), (1, 1, 1, 1, int(attrs['class_num'])))
loss_x = mse(pred_x * obj_mask, tx * obj_mask, obj_mask.sum())
loss_y = mse(pred_y * obj_mask, ty * obj_mask, obj_mask.sum())
loss_w = mse(pred_w * obj_mask, tw * obj_mask, obj_mask.sum())
loss_h = mse(pred_h * obj_mask, th * obj_mask, obj_mask.sum())
loss_conf_target = bce(pred_conf * obj_mask, tconf * obj_mask, obj_mask)
loss_conf_notarget = bce(pred_conf * noobj_mask, tconf * noobj_mask,
noobj_mask)
loss_class = bce(pred_cls * obj_mask_expand, tcls * obj_mask_expand,
obj_mask_expand)
return attrs['loss_weight_xy'] * (loss_x + loss_y) \
+ attrs['loss_weight_wh'] * (loss_w + loss_h) \
+ attrs['loss_weight_conf_target'] * loss_conf_target \
+ attrs['loss_weight_conf_notarget'] * loss_conf_notarget \
+ attrs['loss_weight_class'] * loss_class
class TestYolov3LossOp(OpTest): class TestYolov3LossOp(OpTest):
def setUp(self): def setUp(self):
self.loss_weight_xy = 1.0
self.loss_weight_wh = 1.0
self.loss_weight_conf_target = 1.0
self.loss_weight_conf_notarget = 1.0
self.loss_weight_class = 1.0
self.initTestCase() self.initTestCase()
self.op_type = 'yolov3_loss' self.op_type = 'yolov3_loss'
x = np.random.random(size=self.x_shape).astype('float32') x = logit(np.random.uniform(0, 1, self.x_shape).astype('float32'))
gtbox = np.random.random(size=self.gtbox_shape).astype('float32') gtbox = np.random.random(size=self.gtbox_shape).astype('float32')
gtlabel = np.random.randint(0, self.class_num, gtlabel = np.random.randint(0, self.class_num, self.gtbox_shape[:2])
self.gtbox_shape[:2]).astype('int32') gtmask = np.random.randint(0, 2, self.gtbox_shape[:2])
gtbox = gtbox * gtmask[:, :, np.newaxis]
gtlabel = gtlabel * gtmask
self.attrs = { self.attrs = {
"anchors": self.anchors, "anchors": self.anchors,
"anchor_mask": self.anchor_mask,
"class_num": self.class_num, "class_num": self.class_num,
"ignore_thresh": self.ignore_thresh, "ignore_thresh": self.ignore_thresh,
"loss_weight_xy": self.loss_weight_xy, "downsample": self.downsample,
"loss_weight_wh": self.loss_weight_wh,
"loss_weight_conf_target": self.loss_weight_conf_target,
"loss_weight_conf_notarget": self.loss_weight_conf_notarget,
"loss_weight_class": self.loss_weight_class,
} }
self.inputs = {'X': x, 'GTBox': gtbox, 'GTLabel': gtlabel} self.inputs = {
'X': x,
'GTBox': gtbox.astype('float32'),
'GTLabel': gtlabel.astype('int32'),
}
loss, objness, gt_matches = YOLOv3Loss(x, gtbox, gtlabel, self.attrs)
self.outputs = { self.outputs = {
'Loss': np.array( 'Loss': loss,
[YoloV3Loss(x, gtbox, gtlabel, self.attrs)]).astype('float32') 'ObjectnessMask': objness,
"GTMatchMask": gt_matches
} }
def test_check_output(self): def test_check_output(self):
...@@ -196,19 +201,16 @@ class TestYolov3LossOp(OpTest): ...@@ -196,19 +201,16 @@ class TestYolov3LossOp(OpTest):
place, ['X'], place, ['X'],
'Loss', 'Loss',
no_grad_set=set(["GTBox", "GTLabel"]), no_grad_set=set(["GTBox", "GTLabel"]),
max_relative_error=0.06) max_relative_error=0.3)
def initTestCase(self): def initTestCase(self):
self.anchors = [10, 13, 12, 12] self.anchors = [10, 13, 16, 30, 33, 23]
self.class_num = 10 self.anchor_mask = [1, 2]
self.class_num = 5
self.ignore_thresh = 0.5 self.ignore_thresh = 0.5
self.x_shape = (5, len(self.anchors) // 2 * (5 + self.class_num), 7, 7) self.downsample = 32
self.gtbox_shape = (5, 10, 4) self.x_shape = (3, len(self.anchor_mask) * (5 + self.class_num), 5, 5)
self.loss_weight_xy = 2.5 self.gtbox_shape = (3, 5, 4)
self.loss_weight_wh = 0.8
self.loss_weight_conf_target = 1.5
self.loss_weight_conf_notarget = 0.5
self.loss_weight_class = 1.2
if __name__ == "__main__": if __name__ == "__main__":
......
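The rewritten YOLOv3 reference loss compares boxes in center format `[x, y, w, h]` using a broadcasted IoU (`batch_xywh_box_iou`). That IoU decides which predictions exceed `ignore_thresh` and which anchor best matches each ground-truth box. Below is a minimal standalone sketch of the same pairwise IoU for a single image rather than a batch. The name `xywh_iou` is hypothetical. One difference from the test helper: this sketch only clamps negative overlaps to zero, while the test helper also caps each overlap side at 1.

```python
import numpy as np


def xywh_iou(boxes1, boxes2):
    """Pairwise IoU between center-format boxes.

    boxes1: (M, 4) array of [cx, cy, w, h]
    boxes2: (K, 4) array of [cx, cy, w, h]
    returns: (M, K) array where entry [i, j] = IoU(boxes1[i], boxes2[j])
    """
    # Convert centers/sizes to min/max corner coordinates.
    b1_min = boxes1[:, :2] - boxes1[:, 2:] / 2
    b1_max = boxes1[:, :2] + boxes1[:, 2:] / 2
    b2_min = boxes2[:, :2] - boxes2[:, 2:] / 2
    b2_max = boxes2[:, :2] + boxes2[:, 2:] / 2

    # Broadcast (M, 1, 2) against (1, K, 2) to cover every pair at once.
    inter_min = np.maximum(b1_min[:, None, :], b2_min[None, :, :])
    inter_max = np.minimum(b1_max[:, None, :], b2_max[None, :, :])
    inter_wh = np.clip(inter_max - inter_min, 0.0, None)  # no overlap -> 0
    inter_area = inter_wh[..., 0] * inter_wh[..., 1]

    area1 = boxes1[:, 2] * boxes1[:, 3]
    area2 = boxes2[:, 2] * boxes2[:, 3]
    # The union counts the overlap only once: |A| + |B| - overlap.
    union = area1[:, None] + area2[None, :] - inter_area
    return inter_area / union


a = np.array([[0.5, 0.5, 0.2, 0.2]])
b = np.array([[0.5, 0.5, 0.2, 0.2],
              [0.6, 0.5, 0.2, 0.2],
              [0.9, 0.9, 0.1, 0.1]])
print(xywh_iou(a, b))
# approximately [[1.0, 0.3333, 0.0]]
```

The second box overlaps `a` on a 0.1 x 0.2 strip. That gives an overlap of 0.02 against a union of 0.04 + 0.04 - 0.02 = 0.06, so IoU = 1/3. The third box does not overlap `a` at all, so its IoU is 0.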
...@@ -17,3 +17,4 @@ from __future__ import print_function ...@@ -17,3 +17,4 @@ from __future__ import print_function
from .program_utils import * from .program_utils import *
from .ufind import * from .ufind import *
from .checkport import * from .checkport import *
from .vars_distributed import *
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from paddle.fluid.framework import Variable
class VarStruct(object):
"""
Record a subset of a Variable's properties in Python.
"""
def __init__(self, name, shape, dtype, type, lod_level, persistable):
self.name = name
self.shape = shape
self.dtype = dtype
self.type = type
self.lod_level = lod_level
self.persistable = persistable
class VarDistributed(object):
"""
A class that records a var distributed on parameter servers:
the relationship between the origin var and the slice var,
and the slice var's properties, such as type/shape/offset/endpoint.
"""
def __init__(self,
origin_var,
slice_var,
is_slice=None,
block_id=None,
offset=None,
vtype=None,
endpoint=None):
"""
Args:
origin_var(Variable|VarStruct): origin var properties
slice_var(Variable|VarStruct): slice var properties
is_slice(bool|None): whether the var is sliced, judged by slice_var and by whether its block size exceeds 8192. If None, it is inferred by comparing origin_var with slice_var.
block_id(int|None): the block index of the slice var.
offset(int|None): if the var is sliced, the number of elements (numel) that precede this slice in the origin var.
vtype(str|None): a tag, such as Optimizer/Param/RemotePrefetch.
endpoint(str|None): the parameter server endpoint that holds the slice var, such as "127.0.0.1:1001".
"""
if isinstance(origin_var, Variable):
self.origin = self.__create_var_struct(origin_var)
else:
self.origin = origin_var
if isinstance(slice_var, Variable):
self.slice = self.__create_var_struct(slice_var)
else:
self.slice = slice_var
if self.equal(self.origin, self.slice):
self.is_slice = False
self.block_id = 0
self.offset = 0
else:
self.is_slice = True
self.block_id = 0
self.offset = 0
if is_slice is not None:
self.is_slice = is_slice
if block_id is not None:
self.block_id = block_id
if offset is not None:
self.offset = offset
self.vtype = vtype
self.endpoint = endpoint
@staticmethod
def __create_var_struct(var):
return VarStruct(var.name, var.shape, var.dtype, var.type,
var.lod_level, var.persistable)
@staticmethod
def equal(var1, var2):
"""
Check whether two vars are equal.
Returns:
bool: True if the two vars are equal, else False.
"""
assert isinstance(var1, VarStruct) and isinstance(var2, VarStruct)
return var1.name == var2.name and \
var1.type == var2.type and \
var1.shape == var2.shape and \
var1.dtype == var2.dtype and \
var1.lod_level == var2.lod_level and \
var1.persistable == var2.persistable
def __str__(self):
origin_var_str = "{name} : fluid.{type}.shape{shape}.astype({dtype})". \
format(i="{", e="}", name=self.origin.name, type=self.origin.type,
shape=self.origin.shape, dtype=self.origin.dtype)
slice_var_str = "{name} : fluid.{type}.shape{shape}.astype({dtype})" \
".slice({is_slice}).block({block_id}).offset({offset})". \
format(i="{", e="}", name=self.slice.name, type=self.slice.type,
shape=self.slice.shape, dtype=self.slice.dtype,
is_slice=self.is_slice, block_id=self.block_id, offset=self.offset)
return "var owned: {}, origin var: ( {} ), slice var: ( {} ), endpoint: {} ".format(
self.vtype, origin_var_str, slice_var_str, self.endpoint)
class VarsDistributed(object):
"""
A collection of VarDistributed objects with methods to find distributed vars.
Through this class, we can get an overview of the distributed parameters on the parameter servers.
It keeps variable distribution information in one place, making it convenient for developers to manage and query.
Other modules, such as io.py, can also use it to find variables.
"""
def __init__(self):
self.distributed_vars = []
def add_distributed_var(self,
origin_var,
slice_var,
is_slice=None,
block_id=None,
offset=None,
vtype=None,
endpoint=None):
"""
Add a distributed var to this collection.
Args:
origin_var(Variable|VarStruct): origin var properties
slice_var(Variable|VarStruct): slice var properties
is_slice(bool|None): whether the var is sliced, judged by slice_var and by whether its block size exceeds 8192. If None, it is inferred by comparing origin_var with slice_var.
block_id(int|None): the block index of the slice var.
offset(int|None): if the var is sliced, the number of elements (numel) that precede this slice in the origin var.
vtype(str|None): a tag, such as Optimizer/Param/RemotePrefetch.
endpoint(str|None): the parameter server endpoint that holds the slice var, such as "127.0.0.1:1001".
Returns:
None
"""
self.distributed_vars.append(
VarDistributed(origin_var, slice_var, is_slice, block_id, offset,
vtype, endpoint))
def get_distributed_var_by_slice(self, var_name):
"""
Get a distributed var by its slice var name.
Args:
var_name(str): slice var name, such as "w.trainer0.block1"
Returns:
VarDistributed: the matching distributed var, or None if not found.
"""
for dist_var in self.distributed_vars:
if dist_var.slice.name == var_name:
return dist_var
return None
@staticmethod
def equal(var1, var2):
"""
Check whether two vars are equal.
Returns:
bool: True if the two vars are equal, else False.
"""
return var1.name == var2.name and \
var1.type == var2.type and \
var1.shape == var2.shape and \
var1.dtype == var2.dtype and \
var1.lod_level == var2.lod_level and \
var1.persistable == var2.persistable
def get_distributed_var_by_origin_and_ep(self, origin_var_name, endpoint):
"""
Get a distributed var by its origin var name and endpoint.
Args:
origin_var_name(str): the origin var name.
endpoint(str): the parameter server endpoint, such as "127.0.0.1:1001"
Returns:
VarDistributed: the matching distributed var, or None if not found.
"""
for dist_var in self.distributed_vars:
if dist_var.origin.name == origin_var_name and dist_var.endpoint == endpoint:
return dist_var
return None
def get_distributed_vars_by_vtypes(self, vtypes, groupby=False):
"""
Get distributed vars by vtypes.
Args:
vtypes(list|tuple): the vtypes to match, such as ["Optimizer", "RemotePrefetch"]
groupby(bool): whether to group the result by origin var name. Default: False.
Returns:
list: distributed var list when groupby=False.
dict: a map from origin var name to its distributed vars when groupby=True.
"""
vtype_vars = []
for var in self.distributed_vars:
if var.vtype in vtypes:
vtype_vars.append(var)
if not groupby:
return vtype_vars
params_map = {}
for var in vtype_vars:
origin_var_name = var.origin.name
if origin_var_name in params_map.keys():
optimizers = params_map.get(origin_var_name)
else:
optimizers = []
optimizers.append(var)
params_map[origin_var_name] = optimizers
return params_map
def get_distributed_vars_by_ep(self, endpoint, vtype=None):
"""
Get distributed vars on an endpoint, optionally filtered by vtype.
Args:
endpoint(str): the parameter server endpoint, such as "127.0.0.1:2001"
vtype(str|None): distributed var's vtype, such as "Optimizer", "RemotePrefetch"
Returns:
list: distributed var list.
"""
endpoint_vars = []
for var in self.distributed_vars:
if var.endpoint == endpoint:
endpoint_vars.append(var)
if not vtype:
return endpoint_vars
vtype_vars = []
for var in endpoint_vars:
if var.vtype == vtype:
vtype_vars.append(var)
return vtype_vars
def overview(self):
"""
Get an overview string of all params on all parameter servers.
Returns:
str: the overview string, one distributed var per line.
"""
vars_str = []
for var in self.distributed_vars:
vars_str.append(str(var))
return "\n".join(vars_str)
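When called with `groupby=True`, `get_distributed_vars_by_vtypes` first filters the recorded vars by `vtype`. It then buckets the matches by the name of their origin var, so all slices of one parameter come back together. Below is a minimal standalone sketch of that filtering and grouping using `collections.defaultdict`. `DistVar` is a hypothetical stand-in for `VarDistributed` that keeps only the fields this lookup reads, so the sketch runs without Paddle installed. The var names and endpoints are illustrative only.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for VarDistributed: only the fields the lookup needs.
DistVar = namedtuple("DistVar", ["origin_name", "slice_name", "vtype", "endpoint"])


def vars_by_vtypes(dist_vars, vtypes, groupby=False):
    """Filter by vtype; optionally group the matches by origin var name."""
    matched = [v for v in dist_vars if v.vtype in vtypes]
    if not groupby:
        return matched
    grouped = defaultdict(list)
    for v in matched:
        # Slices of the same origin var keep their original order.
        grouped[v.origin_name].append(v)
    return dict(grouped)


dist_vars = [
    DistVar("w", "w.block0", "Param", "127.0.0.1:1001"),
    DistVar("w", "w.block1", "Param", "127.0.0.1:1002"),
    DistVar("w_moment", "w_moment.block0", "Optimizer", "127.0.0.1:1001"),
]
groups = vars_by_vtypes(dist_vars, ["Param"], groupby=True)
print({name: [v.slice_name for v in vs] for name, vs in groups.items()})
# {'w': ['w.block0', 'w.block1']}
```

`defaultdict(list)` takes the place of the explicit "look up or create an empty list" branch in the method above. The resulting mapping is the same.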
...@@ -30,19 +30,23 @@ Steps to transpile pserver: ...@@ -30,19 +30,23 @@ Steps to transpile pserver:
5. add listen_and_serv op 5. add listen_and_serv op
""" """
import sys
import math import math
import numpy as np from functools import reduce
import collections import collections
import six
import logging import logging
import numpy as np
from .ps_dispatcher import RoundRobin, PSDispatcher from .ps_dispatcher import RoundRobin, PSDispatcher
from .. import core, framework, unique_name from .. import core, framework, unique_name
from ..framework import Program, default_main_program, \ from ..framework import Program, default_main_program, \
default_startup_program, Block, \ default_startup_program, Block, Parameter, grad_var_name
Parameter, Variable, grad_var_name from .details import wait_server_ready, UnionFind, VarStruct, VarsDistributed
from .details import * from .details import delete_ops, find_op_by_output_arg
from ..distribute_lookup_table import find_distributed_lookup_table from ..distribute_lookup_table import find_distributed_lookup_table
from functools import reduce
LOOKUP_TABLE_TYPE = "lookup_table" LOOKUP_TABLE_TYPE = "lookup_table"
LOOKUP_TABLE_GRAD_TYPE = "lookup_table_grad" LOOKUP_TABLE_GRAD_TYPE = "lookup_table_grad"
...@@ -62,260 +66,6 @@ def log(*args): ...@@ -62,260 +66,6 @@ def log(*args):
print(args) print(args)
class VarStruct(object):
"""
record part properties of a Variable in python.
"""
def __init__(self, name, shape, dtype, type, lod_level, persistable):
self.name = name
self.shape = shape
self.dtype = dtype
self.type = type
self.lod_level = lod_level
self.persistable = persistable
class VarDistributed(object):
"""
a class to record the var distributed on parameter servers.
the class will record the relationship between origin var and slice var.
the slice var's properties, such as type/shape/offset/endpoint.
"""
def __init__(self,
origin_var,
slice_var,
is_slice=None,
block_id=None,
offset=None,
vtype=None,
endpoint=None):
"""
Args:
origin_var(Variable|VarStruct): origin var properties
slice_var(Variable|VarStruct): slice var properties
is_slice(bool|None): slice or not, slice_var=True/False and its block size > 8192 are the judgement standard.
block_id(int|None): the number about the slice var.
offset(int|None): if the slice var is sliced, offset is the numel before the var.
vtype(str|None): a tag, such as Optimizer/Param/RemoteProfetch.
endpoint(str|None): which parameter the slice var on, such as "127.0.0.1:1001"
"""
if isinstance(origin_var, Variable):
self.origin = self.__create_var_struct(origin_var)
else:
self.origin = origin_var
if isinstance(slice_var, Variable):
self.slice = self.__create_var_struct(slice_var)
else:
self.slice = slice_var
if self.equal(self.origin, self.slice):
self.is_slice = False
self.block_id = 0
self.offset = 0
else:
self.is_slice = True
self.block_id = 0
self.offset = 0
if is_slice is not None:
self.is_slice = is_slice
if block_id is not None:
self.block_id = block_id
if offset is not None:
self.offset = offset
self.vtype = vtype
self.endpoint = endpoint
@staticmethod
def __create_var_struct(var):
return VarStruct(var.name, var.shape, var.dtype, var.type,
var.lod_level, var.persistable)
@staticmethod
def equal(var1, var2):
"""
the two var is equal or not.
Returns:
bool: equal will return True else False
"""
assert isinstance(var1, VarStruct) and isinstance(var2, VarStruct)
return var1.name == var2.name and \
var1.type == var2.type and \
var1.shape == var2.shape and \
var1.dtype == var2.dtype and \
var1.lod_level == var2.lod_level and \
var1.persistable == var2.persistable
def __str__(self):
origin_var_str = "{name} : fluid.{type}.shape{shape}.astype({dtype})". \
format(i="{", e="}", name=self.origin.name, type=self.origin.type,
shape=self.origin.shape, dtype=self.origin.dtype)
slice_var_str = "{name} : fluid.{type}.shape{shape}.astype({dtype})" \
".slice({is_slice}).block({block_id}).offset({offset})". \
format(i="{", e="}", name=self.slice.name, type=self.slice.type,
shape=self.slice.shape, dtype=self.slice.dtype,
is_slice=self.is_slice, block_id=self.block_id, offset=self.offset)
return "var owned: {}, origin var: ( {} ), slice var: ( {} ), endpoint: {} ".format(
self.vtype, origin_var_str, slice_var_str, self.endpoint)
class VarsDistributed(object):
"""
a gather about VarDistributed with many methods to find distributed vars.
through the class, we can get overview about the distributed parameters on parameter servers.
this class may centralized and convenient for developer to manage and get variable's distribute.
other module can also use this to find variables such io.py.
"""
def __init__(self):
self.distributed_vars = []
def add_distributed_var(self,
origin_var,
slice_var,
is_slice=None,
block_id=None,
offset=None,
vtype=None,
endpoint=None):
"""
add distributed var in this.
Args:
origin_var(Variable|VarStruct): origin var properties
slice_var(Variable|VarStruct): slice var properties
is_slice(bool|None): slice or not, slice_var=True/False and its block size > 8192 are the judgement standard.
block_id(int|None): the number about the slice var.
offset(int|None): if the slice var is sliced, offset is the numel before the var.
vtype(str|None): a tag, such as Optimizer/Param/RemoteProfetch.
endpoint(str|None): which parameter the slice var on, such as "127.0.0.1:1001"
Returns:
None
"""
self.distributed_vars.append(
VarDistributed(origin_var, slice_var, is_slice, block_id, offset,
vtype, endpoint))
def get_distributed_var_by_slice(self, var_name):
"""
get distributed var by conditions.
Args:
var_name(str): slice var name, such as "w.traier0.block1"
Returns:
VarDistributed: distributed var.
"""
for dist_var in self.distributed_vars:
if dist_var.slice.name == var_name:
return dist_var
return None
@staticmethod
def equal(var1, var2):
"""
the two var is equal or not.
Returns:
bool: equal will return True else False
"""
return var1.name == var2.name and \
var1.type == var2.type and \
var1.shape == var2.shape and \
var1.dtype == var2.dtype and \
var1.lod_level == var2.lod_level and \
var1.persistable == var2.persistable
def get_distributed_var_by_origin_and_ep(self, origin_var_name, endpoint):
"""
get distributed var by conditions.
Args:
origin_var_name(str):
endpoint(str): the parameter endpoint, such as "127.0.0.1:1001"
Returns:
VarDistributed: distributed var.
"""
for dist_var in self.distributed_vars:
if dist_var.origin.name == origin_var_name and dist_var.endpoint == endpoint:
return dist_var
return None
def get_distributed_vars_by_vtypes(self, vtypes, groupby=False):
"""
get distributed vars by conditions.
Args:
vtype(str|None): distributed var's vtype, such as "Optimizer", "RemotePrefetch"
groupby(bool|False): group by origin var or not.
Returns:
list: distributed var list.
dict: distributed var map when groupby=True
"""
vtype_vars = []
for var in self.distributed_vars:
if var.vtype in vtypes:
vtype_vars.append(var)
if not groupby:
return vtype_vars
params_map = {}
for var in vtype_vars:
origin_var_name = var.origin.name
if origin_var_name in params_map.keys():
optimizers = params_map.get(origin_var_name)
else:
optimizers = []
optimizers.append(var)
params_map[origin_var_name] = optimizers
return params_map
def get_distributed_vars_by_ep(self, endpoint, vtype=None):
"""
get distributed vars by conditions.
Args:
endpoint(str): the parameter server endpoint, such as "127.0.0.1:2001"
vtype(str|None): distributed var's vtype, such as "Optimizer", "RemotePrefetch"
Returns:
list: distributed var list.
"""
endpoint_vars = []
for var in self.distributed_vars:
if var.endpoint == endpoint:
endpoint_vars.append(var)
if not vtype:
return endpoint_vars
vtype_vars = []
for var in endpoint_vars:
if var.vtype == vtype:
vtype_vars.append(var)
return vtype_vars
def overview(self):
"""
get the overview string about all params on all parameter servers.
Returns:
Str: overview string.
"""
vars_str = []
for var in self.distributed_vars:
vars_str.append(str(var))
return "\n".join(vars_str)
class VarBlock: class VarBlock:
def __init__(self, varname, offset, size): def __init__(self, varname, offset, size):
self.varname = varname self.varname = varname
......
...@@ -109,6 +109,7 @@ packages=['paddle', ...@@ -109,6 +109,7 @@ packages=['paddle',
'paddle.fluid.contrib', 'paddle.fluid.contrib',
'paddle.fluid.contrib.decoder', 'paddle.fluid.contrib.decoder',
'paddle.fluid.contrib.quantize', 'paddle.fluid.contrib.quantize',
'paddle.fluid.contrib.int8_inference',
'paddle.fluid.contrib.reader', 'paddle.fluid.contrib.reader',
'paddle.fluid.contrib.slim', 'paddle.fluid.contrib.slim',
'paddle.fluid.contrib.slim.core', 'paddle.fluid.contrib.slim.core',
......