Commit 90cf3821 authored by gongjian08

Merge branch 'develop' of https://github.com/PaddlePaddle/models into longxinchen2

......@@ -130,17 +130,17 @@ The [Celeba](http://mmlab.ie.cuhk.edu.hk/projects) dataset required by StarGAN, AttGAN and STGAN
The results of Pix2Pix and CycleGAN are shown below:
<p align="center">
<img src="images/pix2pix_cyclegan.png" width="650"/><br />
Results of Pix2Pix and CycleGAN
</p>
The results of StarGAN, AttGAN and STGAN are shown below:
<p align="center">
<img src="images/female_stargan_attgan_stgan.png" width="650"/><br />
Results of StarGAN, AttGAN and STGAN
</p>
......@@ -181,47 +181,43 @@ STGAN feeds in only the changed labels and introduces a GRU structure to better select the changed attributes
- Pix2Pix consists of one generator network and one discriminator network. The encoder part of the generator uses `convolution-batch norm-ReLU` as its basic building block, the decoder part is built from `transpose convolution-batch norm-ReLU`, and the discriminator is basically built from `convolution-norm-leaky_ReLU`; see `network/Pix2pix_network.py` for the detailed structure. The generator offers two optional architectures: a U-Net and a plain encoder-decoder. The network learns the mapping from input image to output image through its loss functions: the generator loss combines the CGAN loss with an L1 loss, while the discriminator loss consists of the CGAN loss. The generator architecture is shown below:
<p align="center">
<img src="images/pix2pix_gen.png" width="550"/><br />
Pix2Pix generator architecture [5]
</p>
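
A minimal sketch of the `convolution-batch norm-ReLU` building block described above, assuming the fluid layers API; the filter size and stride here are illustrative, see `network/Pix2pix_network.py` for the actual definitions.

```python
import paddle.fluid as fluid

def conv_bn_relu(x, num_filters, name):
    # 4x4 stride-2 convolution, the typical downsampling step of the encoder
    y = fluid.layers.conv2d(
        x, num_filters=num_filters, filter_size=4, stride=2, padding=1,
        name=name + "_conv")
    # batch norm followed by ReLU, completing the basic block
    y = fluid.layers.batch_norm(y, name=name + "_bn")
    return fluid.layers.relu(y)
```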
- CycleGAN consists of two generator networks and two discriminator networks. Generator A takes images of style A and outputs images of style B; generator B takes images of style B and outputs images of style A. The encoder part of the generators uses `convolution-norm-ReLU` as its basic building block, the decoder part is built from `transpose convolution-norm-ReLU`, and the discriminators are basically built from `convolution-norm-leaky_ReLU`; see `network/CycleGAN_network.py` for the detailed structure. The generators offer two optional architectures: a U-Net and a plain encoder-decoder. The generator loss combines the CGAN loss, a reconstruction loss and an identity loss; the discriminator loss consists of the CGAN loss.
<p align="center">
<img src="images/pix2pix_gen.png" width="550"/><br />
CycleGAN generator architecture [5]
</p>
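
A hedged sketch of the generator loss just described (adversarial + reconstruction + identity terms), assuming the fluid layers API; the least-squares adversarial form and the lambda weights are illustrative, see `network/CycleGAN_network.py` for the actual implementation.

```python
import paddle.fluid as fluid

def cyclegan_g_loss(d_fake_b, real_a, cyc_a, idt_a,
                    lambda_cyc=10.0, lambda_idt=5.0):
    # adversarial term: generated B images should be scored as real (1)
    adv = fluid.layers.reduce_mean(fluid.layers.square(d_fake_b - 1.0))
    # reconstruction term: A -> B -> A should recover the original A
    cyc = fluid.layers.reduce_mean(fluid.layers.abs(real_a - cyc_a))
    # identity term: feeding a real A image into the B->A generator
    # should change it as little as possible
    idt = fluid.layers.reduce_mean(fluid.layers.abs(real_a - idt_a))
    return adv + lambda_cyc * cyc + lambda_idt * idt
```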
- In StarGAN, the encoder part of the generator is mainly built from `convolution-instance norm-ReLU`, the decoder part from `transpose convolution-norm-ReLU`, and the discriminator mainly from `convolution-leaky_ReLU`; see `network/StarGAN_network.py` for the detailed structure. The generator loss combines the CGAN loss, a reconstruction loss and a classification loss; the discriminator loss consists of a prediction loss, a classification loss and a gradient penalty loss.
<p align="center">
<img src="images/stargan_gen.png" width=350 />
<img src="images/stargan_dis.png" width=400 /> <br />
StarGAN generator architecture [left] and discriminator architecture [right] [7]
</p>
- In AttGAN, the encoder part of the generator is mainly built from `convolution-instance norm-ReLU`, the decoder part from `transpose convolution-norm-ReLU`, and the discriminator mainly from `convolution-leaky_ReLU`; see `network/AttGAN_network.py` for the detailed structure. The generator loss combines the CGAN loss, a reconstruction loss and a classification loss; the discriminator loss consists of a prediction loss, a classification loss and a gradient penalty loss.
<p align="center">
<img src="images/attgan_net.png" width=800 /> <br />
AttGAN architecture [8]
</p>
- STGAN inserts Selective Transfer Units (STU) between the encoder and decoder of the generator, selectively transforming the encoded features so that they better fit the decoder. The encoder part of the generator is mainly built from `convolution-instance norm-ReLU`, the decoder part from `transpose convolution-norm-leaky_ReLU`, and the discriminator mainly from `convolution-leaky_ReLU`; see `network/STGAN_network.py` for the detailed structure. The generator loss combines the CGAN loss, a reconstruction loss and a classification loss; the discriminator loss consists of a prediction loss, a classification loss and a gradient penalty loss.
<p align="center">
<img src="images/stgan_net.png" width=800 /> <br />
STGAN architecture [9]
</p>
......@@ -230,17 +226,16 @@ STGAN feeds in only the changed labels and introduces a GRU structure to better select the changed attributes
## FAQ
**Q:** Attributes do not change in StarGAN/AttGAN/STGAN. Why?

**A:** Check whether all of the labels have been converted correctly.

**Q:** The prediction results look abnormal. What is going on?

**A:** For some GANs, the batch_norm behavior at inference time must match the behavior at training time. Check whether the inference-time batch_norm behavior of the corresponding GAN in the model zoo is consistent with that of your own model, as sketched below.
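
A minimal sketch of this point, assuming the `fluid.layers.batch_norm` API; the surrounding layer and its sizes are illustrative.

```python
import paddle.fluid as fluid

def bn_block(x, is_test):
    y = fluid.layers.conv2d(x, num_filters=64, filter_size=3, padding=1)
    # at inference time is_test must be True so that batch_norm uses the
    # moving mean/variance accumulated during training instead of the
    # current batch statistics
    return fluid.layers.batch_norm(y, is_test=is_test)
```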
**Q:** Why do STGAN and AttGAN produce a female face when I ask for "male"?

**A:** This comes from how target labels are set at inference time: the target label is derived by flipping the original label. For example, if the original image is male, the inference code automatically flips the label to its opposite, i.e., female, so the output is a female face. If you want a male image to stay male after conversion, refer to the label setup of the StarGAN inference code in the model zoo.
## References
......@@ -269,8 +264,7 @@ STGAN feeds in only the changed labels and introduces a GRU structure to better select the changed attributes
## Release Notes
- 6/2019: Added CGAN, DCGAN, Pix2Pix, CycleGAN, StarGAN, AttGAN, STGAN
## Authors
- [ceci3](https://github.com/ceci3)
......
......@@ -182,6 +182,11 @@ def conv2d(input,
use_cudnn=use_cudnn,
param_attr=param_attr,
bias_attr=bias_attr)
if need_crop:
    # crop one pixel off the top and left so the output spatial size
    # matches the expected shape
    conv = fluid.layers.crop(
        conv,
        shape=(-1, conv.shape[1], conv.shape[2] - 1, conv.shape[3] - 1),
        offsets=(0, 0, 1, 1))
if norm is not None:
conv = norm_layer(
input=conv, norm_type=norm, name=name + "_norm", is_test=is_test)
......
# PaddlePaddle Object Detection
The goal of PaddleDetection is to provide easy access to a wide range of object
detection models in both industry and research settings. We design
PaddleDetection to be not only performant and production-ready, but also highly
flexible, catering to research needs.
<div align="center">
<img src="demo/output/000000523957.jpg" />
<img src="demo/output/000000570688.jpg" />
</div>
## Introduction
Design Principles:

- Production Ready:

  Key operations are implemented in C++ and CUDA, which, together with
  PaddlePaddle's highly efficient inference engine, enables easy deployment
  in server environments.

- Highly Flexible:

  Components are designed to be modular. Model architectures, as well as data
  preprocessing pipelines, can be easily customized with simple configuration
  changes.

- Performance Optimized:

  With the help of the underlying PaddlePaddle framework, faster training and
  reduced GPU memory footprint are achieved. Notably, Yolo V3 training is
  much faster compared to other frameworks. Another example is Mask-RCNN
  (ResNet50): we managed to fit up to 5 images per GPU (V100 16GB) during
  training.
Supported Architectures:
| | ResNet | ResNet-vd <sup>[1](#vd)</sup> | ResNeXt | SENet | MobileNet | DarkNet |
|--------------------|:------:|------------------------------:|:-------:|:-----:|:---------:|:-------:|
| Faster R-CNN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Faster R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Mask R-CNN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Mask R-CNN + FPN | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Cascade R-CNN | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| RetinaNet | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Yolov3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
| SSD | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
<a name="vd">[1]</a> ResNet-vd models offer much improved accuracy with negligible performance cost.
Advanced Features:
- [x] **Synchronized Batch Norm**: currently used by Yolo V3.
- [x] **Group Norm**: pretrained models to be released.
- [x] **Modulated Deformable Convolution**: pretrained models to be released.
- [x] **Deformable PSRoI Pooling**: pretrained models to be released.
## Model zoo

Pretrained models are available in the PaddlePaddle [detection model zoo](docs/MODEL_ZOO.md).
## Installation

Please follow the [installation guide](docs/INSTALL.md).
## Get Started

For inference, simply run the following command and the visualized result will
be saved in `output/`.

```bash
export PYTHONPATH=`pwd`:$PYTHONPATH
python tools/infer.py -c configs/mask_rcnn_r50_1x.yml \
    -o weights=https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_1x.tar \
    --infer_img=demo/000000570688.jpg
```
For detailed training and evaluation workflow, please refer to [GETTING_STARTED.md](docs/GETTING_STARTED.md).
We also recommend taking a look at the [IPython Notebook demo](demo/mask_rcnn_demo.ipynb).
Further information can be found in these documents:

- [Introduction to the configuration workflow.](docs/CONFIG.md)
- [Guide to custom dataset and preprocess pipeline.](docs/DATA.md)
## Todo List

Please note this is a work in progress; substantial changes may come in the
near future.

Some of the planned features include:
- [ ] Mixed precision training.
- [ ] Distributed training.
- [ ] Inference in 8-bit mode.
- [ ] User defined operations.
- [ ] Larger model zoo.
## Updates

#### Initial release (7/3/2019)

- Initial release of PaddleDetection and detection model zoo.
- Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask
  R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, Yolo v3, and SSD.
## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!
......@@ -3,7 +3,7 @@ train_feed: FasterRCNNTrainFeed
eval_feed: FasterRCNNEvalFeed
test_feed: FasterRCNNTestFeed
max_iters: 90000
use_gpu: true
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar
weights: output/retinanet_r101_fpn_1x/model_final
log_smooth_window: 20
......
(This file's source diff is too large to display; view the blob instead.)
# Introduction
PaddleDetection takes a rather principled approach to configuration management. We aim to automate the configuration workflow and to reduce configuration errors.
# Rationale
Presently, configuration in mainstream frameworks is usually dictionary based: the global config is simply a giant, loosely defined Python dictionary.

This approach is error prone, e.g., misspelled or misplaced keys may lead to serious errors in the training process, causing loss of time and wasted resources.

To avoid the common pitfalls, with automation and static analysis in mind, we propose a configuration design that is user friendly, easy to maintain, and extensible.
# Design
The design utilizes some of Python's reflection mechanisms to extract configuration schematics from Python class definitions.

To be specific, it extracts information from class constructor arguments, including names, docstrings, default values, and data types (if type hints are available).

This approach advocates modular and testable design, leading to a unified and extensible code base.
## API
Most of the functionality is exposed in the `ppdet.core.workspace` module.

- `register`: This decorator registers a class as a configurable module; it understands several special annotations in the class definition.
  - `__category__`: For better organization, modules are classified into categories.
  - `__inject__`: A list of constructor arguments that are intended to take module instances as input. Module instances will be created at runtime and injected. The corresponding configuration value can be a class name string, a serialized object, a config key pointing to a serialized object, or a dict (in which case the constructor needs to handle it; see the example below).
  - `__op__`: Shortcut for wrapping PaddlePaddle operators into callable objects. Together with `__append_doc__` (which extracts the docstring from the target PaddlePaddle operator automatically), this can be a real time saver.
- `serializable`: This decorator makes a class directly serializable in a yaml config file, by taking advantage of [pyyaml](https://pyyaml.org/wiki/PyYAMLDocumentation)'s serialization mechanism.
- `create`: Constructs a module instance according to the global configuration.
- `load_config` and `merge_config`: Load a yaml config file and merge config settings from the command line.
## Example
Take the `RPNHead` module for example; it is composed of several PaddlePaddle operators. We first wrap those operators into classes, then pass in instances of these classes when instantiating the `RPNHead` module.
```python
# excerpt from `ppdet/modeling/ops.py`
import paddle.fluid as fluid

from ppdet.core.workspace import register, serializable
# ... more operators
@register
@serializable
class GenerateProposals(object):
# NOTE this class simply wraps a PaddlePaddle operator
__op__ = fluid.layers.generate_proposals
# NOTE docstring for args are extracted from PaddlePaddle OP
__append_doc__ = True
def __init__(self,
pre_nms_top_n=6000,
post_nms_top_n=1000,
nms_thresh=.5,
min_size=.1,
eta=1.):
super(GenerateProposals, self).__init__()
self.pre_nms_top_n = pre_nms_top_n
self.post_nms_top_n = post_nms_top_n
self.nms_thresh = nms_thresh
self.min_size = min_size
self.eta = eta
# ... more operators
# excerpt from `ppdet/modeling/anchor_heads/rpn_head.py`
from ppdet.core.workspace import register
from ppdet.modeling.ops import AnchorGenerator, RPNTargetAssign, GenerateProposals
@register
class RPNHead(object):
"""
RPN Head
Args:
anchor_generator (object): `AnchorGenerator` instance
rpn_target_assign (object): `RPNTargetAssign` instance
train_proposal (object): `GenerateProposals` instance for training
test_proposal (object): `GenerateProposals` instance for testing
"""
__inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
rpn_target_assign=RPNTargetAssign().__dict__,
train_proposal=GenerateProposals(12000, 2000).__dict__,
test_proposal=GenerateProposals().__dict__):
super(RPNHead, self).__init__()
self.anchor_generator = anchor_generator
self.rpn_target_assign = rpn_target_assign
self.train_proposal = train_proposal
self.test_proposal = test_proposal
if isinstance(anchor_generator, dict):
self.anchor_generator = AnchorGenerator(**anchor_generator)
if isinstance(rpn_target_assign, dict):
self.rpn_target_assign = RPNTargetAssign(**rpn_target_assign)
if isinstance(train_proposal, dict):
self.train_proposal = GenerateProposals(**train_proposal)
if isinstance(test_proposal, dict):
self.test_proposal = GenerateProposals(**test_proposal)
```
The corresponding (generated) YAML snippet is as follows. Note that this is the configuration in **FULL**; all default values can be omitted. In the case of the above example, all arguments have default values, so nothing is required in the config file.
```yaml
RPNHead:
test_proposal:
eta: 1.0
min_size: 0.1
nms_thresh: 0.5
post_nms_top_n: 1000
pre_nms_top_n: 6000
train_proposal:
eta: 1.0
min_size: 0.1
nms_thresh: 0.5
post_nms_top_n: 2000
pre_nms_top_n: 12000
anchor_generator:
# ...
rpn_target_assign:
# ...
```
Example snippet that makes use of the `RPNHead` module:
```python
from ppdet.core.workspace import load_config, merge_config, create
load_config('some_config_file.yml')
merge_config(more_config_options_from_command_line)
rpn_head = create('RPNHead')
# ... code that uses the created module!
```
Configuration files can also include serialized objects, denoted with `!`, for example:
```yaml
LearningRate:
base_lr: 0.01
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones: [60000, 80000]
- !LinearWarmup
start_factor: 0.3333333333333333
steps: 500
```
# Requirements
Two Python packages are used; both are optional.
- [typeguard](https://github.com/agronholm/typeguard) is used for type checking in Python 3.
- [docstring\_parser](https://github.com/rr-/docstring_parser) is needed for docstring parsing.
To install them, simply run:
```shell
pip install typeguard http://github.com/willthefrog/docstring_parser/tarball/master
```
# Tooling
A small utility (`tools/configure.py`) is included to simplify the configuration process. It provides 4 commands to walk users through it:
1. `list`: List currently registered modules by category, one can also specify which category to list with the `--category` flag.
2. `help`: Get help information for a module, including description, options, configuration template and example command line flags.
3. `analyze`: Check the configuration file for missing/extraneous options, options with mismatched types (if type hints are given) and missing dependencies; it also highlights user-provided values (overridden default values).
4. `generate`: Generate a configuration template for a given list of modules. By default it generates a complete configuration file, which can be quite verbose; if a `--minimal` flag is given, it generates a template that only contains non-optional settings. For example, to generate a configuration for the Faster R-CNN architecture with a `ResNet` backbone and `FPN`, run:
```shell
python tools/configure.py generate FasterRCNN ResNet RPNHead RoIAlign BBoxAssigner BBoxHead FasterRCNNTrainFeed FasterRCNNTestFeed LearningRate OptimizerBuilder
```
For a minimal version, run:
```shell
python tools/configure.py --minimal generate FasterRCNN BBoxHead
```
## Introduction

The data pipeline is responsible for loading and converting data. Each
resulting data sample is a tuple of np.ndarrays.
For example, Faster R-CNN training uses samples of this format: `[(im,
im_info, im_id, gt_bbox, gt_class, is_crowd), (...)]`.
### Implementation

The data pipeline consists of four sub-systems: data parsing, image
pre-processing, data conversion and data feeding APIs.

Data samples are collected to form `dataset.Dataset`s; usually 3 sets are
needed for training, validation, and testing respectively.

First, `dataset.source` loads the data files into memory, then
`dataset.transform` processes them, and lastly, the batched samples
are fetched by `dataset.Reader`.

Sub-system details:

1. Data parsing

Parses various data sources and creates `dataset.Dataset` instances. Currently,
the following data sources are supported:
- COCO data source

Loads `COCO` type datasets with directory structures like this:
```
data/coco/
......@@ -29,9 +41,8 @@ This kind of source is used to load `COCO` data directly, eg: `COCO2017`. It's c
| ...
```
- Pascal VOC data source

Loads `Pascal VOC` like datasets with directory structures like this:
```
data/pascalvoc/
......@@ -59,28 +70,28 @@ This kind of source is used to load `VOC` data directly, eg: `VOC2007`. It's com
| ...
```
- Roidb data source

A generalized data source serialized as pickle files, which have the following
structure:

```python
(records, cname2id)
# `cname2id` is a dict which maps category names to class IDs
# and `records` is a list of dicts of this structure:
{
    'im_file': im_fname,   # image file name
    'im_id': im_id,        # image ID
    'h': im_h,             # height of image
    'w': im_w,             # width of image
    'is_crowd': is_crowd,  # crowd marker
    'gt_class': gt_class,  # ground truth class
    'gt_bbox': gt_bbox,    # ground truth bounding box
    'gt_poly': gt_poly,    # ground truth segmentation
}
```
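
A minimal sketch of inspecting such a file, assuming it is a pickled `(records, cname2id)` tuple as described above; the file name is hypothetical.

```python
import pickle

with open('roidb_sample.pkl', 'rb') as f:
    records, cname2id = pickle.load(f)
# basic sanity checks on the loaded structures
print('%d samples, %d classes' % (len(records), len(cname2id)))
print(records[0]['im_file'], records[0]['gt_class'])
```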
We provide a tool to generate roidb data sources. To convert a `COCO` or `VOC`
like dataset, run this command:

```sh
# --type: the type of the original data (xml or json)
# --annotation: the path of a file that lists the annotation files
# --save-dir: the save path
......@@ -92,81 +103,80 @@ python ./tools/generate_data_for_training.py
--samples=-1
```
2. Image preprocessing

The `dataset.transform.operator` module provides operations such as image
decoding, expanding, cropping, etc. Multiple operators are combined to form
larger processing pipelines.

3. Data transformer

Transforms a `dataset.Dataset` to achieve various desired effects. Notably, the
`dataset.transform.parallel_map` transformer accelerates image processing with
multi-threads or multi-processes. More transformers can be found in
`dataset.transform.transformer`.

4. Data feeding APIs

To facilitate data pipeline building, we combine multiple `dataset.Dataset`s to
form a `dataset.Reader` which can provide data for training, validation and
testing respectively. Users can simply call `Reader.[train|eval|infer]` to get
the corresponding data stream. Many aspects of the `Reader`, such as storage
location, preprocessing pipeline and acceleration mode, can be configured with
yaml files.
The main APIs are as follows:

1. Data parsing

 - `source/coco_loader.py`: COCO dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/coco_loader.py)
 - `source/voc_loader.py`: Pascal VOC dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/voc_loader.py)

 [Note] To use a non-default label list for VOC datasets, a `label_list.txt`
 file is needed; one can use the provided label list
 (`data/pascalvoc/ImageSets/Main/label_list.txt`) or generate a custom one
 (with `tools/generate_data_for_training.py`). Also, the `use_default_label`
 option should be set to `false` in the configuration file.

 - `source/loader.py`: Roidb dataset parser. [source](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/ppdet/data/source/loader.py)
2. Operator

 `transform/operators.py`: Contains a variety of data augmentation methods, including:

 - `DecodeImage`: Read images in RGB format.
 - `RandomFlipImage`: Horizontal flip.
 - `RandomDistort`: Distort brightness, contrast, saturation, and hue.
 - `ResizeImage`: Resize image with interpolation.
 - `RandomInterpImage`: Use a random interpolation method to resize the image.
 - `CropImage`: Crop image with respect to different scale, aspect ratio, and overlap.
 - `ExpandImage`: Pad image to a larger size, padding filled with mean image value.
 - `NormalizeImage`: Normalize image pixel values.
 - `NormalizeBox`: Normalize the bounding box.
 - `Permute`: Arrange the channels of the image and optionally convert to BGR format.
 - `MixupImage`: Mixup two images with a given fraction<sup>[1](#mix)</sup>.

 <a name="mix">[1]</a> Please refer to [this paper](https://arxiv.org/pdf/1710.09412.pdf).

 `transform/arrange_sample.py`: Assemble the data samples needed by different models.
3. Transformer

 `transform/post_map.py`: Transformations that operate on whole batches, mainly for:

 - Padding a whole batch to the given stride values
 - Resizing images to multiple scales
 - Randomly adjusting the image size of the batch data

 `transform/transformer.py`: Data filtering and batching.

 `transform/parallel_map.py`: Accelerate data processing with multi-threads/multi-processes.
4. Reader

 `reader.py`: Combine sources and transforms, and return batch data according to `max_iter`.

 `data_feed.py`: Configure default parameters for `reader.py`.
### Usage

#### Canned Datasets

Presets for common datasets, e.g., `MS-COCO` and `Pascal VOC`, are included. In
most cases, users can simply use these canned datasets as is. Moreover, the
whole data pipeline is fully customizable through the yaml configuration files.
For example, to read data for training:
``` python
ccfg = load_cfg('./config.yml')
coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
```
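
A hedged usage sketch of the resulting reader; the iteration protocol of `Reader.train()` is assumed from the `Reader.[train|eval|infer]` description above.

```python
# iterate over a few training batches produced by the reader built above
for i, batch in enumerate(coco.train()):
    if i >= 2:
        break
    print('batch %d holds %d samples' % (i, len(batch)))
```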
#### Custom Datasets

- Option 1: Convert the dataset to COCO or VOC format.

```sh
# A small utility (`tools/labelme2coco.py`) is provided to convert
# a Labelme-annotated dataset to COCO format.
python ./tools/labelme2coco.py --json_input_dir ./labelme_annos/
--image_input_dir ./labelme_imgs/
--output_dir ./cocome/
......@@ -180,13 +190,14 @@ coco = Reader(ccfg.DATA, ccfg.TRANSFORM, maxiter=-1)
# --val_proportion: the proportion of the annotated data used for validation
# --test_proportion: the proportion of the annotated data used for testing
```
- Option 2:

1. Add `source/XX_loader.py` and implement the `load` function, following the
example of `source/coco_loader.py` and `source/voc_loader.py`.
2. Modify the `load` function in `source/loader.py` to make use of the newly
added data loader.
3. Modify `source/__init__.py` accordingly:
```python
if data_cf['type'] in ['VOCSource', 'COCOSource', 'RoiDbSource', 'XXSource']:
    source_type = 'RoiDbSource'
```
4. In the configuration file, set the `type` of `dataset` to `XXSource`.
#### How to add data pre-processing?

- To add a pre-processing operation for a single image, refer to the classes in
`transform/operators.py`, and implement the desired transformation with a new
class, as sketched below.
- To add pre-processing for a batch, one needs to modify the `build_post_map`
function in `transform/post_map.py`.
# Getting Started

For setting up the test environment, please refer to [installation
instructions](INSTALL.md).
## Training

#### One-Device Training
```bash
export CUDA_VISIBLE_DEVICES=0
# export CPU_NUM=1 # for CPU training
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
#### Multi-Device Training

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 # set devices
# export CPU_NUM=8 # for CPU training
python tools/train.py -c configs/faster_rcnn_r50_1x.yml
```
- Datasets are stored in `dataset/coco` by default (configurable).
- Pretrained models are downloaded automatically and cached in `~/.cache/paddle/weights`.
- Model checkpoints are saved in `output` by default (configurable).
- To check out the hyper parameters used, please refer to the config file.
Alternating between training epochs and evaluation runs is possible; simply pass
in `--eval=True` to do so (tested with the `SSD` detector on Pascal-VOC; not
recommended for two-stage models or training sessions on the COCO dataset).
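
For example, a hedged invocation (the config file name below is an assumption; substitute the actual config of your model):

```bash
python tools/train.py -c configs/ssd_mobilenet_v1_voc.yml --eval=True
```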
## Evaluation
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/eval.py -c configs/faster_rcnn_r50_1x.yml
```
- Checkpoint is loaded from `output` by default (configurable).
- Multi-GPU evaluation for R-CNN and SSD models is not supported at the
  moment, but it is a planned feature.
## Inference with Pretrained Models

- Run inference on a single image:
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/000000570688.jpg
```
- Batch inference:
```bash
export CUDA_VISIBLE_DEVICES=0
# or run on CPU with:
# export CPU_NUM=1
python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_dir=demo
```
The visualization files are saved in `output` by default; to specify a different
path, simply add a `--save_file=` flag.
## FAQ
Q: Why do I get `NaN` loss values during single GPU training?

A: The default learning rate is tuned for multi-GPU training (8x GPUs); it must
be adapted for single GPU training accordingly (e.g., divided by 8), as
illustrated below.
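
For example, a hedged config override (assuming the multi-GPU default of `base_lr: 0.01` in `configs/faster_rcnn_r50_1x.yml`):

```yaml
LearningRate:
  base_lr: 0.00125  # 0.01 / 8 for single-GPU training
```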
# Installation
---
## Table of Contents
......@@ -12,22 +12,24 @@
## Introduction
This document covers how to install PaddleDetection, its dependencies
(including PaddlePaddle), together with the COCO and PASCAL VOC datasets.
For general information about PaddleDetection, please see [README.md](../README.md).
## PaddlePaddle
Running PaddleDetection requires PaddlePaddle Fluid v1.5 or later. Please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/en/1.4/beginners_guide/install/index_en.html).
Please make sure your PaddlePaddle installation was successful and the version
of your PaddlePaddle is not lower than required. Verify with the following commands.
```
# To check if PaddlePaddle installation was successful
python -c "import paddle.fluid as fluid; fluid.install_check.run_check()"
# To check PaddlePaddle version
python -c "import paddle; print(paddle.__version__)"
```
......@@ -41,9 +43,9 @@ python -c "import paddle; print(paddle.__version__)"
## Other Dependencies
[COCO-API](https://github.com/cocodataset/cocoapi):

COCO-API is needed for training. Installation is as follows:
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
......@@ -60,7 +62,8 @@ To train the model, COCO-API is needed. Installation is as follows:
**Clone Paddle models repository:**
You can clone the Paddle models repository and change the working directory to
PaddleDetection with the following commands:
```
cd <path/to/clone/models>
......@@ -68,15 +71,15 @@ git clone https://github.com/PaddlePaddle/models
cd models/PaddleCV/object_detection
```
**Install Python dependencies:**

Required Python packages are specified in [requirements.txt](./requirements.txt), and can be installed with:
```
pip install -r requirements.txt
```
**Make sure the tests pass:**
```
export PYTHONPATH=`pwd`:$PYTHONPATH
......@@ -86,18 +89,22 @@ python ppdet/modeling/tests/test_architectures.py
## Datasets
PaddleDetection includes support for [MSCOCO](http://cocodataset.org) and [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) by default; please follow these instructions to set up the datasets.
**Create symlinks for local datasets:**

The default dataset path in config files is `data/coco` and `data/voc`. If the
datasets are already available on disk, you can simply create symlinks to
their directories:
```
ln -sf <path/to/coco> <path/to/paddle_detection>/data/coco
ln -sf <path/to/voc> <path/to/paddle_detection>/data/voc
```
**Download datasets manually:**

On the other hand, to download the datasets, run the following commands:
- MS-COCO
......@@ -113,9 +120,14 @@ cd dataset/voc
./download.sh
```
**Download datasets automatically:**

If a training session is started but the dataset is not set up properly (e.g.,
not found in `data/coco` or `data/voc`), PaddleDetection will automatically
download them from [MSCOCO-2017](http://images.cocodataset.org) and
[VOC2012](http://host.robots.ox.ac.uk/pascal/VOC); the decompressed datasets
will be cached in `~/.cache/paddle/dataset/` and can be discovered automatically
subsequently.
**NOTE:** For further information on the datasets, please see [DATA.md](DATA.md)
......@@ -123,10 +123,12 @@ def load(fname,
elif os.path.isfile(fname):
    from . import voc_loader
    if use_default_label is None or cname2cid is not None:
        records, cname2cid = voc_loader.get_roidb(
            fname, samples, cname2cid, with_background=with_background)
    else:
        records, cname2cid = voc_loader.load(
            fname, samples, use_default_label,
            with_background=with_background)
else:
    raise ValueError('invalid file type when load data from file[%s]' %
                     (fname))
......
......@@ -18,7 +18,10 @@ import numpy as np
import xml.etree.ElementTree as ET
def get_roidb(anno_path,
              sample_num=-1,
              cname2cid=None,
              with_background=True):
"""
Load VOC records with annotations in xml directory 'anno_path'
......@@ -30,6 +33,9 @@ def get_roidb(anno_path, sample_num=-1, cname2cid=None):
anno_path (str): root directory for voc annotation data
sample_num (int): number of samples to load, -1 means all
cname2cid (dict): the label name to id dictionary
with_background (bool): whether to load background as a class.
                        if True, total class number will
                        be 21. default True.
Returns:
(records, catname2clsid)
......@@ -89,7 +95,7 @@ def get_roidb(anno_path, sample_num=-1, cname2cid=None):
cname = obj.find('name').text
if not existence and cname not in cname2cid:
# background's id is 0 when with_background is True,
# so new class ids start from int(with_background)
cname2cid[cname] = len(cname2cid) + int(with_background)
elif existence and cname not in cname2cid:
raise KeyError(
'Not found cname[%s] in cname2cid when map it to cid.' %
......@@ -129,7 +135,10 @@ def get_roidb(anno_path, sample_num=-1, cname2cid=None):
return [records, cname2cid]
def load(anno_path,
         sample_num=-1,
         use_default_label=True,
         with_background=True):
"""
Load VOC records with annotations in
xml directory 'anno_path'
......@@ -142,6 +151,9 @@ def load(anno_path, sample_num=-1, use_default_label=True):
@anno_path (str): root directory for voc annotation data
@sample_num (int): number of samples to load, -1 means all
@use_default_label (bool): whether use the default mapping of label to id
@with_background (bool): whether to load background as a class.
                         if True, total class number will
                         be 21. default True.
Returns:
(records, catname2clsid)
......@@ -165,21 +177,24 @@ def load(anno_path, sample_num=-1, use_default_label=True):
assert os.path.isfile(txt_file) and \
os.path.isdir(xml_path), 'invalid xml path'
# mapping category name to class id
# if with_background is True:
# background:0, first_class:1, second_class:2, ...
# if with_background is False:
# first_class:0, second_class:1, ...
records = []
ct = 0
cname2cid = {}
if not use_default_label:
    label_path = os.path.join(part[0], 'ImageSets/Main/label_list.txt')
    with open(label_path, 'r') as fr:
        label_id = int(with_background)
        for line in fr.readlines():
            cname2cid[line.strip()] = label_id
            label_id += 1
else:
    cname2cid = pascalvoc_label(with_background)
with open(txt_file, 'r') as fr:
while True:
line = fr.readline()
......@@ -241,7 +256,7 @@ def load(anno_path, sample_num=-1, use_default_label=True):
return [records, cname2cid]
def pascalvoc_label(with_background=True):
labels_map = {
'aeroplane': 1,
'bicycle': 2,
......@@ -264,4 +279,6 @@ def pascalvoc_label():
'train': 19,
'tvmonitor': 20
}
if not with_background:
labels_map = {k: v - 1 for k, v in labels_map.items()}
return labels_map
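
A quick sanity check of the resulting mapping (hypothetical usage; the import path follows the `source/voc_loader.py` location referenced above):

```python
from ppdet.data.source.voc_loader import pascalvoc_label

assert pascalvoc_label(True)['aeroplane'] == 1   # id 0 is the background
assert pascalvoc_label(False)['aeroplane'] == 0  # no background class
```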
......@@ -168,9 +168,12 @@ def load_and_fusebn(exe, prog, path):
if not bn_in_path:
    raise ValueError("The model in path {} has no params of batch norm."
                     .format(path))
all_vars += [v for v in mean_variances]

# load all params, including running mean and running variance,
# on the actual place into the global scope
fluid.io.load_vars(exe, path, prog, vars=all_vars)
eps = 1e-5
......
......@@ -48,9 +48,9 @@ def draw_mask(image, im_id, segms, threshold, alpha=0.7):
"""
Draw mask on image
"""
mask_color_id = 0
w_ratio = .4
color_list = colormap(rgb=True)
img_array = np.array(image).astype('float32')
for dt in np.array(segms):
if im_id != dt['image_id']:
......@@ -59,7 +59,6 @@ def draw_mask(image, im_id, segms, threshold, alpha=0.7):
if score < threshold:
continue
mask = mask_util.decode(segm) * 255
color_mask = color_list[mask_color_id % len(color_list), 0:3]
mask_color_id += 1
for c in range(3):
......@@ -77,34 +76,43 @@ def draw_bbox(image, im_id, catid2name, bboxes, threshold,
"""
draw = ImageDraw.Draw(image)
catid2color = {}
color_list = colormap(rgb=True)[:40]
for dt in np.array(bboxes):
if im_id != dt['image_id']:
continue
catid, bbox, score = dt['category_id'], dt['bbox'], dt['score']
if score < threshold:
continue
xmin, ymin, w, h = bbox
if is_bbox_normalized:
im_width, im_height = image.size
xmin *= im_width
ymin *= im_height
w *= im_width
h *= im_height
xmax = xmin + w
ymax = ymin + h
if catid not in catid2color:
idx = np.random.randint(len(color_list))
catid2color[catid] = color_list[idx]
color = tuple(catid2color[catid])
# draw bbox
draw.line(
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
(xmin, ymin)],
width=2,
fill=color)
# draw label
text = "{} {:.2f}".format(catid2name[catid], score)
tw, th = draw.textsize(text)
draw.rectangle([(xmin + 1, ymin - th),
(xmin + tw + 1, ymin)],
fill=color)
draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
return image
......@@ -80,7 +80,7 @@ def vocall_category_info(with_background=True):
with_background (bool, default True):
whether load background as class 0.
"""
label_map = pascalvoc_label(with_background)
label_map = sorted(label_map.items(), key=lambda x: x[1])
cats = [l[0] for l in label_map]
......
......@@ -16,7 +16,7 @@ Human-machine conversation is one of the most important topics in artificial int
# Task Description
Given a dialogue goal g and a set of topic-related background knowledge M = f<sub>1</sub>, f<sub>2</sub>, ..., f<sub>n</sub>, the system is expected to output an utterance "u<sub>t</sub>" for the current conversation H = u<sub>1</sub>, u<sub>2</sub>, ..., u<sub>t-1</sub>, which keeps the conversation coherent and informative under the guidance of the given goal. During the dialogue, the system is required to proactively lead the conversation from one topic to another. The dialog goal g is given like this: "Start->Topic_A->TOPIC_B", which means the machine should lead the conversation from any start state to topic A and then to topic B. The given background knowledge includes knowledge related to topic A and topic B, and the relations between these two topics.<br>
![image](https://github.com/PaddlePaddle/models/blob/wwqydy-patch-1/PaddleNLP/Research/ACL2019-DuConv/images/proactive_conversation_case.png)
![image](https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/Research/ACL2019-DuConv/images/proactive_conversation_case.png)
*Figure1.Proactive Conversation Case. Each utterance of "BOT" could be predicted by system, e.g., utterances with black words represent history H,and utterance with green words represent the response u<sub>t</sub> predicted by system.*
# DuConv
......@@ -44,4 +44,4 @@ We provide retrieval-based and generation-based baseline systems. Both systems w
| 3 | 46.40/0.422/0.289 | 0.118/0.303 |
* [Leader Board](https://ai.baidu.com/broad/leaderboard?dataset=duconv) is open forever <br>
We maintain a leader board which provides the official automatic evaluation. You can submit your result to https://ai.baidu.com/broad/submission?dataset=duconv to get the official result. Please make sure to submit the result of the test2 part.