CONFIG.md 8.3 KB
Newer Older
Q
qingqing01 已提交
1 2
English | [简体中文](CONFIG_cn.md)

3 4 5
# Config Pipline

## Introduction
6 7 8 9

PaddleDetection takes a rather principled approach to configuration management. We aim to automate the configuration workflow and to reduce configuration errors.


10
## Rationale
11 12 13 14 15 16 17 18

Presently, configuration in mainstream frameworks are usually dictionary based: the global config is simply a giant, loosely defined Python dictionary.

This approach is error prone, e.g., misspelled or displaced keys may lead to serious errors in training process, causing time loss and wasted resources.

To avoid the common pitfalls, with automation and static analysis in mind, we propose a configuration design that is user friendly, easy to maintain and extensible.


19
## Design
20 21 22 23 24 25 26 27

The design utilizes some of Python's reflection mechanism to extract configuration schematics from Python class definitions.

To be specific, it extracts information from class constructor arguments, including names, docstrings, default values, data types (if type hints are available).

This approach advocates modular and testable design, leading to a unified and extensible code base.


28
### API
29 30 31 32 33 34 35 36 37 38 39 40

Most of the functionality is exposed in `ppdet.core.workspace` module.

-   `register`: This decorator register a class as configurable module; it understands several special annotations in the class definition.
    -   `__category__`: For better organization, modules are classified into categories.
    -   `__inject__`: A list of constructor arguments, which are intended to take module instances as input, module instances will be created at runtime an injected. The corresponding configuration value can be a class name string, a serialized object, a config key pointing to a serialized object, or a dict (in which case the constructor needs to handle it, see example below).
    -   `__op__`: Shortcut for wrapping PaddlePaddle operators into a callable objects, together with `__append_doc__` (extracting docstring from target PaddlePaddle operator automatically), this can be a real time saver.
-   `serializable`: This decorator make a class directly serializable in yaml config file, by taking advantage of [pyyaml](https://pyyaml.org/wiki/PyYAMLDocumentation)'s serialization mechanism.
-   `create`: Constructs a module instance according to global configuration.
-   `load_config` and `merge_config`: Loading yaml file and merge config settings from command line.


41
### Example
42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117

Take the `RPNHead` module for example, it is composed of several PaddlePaddle operators. We first wrap those operators into classes, then pass in instances of these classes when instantiating the `RPNHead` module.

```python
# excerpt from `ppdet/modeling/ops.py`
from ppdet.core.workspace import register, serializable

# ... more operators

@register
@serializable
class GenerateProposals(object):
    # NOTE this class simply wraps a PaddlePaddle operator
    __op__ = fluid.layers.generate_proposals
    # NOTE docstring for args are extracted from PaddlePaddle OP
    __append_doc__ = True

    def __init__(self,
                 pre_nms_top_n=6000,
                 post_nms_top_n=1000,
                 nms_thresh=.5,
                 min_size=.1,
                 eta=1.):
        super(GenerateProposals, self).__init__()
        self.pre_nms_top_n = pre_nms_top_n
        self.post_nms_top_n = post_nms_top_n
        self.nms_thresh = nms_thresh
        self.min_size = min_size
        self.eta = eta

# ... more operators

# excerpt from `ppdet/modeling/anchor_heads/rpn_head.py`
from ppdet.core.workspace import register
from ppdet.modeling.ops import AnchorGenerator, RPNTargetAssign, GenerateProposals

@register
class RPNHead(object):
    """
    RPN Head

    Args:
        anchor_generator (object): `AnchorGenerator` instance
        rpn_target_assign (object): `RPNTargetAssign` instance
        train_proposal (object): `GenerateProposals` instance for training
        test_proposal (object): `GenerateProposals` instance for testing
    """
    __inject__ = [
        'anchor_generator', 'rpn_target_assign', 'train_proposal',
        'test_proposal'
    ]

    def __init__(self,
                 anchor_generator=AnchorGenerator().__dict__,
                 rpn_target_assign=RPNTargetAssign().__dict__,
                 train_proposal=GenerateProposals(12000, 2000).__dict__,
                 test_proposal=GenerateProposals().__dict__):
        super(RPNHead, self).__init__()
        self.anchor_generator = anchor_generator
        self.rpn_target_assign = rpn_target_assign
        self.train_proposal = train_proposal
        self.test_proposal = test_proposal
        if isinstance(anchor_generator, dict):
            self.anchor_generator = AnchorGenerator(**anchor_generator)
        if isinstance(rpn_target_assign, dict):
            self.rpn_target_assign = RPNTargetAssign(**rpn_target_assign)
        if isinstance(train_proposal, dict):
            self.train_proposal = GenerateProposals(**train_proposal)
        if isinstance(test_proposal, dict):
            self.test_proposal = GenerateProposals(**test_proposal)
```

The corresponding(generated) YAML snippet is as follows, note this is the configuration in **FULL**, all the default values can be omitted. In case of the above example, all arguments have default value, meaning nothing is required in the config file.

```yaml
RPNHead:
Q
qingqing01 已提交
118
  test_proposal:
119 120 121 122 123
    eta: 1.0
    min_size: 0.1
    nms_thresh: 0.5
    post_nms_top_n: 1000
    pre_nms_top_n: 6000
Q
qingqing01 已提交
124
  train_proposal:
125 126 127 128 129 130 131 132 133 134 135 136 137 138
    eta: 1.0
    min_size: 0.1
    nms_thresh: 0.5
    post_nms_top_n: 2000
    pre_nms_top_n: 12000
  anchor_generator:
    # ...
  rpn_target_assign:
    # ...
```

Example snippet that make use of the `RPNHead` module.

```python
W
wangguanzhong 已提交
139
from ppdet.core.workspace import load_config, merge_config, create
140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162

load_config('some_config_file.yml')
merge_config(more_config_options_from_command_line)

rpn_head = create('RPNHead')
# ... code that use the created module!
```

Configuration file can also have serialized objects in it, denoted with `!`, for example

```yaml
LearningRate:
  base_lr: 0.01
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [60000, 80000]
  - !LinearWarmup
    start_factor: 0.3333333333333333
    steps: 500
```


163
## Requirements
164 165 166 167 168 169 170 171 172 173 174 175 176

Two Python packages are used, both are optional.

-   [typeguard](https://github.com/agronholm/typeguard) is used for type checking in Python 3.
-   [docstring\_parser](https://github.com/rr-/docstring_parser) is needed for docstring parsing.

To install them, simply run:

```shell
pip install typeguard http://github.com/willthefrog/docstring_parser/tarball/master
```


177
## Tooling
178 179 180 181 182 183 184 185 186 187 188 189 190 191 192

A small utility (`tools/configure.py`) is included to simplify the configuration process, it provides 4 commands to walk users through the configuration process:

1.  `list`: List currently registered modules by category, one can also specify which category to list with the `--category` flag.
2.  `help`: Get help information for a module, including description, options, configuration template and example command line flags.
3.  `analyze`: Check configuration file for missing/extraneous options, options with mismatch type (if type hint is given) and missing dependencies, it also highlights user provided values (overridden default values).
4.  `generate`: Generate a configuration template for a given list of modules. By default it generates a complete configuration file, which can be quite verbose; if a `--minimal` flag is given, it generates a template that only contain non optional settings. For example, to generate a configuration for Faster R-CNN architecture with `ResNet` backbone and `FPN`, run:

    ```shell
    python tools/configure.py generate FasterRCNN ResNet RPNHead RoIAlign BBoxAssigner BBoxHead FasterRCNNTrainFeed FasterRCNNTestFeed LearningRate OptimizerBuilder
    ```

    For a minimal version, run:

    ```shell
Y
Yang Zhang 已提交
193
    python tools/configure.py generate --minimal FasterRCNN BBoxHead
194
    ```
195 196


197
## FAQ
198 199 200 201 202 203 204 205

**Q:** There are some configuration options that are used by multiple modules (e.g., `num_classes`), how do I avoid duplication in config files?

**A:** We provided a `__shared__` annotation for exactly this purpose, simply annotate like this `__shared__ = ['num_classes']`. It works as follows:

1.  if `num_classes` is configured for a module in config file, it takes precedence.
2.  if `num_classes` is not configured for a module but is present in the config file as a global key, its value will be used.
3.  otherwise, the default value (`81`) will be used.