config_en.md 17.7 KB
Newer Older
W
WenmuZhou 已提交
1
## Optional parameter list
K
Khanh Tran 已提交
2

W
WenmuZhou 已提交
3
The following list can be viewed through `--help`
K
Khanh Tran 已提交
4 5 6

|         FLAG             |     Supported script    |        Use        |      Defaults       |         Note         |
| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
W
WenmuZhou 已提交
7 8
|          -c              |      ALL       |  Specify configuration file to use  |  None  |  **Please refer to the parameter introduction for configuration file usage** |
|          -o              |      ALL       |  set configuration options  |  None  |  Configuration using -o has higher priority than the configuration file selected with -c. E.g: -o Global.use_gpu=false |
K
Khanh Tran 已提交
9

X
xxxpsyduck 已提交
10
## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE
K
Khanh Tran 已提交
11

W
WenmuZhou 已提交
12 13
Take rec_chinese_lite_train_v2.0.yml as an example
### Global
K
Khanh Tran 已提交
14

W
WenmuZhou 已提交
15
|         Parameter             |            Use                |      Defaults       |            Note            |
K
Khanh Tran 已提交
16
| :----------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
W
WenmuZhou 已提交
17 18
|      use_gpu             |    Set using GPU or not           |       true        |                \                 |
|      epoch_num           |    Maximum training epoch number             |       500        |                \                 |
W
WenmuZhou 已提交
19
|      log_smooth_window   |    Log queue length, the median value in the queue each time will be printed           |       20          |                \                 |
K
Khanh Tran 已提交
20
|      print_batch_step    |    Set print log interval         |       10          |                \                 |
W
WenmuZhou 已提交
21
|      save_model_dir      |    Set model save path        |  output/{算法名称}  |                \                 |
K
Khanh Tran 已提交
22
|      save_epoch_step     |    Set model save interval        |       3           |                \                 |
W
WenmuZhou 已提交
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
|      eval_batch_step     |    Set the model evaluation interval        | 2000 or [1000, 2000]        | runing evaluation every 2000 iters or evaluation is run every 2000 iterations after the 1000th iteration   |
|      cal_metric_during_train     |    Set whether to evaluate the metric during the training process. At this time, the metric of the model under the current batch is evaluated        |       true         |                \                 |
|      load_static_weights     |   Set whether the pre-training model is saved in static graph mode (currently only required by the detection algorithm)        |       true         |                \                 |
|      pretrained_model    |    Set the path of the pre-trained model      |  ./pretrain_models/CRNN/best_accuracy  |  \          |
|      checkpoints         |    set model parameter path            |       None        |   Used to load parameters after interruption to continue training|
|      use_visualdl  |    Set whether to enable visualdl for visual log display |          False        |    [Tutorial](https://www.paddlepaddle.org.cn/paddle/visualdl) |
|      infer_img            |    Set inference image path or folder path     |       ./infer_img | \|
|      character_dict_path |    Set dictionary path            |  ./ppocr/utils/ppocr_keys_v1.txt  |    \                 |
|      max_text_length     |    Set the maximum length of text        |       25          |                \                 |
|      character_type      |    Set character type            |       ch          |    en/ch, the default dict will be used for en, and the custom dict will be used for ch |
|      use_space_char     |    Set whether to recognize spaces             |        True      |          Only support in character_type=ch mode                 |
|      label_list          |    Set the angle supported by the direction classifier       |    ['0','180']    |     Only valid in angle classifier model |
|      save_res_path          |    Set the save address of the test model results       |    ./output/det_db/predicts_db.txt    |     Only valid in the text detection model |

### Optimizer ([ppocr/optimizer](../../ppocr/optimizer))

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      name        |         Optimizer class name          |  Adam  |  Currently supports`Momentum`,`Adam`,`RMSProp`, see [ppocr/optimizer/optimizer.py](../../ppocr/optimizer/optimizer.py)  |
|      beta1           |    Set the exponential decay rate for the 1st moment estimates  |       0.9         |               \             |
|      beta2           |    Set the exponential decay rate for the 2nd moment estimates  |     0.999         |               \             |
Z
zhoujun 已提交
44
|      clip_norm           |    The maximum norm value  |    -         |               \             |
W
WenmuZhou 已提交
45 46 47 48 49 50 51 52 53
|      **lr**                |         Set the learning rate decay method       |   -    |       \  |
|        name    |      Learning rate decay class name   |         Cosine       | Currently supports`Linear`,`Cosine`,`Step`,`Piecewise`, see[ppocr/optimizer/learning_rate.py](../../ppocr/optimizer/learning_rate.py) |
|        learning_rate      |    Set the base learning rate        |       0.001      |  \        |
|      **regularizer**      |  Set network regularization method        |       -      | \        |
|        name      |    Regularizer class name      |       L2     |  Currently support`L1`,`L2`, see[ppocr/optimizer/regularizer.py](../../ppocr/optimizer/regularizer.py)        |
|        factor      |    Learning rate decay coefficient       |       0.00004     |  \        |


### Architecture ([ppocr/modeling](../../ppocr/modeling))
L
LDOUBLEV 已提交
54
In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck and Head
W
WenmuZhou 已提交
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      model_type        |         Network Type          |  rec  |  Currently support`rec`,`det`,`cls`  |
|      algorithm           |    Model name  |       CRNN         |               See [algorithm_overview](./algorithm_overview.md) for the support list             |
|      **Transform**           |    Set the transformation method  |       -       |               Currently only recognition algorithms are supported, see [ppocr/modeling/transform](../../ppocr/modeling/transform) for details            |
|        name    |      Transformation class name   |         TPS       | Currently supports `TPS` |
|        num_fiducial      |   Number of TPS control points        |       20      |  Ten on the top and bottom       |
|        loc_lr      |    Localization network learning rate        |       0.1      |  \      |
|        model_name      |    Localization network size        |       small      |  Currently support`small`,`large`       |
|      **Backbone**      |  Set the network backbone class name        |       -      | see [ppocr/modeling/backbones](../../ppocr/modeling/backbones)        |
|        name      |    backbone class name       |       ResNet     | Currently support`MobileNetV3`,`ResNet`        |
|        layers      |    resnet layers       |       34     |  Currently support18,34,50,101,152,200       |
|        model_name      |    MobileNetV3 network size       |       small     |  Currently support`small`,`large`       |
|      **Neck**      |  Set network neck        |       -      | see[ppocr/modeling/necks](../../ppocr/modeling/necks)        |
|        name      |    neck class name       |       SequenceEncoder     | Currently support`SequenceEncoder`,`DBFPN`        |
|        encoder_type      |    SequenceEncoder encoder type       |       rnn     |  Currently support`reshape`,`fc`,`rnn`       |
|        hidden_size      |   rnn number of internal units       |       48     |  \      |
|        out_channels      |   Number of DBFPN output channels       |       256     |  \      |
|      **Head**      |  Set the network head        |       -      | see[ppocr/modeling/heads](../../ppocr/modeling/heads)        |
|        name      |    head class name       |       CTCHead     | Currently support`CTCHead`,`DBHead`,`ClsHead`        |
|        fc_decay      |    CTCHead regularization coefficient       |       0.0004     |  \      |
|        k      |   DBHead binarization coefficient       |       50     |  \      |
|        class_dim      |   ClsHead output category number       |       2     |  \      |


### Loss ([ppocr/losses](../../ppocr/losses))

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      name        |         loss class name          |  CTCLoss  |  Currently support`CTCLoss`,`DBLoss`,`ClsLoss`  |
|      balance_loss        |        Whether to balance the number of positive and negative samples in DBLossloss (using OHEM)         |  True  |  \  |
|      ohem_ratio        |        The negative and positive sample ratio of OHEM in DBLossloss         |  3  |  \  |
|      main_loss_type        |        The loss used by shrink_map in DBLossloss        |  DiceLoss  |  Currently support`DiceLoss`,`BCELoss`  |
|      alpha        |        The coefficient of shrink_map_loss in DBLossloss       |  5  |  \  |
|      beta        |        The coefficient of threshold_map_loss in DBLossloss       |  10  |  \  |
T
tink2123 已提交
91

W
WenmuZhou 已提交
92
### PostProcess ([ppocr/postprocess](../../ppocr/postprocess))
T
tink2123 已提交
93

W
WenmuZhou 已提交
94 95 96 97 98 99 100 101 102 103 104 105 106 107
|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      name        |         Post-processing class name          |  CTCLabelDecode  |  Currently support`CTCLoss`,`AttnLabelDecode`,`DBPostProcess`,`ClsPostProcess`  |
|      thresh        |        The threshold for binarization of the segmentation map in DBPostProcess         |  0.3  |  \  |
|      box_thresh        |        The threshold for filtering output boxes in DBPostProcess. Boxes below this threshold will not be output         |  0.7  |  \  |
|      max_candidates        |        The maximum number of text boxes output in DBPostProcess        |  1000  |   |
|      unclip_ratio        |        The unclip ratio of the text box in DBPostProcess       |  2.0  |  \  |

### Metric ([ppocr/metrics](../../ppocr/metrics))

|         Parameter             |            Use            |      Defaults        |            Note             |
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
|      name        |         Metric method name          |  CTCLabelDecode  |  Currently support`DetMetric`,`RecMetric`,`ClsMetric`  |
|      main_indicator        |        Main indicators, used to select the best model        |  acc |  For the detection method is hmean, the recognition and classification method is acc  |
T
tink2123 已提交
108

W
WenmuZhou 已提交
109 110
### Dataset  ([ppocr/data](../../ppocr/data))
|         Parameter             |            Use            |      Defaults        |            Note             |
T
tink2123 已提交
111
| :---------------------: |  :---------------------:   | :--------------:  |   :--------------------:   |
W
WenmuZhou 已提交
112
|      **dataset**        |         Return one sample per iteration          |  -  |  -  |
M
MissPenguin 已提交
113
|      name        |        dataset class name         |  SimpleDataSet |   Currently support`SimpleDataSet`,`LMDBDataSet`  |
W
WenmuZhou 已提交
114
|      data_dir        |        Image folder path        |  ./train_data |  \  |
M
MissPenguin 已提交
115
|      label_file_list        |        Groundtruth file path         |  ["./train_data/train_list.txt"] | This parameter is not required when dataset is LMDBDataSet   |
W
WenmuZhou 已提交
116 117 118 119 120 121
|      ratio_list        |        Ratio of data set         |  [1.0] | If there are two train_lists in label_file_list and ratio_list is [0.4,0.6], 40% will be sampled from train_list1, and 60% will be sampled from train_list2 to combine the entire dataset   |
|      transforms        |        List of methods to transform images and labels         |  [DecodeImage,CTCLabelEncode,RecResizeImg,KeepKeys] |   see[ppocr/data/imaug](../../ppocr/data/imaug)  |
|      **loader**        |        dataloader related         |  - |   |
|      shuffle        |        Does each epoch disrupt the order of the data set         |  True | \  |
|      batch_size_per_card        |        Single card batch size during training         |  256 | \  |
|      drop_last        |        Whether to discard the last incomplete mini-batch because the number of samples in the data set cannot be divisible by batch_size        |  True | \  |
W
WenmuZhou 已提交
122
|      num_workers        |        The number of sub-processes used to load data, if it is 0, the sub-process is not started, and the data is loaded in the main process       |  8 | \  |
L
LDOUBLEV 已提交
123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226


## 3. Multi-language config yml file generation

PaddleOCR currently supports 80 (except Chinese) language recognition. A multi-language configuration file template is
provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)

There are two ways to create the required configuration file::

1. Automatically generated by script

[generate_multi_language_configs.py](../../configs/rec/multi_language/generate_multi_language_configs.py) Can help you generate configuration files for multi-language models

- Take Italian as an example, if your data is prepared in the following format:
    ```
    |-train_data
        |- it_train.txt # train_set label
        |- it_val.txt # val_set label
        |- data
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
    ```

    You can use the default parameters to generate a configuration file:

    ```bash
    # The code needs to be run in the specified directory
    cd PaddleOCR/configs/rec/multi_language/
    # Set the configuration file of the language to be generated through the -l or --language parameter.
    # This command will write the default parameters into the configuration file
    python3 generate_multi_language_configs.py -l it
    ```

- If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters:

    ```bash
    # -l or --language field is required
    # --train to modify the training set
    # --val to modify the validation set
    # --data_dir to modify the data set directory
    # --dict to modify the dict path
    # -o to modify the corresponding default parameters
    cd PaddleOCR/configs/rec/multi_language/
    python3 generate_multi_language_configs.py -l it \  # language
    --train {path/of/train_label.txt} \ # path of train_label
    --val {path/of/val_label.txt} \     # path of val_label
    --data_dir {train_data/path} \      # root directory of training data
    --dict {path/of/dict} \             # path of dict
    -o Global.use_gpu=False             # whether to use gpu
    ...

    ```
Italian is made up of Latin letters, so after executing the command, you will get the rec_latin_lite_train.yml.

2. Manually modify the configuration file

   You can also manually modify the following fields in the template:

   ```
    Global:
      use_gpu: True
      epoch_num: 500
      ...
      character_type: it  # language
      character_dict_path:  {path/of/dict} # path of dict

   Train:
      dataset:
        name: SimpleDataSet
        data_dir: train_data/ # root directory of training data
        label_file_list: ["./train_data/train_list.txt"] # train label path
      ...

   Eval:
      dataset:
        name: SimpleDataSet
        data_dir: train_data/ # root directory of val data
        label_file_list: ["./train_data/val_list.txt"] # val label path
      ...

   ```

Currently, the multi-language algorithms supported by PaddleOCR are:

| Configuration file |  Algorithm name |   backbone |   trans   |   seq      |     pred     |  language | character_type |
| :--------: |  :-------:   | :-------:  |   :-------:   |   :-----:   |  :-----:   | :-----:  | :-----:  |
| rec_chinese_cht_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | chinese traditional  | chinese_cht|
| rec_en_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | English(Case sensitive)   | EN |
| rec_french_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | French |  french |
| rec_ger_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | German   | german |
| rec_japan_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Japanese | japan |
| rec_korean_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Korean  | korean |
| rec_latin_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | Latin  | latin |
| rec_arabic_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | arabic |  ar |
| rec_cyrillic_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | cyrillic   | cyrillic |
| rec_devanagari_lite_train.yml |  CRNN |   Mobilenet_v3 small 0.5 |  None   |  BiLSTM |  ctc  | devanagari  | devanagari |

For more supported languages, please refer to : [Multi-language model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md#4-support-languages-and-abbreviations)

The multi-language model training method is the same as the Chinese model. The training data set is 100w synthetic data. A small amount of fonts and test data can be downloaded using the following two methods.
* [Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA),Extraction code:frgi.
* [Google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)