ignore parameters with mismatched shape in transfer learning (#297)

* refine transfer learn
* refine transfer learning doc

6b98421d · wangguanzhong · GitHub · 9f76868d
5 changed files
--- a/docs/advanced_tutorials/TRANSFER_LEARNING.md
+++ b/docs/advanced_tutorials/TRANSFER_LEARNING.md
@@ -8,17 +8,32 @@ In transfer learning, if different dataset and the number of classes is used, th
 ## Transfer Learning in PaddleDetection
-In transfer learning, it's needed to load pretrained model selectively. The following two methods can be used:
+In transfer learning, the pretrained model needs to be loaded selectively. PaddleDetection provides two ways to do this.
-1. Set `finetune_exclude_pretrained_params` in YAML configuration files. Please refer to [configure file](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_mobilenet_v1_fruit.yml#L15)
+#### Load pretrain weights directly (**recommended**)
-2. Set -o finetune_exclude_pretrained_params in command line. For example:
+Parameters whose shapes differ between the model and pretrain\_weights are ignored automatically. For example:
+```python
+export PYTHONPATH=$PYTHONPATH:.
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
+                      -o pretrain_weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar
+```
+#### Use `finetune_exclude_pretrained_params` to specify the parameters to ignore
+The parameters to ignore can also be specified explicitly, and arbitrary parameter names can be added to `finetune_exclude_pretrained_params`. This can be done in either of the following ways:
+- Set `finetune_exclude_pretrained_params` in the YAML configuration file. Please refer to this [configuration file](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_mobilenet_v1_fruit.yml#L15)
+- Set `finetune_exclude_pretrained_params` on the command line. For example:
 ```python
 export PYTHONPATH=$PYTHONPATH:.
 export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                        -o pretrain_weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
-                           finetune_exclude_pretrained_params=['cls_score','bbox_pred']
+                           finetune_exclude_pretrained_params=['cls_score','bbox_pred'] \
 ```
 * Note:
@@ -26,7 +41,7 @@ python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
 1. The path in pretrain\_weights is the open-source model link of faster RCNN from COCO dataset. For full models link, please refer to [MODEL_ZOO](../MODEL_ZOO.md)
 2. The parameter fields are set in finetune\_exclude\_pretrained\_params. If the name of parameter matches field (wildcard matching), the parameter will be ignored in loading.
-If users want to fine-tune by own dataet, and remain the model construction, need to ignore the parameters related to the number of classes. PaddleDetection lists ignored parameter fields corresponding to different model type. The table is shown below: </br>
+To fine-tune on a custom dataset while keeping the model structure unchanged, users need to ignore the parameters related to the number of classes. PaddleDetection lists the ignored parameter fields for each model type in the table below: </br>
 |      model type    |         ignored parameter fields          |
 | :----------------: | :---------------------------------------: |

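Note 2 in the diff above describes the matching of `finetune_exclude_pretrained_params` fields as wildcard matching, while the `load_params` filter in this commit calls `re.match`, which only matches at the start of a name. The sketch below reproduces that filter in isolation. The helper name `select_ignored` and the variable names are hypothetical and chosen for illustration only; they are not taken from PaddleDetection.

```python
import re


def select_ignored(var_names, ignore_params):
    """Return the variable names matched by any field in ignore_params.

    Same test as the filter in load_params: re.match anchors at the
    start of the name, so each field acts as a regex prefix pattern.
    """
    return [
        var for var in var_names
        if any(re.match(pattern, var) for pattern in ignore_params)
    ]


# Hypothetical variable names, for illustration only.
var_names = [
    'cls_score_w_0',
    'cls_score_b_0',
    'bbox_pred_w_0',
    'conv1_weights',
    'rpn_cls_score_w_0',
]

ignored = select_ignored(var_names, ['cls_score', 'bbox_pred'])
print(ignored)
```

Under this reading, a field such as `cls_score` selects names that begin with `cls_score`, but not a name like `rpn_cls_score_w_0` where the field appears later in the string.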
--- a/docs/advanced_tutorials/TRANSFER_LEARNING_cn.md
+++ b/docs/advanced_tutorials/TRANSFER_LEARNING_cn.md
@@ -6,17 +6,33 @@
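 ## Transfer Learning in PaddleDetection
-In transfer learning, the pretrained model is loaded selectively, which can be done in the following two ways:
+In transfer learning, the pretrained model is loaded selectively. PaddleDetection supports the following two transfer learning approaches:
+#### Load pretrained weights directly (**recommended**)
+Parameters whose shapes differ between the model and the pretrained model are ignored automatically. For example: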
+```python
+export PYTHONPATH=$PYTHONPATH:.
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
+                           -o pretrain_weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar
+```
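+#### Use the `finetune_exclude_pretrained_params` option to control which parameter names are ignored
+The names of the parameters to ignore during training can also be specified explicitly, and any parameter name can be added to `finetune_exclude_pretrained_params`. This can be done in the following ways:
 1. Set the `finetune_exclude_pretrained_params` field in the YAML configuration file. See this [configuration file](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/yolov3_mobilenet_v1_fruit.yml#L15)
-2. Set -o finetune_exclude_pretrained_params in the launch arguments of train.py. For example:
+2. Set `finetune_exclude_pretrained_params` in the launch arguments of train.py. For example: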
 ```python
 export PYTHONPATH=$PYTHONPATH:.
 export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
                         -o pretrain_weights=https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_1x.tar \
-                           finetune_exclude_pretrained_params=['cls_score','bbox_pred']
+                           finetune_exclude_pretrained_params=['cls_score','bbox_pred'] \
 ```
 * Note:

--- a/docs/tutorials/GETTING_STARTED.md
+++ b/docs/tutorials/GETTING_STARTED.md
@@ -62,7 +62,18 @@ list below can be viewed by `--help`
 - Fine-tune other task
-  When using pre-trained model to fine-tune other task, two methods can be used:
+  When using a pre-trained model to fine-tune another task, pretrain\_weights can be used directly. Parameters with a different shape are ignored automatically. For example:
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  # If the shape of parameters in program is different from pretrain_weights,
+  # then PaddleDetection will not use such parameters.
+  python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
+                           -o pretrain_weights=output/faster_rcnn_r50_1x/model_final
+  ```
+  In addition, the names of the parameters to ignore can be specified explicitly. Two methods can be used:
  1. The excluded pre-trained parameters can be set by `finetune_exclude_pretrained_params` in YAML config
  2. Set -o finetune\_exclude\_pretrained_params in the arguments.

--- a/docs/tutorials/GETTING_STARTED_cn.md
+++ b/docs/tutorials/GETTING_STARTED_cn.md
@@ -60,7 +60,15 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/0000005
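 - Fine-tune other tasks
-  When using a pre-trained model to fine-tune other tasks, the following two methods can be used:
+  When using a pre-trained model to fine-tune other tasks, the pre-trained model can be loaded directly, and parameters with mismatched shapes are ignored automatically. For example: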
+  ```bash
+  export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  python -u tools/train.py -c configs/faster_rcnn_r50_1x.yml \
+                         -o pretrain_weights=output/faster_rcnn_r50_1x/model_final
+  ```
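+  The names of the parameters to ignore can also be specified explicitly, using either of the following two methods:
 1. Set `finetune_exclude_pretrained_params` in the YAML configuration file
 2. Add -o finetune\_exclude\_pretrained_params on the command line to load the pre-trained model selectively.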
@@ -93,7 +101,6 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/0000005
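 - If the dataset is not found locally, it is downloaded automatically and saved in `~/.cache/paddle/dataset`.
 - Pre-trained models are downloaded automatically and saved in `~/.cache/paddle/weights`.
 - Model checkpoints are saved in `output` by default; this can be changed through save_dir in the configuration file.
- CPU training of RCNN-series models is not yet supported in PaddlePaddle 1.5.1 and earlier.
 ### Mixed-precision training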

--- a/ppdet/utils/checkpoint.py
+++ b/ppdet/utils/checkpoint.py
@@ -99,9 +99,9 @@ def load_params(exe, prog, path, ignore_params=[]):
        exe (fluid.Executor): The fluid.Executor object.
        prog (fluid.Program): load weight to which Program object.
        path (string): URL string or local model path.
-        ignore_params (bool): ignore variable to load when finetuning.
+        ignore_params (list): variables to skip when loading for finetuning.
            It can be specified by finetune_exclude_pretrained_params 
-            and the usage can refer to docs/TRANSFER_LEARNING.md
+            and the usage can refer to docs/advanced_tutorials/TRANSFER_LEARNING.md
    """
    if is_url(path):
@@ -112,32 +112,31 @@ def load_params(exe, prog, path, ignore_params=[]):
    logger.info('Loading parameters from {}...'.format(path))
-    ignore_list = None
+    ignore_set = set()
+    state = _load_state(path)
+    # ignore parameters whose shapes mismatch
+    # between the model and the pretrained weights.
+    all_var_shape = {}
+    for block in prog.blocks:
+        for param in block.all_parameters():
+            all_var_shape[param.name] = param.shape
+    ignore_set.update([
+        name for name, shape in all_var_shape.items()
+        if name in state and shape != state[name].shape
+    ])
    if ignore_params:
        all_var_names = [var.name for var in prog.list_vars()]
        ignore_list = filter(
            lambda var: any([re.match(name, var) for name in ignore_params]),
            all_var_names)
-        ignore_list = list(ignore_list)
+        ignore_set.update(list(ignore_list))
-    if os.path.isdir(path):
+    if len(ignore_set) > 0:
-        if not ignore_list:
+        for k in ignore_set:
-            fluid.load(prog, path, executor=exe)
-            return
-        # XXX this is hackish, but seems to be the least contrived way...
-        tmp = tempfile.mkdtemp()
-        dst = os.path.join(tmp, os.path.basename(os.path.normpath(path)))
-        shutil.copytree(path, dst, ignore=shutil.ignore_patterns(*ignore_list))
-        fluid.load(prog, dst, executor=exe)
-        shutil.rmtree(tmp)
-        return
-    state = _load_state(path)
-    if ignore_list:
-        for k in ignore_list:
            if k in state:
+                logger.warning('variable {} not used'.format(k))
                del state[k]
    fluid.io.set_program_state(prog, state)
@@ -217,19 +216,12 @@ def load_and_fusebn(exe, prog, path):
    #  x is any prefix
    mean_variances = set()
    bn_vars = []
-    state = None
-    if os.path.exists(path + '.pdparams'):
    state = _load_state(path)
    def check_mean_and_bias(prefix):
        m = prefix + 'mean'
        v = prefix + 'variance'
-        if state:
        return v in state and m in state
-        else:
-            return (os.path.exists(os.path.join(path, m)) and
-                    os.path.exists(os.path.join(path, v)))
    has_mean_bias = True
@@ -269,17 +261,14 @@ def load_and_fusebn(exe, prog, path):
                    bn_vars.append(
                        [scale_name, bias_name, mean_name, variance_name])
-    if state:
-        fluid.io.set_program_state(prog, state)
-    else:
-        load_params(exe, prog, path)
    if not has_mean_bias:
+        fluid.io.set_program_state(prog, state)
        logger.warning(
            "There are no parameters of batch norm in model {}. "
            "Skip fusing batch norm. Loading parameters done.".format(path))
        return
+    fluid.load(prog, path, exe)
    eps = 1e-5
    for names in bn_vars:
        scale_name, bias_name, mean_name, var_name = names
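The shape-based filtering that this commit adds to `load_params` can be sketched outside of Paddle. The following is a minimal illustration, not the library code: plain dictionaries stand in for the program's parameters and the loaded state, `SimpleNamespace` objects stand in for the loaded arrays (anything with a `.shape` attribute would do), and the helper names `mismatched_names` and `drop_ignored`, as well as the parameter names and shapes, are hypothetical.

```python
from types import SimpleNamespace


def mismatched_names(model_shapes, state):
    """Names present in both the model and the state whose shapes differ."""
    return {
        name for name, shape in model_shapes.items()
        # tuple() normalizes list-vs-tuple shapes; an assumption of this
        # sketch, the diff compares the two shapes directly.
        if name in state and tuple(shape) != tuple(state[name].shape)
    }


def drop_ignored(state, ignore_set):
    """Return a copy of state without the ignored entries."""
    return {k: v for k, v in state.items() if k not in ignore_set}


# Hypothetical model parameters: a 21-class head and a shared backbone layer.
model_shapes = {
    'cls_score_w': (2048, 21),
    'conv1_w': (64, 3, 7, 7),
    'new_head_w': (10,),  # not in the pretrained state, so never compared
}

# Hypothetical pretrained state: the head was trained with 81 classes.
state = {
    'cls_score_w': SimpleNamespace(shape=(2048, 81)),
    'conv1_w': SimpleNamespace(shape=(64, 3, 7, 7)),
}

ignore_set = mismatched_names(model_shapes, state)
filtered = drop_ignored(state, ignore_set)
print(sorted(ignore_set), sorted(filtered))
```

Only names that exist on both sides are compared, so a parameter missing from the pretrained state is left to its initializer rather than reported as a mismatch.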