diff --git a/README.md b/README.md index a19efa5681c102984ed14f1cc02fbf3eb49a351d..36c913b0d133d4e72b89d05cc81485c4dd2ca29a 100644 --- a/README.md +++ b/README.md @@ -72,6 +72,7 @@ PaddleDetection的目的是为工业界和学术界提供丰富、易用的目 - [安装说明](docs/tutorials/INSTALL_cn.md) - [快速开始](docs/tutorials/QUICK_STARTED_cn.md) - [训练/评估/预测流程](docs/tutorials/GETTING_STARTED_cn.md) +- [常见问题汇总](docs/tutorials/FAQ.md) ### 进阶教程 - [数据预处理及自定义数据集](docs/advanced_tutorials/READER.md) @@ -94,9 +95,9 @@ PaddleDetection的目的是为工业界和学术界提供丰富、易用的目 ## 模型库 - [模型库](docs/MODEL_ZOO_cn.md) -- [人脸检测模型](configs/face_detection/README.md) +- [人脸检测模型](configs/face_detection/README.md) 开源BlazeFace系列模型,Wider-Face数据集上最高精度达到91.5%,同时具备了较高的预测性能 - [行人检测和车辆检测预训练模型](contrib/README_cn.md) 针对不同场景的检测模型 -- [YOLOv3增强模型](docs/featured_model/YOLOv3_ENHANCEMENT.md) 改进原始YOLOv3,精度达到41.4%,原论文精度为33.0%,同时预测速度也得到提升 +- [YOLOv3增强模型](docs/featured_model/YOLOv3_ENHANCEMENT.md) 改进原始YOLOv3,精度达到43.6%,原论文精度为33.0%,同时预测速度也得到提升 - [Objects365 2019 Challenge夺冠模型](docs/featured_model/CACascadeRCNN.md) Objects365 Full Track任务中最好的单模型之一,精度达到31.7% - [Open Images V5和Objects365数据集模型](docs/featured_model/OIDV5_BASELINE_MODEL.md) diff --git a/README_en.md b/README_en.md index 85dffdd025c1d393f5d75e9d7efc1a2a0217497e..d1c2f94d11e18f438362b5f253b39b65b5e0e6f0 100644 --- a/README_en.md +++ b/README_en.md @@ -81,7 +81,8 @@ Advanced Features: - [Installation guide](docs/tutorials/INSTALL.md) - [Quick start on small dataset](docs/tutorials/QUICK_STARTED.md) -- For detailed training and evaluation workflow, please refer to [GETTING_STARTED](docs/tutorials/GETTING_STARTED.md) +- [Train/Evaluation/Inference](docs/tutorials/GETTING_STARTED.md) +- [FAQ](docs/tutorials/FAQ.md) ### Advanced Tutorial @@ -105,9 +106,9 @@ Advanced Features: ## Model Zoo - Pretrained models are available in the [PaddleDetection model zoo](docs/MODEL_ZOO.md). 
-- [Face detection models](configs/face_detection/README.md) +- [Face detection models](configs/face_detection/README.md) BlazeFace series models with precision of up to 91.5% on the Wider-Face dataset and outstanding inference performance. - [Pretrained models for pedestrian and vehicle detection](contrib/README.md) Models for object detection in specific scenarios. -- [YOLOv3 enhanced model](docs/YOLOv3_ENHANCEMENT.md) Compared to MAP of 33.0% in paper, enhanced YOLOv3 reaches the MAP of 41.4% and inference speed is improved as well +- [YOLOv3 enhanced model](docs/YOLOv3_ENHANCEMENT.md) Compared to the mAP of 33.0% in the original paper, the enhanced YOLOv3 reaches an mAP of 43.6%, and inference speed is improved as well - [Objects365 2019 Challenge champion model](docs/CACascadeRCNN.md) One of the best single models in Objects365 Full Track of which MAP reaches 31.7%. - [Open Images Dataset V5 and Objects365 Dataset models](docs/OIDV5_BASELINE_MODEL.md) diff --git a/docs/advanced_tutorials/CONFIG.md b/docs/advanced_tutorials/CONFIG.md index 709b6ecf7aa0bcce45acdcc7397b25deec28a840..c5acc67d8818c371b065a27770e79cd7e90d4bb4 100644 --- a/docs/advanced_tutorials/CONFIG.md +++ b/docs/advanced_tutorials/CONFIG.md @@ -192,14 +192,3 @@ A small utility (`tools/configure.py`) is included to simplify the configuration ```shell python tools/configure.py generate --minimal FasterRCNN BBoxHead ``` - - -## FAQ - -**Q:** There are some configuration options that are used by multiple modules (e.g., `num_classes`), how do I avoid duplication in config files? - -**A:** We provided a `__shared__` annotation for exactly this purpose, simply annotate like this `__shared__ = ['num_classes']`. It works as follows: - -1. if `num_classes` is configured for a module in config file, it takes precedence. -2. if `num_classes` is not configured for a module but is present in the config file as a global key, its value will be used. -3. otherwise, the default value (`81`) will be used. 
diff --git a/docs/advanced_tutorials/CONFIG_cn.md b/docs/advanced_tutorials/CONFIG_cn.md index fe1f7e31f57925dbb525602533565f69a1488bad..7899ace816bee3a4efd1939a7174eb6e17cc22a0 100644 --- a/docs/advanced_tutorials/CONFIG_cn.md +++ b/docs/advanced_tutorials/CONFIG_cn.md @@ -182,14 +182,3 @@ pip install typeguard http://github.com/willthefrog/docstring_parser/tarball/mas ```shell python tools/configure.py generate --minimal FasterRCNN BBoxHead ``` - - -## FAQ - -**Q:** 某些配置项会在多个模块中用到(如 `num_classes`),如何避免在配置文件中多次重复设置? - -**A:** 框架提供了 `__shared__` 标记来实现配置的共享,用户可以标记参数,如 `__shared__ = ['num_classes']` ,配置数值作用规则如下: - -1. 如果模块配置中提供了 `num_classes` ,会优先使用其数值。 -2. 如果模块配置中未提供 `num_classes` ,但配置文件中存在全局键值,那么会使用全局键值。 -3. 两者均为配置的情况下,将使用默认值(`81`)。 diff --git a/docs/advanced_tutorials/READER.md b/docs/advanced_tutorials/READER.md index 65f51283db0c0a3324593382972a6c880b64e6e6..532ebc0eb6d66ca5672ae4cbd406c7ba0b55b81d 100644 --- a/docs/advanced_tutorials/READER.md +++ b/docs/advanced_tutorials/READER.md @@ -16,7 +16,6 @@ - [评估配置](#评估配置) - [推理配置](#推理配置) - [运行](#运行) -- [FAQ](#faq) ## 简介 PaddleDetection的数据处理模块是一个Python模块,所有代码逻辑在`ppdet/data/`中,数据处理模块用于加载数据并将其转换成适用于物体检测模型的训练、评估、推理所需要的格式。 @@ -437,10 +436,4 @@ loader.set_sample_list_generator(reader, place) ``` 在运行程序中设置完数据处理模块后,就可以开始训练、评估与测试了,具体请参考相应运行程序python源码。 -## FAQ - -**Q:** 在配置文件中设置use_process=True,并且运行报错:`not enough space for reason[failed to malloc 601 pages...` - -**A:** 当前Reader的共享存储队列空间不足,请增大配置文件`xxx.yml`中的`memsize`,如`memsize: 3G`->`memsize: 6G`。或者配置文件中设置`use_process=False`。 - > 关于数据处理模块,如您有其他问题或建议,请给我们提issue,我们非常欢迎您的反馈。 diff --git a/docs/tutorials/FAQ.md b/docs/tutorials/FAQ.md new file mode 100644 index 0000000000000000000000000000000000000000..b583ba6c48360a24d8cbd1de9cb17ae4f1b78737 --- /dev/null +++ b/docs/tutorials/FAQ.md @@ -0,0 +1,36 @@ +## FAQ + +**Q:** 为什么我使用单GPU训练loss会出`NaN`?
+**A:** 默认学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整学习率(例如,除以8)。 +计算规则表如下所示,它们是等价的,表中变化节点即为`piecewise decay`里的`boundaries`:
+ + +| GPU数 | 学习率 | 最大轮数 | 变化节点 | +| :---------: | :------------: | :-------: | :--------------: | +| 2 | 0.0025 | 720000 | [480000, 640000] | +| 4 | 0.005 | 360000 | [240000, 320000] | +| 8 | 0.01 | 180000 | [120000, 160000] | + + +**Q:** 如何减少GPU显存使用率?
+**A:** 可通过设置环境变量`FLAGS_conv_workspace_size_limit`为较小的值来减少显存消耗,并且不 +会影响训练速度。以Mask-RCNN(R50)为例,设置`export FLAGS_conv_workspace_size_limit=512`, +batch size可以达到每GPU 4 (Tesla V100 16GB)。 + + +**Q:** 如何修改数据预处理?<br>
+**A:** 可在配置文件中设置 `sample_transform`。注意需要在配置文件中加入**完整预处理流程**, +例如RCNN模型中的`DecodeImage`、`NormalizeImage`和`Permute`。 + +**Q:** affine_channel和batch norm是什么关系?<br>
+**A:** 在RCNN系列模型加载预训练模型初始化,有时候会固定住batch norm的参数, 使用预训练模型中的全局均值和方差,并且batch norm的scale和bias参数不更新,已发布的大多ResNet系列的RCNN模型采用这种方式。这种情况下可以在config中设置norm_type为bn或affine_channel, freeze_norm为true (默认为true),两种方式等价。affine_channel的计算方式为`scale * x + bias`。只不过设置affine_channel时,内部对batch norm的参数自动做了融合。如果训练使用的是affine_channel,用保存的模型做初始化,训练其他任务时,既可使用affine_channel, 也可使用batch norm, 参数均可正确加载。 + +**Q:** 某些配置项会在多个模块中用到(如 `num_classes`),如何避免在配置文件中多次重复设置?<br> +**A:** 框架提供了 `__shared__` 标记来实现配置的共享,用户可以标记参数,如 `__shared__ = ['num_classes']` ,配置数值作用规则如下: + +1. 如果模块配置中提供了 `num_classes` ,会优先使用其数值。 +2. 如果模块配置中未提供 `num_classes` ,但配置文件中存在全局键值,那么会使用全局键值。 +3. 两者均未配置的情况下,将使用默认值(`81`)。 + +**Q:** 在配置文件中设置use_process=True,并且运行报错:`not enough space for reason[failed to malloc 601 pages...`<br> +**A:** 当前Reader的共享存储队列空间不足,请增大配置文件`xxx.yml`中的`memsize`,如`memsize: 3G`->`memsize: 6G`。或者配置文件中设置`use_process=False`。 diff --git a/docs/tutorials/GETTING_STARTED.md b/docs/tutorials/GETTING_STARTED.md index 758f13a3243795c6ab97abb61bc6ab8597556f39..405f1eb0e09c0be1b577184df235b8a06ffdf0cd 100644 --- a/docs/tutorials/GETTING_STARTED.md +++ b/docs/tutorials/GETTING_STARTED.md @@ -100,7 +100,7 @@ list below can be viewed by `--help` ##### NOTES -- `CUDA_VISIBLE_DEVICES` can specify different gpu numbers. Such as: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU calculation rules can refer [FAQ](#faq) +- `CUDA_VISIBLE_DEVICES` can specify different GPU devices. Such as: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU calculation rules can refer to the [FAQ](./FAQ.md) - Dataset will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not be found locally. - Pretrained model is downloaded automatically and cached in `~/.cache/paddle/weights`. - Checkpoints are saved in `output` by default, and can be revised from save_dir in configure files. @@ -180,29 +180,3 @@ moment, but it is a planned feature ``` Save inference model `tools/export_model.py`, which can be loaded by PaddlePaddle predict library. 
- -## FAQ - -**Q:** Why do I get `NaN` loss values during single GPU training?
-**A:** The default learning rate is tuned to multi-GPU training (8x GPUs), it must -be adapted for single GPU training accordingly (e.g., divide by 8). -The calculation rules are as follows,they are equivalent:
- - -| GPU number | Learning rate | Max_iters | Milestones | -| :---------: | :------------: | :-------: | :--------------: | -| 2 | 0.0025 | 720000 | [480000, 640000] | -| 4 | 0.005 | 360000 | [240000, 320000] | -| 8 | 0.01 | 180000 | [120000, 160000] | - - -**Q:** How to reduce GPU memory usage?
-**A:** Setting environment variable FLAGS_conv_workspace_size_limit to a smaller -number can reduce GPU memory footprint without affecting training speed. -Take Mask-RCNN (R50) as example, by setting `export FLAGS_conv_workspace_size_limit=512`, -batch size could reach 4 per GPU (Tesla V100 16GB). - - -**Q:** How to change data preprocessing?
-**A:** Set `sample_transform` in configuration. Note that **the whole transforms** need to be added in configuration. -For example, `DecodeImage`, `NormalizeImage` and `Permute` in RCNN models. diff --git a/docs/tutorials/GETTING_STARTED_cn.md b/docs/tutorials/GETTING_STARTED_cn.md index 4409e2adb8fc532bd4eb6e5d5863032444483459..47751bb49571084febe74cc31153a93155ee94b9 100644 --- a/docs/tutorials/GETTING_STARTED_cn.md +++ b/docs/tutorials/GETTING_STARTED_cn.md @@ -97,7 +97,7 @@ python tools/infer.py -c configs/faster_rcnn_r50_1x.yml --infer_img=demo/0000005 **提示:** -- `CUDA_VISIBLE_DEVICES` 参数可以指定不同的GPU。例如: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU计算规则可以参考 [FAQ](#faq) +- `CUDA_VISIBLE_DEVICES` 参数可以指定不同的GPU。例如: `export CUDA_VISIBLE_DEVICES=0,1,2,3`. GPU计算规则可以参考 [FAQ](./FAQ.md) - 若本地未找到数据集,将自动下载数据集并保存在`~/.cache/paddle/dataset`中。 - 预训练模型自动下载并保存在`〜/.cache/paddle/weights`中。 - 模型checkpoints默认保存在`output`中,可通过修改配置文件中save_dir进行配置。 @@ -161,30 +161,3 @@ python -m paddle.distributed.launch --selected_gpus 0,1,2,3,4,5,6,7 tools/train. `--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算, 不同阈值会产生不同的结果。如果用户需要对自定义路径的模型进行推断,可以设置`-o weights`指定模型路径。 - -## FAQ - -**Q:** 为什么我使用单GPU训练loss会出`NaN`?
-**A:** 默认学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整学习率(例如,除以8)。 -计算规则表如下所示,它们是等价的,表中变化节点即为`piecewise decay`里的`boundaries`:
- - -| GPU数 | 学习率 | 最大轮数 | 变化节点 | -| :---------: | :------------: | :-------: | :--------------: | -| 2 | 0.0025 | 720000 | [480000, 640000] | -| 4 | 0.005 | 360000 | [240000, 320000] | -| 8 | 0.01 | 180000 | [120000, 160000] | - - -**Q:** 如何减少GPU显存使用率?
-**A:** 可通过设置环境变量`FLAGS_conv_workspace_size_limit`为较小的值来减少显存消耗,并且不 -会影响训练速度。以Mask-RCNN(R50)为例,设置`export FLAGS_conv_workspace_size_limit = 512`, -batch size可以达到每GPU 4 (Tesla V100 16GB)。 - - -**Q:** 如何修改数据预处理?
-**A:** 可在配置文件中设置 `sample_transform`。注意需要在配置文件中加入**完整预处理** -例如RCNN模型中`DecodeImage`, `NormalizeImage` and `Permute`。 - -**Q:** affine_channel和batch norm是什么关系?
-**A:** 在RCNN系列模型加载预训练模型初始化,有时候会固定住batch norm的参数, 使用预训练模型中的全局均值和方式,并且batch norm的scale和bias参数不更新,已发布的大多ResNet系列的RCNN模型采用这种方式。这种情况下可以在config中设置norm_type为bn或affine_channel, freeze_norm为true (默认为true),两种方式等价。affne_channel的计算方式为`scale * x + bias`。只不过设置affine_channel时,内部对batch norm的参数自动做了融合。如果训练使用的affine_channel,用保存的模型做初始化,训练其他任务时,既可使用affine_channel, 也可使用batch norm, 参数均可正确加载。 diff --git a/docs/tutorials/index.rst b/docs/tutorials/index.rst index d202c2ec55d3579922206b557cc190175393f6f5..0385863a84faea47bd5ae64709e0ebb700374259 100644 --- a/docs/tutorials/index.rst +++ b/docs/tutorials/index.rst @@ -7,3 +7,4 @@ INSTALL_cn.md QUICK_STARTED_cn.md GETTING_STARTED_cn.md + FAQ.md
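Note for reviewers of this patch: the linear learning-rate scaling rule stated in the new FAQ entry (scale the default 8-GPU schedule down by the GPU ratio, and stretch the iteration counts up by the same factor) can be sketched in a few lines of Python. This is an illustration only, not part of the patch; the helper name `scale_lr_schedule` is hypothetical and not a PaddleDetection API.

```python
def scale_lr_schedule(num_gpus, base_gpus=8, base_lr=0.01,
                      base_max_iters=180000, base_milestones=(120000, 160000)):
    """Scale the default 8-GPU schedule linearly: fewer GPUs means a
    proportionally smaller learning rate, and proportionally more
    iterations (max_iters and the piecewise-decay boundaries)."""
    factor = base_gpus // num_gpus
    lr = base_lr / factor
    max_iters = base_max_iters * factor
    milestones = [m * factor for m in base_milestones]
    return lr, max_iters, milestones

# Reproduce the rows of the FAQ table:
print(scale_lr_schedule(2))  # (0.0025, 720000, [480000, 640000])
print(scale_lr_schedule(4))  # (0.005, 360000, [240000, 320000])
print(scale_lr_schedule(8))  # (0.01, 180000, [120000, 160000])
```

Dividing by a power of two keeps the scaled learning rate exact, which is why the three table rows are equivalent schedules rather than approximations.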
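Note for reviewers of this patch: the three-step `__shared__` resolution rule that FAQ.md restates (module-level value first, then the global config key, then the schema default) can be sketched as follows. This is a minimal illustration of the lookup order only; `resolve_shared_config` is a hypothetical helper, not PaddleDetection's actual implementation.

```python
def resolve_shared_config(key, module_cfg, global_cfg, default):
    """Resolve a __shared__ config key in precedence order:
    1. the module's own config entry, if present;
    2. otherwise the global key from the config file, if present;
    3. otherwise the schema default."""
    if key in module_cfg:
        return module_cfg[key]
    if key in global_cfg:
        return global_cfg[key]
    return default

# num_classes set globally but not on the module -> global value wins
print(resolve_shared_config('num_classes', {}, {'num_classes': 21}, 81))  # 21
# neither set -> the default (81) is used
print(resolve_shared_config('num_classes', {}, {}, 81))  # 81
```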