add more distributed training info (#7064)

* add more distributed training info
* fix bs and lr

bf1c95d5 · littletomatodonkey · d9d46f8f

Showing 42 additions and 10 deletions

docs/tutorials/DistributedTraining_cn.md +21 -5

docs/tutorials/DistributedTraining_en.md +21 -5

--- a/docs/tutorials/DistributedTraining_cn.md
+++ b/docs/tutorials/DistributedTraining_cn.md
@@ -42,9 +42,25 @@ tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \

 ## 3. Performance Tests

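-* [PP-YOLOE-s](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) was trained on a single machine and on 4 machines, each with 8 V100 GPUs. The training times are shown below.
+* Models were trained on 3 machines with 8 V100 GPUs each. The accuracy, training time, and multi-machine speedup of different models are shown below.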

-Machine | Accuracy | Time cost
-|-|-
-1 machine x 8 GPUs | 42.7% | 39h
-4 machines x 8 GPUs | 42.1% | 13h
+| Model    | Dataset | Configuration   | 1x8 GPU training time / Accuracy | 3x8 GPU training time / Accuracy | Speedup  |
+|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
+|  PP-YOLOE-s  | Objects365 | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml)  | 301h/- | 162h/17.7%  | **1.85** |
+|  PP-YOLOE-l  | Objects365 | [ppyoloe_crn_l_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml)  | 401h/- | 178h/30.3%  | **2.25** |
+
+
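+* Models were trained on 4 machines with 8 V100 GPUs each. The accuracy, training time, and multi-machine speedup of different models are shown below.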
+
+
+| Model    | Dataset | Configuration   | 1x8 GPU training time / Accuracy | 4x8 GPU training time / Accuracy | Speedup  |
+|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
+|  PP-YOLOE-s  | COCO | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml)  | 39h/42.7% | 13h/42.1%  | **3.0** |
+|  PP-YOLOE-m  | Objects365 | [ppyoloe_crn_m_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml)  | 337h/- | 112h/24.6%  | **3.0** |
+|  PP-YOLOE-x  | Objects365 | [ppyoloe_crn_x_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml)  | 464h/- | 125h/32.1%  | **3.4** |
+
+
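+* **Note**
+    * When training on many GPUs, accuracy may drop slightly (by about 1%). To compensate, try adding warmup or moderately increasing the number of training epochs.
+    * All configuration files listed here are for the COCO dataset; to train on another dataset, modify the dataset paths.
+    * For the `PP-YOLOE` models above, multi-machine training uses a per-GPU batch size of 8, and the learning rate is kept the same as in single-machine 8-GPU training.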
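The speedup column in these tables appears to be the single-machine (1x8 GPU) training time divided by the multi-machine training time. The following is a minimal Python sketch, assuming that definition, that recomputes the ratio from the listed hours and adds scaling efficiency (speedup divided by the number of machines, where 1.0 would be perfectly linear scaling). `Run`, `speedup`, and `scaling_efficiency` are illustrative names, not part of PaddleDetection.

```python
# Speedup = 1x8 GPU training time / multi-machine training time (assumed definition).
# Scaling efficiency = speedup / number of machines (1.0 == perfectly linear).
from typing import NamedTuple


class Run(NamedTuple):
    model: str
    machines: int         # number of 8-GPU machines in the multi-machine run
    single_hours: float   # training time on 1 machine x 8 GPUs
    multi_hours: float    # training time on `machines` x 8 GPUs


def speedup(run: Run) -> float:
    """Wall-clock speedup of the multi-machine run over the 1x8 GPU baseline."""
    return run.single_hours / run.multi_hours


def scaling_efficiency(run: Run) -> float:
    """Fraction of ideal linear scaling achieved by the multi-machine run."""
    return speedup(run) / run.machines


# Training times copied from the tables above.
RUNS = [
    Run("PP-YOLOE-s (Objects365)", 3, 301, 162),
    Run("PP-YOLOE-l (Objects365)", 3, 401, 178),
    Run("PP-YOLOE-s (COCO)", 4, 39, 13),
    Run("PP-YOLOE-m (Objects365)", 4, 337, 112),
    Run("PP-YOLOE-x (Objects365)", 4, 464, 125),
]

if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.model:<26} {run.machines}x8  "
              f"speedup={speedup(run):.2f}  "
              f"efficiency={scaling_efficiency(run):.0%}")
```

With this definition the rows agree with the listed speedups to within rounding, except PP-YOLOE-x, where 464 / 125 ≈ 3.71 while the table lists 3.4. Efficiency also makes 3- and 4-machine runs comparable: PP-YOLOE-s on 3x8 GPUs reaches roughly 62% of linear scaling, PP-YOLOE-x on 4x8 GPUs roughly 93%.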
--- a/docs/tutorials/DistributedTraining_en.md
+++ b/docs/tutorials/DistributedTraining_en.md
@@ -36,9 +36,25 @@ tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml \

 ## 2. Performance

-* [PP-YOLOE-s](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml) was trained on a single machine and on 4 machines, each with 8 V100 GPUs. The training times are as follows.
+* We trained models on 3 machines with 8 V100 GPUs each (3x8). The accuracy, training time, and multi-machine speedup of different models are shown below.

-Machine | mAP | Time cost
-|-|-
-single machine | 42.7% | 39h
-4 machines | 42.1% | 13h
+| Model    | Dataset | Configuration   | 1x8 GPU training time / Accuracy | 3x8 GPU training time / Accuracy | Speedup  |
+|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
+|  PP-YOLOE-s  | Objects365 | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml)  | 301h/- | 162h/17.7%  | **1.85** |
+|  PP-YOLOE-l  | Objects365 | [ppyoloe_crn_l_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml)  | 401h/- | 178h/30.3%  | **2.25** |
+
+
+* We trained models on 4 machines with 8 V100 GPUs each (4x8). The accuracy, training time, and multi-machine speedup of different models are shown below.
+
+
+| Model    | Dataset | Configuration   | 1x8 GPU training time / Accuracy | 4x8 GPU training time / Accuracy | Speedup  |
+|:---------:|:--------:|:--------:|:--------:|:--------:|:------:|
+|  PP-YOLOE-s  | COCO | [ppyoloe_crn_s_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml)  | 39h/42.7% | 13h/42.1%  | **3.0** |
+|  PP-YOLOE-m  | Objects365 | [ppyoloe_crn_m_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_m_300e_coco.yml)  | 337h/- | 112h/24.6%  | **3.0** |
+|  PP-YOLOE-x  | Objects365 | [ppyoloe_crn_x_300e_coco.yml](../../configs/ppyoloe/ppyoloe_crn_x_300e_coco.yml)  | 464h/- | 125h/32.1%  | **3.4** |
+
+
+* **Note**
+    * When training on many GPUs, accuracy may drop slightly (by about 1%). In that case, try adding warmup or increasing the number of training epochs to compensate.
+    * The configuration files listed here are for the COCO dataset. To train on other datasets, modify the dataset paths.
+    * For multi-machine training of the `PP-YOLOE` series, the per-GPU batch size is set to 8, and the learning rate is kept the same as in single-machine (8-GPU) training.
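The batch-size note implies that the global batch size grows with the number of GPUs while the learning rate stays fixed: 8 images per GPU gives 64 images per step on 1x8 GPUs, 192 on 3x8, and 256 on 4x8, so each epoch takes proportionally fewer optimizer steps. The sketch below computes those quantities and a simple linear learning-rate warmup of the kind the first note suggests. It is an illustration only: the helper names, the `start_factor` parameter, the step-based warmup, the `BASE_LR` value, and keeping the last partial batch (ceiling division) are assumptions, not PaddleDetection's scheduler or reader settings.

```python
"""Global batch size, steps per epoch, and linear LR warmup (illustrative sketch).

All helper names here are hypothetical; they are not PaddleDetection APIs.
"""

GPUS_PER_MACHINE = 8
PER_GPU_BATCH_SIZE = 8  # per-GPU batch size used for the multi-machine PP-YOLOE runs


def global_batch_size(machines: int) -> int:
    """Images consumed per optimizer step across all machines and GPUs."""
    return machines * GPUS_PER_MACHINE * PER_GPU_BATCH_SIZE


def steps_per_epoch(num_images: int, machines: int) -> int:
    """Optimizer steps per epoch, assuming the last partial batch is kept."""
    batch = global_batch_size(machines)
    return -(-num_images // batch)  # ceiling division


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int,
                     start_factor: float = 0.0) -> float:
    """Ramp the LR linearly from start_factor * base_lr up to base_lr.

    After `warmup_steps` steps the LR stays at base_lr; any decay schedule
    that would follow warmup is omitted here.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    progress = step / warmup_steps
    return base_lr * (start_factor + (1.0 - start_factor) * progress)


if __name__ == "__main__":
    COCO_TRAIN_IMAGES = 118_287  # number of images in COCO train2017
    BASE_LR = 0.01               # hypothetical value; the same LR is used for every setup
    for machines in (1, 3, 4):
        print(f"{machines}x8 GPUs: global batch={global_batch_size(machines)}, "
              f"steps/epoch={steps_per_epoch(COCO_TRAIN_IMAGES, machines)}, lr={BASE_LR}")
    for step in (0, 250, 500, 1000):
        lr = linear_warmup_lr(step, BASE_LR, warmup_steps=1000)
        print(f"step {step}: lr={lr:.4f}")
```

For COCO train2017, `steps_per_epoch` gives 1849 steps on 1x8 GPUs and 463 on 4x8 GPUs, all at the same learning rate. Fewer updates per epoch at an unchanged step size is one plausible reason a larger cluster benefits from warmup or extra epochs to recover the roughly 1% accuracy gap.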