Unverified · Commit 83f8a4d9 authored by zhoujun, committed by GitHub

update table recognition finetune doc (#8353)

* add can, stt to algorithm_overview_en.md

* update table recognition finetune
Parent f68813eb
......@@ -14,6 +14,9 @@
- [2.5. Distributed Training](#25-分布式训练)
- [2.6. Training in Other Environments](#26-其他训练环境)
- [2.7. Fine-tuning](#27-模型微调)
  - [2.7.1 Dataset](#271-数据选择)
  - [2.7.2 Model selection](#272-模型选择)
  - [2.7.3 Training hyperparameter selection](#273-训练超参选择)
- [3. Evaluation and Prediction](#3-模型评估与预测)
  - [3.1. Evaluation](#31-指标评估)
  - [3.2. Test table structure recognition results](#32-测试表格结构识别效果)
......@@ -219,7 +222,39 @@ Running on a DCU device requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`
## 2.7. Fine-tuning
In practice, it is recommended to load the officially provided pretrained model and fine-tune it on your own dataset. For the fine-tuning method, please refer to the [model fine-tuning tutorial](./finetune.md).
### 2.7.1 Dataset
Data volume: it is recommended to prepare at least 2,000 table images for model fine-tuning.
### 2.7.2 Model selection
It is recommended to fine-tune the SLANet model (configuration file: [SLANet_ch.yml](../../configs/table/SLANet_ch.yml), pretrained model: [ch_ppstructure_mobile_v2.0_SLANet_train.tar](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar)); its accuracy and generalization are the best among the Chinese table pretrained models currently provided.
For more table recognition models, please refer to the [PP-Structure series model list](../../ppstructure/docs/models_list.md).
### 2.7.3 Training hyperparameter selection
When fine-tuning, the most important hyperparameters are the pretrained model path `pretrained_model` and the learning rate `learning_rate`. Part of the configuration file is shown below.
```yaml
Global:
  pretrained_model: ./ch_ppstructure_mobile_v2.0_SLANet_train/best_accuracy.pdparams # pretrained model path
Optimizer:
lr:
name: Cosine
    learning_rate: 0.001 # learning rate
warmup_epoch: 0
regularizer:
name: 'L2'
factor: 0
```
In the above configuration file, first set the `pretrained_model` field to the path of the `best_accuracy.pdparams` file.
The configuration files provided by PaddleOCR assume 4-GPU training (a total batch size of `4*48=192`) with no pretrained model loaded, so in your scenario the learning rate should be adjusted linearly with your total batch size. For example:
* For single-GPU training with a per-GPU batch_size of 48, the total batch_size is 48; it is recommended to set the learning rate to about `0.00025`.
* For single-GPU training where GPU memory limits the per-GPU batch_size to 32, the total batch_size is 32; it is recommended to set the learning rate to about `0.00017`.
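The linear scaling rule above can be sketched in a few lines. This is a minimal illustration only; the function name and constants mirror the reference setup described in the text and are not part of PaddleOCR:

```python
# Reference setup from the provided config: 4 GPUs x batch 48 = 192, lr = 0.001.
REF_TOTAL_BATCH_SIZE = 4 * 48
REF_LEARNING_RATE = 0.001

def scaled_learning_rate(num_gpus: int, batch_size_per_gpu: int) -> float:
    """Scale the learning rate linearly with the total batch size."""
    total_batch_size = num_gpus * batch_size_per_gpu
    return REF_LEARNING_RATE * total_batch_size / REF_TOTAL_BATCH_SIZE

print(round(scaled_learning_rate(1, 48), 5))  # 0.00025
print(round(scaled_learning_rate(1, 32), 5))  # 0.00017
```

The returned value is what you would put into `Optimizer.lr.learning_rate` in the config.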
# 3. Evaluation and Prediction
......
......@@ -14,6 +14,9 @@ This article provides a full-process guide for the PaddleOCR table recognition m
- [2.5. Distributed Training](#25-distributed-training)
- [2.6. Training on other platform(Windows/macOS/Linux DCU)](#26-training-on-other-platformwindowsmacoslinux-dcu)
- [2.7. Fine-tuning](#27-fine-tuning)
- [2.7.1 Dataset](#271-dataset)
  - [2.7.2 Model selection](#272-model-selection)
- [2.7.3 Training hyperparameter selection](#273-training-hyperparameter-selection)
- [3. Evaluation and Test](#3-evaluation-and-test)
- [3.1. Evaluation](#31-evaluation)
- [3.2. Test table structure recognition effect](#32-test-table-structure-recognition-effect)
......@@ -226,8 +229,40 @@ Running on a DCU device requires setting the environment variable `export HIP_VI
## 2.7. Fine-tuning
In the actual use process, it is recommended to load the officially provided pre-training model and fine-tune it in your own data set. For the fine-tuning method of the table recognition model, please refer to: [Model fine-tuning tutorial](./finetune.md).
### 2.7.1 Dataset
Data volume: it is recommended to prepare at least 2,000 table images for model fine-tuning.
### 2.7.2 Model selection
It is recommended to fine-tune the SLANet model (configuration file: [SLANet_ch.yml](../../configs/table/SLANet_ch.yml), pretrained model: [ch_ppstructure_mobile_v2.0_SLANet_train.tar](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar)); its accuracy and generalization are the best among the Chinese table pretrained models currently available.
For more table recognition models, please refer to [PP-Structure Series Model Library](../../ppstructure/docs/models_list.md).
### 2.7.3 Training hyperparameter selection
When fine-tuning, the most important hyperparameters are the pretrained model path `pretrained_model` and the learning rate `learning_rate`. Part of the configuration file is shown below.
```yaml
Global:
  pretrained_model: ./ch_ppstructure_mobile_v2.0_SLANet_train/best_accuracy.pdparams # pretrained model path
Optimizer:
lr:
name: Cosine
    learning_rate: 0.001 # learning rate
warmup_epoch: 0
regularizer:
name: 'L2'
factor: 0
```
In the above configuration file, first set the `pretrained_model` field to the path of the `best_accuracy.pdparams` file.
The configuration files provided by PaddleOCR assume 4-GPU training (a total batch size of `4*48=192`) with no pretrained model loaded, so in your scenario the learning rate should be adjusted linearly with your total batch size. For example:
* For single-GPU training with a per-GPU batch_size of 48, the total batch_size is 48; it is recommended to set the learning rate to about `0.00025`.
* For single-GPU training where GPU memory limits the per-GPU batch_size to 32, the total batch_size is 32; it is recommended to set the learning rate to about `0.00017`.
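The linear adjustment described above amounts to multiplying the reference learning rate by the ratio of your total batch size to the reference one. A short sketch (illustrative only; the helper name is not part of PaddleOCR):

```python
# Reference setup from the provided config: 4 GPUs x batch 48 = 192, lr = 0.001.
REF_TOTAL_BATCH_SIZE = 4 * 48
REF_LEARNING_RATE = 0.001

def scaled_learning_rate(num_gpus: int, batch_size_per_gpu: int) -> float:
    """Scale the learning rate linearly with the total batch size."""
    total_batch_size = num_gpus * batch_size_per_gpu
    return REF_LEARNING_RATE * total_batch_size / REF_TOTAL_BATCH_SIZE

print(round(scaled_learning_rate(1, 48), 5))  # 0.00025
print(round(scaled_learning_rate(1, 32), 5))  # 0.00017
```

The returned value is what you would put into `Optimizer.lr.learning_rate` in the config.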
# 3. Evaluation and Test
......