From fd1bddf5a2b69b2643d2fef28f84a3a617c1874c Mon Sep 17 00:00:00 2001
From: WenmuZhou
Date: Wed, 28 Jul 2021 16:11:50 +0800
Subject: [PATCH] fix dead link

---
 ppstructure/table/README.md    | 29 ++++++++++++++++++++++++++++-
 ppstructure/table/README_ch.md |  4 +++-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/ppstructure/table/README.md b/ppstructure/table/README.md
index 1fb00f01..b0692769 100644
--- a/ppstructure/table/README.md
+++ b/ppstructure/table/README.md
@@ -19,7 +19,34 @@ The table ocr flow chart is as follows
 
 ### 2.1 Train
 
-TBD
+
+In this chapter, we only introduce the training of the table structure model, For model training of [text detection](../../doc/doc_en/detection_en.md) and [text recognition](../../doc/doc_en/recognition_en.md), please refer to the corresponding documents
+
+#### data preparation
+The training data uses public data set [PubTabNet](https://arxiv.org/abs/1911.10683 ), Can be downloaded from the official [website](https://github.com/ibm-aur-nlp/PubTabNet) 。The PubTabNet data set contains about 500,000 images, as well as annotations in html format。
+
+#### Start training
+*If you are installing the cpu version of paddle, please modify the `use_gpu` field in the configuration file to false*
+```shell
+# single GPU training
+python3 tools/train.py -c configs/table/table_mv3.yml
+# multi-GPU training
+# Set the GPU ID used by the '--gpus' parameter.
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
+```
+
+In the above instruction, use `-c` to select the training to use the `configs/table/table_mv3.yml` configuration file.
+For a detailed explanation of the configuration file, please refer to [config](../../doc/doc_en/config_en.md).
+
+#### load trained model and continue training
+
+If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
+
+```shell
+python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./your/trained/model
+```
+
+**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
 
 ### 2.2 Eval
 First cd to the PaddleOCR/ppstructure directory
diff --git a/ppstructure/table/README_ch.md b/ppstructure/table/README_ch.md
index 5c3c9a28..1f2b8b1d 100644
--- a/ppstructure/table/README_ch.md
+++ b/ppstructure/table/README_ch.md
@@ -19,6 +19,8 @@
 
 ### 2.1 训练
 
+在这一章节中,我们仅介绍表格结构模型的训练,[文字检测](../../doc/doc_ch/detection.md)和[文字识别](../../doc/doc_ch/recognition.md)的模型训练请参考对应的文档。
+
 #### 数据准备
 训练数据使用公开数据集[PubTabNet](https://arxiv.org/abs/1911.10683),可以从[官网](https://github.com/ibm-aur-nlp/PubTabNet)下载。PubTabNet数据集包含约50万张表格数据的图像,以及图像对应的html格式的注释。
 
@@ -31,7 +33,7 @@ python3 tools/train.py -c configs/table/table_mv3.yml
 python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
 ```
 
-上述指令中,通过-c 选择训练使用configs/table/table_mv3.yml配置文件。有关配置文件的详细解释,请参考[链接](./config.md)。
+上述指令中,通过-c 选择训练使用configs/table/table_mv3.yml配置文件。有关配置文件的详细解释,请参考[链接](../../doc/doc_ch/config.md)。
 
 #### 断点训练
 
-- 
GitLab