diff --git a/ppstructure/docs/quickstart.md b/ppstructure/docs/quickstart.md
index f1f9cb8b09cd87b35fec0e7f09ff1d813e3d44db..c77b3c55d1adf20c082090e338258e5377d3e170 100644
--- a/ppstructure/docs/quickstart.md
+++ b/ppstructure/docs/quickstart.md
@@ -97,6 +97,19 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
 
 #### 2.1.6 版面恢复(PDF转Word)
 
+版面恢复分为2种方法,详细介绍请参考:[版面恢复教程](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/recovery/README_ch.md):
+
+- PDF解析
+- OCR技术
+
+通过PDF解析(只支持pdf格式的输入):
+
+```bash
+paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+```
+
+通过OCR技术:
+
 ```bash
 # 中文测试图
 paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true
diff --git a/ppstructure/docs/quickstart_en.md b/ppstructure/docs/quickstart_en.md
index e0eec4b38ba57b1bebd0e711093e5dfd4773fdd9..f23ae95db3ba9d708409dff2de64cb6c06ae30d1 100644
--- a/ppstructure/docs/quickstart_en.md
+++ b/ppstructure/docs/quickstart_en.md
@@ -97,8 +97,22 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
 
 Key information extraction does not currently support use by the whl package. For detailed usage tutorials, please refer to: [Key Information Extraction](../kie/README.md).
 
-#### 2.1.6 layout recovery
+#### 2.1.6 layout recovery(PDF to Word)
+
+Two layout recovery methods are provided, For detailed usage tutorials, please refer to: [Layout Recovery](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/recovery/README.md).
+
+- PDF parse
+- OCR
+
+Recovery by using PDF parse (only support pdf as input):
+
+```bash
+paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```
+
+Recovery by using OCR:
+
+```bash
 paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
 ```
 
diff --git a/ppstructure/pdf2word/README.md b/ppstructure/pdf2word/README.md
index 93023ecde06fd08eacdfa9917f4d1800327bc115..28b985a0ee4ba41c6c7c28323548efb9fb360183 100644
--- a/ppstructure/pdf2word/README.md
+++ b/ppstructure/pdf2word/README.md
@@ -20,7 +20,7 @@ PDF2Word是PaddleOCR社区开发者 [whjdark](https://github.com/whjdark) 基于
 > - 初次安装程序根据不同设备需要等待1-2分钟不等
 > - 使用Office与WPS打开的Word结果会出现不同,推荐以Office为准
 > - 本程序使用 [QPT](https://github.com/QPT-Family/QPT) 进行应用程序打包,感谢 [GT-ZhangAcer](https://github.com/GT-ZhangAcer) 对打包过程的支持
-> - 应用程序不支持盗版Windows系统,若在安装过程中出现报错或缺少依赖,推荐直接使用 `paddleocr` whl包应用PDF2Word功能,详情可查看[链接](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/ppstructure/docs/quickstart.md)
+> - 应用程序仅支持正版win10,11系统,不支持盗版Windows系统,若在安装过程中出现报错或缺少依赖,推荐直接使用 `paddleocr` whl包应用PDF2Word功能,详情可查看[链接](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/ppstructure/docs/quickstart.md)
 
 ### 脚本启动界面
 