diff --git a/ppstructure/recovery/README.md b/ppstructure/recovery/README.md
index 41fb3e45b83329cd7bfa7021b19ffcee33dc947b..209c995f8efb097fd744094ea0e102db3212387e 100644
--- a/ppstructure/recovery/README.md
+++ b/ppstructure/recovery/README.md
@@ -86,7 +86,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
- **(2) Install recovery `requirements`**
-The layout restoration is exported as docx files, so python-docx API need to be installed, and PyMuPDF api([requires Python >= 3.7](https://pypi.org/project/PyMuPDF/)) need to be installed to process the input files in pdf format. And if using pdf parse method, we need to install pdf2docx api.
+Layout restoration exports docx files, so the python-docx API needs to be installed. PyMuPDF ([requires Python >= 3.7](https://pypi.org/project/PyMuPDF/)) is also needed to process input files in PDF format.
Install all the libraries by running the following command:
@@ -94,6 +94,13 @@ Install all the libraries by running the following command:
python3 -m pip install -r ppstructure/recovery/requirements.txt
````
+To use the PDF parse method, you also need to install the pdf2docx API:
+
+```bash
+wget https://paddleocr.bj.bcebos.com/whl/pdf2docx-0.0.0-py3-none-any.whl
+pip3 install pdf2docx-0.0.0-py3-none-any.whl
+```
+
## 3. Quick Start using PDF parse
diff --git a/ppstructure/recovery/README_ch.md b/ppstructure/recovery/README_ch.md
index eaa5260b57db81bc483ff4b32ec4334340002335..5ef823d43488f0c765f0757aa6745ae4cc49e52c 100644
--- a/ppstructure/recovery/README_ch.md
+++ b/ppstructure/recovery/README_ch.md
@@ -82,7 +82,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
- **(2)安装recovery的`requirements`**
-版面恢复导出为docx文件,所以需要安装Python处理word文档的python-docx API,同时处理pdf格式的输入文件,需要安装PyMuPDF API([要求Python >= 3.7](https://pypi.org/project/PyMuPDF/))。使用pdf2docx库解析的方式恢复文档需要安装pdf2docx等。
+版面恢复导出为docx文件,所以需要安装Python处理word文档的python-docx API,同时处理pdf格式的输入文件,需要安装PyMuPDF API([要求Python >= 3.7](https://pypi.org/project/PyMuPDF/))。
通过如下命令安装全部库:
@@ -90,6 +90,13 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
python3 -m pip install -r ppstructure/recovery/requirements.txt
```
+使用pdf2docx库解析的方式恢复文档需要安装优化的pdf2docx。
+
+```bash
+wget https://paddleocr.bj.bcebos.com/whl/pdf2docx-0.0.0-py3-none-any.whl
+pip3 install pdf2docx-0.0.0-py3-none-any.whl
+```
+
## 3.使用 PDF解析进行版面恢复
diff --git a/ppstructure/recovery/requirements.txt b/ppstructure/recovery/requirements.txt
index d67e0a95aa929e767c0289d54056e8677bf83607..4e4239a14af9b6f95aca1171f25d50da5eac37cf 100644
--- a/ppstructure/recovery/requirements.txt
+++ b/ppstructure/recovery/requirements.txt
@@ -2,5 +2,4 @@ python-docx
PyMuPDF==1.19.0
beautifulsoup4
fonttools>=4.24.0
-fire>=0.3.0
-pdf2docx==0.0.0
\ No newline at end of file
+fire>=0.3.0
\ No newline at end of file