Commit d58c7022, authored by an1018

add_pdf2docx_api

Parent 8273983a
......@@ -86,7 +86,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
- **(2) Install recovery `requirements`**
-The layout restoration is exported as docx files, so the python-docx API needs to be installed, and the PyMuPDF API ([requires Python >= 3.7](https://pypi.org/project/PyMuPDF/)) needs to be installed to process input files in PDF format. If the PDF parse method is used, the pdf2docx API also needs to be installed.
+The layout restoration is exported as docx files, so the python-docx API needs to be installed, and the PyMuPDF API ([requires Python >= 3.7](https://pypi.org/project/PyMuPDF/)) needs to be installed to process input files in PDF format.
Install all the libraries by running the following command:
......@@ -94,6 +94,13 @@ Install all the libraries by running the following command:
```bash
python3 -m pip install -r ppstructure/recovery/requirements.txt
```
If you use the PDF parse method, you also need to install the pdf2docx API:
```bash
wget https://paddleocr.bj.bcebos.com/whl/pdf2docx-0.0.0-py3-none-any.whl
pip3 install pdf2docx-0.0.0-py3-none-any.whl
```
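
After installing, it can help to confirm that each library is actually importable. The following is a minimal sketch and not part of PaddleOCR: the helper `missing_modules` is hypothetical, and the import names (`docx` for python-docx, `fitz` for PyMuPDF, `bs4` for beautifulsoup4, `fontTools` for fonttools) are the usual ones for these distributions, so treat them as assumptions if your installed versions differ.

```python
import importlib.util

# Assumed mapping: distribution name -> top-level module it provides.
RECOVERY_MODULES = {
    "python-docx": "docx",
    "PyMuPDF": "fitz",
    "beautifulsoup4": "bs4",
    "fonttools": "fontTools",
    "fire": "fire",
    "pdf2docx": "pdf2docx",
}


def missing_modules(modules):
    """Return the distribution names whose module cannot be located."""
    return [
        dist
        for dist, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(RECOVERY_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All recovery dependencies are importable.")
```

`find_spec` only locates a module without importing it, so the check stays cheap even for heavier packages such as PyMuPDF.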
<a name="3"></a>
## 3. Quick Start using PDF parse
......
......@@ -82,7 +82,7 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
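- **(2) Install recovery `requirements`**
-Layout recovery exports docx files, so the python-docx API, which handles Word documents in Python, needs to be installed; to process input files in PDF format, the PyMuPDF API ([requires Python >= 3.7](https://pypi.org/project/PyMuPDF/)) needs to be installed as well. Recovering documents by parsing with the pdf2docx library also requires installing pdf2docx.
+Layout recovery exports docx files, so the python-docx API, which handles Word documents in Python, needs to be installed; to process input files in PDF format, the PyMuPDF API ([requires Python >= 3.7](https://pypi.org/project/PyMuPDF/)) needs to be installed as well.
Install all the libraries with the following command: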
......@@ -90,6 +90,13 @@ git clone https://gitee.com/paddlepaddle/PaddleOCR
```bash
python3 -m pip install -r ppstructure/recovery/requirements.txt
```
Recovering documents by parsing with the pdf2docx library requires installing the optimized pdf2docx:
```bash
wget https://paddleocr.bj.bcebos.com/whl/pdf2docx-0.0.0-py3-none-any.whl
pip3 install pdf2docx-0.0.0-py3-none-any.whl
```
<a name="3"></a>
## 3. Layout recovery using PDF parsing
......
......@@ -2,5 +2,4 @@ python-docx
PyMuPDF==1.19.0
beautifulsoup4
fonttools>=4.24.0
-fire>=0.3.0
-pdf2docx==0.0.0
\ No newline at end of file
+fire>=0.3.0
\ No newline at end of file
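
To make the effect of this hunk explicit, the sketch below compares the requirement lists before and after the change. The two lists are reconstructed from the hunk and its header context; the helper `requirement_name` is hypothetical and only handles the simple `name`, `name==x`, and `name>=x` forms that appear here.

```python
import re

# Requirement lines before and after the commit, as shown in the hunk.
OLD_REQUIREMENTS = [
    "python-docx",
    "PyMuPDF==1.19.0",
    "beautifulsoup4",
    "fonttools>=4.24.0",
    "fire>=0.3.0",
    "pdf2docx==0.0.0",
]
NEW_REQUIREMENTS = [
    "python-docx",
    "PyMuPDF==1.19.0",
    "beautifulsoup4",
    "fonttools>=4.24.0",
    "fire>=0.3.0",
]


def requirement_name(line):
    """Strip a version specifier such as ==1.19.0 or >=4.24.0."""
    return re.split(r"[=<>!~]", line.strip(), maxsplit=1)[0]


old_names = {requirement_name(line) for line in OLD_REQUIREMENTS}
new_names = {requirement_name(line) for line in NEW_REQUIREMENTS}
removed = sorted(old_names - new_names)
print(removed)  # ['pdf2docx']
```

The only package dropped from `requirements.txt` is pdf2docx, which matches the README change that installs it separately from the wheel.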