weixin_41840029 / PaddleOCR (forked from PaddlePaddle / PaddleOCR; in sync with the fork source)
Commit 8a86708f (unverified)
Authored January 28, 2022 by livingbody
Committed by GitHub on January 28, 2022

Delete pdf2img.py

delete pdf2img

Parent: 51bc955d
Showing 1 changed file with 0 additions and 68 deletions

tools/pdf2img.py (deleted, mode 100644 → 0), +0 -68, shown as of parent 51bc955d
# -*- coding: utf-8 -*-
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
"""
1. Install the library: pip install pymupdf
2. Install the library: pip install pillow
3. Run directly
"""
import os
import fitz
import os
import sys

__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))

from ppocr.utils.logging import get_logger

logger = get_logger()
"""
parameter:
    pdf_file : the PDF file to convert
    img_dir : the directory to save the images in
return:
    pic_list: list of image paths generated from the PDF file
"""
def pdf2img(pdf_file, img_dir):
    doc = fitz.open(pdf_file)
    pdf_name = os.path.splitext(pdf_file)[0]
    pic_list = []
    for pg in range(doc.pageCount):
        page = doc[pg]
        rotate = int(0)
        # A zoom factor of 2 along each axis yields an image with four times as many pixels.
        zoom_x = 2.0
        zoom_y = 2.0
        trans = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)
        pm = page.get_pixmap(matrix=trans, alpha=False)
        # Note the save call below, which is the key point: the original version wrote PNG; this one writes JPG.
        if not os.path.exists(img_dir):
            os.mkdir(img_dir)
            logger.info('%s directory are created' % img_dir)
        pm.pil_save('%s/%s.jpg' % (img_dir, pg), quality=1)
        pic_list.append('%s/%s.jpg' % (img_dir, pg))
    logger.info('%s pdf file are saved under %s successfully' % (pdf_file, img_dir))
    return pic_list
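
The deleted function does two pieces of bookkeeping around the rendering call: it scales each page by a zoom factor, and it builds one output path per page index. The following is a minimal sketch of just that arithmetic and path handling, using only the standard library. The helper names rendered_size and page_image_paths are hypothetical and do not appear in the original file. The sketch assumes that one PDF point maps to one pixel at zoom 1.0 and that sizes are rounded to whole pixels; a real renderer may round differently.

```python
import os
import tempfile


def rendered_size(width_pt, height_pt, zoom=2.0):
    """Approximate pixel size of a page rendered with the same zoom on both axes.

    Assumption: one PDF point (1/72 inch) maps to one pixel at zoom 1.0.
    """
    return round(width_pt * zoom), round(height_pt * zoom)


def page_image_paths(img_dir, page_count, ext="jpg"):
    """Create img_dir if needed and return one output path per page index."""
    # makedirs with exist_ok=True also creates missing parent directories
    # and does not fail when the directory already exists.
    os.makedirs(img_dir, exist_ok=True)
    return [os.path.join(img_dir, "%d.%s" % (pg, ext)) for pg in range(page_count)]


if __name__ == "__main__":
    # 595 x 842 points is roughly an A4 page.
    w, h = rendered_size(595, 842, zoom=2.0)
    # The pixel count grows with the square of the zoom factor.
    print(w, h, (w * h) / (595 * 842))
    with tempfile.TemporaryDirectory() as tmp:
        print(page_image_paths(os.path.join(tmp, "out", "pages"), 3))
```

Building the paths with os.path.join instead of a '%s/%s.jpg' format string keeps the separator portable, and creating the directory once before the loop avoids checking for it on every page.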