Commit 9c0d3652 (unverified)
Authored May 13, 2020 by David Amos

Add article source code files

Parent: 1e0b8bc9
Showing 7 changed files with 352 additions and 2 deletions (+352 -2)
creating-and-modifying-pdfs/README.md  +1 -2
creating-and-modifying-pdfs/source_code/01-extracting-text-from-a-pdf.py  +65 -0
creating-and-modifying-pdfs/source_code/02-extracting-pages-from-a-pdf.py  +70 -0
creating-and-modifying-pdfs/source_code/03-concatenating-and-merging-pdfs.py  +59 -0
creating-and-modifying-pdfs/source_code/04-rotating-and-cropping-PDF-pages.py  +106 -0
creating-and-modifying-pdfs/source_code/05-encrypting-and-decrypting-pdfs.py  +46 -0
creating-and-modifying-pdfs/source_code/06-creating-a-pdf-file-from-scratch.py  +5 -0

creating-and-modifying-pdfs/README.md
@@ -7,5 +7,4 @@ There are two subfolders in this folder:
 1. **`practice_files/`:** Contains the sample PDFs used in the chapter
 2. **`source_code/`:** Contains source code from the chapter
 
-TODO:
-- [ ] Add source code files
+The source code files are organized by section of the article, and the start of each subsection is indicated with comments.

creating-and-modifying-pdfs/source_code/01-extracting-text-from-a-pdf.py (new file, 0 → 100644)

# ---------------
# Open a PDF File
# ---------------

from PyPDF2 import PdfFileReader

# You might need to change this to match the path on your computer
from pathlib import Path

pdf_path = (
    Path.home()
    / "creating-and-modifying-pdfs"
    / "practice_files"
    / "Pride_and_Prejudice.pdf"
)

pdf = PdfFileReader(str(pdf_path))

print(pdf.getNumPages())

print(pdf.documentInfo)

print(pdf.documentInfo.title)

# ---------------------------
# Extracting Text From a Page
# ---------------------------

first_page = pdf.getPage(0)

print(type(first_page))

print(first_page.extractText())

for page in pdf.pages:
    print(page.extractText())

# -----------------------
# Putting It All Together
# -----------------------

from pathlib import Path
from PyPDF2 import PdfFileReader

# Change the path below to the correct path for your computer.
pdf_path = (
    Path.home()
    / "creating-and-modifying-pdfs"
    / "practice_files"
    / "Pride_and_Prejudice.pdf"
)

pdf_reader = PdfFileReader(str(pdf_path))
output_file_path = Path.home() / "Pride_and_Prejudice.txt"

with output_file_path.open(mode="w") as output_file:
    # Write a short header with the title and page count first
    title = pdf_reader.documentInfo.title
    num_pages = pdf_reader.getNumPages()
    output_file.write(f"{title}\nNumber of pages: {num_pages}\n\n")

    # Then append the extracted text of every page
    for page in pdf_reader.pages:
        text = page.extractText()
        output_file.write(text)
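
The path handling and header line in the script above can be tried without a PDF at all. The following is a minimal sketch that uses only the standard library. The `build_header` helper, the temporary directory, and the sample title and page count are hypothetical stand-ins for `documentInfo.title` and `getNumPages()`; they are not values read from the practice file.

```python
import tempfile
from pathlib import Path


def build_header(title, num_pages):
    """Return the header text written before the page text."""
    return f"{title}\nNumber of pages: {num_pages}\n\n"


# The / operator joins path segments, as in the script above.
with tempfile.TemporaryDirectory() as tmp_dir:
    output_file_path = Path(tmp_dir) / "practice_files" / "sample.txt"
    output_file_path.parent.mkdir(parents=True)

    # mode="w" creates the file, or truncates it if it already exists.
    with output_file_path.open(mode="w") as output_file:
        output_file.write(build_header("Sample Title", 3))
        output_file.write("page text would follow here")

    written = output_file_path.read_text()

print(written)
```

Keeping the header in its own function makes the exact newline layout easy to check before running the full extraction.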

creating-and-modifying-pdfs/source_code/02-extracting-pages-from-a-pdf.py (new file, 0 → 100644)

# -----------------------------
# Using the PdfFileWriter Class
# -----------------------------

from PyPDF2 import PdfFileWriter

pdf_writer = PdfFileWriter()

page = pdf_writer.addBlankPage(width=72, height=72)

print(type(page))

from pathlib import Path

with Path("blank.pdf").open(mode="wb") as output_file:
    pdf_writer.write(output_file)

# -----------------------------------
# Extracting a Single Page From a PDF
# -----------------------------------

from pathlib import Path
from PyPDF2 import PdfFileReader, PdfFileWriter

# Change the path to work on your computer if necessary
pdf_path = (
    Path.home()
    / "creating-and-modifying-pdfs"
    / "practice_files"
    / "Pride_and_Prejudice.pdf"
)

input_pdf = PdfFileReader(str(pdf_path))

first_page = input_pdf.getPage(0)

pdf_writer = PdfFileWriter()
pdf_writer.addPage(first_page)

with Path("first_page.pdf").open(mode="wb") as output_file:
    pdf_writer.write(output_file)

# ------------------------------------
# Extracting Multiple Pages From a PDF
# ------------------------------------

from PyPDF2 import PdfFileReader, PdfFileWriter
from pathlib import Path

pdf_path = (
    Path.home()
    / "creating-and-modifying-pdfs"
    / "practice_files"
    / "Pride_and_Prejudice.pdf"
)

input_pdf = PdfFileReader(str(pdf_path))

pdf_writer = PdfFileWriter()

for n in range(1, 4):
    page = input_pdf.getPage(n)
    pdf_writer.addPage(page)

print(pdf_writer.getNumPages())

pdf_writer = PdfFileWriter()

for page in input_pdf.pages[1:4]:
    pdf_writer.addPage(page)

with Path("chapter1_slice.pdf").open(mode="wb") as output_file:
    pdf_writer.write(output_file)
\ No newline at end of file
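
The two loops in the last section of this file select the same pages. A minimal sketch, with a plain list standing in for `input_pdf.pages` (the page labels are made up for illustration), suggests why: `range(1, 4)` and the slice `[1:4]` both pick indices 1, 2, and 3, which are the second through fourth pages.

```python
# A plain list stands in for input_pdf.pages; the labels are hypothetical.
pages = ["page 1", "page 2", "page 3", "page 4", "page 5"]

# Index-based selection, mirroring the getPage(n) loop.
by_index = [pages[n] for n in range(1, 4)]

# Slice-based selection, mirroring input_pdf.pages[1:4].
by_slice = pages[1:4]

print(by_index)
print(by_slice == by_index)
```

Both forms exclude the stop index, so neither one includes the page at index 4.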

creating-and-modifying-pdfs/source_code/03-concatenating-and-merging-pdfs.py (new file, 0 → 100644)

# -----------------------------
# Using the PdfFileMerger Class
# -----------------------------

from PyPDF2 import PdfFileMerger

pdf_merger = PdfFileMerger()

# ---------------------------------
# Concatenating PDFs With .append()
# ---------------------------------

from pathlib import Path

reports_dir = (
    Path.home()
    / "creating-and-modifying-pdfs"
    / "practice_files"
    / "expense_reports"
)

for path in reports_dir.glob("*.pdf"):
    print(path.name)

expense_reports = list(reports_dir.glob("*.pdf"))
expense_reports.sort()

for path in expense_reports:
    print(path.name)

for path in expense_reports:
    pdf_merger.append(str(path))

with Path("expense_reports.pdf").open(mode="wb") as output_file:
    pdf_merger.write(output_file)

# --------------------------
# Merging PDFs With .merge()
# --------------------------

from pathlib import Path
from PyPDF2 import PdfFileMerger

report_dir = (
    Path.home()
    / "creating-and-modifying-pdfs"
    / "practice_files"
    / "quarterly_report"
)

report_path = report_dir / "report.pdf"
toc_path = report_dir / "toc.pdf"

pdf_merger = PdfFileMerger()
pdf_merger.append(str(report_path))

pdf_merger.merge(1, str(toc_path))

with Path("full_report.pdf").open(mode="wb") as output_file:
    pdf_merger.write(output_file)
\ No newline at end of file
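
Two details in this file are easy to miss: the reports are sorted before being appended, and `.merge()` is called with position `1`. The sketch below illustrates both with the standard library only. The file names and page labels are hypothetical, and the list slicing assumes that position `1` means "insert before the page currently at index 1", which is how the script uses it to place the table of contents after the first page.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    reports_dir = Path(tmp_dir)
    # Hypothetical file names, created in a deliberately scrambled order.
    for name in ["report_3.pdf", "report_1.pdf", "report_2.pdf"]:
        (reports_dir / name).touch()

    # glob() makes no promise about order, so sort before appending.
    expense_reports = sorted(reports_dir.glob("*.pdf"))
    names = [path.name for path in expense_reports]

print(names)

# Lists stand in for the pages of report.pdf and toc.pdf.
report_pages = ["title", "section 1", "section 2"]
toc_pages = ["toc"]

# Inserting at index 1 places the table of contents after the title page.
merged = report_pages[:1] + toc_pages + report_pages[1:]
print(merged)
```

Sorting is lexicographic, so names such as `report_10.pdf` would sort before `report_2.pdf` unless the numbers are zero-padded.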

creating-and-modifying-pdfs/source_code/04-rotating-and-cropping-PDF-pages.py (new file, 0 → 100644)

# --------------
# Rotating Pages
# --------------

from pathlib import Path
from PyPDF2 import PdfFileReader, PdfFileWriter

pdf_path = (
    Path.home()
    / "creating-and-modifying-pdfs"
    / "practice_files"
    / "ugly.pdf"
)

pdf_reader = PdfFileReader(str(pdf_path))
pdf_writer = PdfFileWriter()

for n in range(pdf_reader.getNumPages()):
    page = pdf_reader.getPage(n)
    if n % 2 == 0:
        page.rotateClockwise(90)
    pdf_writer.addPage(page)

with Path("ugly_rotated.pdf").open(mode="wb") as output_file:
    pdf_writer.write(output_file)

pdf_reader = PdfFileReader(str(pdf_path))

print(pdf_reader.getPage(0))

page = pdf_reader.getPage(0)
print(page["/Rotate"])

page = pdf_reader.getPage(1)
print(page["/Rotate"])

page = pdf_reader.getPage(0)
print(page["/Rotate"])

page.rotateClockwise(90)
print(page["/Rotate"])

pdf_reader = PdfFileReader(str(pdf_path))
pdf_writer = PdfFileWriter()

for page in pdf_reader.pages:
    if page["/Rotate"] == -90:
        page.rotateClockwise(90)
    pdf_writer.addPage(page)

with Path("ugly_rotated2.pdf").open(mode="wb") as output_file:
    pdf_writer.write(output_file)

# --------------
# Cropping Pages
# --------------

from pathlib import Path
from PyPDF2 import PdfFileReader, PdfFileWriter

pdf_path = (
    Path.home()
    / "creating-and-modifying-pdfs"
    / "practice_files"
    / "half_and_half.pdf"
)

pdf_reader = PdfFileReader(str(pdf_path))
first_page = pdf_reader.getPage(0)

print(first_page.mediaBox)

print(first_page.mediaBox.lowerLeft)
print(first_page.mediaBox.lowerRight)
print(first_page.mediaBox.upperLeft)
print(first_page.mediaBox.upperRight)

print(first_page.mediaBox.upperRight[0])
print(first_page.mediaBox.upperRight[1])

first_page.mediaBox.upperLeft = (0, 480)

print(first_page.mediaBox.upperLeft)
print(first_page.mediaBox.upperRight)

pdf_writer = PdfFileWriter()
pdf_writer.addPage(first_page)

with Path("cropped_page.pdf").open(mode="wb") as output_file:
    pdf_writer.write(output_file)

pdf_reader = PdfFileReader(str(pdf_path))
pdf_writer = PdfFileWriter()

first_page = pdf_reader.getPage(0)

import copy

# Copy the page so that cropping one half does not affect the other
left_side = copy.deepcopy(first_page)

current_coords = left_side.mediaBox.upperRight
new_coords = (current_coords[0] / 2, current_coords[1])

left_side.mediaBox.upperRight = new_coords

right_side = copy.deepcopy(first_page)
right_side.mediaBox.upperLeft = new_coords

pdf_writer.addPage(left_side)
pdf_writer.addPage(right_side)

with Path("cropped_pages.pdf").open(mode="wb") as output_file:
    pdf_writer.write(output_file)
\ No newline at end of file
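
The rotation check and the half-page crop in this file both come down to simple arithmetic, which the following sketch reproduces with the standard library only. Everything in it is a stand-in: `rotate_clockwise` is a hypothetical helper that normalizes the angle (the source does not show whether PyPDF2 normalizes the stored `/Rotate` value), and a plain dict with made-up coordinates replaces the page's `mediaBox`.

```python
import copy


def rotate_clockwise(current, angle):
    """Return the new rotation in degrees, normalized to the range 0-359."""
    return (current + angle) % 360


# A page stored at -90 degrees becomes upright after a 90 degree turn.
print(rotate_clockwise(-90, 90))

# A dict stands in for a page's mediaBox; the coordinates are hypothetical.
first_page = {"lower_left": (0, 0), "upper_right": (792, 612)}

# Deep copies keep the two halves independent of each other.
left_side = copy.deepcopy(first_page)
right_side = copy.deepcopy(first_page)

width, height = first_page["upper_right"]

# The left half keeps its origin and ends at the horizontal midpoint.
left_side["upper_right"] = (width / 2, height)

# The right half starts at the horizontal midpoint.
right_side["lower_left"] = (width / 2, 0)

print(left_side)
print(right_side)
```

Without the deep copies, both names would refer to the same object and the second assignment would also change the first half.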

creating-and-modifying-pdfs/source_code/05-encrypting-and-decrypting-pdfs.py (new file, 0 → 100644)

# ---------------
# Encrypting PDFs
# ---------------

from pathlib import Path
from PyPDF2 import PdfFileReader, PdfFileWriter

pdf_path = (
    Path.home()
    / "creating-and-modifying-pdfs"
    / "practice_files"
    / "newsletter.pdf"
)

pdf_reader = PdfFileReader(str(pdf_path))

pdf_writer = PdfFileWriter()
pdf_writer.appendPagesFromReader(pdf_reader)

pdf_writer.encrypt(user_pwd="SuperSecret")

output_path = Path.home() / "newsletter_protected.pdf"
with output_path.open(mode="wb") as output_file:
    pdf_writer.write(output_file)

user_pwd = "SuperSecret"
owner_pwd = "ReallySuperSecret"
pdf_writer.encrypt(user_pwd=user_pwd, owner_pwd=owner_pwd)

# ---------------
# Decrypting PDFs
# ---------------

from pathlib import Path
from PyPDF2 import PdfFileReader, PdfFileWriter

pdf_path = Path.home() / "newsletter_protected.pdf"

pdf_reader = PdfFileReader(str(pdf_path))

print(pdf_reader.getPage(0))  # Raises PdfReadError

print(pdf_reader.decrypt(password="SuperSecret"))

print(pdf_reader.getPage(0))
\ No newline at end of file
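
The script prints the value returned by `decrypt()` without interpreting it. As a hedged aid, the sketch below maps the integer codes to their meanings as described in the PyPDF2 1.x documentation. The `describe_decrypt_result` helper and the `DECRYPT_RESULTS` table are hypothetical names introduced here, and the code does not call PyPDF2 itself.

```python
# Meanings of the integer returned by decrypt(), per the PyPDF2 1.x docs.
DECRYPT_RESULTS = {
    0: "password failed",
    1: "matched the user password",
    2: "matched the owner password",
}


def describe_decrypt_result(code):
    """Translate a decrypt() return code into a readable message."""
    return DECRYPT_RESULTS.get(code, "unknown result")


print(describe_decrypt_result(1))
```

Checking the code before reading pages avoids hitting the read error that an undecrypted reader raises.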

creating-and-modifying-pdfs/source_code/06-creating-a-pdf-file-from-scratch.py (new file, 0 → 100644)

# ----------------------
# Using the Canvas Class
# ----------------------

from reportlab.pdfgen.canvas import Canvas
\ No newline at end of file