weixin_41840029 / PaddleOCR (forked from PaddlePaddle / PaddleOCR)
Commit e63fbe49
Authored Feb 10, 2022 by 锦鲤AI幸运
Parent: 373d0553

Changes complete: split the det and rec datasets
Showing 1 changed file with 18 additions and 14 deletions

PPOCRLabel/gen_ocr_train_val_test.py (+18 -14)
...
...
@@ -17,15 +17,14 @@ def isCreateOrDeleteFolder(path, flag):
     return flagAbsPath


-def splitTrainVal(root, dir, absTrainRootPath, absValRootPath, absTestRootPath, trainTxt, valTxt, testTxt, flag):
+def splitTrainVal(root, absTrainRootPath, absValRootPath, absTestRootPath, trainTxt, valTxt, testTxt, flag):
     # Split into training, validation, and test sets according to the specified ratio
-    labelPath = os.path.join(root, dir)
-    labelAbsPath = os.path.abspath(labelPath)
+    dataAbsPath = os.path.abspath(root)

     if flag == "det":
-        labelFilePath = os.path.join(labelAbsPath, args.detLabelFileName)
+        labelFilePath = os.path.join(dataAbsPath, args.detLabelFileName)
     elif flag == "rec":
-        labelFilePath = os.path.join(labelAbsPath, args.recLabelFileName)
+        labelFilePath = os.path.join(dataAbsPath, args.recLabelFileName)

     labelFileRead = open(labelFilePath, "r", encoding="UTF-8")
     labelFileContent = labelFileRead.readlines()
...
...
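The hunk above reads the label file with a bare `open()` followed by `readlines()`. As a minimal sketch of the same read, a context manager would close the file even if reading fails. `read_label_lines` is a hypothetical helper and `Label.txt` is an assumed file name used only for the demonstration; neither appears in the commit.

```python
import os
import tempfile


def read_label_lines(data_root, label_file_name):
    """Return all lines of a label file under data_root (hypothetical helper)."""
    label_file_path = os.path.join(os.path.abspath(data_root), label_file_name)
    # The with-block closes the file even if readlines() raises.
    with open(label_file_path, "r", encoding="UTF-8") as label_file:
        return label_file.readlines()


if __name__ == "__main__":
    # Demonstration on a throwaway directory; "Label.txt" is an assumed name.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "Label.txt"), "w", encoding="UTF-8") as f:
            f.write("a.jpg\tfirst\nb.jpg\tsecond\n")
        print(len(read_label_lines(tmp, "Label.txt")))
```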
@@ -38,9 +37,9 @@ def splitTrainVal(root, dir, absTrainRootPath, absValRootPath, absTestRootPath,
         imageName = os.path.basename(imageRelativePath)

         if flag == "det":
-            imagePath = os.path.join(labelAbsPath, imageName)
+            imagePath = os.path.join(dataAbsPath, imageName)
         elif flag == "rec":
-            imagePath = os.path.join(labelAbsPath, "{}\\{}".format(args.recImageDirName, imageName))
+            imagePath = os.path.join(dataAbsPath, "{}\\{}".format(args.recImageDirName, imageName))

         # Split into training, validation, and test sets according to the preset ratio
         trainValTestRatio = args.trainValTestRatio.split(":")
...
...
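The `rec` branch in the hunk above joins the crop directory and image name with a hard-coded `"\\"`. On POSIX systems a backslash is an ordinary file-name character, so that path may not resolve outside Windows. A minimal sketch of a portable alternative passes each component to `os.path.join` separately. `rec_image_path` is a hypothetical helper, not part of the commit.

```python
import os


def rec_image_path(data_abs_path, rec_image_dir_name, image_name):
    """Build the cropped-image path using the platform's own separator."""
    # Each component is a separate argument, so no separator is hard-coded.
    return os.path.join(data_abs_path, rec_image_dir_name, image_name)


if __name__ == "__main__":
    print(rec_image_path("train_data", "crop_img", "0001.jpg"))
```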
@@ -90,15 +89,20 @@ def genDetRecTrainVal(args):
     recValTxt = open(os.path.join(args.recRootPath, "val.txt"), "a", encoding="UTF-8")
     recTestTxt = open(os.path.join(args.recRootPath, "test.txt"), "a", encoding="UTF-8")

-    for root, dirs, files in os.walk(args.labelRootPath):
+    splitTrainVal(args.datasetRootPath, detAbsTrainRootPath, detAbsValRootPath, detAbsTestRootPath, detTrainTxt, detValTxt,
+                  detTestTxt, "det")
+
+    for root, dirs, files in os.walk(args.datasetRootPath):
         for dir in dirs:
-            splitTrainVal(root, dir, detAbsTrainRootPath, detAbsValRootPath, detAbsTestRootPath, detTrainTxt, detValTxt,
-                          detTestTxt, "det")
-            splitTrainVal(root, dir, recAbsTrainRootPath, recAbsValRootPath, recAbsTestRootPath, recTrainTxt, recValTxt,
-                          recTestTxt, "rec")
+            if dir == 'crop_img':
+                splitTrainVal(root, recAbsTrainRootPath, recAbsValRootPath, recAbsTestRootPath, recTrainTxt, recValTxt,
+                              recTestTxt, "rec")
+            else:
+                continue
         break


 if __name__ == "__main__":
     # Description: split the detection and recognition data into training, validation, and test sets
     # Note: adjust the parameters to your own paths and needs. Image data is often annotated in batches by several collaborators; each batch is placed in its own folder and annotated with PPOCRLabel,
...
...
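The loop in the hunk above relies on a `break` after the first iteration of `os.walk`, so only the immediate subdirectories of the dataset root are inspected, and only `crop_img` triggers the `rec` split. A minimal sketch of that pattern in isolation follows. `top_level_dirs` is a hypothetical helper, not part of the commit.

```python
import os
import tempfile


def top_level_dirs(dataset_root, wanted="crop_img"):
    """Return immediate subdirectories of dataset_root whose name is `wanted`."""
    matches = []
    for root, dirs, _files in os.walk(dataset_root):
        for name in dirs:
            if name == wanted:
                matches.append(os.path.join(root, name))
        break  # os.walk yields the top-level directory first; stop there
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "crop_img"))
        os.makedirs(os.path.join(tmp, "other", "crop_img"))  # nested: ignored
        print(top_level_dirs(tmp))
```

Because of the `break`, a `crop_img` folder nested deeper in the tree is never visited.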
@@ -110,9 +114,9 @@ if __name__ == "__main__":
         default="6:2:2",
         help="ratio of trainset:valset:testset")
     parser.add_argument(
-        "--labelRootPath",
+        "--datasetRootPath",
         type=str,
-        default="../train_data/label",
+        default="../train_data/",
         help="path to the dataset marked by ppocrlabel, E.g, dataset folder named 1,2,3..."
     )
     parser.add_argument(
...
...
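The hunk above renames the argument to `--datasetRootPath` and sits next to the `"6:2:2"` ratio default. The sketch below shows how those two arguments might be declared and how a ratio string could be turned into per-set item counts. The script's actual split logic is elided from this diff, so `split_counts` and its rounding rule (remainder goes to the test set) are assumptions for illustration only.

```python
import argparse


def build_parser():
    """Declare only the two arguments visible in the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--trainValTestRatio", type=str, default="6:2:2",
                        help="ratio of trainset:valset:testset")
    parser.add_argument("--datasetRootPath", type=str, default="../train_data/",
                        help="path to the dataset marked by ppocrlabel")
    return parser


def split_counts(ratio, total):
    """Turn a 'train:val:test' ratio string into three item counts (assumed rule)."""
    parts = [int(p) for p in ratio.split(":")]
    if len(parts) != 3 or sum(parts) <= 0:
        raise ValueError("expected a ratio like '6:2:2'")
    train = total * parts[0] // sum(parts)
    val = total * parts[1] // sum(parts)
    return train, val, total - train - val  # remainder goes to the test set


if __name__ == "__main__":
    args = build_parser().parse_args([])  # empty list: use the defaults
    print(args.datasetRootPath, split_counts(args.trainValTestRatio, 10))
```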