Commit 70142ae6
Authored on Jan 23, 2018 by typhoonzero
update dist benchmark to one image
Parent: da3b14bc
Showing 13 changed files with 86 additions and 96 deletions (+86 -96)
benchmark/cluster/vgg16/Dockerfile (+1 -1)
benchmark/cluster/vgg16/README.md (+58 -0)
benchmark/cluster/vgg16/fluid_pserver.yaml (+3 -3)
benchmark/cluster/vgg16/fluid_trainer.yaml (+3 -3)
benchmark/cluster/vgg16/k8s_tools.py (+0 -0)
benchmark/cluster/vgg16/paddle_k8s (+0 -0)
benchmark/cluster/vgg16/reader.py (+0 -0)
benchmark/cluster/vgg16/v2/Dockerfile (+0 -7)
benchmark/cluster/vgg16/v2/reader.py (+0 -70)
benchmark/cluster/vgg16/v2_pserver.yaml (+2 -2)
benchmark/cluster/vgg16/v2_trainer.yaml (+5 -3)
benchmark/cluster/vgg16/vgg16_fluid.py (+0 -0)
benchmark/cluster/vgg16/vgg16_v2.py (+14 -7)
benchmark/cluster/vgg16/fluid/Dockerfile → benchmark/cluster/vgg16/Dockerfile
@@ -12,4 +12,4 @@ ENV LD_LIBRARY_PATH=/usr/local/lib
 ADD reader.py /workspace/
 RUN python /workspace/reader.py
-ADD vgg16.py /workspace/
+ADD vgg16_fluid.py vgg16_v2.py /workspace/
benchmark/cluster/vgg16/fluid/README.md → benchmark/cluster/vgg16/README.md
-# Fluid distributed training perf test
+# Performance for distributed vgg16
-## Steps to get started
+## Test Result
+### Single node single thread
+| Batch Size | 32 | 64 | 128 | 256 |
+| -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | - | 16.74 | - |
+| PaddlePaddle v2 | - | - | 17.60 | - |
+| TensorFlow | - | - | - | - |
+### different batch size
+- PServer Count: 10
+- Trainer Count: 20
+- Metrics: samples / sec
+| Batch Size | 32 | 64 | 128 | 256 |
+| -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | 247.40 | - | - |
+| PaddlePaddle v2 | - | - | 256.14 | - |
+| TensorFlow | - | - | - | - |
+### different pserver number
+- Trainer Count: 100
+- Batch Size: 64
+- Metrics: mini-batch / sec
+| PServer Count | 10 | 20 | 40 | 60 |
+| -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | - | - | - |
+| PaddlePaddle v2 | - | - | - | - |
+| TensorFlow | - | - | - | - |
+### Accelerate rate
+| Trainer Counter | 20 | 40 | 80 | 100 |
+| -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | - | - | - |
+| PaddlePaddle v2 | - | - | - | - |
+| TensorFlow | - | - | - | - |
+## Steps to run the performance test
 1. You must re-compile PaddlePaddle and enable `-DWITH_DISTRIBUTE` to build PaddlePaddle with distributed support.
 1. When the build finishes, copy the output `whl` package located under `build/python/dist` to current directory.
 ...
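The README tables above report throughput in samples / sec and mini-batch / sec, plus an "Accelerate rate" table that is still empty. As a rough illustration of how such figures can be derived from wall-clock timings, here is a minimal sketch. The function names `samples_per_sec` and `speedup` are hypothetical, and treating the accelerate rate as cluster throughput divided by single-node throughput is an assumption that the README does not confirm. The numbers in the usage part are illustrative, not measurements.

```python
def samples_per_sec(batch_size, num_batches, elapsed_sec):
    """Throughput in samples per second over a timed window."""
    if elapsed_sec <= 0:
        raise ValueError("elapsed_sec must be positive")
    return batch_size * num_batches / float(elapsed_sec)


def speedup(cluster_throughput, single_node_throughput):
    """Cluster throughput relative to a single-node baseline.

    Assumed definition of "accelerate rate"; not taken from the README.
    """
    if single_node_throughput <= 0:
        raise ValueError("single_node_throughput must be positive")
    return cluster_throughput / float(single_node_throughput)


if __name__ == "__main__":
    # Illustrative numbers only.
    tput = samples_per_sec(batch_size=128, num_batches=10, elapsed_sec=5.0)
    print("throughput: %.1f samples/sec" % tput)
    print("speedup: %.2fx" % speedup(tput, 16.0))
```

Comparing runs only makes sense when both use the same batch size and the same metric, which is why the tables fix those values per section.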
benchmark/cluster/vgg16/fluid/pserver.yaml → benchmark/cluster/vgg16/fluid_pserver.yaml
@@ -14,7 +14,7 @@ spec:
       - name: job-registry-secret
       containers:
       - name: pserver
-        image: "registry.baidu.com/paddlepaddle/rawjob:vgg16_fluid"
+        image: "registry.baidu.com/paddlepaddle/fluid_benchmark:vgg16"
         imagePullPolicy: Always
         ports:
         - name: jobport-30236
@@ -33,7 +33,7 @@ spec:
         - name: TOPOLOGY
           value: ""
         - name: ENTRY
-          value: "LD_LIBRARY_PATH=/usr/local/lib MKL_NUM_THREADS=1 python /workspace/vgg16.py --local 0"
+          value: "MKL_NUM_THREADS=1 python /workspace/vgg16_fluid.py --local 0"
         - name: TRAINER_PACKAGE
           value: "/workspace"
         - name: PADDLE_INIT_PORT
@@ -53,7 +53,7 @@ spec:
         - name: PADDLE_INIT_USE_GPU
           value: "0"
         - name: LD_LIBRARY_PATH
-          value: "/usr/local/nvidia/lib64"
+          value: "/usr/local/lib:/usr/local/nvidia/lib64"
         - name: NAMESPACE
           valueFrom:
             fieldRef:
 ...
benchmark/cluster/vgg16/fluid/trainer.yaml → benchmark/cluster/vgg16/fluid_trainer.yaml
@@ -15,7 +15,7 @@ spec:
       hostNetwork: true
       containers:
       - name: trainer
-        image: "registry.baidu.com/paddlepaddle/rawjob:vgg16_fluid"
+        image: "registry.baidu.com/paddlepaddle/fluid_benchmark:vgg16"
         imagePullPolicy: Always
         command: ["paddle_k8s", "start_fluid"]
         env:
@@ -30,7 +30,7 @@ spec:
         - name: TOPOLOGY
           value: ""
         - name: ENTRY
-          value: "cd /workspace && LD_LIBRARY_PATH=/usr/local/lib MKL_NUM_THREADS=1 python /workspace/vgg16.py --local 0"
+          value: "MKL_NUM_THREADS=1 python /workspace/vgg16_fluid.py --local 0 --batch_size 128"
         - name: TRAINER_PACKAGE
           value: "/workspace"
         - name: PADDLE_INIT_PORT
@@ -50,7 +50,7 @@ spec:
         - name: PADDLE_INIT_USE_GPU
           value: "0"
         - name: LD_LIBRARY_PATH
-          value: "/usr/local/nvidia/lib64"
+          value: "/usr/local/lib:/usr/local/nvidia/lib64"
         - name: NAMESPACE
           valueFrom:
             fieldRef:
 ...
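The new trainer `ENTRY` passes `--local 0 --batch_size 128` to `vgg16_fluid.py`. This commit does not show how that script parses its flags, so the following is only a sketch of one way a script could accept them with the standard `argparse` module. The `str2bool` helper, the defaults, and the help strings are assumptions made for illustration.

```python
import argparse


def str2bool(value):
    """Interpret common truthy strings such as '1' or 'true' as True."""
    return str(value).lower() in ("1", "true", "yes")


def parse_args(argv=None):
    """Parse the two flags used in ENTRY (hypothetical defaults)."""
    parser = argparse.ArgumentParser(description="Hypothetical benchmark CLI")
    parser.add_argument("--local", type=str2bool, default=True,
                        help="run on a single node instead of the cluster")
    parser.add_argument("--batch_size", type=int, default=128,
                        help="mini-batch size per trainer")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--local", "0", "--batch_size", "128"])
    print(args.local, args.batch_size)
```

Passing the batch size on the command line keeps the YAML as the single place to change it between benchmark runs.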
benchmark/cluster/vgg16/fluid/k8s_tools.py → benchmark/cluster/vgg16/k8s_tools.py
File moved
benchmark/cluster/vgg16/fluid/paddle_k8s → benchmark/cluster/vgg16/paddle_k8s
File moved
benchmark/cluster/vgg16/fluid/reader.py → benchmark/cluster/vgg16/reader.py
File moved
benchmark/cluster/vgg16/v2/Dockerfile
Deleted (100644 → 0), last present in da3b14bc
-FROM paddlepaddle/paddlecloud-job
-RUN mkdir -p /workspace
-ADD reader.py /workspace/
-RUN python /workspace/reader.py
-ADD vgg16.py /workspace/
-ADD vgg16_fluid.py /workspace
benchmark/cluster/vgg16/v2/reader.py
Deleted (100644 → 0), last present in da3b14bc
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
-#
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
-
-import random
-from paddle.v2.image import load_and_transform
-import paddle.v2 as paddle
-from multiprocessing import cpu_count
-
-
-def train_mapper(sample):
-    '''
-    map image path to type needed by model input layer for the training set
-    '''
-    img, label = sample
-    img = paddle.image.load_image(img)
-    img = paddle.image.simple_transform(img, 256, 224, True)
-    return img.flatten().astype('float32'), label
-
-
-def test_mapper(sample):
-    '''
-    map image path to type needed by model input layer for the test set
-    '''
-    img, label = sample
-    img = paddle.image.load_image(img)
-    img = paddle.image.simple_transform(img, 256, 224, True)
-    return img.flatten().astype('float32'), label
-
-
-def train_reader(train_list, buffered_size=1024):
-    def reader():
-        with open(train_list, 'r') as f:
-            lines = [line.strip() for line in f]
-            for line in lines:
-                img_path, lab = line.strip().split('\t')
-                yield img_path, int(lab)
-
-    return paddle.reader.xmap_readers(train_mapper, reader,
-                                      cpu_count(), buffered_size)
-
-
-def test_reader(test_list, buffered_size=1024):
-    def reader():
-        with open(test_list, 'r') as f:
-            lines = [line.strip() for line in f]
-            for line in lines:
-                img_path, lab = line.strip().split('\t')
-                yield img_path, int(lab)
-
-    return paddle.reader.xmap_readers(test_mapper, reader,
-                                      cpu_count(), buffered_size)
-
-
-if __name__ == '__main__':
-    #for im in train_reader('train.list'):
-    #    print len(im[0])
-    #for im in train_reader('test.list'):
-    #    print len(im[0])
-    paddle.dataset.cifar.train10()
benchmark/cluster/vgg16/v2/pserver.yaml → benchmark/cluster/vgg16/v2_pserver.yaml
@@ -14,7 +14,7 @@ spec:
       - name: job-registry-secret
       containers:
       - name: pserver
-        image: "registry.baidu.com/paddlepaddle/rawjob:vgg16"
+        image: "registry.baidu.com/paddlepaddle/fluid_benchmark:vgg16"
         imagePullPolicy: Always
         ports:
         - name: jobport-30236
@@ -49,7 +49,7 @@ spec:
         - name: PADDLE_INIT_USE_GPU
           value: "0"
         - name: LD_LIBRARY_PATH
-          value: "/usr/local/nvidia/lib64"
+          value: "/usr/local/lib:/usr/local/nvidia/lib64"
         - name: NAMESPACE
           valueFrom:
             fieldRef:
 ...
benchmark/cluster/vgg16/v2/trainer.yaml → benchmark/cluster/vgg16/v2_trainer.yaml
@@ -15,12 +15,14 @@ spec:
       hostNetwork: true
       containers:
       - name: trainer
-        image: "registry.baidu.com/paddlepaddle/rawjob:vgg16"
+        image: "registry.baidu.com/paddlepaddle/fluid_benchmark:vgg16"
         imagePullPolicy: Always
         command: ["paddle_k8s", "start_trainer", "v2"]
         env:
         - name: PADDLE_JOB_NAME
           value: vgg16v2job
+        - name: BATCH_SIZE
+          value: "128"
         - name: TRAINERS
           value: "20"
         - name: PSERVERS
@@ -28,7 +30,7 @@ spec:
         - name: TOPOLOGY
           value: ""
         - name: ENTRY
-          value: "cd /workspace && MKL_NUM_THREADS=1 python /workspace/vgg16.py"
+          value: "cd /workspace && MKL_NUM_THREADS=1 python /workspace/vgg16_v2.py"
         - name: TRAINER_PACKAGE
           value: "/workspace"
         - name: PADDLE_INIT_PORT
@@ -48,7 +50,7 @@ spec:
         - name: PADDLE_INIT_USE_GPU
           value: "0"
         - name: LD_LIBRARY_PATH
-          value: "/usr/local/nvidia/lib64"
+          value: "/usr/local/lib:/usr/local/nvidia/lib64"
         - name: NAMESPACE
           valueFrom:
             fieldRef:
 ...
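The trainer spec now exports `BATCH_SIZE` next to `TRAINERS`, and the v2 training script in this commit reads both through `os.getenv`. Note that `int(os.getenv("TRAINERS"))` raises a `TypeError` when the variable is unset, because `os.getenv` then returns `None`. Below is a minimal sketch of a more defensive reader. The helper name `env_int` and the fallback of 1 trainer are assumptions for illustration, not part of the commit.

```python
import os


def env_int(name, default, environ=None):
    """Read an integer setting from the environment, falling back to default.

    `environ` can be any mapping; it defaults to os.environ and exists so the
    helper can be exercised without touching the real process environment.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)


if __name__ == "__main__":
    batch_size = env_int("BATCH_SIZE", 128)
    node_count = env_int("TRAINERS", 1)  # assumed fallback: single trainer
    print("batch_size=%d node_count=%d" % (batch_size, node_count))
```

A non-numeric value still raises `ValueError`, which is usually preferable to silently training with an unintended batch size.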
benchmark/cluster/vgg16/fluid/vgg16.py → benchmark/cluster/vgg16/vgg16_fluid.py
File moved
benchmark/cluster/vgg16/v2/vgg16.py → benchmark/cluster/vgg16/vgg16_v2.py
@@ -16,12 +16,17 @@ import gzip
 import paddle.v2.dataset.cifar as cifar
 import paddle.v2 as paddle
-import reader
 import time
+import os
 
 DATA_DIM = 3 * 32 * 32
 CLASS_DIM = 10
-BATCH_SIZE = 128
+BATCH_SIZE = os.getenv("BATCH_SIZE")
+if BATCH_SIZE:
+    BATCH_SIZE = int(BATCH_SIZE)
+else:
+    BATCH_SIZE = 128
+NODE_COUNT = int(os.getenv("TRAINERS"))
 
 ts = 0
 ...
@@ -84,7 +89,8 @@ def main():
         name="label", type=paddle.data_type.integer_value(CLASS_DIM))
     extra_layers = None
-    learning_rate = 1e-3 / 20
+    # NOTE: for v2 distributed training need averaging updates.
+    learning_rate = 1e-3 / NODE_COUNT
     out = vgg16(image, class_dim=CLASS_DIM)
     cost = paddle.layer.classification_cost(input=out, label=lbl)
 ...
@@ -123,7 +129,9 @@ def main():
     # End batch and end pass event handler
     def event_handler(event):
-        global ts
+        global ts, ts_pass
+        if isinstance(event, paddle.event.BeginPass):
+            ts_pass = time.time()
         if isinstance(event, paddle.event.BeginIteration):
             ts = time.time()
         if isinstance(event, paddle.event.EndIteration):
 ...
@@ -132,9 +140,8 @@ def main():
                 event.pass_id, event.batch_id, event.cost, event.metrics,
                 time.time() - ts)
         if isinstance(event, paddle.event.EndPass):
-            with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
-                trainer.save_parameter_to_tar(f)
+            print "Pass %d end, spent: %f" % (event.pass_id,
+                                              time.time() - ts_pass)
             result = trainer.test(reader=test_reader)
             print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
 ...
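Two ideas in this diff can be shown in isolation: dividing the base learning rate by the trainer count, and timing a whole pass between the `BeginPass` and `EndPass` events. The sketch below is standalone Python that does not import `paddle`. The names `scaled_learning_rate` and `PassTimer` are hypothetical. Reading the NOTE comment as "updates from all trainers are summed, so dividing by the node count averages them" is an interpretation rather than something the commit states.

```python
import time


def scaled_learning_rate(base_lr, node_count):
    """Divide the base learning rate by the number of trainer nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return base_lr / float(node_count)


class PassTimer(object):
    """Measure elapsed time between the start and end of a pass."""

    def __init__(self, clock=time.time):
        # The clock is injectable so the timer can be tested deterministically.
        self._clock = clock
        self._start = None

    def begin(self):
        self._start = self._clock()

    def end(self):
        if self._start is None:
            raise RuntimeError("begin() was not called")
        return self._clock() - self._start


if __name__ == "__main__":
    print("learning rate: %g" % scaled_learning_rate(1e-3, 20))
    timer = PassTimer()
    timer.begin()
    print("pass spent: %f sec" % timer.end())
```

Holding the start time in an object instead of a module-level `global` avoids a `NameError` if an end event ever arrives before the matching begin event.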