BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle)
Commit f2a32ddd
Authored Jan 24, 2018 by wanghaoshuang

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_im2seq

Parents: 1234b8b4, 95853fc1

Showing 6 changed files with 36 additions and 33 deletions (+36 -33)
doc/design/dist_refactor/parameter_server.md   +20  -20
doc/howto/optimization/cpu_profiling.md         +1   -2
paddle/gserver/layers/PriorBox.cpp              +1   -1
paddle/operators/ctc_align_op.h                 +1   -1
paddle/operators/sequence_reshape_op.h          +1   -1
python/paddle/v2/image.py                      +12   -8
doc/design/dist_refactor/parameter_server.md

@@ -9,16 +9,16 @@ different purposes.

 ## Background

-The previous implementations of the parameter server does not run a
+The previous implementations of the parameter server do not run a
 fluid sub-program. Parameter initialization, optimizer computation, network
 communication and checkpointing are implemented twice on both the
-trainer and the parameter server.
+trainer as well as the parameter server.

-It would be great if we can write code once and use them on both the
-trainer and the parameter server: reduces code duplication and
-improves extensibility. Given that after the current refactor, we are
-representing everything as a computing graph on the
-trainer. Representing everything as a computing graph on the parameter
+It would be great if we can write code once and use them on both: the
+trainer and the parameter server, since this reduces code duplication and
+improves extensibility. Given that after the current refactoring, we are
+representing everything as a computation graph on the
+trainer. Representing everything as a computation graph on the parameter
 server becomes a natural extension.

 ## Design

@@ -30,9 +30,9 @@ into sub-programs to be scheduled on different nodes with the following
 steps:

 1. OP placement: the OPs will be placed on different nodes according
-   to heuristic that minimizes estimated total computation
+   to a heuristic that minimizes the estimated total computation
    time. Currently we will use a simple heuristic that puts parameter
-   varable on parameter server workers and everything else on trainer
+   variable on parameter server workers and everything else on trainer
    workers.

 1. Add communication OPs to enable the communication between nodes.

@@ -47,22 +47,22 @@ After converting:

 <img src="src/dist-graph.png" width="700"/>

-1. The parameter variable W and it's optimizer program are placed on the parameter server.
+1. The parameter variable W and its optimizer program are placed on the parameter server.
 1. Operators are added to the program.
    - *Send* sends data to the connected *Recv* operator. The
      scheduler on the receive node will only schedule *Recv* operator
      to run when the *Send* operator has ran (the *Send* OP will mark
      the *Recv* OP runnable automatically).
-   - *Enueue* enqueues the input variable, it can block until space
+   - *Enqueue* enqueues the input variable, it can block until space
      become available in the queue.
    - *Dequeue* outputs configurable numbers of tensors from the
-     queue. It will block until the queue have the required number of
+     queue. It will block until the queue has the required number of
      tensors.

 ### Benefits

-- Model parallelism become easier to implement: it's an extension to
+- Model parallelism becomes easier to implement: it is an extension to
   the trainer - parameter server approach. We can have several "Transpilers"
   to achieve different goals.
 - User-defined optimizer is easier to add - user can now express it as

@@ -72,22 +72,22 @@ After converting:

 ### Challenges

-- It's important to balance the parameter shards of on multiple
-  parameter server. If a single parameter is very big (some
+- It is important to balance the parameter shards on multiple
+  parameter servers. If a single parameter is very big (for example: some
   word-embedding, fully connected, softmax layer), we need to
   automatically partition the single parameter onto different
   parameter servers when possible (only element-wise optimizer depends
   on the parameter variable).
-- In the "Aync SGD" figure, the "W" variable on the parameter server
-  could be read and wrote concurrently. See
+- In the "Async SGD" figure, the "W" variable on the parameter server
+  could be read and written concurrently. See
   [here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more
-  details about concurrent program in fluid.
+  details about concurrent program in Fluid.

 ### Discussion

 - Can the Enqueue OP be implemented under our current tensor design
-  (puts the input tensor into the queue tensor)?
-- *Dequeue* OP will have variable numbers of output (depends on the
+  (put the input tensor into the queue tensor)?
+- *Dequeue* OP will have variable numbers of output (depending on the
   `min_count` attribute), does our current design support it? (similar
   question for the *Add* OP)
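A note on the *Enqueue*/*Dequeue* semantics touched above: *Enqueue* blocks until space becomes available in the queue, and *Dequeue* blocks until the queue has `min_count` tensors. The Python sketch below only illustrates that blocking behavior; the `TensorQueue` class and its API are made up for illustration and are not the Fluid implementation.

```python
import threading


class TensorQueue:
    """Toy model of the Enqueue/Dequeue OP semantics described in the design doc."""

    def __init__(self, capacity):
        self._items = []
        self._capacity = capacity
        self._cond = threading.Condition()

    def enqueue(self, tensor):
        # *Enqueue*: block until space becomes available in the queue.
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) < self._capacity)
            self._items.append(tensor)
            self._cond.notify_all()

    def dequeue(self, min_count):
        # *Dequeue*: block until the queue has the required number of tensors,
        # then output that many.
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) >= min_count)
            out, self._items = self._items[:min_count], self._items[min_count:]
            self._cond.notify_all()
            return out
```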
doc/howto/optimization/cpu_profiling.md

@@ -60,8 +60,7 @@ each column is as follows:

 | column | meaning |
 | --- | --- |
 | ncalls | the number of calls into a function |
-| tottime | the total execution time of the function, not including the
- execution time of other functions called by the function |
+| tottime | the total execution time of the function, not including the execution time of other functions called by the function |
 | percall | tottime divided by ncalls |
 | cumtime | the total execution time of the function, including the execution time of other functions being called |
 | percall | cumtime divided by ncalls |
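The ncalls/tottime/percall/cumtime columns described in this table come from Python's standard cProfile/pstats tooling. A minimal, self-contained example that produces a table with exactly these columns (the `fib` function and the output file name are illustrative only):

```python
import cProfile
import pstats


def fib(n):
    # Deliberately slow recursive function so the profile has something to show.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


# Profile a single call and dump raw statistics to a file.
cProfile.run("fib(25)", "fib.prof")

# Print the columns described above: ncalls, tottime, percall, cumtime, percall.
stats = pstats.Stats("fib.prof")
stats.sort_stats("cumulative").print_stats(10)
```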
paddle/gserver/layers/PriorBox.cpp

@@ -69,7 +69,7 @@ bool PriorBoxLayer::init(const LayerMap& layerMap,
   if (maxSize_.size() > 0) CHECK_EQ(minSize_.size(), maxSize_.size());

   // flip aspect ratios
-  for (int index = 0; index < tmp.size(); index++) {
+  for (unsigned index = 0; index < tmp.size(); index++) {
     real ar = tmp[index];
     if (fabs(ar - 1.) < 1e-6) continue;
     aspectRatio_.push_back(ar);
paddle/operators/ctc_align_op.h

@@ -51,7 +51,7 @@ class CTCAlignKernel : public framework::OpKernel<T> {
       T prev_token = -1;
       for (size_t i = input_lod[level][seq_idx];
            i < input_lod[level][seq_idx + 1]; ++i) {
-        if (input_data[i] != blank &&
+        if ((unsigned)input_data[i] != blank &&
             !(merge_repeated && input_data[i] == prev_token)) {
           output_data[output_idx] = input_data[i];
           ++output_idx;
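The cast added above only affects the comparison against `blank`; the surrounding loop is the usual CTC collapse: drop blank labels and, when `merge_repeated` is set, merge consecutive repeated tokens. A short Python sketch of that logic, mirroring the kernel loop shown in the diff (illustrative only, not the operator's actual interface):

```python
def ctc_align(tokens, blank=0, merge_repeated=True):
    """Collapse a raw CTC label sequence: drop blanks and, optionally,
    skip tokens that repeat the immediately preceding one."""
    output = []
    prev_token = -1
    for t in tokens:
        if t != blank and not (merge_repeated and t == prev_token):
            output.append(t)
        prev_token = t
    return output


# Example: ctc_align([1, 1, 0, 1, 2, 2, 0, 3]) returns [1, 1, 2, 3].
```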
paddle/operators/sequence_reshape_op.h

@@ -35,7 +35,7 @@ class SequenceReshapeKernel : public framework::OpKernel<T> {
     PADDLE_ENFORCE_EQ(in_lod.size(), 1UL,
                       "Only support one level sequence now.");
-    PADDLE_ENFORCE_EQ(in_dims[0], in_lod[0].back(),
+    PADDLE_ENFORCE_EQ((uint64_t)in_dims[0], in_lod[0].back(),
                       "Inconsistent size between X.shape[0] and X.lod()[0].back().");
     auto in_lod_l0 = in_lod[0];
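The check touched above enforces that X.shape[0] equals X.lod()[0].back(). Assuming the level-0 LoD stores cumulative sequence offsets (for example [0, 2, 5, 9] for three sequences of lengths 2, 3 and 4 — a convention not shown in this diff), the last offset must equal the tensor's row count. A tiny illustrative check in Python:

```python
def check_lod_consistency(x_rows, lod_level0):
    """lod_level0 is assumed to hold cumulative sequence offsets; its last
    entry must match the number of rows in X, as the PADDLE_ENFORCE_EQ
    above asserts."""
    assert x_rows == lod_level0[-1], (
        "Inconsistent size between X.shape[0] and X.lod()[0].back().")


check_lod_consistency(9, [0, 2, 5, 9])  # passes: 2 + 3 + 4 rows in total
```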
python/paddle/v2/image.py

@@ -176,7 +176,6 @@ def resize_short(im, size):
     :param size: the shorter edge size of image after resizing.
     :type size: int
     """
-    assert im.shape[-1] == 1 or im.shape[-1] == 3
     h, w = im.shape[:2]
     h_new, w_new = size, size
     if h > w:

@@ -267,7 +266,7 @@ def random_crop(im, size, is_color=True):
     return im


-def left_right_flip(im):
+def left_right_flip(im, is_color=True):
     """
     Flip an image along the horizontal direction.
     Return the flipped image.

@@ -278,13 +277,15 @@ def left_right_flip(im):
         im = left_right_flip(im)

-    :paam im: input image with HWC layout
+    :param im: input image with HWC layout or HW layout for gray image
     :type im: ndarray
+    :param is_color: whether input image is color or not
+    :type is_color: bool
     """
-    if len(im.shape) == 3:
+    if len(im.shape) == 3 and is_color:
         return im[:, ::-1, :]
     else:
-        return im[:, ::-1, :]
+        return im[:, ::-1]


 def simple_transform(im,

@@ -321,8 +322,9 @@ def simple_transform(im,
     if is_train:
         im = random_crop(im, crop_size, is_color=is_color)
         if np.random.randint(2) == 0:
-            im = left_right_flip(im)
+            im = left_right_flip(im, is_color)
     else:
-        im = center_crop(im, crop_size, is_color)
+        im = center_crop(im, crop_size, is_color=is_color)
     if len(im.shape) == 3:
         im = to_chw(im)

@@ -331,8 +333,10 @@ def simple_transform(im,
     if mean is not None:
         mean = np.array(mean, dtype=np.float32)
         # mean value, may be one value per channel
-        if mean.ndim == 1:
+        if mean.ndim == 1 and is_color:
             mean = mean[:, np.newaxis, np.newaxis]
+        elif mean.ndim == 1:
+            mean = mean
         else:
             # elementwise mean
             assert len(mean.shape) == len(im)

@@ -372,6 +376,6 @@ def load_and_transform(filename,
         mean values per channel.
     :type mean: numpy array | list
     """
-    im = load_image(filename)
+    im = load_image(filename, is_color)
     im = simple_transform(im, resize_size, crop_size, is_train, is_color, mean)
     return im
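The new is_color handling in left_right_flip is easiest to see on an HW grayscale array versus an HWC color array. The snippet below re-implements the updated branch logic with NumPy purely for illustration (it does not import paddle.v2.image):

```python
import numpy as np


def left_right_flip(im, is_color=True):
    # Mirror the updated logic: flip along the width axis, and only index the
    # channel axis when the image really is a color HWC array.
    if len(im.shape) == 3 and is_color:
        return im[:, ::-1, :]
    else:
        return im[:, ::-1]


color = np.arange(2 * 3 * 3).reshape(2, 3, 3)       # HWC color image
gray = np.arange(2 * 3).reshape(2, 3)                # HW grayscale image
print(left_right_flip(color).shape)                  # (2, 3, 3)
print(left_right_flip(gray, is_color=False).shape)   # (2, 3)
```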