Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
机器未来
Paddle
提交
d2c0e2f0
P
Paddle
项目概览
机器未来
/
Paddle
与 Fork 源项目一致
Fork自
PaddlePaddle / Paddle
通知
1
Star
1
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
1
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
Paddle
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
1
Issue
1
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
提交
d2c0e2f0
编写于
6月 21, 2018
作者:
S
sneaxiy
浏览文件
操作
浏览文件
下载
差异文件
Merge branch 'develop' of
https://github.com/PaddlePaddle/Paddle
into python_data_feeding
上级
6f745834
0151e4eb
变更
4
隐藏空白更改
内联
并排
Showing
4 changed file
with
67 addition
and
1 deletion
+67
-1
paddle/fluid/framework/details/multi_devices_graph_builder.cc
...le/fluid/framework/details/multi_devices_graph_builder.cc
+11
-1
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc
...le/fluid/framework/details/threaded_ssa_graph_executor.cc
+2
-0
paddle/fluid/framework/details/threaded_ssa_graph_executor.h
paddle/fluid/framework/details/threaded_ssa_graph_executor.h
+1
-0
tools/check_ctest_hung.py
tools/check_ctest_hung.py
+53
-0
未找到文件。
paddle/fluid/framework/details/multi_devices_graph_builder.cc
浏览文件 @
d2c0e2f0
...
...
@@ -200,6 +200,10 @@ std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build(
BuildStrategy
::
GradientScaleStrategy
::
kCustomized
)
{
CreateScaleLossGradOp
(
&
result
);
}
// This assumes the backward generating code will ensure IsScaleLossOp
// is true only for the op that scale the final scalar loss.
// It also assumes backward op will always follow the forward op in
// the block.
is_forwarding
=
false
;
}
else
{
int
op_dev_id
=
GetOpDeviceID
(
*
op
);
...
...
@@ -244,6 +248,9 @@ std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build(
InsertAllReduceOp
(
&
result
,
g_name
);
}
break
;
default:
LOG
(
FATAL
)
<<
"Unknown reduce strategy "
;
break
;
}
}
}
catch
(
boost
::
bad_get
e
)
{
...
...
@@ -262,7 +269,7 @@ std::unique_ptr<SSAGraph> MultiDevSSAGraphBuilder::Build(
}
/*
Dependency graph has been constructed. However, there are still data
ha
rzae
ds need to be handled.
ha
zar
ds need to be handled.
*/
PolishGraphToSupportDataHazards
(
&
result
);
...
...
@@ -447,6 +454,8 @@ VarHandle *MultiDevSSAGraphBuilder::CreateReduceOp(SSAGraph *result,
return
var
;
}
// Find the first occurence of `prev_op_name` and make current `op` depend
// on it.
void
MultiDevSSAGraphBuilder
::
ConnectOp
(
SSAGraph
*
result
,
OpHandleBase
*
op
,
const
std
::
string
&
prev_op_name
)
const
{
for
(
auto
&
prev_op
:
result
->
ops_
)
{
...
...
@@ -490,6 +499,7 @@ void MultiDevSSAGraphBuilder::CreateDistTrainOp(SSAGraph *result,
}
}
// Create RPC related op handles that connects its in ops and out ops.
void
MultiDevSSAGraphBuilder
::
CreateRPCOp
(
SSAGraph
*
result
,
const
OpDesc
&
op
)
const
{
int
op_dev_id
=
-
1
;
...
...
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc
浏览文件 @
d2c0e2f0
...
...
@@ -96,6 +96,7 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
auto
cur_ready_vars
=
ready_vars
.
PopAll
(
1
,
&
timeout
);
if
(
timeout
)
{
std
::
lock_guard
<
std
::
mutex
>
l
(
exception_mu_
);
if
(
exception_
)
{
auto
exp
=
*
exception_
;
exception_
.
reset
();
...
...
@@ -199,6 +200,7 @@ void ThreadedSSAGraphExecutor::RunOp(
ready_var_q
->
Extend
(
op
->
Outputs
());
VLOG
(
10
)
<<
op
<<
" "
<<
op
->
Name
()
<<
"Signal posted"
;
}
catch
(
platform
::
EnforceNotMet
ex
)
{
std
::
lock_guard
<
std
::
mutex
>
l
(
exception_mu_
);
exception_
.
reset
(
new
platform
::
EnforceNotMet
(
ex
));
}
catch
(...)
{
LOG
(
FATAL
)
<<
"Unknown exception catched"
;
...
...
paddle/fluid/framework/details/threaded_ssa_graph_executor.h
浏览文件 @
d2c0e2f0
...
...
@@ -56,6 +56,7 @@ class ThreadedSSAGraphExecutor : public SSAGraphExecutor {
std
::
vector
<
Scope
*>
local_scopes_
;
std
::
vector
<
platform
::
Place
>
places_
;
platform
::
DeviceContextPool
fetch_ctxs_
;
std
::
mutex
exception_mu_
;
std
::
unique_ptr
<
platform
::
EnforceNotMet
>
exception_
;
std
::
atomic
<
int
>
running_ops_
;
...
...
tools/check_ctest_hung.py
0 → 100644
浏览文件 @
d2c0e2f0
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import
sys
import
re
def
escape
(
input
):
o
=
input
.
replace
(
"
\n
"
,
""
)
o
=
o
.
replace
(
"
\r
"
,
""
)
return
o
def
main
():
usage
=
"""Usage:
1. Download the Paddle_PR_CI_*.log from TeamCity
2. run: python check_ctest_hung.py Paddle_PR_CI_*.log
3. If there is hung ctest, the result likes:
Diff: set(['test_parallel_executor_crf'])
"""
if
len
(
sys
.
argv
)
<
2
:
print
(
usage
)
exit
(
0
)
logfile
=
sys
.
argv
[
1
]
started
=
set
()
passed
=
set
()
with
open
(
logfile
,
"r"
)
as
fn
:
for
l
in
fn
.
readlines
():
if
l
.
find
(
"Test "
)
!=
-
1
and
\
l
.
find
(
"Passed"
)
!=
-
1
:
m
=
re
.
search
(
"Test\s+#[0-9]*\:\s([a-z0-9_]+)"
,
escape
(
l
))
passed
.
add
(
m
.
group
(
1
))
if
l
.
find
(
"Start "
)
!=
-
1
:
start_parts
=
escape
(
l
).
split
(
" "
)
m
=
re
.
search
(
"Start\s+[0-9]+\:\s([a-z0-9_]+)"
,
escape
(
l
))
started
.
add
(
m
.
group
(
1
))
print
"Diff: "
,
started
-
passed
if
__name__
==
"__main__"
:
main
()
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录