BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle)
Commit 09abfd91
Authored Jun 26, 2017 by helinwang
Committed by GitHub on Jun 26, 2017
Merge pull request #2593 from typhoonzero/set_ps_desired
Set ps_desired when pserver init
Parents: 09eb6be2, 0824061e
Showing 2 changed files with 29 additions and 2 deletions
go/cmd/pserver/pserver.go (+2, -1)
go/pserver/service.go (+27, -1)
go/cmd/pserver/pserver.go
@@ -18,6 +18,7 @@ func main() {
 	etcdEndpoint := flag.String("etcd-endpoint", "http://127.0.0.1:2379",
 		"comma separated endpoint string for pserver to connect to etcd")
 	etcdTimeout := flag.Int("etcd-timeout", 5, "timeout for etcd calls")
+	numPservers := flag.Int("num-pservers", 1, "total pserver count in a training job")
 	logLevel := flag.String("log-level", "info",
 		"log level, possible values: debug, info, warning, error, fatal, panic")
 	flag.Parse()
@@ -29,7 +30,7 @@ func main() {
 	log.SetLevel(level)
 	timeout := time.Second * time.Duration((*etcdTimeout))
-	s, err := pserver.NewService(*etcdEndpoint, timeout)
+	s, err := pserver.NewService(*etcdEndpoint, *numPservers, timeout)
 	if err != nil {
 		panic(err)
 	}
go/pserver/service.go
@@ -73,7 +73,7 @@ type Service struct {
 // NewService creates a new service, will bypass etcd registration if no
 // endpoints specified.
-func NewService(endpoints string, timeout time.Duration) (*Service, error) {
+func NewService(endpoints string, numPservers int, timeout time.Duration) (*Service, error) {
 	s := &Service{opt: newOptimizer(sgd, 0.005)}
 	s.paramMap = make(map[string]Parameter)
 	s.initialized = make(chan struct{})
@@ -103,6 +103,22 @@ func NewService(endpoints string, timeout time.Duration) (*Service, error) {
 			log.Debugf("inited client to %s", s.etcdEndpoints)
 			break
 		}
+		// init /ps_desired using transaction, for multiple pservers may want to write
+		// it at the same time.
+		for {
+			ctx, cancel := context.WithTimeout(context.Background(), time.Second)
+			_, err := s.initDesiredPsercers(ctx, numPservers)
+			cancel()
+			if err != nil {
+				log.Warn(err)
+				time.Sleep(s.etcdTimeout)
+				continue
+			}
+			break
+		}
+		// TODO: when implementing extending or reducing pservers, /ps_desired is
+		// changed, then we need to watch /ps_desired node for events. For now, just
+		// write once when init and read from it.
 		// wait and set s.desired init value
 		for {
 			ctx, cancel := context.WithTimeout(context.Background(), time.Second)
@@ -141,6 +157,16 @@ func NewService(endpoints string, timeout time.Duration) (*Service, error) {
 	return s, nil
 }
+func (s *Service) initDesiredPsercers(ctx context.Context, numPservers int) (*clientv3.TxnResponse, error) {
+	return concurrency.NewSTM(s.etcdClient, func(c concurrency.STM) error {
+		dsStr := c.Get(PsDesired)
+		if dsStr == "" {
+			c.Put(PsDesired, strconv.Itoa(numPservers))
+		}
+		return nil
+	}, concurrency.WithAbortContext(ctx), concurrency.WithIsolation(concurrency.RepeatableReads))
+}
 // registerPserverEtcd registers pserver node on etcd using transaction.
 func (s *Service) registerPserverEtcd(ctx context.Context) (*clientv3.TxnResponse, error) {
 	return concurrency.NewSTM(s.etcdClient, func(c concurrency.STM) error {
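`initDesiredPsercers` reads the desired-count key inside a transaction and writes it only when it is empty, so when several pservers start at once the first committed write wins and later ones leave it untouched. The sketch below imitates that "put if absent" behaviour with an in-memory map rather than etcd. `kvStore` and `putIfAbsent` are hypothetical, the mutex only stands in for the transactional guarantee, and the key string `/ps_desired` is taken from the comment in the diff rather than from the `PsDesired` constant, whose value is not shown.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// kvStore is a hypothetical in-memory stand-in for etcd. The mutex makes the
// read-then-write step atomic, which is the role the transaction plays in the commit.
type kvStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newKVStore() *kvStore { return &kvStore{data: make(map[string]string)} }

// putIfAbsent stores value only when key is unset and returns whatever is
// stored afterwards. The commit tests for an empty string instead of a
// missing key; this sketch checks map membership.
func (s *kvStore) putIfAbsent(key, value string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.data[key]; ok {
		return cur
	}
	s.data[key] = value
	return value
}

func main() {
	store := newKVStore()
	results := make([]string, 4)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each simulated pserver proposes a different count so the race is visible.
			results[i] = store.putIfAbsent("/ps_desired", strconv.Itoa(i+1))
		}(i)
	}
	wg.Wait()
	// Every entry holds the same value: the one written by whichever goroutine ran first.
	fmt.Println(results)
}
```

In a real job every pserver is expected to be started with the same `num-pservers` value, so the race affects only who writes the key, not what is written.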