Commit b3827473 authored by Yancey1989

add note message

Parent af8c7288
...
@@ -11,6 +11,10 @@ the gradient to Parameter Server to execute the optimize program.
## Design
**NOTE**: this approach is a feature of Fluid distributed training; you may want
to read [Distributed Architecture](./distributed_architecture.md) and
[Parameter Server](./parameter_server.md) before reading the following content.
Fluid large model distributed training uses the
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) to split
a large parameter into multiple parameters which are stored on the Parameter Server, and
......
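For context, the splitting described in the excerpt above is driven by the transpiler API. Below is a minimal sketch of how a Fluid program might be transpiled for parameter-server training; the endpoints, trainer count, and node roles are illustrative assumptions, and exact argument names can vary across Fluid versions.

```python
import paddle.fluid as fluid

# Build the single-machine program first (model definition omitted here).
# The DistributeTranspiler rewrites it for distributed training: large
# parameters are split into blocks, and each block is placed on one of
# the parameter-server endpoints.
t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id=0,                                  # id of this trainer process
    pservers="192.168.0.1:6174,192.168.0.2:6174",  # illustrative pserver endpoints
    trainers=2)                                    # total number of trainers

# On a parameter-server node: get the program holding this node's parameter
# blocks, plus the matching startup program.
pserver_prog = t.get_pserver_program("192.168.0.1:6174")
pserver_startup = t.get_startup_program("192.168.0.1:6174", pserver_prog)

# On a trainer node: the rewritten trainer program sends gradients to the
# pservers and fetches the updated parameter blocks back.
trainer_prog = t.get_trainer_program()
```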