diff --git a/doc/design/ops/dist_train.md b/doc/design/ops/dist_train.md
index 0bc350d8c084ee3b3bfc4828a85d20395230cfc5..8e92c87a59b880d7a257ae0a3faca8ddbce2c309 100644
--- a/doc/design/ops/dist_train.md
+++ b/doc/design/ops/dist_train.md
@@ -2,8 +2,8 @@
 
 ## Abstract
 
-We propose an approach to implment the parameter server. In this
-approach, there is no fundimental difference between the trainer and
+We propose an approach to implement the parameter server. In this
+approach, there is no fundamental difference between the trainer and
 the parameter server: they both run sub-graphs, but sub-graphs of
 different purposes.
 
@@ -16,7 +16,7 @@ trainer and the parameter server.
 
 It would be great if we can write code once and use them on both the
 trainer and the parameter server: reduces code duplication and
-improves extensibility. Given during the current refactor, we are
+improves extensibility. Given that after the current refactor, we are
 representing everything as a computing graph on the
 trainer. Representing everything as a computing graph on the parameter
 server becomes a natural extension.
@@ -25,8 +25,8 @@ server becomes a natural extension.
 
 ### Graph Converter
 
-The *graph converter* converts user-defined operation (OP) graph into
-sub-graphs to be scheduled on different nodes.
+The *graph converter* converts the user-defined operation (OP) graph
+into sub-graphs to be scheduled on different nodes.
 
 1. The user-defined OP graph will be cut into sub-graphs of
    different purposes (e.g., trainer, parameter server) to run on
@@ -66,7 +66,7 @@ After converting:
   a subgraph.
 
 - No more duplication logic inside the trainer and the parameter
-  server in the background section.
+  server mentioned in the background section.
 
 ### Challenges
 
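Note (not part of the diff above): the Graph Converter hunk describes cutting the user-defined OP graph into sub-graphs of different purposes (e.g., trainer, parameter server) to be scheduled on different nodes. Below is a minimal sketch of that idea, assuming a hypothetical `placement` attribute on each OP; the names `Op` and `split_by_placement` are illustrative only and are not PaddlePaddle's actual API.

# Illustrative sketch only: cut an OP graph into per-node sub-graphs
# keyed by a hypothetical per-OP "placement" tag (e.g. "trainer" / "pserver").
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Op(NamedTuple):
    """One node of the user-defined OP graph (illustrative only)."""
    name: str        # operator name, e.g. "mul" or "sgd"
    placement: str   # which node should run it, e.g. "trainer" or "pserver"


def split_by_placement(graph: List[Op]) -> Dict[str, List[Op]]:
    """Cut a (topologically ordered) OP graph into per-node sub-graphs,
    preserving the original op order inside each sub-graph."""
    subgraphs: Dict[str, List[Op]] = defaultdict(list)
    for op in graph:
        subgraphs[op.placement].append(op)
    return dict(subgraphs)


if __name__ == "__main__":
    graph = [
        Op("mul", "trainer"),
        Op("softmax", "trainer"),
        Op("sgd", "pserver"),  # optimizer update placed on the parameter server
    ]
    for node, ops in split_by_placement(graph).items():
        print(node, [op.name for op in ops])

A full converter would also need to connect the resulting sub-graphs across nodes (for example by inserting communication OPs at the cut edges), which this sketch deliberately omits.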