From 49e885b6eaf969c07413889f12f12241fefbe4f6 Mon Sep 17 00:00:00 2001
From: Yancey1989
Date: Mon, 16 Apr 2018 19:43:36 +0800
Subject: [PATCH] update

---
 doc/fluid/design/dist_train/async_update.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/doc/fluid/design/dist_train/async_update.md b/doc/fluid/design/dist_train/async_update.md
index be8783a7e..344a44bde 100644
--- a/doc/fluid/design/dist_train/async_update.md
+++ b/doc/fluid/design/dist_train/async_update.md
@@ -5,9 +5,10 @@
 For the typical synchronous distributed training, some significant steps are as follows:
 
 1. A Trainer will compute the gradients and SEND them to the Parameter Server(PServer) nodes.
-1. After the PServer node received gradients came from all the Trainers,
-it would apply the gradient to the respective variables, and using an optimize algorithms(SGD,
-Momentment...) to update the parameters.
+1. After the PServer node has received the gradients from all the Trainers, it will aggregate the
+gradient variables for the same parameter into one gradient variable, and then apply the aggregated
+gradient to the respective parameter, finally using an optimization algorithm (SGD,
+Momentum...) to update the parameters.
 1. The Trainer would wait for the PServers finished the optimize stage, and GET the parameters from PServer,
 so all the Trainers would get the same parameters.
 
@@ -38,7 +39,7 @@ mini-batch.
 ### Trainer
 
 - For the multiple devices distributed training, we need to aggregate the gradient
-variables which placed on different devices firstly, and then schedule a `SendVars` Operator to
+variables which are placed on different devices first and then schedule a `SendVars` Operator to
 send the gradient variables to the multiple PServer instances.
 - Schedule `FetchVars` operator to fetch the latest parameter from PServer before running
 the forward ops.
--
GitLab
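
For reference, the synchronous flow described in the changed paragraph can be sketched as follows. This is a minimal, single-process illustration under assumed names (`PServer`, `aggregate`); it is not the Fluid API, only a toy model of "aggregate the gradient variables for the same parameter, apply the aggregated gradient, and update with SGD":

```python
# Toy sketch of the synchronous update flow (hypothetical names, not Fluid API).
from typing import Dict, List

import numpy as np


def aggregate(trainer_grads: List[Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Aggregate the gradient variables for the same parameter into one."""
    aggregated: Dict[str, np.ndarray] = {}
    for grads in trainer_grads:
        for name, grad in grads.items():
            aggregated[name] = aggregated.get(name, 0.0) + grad
    return aggregated


class PServer:
    """Holds parameters and applies aggregated gradients with plain SGD."""

    def __init__(self, params: Dict[str, np.ndarray], lr: float = 0.01):
        self.params = params
        self.lr = lr

    def apply(self, trainer_grads: List[Dict[str, np.ndarray]]) -> None:
        # 1. Aggregate the gradients SENT by all the Trainers.
        aggregated = aggregate(trainer_grads)
        # 2. Apply the aggregated gradient to the respective parameter (SGD).
        for name, grad in aggregated.items():
            self.params[name] -= self.lr * grad


if __name__ == "__main__":
    pserver = PServer({"w": np.ones(4)})
    # Gradients computed by two Trainers on their own mini-batches.
    grads_from_trainers = [{"w": np.full(4, 0.5)}, {"w": np.full(4, 0.25)}]
    pserver.apply(grads_from_trainers)
    # After the optimize stage, every Trainer GETs the same parameters.
    print(pserver.params["w"])  # [0.9925 0.9925 0.9925 0.9925]
```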