Commits · eeaf04da195e5a6b54b7a51555788271a96fd009 · PaddlePaddle / Paddle

17 Oct, 2019 · 1 commit

[cherry-pick]Fix communicator slow bug & fix communicator stop bug (#20366) (#20646) · eeaf04da

Committed by Chengmo on Oct 17, 2019

* Fix communicator slow bug & fix communicator stop bug (#20366)

* test=develop,Fix communicator slow bug

* test=develop, delete if() in stop_worker()

* test=develop

* fix UT, test=develop

* fix bug in fetch handler, test=develop

* fix bug in fetch handler, test=develop

* test=develop, fix fetch barrier bug

* test=develop, bug fix

* test=develop, bug fix

* test=develop, fix bug

* test=develop,test=release/1.6


08 Oct, 2019 · 1 commit

trainer from dataset fetch targets (#19760) (#20182) · 546a0d3c

Committed by tangwei12 on Oct 08, 2019

```
add executor.FetchHandler for train/infer from the dataset
```

27 Sep, 2019 · 1 commit

the integrated communicator (#19849) · 8f0b3c05

Committed by tangwei12 on Sep 27, 2019

* add a base class for the Communicator
* add AsyncCommunicator Impl for async distributed training


28 Aug, 2019 · 1 commit

Fix the correctness of async mode at distributed training (#18863) · 65c73684

Committed by tangwei12 on Aug 28, 2019

* fix correctness of the communicator

* fix a bug in send thread when sending var context is empty, test=develop

* add lookup_table_prefetch_op and prefetch optimize, test=develop

* remove remote prefetch GPU supported

* word2vec force with CPU, test=develop

* test dist remote lookup table force with CPU, test=develop


22 Jul, 2019 · 1 commit

split different comm method for mnist distributed training (#18715) · ebf9797e

Committed by guru4elephant on Jul 22, 2019

```
* split different comm method for mnist distributed training
```

12 Jun, 2019 · 1 commit

fix save/load in fleet (#17675) · 101f74cb

Committed by tangwei12 on Jun 12, 2019

```
* fix save/load in Fleet
* add UT framework of Fleet
```

29 Oct, 2018 · 1 commit

[1.1] [project] train imagenet using large batch size (#13766) · 26200f2e

Committed by Wu Yi on Oct 29, 2018

* fix nccl2 lars dist support

* put lars in momentum op

* add tests lars

* fix ci

* fix cpu kernel

* soft warning

* remove lars in test_recognize_digits.py

* move to another op

* add file

* update api.spec test=develop

* update test=develop

* fix api.spec test=develop

* wip

* wip, finish grad merge ops

* wip, finish graph build

* wip test running

* work on 1 gpu

* workable version

* update

* fix tests

* fuse broadcast op

* fix compile failed

* refine

* add batch merge test mnist

* fix CI test=develop

* fix build

* use independent bn params for batch merge test=develop

* update api.spec

* follow comments and for test

* wip

* refine tests test=develop

* follow comments test=develop

* remove startup bn modify test=develop

* follow comments test=develop

* fix merge test=develop
