- 17 Oct, 2019 2 commits
-
-
Committed by tangwei12
* fix fetch handler error with pslib * fix distributed lookup table op with 1 pserver
-
Committed by Chengmo
* Fix communicator slow bug & fix communicator stop bug (#20366) * test=develop,Fix communicator slow bug * test=develop, delete if() in stop_worker() * test=develop * fix UT, test=develop * fix bug in fetch handler, test=develop * fix bug in fetch handler, test=develop * test=develop, fix fetch barrier bug * test=develop, bug fix * test=develop, bug fix * test=develop, fix bug * test=develop,test=release/1.6
-
- 16 Oct, 2019 1 commit
-
-
Committed by 123malin
* bug fix: invalid learning rate decay in pserver async mode
-
- 11 Oct, 2019 2 commits
- 08 Oct, 2019 1 commit
-
-
Committed by tangwei12
Heartbeat for distributed async training.
-
- 02 Oct, 2019 1 commit
-
-
Committed by Chengmo
* refactor geo sgd & communicator
-
- 26 Sep, 2019 1 commit
-
-
Committed by 123malin
* fix DistributeTranspilerConfig document, test=develop
-
- 16 Sep, 2019 1 commit
-
-
Committed by tangwei12
fix wrong place with distributed_lookup_table
-
- 06 Sep, 2019 1 commit
-
-
Committed by 123malin
* fleet api add input check, test=develop
-
- 28 Aug, 2019 2 commits
-
-
Committed by Yi Liu
test=develop
-
Committed by tangwei12
* fix correctness of the communicator * fix a bug in send thread when sending var context is empty, test=develop * add lookup_table_prefetch_op and prefetch optimize, test=develop * remove remote prefetch GPU supported * word2vec force with CPU, test=develop * test dist remote lookup table force with CPU, test=develop
-
- 26 Aug, 2019 1 commit
-
-
Committed by tangwei12
* fix sync mode hang in transpiler * remove sync mode in send/recv * replace PADDLE_ENFORCE with PADDLE_ENFORCE_NE
-
- 12 Aug, 2019 1 commit
-
-
Committed by gongweibao
Polish fleet API to support cuda collective mode and nccl2 mode
-
- 11 Jul, 2019 1 commit
-
-
Committed by gongweibao
-
- 27 Jun, 2019 1 commit
-
-
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_variables test=develop * supports collective training in executor * make fetch_list runnable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plug in more strategies * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * macro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream; update atol and rtol to 1e-05; rm shard_index op and test; modify read input from file to read from memory; remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O
-
- 31 May, 2019 1 commit
-
-
Committed by tangwei12
* add example to get_startup_program() * fix example to get_startup_program()
-
- 30 May, 2019 1 commit
-
-
Committed by yaoxuefeng
-
- 29 May, 2019 1 commit
-
-
Committed by gongweibao
-
- 27 May, 2019 1 commit
-
-
Committed by gongweibao
-
- 23 May, 2019 2 commits
-
-
Committed by Qiao Longfei
* fix distribute doc
-
Committed by Qiao Longfei
Async exe support communicator
-
- 26 Apr, 2019 1 commit
-
-
Committed by tangwei12
-
- 25 Apr, 2019 1 commit
-
-
Committed by tangwei12
* implement distributed transpiler with fleet
-
- 27 Mar, 2019 1 commit
-
-
Committed by Qiao Longfei
-
- 25 Mar, 2019 1 commit
-
-
Committed by Qiao Longfei
-
- 23 Mar, 2019 1 commit
-
-
Committed by Qiao Longfei
-
- 20 Feb, 2019 1 commit
-
-
Committed by tangwei12
* fix params with only 1 dim * test=develop
-
- 08 Feb, 2019 2 commits
-
-
Committed by Qiao Longfei
-
Committed by Qiao Longfei
-
- 06 Feb, 2019 1 commit
-
-
Committed by Qiao Longfei
-
- 30 Jan, 2019 1 commit
-
-
Committed by tangwei12
* move var struct to vars_distributed.py, add optimizer's block name, test=develop * renaming optimizer's seems complex, revert it, test=develop * replace * with details, test=develop
-
- 24 Jan, 2019 1 commit
-
-
Committed by Wu Yi
-
- 23 Jan, 2019 1 commit
-
-
Committed by tangwei12
checkpoint for distributed training.
-
- 08 Jan, 2019 1 commit
-
-
Committed by Qiao Longfei
-
- 28 Dec, 2018 1 commit
-
-
Committed by Qiao Longfei
test=develop
-
- 27 Dec, 2018 1 commit
-
-
Committed by haowang101779990
test=develop
-
- 18 Dec, 2018 1 commit
-
-
Committed by JiabinYang
-
- 07 Dec, 2018 2 commits
-
-
Committed by gongweibao
-
Committed by tangwei12
-