Commits · f517fb60665ed017150d154e0e6817d5d2a3a71c · 机器未来 / Paddle

17 Feb, 2020 1 commit
- deprecated for distribute transpiler api (#22513) (#22617) · f517fb60
  Committed by tangwei12 on Feb 17, 2020
```
* add deprecated for distribute transpiler, will delete it after 2.0.0, test=develop
```
  f517fb60
20 Jan, 2020 1 commit
- integrated HALF_ASYNC to communicator (#21869) (#22343) · fa4e0e82
  Committed by tangwei12 on Jan 20, 2020
```
* add half_async in the communicator
* fix DistributedStrategy
```
  fa4e0e82
14 Jan, 2020 1 commit
- Bug fix for sparse recorder (#21969) (#22245) · 2e834eab
  Committed by 123malin on Jan 14, 2020
```
* test=develop, bug fix for sparse recorder
```
  2e834eab
07 Jan, 2020 2 commits
- Update pyramid related OP (#21372) · 418abc92
  Committed by Chengmo on Jan 07, 2020
```
* add special way to add distribute vars， Update Pyramid hash op
```
  418abc92
- Fix grad clip (#21784) · 5c339193
  Committed by Chengmo on Jan 07, 2020
```
* fix grad clip， clip op belongs to Backward op when running in Parameter Server mode.
```
  5c339193
06 Jan, 2020 1 commit
- add distributed_strategy (#21710) · 7fb817d4
  Committed by 123malin on Jan 06, 2020
```
* add distributed_strategy
```
  7fb817d4
12 Dec, 2019 1 commit

Committed by tangwei12 on Dec 12, 2019

* add fake init for the trainer, fix large memory hold in the trainer
* do not merge recv vars from a remote endpoint, test=develop
* add recv and save op, merge slice var in one op, save memory
* remove hsigmoid with pull sparse, test=develop

9ad940fd

06 Dec, 2019 1 commit
- Paddlebox Related to Framework (#21586) · c5aec2fe
  Committed by hutuxian on Dec 06, 2019
```
* Add a single_process_multi_thread transpiler.
* Add some UTs.
* Fix some API description.
```
  c5aec2fe
28 Nov, 2019 1 commit
- add Adam beta1/beta2 support Variable (#21234) · ebfb720a
  Committed by Kaipeng Deng on Nov 28, 2019
```
* add Adam beta1/beta2 support Variable. test=develop
```
  ebfb720a
01 Nov, 2019 1 commit
- Optimize decay (#20816) · 20cdff0e
  Committed by 123malin on Nov 01, 2019
```
* update pserver decay blocks

* update distributed notify handler
```
  20cdff0e
17 Oct, 2019 1 commit
- fix fetch handler error with pslib (#20679) · 1d925440
  Committed by tangwei12 on Oct 17, 2019
```
* fix fetch handler error with pslib
* fix distributed lookup table op with 1 pserver
```
  1d925440
15 Oct, 2019 2 commits

Fix communicator slow bug & fix communicator stop bug (#20366) · 940c6ff1

Committed by Chengmo on Oct 15, 2019

* test=develop,Fix communicator slow bug

* test=develop, delete if() in stop_worker()

* test=develop

* fix UT, test=develop

* fix bug in fetch handler, test=develop

* fix bug in fetch handler, test=develop

* test=develop, fix fetch barrier bug

* test=develop, bug fix

* test=develop, bug fix

* test=develop, fix bug

940c6ff1

bug fix: invalid learning rate decay in pserver async mode (#20325) · b4a3b750
Committed by 123malin on Oct 15, 2019
```
* bug fix: invalid learning rate decay in pserver async mode
```
b4a3b750

11 Oct, 2019 1 commit
- doc fix, test=develop, test=document_fix (#20239) · a010d883
  Committed by tangwei12 on Oct 11, 2019
```
* doc fix, test=develop, test=document_fix
```
  a010d883
09 Oct, 2019 1 commit
- Fix transpiler en doc (#20149) · 494d6cf2
  Committed by Chengmo on Oct 09, 2019
```
* test=develop,test=document_fix,fix transpiler doc,add API.spec
```
  494d6cf2
07 Oct, 2019 1 commit
- Trainer heartbeat for async mode (#19600) · b5a41046
  Committed by tangwei12 on Oct 07, 2019
```
Heartbeat for distributed async training.
```
  b5a41046
30 Sep, 2019 1 commit
- Add GEO-SGD distribute training algorithm (#20018) · 728ec1b4
  Committed by Chengmo on Sep 30, 2019
```
* refector geo sgd & communicator
```
  728ec1b4
26 Sep, 2019 1 commit
- fix APIs, test=document_preview (#19954) · 6c74e738
  Committed by 123malin on Sep 26, 2019
```
* fix DistributeTranspilerConfig document, test=develop
```
  6c74e738
16 Sep, 2019 1 commit
- fix sync_with_distributed_lookup_table, test=develop (#19737) · 6a1db204
  Committed by tangwei12 on Sep 16, 2019
```
fix wrong place with distributed_lookup_table
```
  6a1db204
06 Sep, 2019 1 commit
- Optimize fleet API: add input check for some interfaces (#18971) · a25a716e
  Committed by 123malin on Sep 06, 2019
```
* fleet api add input check, test=develop
```
  a25a716e
28 Aug, 2019 2 commits

adapte fleet api for localsgd and support nccl comm configuration in executor (#19443) · 4ef6b845
Committed by Yi Liu on Aug 28, 2019
```
test=develop
```
4ef6b845

Fix the correctness of async mode at distributed training (#18863) · 65c73684

Committed by tangwei12 on Aug 28, 2019

* fix correctness of the communicator

* fix a bug in send thread when sending var context is empty, test=develop

* add lookup_table_prefetch_op and prefetch optimize, test=develop

* remove remote prefetch GPU supported

* word2vec force with CPU, test=develop

* test dist remote lookup table force with CPU, test=develop

65c73684

26 Aug, 2019 1 commit
- fix distribute transpiler GRPC error code 4, RPC Deadline (#18984) · 19dac67e
  Committed by tangwei12 on Aug 26, 2019
```
* fix sync mode hang in transpiler
* remove sync mode in send/recv
* replace PADDLE_ENFORCE with PADDLE_ENFORCE_NE
```
  19dac67e
12 Aug, 2019 1 commit
- Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812
  Committed by gongweibao on Aug 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
  29d87812
11 Jul, 2019 1 commit
- Polish backwards optimizer dependency codes and use more default values. (#18255) · c0a82748
  Committed by gongweibao on Jul 11, 2019
  
  c0a82748
27 Jun, 2019 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* marcro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

31 May, 2019 1 commit
- fix document of python api get_startup_program() (#17764) · 659b72a9
  Committed by tangwei12 on May 31, 2019
```
* add example to get_startup_program()
* fix example to get_startup_program()
```
  659b72a9
30 May, 2019 1 commit
- fix distributed_transpiler.py api test=develop (#17668) · ac92e4c0
  Committed by yaoxuefeng on May 30, 2019
  
  ac92e4c0
29 May, 2019 1 commit
- fix 2dconn test=develop (#17681) · 0d561ef4
  Committed by gongweibao on May 29, 2019
  
  0d561ef4
27 May, 2019 1 commit
- Add multi-ncclcomm and 2D ncclallreduce support. (#17263) · 65bbf950
  Committed by gongweibao on May 27, 2019
  
  65bbf950
23 May, 2019 2 commits
- fix distribute doc test=develop (#17318) · 92e7d5d7
  Committed by Qiao Longfei on May 23, 2019
```
* fix distribute doc
```
  92e7d5d7
- Async exe support communicator (#17386) · 58f7695a
  Committed by Qiao Longfei on May 23, 2019
```
Async exe support communicator
```
  58f7695a
26 Apr, 2019 1 commit
- truncated_gaussian_random supported in distributed training, test=develop (#17091) · 7330cd63
  Committed by tangwei12 on Apr 26, 2019
  
  7330cd63
25 Apr, 2019 1 commit
- Fleet unify distributed training (#16791) · 1a4a51db
  Committed by tangwei12 on Apr 25, 2019
```
* implement distributed transpiler with fleet
```
  1a4a51db
27 Mar, 2019 1 commit
- fix pylint · d640c6cf
  Committed by Qiao Longfei on Mar 27, 2019
  
  d640c6cf
25 Mar, 2019 1 commit
- fix trainer_id · 542b52fa
  Committed by Qiao Longfei on Mar 25, 2019
  
  542b52fa
23 Mar, 2019 1 commit
- update transpiler and listen and serv op · de65398c
  Committed by Qiao Longfei on Mar 23, 2019
  
  de65398c
20 Feb, 2019 1 commit
- fix params with only 1 dim (#15828) · 971f3bc9
  Committed by tangwei12 on Feb 20, 2019
```
* fix params with only 1 dim
* test=develop
```
  971f3bc9
08 Feb, 2019 2 commits
- parameter recv can run · 8bda4ab2
  Committed by Qiao Longfei on Feb 08, 2019
  
  8bda4ab2
- complete recv op · fbd186bd
  Committed by Qiao Longfei on Feb 08, 2019
  
  fbd186bd

机器未来 / Paddle is up to date with its fork source project
