提交 · 2378aa8af4adefb573ddb031946023ed51fe5f8b · 机器未来 / Paddle

21 10月, 2019 1 次提交
- W
  
  [Cherry-pick 1.6] Fix DGC test and DGC nan bug (#20708) · 2378aa8a
  由 WangXi 提交于 10月 21, 2019
  
  2378aa8a
12 10月, 2019 1 次提交
- M
  Fleet: special case: strategy is None (#20427) · c0b04f0b
  由 mapingshuo 提交于 10月 12, 2019
```
* special case: strategy is None
```
  c0b04f0b
23 9月, 2019 1 次提交

Forward recompute3 (#19913) · 9901f696

由 mapingshuo 提交于 9月 23, 2019

* add recompute based checkpoints methods for large batch training
test=develop

* add append_backward_with_forward_recomputation
test=develop

* refine optimizer
test=develop

* update backward and optimizer
test=develop

* make Variable usable
test=develop

* add recompute code

* refine optimizer
test=develop

* refine addup _append_backward_ops_with_checkpoints_
1) for recompute part, just cache the grad_op_desc without appending to block
2) before appending grad_op_desc to backward part, addup_repetitive_vars, remove unused branch
test=develop

* make method private

* add recompute strategy into DistributedStrategy
test=develop

* checkpoint version3
test=develop

* remove some print information
test=develop

* remove unused sumop
test=develop

* try to fix recompute with graph building modules

* add input names to vars should be held

* add memory debug tool

* backup backward

* Fix bugs

* add backward desc for op not in any segments

* add exception info for sub_block

test=develop

* modify code style

test=develop

* modify code style

test=develop

* remove print functions

test=develop

* add API spec

test=develop
test=document_preview

* make Recompute a child class of Optimizer

test=develop
test=document_preview

* add API spec

test=develop
test=document_preview

* modify API spec

test=develop
test=document_preview

* add document for Recompute

test=develop
test=document_preview

* change API doc of Rcompute

test=develop
test=document_preview

* code cleaning

test=develop
test=document_preview

* modify API spec

* fix bugs when segments hold no element

* add testcase for Recompute Optimizer

test=develop
test=document_preview

* add test for apply_gradient, and code cleaning

test=develop
test=document_preview

* add test case for load function

* enable CI

test=develop
test=document

* add test case

test=develop
test=document_preview

* add sample code for 4 function of recompute optimizer

test=develop
test=document_preview

9901f696

19 9月, 2019 1 次提交
- G
  change _origin_program test=develop (#19863) · e8d3745c
  由 gongweibao 提交于 9月 19, 2019
```
change _origin_program test=develop
```
  e8d3745c
10 9月, 2019 1 次提交
- G
  Fix float16 optimizer. (#19682) · 6c2bc29c
  由 gongweibao 提交于 9月 10, 2019
```
Fix float16 optimizer
```
  6c2bc29c
28 8月, 2019 1 次提交
- Y
  adapte fleet api for localsgd and support nccl comm configuration in executor (#19443) · 4ef6b845
  由 Yi Liu 提交于 8月 28, 2019
```
test=develop
```
  4ef6b845
16 8月, 2019 1 次提交
- G
  Remove node_num function. (#19167) · 86f05911
  由 gongweibao 提交于 8月 16, 2019
```
node_num is not needed for users, so remove them and fix the bugs about it!
```
  86f05911
12 8月, 2019 1 次提交
- G
  Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812
  由 gongweibao 提交于 8月 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
  29d87812
10 7月, 2019 1 次提交
- G
  upgrade collective fleet api (#18533) · 9c17a899
  由 guru4elephant 提交于 7月 10, 2019
```
* upgrade collective fleet api
```
  9c17a899
27 6月, 2019 1 次提交

supports collective communicated training (#18175) · b7128bac

由 HaoRen 提交于 6月 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* marcro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

12 6月, 2019 1 次提交
- T
  fix save/load in fleet (#17675) · 101f74cb
  由 tangwei12 提交于 6月 12, 2019
```
* fix save/load in Fleet
* add UT framework of Fleet
```
  101f74cb
23 5月, 2019 1 次提交
- Q
  Async exe support communicator (#17386) · 58f7695a
  由 Qiao Longfei 提交于 5月 23, 2019
```
Async exe support communicator
```
  58f7695a
09 5月, 2019 1 次提交

Reformat fleet API (#17135) · 565d3095

由 tangwei12 提交于 5月 09, 2019

* fix some logic in distributed transpiler, test=develop
* reformat fleet API, test=develop

565d3095

25 4月, 2019 1 次提交
- T
  Fleet unify distributed training (#16791) · 1a4a51db
  由 tangwei12 提交于 4月 25, 2019
```
* implement distributed transpiler with fleet
```
  1a4a51db

机器未来 / Paddle 与 Fork 源项目一致

机器未来 / Paddle
与 Fork 源项目一致