Commits · a2a97cbbac10a050e6ad13999926867e1a4aaafe · PaddlePaddle / Paddle

16 Nov 2022 · 1 commit

[remove fluid] under fleet meta_optimizers (#47864) · a2a97cbb

Committed by wangzhen38 on Nov 16, 2022

* [remove fluid] under fleet meta_optimizers

a2a97cbb

08 Nov 2022 · 1 commit

[CodeStyle][py2][U004] unnecessary explicit `object` inheritance in class definition (#47642) · 888272b5

Committed by Nyakku Shigure on Nov 08, 2022

```
* [CodeStyle][py2][U004] unnecessary explicit `object` inheritance in class definition

* fix an increment
```

888272b5

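The U004 cleanup drops the redundant `object` base class, which Python 3 adds implicitly to every class. A minimal sketch with hypothetical class names (not taken from the Paddle sources); the rule label appears to correspond to Ruff's UP004 (`useless-object-inheritance`), but that mapping is an assumption:

```python
# Python 2 compatible spelling: explicit `object` base class.
class LegacyRoleMaker(object):
    def __init__(self, rank=0):
        self.rank = rank


# Python 3 spelling after the U004 cleanup: the `object` base is implicit.
class RoleMaker:
    def __init__(self, rank=0):
        self.rank = rank


# In Python 3 both classes end up with exactly the same bases.
print(LegacyRoleMaker.__bases__)  # (<class 'object'>,)
print(RoleMaker.__bases__)        # (<class 'object'>,)
```

Because the two spellings are equivalent on Python 3, the change is purely cosmetic once Python 2 support is gone.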
03 Nov 2022 · 1 commit

[CodeStyle][py2][U008] remove unnecessary args in `super()` (#47549) · 3de3e45e

Committed by Nyakku Shigure on Nov 03, 2022

* [CodeStyle][py2][U008] remove unnecessary args in `super()`

* remove remaining args

* revert changes in test_pylayer_op

* Revert "revert changes in test_pylayer_op"

This reverts commit ff185a9ae738afac3b0264f61bde6c6b7f72e7c4.

* revert some changes in example code

3de3e45e
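The U008 change replaces the Python 2 form `super(Cls, self)` with the zero-argument `super()` that Python 3 supports inside methods. A minimal sketch, assuming hypothetical `Optimizer`/`MetaOptimizer` classes rather than Paddle's real ones; the label appears to match Ruff's UP008 (`super-call-with-parameters`), which is an assumption:

```python
class Optimizer:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate


class LegacyMetaOptimizer(Optimizer):
    def __init__(self, learning_rate):
        # Python 2 compatible form: class and instance passed explicitly.
        super(LegacyMetaOptimizer, self).__init__(learning_rate)


class MetaOptimizer(Optimizer):
    def __init__(self, learning_rate):
        # Python 3 form after the U008 cleanup: zero-argument super().
        super().__init__(learning_rate)
```

Besides being shorter, the zero-argument form does not repeat the class name, so renaming a class cannot leave a stale reference inside `super(...)`.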

01 Nov 2022 · 1 commit

[CodeStyle][E711] use `is`/`is not` for comparison with `None` (#47452) · a35a4a53

Committed by Nyakku Shigure on Nov 01, 2022

* [CodeStyle][E711] use `is`/`is not` for comparison with `None`

* `self.assertTrue($A is None)` -> `self.assertIsNone($A)`

* `self.assertTrue($A is not None)` -> `self.assertIsNotNone($A)`

* `self.assertFalse($A is None)` -> `self.assertIsNotNone($A)`

* `self.assertEqual($A, None)` -> `self.assertIsNone($A)`

* `self.assertNotEqual($A, None)` -> `self.assertIsNotNone($A)`

a35a4a53
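The E711 rewrites above replace equality checks against `None` with identity checks, and the matching `unittest` assertions with `assertIsNone`/`assertIsNotNone`. A minimal sketch; `find_strategy` and `AlwaysEqual` are hypothetical helpers used only to show why `==` is fragile (it dispatches to `__eq__`, which a type can override), while `is` always compares against the single `None` object:

```python
import unittest


def find_strategy(registry, name):
    """Hypothetical helper: return the registered strategy, or None."""
    return registry.get(name)


class AlwaysEqual:
    """An object whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


class TestNoneComparison(unittest.TestCase):
    def test_missing_strategy(self):
        # Before (E711): self.assertTrue(find_strategy({}, "dgc") == None)
        self.assertIsNone(find_strategy({}, "dgc"))

    def test_present_strategy(self):
        # Before: self.assertNotEqual(find_strategy(..., "dgc"), None)
        self.assertIsNotNone(find_strategy({"dgc": object()}, "dgc"))

    def test_eq_can_lie(self):
        # `==` dispatches to __eq__, so a custom type can "equal" None...
        self.assertTrue(AlwaysEqual() == None)  # noqa: E711
        # ...while an identity check is not fooled.
        self.assertIsNotNone(AlwaysEqual())
```

Run it with `python -m unittest` from the file's directory; the dedicated assertions also produce clearer failure messages than `assertTrue(x is None)`.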

23 Oct 2022 · 1 commit

[CodeStyle][black] use black instead of yapf (#46014) · 7097630f

Committed by Nyakku Shigure on Oct 23, 2022

```
* update config

* re-blacken python code

* temporarily disable date and diff_py_file

* skip a format
```

7097630f

19 Oct 2022 · 1 commit

[CodeStyle][py2] remove `six` package (part 1) (#46965) · e6fb551c

Committed by Nyakku Shigure on Oct 19, 2022

e6fb551c

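Dropping `six` means replacing its Python 2/3 compatibility helpers with Python 3 builtins. Which helpers part 1 actually touched is not stated here; the mappings below (`six.string_types` to `str`, `six.iteritems(d)` to `d.items()`, `six.moves.range` to `range`) are standard equivalents, shown in a hypothetical `describe_config` helper:

```python
# Before (Python 2/3 compatible via six):
#     import six
#     if isinstance(name, six.string_types): ...
#     for k, v in six.iteritems(config): ...
#     for i in six.moves.range(n): ...


def describe_config(name, config, n):
    """Hypothetical helper written with Python 3 builtins only."""
    # six.string_types -> str
    if not isinstance(name, str):
        raise TypeError("name must be a str")
    # six.iteritems(config) -> config.items()
    pairs = [f"{k}={v}" for k, v in sorted(config.items())]
    # six.moves.range -> range (already lazy in Python 3)
    ranks = list(range(n))
    return name, pairs, ranks
```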
05 Jun 2022 · 1 commit

[code format check upgrade] step2: yapf (#42944) · a072fca8

Committed by Sing_chan on Jun 05, 2022

* use yapf to format all python files

* exclude two unittest files from yapf, because they rely on writing and reading files and formatting would break them

* disable diff_py_file because too many diff files cause the following command to fail

a072fca8

19 Oct 2020 · 1 commit

fleet support paddle.optimizer (#28026) · 55098b97

Committed by MRXLT on Oct 19, 2020

```
fleet support paddle.optimizer

* bug fix

* fix fleet_base

* bug fix

* fix coverage
```

55098b97

10 Aug 2020 · 1 commit

Fix test_hdfs bug. (#26068) · a7c52100

Committed by gongweibao on Aug 10, 2020

```
* fix merge3 test=develop
```

a7c52100

08 Aug 2020 · 1 commit

Save checkpoint automatically (#25917) · 0067a2e4

Committed by gongweibao on Aug 08, 2020

0067a2e4

07 Jul 2020 · 1 commit

Fix typo in interface. (#24779) · 80f1c507

Committed by gongweibao on Jul 07, 2020

80f1c507

15 Apr 2020 · 1 commit

fix AMP and recompute (#23551) · f0e743f1

Committed by mapingshuo on Apr 15, 2020

```
* allow amp and recompute to work together
```

f0e743f1

03 Apr 2020 · 1 commit

Add fleet checkpoint on local fs and remote fs (such as hdfs) for EDL (#22586) · 24a063f6

Committed by gongweibao on Apr 03, 2020

24a063f6

23 Feb 2020 · 1 commit

fix typo words (#22653) · d2ba91aa

Committed by tianshuo78520a on Feb 23, 2020

d2ba91aa

31 Dec 2019 · 1 commit

fix sync_batch_norm hang in fleet (#21838) · 3ec289a6

Committed by WangXi on Dec 31, 2019

3ec289a6

05 Dec 2019 · 1 commit

bugfix: construct a DistributedStrategy instance if the passed one is None (#21545) · da75ac8b

Committed by lilong12 on Dec 05, 2019

da75ac8b

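This fix, like #20359 further down ("strategy is None"), guards against a caller passing no strategy. A common way to do that is to build a fresh default inside the constructor instead of writing `strategy=DistributedStrategy()` in the signature, where Python would evaluate it once and share the same object across calls. Whether the original bug involved a shared default is not stated here; the classes below are hypothetical stand-ins, not Paddle's implementation:

```python
class DistributedStrategy:
    """Hypothetical stand-in for a fleet strategy object."""

    def __init__(self):
        self.nccl_comm_num = 1


class CollectiveOptimizer:
    """Hypothetical wrapper illustrating the None-strategy guard."""

    def __init__(self, optimizer, strategy=None):
        # Construct a new default per call rather than sharing one instance
        # created at function-definition time, and rather than failing later
        # when attributes of a None strategy are accessed.
        if strategy is None:
            strategy = DistributedStrategy()
        self._optimizer = optimizer
        self._strategy = strategy
```

Each wrapper created without a strategy gets its own `DistributedStrategy`, so mutating one (for example, changing `nccl_comm_num`) does not leak into the others.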
12 Nov 2019 · 1 commit

modify the implementation of save_persistables and save_inference_model for... · 53148e06

Committed by lilong12 on Nov 12, 2019

modify the implementation of save_persistables and save_inference_model for fleet collective mode (#20802)

* modify the implementation of save_persistables and save_inference_model functions for fleet collective, test=develop

* add ut, test=develop

53148e06

15 Oct 2019 · 2 commits

fix dgc test and bug when not set trainers_endpoints_, test=develop (#20617) · cadc6a97

Committed by WangXi on Oct 14, 2019

cadc6a97

Fleet: deal with special case: strategy is None (#20359) · f55d1c68

Committed by mapingshuo on Oct 15, 2019

```
* special case: strategy is None
```

f55d1c68

23 Sep 2019 · 1 commit

Forward recompute3 (#19913) · 9901f696

Committed by mapingshuo on Sep 23, 2019

* add recompute based checkpoints methods for large batch training
test=develop

* add append_backward_with_forward_recomputation
test=develop

* refine optimizer
test=develop

* update backward and optimizer
test=develop

* make Variable usable
test=develop

* add recompute code

* refine optimizer
test=develop

* refine addup _append_backward_ops_with_checkpoints_
1) for recompute part, just cache the grad_op_desc without appending to block
2) before appending grad_op_desc to backward part, addup_repetitive_vars, remove unused branch
test=develop

* make method private

* add recompute strategy into DistributedStrategy
test=develop

* checkpoint version3
test=develop

* remove some print information
test=develop

* remove unused sumop
test=develop

* try to fix recompute with graph building modules

* add input names to vars should be held

* add memory debug tool

* backup backward

* Fix bugs

* add backward desc for op not in any segments

* add exception info for sub_block

test=develop

* modify code style

test=develop

* modify code style

test=develop

* remove print functions

test=develop

* add API spec

test=develop
test=document_preview

* make Recompute a child class of Optimizer

test=develop
test=document_preview

* add API spec

test=develop
test=document_preview

* modify API spec

test=develop
test=document_preview

* add document for Recompute

test=develop
test=document_preview

* change API doc of Recompute

test=develop
test=document_preview

* code cleaning

test=develop
test=document_preview

* modify API spec

* fix bugs when segments hold no element

* add testcase for Recompute Optimizer

test=develop
test=document_preview

* add test for apply_gradient, and code cleaning

test=develop
test=document_preview

* add test case for load function

* enable CI

test=develop
test=document

* add test case

test=develop
test=document_preview

* add sample code for 4 functions of recompute optimizer

test=develop
test=document_preview

9901f696
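The recompute commit trades computation for memory: the forward pass keeps only activations at chosen checkpoints, and the backward pass recomputes the activations inside each segment from its checkpoint before applying the chain rule. The toy sketch below shows that idea on a chain of scalar functions; it is not Paddle's `Recompute` optimizer, which works on program op descriptors, and `grad_with_recompute`, `forward_chain`, and `CHAIN` are hypothetical names:

```python
import math


def forward_chain(x, funcs):
    """Run f_n(...f_1(x)) and return the final output."""
    for f, _ in funcs:
        x = f(x)
    return x


def grad_with_recompute(x, funcs, checkpoints):
    """d(output)/dx for a chain of scalar functions, using recomputation.

    funcs: list of (f, df) pairs, where df(a) is the derivative f'(a).
    checkpoints: indices i whose input (the input of funcs[i]) is stored.
    """
    starts = sorted(set([0] + list(checkpoints)))
    saved = {}
    a = x
    for i, (f, _) in enumerate(funcs):
        if i in starts:
            saved[i] = a  # forward pass keeps only segment inputs
        a = f(a)

    grad = 1.0
    bounds = starts + [len(funcs)]
    # Walk the segments [s, e) from last to first.
    for s, e in reversed(list(zip(bounds[:-1], bounds[1:]))):
        # Recompute the inputs of funcs[s], ..., funcs[e - 1] from the checkpoint.
        acts = [saved[s]]
        for f, _ in funcs[s:e - 1]:
            acts.append(f(acts[-1]))
        # Chain rule backward through this segment.
        for (_, df), a_in in zip(reversed(funcs[s:e]), reversed(acts)):
            grad *= df(a_in)
    return grad


# Usage: sin -> exp -> square -> tanh, storing only the inputs of steps 0 and 2.
CHAIN = [
    (math.sin, math.cos),
    (math.exp, math.exp),
    (lambda a: a * a, lambda a: 2.0 * a),
    (math.tanh, lambda a: 1.0 - math.tanh(a) ** 2),
]
print(grad_with_recompute(0.3, CHAIN, [2]))
```

Storing every input (`checkpoints=range(len(CHAIN))`) gives the same gradient with no recomputation; fewer checkpoints lower peak memory at the cost of extra forward work per segment.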

19 Sep 2019 · 1 commit

change _origin_program test=develop (#19863) · e8d3745c

Committed by gongweibao on Sep 19, 2019

```
change _origin_program test=develop
```

e8d3745c

10 Sep 2019 · 1 commit

Fix float16 optimizer. (#19682) · 6c2bc29c

Committed by gongweibao on Sep 10, 2019

```
Fix float16 optimizer
```

6c2bc29c

28 Aug 2019 · 1 commit

adapt fleet api for localsgd and support nccl comm configuration in executor (#19443) · 4ef6b845

Committed by Yi Liu on Aug 28, 2019

```
test=develop
```

4ef6b845

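LocalSGD, which this commit adapts the fleet API for, lets each worker take several local optimizer steps before workers synchronize by averaging parameters, instead of averaging gradients every step. The simulation below is a hypothetical sketch on scalar quadratic losses, with the average standing in for a parameter allreduce; it is not Paddle's implementation:

```python
def local_sgd(targets, init=0.0, lr=0.1, local_steps=4, rounds=50):
    """Simulate LocalSGD on per-worker losses 0.5 * (p - t_w) ** 2.

    Each simulated worker w runs `local_steps` plain SGD steps on its own
    loss, then all workers replace their parameter with the average
    (standing in for an allreduce-sum divided by the worker count).
    """
    n = len(targets)
    params = [init] * n
    for _ in range(rounds):
        for w, t in enumerate(targets):
            p = params[w]
            for _ in range(local_steps):
                p -= lr * (p - t)  # gradient of 0.5 * (p - t) ** 2
            params[w] = p
        avg = sum(params) / n  # synchronization point
        params = [avg] * n
    return params[0]


print(local_sgd([1.0, 2.0, 6.0]))  # close to the mean target, 3.0
```

With identical curvature on every worker, the averaged parameter converges to the minimizer of the summed loss; with heterogeneous losses LocalSGD generally only approximates it, which is the accuracy/communication trade-off the method makes.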
16 Aug 2019 · 1 commit

Remove node_num function. (#19167) · 86f05911

Committed by gongweibao on Aug 16, 2019

```
node_num is not needed by users, so remove it and fix the related bugs!
```

86f05911

12 Aug 2019 · 1 commit

Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812

Committed by gongweibao on Aug 12, 2019

```
Polish fleet API to support cuda collective mode and nccl2 mode
```

29d87812

10 Jul 2019 · 1 commit

upgrade collective fleet api (#18533) · 9c17a899

Committed by guru4elephant on Jul 10, 2019

```
* upgrade collective fleet api
```

9c17a899

27 Jun 2019 · 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac
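The bullet "scale loss grad in collective sgd transpiler" reflects a standard data-parallel step: if each rank's gradient is scaled by 1/nranks before a sum-allreduce, every rank ends up holding the mean gradient and applies the same update. Because backpropagation is linear in the seed gradient, scaling the loss gradient by 1/nranks scales every parameter gradient by the same factor, so this sketch applies the scale to the gradients directly. The in-process `allreduce_sum` is a hypothetical simulation, not Paddle's collective ops or transpiler code:

```python
def allreduce_sum(per_rank_grads):
    """Simulated allreduce: every rank receives the elementwise sum."""
    total = [sum(vals) for vals in zip(*per_rank_grads)]
    return [list(total) for _ in per_rank_grads]


def collective_sgd_step(params, per_rank_grads, lr):
    """One synchronous SGD step; returns the updated params on each rank."""
    nranks = len(per_rank_grads)
    # Scale each rank's gradient by 1/nranks before the sum-allreduce,
    # so the reduced value is the mean gradient rather than the sum.
    scaled = [[g / nranks for g in grads] for grads in per_rank_grads]
    reduced = allreduce_sum(scaled)
    # Every rank holds identical gradients, so all ranks stay in sync.
    return [[p - lr * g for p, g in zip(params, r)] for r in reduced]


print(collective_sgd_step([1.0, 1.0], [[2.0, 4.0], [4.0, 8.0]], lr=0.5))
# [[-0.5, -2.0], [-0.5, -2.0]]
```

Without the 1/nranks scale the effective learning rate would grow with the number of ranks, since the allreduce would return the sum of gradients.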

12 Jun 2019 · 1 commit

fix save/load in fleet (#17675) · 101f74cb

Committed by tangwei12 on Jun 12, 2019

```
* fix save/load in Fleet
* add UT framework of Fleet
```

101f74cb

23 May 2019 · 1 commit

Async exe support communicator (#17386) · 58f7695a

Committed by Qiao Longfei on May 23, 2019

```
Async exe support communicator
```

58f7695a

09 May 2019 · 1 commit

Reformat fleet API (#17135) · 565d3095

Committed by tangwei12 on May 09, 2019

* fix some logic in distributed transpiler, test=develop
* reformat fleet API, test=develop

565d3095

25 Apr 2019 · 1 commit

Fleet unify distributed training (#16791) · 1a4a51db

Committed by tangwei12 on Apr 25, 2019

```
* implement distributed transpiler with fleet
```

1a4a51db

PaddlePaddle / Paddle · last synced successfully about 1 year ago
