- 22 Mar, 2023 1 commit
-
-
Committed by Ainavo
* replace assert false with AssertionError
* remove redundant parts of the configuration file
-
- 08 Mar, 2023 1 commit
-
-
Committed by kangguangli
* remove with_data_parallel in collective_optimizer
* add comm op
* fix collective optimizer
* remove check_err_log=True
-
- 01 Mar, 2023 1 commit
-
-
Committed by wangxiaoning
* remove transpiler
* Revert "remove transpiler" This reverts commit 46044ccd52011d45d7026786d331f264a6a8f645.
* Revert "Revert "remove transpiler"" This reverts commit 80ad0945401b5b5efebac4baee0ec50a793d4405.
* codestyle
* fix setup
* fix
* fix
-
- 22 Feb, 2023 1 commit
-
-
Committed by meteor135
-
- 12 Jan, 2023 1 commit
-
-
Committed by zhangkaihuo
-
- 16 Nov, 2022 1 commit
-
-
Committed by wangzhen38
* [remove fluid] under fleet meta_optimizers
-
- 08 Nov, 2022 1 commit
-
-
Committed by Nyakku Shigure
* [CodeStyle][py2][U004] unnecessary explicit `object` inheritance in class definition
* fix an increment
-
- 03 Nov, 2022 1 commit
-
-
Committed by Nyakku Shigure
* [CodeStyle][py2][U008] remove unnecessary args in `super()`
* remove remained args
* revert changes in test_pylayer_op
* Revert "revert changes in test_pylayer_op" This reverts commit ff185a9ae738afac3b0264f61bde6c6b7f72e7c4.
* revert some changes in example code
-
- 01 Nov, 2022 1 commit
-
-
Committed by Nyakku Shigure
* [CodeStyle][E711] use `is`/`is not` for comparison with `None`
* `self.assertTrue($A is None)` -> `self.assertIsNone($A)`
* `self.assertTrue($A is not None)` -> `self.assertIsNotNone($A)`
* `self.assertFalse($A is None)` -> `self.assertIsNotNone($A)`
* `self.assertEqual($A, None)` -> `self.assertIsNone($A)`
* `self.assertNotEqual($A, None)` -> `self.assertIsNotNone($A)`
-
- 23 Oct, 2022 1 commit
-
-
Committed by Nyakku Shigure
* update config
* re-blacken python code
* temporarily disable date and diff_py_file
* skip a format
-
- 19 Oct, 2022 1 commit
-
-
Committed by Nyakku Shigure
-
- 05 Jun, 2022 1 commit
-
-
Committed by Sing_chan
* use yapf to format all python file
* yapf exclude two unittests file for they rely on writing and reading file, and format will break them
* disable diff_py_file because too many diff files cause command following failed
-
- 19 Oct, 2020 1 commit
-
-
Committed by MRXLT
fleet support paddle.optimizer
* bug fix
* fix fleet_base
* bug fix
* fix coverage
-
- 10 Aug, 2020 1 commit
-
-
Committed by gongweibao
* fix merge3 test=develop
-
- 08 Aug, 2020 1 commit
-
-
Committed by gongweibao
-
- 07 Jul, 2020 1 commit
-
-
Committed by gongweibao
-
- 15 Apr, 2020 1 commit
-
-
Committed by mapingshuo
* allow amp and recompute working together
-
- 03 Apr, 2020 1 commit
-
-
Committed by gongweibao
-
- 23 Feb, 2020 1 commit
-
-
Committed by tianshuo78520a
-
- 31 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 05 Dec, 2019 1 commit
-
-
Committed by lilong12
-
- 12 Nov, 2019 1 commit
-
-
Committed by lilong12
modify the implementation of save_persistables and save_inference_model for fleet collective mode (#20802)
* modify the implementation of save_persistables and save_inference_model functions for fleet collective, test=develop
* add ut, test=develop
-
- 15 Oct, 2019 2 commits
-
-
Committed by WangXi
-
Committed by mapingshuo
* special case: strategy is None
-
- 23 Sep, 2019 1 commit
-
-
Committed by mapingshuo
* add recompute based checkpoints methods for large batch training test=develop
* add append_backward_with_forward_recomputation test=develop
* refine optimizer test=develop
* update backward and optimizer test=develop
* make Variable usable test=develop
* add recompute code
* refine optimizer test=develop
* refine addup _append_backward_ops_with_checkpoints_ 1) for recompute part, just cache the grad_op_desc without appending to block 2) before appending grad_op_desc to backward part, addup_repetitive_vars, remove unused branch test=develop
* make method private
* add recompute strategy into DistributedStrategy test=develop
* checkpoint version3 test=develop
* remove some print information test=develop
* remove unused sumop test=develop
* try to fix recompute with graph building modules
* add input names to vars should be held
* add memory debug tool
* backup backward
* Fix bugs
* add backward desc for op not in any segments
* add exception info for sub_block test=develop
* modify code style test=develop
* modify code style test=develop
* remove print functions test=develop
* add API spec test=develop test=document_preview
* make Recompute a child class of Optimizer test=develop test=document_preview
* add API spec test=develop test=document_preview
* modify API spec test=develop test=document_preview
* add document for Recompute test=develop test=document_preview
* change API doc of Recompute test=develop test=document_preview
* code cleaning test=develop test=document_preview
* modify API spec
* fix bugs when segments hold no element
* add testcase for Recompute Optimizer test=develop test=document_preview
* add test for apply_gradient, and code cleaning test=develop test=document_preview
* add test case for load function
* enable CI test=develop test=document
* add test case test=develop test=document_preview
* add sample code for 4 function of recompute optimizer test=develop test=document_preview
-
- 19 Sep, 2019 1 commit
-
-
Committed by gongweibao
change _origin_program test=develop
-
- 10 Sep, 2019 1 commit
-
-
Committed by gongweibao
Fix float16 optimizer
-
- 28 Aug, 2019 1 commit
-
-
Committed by Yi Liu
test=develop
-
- 16 Aug, 2019 1 commit
-
-
Committed by gongweibao
node_num is not needed for users, so remove them and fix the bugs about it!
-
- 12 Aug, 2019 1 commit
-
-
Committed by gongweibao
Polish fleet API to support cuda collective mode and nccl2 mode
-
- 10 Jul, 2019 1 commit
-
-
Committed by guru4elephant
* upgrade collective fleet api
-
- 27 Jun, 2019 1 commit
-
-
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_variables test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache test=develop
* fix comment test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* fix prepare context redundant code problem, optimize executor by caching create_variables test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* fix comment test=develop
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* test=develop add collective op unittest standard
* test=develop remove the test_collective directory
* test=develop remove the test_collective directory
* remove slicegather test
* code format for reducescatter
* update attr of shard_index_op
* Modify macro nccl_helper
* remove test without distribute
* macro collective_helper
* macro update
* test=develop update support python3.5
* test=develop change gpu memory use to 0.1 when test
* test=develop update ut equal func
* test=develop set flags to 1.5
* test=develop fix pickle dumple py35
* test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream
* test=develop update unittest sync operator I/O
-
- 12 Jun, 2019 1 commit
-
-
Committed by tangwei12
* fix save/load in Fleet
* add UT framework of Fleet
-
- 23 May, 2019 1 commit
-
-
Committed by Qiao Longfei
Async exe support communicator
-
- 09 May, 2019 1 commit
-
-
Committed by tangwei12
* fix some logic in distributed transpiler, test=develop
* reformat fleet API, test=develop
-
- 25 Apr, 2019 1 commit
-
-
Committed by tangwei12
* implement distributed transpiler with fleet
-