- 13 Jul, 2022 4 commits
-
-
Committed by caozhou
* add comm init control by socket * avoid single card instance failure
-
Committed by ShenLiang
-
Committed by caozhou
* generate default cluster * add unittest
-
Committed by Jiabin Yang
* fix sharding in eager * support eager sharding
-
- 12 Jul, 2022 1 commit
-
-
Committed by caozhou
* update base cost * update unittest of cost model * add unittest
-
- 11 Jul, 2022 2 commits
-
-
Committed by Haohongxiang
* fix conflict * new pg apis * add docs of new apis * update * fix coverage * update * fix bug * fix reduce scatter * fix api * update Co-authored-by: ForFishes <2282912238@qq.com>
-
Committed by zhaoyingli
* add 'to_static' in engine api * fix cmakelist
-
- 07 Jul, 2022 1 commit
-
-
Committed by zhaoyingli
* fix op_role * fix engine * update op_role
-
- 04 Jul, 2022 1 commit
-
-
Committed by yaozhixin
-
- 30 Jun, 2022 1 commit
-
-
Committed by kuizhiqing
-
- 29 Jun, 2022 1 commit
-
-
Committed by JZ-LIANG
* fixed bug for pass & engine * fixed bug for benchmark GPT-3
-
- 27 Jun, 2022 1 commit
-
-
Committed by wanghuancoder
* rename eagerpylayer
-
- 24 Jun, 2022 2 commits
-
-
Committed by gongweibao
* tmp fix * init * compile ok * compile ok * add vlogs * add test * fix termination error * add testfile * add * fix window compile * fix window compile * fix windows compile * fix windows compile * fix windows compile * fix windows compile * fix windows compile * fix windows compile * fix kunlun compile * fix compilation * fix compilation * fix compilation * tmp fix * add windows * add windows * add more logs * change timeout to protected * SB * add * add * fix timeout * add * fix test * fix test * fix test * fix ut * fix ut * fix ut
-
Committed by Yulong Ao
* [Auto Parallel] Use a fast completion for data parallelism * remove unuse cuSparse function * [Auto Parallel] Fix some bugs of the fast dp completion * [Auto Parallel] Add the cmake statements * [Auto Parallel] Make the unittest adapt to the new interface * [Auto Parallel] Modify the timeout of the unittest * [Auto Parallel] Remove unnecessary comments Co-authored-by: zhouwei25 <zhouwei25@baidu.com>
-
- 20 Jun, 2022 2 commits
-
-
Committed by wangguanqun
* gpups default config and dataset * codestyle * add unittest * code style * add dymf to gpups * codestyle * add static.nn.cvm import * PSERVER_DEBUG * add fs config to worker desc * update unittest * unittest * remove gpups unittest * remove gpups unittest * static check
-
Committed by kuizhiqing
-
- 17 Jun, 2022 1 commit
-
-
Committed by ziyoujiyi
* back fl * delete ssl cert * . * make warning * . * unittest paral degree * solve unittest * heter & multi cloud commm ready * . * . * fl-ps v1.0 * . * support N + N mode * . * . * . * . * delete print * . * . * . * . * fix bug * . * .
-
- 16 Jun, 2022 1 commit
-
-
Committed by gongweibao
-
- 14 Jun, 2022 3 commits
-
-
Committed by Haohongxiang
-
Committed by yaozhixin
* update paddle.distributed.launch * add sample code * update shell * fix typo * fix typo * update docs * rm code * fix doc 2 * fix doc 3 * fix doc 4 Co-authored-by: root <root@sgjur-pod004-1.ipu.graphcore.cn>
-
Committed by zlsh80826
* Replace np.bool/np.bool8 with np.bool_ * Replace np.object with np.object_ * Replace np.complex with np.complex128 * Replace np.float with np.float64 * Replace np.int with np.int_ * Rerun pre-commit for newer pre-commit configuration * Use builtin bool instead of np.bool_ based on the context
-
- 13 Jun, 2022 2 commits
-
-
Committed by zhaoyingli
* fix fetch list * fix unittest
-
Committed by wangguanqun
* gpups default config and dataset * codestyle * add unittest * code style
-
- 09 Jun, 2022 1 commit
-
-
Committed by sneaxiy
* add nproc_per_node for DistributedFusedLamb * fix nproc_per_node communicator bug * fix ring_id = 1 init bug * fix ci * fix test_parallel_executor_mnist.py
-
- 08 Jun, 2022 1 commit
-
-
Committed by zhaoyingli
* add fetch_list * fix evaluate log * tiny fix
-
- 07 Jun, 2022 1 commit
-
-
Committed by Haohongxiang
* fix bugs of reducer * update * update
-
- 06 Jun, 2022 1 commit
-
-
Committed by zhaoyingli
* fix gradient merge * bug fix * update annotation
-
- 05 Jun, 2022 1 commit
-
-
Committed by Sing_chan
* use yapf to format all python file * yapf exclude two unittests file for they rely on writing and reading file, and format will break them * disable diff_py_file because too many diff files cause command following failed
-
- 02 Jun, 2022 4 commits
-
-
Committed by Haohongxiang
-
Committed by ziyoujiyi
* back fl * delete ssl cert * . * make warning * . * unittest paral degree * solve unittest * heter & multi cloud commm ready * . * . * fl-ps v1.0 * . * support N + N mode * . * . * . * . * delete print * . * . * . * .
-
Committed by zhaoyingli
* prepare only once
-
Committed by zhaoyingli
-
- 01 Jun, 2022 4 commits
-
-
Committed by JZ-LIANG
* adapt for 10 loss * partitioner support optimizer
-
Committed by caozhou
-
Committed by Yulong Ao
* [Auto Parallel] Add the parallel tuner * [Auto Parallel] Improve the parallel tuner and fix some bugs * upodate cost model * update import Resharder by dist op * update cost model * fix comp cost bug * update cost model * [Auto Parallel] Amend the dist attr for #processses=1 * update cost model and tuner * update cost model and tuner * update cost model and tuner * update cluster * update reshard * [Auto Parallel] Add the estimation from the cost model * [Auto Parallel] Reimplement the backup and restore functions * [Auto Parallel] Fix the bugs of the parallel tuner * [Auto Parallel] Update the engine api and dist context * [Auto Parallel] Work around the high order grad problem * [Auto Parallel] Add some miscellaneous improvements * [Auto Parallel] Add a unittest for DistributedContext Co-authored-by: caozhou <caozhou@radi.ac.cn>
- 31 May, 2022 2 commits
-
-
Committed by yaozhixin
* [IPU] support paddle.distributed.launch with IPUs * add device_num to env_args_mapping
-
Committed by Haohongxiang
-
- 30 May, 2022 1 commit
-
-
Committed by zhaoyingli
* use original id in dist_op_context.grad_op_id_to_op_id * del assert * remove redundant map
-
- 26 May, 2022 1 commit
-
-
Committed by danleifeng
-