提交 · c8affff0a8491eb7b8ac592657fe5caeaba8b0e7 · PaddlePaddle / Paddle

08 5月, 2021 1 次提交
- B
  add c_identity op npu (#32787) · c8affff0
  由 Baibaifan 提交于 5月 08, 2021
```
* add c_identity_op_npu
```
  c8affff0
23 4月, 2021 2 次提交
- L
  add the c_identity op (#32485) · 8fa8a37f
  由 lilong12 提交于 4月 23, 2021
```
* add c_identity op, test=develop
```
  8fa8a37f
- L
  add c_concat and c_split ops (#32486) · 2b108a04
  由 lilong12 提交于 4月 23, 2021
```
* add c_concat op
```
  2b108a04
13 11月, 2020 1 次提交
- L
  add send and recv ops (#28590) · ed9dd7c9
  由 lilong12 提交于 11月 13, 2020
```
* update, test=develop
```
  ed9dd7c9
30 9月, 2020 1 次提交

fix distributed error info (#27206) · 20fb01fb

由 MRXLT 提交于 9月 30, 2020

* fix distributed error info

* bug fix; notest

* error info refine

* update error info

* update error info

* update error info

* bug fix

* bug fix

* bug fix

* bug fix

20fb01fb

02 7月, 2019 1 次提交

supports collective training with programs (#18392) · a873fa84

由 Yi Liu 提交于 7月 02, 2019

1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext
3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis

a873fa84

27 6月, 2019 1 次提交

supports collective communicated training (#18175) · b7128bac

由 HaoRen 提交于 6月 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* marcro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

17 5月, 2019 1 次提交
- Y
  polish parallel dygraph code (#17164) · 02175555
  由 Yan Xu 提交于 5月 17, 2019
```
* add var grad hook test=develop
```
  02175555
25 4月, 2019 1 次提交
- Y
  ParallelDyGraph with GPU collective mode (#16827) · 0b07eef1
  由 Yan Xu 提交于 4月 25, 2019
```
implement dygraph.parallel.DataParallel to hook reduce op.
```
  0b07eef1

PaddlePaddle / Paddle 1 年多 前同步成功

PaddlePaddle / Paddle
1 年多前同步成功