- 09 Sep 2022, 1 commit

  Committed by Leo Chen

  * add operator<< for BuildStrategy
  * add fake_coalesce
  * fit allreduce mode for new_exe
  * remove debug code
  * follow comments
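The first item above adds a stream operator so a BuildStrategy can be printed in logs and error messages. A minimal sketch of the pattern, assuming illustrative field names (the real class carries many more options):

```cpp
#include <iostream>
#include <ostream>

// Stand-in for paddle's BuildStrategy; the field names here are
// illustrative, the real class carries many more options.
struct BuildStrategy {
  bool fuse_all_reduce_ops{false};
  int num_trainers{1};
};

// Streaming the strategy makes logs and error messages self-describing
// instead of printing an opaque object.
std::ostream& operator<<(std::ostream& os, const BuildStrategy& s) {
  return os << "BuildStrategy{fuse_all_reduce_ops=" << std::boolalpha
            << s.fuse_all_reduce_ops << ", num_trainers=" << s.num_trainers
            << "}";
}

int main() {
  BuildStrategy strategy;
  std::cout << strategy << "\n";
  // prints: BuildStrategy{fuse_all_reduce_ops=false, num_trainers=1}
}
```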
- 01 Aug 2022, 1 commit

  Committed by Leo Chen

  * remove cudaDeviceContext
  * remove more templates
  * fix rocm compile
  * remove alias name CUDADeviceContext
  * fix compile
  * fix tests
  * revert changes
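Removing CUDADeviceContext follows the usual alias-retirement pattern: by this point the fluid-side name was only an alias of phi::GPUContext, so call sites are rewritten to the phi type and the alias is deleted. A reduced sketch, with the context class as a stub:

```cpp
namespace phi {
// Stub for the real phi::GPUContext (stream, allocator, etc. elided).
class GPUContext {};
}  // namespace phi

// Before: the legacy name survived only as an alias; shown commented out
// because deleting it is exactly what this commit does.
// using CUDADeviceContext = phi::GPUContext;

// After: call sites name the phi type directly, so the alias and the
// templates specialized on it can be removed.
void LaunchKernel(const phi::GPUContext& ctx) { (void)ctx; /* ... */ }

int main() {
  phi::GPUContext ctx;
  LaunchKernel(ctx);
}
```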
- 26 Jun 2022, 1 commit

  Committed by Sing_chan
- 05 Jun 2022, 1 commit

  Committed by Sing_chan
- 15 Apr 2022, 1 commit

  Committed by Haohongxiang

  * refactor mp in eager mode
  * update
  * update
  * add uts
- 11 Mar 2022, 1 commit

  Committed by zn
- 24 Feb 2022, 1 commit

  Committed by zn
- 15 Feb 2022, 1 commit

  Committed by Aurelius84

  * #1 migrate dist-related type() -> dtype()
  * move datatype function from pten -> fluid/framework
  * change type() in imperative into convert(dtype())
  * modify xx_tensor->type into xx_tensor->dtype
  * change the set_type interface and the caller
  * modify xx_tensor.type into xx_tensor.dtype
  * fix mutable_data(place, dtype())
  * change caller of mutable_data in pten and distributed
  * change the caller of mutable_data in fluid/framework
  * change the caller of mutable_data in the imperative directory
  * mutable_data: inference
  * update the call of mutable_data
  * transfer MakePenScalarArray MakePtenScalar ResetHolderWithType
  * pass the compile; the next step is to remove VarType from Pten
  * fix all and remove VarType from pten; succeeds on Linux, other platforms are the next task
  * fix conflict with develop
  * fix compile error
  * Fix reset conversion
  * fix conflict
  * fix compile problem
  * fix typo
  * Fix << in tensor_utils.cc
  * fix type -> dtype
  * fix unittest
  * fix tensor init constructor
  * fix DataTypeSize for BFloat16
  * fix code style
  * fix npu compile error
  * fix npu
  * compile npu successfully
  * fix conflict
  * fix conflict

  Co-authored-by: xiongkun <xiongkun03@baidu.com>
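The heart of this migration is switching tensors from the legacy proto VarType enum to phi's DataType, converting only at the boundary where old code still needs the proto enum (paddle's converter for this is TransToProtoVarType). A reduced sketch with simplified enums and an illustrative converter name:

```cpp
#include <cassert>

// Reduced stand-ins for the two enums involved in the migration.
namespace proto { enum class VarType { FP32, FP64 }; }
namespace phi { enum class DataType { FLOAT32, FLOAT64 }; }

// Boundary converter in the spirit of TransToProtoVarType: new code speaks
// phi::DataType, legacy code gets the proto enum on demand.
inline proto::VarType ToProtoVarType(phi::DataType dt) {
  return dt == phi::DataType::FLOAT32 ? proto::VarType::FP32
                                      : proto::VarType::FP64;
}

struct Tensor {
  // New interface: dtype() replaces the removed type().
  phi::DataType dtype() const { return dtype_; }
  phi::DataType dtype_{phi::DataType::FLOAT32};
};

int main() {
  Tensor t;
  // Old call sites `t.type()` become `ToProtoVarType(t.dtype())`.
  assert(ToProtoVarType(t.dtype()) == proto::VarType::FP32);
}
```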
- 18 Jan 2022, 1 commit

  Committed by Zhanlue Yang

  * Merged LoDTensor with Tensor, test=allcases
  * Patched python level LoDTensor
  * Patched python level LoDTensor
  * Merge Tensor into DenseTensor
  * Fixed namespace issues, test=allcases
  * Fixed merge issues
  * Fixed inference issues
  * Fixed NPU test issues
  * Fixed merge issues
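After this merge there is a single storage class; the old names survive as aliases so existing code keeps compiling. A sketch of the resulting layout, with declarations reduced to stubs rather than the exact paddle headers:

```cpp
#include <cstddef>
#include <vector>

namespace phi {
// The single storage class after the merge: the LoD (level-of-details)
// metadata that used to justify a separate LoDTensor now lives inside.
class DenseTensor {
 public:
  using LoD = std::vector<std::vector<std::size_t>>;
  const LoD& lod() const { return lod_; }

 private:
  LoD lod_;
  // holder, dims, dtype ... (elided)
};
}  // namespace phi

namespace paddle { namespace framework {
// Compatibility aliases: old code naming Tensor/LoDTensor still compiles.
using Tensor = phi::DenseTensor;
using LoDTensor = phi::DenseTensor;
}}  // namespace paddle::framework

int main() {
  paddle::framework::LoDTensor t;
  (void)t.lod();
}
```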
- 03 Dec 2021, 1 commit

  Committed by ronnywang

  * refine structure for cuda and rocm
  * update
  * update
  * update
  * update
- 27 Nov 2021, 1 commit

  Committed by Aganlengzi

  * [NPU] reorganization for device API abstraction
  * [NPU] delete old files
  * [NPU] fix npu_collective_helper
  * [NPU] fix collective_helper
  * [NPU] fix ut
  * [NPU] mod memory allocation and hccl_helper
  * [NPU] fix place_type
  * [NPU] split enforce.h
  * move acl* calls into npu_info
  * merge conflict
  * fix merge
  * merge conflict
  * merge conflict
- 23 Nov 2021, 1 commit

  Committed by Qi Li

  * [XPU] Reorganize xpu device code in platform, test=develop
  * fix xpu_header.h, test=develop
- 06 Sep 2021, 1 commit

  Committed by Yuang Liu
- 24 Aug 2021, 1 commit

  Committed by gongweibao
- 06 Aug 2021, 1 commit

  Committed by gongweibao
- 02 Aug 2021, 2 commits

  Committed by gongweibao

  Committed by Leo Chen
- 29 Jul 2021, 1 commit

  Committed by gongweibao
- 06 May 2021, 1 commit

  Committed by gongweibao
- 23 Apr 2021, 1 commit

  Committed by lilong12

  * add c_identity op, test=develop
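c_identity copies its input in the forward pass; its value is in the backward pass, where the gradient is all-reduced across the model-parallel group (in paddle the grad of c_identity lowers to c_allreduce_sum). A schematic kernel with the framework plumbing and the collective call elided:

```cpp
#include <vector>

// Forward: a plain element-wise copy; the op exists so autograd can attach
// a collective to its backward edge.
template <typename T>
void CIdentityForward(const std::vector<T>& x, std::vector<T>* out) {
  *out = x;
}

// Backward (schematic): sum the incoming gradient across all ranks of the
// model-parallel group, i.e. out_grad -> AllReduceSum -> x_grad.

int main() {
  std::vector<float> x{1.0f, 2.0f}, y;
  CIdentityForward(x, &y);
}
```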
- 21 Apr 2021, 2 commits

  Committed by zhang wenhui

  * add allreduce and broadcast without test (#31024)
  * Refactor HCCLCommContext to be compatible with Paddle (#31359)
  * [NPU] add npu kernel for communication op (#31437)
  * add allreduce and broadcast without test
  * add c_broadcast_test case
  * build c_comm_init and c_create_group operators
  * make the whole thing compile
  * add broadcast and init op test case, but run failed
  * make unit test compile
  * fix broadcast test bug and change into hcom for ccl
  * change c_comm_init and c_create_group ops accordingly
  * make tests compile
  * transfer code to 27
  * compiled successfully in 28, but run failed
  * test broadcast in 28, but failed
  * make hcom primitives work
  * change hccl data type for base.h
  * fix broadcast bug
  * make attributes work
  * fix group name bug
  * add allreduce, but test failed
  * allreduce bug for qiuliang
  * allreduce finished
  * add allgather and reducescatter
  * merge all op code
  * add allgather test
  * finish running all ccl op tests excluding send/recv
  * all ops and tests excluding send/recv
  * send_v2_npu.cc recv_v2_npu.cc compiled
  * fix ccl core dump bug and test allgather, reducescatter, broadcast op
  * fix allreduce bug just for test
  * hcom send&recv test pass, without hcom_destroy
  * for qiuliang test
  * Ascend Send&Recv Test Pass
  * all ops (ex send/recv) ok
  * fix bug
  * merge all ccl op
  * style merge to PaddlePaddle
  * merge style
  * new merge style
  * merge style 2
  * insert an empty line at the end
  * disable ctest for hcom to pass ci
  Co-authored-by: void-main <voidmain1313113@gmail.com>
  Co-authored-by: f2hkop <f2huestc@outlook.com>
  * Add auto-increasing tag id for Hcom OPs (#31702)
  * add c_reduce_sum op (#31793)
  * update Ascendrc hccl to 20.3 (#32126)
  * fix merge code
  * change cmake.txt1
  * [NPU] Support npu kernel for c sync stream op (#31386)
  * sync stream npu op
  * add with_ascend_acl
  * update c++ unittest
  * compile all failed
  * try to pre-commit
  * after pre-commit
  * merge, compile and test hccl successfully!
  * fix code style
  * fix code style
  * fix bugs about hccl
  * fix some bugs
  * fix code style
  * fix style
  * fix style
  * fix
  * fixed
  * merge develop
  Co-authored-by: lw921014 <liuwei921014@yeah.net>
  Co-authored-by: Void Main <voidmain1313113@gmail.com>
  Co-authored-by: f2hkop <f2huestc@outlook.com>
  Co-authored-by: xiayanming <41795079@qq.com>

  Committed by liuyuhui
- 24 Feb 2021, 1 commit

  Committed by Qi Li
- 30 Sep 2020, 1 commit

  Committed by MRXLT

  * fix distributed error info
  * bug fix; notest
  * error info refine
  * update error info
  * update error info
  * update error info
  * bug fix
  * bug fix
  * bug fix
  * bug fix
- 27 Aug 2020, 1 commit

  Committed by lilong12

  add collective ops for CPU using Gloo and the paddle.distributed.* APIs
- 11 Feb 2020, 1 commit

  Committed by Wilber

  Support building without depending on NCCL. [1/2] On a multi-GPU machine, if Paddle was not compiled with the WITH_NCCL switch enabled, the cards cannot communicate with each other and only a single card can be used.

  Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
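The pattern behind this change: NCCL-dependent code is fenced with the build flag so a build without WITH_NCCL still compiles and links, and multi-GPU communication degrades to a clear runtime error instead of a link failure. A sketch around paddle's PADDLE_WITH_NCCL macro, with the actual collective call elided:

```cpp
#include <stdexcept>

#if defined(PADDLE_WITH_NCCL)
#include <nccl.h>
// Real collective path, only compiled when the WITH_NCCL switch is on.
void AllReduceSum(float* data, int n) {
  (void)data; (void)n;  // ncclAllReduce(...) elided in this sketch
}
#else
// Fallback: without NCCL the cards cannot communicate, so anything needing
// a collective fails loudly and the runtime sticks to a single card.
void AllReduceSum(float*, int) {
  throw std::runtime_error("compiled without WITH_NCCL: multi-GPU unavailable");
}
#endif

int main() {
  float buf[4] = {0.0f};
  try {
    AllReduceSum(buf, 4);
  } catch (const std::exception&) {
    // fall back to single-card execution
  }
}
```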
- 28 Oct 2019, 1 commit

  Committed by Chen Weihang

  * replace part of the old implementation, test=develop
  * restore concat op, test=develop
  * update all ops' implementation & delete the GetDataTypeOfVar func, test=develop
- 27 Aug 2019, 1 commit

  Committed by Yi Liu

  * supports multiple NCCL communicators preserved in NCCLCommContext, test=develop
  * add ut for the c_comm_init_all operator and fix a cuda resource release problem, test=develop
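Preserving multiple communicators means keying them by ring id, so independent communication rings can coexist in one process. A sketch in the spirit of NCCLCommContext; the member and method names here are illustrative:

```cpp
#include <map>
#include <memory>
#include <stdexcept>

// Wrapper for one communicator; in paddle this holds the ncclComm_t, its
// CUDA stream and device id, reduced to a stub here.
struct NCCLComm {
  int device_id{0};
};

class CommRegistry {
 public:
  // Each "ring" is an independent communication group; ops pick one by id.
  void Set(int ring_id, std::unique_ptr<NCCLComm> comm) {
    comms_[ring_id] = std::move(comm);
  }
  NCCLComm* Get(int ring_id) const {
    auto it = comms_.find(ring_id);
    if (it == comms_.end()) throw std::runtime_error("unknown ring_id");
    return it->second.get();
  }

 private:
  std::map<int, std::unique_ptr<NCCLComm>> comms_;  // ring_id -> comm
};

int main() {
  CommRegistry registry;
  registry.Set(0, std::make_unique<NCCLComm>());
  (void)registry.Get(0);
}
```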
- 02 Jul 2019, 1 commit

  Committed by Yi Liu

  1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops.
  2. We also refined the collective op code: e.g., we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext parameter from the template, as we already know the target DeviceContext.
  3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis.
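Splitting by reduce type turns a runtime switch into four concrete ops that share one kernel template (in paddle the resulting ops are c_allreduce_sum / _max / _min / _prod). A sketch of the dispatch, modeling only the local reduction math and leaving registration macros and communication aside:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class ReduceKind { kSum, kMax, kMin, kProd };

// One kernel template; each of the four ops instantiates it with a fixed
// reduce kind, so no per-call branching on the reduce type is needed.
template <ReduceKind K, typename T>
void AllReduceLocal(std::vector<T>* dst, const std::vector<T>& src) {
  for (std::size_t i = 0; i < dst->size(); ++i) {
    T& d = (*dst)[i];
    const T s = src[i];
    if (K == ReduceKind::kSum)  d += s;
    if (K == ReduceKind::kMax)  d = std::max(d, s);
    if (K == ReduceKind::kMin)  d = std::min(d, s);
    if (K == ReduceKind::kProd) d *= s;
  }
}

// Conceptually, each instantiation is registered as its own operator,
// e.g. c_allreduce_sum -> AllReduceLocal<ReduceKind::kSum, T>.
int main() {
  std::vector<int> a{1, 5}, b{4, 2};
  AllReduceLocal<ReduceKind::kSum>(&a, b);  // a == {5, 7}
}
```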
- 27 Jun 2019, 1 commit

  Committed by HaoRen

  * fix prepare context redundant code problem, optimize executor by caching create_variables, test=develop
  * supports collective training in executor
  * make fetch_list runnable with variables, add more unittests for use_program_cache, test=develop
  * fix comment, test=develop
  * use unique name for nccl_id
  * supports output to stream in program_to_code
  * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
  * set op role in collective training
  * add collective op role
  * remove orig file
  * add build optimizer by strategy
  * add collective strategy
  * refine collective strategy
  * add multi-process role maker
  * refine strategy building factory so that we can easily plug in more strategies
  * scale loss grad in collective sgd transpiler
  * add support for distributed fc
  * code format
  * revert some features for dist fc
  * add support for distributed fc training
  * test=develop add collective op unittest standard
  * test=develop remove the test_collective directory
  * remove slicegather test
  * code format for reducescatter
  * update attr of shard_index_op
  * Modify macro nccl_helper
  * remove test without distribute
  * macro collective_helper
  * macro update
  * test=develop update support for python3.5
  * test=develop change gpu memory use to 0.1 when testing
  * test=develop update ut equal func
  * test=develop set flags to 1.5
  * test=develop fix pickle dump on py35
  * test=develop fix divide in slice and add sync_comm_stream; update atol and rtol to 1e-05; rm shard_index op and test; modify reading input from file to reading from memory; remove origin_program in framework and add i/o in c_sync_calc_stream
  * test=develop update unittest sync operator I/O
- 17 May 2019, 1 commit

  Committed by Yan Xu

  * add var grad hook, test=develop
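A variable gradient hook lets external code run the moment a variable's gradient is ready, which is the mechanism the DataParallel entry below builds on. A minimal sketch, assuming illustrative names rather than the exact imperative VarBase interface:

```cpp
#include <functional>
#include <utility>
#include <vector>

struct Tensor {
  std::vector<float> data;
};

// Per-variable gradient hooks in the imperative engine: callbacks
// registered on a variable fire once its gradient has been accumulated.
class VarBase {
 public:
  using GradHook = std::function<void(Tensor*)>;
  void RegisterGradHook(GradHook h) { hooks_.push_back(std::move(h)); }
  // Called by the engine when the gradient is ready.
  void OnGradReady(Tensor* grad) {
    for (auto& h : hooks_) h(grad);
  }

 private:
  std::vector<GradHook> hooks_;
};

int main() {
  VarBase v;
  // e.g. DataParallel registers a hook that all-reduces the gradient.
  v.RegisterGradHook([](Tensor* g) { (void)g; /* AllReduceSum(g) */ });
  Tensor grad{{1.0f, 2.0f}};
  v.OnGradReady(&grad);
}
```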
- 25 Apr 2019, 1 commit

  Committed by Yan Xu

  implement dygraph.parallel.DataParallel to hook the reduce op.
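Schematically, the hooked reduce does the same work for every parameter: scale the local gradient by 1/nranks, then allreduce-sum it across workers so each device holds the averaged gradient before the optimizer step. A sketch with the collective call left as an assumption:

```cpp
#include <cstddef>

// Hypothetical hook body; in paddle the actual communication goes through
// the NCCL helper, elided here.
void DataParallelReduceHook(float* grad, std::size_t n, int nranks) {
  const float scale = 1.0f / static_cast<float>(nranks);
  for (std::size_t i = 0; i < n; ++i) grad[i] *= scale;  // local share of the average
  // AllReduceSum(grad, n);  // assumption: summing the scaled grads yields the average
}

int main() {
  float g[2] = {2.0f, 4.0f};
  DataParallelReduceHook(g, 2, /*nranks=*/2);  // g == {1.0f, 2.0f} pre-allreduce
}
```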