Commits · ce26f8823ac6662f9082b8053e81356f8f223278 · 机器未来 / Paddle

13 Apr, 2021 · 1 commit
- update Ascendrc hccl to 20.3 (#32126) · ce26f882
  Committed by lw921014 on Apr 13, 2021
```
update Ascendrc hccl to 20.3 (#32126)
```
  ce26f882
12 Apr, 2021 · 1 commit

fix NPUDeviceContext in all c++ unittest (#32198) · 5ad94e7b

Committed by Leo Chen on Apr 12, 2021

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: pangyoki <pangyoki@126.com>

5ad94e7b

23 Mar, 2021 · 2 commits

add c_reduce_sum op (#31793) · c594f576
Committed by lw921014 on Mar 23, 2021
```
add c_reduce_sum op
```
c594f576

Add 3d parallelism (#31796) · 228bce12

Committed by lilong12 on Mar 23, 2021

Add 3d Parallelism
Co-authored-by: WangXi <wangxi16@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: root <root@yq01-sys-hic-k8s-v100-box-a225-0562.yq01.baidu.com>

228bce12

18 Mar, 2021 · 1 commit
- Add auto-increasing tag id for Hcom OPs (#31702) · 7b450e78
  Committed by Void Main on Mar 18, 2021
  
  7b450e78
12 Mar, 2021 · 1 commit
- [NPU] Support npu kernel for c sync stream op (#31386) · f1fdddfd
  Committed by xiayanming on Mar 12, 2021
```
* sync stream npu op

* add with_ascend_acl

* update c++ unittest
```
  f1fdddfd
08 Mar, 2021 · 1 commit

[NPU] add npu kernel for communication op (#31437) · 15823bb0

Committed by lw921014 on Mar 08, 2021

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: void-main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>

15823bb0

02 Mar, 2021 · 1 commit
- Refactor HCCLCommContext to be compatible with Paddle (#31359) · 45765d6e
  Committed by Void Main on Mar 02, 2021
```
Refactor HCCLCommContext to be compatible with Paddle (#31359)
```
  45765d6e
01 Mar, 2021 · 1 commit
- add allreduce and broadcast without test (#31024) · 9fcdaeba
  Committed by lw921014 on Mar 01, 2021
```
add allreduce and broadcast without test
```
  9fcdaeba
21 Jan, 2021 · 2 commits
- Add Hccl program group (#30642) · e4287ca6
  Committed by gongweibao on Jan 21, 2021
```
Add Hccl program group
```
  e4287ca6
- Add distribution supported (#30578) · f9c97dd7
  Committed by gongweibao on Jan 21, 2021
```
Add distribution supported
```
  f9c97dd7
24 Dec, 2020 · 1 commit

[Feature] one ps (3/4) (#29604) · 032414ca

Committed by tangwei12 on Dec 24, 2020

* oneps (3/4)
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: malin10 <malin10@baidu.com>
Co-authored-by: chengmo <chengmo@baidu.com>

032414ca

16 Dec, 2020 · 1 commit
- fix gen_nccl_id_op_helper compile failed, test=develop (#29614) · 613c46bc
  Committed by WangXi on Dec 16, 2020
  
  613c46bc
14 Dec, 2020 · 1 commit
- gen nccl id use socket (#29431) · 467c7169
  Committed by WangXi on Dec 14, 2020
  
  467c7169
16 Nov, 2020 · 1 commit
- bug fix, test=develop (#28648) · b2f7ab66
  Committed by lilong12 on Nov 16, 2020
  
  b2f7ab66
13 Nov, 2020 · 1 commit
- add send and recv ops (#28590) · ed9dd7c9
  Committed by lilong12 on Nov 13, 2020
```
* update, test=develop
```
  ed9dd7c9
30 Sep, 2020 · 1 commit

fix distributed error info (#27206) · 20fb01fb

Committed by MRXLT on Sep 30, 2020

* fix distributed error info

* bug fix; notest

* error info refine

* update error info

* update error info

* update error info

* bug fix

* bug fix

* bug fix

* bug fix

20fb01fb

25 Sep, 2020 · 1 commit
- add AsDuplicable for sync_comm op (#27515) · c83ade6d
  Committed by mapingshuo on Sep 25, 2020
  
  c83ade6d
24 Sep, 2020 · 1 commit

use iwyu clean include (#27267) · df43905f

Committed by wanghuancoder on Sep 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

27 Aug, 2020 · 1 commit
- [api 2.0] add collective op for cpu using gloo and paddle.distributed.* apis (#26552) · 1c681383
  Committed by lilong12 on Aug 27, 2020
```
add collective op for cpu using gloo and paddle.distributed.* apis
```
  1c681383
22 Aug, 2020 · 1 commit
- fix cscatter, test=develop (#26554) · faa9b97b
  Committed by lilong12 on Aug 22, 2020
  
  faa9b97b
21 Aug, 2020 · 1 commit
- Add collective ops (reduce) (#26340) · e92f770c
  Committed by lilong12 on Aug 21, 2020
  
  e92f770c
10 Aug, 2020 · 1 commit
- 【paddle.fleet】add the support for multi-node training for pipeline (#25907) · 8caee2ad
  Committed by lilong12 on Aug 10, 2020
```
* add the support for multi-node training
```
  8caee2ad
08 Jul, 2020 · 1 commit

Revert/barrier for sync (#25417) · 4b3778a3

Committed by tangwei12 on Jul 08, 2020

* add retry for prefetch

* Revert "Fix/sync barrier (#25016)"

This reverts commit be6a315f.

* reopen dist UT, test=develop

* remove fl UT, test=develop

4b3778a3

12 Jun, 2020 · 1 commit
- Fix/sync barrier (#25016) · be6a315f
  Committed by tangwei12 on Jun 12, 2020
```
* fix sync barrier with barrier monitor, test=develop
```
  be6a315f
03 Jun, 2020 · 1 commit

Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759) · d1062d52

Committed by Chen Weihang on Jun 03, 2020

* remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop

* remove ci test case, test=develop

* replace all LOG(FATAL) & polish message, test=develop

* fix typo, test=develop

* polish error info detail, test=develop

d1062d52

11 May, 2020 · 1 commit

Add macro BOOST_GET to enrich the error information of boost :: get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

aa0f254f

09 Mar, 2020 · 1 commit

Imperative tracer refactoring (#22457) · d33c4343

Committed by Zeng Jinle on Mar 09, 2020

* refine grad maker, test=develop

* refactor tracer stage 1, test=develop

* merge develop to solve conflict third times, test=develop

d33c4343

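11 Feb, 2020 · 1 commit

Compile without nccl deps. [1/2] (#22509) · a90fa540

Committed by Wilber on Feb 11, 2020

Support compiling without the nccl dependency. [1/2]

In a multi-card setup, if the build does not have the WITH_NCCL switch turned on, the cards cannot communicate, so only a single card can be selected for use.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>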

a90fa540

10 Feb, 2020 · 1 commit

Compile without nccl deps. [2/2] (#22484) · de009152

Committed by Wilber on Feb 10, 2020

Compile without nccl deps. [1/2]
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

de009152

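05 Feb, 2020 · 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Committed by Wilber on Feb 05, 2020

Added WITH_NCCL to the cmake options to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

A single-machine, single-card build can disable NCCL compilation. Multi-card use needs NCCL, which is enabled by default; if NCCL is disabled, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>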

7bc4b095
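
The option rules described in the WITH_NCCL commit message can be sketched in Python. This is only an illustration of the stated behaviour, not the project's cmake code; the function names `resolve_with_nccl` and `usable_card_count` are hypothetical.

```python
def resolve_with_nccl(with_gpu, with_nccl=True):
    """Effective WITH_NCCL value: ON by default, forced OFF when WITH_GPU is OFF."""
    return bool(with_nccl and with_gpu)


def usable_card_count(requested_cards, with_nccl):
    """Without NCCL the cards cannot communicate, so at most one card is usable."""
    if with_nccl:
        return requested_cards
    return min(requested_cards, 1)
```

For example, a GPU build with NCCL switched off and four cards requested would be limited to one card under these rules.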

13 Jan, 2020 · 1 commit
- Bug fix for sparse recorder (#21969) · 985bceac
  Committed by 123malin on Jan 13, 2020
```
* test=develop, bug fix for sparse recorder
```
  985bceac
25 Dec, 2019 · 1 commit
- remove patch command and file of cares to Improved quality of Paddle Repo (#21776) · a01663ca
  Committed by zhouwei25 on Dec 25, 2019
  
  a01663ca
03 Dec, 2019 · 2 commits
- remove unused snappy/snappystream depends in distributed codes (#21484) · 70eb3976
  Committed by Tao Luo on Dec 03, 2019
```
test=develop
```
  70eb3976
- set dim[0] to -1 if dim[0] < 0 during compiling for c_allgather op (#21402) · 0bc8bdf7
  Committed by lilong12 on Dec 03, 2019
```
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop

* modify ENFORCE message, test=develop

* add validation for x.shape[0] > 0, test=develop

* add ut, test=develop
```
  0bc8bdf7
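
A minimal Python sketch of the shape handling this commit message describes for c_allgather: the leading dimension stays -1 at compile time when it is unknown, and the positivity check moves to run time. The function names and the `nranks` parameter are assumptions for illustration; the actual operator is implemented in C++.

```python
def infer_allgather_shape(x_shape, nranks):
    """Compile-time output shape of an allgather along dim 0.

    An unknown (negative) leading dimension is reported as -1 instead of
    being multiplied by the number of ranks.
    """
    out = list(x_shape)
    out[0] = -1 if x_shape[0] < 0 else x_shape[0] * nranks
    return out


def check_runtime_shape(x_shape):
    """Run-time validation: the real input must have x.shape[0] > 0."""
    if x_shape[0] <= 0:
        raise ValueError("x.shape[0] must be > 0, got %d" % x_shape[0])
```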
31 Oct, 2019 · 1 commit

GradMaker for dygraph (#19706) · 8c4573a3

Committed by hong on Oct 31, 2019

* refactor dygraph,test=develop

* fix failed unittest,test=develop

* polish code,test=develop

* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop

* polish vlog and profiler, test=develop

* try to fix preceding ops order,test=develop

* test transformer in windows ci, test=develop

* use python c-api to speed up tracer.trace,test=develop

* test=develop, fix docker with paddle nccl problem

* test=develop, add ut for debug string and gradient_accumulator

* test=develop, add tests for layer/gradient_accumulator/prepared_op

* test=develop, fix complie error for test_prepared_op

* test=develop, add more ut for dygraph

* test=develop, create API.spec for dygraph api change

* optimize grad maker; test=develop

* optimize grad maker

* test

* grad make optim; test=develop

* fix unittest bugs; test=develop

* add dygraph grad op maker and split_op

* grad op maker refactor; test=develop

* add dygraph grad maker; test=develop

* fix op deformable_conv_v1_op bug; test=develop

* fix deformable_conv prroi pool bugs;

* fix new op grad op maker bug; test=develop

* fix split by ref bug; test=develop

* fix dygraph auto prune bug; test=develop

* fix test_trace bug; test=develop

* fix fused emb seq pool bug; test=develop

* remove useless code in op_desc file; test=develop

* remove useless code, StrVarBaseNode; test=develop

* fix review issues; test=develop

* fix rank_loss grad maker; test=develop

* remove flag in VarBase; test=develop

* fix distributed_notify_op compile bug ; test=develop

* fix reshape op double grad; test=develop

* fix expand as op; test=develop

* add impertive type_defs.h for demo_train; test=develop

* fix inference lib cmake; test=develop

* fix inference lib; test=develop

* fix infernce_lib; test=develop

* fix inference cmake; test=develop

* fix inference lib; test=develop

* fix inference lib; test=develop

* remove condition dygraph grad maker, modify local name; test=develop

* fix split grad maker bug; test=develop

* fix pyramid_op bug; test=develop

* change travis time out limit; test=develop

* restore travis; test=develop

* change timeout limit; test=develop

8c4573a3

28 Oct, 2019 · 1 commit

Replace risky GetInputType method with secure IndicateVarDataType interface (#20668) · 26cc1fe5

Committed by Chen Weihang on Oct 28, 2019

* replace part of the old implementation, test=develop

* restore concat op, test=develop

* update all ops implemention & delete GetDataTypeOfVar func, test=develop

26cc1fe5

27 Aug, 2019 · 1 commit

supports multiple NCCL communicators preserved in NCCLCommContext (#19407) · efb05ba2

Committed by Yi Liu on Aug 27, 2019

* supports multiple NCCL communicators preserved in NCCLCommContext
test=develop

* add ut for c_comm_init_all operator and fix cuda resource release problem
test=develop

efb05ba2

02 Jul, 2019 · 1 commit

supports collective training with programs (#18392) · a873fa84

Committed by Yi Liu on Jul 02, 2019

1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext template parameter, as the target DeviceContext is already known
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis

a873fa84
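
The split into one op per reduce type can be illustrated with a small single-process Python sketch. The commit message does not name the four ops or the four reduce types; the names below and the choice of sum, max, min and prod are assumptions, and each "rank" is simulated as one entry in a list.

```python
from functools import reduce
import operator


def _allreduce(values, fn):
    """Reduce one value per rank with fn and hand the result back to every rank."""
    result = reduce(fn, values)
    return [result] * len(values)


# One function per reduce type, instead of a single op with a reduce-type attribute.
def allreduce_sum(values):
    return _allreduce(values, operator.add)


def allreduce_max(values):
    return _allreduce(values, max)


def allreduce_min(values):
    return _allreduce(values, min)


def allreduce_prod(values):
    return _allreduce(values, operator.mul)
```

With three simulated ranks holding 1, 2 and 3, `allreduce_sum([1, 2, 3])` gives every rank the value 6.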

27 Jun, 2019 · 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* marcro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac
