Commits · a6e0015935223eefc1667e99c5925673b637e6e9 · PaddlePaddle / Paddle

14 Feb, 2022 · 1 commit
- [MLU] add mlu kernel for c_broadcast op (#39470) · 1b9e6790
  Committed by mhhhh1 on Feb 14, 2022
03 Dec, 2021 · 1 commit
- refine structure for cuda and rocm (#37202) · a6d2fddb
  Committed by ronnywang on Dec 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
24 Feb, 2021 · 1 commit
- [ROCM] update fluid collective op for rocm, test=develop (#31075) · ee76ea72
  Committed by Qi Li on Feb 24, 2021
30 Sep, 2020 · 1 commit

fix distributed error info (#27206) · 20fb01fb

Committed by MRXLT on Sep 30, 2020

* fix distributed error info

* bug fix; notest

* error info refine

* update error info

* update error info

* update error info

* bug fix

* bug fix

* bug fix

* bug fix

11 Feb, 2020 · 1 commit

Compile without nccl deps. [1/2] (#22509) · a90fa540

Committed by Wilber on Feb 11, 2020

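Support compiling without the NCCL dependency. [1/2]

On a multi-card machine, if the build does not enable the WITH_NCCL switch, the cards cannot communicate with each other, so only a single card can be selected for use.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>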
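
The fallback described in this commit message can be pictured with a minimal sketch. It is not Paddle's code: the function name `usable_devices` and the choice to keep the first requested card are assumptions made purely for illustration, since the message only says that a single card can be selected.

```python
def usable_devices(requested_ids, with_nccl):
    """Return the device ids a run may use.

    Hypothetical helper: with NCCL compiled in, every requested card is
    usable; without it the cards cannot communicate, so the run falls
    back to one card (assumed here to be the first one requested).
    """
    if not requested_ids:
        return []
    if with_nccl or len(requested_ids) == 1:
        return list(requested_ids)
    # No NCCL: multi-card communication is unavailable, keep a single card.
    return [requested_ids[0]]


if __name__ == "__main__":
    print(usable_devices([0, 1, 2, 3], with_nccl=True))
    print(usable_devices([0, 1, 2, 3], with_nccl=False))
```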

27 Aug, 2019 · 1 commit

supports multiple NCCL communicators preserved in NCCLCommContext (#19407) · efb05ba2

Committed by Yi Liu on Aug 27, 2019

* supports multiple NCCL communicators preserved in NCCLCommContext
test=develop

* add ut for c_comm_init_all operator and fix cuda resource release problem
test=develop

02 Jul, 2019 · 1 commit

supports collective training with programs (#18392) · a873fa84

Committed by Yi Liu on Jul 02, 2019

1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext
3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis
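
Item 1 above, splitting one allreduce op with four reduce types into four separate ops, can be illustrated with a pure-Python simulation. This is only a sketch under assumptions: the names `make_allreduce`, `allreduce_sum`, and so on are hypothetical, the four reduce types are assumed to be sum, prod, max, and min, and ranks are simulated as plain lists rather than real devices.

```python
from functools import reduce
import operator

# Assumed set of reduce types; each one becomes its own "op" below.
REDUCERS = {
    "sum": operator.add,
    "prod": operator.mul,
    "max": max,
    "min": min,
}


def make_allreduce(reduce_type):
    """Build a dedicated allreduce function for a single reduce type."""
    combine = REDUCERS[reduce_type]

    def allreduce(rank_values):
        # rank_values holds one equal-length list per simulated rank.
        reduced = [reduce(combine, column) for column in zip(*rank_values)]
        # Every rank receives its own copy of the reduced result.
        return [list(reduced) for _ in rank_values]

    return allreduce


# Four separate ops instead of one op carrying a reduce-type attribute.
allreduce_sum = make_allreduce("sum")
allreduce_prod = make_allreduce("prod")
allreduce_max = make_allreduce("max")
allreduce_min = make_allreduce("min")

if __name__ == "__main__":
    print(allreduce_sum([[1, 2], [3, 4]]))
```

With one op per reduce type, no op needs to branch on a reduce-type attribute at run time, which matches the simplification the commit message describes.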

27 Jun, 2019 · 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* marcro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

03 Apr, 2019 · 1 commit

Add Pixel shuffle OP (#15782) · 229dc932

Committed by ruri on Apr 03, 2019

* add pixel_shuffle op

* add pixel_shuffle op, test=develop

* rewrite code, test=develop

* delete useless comment, test=develop

* Refine pixel_shuffle_op and unit testing

* refine code,test=develop

* refine .cu,test=develop

* fix unittest,test=develop

* Fix unit testing
test=develop

* resolve conflict, test=develop

* fix test, test=develop

* fix API, test=develop

* fix test datatype bug,test=develop

* polish comments,test=develop

* add API,test=develop

* test=develop

* Add Pixel_Shuffle OP,test=develop

* support python3,test=develop

* add include memory to travis CI bug,test=develop

21 Mar, 2019 · 2 commits
- fix time; test=develop · 5dc9b519
  Committed by phlrain on Mar 21, 2019
- add elementwise floordiv, mod; test=develop · 56c2d384
  Committed by phlrain on Mar 21, 2019
12 Feb, 2018 · 1 commit
- Fix the grammar in copyright. (#8403) · 24509f4a
  Committed by qingqing01 on Feb 12, 2018
10 Feb, 2018 · 2 commits
- Correct #include path · fc374821
  Committed by Yi Wang on Feb 09, 2018
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Committed by Yi Wang on Feb 09, 2018
26 Dec, 2017 · 1 commit
- unify the indentation of license · 761b3297
  Committed by Luo Tao on Dec 26, 2017
12 Dec, 2017 · 1 commit

Refine device context (#6433) · 61ec0b95

Committed by QI JUN on Dec 12, 2017

There are mainly following fixes:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class  `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`

03 Nov, 2017 · 1 commit
- fix doc and code style · 34d68f24
  Committed by wwhu on Nov 03, 2017
02 Nov, 2017 · 1 commit
- add cliy_by_norm op · 65451b5c
  Committed by wwhu on Nov 02, 2017
13 Oct, 2017 · 1 commit

Adding the Adam Optimizer operator (#4733) · 11680037

Committed by Abhinav Arora on Oct 12, 2017

* add adam op

moment1_out = beta1 * moment1 + (1 - beta1) * grad
moment2_out = beta2 * moment2 + (1 - beta2) * grad * grad
moment1_hat = moment1_out / (1 - beta1^t)
moment2_hat = moment2_out / (1 - beta2^t)
param_out = param - learning_rate * moment1_hat / (sqrt(moment2_hat) + epsilon)

* fix moment 2

* Adding the Adam optimization operator

* Adding more tests for Adam op
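
The update rule quoted in this commit message can be traced by hand with a small scalar sketch. This is not the operator's kernel: the function name `adam_step` and the default hyperparameter values are assumptions chosen for illustration, and only the five formulas themselves come from the message.

```python
import math


def adam_step(param, grad, moment1, moment2, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Apply one scalar Adam update; t is the 1-based step count."""
    # Exponential moving averages of the gradient and the squared gradient.
    moment1_out = beta1 * moment1 + (1 - beta1) * grad
    moment2_out = beta2 * moment2 + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized moments.
    moment1_hat = moment1_out / (1 - beta1 ** t)
    moment2_hat = moment2_out / (1 - beta2 ** t)
    # Parameter update, with epsilon added outside the square root.
    param_out = param - learning_rate * moment1_hat / (
        math.sqrt(moment2_hat) + epsilon)
    return param_out, moment1_out, moment2_out


if __name__ == "__main__":
    print(adam_step(param=1.0, grad=0.5, moment1=0.0, moment2=0.0, t=1))
```

On the first step with zero-initialized moments, the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `learning_rate`.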

07 Aug, 2017 · 1 commit
- "remove alias to more operators" · 6b23b91c
  Committed by dongzhihong on Aug 07, 2017
04 Aug, 2017 · 1 commit
- Add cpplint for *.h and cuda *.cu · b58725bd
  Committed by liaogang on Aug 04, 2017
31 Jul, 2017 · 1 commit
- add EIGEN_USE_GPU macro to op.cu file · 61f94f00
  Committed by qijun on Jul 31, 2017
25 Jul, 2017 · 1 commit
- Add type_alias to import framework into ops · efc119b4
  Committed by Yu Yang on Jul 25, 2017
```
Make implement an operator less noisy.
```
19 Jul, 2017 · 1 commit
- Add sgd op (#2950) · e3b27d19
  Committed by Qiao Longfei on Jul 19, 2017
```
* a simplest SGD op
```

PaddlePaddle / Paddle · synced successfully more than 1 year ago
