Commits · 5781999d39007af1e190b14d709914704aeeb77b · PaddlePaddle / Paddle

09 Jun, 2022 1 commit

Add nproc_per_node for DistributedFusedLamb (#43295) · 6678def9

Committed by sneaxiy on Jun 09, 2022

* add nproc_per_node for DistributedFusedLamb

* fix nproc_per_node communicator bug

* fix ring_id = 1 init bug

* fix ci

* fix test_parallel_executor_mnist.py

6678def9

07 Jun, 2022 2 commits

Optimized the performance of activation op in XPU2 (#43187) · d5afc1ba

Committed by shixingbo on Jun 07, 2022

d5afc1ba

Add use_master_acc_grad for DistributedFusedLamb (#43266) · 601d7a35

Committed by sneaxiy on Jun 07, 2022

* add use_master_acc_grad

* add ut

601d7a35

05 Jun, 2022 1 commit

[code format check upgrade] step2: clang-format (#42840) · a3730dc8

Committed by Sing_chan on Jun 05, 2022

a3730dc8

04 Jun, 2022 1 commit

[code format check upgrade] step2: cmake-format (#43057) · 92568edb

Committed by Sing_chan on Jun 04, 2022

92568edb

27 May, 2022 1 commit

[Phi] Change optional tensor from `optional<const Tensor&>` to `optional<Tensor>` (#42939) · 6d78524c

Committed by zyfncg on May 27, 2022

* refactor the optional tensor

* remove optional<MetaTensor> in InferMeta

* fix bug

* fix optional<vector<Tensor>>

* fix bug

* fix rmsprop

* fix amp of eager_gen

* polish code

* fix deleted code

* fix merge conflict

* polish code

* remove is_nullopt_

* fix merge conflict

* fix merge conflict

6d78524c

16 May, 2022 1 commit

Add the new XDNN implementation. test=kunlun (#42683) · 87667c66

Committed by wbn on May 16, 2022

* Add the new XDNN implementation. test=kunlun

* Add the new XDNN implementation. test=kunlun

* Modify the code based on review, test=kunlun

87667c66

11 May, 2022 1 commit

remove old XDNN implementation test=kunlun (#42404) · 7b828f71

Committed by taixiurong on May 11, 2022

7b828f71

10 May, 2022 1 commit

[MLU]add adam, adamw op of mlu device (#42557) · cc077693

Committed by qipengh on May 10, 2022

cc077693

29 Apr, 2022 1 commit

[OP]Fix adamw not registered into AllKernels (#42391) · 683f152a

Committed by Aurelius84 on Apr 29, 2022

683f152a

28 Apr, 2022 1 commit

Add gradient merge for DistributedFusedLamb optimizer (#40177) · 108aeb28

Committed by sneaxiy on Apr 28, 2022

* add gradient merge for DistributedFusedLamb

* use master acc gradient

* fix CI ut

* polish

* remove math_function_impl.h change

* fix test_update_loss_scaling_op.py

* try to fix XPU/NPU CI

* add gm ut

108aeb28

20 Apr, 2022 1 commit

[MLU] add gather mlu kernel (#41969) · 23ad2166

Committed by fwenguang on Apr 20, 2022

23ad2166

15 Apr, 2022 1 commit

[MLU] add mlu activation kernels (#41751) · 10114859

Committed by fwenguang on Apr 15, 2022

10114859

13 Apr, 2022 1 commit

Add yaml and unittest for SGD (#41485) · 6d1e03a2

Committed by zyfncg on Apr 13, 2022

* add sgd yaml

* change python api

* open eager mode in sgd

* fix bug

6d1e03a2

07 Apr, 2022 2 commits

Add Output(Step) to DistributedFusedLamb optimizer (#41249) · e4459a40

Committed by sneaxiy on Apr 07, 2022

* add Output(Step) to distributed fused lamb op

* add _set_step

e4459a40

momentum support l2decay for xpu. test=kunlun (#41325) · 533c649f

Committed by houj04 on Apr 07, 2022

* momentum support l2decay for xpu. test=kunlun

* fix include file. test=kunlun

* fix cmake for device_worker. test=kunlun

533c649f

03 Apr, 2022 1 commit

Add infer meta (#41054) · 868a3203

Committed by hong on Apr 03, 2022

* add some infer meta

* fix bug

* fix bugs;

* fix bug and add set data type

* revert infer shape of lookup table

* recover test

868a3203

28 Mar, 2022 1 commit

Move meshgrid to phi (#40994) · ca871957

Committed by hong on Mar 28, 2022

* move momentum, rmsprop to phi; test=develop

* update

* update

* update

* update

* update; test=develop

* fix xpu npu bugs; test=develop

* fix npu bug; test=develop

* fix windows compile error; test=develop

* fix windows compile error; test=develop

* polish code; test=develop

* fix conflict; test=develop

* add meshgrid;

* update

* polish code

* polish code;

* fix bug

* format; remove useless code

* fix npu bug

* fix bug

ca871957

25 Mar, 2022 3 commits

fix lars optimizer bug (#40892) · c006a609

Committed by duanboqiang on Mar 25, 2022

* fix lars optimizer bug

* Update optimizer.py

c006a609

[Phi] Migrate Adam and AdamW into Phi (#40351) · 56cd3407

Committed by Aurelius84 on Mar 25, 2022

* [Phi] Migrate Adam and Adamw into Phi

* fix compile error and unittest ok

* fix compile error and unittest ok

* fix undefined reference to fLI::FLAGS

* test depend on operator

* fix cmake

* fix xpu compile

* fix infrt

* fix amp_type_traits

* fix amp_type_traits

* modify according to reviewer

* modify according to reviewer

* fix dtype float16

* fix typo

* fix Cmake

* fix code style

56cd3407

[NPU] add merged_momentum (#40875) · 2b74b739

Committed by Aganlengzi on Mar 25, 2022

* [NPU] add merged_momentum

* fix

* fix device

2b74b739

14 Mar, 2022 1 commit

[MLU] add merged_momentum mlu kernel (#40406) · 1f7b2516

Committed by fwenguang on Mar 14, 2022

1f7b2516

07 Mar, 2022 2 commits

[Phi] Fix macro name typo (#40204) · 55a3bfbd

Committed by Aurelius84 on Mar 07, 2022

55a3bfbd

[Phi]Migrate Adamax and Adadelta Optimizer Op into Phi (#40173) · f5ec0314

Committed by Aurelius84 on Mar 07, 2022

* [Phi]Migrate Adamax into phi

* Add adadelta kernel

f5ec0314

04 Mar, 2022 1 commit

clean distribution_helper, index_impl, aligned_vector code in fluid (#40071) · b9672a1e

Committed by Leo Chen on Mar 04, 2022

* clean distribution_helper, index_impl, aligned_vector code in fluid

* fix conflicts

b9672a1e

02 Mar, 2022 2 commits

Move sgd to phi (#40045) · f3d54e2e

Committed by hong on Mar 02, 2022

* move sgd to phi; test=develop

* update

* add sgd kernel; test=develop

f3d54e2e

vec scale kernel (#40011) · 2e6548a9

Committed by sneaxiy on Mar 02, 2022

2e6548a9

01 Mar, 2022 1 commit

Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972) · d17961ed

Committed by sneaxiy on Mar 01, 2022

* vectorize lamb kernel

* remove flags, add ut

* remove useless codes

* refine code, add param order

d17961ed

25 Feb, 2022 1 commit

Add MultiTensorApply to calculate L2-Norm in DistributedFusedLamb optimizer (#39900) · d32a0102

Committed by sneaxiy on Feb 25, 2022

* add multi tensor apply l2 norm

* add multi_tensor_apply code

* make sizeof(TensorMeta) smaller

* move code to distributed_fused_lamb_op.cu

* remove useless FLAGS

d32a0102

22 Feb, 2022 1 commit

change Vector to std::vector and provide MixVector class as a helper … (#39559) · 728c0624

Committed by xiongkun on Feb 22, 2022

* change Vector to std::vector and provide MixVector class as a helper wrapper class

* solve the multi-gpu hang problem

* remove the duplicate template instantiation

* Copy vector to cpu

* add CopyToCPU

* xxx

* final version: fix the problem of all reduce

* remove mixvector dependence

* fix

* merge

* fix code

* fix by CI

728c0624

21 Feb, 2022 1 commit

fix alignment bug (#39747) · 65ced1fa

Committed by sneaxiy on Feb 21, 2022

65ced1fa

20 Feb, 2022 1 commit

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

dcfe1986

19 Feb, 2022 2 commits

[Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264

Committed by Aurelius84 on Feb 19, 2022

* Unify paddle/pten::framework::ddim into pten::ddim

* fix paddle namespace

* compile successfully

* fix npu src file

* fix conflict

* fix conflict

* fix tensorrt compiler error

* fix conflict

* fix conflict

* fix test file conflict

* fix conflict

* fix mlu file conflict

* fix mlu file conflict

* fix cinn header file conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict

2fe04264

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

Committed by sneaxiy on Feb 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* compatible with pten changes

* fix rocm compile error

* improve coverage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

5df3cd61

15 Feb, 2022 2 commits

move algorithm.h (#39502) · 7eb9593e

Committed by Feiyu Chan on Feb 15, 2022

Move paddle/fluid/operators/math/algorithm.h to paddle/pten/kernels/funcs and rename all references to symbols in it.

7eb9593e

[PTen]Migrate proto::VarType outside of Pten (#39411) · 7e7e9404

Committed by Aurelius84 on Feb 15, 2022

* #1 migrate dist-related type()-> dtype()

* move datatype function from pten -> fluid/framework

* change type() in imperative into convert(dtype())

* modify xx_tensor->type into xx_tensor->dtype

* change the set_type interface and the caller

* modify xx_tensor.type into xx_tensor.dtype

* fix mutable_data(place, dtype())

* change caller of mutable_data in pten and distributed

* change the caller of mutable_data in fluid/framework

* change the caller of mutable_data in imperative directory

* mutable_data: inference

* update the call of mutable_data

* transfer MakePenScalarArray MakePtenScalar ResetHolderWithType

* pass the compile. the next step is remove VarType in Pten

* fix all and remove VarType from pten. success in linux. Next task is other platform

* fix conflict with develop

* fix compiled error

* Fix reset conversion

* fix conflict

* fix compiled problem

* fix typo

* Fix << in tensor_utils.cc

* fix type->dtype

* fix unittest

* fix tensor init constructor

* fix DataTypeSize for BFloat16

* fix code style

* fix npu compiled error

* fix npu

* compile npu successfully

* fix conflict

* fix conflict

Co-authored-by: xiongkun <xiongkun03@baidu.com>

7e7e9404

11 Feb, 2022 1 commit

[Pten] move operators/math/math_function_* to pten/kernels/func (#39300) · d25a7f9e

Committed by Feiyu Chan on Feb 11, 2022

* move operators/math/math_function_* to pten/kernels/func

* namespace from `paddle::operators::math` to `pten::funcs`

d25a7f9e

09 Feb, 2022 2 commits

[mlu] add mlu kernel for momentum op (#39331) · f8ba12e5

Committed by fwenguang on Feb 09, 2022

f8ba12e5

Replace EagerTensor with Tensor (#39376) · 945a3ce9

Committed by Jiabin Yang on Feb 09, 2022

* merge legacy to fluid

* Remove legacy code

* Remove legacy code

* Remove DataType test

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

945a3ce9

07 Feb, 2022 1 commit

Added Adam FP32 JIT assembly kernel (#39158) · ebd14743

Committed by jakpiase on Feb 07, 2022

* Added adam kernel

* CI rerun

ebd14743

PaddlePaddle / Paddle synced successfully about 1 year ago
