提交 · d17961edc0f32f640861db93ed2e8660062ba2b7 · Crayon鑫 / Paddle

01 3月, 2022 1 次提交
- S
  Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972) · d17961ed
  由 sneaxiy 提交于 3月 01, 2022
```
* vectorize lamb kernel

* remove flags, add ut

* remove useless codes

* refine code, add param order
```
  d17961ed
25 2月, 2022 1 次提交

Add MultiTensorApply to calculate L2-Norm in DistributedFusedLamb optimizer (#39900) · d32a0102

由 sneaxiy 提交于 2月 25, 2022

* add multi tensor apply l2 norm

* add multi_tensor_apply code

* make sizeof(TensorMeta) smalller

* move code to distributed_fused_lamb_op.cu

* remove useless FLAGS

d32a0102

22 2月, 2022 1 次提交

change Vector to std::vector and provide MixVector class as a helper … (#39559) · 728c0624

由 xiongkun 提交于 2月 22, 2022

* change Vector to std::vector and provide MixVector class as a helper wrapper class

* solve the multi-gpu hang problem

* remove the duplicate template instantialize

* Copy vector to cpu

* add CopyToCPU

* xxx

* final version: fix the problem of all reduce

* remove mixvector dependence

* fix

* merge

* fix code

* fix by CI

728c0624

21 2月, 2022 1 次提交
- S
  
  fix alignment bug (#39747) · 65ced1fa
  由 sneaxiy 提交于 2月 21, 2022
  
  65ced1fa
20 2月, 2022 1 次提交

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

由 Chen Weihang 提交于 2月 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

dcfe1986

19 2月, 2022 2 次提交

[Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264

由 Aurelius84 提交于 2月 19, 2022

* Unify paddle/pten::framework::ddim into pten::ddim

* fix paddle namespace

* compile sucessfully

* fix npu src file

* fix conflict

* fix conflict

* fix tensorrt compiler error

* fix conflict

* fix conflict

* fix tesst file conflict

* fix conflict

* fix mlu file conflict

* fix mlu file conflict

* fix cinn header file conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict

2fe04264

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

由 sneaxiy 提交于 2月 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* compatible with pten changement

* fix rocm compile error

* improve converage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

5df3cd61

15 2月, 2022 2 次提交

move algorithm.h (#39502) · 7eb9593e

由 Feiyu Chan 提交于 2月 15, 2022

Move paddle/fluid/operators/math/algorithm.h to paddle/pten/kernels/funcs and rename all references to symbols in it.

7eb9593e

[PTen]Migrate proto::VarType outside of Pten (#39411) · 7e7e9404

由 Aurelius84 提交于 2月 15, 2022

* #1 migrate dist-related type()-> dtype()

* move datatype function from pten -> fluid/framework

* change type() in imperative into convert(dtype())

* modify xx_tensor->type into xx_tensor->dtype

* change the set_type interface and the caller

* modify xx_tensor.type into xx_tensor.dtype

* fix mutable_data(place, dtype())

* change caller of mutable_data in pten and distributed

* change the caller of mutable_data in fluid/framework

* change the caller of mutable_data in imperative directory

* mutable_data: inference

* update the call of mutable_data

* transfer MakePenScalarArray MakePtenScalar ResetHolderWithType

* pass the compile. the next step is remove VarType in Pten

* fix all and remove VarType from pten. success in linux. Next task is other platform

* fix conflict with develop

* fix compiled error

* Fix reset conversion

* fix conflict

* fix compiled problem

* fix typo

* Fix << in tensor_utils.cc

* fix type->dtype

* fix unittest

* fix tensor init constructor

* fix DataTypeSize for BFloat16

* fix code style

* fix npu compiled error

* fix npu

* compile npu sucessfully

* fix conflict

* fix conflict
Co-authored-by: Nxiongkun <xiongkun03@baidu.com>

7e7e9404

11 2月, 2022 1 次提交
- F
  [Pten] move operators/math/math_function_* to pten/kernels/func (#39300) · d25a7f9e
  由 Feiyu Chan 提交于 2月 11, 2022
```
* move operators/math/math_function_* to pten/kernels/func
* namespace from `paddle::operators::math` to `pten::funcs`
```
  d25a7f9e
09 2月, 2022 2 次提交

F

[mlu] add mlu kernel for momentum op (#39331) · f8ba12e5
由 fwenguang 提交于 2月 09, 2022

f8ba12e5

Replace EagerTensor with Tensor (#39376) · 945a3ce9

由 Jiabin Yang 提交于 2月 09, 2022

* merge legacy to fluid

* Remove legacy code

* Remove legacy code

* Remove DataType test

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

945a3ce9

07 2月, 2022 1 次提交
- J
  Added Adam FP32 JIT assembly kernel (#39158) · ebd14743
  由 jakpiase 提交于 2月 07, 2022
```
* Added adam kernel

* CI rerun
```
  ebd14743
27 1月, 2022 1 次提交
- F
  
  move math_cuda_utils.h to pten/kernels/funcs (#39246) · 809a10b6
  由 Feiyu Chan 提交于 1月 27, 2022
  
  809a10b6
25 1月, 2022 1 次提交

[Move selected_rows PR ] Change the relationship of [include/Cmake]. (#39128) · 2bafd338

由 Weilong Wu 提交于 1月 25, 2022

* Added selected_rows and rw_lock to pten

* Renamed the unit test target to fix CI

* Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid

* Remove rw_lock.h,rw_lock_test.cc in fluid

* Use pten::RWLock and pten::AutoRDLock, fix CI

* Use pten::SelectedRows

* Use pten::SelectedRows

* Fix to pass NPU CI

* Use pten::SelectedRows, to pass NPU CI

* To fix NPU CI

* To fix NPU CI again

2bafd338

24 1月, 2022 2 次提交

[Pten] Migration of eigen numeric extensions and functors in paddle/fluid/operatos/eigen (#39124) · a1e40dc6

由 Feiyu Chan 提交于 1月 24, 2022

* migration of functors in paddle/fluid/operators/eigen and paddle/fluid/platform/eigen_ext.h
* update path of data types like float16.h in includes in extensions.h

a1e40dc6

support sparse of adam, *test=kunlun (#38483) · e106901e

由 z8hanghuan 提交于 1月 24, 2022

* support sparse of adam, *test=kunlun

* add pre-commit-config.yaml

* support sparse of adam in KL2,*test=kunlun

* support sparse of adam in KL2, *test=kunlun

* modify xpu.cmake, *test=kunlun

* support sparse of adam, rm some wait, *test=kunlun

* support sparse of adam, rm some wait, *test=kunlun

* support sparse of adam, *test=kunlun

* support sparse of adam, *test=kunlun

* support sparse of adam, *test=kunlun

* support sparse of adam, *test=kunlun

* support sparse of adam, *test=kunlun

e106901e

21 1月, 2022 1 次提交
- W
  
  Renamed selected_rows.* -> selected_rows_utils.* (#39037) · 814e5ab4
  由 Weilong Wu 提交于 1月 21, 2022
  
  814e5ab4
20 1月, 2022 1 次提交
- Z
  Fix master weight bug for multi_tensor optimizer(momentum, adam) (#38991) · 6b0c57cf
  由 zhangbo9674 提交于 1月 20, 2022
```
* fix mp

* support merged_momentum for mp
```
  6b0c57cf
18 1月, 2022 1 次提交

[Unify Tensors PR ] Merged Tensor into DenseTensor, test=allcases (#38914) · 2052f1e3

由 Zhanlue Yang 提交于 1月 18, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Patched python level LoDTensor

* Merge Tensor into DenseTensor

* Fixed namespace issues,test=allcases

* Fixed merge issues

* Fixed inference issues

* Fixed NPU test issues

* Fixed merge issues

2052f1e3

17 1月, 2022 1 次提交
- S
  
  add squared_l2_norm (#38968) · 6eeb16b8
  由 sneaxiy 提交于 1月 17, 2022
  
  6eeb16b8
10 1月, 2022 1 次提交

[Unify Tensors PR ] framework::Tensor inherits from DenseTensor,test=allcases (#38632) · 5c73a6ea

由 Zhanlue Yang 提交于 1月 10, 2022

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes too pten_layout() interface

* Removed friend classes

* Rearranged cfunction calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues

5c73a6ea

07 1月, 2022 1 次提交

Add multi tensor for adam (#38010) · fb3313e9

由 zhangbo9674 提交于 1月 07, 2022

* add multi tensor for adam

* add merged_adam op

* refine code

* refine adam compute logic

fb3313e9

29 12月, 2021 1 次提交
- S
  
  fix lamb beta1pow beta2pow update (#38518) · 3672480b
  由 sneaxiy 提交于 12月 29, 2021
  
  3672480b
28 12月, 2021 1 次提交
- G
  
  fix adamw epsilon in cuda kernel (#37746) · 6f1bb3d6
  由 Guoxia Wang 提交于 12月 28, 2021
  
  6f1bb3d6
24 12月, 2021 1 次提交
- Z
  
  [AMP] Add multi_precision for sgd (#38231) · a4d07bb9
  由 zhangbo9674 提交于 12月 24, 2021
  
  a4d07bb9
17 12月, 2021 1 次提交

Refine some AMP operators for BERT (#37923) · d80fe268

由 sneaxiy 提交于 12月 17, 2021

* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci

d80fe268

03 12月, 2021 1 次提交
- R
  refine structure for cuda and rocm (#37202) · a6d2fddb
  由 ronnywang 提交于 12月 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
  a6d2fddb
30 11月, 2021 1 次提交

[opt] Add regularation and Nesterov for mergerd_momentum op (#37527) · c8ffdecb

由 zhangbo9674 提交于 11月 30, 2021

* add regularation and Nesterov for mergerd_momentum

* refine unittest for use_nesterov attr

* refine op check

* refine code

* fix bug

* refine code of regularization_flag

* delete useless code

c8ffdecb

29 11月, 2021 1 次提交
- P
  
  Add third batch of deprecated mkldnn namespace name changes (#37558) · 1ba81500
  由 piotrekobiIntel 提交于 11月 29, 2021
  
  1ba81500
27 11月, 2021 1 次提交

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

由 Aganlengzi 提交于 11月 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enfoce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

72241a6a

17 11月, 2021 1 次提交
- L
  copy beta pow to same place when skip_update=1 (#37245) · 5e4b419b
  由 Leo Chen 提交于 11月 17, 2021
```
* copy beta pow to same place when skip_update=1

* fix xpu
```
  5e4b419b
20 10月, 2021 1 次提交
- Z
  
  fix pow2 decay (#36559) · 605e7f08
  由 Zeng Jinle 提交于 10月 20, 2021
  
  605e7f08
19 10月, 2021 1 次提交

Add pow2_decay_with_linear_warmup op (#36421) · 305b99a0

由 Zeng Jinle 提交于 10月 19, 2021

* add pow2_warmup op

* remove contrib __all__

* add AttrT

* rename

* follow comments

* fix duplicate PADDLE_RESTRICT

305b99a0

17 10月, 2021 1 次提交
- Z
  
  refine rescale_grad (#36490) · 4e036fa1
  由 Zeng Jinle 提交于 10月 17, 2021
  
  4e036fa1
15 10月, 2021 2 次提交
- Z
  Remove wrong __restrict__ of CUDA LarsMomentumOpKernel (#36460) · adb80494
  由 Zeng Jinle 提交于 10月 15, 2021
```
* remove wrong restrict

* remove master_param_out __restrict__

* update
```
  adb80494
- Z
  
  fix momentum ops (#36452) · 4dda18a8
  由 Zeng Jinle 提交于 10月 15, 2021
  
  4dda18a8
14 10月, 2021 3 次提交
- Z
  
  fix lars (#36431) · 8256f6fa
  由 Zeng Jinle 提交于 10月 14, 2021
  
  8256f6fa
- Z
  
  refine merge lars (#36428) · 63fd7d66
  由 Zeng Jinle 提交于 10月 14, 2021
  
  63fd7d66
- Z
  Merge momentum ops/kernels (#36380) · f4eda869
  由 Zeng Jinle 提交于 10月 14, 2021
```
* merge momentum ops

* update

* add ut to improve coverage

* remove optimizer change

* fix error msg

* update ut

* add __restrict__ for CUDA

* update ut

* move merged_momentum_op to optimizer dir

* fix coverage
```
  f4eda869

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致