Commits · 406f1b9650e072abf04ac7572b24d89399d98343 · PaddlePaddle / Paddle

Feb 28, 2022 · 9 commits

Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update HostTracer

* fix

* update

* update
Co-authored-by: liutiexing <liutiexing@google.com>

406f1b96

[bf16] Refine BF16 amp-o1 logic (#39815) · 18ee051e
Committed by zhangbo9674 on Feb 28, 2022
```
* refine bf16 amp-o1 logic

* refine amp GLOG

* refine unittest

* refine unittest
```
18ee051e

[PHI] adjust the empty kernel and dev_api (#39958) · d1595c26

Committed by zyfncg on Feb 28, 2022

* remove empty kernel in fluid and adjust the param of empty dev_api

* polish code

* revert fluid empty kernel

d1595c26


infrt add trt engine (#39885) · 27536a32
Committed by Wilber on Feb 28, 2022

27536a32

[Pten] Support optional param for C++ API (#39760) · aceb25e1

Committed by zyfncg on Feb 28, 2022

* fix selected_rows bug in C++ API

* add optional for C++ API

* data transform support optional

* remove data transform for optional vector<Tensor>

* adjust some format of function

* fix empty bug

aceb25e1


fix ps_gpu_wrapper (#39965) · bd9b9460
Committed by zmxdream on Feb 28, 2022

bd9b9460
add new profiler components (#39964) · d4ae1775
Committed by chenjian on Feb 28, 2022
```
* add new profiler components

* fix bug
```
d4ae1775

[KP] Unify .cu and .xpu files with .kps files (#39917) · 0ff72e5d

Committed by Liu-xiandong on Feb 28, 2022

* [KP] Unify .cu and .xpu files with .kps files

* fix CI bug in GPU and modify the list

* fix conflict

* modify the date

0ff72e5d

[Phi] Add ClearHolder when re-alloc on new place in DeviceContext (#39833) · 2753c16f

Committed by Aurelius84 on Feb 28, 2022

* [Phi] Add ClearHolder when re-alloc on new place in DeviceContext

* fix hostAlloc

* fix inferRT unittest

* remove dev_ctx ptr

2753c16f

Feb 27, 2022 · 1 commit
- fix pylayer problem with amp (#39950) · 282e09dc
  Committed by Leo Chen on Feb 27, 2022
```
* fix pylayer problem with amp

* add ut

* refine code
```
  282e09dc
Feb 26, 2022 · 7 commits


revert reshape op infershape (#39946) · b33a3c23
Committed by YuanRisheng on Feb 26, 2022

b33a3c23

[Pten] Refactor the copy kernel (#39731) · 9a7b9eda

Committed by zyfncg on Feb 26, 2022

* remove SetAllocationForOutputTenosr

* add place param for copy kernel

* recover SetAllocationForOutputTenosr

* polish code

* fix empty_dev api bug

* test=allcases

* test=allcases

* fix bug

* recover empty

* recover modify

9a7b9eda

Move GumbelSoftmax OP to phi (#39873) · 581b2c64

Committed by From00 on Feb 26, 2022

* Move GumbelSoftmax OP to phi

* platform::errors -> phi::errors; GumbelSoftmaxGradInferMeta -> backend.h/cc

* Use axis util in kernel impl

* Remove namespace platform::errors

* Use GetCPUEngine in Device Context

581b2c64

Support custom implement for C++ API (#39521) · caea126c

Committed by zyfncg on Feb 26, 2022

* Support custom implement for C++ API

* rename api_invoke_impl to api_custom_impl

* remove manual_api

* delete mutable_data in copy_to api

* fix problem of copy_to

* add unittest for infer_meta_fn_factory

* fix split config in yaml

* fix split config in yaml

* modify sum api yaml

* add copy_to wrapped infermeta

* rollback copy impl

caea126c

Move BilinearTensorProduct OP to phi (#39903) · de8f2748
Committed by From00 on Feb 26, 2022
```
* Move BilinearTensorProduct OP to phi

* Set dtype for Infermeta
```
de8f2748
[Eager Hook] Support GradientHook and ReduceHook, expose related interface to python (#39893) · a456dda6
Committed by Weilong Wu on Feb 26, 2022
```
* Support Eager Hook, expose interface to python

* Fix CI issue
```
a456dda6

fix mkldnn softmax error (#39951) · ab872efe
Committed by Chen Weihang on Feb 26, 2022

ab872efe

Feb 25, 2022 · 23 commits
- move for_range into phi (#39931) · 94d8f392
  Committed by Chen Weihang on Feb 25, 2022
  
  94d8f392
- [phi] update code for mkl based fft (#39889) · 687902fc
  Committed by Feiyu Chan on Feb 25, 2022
  
  687902fc
- added logsoftmax oneDNN kernel (#39793) · 584844ec
  Committed by jakpiase on Feb 25, 2022
  
  584844ec
- Add MultiTensorApply to calculate L2-Norm in DistributedFusedLamb optimizer (#39900) · d32a0102
  Committed by sneaxiy on Feb 25, 2022
```
* add multi tensor apply l2 norm

* add multi_tensor_apply code

* make sizeof(TensorMeta) smaller

* move code to distributed_fused_lamb_op.cu

* remove useless FLAGS
```
  d32a0102
- move eye, size, erfinv, pixel_shuffle OP to phi (#39712) · 639675de
  Committed by 0x45f on Feb 25, 2022
```
* move eye OP to pten

* move size OP to pten

* merge develop

* fix merge

* move files

* move erfinv OP to phi

* remove comment

* move pixel_shuffle OP to phi

* remove comment

* fix PT_REGISTER

* fix NPU

* fix CR

* remove size_sig.cc for PR-CI-Coverage
```
  639675de
- Disable dist ut cases (#39906) · 4fe465cb
  Committed by YUNSHEN XIE on Feb 25, 2022
```
* disable some distribute test case when in CPU test env

* disable some test case when in CPU test env

* fix
```
  4fe465cb
- Fix conflict caused by wrong namespace (#39930) · d8fc7211
  Committed by Zhang Zheng on Feb 25, 2022
  
  d8fc7211
- [phi]migrate increment addmm multinomial cholesky InferShapes to phi (#39913) · 87b903a3
  Committed by Aganlengzi on Feb 25, 2022
```
* [phi]migrate increment addmm multinomial cholesky InferShapes to phi

* set_dtype and mod MultinomialFunctor
```
  87b903a3
- [ROCm] fix Managed Memory Alloc on HIP, test=develop (#39896) · 37cb6f32
  Committed by Qi Li on Feb 25, 2022
```
* [ROCm] fix Managed Memory Alloc on HIP, test=develop

* update, test=develop
```
  37cb6f32
- move diag_v2 to phi (#39914) · 783c4aba
  Committed by Linjie Chen on Feb 25, 2022
  
  783c4aba
- [MLU]support launch process on mlu (#39839) · 2533cac6
  Committed by zn on Feb 25, 2022
  
  2533cac6
- replace implementation with cuda kernel (#39795) · 64f1485a
  Committed by Zhang Ting on Feb 25, 2022
  
  64f1485a
- Optimize perf of softmax_with_cross_entropy (#39553) · bbe5228c
  Committed by Zhang Zheng on Feb 25, 2022
```
* Optimize perf of softmax_with_cross_entropy

* fix

* fix

* fix accuracy error
```
  bbe5228c
- [bf16] add bf16 kernel: elementwise_add elementwise_mul elementwise_sub (#39716) · 2fedd39b
  Committed by zhangbo9674 on Feb 25, 2022
```
* add ele_add

* add ele_mul

* add ele_sub

* solve conflict

* fix npu

* refine ele_add

* add ele_mul unittest

* refine ele_sub

* refine ci

* refine unittest
```
  2fedd39b
- [Phi] mv kernel (#39861) · 2553af4f
  Committed by furnace on Feb 25, 2022
```
[Phi] mv kernel 
```
  2553af4f
- [phi] refine code of randint, randperm, unbind kernel (#39909) · 22f84122
  Committed by Leo Chen on Feb 25, 2022
```
* refine randint kernel

* refine randperm kernel

* refine unbind kernel

* support op seed
```
  22f84122
- add reduce_min and reduce_max (#39899) · 44da9b42
  Committed by joeqiao12 on Feb 25, 2022
  
  44da9b42
- [Phi] Support cudnn kernel moving & move softmax kernels (#39547) · 8895379a
  Committed by Chen Weihang on Feb 25, 2022
```
* support cudnn kernel moving

* polish cmake rules

* add unittest for coverage

* remove orig kernel

* remove softmax cudnn kernel

* fix softmax test failed

* fix npu func error

* resolve conflict

* rename gpu dnn kernels

* fix name rule error

* fix compile error

* update fp16 namespace
```
  8895379a
- [Bug Fixes]Fix Bugs when construct infermeta by using shape(Vector<Tensor>) (#39904) · fed6de40
  Committed by YuanRisheng on Feb 25, 2022
```
* fix bugs

* fix bugs
```
  fed6de40
- [Fix bug] fix fp16 atomicAdd compiler error on different cuda_arch. (#39886) · ef96ffb6
  Committed by Li Min on Feb 25, 2022
```
* Fix compile error on cuda_arch less than 700.
```
  ef96ffb6
- [MLU] add elementwise_mul mlu kernel (#39864) · 04d324b2
  Committed by fwenguang on Feb 25, 2022
  
  04d324b2
- Fix a bug in IndexKernel data overflow (#39891) · 0615815d
  Committed by niuliling123 on Feb 25, 2022
  
  0615815d
- Fixed Python-C AutoCodeGen issues (#39897) · b56ac35c
  Committed by Zhanlue Yang on Feb 25, 2022
  
  b56ac35c
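
Commit d32a0102 (#39900) computes an L2 norm inside the DistributedFusedLamb optimizer with a MultiTensorApply helper. The sketch below illustrates only the general multi-tensor-apply pattern in plain Python; it is not Paddle's CUDA implementation, and the function name `multi_tensor_l2_norm` and its `chunk_size` parameter are hypothetical. Every tensor is cut into fixed-size chunks, each chunk yields a partial sum of squares (roughly the unit of work one GPU block would take in a fused launch), and the partials are reduced into a single global norm.

```python
import math
from typing import List, Sequence


def multi_tensor_l2_norm(tensors: Sequence[Sequence[float]], chunk_size: int = 4) -> float:
    """Global L2 norm over many flat tensors, computed chunk by chunk.

    Hypothetical illustration of the multi-tensor-apply idea, not Paddle code:
    each tensor is split into fixed-size chunks, every chunk contributes one
    partial sum of squares, and the partials are reduced at the end.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    partials: List[float] = []
    for tensor in tensors:
        # One "work item" per chunk, as a fused kernel would schedule it.
        for start in range(0, len(tensor), chunk_size):
            chunk = tensor[start:start + chunk_size]
            partials.append(sum(v * v for v in chunk))
    # Final reduction of all partial sums, then the square root.
    return math.sqrt(sum(partials))
```

For example, `multi_tensor_l2_norm([[3.0], [4.0]])` returns `5.0`. Grouping the chunks of many small parameter tensors into one pass, rather than launching one reduction per tensor, is the usual motivation for this pattern on GPUs.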
