Commits · 2fedd39bcf8fc0d74d693e299d1c11019300fbe7 · BaiXuePrincess / Paddle

25 Feb, 2022 · 10 commits
- [bf16] add bf16 kernel: elementwise_add elementwise_mul elementwise_sub (#39716) · 2fedd39b
  Committed by zhangbo9674 on Feb 25, 2022
```
* add ele_add

* add ele_mul

* add ele_sub

* solve conflict

* fix npu

* refine ele_add

* add ele_mul unittest

* refine ele_sub

* refine ci

* refine unittest
```
- [Phi] mv kernel (#39861) · 2553af4f
  Committed by furnace on Feb 25, 2022
```
[Phi] mv kernel 
```
- add reduce_min and reduce_max (#39899) · 44da9b42
  Committed by joeqiao12 on Feb 25, 2022
- [Phi] Support cudnn kernel moving & move softmax kernels (#39547) · 8895379a
  Committed by Chen Weihang on Feb 25, 2022
```
* support cudnn kernel moving

* polish cmake rules

* add unittest for coverage

* remove orig kernel

* remove softmax cudnn kernel

* fix softmax test failed

* fix npu func error

* resolve conflict

* rename gpu dnn kernels

* fix name rule error

* fix compile error

* update fp16 namespace
```
- [Bug Fixes] Fix bugs when constructing infermeta by using shape(Vector<Tensor>) (#39904) · fed6de40
  Committed by YuanRisheng on Feb 25, 2022
```
* fix bugs

* fix bugs
```
- [Fix bug] fix fp16 atomicAdd compiler error on different cuda_arch. (#39886) · ef96ffb6
  Committed by Li Min on Feb 25, 2022
```
* Fix compile error on cuda_arch less than 700.
```
- [MLU] add elementwise_mul mlu kernel (#39864) · 04d324b2
  Committed by fwenguang on Feb 25, 2022
- Fix a bug in IndexKernel data overflow (#39891) · 0615815d
  Committed by niuliling123 on Feb 25, 2022
- Fixed Python-C AutoCodeGen issues (#39897) · b56ac35c
  Committed by Zhanlue Yang on Feb 25, 2022
- fill_constant_batch_size_like op support fp16 (#39907) · b8cf8ca7
  Committed by WangXi on Feb 25, 2022
24 Feb, 2022 · 21 commits

- [PTen->Phi PR3] Rename pten make target to phi (#39832) · f77019a0
  Committed by Chen Weihang on Feb 24, 2022
```
* rename pten to phi

* fix infrt compile failed

* resolve conflict
```

- [IPU] Update IpuStrategy Python Part (#39646) · e0409c93
  Committed by Allen Guo on Feb 24, 2022
```
* Update IpuStrategy Python Part

* add docs

* add add_custom_op for ipu_strategy

* fix build warning

* rm unneeded part

* clean api

* fix typo

* update option names

* update IpuStrategy doc
```

- [Paddle-Inference] fix special_slice plugin (#39875) · 1255e7d6
  Committed by Wangzheee on Feb 24, 2022
```
* fix plugin: special slice for ernie
```
- [MLU]add mlu kernel for allreduce (#39788) · ce207c3a
  Committed by zn on Feb 24, 2022
- [phi]migrate increment addmm multinomial cholesky kernels to phi (#39858) · b695fd95
  Committed by Aganlengzi on Feb 24, 2022
```
* migrate increment addmm multinomial cholesky kernels to phi

* test pr39869

* test pr39869

* fix style and ci
```
- [phi] move randint to phi (#39872) · 127440c3
  Committed by Leo Chen on Feb 24, 2022
```
* move randint to phi

* use host generator
```

- build a Paddle Graph from CINN compiled program for execution with PE (#39724) · 4d042a83
  Committed by TeFeng Chen on Feb 24, 2022
```
* build a Paddle Graph from CINN compiled program for execution with PE

* update names of some variables

* fix random fail in build_cinn_pass_test and update some comments

* fix compiler error by merging phi pr
```

- Optimize nearest_interp backward (#39067) · df0b4434
  Committed by Lijunhui on Feb 24, 2022
```
* nearest_interp_bw init

* optimize kernel config

* optimize kernel config

* fix struct init

* optimize code

* rm duplicated struct
```

- [Phi]Move cross OP to phi (#39829) · 6c358a7c
  Committed by 0x45f on Feb 24, 2022
```
* move cross forward OP

* move cross grad op to phi

* move infershape

* refine infershape

* rename ctx

* set dtype and layout in InferMeta

* refine code
```

- [phi] move bce_loss to phi (#39868) · 6fc5d88a
  Committed by Linjie Chen on Feb 24, 2022
```
* move bce_loss to phi

* refine PADDLE_ENFORCE

* revert PADDLE_ENFORCE

* fix ci
```
- [Phi] Migrate poisson op into phi (#39814) · bbe441fc
  Committed by zhouweiwei2014 on Feb 24, 2022
```
* Migrate poisson op into phi

* fix CI

* fix comment
```
- config fleet optimize. test=develop (#39849) · 23bbd912
  Committed by zmxdream on Feb 24, 2022

- Added nearest interp v2 BF16 FWD kernel (#39490) · 2ec943a7
  Committed by jakpiase on Feb 24, 2022
```
* added nearest interp v2 bf16

* disabled bilinear interp nhwc test

* added skipping UT for gpu

* added NHWC support

* removed unnecessary statements

* minor change

* CI fix

* added appropriate changes to interpolate_v1

* fix after review

* minor change

* minor change

* revert unwanted deletions

* CI fix
```

- Refactored GradNodeAccumulation data structure and behaviour (#39526) · 1abfc8dd
  Committed by Zhanlue Yang on Feb 24, 2022
```
* Refactored GradNodeAccumulation data structure and behaviour

* Fixed CI issues

* Fix compilation issues

* Fixed minor issues

* Reverted changes for intermediate and OverwriteOutput

* fixed minor issue

* Fixed code format issues

* Fixed CI-Coverage issue

* Fixed CI issues
```

- [Phi] Fix XPU OP segmentation Fault problem (#39827) · 7a7a7cad
  Committed by Aurelius84 on Feb 24, 2022
```
* [Phi] Fix XPU OP segmentation Fault problem

* fix cast_op_xpu in kunlun1

* fix cast_op_xpu in kunlun1
```

- [pten] add optional type for infermeta (#39848) · 94b31f90
  Committed by chentianyu03 on Feb 24, 2022
```
* modify infershape by args_def

* add optional type for infermeta

* add optional type for infermeta

* add optional type for infermeta

* support scalar type

* change OptionalInputAt function to non-template

* support phi::DataType
```

- Fix for split op in BF16 inference (#39548) · 75f91ce4
  Committed by jakpiase on Feb 24, 2022
```
* Fix for split bf16 inference

* added test for pass

* changes after review
```
- Optimize where_op and abs_grad_op by the elementwise interface (#39609) · c9699556
  Committed by huangxu96 on Feb 24, 2022
```
* Optimize the where_op by the elementwise_op function

* Modified where_op & abs_grad_op by elementwise interface
```

- [Eager] save load testcase (#39571) · 6b5749eb
  Committed by wanghuancoder on Feb 24, 2022
```
* eager, test=develop

* fix bug, test=develop

* eager, test=develop

* merge legacy to fluid

* eager, test=develop

* eager, test=develop

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* eager, test=develop

* eager, test=develop

* Use overload instead of template

* Remove legacy code

* Remove legacy code

* selectedrows, test=develop

* Remove DataType test

* eager, test=develop

* eager, test=develop

* support gan, test=develop

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

* refine code

* ptb, test=develop

* Rename all EagerTensor to Tensor

* Rename some EagerTensor to Tensor

* rename EagerTensor to EagerVariable

* eager, test=develop

* eager, test=develop

* eager, test=develop

* eager, test=develop

* add more test

* eager, test=develop

* Support copiable selected rows and merge develop

* save load, eager, test=develop

* save load, eager, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* revert static_runner, test=develop

* EagerTensor to Tensor, test=develop

* refine, test=develop

* refine, test=develop

* clear grad, test=develop

* merge, develop

* merge, develop

* merge, test=develop

* merge, test=develop

Co-authored-by: JiabinYang <360788950@qq.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>
```

- Fix a bug in IndexKernel out-of-memory (#39867) · 2136bd42
  Committed by niuliling123 on Feb 24, 2022
- optimize performance of lookup_table_v2_op (#39856) · d6038c22
  Committed by Li Min on Feb 24, 2022
```
* optimize block config and fp16 atomicAdd perf for lookup_table_v2_grad.
```

23 Feb, 2022 · 9 commits
- Add ProcessGroupNCCL for distributed training (#39737) · 0b205817
  Committed by ShenLiang on Feb 23, 2022
```
* add processgroup_nccl
```
- Support dispensable inputs for eager final state codegen (#39743) · ca11a0e5
  Committed by Zhanlue Yang on Feb 23, 2022
- move trunc_op's infer shape to phi (#39772) · 95280a36
  Committed by Sing_chan on Feb 23, 2022
```
* move trunc_op's infer shape

* modify according to risheng's comment
```
- [phi] move randperm to phi (#39816) · 30992ea0
  Committed by Leo Chen on Feb 23, 2022
```
* move randperm to phi

* fix npu

* fix memory::Copy
```
- [Phi] move flip op to phi kernel (#39822) · ad294a81
  Committed by Yang on Feb 23, 2022
- [Phi] Polish default signature attr and output select impl (#39810) · 64ed92bd
  Committed by Chen Weihang on Feb 23, 2022
```
* polish default sig impl

* revert dispensable out
```
- [MLU] add cncl parallel context and mlu resource pool (#39803) · 6241913b
  Committed by mhhhh1 on Feb 23, 2022
```
* [MLU] add cncl parallel context and mlu resource pool

* [MLU] fix the cncl_context_test
```
- change CUDA implementation of bernoulli OP (#39732) · b9675acc
  Committed by zhouweiwei2014 on Feb 23, 2022
```
* change CUDA implementation of bernoulli OP

* fix CI
```
- [phi] migrate atan2_op into phi (#39806) · b089e7cd
  Committed by ronnywang on Feb 23, 2022

BaiXuePrincess / Paddle is in sync with its fork source project.