Commits · 2ec943a7d29cace9ac5b36d3d6c9da3fedd99da5 · BaiXuePrincess / Paddle

Feb 24, 2022 · 4 commits

Added nearest interp v2 BF16 FWD kernel (#39490) · 2ec943a7

Committed by jakpiase on Feb 24, 2022

* added nearest interp v2 bf16

* disabled bilinear interp nhwc test

* added skipping UT for gpu

* added NHWC support

* removed unnecessary statements

* minor change

* CI fix

* added appropriate changes to interpolate_v1

* fix after review

* minor change

* minor change

* revert unwanted deletions

* CI fix

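Several commits in this log add BF16 kernels. BF16 (bfloat16) keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of an IEEE-754 float32, so converting a float32 value amounts to dropping its low 16 bits with rounding. The Python sketch below shows a common round-to-nearest-even conversion; the function names are hypothetical and this is not Paddle's own conversion code.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round a float32-representable value to bfloat16 and return its 16-bit pattern.

    Hypothetical helper: uses round-to-nearest-even on the discarded low 16 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: truncate and force the quiet bit so the result stays a NaN
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Rounding bias: 0x7FFF, plus 1 when the lowest kept bit is odd (ties to even)
    lsb = (bits >> 16) & 1
    bits += 0x7FFF + lsb
    return (bits >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

Widening back is exact because it only appends zero bits, so in this model precision is lost only at the point where values are rounded to bf16.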

Optimize where_op and abs_grad_op by the elementwise interface (#39609) · c9699556
Committed by huangxu96 on Feb 24, 2022
```
* Optimize the where_op by the elementwise_op funtion

* Modified where_op & abs_grad_op by elementwise interface
```
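This commit rewrites where_op and abs_grad_op on top of a shared elementwise interface. As an illustration only (Python, hypothetical helper names, no broadcasting, and sign(0) taken as 0 by assumption), both ops reduce to a small per-element functor applied across aligned inputs:

```python
from typing import Callable, List, Sequence


def elementwise(fn: Callable[..., object], *inputs: Sequence) -> List:
    """Apply fn to aligned elements of equally sized inputs (hypothetical helper)."""
    n = len(inputs[0])
    if any(len(t) != n for t in inputs):
        raise ValueError("all inputs must have the same length in this sketch")
    return [fn(*vals) for vals in zip(*inputs)]


def where(cond: Sequence[bool], x: Sequence[float], y: Sequence[float]) -> List[float]:
    # Ternary functor: take x where cond holds, otherwise y
    return elementwise(lambda c, a, b: a if c else b, cond, x, y)


def abs_grad(x: Sequence[float], dout: Sequence[float]) -> List[float]:
    # d|x|/dx = sign(x); sign(0) is taken as 0 here (an assumption)
    return elementwise(lambda xv, g: g * ((xv > 0) - (xv < 0)), x, dout)
```

The appeal of the pattern is that the loop is written once and each op only supplies its functor.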

Fix a bug in IndexKernel out-of-memory (#39867) · 2136bd42
Committed by niuliling123 on Feb 24, 2022

optimize performance of lookup_table_v2_op (#39856) · d6038c22
Committed by Li Min on Feb 24, 2022
```
* optimize block config  and fp16 atomicAdd perf for lookup_table_v2_grad.
```
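For context on the lookup_table_v2_grad change: the backward pass of an embedding lookup scatters each output-gradient row into the weight-gradient row selected by its id. Repeated ids make different threads update the same row, which is why the GPU kernel relies on atomic adds (including FP16 ones). A plain Python sketch of that accumulation, ignoring padding_idx and sparse gradients; the function name is hypothetical:

```python
from typing import List, Sequence


def embedding_weight_grad(ids: Sequence[int],
                          out_grad: Sequence[Sequence[float]],
                          vocab_size: int) -> List[List[float]]:
    """Accumulate out_grad rows into a dense [vocab_size, dim] weight gradient."""
    dim = len(out_grad[0]) if out_grad else 0
    weight_grad = [[0.0] * dim for _ in range(vocab_size)]
    for row, idx in enumerate(ids):
        if not 0 <= idx < vocab_size:
            raise IndexError(f"id {idx} out of range for vocab_size {vocab_size}")
        for col in range(dim):
            # On a GPU, rows sharing the same idx may be processed concurrently,
            # so this += would have to be an atomic add.
            weight_grad[idx][col] += out_grad[row][col]
    return weight_grad
```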

Feb 23, 2022 · 11 commits

move trunc_op's infere shape to phi (#39772) · 95280a36
Committed by Sing_chan on Feb 23, 2022
```
* move trunc_op's infere shape

* modify according to risheng's comment
```

[phi] move randperm to phi (#39816) · 30992ea0
Committed by Leo Chen on Feb 23, 2022
```
* move randperm to phi

* fix npu

* fix memory::Copy
```

[Phi] move flip op to phi kernel (#39822) · ad294a81
Committed by Yang on Feb 23, 2022

change CUDA implementaion of bernoulli OP (#39732) · b9675acc
Committed by zhouweiwei2014 on Feb 23, 2022
```
* change CUDA implementaion of bernoulli OP

* fix CI
```

[phi] migrate atan2_op into phi (#39806) · b089e7cd
Committed by ronnywang on Feb 23, 2022


[phi] move unbind to phi (#39789) · dba694f4

Committed by Leo Chen on Feb 23, 2022

* move unbind to phi

* revert infer shape

* add header file

* move concat_and_split to phi


[KP] Add elementwise add xpu after phi, test=develop (#39787) · 1a1a2ce8

Committed by Liu-xiandong on Feb 23, 2022

* [KP] Add elementwise add xpu, test=develop

* modify the File Permissions

* modify the copyright time

* modify code style

* modify code style


[Phi] Migrate lable_smooth_op into Phi (#39796) · b7bcd0f6
Committed by Aurelius84 on Feb 23, 2022
```
* [Phi] Migrate lable_smooth_op into Phi

* fix PT->PD
```
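The label_smooth op migrated in this commit mixes a label vector with a prior distribution: out = (1 - epsilon) * label + epsilon * prior, where the prior defaults to a uniform distribution over the classes. A single-sample Python sketch of that formula (the function name is hypothetical, and batching is left out):

```python
from typing import List, Optional, Sequence


def label_smooth(label: Sequence[float], epsilon: float,
                 prior_dist: Optional[Sequence[float]] = None) -> List[float]:
    """Smooth one sample's label vector toward prior_dist (uniform if omitted)."""
    num_classes = len(label)
    if prior_dist is None:
        prior_dist = [1.0 / num_classes] * num_classes
    if len(prior_dist) != num_classes:
        raise ValueError("prior_dist must have one entry per class")
    # Keep (1 - epsilon) of the original label and spread epsilon according to the prior
    return [(1.0 - epsilon) * y + epsilon * p for y, p in zip(label, prior_dist)]
```

For a one-hot label with a uniform prior the result still sums to 1, since both mixed vectors do.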

[bf16] add bf16 kernel: elementwise_div (#39602) · ca4df333

Committed by zhangbo9674 on Feb 23, 2022

* add elementwise_div

* refine rocm

* refine code

* refine op register

* solve conflict

* refine unittest

* refine unittest precision

* add rocm


Update record interface using part3 (#39695) · 1fcaab45

Committed by chenjian on Feb 23, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update record event interface using

* update record event interface using

* update operator.cc

* update part2

* update part1

* update part3

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

* fix profiler.h header

* fix profiler.h header

* fix merge buf

* update

* fix bug

* fix bug


[PHI] Remove fill_any_like kernel register in fluid (#39807) · 69e9e9d5

Committed by zyfncg on Feb 23, 2022

* remove fill_any_like kernel in fluid and fix data transform bug

* support scalar in infershpe

* recover infershape in fill_and_like


Feb 22, 2022 · 7 commits

Move real and imag op to phi (#39777) · 345cc8fa

Committed by From00 on Feb 22, 2022

* Move Real OP to phi

* Move Imag OP to phi

* Move Real and Imag InferShape to phi

* Move Real and Imag to complex_kernel

* Change PT_REGISTER_XXX to PD_REGISTER_XXX


added round fwd onednn kernel (#39653) · 74c0bc1c
Committed by jakpiase on Feb 22, 2022


Adapt to batch_norm_grad op and add align function in roi_align op for kunlun (#39685) · f33ae206

Committed by Leo Guo on Feb 22, 2022

* Adapt to batch_norm_grad op and add align function in roi_align op for kunlun, *test=kunlun

* Adapt to batch_norm, batch_norm_grad op api for kunlun, and add unit-tests of batch_norm, roi_align. *test=kunlun


change Vector to std::vector and provide MixVector class as a helper … (#39559) · 728c0624

Committed by xiongkun on Feb 22, 2022

* change Vector to std::vector and provide MixVector class as a helper wrapper class

* solve the multi-gpu hang problem

* remove the duplicate template instantialize

* Copy vector to cpu

* add CopyToCPU

* xxx

* final version: fix the problem of all reduce

* remove mixvector dependence

* fix

* merge

* fix code

* fix by CI


[Phi] Migrate unfold_op into phi (#39778) · 1aa67778

Committed by Aurelius84 on Feb 22, 2022

* [Phi] Migrate unfold_op into phi

* fix im2col CPUContext template instantial

* fix unfold_op.h header include problem

* fix unittest

* fix PT->PD


update unittests for nearest_interp_v2_op_xpu: 'sync' from gpu. test=kunlun (#39768) · e89bf25b
Committed by houj04 on Feb 22, 2022

Modified RandomKernel with Kernel Primitive API (#39666) · 9f94821b
Committed by niuliling123 on Feb 22, 2022
```
* Modified RandomKernel with Kernel Primitive API

* update pten.h to phi.h

* update

* update fullKernel
```

Feb 21, 2022 · 9 commits

Move Abs InferShape to phi (#39762) · 9c51eee1
Committed by From00 on Feb 21, 2022
```
* Move Abs InferShaper to phi

* Fix CI error
```

[Pten] Migrate huber_loss into phi (#39761) · 6aafb2fa
Committed by Aurelius84 on Feb 21, 2022
```
* migrate huber_loss into phi

* migrate infershape

* modify pten into phi
```

[Dy2St]Fix cond grad error when handle tensor array (#39689) · a863b32e
Committed by 0x45f on Feb 21, 2022
```
* fix cond grad error when handle tensor array

* add UT
```

[pten]rm reduce_sum and reduce_mean raw kernel (#39484) · 2bb5aae8
Committed by chentianyu03 on Feb 21, 2022
```
* rm reduce_sum raw kernel

* remove reduce_mean kernel

* remove reduce_mean kernel

* reduce support int and int64_t

* mean support int and int64_t type
```

[bf16] add bf16 kernel: elementwise_max (#39461) · 93016331
Committed by zhangbo9674 on Feb 21, 2022
```
* add elementwise_max & unittest

* refine cuda register and unittest

* refine unittest

* refine uinttest for bf16

* refine optest

* refine code

* refine unittest

* refine unittest
```

[PTen]Remove infershape of Reshape OP (#39631) · 45dd4a5f
Committed by YuanRisheng on Feb 21, 2022
```
* remove infershape and Xshape

* add xshape

* fix bugs when run ci

* fix bugs when run ci

* fix bugs when run infrt test

* pass converage
```

[HeterPS]fix ut for heteps comm op (#39684) · d41836ef
Committed by zmxdream on Feb 21, 2022
```
* fix. test=develop

* fix. test=develop

* fix code style. test=develop

* fix. test=develop

* fix. test=develop
```

fix alignment bug (#39747) · 65ced1fa
Committed by sneaxiy on Feb 21, 2022

fix bug: core when missing range XPU kernel in kunlun2 (#39673) · 496aadfb
Committed by ShiningZhang on Feb 21, 2022

Feb 20, 2022 · 4 commits

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed


add index initialization in the block loop for index_sample kernel when... · c6950ab2

Committed by FlyingQianMM on Feb 20, 2022

add index initialization in the block loop for index_sample kernel when dealing with a input tensor whose shape is larger than block_dim * grid_dim (#39736)

* add block and grid loop for index_sample kernel to deal with a large-shape tensor

* fix code format

* limit grid dim

* fix the omissive initialization of index_i in the second cycle for index_sample kernel

* fix conflicts

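One reading of the last bullet of the index_sample commit is a classic grid-stride pitfall: when a loop over rows wraps a loop over columns, the column index has to be reset at the start of every row iteration, otherwise a thread that has already walked past the last column silently skips all of its later rows. The Python code below is a sequential simulation of that pattern, assuming the op computes out[row][col] = x[row][index[row][col]]. Apart from index_i, the names (including the reset_inner switch) are hypothetical, and the real kernel is CUDA, not this code.

```python
from typing import List, Optional, Sequence, Tuple


def index_sample_grid_stride(x: Sequence[Sequence[float]],
                             index: Sequence[Sequence[int]],
                             block_dim: Tuple[int, int] = (2, 2),
                             grid_dim: Tuple[int, int] = (1, 1),
                             reset_inner: bool = True) -> List[List[Optional[float]]]:
    """Sequentially simulate a 2-D grid-stride kernel: out[j][i] = x[j][index[j][i]]."""
    batch = len(index)
    length = len(index[0]) if batch else 0
    out: List[List[Optional[float]]] = [[None] * length for _ in range(batch)]
    bx, by = block_dim
    gx, gy = grid_dim
    stride_i, stride_j = bx * gx, by * gy  # total threads along each axis
    # Every (block, thread) pair runs the same loop nest; here one after another
    for block_x in range(gx):
        for block_y in range(gy):
            for tx in range(bx):
                for ty in range(by):
                    start_i = block_x * bx + tx  # first column this thread owns
                    index_i = start_i
                    index_j = block_y * by + ty  # first row this thread owns
                    while index_j < batch:
                        if reset_inner:
                            index_i = start_i  # re-initialize for every row
                        while index_i < length:
                            out[index_j][index_i] = x[index_j][index[index_j][index_i]]
                            index_i += stride_i
                        index_j += stride_j
    return out
```

With the default 2 x 2 block and a single block, three rows exceed the two row-threads, so calling it with reset_inner=False leaves the last row unfilled (None), which is the kind of omission the fix addresses.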


Rename the general elementwise and broadcast functions. (#39623) · 553afc07
Committed by Yiqun Liu on Feb 20, 2022

Add int16 support for several ops (#39636) · 267275d9
Committed by sneaxiy on Feb 20, 2022
```
* add more op int16 support

* fix xpu ci
```

Feb 19, 2022 · 5 commits

[Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264

Committed by Aurelius84 on Feb 19, 2022

* Unify paddle/pten::framework::ddim into pten::ddim

* fix paddle namespace

* compile sucessfully

* fix npu src file

* fix conflict

* fix conflict

* fix tensorrt compiler error

* fix conflict

* fix conflict

* fix tesst file conflict

* fix conflict

* fix mlu file conflict

* fix mlu file conflict

* fix cinn header file conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict


[Pten] Add selected_rows kernel for Full (#39465) · 79f8eeca

Committed by zyfncg on Feb 19, 2022

* Add selected_rows kernel for full

* remove fill_constant register in fluid

* fix bug without GPU

* add jit_kernel_helper dependency for fc

* do some refactor

* add unittest for ops signatures

* add coverage unittest

* fix merge conflict

* fix full selectew_rows bug


fix RecordEvent interface (#39675) · 019a552b

Committed by chenjian on Feb 19, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update operator.cc


[Pten] Adjust the params of creation kernel for inference (#39573) · 4e5d6743

Committed by zyfncg on Feb 19, 2022

* remove manual_api

* change sig map of full and empty

* fix fill_any_like_xpu_op

* fix fill_any_like_xpu_op

* fix problem of fill_any_like_xpu_op

* fix conflict

* polish code


Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

Committed by sneaxiy on Feb 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* compatible with pten changement

* fix rocm compile error

* improve converage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci


BaiXuePrincess / Paddle is up to date with the source project it was forked from
