Commits · 1178f153a830670c48c5a9fff2966155a007214e · PaddlePaddle / Paddle

25 Apr, 2022 1 commit
- fix variant compile error (#42203) · 1178f153
  Committed by Chen Weihang on Apr 25, 2022
  
  1178f153
23 Apr, 2022 1 commit

[Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT (#42138) · 79ac8870

Committed by Aurelius84 on Apr 23, 2022

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT

79ac8870

22 Apr, 2022 1 commit
- Add Sparse BatchNorm and fix two bugs (#42013) · 8a6456db
  Committed by zhangkaihuo on Apr 22, 2022
  
  8a6456db
21 Apr, 2022 1 commit
- Support FP16 argmax/argmin kernel (#42038) · 7003dcaa
  Committed by sneaxiy on Apr 21, 2022
```
* support int16 argmax kernel

* add fp16 test
```
  7003dcaa
20 Apr, 2022 1 commit

[PaddlePaddle Hackathon 2] 9. Add logspace API to Paddle (#41261) · a3c50c42

Committed by BrilliantYuKaimin on Apr 20, 2022

* Add the logspace operator description

* Add shape inference for logspace

* Add the logspace kernel implementation

* Add the logspace interface in Python

* Add unit tests for logspace

* Add logspace

* Update logspace_kernel.cu

* Update logspace_op.cc

* Adjust code formatting

* Update doc of logspace

* Update tensor.py

* Update logspace_op.cc

* Update logspace_kernel.cc

* Update logspace_kernel.cu

* Update test_logspace.py

* Adjust the position of logspace

* Adjust code formatting

a3c50c42
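
For readers unfamiliar with the operation this commit adds, here is a minimal pure-Python sketch of what a logspace function conventionally computes: `num` values `base ** e`, where the exponents `e` are evenly spaced between `start` and `stop` inclusive. The name `logspace_sketch` and its signature are hypothetical and chosen for illustration only. This is not the kernel from the commit, and Paddle's actual API may differ in argument handling and dtype behaviour.

```python
import math


def logspace_sketch(start, stop, num, base=10.0):
    """Return `num` values base**e, with e evenly spaced from start to stop.

    Hypothetical illustration only; not Paddle's implementation.
    """
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return []
    if num == 1:
        # A single sample conventionally sits at the start exponent.
        return [math.pow(base, start)]
    # Spacing between consecutive exponents, endpoints included.
    step = (stop - start) / (num - 1)
    return [math.pow(base, start + i * step) for i in range(num)]


if __name__ == "__main__":
    # Exponents 0, 1, 2, 3 with the default base 10.
    print(logspace_sketch(0, 3, 4))
    # Exponents 0, 2, 4 with base 2.
    print(logspace_sketch(0, 4, 3, base=2.0))
```
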

19 Apr, 2022 1 commit

[Phi]Separate AddKernel/DivideKernel/SubtractKernel/MultiplyKernel from... · 2cb19d8f

Committed by YuanRisheng on Apr 19, 2022

[Phi]Separate AddKernel/DivideKernel/SubtractKernel/MultiplyKernel from ElementwiseKernel (Part1) (#41806)

* separate add/div/sub/mul from elementwise

* delete code

* fix compile bugs

* deal with conflict

* fix bugs when compile

* fix windows unit test bug

* fix ci coverage bugs

2cb19d8f

18 Apr, 2022 3 commits
- [KP] Add Reduce op registry & UT for xpu_kp compilation (#41869) · b3959fe4
  Committed by Lijunhui on Apr 18, 2022
  
  b3959fe4
- Add sparse kernel coalesced (#41784) · 8f469ddd
  Committed by zhangkaihuo on Apr 18, 2022
  
  8f469ddd
- Optimization for graph_sample_neighbors API (#41447) · c31dd04c
  Committed by Siming Dai on Apr 18, 2022
```
* add eids result for graph_sample_neighbors

* fix bug

* move fisher_yates sample to warp

* add cpu eid output

* delete comment

* delete comment

* change nullptr placeholder

* optimize sample kernel

* fix mutable_data
```
  c31dd04c
17 Apr, 2022 1 commit

[Perf] Optimize dygraph scheduling performance (#41696) · 7ee31a96

Committed by Chen Weihang on Apr 17, 2022

* split phi and fluid infermeta context

* resolve conflict

* fix type error

* optimize scheduling perf

* spec small vector size

* replace all grad var name

* fix test failed

* move init default signature

* polish details

* polish details

* fix no init bug

* init sig for tests

* add init sig for infer

* fix infrt error

* fix infrt failed

* fix kunlun error

* fix infrt failed

7ee31a96

16 Apr, 2022 1 commit
- move fc_functor from fluid to phi.test=develop (#41856) · 21aa3adc
  Committed by 王明冬 on Apr 16, 2022
  
  21aa3adc
15 Apr, 2022 5 commits

[Phi]Reduce kernels into multiple files (#41747) · 1927aff9

Committed by chentianyu03 on Apr 15, 2022

* split reduce_kernel

* rm reduce_kernel in cmake

* split reduce_grad kernels

* fix cmake build error

* format code

* fix standalone_executor_test error

1927aff9

[DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode (#41730) · 27f28e82

Committed by Zhanlue Yang on Apr 15, 2022

* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad

* Fixed elementwise issue

* Addressed CI failures

* [DoubleGrad] Enabled test_imperative_triple_grad test cases under eager_mode

* [DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode

* Enabled more test cases

* [DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode

* Adjusted test_imperative_star_gan_with_gradient_penalty.py

27f28e82

Add API: Sparse Convolution3D (#41434) · 1665594d
Committed by zhangkaihuo on Apr 15, 2022

1665594d

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda

Committed by limingshu on Apr 15, 2022

* change cudnn helper for auto-tune

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* change ChooseAlgoByWorkspace for heuristic mode
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

35acfeda

fix batch norm memory issue (#41717) · 42abcc08

Committed by hong on Apr 15, 2022

* try to fix batch norm memory issue

* fix batch norm memory alloc bug

* polish some code

42abcc08

14 Apr, 2022 3 commits
- [KP] Add registry for elementwise_add/max/min/sub/div/mul/floordiv on XPU2 with KP lib (#41494) · fbe2c311
  Committed by Lijunhui on Apr 14, 2022
```
* register elementwise_xxx
```
  fbe2c311
- [Op]Fix adam/adamw beta1_pow/beta2_pow place while copying (#41732) · 4ae76d21
  Committed by Aurelius84 on Apr 14, 2022
  
  4ae76d21
- [Phi] Unify dispatch macros to visit (#41653) · 2ab986ae
  Committed by Chen Weihang on Apr 14, 2022
```
* change dispatch to visit

* resolve conflict
```
  2ab986ae
13 Apr, 2022 2 commits
- Add expand equal all yaml (#41540) · e53d1837
  Committed by hong on Apr 13, 2022
```
* add expand, poisson

* add poisson grad

* add expand equal_all poisson triangular solve yaml
```
  e53d1837
- Add kernel sparse_mask_helper; sparse_coo_tensor_grad (#41586) · acd08a9b
  Committed by zhangkaihuo on Apr 13, 2022
  
  acd08a9b
12 Apr, 2022 8 commits

Add layer norm yaml (#41589) · 43d5cca6

Committed by hong on Apr 12, 2022

* add layer norm infermeta

* add layer norm yaml

* polish layer norm infer meta

* add layer norm to black list

43d5cca6

exchange assign and assign_raw kernel name (#41625) · de49a4b7
Committed by chentianyu03 on Apr 12, 2022
```
* exchange assign and assign_raw kernel name

* fix register error
```
de49a4b7
fix depthwise dnn bug (#41666) · 7b627dd8
Committed by hong on Apr 12, 2022

7b627dd8

[KP] Add Logical/compare/bitwise registry & UT (#40802) · 3749198e

Committed by Lijunhui on Apr 12, 2022

* init commit no push

* collect compile errors

* bitwise UT

* fix compile problem

* cancel comments

* restore miss deletion

* fix compilation

* fix UT

* NO stash in multiple branches at the same time

* fix error

* combine .cu from gpu and kps

* replace gpu by kps

* fix by Chen-weihang

* Revert "Fix kps compile error in Junhui logic compare bitwise"

* fix backend test

* rm comments
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

3749198e

add fp16 kernel to clip_grad (#41661) · 137dc3e3
Committed by wuyefeilin on Apr 12, 2022

137dc3e3
[DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad (#41451) · 0b4c3c20
Committed by Zhanlue Yang on Apr 12, 2022
```
* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad

* Fixed elementwise issue

* Addressed CI failures
```
0b4c3c20
[Phi]Fix beta1_pow/beta2_pow/skip_update data transform problem in adam/adamw (#41641) · fdeec8c3
Committed by Aurelius84 on Apr 12, 2022
```
* [Phi]Fix beta1_pow/beta2_pow/skip_update data transform problem in adam/adamw

* fix xpu unittest failed
```
fdeec8c3

add an inner loop for index_select_grad_init() in index_select op when dealing... · bc01242b

Committed by FlyingQianMM on Apr 12, 2022

add an inner loop for index_select_grad_init() in index_select op when dealing with large-shape data (#41563)

* replace for with CUDA_KERNEL_LOOP for index_select_grad_init() in index_select op

* use CUDA_KERNEL_LOOP_TYPE

* fix code style

* replace index_select_grad_init with SetConstant

bc01242b

11 Apr, 2022 3 commits

[Phi]Add multi_dot/maxout/multiplex op yaml (#41550) · 36d76840
Committed by YuanRisheng on Apr 11, 2022
```
* add multi_dot,maxout,multiplex yaml

* add code coverage
```
36d76840

[Yaml] Add assign yaml (#41428) · 437bebda

Committed by chentianyu03 on Apr 11, 2022

* add assign yaml

* add assign api

* add assign backward api

* add assign

* add assign yaml

* add assign

* assign yaml

* add assign raw kernel and use assign_raw in yaml

* merge develop branch

* add missing python_api

437bebda

fix some ops (#41577) · 795d7121
Committed by sneaxiy on Apr 11, 2022

795d7121

10 Apr, 2022 1 commit
- fix warpctc grad kernel dep error (#41598) · 91d6f47a
  Committed by Chen Weihang on Apr 10, 2022
  
  91d6f47a
09 Apr, 2022 2 commits

add depthwise conv hip support (#41537) · b3b8d345
Committed by hong on Apr 09, 2022

b3b8d345

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kinds of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

b937cdc5

08 Apr, 2022 1 commit
- Fix RNN OP multi-threads predict bug (#41529) · 09203e46
  Committed by Jack Zhou on Apr 08, 2022
  
  09203e46
07 Apr, 2022 4 commits
- remove FLAGS_use_curand and change all random op CUDA implementation (#41308) · 9714878c
  Committed by zhouweiwei2014 on Apr 07, 2022
  
  9714878c
- [Phi]Add hard_swish/kron/linspace/logit yaml file (#41298) · 90cb337e
  Committed by YuanRisheng on Apr 07, 2022
```
* add yaml

* improve coverage
```
  90cb337e
- fix compile bug of windows cuda11.5 (#41433) · eea85814
  Committed by zhouweiwei2014 on Apr 07, 2022
  
  eea85814
- Add Sparse API to_dense, to_sparse_coo and values (#41394) · f78cc3da
  Committed by zhangkaihuo on Apr 07, 2022
  
  f78cc3da

PaddlePaddle / Paddle synced successfully about 1 year ago
