Commits · 35acfeda36caada80464051043a3f86ae2b76779 · 机器未来 / Paddle

15 Apr, 2022 · 2 commits

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda

Committed by limingshu on Apr 15, 2022

* change cudnn helper for auto-tune

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* change ChooseAlgoByWorkspace for heuristic mode
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

35acfeda
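
The autotune cache changes listed in this commit (an int64_t key and a per-step cache hit rate) can be pictured with the short sketch below. It is illustrative only and not Paddle's implementation: `AutoTuneCacheSketch`, `lookup`, `store`, and `hit_rate` are hypothetical names, and a plain Python int stands in for the int64_t key.

```python
class AutoTuneCacheSketch:
    """Hypothetical algorithm cache keyed by an integer, with hit-rate tracking."""

    def __init__(self):
        self._cache = {}   # integer key -> chosen algorithm id
        self._hits = 0
        self._misses = 0

    def lookup(self, key: int):
        """Return the cached algorithm for key, or None on a miss."""
        if key in self._cache:
            self._hits += 1
            return self._cache[key]
        self._misses += 1
        return None

    def store(self, key: int, algo: int) -> None:
        """Remember the algorithm chosen for this key."""
        self._cache[key] = algo

    def hit_rate(self) -> float:
        """Fraction of lookups served from the cache (0.0 if none yet)."""
        total = self._hits + self._misses
        return self._hits / total if total else 0.0


cache = AutoTuneCacheSketch()
# Stand-in for a key derived from the op name and tensor shapes (assumption).
key = hash(("conv2d", (8, 3, 224, 224), (64, 3, 7, 7)))
if cache.lookup(key) is None:   # first step: miss, so "tune" and store
    cache.store(key, 1)
cache.lookup(key)               # second step: hit
print(cache.hit_rate())         # one hit out of two lookups
```

Counting hits and misses inside `lookup` keeps the rate consistent with what callers actually observed, which is the kind of bookkeeping a hit-rate bug fix would need to get right.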

fix batch norm memory issue (#41717) · 42abcc08

Committed by hong on Apr 15, 2022

* try to fix batch norm memory issue

* fix batch norm memory alloc bug

* polish some code

42abcc08

14 Apr, 2022 · 3 commits
- [KP] Add registry for elementwise_add/max/min/sub/div/mul/floordiv on XPU2 with KP lib (#41494) · fbe2c311
  Committed by Lijunhui on Apr 14, 2022
```
* register elementwise_xxx
```
  fbe2c311
- [Op]Fix adam/adamw beta1_pow/beta2_pow place while copying (#41732) · 4ae76d21
  Committed by Aurelius84 on Apr 14, 2022
  
  4ae76d21
- [Phi] Unify dispatch macros to visit (#41653) · 2ab986ae
  Committed by Chen Weihang on Apr 14, 2022
```
* change dispatch to visit

* resolve conflict
```
  2ab986ae
13 Apr, 2022 · 2 commits
- Add expand equal all yaml (#41540) · e53d1837
  Committed by hong on Apr 13, 2022
```
* add expand, poisson

* add poisson grad

* add expand equal_all poisson triangular solve yaml
```
  e53d1837
- Add kernel sparse_mask_helper; sparse_coo_tensor_grad (#41586) · acd08a9b
  Committed by zhangkaihuo on Apr 13, 2022
  
  acd08a9b
12 Apr, 2022 · 8 commits

Add layer norm yaml (#41589) · 43d5cca6

Committed by hong on Apr 12, 2022

* add layer norm infermeta

* add layer norm yaml

* polish layer norm infer meta

* add layer norm to black list

43d5cca6

exchange assign and assign_raw kernel name (#41625) · de49a4b7
Committed by chentianyu03 on Apr 12, 2022
```
* exchange assign and assign_raw kernel name

* fix register error
```
de49a4b7

fix depthwise dnn bug (#41666) · 7b627dd8
Committed by hong on Apr 12, 2022

7b627dd8

[KP] Add Logical/compare/bitwise registry & UT (#40802) · 3749198e

Committed by Lijunhui on Apr 12, 2022

* init commit no push

* collect compile errors

* bitwise UT

* fix compile problem

* cancel comments

* restore miss deletion

* fix compilation

* fix UT

* no stash in multiple branches at the same time

* fix error

* combine .cu from gpu and kps

* replace gpu by kps

* fix by Chen-weihang

* Revert "Fix kps compile error in Junhui logic compare bitwise"

* fix backend test

* rm comments
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

3749198e

add fp16 kernel to clip_grad (#41661) · 137dc3e3
Committed by wuyefeilin on Apr 12, 2022

137dc3e3
[DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad (#41451) · 0b4c3c20
Committed by Zhanlue Yang on Apr 12, 2022
```
* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad

* Fixed elementwise issue

* Addressed CI failures
```
0b4c3c20
[Phi]Fix beta1_pow/beta2_pow/skip_update data transform problem in adam/adamw (#41641) · fdeec8c3
Committed by Aurelius84 on Apr 12, 2022
```
* [Phi]Fix beta1_pow/beta2_pow/skip_update data transform problem in adam/adamw

* fix xpu unittest failed
```
fdeec8c3

add an inner loop for index_select_grad_init() in index_select op when dealing... · bc01242b

Committed by FlyingQianMM on Apr 12, 2022

add an inner loop for index_select_grad_init() in index_select op when dealing with large-shape data (#41563)

* replace for with CUDA_KERNEL_LOOP for index_select_grad_init() in index_select op

* use CUDA_KERNEL_LOOP_TYPE

* fix code style

* replace index_select_grad_init with SetConstant

bc01242b

11 Apr, 2022 · 3 commits

[Phi]Add multi_dot/maxout/multiplex op yaml (#41550) · 36d76840
Committed by YuanRisheng on Apr 11, 2022
```
* add multi_dot,maxout,multiplex yaml

* add code coverage
```
36d76840

[Yaml] Add assign yaml (#41428) · 437bebda

Committed by chentianyu03 on Apr 11, 2022

* add assign yaml

* add assign api

* add assign backward api

* add assign

* add assign yaml

* add assign

* assign yaml

* add assign raw kernel and use assign_raw in yaml

* merge develop branch

* add missing python_api

437bebda

fix some ops (#41577) · 795d7121
Committed by sneaxiy on Apr 11, 2022

795d7121

10 Apr, 2022 · 1 commit
- fix warpctc grad kernel dep error (#41598) · 91d6f47a
  Committed by Chen Weihang on Apr 10, 2022
  
  91d6f47a
09 Apr, 2022 · 2 commits

add depthwise conv hip support (#41537) · b3b8d345
Committed by hong on Apr 09, 2022

b3b8d345

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Use the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switching between the two kinds of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

b937cdc5
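
The first bullet of this commit, limiting the workspace by the largest requirement among the candidate algorithms during exhaustive search, might look like the sketch below. The algorithm names and byte sizes are invented for illustration, and `search_workspace_limit` is a hypothetical helper, not a cuDNN or Paddle function.

```python
def search_workspace_limit(required_bytes: dict) -> int:
    """Return the largest workspace any candidate algorithm asks for.

    Reserving this much once means every candidate can be timed
    during the search without being rejected for lack of workspace.
    """
    return max(required_bytes.values(), default=0)


# Made-up per-algorithm workspace requirements, in bytes (assumption).
candidates = {"algo_a": 0, "algo_b": 4 << 20, "algo_c": 64 << 20}
limit = search_workspace_limit(candidates)
print(limit // (1 << 20), "MiB")
```

Taking the maximum rather than a fixed cap trades memory during the search for a fairer comparison between candidates; whether that trade is acceptable depends on how much device memory is free at tuning time.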

08 Apr, 2022 · 1 commit
- Fix RNN OP multi-threads predict bug (#41529) · 09203e46
  Committed by Jack Zhou on Apr 08, 2022
  
  09203e46
07 Apr, 2022 · 8 commits
- remove FLAGS_use_curand and change all random op CUDA implementation (#41308) · 9714878c
  Committed by zhouweiwei2014 on Apr 07, 2022
  
  9714878c
- [Phi]Add hard_swish/kron/linspace/logit yaml file (#41298) · 90cb337e
  Committed by YuanRisheng on Apr 07, 2022
```
* add yaml

* improve coverage
```
  90cb337e
- fix compile bug of windows cuda11.5 (#41433) · eea85814
  Committed by zhouweiwei2014 on Apr 07, 2022
  
  eea85814
- Add Sparse API to_dense, to_sparse_coo and values (#41394) · f78cc3da
  Committed by zhangkaihuo on Apr 07, 2022
  
  f78cc3da
- [BugFix] Add error hint for one_hot gpu version (#41335) · 91266b96
  Committed by Siming Dai on Apr 07, 2022
```
* add one_hot gpu hint

* move allow_out_of_range judgement

* delete useless unittest
```
  91266b96
- fix p_norm gpu nan bug while divide zero (#41359) · dfa63126
  Committed by zhiboniu on Apr 07, 2022
  
  dfa63126
- [Phi] Polish truncated normal kernel and add yaml (#41280) · d39e7896
  Committed by Chen Weihang on Apr 07, 2022
```
* polish truncated normal kernel

* add yaml

* add truncated normal kernel and add yaml

* polish unittests and yaml

* import dygraph method
```
  d39e7896
- fix bugs of reshape double grad infermeta (#41459) · 53409bcd
  Committed by YuanRisheng on Apr 07, 2022
  
  53409bcd
06 Apr, 2022 · 4 commits

[Phi]Add graph_send_recv yaml file (#41206) · 6f4bd0ea
Committed by YuanRisheng on Apr 06, 2022
```
* add graph_send_recv yaml

* deal with conflict

* fix compile bugs
```
6f4bd0ea

fix bug of missing boost when compile cache.cc (#41430) · 5c6e4bff
Committed by Sing_chan on Apr 06, 2022

5c6e4bff

Add conv yaml (#41354) · 7ed7c6c7

Committed by hong on Apr 06, 2022

* update

* add conv yaml

* add backward

* remove useless code

* fix bug

* fix bug

* revert fluid dygraph conv2d

* remove useless infermeta function

* fix meta fn duplicate error

* conv using custom impl

* remove amp include

* fix bug

* use cudnn = true

* fix test mkldnn caching bug

7ed7c6c7

[Dygraph TestsFix] Test some tests in new dygraph final_state mode. (#41363) · 0b96793e
Committed by xiongkun on Apr 06, 2022
```
* fix less than

* fix some tests

* fix additional 3 unittest case
```
0b96793e

05 Apr, 2022 · 4 commits

Fix bug of data transform in inference executor (#41349) · 91212104
Committed by zyfncg on Apr 05, 2022
```
* fix bug of data transform in inference executor

* fix bug
```
91212104

[DoubleGrad PR #8] Enabled triple grads for sigmoid and matmul (#41387) · d8a10977

Committed by Zhanlue Yang on Apr 05, 2022

* [Refactor] refactored eager_gen.py PR #2

* [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes

* Fixed minor issue

* Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition

* Fixed issues

* Supported higher-order grad node generation

* [DoubleGrad PR #4] Supported higher-order GradNode generation

* [DoubleGrad #4] Bug Fixes to Double Grad Node Generation

* Fixed yaml typo

* Fixed yaml typo

* fixed minor issues

* [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad()

* Fixed minor issue

* Fixed CI-Inference issue

* Fixed CI-inference issues

* [DoubleGrad PR #7] paddle.grad() to copy backward graph before backward run

* Fixed minor issues

* Fixed issue with backward graph construction logic

* Fixed implementation issues with backward graph reconstruction

* Fixed unittest issue

* Fixed issues

* [DoubleGrad PR #8] Enabled triple grads for sigmoid and matmul

* Fixed issues with phi kernel

* Added triple grad test case

* Fixed minor issue

d8a10977

add new format of quantization (#41041) · b72a7ebb
Committed by Guanghua Yu on Apr 05, 2022

b72a7ebb

Implement AutoTuneStatus class for Kernel Auto Tune (#41218) · b0f8000e

Committed by Zhang Ting on Apr 05, 2022

* switch autotune

* implement AutoTuneCache

* implement AutoTuneCache class

* add pybind api

* add dygraph test

* support static mode and eager mode and improve unittests

* rename the SwitchAutoTune Class and improve tests

* improve AutoTuneStatus and reduce the cost of tests

b0f8000e

04 Apr, 2022 · 2 commits
- Fix Warpctc error when using multi-gpu (#41389) · f8b3e576
  Committed by 0x45f on Apr 04, 2022
  
  f8b3e576
- fix index_select kernel configuration error where input numel is 0 (#41383) · 3e9ad093
  Committed by FlyingQianMM on Apr 04, 2022
  
  3e9ad093
