Commits · 2ab986aeb7eecc7c28dc5b1907bf3f5ca72911e4 · Crayon鑫 / Paddle

14 Apr, 2022 · 1 commit
- [Phi] Unify dispatch macros to visit (#41653) · 2ab986ae
  Committed by Chen Weihang on Apr 14, 2022
```
* change dispatch to visit

* resolve conflict
```
  2ab986ae
13 Apr, 2022 · 2 commits
- Add expand equal all yaml (#41540) · e53d1837
  Committed by hong on Apr 13, 2022
```
* add expand, poisson

* add poisson grad

* add expand equal_all poisson triangular solve yaml
```
  e53d1837
- Add kernel sparse_mask_helper; sparse_coo_tensor_grad (#41586) · acd08a9b
  Committed by zhangkaihuo on Apr 13, 2022
  
  acd08a9b
12 Apr, 2022 · 8 commits

Committed by hong on Apr 12, 2022

* add layer norm infermeta

* add layer norm yaml

* polish layer norm infer meta

* add layer norm to black list

43d5cca6

exchange assign and assign_raw kernel name (#41625) · de49a4b7
Committed by chentianyu03 on Apr 12, 2022
```
* exchange assign and assign_raw kernel name

* fix register error
```
de49a4b7

fix depthwise dnn bug (#41666) · 7b627dd8
Committed by hong on Apr 12, 2022

7b627dd8

[KP] Add Logical/compare/bitwise registry & UT (#40802) · 3749198e

Committed by Lijunhui on Apr 12, 2022

* init commit no push

* collect compile errors

* bitwise UT

* fix compile problem

* cancel comments

* restore miss deletion

* fix compilation

* fix UT

* no stash in multiple branches at the same time

* fix error

* combine .cu from gpu and kps

* replace gpu by kps

* fix by Chen-weihang

* Revert "Fix kps compile error in Junhui logic compare bitwise"

* fix backend test

* rm comments
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

3749198e

add fp16 kernel to clip_grad (#41661) · 137dc3e3
Committed by wuyefeilin on Apr 12, 2022

137dc3e3
[DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad (#41451) · 0b4c3c20
Committed by Zhanlue Yang on Apr 12, 2022
```
* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad

* Fixed elementwise issue

* Addressed CI failures
```
0b4c3c20
[Phi]Fix beta1_pow/beta2_pow/skip_update data transform problem in adam/adamw (#41641) · fdeec8c3
Committed by Aurelius84 on Apr 12, 2022
```
* [Phi]Fix beta1_pow/beta2_pow/skip_update data transform problem in adam/adamw

* fix xpu unittest failed
```
fdeec8c3

add a inner loop for index_select_grad_init() in index_select op when dealing... · bc01242b

Committed by FlyingQianMM on Apr 12, 2022

add a inner loop for index_select_grad_init() in index_select op when dealing with large-shape data (#41563)

* replace for with CUDA_KERNEL_LOOP for index_select_grad_init() in index_select op

* use CUDA_KERNEL_LOOP_TYPE

* fix code style

* replace index_select_grad_init with SetConstant

bc01242b

11 Apr, 2022 · 3 commits

[Phi]Add multi_dot/maxout/multiplex op yaml (#41550) · 36d76840
Committed by YuanRisheng on Apr 11, 2022
```
* add multi_dot,maxout,multiplex yaml

* add code coverage
```
36d76840

[Yaml] Add assign yaml (#41428) · 437bebda

Committed by chentianyu03 on Apr 11, 2022

* add assign yaml

* add assign api

* add assign backward api

* add assign

* add assign yaml

* add assign

* assign yaml

* add assign raw kernel and use assign_raw in yaml

* merge develop branch

* add missing python_api

437bebda

fix some ops (#41577) · 795d7121
Committed by sneaxiy on Apr 11, 2022

795d7121

10 Apr, 2022 · 1 commit
- fix warpctc grad kernel dep error (#41598) · 91d6f47a
  Committed by Chen Weihang on Apr 10, 2022
  
  91d6f47a
09 Apr, 2022 · 2 commits

add depthwise conv hip support (#41537) · b3b8d345
Committed by hong on Apr 09, 2022

b3b8d345

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switching between the two kinds of workspace setting methods.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

b937cdc5

08 Apr, 2022 · 1 commit
- Fix RNN OP multi-threads predict bug (#41529) · 09203e46
  Committed by Jack Zhou on Apr 08, 2022
  
  09203e46
07 Apr, 2022 · 8 commits
- remove FLAGS_use_curand and change all random op CUDA implementation (#41308) · 9714878c
  Committed by zhouweiwei2014 on Apr 07, 2022
  
  9714878c
- [Phi]Add hard_swish/kron/linspace/logit yaml file (#41298) · 90cb337e
  Committed by YuanRisheng on Apr 07, 2022
```
* add yaml

* perfect coverage
```
  90cb337e
- fix compile bug of windows cuda11.5 (#41433) · eea85814
  Committed by zhouweiwei2014 on Apr 07, 2022
  
  eea85814
- Add Sparse API to_dense, to_sparse_coo and values (#41394) · f78cc3da
  Committed by zhangkaihuo on Apr 07, 2022
  
  f78cc3da
- [BugFix] Add error hint for one_hot gpu version (#41335) · 91266b96
  Committed by Siming Dai on Apr 07, 2022
```
* add one_hot gpu hint

* move allow_out_of_range judgement

* delete useless unittest
```
  91266b96
- fix p_norm gpu nan bug while divide zero (#41359) · dfa63126
  Committed by zhiboniu on Apr 07, 2022
  
  dfa63126
- [Phi] Polish truncated normal kernel and add yaml (#41280) · d39e7896
  Committed by Chen Weihang on Apr 07, 2022
```
* polish truncated normal kernel

* add yaml

* add truncated normal kernel and add yaml

* polish unittests and yaml

* import dygraph method
```
  d39e7896
- fix bugs of reshape double grad infermeta (#41459) · 53409bcd
  Committed by YuanRisheng on Apr 07, 2022
  
  53409bcd
06 Apr, 2022 · 4 commits

[Phi]Add graph_send_recv yaml file (#41206) · 6f4bd0ea
Committed by YuanRisheng on Apr 06, 2022
```
* add graph_send_recv yaml

* deal with conflict

* fix compile bugs
```
6f4bd0ea

fix bug of missing boost when compile cache.cc (#41430) · 5c6e4bff
Committed by Sing_chan on Apr 06, 2022

5c6e4bff

Add conv yaml (#41354) · 7ed7c6c7

Committed by hong on Apr 06, 2022

* update

* add conv yaml

* add backward

* remove useless code

* fix bug

* fix bug

* revert fluid dygraph conv2d

* remove useless infermeta function

* fix meta fn duplicate error

* conv using custom impl

* remove amp include

* fix bug

* use cudnn = true

* fix test mkldnn caching bug

7ed7c6c7

[Dygraph TestsFix] Test some tests in new dygraph final_state mode. (#41363) · 0b96793e
Committed by xiongkun on Apr 06, 2022
```
* fix less than

* fix some tests

* fix additional 3 unittest case
```
0b96793e

05 Apr, 2022 · 4 commits

Fix bug of data transform in inference executor (#41349) · 91212104
Committed by zyfncg on Apr 05, 2022
```
* fix bug of data transform in inference executor

* fix bug
```
91212104

[DoubleGrad PR #8] Enabled triple grads for sigmoid and matmul (#41387) · d8a10977

Committed by Zhanlue Yang on Apr 05, 2022

* [Refactor] refactored eager_gen.py PR #2

* [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes

* Fixed minor issue

* Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition

* Fixed issues

* Supported higher-order grad node generation

* [DoubleGrad PR #4] Supported higher-order GradNode generation

* [DoubleGrad #4] Bug Fixes to Double Grad Node Generation

* Fixed yaml typo

* Fixed yaml typo

* fixed minor issues

* [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad()

* Fixed minor issue

* Fixed CI-Inference issue

* Fixed CI-inference issues

* [DoubleGrad PR #7] paddle.grad() to copy backward graph before backward run

* Fixed minor issues

* Fixed issue with backward graph construction logic

* Fixed implementation issues with backward graph reconstruction

* Fixed unittest issue

* Fixed issues

* [DoubleGrad PR #8] Enabled triple grads for sigmoid and matmul

* Fixed issues with phi kernel

* Added triple grad test case

* Fixed minor issue

d8a10977

add new format of quantization (#41041) · b72a7ebb
Committed by Guanghua Yu on Apr 05, 2022

b72a7ebb

Implement AutoTuneStatus class for Kernel Auto Tune (#41218) · b0f8000e

Committed by Zhang Ting on Apr 05, 2022

* switch autotune

* implement AutoTuneCache

* implement AutoTuneCache class

* add pybind api

* add dygraph test

* support static mode and eager mode and improve unittests

* rename the SwitchAutoTune Class and improve tests

* improve AutoTuneStatus and reduce the cost of tests

b0f8000e

04 Apr, 2022 · 4 commits
- Fix Warpctc error when using multi-gpu (#41389) · f8b3e576
  Committed by 0x45f on Apr 04, 2022
  
  f8b3e576
- fix index_select kernel configuration error where input numel is 0 (#41383) · 3e9ad093
  Committed by FlyingQianMM on Apr 04, 2022
  
  3e9ad093
- Add batch norm yaml (#41386) · 77cf305f
  Committed by hong on Apr 04, 2022
```
* update

* fix bug
```
  77cf305f
- Add yaml for flatten_contiguous_range OP (#41345) · c5285cc5
  Committed by From00 on Apr 04, 2022
```
* Add yaml for flatten_contiguous_range OP

* update

* Fix typos
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
```
  c5285cc5
03 Apr, 2022 · 2 commits

[Phi]Concat grad (#41112) · 3f57ef7a

Committed by chentianyu03 on Apr 03, 2022

* add concat_grad kernel

* fix error

* remove comment code

* fix outs nullptr error

* change to phi header

* add concat_grad declare for standalone_executor_test

3f57ef7a

add maximum limit for grid of index_select (#41127) · af8d2482

Committed by FlyingQianMM on Apr 03, 2022

* limit grid dim for index select

* mv LimitGridDim into gpu_launch_config.h

* fix conflicts

* fix conflicts

* fix code style

* set block to 256

* fix grid setting

* set dtype of block_dim to unsigned int

af8d2482

Crayon鑫 / Paddle is in sync with the fork's source project