Commits · 92fdfe33164e84b46fc2102dc992d7339a2782ae · 机器未来 / Paddle

May 04, 2022 · 1 commit

fix bug when compiling with cusparse in CUDA version >=11.4 (#42455) · 92fdfe33
Committed by XiaoguangHu on May 04, 2022

May 01, 2022 · 1 commit

[KP] Complete registry of elementwise ops on XPU with KP (#42056) · a3d56a9c
Committed by Lijunhui on May 01, 2022

Apr 25, 2022 · 1 commit

fix incorrect usages of std::move and other compile errors (#41045) · 05739d9e
Committed by tiancaishaonvjituizi on Apr 25, 2022

* fix bug of std::move and others

* fix an compile error in debug mode

* fix wrong copy assignment operator
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* fix ArrayRef constructor following llvm

* fix format

* fix conflict with master

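The commit message does not say which call sites were changed. As a hedged illustration only, the sketch below shows two generic C++ pitfalls matching its bullets: `std::move` applied to a `const` object silently falls back to a copy, and a copy assignment operator should tolerate self-assignment and return `*this`. `Buffer`, `MakeBuffer`, and `CopyDespiteMove` are hypothetical names, not Paddle code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type used only for illustration; not taken from Paddle.
class Buffer {
 public:
  Buffer() = default;
  explicit Buffer(std::vector<int> data) : data_(std::move(data)) {}
  Buffer(const Buffer&) = default;
  Buffer(Buffer&&) noexcept = default;
  Buffer& operator=(Buffer&&) noexcept = default;

  // Copy-and-swap: copy first, then swap. Self-assignment is harmless and
  // *this is left unchanged if the copy throws.
  Buffer& operator=(const Buffer& other) {
    Buffer tmp(other);
    std::swap(data_, tmp.data_);
    return *this;
  }

  const std::vector<int>& data() const { return data_; }

 private:
  std::vector<int> data_;
};

// Return a local by name so the compiler may elide the copy or move it
// implicitly; `return std::move(result);` would prevent that elision (NRVO).
Buffer MakeBuffer(int n) {
  Buffer result(std::vector<int>(n, 1));
  return result;
}

// std::move on a const object yields `const std::string&&`, which cannot bind
// to the move constructor, so the copy constructor runs instead.
std::string CopyDespiteMove(const std::string& source) {
  return std::move(source);
}
```

Copy-and-swap is one common way to write the operator; it is not necessarily the fix the commit applied.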

Apr 23, 2022 · 1 commit

update reduce_max for kunlun, *test=kunlun (#42116) · 1587ad07
Committed by TTerror on Apr 23, 2022

Apr 22, 2022 · 1 commit

[WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72
Committed by Ming-Xu Huang on Apr 22, 2022

* Fix leading dimension setting error in fused_gemm_epilogue_grad_op.

* Add dyload to cuBlasLt functions.

* Added cublasLtMatmulAlgoGetHeuristic to improve performance.

* Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue

* Added UTs to FLAGS_cublaslt_exhaustive_search_times

* Added warmup runs in algo searching of Gemm epilogue.

* Update copyright and documents.

* Fixed error handling.

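The bullets mention a `FLAGS_cublaslt_exhaustive_search_times` flag and warmup runs during algorithm search. Assuming the flag sets how many timed runs each candidate gets, a minimal backend-agnostic sketch of such a selection loop follows. It relies on a hypothetical `run` callback that executes one candidate and returns its elapsed time in milliseconds, and it makes no cuBlasLt calls; `SelectFastest` is an illustrative name, not Paddle's API.

```cpp
#include <cassert>
#include <functional>
#include <limits>

// Hypothetical selection loop. run(i) executes candidate i once and returns
// its elapsed time in milliseconds. Warmup runs are executed but not timed,
// so one-off costs such as first-launch overhead do not bias the choice.
// Returns the index of the candidate with the lowest average timed run,
// or -1 if there is nothing to choose from.
int SelectFastest(int num_candidates, int warmup_runs, int timed_runs,
                  const std::function<double(int)>& run) {
  if (num_candidates <= 0 || timed_runs <= 0) return -1;
  int best = -1;
  double best_avg = std::numeric_limits<double>::infinity();
  for (int i = 0; i < num_candidates; ++i) {
    for (int w = 0; w < warmup_runs; ++w) run(i);  // result discarded
    double total = 0.0;
    for (int r = 0; r < timed_runs; ++r) total += run(i);
    const double avg = total / timed_runs;
    if (avg < best_avg) {
      best_avg = avg;
      best = i;
    }
  }
  return best;
}
```

Averaging several timed runs and discarding warmups keeps one-time first-launch costs from deciding the ranking.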

Apr 20, 2022 · 1 commit

[MLU] add gather mlu kernel (#41969) · 23ad2166
Committed by fwenguang on Apr 20, 2022

Apr 19, 2022 · 3 commits

OneDNN md-in-tensor refactoring part 1: Added main changes for md-in-tensor (#41303) · c9f4fcf3
Committed by jakpiase on Apr 19, 2022
```
* changes for md in tensor

* ci fix

* Temporarily limited dims for test

* ci fix

* removed unnecessary includes

* added reviewers suggestions

* checkouted two files to avoid changing more than 19 in single PR

* minor fix

* reverted one file to reduce files changed to 19
```

support bmm&bmm_grad for KL2, *test=kunlun (#41935) · 60bec700
Committed by z8hanghuan on Apr 19, 2022

[MLU] support add callback to stream (#41831) · 03533b0c
Committed by fwenguang on Apr 19, 2022

Apr 18, 2022 · 3 commits

[KP] Add Reduce op registry & UT for xpu_kp compilation (#41869) · b3959fe4
Committed by Lijunhui on Apr 18, 2022

support tril_triu_grad for KL2, *test=kunlun (#41877) · 0759e99d
Committed by z8hanghuan on Apr 18, 2022

cinn_launch_op: optimize the overhead of preparing variables before executing... · 2d4fe163
Committed by TeFeng Chen on Apr 18, 2022
```
cinn_launch_op: optimize the overhead of preparing variables before executing cinn compiled program (#41777)

* optimize preparation overhead before executing cinn compiled program

* update code notes

* fix flag annotation

* add a flag of auto-tune feature beforehand
```

Apr 15, 2022 · 3 commits

add fp16 for masked_select on kunlun, *test=kunlun (#41215) · ff818c77
Committed by TTerror on Apr 15, 2022

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda
Committed by limingshu on Apr 15, 2022

* change cudnn helper for auto-tune

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* change ChooseAlgoByWorkspace for heuristic mode
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

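The bullets mention an autotune cache whose key changed from `AlgorithmType` to `int64_t`, and a per-step cache hit rate. A sketch of that general idea follows, assuming the key is a hash of the operator's shape and type parameters. `AlgoCache`, `MakeKey`, and the hash-combine constant are hypothetical, not Paddle's implementation. The cache searches once per key, reuses the result afterwards, and reports hits / (hits + misses).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical algorithm cache keyed by an int64_t hash of the op parameters.
class AlgoCache {
 public:
  // Returns the cached algorithm for `key`, or runs `search` once and caches it.
  int64_t GetOrSearch(int64_t key, const std::function<int64_t()>& search) {
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      ++hits_;
      return it->second;
    }
    ++misses_;
    const int64_t algo = search();
    cache_.emplace(key, algo);
    return algo;
  }

  // Fraction of lookups served from the cache; 0 before any lookup.
  double HitRate() const {
    const int64_t total = hits_ + misses_;
    return total == 0 ? 0.0 : static_cast<double>(hits_) / total;
  }

 private:
  std::unordered_map<int64_t, int64_t> cache_;
  int64_t hits_ = 0;
  int64_t misses_ = 0;
};

// Hypothetical key builder: folds shape/dtype integers into one int64_t
// using a boost-style hash_combine step.
int64_t MakeKey(const std::vector<int64_t>& fields) {
  uint64_t seed = 0;
  for (int64_t f : fields) {
    seed ^= std::hash<int64_t>{}(f) + 0x9e3779b97f4a7c15ULL + (seed << 6) +
            (seed >> 2);
  }
  return static_cast<int64_t>(seed);
}
```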

[MLU] add mlu new profiler (#41138) · fc208b7e
Committed by fwenguang on Apr 15, 2022
```
* [MLU] add mlu new profiler

* fix format
```

Apr 14, 2022 · 4 commits

[KP] Add registry for elementwise_add/max/min/sub/div/mul/floordiv on XPU2 with KP lib (#41494) · fbe2c311
Committed by Lijunhui on Apr 14, 2022
```
* regist elementwise_xxx
```

[Phi] Support construct Scalar by using Non-CPU Tensor (#41765) · 54ccc308
Committed by YuanRisheng on Apr 14, 2022

* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency

* deal with conflict

* fix bugs when run unit test

* fix unit test bugs

Fix to #38693 (minimal UT) (#41026) · d0f3296b
Committed by Jacek Czaja on Apr 14, 2022

* Add UT

- Added missed data_layout

- Added missing conversions

- NDHWC added

- NDHWC support in data_transform

- another fix

- condddate change

- fix

u- fix

- fix

- fix

- fix

- fix

- fix to hack

- compilation fix

- fix to automatic merge

* - reduced UT

* - fix

* - lint

* - fix to lint

support multi layer and bidirection of lstm_grad, *test=kunlun (#41742) · 8b07ce0e
Committed by z8hanghuan on Apr 14, 2022

* support multi layer and bidirection of lstm_grad, *test=kunlun

* support multi layer and bidirection of lstm_grad, *test=kunlun

Apr 13, 2022 · 5 commits

Revert "[Phi] Support construct Scalar by using Non-CPU Tensosr (#41528)" (#41740) · 404c4a6b
Committed by tianshuo78520a on Apr 13, 2022
```
This reverts commit fe214af2.
```

[Phi] Support construct Scalar by using Non-CPU Tensosr (#41528) · fe214af2
Committed by YuanRisheng on Apr 13, 2022
```
* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency
```

concat and relu sopport FP16 in XPU, test=kunlun (#41631) · c4d5a77f
Committed by zhangyikun02 on Apr 13, 2022

support bce_loss and bce_loss_grad in XPU, test=kunlun (#41610) · 468c1ad7
Committed by zhangyikun02 on Apr 13, 2022

Update sign op xpu (#41685) · a4d4c116
Committed by houj04 on Apr 13, 2022
```
* update sign op on xpu. test=kunlun

* fix typo. test=kunlun
```

Apr 12, 2022 · 3 commits

[KP] Add Logical/compare/bitwise registry & UT (#40802) · 3749198e
Committed by Lijunhui on Apr 12, 2022

* init commit no push

* collect comile errors

* bitwise UT

* fix compile problem

* cancel comments

* restore miss deletion

* fix compilation

* fix UT

* NO stash in multiple branch at the same times

* fix error

* combine .cu from gpu and kps

* replace gpu by kps

* fix by Chen-weihang

* Revert "Fix kps compile error in Junhui logic compare bitwise"

* fix backend test

* rm comments
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

fix_paddle_numel_check (#41607) · 51cae7f7
Committed by JingZhuangzhuang on Apr 12, 2022
```
* fix_paddle_numel_check

* fix_paddle_numel_check
```

Update Profiler (#41638) · c3e1d257
Committed by liutiexing on Apr 12, 2022

Apr 11, 2022 · 2 commits

fix dynamic flag bug on mac (#41571) · b026840a
Committed by zhouweiwei2014 on Apr 11, 2022

support more ops (#41421) · fc621dfe
Committed by Allen Guo on Apr 11, 2022

Apr 09, 2022 · 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5
Committed by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kind of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

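The first bullet sizes the search workspace by the largest requirement among all candidate algorithms. A minimal sketch of that rule follows, together with picking the fastest candidate that fits a given limit. `AlgoPerf`, `MaxWorkspace`, and `FastestWithin` are hypothetical; the record is only loosely modeled on cuDNN's per-algorithm performance results, which report a time and a workspace size, and no cuDNN calls are made.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: one candidate algorithm with its measured time and
// required workspace.
struct AlgoPerf {
  int algo;
  float time_ms;
  std::size_t workspace_bytes;
};

// Workspace limit for exhaustive search: the largest requirement among all
// candidates, so no candidate is excluded for lack of workspace.
std::size_t MaxWorkspace(const std::vector<AlgoPerf>& perfs) {
  std::size_t limit = 0;
  for (const auto& p : perfs) limit = std::max(limit, p.workspace_bytes);
  return limit;
}

// Fastest candidate whose workspace fits within `limit`; -1 if none fits.
int FastestWithin(const std::vector<AlgoPerf>& perfs, std::size_t limit) {
  int best = -1;
  float best_time = 0.0f;
  for (const auto& p : perfs) {
    if (p.workspace_bytes > limit) continue;
    if (best == -1 || p.time_ms < best_time) {
      best = p.algo;
      best_time = p.time_ms;
    }
  }
  return best;
}
```

Sizing by the maximum lets every candidate be measured; a tighter limit trades possible speed for memory, as the smaller limits in the usage show.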

Apr 08, 2022 · 2 commits

[ROCm] fix dcu error in device event base, test=develop (#41521) · 14dba636
Committed by Qi Li on Apr 08, 2022
```
* [ROCm] fix dcu error in device event base, test=develop

* fix, test=develop
```

xpu mul unittest *test=kunlun (#41140) · 770ce7cf
Committed by taixiurong on Apr 08, 2022

Apr 07, 2022 · 5 commits

remove FLAGS_use_curand and change all random op CUDA implementation (#41308) · 9714878c
Committed by zhouweiwei2014 on Apr 07, 2022

Fix dygraph record event position (#41445) · 8fba68d3
Committed by chenjian on Apr 07, 2022
```
* no

* maintain old profiler

* fix old dygraph record event
```

ignore some failed test for KL2 (#41342) · 81389c51
Committed by QingshuChen on Apr 07, 2022
```
* ignore some failed test for KL2
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun
```

modify infer gpu memory strategy (#41427) · 56e72b20
Committed by JingZhuangzhuang on Apr 07, 2022
```
* modify infer gpu memory strategy

* modify infer gpu memory strategy
```

Add GPU memory usage information in the print of profiler. (#41440) · 516160a4
Committed by Yiqun Liu on Apr 07, 2022
```
* Add GPU memory usage information in the print of profiler.

* Add ifdef.
```

Apr 06, 2022 · 1 commit

[IPU] remove paddle_ipu shared library (#41307) · 229e91bf
Committed by Allen Guo on Apr 06, 2022
```
* remove paddle_ipu shared library

* fix unique_name
```

Apr 03, 2022 · 1 commit

add maximum limit for grid of index_select (#41127) · af8d2482
Committed by FlyingQianMM on Apr 03, 2022

* limit grid dim for index select

* mv LimitGridDim into gpu_launch_config.h

* fix conflicts

* fix conflicts

* fix code style

* set block to 256

* fix grid setting

* set dtype of block_dim to unsigned int

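Capping the grid is only safe when the kernel does not depend on launching one thread per element. A common pattern, assumed here rather than taken from the commit, is a grid-stride loop. The helper names below are hypothetical: the commit only says a `LimitGridDim` helper moved into `gpu_launch_config.h`, and its real signature is not shown on this page. The block size of 256 and the `unsigned int` block dimension follow the bullets; the limit is passed in as `max_grid_dim_x` instead of being queried from the device, and the loop is emulated on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Paddle's LimitGridDim): ceil(numel / block_dim)
// blocks, clamped to the range [1, max_grid_dim_x].
unsigned int LimitedGridDim(int64_t numel, unsigned int block_dim,
                            int64_t max_grid_dim_x) {
  int64_t grid = (numel + block_dim - 1) / block_dim;
  grid = std::max<int64_t>(grid, 1);
  return static_cast<unsigned int>(std::min(grid, max_grid_dim_x));
}

// CPU emulation of a CUDA grid-stride loop: each "thread" starts at its global
// index and advances by the total thread count, so every element is covered
// exactly once even when the grid was clamped.
bool CoversEachElementOnce(int64_t numel, unsigned int grid_dim,
                           unsigned int block_dim) {
  std::vector<int> hits(static_cast<std::size_t>(numel), 0);
  const int64_t stride = static_cast<int64_t>(grid_dim) * block_dim;
  for (int64_t tid = 0; tid < stride; ++tid) {
    for (int64_t i = tid; i < numel; i += stride) {
      ++hits[static_cast<std::size_t>(i)];
    }
  }
  return std::all_of(hits.begin(), hits.end(), [](int h) { return h == 1; });
}
```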

Apr 01, 2022 · 1 commit

[Eager] Support pinned (#41035) · f3270fc8
Committed by wanghuancoder on Apr 01, 2022

* support pinned, test=develop

* support async_write, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

机器未来 / Paddle is in sync with the upstream repository it was forked from.