Commits · fc208b7efe7307b0d286410aa9e7ca7c5ca410bd · Crayon鑫 / Paddle

15 Apr, 2022 1 commit
- [MLU] add mlu new profiler (#41138) · fc208b7e
  Committed by fwenguang on Apr 15, 2022
```
* [MLU] add mlu new profiler

* fix format
```
14 Apr, 2022 4 commits

[KP] Add registry for elementwise_add/max/min/sub/div/mul/floordiv on XPU2 with KP lib (#41494) · fbe2c311
Committed by Lijunhui on Apr 14, 2022
```
* regist elementwise_xxx
```

[Phi] Support construct Scalar by using Non-CPU Tensor (#41765) · 54ccc308

Committed by YuanRisheng on Apr 14, 2022

* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency

* deal with conflict

* fix bugs when run unit test

* fix unit test bugs

Fix to #38693 (minimal UT) (#41026) · d0f3296b

Committed by Jacek Czaja on Apr 14, 2022

* Add UT

- Added missed data_layout

- Added missing conversions

- NDHWC added

- NDHWC support in data_transform

- another fix

- condddate change

- fix

- fix

- fix

- fix

- fix

- fix

- fix to hack

- compilation fix

- fix to automatic merge

* - reduced UT

* - fix

* - lint

* - fix to lint

support multi layer and bidirection of lstm_grad, *test=kunlun (#41742) · 8b07ce0e

Committed by z8hanghuan on Apr 14, 2022

* support multi layer and bidirection of lstm_grad, *test=kunlun

* support multi layer and bidirection of lstm_grad, *test=kunlun

13 Apr, 2022 5 commits
- Revert "[Phi] Support construct Scalar by using Non-CPU Tensosr (#41528)" (#41740) · 404c4a6b
  Committed by tianshuo78520a on Apr 13, 2022
```
This reverts commit fe214af2.
```
- [Phi] Support construct Scalar by using Non-CPU Tensosr (#41528) · fe214af2
  Committed by YuanRisheng on Apr 13, 2022
```
* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency
```
- concat and relu sopport FP16 in XPU, test=kunlun (#41631) · c4d5a77f
  Committed by zhangyikun02 on Apr 13, 2022
- support bce_loss and bce_loss_grad in XPU, test=kunlun (#41610) · 468c1ad7
  Committed by zhangyikun02 on Apr 13, 2022
- Update sign op xpu (#41685) · a4d4c116
  Committed by houj04 on Apr 13, 2022
```
* update sign op on xpu. test=kunlun

* fix typo. test=kunlun
```
12 Apr, 2022 3 commits

[KP] Add Logical/compare/bitwise registry & UT (#40802) · 3749198e

Committed by Lijunhui on Apr 12, 2022

* init commit no push

* collect comile errors

* bitwise UT

* fix compile problem

* cancel comments

* restore miss deletion

* fix compilation

* fix UT

* NO stash in multiple branch at the same times

* fix error

* combine .cu from gpu and kps

* replace gpu by kps

* fix by Chen-weihang

* Revert "Fix kps compile error in Junhui logic compare bitwise"

* fix backend test

* rm comments

Co-authored-by: Chen Weihang <chenweihang@baidu.com>

fix_paddle_numel_check (#41607) · 51cae7f7
Committed by JingZhuangzhuang on Apr 12, 2022
```
* fix_paddle_numel_check

* fix_paddle_numel_check
```
Update Profiler (#41638) · c3e1d257
Committed by liutiexing on Apr 12, 2022

11 Apr, 2022 2 commits
- fix dynamic flag bug on mac (#41571) · b026840a
  Committed by zhouweiwei2014 on Apr 11, 2022
- support more ops (#41421) · fc621dfe
  Committed by Allen Guo on Apr 11, 2022

09 Apr, 2022 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kind of workspace setting methods.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

08 Apr, 2022 2 commits
- [ROCm] fix dcu error in device event base, test=develop (#41521) · 14dba636
  Committed by Qi Li on Apr 08, 2022
```
* [ROCm] fix dcu error in device event base, test=develop

* fix, test=develop
```
- xpu mul unittest *test=kunlun (#41140) · 770ce7cf
  Committed by taixiurong on Apr 08, 2022

07 Apr, 2022 5 commits
- remove FLAGS_use_curand and change all random op CUDA implementation (#41308) · 9714878c
  Committed by zhouweiwei2014 on Apr 07, 2022
- Fix dygraph record event position (#41445) · 8fba68d3
  Committed by chenjian on Apr 07, 2022
```
* no

* maintain old profiler

* fix old dygraph record event
```
- ignore some failed test for KL2 (#41342) · 81389c51
  Committed by QingshuChen on Apr 07, 2022
```
* ignore some failed test for KL2
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun
```
- modify infer gpu memory strategy (#41427) · 56e72b20
  Committed by JingZhuangzhuang on Apr 07, 2022
```
* modify infer gpu memory strategy

* modify infer gpu memory strategy
```
- Add GPU memory usage information in the print of profiler. (#41440) · 516160a4
  Committed by Yiqun Liu on Apr 07, 2022
```
* Add GPU memory usage information in the print of profiler.

* Add ifdef.
```
06 Apr, 2022 1 commit
- [IPU] remove paddle_ipu shared library (#41307) · 229e91bf
  Committed by Allen Guo on Apr 06, 2022
```
* remove paddle_ipu shared library

* fix unique_name
```
03 Apr, 2022 1 commit

add maximum limit for grid of index_select (#41127) · af8d2482

Committed by FlyingQianMM on Apr 03, 2022

* limit grid dim for index select

* mv LimitGridDim into gpu_launch_config.h

* fix conflicts

* fix conflicts

* fix code style

* set block to 256

* fix grid setting

* set dtype of block_dim to unsigned int

01 Apr, 2022 2 commits

[Eager] Support pinned (#41035) · f3270fc8

Committed by wanghuancoder on Apr 01, 2022

* support pinned, test=develop

* support async_write, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

support multi_layer of bilstm,*test=kunlun (#41151) · 00d23897

Committed by z8hanghuan on Apr 01, 2022

* support multi_layer of bilstm,*test=kunlun

* support multi_layer of bilstm, *test=kunlun

* support multi_layer of bilstm, *test=kunlun

* support multi_layer of bilstm, *test=kunlun

31 Mar, 2022 3 commits

[new-exec] fit mkldnn op (#41058) · 02cf6764

Committed by Leo Chen on Mar 31, 2022

* fix bug that some op has no op_role attr

* add mkldnn support for new executor

* fit for mkldnn data_transfer

* fit for mkldnn data_transfer

Maintain old profiler (#41132) · a6bf2218

Committed by chenjian on Mar 31, 2022

* no

* maintain old profiler

* exclude new python record events for old profiler

* maintain old profiler

* maintain

* maintain old profiler

* maintain

* fix cmakes

Add time range duration display (#41029) · 6744754f

Committed by chenjian on Mar 31, 2022

* no

* fix bugs

* fix doc according to review

* fix api doc format

* fix api doc according to review

* fix bug and add unit test

* fix record event bug

* optimize chrome tracing display

* fix bug

* add comment

* add unit test

* fix a bug

* fix

* fix

* fix format

30 Mar, 2022 3 commits

Add new APIs for GPU memory monitoring (max_memory_allocated,... · afe02e9d

Committed by From00 on Mar 30, 2022

Add new APIs for GPU memory monitoring (max_memory_allocated, max_memory_reserved, memory_allocated, memory_reserved) (#38657)

* Add new API memory_reserved

* Add memory_allocated, max_memory_reserved and max_memory_allocater

* Fix CI error

* Fix CI error

* Enhance UT

* Add FLAGS_memory_stats_opt

* Add STATS macro functions

* Add StatAllocator

* Fix CI errors

* Add UT

* Fix CI errors

add bilinear interpolate v2 to xpu list and unitteset, *test=kunlun (#41037) · 4e86dff2

Committed by ykkk2333 on Mar 30, 2022

* add bilinear interpolate v2 to xpu list and unitteset, *test=kunlun

* Delete ps_usr_print_log

* Delete ps_usr_print_log

* Delete xpu_op_test

swish and pow op for xpu test=kunlun (#40654) · d951f3af

Committed by houj04 on Mar 30, 2022

* swish and pow op for xpu. test=kunlun

* fix code style. test=kunlun.

* use pow_grad xdnn api. test=kunlun.

29 Mar, 2022 1 commit
- softmax_with_cross_entropy support fp16 on xpu, test=kunlun (#40869) · 649948a6
  Committed by zhangyikun02 on Mar 29, 2022

28 Mar, 2022 1 commit

Fix profiler package bug (#40888) · 77a455c7

Committed by chenjian on Mar 28, 2022

* no

* fix bugs

* fix doc according to review

* fix api doc format

* fix api doc according to review

* fix bug and add unit test

* fix record event bug

27 Mar, 2022 1 commit

[new-exec] fit for mkldnn and inplace op (#40955) · afa0e82c

Committed by Leo Chen on Mar 27, 2022

* fit for mkldnn and inplace op

* fix compile

* refine ut

* register op version

* fix inplace op

* fix transfer_layout

25 Mar, 2022 2 commits

[Phi] Migrate Adam and AdamW into Phi (#40351) · 56cd3407

Committed by Aurelius84 on Mar 25, 2022

* [Phi] Migrate Adam and Adamw into Phi

* fix compile error and unittest ok

* fix compile error and unittest ok

* fix undefined reference to fLI::FLAGS

* test depend on operator

* fix cmake

* fix xpu compile

* fix infrt

* fix amp_type_traits

* fix amp_type_traits

* modify according reviewer

* modify according reviewer

* fix dtype float16

* fix typo

* fix Cmake

* fix code style

add maximum limit for grid of reduce, elementwise, gather and scatter (#40813) · 608a5f55
Committed by FlyingQianMM on Mar 25, 2022
```
* add maximum limit for grid of reduce, elementwise and gather

* add {} after if
```
23 Mar, 2022 2 commits

[NPU] add npu support for conv3d and conv3d_grad (#38480) · ff568afa

Committed by furnace on Mar 23, 2022

* [NPU] add npu support for conv3d and conv3d_grad

* [NPU] delete failed unittests due to Ascend not support

* [NPU] delete debug codes

* [NPU] optimize codes, notest

* [NPU] remove const_cast

* [NPU] optimize for remove const_cast

* [NPU] fix written errors

Performance optimization for StreamSafeCudaAllocator (#40718) · d8bff988

Committed by From00 on Mar 23, 2022

* Performance optimize

* Optimize GetAllocator, RWLock and ProcessUnfreedAllocation

* Remove test file

* Fix CI error

* Fix CI errors

* Fix CI errors

Crayon鑫 / Paddle: in sync with the fork source project