Commits · b71833ea4a98a1f9cbee6b769c25b72072f20539 · PaddlePaddle / Paddle

25 Jul, 2022 1 commit
- [cherry-pick]remove unuse cuSparse function (#44511) · 684b12ee
  Authored by zhouweiwei2014 on Jul 25, 2022
```
cherry-pick #43626
```
19 Jul, 2022 1 commit

Record op shape data for profiler [cherry-pick PR43405 43578 43822] (#44384) · a2240190

Authored by chenjian on Jul 19, 2022

* add serialization for new field in event node (#43405)

* add serialization for new field in event node

* fix a bug

* add more field to memory record (#43578)

* Add infer shape in dygraph (#43822)

* record memory and op supplement info

* update

* update

* fix a bug

* fix memory recording

* fix a bug

* update

* update

* fix a bug

* update

* fix a bug

* fix a bug

* fix a bug

* update dygraph record

* add infer shape record

* fix

* fix

* fix

* add comments

* fix a bug

* fix

* fix

* add record op info

* fix file mode

* add op input shape info

* fix dependency

12 Jul, 2022 1 commit

add new field for event node (#43223) (#44245) · 94271bc2

Authored by chenjian on Jul 12, 2022

* add new field for event node

* fix

* fix bug

* fix bug

* fix clang

* fix clang format

* fix code format

24 Jun, 2022 1 commit

[cherry-pick] NVIDIA fixes (#43780) · 9edbe4aa

Authored by Aganlengzi on Jun 24, 2022

* Use all sitepackages path as the library/include path (#42940)

* Fix several unit tests and increase the unit tests stability (#43670)

* Reduce gather op unit tests size and increase the timeout

* Add NVIDIA_TF32_OVERRIDE for multi-processes environment

* Remove record test for device event ut

* Fix 3 unittest errors (#43532)

* Fix test_fuse_resnet_unit failure

* Fix test_imperative_auto_mixed_precision failure

* Fix sparse_attention_op error

* Fix sparse_attention_op error

* Use fixed random seed (#43659)

* for CI test_collective_sendrecv_api
Co-authored-by: zlsh80826 <rewang@nvidia.com>
Co-authored-by: Shijie <505749828@qq.com>

14 Jun, 2022 1 commit

[ CherryPick ] Cherry pick for einsum optimization. (#43468) · 22e75d92

Authored by xiongkun on Jun 14, 2022

* [EinsumOp] Polish forward logic and backward logic for optimize (#42603)

* change logic for optimize

* modify

* merge

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010)

* [EinsumOp] Make EinsumOp support bfloat16. (#43085)

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0

* make EInsumOP support bf16

* add unittest for BF16

* add condition for test_BF16

* fix bugs

* fix

* change the backward api to fit einsum op

10 May, 2022 2 commits

[cherry-pick][MLU] support add callback to stream and profiler (#42115) · 25124d7f

Authored by fwenguang on May 10, 2022

* [MLU] add mlu new profiler (#41138)

* [MLU] add mlu new profiler

* fix format

* [MLU] support add callback to stream (#41831)

* [MLU] add gather mlu kernel (#41969)

* [MLU] add mlu activation kernels (#41751)

set custom_nll_loss_op attr ignoreIndex to str (#42596) · 6c935e1d
Authored by Allen Guo on May 10, 2022
```
set attr ignoreIndex type to string for custom_nllloss_op

Partial cherry-pick of #42534
```
09 May, 2022 1 commit

[Cherry-pick][IPU] merge recent changes (#42078) (#42582) · 1f9b60df

Authored by Allen Guo on May 09, 2022

    add class NameScopeHelper for adding namescope info
    add mappings for more kinds of optimizer state
    add a compilation_progress_logger option to IpuStrategy for reporting compilation progress
    some code cleanup and miscellaneous optimizations

04 May, 2022 1 commit
- fix bug when compiling with cusparse in CUDA version >=11.4 (#42456) · b57c132a
  Authored by XiaoguangHu on May 04, 2022
  
22 Apr, 2022 1 commit
- Add UT (#42055) · 4f6aba87
  Authored by Jacek Czaja on Apr 22, 2022
  
21 Apr, 2022 2 commits

support bce_loss and bce_loss_grad in XPU, test=kunlun (#41610) · b1ba98ca
Authored by zhangyikun02 on Apr 13, 2022

[cherry-pick]support multi_layer of bilstm,*test=kunlun (#42076) · 58f6d459

Authored by z8hanghuan on Apr 21, 2022

* modify xpu.cmake,*test=kunlun (#41832)

* modify xpu.cmake,*test=kunlun

* modify xpu.cmake,*test=kunlun

* modify xpu.cmake,*test=kunlun

* modify xpu.cmake,*test=kunlun

* support bilstm,*test=kunlun

* [cherry-pick]support multi_layer of bilstm,*test=kunlun

20 Apr, 2022 1 commit

[Phi] Support construct Scalar by using Non-CPU Tensor (#41765) (#41963) · 3b25afb2

Authored by YuanRisheng on Apr 20, 2022

* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency

* deal with conflict

* fix bugs when run unit test

* fix unit test bugs

19 Apr, 2022 2 commits
- [Cherry-pick 2.3] Autotune the workspace and kernel choosing of conv (#41833) · b4adbe5c
  Authored by Yiqun Liu on Apr 19, 2022
```
Cherry-pick #40338 #41741 #41313
```
- cinn_launch_op: optimize the overhead of preparing variables before executing... · dab7dfbf
  Authored by TeFeng Chen on Apr 19, 2022
```
cinn_launch_op: optimize the overhead of preparing variables before executing cinn compiled program (#41777) (#41910)

cherry-pick #41777
* optimize preparation overhead before executing cinn compiled program
```
15 Apr, 2022 1 commit
- support more ops (#41421) (#41731) · 9f2ae360
  Authored by Allen Guo on Apr 15, 2022
  
13 Apr, 2022 1 commit
- Revert "[Phi] Migrate Adam and AdamW into Phi (#40351)" (#41712) · 8663376f
  Authored by Aurelius84 on Apr 13, 2022
```
* Revert "[Phi] Migrate Adam and AdamW into Phi (#40351)"

This reverts commit 56cd3407.

* add infermeta
```
12 Apr, 2022 3 commits
- [cherry pick]fix paddle tensor numel check (#41665) · 6a1ddd61
  Authored by JingZhuangzhuang on Apr 12, 2022
  
- fix dynamic flag bug on mac (#41571) (#41660) · 883d5be3
  Authored by zhouweiwei2014 on Apr 12, 2022
```
cherry-pick #41571
```
- Fix dygraph record event position (#41445) (#41608) · 727dcbd9
  Authored by chenjian on Apr 12, 2022
```
* no

* maintain old profiler

* fix old dygraph record event
```
08 Apr, 2022 2 commits
- [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523) · ebe72b88
  Authored by Qi Li on Apr 08, 2022
```
Cherry-pick of #41521
```
- [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove... · cb7551fd
  Authored by Zhang Jun on Apr 08, 2022
```
[cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341)  (#41491)
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
```
06 Apr, 2022 1 commit
- [IPU] remove paddle_ipu shared library (#41307) · 229e91bf
  Authored by Allen Guo on Apr 06, 2022
```
* remove paddle_ipu shared library

* fix unique_name
```
03 Apr, 2022 1 commit

add maximum limit for grid of index_select (#41127) · af8d2482

Authored by FlyingQianMM on Apr 03, 2022

* limit grid dim for index select

* mv LimitGridDim into gpu_launch_config.h

* fix conflicts

* fix conflicts

* fix code style

* set block to 256

* fix grid setting

* set dtype of block_dim to unsigned int
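
The grid-limiting idea named in this commit can be sketched in plain Python. This is only an illustration, not Paddle's `LimitGridDim` code: the function names, the `max_grid` parameter and its value of 65535 are assumptions made for the example (the block size of 256 comes from the commit message). The sketch clamps the block count to a maximum and relies on a grid-stride loop so that every element is still processed once.

```python
def launch_config(numel, block=256, max_grid=65535):
    """Return (grid, block) with the grid clamped to max_grid (hypothetical helper)."""
    if numel <= 0:
        raise ValueError("numel must be positive")
    grid = (numel + block - 1) // block  # ceil(numel / block)
    return min(grid, max_grid), block


def simulate_grid_stride(numel, grid, block):
    """Simulate a grid-stride loop and count how often each index is visited."""
    visits = [0] * numel
    stride = grid * block  # total number of simulated threads
    for tid in range(stride):
        # Each thread handles tid, tid + stride, tid + 2 * stride, ...
        for i in range(tid, numel, stride):
            visits[i] += 1
    return visits


# A tiny max_grid forces the clamp, yet every element is still visited once.
grid, block = launch_config(10_000, max_grid=8)
visits = simulate_grid_stride(10_000, grid, block)
print(grid, block, all(v == 1 for v in visits))
```

Clamping alone would leave elements unprocessed, which is why the sketch pairs it with the strided loop.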

01 Apr, 2022 2 commits

[Eager] Support pinned (#41035) · f3270fc8

Authored by wanghuancoder on Apr 01, 2022

* support pinned, test=develop

* support async_write, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

support multi_layer of bilstm,*test=kunlun (#41151) · 00d23897

Authored by z8hanghuan on Apr 01, 2022

* support multi_layer of bilstm,*test=kunlun

* support multi_layer of bilstm, *test=kunlun

* support multi_layer of bilstm, *test=kunlun

* support multi_layer of bilstm, *test=kunlun

31 Mar, 2022 3 commits

[new-exec] fit mkldnn op (#41058) · 02cf6764

Authored by Leo Chen on Mar 31, 2022

* fix bug that some op has no op_role attr

* add mkldnn support for new executor

* fit for mkldnn data_transfer

* fit for mkldnn data_transfer

Maintain old profiler (#41132) · a6bf2218

Authored by chenjian on Mar 31, 2022

* no

* maintain old profiler

* exclude new python record events for old profiler

* maintain old profiler

* maintain

* maintain old profiler

* maintain

* fix cmakes

Add time range duration display (#41029) · 6744754f

Authored by chenjian on Mar 31, 2022

* no

* fix bugs

* fix doc according to review

* fix api doc format

* fix api doc according to review

* fix bug and add unit test

* fix record event bug

* optimize chrome tracing display

* fix bug

* add comment

* add unit test

* fix a bug

* fix

* fix

* fix format

30 Mar, 2022 3 commits

Add new APIs for GPU memory monitoring (max_memory_allocated,... · afe02e9d

Authored by From00 on Mar 30, 2022

Add new APIs for GPU memory monitoring (max_memory_allocated, max_memory_reserved, memory_allocated, memory_reserved) (#38657)

* Add new API memory_reserved

* Add memory_allocated, max_memory_reserved and max_memory_allocated

* Fix CI error

* Fix CI error

* Enhance UT

* Add FLAGS_memory_stats_opt

* Add STATS macro functions

* Add StatAllocator

* Fix CI errors

* Add UT

* Fix CI errors
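
The relationship between the "current" and "max" counters named in this commit (`memory_allocated` versus `max_memory_allocated`) can be illustrated with a small standalone sketch. The `MemoryStat` class below is hypothetical and is not Paddle's `StatAllocator` or its STATS macros; it only shows the usual bookkeeping, in which a peak value is updated whenever the current value changes.

```python
class MemoryStat:
    """Track a current byte count and its historical peak (hypothetical helper)."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def update(self, delta):
        """Apply an allocation (delta > 0) or a free (delta < 0), in bytes."""
        if self.current + delta < 0:
            raise ValueError("freed more bytes than are currently allocated")
        self.current += delta
        # The peak never decreases; it only follows new maxima.
        self.peak = max(self.peak, self.current)
        return self.current


stat = MemoryStat()
stat.update(1024)   # allocate 1 KiB
stat.update(4096)   # allocate 4 KiB
stat.update(-1024)  # free 1 KiB
print(stat.current, stat.peak)  # prints "4096 5120"
```

After the free, the current value drops while the peak stays at the highest level reached, which is the distinction the two pairs of API names suggest.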

add bilinear interpolate v2 to xpu list and unitteset, *test=kunlun (#41037) · 4e86dff2

Authored by ykkk2333 on Mar 30, 2022

* add bilinear interpolate v2 to xpu list and unitteset, *test=kunlun

* Delete ps_usr_print_log

* Delete ps_usr_print_log

* Delete xpu_op_test

swish and pow op for xpu test=kunlun (#40654) · d951f3af

Authored by houj04 on Mar 30, 2022

* swish and pow op for xpu. test=kunlun

* fix code style. test=kunlun.

* use pow_grad xdnn api. test=kunlun.

29 Mar, 2022 1 commit
- softmax_with_cross_entropy support fp16 on xpu, test=kunlun (#40869) · 649948a6
  Authored by zhangyikun02 on Mar 29, 2022
  
28 Mar, 2022 1 commit

Fix profiler package bug (#40888) · 77a455c7

Authored by chenjian on Mar 28, 2022

* no

* fix bugs

* fix doc according to review

* fix api doc format

* fix api doc according to review

* fix bug and add unit test

* fix record event bug

27 Mar, 2022 1 commit

[new-exec] fit for mkldnn and inplace op (#40955) · afa0e82c

Authored by Leo Chen on Mar 27, 2022

* fit for mkldnn and inplace op

* fix compile

* refine ut

* register op version

* fix inplace op

* fix transfer_layout

25 Mar, 2022 2 commits

[Phi] Migrate Adam and AdamW into Phi (#40351) · 56cd3407

Authored by Aurelius84 on Mar 25, 2022

* [Phi] Migrate Adam and Adamw into Phi

* fix compile error and unittest ok

* fix compile error and unittest ok

* fix undefined reference to fLI::FLAGS

* test depend on operator

* fix cmake

* fix xpu compile

* fix infrt

* fix amp_type_traits

* fix amp_type_traits

* modify according reviewer

* modify according reviewer

* fix dtype float16

* fix typo

* fix Cmake

* fix code style

add maximum limit for grid of reduce, elementwise, gather and scatter (#40813) · 608a5f55
Authored by FlyingQianMM on Mar 25, 2022
```
* add maximum limit for grid of reduce, elementwise and gather

* add {} after if
```
23 Mar, 2022 3 commits

[NPU] add npu support for conv3d and conv3d_grad (#38480) · ff568afa

Authored by furnace on Mar 23, 2022

* [NPU] add npu support for conv3d and conv3d_grad

* [NPU] delete failed unittests due to Ascend not support

* [NPU] delete debug codes

* [NPU] optimize codes, notest

* [NPU] remove const_cast

* [NPU] optimize for remove const_cast

* [NPU] fix written errors

Performance optimization for StreamSafeCudaAllocator (#40718) · d8bff988

Authored by From00 on Mar 23, 2022

* Performance optimize

* Optimize GetAllocator, RWLock and ProcessUnfreedAllocation

* Remove test file

* Fix CI error

* Fix CI errors

* Fix CI errors

Add profiler features (#40357) · c15e3823

Authored by chenjian on Mar 23, 2022

* add event record for model profiling

* fix format

* fix format

* fix code example bug

* no

* add profiler statistic

* add profiler feature

* fix bug

* fix bug

* fix bug

* fix bug

* required: gpu

* required: gpu

* fix bug

* required: gpu

* fix ci bug

* fix ci error

* fix ci error

* upgrade document

* fix doc

* fix ci bug

* add doc and fix bug

* nothing

* fix bug

* fix format bug

* modify format

* add deprecated description for old profiler

* fix bug

* fix bug

* fix

* add load_profiler_result doc

* add load_profiler_result doc

* add load_profiler_result doc

* help fix old profiler sample code

* add api doc

* fix format

* fix api doc

* fix api doc format

* fix api doc format

* fix api doc c format

* fix api doc format


PaddlePaddle / Paddle: last synced successfully more than 1 year ago
