Commits · ab1a4df95ca85884dd6c8c0ae012cddbf2c681d0 · BaiXuePrincess / Paddle

19 May, 2021 · 1 commit

【cherrypick】support cuda11 for heterps; add profiler in oneps (#32957) · ab1a4df9

Committed by danleifeng on May 19, 2021

* cherrypick for #32640 :add profile and fix dataset hang in heterps;test=develop

* cherrypick for #32640 :add profile and fix dataset hang in heterps;test=develop

* cherrypick for #32640 :add profile and fix dataset hang in heterps;test=develop

ab1a4df9

18 May, 2021 · 1 commit
- bugfix: parallel_executor for xpu should use BindThreadedSSAGraphExecutor (#32792) (#32933) · b619648c
  Committed by houj04 on May 18, 2021
  
  b619648c
11 May, 2021 · 1 commit
- fix find_unused_parameters default value (#32829) · 02513207
  Committed by ShenLiang on May 11, 2021
```
fix error log for reducer

fix doc

fix bug of utest

fix spawn

fix coverage
```
  02513207
07 May, 2021 · 6 commits

[Paddle-TRT] Implement MHA fp16 order same as training (#32629) (#32785) · 09b18a49

Committed by Shang Zhizhou on May 07, 2021

* implement MHA order same as training

* fix fp16 compile issue on old architecture

Co-authored-by: zlsh80826 <rewang@nvidia.com>

09b18a49

fix stack grad gpu (#32781) · f54fb1ee
Committed by Jiawei Wang on May 07, 2021

f54fb1ee

[Cherrypick 2.1] fix compile error on jetson platform (#32760) · ded39f84
Committed by LielinJiang on May 07, 2021
```
* fix compile error on jetson platform

* remove unused head file

* rm decode_jpeg op on jetson platform
```
ded39f84

[CHERRY-PICK2.1]Remove paddle_custom_op dynamic libraries, and link to... · 3ba8c48a

Committed by Zhou Wei on May 07, 2021

 [CHERRY-PICK2.1]Remove paddle_custom_op dynamic libraries, and link to FLUID_CORE on windows (#32583) (#32769)

* Remove paddle_custom_op dynamic libraries, change link to FLUID_CORE on windows, and check copy_to

* fix CI

3ba8c48a

pylayer_op:release context after compute. (#32707) (#32744) · c67a5d98
Committed by WeiXin on May 07, 2021
```
Fixes the memory (GPU memory) leak in py_layer_op caused by PyLayerContext not being destructed.

Original PR: #32707
```
c67a5d98

[Cherry-Pick] Clear 'BasicEngine' when an exception occurs in the backward. (#32546) (#32615) · 7e35ef3a

Committed by WeiXin on May 07, 2021

* clear 'BasicEngine' when an exception occurs in the backward. (#32546)

* clear 'BasicEngine' when an exception occurs in the backward.

* deal with conflict.

* deal with conflict.

* forward return any type. (#32661)

7e35ef3a

06 May, 2021 · 5 commits
- [cherry-pick] Sum kernel for CPU supporting BF16 and SelectedRows (#32631) (#32755) · f3436af1
  Committed by Adam Osewski on May 06, 2021
  
  f3436af1
- [CHERRY-PICK] Reduce grad fix cherrypick (#32742) · 21448525
  Committed by jakpiase on May 06, 2021
```
* base changes for fix

* minor change

* fix for bwd kernel

* removed unnecessary import

* implemented reviewers suggestions

* CI fix
```
  21448525
- cherry-pick:change softmax_with_cross_entropy_op's parameter name from... · 9a589de8
  Committed by chajchaj on May 06, 2021
```
cherry-pick:change softmax_with_cross_entropy_op's parameter name from softmax_switch to use_softmax (#32750)

* change parameter name from softmax_switch to use_softmax, test=develop

* cherry-pick:change parameter name from softmax_switch to use_softmax, test=develop
```
  9a589de8
- update, test=develop (#32731) · df00636b
  Committed by lilong12 on May 06, 2021
  
  df00636b
- add API Tensor.item() to convert Tensor element to a Python scalar (#32634) · 035c7425
  Committed by Zhou Wei on May 06, 2021
```
cherry-pick #32561
```
  035c7425
05 May, 2021 · 1 commit
- fix traverse graph in reducer (#32721) · 4626afa4
  Committed by ShenLiang on May 05, 2021
  
  4626afa4
04 May, 2021 · 1 commit
- add_c_sync_npu_kernel (#32687) (#32723) · 4593597d
  Committed by Baibaifan on May 04, 2021
  
  4593597d
01 May, 2021 · 2 commits
- slove develop bugs (#32560) (#32684) · 6a1957e7
  Committed by Baibaifan on May 01, 2021
  
  6a1957e7
- [Kunlun]fix multi xpu dygraph hang, test=kunlun (#32662) (#32696) · 2c1ed9b8
  Committed by liuyuhui on May 01, 2021
  
  2c1ed9b8
30 April, 2021 · 6 commits

add flag to check_kernel launch (#32692) (#32709) · 09adf20f
Committed by XiangGao on April 30, 2021

09adf20f

Add 12 inplace APIs including auto generated (#32573) (#32699) · 097d5f52

Committed by pangyoki on April 30, 2021

* add relu6_ hardsigmoid_ leaky_relu_ Inplace APIs

* add softmax_with_cross_entropy_ Inplace API

* add clip_ scale_ add_ subtract_ Inplace APIs

* add wlist

* fix parameter of scale api

* add add_n_ Inplace API and remove log_ Inplace API

* fix elementwise_add_ and elementwise_sub_ broadcast problem

* elementwise inplace api give error message before run the op

* use broadcast_shape in elementwise inplace op

* add 8 inplace apis that is auto generated

* add unittest for all inplace apis

* add decorator for inplace apis in static mode

* fix windows blas fail of exp inplace api, change array_equal to allclose

* add flatten inplace api

* add flatten unittest

* fix flatten unittest

* add decorator

* fix grad.numpy in test_pylayer_op

* unsupport softmax_with_cross_entropy_

* add test_inplace_softmax_with_cross_entropy to static_mode_white_list

* delete __all__ in inplace_utils

* delete activation inplace function and add Tensor.inplace_func

* change paddle.inplace_ to Tensor.inplace_

* fix little problem

* add paddle in inplace_utils

097d5f52

remove is_test=True in grad (#32683) · 1a417a4c
Committed by ceci3 on April 30, 2021

1a417a4c

Add op read_file and decode_jpeg (#32564) (#32686) · 2817239a
Committed by LielinJiang on April 30, 2021
```
* add op read_file and decode_jpeg
```
2817239a

skip fuse repeated fc when the fc with weight padding (#32648) (#32680) · 79ce2a6c
Committed by cc on April 30, 2021

79ce2a6c

Nne integration (#32604) (#32658) · cb506579

Committed by Shang Zhizhou on April 30, 2021

* Add dlnne engine runtime

* Remove <const_cast> and remove unrelated modify with dlnne, +clang-format

* Add copyright message

* Add some paddlepaddle_pass to support more networks

* Add delete dropout_op pass

Co-authored-by: denglin-github <82362191+denglin-github@users.noreply.github.com>

cb506579

29 April, 2021 · 6 commits
- Add BF16 uniform random initializer (#32468) (#32677) · e7c81600
  Committed by joanna.wozna.intel on April 29, 2021
```
* Add bf16 uniform random initializer

* Remove duplicated section

* Change UT to CPU place only

* Put detail functions into anonymous namespace
```
  e7c81600
- Added pure_bf16 mode (#32281) (#32681) · 93535c59
  Committed by arlesniak on April 29, 2021
```
This is cherry-pick of #32281
```
  93535c59
- [Cherry-pick] Polish custom operator overrided method impl (#32666) (#32674) · ca2ef414
  Committed by Chen Weihang on April 29, 2021
```
cherry-pick of #32666
```
  ca2ef414
- specify multihead_matmul_fuse_pass_v3 QK path (#32659) (#32668) · 30dfa745
  Committed by Pei Yang on April 29, 2021
  
  30dfa745
- fix mem release error. (#32655) · a5627df3
  Committed by Wilber on April 29, 2021
```
What is the follow-up fix plan?
```
  a5627df3
- Added clearing oneDNN per executor (#32664) · 7ae0a80f
  Committed by Jacek Czaja on April 29, 2021
```
- Executor does not always have FLAGS_use_mkldnn set to true
```
  7ae0a80f
28 April, 2021 · 2 commits

conservative judgment (#32619) · 056a2fca
Committed by wenbin on April 28, 2021

056a2fca

[Cherry-pick] Optimize update_loss_scaling_op(#32554) (#32606) · 33703da8

Committed by jiangcheng on April 28, 2021

* optimize update_loss_scaling_op by fused for loop to one kernel, test=develop

* remove useless while loop and optimize variable name, test=develop

* optimize variable name from out_addrs_tensor to out_addrs_mem, test=develop

* optimize variable name for readable by change prefix identifier from t_ to local_

33703da8

27 April, 2021 · 5 commits
- [OPs] Bug fix, fix the segment mean for illegal syncthreads usage. (#32596) (#32610) · 54ab656c
  Committed by Zhong Hui on April 27, 2021
```
* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
```
  54ab656c
- support depthwise_conv2d_transpose (#32593) · 85e697d7
  Committed by Pei Yang on April 27, 2021
  
  85e697d7
- Revert "[PsCore] optimize performance of large kv (#32535)" (#32599) · 809ac036
  Committed by tianshuo78520a on April 27, 2021
```
This reverts commit 4b7242b0.
```
  809ac036
- Fix grad calculation bug in tensor_array_to_tensor (#32558) · 6579432f
  Committed by Aurelius84 on April 27, 2021
  
  6579432f
- Check for cuda errors immediately after kernel launch (#32557) · 19eefef4
  Committed by XiangGao on April 27, 2021
```
Co-authored-by: Yang Zhang <yangzhang@live.com>
```
  19eefef4
26 April, 2021 · 3 commits

add send/recv api (#32504) · c47bafc6
Committed by lilong12 on April 26, 2021
```
* add sendrecv, test=develop
```
c47bafc6

Optimize where_index_op(prefix sum) (#30601) · 6ec4e640

Committed by jiangcheng on April 26, 2021

* new optimize for where_index_op with prefix sum version.

* write a scan prefix sum kernel with stream for where index op.

* optimize where_index by using cub::DeviceScan::InclusiveSum instead of imperfect self-kernel.

* remove CheckTrue struct and rename stide_array for readable.

* optimize variable name for readable.

* optimize function name and annotation.

6ec4e640
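
The where_index_op commit above mentions switching to an inclusive prefix sum (`cub::DeviceScan::InclusiveSum`). As a rough illustration of why a prefix sum helps, here is a minimal CPU-side sketch in Python. The function name `where_index_prefix_sum` is hypothetical and this is not the Paddle CUDA kernel; it only shows how an inclusive scan over 0/1 flags gives each nonzero element its output slot.

```python
from itertools import accumulate


def where_index_prefix_sum(values):
    """Return indices of truthy elements using an inclusive prefix sum.

    Hypothetical CPU illustration only; not the Paddle CUDA kernel.
    """
    flags = [1 if v else 0 for v in values]
    # Inclusive scan: scan[i] counts the truthy elements in values[0..i].
    scan = list(accumulate(flags))
    total = scan[-1] if scan else 0
    out = [0] * total
    # A truthy element at position i writes to slot scan[i] - 1, so each
    # element's destination is known without looking at its neighbours.
    for i, flag in enumerate(flags):
        if flag:
            out[scan[i] - 1] = i
    return out
```

Because every destination slot depends only on the scan result, the final write loop could in principle be run with one thread per element, which is presumably the property the GPU version relies on.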

[PsCore] optimize performance of large kv (#32535) · 4b7242b0
Committed by Thunderbrook on April 26, 2021
```
* optimize pull sparse

* optimize pull sparse

* change macro

* format
```
4b7242b0

BaiXuePrincess / Paddle: in sync with the fork's source project
