Commits · 6ab43f7fe8a9876293f3bc93a86c1a38588c0ae5 · Crayon鑫 / Paddle

30 Apr, 2021 7 commits

remove check for optim_cache_dir in trt slim int8 (#32676) · c6713bc0
Committed by Pei Yang on Apr 30, 2021

c6713bc0

add API Tensor.item() to convert Tensor element to a Python scalar (#32561) · 7e2b60a4
Committed by Zhou Wei on Apr 30, 2021

7e2b60a4

Add 12 inplace APIs including auto generated (#32573) · 308073de

Committed by pangyoki on Apr 30, 2021

* add relu6_ hardsigmoid_ leaky_relu_ Inplace APIs

* add softmax_with_cross_entropy_ Inplace API

* add clip_ scale_ add_ subtract_ Inplace APIs

* add wlist

* fix parameter of scale api

* add add_n_ Inplace API and remove log_ Inplace API

* fix elementwise_add_ and elementwise_sub_ broadcast problem

* elementwise inplace api give error message before run the op

* use broadcast_shape in elementwise inplace op

* add 8 inplace apis that is auto generated

* add unittest for all inplace apis

* add decorator for inplace apis in static mode

* fix windows blas fail of exp inplace api, change array_equal to allclose

* add flatten inplace api

* add flatten unittest

* fix flatten unittest

* add decorator

* fix grad.numpy in test_pylayer_op

* unsupport softmax_with_cross_entropy_

* add test_inplace_softmax_with_cross_entropy to static_mode_white_list

* delete __all__ in inplace_utils

* delete activation inplace function and add Tensor.inplace_func

* change paddle.inplace_ to Tensor.inplace_

* fix little problem

* add paddle in inplace_utils

308073de

remove is_test=True in grad (#32678) · bd8d35a2
Committed by ceci3 on Apr 30, 2021

bd8d35a2

test=develop, optimize index_sampler (#32663) · 5ada0329
Committed by 123malin on Apr 30, 2021

5ada0329

add_c_sync_npu_kernel (#32687) · 8fd724a5
Committed by Baibaifan on Apr 30, 2021

8fd724a5

Reduce grad fix (#32592) · 43527a2b
Committed by jakpiase on Apr 30, 2021

43527a2b

29 Apr, 2021 10 commits
- [Kunlun]fix multi xpu dygraph hang, test=kunlun (#32662) · a3e77197
  Committed by liuyuhui on Apr 29, 2021
  
  a3e77197
- [NPU] refine FillNpuTensorWithConstant (#32682) · 0f578db9
  Committed by Leo Chen on Apr 29, 2021
  
  0f578db9
- Add op read_file and decode_jpeg (#32564) · b22f6d69
  Committed by LielinJiang on Apr 29, 2021
```
* add op read_file and decode_jpeg
```
  b22f6d69
- normalized custom operator impl (#32666) · 7a73692b
  Committed by Chen Weihang on Apr 29, 2021
  
  7a73692b
- forward return any type. (#32661) · b6ca6a55
  Committed by WeiXin on Apr 29, 2021
  
  b6ca6a55
- skip fuse repeated fc when the fc with weight padding (#32648) · b7ddd7d7
  Committed by cc on Apr 29, 2021
  
  b7ddd7d7
- specify multihead_matmul_fuse_pass_v3 QK path (#32659) · 8ccf549b
  Committed by Pei Yang on Apr 29, 2021
  
  8ccf549b
- Add BF16 uniform random initializer (#32468) · f46f15a0
  Committed by joanna.wozna.intel on Apr 29, 2021
```
* Add bf16 uniform random initializer

* Remove duplicated section

* Change UT to CPU place only

* Put detail functions into anonymous namespace
```
  f46f15a0
- fix mem release error. (#32654) · dec8ab8f
  Committed by Wilber on Apr 29, 2021
  
  dec8ab8f
- [Paddle-TRT] Implement MHA fp16 order same as training (#32629) · 75282e74
  Committed by zlsh80826 on Apr 29, 2021
```
* implement MHA order same as training

* fix fp16 compile issue on old architecture

* fix format

* fix format
```
  75282e74
28 Apr, 2021 8 commits

[NPU] add input EpsilonTensor for adam (#32605) · 119cda3d

Committed by Leo Chen on Apr 28, 2021

* add input EpsilonTensor for adam

* update python api

* add unit test

* add npu test

* add more ut

119cda3d

Added pure_bf16 mode (#32281) · bc379ca3
Committed by arlesniak on Apr 28, 2021

bc379ca3

Nne integration (#32604) · abcb3f54

Committed by denglin-github on Apr 28, 2021

* Add dlnne engine runtime

* Fix log

* Remove <const_cast> and remove unrelated modify with dlnne, +clang-format

* Fix CMakeList format error

* Add copyright message

* Fix dlnne CMakeList.txt

* Add some paddlepaddle_pass to support more networks

* Fix some format bug

* Add delete dropout_op pass

* Fix some format bug

* Fix format bug

abcb3f54

[PsCore] solve Brpc dep (#32632) · 4ead9a5a

Committed by Thunderbrook on Apr 28, 2021

* Revert "Revert "[PsCore] optimize performance of large kv (#32535)" (#32599)"

This reverts commit 809ac036.

* brpc dep

4ead9a5a

Fix some error message (#32614) · 9ee709fc

Committed by Kqnonrime on Apr 28, 2021

* fix two error message

* fix two error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix some error

* fix error

* fix some error

* fix some error

* fix some error

* fix one error

* fix some error

* fix seven error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix error

* fix some error

* fix some error

9ee709fc

[Rocm] fix test_var_base (#32639) · 7a245b7a
Committed by zhulei on Apr 28, 2021

7a245b7a

[oneDNN] Added clearing oneDNN cache per executor (#32499) · ba610761
Committed by Jacek Czaja on Apr 28, 2021
```
* - Added clearing oneDNN per executor

* - Executor is nt always having FLAGS_use_mkldnn set to true
```
ba610761

Optimize update_loss_scaling_op (#32554) · 0dc02dc7

Committed by jiangcheng on Apr 28, 2021

* optimize update_loss_scaling_op by fused for loop to one kernel, test=develop

* remove useless while loop and optimize variable name, test=develop

* optimize variable name from out_addrs_tensor to out_addrs_mem, test=develop

* optimize variable name for readable by change prefix identifier from t_ to local_

0dc02dc7

27 Apr, 2021 10 commits
- add alltoall api (#32507) · db41b742
  Committed by lilong12 on Apr 27, 2021
```
* add alltoall api, test=develop
```
  db41b742
- clear 'BasicEngine' when an exception occurs in the backward. (#32546) · 797b2dfd
  Committed by WeiXin on Apr 27, 2021
```
* clear 'BasicEngine' when an exception occurs in the backward.

* deal with conflict.

* deal with conflict.
```
  797b2dfd
- conservative judgment (#32556) · f285f4c1
  Committed by wenbin on Apr 27, 2021
  
  f285f4c1
- [OPs] Bug fix, fix the segment mean for illegal syncthreads usage. (#32596) · 1afe1ac9
  Committed by Zhong Hui on Apr 27, 2021
```
* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
```
  1afe1ac9
- Unify the implementation of activation operation (#32348) · eca8dcc7
  Committed by Zhang Zheng on Apr 27, 2021
  
  eca8dcc7
- slove develop bugs (#32560) · 6f6e159a
  Committed by Baibaifan on Apr 27, 2021
  
  6f6e159a
- support depthwise_conv2d_transpose (#32593) · 85e697d7
  Committed by Pei Yang on Apr 27, 2021
  
  85e697d7
- Revert "[PsCore] optimize performance of large kv (#32535)" (#32599) · 809ac036
  Committed by tianshuo78520a on Apr 27, 2021
```
This reverts commit 4b7242b0.
```
  809ac036
- Fix grad calculation bug in tensor_array_to_tensor (#32558) · 6579432f
  Committed by Aurelius84 on Apr 27, 2021
  
  6579432f
- Check for cuda errors immediately after kernel launch (#32557) · 19eefef4
  Committed by XiangGao on Apr 27, 2021
```
Co-authored-by: NYang Zhang <yangzhang@live.com>
```
  19eefef4
26 Apr, 2021 5 commits

add send/recv api (#32504) · c47bafc6
Committed by lilong12 on Apr 26, 2021
```
* add sendrecv, test=develop
```
c47bafc6

Fix OPENBLAS ci and fix windows CPU CI to parallel compile (#32548) · 1ec9525a
Committed by Zhou Wei on Apr 26, 2021
```
* clear CUDA compile environment on windows

* fix Windows CI

* fix Windows CI

* fix Windows CI
```
1ec9525a

Optimize where_index_op(prefix sum) (#30601) · 6ec4e640

Committed by jiangcheng on Apr 26, 2021

* new optimize for where_index_op with prefix sum version.

* write a scan prefix sum kernel with stream for where index op.

* optimize where_index by using cub::DeviceScan::InclusiveSum instead of imperfect self-kernel.

* remove CheckTrue struct and rename stide_array for readable.

* optimize variable name for readable.

* optimize function name and annotation.

6ec4e640

[PsCore] optimize performance of large kv (#32535) · 4b7242b0
Committed by Thunderbrook on Apr 26, 2021
```
* optimize pull sparse

* optimize pull sparse

* change macro

* format
```
4b7242b0

Unset ReserveSpace of batch_norm for inference program. (#32493) · 202b0eaf
Committed by Yiqun Liu on Apr 26, 2021
```
* Unset ReserveSpace for inference program.

* Support training from an inference program.
```
202b0eaf

Crayon鑫 / Paddle is in sync with its fork source project
