Commits · 14440905d5555e9903ee7b99475de3f4cdcc4348 · 机器未来 / Paddle

11 Jun, 2021 · 2 commits
- [Cherry-pick] Support diff dataset tensor place in single process dataloader (#33470) (#33487) · 14440905
  Committed by Chen Weihang on Jun 11, 2021
```
Support diff dataset tensor place in single process dataloader

cherry-pick of #33470
```
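- [cherry-pick]Fixed a bug of log_softmax: op input was modified to 'nan' (#32937) (#33436) · 61cae0df
  Committed by Lijunhui on Jun 11, 2021
```
While running the op benchmark, we found that when the input size is below a certain threshold, the input of the Python-side log_softmax API is overwritten with nan after the computation. The output is correct.

cherry-pick of #32937
```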
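The invariant this fix restores is that log_softmax returns a correct result and leaves its input untouched. That can be checked against a small reference. The sketch below is plain Python, not Paddle's kernel; the name `log_softmax_ref` and the sample values are illustrative assumptions.

```python
import math


def log_softmax_ref(xs):
    """Reference log-softmax over a 1-D list; returns a new list and never writes to xs."""
    m = max(xs)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]


data = [1.0, 2.0, 3.0]   # deliberately small input, as in the bug report
snapshot = list(data)    # copy taken before the call
out = log_softmax_ref(data)

# After the call the input must be unchanged and contain no nan.
input_unchanged = data == snapshot
has_nan = any(math.isnan(v) for v in data)
```

The same before/after comparison could be run against the real op to confirm the regression is gone.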
10 Jun, 2021 · 1 commit
- fix aligned in roi_align (#33446) · 03f46685
  Committed by wangguanzhong on Jun 10, 2021
09 Jun, 2021 · 2 commits
- fix the bug of yolo_box which can't run on nano and tx2 (#33422) (#33442) · d4967224
  Committed by s.feng on Jun 09, 2021
- [Paddle-TRT] Add gather_nd and reduce_sum trt op. (#33324) (#33365) · 6385f5ee
  Committed by Wilber on Jun 09, 2021
08 Jun, 2021 · 1 commit
- OP:strided_slice_op supports bool type inputs (#33373) (#33393) · ccabafa6
  Committed by TeslaZhao on Jun 08, 2021
```
* Fix two english api documents, transpose and strided_slice

* OP:strided_slice_op supports bool type inputs
```
04 Jun, 2021 · 1 commit
- [CherryPick] fix compare ops when broadcast (#33086) · c42ccf14
  Committed by wawltor on Jun 04, 2021
```
* fix compare op in for in the cuda device

* fix the paddle compare op for the broadcast
```
01 Jun, 2021 · 1 commit
- Fix cuda kernel launch of grid sampler (#33100) (#33232) · 8a5a45f8
  Committed by whs on Jun 01, 2021
07 May, 2021 · 4 commits
- fix stack grad gpu (#32781) · f54fb1ee
  Committed by Jiawei Wang on May 07, 2021
- [Cherrypick 2.1] fix compile error on jetson platform (#32760) · ded39f84
  Committed by LielinJiang on May 07, 2021
```
* fix compile error on jetson platform

* remove unused head file

* rm decode_jpeg op on jetson platform
```
- pylayer_op:release context after compute. (#32707) (#32744) · c67a5d98
  Committed by WeiXin on May 07, 2021
```
Fixes a memory (GPU memory) leak in py_layer_op caused by PyLayerContext never being destructed.

Original PR: #32707
```
- [Cherry-Pick] Clear 'BasicEngine' when an exception occurs in the backward. (#32546) (#32615) · 7e35ef3a
  Committed by WeiXin on May 07, 2021
```
* clear 'BasicEngine' when an exception occurs in the backward. (#32546)

* clear 'BasicEngine' when an exception occurs in the backward.

* deal with conflict.

* deal with conflict.

* forward return any type. (#32661)
```
06 May, 2021 · 3 commits
- [cherry-pick] Sum kernel for CPU supporting BF16 and SelectedRows (#32631) (#32755) · f3436af1
  Committed by Adam Osewski on May 06, 2021

- [CHERRY-PICK] Reduce grad fix cherrypick (#32742) · 21448525
  Committed by jakpiase on May 06, 2021
  * base changes for fix
  * minor change
  * fix for bwd kernel
  * removed unnecessary import
  * implemented reviewers suggestions
  * CI fix

- cherry-pick:change softmax_with_cross_entropy_op's parameter name from softmax_switch to use_softmax (#32750) · 9a589de8
  Committed by chajchaj on May 06, 2021
  * change parameter name from softmax_switch to use_softmax, test=develop
  * cherry-pick:change parameter name from softmax_switch to use_softmax, test=develop
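The renamed flag decides whether softmax is applied to the input before the cross-entropy term is computed. A minimal sketch of that switch in plain Python follows; the function `cross_entropy_ref` and its argument layout are assumptions for illustration and do not reproduce the operator's actual signature.

```python
import math


def cross_entropy_ref(inputs, label, use_softmax=True):
    """Cross entropy for one sample with a hard label.

    use_softmax=True:  inputs are raw logits; softmax is applied internally.
    use_softmax=False: inputs are already probabilities.
    """
    if use_softmax:
        m = max(inputs)  # max-shift keeps exp() from overflowing
        log_z = m + math.log(sum(math.exp(v - m) for v in inputs))
        return log_z - inputs[label]
    return -math.log(inputs[label])


logits = [1.0, 2.0, 3.0]
z = sum(math.exp(v) for v in logits)
probs = [math.exp(v) / z for v in logits]

loss_from_logits = cross_entropy_ref(logits, label=2, use_softmax=True)
loss_from_probs = cross_entropy_ref(probs, label=2, use_softmax=False)
```

Both calls should give the same loss, which is the behaviour the flag name `use_softmax` is meant to make obvious.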

04 May, 2021 · 1 commit
- add_c_sync_npu_kernel (#32687) (#32723) · 4593597d
  Committed by Baibaifan on May 04, 2021
01 May, 2021 · 1 commit
- slove develop bugs (#32560) (#32684) · 6a1957e7
  Committed by Baibaifan on May 01, 2021
30 Apr, 2021 · 3 commits
- Add 12 inplace APIs including auto generated (#32573) (#32699) · 097d5f52
  Committed by pangyoki on Apr 30, 2021
  * add relu6_ hardsigmoid_ leaky_relu_ Inplace APIs
  * add softmax_with_cross_entropy_ Inplace API
  * add clip_ scale_ add_ subtract_ Inplace APIs
  * add wlist
  * fix parameter of scale api
  * add add_n_ Inplace API and remove log_ Inplace API
  * fix elementwise_add_ and elementwise_sub_ broadcast problem
  * elementwise inplace api give error message before run the op
  * use broadcast_shape in elementwise inplace op
  * add 8 inplace apis that is auto generated
  * add unittest for all inplace apis
  * add decorator for inplace apis in static mode
  * fix windows blas fail of exp inplace api, change array_equal to allclose
  * add flatten inplace api
  * add flatten unittest
  * fix flatten unittest
  * add decorator
  * fix grad.numpy in test_pylayer_op
  * unsupport softmax_with_cross_entropy_
  * add test_inplace_softmax_with_cross_entropy to static_mode_white_list
  * delete __all__ in inplace_utils
  * delete activation inplace function and add Tensor.inplace_func
  * change paddle.inplace_ to Tensor.inplace_
  * fix little problem
  * add paddle in inplace_utils
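Several items in this commit concern broadcasting in the elementwise in-place APIs: an in-place result has to fit in the storage of the first operand, so a shape mismatch is best reported before the op runs. The following is a sketch of such a pre-check under NumPy-style broadcasting rules; `broadcast_shape` and `check_inplace_elementwise` are hypothetical helpers written for this example, not the functions used in the commit.

```python
def broadcast_shape(a, b):
    """Broadcast two shapes (tuples of ints) with right-aligned, NumPy-style rules."""
    out = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1  # missing leading dims count as 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
        out.append(db if da == 1 else da)
    return tuple(reversed(out))


def check_inplace_elementwise(x_shape, y_shape):
    """Raise before any computation if the result would not fit in x's storage."""
    x_shape, y_shape = tuple(x_shape), tuple(y_shape)
    out_shape = broadcast_shape(x_shape, y_shape)
    if out_shape != x_shape:
        raise ValueError(
            f"in-place result shape {out_shape} differs from x shape {x_shape}"
        )
    return out_shape
```

With these helpers, adding a `(3,)` operand into a `(2, 3)` tensor passes the check, while adding a `(2, 3)` operand into a `(3,)` tensor is rejected because the broadcast result would be larger than the destination.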

- remove is_test=True in grad (#32683) · 1a417a4c
  Committed by ceci3 on Apr 30, 2021
- Add op read_file and decode_jpeg (#32564) (#32686) · 2817239a
  Committed by LielinJiang on Apr 30, 2021
```
* add op read_file and decode_jpeg
```

29 Apr, 2021 · 3 commits
- Add BF16 uniform random initializer (#32468) (#32677) · e7c81600
  Committed by joanna.wozna.intel on Apr 29, 2021
```
* Add bf16 uniform random initializer

* Remove duplicated section

* Change UT to CPU place only

* Put detail functions into anonymous namespace
```
- Added pure_bf16 mode (#32281) (#32681) · 93535c59
  Committed by arlesniak on Apr 29, 2021
```
This is cherry-pick of #32281
```
- Added clearing oneDNN per executor (#32664) · 7ae0a80f
  Committed by Jacek Czaja on Apr 29, 2021
```
- Executor does not always have FLAGS_use_mkldnn set to true
```
28 Apr, 2021 · 1 commit
- [Cherry-pick] Optimize update_loss_scaling_op(#32554) (#32606) · 33703da8
  Committed by jiangcheng on Apr 28, 2021
  * optimize update_loss_scaling_op by fused for loop to one kernel, test=develop
  * remove useless while loop and optimize variable name, test=develop
  * optimize variable name from out_addrs_tensor to out_addrs_mem, test=develop
  * optimize variable name for readable by change prefix identifier from t_ to local_

27 Apr, 2021 · 2 commits
- [OPs] Bug fix, fix the segment mean for illegal syncthreads usage. (#32596) (#32610) · 54ab656c
  Committed by Zhong Hui on Apr 27, 2021
```
* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
```
- Fix grad calculation bug in tensor_array_to_tensor (#32558) · 6579432f
  Committed by Aurelius84 on Apr 27, 2021
26 Apr, 2021 · 5 commits
- Optimize where_index_op(prefix sum) (#30601) · 6ec4e640
  Committed by jiangcheng on Apr 26, 2021
  * new optimize for where_index_op with prefix sum version.
  * write a scan prefix sum kernel with stream for where index op.
  * optimize where_index by using cub::DeviceScan::InclusiveSum instead of imperfect self-kernel.
  * remove CheckTrue struct and rename stide_array for readable.
  * optimize variable name for readable.
  * optimize function name and annotation.
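The prefix-sum approach named in this commit can be illustrated on the CPU: mark each nonzero element with a flag, take an inclusive scan of the flags, and use the scan value as the output slot for each selected index. The commit does the scan on the GPU with `cub::DeviceScan::InclusiveSum`; the sequential Python below is only a sketch of the idea, and `where_index_ref` is a name invented for this example.

```python
from itertools import accumulate


def where_index_ref(values):
    """Return the indices of truthy elements using flag -> inclusive scan -> scatter."""
    flags = [1 if v else 0 for v in values]
    scan = list(accumulate(flags))      # inclusive prefix sum of the flags
    total = scan[-1] if scan else 0     # number of selected elements
    out = [0] * total
    for i, flag in enumerate(flags):
        if flag:
            out[scan[i] - 1] = i        # scan[i] is the 1-based output position
    return out


indices = where_index_ref([0, 3, 0, 5, 1])
```

Because every output position is known from the scan alone, the scatter loop has no dependency between iterations, which is what makes the pattern suitable for a parallel kernel.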

- [HybridParallel] fix port reuse when create multi group (#31876) · 41bfec8d
  Committed by WangXi on Apr 26, 2021
- [HybridParallel]Fix model parallel bug by using C++ op (#32536) · ea465fa5
  Committed by ShenLiang on Apr 26, 2021
```
* fix model parallel

* rm parallel_help.py

* add embedding
```
- support backward return None, when corresponding input tensor without gradient (#32494) · 8e66046b
  Committed by WeiXin on Apr 26, 2021
```
* support backward return None.

* edit unittest.

* edit code according to CI

* Improve error information
```

- optimize slice op and slice grad op (#32266) · 5161f71a
  Committed by jiangcheng on Apr 26, 2021
  * optimize slice op and slice grad op, test=develop
  * optimize variable name and annotation information, test=develop

25 Apr, 2021 · 9 commits
- [Setitem] Support grad computation of op set_value (#32431) · 25e723e7
  Committed by liym27 on Apr 25, 2021
- add copy_cross_scope (#32432) · 5943ff7b
  Committed by Baibaifan on Apr 25, 2021
- fix gradient(nan) when two inputs are equal (#32448) · 1896c777
  Committed by Zhang Ting on Apr 25, 2021
- [ROCM] update PADDLE_WITH_ROCM to PADDLE_WITH_HIP, test=develop (#32487) · 3b4dcad7
  Committed by Qi Li on Apr 25, 2021
- add silu op, test=develop (#32384) · 2f351ed5
  Committed by minghaoBD on Apr 25, 2021
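For reference, SiLU is commonly defined as `silu(x) = x * sigmoid(x)`. The scalar sketch below uses that definition in plain Python; it is not the kernel added by the commit, and the two-branch form is a choice made here to keep `math.exp` from overflowing on large-magnitude inputs.

```python
import math


def silu(x):
    """SiLU activation, x * sigmoid(x), for a single float."""
    if x >= 0:
        # exp(-x) <= 1 here, so this branch cannot overflow
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so e is in (0, 1) or underflows to 0.0
    return x * e / (1.0 + e)


y_zero = silu(0.0)
y_one = silu(1.0)
```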
- [BUG FIX] when x.dim < y.dim, the result of compare_op is inverse (#32470) · 78eff521
  Committed by wawltor on Apr 25, 2021
```
* fix bug: when x.dim < y.dim, the result of compare_op is inverse to expected result

* support the cuda for fix the compare broadcast bug
```
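The case this fix targets is a comparison where the left operand has fewer dimensions than the right one, so `x` is broadcast across `y` while the operand order must stay `x < y`. A small pure-Python reference for a rank-1 `x` against a rank-2 `y` is sketched below; `less_than_broadcast` and the sample data are assumptions made for illustration, not code from the PR.

```python
def less_than_broadcast(x, y):
    """Elementwise x < y where x is a 1-D list broadcast over each row of 2-D y."""
    for row in y:
        if len(row) != len(x):
            raise ValueError("each row of y must have the same length as x")
    # Operand order matters: every entry is x[j] < y[i][j], never y[i][j] < x[j].
    return [[xv < yv for xv, yv in zip(x, row)] for row in y]


x = [1, 2, 3]
y = [[0, 2, 4],
     [2, 2, 2]]
result = less_than_broadcast(x, y)
```

A reference like this gives a known-good answer to compare against when testing the broadcast path on both CPU and CUDA.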
- fix reader_blocking_queue_test (#32505) · 4db2cc90
  Committed by Chen Weihang on Apr 25, 2021
- [NPU] refine lookup_table_v2_grad npu_kernel (#32497) · fb7590d4
  Committed by Leo Chen on Apr 25, 2021
```
* use ZerosLike instead of NPUMemsetAsync

* fix compile
```
- Nne integration (#32255) · feb2e476
  Committed by denglin-github on Apr 25, 2021
```
* Add dlnne engine runtime

* Fix log

* Remove <const_cast> and remove unrelated modify with dlnne, +clang-format

* Fix CMakeList format error

* Add copyright message

* Fix dlnne CMakeList.txt

* Add some paddlepaddle_pass to support more networks

* Fix some format bug
```

机器未来 / Paddle is in sync with the upstream project it was forked from.