Commits · 1d1ca0f877e19f5925d35e4ee94e1c27919459af · 机器未来 / Paddle

15 Jul, 2021 1 commit
- [Cherry-Pick]Support finetuning the model saved on the MAC on the Linux (#34027) (#34154) · 1d1ca0f8
  Committed by WeiXin on Jul 15, 2021
```
Fix the issue where a model saved with jit.save on Mac could not be retrained on Linux.

Original PR: #34027
```
  1d1ca0f8
12 Jul, 2021 1 commit

cherry pick xpu to 2.1 (#34000) · 0f266ac1

Committed by taixiurong on Jul 12, 2021

* update xpu cmake for kunlun (#33328)

* xpu support amp (#33809)

* fix bug DLTP-31078 (#33877)

* update xpu cmake (#33906)

* [xpu] add dropout & amp ops in xpu place (#33891)
Co-authored-by: TTerror <tangzhiyi11@users.noreply.github.com>

0f266ac1

09 Jul, 2021 1 commit

[oneDNN] Fix to #33282 , added support of X input broadcasting to oneDNN... · f2f2fd80

Committed by Jacek Czaja on Jul 09, 2021

[oneDNN] Fix to #33282 , added support of X input broadcasting to oneDNN elementwise ops (#33549) (#33845)

* - fix to #33282

* - Increased threshold for elementwise_mul_bf16 grad

* -disabled faulty UT

* - fix to approval

f2f2fd80

01 Jul, 2021 1 commit
- [cherry-pick] fix bug when the cuda kernel config exceeds dims max (#33748) (#33893) · bedcf0dd
  Committed by Leo Chen on Jul 01, 2021
```
fix bug when the cuda kernel config exceeds dims max
```
  bedcf0dd
21 Jun, 2021 2 commits
- fix gpt2 train loss Nan problem by add a line __syncthreads in BlockReduceSum (#33659) · cdeffff4
  Committed by zhiboniu on Jun 21, 2021
  
  cdeffff4
- fix the bug that concat op can't support uint8 (#33667) · 18043ab5
  Committed by 李季 on Jun 21, 2021
  
  18043ab5
16 Jun, 2021 1 commit
- fix gather op and add logsumexp op on kunlun (#32931) (#33592) · 63aeb02d
  Committed by TTerror on Jun 16, 2021
```
* fix gather op and add logsumexp op on kunlun

* update xpu dependency

* update tests and fix elementwise_add
```
  63aeb02d
15 Jun, 2021 3 commits

Cherry-pick support the bool tensor for the compare ops (#33551) · c334d2bd
Committed by wawltor on Jun 15, 2021

c334d2bd

[cherry-pick] fix gather bug && fix hang of new_group (#33553) · a4e841e0

Committed by ShenLiang on Jun 15, 2021

* Fix gather infer shape using axis (#33413)

* fix gather shape bug

* fix None

* fix topo

* Fix hang of hybrid parallel in new_group  (#33141)

* fix hang of hybrid parallel

* fix new_group for hang problem

* fix hang

a4e841e0

[Cherry-Pick] Fix the segfault when using to_tensor in PyLayer. (#33303) (#33518) · 0079e0b1

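Committed by WeiXin on Jun 15, 2021

Fix the bug where returning the result of to_tensor from a PyLayer triggers a segfault.
Causes:

If the stop_gradient attribute has been modified on the Python side, InnerSetOverridedStopGradient on the C++ side cannot change it, so the C++ side now calls SetOverridedStopGradient to modify stop_gradient.
The grad var of a tensor produced by to_tensor has the default DataType (-1), but the grad var's DataType must not be the default (-1) during backward, so ForwardDataType is now called to set the grad var's DataType.

Original PR: #33303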

0079e0b1

12 Jun, 2021 1 commit

Fix LayerNorm Problem Release2.1 (#33534) · a43e1fac

Committed by zhiboniu on Jun 12, 2021

* Eliminate numerical differences of LayerNorm; fix LayerNorm Nan Bug while large data input

* fix bug while large shape of data input

a43e1fac

11 Jun, 2021 3 commits
- [cherry-pick 2.1.1]2.1/fix concat (#33383) · 9567cbd7
  Committed by liuyuhui on Jun 11, 2021
```
* add uint8 for concat (#32850)

* add bool type for tril api (#33402)
```
  9567cbd7
- [Cherry-pick] Support diff dataset tensor place in single process dataloader (#33470) (#33487) · 14440905
  Committed by Chen Weihang on Jun 11, 2021
```
Support diff dataset tensor place in single process dataloader

cherry-pick of #33470
```
  14440905
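- [cherry-pick]Fixed a bug of log_softmax: op input was modified to 'nan' (#32937) (#33436) · 61cae0df
  Committed by Lijunhui on Jun 11, 2021
```
While running op benchmark, we found that when the input size is below a certain threshold, the input of the Python-side log_softmax API is overwritten with nan after the computation. The output itself is correct.

Cherry-picked from #32937
```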
  61cae0df
10 Jun, 2021 1 commit
- fix aligned in roi_align (#33446) · 03f46685
  Committed by wangguanzhong on Jun 10, 2021
  
  03f46685
09 Jun, 2021 2 commits
- fix the bug of yolo_box which can't run on nano and tx2 (#33422) (#33442) · d4967224
  Committed by s.feng on Jun 09, 2021
  
  d4967224
- [Paddle-TRT] Add gather_nd and reduce_sum trt op. (#33324) (#33365) · 6385f5ee
  Committed by Wilber on Jun 09, 2021
  
  6385f5ee
08 Jun, 2021 1 commit
- OP:strided_slice_op supports bool type inputs (#33373) (#33393) · ccabafa6
  Committed by TeslaZhao on Jun 08, 2021
```
* Fix two english api documents, transpose and strided_slice

* OP:strided_slice_op supports bool type inputs
```
  ccabafa6
04 Jun, 2021 1 commit
- [CherryPick] fix compare ops when broadcast (#33086) · c42ccf14
  Committed by wawltor on Jun 04, 2021
```
* fix compare op in for in the cuda device

* fix the paddle compare op for the broadcast
```
  c42ccf14
01 Jun, 2021 1 commit
- Fix cuda kernel launch of grid sampler (#33100) (#33232) · 8a5a45f8
  Committed by whs on Jun 01, 2021
  
  8a5a45f8
07 May, 2021 4 commits
- fix stack grad gpu (#32781) · f54fb1ee
  Committed by Jiawei Wang on May 07, 2021
  
  f54fb1ee
- [Cherrypick 2.1] fix compile error on jetson platform (#32760) · ded39f84
  Committed by LielinJiang on May 07, 2021
```
* fix compile error on jetson platform

* remove unused head file

* rm decode_jpeg op on jetson platform
```
  ded39f84
- pylayer_op:release context after compute. (#32707) (#32744) · c67a5d98
  Committed by WeiXin on May 07, 2021
```
Fix the memory (GPU memory) leak in py_layer_op caused by PyLayerContext not being destructed.

Original PR: #32707
```
  c67a5d98
- [Cherry-Pick] Clear 'BasicEngine' when an exception occurs in the backward. (#32546) (#32615) · 7e35ef3a
  Committed by WeiXin on May 07, 2021
```
* clear 'BasicEngine' when an exception occurs in the backward. (#32546)

* clear 'BasicEngine' when an exception occurs in the backward.

* deal with conflict.

* deal with conflict.

* forward return any type. (#32661)
```
  7e35ef3a
06 May, 2021 3 commits

[cherry-pick] Sum kernel for CPU supporting BF16 and SelectedRows (#32631) (#32755) · f3436af1
Committed by Adam Osewski on May 06, 2021

f3436af1

[CHERRY-PICK] Reduce grad fix cherrypick (#32742) · 21448525

Committed by jakpiase on May 06, 2021

* base changes for fix

* minor change

* fix for bwd kernel

* removed unnecessary import

* implemented reviewers suggestions

* CI fix

21448525

cherry-pick:change softmax_with_cross_entropy_op's parameter name from... · 9a589de8

Committed by chajchaj on May 06, 2021

cherry-pick:change softmax_with_cross_entropy_op's parameter name from softmax_switch to use_softmax (#32750)

* change parameter name from softmax_switch to use_softmax, test=develop

* cherry-pick:change parameter name from softmax_switch to use_softmax, test=develop

9a589de8

04 May, 2021 1 commit
- add_c_sync_npu_kernel (#32687) (#32723) · 4593597d
  Committed by Baibaifan on May 04, 2021
  
  4593597d
01 May, 2021 1 commit
- solve develop bugs (#32560) (#32684) · 6a1957e7
  Committed by Baibaifan on May 01, 2021
  
  6a1957e7
30 Apr, 2021 3 commits

Add 12 inplace APIs including auto generated (#32573) (#32699) · 097d5f52

Committed by pangyoki on Apr 30, 2021

* add relu6_ hardsigmoid_ leaky_relu_ Inplace APIs

* add softmax_with_cross_entropy_ Inplace API

* add clip_ scale_ add_ subtract_ Inplace APIs

* add wlist

* fix parameter of scale api

* add add_n_ Inplace API and remove log_ Inplace API

* fix elementwise_add_ and elementwise_sub_ broadcast problem

* elementwise inplace api give error message before run the op

* use broadcast_shape in elementwise inplace op

* add 8 inplace apis that is auto generated

* add unittest for all inplace apis

* add decorator for inplace apis in static mode

* fix windows blas fail of exp inplace api, change array_equal to allclose

* add flatten inplace api

* add flatten unittest

* fix flatten unittest

* add decorator

* fix grad.numpy in test_pylayer_op

* unsupport softmax_with_cross_entropy_

* add test_inplace_softmax_with_cross_entropy to static_mode_white_list

* delete __all__ in inplace_utils

* delete activation inplace function and add Tensor.inplace_func

* change paddle.inplace_ to Tensor.inplace_

* fix little problem

* add paddle in inplace_utils

097d5f52
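
The bullets "elementwise inplace api give error message before run the op" and "use broadcast_shape in elementwise inplace op" suggest a shape check along the following lines. This is a minimal stdlib sketch, not Paddle's code: the helper names `broadcast_shape` and `check_inplace_shape` and the error wording are hypothetical, and NumPy-style trailing-dimension broadcasting is assumed. The point is that an in-place op can only write into `x` when the broadcast result has exactly the shape of `x`.

```python
def broadcast_shape(x_shape, y_shape):
    """Broadcast two shapes, aligning them from the trailing dimension."""
    rank = max(len(x_shape), len(y_shape))
    # Pad the shorter shape with leading 1s so both have the same rank.
    xs = [1] * (rank - len(x_shape)) + list(x_shape)
    ys = [1] * (rank - len(y_shape)) + list(y_shape)
    out = []
    for a, b in zip(xs, ys):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"shapes {x_shape} and {y_shape} cannot be broadcast")
        # A dimension of size 1 stretches to match the other operand.
        out.append(b if a == 1 else a)
    return out


def check_inplace_shape(x_shape, y_shape):
    """Fail early if the broadcast result would not fit into x (hypothetical check)."""
    out_shape = broadcast_shape(x_shape, y_shape)
    if out_shape != list(x_shape):
        raise ValueError(
            f"broadcast output shape {out_shape} differs from in-place tensor shape {list(x_shape)}"
        )
    return out_shape


print(check_inplace_shape([2, 3], [3]))  # [2, 3]
```

With the operands swapped (`x_shape=[3]`, `y_shape=[2, 3]`) the broadcast result is `[2, 3]`, which no longer matches `x`, so the check raises before any computation runs.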

remove is_test=True in grad (#32683) · 1a417a4c
Committed by ceci3 on Apr 30, 2021

1a417a4c

Add op read_file and decode_jpeg (#32564) (#32686) · 2817239a
Committed by LielinJiang on Apr 30, 2021
```
* add op read_file and decode_jpeg
```
2817239a

29 Apr, 2021 3 commits
- Add BF16 uniform random initializer (#32468) (#32677) · e7c81600
  Committed by joanna.wozna.intel on Apr 29, 2021
```
* Add bf16 uniform random initializer

* Remove duplicated section

* Change UT to CPU place only

* Put detail functions into anonymous namespace
```
  e7c81600
- Added pure_bf16 mode (#32281) (#32681) · 93535c59
  Committed by arlesniak on Apr 29, 2021
```
This is cherry-pick of #32281
```
  93535c59
- Added clearing oneDNN per executor (#32664) · 7ae0a80f
  Committed by Jacek Czaja on Apr 29, 2021
```
- Executor does not always have FLAGS_use_mkldnn set to true
```
  7ae0a80f
28 Apr, 2021 1 commit

[Cherry-pick] Optimize update_loss_scaling_op(#32554) (#32606) · 33703da8

Committed by jiangcheng on Apr 28, 2021

* optimize update_loss_scaling_op by fused for loop to one kernel, test=develop

* remove useless while loop and optimize variable name, test=develop

* optimize variable name from out_addrs_tensor to out_addrs_mem, test=develop

* optimize variable name for readable by change prefix identifier from t_ to local_

33703da8

27 Apr, 2021 2 commits
- [OPs] Bug fix, fix the segment mean for illegal syncthreads usage. (#32596) (#32610) · 54ab656c
  Committed by Zhong Hui on Apr 27, 2021
```
* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
```
  54ab656c
- Fix grad calculation bug in tensor_array_to_tensor (#32558) · 6579432f
  Committed by Aurelius84 on Apr 27, 2021
  
  6579432f
26 Apr, 2021 2 commits

Optimize where_index_op(prefix sum) (#30601) · 6ec4e640

Committed by jiangcheng on Apr 26, 2021

* new optimize for where_index_op with prefix sum version.

* write a scan prefix sum kernel with stream for where index op.

* optimize where_index by using cub::DeviceScan::InclusiveSum instead of imperfect self-kernel.

* remove CheckTrue struct and rename stide_array for readable.

* optimize variable name for readable.

* optimize function name and annotation.

6ec4e640
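
The prefix-sum idea behind the where_index_op commit above can be illustrated with a small CPU sketch. This is not Paddle's CUDA kernel: the function name `where_index` and the list-based layout are assumptions made for illustration, and `itertools.accumulate` stands in for the device-wide inclusive scan (`cub::DeviceScan::InclusiveSum`) named in the commit message. An inclusive scan over the 0/1 mask gives every truthy element its output slot, so each element can be written independently of the others.

```python
from itertools import accumulate


def where_index(values):
    """Return the indices of truthy elements using an inclusive prefix sum.

    Hypothetical CPU illustration; on a GPU each iteration of the final
    loop could run as an independent thread.
    """
    # 1 where the condition holds, 0 elsewhere.
    mask = [1 if v else 0 for v in values]
    # Inclusive scan: scan[i] counts the truthy elements in values[0..i].
    scan = list(accumulate(mask))
    total = scan[-1] if scan else 0
    out = [0] * total
    for i, flag in enumerate(mask):
        if flag:
            # Element i is the scan[i]-th truthy element, so it owns slot scan[i] - 1.
            out[scan[i] - 1] = i
    return out


print(where_index([0, 3, 0, 0, 7, 1]))  # [1, 4, 5]
```

Because the output position depends only on the scan value at the same index, no thread has to wait for another once the scan is done, which is why a library scan can replace a hand-written kernel.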

[HybridParallel] fix port reuse when create multi group (#31876) · 41bfec8d
Committed by WangXi on Apr 26, 2021

41bfec8d
