Commits · 0f266ac18bcac01bd0438e4c4b95ff79237eda6b · 机器未来 / Paddle

12 Jul, 2021 1 commit

cherry pick xpu to 2.1 (#34000) · 0f266ac1

Committed by taixiurong on Jul 12, 2021

* update xpu cmake for kunlun (#33328)

* xpu support amp (#33809)

* fix bug DLTP-31078 (#33877)

* update xpu cmake (#33906)

* [xpu] add dropout & amp ops in xpu place (#33891)
Co-authored-by: TTerror <tangzhiyi11@users.noreply.github.com>

09 Jul, 2021 2 commits

[Cherry-pick] Up cxx11 check to cxx14 (#34015) (#34034) · 8417ad60
Committed by Chen Weihang on Jul 09, 2021
```
[Cherry-pick] Up cxx11 check to cxx14 #34034
```

[oneDNN] Fix to #33282, added support of X input broadcasting to oneDNN... · f2f2fd80

Committed by Jacek Czaja on Jul 09, 2021

[oneDNN] Fix to #33282, added support of X input broadcasting to oneDNN elementwise ops (#33549) (#33845)

* fix to #33282

* Increased threshold for elementwise_mul_bf16 grad

* disabled faulty UT

* fix to approval

05 Jul, 2021 1 commit
- cherry-pick prs. (#33932) · fe827540
  Committed by Wilber on Jul 05, 2021
01 Jul, 2021 2 commits
- [cherry-pick] fix bug when the cuda kernel config exceeds dims max (#33748) (#33893) · bedcf0dd
  Committed by Leo Chen on Jul 01, 2021
```
fix bug when the cuda kernel config exceeds dims max
```
- fix the opt path create error in windows, test=develop (#33853) (#33885) · 702610ef
  Committed by 王明冬 on Jul 01, 2021
28 Jun, 2021 1 commit
- Fix wrong scale length for QkvToContext (#33763) (#33784) · 89fdd6c8
  Committed by wenbin on Jun 28, 2021
22 Jun, 2021 2 commits
- Dynamic amp support sync_batch_norm op (#32770) (#33709) · 1e62c239
  Committed by Roc on Jun 22, 2021
- fix emb_eltwise_ln gpu_id bug (#33701) (#33706) · bf3161bd
  Committed by Pei Yang on Jun 22, 2021
21 Jun, 2021 2 commits
- fix gpt2 train loss Nan problem by add a line __syncthreads in BlockReduceSum (#33659) · cdeffff4
  Committed by zhiboniu on Jun 21, 2021
- fix the bug that concat op can't support uint8 (#33667) · 18043ab5
  Committed by 李季 on Jun 21, 2021
18 Jun, 2021 2 commits
- cherry-pick .Align the code of trt under the develop and release/2.1 branch (#33631) · 9a3d8593
  Committed by Wilber on Jun 18, 2021
- remove check for optim_cache_dir in trt slim int8 (#32676) (#33629) · 370fb102
  Committed by Pei Yang on Jun 18, 2021
17 Jun, 2021 1 commit
- [Inference Tensorrt] Add attr for trt engine and handle the input seq problem... · 8e163f92
  Committed by Wilber on Jun 17, 2021
```
[Inference Tensorrt] Add attr for trt engine and handle the input seq problem for ernie var len. (#33575) (#33622)
```
16 Jun, 2021 4 commits

fix gather op and add logsumexp op on kunlun (#32931) (#33592) · 63aeb02d
Committed by TTerror on Jun 16, 2021
```
* fix gather op and add logsumexp op on kunlun

* update xpu dependence

* update tests and fix elementwise_add
```

[CP] add a strategy to run program with fleet (#33511) · bb5963da

Committed by lilong12 on Jun 16, 2021

* Add raw program meta optimizer (#32597)

* add raw program, test=develop

* add precision unittest for executor all reduce (#33339)

* fix dp (#33297)
Co-authored-by: Yuang Liu <liuyuang@baidu.com>
Co-authored-by: 李季 <2042519524@qq.com>

[cherry pick] Fix issue #33021 setCacheCapacity could not limit memory consumption (#33571) · 5c68e79d

Committed by lidanqing on Jun 16, 2021

* [oneDNN] First fix to #33021 (#33174)

* First fix to #33021

* [oneDNN] Second fix to #33021 (#33471)

* use older download_data function
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>

Add trt layer norm dynamic (#33448) · e5bd7eb8
Committed by Shang Zhizhou on Jun 16, 2021
```
* 1, remove layernorm dynamic fp16; 2, let reshape out in dynamic shape (#33535)
```

15 Jun, 2021 4 commits

Cherry-pick support the bool tensor for the compare ops (#33551) · c334d2bd
Committed by wawltor on Jun 15, 2021

[cherry-pick] fix gather bug && fix hang of new_group (#33553) · a4e841e0

Committed by ShenLiang on Jun 15, 2021

* Fix gather infer shape using axis (#33413)

* fix gather shape bug

* fix None

* fix topo

* Fix hang of hybrid parallel in new_group (#33141)

* fix hang of hybrid parallel

* fix new_group for hang problem

* fix hang

[Cherry-Pick] Fix the segfault when using to_tensor in PyLayer. (#33303) (#33518) · 0079e0b1

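Committed by WeiXin on Jun 15, 2021

Fixes a bug where returning a tensor created by to_tensor from a PyLayer triggered a segfault.
Cause:

If the stop_gradient attribute has been modified on the Python side, InnerSetOverridedStopGradient on the C++ side cannot change it, so the C++ side now calls SetOverridedStopGradient to modify stop_gradient.
The grad var of a tensor produced by to_tensor has the default DataType (-1), but during backward the grad var's DataType must not be the default (-1), so ForwardDataType is called to set the grad var's DataType.

Original PR: #33303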

refix if-else logic for inference: missing if (#33531) · f7034613
Committed by wenbin on Jun 15, 2021

12 Jun, 2021 1 commit

Fix LayerNorm Problem Release2.1 (#33534) · a43e1fac

Committed by zhiboniu on Jun 12, 2021

* Eliminate numerical differences of LayerNorm; fix LayerNorm NaN bug with large data input

* fix bug with large shape of data input

11 Jun, 2021 3 commits
- [cherry-pick 2.1.1]2.1/fix concat (#33383) · 9567cbd7
  Committed by liuyuhui on Jun 11, 2021
```
* add uint8 for concat (#32850)

* add bool type for tril api (#33402)
```
- [Cherry-pick] Support diff dataset tensor place in single process dataloader (#33470) (#33487) · 14440905
  Committed by Chen Weihang on Jun 11, 2021
```
Support diff dataset tensor place in single process dataloader

cherry-pick of #33470
```
- [cherry-pick]Fixed a bug of log_softmax: op input was modified to 'nan' (#32937) (#33436) · 61cae0df
  Committed by Lijunhui on Jun 11, 2021
```
While using the op benchmark, we found that when the input size is below a certain threshold, the input values of the Python-side log_softmax API are changed to nan after the computation. The output is correct.

cherry-picked from #32937
```
10 Jun, 2021 2 commits
- fix aligned in roi_align (#33446) · 03f46685
  Committed by wangguanzhong on Jun 10, 2021
- fix the bug in repeated_fc_relu_fuse_pass.test=develop (#33386) (#33431) · c4a417f5
  Committed by 王明冬 on Jun 10, 2021
09 Jun, 2021 2 commits
- fix the bug of yolo_box which can't run on nano and tx2 (#33422) (#33442) · d4967224
  Committed by s.feng on Jun 09, 2021
- [Paddle-TRT] Add gather_nd and reduce_sum trt op. (#33324) (#33365) · 6385f5ee
  Committed by Wilber on Jun 09, 2021
08 Jun, 2021 3 commits
- Add trt convert reshape_op in release/2.1.1 (#33372) · bad3bebf
  Committed by Wangzheee on Jun 08, 2021
- Cherry pick deconv & jetson single arch (#33387) · 0549d4af
  Committed by Pei Yang on Jun 08, 2021
```
* fix conv2d_transpose trt bugs (#33242)

* fix jetson arch when compiling with single arch (#33269)
```
- OP:strided_slice_op supports bool type inputs (#33373) (#33393) · ccabafa6
  Committed by TeslaZhao on Jun 08, 2021
```
* Fix two English API documents, transpose and strided_slice

* OP:strided_slice_op supports bool type inputs
```
07 Jun, 2021 1 commit
- Fix inference prepare data (#33370) · d5225145
  Committed by wenbin on Jun 07, 2021
04 Jun, 2021 1 commit
- [CherryPick] fix compare ops when broadcast (#33086) · c42ccf14
  Committed by wawltor on Jun 04, 2021
```
* fix compare op in for in the cuda device

* fix the paddle compare op for the broadcast
```
03 Jun, 2021 2 commits
- [ROCM] update paddle inference cmake, test=develop (#33260) (#33290) · b032b579
  Committed by Qi Li on Jun 03, 2021
- [ROCM] fix fused_fc_elementwise_layernorm, test=develop (#33281) (#33299) · ef6120f3
  Committed by Qi Li on Jun 03, 2021
01 Jun, 2021 1 commit
- Fix cuda kernel launch of grid sampler (#33100) (#33232) · 8a5a45f8
  Committed by whs on Jun 01, 2021
31 May, 2021 1 commit
- disable conv plugin in TRT old versions (#33198) · 7766721a
  Committed by wenbin on May 31, 2021
25 May, 2021 1 commit
- [HybridParallel]Fix precision problem of model parallel (#32897) (#33087) · 4026e227
  Committed by ShenLiang on May 25, 2021
```
* fix precision of mp

* fix bug of seed

* fix dp

* print group
```

机器未来 / Paddle is in sync with its fork source project