Commits · e7c8160050e815016453f0a171097dc5d79e5d7a · BaiXuePrincess / Paddle

29 Apr 2021 · 6 commits
- Add BF16 uniform random initializer (#32468) (#32677) · e7c81600
  committed by joanna.wozna.intel on Apr 29, 2021
```
* Add bf16 uniform random initializer

* Remove duplicated section

* Change UT to CPU place only

* Put detail functions into anonymous namespace
```
- Added pure_bf16 mode (#32281) (#32681) · 93535c59
  committed by arlesniak on Apr 29, 2021
```
This is cherry-pick of #32281
```
- [Cherry-pick] Polish custom operator overrided method impl (#32666) (#32674) · ca2ef414
  committed by Chen Weihang on Apr 29, 2021
```
cherry-pick of #32666
```
- specify multihead_matmul_fuse_pass_v3 QK path (#32659) (#32668) · 30dfa745
  committed by Pei Yang on Apr 29, 2021
- fix mem release error. (#32655) · a5627df3
  committed by Wilber on Apr 29, 2021
```
What is the plan for the follow-up fix?
```
- Added clearing oneDNN per executor (#32664) · 7ae0a80f
  committed by Jacek Czaja on Apr 29, 2021
```
- The executor does not always have FLAGS_use_mkldnn set to true
```
28 Apr 2021 · 2 commits
- conservative judgment (#32619) · 056a2fca
  committed by wenbin on Apr 28, 2021
- [Cherry-pick] Optimize update_loss_scaling_op(#32554) (#32606) · 33703da8
  committed by jiangcheng on Apr 28, 2021
```
* optimize update_loss_scaling_op by fused for loop to one kernel, test=develop

* remove useless while loop and optimize variable name, test=develop

* optimize variable name from out_addrs_tensor to out_addrs_mem, test=develop

* optimize variable name for readable by change prefix identifier from t_ to local_
```

27 Apr 2021 · 5 commits
- [OPs] Bug fix, fix the segment mean for illegal syncthreads usage. (#32596) (#32610) · 54ab656c
  committed by Zhong Hui on Apr 27, 2021
```
* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
```
- support depthwise_conv2d_transpose (#32593) · 85e697d7
  committed by Pei Yang on Apr 27, 2021
- Revert "[PsCore] optimize performance of large kv (#32535)" (#32599) · 809ac036
  committed by tianshuo78520a on Apr 27, 2021
```
This reverts commit 4b7242b0.
```
- Fix grad calculation bug in tensor_array_to_tensor (#32558) · 6579432f
  committed by Aurelius84 on Apr 27, 2021
- Check for cuda errors immediately after kernel launch (#32557) · 19eefef4
  committed by XiangGao on Apr 27, 2021
```
Co-authored-by: Yang Zhang <yangzhang@live.com>
```
26 Apr 2021 · 13 commits
- add send/recv api (#32504) · c47bafc6
  committed by lilong12 on Apr 26, 2021
```
* add sendrecv, test=develop
```
- Fix OPENBLAS ci and fix windows CPU CI to parallel compile (#32548) · 1ec9525a
  committed by Zhou Wei on Apr 26, 2021
```
* clear CUDA compile environment on windows

* fix Windows CI

* fix Windows CI

* fix Windows CI
```
- Optimize where_index_op(prefix sum) (#30601) · 6ec4e640
  committed by jiangcheng on Apr 26, 2021
```
* new optimize for where_index_op with prefix sum version.

* write a scan prefix sum kernel with stream for where index op.

* optimize where_index by using cub::DeviceScan::InclusiveSum instead of imperfect self-kernel.

* remove CheckTrue struct and rename stide_array for readable.

* optimize variable name for readable.

* optimize function name and annotation.
```
- [PsCore] optimize performance of large kv (#32535) · 4b7242b0
  committed by Thunderbrook on Apr 26, 2021
```
* optimize pull sparse

* optimize pull sparse

* change macro

* format
```
- Unset ReserveSpace of batch_norm for inference program. (#32493) · 202b0eaf
  committed by Yiqun Liu on Apr 26, 2021
```
* Unset ReserveSpace for inference program.

* Support training from an inference program.
```
- [HybridParallel] fix port reuse when create multi group (#31876) · 41bfec8d
  committed by WangXi on Apr 26, 2021
- [HybridParallel]Fix model parallel bug by using C++ op (#32536) · ea465fa5
  committed by ShenLiang on Apr 26, 2021
```
* fix model parallel

* rm parallel_help.py

* add embedding
```
- python inference supports custom operators, test=develop (#32533) · 40e51b25
  committed by 石晓伟 on Apr 26, 2021
- support backward return None, when corresponding input tensor without gradient (#32494) · 8e66046b
  committed by WeiXin on Apr 26, 2021
```
* support backward return None.

* edit unittest.

* edit code according to CI

* Improve error information
```
- optimize slice op and slice grad op (#32266) · 5161f71a
  committed by jiangcheng on Apr 26, 2021
```
* optimize slice op and slice grad op, test=develop

* optimize variable name and annotation information, test=develop
```
- [AMP] Autocast to fp32 for op has no fp16 kernel (#32543) · d2b31a14
  committed by Leo Chen on Apr 26, 2021
```
* skip op has no fp16 kernel

* add ut
```
- refine error msg when out of memory (#32527) · 756f4639
  committed by Leo Chen on Apr 26, 2021
- fix ernie oss error (#32540) · d0751d09
  committed by Shang Zhizhou on Apr 26, 2021
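The where_index_op commit (#30601) replaces a hand-written scan kernel with `cub::DeviceScan::InclusiveSum`. As a hedged illustration of why an inclusive prefix sum fits this op, here is a sequential pure-Python sketch, assuming a dense row-major input given as a flat list plus a shape. The helper name `where_index` and its signature are hypothetical; this is not Paddle's kernel. The prefix sum assigns every nonzero element its output row and yields the total row count, so each element can be written independently (one GPU thread per element), and a stride array turns flat indices into coordinates.

```python
from itertools import accumulate


def where_index(flat, shape):
    """Return row-major coordinates of the nonzero elements of a tensor.

    `flat` holds the tensor data in row-major order and `shape` its dims.
    Hypothetical sequential sketch of a scan-based approach.
    """
    # 1. Flag every element that should appear in the output.
    flags = [1 if v != 0 else 0 for v in flat]
    # 2. The inclusive prefix sum gives each flagged element its 1-based
    #    output slot; its last value is the number of output rows.
    prefix = list(accumulate(flags))
    count = prefix[-1] if prefix else 0
    # 3. Row-major strides, e.g. shape (2, 3) -> [3, 1].
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    out = [None] * count
    # 4. Each iteration is independent (on a GPU: one thread per element).
    for i, flag in enumerate(flags):
        if flag:
            coord, rem = [], i
            for s in strides:
                coord.append(rem // s)
                rem %= s
            out[prefix[i] - 1] = coord
    return out
```

For example, `where_index([0, 1, 0, 2, 0, 3], (2, 3))` returns `[[0, 1], [1, 0], [1, 2]]`. On a GPU, step 2 is the part a library scan such as CUB handles, and step 4 becomes a scatter kernel.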
25 Apr 2021 · 14 commits
- [Paddle-TRT] Fix AI-Rank BERT emb_eltwise_layernorm input order (#32482) · fba46ea3
  committed by Pei Yang on Apr 25, 2021
```
* fix airank bert emb order

* move input num check to converter

* add input num check

* add unused var check white list
```
- [slice] Support index is Tensor for slice in dynamic mode (#32435) · aceec7fb
  committed by liym27 on Apr 25, 2021
- [Setitem] Support grad computation of op set_value (#32431) · 25e723e7
  committed by liym27 on Apr 25, 2021
- add copy_cross_scope (#32432) · 5943ff7b
  committed by Baibaifan on Apr 25, 2021
- [Paddle-TRT] Add trt runtime version check (#32443) · b0556764
  committed by Pei Yang on Apr 25, 2021
```
* add trt runtime version check

* use different wrap, and change to major version check
```
- add trt verbose logs (#32459) · 541d702d
  committed by Pei Yang on Apr 25, 2021
- fix gradient(nan) when two inputs are equal (#32448) · 1896c777
  committed by Zhang Ting on Apr 25, 2021
- add clearGradient for amp sample code (#32517) · 74824fdd
  committed by Leo Chen on Apr 25, 2021
- update lite subgraph api. (#32513) · 92dc9b2b
  committed by Wilber on Apr 25, 2021
- support python3.9 in paddle_build (#32503) · 486946ae
  committed by pangyoki on Apr 25, 2021
- [ROCM] update PADDLE_WITH_ROCM to PADDLE_WITH_HIP, test=develop (#32487) · 3b4dcad7
  committed by Qi Li on Apr 25, 2021
- add silu op, test=develop (#32384) · 2f351ed5
  committed by minghaoBD on Apr 25, 2021
- Fix the bug in mp (#31996) · 976fe6f9
  committed by lilong12 on Apr 25, 2021
```
* update
```
- [BUG FIX] when x.dim < y.dim, the result of compare_op is inverse (#32470) · 78eff521
  committed by wawltor on Apr 25, 2021
```
* fix bug: when x.dim < y.dim, the result of compare_op is inverse to expected result

* support the cuda for fix the compare broadcast bug
```
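The compare_op fix (#32470) reports that results came out inverted when `x` has fewer dimensions than `y`. The commit message does not state the root cause; one common way this class of bug arises is a broadcast helper that always places the higher-rank operand first and then forgets to mirror the comparison, because `x < y` is equivalent to `y > x`, not `y < x`. The sketch below is a hypothetical pure-Python illustration of that idea (`compare`, `MIRROR`, and `_apply` are made-up names, not Paddle APIs), assuming the trailing dimensions of both operands already match exactly, so no size-1 stretching is handled.

```python
import operator

# Swapping the operands of a comparison requires mirroring the operator.
MIRROR = {
    operator.lt: operator.gt,
    operator.gt: operator.lt,
    operator.le: operator.ge,
    operator.ge: operator.le,
    operator.eq: operator.eq,
    operator.ne: operator.ne,
}


def _rank(a):
    """Number of nesting levels of a (non-empty) nested list."""
    return 1 + _rank(a[0]) if isinstance(a, list) else 0


def _apply(big, small, op, big_rank, small_rank):
    """Apply op(big_elem, small_elem), broadcasting small over big's leading dims."""
    if big_rank > small_rank:
        return [_apply(b, small, op, big_rank - 1, small_rank) for b in big]
    if big_rank == 0:
        return op(big, small)
    return [_apply(b, s, op, big_rank - 1, small_rank - 1)
            for b, s in zip(big, small)]


def compare(x, y, op):
    """Elementwise op(x, y) with trailing-dimension broadcasting."""
    rx, ry = _rank(x), _rank(y)
    if rx >= ry:
        return _apply(x, y, op, rx, ry)
    # x has fewer dims: the higher-rank y goes first, so op must be mirrored.
    return _apply(y, x, MIRROR[op], ry, rx)
```

With `x = [1, 5]` and `y = [[0, 9], [3, 3]]`, `compare(x, y, operator.lt)` returns `[[False, True], [True, False]]`. Passing `op` unmirrored in the swapped branch would compute `y < x` instead, which is the kind of inverted result the commit title describes.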

BaiXuePrincess / Paddle is in sync with the upstream (forked-from) project
