Commits · 6326c3efbec9a024364b1fe4450a48c3eaa63de2 · 机器未来 / Paddle

12 Aug, 2021 · 7 commits
- [Inference] Inference python api support fp16 (#34676) · 6326c3ef
  Committed by Wilber on Aug 12, 2021
- transformer c files (#34706) · 016cc56d
  Committed by Feng Xing on Aug 12, 2021
```
This PR adds the fused transformer related files that define the C interface, including classes, functions, etc.
```
- Fix safety-bug of functional.linear (#34696) · 0e28c8bb
  Committed by zhulei on Aug 12, 2021
```
* Fix safety-bug of functional.linear

* Fix safety-bug of functional.linear

* Fix safety-bug of functional.linear

* Fix safety-bug of functional.linear
```
- [HybridParallel]Add Recompute for PipeLineParallel (#34607) · 589d13c5
  Committed by ShenLiang on Aug 12, 2021
```
* add recompute for pp

* add recompute offload

* add recompute partition
```
- [NPU] Support npu kernel for smooth_l1_loss op (#34674) · cfa69133
  Committed by wuhuachaocoding on Aug 12, 2021
- [NPU] Support npu op expand_v2 and expand_v2_grad (#34764) · bc543e35
  Committed by Fan Zhang on Aug 12, 2021
```
* [NPU] Support npu op expand_v2 and expand_v2_grad

* [NPU] Support npu op expand_v2 and expand_v2_grad

* [NPU] Support npu op expand_v2 and expand_v2_grad

* update test_expand_v2_op_npu.py

* update test_expand_v2_op_npu.py

* modify expand_v2_op_npu.cc

* modify expand_v2_op_npu.cc
```
- add det_mv3_db & LeViT test case in pr-ci-inference (#34803) · 1c31d9d3
  Committed by Peihan on Aug 12, 2021
```
* add det_mv3_db & LeViT test case in pr-ci-inference

* fix LeViT model dir bugs

* fix grammar error
```

11 Aug, 2021 · 21 commits

[oneDNN] Fix to issue #34554 (#34623) · 0a5c99e8

Committed by Jacek Czaja on Aug 11, 2021

* - Added softmax without caching

* - Binary is no longer manually cached

* - Activation onednn caching removed

* - Removed manual caching of activation

* - modified UT

* - fix

* - fix

* - fixes to building

* - fix

* - fix

* - fix to UT

* - Faulty UT workaround

* - approval workaround

* - Fixes after review

* - compilation fixes

* - more lint fixes

* - more fixes after review

* - fixes after another round of review

[AMP] add state_dict and load_state_dict and unittest for class GradScaler (#34300) · 99f8f5c8

Committed by zhangbo9674 on Aug 11, 2021

* add state_dict and load_state_dict and unittest for class GradScaler

* refine unittest for coverage of load_state_dict

* refine comments of code-block

* refine some comments

* refine state_dict code and unittest

* add #require gpu, xpu for GradScaler get/set example code

* add #require gpu, xpu for GradScaler get/set example code

* refine example code

* refine unittest for state_dict

* refine unittest for state_dict

* fix bug of DataLoader in TestGradScalerStateDict

* add flag FLAGS_cudnn_deterministic

`set_value_grad` propagate gradients to `Input` and `TensorValue` (#34304) · 9d02313c

Committed by WeiXin on Aug 11, 2021

* add set_value_grad op

* add unittest.

* polish unittest.

* polish code.

* support cuda kernel

* polish code according to CI

* polish code.

* polish code

* remove *.pyc

* polish code.

* add unittest to improve coverage.

* polish code.

[Paddle TRT]fix_fc_int8_convert; fix_reshape_convert (#34787) · 3429c04b
Committed by Wangzheee on Aug 11, 2021
```
* fix_fc_reshape_convert

* fix
```

[NPU] Support npu op flatten_contiguous_range_grad (#34798) · fc537d4f
Committed by Fan Zhang on Aug 11, 2021

[NPU] add while, read_from_array and write_to_array npu op (#34755) · 234c21ac
Committed by pangyoki on Aug 11, 2021
```
* add while read_from_array write_to_array npu op

* optimize unittest
```

split_op for npu (#34699) · d45d3112
Committed by Roc on Aug 11, 2021

[NPU] add momentum_op_npu and test (#34082) · 9e3e08f0
Committed by ronnywang on Aug 11, 2021
```
* add momentum_op_npu and test

* update

* fix hang
```

[NPU] add reduce_mean_op_npu and test (#34053) · f6fab559
Committed by ronnywang on Aug 11, 2021
```
* add reduce_mean_op_npu and test

* remove skip.If

* update
```

[NPU] add batch_norm_op_npu and test (#34056) · 9ed5db28
Committed by ronnywang on Aug 11, 2021
```
* add batch_norm_op_npu and tests

* remove skip.If

* fix bug
```

Add ext_tensor.slice() API (#34227) · 3f011d82

Committed by Hao Lin on Aug 11, 2021

* Add ext_tensor.slice() API, test=develop

* Call Tensor::mutable_data first to fix bugs and add test for writing to sliced tensor

* Fix unit test bug

* Fix code format problem, test=develop

* Fix code format problem

* Fix code format problem

* strengthen unit test

* Use CustomTensorUtils::ShareDataFrom to simplify codes

[hybrid] pp+dp support fp16 allreduce (#34762) · 4d7af372
Committed by WangXi on Aug 11, 2021

add the basic apis for auto_parallel (#33804) · 3f962e77
Committed by lilong12 on Aug 11, 2021
```
* add auto_parallel apis
```

[HybridParallel] Support save/load for PipeLineParallel (#34768) · 88f2f4a4
Committed by ShenLiang on Aug 11, 2021
```
* add save/load for pipelineparallel

* add save/load
```

[NPU] Add exp and exp_grad npu op (#34612) · b5ec65e1

Committed by 0x45f on Aug 11, 2021

* add exp and exp_grad npu op

* modify support register type

* remove empty line and remove exp_grad support data type int/int64

* move exp and exp_grad kernel to activation_op_npu.cc, delete attrs

* move code to activation_op_npu.cc

[NPU] add elementwise_min_grad_op_npu,test=develop (#34731) · 45af4f2a
Committed by andyjpaddle on Aug 11, 2021

miss format (#34771) · addd5fce
Committed by wenbin on Aug 11, 2021

Optimize fused allreduce in raw program (#34509) · 4d2994cb
Committed by Yuang Liu on Aug 11, 2021

modified reduce_sum_op and reduce_mean_op for higher_performance (#32885) · 6a9fac14
Committed by niuliling123 on Aug 11, 2021

[NPU] Support NPU kernel for TopKV2 op (#34599) · bb01b120

Committed by From00 on Aug 11, 2021

* Add NPU kernel for TopKV2 op

* deleted unnecessary cache file static_mode_white_list.cpython-37.pyc

* A draft for error checking

* A commit with accuracy error for float32 data

* Modify codes according to the review comments

* Modify codes according to the review comments

Add no need output to gc check list (#34754) · 17c1dae9

Committed by hong on Aug 11, 2021

* add not used output var to gc_check_list; test=develop

* add useless output to gc check list; test=develop

10 Aug, 2021 · 12 commits

[NPU] Support npu kernel for flatten_contiguous_range op, test=develop (#34642) · 79be8427

Committed by Liu-xiandong on Aug 10, 2021

* fix npu compile error, test=develop

* [NPU] Support npu kernel for flatten_contiguous_range op, test=develop

* [NPU] Support npu kernel for flatten_contiguous_range op, test=develop

* [NPU] Support npu kernel for flatten_contiguous_range op, test=develop

* [NPU] Support npu kernel for flatten_contiguous_range op, test=develop

* [NPU] Support npu kernel for flatten_contiguous_range op, test=develop

* [NPU] Support npu kernel for flatten_contiguous_range op, test=develop

* [NPU] Support npu kernel for flatten_contiguous_range op, test=develop

* Update flatten_op_npu.cc

* Update flatten_op_npu.cc
Co-authored-by: qili93 <qili93@qq.com>

Kernel primitives api (#34672) · 8f9d573f
Committed by niuliling123 on Aug 10, 2021
```
Add Kernel primitives api: ReadData, WriteData, ComputeFunctor
```

fix format_string_append test cast,test=develop (#34753) · 8b9bd165
Committed by chentianyu03 on Aug 10, 2021

[NPU] add squared_l2_norm squared_l2_norm_grad and tests (#34708) · b64312fc
Committed by Aganlengzi on Aug 10, 2021
```
* [NPU] add squared_l2_norm squared_l2_norm and tests

* [NPU] replace Square&ReduceSumD with SquareSumV1
```

Support npu op fill_any_like (#34518) · e8df3226

Committed by zyfncg on Aug 10, 2021

* Support npu kernel for fill_any_like op

* modify the description of exception

* remove useless template element

* remove useless decorator

* fix the code format error

[NPU] Support op kernel for Fill constant batch size like op (#34721) · ed2641cb

Committed by andyjpaddle on Aug 10, 2021

* fix npu compile error, test=develop

* add fill constant batch size like op npu,test=develop
Co-authored-by: qili93 <qili93@qq.com>

fix a quantization bug (#34647) · cfd49acc
Committed by XGZhang on Aug 10, 2021

[bug fix] fix unfold fpe bug (#34673) · 4f4662b0
Committed by shangliang Xu on Aug 10, 2021

[hybrid] refine sharding code (#34678) · a1603797
Committed by WangXi on Aug 10, 2021

add cudaEvent destructor function (#34734) · f30a5c42
Committed by chentianyu03 on Aug 10, 2021

copy boost/any.hpp to utils and replace boost::any with self defined any (#34613) · 12892929

Committed by chentianyu03 on Aug 10, 2021

* add any.hpp to utils and replace boost::any with self defined paddle::any

* add copy any.hpp to custom op depends

* modify any.hpp include path

* remove boost from setup.py.in

* add copy any.hpp to custom op depends

* move any.hpp to paddle/utils/ dirs

* move any.h to extension/include direction

* copy utils to right directions

fix for div zero (#34724) · d86c26dc
Committed by Hui Zhang on Aug 09, 2021
```
* fix for div zero

* fix err;test=develop

* fix lod
```

机器未来 / Paddle: in sync with the upstream (fork source) project
