Commits · 50bfe420893e15f48e0aca9dbbc26cac3ce33bae · BaiXuePrincess / Paddle

29 Apr, 2022 1 commit

[cherry-pick 2.3] Add fused_multi_transformer op to optimize transformer... · 50bfe420

Committed by WangXi on Apr 29, 2022

[cherry-pick 2.3] Add fused_multi_transformer op to optimize transformer generation performance (#42311)

* Add fused_multi_transformer op to optimize transformer generation performance (#41814)

* fix fused_multi_transformer compile failed in cuda arch < sm53 (#42315)

* fix ci timeout

22 Apr, 2022 1 commit
- [IPU] add mixed-precission support for ipu (#41733) (#41906) · c09b1d68
  Committed by Allen Guo on Apr 22, 2022
```
add mixed-precission support for ipu

cherry-pick from #41733
```
05 Apr, 2022 1 commit
- add new format of quantization (#41041) · b72a7ebb
  Committed by Guanghua Yu on Apr 05, 2022
01 Apr, 2022 1 commit
- edit fused_seqpool_cvm doc; test=develop (#41192) · 3b7b8528
  Committed by danleifeng on Apr 01, 2022
28 Mar, 2022 3 commits
- add fused_seqpool_cvm op (#37928) · ea5b2f26
  Committed by danleifeng on Mar 28, 2022
```
* add fused_seqpool_cvm op;test=develop
```
- update docs dtype(core.VarDesc.VarType)test=document_fix (#40947) · 34f07045
  Committed by Ligoml on Mar 28, 2022
```
* update docs dtype(core.VarDesc.VarType)

* fix code style, test=document_fix

fix code style, test=document_fix
Co-authored-by: Chen Long <1300851984@qq.com>
```
- add adaround post-quant method (#38460) · 3d5a27f0
  Committed by Guanghua Yu on Mar 28, 2022
```
* add adaround post-quant method
```
25 Mar, 2022 1 commit

Refactor Dygraph Flags (#40786) · 3085d5e4

Committed by Jiabin Yang on Mar 25, 2022

* refactor eager flags

* fix flags error when we switch from eager to dygraph

* fix ci problem

* fix ci

* fix ci

* merge develop and fix code style

* merge develop and fix code style

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* merge develop

24 Mar, 2022 1 commit

[AMP] Support amp for Intermediate_dygraph (#40623) · c12f7d48

Committed by zhangbo9674 on Mar 24, 2022

* approve amp for intermediate_dygraph

* add amp_utils for intermediate_dygraph

* add amp needcast check for mlu & npu

* test unittest

* add SetGradNode for set_stop_gradient && add checktensor for GradientHooks

* refine code

* refine unittest of imperative_amp for new dygraph

* inplace api skip amp

* add test_imperative_qat_amp for intermediate amp

* refine code

* refine test_amp ci strategy

* refine unittest code

* refine amp_utils code

* refine amp getpromotetype for some special op

* refine unittest code

16 Mar, 2022 3 commits
- Modify save_quant_model to support different input and output filenames (#40542) · dec2b1ca
  Committed by joanna.wozna.intel on Mar 16, 2022
```
* Modify save_quant_model.py to support different input and output filenames

* Correct wrong order of arguments
```
- Add Support Layer List to ASP (#40253) · c040bbd7
  Committed by Ming-Xu Huang on Mar 16, 2022
- [MLU] support amp O1 of mlu (#40461) · ad81f22c
  Committed by qipengh on Mar 16, 2022
15 Mar, 2022 1 commit
- Support some ops for full quantization (#40083) · 7ced3017
  Committed by Guanghua Yu on Mar 15, 2022
```
* add some op for full_quantization
```
11 Mar, 2022 1 commit
- add EMD method of post_quant (#40421) · 82c30f71
  Committed by Guanghua Yu on Mar 11, 2022
04 Mar, 2022 1 commit
- extend test_imperative_qat_user_defined test time (#40114) · 73a4fe6c
  Committed by Jiabin Yang on Mar 04, 2022
03 Mar, 2022 2 commits

change_ASP_sharding_option (#40028) · 815f7a67

Committed by Baibaifan on Mar 03, 2022

Support slim eager (#39874) · da47544c

Committed by Jiabin Yang on Mar 03, 2022

* eager, test=develop

* fix bug, test=develop

* eager, test=develop

* merge legacy to fluid

* eager, test=develop

* eager, test=develop

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* eager, test=develop

* eager, test=develop

* Use overload instead of template

* Remove legacy code

* Remove legacy code

* selectedrows, test=develop

* Remove DataType test

* eager, test=develop

* eager, test=develop

* support gan, test=develop

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

* refine code

* ptb, test=develop

* Rename all EagerTensor to Tensor

* Rename some EagerTensor to Tensor

* rename EagerTensor to EagerVariable

* eager, test=develop

* eager, test=develop

* eager, test=develop

* eager, test=develop

* add more test

* eager, test=develop

* Support copiable selected rows and merge develop

* save load, eager, test=develop

* save load, eager, test=develop

* refine, test=develop

* remove useless _set_value method

* refine, test=develop

* refine, test=develop

* revert static_runner, test=develop

* EagerTensor to Tensor, test=develop

* refine, test=develop

* refine, test=develop

* clear grad, test=develop

* merge, develop

* merge, develop

* merge, test=develop

* merge, test=develop

* Support quant and part of slice

* support legacy static save

* extend slim tests time

* remove imperative on inference

* remove imperative on inference

* merge develop

* fix typo

* fix typo

* split slice related code into 2 part for imperative and eager

* split slice from inference

* split slice from inference

* fix test_tensor_register_hook

Co-authored-by: Wang Huan <wanghuan29@baidu.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>
Co-authored-by: wanghuancoder <wanghuancoder@163.com>

01 Mar, 2022 2 commits
- Add mobilenetv3_large performance test for bf16 and int8 (#39738) · eb7c211a
  Committed by joanna.wozna.intel on Mar 01, 2022
```
* Add mobilenetv3_large performance test

* Disable the BF16 test if the device does not support BF16 computations

* Change test timeout
```
- remove conv_affine_channel_fuse_pass (#39817) · fc06be9d
  Committed by wenbin on Mar 01, 2022
```
* remove

* pass

* more pass
```
19 Feb, 2022 1 commit

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

Committed by sneaxiy on Feb 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* compatible with pten changement

* fix rocm compile error

* improve coverage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

14 Feb, 2022 1 commit

[UT] mish op, conv+mish, fc+mish fuse passes (#39340) · 02938b3d

Committed by Sławomir Siwek on Feb 14, 2022

* mish unit tests

* code format

* remove unused imports

* code format

* remove hard-coded shape values

* remove timeouts

* remove timeouts v2

* restore timeouts

09 Feb, 2022 1 commit

[Paddle-Inference] rebuild matmul pass: trt and gpu_cpu (#39369) · db7d129e

Committed by Wangzheee on Feb 09, 2022

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

* rebuild matmul pass: trt and gpu_cpu

07 Feb, 2022 1 commit

Update BF16 amp list (#39304) · 0c43ce22

Committed by arlesniak on Feb 07, 2022

* amp list updated

* tests updated

* gray list updated

* amp list updated

* test updated

27 Jan, 2022 1 commit

Update passes in quant2_int8_mkldnn_pass (#38912) · 0e235e58

Committed by joanna.wozna.intel on Jan 27, 2022

* Update pass in quant2_int8_mkldnn_pass

* Back to the previous scale_matmul order

* Change place of cpu_quantize_placement_pass

21 Jan, 2022 1 commit
- fix save channel wise quant model (#39054) · ab1abd40
  Committed by ceci3 on Jan 21, 2022
13 Jan, 2022 1 commit

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

12 Jan, 2022 1 commit
- Fix conv act int8 scale (#38331) · 4825addd
  Committed by Sylwester Fraczek on Jan 12, 2022
```
* fix conv act int8 scale

* add unit test for conv+hard_swish
```
06 Jan, 2022 1 commit
- [Paddle-ASP]Asp sharding (#37725) · aec6e8a9
  Committed by minghaoBD on Jan 06, 2022
05 Jan, 2022 2 commits
- Make post training quant API support dataloader (#38686) · 0af1a87b
  Committed by Jiaqi Liu on Jan 05, 2022
```
* make post training quant API support dataloader
```
- Quantize nearest_interp and nearest_interp_v2 (#38622) · 1456b02d
  Committed by joanna.wozna.intel on Jan 05, 2022
```
* Quantize nearest_interp and nearest_interp_v2

* Check if avx_core supported

* Add depthwise_conv2d to supported quantization list
```
28 Dec, 2021 1 commit

Fix scatter_op fp16 perf problem. (#38499) · 33ce249f

Committed by Li Min on Dec 28, 2021

* Fix scatter_op fp16 perf problem.

* Add scatter into black list.

* Add scatter into black list for dygraph.

22 Dec, 2021 1 commit
- fix clip extra when QAT export model (#38323) · 142ea171
  Committed by Guanghua Yu on Dec 22, 2021
20 Dec, 2021 1 commit

Support FP16 for more ops (#38123) · 1f445bf3

Committed by sneaxiy on Dec 20, 2021

* support FP16 for more ops

* add amp list tests

* refine reduce_mean_grad

* fix OP benchmark ci

* fix fp16 reduce_mean

* update ut, but still have some problems

* remove mean/reduce_mean fp16 kernel

17 Dec, 2021 1 commit

Refine some AMP operators for BERT (#37923) · d80fe268

Committed by sneaxiy on Dec 17, 2021

* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci

14 Dec, 2021 3 commits
- add map_matmul and fc_act_fuse passes to quant2_int8_mkldnn_pass (#38023) · 8f800dc0
  Committed by Sylwester Fraczek on Dec 14, 2021
```
* add map_matmul passes to quant2_int8_mkldnn_pass

* fix fc+act fuse (activation scale)

* ci fix, c++17 structured bindings not available

* fix ci static check
```
- fix QAT export bug in while OP (#38102) · fff6e77c
  Committed by Guanghua Yu on Dec 14, 2021
- add reshape+transpose+matmul_v2 only (#37847) · a922168a
  Committed by Sylwester Fraczek on Dec 14, 2021
```
* reshape+transpose+matmul_v2

* in_name->input_name

* fix pr-ci-static-check
```
13 Dec, 2021 1 commit
- fix single card 8 unittests in new executor (#37957) · 9a4eec98
  Committed by xiongkun on Dec 13, 2021
```
* fix single card 8 unittests in new executor

* fix

* fix
```
10 Dec, 2021 2 commits
- Support quantization of condition block (#37498) · 89069af5
  Committed by Guanghua Yu on Dec 10, 2021
```
* Support sub graph quant-post
```
- fix fetch op rename_input bug in QAT export model (#38012) · 76c73226
  Committed by Guanghua Yu on Dec 10, 2021
