Commits · e5ed5257083b92b018330812c33c746bae26fb41 · PaddlePaddle / Paddle

17 Nov, 2022 · 3 commits

Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16... · e5ed5257

Committed by Yuang Liu on Nov 17, 2022

Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16 training with tensor fusion. (#48041)

* add bfloat16 for adamw

* set lr not to bfloat16 for pure bf16 training

* update the logic

* update the adamw optimizer

* support bfloat for adam

e5ed5257

Add vectorized bfloat16 atomicAdd (#48056) · ccbd03d5

Committed by sneaxiy on Nov 17, 2022

* add vectorized bfloat16 atomicAdd

* fix compile error

* fix compile error again

* fix V100 compile error

* fix V100 compile again

ccbd03d5

generate static graph code for some op (#48036) · 7cc0d171

Committed by zyfncg on Nov 17, 2022

7cc0d171

16 Nov, 2022 · 9 commits
- [Paddle Inference] Add fill_any_like trt converter. (#47974) · d6be9000
  Committed by xiaoxiaohehe001 on Nov 16, 2022
```
* add_fill_any_like

* add_fill_any_like
```
  d6be9000
- elementwise_floordiv (#47944) · b4b78060
  Committed by wenbin on Nov 16, 2022
```
* elementwise_op

* add teller

* modify ut

* comments

* modify ut

* return

* modify
```
  b4b78060
- trt memory set change from setMaxWorkspaceSize to setMemoryPoolLimit since trt 8.3+ (#47795) · 9cf3aa61
  Committed by Zhang Jun on Nov 16, 2022
  
  9cf3aa61
- [inference][trt] update trt hardswish plugin to layer (#47745) · 6c54e0e8
  Committed by Zhang Jun on Nov 16, 2022
  
  6c54e0e8
- Add bf16 data type support to oneDNN bilinear_interp kernel (#46770) · 8e6315e4
  Committed by Piotr Paturej on Nov 16, 2022
```
* Enable bf16 in oneDNN bilinear_interp kernel

* Fix bilinear_interp_v2 not enabled in models

* Remove unnecessary checks
```
  8e6315e4
- remove avx check (#48003) · a762d68e
  Committed by hong on Nov 16, 2022
```
* remove avx check

* fix bug;
```
  a762d68e
- increase the level of some log (#47990) · 2f8901cb
  Committed by Leo Chen on Nov 16, 2022
  
  2f8901cb
- Update `ProcessGroupCustom` for `sync_op` compatibility (#47976) · e4ebf383
  Committed by Wen Sun on Nov 16, 2022
```
* refactor: update pg custom

* fix: use new api in ut

* fix: typo

* revert: recover legacy apis

* fix: add GetDeviceContext
```
  e4ebf383
- feat(ipu): add paddle inference support for model_runtime. (#47364) · 39c85064
  Committed by czr-gc on Nov 16, 2022
  
  39c85064
15 Nov, 2022 · 5 commits
- fix onednn bugs, test=document_fix (#48013) · 21d4fa02
  Committed by YuanRisheng on Nov 15, 2022
  
  21d4fa02
- Added optimization pass for oneDNN layernorm kernel (#47782) · 519e7426
  Committed by jakpiase on Nov 15, 2022
```
* optimization for ln

* fix

* added output to gpd

* added formatting

* fix
```
  519e7426
- [Zero-Dim] Make auto parallel judge dim more strict (#47961) · 626d7bcb
  Committed by zhouweiwei2014 on Nov 15, 2022
  
  626d7bcb
- [convert_to_mixed_precision] fallback to fp32 when encounter circle (#47902) · a00aebe1
  Committed by Wilber on Nov 15, 2022
  
  a00aebe1
- mkldnn directory cleanup (#47779) · 8a339d24
  Committed by Sławomir Siwek on Nov 15, 2022
```
* cleanup unused code

* unify is_int8 is_bfloat16

* Simplify matmul_v2 FWD kernel

* remove RunKernel methods

* remove import namespace

* remove headers

* clean fluid/phi cross imports

* remove fluid axpy_handler

* delete fluid methods

* activations

* OneDNNMemDesc

* MKLDNNFormatForSize

* MatchShapeToLayout

* MKLDNNMemoryFormat

* MKLDNNFormat

* ReorderMKLDNNHandler

* to_void_cast

* review suggestions

* interpolate

* remove fluid depedency
```
  8a339d24
14 Nov, 2022 · 9 commits
- Refactor collective communication send_partial, recv_partial, all_gather_partial C++ API (#47863) · 25e63dca
  Committed by Wen Sun on Nov 14, 2022
```
* refactor: simplify send, recv interfaces

* refactor: rm send_partial, recv_partial, all_gather_partial
```
  25e63dca
- [Paddle Inference] Add where trt converter (#47820) · dac0f7dd
  Committed by xiaoxiaohehe001 on Nov 14, 2022
  
  dac0f7dd
- Remove place for process group (#47857) · 2d383b81
  Committed by LiYuRio on Nov 14, 2022
  
  2d383b81
- add cos double and triple grad operator (#47796) · 1a145aab
  Committed by cyber-pioneer on Nov 14, 2022
  
  1a145aab
- remove heter and hccl (#47918) · 9191e743
  Committed by LiYuRio on Nov 14, 2022
  
  9191e743
- Do not release memory cache after build_op_func_list in interpretercore (#47910) · 8347354d
  Committed by Ruibiao Chen on Nov 14, 2022
  
  8347354d
- add lite opencl support api (#47112) · 798ab3f9
  Committed by engineer1109 on Nov 14, 2022
  
  798ab3f9
- fix squueze_transpose (#47911) · f50de679
  Committed by yeliang2258 on Nov 14, 2022
  
  f50de679
- Add InferShape for Depend OP (#47907) · 5478e1a5
  Committed by Ruibiao Chen on Nov 14, 2022
  
  5478e1a5
11 Nov, 2022 · 6 commits

[Zero-Dim] fix batch_norm op infermeta bug (#47858) · 18549417
Committed by zhouweiwei2014 on Nov 11, 2022

18549417

[IPU]: add model_runtime backend support in IPU (#47363) · 21b901cb

Committed by czr-gc on Nov 11, 2022

* feat(ipu): add model_runtime backend support in IPU.

* fix(ipu_executor): fix error message format.

* fix(ipu_executor): fix format.

* fix(ipu_executor): fix format again.

* fix(ipu_executor): fix format again.

* fix(ipu_executor): fix format again.

21b901cb

Refine shape op lanch method for standalone executor (#47843) · 981d1a10

Committed by zhangbo9674 on Nov 11, 2022

* refine shape op in new_exe

* Revert "refine shape op in new_exe"

This reverts commit 0e0336ddc5eede3da019b348a0bcc0ef0f3be64e.

* refine shape op in new_exe

* refine shape expected_kernel_type

* add SelectedRows check for shape op

* refine code

981d1a10

bugfix in XPU legacy_dygraph distributed training: (#47838) · 9a6465ca
Committed by james on Nov 11, 2022
```
phi::Alloc() complains about missing device_allocator_
```
9a6465ca

Generate static graph code for some ops by yaml (part3) (#47803) · 31f3f643

Committed by zyfncg on Nov 11, 2022

* generate static graph code for some ops by yaml

* remove deleted files

* update cmake

* update cmake

* udpate cmake

31f3f643

[Inference] fix mixed precision (#47794) · 9bda10cd
Committed by Yuanle Liu on Nov 11, 2022

9bda10cd

10 Nov, 2022 · 8 commits
- [phi] migrate prelu (#47422) · cdd8c8ab
  Committed by Sylwester Fraczek on Nov 10, 2022
```
* migrate prelu

* remove cache

* review fixes
```
  cdd8c8ab
- Get grads types from cpp for adam to speed up (#47769) · 5900129c
  Committed by WangZhen on Nov 10, 2022
```
Get grads types from cpp for adam to speed up
```
  5900129c
- remove the hang checkness (#47806) · 8d99dd0c
  Committed by LiYuRio on Nov 10, 2022
  
  8d99dd0c
- [PHI]Standardise some C++ API (Part4) (#47702) · 594bd723
  Committed by YuanRisheng on Nov 10, 2022
```
* standard api

* fix sparse bugs

* fix xpu bugs, test=kunlun

* remove hard code for custom unittest

* open ci, test=kunlun

* deal with conflict
```
  594bd723
- [search && paddle inference]add roformer pass&&plugin novarlen version (#47523) · 0f3fb562
  Committed by zhangxin81 on Nov 10, 2022
```
* add roformer pass&&plugin（novarlen）
```
  0f3fb562
- XPU multi-card support eager mode (#47445) · 3b91f8f3
  Committed by james on Nov 10, 2022
```
* XPU support eager mode

* add unittest for XPU eager mode

* minor bugfix

* minor bugfix, test=kunlun

* correct copyright info

* 1. remove unsed vars/funcs
2. ProcessGroupBKCL inherit from ProcessGroupStream

* bugfix for fp16 in eager mode multi-card, test=kunlun

* rebase & fix a few issues

* use new processgroup interface, test=kunlun

* fix compile issue, test=kunlun
```
  3b91f8f3
- skip_merge_layernorm (#47810) · 1c6013dd
  Committed by wenbin on Nov 10, 2022
```
* skip_merge_layernorm

* add UT

* modify comments
```
  1c6013dd
- Add CI check for script of auto code-gen (#47814) · 00ea0b2f
  Committed by zyfncg on Nov 10, 2022
```
* add ci check for code-gen script

* update
```
  00ea0b2f

PaddlePaddle / Paddle synced successfully about 1 year ago