Commits · af4bdede30584bc11d2b4f9538c7dfa42e42d1d4 · 翠花的腿毛 / Paddle

Oct 19, 2022 · 15 commits
- Support uniform api and sigmoid api in new AD (#46960) · af4bdede
  Committed by Charles-hit on Oct 19, 2022
```
* support uniform api in new ad

* add unit test for uniform_random_p

* resolve conflict

* fix uniform_random orig2prim

* fix primrules

* remove ShapeTensor and ShapeTensorList input in uniform_random_p op and add sigmoid orig2prim rules
```
- [Dy2St]Fix recurrent op eager deletion pass error in dy2st (#47105) · 94132190
  Committed by WangZhen on Oct 19, 2022
```
* Fix recurrent op eager deletion pass error in dy2st

* Polish code

* Refine error message
```
- slice op supports uint8_t (#47067) · 1e1c7275
  Committed by will-jl944 on Oct 19, 2022
- [Dy2Static] Remove GradTransformer (#47063) · be3908a3
  Committed by xiongkun on Oct 19, 2022
```
* [Dy2Static] Remove GradTransformer
1. fix einsum infershape bugs.
2. remove grad_transformer and unify paddle.grad and paddle.static.gradient.
3. add dygraph_and_dy2static_only decorator for dy2static.

* fix bugs

* rename
```
- Loose TRT half test tolerance to 1e-3 (#47101) · 36ab58f8
  Committed by zlsh80826 on Oct 19, 2022
- [Dy2Stat]Polish @to_static temporary file directory to speed up transformation (#47102) · b3afac8a
  Committed by Aurelius84 on Oct 19, 2022
```
* [Dy2Stat]Polish @to_static temporary file directory

* [Dy2Stat]Polish @to_static temporary file directory

* refine temp.name

* fix typo

* fix typo
```
- Construct exec and ctx only once in cond op to speed up (#47092) · 2814d7f6
  Committed by Hui Zhang on Oct 19, 2022
```
* cond infer apply exec separate

* fix bugs

* fix as comment
```
- clean unused code: piece.cc/h (#47103) · e435d695
  Committed by Leo Chen on Oct 19, 2022
```
* clean unused code: piece.cc/h

* clean usage
```
- Loose TRT half test tolerance to 1e-3 (#47106) · d53bd8c1
  Committed by zlsh80826 on Oct 19, 2022
- Loose TRT fp16 tests tolerance (#47100) · 3f40cdfb
  Committed by zlsh80826 on Oct 19, 2022
- fix old dygraph a vlog bug (#47115) · 3c39475d
  Committed by wanghuancoder on Oct 19, 2022
- Reduce squeeze2_matmul_fuse_pass, flattent tests time (#47098) · 1a14d011
  Committed by zlsh80826 on Oct 19, 2022
```
* Add missing fp32 config and reduce the testing combination

* Reduce trt matmul pass test max examples
```
- fix build warning: [Wsign-compare] on linux (#46644) · be273ea9
  Committed by Li-fAngyU on Oct 19, 2022
- [CodeStyle][py2] fix a decode error caused by 47036 (#47097) · ddf317ed
  Committed by Nyakku Shigure on Oct 19, 2022
```
* [CodeStyle][py2] fix a decode error caused by 47036

* add a comment

* add an unittest for Block._rename_var

* add test_block_rename_var to static_mode_white_list
```
- fix send for old dygraph mode by passing use_calc_stream to the send op (#47110) · d817d896
  Committed by Roc on Oct 19, 2022
Oct 18, 2022 · 23 commits
- Fix bugs in the General Plugin Mechanism (#47072) · 75b16781
  Committed by weishengying on Oct 18, 2022
- update audio api examples (#46938) · c7d2e82c
  Committed by YangZhou on Oct 18, 2022
```
* update audio api examples

* fix format

* format

* fix

* test api

* fix format

* fix static check error

* fix doc error

* fix ci

* fix api error

* update api.spec

* fix ci

* fix typo in window gaussian
```
- [Paddle-TRT]Rewrite strided_slice converter using shape tensor (#46819) · 5c0bfc18
  Committed by zhoutianzi666 on Oct 18, 2022
```
* Rewrite strided_slice converter using shape tensor
* clean code
```
- [Zero-Dim] support 0D Tensor for reshape/create_parameters (#47074) · 35d5db36
  Committed by zhouweiwei2014 on Oct 18, 2022
- [Auto Parallel]Add parallel tuner (#46189) · 3108ba11
  Committed by caozhou on Oct 18, 2022
```
* add parallel tuner

* add unittest

* fix unittest

* set timeout of unittest

* set unittest timeout

* fix auto_mode setting

* update unittest

* sync from develop and update unittest

* remove unused import

* update unittest

* update cmakelist

* add unittests
```
- [CustomDevice] turn on WITH_CUSTOM_DEVICE when WITH_PYTHON=ON (#47108) · 9cdf30dc
  Committed by ronnywang on Oct 18, 2022
- add strategy group (#47021) · 178d7e5e
  Committed by LiYuRio on Oct 18, 2022
- add embedding range check (#46991) · d68c38ef
  Committed by seemingwang on Oct 18, 2022
```
* add embedding range check

* change header file

* change header file

* fix
```
- Add value check & error message for gather_tree (#47051) · e5e3d5cf
  Committed by liu zhengxi on Oct 18, 2022
- Merge layernorm trt fuse (#46320) · 5e9f491e
  Committed by Wang Bojun on Oct 18, 2022
```
* first version, accuracy corrected

* disable debug print

* use blockReduceSum in phi

* add UT

* add opCompat

* code style

* code refine

* bug fix

* code refine

* test fix

* bugfix

* code style fix

* code style

* code-style

* code-style

* code-style
```
- FC + activation fuse passes (#45183) · b7a23adb
  Committed by Sławomir Siwek on Oct 18, 2022
```
* git

* style

* leave default relu in kernel

* style

* cleanup FCMKLDNN pattern

* merge conflicts

* update develop

* update develop

* add const

* rename to oneDNN and adjust attributes

* whitespace
```
- [Auto Parallel] Add cost interface (#47043) · da051350
  Committed by caozhou on Oct 18, 2022
```
* add cost interface

* update interface and add unittest

* update unittest

* update interface
```
- remove __future__ import in docstring, test=document_fix (#46890) · 30dae6db
  Committed by Nyakku Shigure on Oct 18, 2022
- Construct exec and ctx only once in cond op to speed up (#47009) · 42e312a1
  Committed by Hui Zhang on Oct 18, 2022
```
* cond infer apply exec separate

* fix bugs
```
- reconstruct code for convert_fp16 (#46428) · 1cc482b0
  Committed by Wilber on Oct 18, 2022
- fix doc of some sparse api (#47020) · a89b33ff
  Committed by zhouweiwei2014 on Oct 18, 2022
- [Paddle Inference] Add_expand_v2_trt_layer (#47002) · a21a2b5b
  Committed by xiaoxiaohehe001 on Oct 18, 2022
- [CodeStyle][py2] remove `compat` module (to_text) (#47036) · ad4c773b
  Committed by Nyakku Shigure on Oct 18, 2022
```
* [CodeStyle][py2] remove `compat` module (to_text)

* remove some unnecessary decode

* remove to_text definition and unittest

* Revert "remove to_text definition and unittest"

This reverts commit a6b69cb8dca8b9b031ce10ea32d1040e7e0dd267.

* remove an assertion

* empty commit
```
- [Eager, Performance optimization] support pow( ** operator) to sink to Cpp layer (#47077) · 62c0abac
  Committed by Weilong Wu on Oct 18, 2022
- [XPU] update xpu cmake to 1016. test=kunlun (#47041) · 55ac9c46
  Committed by houj04 on Oct 18, 2022
```
* [XPU] update xpu cmake to 1016. test=kunlun

* fix special case of transpose op. test=kunlun
```
- [code-gen] Support code-gen for opmaker of sparse op (#46993) · bdd3dde3
  Committed by zyfncg on Oct 18, 2022
```
* support generating code of opmaker for backward op invoke forward op

* support code-gen of opmaker for sparse op

* refine logic of choosing phi kernel

* fix compile bug

* fix code_gen bug

* fix bug

* fix kernel signature code-gen

* fix compile bug of VarType

* fix compile bug of VarType

* fix test_sparse_conv_op

* fix test_sparse_norm_op
```
- delete GetExpectedKernelType mkldnn of conv_op (#47044) · a9c20660
  Committed by HongyuJia on Oct 18, 2022
- [AutoParallel] add callbacks (#47014) · 7c92177c
  Committed by zhaoyingli on Oct 18, 2022
```
* [AutoParallel] add callbacks

* fix unittest

* fix dist_context

* fix engine

* fix cmakelist

* fix unittest's returns

* fix cmakelist
```
Oct 17, 2022 · 2 commits
- Add enable_partial_send_recv switch in pipeline_configs (#46992) · b9a2f29c
  Committed by Ghost Screaming on Oct 17, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong.

* Support allow_partial switch, which can be configured in pipeline_configs. If the tensors sent from different hosts are not the same, they should not be sent partially and then concatenated into a whole tensor.

* Change name allow_partial to enable_partial_send_recv.

* Add global variable _enable_partial_send_recv

- Support BF16 training for sharding (#46846) · 0b39b244
  Committed by Ghost Screaming on Oct 17, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bugs.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.

Co-authored-by: sneaxiy <sneaxiy@126.com>
