Commits · f9815bfee7f74d08ebcd0e3c9e588a3261326121 · BaiXuePrincess / Paddle

03 Dec, 2022 1 commit
- Scatter 0D index for gather, 0D index and 0D updates for scatter. (#48452) · f9815bfe
  Authored by Yuang Liu on Dec 03, 2022
  
  f9815bfe
02 Dec, 2022 13 commits

[Paddle-TRT] Support engine sharing memory of multiple predictors (#47631) · ea5ca555
Authored by Yuanle Liu on Dec 02, 2022

ea5ca555
[PHI] Migrate elementwise_sub kernel (#48611) · 493825a5
Authored by Piotr Paturej on Dec 02, 2022
```
* Add migrations

* Fix build errors

* Remove elementwise_mul from migration
```
493825a5

Migrate mul_mkldnn_op to phi matmul_kernel (#48299) · e8edbb09

Authored by Hulek on Dec 02, 2022

* Migrate mul_mkldnn_op to matmul_kernel

* Review fixes - changed mutable_data, changed ctx to dev_ctx, fixed namespaces

* switched some funcs to phi

* Deleted not needed phi:: and changed place checking according to standards

e8edbb09

[XPU ]Fix xpu compile error (#48621) · 2af82190

Authored by Jiabin Yang on Dec 02, 2022

* [Eager] Fix paddle.grad interface

* [Eager] Support minimum SubGraph for GeneralGrad

* Add needed_nodes to prune grad graph more thoroughly

* [Eager] Add grad_node_trans_mapping_ to record which grad_node has been transformed to AccumulationNode

* [Eager] Fix paddle.grad interface

* Polish code

* remove potential_stop_node

* Add endding_nodes to enhance genSugraph logic

* clear endding_nodes_

* polish code

* rename endding_nodes to endding_nades_

* Refactor grad interface

* Add register_hook case to fix coverage-ci

* Fix code format

* Refactor general_grad

* Add more code comments

* call clear directly to release GradSlotMeta

* fix a mistake

* fix matmul/ multiply kernel logic and optional input in yaml, fill zeros logic and so on.

* fix batch_norm_double_grad yaml optional config

* fix tanh_triple_grad yaml and kernels

* fix MultiplyTripleGradKernel optional logic

* fix merge mistake

* fix compile error

* remove legacy attr for bn

* polish code

* fix some kernel

* merge develop

* fix error

* remote log

* fix kernel with full like

* hide value log behind

* hide value log behind

* fix matmul_triple grad

* fix xpu compile error

* fix xpu compile error

* fix xpu ut

* fix xpu ut

* fix_xpu_compile_error
Co-authored-by: Weilong Wu <veyron_wu@163.com>

2af82190

Split common funcs from reduction and structure modification (#46970) · ef575d6a

Authored by Bo Zhang on Dec 02, 2022

* profile reduce kernel for fp16 and reduceHigherdim

* use reinterpret_cast

* fix for CI on ROCm

* add Macro for ROCm

* ROCm CI config

* ROCm CI config

* unit test repair

* pull

* add common_funcs.h

* reduceType

* Update reduce_function.h

* not higher

* rename

ef575d6a

Fix fuse_gemm_epilogue (#47805) · 6efc2888

Authored by Shijie on Dec 02, 2022

* Fix fuse_gemm_epilogue

* update tests

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* fix random seed

* use assert_allclose

* Update test_dist_fuse_gemm_epilogue_pass.py

* Update cpp_pass.py

* Update test_dist_fuse_gemm_epilogue_pass.py

* fix codestyle

* update seed and atol

6efc2888

add some compare and logical trt converter (#48592) · 4c38b87e
Authored by gem5 on Dec 02, 2022

4c38b87e
fix phi capi kernel registration macro error (#48616) · 0f3b1ad6
Authored by ronnywang on Dec 02, 2022
```
* fix capi kernel registration macro error

* update
```
0f3b1ad6
[Eager, Performance Optimization] modify AllocateFrom to reduce deconstruction... · 708c4f88
Authored by Weilong Wu on Dec 02, 2022
```
[Eager, Performance Optimization] modify AllocateFrom to reduce deconstruction of shared_ptr (#48548)
```
708c4f88

[Eager] Optimize Grad by prune useless branch (#47827) · d1e93be1

Authored by Jiabin Yang on Dec 02, 2022

* [Eager] Fix paddle.grad interface

* [Eager] Support minimum SubGraph for GeneralGrad

* Add needed_nodes to prune grad graph more thoroughly

* [Eager] Add grad_node_trans_mapping_ to record which grad_node has been transformed to AccumulationNode

* [Eager] Fix paddle.grad interface

* Polish code

* remove potential_stop_node

* Add endding_nodes to enhance genSugraph logic

* clear endding_nodes_

* polish code

* rename endding_nodes to endding_nades_

* Refactor grad interface

* Add register_hook case to fix coverage-ci

* Fix code format

* Refactor general_grad

* Add more code comments

* call clear directly to release GradSlotMeta

* fix a mistake

* fix matmul/ multiply kernel logic and optional input in yaml, fill zeros logic and so on.

* fix batch_norm_double_grad yaml optional config

* fix tanh_triple_grad yaml and kernels

* fix MultiplyTripleGradKernel optional logic

* fix merge mistake

* fix compile error

* remove legacy attr for bn

* polish code

* fix some kernel

* merge develop

* fix error

* remote log

* fix kernel with full like

* hide value log behind

* hide value log behind

* fix matmul_triple grad
Co-authored-by: Weilong Wu <veyron_wu@163.com>

d1e93be1

add silu, silu_grad, unfold and unfold_grad xpu kernels (#48325) · f71de378

Authored by ykkk2333 on Dec 02, 2022

* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

* add silu, unfold and their grads,test=kunlun

f71de378

fix boardcasting superlink (#48434) · c34812ac

Authored by Infinity_lee on Dec 02, 2022

* fix boardcasting superlink

* Update bitwise_op.cc

* fix typo errors(from 48186)

* Update python/paddle/distribution/uniform.py
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>

* Update math.py

* Update math.py

* refix

* Update logic.py

* BaseTransform api doc; test=docs_preview

* Update python/paddle/vision/transforms/transforms.py

* for text block; test=docs_preview

* Update transforms.py
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>

c34812ac

polish fusion kernel naming (#48609) · 61486bf2
Authored by Chen Weihang on Dec 02, 2022

61486bf2

01 Dec, 2022 11 commits
- Rename kernel for top_k, slogdeterminant, generate_proposals_v2 (#48594) · 3d35aa80
  Authored by zyfncg on Dec 01, 2022
```
* rename kernel for top_k, slogdeterminant, generate_proposals_v2

* fix bug
```
  3d35aa80
- [Paddle Inference] General optimization for no_varlen multihead (#48469) · e5cf75d8
  Authored by Wangzheee on Dec 01, 2022
```
* general optimization for no_varlen multihead
```
  e5cf75d8
- [Inference] Optimize memory_optimize pass. (#48476) · aa892113
  Authored by Wilber on Dec 01, 2022
```
* update memory_optimize pass
```
  aa892113
- do not link python lib in tensor wrapper (#48523) · 93099bb8
  Authored by wanghuancoder on Dec 01, 2022
```
* do not link python lib in tensor wrapper
```
  93099bb8
- [inference][trt] dynamic shape support for Instance norm (#47998) · 758fccfe
  Authored by Zhang Jun on Dec 01, 2022
```
* instance norm support dynamic shape
* update unittest
```
  758fccfe
- [Paddle Inference] Add sign and not trt converter (#48557) · 1b1d6d3f
  Authored by xiaoxiaohehe001 on Dec 01, 2022
  
  1b1d6d3f
- [Paddle Inference] remove conv_act_set from graph_pattern_detector.cc (#48569) · d3f8ede0
  Authored by zhoutianzi666 on Dec 01, 2022
```
* remove conv_act_set from graph_pattern_detector.cc
```
  d3f8ede0
- [inference][trt] Fp16 support for Generic plugin (#48253) · 2bdad6cd
  Authored by Zhang Jun on Dec 01, 2022
```
* Support FP16 in generic TensorRT plugin.
* Support FP16 for Pad3D.
```
  2bdad6cd
- fuse-mt passes compatible with structured pruning (#48585) · a365024c
  Authored by minghaoBD on Dec 01, 2022
```
* fuse-mt passes compatible with structured pruning
```
  a365024c
- [Fix Type] Fix typo error (#48391) · 47e7b7a5
  Authored by HongyuJia on Dec 01, 2022
```
* fix typo error

* pass CI-coverage
```
  47e7b7a5
- change d2d copy to api copy in xpu kernel, test=kunlun (#48505) · 4f834cb2
  Authored by zhangyikun02 on Dec 01, 2022
  
  4f834cb2
30 Nov, 2022 15 commits
- fix phi header file without fluid header, test=develop (#48488) · cbb1cfbb
  Authored by Qi Li on Nov 30, 2022
  
  cbb1cfbb
- Fix error log for yaml check (#48126) · f62b3fc8
  Authored by zyfncg on Nov 30, 2022
```
* fix error log for yaml check

* remove grad_op of increment
```
  f62b3fc8
- [PHI decoupling] migrate transpose_op.cu.h and gpu_utils.h to phi (#48286) · 8a9bef70
  Authored by Netpunk on Nov 30, 2022
```
* migrate transpose_op.cu.h and gpu_utils.h

* format code style

* fix some problems

* format code

* reset tranpose_op.cc

* test commit

* recover transpose_op.h

* delete transpose_op.h

* adjust header files order in transpose_op.cc
```
  8a9bef70
- [BugFix]Fix tuple output bug of pylayer (#48533) · fd1c0d7f
  Authored by ShenLiang on Nov 30, 2022
```
* fix bug of pylayer

* fix bug
```
  fd1c0d7f
- refine mmap allocator (#48511) · 2de881aa
  Authored by wanghuancoder on Nov 30, 2022
  
  2de881aa
- [Perf]Fix interploate OutSize data transform problem (#48498) · 0b2a66bb
  Authored by Aurelius84 on Nov 30, 2022
```
* [Perf]Fix interploate OutSize data transform problem

* fix code style

* fix grad

* fix phi kernel
```
  0b2a66bb
- Support more activation in fused multi transformer (#48371) · 8a717a3e
  Authored by MarDino on Nov 30, 2022
```
* add activation support
* fix cublasLt bug
* remove useless code and fix test random range
```
  8a717a3e
- feat:add the support for vit_attention_op on gpu (#48515) · e9ca7600
  Authored by feng_shuai on Nov 30, 2022
  
  e9ca7600
- [Paddle Inference] clean unused code (#48392) · 5de01e8a
  Authored by Yuanle Liu on Nov 30, 2022
  
  5de01e8a
- Add fuse_act_add_grad_pass (#48346) · ca552933
  Authored by zhangbo9674 on Nov 30, 2022
```
* add fuse act add grad pass

* polish code

* refine code

* add test

* refine code
```
  ca552933
- Fix the name map of operator from Phi to fluid (#48496) · e337d280
  Authored by zyfncg on Nov 30, 2022
```
* rename some kernel name

* fix compile problem
```
  e337d280
- Fix bug of wrong eigen dependency (#48485) · 35902ec6
  Authored by zyfncg on Nov 30, 2022
```
* fix bug of eigen_dependency

* fix xpu compile
```
  35902ec6
- Add int8 support in fused_multi_transformer_pass and fuse_multi_transformer_layer_pass (#48209) · 12486712
  Authored by RichardWooSJTU on Nov 30, 2022
```
* delete unnecessary shape and slice op
Co-authored-by: Your Name <you@example.com>
```
  12486712
- use correct xpu stream for synchronization (#48470) · 16562a9d
  Authored by james on Nov 30, 2022
```
some legacy code still use xpu_wait() for stream sync -- it only syncs
default stream. this PR replaces them with dev_ctx.Wait() to ensure
that correct stream is always used
```
  16562a9d
- optimize for argsort with xpu, test=kunlun (#48440) · 7bf7e6e0
  Authored by zhangyikun02 on Nov 30, 2022
  
  7bf7e6e0

BaiXuePrincess / Paddle is up to date with the fork source project
