- 01 December 2022, 10 commits
-
-
Committed by Wangzheee
* general optimization for no_varlen multihead
-
Committed by Wilber
* update memory_optimize pass
-
Committed by wanghuancoder
* do not link python lib in tensor wrapper
-
Committed by Zhang Jun
* instance norm support dynamic shape * update unittest
-
Committed by xiaoxiaohehe001
-
Committed by zhoutianzi666
* remove conv_act_set from graph_pattern_detector.cc
-
Committed by Zhang Jun
* Support FP16 in generic TensorRT plugin. * Support FP16 for Pad3D.
-
Committed by minghaoBD
* fuse-mt passes compatible with structured pruning
-
Committed by HongyuJia
* fix typo error * pass CI-coverage
-
Committed by zhangyikun02
-
- 30 November 2022, 15 commits
-
-
Committed by Qi Li
-
Committed by zyfncg
* fix error log for yaml check * remove grad_op of increment
-
Committed by Netpunk
* migrate transpose_op.cu.h and gpu_utils.h * format code style * fix some problems * format code * reset transpose_op.cc * test commit * recover transpose_op.h * delete transpose_op.h * adjust header files order in transpose_op.cc
-
Committed by ShenLiang
* fix bug of pylayer * fix bug
-
Committed by wanghuancoder
-
Committed by Aurelius84
* [Perf] Fix interpolate OutSize data transform problem * fix code style * fix grad * fix phi kernel
-
Committed by MarDino
* add activation support * fix cublasLt bug * remove useless code and fix test random range
-
Committed by feng_shuai
-
Committed by Yuanle Liu
-
Committed by zhangbo9674
* add fuse act add grad pass * polish code * refine code * add test * refine code
-
Committed by zyfncg
* rename some kernel names * fix compile problem
-
Committed by zyfncg
* fix bug of eigen_dependency * fix xpu compile
-
Committed by RichardWooSJTU
* delete unnecessary shape and slice op Co-authored-by: Your Name <you@example.com>
-
Committed by james
Some legacy code still uses xpu_wait() for stream sync, but it only synchronizes the default stream. This PR replaces those calls with dev_ctx.Wait() so that the correct stream is always used (see the sketch below).
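A hedged sketch of the replacement this commit describes; the helper name is illustrative and the snippet only compiles inside the Paddle source tree, so treat it as a pattern, not the PR's actual diff.

```cpp
// xpu_wait() with no argument synchronizes only the XPU runtime's default
// stream, while kernels may have been launched on the stream owned by the
// device context, so the context should wait on its own stream instead.
#include "paddle/phi/backends/xpu/xpu_context.h"

void SyncAfterKernel(const phi::XPUContext& dev_ctx) {
  // Before (legacy): xpu_wait();  // syncs only the default XPU stream
  dev_ctx.Wait();  // syncs the stream this device context actually uses
}
```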
-
Committed by zhangyikun02
-
- 29 November 2022, 15 commits
-
-
Committed by lzy
* fix mma_tensorcore (__CUDA_ARCH__) * disable tensorcore by default, because the __CUDA_ARCH__ check causes undefined behavior in some environments; it can be enabled manually on a machine that supports tensorcore (see the sketch below).
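A minimal, self-contained sketch of the guard pattern this commit message refers to; the macro name ENABLE_MMA_TENSORCORE and the kernel are hypothetical, not the PR's code. The point is that the tensor-core path must be opt-in and compiled only where __CUDA_ARCH__ is both defined and at least 700.

```cuda
#include <cstdio>

__global__ void scale(float* x, float a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
// __CUDA_ARCH__ is not defined during host compilation, and below sm_70 the
// mma instructions used by tensor-core paths do not exist, so guard both.
#if defined(ENABLE_MMA_TENSORCORE) && defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 700)
  x[i] *= a;  // an mma/tensor-core based path would live here
#else
  x[i] *= a;  // portable CUDA-core fallback, used by default
#endif
}

int main() {
  const int n = 8;
  float h[n];
  for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);
  float* d = nullptr;
  cudaMalloc(&d, n * sizeof(float));
  cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
  scale<<<1, n>>>(d, 2.0f, n);
  cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d);
  for (int i = 0; i < n; ++i) printf("%g ", h[i]);
  printf("\n");
  return 0;
}
```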
-
Committed by HongyuJia
-
Committed by Paulina Gacek
* transpose2 kernel migrated * Got rid of mutable_data * x modification added * ops added in extra info file * Formatting fix * 2 fuse passes with transpose2 commented * nr of outs changed in 2 passes, passes uncommented * Changes in passes reverted * transpose changed in operator.cc * MKLDNN check in operator.cc * Transpose fixes * Fix deleted from operator * template corrected Co-authored-by: Paulina Gacek <paulinagacek@intel.com>
-
Committed by 张春乔
* replace LoDTensor with phi::DenseTensor in fluid/operators * replace LoDTensor with phi::DenseTensor in fluid/operators * Update split_lod_tensor_op.cc * Update warpctc_op.cc * Update broadcast_tensors_op.cc * Update crf_decoding_op.cc * Update lstm_op.cc * Update lstm_op.cc * Update lod_reset_op.cc * Update gru_op.cc * Update linear_chain_crf_op.cc * resume 2 files for conflict * Update gru_op.cc * Update linear_chain_crf_op.cc * Update lstm_op.cc
-
Committed by Nyakku Shigure
* isort all files * revert conflicting files * revert conflicting files * revert conflicting files
-
Committed by xiaoxiaohehe001
-
Committed by Asthestarsfalll
* migrate enforce_custom.h from fluid to phi * move to backends/custom/
-
Committed by Sławomir Siwek
-
Committed by Vvsmile
Optimize the implementation of the argsort operator
-
Committed by Sławomir Siwek
* cleanup unused code * unify is_int8 is_bfloat16 * Simplify matmul_v2 FWD kernel * remove RunKernel methods * remove import namespace * remove headers * clean fluid/phi cross imports * remove fluid axpy_handler * delete fluid methods * activations * OneDNNMemDesc * MKLDNNFormatForSize * MatchShapeToLayout * MKLDNNMemoryFormat * MKLDNNFormat * ReorderMKLDNNHandler * to_void_cast * review suggestions * interpolate * remove fluid dependency * init * ExecuteMatMulV2 * rm fluid kernel * matmul_grad * remove mutable_data * mul_grad * matmul fwd * add extra attr * temp disable passes * re-enable passes * workaround for matmul+act * fix for matmul+eltwise_add * fix typo * merge bugfix #48364 * remove merge conflict
-
Committed by kangguangli
* fix: add no support for cuda_arch < 700 * replace Executor in while op with InterpreterCore * cache InterpreterCore as a member of WhileOp (see the sketch below) * fix bug: tensor place changed because of assign op in while loop * refine code * refine code * refine code * hot fix * fix compile * merge develop * follow comments * add log for test * remove LoDTensor * set flag control_flow_use_new_executor false Co-authored-by: fengshuai <fengshuai03@baidu.com> Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
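A generic, self-contained sketch of the caching pattern described above; the names InterpreterCoreLike and WhileOpLike are stand-ins, not Paddle's actual classes. The idea is to build the inner interpreter once, keep it as a member, and reuse it on every loop iteration instead of re-creating an executor per iteration.

```cpp
#include <memory>

struct InterpreterCoreLike {       // stand-in for the real InterpreterCore
  void Run() { /* execute the cached inner block/program */ }
};

class WhileOpLike {                // stand-in for the real WhileOp
 public:
  void RunLoop(int steps) {
    for (int i = 0; i < steps; ++i) {
      if (!core_) {                // construct lazily on first use only
        core_ = std::make_unique<InterpreterCoreLike>();
      }
      core_->Run();                // reuse the cached core each iteration
    }
  }

 private:
  std::unique_ptr<InterpreterCoreLike> core_;  // cached across iterations
};

int main() {
  WhileOpLike op;
  op.RunLoop(3);
  return 0;
}
```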
-
Committed by haosicheng
-
Committed by JZ-LIANG
* get default calc stream from execution ctx instead of global dev ctx pool.
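A rough sketch of the change described in this entry, using a hypothetical helper name; the exact types used in the collective ops may differ, and the snippet only compiles inside the Paddle source tree. The idea is to take the device context (and hence its calc stream) from the operator's ExecutionContext rather than looking it up again in the global DeviceContextPool.

```cpp
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/platform/device_context.h"

// Hypothetical helper, not the PR's code.
template <typename DeviceContext>
const DeviceContext& CalcCtxFromExecutionCtx(
    const paddle::framework::ExecutionContext& ctx) {
  // Before (global lookup), roughly:
  //   auto* dev_ctx =
  //       paddle::platform::DeviceContextPool::Instance().Get(ctx.GetPlace());
  // After: use the context the executor already bound to this op.
  return ctx.device_context<DeviceContext>();
}
```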
-
Committed by gem5
-
Committed by LiYuRio
* remove lod_tensor_to_array, array_to_lod_tensor, DynamicRNN * remove less_equal, greater_than, greater_equal, equal, not_equal
-