Commits · 29d75c14f1e25ca9c4b741270859027fa390179a · BaiXuePrincess / Paddle

23 Nov, 2022 6 commits
- Add bfloat16 type support for abs op (#48205) · 29d75c14
  Committed by limingshu on Nov 23, 2022
```
* first commit

* 2nd commit
```
  29d75c14
- add support of controlflow op for custom device (#48259) · edf46919
  Committed by duanyanhui on Nov 23, 2022
  
  edf46919
- opt kernel_factory warning message (#48245) · 32462c64
  Committed by HongyuJia on Nov 23, 2022
  
  32462c64
- fix vector out of range error (#48255) · a606db67
  Committed by Leo Chen on Nov 23, 2022
  
  a606db67
- add warpctc kernel and change cast_v2 to cast for xpu, test=kunlun (#48134) · 25ffe9c2
  Committed by zhangyikun02 on Nov 23, 2022
  
  25ffe9c2
- Use cublaslt in multi transformer FFN (#48052) · b07e6b45
  Committed by MarDino on Nov 23, 2022
```
* use fused mlp in multi transformer
* Restruct code
* use cublaslt to fuse ffn
* fix conflict
```
  b07e6b45
22 Nov, 2022 10 commits
- [PHI] Migrate elementwise_div + all elementwise grad kernels (#48210) · 78b30e97
  Committed by Piotr Paturej on Nov 22, 2022
```
* Migrate elementwise_div

* Migrate elementwise grad kernels
```
  78b30e97
- Optimize the format of printing phi kernels (#48228) · cbdc86b5
  Committed by Zhang Zheng on Nov 22, 2022
  
  cbdc86b5
- fix:fix the bug of TRT_8.0.3.4 (#48135) · 1022b777
  Committed by feng_shuai on Nov 22, 2022
```
* fix:fix the bug of trt_8.0.3.4

* fix: fix the bug of trt_8.0

* fix: notes
```
  1022b777
- fix typo error (#48156) · 0cdca676
  Committed by HongyuJia on Nov 22, 2022
  
  0cdca676
- [PHI decoupling] move vol2col from fluid to phi (#48175) · aa36c6aa
  Committed by huangjiyi on Nov 22, 2022
```
* move vol2col from fluid to phi

* update copyright year
```
  aa36c6aa
- CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203) · df4dfda0
  Committed by Tian Zheng on Nov 22, 2022
```
* Skip tests that use fused_ops on H100

* Add error message to FusedOps on H100
```
  df4dfda0
- Some residualdata fixes (#48118) · 7bbdbe5b
  Committed by Sylwester Fraczek on Nov 22, 2022
```
Removed ResidualData and Bias from ExtraAttrProperties because it's not an attribute.
Removed bug with checking for ResidualData attribute in matmul_elementwise_add_fuse_pass
Removed residualData from list of matmul outputs in cpu_bfloat16_pass.cc because it's input
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>
```
  7bbdbe5b
- Delete caching from requantize_mkldnn_op and changed to Acquire API (#48113) · 7d6a4a54
  Committed by Hulek on Nov 22, 2022
```
* Delete caching from requantize_mkldnn_op and changed to Acquire API
* Fixed codestyle and implementation
```
  7d6a4a54
- bf16 for interpolate, nhwc for bf16 (#48192) · e0dd4ee9
  Committed by Yuang Liu on Nov 22, 2022
  
  e0dd4ee9
- [PHI decoupling] remove "gpu_device_function.h" in fluid. (#48117) · 4da1a0fe
  Committed by huangjiyi on Nov 22, 2022
```
* move "paddle/phi/backends/gpu/gpu_device_function.h" to phi

* update copyright years

* rm "fluid/platform/device/gpu/gpu_device_function.h" in phi

* rm dependence to "gpu_device_function.h" in fluid

* rm gpu_device_function.h etc in fluid

* fix rocm-complie bugs

* fix cuda_helper_test.cu bugs
```
  4da1a0fe
21 Nov, 2022 16 commits

fix doc of NPUPlace (#48148) · 809516f6
Committed by Leo Chen on Nov 21, 2022
```
* fix doc of NPUPlace

* fix doc of NPUPlace, test=document_fix
```
809516f6
Fix Ctx Dev pointer for KUNLUN (#48184) · 2d0fb059
Committed by Roc on Nov 21, 2022

2d0fb059

add fc-residual quantization (#46917) · fed0ed34

Committed by Sylwester Fraczek on Nov 21, 2022

* add fc-residual quantization

* revert removal of check for use_mkldnn

* fix bug

* add disable_logs

* review fix

call twice AreScalesPresntForNodes instead of if-else

* rewrite residual input to output

* revert fc mkldnn taking residual data

* format fix

* fix LoDTensor->DenseTensor

* LoDTensor->DenseTensor

* output->input

* revert changes to unsupported script

revert changes to unsupported script

* remove fc residualdata from output blocklist in cpu_bfloat16_pass.cc

fed0ed34

delete unnecessary shape and slice op (#48112) · 41483383
Committed by RichardWooSJTU on Nov 21, 2022

41483383

[PHI] Migrate mul_grad kernel (#48061) · 55f6fb3d

Committed by Sławomir Siwek on Nov 21, 2022

* cleanup unused code

* unify is_int8 is_bfloat16

* Simplify matmul_v2 FWD kernel

* remove RunKernel methods

* remove import namespace

* remove headers

* clean fluid/phi cross imports

* remove fluid axpy_handler

* delete fluid methods

* activations

* OneDNNMemDesc

* MKLDNNFormatForSize

* MatchShapeToLayout

* MKLDNNMemoryFormat

* MKLDNNFormat

* ReorderMKLDNNHandler

* to_void_cast

* review suggestions

* interpolate

* remove fluid depedency

* init

* ExecuteMatMulV2

* rm fluid kernel

* matmul_grad

* remove mutable_data

* mul_grad

55f6fb3d

mma qk tensor_core (#48087) · d79eda71

Committed by lzy on Nov 21, 2022

* use mma for QK dot computing in fused_multi_transformer.
* Update fused_multi_transformer_op.cu.h

d79eda71

refine reduce_all (#48133) · 56f15c43
Committed by wanghuancoder on Nov 21, 2022
```
* refine reduce_all
```
56f15c43
Fix wrong eigen header include in data_type.h (#48157) · 70589379
Committed by zyfncg on Nov 21, 2022
```
* Fix wrong eigen header include

* fix compile bug
```
70589379
[PHI decoupling] move "thread pool" from fluid to phi (#48075) · 3ca7328f
Committed by PuQing on Nov 21, 2022
```
* move threadpool

fix cmake

* fix make
```
3ca7328f
add adamw suppor xpu, test=kunlun (#48114) · 27e252d9
Committed by taixiurong on Nov 21, 2022

27e252d9
round (#48107) · b546438c
Committed by wenbin on Nov 21, 2022

b546438c
[PHI decoupling] move cross_entropy from fluid to phi (#48160) · 3501ff7d
Committed by huangjiyi on Nov 21, 2022
```
* move cross_entropy from fluid to phi

* replace mutable_data with Alloc

* use .template
```
3501ff7d

Unify `ProcessGroupNCCL` APIs underlying implementation (#48163) · 88410225

Committed by Wen Sun on Nov 21, 2022

* refactor: replace Collective & PointToPoint with NCCLEnv

* refactor: rename to RunFnInNCCLEnv

* refactor: pass std::function by value

88410225

add new map instance (#48145) · 2a47416c
Committed by LiYuRio on Nov 21, 2022

2a47416c
return pointer rather than reference (#48152) · 403d58bb
Committed by LiYuRio on Nov 21, 2022

403d58bb
remove macros.h (#48069) · 02c51f3b
Committed by PuQing on Nov 21, 2022

02c51f3b

19 Nov, 2022 2 commits
- refactor: rm redundant funcs (#48149) · f38e09f0
  Committed by Wen Sun on Nov 19, 2022
  
  f38e09f0
- [CustomPlace] fix amp (#48090) · c775bc69
  Committed by Aganlengzi on Nov 19, 2022
```
* [CustomPlace] fix amp

* [CustomPlace] fix amp

* fix ut because of too long time matmul fp16
```
  c775bc69
18 Nov, 2022 6 commits

refine save hook (#48124) · 04709310
Committed by wanghuancoder on Nov 18, 2022

04709310

Fused QKVBiasAdd and Transpose with Split Q, KV (#47680) · d595928e

Committed by MarDino on Nov 18, 2022

* fused qkvBiasAdd and transpose with split qkv

* fix typo

* fix format

* fix name

* add annotation

* fix comment

d595928e

[PHI] Migrate matmul_grad kernel (#48023) · 4ab18ada

Committed by Sławomir Siwek on Nov 18, 2022

* cleanup unused code

* unify is_int8 is_bfloat16

* Simplify matmul_v2 FWD kernel

* remove RunKernel methods

* remove import namespace

* remove headers

* clean fluid/phi cross imports

* remove fluid axpy_handler

* delete fluid methods

* activations

* OneDNNMemDesc

* MKLDNNFormatForSize

* MatchShapeToLayout

* MKLDNNMemoryFormat

* MKLDNNFormat

* ReorderMKLDNNHandler

* to_void_cast

* review suggestions

* interpolate

* remove fluid depedency

* init

* ExecuteMatMulV2

* rm fluid kernel

* matmul_grad

* remove mutable_data

4ab18ada

[PHI] Migrate conv_transpose kernel (#48119) · 9aacb31b

Committed by Zuza Gawrysiak on Nov 18, 2022

* Migrate conv_transpose to phi

* Move handler to kernel

* kernel m

* Fix formatting

* handler

* remove fluid

* revert tcp_store

* tcp_store

* remove unused

* Fix declaration

* add dnn input

* Fix typo
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

9aacb31b

Fix bug of zero_allocator in HostAlloc (#48108) · 7f92e27e
Committed by zyfncg on Nov 18, 2022
```
* fix bug of zero_allocator in host

* fix test compile bug

* add unittest

* update test
```
7f92e27e

Optimize FusedBiasAddGelu Kernel (#47679) · b0e28540

Committed by MarDino on Nov 18, 2022

* Add quick gelu and fused bias add kernel

* fix annotation

* remove useless code

* add fast gelu option and set it in multi transformer op

* add flag to restrict if use fast gelu approximate

* fix flags conflict

* fix use tanh function instead

* add cudart version limit

* use phi fast tanh func

* fix comment

b0e28540

BaiXuePrincess / Paddle is up to date with the fork's source project
