Commits · 6a706e63cf2467c0a0dbfc2a774039b33c15e4b4 · PaddlePaddle / Paddle

28 Sep, 2022 · 1 commit

[PHI] relu6_grad kernel (#46501) · cee2b12d

Authored by Sławomir Siwek on Sep 28, 2022

* Relu6

* remove fluid handler

* add individual kernel signature

* coding style

* replace bounded_relu with clip

* whitespace

* code style

cee2b12d
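
The "replace bounded_relu with clip" step above relies on relu6 being, mathematically, a clip of the input to the interval [0, 6]. The following is a minimal pure-Python sketch of that identity and of the corresponding gradient. It is not the PHI kernel from this commit: the function names `relu6` and `relu6_grad` are hypothetical, and treating the gradient as zero exactly at the boundaries is an assumption.

```python
def relu6(x):
    """relu6 expressed as a clip of x to the interval [0, 6]."""
    return min(max(x, 0.0), 6.0)


def relu6_grad(x, dout):
    """Backward pass: the upstream gradient flows only where 0 < x < 6.

    Assumption: the gradient at the boundary points x == 0 and x == 6 is
    taken to be zero; a real kernel may choose differently.
    """
    return dout if 0.0 < x < 6.0 else 0.0


inputs = [-2.0, 0.5, 3.0, 6.0, 9.0]
outputs = [relu6(x) for x in inputs]          # forward values
grads = [relu6_grad(x, 1.0) for x in inputs]  # gradient w.r.t. x for dout = 1
print(outputs)
print(grads)
```

Values inside (0, 6) pass through unchanged with gradient 1, while values outside are clamped and receive no gradient.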

26 Sep, 2022 · 1 commit

[MLU] fluid: add mluop (#46429) · 3e1e482b

Authored by cifar10 on Sep 26, 2022

3e1e482b

25 Sep, 2022 · 1 commit

move some singleton to cc file (#46470) · e8b9ae20

Authored by sneaxiy on Sep 25, 2022

e8b9ae20

22 Sep, 2022 · 1 commit

[MLU] fix profiler compile failure (#46208) · 608181a9

Authored by Chenxiao Niu on Sep 22, 2022

608181a9

18 Sep, 2022 · 1 commit

Add INT8 support for fused_multi_transformer_op (#45284) · 3d7e2118

Authored by RichardWooSJTU on Sep 18, 2022

3d7e2118

16 Sep, 2022 · 5 commits

Support broadcast elementwise operators with int64 index type (#45741) · 20b5bf84

Authored by sneaxiy on Sep 16, 2022

* support int64 non-broadcast

* support broadcast case for int64 index

* fix bug

* support more Arity

* remove some codes

* upgrade patchelf to v0.15.0 to pass CI build

* fix bug

* fix patchelf installation

* add debug flags

* remove useless codes

* fix viterbi_decode and set_value op uts

* remove always enable int64

20b5bf84

optimize device synchronization in profiler (#46089) · 2a5bd7dc

Authored by chenjian on Sep 16, 2022

* avoid synchronizing all devices

* synchronize custom device

2a5bd7dc

Modify callstacklevel flag for c++ (#46058) · d072aaeb

Authored by JingZhuangzhuang on Sep 16, 2022

d072aaeb

add interpretercore for jit engine (#46092) · 22c3cdb4

Authored by Leo Chen on Sep 16, 2022

* add interpretercore for jit engine

* add ut

22c3cdb4

[CustomDevice] add new executor support (#46038) · 268f097e

Authored by ronnywang on Sep 16, 2022

* [CustomDevice] add custom_device_resource_pool & device_event_custom_device

* update

* update

* update

* update

268f097e

15 Sep, 2022 · 2 commits

updating mul and matmul with set_mem_desc (#45624) · 416e0de7

Authored by Jacek Czaja on Sep 15, 2022

* - mul & matmul changes

- fix

- bs16 correction of strides

* - cosmetic fixes

* - lint

* - fix

* - fix

* - format -> mem_desc

* - fix

* - fix

* - fix

* - fix

* - fix

416e0de7

[CodeStyle] trim trailing whitespace in .h, .cc, .cu, etc. (#46006) · 8dde7aea

Authored by Nyakku Shigure on Sep 15, 2022

8dde7aea

14 Sep, 2022 · 2 commits

Support inference compilation in training package (#46008) · cbe64cc1

Authored by JingZhuangzhuang on Sep 14, 2022

* merge python lib

* Update third_party.cmake

* Update CMakeLists.txt

cbe64cc1

delay tensorrt registry (#45824) · d7d35ff8

Authored by JingZhuangzhuang on Sep 14, 2022

* Delay TensorRT registry

* Add unused define

* Fix TensorRT test

* fix function to reference

* Update trt_plugin.h

d7d35ff8

09 Sep, 2022 · 3 commits

[new-exe] convert fused_all_reduce_op_handle to program (#45774) · e755c07e

Authored by Leo Chen on Sep 09, 2022

* add operator<< for BuildStrategy

* add fake_coalesce

* fit allreduce mode for new_exe

* remove debug code

* follow comments

e755c07e

[CustomDevice] add dy2static support (#45878) · abc85c50

Authored by ronnywang on Sep 09, 2022

* [CustomDevice] add dy2static support

* update

abc85c50

[MLU] fix mluinfo compile error. (#45886) · f06ab336

Authored by Chenxiao Niu on Sep 09, 2022

f06ab336

08 Sep, 2022 · 2 commits

fix warning (#45870) · 23998b75

Authored by chenjian on Sep 08, 2022

23998b75

xpu-paddlepaddle-40 [Task] fused_gemm_epilogue supports XPU (#45706) · 7085cb97

Authored by taixiurong on Sep 08, 2022

* add gemm_epilogue

* xpu-paddlepaddle-40 [Task] fused_gemm_epilogue support test=kunlun

7085cb97

07 Sep, 2022 · 1 commit

[XPU] move rnn op to phi. (#45822) · 91631492

Authored by houj04 on Sep 07, 2022

91631492

06 Sep, 2022 · 1 commit

Update protobuf output format for profiler (#45724) · 23bc0e3c

Authored by chenjian on Sep 06, 2022

* update protobuf format

* fix protobuf content

* fix file mode

* fix compiling error when gpu not exists

* fix compiling error when gpu not exists

* fix compiling error when gpu not exists

* fix compiling error when gpu not exists

* support rocm

23bc0e3c

05 Sep, 2022 · 3 commits

[PHI] Move oneDNN helper classes to new location (#45626) · 269bd1fe

Authored by piotrekobi on Sep 05, 2022

* gaussian random

* mkldnn to onednn renaming

* fix merge conflicts

* remove fluid code

* onednn renaming

* Move classes from mkldnn_reuse.h to onednn_reuse.h

* Move more functions from mkldnn_helper.h to onednn_helpper.h

* Change MKLDNN to OneDNN in VLOG message
Co-authored-by: Silv3S <slawomir.siwek@intel.com>

269bd1fe

Fix jetson compile error (#45692) · cfaee812

Authored by chalsliu on Sep 05, 2022

cfaee812

fix some op int32 exceed range (#45711) · a1dbee23

Authored by sneaxiy on Sep 05, 2022

a1dbee23

02 Sep, 2022 · 1 commit

move onednn file from phi/kernels/funcs/onednn to phi/backends/onednn (#45659) · 6813f41e

Authored by kangguangli on Sep 02, 2022

6813f41e

01 Sep, 2022 · 3 commits

[XPU] add c_embedding_op_xpu. (#45617) · ed2ad5d9

Authored by houj04 on Sep 01, 2022

ed2ad5d9

xpu-paddlepaddle-37 [Task] migrate lamb to phi (#45520) · 1a0ef45e

Authored by taixiurong on Sep 01, 2022

test=kunlun

1a0ef45e

remove circular dependency of device_context and allocator (#45455) · 934171ae

Authored by Leo Chen on Sep 01, 2022

* refine cmake of framework

* add deps for dense tensor

* fix deps

* remove alloc(ctx)

* add depends on mkldnn

934171ae

29 Aug, 2022 · 3 commits

[PHI] Migrate relu6 and abs kernels (#45397) · 632bc1f2

Authored by Sławomir Siwek on Aug 29, 2022

* abs relu6 fwd

* abs bwd

* gaussian_random_kernel and mkldnn-onednn renaming

* scale kernel

* whitespace

* whitespace

* revert scale migration

* whitespaces

* revert changes to gaussian kernel

* whitespaces

632bc1f2

[IPU] support depthwise_conv2d ops (#45234) · a237ff8e

Authored by Allen Guo on Aug 29, 2022

* support depthwise_conv2d ops
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* fix duplicate name
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

a237ff8e

fix compile (#45441) · f49f3b4f

Authored by Allen Guo on Aug 29, 2022

f49f3b4f

26 Aug, 2022 · 1 commit

[XPU] add load_combine_op_xpu. test=kunlun (#45436) · 3055d71a

Authored by houj04 on Aug 26, 2022

3055d71a

25 Aug, 2022 · 2 commits

optimize conv algo cache (#41891) · 1cd7e68b

Authored by hong on Aug 25, 2022

* optimize conv algo speed

* code polish

* remove useless code

* fix compile error

* fix cpu compile error

* not use cudnn algo t

* add search cache max number

* polish code

* fix cache test bug

* add groups data format to conv args

* fix cache test bug

* fix cudnn_deterministic bug

* fix test switch auto tune bug

* fix test switch autotune bug;

* fix conv cache bug

* fix cache test error

* fix cache test bug

* fix windows mac compile error

* fix workspace search error

* update cudnn cache

* fix cache test bug; test=develop

* fix autotune switch test error

* polish code

* polish code

1cd7e68b

add temporal shift and grad *test=kunlun (#45300) · 63d9a175

Authored by haosicheng on Aug 25, 2022

63d9a175

24 Aug, 2022 · 1 commit

Support fp16 of adam operator in xpu environment (#45292) · a012d426

Authored by mengqingchun02 on Aug 24, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support fp16 of adam operator in xpu environment. test=kunlun

* support fp16 of adam operator in xpu environment. test=kunlun

* support fp16 of adam operator in xpu environment. test=kunlun

a012d426

23 Aug, 2022 · 1 commit

[CustomDevice] add profiler apis (#45130) · da51baf2

Authored by ronnywang on Aug 23, 2022

* [CustomDevice] add profiler apis

* migrate CalculateEstOccupancy into cuda_tracer

* update

* add ut

da51baf2

22 Aug, 2022 · 1 commit

Add int8 support for matmul+elementwise_add fuse pass (#45077) · 9e5f3a38

Authored by joanna.wozna.intel on Aug 22, 2022

* Add int8 support for matmul+elementwise_add fuse

* Corrections after review and ernie test fix

9e5f3a38

19 Aug, 2022 · 3 commits

[XPU] c_allreduce support int. update bkcl to 1.0.5. test=kunlun (#45248) · 9f1f1b0a

Authored by houj04 on Aug 19, 2022

9f1f1b0a

[XPU] add merged_momentum unittest and change momentum (#45241) · e0f1c9f2

Authored by dongfangshenzhu on Aug 19, 2022

* add merged_momentum *test=kunlun

* add merged_momentum *test=kunlun

* add fp16 to merged_momentum,*test=kunlun

* change dist_model.cc

* add merged_momentum unittest and  change momentum,test=kunlun

* add merged_momentum unittest and  change momentum,test=kunlun

* add merged_momentum unittest and  change momentum,test=kunlun

* add merged_momentum unittest and  change momentum,test=kunlun

e0f1c9f2

Support beam search decode op in XPU environment (#44917) · adaffb7b

Authored by mengqingchun02 on Aug 19, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

adaffb7b

PaddlePaddle / Paddle · synced successfully about 1 year ago