Commits · 1cd7e68b34487f06fb52acf5381fdd533425b9e0 · BaiXuePrincess / Paddle

25 Aug, 2022 2 commits

optimize conv algo cache (#41891) · 1cd7e68b

Committed by hong on Aug 25, 2022

* optimize conv algo speed

* code polish

* remove useless code

* fix compile error

* fix cpu compile error

* not use cudnn algo t

* add search cache max number

* polish code

* fix cache test bug

* add groups data format to conv args

* fix cache test bug

* fix cudnn_deterministic bug

* fix test switch auto tune bug

* fix test switch autotune bug;

* fix conv cache bug

* fix cache test error

* fix cache test bug

* fix windows mac compile error

* fix workspace search error

* update cudnn cache

* fix cache test bug; test=develop

* fix autotune switch test error

* polish code

* polish code

1cd7e68b

add temporal shift and grad *test=kunlun (#45300) · 63d9a175

Committed by haosicheng on Aug 25, 2022

63d9a175

24 Aug, 2022 1 commit

Support fp16 of adam operator in xpu environment (#45292) · a012d426

Committed by mengqingchun02 on Aug 24, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support fp16 of adam operator in xpu environment. test=kunlun

* support fp16 of adam operator in xpu environment. test=kunlun

* support fp16 of adam operator in xpu environment. test=kunlun

a012d426

23 Aug, 2022 1 commit

[CustomDevice] add profiler apis (#45130) · da51baf2

Committed by ronnywang on Aug 23, 2022

* [CustomDevice] add profiler apis

* migrate CalculateEstOccupancy into cuda_tracer

* update

* add ut

da51baf2

22 Aug, 2022 1 commit
- Add int8 support for matmul+elementwise_add fuse pass (#45077) · 9e5f3a38
  Committed by joanna.wozna.intel on Aug 22, 2022
```
* Add int8 support for matmul+elementwise_add fuse

* Corrections after review and ernie test fix
```
  9e5f3a38
19 Aug, 2022 3 commits

[XPU] c_allreduce support int. update bkcl to 1.0.5. test=kunlun (#45248) · 9f1f1b0a

Committed by houj04 on Aug 19, 2022

9f1f1b0a

[XPU] add merged_momentum unittest and change momentum (#45241) · e0f1c9f2

Committed by dongfangshenzhu on Aug 19, 2022

* add merged_momentum *test=kunlun

* add merged_momentum *test=kunlun

* add fp16 to merged_momentum,*test=kunlun

* change dist_model.cc

* add merged_momentum unittest and  change momentum,test=kunlun

* add merged_momentum unittest and  change momentum,test=kunlun

* add merged_momentum unittest and  change momentum,test=kunlun

* add merged_momentum unittest and  change momentum,test=kunlun

e0f1c9f2

Support beam search decode op in XPU environment (#44917) · adaffb7b

Committed by mengqingchun02 on Aug 19, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

adaffb7b

18 Aug, 2022 1 commit

change to async mode for xpu multi-card training in static graph mode, test=kunlun (#45024) · 41bdf41d

Committed by zhangxiaoci on Aug 18, 2022

* change to async mode for xpu multi-card training in static graph mode

* minor bugfix

* irrelevant. move to another pr

* move change to other pr

* fix stream issue

* fix 'stream not meet with current context' error

* fix branch diverge, test=kunlun

41bdf41d

17 Aug, 2022 1 commit

add instance norm op for xpu (#45097) · 216d25ac

Committed by ykkk2333 on Aug 17, 2022

* xpu unittest grad compute supports more types, *test=kunlun

* add instance norm xpu, *test=kunlun

216d25ac

16 Aug, 2022 1 commit
- [XPU] add truncated_gaussian_random op. (#45152) · 5bcabf78
  Committed by houj04 on Aug 16, 2022
  
  5bcabf78
15 Aug, 2022 2 commits
- add mish and mish_grad for XPU, test=kunlun (#45098) · 6815c8ab
  Committed by zhangyikun02 on Aug 15, 2022
  
  6815c8ab
- [XPU] add some collective ops. (#45049) · 7e2a20d5
  Committed by houj04 on Aug 15, 2022
```
* [XPU] add some collective ops. test=kunlun

* use XPUOpTestWrapper. test=kunlun

* skip kl1 for collective ops. fix typo: deivce -> device. test=kunlun
```
  7e2a20d5
12 Aug, 2022 2 commits

fix compilation (#45087) · 4eec94dd

Committed by Allen Guo on Aug 12, 2022

4eec94dd

[geometric]Add paddle.geometric.send_ue_recv API (#43174) · 615b15a3

Committed by Siming Dai on Aug 12, 2022

* add init file

* add op definition and infermeta

* add kernel definition funcs

* add broadcast infer shape

* add gpu forward kernel

* delete SUB and DIV

* add x_grad

* add template

* add e_grad for min and max

* fix small bug

* temp commit

* temp commit

* add e_grad for sum and mean

* fix some compile bug

* fix compile bugs

* fix compile problem

* add sum forward unittest

* fix broadcast error, add kernel sig, register e_grad, change unit test

* fix grad

* add temp grad fix

* temp commit

* add min max unittest

* add max, min unittest, fix mul bug

* add cpu forward sum and mean

* add forward min max, fix mean unittest

* add cpu backward min max

* fix code-style

* add backward sum mean

* fix rocm ci

* set unittest timeout

* fix bug of x broadcast to e, gpu grad

* fix bug of x broadcast to e, cpu grad

* rename BOOST_GET_CONST macro

* fix rocm ci

* mv graph_send_e_recv to graph_send_ue_recv

* move out_size to IntArray

* add eager op test

* fix max pool type bug, add unittest for api

* revise api doc

* add fp16 for atomic min and max, add unittest

* add unittest

* add fp16 support for graph_send_recv

* fix unittest fp16 bug

* change OutSizeTensor to Out_size

* move E to Y

* add copyright, fix comment

* review code

* fix thread block size

* fix thread block size

* change api attribute name: pool_type to reduce_op, compute_type to message_op

* change api attribute name, move pool_type to reduce_op, move compute_type to message_op

615b15a3

11 Aug, 2022 1 commit

Add input shape record for new dygraph operator (#44999) · 8ea83400

Committed by chenjian on Aug 11, 2022

* fix

* add control flag and input shapes for new dygraph

* fix file mode

* improve code coverage

* fix a bug in statistic

* fix according to review

* optimize performance

* fix

8ea83400

10 Aug, 2022 2 commits
- add macro control in enforce_xpu.h, test=kunlun (#45022) · 9e74211f
  Committed by zhangxiaoci on Aug 10, 2022
```
* add macro control in enforce_xpu.h, test=kunlun

* minor bugfix

* minor bugfix
```
  9e74211f
- [new-exec] set cuda device before run (#44985) · 68b06ba6
  Committed by Leo Chen on Aug 10, 2022
```
* set cuda device before run

* add header file

* fix compile
```
  68b06ba6
09 Aug, 2022 1 commit

add phi empty kernel for xpu,*test=kunlun (#44745) · cd0b03cd

Committed by z8hanghuan on Aug 09, 2022

* add phi empty,*test=kunlun

* support empty op in xpu, *test=kunlun

* support empty op in xpu, *test=kunlun

cd0b03cd

08 Aug, 2022 1 commit

[JitLayer]Rename Function to Engine and using new Function class to wrap Engine (#44900) · ede0990f

Committed by WangZhen on Aug 08, 2022

* Polish function code

* Rename function to engine

* Fix Log msg and doc

* Rename Function to Engine and using new Function class to wrap Engine

* Rename EngineInfo

* Adjust member variable order

ede0990f

05 Aug, 2022 3 commits
- [MKLDNN]Move mkldnn activation kernel to phi (#44365) · 2dfa88d2
  Committed by YuanRisheng on Aug 05, 2022
```
* move mkldnn activation kernel

* fix compile bugs

* fix compile bugs

* deal with conflict

* fix compile bugs

* fix windows compile bugs

* mkldnn unittest fix

* change mutable to alloc

* fix unittest bugs

* modify code according to comment
```
  2dfa88d2
- Add int8 support for matmulV2 (#44908) · f3c14762
  Committed by joanna.wozna.intel on Aug 05, 2022
  
  f3c14762
- refactor xpu tests for squeeze/unsqueeze, *test=kunlun (#44812) · 54d98963
  Committed by zhangxiaoci on Aug 05, 2022
  
  54d98963
04 Aug, 2022 3 commits

Matmuls with activation and elementwise_add fuses (#44655) · 0420d514

Committed by Sławomir Siwek on Aug 04, 2022

* Add unit tests

* matmul_v2 + activation

* matmuls + elementwise_add

* matmul_v2 postops

* transform matmul to v2

* opcompat

* fix fusing matmul with multiple outs

* add shape constraints

* remove unused vars

* change pass order

* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint

* add alpha constraint

* merge matmul refactor

* trigger CI

* - fix

* - another fix

* code style

* add support for matmul+elementwise_add+activation

* code style

* fix bfloat16 bugs

* change append_binary to append_sum

Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>

0420d514

[XPU] add merged_momentum including fp32 and fp16 (#44824) · 4922376c
Committed by dongfangshenzhu on Aug 04, 2022
```
* add merged_momentum *test=kunlun

* add merged_momentum *test=kunlun

* add fp16 to merged_momentum,*test=kunlun
```
4922376c
add xpu garbage collector for standalone executor. (#44572) · 0e26361c

Committed by 王明冬 on Aug 04, 2022

0e26361c

03 Aug, 2022 2 commits
- add sequence_unpad for xpu (#44808) · ed0e95a8
  Committed by z8hanghuan on Aug 03, 2022
```
* add sequence_unpad for xpu,*test=kunlun

* add sequence_unpad, *test=kunlun

* fix bug in testcase,should not be sequence_pad,*test=kunlun
```
  ed0e95a8
- clean class EigenCudaStreamDevice and CudnnWorkspaceHandle in device_context.cc (#44829) · 7eb37a7e
  Committed by Leo Chen on Aug 03, 2022
  
  7eb37a7e
02 Aug, 2022 2 commits

[XPU] fp16 for layer_norm op (#44778) · 4c3e13de
Committed by houj04 on Aug 02, 2022
```
* [XPU] fp16 for layer_norm op. test=kunlun
```
4c3e13de

support beam_search operator on xpu. test=kunlun (#44720) · 9bf80772

Committed by mengqingchun02 on Aug 02, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

9bf80772

01 Aug, 2022 3 commits

unify gpu context (#44740) · 86763023

Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

86763023

GPUGraph merge to develop (#44594) · 798670bb

Committed by danleifeng on Aug 01, 2022

798670bb

[Sparse] optimize sparse attention (#44743) · 1149a378
Committed by zhouweiwei2014 on Aug 01, 2022

1149a378

29 Jul, 2022 7 commits
- unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567
  Committed by Leo Chen on Jul 29, 2022
```
* remove cudaDeviceContext

* remove more template

* fix rocm compile
```
  88490567
- add some fp16 op for kunlun resnet50 model (#44672) · fecbc958
  Committed by QingshuChen on Jul 29, 2022
```
* add some fp16 op for kunlun resnet50 model
*test=kunlun

* tmp
*test=kunlun
```
  fecbc958
- add FLAGS_enable_api_kernel_fallback (#44706) · e439d735
  Committed by Aganlengzi on Jul 29, 2022
```
* add FLAGS_enable_api_kernel_fallback

* deal with more cases

* add ut for coverage
```
  e439d735
- [WIP] Matmul v1 & v2 unification -- part 1 (#44640) · 653885a5
  Committed by Jacek Czaja on Jul 29, 2022
```
* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint
```
  653885a5
- move CUDAStream to phi (#44529) · da3743fd
  Committed by Leo Chen on Jul 29, 2022
```
* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests
```
  da3743fd
- update to sdk2.6.0 (#44673) · 23ad0cc4
  Committed by Allen Guo on Jul 29, 2022
  
  23ad0cc4
- [XPU] add sampling_id op, add top_k op, update xdnn api. test=kunlun (#44704) · e61f48c1
  Committed by houj04 on Jul 29, 2022
  
  e61f48c1
