Commits · 2dfa88d2526953bec87507d87402c4038dc49259 · BaiXuePrincess / Paddle

05 Aug 2022 · 3 commits
- [MKLDNN]Move mkldnn activation kernel to phi (#44365) · 2dfa88d2
  Committed by YuanRisheng on Aug 05, 2022
```
* move mkldnn activation kernel

* fix compile bugs

* fix compile bugs

* deal with conflict

* fix compile bugs

* fix windows compile bugs

* mkldnn unittest fix

* change mutable to alloc

* fix unittest bugs

* modify code according comment
```
- Add int8 support for matmulV2 (#44908) · f3c14762
  Committed by joanna.wozna.intel on Aug 05, 2022
- refactor xpu tests for squeeze/unsqueeze, *test=kunlun (#44812) · 54d98963
  Committed by zhangxiaoci on Aug 05, 2022
04 Aug 2022 · 3 commits

Matmuls with activation and elementwise_add fuses (#44655) · 0420d514

Committed by Sławomir Siwek on Aug 04, 2022

* Add unit tests

* matmul_v2 + activation

* matmuls + elementwise_add

* matmul_v2 postops

* transform matmul to v2

* opcompat

* fix fusing matmul with multipe outs

* add shape constraints

* remove unused vars

* change pass order

* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint

* add alpha constraint

* merge matmul refactor

* trigger CI

* - fix

* - another fix

* code style

* add support for matmul+elementwise_add+activation

* code style

* fix bfloat16 bugs

* change append_binary to append_sum
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>


[XPU] add merged_momentum including fp32 and fp16 (#44824) · 4922376c
Committed by dongfangshenzhu on Aug 04, 2022
```
* add merged_momentum *test=kunlun

* add merged_momentum *test=kunlun

* add fp16 to merged_momentum,*test=kunlun
```

add xpu garbage collector for standalone executor. (#44572) · 0e26361c
Committed by 王明冬 on Aug 04, 2022


03 Aug 2022 · 2 commits
- add sequence_unpad for xpu (#44808) · ed0e95a8
  Committed by z8hanghuan on Aug 03, 2022
```
* add sequence_unpad for xpu,*test=kunlun

* add sequence_unpad, *test=kunlun

* fix bug in testcase,should not be sequence_pad,*test=kunlun
```
- clean class EigenCudaStreamDevice and CudnnWorkspaceHandle in device_context.cc (#44829) · 7eb37a7e
  Committed by Leo Chen on Aug 03, 2022
02 Aug 2022 · 2 commits

[XPU] fp16 for layer_norm op (#44778) · 4c3e13de
Committed by houj04 on Aug 02, 2022
```
* [XPU] fp16 for layer_norm op. test=kunlun
```

support beam_search operator on xpu. test=kunlun (#44720) · 9bf80772

Committed by mengqingchun02 on Aug 02, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun


01 Aug 2022 · 3 commits

unify gpu context (#44740) · 86763023

Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes


GPUGraph merge to develop (#44594) · 798670bb

Committed by danleifeng on Aug 01, 2022


[Sparse] optimize sparse attention (#44743) · 1149a378
Committed by zhouweiwei2014 on Aug 01, 2022


29 Jul 2022 · 7 commits
- unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567
  Committed by Leo Chen on Jul 29, 2022
```
* remove cudaDeviceContext

* remove more template

* fix rocm compile
```
- add some fp16 op for kunlun resnet50 model (#44672) · fecbc958
  Committed by QingshuChen on Jul 29, 2022
```
* add some fp16 op for kunlun resnet50 model
*test=kunlun

* tmp
*test=kunlun
```
- add FLAGS_enable_api_kernel_fallback (#44706) · e439d735
  Committed by Aganlengzi on Jul 29, 2022
```
* add FLAGS_enable_api_kernel_fallback

* deal with more cases

* add ut for coverage
```
- [WIP] Matmul v1 & v2 unification -- part 1 (#44640) · 653885a5
  Committed by Jacek Czaja on Jul 29, 2022
```
* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint
```
- move CUDAStream to phi (#44529) · da3743fd
  Committed by Leo Chen on Jul 29, 2022
```
* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests
```
- update to sdk2.6.0 (#44673) · 23ad0cc4
  Committed by Allen Guo on Jul 29, 2022
- [XPU] add sampling_id op, add top_k op, update xdnn api. test=kunlun (#44704) · e61f48c1
  Committed by houj04 on Jul 29, 2022
28 Jul 2022 · 4 commits
- delete elementwise pow in xpu_kp_list (#44661) · dfeb1942
  Committed by niuliling123 on Jul 28, 2022
- support log_grad op, *test=kunlun (#44662) · 067107ad
  Committed by z8hanghuan on Jul 28, 2022
- Complete the dtypes for all_gather, add all_gather_object api (#44417) · d4cf02bc
  Committed by LiYuRio on Jul 28, 2022
- [XPU] add top_k op (#44656) · acf07c74
  Committed by houj04 on Jul 28, 2022
```
* [XPU] add top_k op. test=kunlun

* [XPU] add top_k op. test=kunlun

* use PADDLE_ENFORCE_XDNN_NOT_NULL to check pointer. test=kunlun
```
27 Jul 2022 · 3 commits

[IPU] add more loss ops (#44646) · 8bf7cd85

Committed by Allen Guo on Jul 27, 2022

* add more loss ops

* add authors
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>


[DCU] Fix NAN problem when training BERT on DUC platform (#44643) · 28aa0c61
Committed by Yuang Liu on Jul 27, 2022


[IPU] small bug fix (#44473) · 42d58ddd

Committed by Allen Guo on Jul 27, 2022

* sync misc changes

* add authors
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>

* up x

* Revert "up x"

This reverts commit f3fde458c6cc48613269a643cfe2acf689caccd3.

* add guarg for ipu
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>


26 Jul 2022 · 2 commits
- fix record event for operator type in new dygraph (#44582) · 963163e6
  Committed by chenjian on Jul 26, 2022
```
* fix new dygraph record event for op

* update unit test
```
- [MLU] rollback cntoolkit vetsion to 2.8.5 (#44595) · 356ff436
  Committed by fwenguang on Jul 26, 2022
22 Jul 2022 · 3 commits
- add xpu lars_momentum/pow2_decay (#44448) · 8ccbb863
  Committed by QingshuChen on Jul 22, 2022
```
*test=kunlun
```
- Add code of occupancy computing on DCU and avoid threadID bug for DCU profiler (#44520) · 8037901b
  Committed by yuguo on Jul 22, 2022
- [MLU] add floor kernel and grid_sampler kernel (#44498) · 1c0120e2
  Committed by fwenguang on Jul 22, 2022
21 Jul 2022 · 1 commit

[JitLayer]Pybind PEFunction and call phi api in layer_test (#44465) · a0bccd9e

Committed by WangZhen on Jul 21, 2022

* Support predictor function in JitLayer

* Pybind PEFunction

* Pybind PEFunction and call phi api in layer_test

* Call sqrt phi API

* Polish flags

* Fix comments


20 Jul 2022 · 1 commit
- [IPU] Add more Ops (#44414) · 7daae985
  Committed by yaozhixin on Jul 20, 2022
```
* [IPU] Add more Ops

* update boost API
```
19 Jul 2022 · 3 commits
- compile phi/backends into one static library (#44373) · 1047cb17
  Committed by Leo Chen on Jul 19, 2022
```
* compile into one static library

* fix xpu compile

* fix xpu compile

* fix inference compile

* fix inference compile

* add custom test

* revert one file
```
- Rename BOOST_GET macros (#44368) · 4b085c57
  Committed by Ruibiao Chen on Jul 19, 2022
```
* Rename BOOST_GET macros

* Fix conflicts
```
- update (#44418) · d5f0ed4b
  Committed by Wilber on Jul 19, 2022
18 Jul 2022 · 3 commits
- [Sparse] Add sparse matmul kernel(coo*dense->dense) (#44346) · 3f70b1d3
  Committed by zhouweiwei2014 on Jul 18, 2022
- add ipu support for standalone executor. (#44342) · fbedf77e
  Committed by 王明冬 on Jul 18, 2022
- add xpu resnet_unit (#44297) · 02e9453f
  Committed by QingshuChen on Jul 18, 2022
```
* add xpu resnet_unit
*test=kunlun

* tmp
*test=kunlun
```

BaiXuePrincess / Paddle is even with its upstream (forked-from) repository.
