Commits · 1dbc863202f1229ac9f630586a2ccf785c74aee4 · PaddlePaddle / Paddle

15 Jan, 2022 1 commit

[Unify Tensors PR ] Merged LoDTensor with Tensor, test=allcases (#38880) · 88966b28

Committed by Zhanlue Yang on Jan 15, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Fixed example code failure

* Polished function names, removed duplicated forward declarations

88966b28

14 Jan, 2022 1 commit

[XPU]add stack_grad op for kunlun2,*test=kunlun (#38674) · 87ee3e4f

Committed by Zhangjingyu06 on Jan 14, 2022

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun,*test=kunlun

* [XPU]add stack_grad op for kunlun2,*test=kunlun
Co-authored-by: QingshuChen <chenqingshu@baidu.com>

87ee3e4f

13 Jan, 2022 2 commits

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

fc6eed5b

splits allocation for pten, test=develop (#38853) · 277cf900

Committed by 石晓伟 on Jan 13, 2022

277cf900

12 Jan, 2022 3 commits

[IPU] add more ops (#38831) · 050fd168

Committed by Allen Guo on Jan 12, 2022

* support more ops

* Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* update date
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

050fd168

Adjust warpper of gpu_lanuch_config (#38654) · f5166284

Committed by limingshu on Jan 12, 2022

* first commit

* fix wrong filename

* fix the wrong spell name

* fix gpu config warper

* modify according to pr advices

* fix GpuLauchConfig1D api bugs

* change the config for dropout grad

* fix bugs

* modification according to pr advices

* modification according to pr advices

f5166284

Os info (#38779) · 0d8d1e0e

Committed by liutiexing on Jan 12, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* os_info update

* update

* update

* update

* update

* update

* fix

* update

* update for windows

* fix windows

* update

* update
Co-authored-by: liutiexing <liutiexing@google.com>

0d8d1e0e

11 Jan, 2022 1 commit

Modified Kernel Primitive API and elementwise for xpu2 #38688 · 3eaf8d2c

Committed by niuliling123 on Jan 11, 2022

3eaf8d2c

10 Jan, 2022 4 commits

Add gpu kernel for new api : linalg.lstsq (#38621) · 405103d8

Committed by Haohongxiang on Jan 10, 2022

* add lstsq gpu kernel

* update

* add docs_en

* modify ut

* fix bugs

* modify example in docs_en

* remove lstsq_op.cu from ROCM cmake

* modify docs_en

* modify docs_en

* modify docs_en

* remove unneccessary TensorCopy

405103d8

Profiler skeleton (#38826) · a8afed69

Committed by liutiexing on Jan 10, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* profiler skeleton

* update

* update

* update
Co-authored-by: liutiexing <liutiexing@google.com>

a8afed69

1.fix elementwise_add_grad bug. 2. add dropout kernel in kl2 (#38726) · 7b860a23

Committed by taixiurong on Jan 10, 2022

7b860a23

[Unify Tensors PR ] framework::Tensor inherits from DenseTensor,test=allcases (#38632) · 5c73a6ea

Committed by Zhanlue Yang on Jan 10, 2022

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes too pten_layout() interface

* Removed friend classes

* Rearranged cfunction calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues

5c73a6ea

05 Jan, 2022 1 commit

add huber_loss for kunlun (#38589) · a268c7ce

Committed by TTerror on Jan 05, 2022

* add huber_loss for kunlun

* update xpu.cmake

* update unitests

* update unitests

* update elementwise_add

* update elementwise_add

* update elementwise_add

a268c7ce

04 Jan, 2022 3 commits

[XPU] update XPU device info, test=develop (#37884) · e1187e50

Committed by Qi Li on Jan 04, 2022

e1187e50

Modify macro definition to support arm (#38642) · 719f7419

Committed by zhangkaihuo on Jan 04, 2022

719f7419

remove sigmoid cross entropy with logits from kl1 oplist. (#38641) · 30be9317

Committed by houj04 on Jan 04, 2022

30be9317

31 Dec, 2021 3 commits

[XPU]add split op for kunlun2,*test=kunlun (#38277) · 26b845e2

Committed by Zhangjingyu06 on Dec 31, 2021

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun,*test=kunlun
Co-authored-by: QingshuChen <chenqingshu@baidu.com>

26b845e2

Fix for undefined format for 6 dim tensor (#38553) · 730ccd9e

Committed by jakpiase on Dec 31, 2021

* 6 dims fix

* removed limitations of max dims

730ccd9e

[PTen] Unify data layout of pten and fluid (#38583) · 8d32cef8

Committed by Chen Weihang on Dec 31, 2021

* unify data layout

* fix test_transfer_layout error

8d32cef8

30 Dec, 2021 7 commits

add OP lu forward (#38559) · 4e21457d

Committed by zhiboniu on Dec 30, 2021

LGTM

4e21457d

add sigmoid_cross_entropy_with_logits to kl1 (#38586) · 790cadd1

Committed by houj04 on Dec 30, 2021

* add sigmoid cross entropy with logits to kl1. test=kunlun

* add sigmoid cross entropy with logits to kl1. test=kunlun

790cadd1

Add exp, abs_grad, reciprocal, reciprocal_grad operator for XPU and update... · ceec1e21

Committed by zhangyk0314 on Dec 30, 2021

Add exp, abs_grad, reciprocal, reciprocal_grad operator for XPU and update xpu2_op_list.h,test=kunlun (#38570)

ceec1e21

flags to choose kp kernel (#38455) · ed2cfecf

Committed by Feng Xing on Dec 30, 2021

This PR adds a runtime flag, run_kp_kernel, which selects which op implementation runs on xpu2. There are two: dynamically linked and built from kp.

ed2cfecf

Add cpu kernel of new api : lstsq (#38585) · ccf99b66

Committed by Haohongxiang on Dec 30, 2021

* add cpu kernel of lstsq

* update

* modify code style

* modify unittest

* remove support for complex

ccf99b66

Add cusparse and unittest (#38431) · 667dc9f0

Committed by zhangkaihuo on Dec 30, 2021

Bind the cuSparse handle to DeviceContext so that ops do not create and destroy it
Wrap the cuSparse APIs for converting between dense and sparse
Add unit tests for the wrapped APIs

667dc9f0

Fix the bug of batch_norm and batch_norm_grad op. (#38288) · cc83c95f

Committed by Leo Guo on Dec 30, 2021

* Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align" and "roi_align_grad" op in xpu2 op list.

* Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align" and "roi_align_grad" op in xpu2 op list. test=kunlun
Co-authored-by: Zibin <guozibin@baidu.com>

cc83c95f

29 Dec, 2021 4 commits

Make profiler better (#38280) · 851637fd

Committed by liutiexing on Dec 29, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update OS info

* split host_event_recorder

* split host_event_recorder

* update

* update

* update

* update

* update

* update

* update
Co-authored-by: liutiexing <liutiexing@google.com>

851637fd

add top k v2 operator, test=kunlun (#38434) · d22f92ad

Committed by ykkk2333 on Dec 29, 2021

d22f92ad

add argsort/scatter for kunlun (#38345) · 4643baa7

Committed by TTerror on Dec 29, 2021

* add argsort/scatter for kunlun

* update test_scatter

* update xpu.cmake

* update xpu.cmake

* fix scatter

4643baa7

add nccl func of NCCL 2.11 (#38519) · 4853ab0a

Committed by sneaxiy on Dec 29, 2021

4853ab0a

28 Dec, 2021 1 commit

add reduce_prod_xpu. fix reduce_mean_xpu bug. (#38481) · 78836bb7

Committed by houj04 on Dec 28, 2021

* add reduce_prod_xpu. fix reduce_mean_xpu bug.

* iadd reduce_prod_xpu. fix reduce_mean_xpu bug. test=kunlun

78836bb7

27 Dec, 2021 2 commits

add device-agnostic stream class (#38391) · 6b5e33b4

Committed by Leo Chen on Dec 27, 2021

* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile

6b5e33b4

refine float16 implementation (#38439) · 78375990

Committed by sneaxiy on Dec 27, 2021

78375990

24 Dec, 2021 1 commit

Add new API cholesky_solve (#38167) · 39f7c41f

Committed by zhiboniu on Dec 24, 2021

39f7c41f

23 Dec, 2021 3 commits

Make GetBlob assuming elements are cached (#38336) · 7da5368d

Committed by Jacek Czaja on Dec 23, 2021

* First set of fixes

* - Make more likely to GetBlob find a blobs

* - Lint

7da5368d

Support external stream. (#38373) · 15ad7ee4

Committed by Wilber on Dec 23, 2021

* support external stream.

* update

* update

* update

15ad7ee4

add-leaky-relu-to-xpu2-op-list (#38366) · b7bafee8

Committed by houj04 on Dec 23, 2021

b7bafee8

20 Dec, 2021 1 commit

[MLU]add mlu backend (#38207) · 76514a1f

Committed by fwenguang on Dec 20, 2021

76514a1f

17 Dec, 2021 2 commits

Get base pointer from Allocation (#37978) · 431a2d6a

Committed by From00 on Dec 17, 2021

* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy

431a2d6a

update xpu1 op list, for train ResNet50 using PaddleClas. (#38201) · 3a0e0b6f

Committed by houj04 on Dec 17, 2021

3a0e0b6f

PaddlePaddle / Paddle synced successfully about 1 year ago