Commits · e1187e50d5fdaa5325abe4b36f610188652743d2 · Crayon鑫 / Paddle

04 Jan, 2022 · 12 commits
- [XPU] update XPU device info, test=develop (#37884) · e1187e50
  Committed by Qi Li on Jan 04, 2022
- Fix memcpyD2H sync behavior with other stream (#38647) · c0c54ba3
  Committed by Aurelius84 on Jan 04, 2022
```
* Fix memcpyD2H sync behavior with other stream

* add wait

* add wait

* add wait
```
- [Pten]Move CPU_implementation of elementwise kernel in new directory (#38651) · 7c020c71
  Committed by YuanRisheng on Jan 04, 2022
```
* change 'math' to 'math_kernel'

* fix compile bugs

* merge develop

* fix compile bugs

* move cpu_impl of elementwise kernel to new directory
```
- [NPU] add pad and pad_grad (#38658) · 6e9714a2
  Committed by furnace on Jan 04, 2022
```
[NPU] add pad and pad_grad
```
- [fleet_executor] Support multi carriers (#38650) · 2273471d
  Committed by LiYuRio on Jan 04, 2022
- added sqrt bf16 fwd/bwd (#38599) · 2d2609ea
  Committed by jakpiase on Jan 04, 2022
- Modify macro definition to support arm (#38642) · 719f7419
  Committed by zhangkaihuo on Jan 04, 2022
- [Unify Tensors PR #3]Port framework::Tensor members & interfaces to... · dfdc9960
  Committed by Zhanlue Yang on Jan 04, 2022
```
[Unify Tensors PR #3]Port framework::Tensor members & interfaces to pten::DenseTensor, test=allcases (#38473)

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Reverted changes to pten_layout() interface

* Removed friend classes
```
- remove sigmoid cross entropy with logits from kl1 oplist. (#38641) · 30be9317
  Committed by houj04 on Jan 04, 2022
- [PTen] Move inner empty and cast api to kernel.h (#38587) · 64538c8d
  Committed by Chen Weihang on Jan 04, 2022
```
* move inner cast api to cast_kernel.h

* resolve conflict
```
- heter context support dynamic mf dim (#38487) · 59888bba
  Committed by yaoxuefeng on Jan 04, 2022
```
heter context support dynamic mf dim
```
- plugin terminate should be called by TensorRT (#38374) · ba411960
  Committed by zlsh80826 on Jan 04, 2022
31 Dec, 2021 · 16 commits
- [XPU]add split op for kunlun2,*test=kunlun (#38277) · 26b845e2
  Committed by Zhangjingyu06 on Dec 31, 2021
```
* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun,*test=kunlun
Co-authored-by: QingshuChen <chenqingshu@baidu.com>
```
- [new API] add paddle.kthvalue and paddle.Tensor.kthvalue (#38386) · 538b5721
  Committed by JYChen on Dec 31, 2021
```
* add new api/op kthvalue

* kthvalue cuda kernel to cub sorting

* fix example code error

* throw errors instead of LOG in cuda sort

* throw errors by PADDLE_ENFORCE
```
- add mul_gru_fuse_pass ut (#37772) · bc827307
  Committed by baoachun on Dec 31, 2021
```
* add mul_gru_fuse_pass ut

* update ut

* update ut

* update ut timeout setting

* update ut
```
- Fix for MKLDNNDeviceContext error in matmul_v2_transpose_reshape fuse pass when GLOG_v set (#38554) · 1d31764e
  Committed by jakpiase on Dec 31, 2021
```
* glog fix

* changed approach
```
- Fix for undefined format for 6 dim tensor (#38553) · 730ccd9e
  Committed by jakpiase on Dec 31, 2021
```
* 6 dims fix

* removed limitations of max dims
```
- [Pten]Fix bugs of compilation when use pten::add/subtract (#38631) · 31efec53
  Committed by YuanRisheng on Dec 31, 2021
```
* change 'math' to 'math_kernel'

* fix compile bugs

* merge develop

* fix compile bugs

* fix compile bugs
```
- fix compile error for fleetwrapper with -DWITH_TESTING=ON (#38603) · 761055f0
  Committed by zmxdream on Dec 31, 2021
- [MLU]support calling mlu op from python interface (#38292) · b6bf650a
  Committed by fwenguang on Dec 31, 2021
```
* [MLU]support calling mlu op from python interface

* [MLU]fix

* fix

* [mlu]fix mlu_places

* [mlu]fix required mlu

* fix

* [MLU]fix tensor copy

* [mlu] fix MLUPlace call path
```
- fix python ascend run error. (#38605) · 1df354e7
  Committed by Wilber on Dec 31, 2021
- add new API paddle.linalg.lu/lu_unpack (#38617) · 2ce91c33
  Committed by zhiboniu on Dec 31, 2021
- Add fold operators (#38613) · 8898dce1
  Committed by xiaoting on Dec 31, 2021
```
* add fold operators, test=develop

* add fold operators, test=develop

* add fold operators, test=develop

* update fold op error test, test=develop

* fix unittest, test=develop

* fix unittest, test=develop
```
- Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor (#38607) · 5a6a2d27
  Committed by Zhanlue Yang on Dec 31, 2021
- Put_along_axis (based on PR #37921 by Xu Huang) (#38608) · f147fc99
  Committed by Huihuang Zheng on Dec 31, 2021
```
Paddle new APIs: put_along_axis.

Xu Huang is on holiday so we created this PR to work on it. It is based on his PR: https://github.com/PaddlePaddle/Paddle/pull/37921
```
- add lu_op backward (#38616) · a1275c8b
  Committed by zhiboniu on Dec 31, 2021
- [PTen] Unify data layout of pten and fluid (#38583) · 8d32cef8
  Committed by Chen Weihang on Dec 31, 2021
```
* unify data layout

* fix test_transfer_layout error
```
- [Pten]Move math to new directory and change 'math' to 'math_kernel' (#38604) · e76087ad
  Committed by YuanRisheng on Dec 31, 2021
```
* change 'math' to 'math_kernel'

* fix compile bugs

* merge develop

* fix compile bugs
```
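The `paddle.kthvalue` commit (#38386) listed above adds an op that picks the k-th value along an axis and returns it with its index. The commit log does not spell out the semantics, so the following is only a pure-Python sketch of the idea, not Paddle's implementation. It assumes a 1-based `k`, smallest-first ordering, and reduction over the last axis of a 2-D input; the helper name `kthvalue_last_axis` and the tie-breaking rule (earliest index wins) are hypothetical choices made for this illustration.

```python
from typing import List, Sequence, Tuple


def kthvalue_last_axis(
    rows: Sequence[Sequence[float]], k: int
) -> Tuple[List[float], List[int]]:
    """For each row, return the k-th smallest value (1-based k) and its index.

    Hypothetical helper for illustration only; it is not part of Paddle.
    """
    values: List[float] = []
    indices: List[int] = []
    for row in rows:
        if not 1 <= k <= len(row):
            raise ValueError(f"k must be in [1, {len(row)}], got {k}")
        # Sort positions by the value they hold. Python's sort is stable,
        # so equal values keep their original left-to-right order
        # (an assumption of this sketch, not a documented Paddle guarantee).
        order = sorted(range(len(row)), key=lambda i: row[i])
        idx = order[k - 1]
        values.append(row[idx])
        indices.append(idx)
    return values, indices


values, indices = kthvalue_last_axis([[3.0, 1.0, 2.0], [9.0, 7.0, 8.0]], k=2)
print(values, indices)  # [2.0, 8.0] [2, 2]
```

A full sort per row is the simplest way to show the behavior. The commit body mentions a CUDA kernel based on cub sorting, which suggests the real op also sorts, but that detail is not confirmed here.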
30 Dec, 2021 · 12 commits

- add OP lu forward (#38559) · 4e21457d
  Committed by zhiboniu on Dec 30, 2021
```
LGTM
```
- add sigmoid_cross_entropy_with_logits to kl1 (#38586) · 790cadd1
  Committed by houj04 on Dec 30, 2021

* add sigmoid cross entropy with logits to kl1. test=kunlun

* add sigmoid cross entropy with logits to kl1. test=kunlun

- Add exp, abs_grad, reciprocal, reciprocal_grad operator for XPU and update... · ceec1e21
  Committed by zhangyk0314 on Dec 30, 2021
```
Add exp, abs_grad, reciprocal, reciprocal_grad operator for XPU and update xpu2_op_list.h,test=kunlun (#38570)
```
- Refactor cpu_quantize_pass (#38019) · 1fa6900e
  Committed by joanna.wozna.intel on Dec 30, 2021

- flags to choose kp kernel (#38455) · ed2cfecf
  Committed by Feng Xing on Dec 30, 2021

This PR adds a runtime flag, run_kp_kernel, that selects which op implementation runs on xpu2. There are two choices: the dynamically linked one and the one built from kp.

- [New API] add new api paddle.mode and paddle.Tensor.mode (#38446) · 3777779b
  Committed by JYChen on Dec 30, 2021
```
* add new OP mode

* rename trans-variable name and fix UT
```
- [Auto parallel] Make sure the id semantics of every var and op unique (#38132) · 5620214e
  Committed by Yulong Ao on Dec 30, 2021
```
* [Auto parallel] Make the id of var and op unique

* [Auto Parallel] Rename back dist_context to distop_context
```
- Add cpu kernel of new api : lstsq (#38585) · ccf99b66
  Committed by Haohongxiang on Dec 30, 2021

* add cpu kernel of lstsq

* update

* modify code style

* modify unittest

* remove support for complex

- Add cusparse and unittest (#38431) · 667dc9f0
  Committed by zhangkaihuo on Dec 30, 2021

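Bind the cuSparse handle to DeviceContext so that ops do not have to create and destroy it.
Add wrappers for the cuSparse APIs that convert between dense and sparse formats.
Add unit tests for the wrapped APIs.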

- [Fleet Executor] Support multi carrier (#38535) · 3658405c
  Committed by LiYuRio on Dec 30, 2021

- Support test imperative basic with fixed retain grad interface (#38548) · 2421a25a
  Committed by Jiabin Yang on Dec 30, 2021

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* support inference test

* refine test and fix initializer failed

* support create varbase and fix retain grad error

* fix windows error

* support test_imperative_basic test in eager mode

* remove additional log in variable.h

* remove additional log in variable.h

* remove additional code create in merge
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: Wang Huan <wanghuan29@baidu.com>

- dynamic shape clone (#38520) · 339c34e6
  Committed by wenbin on Dec 30, 2021
```
* dynamic shape clone supported
```
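The `paddle.mode` commit (#38446) listed above adds an op that finds the most frequent value along an axis together with an index where it occurs. As an illustration of that idea only, here is a small pure-Python sketch over the last axis of a 2-D input. The helper name `mode_last_axis` is hypothetical, and both tie-breaking rules (smallest value among equally frequent ones, last matching index) are assumptions of this sketch rather than behavior confirmed by the commit log.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def mode_last_axis(rows: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """For each row, return the most frequent value and one index holding it.

    Hypothetical helper for illustration only; it is not part of Paddle.
    """
    values: List[int] = []
    indices: List[int] = []
    for row in rows:
        items = list(row)
        if not items:
            raise ValueError("each row must contain at least one element")
        counts = Counter(items)
        best = max(counts.values())
        # Assumption: among equally frequent values, pick the smallest.
        value = min(v for v, c in counts.items() if c == best)
        # Assumption: report the last position at which that value occurs.
        index = len(items) - 1 - items[::-1].index(value)
        values.append(value)
        indices.append(index)
    return values, indices


values, indices = mode_last_axis([[2, 2, 3], [1, 5, 5]])
print(values, indices)  # [2, 5] [1, 2]
```

Counting with `Counter` keeps the sketch short. A tensor library would more likely sort along the axis and scan for the longest run, which is not shown here.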
