Commits · 1a8786cf051cd7b70711e22ee1ca1e5c0d1b11e4 · PaddlePaddle / Paddle

Nov 23, 2021 · 5 commits
- Add support for bias being None in fused_attention op. (#37411) · 1a8786cf
  Committed by Li Min on Nov 23, 2021
```
Add support for bias being None in fused_attention op.
```
  1a8786cf
- Enhance the error message of scatter op (#37429) · 11b17c88
  Committed by sneaxiy on Nov 23, 2021
```
* enhance scatter err msg check

* fix ci error
```
  11b17c88
- [PTen]Elementwise_div Kernel Refactor (#37418) · 32d9beef
  Committed by YuanRisheng on Nov 23, 2021
```
* elementwise_div refactor

* fix compile bugs in windows ci
```
  32d9beef
- [NPU] Added HCCL backend support in dygraph mode (#36285) · 83e55cff
  Committed by ronnywang on Nov 23, 2021
```
* Added HCCL backend support in dynamic graph mode

* fix segmentation fault

* add ut
```
  83e55cff
- [NewExe] Support layout/dtype transform by adding transfer_layout/transfer_dtype op (#37299) · 2a1f009e
  Committed by Aurelius84 on Nov 23, 2021
```
* Add transfer_layout/dtype op

* clean useless codes

* fix unused var

* add optest in white.txt

* split into data_transfer.cc

* fix cmake

* modify according to reviewer comments

* replace cast_op with transfer_dtype_op
```
  2a1f009e
Nov 22, 2021 · 6 commits

disable copying of datatype when sharing buffer between two tensors. (#37247) · 9ec1432d

Committed by Feiyu Chan on Nov 22, 2021

* disable copying of datatype when sharing buffer between two tensors.
* fix for mkldnn operator kernels (elementwise_add, sum, softplus, softmax, scale, activation): manually set the data type when reusing memory via ShareBufferWith.

9ec1432d

Add isclose op (#37135) · d2200e97

Committed by andyjpaddle on Nov 22, 2021

* add isclose op, test=develop

* add isclose op, test=develop

* add isclose api, test=develop

* rm useless code

* rm useless code

* update python api of isclose

* add some unittest of isclose op, test=develop

d2200e97
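For readers unfamiliar with this kind of op, a closeness test is conventionally defined as |x - y| <= atol + rtol * |y|. The sketch below illustrates that convention in plain Python. It is not the Paddle kernel: the helper name `isclose_ref`, the default tolerances, and the `equal_nan` flag are assumptions chosen to mirror the usual NumPy-style semantics.

```python
import math


def isclose_ref(x, y, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Elementwise closeness test over two equal-length sequences of floats.

    Hypothetical reference helper; tolerances and flags are assumed defaults.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    result = []
    for a, b in zip(x, y):
        if math.isnan(a) or math.isnan(b):
            # NaN only matches NaN, and only when equal_nan is requested.
            result.append(equal_nan and math.isnan(a) and math.isnan(b))
        elif a == b:
            # Exact equality also covers matching infinities.
            result.append(True)
        elif math.isinf(a) or math.isinf(b):
            result.append(False)
        else:
            result.append(abs(a - b) <= atol + rtol * abs(b))
    return result
```

Note that the test is asymmetric in `x` and `y`, because the relative tolerance is scaled by `|y|` only.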

elu support alpha < 0 (#37316) · e3503de8
Committed by zhupengyang on Nov 22, 2021

e3503de8
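As background, ELU is commonly defined as x for x > 0 and alpha * (exp(x) - 1) otherwise. The sketch below is a plain-Python illustration of that forward definition, with the hypothetical helper name `elu_ref`. It only shows why a negative alpha is a distinct case (negative inputs then map to positive outputs) and makes no claim about what the commit changed internally.

```python
import math


def elu_ref(x, alpha=1.0):
    """Forward ELU for a single float; alpha may be negative."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```

With alpha < 0 the sign of the output no longer identifies the sign of the input, so any code that infers the branch from the output value alone would need care.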
Support zero value in dimension for slice (#37313) · e788c7b5
Committed by zyfncg on Nov 22, 2021
```
* support zero dim for slice op

* support zero dim Tensor in set_value op

* polish some debug log
```
e788c7b5
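To illustrate what a zero-sized dimension means for slicing, here is a minimal shape-inference sketch in plain Python. The function name `slice_out_shape` and its argument layout are hypothetical, a step of 1 is assumed, and the clamping rules follow ordinary Python slicing rather than any Paddle source.

```python
def slice_out_shape(shape, axes, starts, ends):
    """Return the output shape of a step-1 slice; dimensions may become 0."""
    out = list(shape)
    for axis, start, end in zip(axes, starts, ends):
        dim = shape[axis]
        # Negative indices count from the end of the axis.
        if start < 0:
            start += dim
        if end < 0:
            end += dim
        # Clamp both bounds into [0, dim].
        start = min(max(start, 0), dim)
        end = min(max(end, 0), dim)
        # An empty or inverted range yields a zero-sized dimension.
        out[axis] = max(end - start, 0)
    return out
```

For example, slicing rows 1 to 1 of a 3x4 input yields shape [0, 4], and slicing an input that already has a zero-sized dimension keeps that dimension at 0.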

[PTen] Add variable transform to/from ptenTensor and add cast kernel (#36916) · 5caa6fc5

Committed by chentianyu03 on Nov 22, 2021

* add cast kernel

* add cast cuda kernel

* add cast kernel

* make cast kernel output dtype undefined

* get cast dtype from vardesc

* move cast to manipulation and add test case

* add castinfershape

* avoid reinitializing variable

* InitializeVariable support datatype

* merge develop branch

* fix merge bug

* revert modify initializeVariable

* revert modify on InitializeVariable

* revert modify on InitializeVariable

* mutable support reset dtype

* enable make pten tensor from variable when def_arg.type is undefined

* fix build pten ctx start_idx error

* copy pten out tensor to variable

* merge develop branch

* fix non pten kernel cast failed

* add reset allocation place for remake tensor

* fix inplace realloc error

* add mutable on pten kernels and remove unused cast files

* rename function names

* fix output type error

* fix conflict with develop branch

* set data type to variable with pten's dtype

* fix test_cast_api type mismatch

* DenseTensor mutable_data support 0 bytes value

* fix the inplace bug of reshape kernel

* fix pten.backend != variable.place when moving storage, place mismatch bug

* fix conflict with develop branch

* Fix bug of paddle::experimental::MovesStorage

* fix ReMakePtenDenseTensor place mismatch bug

* Revert "fix ReMakePtenDenseTensor place mismatch bug"

This reverts commit 86336032f60b8a15eacd2c1ff2fa513f5d8dfd1a.

* fix ReMakePtenDenseTensor place mismatch bug

* reverts the set_lod interface, test=develop

* modify according to review comments

* modify error message

* add & for const input arguments

* add reference in params

* elementwise_sub add mutable_data

* fix ResetHolderWithType check size bug

* add dependence pten_tensor to test_cast_api object

* remove unused code to pass ci coverage
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

5caa6fc5

[new feature] add local scope for interpretercore (#37379) · 1f0512be
Committed by Leo Chen on Nov 22, 2021

1f0512be

Nov 19, 2021 · 6 commits

bug fix shard_index (#37042) · b505ff96
Committed by lilong12 on Nov 19, 2021

b505ff96
Optimize cinn_cache_key by replacing GraphToProgram with a Dot string (#37317) · edc3496f
Committed by jiangcheng on Nov 19, 2021
```
* optimize cache-key by replacing GraphToProgram with a Dot string

* fix compile failure bug
```
edc3496f
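The general idea of keying a cache on a textual graph dump can be sketched as follows. This is an illustration only: the functions `graph_to_dot` and `cache_key`, the node and edge representation, and the use of SHA-256 are all assumptions, and the real cache key may incorporate other information such as input shapes.

```python
import hashlib


def graph_to_dot(nodes, edges):
    """Serialize a graph to a DOT string in a deterministic (sorted) order."""
    lines = ["digraph G {"]
    for name in sorted(nodes):
        lines.append(f'  "{name}";')
    for src, dst in sorted(edges):
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


def cache_key(nodes, edges):
    """Hash the DOT string so structurally identical graphs share a key."""
    dot = graph_to_dot(nodes, edges)
    return hashlib.sha256(dot.encode("utf-8")).hexdigest()
```

Sorting before serialization matters here: without it, two traversals of the same graph could emit different strings and miss the cache.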

Add fuse_resnet_unit pass (#36818) · 3cd3bf29

Committed by wuhuanzhou on Nov 19, 2021

* GeneratePass support attr condition and mapping, test=develop

* fix coverage, test=develop

* Add fuse_resnet_unit pass, test=develop

* fix CI errors, test=develop

* fix CI errors, test=develop

* fix unittest error when compiling without CUDA, test=develop

* fix static ci error, test=develop

* limit kernel size must equal 1, test=develop

3cd3bf29

fix for cufft: some early versions of cufft do not define CUFFT_VERSION in the header (#37312) · d8191d06
Committed by Feiyu Chan on Nov 19, 2021

d8191d06

Add paddle.incubate.graph_send_recv API (#37205) · 39012536

Committed by Siming Dai on Nov 19, 2021

* add cpu version, using set: sum, min, max

* add cpu version: mean

* improve cpu code and fix dynamic memory allocation problem

* fix arg error, add index judge, delete fp16

* fix bug in CudaAtomicMax and CudaAtomicMin

* add CUDA version

* fix grad_op bug for index

* add op test, add correct cpu grad op

* Add correct CUDA Mean grad

* [Add] Successful MEAN and SUM

* [Add] Successful MIN and MAX in CPU

* [Add] Successful MIN and MAX in CUDA

* fix windows dtype ci

* fix ROCM ci by adding HIP flag

* rename fused_gather_scatter to send_recv

* unify name as send and recv

* change zero index return time

* add send_recv incubate api

* fix index data type, add unittest case for API

* delete redundant input tensor

* fix en example and docs, add default value in pool_type

* add shape judge and max grid judge

* fix comment

* fix index type bug

* add const &

* fix en docs

* delete numpy in examples

* add unittest for int input

* fix send_recv comment

* change send_recv to graph_send_recv

39012536
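The commit messages above describe a gather-then-scatter operation with sum, mean, min, and max pooling. A plain-Python reference sketch of that idea follows. It is not the Paddle implementation: the name `graph_send_recv_ref` is hypothetical, and it assumes that the output has as many rows as `x` and that rows receiving no message are filled with zeros.

```python
def graph_send_recv_ref(x, src_index, dst_index, pool_type="sum"):
    """Gather rows x[src] along each edge and reduce them into row dst.

    x is a list of equal-length rows; src_index/dst_index describe edges.
    """
    if pool_type not in ("sum", "mean", "min", "max"):
        raise ValueError(f"unsupported pool_type: {pool_type}")
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")

    n_cols = len(x[0]) if x else 0
    # Collect the messages that arrive at each destination row.
    inbox = [[] for _ in range(len(x))]
    for src, dst in zip(src_index, dst_index):
        inbox[dst].append(x[src])

    out = []
    for rows in inbox:
        if not rows:
            out.append([0] * n_cols)  # assumed fill value for empty rows
        elif pool_type == "sum":
            out.append([sum(col) for col in zip(*rows)])
        elif pool_type == "mean":
            out.append([sum(col) / len(rows) for col in zip(*rows)])
        elif pool_type == "min":
            out.append([min(col) for col in zip(*rows)])
        else:
            out.append([max(col) for col in zip(*rows)])
    return out
```

For instance, with `x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]`, `src_index = [0, 1, 2, 0]`, and `dst_index = [1, 2, 1, 0]`, row 1 receives rows 0 and 2 of `x`, so sum pooling gives `[2, 8, 10]` for that row.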

fix cmake dependence error (#37304) · 6653ac5e
Committed by LiYuRio on Nov 19, 2021

6653ac5e

Nov 18, 2021 · 4 commits

fix bug to support dropout eval grad computing. (#37305) · c3d3001f
Committed by Li Min on Nov 18, 2021
```
* fix bug to support dropout eval grad computing.

* Remove useless code.
```
c3d3001f

[PTen]elementwise_sub kernel refactor (#37260) · 36a95654

Committed by YuanRisheng on Nov 18, 2021

* elementwise_add kernel refactor

* fix compile bugs in elementwise_add refactor

* fix compile bugs when run in npu/xpu

* fix bugs when run unit test

* fix bugs when run ci-windows

* modify code as recommended

* code format adjust

* fix bugs when run ci

* fix compile bug when run in ci-windows

* elementwise_sub refactor

* add PD_DLL_DECL for elementwise_sub

* fix bugs when compiling

36a95654

Add the `GetFetchNames` method in CinnGraphSymbolization. (#37218) · 3ad495e8

Committed by Zhen Wang on Nov 18, 2021

* Add the `GetFetchNames` method in CinnGraphSymbolization.

* Use unordered_set instead of vector as the type of fetch_var_names.

* Reuse the definition of kCompilationKey.

* Use CompileOptions to set fetch_var_ids.

* Update the argument passing of GraphCompiler.Build.

* Fix some bugs in CinnGraphSymbolization::GetFetchIds.

3ad495e8

Opt topk (#37256) · c4862d99

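Committed by zhangkaihuo on Nov 18, 2021

topk has two implementations: one based on cub and one hand-written kernel. The cub path obtains the top-k by sorting. Measurements on multiple data sets show that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.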

c4862d99
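The selection rule described in this commit message can be written as a small predicate. The sketch below is plain Python with a hypothetical function name, `use_cub_topk`. Whether the 75% boundary is inclusive in the actual kernel dispatch is not stated in the message, so the strict comparison here is an assumption.

```python
def use_cub_topk(input_width, k):
    """Return True when the sort-based (cub) path is expected to be faster."""
    # Thresholds taken from the commit message: width >= 128 and k > 75% of width.
    return input_width >= 128 and k > 0.75 * input_width
```

In all other cases, including narrow inputs with large k, the hand-written kernel would be chosen under this rule.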

Nov 17, 2021 · 6 commits

Replace custom IOHW -> OIHW reorder with built-in oneDNN reorder (#37175) · 162ac048

Committed by Sławomir Siwek on Nov 17, 2021

* Use oneDNN reorder instead of custom one

* Fix whitespace typo

* Fix Code format error

* Incorporating feedback

* Remove unnecessary reorder

* Support GIOHW format

* Fix code format error

162ac048
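An IOHW -> OIHW reorder amounts to swapping the first two axes of a 4-D weight tensor. The sketch below shows that permutation on nested Python lists. The function name `iohw_to_oihw` is hypothetical, and the code only illustrates the layout change, not how oneDNN performs its reorder.

```python
def iohw_to_oihw(weights):
    """Swap the input-channel and output-channel axes of a 4-D nested list.

    weights[i][o] is an H x W block; the result satisfies out[o][i] == weights[i][o].
    """
    n_in = len(weights)
    n_out = len(weights[0]) if weights else 0
    return [[weights[i][o] for i in range(n_in)] for o in range(n_out)]
```

The H x W blocks themselves are left untouched, which is why a generic reorder primitive can replace a hand-written loop for this case.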

Changed first batch of deprecated mkldnn headers and function names to new oneDNN names (#37040) · ce3ee9bb

Committed by piotrekobiIntel on Nov 17, 2021

* Change first batch of mkldnn headers and namespace names to dnnl

* Revert changes to tensor.h, which require approval

* Format changes with pre-commit

* Add int32 tests

* Fix int32 tests and call GetDataFromTensor for int32

* Fix test

ce3ee9bb

Modify reduce_op.op.h for xpu2 with kernel primitive api (#36904) · 9c5d5665
Committed by niuliling123 on Nov 17, 2021
```
* Modify reduce_op.op.h for xpu2 with kernel primitive api
```
9c5d5665

[heterps]Refactor heterogenous worker (#37244) · 54d2626a

Committed by zmx on Nov 17, 2021

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* refactor heter trainer. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

54d2626a

copy beta pow to same place when skip_update=1 (#37245) · 5e4b419b
Committed by Leo Chen on Nov 17, 2021
```
* copy beta pow to same place when skip_update=1

* fix xpu
```
5e4b419b
[npu][hybrid] support offload (#37224) · 762819a8
Committed by WangXi on Nov 17, 2021

762819a8

Nov 16, 2021 · 5 commits

Added BF16 Pool2d grad (#37081) · f95d44a2
Committed by arlesniak on Nov 16, 2021
```
* Added BF16 Pool2d grad

* upstream pulled

* fix for CI

* fixes after review
```
f95d44a2

Add API and unit test for reshape (#37232) · 79b49c20

Committed by YuanRisheng on Nov 16, 2021

* reshape kernel refactor

* fix compile bugs when run ci

* support xpu for reshape

* fix bugs when run unittest in kunlun ci

* fix compile bugs when run kunlun

* perfect code according to suggestion

* add api and unit test for reshape

79b49c20

Make FLAGS_determinstic effective in conv2d forward. (#37173) · ea47d211
Committed by Yiqun Liu on Nov 16, 2021
```
* Make FLAGS_determinstic effective in conv2d forward.

* Add call of SetCinnCudnnDeterministic in cinn_launch op.
```
ea47d211
added onednn elu kernel (#37149) · ae40ee32
Committed by jakpiase on Nov 16, 2021

ae40ee32

Fix attn_bias_add bug. (#37147) · a9e7a854

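Committed by Li Min on Nov 16, 2021

The fused_attention_op implementation uses bias_add, which is built on the kernel primitive API. The interface and internal implementation of the kernel primitive WriteData API later changed: the out-of-bounds check was moved into a template parameter. As a result, the call took the wrong branch and performed out-of-bounds writes, corrupting the contents of other GPU memory. Symptom: a single run of test_fused_attention_op_api.py almost never fails, but running it repeatedly in a loop with inputs of different shapes produces incorrect results intermittently, which makes the bug hard to notice.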

a9e7a854
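The failure mode described above can be mimicked in a few lines of plain Python. This is a simulation under stated assumptions, not the kernel primitive API: `write_tile`, `demo`, the tile size, and the flat `memory` list standing in for adjacent GPU allocations are all hypothetical.

```python
def write_tile(dst, src, offset, valid, is_boundary):
    """Copy a tile into dst; only the boundary variant respects `valid`."""
    count = min(len(src), valid) if is_boundary else len(src)
    for i in range(count):
        dst[offset + i] = src[i]


def demo(is_boundary):
    """Write a 5-element output in tiles of 4 and return the whole memory."""
    # Five output slots followed by three slots owned by another tensor.
    memory = [0] * 5 + [99] * 3
    tile = [1, 1, 1, 1]
    out_len = 5
    for offset in range(0, out_len, len(tile)):
        write_tile(memory, tile, offset, out_len - offset, is_boundary)
    return memory
```

With `is_boundary=True` the neighbouring values stay at 99, while with `is_boundary=False` the last tile overwrites them. Whether that corruption is ever observed depends on what happens to live next to the output, which is consistent with the intermittent failures the message reports.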

Nov 15, 2021 · 6 commits

[Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a

Committed by Chen Weihang on Nov 15, 2021

* move extension into pten [no-verify]

* append tensor methods by ext_tensor [no-verify]

* append other tensor methods [no-verify]

* ext related files tidy [no-verify]

* include relation tidy [no-verify]

* add pten tensor test [no-verify]

* replace tensor in custom op & compile success

* refine tensor constructor for unittest

* custom relu jit run success

* fix all custom op unittests

* add inference cmake adapt [no-verify]

* fix failed unittests

* fix windows failed unittests

* try to fix kunlun and inference failed

* fix test_elementwise_api error

* try to fix win compile failed

* fix kunlun fp16 type error

* remove useless handle error macro

* add custom linear op test

* fix compile failed & add win symbols

* fix non pten kernel cast failed

* add dll decl for api

* polish several details

* polish details by review comment

* add dll_decl for register

1e598f1a

fix:delete macro INFERENCE (#37130) · b628c316
Committed by feng_shuai on Nov 15, 2021

b628c316
Added BF16 to mean op (#37104) · df7cc457
Committed by arlesniak on Nov 15, 2021
```
* Added BF16 to mean op

* fix for CI

* fix for CI

* fix for CI
```
df7cc457
[New features] Add elementwise_mul triple grad kernel (#37152) · 59fdf4da
Committed by Weilong Wu on Nov 15, 2021
```
* Add elementwise_mul triple grad kernel

* Removed InplaceInferer and polished code
```
59fdf4da

Add distributed pass framework: including PassBase/PassTest/PassUtils (#36643) · 12339fa0

Committed by Zeng Jinle on Nov 15, 2021

* add split_program

* make ut faster

* increase ut timeout

* make result deterministic

* add fuse_all_reduce pass

* add ut framework, update

* fix ut framework

* remove useless code

* add coverage support

* update

* fix CI

* fix some bugs and fix ci coverage

* fix conflict

12339fa0

[heterps]bug fix for local training with --heter_worker_num (#37166) · 31cd9145

Committed by zmx on Nov 15, 2021

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

31cd9145

Nov 14, 2021 · 1 commit

[PTen]Reshape Kernel Refactor (#37164) · 895692e3

Committed by YuanRisheng on Nov 14, 2021

* reshape kernel refactor

* fix compile bugs when run ci

* support xpu for reshape

* fix bugs when run unittest in kunlun ci

* fix compile bugs when run kunlun

* perfect code according to suggestion

895692e3

Nov 13, 2021 · 1 commit

cinn_launch_op: skip checking that input variables must be used (#37119) · 228eb898

Committed by CtfGo on Nov 13, 2021

Modify several implementation details of CinnLaunchOp:
1. Skip checking that input variables must be used
2. Move current helper functions to a CinnlaunchContext

228eb898

PaddlePaddle / Paddle · synced successfully about 1 year ago