Commits · d772a9aa7bd221c8598d5a0300a0b89060720169 · Crayon鑫 / Paddle

18 Nov, 2021 · 4 commits

fix bug to support dropout eval grad computing. (#37305) · c3d3001f
Committed by Li Min on Nov 18, 2021
```
* fix bug to support dropout eval grad computing.

* Remove useless code.
```
c3d3001f

[PTen]elementwise_sub kernel refactor (#37260) · 36a95654

Committed by YuanRisheng on Nov 18, 2021

* elementwise_add kernel refactor

* fix compile bugs in elementwise_add refactor

* fix compile bugs when run in npu/xpu

* fix bugs when run unit test

* fix bugs when run ci-windows

* modify code as recommended

* code format adjust

* fix bugs when run ci

* fix compile bug when run in ci-windows

* elementwise_sub refactor

* add PD_DLL_DECL for elementwise_sub

* fix bugs when compile

36a95654

Add the `GetFetchNames` method in CinnGraphSymbolization. (#37218) · 3ad495e8

Committed by Zhen Wang on Nov 18, 2021

* Add the `GetFetchNames` method in CinnGraphSymbolization.

* Use unordered_set instead of vector as the type of fetch_var_names.

* Reuse the definition of kCompilationKey.

* Use CompileOptions to set fetch_var_ids.

* Update the argument passing of GraphCompiler.Build.

* Fix some bugs in CinnGraphSymbolization::GetFetchIds.

3ad495e8

Opt topk (#37256) · c4862d99

Committed by zhangkaihuo on Nov 18, 2021

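topk has two implementations: one based on cub and one hand-written kernel. cub obtains the top-k by sorting; measurements across multiple data sets show that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.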
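The selection rule described above can be sketched as a small predicate. This is only an illustration of the stated thresholds: the function name `prefer_cub_topk` is hypothetical and is not taken from the Paddle source, and the actual dispatch in the kernel may consider other conditions.

```python
def prefer_cub_topk(input_width: int, k: int) -> bool:
    """Return True when the sort-based cub path is expected to be faster.

    Hypothetical helper: it encodes only the rule quoted in the commit
    message (input_width >= 128 and k above 75% of input_width).
    """
    if not 0 < k <= input_width:
        raise ValueError("k must be in the range [1, input_width]")
    # Integer arithmetic avoids float rounding: k > 0.75 * input_width.
    return input_width >= 128 and 4 * k > 3 * input_width


if __name__ == "__main__":
    for width, k in [(64, 60), (128, 96), (128, 97), (1024, 10)]:
        path = "cub" if prefer_cub_topk(width, k) else "hand-written"
        print(f"input_width={width:5d} k={k:4d} -> {path}")
```

The comparison is strict because the message says k must exceed 75% of input_width, so k = 96 with input_width = 128 still takes the hand-written path.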

c4862d99

17 Nov, 2021 · 6 commits

Replace custom IOHW -> OIHW reorder with built-in oneDNN reorder (#37175) · 162ac048

Committed by Sławomir Siwek on Nov 17, 2021

* Use oneDNN reorder instead of custom one

* Fix whitespace typo

* Fix Code format error

* Incorporating feedback

* Remove unnecessary reorder

* Support GIOHW format

* Fix code format error

162ac048

Changed first batch of deprecated mkldnn headers and function names to new oneDNN names (#37040) · ce3ee9bb

Committed by piotrekobiIntel on Nov 17, 2021

* Change first batch of mkldnn headers and namespace names to dnnl

* Revert changes to tensor.h, which require approval

* Format changes with pre-commit

* Add int32 tests

* Fix int32 tests and call GetDataFromTensor for int32

* Fix test

ce3ee9bb

Modify reduce_op.op.h for xpu2 with kernel primitive api (#36904) · 9c5d5665
Committed by niuliling123 on Nov 17, 2021
```
* Modify reduce_op.op.h for xpu2 with kernel primitive api
```
9c5d5665

[heterps]Refactor heterogenous worker (#37244) · 54d2626a

Committed by zmx on Nov 17, 2021

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* refactor heter trainer. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

54d2626a

copy beta pow to same place when skip_update=1 (#37245) · 5e4b419b
Committed by Leo Chen on Nov 17, 2021
```
* copy beta pow to same place when skip_update=1

* fix xpu
```
5e4b419b

[npu][hybrid] support offload (#37224) · 762819a8
Committed by WangXi on Nov 17, 2021

762819a8

16 Nov, 2021 · 5 commits

Added BF16 Pool2d grad (#37081) · f95d44a2
Committed by arlesniak on Nov 16, 2021
```
* Added BF16 Pool2d grad

* upstream pulled

* fix for CI

* fixes after review
```
f95d44a2

Add API and unit test for reshape (#37232) · 79b49c20

Committed by YuanRisheng on Nov 16, 2021

* reshape kernel refactor

* fix compile bugs when run ci

* support xpu for reshape

* fix bugs when run unittest in kunlun ci

* fix compile bugs when run kunlun

* perfect code according to suggestion

* add api and unit test for reshape

79b49c20

Make FLAGS_determinstic effective in conv2d forward. (#37173) · ea47d211
Committed by Yiqun Liu on Nov 16, 2021
```
* Make FLAGS_determinstic effective in conv2d forward.

* Add call of SetCinnCudnnDeterministic in cinn_launch op.
```
ea47d211

added onednn elu kernel (#37149) · ae40ee32
Committed by jakpiase on Nov 16, 2021

ae40ee32

Fix attn_bias_add bug. (#37147) · a9e7a854

Committed by Li Min on Nov 16, 2021

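The fused_attention_op implementation uses bias_add, which is built on kernel primitives. The kernel primitive WriteData API and its internal implementation were later changed so that the out-of-bounds check moved into a template parameter. As a result, the wrong branch was invoked, producing out-of-bounds writes that corrupted the contents of other GPU memory. Symptom: a single run of test_fused_attention_op_api.py almost never fails, but looping over inputs of different shapes yields incorrect results intermittently, which makes the bug hard to notice.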
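The failure mode described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not Paddle's kernel primitive API: `write_data`, `run`, and the `is_boundary` argument are hypothetical names, and a Python boolean stands in for the compile-time template flag. A flat list plays the role of device memory in which a neighbouring tensor sits directly after the output.

```python
def write_data(dst, offset, tile, num_valid, is_boundary):
    """Copy `tile` into `dst` starting at `offset`.

    `is_boundary` stands in for the compile-time template flag: when it
    is True the copy is clipped to `num_valid` elements, otherwise the
    whole tile is written without any bounds check.
    """
    count = min(len(tile), num_valid) if is_boundary else len(tile)
    for i in range(count):
        dst[offset + i] = tile[i]


def run(is_boundary_on_last_tile):
    """Write six outputs in tiles of four and return the whole memory."""
    # Six output slots followed by two slots owned by another tensor.
    memory = [0] * 6 + [7, 7]
    tile_width, total = 4, 6
    for start in range(0, total, tile_width):
        remaining = total - start
        is_last_partial = remaining < tile_width
        # Full tiles never need clipping; only the partial tile does.
        flag = is_boundary_on_last_tile if is_last_partial else False
        write_data(memory, start, [1] * tile_width, remaining, flag)
    return memory


if __name__ == "__main__":
    print("checked:  ", run(True))
    print("unchecked:", run(False))
```

With the check enabled on the partial tile the neighbouring values survive; with the unchecked branch they are overwritten. Whether a real run fails depends on what happens to live next to the output buffer, which is consistent with the intermittent behaviour the commit message reports.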

a9e7a854

15 Nov, 2021 · 6 commits

[Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a

Committed by Chen Weihang on Nov 15, 2021

* move extension into pten [no-verify]

* append tensor methods by ext_tensor [no-verify]

* append other tensor methods [no-verify]

* ext related files tidy [no-verify]

* include relation tidy [no-verify]

* add pten tensor test [no-verify]

* replace tensor in custom op & compile success

* refine tensor constructor for unittest

* custom relu jit run success

* fix all custom op unittests

* add inference cmake adapt [no-verify]

* fix failed unittests

* fix windows failed unittests

* try to fix kunlun and inference failed

* fix test_elementwise_api error

* try to fix win compile failed

* fix kunlun fp16 type error

* remove useless handle error macro

* add custom linear op test

* fix compile failed & add win symbols

* fix non pten kernel cast failed

* add dll decl for api

* polish several details

* polish details by review comment

* add dll_decl for register

1e598f1a

fix:delete macro INFERENCE (#37130) · b628c316
Committed by feng_shuai on Nov 15, 2021

b628c316

Added BF16 to mean op (#37104) · df7cc457
Committed by arlesniak on Nov 15, 2021
```
* Added BF16 to mean op

* fix for CI

* fix for CI

* fix for CI
```
df7cc457

[New features] Add elementwise_mul triple grad kernel (#37152) · 59fdf4da
Committed by Weilong Wu on Nov 15, 2021
```
* Add elementwise_mul triple grad kernel

* Removed InplaceInferer and polished code
```
59fdf4da

Add distributed pass framework: including PassBase/PassTest/PassUtils (#36643) · 12339fa0

Committed by Zeng Jinle on Nov 15, 2021

* add split_program

* make ut faster

* increase ut timeout

* make result deterministic

* add fuse_all_reduce pass

* add ut framework, update

* fix ut framework

* remove useless code

* add coverage support

* update

* fix CI

* fix some bugs and fix ci coverage

* fix conflict

12339fa0

[heterps]bug fix for local training with --heter_worker_num (#37166) · 31cd9145

Committed by zmx on Nov 15, 2021

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

31cd9145

14 Nov, 2021 · 1 commit

[PTen]Reshape Kernel Refactor (#37164) · 895692e3

Committed by YuanRisheng on Nov 14, 2021

* reshape kernel refactor

* fix compile bugs when run ci

* support xpu for reshape

* fix bugs when run unittest in kunlun ci

* fix compile bugs when run kunlun

* perfect code according to suggestion

895692e3

13 Nov, 2021 · 1 commit

cinn_launch_op: skip checking input variables must be used (#37119) · 228eb898

Committed by CtfGo on Nov 13, 2021

Modify several implementations in CinnLaunchOp:
1. Skip checking that input variables must be used
2. Move current helper functions to a CinnLaunchContext

228eb898

12 Nov, 2021 · 5 commits
- [fix]fix the bug of fused_attention and fused_feedforward (#36972) · 6486e242
  Committed by zhangkaihuo on Nov 12, 2021
```
* fix bug:
1. atten: set the default value of attn_dropout_rate to None
2. ffn: add activation parameter
```
  6486e242
- fix test_scale_op skipped test (#37153) · ca7f1cd2
  Committed by Chen Weihang on Nov 12, 2021
  
  ca7f1cd2
- [PTen] Adjust the param of full_like API in pten (#37088) · abd4ab9c
  Committed by zyfncg on Nov 12, 2021
```
* adjust the param of full_like api in pten

* adjust the code format

* adjust the code format

* adjust the code format
```
  abd4ab9c
- [NPU] fix fill_constant and test_memcpy_op_npu (#37144) · 9396f286
  Committed by Aganlengzi on Nov 12, 2021
  
  9396f286
- [Pten]Refactor the Elementwise_add Kernel (#37043) · c1310343
  Committed by YuanRisheng on Nov 12, 2021
```
* elementwise_add kernel refactor

* fix compile bugs in elementwise_add refactor

* fix compile bugs when run in npu/xpu

* fix bugs when run unit test

* fix bugs when run ci-windows

* modify code as recommended

* code format adjust

* fix bugs when run ci

* fix compile bug when run in ci-windows
```
  c1310343
11 Nov, 2021 · 4 commits

Fix unit test for send_and_recv_cpu & send_and_recv_gpu (#37129) · a41447f0
Committed by zmx on Nov 11, 2021

a41447f0

add where/where_index/masked_select for kunlun (#37053) · f5e7b02a
Committed by TTerror on Nov 11, 2021
```
* add where/where_index/masked_select for kunlun

* fix where/where_index

* update where/masked_select
```
f5e7b02a

Added softplus + activation oneDNN fuse pass (#36657) · a346c4dc

Committed by jakpiase on Nov 11, 2021

* added softplus + activation fuse pass

* minor change

* implemented reviewer suggestion

* minor fix

* minor fix

* added scale_out parameter

* minor fix

* fix for iScan CI

* conditionally disabled logs

* refactored pass builder

a346c4dc

[Heterps]Refactor Heter Pipeline Parameter Server (#36845) · a2da1efa

Committed by zmx on Nov 11, 2021

* change username

* fix

* fix

* fix

* fix

* fix

* update

* update

* update unittests

* fix

* update

* fix

* update

* fix

* fix

* fix

* update

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update send_and_recv op. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix unit. notest,test=coverage

* fix ut. notest, test=coverage

* update. notest,test=coverage

* fix ut. notest, test=coverage

* fix ut. notest, test=coverage

* fix. notest, test=coverage

* fix. notest, test=coverage

* fix ut. notest, test=coverage

* fix ut. notest, test=coverage

* fix ut. notest, test=coverage

* fix ut. notest, test=coverage

* add func. notest, test=coverage

* fix ut. notest, test=coverage

* fix. test=develop

* fix. test=develop

a2da1efa

10 Nov, 2021 · 3 commits
- Added stack FP32 FWD oneDNN kernel (#37002) · 99f9224c
  Committed by jakpiase on Nov 10, 2021
```
* added stack oneDNN FP32 op

* minor change

* CI fix

* added skipping for gpus

* fix for stack op

* CI fix

* CI fix

* Added comment

* CI fix
```
  99f9224c
- Fix fused_attention_op scope. (#37065) · ad44a40c
  Committed by Li Min on Nov 10, 2021
```
att, bug fix
```
  ad44a40c
- Fix rnn grad bug in cpu when dropout is zero (#37080) · 211940eb
  Committed by Jack Zhou on Nov 10, 2021
```
* fix rnn grad bug when num_layers is set 2 and dropout_prob is set 0

* add more test for rnn
```
  211940eb
09 Nov, 2021 · 3 commits
- optimize backward (#37055) · aac00f6a
  Committed by Haohongxiang on Nov 09, 2021
  
  aac00f6a
- Try to fix CUDA Graph H2D copy bug (#36987) · 2a143f84
  Committed by Zeng Jinle on Nov 09, 2021
```
* try to fix CUDA Graph H2D copy bug

* remove useless code

* fix ci

* fix ROCM CI

* fix CUDA_VERSION

* improve CI coverage
```
  2a143f84
- add gather_nd/tile op for kunlun (#37029) · 819b9589
  Committed by TTerror on Nov 09, 2021
  
  819b9589
08 Nov, 2021 · 2 commits

[PTen] Add full kernel in pten (incomplete) (#36930) · 655f4e3f

Committed by zyfncg on Nov 08, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rocm compile failed

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar macro

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & naming rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add macro for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflict

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key construct

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for checking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unittests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unittests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflict

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflict with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* add fill_constant kernel in pten

* fix bug of full api (c++)

* remove the support for SelectedRows in new fill_constant kernel

* fix bug of setting fill_any_like kernel key

* merge code conflict

* modify fill_constant GetExpectedKernelType

* fix fill_constant KernelType bug

* polish code of build pten KernelContext

* refactor code of fill_constant in pten
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: chentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

655f4e3f

【fix-bug】Support attn_mask=None input cases for fused_attention_op. (#36951) · 472dcca4
Committed by Li Min on Nov 08, 2021
```
fused_attention_op currently does not support attn_mask=None as input. This PR adds that support, along with the corresponding unit-test logic.
```
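What an optional mask means for the attention weights can be sketched in a few lines. This is an illustration only: the function `attention_weights` is hypothetical, it works on a single row of scores in plain Python, and it assumes the mask is additive (applied to the scores before softmax). It is not the fused operator's implementation.

```python
import math


def attention_weights(scores, attn_mask=None):
    """Softmax over one row of attention scores with an optional mask.

    Assumption: the mask is additive, so a large negative entry
    suppresses a position. When attn_mask is None the masking step is
    skipped entirely.
    """
    if attn_mask is not None:
        if len(attn_mask) != len(scores):
            raise ValueError("attn_mask must have the same length as scores")
        scores = [s + m for s, m in zip(scores, attn_mask)]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(attention_weights([0.0, 0.0]))
    print(attention_weights([0.0, 0.0], attn_mask=[0.0, -1e9]))
```

Treating None as "skip the add" avoids allocating an all-zero mask just to satisfy the interface.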
472dcca4

Crayon鑫 / Paddle is in sync with the fork's source project