提交 · 77034fc3ec1afa5037c43f2b129fa9949930a889 · Crayon鑫 / Paddle

26 10月, 2021 3 次提交

Z
[cherry-pick]add op: fused_feedforward(forward) (#36729) · 77034fc3
由 zhangkaihuo 提交于 10月 26, 2021
```
This is a fusion operator to compute feed forward layer in transformer model architecture.
```
77034fc3
Y

[Cherry-pick] Add the forward QR operator (#36627) · 616ce203
由 Yulong Ao 提交于 10月 26, 2021

616ce203

[cherry-pick-2.2] Fused attention op forward (#35905) (#36708) · d2be870a

由 Li Min 提交于 10月 26, 2021

功能：本PR的目标是提高attention模块的计算性能。
为了减少框架层对op的调度开销，本PR通过在C++层手动实现attention模块，对外提供attention 大op；
为了减少防存开销，本PR采取了两种优化方法：
（1）在q,k,v计算时通过共享输入X，将该处的gemm，transpose和bias add从三次调用减少为一次；
（2）使用kernel融合优化技术，在不同cuda kernel之间通过寄存器传输数据；

d2be870a

19 10月, 2021 1 次提交

[cherry-pick]Add sparse attention cherrypick (#36447) · 36edb0e1

由 Liu-xiandong 提交于 10月 19, 2021

The code of this PR can only support CUDA 11.2. Currently, CI does not have GPU with CUDA 11.2 , and all tests will be skipped automatically.

The new OP is paddle._C_ops.sparse_attention. Regarding the work of the python API, it will be resolved in a follow-up PR.

The code of this PR lacks tests on dynamic graphs and static graphs, and will be added in subsequent PRs.

36edb0e1

16 9月, 2021 1 次提交
- C
  
  Add CPU and GPU eigh op implementation (#34990) · 07d0b834
  由 crystal 提交于 9月 16, 2021
  
  07d0b834
09 9月, 2021 1 次提交

Add matrix_rank Op and it's GPU and CPU kernel (#34823) · eb1fbf12

由 0x45f 提交于 9月 09, 2021

* init matrix_rank op, add matrix_rank CPU code and test

* add GPU kernel, remove svd_eigen.h

* add CPU kernel when tol is tensor

* add cpu and gpu code when tol is tensor

* fix CI-ROCM error

* add matrix_rank API describe, fix PR-CI-Py3 error

* fix PR-CI-Windows error, add matrix_rank API test

* delete useless comments

* fix review

* add my code in svd_helper.h

* update doc commets

* remove spaces

eb1fbf12

02 9月, 2021 1 次提交

Add SVD Op and it's GPU and CPU kernel (#34953) · 7e5fb462

由 xiongkun 提交于 9月 02, 2021

* Add SVD Op and it's GPU and CPU kernel

* Remove CUDAPlace in test_svd_op, make the test available in CPU package

* modfity the file

* fix windows bug/ fix ROCM / fix test timeout

* for pass the CIs

* improve error report

* for code review

* some modification to test_svd_op

* change python code style

* expose the svd interface for document

7e5fb462

16 6月, 2021 1 次提交
- Z
  
  Add bitwise_and/or/xor/not OP/API and unittest (#33524) · ecc05377
  由 Zhou Wei 提交于 6月 16, 2021
  
  ecc05377
07 5月, 2021 1 次提交
- L
  Fix compile error on jetson platform (#32748) · 8ce6b393
  由 LielinJiang 提交于 5月 07, 2021
```
* fix compile error on jetson platform
```
  8ce6b393
06 5月, 2021 1 次提交

[ROCM] bugfix for unittest (#32392) · 31392627

由 ronnywang 提交于 5月 06, 2021

* fix test_unpool_op

* fix test_inplace_addto_strategy

* fix test_conv2d_fusion_op

* fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor

* fix test_dot_op

* fix test_correlation_op

* fix tracer

* fix test_memcpy_op

31392627

29 4月, 2021 1 次提交
- L
  Add op read_file and decode_jpeg (#32564) · b22f6d69
  由 LielinJiang 提交于 4月 29, 2021
```
* add op read_file and decode_jpeg
```
  b22f6d69
09 4月, 2021 1 次提交

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

由 Leo Chen 提交于 4月 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

27 1月, 2021 1 次提交

REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719) · f8da5536

由 jakpiase 提交于 1月 27, 2021

* added external reorder to profiler

* resolved conflict

* added enable_static

* initial version of lstm, not working yet

* added lstm to operators.cmake

* added vanilla lstm mkldnn op

* added peephole weights integration

* minor changes

* added formatting

* added fusion_lstm_mkldnn to static_whitelist

* added formatting

* removed comment

* moved use_peepholes attribute inside is_cached block

* reverted wrong changes

* minor formatting change

* minor changes

* changed stream handling

* minor change

* added datatype to GetExpectedKernelType()

* added reading stream from TLS

f8da5536

26 1月, 2021 2 次提交

T
Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661)" (#30708) · 824a79d3
由 Tao Luo 提交于 1月 26, 2021
```
This reverts commit d834f4e6.
```
824a79d3

Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661) · d834f4e6

由 jakpiase 提交于 1月 26, 2021

* added external reorder to profiler

* resolved conflict

* added enable_static

* initial version of lstm, not working yet

* added lstm to operators.cmake

* added vanilla lstm mkldnn op

* added peephole weights integration

* minor changes

* added formatting

* added fusion_lstm_mkldnn to static_whitelist

* added formatting

* removed comment

* moved use_peepholes attribute inside is_cached block

* reverted wrong changes

* minor formatting change

* minor changes

d834f4e6

21 1月, 2021 1 次提交
- Q
  
  [ROCM] update cmake and dockerfile, test=develop (#30598) · 1f5841c2
  由 Qi Li 提交于 1月 21, 2021
  
  1f5841c2
16 12月, 2020 1 次提交

添加rocm平台支持代码 (#29342) · 76738504

由 Y_Xuan 提交于 12月 16, 2020

* 添加rocm平台支持代码

* 修改一些问题

* 修改一些歧义并添加备注

* 修改代码格式

* 解决冲突后的代码修改

* 修改operators.cmake

* 修改格式

* 修正错误

* 统一接口

* 修改日期

76738504

07 12月, 2020 1 次提交

Compiling operator libraries with Unity build (#29130) · 671555ed

由 LoveAn 提交于 12月 07, 2020

* Compiling operator libraries with Unity Build on Windows CPU.

* Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci

* Add option in windows ci script, no_test, test=windows_ci

* Optimize parallel compiling, test=develop

* remove limit of parallel compile and skip some ops in UB, test=develop

* remove changes of header file, test=develop

* remove changes of header file, test=develop

* fix test_eye_op unittest failed, test=develop

* Compiling operator libraries with Unity Build on Linux, test=develop

* set default WITH_UNITY_BUILD=OFF, test=develop

* Move unity build rules into a single file and add comment, test=develop

* optimize parallel compilation, test=develop

* fix undefined reference error on coverage ci, test=develop

671555ed

12 11月, 2020 1 次提交
- S
  裁剪transformer模型trt支持；修复tensorRT不支持DeletePass的bug (#28517) · 8699f38d
  由 Shang Zhizhou 提交于 11月 12, 2020
```
* skip_layernorm_op done

* add unittest

* slice op convertor support trt < 6

* skip_layernorm only work in ernie
```
  8699f38d
27 9月, 2020 1 次提交

support elementwise add, activation, matmul on Baidu Kunlun (#27143) · 6b727e08

由 QingshuChen 提交于 9月 27, 2020

* support elementwise add, activation, matmul on Baidu Kunlun
* test=kunlun

* minor
* test=kunlun

* reconstuct the xpu directory
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

6b727e08

23 9月, 2020 1 次提交
- Z
  add fuse_bn_act op (#27230) · 906e7f92
  由 Zhang Ting 提交于 9月 23, 2020
```
* add fused_bn_add_relu op
```
  906e7f92
16 9月, 2020 1 次提交
- L
  
  Fix bug of handling blank characters in operators.cmake (#27310) · c89f269c
  由 Leo Chen 提交于 9月 16, 2020
  
  c89f269c
21 8月, 2020 1 次提交

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

由 QingshuChen 提交于 8月 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
 * test=kunlun

* support xpu op in separate file
 * test=kunlun

* update XPU error message and remove duplicated code

 * test=kunlun

* minor
 * test=kunlun

* minor
 * test=kunlun

138ecf24

13 8月, 2020 1 次提交

[OpDevOptimize] Add common infershape functions (#26096) · ffe52b44

由 Leo Chen 提交于 8月 13, 2020

* add unchaged infershape function

* add broadcast infershape function

* fix bug

* rename infershape functions

* add UnaryOpUnchangedInferShapeCheckAxis

* add error message

* add test for common infer shape functions

* dont update existed ops

* dont update op_desc.h

* add more test

* add error check, refine error message

ffe52b44

06 8月, 2020 1 次提交

Add oneDNN fusion_gru kernel (#25594) · 68c6160e

由 Adam 提交于 8月 06, 2020

* Add oneDNN fusion_gru kernel and fix fc+gru pass
test=develop

* Formatting changes
test=develop

* Lint fixes
test=develop

* Add memory::format_tag::any to GRU weights
test=develop

* Fix build with CUDA

* Fix build with CUDA v2

68c6160e

30 7月, 2020 1 次提交
- W
  Update the api for the compare_ops · 595a7197
  由 wawltor 提交于 7月 30, 2020
```
Update the code for the compare_ops, update the api and doc 
```
  595a7197
03 4月, 2020 1 次提交

update linspace, equal operators to API 2.0 (#23274) · a2e10930

由 channings 提交于 4月 03, 2020

* update linspace, equal operators to API 2.0, test=develop

* equal support higher performance CUDA kernel, test=develop

* update comment of equal&linspace operator, test=develop

* update comment of equal&linspace operator, test=develop

a2e10930

11 3月, 2020 1 次提交

[Ernie GPU Optimize]: Embedding_eltwise_layernorm Fuse (#22494) · 8d6dc102

由 Zhaolong Xing 提交于 3月 11, 2020

* 1. add embedding eltwise layernorm fuse
2. add embedding eltwise layernorm op
3. refine inplace_add_relu
4. refine fc_eltwise_layernorm
test=develop

* 1. refine fc
test=develop

* fix comments
test=develop

* fix comments

test=develop

8d6dc102

10 1月, 2020 1 次提交

Add bn and relu fuse pass (#22048) · 46189b16

由 Zhen Wang 提交于 1月 10, 2020

* add bn and relu fuse pass

* add op attr assert and dtype assert

* fix some inputs&&outputs bugs for the fused op and pattern.

* add the unittest for fuse_bn_act_pass. test=develop

* use normative enforce statements. test=develop

* add the cpu test. test=develop

* add the support of batch_size=1 for the bn with relu op. test=develop

* add the error type for paddle throws. test=develop

* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop

46189b16

03 1月, 2020 1 次提交

Add the first implememtation of fusion_group op (#19621) · d4832077

由 Yiqun Liu 提交于 1月 03, 2020

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Refine the calling of PADDLE_ENFORCE.
test=develop

d4832077

09 12月, 2019 1 次提交

dygraph_grad_maker supports varbase without grad_var (#21524) · 84b72671

由 Leo Chen 提交于 12月 09, 2019

* dygraph_grad_maker supports varbase without grad_var, test=develop

* fix compile, test=develop

* fix test_tracer, test=develop

* follow comments, test=develop

84b72671

27 11月, 2019 1 次提交

INT8 Fully-connected (#17641) · 5d7d5482

由 Michał Gallus 提交于 11月 27, 2019

* Implement Int8 FC

* Integrate FC into INT8v2

test=develop

* int8 FC: transpose weights before computing scales

test=develop

* Add support for activation_type string in FC

test=develop

* Disable MKL-DNN's FC in VGG16 and 19

test=develop

* Disable FC quantization when mkldnn FC is disabled

test=develop

* Solve PADDLE_ENFORCES in FC int8

* Fix Paddle enforces and remove const cast

test=develop

* Fix style changes

test=develop

* Fix quantizer_tester test and add fc quantization

test=develop

* Fix FC test fail on CUDA

* Remove unnecessary log from quantize placement pass

test=develop

* Add Thread ID to FC hash key

test=develop

* Add comments to MKL-DNN FC Kernel

test=develop

* Refactor quantizer

test=develop

* Fix linter issues

test=develop

* Fix crash in slim googlenet

test=develop

* Fix PADDLE_ENFORCE messages

test=develop

5d7d5482

08 11月, 2019 1 次提交

Add transpose2 INT8 for mkl-dnn (#19424) · 77c20835

由 joanna.wozna.intel 提交于 11月 08, 2019

* Add transpose2 INT8 for mkl-dnn

test=develop

* Fix test_transpose_int8_mkldnn

test=develop

* Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"

This reverts commit 34011bdb, reversing
changes made to 2ce6473f.

* Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2""

This reverts commit 23754dd7.

* Add template to TransposeMKLDNNHandler

test=develop

* Resolve conflict

test=develop

* Restore get_size and refactor

test=develop

77c20835

02 10月, 2019 1 次提交

Add multihead op for ernie opt (#19933) · e8673668

由 zhaoyuchen2018 提交于 10月 02, 2019

* Add multihead op for ernie opt

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine softmax

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine kernel.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine cuda kernel

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine cuda version

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine cmake

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

e8673668

29 9月, 2019 1 次提交

fix conv2d and conv3d: (#20042) · 3aa331d9

由 liym27 提交于 9月 29, 2019

1.support asymmetric padding;
    2.support padding algorithm:"SAME" and "VALID";
    3.support channel_last: data_format NHWC and NDHWC;
    4.change doc of python API and c++;

    test=develop, test=document_preview

3aa331d9

19 9月, 2019 1 次提交

Add a pass to fuse fc+elementwise_add+layernorm (#19776) · 3cd985a6

由 Yiqun Liu 提交于 9月 19, 2019

* Add fc_elementwise_layernorm_fuse pass and unittest.

* Add fused_fc_elementwise_layernorm op and its GPU kernel.
test=develop

* Apply fc_elementwise_layernorm_fuse_pass to GPU inference.

* Add the setting of attrs in the definition of binary_op.
test=develop

* Add comment.

* Implement the unittest.
test=develop

* Change the unittest name of layer_norm.
test=develop

3cd985a6

17 9月, 2019 1 次提交
- C
  add deformable conv v1 op and cpu version of deformable conv v2 (#18500) · 00efd1d8
  由 chengjuntao 提交于 9月 17, 2019
```
* add deformable conv v1 op, test=develop
```
  00efd1d8
11 9月, 2019 1 次提交

Implement the GPU kernel of fc operator (#19687) · a65c728e

由 Yiqun Liu 提交于 9月 11, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Remove the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

a65c728e

30 5月, 2019 1 次提交
- B
  Add deformable conv v2 op,test=develop (#17145) · bba57cdd
  由 Bai Yifan 提交于 5月 30, 2019
```
* unit commits, test=develop

* update API.spec, test=develop
```
  bba57cdd
28 3月, 2019 1 次提交
- G
  
  Add DGC(Deep Gradient Compression) interface. (#15841) · eb83abea
  由 gongweibao 提交于 3月 28, 2019
  
  eb83abea

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致