Commits · 8c7ee8c21b28ad9332652e5a1f7b62ca6b4736e0 · Crayon鑫 / Paddle

18 Feb, 2022 · 1 commit
- [Pten] blas and lapack migration (#39587) · 8c7ee8c2
  Committed by Feiyu Chan on Feb 18, 2022
```
* move blas related files
* move lapack related files
```
15 Feb, 2022 · 2 commits

[Pten] move paddle/operators/math/functors.h and compound_functors.h (#39514) · 0d46a108
Committed by Feiyu Chan on Feb 15, 2022
```
* move paddle/operators/math/functors.h
* move paddle/operators/math/compound_functors.h
```

[PTen]Migrate proto::VarType outside of Pten (#39411) · 7e7e9404

Committed by Aurelius84 on Feb 15, 2022

* #1 migrate dist-related type()-> dtype()

* move datatype function from pten -> fluid/framework

* change type() in imperative into convert(dtype())

* modify xx_tensor->type into xx_tensor->dtype

* change the set_type interface and the caller

* modify xx_tensor.type into xx_tensor.dtype

* fix mutable_data(place, dtype())

* change caller of mutable_data in pten and distributed

* change the caller of mutable_data in fluid/framework

* change the caller of mutable_data in imperative directory

* mutable_data: inference

* update the call of mutable_data

* transfer MakePenScalarArray MakePtenScalar ResetHolderWithType

* pass the compile. the next step is remove VarType in Pten

* fix all and remove VarType from pten. success in linux. Next task is other platform

* fix conflict with develop

* fix compiled error

* Fix reset conversion

* fix conflict

* fix compiled problem

* fix typo

* Fix << in tensor_utils.cc

* fix type->dtype

* fix unittest

* fix tensor init constructor

* fix DataTypeSize for BFloat16

* fix code style

* fix npu compiled error

* fix npu

* compile npu successfully

* fix conflict

* fix conflict
Co-authored-by: Nxiongkun <xiongkun03@baidu.com>

11 Feb, 2022 · 1 commit
- [Pten] move operators/math/math_function_* to pten/kernels/func (#39300) · d25a7f9e
  Committed by Feiyu Chan on Feb 11, 2022
```
* move operators/math/math_function_* to pten/kernels/func
* namespace from `paddle::operators::math` to `pten::funcs`
```
08 Feb, 2022 · 1 commit
- Rename partial function name TensorReduceFunctorImpl to TensorReduceImpl. (#39388) · f71241b9
  Committed by Yiqun Liu on Feb 08, 2022
06 Feb, 2022 · 1 commit
- [PTEN] Add Gpu context (#39305) · a821c4a9
  Committed by Wilber on Feb 06, 2022
29 Jan, 2022 · 1 commit

Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09

Committed by Li Min on Jan 29, 2022

* Add fp16 support for scale/bias for fused_layernorm_residual_dropout_bias op.

* Remove useless code.

* Remove useless code.

* Optimize layer_norm fwd when cols is 1024.

* Remove useless code.

* Minors.

* Minors.

* Modifications according to reviews.

* Minors.

* Optimize layer_norm bwd kernel when cols is 1024.

* Polish layer_norm_bwd_1024 kernel.

* Limit ln_bwd_1024_kernel to paddle_with_cuda.

* Fix double type compile error.

* Add optimization of ln bwd for fused_dropout_add_ln op.

* Polish codes.

26 Jan, 2022 · 2 commits

[pten] remove deprecated fluid op kernel for pten (#38842) · 3ab9aef1

Committed by Leo Chen on Jan 26, 2022

* update cmake file to remove fluid kernel

* add pten declaration.h to where pybind.h used

* fix sync_bn and tensorrt_engine

* refine detection_library

* fix interpreter_core

* support eager legacy

* fit eager legacy for pten

* fall back to cpu if not found kernel

* fix compile problem

* fix compile problem

* refine fallback logic

* fit operator.run()

* fix xpu compile

* fit for new_exec

* add REGISTER_OP_WITHOUT_GRADIENT

* un-cache pt_kernel_context

* fix compile

* fix cudnn

* fix compiling with on_infer

* fix mkldnn

* fix isfinite_v2

* fix xpu problem

* fix op_device

* refine fallback for xpu

* fix xpu compile

* merge develop

* refine code format

* fix compile

* fix compile

* add data_transfer

* fix PreparePtenData

* fix cpu context

* merge develop

* fix compile

* fix error device context

* fix xpu

* fix dev_ctx

Optimize layer norm forward when cols is 1024. (#39167) · 01d04be6
Committed by Li Min on Jan 26, 2022
```
* Optimize layer_norm fwd when cols is 1024.
```
25 Jan, 2022 · 1 commit

[Move selected_rows PR #3] Change the relationship of [include/Cmake]. (#39128) · 2bafd338

Committed by Weilong Wu on Jan 25, 2022

* Added selected_rows and rw_lock to pten

* Renamed the unit test target to fix CI

* Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid

* Remove rw_lock.h,rw_lock_test.cc in fluid

* Use pten::RWLock and pten::AutoRDLock, fix CI

* Use pten::SelectedRows

* Use pten::SelectedRows

* Fix to pass NPU CI

* Use pten::SelectedRows, to pass NPU CI

* To fix NPU CI

* To fix NPU CI again

24 Jan, 2022 · 1 commit

Removed redundant definitions of likely/unlikely (#38911) · 43919d0a

Committed by Jacek Czaja on Jan 24, 2022

* - more unlikely

* - compilation fix

* - removed redundant definition

* - fix

* - Fixes

* - compilation fix for windows

21 Jan, 2022 · 1 commit
- Renamed selected_rows.* -> selected_rows_utils.* (#39037) · 814e5ab4
  Committed by Weilong Wu on Jan 21, 2022
18 Jan, 2022 · 1 commit

[Unify Tensors PR #8] Merged Tensor into DenseTensor, test=allcases (#38914) · 2052f1e3

Committed by Zhanlue Yang on Jan 18, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Patched python level LoDTensor

* Merge Tensor into DenseTensor

* Fixed namespace issues,test=allcases

* Fixed merge issues

* Fixed inference issues

* Fixed NPU test issues

* Fixed merge issues

17 Jan, 2022 · 1 commit

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

Committed by Wilber on Jan 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

12 Jan, 2022 · 1 commit

Adjust wrapper of gpu_launch_config (#38654) · f5166284

Committed by limingshu on Jan 12, 2022

* first commit

* fix wrong filename

* fix the misspelled name

* fix gpu config wrapper

* modify according to pr advices

* fix GpuLaunchConfig1D api bugs

* change the config for dropout grad

* fix bugs

* modification according to pr advices

* modification according to pr advices

07 Jan, 2022 · 1 commit
- Add fp16 support for scale and bias parameter for fused_layernorm_residual_dropout op. (#38775) · 1b6e4664
  Committed by Li Min on Jan 07, 2022
```
* Add fp16 support for scale/bias for fused_layernorm_residual_dropout_bias op.
```
28 Dec, 2021 · 1 commit
- Add constructor for fused dropout param to ease use. (#38475) · f9e8a775
  Committed by Li Min on Dec 28, 2021
24 Dec, 2021 · 1 commit

[pten] combine reduce_cuda codes (#38328) · 08941eda

Committed by chentianyu03 on Dec 24, 2021

* combine reduce_cuda codes

* support float16 in pten reduce_mean

* replace ReduceCudaKernel impl with pten reduce impl

* mv reduce funcs into reduce_cuda_impl

* rm unused codes and headers

* mv GetReduceDim into reduce_cuda_impl

* recover GetReduceDim in reduce_op.h

* add new dispatch macro

* fix pool op output not inited and cause transform to pten::denseTensor error

* fix output tensor not initialized error

* rename new dispatch macro and format code style

* rm reduce_functor_op.h file

17 Dec, 2021 · 1 commit
- Delete cub_reduce.h and modified the TensorReduce to TensorReduceFunctorImpl (#38197) · 9a8a4c77
  Committed by niuliling123 on Dec 17, 2021
16 Dec, 2021 · 1 commit
- Add the transformop parameter in TensorReduceFunctorImpl (#38135) · 524389ee
  Committed by niuliling123 on Dec 16, 2021
```
* Add the transformop parameter in TensorReduceFunctorImpl
```
03 Dec, 2021 · 1 commit
- refine structure for cuda and rocm (#37202) · a6d2fddb
  Committed by ronnywang on Dec 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
29 Nov, 2021 · 1 commit
- Add third batch of deprecated mkldnn namespace name changes (#37558) · 1ba81500
  Committed by piotrekobiIntel on Nov 29, 2021
23 Nov, 2021 · 1 commit
- Add support bias is none for fused_attention op. (#37411) · 1a8786cf
  Committed by Li Min on Nov 23, 2021
```
Add support for bias is none for fused_attention op.
```
19 Nov, 2021 · 1 commit

Add fuse_resnet_unit pass (#36818) · 3cd3bf29

Committed by wuhuanzhou on Nov 19, 2021

* GeneratePass support attr condition and mapping, test=develop

* fix coverage, test=develop

* Add fuse_resnet_unit pass, test=develop

* fix CI errors, test=develop

* fix CI errors, test=develop

* fix unittest error when compiling without CUDA, test=develop

* fix static ci error, test=develop

* limit kernel size must equal 1, test=develop

17 Nov, 2021 · 1 commit

Changed first batch of deprecated mkldnn headers and function names to new oneDNN names (#37040) · ce3ee9bb

Committed by piotrekobiIntel on Nov 17, 2021

* Change first batch of mkldnn headers and namespace names to dnnl

* Revert changes to tensor.h, which require approval

* Format changes with pre-commit

* Add int32 tests

* Fix int32 tests and call GetDataFromTensor for int32

* Fix test

16 Nov, 2021 · 1 commit

Fix attn_bias_add bug. (#37147) · a9e7a854

Committed by Li Min on Nov 16, 2021

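The fused_attention_op implementation uses bias_add, which is built on the kernel primitive API. The WriteData interface and internals of the kernel primitives later changed: the out-of-bounds check moved into a template parameter, so the call took the wrong branch, wrote out of bounds, and corrupted the contents of other GPU memory. Symptom: a single run of test_fused_attention_op_api.py almost never fails, but looping over inputs of different shapes produces wrong results intermittently, which makes the bug hard to notice.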
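
The failure mode described in this commit message can be illustrated with a small CPU-side sketch. It is not the Paddle kernel primitive code: `write_data`, `store`, `TILE`, and the `is_boundary` flag are hypothetical names that only mimic a store whose bounds check is switched by a compile-time parameter.

```python
TILE = 4  # elements each block holds in registers (hypothetical size)


def write_data(memory, regs, block, n, is_boundary):
    """Store one register tile into memory, starting at block * TILE.

    is_boundary stands in for the template parameter: indices are only
    compared against the logical length n when it is True.
    """
    for i, value in enumerate(regs):
        idx = block * TILE + i
        if is_boundary and idx >= n:
            break  # tail element: skip the store
        memory[idx] = value


def store(n, check_tail):
    """Write n ones into a buffer that is followed by a neighbouring tensor."""
    memory = [0.0] * n + [7.0] * TILE  # output tensor, then the neighbour
    for block in range((n + TILE - 1) // TILE):
        is_tail = (block + 1) * TILE > n
        # check_tail=False models a caller that always takes the unchecked branch
        write_data(memory, [1.0] * TILE, block, n, is_tail and check_tail)
    return memory[:n], memory[n:]
```

With `n = 6` the unchecked branch overwrites part of the neighbour, while `n = 8` (a multiple of `TILE`) leaves it intact. That shape dependence is consistent with the intermittent failures the message reports.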

12 Nov, 2021 · 1 commit
- [fix]fix the bug of fused_attention and fused_feedforward (#36972) · 6486e242
  Committed by zhangkaihuo on Nov 12, 2021
```
* fix bug:
1. atten: set the default value of attn_dropout_rate to None
2. ffn: add activation parameter
```
10 Nov, 2021 · 1 commit
- Fix fused_attention_op scope. (#37065) · ad44a40c
  Committed by Li Min on Nov 10, 2021
```
att, bug fix
```
08 Nov, 2021 · 1 commit
- 【fix-bug】Support attn_mask=None input cases for fused_attention_op. (#36951) · 472dcca4
  Committed by Li Min on Nov 08, 2021
```
fused_attention_op currently does not support attn_mask=None as input; this PR adds that support along with the corresponding unit-test logic.
```
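What an optional `attn_mask` means for the attention weights can be sketched as follows. This is a plain-Python reference under the assumption that the mask is additive and has the same shape as the score matrix; `attention_weights` is a hypothetical helper, not the operator's actual code.

```python
import math


def attention_weights(scores, attn_mask=None):
    """Row-wise softmax over scores, adding attn_mask first when it is given."""
    weights = []
    for i, row in enumerate(scores):
        if attn_mask is not None:
            # Additive mask: large negative values suppress a position
            row = [s + m for s, m in zip(row, attn_mask[i])]
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Calling `attention_weights(scores)` without a mask simply skips the addition, which is the behaviour the `attn_mask=None` case asks for.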
05 Nov, 2021 · 1 commit

Disable pool&conv_transpose&quantize caching (#36695) · db6c00c4

Committed by Jacek Czaja on Nov 05, 2021

* - WIP

- compilation fix

- fix

- fixes

- fix

- fix

- fix again

- fix

- another fix

- another compilation fix

- fix

- fix

- fix

- lint

* - pool2d partially stripped from cache

- pool2d partially stripped of caching

* - compilation fix

* - compilation fix

* - Fix to UT of caching

* - Enabling test_conv3d_mkldnn

* - conv_transpose stripped of cache

* - compilation fix

* - fix

* - fix

* - compilation fix

* - fix

* Reverted disabling caching of conv2d

* - compilation fix

* - ut reverted

28 Oct, 2021 · 1 commit
- Fix fused_attention_op and fused_feedforward_op bug when pre_layer_norm is false. (#36793) · ff3018d7
  Committed by Li Min on Oct 28, 2021
```
* Fix bug when pre_layer_norm is false.
```
26 Oct, 2021 · 2 commits

Add fused attention op backward and python layer. (#36498) · 5119428e

Committed by Li Min on Oct 26, 2021

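Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's op-scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) when computing q, k, and v, the shared input X is used to cut the gemm, transpose, and bias add from three calls each to one;
(2) kernel fusion is used, passing data between the different CUDA kernels through registers.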
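
The first optimization can be sketched in plain Python: because q, k, and v all multiply the same input X, the three weight matrices can be concatenated column-wise and applied in one matrix product. This is an illustrative sketch, not the operator's code. The helper names are hypothetical, the three projections are assumed to have the same width, and bias add and transpose are omitted.

```python
def matmul(a, b):
    """Row-major matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def qkv_separate(x, wq, wk, wv):
    """Three independent products, one per projection."""
    return matmul(x, wq), matmul(x, wk), matmul(x, wv)


def qkv_fused(x, wq, wk, wv):
    """One product against [Wq | Wk | Wv], then a split of the result."""
    w = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]
    out = matmul(x, w)
    d = len(wq[0])  # assumes q, k and v share the same output width
    q = [row[:d] for row in out]
    k = [row[d:2 * d] for row in out]
    v = [row[2 * d:] for row in out]
    return q, k, v
```

Both functions return the same q, k, and v; the fused variant reads X once and issues a single product instead of three.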

Move fused_attention and fused_feedforward functional api path to incubate (#36704) · 9aeca2f1
Committed by Li Min on Oct 26, 2021
```
Move the Python API interfaces added in PRs #35905 and #35843 into the incubate directory.
```
25 Oct, 2021 · 2 commits

add op: fused_feedforward(backward) (#35611) · 2dd0a46a

Committed by zhangkaihuo on Oct 25, 2021

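This PR contains the backward code for fused_feedforward.

Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias

fused_feedforward is a fused operator: it fuses and wraps the operators of the transformer feed-forward layer so that the front end exposes a single interface. Fusion cuts some memory accesses and kernel-launch time, which improves performance.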
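
For orientation, the unfused sequence of steps that such an operator replaces can be written as a plain-Python reference. The sketch assumes a post-layer-norm arrangement, ReLU as the activation, dropout disabled, and no learned layer-norm scale or bias; none of this is taken from the operator's source, and every function name below is hypothetical.

```python
import math


def linear(x, w, b):
    """x @ w + b for lists of lists; w has shape [in_features][out_features]."""
    return [[sum(xi * wi for xi, wi in zip(row, col)) + bj
             for col, bj in zip(zip(*w), b)] for row in x]


def relu(x):
    return [[max(v, 0.0) for v in row] for row in x]


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def feed_forward_reference(x, w1, b1, w2, b2):
    hidden = relu(linear(x, w1, b1))  # linear1, bias, activation (dropout omitted)
    out = linear(hidden, w2, b2)      # linear2, bias (dropout omitted)
    summed = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, out)]  # residual add
    return layer_norm(summed)
```

Each step here is a separate pass over the data. A fused operator aims to combine neighbouring steps (bias, activation, dropout, residual add, layer norm) so that intermediate results are not written back to memory between them.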

add op: fused_feedforward(forward) (#35843) · b18cbfb2

Committed by zhangkaihuo on Oct 25, 2021

This PR contains only the forward code for fused_feedforward.

Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias

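22 Oct, 2021 · 1 commit

Fused attention op forward (#35905) · d4906214

Committed by Li Min on Oct 22, 2021

Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's op-scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) when computing q, k, and v, the shared input X is used to cut the gemm, transpose, and bias add from three calls each to one;
(2) kernel fusion is used, passing data between the different CUDA kernels through registers.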

21 Oct, 2021 · 1 commit

Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (#36373) · 921c0917

Committed by niuliling123 on Oct 21, 2021

* Update the implementation of reduceAnyKernel according to kernel primitive api
* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1

15 Oct, 2021 · 1 commit
- Add ResNetUnit Python API (#35426) · 12882b2f
  Committed by Zhang Zheng on Oct 15, 2021
14 Oct, 2021 · 1 commit
- Add the complete code and related files of resnet_unit_op (#36366) · 12e6dbbc
  Committed by Zhang Zheng on Oct 14, 2021
12 Oct, 2021 · 1 commit
- Change the input param of fusion op interface from pointer to tensor (#36349) · 3e2dec5b
  Committed by Zhang Zheng on Oct 12, 2021

Crayon鑫 / Paddle is in sync with its fork source
