Commits · d25a7f9ea7a171caa7b37fd9f624b28d25aa293a · Crayon鑫 / Paddle

Feb 11, 2022 · 5 commits
- [Pten] move operators/math/math_function_* to pten/kernels/func (#39300) · d25a7f9e
  Committed by Feiyu Chan on Feb 11, 2022
```
* move operators/math/math_function_* to pten/kernels/func
* namespace from `paddle::operators::math` to `pten::funcs`
```
- Optimize performance of softmax_bwd when axis!=-1 (#38609) · 2ea15fc9
  Committed by Zhang Zheng on Feb 11, 2022
```
* Optimize performance of softmax_bwd when axis!=-1

* fix

* fix

* fix

* fix
```
- Optimize bilinear interpolation forward (#39243) · a1174973
  Committed by Lijunhui on Feb 11, 2022
```
* bilinear_fw init

* optimize code

* pre-compute linear_interp input index
```
- [PTen] Move grad GetExpectedPtenKernelArgs into pten (#39418) · 667bd962
  Committed by Chen Weihang on Feb 11, 2022
```
* move grad get expected pten kernel args

* fix reduce sum error

* fix element_sub_grad failed

* revert kernel judge change
```
- Support different dtypes of inputs for elementwise ops (#38859) · bf305033
  Committed by Zhang Ting on Feb 11, 2022
```
* improve backward performance

* support different dtypes for elementwise ops
```
Feb 10, 2022 · 7 commits

[MLU] add mlu kernel for accuracy op (#39337) · 383de295
Committed by fwenguang on Feb 10, 2022
```
* [MLU] add mlu kernel for accuracy op

* fix license format

* fix error message
```
[NPU] add reduce_min (#39019) · 2b8b16d7
Committed by furnace on Feb 10, 2022
```
[NPU] add reduce_min
```
move Masked select to pten (#39193) · e2ad433b

Committed by hong on Feb 10, 2022

* move masked select cpu kernel

* add masked selected gpu kernel; test=develop

* fix bugs; test=develop

* bug fix; test=develop

* bug fix; test=develop

* add namespace to set mask array; test=develop

* fix bug; test=develop

* fix bugs; test=develop

* fix ddim bug; test=develop

* fix npu op bug; test=develop

* fix xpu dependency bug; test=develop

* move kernel args to sig.cc; test=develop

Modify the unsqueeze dimension of input data in conv1d NCL And NLC format (#38425) · 224bc511

Committed by crystal on Feb 10, 2022

* optimize conv1d forward

* add conv opt

* Optimize memory copy

* delete share data with

* set num_filters=512

* add nlc optimize

* Optimize num_filter=512 data on A100 and V100

* Fix the workspace_size size setting of filter

[bf16] add bf16 kernel: squeeze & unsqueeze & stack (#39402) · 59c7aea5
Committed by zhangbo9674 on Feb 10, 2022
```
* add squeeze unsqueeze stack

* add unittest

* add cpu kernel
```
[bf16] add bf16 kernel: dropout & reshape & slice (#39395) · e8ac7fc3

Committed by zhangbo9674 on Feb 10, 2022

* add dropout

* add reshape

* add slice

* refine slice unittest

* refine slice unittest

* add cpu bf16 kernel

[pten] update isnan registration (#39419) · 14ed2f54
Committed by Leo Chen on Feb 10, 2022
```
* update isnan registration

* fix compile
```
Feb 09, 2022 · 13 commits
- Optimize performance of softmax_fwd when axis!=-1 (#38602) · 8e1b0204
  Committed by Zhang Zheng on Feb 09, 2022
```
* Optimize performance of softmax_fwd when axis!=-1

* use functor

* support hip

* fix functor
```
- Replace EigenBroadcast with ElementwiseBroadcast in ReduceGrad (#39255) · 772be4f5
  Committed by niuliling123 on Feb 09, 2022
- [MLU] add mlu kernel for c_comm_init op (#39364) · 1bd7a143
  Committed by mhhhh1 on Feb 09, 2022
- [MLU] add gaussian_random mlu kernel (#39338) · c35b4b8e
  Committed by fwenguang on Feb 09, 2022
- [mlu] add mlu kernel for momentum op (#39331) · f8ba12e5
  Committed by fwenguang on Feb 09, 2022
- [mlu] add mlu kernel for elementwise_add (#39313) · d47a511a
  Committed by fwenguang on Feb 09, 2022
- Replace EagerTensor with Tensor (#39376) · 945a3ce9
  Committed by Jiabin Yang on Feb 09, 2022
```
* merge legacy to fluid

* Remove legacy code

* Remove legacy code

* Remove DataType test

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer
```
- Rename partial function name TensorReduceFunctorImpl to TensorReduceImpl. (#39387) · 6354f81c
  Committed by Yiqun Liu on Feb 09, 2022
- Move trace op to pten (#39227) · d7dddf94
  Committed by hong on Feb 09, 2022
```
* add trace op

* bug fix

* bug fix; test=develop

* thrust bug fix; test=develop

* remove useless register; test=develop

* fix bug; test=develop

* update trace kernel; test=develop

* move kernel args to trace_sig; test=develop
```
- move stream into pten (#39392) · 266955a9
  Committed by Chen Weihang on Feb 09, 2022
- add more int type support for softmax_with_cross_entropy (#39409) · eaa3fd45
  Committed by sneaxiy on Feb 09, 2022
- convert paddle model to mlir paddle dialect (#39216) · 2be20e20
  Committed by huzhiqiang on Feb 08, 2022
- Move norm to pten (#39324) · ece200b3
  Committed by hong on Feb 09, 2022
```
* add norm cpu

* update code;

* norm bug fix

* move norm op to pten; test=develop

* move norm op to pten; test=develop

* add norm util; test=develop

* fix norm npu bug; test=develop

* fix norm kernel bug; test=develop

* move kernel args to pten; test=develop

* move kernel args to pten sig; test=develop
```
Feb 08, 2022 · 7 commits

Make Embedding layer support more int ids type (#39381) · 60f1461a
Committed by sneaxiy on Feb 08, 2022
```
* add more int id type support for embedding

* add ut

* add more ut

* fix ci error
```
Rename partial function name TensorReduceFunctorImpl to TensorReduceImpl. (#39388) · f71241b9
Committed by Yiqun Liu on Feb 08, 2022

Fix to #38126 (#39097) · f884edb9

Committed by Jacek Czaja on Feb 08, 2022

* - 38126 potential fix

* - fix

* - build fix

* - another candidate fix

* - compilation fix

* - another fix

* - Fix to activation of NHWC being first oneDNN op in chain on oneDNN ops

* - compilation fix

* - added NHWC rotating for elementwise being first op

* - compilation fix

* - compilation fix

* - Added UT

* - cosmetic fixes

[bf16] add bf16 cuda kernel: concat and split (#39380) · de0bad2a
Committed by zhangbo9674 on Feb 08, 2022
```
* add concat & split

* add concat kernel

* add concat unittest

* add split unittest
```
[PTEN] Update gpu_context. (#39359) · 24103cbb
Committed by Wilber on Feb 08, 2022
```
* gpu_context..

* update

* update

* update
```
Replace clip, bce_loss, full and full_like with elementwise (#39197) · 424700ff
Committed by niuliling123 on Feb 08, 2022
```
* Replace clip, bce_loss, full and full_like with elementwise
```
[PTen] Support SelectedRows in execution and remove scale OpKernel and InferShape (#39351) · 41eb2595

Committed by Chen Weihang on Feb 08, 2022

* adapt selectedrows in execution

* impl selected rows branch

* support selectedrow in infershape utils

* fix device compile failed

* fix new exe test failed

* revert some changes

Feb 07, 2022 · 2 commits
- add sequence_conv op in xpu place (#39025) · fee4316d
  Committed by tanzhipeng on Feb 07, 2022
- Added Adam FP32 JIT assembly kernel (#39158) · ebd14743
  Committed by jakpiase on Feb 07, 2022
```
* Added adam kernel

* CI rerun
```
Feb 06, 2022 · 1 commit
- [PTEN] Add Gpu context (#39305) · a821c4a9
  Committed by Wilber on Feb 06, 2022

Feb 04, 2022 · 1 commit
- remove unchanged infermeta new (#39343) · 0dccdee0
  Committed by Chen Weihang on Feb 04, 2022

Feb 02, 2022 · 1 commit
- Merge legacy to fluid (#39318) · 34cce62f
  Committed by Jiabin Yang on Feb 02, 2022

Jan 30, 2022 · 1 commit
- [MLU] add softmax_with_cross_entropy mlu kernel (#39260) · aecf9967
  Committed by fwenguang on Jan 30, 2022

Jan 29, 2022 · 2 commits

Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09

Committed by Li Min on Jan 29, 2022

* Add fp16 support for scale/bias for fused_layernnorm_residual_dropout_bias op.

* Remove useless code.

* Remove useless code.

* Optimize layer_norm fwd when cols is 1024.

* Remove useless code.

* Minors.

* Minors.

* Modifications according to reviews.

* Minors.

* Optimize layer_norm bwd kernel when cols is 1024.

* Polish layer_norm_bwd_1024 kernel.

* Limit ln_bwd_1024_kernel to paddle_with_cuda.

* Fix double type compile error.

* Add optimization of ln bwd for fused_dropout_add_ln op.

* Polish codes.

[PTen] Tidy pten core headers (#39188) · dd990981

Committed by Chen Weihang on Jan 29, 2022

* open header for custom kernel

* add core utils

* tidy core code

* tidy header

* tidy include

* tidy namespace

* resolve conflict

* fix unittest and coverage

* remove platform using

* resolve conflict

* resolve conflict

* fix digamma namespace error

* fix xpu full kernel error

* fix xpu full kernel error

* polish details

* add place for lib storage

Crayon鑫 / Paddle · in sync with the fork's source project