Commits · 81abaaf5dd3cd03e223f9f73e1c27ea3de0cab6d · Crayon鑫 / Paddle

15 Jun, 2022 · 3 commits
- modify index dtype from int to int64_t of concat_and_split_functor (#43479) · 81abaaf5
  Committed by Guoxia Wang on Jun 15, 2022
- add some kernels (csr*dense->csr, dense*dense->csr) of SparseTensor matmul (#42935) · 346efe96
  Committed by zhouweiwei2014 on Jun 15, 2022
```
* add some kernels (csr*dense->csr, dense*dense->csr) of SparseTensor matmul

* fix CI

* fix CI

* fix comment

* fix comment
```
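The `dense*dense->csr` case above computes a matrix product but keeps only the entries that appear in a CSR sparsity pattern. This technique is usually called sampled dense-dense matrix multiplication (SDDMM). Paddle's kernel is not shown in this log. The code below is a minimal pure-Python sketch of the technique. It assumes plain nested lists for the dense operands and uses a hypothetical `masked_matmul_csr` helper that takes the mask's CSR row pointers (`crows`) and column indices (`cols`).

```python
def masked_matmul_csr(x, y, crows, cols):
    """Compute (x @ y) only at the positions stored in a CSR mask.

    Hypothetical helper, not Paddle's API.
    x: dense M x K matrix (list of rows); y: dense K x N matrix.
    crows: CSR row pointers (length M + 1); cols: column of each stored entry.
    Returns the values array that pairs with (crows, cols).
    """
    k_dim = len(y)
    values = []
    for row in range(len(crows) - 1):
        # Stored entries of this row live in cols[crows[row]:crows[row + 1]].
        for idx in range(crows[row], crows[row + 1]):
            col = cols[idx]
            # Dot product of row `row` of x with column `col` of y.
            values.append(sum(x[row][k] * y[k][col] for k in range(k_dim)))
    return values


x = [[1, 2], [3, 4]]
y = [[5, 6], [7, 8]]          # full product would be [[19, 22], [43, 50]]
crows, cols = [0, 1, 2], [1, 0]
print(masked_matmul_csr(x, y, crows, cols))  # [22, 43]
```

Only the stored positions are computed, so the cost scales with the number of mask entries rather than with the full M x N output.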
- Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to... · 15577630
  Committed by Yiqun Liu on Jun 15, 2022
```
Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to support large tensor. (#43506)

* Change some data type from int to int64_t in GetGpuLaunchConfig1D to support large tensor.

* Use int64_t in ElementwiseKernel as index type to support large tensor.
```
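This change matters for large tensors. Once a tensor holds more than 2^31 - 1 elements, its flat offsets no longer fit in a 32-bit `int`. The sketch below emulates the effect in Python with a hypothetical `as_int32` helper that applies two's-complement wraparound. In real C++, signed `int` overflow is undefined behavior, so wraparound is only the typical symptom, not a guarantee. The tensor shape is illustrative and not taken from Paddle.

```python
INT32_MAX = 2**31 - 1


def as_int32(value):
    """Wrap an integer into the signed 32-bit range, two's-complement style (emulation only)."""
    return (value + 2**31) % 2**32 - 2**31


def flat_offset(indices, shape):
    """Row-major flat offset of `indices` within a tensor of `shape`."""
    offset = 0
    for i, dim in zip(indices, shape):
        offset = offset * dim + i
    return offset


# Hypothetical large tensor with 3 * 2**30 elements (more than INT32_MAX).
shape = (3, 1024, 1024, 1024)
last = tuple(d - 1 for d in shape)
offset64 = flat_offset(last, shape)  # fits in int64_t
offset32 = as_int32(offset64)        # what a wrapped 32-bit int would hold
print(offset64, offset32)            # 3221225471 -1073741825
```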
13 Jun, 2022 · 1 commit
- fix bug of strided_slice (#43388) · abc5d0c4
  Committed by zyfncg on Jun 13, 2022
```
* fix strided_slice bug

* fix bug
```
10 Jun, 2022 · 1 commit
- revert PR43039 (#43384) · ac75617a
  Committed by Wilber on Jun 10, 2022
07 Jun, 2022 · 3 commits
- Optimized the performance of activation op in XPU2 (#43187) · d5afc1ba
  Committed by shixingbo on Jun 07, 2022
- [multi-stream] Fix split and concat problem. (#43039) · 8c3777df
  Committed by Wilber on Jun 07, 2022
- [XPU KP] Add xpu register, any, amax, amin op test (#43204) · aec49361
  Committed by niuliling123 on Jun 07, 2022
06 Jun, 2022 · 1 commit
- Replace ReduceAmax/Amax.part.cu with KP (#43202) · 39903f72
  Committed by niuliling123 on Jun 06, 2022
05 Jun, 2022 · 1 commit
- [code format check upgrade] step2: clang-format (#42840) · a3730dc8
  Committed by Sing_chan on Jun 05, 2022
04 Jun, 2022 · 1 commit
- [code format check upgrade] step2: cmake-format (#43057) · 92568edb
  Committed by Sing_chan on Jun 04, 2022
31 May, 2022 · 1 commit
- [EinsumOp] Make EinsumOp support bfloat16. (#43085) · a4bb38cb
  Committed by xiongkun on May 31, 2022
```
* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0

* make EinsumOp support bf16

* add unittest for BF16

* add condition for test_BF16

* fix bugs

* fix
```
30 May, 2022 · 1 commit
- Optimize memcpy operation in Eigh (#42853) · 806073d6
  Committed by limingshu on May 30, 2022
```
* 1st commit

* fix useless change in header transpose_kernel_h file

* add sync
```
27 May, 2022 · 1 commit
- [Phi] Change optional tensor from `optional<const Tensor&>` to `optional<Tensor>` (#42939) · 6d78524c
  Committed by zyfncg on May 27, 2022
```
* refactor the optional tensor

* remove optional<MetaTensor> in InferMeta

* fix bug

* fix optional<vector<Tensor>>

* fix bug

* fix rmsprop

* fix amp of eager_gen

* polish code

* fix deleted code

* fix merge conflict

* polish code

* remove is_nullopt_

* fix merge conflict

* fix merge conflict
```
26 May, 2022 · 1 commit
- move instance_norm_double_grad (#43021) · b2b78cd4
  Committed by YuanRisheng on May 26, 2022
25 May, 2022 · 1 commit
- fix maybe-uninitialized warning (#42902) · f1f79b0d
  Committed by Leo Chen on May 25, 2022
```
* fix maybe-uninitialized warning

* fix compile

* fix xpu compile

* fix npu compile

* fix infer compile

* fix compile

* fix compile
```
20 May, 2022 · 3 commits
- Delete ElementwiseKernel in BroadcastKernel (#42779) · 0d878f1a
  Committed by niuliling123 on May 20, 2022
- use fp32 compute type for cublasGemmStridedBatchedEx with fp16 input/output (#42851) · f36a9464
  Committed by Leo Chen on May 20, 2022
```
* use fp32 compute type for cublasGemmStridedBatchedEx with fp16 input/output

* add flags to control compute type

* default to false

* add unit test

* default to true
```
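The motivation for an fp32 compute type can be reproduced on the CPU. fp16 carries an 11-bit significand. Once an fp16 accumulator reaches 2048, the spacing between representable values is 2, so adding 1.0 rounds back to 2048. The sketch below emulates a dot product of fp16 inputs. After every step it rounds the running sum to either fp16 or fp32, using Python's `struct` half (`'e'`) and single (`'f'`) precision formats. It only mimics the arithmetic and does not describe how cuBLAS executes `cublasGemmStridedBatchedEx`. `round_to` and `dot` are hypothetical helpers.

```python
import struct


def round_to(fmt, value):
    """Round a Python float to the precision of struct format 'e' (fp16) or 'f' (fp32)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def dot(a, b, acc_fmt):
    """Dot product of fp16 inputs, rounding the accumulator to acc_fmt after each step."""
    acc = 0.0
    for x, y in zip(a, b):
        # Inputs are fp16; their product is exact in a Python float (fp64).
        prod = round_to('e', x) * round_to('e', y)
        acc = round_to(acc_fmt, acc + prod)
    return acc


a = [1.0] * 4096
b = [1.0] * 4096
print(dot(a, b, 'e'))  # fp16 accumulator: 2048.0 (stalls)
print(dot(a, b, 'f'))  # fp32 accumulator: 4096.0
```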
- move activation kernel (#42880) · 191c441a
  Committed by YuanRisheng on May 20, 2022
19 May, 2022 · 1 commit
- [Phi] Change the output format of C++ backward api (Part2) (#42545) · 4427f1b1
  Committed by zyfncg on May 19, 2022
```
* change the output format of C++ backward api

* fix merge conflict

* fix sparse api code auto-gen

* fix eager_gen bug

* fix bug of output is null

* fix bug of conv2d_grad_impl

* fix optional grad

* fix bug of eager-gen double_grad

* fix bug

* fix multiply_double_grad bug

* fix bug of higher order derivative

* fix bug of FillZeroForEmptyGradInput

* remove redundant vector in grad_node

* fix bug of test_deformable_conv_v1_op

* fix bug of test_deformable_conv_v1_op

* some refactor
```
16 May, 2022 · 1 commit
- delete rank switch in broadcast_function.h for compile (#42645) · 8501fb00
  Committed by niuliling123 on May 16, 2022
12 May, 2022 · 1 commit
- Fix some typos in paddle/. (#42408) · 2012672c
  Committed by Shuangchi He on May 12, 2022
10 May, 2022 · 2 commits
- [PaddlePaddle Hackathon 2] 18. Add the paddle.heaviside and paddle.Tensor.heaviside APIs to Paddle (#41872) · 4892d592
  Committed by BrilliantYuKaimin on May 10, 2022
```
* Create elementwise_heaviside_op.cc

* add ElementwiseHeavisideFunctor

* Create test_elementwise_heaviside_op.py

* add the Python interface for heaviside

* add heaviside in white list

* add the signature for heaviside

* add the kernel for heaviside

* add the kernel for the heaviside gradient

* add the registration for the heaviside gradient

* adjust code format

* Update elementwise_sig.cc

* add heaviside in __all__

* Update heaviside docs

* Update math.py

* Update math.py

* Update math.py
```
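For reference, `paddle.heaviside(x, y)` evaluates the Heaviside step function elementwise. It uses `y` as the value wherever `x` is exactly zero: 0 for x < 0, y for x == 0 and 1 for x > 0. This is the same convention as `numpy.heaviside`. Below is a scalar sketch of that rule with a hypothetical `heaviside` helper, not the Paddle kernel. NaN inputs and broadcasting are deliberately left out.

```python
def heaviside(x, y):
    """Scalar Heaviside step: 0 if x < 0, y if x == 0, 1 if x > 0 (NaN not handled)."""
    if x < 0:
        return 0.0
    if x == 0:
        return y
    return 1.0


xs = [-0.5, 0.0, 2.0]
ys = [0.3, 0.3, 0.3]
print([heaviside(x, y) for x, y in zip(xs, ys)])  # [0.0, 0.3, 1.0]
```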

- broadcast_add kp performance optimization (#42097) · c7855125
  Committed by shixingbo on May 10, 2022
09 May, 2022 · 1 commit
- Modified reduce for xpu2 (#42439) · ae4d1ec1
  Committed by niuliling123 on May 09, 2022
01 May, 2022 · 1 commit
- [KP] Complete registry of elementwise ops on XPU with KP (#42056) · a3d56a9c
  Committed by Lijunhui on May 01, 2022
28 Apr, 2022 · 1 commit
- set device id of Place() to get GPUContext needed by LimitGridDim in ElemwiseGradBroadcast (#42320) · 22d3c560
  Committed by FlyingQianMM on Apr 28, 2022
```
* set device id of Place() to get GPUContext needed by LimitGridDim in ElemwiseGradBroadcast

* fix code style
```
27 Apr, 2022 · 1 commit
- Optimize performance of dygraph (v4) (#42196) · 37e2f027
  Committed by zyfncg on Apr 27, 2022
```
* optimize performance of dygraph

* optimize performance of dygraph and elementwise_add

* optimize the trace op

* fix bug

* fix bug

* fix unittest bug

* fix code format
```
25 Apr, 2022 · 2 commits
- Fix dimension merge bug in broadcast (#42143) · 2562ad5a
  Committed by limingshu on Apr 25, 2022
```
* change sequential logic

* change some quotes

* add some notations

* change wrong note style.
```
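Broadcast kernels commonly merge adjacent axes that broadcast the same way for every input, so they index fewer, larger dimensions. This log does not spell out Paddle's merging rule or the bug that was fixed. The code below is a generic sketch of the technique with a hypothetical `merge_broadcast_dims` helper. It assumes every input shape has already been left-padded with 1s to the output rank.

```python
from math import prod


def merge_broadcast_dims(out_shape, in_shapes):
    """Merge adjacent axes that broadcast the same way for every input.

    Hypothetical helper. Assumes each shape in in_shapes is left-padded with
    1s to len(out_shape) and that each input extent is 1 or equals the output's.
    """
    # Output axes of extent 1 hold no data, so they can be dropped.
    axes = [a for a, d in enumerate(out_shape) if d != 1]
    groups = []          # lists of original axes that become one merged axis
    prev_pattern = None
    for axis in axes:
        # True where an input is broadcast (extent 1) along this axis.
        pattern = tuple(shape[axis] == 1 for shape in in_shapes)
        if groups and pattern == prev_pattern:
            groups[-1].append(axis)
        else:
            groups.append([axis])
        prev_pattern = pattern
    if not groups:       # every output extent is 1
        return [1], [[1] for _ in in_shapes]
    new_out = [prod(out_shape[a] for a in g) for g in groups]
    new_ins = [[prod(shape[a] for a in g) for g in groups] for shape in in_shapes]
    return new_out, new_ins


print(merge_broadcast_dims([2, 3, 4, 5], [[2, 3, 4, 5], [1, 1, 4, 5]]))
# ([6, 20], [[6, 20], [1, 20]])
```

Merging is safe only when the broadcast pattern is identical across the merged axes. Otherwise the row-major offsets of the smaller input would no longer line up with the output.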
- fix variant compile error (#42203) · 1178f153
  Committed by Chen Weihang on Apr 25, 2022
18 Apr, 2022 · 2 commits
- [KP] Add Reduce op registry & UT for xpu_kp compilation (#41869) · b3959fe4
  Committed by Lijunhui on Apr 18, 2022
- Add sparse kernel coalesced (#41784) · 8f469ddd
  Committed by zhangkaihuo on Apr 18, 2022
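Coalescing a COO sparse tensor usually means sorting its coordinates and summing the values that share a coordinate, so each position is stored once. The sketch below shows that generic operation in pure Python with a hypothetical `coalesce_coo` helper. It is not the kernel added in #41784, whose exact behavior is not described here.

```python
def coalesce_coo(indices, values):
    """Sort COO coordinates and sum values stored at duplicate coordinates.

    Hypothetical helper. indices: list of coordinate tuples, e.g. [(row, col), ...];
    values: one number per coordinate. Returns (sorted unique coords, summed values).
    """
    summed = {}
    for coord, value in zip(indices, values):
        summed[coord] = summed.get(coord, 0) + value
    coords = sorted(summed)  # lexicographic, i.e. row-major, order
    return coords, [summed[c] for c in coords]


idx = [(1, 0), (0, 2), (1, 0), (0, 1)]
val = [1.0, 2.0, 3.0, 4.0]
print(coalesce_coo(idx, val))
# ([(0, 1), (0, 2), (1, 0)], [4.0, 2.0, 4.0])
```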
16 Apr, 2022 · 1 commit
- move fc_functor from fluid to phi.test=develop (#41856) · 21aa3adc
  Committed by 王明冬 on Apr 16, 2022
14 Apr, 2022 · 2 commits
- [KP] Add registry for elementwise_add/max/min/sub/div/mul/floordiv on XPU2 with KP lib (#41494) · fbe2c311
  Committed by Lijunhui on Apr 14, 2022
```
* register elementwise_xxx
```
- [Phi] Unify dispatch macros to visit (#41653) · 2ab986ae
  Committed by Chen Weihang on Apr 14, 2022
```
* change dispatch to visit

* resolve conflict
```
13 Apr, 2022 · 1 commit
- Add kernel sparse_mask_helper; sparse_coo_tensor_grad (#41586) · acd08a9b
  Committed by zhangkaihuo on Apr 13, 2022
12 Apr, 2022 · 2 commits
- [KP] Add Logical/compare/bitwise registry & UT (#40802) · 3749198e
  Committed by Lijunhui on Apr 12, 2022
```
* init commit no push

* collect compile errors

* bitwise UT

* fix compile problem

* cancel comments

* restore miss deletion

* fix compilation

* fix UT

* no stash in multiple branches at the same time

* fix error

* combine .cu from gpu and kps

* replace gpu by kps

* fix by Chen-weihang

* Revert "Fix kps compile error in Junhui logic compare bitwise"

* fix backend test

* rm comments
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
```
- add an inner loop for index_select_grad_init() in index_select op when dealing... · bc01242b
  Committed by FlyingQianMM on Apr 12, 2022
```
add an inner loop for index_select_grad_init() in index_select op when dealing with large-shape data (#41563)

* replace for with CUDA_KERNEL_LOOP for index_select_grad_init() in index_select op

* use CUDA_KERNEL_LOOP_TYPE

* fix code style

* replace index_select_grad_init with SetConstant
```
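`CUDA_KERNEL_LOOP` and `CUDA_KERNEL_LOOP_TYPE` in Paddle expand to a grid-stride loop. Each thread starts at its global index (`blockIdx.x * blockDim.x + threadIdx.x`) and advances by the total thread count (`blockDim.x * gridDim.x`). As a result, a launch with a bounded grid still reaches every element of a large tensor. The last bullet above ultimately replaced the custom initializer with `SetConstant`, so the sketch below only illustrates the intermediate loop approach. It emulates the GPU threads sequentially in Python. `grid_stride_indices` and `simulate_fill` are hypothetical helpers, not Paddle code.

```python
def grid_stride_indices(thread_id, num_threads, n):
    """Indices visited by one thread in a grid-stride loop over n elements."""
    i = thread_id
    while i < n:
        yield i
        i += num_threads  # stride = blockDim.x * gridDim.x


def simulate_fill(n, grid_dim, block_dim, value):
    """Sequentially emulate a 1-D kernel that writes `value` to every element."""
    out = [None] * n
    num_threads = grid_dim * block_dim
    for block in range(grid_dim):
        for thread in range(block_dim):
            tid = block * block_dim + thread  # blockIdx.x * blockDim.x + threadIdx.x
            for i in grid_stride_indices(tid, num_threads, n):
                out[i] = value
    return out


# 2 blocks of 4 threads (8 threads in total) still cover all 30 elements.
out = simulate_fill(30, grid_dim=2, block_dim=4, value=0.0)
print(all(v == 0.0 for v in out))  # True
```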

07 Apr, 2022 · 1 commit
- fix compile bug of windows cuda11.5 (#41433) · eea85814
  Committed by zhouweiwei2014 on Apr 07, 2022
03 Apr, 2022 · 1 commit
- add maximum limit for grid of index_select (#41127) · af8d2482
  Committed by FlyingQianMM on Apr 03, 2022
```
* limit grid dim for index select

* mv LimitGridDim into gpu_launch_config.h

* fix conflicts

* fix conflicts

* fix code style

* set block to 256

* fix grid setting

* set dtype of block_dim to unsigned int
```
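The launch arithmetic behind a grid limit can be sketched as follows. `limit_grid_dim` is a hypothetical stand-in for Paddle's `LimitGridDim`, whose signature is not shown in this log. Here the device limit is simply passed in, whereas real code would query the GPU for it. The clamped grid can be smaller than ceil(n / 256) blocks, so each thread must loop over several elements. `iterations_per_thread` counts those iterations. The element count and the 65535-block limit below are illustrative assumptions.

```python
BLOCK_DIM = 256  # block size taken from "set block to 256" in the commit notes


def ceil_div(a, b):
    """Ceiling of a / b for positive integers."""
    return (a + b - 1) // b


def limit_grid_dim(n, max_grid_dim, block_dim=BLOCK_DIM):
    """Blocks to launch for n > 0 elements, clamped to the device's grid limit."""
    return min(ceil_div(n, block_dim), max_grid_dim)


def iterations_per_thread(n, grid_dim, block_dim=BLOCK_DIM):
    """Grid-stride iterations the busiest thread needs to cover n elements."""
    return ceil_div(n, grid_dim * block_dim)


# Hypothetical figures: 10**9 elements and a grid limit of 65535 blocks.
grid = limit_grid_dim(10**9, 65535)
print(grid, iterations_per_thread(10**9, grid))  # 65535 60
```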

Crayon鑫 / Paddle is up to date with the forked source project.