Commits · df5152551d933487c7e9f0edd47c7066f2c95f86 · Crayon鑫 / Paddle

Jan 21, 2022 · 8 commits
- modify DivideFunctor to match ElementwiseSameDims template (#39041) · df515255
  Committed by Zhang Ting on Jan 21, 2022
- Keep strided_slice op behavior consistent with slice op when starts input is less than -rank (#39066) · b47fb764
  Committed by TeslaZhao on Jan 21, 2022
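The strided_slice change concerns `starts` values so negative that they point before the beginning of the axis. As a minimal sketch, assuming slice-style semantics for a positive stride that mirror Python's own slicing (add the axis length to a negative start, then clamp into `[0, dim_size]`), the normalization could look like this. `normalize_start` is a hypothetical helper, not Paddle code, and the sketch reads the bound in the commit message as the length of the sliced axis.

```python
def normalize_start(start: int, dim_size: int) -> int:
    """Normalize a slice start index for a positive stride (hypothetical helper).

    Negative indices count back from the end of the axis; anything still
    negative afterwards (start < -dim_size) is clamped to 0, and anything
    past the end is clamped to dim_size.
    """
    if start < 0:
        start += dim_size
    return min(max(start, 0), dim_size)


if __name__ == "__main__":
    data = list(range(5))
    for s in (-10, -5, -2, 0, 3, 7):
        begin = normalize_start(s, len(data))
        # Cross-check against Python's built-in slice normalization.
        assert begin == slice(s, None).indices(len(data))[0]
        print(s, "->", begin, data[begin:])
```

Clamping instead of raising an error is what makes an over-large negative start behave like "from the beginning", which is the behavior Python's `list[-10:]` already has.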
- [MLU]add mlu ci dockerfile (#39021) · fdab43b5
  Committed by fwenguang on Jan 21, 2022
```
* [MLU]add mlu ci dockerfile

* fix comment

* add cncl
```
- [PTen]Migrate Dim and DDim from paddle::framework into pten namespace (#39053) · 4e23ba32
  Committed by Aurelius84 on Jan 21, 2022
```
* Migrate Dim and DDim from paddle::framework into pten namespace

* fix paddle::framework::Array

* fix framework::Array
```
- fix npu c_allgather int64 (#39099) · 89f903da
  Committed by ronnywang on Jan 21, 2022
- add block and grid loop for index_sample kernel to deal with a large-shape tensor (#37816) · 4adeff06
  Committed by FlyingQianMM on Jan 21, 2022
```
* add block and grid loop for index_sample kernel to deal with a large-shape tensor

* fix code format

* limit grid dim
```
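The index_sample change adds block and grid loops so that a launch with a capped grid ("limit grid dim") can still process a large tensor. A common way to do this in CUDA is a grid-stride loop. The Python sketch below only simulates that indexing pattern to show every element is visited exactly once; it illustrates the general technique under that assumption, is not the kernel from #37816, and the launch sizes are arbitrary.

```python
from typing import List


def grid_stride_indices(n: int, grid_dim: int, block_dim: int) -> List[int]:
    """Record the element indices a CUDA-style grid-stride loop would visit.

    Thread (block_idx, thread_idx) starts at its global id and advances by the
    total thread count, so a grid capped at `grid_dim` blocks still covers all
    `n` elements, each exactly once.
    """
    stride = grid_dim * block_dim  # total number of launched threads
    visited: List[int] = []
    for block_idx in range(grid_dim):        # plays the role of blockIdx.x
        for thread_idx in range(block_dim):  # plays the role of threadIdx.x
            i = block_idx * block_dim + thread_idx
            while i < n:
                visited.append(i)
                i += stride
    return visited


if __name__ == "__main__":
    n = 10_007  # deliberately larger than the 4 * 256 = 1024 launched threads
    indices = grid_stride_indices(n, grid_dim=4, block_dim=256)
    print(len(indices), sorted(indices) == list(range(n)))
```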
- [MLU]add batch_norm mlu kernel (#39070) · 29796efe
  Committed by fwenguang on Jan 21, 2022
- [PTEN] Add cpu context (#38979) · 064bc4b8
  Committed by Wilber on Jan 21, 2022
```
* add cpu_context.

* update

* update

* update

* update

* update

* fix ci problem

* fix npu ci problem

* update

* fix ci compile
```
Jan 20, 2022 · 7 commits
- [MLU]add mlu kernel for top_k and top_k_v2 (#39065) · e02dec01
  Committed by fwenguang on Jan 20, 2022
- [MLU]add mlu kernel for cast and scale op (#38961) · e3e50ea8
  Committed by fwenguang on Jan 20, 2022
- [Pten] Migrate bfloat16/float16/complex from paddle::platform into pten::common (#39044) · f1143f0c
  Committed by Aurelius84 on Jan 20, 2022
```
* Migrate bfloat16/float16/complex from platform into pten::common

* fix typo

* fix code style
```
- mod communicator (#39064) · 2a9c993e
  Committed by yaoxuefeng on Jan 20, 2022
- Fix master weight bug for multi_tensor optimizer(momentum, adam) (#38991) · 6b0c57cf
  Committed by zhangbo9674 on Jan 20, 2022
```
* fix mp

* support merged_momentum for mp
```
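Some context for the master-weight fix: in mixed-precision training an optimizer usually updates a higher-precision "master" copy of each fp16 parameter, because small updates applied directly to fp16 values can be rounded away entirely. The sketch below demonstrates that effect with plain SGD instead of momentum or Adam, for brevity. It emulates fp16 with the half-precision `"e"` format of Python's `struct` module and uses Python floats (float64) as a stand-in for fp32 master weights. It illustrates the concept only and does not reproduce the multi_tensor optimizer or the specific bug fixed in #38991.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sgd_fp16_only(w: float, step: float, n_steps: int) -> float:
    """Apply each update directly to an fp16 parameter (no master copy)."""
    w = to_fp16(w)
    for _ in range(n_steps):
        w = to_fp16(w - step)  # a tiny step can round back to the old value
    return w


def sgd_with_master(w: float, step: float, n_steps: int) -> float:
    """Accumulate updates in a high-precision master copy, cast to fp16 once."""
    master = w  # float64 here, standing in for an fp32 master weight
    for _ in range(n_steps):
        master -= step
    return to_fp16(master)


if __name__ == "__main__":
    print("fp16 only:  ", sgd_fp16_only(1.0, 1e-4, 100))
    print("with master:", sgd_with_master(1.0, 1e-4, 100))
```

Near 1.0 the gap between adjacent fp16 values is about 5e-4, so a 1e-4 step is lost on every iteration unless it is accumulated somewhere with more precision.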
- remove if !defined(WIN32) (#39058) · 90e9233a
  Committed by sneaxiy on Jan 20, 2022
- fix gelu compile on CUDA 10 (#39045) · 0617a3ed
  Committed by sneaxiy on Jan 20, 2022
Jan 19, 2022 · 1 commit
- Add conv2d_transpose and conv2d_transpose_grad for XPU,test=kunlun (#38956) · c7de7440
  Committed by zhangyikun02 on Jan 19, 2022
Jan 18, 2022 · 7 commits
- Mish FP32/BF16 kernel, conv and fc fuse passes (#38623) · 1d18bc2c
  Committed by Sławomir Siwek on Jan 18, 2022
```
* Mish

* Change exp() library

* mish fuse pass

* mish attrs

* fixes

* mishop maker

* remove attrs

* mish kernal for bf16

* fc+mish fuse

* fix code format error

* Resolve merge conflicts

* Update mish operator version

* update mish variable to new naming convention
```
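Mish, which #38623 adds as FP32/BF16 kernels plus conv and fc fuse passes, is commonly defined as `mish(x) = x * tanh(softplus(x))` with `softplus(x) = ln(1 + e^x)`. The scalar sketch below assumes that standard definition and uses a numerically stable softplus so large inputs do not overflow `exp()`. It is a reference for the math only, not Paddle's kernel.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x): exp() only ever sees a non-positive argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for v in (-5.0, -1.0, 0.0, 1.0, 5.0, 1000.0):
        print(f"mish({v}) = {mish(v):.6f}")
```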
- change CUDA implementaion of uniform/gaussian OP (#38611) · bbbd75e4
  Committed by zhouweiwei2014 on Jan 18, 2022
```
* change CUDA implementaion of uniform/gaussian OP

* fix unittest
```
- [Unify Tensors PR #8] Merged Tensor into DenseTensor, test=allcases (#38914) · 2052f1e3
  Committed by Zhanlue Yang on Jan 18, 2022
```
* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Patched python level LoDTensor

* Merge Tensor into DenseTensor

* Fixed namespace issues,test=allcases

* Fixed merge issues

* Fixed inference issues

* Fixed NPU test issues

* Fixed merge issues
```
- fix lookup_table_v2 error in kunlun2 (#38855) · df898f8b
  Committed by taixiurong on Jan 18, 2022
- Unify the functor of elementwise and logical ops. (#35767) · b1365d25
  Committed by Yiqun Liu on Jan 18, 2022
- break the circular dependency between reduce and elementwise (#38951) · a1980d9c
  Committed by YuanRisheng on Jan 18, 2022
- Speedup FP16 Gelu op using fast math and vectorized 8 kernel (#38980) · 8c20d668
  Committed by sneaxiy on Jan 18, 2022
```
* speedup gelu using fast math

* add bwd part
```
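The GELU speedup in #38980 relies on CUDA fast math and 8-wide vectorized kernels, which a Python sketch cannot model. What the sketch below does show is the numerical side of trading exactness for speed: the erf-based GELU definition next to the widely used tanh approximation, and the size of their gap on [-6, 6]. Whether #38980 uses this particular approximation is an assumption; the commit message does not say.

```python
import math


def gelu_exact(x: float) -> float:
    """GELU via the Gaussian CDF: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh approximation of GELU, which avoids evaluating erf."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    xs = [i / 100.0 for i in range(-600, 601)]
    max_err = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
    print(f"max |exact - tanh approx| on [-6, 6]: {max_err:.2e}")
```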
Jan 17, 2022 · 6 commits
- fix for conv2D training error (#38938) · 944ea436
  Committed by jakpiase on Jan 17, 2022
- expose input variables that only shape needed in each subgraph that compiled by CINN (#38367) · b4cb3589
  Committed by CtfGo on Jan 17, 2022

  collecting input variables that only shape needed of each subgraph that compiled by CINN in build_cinn_pass, and expose them to memory optimization of framework passes by declaring DECLARE_INPLACE_OP_INFERER in cinn_launch op.

- remove MakePtenDenseTensor in op compute (#38910) · 04f042a5
  Committed by zyfncg on Jan 17, 2022

- [Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5
  Committed by Wilber on Jan 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile


- add squared_l2_norm (#38968) · 6eeb16b8
  Committed by sneaxiy on Jan 17, 2022
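As the name suggests, a squared L2 norm is the sum of squared elements, ‖x‖₂² = Σ xᵢ², which is the ordinary L2 norm without the final square root. The plain-Python sketch below computes that quantity; it is not the operator added in #38968, and `math.fsum` is used only to reduce rounding error in the sum.

```python
import math
from typing import Iterable


def squared_l2_norm(values: Iterable[float]) -> float:
    """Return sum(x_i ** 2), i.e. ||x||_2 squared, with no square root taken."""
    return math.fsum(v * v for v in values)


if __name__ == "__main__":
    x = [3.0, 4.0]
    sq = squared_l2_norm(x)
    print(sq, math.sqrt(sq))  # squared norm, then the ordinary L2 norm
```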

- [part 5]change type of function args (#38889) · ac933235
  Committed by Zhang Ting on Jan 17, 2022

Jan 15, 2022 · 1 commit
- [Unify Tensors PR #7] Merged LoDTensor with Tensor, test=allcases (#38880) · 88966b28
  Committed by Zhanlue Yang on Jan 15, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Fixed example code failure

* Polished function names, removed duplicated forward declarations


Jan 14, 2022 · 3 commits
- [XPU]add stack_grad op for kunlun2,*test=kunlun (#38674) · 87ee3e4f
  Committed by Zhangjingyu06 on Jan 14, 2022

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun,*test=kunlun

* [XPU]add stack_grad op for kunlun2,*test=kunlun

Co-authored-by: QingshuChen <chenqingshu@baidu.com>


- refactor impl of elementwise op part2 (#38898) · 556d5097
  Committed by YuanRisheng on Jan 14, 2022

- [MLU]Add mean and reduce_mean op (#38872) · 7f8d5bc8
  Committed by qipengh on Jan 14, 2022

* [MLU]: add mean and reduce mean op

* [MLU]add mlu pytest dir in CMakeLists.txt

* [MLU]fix tensor data

* [MLU]fix TensorToPyArray and license


Jan 13, 2022 · 7 commits
- [bug fix] fix unfold bug in compile time (#38907) · 7f123456
  Committed by shangliang Xu on Jan 13, 2022
- [NPU] fix tril_triu (#38864) · eaccdc71
  Committed by furnace on Jan 13, 2022
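For reference, tril and triu zero out the elements above or below a chosen diagonal. The sketch below assumes NumPy-style semantics with a `diagonal` offset (0 is the main diagonal, positive values shift it up and to the right). It is plain Python on nested lists and says nothing about what was wrong in the NPU kernel fixed by #38864.

```python
from typing import List

Matrix = List[List[float]]


def tril(m: Matrix, diagonal: int = 0) -> Matrix:
    """Keep entries on and below the chosen diagonal (col - row <= diagonal)."""
    return [[v if j - i <= diagonal else 0 for j, v in enumerate(row)]
            for i, row in enumerate(m)]


def triu(m: Matrix, diagonal: int = 0) -> Matrix:
    """Keep entries on and above the chosen diagonal (col - row >= diagonal)."""
    return [[v if j - i >= diagonal else 0 for j, v in enumerate(row)]
            for i, row in enumerate(m)]


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(tril(m))     # lower triangle including the main diagonal
    print(triu(m, 1))  # strictly upper triangle
```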
- [NPU] fix expand op (#38526) · 7a5af630
  Committed by furnace on Jan 13, 2022
```
* [NPU] fix expand op

* [NPU] optimize codes

* [NPU] optimize codes
```

- [pten]Remove pten/include dir files (#38878) · 7e0292ea
  Committed by chentianyu03 on Jan 13, 2022

* move dot_dev api into dot_kernel.h

* add infermate header

* modify to dotkerel in dot_op.h

* mvoe conj dev api into complex_kernel.h

* move sign dev api into  sign_kernel.h

* move scale dev api into kernel.h and remove infermete.h

* rm paddle/pten/include/math.h

* rm paddle/pten/include/math.h

* rm include dir

* rm paddle/pten/include/math.h

* fix conflict with develop branch

* rm devContext in conj_op.h

* add the missing complex_kernel header


- Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b
  Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

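Background for the oneDNN mul kernel: Paddle's legacy `mul` operator is understood to flatten each input to a 2-D matrix using its `x_num_col_dims` / `y_num_col_dims` attributes before doing a matrix multiply. The shape-only sketch below assumes those semantics (leading dims become rows, trailing dims become columns). The helper names are hypothetical, and this is not code from #38552.

```python
from math import prod
from typing import Sequence, Tuple


def flatten_to_2d(dims: Sequence[int], num_col_dims: int) -> Tuple[int, int]:
    """Collapse dims[:num_col_dims] into rows and dims[num_col_dims:] into columns."""
    return prod(dims[:num_col_dims]), prod(dims[num_col_dims:])


def mul_out_shape(x_dims: Sequence[int], y_dims: Sequence[int],
                  x_num_col_dims: int = 1, y_num_col_dims: int = 1) -> Tuple[int, ...]:
    """Output shape of a mul-style op: flatten both inputs to 2-D, then matmul."""
    x_rows, x_cols = flatten_to_2d(x_dims, x_num_col_dims)
    y_rows, y_cols = flatten_to_2d(y_dims, y_num_col_dims)
    if x_cols != y_rows:
        raise ValueError(f"inner dims differ: X is {x_rows}x{x_cols}, Y is {y_rows}x{y_cols}")
    # The leading dims of X and the trailing dims of Y are restored in the output.
    return tuple(x_dims[:x_num_col_dims]) + tuple(y_dims[y_num_col_dims:])


if __name__ == "__main__":
    print(flatten_to_2d((2, 3, 4), 2))                  # X viewed as a 6x4 matrix
    print(mul_out_shape((2, 3, 4), (4, 5), 2, 1))       # (6x4) @ (4x5), reshaped
```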

- Fix mkldnn invalid infershape impl (#38837) · 281644cd
  Committed by Chen Weihang on Jan 13, 2022
```
* fix mkldnn invalid infershape

* add unittest for mkldnn in new executor

* add import os
```
- splits allocation for pten, test=develop (#38853) · 277cf900
  Committed by 石晓伟 on Jan 13, 2022

Crayon鑫 / Paddle is in sync with the upstream repository it was forked from.
