Commits · 7f1234563ff3aab32168a6fbaeb57d73748981c3 · Crayon鑫 / Paddle

13 Jan, 2022 · 9 commits

[bug fix] fix unfold bug in compile time (#38907) · 7f123456
Committed by shangliang Xu on Jan 13, 2022

[NPU] fix tril_triu (#38864) · eaccdc71
Committed by furnace on Jan 13, 2022
```
[NPU] fix tril_triu
```

[NPU] fix expand op (#38526) · 7a5af630
Committed by furnace on Jan 13, 2022
```
* [NPU] fix expand op

* [NPU] optimize codes

* [NPU] optimize codes
```

[pten]Remove pten/include dir files (#38878) · 7e0292ea

Committed by chentianyu03 on Jan 13, 2022

* move dot_dev api into dot_kernel.h

* add infermate header

* modify to dotkerel in dot_op.h

* mvoe conj dev api into complex_kernel.h

* move sign dev api into  sign_kernel.h

* move scale dev api into kernel.h and remove infermete.h

* rm paddle/pten/include/math.h

* rm paddle/pten/include/math.h

* rm include dir

* rm paddle/pten/include/math.h

* fix conflict with develop branch

* rm devContext in conj_op.h

* add the missing complex_kernel header

[fleet_executor] fix uninitialized pointer (#38904) · a6cf6cdd
Committed by LiYuRio on Jan 13, 2022

roi_align aligned supported (#38905) · 08dcea18
Committed by wenbin on Jan 13, 2022
```
roi_align aligned supported
```

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

Fix mkldnn invalid infershape impl (#38837) · 281644cd
Committed by Chen Weihang on Jan 13, 2022
```
* fix mkldnn invalid infershape

* add unittest for mkldnn in new executor

* add import os
```

splits allocation for pten, test=develop (#38853) · 277cf900
Committed by 石晓伟 on Jan 13, 2022

12 Jan, 2022 · 18 commits

[part 3]change type of function args (#38887) · 0efcae86
Committed by Zhang Ting on Jan 12, 2022
```
* code clean

* [part 3]change type of function args
```

pscore perfermance optimization (#38582) · f1201482
Committed by zhaocaibei123 on Jan 12, 2022

[IPU] add more ops (#38831) · 050fd168

Committed by Allen Guo on Jan 12, 2022

* support more ops

* Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* update date
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

the_one_ps dirs reconstruct (#38804) · 50609214

Committed by ziyoujiyi on Jan 12, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

Fix conv act int8 scale (#38331) · 4825addd
Committed by Sylwester Fraczek on Jan 12, 2022
```
* fix conv act int8 scale

* add unit test for conv+hard_swish
```

support 5d for nearest interp (#38868) · d296456c

Committed by xiaoting on Jan 12, 2022

* support 5d for nearest

* update nearest3d unittest, test=develop

* fix approve ci, test=develop

* fix approve ci, test=develop

optimize elementwise_max_grad using new interfaces (#37906) · 4a64ca1e

Committed by Lijunhui on Jan 12, 2022

* init elem_max_grad op

* optimize code and reply review comments

* ternary functors

* apply new reduce func

* move functor to .h

* multi-outputs init

* rearrange code

* modifed functors

* optimizer code

* pass nullptr

* revert the last change as seg fault occurs

* optimize code

* remove inplace

* remove comments

[PTen] Remove hybird dir (#38863) · 5f5f626b
Committed by Chen Weihang on Jan 12, 2022
```
* remove hybird dir

* resolve conflit
```

optimize elementwise_min_grad using new reduce interface (#38236) · c2f825d7
Committed by Lijunhui on Jan 12, 2022
```
* ini commit

* multi-outputs init commit

* optimize code

* remove inplace
```

[part 6]change type of function args (#38891) · 12c5b1fe
Committed by Zhang Ting on Jan 12, 2022

[pten]Move dot, conj, sign dev_api into kernel.h (#38862) · 5fc8bbf7

Committed by chentianyu03 on Jan 12, 2022

* move dot_dev api into dot_kernel.h

* add infermate header

* modify to dotkerel in dot_op.h

* mvoe conj dev api into complex_kernel.h

* move sign dev api into  sign_kernel.h

support test_auto_prune_partial (#38871) · 4640955c
Committed by Jiabin Yang on Jan 12, 2022

[PTen]Refactor impl of elementwise op grad_kernel (Part1) (#38873) · 676903d5
Committed by YuanRisheng on Jan 12, 2022
```
* refactor the impl of elementwise grad kernel

* refactor impl of elementwise grad kernel(cuda)

* fix compile bugs
```

[part 4]change type of function args (#38888) · a250c56c
Committed by Zhang Ting on Jan 12, 2022

[part 2]change type of function args (#38886) · 86434818
Committed by Zhang Ting on Jan 12, 2022

[part 1]change type of function args (#38885) · df5d55bb
Committed by Zhang Ting on Jan 12, 2022

Adjust warpper of gpu_lanuch_config (#38654) · f5166284

Committed by limingshu on Jan 12, 2022

* first commit

* fix wrong filename

* fix the wrong spell name

* fix gpu config warper

* modify according to pr advices

* fix GpuLauchConfig1D api bugs

* change the config for dropout grad

* fix bugs

* modification according to pr advices

* modification according to pr advices

Os info (#38779) · 0d8d1e0e

Committed by liutiexing on Jan 12, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* os_info update

* update

* update

* update

* update

* update

* fix

* update

* update for windows

* fix windows

* update

* update
Co-authored-by: liutiexing <liutiexing@google.com>

11 Jan, 2022 · 8 commits

refactor reshape grad kernel (#38833) · 8cc09552
Committed by YuanRisheng on Jan 11, 2022

【PTen】Add dot and matmul grad kernel in pten (#38713) · be817719

Committed by zyfncg on Jan 11, 2022

* refactor matmul directory in pten

* fix merge conflict

* add dot_grad kernel

* add dot_grad kernel in pten

* add matmul_grad kernel

* update the code

* delete useless code in fluid

* fix some bug of running matmul grad kernel

* fix merge conflict

* refactor some code

* refactor code

Fix bug in elementwise_mul/div_grad when inplace strategy (#38840) · 7915d180
Committed by Zhang Zheng on Jan 11, 2022
```
* fix bug when inplace strategy

* fix

* fix

* fix

* fix

* fix
```

Modified Kernel Primitive API and elementwise for xpu2 #38688 · 3eaf8d2c
Committed by niuliling123 on Jan 11, 2022

Remove useless headers for some grad ops (#38823) · 9f34a070

Committed by limingshu on Jan 11, 2022

* fix the wrong filename

* first commit

* first commit

* remove rest useless headers

* for ci approval

support vs2019 compilation in windows (#38719) · 0ad363b1
Committed by Sing_chan on Jan 11, 2022
```
* support vs2019 compilation in windows

* not modify pow_op's original compute logic
```

[Eager] fix some eager logic (#38576) · d3686471

Committed by wanghuancoder on Jan 11, 2022

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* eager test case

* support inference test

* refine test and fix initializer failed

* modify eagertensor patch method

* add eagertensor.clear_grandint, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* support create varbase and fix retain grad error

* call monkey_patch_varbase in _test_eager_guard, test=develop

* fix windows error

* split clear_gradient to clear_gradient and zero_grads, test=develop

* refine, test=develop

* refine, test=develop

* support test_imperative_basic test in eager mode

* remove additional log in variable.h

* remove additional log in variable.h

* remove additional code create in merge

* eager

* fix some eager logic, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: JiabinYang <360788950@qq.com>

roi_align fix (#38788) · ffbc2122
Committed by fengkuangxiaxia on Jan 11, 2022

10 Jan, 2022 · 5 commits

add retry on pull dense sync (#38793) · 0a7cb901
Committed by yaoxuefeng on Jan 10, 2022

Add gpu kernel for new api : linalg.lstsq (#38621) · 405103d8
Committed by Haohongxiang on Jan 10, 2022
```
* add lstsq gpu kernel

* update

* add docs_en

* modify ut

* fix bugs

* modify example in docs_en

* remove lstsq_op.cu from ROCM cmake

* modify docs_en

* modify docs_en

* modify docs_en

* remove unneccessary TensorCopy
```

[Fleet Executor] Modified python cache strategy to support multi carriers (#38839) · c50c22b0
Committed by LiYuRio on Jan 10, 2022

[fleet_executor] framework for big model inference (#38795) · ededcda2
Committed by Yuang Liu on Jan 10, 2022

refactor the forward implementation of reshape npu op (#38748) · 31b1f707
Committed by baoachun on Jan 10, 2022
```
* refactor the forward implementation of reshape npu op

* update reshape npu op

* update reshape npu op
```

Crayon鑫 / Paddle · In sync with the fork's source project