Commits · eefe5feb430d309f209c0dbabffa8b4aac444764 · PaddlePaddle / Paddle

Jan 30, 2022 · 1 commit

[pten] fit get all register op kernels (#39288) · eefe5feb

Committed by Leo Chen on Jan 30, 2022

* upgrade _get_all_register_op_kernels

* add ut

* support xpu/npu

* fix device id

* enhance TransToFluidPlace

* fix compile


Jan 29, 2022 · 13 commits


fix paddle.where broadcast bug (#39182) · 92253f11
Committed by ronnywang on Jan 29, 2022

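For context on the fix above: `paddle.where(condition, x, y)` broadcasts its three inputs against each other, and the commit message does not describe what went wrong. As background only, here is a minimal pure-Python sketch of the NumPy-style rule for the broadcast result shape, assuming `paddle.where` follows those semantics. `broadcast_shape` is a hypothetical helper written for illustration, not Paddle's implementation.

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Return the NumPy-style broadcast shape of the given shapes.

    Shapes are aligned from the trailing dimension; two sizes are
    compatible when they are equal or one of them is 1.
    """
    result = []
    # Walk dimensions from last to first, padding shorter shapes with 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        non_one = {d for d in dims if d != 1}
        if len(non_one) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        result.append(non_one.pop() if non_one else 1)
    return tuple(reversed(result))


# condition (3, 1), x (1, 4), y (4,) would all broadcast to (3, 4).
print(broadcast_shape((3, 1), (1, 4), (4,)))  # prints (3, 4)
```

Because sizes must be equal or 1 in each aligned dimension, `broadcast_shape((2, 3), (4, 3))` raises `ValueError`.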

Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09

Committed by Li Min on Jan 29, 2022

* Add fp16 support for scale/bias for fused_layernorm_residual_dropout_bias op.

* Remove useless code.

* Remove useless code.

* Optimize layer_norm fwd when cols is 1024.

* Remove useless code.

* Minors.

* Minors.

* Modifications according to reviews.

* Minors.

* Optimize layer_norm bwd kernel when cols is 1024.

* Polish layer_norm_bwd_1024 kernel.

* Limit ln_bwd_1024_kernel to paddle_with_cuda.

* Fix double type compile error.

* Add optimization of ln bwd for fused_dropout_add_ln op.

* Polish codes.

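The commit message does not spell out the math the 1024-column kernel computes. For reference, below is a pure-Python sketch of the standard layer-norm backward pass for a single row, where a 1024-column row simply means `n = 1024`. It is not the CUDA kernel: `layer_norm_backward_row` is a hypothetical name, and the forward pass is assumed to be `y = gamma * (x - mean) / sqrt(var + eps) + beta` with the biased variance.

```python
import math


def layer_norm_backward_row(x, dy, gamma, eps=1e-5):
    """Gradients of y = gamma * (x - mean) / sqrt(var + eps) + beta for one row.

    Returns (dx, dgamma_row, dbeta_row); the per-row dgamma/dbeta terms
    still have to be summed over all rows by the caller.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    g = [dyi * gi for dyi, gi in zip(dy, gamma)]  # dL/dx_hat
    # The two per-row reductions every element of dx depends on.
    sum_g = sum(g)
    sum_g_xhat = sum(gi * xi for gi, xi in zip(g, x_hat))
    # dx_i = inv_std / n * (n * g_i - sum(g) - x_hat_i * sum(g * x_hat))
    dx = [inv_std / n * (n * gi - sum_g - xi * sum_g_xhat)
          for gi, xi in zip(g, x_hat)]
    dgamma = [dyi * xi for dyi, xi in zip(dy, x_hat)]
    dbeta = list(dy)
    return dx, dgamma, dbeta
```

Per row, the two sums `sum_g` and `sum_g_xhat` must be reduced before any element of `dx` can be written, while `dgamma` and `dbeta` are reduced across rows instead.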

Add xpu2 compiler (#37254) · 92da5055

Committed by Liu-xiandong on Jan 29, 2022

* Add XPU compiler for paddle, test=develop

* clean code

* clean useless code

* clean useless code

* clean useless code

* test

* add include path

* use clang compiler

* xpu2.cmake

* XPU2 compiler passed

* update

* update after pten

* combine WITH_XPU and WITH_XPU2

* update the fuse operation in WITH_XPU and WITH_XPU2

* update

* update

* update

* fix the merge error

* update

* update the code

* update the code

* add run_kp_kernel flag

* update

* update

* fix prepared type_ bug

* clean and update the code

* reset the kernel_primitives

* update

* clean the code

* delete useless comment

* fix the bug in WITH_XPU

* update

* update

* modify the abi

* delete some useless code

* Parameter automation in xpu compilation

* Parameter automation in xpu compilation

* delete kps in cmake

* delete useless comment

* clean the code

* clean the code


rename utils to manual (#39320) · 96bcf2df
Committed by Chen Weihang on Jan 29, 2022


[PTen] Tidy pten core headers (#39188) · dd990981

Committed by Chen Weihang on Jan 29, 2022

* open header for custom kernel

* add core utils

* tidy core code

* tidy header

* tidy include

* tidy namespace

* resolve conflict

* fix unittest and coverage

* remove platform using

* resolve conflict

* resolve conflict

* fix digamma namespace error

* fix xpu full kernel error

* fix xpu full kernel error

* polish details

* add place for lib storage


Symbolic Hessian (#39221) · 64e7c715

Committed by Tongxin Bai on Jan 29, 2022

* [autograd] static Jacobian pass tests.

* [autograd] apply CR suggested changes.

* [autograd] more tests.

* [autograd] add CPUPlace in tests.

* [autograd] bug fixes.

* [autograd] reformatted.

* [autograd] adding Hessian, in progress.

* [autograd] Hessian passes. A double grad bug fixed.

* [autograd] fix renaming conflict in double backward pass.

* [autograd] polish tests.

* fix a bug when using brackets

* debug for ci

* [autograd] fixing Hessian test.

* polish format.
Co-authored-by: levi131 <83750468+levi131@users.noreply.github.com>
Co-authored-by: levi131 <limaolin01@baidu.com>


Removed approval request for tensor/lod_tensor modifications (#39326) · 984b16fc
Committed by Zhanlue Yang on Jan 29, 2022


fix FakeQuantAbsMax in QAT (#39307) · 34d97c57
Committed by Guanghua Yu on Jan 29, 2022

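The commit title names the op but not the bug. As background, abs-max fake quantization in quantization-aware training takes `scale = max(|x|)`, quantizes `x / scale` onto `2^(bit_length - 1) - 1` integer levels per sign, and (in the quant-dequant form) scales the result back to floats. The pure-Python sketch below assumes round-half-away-from-zero and an 8-bit default; `round_half_away` and `fake_quant_dequant_abs_max` are hypothetical names, and this is not Paddle's kernel.

```python
import math


def round_half_away(t):
    """Round to the nearest integer, ties away from zero (like C's round)."""
    return int(math.copysign(math.floor(abs(t) + 0.5), t))


def fake_quant_dequant_abs_max(x, bit_length=8):
    """Simulated symmetric abs-max quantization of a flat list of floats.

    Returns (dequantized values, scale). Each value is mapped to an integer
    level in [-bin_cnt, bin_cnt] and then back to the float range, so the
    output carries the rounding error a real low-bit kernel would introduce.
    """
    bin_cnt = 2 ** (bit_length - 1) - 1  # 127 levels per sign for 8 bits
    scale = max(abs(v) for v in x)
    if scale == 0.0:
        return list(x), scale  # all zeros: nothing to quantize
    out = []
    for v in x:
        q = round_half_away(v / scale * bin_cnt)
        out.append(q * scale / bin_cnt)
    return out, scale


values, scale = fake_quant_dequant_abs_max([0.5, -1.0, 0.25])
print(scale)  # prints 1.0
```

The largest-magnitude element always maps exactly to `±scale`; every other element moves by at most `scale / (2 * bin_cnt)`.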

Auto parallel/qkv fuse (#39080) · fdedf909

Committed by JZ-LIANG on Jan 29, 2022

* support qkv fuse

* support qkv fuse

* update completion

* update completion

* update dist_split

* rerun ci

* is_auto_compatible added

* is_auto_compatible added

fix kunlun2 softmax unittest bug (#39274) · 23bb2836
Committed by QingshuChen on Jan 29, 2022

* fix kunlun2 softmax unittest bug
  test=kunlun

* minor

Update register_kernels and kernel_library function in pten.cmake (#39259) · 6b3a6a9f
Committed by Jack Zhou on Jan 29, 2022


Add FuseReluDepthwiseConvPass and unittest (#39105) · 9d6e8202

Committed by hlygit66666 on Jan 29, 2022

* add fuse_relu_depthwise_conv_pass unittest

* fix atol and rtol

* fix according to review

* Update test_dist_fuse_relu_depthwise_conv_pass.py


[pten] fix wrong variable name in PreparePtenData (#39311) · 7b4916c4
Committed by Leo Chen on Jan 29, 2022


Jan 28, 2022 · 18 commits

Adjusted CMakeFiles to support compilation for final state auto generated codes (#39215) · 09198b04

Committed by Zhanlue Yang on Jan 28, 2022

* Removed debug info

* Added automatic code generation for final state Eager Dygraph

* Modified backward yaml

* Added EagerUtils helper functions for final state CodeGen

* Adjusted CMakeFiles to support compilation for final state auto generated codes

* Fixed final state eager codegen

* Fixed CI problems

* Fixed yaml.load() method failure

* Turned final state codegen off for now

* Fixed minor issue


Added Eager Dygraph support for user_defined_grads (#39309) · 76103c88
Committed by Zhanlue Yang on Jan 28, 2022

[PTen] Update all forward argument mapping fns (#39252) · 75923a32
Committed by Chen Weihang on Jan 28, 2022

* update forward argument mapping

* fix compile failed

* fix test failed

Remove macro for GetGpuBasePtr (#39279) · 9a001c09
Committed by From00 on Jan 28, 2022


recovery code (#39287) · 45f9c9eb
Committed by zhangkaihuo on Jan 28, 2022


fix optimizer docs (#39297) · ffa7ff9c
Committed by Roc on Jan 28, 2022


Host tracer and ProfilerController (#39230) · 7c489c2e

Committed by liutiexing on Jan 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* split template

* Add Profiler and HostTracer

* update

* update

* update

* update

* fix cmake
Co-authored-by: liutiexing <liutiexing@google.com>


Workqueue threadnames (#39177) · 44af74b8

Committed by liutiexing on Jan 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* Set thread name for WorkQueue
Co-authored-by: liutiexing <liutiexing@google.com>


fix elementwise_grad_bug (#39301) · 04a16189
Committed by YuanRisheng on Jan 28, 2022

[PTen]Refactor scale kernel that has selected_rows input (#39278) · abfc2fe9
Committed by YuanRisheng on Jan 28, 2022

* refactor scale kernel whose input is selected_rows

* complement upload file

Move digamma to pten (#39240) · 848ae7dc

Committed by hong on Jan 28, 2022

* move digamma to pten; test=develop

* fix mutable_data bugs; test=develop

* remove useless code; test=develop

* remove kernel compute; test=develop

* fix bug; test=develop

compile fix (#39272) · 91dd0f0d
Committed by wenbin on Jan 28, 2022

* slice

* shuffle pass enhancement

[PSLIB] Add Metrics Module, Support User-defined Add Metric (#38789) · 2e6be886

Committed by Fan Zhang on Jan 28, 2022

* [PSLIB] Add Metrics Module, Support User-defined Add Metric

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* modify role_maker

* update CMakeLists.txt


[Pten] Remove WriteBackOutput in tensor_utils (#39291) · 3ef2922b

Committed by zyfncg on Jan 28, 2022

* remove remake densetensor

* fix eager test error

* fix bug in eager

* implement AllocateFrom

* remove WriteBackOutput

* fix problem of eager
Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>


fix_stage2_minimize (#39285) · 90f44c6f
Committed by Baibaifan on Jan 28, 2022


[Eager] Refactor TensorAdd by template (#39282) · 0bb3e5f1

Committed by Weilong Wu on Jan 28, 2022

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* Use overload instead of template


Auto-generate kernel signature in C++ API (#39281) · fc5fa0de
Committed by zyfncg on Jan 28, 2022


Resolve unit-test timeout issues (#39292) · 543f3dea

Committed by Weilong Wu on Jan 28, 2022

* implement AllocateFrom

* fix PR-CI-Coverage timeout in 120s
Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>


Jan 27, 2022 · 8 commits


implement AllocateFrom (#39280) · d89f246c
Committed by zhangkaihuo on Jan 27, 2022


Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on Jan 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for the PADDLE_ENFORCE and different device

* fix the rocm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

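The commit introduces a k-hop neighbor sampler (renamed `graph_khop_sampler` in the bullets above) with node reindexing. As an illustration of the general technique only, here is a pure-Python sketch assuming a compressed sparse column layout (`colptr`, `row`) and a list of per-hop sample sizes. `khop_sample` and its parameters are hypothetical and do not mirror the incubate API's signature; edge ids and the GPU/UVA paths are omitted.

```python
import random


def khop_sample(colptr, row, input_nodes, sample_sizes, seed=0):
    """Sample up to sample_sizes[h] neighbors per frontier node at hop h.

    The graph is in CSC form: the neighbors of node v are
    row[colptr[v]:colptr[v + 1]].  Returns (edges, reindex), where edges is
    a list of (src, dst) pairs in original ids and reindex maps original
    ids to compact ids, with the input nodes numbered first.
    """
    rng = random.Random(seed)
    reindex = {}
    for v in input_nodes:
        reindex.setdefault(v, len(reindex))
    edges = []
    frontier = list(input_nodes)
    for size in sample_sizes:
        next_frontier = []
        for dst in frontier:
            neighbors = row[colptr[dst]:colptr[dst + 1]]
            # Keep every neighbor if there are at most `size`, else sample
            # `size` of them without replacement.
            picked = neighbors if len(neighbors) <= size else rng.sample(neighbors, size)
            for src in picked:
                edges.append((src, dst))
                if src not in reindex:
                    # Only newly discovered nodes are expanded at the next hop.
                    reindex[src] = len(reindex)
                    next_frontier.append(src)
        frontier = next_frontier
    return edges, reindex


# Toy graph: neighbors of 0 are [1, 2, 3], of 1 are [0, 4], of 2 are [4],
# node 3 has none, and node 4 has [0].
colptr = [0, 3, 5, 6, 6, 7]
row = [1, 2, 3, 0, 4, 4, 0]
edges, reindex = khop_sample(colptr, row, input_nodes=[0], sample_sizes=[2, 2])
print(edges, reindex)
```

Each node is expanded at most once, and numbering the input nodes first keeps their compact ids stable across hops, which is one common reason for a reindex step.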

[pten] remove concat fluid kernel (#39268) · 552db8dc
Committed by Leo Chen on Jan 27, 2022

Add kernelsignature constructor for windows (#39253) · 33e3f5ac
Committed by Chen Weihang on Jan 27, 2022

* add constructor for win

* change impl

* fix bug

[PTen] Remove ReMakePtenDenseTensor (#39094) · 98c1829b
Committed by zyfncg on Jan 27, 2022

* remove remake densetensor

* fix eager test error

* fix bug in eager

refactor elementwise sub grad (#39225) · 7a1e1193
Committed by YuanRisheng on Jan 27, 2022


[PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c

Committed by Aurelius84 on Jan 27, 2022

* Support allocate_from in Tensor and allocate_data in Context

* fix #ifdef CUDA

* fix cycle depends

* fix test_xxx_dev_api failed

* fix windows compiling error

* fix unittest

* modify into PImpl

* fix selected rows

* add TODO comment

* refine interface according to reviewer

[PTen] Add infermeta registry (#39204) · f3f16126
Committed by Chen Weihang on Jan 27, 2022

* add infermeta registry

* add infermeta registry

* add unittest

* polish details
