Commits · ed0990e77b92d14d5f5cd22b411a1f3ee38d38f8 · PaddlePaddle / Paddle

07 Feb, 2022 4 commits
- INFRT/Refine TensorMap (2nd PR) (#39262) · ed0990e7
  Committed by Yan Chunwei on Feb 07, 2022
  
  ed0990e7
- Added Adam FP32 JIT assembly kernel (#39158) · ebd14743
  Committed by jakpiase on Feb 07, 2022
```
* Added adam kernel

* CI rerun
```
  ebd14743
- Enabled final state Eager Dygraph Codegen (#39355) · e15e4ed0
  Committed by Zhanlue Yang on Feb 07, 2022
  
  e15e4ed0
- [CustomOp] Support output as input argument of kernel func (#39353) · f1f74e9e
  Committed by Chen Weihang on Feb 07, 2022
```
* refactor custom op kernel func and utils

* add output sync

* adapt tensor* in utils

* fix windows symbol error
```
  f1f74e9e
06 Feb, 2022 1 commit
- [PTEN] Add Gpu context (#39305) · a821c4a9
  Committed by Wilber on Feb 06, 2022
  
  a821c4a9
04 Feb, 2022 2 commits
- 【Pten】Support data transform in C++ API (#39263) · dcff7fa8
  Committed by zyfncg on Feb 04, 2022
```
* add data_transform in pten api

* support GetKernelTypeForVar

* fix compile problem of bfloat16

* change error namespace

* add complex type transform unittest

* fix merge conflict
```
  dcff7fa8
- remove unchanged infermeta new (#39343) · 0dccdee0
  Committed by Chen Weihang on Feb 04, 2022
  
  0dccdee0
02 Feb, 2022 3 commits
- Fix fc_mkldnn format issue (#38890) · 633c71c2
  Committed by Zuza on Feb 02, 2022
  
  633c71c2
- [PTen] Remove kernel alias name (#39321) · 5dc20c27
  Committed by Chen Weihang on Feb 02, 2022
```
* remove kernel alias name

* fix deprecated error

* fix deprecated failed

* fix mean error

* resolve conflict

* fix windows failed
```
  5dc20c27
- Merge legacy to fluid (#39318) · 34cce62f
  Committed by Jiabin Yang on Feb 02, 2022
  
  34cce62f
30 Jan, 2022 8 commits

geo memory sparse table (#39250) · 9b3b53ba
Committed by zhaocaibei123 on Jan 30, 2022
```
* geo depends

* add memory geo table

* fix
```
9b3b53ba

Add a Sparse OP:sparse_csr_to_coo (#39266) · bafea65c

Committed by zhangkaihuo on Jan 30, 2022

* dense_to_sparse_coo

* optimize unit testing; support rocm

* 1. delete fluid related header file
2. update the copyright

* fix hipMemcpy

* update dense_to_sparsecoo

* add namespace sparse

* sparse_csr_to_dense

* test to_sparse_coo: csr_to_coo

* fix writing error

bafea65c

[PTen] Change all InferMeta functions (#39222) · 7e29cea9

Committed by Chen Weihang on Jan 30, 2022

* change unary infermeta

* change other infermeta

* change all infermeta format

* resolve conflict

* fix test failed

* resolve reshape conflict

* fix compile failed

* adapt auto api gen

* fix reshape failed

* fix concat failed

* resolve conflict

7e29cea9

Add a Sparse OP : to_sparse_coo (#39264) · 78132fe1

Committed by zhangkaihuo on Jan 30, 2022

* dense_to_sparse_coo

* optimize unit testing; support rocm

* 1. delete fluid related header file
2. update the copyright

* fix hipMemcpy

* update dense_to_sparsecoo

* add namespace sparse

78132fe1

delete FLAGS_run_pten_kernel (#39330) · 2d6d6fa1
Committed by Leo Chen on Jan 30, 2022

2d6d6fa1
[MLU] add softmax_with_cross_entropy mlu kernel (#39260) · aecf9967
Committed by fwenguang on Jan 30, 2022

aecf9967
feat(cncl_mlu): add cncl dev for mlu distributed backend (#39294) · d28f6f7b
Committed by mhhhh1 on Jan 30, 2022

d28f6f7b

[pten] fit get all register op kernels (#39288) · eefe5feb

Committed by Leo Chen on Jan 30, 2022

* upgrade _get_all_register_op_kernels

* add ut

* support xpu/npu

* fix device id

* enhance TransToFluidPlace

* fix compile

eefe5feb

29 Jan, 2022 6 commits

Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09

Committed by Li Min on Jan 29, 2022

* Add fp16 support for scale/bias for fused_layernnorm_residual_dropout_bias op.

* Remove useless code.

* Remove useless code.

* Optimize layer_norm fwd when cols is 1024.

* Remove useless code.

* Minors.

* Minors.

* Modifications according to reviews.

* Minors.

* Optimize layer_norm bwd kernel when cols is 1024.

* Polish layer_norm_bwd_1024 kernel.

* Limit ln_bwd_1024_kernel to paddle_with_cuda.

* Fix double type compile error.

* Add optimization of ln bwd for fused_dropout_add_ln op.

* Polish codes.

99cfcc09

Add xpu2 compiler (#37254) · 92da5055

Committed by Liu-xiandong on Jan 29, 2022

* Add XPU compiler for paddle, test=develop

* clean code

* clean useless code

* clean useless code

* clean useless code

* test

* add include path

* use clang compiler

* xpu2.cmake

* XPU2 compiler passed

* update

* update after pten

* combine WITH_XPU and WITH_XPU2

* update the fuse operation in WITH_XPU and WITH_XPU2

* update

* update

* update

* fix the merge error

* update

* update the code

* update the code

* add run_kp_kernel flag

* update

* update

* fix prepared type_ bug

* clean and update the code

* reset the kernel_primitives

* update

* clean the code

* delete useless comment

* fix the bug in WITH_XPU

* update

* update

* modify the abi

* delete some useless code

* Parameter automation in xpu compilation

* Parameter automation in xpu compilation

* delete kps in cmake

* delete useless comment

* clean the code

* clean the code

92da5055

rename utils to manual (#39320) · 96bcf2df
Committed by Chen Weihang on Jan 29, 2022

96bcf2df

[PTen] Tidy pten core headers (#39188) · dd990981

Committed by Chen Weihang on Jan 29, 2022

* open header for custom kernel

* add core utils

* tidy core code

* tidy header

* tidy include

* tidy namespace

* resolve conflict

* fix unittest and coverage

* remove platform using

* resolve conflict

* resolve conflict

* fix digamma namespace error

* fix xpu full kernel error

* fix xpu full kernel error

* polish details

* add place for lib storage

dd990981

fix kunlun2 softmax unittest bug (#39274) · 23bb2836
Committed by QingshuChen on Jan 29, 2022
```
* fix kunlun2 softmax unittest bug
*test=kunlun

* minor
```
23bb2836
[pten] fix wrong variable name in PreparePtenData (#39311) · 7b4916c4
Committed by Leo Chen on Jan 29, 2022

7b4916c4

28 Jan, 2022 13 commits

Adjusted CMakeFiles to support compilation for final state auto generated codes (#39215) · 09198b04

Committed by Zhanlue Yang on Jan 28, 2022

* Removed debug info

* Added automatic code generation for final state Eager Dygraph

* Modified backward yaml

* Added EagerUtils helper functions for final state CodeGen

* Adjusted CMakeFiles to support compilation for final state auto generated codes

* Fixed final state eager codegen

* Fixed CI problems

* Fixed yaml.load() method failure

* Turned final state codegen off for now

* Fixed minor issue

09198b04

[PTen] Update all forward argument mapping fns (#39252) · 75923a32
Committed by Chen Weihang on Jan 28, 2022
```
* update forward argument mapping

* fix compile failed

* fix test failed
```
75923a32
Remove macro for GetGpuBasePtr (#39279) · 9a001c09
Committed by From00 on Jan 28, 2022

9a001c09

Host tracer and ProfilerController (#39230) · 7c489c2e

Committed by liutiexing on Jan 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* split template

* Add Profiler and HostTracer

* update

* update

* update

* update

* fix cmake

Co-authored-by: liutiexing <liutiexing@google.com>

7c489c2e

Workqueue threadnames (#39177) · 44af74b8

Committed by liutiexing on Jan 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* Set thread name for WorkQueue

Co-authored-by: liutiexing <liutiexing@google.com>

44af74b8

fix elementwise_grad_bug (#39301) · 04a16189
Committed by YuanRisheng on Jan 28, 2022

04a16189
[PTen]Refactor scale kernel that has selected_rows input (#39278) · abfc2fe9
Committed by YuanRisheng on Jan 28, 2022
```
* refactor scale kernel that its input is selected_rows

* complement upload file
```
abfc2fe9

Move digamma to pten (#39240) · 848ae7dc

Committed by hong on Jan 28, 2022

* move digamma to pten; test=develop

* fix mutable_data bugs; test=develop

* remove useless code; test=develop

* remove kernel compute; test=develop

* fix bug; test=develop

848ae7dc

compile fix (#39272) · 91dd0f0d
Committed by wenbin on Jan 28, 2022
```
* slice

* shuffle pass enhancement
```
91dd0f0d

[PSLIB] Add Metrics Module, Support User-defined Add Metric (#38789) · 2e6be886

Committed by Fan Zhang on Jan 28, 2022

* [PSLIB] Add Metrics Module, Support User-defined Add Metric

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* modify role_maker

* update CMakeLists.txt

2e6be886

【Pten】Remove WriteBackOutput in tensor_utils (#39291) · 3ef2922b

Committed by zyfncg on Jan 28, 2022

* remove remake densetensor

* fix eager test error

* fix bug in eager

* implement AllocateFrom

* remove WriteBackOutput

* fix problem of eager

Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>

3ef2922b

[Eager] Refactor TensorAdd by template (#39282) · 0bb3e5f1

Committed by Weilong Wu on Jan 28, 2022

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* Use overload instead of template

0bb3e5f1

Auto-generate kernel signature in C++ API (#39281) · fc5fa0de
Committed by zyfncg on Jan 28, 2022

fc5fa0de

27 Jan, 2022 3 commits

implement AllocateFrom (#39280) · d89f246c
Committed by zhangkaihuo on Jan 27, 2022

d89f246c

Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on Jan 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for the PADDLE_ENFORCE and different device

* fix the rocm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict

Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

35f949b5

[pten] remove concat fluid kernel (#39268) · 552db8dc
Committed by Leo Chen on Jan 27, 2022

552db8dc

PaddlePaddle / Paddle synced successfully about 1 year ago