Commits · 41eb259500445639c13da6991bfea4e559c23dd5 · PaddlePaddle / Paddle

08 Feb, 2022 · 5 commits
- [PTen] Support SelectedRows in execution and remove scale OpKernel and InferShape (#39351) · 41eb2595
  Committed by Chen Weihang on Feb 08, 2022
```
* adapt selectedrows in execution

* impl selected rows branch

* support selectedrow in infershape utils

* fix device compile failed

* fix new exe test failed

* revert some changes
```
- Support allocate CUDA managed memory (#39075) · 42910361
  Committed by From00 on Feb 08, 2022
```
* Rough implementation for experiment

* Support allocate cuda managed memory

* Fix CI error

* Modify UT

* Check whether support memory oversubscription

* Fix ROCM Compile error

* Fix ROCM Compile error

* Fix UT cuda_managed_memory_test

* Set UT timeout to 40

* Add UT OOMExceptionTest

* Set UT timeout to 50
```
- [bf16] change bf16 print behavior (#39370) · 96964ff8
  Committed by Leo Chen on Feb 08, 2022
- Fixed automatic codegen issues with grad attr_map (#39358) · 65805227
  Committed by Zhanlue Yang on Feb 08, 2022
- Add __PD_DEFINE_RAW_OP_KERNEL_FUNC for registering custom op kernel with ExecutionContext (#39352) · 5c3873f6
  Committed by sneaxiy on Feb 08, 2022
```
* hack custom op

* add ut

* skip windows ci
```
07 Feb, 2022 · 5 commits
- add sequence_conv op in xpu place (#39025) · fee4316d
  Committed by tanzhipeng on Feb 07, 2022
- add _ptr for tensor (#39357) · 24b2e8e6
  Committed by sneaxiy on Feb 07, 2022
- Added Adam FP32 JIT assembly kernel (#39158) · ebd14743
  Committed by jakpiase on Feb 07, 2022
```
* Added adam kernel

* CI rerun
```
- Enabled final state Eager Dygraph Codegen (#39355) · e15e4ed0
  Committed by Zhanlue Yang on Feb 07, 2022
- [CustomOp] Support output as input argument of kernel func (#39353) · f1f74e9e
  Committed by Chen Weihang on Feb 07, 2022
```
* refactor custom op kernel func and utils

* add output sync

* adapt tensor* in utils

* fix windows symbol error
```
06 Feb, 2022 · 1 commit
- [PTEN] Add Gpu context (#39305) · a821c4a9
  Committed by Wilber on Feb 06, 2022
04 Feb, 2022 · 1 commit
- remove unchanged infermeta new (#39343) · 0dccdee0
  Committed by Chen Weihang on Feb 04, 2022
02 Feb, 2022 · 3 commits
- Fix fc_mkldnn format issue (#38890) · 633c71c2
  Committed by Zuza on Feb 02, 2022
- [PTen] Remove kernel alias name (#39321) · 5dc20c27
  Committed by Chen Weihang on Feb 02, 2022
```
* remove kernel alias name

* fix deprecated error

* fix deprecated failed

* fix mean error

* resolve conflict

* fix windows failed
```
- Merge legacy to fluid (#39318) · 34cce62f
  Committed by Jiabin Yang on Feb 02, 2022
30 Jan, 2022 · 6 commits
- geo memory sparse table (#39250) · 9b3b53ba
  Committed by zhaocaibei123 on Jan 30, 2022
```
* geo depends

* add memory geo table

* fix
```
- [PTen] Change all InferMeta functions (#39222) · 7e29cea9
  Committed by Chen Weihang on Jan 30, 2022
```
* change unary infermeta

* change other infermeta

* change all infermeta format

* resolve conflict

* fix test failed

* resolve reshape conflict

* fix compile failed

* adapt auto api gen

* fix reshape failed

* fix concat failed

* resolve conflict
```
- delete FLAGS_run_pten_kernel (#39330) · 2d6d6fa1
  Committed by Leo Chen on Jan 30, 2022
- [MLU] add softmax_with_cross_entropy mlu kernel (#39260) · aecf9967
  Committed by fwenguang on Jan 30, 2022
- feat(cncl_mlu): add cncl dev for mlu distributed backend (#39294) · d28f6f7b
  Committed by mhhhh1 on Jan 30, 2022
- [pten] fit get all register op kernels (#39288) · eefe5feb
  Committed by Leo Chen on Jan 30, 2022
```
* upgrade _get_all_register_op_kernels

* add ut

* support xpu/npu

* fix device id

* enhance TransToFluidPlace

* fix compile
```
29 Jan, 2022 · 5 commits

Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09

Committed by Li Min on Jan 29, 2022

* Add fp16 support for scale/bias for fused_layernnorm_residual_dropout_bias op.

* Remove useless code.

* Remove useless code.

* Optimize layer_norm fwd when cols is 1024.

* Remove useless code.

* Minors.

* Minors.

* Modifications according to reviews.

* Minors.

* Optimize layer_norm bwd kernel when cols is 1024.

* Polish layer_norm_bwd_1024 kernel.

* Limit ln_bwd_1024_kernel to paddle_with_cuda.

* Fix double type compile error.

* Add optimization of ln bwd for fused_dropout_add_ln op.

* Polish codes.

Add xpu2 compiler (#37254) · 92da5055

Committed by Liu-xiandong on Jan 29, 2022

* Add XPU compiler for paddle, test=develop

* clean code

* clean useless code

* clean useless code

* clean useless code

* test

* add include path

* use clang compiler

* xpu2.cmake

* XPU2 compiler passed

* update

* update after pten

* combination the WITH_XPU and WITH_XPU2

* update the fuse operation in WITH_XPU and WITH_XPU2

* update

* update

* update

* fix the merge error

* update

* update the code

* update the code

* add run_kp_kernel flag

* update

* update

* fix prepared type_ bug

* clean and update the code

* reset the kernel_primitives

* update

* clean the code

* delete useless comment

* fix the bug in WITH_XPU

* update

* update

* modify the abi

* delete some useless code

* Parameter automation in xpu compilation

* Parameter automation in xpu compilation

* delete kps in cmake

* delete useless comment

* clean the code

* clean the code

[PTen] Tidy pten core headers (#39188) · dd990981

Committed by Chen Weihang on Jan 29, 2022

* open header for custom kernel

* add core utils

* tidy core code

* tidy header

* tidy include

* tidy namespace

* resolve conflict

* fix unittest and coverage

* remove platform using

* resolve conflict

* resolve conflict

* fix digamma namespace error

* fix xpu full kernel error

* fix xpu full kernel error

* polish details

* add place for lib storage

fix kunlun2 softmax unitest bug (#39274) · 23bb2836
Committed by QingshuChen on Jan 29, 2022
```
* fix kunlun2 softmax unitest bug
*test=kunlun

* minor
```

[pten] fix wrong variable name in PreparePtenData (#39311) · 7b4916c4
Committed by Leo Chen on Jan 29, 2022

28 Jan, 2022 · 12 commits

Adjusted CMakeFiles to support compilation for final state auto generated codes (#39215) · 09198b04

Committed by Zhanlue Yang on Jan 28, 2022

* Removed debug info

* Added automatic code generation for final state Eager Dygraph

* Modified backward yaml

* Added EagerUtils helper functions for final state CodeGen

* Adjusted CMakeFiles to support compilation for final state auto generated codes

* Fixed final state eager codegen

* Fixed CI problems

* Fixed yaml.load() method failure

* Turned final state codegen off for now

* Fixed minor issue

[PTen] Update all forward argument maping fns (#39252) · 75923a32
Committed by Chen Weihang on Jan 28, 2022
```
* update forward argument mapping

* fix compile failed

* fix test failed
```

Remove macro for GetGpuBasePtr (#39279) · 9a001c09
Committed by From00 on Jan 28, 2022

Host tracer and ProfilerController (#39230) · 7c489c2e

Committed by liutiexing on Jan 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* split template

* Add Profiler and HostTracer

* update

* update

* update

* update

* fix cmake

Co-authored-by: liutiexing <liutiexing@google.com>

Workqueue threadnames (#39177) · 44af74b8

Committed by liutiexing on Jan 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* Set thread name for WorkQueue

Co-authored-by: liutiexing <liutiexing@google.com>

fix elementwise_grad_bug (#39301) · 04a16189
Committed by YuanRisheng on Jan 28, 2022

[PTen]Refactor scale kernel that has selected_rows input (#39278) · abfc2fe9
Committed by YuanRisheng on Jan 28, 2022
```
* refactor scale kernel that its input is selected_rows

* complement upload file
```

Move digamma to pten (#39240) · 848ae7dc

Committed by hong on Jan 28, 2022

* move digamma to pten; test=develop

* fix mutable_data bugs; test=develop

* remove useless code; test=develop

* remove kernel compute; test=develop

* fix bug; test=develop

compile fix (#39272) · 91dd0f0d
Committed by wenbin on Jan 28, 2022
```
* slice

* shuffle pass enhancement
```

[PSLIB] Add Metrics Module, Support User-defined Add Metric (#38789) · 2e6be886

Committed by Fan Zhang on Jan 28, 2022

* [PSLIB] Add Metrics Module, Support User-defined Add Metric

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* [PSLIB] Modify According to CI Coverage

* modify role_maker

* update CMakeLists.txt

【Pten】Remove WriteBackOutput in tensor_utils (#39291) · 3ef2922b

Committed by zyfncg on Jan 28, 2022

* remove remake densetensor

* fix eager test error

* fix bug in eager

* implement AllocateFrom

* remove WriteBackOutput

* fix problem of eager

Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>

[Eager] Refactor TensorAdd by template (#39282) · 0bb3e5f1

Committed by Weilong Wu on Jan 28, 2022

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* Use overload instead of template

27 Jan, 2022 · 2 commits

Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on Jan 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for the PADDLE_ENFORCE and different device

* fix the rocm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict

Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

[pten] remove concat fluid kernel (#39268) · 552db8dc
Committed by Leo Chen on Jan 27, 2022

PaddlePaddle / Paddle · synced successfully about 1 year ago
