Commits · ebd14743b544a62b2330a3a3efebab41497b0d0a · PaddlePaddle / Paddle

07 Feb, 2022 1 commit
- Added Adam FP32 JIT assembly kernel (#39158) · ebd14743
  Committed by jakpiase on Feb 07, 2022
```
* Added adam kernel

* CI rerun
```
06 Feb, 2022 1 commit
- [PTEN] Add Gpu context (#39305) · a821c4a9
  Committed by Wilber on Feb 06, 2022
04 Feb, 2022 1 commit
- remove unchanged infermeta new (#39343) · 0dccdee0
  Committed by Chen Weihang on Feb 04, 2022
02 Feb, 2022 1 commit
- Merge legacy to fluid (#39318) · 34cce62f
  Committed by Jiabin Yang on Feb 02, 2022
30 Jan, 2022 1 commit
- [MLU] add softmax_with_cross_entropy mlu kernel (#39260) · aecf9967
  Committed by fwenguang on Jan 30, 2022

29 Jan, 2022 2 commits

Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09

Committed by Li Min on Jan 29, 2022

* Add fp16 support for scale/bias for fused_layernnorm_residual_dropout_bias op.

* Remove useless code.

* Remove useless code.

* Optimize layer_norm fwd when cols is 1024.

* Remove useless code.

* Minors.

* Minors.

* Modifications according to reviews.

* Minors.

* Optimize layer_norm bwd kernel when cols is 1024.

* Polish layer_norm_bwd_1024 kernel.

* Limit ln_bwd_1024_kernel to paddle_with_cuda.

* Fix double type compile error.

* Add optimization of ln bwd for fused_dropout_add_ln op.

* Polish codes.

[PTen] Tidy pten core headers (#39188) · dd990981

Committed by Chen Weihang on Jan 29, 2022

* open header for custom kernel

* add core utils

* tidy core code

* tidy header

* tidy include

* tidy namespace

* resolve conflict

* fix unittest and coverage

* remove platform using

* resolve conflict

* resolve conflict

* fix digamma namespace error

* fix xpu full kernel error

* fix xpu full kernel error

* polish details

* add place for lib storage

28 Jan, 2022 3 commits

[PTen] Update all forward argument maping fns (#39252) · 75923a32
Committed by Chen Weihang on Jan 28, 2022
```
* update forward argument mapping

* fix compile failed

* fix test failed
```
[PTen]Refactor scale kernel that has selected_rows input (#39278) · abfc2fe9
Committed by YuanRisheng on Jan 28, 2022
```
* refactor scale kernel that its input is selected_rows

* complement upload file
```

Move digamma to pten (#39240) · 848ae7dc

Committed by hong on Jan 28, 2022

* move digamma to pten; test=develop

* fix mutable_data bugs; test=develop

* remove useless code; test=develop

* remove kernel compute; test=develop

* fix bug; test=develop

27 Jan, 2022 9 commits

Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on Jan 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for the PADDLE_ENFORCE and different device

* fix the rocm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

[pten] remove concat fluid kernel (#39268) · 552db8dc
Committed by Leo Chen on Jan 27, 2022

【PTen】Remove ReMakePtenDenseTensor (#39094) · 98c1829b
Committed by zyfncg on Jan 27, 2022
```
* remove remake densetensor

* fix eager test error

* fix bug in eager
```
refactor elementwise sub grad (#39225) · 7a1e1193
Committed by YuanRisheng on Jan 27, 2022

[MLU] add compile ci scripts for MLU, test=mlu_ci (#39122) · 56410b4a
Committed by Qi Li on Jan 27, 2022

[pten] add full xpu kernel (#39172) · 93839717

Committed by chentianyu03 on Jan 27, 2022

* add full_kernel xpu

* fix full xpu register device type error

* fix full kernel bug

* add fulllike kernel impl and replace with raw kernel

* fix dev_ctx convert template args error

* modify namespace and header file

* add isinf check

* fix input type args in TensorSetConstantXPU error

optimize kunlun/xpu softmax_with_cross_entropy add add unitest (#39180) · 2b9bb8bb

Committed by QingshuChen on Jan 27, 2022

* optimize kunlun/xpu softmax_with_cross_entropy add add unitest
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun

Fix slice error in jit.to_static mode (#39251) · c0f993f6
Committed by zyfncg on Jan 27, 2022
```
* fix slice bug

* fix syntax error
```
move math_cuda_utils.h to pten/kernels/funcs (#39246) · 809a10b6
Committed by Feiyu Chan on Jan 27, 2022

26 Jan, 2022 9 commits

[pten] remove deprecated fluid op kernel for pten (#38842) · 3ab9aef1

Committed by Leo Chen on Jan 26, 2022

* update cmake file to remove fluid kernel

* add pten declaration.h to where pybind.h used

* fix sync_bn and tensorrt_engine

* refine detection_library

* fix interpreter_core

* support eager legacy

* fit eager legacy for pten

* fall back to cpu if not found kernel

* fix compile problem

* fix compile problem

* refine fallback logic

* fit operator.run()

* fix xpu compile

* fit for new_exec

* add REGISTER_OP_WITHOUT_GRADIENT

* un-cache pt_kernel_context

* fix compile

* fix cudnn

* fix compiling with on_infer

* fix mkldnn

* fix isfinite_v2

* fix xpu problem

* fix op_device

* refine fallback for xpu

* fix xpu compile

* merge develop

* refine code format

* fix compile

* fix compile

* add data_transfer

* fix PreparePtenData

* fix cpu context

* merge develop

* fix compile

* fix error device context

* fix xpu

* fix dev_ctx

[pten] Cast xpu kernel (#39179) · 93d2f0a6

Committed by chentianyu03 on Jan 26, 2022

* cast xpu kernel init

* cast xpu kernel

* replace with raw cast xpu kernel

* fix cast kernel bug

* add the missing break

* modify namespace and header file

[MLU]Add conv2d op (#39110) · 71634a61
Committed by qipengh on Jan 26, 2022
```
* [MLU]Add conv2d op

* [MLU]fix comment

* [MLU]adapt NCHW of conv2d op
```
[Pten]Move kernel_primitives lib to Pten directory (#39169) · 452bcbe2
Committed by YuanRisheng on Jan 26, 2022
```
* move kernel_primitives

* use pten's errors
```

[IPU] sync misc changes 02 (#39189) · 5df78366

Committed by Allen Guo on Jan 26, 2022

* sync misc changes

* apply comments 01

* fix compile error

* remove is_ipu_place check

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* sync changes

* restore cmake

* update ir cmake and setup.py

* update inference_lib cmake

* restore for split PR
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

Optimize layer norm forward when cols is 1024. (#39167) · 01d04be6
Committed by Li Min on Jan 26, 2022
```
* Optimize layer_norm fwd when cols is 1024.
```

add sigmoid cross entropy with logits to kl2 (#38915) · fd44de58

Committed by houj04 on Jan 26, 2022

* add sigmoid cross entropy with logits to kl2. test=kunlun

* add sigmoid cross entropy with logits to kl2. test=kunlun

* follow comments. test=kunlun

sum op (#39165) · 55d6b87c
Committed by joeqiao12 on Jan 26, 2022

[PTen] Unify InferMeta(Shape) Function in pten and fluid op (#38976) · b75507d3

Committed by Chen Weihang on Jan 26, 2022

* infermeta context init design

* support infermeta called in fluid op

* add hasattr and attr methods

* add dygraph GetVarPtrs support

* rename arg_map_context to arg_map_utils

* add registry for arg map func

* resolve conflict

* refactor op utils design

* polish meta config

* fix details

* remove hasattr method

* resolve conflict

* revert cmake order change

* revert some change

* change init pos

* fix compile failed

* fix typo

* fix inference failed

* fix windows compile failed

* polish format
Co-authored-by: Wang Huan <wanghuan29@baidu.com>

25 Jan, 2022 12 commits
- reconstruct directory of ps (#39191) · 2bf9b844
  Committed by yaoxuefeng on Jan 25, 2022
  
- change infermeta and remove makePtenTenosr in reshape (#39186) · 7613129e
  Committed by YuanRisheng on Jan 25, 2022
  
- GetWorkspaceSize trigger modfication in heuristic cudnn conv (#39184) · 4c61e141
  Committed by limingshu on Jan 25, 2022
```
* first commit

* add more changes
```
- [inference] update trt convert reduce op&ut,test=develop (#39088) · 80753755
  Committed by Zhang Jun on Jan 25, 2022
```
* [inference] update convert reduce op&ut,test=develop

* update

* update

* update

* add int32 support

* add int32 support

* add comments

* trt < 7.0 do not support int32

* test=develop

* update

* test=develop
```
- [MLU]add mlu kernel for fill_constant op (#39069) · 6e871dbc
  Committed by joeqiao12 on Jan 25, 2022
```
* [MLU]add mlu kernel for fill_constant op

* delete device_context DEPS
```
- Revert "Replace EigenBroadcast with ElementwiseBroadcast in ReduceGrad (#38959)" (#39205) · 978558be
  Committed by niuliling123 on Jan 25, 2022
```
This reverts commit 9059ef69.
```
- [MLU]add mlu kernel for split and concat (#39020) · ac3dc0bb
  Committed by joeqiao12 on Jan 25, 2022
```
* [MLU]add mlu kernel for concat and split op

* delete device_context DEPS
```
- Replace EigenBroadcast with ElementwiseBroadcast in ReduceGrad (#38959) · 9059ef69
  Committed by niuliling123 on Jan 25, 2022
  
- Optimize nearest_interp forward (#38528) · 232bbce2
  Committed by Lijunhui on Jan 25, 2022
```
* init commit

* remove comments

* remove nchw branch

* optimize code

* apply fast div mod in 1D kernel, rm 3D kernel

* move init of FastDivMode to CPU

* 3D kernel for nchw, FastDiv for 1D kernel

* debug done. process boundary

* 2^n

* optimize

* optimize

* change code & optimize code
```
- [Move selected_rows PR #3] Change the relationship of [include/Cmake]. (#39128) · 2bafd338
  Committed by Weilong Wu on Jan 25, 2022
```
* Added selected_rows and rw_lock to pten

* Renamed the unit test target to fix CI

* Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid

* Remove rw_lock.h,rw_lock_test.cc in fluid

* Use pten::RWLock and pten::AutoRDLock, fix CI

* Use pten::SelectedRows

* Use pten::SelectedRows

* Fix to pass NPU CI

* Use pten::SelectedRows, to pass NPU CI

* To fix NPU CI

* To fix NPU CI again
```
- [pnorm] fix bug in fp16 & optimize memory (#39011) · 3825b40f
  Committed by Noel on Jan 25, 2022
  
- [PTEN] Add xpu context. (#39098) · c1e5a393
  Committed by Wilber on Jan 25, 2022
  

PaddlePaddle / Paddle: last synced successfully more than 1 year ago
