提交 · 308816472d93dfdb7048c5c79651deb38301fa0a · PaddlePaddle / Paddle

05 6月, 2023 2 次提交
- G
  
  [static op generation] pool2d, pool3d (#54070) · 30881647
  由 gouzil 提交于 6月 05, 2023
  
  30881647
- H
  Support code generation for op conv2d_transpose, conv3d_transpose,... · 1075d35d
  由 huangjiyi 提交于 6月 05, 2023
```
Support code generation for op conv2d_transpose, conv3d_transpose, depthwise_conv2d_transpose (#54242)
```
  1075d35d
23 5月, 2023 1 次提交

[static op generation] tril_triu (#54033) · 4af0f140

由 gouzil 提交于 5月 23, 2023

* [phi] autogen code tril_triu

* [phi][api]fix tril_triu_grad args

* [fluid] clean cmake; [phi] fix infer_meta

4af0f140

18 5月, 2023 1 次提交

support auto generate for op layer_norm (#53178) · 4f07b653

由 RedContritio 提交于 5月 18, 2023

* simplify layer_norm_op.cc

* support auto generate for op layer_norm

* update unittest for composite_layer_norm

* remove layer_norm_op.cc from scripts

* replace layer_norm_op with generated_op

* add get_expected_kernel for layer_norm

* update cmake kernel register function for layer_norm_mkldnn_op

4f07b653

16 5月, 2023 1 次提交

static graph autogen code support for softmax op (#53581) · 312f0187

由 Wang Xin 提交于 5月 16, 2023

* static graph autogen code support for softmax op

* bug fixed

* fix PR-CI-Windows error

* fix CI error

* bug fixed

* fix conflicts

312f0187

20 4月, 2023 1 次提交
- W
  remove ASCEND* keyword (#53046) · 7fa415ca
  由 Wang Xin 提交于 4月 20, 2023
```
* remove ASCEND* keyword

* update docstring

* bug fixed

* bug fixed
```
  7fa415ca
17 4月, 2023 1 次提交
- Y
  [PHI]Unify fluid kernel (Part4) (#52626) · 1b5eba8a
  由 YuanRisheng 提交于 4月 17, 2023
```
* unify kernel

* fix ci bugs

* fix py3 bugs

* fix py3 bugs

* perfect code
```
  1b5eba8a
11 4月, 2023 2 次提交
- R
  
  support auto generate for op average_accumulates (#52704) · 6741dd22
  由 RedContritio 提交于 4月 11, 2023
  
  6741dd22
- R
  support auto generate static for randperm (#52531) · 4a74f4c5
  由 RedContritio 提交于 4月 11, 2023
```
* support auto generate static for randperm

* remove enforce in randperm infermeta
```
  4a74f4c5
10 4月, 2023 1 次提交
- H
  register fluid kerenls to phi [part 7] (#52577) · aa35331f
  由 huangjiyi 提交于 4月 10, 2023
```
* update

* fix bug

* fix ci-windows-openblas

* fix test_partial_sum_op

* fix codestyle
```
  aa35331f
08 4月, 2023 1 次提交
- R
  
  support auto generate static for truncated_gaussian_random (#52540) · ed9bac2f
  由 RedContritio 提交于 4月 08, 2023
  
  ed9bac2f
06 4月, 2023 3 次提交
- R
  
  support auto generate static for empty (#52524) · 2ad66a42
  由 RedContritio 提交于 4月 06, 2023
  
  2ad66a42
- R
  support auto generate static for randint (#52529) · 535915aa
  由 RedContritio 提交于 4月 06, 2023
```
* support auto generate static for randint

* move seed from extra to attrs
```
  535915aa
- R
  
  support auto generate static for uniform (uniform_random) (#52522) · 838b2c83
  由 RedContritio 提交于 4月 06, 2023
  
  838b2c83
03 4月, 2023 1 次提交

support auto generate static for gaussian (gaussian_random) (#52422) · 3c949ba9

由 RedContritio 提交于 4月 03, 2023

* support auto generate static for gaussian (gaussian_random)

* move gaussian_random_batch_size_like Kernels from gaussian_random_op.* to gaussian_random_batch_size_like_op.*

3c949ba9

27 3月, 2023 1 次提交

Automatically generate 'assign' operator (#51940) · 888a30c9

由 HappyHeavyRain 提交于 3月 27, 2023

* support assign op

* support assign infer_var_type

* change code according to review

* change code according to review

* only save 'get_infer_var_type_func'

* rest file mode

888a30c9

30 1月, 2023 1 次提交

[Pglbox2.0] merge gpugraph to develop (#49946) · cb525d4e

由 zmxdream 提交于 1月 30, 2023

* add set slot_num for psgpuwraper (#177)

* add set slot_num_for_pull_feature for psgpuwarper

* Add get_epoch_finish python interface (#182)

* add get_epoch_finish interface

* add return

* delete return

* add unzip op (#183)

* fix miss key for error dataset (#186)

* fix miss key for error dataset

* fix miss key for error dataset
Co-authored-by: Nyangjunchao <yangjunchao@baidu.com>

* add excluded_train_pair and infer_node_type (#187)

* support return of degree (#188)

* fix task stuck in barrier (#189)
Co-authored-by: Nyangjunchao <yangjunchao@baidu.com>

* check node/feature format when loading (#190)

* check node&feature format when loading

* check node&feature format when loading (2£ (2)

* degrade log (#191)

* [PGLBOX]fix conflict

* [PGLBOX]fix conflict

* [PGLBOX]replace LodTensor with phi::DenseTensor

* [PGLBOX]fix gpu_primitives.h include path

* [PGLBOX]from platform::PADDLE_CUDA_NUM_THREADS to phi::PADDLE_CUDA_NUM_THREADS

* [PGLBOX]fix unzip example code

* [PGLBOX]fix unzip example code

* [PGLBOX]fix unzip example code

* [PGLBOX]fix unzip example code

* [PGLBOX]fix unzip ut

* [PGLBOX]fix unzip ut

* [PGLBOX]fix code style

* [PGLBOX]fix code style

* [PGLBOX]fix code style

* fix code style

* fix code style

* fix unzip ut

* fix unzip ut

* fix unzip ut

* fix unzip

* fix code stype

* add ut

* add c++ ut & fix train_mode_ set

* fix load into memory

* fix c++ ut

* fix c++ ut

* fix c++ ut

* fix c++ ut

* fix code style

* fix collective

* fix unzip_op.cc

* fix barrier

* fix code style

* fix barrier

* fix barrier

* fix code styple

* fix unzip

* add unzip.py

* add unzip.py

* fix unzip.py

---------
Co-authored-by: Nchao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: NSiming Dai <908660116@qq.com>
Co-authored-by: Nhuwei02 <53012141+huwei02@users.noreply.github.com>
Co-authored-by: Nyangjunchao <yangjunchao@baidu.com>

cb525d4e

24 11月, 2022 1 次提交
- S
  
  [PHI] Migrate batch_norm_grad kernel (#48288) · 561b7278
  由 Sławomir Siwek 提交于 11月 24, 2022
  
  561b7278
07 11月, 2022 1 次提交

[Restore PR] Remove hard code of PADDLE_WITH_CUDA (#47630) · 908a381d

由 HongyuJia 提交于 11月 07, 2022

* move cudnn hardcode outside GetExpectedKernelType

* add header file

* debug

* update interpreter_util with hardcode

* update interpreter_util headerfile

* solve activation hardcode

* debug with CI

* add mkldnn_op_list header file

* temporarily uncomment mkldnn

* temporarily uncomment mkldnn

* delete sequence_softmax cudnn hardcode

* add hardcode to data_transfer.cc

* update data_transfer headerfile

* try fix segment fault

* update cudnn&miopen_helper

* reset HasAttr of DygraphExctnCtx

* debug, this commit should pass all CI

* debug should pass CI, temporarily disable activation

* debug should pass CI

* fix default_attr=nullptr bug

* clean debug code

* Call SetDnnFallback function in the base class

* activation fallback to plain kernel

* fix default GetExpectedKernelType find wrong kernel

* search cudnn kernel instead of fallback

* fix cudnn_handle bug

* remove tanh use_cudnn

* restore tanh use_cudnn

* debug tanh

* fix tanh bug

* delete activation cudnn kernel

* polish code

908a381d

12 10月, 2022 1 次提交
- Z
  
  support generating code of opmaker for backward op invoke forward op (#46912) · 227ab74d
  由 zyfncg 提交于 10月 12, 2022
  
  227ab74d
20 9月, 2022 1 次提交
- Y
  [PHI]Move merge_selected_rows kernel to PHI (#46004) · b1658232
  由 YuanRisheng 提交于 9月 20, 2022
```
* move_merge_selected_rows

* update code
```
  b1658232
04 6月, 2022 1 次提交
- S
  
  【code format check upgrade】 step2：cmake-format (#43057) · 92568edb
  由 Sing_chan 提交于 6月 04, 2022
  
  92568edb
10 3月, 2022 1 次提交
- Z
  [PHI] Move segment_pool to phi. (#40099) · a07f19ee
  由 Zhong Hui 提交于 3月 10, 2022
```
* move segment_pool to phi.

* mark summed ids as optional tensor.

* fix as reviews.
```
  a07f19ee
13 1月, 2022 1 次提交

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

由 jakpiase 提交于 1月 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

fc6eed5b

13 12月, 2021 1 次提交
- N
  
  [pnorm] Optimize p_norm op for special cases (#37685) · 10d9ab4b
  由 Noel 提交于 12月 13, 2021
  
  10d9ab4b
01 11月, 2021 1 次提交

Paddle Tensor Operation Library initial implementation (#34425) · b9fdd3bc

由 Chen Weihang 提交于 11月 01, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rcom compile faild

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar marco

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & nameing rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add merco for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflit

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key constrcut

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for cheking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unitests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unitests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflit

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: NChen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflit with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: NChen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details
Co-authored-by: Nchentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: Nzyfncg <1370305206@qq.com>
Co-authored-by: NYuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

b9fdd3bc

26 5月, 2021 1 次提交

Added cast op oneDNN kernel for bf16/fp32 datatypes casting(FWD/BWD) (#33056) · a2a45d8d

由 jakpiase 提交于 5月 26, 2021

* added op cast functionality for fp32/bf16

* added newline

* added entries in static mode white list and unity build

* fixed failing tests

* changes after review

* added formatting

* upgraded tests file as reviewer suggested

* changes after review

* minor change

a2a45d8d

25 5月, 2021 1 次提交
- J
  
  Added scale op FP32/BF16 FWD/BWD kernels (#32975) · 86ea8dce
  由 jakpiase 提交于 5月 25, 2021
  
  86ea8dce
01 3月, 2021 1 次提交

optimize unity build (#31119) · a13f1d69

由 wuhuanzhou 提交于 3月 01, 2021

* optimize unity build, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error, test=develop

* fix code style error, test=develop

a13f1d69

20 1月, 2021 1 次提交

optimize unity build (#30195) · 7e671c07

由 wuhuanzhou 提交于 1月 20, 2021

* optimize unity build, test=develop

* fix code style error, test=develop

* fix code style error and test /MP settings, test=develop

7e671c07

07 12月, 2020 1 次提交

Compiling operator libraries with Unity build (#29130) · 671555ed

由 LoveAn 提交于 12月 07, 2020

* Compiling operator libraries with Unity Build on Windows CPU.

* Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci

* Add option in windows ci script, no_test, test=windows_ci

* Optimize parallel compiling, test=develop

* remove limit of parallel compile and skip some ops in UB, test=develop

* remove changes of header file, test=develop

* remove changes of header file, test=develop

* fix test_eye_op unittest failed, test=develop

* Compiling operator libraries with Unity Build on Linux, test=develop

* set default WITH_UNITY_BUILD=OFF, test=develop

* Move unity build rules into a single file and add comment, test=develop

* optimize parallel compilation, test=develop

* fix undefined reference error on coverage ci, test=develop

671555ed

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功