Commits · b9fdd3bc0f4f22af17a81bb8a50a337b563c876b · PaddlePaddle / Paddle

01 Nov, 2021 1 commit

Paddle Tensor Operation Library initial implementation (#34425) · b9fdd3bc

Committed by Chen Weihang on Nov 01, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rocm compile failed

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar macro

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & naming rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add macro for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflict

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key construct

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for checking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unittests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unittests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflict

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflict with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details
Co-authored-by: chentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: zyfncg <1370305206@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

b9fdd3bc

26 May, 2021 1 commit

optimize OP's compilation time (#32617) · 78ecb668

Committed by wuhuanzhou on May 26, 2021

* optimize OP's compilation time, test=develop

* add more op and run ci test, test=develop

* CUDA Kernel register in cc file, test=develop

* fix macros, test=develop

* fix undefined symbol error, test=develop

* fix compilation error and undefined symbol, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

78ecb668

25 May, 2021 1 commit

Added scale op FP32/BF16 FWD/BWD kernels (#32975) · 86ea8dce

Committed by jakpiase on May 25, 2021

86ea8dce

22 Mar, 2021 1 commit

[oneDNN] Initial bf16 amp integration (#31093) · 7ccf6b60

Committed by arlesniak on Mar 22, 2021

7ccf6b60

04 Feb, 2021 1 commit

use iwyu clean include second time, test=develop (#30829) · 35c5b23f

Committed by wanghuancoder on Feb 04, 2021

* use iwyu clean include second time, test=develop

35c5b23f

24 Sep, 2020 1 commit

use iwyu clean include (#27267) · df43905f

Committed by wanghuancoder on Sep 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

25 May, 2020 1 commit

rename inplace/no_need_buffer inferer, part 1, test=develop (#24711) · de8b4f42

Committed by Leo Chen on May 25, 2020

de8b4f42

26 Apr, 2020 1 commit

improve efficiency of runtime InferVarType (#22778) · 9a93f6aa

Committed by liuwei1031 on Apr 26, 2020

* save InferVarType changes, test=develop

* remove code comments, test=develop

* tweak code, test=develop

* fix compilation warning, update merge_ids_op split_ids_op to new interface, test=develop

* modify fused_bn_activation_op, test=develop

* fix error of fused_bn_activation_op, test=develop

* fix PADDLE_ENFORCE and unittest coverage issue, test=develop

* tweak PADDLE_ENFORCE messages, test=develop

* improve unittest coverage, test=develop

* add StaticGraphInferVarType class, test=develop

* rebase develop branch, test=develop

* fix unittest error, test=develop

* remove comments, test=develop

* improve unittest coverage, test=develop

* improve error message and improve unittest coverage, test=develop

* upgrade InferVarType API, test=develop

* tweak pyfunc error message, test=develop

* fix compilation conflict - save_combine_op, test=develop

9a93f6aa

12 Apr, 2020 1 commit

Add some error message and dtype check for some ops (#23762) · f26f7c36

Committed by wawltor on Apr 12, 2020

Those ops include scale, sum, sums, unique_with_counts, unique, and
where; add error messages and test cases

f26f7c36

04 Apr, 2020 1 commit

Delete Ref & VectorRef and add GetDataSafely (#22997) · 16315d3d

Committed by Chen Weihang on Apr 04, 2020

* delete invalid check interface Ref & VectorRef, test=develop

* fix vector ref delete error, test=develop

* try the new check interface, test=develop

* change all related code with new check macro, test=develop

* remove static assert, test=develop

* polish detail, test=develop

* skip coverage problem, test=develop

* add new check macro, test=develop

16315d3d

09 Mar, 2020 1 commit

Imperative tracer refactoring (#22457) · d33c4343

Committed by Zeng Jinle on Mar 09, 2020

* refine grad maker, test=develop

* refactor tracer stage 1, test=develop

* merge develop to solve conflict third times, test=develop

d33c4343

28 Nov, 2019 1 commit

add Adam beta1/beta2 support Variable (#21234) · ebfb720a

Committed by Kaipeng Deng on Nov 28, 2019

* add Adam beta1/beta2 support Variable. test=develop

ebfb720a

05 Nov, 2019 1 commit

Support NoNeedBufferVarsInference in dygraph backward (#20868) · 878a40f5

Committed by Zeng Jinle on Nov 05, 2019

* support no need buffer vars in dygraph, test=develop

* fix inference compilation error, test=develop

* update no_need_buffer_vars_inference, test=develop

* add unittests for no_need_buffer_vars_context, test=develop

* refine no_need_buffer_vars by return ref, test=develop

* polish some codes, test=develop

878a40f5

31 Oct, 2019 1 commit

GradMaker for dygraph (#19706) · 8c4573a3

Committed by hong on Oct 31, 2019

* refactor dygraph,test=develop

* fix failed unittest,test=develop

* polish code,test=develop

* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop

* polish vlog and profiler, test=develop

* try to fix preceding ops order,test=develop

* test transformer in windows ci, test=develop

* use python c-api to speed up tracer.trace,test=develop

* test=develop, fix docker with paddle nccl problem

* test=develop, add ut for debug string and gradient_accumulator

* test=develop, add tests for layer/gradient_accumulator/prepared_op

* test=develop, fix compile error for test_prepared_op

* test=develop, add more ut for dygraph

* test=develop, create API.spec for dygraph api change

* optimize grad maker; test=develop

* optimize grad maker

* test

* grad make optim; test=develop

* fix unittest bugs; test=develop

* add dygraph grad op maker and split_op

* grad op maker refactor; test=develop

* add dygraph grad maker; test=develop

* fix op deformable_conv_v1_op bug; test=develop

* fix deformable_conv prroi pool bugs;

* fix new op grad op maker bug; test=develop

* fix split by ref bug; test=develop

* fix dygraph auto prune bug; test=develop

* fix test_trace bug; test=develop

* fix fused emb seq pool bug; test=develop

* remove useless code in op_desc file; test=develop

* remove useless code, StrVarBaseNode; test=develop

* fix review issues; test=develop

* fix rank_loss grad maker; test=develop

* remove flag in VarBase; test=develop

* fix distributed_notify_op compile bug ; test=develop

* fix reshape op double grad; test=develop

* fix expand as op; test=develop

* add imperative type_defs.h for demo_train; test=develop

* fix inference lib cmake; test=develop

* fix inference lib; test=develop

* fix inference_lib; test=develop

* fix inference cmake; test=develop

* fix inference lib; test=develop

* fix inference lib; test=develop

* remove condition dygraph grad maker, modify local name; test=develop

* fix split grad maker bug; test=develop

* fix pyramid_op bug; test=develop

* change travis time out limit; test=develop

* restore travis; test=develop

* change timeout limit; test=develop

8c4573a3

11 Sep, 2019 1 commit

refine math_op_patch, test=develop (#19727) · 078a6782

Committed by Zeng Jinle on Sep 11, 2019

078a6782

19 Mar, 2019 1 commit

add allocator flags · 22715487

Committed by zhhsplendid on Mar 19, 2019

test=develop

22715487

18 Mar, 2019 1 commit

Polish code style · b40e41fb

Committed by minqiyang on Mar 18, 2019

test=develop

b40e41fb

15 Mar, 2019 2 commits

Implement Runtime Var Type Inference · 438bca9c

Committed by minqiyang on Mar 15, 2019

test=develop

438bca9c

Implement infer var type context · ca392c7e

Committed by minqiyang on Mar 15, 2019

ca392c7e

21 Jan, 2019 1 commit

squash commits. test=develop · 8f3b2523

Committed by dzhwinter on Jan 21, 2019

8f3b2523

27 Sep, 2018 1 commit

Add distributed unit tests about text_classification/simnet-bow/ctr (#12812) · 97cf1eb6

Committed by tangwei12 on Sep 27, 2018

* add dist ut for text_classification

* add dist ut for text_classification

* add simnet bow unittest

* add dist ut for simnet bow

* add training data url for simnet bow

* add training data url for simnet bow

* modify simnet test_reader to train reader

* add test_dist_ctr

* test_dist_ctr can run now

* dense update is good

* add unit test for selected rows

* debug unit test

* fix dist sparse update problem

* Constant args at init

* optimize code

* simnet optimize

* fix DebugStringEx

* optimize sum_op.h

* add ScaleOpVarTypeInference

* clean code

* fix test_dist_transpiler.py

* code optimize

* modify delta

* fix sparse update bug

* dist test use one cpu

* update some data

* remove unused code

* add use cuda config

* unit test fix

* unit test fix

* unit test fix

* unit test fix

* dist_word2vec use CPU

* unit test fix

* unit test fix

* code clean

* code clean

* merge develop

* api spec update

* Revert: api spec update

* replace simnet data with fake

* replace simnet data with fake

* update dim

* add batch auc

* code clean

* code clean

* modify print to stderr

* update simnet delta -> 1e-5

* update RUN_STEP

* add use_reader_alloc

* add use_reader_alloc

* add use_reader_alloc

* modify delta

* add use_reader_alloc

* fix stderr write

* python3 compatibility

test=develop

* python3 compatibility, test=develop

* Update dist_text_classification.py

* test=develop

97cf1eb6

21 Sep, 2018 1 commit

remove kwargs in python api · 3ee0a648

Committed by sneaxiy on Sep 21, 2018

3ee0a648

18 Sep, 2018 1 commit

modification · 0718113a

Committed by sneaxiy on Sep 18, 2018

0718113a

17 Sep, 2018 1 commit

tiny change to save memory · abf9832c

Committed by sneaxiy on Sep 17, 2018

abf9832c

28 Aug, 2018 1 commit

Scale support selectedrows (#12960) · 11e01d9b

Committed by Qiao Longfei on Aug 28, 2018

* add ScaleOpVarTypeInference for scale op

* scale op support scale selected rows

* optimize code

* use FindVar

* use FindVarRecursive in ScaleOpVarTypeInference

11e01d9b

15 Jun, 2018 1 commit

update by comment · 3380737c

Committed by yi.wu on Jun 15, 2018

3380737c

08 Jun, 2018 1 commit

polish docs · 5be454bf

Committed by yi.wu on Jun 08, 2018

5be454bf

08 May, 2018 1 commit

Clean OpProtoAndCheckerMaker · 0e78cb69

Committed by Yu Yang on May 08, 2018

Do not use ctor

* Reduce line of codes.
* We can use virtual function for Maker now.
* The implementation does not care what maker holds, it is easier to
refactor later.

0e78cb69

03 May, 2018 1 commit

Fix/fp64 (#10346) · f63ff90b

Committed by dzhwinter on May 03, 2018

* "fix double type error"

* "fix ci"

* "softmax fp64"

* "fix momentum"

* "fix ci"

f63ff90b

14 Apr, 2018 1 commit

Fix CPPLint errors in operators (#9826) · 7b86da71

Committed by Abhinav Arora on Apr 13, 2018

* Fix CPPLint errors in operators

* Fix cast in softmax

* Fix softmax_mkldnn

* Fix send_recv_op_test

* Send_recv

* Fix softmax mkldnn

7b86da71

12 Apr, 2018 1 commit

remove net op and cond_op (#9663) · b26f5050

Committed by Yang Yang(Tony) on Apr 11, 2018

* remove net op and cond_op

* fix cpplint

* fix dependency

* delete backward_test; fix compile

* disable batch_norm backward

* rm test_net.py

* make batchnorm test independent of backward.cc

* make test_layer_norm_op independent of backward.cc

* make test_layer_norm_op independent of backward.cc

* delete unused code

* clean up

b26f5050

12 Feb, 2018 1 commit

Fix the grammar in copyright. (#8403) · 24509f4a

Committed by qingqing01 on Feb 12, 2018

24509f4a

10 Feb, 2018 2 commits

Correct #include path · fc374821

Committed by Yi Wang on Feb 09, 2018

fc374821

Move file to fluid/; Edit CMakeLists.txt · 90648f33

Committed by Yi Wang on Feb 09, 2018

90648f33

19 Jan, 2018 1 commit

init · b48fedc3

Committed by peterzhang2029 on Jan 19, 2018

b48fedc3

26 Dec, 2017 1 commit

unify the indentation of license · 761b3297

Committed by Luo Tao on Dec 26, 2017

761b3297

21 Dec, 2017 1 commit

Rename XXDescBind --> XXDesc (#6797) · 09189732

Committed by Yu Yang on Dec 21, 2017

* Rename XXDescBind --> XXDesc

* Fix Compile

09189732

20 Dec, 2017 1 commit

Move framework.proto to proto namespace (#6718) · e445b3ff

Committed by Yu Yang on Dec 20, 2017

* Move framework.proto to proto namespace

* Fix compile

* Fix compile

* Fix Compile

e445b3ff

12 Dec, 2017 1 commit

Refine device context (#6433) · 61ec0b95

Committed by QI JUN on Dec 12, 2017

The main changes are:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class  `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`

61ec0b95

28 Nov, 2017 1 commit

Make 'scale_op' supporting int and int64 (#5986) · 23b3fef0

Committed by fengjiayi on Nov 28, 2017

* Make 'scale_op' supporting int and int64

* refine .cu file

23b3fef0

PaddlePaddle / Paddle synced successfully about 1 year ago