Commits · 7df301f2fc0602745e40fa3a7c43ccedd41786ca · PaddlePaddle / Paddle

27 Nov, 2021 · 1 commit

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

Committed by Aganlengzi on Nov 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enforce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

72241a6a

25 Nov, 2021 · 1 commit
- fix_matmul_op_int8_plugin (#37525) · 0fd70d71
  Committed by Wangzheee on Nov 25, 2021
  
  0fd70d71
24 Nov, 2021 · 2 commits
- [Paddle-Inference] Matmul_int8_convert: tensor*tensor (#37285) · 16590799
  Committed by Wangzheee on Nov 24, 2021
```
* matmul_convert_int8

* matmul_convert_int8

* matmulconvert_int8

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor
```
  16590799
- fix lite with xpu or nnadapter (#37449) · 93aefceb
  Committed by zhupengyang on Nov 24, 2021
  
  93aefceb
23 Nov, 2021 · 2 commits
- fix problem of dcnv2 trt (#37345) · e91141fb
  Committed by wangxinxin08 on Nov 23, 2021
```
* modify code about fp16 of dcnv2 trt
```
  e91141fb
- [Paddle Inference] Fix_nearest: align_corners != true (#37368) · bc150edc
  Committed by Wangzheee on Nov 23, 2021
```
* fix_nearest

* fix_nearest

* fix_nearest

* fix_nearest
```
  bc150edc
22 Nov, 2021 · 1 commit
- fix memory_optimize_pass bug (#37324) · 075c22f6
  Committed by JingZhuangzhuang on Nov 22, 2021
  
  075c22f6
19 Nov, 2021 · 2 commits
- Add corner case in scale calculation (#37352) · 4d891c00
  Committed by joanna.wozna.intel on Nov 19, 2021
  
  4d891c00
- Optimize cinn_cache_key by replace GraphToProgram to Dot string (#37317) · edc3496f
  Committed by jiangcheng on Nov 19, 2021
```
* optimize cache-key by replace GraphToProgram to Dot string

* fix compile failure bug
```
  edc3496f
15 Nov, 2021 · 2 commits

[Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a

Committed by Chen Weihang on Nov 15, 2021

* move extension into pten [no-verify]

* append tensor methods by ext_tensor [no-verify]

* append other tensor methods [no-verify]

* ext related files tidy [no-verify]

* include relation tidy [no-verify]

* add pten tensor test [no-verify]

* replace tensor in custom op & compile success

* refine tensor constructor for unittest

* custom relu jit run success

* fix all custom op unittests

* add inference cmake adapt [no-verify]

* fix failed unittests

* fix windows failed unittests

* try to fix kunlun and inference failed

* fix test_elementwise_api error

* try to fix win compile failed

* fix kunlun fp16 type error

* remove useless handle error macro

* add custom linear op test

* fix compile failed & add win symbols

* fix non pten kernel cast failed

* add dll decl for api

* polish several details

* polish details by review comment

* add dll_decl for register

1e598f1a

remove input dim check in op_teller and update ut (#37097) · 6b21bb0b

Committed by baoachun on Nov 15, 2021

* remove input dim check of activation in op_teller

* remove input dim check of concat in op_teller

* remove input dim check of clip in op_teller

* remove input dim check of scale in op_teller

* remove input dim check in op_teller

* update attr check of slice in op_teller

6b21bb0b

12 Nov, 2021 · 3 commits
- add_fc_convert_layers_name (#37157) · a76b77a5
  Committed by Wangzheee on Nov 12, 2021
  
  a76b77a5
- add fetch to black list (#37123) · 63c8c8c2
  Committed by JingZhuangzhuang on Nov 11, 2021
  
  63c8c8c2
- [Paddle-Inference] fix_qkv_plugin: fix half scale (#37096) · 36154ba9
  Committed by Wangzheee on Nov 12, 2021
```
* fix_qkv_plugin: half_scale

* [Paddle-Inference] fix_qkv_plugin: fix half scale
```
  36154ba9
11 Nov, 2021 · 3 commits
- Added softplus + activation oneDNN fuse pass (#36657) · a346c4dc
  Committed by jakpiase on Nov 11, 2021
```
* added softplus + activation fuse pass

* minor change

* implemented reviewer suggestion

* minor fix

* minor fix

* added scale_out parameter

* minor fix

* fix for iScan CI

* conditionally disabled logs

* refactored pass builder
```
  a346c4dc
- op_teller: add all convert_op to int8 (#37099) · 1580eae2
  Committed by Wangzheee on Nov 11, 2021
  
  1580eae2
- Enable FC int8 (#37078) · 498dbfa8
  Committed by Jacek Czaja on Nov 10, 2021
  
  498dbfa8
09 Nov, 2021 · 1 commit
- fix bugs when build in windows with_inference_api_test=on (#36973) · fd15477f
  Committed by Sing_chan on Nov 09, 2021
  
  fd15477f
04 Nov, 2021 · 1 commit
- Fix TRT7 unit test bug under TRT8 (#36984) · bf9374c1
  Committed by wangye707 on Nov 04, 2021
  
  bf9374c1
02 Nov, 2021 · 1 commit
- [Need review] Added conv + hard_sigmoid oneDNN fuse pass (#36869) · 53690719
  Committed by jakpiase on Nov 02, 2021
```
* added conv + hard_sigmoid fuse pass

* Removed IsOptional() statements

* Reverted removing optional
```
  53690719
01 Nov, 2021 · 2 commits

disable int8 if there is no quant info (#36900) · 9dd442ab

Committed by wenbin on Nov 01, 2021
```
* disable int8

* size_t to int
```
9dd442ab

Paddle Tensor Operation Library initial implementation (#34425) · b9fdd3bc

Committed by Chen Weihang on Nov 01, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rocm compile failed

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar macro

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & naming rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add macro for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflict

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key construct

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for checking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unittests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unittests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflict

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflict with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details
Co-authored-by: chentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: zyfncg <1370305206@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

b9fdd3bc

29 Oct, 2021 · 2 commits
- fix matmul error when input's dim is 3 (#36849) · f6b4ed22
  Committed by baoachun on Oct 29, 2021
  
  f6b4ed22
- fix dcnv2 trt8 compile error (#36850) · 82fb63eb
  Committed by wangxinxin08 on Oct 29, 2021
  
  82fb63eb
28 Oct, 2021 · 1 commit
- change api to support trt8 in pool3d_op_convert (#36783) · a7d8837b
  Committed by feng_shuai on Oct 28, 2021
```
* change api for support trt8

* fix:change api
```
  a7d8837b
27 Oct, 2021 · 4 commits
- add dcnv2 trt plugin (#36612) · 8c3decd8
  Committed by wangxinxin08 on Oct 27, 2021
```
* add dcnv2 plugin
```
  8c3decd8
- fix ernie serialize problem (#36769) · d6b1beb0
  Committed by zlsh80826 on Oct 27, 2021
  
  d6b1beb0
- add matmul_v2 to v1 CPU pass and fix matmul dim error (#36731) · d5245a35
  Committed by baoachun on Oct 27, 2021
```
* fix matmul dim error

* fix wrong dim check in matmul
```
  d5245a35
- enable trt test check and fix trt ut error (3/3) (#36581) · 8c1c72af
  Committed by Wilber on Oct 27, 2021
  
  8c1c72af
26 Oct, 2021 · 3 commits

fix wrong trt dim when input dim is 2 (#36614) · 43dcf235

Committed by baoachun on Oct 26, 2021

* fix wrong trt dim when input dim is 2

* update leaky_relu and instance_norm converter unit test

* add instance_norm input dim check

43dcf235

[Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, matmul,... · 93c591e2

Committed by Wangzheee on Oct 26, 2021

[Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, matmul, mul) convert pass, fix (matmul, mul) op_teller (#36652)

* new_Matmul2ToMatmulToMul

* new_Matmul2ToMatmulToMul

* fix paddle_pass_builder

* fix paddle_pass_builder

* fix paddle_pass_builder

* tem

* tem

* Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass

* Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass

* add matmul_broadcast_unittest

* fix op_teller

93c591e2

Pool3d 2.0 (#36545) · 229bae81

Committed by feng_shuai on Oct 26, 2021

229bae81

23 Oct, 2021 · 3 commits

disable padding if dynamic shape (#36648) · 99e396f8

Committed by wenbin on Oct 23, 2021
```
* disable padding if dynamic shape

* add parentheses

* correct
```
99e396f8
add file exists check (#36628) · 425db7c8

Committed by Wilber on Oct 23, 2021
```
* add file check

* add ut
```
425db7c8

New Paddle-CINN Compile PR (#36584) · ab732884

Committed by Huihuang Zheng on Oct 23, 2021

This PR adds changes to match CINN's updated compilation flow. It also tries to fix JiangCheng's problem in PR: https://github.com/PaddlePaddle/Paddle/pull/36100

These changes include:
1. Set `CINN_GIT_TAG` to a newer tag
2. CINN is now built with just `make cinnapi -j`
3. We have to add `-DPY_VERSION=${PY_VERSION} -DWITH_TESTING=ON` to CINN cmake args
4. For CINN's third-party dependencies, we can just include headers without `target_link_libraries`
5. Moved `cinn.cmake` from `paddle/cmake` to `paddle/cmake/external` to match the old style. The `external` folder contains `lite`, which is at the same level as `cinn`
6. CINN added `-DNAMESPACE=cinn_gflags` in `gflags.cmake` so that CINN and Paddle use different gflags namespaces. This solves the redefinition problem.
7. Change namespace of `::google::` in gflags to `::GFLAGS_NAMESPACE`

ab732884

22 Oct, 2021 · 2 commits
- correct slice serialize data (#36588) · 5e880840
  Committed by wenbin on Oct 22, 2021
```
* slice

* add UT
```
  5e880840
- support lite xpu choose device id (#36610) · f46311b0
  Committed by Wilber on Oct 22, 2021
  
  f46311b0
21 Oct, 2021 · 1 commit

Added matmul_v2+transpose+reshape fuse pass (#36481) · 856cb9c5

Committed by jakpiase on Oct 21, 2021

* added base changes for matmul_v2+trans+resh fuse pass

* added full matmul_v2+transpose+reshape pass

* removed a file added by mistake

* added reviewers suggestions

* Changed ops type in checking compatibility version

* Deleted one statement

856cb9c5

20 Oct, 2021 · 2 commits

Add FasterTokenizer Operator (#34491) · 3f2d6a3f

Committed by Steffy-zxf on Oct 20, 2021

Add Tokenizer-related functionality for Transformer models so that the training and prediction processes are consistent.

* support the text string as an input Tensor
* support the "VOCAB" `unordered_map<wstring, int>` as an input Tensor to look up tokens
* Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
* It first applies basic tokenization, followed by wordpiece tokenization.
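The two stages described above (basic tokenization, then wordpiece tokenization) can be sketched roughly as follows. This is a simplified Python illustration, not the operator added by this commit: the function names, the tiny `VOCAB` dictionary, and the `[UNK]` fallback are assumptions made for the example, and details such as accent stripping, CJK handling, and maximum word length are omitted.

```python
import unicodedata


def basic_tokenize(text):
    """Lowercase the text, split on whitespace, and isolate punctuation."""
    tokens, current = [], []
    for ch in text.lower():
        if ch.isspace():
            if current:
                tokens.append("".join(current))
                current = []
        elif unicodedata.category(ch).startswith("P"):
            # Punctuation ends the current word and becomes its own token.
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No prefix of the remainder is in the vocabulary.
            return [unk_token]
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """Map a text string to token ids: basic split, then wordpiece lookup."""
    ids = []
    for word in basic_tokenize(text):
        for piece in wordpiece_tokenize(word, vocab):
            ids.append(vocab[piece])
    return ids


# Hypothetical toy vocabulary, standing in for the "VOCAB" map input.
VOCAB = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "hello": 4, "!": 5}
print(tokenize("Hello unaffable!", VOCAB))
```

Running both stages in one function is what keeps training and prediction consistent: the same text always maps to the same ids regardless of which pipeline calls it.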

3f2d6a3f

add unittest (#36371) · 7325c9fb

Committed by Wilber on Oct 20, 2021

7325c9fb

PaddlePaddle / Paddle · synced successfully about 1 year ago