Commits · 8895379a0a9d1223480071d97befe71876272623 · BaiXuePrincess / Paddle

25 Feb, 2022 · 1 commit

[Phi] Support cudnn kernel moving & move softmax kernels (#39547) · 8895379a

Committed by Chen Weihang on Feb 25, 2022

* support cudnn kernel moving

* polish cmake rules

* add unittest for coverage

* remove orig kernel

* remove softmax cudnn kernel

* fix softmax test failed

* fix npu func error

* resolve conflict

* rename gpu dnn kernels

* fix name rule error

* fix compile error

* update fp16 namespace

23 Mar, 2021 · 1 commit

fix tensorrt output variable reshape (#31733) · 9d04ef73

Committed by Shang Zhizhou on Mar 23, 2021

* fix tensorrt output variable reshape

* move padding shape x 1 x 1 in ernie to qkv and fc

* update layer name

* fix softmax when input is dynamic, fc not padding any more

* fix varlen

* move fc x_dim assert to op_teller

28 Sep, 2020 · 1 commit

Add unittests and OP version registry for tensorrt_subgraph_pass (#27544) · ae6e40a7

Committed by Pei Yang on Sep 28, 2020

* add unittests and op version register for tensorrt_subgraph_pass

* rename to test_trt_subgraph_pass.py

* fix softmax converter diff when padding dim=1

24 Sep, 2020 · 1 commit

use iwyu clean include (#27267) · df43905f

Committed by wanghuancoder on Sep 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

10 Feb, 2020 · 1 commit

[Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3... · 54a325a5

Committed by Zhaolong Xing on Feb 10, 2020

[Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3 models for Inference. (#22483)

* add int8 op teller for trt.

* refine trt int8

* add int8 op teller for trt.
test=develop

25 May, 2019 · 1 commit

TRT: Support set dynamic range in int8 mode. (#17524) · 61221ebc

Committed by Zhaolong Xing on May 25, 2019

* fluid int8 train and trt int8 predict align.
trt int8 predict init
op converter

* 2. align fluid int8 train and trt int8 inference.
enhance quant dequant fuse pass
enhance op converter, trt engine, trt engine op, trt subgraph pass.

* 3. add delete_quant_dequant_pass for trt

test=develop

* 4. add the missing file
test=develop

* 5. I modified the C++ interface but forgot to modify the pybind code;
fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter
test=develop

12 Nov, 2018 · 1 commit
- add serial to trt test and do not print log for unused trt logs · d6ff0069
  Committed by nhzlx on Nov 12, 2018
08 Nov, 2018 · 1 commit
- Change the origin VLOG level to 10 times · 0c3227a5
  Committed by minqiyang on Nov 08, 2018
```
Fix code to support cpplint syntax check

test=develop
```
09 Aug, 2018 · 1 commit
- add softmax op converter · 641f32da
  Committed by nhzlx on Aug 09, 2018
25 Jul, 2018 · 1 commit
- unify libpaddle_inference_api into libpaddle_fluid · 5ba43376
  Committed by Luo Tao on Jul 25, 2018
24 Jul, 2018 · 2 commits
- fix comments · 4d49e61a
  Committed by nhzlx on Jul 24, 2018
- 1. set ut batch > 1; 2. re-add the mul op (utest will be added later) · 7382f986
  Committed by nhzlx on Jul 24, 2018
07 Jun, 2018 · 2 commits
- add test_mode in trt/activation_op · f6fb51a1
  Committed by Luo Tao on Jun 07, 2018
- feature/trt engine op test (#11182) · 4f95bc94
  Committed by Yan Chunwei on Jun 07, 2018
06 Jun, 2018 · 1 commit
- rewrite unittest of trt_activation_op · e116129f
  Committed by Luo Tao on Jun 06, 2018
01 Jun, 2018 · 1 commit
- fix compile errors · 31f0533c
  Committed by fengjiayi on Jun 01, 2018
14 May, 2018 · 1 commit
- OpConverter change BlockDesc to proto::BlockDesc (#10623) · 674bd839
  Committed by Yan Chunwei on May 14, 2018
03 May, 2018 · 1 commit
- add relu converter and unit-test · beb12455
  Committed by Luo Tao on May 03, 2018
27 Apr, 2018 · 1 commit
- update the register method · 6f6f3304
  Committed by Luo Tao on Apr 27, 2018
25 Apr, 2018 · 2 commits
- use template to do registry · c4e3010b
  Committed by Luo Tao on Apr 25, 2018
- auto registry op converters · d599de5c
  Committed by Luo Tao on Apr 25, 2018
23 Apr, 2018 · 1 commit
- tensorrt convert init · 42febfa9
  Committed by Luo Tao on Apr 23, 2018
26 Feb, 2018 · 2 commits
- Fix version date. · 9bbce493
  Committed by Xin Pan on Feb 26, 2018
- Extend current profiler for timeline and more features. · b9ec24c6
  Committed by Xin Pan on Feb 24, 2018
12 Feb, 2018 · 1 commit
- Fix the grammar in copyright. (#8403) · 24509f4a
  Committed by qingqing01 on Feb 12, 2018
10 Feb, 2018 · 2 commits
- Correct #include path · fc374821
  Committed by Yi Wang on Feb 09, 2018
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Committed by Yi Wang on Feb 09, 2018
09 Jan, 2018 · 1 commit

Port WarpCTC Operator (#5107) · b5fda272

Committed by Yiqun Liu on Jan 09, 2018

* Add Seq2BatchFunctor, which will be used in WarpCTCOp.

* Implement WrapCTCFunctor and WrapCTCKernel.

* Add unittest of warpctc_op.

* Modify the check_output interface in the python unittest framework to allow checking a subset of outputs.

* Use absolute offset lod in warpctc_op and related functors.

* Refine the comments of warpctc_op.

* The new python unittest supports checking a subset of the outputs, so revoke the previous change.

* Rename the transform from LoDTensor to Tensor with shape [max_sequence_length, num_sequences, sequence_width] to PaddingSequenceFunctor.

* Update to the newest codes.

* Rename the PaddingSequenceFunctor to PaddingLoDTensorFunctor and remove the computation of dimensions out of the functors.

04 Aug, 2017 · 1 commit
- Add cpplint for *.h and cuda *.cu · b58725bd
  Committed by liaogang on Aug 04, 2017
11 Jul, 2017 · 1 commit
- Refine CUDA Related libraries · a0466053
  Committed by Yu Yang on Jul 11, 2017

BaiXuePrincess / Paddle is in sync with its fork source project