Commits · 3d0a45c3ab20d170a9fee3d40e1584392425ae27 · PaddlePaddle / Paddle-Lite

07 Jul, 2020 1 commit
- 石
  
  cpp namespace alias, test=develop (#3894) · 4776f8f4
  Committed by 石晓伟 on Jul 07, 2020
  
  4776f8f4
11 Jun, 2020 1 commit
- W
  
  [CUDA] [NVTX] Lite add nvtx to support performance debug. (#3764) · 6299a90a
  Committed by Wilber on Jun 11, 2020
  
  6299a90a
09 Jun, 2020 1 commit
- H
  
  [Parl] Add CxxPredictor->Clone() method (#3759) · 24d37695
  Committed by huzhiqiang on Jun 09, 2020
  
  24d37695
05 Jun, 2020 1 commit
- Y
  [LITE][PROFILE] Fix unit test segfault when profiler on (#3744) · 5169197f
  Committed by Yuan Shuai on Jun 05, 2020
```
* [LITE][PROFILER] Fix unit test segfault when profiler on. test=develop
```
  5169197f
28 May, 2020 1 commit

[Libsize] Reduce size of dynamic library ".so" (#3717) · ec8ef528

Committed by T8T9 on May 28, 2020

* reduce .so size. test=develop

* compile all targets when LITE_ON_TINY_PUBLISH=OFF

* unordered_map is more convenient when key is customized class

* test=develop

ec8ef528

22 May, 2020 1 commit
- H
  
  [cherry-pick][BUG FIX] fix the issue that opt can not convert quantized model (#3683) · f779a5b9
  Committed by huzhiqiang on May 22, 2020
  
  f779a5b9
18 May, 2020 2 commits
- Y
  [LITE][OPENCL] Enhance Profiler for OpenCL with in/out/filter shape,... · 53a6f3bc
  Committed by Yuan Shuai on May 18, 2020
```
[LITE][OPENCL] Enhance Profiler for OpenCL with in/out/filter shape, macs/macs_ps, real backend kernel etc. (#3641)

* [LITE][OPENCL] Enhance Precision Profiler for OpenCL. test=develop
```
  53a6f3bc
- H
  
  [Framework][ModelType] Add Shape&Precision information into optimized model (#3643) · 546d4da8
  Committed by huzhiqiang on May 18, 2020
  
  546d4da8
15 Apr, 2020 1 commit
- M
  refactor(*): reduce Wsign-compare warning (#3391) · 2997b937
  Committed by MaxwellDing on Apr 15, 2020
```
refactor(*): reduce Wsign-compare warning
```
  2997b937
13 Apr, 2020 1 commit
- W
  lite cuda support exec multi-stream. (#2949) · 4a7284f9
  Committed by Wilber on Apr 13, 2020
```
lite cuda support exec multi-stream
```
  4a7284f9
03 Apr, 2020 1 commit
- Y
  [LITE][PROFILER] Split precision profiler from performance profiler (#3305) · 185c7096
  Committed by Yuan Shuai on Apr 03, 2020
```
* split precision profiler from performance profiler. test=develop
```
  185c7096
31 Mar, 2020 1 commit
- H
  
  [operator] add InferShapeImpl method (#3294) · 1f8b5c2b
  Committed by huzhiqiang on Mar 31, 2020
  
  1f8b5c2b
25 Mar, 2020 1 commit
- X
  fix: fix infershape profile (#3240) · 6a0a1f08
  Committed by xiaogang on Mar 25, 2020
```
test=develop
```
  6a0a1f08
22 Mar, 2020 1 commit

[LITE][OPENCL][PROFILE] Enhance precision profile & Clean opencl code (#3227) · 3868be2c

Committed by Yuan Shuai on Mar 22, 2020

* [LITE][OPENCL] clean code for opencl. test=develop

* [LITE][PROFILER] Enhance Precision Profiler. test=develop

* delete useless var in profiler. test=develop

* add ocl header. test=develop

3868be2c

17 Mar, 2020 1 commit

add cuda cxx demo (#3205) · f6461e39

Committed by Wilber on Mar 17, 2020

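- Add a CUDA C++ demo.
- Detection models usually end with multiclass_nms, whose kernel runs on the host. If the fetch kernel were a CUDA kernel, a useless io_copy (host->cuda) would be inserted at that point, so the CUDA fetch kernel is commented out and the host fetch kernel is used by default. Implicit behavior to be aware of: after every predictor run, the data is copied from CUDA to the CPU by default.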

f6461e39

20 Feb, 2020 1 commit
- W
  Optimize cuda kernel and remove io_copy added by default due to missing fetch_cuda kernel (#2920) · 823f0dae
  Committed by Wilber on Feb 20, 2020
```
Optimize cuda kernel and remove io_copy added by default due to missing fetch_cuda kernel
```
  823f0dae
14 Feb, 2020 1 commit
- X
  fix: fix fpga run the feed/fetch op (#2868) · 27ec5deb
  Committed by xiaogang on Feb 14, 2020
```
fix fpga lite_tensor compile bug
     add fake quantize_abs_max op
     test=develop
```
  27ec5deb
30 Dec, 2019 1 commit
- Y
  Optimize the execution of RuntimeProgram by saving the bool whether the op is... · bb1cf7ff
  Committed by Yiqun Liu on Dec 30, 2019
```
Optimize the execution of RuntimeProgram by saving the bool whether the op is feed/fetch op. (#2703)

test=develop
```
  bb1cf7ff
27 Dec, 2019 1 commit
- 石
  
  update profiler, test=develop (#2644) · 9171b70e
  Committed by 石晓伟 on Dec 27, 2019
  
  9171b70e
23 Dec, 2019 1 commit
- H
  
  [lite][arm]fix model_optimize bug, update concat and split op, speed up (#2620) · 6946ca23
  Committed by HappyAngel on Dec 23, 2019
  
  6946ca23
19 Dec, 2019 1 commit
- Y
  [ARM] change global pooling choose kernel policy, test=develop (#2602) · 49f03648
  Committed by yiicy on Dec 19, 2019
```
* [ARM] change global pooling choose kernel policy, test=develop
```
  49f03648
16 Dec, 2019 1 commit
- 石
  update profiler, test=develop (#2607) · af37a14f
  Committed by 石晓伟 on Dec 16, 2019
```
* update profiler, test=develop

* warm up times of profiler, test=develop
```
  af37a14f
13 Dec, 2019 1 commit
- H
  [LITE][NPU][XPU] Refine subgraph pass, and support NPU/XPU model generation at... · d5434aa2
  Committed by hong19860320 on Dec 13, 2019
```
[LITE][NPU][XPU] Refine subgraph pass, and support NPU/XPU model generation at execution time (#2576)
```
  d5434aa2
10 Dec, 2019 1 commit

modify static_kernel_pass to support select the kernel according to input type (#2488) · 7ef0e7fe

Committed by Wilber on Dec 10, 2019

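Changes the kernel selection logic: the data type of each lod_tensor is now read from the model file by default, and in the static_kernel_pick pass a kernel whose input and output types exactly match the data types read from the model is more likely to be selected.

- Read the lod_tensor data type from the model file __model__ into cpp::vardesc.

- Add an unordered_map<string, type> field to program and assign it in Program::PrepareWorkspace.

- Modify node.h, changing const Type* to Type*, and assign a value to each qualifying type* during SSAGraph::Build.

- Add a new rule to static_kernel_pick_pass: if a kernel's input and output types match the types stored in __model__, then score*=2.

- Support models that use the sequence_reverse_float kernel (float inputs and outputs) and the sequence_reverse_int64 kernel (int64 inputs and outputs), so that the kernel can be selected according to the input and output types.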
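
The scoring rule described in this commit message can be illustrated with a small, language-neutral sketch. It is written in Python for brevity and is not Paddle-Lite's C++ code: the `KernelCandidate` class, the `score_kernel` and `pick_kernel` helpers, the base scores, and the variable names `X` and `Out` are all hypothetical. Only the "double the score when every input and output type matches the model" rule comes from the message.

```python
from dataclasses import dataclass, field


@dataclass
class KernelCandidate:
    """Hypothetical stand-in for a kernel registered for an op."""
    name: str
    base_score: float
    # Variable name -> data type the kernel declares for it.
    input_types: dict = field(default_factory=dict)
    output_types: dict = field(default_factory=dict)


def score_kernel(kernel, model_var_types):
    """Double the score when every declared type matches the model's type."""
    declared = {**kernel.input_types, **kernel.output_types}
    all_match = bool(declared) and all(
        model_var_types.get(var) == dtype for var, dtype in declared.items()
    )
    return kernel.base_score * 2 if all_match else kernel.base_score


def pick_kernel(candidates, model_var_types):
    """Return the highest-scoring candidate."""
    return max(candidates, key=lambda k: score_kernel(k, model_var_types))


# Types as they might be read from the model file (assumed values).
model_var_types = {"X": "int64", "Out": "int64"}

candidates = [
    KernelCandidate("sequence_reverse_float", 1.0,
                    {"X": "float"}, {"Out": "float"}),
    KernelCandidate("sequence_reverse_int64", 1.0,
                    {"X": "int64"}, {"Out": "int64"}),
]

best = pick_kernel(candidates, model_var_types)
print(best.name)  # sequence_reverse_int64
```

With equal base scores, the int64 kernel wins only because its declared types match the model. In a real pass the base score would already reflect other factors such as target and precision, so doubling makes a match more likely to win rather than guaranteed.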

7ef0e7fe

07 Dec, 2019 1 commit

Support mask_rcnn (#2484) · c2f72cb3

Committed by juncaipeng on Dec 07, 2019

* add arm split lod tensor, test=develop

* add arm merge lod tensor, test=develop

* update split merge lod tensor, test=develop

* add reduce_prob op, test=develop

* support mask_rcnn succeed, test=develop

c2f72cb3

04 Dec, 2019 1 commit
- 石
  
  refactor profile tools, test=develop (#2536) · 8a634b71
  Committed by 石晓伟 on Dec 04, 2019
  
  8a634b71
30 Oct, 2019 1 commit
- Y
  [BugFix] Fix program vlog (#2294) · e353fab5
  Committed by Yuan Shuai on Oct 30, 2019
```
* [LOG] macro for vlog. test=develop
```
  e353fab5
24 Oct, 2019 1 commit

Make inceptionv4, resnet50, googlenet can run on x86 paltform (#2250) · edb4ea9a

Committed by liu zhengxi on Oct 24, 2019

* make inceptionv4, resnet50, googlenet can run on x86 paltform and fix the compare part in x86 unittests, test=develop

* fix googlenet tests for benchmark record, test=develop

* [framework][profile] fix profile dump bug when op is feed and fetch test=develop (sangoly)

edb4ea9a

16 Oct, 2019 1 commit
- Z
  Ban feed and fetch op during inference (#2198) · 75e8a6fc
  Committed by Zhaolong Xing on Oct 16, 2019
```
* init: delete feed and fetch op, using zero copy
test=develop

* delete the unused test
test=develop
```
  75e8a6fc
27 Sep, 2019 2 commits
- Z
  can run yolov3 fp32 on cuda devices (#2092) · 3d6d744f
  Committed by Zhaolong Xing on Sep 27, 2019
```
* add conv int8 support(in condition which the input or output channel not be the times of 4)
add add_kernel for cuda.

* can run yolov3 fp32
test=develop

* 1. fix bug with yolov3 run
test=develop
```
  3d6d744f
- S
  
  [Profile] add kernel runtime profile && add op runtime summary test=develop (#2136) · aa6623b8
  Committed by sangoly on Sep 27, 2019
  
  aa6623b8
19 Sep, 2019 1 commit

Bug fix for model save and load (#1992) · 8efbdc66

Committed by TianXiaogang on Sep 19, 2019

* fix: fix model parser and save bug

* style: delete debug code

* fix: fix light_predictor program run model with subblock bug

8efbdc66

01 Sep, 2019 1 commit

[ARM][CPU] Fix time counter of arm cpu profiler (#1925) · e3fb95ae

Committed by Yuan Shuai on Sep 01, 2019

* Fix timer of arm cpu profiler. test=develop

* Fix un-added op in cmake.test=develop

* fix cmake error

* fix cmake error, test=develop

* Fix pass sequence. test=develop

* replace option with lite_option. test=develop

* disable profile mode by default. test=develop

* Fix error option name. test=develop

e3fb95ae

30 Aug, 2019 1 commit

add precision and persistable attrs for the tensor. (#1899) · e2e07fa4

Committed by Zhen Wang on Aug 30, 2019

* Add precision and persistable attrs for the tensor. And fix cxx light and full api demo.

* update precision2string methods. test=develop

* move the save logic to the front of the run in mobilenetv1_full_api.cc, test=develop.

* add comments for UpdateVarsOfProgram. test=develop

e2e07fa4

16 Aug, 2019 1 commit
- Y
  
  publish lite (#1800) · 699d6cd0
  Committed by Yan Chunwei on Aug 16, 2019
  
  699d6cd0