- 05 Jun, 2020 1 commit
-
-
Committed by Yuan Shuai
* [LITE][PROFILER] Fix unit test segfault when profiler on. test=develop
-
- 28 May, 2020 1 commit
-
-
Committed by T8T9
* reduce .so size. test=develop * compile all targets when LITE_ON_TINY_PUBLISH=OFF * unordered_map is more convenient when key is customized class * test=develop
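The note about `unordered_map` and custom key classes can be illustrated with a minimal sketch. The `KernelKey` struct, its fields, and the `KernelKeyHash` functor below are hypothetical names chosen for illustration and are not taken from the repository; the point is only that `std::unordered_map` needs an equality operator and a hash functor for the key, with no ordering required.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type (an assumption, not the project's real class).
struct KernelKey {
  std::string op_type;
  int target;

  // unordered_map compares keys for equality; no operator< is needed.
  bool operator==(const KernelKey& other) const {
    return op_type == other.op_type && target == other.target;
  }
};

// Hash functor that combines the hashes of both fields.
struct KernelKeyHash {
  std::size_t operator()(const KernelKey& k) const {
    const std::size_t h1 = std::hash<std::string>{}(k.op_type);
    const std::size_t h2 = std::hash<int>{}(k.target);
    return h1 ^ (h2 << 1);
  }
};

// Map from the custom key to a kernel name.
using KernelTable = std::unordered_map<KernelKey, std::string, KernelKeyHash>;
```

With these two pieces in place, a `KernelTable` can be indexed directly with `KernelKey{"conv2d", 0}`.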
-
- 22 May, 2020 1 commit
-
-
Committed by huzhiqiang
-
- 18 May, 2020 2 commits
-
-
Committed by Yuan Shuai
[LITE][OPENCL] Enhance Profiler for OpenCL with in/out/filter shape, macs/macs_ps, real backend kernel etc. (#3641) * [LITE][OPENCL] Enhance Precision Profiler for OpenCL. test=develop
-
Committed by huzhiqiang
-
- 15 Apr, 2020 1 commit
-
-
Committed by MaxwellDing
refactor(*): reduce Wsign-compare warning
-
- 13 Apr, 2020 1 commit
-
-
Committed by Wilber
lite cuda support exec multi-stream
-
- 03 Apr, 2020 1 commit
-
-
Committed by Yuan Shuai
* split precision profiler from performance profiler. test=develop
-
- 31 Mar, 2020 1 commit
-
-
Committed by huzhiqiang
-
- 25 Mar, 2020 1 commit
-
-
Committed by xiaogang
test=develop
-
- 22 Mar, 2020 1 commit
-
-
Committed by Yuan Shuai
* [LITE][OPENCL] clean code for opencl. test=develop * [LITE][PROFILER] Enhance Precision Profiler. test=develop * delete useless var in profiler. test=develop * add ocl header. test=develop
-
- 17 Mar, 2020 1 commit
-
-
Committed by Wilber
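- Add a CUDA C++ demo. - Detection models usually end with multiclass_nms, which is a host kernel; if the fetch kernel were a CUDA kernel, a useless io_copy (host->cuda) would be inserted at that point. For this reason the CUDA fetch kernel is commented out and the host fetch kernel is used by default. Implicit behavior here: after every predictor run, the data is copied from CUDA to CPU by default.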
-
- 20 Feb, 2020 1 commit
-
-
Committed by Wilber
Optimize cuda kernel and remove io_copy added by default due to missing fetch_cuda kernel
-
- 14 Feb, 2020 1 commit
-
-
Committed by xiaogang
fix fpga lite_tensor compile bug; add fake quantize_abs_max op. test=develop
-
- 30 Dec, 2019 1 commit
-
-
Committed by Yiqun Liu
Optimize the execution of RuntimeProgram by saving the bool whether the op is feed/fetch op. (#2703) test=develop
-
- 27 Dec, 2019 1 commit
-
-
Committed by 石晓伟
-
- 23 Dec, 2019 1 commit
-
-
Committed by HappyAngel
-
- 19 Dec, 2019 1 commit
-
-
Committed by yiicy
* [ARM] change global pooling choose kernel policy, test=develop
-
- 16 Dec, 2019 1 commit
-
-
Committed by 石晓伟
* update profiler, test=develop * warm up times of profiler, test=develop
-
- 13 Dec, 2019 1 commit
-
-
Committed by hong19860320
[LITE][NPU][XPU] Refine subgraph pass, and support NPU/XPU model generation at execution time (#2576)
-
- 10 Dec, 2019 1 commit
-
-
Committed by Wilber
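Changed the kernel selection logic: the data type of each lod_tensor is now read from the model file by default, and in the static_kernel_pick pass a kernel whose input and output types exactly match the data types read is more likely to be selected. - Read the lod_tensor data type from the model file __model__ into cpp::vardesc - Add an unordered_map<string, type> field to program and assign it in Program::PrepareWorkspace - Modify node.h, changing const Type* to Type*, and assign the qualifying type* during SSAGraph::Build - Add a new rule to static_kernel_pick_pass: if the kernel's input and output types match the types stored in __model__, then score*=2. - Support models that use the sequence_reverse_float kernel (float inputs and outputs) and the sequence_reverse_int64 kernel (int64 inputs and outputs); the kernel can be selected according to the input/output type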
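The score-doubling rule described in this commit can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual static_kernel_pick_pass code: `DataType`, `KernelCandidate`, `ScoreKernel`, and `PickKernel` are hypothetical stand-ins, and the base score of 1.0 used in the usage note is arbitrary.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins; not Paddle-Lite's real classes.
enum class DataType { kFloat, kInt64 };

struct KernelCandidate {
  std::string name;
  // Declared data type for each input/output variable of the kernel.
  std::unordered_map<std::string, DataType> arg_types;
  double base_score;
};

// Doubles the score when every declared argument type equals the type
// recorded for that variable in the model (the "score *= 2" rule).
double ScoreKernel(const KernelCandidate& kernel,
                   const std::unordered_map<std::string, DataType>& model_types) {
  double score = kernel.base_score;
  bool all_match = !kernel.arg_types.empty();
  for (const auto& arg : kernel.arg_types) {
    const auto it = model_types.find(arg.first);
    if (it == model_types.end() || it->second != arg.second) {
      all_match = false;
      break;
    }
  }
  if (all_match) score *= 2;
  return score;
}

// Returns the highest-scoring candidate, or nullptr if there are none.
const KernelCandidate* PickKernel(
    const std::vector<KernelCandidate>& candidates,
    const std::unordered_map<std::string, DataType>& model_types) {
  const KernelCandidate* best = nullptr;
  double best_score = 0.0;
  for (const auto& kernel : candidates) {
    const double score = ScoreKernel(kernel, model_types);
    if (best == nullptr || score > best_score) {
      best = &kernel;
      best_score = score;
    }
  }
  return best;
}
```

Given a model that records int64 for both variables, a sequence_reverse_int64 candidate with base score 1.0 ends up with 2.0 and wins over a sequence_reverse_float candidate that stays at 1.0.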
-
- 07 Dec, 2019 1 commit
-
-
Committed by juncaipeng
* add arm split lod tensor, test=develop * add arm merge lod tensor, test=develop * update split merge lod tensor, test=develop * add reduce_prob op, test=develop * support mask_rcnn succeed, test=develop
-
- 04 Dec, 2019 1 commit
-
-
Committed by 石晓伟
-
- 30 Oct, 2019 1 commit
-
-
Committed by Yuan Shuai
* [LOG] macro for vlog. test=develop
-
- 24 Oct, 2019 1 commit
-
-
Committed by liu zhengxi
* make inceptionv4, resnet50, googlenet run on x86 platform and fix the compare part in x86 unittests, test=develop * fix googlenet tests for benchmark record, test=develop * [framework][profile] fix profile dump bug when op is feed and fetch test=develop (sangoly)
-
- 16 Oct, 2019 1 commit
-
-
Committed by Zhaolong Xing
* init: delete feed and fetch op, using zero copy test=develop * delete the unused test test=develop
-
- 27 Sep, 2019 2 commits
-
-
Committed by Zhaolong Xing
* add conv int8 support (for the case where the input or output channel count is not a multiple of 4); add add_kernel for cuda. * can run yolov3 fp32 test=develop * 1. fix bug with yolov3 run test=develop
-
Committed by sangoly
-
- 19 Sep, 2019 1 commit
-
-
Committed by TianXiaogang
* fix: fix model parser and save bug * style: delete debug code * fix: fix light_predictor program run model with subblock bug
-
- 01 Sep, 2019 1 commit
-
-
Committed by Yuan Shuai
* Fix timer of arm cpu profiler. test=develop * Fix un-added op in cmake.test=develop * fix cmake error * fix cmake error, test=develop * Fix pass sequence. test=develop * replace option with lite_option. test=develop * disable profile mode by default. test=develop * Fix error option name. test=develop
-
- 30 Aug, 2019 1 commit
-
-
Committed by Zhen Wang
* Add precision and persistable attrs for the tensor. And fix cxx light and full api demo. * update precision2string methods. test=develop * move the save logic to the front of the run in mobilenetv1_full_api.cc, test=develop. * add comments for UpdateVarsOfProgram. test=develop
-
- 16 Aug, 2019 1 commit
-
-
Committed by Yan Chunwei
-