Commits · 2f6d5f9e5ef9cd9ba02114855273dee40ac1774d · PaddlePaddle / Paddle-Lite

17 Oct, 2019 2 commits

Committed by HappyAngel on Oct 17, 2019

* update con_dw

* update

* add conv_depthwise_3x3s1.cc and conv_depthwise_3x3s2.cc

* add conv_depthwise_3x3s1_fp32 and conv_depthwise_3x3s2_fp32

* add new conv_dw

* only support conv_dw pad=0, 1

* add conv_dw_s1 conv_dw_s2 fp32

*     //conv2_func _impl2{nullptr};
update conv_dw, add conv_3x3s1 and conv_3x3s2, pad=[0,1]

* fix format, test=develop

* fix format, test=develop

2f6d5f9e

enable batch_norm op and add its unit tests, test=develop (#2201) · a3241ca7
Committed by liu zhengxi on Oct 17, 2019

a3241ca7

16 Oct, 2019 5 commits
- Ban feed and fetch op during inference (#2198) · 75e8a6fc
  Committed by Zhaolong Xing on Oct 16, 2019
```
* init: delete feed and fetch op, using zero copy
test=develop

* delete the unused test
test=develop
```
  75e8a6fc
- Open merge_cl_to_so switch and delete -I(cl_path) build option. test=develop (#2206) · 781d8191
  Committed by Jiaying Zhao on Oct 16, 2019
  
  781d8191
- enable conv2d op and its unit tests, test=develop (#2200) · 459848c4
  Committed by liu zhengxi on Oct 16, 2019
```
enable conv2d op and its unit tests on x86 device
```
  459848c4
- support global pooling ... test=develop (#2204) · b963383a
  Committed by xiebaiyuan on Oct 16, 2019
  
  b963383a
- [framework][place] remove prefered_place and kHost in valid_places (#2192) · 3012088b
  Committed by sangoly on Oct 16, 2019
```
* [framework][place] remove prefered_place, use place order in valid_place array instead test=develop

* remove kHost from valid_places test=develop
```
  3012088b
15 Oct, 2019 6 commits

Fix quant dequant fuse pass (#2190) · 9cc7dfa8
Committed by juncaipeng on Oct 15, 2019
```
* fix bug for accessing the removed node, test=develop
```
9cc7dfa8
fix benchmark, test=develop (#2188) · 4d530acc
Committed by juncaipeng on Oct 15, 2019
```
* fix benchmark, test=develop
```
4d530acc
fix persistable test=develop (#2191) · 435f942b
Committed by Yanzhan Yang on Oct 15, 2019

435f942b
fix pass selection, test=develop (#2187) · da55f674
Committed by 石晓伟 on Oct 15, 2019

da55f674

[LITE][OPENCL] Fix layout, target pass for OpenCL, add macro of... · 72c11758

Committed by Yuan Shuai on Oct 15, 2019

[LITE][OPENCL] Fix layout, target pass for OpenCL, add macro of CONVERT_TYPE_TO and READ/WRITE image, memory reuse in ResetLazyImage2D (#2170)

* add macro of CONVERT_TYPE_TO and READ/WRITE image. test=develop

* add data type control. test=develop

* fix io op as general layout and precision. test=develop

* Fix memory reuse strategy for opencl image2d. test=develop

* remove std::array, std::map in about opencl backend. test=develop

72c11758

[NPU] Fix and refine the supporting of multi NPU models (#2037) · 7a731b7f

Committed by hong19860320 on Oct 15, 2019

* [NPU] Fix the bug of loading multi NPU models
test=develop

* [NPU] Use lite tensor to store NPU model, fix the management of multi NPU models, support loading NPU model from memory and reduce the modification of framework
test=develop

* [NPU] Remove redundant header files for NPU bridges,
test=develop

* [NPU] fix NPU deps
test=develop

* [NPU] refine the compiling script for NPU
test=develop

* [NPU] remove redundant subdirectory in lite/CMakeLists.txt
test=develop

* [NPU] Fix and refine NPU test case
test=develop

* [NPU] revoke the modification of other non-NPU modules
test=develop

* [NPU] Remove NPU bridges if target is tiny publish
test=develop

7a731b7f

14 Oct, 2019 5 commits
- fix bug for reshape op, test=develop (#2141) · 421c6305
  Committed by juncaipeng on Oct 14, 2019
```
* fix bug for reshape op, test=develop
```
  421c6305
- align yolov3 cuda int8 (#2183) · 80d35725
  Committed by Zhaolong Xing on Oct 14, 2019
```
test=develop
```
  80d35725
- add GetInputNames, GetOutPutNames, GetInputByName and GetTensor method (#2154) · 56151776
  Committed by huzhiqiang on Oct 14, 2019
```
* add GetInputNames and GetOutPutNames and GetInputByName method test=develop
```
  56151776
- fix asr model related kernel bugs test=develop (#2179) · 792d898a
  Committed by lijianshe02 on Oct 14, 2019
```
* fix asr model related kernel bugs test=develop
```
  792d898a
- Optimize quant_dequant_fuse_pass (#2169) · 253acb80
  Committed by juncaipeng on Oct 14, 2019
```
* optimize quant_dequant_fuse_pass, test=develop
```
  253acb80
12 Oct, 2019 2 commits
- fix clang compile error. test=develop (#2180) · 508ca98b
  Committed by Jiaying Zhao on Oct 12, 2019
  
  508ca98b
- fix conv_transpose error (#2165) · a6b1e4fa
  Committed by Xiaoyang LI on Oct 12, 2019
```
* fix conv_transpose error

* fix build error, enable basic test of conv_transpose, test=develop
```
  a6b1e4fa
11 Oct, 2019 5 commits

add rsqrt op, test=develop (#2176) · dfce4621
Committed by juncaipeng on Oct 11, 2019

dfce4621
1. fix group logic for convolution op. 2. add pixel shuffle op for OpenCL. (#2178) · 77811367
Committed by Yanzhan Yang on Oct 11, 2019

77811367

CUDA: can run yolov3 int8 (#2172) · 7931104f

Committed by Zhaolong Xing on Oct 11, 2019

* add conv int8 support (for the case where the input or output channel count is not a multiple of 4)
add add_kernel for cuda.

* can run yolov3 fp32
test=develop

* 1. fix bug with yolov3 run
test=develop

* can run yolov3 int8 test=develop

7931104f

move the method of SetThread and SetPowerMode from MobileConfig into ConfigBase (#2147) · 1ae9239e

Committed by huzhiqiang on Oct 11, 2019

* move the method of SetThread and SetPowerMode from MobileConfig into ConfigBase 
* cxxPredictor will support SetThread and SetPowerMode method

1ae9239e

[LITE][OPENCL] support image2d type (#2158) · 77cdbdce

Committed by Yuan Shuai on Oct 11, 2019

* [LITE][OPENCL] support image2d. test=develop

* add context changed with consider image*. test=develop

* add layout, relu image kernels. test=develop

* replace image_data with data, mutable_image_data with mutable_data, test=develop

* comment unused var. test=develop

* remove unused var. test=develop

77cdbdce

10 Oct, 2019 4 commits
- fix yolobox_cuda bug · f4ac2768
  Committed by Wilber on Oct 10, 2019
```
* fix yolobox_cuda bug 
* update code format
```
  f4ac2768
- 1. improve n-fold quantification algorithm by introducing a minimal size for... · 4bad9853
  Committed by Yanzhan Yang on Oct 10, 2019
```
1. improve n-fold quantification algorithm by introducing a minimal size for each fold. 2. automatically search for best n for n-fold algorithm. (#2167)
```
  4bad9853
- [CMAKE] Abandon strip when CMAKE_BUILD_TYPE=Debug (#2155) · 2dc3b4d7
  Committed by Yuan Shuai on Oct 10, 2019
```
* [CMAKE] Abandon strip when CMAKE_BUILD_TYPE=Debug

* Add CMAKE_BUILD_TYPE info in cmake. test=develop
```
  2dc3b4d7
- fix a calc bug in test-mobilenetgpu (#2162) · f4a42296
  Committed by xiebaiyuan on Oct 10, 2019
  
  f4a42296
09 Oct, 2019 5 commits
- add n-fold quantification algorithm (#2164) · 7e9bb98a
  Committed by Yanzhan Yang on Oct 09, 2019
```
* 1. add quantification_fold parameter. 2. support quantification test in run.py.

* implement n-fold quantification
```
  7e9bb98a
- improve dw conv performance · 4b9df8fb
  Committed by yiicy on Oct 09, 2019
```
* improve prepack_input func speed in int8 3x3s1 dw conv

* fix code style

* fix code style

* improve 3x3s1 dw fp32 conv speed a little

* arm add 5x5s1 int8 dw conv, test=develop
```
  4b9df8fb
- open cl mem optimise, split with cpu codes. fix a bug when some memory is not equal 4 . (#2161) · 33c335a5
  Committed by xiebaiyuan on Oct 09, 2019
```
* open cl mem optimise, split with cpu codes.
fix a bug when some memory is not equal 4 .

* open cl mem optimise, split with cpu codes.   fix a bug when some memory is not equal 4  fix bad des
```
  33c335a5
- fix crash in some platform (#2159) · c551e1d6
  Committed by zp7 on Oct 09, 2019
  
  c551e1d6
- paddle mobile runtime cl memory optimise. test=develop (#2160) · 528fd741
  Committed by xiebaiyuan on Oct 09, 2019
  
  528fd741
08 Oct, 2019 1 commit
- add pre and post process in feed and fetch kernel test=develop (#2157) · 5f227934
  Committed by Jiaying Zhao on Oct 08, 2019
  
  5f227934
01 Oct, 2019 1 commit
- [Profile] Add time profile unit flags, ms or us test=develop (#2144) · 69da22ec
  Committed by sangoly on Oct 01, 2019
  
  69da22ec
27 Sep, 2019 4 commits
- set merge_cl_to_so default value 0 (#2145) · ff00083d
  Committed by zp7 on Sep 27, 2019
  
  ff00083d
- can run yolov3 fp32 on cuda devices (#2092) · 3d6d744f
  Committed by Zhaolong Xing on Sep 27, 2019
```
* add conv int8 support (for the case where the input or output channel count is not a multiple of 4)
add add_kernel for cuda.

* can run yolov3 fp32
test=develop

* 1. fix bug with yolov3 run
test=develop
```
  3d6d744f
- [Profile] add kernel runtime profile && add op runtime summary test=develop (#2136) · aa6623b8
  Committed by sangoly on Sep 27, 2019
  
  aa6623b8
- optimize instance norm. test=develop (#2120) · a96d013c
  Committed by NazgulLee on Sep 27, 2019
  
  a96d013c