Commits · github/fork/lijianshe02/lite-x86 · PaddlePaddle / Paddle-Lite

22 Oct, 2019 5 commits

add asr related kernel test=develop · cd65cb74
Authored by lijianshe02 on Oct 22, 2019

cd65cb74
add asr related kernel test=develop · d5e58a98
Authored by lijianshe02 on Oct 22, 2019

d5e58a98

Optimize quant_dequant (#2215) · f480d474

Authored by juncaipeng on Oct 22, 2019

* Add DeleteQuantOpFuser
* Add fake_quantize_dequantize_moving_avg_abs_max_op
* Add DeleteQuantDequantOpFuser

f480d474

remove feed and fetch for npu subgraph pass (#2230) · 4e05ea29
Authored by zhupengyang on Oct 22, 2019
```
test=develop
```
4e05ea29

Transformer pr (#2214) · f0a6c1eb

Authored by TianXiaogang on Oct 22, 2019

* feat: add beam_search_special function for support nlp model

* fix: add beam_search_compute kernel input and output

* feat: add assign op & copy_compute kernel

* feat: add fill_const_batch_size_like op & kernel

* feat: add layer_norm op and kernel and ut

* fix: fix some bugs
    fix mul_op infer_shape bug when x_dim_idx = 2, x_dims.size()=3 & y_dim_idx = 1, y_dims.size()=2
    fix elementwise_compute bug when y axis is all 1
    fix beam_search choose math_func wrong bug
    fix layer_norm get attr bug
    fix fill_constant_batch_size_like shape_set bug

* feat: add gather op and kernel & and transform ut

* feats: add ops and fix bugs to support transformer op
       fix type_cast passes to skip `while`
       fix elementwise infer_shape bug when x.dims=3 and y.dims={1} & axis=0
       fix lookup_table compute bug
       fix read_from_array/beam_search/increment/compate/gather ops data_type problems

* fix:
    transfomer ut add word read inferface
    fix copy/gather/norm/layer_norm include path problem

* fix:debug info

* fix: fix input reshape bug

* fix: fix norm bug

* style: style fix & test=develop

* style: fix operators cmakelist

* style: fix operators cmakelist; test=develop

* fix and test=develop

* fix and test=develop

* style: style fix; test=develop

f0a6c1eb

21 Oct, 2019 6 commits
- link static library with cuda, test=develop (#2228) · 7c69b6b4
  Authored by 石晓伟 on Oct 21, 2019
```
* add static libraries of cuda, test=develop

* update cuda make
```
  7c69b6b4
- to support yolov3 unet alexnet can run on tx2 (#2216) · 57d8e42e
  Authored by myq406450149 on Oct 21, 2019
```
* add gpu kernel mul pool relu scale softmax dropout bilinear_interp and can run in tx2

* rm GREATER_EQUAL
```
  57d8e42e
- add GetExceptionMsg for paddle_inference_api. test=develop (#2231) · fda4d42c
  Authored by Jiaying Zhao on Oct 21, 2019
  
  fda4d42c
- add cuda op(pool & softmax), support conv with padding_algorithm · 305130fc
  Authored by yiicy on Oct 21, 2019
```
* cuda add softmax and pool op

* * fix armlinux can find sys/system_properties.h
* conv add padding_algorithm
test=develop

* delete padding_algorithm in op param, test=develop

* fix bugs, test=develop
```
  305130fc
- fix ios build error, test=develop (#2222) · 91059871
  Authored by Xiaoyang LI on Oct 21, 2019
  
  91059871
- Fix ‘Large memory usage of Naive model loading’ (#2175) · 7ba2834e
  Authored by huzhiqiang on Oct 21, 2019
```
Fix ‘Large memory usage of Naive model loading’  (#2175)
```
  7ba2834e
18 Oct, 2019 2 commits

fix yolobox_cuda_test (#2208) · 2a6a259d
Authored by Wilber on Oct 18, 2019
```
fix yolobox_cuda test precision error
```
2a6a259d

Fix codestyle of GetInputName&GetOutputName (#2185) · 8591aaec

Authored by huzhiqiang on Oct 18, 2019

* add shell file to automatically build and collect publish result test=develop

* modify codestyle of getInputNames test=develop

* test=develop

* rm publish.sh

* remove copy of func param

* test=develop

* test=devcelop

* test=develop

* test=develop

* const & test=develop

* modify variable defination test=develop

* test=develop

* test=develop

* test=develop

* test=develop

8591aaec

17 Oct, 2019 5 commits

add bilinear_interp_cuda_op, test=develop (#2197) · 4ac51a6b
Authored by juncaipeng on Oct 17, 2019

4ac51a6b

fix “CL_INVALID_KERNEL_ARGS ” error， test=develop (#2213) · 2667a153
Authored by StarryRain on Oct 17, 2019

2667a153

fix npu path (#2210) · 7c722a37

Authored by zhupengyang on Oct 17, 2019

* move lite/backends/npu/bridges --> lite/kernels/npu/

test=develop

* fix namespace for npu

test=develop

* mv npu runtime file to lite/backends/npu

test=develop

7c722a37

speedup fp32 depthwise conv · 2f6d5f9e

Authored by HappyAngel on Oct 17, 2019

* update con_dw

* update

* add conv_depthwise_3x3s1.cc and conv_depthwise_3x3s2.cc

* add conv_depthwise_3x3s1_fp32 and conv_depthwise_3x3s2_fp32

* add new conv_dw

* only support conv_dw pad=0, 1

* add conv_dw_s1 conv_dw_s2 fp32

*     //conv2_func _impl2{nullptr};
update conv_dw, add conv_3x3s1 and conv_3x3s2, pad=[0,1]

* fix format, test=develop

* fix formmat, test=develop

2f6d5f9e

enable batch_norm op and add its unit tests, test=develop (#2201) · a3241ca7
Authored by liu zhengxi on Oct 17, 2019

a3241ca7

16 Oct, 2019 5 commits
- Ban feed and fetch op during inference (#2198) · 75e8a6fc
  Authored by Zhaolong Xing on Oct 16, 2019
```
* init: delete feed and fetch op, using zero copy
test=develop

* delete the unused test
test=develop
```
  75e8a6fc
- Open merge_cl_to_so switch and delete -I(cl_path) build option. test=develop (#2206) · 781d8191
  Authored by Jiaying Zhao on Oct 16, 2019
  
  781d8191
- enable conv2d op and its unit tests, test=develop (#2200) · 459848c4
  Authored by liu zhengxi on Oct 16, 2019
```
enable conv2d op and its unit tests on x86 device
```
  459848c4
- support global pooling ... test=develop (#2204) · b963383a
  Authored by xiebaiyuan on Oct 16, 2019
  
  b963383a
- [framework][place] remove prefered_place and kHost in valid_places (#2192) · 3012088b
  Authored by sangoly on Oct 16, 2019
```
* [framework][place] remove prefered_place, use place order in valid_place array instead test=develop

* remove kHost from valid_places test=develop
```
  3012088b
15 Oct, 2019 6 commits

Fix quant dequant fuse pass (#2190) · 9cc7dfa8
Authored by juncaipeng on Oct 15, 2019
```
* fix bug for accessing the removed node, test=develop
```
9cc7dfa8

fix benchmark, test=develop (#2188) · 4d530acc
Authored by juncaipeng on Oct 15, 2019
```
* fix benchmark, test=develop
```
4d530acc

fix persistable test=develop (#2191) · 435f942b
Authored by Yanzhan Yang on Oct 15, 2019

435f942b

fix pass selection, test=develop (#2187) · da55f674
Authored by 石晓伟 on Oct 15, 2019

da55f674

[LITE][OPENCL] Fix layout, target pass for OpenCL, add macro of... · 72c11758

Authored by Yuan Shuai on Oct 15, 2019

[LITE][OPENCL] Fix layout, target pass for OpenCL, add macro of CONVERT_TYPE_TO and READ/WRITE image, memory reuse in ResetLazyImage2D (#2170)

* add macro of CONVERT_TYPE_TO and READ/WRITE image. test=develop

* add data type control. test=develop

* fix io op as general layout and precision. test=develop

* Fix memory reuse strategy for opencl image2d. test=develop

* remove std::array, std::map in about opencl backend. test=develop

72c11758

[NPU] Fix and refine the supporting of multi NPU models (#2037) · 7a731b7f

Authored by hong19860320 on Oct 15, 2019

* [NPU] Fix the bug of loading multi NPU models
test=develop

* [NPU] Use lite tensor to store NPU model, fix the management of multi NPU models, support loading NPU model from memory and reduce the modification of framework
test=develop

* [NPU] Remove redundant header files for NPU bridges,
test=develop

* [NPU] fix NPU deps
test=develop

* [NPU] refine the compiling script for NPU
test=develop

* [NPU] remove redundant subdirectory in lite/CMakeLists.txt
test=develop

* [NPU] Fix and refine NPU test case
test=develop

* [NPU] revoke the modification of other non-NPU modules
test=develop

* [NPU] Remove NPU bridges if target is tiny publish
test=develop

7a731b7f

14 Oct, 2019 5 commits
- fix bug for reshape op, test=develop (#2141) · 421c6305
  Authored by juncaipeng on Oct 14, 2019
```
* fix bug for reshape op, test=develop
```
  421c6305
- align yolov3 cuda int8 (#2183) · 80d35725
  Authored by Zhaolong Xing on Oct 14, 2019
```
test=develop
```
  80d35725
- add GetInputNames 、 GetOutPutNames 、 GetInputByName and GetTensor method (#2154) · 56151776
  Authored by huzhiqiang on Oct 14, 2019
```
* add GetInputNames and GetOutPutNames and GetInputByName method test=develop
```
  56151776
- fix asr modle related kernel bugs test=develop (#2179) · 792d898a
  Authored by lijianshe02 on Oct 14, 2019
```
* fix asr modle related kernel bugs test=develop
```
  792d898a
- Optimize quant_dequant_fuse_pass (#2169) · 253acb80
  Authored by juncaipeng on Oct 14, 2019
```
* optimize quant_dequant_fuse_pass, test=develop
```
  253acb80
12 Oct, 2019 2 commits
- fix clang compile error. test=develop (#2180) · 508ca98b
  Authored by Jiaying Zhao on Oct 12, 2019
  
  508ca98b
- fix conv_transpose error (#2165) · a6b1e4fa
  Authored by Xiaoyang LI on Oct 12, 2019
```
* fix conv_transpose error

* fix build error, enable basic test of conv_transpose, test=develop
```
  a6b1e4fa
11 Oct, 2019 4 commits
- add rsqrt op, test=develop (#2176) · dfce4621
  Authored by juncaipeng on Oct 11, 2019
  
  dfce4621
- 1. fix group logic for convolution op. 2. add pixel shuffle op for OpenCL. (#2178) · 77811367
  Authored by Yanzhan Yang on Oct 11, 2019
  
  77811367
- CUDA: can run yolov3 int8 (#2172) · 7931104f
  Authored by Zhaolong Xing on Oct 11, 2019
```
* add conv int8 support(in condition which the input or output channel not be the times of 4)
add add_kernel for cuda.

* can run yolov3 fp32
test=develop

* 1. fix bug with yolov3 run
test=develop

* can run yolov3 int8 test=develop
```
  7931104f
- move the method of SetThread and SetPowerMode from MobileConfig into ConfigBase (#2147) · 1ae9239e
  Authored by huzhiqiang on Oct 11, 2019
```
* move the method of SetThread and SetPowerMode from MobileConfig into ConfigBase 
* cxxPredictor will support SetThread and SetPowerMode method
```
  1ae9239e