Paddle on Mobile
Created by: hedaoyuan
Based on previous work and issues, I've listed what Paddle needs to do on mobile and embedded devices.
Build
The Paddle mobile inference library needs to support a variety of computing platforms, including Linux, Android, and iOS, across a range of CPUs. We therefore need to keep refining the build system, especially the Android and iOS builds. In addition, the binary size of the inference library needs to be reduced further.
Inference API
The C-API design did not consider mobile use cases, and the existing C-API is not sufficient on mobile (for example, Android needs a Java API). We need to decide whether to refactor or refine the C-API. It would also be more reasonable to rename the C-API to Inference API. Finally, we need to improve the inference programming model on mobile.
Low Precision
Low-precision computation allows smaller models and faster inference, and hardware support for it is improving. Next year, there will be chips that support the ARMv8.2 instruction set architecture, so we can use float16 computation on mobile to speed up model inference. Issue #4853 (closed) covers support for float16 computation.
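To make the size and accuracy trade-off of float16 concrete, here is a minimal sketch using only the Python standard library. It is not Paddle code: the weight values and the helper name to_float16_and_back are hypothetical, and the example only shows storage size and rounding error, not the speed of float16 arithmetic on ARMv8.2 hardware.

```python
import struct


def to_float16_and_back(x):
    """Round a Python float to IEEE 754 half precision and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical model weights, chosen only for illustration.
weights = [0.1, -1.5, 3.14159, 1000.123]

# Storage cost of the same weights as float32 ("f") and float16 ("e").
fp32_bytes = len(struct.pack("<%df" % len(weights), *weights))
fp16_bytes = len(struct.pack("<%de" % len(weights), *weights))

# Rounding error introduced by storing each weight in half precision.
rounded = [to_float16_and_back(w) for w in weights]
max_rel_error = max(abs(r - w) / abs(w) for r, w in zip(rounded, weights))

print("float32 bytes:", fp32_bytes)
print("float16 bytes:", fp16_bytes)
print("max relative rounding error:", max_rel_error)
```

Storage is halved, and for values in the normal float16 range the relative rounding error stays within 2^-11 (about 0.05%). Whether that error is acceptable depends on the model and has to be checked per network.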
Multi-Thread
Multi-threaded computing can speed up some computationally intensive operations. However, because of the big.LITTLE architecture and power consumption constraints, multi-threading on mobile rarely achieves the expected speedup. Issue #4678 (closed) explains the difficulties of multi-threaded computing on mobile in more detail.
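One way to see why heterogeneous cores limit the speedup is a small back-of-the-envelope model. The sketch below is an illustration under stated assumptions, not a summary of the linked issue: the relative core speeds (two big cores twice as fast as two LITTLE cores) and the amount of work are hypothetical, and the model ignores power limits, thermal throttling, and scheduling overhead.

```python
def makespan(work_per_core, speeds):
    """Time until the slowest core finishes its share of the work."""
    return max(w / s for w, s in zip(work_per_core, speeds))


def even_split(total_work, speeds):
    """Give every core the same amount of work, regardless of its speed."""
    n = len(speeds)
    return [total_work / n] * n


def proportional_split(total_work, speeds):
    """Give every core work in proportion to its speed."""
    total_speed = sum(speeds)
    return [total_work * s / total_speed for s in speeds]


# Hypothetical relative speeds: two big cores, two LITTLE cores.
SPEEDS = [2.0, 2.0, 1.0, 1.0]
TOTAL_WORK = 1200.0

# Baseline: all work on a single big core.
single_big_time = TOTAL_WORK / max(SPEEDS)

even_time = makespan(even_split(TOTAL_WORK, SPEEDS), SPEEDS)
prop_time = makespan(proportional_split(TOTAL_WORK, SPEEDS), SPEEDS)

speedup_even = single_big_time / even_time
speedup_prop = single_big_time / prop_time

print("speedup with even split:", speedup_even)
print("speedup with proportional split:", speedup_prop)
```

In this model, four threads with an even split give only a 2x speedup over one big core, because the big cores sit idle while the LITTLE cores finish. Even an ideal speed-proportional split reaches 3x, not 4x.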
Mobile GPU
Mobile GPU performance has improved greatly in recent years. For some computationally intensive operations, the mobile GPU can be an order of magnitude faster than the CPU. We need to add GPU computing to Paddle Mobile. Issue #5469 (closed) explains in more detail why Paddle needs to support the mobile GPU.
Hardware Acceleration
On mobile, hardware acceleration for model inference is a trend. We need to learn about the libraries that can be used for hardware acceleration, such as Android NN, SNPE, and ARM NN, and work out how Paddle can use them for model inference. Here is a project for this work.