diff --git a/design/meps/MEP-MSLITE.md b/design/meps/MEP-MSLITE.md
new file mode 100644
index 0000000000000000000000000000000000000000..d53b15c1117be500dc58765d725aee8f03f8e528
--- /dev/null
+++ b/design/meps/MEP-MSLITE.md
@@ -0,0 +1,146 @@
+| title | authors | owning-sig | participating-sigs | status | creation-date | reviewers | approvers | stage | milestone |
+| ----------- | ------------------------------- | ---------- | ------------------ | ----------- | ------------- | --------- | --------- | ----- | ------------- |
+| MEP-MS-Lite | @zhengli @zhiqiangzhai @chaijun | ms-lite | | provisional | 2020-08-18 | | TBD | beta | beta: "v0.5" |
+
+# MEP-MS-Lite: MindSpore Lite
+
+## Table of Contents
+
+- [MEP-MS-Lite: MindSpore Lite](#mep-ms-lite-mindspore-lite)
+  - [Table of Contents](#table-of-contents)
+  - [Summary](#summary)
+  - [Motivation](#motivation)
+    - [Goals](#goals)
+    - [Non-Goals](#non-goals)
+  - [Proposal](#proposal)
+    - [User Stories](#user-stories)
+      - [Generate a compact target model and a low-latency, low-power runtime](#generate-a-compact-target-model-and-a-low-latency-low-power-runtime)
+  - [Design Details](#design-details)
+    - [Test Plan](#test-plan)
+  - [Implementation History](#implementation-history)
+  - [Drawbacks](#drawbacks)
+  - [Alternatives](#alternatives)
+  - [References](#references)
+
+## Summary
+
+MindSpore (MS) Lite is an extremely lightweight deep learning inference framework
+designed for smartphones and embedded devices, such as watches, headsets, and various IoT devices.
+It supports Android, iOS, and HarmonyOS, and has industry-leading performance.
+
+## Motivation
+
+With growing computing power and sensor data on end devices, intelligence is moving toward the edge.
+Improved AI algorithms are driving the trend of running machine learning on
+the end device, such as smartphones or automobiles, rather than in the cloud.
+On-device AI can dramatically reduce latency, conserve bandwidth,
+improve privacy, and enable smarter applications.
+
+### Goals
+
+- Compatibility: supports the MindSpore model format as well as mainstream third-party formats, such as TensorFlow Lite, Caffe 1.0, and ONNX.
+- High performance: generates small, low-power, fast-inference target models for various hardware backends.
+- Versatility: supports HarmonyOS, Android, and iOS.
+- Light weight: a small shared library (less than 1 MB) that can easily be deployed on resource-limited devices.
+
+### Non-Goals
+
+- None
+
+## Proposal
+
+MS Lite consists of a converter and a runtime library.
+The converter is an offline tool that handles most of the model translation work.
+The runtime library is deployed on the device and executes online;
+it has two modes, Lite RT and Lite Micro.
+Lite RT is for slightly resource-limited devices, such as smartphones,
+while Lite Micro is for extremely resource-limited devices, such as watches and headsets.
+
+- Compatibility
+
+  Provides abundant operator parsers for MindSpore, TensorFlow Lite, Caffe, and ONNX,
+  and supports common neural networks in CV and NLP, with 208+ CPU operators and 60+ GPU operators.
+
+- High performance
+
+  Many optimization methods, including graph optimizations and post-training quantization,
+  are applied to the model in the offline converter, so the generated target model is more compact.
+  Graph optimizations, such as operator fusion and constant folding, make the model smaller.
+  Post-training quantization transforms the fp32 model into a fixed-point int8 model,
+  which brings a nearly 4x smaller model size as well as lower latency and lower power
+  consumption during inference (a minimal sketch of the arithmetic follows this list).
+
+  MS Lite also applies a variety of optimization schemes to NN operations, including the
+  Winograd algorithm in convolution and deconvolution and the Strassen algorithm in matrix
+  multiplication (a worked Winograd example is given in the appendix).
+  Operations support fp64, fp32, fp16, and int8, and are highly optimized with
+  NEON instructions, hand-written assembly, multi-threading, memory reuse, heterogeneous computing, etc.
+
+- Versatility
+
+  Supports HarmonyOS, iOS, and Android, and runs on smartphones, watches, headsets, and various IoT devices.
+
+- Light weight
+
+  MS Lite is highly optimized through GHLO and GLLO. It has a small footprint:
+  the MS Lite runtime is about 800 KB, and MS Micro is less than 200 KB.
+  It is flexible and can easily be deployed to mobile and a variety of embedded devices.
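+
+To make the int8 scheme concrete, the following is a minimal sketch of per-tensor affine
+post-training quantization in its usual formulation. It is illustrative only, and all names
+here are ours rather than MS Lite internals: the converter derives a scale and zero point
+from the observed fp32 range, then maps each 4-byte fp32 weight to one int8 byte, which is
+where the nearly 4x size reduction comes from.
+
+```cpp
+#include <algorithm>
+#include <cmath>
+#include <cstdint>
+#include <vector>
+
+// Illustrative per-tensor affine quantization (not MS Lite code).
+struct QuantParam {
+  float scale;         // fp32 step size per int8 level
+  int32_t zero_point;  // int8 value that represents fp32 0.0
+};
+
+QuantParam ChooseQuantParam(const std::vector<float> &data) {
+  // Include 0 in the range so that fp32 zero is exactly representable.
+  float lo = std::min(0.0f, *std::min_element(data.begin(), data.end()));
+  float hi = std::max(0.0f, *std::max_element(data.begin(), data.end()));
+  float scale = (hi - lo) / 255.0f;  // int8 offers 256 levels
+  if (scale == 0.0f) scale = 1.0f;   // guard against an all-zero tensor
+  auto zero_point = static_cast<int32_t>(std::round(-128.0f - lo / scale));
+  return {scale, zero_point};
+}
+
+std::vector<int8_t> Quantize(const std::vector<float> &data, const QuantParam &qp) {
+  std::vector<int8_t> out(data.size());
+  for (size_t i = 0; i < data.size(); ++i) {
+    int32_t q = static_cast<int32_t>(std::round(data[i] / qp.scale)) + qp.zero_point;
+    out[i] = static_cast<int8_t>(std::clamp(q, -128, 127));  // clamp to int8 range
+  }
+  return out;
+}
+
+// The runtime recovers an approximation of the original value on demand.
+float Dequantize(int8_t q, const QuantParam &qp) {
+  return static_cast<float>(q - qp.zero_point) * qp.scale;
+}
+```
+
+Production schemes additionally calibrate the range on sample inputs and may use per-channel
+parameters, but the 4-bytes-to-1-byte storage arithmetic above is the core of the size reduction.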
+
+### User Stories
+
+#### Generate a compact target model and a low-latency, low-power runtime
+
+Since devices have limited ROM, RAM, and power, deploying AI models on
+devices is very challenging. MS Lite aims to solve this challenge for users, providing a
+user-friendly, flexible tool that helps users make their models slimmer and more efficient.
+
+## Design Details
+
+MS Lite consists of a converter and a runtime.
+The converter is an offline tool with three parts: frontend, IR, and backend.
+The runtime is deployed on the device and executes online.
+
+- **Frontend.** The frontend parses serialized models (protobuf, etc.) from MindSpore, TensorFlow Lite, Caffe, and ONNX.
+- **IR.** The IR defines the ANF, including tensors, operations, and graphs.
+- **Backend.** The backend is an optimizer over the ANF graph, covering GHLO, GLLO, and quantization.
+  GHLO is short for "graph high-level optimization"; it includes common, hardware-independent
+  methods such as operator fusion, operator substitution, and constant folding.
+  GLLO is short for "graph low-level optimization"; its methods are related to hardware,
+  such as layout adjustment, mixed precision, etc.
+- **Runtime.** The runtime has two modes: Lite RT and Lite Micro.
+
+![MS Lite architecture](./ms-lite-arch.jpg)
+
+### Test Plan
+
+MS Lite employs pytest and nosetests to launch the testing process,
+and there are two types of testing strategies in MS Lite:
+
+- **Unit test.** Every operation, optimization, or pass in MS Lite has its own unit test.
+
+- **System test.** The MS Lite module has its own component testing.
+Basically, we classify the testing into compilation verification,
+function verification, and performance testing.
+
+## Implementation History
+
+- Support high-level and low-level graph optimization.
+- Support post-training quantization.
+- Support Arm CPU and Mali GPU.
+- Support fp64, fp32, fp16, and int8 operations.
+
+## Drawbacks
+
+- MS Lite does not support on-device training yet; it will come soon.
+
+## Alternatives
+
+- MNN [1], TF Lite [2], and TNN [3] are outstanding on-device AI frameworks.
+MS Lite is for on-device AI and MS cloud is for on-cloud AI;
+both are in the scope of Huawei's MindSpore AI framework.
+They share the same IR and optimization passes, and MS Lite is more flexible.
+
+## References
+
+- [1] MNN: https://github.com/alibaba/MNN
+- [2] TensorFlow Lite: https://www.tensorflow.org/lite
+- [3] TNN: https://github.com/Tencent/TNN
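+
+## Appendix: Winograd sketch
+
+As a worked example of the Winograd optimization named in the proposal, the sketch below
+computes two outputs of a 1D convolution with a 3-tap filter using 4 multiplications instead
+of 6. This is our illustration of the F(2,3) algorithm, not MS Lite's implementation, which
+applies the 2D analogue F(2x2,3x3) to 3x3 convolutions; all names are illustrative.
+
+```cpp
+#include <array>
+#include <cstdio>
+
+// 1D Winograd F(2,3): two outputs of a 3-tap convolution in 4 multiplies.
+std::array<float, 2> WinogradF23(const std::array<float, 4> &d,
+                                 const std::array<float, 3> &g) {
+  // Filter transform (precomputable once per filter).
+  float u0 = g[0];
+  float u1 = (g[0] + g[1] + g[2]) * 0.5f;
+  float u2 = (g[0] - g[1] + g[2]) * 0.5f;
+  float u3 = g[2];
+  // Element-wise products of the transformed input and filter.
+  float m0 = (d[0] - d[2]) * u0;
+  float m1 = (d[1] + d[2]) * u1;
+  float m2 = (d[2] - d[1]) * u2;
+  float m3 = (d[1] - d[3]) * u3;
+  // Output transform: y0 = d0*g0 + d1*g1 + d2*g2, y1 = d1*g0 + d2*g1 + d3*g2.
+  return {m0 + m1 + m2, m1 - m2 - m3};
+}
+
+int main() {
+  std::array<float, 4> d{1.0f, 2.0f, 3.0f, 4.0f};
+  std::array<float, 3> g{1.0f, 0.0f, -1.0f};
+  auto y = WinogradF23(d, g);
+  std::printf("%g %g\n", y[0], y[1]);  // prints -2 -2, matching direct convolution
+  return 0;
+}
+```
+
+The savings compound in 2D: F(2x2,3x3) needs 16 multiplies where direct 3x3 convolution
+needs 36, which is why Winograd is attractive for convolution-heavy models.
diff --git a/design/meps/ms-lite-arch.jpg b/design/meps/ms-lite-arch.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6006c845e8002831ef79b39ce9d68b8afd85e0f2
Binary files /dev/null and b/design/meps/ms-lite-arch.jpg differ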