Unverified · Commit ad097764 · Author: Qi Li · Committer: GitHub

[DOC] fix doc and update supported op list, test=develop, test=document_fix (#4248)

* [DOC] fix doc in readme and compile, test=develop, test=document_fix

* [DOC] update supported op list, test=develop, test=document_fix
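Parent 02497657
......@@ -20,55 +20,55 @@ The Paddle Lite framework directly supports models structured as [PaddlePaddle](https://github.com/Pa
**2. Model optimization**
The Paddle Lite framework provides strong acceleration and optimization strategies and implementations, including quantization, subgraph fusion, and kernel selection. An optimized model is lighter, uses fewer resources, and runs faster.
These optimizations are applied through the opt tool that Paddle Lite provides. The opt tool can also collect and print the operators in a model and report whether Paddle Lite supports them on each hardware platform. After obtaining a model in PaddlePaddle format, you generally need to optimize it with the opt tool. For downloading and using the opt tool, see [Model optimization](https://paddle-lite.readthedocs.io/zh/develop/user_guides/model_optimize_tool.html)
These optimizations are applied through the opt tool that Paddle Lite provides. The opt tool can also collect and print the operators in a model and report whether Paddle Lite supports them on each hardware platform. After obtaining a model in PaddlePaddle format, you generally need to optimize it with the opt tool. For downloading and using the opt tool, see [Model optimization](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html)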
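**3. Download or build**
Paddle Lite provides official release prediction libraries for Android, iOS, and X86. We recommend downloading the [Paddle Lite prebuilt libraries](https://paddle-lite.readthedocs.io/zh/develop/quick_start/release_lib.html) directly.
You can also choose the [source build method](https://paddle-lite.readthedocs.io/zh/develop/quick_start/release_lib.html#id2) for your target platform. Paddle Lite provides build scripts under the `lite/tools/` folder; the two steps [prepare the environment](https://paddle-lite.readthedocs.io/zh/develop/source_compile/compile_env.html) and [run the build script](https://paddle-lite.readthedocs.io/zh/develop/quick_start/release_lib.html#id2) are all it takes to build the Paddle Lite prediction library for the target platform.
Paddle Lite provides official release prediction libraries for Android, iOS, and X86. We recommend downloading the [Paddle Lite prebuilt libraries](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html) directly.
You can also choose the [source build method](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#id2) for your target platform. Paddle Lite provides build scripts under the `lite/tools/` folder; the two steps [prepare the environment](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html) and [run the build script](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#id2) are all it takes to build the Paddle Lite prediction library for the target platform.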
**4. Prediction examples**
Paddle Lite provides C++, Java, and Python APIs, with a complete usage example for each:
- [Complete C++ example](https://paddle-lite.readthedocs.io/zh/develop/quick_start/cpp_demo.html)
- [Complete Java example](https://paddle-lite.readthedocs.io/zh/develop/quick_start/java_demo.html)
- [Complete Python example](https://paddle-lite.readthedocs.io/zh/develop/quick_start/python_demo.html)
- [Complete C++ example](https://paddle-lite.readthedocs.io/zh/latest/quick_start/cpp_demo.html)
- [Complete Java example](https://paddle-lite.readthedocs.io/zh/latest/quick_start/java_demo.html)
- [Complete Python example](https://paddle-lite.readthedocs.io/zh/latest/quick_start/python_demo.html)
You can follow the instructions in the examples to learn the usage quickly and integrate it into your own project.
Paddle Lite also provides complete examples for each hardware platform:
- [Android example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/android_app_demo.html) [[image classification]](https://paddlelite-demo.bj.bcebos.com/apps/android/mobilenet_classification_demo.apk) [[object detection]](https://paddlelite-demo.bj.bcebos.com/apps/android/yolo_detection_demo.apk) [[mask detection]](https://paddlelite-demo.bj.bcebos.com/apps/android/mask_detection_demo.apk) [[facial keypoints]](https://paddlelite-demo.bj.bcebos.com/apps/android/face_keypoints_detection_demo.apk) [[portrait segmentation]](https://paddlelite-demo.bj.bcebos.com/apps/android/human_segmentation_demo.apk)
- [iOS example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/ios_app_demo.html)
- [ARMLinux example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/linux_arm_demo.html)
- [X86 example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/x86.html)
- [CUDA example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/cuda.html)
- [OpenCL example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/opencl.html)
- [FPGA example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/fpga.html)
- [Huawei NPU example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/npu.html)
- [Baidu XPU example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/baidu_xpu.html)
- [RKNPU example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/rockchip_npu.html)
- [MTK APU example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/mediatek_apu.html)
- [Android example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/android_app_demo.html) [[image classification]](https://paddlelite-demo.bj.bcebos.com/apps/android/mobilenet_classification_demo.apk) [[object detection]](https://paddlelite-demo.bj.bcebos.com/apps/android/yolo_detection_demo.apk) [[mask detection]](https://paddlelite-demo.bj.bcebos.com/apps/android/mask_detection_demo.apk) [[facial keypoints]](https://paddlelite-demo.bj.bcebos.com/apps/android/face_keypoints_detection_demo.apk) [[portrait segmentation]](https://paddlelite-demo.bj.bcebos.com/apps/android/human_segmentation_demo.apk)
- [iOS example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/ios_app_demo.html)
- [ARMLinux example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/linux_arm_demo.html)
- [X86 example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/x86.html)
- [CUDA example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/cuda.html)
- [OpenCL example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/opencl.html)
- [FPGA example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/fpga.html)
- [Huawei NPU example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/huawei_kirin_npu.html)
- [Baidu XPU example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/baidu_xpu.html)
- [Rockchip NPU example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/rockchip_npu.html)
- [MediaTek APU example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/mediatek_apu.html)
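## Key features
- **Multi-hardware support:**
  - The Paddle Lite architecture has been validated on, and fully supports, [multiple hardware platforms](https://paddle-lite.readthedocs.io/zh/develop/introduction/support_hardware.html) from mobile to server, including ARM CPU, Mali GPU, Adreno GPU, Huawei NPU, and FPGA, and support for more hardware is being added continuously.
  - Kernels for different hardware platforms do not interfere with each other at the code or execution level. Users can plug hardware in and out freely, and [mixed scheduling](https://paddle-lite.readthedocs.io/zh/develop/introduction/tech_highlights.html#id7) across any hardware visible to the system is supported.
  - The Paddle Lite architecture has been validated on, and fully supports, [multiple hardware platforms](https://paddle-lite.readthedocs.io/zh/latest/introduction/support_hardware.html) from mobile to server, including ARM CPU, Mali GPU, Adreno GPU, Huawei NPU, and FPGA, and support for more hardware is being added continuously.
  - Kernels for different hardware platforms do not interfere with each other at the code or execution level. Users can plug hardware in and out freely, and [mixed scheduling](https://paddle-lite.readthedocs.io/zh/latest/introduction/tech_highlights.html#id7) across any hardware visible to the system is supported.
- **Lightweight deployment**
  - By design, Paddle Lite cleanly decouples the graph optimization module from the execution engine, so a mobile device can deploy the execution stage alone, with no third-party dependencies.
  - The dynamic library containing the full set of 80 ops and 85 kernels is only 800K on ARMV7 and 1.3M on ARMV8, and the library file can be shrunk further by [tailoring the prediction library](https://paddle-lite.readthedocs.io/zh/develop/user_guides/library_tailoring.html).
  - The dynamic library containing the full set of 80 ops and 85 kernels is only 800K on ARMV7 and 1.3M on ARMV8, and the library file can be shrunk further by [tailoring the prediction library](https://paddle-lite.readthedocs.io/zh/latest/user_guides/library_tailoring.html).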
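- **High performance:**
  - Deep ARM CPU performance optimization: kernels are tuned to the characteristics of each microarchitecture to get the most compute performance, giving a leading speed advantage on mainstream models.
  - Support for the [PaddleSlim model compression tool](https://github.com/PaddlePaddle/PaddleSlim): it supports quantization-aware training, offline quantization, and other quantization methods, and in the best case improves inference performance further without loss of accuracy. For performance data, see the [benchmark](https://paddlepaddle.github.io/Paddle-Lite/develop/benchmark/).
- **Many models and operators**
  - Paddle Lite's OPs are aligned with those of the PaddlePaddle training framework, which gives broad model support.
  - So far, the accuracy and performance of 24 models and 200 OPs have been strictly verified. Vision models are well covered, including classification, detection, and localization, along with distinctive OCR model support, and coverage keeps growing. For details, see [supported OPs](https://paddle-lite.readthedocs.io/zh/develop/introduction/support_operation_list.html)
  - So far, the accuracy and performance of 24 models and 200 OPs have been strictly verified. Vision models are well covered, including classification, detection, and localization, along with distinctive OCR model support, and coverage keeps growing. For details, see [supported OPs](https://paddle-lite.readthedocs.io/zh/latest/introduction/support_operation_list.html)
- **Powerful graph analysis and optimization**
  - Unlike typical mobile inference engines, which convert models with Python scripts, the Lite architecture has a complete IR developed in C++ with a matching set of passes, supporting operator fusion, computation pruning, memory optimization, quantized computation, and other computation-graph optimizations. More optimization strategies can be supported modularly by simply [adding a new Pass](https://paddle-lite.readthedocs.io/zh/develop/develop_guides/add_new_pass.html).
  - Unlike typical mobile inference engines, which convert models with Python scripts, the Lite architecture has a complete IR developed in C++ with a matching set of passes, supporting operator fusion, computation pruning, memory optimization, quantized computation, and other computation-graph optimizations. More optimization strategies can be supported modularly by simply [adding a new Pass](https://paddle-lite.readthedocs.io/zh/latest/develop_guides/add_new_pass.html).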
## Continuous integration
......@@ -97,25 +97,25 @@ The architecture of Paddle Lite is designed with a focus on supporting multiple hardware and platforms, and
If you want to learn more about Paddle Lite, the following resources cover further study and use of Paddle-Lite:
### Documentation and examples
- Full documentation: [Paddle Lite documentation](https://paddle-lite.readthedocs.io/zh/develop/)
- Full documentation: [Paddle Lite documentation](https://paddle-lite.readthedocs.io/zh/latest/)
- API documentation:
  - [C++ API documentation](https://paddle-lite.readthedocs.io/zh/develop/api_reference/cxx_api_doc.html)
  - [Java API documentation](https://paddle-lite.readthedocs.io/zh/develop/api_reference/java_api_doc.html)
  - [Python API documentation](https://paddle-lite.readthedocs.io/zh/develop/api_reference/python_api_doc.html)
  - [CV image processing API documentation](https://paddle-lite.readthedocs.io/zh/develop/api_reference/cv.html)
  - [C++ API documentation](https://paddle-lite.readthedocs.io/zh/latest/api_reference/cxx_api_doc.html)
  - [Java API documentation](https://paddle-lite.readthedocs.io/zh/latest/api_reference/java_api_doc.html)
  - [Python API documentation](https://paddle-lite.readthedocs.io/zh/latest/api_reference/python_api_doc.html)
  - [CV image processing API documentation](https://paddle-lite.readthedocs.io/zh/latest/api_reference/cv.html)
- Paddle Lite example projects: [Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo)
### Key technologies
- Model quantization:
  - [Static offline quantization](https://paddle-lite.readthedocs.io/zh/develop/user_guides/post_quant_with_data.html)
  - [Dynamic offline quantization](https://paddle-lite.readthedocs.io/zh/develop/user_guides/post_quant_no_data.html)
  - [Quantization-aware training](https://paddle-lite.readthedocs.io/zh/develop/user_guides/model_quantization.html)
- Debugging and analysis: [debugging and performance analysis tools](https://paddle-lite.readthedocs.io/zh/develop/user_guides/debug.html)
- On-device model training: [learn more](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/cpp_train_demo.html)
  - [Static offline quantization](https://paddle-lite.readthedocs.io/zh/latest/user_guides/post_quant_with_data.html)
  - [Dynamic offline quantization](https://paddle-lite.readthedocs.io/zh/latest/user_guides/post_quant_no_data.html)
  - [Quantization-aware training](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_quantization.html)
- Debugging and analysis: [debugging and performance analysis tools](https://paddle-lite.readthedocs.io/zh/latest/user_guides/debug.html)
- On-device model training: [learn more](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/cpp_train_demo.html)
- PaddlePaddle pretrained model hub: browse and download Paddle pretrained models on [PaddleHub](https://www.paddlepaddle.org.cn/hublist?filter=hot&value=1)
### FAQ
- FAQ: for common questions, see the [FAQ](https://paddle-lite.readthedocs.io/zh/develop/introduction/faq.html), search the Issues, or contact us through the contact information at the bottom of the page
- FAQ: for common questions, see the [FAQ](https://paddle-lite.readthedocs.io/zh/latest/introduction/faq.html), search the Issues, or contact us through the contact information at the bottom of the page
### Contributing code
- Contributing code: if you would like to take part in Paddle Lite development and contribute code, see the [shared developer documentation](https://paddle-lite.readthedocs.io/zh/develop/develop_guides/for-developer.html)
- Contributing code: if you would like to take part in Paddle Lite development and contribute code, see the [shared developer documentation](https://paddle-lite.readthedocs.io/zh/latest/develop_guides/for-developer.html)
## Communication and feedback
......
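Most of the README changes in this commit swap the `develop` docs version for `latest` in readthedocs links. As a rough illustration only (the commit does not show what tooling, if any, the author used), such a rewrite could be scripted as below. The function name `retarget_docs_links` and the sample string are made up for this sketch.

```python
import re

# Matches only the Chinese readthedocs URLs pinned to the "develop" version.
# Other links that contain "develop" (for example the benchmark page hosted on
# paddlepaddle.github.io) are intentionally left untouched.
DEVELOP_DOCS = re.compile(r"(https://paddle-lite\.readthedocs\.io/zh/)develop(/)")


def retarget_docs_links(markdown: str) -> str:
    """Return `markdown` with readthedocs links switched from develop to latest."""
    return DEVELOP_DOCS.sub(r"\1latest\2", markdown)


# Hypothetical sample: one link that should change and one that should not.
sample = (
    "[C++ demo](https://paddle-lite.readthedocs.io/zh/develop/quick_start/cpp_demo.html) "
    "[benchmark](https://paddlepaddle.github.io/Paddle-Lite/develop/benchmark/)"
)
print(retarget_docs_links(sample))
```

Anchoring the pattern on the readthedocs host keeps the benchmark link unchanged, which matches what the diff shows for that line.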
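# Supported OPs
# Supported operators
## Ops (158 operators in total)
Paddle-Lite currently supports 204 operators in total: 78 basic operators and 126 extra operators.
### Basic Operators (built by default)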
- affine_channel
- arg_max
- batch_norm
- bilinear_interp
- box_coder
- calib
- cast
- concat
- conv2d
- conv2d_transpose
- density_prior_box
- depthwise_conv2d
- dropout
- elementwise_add
- elementwise_div
- elementwise_max
- elementwise_mul
- elementwise_sub
- exp
- expand
- fake_channel_wise_dequantize_max_abs
- fake_dequantize_max_abs
- fake_quantize_abs_max
- fake_quantize_dequantize_moving_average_abs_max
- fake_quantize_moving_average_abs_max
- fake_quantize_range_abs_max
- fc
- feed
- fetch
- fill_constant
- fill_constant_batch_size_like
- flatten
- flatten2
- floor
- fusion_elementwise_add_activation
- fusion_elementwise_div_activation
- fusion_elementwise_max_activation
- fusion_elementwise_mul_activation
- fusion_elementwise_sub_activation
- gelu
- grid_sampler
- hard_sigmoid
- instance_norm
- io_copy
- io_copy_once
- layout
- leaky_relu
- log
- matmul
- mean
- mul
- multiclass_nms
- nearest_interp
- pad2d
- pool2d
- prelu
- prior_box
- range
- reduce_mean
- relu
- relu6
- relu_clipped
- reshape
- reshape2
- rsqrt
- scale
- search_fc
- sequence_topk_avg_pooling
- shuffle_channel
- sigmoid
- slice
- softmax
- softsign
- split
- sqrt
- square
- squeeze
- squeeze2
- stack
- subgraph
- swish
- tanh
- transpose
- transpose2
- unsqueeze
- unsqueeze2
- yolo_box
### Basic operators
### Extra Operators (built only when `--build_extra=ON` is set)
Operators built by default, 78 in total:
- anchor_generator
- assign
- assign_value
- attention_padding_mask
- axpy
- beam_search
- beam_search_decode
- box_clip
- calib_once
- collect_fpn_proposals
- conditional_block
- crop
- decode_bboxes
- distribute_fpn_proposals
- equal
- gather
- generate_proposals
- graph_op
- greater_equal
- greater_than
- gru
- gru_unit
- im2sequence
- increment
- is_empty
- layer_norm
- layout_once
- less_equal
- less_than
- lod_reset
- logical_and
- logical_not
- logical_or
- logical_xor
- lookup_table
- lookup_table_v2
- lrn
- match_matrix_tensor
- merge_lod_tensor
- negative
- norm
- not_equal
- power
- read_from_array
- reduce_max
- reduce_prod
- reduce_sum
- roi_align
- search_aligned_mat_mul
- search_attention_padding_mask
- search_grnn
- search_group_padding
- search_seq_arithmetic
- search_seq_depadding
- search_seq_fc
- search_seq_softmax
- sequence_arithmetic
- sequence_concat
- sequence_expand
- sequence_expand_as
- sequence_pool
- sequence_reshape
- sequence_reverse
- sequence_softmax
- shape
- split_lod_tensor
- top_k
- uniform_random
- var_conv_2d
- while
- write_to_array
| OP Name | Host | X86 | CUDA | ARM | OpenCL | FPGA | Huawei NPU | Baidu XPU | Rockchip NPU | MediaTek APU |
|-:|-|-|-|-|-|-|-|-|-|-|
| affine_channel |   |   |   | Y |   |   |   |   |   |   |
| affine_grid |   |   |   | Y |   |   |   |   |   |   |
| arg_max |   |   |   | Y |   |   |   |   |   |   |
| assign_value |   |   | Y | Y |   |   |   |   |   |   |
| batch_norm |   | Y |   | Y |   |   | Y | Y | Y |   |
| bilinear_interp |   |   | Y | Y | Y |   | Y |   |   |   |
| box_coder |   |   |   | Y | Y |   |   |   |   |   |
| calib |   |   | Y | Y |   | Y |   |   |   |   |
| cast |   | Y |   | Y |   |   |   | Y |   |   |
| concat |   | Y | Y | Y | Y |   | Y |   | Y |   |
| conv2d |   | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| conv2d_transpose |   |   |   | Y |   |   | Y |   |   |   |
| density_prior_box |   |   |   | Y |   |   |   |   |   |   |
| depthwise_conv2d |   | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| depthwise_conv2d_transpose |   |   |   |   |   |   |   |   |   |   |
| dropout |   | Y | Y | Y | Y | Y | Y | Y |   |   |
| elementwise_add |   | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| elementwise_div |   |   |   | Y |   |   | Y |   | Y |   |
| elementwise_max |   |   |   | Y |   |   |   |   |   |   |
| elementwise_mod |   |   |   | Y |   |   |   |   |   |   |
| elementwise_mul |   | Y | Y | Y | Y | Y | Y |   | Y | Y |
| elementwise_pow |   |   |   |   |   |   |   |   |   |   |
| elementwise_sub |   | Y | Y | Y | Y |   | Y |   | Y |   |
| elu |   |   |   | Y |   |   |   |   |   |   |
| expand | Y |   |   |   | Y |   | Y |   |   |   |
| expand_as | Y |   |   |   |   |   |   |   |   |   |
| fc |   | Y | Y | Y | Y | Y | Y |   | Y | Y |
| feed | Y |   | Y |   |   | Y |   |   |   |   |
| fetch | Y |   |   |   |   | Y |   |   |   |   |
| fill_constant | Y |   |   |   |   |   |   |   |   |   |
| fill_constant_batch_size_like | Y | Y |   |   |   |   |   |   |   |   |
| flatten | Y |   |   |   | Y |   |   |   |   |   |
| flatten2 | Y |   |   |   | Y |   |   |   |   |   |
| fusion_elementwise_add_activation |   |   | Y | Y | Y | Y | Y |   |   |   |
| fusion_elementwise_div_activation |   |   |   | Y |   |   | Y |   |   |   |
| fusion_elementwise_max_activation |   |   |   | Y |   |   |   |   |   |   |
| fusion_elementwise_mul_activation |   |   | Y | Y |   |   | Y |   |   |   |
| fusion_elementwise_sub_activation |   |   | Y | Y | Y |   | Y |   |   |   |
| grid_sampler |   |   |   | Y | Y |   |   |   |   |   |
| instance_norm |   |   |   | Y | Y |   | Y |   |   |   |
| io_copy |   |   | Y |   | Y | Y |   |   |   |   |
| io_copy_once |   |   | Y |   | Y | Y |   |   |   |   |
| layout |   |   | Y | Y | Y | Y |   |   |   |   |
| leaky_relu |   | Y | Y | Y | Y |   | Y |   |   |   |
| matmul |   | Y | Y | Y |   |   | Y | Y |   |   |
| mul |   | Y | Y | Y |   |   | Y | Y |   |   |
| multiclass_nms | Y |   |   |   |   | Y |   |   |   |   |
| multiclass_nms2 | Y |   |   |   |   |   |   |   |   |   |
| nearest_interp |   |   | Y | Y | Y |   | Y |   |   |   |
| pad2d |   |   |   | Y | Y |   | Y |   |   |   |
| pool2d |   | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| prelu |   |   |   | Y |   |   |   |   |   |   |
| prior_box |   |   |   | Y |   | Y |   |   |   |   |
| range |   |   |   | Y |   |   |   |   |   |   |
| reduce_mean |   |   |   | Y |   |   | Y |   |   |   |
| relu |   | Y | Y | Y | Y |   | Y |   | Y | Y |
| relu6 |   |   |   | Y | Y |   | Y |   |   |   |
| reshape | Y | Y |   |   | Y |   | Y | Y |   |   |
| reshape2 | Y | Y |   |   | Y |   | Y | Y |   |   |
| scale |   | Y | Y | Y | Y | Y | Y | Y |   |   |
| search_fc |   | Y | Y |   |   |   |   |   |   |   |
| sequence_topk_avg_pooling |   | Y | Y |   |   |   |   |   |   |   |
| shuffle_channel |   |   |   | Y |   |   | Y |   |   |   |
| sigmoid |   | Y | Y | Y | Y |   | Y |   |   |   |
| slice |   | Y |   | Y | Y |   |   | Y |   |   |
| softmax |   | Y | Y | Y |   |   | Y | Y | Y | Y |
| split |   |   |   | Y |   |   | Y |   |   |   |
| squeeze | Y |   |   |   |   |   |   |   |   |   |
| squeeze2 | Y |   |   |   |   |   |   |   |   |   |
| stack |   | Y |   | Y |   |   |   | Y |   |   |
| subgraph |   |   |   |   |   |   | Y | Y | Y | Y |
| tanh |   | Y | Y | Y | Y |   | Y | Y |   |   |
| thresholded_relu |   |   |   | Y |   |   | Y |   |   |   |
| transpose |   | Y | Y | Y | Y |   | Y | Y |   |   |
| transpose2 |   | Y | Y | Y | Y |   | Y | Y |   |   |
| unsqueeze | Y |   |   |   |   |   | Y |   |   |   |
| unsqueeze2 | Y |   |   |   |   |   | Y |   |   |   |
| yolo_box |   |   | Y | Y |   |   |   | Y |   |   |
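To query a support table like the one above programmatically, a minimal sketch could parse its rows into a mapping from operator to hardware. The four rows below are copied from the table; the column labels are English renderings of its header, and the helper names `parse_support_rows` and `ops_for` are hypothetical, not part of Paddle Lite.

```python
from typing import Dict, List, Set

# Column labels in table order (English renderings of the table header).
HEADER = ["OP Name", "Host", "X86", "CUDA", "ARM", "OpenCL", "FPGA",
          "Huawei NPU", "Baidu XPU", "Rockchip NPU", "MediaTek APU"]

# A few rows copied from the basic-operator table; a cell is "Y" or blank.
ROWS = """
| conv2d |   | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| expand | Y |   |   |   | Y |   | Y |   |   |   |
| prelu |   |   |   | Y |   |   |   |   |   |   |
| subgraph |   |   |   |   |   |   | Y | Y | Y | Y |
"""


def parse_support_rows(table: str) -> Dict[str, Set[str]]:
    """Map each operator name to the set of hardware columns marked "Y"."""
    support: Dict[str, Set[str]] = {}
    for line in table.strip().splitlines():
        # Drop the outer pipes, then split into cells and trim whitespace.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        name, flags = cells[0], cells[1:]
        support[name] = {hw for hw, flag in zip(HEADER[1:], flags) if flag == "Y"}
    return support


def ops_for(support: Dict[str, Set[str]], hardware: str) -> List[str]:
    """Return the sorted operator names that list `hardware` as supported."""
    return sorted(op for op, hw in support.items() if hardware in hw)


support = parse_support_rows(ROWS)
print(ops_for(support, "ARM"))
print(ops_for(support, "Huawei NPU"))
```

Feeding the full table through the same parser would answer questions such as which operators have no kernel marked for a given backend.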
### Extra operators
## Kernels
There are 126 extra operators. They are built only when the `--build_extra=ON` switch is set at build time; for details, see [parameter details](../source_compile/library)
### Host kernels
- feed
- fetch
- flatten
- flatten2
- multiclass_nms
- reshape
- reshape2
### ARM kernels
- affine_channel
- anchor_generator
- arg_max
- assign
- assign_value
- axpy
- batch_norm
- beam_search
- beam_search_decode
- bilinear_interp
- box_clip
- box_coder
- cast
- collect_fpn_proposals
- concat
- conditional_block
- conv2d
- conv2d_transpose
- crop
- decode_bboxes
- density_prior_box
- depthwise_conv2d
- distribute_fpn_proposals
- dropout
- elementwise_add
- elementwise_div
- elementwise_max
- elementwise_mul
- elementwise_sub
- equal
- exp
- expand
- fc
- fill_constant
- fill_constant_batch_size_like
- floor
- fusion_elementwise_add_activation
- fusion_elementwise_div_activation
- fusion_elementwise_max_activation
- fusion_elementwise_mul_activation
- fusion_elementwise_sub_activation
- gather
- generate_proposals
- greater_equal
- greater_than
- gru
- gru_unit
- hard_sigmoid
- im2sequence
- increment
- instance_norm
- is_empty
- layer_norm
- layout
- layout_once
- leaky_relu
- less_equal
- less_than
- lod_reset
- log
- logical_and
- logical_not
- logical_or
- logical_xor
- lookup_table
- lookup_table_v2
- lrn
- matmul
- merge_lod_tensor
- mul
- nearest_interp
- negative
- norm
- not_equal
- pad2d
- pool2d
- power
- prelu
- prior_box
- range
- read_from_array
- reduce_max
- reduce_mean
- reduce_prod
- relu
- relu6
- relu_clipped
- roi_align
- rsqrt
- scale
- sequence_expand
- sequence_pool
- sequence_softmax
- shape
- shuffle_channel
- sigmoid
- slice
- softmax
- split
- split_lod_tensor
- squeeze
- squeeze2
- stack
- swish
- tanh
- top_k
- transpose
- transpose2
- unsqueeze
- unsqueeze2
- while
- write_to_array
- yolo_box
### X86 kernels
- batch_norm
- cast
- concat
- conv2d
- depthwise_conv2d
- dropout
- elementwise_add
- elementwise_sub
- fc
- fill_constant_batch_size_like
- gather
- gelu
- gru
- layer_norm
- match_matrix_tensor
- matmul
- mul
- pool2d
- reduce_sum
- relu
- reshape
- reshape2
- scale
- search_aligned_mat_mul
- search_attention_padding_mask
- search_fc
- search_grnn
- search_group_padding
- search_seq_arithmetic
- search_seq_depadding
- search_seq_fc
- search_seq_softmax
- sequence_arithmetic
- sequence_concat
- sequence_expand_as
- sequence_pool
- sequence_reverse
- sequence_topk_avg_pooling
- shape
- slice
- softmax
- softsign
- square
- squeeze
- squeeze2
- stack
- tanh
- transpose
- transpose2
- var_conv_2d
### CUDA kernels
- attention_padding_mask
- bilinear_interp
- calib
- concat
- conv
- dropout
- elementwise_add
- fusion_elementwise_add_activation
- fusion_elementwise_mul_activation
- elementwise_mul
- feed
- io_copy
- layout
- layout_once
- leaky_relu
- lookup_table
- match_matrix_tensor
- mul
- nearest_interp
- pool2d
- relu
- scale
- search_aligned_mat_mul
- search_fc
- search_grnn
- search_group_padding
- search_seq_depadding
- search_seq_fc
- sequence_arithmetic
- sequence_concat
- sequence_pool
- sequence_reverse
- sequence_topk_avg_pooling
- softmax
- transpose
- var_conv_2d
- yolo_box
### OpenCL kernels
- conv2d
- depthwise_conv2d
- elementwise_add
- fc
- fusion_elementwise_add_activation
- layout
- layout_once
- io_copy
- io_copy_once
- mul
- pool2d
- relu
| OP Name | Host | X86 | CUDA | ARM | OpenCL | FPGA | Huawei NPU | Baidu XPU | Rockchip NPU | MediaTek APU |
|-:|-|-|-|-|-|-|-|-|-|-|
| abs |   |   | Y | Y |   |   |   |   |   |   |
| anchor_generator |   |   |   | Y |   |   |   |   |   |   |
| assign | Y |   |   |   |   |   |   |   |   |   |
| attention_padding_mask |   |   |   |   |   |   |   |   |   |   |
| axpy |   |   |   | Y |   |   |   |   |   |   |
| beam_search |   |   |   | Y |   |   |   |   |   |   |
| beam_search_decode |   |   |   | Y |   |   |   |   |   |   |
| box_clip |   |   |   | Y |   |   |   |   |   |   |
| calib_once |   |   | Y | Y |   | Y |   |   |   |   |
| clip |   |   |   | Y |   |   |   |   |   |   |
| collect_fpn_proposals |   |   |   | Y |   |   |   |   |   |   |
| conditional_block | Y |   |   |   |   |   |   |   |   |   |
| crf_decoding | Y |   |   |   |   |   |   |   |   |   |
| crop |   |   |   | Y |   |   |   |   |   |   |
| ctc_align | Y |   |   |   |   |   |   |   |   |   |
| decode_bboxes |   |   |   | Y |   |   |   |   |   |   |
| deformable_conv |   |   |   | Y |   |   |   |   |   |   |
| distribute_fpn_proposals |   |   |   | Y |   |   |   |   |   |   |
| equal | Y |   |   |   |   |   |   |   |   |   |
| exp |   |   |   | Y | Y |   |   |   |   |   |
| fake_channel_wise_dequantize_max_abs |   |   |   |   |   |   |   |   |   |   |
| fake_dequantize_max_abs |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_abs_max |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_dequantize_abs_max |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_dequantize_moving_average_abs_max |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_moving_average_abs_max |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_range_abs_max |   |   |   |   |   |   |   |   |   |   |
| floor |   |   |   | Y |   |   |   |   |   |   |
| gather |   | Y |   | Y |   |   |   | Y |   |   |
| gelu |   | Y |   |   |   |   |   |   |   |   |
| generate_proposals |   |   |   | Y |   |   |   |   |   |   |
| greater_equal | Y |   |   |   |   |   |   |   |   |   |
| greater_than | Y |   |   |   |   |   |   |   |   |   |
| group_norm |   |   |   | Y |   |   |   |   |   |   |
| gru |   | Y | Y | Y |   | Y |   |   |   |   |
| gru_unit |   |   |   | Y |   |   |   |   |   |   |
| hard_sigmoid |   |   |   | Y | Y |   | Y |   |   |   |
| hard_swish |   |   |   | Y |   |   |   |   |   |   |
| im2sequence |   |   |   | Y |   |   |   |   |   |   |
| increment |   |   |   | Y |   |   | Y |   |   |   |
| is_empty | Y |   |   |   |   |   |   |   |   |   |
| layer_norm |   | Y |   | Y |   |   | Y | Y |   |   |
| layout_once |   |   | Y | Y |   | Y |   |   |   |   |
| less_equal | Y |   |   |   |   |   |   |   |   |   |
| less_than | Y |   |   |   |   |   | Y |   |   |   |
| lod_reset |   |   |   | Y |   |   |   |   |   |   |
| log |   |   |   | Y |   |   | Y |   |   |   |
| logical_and | Y |   |   |   |   |   |   |   |   |   |
| logical_not | Y |   |   |   |   |   |   |   |   |   |
| logical_or | Y |   |   |   |   |   |   |   |   |   |
| logical_xor | Y |   |   |   |   |   |   |   |   |   |
| lookup_table |   | Y | Y | Y |   |   |   | Y |   |   |
| lookup_table_dequant |   |   |   | Y |   |   |   |   |   |   |
| lookup_table_v2 |   | Y | Y | Y |   |   |   |   |   |   |
| lrn |   |   |   | Y | Y |   |   |   |   |   |
| lstm |   |   |   | Y |   |   |   |   |   |   |
| match_matrix_tensor |   | Y | Y |   |   |   |   |   |   |   |
| max_pool2d_with_index |   |   |   |   |   |   |   |   |   |   |
| mean |   |   |   | Y |   |   |   |   |   |   |
| merge_lod_tensor |   |   |   | Y |   |   |   |   |   |   |
| negative |   |   |   | Y |   |   |   |   |   |   |
| norm |   |   |   | Y |   | Y |   |   |   |   |
| not_equal | Y |   |   |   |   |   |   |   |   |   |
| one_hot | Y |   |   |   |   |   |   |   |   |   |
| pixel_shuffle | Y |   |   | Y | Y |   |   |   |   |   |
| pow |   |   |   |   |   |   |   |   |   |   |
| power |   |   |   | Y |   |   |   |   |   |   |
| print | Y |   |   |   |   |   |   |   |   |   |
| read_from_array | Y |   |   |   |   |   |   |   |   |   |
| reciprocal |   |   |   | Y |   |   |   |   |   |   |
| reduce_max |   |   |   | Y |   |   |   |   |   |   |
| reduce_prod |   |   |   | Y |   |   |   |   |   |   |
| reduce_sum |   | Y |   |   |   |   |   | Y |   |   |
| relu_clipped |   |   |   | Y |   |   | Y |   |   |   |
| retinanet_detection_output | Y |   |   |   |   |   |   |   |   |   |
| roi_align |   |   |   | Y |   |   |   |   |   |   |
| rsqrt |   |   |   | Y |   |   |   |   |   |   |
| search_aligned_mat_mul |   | Y | Y |   |   |   |   |   |   |   |
| search_attention_padding_mask |   | Y | Y |   |   |   |   |   |   |   |
| search_grnn |   | Y | Y |   |   |   |   |   |   |   |
| search_group_padding |   | Y | Y |   |   |   |   |   |   |   |
| search_seq_arithmetic |   | Y | Y |   |   |   |   |   |   |   |
| search_seq_depadding |   | Y | Y |   |   |   |   |   |   |   |
| search_seq_fc |   | Y | Y |   |   |   |   |   |   |   |
| search_seq_softmax |   | Y | Y |   |   |   |   |   |   |   |
| sequence_arithmetic |   | Y | Y |   |   |   |   |   |   |   |
| sequence_concat |   | Y | Y |   |   |   |   |   |   |   |
| sequence_conv |   | Y |   | Y |   |   |   |   |   |   |
| sequence_expand |   |   |   | Y |   |   |   |   |   |   |
| sequence_expand_as |   | Y |   |   |   |   |   |   |   |   |
| sequence_mask |   |   | Y |   |   |   |   |   |   |   |
| sequence_pad |   |   | Y |   |   |   |   |   |   |   |
| sequence_pool |   | Y | Y | Y |   |   |   |   |   |   |
| sequence_pool_concat |   |   | Y |   |   |   |   |   |   |   |
| sequence_reshape |   | Y |   |   |   |   |   |   |   |   |
| sequence_reverse |   | Y | Y |   |   |   |   |   |   |   |
| sequence_reverse_embedding |   |   | Y |   |   |   |   |   |   |   |
| sequence_softmax |   |   |   | Y |   |   |   |   |   |   |
| sequence_unpad |   | Y | Y |   |   |   |   |   |   |   |
| shape | Y | Y |   |   |   |   |   |   |   |   |
| sign |   |   |   |   |   |   |   |   |   |   |
| softsign |   | Y |   |   |   |   | Y |   |   |   |
| split_lod_tensor |   |   |   | Y |   |   |   |   |   |   |
| sqrt |   |   |   |   |   |   | Y |   |   |   |
| square |   | Y |   | Y |   |   | Y |   |   |   |
| swish |   |   |   | Y | Y |   |   |   |   |   |
| top_k |   |   |   | Y |   |   |   |   |   |   |
| topk_pooling |   |   | Y |   |   |   |   |   |   |   |
| uniform_random |   |   |   |   |   |   |   |   |   |   |
| var_conv_2d |   | Y | Y |   |   |   |   |   |   |   |
| where_index | Y |   |   |   |   |   |   |   |   |   |
| while | Y |   |   |   |   |   |   |   |   |   |
| write_to_array | Y |   |   |   |   |   |   |   |   |   |
| __xpu__conv2d |   |   |   |   |   |   |   | Y |   |   |
| __xpu__embedding_with_eltwise_add |   |   |   |   |   |   |   | Y |   |   |
| __xpu__fc |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_bid_emb_att |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_bid_emb_grnn_att |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_bid_emb_grnn_att2 |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_match_conv_topk |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_merge_all |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_search_attention |   |   |   |   |   |   |   | Y |   |   |
| __xpu__multi_encoder |   |   |   |   |   |   |   | Y |   |   |
| __xpu__resnet_cbam |   |   |   |   |   |   |   | Y |   |   |
| __xpu__resnet50 |   |   |   |   |   |   |   | Y |   |   |
| __xpu__sfa_head |   |   |   |   |   |   |   | Y |   |   |
......@@ -80,5 +80,5 @@ pip install paddlelite
- [FPGA source build](../demo_guides/fpga)
- [Huawei NPU source build](../demo_guides/huawei_kirin_npu)
- [Baidu XPU source build](../demo_guides/baidu_xpu)
- [Rockchip NPU source build](../demo_guides/rockchip_npu)
- [MediaTek APU source build](../demo_guides/mediatek_apu)
- [瑞芯微 (Rockchip) NPU source build](../demo_guides/rockchip_npu)
- [联发科 (MediaTek) APU source build](../demo_guides/mediatek_apu)
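......@@ -5,13 +5,28 @@ Paddle Lite provides official release prediction libraries for Android, iOS, and X86 for download; if
You can also choose the source build method for your target platform. Paddle Lite provides build scripts under the `lite/tools/` folder; the two steps "prepare the environment" and "run the build script" are all it takes to build the Paddle Lite prediction library for the target platform.
Four build environments are currently supported:
Four build and development environments are currently supported:
1. [Docker development environment](compile_env.html#docker)
2. [Linux development environment](compile_env.html#linux)
3. [Mac OS development environment](compile_env.html#mac-os)
4. [Windows development environment](compile_env.html#windows)
Source builds are supported for the following platforms:
- [Android source build](../source_compile/compile_andriod)
- [iOS source build](../source_compile/compile_ios)
- [ArmLinux source build](../source_compile/compile_linux)
- [X86 source build](../demo_guides/x86)
- [OpenCL source build](../demo_guides/opencl)
- [CUDA source build](../demo_guides/cuda)
- [FPGA source build](../demo_guides/fpga)
- [Huawei NPU source build](../demo_guides/huawei_kirin_npu)
- [Baidu XPU source build](../demo_guides/baidu_xpu)
- [Rockchip NPU source build](../demo_guides/rockchip_npu)
- [MediaTek APU source build](../demo_guides/mediatek_apu)
- [Model optimization tool opt source build](../user_guides/model_optimize_tool.html#opt)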
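## 1. Docker development environment
[Docker](https://www.docker.com/) is an open-source application container engine that uses sandboxing to create isolated containers, making it easy to run different programs. The Lite Docker image is based on Ubuntu 16.04 and contains the software dependencies and tools required to develop for Android, Linux, and other platforms.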
......
......@@ -90,7 +90,7 @@ inference_lite_lib.armlinux.armv8
--opt_model_dir: absolute path of the input model; it must be a model already converted by opt
```
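- To build the Rockchip NPU prediction library, see [Deploying PaddleLite prediction with RK NPU](../demo_guides/rockchip_npu) for details
- To build the 瑞芯微 (Rockchip) NPU prediction library, see [Deploying PaddleLite prediction with RK NPU](../demo_guides/rockchip_npu) for details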
```shell
--with_rockchip_npu: (OFF|ON) whether to build the rockchip_npu prediction library; default is OFF
......@@ -98,7 +98,7 @@ inference_lite_lib.armlinux.armv8
```
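- To build the Baidu XPU prediction library, see [Deploying PaddleLite prediction with Baidu XPU](../demo_guides/baidu_xpu) for details
- To build the 百度 (Baidu) XPU prediction library, see [Deploying PaddleLite prediction with Baidu XPU](../demo_guides/baidu_xpu) for details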
```shell
--with_baidu_xpu: (OFF|ON) whether to build the baidu_xpu prediction library; default is OFF
......
......@@ -51,8 +51,8 @@
| LITE_WITH_PYTHON | Build the prediction library with [Python API](../api_reference/python_api_doc.html) support | X86 / CUDA |OFF |
| LITE_WITH_OPENCL | Build the prediction library for the [OpenCL platform](../demo_guides/opencl.html) | OpenCL | OFF |
| LITE_WITH_FPGA | Build the prediction library for the [FPGA platform](../demo_guides/fpga.html) | FPGA | OFF |
| LITE_WITH_NPU | Build the prediction library for the [Huawei NPU (Kirin SoC) platform](../demo_guides/huawei_kirin_npu.html) | NPU | OFF |
| LITE_WITH_RKNPU | Build the prediction library for the [RK NPU platform](../demo_guides/rockchip_npu.html) | RKNPU | OFF |
| LITE_WITH_NPU | Build the prediction library for the [Huawei NPU platform](../demo_guides/huawei_kirin_npu.html) | NPU | OFF |
| LITE_WITH_RKNPU | Build the prediction library for the [Rockchip NPU platform](../demo_guides/rockchip_npu.html) | RKNPU | OFF |
| LITE_WITH_XPU | Build the prediction library for the [Baidu XPU platform](../demo_guides/baidu_xpu.html) | XPU |OFF |
| LITE_WITH_XTCL | Support Baidu XPU through XTCL; the default is the Kernel approach | XPU |OFF IF LITE_WITH_XPU |
| LITE_WITH_APU | Build the prediction library for the [MTK APU platform](../demo_guides/mediatek_apu.html) | APU |OFF |
| LITE_WITH_APU | Build the prediction library for the [MediaTek APU platform](../demo_guides/mediatek_apu.html) | APU |OFF |
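......@@ -5,12 +5,12 @@ The Lite prediction library comes as a **basic prediction library** and a **full prediction library (with_extra)**: basic
At build time this is controlled by the build option `build_extra` (default OFF): `--build_extra=OFF` builds the **basic prediction library**, and `--build_extra=ON` builds the **full prediction library**.
## Basic prediction library ([basic OP list](../introduction/support_operation_list.html#basic-operators))
## Basic prediction library ([basic operators](../introduction/support_operation_list.html#id2))
### Supported features
(1) 87 [basic OPs](../introduction/support_operation_list.html#basic-operators) (2) 9 basic models (3) 3 int8 quantized models
(1) 78 [basic operators](../introduction/support_operation_list.html#id2) (2) 9 basic models (3) 3 int8 quantized models
### Supported models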
......@@ -39,12 +39,12 @@ mobilenet_v1 mobilenet_v2 resnet50
```
## Full prediction library ([OP list](../introduction/support_operation_list.html#op))
## Full prediction library ([supported operators](../introduction/support_operation_list.html#id1))
### Supported features
All operators in Paddle-Lite ([basic OPs](../introduction/support_operation_list.html#basic-operators) + [Extra OPs](../introduction/support_operation_list.html#extra-operators-build-extra-on))
All operators in Paddle-Lite ([basic operators](../introduction/support_operation_list.html#id2) + [extra operators](../introduction/support_operation_list.html#id3))
### Characteristics
Includes more operators and supports more models, at the cost of a larger library.
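One way to choose between the two libraries is to check whether a model uses any extra operator. The sketch below assumes you already have the model's operator names (the opt tool is described as able to print them) and uses small excerpts of the basic and extra operator tables. `needs_build_extra` is a hypothetical helper, not a Paddle Lite API.

```python
from typing import Iterable

# Excerpts only: the full lists are the basic (78) and extra (126) operator
# tables in the supported-operators document.
BASIC_OPS = {"conv2d", "depthwise_conv2d", "pool2d", "relu", "softmax", "fc"}
EXTRA_OPS = {"gru", "lstm", "layer_norm", "top_k", "while"}


def needs_build_extra(model_ops: Iterable[str]) -> bool:
    """Return True if any operator requires a library built with --build_extra=ON.

    Raises ValueError for operators found in neither excerpt, so an incomplete
    list is reported instead of silently treated as "basic".
    """
    used = set(model_ops)
    unknown = used - BASIC_OPS - EXTRA_OPS
    if unknown:
        raise ValueError(f"operators not in either list: {sorted(unknown)}")
    return bool(used & EXTRA_OPS)


print(needs_build_extra(["conv2d", "relu", "softmax"]))  # basic library suffices
print(needs_build_extra(["conv2d", "gru"]))              # gru is an extra operator
```

Raising on unknown names is a deliberate choice for the sketch: with only excerpts of the tables loaded, a silent default in either direction could pick the wrong library.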
......