[DOC] fix doc and update supported op list, test=develop, test=document_fix (#4248)

* [DOC] fix doc in readme and compile, test=develop, test=document_fix
* [DOC] update supported op list, test=develop, test=document_fix

ad097764 · Qi Li · GitHub · 02497657
7 changed files
--- a/README.md
+++ b/README.md
@@ -20,55 +20,55 @@ Paddle Lite框架直接支持模型结构为[PaddlePaddle](https://github.com/Pa
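 **2. Model optimization**

 The Paddle Lite framework provides acceleration and optimization strategies, including quantization, subgraph fusion, and kernel selection. An optimized model is lighter, consumes fewer resources, and runs faster.
-These optimizations are applied by the opt tool that ships with Paddle Lite. The opt tool can also collect and print the operators used in a model and report whether Paddle Lite supports them on each hardware platform. After obtaining a model in PaddlePaddle format, you normally need to optimize it with the opt tool. To download and use opt, see [Model optimization](https://paddle-lite.readthedocs.io/zh/develop/user_guides/model_optimize_tool.html).
+These optimizations are applied by the opt tool that ships with Paddle Lite. The opt tool can also collect and print the operators used in a model and report whether Paddle Lite supports them on each hardware platform. After obtaining a model in PaddlePaddle format, you normally need to optimize it with the opt tool. To download and use opt, see [Model optimization](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html).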

 **3. Download or build**

-Paddle Lite provides official release inference libraries for Android, iOS, and X86. We recommend downloading the [Paddle Lite prebuilt libraries](https://paddle-lite.readthedocs.io/zh/develop/quick_start/release_lib.html) directly.
-You can also follow the [source build instructions](https://paddle-lite.readthedocs.io/zh/develop/quick_start/release_lib.html#id2) for your target platform. Paddle Lite provides build scripts under the `lite/tools/` folder; two steps, [preparing the environment](https://paddle-lite.readthedocs.io/zh/develop/source_compile/compile_env.html) and [running the build script](https://paddle-lite.readthedocs.io/zh/develop/quick_start/release_lib.html#id2), produce the Paddle Lite inference library for the target platform.
+Paddle Lite provides official release inference libraries for Android, iOS, and X86. We recommend downloading the [Paddle Lite prebuilt libraries](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html) directly.
+You can also follow the [source build instructions](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#id2) for your target platform. Paddle Lite provides build scripts under the `lite/tools/` folder; two steps, [preparing the environment](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html) and [running the build script](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#id2), produce the Paddle Lite inference library for the target platform.

 **4. Inference examples**

 Paddle Lite provides C++, Java, and Python APIs, each with a complete usage example:

- [Complete C++ example](https://paddle-lite.readthedocs.io/zh/develop/quick_start/cpp_demo.html)
- [Complete Java example](https://paddle-lite.readthedocs.io/zh/develop/quick_start/java_demo.html)
- [Complete Python example](https://paddle-lite.readthedocs.io/zh/develop/quick_start/python_demo.html)
+- [Complete C++ example](https://paddle-lite.readthedocs.io/zh/latest/quick_start/cpp_demo.html)
+- [Complete Java example](https://paddle-lite.readthedocs.io/zh/latest/quick_start/java_demo.html)
+- [Complete Python example](https://paddle-lite.readthedocs.io/zh/latest/quick_start/python_demo.html)

 You can follow the instructions in the examples to learn the usage quickly and integrate it into your own project.

 Paddle Lite also provides a complete example for each hardware platform:

- [Android example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/android_app_demo.html) [[Image classification]](https://paddlelite-demo.bj.bcebos.com/apps/android/mobilenet_classification_demo.apk)  [[Object detection]](https://paddlelite-demo.bj.bcebos.com/apps/android/yolo_detection_demo.apk) [[Mask detection]](https://paddlelite-demo.bj.bcebos.com/apps/android/mask_detection_demo.apk)  [[Face keypoints]](https://paddlelite-demo.bj.bcebos.com/apps/android/face_keypoints_detection_demo.apk) [[Human segmentation]](https://paddlelite-demo.bj.bcebos.com/apps/android/human_segmentation_demo.apk)
- [iOS example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/ios_app_demo.html)
- [ARMLinux example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/linux_arm_demo.html)
- [X86 example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/x86.html)
- [CUDA example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/cuda.html)
- [OpenCL example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/opencl.html)
- [FPGA example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/fpga.html)
- [Huawei NPU example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/npu.html)
- [Baidu XPU example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/baidu_xpu.html)
- [RKNPU example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/rockchip_npu.html)
- [MTK APU example](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/mediatek_apu.html)
+- [Android example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/android_app_demo.html) [[Image classification]](https://paddlelite-demo.bj.bcebos.com/apps/android/mobilenet_classification_demo.apk)  [[Object detection]](https://paddlelite-demo.bj.bcebos.com/apps/android/yolo_detection_demo.apk) [[Mask detection]](https://paddlelite-demo.bj.bcebos.com/apps/android/mask_detection_demo.apk)  [[Face keypoints]](https://paddlelite-demo.bj.bcebos.com/apps/android/face_keypoints_detection_demo.apk) [[Human segmentation]](https://paddlelite-demo.bj.bcebos.com/apps/android/human_segmentation_demo.apk)
+- [iOS example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/ios_app_demo.html)
+- [ARMLinux example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/linux_arm_demo.html)
+- [X86 example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/x86.html)
+- [CUDA example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/cuda.html)
+- [OpenCL example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/opencl.html)
+- [FPGA example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/fpga.html)
+- [华为 (Huawei) NPU example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/huawei_kirin_npu.html)
+- [百度 (Baidu) XPU example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/baidu_xpu.html)
+- [瑞芯微 (Rockchip) NPU example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/rockchip_npu.html)
+- [联发科 (MediaTek) APU example](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/mediatek_apu.html)



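 ## Key features

 - **Multi-hardware support:**
-	- The Paddle Lite architecture has been validated on and fully supports [multiple hardware platforms](https://paddle-lite.readthedocs.io/zh/develop/introduction/support_hardware.html) from mobile to server, including ARM CPU, Mali GPU, Adreno GPU, 华为 (Huawei) NPU, and FPGA, and support for more hardware is being added continuously.
-	- Kernels for different hardware platforms do not interfere with each other at either the code level or the execution level. Users can plug any hardware in or out freely, and [hybrid scheduling](https://paddle-lite.readthedocs.io/zh/develop/introduction/tech_highlights.html#id7) across any hardware visible to the system is also supported.
+	- The Paddle Lite architecture has been validated on and fully supports [multiple hardware platforms](https://paddle-lite.readthedocs.io/zh/latest/introduction/support_hardware.html) from mobile to server, including ARM CPU, Mali GPU, Adreno GPU, 华为 (Huawei) NPU, and FPGA, and support for more hardware is being added continuously.
+	- Kernels for different hardware platforms do not interfere with each other at either the code level or the execution level. Users can plug any hardware in or out freely, and [hybrid scheduling](https://paddle-lite.readthedocs.io/zh/latest/introduction/tech_highlights.html#id7) across any hardware visible to the system is also supported.
 - **Lightweight deployment**:
 	- Paddle Lite decouples the graph-optimization module from the execution engine by design, so a mobile device can deploy the execution stage alone, with no third-party dependencies.
-	- The dynamic library containing the full set of 80 ops and 85 kernels is only 800K on ARMV7 and 1.3M on ARMV8, and the library file can be shrunk further by [tailoring the inference library](https://paddle-lite.readthedocs.io/zh/develop/user_guides/library_tailoring.html).
+	- The dynamic library containing the full set of 80 ops and 85 kernels is only 800K on ARMV7 and 1.3M on ARMV8, and the library file can be shrunk further by [tailoring the inference library](https://paddle-lite.readthedocs.io/zh/latest/user_guides/library_tailoring.html).
 - **High performance:**
 	- Extensive ARM CPU optimization: kernels are customized for the characteristics of different microarchitectures to maximize compute performance, giving leading speed on mainstream models.
 	- Support for the [PaddleSlim model compression tool](https://github.com/PaddlePaddle/PaddleSlim): quantization-aware training, offline quantization, and other quantization methods are supported, and in the best case inference performance improves further with no loss of accuracy. For performance data, see the [benchmark](https://paddlepaddle.github.io/Paddle-Lite/develop/benchmark/).
 - **Many models and operators**:
 	- Paddle Lite's OPs are aligned with those of the PaddlePaddle training framework, providing broad model support.
-	- The accuracy and performance of 24 models and 200 OPs have been strictly validated so far. Vision models are well covered, including classification, detection, and localization, as well as OCR models, and coverage keeps growing. For details, see [Supported OPs](https://paddle-lite.readthedocs.io/zh/develop/introduction/support_operation_list.html).
+	- The accuracy and performance of 24 models and 200 OPs have been strictly validated so far. Vision models are well covered, including classification, detection, and localization, as well as OCR models, and coverage keeps growing. For details, see [Supported OPs](https://paddle-lite.readthedocs.io/zh/latest/introduction/support_operation_list.html).
 - **Strong graph analysis and optimization**:
-	- Unlike typical mobile inference engines, which convert models with Python scripts, the Lite architecture has a complete IR implemented in C++ and a corresponding set of passes, supporting operator fusion, computation pruning, memory optimization, quantized computation, and other graph optimizations. Further optimization strategies can be added in a modular way simply by [adding a new pass](https://paddle-lite.readthedocs.io/zh/develop/develop_guides/add_new_pass.html).
+	- Unlike typical mobile inference engines, which convert models with Python scripts, the Lite architecture has a complete IR implemented in C++ and a corresponding set of passes, supporting operator fusion, computation pruning, memory optimization, quantized computation, and other graph optimizations. Further optimization strategies can be added in a modular way simply by [adding a new pass](https://paddle-lite.readthedocs.io/zh/latest/develop_guides/add_new_pass.html).

 ## Continuous integration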

@@ -97,25 +97,25 @@ Paddle Lite 的架构设计着重考虑了对多硬件和平台的支持，并

 To learn more about Paddle Lite, the following resources cover further study and use of Paddle-Lite:
 ### Documentation and examples
- Full documentation: [Paddle Lite documentation](https://paddle-lite.readthedocs.io/zh/develop/)
+- Full documentation: [Paddle Lite documentation](https://paddle-lite.readthedocs.io/zh/latest/)
 -  API documentation:
-	- [C++ API documentation](https://paddle-lite.readthedocs.io/zh/develop/api_reference/cxx_api_doc.html)
-	- [Java API documentation](https://paddle-lite.readthedocs.io/zh/develop/api_reference/java_api_doc.html)
-	- [Python API documentation](https://paddle-lite.readthedocs.io/zh/develop/api_reference/python_api_doc.html)
-	- [CV image processing API documentation](https://paddle-lite.readthedocs.io/zh/develop/api_reference/cv.html)
+	- [C++ API documentation](https://paddle-lite.readthedocs.io/zh/latest/api_reference/cxx_api_doc.html)
+	- [Java API documentation](https://paddle-lite.readthedocs.io/zh/latest/api_reference/java_api_doc.html)
+	- [Python API documentation](https://paddle-lite.readthedocs.io/zh/latest/api_reference/python_api_doc.html)
+	- [CV image processing API documentation](https://paddle-lite.readthedocs.io/zh/latest/api_reference/cv.html)
 - Paddle Lite project examples: [Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo)
 ### Key technologies
 - Model quantization:
-	-  [Static offline quantization](https://paddle-lite.readthedocs.io/zh/develop/user_guides/post_quant_with_data.html)
-	- [Dynamic offline quantization](https://paddle-lite.readthedocs.io/zh/develop/user_guides/post_quant_no_data.html)
-	- [Quantization-aware training](https://paddle-lite.readthedocs.io/zh/develop/user_guides/model_quantization.html)
- Debugging and analysis: [Debugging and profiling tools](https://paddle-lite.readthedocs.io/zh/develop/user_guides/debug.html)
- On-device model training: [learn more](https://paddle-lite.readthedocs.io/zh/develop/demo_guides/cpp_train_demo.html)
+	-  [Static offline quantization](https://paddle-lite.readthedocs.io/zh/latest/user_guides/post_quant_with_data.html)
+	- [Dynamic offline quantization](https://paddle-lite.readthedocs.io/zh/latest/user_guides/post_quant_no_data.html)
+	- [Quantization-aware training](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_quantization.html)
+- Debugging and analysis: [Debugging and profiling tools](https://paddle-lite.readthedocs.io/zh/latest/user_guides/debug.html)
+- On-device model training: [learn more](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/cpp_train_demo.html)
 - PaddlePaddle pretrained model zoo: browse and download Paddle pretrained models on [PaddleHub](https://www.paddlepaddle.org.cn/hublist?filter=hot&value=1)
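 ### FAQ
- FAQ: for common questions, see the [FAQ](https://paddle-lite.readthedocs.io/zh/develop/introduction/faq.html), search the Issues, or contact us using the contact details at the bottom of the page
+- FAQ: for common questions, see the [FAQ](https://paddle-lite.readthedocs.io/zh/latest/introduction/faq.html), search the Issues, or contact us using the contact details at the bottom of the page
 ### Contributing
- Contributing: if you would like to take part in Paddle Lite development and contribute code, see the [shared developer documentation](https://paddle-lite.readthedocs.io/zh/develop/develop_guides/for-developer.html)
+- Contributing: if you would like to take part in Paddle Lite development and contribute code, see the [shared developer documentation](https://paddle-lite.readthedocs.io/zh/latest/develop_guides/for-developer.html)


 ##  Communication and feedback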

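Editor's note: the README hunks above all follow one pattern, moving each Read the Docs link from the `develop` build to `latest`. As an illustration only, the same rewrite could be scripted roughly as follows. The function name `retarget_docs_links` and the regular expression are assumptions made for this sketch and are not part of the commit.

```python
import re

# Assumed pattern: the Read the Docs host seen in this diff, followed by the
# "develop" version segment. Other hosts are deliberately left untouched.
_DEVELOP_URL = re.compile(r"(https://paddle-lite\.readthedocs\.io/zh/)develop(/)")


def retarget_docs_links(text: str) -> str:
    """Point Read the Docs links at the 'latest' build instead of 'develop'."""
    return _DEVELOP_URL.sub(r"\1latest\2", text)


if __name__ == "__main__":
    line = "- [FAQ](https://paddle-lite.readthedocs.io/zh/develop/introduction/faq.html)"
    print(retarget_docs_links(line))
```

Anchoring the pattern on the Read the Docs host keeps links such as the benchmark page on `paddlepaddle.github.io`, which also contains `develop`, unchanged, matching what the diff does.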
--- a/docs/introduction/support_operation_list.md
+++ b/docs/introduction/support_operation_list.md
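-# Supported OPs
+# Supported operators

-## Ops (158 operators in total)
+Paddle-Lite currently supports 204 operators in total: 78 basic operators and 126 extra operators.

-### Basic Operators (compiled by default)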
- affine_channel
- arg_max
- batch_norm
- bilinear_interp
- box_coder
- calib
- cast
- concat
- conv2d
- conv2d_transpose
- density_prior_box
- depthwise_conv2d
- dropout
- elementwise_add
- elementwise_div
- elementwise_max
- elementwise_mul
- elementwise_sub
- exp
- expand
- fake_channel_wise_dequantize_max_abs
- fake_dequantize_max_abs
- fake_quantize_abs_max
- fake_quantize_dequantize_moving_average_abs_max
- fake_quantize_moving_average_abs_max
- fake_quantize_range_abs_max
- fc
- feed
- fetch
- fill_constant
- fill_constant_batch_size_like
- flatten
- flatten2
- floor
- fusion_elementwise_add_activation
- fusion_elementwise_div_activation
- fusion_elementwise_max_activation
- fusion_elementwise_mul_activation
- fusion_elementwise_sub_activation
- gelu
- grid_sampler
- hard_sigmoid
- instance_norm
- io_copy
- io_copy_once
- layout
- leaky_relu
- log
- matmul
- mean
- mul
- multiclass_nms
- nearest_interp
- pad2d
- pool2d
- prelu
- prior_box
- range
- reduce_mean
- relu
- relu6
- relu_clipped
- reshape
- reshape2
- rsqrt
- scale
- search_fc
- sequence_topk_avg_pooling
- shuffle_channel
- sigmoid
- slice
- softmax
- softsign
- split
- sqrt
- square
- squeeze
- squeeze2
- stack
- subgraph
- swish
- tanh
- transpose
- transpose2
- unsqueeze
- unsqueeze2
- yolo_box
+### Basic operators

-### Extra Operators (compiled only when the `--build_extra=ON` switch is enabled)
+Operators compiled by default, 78 in total:

- anchor_generator
- assign
- assign_value
- attention_padding_mask
- axpy
- beam_search
- beam_search_decode
- box_clip
- calib_once
- collect_fpn_proposals
- conditional_block
- crop
- decode_bboxes
- distribute_fpn_proposals
- equal
- gather
- generate_proposals
- graph_op
- greater_equal
- greater_than
- gru
- gru_unit
- im2sequence
- increment
- is_empty
- layer_norm
- layout_once
- less_equal
- less_than
- lod_reset
- logical_and
- logical_not
- logical_or
- logical_xor
- lookup_table
- lookup_table_v2
- lrn
- match_matrix_tensor
- merge_lod_tensor
- negative
- norm
- not_equal
- power
- read_from_array
- reduce_max
- reduce_prod
- reduce_sum
- roi_align
- search_aligned_mat_mul
- search_attention_padding_mask
- search_grnn
- search_group_padding
- search_seq_arithmetic
- search_seq_depadding
- search_seq_fc
- search_seq_softmax
- sequence_arithmetic
- sequence_concat
- sequence_expand
- sequence_expand_as
- sequence_pool
- sequence_reshape
- sequence_reverse
- sequence_softmax
- shape
- split_lod_tensor
- top_k
- uniform_random
- var_conv_2d
- while
- write_to_array
+| OP Name | Host | X86 | CUDA | ARM | OpenCL | FPGA | 华为 (Huawei) NPU | 百度 (Baidu) XPU | 瑞芯微 (Rockchip) NPU | 联发科 (MediaTek) APU |
+|-:|-|-|-|-|-|-|-|-|-|-|
+| affine_channel | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| affine_grid | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| arg_max | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| assign_value | 　 | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| batch_norm | 　 | Y | 　 | Y | 　 | 　 | Y | Y | Y | 　 |
+| bilinear_interp | 　 | 　 | Y | Y | Y | 　 | Y | 　 | 　 | 　 |
+| box_coder | 　 | 　 | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 |
+| calib | 　 | 　 | Y | Y | 　 | Y | 　 | 　 | 　 | 　 |
+| cast | 　 | Y | 　 | Y | 　 | 　 | 　 | Y | 　 | 　 |
+| concat | 　 | Y | Y | Y | Y | 　 | Y | 　 | Y | 　 |
+| conv2d | 　 | Y | Y | Y | Y | Y | Y | Y | Y | Y |
+| conv2d_transpose | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| density_prior_box | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| depthwise_conv2d | 　 | Y | Y | Y | Y | Y | Y | Y | Y | Y |
+| depthwise_conv2d_transpose | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| dropout | 　 | Y | Y | Y | Y | Y | Y | Y | 　 | 　 |
+| elementwise_add | 　 | Y | Y | Y | Y | Y | Y | Y | Y | Y |
+| elementwise_div | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | Y | 　 |
+| elementwise_max | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| elementwise_mod | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| elementwise_mul | 　 | Y | Y | Y | Y | Y | Y | 　 | Y | Y |
+| elementwise_pow | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| elementwise_sub | 　 | Y | Y | Y | Y | 　 | Y | 　 | Y | 　 |
+| elu | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| expand | Y | 　 | 　 | 　 | Y | 　 | Y | 　 | 　 | 　 |
+| expand_as | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| fc | 　 | Y | Y | Y | Y | Y | Y | 　 | Y | Y |
+| feed | Y | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 | 　 |
+| fetch | Y | 　 | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 |
+| fill_constant | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| fill_constant_batch_size_like | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| flatten | Y | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 |
+| flatten2 | Y | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 |
+| fusion_elementwise_add_activation | 　 | 　 | Y | Y | Y | Y | Y | 　 | 　 | 　 |
+| fusion_elementwise_div_activation | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| fusion_elementwise_max_activation | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| fusion_elementwise_mul_activation | 　 | 　 | Y | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| fusion_elementwise_sub_activation | 　 | 　 | Y | Y | Y | 　 | Y | 　 | 　 | 　 |
+| grid_sampler | 　 | 　 | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 |
+| instance_norm | 　 | 　 | 　 | Y | Y | 　 | Y | 　 | 　 | 　 |
+| io_copy | 　 | 　 | Y | 　 | Y | Y | 　 | 　 | 　 | 　 |
+| io_copy_once | 　 | 　 | Y | 　 | Y | Y | 　 | 　 | 　 | 　 |
+| layout | 　 | 　 | Y | Y | Y | Y | 　 | 　 | 　 | 　 |
+| leaky_relu | 　 | Y | Y | Y | Y | 　 | Y | 　 | 　 | 　 |
+| matmul | 　 | Y | Y | Y | 　 | 　 | Y | Y | 　 | 　 |
+| mul | 　 | Y | Y | Y | 　 | 　 | Y | Y | 　 | 　 |
+| multiclass_nms | Y | 　 | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 |
+| multiclass_nms2 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| nearest_interp | 　 | 　 | Y | Y | Y | 　 | Y | 　 | 　 | 　 |
+| pad2d | 　 | 　 | 　 | Y | Y | 　 | Y | 　 | 　 | 　 |
+| pool2d | 　 | Y | Y | Y | Y | Y | Y | Y | Y | Y |
+| prelu | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| prior_box | 　 | 　 | 　 | Y | 　 | Y | 　 | 　 | 　 | 　 |
+| range | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| reduce_mean | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| relu | 　 | Y | Y | Y | Y | 　 | Y | 　 | Y | Y |
+| relu6 | 　 | 　 | 　 | Y | Y | 　 | Y | 　 | 　 | 　 |
+| reshape | Y | Y | 　 | 　 | Y | 　 | Y | Y | 　 | 　 |
+| reshape2 | Y | Y | 　 | 　 | Y | 　 | Y | Y | 　 | 　 |
+| scale | 　 | Y | Y | Y | Y | Y | Y | Y | 　 | 　 |
+| search_fc | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_topk_avg_pooling | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| shuffle_channel | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| sigmoid | 　 | Y | Y | Y | Y | 　 | Y | 　 | 　 | 　 |
+| slice | 　 | Y | 　 | Y | Y | 　 | 　 | Y | 　 | 　 |
+| softmax | 　 | Y | Y | Y | 　 | 　 | Y | Y | Y | Y |
+| split | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| squeeze | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| squeeze2 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| stack | 　 | Y | 　 | Y | 　 | 　 | 　 | Y | 　 | 　 |
+| subgraph | 　 | 　 | 　 | 　 | 　 | 　 | Y | Y | Y | Y |
+| tanh | 　 | Y | Y | Y | Y | 　 | Y | Y | 　 | 　 |
+| thresholded_relu | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| transpose | 　 | Y | Y | Y | Y | 　 | Y | Y | 　 | 　 |
+| transpose2 | 　 | Y | Y | Y | Y | 　 | Y | Y | 　 | 　 |
+| unsqueeze | Y | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 | 　 |
+| unsqueeze2 | Y | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 | 　 |
+| yolo_box | 　 | 　 | Y | Y | 　 | 　 | 　 | Y | 　 | 　 |


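+### Extra operators

-## Kernels
+There are 126 extra operators in total. They are compiled only when the `--build_extra=ON` switch is enabled at build time; for details, see [parameter details](../source_compile/library).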

-### Host kernels
-
- feed
- fetch
- flatten
- flatten2
- multiclass_nms
- reshape
- reshape2
-
-### ARM kernels
-
- affine_channel
- anchor_generator
- arg_max
- assign
- assign_value
- axpy
- batch_norm
- beam_search
- beam_search_decode
- bilinear_interp
- box_clip
- box_coder
- cast
- collect_fpn_proposals
- concat
- conditional_block
- conv2d
- conv2d_transpose
- crop
- decode_bboxes
- density_prior_box
- depthwise_conv2d
- distribute_fpn_proposals
- dropout
- elementwise_add
- elementwise_div
- elementwise_max
- elementwise_mul
- elementwise_sub
- equal
- exp
- expand
- fc
- fill_constant
- fill_constant_batch_size_like
- floor
- fusion_elementwise_add_activation
- fusion_elementwise_div_activation
- fusion_elementwise_max_activation
- fusion_elementwise_mul_activation
- fusion_elementwise_sub_activation
- gather
- generate_proposals
- greater_equal
- greater_than
- gru
- gru_unit
- hard_sigmoid
- im2sequence
- increment
- instance_norm
- is_empty
- layer_norm
- layout
- layout_once
- leaky_relu
- less_equal
- less_than
- lod_reset
- log
- logical_and
- logical_not
- logical_or
- logical_xor
- lookup_table
- lookup_table_v2
- lrn
- matmul
- merge_lod_tensor
- mul
- nearest_interp
- negative
- norm
- not_equal
- pad2d
- pool2d
- power
- prelu
- prior_box
- range
- read_from_array
- reduce_max
- reduce_mean
- reduce_prod
- relu
- relu6
- relu_clipped
- roi_align
- rsqrt
- scale
- sequence_expand
- sequence_pool
- sequence_softmax
- shape
- shuffle_channel
- sigmoid
- slice
- softmax
- split
- split_lod_tensor
- squeeze
- squeeze2
- stack
- swish
- tanh
- top_k
- transpose
- transpose2
- unsqueeze
- unsqueeze2
- while
- write_to_array
- yolo_box
-
-
-### X86 kernels
- batch_norm
- cast
- concat
- conv2d
- depthwise_conv2d
- dropout
- elementwise_add
- elementwise_sub
- fc
- fill_constant_batch_size_like
- gather
- gelu
- gru
- layer_norm
- match_matrix_tensor
- matmul
- mul
- pool2d
- reduce_sum
- relu
- reshape
- reshape2
- scale
- search_aligned_mat_mul
- search_attention_padding_mask
- search_fc
- search_grnn
- search_group_padding
- search_seq_arithmetic
- search_seq_depadding
- search_seq_fc
- search_seq_softmax
- sequence_arithmetic
- sequence_concat
- sequence_expand_as
- sequence_pool
- sequence_reverse
- sequence_topk_avg_pooling
- shape
- slice
- softmax
- softsign
- square
- squeeze
- squeeze2
- stack
- tanh
- transpose
- transpose2
- var_conv_2d
-
-### CUDA kernels
- attention_padding_mask
- bilinear_interp
- calib
- concat
- conv
- dropout
- elementwise_add
- fusion_elementwise_add_activation
- fusion_elementwise_mul_activation
- elementwise_mul
- feed
- io_copy
- layout
- layout_once
- leaky_relu
- lookup_table
- match_matrix_tensor
- mul
- nearest_interp
- pool2d
- relu
- scale
- search_aligned_mat_mul
- search_fc
- search_grnn
- search_group_padding
- search_seq_depadding
- search_seq_fc
- sequence_arithmetic
- sequence_concat
- sequence_pool
- sequence_reverse
- sequence_topk_avg_pooling
- softmax
- transpose
- var_conv_2d
- yolo_box
-
-### OpenCL kernels
- conv2d
- depthwise_conv2d
- elementwise_add
- fc
- fusion_elementwise_add_activation
- layout
- layout_once
- io_copy
- io_copy_once
- mul
- pool2d
- relu
+| OP Name | Host | X86 | CUDA | ARM | OpenCL | FPGA | 华为 (Huawei) NPU | 百度 (Baidu) XPU | 瑞芯微 (Rockchip) NPU | 联发科 (MediaTek) APU |
+|-:|-|-|-|-|-|-|-|-|-|-|
+| abs | 　 | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| anchor_generator | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| assign | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| attention_padding_mask | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| axpy | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| beam_search | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| beam_search_decode | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| box_clip | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| calib_once | 　 | 　 | Y | Y | 　 | Y | 　 | 　 | 　 | 　 |
+| clip | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| collect_fpn_proposals | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| conditional_block | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| crf_decoding | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| crop | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| ctc_align | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| decode_bboxes | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| deformable_conv | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| distribute_fpn_proposals | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| equal | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| exp | 　 | 　 | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 |
+| fake_channel_wise_dequantize_max_abs | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| fake_dequantize_max_abs | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| fake_quantize_abs_max | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| fake_quantize_dequantize_abs_max | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| fake_quantize_dequantize_moving_average_abs_max | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| fake_quantize_moving_average_abs_max | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| fake_quantize_range_abs_max | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| floor | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| gather | 　 | Y | 　 | Y | 　 | 　 | 　 | Y | 　 | 　 |
+| gelu | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| generate_proposals | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| greater_equal | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| greater_than | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| group_norm | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| gru | 　 | Y | Y | Y | 　 | Y | 　 | 　 | 　 | 　 |
+| gru_unit | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| hard_sigmoid | 　 | 　 | 　 | Y | Y | 　 | Y | 　 | 　 | 　 |
+| hard_swish | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| im2sequence | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| increment | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| is_empty | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| layer_norm | 　 | Y | 　 | Y | 　 | 　 | Y | Y | 　 | 　 |
+| layout_once | 　 | 　 | Y | Y | 　 | Y | 　 | 　 | 　 | 　 |
+| less_equal | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| less_than | Y | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 | 　 |
+| lod_reset | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| log | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| logical_and | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| logical_not | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| logical_or | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| logical_xor | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| lookup_table | 　 | Y | Y | Y | 　 | 　 | 　 | Y | 　 | 　 |
+| lookup_table_dequant | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| lookup_table_v2 | 　 | Y | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| lrn | 　 | 　 | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 |
+| lstm | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| match_matrix_tensor | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| max_pool2d_with_index | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| mean | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| merge_lod_tensor | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| negative | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| norm | 　 | 　 | 　 | Y | 　 | Y | 　 | 　 | 　 | 　 |
+| not_equal | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| one_hot | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| pixel_shuffle | Y | 　 | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 |
+| pow | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| power | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| print | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| read_from_array | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| reciprocal | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| reduce_max | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| reduce_prod | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| reduce_sum | 　 | Y | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| relu_clipped | 　 | 　 | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| retinanet_detection_output | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| roi_align | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| rsqrt | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| search_aligned_mat_mul | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| search_attention_padding_mask | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| search_grnn | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| search_group_padding | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| search_seq_arithmetic | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| search_seq_depadding | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| search_seq_fc | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| search_seq_softmax | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_arithmetic | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_concat | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_conv | 　 | Y | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_expand | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_expand_as | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_mask | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_pad | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_pool | 　 | Y | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_pool_concat | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_reshape | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_reverse | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_reverse_embedding | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_softmax | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| sequence_unpad | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| shape | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| sign | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| softsign | 　 | Y | 　 | 　 | 　 | 　 | Y | 　 | 　 | 　 |
+| split_lod_tensor | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| sqrt | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 | 　 |
+| square | 　 | Y | 　 | Y | 　 | 　 | Y | 　 | 　 | 　 |
+| swish | 　 | 　 | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 |
+| top_k | 　 | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 |
+| topk_pooling | 　 | 　 | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| uniform_random | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| var_conv_2d | 　 | Y | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| where_index | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| while | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| write_to_array | Y | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | 　 |
+| __xpu__conv2d | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__embedding_with_eltwise_add | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__fc | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__mmdnn_bid_emb_att | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__mmdnn_bid_emb_grnn_att | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__mmdnn_bid_emb_grnn_att2 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__mmdnn_match_conv_topk | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__mmdnn_merge_all | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__mmdnn_search_attention | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__multi_encoder | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__resnet_cbam | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__resnet50 | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
+| __xpu__sfa_head | 　 | 　 | 　 | 　 | 　 | 　 | 　 | Y | 　 | 　 |
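Editor's note: the support tables above can be queried mechanically, for example to list every operator available on one backend. The following is a minimal sketch, assuming the table text is passed in as a string; the helper names `parse_support_table` and `ops_on` are hypothetical, and the sample rows are copied from the tables in this diff.

```python
def parse_support_table(markdown: str) -> dict:
    """Map each operator name to the set of backends marked 'Y'."""
    # Drop a leading diff '+' marker, then keep only table rows.
    rows = [line.lstrip("+").strip() for line in markdown.splitlines()]
    rows = [row for row in rows if row.startswith("|")]
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    backends = header[1:]
    support = {}
    for row in rows[2:]:  # rows[1] is the alignment row
        # str.strip() also removes the full-width spaces used in empty cells.
        cells = [cell.strip() for cell in row.strip("|").split("|")]
        support[cells[0]] = {b for b, mark in zip(backends, cells[1:]) if mark == "Y"}
    return support


def ops_on(support: dict, backend: str) -> list:
    """Return the operators supported on one backend, sorted by name."""
    return sorted(op for op, backends in support.items() if backend in backends)


SAMPLE = """
+| OP Name | Host | X86 | CUDA | ARM | OpenCL | FPGA | Huawei NPU | Baidu XPU | Rockchip NPU | MediaTek APU |
+|-:|-|-|-|-|-|-|-|-|-|-|
+| expand | Y |   |   |   | Y |   | Y |   |   |   |
+| gelu |   | Y |   |   |   |   |   |   |   |   |
+| sequence_mask |   |   | Y |   |   |   |   |   |   |   |
"""

if __name__ == "__main__":
    table = parse_support_table(SAMPLE)
    print(ops_on(table, "Host"))
    print(sorted(table["expand"]))
```

A dictionary of sets keeps both directions of lookup simple: membership per operator is a set test, and a per-backend listing is one comprehension.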
--- a/docs/quick_start/release_lib.md
+++ b/docs/quick_start/release_lib.md
@@ -80,5 +80,5 @@ pip install paddlelite
 - [FPGA source build](../demo_guides/fpga)
 - [华为 (Huawei) NPU source build](../demo_guides/huawei_kirin_npu)
 - [百度 (Baidu) XPU source build](../demo_guides/baidu_xpu)
- [Rockchip NPU source build](../demo_guides/rockchip_npu)
- [MediaTek APU source build](../demo_guides/mediatek_apu)
+- [瑞芯微 (Rockchip) NPU source build](../demo_guides/rockchip_npu)
+- [联发科 (MediaTek) APU source build](../demo_guides/mediatek_apu)
--- a/docs/source_compile/compile_env.md
+++ b/docs/source_compile/compile_env.md
@@ -5,13 +5,28 @@ Paddle Lite提供了Android/iOS/X86平台的官方Release预测库下载，如

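 You can also follow the source build instructions for your target platform. Paddle Lite provides build scripts under the `lite/tools/` folder; two steps, "preparing the environment" and "running the build script", produce the Paddle Lite inference library for the target platform.

-Four build environments are currently supported:
+Four build and development environments are currently supported:

 1. [Docker development environment](compile_env.html#docker)
 2. [Linux development environment](compile_env.html#linux)
 3. [Mac OS development environment](compile_env.html#mac-os)
 4. [Windows development environment](compile_env.html#windows)

+Source builds are supported for the following platforms:
+
+- [Android source build](../source_compile/compile_andriod)
+- [iOS source build](../source_compile/compile_ios)
+- [ArmLinux source build](../source_compile/compile_linux)
+- [X86 source build](../demo_guides/x86)
+- [OpenCL source build](../demo_guides/opencl)
+- [CUDA source build](../demo_guides/cuda)
+- [FPGA source build](../demo_guides/fpga)
+- [华为 (Huawei) NPU source build](../demo_guides/huawei_kirin_npu)
+- [百度 (Baidu) XPU source build](../demo_guides/baidu_xpu)
+- [瑞芯微 (Rockchip) NPU source build](../demo_guides/rockchip_npu)
+- [联发科 (MediaTek) APU source build](../demo_guides/mediatek_apu)
+- [Source build of the model optimization tool opt](../user_guides/model_optimize_tool.html#opt)
+
 ## 1. Docker development environment

 [Docker](https://www.docker.com/) is an open-source application container engine that uses sandboxing to create isolated containers, making it easy to run different programs. The Lite Docker image is based on Ubuntu 16.04 and contains the software dependencies and tools required for developing on Android, Linux, and other platforms.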

--- a/docs/source_compile/compile_linux.md
+++ b/docs/source_compile/compile_linux.md
@@ -90,7 +90,7 @@ inference_lite_lib.armlinux.armv8
 --opt_model_dir:          absolute path of the input model, which must already have been converted by opt
 ```

- To build the Rockchip NPU inference library, see [Deploying PaddleLite inference on RK NPU](../demo_guides/rockchip_npu)
+- To build the 瑞芯微 (Rockchip) NPU inference library, see [Deploying PaddleLite inference on RK NPU](../demo_guides/rockchip_npu)

 ```shell
 --with_rockchip_npu: (OFF|ON)    whether to build the rockchip_npu inference library; default OFF
@@ -98,7 +98,7 @@ inference_lite_lib.armlinux.armv8
 ```


- To build the Baidu XPU inference library, see [Deploying PaddleLite inference on 百度 (Baidu) XPU](../demo_guides/baidu_xpu)
+- To build the 百度 (Baidu) XPU inference library, see [Deploying PaddleLite inference on 百度 (Baidu) XPU](../demo_guides/baidu_xpu)

 ```shell
 --with_baidu_xpu: (OFF|ON)    whether to build the baidu_xpu inference library; default OFF

--- a/docs/source_compile/compile_options.md
+++ b/docs/source_compile/compile_options.md
@@ -51,8 +51,8 @@
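 | LITE_WITH_PYTHON |  Build the inference library with [Python API](../api_reference/python_api_doc.html) support | X86 / CUDA |OFF |
 | LITE_WITH_OPENCL |  Build the inference library for the [OpenCL platform](../demo_guides/opencl.html) | OpenCL | OFF |
 | LITE_WITH_FPGA |  Build the inference library for the [FPGA platform](../demo_guides/fpga.html) | FPGA | OFF |
-| LITE_WITH_NPU |  Build the inference library for the [华为 (Huawei) NPU (Kirin SoC) platform](../demo_guides/huawei_kirin_npu.html) | NPU | OFF |
-| LITE_WITH_RKNPU |  Build the inference library for the [RK NPU platform](../demo_guides/rockchip_npu.html) | RKNPU | OFF |
+| LITE_WITH_NPU |  Build the inference library for the [华为 (Huawei) NPU platform](../demo_guides/huawei_kirin_npu.html) | NPU | OFF |
+| LITE_WITH_RKNPU |  Build the inference library for the [瑞芯微 (Rockchip) NPU platform](../demo_guides/rockchip_npu.html) | RKNPU | OFF |
 | LITE_WITH_XPU |  Build the inference library for the [百度 (Baidu) XPU platform](../demo_guides/baidu_xpu.html) | XPU |OFF |
 | LITE_WITH_XTCL | Support 百度 (Baidu) XPU through XTCL; the default is the kernel approach | XPU |OFF IF LITE_WITH_XPU |
-| LITE_WITH_APU | Build the inference library for the [MTK APU platform](../demo_guides/mediatek_apu.html) | APU |OFF |
+| LITE_WITH_APU | Build the inference library for the [联发科 (MediaTek) APU platform](../demo_guides/mediatek_apu.html) | APU |OFF |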
--- a/docs/source_compile/library.md
+++ b/docs/source_compile/library.md
@@ -5,12 +5,12 @@ Lite预测库分为**基础预测库**和**全量预测库(with_extra)**：基

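 The choice is controlled at build time by the build option `build_extra` (default OFF): `--build_extra=OFF` builds the **basic inference library**, and `--build_extra=ON` builds the **full inference library**.

-## Basic inference library ( [basic OP list](../introduction/support_operation_list.html#basic-operators) )
+## Basic inference library ( [basic operators](../introduction/support_operation_list.html#id2) )


 ### Supported features

-(1) 87 [basic OPs](../introduction/support_operation_list.html#basic-operators)       (2) 9 basic models       (3) 3 int8 quantized models
+(1) 78 [basic operators](../introduction/support_operation_list.html#id2)       (2) 9 basic models       (3) 3 int8 quantized models


 ### Supported models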
@@ -39,12 +39,12 @@ mobilenet_v1   mobilenet_v2   resnet50
 ```


-## Full inference library ( [OP list](../introduction/support_operation_list.html#op) )
+## Full inference library ( [supported operators](../introduction/support_operation_list.html#id1) )


 ### Supported features

-   The full set of operators in Paddle-Lite ( [basic OPs](../introduction/support_operation_list.html#basic-operators) + [Extra OPs](../introduction/support_operation_list.html#extra-operators-build-extra-on) )
+   The full set of operators in Paddle-Lite ( [basic operators](../introduction/support_operation_list.html#id2) + [extra operators](../introduction/support_operation_list.html#id3) )

 ### Characteristics
   Contains more operators and supports more models, but is larger.
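Editor's note: one way to choose between the basic and the full library is to compare a model's operator list against the basic operator list. The sketch below illustrates that comparison under stated assumptions: `BASIC_OPS_EXCERPT` is a small hand-picked excerpt of the 78 basic operators, and both function names are hypothetical. It does not call Paddle Lite or the opt tool.

```python
# Assumed excerpt of the basic operator list; the documented list has 78 entries.
BASIC_OPS_EXCERPT = {"conv2d", "pool2d", "softmax", "fc", "relu"}


def ops_requiring_extra(model_ops, basic_ops):
    """Return the model's operators that are not in the basic set, sorted by name."""
    return sorted(set(model_ops) - set(basic_ops))


def needs_full_library(model_ops, basic_ops):
    """True if at least one operator falls outside the basic set."""
    return bool(ops_requiring_extra(model_ops, basic_ops))


if __name__ == "__main__":
    # gru and lstm appear in the extra operator table, so this model
    # would need a library built with --build_extra=ON.
    model_ops = ["conv2d", "relu", "gru", "softmax", "lstm"]
    print(ops_requiring_extra(model_ops, BASIC_OPS_EXCERPT))  # ['gru', 'lstm']
    print(needs_full_library(model_ops, BASIC_OPS_EXCERPT))   # True
```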