diff --git a/docs/benchmark/benchmark_tools.md b/docs/benchmark/benchmark_tools.md
index 36bf8831f142b1bd6c988b0ece7192437643fcbf..3cf1486307ad79a47dfbfe199e3d6d708c99db4b 100644
--- a/docs/benchmark/benchmark_tools.md
+++ b/docs/benchmark/benchmark_tools.md
@@ -135,53 +135,53 @@ sh benchmark.sh ./benchmark_bin_v8 ./benchmark_models result_armv8.txt true
 > Benchmark results for the test models differ across phones and across versions.
 
 ```shell
-run benchmark armv7
+run benchmark armv8
 --------------------------------------
 PaddleLite Benchmark
 Threads=1 Warmup=10 Repeats=30
--- mnasnet avg = 159.8427 ms
--- mobilenet_v1 avg = 235.0072 ms
--- mobilenet_v2 avg = 173.0387 ms
--- shufflenet_v2 avg = 76.0040 ms
--- squeezenet_v11 avg = 164.2957 ms
+mnasnet min = 19.83500 max = 19.38500 average = 19.65503
+mobilenetv1 min = 32.00600 max = 31.56900 average = 31.81983
+mobilenetv2 min = 22.37900 max = 22.08700 average = 22.28623
+shufflenetv2 min = 10.80400 max = 10.62900 average = 10.68890
+squeezenet min = 17.67400 max = 17.47900 average = 17.57677
 Threads=2 Warmup=10 Repeats=30
--- mnasnet avg = 83.1287 ms
--- mobilenet_v1 avg = 121.6029 ms
--- mobilenet_v2 avg = 86.6175 ms
--- shufflenet_v2 avg = 41.5761 ms
--- squeezenet_v11 avg = 87.8678 ms
+mnasnet min = 11.85600 max = 11.72000 average = 11.77127
+mobilenetv1 min = 18.75000 max = 18.64300 average = 18.70593
+mobilenetv2 min = 14.05100 max = 13.59900 average = 13.71450
+shufflenetv2 min = 6.67200 max = 6.58300 average = 6.63400
+squeezenet min = 12.07100 max = 11.33400 average = 11.41253
 Threads=4 Warmup=10 Repeats=30
--- mnasnet avg = 73.3880 ms
--- mobilenet_v1 avg = 119.0739 ms
--- mobilenet_v2 avg = 85.3050 ms
--- shufflenet_v2 avg = 38.0762 ms
--- squeezenet_v11 avg = 64.2201 ms
+mnasnet min = 7.19300 max = 7.02600 average = 7.08480
+mobilenetv1 min = 10.42000 max = 10.29100 average = 10.34267
+mobilenetv2 min = 8.61900 max = 8.46900 average = 8.54707
+shufflenetv2 min = 4.55200 max = 4.41900 average = 4.46477
+squeezenet min = 8.60000 max = 7.85200 average = 7.98407
 --------------------------------------
-run benchmark armv8
+run benchmark armv7
 --------------------------------------
 PaddleLite Benchmark
 Threads=1 Warmup=10 Repeats=30
--- mnasnet avg = 165.3073 ms
--- mobilenet_v1 avg = 306.0188 ms
--- mobilenet_v2 avg = 195.1884 ms
--- shufflenet_v2 avg = 99.3692 ms
--- squeezenet_v11 avg = 156.6971 ms
+mnasnet min = 20.98300 max = 20.81400 average = 20.92527
+mobilenetv1 min = 33.19000 max = 32.81700 average = 33.08490
+mobilenetv2 min = 25.91400 max = 25.61700 average = 25.73097
+shufflenetv2 min = 11.14300 max = 10.97600 average = 11.06757
+squeezenet min = 19.31800 max = 19.20000 average = 19.26530
 Threads=2 Warmup=10 Repeats=30
--- mnasnet avg = 90.2290 ms
--- mobilenet_v1 avg = 157.0007 ms
--- mobilenet_v2 avg = 118.1607 ms
--- shufflenet_v2 avg = 68.6804 ms
--- squeezenet_v11 avg = 91.3090 ms
+mnasnet min = 12.59900 max = 12.46600 average = 12.52207
+mobilenetv1 min = 19.05800 max = 18.94700 average = 18.97897
+mobilenetv2 min = 15.28400 max = 15.11300 average = 15.19843
+shufflenetv2 min = 6.97000 max = 6.81400 average = 6.90863
+squeezenet min = 12.87900 max = 12.12900 average = 12.22530
 Threads=4 Warmup=10 Repeats=30
--- mnasnet avg = 179.9730 ms
--- mobilenet_v1 avg = 204.0684 ms
--- mobilenet_v2 avg = 181.6486 ms
--- shufflenet_v2 avg = 123.2728 ms
--- squeezenet_v11 avg = 412.9046 ms
+mnasnet min = 7.31400 max = 7.12900 average = 7.20357
+mobilenetv1 min = 11.44000 max = 10.86900 average = 10.94383
+mobilenetv2 min = 9.14900 max = 9.03800 average = 9.09907
+shufflenetv2 min = 4.60600 max = 4.49400 average = 4.53360
+squeezenet min = 8.27000 max = 8.10600 average = 8.19000
 --------------------------------------
 ```
diff --git a/docs/demo_guides/npu.md b/docs/demo_guides/npu.md
index 9722ff6aabda87cb02adc111dd1b29e9bdcf3f55..0bdec8d73a881c186d9c4141e2d59a1b2bf11d8b 100644
--- a/docs/demo_guides/npu.md
+++ b/docs/demo_guides/npu.md
@@ -103,7 +103,6 @@ $ ./lite/tools/build_npu.sh --arm_os=android --arm_abi=armv7 --arm_lang=gcc --an
     --optimize_out_type=(protobuf|naive_buffer) \
     --optimize_out= \
     --valid_targets=npu,arm \
-    --prefer_int8_kernel=(true|false) \
     --record_tailoring_info =(true|false)
 ```
 - The model generated by model_optimize_tool only marks the Paddle operators supported by the NPU; it does not actually produce an NPU HiAI model. The marked Paddle operators are converted to HiAI IR only at execution time, when the HiAI model is finally generated and executed. See PR [2576](https://github.com/PaddlePaddle/Paddle-Lite/pull/2576) for the concrete implementation.
diff --git a/docs/user_guides/model_optimize_tool.md b/docs/user_guides/model_optimize_tool.md
index 47f663dc75cdcf0950c87bfe45a78e65604ccbaf..c3d5f527048519e851cc8b9e785dc39668e971a4 100644
--- a/docs/user_guides/model_optimize_tool.md
+++ b/docs/user_guides/model_optimize_tool.md
@@ -83,7 +83,6 @@ PaddlePaddle models can be saved in two formats:
     --optimize_out_type=(protobuf|naive_buffer) \
     --optimize_out= \
     --valid_targets=(arm|opencl|x86|npu|xpu) \
-    --prefer_int8_kernel=(true|false) \
     --record_tailoring_info =(true|false)
 ```
 
@@ -95,12 +94,12 @@ PaddlePaddle models can be saved in two formats:
 | --optimize_out_type | Output model format. Two formats are currently supported, protobuf and naive_buffer, where naive_buffer is a more lightweight serialization/deserialization implementation. If you need to run inference on mobile, set this option to naive_buffer. Defaults to protobuf. |
 | --optimize_out | Output path of the optimized model. |
 | --valid_targets | Backends on which the model can run; defaults to arm. x86, arm, opencl, npu, and xpu are currently supported, and several backends can be specified at the same time (separated by spaces); the Model Optimize Tool will automatically choose the best option. To support Huawei NPU (the DaVinci-architecture NPU in Kirin 810/990 SoCs), set this to npu, arm. |
-| --prefer_int8_kernel | If the model to be optimized is an int8 quantized model (for example, one obtained from quantization-aware training), set this option to true to use int8 kernels and speed up inference. Defaults to false. |
 | --record_tailoring_info | When using the [tailor library files by model](./library_tailoring.html) feature, set this option to true to record the kernels and OPs contained in the optimized model. Defaults to false. |
 
 * If the fluid model to be optimized is in non-combined form, set `--model_dir` and ignore `--model_file` and `--param_file`.
 * If the fluid model to be optimized is in combined form, set `--model_file` and `--param_file` and ignore `--model_dir`.
 * The optimized model is a single file whose name ends with `.nb`.
+* The `prefer_int8_kernel` input argument has been removed; `opt` automatically determines whether the model is quantized and applies the corresponding optimizations.
 
 ### Function 2: collect model operator information and check operator support
 
diff --git a/docs/user_guides/model_quantization.md b/docs/user_guides/model_quantization.md
index d90fa4bae34cccbcf809bdb2cd102eaf8c468b01..cf506cfa61e3942452ddaf1218d9d55c2fffa3fc 100644
--- a/docs/user_guides/model_quantization.md
+++ b/docs/user_guides/model_quantization.md
@@ -245,7 +245,6 @@ python compress.py \
 --optimize_out_type=naive_buffer \
 --optimize_out=mobilenet_v1_quant_opt \
---valid_targets=arm \
---prefer_int8_kernel=true
+--valid_targets=arm
 ```
 
 As mentioned above, after quantization-aware training the parameters of the model in the float directory fall within the int8 range, but their data type is still float32, so the model parameters have not actually been compressed. However, after optimization with model\_optimize\_tool the corresponding quantized parameters are re-stored as int8, which achieves the parameter compression, and the model structure is optimized as well (for example, various operator fuse passes are applied).
diff --git a/docs/user_guides/post_quant_no_data.md b/docs/user_guides/post_quant_no_data.md
index 206045822b896e07fca2651768b32c89c7615cb2..4068249ff7544f42c5f2643c971eb003836b1f59 100644
--- a/docs/user_guides/post_quant_no_data.md
+++ b/docs/user_guides/post_quant_no_data.md
@@ -86,7 +86,6 @@ WeightQuantization.quantize_weight_to_int(save_model_dir,
 Refer to [model conversion](../user_guides/model_optimize_tool) to prepare the model conversion tool; downloading it from the Release page is recommended.
 
 Refer to [model conversion](../user_guides/model_optimize_tool) to use the model conversion tool.
-Because this model dequantizes the quantized weights and then actually loads and runs an FP32 inference model, the --prefer_int8_kernel input argument of the opt command does not need to be set to true; set the other arguments according to your actual situation with reference to the documentation.
 For example, to run inference on the ARM side of an Android phone, the model conversion command is:
 ```bash
 ./opt --model_dir=./mobilenet_v1_quant \
diff --git a/docs/user_guides/post_quant_with_data.md b/docs/user_guides/post_quant_with_data.md
index 8b293cc7e47a33037de3706a30fd583c5516d165..0044b47610a2a211859bdc42f83f1921a681d50b 100644
--- a/docs/user_guides/post_quant_with_data.md
+++ b/docs/user_guides/post_quant_with_data.md
@@ -147,13 +147,12 @@ with fluid.name_scope('skip_quant'):
 
 Refer to [model conversion](../user_guides/model_optimize_tool) to prepare the model conversion tool; downloading it from the Release page is recommended.
 
-Refer to [model conversion](../user_guides/model_optimize_tool) to use the model conversion tool. Note that the --prefer_int8_kernel input argument of the opt command must be set to true; set the other arguments according to your actual situation with reference to the documentation. For example, to run inference on the ARM side of an Android phone, the model conversion command is:
+Refer to [model conversion](../user_guides/model_optimize_tool) to use the model conversion tool and set the arguments according to your actual situation. For example, to run inference on the ARM side of an Android phone, the model conversion command is:
 ```bash
 ./opt --model_dir=./mobilenet_v1_quant \
     --optimize_out_type=naive_buffer \
     --optimize_out=mobilenet_v1_quant_opt \
-    --valid_targets=arm \
-    --prefer_int8_kernel=true
+    --valid_targets=arm
 ```
 
 ### 3.2 Quantized model inference
diff --git a/docs/user_guides/tutorial.md b/docs/user_guides/tutorial.md
index 6bb71938cab16a92e1c33e3d8276872fbcea580a..8f8aeb6af124bc4805c281e22e39cca51b507651 100644
--- a/docs/user_guides/tutorial.md
+++ b/docs/user_guides/tutorial.md
@@ -24,8 +24,7 @@ $ ./opt \
     --param_file= \
     --optimize_out_type=(protobuf|naive_buffer) \
     --optimize_out= \
-    --valid_targets=(arm|opencl|x86) \
-    --prefer_int8_kernel=(ture|false)
+    --valid_targets=(arm|opencl|x86)
 ```
 
 Here, optimize_out is the output path you want for the optimized model. optimize_out_type specifies the serialization format of the output model; Protobuf and Naive Buffer are currently supported, where Naive Buffer is a more lightweight serialization/deserialization implementation. If you need to use Lite for inference on mobile, you need to set optimize_out_type=naive_buffer.
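Taken together, the changes above mean a quantized model now goes through `opt` with no extra flag before being benchmarked. A minimal sketch of that flow, assuming the `./opt` binary, the `./mobilenet_v1_quant` model directory, and the `benchmark.sh`/`benchmark_bin_v8` setup from the docs above are already in place:

```bash
# Convert a quantized model; opt detects the quantization itself,
# so --prefer_int8_kernel is no longer passed (removed in this change).
./opt --model_dir=./mobilenet_v1_quant \
    --optimize_out_type=naive_buffer \
    --optimize_out=mobilenet_v1_quant_opt \
    --valid_targets=arm

# Benchmark the models on a connected armv8 device, writing results to result_armv8.txt
# (same invocation as shown in benchmark_tools.md above).
sh benchmark.sh ./benchmark_bin_v8 ./benchmark_models result_armv8.txt true
```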