Closed
Opened Jul 30, 2019 by saxon_zh (Guest)

QAT&INT8 performance improvement

Created by: wojtuss

Currently we have two new QAT-trained ResNet50 models (from Baidu) at our disposal:

  I. with conv2d and mul weights in FP32 full range (Model1),
  II. with conv2d and mul weights in FP32 but (fake-)quantized (Model2).

Model2 has the advantage that it can be used directly in inference (fake INT8). Model1 cannot be used directly; it has to be modified first (either its weights have to be fake-quantized, resulting in a fake INT8 model, or the fake quantize/dequantize ops have to be removed, resulting in a full FP32 inference model).
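To make the difference between the two weight formats concrete, here is a minimal numpy sketch of fake quantization. It assumes symmetric per-tensor INT8 quantization with an abs-max scale; the scheme actually used during QAT training may differ, so treat it as illustrative only.

```python
import numpy as np

# Model1-style weights: plain full-range FP32 values.
w_fp32 = np.random.randn(64, 3, 7, 7).astype(np.float32)

# Fake quantization: quantize to INT8 and immediately dequantize back to FP32.
scale = np.abs(w_fp32).max() / 127.0                  # assumed abs-max scale
w_int8 = np.clip(np.round(w_fp32 / scale), -127, 127)
w_fake_quant = (w_int8 * scale).astype(np.float32)    # Model2-style weights

# Model2's weights are still FP32 tensors, but they only take values on the
# INT8 grid, so the model can be run directly as a fake INT8 inference model.
print(np.unique(w_fake_quant).size)  # at most 255 distinct values
```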

Our general approach to getting an optimized INT8 model in both cases is as follows:

  1. gather scale values from the fake quantize/dequantize operators,
  2. extract an FP32 inference model from the QAT model, i.e.
     a. remove the fake quantize/dequantize operators,
     b. dequantize conv2d and mul weights, if needed,
  3. optimize the FP32 inference model using standard fusing passes (e.g. conv2d+bn, conv2d+relu, …),
  4. quantize the optimized FP32 model using the standard INT8v2 quantization passes (cpu_quantize_pass, cpu_quantize_squash_pass); a numeric sketch of how the gathered scales are used is given right after this list.
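The sketch below is plain numpy and purely illustrative (the helper names, the symmetric abs-max scheme and the per-tensor granularity are assumptions, not the actual cpu_quantize_pass implementation). It shows what the scales gathered in step 1 are needed for in step 4: an INT8 op accumulates into INT32, and the result is rescaled back to FP32 by the product of the activation and weight scales.

```python
import numpy as np

def absmax_scale(x):
    # Assumed symmetric per-tensor scale for signed INT8.
    return np.abs(x).max() / 127.0

def quantize(x, s):
    return np.clip(np.round(x / s), -127, 127).astype(np.int8)

# Step 1 gathers such scales from the fake quantize/dequantize ops;
# here we simply recompute them from the data.
a_fp32 = np.random.randn(16, 32).astype(np.float32)  # activation
w_fp32 = np.random.randn(32, 8).astype(np.float32)   # mul (FC) weights
s_a, s_w = absmax_scale(a_fp32), absmax_scale(w_fp32)

# Step 4, conceptually: run the op on INT8 inputs, dequantize the INT32 accumulator.
a_i8, w_i8 = quantize(a_fp32, s_a), quantize(w_fp32, s_w)
out_i32 = a_i8.astype(np.int32) @ w_i8.astype(np.int32)
out_fp32 = out_i32 * (s_a * s_w)

print(np.abs(out_fp32 - a_fp32 @ w_fp32).max())  # small quantization error
```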

Initially we started work with Model1 and finished implementing all but the last step. That is, from Model1 we obtained a fully optimized FP32 inference model (we verified that accuracy is preserved at this point) which is ready to be quantized; only step 4 remained to be implemented and applied. After switching to Model2, steps 2.b and 4 remain to be implemented and applied.

If step 2.b for Model2 turns out to be too difficult to perform while preserving accuracy, we would recommend continuing the work on Model1.

We expect the whole procedure to be working next week (WW31).

Reference: paddlepaddle/Paddle#18893