PaddlePaddle / Paddle · Issue #13466
Opened September 18, 2018 by saxon_zh (Guest)

Int8 design

Created by: Superjomn


There are two ways to support int8 conversion from float32:

  • Convert the values directly with some mathematical method
    • Pros: simple
    • Cons: some precision loss for some models
  • Retrain the model in the int8 domain
    • Pros: lower precision loss for all models
    • Cons: heavier to use

The converting approach is common and easy to carry out, and it works for both server and mobile deployment. The retraining approach is harder to use but gives a better result, so we can make it a special feature of Paddle inference.

Converting way

  • Input: the float32 model
  • Run the infer-predictor N times and collect statistics from the predictions (with Python).
  • Generate the int8 conversion information for the parameters and persist it in some way (with Python; see the sketch after this list).
  • The infer-predictor loads the transformed model
    • Load the parameters
    • Load the program, run IR passes, and change the graph if needed to support int8.
    • Run prediction ...
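To make the statistics and conversion steps concrete, here is a minimal sketch, assuming a generic `predictor.run()` interface that returns named activation tensors and simple per-tensor abs-max calibration; the function names and return shapes are illustrative assumptions, not Paddle's actual API.

```python
# Illustrative calibration sketch (not Paddle's interface).
import numpy as np

def collect_abs_max(predictor, calibration_batches):
    """Run the float32 predictor and record the max |activation| per tensor."""
    abs_max = {}
    for batch in calibration_batches:
        outputs = predictor.run(batch)  # assumed to return {tensor_name: ndarray}
        for name, tensor in outputs.items():
            abs_max[name] = max(abs_max.get(name, 0.0), float(np.abs(tensor).max()))
    return abs_max

def compute_scales(abs_max):
    """Map each tensor name to its float32 -> int8 scale."""
    return {name: 127.0 / m if m > 0 else 1.0 for name, m in abs_max.items()}

def quantize_weights(weights, scales):
    """Convert float32 parameters to int8 using the per-tensor scales."""
    return {name: np.clip(np.round(w * scales[name]), -127, 127).astype(np.int8)
            for name, w in weights.items()}
```

The scales produced here are what would be persisted as the "int8 conversion information" and later consumed by the IR passes at load time.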

NOTE: the parameter statistics and parameter conversion can be developed in Python, while the program modification is better developed as IR passes, so that the infer-predictor can unify all the optimizations, for example scheduling the int8 kernels as well as some high-performance float kernels in the same graph.
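As an illustration of what such a pass could do, the toy sketch below uses a plain list-of-dicts graph (not Paddle's real IR) and replaces quantizable float32 ops with hypothetical int8 counterparts, inserting quantize/dequantize nodes around them; all op names here are assumptions for illustration only.

```python
# Toy graph-rewrite sketch; op names and graph representation are hypothetical.
QUANTIZABLE = {"conv2d": "conv2d_int8", "mul": "mul_int8"}

def int8_conversion_pass(graph, scales):
    """graph: list of op dicts {'type', 'inputs', 'outputs'} standing in for IR nodes."""
    new_graph = []
    for op in graph:
        if op["type"] in QUANTIZABLE:
            # Quantize each float32 input using the calibrated scale.
            for name in op["inputs"]:
                new_graph.append({"type": "quantize", "inputs": [name],
                                  "outputs": [name + "_int8"], "scale": scales[name]})
            # Replace the float32 op with its int8 counterpart.
            new_graph.append({"type": QUANTIZABLE[op["type"]],
                              "inputs": [n + "_int8" for n in op["inputs"]],
                              "outputs": [o + "_int8" for o in op["outputs"]]})
            # Dequantize back to float32 so downstream float kernels still work.
            for name in op["outputs"]:
                new_graph.append({"type": "dequantize", "inputs": [name + "_int8"],
                                  "outputs": [name], "scale": scales[name]})
        else:
            new_graph.append(op)
    return new_graph
```

Keeping this rewrite as an IR pass means the same pass pipeline can also schedule the high-performance float kernels mentioned above.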

Retraining way

We have made some progress with this approach and have tested it on several image classification models, where it works well.

It works as follows:

  • Input: the float32 model (Python)
  • Run the trainer with some int8-domain operators inserted into the program (Python; see the sketch after this list)
  • Persist the model (Python)
  • The infer-predictor loads the model
    • An IR pass modifies the program (removes the int8 training operators)
    • An IR pass transforms the parameters (converts the float32 parameters to the format needed for int8 inference)
    • Run the inference
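For intuition, here is a minimal numpy sketch of the kind of int8-domain operator that could be inserted during retraining: a fake-quantize step that simulates int8 rounding in the forward pass while keeping float32 storage. This is an illustrative assumption, not Paddle's actual quant-aware-training operators.

```python
# Illustrative fake-quantization sketch (not Paddle's operator implementation).
import numpy as np

def fake_quantize(x, num_bits=8):
    """Simulate int8 precision on a float32 tensor during training."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    max_abs = float(np.abs(x).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)  # round to the int8 grid
    return q * scale                               # dequantize back to float32

# During training, weights/activations pass through fake_quantize so the model
# adapts to int8 rounding; at export time the IR pass removes these operators
# and persists the learned scales for the real int8 kernels.
```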

How to determine the boundary between Python and C++

We prefer to keep more code in C++ so that it can be better reused and controlled, but since Python is better suited to data analysis, we should:

  • Leave the data analysis and some parameter transformation in Python
  • Do the program modification in C++, as IR passes