Created by: juncaipeng
PR types
Others
PR changes
Others
Describe
To avoid caching the sampled data, compute the minimum and maximum absolute values of every quantized tensor in the preparation stage, then update each quantized tensor's histogram in the sampling stage.
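- Optimize the offline quantization method: the first forward pass computes the minimum and maximum absolute values of all quantized tensors; the second forward pass folds the statistics of the sampled data into a single histogram per tensor; finally, the KL threshold is computed from that histogram.
- This avoids caching a large amount of sampled data.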
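
The two-pass scheme described above can be sketched with NumPy as follows. This is a minimal illustration, not the code in this PR: the function names `collect_abs_range` and `accumulate_histogram`, the `bins` default, and the use of plain arrays in place of tensors fetched from a forward pass are all assumptions made for the example. The KL threshold search itself is left out.

```python
import numpy as np


def collect_abs_range(batches):
    """Pass 1 (hypothetical helper): track min and max of |x| across batches.

    Only two scalars are kept, so no batch has to be cached.
    """
    abs_min, abs_max = np.inf, 0.0
    for batch in batches:
        abs_vals = np.abs(batch)
        abs_min = min(abs_min, float(abs_vals.min()))
        abs_max = max(abs_max, float(abs_vals.max()))
    return abs_min, abs_max


def accumulate_histogram(batches, abs_min, abs_max, bins=2048):
    """Pass 2 (hypothetical helper): add each batch's counts to one histogram.

    The bin range is fixed from pass 1, so per-batch counts line up
    bin for bin and can simply be summed.
    """
    hist = np.zeros(bins, dtype=np.int64)
    for batch in batches:
        counts, _ = np.histogram(
            np.abs(batch), bins=bins, range=(abs_min, abs_max)
        )
        hist += counts
    return hist
```

Here `batches` must be iterable twice (for example a list, or a data loader that is re-created for the second pass), which mirrors running the forward computation twice. Memory use is bounded by the number of bins per tensor rather than by the number of sampled batches.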