The GPTQ algorithm comes from [GPTQ](https://arxiv.org/abs/2210.17323). It quantizes the weights progressively, row by row, and uses the Hessian matrix to keep updating the not-yet-quantized weights; it performs well for low-bit weight-only INT4 quantization. By default, GPTQ is used together with [RPTQ](https://arxiv.org/abs/2304.01089); if you do not want to combine it with RPTQ, set act_order=False when calling fasterquant.
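A minimal sketch of that workflow, assuming a `model` and `dataloader` are already defined (the layer-wrapping pattern mirrors the docstring example further below; `fasterquant` and `act_order` come from the description above, so treat this as illustrative rather than a definitive API reference):

```python
import paddle
from paddleslim.quant.advanced import GPTQ

# Wrap each Linear layer with GPTQ, feed calibration data through the
# model, then quantize. Setting act_order=False skips the RPTQ-style
# reordering described above. `model` and `dataloader` are assumed to
# be provided by the caller.
for cur_name, cur_layer in model.named_sublayers():
    if type(cur_layer) == paddle.nn.Linear:
        gptq_layer = GPTQ(cur_layer)
        # run calibration samples so GPTQ can collect statistics
        for data in dataloader():
            model(data)
        # quantize this layer's weight
        gptq_layer.fasterquant(act_order=False)
```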
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import numpy as np

from .utils import compute_scales
from .metrics import mse_loss

__all__ = ['LayerWiseQuantError']
class LayerWiseQuantError(nn.Layer):
    def __init__(self,
                 layer,
                 weight_bits=8,
                 act_bits=8,
                 weight_quant_method='abs_max_channel_wise',
                 act_quant_method='abs_max',
                 loss_function=mse_loss):
        '''
        LayerWiseQuantError computes the loss between the output of the layer and the output of the quantized layer.

        Args:
            layer (paddle.nn.Layer): Layer object.
            weight_bits (int, optional): Number of bits to quantize the weight. Default: 8.
            act_bits (int, optional): Number of bits to quantize the activation. Default: 8.
            weight_quant_method (str, optional): The method of weight quantization. Chosen from abs_max, abs_max_channel_wise and avg. Default: abs_max_channel_wise.
            act_quant_method (str, optional): The method of activation quantization. Chosen from abs_max and avg. Default: abs_max.
            loss_function (callable, optional): Function used to measure the error between the original and quantized outputs. Default: mse_loss.
        Examples:
        .. code-block:: python

            from paddleslim.quant.advanced import LayerWiseQuantError

            for cur_name, cur_layer in model.named_sublayers():
                if type(cur_layer) == paddle.nn.Linear:
                    gptq_layer = LayerWiseQuantError(cur_layer)

            for data in dataloader():
                model(data)

            # print the recorded quantization error of each wrapped layer
            for cur_name, cur_layer in model.named_sublayers():
                if type(cur_layer) == LayerWiseQuantError:
                    print(cur_name, cur_layer.losses.mean())
        '''