Regarding quantization of the matmul and softmax operators in PyTorch #2247
Unanswered · xiexiaozheng asked this question in Q&A
Replies: 1 comment · 1 reply
Hi @xiexiaozheng,
More information about which operations support execution in INT8 precision, and about how a quantized model is transformed from its original precision to low precision, is available here: https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_lpt.html
1 reply
@alexsu52 Hi, I attempted to quantize a toy model containing matmul and softmax operators with QAT. However, after exporting the model using torch.onnx.export, I noticed that no fake-quantization nodes were inserted after the matmul operator, and none after the softmax either. Why is that?
My code is like this:
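(The original snippet was not preserved in this thread. Below is a minimal sketch of what such a toy model and QAT export could look like, assuming NNCF's `create_compressed_model` API; the model architecture, input shape, config values, and file name are illustrative assumptions, not the poster's actual code.)

```python
import torch
import torch.nn as nn
from nncf import NNCFConfig
from nncf.torch import create_compressed_model


class ToyModel(nn.Module):
    """Hypothetical toy model with a matmul -> softmax -> matmul pattern."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 16)

    def forward(self, x):
        attn = torch.matmul(x, x.transpose(-1, -2))  # matmul
        attn = torch.softmax(attn, dim=-1)           # softmax
        return self.fc(torch.matmul(attn, x))        # matmul


# Illustrative QAT configuration; shapes/values are assumptions.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 4, 16]},
    "compression": {"algorithm": "quantization"},
})

# Wrap the model so fake-quantization operations are inserted for QAT.
compression_ctrl, quantized_model = create_compressed_model(ToyModel(), nncf_config)

# ... a QAT fine-tuning loop would run here ...

# Export to ONNX the same way the question describes.
torch.onnx.export(quantized_model, torch.randn(1, 4, 16), "toy_model.onnx")
```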
The exported model in ONNX format looks like this: [screenshot of the ONNX graph, not preserved]