[CPU] Introduce FullyConnected, FCQuantized, FCCompressed, Placeholder #26239
There is one 'issue' left
In serialization, the dynamic type is compatible with any other type. Deserialization of the type should work, since it gets the ov type by name.
The `ov::dynamic` type is quite unique. It does not really play well with the backends we use.
Fix Windows warning
Introduce FullyConnectedQuantizeLegacy
Performance regression checks passed. Ready for merge.
openvinotoolkit#26239)

### Details:
1. Introduce the following operations to the internal opset:
   * `FullyConnected` (`MatMul` with a transposed constant second input)
   * `FullyConnectedCompressed` (`FullyConnected` with weights compression)
   * `FullyConnectedQuantizedLegacy` (`FullyConnected` with quantized activations and weights, where the dequantization scale and zero point are pulled through the op by LPT)
   * `FullyConnectedQuantized` (`FullyConnected` with quantization scales and zero points on activations, weights and outputs). Planned to be used in the scope of dynamic quantization. Can be used for static quantization as well in the future.
   * Unused inputs are represented as a `Constant` input with `Shape{0}`
2. The following transformations were added / updated:
   * `ConvertFullyConnectedToFullyConnectedCompressed` (replaces proprietary ~`FuseFCAndWeightsDecompression`~)
   * `ConvertFCToFCQuantizedLegacy` (replaces proprietary ~`FuseConvMatmulFCDeconvAndDQScales`~)
   * `FullyConnectedBiasFusion` (added to the CPU folder for now; needs to be checked and reviewed by the GPU team before adaptation to the internal opset). Replaces proprietary ~`FuseConvolutionMatMulDeconvAndBias`~
   * `ConvertMatMulToFC` updated to use `ov::op::internal::FullyConnected`; planned to be moved to the internal opset after review by the GPU team

### Todo
- [x] Clean up debug code
- [x] Clean up extra cmake targets
- [x] Perf regression check

### Tickets:
- 149923
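To make the semantics of the operations listed above concrete, here is a minimal reference sketch in plain C++. It is not the OpenVINO implementation and does not use the OpenVINO API. The helper names `fully_connected_ref` and `decompress_weights` are hypothetical. The sketch assumes a row-major `[out, in]` weight layout, a per-output-channel decompression formula `(q - zero_point) * scale`, and that an empty vector stands in for an unused input, loosely mirroring the `Constant` with `Shape{0}` convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper, not the OpenVINO kernel.
// x:    activations, row-major [batch, in]
// w:    weights, row-major [out, in] (already transposed, as in FullyConnected)
// bias: [out], or empty when the input is unused (assumption for illustration)
std::vector<float> fully_connected_ref(const std::vector<float>& x,
                                       const std::vector<float>& w,
                                       const std::vector<float>& bias,
                                       std::size_t batch,
                                       std::size_t in,
                                       std::size_t out) {
    std::vector<float> y(batch * out, 0.0f);
    for (std::size_t b = 0; b < batch; ++b) {
        for (std::size_t o = 0; o < out; ++o) {
            // Start from the bias if present, otherwise from zero.
            float acc = bias.empty() ? 0.0f : bias[o];
            // Dot product of activation row b with weight row o (Y = X * W^T).
            for (std::size_t i = 0; i < in; ++i) {
                acc += x[b * in + i] * w[o * in + i];
            }
            y[b * out + o] = acc;
        }
    }
    return y;
}

// Hypothetical weight decompression, assumed to be (q - zero_point) * scale
// with one scale and one optional zero point per output channel.
std::vector<float> decompress_weights(const std::vector<std::uint8_t>& q,
                                      const std::vector<float>& scale,
                                      const std::vector<float>& zero_point,
                                      std::size_t out,
                                      std::size_t in) {
    std::vector<float> w(out * in);
    for (std::size_t o = 0; o < out; ++o) {
        // An empty zero_point vector means the input is unused.
        const float zp = zero_point.empty() ? 0.0f : zero_point[o];
        for (std::size_t i = 0; i < in; ++i) {
            w[o * in + i] = (static_cast<float>(q[o * in + i]) - zp) * scale[o];
        }
    }
    return w;
}
```

Under these assumptions, a `FullyConnectedCompressed` node behaves like `fully_connected_ref` applied to the output of `decompress_weights`; a real plugin would typically fuse the decompression into the kernel rather than materialize the float weights.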