[CPU] Introduce FullyConnected, FCQuantized, FCCompressed, Placeholder #26239

EgorDuplensky · 2024-08-26T17:06:44Z

Details:

Introduce the following operations to the internal opset
- FullyConnected (MatMul with transposed constant second input)
- FullyConnectedCompressed (FullyConnected with weights compression)
- FullyConnectedQuantizedLegacy (FullyConnected with quantized activations and weights and dequantize scale and zero point pulled through the Op by LPT)
- FullyConnectedQuantized (FullyConnected with quantization scales and zero points on activation, weights and outputs). Planned to be used in scope of dynamic quantization. Can be used for a static quantization as well in the future.
- Unused inputs are presented as Constant input with Shape{0}
The following transformations were added / updated:
- ConvertFullyConnectedToFullyConnectedCompressed (replaces proprietary ~~FuseFCAndWeightsDecompression~~)
- ConvertFCToFCQuantizedLegacy replaces proprietary ~~FuseConvMatmulFCDeconvAndDQScales~~
- FullyConnectedBiasFusion (added into CPU folder for now, needs to be checked and review by GPU team before adaptation to internal opset). Replaces proprietary ~~FuseConvolutionMatMulDeconvAndBias~~
- ConvertMatMulToFC updated to use ov::op::internal:FullyConnected, planned to be moved to internal opset after review from GPU team

Todo

Clean up debug code
Clean up extra cmake targets
Perf regression check

Tickets:

149923

src/plugins/intel_cpu/src/dnnl_extension_utils.cpp

EgorDuplensky · 2024-11-26T13:22:35Z

There is an option to create an empty uninitialized Constant of undefined type -- just like that: make_shared<v0::Constant>(element::undefined, Shape{0}); I think we could reconsider the introduction of a Placeholder or simply inherit it from Constant if you need a separate entity.

This was my initial idea, since I also didn't like the PlaceHolder idea. And it didn't work, don't remember why. Let me try one more time :)

Placeholder replaced with an undefined Constant with an empty shape

EgorDuplensky · 2024-11-26T13:42:03Z

There is an option to create an empty uninitialized Constant of undefined type -- just like that: make_shared<v0::Constant>(element::undefined, Shape{0}); I think we could reconsider the introduction of a Placeholder or simply inherit it from Constant if you need a separate entity.

This was my initial idea, since I also didn't like the PlaceHolder idea. And it didn't work, don't remember why. Let me try one more time :)

Placeholder replaced with an undefined Constant with an empty shape

There is one 'issue' left
With this change we are making 'undefined' element type a valid element type.
It has an impact on serialization / deserialization. Such element type was not expected there before.
Also netron does not support it at the moment which is probably is not a big deal and can be fixed.
Also, now we basically do not have some 'invalid' element type inside enum.
Not sure if this is a real issue or not.
@jane-intel @praasz @itikhono What do you think?

praasz · 2024-11-28T11:30:41Z

There is an option to create an empty uninitialized Constant of undefined type -- just like that: make_shared<v0::Constant>(element::undefined, Shape{0}); I think we could reconsider the introduction of a Placeholder or simply inherit it from Constant if you need a separate entity.

This was my initial idea, since I also didn't like the PlaceHolder idea. And it didn't work, don't remember why. Let me try one more time :)

Placeholder replaced with an undefined Constant with an empty shape

There is one 'issue' left With this change we are making 'undefined' element type a valid element type. It has an impact on serialization / deserialization. Such element type was not expected there before. Also netron does not support it at the moment which is probably is not a big deal and can be fixed. Also, now we basically do not have some 'invalid' element type inside enum. Not sure if this is a real issue or not. @jane-intel @praasz @itikhono What do you think?

In serialization the undefined and dynamic types are converted to same string.
Maybe for this empty constant the dynamic can be used and only for empty const can be serialized as dynamic string (maybe can be applied as generic rule later).

The dynamic type is compatible with any other. The deserialization for type should work as it get ov type by name.

EgorDuplensky · 2024-11-29T18:10:36Z

There is an option to create an empty uninitialized Constant of undefined type -- just like that: make_shared<v0::Constant>(element::undefined, Shape{0}); I think we could reconsider the introduction of a Placeholder or simply inherit it from Constant if you need a separate entity.

This was my initial idea, since I also didn't like the PlaceHolder idea. And it didn't work, don't remember why. Let me try one more time :)

Placeholder replaced with an undefined Constant with an empty shape

There is one 'issue' left With this change we are making 'undefined' element type a valid element type. It has an impact on serialization / deserialization. Such element type was not expected there before. Also netron does not support it at the moment which is probably is not a big deal and can be fixed. Also, now we basically do not have some 'invalid' element type inside enum. Not sure if this is a real issue or not. @jane-intel @praasz @itikhono What do you think?

In serialization the undefined and dynamic types are converted to same string. Maybe for this empty constant the dynamic can be used and only for empty const can be serialized as dynamic string (maybe can be applied as generic rule later).

The dynamic type is compatible with any other. The deserialization for type should work as it get ov type by name.

ov::dynamic type is quite unique. It does not really play well with the backends we use.
Let's try to proceed with an undefined type for now. We can replace it with dynamic one later if necessary

Fix Windows warning Introduce FullyConnectedQuantizeLegacy

EgorDuplensky · 2024-12-09T14:17:58Z

Performance regression checks passed. Ready for merge

openvinotoolkit#26239) ### Details: 1. Introduce the following operations to the internal opset * `FullyConnected` (`MatMul` with transposed constant second input) * `FullyConnectedCompressed` (`FullyConnected` with weights compression) * `FullyConnectedQuantizedLegacy` (`FullyConnected` with quantized activations and weights and dequantize scale and zero point pulled through the Op by LPT) * `FullyConnectedQuantized` (`FullyConnected` with quantization scales and zero points on activation, weights and outputs). Planned to be used in scope of dynamic quantization. Can be used for a static quantization as well in the future. * Unused inputs are presented as `Constant` input with `Shape{0}` 2. The following transformations were added / updated: * `ConvertFullyConnectedToFullyConnectedCompressed` (replaces proprietary ~`FuseFCAndWeightsDecompression`~) * `ConvertFCToFCQuantizedLegacy` replaces proprietary ~`FuseConvMatmulFCDeconvAndDQScales`~ * `FullyConnectedBiasFusion` (added into CPU folder for now, needs to be checked and review by GPU team before adaptation to internal opset). Replaces proprietary ~`FuseConvolutionMatMulDeconvAndBias`~ * `ConvertMatMulToFC` updated to use `ov::op::internal:FullyConnected`, planned to be moved to internal opset after review from GPU team ### Todo - [x] Clean up debug code - [x] Clean up extra cmake targets - [x] Perf regression check ### Tickets: - 149923

EgorDuplensky requested review from a team as code owners August 26, 2024 17:06

EgorDuplensky requested review from itikhono and removed request for a team August 26, 2024 17:06

github-actions bot added category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations labels Aug 26, 2024

EgorDuplensky force-pushed the quantized_fullyconnected_op branch 2 times, most recently from d318f8c to d444d87 Compare August 27, 2024 09:38

EgorDuplensky commented Aug 27, 2024

View reviewed changes

src/plugins/intel_cpu/src/dnnl_extension_utils.cpp Outdated Show resolved Hide resolved

EgorDuplensky force-pushed the quantized_fullyconnected_op branch 4 times, most recently from 707a7cf to 983cfbe Compare August 28, 2024 13:07

EgorDuplensky requested a review from a team as a code owner August 28, 2024 13:07

github-actions bot added the category: build OpenVINO cmake script / infra label Aug 28, 2024

EgorDuplensky force-pushed the quantized_fullyconnected_op branch 4 times, most recently from ca9e51d to 3fa55f6 Compare September 2, 2024 12:21

EgorDuplensky force-pushed the quantized_fullyconnected_op branch from 3fa55f6 to f405ccc Compare September 5, 2024 16:31

github-actions bot added the category: Core OpenVINO Core (aka ngraph) label Sep 5, 2024

EgorDuplensky force-pushed the quantized_fullyconnected_op branch 2 times, most recently from 7a4891b to 24e0a1b Compare September 5, 2024 17:20

github-actions bot removed the category: Core OpenVINO Core (aka ngraph) label Sep 5, 2024

EgorDuplensky force-pushed the quantized_fullyconnected_op branch 2 times, most recently from cb22763 to b5a7adf Compare September 8, 2024 16:08

EgorDuplensky requested a review from a team as a code owner September 8, 2024 16:08

github-actions bot added the category: IR FE OpenVINO IR v10 / v11 FrontEnd label Sep 8, 2024

EgorDuplensky force-pushed the quantized_fullyconnected_op branch from b5a7adf to 9029bc1 Compare September 8, 2024 16:37

EgorDuplensky force-pushed the quantized_fullyconnected_op branch from eaba4ff to b1e368b Compare November 26, 2024 13:21

EgorDuplensky force-pushed the quantized_fullyconnected_op branch from b1e368b to e80440f Compare November 26, 2024 15:38

praasz approved these changes Nov 27, 2024

View reviewed changes

dmitry-gorokhov approved these changes Nov 29, 2024

View reviewed changes

dmitry-gorokhov mentioned this pull request Nov 29, 2024

[CPU] [ARM] [INT8] FullyConnected #25171

Merged

EgorDuplensky force-pushed the quantized_fullyconnected_op branch from e80440f to cef7bc6 Compare November 29, 2024 17:52

github-actions bot removed the category: build OpenVINO cmake script / infra label Nov 29, 2024

EgorDuplensky added the do_not_merge label Nov 29, 2024

EgorDuplensky removed the do_not_merge label Nov 29, 2024

EgorDuplensky added 7 commits December 5, 2024 15:12

[CPU] Introduce FullyConnectedQuantized op and bias fusing

2a89537

Fix Windows warning Introduce FullyConnectedQuantizeLegacy

Apply review comments

c901442

Fix unit tests

77a475c

Revert extra interface changes

187093f

Ensure original hw limitations are applied for compression

8ebac7b

Apply review comments 2

0e6c797

Clean up debug code

61ed454

EgorDuplensky force-pushed the quantized_fullyconnected_op branch from cef7bc6 to 61ed454 Compare December 5, 2024 14:12

dmitry-gorokhov added this pull request to the merge queue Dec 10, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2024

dmitry-gorokhov added this pull request to the merge queue Dec 10, 2024

Merged via the queue into openvinotoolkit:master with commit 6a4ba46 Dec 10, 2024
173 checks passed

dmitry-gorokhov deleted the quantized_fullyconnected_op branch December 10, 2024 10:04

alvoron added a commit to eshoguli/openvino that referenced this pull request Dec 12, 2024

changes required after openvinotoolkit#26239

7a07337

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CPU] Introduce FullyConnected, FCQuantized, FCCompressed, Placeholder #26239

[CPU] Introduce FullyConnected, FCQuantized, FCCompressed, Placeholder #26239

EgorDuplensky commented Aug 26, 2024 •

edited

Loading

EgorDuplensky commented Nov 26, 2024 •

edited

Loading

EgorDuplensky commented Nov 26, 2024

praasz commented Nov 28, 2024

EgorDuplensky commented Nov 29, 2024

EgorDuplensky commented Dec 9, 2024

[CPU] Introduce FullyConnected, FCQuantized, FCCompressed, Placeholder #26239

[CPU] Introduce FullyConnected, FCQuantized, FCCompressed, Placeholder #26239

Conversation

EgorDuplensky commented Aug 26, 2024 • edited Loading

Details:

Todo

Tickets:

EgorDuplensky commented Nov 26, 2024 • edited Loading

EgorDuplensky commented Nov 26, 2024

praasz commented Nov 28, 2024

EgorDuplensky commented Nov 29, 2024

EgorDuplensky commented Dec 9, 2024

EgorDuplensky commented Aug 26, 2024 •

edited

Loading

EgorDuplensky commented Nov 26, 2024 •

edited

Loading