Concurrency support using model clone #564

dtrawins · 2024-02-16T15:42:45Z

Support for execution with multiple concurrent inference requests

Adds a clone operation to model objects. A clone gets a new execution context without duplicating the compiled_model or its memory usage, which enables concurrent inference in multithreaded applications.
The request attribute is now deprecated. The new attributes compiled_model and infer_request replace it; they match the corresponding OpenVINO objects (CompiledModel and InferRequest).
Improved performance for decoder models by no longer creating a new inference request for each inference.
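A minimal, framework-free sketch of the clone pattern described above. It assumes hypothetical stand-ins (FakeCompiledModel, FakeInferRequest, SketchModel) in place of the real optimum-intel and OpenVINO classes; with real OpenVINO, CompiledModel.create_infer_request() plays the role of create_infer_request here. Each clone shares one compiled model and owns its own infer request, which it reuses across calls.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeInferRequest:
    """Hypothetical stand-in for an OpenVINO InferRequest: holds per-call state."""

    def __init__(self):
        self.last_input = None  # mutable state, so it must not be shared across threads

    def infer(self, value):
        self.last_input = value
        return value * 2  # placeholder for real inference


class FakeCompiledModel:
    """Hypothetical stand-in for an OpenVINO CompiledModel (the heavy, shareable part)."""

    def create_infer_request(self):
        return FakeInferRequest()


class SketchModel:
    """Hypothetical wrapper: one shared compiled model, one infer request per instance."""

    def __init__(self, compiled_model):
        self.compiled_model = compiled_model
        # Created once and reused for every call, instead of once per inference.
        self.infer_request = compiled_model.create_infer_request()

    def clone(self):
        # Same compiled model (no recompilation, no second copy of the weights),
        # but a fresh execution context for the new instance.
        return SketchModel(self.compiled_model)

    def __call__(self, value):
        return self.infer_request.infer(value)


base = SketchModel(FakeCompiledModel())
models = [base.clone() for _ in range(4)]

# Each worker thread runs its own clone, so no infer request is used by two threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda pair: pair[0](pair[1]), zip(models, range(4))))

print(results)  # [0, 2, 4, 6]
```

The design point is that the compiled model is treated as read-only and shared, while all mutable state lives in the per-clone request, which is what keeps a clone cheap.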

Before submitting

Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Multithreading tests


dtrawins · 2024-02-16T15:48:10Z

This PR is a replacement for #519

HuggingFaceDocBuilderDev · 2024-02-16T16:02:30Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


helena-intel · 2024-02-21T09:40:51Z

optimum/intel/openvino/modeling_diffusion.py

-            self.request = core.compile_model(self.model, self.device, self.ov_config)
+            logger.info(f"Compiling the {self._model_name} to {self.device} with config {self.ov_config} ... ")
+            self.compiled_model = core.compile_model(self.model, self.device, self.ov_config)
+            self.request = self.compiled_model  # Deprecated attribute, use compiled_model instead


Can there be a DeprecationWarning when people use self.request, so they know it's deprecated?
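One possible way to do this, sketched under the assumption that request becomes a read-only property (so the plain self.request = self.compiled_model assignment in the diff would be dropped). SketchOVModelPart is a hypothetical class for illustration, not the optimum-intel implementation.

```python
import warnings


class SketchOVModelPart:
    """Hypothetical class showing a deprecated alias for compiled_model."""

    def __init__(self, compiled_model):
        self.compiled_model = compiled_model

    @property
    def request(self):
        # Warn on every read of the old attribute name, then return the new one.
        warnings.warn(
            "`request` is deprecated; use `compiled_model` instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line, not this property
        )
        return self.compiled_model


part = SketchOVModelPart(compiled_model="compiled-model-placeholder")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # ensure the warning is recorded in this demo
    value = part.request

print(value, caught[0].category.__name__)  # compiled-model-placeholder DeprecationWarning
```

Note that Python hides DeprecationWarning by default unless it is triggered from code in __main__; FutureWarning is shown to all users if the goal is maximum visibility.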

helena-intel · 2024-02-21T09:41:47Z

examples/openvino/multithreading/requirements.txt

@@ -0,0 +1,2 @@
+optimum-intel[openvino, nncf]@git+https://github.com/huggingface/optimum-intel.git
+transformers


Suggested change (remove this line):

-transformers

transformers is already a dependency of optimum-intel, so it does not need to be listed here.

helena-intel · 2024-02-21T09:45:13Z

optimum/intel/openvino/modeling_base.py

        return self

    def forward(self, *args, **kwargs):
        raise NotImplementedError

+    def clone(self):
+        self.compile()
+        model_cloned = self.__class__(self.model, config=self.config, compile=False, dynamic_shapes=False)


Why dynamic_shapes=False?

dtrawins and others added 19 commits January 17, 2024 19:09

support for concurrency in llm models

9b55100

style fixes

9e4ab17

concurrency in seq2seq and stable diffusion classes

dcb2a8f

merge from main

03b797f

merge from upstream

eb5db08

concurrency via model cloning in encoders and decoders

e189e40

merge from upstream

3395049

fix clone performance

f75021b

init

e9bb941

fix next_beam_idx initialization

5f85cbb

init version

912cf3a

more tests

03b548d

Merge pull request #1 from dtrawins/multithreading_tests

d034d97

Multithreading tests

running concurrent execution of stable diffusion pipe with cloning

e5d9c75

Merge remote-tracking branch 'dtrawins/stable-diff-test' into concurrency_support_cloneall

ae50484

merge from main with fixes

ed3e4a3

add concurrency examples

34e7e28

preserve request attribute as deprecated

f4d21d8

merge from main

22b529e

drop not needed tests

c971473

dtrawins added 3 commits February 16, 2024 17:06

style fix

cbdb304

Merge remote-tracking branch 'origin/main' into concurrency_support_cloneall

bd84ca9

fix tests without gpu

30437ae

helena-intel reviewed Feb 21, 2024


dtrawins mentioned this pull request Mar 21, 2024

concurrency without model cloning #573 (open)
