Extremely slow generation #55

Open
farvend opened this issue Jul 24, 2024 · 3 comments

farvend commented Jul 24, 2024

Hi, my GPU is a GTX 1660 (6 GB), and while using ELLA my speed drops from 1.5 it/s to 5 s/it. It seems like the CUDA cores are barely being used and my CPU does most of the calculations instead.

[Screenshots: low CUDA usage; CPU usage]

JettHu (Collaborator) commented Aug 16, 2024

Can I take a look at your workflow?

jcatsuki commented

[Screenshots of the workflow]
I've got a similar problem too, but for me it wasn't the KSampler that was taking too much time; it was the ELLA Text Encode.
I'm using an entry-level gaming laptop with these specs:

  • Ryzen 5 3550H
  • GTX 1650 (4 GB)
  • 24 GB RAM

Same as OP, it uses the CPU instead when encoding. I don't know if that's how it's supposed to work, though, as I have a very limited programming background.
Thanks

Chanakan5591 commented Nov 23, 2024

@jcatsuki I'm not sure if you are still interested in this, but ELLA does indeed use the CPU instead of the GPU when encoding text, unless one of these conditions applies (see the sketch after this list):

  1. ComfyUI states that you have NORMAL_VRAM or HIGH_VRAM (which I will assume is the case, since it will do that with shared memory), and you have a GPU that works with FP16 (according to ComfyUI's code, the 16xx series are not among them).
  2. You forcibly tell ComfyUI to use only the GPU via the --gpu-only flag, but that might slow down the diffusion process by a lot if you don't have enough VRAM.
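
To make that decision concrete, here is a rough sketch of the device-selection logic. This is not ComfyUI's actual code; the names loosely mirror comfy/model_management.py, and the logic is simplified to just the two conditions above.

import torch

def fp16_ok_sketch(device) -> bool:
    # ComfyUI disables FP16 on cards it considers broken for it;
    # the GTX 16xx series is on that list.
    name = torch.cuda.get_device_name(device).lower()
    return not any(bad in name for bad in ("1650", "1660"))

def text_encoder_device_sketch(vram_state: str, device):
    # The text encoder lands on the GPU only when the VRAM state is
    # NORMAL/HIGH *and* FP16 is usable; otherwise it falls back to the CPU.
    if vram_state in ("NORMAL_VRAM", "HIGH_VRAM") and fp16_ok_sketch(device):
        return device
    return torch.device("cpu")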

An alternative that works for me, but requires a little bit of hacky code editing, is to edit model.py in the ComfyUI-ELLA directory like so:

At roughly line 118, change model_management.text_encoder_device() to model_management.get_torch_device(). That function exists in ComfyUI and will try to select any available acceleration device.

class T5TextEmbedder:
    def __init__(self, pretrained_path="google/flan-t5-xl", max_length=None, dtype=None, legacy=True):
-        self.load_device = model_management.text_encoder_device()
+        self.load_device = model_management.get_torch_device()

and at roughly line 312:

class ELLA:
    def __init__(self, path: str, **kwargs) -> None:
-        self.load_device = model_management.text_encoder_device()
+        self.load_device = model_management.get_torch_device()

This might not be the most elegant solution, but it sure works well for me, reducing the encoding time from 6 minutes down to just a couple of seconds.

IMO, there should be an option in the ELLA node to either use the GPU when available (separate from ComfyUI's decision) or force GPU or CPU. I will make a pull request if I make the change.
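
As a sketch of what that option could look like, here is a hypothetical helper; the pick_load_device name and the "auto"/"gpu"/"cpu" choices are made up for illustration, and only the two model_management calls are the real ComfyUI functions discussed above.

import torch
from comfy import model_management

def pick_load_device(choice: str = "auto"):
    # Hypothetical node option: override ComfyUI's automatic device choice.
    if choice == "gpu":
        # Use whatever acceleration device ComfyUI can find.
        return model_management.get_torch_device()
    if choice == "cpu":
        return torch.device("cpu")
    # "auto": keep ComfyUI's default text-encoder placement.
    return model_management.text_encoder_device()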
