Extremely slow generation #55

Open
farvend opened this issue Jul 24, 2024 · 3 comments

farvend commented Jul 24, 2024

Hi, my GPU is a GTX 1660 (6 GB), and while using ELLA my speed drops from 1.5 it/s to 5 s/it. It seems like the CUDA cores are barely being used and my CPU does most of the calculations instead.

[Screenshots: low CUDA usage; CPU usage]

JettHu (Collaborator) commented Aug 16, 2024

Can I take a look at your workflow?

jcatsuki commented

[Screenshots of the workflow]
I've got a similar problem too, but for me it wasn't the KSampler that was taking too much time; it was the ELLA Text Encode.
I'm using an entry-level gaming laptop with these specs:

  • Ryzen 5 3550H
  • GTX 1650 (4 GB)
  • 24 GB RAM

Same as OP, it uses the CPU instead when encoding. I don't know if that's how it's supposed to work, though, as I have a very limited programming background.
Thanks

Chanakan5591 commented Nov 23, 2024

@jcatsuki I'm not sure if you are still interested in this, but ELLA does indeed use the CPU instead of the GPU when encoding text, unless one of these conditions applies (see the sketch after this list):

  1. ComfyUI states that you have NORMAL_VRAM or HIGH_VRAM (which I will assume is the case, since it will do that with shared memory), and you have a GPU that works with FP16 (according to ComfyUI's code, the 16xx series are not among them).
  2. You forcibly tell ComfyUI to use only the GPU via the --gpu-only flag, but that might slow down the diffusion process by a lot if you don't have enough VRAM.
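
To make that decision concrete, here is a rough sketch of the device-selection logic. This is not ComfyUI's actual code; the names loosely mirror comfy/model_management.py, and the logic is simplified to just the two conditions above.

import torch

def fp16_ok_sketch(device) -> bool:
    # ComfyUI disables FP16 on cards it considers broken for it;
    # the GTX 16xx series is on that list.
    name = torch.cuda.get_device_name(device).lower()
    return not any(bad in name for bad in ("1650", "1660"))

def text_encoder_device_sketch(vram_state: str, device):
    # The text encoder lands on the GPU only when the VRAM state is
    # NORMAL/HIGH *and* FP16 is usable; otherwise it falls back to the CPU.
    if vram_state in ("NORMAL_VRAM", "HIGH_VRAM") and fp16_ok_sketch(device):
        return device
    return torch.device("cpu")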

An alternative that works for me, but requires a little bit of hacky code editing, is to edit model.py in the ComfyUI-ELLA directory like so:

At roughly line 118, change model_management.text_encoder_device() to model_management.get_torch_device(). That function exists in ComfyUI and will try to select any available acceleration device.

class T5TextEmbedder:
    def __init__(self, pretrained_path="google/flan-t5-xl", max_length=None, dtype=None, legacy=True):
-        self.load_device = model_management.text_encoder_device()
+        self.load_device = model_management.get_torch_device()

and at roughly line 312:

class ELLA:
    def __init__(self, path: str, **kwargs) -> None:
-        self.load_device = model_management.text_encoder_device()
+        self.load_device = model_management.get_torch_device()

This might not be the most elegant solution, but it sure works well for me, reducing the encoding time from 6 minutes down to just a couple of seconds.

IMO, there should be an option in the ELLA node to either use the GPU when available (separate from ComfyUI's decision) or force GPU or CPU. I will make a pull request if I make the change.
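
As a sketch of what that option could look like, here is a hypothetical helper; the pick_load_device name and the "auto"/"gpu"/"cpu" choices are made up for illustration, and only the two model_management calls are the real ComfyUI functions discussed above.

import torch
from comfy import model_management

def pick_load_device(choice: str = "auto"):
    # Hypothetical node option: override ComfyUI's automatic device choice.
    if choice == "gpu":
        # Use whatever acceleration device ComfyUI can find.
        return model_management.get_torch_device()
    if choice == "cpu":
        return torch.device("cpu")
    # "auto": keep ComfyUI's default text-encoder placement.
    return model_management.text_encoder_device()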
