[core] ConsisID #10140

Open
wants to merge 68 commits into
base: main

SHYuanBest
Contributor

@SHYuanBest SHYuanBest commented Dec 6, 2024

What does this PR do?

Add support for ConsisID (#10100)

Paper: https://arxiv.org/abs/2411.17440
Project: https://pku-yuangroup.github.io/ConsisID
Code: https://github.com/PKU-YuanGroup/ConsisID
Demo: https://huggingface.co/spaces/BestWishYsh/ConsisID-preview-Space

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@SHYuanBest
Contributor Author

@a-r-r-o-w Do we need to create a ConsisID branch under huggingface, or can I just use SHYuanBest:main?

@a-r-r-o-w
Member

SHYuanBest:main works. This is just a branch from your diffusers fork to HF diffusers library, so you are free to make any changes you'd like here. Looking forward to the ConsisID changes!

@SHYuanBest
Contributor Author

SHYuanBest commented Dec 10, 2024

@a-r-r-o-w @HuggingFaceDocBuilderDev Hi, I have added ConsisID to this branch. Could you help review the code? Is there anything I missed?

import torch
from diffusers import ConsisIDPipeline
from diffusers.pipelines.consisid.consisid_utils import prepare_face_models, process_face_embeddings_infer
from diffusers.utils import export_to_video
from huggingface_hub import snapshot_download

snapshot_download(repo_id="BestWishYsh/ConsisID-preview", local_dir="BestWishYsh/ConsisID-preview")

face_helper_1, face_helper_2, face_clip_model, face_main_model, eva_transform_mean, eva_transform_std = prepare_face_models("BestWishYsh/ConsisID-preview", device="cuda", dtype=torch.bfloat16)

pipe = ConsisIDPipeline.from_pretrained("BestWishYsh/ConsisID-preview", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy's path, adding depth to the scene. The lighting highlights the boy's subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel."
image = "https://github.com/PKU-YuanGroup/ConsisID/blob/main/asserts/example_images/2.png?raw=true"

id_cond, id_vit_hidden, image, face_kps = process_face_embeddings_infer(face_helper_1, face_clip_model, face_helper_2, eva_transform_mean, eva_transform_std, face_main_model, "cuda", torch.bfloat16, image, is_align_face=True)

video = pipe(image=image, prompt=prompt, use_dynamic_cfg=False, id_vit_hidden=id_vit_hidden, id_cond=id_cond, kps_cond=face_kps, generator=torch.Generator("cuda").manual_seed(42))
export_to_video(video.frames[0], "output.mp4", fps=8)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SHYuanBest SHYuanBest requested a review from hlky December 10, 2024 09:04
@SHYuanBest
Contributor Author

SHYuanBest commented Dec 11, 2024

@a-r-r-o-w @hlky hi, what should I do next?

@hlky
Collaborator

hlky commented Dec 18, 2024

Thanks @SHYuanBest. Please wait for further review from @yiyixuxu

@a-r-r-o-w
Member

Thanks for working on this @SHYuanBest! The PR looks mostly good to me. There are some things I would like to test and maybe change, and I'm looking into them now. I hope it's okay if I push to this branch directly.

docs/source/en/api/pipelines/consisid.md
video = pipe(image=image, prompt=prompt, num_inference_steps=50, guidance_scale=6.0, use_dynamic_cfg=False, id_vit_hidden=id_vit_hidden, id_cond=id_cond, kps_cond=face_kps, generator=torch.Generator("cuda").manual_seed(42))
export_to_video(video.frames[0], "output.mp4", fps=8)
```
<table>
Member

Any results being demonstrated should be linked from the huggingface documentation-images repository on HF Hub: https://huggingface.co/datasets/huggingface/documentation-images/tree/main/diffusers

If you could open a PR there, I can merge it and then the results can be linked here.

@SHYuanBest SHYuanBest Dec 20, 2024

@@ -0,0 +1,845 @@
# Copyright 2024 ConsisID Authors and The HuggingFace Team. All rights reserved.
Member

@yiyixuxu So this model is mostly similar to CogVideoX, except that it performs cross attention with the face embeddings. I see some uses of nn.Sequential, which we prefer not to have in Diffusers. Should we convert those modules and provide a conversion script?
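
To illustrate what such a conversion script might involve: replacing an nn.Sequential with explicitly named submodules changes the checkpoint keys, so the script mostly has to rename them. The sketch below is a minimal, assumption-laden version that works on a plain dict of keys. The module names in `SEQUENTIAL_TO_NAMED` and the function `convert_state_dict` are hypothetical and are not the actual ConsisID layer names or any Diffusers API.

```python
import re

# Hypothetical mapping from an nn.Sequential position to an explicit attribute
# name. These names are illustrative only, not the real ConsisID layers.
SEQUENTIAL_TO_NAMED = {
    "proj.0": "proj.norm",
    "proj.1": "proj.linear_in",
    "proj.3": "proj.linear_out",
}


def convert_state_dict(state_dict, rename_map=SEQUENTIAL_TO_NAMED):
    """Return a new dict whose keys use named submodules instead of Sequential indices."""
    converted = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in rename_map.items():
            # Only match at module boundaries so "proj.1" does not hit "proj.10".
            pattern = r"(^|\.)" + re.escape(old) + r"(\.|$)"
            new_key, count = re.subn(
                pattern,
                lambda m, new=new: m.group(1) + new + m.group(2),
                new_key,
                count=1,
            )
            if count:
                break  # at most one rename per key
        converted[new_key] = value
    return converted


if __name__ == "__main__":
    old = {"face.proj.0.weight": "w0", "face.proj.10.weight": "w10", "other.bias": "b"}
    print(convert_state_dict(old))
```

Keys that match no entry in the map pass through unchanged, so the same function can be run over a full checkpoint in which only a few modules were restructured.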

tests/pipelines/consisid/test_consisid.py (outdated)
@SHYuanBest
Contributor Author

SHYuanBest commented Dec 22, 2024

To do:

  • Make the test scripts very small and make sure all of them pass (model, pipeline, LoRA).
  • Check whether test_vae_tiling requires expected_max_diff==0.35.
  • Add a conversion script for the nn.Sequential modules.
  • Merge https://huggingface.co/datasets/huggingface/documentation-images/discussions/406 and update the doc links.
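
For context on the expected_max_diff item, a tolerance of this kind usually bounds the largest element-wise difference between two outputs, for example one generated with VAE tiling and one without. The sketch below assumes that interpretation and uses flat Python lists in place of real video tensors. The helpers `max_abs_diff` and `within_tolerance` are hypothetical names, not part of the Diffusers test suite.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two equally sized flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(x - y) for x, y in zip(reference, candidate)), default=0.0)


def within_tolerance(reference, candidate, expected_max_diff=0.35):
    """True if the two outputs differ by less than the tolerance everywhere.

    The 0.35 default mirrors the value mentioned in the to-do list; whether
    the real test needs that value is exactly what the item asks to check.
    """
    return max_abs_diff(reference, candidate) < expected_max_diff


if __name__ == "__main__":
    without_tiling = [0.0, 0.5, 1.0]  # stand-in for pixel values
    with_tiling = [0.1, 0.4, 0.8]
    print(within_tolerance(without_tiling, with_tiling))
```

A looser tolerance makes the test pass more easily but hides larger tiling artifacts, which is why it is worth confirming whether 0.35 is really needed.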

@a-r-r-o-w
Member

@SHYuanBest Great work on the changes! We will try to integrate this soon and target it for the next diffusers release (we have one this week, which is why we've been very busy). On your end, I think we are mostly good with the changes, and we just need to address some minor concerns for the diffusers-side integration. I will let YiYi comment and do her review first, and then we can tackle the remaining things.

@SHYuanBest
Contributor Author

@a-r-r-o-w @yiyixuxu That's great, thank you very much for your support! Looking forward to the merge.

Labels
roadmap Add to current release roadmap
Projects
Status: Future Release
6 participants