
release inpainting code #109

Open
CyberSculptor96 opened this issue Dec 14, 2024 · 3 comments

@CyberSculptor96

Hi, authors. Thanks for your great work! I wonder whether you are going to release the inpainting code for VAR soon; it would mean a lot to the research community. Looking forward to the code release to unlock more of VAR's potential!

@iFighting
Contributor

@CyberSculptor96
Thanks for your interest in our work, and apologies for the late release of the code.
You can refer to this code: https://github.com/FoundationVision/VAR/blob/main/demo_zero_shot_edit.ipynb
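Roughly, the zero-shot editing idea is to replace the predicted tokens inside the known region with ground-truth tokens at every scale of the coarse-to-fine sampling. Below is a minimal sketch of that token-level replacement, assuming per-scale ground-truth token ids from the VQVAE encoder; names such as replace_known_tokens, gt_idx_Bl, and patch_nums are illustrative, not the notebook's exact API:

    import torch
    import torch.nn.functional as F

    def replace_known_tokens(pred_idx_Bl, gt_idx_Bl, mask, patch_nums):
        # pred_idx_Bl / gt_idx_Bl: lists of (B, pn*pn) token-id tensors, one per scale
        # mask: (H, W) binary map, 1 = known region to keep from the ground truth
        out = []
        for si, pn in enumerate(patch_nums):
            # downsample the keep-mask to this scale's pn x pn token grid
            m = F.interpolate(mask[None, None].float(), size=(pn, pn), mode='area')
            keep = (m > 0.5).view(1, pn * pn)  # True where GT tokens should win
            out.append(torch.where(keep, gt_idx_Bl[si], pred_idx_Bl[si]))
        return out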

@AlbertLin0

@iFighting The inpainting result is not good. Besides replacing tokens at the latent level, I also tried two methods:

  1. In quant.py at line 190, I use the ground-truth f_hat_list[si], which is obtained from f_to_idxBl_or_fhat (the elided replacement step is sketched just after this list):
    def get_next_autoregressive_input(self, si: int, SN: int, f_hat: torch.Tensor, h_BChw: torch.Tensor, f_hat_list) -> Tuple[Optional[torch.Tensor], torch.Tensor]:  # only used in VAR inference
        HW = self.v_patch_nums[-1]
        f_hat_gt = f_hat_list[si]  # ground-truth features for this scale
        if si != SN-1:
            h = self.quant_resi[si/(SN-1)](F.interpolate(h_BChw, size=(HW, HW), mode='bicubic'))  # conv after upsample
            f_hat.add_(h)
            img_f_hat = vqvae.fhat_to_img(f_hat)        # decode current features to pixels
            img_f_hat_gt = vqvae.fhat_to_img(f_hat_gt)  # decode ground-truth features
            # === replace the mask area at the pixel level ===
            f_hat = vqvae.img_to_fhat(...)              # re-encode the blended image
            return f_hat, F.interpolate(f_hat, size=(self.v_patch_nums[si+1], self.v_patch_nums[si+1]), mode='area')
        else:
            h = self.quant_resi[si/(SN-1)](h_BChw)
            f_hat.add_(h)
            img_f_hat = vqvae.fhat_to_img(f_hat)
            img_f_hat_gt = vqvae.fhat_to_img(f_hat_gt)
            # === replace the mask area at the pixel level ===
            f_hat = vqvae.img_to_fhat(...)              # re-encode the blended image
            return f_hat, f_hat
  2. Replace f_hat_list[si] with an f_hat_gt obtained from vqvae.img_to_fhat(img_gt).
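For clarity, the pixel-level replacement step elided above is just a masked paste; a minimal sketch (the mask convention and all names here are illustrative, not repo API):

    import torch

    def blend_pixels(img_f_hat: torch.Tensor, img_f_hat_gt: torch.Tensor,
                     mask: torch.Tensor) -> torch.Tensor:
        # paste the known region of the ground-truth decode over the current decode;
        # mask is 1 where the ground truth should be kept
        return img_f_hat_gt * mask + img_f_hat * (1.0 - mask)

    # at the elided line: f_hat = vqvae.img_to_fhat(blend_pixels(img_f_hat, img_f_hat_gt, mask))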

But neither works. Perhaps operations in pixel space cannot be mapped onto operations on tokens.

@iFighting
Contributor


@AlbertLin0 Which model do you use? I recommend using the 512-resolution model.
