Inquiring about training codes #6

Open
xuanxu92 opened this issue Oct 2, 2024 · 3 comments

Comments


xuanxu92 commented Oct 2, 2024

Thanks for the excellent work! Could you please release the training code when you have time?

@LeoXing1996
Collaborator

Hey @xuanxu92, thanks for your interest in our project.

Our training code is based on PIA. We re-implemented the attention operation of the Motion Module and use the following attention mask during temporal attention.

import torch


def make_tril_block_mask(video_length: int, patch_size: int, device):
    """Build a block-causal mask for temporal attention.

    Binary pattern for video_length=4, patch_size=2 (1 = attend, 0 = masked):
    tensor([[1., 1., 0., 0.],
            [1., 1., 0., 0.],
            [1., 1., 1., 0.],
            [1., 1., 1., 1.]])
    """
    tmp_mask = torch.zeros(video_length, video_length)

    # warmup steps: the first `patch_size` frames attend to each other bi-directionally
    for idx in range(patch_size):
        tmp_mask[idx, :patch_size] = 1
    # tril blocks: each later frame attends to itself and all previous frames
    for idx in range(patch_size, video_length):
        tmp_mask[idx, :idx + 1] = 1

    # convert to an additive mask: 0 where attention is allowed, -inf where blocked
    tmp_mask = tmp_mask.type(torch.bool)
    mask = torch.zeros_like(tmp_mask, dtype=torch.float)
    mask.masked_fill_(tmp_mask.logical_not(), float('-inf'))
    return mask.to(device)
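
For reference, here is a minimal usage sketch (my own illustration, not taken from the Live2Diff or PIA repositories) showing how such an additive mask can be passed to PyTorch's scaled_dot_product_attention inside temporal self-attention. The tensor shapes, patch_size value, and the way the mask is wired into the attention call are assumptions.

import torch
import torch.nn.functional as F

# Hypothetical shapes: (batch, heads, frames, head_dim); not from the repo.
batch, heads, video_length, head_dim = 2, 8, 16, 64
patch_size = 4

q = torch.randn(batch, heads, video_length, head_dim)
k = torch.randn(batch, heads, video_length, head_dim)
v = torch.randn(batch, heads, video_length, head_dim)

# Additive mask: 0 keeps a frame pair in the softmax, -inf removes it.
attn_mask = make_tril_block_mask(video_length, patch_size, q.device)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)  # torch.Size([2, 8, 16, 64])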


somuchtome commented Oct 28, 2024

(Quoting @LeoXing1996's reply above.)

Hey, thank you for your tips. I also want to try Live2Diff with uni-directional attention and see what happens. Is the training setting for the results shown in Live2Diff Figure 3(d) the same as the warm-up uni-directional attention training setting, e.g. 3000 steps, batch size 1024, lr 1e-4?

@LeoXing1996
Collaborator

Hey @xuanxu92, sorry for the late response. I checked the history of your comments.
For the earlier comments: if you apply "full uni-directional" attention (i.e., the causal attention used in LLMs), it is understandable that the initial frames may get stuck, since the first few frames in Live2Diff are trained with bi-directional attention.

For the current comment, the answer is yes.
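
To make that distinction concrete, here is a small illustration (my own, assuming video_length=4 and patch_size=2) of how a fully causal mask differs from the warm-up block mask above only in the first patch_size frames:

import torch

video_length, patch_size = 4, 2

# "Full uni-directional" (causal) pattern: frame i attends only to frames <= i.
causal = torch.tril(torch.ones(video_length, video_length))
# tensor([[1., 0., 0., 0.],
#         [1., 1., 0., 0.],
#         [1., 1., 1., 0.],
#         [1., 1., 1., 1.]])

# Warm-up block pattern from make_tril_block_mask: the first `patch_size`
# frames attend to each other bi-directionally, later frames stay causal.
warmup = (make_tril_block_mask(video_length, patch_size, 'cpu') == 0).float()
# tensor([[1., 1., 0., 0.],
#         [1., 1., 0., 0.],
#         [1., 1., 1., 0.],
#         [1., 1., 1., 1.]])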
