Inquiring about training codes #6

Open
xuanxu92 opened this issue Oct 2, 2024 · 3 comments

Comments


xuanxu92 commented Oct 2, 2024

Thanks for the excellent work! Could you please release the training code when you have time?

@LeoXing1996
Collaborator

Hey @xuanxu92, thanks for your interest in our project.

Our training code is based on PIA. We re-implemented the attention operation of the Motion Module and use the following attention mask during temporal attention.

import torch


def make_tril_block_mask(video_length: int, patch_size: int, device):
    """Build a block-causal mask for temporal attention.

    Binary pattern for video_length=4, patch_size=2 (1 = attend, 0 = masked):
    tensor([[1., 1., 0., 0.],
            [1., 1., 0., 0.],
            [1., 1., 1., 0.],
            [1., 1., 1., 1.]])
    """
    tmp_mask = torch.zeros(video_length, video_length)

    # warmup steps: the first `patch_size` frames attend to each other bi-directionally
    for idx in range(patch_size):
        tmp_mask[idx, :patch_size] = 1
    # tril blocks: each later frame attends to itself and all previous frames
    for idx in range(patch_size, video_length):
        tmp_mask[idx, :idx + 1] = 1

    # convert to an additive mask: 0 where attention is allowed, -inf where blocked
    tmp_mask = tmp_mask.type(torch.bool)
    mask = torch.zeros_like(tmp_mask, dtype=torch.float)
    mask.masked_fill_(tmp_mask.logical_not(), float('-inf'))
    return mask.to(device)
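
For reference, here is a minimal usage sketch (my own illustration, not taken from the Live2Diff or PIA repositories) showing how such an additive mask can be passed to PyTorch's scaled_dot_product_attention inside temporal self-attention. The tensor shapes, patch_size value, and the way the mask is wired into the attention call are assumptions.

import torch
import torch.nn.functional as F

# Hypothetical shapes: (batch, heads, frames, head_dim); not from the repo.
batch, heads, video_length, head_dim = 2, 8, 16, 64
patch_size = 4

q = torch.randn(batch, heads, video_length, head_dim)
k = torch.randn(batch, heads, video_length, head_dim)
v = torch.randn(batch, heads, video_length, head_dim)

# Additive mask: 0 keeps a frame pair in the softmax, -inf removes it.
attn_mask = make_tril_block_mask(video_length, patch_size, q.device)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)  # torch.Size([2, 8, 16, 64])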


somuchtome commented Oct 28, 2024

(Quoting @LeoXing1996's reply above.)

Hey, thank you for your tips. I also want to try Live2Diff with uni-directional attention and see what happens. Is the training setting for the results shown in Live2Diff Figure 3(d) the same as the warm-up uni-directional attention training setting, e.g. 3000 steps, batch size 1024, lr 1e-4?

@LeoXing1996
Collaborator

Hey @xuanxu92, sorry for the late response. I checked the history of your comments.
For the earlier comments: if you apply "full uni-directional" attention (i.e., the causal attention used in LLMs), it is understandable that the initial frames may get stuck, since the first few frames in Live2Diff are trained with bi-directional attention.

For the current comment, the answer is yes.
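
To make that distinction concrete, here is a small illustration (my own, assuming video_length=4 and patch_size=2) of how a fully causal mask differs from the warm-up block mask above only in the first patch_size frames:

import torch

video_length, patch_size = 4, 2

# "Full uni-directional" (causal) pattern: frame i attends only to frames <= i.
causal = torch.tril(torch.ones(video_length, video_length))
# tensor([[1., 0., 0., 0.],
#         [1., 1., 0., 0.],
#         [1., 1., 1., 0.],
#         [1., 1., 1., 1.]])

# Warm-up block pattern from make_tril_block_mask: the first `patch_size`
# frames attend to each other bi-directionally, later frames stay causal.
warmup = (make_tril_block_mask(video_length, patch_size, 'cpu') == 0).float()
# tensor([[1., 1., 0., 0.],
#         [1., 1., 0., 0.],
#         [1., 1., 1., 0.],
#         [1., 1., 1., 1.]])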
