Thank you for your excellent work, which has been very inspiring to me.
I have a question about the loss function used for fine-tuning your network. In the paper, you mention using 'the same training objective in standard LDMs' during fine-tuning. However, Figure 4 of the paper states that the network uses a pixel-wise reconstruction loss, which appears to be computed from the input video and the reconstructed video rather than from the predicted noise. Could you please clarify whether I am misunderstanding something?
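For context on the distinction the question draws, the two candidate objectives can be sketched as follows. This is an illustrative sketch only, not the repository's actual training code; the function names and shapes are hypothetical:

```python
import numpy as np

def ldm_noise_loss(eps_pred, eps_true):
    # Standard LDM training objective (epsilon-prediction): MSE between the
    # noise actually added to the latent and the noise the denoiser predicts.
    return float(np.mean((eps_pred - eps_true) ** 2))

def pixel_recon_loss(x_recon, x_input):
    # Pixel-wise reconstruction loss: MSE between the reconstructed video
    # frames and the original input frames, computed in pixel space.
    return float(np.mean((x_recon - x_input) ** 2))

# Toy usage: random "noise" and "frames" just to show what each loss compares.
rng = np.random.default_rng(0)
eps_true = rng.standard_normal((2, 4, 8, 8))   # noise added to latents
eps_pred = rng.standard_normal((2, 4, 8, 8))   # denoiser's noise estimate
x_input = rng.random((2, 3, 16, 16))           # input video frames
x_recon = rng.random((2, 3, 16, 16))           # reconstructed frames

noise_objective = ldm_noise_loss(eps_pred, eps_true)
recon_objective = pixel_recon_loss(x_recon, x_input)
```

The question, in these terms, is whether fine-tuning minimizes something like `ldm_noise_loss` (as 'the same training objective in standard LDMs' suggests) or something like `pixel_recon_loss` (as Figure 4 suggests).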