Hello,
I see in the paper that default MLLM configs were largely used, but frame counts were increased where applicable.
Some models, such as LongVA, appear to support video contexts of up to 1000 frames, but the benchmark uses only 128. If a model can handle the extra frames, using them could improve its performance.
What determined the frame count used for each model?