Hello authors,
In your paper, you mention that the answer candidates for the question in the 【fine-grained action】 sub-task are generated using UMT-L. Could you please clarify whether you use a pre-trained UMT-L model to encode the videos and the 339 categories (the total number of categories in the Moments in Time dataset), and then compute the text-visual similarity?
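To make the question concrete, here is a minimal sketch of the pipeline I am assuming: a UMT-L-style dual encoder embeds the video clip and each of the 339 category names into a shared space, and the top-k most similar categories (by cosine similarity) are taken as the candidates. The `encode_video` / `encode_text` functions below are hypothetical placeholders for the real UMT-L encoders, not the actual API, and the category list is truncated for illustration.

```python
import numpy as np

EMBED_DIM = 768  # assumed embedding dimensionality of the shared space

def encode_video(video_path: str) -> np.ndarray:
    # Placeholder for the UMT-L video encoder; a real implementation would
    # load the clip and run the pre-trained model. Here: a deterministic
    # random vector so the sketch is runnable.
    rng = np.random.default_rng(abs(hash(video_path)) % (2**32))
    return rng.standard_normal(EMBED_DIM)

def encode_text(label: str) -> np.ndarray:
    # Placeholder for the UMT-L text encoder applied to a category name.
    rng = np.random.default_rng(abs(hash(label)) % (2**32))
    return rng.standard_normal(EMBED_DIM)

def top_k_candidates(video_path: str, categories: list[str], k: int = 4) -> list[str]:
    """Rank category names by cosine similarity to the video embedding and
    return the k most similar ones as answer candidates."""
    v = encode_video(video_path)
    v = v / np.linalg.norm(v)
    t = np.stack([encode_text(c) for c in categories])
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    sims = t @ v                    # cosine similarity per category
    order = np.argsort(-sims)       # highest similarity first
    return [categories[i] for i in order[:k]]

if __name__ == "__main__":
    # Truncated stand-in for the 339 Moments in Time category names.
    cats = ["running", "cooking", "juggling", "welding", "climbing"]
    print(top_k_candidates("example_clip.mp4", cats, k=3))
```

Is this roughly what the paper does, or are the candidates selected in some other way (e.g., from a different candidate pool or with a different similarity measure)?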
Thank you!