Hi Authors,

Thank you for sharing your great work. I'm curious about the performance of your models on action recognition tasks. Have you tried benchmarking on any standard action recognition datasets, such as SSv2 or K400/700?

Thank you.
We took TimeSformer's codebase and used our video encoder as the initialization. The result on K400 is about 80%, roughly 2% higher than TimeSformer with the same architecture. Unfortunately, it is still much lower than CLIP's vision encoder (~85%, if I remember correctly). This may indicate that CLIP has stronger visual representations, while our model has better video-text alignment.
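For reference, the initialization step looks roughly like the following. This is a minimal sketch assuming the public facebookresearch/TimeSformer codebase; the checkpoint path and state-dict key layout are hypothetical, not the authors' actual setup.

```python
# Minimal sketch: initialize a TimeSformer backbone from a pretrained video encoder.
# Assumes the public facebookresearch/TimeSformer codebase; the checkpoint path
# and key names below are hypothetical placeholders.
import torch
from timesformer.models.vit import TimeSformer

# Same architecture as used for K400 fine-tuning.
model = TimeSformer(
    img_size=224,
    num_classes=400,               # Kinetics-400
    num_frames=8,
    attention_type='divided_space_time',
)

ckpt = torch.load('our_video_encoder.pth', map_location='cpu')  # hypothetical path
state_dict = ckpt.get('state_dict', ckpt)

# strict=False leaves the randomly initialized classification head untouched
# and reports any keys that do not line up between the two models.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f'{len(missing)} missing keys, {len(unexpected)} unexpected keys')
```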