
Performance on action recognition #4

Open
hanoonaR opened this issue Dec 29, 2022 · 2 comments

@hanoonaR

Hi Authors,

Thank you for sharing your great work. I'm curious about the performance of your models on action recognition tasks. Have you attempted to benchmark on any standard action recognition datasets such as SSV2 or K400/700?

Thank you.

@klauscc (Owner)

klauscc commented Mar 1, 2023

We took TimeSformer's codebase and used our video encoder as the initialization. The result on K400 is about 80%, roughly 2% higher than TimeSformer with the same architecture. Sadly, however, it is much lower than with CLIP's vision encoder (~85%, if I remember correctly). This may indicate that CLIP has stronger representations, while our model has better video-text alignment.
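The initialization described above (reusing a pretrained video encoder inside the TimeSformer codebase) usually comes down to a partial state-dict transfer: copy every pretrained weight whose renamed key and shape match the target model, and leave the rest (e.g. a new classification head) at random init. Below is a minimal, hypothetical sketch of that pattern, not klauscc's actual code; the `visual.` → `model.` prefix mapping and all names are assumptions, and plain nested lists stand in for real tensors so the sketch stays dependency-free.

```python
# Hypothetical sketch of partial weight transfer between state dicts.
# Plain nested lists stand in for tensors; with PyTorch you would
# compare tensor.shape instead of shape_of() below.

def shape_of(t):
    """Stand-in for tensor.shape: dimensions of a nested list."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return tuple(shape)

def transfer_weights(encoder_sd, model_sd, prefix_map=(("visual.", "model."),)):
    """Return (new_state_dict, loaded_keys): the model's state dict with
    every key that matches a (renamed) encoder key of the same shape
    replaced by the pretrained value; unmatched keys keep their init."""
    # Rename encoder keys into the target model's namespace.
    renamed = {}
    for k, v in encoder_sd.items():
        for old, new in prefix_map:
            if k.startswith(old):
                k = new + k[len(old):]
                break
        renamed[k] = v

    # Copy over only key- and shape-compatible entries.
    out, loaded = dict(model_sd), []
    for k, v in renamed.items():
        if k in out and shape_of(v) == shape_of(out[k]):
            out[k] = v
            loaded.append(k)
    return out, loaded
```

For example, a pretrained `visual.proj.weight` would land in `model.proj.weight`, while a freshly added `head.weight` (absent from the encoder) keeps its random initialization; fine-tuning then proceeds on the full model as in the TimeSformer recipe.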

@Andy1621

Andy1621 commented Mar 2, 2023

@klauscc In my previous study, directly fine-tuning a CLIP-based TimeSformer achieved about 82%–83%. Similar results can also be found in ST-Adapter.
