What are the learnable queries, like those in BLIP2, that serve as the input of the video Q-Former? #3

tiesanguaixia · 2023-09-16T13:34:05Z

Thank you in advance!

zjr2000 · 2023-11-08T02:28:55Z

The Video Q-Former follows the standard design of a vanilla Q-Former. The difference lies in its input features, which are the query tokens output by the BLIP2 Q-Former (themselves derived from the image features). For details, you can examine the specific implementation at this line of code.
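To make the data flow concrete, here is a minimal pure-Python sketch, not the repository's actual code. It assumes that the video Q-Former keeps its own set of learnable query embeddings (as a vanilla Q-Former does) and that the per-frame query outputs of the BLIP2 Q-Former are concatenated along the token axis before cross-attention. The names `frame_query_outputs`, `video_queries`, and `cross_attention`, as well as all sizes, are hypothetical, and the attention is reduced to a single head without projection layers.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, features):
    """Single-head scaled dot-product attention: queries attend to features.

    Simplification (assumption): no learned Q/K/V projections, no multi-head split.
    """
    dim = len(queries[0])
    features_t = [list(col) for col in zip(*features)]  # (dim, num_features)
    scores = matmul(queries, features_t)  # (num_queries, num_features)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, features), weights


def rand_matrix(rows, cols, rng):
    """Small random matrix, standing in for learned or computed values."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Hypothetical sizes, chosen only to keep the example small.
num_frames, queries_per_frame, dim = 4, 3, 8
num_video_queries = 2

# Stand-in for the BLIP2 Q-Former output: one (queries_per_frame, dim) matrix per frame.
frame_query_outputs = [rand_matrix(queries_per_frame, dim, rng) for _ in range(num_frames)]

# Concatenate all frames along the token axis: (num_frames * queries_per_frame, dim).
video_input = [token for frame in frame_query_outputs for token in frame]

# Learnable queries of the video Q-Former (randomly initialised here; trained in practice).
video_queries = rand_matrix(num_video_queries, dim, rng)

# The video queries cross-attend to the frame-level query tokens.
video_tokens, attn_weights = cross_attention(video_queries, video_input)

print(len(video_tokens), len(video_tokens[0]))
```

In this sketch the number of output tokens depends only on `num_video_queries`, not on the number of frames, which is the usual reason for putting learnable queries in front of a variable-length input.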
