The architecture of the Video Q-Former follows the standard design of a vanilla Q-Former. The difference lies in its input features: it takes the query tokens output by the BLIP-2 Q-Former, which are themselves derived from the image features. For details, you can examine the specific implementation at this line of code.
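To make the data flow concrete, here is a rough shape-only sketch of how I understand it. It does not run any model; it only tracks tensor shapes. The function names and the sizes used (8 frames, 32 query tokens, hidden size 768) are illustrative assumptions of mine, not values taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Shape of a (batch, tokens, dim) tensor; no real data is stored."""
    batch: int
    tokens: int
    dim: int


def frame_qformer(batch: int, num_frames: int,
                  num_query_tokens: int, hidden_dim: int) -> Shape:
    # Hypothetical stand-in for the BLIP-2 Q-Former applied per frame:
    # frames are folded into the batch axis, and each frame yields a
    # fixed number of query tokens derived from its image features.
    return Shape(batch * num_frames, num_query_tokens, hidden_dim)


def stack_frames(frame_out: Shape, num_frames: int) -> Shape:
    # Unfold the frames and concatenate their query tokens along the
    # token axis, giving one long sequence per video.
    assert frame_out.batch % num_frames == 0
    return Shape(frame_out.batch // num_frames,
                 frame_out.tokens * num_frames,
                 frame_out.dim)


def video_qformer(encoder_input: Shape, num_video_query_tokens: int) -> Shape:
    # Hypothetical stand-in for the Video Q-Former: as in a vanilla
    # Q-Former, learnable queries cross-attend to the encoder input, so
    # the output length equals the number of queries. The only change is
    # that encoder_input holds frame-level query tokens rather than raw
    # image features.
    return Shape(encoder_input.batch, num_video_query_tokens, encoder_input.dim)


if __name__ == "__main__":
    per_frame = frame_qformer(batch=2, num_frames=8,
                              num_query_tokens=32, hidden_dim=768)
    stacked = stack_frames(per_frame, num_frames=8)
    video = video_qformer(stacked, num_video_query_tokens=32)
    print(per_frame, stacked, video, sep="\n")
```

The point of the sketch is only that the second stage sees a sequence whose length grows with the number of frames, while its output length stays fixed by its own query count.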
Thank you in advance!