
Could you provide the code for visualizing attention in Figure 2, or help us identify if there are any issues with our approach? #90

Open
yuhkalhic opened this issue Dec 11, 2024 · 0 comments

Thank you for your excellent work. We have a question regarding the attention visualization in Figure 2 of your paper.

We attempted to reproduce the visualization using the following approach:

  • Taking the last transformer layer
  • Summing across all attention heads

An example of our method is shown in the screenshot below. However, our results differ significantly from yours: even when we restrict the visualization to the first head of the first transformer layer, we cannot reproduce scores as high, or a distribution pattern as pronounced, as yours.
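To make our approach concrete, here is a minimal sketch of the aggregation we perform. The tensor shapes follow the Hugging Face convention for `output_attentions=True` (a tuple of per-layer tensors shaped `[batch, num_heads, seq_len, seq_len]`); the random tensors stand in for real model outputs, and the shape/size constants are illustrative only:

```python
import torch

# Simulated attentions: one tensor per layer, shaped
# [batch, num_heads, seq_len, seq_len], each row softmax-normalized,
# as Hugging Face models return with output_attentions=True.
num_layers, num_heads, seq_len = 6, 8, 10
attentions = tuple(
    torch.softmax(torch.randn(1, num_heads, seq_len, seq_len), dim=-1)
    for _ in range(num_layers)
)

# Our approach: take the last layer and sum across all heads.
last_layer = attentions[-1][0]   # [num_heads, seq_len, seq_len]
summed = last_layer.sum(dim=0)   # [seq_len, seq_len]

# Note: after summing, each row totals num_heads rather than 1, so the
# raw values depend on the head count; renormalizing each row back to a
# probability distribution changes the color scale of the heatmap.
normalized = summed / summed.sum(dim=-1, keepdim=True)

print(summed.shape)
```

This is exactly where we suspect a discrepancy: whether (and how) the summed map is renormalized before plotting strongly affects how pronounced the pattern looks.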

Could you help us understand whether there are any specific preprocessing or normalization steps we're missing?

To help us better understand and reproduce your results, would it be possible to share the visualization code you used? This would be incredibly helpful for our research.

Thank you for your time and assistance.

[Screenshot 2024-12-11 145016]