Thank you for your excellent work. We have a question regarding the attention visualization in Figure 2 of your paper.
We attempted to reproduce the visualization using the following approach:
- Taking the last transformer layer
- Summing across all attention heads
A minimal sketch of our method is shown below. However, our results differ significantly from yours: even when we restrict the code to the first head of the first transformer layer, we cannot obtain attention scores as high, or a distribution pattern as pronounced, as in your figure.
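Roughly, our attempt looks like the sketch below. It assumes a Hugging Face-style model loaded with `output_attentions=True`; the checkpoint name, input sentence, and plotting details are placeholders, not the exact setup from your paper.

```python
# Minimal sketch of our attempt (checkpoint and input are placeholders).
import torch
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "an example input sentence"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, each (batch, num_heads, seq_len, seq_len)
last_layer = outputs.attentions[-1]   # take the last transformer layer
attn = last_layer.sum(dim=1)[0]       # sum across all attention heads

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
plt.imshow(attn.numpy(), cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.colorbar()
plt.show()
```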
Could you help us understand whether there are any specific preprocessing or normalization steps we might be missing?
To help us better understand and reproduce your results, would it be possible to share the visualization code you used? This would be incredibly helpful for our research.
Thank you for your time and assistance.