Attention Probes
https://blog.eleuther.ai/attention-probes/

A new method called "attention probes" is proposed for classifying the internal states of language models: an attention layer aggregates hidden states across the sequence, replacing traditional pooling strategies such as mean pooling or using the last token's representation. Experiments on models such as Gemma 2B across several datasets compare attention probes against last-token and mean probes. The results indicate that multi-head attention probes generally outperform mean probes, and that increasing the number of heads improves performance, although it also increases attention weight entropy.
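The idea can be illustrated with a minimal sketch: learned per-head query vectors attend over a sequence of hidden states, and a linear head classifies the pooled result. This is an assumed simplification (class names, shapes, and the absence of key/value projections are my choices), not the blog post's exact implementation.

```python
import torch
import torch.nn as nn

class AttentionProbe(nn.Module):
    """Multi-head attention probe (illustrative sketch).

    Each head owns a learned query vector that attends over the
    hidden states of a sequence; the per-head weighted sums are
    concatenated and fed to a linear classifier.
    """

    def __init__(self, d_model: int, n_heads: int, n_classes: int):
        super().__init__()
        # One learned query per head; keys/values are the raw
        # hidden states (no projections in this minimal version).
        self.queries = nn.Parameter(torch.randn(n_heads, d_model) / d_model**0.5)
        self.classifier = nn.Linear(n_heads * d_model, n_classes)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        # Attention scores per head: (batch, n_heads, seq_len)
        scores = torch.einsum("bsd,hd->bhs", hidden, self.queries)
        weights = scores.softmax(dim=-1)
        # Weighted sum of hidden states per head: (batch, n_heads, d_model)
        pooled = torch.einsum("bhs,bsd->bhd", weights, hidden)
        return self.classifier(pooled.flatten(start_dim=1))

# Usage: classify 8 sequences of 16 hidden states of width 64
probe = AttentionProbe(d_model=64, n_heads=4, n_classes=2)
logits = probe(torch.randn(8, 16, 64))
print(logits.shape)  # torch.Size([8, 2])
```

Mean pooling is the special case where all attention weights are uniform, which is why the entropy of the learned weights is a natural diagnostic: lower entropy means the probe is concentrating on fewer tokens than a mean probe would.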
0 points • by will22 • 2 months ago