The Cocktail Party Inside a Neural Network
https://medium.com/@micahadler2008/how-many-things-can-attention-keep-straight-4aa7e62b8fa2

Imagine trying to follow dozens of conversations at a noisy party: you can focus on a few at once, but beyond that the voices blur together. Modern AI systems face the same challenge. In my new paper, On the Capacity of Self-Attention, I ask how many distinct "conversations" an AI's listener, its attention mechanism, can actually track before it gets overwhelmed.

Attention is what helps models like ChatGPT or image generators connect the right pieces of information, such as linking a pronoun to the right noun or matching parts of an image that belong together. I found that attention has a real capacity limit, a kind of mental bandwidth, and that multi-head attention (multiple smaller listeners instead of one big one) helps the system separate overlapping signals and keep track of more conversations.

In fact, the common explanation for why AI uses a team of listeners, that they let it "look at different things at once", turns out to be incomplete. The real reason is more fundamental: a team is better than one super-listener at cutting through the noise, and that dramatically expands the AI's mental bandwidth.

In short: the limits of attention can be described mathematically, and understanding those limits helps us see, and improve, how AI thinks.
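For anyone who wants to see the "one big listener vs. a team of smaller listeners" contrast concretely, here is a minimal numpy sketch of single-head versus multi-head attention. The sizes, the use of the raw input as queries, keys, and values, and the omission of learned projections are my simplifications for illustration, not the paper's setup:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # Scaled dot-product attention: each query mixes the values
        # according to how strongly it matches each key.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        return softmax(scores) @ V

    rng = np.random.default_rng(0)
    n, d_model, n_heads = 8, 16, 4   # hypothetical sizes, not from the paper
    X = rng.normal(size=(n, d_model))

    # One "big listener": a single head over the full width.
    single = attention(X, X, X)

    # A "team of listeners": split the same width into n_heads smaller
    # heads, attend independently, then concatenate the results.
    d_head = d_model // n_heads
    heads = [attention(X[:, i*d_head:(i+1)*d_head],
                       X[:, i*d_head:(i+1)*d_head],
                       X[:, i*d_head:(i+1)*d_head])
             for i in range(n_heads)]
    multi = np.concatenate(heads, axis=-1)

    print(single.shape, multi.shape)  # both (8, 16): same output width

The point of the sketch is only the structure: both versions produce the same shape at comparable cost, but the single head computes one attention pattern over the whole width, while the multi-head version computes four independent patterns, which is the setting whose capacity the paper analyzes.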
0 points • by raj • 10 days ago