Learning without training: The implicit dynamics of in-context learning

This academic paper proposes a novel explanation for in-context learning (ICL) in Large Language Models (LLMs), the phenomenon where LLMs adapt to new patterns at inference time without any explicit weight updates. The authors introduce the concept of a contextual block, which generalizes a transformer block by stacking a contextual layer (such as self-attention) on top of a neural network. They show, through theoretical derivation and experimental verification, that the context provided in the prompt implicitly modifies the weights of the neural network's first layer, effectively performing a rank-one (low-rank) weight update. This implicit adjustment follows a dynamics resembling gradient descent, suggesting that ICL is not solely a product of self-attention's internal workings but a broader property of neural networks: modifications to the input are transferred into implicit modifications of the weights.
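As a concrete illustration, here is a minimal numerical sketch (not the authors' code) of the identity behind this claim: if the block applies a weight matrix W to the contextual layer's output, then feeding the context-augmented activation A([C, x]) through W gives the same result as feeding the context-free activation A(x) through W + ΔW, where ΔW = W (A([C, x]) − A(x)) A(x)ᵀ / ‖A(x)‖² is rank-one. The random vectors and the names a_query and a_ctx below are hypothetical stand-ins for the contextual layer's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8                        # activation dimension (arbitrary for this demo)
W = rng.normal(size=(d, d))  # first weight matrix of the block's neural network

# Stand-ins for the contextual layer's outputs:
a_query = rng.normal(size=d)  # A(x): activation for the query alone
a_ctx = rng.normal(size=d)    # A([C, x]): activation with the context prepended

# Rank-one implicit update induced by the context:
#   dW = W (A([C, x]) - A(x)) A(x)^T / ||A(x)||^2
delta = a_ctx - a_query
dW = W @ np.outer(delta, a_query) / np.dot(a_query, a_query)

# The identity: W applied to the context-augmented activation equals
# the updated weights applied to the context-free activation.
lhs = W @ a_ctx
rhs = (W + dW) @ a_query
print(np.allclose(lhs, rhs))      # True
print(np.linalg.matrix_rank(dW))  # 1: the implicit update is rank-one
```

In other words, the effect of the context on this layer can be absorbed entirely into a low-rank change to its weights, which is what lets the authors interpret ICL as an implicit weight update.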

About the Podcast

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.