Learning without training: The implicit dynamics of in-context learning

This academic paper proposes a novel explanation for in-context learning (ICL) in Large Language Models (LLMs), the phenomenon where LLMs adapt to new patterns at inference time without any explicit weight updates. The authors introduce the concept of a contextual block, which generalizes a transformer block by stacking a contextual layer (such as self-attention) on top of a neural network. They show, through theoretical derivation and experimental verification, that the context provided in the prompt implicitly modifies the weights of the neural network's first layer, effectively performing a rank-one (low-rank) weight update. This implicit adjustment follows a dynamics resembling gradient descent, suggesting that ICL is not solely a product of self-attention's internal workings but a broader property of neural networks: modifications to the input are transferred into implicit modifications of the weights.
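As a concrete illustration, here is a minimal numerical sketch (not the authors' code) of the identity behind this claim: if the block applies a weight matrix W to the contextual layer's output, then feeding the context-augmented activation A([C, x]) through W gives the same result as feeding the context-free activation A(x) through W + ΔW, where ΔW = W (A([C, x]) − A(x)) A(x)ᵀ / ‖A(x)‖² is rank-one. The random vectors and the names a_query and a_ctx below are hypothetical stand-ins for the contextual layer's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8                        # activation dimension (arbitrary for this demo)
W = rng.normal(size=(d, d))  # first weight matrix of the block's neural network

# Stand-ins for the contextual layer's outputs:
a_query = rng.normal(size=d)  # A(x): activation for the query alone
a_ctx = rng.normal(size=d)    # A([C, x]): activation with the context prepended

# Rank-one implicit update induced by the context:
#   dW = W (A([C, x]) - A(x)) A(x)^T / ||A(x)||^2
delta = a_ctx - a_query
dW = W @ np.outer(delta, a_query) / np.dot(a_query, a_query)

# The identity: W applied to the context-augmented activation equals
# the updated weights applied to the context-free activation.
lhs = W @ a_ctx
rhs = (W + dW) @ a_query
print(np.allclose(lhs, rhs))      # True
print(np.linalg.matrix_rank(dW))  # 1: the implicit update is rank-one
```

In other words, the effect of the context on this layer can be absorbed entirely into a low-rank change to its weights, which is what lets the authors interpret ICL as an implicit weight update.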

About the Podcast

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.