When can in-context learning generalize out of task distribution?

The paper empirically investigates how the pretraining distribution, and in particular its task diversity, shapes the emergence of in-context learning (ICL), using transformers trained on linear functions as a testbed. The findings show that as task diversity increases, transformers transition from a specialized solution, which performs ICL only within the pretraining task distribution, to one that generalizes across the entire task space; the same transition appears in nonlinear regression problems. The authors construct a phase diagram characterizing how task diversity interacts with the number of pretraining tasks, and examine how factors such as model depth and problem dimensionality affect the transition.
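To make the setup concrete, here is a minimal sketch of how pretraining data of this kind might be generated, assuming a Gaussian prior over weight vectors and linear targets y = w·x with optional noise; the names and parameters (`make_task_pool`, `sample_batch`, `noise_std`) are illustrative, not taken from the paper. Task diversity is controlled by the size of a fixed task pool: a small pool encourages a specialized solution tied to the pretraining tasks, while a large pool approaches drawing a fresh task for every sequence.

```python
import numpy as np

def make_task_pool(num_tasks: int, dim: int, rng: np.random.Generator) -> np.ndarray:
    """Fixed pool of linear tasks; each row is a weight vector w ~ N(0, I_d).
    The pool size `num_tasks` is the knob controlling pretraining task diversity."""
    return rng.standard_normal((num_tasks, dim))

def sample_batch(task_pool: np.ndarray, batch_size: int, seq_len: int,
                 noise_std: float, rng: np.random.Generator):
    """Sample in-context regression sequences. Each sequence draws one task w
    from the pool, then seq_len input/target pairs with y = w . x + noise."""
    num_tasks, dim = task_pool.shape
    task_idx = rng.integers(num_tasks, size=batch_size)
    w = task_pool[task_idx]                                  # (B, d)
    x = rng.standard_normal((batch_size, seq_len, dim))      # (B, T, d)
    y = np.einsum('btd,bd->bt', x, w)                        # (B, T)
    y += noise_std * rng.standard_normal(y.shape)
    return x, y

# Example: same data format at low vs. high task diversity.
rng = np.random.default_rng(0)
for m in (4, 4096):  # small pool vs. large pool
    pool = make_task_pool(num_tasks=m, dim=8, rng=rng)
    x, y = sample_batch(pool, batch_size=32, seq_len=16, noise_std=0.1, rng=rng)
    print(m, x.shape, y.shape)
```

Evaluating the trained model on weight vectors sampled outside the pool then probes whether the learned solution generalizes out of the pretraining task distribution.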
