Sleep-time Compute: Beyond Inference Scaling at Test-time

This academic paper explores "sleep-time compute" for large language models (LLMs), a concept where models process information from a given context while idle, anticipating potential future queries. The authors introduce Stateful GSM-Symbolic and Stateful AIME, datasets created by splitting existing reasoning problems into context and questions to test this approach. Their experiments show that sleep-time compute significantly reduces the need for test-time compute to achieve similar accuracy, offering a more efficient inference process. Furthermore, by preparing for multiple related questions about the same context, sleep-time compute can lower the average cost per query. The paper concludes that sleep-time compute is most effective when queries are predictable from the provided context.
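The core idea can be illustrated with a toy sketch: do reusable work on the shared context while idle, so each later query needs less fresh compute. The function names and the arithmetic "context" below are hypothetical stand-ins; the paper applies this idea to LLM reasoning over natural-language contexts, not simple lookups.

```python
# Toy illustration of the sleep-time compute idea: precompute derived facts
# from a shared context once, offline, then answer many related queries
# cheaply at test time. All names here are illustrative, not from the paper.

def sleep_time_compute(context):
    """Offline phase: derive and cache facts likely needed by future queries."""
    state = dict(context)
    # Anticipate predictable questions about the context, e.g. totals.
    state["total"] = sum(context.values())
    return state

def answer_query(state, query_item):
    """Test-time phase: reuse the cached state instead of re-deriving it."""
    return state[query_item]

# A shared context, in the spirit of a Stateful GSM-Symbolic problem split
# into context (given facts) and questions (asked later).
context = {"apples": 3, "oranges": 5}
state = sleep_time_compute(context)  # done once, while "idle"

# Multiple related queries amortize the sleep-time work across the context.
print(answer_query(state, "total"))   # 8
print(answer_query(state, "apples"))  # 3
```

The amortization argument in the summary corresponds to the last two lines: the offline `sleep_time_compute` cost is paid once, while every query served from `state` avoids redoing that work.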

About the Podcast

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.