Sleep-time Compute: Beyond Inference Scaling at Test-time

This academic paper explores "sleep-time compute" for large language models (LLMs), a concept where models process information from a given context while idle, anticipating potential future queries. The authors introduce Stateful GSM-Symbolic and Stateful AIME, datasets created by splitting existing reasoning problems into context and questions to test this approach. Their experiments show that sleep-time compute significantly reduces the need for test-time compute to achieve similar accuracy, offering a more efficient inference process. Furthermore, by preparing for multiple related questions about the same context, sleep-time compute can lower the average cost per query. The paper concludes that sleep-time compute is most effective when queries are predictable from the provided context.
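The core idea can be illustrated with a toy sketch: do reusable work on the shared context while idle, so each later query needs less fresh compute. The function names and the arithmetic "context" below are hypothetical stand-ins; the paper applies this idea to LLM reasoning over natural-language contexts, not simple lookups.

```python
# Toy illustration of the sleep-time compute idea: precompute derived facts
# from a shared context once, offline, then answer many related queries
# cheaply at test time. All names here are illustrative, not from the paper.

def sleep_time_compute(context):
    """Offline phase: derive and cache facts likely needed by future queries."""
    state = dict(context)
    # Anticipate predictable questions about the context, e.g. totals.
    state["total"] = sum(context.values())
    return state

def answer_query(state, query_item):
    """Test-time phase: reuse the cached state instead of re-deriving it."""
    return state[query_item]

# A shared context, in the spirit of a Stateful GSM-Symbolic problem split
# into context (given facts) and questions (asked later).
context = {"apples": 3, "oranges": 5}
state = sleep_time_compute(context)  # done once, while "idle"

# Multiple related queries amortize the sleep-time work across the context.
print(answer_query(state, "total"))   # 8
print(answer_query(state, "apples"))  # 3
```

The amortization argument in the summary corresponds to the last two lines: the offline `sleep_time_compute` cost is paid once, while every query served from `state` avoids redoing that work.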

About the Podcast

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.