We aren't running out of training data, we are running out of open training data
Data licensing deals, scaling, human inputs, and repeating trends in open vs. closed.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/the-data-wall

0:00 We aren't running out of training data, we are running out of open training data
2:51 Synthetic data: 1 trillion new tokens per day
4:18 Data licensing deals: High costs per token
6:33 Better tokens: Search and new frontiers

Get full access to Interconnects at www.interconnects.ai/subscribe