The Case of the Blah Blah Blahs

A famous dataset of Reuters articles from the 1980s includes “Blah blah blah.” in place of some stories. Why?

We have a Patreon now! Sign up to support the show and get access to our bonus podcast, Overunderstood.

Show notes:
00:31 - The link Jess sent
8:31 - SGML
8:46 - This is what the blahs look like and this is what all the entries look like.
24:00 - FTP
24:34 - Linguistic Data Consortium
29:00 - RCV1 at NIST and David D. Lewis’s README
30:22 - Construe-TIS: A System for Content-Based Indexing of a Database of News Stories (Phil Hayes and Steven Weinstein)

About the Podcast

There are questions the internet just can’t answer. But that doesn’t mean we can’t find them. On each episode of Underunderstood, we find a question the internet can’t answer (maybe it’s a dead-end Wikipedia page, an abandoned Reddit thread, or an unanswered question on Twitter) and we fill in the gaps. It’s part chat show, part documentary, and almost always surprising.