“Defining alignment research” by richard_ngo
I think that the concept of "alignment research" (and the distinction between that and "capabilities research") is currently a fairly confused one. In this post I’ll describe some of the problems with how people typically think about these terms, and offer replacement definitions.

“Alignment” and “capabilities” are primarily properties of AIs, not of AI research

The first thing to highlight is that the distinction between alignment and capabilities is primarily doing useful work when we think of them as properties of AIs. This distinction is still under-appreciated by the wider machine learning community. ML researchers have historically thought about the performance of models almost entirely with respect to the tasks they were specifically trained on. However, the rise of LLMs has vindicated the alignment community's focus on general capabilities, and now it's much more common to assume that performance on many tasks (including out-of-distribution tasks) will improve roughly in parallel.

[...]

---

Outline:
(00:20) “Alignment” and “capabilities” are primarily properties of AIs, not of AI research
(05:04) What types of research are valuable for preventing misalignment?
(05:29) Valuable property 1: worst-case focus
(07:21) Valuable property 2: scientific approach
(11:48) A better definition of alignment research

---

First published: August 19th, 2024

Source: https://forum.effectivealtruism.org/posts/ajcQELstaSGYxdoRj/defining-alignment-research

---

Narrated by TYPE III AUDIO.