Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
The paper surveys open problems and fundamental limitations of reinforcement learning from human feedback (RLHF), highlighting challenges that arise when training AI systems with this technique. It proposes auditing and disclosure standards for RLHF-trained systems, emphasizes a multi-layered approach to safer AI development in which RLHF is one component rather than a complete solution, and identifies open questions for further research.