Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
This paper introduces CURIO (Curiosity-driven User-modeling Reward as an Intrinsic Objective), a framework for enhancing personalized multi-turn dialogue in large language models (LLMs). It addresses a limitation of conventional methods such as Reinforcement Learning from Human Feedback (RLHF), which often fail to personalize interactions dynamically for individual users. CURIO augments training with a curiosity-based intrinsic reward derived from a user model, encouraging the LLM agent to actively infer the user's traits and preferences over the course of the conversation and thereby improve the accuracy of its user model. By formulating personalized dialogue as a Partially Observable Markov Decision Process (POMDP) and connecting the intrinsic reward to Potential-Based Reward Shaping (PBRS) theory, the authors show that CURIO significantly improves personalization and generalization on tasks such as conversational recommendation and educational dialogue. The overall goal is to build more adaptive and engaging conversational agents by training them to learn about the user during the interaction.
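To make the PBRS connection concrete, here is a minimal sketch, not taken from the paper: it uses the standard potential-based shaping form (Ng et al., 1999) and assumes, purely for illustration, that the potential is the user model's prediction accuracy about the user's traits after each turn. The symbols $\Phi$, $b_t$, $\hat{u}$, and $u$ are illustrative names, not the paper's notation.

```latex
% Standard potential-based reward shaping: the shaped reward adds the
% discounted change in a potential function \Phi between successive states.
% Illustrative assumption: \Phi(b_t) is the user model's accuracy in
% predicting the user's traits u from the dialogue context b_t after turn t.
\[
  r^{\text{shaped}}_t
    \;=\; r^{\text{task}}_t
      \;+\; \underbrace{\gamma\,\Phi(b_{t+1}) - \Phi(b_t)}_{\text{curiosity bonus}},
  \qquad
  \Phi(b_t) \;=\; \operatorname{Acc}\!\big(\hat{u}(b_t),\, u\big),
\]
% where b_t is the agent's dialogue context (belief) at turn t, \hat{u}(b_t) is
% the user model's prediction of the user's traits, and \gamma is the discount
% factor. Turns that sharpen the agent's picture of the user yield a positive
% bonus; uninformative turns yield roughly zero.
```

Under the classical PBRS result, a bonus of this telescoping form does not change the optimal policy of the base objective, which is presumably why the authors ground their intrinsic reward in that theory; the specific choice of potential above is an assumption for illustration, not the paper's definition.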