This AI paper from ETH Zurich, Google, and Max Planck proposes an effective strategy to boost the performance of reward models for RLHF (reinforcement learning from human feedback). 01/27/2024