Researchers at Stanford and DeepMind propose using large language models (LLMs) as a proxy reward function 07/21/2023