The artificial intelligence industry has grown at a remarkable pace in recent years. With new research and models being released almost daily, AI continues to evolve and improve. Whether in healthcare, education, marketing, or business, AI and machine learning practices are transforming the way industries operate. Nearly every organization is embracing Large Language Models (LLMs), one of the most prominent advances in AI. Well-known LLMs such as GPT-3.5 and GPT-4 have shown impressive adaptability to new contexts, enabling tasks like logical reasoning and code generation from just a few in-context examples.
Researchers have also explored using LLMs to improve robot control. Applying LLMs to robotics is difficult because low-level robot actions are hardware-dependent and underrepresented in LLM training data. Previous approaches either treated LLMs as semantic planners or relied on human-engineered control primitives to interface with robots. To address these challenges, Google DeepMind researchers have introduced a new paradigm that exploits the flexibility and optimization potential of reward functions to perform a variety of robotic tasks.
In this paradigm, the LLM defines reward functions that act as an intermediate interface, which an optimizer then uses to drive the robot's control strategy. Reward functions are well suited to LLM specification because of their semantic richness: they can efficiently connect high-level language commands or corrections to low-level robot behaviors. The team notes that operating at this higher level of abstraction, with reward functions as the interface between language and low-level robot actions, was inspired by the observation that human instructions usually describe behavioral outcomes rather than specific low-level actions. By connecting instructions to rewards, it becomes easier to bridge the gap between language and robot behavior, since rewards capture the semantics of the desired outcome.
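To make the idea concrete, here is a minimal Python sketch of what an LLM-specified reward could look like. The `RewardTerm` and `compose_reward` names are illustrative assumptions for this article, not the paper's actual interface; the point is simply that an instruction can be expressed as weighted targets on robot state features, which an optimizer can then score.

```python
# Hypothetical sketch: an instruction expressed as weighted feature targets.
# RewardTerm and compose_reward are illustrative names, not the paper's API.
from dataclasses import dataclass

@dataclass
class RewardTerm:
    name: str       # state feature, e.g. "torso_height"
    target: float   # desired value of that feature
    weight: float   # relative importance in the total reward

def compose_reward(terms, features):
    """Quadratic penalty on each feature's deviation from its target."""
    return -sum(t.weight * (features[t.name] - t.target) ** 2 for t in terms)

# An instruction like "stand up on the back legs" might become:
terms = [
    RewardTerm("torso_height", target=0.9, weight=1.0),
    RewardTerm("torso_pitch", target=1.4, weight=0.5),   # radians, upright
    RewardTerm("front_feet_height", target=0.6, weight=0.8),
]
# Current (hypothetical) robot state, as measured by the simulator:
features = {"torso_height": 0.3, "torso_pitch": 0.1, "front_feet_height": 0.0}
print(compose_reward(terms, features))  # more negative = further from the goal
```

Because the reward is just a scored description of the outcome, the LLM never has to emit hardware-specific motor commands; any downstream optimizer that can maximize this score will produce the behavior.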
The real-time optimizer MuJoCo MPC (Model Predictive Control) is used in this paradigm to enable interactive behavior creation. Because users can immediately observe the results and give the system feedback, the reward design can be refined iteratively. For evaluation, the research team designed a set of 17 tasks for a simulated quadruped robot and a simulated dexterous manipulator. The method reliably performed 90% of the designed tasks, whereas a baseline that interfaces with the robot through primitive skills, in the style of Code as Policies, completed only 50%. Experiments were also carried out on a real robot arm, where the interactive system demonstrated complex manipulation skills such as non-prehensile pushing.
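Below is a minimal, self-contained sketch of the interactive loop this enables, building on the reward representation above. The `generate_reward` (the LLM call) and `mpc_rollout` (the MuJoCo MPC step) functions are stand-in assumptions, not real APIs, and the user feedback is simulated for the example to run.

```python
# Sketch of the interactive refinement loop, under stated assumptions:
# generate_reward() stands in for the LLM call and mpc_rollout() for the
# real-time MPC optimizer; neither is a real library API.
def generate_reward(instruction, feedback=None, previous=None):
    # A real system would prompt an LLM with the instruction, any user
    # feedback, and the previous reward spec, then parse its output.
    terms = list(previous) if previous else [("torso_height", 0.9, 1.0)]
    if feedback:
        terms.append(("torso_pitch", 1.4, 0.5))  # placeholder correction
    return terms

def mpc_rollout(reward_terms):
    # A real system would run MuJoCo MPC against the reward in real time.
    return f"trajectory optimized for {len(reward_terms)} reward term(s)"

def interactive_session(instruction, simulated_feedback=()):
    reward_terms = generate_reward(instruction)   # LLM -> reward spec
    behavior = mpc_rollout(reward_terms)          # optimizer -> motion
    for feedback in simulated_feedback:           # user watches and corrects
        if not feedback:                          # empty feedback = accept
            break
        reward_terms = generate_reward(instruction, feedback, reward_terms)
        behavior = mpc_rollout(reward_terms)
    return behavior

print(interactive_session("stand up on the back legs",
                          simulated_feedback=["keep the torso upright", ""]))
```

The design choice worth noting is the tight loop: because MPC re-optimizes in real time, each language correction changes the behavior within seconds, which is what makes iterative reward shaping practical.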
In conclusion, this is a promising approach in which LLMs define reward parameters that are then optimized for robot control. Combining LLM-generated rewards with real-time optimization yields an interactive, feedback-driven behavior creation process that lets users achieve complex robot behaviors more efficiently and effectively.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.