Waymo has long touted its ties to Google's DeepMind and its decades of artificial intelligence research as a strategic advantage over rivals in the autonomous driving space. Now, the Alphabet-owned company is taking it a step further by developing a new training model for its robotaxis based on Google's Gemini multimodal large language model (MLLM).
Waymo today published a new research paper presenting an “end-to-end multimodal model for autonomous driving,” also known as EMMA. This new end-to-end training model processes sensor data to generate “future trajectories for autonomous vehicles,” helping Waymo's self-driving vehicles make decisions about where to go and how to avoid obstacles.
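To make the input/output contract concrete, here is a minimal Python sketch of what such an end-to-end interface could look like: camera frames and a routing instruction go in, and a trajectory is decoded from the model's text output. All names here (DrivingInput, plan_trajectory, the generate call, the stub model) are hypothetical illustrations, not Waymo's code.

```python
from dataclasses import dataclass

@dataclass
class DrivingInput:
    camera_frames: list   # surround-view camera images (placeholder)
    routing_command: str  # e.g. "turn right at the next intersection"
    ego_history: list     # recent (x, y) positions of the vehicle

def plan_trajectory(model, inp: DrivingInput) -> list:
    """Ask a multimodal LLM for future waypoints, decoded from its text output."""
    prompt = (f"Routing command: {inp.routing_command}\n"
              f"Ego history: {inp.ego_history}\n"
              "Predict the next 5 (x, y) waypoints, one 'x,y' pair per line.")
    text = model.generate(images=inp.camera_frames, prompt=prompt)
    # The model emits the trajectory as plain text; parse it into coordinates.
    return [tuple(float(v) for v in line.split(","))
            for line in text.strip().splitlines()]

class StubModel:
    """Stand-in for a real MLLM; always 'drives' straight ahead."""
    def generate(self, images, prompt):
        return "\n".join(f"0.0,{float(i)}" for i in range(1, 6))

print(plan_trajectory(StubModel(), DrivingInput([], "continue straight",
                                                [(0.0, -1.0), (0.0, 0.0)])))
```

The point of the sketch is the shape of the problem, not the details: a single model replaces the chain of specialized components described below.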
But more importantly, this is one of the first signs that the leader in autonomous driving plans to use MLLMs in its operations. And it's a sign that these models could break free from their current use as chatbots, email organizers, and image generators and find applications in an entirely new environment: on the road. In its research paper, Waymo proposes “developing an autonomous driving system in which the MLLM is a first-class citizen.”
The paper describes how autonomous driving systems have historically developed specific “modules” for various functions, including perception, mapping, prediction, and planning. This approach has proven useful for many years but struggles to scale “due to accumulated errors between modules and limited communication between modules.” Additionally, these modules may have difficulty responding to “novel environments” because they are, by nature, “predefined,” which makes them hard to adapt.
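To see why errors would accumulate in a modular stack, consider a deliberately simplified sketch (all functions are hypothetical, not Waymo's pipeline): each stage can act only on the previous stage's output, so a perception mistake flows uncorrected into prediction and planning.

```python
# Illustrative only: a caricature of the modular perception -> prediction ->
# planning stack described above. All functions are hypothetical.

def perceive(sensor_frame):
    # Hand-written detector: returns labeled objects with positions.
    # If the label or position is wrong here, nothing downstream can fix it.
    return [{"label": "pedestrian", "xy": (3.1, 12.0)}]

def predict(objects):
    # Motion model: forecasts each object one step ahead. It inherits any
    # perception error and bakes it into the forecast.
    return [{"obj": o, "future_xy": (o["xy"][0], o["xy"][1] - 1.0)}
            for o in objects]

def plan(predictions):
    # Planner: brakes if any forecast comes within 15 m ahead of the ego lane.
    return "brake" if any(p["future_xy"][1] < 15.0 for p in predictions) else "cruise"

print(plan(predict(perceive(sensor_frame=None))))  # -> brake
```

Each interface between stages is fixed in advance, which is what the paper means by the modules being “predefined.”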
Waymo says MLLMs like Gemini present an interesting solution to some of these challenges for two reasons: they are “generalists” trained on vast datasets scraped from the internet “that provide rich ‘world knowledge’ beyond what is contained in common driving logs”; and they demonstrate “superior” reasoning abilities through techniques such as “chain-of-thought reasoning,” which mimics human reasoning by breaking down complex tasks into a series of logical steps.
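As a rough illustration of how chain-of-thought prompting could be applied to driving (this prompt is invented for illustration and does not come from the paper), the model is asked to enumerate observations and constraints before committing to a decision:

```python
# Hypothetical chain-of-thought prompt for a driving scene. The
# observe -> reason -> decide structure is the generic technique;
# the wording is invented and not taken from Waymo's paper.
COT_PROMPT = """You are controlling an autonomous vehicle.
Step 1: List the critical objects in the scene and how they are moving.
Step 2: Explain how each object constrains the ego vehicle.
Step 3: Only then, output a driving decision with a one-line justification.

Scene: A deer stands at the right road edge, 30 m ahead; speed limit 40 mph.
"""
print(COT_PROMPT)
```

Forcing the intermediate steps into the output is what lets the model decompose an unfamiliar scene instead of pattern-matching directly to an action.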
Waymo developed EMMA as a tool to help its robotaxis navigate complex environments. The company identified several situations in which the model helped its self-driving cars find the correct route, including encounters with animals or construction in the roadway.
Other companies, like Tesla, have talked a lot about developing end-to-end models for their self-driving cars. Elon Musk claims (x.com/elonmusk/status/1727484899374899687) that the latest version of Tesla's Full Self-Driving system (12.5.5) uses “end-to-end neural networks” to translate camera images into driving decisions.
This is a clear indication that Waymo, which has an advantage over Tesla in deploying real driverless vehicles on the road, is also interested in implementing an end-to-end system. The company said its EMMA model excelled at trajectory prediction, object detection, and road graph understanding.
“This suggests a promising avenue of future research, where even more basic autonomous driving tasks could be combined in a similar, expanded configuration,” the company said in a blog post today.
But EMMA also has its limitations, and Waymo acknowledges that future research will be needed before putting the model into practice. For example, EMMA couldn't incorporate 3D sensor inputs from lidar or radar, which Waymo said was “computationally expensive.” And it could only process a small number of image frames at a time.
There are also risks in using MLLMs to train robotaxis that the research paper doesn't mention. Chatbots like Gemini often stumble on simple tasks like reading clocks or counting objects. Waymo has very little room for error when its autonomous vehicles are traveling at 40 mph on a busy road. More research will be needed before these models can be deployed at scale, and Waymo is clear about that.
“We hope our results inspire further research to mitigate these issues,” writes the company's research team, “and further evolve the state of the art in autonomous driving model architectures.”