It’s no exaggeration to say that the autonomous vehicle industry is facing a reckoning.
Just this week, Cruise recalled its entire fleet of autonomous vehicles after a horrific accident involving a pedestrian led the California DMV to suspend the company from operating driverless robotaxis in the state. Meanwhile, activists in San Francisco have taken to the streets, literally, to immobilize driverless cars as a form of protest against the city’s use as a testing ground for the emerging technology.
But one startup claims it has the key to safer self-driving technology, and it believes that key will win over the naysayers.
Ghost Autonomy, a company that builds autonomous driving software for automaker partners, announced this week that it plans to begin exploring applications of multimodal large language models (LLMs), AI models that can understand both text and images, in autonomous driving. To that end, Ghost has partnered with OpenAI through the OpenAI Startup Fund, gaining early access to OpenAI systems and Azure resources from Microsoft, OpenAI’s close collaborator, along with a $5 million investment.
“LLMs offer a new way to understand ‘the long tail,’ adding reasoning to complex scenes where current models fall short,” Ghost co-founder and CEO John Hayes told TechCrunch in an email interview. “The use cases for LLM-based autonomous analytics will only grow as LLMs become faster and more capable.”
But how, exactly, does Ghost apply AI models designed to describe images and generate text to controlling self-driving cars? According to Hayes, Ghost is testing software that relies on multimodal models to “perform more complex scene interpretations” and suggest road decisions (e.g., “move to the right lane”) to the car’s control hardware based on images of road scenes captured by in-car cameras.
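To make the pattern concrete, here is a minimal sketch of the kind of loop Hayes is describing: a single camera frame is sent to a hosted multimodal model along with a prompt asking for a high-level maneuver suggestion. The model name, prompt wording, and file path are placeholders, and this is an illustration of the general idea using OpenAI’s public API, not Ghost’s actual software, which the company has not published.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def frame_to_data_url(path: str) -> str:
    """Read a camera frame from disk and encode it as a base64 data URL."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"

def suggest_maneuver(frame_path: str) -> str:
    """Ask a hosted multimodal model for a high-level driving suggestion."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable chat model
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "You are assisting a driving system. Look at this road "
                            "scene and suggest one high-level maneuver, such as "
                            "'move to the right lane' or 'maintain lane and speed'."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": frame_to_data_url(frame_path)},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(suggest_maneuver("road_scene.jpg"))  # hypothetical camera frame
```

Note that a round trip to a hosted model like this is far too slow and unreliable to sit in a real-time control loop as-is, which is part of what the skeptics quoted below are getting at, and why Hayes himself concedes current models are not ready for commercial use in cars.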
“At Ghost, we will work to refine existing models and train our own models to maximize reliability and performance on the road,” Hayes said. “For example, construction zones have unusual components that can be difficult for simpler models to navigate: temporary lanes, flaggers with changing signs, and complex negotiations with other road users. LLMs have proven to be able to process all of these variables along with human-like levels of reasoning.”
However, the experts I spoke to are skeptical.
“(Ghost is) using ‘LLM’ as a marketing buzzword,” Os Keyes, a Ph.D. candidate at the University of Washington focused on data law and ethics, told TechCrunch via email. “Basically, if you took this pitch, replaced ‘LLM’ with ‘blockchain’ and sent it back to 2016, it would sound just as plausible and be just as obviously a waste.”
Keyes posits that LLMs are simply the wrong tool for autonomous driving. They were not designed or trained for the purpose, he argues, and may even be a less efficient way to solve some of the outstanding challenges in vehicle autonomy.
“It’s like hearing that your neighbor has been using a wad of Treasury bills to prop up a table,” Keyes said. “You could do it that way, and it’s certainly more elegant than the alternative, but… why?”
Mike Cook, a senior lecturer at King’s College London whose research focuses on computational creativity, agrees with Keyes’ general assessment. He points out that multimodal models themselves are far from settled science; in fact, OpenAI’s flagship model still makes up facts and commits basic mistakes that humans wouldn’t, like transcribing text incorrectly and getting colors wrong.
“I don’t think there is a silver bullet in computer science,” Cook said. “There is simply no reason to put LLMs at the center of something as dangerous and complex as driving a car. Researchers around the world are already scrambling to find ways to validate and demonstrate the safety of LLMs for fairly common tasks like answering essay questions, and the idea that we should apply this often unpredictable and unstable technology to autonomous driving is premature at best and misguided at worst.”
But Hayes and OpenAI will not be deterred.
In a press release, Brad Lightcap, COO of OpenAI and manager of the OpenAI Startup Fund, is quoted as saying that multimodal models “have the potential to expand the applicability of LLMs to many new use cases,” including autonomy and automotive. He adds: “With the ability to understand and draw conclusions by combining video, images and sounds, multimodal models can create a new way of understanding scenes and navigating complex or unusual environments.”
TechCrunch emailed questions to Lightcap through OpenAI press relations, but had not received a response at the time of publication.
As for Hayes, he says LLMs could allow autonomous driving systems to “reason about driving scenes holistically” and “use broad-based global knowledge” to “navigate complex and unusual situations,” including situations they have not seen before. He says Ghost is actively testing multimodal model-driven decision-making across its development fleet and working with automakers to “co-validate” and integrate new large models into Ghost’s autonomy stack.
“Certainly, current models are not quite ready for commercial use in automobiles,” Hayes said. “There is still a lot of work to do to improve their reliability and performance. But this is exactly why there is a market for application-specific companies doing R&D on these general models. Companies like ours, with a large amount of training data and deep application knowledge, will dramatically improve existing general models. The models themselves will also improve…. Ultimately, autonomous driving will require a complete system to deliver safety, with many different model types and functions. (Multimodal models) are just one tool to help make that happen.”
That is a lot of promise riding on unproven technology. Can Ghost deliver? Given that companies as well funded and well resourced as Cruise and Waymo are suffering major setbacks after years of testing autonomous vehicles on the road, I’m not so sure.