The impact of artificial intelligence will never be equitable if only one company builds and controls the models (not to mention the data that goes into them). Unfortunately, today's AI models are made up of billions of parameters that must be trained and tuned to maximize performance for each use case, putting the most powerful AI models out of reach for most people and companies.
MosaicML started with the mission of making those models more accessible. The company, which counts Jonathan Frankle PhD '23 and MIT Associate Professor Michael Carbin as co-founders, developed a platform that allows users to train, improve, and monitor open source models using their own data. The company also built its own open source models using Nvidia graphics processing units (GPUs).
The approach made deep learning, a nascent field when MosaicML began, accessible to many more organizations as excitement around generative AI and large language models (LLMs) skyrocketed following the release of ChatGPT. It also made MosaicML a powerful complement for data management companies that were likewise committed to helping organizations make use of their data without handing it over to AI companies.
Last year, that reasoning led to the acquisition of MosaicML by Databricks, a global data warehousing, analytics, and artificial intelligence company that works with some of the largest organizations in the world. Since the acquisition, the combined companies have released one of the highest-performing open source, general-purpose LLMs yet built. Known as DBRX, the model has set new benchmarks in tasks such as reading comprehension, general-knowledge questions, and logic puzzles.
DBRX has since earned a reputation as one of the fastest open source LLMs available and has proven especially useful in large companies.
More than the model itself, though, Frankle says DBRX is important because it was built using Databricks tools, meaning any of the company's customers can achieve similar performance with their own models, which will accelerate the impact of generative AI.
“Honestly, it's exciting to see the community do cool things with it,” Frankle says. “For me, as a scientist, that's the best part. It's not the model, it's all the wonderful things the community is doing on top of it. That's where the magic happens.”
Making algorithms efficient
Frankle earned bachelor's and master's degrees in computer science at Princeton University before coming to MIT for his PhD in 2016. Early on at MIT, he wasn't sure which area of computing he wanted to study. His eventual choice would change the course of his life.
Frankle ultimately decided to focus on a form of artificial intelligence known as deep learning. At the time, deep learning and artificial intelligence did not inspire the same enthusiasm they do today. Deep learning was a decades-old area of study that had yet to bear much fruit.
“I don't think anyone at the time anticipated that deep learning was going to blow up like it did,” Frankle says. “People who knew about it thought it was a really interesting area and there were a lot of unsolved problems, but phrases like large language model (LLM) and generative AI weren't really in use at the time. Those were the early days.”
Things started to get interesting with the 2017 release of a now-famous paper by Google researchers, in which they showed that a new deep learning architecture known as the transformer was surprisingly effective at language translation and held promise in a number of other applications, including content generation.
In 2020, Mosaic's eventual co-founder, technology executive Naveen Rao, emailed Frankle and Carbin out of the blue. Rao had read a paper the two had co-authored, in which the researchers showed a way to shrink deep learning models without sacrificing performance. Rao proposed that the two start a company. They were joined by Hanlin Tang, who had worked with Rao at a previous AI startup that had been acquired by Intel.
The founders began by reading up on different techniques used to speed up the training of AI models, eventually combining several of them to show they could train a model to perform image classification four times faster than what had been achieved before.
“The trick was that there was no trick,” Frankle says. “I think we had to make 17 different changes to the way we trained the model to get there. It was just a little bit here and a little bit there, but it turns out that was enough to get incredible speedups. That's really been the story of Mosaic.”
The team demonstrated that their techniques could make models more efficient and released a large open source language model in 2023 along with an open source library of their methods. They also developed visualization tools to allow developers to plot different experimental options for training and running models.
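To make that "many small changes, composed together" approach concrete, here is a minimal sketch using Composer, MosaicML's open-source training library. The article does not name the library or any specific recipe, so the model, data, and choice of algorithms below are illustrative stand-ins rather than the team's actual configuration:

```python
# A hypothetical sketch: stacking several independent training speedup
# methods with Composer (MosaicML's open-source library). The toy model
# and random data stand in for a real image-classification setup.
import torch
from torch.utils.data import DataLoader, TensorDataset

from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing
from composer.models import ComposerClassifier

# Tiny convolutional classifier, purely for illustration.
net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, stride=2),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 10),
)
model = ComposerClassifier(net, num_classes=10)

# Random tensors in place of a real image dataset.
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

trainer = Trainer(
    model=model,
    train_dataloader=DataLoader(data, batch_size=16),
    optimizers=torch.optim.SGD(model.parameters(), lr=0.01),
    max_duration="1ep",  # train for one epoch
    # Each algorithm is one small change to the training recipe;
    # stacking several of them is where the speedups come from.
    algorithms=[BlurPool(), ChannelsLast(), LabelSmoothing(smoothing=0.1)],
)
trainer.fit()
```

Each entry in the `algorithms` list modifies the training loop or the model independently, which is what allows "a little bit here and a little bit there" to be combined into a larger overall gain.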
MIT's E14 Fund invested in Mosaic's Series A funding round, and Frankle says the E14 team offered helpful guidance from the start. Mosaic's progress enabled a new class of companies to train their own generative AI models.
“There was a democratization and an open source angle to Mosaic's mission,” Frankle says. “That's something that has always been very close to my heart, ever since I was a PhD student who didn't have a GPU because I wasn't in a machine learning lab, while all my friends had GPUs. I still feel that way. Why can't we all participate? Why can't we all do this and do science?”
Open source innovation
Databricks had also been working to give its customers access to artificial intelligence models. It finalized its acquisition of MosaicML in 2023 for $1.3 billion.
“At Databricks, we saw a founding team of academics like ourselves,” Frankle says. “We also saw a team of scientists who understand technology. Databricks has the data, and we have the machine learning. You can't do one without the other, and vice versa. It ended up being a really good match.”
In March, Databricks launched DBRX, providing the open source community and companies building their own LLMs with capabilities that were previously limited to closed models.
“What DBRX proved is that you can build the best open source LLM in the world with Databricks,” Frankle says. “If you're a company, the sky's the limit today.”
Frankle says the Databricks team has been encouraged by the company's own use of DBRX internally across a wide variety of tasks.
“It's already fantastic, and with some tuning it's better than closed models,” he says. “You're not going to be better than GPT at everything. That's not how it works. But nobody needs to solve every problem. Everyone wants to solve one problem. And we can customize this model to be really great for specific scenarios.”
As Databricks continues to push the boundaries of AI, and as competitors continue to pour huge sums of money into AI more broadly, Frankle hopes the industry comes to see open source as the best path forward.
“I believe in science and I believe in progress, and I'm excited that we're doing such exciting science as a field right now,” Frankle says. “I also believe in openness, and I hope everyone else embraces it the way we have. That's how we got here: through good science and good sharing.”