Devvret Rishi is the CEO and Co-founder of Predibase. Previously, he was an ML product leader at Google, working across products such as Firebase, Google Research, the Google Assistant, and Vertex AI. While there, Dev was also the first product lead for Kaggle – a data science and machine learning community with over 8 million users worldwide. Dev’s academic background is in computer science and statistics, and he holds a master’s in computer science from Harvard University, focused on ML.
Asif: What inspired you to found Predibase, and what gap in the market did you aim to address?
Devvret: We started Predibase in 2021 with the mission to democratize deep learning. At that time, we saw that leading tech companies like Google, Apple, and Uber—where my co-founders and I previously worked—were leveraging neural network models, especially large pre-trained ones, to build better recommendation engines and to work with unstructured data such as text and images. However, most companies were still relying on outdated methods like linear regression or tree-based models. Our goal was to democratize access to these advanced neural networks.
We built Predibase on top of an open-source project my co-founder Piero had started while at Uber. Initially, we believed the way to democratize deep learning would be through platforms like ours, but we were surprised by how quickly the field evolved. What really changed the game was the emergence of models with massive parameter counts, like transformers. When scaled up by 100x or 1000x, these models gained emergent generative properties. Suddenly, engineers could interact with them simply by prompting, without any initial training.
Our platform initially focused on fine-tuning models like BERT in 2021-2022, which were considered large at the time. But as generative AI evolved, we saw that engineers needed more than just pre-trained models—they needed a way to customize them efficiently. This reinforced our original vision: while we had started by democratizing deep learning through fine-tuning, the need for customization platforms like Predibase had only grown stronger.
Asif: Your results seem almost magical; how do you do it?
Devvret: The core of our success comes from recognizing that machine learning has fundamentally changed. Five years ago, the way you trained models was by throwing a lot of data at them, training from scratch, and waiting hours or days for the process to converge. While training and fine-tuning aren’t going away, there has been a fundamental shift in how models are trained. The biggest trend driving this shift is the technical innovation behind Low-Rank Adaptation (LoRA). LoRA introduced the idea that you can modify only a small fraction of a model’s parameters—typically less than 1%—and still achieve the same level of performance as if you had fine-tuned all of the parameters of, say, a 7-billion-parameter model. This approach lets the model perform at a high level while being far more efficient to train.
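To make the parameter math concrete, here is a minimal back-of-the-envelope sketch of the LoRA idea: instead of updating a full weight matrix, you train two small low-rank matrices whose product forms the update. The matrix size and rank below are illustrative choices, not Predibase's actual configuration.

```python
# Back-of-the-envelope illustration of why LoRA trains well under 1% of the
# parameters of a weight matrix. Numbers are illustrative, not a real config.

def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d x k weight matrix:
    an up-projection B (d x r) and a down-projection A (r x k)."""
    return d * r + r * k

# A typical projection matrix in a ~7B-parameter transformer: 4096 x 4096.
d = k = 4096
full = d * k                       # parameters touched by full fine-tuning
adapter = lora_params(d, k, r=16)  # parameters touched by a rank-16 LoRA

print(f"full: {full:,}  lora: {adapter:,}  fraction: {adapter / full:.2%}")
# -> full: 16,777,216  lora: 131,072  fraction: 0.78%
```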
Many customers assume that training or fine-tuning models will take days and cost tens of thousands of dollars. In contrast, with Predibase, we can fine-tune most models in 30 minutes to an hour for as little as $5-$50. This efficiency empowers teams to experiment more freely and reduces the barriers to building custom models.
So I think the magic in our results is really threefold:
The first key insight we had was recognizing that the way models are trained would change significantly. We fully committed to parameter-efficient fine-tuning, enabling users to achieve high-quality results much faster and with a much smaller computational footprint.
The second step was integrating parameter-efficient training with parameter-efficient serving. We pair LoRA-based training with LoRA-optimized serving through our open-source framework, LoRAX. LoRAX allows a single deployment to support multiple fine-tuned models, which means you can achieve excellent results by having many specialized fine-tunes—perhaps one per customer—without significantly increasing serving costs.
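As a rough illustration of what that looks like from the client side, the sketch below assumes the open-source lorax-client Python package and a LoRAX server already running with a base model; the adapter IDs are placeholders for your own fine-tunes, and the exact client API may differ across versions.

```python
# Sketch of per-request adapter routing against a single LoRAX deployment.
# Assumes a running LoRAX server and the lorax-client package; adapter IDs
# below are placeholders for your own fine-tuned LoRA adapters.
from lorax import Client

client = Client("http://127.0.0.1:8080")  # one deployment, one base model

prompt = "Summarize this support ticket: ..."

# Each request can name a different LoRA adapter; the server loads and
# batches adapters dynamically on top of the shared base-model weights.
for adapter_id in ["acme/customer-a-adapter", "acme/customer-b-adapter"]:
    response = client.generate(prompt, adapter_id=adapter_id, max_new_tokens=64)
    print(adapter_id, "->", response.generated_text)
```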
The final ingredient behind our success is a lot of hard work and benchmarking. We’ve processed hundreds of billions of fine-tuning tokens on our platform and have fine-tuned tens of thousands of models ourselves. This hands-on experience has given us deep insights into which parameter combinations work best for different use cases. When a customer uploads a dataset and selects a model, we have prior knowledge of how to train that model most effectively—what LoRA rank to use, how large the model should be, and how long to train it. It all comes down to being empirical, and our extensive research, including the Predibase Fine-Tuning Leaderboard, has been baked into the platform to make this process seamless for users.
Asif: Where/when does your solution deliver the best results?
Devvret: Our platform delivers the best results for specialized tasks. As one of our customers put it, “Generalized intelligence might be great, but we don’t need our point-of-sale assistant to recite French poetry.”
We’ve seen this in our Fine-Tuning Leaderboard as well, which shows that fine-tuned models excel at handling specific, focused tasks. LoRA-based fine-tuning and serving are especially effective in these scenarios, enabling organizations to achieve high-quality results tailored to their needs. This approach ensures they get the precision they require without the unnecessary overhead of larger, general-purpose models.
Asif: How does your solution help address the huge cost of running LLMs?
Devvret: We’ve built over 50 optimizations into our fine-tuning stack, incorporating the latest findings from the research community. These optimizations allow you to fine-tune models with minimal resources while still achieving high-quality results. As a result, fine-tuning can typically be completed in minutes or hours–not days–for just $5 to $50, a fraction of what traditional methods would cost.
On the inference side–where a typical organization allocates most of its spend–we tackle costs with GPU autoscaling, so you only pay for the compute you use. Turbo LoRA ensures models are optimized for fast inference with low latency, and our LoRAX framework allows multiple fine-tuned models to run from a single GPU. This means you can efficiently serve fine-tuned models from fewer GPUs, helping keep your infrastructure costs low while supporting high-volume real-time workloads.
Asif: Large enterprises are very concerned about data security and IP. How do you address this?
Devvret: We get it—data security and IP protection are top priorities, especially for enterprises handling sensitive information. That’s why we offer the ability to deploy Predibase in your Virtual Private Cloud or in our cloud. This ensures that data stays under your control, with all the security policies you need, including SOC 2 Type II compliance. Whether you’re in finance, healthcare, or any other regulated industry, you can fine-tune and deploy models with the confidence that your data and IP are safe.
Asif: How easy/complicated is it to use Predibase?
Devvret: You can get started with Predibase in as few as ~10 lines of code. Whether you’re an engineer or a data scientist, our platform abstracts away the complexities of fine-tuning and deploying models. Through our web interface or SDK, you can upload your dataset, select a model, and kick off training in no time. We’ve built Predibase to make fine-tuning as simple as possible, so teams can focus on outcomes instead of wrestling with infrastructure.
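To give a sense of that workflow, here is an illustrative sketch of a fine-tuning job launched from a Python SDK. The method names, base model, and file path are placeholders based on the general shape of the Predibase SDK and may not match the current release exactly; treat it as a sketch, not a reference.

```python
# Illustrative sketch of launching a LoRA fine-tune via a Python SDK.
# Method names, the base model, and the file path are placeholders and may
# not match the current Predibase SDK exactly; check the official docs.
from predibase import Predibase

pb = Predibase(api_token="<YOUR_API_TOKEN>")

dataset = pb.datasets.from_file("support_tickets.jsonl", name="support_tickets")
repo = pb.repos.create(name="ticket-classifier", exists_ok=True)

adapter = pb.adapters.create(
    config={"base_model": "llama-3-1-8b-instruct"},  # placeholder model name
    dataset=dataset,
    repo=repo,
    description="LoRA fine-tune for ticket triage",
)
```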
Asif: Inference speed is key in many use cases. How does Predibase help with that aspect?
Devvret: Predibase boosts inference speed with Turbo LoRA, which increases throughput by up to 4x, and FP8 quantization, which cuts the memory footprint in half for faster processing. On top of that, the LoRAX framework lets multiple fine-tuned models run on a single GPU, reducing costs and improving efficiency. With GPU autoscaling, the platform adjusts resources in real time based on demand, ensuring fast responses during traffic spikes without overpaying for idle infrastructure. This combination delivers fast, cost-effective model serving, whether for production workloads or high-volume AI applications.
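The memory claim is easy to sanity-check with some back-of-the-envelope arithmetic; the sketch below assumes an illustrative 8-billion-parameter model and counts only weight memory, ignoring activations, the KV cache, and adapter weights.

```python
# Rough weight-memory arithmetic behind "FP8 halves the footprint".
# Illustrative model size; ignores activations, KV cache, and adapters.
params = 8e9                  # e.g., an ~8B-parameter base model
fp16_gb = params * 2 / 1e9    # 2 bytes per weight in FP16
fp8_gb = params * 1 / 1e9     # 1 byte per weight in FP8
print(f"FP16: {fp16_gb:.0f} GB, FP8: {fp8_gb:.0f} GB")  # FP16: 16 GB, FP8: 8 GB
```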
Asif: How fast is the payback on the fine-tuning initial cost?
Devvret: The payback on fine-tuning with Predibase is incredibly fast because LoRA fine-tuning is remarkably cheap compared to full fine-tuning. Many people still assume that fine-tuning is expensive, imagining the high costs of full model retraining—but with LoRA, fine-tuning typically costs only $5 to $50 for a job, making it a low-risk, high-return investment. With Predibase, enterprises can fine-tune efficiently without running dozens of expensive, time-consuming experiments. This enables rapid deployment of specialized, high-performing models.
Asif: How are you different from other fine-tuning providers?
Devvret: Predibase stands out with a comprehensive fine-tuning platform that just works—no out-of-memory errors while training or unexpected drops in throughput while serving. We’ve built 50+ optimizations directly into our stack to ensure smooth, high-performance fine-tuning. Combined with LoRAX–which lets you efficiently serve hundreds of fine-tuned adapters on a single GPU–our Turbo LoRA, FP8 quantization, and GPU autoscaling make our model serving infrastructure industry-leading, delivering faster responses at lower costs.
We’ve seen too many teams get bogged down managing infrastructure, building data pipelines, and debugging fragmented open-source tools—leaving less time to actually build and productionize AI. That’s why we provide an end-to-end platform backed by a dedicated team of ML engineers to help you every step of the way. Whether you prefer the flexibility of SaaS in our cloud or full control with VPC deployments in yours, Predibase frees you from the operational burden, so you can focus on delivering impactful AI solutions.
Asif: What are some of the companies that you’re working with and what problem are they solving with SLMs?
Devvret: Checkr leverages Predibase to improve the accuracy and efficiency of background checks. They process millions of checks monthly, but 2% of the data in one part of the background check workflow—often messy and unstructured—needed human review. With Predibase, Checkr fine-tuned a small language model, achieving 90%+ accuracy, outperforming GPT-4, and reducing inference costs by 5x. This enabled them to replace manual review with real-time automated decisions, meeting tight latency SLAs and improving customer experience.
Convirza, on the other hand, processes over a million phone calls per month to extract actionable insights that help coach call agents. Previously, managing infrastructure for their AI models was complex and often too much of a burden for their small AI team. With Predibase’s LoRAX multi-adapter serving, they’re able to consolidate 60 adapters into a single deployment, reducing overhead and allowing them to iterate on new models much faster. This efficiency lets them focus on building AI solutions, not infrastructure, unlocking new capabilities for their customers, like creating bespoke call performance indicators on the fly.
Both companies highlight how small language models fine-tuned on Predibase outperform larger models while cutting costs, improving response times, and streamlining operations.
Asif: How do you see the industry evolving?
Devvret: There are two big wars happening in generative AI infrastructure. The first is the competition between small, fine-tuned language models and large, general-purpose models. The second is the battle between open-source and commercial solutions.
The question that comes up a lot is: will the future be about small, task-specific, fine-tuned models, or large, general-purpose ones? I’m convinced it’s going to be more and more about small, fine-tuned models, and we’ve already seen this shift start. In 2023, the market’s focus was all about making models as big as possible, which worked well for quick prototyping. But as companies move into production, the focus shifts to cost, quality, and latency.
A lot of studies have pointed out that the economics of Gen AI haven’t always added up—too much spend, too little benefit. You can’t justify spending billions on infrastructure to solve relatively simple automation tasks. That’s where smaller, task-specific models come in. As teams graduate from prototyping into production, these models will grow in importance.
And if you look at organizations using Gen AI seriously at scale, almost all of them follow this path as they mature. It’s the same reason OpenAI felt the need to roll out something like GPT-4o mini. I think this trend will continue, and it’s a good thing for the industry because it forces costs to align with ROI.
Talking about the second trend, my view is that the entire pie for both open-source and commercial models will grow very quickly, but the relative share of open source is going to grow much faster than the commercial side. Based on an A16Z generative AI survey from 2023, enterprises were planning to spend heavily on LLMs. But in 2023–the year of prototyping, as many people say–an estimated 80 to 90% of usage was closed source. However, two-thirds of AI leaders have expressed plans to increase their open-source usage, targeting a 50/50 split.
Historically, most machine learning has been built on open-source architectures, so this shift aligns with the broader trajectory of the industry.
Asif: What problems are left unsolved and where do you see the greatest opportunity?
Devvret: I think the biggest unsolved problem—and one I find really exciting—is how to create a flywheel where models get better as they’re used. What I mean is introducing a real active learning process for LLMs. Right now, what I hear from organizations is that when they move to production, they can often get a model to 70% accuracy with prompt engineering alone. But as they try to push further, they only see marginal improvements—maybe going from 70% to 71%.
What they really want is a way to reach 80% or 90% accuracy, and they hope that by deploying the model, they can collect enough data to keep improving it. But that workflow isn’t solved yet. The way many companies handle it now is by releasing a model at 70%, collecting production data, manually reviewing it, and then fine-tuning the model based on annotated datasets. But this approach just doesn’t scale—there’s no way to manually review enough data, especially as LLMs handle millions of queries in production.
The real opportunity, in my opinion, lies in building a system where models can improve automatically over time. For example, if a model launches with 70% accuracy in a new domain, you need a way to leverage production data to fine-tune it iteratively. I think the key will be applying some of the breakthroughs we’re already seeing—like using LLMs as judges or generating synthetic data—to create that flywheel. With such a system, a model could launch at 50-70% accuracy, collect data from real use, and improve on its own.
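To make the shape of that flywheel concrete, here is a deliberately simplified sketch. Every helper passed into it is a hypothetical stand-in rather than a real Predibase or LoRAX API, and the judge threshold and number of rounds are arbitrary.

```python
# Simplified sketch of the "flywheel" described above:
# serve -> collect production traffic -> LLM-as-judge filtering -> LoRA fine-tune.
# Every callable below is a hypothetical stand-in, not a real Predibase API.
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (prompt, model completion)

def improve_continuously(
    adapter: str,
    serve: Callable[[str], List[Sample]],           # deploy adapter, return logged traffic
    judge: Callable[[Sample], float],               # e.g., a stronger LLM scoring 0..1
    fine_tune: Callable[[str, List[Sample]], str],  # cheap LoRA update, returns new adapter
    rounds: int = 3,
    threshold: float = 0.8,
) -> str:
    for _ in range(rounds):
        traffic = serve(adapter)
        # Keep only completions the judge rates highly, replacing manual review.
        curated = [s for s in traffic if judge(s) >= threshold]
        adapter = fine_tune(adapter, curated)
    return adapter
```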
This idea was partially realized in recommender systems, but it hasn’t yet been achieved with generative AI at scale. That’s where I think the industry is headed, and it’s where I see the most exciting potential for growth.
This interview was originally published in the Marktechpost Small Language Model (SLM) Magazine 2024.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.