“Bigger is always better” – this principle is deeply rooted in the world of AI. Every month larger models are created, with more and more parameters. Companies are even building <a target="_blank" class="af ow" href="https://www.datacenterfrontier.com/hyperscale/article/55248311/meta-sees-10b-ai-data-center-in-louisiana-using-combo-of-clean-energy-nuclear-power" rel="noopener ugc nofollow">$10 billion AI data centers</a> for them. But is this the only direction to go?
At NeurIPS 2024, Ilya Sutskever, one of the co-founders of OpenAI, shared an idea: “Pre-training as we know it will undoubtedly end.” It looks like the era of scaling is coming to an end, which means it's time to focus on improving current approaches and algorithms.
One of the most promising areas is the use of small language models (SLMs) with up to 10 billion parameters. This approach is really starting to take off in the industry. For example, Clem Delangue, CEO of Hugging Face, predicts that up to 99% of use cases could be addressed using SLMs. A similar trend is evident in the latest YC startup applications:
Giant general-purpose models with billions of parameters are impressive. But they are also very expensive to run and often come with latency and privacy challenges.
In my last article, “You don't need hosted LLMs, right?”, I wondered whether you need self-hosted models. Now I'll go one step further and ask: do you need any LLM at all?
In this article, I will discuss why small models may be the solution your business needs. We'll talk about how you can reduce costs, improve accuracy, and maintain control of your data. And of course, we'll have an honest discussion about their limitations.
The economics of LLMs are probably one of the most painful topics for companies. The issue is much broader than API bills, though: it includes the need for expensive hardware, infrastructure costs, energy costs, and environmental consequences.
Yes, large language models are impressive in their capabilities, but they are also very expensive to maintain. You may have already noticed how subscription prices for LLM-based applications have increased. For example, OpenAI's recent announcement of a $200/month Pro plan is a sign that costs are rising. And competitors are likely to follow to these price levels as well.
The story of the Moxie robot is a good example of this problem. Embodied created a great companion robot for kids for $800 that used the OpenAI API. Despite the success of the product (kids sent between 500 and 1,000 messages a day!), the company is shutting down due to the high operating costs of the API. Now thousands of robots will become useless and children will lose their friends.
One approach is to fine-tune a specialized small language model for your specific domain. Of course, it will not solve “all the world's problems,” but it will perfectly fulfill the task assigned to it — for example, analyzing customer documentation or generating specific reports. At the same time, SLMs are cheaper to maintain, consume fewer resources, require less data, and can run on much more modest hardware (even a smartphone).
And finally, let's not forget the environment. In the article “Carbon Emissions and Large Neural Network Training,” I found a statistic that surprised me: training GPT-3, with its 175 billion parameters, consumed as much electricity as the average American home consumes in 120 years. It also produced 502 tons of CO₂, which is comparable to the annual operation of more than one hundred gasoline cars. And that's not counting inference costs. By comparison, deploying a smaller model like a 7B would require about 5% of the consumption of a larger model. And what about the latest o3 release?
Tip: Don't chase the hype. Before taking on the task, calculate the costs of using APIs versus your own servers. Think about how such a system will scale and how justified the use of an LLM really is.
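As a back-of-the-envelope illustration, a sketch of such a calculation might look like this. All prices and volumes below are hypothetical assumptions for the example, not quotes from any provider:

```python
# Rough monthly cost comparison: pay-per-token LLM API vs. a self-hosted model.
# Every number here is an illustrative assumption -- plug in your own.

def api_monthly_cost(requests_per_day, tokens_per_request, price_per_1m_tokens):
    """Cost of a pay-per-token API over a 30-day month."""
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1_000_000 * price_per_1m_tokens

def self_hosted_monthly_cost(gpu_hourly_rate, hours_per_day=24):
    """Cost of renting a single GPU server over a 30-day month."""
    return gpu_hourly_rate * hours_per_day * 30

# Assumed workload: 10,000 requests/day at ~1,500 tokens each.
api = api_monthly_cost(10_000, 1_500, price_per_1m_tokens=10.0)  # assume $10 / 1M tokens
own = self_hosted_monthly_cost(gpu_hourly_rate=1.2)              # assume $1.20/hour GPU

print(f"API:         ${api:,.0f}/month")
print(f"Self-hosted: ${own:,.0f}/month")
```

At this assumed volume the self-hosted option comes out far cheaper; at low volumes the API usually wins — which is exactly why you should run the numbers before committing either way.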
Now that we've covered the economics, let's talk about quality. Naturally, few people would want to compromise the accuracy of a solution just to save costs. But even here, SLMs have something to offer.
Many studies show that for highly specialized tasks, small models can not only compete with large LLMs, but often outperform them. Let's look at some illustrative examples:
- Medicine: The Diabetica-7B model (based on Qwen2-7B) achieved 87.2% accuracy on diabetes-related tests, while GPT-4 showed 79.17% and Claude-3.5 showed 80.13%. Despite this, Diabetica-7B is dozens of times smaller than GPT-4 and can run locally on a consumer GPU.
- Legal sector: An SLM with only 0.2 billion parameters achieves an accuracy of 77.2% in contract analysis (GPT-4: about 82.4%). Moreover, for tasks such as identifying “unfair” terms in user agreements, the SLM even surpasses GPT-3.5 and GPT-4 on the F1 metric.
- Mathematical tasks: Google DeepMind research shows that training a small model, Gemma2-9B, on data generated by another small model yields better results than training on data from the larger Gemma2-27B. Smaller models tend to focus better on the specifics of a task, without the tendency to “try to shine with all their knowledge” that often characterizes larger models.
- Content moderation: LLaMA 3.1 8B surpassed GPT-3.5 in precision (by 11.5%) and recall (by 25.7%) when moderating content across 15 popular subreddits. This was achieved even with 4-bit quantization, which further reduces the model's size.
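To see why quantization matters for running such models on modest hardware, here's a rough memory-footprint estimate. It is a deliberate simplification that counts only the weights, ignoring activations, the KV cache, and per-block quantization metadata, so real usage is somewhat higher:

```python
# Approximate memory needed just to hold a model's weights, by precision.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params_billion, precision):
    """GiB required to store the weights of an n-billion-parameter model."""
    return n_params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1024**3

for precision in ("fp16", "int8", "int4"):
    gb = weight_memory_gb(8, precision)  # an 8B model such as LLaMA 3.1 8B
    print(f"8B model @ {precision}: ~{gb:.1f} GB")
```

At 4-bit precision an 8B model's weights fit in under 4 GB — roughly a quarter of the ~15 GB needed at fp16 — which is what puts it within reach of a single consumer GPU.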
I'll go a step further and share that even classic NLP approaches often work surprisingly well. Let me give you a personal case: I'm working on a psychological support product where we process more than a thousand user messages every day. Users can write in a chat and get a response. Each message is first classified into one of four categories: