Concerns about the environmental impacts of large language models (LLMs) are increasing. Although detailed information about the actual costs of LLMs can be hard to find, let's try to gather some data to understand the scale.
Since complete data on ChatGPT-4 is not available, we can consider Llama 3.1 405B as an example. This open-source Meta model is arguably the most “transparent” LLM to date. Based on several <a target="_blank" class="af om" href="https://ai.meta.com/blog/meta-llama-3-1/" rel="noopener ugc nofollow">benchmarks</a>, Llama 3.1 405B is comparable to ChatGPT-4, providing a reasonable basis for understanding LLMs in this range.
Hardware requirements to run the 32-bit version of this model range from 1,620 to 1,944 GB of GPU memory, depending on the source (<a target="_blank" class="af om" href="https://www.substratus.ai/blog/llama-3-1-405b-gpu-requirements" rel="noopener ugc nofollow">Substratus</a>, Hugging Face). For a conservative estimate, let's use the lower figure of 1,620 GB. To put this in perspective, recognizing that this is a simplified analogy, 1,620 GB of GPU memory is roughly equivalent to the combined memory of 100 standard MacBook Pros (16 GB each). So when you ask one of these LLMs for a Shakespeare-style tiramisu recipe, it takes the power of 100 MacBook Pros to produce an answer.
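The back-of-the-envelope arithmetic behind that analogy is simple: 405 billion parameters at 32 bits (4 bytes) each gives the 1,620 GB figure directly. A minimal Python sketch (the 16 GB per laptop is the article's simplifying assumption):

```python
# 405 billion parameters stored at 32-bit precision (4 bytes each)
params = 405e9
bytes_per_param = 4

model_size_gb = params * bytes_per_param / 1e9   # ≈ 1,620 GB
macbook_ram_gb = 16                              # a "standard" MacBook Pro

print(model_size_gb)                    # 1620.0
print(model_size_gb / macbook_ram_gb)   # 101.25, i.e. roughly 100 machines
```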
Let's try to translate these numbers into something more tangible… and this doesn't even include the <a target="_blank" class="af om" href="https://www.techtarget.com/searchenterpriseai/news/366596503/Meta-intros-its-biggest-open-source-ai-model-Llama-31-405B#:~:text=Meta%20said%20that%20to%20train,to%20train%20the%20new%20model." rel="noopener ugc nofollow">training costs</a>, which are estimated to involve around 16,000 GPUs at a cost of approximately $60 million (excluding hardware costs), a significant investment by Meta in a process that took around 80 days. In terms of electricity consumption, <a target="_blank" class="af om" href="https://www.notebookcheck.net/Meta-unveils-biggest-smartest-royalty-free-Llama-3-1-405B-ai.866775.0.html" rel="noopener ugc nofollow">training required 11 GWh</a>.
The annual electricity consumption per person in a country like France is approximately 2,300 kWh. Thus, 11 GWh corresponds to the annual electricity consumption of some 4,782 people. This consumption resulted in the release of approximately 5,000 tons of CO₂-equivalent greenhouse gases (based on the European average), although this figure can easily double depending on the country where the model was trained.
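Here is the arithmetic behind those two figures, as a minimal Python sketch (the ~0.45 kg CO₂/kWh grid intensity is derived from the article's own numbers, not an official statistic):

```python
training_energy_kwh = 11e6     # 11 GWh reported for training Llama 3.1 405B
per_person_kwh = 2_300         # annual electricity use per person in France

people_equivalent = training_energy_kwh / per_person_kwh
print(int(people_equivalent))  # 4782 people

co2_kg = 5e6                   # ~5,000 tons of CO2-equivalent
implied_intensity = co2_kg / training_energy_kwh
print(round(implied_intensity, 2))  # ~0.45 kg CO2 per kWh, a European-average mix
```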
In comparison, burning 1 liter of diesel produces 2.54 kg of CO₂. Training Llama 3.1 405B in a country like France is therefore roughly equivalent to the emissions from burning around 2 million liters of diesel, which translates into approximately 28 million kilometers traveled by car. I think that provides enough perspective… and I haven't even mentioned the water needed to cool the GPUs!
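The diesel comparison works out as follows (a sketch; the 7 L per 100 km fuel consumption is my assumption for a typical diesel car, consistent with the ~28 million km figure):

```python
co2_kg = 5e6               # ~5,000 tons of CO2-equivalent from training
kg_co2_per_liter = 2.54    # emissions from burning 1 liter of diesel

liters = co2_kg / kg_co2_per_liter   # ≈ 1.97 million liters
print(round(liters / 1e6, 1))        # 2.0 (million liters)

liters_per_km = 7 / 100    # assumption: a car burning 7 L per 100 km
km = liters / liters_per_km
print(round(km / 1e6))     # 28 (million km)
```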
Clearly, AI is still in its infancy, and we can anticipate that more efficient and sustainable solutions will emerge over time. However, in this intense race, OpenAI's financial picture highlights a significant disparity between its revenue and operating expenses, particularly related to inference costs. In 2024, the company is projected to spend approximately $4 billion on processing power provided by Microsoft for inference workloads, while its annual revenue is estimated at between $3.5 billion and $4.5 billion. This means that inference costs alone nearly equal, or even exceed, OpenAI's total revenue (<a target="_blank" class="af om" href="https://www.deeplearning.ai/the-batch/openai-faces-financial-growing-pains-spending-double-its-revenue/" rel="noopener ugc nofollow">DeepLearning.AI</a>).
All this is happening in a context in which experts are announcing a stagnation in the performance of AI models (the scaling paradigm): increasing model size and GPU counts is yielding significantly smaller gains than previous leaps, such as the advances made by GPT-4 over GPT-3. “The pursuit of AGI has always been unrealistic, and the ‘bigger is better’ approach to AI was bound to reach a limit over time, and I think that's what we're seeing here,” said <a target="_blank" class="af om" href="https://www.france24.com/en/live-news/20241118-is-ai-s-meteoric-rise-beginning-to-slow" rel="noopener ugc nofollow">Sasha Luccioni</a>, researcher and AI lead at the startup Hugging Face.
But don't get me wrong: I'm not putting AI on trial, because I love it! This research phase is an absolutely normal stage in AI's development. However, I think we need to exercise common sense when using AI: we can't use a bazooka to kill a mosquito every time. AI must become sustainable, not only to protect our environment but also to address social divisions. Indeed, leaving the Global South behind in the AI race due to high costs and resource demands would represent a significant failure of this new intelligence revolution.
So, do you really need the full power of ChatGPT to handle the simplest tasks in your RAG pipeline? Are you looking to control your operating costs? Do you want complete end-to-end control over your pipeline? Are you worried about your private data circulating on the web? Or are you simply mindful of AI's impact and committed to using it consciously?
Small Language Models (SLMs) offer an excellent alternative worth exploring. They can run on your local infrastructure and, when combined with human intelligence, provide substantial value. Although there is no universally accepted definition of an SLM (in 2019, for example, GPT-2 with its 1.5 billion parameters was considered an LLM, which is no longer the case), I am referring to models such as Mistral 7B, Llama-3.2 3B, or Phi-3.5, to name a few. These models can run on a “good” computer, resulting in a much smaller carbon footprint while ensuring the confidentiality of your data when installed on-premise. Although they are less versatile, when used wisely for specific tasks they can still provide significant value while being more environmentally virtuous.