Last month, the US financial markets tumbled after a Chinese start-up called DeepSeek said it had built one of the world's most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.
AI companies generally train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.
As DeepSeek's engineers detailed in a research paper published just after Christmas, the start-up used several technological tricks to significantly reduce the cost of building its system. Its engineers needed only about $6 million in raw computing power, roughly one-tenth of what Meta spent building its latest AI technology.
What exactly did DeepSeek do? Here is a guide.
How are AI technologies built?
The leading AI technologies are based on what scientists call neural networks: mathematical systems that learn their skills by analyzing huge amounts of data.
The most powerful systems spend months analyzing just about all the English text on the internet, as well as many images, sounds and other multimedia. That requires enormous amounts of computing power.
About 15 years ago, AI researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way of doing this kind of data analysis. Companies like the Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powered neural networks.
As companies packed more GPUs into their computer data centers, their artificial intelligence systems could analyze more data.
But the best GPUs cost around $40,000, and they need huge amounts of electricity. Sending the data between chips can use more electrical power than running the chips themselves.
How could DeepSeek reduce costs?
It did many things. Most notably, it embraced a method called “mixture of experts.”
Companies usually created a single neural network that learned all the patterns in all the data on the internet. This was expensive, because it required enormous amounts of data to travel between GPU chips.
If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.
With the mixture of experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller “expert” systems. Each expert could concentrate on its particular field.
Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller “expert” systems with a “generalist” system.
The experts still needed to trade some information with one another, and the generalist, which had a decent but not detailed understanding of each subject, could help coordinate interactions between the experts.
It is a bit like an editor who oversees a newsroom filled with specialist reporters.
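Here is a rough sketch of that idea in the Python programming language. It is only an illustration of the general technique, not DeepSeek's actual code: the sizes are toy-scale, the router picks just one expert per token, and names like `moe_layer` are invented for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 16    # size of each token's vector (toy scale)
N_EXPERTS = 4  # a real system might have 100 or more

# Each "expert" is a small network layer with its own weights.
experts = [rng.standard_normal((HIDDEN, HIDDEN)) for _ in range(N_EXPERTS)]

# The "generalist": a shared layer that sees every token, plus a
# router that scores how relevant each expert is to a given token.
generalist = rng.standard_normal((HIDDEN, HIDDEN))
router = rng.standard_normal((HIDDEN, N_EXPERTS))

def moe_layer(token):
    # The router picks the single most relevant expert, so only that
    # expert's weights do any work for this token. The rest sit idle.
    best = int(np.argmax(token @ router))
    expert_out = token @ experts[best]

    # The generalist's output is added in, giving the token a baseline
    # understanding no matter which expert handled it.
    return expert_out + token @ generalist

token = rng.standard_normal(HIDDEN)  # stand-in for one word's vector
print(moe_layer(token).shape)        # -> (16,)
```

Because only one expert's weights do any work on a given token, most of the network sits idle at any moment, and far less data has to shuttle between chips.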
And that is more efficient?
A lot more. But that is not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers elementary school math class can understand.
Is there math involved in this?
Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number that never ends: 3.14159265358979…
You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you get a pretty good estimate of a circle's circumference.
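In Python, the trade-off looks like this (a quick illustration, not anything from DeepSeek's paper):

```python
import math

radius = 10.0
exact = 2 * math.pi * radius  # circumference with full-precision pi
rough = 2 * 3.14 * radius     # circumference with pi cut to two decimals

print(exact)  # 62.83185307179586
print(rough)  # roughly 62.8, off by only about 0.05 percent
```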
DeepSeek did something similar, but on a much larger scale, in training its AI technology.
The math that allows a neural network to identify patterns in text is really just multiplication: lots and lots and lots of multiplication. We are talking months of multiplication across thousands of computer chips.
Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory: half the space. In essence, it lopped several decimals off each number.
This meant that each calculation was less precise. But that did not matter. The calculations were precise enough to produce a really powerful neural network.
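Here is a sketch of that squeeze in Python. DeepSeek's paper describes an 8-bit floating-point format; because numpy has no 8-bit float, this sketch substitutes 8-bit integers with a scale factor, which shows the same idea of trading decimals for space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two small matrices of weights at the usual 16-bit precision.
a = rng.standard_normal((64, 64)).astype(np.float16)
b = rng.standard_normal((64, 64)).astype(np.float16)

def squeeze_to_8_bits(x):
    # Map each value onto the 256 levels that fit in 8 bits, throwing
    # away the trailing decimals, just like writing pi as 3.14.
    scale = float(np.abs(x).max()) / 127.0
    return np.round(x / scale).astype(np.int8), scale

a8, a_scale = squeeze_to_8_bits(a)
b8, b_scale = squeeze_to_8_bits(b)

# Multiply the squeezed numbers. Each one takes half the memory, so
# more fit on a chip and less data has to travel between chips.
approx = (a8.astype(np.int32) @ b8.astype(np.int32)) * (a_scale * b_scale)

exact = a.astype(np.float64) @ b.astype(np.float64)
print(np.abs(approx - exact).mean())  # small next to entries of size ~8
```

The answers come out slightly wrong, but, as DeepSeek found, not wrong enough to matter.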
That's all?
Well, it added another trick.
After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem, making a key calculation that would help decide how the neural network would operate, it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. It made the answer more precise.
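The effect is easiest to see with a long running sum. The toy example below uses 16-bit inputs rather than 8-bit ones (again, numpy has no 8-bit float) and illustrates mixed-precision accumulation in general, not DeepSeek's actual GPU code:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000).astype(np.float16)
y = rng.standard_normal(10_000).astype(np.float16)

# Careless route: keep the running total in low precision too. A little
# rounding error creeps in on every one of the 10,000 additions.
low = np.float16(0)
for a, b in zip(x, y):
    low = np.float16(low + a * b)

# DeepSeek-style route: the inputs stay small, but the running total is
# stretched across 32 bits, so the key accumulation stays precise.
high = np.float32(0)
for a, b in zip(x, y):
    high += np.float32(a) * np.float32(b)

exact = float(np.dot(x.astype(np.float64), y.astype(np.float64)))
print(abs(float(low) - exact))   # noticeably off
print(abs(float(high) - exact))  # nearly exact
```

Multiplying small numbers while stretching the running total across 32 bits keeps most of the memory savings but protects the one calculation where tiny errors would pile up.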
So could any high schooler have done this?
Well, no. The DeepSeek engineers showed in their paper that they were also very good at writing the very complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.
Few people have that kind of skill. But serious AI labs have the talented engineers needed to match what DeepSeek has done.
Then why didn't they do this already?
Some AI labs may be using at least some of the same tricks already. Companies like OpenAI do not always reveal what they are doing behind closed doors.
But others were clearly surprised by DeepSeek's work. Doing what the start-up did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars, if not billions, in electrical power.
In other words, it requires enormous amounts of risk.
“You have to put a lot of money on the line to try new things, and often, they fail,” said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who specializes in building efficient AI systems and previously worked as an AI researcher at Meta.
“That is why we don't see much innovation: People are afraid to lose many millions just to try something that doesn't work,” he added.
Many experts pointed out that DeepSeek's $6 million covered only what the start-up spent when training the final version of the system. In their paper, the DeepSeek engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge AI project.
DeepSeek's experimentation paid off. Now, because the Chinese start-up has shared its methods with other artificial intelligence researchers, its technological tricks are poised to significantly reduce the cost of building AI.