- HIGGS is an innovative method for compressing large language models, developed by Yandex Research in collaboration with teams from MIT, KAUST, and ISTA.
- HIGGS compresses LLMs without additional data or resource-intensive parameter optimization.
- Unlike other compression methods, HIGGS does not require specialized hardware or powerful GPUs. Models can be quantized directly on a smartphone or laptop in just a few minutes without significant loss of quality.
- The method has already been used to quantize popular Llama 3.1 and 3.2-family models, as well as DeepSeek and Qwen-family models.
The Yandex Research team, together with researchers from the Massachusetts Institute of Technology (MIT), the Institute of Science and Technology Austria (ISTA), and the King Abdullah University of Science and Technology (KAUST), has developed a method to rapidly compress large language models without significant loss of quality.
Previously, deploying large language models on mobile devices or laptops involved a quantization process that took from hours to weeks and had to run on industrial-grade servers to maintain good quality. Now, quantization can be completed in a matter of minutes right on a smartphone or laptop, without industrial-grade servers or powerful GPUs.
HIGGS lowers the barrier to entry for testing and deploying new models on consumer-grade devices, such as home PCs and smartphones, by eliminating the need for industrial-grade computing power.
The innovative compression method furthers the company's commitment to making large language models accessible to everyone, from major players, SMBs, and non-profit organizations to individual contributors, developers, and researchers. Last year, Yandex researchers collaborated with leading science and technology universities to introduce two new LLM compression methods: Additive Quantization of Language Models (AQLM) and PV-Tuning. Combined, these methods can reduce model size by up to 8 times while maintaining 95% response quality.
LLM adoption barriers
Large language models require substantial computational resources, which makes them inaccessible and cost-prohibitive for most. This is also the case for open-source models, such as the popular DeepSeek R1, which cannot be easily deployed on even the most advanced servers designed for model training and other machine learning tasks.
As a result, access to these powerful models has traditionally been limited to a select few organizations with the necessary infrastructure and computing power, despite their public availability.
However, HIGGS can pave the way for broader accessibility. Developers can now reduce model size without sacrificing quality and run models on more affordable devices. For example, this method can be used to compress LLMs such as DeepSeek R1, with 671B parameters, and Llama 4 Maverick, with 400B parameters, which previously could only be quantized (compressed) with a significant loss in quality. This quantization technique unlocks new ways to use LLMs across various fields, especially in resource-constrained environments. Startups and independent developers can now leverage compressed models to build innovative products and services, while cutting costly hardware expenses.
Yandex is already using HIGGS to prototype, accelerate product development, and test ideas, since compressed models enable faster testing than their full-scale counterparts.
About the method
HIGGS (Hadamard Incoherence with Gaussian MSE-optimal GridS) compresses large language models without requiring additional data or gradient-descent methods, making quantization more accessible and efficient for a wide range of applications and devices. This is particularly valuable when suitable data for calibrating the model is unavailable. The method balances model quality, size, and quantization complexity, making it possible to run the models on a wide range of devices, from smartphones to consumer laptops.
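To make the idea concrete, below is a minimal, illustrative sketch of the two ingredients the name points to: a randomized Hadamard rotation that makes weight entries approximately Gaussian, followed by rounding each rotated value to a small grid chosen to minimize mean-squared error under a standard Gaussian. This is a conceptual approximation written for this article, not Yandex's implementation; the helper names are invented, and the 2-bit grid values are the classical Lloyd-Max levels for a Gaussian, used here as a stand-in for the paper's grids.

```python
# Conceptual sketch (not the official HIGGS code): data-free quantization
# via a randomized Hadamard rotation plus a Gaussian MSE-optimal grid.
import numpy as np
from scipy.linalg import hadamard

def random_hadamard_rotate(w, seed=0):
    """Rotate rows of w with a sign-randomized, orthonormal Hadamard matrix.

    The rotation mixes coordinates so entries look approximately Gaussian,
    which is what makes a fixed Gaussian-optimal grid a good fit.
    """
    n = w.shape[-1]
    assert n & (n - 1) == 0, "last dimension must be a power of two"
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=n)   # random diagonal D
    H = hadamard(n) / np.sqrt(n)              # orthonormal Hadamard H
    return (w * signs) @ H, signs             # w D H, plus D for de-rotation

def quantize_to_grid(x, grid):
    """Snap each entry to the nearest grid point after unit-variance scaling."""
    scale = x.std() + 1e-12
    idx = np.abs(x[..., None] / scale - grid).argmin(axis=-1)
    return grid[idx] * scale

# Approximate 4-level (2-bit) Lloyd-Max grid for a standard Gaussian,
# standing in for the MSE-optimal grids described in the paper.
grid = np.array([-1.510, -0.4528, 0.4528, 1.510])

w = np.random.randn(256, 256).astype(np.float32)   # a toy weight matrix
w_rot, signs = random_hadamard_rotate(w)           # no calibration data needed
w_q = quantize_to_grid(w_rot, grid)
# Inference would de-rotate with H^T and the sign diagonal (both orthogonal).
print("relative error:", np.linalg.norm(w_q - w_rot) / np.linalg.norm(w_rot))
```

Because both the Hadamard matrix and the sign diagonal are orthogonal, the rotation changes how the weights are represented but not what the layer computes, which is why no training data or gradient steps are needed to undo it at inference time.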
HIGGS was tested on Llama 3.1 and 3.2-family models, as well as Qwen-family models. Experiments show that HIGGS outperforms other data-free quantization methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantization), in terms of quality-to-size ratio.
Developers and researchers can already access the method on Hugging Face or explore the research paper, available on arXiv. Later this month, the team will present their paper at NAACL, one of the world's leading AI conferences.
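For those who want to try it, the sketch below shows how loading a HIGGS-quantized model might look through the transformers library, which documents a HiggsConfig quantization option. Treat this as a hedged example: the exact class name, the bits parameter, the model checkpoint, and the hardware and kernel requirements are assumptions to verify against the official Hugging Face documentation.

```python
# Hedged sketch: loading a model with HIGGS quantization via transformers.
# HiggsConfig and its `bits` argument follow the transformers docs as of
# writing; verify names, supported models, and kernel/GPU requirements
# in the official documentation before relying on this.
from transformers import AutoModelForCausalLM, AutoTokenizer, HiggsConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # example checkpoint, assumed

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=HiggsConfig(bits=4),  # 4-bit HIGGS quantization
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Quantization lets small devices", return_tensors="pt")
inputs = inputs.to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```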
Continued commitment to advancing science and optimization
This is one of several papers Yandex Research has presented on large language model quantization. For example, the team presented AQLM and PV-Tuning, two LLM compression methods that can reduce a company's computational budget by up to 8 times without significant loss in AI response quality. The team also built a service that lets users run an 8B model on a regular PC or smartphone via a browser-based interface, even without high computing power.
Beyond LLM quantization, Yandex has open-sourced several tools that optimize the resources used in LLM training. For example, the YaFSDP library accelerates LLM training by up to 25% and reduces the GPU resources required for training by up to 20%.
Earlier this year, Yandex developers open-sourced Perforator, a tool for continuous real-time monitoring and analysis of servers and applications. Perforator highlights code inefficiencies and provides actionable insights, helping companies reduce infrastructure costs by up to 20%. This could translate into potential savings of millions or even billions of dollars per year, depending on company size.
Check out the paper. All credit for this research goes to the researchers of this project. Note: Thanks to the Yandex team for the thought leadership/resources for this article. The Yandex team has financially supported this content/article.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a broad audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.