Large language models (LLMs) have transformed natural language processing, but face significant challenges in widespread deployment due to their high runtime cost. In this paper, we present SeedLM, a novel post-training compression method that uses seeds of a pseudo-random generator to encode and compress model weights. Specifically, for each block of weights, we find a seed that is fed into a Linear Feedback Shift Register (LFSR) during inference to efficiently generate a random matrix. This matrix is then linearly combined with compressed coefficients to reconstruct the weight block. SeedLM reduces memory access and takes advantage of idle compute cycles during inference, effectively speeding up memory-bound tasks by trading compute for fewer memory accesses. Unlike state-of-the-art methods that depend on calibration data, our approach is data-free and generalizes well across diverse tasks. Our experiments with Llama 3 70B, which is particularly challenging to compress, show zero-shot accuracy retention at 4- and 3-bit compression that is on par with or better than state-of-the-art methods, while maintaining performance comparable to FP16 baselines. In addition, FPGA-based tests show that 4-bit SeedLM, as model size increases, approaches a 4x speed-up over an FP16 Llama 2/3 baseline.
† Meta
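To make the reconstruction step concrete, below is a minimal sketch of the idea described in the abstract: a seed drives an LFSR to expand into a pseudo-random matrix, which is linearly combined with a few stored coefficients to approximate a weight block. The 16-bit register width and tap positions, the ±1 matrix entries, the block size, the number of coefficients, the least-squares fit, and the exhaustive seed search are all illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def lfsr_bits(seed: int, n_bits: int, taps=(16, 14, 13, 11), width: int = 16):
    """Generate a pseudo-random bit stream from a Fibonacci LFSR.

    The register width and tap positions are illustrative choices
    (a maximal-length 16-bit configuration), not the paper's exact setup.
    """
    state = seed & ((1 << width) - 1)
    assert state != 0, "LFSR seed must be non-zero"
    out = np.empty(n_bits, dtype=np.uint8)
    for i in range(n_bits):
        # XOR the tapped bits to form the feedback bit.
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        out[i] = state & 1                     # emit the low bit
        state = (state >> 1) | (fb << (width - 1))
    return out

def random_matrix(seed: int, rows: int, cols: int) -> np.ndarray:
    """Expand a seed into a {-1, +1} pseudo-random matrix, one sign bit per entry."""
    bits = lfsr_bits(seed, rows * cols)
    return (2.0 * bits - 1.0).reshape(rows, cols)

def compress_block(w: np.ndarray, seeds, k: int):
    """For each candidate seed, least-squares fit k coefficients so that
    U @ c approximates the flattened weight block; keep the best seed."""
    w_flat = w.reshape(-1, 1)
    best = None
    for s in seeds:
        U = random_matrix(s, w_flat.shape[0], k)
        c, *_ = np.linalg.lstsq(U, w_flat, rcond=None)
        err = np.linalg.norm(w_flat - U @ c)
        if best is None or err < best[0]:
            best = (err, s, c)
    return best[1], best[2]                    # (seed, coefficients)

def decompress_block(seed: int, coeffs: np.ndarray, shape) -> np.ndarray:
    """Regenerate the random matrix from the seed and recombine it with the coefficients."""
    U = random_matrix(seed, int(np.prod(shape)), coeffs.shape[0])
    return (U @ coeffs).reshape(shape)

# Example: approximate an 8x8 weight block with 3 coefficients over 256 candidate seeds.
rng = np.random.default_rng(0)
block = rng.standard_normal((8, 8)).astype(np.float32)
seed, coeffs = compress_block(block, seeds=range(1, 257), k=3)
approx = decompress_block(seed, coeffs, block.shape)
print("relative error:", np.linalg.norm(block - approx) / np.linalg.norm(block))
```

At inference time, only the seed and the small coefficient vector per block need to be read from memory; the random matrix is regenerated on the fly, which is what lets the method trade otherwise idle compute cycles for reduced memory traffic.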