LinkedIn has recently introduced the Liger Kernel (LinkedIn GPU Efficient Runtime), a collection of highly efficient Triton kernels designed specifically for training large language models (LLMs). The release is a notable step forward for machine learning, particularly for training large-scale models that demand substantial computational resources. The Liger Kernel is poised to become a fundamental tool for researchers, machine learning practitioners, and anyone eager to squeeze more efficiency out of their GPU training.
Introduction to the Liger Kernel
The Liger Kernel has been designed to address the increasing demands of LLM training by improving both speed and memory efficiency. The LinkedIn team has implemented several Hugging Face-compatible kernels, including RMSNorm, RoPE, SwiGLU, CrossEntropy, FusedLinearCrossEntropy, and more. These kernels are efficient on their own and also compatible with widely used tools such as Flash Attention, PyTorch FSDP, and Microsoft DeepSpeed, making them versatile across a range of training setups.
Main features and benefits
One of the most notable aspects of the Liger Kernel is its ability to increase multi-GPU training throughput by over 20% while reducing memory usage by up to 60%. This dual benefit is achieved through kernel fusion, in-place replacement, and chunking techniques that streamline the computations involved in LLM training (a simplified sketch of the chunking idea follows below). The kernel is deliberately lightweight: it depends only on Torch and Triton, eliminating the usual headaches of managing complex software dependencies.
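To make the chunking idea concrete, here is a minimal plain-PyTorch sketch of a chunked linear-plus-cross-entropy computation, the pattern behind FusedLinearCrossEntropy. It is illustrative only (forward pass, hypothetical shapes and chunk size): it shows how never materializing the full (tokens × vocabulary) logit matrix caps peak memory, whereas the actual Liger kernel fuses these steps, including the backward pass, into Triton kernels.

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden: torch.Tensor,
                                 lm_head: torch.Tensor,
                                 labels: torch.Tensor,
                                 chunk_size: int = 1024) -> torch.Tensor:
    """Cross-entropy over an LM head, computed chunk by chunk so the full
    (num_tokens x vocab_size) logit matrix is never materialized at once."""
    total_loss = hidden.new_zeros(())
    num_tokens = hidden.shape[0]
    for start in range(0, num_tokens, chunk_size):
        h = hidden[start:start + chunk_size]      # (chunk, hidden_dim)
        logits = h @ lm_head.t()                  # only (chunk, vocab) lives in memory
        y = labels[start:start + chunk_size]
        total_loss = total_loss + F.cross_entropy(logits, y, reduction="sum")
    return total_loss / num_tokens

# Example: 8K tokens, 4096-dim hidden states, a 128K-entry vocabulary.
hidden = torch.randn(8192, 4096)
lm_head = torch.randn(128256, 4096) * 0.02
labels = torch.randint(0, 128256, (8192,))
loss = chunked_linear_cross_entropy(hidden, lm_head, labels)
```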
The efficiency of the Liger Kernel is further exemplified by its ability to handle longer context lengths, larger batch sizes, and massive vocabularies without compromising performance. For example, where standard Hugging Face models can hit out-of-memory (OOM) errors at a 4K context length, the Liger Kernel can scale to 16K, substantially increasing model capacity and throughput.
Applications and use cases
The Liger Kernel is particularly beneficial for large-scale LLM training projects. For example, when training the Llama 3 8B model, it can deliver up to a 20% increase in training speed together with a 40% reduction in memory usage. This is especially useful for fine-tuning on datasets like Alpaca, where computational efficiency directly affects the cost and time of model development.
In more advanced scenarios, such as training the extra decoding heads of a multi-head LLM like Medusa, the Liger Kernel can reduce memory usage by an impressive 80% while simultaneously improving throughput by 40%. These improvements matter for researchers and practitioners pushing the boundaries of what is possible with LLMs, allowing them to experiment with larger models and more complex architectures without running into hardware limits.
Technical description
The Liger Kernel integrates several key Triton-based operations that improve LLM training performance, including RMSNorm, RoPE, SwiGLU, and FusedLinearCrossEntropy, each of which contributes to the kernel's overall efficiency. For example, RMSNorm normalizes activations by their root mean square; the Liger implementation of this step achieves roughly a three-fold improvement in both speed and peak memory usage.
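For readers unfamiliar with the operation, the following is a minimal plain-PyTorch reference for Llama-style RMSNorm. It shows the math only; the Liger version computes the same result but fuses the reduction, rescaling, and gain into a single Triton kernel.

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize each hidden vector by its root mean square (computed in fp32
    # for numerical stability), then apply a learned per-channel gain.
    variance = x.float().pow(2).mean(dim=-1, keepdim=True)
    normed = x.float() * torch.rsqrt(variance + eps)
    return (weight * normed).to(x.dtype)

# Example: a batch of 16-token sequences with 4096-dim hidden states.
hidden = torch.randn(2, 16, 4096, dtype=torch.float16)
gain = torch.ones(4096)
out = rms_norm(hidden, gain)  # same shape and dtype as the input
```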
Similarly, RoPE (Rotary Position Embedding) and SwiGLU (a Swish-gated linear unit) are implemented with in-place replacement techniques that significantly reduce memory usage and increase computational speed. The CrossEntropy loss, critical for language-model training, has also been optimized to cut peak memory usage by more than four times while roughly doubling execution speed.
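As a point of reference, here is what the SwiGLU feed-forward computes, again as a plain-PyTorch sketch with illustrative weight matrices and shapes rather than the fused, in-place Triton implementation that ships with Liger:

```python
import torch
import torch.nn.functional as F

def swiglu_mlp(x: torch.Tensor,
               w_gate: torch.Tensor, w_up: torch.Tensor,
               w_down: torch.Tensor) -> torch.Tensor:
    # Gate branch: SiLU ("Swish") over one projection, elementwise-multiplied
    # with a second "up" projection, then projected back down to the model dim.
    return (F.silu(x @ w_gate) * (x @ w_up)) @ w_down

# Example with Llama-like shapes (hidden 4096, intermediate 11008).
x = torch.randn(2, 16, 4096)
w_gate = torch.randn(4096, 11008) * 0.02
w_up = torch.randn(4096, 11008) * 0.02
w_down = torch.randn(11008, 4096) * 0.02
y = swiglu_mlp(x, w_gate, w_up, w_down)  # shape (2, 16, 4096)
```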
Ease of use and installation
Despite its advanced capabilities, the Liger kernel is designed to be easy to use and integrate into existing workflows. Users can patch their existing Hugging Face models with optimized Liger kernels using just one line of code. The kernel’s lightweight design also ensures that it supports multi-GPU setups, including PyTorch FSDP and DeepSpeed, without requiring extensive configuration or additional libraries.
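A typical integration looks like the following sketch, based on the monkey-patching API shown in the project's README; function names such as apply_liger_kernel_to_llama and the checkpoint path are illustrative and may evolve across releases.

```python
import transformers
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Patch the Hugging Face Llama implementation in place, so models loaded
# afterwards use Liger's Triton kernels (RMSNorm, RoPE, SwiGLU, fused
# linear cross-entropy) instead of the default modules.
apply_liger_kernel_to_llama()

# Load the model as usual; training code does not need to change.
model = transformers.AutoModelForCausalLM.from_pretrained("path/to/llama-checkpoint")
```

Because the patch is applied at the module level, it composes with Trainer scripts, FSDP, or DeepSpeed launchers without further configuration.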
The Liger Kernel can be installed via pip, with both stable and nightly versions available. This ease of installation, combined with its minimal dependencies, makes it accessible to a wide range of users, from machine learning experts to newcomers looking to improve their training efficiency.
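At the time of writing, the packages are published on PyPI under the following names (the nightly build tracks the latest changes):

```
pip install liger-kernel          # stable release
pip install liger-kernel-nightly  # nightly build
```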
Future prospects and community participation
LinkedIn is committed to continually improving the Liger Kernel and welcomes contributions from the community. By fostering collaboration, LinkedIn aims to gather the best kernels for LLM training and incorporate them into future versions of the Liger Kernel. This approach ensures that the kernel remains at the forefront of technological innovation in LLM training.
Conclusion
LinkedIn’s release of the Liger Kernel marks a major milestone in the evolution of large-scale model training. The Liger Kernel will become an indispensable tool for anyone involved in large-scale model training, offering a highly efficient, easy-to-use, and versatile solution. Its ability to dramatically improve both speed and memory efficiency will undoubtedly accelerate the development of more advanced and capable large-scale models, paving the way for breakthroughs in artificial intelligence.
Take a look at the GitHub repository. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.
Don't forget to join our 49k+ ML SubReddit.
Find upcoming AI webinars here.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary engineer and entrepreneur, Asif is committed to harnessing the potential of AI for social good. His most recent initiative is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound yet easily understandable to a wide audience. The platform has over 2 million monthly views, illustrating its popularity among readers.