Mixedbread.ai recently introduced Binary MRL, a 64-byte embedding method that addresses the challenge of scaling embeddings in natural language processing (NLP) applications, where their memory-intensive nature is a major bottleneck. Embeddings play a vital role in tasks such as recommendation systems, retrieval, and similarity search. However, their memory requirements pose a significant challenge, especially when dealing with massive datasets. The method aims to decrease the memory usage of embeddings while maintaining their usefulness and effectiveness in NLP applications.
Current state-of-the-art models produce high-dimensional embeddings (e.g., 1024 dimensions) encoded in float32 format, which require large amounts of memory for storage and retrieval. To address these limitations, Mixedbread.ai researchers drew on two main approaches: Matryoshka Representation Learning (MRL) and Vector Quantization. MRL reduces the number of output dimensions of an embedding model while maintaining accuracy. This is done by training the model to place the most important information in the earlier dimensions of the embedding, allowing the less important trailing dimensions to be truncated. Vector Quantization, on the other hand, reduces the size of each dimension by representing it as a binary value instead of a floating-point number.
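The MRL truncation step can be illustrated with a short sketch. This is not Mixedbread.ai's implementation; it assumes a hypothetical 1024-dimensional embedding from an MRL-trained model and shows how the trailing dimensions are cut off and the remainder re-normalized:

```python
import numpy as np

# Hypothetical 1024-dim float32 embedding from an MRL-trained model
# (random stand-in for illustration; a real one would come from the model).
rng = np.random.default_rng(0)
embedding = rng.standard_normal(1024).astype(np.float32)

def truncate_mrl(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` dimensions and re-normalize to unit length,
    relying on MRL having packed the most important information up front."""
    truncated = vec[:dims]
    return truncated / np.linalg.norm(truncated)

short = truncate_mrl(embedding, 512)
print(short.shape)  # (512,) -- half the storage of the original vector
```

Re-normalizing after truncation keeps cosine-similarity comparisons meaningful between truncated vectors.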
The proposed approach, Binary MRL, combines both methods to achieve dimensionality reduction and embedding compression simultaneously. By integrating MRL and Vector Quantization, Binary MRL aims to retain the semantic information encoded in embeddings while significantly reducing their memory footprint.
Binary MRL achieves compression by first reducing the number of output dimensions of the embedding model using MRL techniques. This involves training the model to preserve important information in fewer dimensions, allowing truncation of the less relevant ones. Vector Quantization is then applied to represent each dimension of the reduced-dimensional embedding as a binary value. This binary representation drastically reduces the memory usage of embeddings while preserving semantic information. Evaluation of Binary MRL on several datasets demonstrates that the method can retain more than 90% of the original model's performance with significantly smaller embeddings.
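The quantization step described above can be sketched as follows. This is a minimal illustration, not the library's actual code: it assumes 512-dimensional truncated embeddings, quantizes each dimension to a single bit by sign, packs the bits into bytes, and compares vectors with Hamming distance, a common retrieval metric for binary embeddings:

```python
import numpy as np

# Hypothetical batch of four 512-dim MRL-truncated float32 embeddings
# (random stand-ins for illustration).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((4, 512)).astype(np.float32)

def binarize(vecs: np.ndarray) -> np.ndarray:
    """Quantize each dimension to one bit (by sign) and pack 8 bits per byte."""
    bits = (vecs > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count differing bits between two packed binary embeddings."""
    return int(np.unpackbits(a ^ b).sum())

codes = binarize(embeddings)
print(codes.shape)  # (4, 64): 64 bytes per vector vs. 2048 bytes as float32
```

A 512-dimensional binary vector occupies 64 bytes, matching the "64-byte embedding" figure, a 32x reduction over the 2048 bytes of a float32 vector of the same dimensionality.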
In conclusion, Binary MRL represents a novel approach to the scalability challenges of embeddings in NLP applications. By combining MRL and Vector Quantization techniques, it achieves significant embedding compression while preserving their usefulness and effectiveness. This method not only reduces retrieval costs at scale but also enables tasks that were previously infeasible due to memory limitations.
Pragati Jhunjhunwala is a Consulting Intern at MarktechPost. She is currently pursuing a B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a technology enthusiast with a keen interest in data science software and applications, and is always reading about advancements in different fields of AI and ML.