Today, we are pleased to announce the availability of binary embeddings for Amazon Titan Text Embeddings V2 in Amazon Bedrock Knowledge Bases and Amazon OpenSearch Serverless. With support for binary embeddings in Amazon Bedrock and a binary vector store in OpenSearch Serverless, you can build Retrieval Augmented Generation (RAG) applications in Amazon Bedrock Knowledge Bases while reducing memory usage and overall costs.
Amazon Bedrock is a fully managed service that provides a single API to access and use multiple high-performing foundation models (FMs) from leading AI companies. Amazon Bedrock also offers a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock Knowledge Bases, FMs and agents can retrieve contextual information from your company's private data sources for RAG. RAG helps FMs deliver more relevant, accurate, and personalized responses.
Amazon Titan Text Embeddings models generate meaningful semantic representations of documents, paragraphs, and sentences. Amazon Titan Text Embeddings takes a body of text as input and generates a vector of 1024 (default), 512, or 256 dimensions. Amazon Titan Text Embeddings are offered through latency-optimized endpoint invocation for faster search (recommended during the retrieval step) and throughput-optimized batch jobs for faster indexing. With binary embeddings, Amazon Titan Text Embeddings V2 represents data as binary vectors, with each dimension encoded as a single binary digit (0 or 1). This binary representation converts high-dimensional data into a more efficient format for storage and computation.
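One common way to picture this quantization is sign-based thresholding, where each positive dimension maps to 1 and the rest to 0. The sketch below illustrates the idea only; Titan's internal quantization scheme is not documented here and may differ:

```python
import numpy as np

def binarize(embedding):
    """Illustrative binary quantization: each dimension of a float32
    embedding becomes 1 if positive, else 0 (a common sign-threshold
    scheme; not necessarily Titan's exact internal method)."""
    return (np.asarray(embedding, dtype=np.float32) > 0).astype(np.int8)

vec = np.array([0.12, -0.48, 0.0, 0.93], dtype=np.float32)
print(binarize(vec))  # -> [1 0 0 1]
```

Each float dimension (32 bits) collapses to a single bit, which is where the storage reduction comes from.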
Amazon OpenSearch Serverless is a serverless deployment option for Amazon OpenSearch Service, a fully managed service that makes it simple to perform interactive log analytics, real-time application monitoring, website search, and vector search with its k-nearest neighbor (kNN) plugin. The kNN plugin supports both exact and approximate nearest neighbor algorithms, along with multiple storage and distance engines. It makes it easy for you to build machine learning (ML) augmented search experiences, generative AI applications, and modern analytics workloads without having to manage the underlying infrastructure.
The OpenSearch Serverless kNN plugin now supports binary vectors and 16-bit floating-point vectors (FP16), in addition to 32-bit floating-point vectors (FP32). You can store binary embeddings generated by Amazon Titan Text Embeddings V2 at lower cost by setting the kNN vector field type to binary. Vectors can be stored in and searched from OpenSearch Serverless using the PUT and GET APIs.
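As a sketch, an index mapping for a binary kNN field might look like the following. The index and field names are placeholders, and the method parameters reflect the OpenSearch binary vector documentation (binary vectors use the Faiss engine with Hamming distance, and `dimension` counts bits, so it must be a multiple of 8); verify the exact options against the current OpenSearch Serverless documentation:

```python
# Illustrative index mapping for a binary kNN vector field.
# "docs" and "embedding" are placeholder names.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,  # 1024 bits, stored as 128 bytes
                "data_type": "binary",
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "hamming",
                },
            }
        }
    },
}

# With an opensearch-py client, this body would be passed to:
#   client.indices.create(index="docs", body=index_body)
```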
This post summarizes the benefits of this new binary vector support across Amazon Titan Text Embeddings, Amazon Bedrock Knowledge Bases, and OpenSearch Serverless, and shows you how to get started. The following diagram shows a high-level architecture using Amazon Bedrock Knowledge Bases with Amazon OpenSearch Serverless.
You can reduce latency, storage costs, and memory requirements in OpenSearch Serverless and Amazon Bedrock Knowledge Bases with minimal reduction in retrieval quality.
We ran the Massive Text Embedding Benchmark (MTEB) retrieval datasets with binary embeddings. On these datasets, we reduced storage while observing a 25x improvement in latency. Binary embeddings maintained 98.5% of the retrieval accuracy with reranking, and 97% without reranking, compared to the results we obtained using full-precision (FP32) embeddings. In end-to-end RAG benchmark comparisons with full-precision embeddings, binary embeddings with Amazon Titan Text Embeddings V2 retained 99.1% of the full-precision answer correctness (98.6% without reranking). We encourage customers to run their own benchmarks using Amazon OpenSearch Serverless and binary embeddings for Amazon Titan Text Embeddings V2.
OpenSearch Serverless benchmarks using the Hierarchical Navigable Small Worlds (HNSW) algorithm with binary vectors have shown a 50% reduction in search OpenSearch Compute Units (OCUs), resulting in cost savings for users. Binary indexes have also delivered significantly faster retrieval times. Traditional search methods often rely on computationally intensive calculations such as L2 and cosine distance, which can be resource-intensive. In contrast, binary indexes in Amazon OpenSearch Serverless operate on Hamming distance, a more efficient approach that speeds up search queries.
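To see why Hamming distance is cheap, note that on bit-packed binary vectors it reduces to an XOR followed by a bit count, with no floating-point math. A minimal numpy sketch:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two bit-packed binary vectors:
    XOR the bytes, then count the differing bits."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two 8-bit binary vectors, each packed into a single byte
a = np.packbits([0, 1, 1, 0, 1, 0, 0, 1])
b = np.packbits([0, 1, 0, 0, 1, 1, 0, 1])
print(hamming(a, b))  # bits differ at positions 2 and 5 -> 2
```

Compare this to L2 or cosine distance over 1024 FP32 dimensions, which requires 1024 floating-point multiplies per comparison.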
In the following sections, we discuss how to generate binary embeddings with Amazon Titan Text Embeddings V2, use binary vectors (and FP16) with the OpenSearch Serverless vector engine, and enable the binary embeddings option in Amazon Bedrock Knowledge Bases. To learn more about Amazon Bedrock Knowledge Bases, see Knowledge Bases now delivers fully managed RAG experience in Amazon Bedrock.
Generate binary embeddings with Amazon Titan Text Embeddings V2
Amazon Titan Text Embeddings V2 now supports binary embeddings and is optimized for performance and retrieval accuracy at different dimension sizes (1024, 512, or 256), with text support for more than 100 languages. By default, Amazon Titan Text Embeddings models produce embeddings with 32-bit floating-point (FP32) precision. Although using a 1024-dimensional vector of FP32 embeddings helps achieve higher accuracy, it also leads to large storage requirements and related costs in retrieval use cases.
To generate binary embeddings in your code, add the embeddingTypes parameter to your invoke_model API request for Amazon Titan Text Embeddings V2:
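A minimal sketch of such a request using boto3 follows. The region and input text are illustrative, and the actual invocation requires AWS credentials with access to the Titan Text Embeddings V2 model:

```python
import json

def build_titan_request(text, dimensions=1024, embedding_types=("binary",)):
    """Build the request body for Amazon Titan Text Embeddings V2.
    Pass ("float", "binary") to receive both embedding types."""
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,
        "embeddingTypes": list(embedding_types),
    })

def embed_binary(text, region="us-east-1"):
    """Invoke the model and return the binary embedding.
    Requires boto3 and AWS credentials with Bedrock model access."""
    import boto3
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=build_titan_request(text),
        accept="application/json",
        contentType="application/json",
    )
    result = json.loads(response["body"].read())
    # Binary embeddings are returned under "embeddingsByType"
    return result["embeddingsByType"]["binary"]
```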
As shown in the preceding request, you can request the binary embedding alone or both the binary and float embeddings. The returned binary embedding is a binary vector of length 1024, similar to the following:
array((0, 1, 1, ..., 0, 0, 0), dtype=int8)
For more information and sample code, see Amazon Titan Embeddings Text in the Amazon Bedrock documentation.
Configure Amazon Bedrock Knowledge Bases with binary vector embeddings
You can use Amazon Bedrock Knowledge Bases to take advantage of binary embeddings with Amazon Titan Text Embeddings V2 and the binary and 16-bit floating-point (FP16) vector engine in Amazon OpenSearch Serverless, without writing a single line of code. Follow these steps:
- In the Amazon Bedrock console, create a knowledge base. Provide the knowledge base details, including name and description, and create a new service role or use an existing one with the relevant AWS Identity and Access Management (IAM) permissions. For information about creating service roles, see Service roles. Under Choose data source, choose Amazon S3, as shown in the following screenshot. Choose Next.
- Configure the data source. Enter a name and description. Define the S3 source URI. Under Chunking and parsing configurations, choose Default. Choose Next to continue.
- Complete the knowledge base setup by selecting an embeddings model. For this walkthrough, select Titan Text Embeddings v2. Under Embeddings type, choose Binary vector embeddings. Under Vector dimensions, choose 1024. Choose Quick create a new vector store. This option configures a new Amazon OpenSearch Serverless vector store that supports the binary data type.
You can query the knowledge base details after creation to monitor the synchronization status of the data source. Once synchronization is complete, you can test the knowledge base and check the FM responses.
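Once the data source has synchronized, you can also query the knowledge base programmatically. A hedged sketch using the Bedrock Knowledge Bases Retrieve API follows; the knowledge base ID is a placeholder you obtain from the console, and the call requires boto3 plus AWS credentials with access to the knowledge base:

```python
def build_retrieve_params(kb_id, query, top_k=5):
    """Request parameters for the Bedrock Knowledge Bases Retrieve API."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }

def retrieve_chunks(kb_id, query):
    """Return the retrieved text chunks for a query.
    Requires boto3 and AWS credentials; kb_id comes from the console."""
    import boto3
    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(**build_retrieve_params(kb_id, query))
    return [r["content"]["text"] for r in response["retrievalResults"]]
```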
Conclusion
As we've explored throughout this post, binary embeddings are now an option in the Amazon Titan Text Embeddings V2 model available on Amazon Bedrock, alongside the binary vector store in OpenSearch Serverless. Together, these features significantly reduce memory and disk requirements on Amazon Bedrock and OpenSearch Serverless, resulting in fewer OCUs for the RAG solution. You will also see better performance and lower latency, with some impact on result accuracy compared to using the full-precision (FP32) data type. Although the drop in accuracy is minimal, you must decide whether it suits your application. Specific benefits will vary depending on factors such as data volume, search traffic, and storage requirements, but the examples discussed in this post illustrate the potential value.
Support for binary embeddings in Amazon OpenSearch Serverless, Amazon Bedrock Knowledge Bases, and Amazon Titan Text Embeddings V2 is available today in all AWS Regions where these services are already available. See the Region list for details and future updates. For more information about Amazon Bedrock Knowledge Bases, visit the Amazon Bedrock Knowledge Bases product page. To learn more about Amazon Titan Text Embeddings, visit Amazon Titan in Amazon Bedrock. To learn more about Amazon OpenSearch Serverless, visit the Amazon OpenSearch Serverless product page. For pricing details, see the Amazon Bedrock pricing page.
Try the new feature in the Amazon Bedrock console today. Send feedback to <a target="_blank" href="https://repost.aws/tags/TAQeKlaPaNRQ2tWB6P7KrMag/amazon-bedrock" rel="noopener">AWS re:Post for Amazon Bedrock</a> or through your usual AWS contacts, and engage with the generative AI builder community at <a target="_blank" href="https://community.aws/generative-ai?trk=4a84f0b4-a654-4729-9e51-6d6dc54134f2&sc_channel=el" rel="noopener">community.aws</a>.
About the authors
Shreyas Subramanian is a Principal Data Scientist, helping customers use generative AI and deep learning to solve their business challenges using AWS services. Shreyas has a background in large-scale optimization and ML, and in using ML and reinforcement learning to accelerate optimization tasks.

Rum Widha is a Senior Software Development Manager at Amazon Bedrock Knowledge Bases, helping customers easily build scalable RAG applications.

Satish Nandi is a Senior Product Manager for Amazon OpenSearch Service. He focuses on OpenSearch Serverless and has years of experience in networking, security, and AI/ML. He holds a bachelor's degree in computer science and an MBA in entrepreneurship. In his free time, he enjoys flying airplanes, gliding, and riding motorcycles.

Vamshi Vijay Nakkirtha is a Senior Software Development Manager working on the OpenSearch Project and Amazon OpenSearch Service. His main interests include distributed systems.