The Cohere Rerank 3 Nimble foundation model (FM) is now generally available in Amazon SageMaker JumpStart. It is the newest FM in Cohere’s Rerank series, designed to enhance enterprise search and Retrieval Augmented Generation (RAG) systems.
In this post we discuss the benefits and capabilities of this new model with some examples.
Overview of Cohere Rerank Models
Cohere's Rerank family of models is designed to enhance existing enterprise search systems and RAG systems. Rerank models improve search accuracy compared to keyword-based and embedding-based search systems. Cohere Rerank 3 is designed to rerank documents retrieved by initial search algorithms based on their relevance to a given query. A reranking model, also known as a cross-encoder, is a type of model that, given a query-document pair, will output a similarity score. For search models, words, sentences, or entire documents are typically encoded as dense vectors in a semantic space. By computing the cosine of the angle between these vectors, you can quantify their semantic similarity and output a single similarity score. You can use this score to rerank documents by relevance to your query.
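To make the similarity-score idea concrete, the following is a minimal sketch of ranking documents by the cosine of the angle between dense vectors. It is not Cohere's implementation: the three-dimensional vectors are hypothetical toy embeddings, and the function names are made up for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
query = [1.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 0.9],  # nearly parallel to the query
    [0.5, 0.5, 0.5],  # partially aligned with the query
]
ranking = rank_by_similarity(query, docs)
print(ranking)
```

In this toy data, the second document points in nearly the same direction as the query, so it is ranked first.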
Cohere Rerank 3 Nimble is the newest model in Cohere’s Rerank family of models, designed to improve the speed and efficiency of its predecessor Cohere Rerank 3. Based on Cohere benchmark tests, including BEIR (Benchmarking IR) for accuracy and internal benchmark datasets, Cohere Rerank 3 Nimble maintains high accuracy and is approximately 3-5x faster than Cohere Rerank 3. The speed improvement is designed for enterprises looking to enhance their search capabilities without sacrificing performance.
The following diagram represents the two-stage retrieval of a RAG sequence and illustrates where Cohere Rerank 3 Nimble is incorporated into the search sequence.
In the first stage of retrieval in the RAG architecture, a set of candidate documents is returned based on the knowledge base that is relevant to the query. In the second stage, Cohere Rerank 3 Nimble analyzes the semantic relevance between the query and each retrieved document, re-ranking them from most relevant to least relevant. The top-ranked documents complement the original query with additional context. This process improves the quality of search results by identifying the most relevant documents. Integrating Cohere Rerank 3 Nimble into a RAG system allows users to send fewer but higher-quality documents to the language model for informed generation. This results in higher accuracy and relevance of search results without adding latency.
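The two stages described above can be sketched as follows. This is an illustrative toy, not the architecture's actual code: the first stage uses simple word overlap, and `toy_relevance` is a hypothetical stand-in for the call to a deployed reranking endpoint. All function names and sample documents are assumptions made for this example.

```python
def first_stage_retrieve(query, corpus, k):
    """Toy candidate retrieval: rank documents by shared lowercase words."""
    q_terms = set(query.lower().split())
    scored = []
    for i, doc in enumerate(corpus):
        overlap = len(q_terms & set(doc.lower().split()))
        if overlap > 0:
            scored.append((overlap, i))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [corpus[i] for _, i in scored[:k]]


def toy_relevance(query, doc):
    """Hypothetical scorer (Jaccard overlap); a reranker would go here."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms | d_terms)


def second_stage_rerank(query, candidates, score_fn, top_n):
    """Score every (query, document) pair and keep the top_n documents."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def build_prompt(query, context_docs):
    """Complement the original query with the top-ranked documents."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


corpus = [
    "Your order has shipped and will arrive Friday",
    "How to return an item for a refund",
    "Return policy: items can be returned within 30 days",
    "Quarterly sales meeting moved to Monday",
]
query = "how do I return an item"
candidates = first_stage_retrieve(query, corpus, k=3)
top_docs = second_stage_rerank(query, candidates, toy_relevance, top_n=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

Only the documents that survive the second stage are passed to the language model, which is how the pipeline sends fewer but higher-quality documents for generation.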
SageMaker JumpStart Overview
SageMaker JumpStart provides access to a wide selection of publicly available foundation models. These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.
Amazon SageMaker is a fully managed, end-to-end machine learning (ML) platform that covers the entire ML workflow. It offers tools for every stage of the ML lifecycle, from data preparation to model deployment and monitoring. Data scientists and developers can use SageMaker’s integrated development environment (IDE) to access a wide range of pre-built algorithms, customize their own models, and scale their solutions. The platform abstracts away the complexities of infrastructure management, allowing you to focus on innovation rather than operational overhead. SageMaker’s automated machine learning (AutoML) features enable even non-experts to build sophisticated models. Additionally, its governance features help organizations maintain control and transparency over their ML projects, addressing critical concerns around regulatory compliance.
Prerequisites
Make sure your SageMaker AWS Identity and Access Management (IAM) service role has the AmazonSageMakerFullAccess permission policy attached.
To successfully deploy Cohere Rerank 3 Nimble, confirm one of the following options:
- Make sure your IAM role has the following three permissions and that you have the authority to make AWS Marketplace subscriptions in the AWS account used:
  - aws-marketplace:ViewSubscriptions
  - aws-marketplace:Unsubscribe
  - aws-marketplace:Subscribe
- Alternatively, confirm that your AWS account has a subscription to the model. If so, you can skip the following deployment instructions and begin with the model package subscription.
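As an illustration of the first option, the following sketch builds an IAM policy document that grants the three AWS Marketplace actions listed above. The statement ID and the `"Resource": "*"` scope are assumptions made for this example; adjust them to your organization's security requirements before attaching the policy to a role.

```python
import json

# The three actions come from the prerequisites above.
MARKETPLACE_ACTIONS = [
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Unsubscribe",
    "aws-marketplace:Subscribe",
]


def build_marketplace_policy(actions=MARKETPLACE_ACTIONS):
    """Return an IAM policy document allowing the given Marketplace actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowMarketplaceSubscriptions",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


policy_json = json.dumps(build_marketplace_policy(), indent=2)
print(policy_json)
```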
Deploy Cohere Rerank 3 Nimble in SageMaker JumpStart
You can access the Cohere Rerank 3 model family using SageMaker JumpStart in Amazon SageMaker Studio, as shown in the following screenshot.
Deployment starts when you choose Deploy, and you may be prompted to subscribe to this model through AWS Marketplace. If you are already subscribed, you can choose Deploy again to deploy the model. After the deployment is complete, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK.
Subscribe to the model package
To subscribe to the model package, complete the following steps:
- Depending on the model you want to deploy, open the model package listing page for cohere-rerank-nimble-english or cohere-rerank-nimble-multilingual.
- On the AWS Marketplace listing, choose Continue to subscribe.
- On the Subscribe to this software page, review the terms and choose Accept Offer if you and your organization agree with the EULA, pricing, and support terms.
- Choose Continue with setup and then choose an AWS Region.
A Product ARN will be displayed. This is the model package ARN that you must specify when creating a deployable model with Boto3.
Deploy Cohere Rerank 3 Nimble using the SDK
To deploy the model using the SDK, copy the product ARN from the previous step and specify it in the model_package_arn in the following code:
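The original listing is not reproduced here. As a hedged sketch of this step using Boto3, the following builds the request for the SageMaker `create_model` API from a model package ARN. The ARN, role, and model name are placeholders, and `EnableNetworkIsolation` is set on the assumption that the AWS Marketplace model package requires it.

```python
def build_create_model_request(model_name, model_package_arn, execution_role_arn):
    """Build keyword arguments for the SageMaker create_model API call."""
    if ":model-package/" not in model_package_arn:
        raise ValueError("expected a SageMaker model package ARN")
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": execution_role_arn,
        "PrimaryContainer": {"ModelPackageName": model_package_arn},
        # Assumption: Marketplace model packages need network isolation.
        "EnableNetworkIsolation": True,
    }


def create_model(sagemaker_client, request):
    """Create the deployable model; expects boto3.client('sagemaker')."""
    return sagemaker_client.create_model(**request)


# Placeholder values; replace with the product ARN shown after subscribing.
model_package_arn = (
    "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-package"
)
request = build_create_model_request(
    model_name="cohere-rerank-nimble",  # hypothetical model name
    model_package_arn=model_package_arn,
    execution_role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(request["PrimaryContainer"])
```

Separating the request builder from the API call lets you inspect the request without AWS credentials; pass a real SageMaker client to `create_model` when you are ready to deploy.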
After you specify the model package ARN, you can create the endpoint, as shown in the following code. Specify the endpoint name, the instance type, and the number of instances to use. Make sure your account-level service quota allows one or more ml.g5.xlarge instances for endpoint usage. To request a service quota increase, see AWS Service Quotas.
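A minimal sketch of endpoint creation with Boto3 follows; it is an illustration under stated assumptions rather than the post's original listing. The endpoint name, model name, variant name, and the `-config` suffix are hypothetical, and the functions expect a client created with `boto3.client("sagemaker")`.

```python
def build_endpoint_requests(endpoint_name, model_name,
                            instance_type="ml.g5.xlarge", instance_count=1):
    """Build requests for create_endpoint_config and create_endpoint."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    config_name = f"{endpoint_name}-config"  # hypothetical naming scheme
    config_request = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    }
    endpoint_request = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": config_name,
    }
    return config_request, endpoint_request


def create_endpoint(sagemaker_client, config_request, endpoint_request):
    """Create the endpoint and block until it is in service."""
    sagemaker_client.create_endpoint_config(**config_request)
    sagemaker_client.create_endpoint(**endpoint_request)
    waiter = sagemaker_client.get_waiter("endpoint_in_service")
    waiter.wait(EndpointName=endpoint_request["EndpointName"])


config_request, endpoint_request = build_endpoint_requests(
    endpoint_name="cohere-rerank-nimble-endpoint",  # hypothetical name
    model_name="cohere-rerank-nimble",              # hypothetical name
)
print(config_request["ProductionVariants"][0]["InstanceType"])
```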
If the endpoint is already created, you just need to connect to it with the following code:
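One way to confirm that an existing endpoint is usable before sending requests is to check its status with the SageMaker `describe_endpoint` API. The helper name below is hypothetical, and the function expects a client created with `boto3.client("sagemaker")`; this is a sketch, not the post's original connection code.

```python
def endpoint_is_ready(sagemaker_client, endpoint_name):
    """Return True if the named endpoint exists and is in service."""
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"] == "InService"


# Example usage (requires AWS credentials, so it is left as a comment):
#   import boto3
#   sm = boto3.client("sagemaker", region_name="us-east-1")
#   if endpoint_is_ready(sm, "cohere-rerank-nimble-endpoint"):
#       runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
```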
Follow a similar process as detailed above to deploy Cohere Rerank 3 in SageMaker JumpStart.
Example of inference with Cohere Rerank 3 Nimble
Cohere Rerank 3 Nimble offers robust multilingual support. The model is available in English and multilingual versions supporting over 100 languages.
The following code example illustrates how to perform real-time inference using Cohere Rerank 3 Nimble-English:
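The original listing is not reproduced here. The following sketch shows one way to send a rerank request with the SageMaker Runtime `invoke_endpoint` API. The request field names (`query`, `documents`, `top_n`, `rank_fields`) are assumptions about the model's request schema, so check the model card before relying on them. The emails and the query are hypothetical sample data.

```python
import json


def build_rerank_payload(query, documents, top_n, rank_fields=None):
    """Serialize a rerank request body (field names are assumed)."""
    payload = {"query": query, "documents": documents, "top_n": top_n}
    if rank_fields:
        payload["rank_fields"] = rank_fields
    return json.dumps(payload)


def rerank(runtime_client, endpoint_name, payload):
    """Invoke the endpoint; expects boto3.client('sagemaker-runtime')."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=payload.encode("utf-8"),
    )
    return json.loads(response["Body"].read())


# Hypothetical sample emails for illustration only.
documents = [
    {"Title": "Order update", "Content": "Your order has shipped."},
    {"Title": "Return request", "Content": "I would like to return my shoes."},
    {"Title": "Team lunch", "Content": "Lunch is moved to Thursday."},
]
payload = build_rerank_payload(
    query="What emails have been about returning items?",
    documents=documents,
    top_n=2,
    rank_fields=["Title", "Content"],
)
print(payload)
```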
In the following code, the top_n inference parameter for Cohere Rerank 3 and Rerank 3 Nimble specifies the number of top-ranked results to return after reranking the input documents. It allows you to control how many of the most relevant documents are included in the final result. To determine an optimal value for top_n, consider factors such as the diversity of your document set, the complexity of your queries, and the desired balance between accuracy and latency for enterprise or RAG search.
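To illustrate what top_n controls, the following sketch applies the same truncation locally to a list of already scored results. The `index` and `relevance_score` field names and the score values are hypothetical and are not actual model output.

```python
def take_top_n(scored_results, top_n):
    """Keep the top_n results, ordered from most to least relevant."""
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    ordered = sorted(
        scored_results, key=lambda r: r["relevance_score"], reverse=True
    )
    return ordered[:top_n]


# Hypothetical scores for four input documents.
scored = [
    {"index": 0, "relevance_score": 0.12},
    {"index": 1, "relevance_score": 0.97},
    {"index": 2, "relevance_score": 0.55},
    {"index": 3, "relevance_score": 0.03},
]
top_two = take_top_n(scored, top_n=2)
print([r["index"] for r in top_two])
```

A smaller top_n passes less context to the downstream language model, while a larger value reduces the risk of dropping a relevant document.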
The following is the result of Cohere Rerank 3 Nimble-English:
Cohere Rerank 3 Nimble multilingual support
Cohere Rerank 3 Nimble-Multilingual capabilities enable global organizations to deliver consistent and improved search experiences to users across different regions and language preferences.
In the following example, we create an input payload for a list of emails in multiple languages. We can take the same set of previous emails and translate them into different languages. These examples are available in the SageMaker JumpStart model card and are randomly generated for this example.
Use the following code to perform real-time inferences using Cohere Rerank 3 Nimble-Multilingual:
The following is the result of Cohere Rerank 3 Nimble-Multilingual:
The output translated into English is as follows:
In both examples, the relevance scores are normalized to be in the range (0, 1). Scores close to 1 indicate high relevance to the query, and scores close to 0 indicate low relevance.
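Because the scores are normalized, a fixed cutoff can be used to discard weak matches. The following is a minimal sketch under assumptions: the 0.5 threshold, the field names, and the score values are hypothetical and should be tuned against your own data.

```python
def filter_by_relevance(results, min_score):
    """Keep only results whose normalized score meets the threshold."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0 and 1")
    return [r for r in results if r["relevance_score"] >= min_score]


# Hypothetical normalized scores.
results = [
    {"index": 4, "relevance_score": 0.91},
    {"index": 0, "relevance_score": 0.48},
    {"index": 2, "relevance_score": 0.07},
]
relevant = filter_by_relevance(results, min_score=0.5)
print([r["index"] for r in relevant])
```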
Suitable use cases for Cohere Rerank 3 Nimble
The Cohere Rerank 3 Nimble model offers an option that prioritizes efficiency. The model is ideal for businesses looking to enable their customers to accurately search complex documentation, build applications that understand over 100 languages, and retrieve the most relevant information from multiple data warehouses. In industries like retail, where website abandonment increases with every 100 milliseconds added to search response time, having a faster AI model like Cohere Rerank 3 Nimble powering the enterprise search system translates into higher conversion rates.
Conclusion
Cohere Rerank 3 and Rerank 3 Nimble are now available in SageMaker JumpStart. To get started, see Train, deploy, and evaluate pre-trained models with SageMaker JumpStart.
Interested in diving deeper? Check out the Cohere on AWS GitHub repository.
About the authors
Breanne Warner is an Enterprise Solutions Architect at Amazon Web Services supporting customers in the Healthcare and Life Sciences (HCLS) industries. She is passionate about helping customers use generative AI on AWS and driving model adoption. Breanne also serves on the Women@Amazon board as Co-Director of Allyship to foster an inclusive and diverse culture at Amazon. Breanne holds a Bachelor of Science in Computer Engineering from the University of Illinois at Urbana-Champaign (UIUC).
Niithiyn Vijeaswaran is a Solutions Architect at AWS. His area of expertise is generative AI and AWS AI Accelerators. He holds a BS in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to help AWS customers on multiple fronts and accelerate their adoption of generative AI. He is an avid Dallas Mavericks fan and enjoys collecting sneakers.
Karan Singh is a Generative AI Specialist for third-party models at AWS, where he works with top-tier third-party foundation model providers to define and execute joint GTM motions that help customers train, deploy, and scale foundation models. Karan holds a Bachelor of Science in Electrical Engineering and Instrumentation from Manipal University and a Master of Science in Electrical Engineering from Northwestern University, and is currently an MBA candidate at the Haas School of Business at the University of California, Berkeley.