Generative AI and large language models (LLMs) are transforming organizations across industries, improving customer experiences in ways that traditionally would have taken years of incremental progress. Every organization has data stored in data stores, either on premises or with cloud providers.
You can adopt generative AI and improve the customer experience by converting your existing data into an index that generative AI can search over. When you ask a question of an open source LLM, you get publicly available information in response. Although this is useful, generative AI can help you gain insights from your own data combined with the broader knowledge of the LLM. This is achieved through Retrieval Augmented Generation (RAG).
RAG retrieves data from a preexisting knowledge base (your data), combines it with the LLM's knowledge, and generates responses in more natural, human-like language. However, for generative AI to understand your data, some amount of data preparation is typically required, which involves a steep learning curve.
Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database built for the cloud. Aurora combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases.
In this post, we walk you through how to convert your existing Aurora data into an index, without the need for data preparation, so Amazon Kendra can perform data search, and how to implement RAG that combines your data with the LLM's knowledge to produce accurate responses.
Solution overview
In this solution, you use your existing data as the data source (Aurora), create an intelligent search service by connecting and syncing your data source to an Amazon Kendra search index, and perform generative AI data search, which uses RAG to produce accurate responses by combining your data with the LLM's knowledge. For this post, we use Anthropic's Claude on Amazon Bedrock as our LLM.
The following are the high-level steps for the solution:
- Create an Aurora PostgreSQL cluster.
- Ingest data into Aurora PostgreSQL-Compatible Edition.
- Create an Amazon Kendra index.
- Configure the Amazon Kendra Aurora PostgreSQL connector and sync the data source.
- Invoke the RAG application.
The following diagram illustrates the architecture of the solution.
Prerequisites
To follow this post, you need the following prerequisites:
Create an Aurora PostgreSQL cluster
Run the following AWS CLI commands to create an Aurora PostgreSQL Serverless v2 cluster:
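The exact commands aren't reproduced here, so the following is a minimal sketch using the AWS CLI. The cluster and instance identifiers, engine version, credentials, and scaling limits are placeholders, and networking (VPC, subnet group, security groups) is assumed to come from your account defaults; adjust these values for your environment.

```bash
# Create the Aurora PostgreSQL Serverless v2 cluster (identifiers, version,
# and credentials below are placeholders)
aws rds create-db-cluster \
  --db-cluster-identifier genai-kendra-cluster \
  --engine aurora-postgresql \
  --engine-version 15.4 \
  --master-username postgres \
  --master-user-password "<your-password>" \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=2

# Add a Serverless v2 writer instance to the cluster
aws rds create-db-instance \
  --db-instance-identifier genai-kendra-instance \
  --db-cluster-identifier genai-kendra-cluster \
  --db-instance-class db.serverless \
  --engine aurora-postgresql
```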
The following screenshot shows the created instance.
Ingest data into Aurora PostgreSQL-Compatible Edition
Connect to your Aurora instance using the pgAdmin tool. Refer to Connecting to a DB instance running the PostgreSQL database engine for more information. To ingest your data, complete the following steps:
- Run the PostgreSQL statements in pgAdmin to create the database, schema, and table (see the combined SQL sketch after this list).
- In your pgAdmin Aurora PostgreSQL connection, navigate to Databases, genai, Schemas, employees, Tables.
- Choose (right-click) Tables and choose PSQL Tool to open a psql client connection.
- Place the CSV file in your pgAdmin host location and run the \copy command shown in the sketch after this list.
- Run the count query shown in the sketch to verify the number of records copied.
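The statements referenced in the preceding steps aren't reproduced above, so here is a minimal combined sketch. It assumes the table employees.amazon_review with the columns used later in the Kendra connector setup (pk, reviews_title, reviews_text), a database named genai, and a hypothetical CSV file name; adjust names, column types, and the file path to match your data.

```sql
-- Create the database, schema, and table (database name is an assumption)
CREATE DATABASE genai;

-- Connect to the new database before creating the schema (psql meta-command)
\c genai

CREATE SCHEMA employees;

CREATE TABLE employees.amazon_review (
    pk            SERIAL PRIMARY KEY,  -- primary key column used by the Kendra connector
    reviews_title TEXT,                -- mapped to the Kendra document title
    reviews_text  TEXT                 -- mapped to the Kendra document body
);

-- In the PSQL Tool, load the CSV (hypothetical file name) into the table
\copy employees.amazon_review (reviews_title, reviews_text) FROM 'amazon_reviews.csv' WITH (FORMAT csv, HEADER true);

-- Verify the number of records copied
SELECT count(*) FROM employees.amazon_review;
```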
Create an Amazon Kendra index
An Amazon Kendra index holds the contents of your documents and is structured in a way that makes the documents searchable. It has three index types:
- GenAI Enterprise Edition index – Offers the highest accuracy for the Retrieve API operation and for RAG use cases (recommended)
- Enterprise Edition index – Provides semantic search capabilities and offers a highly available service suitable for production workloads
- Developer Edition index – Provides semantic search capabilities so you can test your use cases
To create an Amazon Kendra index, complete the following steps:
- On the Amazon Kendra console, choose Indexes in the navigation pane.
- Choose Create an index.
- On the Specify index details page, provide the following information:
- For Index name, enter a name (for example, genai-kendra-index).
- For IAM role, choose Create a new role (recommended).
- For Role name, enter an IAM role name (for example, genai-kendra). Your role name will get the prefix AmazonKendra-<region>- (for example, AmazonKendra-us-east-2-genai-kendra).
- Choose Next.
- On the Add additional capacity page, select Developer edition (for this demo) and choose Next.
- On the Configure user access control page, provide the following information:
- Under Access control settings, select No.
- Under User-group expansion, select None.
- Choose Next.
- On the Review and create page, verify the details and choose Create.
It can take some time for the index to be created. Check the list of indexes to see the progress of your index creation. When the index status is ACTIVE, your index is ready to use.
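If you prefer to check the index status programmatically instead of watching the console, the following is a minimal sketch using Boto3 (not part of the original walkthrough); the Region and index ID are placeholders.

```python
import boto3

# Placeholders: use your own Region and the index ID shown on the index details page
kendra = boto3.client("kendra", region_name="us-east-2")
index_id = "<your-kendra-index-id>"

# The index is ready to use once this reports ACTIVE
status = kendra.describe_index(Id=index_id)["Status"]
print(status)
```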
Configure the Amazon Kendra Aurora PostgreSQL connector
Complete the following steps to configure your data source connector:
- On the Amazon Kendra console, choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Aurora PostgreSQL connector as the data source type.
- On the Specify data source details page, provide the following information:
- On the Define access and security page, under Source, provide the following information:
- Under Authentication, if you already have credentials stored in AWS Secrets Manager, choose it on the dropdown menu; otherwise, choose Create and add new secret.
- In the Create an AWS Secrets Manager secret pop-up window, provide the following information:
- For Secret name, enter a name (for example, AmazonKendra-Aurora-PostgreSQL-genai-kendra-secret).
- For Database user name, enter the user name for your database.
- For Password, enter the user's password.
- Choose Add secret.
- Under Configure VPC and security group, provide the following information:
- For Virtual private cloud (VPC), choose your VPC.
- For Subnet, choose your subnet.
- For VPC security groups, choose the VPC security group that allows access to your data source.
- Under IAM role, if you have an existing role, choose it on the dropdown menu; otherwise, choose Create a new role.
- On the Configure sync settings page, under Sync scope, provide the following information:
- For SQL query, enter the SQL query and column values as follows: select * from employees.amazon_review.
- For Primary key, enter the primary key column (pk).
- For Title, enter the title column that provides the name of the document title within your database table (reviews_title).
- For Body, enter the body column on which your Amazon Kendra search will occur (reviews_text).
- Under Sync mode, select Full sync to convert the entire table data into a searchable index.
After the sync completes successfully, your Amazon Kendra index will contain the data from the specified Aurora PostgreSQL table. You can then use this index for intelligent search and RAG applications.
- Under Sync run schedule, choose Run on demand.
- Choose Next.
- On the Set field mappings page, leave the default settings and choose Next.
- Review your settings and choose Add data source.
Your data source will appear on the Data sources page after it has been created successfully.
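After the sync finishes, you can optionally confirm that documents are searchable before building the RAG application. This is a minimal Boto3 sketch (not part of the original walkthrough) with a placeholder Region, index ID, and example query string.

```python
import boto3

kendra = boto3.client("kendra", region_name="us-east-2")  # placeholder Region

# Run a quick keyword query against the synced index (index ID is a placeholder)
response = kendra.query(
    IndexId="<your-kendra-index-id>",
    QueryText="great sound quality",  # example query against the reviews data
)

# Print the type and title of the first few results
for item in response["ResultItems"][:3]:
    print(item["Type"], "-", item.get("DocumentTitle", {}).get("Text"))
```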
Invoke the RAG application
Syncing the Amazon Kendra index can take some time, depending on the volume of your data. When the sync completes without errors, you're ready to develop your RAG solution in your preferred IDE. Complete the following steps:
- Configure your AWS credentials to allow Boto3 to interact with AWS services. You can do this by setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or by using the ~/.aws/credentials file.
- Import LangChain and the necessary components.
- Create an instance of the LLM (Anthropic's Claude).
- Create your prompt template, which provides instructions to the LLM.
- Initialize the KendraRetriever with the Amazon Kendra index ID that you created previously (replacing kendra_index_id) and the Amazon Kendra client.
- Combine Anthropic's Claude and the Amazon Kendra retriever into a retrieval chain.
- Invoke the chain with your own query. (An end-to-end Python sketch of these steps follows this list.)
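The code for the preceding steps isn't reproduced above, so the following is a minimal end-to-end sketch. It assumes the boto3, langchain, langchain-aws, and langchain-community packages, Anthropic's Claude 3 Sonnet model ID on Amazon Bedrock, and placeholder values for the Region, Kendra index ID, and query; substitute your own values and preferred Claude model.

```python
import boto3
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_aws import ChatBedrock
from langchain_community.retrievers import AmazonKendraRetriever

region = "us-east-2"                        # placeholder Region
kendra_index_id = "<your-kendra-index-id>"  # replace with your Amazon Kendra index ID

# LLM: Anthropic's Claude on Amazon Bedrock (model ID is an assumption)
llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name=region,
    model_kwargs={"max_tokens": 512, "temperature": 0},
)

# Prompt template that instructs the LLM to answer only from the retrieved context
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use the following context retrieved from the Amazon Kendra index to answer "
        "the question. If the answer is not in the context, say that you don't know.\n\n"
        "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
    ),
)

# Retriever backed by the Amazon Kendra index, using an explicit Kendra client
kendra_client = boto3.client("kendra", region_name=region)
retriever = AmazonKendraRetriever(
    index_id=kendra_index_id,
    client=kendra_client,
    top_k=3,
)

# Combine Anthropic's Claude and the Amazon Kendra retriever into a retrieval chain
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
)

# Invoke the chain with your own query (example query string)
response = chain.invoke({"query": "What do customers say about battery life?"})
print(response["result"])
```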
Clean up
To avoid incurring future charges, delete the resources that you created as part of this post:
- Delete the Aurora DB cluster and DB instance.
- Delete the Amazon Kendra index.
Conclusion
In this post, we discussed how to convert your existing Aurora data into an Amazon Kendra index and implement a RAG-based solution for data search. This solution drastically reduces the data preparation needed for Amazon Kendra search. It also accelerates generative AI application development by reducing the learning curve behind data preparation.
Try out the solution, and if you have any comments or questions, leave them in the comments section.
About the authors
Aravind Hariharaputran is a Data Consultant with the Professional Services team at Amazon Web Services. He is passionate about data and AI/ML in general, with extensive experience in managing database technologies. He helps customers transform legacy databases and applications into modern data platforms and generative AI applications. He enjoys spending time with his family and playing cricket.
Ivan Cui is a Data Science Lead with AWS Professional Services, where he helps customers build and deploy solutions using ML and generative AI on AWS. He has worked with customers across diverse industries, including software, finance, pharmaceutical, healthcare, IoT, and entertainment and media. In his free time, he enjoys reading, spending time with his family, and traveling.