With the rapid growth of generative artificial intelligence (ai), many AWS customers are looking to take advantage of publicly available foundation models (FMs) and technologies. This includes Meta Llama 3, Meta’s publicly available large language model (LLM). The partnership between Meta and amazon signifies collective generative ai innovation, and Meta and amazon are working together to push the boundaries of what’s possible.
In this post, we provide an overview of the Meta Llama 3 models available on AWS at the time of writing, and share best practices on developing Text-to-SQL use cases using Meta Llama 3 models. All the code used in this post is publicly available in the accompanying Github repository.
Background of Meta Llama 3
Meta Llama 3, the successor to Meta Llama 2, maintains the same 70-billion-parameter capacity but achieves superior performance through enhanced training techniques rather than sheer model size. This approach underscores Meta’s strategy of optimizing data utilization and methodologies to push ai capabilities further. The release includes new models based on Meta Llama 2’s architecture, available in 8-billion- and 70-billion-parameter variants, each offering base and instruct versions. This segmentation allows Meta to deliver versatile solutions suitable for different hardware and application needs.
A significant upgrade in Meta Llama 3 is the adoption of a tokenizer with a 128,256-token vocabulary, enhancing text encoding efficiency for multilingual tasks. The 8-billion-parameter model integrates grouped-query attention (GQA) for improved processing of longer data sequences, enhancing real-world application performance. Training involved a dataset of over 15 trillion tokens across two GPU clusters, significantly more than Meta Llama 2. Meta Llama 3 Instruct, optimized for dialogue applications, underwent fine-tuning with over 10 million human-annotated samples using advanced techniques like proximal policy optimization and supervised fine-tuning. Meta Llama 3 models are licensed permissively, allowing redistribution, fine-tuning, and derivative work creation, now requiring explicit attribution. This licensing update reflects Meta’s commitment to fostering innovation and collaboration in ai development with transparency and accountability.
Prompt engineering best practices for Meta Llama 3
The following are best practices for prompt engineering for Meta Llama 3:
- Base model usage – Base models offer the following:
- Prompt-less flexibility – Base models in Meta Llama 3 excel in continuing sequences and handling zero-shot or few-shot tasks without requiring specific prompt formats. They serve as versatile tools suitable for a wide range of applications and provide a solid foundation for further fine-tuning.
- Instruct versions – Instruct versions offer the following:
- Structured dialogue – Instruct versions of Meta Llama 3 use a structured prompt format designed for dialogue systems. This format maintains coherent interactions by guiding system responses based on user inputs and predefined prompts.
- Text-to-SQL parsing – For tasks like Text-to-SQL parsing, note the following:
- Effective prompt design – Engineers should design prompts that accurately reflect user queries to SQL conversion needs. Meta Llama 3’s capabilities enhance accuracy and efficiency in understanding and generating SQL queries from natural language inputs.
- Development best practices – Keep in mind the following:
- Iterative refinement – Continuous refinement of prompt structures based on real-world data improves model performance and consistency across different applications.
- Validation and testing – Thorough testing and validation make sure that prompt-engineered models perform reliably and accurately across diverse scenarios, enhancing overall application effectiveness.
By implementing these practices, engineers can optimize the use of Meta Llama 3 models for various tasks, from generic inference to specialized natural language processing (NLP) applications like Text-to-SQL parsing, using the model’s capabilities effectively.
Solution overview
The demand for using LLMs to improve Text-to-SQL queries is growing more important because it enables non-technical users to access and query databases using natural language. This democratizes access to generative ai and improves efficiency in writing complex queries without needing to learn SQL or understand complex database schemas. For example, if you’re a financial customer and you have a MySQL database of customer data spanning multiple tables, you could use Meta Llama 3 models to build SQL queries from natural language. Additional use cases include:
- Improved accuracy – LLMs can generate SQL queries that more accurately capture the intent behind natural language queries, thanks to their advanced language understanding capabilities. This reduces the need to rephrase or refine your queries.
- Handling complexity – LLMs can handle complex queries involving multiple tables (which we demonstrate in this post), joins, filters, and aggregations, which would be challenging for rule-based or traditional Text-to-SQL systems. This expands the range of queries that can be handled using natural language.
- Incorporating context – LLMs can use contextual information like database schemas, table descriptions, and relationships to generate more accurate and relevant SQL queries. This helps bridge the gap between ambiguous natural language and precise SQL syntax.
- Scalability – After they’re trained, LLMs can generalize to new databases and schemas without extensive retraining or rule-writing, making them more scalable than traditional approaches.
For the solution, we follow a Retrieval Augmented Generation (RAG) pattern to generate SQL from a natural language query using the Meta Llama 3 70B model on amazon SageMaker JumpStart, a hub that provides access to pre-trained models and solutions. SageMaker JumpStart provides a seamless and hassle-free way to deploy and experiment with the latest state-of-the-art LLMs like Meta Llama 3, without the need for complex infrastructure setup or deployment code. With just a few clicks, you can have Meta Llama 3 models up and running in a secure AWS environment under your virtual private cloud (VPC) controls, maintaining data security. SageMaker JumpStart offers access to a range of Meta Llama 3 model sizes (8B and 70B parameters). This flexibility allows you to choose the appropriate model size based on your specific requirements. You can also incrementally train and tune these models before deployment.
The solution also includes an embeddings model hosted on SageMaker JumpStart and publicly available vector databases like ChromaDB to store the embeddings.
ChromaDB and other vector engines
In the realm of Text-to-SQL applications, ChromaDB is a powerful, publicly available, embedded vector database designed to streamline the storage, retrieval, and manipulation of high-dimensional vector data. Seamlessly integrating with machine learning (ML) and NLP workflows, ChromaDB offers a robust solution for applications such as semantic search, recommendation systems, and similarity-based analysis. ChromaDB offers several notable features:
- Efficient vector storage – ChromaDB uses advanced indexing techniques to efficiently store and retrieve high-dimensional vector data, enabling fast similarity searches and nearest neighbor queries.
- Flexible data modeling – You can define custom collections and metadata schemas tailored to your specific use cases, allowing for flexible data modeling.
- Seamless integration – ChromaDB can be seamlessly embedded into existing applications and workflows, providing a lightweight and performant solution for vector data management.
Why choose ChromaDB for Text-to-SQL use cases?
- Efficient vector storage for text embeddings – ChromaDB’s efficient storage and retrieval of high-dimensional vector embeddings are crucial for Text-to-SQL tasks. It enables fast similarity searches and nearest neighbor queries on text embeddings, facilitating accurate mapping of natural language queries to SQL statements.
- Seamless integration with LLMs – ChromaDB can be quickly integrated with LLMs, enabling RAG architectures. This allows LLMs to use relevant context, such as providing only the relevant table schemas necessary to fulfill the query.
- Customizable and community support – ChromaDB offers flexibility and customization with an active community of developers and users who contribute to its development, provide support, and share best practices. This provides a collaborative and supportive landscape for Text-to-SQL applications.
- Cost-effective – ChromaDB eliminates the need for expensive licensing fees, making it a cost-effective choice for organizations of all sizes.
By using vector database engines like ChromaDB, you gain more flexibility for your specific use cases and can build robust and performant Text-to-SQL systems for generative ai applications.
Solution architecture
The solution uses the AWS services and features illustrated in the following architecture diagram.
The process flow includes the following steps:
- A user sends a text query specifying the data they want returned from the databases.
- Database schemas, table structures, and their associated metadata are processed through an embeddings model hosted on SageMaker JumpStart to generate embeddings.
- These embeddings, along with additional contextual information about table relationships, are stored in ChromaDB to enable semantic search, allowing the system to quickly retrieve relevant schema and table context when processing user queries.
- The query is sent to ChromaDB to be converted to vector embeddings using a text embeddings model hosted on SageMaker JumpStart. The generated embeddings are used to perform a semantic search on the ChromaDB.
- Following the RAG pattern, ChromaDB outputs the relevant table schemas and table context that pertain to the query. Only relevant context is sent to the Meta Llama 3 70B model. The augmented prompt is created using this information from ChromaDB as well as the user query.
- The augmented prompt is sent to the Meta Llama3 70B model hosted on SageMaker JumpStart to generate the SQL query.
- After the SQL query is generated, you can run the SQL query against amazon Relational Database Service (amazon RDS) for MySQL, a fully managed cloud database service that allows you to quickly operate and scale your relational databases like MySQL.
- From there, the output is sent back to the Meta Llama 3 70B model hosted on SageMaker JumpStart to provide a response the user.
- Response sent back to the user.
Depending on where your data lives, you can implement this pattern with other relational database management systems such as PostgreSQL or alternative database types, depending on your existing data infrastructure and specific requirements.
Prerequisites
Complete the following prerequisite steps:
- Have an AWS account.
- Install the AWS Command Line Interface (AWS CLI) and have the amazon SDK for Python (Boto3) set up.
- Request model access on the amazon Bedrock console for access to the Meta Llama 3 models.
- Have access to use Jupyter notebooks (whether locally or on amazon SageMaker Studio).
- Install packages and dependencies for LangChain, the amazon Bedrock SDK (Boto3), and ChromaDB.
Deploy the Text-to-SQL environment to your AWS account
To deploy your resources, use the provided AWS CloudFormation template, which is a tool for deploying infrastructure as code. Supported AWS Regions are US East (N. Virginia) and US West (Oregon). Complete the following steps to launch the stack:
- On the AWS CloudFormation console, create a new stack.
- For Template source, choose Upload a template file then upload the yaml for deploying the Text-to-SQL environment.
- Choose Next.
- Name the stack
text2sql
. - Keep the remaining settings as default and choose Submit.
The template stack should take 10 minutes to deploy. When it’s done, the stack status will show as CREATE_COMPLETE.
- When the stack is complete, navigate to the stack Outputs
- Choose the
SagemakerNotebookURL
link to open the SageMaker notebook in a separate tab. - In the SageMaker notebook, navigate to the
Meta-Llama-on-AWS/blob/text2sql-blog/RAG-recipes
directory and openllama3-chromadb-text2sql.ipynb.
- If the notebook prompts you to set the kernel, choose the
conda_pytorch_p310
kernel, then choose Set kernel.
Implement the solution
You can use the following Jupyter notebook, which includes all the code snippets provided in this section, to build the solution. In this solution, you can choose which service (SageMaker Jumpstart or amazon Bedrock) to use as the hosting model service using ask_for_service()
in the notebook. amazon Bedrock is a fully managed service that offers a choice of high-performing FMs. We give you the choice between solutions so that your teams can evaluate if SageMaker JumpStart is preferred or if your teams want to reduce operational overhead with the user-friendly amazon Bedrock API. You have the choice to use SageMaker JumpStart to host the embeddings model of your choice or amazon Bedrock to host the amazon Titan Embeddings model (amazon.titan-embed-text-v2:0
).
Now that the notebook is ready to use, follow the instructions in the notebook. With these steps, you create an RDS for MySQL connector, ingest the dataset into an RDS database, ingest the table schemas into ChromaDB, and generate Text-to-SQL queries to run your prompts and analyze data residing in amazon RDS.
- Create a SageMaker endpoint with the BGE Large En v1.5 Embedding model from Hugging Face:
- Create a collection in ChromaDB for the RAG framework:
- Build the document with the table schema and sample questions to enhance the retriever’s accuracy:
- Add documents to ChromaDB:
- Build the prompt (
final_question
) by combining the user input in natural language (user_query
), the relevant metadata from the vector store (vector_search_match
), and instructions (details
): - Submit a question to ChromaDB and retrieve the table schema SQL
- Invoke Meta Llama 3 on SageMaker and prompt it to generate the SQL query. The function
get_llm_sql_analysis
will run and pass the SQL query results to Meta Llama 3 to provide a comprehensive analysis of the data:
Although Meta Llama 3 doesn’t natively support function calling, you can simulate an agentic workflow. In this approach, a query is first generated, then run, and the results are sent back to Meta Llama 3 for interpretation.
Run queries
For our first query, we provide the input “How many unique airplane producers are represented in the database?” The following is the table schema retrieved from ChromaDB:
The following is the generated query:
The following is the data analysis generated from the previous SQL query:
For our second query, we ask “Find the airplane IDs and producers for airplanes that have flown to New York.” The following are the table schemas retrieved from ChromaDB:
The following is our generated query:
The following is the data analysis generated from the previous SQL query:
Clean up
To avoid incurring continued AWS usage charges, delete all the resources you created as part of this post. Make sure you delete the SageMaker endpoints you created within the application before you delete the CloudFormation stack.
Conclusion
In this post, we explored a solution that uses the vector engine ChromaDB and Meta Llama 3, a publicly available FM hosted on SageMaker JumpStart, for a Text-to-SQL use case. We shared a brief history of Meta Llama 3, best practices for prompt engineering with Meta Llama 3 models, and an architecture pattern using few-shot prompting and RAG to extract the relevant schemas stored as vectors in ChromaDB. Finally, we provided a solution with code samples that gives you flexibility to choose SageMaker Jumpstart or amazon Bedrock for a more managed experience to host Meta Llama 3 70B, Meta Llama3 8B, and embeddings models.
The use of publicly available FMs and services alongside AWS services helps drive more flexibility and provides more control over the tools being used. We recommend following the amazon-sagemaker-examples/tree/main/sagemaker-jumpstart” target=”_blank” rel=”noopener”>SageMaker JumpStart GitHub repo for getting started guides and examples. The solution code is also available in the following Github repo.
We look forward to your feedback and ideas on how you apply these calculations for your business needs.
About the Authors
Marco Punio is a Sr. Specialist Solutions Architect focused on generative ai strategy, applied ai solutions, and conducting research to help customers hyperscale on AWS. Marco is based in Seattle, WA, and enjoys writing, reading, exercising, and building applications in his free time.
Armando Diaz is a Solutions Architect at AWS. He focuses on generative ai, ai/ML, and Data Analytics. At AWS, Armando helps customers integrating cutting-edge generative ai capabilities into their systems, fostering innovation and competitive advantage. When he’s not at work, he enjoys spending time with his wife and family, hiking, and traveling the world.
Breanne Warner is an Enterprise Solutions Architect at amazon Web Services supporting healthcare and life science (HCLS) customers. She is passionate about supporting customers to leverage generative ai and evangelizing model adoption. Breanne is also on the Women@amazon board as co-director of Allyship with the goal of fostering inclusive and diverse culture at amazon. Breanne holds a Bachelor of Science in Computer Engineering.
Varun Mehta is a Solutions Architect at AWS. He is passionate about helping customers build enterprise-scale Well-Architected solutions on the AWS Cloud. He works with strategic customers who are using ai/ML to solve complex business problems. Outside of work, he loves to spend time with his wife and kids.
Chase Pinkerton is a Startups Solutions Architect at amazon Web Services. He holds a Bachelor’s in Computer Science with a minor in Economics from Tufts University. He’s passionate about helping startups grow and scale their businesses. When not working, he enjoys road cycling, hiking, playing volleyball, and photography.
Kevin Lu is a Technical Business Developer intern at amazon Web Services on the Generative ai team. His work focuses primarily on machine learning research as well as generative ai solutions. He is currently an undergraduate at the University of Pennsylvania, studying computer science and math. Outside of work, he enjoys spending time with friends and family, golfing, and trying new food.