Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Much of that information is found in text narratives stored in various document formats, such as PDFs, Word files, and HTML pages. Some information is also stored in tables (such as pricing tables or product specifications) embedded in those same document types, or in CSVs and spreadsheets. Although Amazon Q Business can provide precise answers from narrative text, getting answers from these tables requires special handling of more structured information.
On November 21, 2024, Amazon Q Business released support for tabular search, which you can use to extract answers from tables embedded in documents ingested into Amazon Q Business. Tabular search is a built-in feature of Amazon Q Business that works seamlessly across many domains, with no configuration required from the administrator or end users.
In this post, we ingest different types of documents that contain tables and show you how Amazon Q Business answers questions about the data in those tables.
Prerequisites
To follow this tutorial, you must meet the following prerequisites:
- An AWS account where you can follow the instructions in this post.
- At least one Amazon Q Business user. For pricing information, see Amazon Q Business pricing.
- Cross-Region inference enabled in the Amazon Q Business application.
- Amazon Q Business applications created on or after November 21, 2024, automatically benefit from the new capability. If your application was created before this date, you must re-ingest its content to update its indexes.
Tabular search overview
Tabular search extends Amazon Q Business beyond paragraphs of text: it analyzes tables embedded in business documents so that you can get answers to queries whose answers live in table cells.
With tabular search in Amazon Q Business, you can ask questions like “what is the credit card with the lowest APR and no annual fees?” or “which credit cards offer travel insurance?” where the answers can be found in a product comparison table inside a marketing PDF stored in an internal repository or on a website.
This feature supports a wide range of file formats, including PDF, Word documents, CSV files, Excel spreadsheets, HTML, and SmartSheet (via the SmartSheet connector). Tabular search can also extract data from tables represented as images within PDF files, and it can retrieve information from one or multiple cells. Additionally, it can perform numerical aggregations over table data, such as counts and ratios.
Ingest documents in Amazon Q Business
To create an Amazon Q Business application, retriever, and index to pull data in real time during a conversation, follow the steps in the Create and configure your Amazon Q application section of the AWS Machine Learning Blog post Discover insights from Amazon S3 with the Amazon Q S3 connector.
For this post, we use The World's Billionaires, which lists the world's top 10 billionaires from 1987 to 2024 in tabular format. You can download this data as a PDF from Wikipedia using the Tools menu. Upload the PDF to an Amazon Simple Storage Service (Amazon S3) bucket and use it as a data source in your Amazon Q Business application.
Run queries with Amazon Q
You can start asking Amazon Q questions by using the web experience URL, which can be found on the Applications page, as shown in the following screenshot.
Suppose we want to know the ratio of men to women who appeared on the Forbes 2024 list of the world's billionaires. As you can see in the following screenshot of The World's Billionaires PDF, there were 383 women and 2,398 men.
To use Amazon Q Business to get that information from the PDF, enter the following into the web experience chatbot:
“In 2024, what is the ratio of men and women who appeared on the Forbes 2024 Billionaires List?”
Amazon Q Business provides the answer, as shown in the following screenshot.
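To sanity-check the assistant's answer, you can work out the ratio directly from the two counts in the table. The following is a minimal sketch; the variable names are our own, and only the counts (383 women and 2,398 men) come from the PDF.

```python
from fractions import Fraction

# Counts taken from the 2024 figures quoted from the PDF.
women = 383
men = 2398

# Exact ratio of men to women, reduced to lowest terms.
ratio = Fraction(men, women)

# Decimal form: how many men appeared on the list for each woman.
men_per_woman = men / women

# Share of women among all listed billionaires, as a percentage.
women_share_pct = 100 * women / (men + women)

print(f"Men to women: {ratio.numerator}:{ratio.denominator}")
print(f"About {men_per_woman:.2f} men per woman")
print(f"Women make up about {women_share_pct:.1f}% of the list")
```

Because 383 is prime and does not divide 2,398, the ratio does not reduce further. An answer of roughly 6.26 men for every woman is therefore consistent with the table.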
The following screenshot shows the list of the top 10 billionaires of 2009.
We entered “How many of the top 10 billionaires in 2009 were from countries outside the United States?”
Amazon Q Business provides an answer, as shown in the following screenshot.
Next, to demonstrate how Amazon Q Business can extract data from a CSV file, we use the example of crime statistics found here.
We entered the question: “How many crime incidents were reported in Hollywood?”
Amazon Q Business provides the answer, as shown in the following screenshot.
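If you want to verify a count like this independently, you can run the same aggregation over the CSV yourself. The sketch below uses only the Python standard library. The column names (`incident_id`, `area_name`, `crime_description`) and the four sample rows are made-up placeholders for illustration, not values from the actual dataset, so adjust them to match the header of the file you ingested.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real file's columns and values will differ.
SAMPLE_CSV = """incident_id,area_name,crime_description
1,Hollywood,VEHICLE - STOLEN
2,Central,BURGLARY
3,Hollywood,BATTERY - SIMPLE ASSAULT
4,Wilshire,THEFT
"""


def count_incidents_by_area(csv_text, area_column="area_name"):
    """Return a Counter mapping each area name to its number of rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[area_column].strip() for row in reader)


counts = count_incidents_by_area(SAMPLE_CSV)
print(counts["Hollywood"])
```

To run this against a real file, read its contents with `open(path, newline="").read()` and pass the text, along with the correct column name, to `count_incidents_by_area`.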
Metadata boosting
To improve the accuracy of Amazon Q Business responses for CSV files, you can add metadata to documents in an S3 bucket by using a metadata file. Metadata is additional information about a document that describes it in more detail, which improves retrieval accuracy for document formats that carry little context of their own, for example, a CSV with cryptic column names. Additional fields, such as the document's title and the date and time it was created, can also be useful if you want to search by title or for documents from a certain time period.
You can do this by following Enable document attributes for search in Amazon Q Business.
Additional details on metadata boosting can be found in Configuring Document Attributes for Boosting in Amazon Q Business in the Amazon Q User Guide.
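As an illustration of what such a metadata file might contain, the following sketch builds one for a CSV document. It assumes the JSON layout we recall from the Amazon Q Business S3 connector documentation (top-level `DocumentId`, `Attributes`, and `Title` keys, reserved attributes such as `_category` and `_created_at`, and a file name of the form `<document>.<extension>.metadata.json`). Verify the layout against the user guide before relying on it. The file name `crime_data.csv`, the title, and the category value are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_metadata(document_name, title, category, created_at):
    """Build a metadata dictionary for one document (assumed layout)."""
    return {
        "DocumentId": document_name,
        "Attributes": {
            # Reserved attribute names as we recall them; confirm in the docs.
            "_category": category,
            "_created_at": created_at.isoformat(),
        },
        "Title": title,
    }


def metadata_file_name(document_name):
    """Assumed naming convention: <document>.<extension>.metadata.json."""
    return f"{document_name}.metadata.json"


# Hypothetical document and attribute values, for illustration only.
doc_name = "crime_data.csv"
metadata = build_metadata(
    document_name=doc_name,
    title="Reported crime incidents by area",
    category="public-safety",
    created_at=datetime(2024, 11, 21, tzinfo=timezone.utc),
)

print(metadata_file_name(doc_name))
print(json.dumps(metadata, indent=2))
```

A descriptive `Title` gives the retriever context that a CSV with cryptic column names cannot provide on its own. You would upload the resulting JSON file to the metadata location configured for your S3 data source.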
Clean up
To avoid incurring future charges and to clean up unused roles and policies, delete the resources you created: the Amazon Q application, the data sources, and the corresponding IAM roles.
To delete the Amazon Q application, follow these steps:
- On the Amazon Q console, choose Applications and then select your application.
- On the Actions drop-down menu, choose Delete.
- To confirm the deletion, enter delete in the field and choose Delete. Wait until you receive the confirmation message. The process can take up to 15 minutes.
To delete the S3 bucket created in Prepare your S3 bucket as a data source, follow these steps:
- Follow the instructions in Emptying a bucket.
- Follow the steps in Deleting a bucket.
To delete the IAM Identity Center instance that you created as part of the prerequisites, follow the steps in Delete your IAM Identity Center instance.
Conclusion
By following this post, you can ingest different types of documents that contain tables. You can then ask Amazon Q questions about the information in those tables and have Amazon Q provide answers in natural language.
For information about searching metadata, see Setting up metadata controls in Amazon Q Business.
To configure the S3 data source, see Configure the Amazon Q Business application with the S3 data source.
About the authors
Jiten Dedhia is a Senior AI/ML Solutions Architect with over 20 years of experience in the software industry. He has helped Fortune 500 companies with their AI/ML and generative AI needs.
Sapna Maheshwari is a Senior Solutions Architect at AWS and is passionate about designing impactful technology solutions. She is an engaging speaker who likes to share her ideas at conferences.