New Extended Data Format Support in Amazon Kendra

Companies around the world are looking to use multiple data sources to implement a unified search experience for their employees and end customers. Given the large volume of data that needs to be searched and indexed, retrieval speed, solution scalability, and search performance become key factors to consider when choosing an enterprise intelligent search solution. Additionally, these unique data sources comprise repositories of structured and unstructured content, including various file types, which can cause compatibility issues.

Amazon Kendra is a highly accurate, intelligent search service that enables users to find answers to their questions from their structured and unstructured data using natural language processing and advanced search algorithms. It returns specific answers to questions, giving users an experience close to interacting with a human expert.

Today, Amazon Kendra launched support for seven additional data formats. This allows you to integrate your existing data sources as-is and intelligently search across multiple content repositories.

In this post, we discuss the new supported data formats and how to use them.

New supported data formats

Previously, Amazon Kendra supported documents that included structured text in the form of FAQs, as well as unstructured text in the form of HTML files, Microsoft PowerPoint presentations, Microsoft Word documents, plain text documents, and PDFs.

With this release, Amazon Kendra now offers support for seven additional data formats:

Rich Text Format (RTF)
JavaScript Object Notation (JSON)
Markdown (MD)
Comma Separated Values (CSV)
Microsoft Excel (MS Excel)
Extensible Markup Language (XML)
Extensible Stylesheet Language Transformations (XSLT)
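
Before uploading a mixed folder of files, it can help to check which ones fall under the newly supported formats. The following is a minimal sketch of such a check. The file extensions in the mapping and the helper names detect_new_format and group_by_format are assumptions made for illustration and are not part of Amazon Kendra.

```python
from pathlib import PurePosixPath

# Assumed mapping from conventional file extensions to the formats listed above.
# Adjust it to match the files in your own repositories.
NEW_FORMATS = {
    ".rtf": "RTF",
    ".json": "JSON",
    ".md": "Markdown",
    ".csv": "CSV",
    ".xls": "MS Excel",
    ".xlsx": "MS Excel",
    ".xml": "XML",
    ".xslt": "XSLT",
}


def detect_new_format(filename):
    """Return the format label for a file name, or None if it is not in the mapping."""
    suffix = PurePosixPath(filename).suffix.lower()
    return NEW_FORMATS.get(suffix)


def group_by_format(filenames):
    """Group file names by format label, skipping files outside the mapping."""
    grouped = {}
    for name in filenames:
        label = detect_new_format(name)
        if label is not None:
            grouped.setdefault(label, []).append(name)
    return grouped
```

For example, group_by_format(["a.csv", "b.xlsx", "d.pdf"]) keeps the CSV and Excel files and skips the PDF, which was already supported before this release.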

Amazon Kendra users can add these documents with different data formats to their index in the following two ways:

Solution overview

In the following sections, we walk through the steps for adding documents from a data source and performing a search on those documents.

The following diagram shows the architecture of our solution.

To test this solution for any of the supported formats, you must use your own data. You can test uploading documents of the same or different formats to the S3 bucket.

Create an Amazon Kendra index

For instructions on how to create your Amazon Kendra index, see Creating an index.

You can skip this step if you have a pre-existing index to use for this demo.
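
If you prefer to script this step, the following is a minimal sketch using the AWS SDK for Python (boto3). It assumes that boto3 is installed, that AWS credentials are configured, and that an IAM role for Amazon Kendra already exists. The helper names, the index name, and the role ARN are placeholders chosen for illustration.

```python
import time


def build_create_index_params(name, role_arn, edition="DEVELOPER_EDITION"):
    """Build the request parameters for the Amazon Kendra CreateIndex call."""
    return {"Name": name, "RoleArn": role_arn, "Edition": edition}


def create_index_and_wait(params, poll_seconds=60):
    """Create the index, then poll until it leaves the CREATING state."""
    import boto3  # third-party dependency, imported lazily

    kendra = boto3.client("kendra")
    index_id = kendra.create_index(**params)["Id"]
    while True:
        status = kendra.describe_index(Id=index_id)["Status"]
        if status != "CREATING":
            # Typically ACTIVE on success or FAILED on error.
            return index_id, status
        time.sleep(poll_seconds)
```

Index creation can take a while, so the sketch polls the index status rather than assuming the index is ready as soon as the create call returns.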

Upload documents to an S3 bucket and ingest into the index using the S3 connector

Complete the following steps to connect an S3 bucket to your index:

Create an S3 bucket to store your documents.
Create a folder called sample-data.
Upload the documents you want to test in the folder.
In the Amazon Kendra console, go to your index and choose Data sources.
Choose Add data source.
Under Available data sources, select S3 and choose Add connector.
Enter a name for your connector (such as Demo_S3_connector) and choose Next.
Choose Explore S3 and choose the S3 bucket where you uploaded the documents.
For IAM role, choose Create a new role.
For Set sync run schedule, select Run on demand.
Choose Next.
On the Review and create page, choose Add data source.
After the creation process is complete, choose Sync now.
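
The console steps above can also be approximated in code. The following is a rough sketch using boto3, not a definitive implementation. It assumes the S3 bucket and an IAM role with access to it already exist (the console flow creates the role for you). The helper names, bucket name, and role ARN are hypothetical. Leaving out the Schedule parameter corresponds to running the sync on demand.

```python
import os
import time


def build_s3_data_source_params(index_id, bucket, role_arn,
                                name="Demo_S3_connector", prefix="sample-data/"):
    """Build CreateDataSource parameters for an S3 connector with no schedule."""
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "S3",
        "RoleArn": role_arn,
        "Configuration": {
            "S3Configuration": {
                "BucketName": bucket,
                "InclusionPrefixes": [prefix],
            }
        },
    }


def upload_and_sync(local_files, params, poll_seconds=10):
    """Upload files under the prefix, create the data source, and start a sync."""
    import boto3  # third-party dependency, imported lazily

    s3 = boto3.client("s3")
    kendra = boto3.client("kendra")
    s3_config = params["Configuration"]["S3Configuration"]
    for path in local_files:
        key = s3_config["InclusionPrefixes"][0] + os.path.basename(path)
        s3.upload_file(path, s3_config["BucketName"], key)

    data_source_id = kendra.create_data_source(**params)["Id"]
    # Wait until the data source has finished being created before syncing.
    while kendra.describe_data_source(
            Id=data_source_id, IndexId=params["IndexId"])["Status"] == "CREATING":
        time.sleep(poll_seconds)
    kendra.start_data_source_sync_job(Id=data_source_id, IndexId=params["IndexId"])
    return data_source_id
```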

Now that you’ve ingested some documents, you can navigate to the built-in search console to test your queries.

Search your documents with the Amazon Kendra search console

In the Amazon Kendra console, choose Search indexed content in the navigation pane.
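
The same kind of search can be issued programmatically. The following is a minimal sketch that calls the Amazon Kendra Query API through boto3 and condenses each result into a small dictionary. The helper names summarize_results and search, and the choice of fields to keep, are assumptions made for illustration.

```python
def summarize_results(response, limit=3):
    """Reduce a Query response to type, title, excerpt, and URI per result."""
    summaries = []
    for item in response.get("ResultItems", [])[:limit]:
        summaries.append({
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
        })
    return summaries


def search(index_id, query_text, limit=3):
    """Run a natural language query against the index and summarize the results."""
    import boto3  # third-party dependency, imported lazily

    kendra = boto3.client("kendra")
    response = kendra.query(IndexId=index_id, QueryText=query_text)
    return summarize_results(response, limit=limit)
```

Keeping the summarizing step separate from the API call makes it possible to try it on a saved or hand-written response without an AWS account.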

The following are examples of search results for different types of documents:

RTF – Input data in RTF format, uploaded to the S3 bucket and synced through the data source: