Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra helps you easily aggregate content from a variety of content repositories into a centralized index that lets you quickly search all your business data and find the most accurate answer. drupal It is a content management software. It is used to create many of the websites and applications we use every day. Drupal has a great set of features, such as easy content creation, reliable performance, and security. Many organizations use Drupal to store their content. One of the key requirements for many customers using Drupal is the ability to easily and securely find accurate information across all documents in the data source.
With the Amazon Kendra Drupal Connector, you can index Drupal content, filter the types of custom content you want to index, and easily search Drupal content using Amazon Kendra Smart Search.
This post shows you how to use the Amazon Kendra Drupal Connector to configure the connector as a data source for your Amazon Kendra index and search your Drupal documents. Depending on your Drupal connector configuration, you can sync the connector to crawl and index different types of Drupal content, such as blogs and wikis. The connector also ingests access control list (ACL) information for each file. ACL information is used for user context filtering, where search results for a query are filtered based on what a user has authorized access to.
Previous requirements
To try the Amazon Kendra connector for Drupal using this post as a reference, you need the following:
Configure the data source using the Amazon Kendra Connector for Drupal
To add a data source to your Amazon Kendra index using the Drupal connector, you can use an existing index or create a new index. Then complete the following steps. For more information on this topic, see the Amazon Kendra Developer Guide.
- In the Amazon Kendra console, open your index and choose Data sources in the navigation panel.
- Choose Add data source.
- Low drupalchoose Add connector.
- In it Specify data source details section, enter a name and description and choose Next.
- About him Define access and security section, for Drupal server URLenter the URL of the Drupal site.
- To configure SSL certificates, you can create a self-signed certificate for this configuration using the
openssl x509 -in mydrupalsite.pem -out drupal.crt
command and store the certificate in an Amazon Simple Storage Service (Amazon S3) bucket. For more details on how to generate a private key and certificate, see Generating Certificates. - Choose Explore S3 and choose S3 bucket with SSL certificate.
- Low Authenticationyou have two options:
- Use Secrets Manager to create new Drupal authentication credentials. You need a Drupal administrator username and password (plus a client ID and client secret for OAuth 2.0 authentication).
- Use an existing Secrets Manager secret that has the Drupal authentication credentials you want the connector to access (plus a client ID and client secret for OAuth 2.0 authentication).
- Choose Save and add secret.
- For IAM Rolechoose Create a new role or choose an existing IAM role configured with appropriate IAM policies to access the Secrets Manager secret, Amazon Kendra index, and data source.
See IAM Roles for Data Sources for the permissions required for the IAM role.
- Choose Next.
- In it Configure sync settings section, select Articles, Basic pages, Basic blocks, Custom content typesand Custom blocks along with options to track comments and attachments as needed.
- Optionally, enter inclusion/exclusion patterns for entity titles.
- Provide information about your synchronization scope (full only or delta) and specify the execution schedule.
- Choose Next.
- In it Set field mappings section, add the Drupal custom fields you want to sync and their respective Amazon Kendra field mappings. Required fields are pre-mapped by Amazon Kendra.
- Choose Next.
- Review the configuration settings and save the data source.
- Choose Sync now in the created data source to start data synchronization with the Amazon Kendra index.
The time required to crawl and sync content to Amazon Kendra varies depending on content volume and performance.
You can now search indexed Drupal content using the search console or a search application. You can optionally search with ACL with the following additional steps.
- Go to the index page you created and in the User access controll tab, choose Edit settings.
- Low Access control settingsselect Yeahkeep the default values for Username and Groupschoose JSON for Token typeand maintain the expansion of the user group as None.
- On the next page, keep the default values (or change them based on your capacity requirements) and choose Update.
Search smart with Amazon Kendra
Before attempting to search the Amazon Kendra console or use the API, ensure that data source synchronization is complete. To check this, view the data sources and check if the last sync was successful.
- To start your search, in the Amazon Kendra console, choose Search indexed content in the navigation panel.
You will be redirected to the Amazon Kendra Search Console. You can now search for information in the Drupal documents that you indexed with Amazon Kendra.
- For this post, we are looking for a document stored in the Drupal data source.
- Expand Test query with an access token and choose Apply token.
- For UsernameEnter the email address associated with your Drupal account.
- Choose Apply.
The user can now only view content they have access to based on the specified username or groups. In our example, the Drupal user with the [email protected]
Email does not have access to any documents in Drupal, so none are displayed.
Limitations
Please note the following limitations when using this solution:
- Content types (such as article or basic page) that are not associated with any views cannot be crawled.
- If an administrator does not have access to a block, they will not be able to track the block’s data.
- The document body for the article, basic page, basic block, user-defined content type, and user-defined block type is displayed in HTML format. If the HTML content is not well formed, the HTML-related tags will appear in the body of the document and therefore will be visible in Amazon Kendra search results. The same goes for article comments, basic page, basic block, user-defined content type, user-defined block type.
- Content type or block type without description or body will not be injected into the Amazon Kendra index because there is validation on the Amazon Kendra SDK side. However, Drupal allows you to create the content type without description or body. Only comments and attachments of the respective content types or block types (if they exist) will be injected into the Amazon Kendra index.
Clean
To avoid incurring future costs, clean up the resources that you created as part of this solution. If you created a new Amazon Kendra index while trying this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Drupal, delete that data source. Delete the created IAM users.
Conclusion
With the Amazon Kendra Drupal connector, your organization can securely search content stored on a Drupal site using intelligent search powered by Amazon Kendra. In this post, we introduce you to the integration, but there are many additional features that we don’t cover, such as the following:
- You can map additional fields to Amazon Kendra index attributes and enable them to be faceted, searched, and displayed in search results.
- You can integrate the Drupal data source with the custom document enrichment (CDE) capability in Amazon Kendra to perform additional attribute mapping logic and even custom content transformation during ingestion.
To learn more about the possibilities with Drupal, see the Amazon Kendra Developer Guide.
To learn more about other built-in Amazon Kendra connectors for popular data sources, see the Amazon Kendra Connectors page.
About the authors
Channa Basavaraja is a Senior Solutions Architect at AWS with over two decades of experience building distributed business solutions. His depth areas span machine learning, mobile/application development, event-driven architecture, and IoT/edge computing.
Yuanhua Wang is a software engineer at AWS with more than 15 years of experience in the technology industry. His interests are software architecture and the creation of cloud computing tools.