Detect signatures on documents or images using the signatures feature in Amazon Textract

Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. AnalyzeDocument Signatures is a feature within Amazon Textract that provides the ability to automatically detect signatures on any document. This can reduce the need for human review, custom code, or ML expertise.

In this post, we discuss the benefits of the AnalyzeDocument Signatures feature and how the AnalyzeDocument Signatures API helps detect signatures in documents. We also explain how to use the feature through the Amazon Textract console, and provide code examples for using the API and processing the response with the Amazon Textract Response Parser library. Lastly, we share some best practices for using this feature.

Benefits of the Signatures feature

Our customers in the insurance, mortgage, legal, and tax industries face the challenge of processing large volumes of paper documents while meeting regulatory and compliance requirements that mandate signatures on those documents. You may need to ensure that specific forms, such as loan applications or claims submitted by your end customers, contain signatures before you begin processing the application. For certain document processing workflows, you may need to go a step further to extract and compare signatures for verification.

Historically, customers have typically routed documents to a human reviewer for signature detection. Using human reviewers to detect signatures tends to require a significant amount of time and resources. It can also lead to inefficiencies in the document processing workflow, resulting in longer response times and a poor end-user experience.

The AnalyzeDocument Signatures feature allows you to automatically detect handwritten signatures, electronic signatures, and initials on documents. This can help you create a scalable automated solution with less reliance on costly and time-consuming manual processing. Not only can you use this feature to check whether a document is signed, but you can also validate whether a particular field in a form is signed by using the location details of the detected signatures. You can also use the location information to redact personally identifiable information (PII) in a document.

How AnalyzeDocument Signatures detects signatures on documents

The AnalyzeDocument API has four feature types: Forms, Tables, Queries, and Signatures. When Amazon Textract processes documents, the results are returned in an array of Block objects. The Signatures feature can be used alone or in combination with other feature types. When used alone, the Signatures feature type provides a JSON response that includes the location and confidence scores of the detected signatures and the raw text (words and lines) of the documents. Combining the Signatures feature with other feature types, such as Forms and Tables, can give you additional context. When the feature is used with Forms and Tables, the response shows the signature as part of a key-value pair or a table cell. For example, the response for the following form contains the key as Lender’s Signature and the value as the Block object.
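To make the key-value relationship concrete, here is a minimal sketch that lists the form keys whose value contains a detected signature. It assumes the SIGNATURE block's ID appears among the CHILD IDs of the VALUE block, in the same way WORD blocks do for typed values; the helper name signed_keys is hypothetical and is not part of any AWS library.

```python
def signed_keys(blocks):
    """Return the text of each form key whose value contains a SIGNATURE block.

    Assumption: a detected signature is linked as a CHILD of the VALUE block.
    """
    by_id = {block["Id"]: block for block in blocks}

    def related_ids(block, rel_type):
        # Collect the IDs listed under one relationship type (VALUE or CHILD).
        ids = []
        for rel in block.get("Relationships") or []:
            if rel["Type"] == rel_type:
                ids.extend(rel["Ids"])
        return ids

    results = []
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        # The key text is the concatenation of the key's child WORD blocks.
        key_text = " ".join(
            by_id[i]["Text"]
            for i in related_ids(block, "CHILD")
            if by_id.get(i, {}).get("BlockType") == "WORD"
        )
        # Follow the VALUE relationship and look for a SIGNATURE child.
        for value_id in related_ids(block, "VALUE"):
            value_block = by_id.get(value_id, {})
            if any(
                by_id.get(i, {}).get("BlockType") == "SIGNATURE"
                for i in related_ids(value_block, "CHILD")
            ):
                results.append(key_text)
    return results
```

Calling signed_keys(response["Blocks"]) on a response produced with both FORMS and SIGNATURES would return the labels of the signed fields, under the assumption stated above.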

How to use the Signatures feature in the Amazon Textract console

Before we get started with the API and code samples, let’s review the Amazon Textract console. After you upload the document to the Amazon Textract console, select signature detection in the Configure document section and apply the settings.

The following screenshot shows an example of a pay stub on the Signatures tab for the Analyze Document API in the Amazon Textract console.

The feature detects and presents the signature with its corresponding page and confidence score.

Code samples

You can use the Signatures feature to detect signatures on different types of documents, such as checks, loan application forms, claim forms, pay stubs, mortgage documents, bank statements, leases, and contracts. In this section, we discuss some of these documents and show how to call the AnalyzeDocument API with the Signatures parameter to detect signatures.

The input document can be in byte array format or be located in an Amazon Simple Storage Service (Amazon S3) bucket. For documents in a byte array format, you can send image bytes to an Amazon Textract API operation using the Bytes property. Signatures as a feature type is supported by the AnalyzeDocument API for synchronous document processing and by StartDocumentAnalysis for asynchronous document processing.
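For a document stored in Amazon S3, the request shape differs slightly between the synchronous and asynchronous operations. The following sketch builds the keyword arguments for each; the helper names, bucket name, and object names are hypothetical placeholders, and the commented calls need AWS credentials and an existing S3 object.

```python
def sync_request(bucket, name, feature_types=("FORMS", "SIGNATURES")):
    """Keyword arguments for analyze_document with a document stored in S3."""
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": name}},
        "FeatureTypes": list(feature_types),
    }


def async_request(bucket, name, feature_types=("FORMS", "SIGNATURES")):
    """Keyword arguments for start_document_analysis with a document in S3."""
    return {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": name}},
        "FeatureTypes": list(feature_types),
    }


# Usage (placeholder bucket and object names):
# import boto3
# textract = boto3.client("textract")
# response = textract.analyze_document(**sync_request("my-bucket", "letter.png"))
# job = textract.start_document_analysis(**async_request("my-bucket", "contract.pdf"))
# result = textract.get_document_analysis(JobId=job["JobId"])
```

Note that the synchronous call takes a Document parameter while the asynchronous call takes DocumentLocation; the asynchronous job must finish before get_document_analysis returns the blocks.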

In the example below, we detect signatures on an employment verification letter.

We use the following sample Python code:

import boto3
import json

# Create a Textract client
textract = boto3.client('textract')

# Path to a local document image (placeholder; replace with your own file)
image_filename = 'employment_verification_letter.png'

# Read the document as raw bytes
with open(image_filename, 'rb') as document:
    imageBytes = bytearray(document.read())

# Call Textract AnalyzeDocument by passing a document from local disk
response = textract.analyze_document(
    Document={'Bytes': imageBytes},
    FeatureTypes=['FORMS', 'SIGNATURES']
)

Let’s analyze the response we get from the AnalyzeDocument API. The following response has been trimmed to show only the relevant parts. The response contains a block with a BlockType of SIGNATURE, which includes the confidence score, block ID, and bounding box details:

'BlockType': 'SIGNATURE',
   'Confidence': 38.468597412109375,
   'Geometry': {'BoundingBox': {'Width': 0.15083004534244537,
     'Height': 0.019236255437135696,
     'Left': 0.11393339931964874,
     'Top': 0.8885205388069153},
    'Polygon': [{'X': 0.11394496262073517, 'Y': 0.8885205388069153},
     {'X': 0.2647634446620941, 'Y': 0.8887625932693481},
     {'X': 0.264753133058548, 'Y': 0.9077568054199219},
     {'X': 0.11393339931964874, 'Y': 0.907513439655304}]},
   'Id': '609f749c-5e79-4dd4-abcc-ad47c6ebf777'}]
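The BoundingBox values in the response above are ratios of the page width and height, not pixels. To crop or redact the signature in the source image, you first need pixel coordinates. The following is a minimal sketch of that conversion; the function name bbox_to_pixels is a hypothetical helper, and the image dimensions are assumed to come from whatever imaging library you use.

```python
def bbox_to_pixels(bounding_box, image_width, image_height):
    """Convert a Textract BoundingBox (page ratios) to pixel coordinates.

    Returns (left, top, right, bottom), rounded to the nearest pixel.
    """
    left = round(bounding_box["Left"] * image_width)
    top = round(bounding_box["Top"] * image_height)
    right = round((bounding_box["Left"] + bounding_box["Width"]) * image_width)
    bottom = round((bounding_box["Top"] + bounding_box["Height"]) * image_height)
    return left, top, right, bottom


# Example with illustrative values on a 1000 x 2000 pixel page
box = {"Left": 0.1, "Top": 0.5, "Width": 0.2, "Height": 0.1}
print(bbox_to_pixels(box, 1000, 2000))
```

The resulting tuple follows the (left, top, right, bottom) order that many imaging libraries expect for a crop or a filled rectangle.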

We use the following code to print the ID and location in a tabulated format:

# Print the ID and geometry of each detected signature
from tabulate import tabulate

d = []
for item in response["Blocks"]:
    if item["BlockType"] == "SIGNATURE":
        d.append([item["Id"], item["Geometry"]])

print(tabulate(d, headers=["Id", "Geometry"], tablefmt="grid", maxcolwidths=[None, 100]))

The following screenshot shows our results.

More details and the complete code are available in the notebook in the GitHub repository.
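The location details can also be used to check whether a particular field on a form is signed. The following sketch assumes you already know the expected signature area for your form template as a box in page ratios; the region values, the 0.5 overlap threshold, and the helper names are illustrative assumptions rather than anything Amazon Textract provides.

```python
def overlap_ratio(sig_box, region):
    """Fraction of the signature box that lies inside the expected region."""
    x1 = max(sig_box["Left"], region["Left"])
    y1 = max(sig_box["Top"], region["Top"])
    x2 = min(sig_box["Left"] + sig_box["Width"], region["Left"] + region["Width"])
    y2 = min(sig_box["Top"] + sig_box["Height"], region["Top"] + region["Height"])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = sig_box["Width"] * sig_box["Height"]
    return intersection / area if area > 0 else 0.0


def is_region_signed(blocks, region, min_overlap=0.5):
    """True if any SIGNATURE block mostly falls inside the given region."""
    return any(
        block["BlockType"] == "SIGNATURE"
        and overlap_ratio(block["Geometry"]["BoundingBox"], region) >= min_overlap
        for block in blocks
    )


# Hypothetical signature area near the bottom-left of the page
expected_region = {"Left": 0.05, "Top": 0.85, "Width": 0.4, "Height": 0.1}
# signed = is_region_signed(response["Blocks"], expected_region)
```

Measuring overlap against the signature's own area, rather than requiring full containment, tolerates signatures that spill slightly outside the printed field.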

For documents that have legible signatures in key-value formats, we can use the Amazon Textract Response Parser to extract only the signature fields by searching for the key and reading the value that corresponds to it:

from trp import Document
doc = Document(response)
d = []

for page in doc.pages:
    # Search fields by key
    print("\nSearch Fields:")
    key = "Signature"
    fields = page.form.searchFieldsByKey(key)
    for field in fields:
        d.append([field.key, field.value])        

print(tabulate(d, headers=["Key", "Value"]))

The above code returns the following results:

Search Fields:
Key                        Value
-------------------------  --------------
8. Signature of Applicant  Paulo Santos
26. Signature of Employer  Richard Roe
3. Signature of Lender     Carlos Salazar

Please note that in order to transcribe signatures in this manner, the signatures must be legible.

Best practices for using the Signatures feature

Keep the following best practices in mind when using this feature:

For real-time responses, use the synchronous AnalyzeDocument API operation. For use cases where you don’t need the response in real time, such as batch processing, we suggest using the asynchronous StartDocumentAnalysis API operation.
The Signatures feature works best when there are up to three signatures on a page. When there are more than three signatures on a page, it is better to split the page into sections and feed each section separately to the API.
Use the confidence scores provided with the detected signatures to route documents for human review when the scores do not meet the required threshold. The confidence score is not a measure of accuracy, but rather an estimate of the model’s confidence in its prediction. You should select a confidence threshold that makes the most sense for your use case.
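The confidence-based routing described above can be sketched as follows. The 90.0 threshold and the function name route_by_confidence are illustrative assumptions; choose a threshold based on your own tolerance for missed or false detections.

```python
def route_by_confidence(blocks, threshold=90.0):
    """Split detected signatures into accepted and needs-review lists of IDs."""
    accepted, needs_review = [], []
    for block in blocks:
        if block.get("BlockType") != "SIGNATURE":
            continue
        if block["Confidence"] >= threshold:
            accepted.append(block["Id"])
        else:
            needs_review.append(block["Id"])
    return accepted, needs_review


# Usage with an AnalyzeDocument response:
# accepted, needs_review = route_by_confidence(response["Blocks"], threshold=90.0)
# if needs_review:
#     ...  # send the document to a human reviewer
```

With this threshold, the signature in the earlier sample response (confidence of about 38.5) would be sent for human review.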

Summary

In this post, we provided an overview of the Signatures feature of Amazon Textract, which automatically detects signatures on documents such as pay stubs, rental agreements, and contracts. AnalyzeDocument Signatures reduces the need for human reviewers and helps you reduce costs, save time, and create scalable solutions for document processing.

To get started, log in to the Amazon Textract console to test the feature. For more information about Amazon Textract’s capabilities, see Amazon Textract, the Amazon Textract Developer Guide, or Textract Resources.

About the authors

Maran Chandrasekaran is a Senior Solutions Architect at Amazon Web Services, working with our enterprise customers. Outside of work, he loves to travel and ride motorcycles in the Texas Hill Country.

Shibin Michaelraj is a Senior Product Manager on the AWS Textract team, focused on building AI/ML-based products for AWS customers.

Suprakash Dutta is a Sr. Solutions Architect at Amazon Web Services. He focuses on digital transformation strategy, application modernization and migration, data analytics, and machine learning. He is part of the AI / ML community on AWS and designs intelligent document processing solutions.