The prevalence of virtual business meetings in the corporate world, greatly accelerated by the COVID-19 pandemic, is here to stay. According to a 2023 American Express survey, 41% of business meetings were expected to be held in a hybrid or virtual format by 2024. Attending multiple meetings daily and keeping track of all ongoing topics becomes increasingly difficult to manage over time. This can have a negative impact in many ways, from delayed project deadlines to loss of client trust. Writing meeting summaries is the usual remedy, but doing so disrupts the concentration needed to listen to the ongoing conversation.
A more efficient way to manage meeting summaries is to create them automatically at the end of a call using generative artificial intelligence (AI) and speech-to-text technologies. This allows attendees to focus solely on the conversation, knowing that a transcript will automatically be available at the end of the call.
This post presents a solution to automatically generate a summary of a recorded virtual meeting (for example, one held using Amazon Chime) with multiple participants. The recording is transcribed to text using Amazon Transcribe and then processed with Amazon SageMaker Hugging Face containers to generate the meeting summary. The Hugging Face containers host a large language model (LLM) from the Hugging Face Hub.
If you prefer to generate post-call recording summaries with Amazon Bedrock rather than Amazon SageMaker, check out this Bedrock sample solution. For a generative AI-powered Live Meeting Assistant that creates post-call summaries and also provides live transcription, translation, and contextual assistance based on your own company's knowledge base, check out our new LMA solution.
Solution overview
The entire solution infrastructure is provisioned using the AWS Cloud Development Kit (AWS CDK), an infrastructure as code (IaC) framework for programmatically defining and deploying AWS resources. The framework provisions resources in a safe, repeatable manner, significantly accelerating the development process.
Amazon Transcribe is a fully managed service that seamlessly runs automatic speech recognition (ASR) workloads in the cloud. The service enables simple audio data ingestion, easy-to-read transcript creation, and accuracy improvement through custom vocabularies. Amazon Transcribe's new ASR foundation model supports more than 100 language variants. In this post, we use the speaker diarization feature, which enables Amazon Transcribe to differentiate between up to 10 unique speakers and label a conversation accordingly.
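The diarization settings are supplied when starting a transcription job with the AWS SDK. The following sketch builds the request for boto3's `start_transcription_job`; the job name, media URI, and bucket are illustrative placeholders.

```python
# Build a request for Amazon Transcribe with speaker diarization enabled.
# The job name, media URI, and bucket below are illustrative placeholders.
def build_transcribe_request(job_name: str, media_uri: str, output_bucket: str) -> dict:
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": "mp4",
        "LanguageCode": "en-US",
        "OutputBucketName": output_bucket,
        "Settings": {
            "ShowSpeakerLabels": True,  # tag each segment with a speaker label
            "MaxSpeakerLabels": 10,     # Transcribe distinguishes up to 10 speakers
        },
    }

# To run the job against your own account (requires AWS credentials):
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe.start_transcription_job(**build_transcribe_request(
#     "meeting-summary-job", "s3://<project-bucket>/recordings/test.mp4", "<project-bucket>"))
```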
Hugging Face is an open source machine learning (ML) platform that provides tools and resources for developing AI projects. Its key offering is the Hugging Face Hub, which hosts an extensive collection of over 200,000 pre-trained models and 30,000 datasets. The AWS partnership with Hugging Face enables seamless integration through SageMaker, with a set of Deep Learning Containers (DLCs) for training and inference, and Hugging Face estimators and predictors for the SageMaker Python SDK.
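As a rough sketch of that integration, the snippet below shows the kind of environment configuration used to point a SageMaker Hugging Face container at a Hub model; the model ID matches the one used later in this post, but the container versions and instance type in the commented deploy call are illustrative assumptions, not exact values from the project.

```python
# Sketch: configuration for hosting a Hugging Face Hub model on SageMaker.
# The model ID is from this post; versions and instance type below are
# illustrative assumptions only.
hub_env = {
    "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.1",  # Hugging Face Hub model ID
    "HF_TASK": "text-generation",
}

# With the SageMaker Python SDK (not imported here to keep the sketch self-contained):
# from sagemaker.huggingface import HuggingFaceModel
# model = HuggingFaceModel(role="<execution-role-arn>", env=hub_env,
#                          transformers_version="4.37", pytorch_version="2.1",
#                          py_version="py310")
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")
```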
Generative AI CDK Constructs, an open source extension of the AWS CDK, provides well-architected multi-service patterns to quickly and efficiently build the repeatable infrastructure required for generative AI projects on AWS. In this post, we illustrate how it simplifies the deployment of foundation models (FMs) from Hugging Face or Amazon SageMaker JumpStart with SageMaker real-time inference, which provides persistent, fully managed endpoints to host ML models. These endpoints are designed for interactive, low-latency, real-time workloads and provide auto scaling to manage load fluctuations. For all languages supported by Amazon Transcribe, you can find Hugging Face FMs that support summarization in the corresponding languages.
The following diagram shows the automated meeting summary workflow.
<img class="alignnone wp-image-74616 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2024/04/17/ml-9091-ai-generated-conversation-summary-leveraging-amazon-transcribe-amazon-arch-diag.png" alt="Architecture diagram" width="1758" height="3108"/>
The workflow consists of the following steps:
- The user uploads the meeting recording as an audio or video file to the project's Amazon Simple Storage Service (Amazon S3) bucket, under the /recordings folder.
- Each time a new recording is uploaded to this folder, an AWS Lambda Transcription function is invoked and initiates an Amazon Transcribe job that converts the meeting recording into text. Transcripts are then stored in the project's S3 bucket under /transcriptions/TranscribeOutput/.
- This triggers the Inference Lambda function, which preprocesses the transcript file into a format suitable for ML inference, stores it in the project's S3 bucket under the prefix /summaries/InvokeInput/processed-TranscribeOutput/, and invokes a SageMaker endpoint. The endpoint hosts the Hugging Face model that summarizes the processed transcript. The summary is uploaded to the S3 bucket under the prefix /summaries. Note that the prompt template used in this example includes a single instruction; for more sophisticated requirements, the template can easily be extended to tailor the solution to your own use case.
- This S3 event triggers the Notification Lambda function, which sends the summary to an Amazon Simple Notification Service (Amazon SNS) topic.
- All subscribers to the SNS topic (such as meeting attendees) receive the summary in their email inbox.
In this post, we deploy Mistral 7B Instruct, an LLM available on the Hugging Face Model Hub, to a SageMaker endpoint to perform the summarization task. Mistral 7B Instruct, developed by Mistral AI, has more than 7 billion parameters, allowing it to process and generate text according to user instructions. It has been trained on a large corpus of text data to understand various contexts and nuances of language. The model is designed to perform tasks such as answering questions, summarizing information, and creating content, among others, by following specific instructions given by users. Its effectiveness is measured through metrics such as perplexity, accuracy, and F1 score, and it is fine-tuned to respond to instructions with relevant and coherent text outputs.
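Mistral 7B Instruct expects its instructions wrapped in `[INST] … [/INST]` tags. The sketch below builds a summarization payload in the shape accepted by text-generation inference containers and shows how the endpoint might be invoked with boto3; the endpoint name and the generation parameters are assumptions, not values from the project.

```python
# Build a summarization payload for Mistral 7B Instruct. The [INST] tags are
# the model's instruction format; the generation parameters are illustrative.
def build_payload(transcript: str) -> dict:
    prompt = f"[INST] Summarize the following meeting transcript:\n{transcript} [/INST]"
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 512, "temperature": 0.1, "do_sample": True},
    }

# To invoke the endpoint (requires AWS credentials; endpoint name is hypothetical):
# import boto3, json
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="meeting-summarizer",
#     ContentType="application/json",
#     Body=json.dumps(build_payload("spk_0: Let's review the roadmap...")),
# )
# summary = json.loads(response["Body"].read())
```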
Prerequisites
To follow this post, you must have the following prerequisites:
Deploy the solution
To deploy the solution in your own AWS account, refer to the GitHub repository to access the complete source code of the AWS CDK project in Python:
If you are deploying AWS CDK assets for the first time in your AWS account and the specified AWS Region, you must first run the bootstrap command. It sets up the basic AWS resources and permissions that the AWS CDK requires to deploy AWS CloudFormation stacks in a given environment:
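Assuming the standard AWS CDK CLI, the bootstrap step typically looks like the following; the account ID and Region are placeholders to replace with your own.

```shell
# One-time bootstrap of the target account and Region (placeholders shown)
cdk bootstrap aws://123456789012/us-east-1
```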
Finally, run the following command to deploy the solution, specifying the email address of the summary recipient in the SubscriberEmailAddress parameter:
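With the standard CDK CLI, the deployment command should look along these lines; the parameter name comes from this post, and the email address is a placeholder.

```shell
# Deploy the stack, passing the recipient email as a CloudFormation parameter
cdk deploy --parameters SubscriberEmailAddress=recipient@example.com
```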
Try the solution
We have provided some sample meeting recordings in the data folder of the project repository. You can upload the test.mp4 recording to the project's S3 bucket under the /recordings folder. The summary will be saved to Amazon S3 and sent to the subscriber. The end-to-end duration is approximately 2 minutes, given an input of about 250 tokens.
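For example, the upload can be done with the AWS CLI; the bucket name below is a placeholder for the bucket created by the stack.

```shell
# Upload a sample recording to trigger the pipeline (bucket name is a placeholder)
aws s3 cp data/test.mp4 s3://<project-bucket>/recordings/
```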
The following figure shows the input conversation and output summary.
<img loading="lazy" class="alignnone wp-image-73585 size-full" src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2024/04/05/ml-9091-ai-generated-conversation-summary-leveraging-amazon-transcribe-amazon-image002.png" alt="Input conversation and output summary" width="1442" height="794"/>
Limitations
This solution has the following limitations:
- The model provides high-accuracy completions for the English language. You can use other languages such as Spanish, French, or Portuguese, but the quality of the completions may degrade. You can find other Hugging Face models that are better suited to other languages.
- The model used in this post is limited to a context length of about 8,000 tokens, which is roughly equivalent to 6,000 words. If a longer context length is required, you can replace the model by referencing the new model ID in the respective AWS CDK construct.
- Like other LLMs, Mistral 7B Instruct may hallucinate, generating content that deviates from factual reality or includes fabricated information.
- The format of the recordings must be .mp4, .mp3, or .wav.
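Given the context-length limitation above, you could guard against overly long transcripts before invoking the endpoint with a rough word-based estimate. The tokens-per-word ratio below is a heuristic assumption; for an exact count, use the model's own tokenizer.

```python
# Rough guard against exceeding the model's context window. The
# tokens-per-word ratio is a heuristic assumption; use the model's
# tokenizer for an exact count.
TOKENS_PER_WORD = 1.33
CONTEXT_LIMIT = 8000

def estimated_tokens(text: str) -> int:
    return int(len(text.split()) * TOKENS_PER_WORD)

def fits_context(text: str, reserved_for_output: int = 512) -> bool:
    # Leave headroom for the generated summary tokens.
    return estimated_tokens(text) + reserved_for_output <= CONTEXT_LIMIT
```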
Clean up
To remove deployed resources and stop incurring charges, run the following command:
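With the standard CDK CLI, the teardown command is:

```shell
# Tear down all resources provisioned by the stack
cdk destroy
```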
Alternatively, to use the AWS Management Console, complete the following steps:
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Select the stack named Text-summarization-Infrastructure-stack and choose Delete.
Conclusion
In this post, we proposed an architectural pattern to automatically transform your meeting recordings into insightful conversation summaries. This workflow shows how the AWS Cloud and Hugging Face can help you accelerate the development of your generative AI application by orchestrating a combination of managed AI services, such as Amazon Transcribe, and externally sourced ML models from the Hugging Face Hub, such as those from Mistral AI.
If you would like to learn more about how conversation summaries can be applied in a contact center environment, you can find this technique implemented in our Live Call Analytics and Post Call Analytics solutions.
References
Mistral 7B launch post, by Mistral AI
Our team
This post was created by AWS Professional Services, a global team of experts who can help you achieve your desired business results when using the AWS Cloud. We work together with your team and your chosen AWS Partner Network (APN) member to implement your business cloud computing initiatives. Our team supports you through a collection of offerings that help you achieve specific outcomes related to enterprise cloud adoption. We also provide focused guidance through our global specialist practices, covering a range of solutions, technologies and industries.
About the authors
Gabriel Rodriguez Garcia is a machine learning engineer at AWS Professional Services in Zurich. In his current role, he has helped clients achieve their business objectives in a variety of ML use cases, ranging from setting up MLOps inference pipelines to developing a fraud detection application. When he is not working, he likes to do physical activities, listen to podcasts or read books.
Jahed Zaidi is an artificial intelligence and machine learning specialist at AWS Professional Services in Paris. He is a trusted builder and advisor to businesses across industries, helping companies innovate faster and at scale with technologies ranging from generative AI to scalable ML platforms. Outside of work, you'll find Jahed discovering new cities and cultures and enjoying outdoor activities.
Mateusz Zaremba is a DevOps Architect at AWS Professional Services. Mateusz supports clients at the intersection of machine learning and DevOps expertise, helping them deliver value efficiently and securely. Beyond technology, he is an aerospace engineer and avid sailor.
Kemeng Zhang currently works at AWS Professional Services in Zurich, Switzerland, specializing in AI/ML. He has been part of multiple NLP projects, ranging from behavioral change in digital communication to fraud detection. Apart from that, he is interested in UX design and card games.