Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service for building and deploying machine learning (ML) models without the need to write any code. Ready-to-use Foundation Models (FMs) available in SageMaker Canvas enable customers to use generative AI for tasks such as content generation and summarization.
We are thrilled to announce the latest updates to Amazon SageMaker Canvas, which bring new generative AI capabilities to the platform. With support for the Meta Llama 2 and Mistral AI models and the release of streaming responses, SageMaker Canvas continues to empower everyone who wants to get started with generative AI without writing a single line of code. In this post, we discuss these updates and their benefits.
Introducing the Meta Llama 2 and Mistral models
Llama 2 is a cutting-edge foundation model from Meta that offers improved scalability and versatility for a wide range of generative AI tasks. Users have reported that Llama 2 is capable of engaging in meaningful and coherent conversations, generating new content, and extracting answers from existing notes. Llama 2 is among the next-generation large language models (LLMs) available today for the open source community to build their own AI-powered applications.
Mistral AI, a leading French AI startup, has developed Mistral 7B, a powerful language model with 7.3 billion parameters. Mistral models have been very well received by the open source community thanks to their use of grouped-query attention (GQA) for faster inference, making them highly efficient, with performance comparable to models two or three times their size.
Today, we are pleased to announce that SageMaker Canvas now supports three variants of the Llama 2 model and two variants of Mistral 7B.
To try these models, navigate to the SageMaker Canvas Ready-to-use models page, then choose Generate, extract and summarize content. This is where you'll find the SageMaker Canvas GenAI chat experience. Here, you can use any model from Amazon Bedrock or SageMaker JumpStart by selecting it from the model drop-down menu.
In our case, we choose one of the Llama 2 models. Now you can provide your prompt or query. As you submit the input, SageMaker Canvas forwards it to the model.
Choosing which of the models available in SageMaker Canvas best fits your use case requires you to take into account information about the models themselves: the Llama-2-70B-chat model is a bigger model (70 billion parameters, compared to 13 billion for Llama-2-13B-chat), which means its output quality is generally better than the smaller model's, at the cost of slightly higher latency and an increased cost per token. Mistral-7B delivers performance comparable to Llama-2-7B or Llama-2-13B, but it is hosted on Amazon SageMaker, so its pricing model is different: it moves from a dollar-per-token pricing model to a dollar-per-hour model. This can be more cost-effective with a significant number of requests per hour and consistent usage at scale. All of the models above can perform well in a variety of use cases, so our suggestion is to evaluate which model best solves your problem, considering the trade-off between output quality, latency, and cost.
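To make the per-token versus per-hour trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. The prices, token counts, and request volumes below are hypothetical placeholders chosen for illustration, not actual AWS pricing:

```python
# Back-of-the-envelope comparison of per-token vs. per-hour pricing.
# All prices and traffic figures are hypothetical placeholders,
# not actual AWS pricing.

PRICE_PER_1K_TOKENS = 0.002   # hypothetical dollar-per-token rate
PRICE_PER_HOUR = 1.50         # hypothetical dollar-per-hour hosting rate
TOKENS_PER_REQUEST = 1_000    # assumed average prompt + completion size

def per_token_cost(requests_per_hour: int) -> float:
    """Hourly cost when you pay for each token processed."""
    return requests_per_hour * TOKENS_PER_REQUEST / 1_000 * PRICE_PER_1K_TOKENS

def per_hour_cost(requests_per_hour: int) -> float:
    """Hourly cost when you pay for the hosting instance, regardless of traffic."""
    return PRICE_PER_HOUR

for rph in (100, 500, 1_000, 5_000):
    token_cost = per_token_cost(rph)
    hour_cost = per_hour_cost(rph)
    cheaper = "per-hour" if hour_cost < token_cost else "per-token"
    print(f"{rph:>5} req/h: per-token=${token_cost:.2f}, per-hour=${hour_cost:.2f} -> {cheaper} wins")
```

With these illustrative numbers, per-token pricing wins at low traffic, while the flat per-hour model becomes cheaper once sustained volume crosses the break-even point, which is the intuition behind the "consistent usage at scale" guidance above.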
If you're looking for an easy way to compare how models behave, SageMaker Canvas natively provides this capability in the form of model comparisons. You can select up to three different models and send the same query to all of them at once. SageMaker Canvas then gets the responses from each of the models and shows them in a side-by-side chat UI. To do so, choose Compare and select the other models to compare against, as shown below:
Introducing Response Streaming: Real-Time Interactions and Improved Performance
One of the key advancements in this release is the introduction of streamed responses. Streaming responses provides a richer user experience and better reflects a chat experience: users receive instant feedback as the model generates text, rather than waiting for the full completion. This allows for a more interactive and responsive experience, improving the overall performance and user satisfaction of chatbot applications, and the immediate, chat-like feedback creates a more natural conversation flow.
With this feature, you can now interact with your AI models in real time, receiving instant responses and enabling seamless integration into a variety of applications and workflows. All models that can be queried in SageMaker Canvas (from Amazon Bedrock and SageMaker JumpStart) can stream responses to the user.
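SageMaker Canvas itself requires no code, but it can help to see what response streaming looks like at the API level. The sketch below uses the Amazon Bedrock runtime's invoke_model_with_response_stream operation via boto3; the model ID, request body, and the "generation" response field are illustrative assumptions, since each model family defines its own input and output schema:

```python
import json
import boto3

# Minimal sketch of response streaming against the Amazon Bedrock runtime.
# The model ID and request body are illustrative assumptions; consult the
# Bedrock documentation for the exact schema of your chosen model.
client = boto3.client("bedrock-runtime")

response = client.invoke_model_with_response_stream(
    modelId="meta.llama2-13b-chat-v1",  # assumed model identifier
    body=json.dumps({"prompt": "Explain response streaming in one paragraph."}),
)

# The response body is an event stream: each chunk is handled as soon as it
# arrives, instead of waiting for the full completion.
for event in response["body"]:
    chunk = event.get("chunk")
    if chunk:
        payload = json.loads(chunk["bytes"])
        # The field holding generated text is model-specific; "generation"
        # is assumed here for the Llama 2 family.
        print(payload.get("generation", ""), end="", flush=True)
```

Printing each chunk as it arrives is what produces the typewriter-style effect you see in the SageMaker Canvas chat UI.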
Get started today
Whether you are building a chatbot, recommendation system, or virtual assistant, the Llama 2 and Mistral models combined with streamed responses bring improved performance and interactivity to your projects.
To use the latest features of SageMaker Canvas, be sure to restart your application. To do that, log out of the app by choosing Log out, then open SageMaker Canvas again. You should see the new models and enjoy the latest releases. Logging out of the SageMaker Canvas application also releases all resources used by the workspace instance, thereby avoiding unintended additional charges.
Conclusion
To get started with the new streamed responses for the Llama 2 and Mistral models in SageMaker Canvas, visit the SageMaker console and explore the intuitive interface. To learn more about how SageMaker Canvas and generative AI can help you achieve your business goals, see Empower your business users to extract insights from company documents using Amazon SageMaker Canvas and generative AI and Overcoming common contact center challenges with generative AI and Amazon SageMaker Canvas.
If you want to learn more about the features of SageMaker Canvas and dive deeper into other ML use cases, check out the other posts available in the SageMaker Canvas category of the AWS ML blog. We can't wait to see the amazing AI applications you'll create with these new capabilities!
About the authors
Davide Gallitelli is a Senior Solutions Architect specializing in AI/ML. He is based in Brussels and works closely with customers around the world looking to adopt low-code/no-code machine learning and generative AI technologies. He has been a developer since a young age, starting to code at age 7. He started learning AI/ML at university and has fallen in love with it ever since.
Dan Sinnreich is a Senior Product Manager at AWS, helping to democratize low-code/no-code machine learning. Prior to AWS, Dan built and marketed enterprise SaaS platforms and time series models used by institutional investors to manage risk and build optimal portfolios. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.