Today, we are pleased to announce that Pixtral 12B (pixtral-12b-2409), a next-generation vision language model (VLM) from Mistral AI that excels at both multimodal and text-only tasks, is available to customers through Amazon SageMaker JumpStart. You can try this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with a single click to run inference.
In this post, we explain how to discover, deploy, and use the Pixtral 12B model for a variety of real-world vision use cases.
Pixtral 12B Overview
Pixtral 12B represents Mistral's first VLM and demonstrates strong performance in several benchmarks, outperforming other open models and matching larger models, according to Mistral. Pixtral is trained to understand both images and documents, and shows strong skills in vision tasks such as understanding graphs and figures, answering questions about documents, multimodal reasoning, and following instructions, some of which we demonstrate later in this post with examples. Pixtral 12B is capable of ingesting images at their natural resolution and aspect ratio. Unlike other open source models, Pixtral does not compromise performance on text benchmarks, such as instruction following, coding, and math, to excel in multimodal tasks.
Mistral designed a novel architecture for Pixtral 12B to optimize both speed and performance. The model has two components: a 400 million-parameter vision encoder, which tokenizes images, and a 12 billion-parameter multimodal transformer decoder, which predicts the next text token given a sequence of text and images. The newly trained vision encoder natively supports variable image sizes, allowing Pixtral to accurately understand complex diagrams, charts, and documents at high resolution, while providing fast inference speeds on small images such as icons, clip art, and equations. This architecture allows Pixtral to process any number of images of arbitrary sizes in its large 128,000-token context window.
Licensing terms are a critical decision factor when using open weight models. Like other Mistral models, such as Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, and Mistral NeMo 12B, Pixtral 12B is released under the commercially permissive Apache 2.0 license, providing enterprise and startup customers with a high-performance VLM option for building complex multimodal applications.
SageMaker JumpStart Overview
SageMaker JumpStart offers access to a wide selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.
With SageMaker JumpStart, you can deploy models in a secure environment. Models can be provisioned on dedicated SageMaker inference instances, including instances powered by AWS Trainium and AWS Inferentia, and are isolated within your virtual private cloud (VPC). This reinforces data security and compliance, because the models operate under your own VPC controls rather than in a shared public environment. After you deploy an FM, you can further customize and tune the model, and use additional SageMaker features, such as container logs and the model registry, to improve observability. With SageMaker, you can streamline the entire model deployment process. Note that fine-tuning of Pixtral 12B is not yet available in SageMaker JumpStart at the time of writing.
Prerequisites
To try Pixtral 12B in SageMaker JumpStart, you need the following prerequisites:
Discover Pixtral 12B on SageMaker JumpStart
You can access Pixtral 12B through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover models in SageMaker Studio.
SageMaker Studio is an IDE that provides a single, web-based visual interface where you can access tools specifically designed to perform ML development steps, from data preparation to building, training, and deploying your ML models. For more details about getting started and setting up SageMaker Studio, see Amazon SageMaker Studio Classic.
- In SageMaker Studio, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
- Choose Hugging Face to access the Pixtral 12B model.
- Search for the Pixtral 12B model.
- You can choose the model card to view details about the model, such as the license, the data used to train, and how to use the model.
- Choose Deploy to deploy the model and create an endpoint.
Deploy the model with SageMaker JumpStart
Deployment starts when you choose Deploy. When deployment is complete, an endpoint is created. You can test the endpoint by passing a sample inference request payload or by using the SDK. When you use the SDK, you'll see example code that you can run in the notebook editor of your choice in SageMaker Studio.
To deploy using the SDK, we start by selecting the Pixtral 12B model, specified by the model_id with the value huggingface-vlm-mistral-pixtral-12b-2409. You can deploy the selected model on SageMaker with the following code:
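The original code listing is not reproduced here; the following is a minimal sketch of that deployment using the SageMaker Python SDK's JumpStartModel class, with the instance type and other settings falling back to the JumpStart defaults unless you override them:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Create a JumpStart model object for Pixtral 12B
model = JumpStartModel(model_id="huggingface-vlm-mistral-pixtral-12b-2409")

# Deploy to a real-time endpoint; accept_eula=True is required for this model
predictor = model.deploy(accept_eula=True)
```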
This deploys the model to SageMaker with its default configurations, including the default instance type and default VPC configuration. You can change these settings by specifying non-default values in JumpStartModel. To accept the end-user license agreement (EULA), you must explicitly set accept_eula to True. Also, make sure you have the account-level service quota to use ml.p4d.24xlarge or ml.p4de.24xlarge for endpoint usage with one or more instances. To request a service quota increase, see AWS Service Quotas. After you deploy the model, you can run inference against the deployed endpoint through the SageMaker predictor.
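As a sketch of what those inference calls can look like, the following hypothetical helper wraps the predictor from the deployment step. The helper name ask_pixtral and the payload fields are illustrative assumptions; the payload shape assumes the OpenAI-style Messages API format that Pixtral deployments commonly expose, so verify it against the example payloads shown for your endpoint:

```python
import base64

def ask_pixtral(prompt, image_path, max_tokens=512):
    """Send a text prompt plus one image to the deployed Pixtral endpoint."""
    # Encode the local image so it can be embedded in the request
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
        "max_tokens": max_tokens,
    }
    # predictor is the object returned by model.deploy(...) above
    return predictor.predict(payload)
```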
Pixtral 12B Use Cases
In this section, we provide examples of running inference with Pixtral 12B using example prompts.
OCR
We use the following image as input for OCR.
We use the following message:
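The exact message from the original post is not reproduced here; a representative OCR invocation, reusing the hypothetical ask_pixtral helper sketched earlier, might look like the following (the prompt text and image file name are illustrative):

```python
# Hypothetical OCR prompt and image path, for illustration only
result = ask_pixtral(
    "Extract all of the text in this image, preserving the original layout.",
    "document_scan.png",
)
print(result)
```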
Chart understanding and analysis
For chart understanding and analysis, we use the following image as input.
We use the following message:
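Again, the original message is not reproduced here; a representative chart-analysis call with the hypothetical ask_pixtral helper might look like this (the prompt and file name are illustrative):

```python
# Hypothetical chart-analysis prompt and image path, for illustration only
result = ask_pixtral(
    "Analyze this chart: describe the trends shown and summarize the key takeaways.",
    "benchmark_chart.png",
)
print(result)
```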
We obtain the following result:
Image to code
For an image-to-code example, we use the following image as input.
We use the following message:
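As before, a representative image-to-code call with the hypothetical ask_pixtral helper might look like the following (the prompt, file name, and token budget are illustrative):

```python
# Hypothetical image-to-code prompt and image path, for illustration only
result = ask_pixtral(
    "Write HTML and CSS that reproduce the webpage layout shown in this image.",
    "webpage_mockup.png",
    max_tokens=2048,
)
print(result)
```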
Clean up
When you're done, delete the SageMaker endpoint using the following code to avoid incurring unnecessary costs:
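A minimal cleanup sketch, assuming the predictor object from the deployment step is still in scope:

```python
# Delete the model and endpoint to stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()
```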
Conclusion
In this post, we showed you how to get started with Mistral AI's newest multimodal model, Pixtral 12B, in SageMaker JumpStart and deploy the model for inference. We also explored how SageMaker JumpStart enables data scientists and ML engineers to discover, access, and deploy a wide range of pre-trained FMs for inference, including other Mistral AI models such as Mistral 7B and Mixtral 8x22B.
To learn more about SageMaker JumpStart, see Training, Deploying, and Testing Pretrained Models with SageMaker JumpStart and Getting Started with Amazon SageMaker JumpStart.
For more Mistral assets, check out the Mistral-on-AWS repository.
About the authors
Preston Tuggle is a Senior Specialist Solutions Architect working on generative AI.
Nithiyan Vijayaswaran is a Solutions Architect specializing in generative AI at AWS. His areas of focus are generative AI and AWS AI accelerators. He holds a Bachelor's degree in Computer Science and Bioinformatics. Nithiyan works closely with the Generative AI GTM team to help AWS customers on multiple fronts and accelerate their adoption of generative AI. He is an avid Dallas Mavericks fan and enjoys collecting sneakers.
Shane Roy is a Senior Generative AI Specialist with the AWS Worldwide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AWS AI/ML services, including model offerings from top-tier foundation model providers.