Pixtral 12B is now available on Amazon SageMaker JumpStart

Today we are pleased to announce that Pixtral 12B (pixtral-12b-2409), a next-generation vision language model (VLM) from <a target="_blank" href="https://mistral.ai/” target=”_blank” rel=”noopener”>Mistral ai which excels at both multimodal and text-only tasks, is available to customers through amazon SageMaker JumpStart. You can test this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with a single click to run inference.

In this post, we explain how to discover, implement, and use the Pixtral 12B model for a variety of real-world vision use cases.

Pixtral 12B Overview

Pixtral 12B represents Mistral's first VLM and demonstrates strong performance in several benchmarks, outperforming other open models and matching larger models, according to Mistral. Pixtral is trained to understand both images and documents, and shows strong skills in vision tasks such as understanding graphs and figures, answering questions about documents, multimodal reasoning, and following instructions, some of which we demonstrate later in this post with examples. Pixtral 12B is capable of ingesting images at their natural resolution and aspect ratio. Unlike other open source models, Pixtral does not compromise performance on text benchmarks, such as instruction following, coding, and math, to excel in multimodal tasks.

Mistral designed a novel architecture for Pixtral 12B to optimize both speed and performance. The model has two components: a 400 million-parameter vision encoder, which tokenizes images, and a 12 billion-parameter multimodal transformer decoder, which predicts the next text token given a sequence of text and images. The newly trained vision encoder natively supports variable image sizes, allowing Pixtral to be used to accurately understand complex diagrams, charts, and documents at high resolution, and provides fast inference speeds on small images such as icons, clip art and equations. This architecture allows Pixtral to process any number of images with arbitrary sizes in its large 128,000-token context window.

Licensing agreements are a critical decision factor when using open weight models. Like other Mistral models such as Mistral 7B, Mixtral 8x7B, Mixtral 8x22B and Mistral Nemo 12B, Pixtral 12B is launched under the Apache 2.0 commercially permissiveproviding enterprise and startup customers with a high-performance VLM option for building complex multimodal applications.

SageMaker JumpStart Overview

SageMaker JumpStart offers access to a wide selection of publicly available base models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. Now you can use next-generation model architectures, such as language models, computer vision models, and more, without having to build them from scratch.

With SageMaker JumpStart, you can deploy models in a secure environment. Models can be provisioned on dedicated SageMaker Inference instances, including instances powered by AWS Trainium and AWS Inferentia, and are isolated within your virtual private cloud (VPC). This reinforces data security and compliance because models operate under their own VPC controls, rather than in a shared public environment. After you deploy an FM, you can further customize and tune the model, including SageMaker Inference to deploy container models and registries to improve observability. With SageMaker, you can streamline the entire model deployment process. Please note that fine tuning in Pixtral 12B is not yet available (at the time of writing) in SageMaker JumpStart.

Prerequisites

To try Pixtral 12B in SageMaker JumpStart, you need the following prerequisites:

Discover Pixtral 12B on SageMaker JumpStart

You can access Pixtral 12B through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover models in SageMaker Studio.

SageMaker Studio is an IDE that provides a single, web-based visual interface where you can access tools specifically designed to perform ML development steps, from data preparation to creating, training, and deploying your ML models. For more details about getting started and setting up SageMaker Studio, see amazon SageMaker Studio Classic.

In SageMaker Studio, access SageMaker JumpStart by choosing Begin in the navigation panel.
Choose HugsFace to access the Pixtral 12B model.
Look for the Pixtral 12B model.
You can choose the model card to view details about the model, such as the license, the data used to train, and how to use the model.
Choose Deploy to deploy the model and create an endpoint.

Deploy the model to SageMaker JumpStart

Deployment starts when you choose Deploy. When the deployment is complete, an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the test option using the SDK. When you use the SDK, you'll see example code that you can use in the notebook editor of your choice in SageMaker Studio.

To deploy using the SDK, we start by selecting the Mistral Nemo Base model, specified by the model_id with the value huggingface-vlm-mistral-pixtral-12b-2409. You can deploy any of the selected models in SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel 

accept_eula = True 

model = JumpStartModel(model_id="huggingface-vlm-mistral-pixtral-12b-2409") 
predictor = model.deploy(accept_eula=accept_eula)

This deploys the model to SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these settings by specifying non-default values in JumpStartModel. The End User License Agreement (EULA) value must be explicitly set to True in order to accept the EULA. Also, make sure you have the service limit at the account level to use ml.p4d.24xlarge or ml.pde.24xlarge for endpoint use as one or more instances. To request a service quota increase, see AWS Service Quotas. After deploying the model, you can run inference against the deployed endpoint through the SageMaker predictor.

Pixtral 12B Use Cases

In this section, we provide examples of inference in Pixtral 12B with example prompts.

LOC

We use the following image as input for OCR.

We use the following message:

payload = {
    "messages": (
        {
            "role": "user",
            "content": (
                {
                    "type": "text",
                    "text": "Extract and transcribe all text visible in the image, preserving its exact formatting, layout, and any special characters. Include line breaks and maintain the original capitalization and punctuation.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "Pixtral_data/amazon_s1_2.jpg"
                    }
                }
            )
        }
    ),
    "max_tokens": 2000,
    "temperature": 0.6,
    "top_p": 0.9,
}
print(response)
Approximate date of commencement of proposed sale to the public: AS SOON AS PRACTICABLE AFTER THIS REGISTRATION STATEMENT BECOMES EFFECTIVE. 
If any of the securities being registered on this Form are to be offered on a delayed or continuous basis pursuant to Rule 415 under the Securities Act of 1933, check the following box. 
() If this Form is filed to register additional securities for an offering pursuant to Rule 462(b) under the Securities Act of 1933, check the following box and list the Securities Act registration statement number of the earlier effective registration statement for the same offering. 
() If this Form is a post-effective amendment filed pursuant to Rule 462(c) under the Securities Act of 1933, check the following box and list the Securities Act registration statement number of the earlier effective registration statement for the same offering. 
() If delivery of the prospectus is expected to be made pursuant to Rule 434, please check the following box. 
() **CALCULATION OF REGISTRATION FEE** 
| TITLE OF EACH CLASS OF SECURITIES TO BE REGISTERED | AMOUNT TO BE REGISTERED(1) | PROPOSED MAXIMUM OFFERING PRICE PER SHARE(2) | PROPOSED MAXIMUM AGGREGATE OFFERING PRICE(2) | AMOUNT OF REGISTRATION FEE | 
|----------------------------------------------------|----------------------------|---------------------------------------------|---------------------------------------------|----------------------------| 
| Common Stock, $0.01 par value per share........... | 2,875,000 shares           | $14.00                                      | $40,250,000                                 | $12,197(3)                 | 

(1) Includes 375,000 shares that the Underwriters have the option to purchase to cover over-allotments, if any. 
(2) Estimated solely for the purpose of calculating the registration fee in accordance with Rule 457(c). 
(3) $11,326 of registration fee has been previously paid. ...

Understanding and analyzing graphics.

To understand and analyze the graphs, we use the following image as input.

We use the following message:

prompt= """
Analyze the attached image of the chart or graph. Your tasks are to:
Identify the type of chart or graph (e.g., bar chart, line graph, pie chart, etc.).
Extract the key data points, including labels, values, and any relevant scales or units.
Identify and describe the main trends, patterns, or significant observations presented in the chart.
Generate a clear and concise paragraph summarizing the extracted data and insights. The summary should highlight the most important information and provide an overview that would help someone understand the chart without seeing it.
Ensure that your summary is well-structured, accurately reflects the data, and is written in a professional tone.
"""
payload = {
    "messages": (
        {
            "role": "user",
            "content": (
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "Pixtral_data/amazon_s1_2.jpg"
                    }
                }
            )
        }
    ),
    "max_tokens": 2000,
    "temperature": 0.6,
    "top_p": 0.9,
}
print(response)
image_path = "Pixtral_data/Amazon_Chart.png"  # Replace with your local image path
response = send_images_to_model(predictor, prompt, image_path)
print(response)

We obtain the following result:

The image is a bar chart titled "Segment Results – North America," which presents data on net sales and operating income over several quarters from Q2 2023 to Q2 2024. The chart is divided into two sections: one for net sales and the other for operating income.

### Key Data Points:
- Net Sales:
 - Q2 2023: $82,546 million
 - Q3 2023: Approximately $85,000 million
 - Q4 2023: Approximately $90,000 million
 - Q1 2024: Approximately $85,000 million
 - Q2 2024: $90,033 million
 - Year-over-Year (Y/Y) growth: 9%

- Operating Income:
 - Q2 2023: $3,211 million
 - Q3 2023: Approximately $4,000 million
 - Q4 2023: Approximately $7,000 million
 - Q1 2024: Approximately $5,000 million
 - Q2 2024: $5,065 million
 - Year-over-Year (Y/Y) growth: 58%

- Total Trailing Twelve Months (TTM):
 - Net Sales: $369.8 billion
 - Operating Income: $20.8 billion
...
- **Operating Income:** Operating income shows significant growth, particularly in Q4 2023, where it peaks. There is a notable year-over-year increase of 58%.

### Summary:
The bar chart illustrates the segment results for North America, focusing on net sales and operating income from Q2 2023 to Q2 2024. Net sales demonstrate a steady upward trend, culminating in a 9% year-over-year increase, with the highest value recorded in Q2 2024 at $90,033 million. Operating income exhibits more volatility, with a significant peak in Q4 2023, and an overall substantial year-over-year growth of 58%. The total trailing twelve months (TTM) figures indicate robust performance, with net sales reaching $369.8 billion and operating income at $20.8 billion. This data underscores a positive growth trajectory in both net sales and operating income for the North American segment over the observed period.

Image to code

For an image-to-code example, we use the following image as input.

We use the following message:

def extract_html(text):
 pattern = r'```html\s*(.*?)\s*```'
 match = re.search(pattern, text, re.DOTALL)
 return match.group(1) if match else None
  
prompt = "Create HTML and CSS code for a minimalist and futuristic website to purchase luggage. Use the following image as template to create your own design."
payload = {
    "messages": (
        {
            "role": "user",
            "content": (
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "Pixtral_data/Amazon_Chart.png"
                    }
                }
            )
        }
    ),
    "max_tokens": 2000,
    "temperature": 0.6,
    "top_p": 0.9,
}
print('Input Image:\n\n')
html_code = extract_html(response)
print(html_code)
display(HTML(html_code))



    
    
    Luggage Store
    <link rel="stylesheet" href="https://aws.amazon.com/blogs/machine-learning/pixtral-12b-is-now-available-on-amazon-sagemaker-jumpstart/styles.css"/>


    
        
        
            
        
    
...
        © 2023 Luggage Store. All rights reserved.

Clean

Once you're done, remove the SageMaker endpoints using the following code to avoid incurring unnecessary costs:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we show you how to get started with Mistral's newest multimodal model, Pixtral 12B, in SageMaker JumpStart and deploy the model for inference. We also explore how SageMaker JumpStart enables data scientists and machine learning engineers to discover, access and deploy a wide range of pre-trained FMs for inference, including other Mistral ai models such as Mistral 7B and Mixtral 8x22B.

To learn more about SageMaker JumpStart, see Training, Deploying, and Testing Pretrained Models with SageMaker JumpStart and Getting Started with amazon SageMaker JumpStart to get started.

For more Mistral assets, check out the Mistral-on-AWS repository.

About the authors

Preston Tuggle is a specialized Senior Solutions Architect working on generative ai.

Nithiyan Vijayaswaran is a solutions architect specializing in GenAI at AWS. His area of focus is Generative ai and AWS ai Accelerators. He has a Bachelor's degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative ai GTM team to help AWS customers on multiple fronts and accelerate their adoption of generative ai. He is an avid Dallas Mavericks fan and enjoys collecting sneakers.

Shane Roy is a Senior GenAI Specialist for the AWS Worldwide Specialist Organization (WWSO). It works with customers across industries to solve their most pressing and innovative business needs using the variety of cloud-based AWS ai/ML services, including model offerings from top-tier base model providers.

Pixtral 12B is now available on Amazon SageMaker JumpStart

Technical Terrence Team

BMW Praises Tesla's Full Self-Driving Tech on Social Media By Investing.com

Leave a Reply Cancel reply

Recommended.

Which AP automation is best?

Lido plans to level up ahead of Ethereum Shanghai hard fork upgrade

The 54 Best Black Friday Tech Deals Under $25

Return to Monkey Island comes to Apple Arcade in June

Ethereum Whales Send $486 Million To Coinbase, Time To Exit?

Categories

Important Links

Pixtral 12B is now available on Amazon SageMaker JumpStart

Pixtral 12B Overview

SageMaker JumpStart Overview

Prerequisites

Discover Pixtral 12B on SageMaker JumpStart

Deploy the model to SageMaker JumpStart

Pixtral 12B Use Cases

LOC

Understanding and analyzing graphics.

Image to code

Clean

Conclusion

About the authors

Related

Technical Terrence Team

BMW Praises Tesla's Full Self-Driving Tech on Social Media By Investing.com

Leave a Reply Cancel reply

Recommended.

Which AP automation is best?

Lido plans to level up ahead of Ethereum Shanghai hard fork upgrade

The 54 Best Black Friday Tech Deals Under $25

Return to Monkey Island comes to Apple Arcade in June

Ethereum Whales Send $486 Million To Coinbase, Time To Exit?

Categories

Important Links

Get daily news updates to your inbox!