Mistral AI's Mistral 7B foundation models are now available on Amazon SageMaker JumpStart

Today, we are pleased to announce that Mistral 7B base models, developed by Mistral ai, are available for customers through Amazon SageMaker JumpStart to deploy with a single click to run inference. With 7 billion parameters, Mistral 7B can be easily customized and quickly deployed. You can try this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models to get you started quickly with ML. In this post, we explain how to discover and implement the Mistral 7B model.

What is Mistral 7B?

Mistral 7B is a basic model developed by Mistral ai, which supports English text and code generation capabilities. It supports a variety of use cases, such as text summarization, classification, text completion, and code completion. To demonstrate the model’s easy customization, Mistral ai also released a Mistral 7B Instruct model for chat use cases, fine-tuned using a variety of publicly available conversation data sets.

Mistral 7B is a transformative model and uses pooled query attention and sliding window attention to achieve faster inference (low latency) and handle longer sequences. Group query attention is an architecture that combines multi-query and multi-head attention to achieve output quality close to multi-head attention and speed comparable to multi-query attention. Sliding window attention uses the stacked layers of a transformer to serve in the past beyond the window size to increase the length of the context. Mistral 7B has a context length of 8000 tokens, demonstrates low latency and high throughput, and has strong performance compared to larger model alternatives, providing low memory requirements in a 7B model size. The model is available under permission. apache 2.0 licensefor unrestricted use.

What is SageMaker JumpStart?

With SageMaker JumpStart, machine learning professionals can choose from a growing list of top-performing base models. Machine learning professionals can deploy basic models to dedicated Amazon SageMaker instances within a network-isolated environment and customize models using SageMaker for model training and deployment.

You can now discover and deploy Mistral 7B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, allowing you to derive model performance and MLOps controls with SageMaker functions such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger or container registries. The model is deployed in a secure AWS environment and under the controls of your VPC, which helps ensure data security.

Discover models

You can access the basic Mistral 7B models through SageMaker JumpStart in the SageMaker Studio user interface and the SageMaker Python SDK. In this section, we go over how to discover models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single, web-based visual interface where you can access tools specifically designed to perform all ML development steps, from data preparation to authoring, training, and deployment. implementation of your ML models. For more details about getting started and setting up SageMaker Studio, see Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and pre-built solutions, at Pre-built and automated solutions.

From the SageMaker JumpStart home page, you can search for solutions, templates, notebooks, and other resources. You can find Mistral 7B in the Basic models: text generation carousel.

You can also find other model variants by choosing Explore all text models or searching for “Mistral”.

You can choose the model card to view details about the model, such as the license, the data used to train, and how to use it. You will also find two buttons, Deploy and open notebookwhich will help you use the model (the following screenshot shows the Deploy option).

Deploy models

Deployment starts when you choose Deploy. Alternatively, you can implement via the example notebook that appears when you choose open notebook. The example notebook provides comprehensive guidance on how to implement the model for inference and resource cleansing.

To deploy using a laptop, we start by selecting the Mistral 7B model, specified by the model_id. You can deploy any of the selected models in SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-mistral-7b-instruct")
predictor = model.deploy()

This deploys the model to SageMaker with default configurations, including the default instance type (ml.g5.2xlarge) and default VPC configurations. You can change these settings by specifying non-default values in JumpStartModel. Once deployed, you can run inference against the deployed endpoint via the SageMaker predictor:

payload = {"inputs": "<s>(INST) Hello! (/INST)"}
predictor.predict(payload)

Optimizing Deployment Configuration

Mistral models use the Text Generation Inference Models Service (TGI version 1.1). When deploying models with the TGI deep learning container (DLC), you can configure a variety of launcher arguments via environment variables when implementing your endpoint. To support the 8000 token context length of Mistral 7B models, SageMaker JumpStart has configured some of these parameters by default: we configure MAX_INPUT_LENGTH and MAX_TOTAL_TOKENS to 8191 and 8192, respectively. You can see the full list by inspecting its model object:

By default, SageMaker JumpStart does not block concurrent users via the environment variable. MAX_CONCURRENT_REQUESTS lower than the TGI default of 128. The reason is that some users may have typical workloads with small load context lengths and want high concurrency. Please note that the SageMaker TGI DLC supports multiple simultaneous users using continuous batching. When implementing your endpoint for your application, you might consider whether to set MAX_TOTAL_TOKENS either MAX_CONCURRENT_REQUESTS before deployment to provide the best performance for your workload:

model.env("MAX_CONCURRENT_REQUESTS") = "4"

Here, we show how the model’s performance may differ for your typical endpoint workload. In the following tables, you can see that small-sized queries (128 input words and 128 output tokens) perform quite well with a large number of concurrent users, achieving token throughput on the order of 1000 tokens per second. However, as the number of input words increases to 512 input words, the endpoint saturates its batching capacity (the number of concurrent requests it is allowed to process simultaneously), resulting in a plateau in performance. and significant latency degradations starting around 16 concurrent users. Finally, when multiple concurrent users query the endpoint with large input contexts (e.g. 6400 words) simultaneously, this performance plateau occurs relatively quickly, to the point where your SageMaker account will start to encounter timeouts. 60 second response time for your overloaded requests. .

.				throughput (tokens/s)
concurrent users				1	2	4	8	sixteen	32	64	128
model	instance type	entry words	exit tokens	.
mistral-7b-instruct	ml.g5.2xlarge	128	128	30	54	89	166	287	499	793	1030
		512	128	29	fifty	80	140	210	315	383	458
		6400	128	17	25	30	35	—	—	—	—

.				p50 latency (ms/token)
concurrent users				1	2	4	8	sixteen	32	64	128
model	instance type	input words	exit tokens	.
mistral-7b-instruct	ml.g5.2xlarge	128	128	32	33	3. 4	36	41	46	59	88
		512	128	3. 4	36	39	43	54	71	112	213
		6400	128	57	71	98	154	—	—	—	—

Inference and Example Indications

Mistral 7B

You can interact with a base Mistral 7B model like any standard text generation model, where the model processes an input sequence and generates the next predicted words in the sequence. The following is a simple example with multi-shot learning, where the model is given several examples and the final example response is generated with contextual knowledge of these previous examples:

> Input
Tweet: "I get sad when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been :+1:"
Sentiment: Positive
###
Tweet: "This is the link to the article"
Sentiment: Neutral
###
Tweet: "This new music video was incredibile"
Sentiment:

> Output
 Positive

Mistral Instruction 7B

The version adapted to the Mistral instructions accepts formatted instructions in which the conversation roles must begin with a user message and alternate between user and assistant. A simple user message may look like the following:

<s>(INST) {user_prompt} (/INST)

A multi-turn message would look like this:

<s>(INST) {user_prompt_1} (/INST) {assistant_response_1} </s><s>(INST) {user_prompt_1} (/INST)

This pattern is repeated during all the turns in the conversation.

In the following sections, we explore some examples using the Mistral 7B Instruct model.

Knowledge Recovery

The following is an example of knowledge retrieval:

> Input
<s>(INST) Which country has the most natural lakes? Answer with only the country name. (/INST) 

> Output
1. Canada

Answer to broad context questions

To demonstrate how to use this model to support long input contexts, the following example incorporates a passage titled “Rats” by Robert Sullivan (reference), from the MCAS Grade 10 English Language Arts Reading Comprehension Test to Cue Instruction and asks the model a directed question about the text:

> Input
<s>(INST) A rat is a rodent, the most common mammal in the world. Rattus norvegicus is one of the approximately four hundred different kinds of rodents, and it is known by many names, each of which describes a trait or a perceived trait or sometimes a habitat: the earth rat, the roving rat, the barn rat, the fi eld rat, the migratory rat, the house rat, the sewer rat, the water rat, the wharf rat, the alley rat, the gray rat, the brown rat, and the common rat. The average brown rat is large and stocky; it grows to be approximately sixteen inches long from its nose to its tail—the size of a large adult human male’s foot—and weighs about a pound, though brown rats have been measured by scientists and exterminators at twenty inches and up to two pounds. The brown rat is sometimes confused with the black rat, or Rattus rattus, which is smaller and once inhabited New York City and all of the cities of America but, since Rattus norvegicus pushed it out, is now relegated to a minor role. (The two species still survive alongside each other in some Southern coastal cities and on the West Coast, in places like Los Angeles, for example, where the black rat lives in attics and palm trees.) The black rat is always a very dark gray, almost black, and the brown rat is gray or brown, with a belly that can be light gray, yellow, or even a pure-seeming white. One spring, beneath the Brooklyn Bridge, I saw a red-haired brown rat that had been run over by a car. Both pet rats and laboratory rats are Rattus norvegicus, but they are not wild and therefore, I would emphasize, not the subject of this book. Sometimes pet rats are called fancy rats. But if anyone has picked up this book to learn about fancy rats, then they should put this book down right away; none of the rats mentioned herein are at all fancy.

Rats are nocturnal, and out in the night the brown rat’s eyes are small and black and shiny; when a fl ashlight shines into them in the dark, the eyes of a rat light up like the eyes of a deer. Though it forages* in darkness, the brown rat has poor eyesight. It makes up for this with, fi rst of all, an excellent sense of smell. . . . They have an excellent sense of taste, detecting the most minute amounts of poison, down to one part per million. A brown rat has strong feet, the two front paws each equipped with four clawlike nails, the rear paws even longer and stronger. It can run and climb with squirrel-like agility. It is an excellent swimmer, surviving in rivers and bays, in sewer streams and toilet bowls.

The brown rat’s teeth are yellow, the front two incisors being especially long and sharp, like buckteeth. When the brown rat bites, its front two teeth spread apart. When it gnaws, a fl ap of skin plugs the space behind its incisors. Hence, when the rat gnaws on indigestible materials—concrete or steel, for example—the shavings don’t go down the rat’s throat and kill it. Its incisors grow at a rate of fi ve inches per year. Rats always gnaw, and no one is certain why—there are few modern rat studies. It is sometimes erroneously stated that the rat gnaws solely to limit the length of its incisors, which would otherwise grow out of its head, but this is not the case: the incisors wear down naturally. In terms of hardness, the brown rat’s teeth are stronger than aluminum, copper, lead, and iron. They are comparable to steel. With the alligator-like structure of their jaws, rats can exert a biting pressure of up to seven thousand pounds per square inch. Rats, like mice, seem to be attracted to wires—to utility wires, computer wires, wires in vehicles, in addition to gas and water pipes. One rat expert theorizes that wires may be attractive to rats because of their resemblance to vines and the stalks of plants; cables are the vines of the city. By one estimate, 26 percent of all electric-cable breaks and 18 percent of all phone-cable disruptions are caused by rats. According to one study, as many as 25 percent of all fi res of unknown origin are rat-caused. Rats chew electrical cables. Sitting in a nest of tattered rags and newspapers, in the fl oorboards of an old tenement, a rat gnaws the head of a match—the lightning in the city forest.

When it is not gnawing or feeding on trash, the brown rat digs. Anywhere there is dirt in a city, brown rats are likely to be digging—in parks, in fl owerbeds, in little dirt-poor backyards. They dig holes to enter buildings and to make nests. Rat nests can be in the floorboards of apartments, in the waste-stuffed corners of subway stations, in sewers, or beneath old furniture in basements. “Cluttered and unkempt alleyways in cities provide ideal rat habitat, especially those alleyways associated with food-serving establishments,” writes Robert Corrigan in Rodent Control, a pest control manual. “Alley rats can forage safely within the shadows created by the alleyway, as well as quickly retreat to the safety of cover in these narrow channels.” Often, rats burrow under concrete sidewalk slabs. Entrance to a typical under-the-sidewalk rat’s nest is gained through a two-inch-wide hole—their skeletons collapse and they can squeeze into a hole as small as three quarters of an inch wide, the average width of their skull. This tunnel then travels about a foot down to where it widens into a nest or den. The den is lined with soft debris, often shredded plastic garbage or shopping bags, but sometimes even grasses or plants; some rat nests have been found stuffed with the gnawed shavings of the wood-based, spring-loaded snap traps that are used in attempts to kill them. The back of the den then narrows into a long tunnel that opens up on another hole back on the street. This second hole is called a bolt hole; it is an emergency exit. A bolt hole is typically covered lightly with dirt or trash—camoufl age. Sometimes there are networks of burrows, which can stretch beneath a few concrete squares on a sidewalk, or a number of backyards, or even an entire city block—when Rattus norvegicus fi rst came to Selkirk, England, in 1776, there were so many burrows that people feared the town might sink. Rats can also nest in basements, sewers, manholes, abandoned pipes of any kind, fl oorboards, or any hole or depression. “Often,” Robert Corrigan writes, “‘city rats’ will live unbeknownst to people right beneath their feet.”

Rats also inhabit subways, as most people in New York City and any city with a subway system are well aware. Every once in a while, there are reports of rats boarding trains, but for the most part rats stay on the tracks—subway workers I have talked to refer to rats as “track rabbits.” People tend to think that the subways are fi lled with rats, but in fact rats are not everywhere in the system; they live in the subways according to the supply of discarded human food and sewer leaks. Sometimes, rats use the subway purely for nesting purposes; they fi nd ways through the walls of the subway stations leading from the tracks to the restaurants and stores on the street—the vibrations of subway trains tend to create rat-size cracks and holes. Many subway rats tend to live near stations that are themselves near fast-food restaurants. At the various subway stations near Herald Square, for example, people come down from the streets and throw the food that they have not eaten onto the tracks, along with newspapers and soda bottles and, I have noticed, thousands of nolonger-charged AA batteries, waiting to leak acid. The rats eat freely from the waste and sit at the side of the little streams of creamy brown sewery water that fl ows between the rails. They sip the water the way rats do, either with their front paws or by scooping it up with their incisors.


What is the most likely reason the author states, "cables are the vines of the city"? (/INST) 


> Output
1. The author states that rats are attracted to wires, such as utility wires, computer wires, and vehicle wires, which may be due to their resemblance to vines and the stalks of plants.

Mathematics and reasoning

Mistral models also report strengths in mathematical precision. Mistral can provide understanding such as the following mathematical logic:

> Input
<s>(INST) I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering. (/INST)

> Output
1. First, let's figure out how much the ice cream cost in total. Since each cone was $1.25 and there were 6 kids, the total cost would be:

Total cost = Number of cones * Cost per cone
Total cost = 6 * $1.25
Total cost = $7.50

2. Next, we'll subtract the total cost from the amount paid ($10) to find out how much money was returned:

Money returned = Amount paid - Total cost
Money returned = $10 - $7.50
Money returned = $2.50

So, you got back $2.50 after buying the ice cream for the 6 kids.

Coding

The following is an example of an encoding message:

> Input
<s>(INST) In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month? (/INST) 


> Output
1. Use the `find` command to search for text files in the current directory (excluding subdirectories) that have been modified in the last month. The `-type f` option specifies that we only want to search for files, not directories. The `-name "*.txt"` option specifies that we only want to search for files with the `.txt` extension. The `-mtime -30` option specifies that we only want to search for files that have been modified in the last 30 days (1 month).

```bash
find . -type f -name "*.txt" -mtime -30
```

This command will list all text files in the current directory (excluding subdirectories) that have been modified in the last month.

Clean

Once you have finished running the notebook, be sure to delete all the resources you created in the process so that your billing stops. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we show you how to get started with Mistral 7B in SageMaker Studio and deploy the model for inference. Since the base models are pre-trained, they can help reduce training and infrastructure costs and allow customization for your use case. Visit Amazon SageMaker JumpStart now to get started.

Resources

About the authors

Dr. Kyle Ulrich is an applied scientist on the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, non-parametric Bayesian processes, and Gaussian processes. His PhD is from Duke University and he has published articles in NeurIPS, Cell, and Neuron.

Dr Ashish Khaitan He is a Senior Applied Scientist at Amazon SageMaker JumpStart and helps develop machine learning algorithms. He earned his doctorate from the University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published numerous papers at NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Vivek Singh is a product manager for Amazon SageMaker JumpStart. It focuses on enabling customers to incorporate SageMaker JumpStart to simplify and accelerate their machine learning journey to build generative ai applications.

Roy Allela is a Senior Solutions Architect specializing in ai/ML at AWS based in Munich, Germany. Roy helps AWS customers (from small startups to large enterprises) efficiently train and deploy large language models on AWS. Roy is passionate about computational optimization problems and improving the performance of ai workloads.

Mistral AI’s Mistral 7B foundation models are now available on Amazon SageMaker JumpStart

Technical Terrence Team

'No more flights for you': travel agency sounds alarm over risk of 'carbon passports'

Leave a Reply Cancel reply

Recommended.

Belouga: How to use it for teaching

Latest UN report calls for 'unprecedented' emissions cuts to save climate goals

Top 10 Monadas projects to try this week: March 24

What is the ERC 404 Token Standard on Ethereum (2024)?

Walmart posts strong quarter thanks to higher-income shoppers

Categories

Important Links