This blog post is co-written with Hwalsuk Lee at Upstage.
Today, we are pleased to announce that the Solar foundation model developed by Upstage is now available for customers using Amazon SageMaker JumpStart. Solar is a large language model (LLM) that is 100% pre-trained with Amazon SageMaker, and it uses its compact size and strong track record to specialize in purpose-driven training, making it versatile across languages, domains, and tasks.
You can now use the Solar Mini Chat and Solar Mini Chat – Quant pretrained models within SageMaker JumpStart. SageMaker JumpStart is the machine learning (ML) hub of SageMaker that provides access to foundation models in addition to built-in algorithms, helping you get started with ML quickly.
In this post, we walk through how to discover and deploy the Solar model via SageMaker JumpStart.
What is the Solar model?
Solar is a compact and powerful model for English and Korean languages. It is specifically optimized for multi-turn chat purposes, demonstrating improved performance on a wide range of natural language processing tasks.
The Solar Mini Chat model is based on Solar 10.7B, which has a 32-layer Llama 2 structure and is initialized with pre-trained weights from Mistral 7B, which is compatible with the Llama 2 architecture. This gives it the ability to handle extended conversations more effectively, making it particularly suited for interactive applications. It uses a scaling method called depth up-scaling (DUS), which consists of depthwise scaling followed by continued pretraining. DUS allows for much simpler and more efficient enlargement of smaller models than other scaling methods such as mixture of experts (MoE).
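For intuition only, the following toy sketch illustrates the depthwise-scaling step. This is not Upstage's implementation; the layer counts follow the published description of Solar 10.7B, in which two copies of a 32-layer base model are trimmed and stacked into a 48-layer model before continued pretraining:

```python
# Toy illustration of depthwise scaling, the first stage of DUS.
# Transformer blocks are represented by their layer indices only.
n_base, trim = 32, 8                            # base depth; layers trimmed per copy

copy_a = list(range(n_base))[: n_base - trim]   # keep the first 24 layers
copy_b = list(range(n_base))[trim:]             # keep the last 24 layers
scaled_model = copy_a + copy_b                  # a 48-layer stack

print(len(scaled_model))  # 48; the enlarged model then undergoes continued pretraining
```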
In December 2023, the Solar 10.7B model made waves by reaching the top of the Hugging Face Open LLM Leaderboard. Using noticeably fewer parameters, Solar 10.7B delivers responses comparable to GPT-3.5, but is 2.5 times faster. In addition to topping the Open LLM Leaderboard, Solar 10.7B outperforms GPT-4 with models purpose-trained on specific domains and tasks.
The following figure illustrates some of these metrics:
With SageMaker JumpStart, you can deploy two pre-trained models based on Solar 10.7B: Solar Mini Chat and a quantized version of Solar Mini Chat, both optimized for chat applications in English and Korean. The Solar Mini Chat model provides an advanced grasp of Korean language nuances, which significantly elevates user interactions in chat environments. It provides precise responses to user input, enabling clearer communication and more efficient problem resolution in English and Korean chat applications.
Get started with Solar models in SageMaker JumpStart
To get started with Solar models, you can use SageMaker JumpStart, a fully managed ML hub service to deploy pre-built ML models into a production-ready hosted environment. You can access Solar models through SageMaker JumpStart in Amazon SageMaker Studio, a web-based integrated development environment (IDE) where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models.
In the SageMaker Studio console, choose JumpStart in the navigation pane. You can enter "solar" in the search bar to find Upstage's Solar models.
Figure: Searching for the Solar model in Amazon SageMaker JumpStart
Let's deploy the Solar Mini Chat – Quant model. Choose the model card to view details about the model, such as the license, the data used to train it, and how to use it. You will also find a Deploy option, which takes you to a landing page where you can test inference with an example payload.
This model requires a subscription to AWS Marketplace. If you have already subscribed to this model and received approval to use the product, you can deploy the model directly.
If you have not subscribed to this model, choose Subscribe, go to AWS Marketplace, review the pricing terms and the End User License Agreement (EULA), and choose Accept offer.
After you subscribe to the model, you can deploy it to a SageMaker endpoint by selecting the deployment resources, such as the instance type and initial instance count. Choose Deploy and wait for an endpoint to be created for model inference. You can select an ml.g5.2xlarge instance, for example, as a lower-cost option for inference with the Solar model.
Once your SageMaker endpoint has been successfully created, you can test it in the various SageMaker application environments.
Run your code for Solar models in SageMaker Studio JupyterLab
SageMaker Studio supports various application development environments, including JupyterLab, a set of capabilities that augment the fully managed notebook offering. It includes kernels that start up in seconds, a preconfigured runtime with popular data science and ML frameworks, and high-performance private block storage. For more information, see SageMaker JupyterLab.
Create a JupyterLab space within SageMaker Studio that manages the storage and compute resources required to run the JupyterLab application.
You can find code that shows how to deploy Solar models in SageMaker JumpStart, along with an example of how to use the deployed model, in the GitHub repository. You can now deploy the model using SageMaker JumpStart, using the default ml.g5.2xlarge instance for the Solar Mini Chat – Quant model inference endpoint.
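The following is a minimal sketch of such a deployment using the SageMaker Python SDK. The model ID shown is an illustrative placeholder; look up the exact ID on the model card in SageMaker JumpStart:

```python
import sagemaker
from sagemaker.jumpstart.model import JumpStartModel

# Illustrative model ID; verify the exact ID on the JumpStart model card
model_id = "upstage-solar-mini-chat-quant"

model = JumpStartModel(model_id=model_id)

# Deploy a real-time inference endpoint on the default instance type
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```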
Solar models support a request/response payload compatible with the OpenAI Chat Completions API. You can try single-turn or multi-turn chat examples in Python.
import boto3
import json
import sagemaker

# Get a SageMaker runtime client and the endpoint name
# (model_name is the base name used when the model was deployed)
sagemaker_runtime = boto3.client("sagemaker-runtime")
endpoint_name = sagemaker.utils.name_from_base(model_name)

# Multi-turn chat prompt example, following the OpenAI Chat Completions format
payload = {
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Can you provide a Python script to merge two sorted lists?"
        },
        {
            "role": "assistant",
            "content": """Sure, here is a Python script to merge two sorted lists:
```python
def merge_lists(list1, list2):
    return sorted(list1 + list2)
```
"""
        },
        {
            "role": "user",
            "content": "Can you provide an example of how to use this function?"
        }
    ]
}

# Invoke the endpoint and decode the JSON response
response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload)
)
result = json.loads(response["Body"].read().decode())
print(result)
You have successfully performed real-time inference with the Solar Mini Chat model.
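Because the request follows the OpenAI Chat Completions format, the response typically mirrors that shape. Assuming an OpenAI-style response (verify the actual schema in the model's documentation), you can extract just the assistant's reply as follows:

```python
# Assumes an OpenAI-style response with a "choices" list; verify the
# actual schema in the model's documentation before relying on it
assistant_reply = result["choices"][0]["message"]["content"]
print(assistant_reply)
```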
Clean up
After you have tested the endpoint, delete the SageMaker inference endpoint and delete the model to avoid incurring charges.
You can also run the following code to delete the endpoint and model in the SageMaker Studio JupyterLab notebook:
# Delete the endpoint
model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)
# Delete the model
model.delete_model()
For more information, see Delete Endpoints and Resources. Additionally, you can shut down the SageMaker Studio resources that are no longer needed.
Conclusion
In this post, we showed you how to get started with Upstage's Solar models in SageMaker Studio and deploy the model for inference. We also showed you how to run the Python sample code in SageMaker Studio JupyterLab.
Because Solar models come pre-trained, they can help lower training and infrastructure costs and enable customization for your generative AI applications.
Try it out in the SageMaker JumpStart console or the SageMaker Studio console! You can also watch the following video, Try 'Solar' with Amazon SageMaker.
Video: Try 'Solar' with Amazon SageMaker JumpStart! | Upstage LLM (https://www.youtube-nocookie.com/embed/e2ehr1oBqnA)
This guidance is for informational purposes only. You should still perform your own independent assessment and take measures to ensure that you comply with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses, and terms of use that apply to you, your content, and the third-party model referenced in this guidance. AWS has no control or authority over the third-party model referenced in this guidance and does not make any representations or warranties that the third-party model is secure, virus-free, operational, or compatible with your production environment and standards. AWS does not make any representations or warranties that information in this guidance will result in a particular outcome or result.
About the authors
Channy Yun is a Principal Developer Advocate at AWS and is passionate about helping developers build modern applications on the latest AWS services. He is a pragmatic developer and blogger at heart, and he loves community-driven learning and sharing of technology.
Hwalsuk Lee is Chief Technology Officer (CTO) at Upstage. He has worked for Samsung Techwin, NCSOFT, and Naver as an AI researcher. He is pursuing his PhD in Electrical and Computer Engineering at the Korea Advanced Institute of Science and Technology (KAIST).
Brandon Lee is a Senior Solutions Architect at AWS and primarily helps large educational technology customers in the public sector. He has over 20 years of experience leading application development at global companies and large corporations.