In Part 1 of this series, we introduced the newly launched ModelTrainer class in the Amazon SageMaker Python SDK and its benefits, and showed you how to fine-tune a Meta Llama 3.1 8B model on a custom dataset. In this post, we look at the enhancements to the ModelBuilder class, which lets you seamlessly deploy a model from ModelTrainer to a SageMaker endpoint, and provides a single interface for multiple deployment configurations.
In November 2023, we released the ModelBuilder class (see Package and deploy models faster with new tools and guided workflows in Amazon SageMaker and Package and deploy classical ML and LLMs easily with Amazon SageMaker, part 1: PySDK Improvements), which reduced the initial setup complexity of creating a SageMaker endpoint, such as creating an endpoint configuration, choosing the container, handling serialization and deserialization, and more, and helps you create a deployable model in a single step. The recent update improves the usability of the ModelBuilder class for a wide range of use cases, particularly in the rapidly evolving field of generative AI. In this post, we dive deep into the enhancements made to the ModelBuilder class, and show you how to seamlessly deploy the fine-tuned model from Part 1 to a SageMaker endpoint.
Improvements to the ModelBuilder class
We have made the following usability improvements to the ModelBuilder class:
- Seamless transition from training to inference – ModelBuilder now integrates directly with the SageMaker training interfaces, so that the correct file path to the latest trained model artifact is automatically computed, simplifying the workflow from model training to deployment.
- Unified inference interface – Previously, the SageMaker SDK offered separate interfaces and workflows for different types of inference, such as real-time, batch, serverless, and asynchronous inference. To simplify the model deployment process and provide a consistent experience, we have enhanced ModelBuilder to serve as a unified interface that supports multiple types of inference.
- Simplified development, testing, and path to production – We added local mode testing support to ModelBuilder so that users can effortlessly debug and test their processing and inference scripts with faster local testing without a container, and a new function that fetches the latest container image for a given framework, so you don't need to update your code every time a new LMI (Large Model Inference) container version is released.
- Customizable inference pre- and post-processing – ModelBuilder now allows you to customize pre- and post-processing steps for inference. By allowing scripts to filter content and remove personally identifiable information (PII), this integration streamlines the deployment process, encapsulating the necessary steps within the model configuration for better management and deployment of models with specific inference requirements.
- Benchmarking support – New benchmarking support in ModelBuilder allows you to evaluate deployment options, such as endpoints and containers, based on key performance metrics such as latency and cost. With the introduction of a benchmarking API, you can test scenarios and make informed decisions, optimizing your models for maximum performance before production. This improves efficiency and helps you make cost-effective deployment decisions.
In the following sections, we discuss these improvements in more detail and demonstrate how to customize, test, and deploy your model.
Seamless deployment from the ModelTrainer class
ModelBuilder integrates seamlessly with the ModelTrainer class; you can simply pass the ModelTrainer object that was used to train the model directly to ModelBuilder in the model parameter. In addition to ModelTrainer, ModelBuilder also supports the Estimator class and the output of SageMaker Core's TrainingJob.create(), and automatically parses the model artifacts to create a SageMaker Model object. With resource chaining, you can build and deploy the model as shown in the following example. If you followed Part 1 of this series to fine-tune a Meta Llama 3.1 8B model, you can pass the model_trainer object as follows:
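The following is a minimal sketch of that flow (not verbatim from Part 1); the role, image_uri, and sample request/response values are placeholder assumptions you should replace with your own:

from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# SchemaBuilder infers request/response serialization from sample payloads
schema = SchemaBuilder(
    sample_input="What is the capital of France?",    # hypothetical prompt
    sample_output="The capital of France is Paris.",  # hypothetical completion
)

model_builder = ModelBuilder(
    model=model_trainer,  # ModelTrainer object from Part 1; the latest artifact path is resolved automatically
    role_arn=role,        # assumed IAM execution role
    image_uri=image_uri,  # assumed inference container image
    schema_builder=schema,
    instance_type="ml.g5.12xlarge",
)
model = model_builder.build()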
Customize the model using InferenceSpec
The InferenceSpec class allows you to customize the model by providing custom logic to load and invoke the model, and specify any preprocessing or postprocessing logic as needed. For SageMaker endpoints, preprocessing and postprocessing scripts are often used as part of the inference pipeline to handle tasks required before and after the data is sent to the model for predictions, especially in the case of complex workflows or non-standard models. The following example shows how you can specify custom logic using InferenceSpec:
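The following is a minimal sketch of a custom InferenceSpec, assuming a Hugging Face text-generation model; the load and invoke bodies are illustrative, not the exact code from the original notebook:

from sagemaker.serve.spec.inference_spec import InferenceSpec
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

class CustomInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        # Custom logic to load the fine-tuned model and tokenizer from the artifact directory
        model = AutoModelForCausalLM.from_pretrained(model_dir)
        tokenizer = AutoTokenizer.from_pretrained(model_dir)
        return pipeline("text-generation", model=model, tokenizer=tokenizer)

    def invoke(self, input_object, model):
        # Custom pre- and post-processing around the model call:
        # generate a completion and return only the generated text
        outputs = model(input_object, max_new_tokens=128)
        return outputs[0]["generated_text"]

inference_spec = CustomInferenceSpec()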
Test using in process and local modes
Deploying a trained model to a SageMaker endpoint involves creating a SageMaker model and configuring the endpoint. This includes the inference script, any serialization or deserialization required, the model artifact location in Amazon Simple Storage Service (Amazon S3), the container image URI, the right instance type and count, and more. Machine learning (ML) practitioners need to iterate over these settings before finally deploying the endpoint to SageMaker for inference. ModelBuilder offers two modes for quick prototyping:
- In process mode – In this case, the inference is made directly within the same Python process. This is useful for quickly testing the inference logic provided through InferenceSpec, and provides immediate feedback during experimentation.
- Local mode – The model is deployed and run as a local container. This is achieved by setting the mode to LOCAL_CONTAINER when you build the model. This is helpful to mimic the same environment as the SageMaker endpoint. See the following notebook for an example.
The following code is an example of running inference in process mode, with a custom InferenceSpec:
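A minimal sketch, reusing the inference_spec and schema objects defined earlier (the Mode import path reflects the current SDK layout; verify it against your installed version):

from sagemaker.serve.mode.function_pointers import Mode

model_builder = ModelBuilder(
    inference_spec=inference_spec,  # custom load/invoke logic defined earlier
    schema_builder=schema,
    mode=Mode.IN_PROCESS,           # run inference inside the current Python process
)
model = model_builder.build()
predictor = model.deploy()
print(predictor.predict("What is the capital of France?"))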
As a next step, you can test it in local container mode as shown in the following code, by adding the image_uri. You will need to include the model_server argument when you include the image_uri.
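A sketch of the same builder in local container mode; the ModelServer import path and the DJL_SERVING choice are assumptions for an LMI-style container, and image_uri is a placeholder:

from sagemaker.serve.utils.types import ModelServer  # import path may differ across SDK versions

image_uri = "<LMI container image URI for your Region>"  # placeholder

model_builder = ModelBuilder(
    inference_spec=inference_spec,
    schema_builder=schema,
    mode=Mode.LOCAL_CONTAINER,             # run the model in a container on the local machine
    image_uri=image_uri,
    model_server=ModelServer.DJL_SERVING,  # required whenever image_uri is set
)
model = model_builder.build()
predictor = model.deploy()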
Deploy the model
When testing is complete, you can deploy the model to a real-time endpoint for predictions by updating the mode to Mode.SAGEMAKER_ENDPOINT and providing an instance type and size:
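A sketch under the same assumptions as the earlier examples (role and instance values are placeholders):

model_builder = ModelBuilder(
    model=model_trainer,
    schema_builder=schema,
    mode=Mode.SAGEMAKER_ENDPOINT,  # deploy to a real-time SageMaker endpoint
    role_arn=role,
    instance_type="ml.g5.12xlarge",
)
model = model_builder.build()
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
)
print(predictor.predict("What is the capital of France?"))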
In addition to real-time inference, SageMaker supports serverless inference, asynchronous inference, and batch inference deployment modes. You can also use InferenceComponents to abstract your models and assign CPUs, GPUs, accelerators, and scaling policies per model. For more information, see Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker.
After you have the ModelBuilder object, you can deploy to any of these options by simply adding the corresponding inference configurations when deploying the model. By default, if no mode is provided, the model is deployed to a real-time endpoint. The following are examples of other configurations:
- Deploy a serverless endpoint using ServerlessInferenceConfig:

from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig

predictor = model_builder.deploy(
    endpoint_name="serverless-endpoint",
    inference_config=ServerlessInferenceConfig(memory_size_in_mb=2048),
)
- Deploy a multi-model endpoint using InferenceComponent:
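A sketch using the SDK's ResourceRequirements class; the endpoint name and resource numbers are placeholders:

from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

predictor = model_builder.deploy(
    endpoint_name="inference-component-endpoint",
    inference_config=ResourceRequirements(
        requests={
            "num_accelerators": 1,  # accelerators per model copy
            "memory": 8192,         # memory in MB per copy
            "copies": 1,            # number of model copies behind the endpoint
        },
        limits={},
    ),
)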
Clean up
If you created any endpoints by following this post, you will incur charges while they are running. As a best practice, delete any endpoints you no longer need, either through the AWS Management Console or with the following code:
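A minimal sketch using the predictor object returned by deploy:

# Delete the model and the endpoint to stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()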
Conclusion
In this two-part series, we introduced the ModelTrainer and ModelBuilder improvements in the SageMaker Python SDK. Both classes aim to reduce complexity and cognitive overhead for data scientists, providing you with a straightforward and intuitive interface to train and deploy models, both locally in your SageMaker notebooks and on remote SageMaker endpoints.
We encourage you to try out the SageMaker SDK enhancements (SageMaker Core, ModelTrainer, and ModelBuilder) by referring to the SDK documentation and sample notebooks in the GitHub repository (https://github.com/aws/amazon-sagemaker-examples/tree/default/deploy_and_monitor/sm-model_builder), and let us know your feedback in the comments!
About the authors
Durga Sury is a Senior Solutions Architect on the Amazon SageMaker team. Over the past 5 years, she has worked with multiple enterprise customers to set up a secure, scalable AI/ML platform built on SageMaker.
Shweta Singh is a Senior Product Manager on the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a bachelor's degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.