In Part 1 of this series, we introduced the newly launched ModelTrainer class in the Amazon SageMaker Python SDK and its benefits, and showed you how to fine-tune a Meta Llama 3.1 8B model on a custom dataset. In this post, we look at the enhancements to the ModelBuilder class, which lets you seamlessly deploy a model from ModelTrainer to a SageMaker endpoint, and provides a single interface for multiple deployment configurations.
In November 2023, we released the ModelBuilder class (see Package and deploy models faster with new tools and guided workflows in Amazon SageMaker and Package and deploy classical ML and LLMs easily with Amazon SageMaker, part 1: PySDK Improvements), which reduced the initial setup complexity of creating a SageMaker endpoint, such as creating an endpoint configuration, choosing the container, handling serialization and deserialization, and more, and helps you create a deployable model in a single step. The recent update improves the usability of the ModelBuilder class for a wide range of use cases, particularly in the rapidly evolving field of generative AI. In this post, we dive deep into the enhancements made to the ModelBuilder class, and show you how to seamlessly deploy the fine-tuned model from Part 1 to a SageMaker endpoint.
Improvements to the ModelBuilder class
We have made the following usability improvements to the ModelBuilder class:
- Seamless transition from training to inference – ModelBuilder now integrates directly with the SageMaker training interfaces, so that the correct file path to the latest trained model artifact is automatically computed, simplifying the workflow from model training to deployment.
- Unified inference interface – Previously, the SageMaker SDK offered separate interfaces and workflows for different types of inference, such as real-time, batch, serverless, and asynchronous inference. To simplify the model deployment process and provide a consistent experience, we have enhanced ModelBuilder to serve as a unified interface that supports multiple types of inference.
- Simplified development, testing, and path to production – We added local mode testing support to ModelBuilder so that users can effortlessly debug and test their processing and inference scripts with faster local testing without a container, and a new function that fetches the latest container image for a given framework, so you don't need to update your code every time a new LMI (Large Model Inference) container version is released.
- Customizable inference pre- and post-processing – ModelBuilder now allows you to customize pre- and post-processing steps for inference. By allowing scripts to filter content and remove personally identifiable information (PII), this integration streamlines the deployment process, encapsulating the necessary steps within the model configuration for better management and deployment of models with specific inference requirements.
- Benchmarking support – New benchmarking support in ModelBuilder allows you to evaluate deployment options, such as endpoints and containers, based on key performance metrics such as latency and cost. With the introduction of a benchmarking API, you can test scenarios and make informed decisions, optimizing your models for maximum performance before production. This improves efficiency and helps you make cost-effective deployment decisions.
In the following sections, we discuss these improvements in more detail and demonstrate how to customize, test, and deploy your model.
Seamless deployment from the ModelTrainer class
ModelBuilder integrates seamlessly with the ModelTrainer class; you can simply pass the ModelTrainer object that was used to train the model directly to ModelBuilder in the model parameter. In addition to ModelTrainer, ModelBuilder also supports the Estimator class and the output of SageMaker Core's TrainingJob.create(), and automatically parses the model artifacts to create a SageMaker Model object. With resource chaining, you can build and deploy the model as shown in the following example. If you followed Part 1 of this series to fine-tune a Meta Llama 3.1 8B model, you can pass the model_trainer object as follows:
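The following is a minimal sketch of that flow (not verbatim from Part 1); the role, image_uri, and sample request/response values are placeholder assumptions you should replace with your own:

from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# SchemaBuilder infers request/response serialization from sample payloads
schema = SchemaBuilder(
    sample_input="What is the capital of France?",    # hypothetical prompt
    sample_output="The capital of France is Paris.",  # hypothetical completion
)

model_builder = ModelBuilder(
    model=model_trainer,  # ModelTrainer object from Part 1; the latest artifact path is resolved automatically
    role_arn=role,        # assumed IAM execution role
    image_uri=image_uri,  # assumed inference container image
    schema_builder=schema,
    instance_type="ml.g5.12xlarge",
)
model = model_builder.build()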
Customize the model using InferenceSpec
The InferenceSpec class allows you to customize the model by providing custom logic to load and invoke the model, and specify any preprocessing or postprocessing logic as needed. For SageMaker endpoints, preprocessing and postprocessing scripts are often used as part of the inference pipeline to handle tasks required before and after the data is sent to the model for predictions, especially in the case of complex workflows or non-standard models. The following example shows how you can specify custom logic using InferenceSpec:
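The following is a minimal sketch of a custom InferenceSpec, assuming a Hugging Face text-generation model; the load and invoke bodies are illustrative, not the exact code from the original notebook:

from sagemaker.serve.spec.inference_spec import InferenceSpec
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

class CustomInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        # Custom logic to load the fine-tuned model and tokenizer from the artifact directory
        model = AutoModelForCausalLM.from_pretrained(model_dir)
        tokenizer = AutoTokenizer.from_pretrained(model_dir)
        return pipeline("text-generation", model=model, tokenizer=tokenizer)

    def invoke(self, input_object, model):
        # Custom pre- and post-processing around the model call:
        # generate a completion and return only the generated text
        outputs = model(input_object, max_new_tokens=128)
        return outputs[0]["generated_text"]

inference_spec = CustomInferenceSpec()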
Test using in process and local modes
Deploying a trained model to a SageMaker endpoint involves creating a SageMaker model and configuring the endpoint. This includes the inference script, any serialization or deserialization required, the model artifact location in Amazon Simple Storage Service (Amazon S3), the container image URI, the right instance type and count, and more. Machine learning (ML) practitioners need to iterate over these settings before finally deploying the endpoint to SageMaker for inference. ModelBuilder offers two modes for quick prototyping:
- In process mode – In this case, the inference is made directly within the same Python process. This is useful for quickly testing the inference logic provided through InferenceSpec, and provides immediate feedback during experimentation.
- Local mode – The model is deployed and run as a local container. This is achieved by setting the mode to LOCAL_CONTAINER when you build the model. This is helpful to mimic the same environment as the SageMaker endpoint. See the following notebook for an example.
The following code is an example of running inference in process mode, with a custom InferenceSpec:
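A minimal sketch, reusing the inference_spec and schema objects defined earlier (the Mode import path reflects the current SDK layout; verify it against your installed version):

from sagemaker.serve.mode.function_pointers import Mode

model_builder = ModelBuilder(
    inference_spec=inference_spec,  # custom load/invoke logic defined earlier
    schema_builder=schema,
    mode=Mode.IN_PROCESS,           # run inference inside the current Python process
)
model = model_builder.build()
predictor = model.deploy()
print(predictor.predict("What is the capital of France?"))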
As a next step, you can test it in local container mode as shown in the following code, by adding the image_uri. You will need to include the model_server argument when you include the image_uri.
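A sketch of the same builder in local container mode; the ModelServer import path and the DJL_SERVING choice are assumptions for an LMI-style container, and image_uri is a placeholder:

from sagemaker.serve.utils.types import ModelServer  # import path may differ across SDK versions

image_uri = "<LMI container image URI for your Region>"  # placeholder

model_builder = ModelBuilder(
    inference_spec=inference_spec,
    schema_builder=schema,
    mode=Mode.LOCAL_CONTAINER,             # run the model in a container on the local machine
    image_uri=image_uri,
    model_server=ModelServer.DJL_SERVING,  # required whenever image_uri is set
)
model = model_builder.build()
predictor = model.deploy()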
Deploy the model
When testing is complete, you can deploy the model to a real-time endpoint for predictions by updating the mode to Mode.SAGEMAKER_ENDPOINT and providing an instance type and size:
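A sketch under the same assumptions as the earlier examples (role and instance values are placeholders):

model_builder = ModelBuilder(
    model=model_trainer,
    schema_builder=schema,
    mode=Mode.SAGEMAKER_ENDPOINT,  # deploy to a real-time SageMaker endpoint
    role_arn=role,
    instance_type="ml.g5.12xlarge",
)
model = model_builder.build()
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
)
print(predictor.predict("What is the capital of France?"))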
In addition to real-time inference, SageMaker supports serverless inference, asynchronous inference, and batch inference deployment modes. You can also use InferenceComponents to abstract your models and assign CPUs, GPUs, accelerators, and scaling policies per model. For more information, see Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker.
After you have the ModelBuilder object, you can deploy to any of these options by simply adding the corresponding inference configurations when deploying the model. By default, if no mode is provided, the model is deployed to a real-time endpoint. The following are examples of other configurations:
- Deploy a serverless endpoint using ServerlessInferenceConfig:

from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig

predictor = model_builder.deploy(
    endpoint_name="serverless-endpoint",
    inference_config=ServerlessInferenceConfig(memory_size_in_mb=2048),
)
- Deploy a multi-model endpoint using InferenceComponent:
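A sketch using the SDK's ResourceRequirements class; the endpoint name and resource numbers are placeholders:

from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

predictor = model_builder.deploy(
    endpoint_name="inference-component-endpoint",
    inference_config=ResourceRequirements(
        requests={
            "num_accelerators": 1,  # accelerators per model copy
            "memory": 8192,         # memory in MB per copy
            "copies": 1,            # number of model copies behind the endpoint
        },
        limits={},
    ),
)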
Clean up
If you created any endpoints by following this post, you will incur charges while they are running. As a best practice, delete any endpoints you no longer need, either through the AWS Management Console or with the following code:
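A minimal sketch using the predictor object returned by deploy:

# Delete the model and the endpoint to stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()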
Conclusion
In this two-part series, we introduced the ModelTrainer and ModelBuilder improvements in the SageMaker Python SDK. Both classes aim to reduce complexity and cognitive overhead for data scientists, providing you with a straightforward and intuitive interface to train and deploy models, both locally in your SageMaker notebooks and on remote SageMaker endpoints.
We encourage you to try out the SageMaker SDK enhancements (SageMaker Core, ModelTrainer, and ModelBuilder) by referring to the SDK documentation and sample notebooks in the GitHub repository (https://github.com/aws/amazon-sagemaker-examples/tree/default/deploy_and_monitor/sm-model_builder), and let us know your feedback in the comments!
About the authors
Durga Sury is a Senior Solutions Architect on the Amazon SageMaker team. Over the past 5 years, she has worked with multiple enterprise customers to set up a secure, scalable AI/ML platform built on SageMaker.
Shweta Singh is a Senior Product Manager on the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a bachelor's degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.