Generative AI foundation models (FMs) are gaining popularity among enterprises because of their versatility and potential to address a variety of use cases. The true value of FMs is realized when they are tailored to domain-specific data. Managing these models across the business and model lifecycle can introduce complexity. As FMs are tailored to different domains and data, operationalizing these pipelines becomes critical.
Amazon SageMaker, a fully managed service for building, training, and deploying machine learning (ML) models, has seen increased adoption for customizing and deploying the ML models that power generative AI applications. SageMaker offers advanced features for building automated workflows to deploy models at scale. One of the key features that enables operational excellence around model management is the Model Registry. The Model Registry helps catalog and manage model versions and facilitates collaboration and governance. When a model is trained and evaluated for performance, it can be stored in the Model Registry for management.
Amazon SageMaker has released new features in Model Registry that make it easier to version and catalog foundation models. Customers can use SageMaker to train or fine-tune FMs, including Amazon SageMaker JumpStart and Amazon Bedrock models, and also manage these models within Model Registry. As customers begin to scale generative AI applications across various use cases, such as fine-tuning for domain-specific tasks, the number of models can grow rapidly. To keep track of models, versions, and associated metadata, SageMaker Model Registry can be used as a model inventory.
In this post, we explore new features in Model Registry that streamline FM management: you can now register unpacked model artifacts and pass an End User License Agreement (EULA) acceptance flag without requiring any user intervention.
Overview
Model Registry has worked well for traditional models, which are smaller in size. FMs posed challenges because of their size and the user intervention required for EULA acceptance. With the new features of Model Registry, it has become easier to register a fine-tuned FM within Model Registry, which can then be deployed for real-world use.
A typical model development lifecycle is an iterative process. We run many experimentation cycles to achieve the expected model performance. Once trained, these models can be registered in the Model Registry, where they are cataloged as versions. Models can be organized into model groups, versions can be compared based on their quality metrics, and each model can carry an approval status indicating whether it can be deployed.
Once the model is manually approved, a continuous integration and continuous deployment (CI/CD) workflow can be triggered to deploy these models into production. Optionally, Model Registry can be used as a repository for models that are approved for use by an enterprise. Multiple teams can then deploy these approved models from Model Registry and build applications around them.
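As an illustration, changing a version's approval status is a single update_model_package call in the AWS SDK for Python (boto3). The following is a minimal sketch; the model package ARN and approval description are placeholder values, and the EventBridge rule that would trigger the downstream CI/CD pipeline on this state change is not shown.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Approving a registered version is the signal a CI/CD pipeline can listen for
# (for example, via an EventBridge rule on model package state changes).
sm_client.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:<region>:<account>:model-package/<group>/1",
    ModelApprovalStatus="Approved",
    ApprovalDescription="Passed evaluation thresholds",  # example description
)
```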
An example workflow might follow these steps and is shown in the following diagram:
- Select a SageMaker JumpStart model and register it in the Model Registry
- Alternatively, you can fine-tune a SageMaker JumpStart model
- Evaluate the model using SageMaker Model Evaluation. SageMaker allows for human evaluation if desired.
- Create a model group in the Model Registry. For each run, create a version of the model. Add your model group to one or more Model Registry collections, which can be used to group registered models that are related to each other. For example, you might have a collection of large language models (LLMs) and another collection of diffusion models.
- Deploy models as SageMaker inference endpoints that can be consumed by generative AI applications.
Figure 1: Model Registry workflow for foundation models
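To make the workflow concrete, the following is a minimal boto3 sketch of creating a model group and registering one run as a version in it. The group name, image URI, and S3 path are placeholder values, not ones from this post.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Create a model group to hold related model versions (run once per group).
sm_client.create_model_package_group(
    ModelPackageGroupName="genai-text-models",  # example name
    ModelPackageGroupDescription="Fine-tuned FMs for text generation",
)

# Register one run's model as a new version in the group.
response = sm_client.create_model_package(
    ModelPackageGroupName="genai-text-models",
    ModelPackageDescription="Fine-tuned JumpStart model, run 1",
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
                "ModelDataUrl": "s3://<bucket>/models/run-1/model.tar.gz",
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)
print(response["ModelPackageArn"])  # use this ARN for approval and deployment
```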
To better support generative AI applications, Model Registry has released two new features: ModelDataSource and Source Model URI. In the following sections, we explore these features and how to use them.
ModelDataSource speeds up deployment and provides access to EULA-dependent models
Until now, model artifacts had to be stored in a compressed format along with the inference code when a model was registered in the Model Registry. This posed challenges for generative AI applications, where FMs are very large, with billions of parameters. Storing FMs as compressed archives increased SageMaker endpoint startup latency because decompressing these models at runtime was time-consuming. The model_data_source parameter can now accept the location of uncompressed model artifacts in Amazon Simple Storage Service (Amazon S3), simplifying the registration process. This also eliminates the need for endpoints to decompress model weights, reducing latency during endpoint startup.
Additionally, public JumpStart models and certain FMs from third-party providers, such as Llama 2, require their EULA to be accepted before the models can be used. Consequently, when public SageMaker JumpStart models were fine-tuned, they could not be stored in the Model Registry because a user needed to accept the license agreement. Model Registry now supports an EULA acceptance flag within the model_data_source parameter, allowing such models to be registered. Customers can now catalog, version, and associate metadata such as training metrics in Model Registry for a wider variety of FMs.
Register uncompressed models stored in Amazon S3 using the AWS SDK:
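The following is a minimal sketch of the boto3 create_model_package call with a ModelDataSource pointing at an S3 prefix of uncompressed artifacts; the group name, image URI, and S3 prefix are placeholders.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Register a model version whose artifacts live uncompressed under an S3 prefix.
sm_client.create_model_package(
    ModelPackageGroupName="genai-text-models",  # example group name
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
                "ModelDataSource": {
                    "S3DataSource": {
                        # Prefix holding the unpacked model weights and config.
                        "S3Uri": "s3://<bucket>/models/llm-finetuned/",
                        "S3DataType": "S3Prefix",
                        # "None" tells SageMaker the artifacts are not compressed.
                        "CompressionType": "None",
                    }
                },
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)
```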
Register models that require an EULA:
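For EULA-gated models, the registration call is the same sketch as above with an acceptance flag added inside S3DataSource; the names below remain placeholders, and you should review the model provider's EULA terms before setting the flag.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Same registration call as above, with EULA acceptance recorded on the data source.
sm_client.create_model_package(
    ModelPackageGroupName="genai-text-models",  # example group name
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
                "ModelDataSource": {
                    "S3DataSource": {
                        "S3Uri": "s3://<bucket>/models/llama2-finetuned/",
                        "S3DataType": "S3Prefix",
                        "CompressionType": "None",
                        # Records EULA acceptance for gated models such as Llama 2,
                        # so no manual acceptance step is needed at deployment.
                        "ModelAccessConfig": {"AcceptEula": True},
                    }
                },
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)
```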
The source model URI provides simplified registration and support for proprietary models
Model Registry now supports automatic population of the inference specification for some well-known model identifiers, including select models from AWS Marketplace, hosted models, and versioned model packages in Model Registry. Because SourceModelURI supports automatic population, you can register proprietary JumpStart models from providers such as AI21 Labs, Cohere, and LightOn without needing an inference specification file, enabling your organization to use a broader set of FMs in Model Registry.
Previously, to register a trained model in the SageMaker Model Registry, you needed to provide the full inference specification required for deployment, including an Amazon Elastic Container Registry (Amazon ECR) image and the trained model file. With the release of source_uri support, SageMaker has made it easy to register any model by providing a Source Model URI, a free-form field that stores a model ID or location, such as a proprietary JumpStart or Amazon Bedrock model ID, an S3 location, or an MLflow model ID. Instead of having to provide the details required for deployment to SageMaker hosting at registration time, you can add the artifacts later. After registration, to deploy a model, you package the model as an inference specification and update the Model Registry accordingly.
For example, you can register a model in the Model Registry with a SourceURI set to the model's Amazon Resource Name (ARN).
You can then update the registered model with the inference specification, making it deployable in SageMaker.
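The following sketch shows both steps with boto3: registering by SourceUri alone, then attaching an inference specification later. The ARNs, image URI, and S3 prefix are placeholder values.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Register a model version using only a free-form source URI (here, a model ARN).
response = sm_client.create_model_package(
    ModelPackageGroupName="genai-text-models",  # example group name
    SourceUri="arn:aws:sagemaker:<region>:<account>:model/<model-name>",
)
model_package_arn = response["ModelPackageArn"]

# Later, when the model is ready to deploy, attach an inference specification.
sm_client.update_model_package(
    ModelPackageArn=model_package_arn,
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
                "ModelDataSource": {
                    "S3DataSource": {
                        "S3Uri": "s3://<bucket>/models/llm-finetuned/",
                        "S3DataType": "S3Prefix",
                        "CompressionType": "None",
                    }
                },
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)
```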
Register an Amazon SageMaker JumpStart proprietary FM:
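For a proprietary JumpStart model, the source URI is the model's identifier, and the inference specification is populated automatically for supported model IDs. A minimal sketch follows; the model ID shown is a hypothetical placeholder, so substitute the identifier of the proprietary model you are entitled to use.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Register a proprietary JumpStart model by its source identifier; for supported
# model IDs, Model Registry populates the inference specification automatically.
sm_client.create_model_package(
    ModelPackageGroupName="genai-text-models",  # example group name
    SourceUri="<proprietary-jumpstart-model-id>",  # placeholder model ID
)
```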
Conclusion
As organizations continue to adopt generative AI in different areas of their business, having robust model management and version control is critical. With Model Registry, you can achieve version control, tracking, collaboration, lifecycle management, and governance of FMs.
In this post, we explored how Model Registry can now more effectively support the management of generative AI models throughout the model lifecycle, enabling you to better govern and adopt generative AI to achieve transformative outcomes.
For more information about Model Registry, see Register and deploy models with Model Registry. To get started, visit the SageMaker console.
About the authors
Chaitra Mathur is a Principal Solutions Architect at AWS, where she advises customers on building robust, scalable, and secure solutions on AWS. With a keen interest in data and machine learning, she helps customers leverage AWS AI/ML and generative AI services to effectively address their machine learning requirements. Throughout her career, she has shared her expertise at numerous conferences and written several blog posts on machine learning.
Kait Healy is a Solutions Architect II at AWS. She specializes in working with startups and enterprise customers in the automotive industry, where she has experience building large-scale AI/ML solutions to drive key business outcomes.
Saumitra Vikram is a Senior Software Engineer at AWS. He focuses on AI/ML technology, ML model management, ML governance, and MLOps to improve overall organizational efficiency and productivity.
Siamak Nariman is a Senior Product Manager at AWS. He focuses on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience in process automation and deploying various technologies.