Kubernetes is a popular orchestration platform for managing containers. Its scalability and load balancing capabilities make it ideal for handling the variable workloads typical of machine learning (ML) applications. DevOps engineers often use Kubernetes to manage and scale ML applications, but before an ML model is available, it must be trained and evaluated, and if the quality of the resulting model is satisfactory, it must be uploaded to a model registry.
Amazon SageMaker offers capabilities that remove the undifferentiated heavy lifting of building and deploying ML models. SageMaker simplifies the process of managing dependencies, container images, auto scaling, and monitoring. Specifically for the model building stage, Amazon SageMaker Pipelines automates the process by managing the infrastructure and resources required to process data, train models, and run evaluation tests.
A challenge for DevOps engineers is the added complexity of using Kubernetes to manage the deployment stage while relying on other tools (such as the AWS SDK or AWS CloudFormation) to manage the model building workflow. An alternative that simplifies this process is to use AWS Controllers for Kubernetes (ACK) to manage and deploy a SageMaker training pipeline. ACK allows you to take advantage of managed model building pipelines without needing to define resources outside of your Kubernetes cluster.
In this post, we present an example to help DevOps engineers manage the entire ML lifecycle, including training and inference, using the same toolkit.
Solution overview
We consider a use case in which an ML engineer configures a SageMaker model building pipeline using a Jupyter notebook. This configuration takes the form of a directed acyclic graph (DAG) represented as a JSON pipeline definition. The JSON document can be stored and versioned in an Amazon Simple Storage Service (Amazon S3) bucket. If encryption is required, it can be implemented using an AWS Key Management Service (AWS KMS) managed key for Amazon S3. A DevOps engineer with access to fetch this definition file from Amazon S3 can load the pipeline definition into an ACK service controller for SageMaker, which runs as part of an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. The DevOps engineer can then use the Kubernetes APIs provided by ACK to submit the pipeline definition and initiate one or more pipeline executions in SageMaker. This entire workflow is shown in the following solution diagram.
Prerequisites
To follow along with this walkthrough, you need the following prerequisites:
- An EKS cluster where the ML pipeline will be built.
- A user with access to an AWS Identity and Access Management (IAM) role that has IAM permissions (iam:CreateRole, iam:AttachRolePolicy, and iam:PutRolePolicy) to allow creating roles and attaching policies to roles.
- The following command line tools, installed on the local machine or in a cloud-based development environment, are used to access the Kubernetes cluster:
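At a minimum, this walkthrough relies on kubectl (to apply the manifests in the following sections), Helm (to install the controller), and jq (for the optional definition-to-string conversion shown later). The following is a quick sanity check, as a sketch; add any other tools your environment requires:

```bash
# Verify the command line tools used in this post are available
kubectl version --client
helm version
jq --version
```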
Install the SageMaker ACK service controller
The SageMaker ACK Service Controller makes it easy for DevOps engineers to use Kubernetes as a control plane to create and manage ML pipelines. To install the controller on your EKS cluster, complete the following steps:
- Configure IAM permissions to ensure that the controller has access to the appropriate AWS resources.
- Install the controller using a SageMaker Helm chart to make it available on the client machine.
The following tutorial provides step-by-step instructions with the commands required to install the ACK service controller for SageMaker.
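As an illustration, a Helm-based install typically looks like the following sketch. The chart location follows the public ACK chart repository convention and the release version is a placeholder; follow the tutorial above for the exact, current commands, and make sure the IAM permissions from the previous step are in place first:

```bash
# Placeholders: pick the chart version and AWS Region that apply to your environment
export SERVICE=sagemaker
export AWS_REGION=us-east-1
export RELEASE_VERSION=<chart-version>

# Install the SageMaker ACK service controller into the ack-system namespace
helm install -n ack-system --create-namespace ack-$SERVICE-controller \
  oci://public.ecr.aws/aws-controllers-k8s/$SERVICE-chart \
  --version $RELEASE_VERSION \
  --set aws.region=$AWS_REGION
```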
Generate a pipeline JSON definition
In most companies, ML engineers are responsible for building the ML pipelines in their organization. They often work with DevOps engineers to operate those pipelines. In SageMaker, ML engineers can use the SageMaker Python SDK to generate a pipeline definition in JSON format. A SageMaker pipeline definition must follow the provided schema, which includes base images, dependencies, steps, and instance types and sizes needed to fully define the pipeline. The DevOps engineer then retrieves this definition to deploy and maintain the infrastructure needed for the pipeline.
Below is an example pipeline definition with a training step:
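The following is a minimal sketch of such a definition. The schema version shown is the one commonly used by the SageMaker Python SDK, and the training step's Arguments mirror the SageMaker CreateTrainingJob API; the container image URI, role ARN, and bucket names are placeholders to replace with your own values:

```json
{
  "Version": "2020-12-01",
  "Metadata": {},
  "Parameters": [],
  "Steps": [
    {
      "Name": "TrainModel",
      "Type": "Training",
      "Arguments": {
        "AlgorithmSpecification": {
          "TrainingImage": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.7-1",
          "TrainingInputMode": "File"
        },
        "RoleArn": "arn:aws:iam::111122223333:role/SageMakerExecutionRole",
        "InputDataConfig": [
          {
            "ChannelName": "train",
            "ContentType": "text/csv",
            "DataSource": {
              "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-ml-artifacts-bucket/train/",
                "S3DataDistributionType": "FullyReplicated"
              }
            }
          }
        ],
        "OutputDataConfig": {
          "S3OutputPath": "s3://example-ml-artifacts-bucket/output/"
        },
        "ResourceConfig": {
          "InstanceCount": 1,
          "InstanceType": "ml.m5.xlarge",
          "VolumeSizeInGB": 30
        },
        "StoppingCondition": {
          "MaxRuntimeInSeconds": 3600
        }
      }
    }
  ]
}
```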
With SageMaker, ML model artifacts and other system artifacts are encrypted in transit and at rest. SageMaker encrypts them by default using the AWS managed key for Amazon S3. Optionally, you can specify a custom key using the KmsKeyId property of the OutputDataConfig argument. For more information about how SageMaker protects data, see Data Protection in Amazon SageMaker.
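For example, within the training step's Arguments, the output configuration could look like the following sketch (the bucket name and KMS key ARN are placeholders):

```json
"OutputDataConfig": {
  "S3OutputPath": "s3://example-ml-artifacts-bucket/output/",
  "KmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
}
```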
Additionally, we recommend restricting access to pipeline artifacts, such as model output and training data, to a specific set of IAM roles created for data scientists and ML engineers. This can be achieved by attaching an appropriate bucket policy. For more information on best practices for securing data in Amazon S3, see Top 10 security best practices for securing data in Amazon S3.
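As a sketch of that idea, the following bucket policy denies access to the artifact bucket to every principal except a set of hypothetical roles (the bucket name and role ARNs are placeholders; be careful with broad Deny statements, because they also lock out administrators who are not on the allow list):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictPipelineArtifactAccess",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::example-ml-artifacts-bucket",
        "arn:aws:s3:::example-ml-artifacts-bucket/*"
      ],
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::111122223333:role/DataScientistRole",
            "arn:aws:iam::111122223333:role/MLEngineerRole",
            "arn:aws:iam::111122223333:role/SageMakerExecutionRole"
          ]
        }
      }
    }
  ]
}
```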
Create and submit a pipeline YAML specification
In the Kubernetes world, objects are the persistent entities in the Kubernetes cluster that are used to represent the state of your cluster. When you create an object in Kubernetes, you must provide the object specification that describes its desired state, as well as some basic information about the object (such as a name). Then, using tools like kubectl, you provide the information in a manifest file in YAML (or JSON) format to communicate with the Kubernetes API.
In the Kubernetes YAML specification for a SageMaker pipeline shown later in this section, DevOps engineers need to modify the .spec.pipelineDefinition key in the file and add the pipeline JSON definition provided by the ML engineer. They then prepare and submit a separate pipeline execution YAML specification to run the pipeline in SageMaker. There are two ways to submit a pipeline YAML specification:
- Pass the inline pipeline definition as a JSON object to the pipeline YAML specification.
- Convert the JSON pipeline definition to string format using the jq command-line utility. For example, you can use the following command to convert the pipeline definition to a JSON-encoded string:
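A minimal sketch, assuming the ML engineer's definition was saved locally as pipeline.json (the file name is a placeholder):

```bash
# Emit the pipeline definition as a single JSON-encoded string
jq tojson < pipeline.json
```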
In this post, we use the first option and prepare the YAML specification (my-pipeline.yaml
) as follows:
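The following is a sketch of what my-pipeline.yaml could look like. The apiVersion, kind, and spec field names reflect the ACK SageMaker controller's Pipeline custom resource as we understand it; verify them against the CRDs installed in your cluster (for example, with kubectl explain pipeline.spec). The role ARN is a placeholder, and the full pipeline JSON definition from the previous section goes under pipelineDefinition:

```yaml
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Pipeline
metadata:
  name: my-kubernetes-pipeline
  namespace: default
spec:
  pipelineName: my-kubernetes-pipeline
  pipelineDisplayName: my-kubernetes-pipeline
  roleARN: arn:aws:iam::111122223333:role/SageMakerExecutionRole  # placeholder execution role
  # Paste the full pipeline JSON definition (including the training step) below
  pipelineDefinition: |
    {
      "Version": "2020-12-01",
      "Metadata": {},
      "Parameters": [],
      "Steps": []
    }
```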
Submit the pipeline to SageMaker
To submit your prepared pipeline specification, apply the specification to your Kubernetes cluster as follows:
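Assuming the file name from the previous section:

```bash
kubectl apply -f my-pipeline.yaml
```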
Create and submit a pipeline execution YAML specification
Refer to the following Kubernetes YAML specification for a SageMaker pipeline execution. Prepare the pipeline execution YAML specification (pipeline-execution.yaml
) as follows:
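A sketch of pipeline-execution.yaml, again using field names from the ACK SageMaker controller's PipelineExecution custom resource as we understand it (verify with kubectl explain pipelineexecution.spec); pipelineName must match the pipeline submitted earlier:

```yaml
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: PipelineExecution
metadata:
  name: my-kubernetes-pipeline-execution
  namespace: default
spec:
  # Name of the SageMaker pipeline created by the Pipeline resource above
  pipelineName: my-kubernetes-pipeline
  pipelineExecutionDescription: "Pipeline execution started from the ACK controller"
```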
To start a pipeline execution, use the following code:
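For example:

```bash
kubectl apply -f pipeline-execution.yaml
```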
Review and troubleshoot pipeline execution
To list all the pipelines created using the ACK controller, use the following command:
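Assuming the lowercase resource names exposed by the ACK SageMaker CRDs:

```bash
# If another CRD named "pipelines" exists in the cluster (for example, Tekton),
# use the fully qualified name: kubectl get pipelines.sagemaker.services.k8s.aws
kubectl get pipelines
```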
To list all pipeline executions, use the following command:
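Similarly:

```bash
kubectl get pipelineexecutions
```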
To get more details about the pipeline after submitting it, such as checking the status, errors, or parameters of the pipeline, use the following command:
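For example, using the pipeline name from the earlier sketch:

```bash
kubectl describe pipeline my-kubernetes-pipeline
```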
To troubleshoot a pipeline execution by reviewing more details about the execution, use the following command:
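For example:

```bash
kubectl describe pipelineexecution my-kubernetes-pipeline-execution
```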
Clean up
Use the following command to delete any pipelines you have created:
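For example, using the names from the earlier sketches:

```bash
kubectl delete pipeline my-kubernetes-pipeline
```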
Use the following command to cancel any pipeline execution you have started:
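For example:

```bash
kubectl delete pipelineexecution my-kubernetes-pipeline-execution
```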
Conclusion
In this post, we present an example of how ML engineers familiar with Jupyter notebooks and SageMaker environments can work efficiently with DevOps engineers familiar with Kubernetes and related tools to design and maintain an ML pipeline with the right infrastructure for their organization. This allows DevOps engineers to manage all steps of the ML lifecycle with the same toolset and environment they are accustomed to, enabling organizations to innovate faster and more efficiently.
Explore the GitHub repository for ACK and the SageMaker Controller to start managing your ML operations with Kubernetes.
About the authors
Pratik Yeole is a Senior Solutions Architect working with global customers and helping them build value-driven solutions on AWS. He has domain expertise in Containers and MLOps. Outside of work, he enjoys time with friends, family, music, and cricket.
Felipe Lopez is a Senior Solutions Architect specializing in AI/ML at AWS. Prior to joining AWS, Felipe worked at GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.