This post is co-authored with Swagata Ashwani, a Senior Data Scientist at Boomi.
Boomi is an enterprise-grade Software as a Service (SaaS) Independent Software Vendor (ISV) that creates developer enablement tools for software engineers. These tools are integrated via API into Boomi’s core service offering.
In this post, we discuss how Boomi used the bring-your-own-container (BYOC) approach to develop a new AI/ML-enabled solution for its clients to address the “blank canvas” problem. Boomi’s machine learning (ML) powered solution makes it easy to quickly develop integrations on its platform and enables faster time-to-market for its customers. Boomi funded this solution through the AWS PE ML FastStart program, a customer enablement program aimed at taking ML-enabled solutions from idea to production in a matter of weeks. Boomi built this solution using Amazon SageMaker Studio, a comprehensive browser-based IDE for AI/ML workloads, and Amazon Elastic Container Registry (Amazon ECR).
The blank canvas problem describes the productivity and creativity issues developers face when starting a new task. An experienced developer knows from the start of a new task roughly what their code base will look like, but the process of creating that code base is extensive, with no clear starting point. As the developer begins to make progress on the blank canvas, their productivity remains low: the code written at this stage is usually boilerplate that lays the foundation for business logic, and that business logic can’t be written until most of the foundation is in place.
Boomi created a novel solution to the blank canvas problem using traditional development techniques. Boomi’s ML and data engineering teams needed the solution to be deployed quickly, in a repeatable and consistent manner, at scale. Boomi’s team used SageMaker’s BYOC paradigm to support their custom model, then used SageMaker Projects and SageMaker Pipelines to automate training, testing, monitoring, and deployment of their custom model solution.
Customer use case
Markov chains are specialized structures for making predictive recommendations in a state machine. Markov chains are best known for their applications in web crawling and search engines. Boomi’s data science team implemented a Markov chain model that could be applied to common integration steps or sequences in their platform, hence the name Step Suggest.
Markov chains are built using a state machine and a probability matrix that describes the possibility of state transitions. Given an initial state, a Markov chain calculates the probability of a transition to another state allowed by the state machine. Boomi’s data science team applied the Markov chain approach to the step suggestion problem by treating the integration steps as states in a state machine. Boomi’s implementation of the Markov chain takes the previous integration step and predicts the next integration step with high accuracy.
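The approach above can be sketched in a few lines of Python: estimate a transition matrix from observed step sequences, then suggest the most probable next step. The step names and helper functions here are hypothetical illustrations, not Boomi’s actual integration steps or implementation:

```python
from collections import defaultdict

def fit_transition_matrix(sequences):
    """Estimate transition probabilities from observed step sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    # Normalize raw counts into per-state probability distributions
    return {
        prev: {s: c / sum(nxts.values()) for s, c in nxts.items()}
        for prev, nxts in counts.items()
    }

def suggest_next_step(matrix, current_step):
    """Return the most probable next step, or None for an unseen step."""
    candidates = matrix.get(current_step)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Hypothetical integration histories
history = [
    ["start", "map", "transform", "stop"],
    ["start", "map", "stop"],
    ["start", "decision", "stop"],
]
matrix = fit_transition_matrix(history)
print(suggest_next_step(matrix, "start"))  # → map (probability 2/3)
```

Boomi’s production model is more sophisticated than this sketch, but the core idea is the same: the previous step indexes into a learned probability matrix, and the highest-probability transition becomes the suggestion.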
Boomi had significant success with its application of Markov chains. However, the underlying algorithm for Step Suggest is complicated and proprietary. SageMaker has built-in support for several popular ML algorithms, but Boomi already had a working solution. Instead of starting from scratch, Boomi used the BYOC approach to import their existing models into SageMaker. As a result, Boomi’s team was able to use SageMaker for inference, CI/CD, and monitoring, without having to rebuild their Markov chain from scratch.
Solution details
The most important criteria for this solution were the reuse of existing models and the ease of deploying those models to production. Boomi’s Step Suggest solution needed automated training and inference pipelines. At the time of the migration to SageMaker’s BYOC deployment model, Boomi’s solution was built and heavily tested on individual laptops.
Boomi used Amazon ECR to store versions of its Step Suggest model. Amazon ECR stores and versions containerized applications in a container registry. Boomi’s team created a Docker container with the model built from individual laptops and uploaded that container to an Amazon ECR repository. When the upload was complete, Boomi attached the image to its SageMaker domain, where it could be imported and used for additional ML tasks, such as inference deployments to a hosted endpoint.
The exact steps to replicate this process are outlined in Train and deploy deep learning models using JAX with Amazon SageMaker. That post discusses how to bring the JAX framework to your SageMaker domain. JAX is a promising ML framework for which SageMaker does not have built-in support. Boomi implemented a similar workflow for its proprietary framework, extending the capabilities of its SageMaker implementation to meet the requirements of the Step Suggest project. There are some prerequisites; complete the following steps before following the guide in the JAX post to practice the BYOC implementation paradigm with SageMaker.
Alternatives to SageMaker
Boomi was already an AWS customer prior to the AWS PE ML FastStart program. In fact, most of its data science team used SageMaker notebook instances for model development. Models were trained on notebook instances, which come preinstalled with Jupyter Notebook software, using data stored in Amazon Simple Storage Service (Amazon S3). This worked for model development, but Boomi needed a more robust solution to scale this workload to its clients.
The AWS PE ML FastStart program conducted a deep dive session with Boomi’s data science and engineering teams. We decided that SageMaker Studio would better enable the Boomi team to quickly scale this solution to their customers.
Why SageMaker?
SageMaker Studio brought several key advantages that SageMaker notebook instances could not achieve on their own. First, Studio makes it easy to share notebook assets across a large team of data scientists like Boomi’s. Boomi analysts were free to use SageMaker Data Wrangler for data preparation tasks, while Boomi data scientists could continue to use Jupyter notebooks. More importantly, Studio kept the BYOC functionality. This was absolutely crucial because it meant the team could reuse the model assets they had already created.
Second, SageMaker Pipelines made it easy for the Boomi team to visualize and modify their complex CI/CD requirements. The BYOC deployment paradigm requires additional integrations with Amazon ECR. To that end, the training and inference pipelines used by Boomi’s MLOps team required additional steps for automated deployment, rollback, and monitoring. SageMaker Pipelines and the AWS Step Functions Data Science SDK addressed this requirement.
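A minimal pipeline definition along these lines might look as follows. This is a sketch rather than Boomi’s actual pipeline: the step and pipeline names are hypothetical, the BYOC image URI and role ARN are placeholders, and the function requires AWS credentials to actually run:

```python
def define_training_pipeline(image_uri, role_arn, sagemaker_session):
    """Assemble a minimal SageMaker Pipeline with a single training step
    that runs the custom BYOC image. Requires AWS credentials and the
    sagemaker package; defined here but not executed."""
    from sagemaker.estimator import Estimator
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.steps import TrainingStep

    # The BYOC image supplies the training logic; SageMaker supplies compute
    estimator = Estimator(
        image_uri=image_uri,
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.large",
        sagemaker_session=sagemaker_session,
    )
    train_step = TrainingStep(name="TrainStepSuggest", estimator=estimator)
    return Pipeline(
        name="StepSuggestPipeline",
        steps=[train_step],
        sagemaker_session=sagemaker_session,
    )

# pipeline = define_training_pipeline(uri, role, session)
# pipeline.upsert(role_arn=role)  # register or update the pipeline definition
# pipeline.start()                # kick off a run
```

A real pipeline for this use case would add processing, evaluation, model registration, and conditional deployment steps; the value of Pipelines is that each of these appears as a node in a visual DAG that the team can inspect and modify.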
Finally, SageMaker Projects introduced the team to the ability to create AWS CloudFormation templates that standardized their ML development environments. Infrastructure as Code (IaC) solutions like AWS CloudFormation reduce digital waste and standardize resource deployments across an AWS account. When CloudFormation templates are deployed through AWS Service Catalog, as they are with SageMaker projects, data science teams can operate freely without fear of violating organizational best practices or safeguards. Boomi’s cloud engineering team agreed that this would be an important factor in scaling their data science team.
Feature deep dive
The following diagram illustrates the architecture and workflow of the solution.
SageMaker’s BYOC paradigm allowed Boomi’s team to reuse a highly customized implementation of a proprietary machine learning algorithm. Boomi also had several custom pre- and post-processing steps for its models. These proprietary steps enabled Boomi to bridge the gap between its core product engineering and data science teams. Reimplementing the processing logic within Studio to fit a built-in algorithm, while possible, would have meant rebuilding a working solution from scratch. The Studio BYOC paradigm allowed Boomi’s data science team to do what it did best without sacrificing speed and agility in its product development.
Because Boomi is a large organization with a strong cloud governance team, and because so many teams actively contributed to this project, a strong CI/CD pipeline was necessary. The CI/CD capabilities enabled by SageMaker Pipelines made it possible for the various contributing parties to collaborate: Boomi analysts contributed to preprocessing and postprocessing; the data science team customized, tuned, and built the model inside a container; and the systems engineering and MLOps team integrated Step Suggest into their core platform.
Conclusion
By leveraging Amazon SageMaker Studio, Amazon SageMaker Projects, and Amazon SageMaker Pipelines, Boomi made it easy to build MLOps solutions at scale.
“The AWS SageMaker Pipeline-based solution has reduced the time it takes to build, deploy, and manage our model by approximately 30%, thanks to its intuitive and easy-to-use interface. Using this solution, we were able to deploy our model in just 4 weeks, 2x faster than if we had used traditional infrastructure.”
Boomi has an active relationship with its AWS account team. AWS account teams connect customers like Boomi with programs designed to address their business and technology needs. Connect with your AWS account team to learn more about programs like AWS PE ML FastStart to improve your time to market for new and innovative products built on or with AWS.
About the authors
Dan Ferguson is an AI/ML Specialist Solutions Architect on the Private Equity Solutions Architecture team at Amazon Web Services. Dan helps Private Equity-backed portfolio companies leverage AI/ML technologies to achieve their business objectives.
Swagata Ashwani is a Senior Data Scientist at Boomi with over 6 years of data science experience. Her interests include MLOps, natural language processing, and data visualization. She is also actively involved in volunteering for Women in Data/AI and spreading more awareness and outreach within the AI community.
In her spare time, she can be found strumming her guitar, drinking masala chai, and enjoying spicy Indian street food.