Image by author
Due to the popularity of the blog 7 End-to-End MLOps Platforms You Should Try in 2024, I am writing another list of end-to-end MLOps tools, this time all open source.
Open source tools give you privacy and more control over your data and models. On the other hand, you have to deploy and manage these tools yourself, and you may need to hire more people to maintain them. You are also responsible for security and any interruption of service.
In summary, both paid MLOps platforms and open source tools have advantages and disadvantages; you just have to choose what works for you.
In this blog, we will learn about 5 end-to-end open source MLOps tools to train, track, deploy and monitor models in production.
1. Kubeflow
The kubeflow/kubeflow project makes all machine learning operations simple, portable, and scalable on Kubernetes. It is a cloud-native framework that allows you to create machine learning pipelines, train models, and deploy them to production.
Image from Kubeflow
Kubeflow supports cloud services (AWS, GCP, Azure) and self-hosted services. It allows machine learning engineers to integrate all types of AI frameworks to train, tune, schedule, and deploy models. Additionally, it provides a centralized dashboard for monitoring and managing pipelines, code editing with Jupyter Notebook, experiment tracking, a model registry, and artifact storage.
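To make the pipeline idea above concrete, here is a minimal sketch that uses only the Python standard library. It is not the Kubeflow Pipelines SDK: the step names, the toy data, and the `run_pipeline` helper are all hypothetical, and the point is only to show steps declared with their dependencies and executed in dependency order, which is roughly what a pipeline orchestrator does at a much larger scale.

```python
from graphlib import TopologicalSorter


def load_data(results):
    # Toy dataset of (x, y) pairs; a real step would read from storage.
    return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def train(results):
    # Fit y = slope * x by least squares through the origin.
    data = results["load_data"]
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def evaluate(results):
    # Largest absolute error of the fitted line on the toy dataset.
    data, slope = results["load_data"], results["train"]
    return max(abs(y - slope * x) for x, y in data)


# Hypothetical pipeline definition: each step and the steps it depends on.
STEPS = {"load_data": load_data, "train": train, "evaluate": evaluate}
DEPENDS_ON = {
    "load_data": set(),
    "train": {"load_data"},
    "evaluate": {"load_data", "train"},
}


def run_pipeline(steps, depends_on):
    """Run every step after its dependencies and collect the outputs."""
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        results[name] = steps[name](results)
    return results


results = run_pipeline(STEPS, DEPENDS_ON)
print("slope:", results["train"], "max error:", results["evaluate"])
```

In Kubeflow, each step would typically run in its own container on Kubernetes, and the outputs passed between steps would be stored as artifacts rather than kept in an in-memory dictionary.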
2. MLflow
The mlflow/mlflow project is generally used for experiment tracking and logging. However, over time, it has become an end-to-end MLOps tool for all types of machine learning models, including LLMs (Large Language Models).
Image from MLflow
MLflow has 6 main components:
- Tracking– Versions and stores parameters, code, metrics, and output files. It also comes with interactive metric and parameter visualizations.
- Projects– Packages data science source code for reusability and reproducibility.
- Models– Stores machine learning models and metadata in a standard format that downstream tools can use later. It also provides model deployment and serving options.
- Model Registry– A centralized model store to manage the lifecycle of MLflow models. Provides version control, model lineage, model aliases, model tagging, and annotations.
- Recipes (Pipelines)– Machine learning pipelines that allow you to quickly train high-quality models and deploy them to production.
- LLMs– Provides support for LLM evaluation, prompt engineering, tracking, and deployment.
You can manage the entire machine learning ecosystem using CLI, Python, R, Java, and REST APIs.
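As a rough illustration of the REST route mentioned above, the sketch below builds (but does not send) a request to the MLflow tracking server's `runs/log-metric` endpoint using only the standard library. The server address and the run ID are assumptions made for the example, and `build_log_metric_request` is a hypothetical helper, not part of MLflow.

```python
import json
import time
import urllib.request

# Assumption: an MLflow tracking server is reachable at this address.
TRACKING_URI = "http://localhost:5000"


def build_log_metric_request(run_id, key, value, step=0):
    """Build a POST request that logs one metric value for an existing run."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "step": step,
    }
    return urllib.request.Request(
        url=f"{TRACKING_URI}/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "hypothetical-run-id" is a placeholder; a real run ID comes from creating a run.
request = build_log_metric_request("hypothetical-run-id", "accuracy", 0.93, step=1)
print(request.get_method(), request.full_url)
# To actually send it against a running server: urllib.request.urlopen(request)
```

In day-to-day Python work, the `mlflow` client library is usually more convenient than hand-built HTTP calls; the REST form is mainly useful from languages or systems without a client.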
3. Metaflow
The Netflix/metaflow project enables data scientists and machine learning engineers to quickly create and manage machine learning/AI projects.
Metaflow was initially developed at Netflix to increase the productivity of data scientists. It has now been made open source, so everyone can benefit from it.
Image from Metaflow Docs
Metaflow provides a unified API for data management, versioning, orchestration, model training and deployment, and compute. It supports major cloud providers and machine learning frameworks.
4. Seldon Core V2
The SeldonIO/seldon-core project is another popular end-to-end MLOps tool that allows you to package, train, deploy, and monitor thousands of machine learning models in production.
Image from Seldon Core
Seldon Core Key Features:
- Deploy models locally with Docker or to a Kubernetes cluster.
- Monitoring of model and system metrics.
- Deploy drift and outlier detectors alongside models.
- Supports most machine learning frameworks, such as TensorFlow, PyTorch, Scikit-Learn, ONNX.
- Data-centric MLOps approach.
- A CLI to manage workflows, inference, and debugging.
- Save costs by deploying multiple models seamlessly.
Seldon Core turns your machine learning models into REST/gRPC microservices. It can easily scale and manage thousands of machine learning models, and it provides additional capabilities for metrics tracking, request logging, explanations, outlier detectors, A/B testing, canaries, and more.
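To give a feel for what calling such a REST microservice looks like, here is a small sketch that builds a request body in the Open Inference Protocol (V2) format, which Seldon Core V2 uses for inference. The model name `iris`, the input name `predict`, and the `build_infer_payload` helper are assumptions for illustration; the sketch only constructs the path and JSON body and does not contact a server.

```python
import json


def build_infer_payload(rows, input_name="predict", datatype="FP32"):
    """Build a V2 inference request body from a rectangular batch of rows."""
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be a non-empty rectangular batch")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), len(rows[0])],
                "datatype": datatype,
                # Tensor contents flattened in row-major order.
                "data": [value for row in rows for value in row],
            }
        ]
    }


MODEL_NAME = "iris"  # hypothetical model name
path = f"/v2/models/{MODEL_NAME}/infer"
body = json.dumps(build_infer_payload([[5.1, 3.5, 1.4, 0.2]]))

print("POST", path)
print(body)
```

The same body could be sent with any HTTP client to the host where the models are exposed in your cluster; that host depends entirely on how Seldon Core is installed, so it is left out here.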
5. MLRun
The mlrun/mlrun framework allows you to easily create and manage machine learning applications in production. It streamlines production data ingestion, machine learning pipelines, and online applications, significantly reducing engineering effort, time to production, and computing resources.
Image from MLRun
The main components of MLRun:
- Project management– A centralized hub that manages various project assets such as data, features, jobs, workflows, secrets, and more.
- Data and artifacts– Connect multiple data sources, manage metadata, catalog and version artifacts.
- Feature Store– Stores, prepares, catalogs, and provides model features for training and deployment.
- Batch runs and workflows– Runs one or more functions and collects, tracks, and compares all of their results and artifacts.
- Real-time serving pipeline– Rapid deployment of scalable data and machine learning pipelines.
- Real-time monitoring– Monitors production data, models, resources and components.
Conclusion
Instead of using a separate tool for each step of the MLOps process, you can use just one to perform them all. With a single end-to-end MLOps tool, you can train, track, store, version, deploy, and monitor machine learning models. All you have to do is deploy it locally using Docker or in the cloud.
Using open source tools is good for more control and privacy, but comes with the challenges of managing them, updating them, and dealing with security and downtime issues. If you are starting out as an MLOps engineer, I suggest you focus on open source tools and then move on to managed services like Databricks, AWS, Iguazio, etc.
I hope you like my content on MLOps. If you want to read more about them, mention it in a comment or contact me on LinkedIn.
Abid Ali Awan (@1abidaliawan) is a certified professional data scientist who loves building machine learning models. Currently, he focuses on content creation and writing technical blogs on data science and machine learning technologies. Abid has a master's degree in technology management and a bachelor's degree in telecommunications engineering. His vision is to build an artificial intelligence product using a graph neural network for students struggling with mental illness.