Machine learning is everywhere, thanks to recent developments and new releases. With the growing popularity of AI and ML and the demand for production-level ML models, identifying ML problems and building solutions for them is very important. Design patterns are a useful way to narrow down to a solution for an ML-related problem. A pattern helps define a problem and find an in-depth solution that can be reused for similar problems any number of times.
Design patterns codify knowledge into instructions that practitioners all over the world can follow. Different ML design patterns apply at different stages of the ML life cycle: some are used in problem framing or in assessing feasibility, while others address a model’s development or deployment stage. Recently, a Twitter user named Eugene Yan discussed design patterns in machine learning systems in a thread, where he listed the following patterns.
- Cascade: A cascade breaks a complex problem down into simpler problems and then uses subsequent models to tackle the more difficult or specific ones. The example shared is Stack Exchange, an online community platform, which uses a cascade of defenses against spam. It consists of multiple layers of protection that detect and prevent spam from being posted on the platform, with each layer focusing on a different aspect of spam detection. The first line of defense catches anyone who posts too fast to be humanly possible (HTTP 429 error), the second catches posts via regex and rules (heuristics), and the third is an ML layer that is extremely accurate based on shadow testing. Because it works in a systematic and hierarchical manner, a cascade is an effective approach.
- Reframing: Reframing redefines the original problem to make it easier to solve. The example given in the tweet is Alibaba, a large e-commerce platform, which reframed the paradigm of sequential recommendation, the task of predicting the next item a user is likely to interact with.
- Human-in-the-loop: This involves collecting labels or annotations from users, annotation services, or domain experts to improve the performance of an ML model. The examples mentioned in the tweet are Stack Exchange and LinkedIn, where users can flag spam posts. This feedback on spam content can then be used to train ML models that better detect spam and filter out offensive messages in the future.
- Data Augmentation: This involves creating synthetic variations of training data to increase its size and diversity, which improves the ability of ML models to generalize and reduces the risk of overfitting. The example mentioned is DoorDash, a food delivery platform, which uses data augmentation to address the challenge of accurately categorizing and tagging new menu items that have limited or no data available for training a model.
- Data Flywheel: This is a positive feedback loop in which collecting more data improves ML models, which leads to more users and, in turn, more data. The example shared is Tesla, which collects data from its cars, such as sensor data, performance metrics, and usage patterns. This data is used to identify and label errors, which helps improve the models used for tasks like autonomous driving.
- Business Rules: These add extra logic or constraints that augment or adjust the output of ML models based on domain knowledge or business requirements. Twitter uses ML models to predict engagement, which regulates the visibility of tweets in timelines. It also applies hand-tuned weights or rules as constraints on the output of those models to incorporate domain knowledge into the decision-making process.
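To make the cascade pattern more concrete, here is a minimal Python sketch of a three-layer spam check in the spirit of the Stack Exchange example: a rate limit first, then regex heuristics, then a model score. This is not Stack Exchange’s actual implementation. The class name `SpamCascade`, the thresholds, the regex, and the `toy_model_score` function (a stand-in for a trained ML model) are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict, deque

# Hypothetical settings, chosen only for illustration.
MAX_POSTS = 3            # posts allowed per user inside the window
WINDOW_SECONDS = 10.0    # length of the rate-limit window
SPAM_PATTERN = re.compile(r"(free money|click here|buy now)", re.IGNORECASE)
SPAM_THRESHOLD = 0.5     # model score at or above which a post is blocked


class SpamCascade:
    """Runs cheap checks first and only calls the model if they pass."""

    def __init__(self, model_score):
        # model_score: callable mapping text to a float in [0, 1].
        self.model_score = model_score
        self.recent = defaultdict(deque)  # user -> timestamps of recent attempts

    def check(self, user, text, now):
        # Layer 1: rate limit (a web app would answer with HTTP 429 here).
        stamps = self.recent[user]
        while stamps and now - stamps[0] >= WINDOW_SECONDS:
            stamps.popleft()  # forget attempts that left the window
        stamps.append(now)    # rejected attempts count toward the limit too
        if len(stamps) > MAX_POSTS:
            return "rate_limited"

        # Layer 2: regex and rule-based heuristics.
        if SPAM_PATTERN.search(text):
            return "blocked_by_rules"

        # Layer 3: the (more expensive) model only sees what is left.
        if self.model_score(text) >= SPAM_THRESHOLD:
            return "blocked_by_model"
        return "accepted"


def toy_model_score(text):
    # Placeholder for a trained model: the share of uppercase letters.
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


cascade = SpamCascade(toy_model_score)
print(cascade.check("alice", "How do I sort a list?", now=0.0))
print(cascade.check("bob", "Click here for FREE MONEY", now=0.0))
print(cascade.check("carol", "WIN BIG TODAY", now=0.0))
```

Ordering the layers from cheapest to most expensive means most obvious spam never reaches the model, which is the main appeal of the pattern.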
In short, design patterns in machine learning systems can enhance models’ performance, reliability, and interpretability and help solve challenges in this domain.
This article is inspired by this tweet.
Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.