Machine learning is now everywhere, driven by rapid recent progress and a steady stream of new releases. As AI and ML grow in popularity and demand for production-level ML models rises, it becomes important to identify recurring ML problems and solve them systematically. Design patterns are an effective way to narrow down a solution to an ML problem: a pattern names a recurring problem and describes a well-tested solution that can be reused whenever a similar problem appears.
Design patterns encode expert knowledge into guidance that practitioners around the world can follow. Different ML design patterns apply at different stages of the ML life cycle: some help frame a problem or assess its feasibility, while others address the development or implementation of a model. Eugene Yan recently discussed design patterns in machine learning systems in a Twitter thread, where he listed the following patterns.
- Cascade: A cascade breaks a complex problem into simpler ones, with each subsequent model tackling the more difficult or more specific cases. The shared example is Stack Exchange, an online community platform, which uses a cascade of defenses against spam. Multiple layers of protection detect and prevent spam from being published, and each layer focuses on a different aspect of spam detection. The first layer blocks anyone who posts too fast to be human (HTTP error 429), the second catches posts with regular expressions and rules (heuristics), and the third is an ML model that proved extremely accurate in shadow testing. Because each layer only has to handle what the earlier layers let through, the cascade works in a systematic, hierarchical manner and is an effective approach.
- Reframing: Reframing means redefining the original problem so that it becomes easier to solve. The example given in the thread is Alibaba, a large e-commerce platform, which reframed the sequential recommendation paradigm as predicting the next item a user is likely to interact with.
- Human-in-the-loop: This pattern collects labels or annotations from users, annotation services, or domain experts to improve the performance of an ML model. The examples mentioned in the thread are Stack Exchange and LinkedIn, where users can flag spam posts. That feedback on spam content can then be used to train ML models to better detect future spam and filter offensive messages.
- Data augmentation: This pattern creates synthetic variations of the training data to increase its size and diversity, which improves the generalizability of ML models and reduces the risk of overfitting. The example comes from DoorDash, a food delivery platform, which uses data augmentation to accurately categorize and tag new menu items for which little or no training data is available.
- Data flywheel: This is a positive feedback loop in which collecting more data improves the ML models, and better models attract more users and therefore more data. The shared example is Tesla, which collects data from its cars, such as sensor data, performance metrics, and usage patterns. This data is used to identify and label errors, which in turn helps improve the models used for tasks like autonomous driving.
- Business rules: This pattern adds extra logic or constraints that augment or adjust the output of ML models based on domain knowledge or business requirements. Twitter, for example, uses ML models to predict engagement, and those predictions determine the visibility of tweets on timelines. It also applies hand-tuned weights and rules to the model output to bring domain knowledge into the decision-making process.
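To make the Cascade pattern from the list above more concrete, here is a minimal Python sketch of a three-layer spam filter in the spirit of the Stack Exchange example: a rate limit, then regex heuristics, then a model score. It is an illustration under stated assumptions, not Stack Exchange's actual system. The thresholds, the regex patterns, the `SpamCascade` class, and the `toy_score` function (a stand-in for a trained classifier) are all hypothetical.

```python
import re
from collections import defaultdict, deque

# Hypothetical settings, chosen only for illustration.
MAX_POSTS_PER_WINDOW = 3      # posts allowed per user inside the window
WINDOW_SECONDS = 10.0
SPAM_PATTERNS = [
    re.compile(r"buy now", re.IGNORECASE),
    re.compile(r"https?://\S+\.xyz\b", re.IGNORECASE),
]
ML_THRESHOLD = 0.8            # model score at or above this counts as spam


class SpamCascade:
    """Runs cheap checks first and calls the model only if they pass."""

    def __init__(self, score_fn):
        self.score_fn = score_fn            # callable: text -> score in [0, 1]
        self.recent = defaultdict(deque)    # user -> timestamps of recent posts

    def check(self, user, text, now):
        # Layer 1: rate limit (the "HTTP 429" style defense).
        stamps = self.recent[user]
        while stamps and now - stamps[0] > WINDOW_SECONDS:
            stamps.popleft()
        stamps.append(now)
        if len(stamps) > MAX_POSTS_PER_WINDOW:
            return "rejected: rate limit"

        # Layer 2: regular expressions and rules (heuristics).
        if any(pattern.search(text) for pattern in SPAM_PATTERNS):
            return "rejected: heuristics"

        # Layer 3: the model only sees posts that survived layers 1 and 2.
        if self.score_fn(text) >= ML_THRESHOLD:
            return "rejected: model"
        return "accepted"


def toy_score(text):
    """Stand-in for a trained classifier: share of upper-case letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


cascade = SpamCascade(toy_score)
print(cascade.check("alice", "How do I sort a list in Python?", now=0.0))
print(cascade.check("bob", "Buy now, cheap watches", now=0.0))
print(cascade.check("carol", "FREE MONEY HERE", now=0.0))
```

The ordering is the point of the design: each layer is simpler than the next, so the model only has to handle posts that the rate limit and the heuristics could not decide.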
In short, design patterns in machine learning systems can improve the performance, reliability, and interpretability of models and help solve recurring challenges in this domain.
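As one more illustration, the Business Rules pattern described above can be sketched in a few lines of Python: model-predicted engagement probabilities are combined with hand-tuned weights, and simple rules then constrain the ranked output. This is only a sketch. The `Candidate` fields, the weight values, the blocked-author set, and the per-author cap are hypothetical and are not Twitter's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: str
    author: str
    p_like: float      # model-predicted probability of a like
    p_reply: float     # model-predicted probability of a reply
    p_retweet: float   # model-predicted probability of a retweet


# Hypothetical hand-tuned weights and rules (assumptions for illustration).
WEIGHTS = {"p_like": 1.0, "p_reply": 3.0, "p_retweet": 2.0}
MAX_PER_AUTHOR = 2
BLOCKED_AUTHORS = {"spammer"}


def score(candidate):
    """Combine the model outputs using the hand-tuned weights."""
    return sum(w * getattr(candidate, name) for name, w in WEIGHTS.items())


def rank(candidates):
    """Sort by weighted score, then apply the rules as hard constraints."""
    ranked = sorted(candidates, key=score, reverse=True)
    shown_per_author = {}
    timeline = []
    for candidate in ranked:
        if candidate.author in BLOCKED_AUTHORS:
            continue  # rule: never show blocked authors, whatever the score
        if shown_per_author.get(candidate.author, 0) >= MAX_PER_AUTHOR:
            continue  # rule: cap how many items one author can take up
        shown_per_author[candidate.author] = shown_per_author.get(candidate.author, 0) + 1
        timeline.append(candidate)
    return timeline


candidates = [
    Candidate("t1", "alice", 0.9, 0.1, 0.1),
    Candidate("t2", "alice", 0.5, 0.2, 0.1),
    Candidate("t3", "alice", 0.4, 0.1, 0.1),
    Candidate("t4", "bob", 0.2, 0.3, 0.0),
    Candidate("t5", "spammer", 1.0, 1.0, 1.0),
]
print([c.tweet_id for c in rank(candidates)])
```

Keeping the weights and rules outside the model means they can be adjusted without retraining, which is the main appeal of this pattern.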
This article is inspired by Eugene Yan's Twitter thread.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills and a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.