Machine learning (ML) models are fundamentally driven by data, and creating inclusive ML systems requires careful consideration of how to design representative datasets. However, few beginner-oriented ML modeling tools are designed to encourage hands-on learning of dataset design practices, including how to design for data diversity and inspect data quality.
To this end, we describe a set of four data design practices (DDPs) for designing inclusive ML models and share how we designed a tablet-based application called Co-ML to foster learning of DDPs through collaborative ML model building. With Co-ML, beginners can build image classifiers through a distributed experience where data is synchronized across multiple devices, allowing multiple users to iteratively refine ML datasets in discussion and coordination with their peers.
We deployed Co-ML in a two-week AIML educational summer camp, where youth ages 13-18 worked in groups to create custom ML-based mobile applications. Our analysis reveals how building multi-user models with Co-ML, in the context of student-driven projects created during the summer camp, supported the development of DDPs, including incorporating data diversity, evaluating model performance, and inspecting data quality. Furthermore, we found that students' attempts to improve model performance often prioritized learnability over class balance. Through this work, we highlight how the combination of collaboration, model testing interfaces, and student-driven projects can empower students to actively explore the role of data in ML systems.