The lack of diversity in data collection has caused significant failures in machine learning (ML) applications. While ML developers perform post-collection interventions, these are time-consuming and rarely comprehensive. Therefore, new methods are needed to track and manage data collection, iteration, and model training to assess whether datasets reflect real-world variability. We present data design, an iterative bias mitigation approach to data collection that connects HCI concepts with ML techniques. Our process includes (1) pre-collection planning, to thoughtfully request and document expected data distributions; (2) collection monitoring, to systematically promote sampling diversity; and (3) data familiarity, to identify samples that are unfamiliar to a model using out-of-distribution (OOD) methods. We instantiate data design through our own data collection and applied ML case study. We find that models trained on "designed" datasets generalize better across intersectional groups than those trained on similarly sized but less targeted datasets, and that data familiarity is effective for debugging datasets.