The lack of diversity in data collection has caused significant failures in machine learning (ML) applications. While ML developers perform post-collection interventions, these are time-consuming and rarely comprehensive. Therefore, new methods are needed to track and manage data collection, iteration, and model training to assess whether datasets reflect real-world variability. We present data design, an iterative bias mitigation approach to data collection that connects HCI concepts with ML techniques. Our process includes (1) pre-collection planning, to thoughtfully request and document expected data distributions; (2) collection monitoring, to systematically promote sampling diversity; and (3) data familiarity, to identify samples that are unfamiliar to a model using out-of-distribution (OOD) methods. We instantiate data design through our own data collection and applied ML case study. We find that models trained on "designed" datasets generalize better across intersectional groups than those trained on similarly sized but less targeted datasets, and that data familiarity is effective for debugging datasets.