*Equal contribution
Multimodal datasets are a critical component in recent advances such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a testbed for dataset experiments centered on a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow leads to better training sets. In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training procedure and compute.
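To make the filtering workflow concrete, the sketch below shows one simple kind of baseline a participant might submit: keeping only the candidate pairs whose image and text embeddings have the highest cosine similarity. It is a minimal illustration, not the DataComp codebase; the function name, array names, embedding dimension, and keep fraction are all assumptions for the example, and it presumes the embeddings were precomputed and L2-normalized.

```python
# Minimal sketch of a similarity-score filtering baseline (illustrative only).
# Assumes precomputed, L2-normalized image and text embeddings for each
# candidate image-text pair; names and the keep fraction are hypothetical.
import numpy as np

def filter_by_similarity(image_embs: np.ndarray,
                         text_embs: np.ndarray,
                         keep_fraction: float = 0.3) -> np.ndarray:
    """Return indices of pairs whose image-text cosine similarity falls in the
    top `keep_fraction` of the candidate pool."""
    # For normalized embeddings, cosine similarity is a row-wise dot product.
    scores = np.einsum("ij,ij->i", image_embs, text_embs)
    cutoff = np.quantile(scores, 1.0 - keep_fraction)
    return np.nonzero(scores >= cutoff)[0]

# Toy usage: keep the top 30% of 1,000 random candidate pairs.
rng = np.random.default_rng(0)
img = rng.normal(size=(1000, 512)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(1000, 512)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
kept = filter_by_similarity(img, txt)
print(f"kept {len(kept)} of 1000 pairs")
```

In the benchmark, the indices returned by such a filter would define the training subset that is then passed to the standardized CLIP training and evaluation pipeline.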