In this paper, we present a novel approach to automatically assign entity labels to images from existing noisy image-text pairs. The approach employs a named entity recognition model to extract candidate entities from the text, and then uses a CLIP model to select the entities that actually match the paired image as labels. The approach is simple and scales easily to billions of image-text pairs mined from the web, through which we have successfully created a dataset with 2 million distinct entities. We study new training approaches on the dataset collected with large-scale entity labels, including supervised pre-training, contrastive pre-training, and multi-task learning. Experiments show that supervised pre-training with large-scale entity labels is highly effective for image retrieval tasks, and that multi-task training further improves performance. The final model, called \textbf{MOFI}, achieves 83.59% mAP on the challenging GPR1200 dataset, compared to 67.33% for OpenAI's CLIP model. Additional experiments on linear probing and zero-shot image classification tasks also show that MOFI outperforms a CLIP model trained on the original image-text data, demonstrating the effectiveness of the new dataset for learning general-purpose image representations.
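The labeling pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `extract_entities` and the scoring function are hypothetical stand-ins for a real NER model and a CLIP image-text similarity model; the threshold value is likewise an assumption.

```python
def extract_entities(caption: str) -> list[str]:
    # Placeholder NER: treat capitalized tokens as candidate entities.
    # A real pipeline would use a trained named entity recognition model.
    return [tok for tok in caption.split() if tok and tok[0].isupper()]

def entity_labels(image, caption: str, score_fn, threshold: float = 0.3) -> list[str]:
    """Assign entity labels to an image from its noisy paired caption.

    score_fn(image, entity) stands in for CLIP image-text similarity;
    only entities whose score passes the threshold are kept as labels.
    """
    candidates = extract_entities(caption)
    return [e for e in candidates if score_fn(image, e) >= threshold]
```

In practice, `score_fn` would embed the image and each candidate entity string with a CLIP image and text encoder and return their cosine similarity, so that entities mentioned in the text but absent from the image are filtered out.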