Nomic AI has recently introduced two major releases in multimodal embedding models: Nomic Embed Vision v1 and Nomic Embed Vision v1.5. These models are designed to provide high-quality, fully replicable vision embeddings that integrate seamlessly with the existing Nomic Embed Text v1 and v1.5 models. This integration creates a unified embedding space that improves performance on text and multimodal tasks, outperforming competitors such as OpenAI CLIP and OpenAI Text Embedding 3 Small.
Nomic Embed Vision aims to address the limitations of existing multimodal models such as CLIP, which, while impressive in zero-shot multimodal capabilities, underperform on tasks outside of image retrieval. By aligning a vision encoder with the existing Nomic Embed Text latent space, Nomic has created a unified multimodal latent space that excels at both image and text tasks. This unified space has shown superior performance on benchmarks such as ImageNet zero-shot, MTEB, and Datacomp, making it the first open-weights model to achieve such results.
Nomic Embed Vision models can embed both image and text data, perform unimodal semantic search within datasets, and perform multimodal semantic search across datasets. With only 92M parameters, the vision encoder is ideal for high-volume production use cases and complements the 137M-parameter Nomic Embed Text encoder. Nomic has open-sourced the training code and replication instructions, allowing researchers to reproduce and improve the models.
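As an illustration of cross-modal search in this shared embedding space, the sketch below embeds a text query with Nomic Embed Text and a handful of images with Nomic Embed Vision, then ranks the images by cosine similarity. It is a minimal sketch that assumes the Hugging Face model IDs nomic-ai/nomic-embed-vision-v1.5 and nomic-ai/nomic-embed-text-v1.5 and the loading pattern described on their model cards (trust_remote_code, a CLS-token image embedding, mean-pooled text embeddings, and a "search_query: " prefix); details may differ from the official examples.

```python
# Minimal sketch of multimodal semantic search with Nomic Embed Vision + Text.
# Model IDs, prefixes, and pooling follow the Hugging Face model cards as
# understood here; treat them as assumptions, not official Nomic code.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoImageProcessor, AutoModel, AutoTokenizer

processor = AutoImageProcessor.from_pretrained("nomic-ai/nomic-embed-vision-v1.5")
vision_model = AutoModel.from_pretrained(
    "nomic-ai/nomic-embed-vision-v1.5", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1.5")
text_model = AutoModel.from_pretrained(
    "nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True
)

def embed_images(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images, return_tensors="pt")
    with torch.no_grad():
        out = vision_model(**inputs).last_hidden_state
    return F.normalize(out[:, 0], p=2, dim=1)  # CLS token as the image embedding

def embed_query(query):
    enc = tokenizer(
        ["search_query: " + query], padding=True, truncation=True, return_tensors="pt"
    )
    with torch.no_grad():
        out = text_model(**enc).last_hidden_state
    mask = enc["attention_mask"].unsqueeze(-1).float()
    pooled = (out * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling over tokens
    return F.normalize(pooled, p=2, dim=1)

image_embs = embed_images(["cat.jpg", "plush_toy.jpg", "street.jpg"])  # placeholder files
query_emb = embed_query("a stuffed animal on a bed")
scores = query_emb @ image_embs.T        # cosine similarity (embeddings are unit-norm)
print(scores.argsort(descending=True))   # image indices ranked by relevance
```

Because both encoders write into the same latent space, the same image embeddings can be reused for text-to-image, image-to-image, or image-to-text search without re-encoding.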
The performance of these models is compared against established baselines, and Nomic Embed Vision demonstrates superior performance across a range of tasks. For example, Nomic Embed v1 achieved 70.70 on ImageNet zero-shot, 56.7 on the Datacomp average, and 62.39 on the MTEB average. Nomic Embed v1.5 performed slightly better, indicating the robustness of these models.
Nomic Embed Vision powers multimodal search in Atlas, showcasing its ability to understand textual queries and image content. An example query demonstrated the model's semantic understanding by retrieving images of stuffed animals from a dataset of 100,000 images and captions.
Training Nomic Embed Vision involved several approaches to aligning the vision encoder with the text encoder, including training on image-text pairs together with text-only data, a Three Towers training method, and Locked-Image Text tuning (LiT). The most effective approach was to freeze the text encoder and train the vision encoder on image-text pairs, ensuring backward compatibility with existing Nomic Embed Text embeddings.
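Conceptually, this frozen-text alignment can be pictured as a CLIP-style contrastive objective in which only the vision tower receives gradients. The sketch below is schematic and not Nomic's training code: the encoder modules, embedding dimension, and temperature initialization are illustrative placeholders.

```python
# Schematic of aligning a trainable vision encoder to a frozen text encoder
# with a symmetric contrastive (InfoNCE) loss. Module names and values are
# illustrative placeholders, not Nomic's actual architecture or hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrozenTextAlignment(nn.Module):
    def __init__(self, vision_encoder: nn.Module, text_encoder: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder      # trainable
        self.text_encoder = text_encoder          # frozen: defines the target latent space
        for p in self.text_encoder.parameters():
            p.requires_grad = False
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # learnable log-temperature

    def forward(self, images: torch.Tensor, texts: torch.Tensor) -> torch.Tensor:
        img = F.normalize(self.vision_encoder(images), dim=-1)
        with torch.no_grad():                      # text embeddings stay fixed
            txt = F.normalize(self.text_encoder(texts), dim=-1)
        logits = self.logit_scale.exp() * img @ txt.T  # image-to-caption similarities
        targets = torch.arange(images.shape[0], device=logits.device)
        # Symmetric InfoNCE: matched image-caption pairs sit on the diagonal.
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
```

Because the text tower never moves, images are pulled into the latent space that existing Nomic Embed Text embeddings already occupy, which is what makes the backward compatibility described above possible.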
The vision encoder was trained on a subset of 1.5 billion image-text pairs using 16 H100 GPUs, achieving impressive results on the Datacomp benchmark, which includes 38 image classification and retrieval tasks.
Nomic has released two versions of Nomic Embed Vision, v1 and v1.5, each compatible with the corresponding version of Nomic Embed Text, allowing seamless multimodal workflows with either generation of models. The models are released under a CC-BY-NC-4.0 license, encouraging experimentation and research, with plans to re-license under Apache-2.0 for commercial use.
In conclusion, Nomic Embed Vision v1 and v1.5 transform multimodal embeddings, providing a unified latent space that excels at image and text tasks. With open-source training code and a commitment to continued innovation, Nomic AI sets a new standard in multimodal embedding models and offers powerful tools for diverse applications.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform receives more than 2 million monthly visits, illustrating its popularity among readers.