Recent advances in generative AI have enabled the development of large multimodal models (LMMs) that can process and generate multiple types of data, such as text, images, audio, and video.
LMMs share with “standard” large language models (LLMs) the generalization and adaptation capacity typical of large foundation models; unlike LLMs, however, they can process data beyond text.
One of the most prominent examples of a large multimodal model is GPT-4V(ision), the latest version of the generative pre-trained transformer (GPT) family. GPT-4V can perform various tasks that require both natural language understanding and computer vision, such as image captioning, visual question answering, text-to-image synthesis, and image-to-text translation.
GPT-4V (along with its newer version, GPT-4 Turbo with Vision) has demonstrated extraordinary capabilities, including:
- Mathematical reasoning about numerical problems
- Generating code from sketches
- Describing artistic heritage

And many others.
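To make this concrete, the snippet below sketches one way to ask a GPT-4 vision model a question about an image through the OpenAI Python SDK. This is a minimal sketch, not the models' internal method: the model name (`gpt-4-vision-preview`), the example image URL, and the payload shape are assumptions based on the public API at the time of writing and may change.

```python
# Minimal sketch: visual question answering with a GPT-4 vision model
# via the OpenAI Python SDK (v1.x). Model name and payload shape are
# assumptions based on the public API and may change over time.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed vision-capable model name
    messages=[
        {
            "role": "user",
            "content": [
                # A single user turn can mix text and image inputs
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    # Hypothetical image URL, for illustration only
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```

The same request structure covers tasks like image captioning or reasoning about a sketch: only the text prompt and the image change.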
In this article, we will focus on the vision capabilities of LMMs and how they differ from standard computer vision algorithms.
What is computer vision?
Computer vision (CV) is a field of artificial intelligence (AI) that allows computers and systems to derive…