Researchers from Datategy SAS in France and the Math & AI Institute in Turkey propose a possible direction for recently emerging multimodal architectures. The core idea of their study is that the well-studied formulation of Named Entity Recognition (NER) can be incorporated into a multimodal large language model (LLM) environment.
Multimodal architectures such as LLaVA, Kosmos, or AnyMAL have been gaining ground recently and have demonstrated their capabilities in practice. These models tokenize data from non-text modalities, such as images, and use modality-specific external encoders to embed them in a joint linguistic space. This design lets such architectures process multimodal data interleaved with text.
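To make the pattern concrete, here is a minimal PyTorch sketch of this encoder-plus-projection recipe: a frozen modality encoder produces feature vectors, and a small trainable projector maps them into the LLM's token-embedding space so they can be spliced between ordinary text embeddings. The projector design and all dimensions are illustrative assumptions, not the exact implementation of any of the cited models.

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    def __init__(self, encoder_dim: int, llm_dim: int):
        super().__init__()
        # A small trainable MLP maps encoder features into the LLM's
        # token-embedding space; the frozen encoder itself is not shown.
        self.proj = nn.Sequential(
            nn.Linear(encoder_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # (batch, num_tokens, encoder_dim) -> (batch, num_tokens, llm_dim)
        return self.proj(features)

# Interleaving: splice projected "image tokens" between text embeddings.
text_emb = torch.randn(1, 16, 4096)    # embeddings of the surrounding text
img_feats = torch.randn(1, 256, 1024)  # output of a frozen image encoder
projector = ModalityProjector(encoder_dim=1024, llm_dim=4096)
sequence = torch.cat(
    [text_emb[:, :8], projector(img_feats), text_emb[:, 8:]], dim=1
)
```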
The authors propose that this generic architectural preference can be extended to a much more ambitious setting in the near future, which they refer to as an “omnimodal era.” “Entities,” in the sense familiar from NER, can themselves be treated as modalities for these types of architectures.
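In practice, the entity-as-modality idea could plausibly look like the sketch below: an NER tagger marks spans, and each entity type is dispatched to its own encoder whose output replaces the span's surface tokens in the input sequence. Every name and interface here is a hypothetical illustration of the paper's direction, not code from the paper.

```python
import torch
import torch.nn as nn

LLM_DIM = 4096

class PlaceholderEntityEncoder(nn.Module):
    """Stand-in for a trained, entity-type-specific modality encoder."""
    def __init__(self, dim: int = LLM_DIM):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, span_text: str) -> torch.Tensor:
        # A real encoder would parse the span (a number, a date, a place);
        # here we just hash it into a deterministic pseudo-embedding.
        g = torch.Generator().manual_seed(abs(hash(span_text)) % (2**31))
        return self.proj(torch.randn(LLM_DIM, generator=g))

# One dedicated encoder per entity type, as the paper envisions.
ENTITY_ENCODERS = {
    label: PlaceholderEntityEncoder()
    for label in ("CARDINAL", "DATE", "GPE", "PERSON", "ORG")
}

def splice_entity(token_emb: torch.Tensor, start: int, end: int,
                  label: str, span_text: str) -> torch.Tensor:
    """Replace an entity span's token embeddings with one entity token."""
    entity_vec = ENTITY_ENCODERS[label](span_text).unsqueeze(0)
    return torch.cat([token_emb[:start], entity_vec, token_emb[end:]], dim=0)

# Usage: swap the tokens of "May 5, 2024" (positions 3..7) for a DATE token.
seq = torch.randn(12, LLM_DIM)
seq = splice_entity(seq, 3, 7, "DATE", "May 5, 2024")
```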
For example, current LLMs are known to have difficulty carrying out complete algebraic reasoning. Although research is being done to develop dedicated “mathematics-friendly” models or to call external tools, one horizon for this problem could be to define quantitative values as a modality in this framework. Another example would be implicit and explicit date and time entities, which could be processed by a dedicated temporal-cognition modality encoder.
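As a sketch of what a quantitative-value encoder might look like, the toy module below maps a number to sinusoidal features of its signed log magnitude, so numerically close values land close together in embedding space. The feature scheme and dimensions are our assumptions for illustration; a temporal encoder could analogously map dates and times onto cyclical features.

```python
import math
import torch
import torch.nn as nn

class NumericEncoder(nn.Module):
    def __init__(self, num_freqs: int = 16, llm_dim: int = 4096):
        super().__init__()
        # Fixed geometric frequency ladder, in the spirit of positional
        # encodings; the projection into LLM space is the learned part.
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs).float())
        self.proj = nn.Linear(2 * num_freqs, llm_dim)

    def forward(self, value: float) -> torch.Tensor:
        # Signed log compresses the huge dynamic range of real-world numbers.
        x = math.copysign(math.log1p(abs(value)), value)
        phases = x * self.freqs
        feats = torch.cat([torch.sin(phases), torch.cos(phases)])
        return self.proj(feats)

encoder = NumericEncoder()
emb = encoder(1234.5)  # one "number token" ready to splice into a sequence
```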
LLMs also have a very difficult time with geospatial understanding, where they are far from being considered “geospatially aware.” Numerical global coordinates need to be processed so that notions of proximity and adjacency are accurately reflected in the linguistic embedding space. Incorporating locations as a dedicated geospatial modality, with a specifically designed encoder and joint training, could therefore address this problem. Beyond these examples, the most immediate candidates for entities incorporated as modalities are people, institutions, and so on.
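One plausible way such a geospatial encoder could respect proximity is to map latitude and longitude onto the unit sphere before projecting into the linguistic space, as in the minimal sketch below. This encoding choice is our illustrative assumption, not a design from the paper.

```python
import math
import torch
import torch.nn as nn

class GeoEncoder(nn.Module):
    def __init__(self, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(3, llm_dim)

    def forward(self, lat_deg: float, lon_deg: float) -> torch.Tensor:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        # Points close on Earth are close in this 3-D representation, which
        # a jointly trained projection can carry into the linguistic space.
        xyz = torch.tensor([math.cos(lat) * math.cos(lon),
                            math.cos(lat) * math.sin(lon),
                            math.sin(lat)])
        return self.proj(xyz)

paris = GeoEncoder()(48.8566, 2.3522)  # a "location token" for Paris
```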
The authors argue that this type of approach promises to ease the parametric/non-parametric knowledge trade-off and the context-length limitation, since complexity and information can be distributed across numerous modality encoders. It could also simplify injecting updated information through individual modalities. The researchers outline the contours of such a potential framework and discuss the promises and challenges of developing an entity-driven language model.
Check out the Paper. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easily understandable to a wide audience. The platform has more than 2 million monthly visits, which illustrates its popularity among readers.