LLaVA-NeXT-Interleave: A large, versatile multimodal model for multi-image, multi-frame, and multi-view inputs
Recent advances in large multimodal models (LMMs) have demonstrated remarkable capabilities in diverse multimodal settings, moving closer to the goal ...