From Specialists to General-Purpose Assistants: A Deep Dive into the Evolution of Multimodal Core Models in Vision and Language 10/12/2023