xAI, the OpenAI competitor founded by Elon Musk, has unveiled the first version of Grok that can process visual information. Grok-1.5V is the company's first-generation multimodal AI model, which processes not only text but also “documents, diagrams, charts, screenshots and photos.” In its announcement, xAI gave some examples of how the model's capabilities can be used in the real world. You can, for example, show Grok a photo of a flowchart and ask it to translate the diagram into Python code, have it write a story based on a drawing, or even have it explain a meme you don't understand. Hey, not everyone can keep up with everything the internet spits out.
The new version comes just a couple of weeks after the company introduced Grok-1.5. That model was designed to be better at coding and math than its predecessor, and it could also process longer contexts, allowing it to draw on more sources to better understand certain queries. xAI said early testers and existing users will soon be able to try Grok-1.5V's capabilities, although it did not give an exact timeline for the launch.
In addition to introducing Grok-1.5V, the company also released a benchmark dataset it calls RealWorldQA. You can use any of RealWorldQA's 700 images to evaluate AI models: each image comes with a question and an easily verifiable answer, yet the set can stump multimodal models like Grok. xAI said its model received the highest score when the company tested it with RealWorldQA against competitors such as OpenAI's GPT-4V and Google's Gemini Pro 1.5.