Large language models (LLMs) have demonstrated the ability to generate general-purpose computer programs, which suggests some understanding of program structure. However, it is difficult to determine the true capabilities of LLMs, especially on tasks they did not encounter during training. A critical question is whether LLMs can truly “understand” symbolic graphics programs, which generate visual content when executed. The researchers define this understanding as the ability to grasp the semantic content of the rendered image based solely on the raw text of the program. In practice, this means answering questions about the content of the image without ever seeing it, a task that is easy with visual input but much more difficult when relying solely on the program text.
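To make the task concrete, here is a minimal, hypothetical illustration in Python: the model receives only the text of a symbolic graphics program (an SVG snippet in this case) plus a semantic question about the image it would render, and never sees the rendered pixels. The specific SVG program, question, and prompt wording are illustrative assumptions, not examples from the benchmark itself.

```python
# A hypothetical SVG program: a red circle with a white horizontal bar,
# which renders as something resembling a "no entry" road sign.
svg_program = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="red"/>
  <rect x="20" y="45" width="60" height="10" fill="white"/>
</svg>"""

question = "What everyday object does the rendered image most resemble?"

# The LLM is asked to answer from the program text alone, without rendering.
prompt = (
    "Below is a symbolic graphics program. Answer the question about the "
    "image it produces when rendered, without rendering it.\n\n"
    f"{svg_program}\n\nQuestion: {question}"
)
print(prompt)
```

A human (or a vision-language model) shown the rendered image would answer this question easily; the challenge studied here is whether an LLM can do so from the symbolic program alone.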
Existing research on symbolic graphics programs has focused primarily on procedural modeling of 2D shapes and 3D geometry. Formats such as constructive solid geometry (CSG), computer-aided design (CAD), and scalable vector graphics (SVG) provide a clear and interpretable representation of visual content. LLMs have also been applied to various programming tasks such as code retrieval, automated testing, and code generation; however, understanding symbolic graphics programs is a largely different problem, because their semantic meaning is defined visually. Existing benchmarks for LLMs focus on comprehension of non-graphical programs, while vision-language models are evaluated on multimodal datasets for tasks such as image captioning and visual question answering.
Researchers from the Max Planck Institute for Intelligent Systems, Tübingen, the University of Cambridge, and MIT have proposed a new approach to assess and improve LLMs’ understanding of symbolic graphics programs. They present SGP-Bench, a benchmark that measures LLMs’ semantic understanding and consistency when interpreting SVG (2D vector graphics) and CAD (2D/3D objects) programs. In addition, they develop a new fine-tuning method, symbolic instruction fine-tuning (SIT), based on a collected instruction dataset, to improve performance. They also create a symbolic MNIST dataset that reveals a striking gap between how LLMs and humans understand symbolic graphics programs.
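As a rough sketch of what a symbolic instruction fine-tuning example might look like, the snippet below pairs a symbolic graphics program and a semantic question as the instruction, with the answer as the target. The field names, prompt wording, and helper function are illustrative assumptions for a generic instruction-tuning format, not the paper’s exact data schema.

```python
# Hypothetical sketch: building one SIT-style training example.
def build_sit_example(program_text: str, question: str, answer: str) -> dict:
    return {
        "instruction": (
            "You are given a symbolic graphics program. Answer the question "
            "about the image it renders.\n\n"
            f"Program:\n{program_text}\n\nQuestion: {question}"
        ),
        "output": answer,
    }

example = build_sit_example(
    program_text='<svg ...><circle cx="50" cy="50" r="40" fill="red"/></svg>',
    question="What color is the shape in the image?",
    answer="Red",
)
print(example["instruction"])
```

A dataset of such (instruction, output) pairs could then be fed to any standard supervised fine-tuning pipeline.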
The benchmark for assessing LLMs’ understanding of symbolic graphics programs is built through a scalable and efficient process. A powerful vision-language model (GPT-4o) generates semantic questions based on rendered images of the symbolic programs, and human annotators then verify the quality and accuracy of these automatically generated question-answer pairs. This approach greatly reduces the manual effort required compared with traditional dataset creation. The process is straightforward for SVG and 2D CAD programs, which directly produce 2D images; for 3D CAD programs, the 3D models are first rendered into 2D images from multiple fixed camera positions.
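A simplified sketch of this construction pipeline is given below. The helper functions render_to_images, query_vision_language_model, and approved_by_annotator are hypothetical placeholders standing in for a renderer, a GPT-4o-style API call, and the human verification step; they are assumptions made for illustration, not real library functions.

```python
# Hypothetical sketch of the question-generation pipeline described above.
def build_question_answer_pairs(symbolic_program: str, program_type: str) -> list[dict]:
    # SVG and 2D CAD programs render directly to a 2D image;
    # 3D CAD models are rendered from several fixed camera positions.
    if program_type == "cad3d":
        views = render_to_images(symbolic_program,
                                 cameras=["front", "side", "top", "iso"])
    else:
        views = render_to_images(symbolic_program)

    # A vision-language model writes semantic question-answer pairs from the
    # rendered image(s); note that it sees pixels, not the program text.
    qa_pairs = query_vision_language_model(
        images=views,
        instruction=("Write multiple-choice questions about the semantic "
                     "content of this image, marking the correct answer."),
    )

    # Human annotators verify each pair before it enters the benchmark.
    return [qa for qa in qa_pairs if approved_by_annotator(qa)]
```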
LLMs’ understanding of symbolic graphics programs is further assessed on the SGP-MNIST dataset, which consists of 1,000 SVG programs representing MNIST-like digit images, with 100 programs per digit (0–9). While humans can easily recognize the rendered digits, LLMs find it extremely difficult to interpret the symbolic programs: even the state-of-the-art GPT-4o performs barely better than random guessing. This stark contrast between human and LLM performance highlights a significant gap in how machines and humans process symbolic representations of visual information.
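The snippet below is a minimal, self-contained sketch of how accuracy on an SGP-MNIST-style evaluation compares against the random baseline of 10%. The predict_digit function is a hypothetical stub standing in for “ask the LLM which digit this SVG program renders”; here it guesses uniformly at random, which is roughly the level GPT-4o reportedly reached on this task.

```python
import random

NUM_DIGITS = 10           # digits 0-9
PROGRAMS_PER_DIGIT = 100  # 100 SVG programs per digit, 1,000 in total

def predict_digit(svg_program: str) -> int:
    # Placeholder for querying an LLM with the SVG program text.
    return random.randint(0, 9)

correct = 0
total = NUM_DIGITS * PROGRAMS_PER_DIGIT
for digit in range(NUM_DIGITS):
    for i in range(PROGRAMS_PER_DIGIT):
        svg_program = f"<svg><!-- program {i} rendering digit {digit} --></svg>"
        if predict_digit(svg_program) == digit:
            correct += 1

accuracy = correct / total
print(f"Accuracy: {accuracy:.1%} (random baseline: {1 / NUM_DIGITS:.0%})")
```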
In conclusion, the researchers present a new way to evaluate LLMs by assessing their ability to understand images directly from their symbolic graphics programs, without visual input. They created SGP-Bench, a benchmark that effectively measures LLM performance on this task, and introduced symbolic instruction fine-tuning (SIT) to improve the ability of LLMs to interpret graphics programs. This research provides a clearer picture of LLMs’ capabilities and encourages the creation of more diverse evaluation tasks. Future research directions include investigating how LLMs acquire semantic understanding in this setting and developing more advanced methods to improve their performance on these tasks.
Take a look at the Paper. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final-year student at IIT Kharagpur. As a technology enthusiast, he delves into practical applications of AI, focusing on the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.