Remarkable advances in multimodal large language models (MLLMs) have not rendered them immune to challenges, particularly their susceptibility to misleading information in prompts, which can cause them to produce hallucinated responses. To quantitatively evaluate this vulnerability, we present MAD-Bench, a carefully curated benchmark containing 1,000 test samples divided into 5 categories, such as non-existent objects, object count, and spatial relationship. We provide a comprehensive analysis of popular MLLMs, ranging from GPT-4v, Reka, and Gemini-Pro to open-source models such as LLaVA-NeXT and MiniCPM-Llama3. Empirically, we observe a significant performance gap between GPT-4o and the other models, and find that previously proposed robust instruction-tuned models are not effective on this new benchmark. While GPT-4o achieves an accuracy of 82.82% on MAD-Bench, the accuracy of every other model in our experiments ranges from 9% to 50%. We further propose a remedy that adds an additional paragraph to the misleading prompts to encourage models to think twice before answering the question. Surprisingly, this simple method can even double the accuracy; however, the absolute numbers are still too low to be satisfactory. We hope that MAD-Bench can serve as a valuable benchmark to stimulate further research on improving model resilience against misleading prompts.
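
To make the proposed remedy concrete, the minimal sketch below shows one way a cautionary paragraph could be prepended to a misleading prompt before querying a model. The wording of the paragraph and the `query_model` helper are illustrative assumptions for exposition, not the exact prompt or interface used in the experiments.

```python
# Illustrative sketch only: the cautionary wording and query_model() are
# assumptions, not the exact prompt or API used in the paper.

CAUTION_PARAGRAPH = (
    "Before answering, carefully check whether the premise of the question "
    "matches what is actually shown in the image. If the question refers to "
    "objects, counts, or relationships that are not present in the image, "
    "point out the inconsistency instead of answering as if it were true."
)

def harden_prompt(misleading_prompt: str) -> str:
    """Prepend a cautionary paragraph so the model 'thinks twice' before answering."""
    return f"{CAUTION_PARAGRAPH}\n\n{misleading_prompt}"

# Example usage with a hypothetical query_model(image, prompt) function:
# response = query_model(image, harden_prompt("How many dogs are in the picture?"))
```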