This AI paper from Microsoft and Oxford introduces Olympus: a universal task router for computer vision tasks

Computer vision models have made significant progress on individual tasks such as object detection, segmentation, and classification. Complex real-world applications such as autonomous vehicles, security and surveillance, and healthcare and medical imaging require several vision tasks at once. However, each task has its own model architecture and requirements, which makes managing them efficiently within a unified framework a major challenge. Current approaches rely on training a separate model per task, so they are difficult to scale to real-world applications that combine those tasks. Researchers from the University of Oxford and Microsoft have proposed a novel framework, Olympus, which aims to simplify the handling of various vision tasks while enabling more complex workflows and efficient resource utilization.

Traditional computer vision approaches are built on task-specific models, each designed to perform one task efficiently. Requiring a separate model for every task, however, increases the computational burden. Multitask learning models exist, but they often suffer from poor task balance, resource inefficiency, and performance degradation on complex or underrepresented tasks. There is therefore a need for a method that solves these scalability issues, adapts dynamically to new scenarios, and uses resources effectively.

At its core, Olympus uses a multimodal large language model (MLLM) as a controller that interprets user instructions and routes them to the appropriate specialized modules. Olympus's key features include:

Task-aware routing: The MLLM controller analyzes each incoming instruction and routes it to the most suitable specialized model, which conserves computational resources.
Scalable framework: Olympus can handle up to 20 tasks within one system and integrates efficiently with existing MLLMs.
Knowledge sharing: The Olympus components share what they have learned with one another, improving overall efficiency.
Chain-of-action capability: Olympus can chain multiple vision tasks to fulfill a single instruction, which makes it adaptable to complex real-world applications.
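
To make the routing and chaining ideas above concrete, here is a minimal sketch in Python. It is not the Olympus implementation: the registry, the `register` decorator, the two toy specialists, and the keyword-based `plan_tasks` function (a stand-in for the MLLM controller) are all hypothetical names invented for illustration. Passing each specialist's output to the next step in a chain is also an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a task name to a specialist callable.
SPECIALISTS: Dict[str, Callable[[str], str]] = {}


def register(task: str):
    """Register a specialist under a task name (illustrative helper)."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        SPECIALISTS[task] = fn
        return fn
    return decorator


@register("detection")
def detect(payload: str) -> str:
    # Toy stand-in for an object detection model.
    return f"boxes({payload})"


@register("segmentation")
def segment(payload: str) -> str:
    # Toy stand-in for a segmentation model.
    return f"masks({payload})"


def plan_tasks(instruction: str) -> List[str]:
    """Stand-in for the MLLM controller: simple keyword matching,
    NOT a real model. Returns task names in order of appearance."""
    keywords = {"detect": "detection", "segment": "segmentation"}
    text = instruction.lower()
    hits = [(text.find(word), task) for word, task in keywords.items() if word in text]
    return [task for _, task in sorted(hits)]


def route(instruction: str, payload: str) -> str:
    """Dispatch the payload to each planned specialist in turn."""
    tasks = plan_tasks(instruction)
    if not tasks:
        raise ValueError("no registered specialist matches this instruction")
    result = payload
    for task in tasks:
        # Assumption: each step consumes the previous step's output.
        result = SPECIALISTS[task](result)
    return result


if __name__ == "__main__":
    print(route("Detect the cars, then segment them", "img.png"))
```

In a design like this, supporting a new task only means registering another specialist and teaching the controller to name it; the existing specialists are untouched.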

Olympus demonstrated strong performance on several benchmarks. It achieved an average routing accuracy of 94.75% across 20 individual tasks and a precision of 91.82% in chained-action scenarios, where multiple tasks are needed to complete one instruction. The modular routing approach allowed new tasks to be added with minimal retraining, demonstrating its scalability and adaptability.
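
As a rough illustration of what an average routing score over several tasks can mean, the sketch below computes per-task accuracy and then takes an unweighted mean. The article does not say how the reported average was computed, so the macro-average here is an assumption, and the function names and toy records are hypothetical rather than taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_task_routing_accuracy(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """records holds (true_task, predicted_task) pairs.
    Returns the fraction of correctly routed instructions per true task."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for true_task, predicted_task in records:
        total[true_task] += 1
        correct[true_task] += int(true_task == predicted_task)
    return {task: correct[task] / total[task] for task in total}


def macro_average(per_task: Dict[str, float]) -> float:
    """Unweighted mean over tasks (assumed averaging scheme)."""
    return sum(per_task.values()) / len(per_task)


if __name__ == "__main__":
    # Toy records for illustration only, not results from the paper.
    toy = [
        ("detection", "detection"),
        ("detection", "segmentation"),
        ("segmentation", "segmentation"),
        ("segmentation", "segmentation"),
    ]
    scores = per_task_routing_accuracy(toy)
    print(scores, macro_average(scores))
```

A macro-average weights every task equally, so a poorly routed rare task lowers the score as much as a poorly routed common one.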

Olympus marks a significant step forward in computer vision. Its task-aware routing mechanism and modular knowledge-sharing framework address the inefficiency and scalability challenges of multitask learning systems. With high routing accuracy, strong precision in chained-action scenarios, and scalability across vision tasks, Olympus establishes itself as a versatile and efficient tool for a range of applications. While further exploration of edge-case tasks, latency trade-offs, and real-world validation is needed, Olympus paves the way for more integrated and adaptive systems and challenges the traditional task-specific model paradigm. With further development and deployment, it could change how complex vision problems are handled across domains and provide a solid foundation for future work in computer vision and artificial intelligence.

Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.

Afeerah Naseem is a Consulting Intern at Marktechpost. He is pursuing his bachelor's degree in technology from the Indian Institute of Technology (IIT), Kharagpur. He is passionate about data science and fascinated by the role of artificial intelligence in solving real-world problems. He loves discovering new technologies and exploring how they can make everyday tasks easier and more efficient.
