A defining characteristic that sets humans apart from other animals is our ability to communicate through language and to use tools to accomplish complex tasks. Recent advances in AI have yielded impressive results, including foundation models that can generate human-like text, but challenges remain before artificial general intelligence (AGI) can be achieved. For example, while these models excel at processing large amounts of unlabeled data, they can struggle with domain-specific tasks such as mathematical calculations. This has led some to suggest that specialized tools may be necessary to help these models take the next step.
Microsoft researchers have introduced TaskMatrix.AI, a new approach to building a more versatile and capable AI system. The concept involves connecting foundation models with millions of existing models and system APIs, resulting in a “super-AI” that can perform a variety of digital and physical tasks. Although current AI models and systems address specific domains effectively, the diversity in their implementations and working mechanisms can make them difficult for foundation models to access. The new ecosystem aims to overcome this obstacle by providing a unified framework for connecting these AI models and systems.
The Microsoft research team outlines several benefits of TaskMatrix.AI. First, it can perform both digital and physical tasks: the foundation model acts as a central system that understands various inputs (text, image, video, audio, and code) and generates code that calls APIs to complete tasks. Second, the platform has a comprehensive API repository with consistent documentation, making it easy for developers to add new APIs. Third, TaskMatrix.AI can keep learning and extending its capabilities as new APIs with specific functions are added to its API platform. Finally, the system is designed to make its responses more interpretable, since both the task-solving logic and the results of the API calls are easy to understand.
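To make the idea of an API repository with consistent documentation more concrete, here is a minimal sketch in Python. It is not TaskMatrix.AI's actual schema: the `ApiDoc` and `ApiRegistry` names, their fields, and the toy `add` API are all assumptions made for illustration. The point is only that every API is registered with the same documentation fields, so a model can be shown one uniform listing.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ApiDoc:
    """Hypothetical documentation record; the field names are illustrative."""
    name: str
    description: str
    parameters: Dict[str, str]      # parameter name -> short description
    handler: Callable[..., object]  # the function that actually does the work


class ApiRegistry:
    """Hypothetical repository that stores every API under one schema."""

    def __init__(self) -> None:
        self._apis: Dict[str, ApiDoc] = {}

    def register(self, doc: ApiDoc) -> None:
        # Reject duplicate names so each documented API stays unambiguous.
        if doc.name in self._apis:
            raise ValueError(f"API already registered: {doc.name}")
        self._apis[doc.name] = doc

    def describe(self) -> str:
        # Render all entries in one uniform format, e.g. for a model prompt.
        lines = []
        for doc in self._apis.values():
            params = ", ".join(doc.parameters)
            lines.append(f"{doc.name}({params}): {doc.description}")
        return "\n".join(lines)

    def call(self, name: str, **kwargs: object) -> object:
        # Check arguments against the documented parameters before calling.
        doc = self._apis[name]
        unknown = set(kwargs) - set(doc.parameters)
        if unknown:
            raise TypeError(f"Undocumented parameters: {sorted(unknown)}")
        return doc.handler(**kwargs)


# Toy API standing in for a specialized tool such as a calculator.
registry = ApiRegistry()
registry.register(
    ApiDoc(
        name="add",
        description="Add two numbers.",
        parameters={"a": "first addend", "b": "second addend"},
        handler=lambda a, b: a + b,
    )
)
```

Under this sketch, adding a new capability means registering one more `ApiDoc`; the listing produced by `describe()` grows without any change to the code that consumes it.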
TaskMatrix.AI is built on four main components, which work together to let the system understand user goals and run API-based executable code for specific tasks. The Multimodal Conversational Foundation Model (MCFM) serves as the primary interface for user communication and can understand multimodal context. The API Platform provides a unified API documentation schema and a place to store millions of APIs. The API Selector uses the MCFM’s understanding of user goals to recommend related APIs. Finally, the API Executor runs the generated action code by calling the relevant APIs and returns the results. In addition, the team has used Reinforcement Learning from Human Feedback (RLHF) to train a reward model that can optimize TaskMatrix.AI using information obtained from human interaction. This approach can help the MCFM and the API Selector find better policies and improve performance on complex tasks.
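The interplay of the four components can be sketched in a few lines of Python. This is a toy under stated assumptions, not the team's implementation: the `API_PLATFORM` contents, the word-overlap ranking in `select_apis`, and the hard-coded `generate_actions` function are hypothetical stand-ins. A real system would prompt a foundation model where `generate_actions` appears.

```python
from typing import Callable, Dict, List, Tuple

# An action is an API name plus keyword arguments (a stand-in for action code).
Action = Tuple[str, Dict[str, object]]

# Hypothetical toy API platform: name -> (description, callable).
API_PLATFORM: Dict[str, Tuple[str, Callable[..., str]]] = {
    "create_slide": ("create a new slide with a title",
                     lambda title: f"slide[{title}]"),
    "insert_logo": ("insert a company logo on a slide",
                    lambda company: f"logo[{company}]"),
    "send_email": ("send an email message",
                   lambda to: f"email[{to}]"),
}


def select_apis(goal: str, top_k: int = 2) -> List[str]:
    """API Selector stand-in: rank APIs by word overlap with the goal."""
    goal_words = set(goal.lower().split())
    scored = []
    for name, (description, _) in API_PLATFORM.items():
        overlap = len(goal_words & set(description.split()))
        scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for overlap, name in scored[:top_k] if overlap > 0]


def generate_actions(goal: str, candidates: List[str]) -> List[Action]:
    """MCFM stand-in: a real system would prompt a foundation model here."""
    company = goal.split()[-1]  # assumes the goal ends with the company name
    arguments: Dict[str, Dict[str, object]] = {
        "create_slide": {"title": company},
        "insert_logo": {"company": company},
        "send_email": {"to": company},
    }
    return [(name, arguments[name]) for name in candidates]


def execute(actions: List[Action]) -> List[str]:
    """API Executor stand-in: call each selected API and collect results."""
    results = []
    for name, kwargs in actions:
        _, handler = API_PLATFORM[name]
        results.append(handler(**kwargs))
    return results


def run(goal: str) -> List[str]:
    """Selector -> model -> executor, in that order."""
    return execute(generate_actions(goal, select_apis(goal)))
```

Calling `run("create a slide with a logo for Contoso")` selects the two slide-related APIs, skips the unrelated email API, and returns the two results in order. Representing actions as data rather than generated source code keeps the sketch safe to run; the described system generates executable code instead.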
The team conducted an empirical study of TaskMatrix.AI’s ability to generate PowerPoint slides for different companies, using ChatGPT as the MCFM. The system generated several slides for each company by breaking the task down into 25 API calls. The study demonstrated that TaskMatrix.AI understands both user instructions and PowerPoint content, allowing it to generate pages from a list of companies and to insert a suitable logo based on each page’s title.
The research suggests that TaskMatrix.AI can improve performance on a variety of tasks by connecting foundation models with existing APIs. The team believes that TaskMatrix.AI, together with the continued development of foundation models, cloud services, robotics, and the Internet of Things, has the potential to create a future with greater productivity and creativity.
Check out the Paper. All credit for this research goes to the researchers of this project.
Niharika is a technical consulting intern at Marktechpost. She is a third-year student pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She has a strong interest in machine learning, data science, and artificial intelligence, and is an avid reader of the latest developments in these fields.