Due to their impressive results across a wide range of NLP tasks, large language models (LLMs) such as ChatGPT have attracted great interest from researchers and businesses alike. Thanks to extensive pre-training on huge text corpora and reinforcement learning from human feedback (RLHF), LLMs exhibit remarkable language understanding, generation, interaction, and reasoning abilities. The great potential of LLMs has spawned a host of new areas of study, and the resulting opportunities to build cutting-edge artificial intelligence systems are virtually limitless.
LLMs need to collaborate with other models to realize their full potential and take on challenging AI tasks. It is therefore critical to choose the right middleware to establish communication channels between LLMs and AI models. To solve this problem, the researchers observe that each AI model can be represented as language by summarizing its function. From this, they propose the idea that “LLMs use language as a generic interface to link various AI models.” Specifically, by including model descriptions in its prompts, an LLM can act as the brain of the system, handling the planning, scheduling, and coordination of AI models. With this tactic, LLMs can call on third-party models to complete AI tasks. However, another difficulty arises when incorporating multiple AI models into LLMs: performing many AI tasks requires collecting many high-quality model descriptions, which in turn demands heavy prompt engineering. Fortunately, many public ML communities host a wide selection of models suited to specific AI tasks across language, vision, and speech, and these models come with clear and concise descriptions.
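To make the idea concrete, here is a minimal Python sketch of how model descriptions can be embedded in an LLM prompt so the LLM can plan with them. The model names and descriptions below are illustrative, not HuggingGPT’s actual prompts.

```python
# Each model's capability is summarized as text and injected into the LLM
# prompt, so the LLM can "read" what tools it has available.
# The descriptions here are illustrative examples, not HuggingGPT's own.

model_descriptions = {
    "facebook/detr-resnet-50": "Object detection: finds objects in an image.",
    "openai/whisper-base": "Speech recognition: transcribes audio to text.",
    "runwayml/stable-diffusion-v1-5": "Text-to-image: generates an image from a prompt.",
}

def build_controller_prompt(user_request: str) -> str:
    """Embed every model description in the prompt so the LLM can plan with them."""
    tools = "\n".join(f"- {name}: {desc}" for name, desc in model_descriptions.items())
    return (
        "You are a controller that can call the following models:\n"
        f"{tools}\n\n"
        f"User request: {user_request}\n"
        "Decide which models to use and in what order."
    )

print(build_controller_prompt("Describe the objects in photo.jpg out loud."))
```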
The research team proposes HuggingGPT, which can process inputs from various modalities and solve many complex AI problems, by connecting an LLM (i.e., ChatGPT) with the ML community (i.e., Hugging Face). To communicate with ChatGPT, the researchers attach the description of each AI model from the Hugging Face library to the prompt. The LLM (i.e., ChatGPT) then serves as the “brain” of the system to answer user queries.
Researchers and developers can collaborate on datasets and natural language processing models through the Hugging Face Hub. As a bonus, it offers a simple interface for locating and downloading ready-to-use models for various NLP applications.
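For readers who want to explore the Hub programmatically, a minimal example using the official huggingface_hub library is shown below, assuming a recent library version in which the returned `ModelInfo` objects expose an `id` attribute.

```python
# Query the Hugging Face Hub for models matching a task tag.
# Requires `pip install huggingface_hub`.
from huggingface_hub import list_models

# List the five most-downloaded models tagged for image classification.
for model in list_models(filter="image-classification", sort="downloads", limit=5):
    print(model.id)
```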
Phases of HuggingGPT
HuggingGPT can be broken down into four distinct steps:
- Task planning: ChatGPT interprets the user request to understand its intent, then breaks it down into discrete, actionable tasks, guided by carefully designed prompts.
- Model selection: Based on model descriptions, ChatGPT chooses expert models stored in Hugging Face to complete predetermined tasks.
- Task execution: Call and run each chosen model, then report the results back to ChatGPT.
- Response generation: After integrating the predictions from all models, ChatGPT generates the final responses for users.
To examine these steps more closely:
HuggingGPT starts with a large language model that breaks a user’s request down into discrete steps. When dealing with complex requests, the large language model must establish the relationships among tasks and their ordering. HuggingGPT uses a combination of specification-based instructions and demonstration-based parsing in its prompt design to guide the large language model toward effective task planning. A sketch of such a prompt follows.
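The sketch below combines a format specification with a worked demonstration (few-shot example). The JSON task schema (task name, id, dependency ids, arguments) follows the shape described in the HuggingGPT paper, while the exact wording is illustrative.

```python
# A task-planning prompt: a specification of the output format plus one
# demonstration that shows the LLM how to chain dependent tasks.
PLANNING_PROMPT = """\
Parse the user request into a JSON list of tasks, each of the form
{"task": <task-name>, "id": <int>, "dep": [<ids of prerequisite tasks>],
 "args": {<"text"|"image"|"audio">: <value, or "<GENERATED>-dep_id" if it
 is produced by an earlier task>}}.

Demonstration:
Request: "Read the text in picture.jpg aloud."
Tasks: [{"task": "image-to-text", "id": 0, "dep": [-1],
         "args": {"image": "picture.jpg"}},
        {"task": "text-to-speech", "id": 1, "dep": [0],
         "args": {"text": "<GENERATED>-0"}}]
"""

def planning_prompt(user_request: str) -> str:
    """Append the actual request after the specification and demonstration."""
    return PLANNING_PROMPT + f'\nRequest: "{user_request}"\nTasks:'

print(planning_prompt("Count the sheep in farm.png and say the number aloud."))
```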
After parsing the task list, HuggingGPT must select the appropriate model for each task. The researchers do this by pulling descriptions of expert models from the Hugging Face Hub and then using an in-context task-model assignment mechanism to dynamically choose which model to apply to each task. This approach is more flexible and open: as long as an expert model is described, it can be incorporated incrementally, as in the sketch below.
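A minimal sketch of this selection step. Here `call_llm` is a hypothetical placeholder for whatever chat API (e.g., ChatGPT) the system uses; it is not a real library function.

```python
# In-context model selection: candidate models for the task type, with
# their hub descriptions, are placed in the prompt and the LLM picks one.

def select_model(task: str, candidates: dict[str, str], call_llm) -> str:
    """candidates maps model ids on the hub to their descriptions."""
    listing = "\n".join(f"- {mid}: {desc}" for mid, desc in candidates.items())
    prompt = (
        f"Task: {task}\n"
        f"Candidate models:\n{listing}\n"
        "Reply with only the id of the model best suited to this task."
    )
    return call_llm(prompt).strip()
```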
The next step, once a model has been assigned a task, is to carry it out, a process known as model inference. HuggingGPT uses hybrid inference endpoints to speed up these models and ensure their computational stability. The models receive the task arguments as inputs, perform the necessary computations, and return the inference results to the large language model. Models without resource dependencies can be parallelized to further increase inference efficiency, allowing multiple tasks whose dependencies are all satisfied to start simultaneously, as sketched below.
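A simplified sketch of dependency-aware parallel execution, assuming a hypothetical `run_model` callable that wraps a call to a local model or hosted inference endpoint.

```python
# Tasks whose prerequisites have all finished run in parallel in a thread
# pool, mirroring how resource-independent tasks can start simultaneously.
from concurrent.futures import ThreadPoolExecutor

def execute_tasks(tasks, run_model):
    """tasks: dicts with 'id', 'dep' (prerequisite ids; -1 means none), 'args'."""
    results, pending = {}, list(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            # All tasks whose dependencies are already satisfied can start now.
            ready = [t for t in pending if all(d == -1 or d in results for d in t["dep"])]
            if not ready:
                raise ValueError("cyclic or unsatisfiable task dependencies")
            pending = [t for t in pending if t not in ready]
            futures = {t["id"]: pool.submit(run_model, t, results) for t in ready}
            for task_id, future in futures.items():
                results[task_id] = future.result()
    return results
```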
HuggingGPT moves to the response generation step after all tasks have been executed. It collects the results of the previous three steps (task planning, model selection, and task execution) into a single coherent report detailing the tasks that were planned, the models chosen for them, and the inferences those models produced.
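A minimal sketch of this final step, reusing the hypothetical `call_llm` placeholder from above; the report format is illustrative.

```python
# Fold the plan, the chosen models, and the inference results into one
# report, and ask the LLM to write the user-facing answer from it.

def generate_response(user_request, plan, selections, results, call_llm):
    report = "\n".join(
        f"Task {t['id']} ({t['task']}): model={selections[t['id']]}, "
        f"result={results[t['id']]}"
        for t in plan
    )
    prompt = (
        f"User request: {user_request}\n"
        f"Execution report:\n{report}\n"
        "Using the report, write a direct answer for the user."
    )
    return call_llm(prompt)
```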
Contributions
- It offers a protocol for cooperation between models, combining the complementary strengths of large language models and expert models. Separating the large language model, which functions as the brain for planning and decision-making, from the smaller models, which act as executors of each given task, opens up new approaches to creating general AI systems.
- By connecting ChatGPT to more than 400 task-specific models on the Hugging Face Hub, the researchers built HuggingGPT to address broad classes of AI problems. Thanks to this open collaboration of models, HuggingGPT gives users access to reliable multimodal chat services.
- Numerous experiments on various challenging AI tasks in language, vision, speech, and cross-modality show that HuggingGPT can understand and solve complicated tasks spanning multiple modalities and domains.
Advantages
- HuggingGPT can perform various complex AI tasks and integrate multi-modal perception abilities because its design allows it to use external models.
- In addition, HuggingGPT can continue to absorb knowledge from specialists in specific domains through this pipeline, allowing for expandable and scalable AI capabilities.
- HuggingGPT integrates hundreds of Hugging Face models into ChatGPT, covering 24 tasks such as text classification, object detection, semantic segmentation, image generation, question answering, text-to-speech, and text-to-video. Experimental results show that HuggingGPT can handle complex AI tasks and multimodal data.
Limitations
- HuggingGPT still has restrictions. Efficiency is one of the main concerns, as it represents a potential barrier to practical use.
- The inference of the large language model is the main efficiency bottleneck: HuggingGPT must interact with the large language model multiple times per user request, during task planning, model selection, and response generation. These exchanges significantly lengthen response times, reducing the quality of service for end users. A second concern is the maximum length constraint imposed on contexts.
- HuggingGPT faces a maximum context length restriction due to the maximum number of tokens an LLM can accept. To address this, the researchers track only the task planning phase in the dialog window and conversation context.
- Another main concern is the reliability of the system as a whole. Large language models can occasionally deviate from the instructions during inference, for example by rebelling against the prescribed output format in ways that surprise developers.
- There is also the problem that the expert models on Hugging Face’s inference endpoints are not fully controllable. Expert models may fail during the task execution phase because of network latency or service health.
The source code can be found in a repository called “JARVIS”.
In Conclusion
Improving AI requires solving challenging problems across a variety of domains and modalities. While many AI models exist, none alone is powerful enough to handle complex AI tasks. LLMs can serve as a controller that manages existing AI models to perform complex AI tasks, with language as the generic interface, since LLMs have demonstrated excellent language processing, generation, interaction, and reasoning capabilities. In keeping with this idea, the researchers present HuggingGPT, a framework that uses LLMs (such as ChatGPT) to link AI models from machine learning communities (such as Hugging Face) and complete AI tasks. More specifically, it uses ChatGPT to plan tasks after receiving a user request, choose models based on their Hugging Face descriptions, execute each subtask with the chosen AI model, and compile a response from the execution results. By combining ChatGPT’s superior language capability with Hugging Face’s wealth of AI models, HuggingGPT paves the way toward cutting-edge AI, performing a wide range of complex AI tasks across modalities and domains, with impressive results in language, vision, speech, and more.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Dhanshree Shenwai is a Computer Engineer with solid experience in FinTech companies covering the Finance, Cards & Payments, and Banking domains, and a strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements in today’s changing world, making everyone’s life easier.