Large Language Models (LLMs), including GPT-3, PaLM, OPT, BLOOM, and GLM-130B, have greatly pushed the limits of what is possible for computers to understand and produce in terms of language. Question answering, one of the most fundamental language applications, has improved significantly thanks to recent advances in LLMs. Existing studies show that the performance of LLMs on closed-book QA and in-context-learning QA is on par with that of supervised models, contributing to our understanding of the memorization ability of LLMs. But even LLMs have finite capacity, and they fall short of human expectations when faced with problems that require considerable specialized knowledge. Recent attempts have therefore concentrated on building LLMs enhanced with external knowledge, including retrieval and web search.
For example, WebGPT can browse the web, produce long-form answers to complicated questions, and provide similarly useful references. Despite its popularity, the original WebGPT approach has not yet been widely adopted. First, it relies on many expert-level annotations of browsing trajectories, well-written answers, and answer-preference labels, all of which require expensive resources, considerable time, and extensive training. Second, the behavioral cloning approach (that is, imitation learning) requires its base model, GPT-3, to mimic human experts by instructing the system to interact with a web browser, issue operating commands (such as "Search", "Read", and "Cite"), and then collect relevant material from online sources.
Finally, the multi-turn structure of web browsing demands extensive computing resources and can be excessively slow for the user experience; for example, WebGPT-13B takes around 31 seconds to respond to a 500-token query. In this study, researchers from Tsinghua University, Beihang University, and Zhipu.AI introduce WebGLM, a robust web-enhanced question-answering system based on the 10-billion-parameter General Language Model (GLM-10B); Figure 1 shows an illustration. It is efficient, affordable, sensitive to human preferences, and, most importantly, of a caliber on par with WebGPT. To achieve good performance, the system uses several novel approaches and designs, including an LLM-augmented retriever: a two-stage retriever that combines coarse-grained web search with fine-grained, LLM-distilled retrieval.
This technique is inspired by the ability of LLMs like GPT-3 to spontaneously adopt correct references, and it can be refined to improve smaller dense retrievers. The system also includes a bootstrapped generator: a GLM-10B-based response generator bootstrapped via LLM in-context learning and trained on quoted long-form QA samples. Instead of relying on expensive human experts as in WebGPT, LLMs can be prompted to produce high-quality data with proper citation-based filtering. Finally, a scorer trained on the approval signals of users of online QA forums can learn the preferences of the human majority across different responses.
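The overall flow of such a system (coarse web search, fine-grained reranking, citation-grounded generation, and preference-based answer selection) can be sketched as follows. This is a minimal illustration of the pipeline described above, not the authors' implementation; all function names and the candidate-selection step are assumptions for clarity.

```python
# Hypothetical sketch of a WebGLM-style pipeline. The four callables stand in
# for the paper's components: a coarse web-search stage, a fine-grained
# LLM-distilled dense reranker, a bootstrapped citation-grounded generator,
# and a human-preference scorer. All names are illustrative.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Reference:
    url: str
    text: str


def web_glm_answer(
    question: str,
    web_search: Callable[[str], List[Reference]],               # coarse stage
    rerank: Callable[[str, List[Reference]], List[Reference]],  # fine stage
    generate: Callable[[str, List[Reference]], List[str]],      # candidate answers
    score: Callable[[str, str], float],                         # preference scorer
    top_k: int = 5,
) -> str:
    # 1. Coarse retrieval via web search, then fine-grained reranking.
    refs = rerank(question, web_search(question))[:top_k]
    # 2. Generate several candidate answers grounded in the references.
    candidates = generate(question, refs)
    # 3. Return the candidate the preference scorer rates highest.
    return max(candidates, key=lambda ans: score(question, ans))
```

With stub components (e.g., a scorer that simply prefers longer answers), `web_glm_answer` wires the stages together end to end; in the real system each callable would be a trained model or a search API.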
They show that a proper dataset architecture can produce a high-quality scorer compared to WebGPT's expert labeling. The results of their quantitative ablation tests and in-depth human evaluation show just how efficient and effective the WebGLM system is. Notably, WebGLM (10B) outperforms WebGPT (175B) in their Turing test and outperforms the similarly sized WebGPT (13B). Thanks to its improvement over the only publicly accessible system, Perplexity.ai, WebGLM is one of the best publicly available web-enhanced QA systems as of this publication. In conclusion, this work contributes the following: • It builds WebGLM, an effective web-enhanced question-answering system with human preference awareness. It performs comparably to WebGPT (175B) and substantially better than the similarly sized WebGPT (13B).
It also outperforms Perplexity.ai, a popular system powered by LLMs and search engines. • It identifies WebGPT's limitations in real-world deployments and proposes a set of new designs and strategies that enable WebGLM's high accuracy while achieving efficiency and cost advantages over baseline systems. • It formulates human evaluation metrics for web-enhanced question-answering systems. Extensive experiments and human evaluations demonstrate WebGLM's strong capabilities and yield insights into the system's future development. The code implementation is available on GitHub.
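The human-preference scorer mentioned above is trained on approval signals from online QA forums. One minimal way to sketch turning such vote counts into pairwise preference data is shown below; the function name, data shapes, and the vote-margin filter are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative sketch: converting forum thumbs-up counts into
# (question, preferred, rejected) triples for preference training.
from itertools import combinations
from typing import Dict, List, Tuple


def build_preference_pairs(
    question: str,
    answers_with_votes: Dict[str, int],
    min_margin: int = 5,  # assumed filter: require a clear vote gap
) -> List[Tuple[str, str, str]]:
    """Return (question, preferred_answer, rejected_answer) triples."""
    pairs = []
    for (a1, v1), (a2, v2) in combinations(answers_with_votes.items(), 2):
        # Keep only pairs where one answer is clearly preferred by users.
        if abs(v1 - v2) >= min_margin:
            better, worse = (a1, a2) if v1 > v2 else (a2, a1)
            pairs.append((question, better, worse))
    return pairs
```

Triples like these are the standard input for training a pairwise reward/scoring model; filtering by a vote margin is one simple way to reduce label noise from near-tied answers.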
Check out the Paper and GitHub. Don't forget to join our 24k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more. If you have any questions about the article above or if we missed anything, feel free to email us at [email protected]
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.