Introduction
Neural Information Processing Systems (NeurIPS) 2023, a premier conference in artificial intelligence and machine learning, set new benchmarks in research and collaboration. This year's conference attracted a record 13,321 submissions. The rigorous review process, conducted by more than 1,100 area chairs, 100 senior area chairs, and 396 ethics reviewers, led to the acceptance of 3,584 papers. This high level of participation underlines the event's importance as a hub for cutting-edge research and innovation in the AI community.
Award categories
This year, the awards were grouped into three categories:
- Outstanding Main Track Papers
- Outstanding Main Track Runner-Ups
- Outstanding Datasets and Benchmarks Papers
Each category honors a different facet of AI research, reflecting the diverse and multifaceted nature of the field.
Outstanding Main Track Papers
1. Privacy Auditing with One (1) Training Run
Authors: Thomas Steinke, Milad Nasr, Matthew Jagielski
Abstract: This paper presents a novel method for auditing differentially private machine learning systems using a single training run, a substantial leap from traditional approaches that require many repeated runs. The implications are significant, promising faster and cheaper audits of privacy-focused machine learning algorithms and potentially changing how privacy guarantees are verified in practice. A toy numerical sketch of the idea follows below.
You can access this document here.
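To make the idea concrete, here is a toy numerical sketch in the spirit of a one-run audit, written in Python. This is not the authors' estimator: the canary-inclusion scheme, the stand-in loss-based attack, and the simple accuracy-to-epsilon conversion are illustrative assumptions, and the paper's confidence intervals and refinements are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Include each of m "canary" examples with probability 1/2 in a single
# (simulated) training run, let an attacker guess which were included,
# and convert the guessing accuracy into an empirical epsilon lower bound.
m = 1000
included = rng.integers(0, 2, size=m).astype(bool)

# Stand-in attack signal: included canaries tend to have lower loss.
loss = rng.normal(loc=np.where(included, 0.8, 1.2), scale=0.3)
guesses = loss < 1.0

accuracy = np.mean(guesses == included)
# Heuristic conversion from the bound acc <= e^eps / (1 + e^eps);
# the paper derives much tighter, statistically valid bounds.
eps_lower = np.log(accuracy / (1 - accuracy))
print(f"attack accuracy {accuracy:.3f} -> empirical epsilon lower bound {eps_lower:.2f}")
```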
2. Are Emergent Abilities of Large Language Models a Mirage?
Authors: Rylan Schaeffer, Brando Miranda, Sanmi Koyejo
Abstract: Challenging conventional wisdom, this paper critically examines the supposed emergent abilities of large language models. The authors argue that these abilities may not be inherent to scaling the models themselves but may instead be artifacts of the metrics used to evaluate them. This provocative stance prompts a reevaluation of our understanding of large language models and underscores the need for more robust metrics to accurately assess AI capabilities. A small illustration of the metric effect follows below.
You can access this document here.
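The paper's core argument can be illustrated in a few lines of Python. This is not the authors' code; the model scales and per-token accuracy curve are made up. The point is only that an all-or-nothing metric (exact match over a whole sequence) can make a smoothly improving model look like it gains an ability abruptly, while a continuous per-token metric stays smooth.

```python
import numpy as np

model_scales = np.logspace(7, 11, 9)  # hypothetical parameter counts
# Per-token accuracy improving smoothly with scale (a made-up sigmoid).
per_token_acc = 1 / (1 + np.exp(-1.5 * (np.log10(model_scales) - 9)))
seq_len = 10  # suppose the task needs 10 tokens to all be correct

exact_match = per_token_acc ** seq_len  # looks "emergent": ~0, then jumps
token_score = per_token_acc             # continuous metric: smooth throughout

for n, em, tok in zip(model_scales, exact_match, token_score):
    print(f"{n:.0e} params  exact-match={em:.3f}  per-token={tok:.3f}")
```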
Outstanding Main Track Runner-Ups
3. Scaling Data-Constrained Language Models
Authors: Niklas Muennighoff et al.
Abstract: This paper addresses the challenge of scaling language models when training data, rather than compute, is the limiting factor. Large language models traditionally rely on vast datasets for training; the authors propose techniques for improving model performance even with smaller datasets, potentially democratizing access to advanced language modeling.
You can access this document here.
4. Direct Preference Optimization: Your Language Model Is Secretly a Reward Model
Authors: Rafael Rafailov et al.
Abstract: Offering a fresh perspective, this paper presents a method for steering the behavior of large language models by optimizing them directly on human preference comparisons, without first training a separate reward model. This approach could pave the way for more controllable and user-centered language models, improving their practical usability and ethical alignment. A hedged sketch of the objective follows below.
Click here to explore this document.
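For a concrete picture, below is a hedged PyTorch sketch of the DPO objective as it is commonly written: push the policy's log-probabilities toward the preferred response and away from the rejected one, measured relative to a frozen reference model. The tensor values in the usage example are made up.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO objective: prefer the chosen response over the
    rejected one, relative to a frozen reference model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-13.0, -10.0]), torch.tensor([-13.5, -10.5]))
print(loss.item())
```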
Outstanding Datasets and Benchmarks Papers
5. ClimSim: A Large Multi-Scale Dataset for Hybrid Physics-ML Climate Emulation
Authors: Sungduk Yu et al.
Abstract: This paper introduces ClimSim, a dataset of unprecedented scale designed for hybrid physics and machine learning research in climate modeling. As the largest dataset of its kind, ClimSim is an invaluable resource for researchers working to improve climate modeling and prediction techniques.
Click here to explore this document.
6. DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Authors: Boxin Wang et al.
Abstract: This paper addresses a crucial aspect of AI development: trustworthiness. It proposes a comprehensive framework for evaluating the trustworthiness of GPT (Generative Pre-trained Transformer) models, marking a significant step toward more reliable and ethically sound language models.
Click here to access this document.
Test of Time Award: a legacy of impact
Following tradition, the conference also featured the “Test of Time” award, given to a decade-old paper that has significantly influenced the field. This year's recipient was “Distributed Representations of Words and Phrases and their Compositionality” by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. First presented at NeurIPS 2013 and now cited over 40,000 times, this paper introduced the word2vec word embedding technique. Its approach to learning from large volumes of unstructured text ushered in a new era in natural language processing, making it a cornerstone of AI research. A small modern-library example of the technique follows below.
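As a small illustration of the technique, the snippet below trains skip-gram word2vec with negative sampling on a toy corpus using the gensim library. This is a modern re-implementation, not the authors' original code, and the corpus and hyperparameters are placeholders.

```python
from gensim.models import Word2Vec

# Tiny toy corpus; the original word2vec models were trained on billions of words.
sentences = [
    ["neural", "networks", "learn", "word", "representations"],
    ["words", "with", "similar", "contexts", "get", "similar", "vectors"],
    ["king", "queen", "man", "woman"],
]

# Skip-gram (sg=1) with negative sampling (negative=5), as in the paper.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                 sg=1, negative=5, epochs=50)

print(model.wv["word"][:5])                       # part of a learned embedding vector
print(model.wv.most_similar("similar", topn=3))   # nearest neighbours in the space
```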
Additional Noteworthy Papers
7. Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Authors: Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan
Abstract: This paper introduces the “Tree of Thoughts” (ToT) framework for problem solving with language models. It addresses the limitations of language models on tasks that require exploration, strategic lookahead, or critical early decisions. ToT treats coherent units of text (“thoughts”) as intermediate steps toward a solution, allowing the model to make deliberate decisions by considering multiple reasoning paths, self-evaluating candidates, and looking ahead or backtracking as needed. The framework significantly improves problem solving on tasks that require planning or non-trivial search, as demonstrated in experiments on the Game of 24, creative writing, and mini crossword puzzles. A skeletal version of the search loop follows below.
Click here to access this document.
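The search loop at the heart of ToT can be sketched in a few lines. In the sketch below, `propose_thoughts` and `score_thought` are hypothetical stand-ins for language-model calls (one proposes candidate intermediate steps, the other self-evaluates them), and the breadth-first, beam-style search is only one of the strategies the framework supports.

```python
def propose_thoughts(state, k=3):
    # Placeholder: in practice, prompt the LM for k candidate next "thoughts".
    return [state + [f"step-{len(state)}-option-{i}"] for i in range(k)]

def score_thought(state):
    # Placeholder: in practice, ask the LM how promising the partial solution is
    # (e.g. "sure / maybe / impossible" mapped to numbers).
    return -len(state[-1]) if state else 0.0

def tree_of_thoughts_bfs(initial_state, depth=3, beam_width=2):
    frontier = [initial_state]
    for _ in range(depth):
        candidates = [s for state in frontier for s in propose_thoughts(state)]
        # Keep only the most promising partial solutions (deliberate search).
        frontier = sorted(candidates, key=score_thought, reverse=True)[:beam_width]
    return frontier[0]

print(tree_of_thoughts_bfs([]))
```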
8. Toolformer: Language Models Can Teach Themselves to Use Tools
Authors: Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
Abstract: The authors present Toolformer, a language model (LM) trained to call external tools through simple APIs, addressing the paradox that LMs excel at learning new tasks from a few examples yet struggle with basic functions such as arithmetic or factual lookup. Toolformer learns, in a self-supervised manner from only a handful of demonstrations per tool, when and how to call a calculator, a question-answering system, a search engine, a translation system, and a calendar. This approach substantially improves zero-shot performance across a variety of tasks while preserving the model's core language abilities. A simplified illustration of the embedded-call idea follows below.
Click here to explore this document.
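As a simplified illustration of embedded tool calls, the snippet below post-processes generated text, finds bracketed calls, and splices in tool results. The call syntax, the tool set, and the `execute_tool_calls` helper are illustrative assumptions, not the paper's exact format or training procedure.

```python
import re

# Toy tools; the lambda-based calculator is unsafe for anything but a demo.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "Calendar": lambda _: "2023-12-10",
}

def execute_tool_calls(text):
    """Find bracketed calls like [Calculator(400 / 1400)] and insert results."""
    pattern = re.compile(r"\[(\w+)\(([^)]*)\)\]")
    def replace(match):
        tool, arg = match.group(1), match.group(2)
        return TOOLS[tool](arg) if tool in TOOLS else match.group(0)
    return pattern.sub(replace, text)

generated = "The share is [Calculator(400 / 1400)] of the total, as of [Calendar()]."
print(execute_tool_calls(generated))
```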
9. Zephyr: Direct Distillation of LM Alignment
Authors: Lewis Tunstall et al.
Abstract: This paper introduces ZEPHYR-7B, a 7-billion-parameter language model designed to align closely with user intent. It combines distilled supervised fine-tuning (dSFT) with distilled direct preference optimization (dDPO), using AI feedback (AIF) in the form of outputs ranked by a teacher model. This efficient approach requires only a few hours of training, no additional sampling during fine-tuning, and no human annotation. ZEPHYR-7B outperforms existing models, including LLAMA2-CHAT-70B, on chat benchmarks. Resources related to the system are shared online for public access.
Click here to explore this document.
10. Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Authors: Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, Brian Ichter
Abstract: This paper presents Chain of Code (CoC), an extension that improves the reasoning of language models (LMs), particularly on tasks that mix logic, arithmetic, and semantic understanding. CoC encourages LMs to write semantic subtasks as flexible pseudocode: a code interpreter executes the lines it can, and an “LMulator” (the LM emulating an interpreter) simulates the undefined parts. This approach outperforms Chain of Thought and other baselines on several benchmarks; notably, CoC achieves a 12% gain over Chain of Thought on BIG-Bench Hard, showing how code-based thinking expands the range of reasoning questions LMs can answer correctly. A minimal sketch of the execute-or-simulate loop follows below.
Click here to explore this document.
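A minimal sketch of the execute-or-simulate loop is below. The `lmulator` function is a hard-coded placeholder for what would, in a real system, be the language model simulating the effect of a non-executable line; the toy program and its single semantic step are invented for illustration.

```python
def lmulator(line, state):
    # Placeholder: a real system would prompt the LM with the program state
    # and ask it to produce the effect of this non-executable line.
    if "is_fruit" in line:
        var = line.split("=")[0].strip()
        state[var] = True  # pretend the LM judged the item to be a fruit
        return state
    raise NotImplementedError(line)

def chain_of_code(lines):
    state = {}
    for line in lines:
        try:
            exec(line, {}, state)          # executable step: use the interpreter
        except Exception:
            state = lmulator(line, state)  # semantic step: simulate with the LM
    return state

program = [
    "items = ['apple', 'desk']",
    "fruit_flag = is_fruit(items[0])",   # undefined function: handled by the LMulator
    "count = 1 if fruit_flag else 0",
]
print(chain_of_code(program))
```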
11. Large Language Models as Zero-Shot Conversational Recommenders
Authors: Zhankui He et al.
Abstract: This paper presents an empirical study of conversational recommendation with large language models in a zero-shot setting. It makes three main contributions: the largest public dataset of real-world conversational recommendations, collected from a popular discussion website; an evaluation showing that large language models outperform existing fine-tuned models even without fine-tuning; and an analysis of model behavior through probing tasks. The analysis clarifies both the effectiveness and the limitations of large language models for conversational recommendation and points to directions for future designs.
Click here to explore this document.
Conclusion
NeurIPS 2023 exemplified the vibrant, rapidly evolving landscape of artificial intelligence and machine learning research. The record number of submissions and the rigorous review process highlighted the event's importance as a nexus for innovative research. The diverse award categories celebrated achievements across many facets of AI: new methods for privacy auditing, challenges to conventional beliefs about emergent abilities in large language models, approaches to scaling language models with limited data, and frameworks for evaluating the trustworthiness of AI systems.
The introduction of major datasets such as ClimSim further underlines the conference's role in fostering advances in interdisciplinary fields. The “Test of Time” award, recognizing the word2vec paper, served as a reminder of the lasting influence of pioneering research. Meanwhile, papers such as “Tree of Thoughts” and “Toolformer” demonstrated the continued push toward more sophisticated and practical applications of AI, pointing to a future in which language models not only understand the world but also interact with it in increasingly complex ways.
NeurIPS 2023 was not only a showcase of current achievements but also a beacon for future exploration, laying the foundation for continued innovation and discovery in the AI community.