APEER: A Novel Automatic Prompt Engineering Algorithm for Passage Relevance Ranking

A major challenge in information retrieval (IR) with large language models (LLMs) is the heavy reliance on human-written prompts for zero-shot relevance ranking. This dependency demands considerable human effort and expertise, making the process slow and subjective. Furthermore, existing methods do not adequately address the complexities of relevance ranking, such as combining queries with long passages and producing comprehensive relevance assessments. These challenges hinder the efficient and scalable application of LLMs in real-world scenarios and limit their potential to improve IR tasks.

Current approaches rely primarily on manual prompt engineering, which is effective but time-consuming and subjective. Manual methods do not scale and are limited by the variability of human expertise. Existing automatic prompt engineering techniques, meanwhile, focus on simpler tasks such as language modeling and classification and do not address the unique complexities of relevance ranking. These complexities include the integration of query-passage pairs and the need for end-to-end relevance ranking, which existing methods handle suboptimally because their optimization processes are too simple.

A team of researchers from Rutgers University and the University of Connecticut proposes APEER (Automatic Prompt Engineering Enhances LLM Reranking), which automates prompt engineering through iterative feedback and preference optimization. This approach minimizes human involvement by generating refined prompts based on performance feedback and aligning them with preferred prompt examples. By systematically refining prompts, APEER addresses the limitations of manual prompt engineering and improves the efficiency and accuracy of LLMs in IR tasks. The method offers a scalable way to optimize LLM prompts in complex relevance ranking scenarios.

APEER first generates an initial prompt and then refines it through two main optimization steps. Feedback optimization obtains performance feedback on the current prompt and generates a refined version. Preference optimization further improves this prompt by learning from sets of positive and negative examples. APEER is trained and validated on multiple datasets, including MS MARCO, TREC-DL, and BEIR, to test the method's robustness and effectiveness across various IR tasks and LLM architectures.
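The two-step loop described above, feedback optimization followed by preference optimization, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function `optimize_prompt` and all four callables (`score`, `get_feedback`, `apply_feedback`, `apply_preference`) are hypothetical placeholders. In practice each would wrap an LLM call or a validation-set evaluation, and how the preferred and rejected prompts are actually selected is an assumption here.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a (preferred_prompt, rejected_prompt) pair.
PreferencePair = Tuple[str, str]


def optimize_prompt(
    initial_prompt: str,
    score: Callable[[str], float],              # assumed: e.g. validation nDCG@10 of a prompt
    get_feedback: Callable[[str], str],         # assumed: an LLM critique of the prompt
    apply_feedback: Callable[[str, str], str],  # assumed: an LLM rewrite guided by the critique
    apply_preference: Callable[[str, List[PreferencePair]], str],  # assumed: rewrite guided by examples
    rounds: int = 3,
) -> str:
    """Iteratively refine a reranking prompt and return the best one seen."""
    best, best_score = initial_prompt, score(initial_prompt)
    pairs: List[PreferencePair] = []

    for _ in range(rounds):
        # Step 1, feedback optimization: critique the current prompt, then rewrite it.
        feedback = get_feedback(best)
        candidate = apply_feedback(best, feedback)
        candidate_score = score(candidate)

        # Record which of the two prompts performed better as a positive/negative example.
        if candidate_score > best_score:
            pairs.append((candidate, best))
        else:
            pairs.append((best, candidate))

        # Step 2, preference optimization: refine the candidate using the example pairs.
        refined = apply_preference(candidate, pairs)
        refined_score = score(refined)

        # Keep whichever prompt scores highest so far.
        for prompt, value in ((candidate, candidate_score), (refined, refined_score)):
            if value > best_score:
                best, best_score = prompt, value

    return best
```

Passing the scoring and rewriting steps in as callables keeps the loop independent of any particular LLM client or dataset, so the same skeleton could be exercised with toy functions before connecting a real model.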

APEER demonstrates significant improvements in LLM performance on relevance ranking tasks. Key metrics such as nDCG@1, nDCG@5, and nDCG@10 show substantial gains over state-of-the-art manual prompts. For example, APEER achieved an average improvement of 5.29 nDCG@10 on eight BEIR datasets compared to manual prompts with the LLaMA3 model. Additionally, APEER prompts transfer better across tasks and LLM architectures, consistently outperforming baseline methods on several datasets and models, including GPT-4, LLaMA3, and Qwen2.
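For readers unfamiliar with the metric, nDCG@k rewards rankings that place highly relevant passages near the top by discounting each passage's relevance by the logarithm of its rank. The sketch below shows one common way to compute it, under two stated simplifications: it uses the linear-gain form (some implementations use 2^rel - 1 as the gain instead), and it builds the ideal ranking only from the labels of the passages in the list, whereas a full evaluation would use all judged passages for the query. The function names are illustrative and not taken from the paper.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items, in ranked order."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of the passages in the order the reranker returned them.
perfect = ndcg_at_k([3, 2, 1], k=3)  # already in ideal order
swapped = ndcg_at_k([0, 1], k=2)     # the relevant passage is ranked second
```

Here `perfect` evaluates to 1.0 because the list is already ideally ordered, while `swapped` is lower because the only relevant passage sits in second place.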

In conclusion, APEER automates prompt engineering for LLMs in IR, addressing the critical challenge of reliance on human-written prompts. By employing iterative feedback and preference optimization, APEER reduces human effort and significantly improves LLM performance across multiple datasets and models. It provides a scalable and effective way to optimize LLM prompts in complex relevance ranking scenarios.

Check out the Paper. All credit for this research goes to the researchers of this project.

Aswin AK is a Consulting Intern at MarkTechPost. He is pursuing his dual degree at the Indian Institute of Technology, Kharagpur. He is passionate about data science and machine learning, and brings a strong academic background and practical experience in solving real-life interdisciplinary challenges.
