Large Language Models (LLMs) have become extremely popular in recent years thanks to their ability to produce text comparable to human-written material and their versatility across natural language processing (NLP) applications. These models can discover correlations and patterns in natural language text that were previously out of reach, enabling practical applications such as question answering, text summarization, and language translation. One of the main factors behind their success is the availability of vast amounts of training data; another is access to powerful hardware such as graphics processing units (GPUs), which makes training these models feasible. Their adaptability has also played a significant role: by fine-tuning a pre-trained model on a smaller dataset relevant to a particular purpose, developers can adapt it to a specific goal such as sentiment analysis or text classification. As a result, many NLP applications have been built that can be quickly adapted to particular tasks and use cases.
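As a concrete illustration of that adaptability, here is a minimal fine-tuning sketch using Hugging Face Transformers. The model (`distilbert-base-uncased`), dataset (SST-2), and hyperparameters are arbitrary choices for illustration, not anything prescribed by the work discussed below.

```python
# Minimal sketch: adapting a pre-trained model to sentiment analysis.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# A small task-specific dataset; SST-2 is used here purely as an example,
# and we take a subset to keep the sketch lightweight.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"].select(range(2000)),
    eval_dataset=dataset["validation"],
)
trainer.train()
```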
According to recent research, language models (LMs) learn better from context as their size increases. Prompting exploits this: it shows promising results in few-shot and zero-shot settings by letting a large LM receive run-time instructions through a descriptive natural language (NL) prompt, with good out-of-distribution (OOD) robustness. However, it is not always simple to write a descriptive prompt, particularly for tasks with abstract and fine-grained criteria. For example, it is difficult to describe a person's writing style in NL precisely enough to make an LM write in that style (e.g., William Shakespeare's style) unless the style is already widely known. The researchers propose the eXtensible Prompt (X-Prompt) to overcome the obstacles of expressing such fine-grained prompts. X-Prompt differs from NL prompts in that, alongside natural language, it introduces an extensible vocabulary of imaginary words, an interface that extends the descriptive power of prompts. As shown in Table 1, X-Prompt makes it simple to register an imaginary word that represents a particular person's style; this word can then be combined with different prompt contexts to instruct the LM to produce content in that person's style.
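To make the idea concrete, the snippet below is only a rough sketch of how an "imaginary word" could work, not the paper's actual implementation: a new token (the name `<shakespeare>` is our own invention) is registered in a GPT-2 tokenizer, and its embedding row is the only parameter allowed to receive gradient updates, so it can later be mixed freely with ordinary NL prompt text.

```python
# Sketch of the X-Prompt idea (our interpretation, not the official code):
# register an imaginary word as a new token and make its embedding row the
# only trainable parameter in the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register the imaginary word and grow the embedding matrix to hold it.
tokenizer.add_tokens(["<shakespeare>"])
model.resize_token_embeddings(len(tokenizer))

# Freeze everything; only the new token's row will keep its gradient.
for p in model.parameters():
    p.requires_grad = False
emb = model.get_input_embeddings()
emb.weight.requires_grad = True

new_id = tokenizer.convert_tokens_to_ids("<shakespeare>")

def mask_grads(grad):
    # Zero out gradients for every embedding row except the imaginary word.
    mask = torch.zeros_like(grad)
    mask[new_id] = 1.0
    return grad * mask

emb.weight.register_hook(mask_grads)

# The imaginary word composes with ordinary NL context at inference time.
prompt = "Write in the style of <shakespeare>: a poem about rainy Mondays."
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```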
They evaluate X-Prompt through a case study on style customization. They demonstrate that X-Prompt successfully combines the advantages of NL prompts and soft prompts, offering a potentially extensible interface for advanced interaction between people and large LMs, and show that it has strong descriptive power and high OOD robustness. To ensure that an X-Prompt is as OOD-robust as an NL prompt, they propose context-augmented learning (CAL), which steers imaginary words toward general usability instead of overfitting the in-distribution (ID) training data. They position X-Prompt as a versatile interface for prompting a large language model beyond natural language. Beyond the style customization studied in this work, X-Prompt can enhance in-context learning to handle more complex instructions for language model customization, opening the door to advanced interaction between humans and large language models (e.g., creative language generation, patching language models with knowledge of new entities and events, and detoxification and debiasing in language generation).
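The sketch below, continuing from the previous snippet (it reuses `model`, `tokenizer`, and `emb`), illustrates the spirit of context-augmented learning as we read it from the paper's description: the same imaginary word is trained inside many different prompt templates, so that its embedding captures the style itself rather than any single in-distribution prompt. The templates and the one-line corpus are invented placeholders, not the paper's data or training recipe.

```python
# Hedged sketch of context-augmented learning: vary the prompt context
# around the imaginary word during training so it generalizes to unseen
# (OOD) prompts. Reuses model/tokenizer/emb from the snippet above.
from torch.optim import AdamW

templates = [
    "Write in the style of <shakespeare>: {text}",
    "<shakespeare> would say: {text}",
    "Rewrite this as <shakespeare>: {text}",
]
style_corpus = ["Shall I compare thee to a summer's day?"]  # placeholder data

optimizer = AdamW([emb.weight], lr=1e-3)
model.train()

for epoch in range(3):
    for text in style_corpus:
        for template in templates:  # vary the surrounding context
            full = template.format(text=text)
            batch = tokenizer(full, return_tensors="pt")
            labels = batch["input_ids"].clone()
            loss = model(**batch, labels=labels).loss
            loss.backward()  # only the imaginary word's row gets gradient
            optimizer.step()
            optimizer.zero_grad()
```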
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don't forget to join our Reddit page, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.