Due to recent technological advances and their remarkable performance, Large Language Models (LLMs) have now been widely adopted by the general population. For example, tools like ChatGPT can compose long responses to user-provided questions, which can help authors and other writers improve their writing. Meanwhile, generative models like Stable Diffusion (available through Hugging Face) can generate stunning images directly from user input. Although these models produce a variety of outputs, they all share one characteristic in common: they all use text prompts as model input.
Researchers have found that writing quality prompts is one of the fastest and most effective ways to produce better results. This is where prompt engineering comes into play. In natural language processing, prompt engineering refers to the process of crafting text inputs that yield preferable or useful results. For text-to-image generation, these methods modify the text input by incorporating suggestions for an artistic style or design elements, such as lighting. For example, a better prompt might be “a deserted city with empty buildings, greenery, high definition, high quality, photorealistic, ultra-realistic, 4k” instead of “a deserted city with empty buildings.”
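The idea above can be sketched in a few lines of code. This is a minimal illustration of manual prompt augmentation, not part of the LMOps toolkit; the modifier list is taken from the example in this article.

```python
def augment_prompt(base_prompt: str, modifiers: list[str]) -> str:
    """Append comma-separated style/quality modifiers to a base prompt."""
    return ", ".join([base_prompt] + modifiers)


# Style and quality modifiers from the example above.
modifiers = [
    "greenery", "high definition", "high quality",
    "photorealistic", "ultra-realistic", "4k",
]
print(augment_prompt("a deserted city with empty buildings", modifiers))
```

In practice, prompt optimization tools such as Promptist learn which modifiers to add rather than relying on a fixed hand-written list.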
Today, a lot of time and effort is spent creating prompt engineering tools that can be used to craft better text prompts as model inputs. As an important step in this direction, Microsoft Research recently released LMOps, a suite of tools for improving text prompts used as input to generative AI models. LMOps is a research initiative aimed at developing foundation models for AI products, focusing on the underlying technology that provides AI capabilities with LLMs and generative AI models. The toolkit includes Promptist, a prompt optimizer that rewrites user text input for text-to-image generation, and Structured Prompting, a method for including more examples in a few-shot learning prompt for text generation.
Microsoft researchers also worked on developing a language model for automatically optimizing text prompts for text-to-image generation. This language model is trained primarily with reinforcement learning. On this front, the team first used supervised learning on a manually optimized set of prompts to fine-tune a pretrained language model. After that, reinforcement learning was used to further train the model. Since reinforcement learning requires a reward function, the team used the rewritten prompts as input to the text-to-image generator and evaluated the generated images for “relevance and aesthetics” using CLIP. The final model was manually evaluated by a team of researchers who, in most cases, preferred the images produced from the optimized prompt to those produced from the original prompt.
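The reward described above can be sketched as a weighted combination of a relevance score and an aesthetic score for a generated image. The scorer functions and the weighting below are stand-ins for illustration only; the actual reward in the paper is computed from CLIP-based models, not these stubs.

```python
def reward(relevance_score: float, aesthetic_score: float,
           rel_weight: float = 0.5) -> float:
    """Combine relevance and aesthetics into a single scalar reward.

    Both inputs are assumed to be normalized to [0, 1]; the weighting
    scheme here is a hypothetical example, not the paper's formula.
    """
    return rel_weight * relevance_score + (1 - rel_weight) * aesthetic_score


# Example: an image that matches the prompt well (0.9) but is only
# moderately aesthetic (0.6) under an equal weighting.
r = reward(relevance_score=0.9, aesthetic_score=0.6)
print(r)
```

During RL training, this scalar would be fed back to the policy (the prompt-rewriting language model) to reinforce rewrites that lead to better images.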
Microsoft researchers also addressed one of the main limitations of LLM inputs: the longest prompt an LLM can handle is usually in the range of a few thousand tokens. This restriction is overcome with Microsoft’s Structured Prompting, which supports hundreds of examples. To achieve this, the examples are first split into groups, and each group is fed to the model independently. The key and value vectors of the model’s attention layers are cached for each group. These cached attention vectors are then attended to when the user’s actual input is passed to the model. This new approach introduced by the researchers outperforms the conventional method on several NLP tasks.
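The grouping step above can be sketched as follows. This is a simplified illustration of how demonstration examples might be partitioned before each group is encoded separately (so its attention key/value cache can be reused); the group size and example format are arbitrary choices, not the paper's settings.

```python
def group_examples(examples: list[str], group_size: int) -> list[list[str]]:
    """Split a list of few-shot demonstration examples into fixed-size groups.

    Each group would be encoded by the model on its own, staying within the
    context limit, and its attention key/value vectors cached for later reuse.
    """
    return [examples[i:i + group_size]
            for i in range(0, len(examples), group_size)]


# Hypothetical demonstrations: far more than would fit in one prompt.
demos = [f"Q: question {i}\nA: answer {i}" for i in range(10)]
groups = group_examples(demos, group_size=4)
# Each group is encoded once and its K/V cache stored; the test input then
# attends over all cached vectors as if every example were in context.
print([len(g) for g in groups])
```

The key point is that no single forward pass ever sees all the examples at once, yet the final prediction can still condition on all of them through the cached attention states.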
The toolkit is currently under active development to incorporate more features for prompt optimization.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 14k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of machine learning, natural language processing, and web development. She likes to learn more about the technical field by participating in various challenges.