Prompt engineering is the process of crafting instructions that guide generative models. It is the key to unlocking the power of large models for both image generation and language tasks. Today, prompt engineering methods can be broadly classified into two categories.
- Hard prompting methods: These use handcrafted sequences of interpretable tokens to elicit desired model behaviors. Many good hard prompts are discovered through trial and error or pure intuition.
- Soft prompting methods: Soft prompts consist of continuous-valued language embeddings. They are neither interpretable nor transferable. Gradient-based optimizers and large datasets are used to produce high-performing soft prompts for specialized tasks (the contrast is sketched after this list).
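To make the distinction concrete, here is a minimal, illustrative PyTorch sketch; the prompt text, prompt length, and embedding dimension are assumptions for illustration, not values from the paper:

```python
import torch

# A hard prompt is an interpretable string of discrete tokens.
hard_prompt = "a watercolor painting of a mountain lake"

# A soft prompt is a continuous tensor in the model's embedding space,
# e.g. 8 learnable vectors of the encoder's embedding width (768 here is
# illustrative). It has no corresponding human-readable text.
soft_prompt = torch.randn(8, 768, requires_grad=True)

# Soft prompts are tuned directly by gradient descent on a task loss;
# hard prompts must instead be searched over a discrete vocabulary.
optimizer = torch.optim.Adam([soft_prompt], lr=0.1)
```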
Given the image on the left, a discrete text prompt is discovered using CLIP and fed to Stable Diffusion to generate new images (right).
Advantages of Hard Prompts
- Hard prompts can be mixed and mutated to perform a variety of tasks, while soft prompts are highly specialized.
- Hard prompts can be discovered using one model and deployed on another. This portability is impossible for soft prompts, because embedding dimensions and representation spaces differ across models.
- Hard prompts are the only option when a model is accessible solely through an API.
Hard Prompts Made Easy
Models like ChatGPT and Stable Diffusion operate on prompts, but crafting effective prompt text by hand is difficult. Soft prompts could exploit the models’ full capabilities, but their main problem is that humans cannot understand them.
Researchers at the University of Maryland and New York University have designed a prompt optimizer, PEZ, to discover good hard prompts. PEZ works with any language model or vision-language model. It can find interpretable prompts that describe the content of an image and, with minimal tuning, generate new images with the same content.
We can also concatenate prompts discovered from different images to generate new images that combine content from multiple sources.
The main advantage of PEZ is that we can extract a prompt and then modify it to create new images. For example, we can capture an artist’s drawing style, generate a prompt for it, and then change or add a word like “tiger” or “Paris” to generate new images in the same style.
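As a rough illustration of this workflow, the sketch below feeds a discovered prompt to Stable Diffusion via the Hugging Face diffusers library and then edits it. The discovered_prompt string is a placeholder standing in for an actual PEZ output, and the checkpoint is one common choice, not necessarily the one used by the authors:

```python
import torch
from diffusers import StableDiffusionPipeline

# Stand-in for a hard prompt distilled from a source image by PEZ.
discovered_prompt = "dreamy watercolor village hillside sketch"

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate an image from the discovered prompt as-is...
original = pipe(discovered_prompt).images[0]

# ...then edit it by appending a word to keep the style but change
# the subject, e.g. "tiger" or "Paris".
edited = pipe(discovered_prompt + " tiger").images[0]
```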
Optimizing Hard Prompts
One way to optimize hard prompts is to project a trained soft prompt onto its nearest token embeddings. However, this can hurt performance even when the soft prompt lies close to an actual token embedding. Alternatively, we can project the soft prompt at every iteration of gradient descent, but if the learning rate is too low, the optimizer may keep snapping back to the same hard prompt and learn nothing.
PEZ solves this problem by coupling the soft and hard prompts: the soft prompt is optimized using gradients computed at its nearest hard prompt. At the end, the soft prompt is projected onto the nearest token embeddings to obtain an efficient and interpretable hard prompt. This procedure is more reliable and requires far less engineering and tuning than previous hard prompt tuning methods.
The paper illustrates the full optimization algorithm in a figure (see the Paper linked below).
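Since the figure is not reproduced here, the following minimal PyTorch sketch captures the core loop under stated assumptions: loss_fn stands in for a task loss (e.g., negative CLIP similarity to a target image), Euclidean nearest-neighbor projection is used for simplicity, and details such as learning-rate schedules and batching from the official implementation are omitted:

```python
import torch

def project(soft, vocab_embeds):
    """Map each soft embedding to its nearest vocabulary embedding."""
    ids = torch.cdist(soft, vocab_embeds).argmin(dim=1)
    return ids, vocab_embeds[ids]

def pez_optimize(loss_fn, vocab_embeds, prompt_len=8, steps=1000, lr=0.1):
    # Initialize the soft prompt from random vocabulary embeddings.
    init = torch.randint(0, vocab_embeds.size(0), (prompt_len,))
    soft = vocab_embeds[init].clone().requires_grad_(True)
    opt = torch.optim.Adam([soft], lr=lr)

    for _ in range(steps):
        # Project to the nearest hard prompt and evaluate the loss there.
        _, hard = project(soft.detach(), vocab_embeds)
        hard.requires_grad_(True)
        loss = loss_fn(hard)
        (grad,) = torch.autograd.grad(loss, hard)

        # Apply the gradient computed at the hard prompt to the soft prompt.
        opt.zero_grad()
        soft.grad = grad
        opt.step()

    # A final projection yields the interpretable hard prompt.
    ids, _ = project(soft.detach(), vocab_embeds)
    return ids
```

The key design choice is that projection happens only when evaluating the loss: the soft prompt keeps accumulating updates in continuous space, so the optimizer can escape a bad hard prompt even when several consecutive projections land on the same tokens.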
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.