Text-to-image models have taken the AI field by storm in recent months. Given only a text prompt, they can produce images that are difficult to distinguish from real ones, and they are quickly becoming an essential part of content generation.
Nowadays, it is possible to use these models to generate images for our own applications, say, web page design. We can simply take one of them, whether Midjourney, DALL-E, or Stable Diffusion, and ask it to generate the images we need.
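For instance, generating an image with an open-source model takes only a few lines of Python. The sketch below uses the Hugging Face diffusers library; the checkpoint ID and prompt are just examples.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained Stable Diffusion checkpoint (the model ID is an example).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate an image from a text prompt and save it for use in a page design.
image = pipe("a minimalist illustration for a landing page hero section").images[0]
image.save("hero.png")
```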
Let’s assume, for a second, that we are on the other side of the equation. Imagine that you are an artist who puts hours of hard work into a piece of digital art. You publish it on digital channels, making sure to file all the required copyright information so that your art cannot be stolen in any way. Then, the next day, you see one of these large-scale models generate an image that looks identical to your artwork. How would you react?
This is one of the overlooked problems of large-scale image generation models. The datasets used to train these models often include copyrighted materials, personal photographs, and artwork by individual artists. We need a way to remove such concepts and materials from large-scale models. But how can we do that without retraining the model from scratch? And what if we want to keep related concepts while removing only the copyrighted ones?
In response to these concerns, a team of researchers has proposed a method for the ablation or removal of specific concepts from text-conditioned diffusion models.
The proposed method modifies the images generated for a target concept so that they match a broader anchor concept, for example overwriting Star Wars’ R2-D2 with a generic robot, or Monet paintings with generic paintings. This is called concept ablation and is the key contribution of the paper.
The goal is to modify the model’s conditional distribution for a given target concept so that it matches the distribution defined by the anchor concept, effectively mapping the target concept onto a more generic version.
The authors propose two different ways of defining this target distribution, each leading to a different training objective. In the first, the model is fine-tuned to match its predictions between two text prompts that contain the target and the corresponding anchor concept; for example, "cute grumpy cat" is mapped to "cute cat". In the second objective, the target distribution is defined by modified text-image pairs, where prompts for the target concept are paired with images of the anchor concept; this maps "cute grumpy cat" to a random cat image.
These correspond to two ablation variants: model-based and noise-based. In the model-based approach, the anchor distribution is produced by the model itself when conditioned on the anchor concept. In noise-based ablation, the target prompt is instead paired with anchor-concept images, and the model is trained with the standard denoising objective of predicting the random noise added to those images.
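To make these two objectives more concrete, the sketch below shows roughly how such losses could be written for a diffusers-style latent diffusion model. This is not the authors' code: the function names, arguments, and exact loss forms are illustrative assumptions based on the description above.

```python
import torch
import torch.nn.functional as F

def model_based_loss(unet, frozen_unet, noisy_latents, timesteps,
                     target_emb, anchor_emb):
    """Model-based ablation (sketch): push the fine-tuned model's prediction
    for the target prompt (e.g. "cute grumpy cat") toward the frozen original
    model's prediction for the anchor prompt (e.g. "cute cat")."""
    with torch.no_grad():
        anchor_pred = frozen_unet(noisy_latents, timesteps,
                                  encoder_hidden_states=anchor_emb).sample
    target_pred = unet(noisy_latents, timesteps,
                       encoder_hidden_states=target_emb).sample
    return F.mse_loss(target_pred, anchor_pred)

def noise_based_loss(unet, anchor_latents, timesteps, target_emb, scheduler):
    """Noise-based ablation (sketch): pair the target prompt with latents of
    anchor-concept images (e.g. random cat photos) and train with the usual
    denoising objective of predicting the added noise."""
    noise = torch.randn_like(anchor_latents)
    noisy_latents = scheduler.add_noise(anchor_latents, noise, timesteps)
    pred = unet(noisy_latents, timesteps, encoder_hidden_states=target_emb).sample
    return F.mse_loss(pred, noise)
```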
The proposed concept ablation method is evaluated on 16 tasks, including specific object instances, artistic styles, and memorized images. It successfully removes the target concepts while minimally affecting closely related surrounding concepts that should be preserved. The method takes about five minutes per concept and is robust to misspellings in the text prompt.
In conclusion, this method presents a promising approach to address concerns about the use of copyrighted materials and personal photos in large-scale text-to-image models.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.