Artificial intelligence has significantly advanced text-to-image generation in recent years. Turning written descriptions into images has many applications, from content creation to storytelling and assisting visually impaired people. Researchers, however, face two major obstacles: a lack of high-quality data and copyright concerns around datasets scraped from the Internet.
In recent research, a team proposed building an image dataset made up entirely of Creative Commons (CC) licensed images and using it to train open diffusion models that can compete with Stable Diffusion 2 (SD2). Doing so requires overcoming two obstacles:
- Lack of captions: Although high-resolution CC photos are openly licensed, they often lack the textual descriptions, or captions, needed to train a text-to-image generative model. Without captions, a model cannot learn to produce images from text input.
- Scarcity of CC photos: CC photos are an important resource, but they are far scarcer than the images in larger web-scraped datasets like LAION. This scarcity raises the question of whether there is enough data to train high-quality models.
To address the first challenge, the team used a transfer learning technique: a pre-trained model generates high-quality synthetic captions, which are then paired with a carefully curated selection of CC photographs. The method is simple and relies on a model's ability to generate text from images. The result is a dataset of curated photographs and synthetic captions that can be used to train generative models that translate words into images.
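The captioning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: `Photo`, `add_synthetic_captions`, and `toy_captioner` are hypothetical names, and `toy_captioner` is a stand-in for a real pre-trained image-to-text model, which the article does not name.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Photo:
    """A CC-licensed photo; the caption is often missing."""
    path: str
    caption: Optional[str] = None


def add_synthetic_captions(
    photos: Iterable[Photo],
    captioner: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build (image path, caption) training pairs.

    Photos that already have a caption keep it; the rest are
    captioned by the supplied model (here a plain callable).
    """
    pairs = []
    for photo in photos:
        caption = photo.caption if photo.caption else captioner(photo.path)
        pairs.append((photo.path, caption))
    return pairs


def toy_captioner(path: str) -> str:
    # Hypothetical placeholder: a real captioner would look at the pixels.
    # This one only derives a phrase from the file name.
    name = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return "a photo of " + name.replace("_", " ")


photos = [Photo("cc/red_barn.jpg"), Photo("cc/x.jpg", caption="a cat")]
pairs = add_synthetic_captions(photos, toy_captioner)
```

Passing the captioner in as a plain callable keeps the sketch independent of any particular model library; swapping in a real image-to-text model would only change that one argument.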
To address the second challenge, the team created a training recipe that is both compute-efficient and data-efficient. The aim is to match the quality of current SD2 models with far less data: roughly 70 million examples, or about 3% of the data originally used to train SD2. This suggests that there are enough CC photographs available to train high-quality models efficiently.
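As a quick sanity check on those two figures, the snippet below works out the training-set size they imply for SD2. Both inputs are the article's rounded numbers, so the result is only a rough estimate; the variable and function names are illustrative.

```python
# Rounded figures quoted in the article (both approximate).
CC_EXAMPLES = 70_000_000   # size of the CC training set
FRACTION_OF_SD2 = 0.03     # "about 3%" of the data used to train SD2


def implied_full_size(subset_size: float, fraction: float) -> float:
    """Size of the full dataset if subset_size is `fraction` of it."""
    return subset_size / fraction


implied = implied_full_size(CC_EXAMPLES, FRACTION_OF_SD2)
print(f"Implied SD2 training set: ~{implied / 1e9:.1f} billion examples")
```

The two numbers together imply an SD2 training set on the order of 2.3 billion examples.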
Using this data and the efficient training recipe, the team trained several text-to-image models. Together, these models are called the CommonCanvas family, and they mark a significant advance in the field of generative models. Their outputs are on par with SD2 in terms of quality.
The largest model in the CommonCanvas family, trained on a CC dataset less than 3% the size of the LAION dataset, achieves performance comparable to SD2 in human evaluations. Despite the smaller dataset and the use of synthetic captions, the method is effective at generating high-quality images.
The team has summarized its main contributions as follows.
- The team has used a transfer learning method called telephoning to produce high-quality captions for Creative Commons (CC) photos that did not initially have captions.
- They have provided a dataset called CommonCatalog that includes around 70 million CC photographs published under an open license.
- The CommonCatalog dataset is used to train a series of latent diffusion models (LDMs). Together, these models are called CommonCanvas, and they perform competitively with the SD2 baseline both qualitatively and quantitatively.
- The study applies a series of training optimizations, making the base SD2 model train almost three times faster.
- To encourage collaboration and further study, the team has made the trained CommonCanvas models, CC photos, synthetic captions, and CommonCatalog dataset freely available on GitHub.
Check out the Paper. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.