This paper was accepted to the Foundation Models in the Wild workshop at ICML 2024.
Diffusion models, which are trained to match the distribution of a training dataset, have emerged as the de facto approach to generating visual data. Beyond matching this distribution, we often also want to control generation to satisfy desired properties, such as alignment with a text description, which can be specified with a black-box reward function. Previous works fine-tune pre-trained diffusion models toward this goal with reinforcement learning-based algorithms, but these suffer from problems such as slow credit assignment and low quality of the generated samples. In this work, we explore techniques that do not directly maximize the reward but instead generate high-reward images with relatively high probability, a natural setting for the Generative Flow Network (GFlowNet) framework. To this end, we propose the Diffusion Alignment with GFlowNet (DAG) algorithm to post-train diffusion models with black-box property functions. Extensive experiments on Stable Diffusion and various reward specifications show that our method can effectively align large-scale text-to-image diffusion models with the given reward information.
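To make the core idea concrete, the sketch below illustrates one way to cast denoising as a GFlowNet trajectory and train it toward reward-proportional sampling. It is not the paper's implementation: the trajectory-balance objective, the toy two-dimensional "image", the fixed Gaussian forward/backward kernels, the step count, and the stand-in reward function are all illustrative assumptions; DAG's actual parameterization and objective are described in the paper.

```python
# Minimal sketch (assumptions noted above): treat the denoising chain x_T -> ... -> x_0
# as a GFlowNet trajectory whose terminal reward is a black-box R(x_0), and train the
# denoising policy with a trajectory-balance-style loss so p(x_0) tends toward R(x_0).

import torch
import torch.nn as nn

T = 10        # number of denoising steps (assumption)
DIM = 2       # toy "image" dimensionality (assumption)
SIGMA = 0.5   # fixed noise scale for forward/backward kernels (assumption)

class DenoisePolicy(nn.Module):
    """Predicts the mean of the next (less noisy) state from the current state and step."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(DIM + 1, 64), nn.SiLU(), nn.Linear(64, DIM))

    def forward(self, x, t):
        t_feat = torch.full((x.shape[0], 1), t / T)
        return self.net(torch.cat([x, t_feat], dim=-1))

def reward(x0):
    # Black-box reward stand-in: prefers samples near (1, 1). Purely illustrative.
    return torch.exp(-((x0 - 1.0) ** 2).sum(-1))

policy = DenoisePolicy()
log_z = nn.Parameter(torch.zeros(1))  # learned log-partition function (trajectory balance)
opt = torch.optim.Adam(list(policy.parameters()) + [log_z], lr=1e-3)

for step in range(200):
    x = torch.randn(64, DIM)           # x_T sampled from the prior
    log_pf = torch.zeros(64)           # running sum of forward (denoising) log-probs
    log_pb = torch.zeros(64)           # running sum of backward (noising) log-probs
    for t in reversed(range(T)):
        mean = policy(x, t)
        dist = torch.distributions.Normal(mean, SIGMA)
        x_next = dist.sample()
        log_pf = log_pf + dist.log_prob(x_next).sum(-1)
        # Fixed backward policy: a simple Gaussian noising kernel (assumption).
        back = torch.distributions.Normal(x_next, SIGMA)
        log_pb = log_pb + back.log_prob(x).sum(-1)
        x = x_next
    log_r = torch.log(reward(x) + 1e-8)
    # Trajectory balance: (log Z + sum log P_F - log R - sum log P_B)^2
    loss = ((log_z + log_pf - log_r - log_pb) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key contrast with reward-maximizing RL fine-tuning is visible in the loss: rather than pushing expected reward upward, it matches the trajectory probability to the terminal reward, so the trained sampler assigns higher probability to higher-reward images instead of collapsing onto a single reward-maximizing mode.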