CtrlSynth: Controllable image and text synthesis for data-efficient multimodal learning
Pre-training of robust or multimodal baseline vision models (e.g., CLIP) relies on large-scale data sets that can be noisy, potentially ...
Pre-training of robust or multimodal baseline vision models (e.g., CLIP) relies on large-scale data sets that can be noisy, potentially ...
A major challenge in ai research is how to develop models that can balance fast, intuitive reasoning with slower, more ...
Controllable learning (CL) is emerging as a crucial component of reliable machine learning. It emphasizes ensuring that learning models meet ...
This paper was accepted into the NeurIPS 2023 workshop on diffusion models. We demonstrate how conditional generation from diffusion models ...