Learning data representations that transfer across tasks is a long-standing goal in machine learning. Reaching it, and measuring progress toward it, requires large amounts of controllable, realistic data for both training and evaluation. This matters most when assessing the robustness and fairness of deep neural networks, two properties that are as essential as raw accuracy for models deployed in practice. However, such data is difficult to obtain because of concerns over privacy, bias, and copyright infringement. Most publicly available image datasets also lack fine-grained metadata and are hard to edit beyond crude image augmentations.
Synthetic image data, in which every parameter affecting the generated scene is precisely controlled, makes the associated rich set of factor labels easy to retrieve. This makes it possible to assess a trained deep neural network's capabilities, including its robustness, in detail. Despite this potential, many existing synthetic image datasets are poorly suited to general image representation learning research because they lack realism and are typically limited in scope.
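To make the idea of precisely controlled factors concrete, here is a minimal sketch in plain Python. The factor names and values are hypothetical and are not taken from any PUG dataset or rendering API; the point is only that when every scene is specified by its parameters, the specification itself serves as the label.

```python
from itertools import product

# Hypothetical factors of variation; names and values are illustrative only.
FACTORS = {
    "object": ["elephant", "penguin"],
    "background": ["desert", "arctic", "forest"],
    "size": ["small", "large"],
}


def enumerate_scenes(factors):
    """Return one dict per combination of factor values."""
    names = list(factors)
    value_lists = [factors[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


scenes = enumerate_scenes(FACTORS)
print(len(scenes))  # number of distinct scene configurations
print(scenes[0])    # the configuration doubles as the ground-truth label
```

Each dictionary could be handed to a renderer and stored next to the resulting image, so no manual annotation step would be needed.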
To address this issue, researchers from Meta AI (FAIR), Mila-Quebec AI Institute, and Université de Montréal introduce a new collection of synthetic datasets called Photorealistic Unreal Graphics (PUG), built with the representation learning research community in mind and featuring images far more realistic than those currently available in the public domain. The environments were created with the Unreal Engine [EpicGames], which is known for its realism and widely used in the video game and entertainment industries. Alongside pre-rendered static image datasets, they also introduce the TorchMultiverse Python package, which provides a simple Python interface for generating controlled datasets from any given PUG environment. Using these tools, they contribute four new datasets and demonstrate their applicability to several research areas:
- PUG: Animals, for studying out-of-distribution (OOD) generalization and the representational space of foundation models.
- PUG: ImageNet, an additional robustness test set for ImageNet that covers a comprehensive set of factor changes, including pose, background, size, texture, and lighting.
- PUG: SPAR, for evaluating vision-language models. The researchers use it to show how synthetic data can address problems with existing benchmarks.
- PUG: AR4T, for fine-tuning vision-language models. The researchers show how well it complements PUG: SPAR.
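One way factor labels can support a robustness test is by breaking accuracy down per factor value. The sketch below assumes each evaluation record pairs a metadata dictionary with a correctness flag; the helper name, record layout, and toy records are invented for illustration and are not results or code from the paper.

```python
from collections import defaultdict


def accuracy_by_factor(records, factor):
    """Group (metadata, is_correct) records by one factor and return accuracy per value."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for metadata, is_correct in records:
        value = metadata[factor]
        totals[value] += 1
        hits[value] += int(is_correct)
    return {value: hits[value] / totals[value] for value in totals}


# Toy records, invented for illustration only.
records = [
    ({"background": "desert", "lighting": "day"}, True),
    ({"background": "desert", "lighting": "night"}, False),
    ({"background": "forest", "lighting": "day"}, True),
    ({"background": "forest", "lighting": "night"}, True),
]

print(accuracy_by_factor(records, "background"))
print(accuracy_by_factor(records, "lighting"))
```

A large gap between values of a single factor would point to the kind of sensitivity that an aggregate accuracy number hides.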
Together, the PUG datasets set a new bar for controllability and photorealism in synthetic image data.
Dhanshree Shenwai is a Computer Science Engineer with experience at FinTech companies covering the financial, cards and payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier.