A recurring lesson in recent AI progress has been the importance of scale in driving advances across domains. Large models have demonstrated remarkable capabilities in language comprehension and generation, representation learning, multimodal tasks, and image generation. As the number of learnable parameters grows, modern neural networks consume vast amounts of data, and their capabilities have improved dramatically as a result.
GPT-2, for example, broke data barriers a few years ago by consuming approximately 30 billion language tokens, and it showed promising zero-shot results on NLP benchmarks. Newer models such as Chinchilla and LLaMA, trained on trillions of web-crawled tokens, have since surpassed it comfortably in both benchmark scores and capabilities. In computer vision, ImageNet, with its roughly 1 million images, was long the gold standard for representation learning; but as web crawling scaled datasets to billions of images, collections like LAION-5B have produced far more powerful visual representations, as seen with models like CLIP. This shift from manually assembling datasets to collecting them from diverse web sources has been key to scaling from millions to billions of data points.
While language and image data have scaled dramatically, other areas, such as 3D computer vision, have yet to catch up. Tasks such as 3D object generation and reconstruction still rely on small, hand-crafted datasets. ShapeNet, for example, depends on professional 3D designers using expensive software to create assets, making the process hard to crowdsource and scale. This data scarcity has become a bottleneck for learning-based methods in 3D computer vision: 3D object generation still lags far behind 2D image generation and often leans on models trained on large 2D datasets rather than being trained from scratch on 3D data. The growing demand for augmented reality (AR) and virtual reality (VR) technologies further underscores the urgent need to scale up 3D data.
To address these limitations, researchers at the Allen Institute for AI, the University of Washington, Seattle, Columbia University, Stability AI, Caltech, and LAION present Objaverse-XL, a large-scale web-crawled dataset of 3D assets. Rapid advances in 3D authoring tools, together with the growing availability of 3D data on the Internet through platforms such as GitHub, Sketchfab, Thingiverse, and Polycam, and specialized sources like the Smithsonian Institution, made Objaverse-XL possible. The dataset offers significantly wider variety and higher quality than previous efforts such as Objaverse 1.0 and ShapeNet: with over 10 million 3D objects, it is roughly an order of magnitude larger than Objaverse 1.0 and two orders of magnitude larger than ShapeNet.
The scale and diversity of Objaverse-XL have significantly expanded what next-generation 3D models can do. In particular, Zero123-XL, pre-trained on Objaverse-XL, demonstrates remarkable zero-shot generalization on challenging and complex inputs. It performs exceptionally well at tasks like novel view synthesis, even for diverse inputs such as photorealistic assets, cartoons, drawings, and sketches. Similarly, PixelNeRF, which is trained to synthesize novel views from a small set of images, improves markedly when trained on Objaverse-XL. Scaling pre-training data from 1,000 assets to 10 million yields consistent improvements, highlighting the promise and opportunities that web-scale data brings.
The implications of Objaverse-XL extend beyond 3D models themselves; its potential applications span computer vision, graphics, augmented reality, and generative AI. Reconstructing 3D objects from images has long been a challenge in computer vision and computer graphics, and existing methods have explored various differentiable rendering techniques and network architectures to predict 3D shapes and textures from images. However, these methods have mostly relied on small datasets such as ShapeNet. With the far larger Objaverse-XL, new levels of performance and zero-shot generalization become achievable.
The emergence of 3D generative AI is an equally exciting development. Models such as MCC, DreamFusion, and Magic3D have shown that 3D shapes can be generated from images or language prompts, often with the help of text-to-image models. Objaverse-XL also opens up opportunities for text-to-3D generation, enabling further advances in that direction. By leveraging this vast and diverse dataset, researchers can explore novel applications and push the boundaries of generative AI in the 3D domain.
The release of Objaverse-XL marks a significant milestone for 3D datasets. Its size, diversity, and suitability for large-scale training hold promise for advancing research and applications in 3D understanding. Although Objaverse-XL is still smaller than billion-scale image and text datasets, its introduction paves the way for further work on scaling 3D datasets and simplifying the capture and creation of 3D content. Future work may also focus on selecting optimal data points for training and on extending Objaverse-XL to benefit discriminative tasks such as 3D segmentation and detection.
In conclusion, the introduction of Objaverse-XL as a massive 3D dataset sets the stage for exciting new possibilities in computer vision, graphics, augmented reality, and generative AI. By addressing the limitations of previous datasets, Objaverse-XL provides a foundation for large-scale training and opens avenues for innovative research and applications in the 3D domain.
Check out the Paper. All credit for this research goes to the researchers of this project.
Niharika is a technical consulting intern at Marktechpost. She is a third-year student currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a strong interest in machine learning, data science, and artificial intelligence, and an avid reader of the latest developments in these fields.