We’re all in awe of recent advances in generative AI, but that doesn’t mean significant strides aren’t being made in other areas as well. Computer vision, for example, has also seen rapid progress recently. Meta’s Segment Anything Model (SAM) release was a huge success and completely changed the game in 2D image segmentation.
In image segmentation, the goal is to detect and “paint” all objects in a scene. Typically, this is done by training a model on a dataset of the objects we want to segment; the model can then segment those same objects in new images. The main problem with this approach is that the model is limited to the objects it was shown during training: it cannot segment unseen object categories.
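To make that limitation concrete, here is a minimal sketch using torchvision’s pretrained Mask R-CNN as a stand-in for any conventional, closed-vocabulary segmentation model (this is an illustration, not a model from the work discussed here):

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# A conventional instance-segmentation model: it can only predict the fixed
# set of categories (here, the COCO classes) it saw during training.
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)  # stand-in for a real RGB image in [0, 1]

with torch.no_grad():
    pred = model([image])[0]

# Every detection is forced into the training vocabulary; an object category
# the model never saw simply cannot be segmented.
print(pred["labels"], pred["masks"].shape)
```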
With SAM, this changes. SAM is the first model that can segment anything, literally. This is achieved by training SAM on large-scale data and giving it the ability to perform promptable, zero-shot segmentation across a wide variety of image styles. It is designed to segment objects of interest in images regardless of their shape, size, or appearance. SAM has shown remarkable performance in 2D image segmentation, revolutionizing the field of computer vision.
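For a feel of what promptable segmentation looks like in practice, here is a short sketch using Meta’s official segment_anything package; the checkpoint path assumes the released ViT-H weights have been downloaded locally, and the image is a dummy stand-in:

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the released ViT-H SAM checkpoint (assumed downloaded beforehand).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)

# A single foreground click is enough to prompt a mask for any object,
# with no category list baked in at training time.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),  # 1 = foreground point
    multimask_output=True,       # return several candidate masks
)
```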
Of course, people didn’t just stop there. They began working on ways to extend SAM’s capabilities beyond 2D. However, one key question remained unanswered: Can SAM’s segmentation ability be extended to 3D, bridging the gap between 2D and 3D perception caused by the scarcity of 3D data? The answer seems to be yes, and it’s time to meet SA3D.
SA3D leverages advances in Neural Radiance Fields (NeRF) and the SAM model to revolutionize 3D segmentation. NeRF has become one of the most popular neural 3D representations in recent years. NeRF builds connections between sparse 2D images and real 3D points through differentiable volume rendering. It has undergone numerous improvements, making it a powerful tool for addressing the challenges of 3D perception.
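The differentiable volume rendering at the heart of NeRF can be written in a few lines. The sketch below is a simplified, single-ray version of the standard NeRF compositing equation (sample densities and colors would normally come from the network):

```python
import torch

def composite_along_ray(densities, colors, deltas):
    """NeRF-style differentiable volume rendering for a single ray.

    densities: (N,) non-negative density sigma at each sample
    colors:    (N, 3) RGB predicted at each sample
    deltas:    (N,) spacing between adjacent samples
    """
    alphas = 1.0 - torch.exp(-densities * deltas)   # per-sample opacity
    # Transmittance: probability the ray reaches sample i without being blocked.
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=0)
    trans = torch.cat([torch.ones(1), trans[:-1]])
    weights = alphas * trans                        # contribution of each sample
    rgb = (weights[:, None] * colors).sum(dim=0)    # composited pixel color
    return rgb, weights

# Toy usage: 64 random samples along one ray.
rgb, weights = composite_along_ray(
    torch.rand(64), torch.rand(64, 3), torch.full((64,), 0.03)
)
```

Because every step is differentiable, gradients from a 2D photometric loss flow back to the 3D points, which is exactly the 2D-to-3D connection mentioned above.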
There have been attempts to extend NeRF-based techniques to 3D segmentation. These approaches involve training an additional feature field aligned with a pretrained 2D visual backbone. While effective, these methods suffer from limitations such as high memory consumption, artifacts in the radiance field that degrade the feature field, and inefficiency due to the need to train a separate feature field for each scene.
This is where SA3D comes into play. Unlike previous methods, SA3D does not require training an additional feature field. Instead, it leverages the power of SAM and NeRF to automatically segment the desired objects from all views.
SA3D works by taking user-specified prompts on a single rendered view to start the segmentation process. The SAM-generated segmentation masks are then projected onto 3D mask grids using density-guided inverse rendering, providing initial 3D segmentation results. To refine the segmentation, incomplete 2D masks rendered from other views are processed and used as automatic cross-view prompts. These are fed into SAM to generate refined masks, which are again projected onto the 3D mask grids. Iterating this process yields complete 3D segmentation results.
Overview of how SA3D works. Source: https://arxiv.org/abs/2304.12308
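The density-guided inverse rendering step can be pictured as “splatting” a 2D mask back into a 3D voxel grid, weighted by the NeRF’s rendering weights so that votes concentrate near the object surface. The sketch below is a simplified toy with hypothetical inputs (the sample positions and weights would come from the pretrained NeRF), not the authors’ implementation:

```python
import torch

def splat_mask_to_grid(mask2d, ray_points, ray_weights, grid, grid_min, voxel_size):
    """Toy density-guided inverse rendering: project a 2D mask into a cubic
    3D mask grid, weighting each ray sample by its volume-rendering weight.

    mask2d:      (H, W) binary SAM mask for one view
    ray_points:  (H, W, N, 3) world-space sample positions along each pixel ray
    ray_weights: (H, W, N) rendering weights from the pretrained NeRF
    grid:        (G, G, G) accumulated 3D mask confidences
    """
    votes = mask2d[..., None] * ray_weights                  # (H, W, N) per-sample vote
    idx = ((ray_points - grid_min) / voxel_size).long()      # voxel index per sample
    idx = idx.clamp(0, grid.shape[0] - 1)                    # cubic grid assumed
    flat = (idx[..., 0] * grid.shape[1] + idx[..., 1]) * grid.shape[2] + idx[..., 2]
    grid.view(-1).index_add_(0, flat.view(-1), votes.view(-1))
    return grid

# Toy usage with random stand-ins for the NeRF outputs and a SAM mask.
H, W, N, G = 8, 8, 16, 32
grid = splat_mask_to_grid(
    (torch.rand(H, W) > 0.5).float(),            # stand-in SAM mask
    torch.rand(H, W, N, 3) * 2 - 1,              # sample positions in [-1, 1]^3
    torch.softmax(torch.rand(H, W, N), dim=-1),  # stand-in rendering weights
    torch.zeros(G, G, G), grid_min=-1.0, voxel_size=2.0 / G,
)
```

Rendering this grid from a new viewpoint then gives the incomplete 2D mask that serves as the cross-view prompt for the next SAM round.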
SA3D offers several advantages over previous approaches. It can be adapted to any pretrained NeRF model without changes or retraining, making it highly compatible and adaptable. The entire segmentation process with SA3D is efficient, taking around two minutes without any engineering optimization. This speed makes SA3D a practical solution for real-world applications. Furthermore, experimental results have shown that SA3D can generate fine-grained segmentation results for various types of 3D objects, opening up new possibilities for applications such as robotics, augmented reality, and virtual reality.
Check out the Paper, Project, and GitHub link. Don’t forget to join our 21k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, exciting AI projects, and more. If you have any questions about the article above or if we missed anything, feel free to email us at asif@marktechpost.com
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.