NTU researchers recently introduced the Segment Any RGBD (SAD) toolkit, which applies SAM to rendered depth images. SAD can segment any 3D object from RGBD inputs, or even from generated depth images alone.
The rendered depth image is then sent to SAM, since the researchers observed that people can easily recognize objects from a depth map visualization. This is achieved by first mapping the depth map ([H, W]) to RGB space ([H, W, 3]) via a colormap function. Compared with the RGB image, the rendered depth image emphasizes geometry over texture. In earlier SAM-based projects such as SSA, Anything-3D, and SAM 3D, the inputs are all RGB images; the researchers are the first to use SAM to extract geometric information directly.
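A minimal sketch of this depth-to-RGB rendering step is shown below, assuming matplotlib for the colormap and the public `segment-anything` package for SAM; the checkpoint path, input file, and colormap choice are placeholders, not the toolkit's exact code.

```python
import numpy as np
import matplotlib.cm as cm
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def render_depth_to_rgb(depth: np.ndarray, colormap: str = "viridis") -> np.ndarray:
    """Map a depth map of shape [H, W] to an RGB image of shape [H, W, 3]."""
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)  # normalize to [0, 1]
    rgb = cm.get_cmap(colormap)(d)[..., :3]                         # apply colormap, drop alpha
    return (rgb * 255).astype(np.uint8)

# Load SAM and generate masks on the rendered depth image.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # hypothetical local path
mask_generator = SamAutomaticMaskGenerator(sam)

depth = np.load("depth.npy")                 # hypothetical [H, W] depth map
depth_rgb = render_depth_to_rgb(depth)
masks = mask_generator.generate(depth_rgb)   # list of dicts with "segmentation", "area", ...
```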
For zero-shot semantic segmentation, the researchers use OVSeg. Users can choose either the raw RGB image or the rendered depth image as input to SAM. In either case, they can retrieve the semantic masks (where each color represents a different class) and the SAM masks associated with each class.
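One simple way to associate SAM masks with classes is a majority vote over a per-pixel semantic map from the zero-shot segmenter. The sketch below assumes such a `semantic_map` array is available; the variable names and voting rule are illustrative assumptions, not the toolkit's exact implementation.

```python
import numpy as np

def label_sam_masks(sam_masks: list[dict], semantic_map: np.ndarray) -> list[dict]:
    """Assign each SAM mask the most frequent class inside its region."""
    labeled = []
    for m in sam_masks:
        seg = m["segmentation"]                    # boolean [H, W] mask from SAM
        if not seg.any():                          # skip empty masks
            continue
        classes, counts = np.unique(semantic_map[seg], return_counts=True)
        labeled.append(dict(m, class_id=int(classes[np.argmax(counts)])))
    return labeled

# Example: collect all SAM masks predicted as a given class.
# labeled = label_sam_masks(masks, semantic_map)
# chair_masks = [m for m in labeled if m["class_id"] == class_names.index("chair")]
```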
Results
Since texture information is more prominent in RGB images while depth images mainly carry geometry, the RGB images appear sharper and more colorful than their rendered depth counterparts. As the accompanying diagram shows, SAM produces a wider variety of masks for the RGB input than for the depth input.
The rendered depth image also reduces SAM's over-segmentation. In the accompanying illustration, for example, the table is split into four segments in the RGB image, one of which is misclassified as a chair; in the depth image, the table is correctly segmented and classified as a whole. The blue circles highlight regions that are misclassified as walls in the RGB image but are correctly identified in the depth image.
The chair circled in red in the depth image may actually be two chairs placed so close together that they are treated as a single object; here the texture information from the RGB image is crucial for telling the items apart.
Repo and Tool
Visit https://huggingface.co/spaces/jcenaa/Segment-Any-RGBD to view the repository.
The repository is open source and built on OVSeg, which is distributed under the Creative Commons Attribution-NonCommercial 4.0 International License. However, certain parts of the project are covered by different licenses: CLIP and ZSSEG are both under the MIT License.
The tool itself can be tried at https://huggingface.co/spaces/jcenaa/Segment-Any-RGBD.
Running the demo requires a GPU; you can get one by duplicating the Space and upgrading its settings to use GPU hardware instead of waiting in the queue. Rendering the frame, computing the SAM segments, computing the zero-shot semantic segments, and generating the 3D outputs all take a significant amount of time, so the final results arrive in roughly 2 to 5 minutes.
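The duplication step can also be done from a script with the `huggingface_hub` client, as sketched below; the hardware tier ("t4-small") and token handling are assumptions, and the same steps are available from the Space's web UI.

```python
from huggingface_hub import duplicate_space, HfApi

# Copy the public Space into your own account, then request GPU hardware (billed to you).
repo = duplicate_space("jcenaa/Segment-Any-RGBD")
HfApi().request_space_hardware(repo_id=repo.repo_id, hardware="t4-small")
```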
Check out the code and repo. Don't forget to join our 20k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more. If you have any questions about the article above or if we missed anything, feel free to email us at [email protected]
Dhanshree Shenwai is a computer engineer with experience in FinTech companies covering the finance, cards & payments, and banking domains, and a strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements in today's changing world to make everyone's life easier.