Effective robot operation requires more than blind obedience to predetermined commands. Robots must respond when something clearly deviates from the norm and must be able to infer important context from an incomplete instruction. Acting on partial or self-generated instructions requires reasoning grounded in a solid understanding of how things in the environment (objects, physics, other agents, etc.) behave. This kind of thinking and acting is a crucial component of embodied commonsense reasoning, which robots need in order to work and interact naturally in the real world.
Embodied commonsense reasoning has lagged behind work on embodied agents that follow explicit step-by-step instructions, because commonsense agents must learn to perceive and act without being told what to do. It can be studied through tasks such as tidying up, in which the agent must recognize objects that are out of place and take corrective action to return them to more appropriate locations. To do so, the agent must navigate and manipulate intelligently: searching likely locations for objects that need to be moved, recognizing when things are out of their natural places in the current scene, and determining where to reposition objects so that they end up where they belong. This challenge brings together commonsense reasoning about object placement and the physical capabilities expected of intelligent agents.
TIDEE is an embodied agent proposed by the research team that can tidy up rooms it has never seen before, without guidance. TIDEE is the first of its kind: it can scan a scene for objects that are not where they should be, determine where in the scene they belong, and then move them there.
TIDEE explores a home environment, detects objects that are out of place, infers plausible contexts for them, locates those contexts in the current scene, and moves the objects back to their proper places. Commonsense priors are encoded in three modules: i) visuo-semantic detectors that spot objects that are out of place; ii) an associative neural graph memory of objects and spatial relations that proposes plausible receptacles and surfaces for repositioning each object; and iii) a visual search network that guides the agent's exploration so it can efficiently locate the receptacle of interest in the current scene. The researchers put TIDEE to the test in the AI2-THOR simulation environment, having it tidy up disordered rooms. TIDEE completes the task directly from pixel and depth input without having seen the room before, using only priors learned from a separate collection of training houses. Based on human evaluations of the resulting changes in room organization, TIDEE outperforms ablated variants of the model that exclude one or more of these commonsense priors.
TIDEE can tidy rooms it has never seen, with no guidance and no prior exposure to the rooms or objects in question. It does this by exploring the area, detecting objects, and classifying them as in or out of place. When an object is out of place, TIDEE infers plausible receptacle categories by running graph inference over its scene graph and an external graph memory, then uses the spatial-semantic map of the scene, together with an image-based search network, to propose likely locations for those receptacle categories.
How does it work?
TIDEE tidies a room in three steps. First, it scans the area, running an out-of-place detector at each time step until a suspicious object is found; it then navigates to the object and picks it up. Second, TIDEE infers a likely receptacle for the object from its scene graph and the pooled external graph memory. If the chosen receptacle has not yet been observed, TIDEE uses a visual search network to guide its exploration toward locations where the receptacle is likely to be found. TIDEE keeps the estimated 3D centroids of previously detected objects in memory and uses them for object tracking and navigation.
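The three-step loop above can be sketched over a toy scene. Everything here is illustrative: the dictionaries stand in for TIDEE's learned detectors, graph memory, and search network, and none of the names come from the authors' codebase.

```python
# Toy scene: object -> receptacle it currently sits on, plus a learned
# prior mapping each object category to plausible receptacle categories.
scene = {"mug": "sofa", "pillow": "sofa", "remote": "coffee_table"}
prior = {"mug": ["counter", "cabinet"], "pillow": ["sofa", "bed"]}
known_receptacles = {"sofa", "coffee_table", "counter"}

def out_of_place(obj, receptacle):
    # Stage 1 stand-in for the visuo-semantic anomaly detector: an
    # object is "out of place" if its receptacle is not plausible.
    return receptacle not in prior.get(obj, [receptacle])

def propose_receptacle(obj):
    # Stage 2 stand-in for the neural graph module: pick the first
    # plausible receptacle already observed in the current scene.
    for r in prior.get(obj, []):
        if r in known_receptacles:
            return r
    return None  # Stage 3 would trigger the visual search network here

def tidy(scene):
    moves = {}
    for obj, rec in scene.items():
        if out_of_place(obj, rec):
            moves[obj] = propose_receptacle(obj)
    return moves

print(tidy(scene))  # {'mug': 'counter'} - the mug moves off the sofa
```

The pillow on the sofa and the remote on the coffee table are left alone, since those placements already match the prior; only the mug is flagged and rerouted.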
Visual features for each object are extracted with an off-the-shelf object detector, while relational language features are produced by feeding the 3D relations between objects (such as "next to", "supported by", "above", etc.) through a pretrained language model.
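Relation labels like these can be derived geometrically from detected 3D centroids before being handed to a language model. A minimal heuristic sketch follows; the thresholds and rules are assumptions for illustration, not the paper's exact procedure.

```python
# Label the spatial relation of object a with respect to object b from
# their 3D centroids. Thresholds (near, eps) are illustrative guesses.

def relation(a, b, near=1.0, eps=0.15):
    """a, b: (x, y, z) centroids, with y as the vertical axis."""
    dx, dy, dz = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    horiz = (dx ** 2 + dz ** 2) ** 0.5  # horizontal separation
    if horiz < eps and abs(dy) > eps:
        # Vertically aligned but at different heights.
        return "above" if dy < 0 else "below"
    if horiz < near:
        return "next to"
    return "far from"

print(relation((0, 1, 0), (0, 0, 0)))    # cup over a table -> "above"
print(relation((0.5, 0, 0), (0, 0, 0)))  # same level, close -> "next to"
```

Strings such as `"mug next to sink"` built from these labels are what a pretrained language model can then embed as relational features.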
TIDEE contains a neural graph module trained to propose plausible placements once an object has been picked up. The module combines three inputs: the object to be placed, a memory graph that encodes plausible contextual relations learned from the training scenes, and a scene graph that encodes the object-relation configuration of the current scene.
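At its core, such a module scores candidate receptacle nodes against the picked-up object's features. The toy version below uses hand-made embeddings and dot-product attention purely to show the shape of the computation; the real module learns its embeddings and graph structure.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def score_receptacles(obj_vec, candidates):
    # Dot-product attention between the object feature and each
    # candidate receptacle node pooled from scene + memory graphs.
    names = list(candidates)
    logits = [sum(o * c for o, c in zip(obj_vec, candidates[n]))
              for n in names]
    return dict(zip(names, softmax(logits)))

obj = [1.0, 0.0]  # hand-made feature for, say, a dirty mug
cands = {"sink": [0.9, 0.1], "bed": [0.0, 1.0]}
scores = score_receptacles(obj, cands)
best = max(scores, key=scores.get)
print(best)  # "sink" - the mug-like feature matches the sink node
```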
TIDEE employs a visual search network that, given the semantic obstacle map and a search category, predicts the probability that an object of that category is present at each spatial location on the map. The agent then searches the areas it believes are most likely to contain the target.
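Consuming such a prediction is straightforward: pick the highest-probability reachable cell on the 2D map and navigate there. The probability values and obstacle set below are made up for illustration; in TIDEE they would come from the search network and the semantic obstacle map.

```python
# Per-cell probabilities that the target category is at each location
# on a 3x3 top-down map (values are illustrative, not network output).
prob_map = [
    [0.05, 0.10, 0.02],
    [0.01, 0.60, 0.03],
    [0.04, 0.08, 0.07],
]
obstacles = {(1, 1)}  # cells blocked on the obstacle map

def best_search_cell(prob_map, obstacles):
    # Choose the most promising cell the agent can actually reach.
    cells = [
        ((r, c), p)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
        if (r, c) not in obstacles
    ]
    return max(cells, key=lambda cp: cp[1])[0]

print(best_search_cell(prob_map, obstacles))  # (0, 1)
```

Note that the 0.60 cell is skipped because it is blocked, so the agent heads for the best unblocked cell instead; a fuller version would search near high-probability obstacle cells rather than ignoring them.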
TIDEE has two limitations, both of which are natural directions for further research: it does not consider the open and closed states of objects, nor does it include their 3D pose as part of the disordering and reorganization process.
In addition, the disorder produced by carelessly scattering objects around a room may not be representative of real-life messes.
In a simplified version of the task, where the agent is allowed to observe the target state before the rearrangement begins, a stripped-down version of TIDEE far outperforms a top-performing method on a comparable room rearrangement benchmark.
Check out the Paper, Project, GitHub, and CMU blog for more details.
Dhanshree Shenwai is a computer engineer with solid experience in FinTech companies spanning the finance, cards & payments, and banking domains, and a strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier.