When a car is traveling down a narrow city street, reflections from shiny paintwork or the side mirrors of parked vehicles can help the driver catch a glimpse of things that are otherwise hidden from view, such as a child playing on the sidewalk behind parked cars.
Building on this idea, researchers at MIT and Rice University have created a machine vision technique that harnesses reflections to create images of the world. Their method uses reflections to turn shiny objects into “cameras,” allowing the user to view the world as if looking through the “lenses” of everyday objects such as a ceramic coffee mug or a metal paperweight.
Using images of an object taken from different angles, the technique turns that object’s surface into a virtual sensor that captures reflections. The AI system maps these reflections in a way that allows it to estimate depth in the scene and capture novel views that would only be visible from the object’s perspective. This technique could be used to see around corners or past objects that are blocking the observer’s view.
This method could be especially useful in autonomous vehicles. For example, it could allow a self-driving car to use reflections from objects it passes, such as utility poles or buildings, to see around a parked truck.
“We have shown that any surface can be turned into a sensor with this formulation that turns objects into virtual pixels and virtual sensors. This can be applied in many different areas,” says Kushagra Tiwary, a graduate student in the Camera Culture Group at the Media Lab and co-lead author of a paper about this research.
Tiwary is joined on the paper by co-lead author Akshat Dave, a graduate student at Rice University; Nikhil Behari, an MIT research support associate; Tzofi Klinghoffer, an MIT graduate student; Ashok Veeraraghavan, professor of electrical and computer engineering at Rice University; and senior author Ramesh Raskar, associate professor of media arts and sciences and leader of the Camera Culture Group at MIT. The research will be presented at the Conference on Computer Vision and Pattern Recognition.
Pondering reflections
The heroes of television crime shows often “zoom and enhance” surveillance footage to capture reflections, perhaps those seen in a suspect’s sunglasses, that help them solve a crime.
“In real life, exploiting these reflections isn’t as easy as pressing an enhance button. Getting useful information out of these reflections is quite difficult because reflections give us a distorted view of the world,” says Dave.
This distortion depends on the shape of the object and the world that object reflects, both of which researchers may have incomplete information about. In addition, the shiny object may have its own color and texture that mixes with the reflections. And reflections are two-dimensional projections of a three-dimensional world, which makes it hard to judge depth in mirrored scenes.
The researchers found a way to overcome these challenges. Their technique, known as ORCa (which stands for Objects as Radiance-Field Cameras), works in three steps. First, they take pictures of an object from many vantage points, capturing multiple reflections in the shiny object.
Then, for each image from the real camera, ORCa uses machine learning to turn the object’s surface into a virtual sensor that captures the light and reflections that hit each virtual pixel on the object’s surface. Finally, the system uses virtual pixels on the object’s surface to model the 3D environment from the object’s point of view.
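To make the “virtual pixel” idea in step two concrete: each point on the shiny surface, together with its surface normal, redirects the ray arriving from the real camera back out into the scene according to the law of reflection. The short NumPy sketch below is our own illustration of that geometry, assuming a known sphere as the shiny object; the function names are hypothetical, and the actual system must also estimate the object’s shape rather than assume it.

```python
import numpy as np

def reflect(d, n):
    """Mirror-reflect incoming unit direction d about unit surface normal n
    (law of reflection: r = d - 2 (d . n) n)."""
    return d - 2.0 * np.dot(d, n) * n

def virtual_pixel_ray(cam_pos, surface_pt, sphere_center):
    """Treat one point on a shiny sphere as a virtual pixel: return the origin
    and direction of the reflected ray that samples the surrounding scene."""
    d = surface_pt - cam_pos
    d = d / np.linalg.norm(d)         # ray from the real camera to the surface
    n = surface_pt - sphere_center
    n = n / np.linalg.norm(n)         # outward normal of the sphere at that point
    return surface_pt, reflect(d, n)  # virtual pixel: its position and view direction

# Example: a real camera at the origin imaging a unit sphere centered at (0, 0, 3).
point = np.array([0.3, 0.0, 3.0 - np.sqrt(1.0 - 0.3**2)])  # a point on the sphere
origin, direction = virtual_pixel_ray(np.array([0.0, 0.0, 0.0]), point,
                                      np.array([0.0, 0.0, 3.0]))
print(origin, direction)
```

Repeating this for every imaged surface point, across many real camera positions, yields the bundle of virtual rays from which the environment can then be modeled.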
Catching rays
Imaging the object from many angles allows ORCa to capture multi-view reflections, which the system uses to estimate the depth between the shiny object and other objects in the scene, as well as to estimate the shape of the shiny object itself. ORCa models the scene as a 5D radiance field, which captures additional information about the intensity and direction of the light rays that emanate from and strike each point in the scene.
The additional information contained in this 5D radiance field also helps ORCa accurately estimate depth. And because the scene is represented as a 5D radiance field rather than a 2D image, the user can see hidden features that would otherwise be blocked by corners or obstructions.
In fact, once ORCa has captured this 5D radiance field, the user can place a virtual camera anywhere in the scene and synthesize what that camera would see, Dave explains. The user could also insert virtual objects into the environment or change an object’s appearance, say from ceramic to metallic.
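As a rough mental model of that capability: a 5D radiance field is a function from a 3D position and a 2D viewing direction to a color, and synthesizing a virtual camera amounts to querying that function along one ray per pixel. The sketch below is our own toy illustration of that interface, with an arbitrary analytic function standing in for the learned field; none of the names come from the paper.

```python
import numpy as np

def radiance_field(pos, direction):
    """Toy stand-in for a learned 5D radiance field: maps a 3D position plus a
    unit view direction (2 degrees of freedom) to an RGB radiance value."""
    return 0.5 + 0.5 * np.sin(pos.sum() + 3.0 * direction)

def render_virtual_camera(field, cam_pos, look_at, width=64, height=64, fov_deg=60.0):
    """Synthesize the view of an arbitrary virtual camera by casting one ray per
    pixel and querying the radiance field for the light arriving along it.
    Assumes the camera is not pointed straight up or down."""
    forward = look_at - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(np.array([0.0, 1.0, 0.0]), forward)  # world-up x forward
    right = right / np.linalg.norm(right)
    up = np.cross(forward, right)
    half = np.tan(np.radians(fov_deg) / 2.0)
    image = np.zeros((height, width, 3))
    for row in range(height):
        for col in range(width):
            u = (2.0 * (col + 0.5) / width - 1.0) * half   # horizontal image-plane offset
            v = (1.0 - 2.0 * (row + 0.5) / height) * half  # vertical image-plane offset
            d = forward + u * right + v * up
            image[row, col] = field(cam_pos, d / np.linalg.norm(d))
    return image

# Place a hypothetical virtual camera and render what it would see.
view = render_virtual_camera(radiance_field,
                             cam_pos=np.array([0.0, 0.0, -2.0]),
                             look_at=np.zeros(3))
print(view.shape)  # (64, 64, 3)
```

In ORCa itself, the field is learned from the multi-view reflections captured on the object’s surface, which is what lets a virtual camera reveal parts of the scene the real camera never saw directly.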
“It was especially challenging to go from a 2D image to a 5D environment. You have to make sure the mapping works and is physically accurate, so it is based on how light travels in space and how light interacts with the environment. We spent a lot of time thinking about how we can model a surface,” says Tiwary.
Accurate estimates
The researchers evaluated their technique by comparing it with other methods that model reflections, which is a slightly different task than the one ORCa performs. Their method performed well at separating an object’s true color from its reflections, and it outperformed the baselines in extracting more accurate object geometry and textures.
They compared the system’s depth estimates with simulated ground-truth data on the actual distance between objects in the scene and found ORCa’s predictions to be reliable.
“Consistently, with ORCa, it not only estimates the environment accurately as a 5D image, but to achieve that, in the intermediate steps, it also does a good job of estimating the shape of the object and separating the reflections from the object’s texture,” Dave says.
Building on this proof of concept, the researchers want to apply this technique to drone images. ORCa could use weak reflections from objects a drone is flying over to reconstruct a scene from the ground. They also want to improve ORCa so that it can use other signals, such as shadows, to reconstruct hidden information or combine reflections from two objects to image new parts of a scene.
“Estimating specular highlights is really important for seeing around corners, and this is a natural next step for seeing around corners using soft reflections in the scene,” says Raskar.
“Usually, shiny objects are difficult for vision systems to handle. This paper is very creative because it turns the longstanding weakness of object shininess into an advantage. By exploiting the reflections of the surroundings off a shiny object, the paper can not only see hidden parts of the scene, but also understand how the scene is lit. This enables applications in 3D perception including, but not limited to, the ability to composite virtual objects into real scenes in a way that makes them appear seamless, even in difficult lighting conditions,” says Achuta Kadambi, assistant professor of electrical and computer engineering at the University of California, Los Angeles, who was not involved in this work. “One of the reasons others haven’t been able to use shiny objects in this way is that most previous work requires surfaces with known geometry or texture. The authors have obtained a new and intriguing formulation that does not require such knowledge.”
The research was supported, in part, by the Intelligence Advanced Research Projects Activity and the National Science Foundation.