Since the Neural Radiance Field (NeRF) emerged, novel view synthesis research has advanced significantly. The core idea of NeRF is to use a differentiable volume rendering approach to train multilayer perceptron networks (MLPs) that encode the density and radiance fields of a scene. After training, NeRF can render high-quality images from novel camera poses. Although NeRF produces photorealistic results, training one can take hours or days due to slow deep-neural-network optimization, which restricts its range of applications.
Recent studies show that grid-based techniques such as Plenoxels, DVGO, TensoRF, and Instant-NGP enable training a NeRF in minutes. However, as the scene grows larger, the memory consumption of such grid-based representations increases cubically. Voxel pruning, tensor decomposition, and hash indexing have been proposed to reduce memory usage, but these algorithms can only handle bounded scenes when the grids are built in the original Euclidean space. A space-warping technique that maps an unbounded space into a bounded one is the approach commonly used to represent unbounded scenes.
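To see why cubic growth bites, here is a quick back-of-the-envelope calculation (the channel count and byte size are illustrative assumptions, not numbers from the paper): a dense feature grid's memory is resolution³ times the per-voxel feature size.

```python
# Illustrative arithmetic: memory of a dense voxel feature grid grows
# cubically with resolution. Channel count and precision are assumptions.
def dense_grid_bytes(resolution, channels=4, bytes_per_value=4):
    """Bytes needed to store a dense grid of per-voxel features."""
    return resolution ** 3 * channels * bytes_per_value

for res in (128, 256, 512):
    gb = dense_grid_bytes(res) / 1024 ** 3
    print(f"{res}^3 grid: {gb:.2f} GiB")
# Doubling the resolution multiplies memory by 8, which is why pruning,
# decomposition, and hash indexing are needed at all.
```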
Generally, there are two types of warping functions. (1) For forward-facing scenes (Fig. 1(a)), normalized device coordinate (NDC) warping maps an infinitely deep view frustum to a bounded box by squashing space along the z-axis. (2) For unbounded 360° object-centric scenes, inverse sphere warping maps an infinitely large space into a bounded sphere via the sphere inversion transformation. However, these two warping techniques assume specific camera trajectory patterns and cannot accommodate arbitrary ones. Rendering quality suffers particularly when a trajectory is long and contains several objects of interest, known as a free trajectory, as shown in Fig. 1(c).
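The two standard warps above can be sketched as follows. These are common formulations from the NeRF literature, not the exact code of any particular system; conventions such as the near-plane value and the contraction radius vary by implementation.

```python
import numpy as np

def ndc_warp(p, near=1.0):
    """NDC-style warp for forward-facing scenes: squash space along z so
    the infinitely deep view frustum maps into a bounded box.
    Assumes the camera looks down +z and p has positive depth."""
    x, y, z = p
    return np.array([x / z, y / z, 1.0 - near / z])

def inverse_sphere_warp(p):
    """Inverse-sphere warp for 360° object-centric scenes: points inside
    the unit sphere are kept as-is; points outside are contracted by
    sphere inversion (one common variant maps them into the shell
    between radius 1 and 2)."""
    r = np.linalg.norm(p)
    if r <= 1.0:
        return p
    return (2.0 - 1.0 / r) * (p / r)
```

Both functions are bijective on their intended domains, so samples in the warped, bounded space can be mapped back to rays in the original scene.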
The performance drop on free trajectories stems from an uneven allocation of spatial representation capacity. When the trajectory is long and narrow, many regions of the scene remain empty and are invisible from any input view. Yet the grids of existing approaches are distributed evenly across the whole scene, regardless of whether a region is empty, so much of the representation capacity is wasted on unused space. Although this waste can be reduced by empty-voxel pruning, tensor decomposition, or hash indexing, limited GPU memory still leads to blurry images.
Furthermore, while many foreground objects in Fig. 1(c) are observed by dense, nearby input views, the background regions are covered only by sparse, distant views. In this scenario, fine grids should be allocated to foreground objects to preserve shape detail, while coarse grids suffice for the background, making full use of the grid's spatial representation capacity. However, existing grid-based methods distribute grids evenly over space, resulting in inefficient use of that capacity. Researchers from the University of Hong Kong, S-Lab NTU, the Max Planck Institute, and Texas A&M University propose F2-NeRF (Fast-Free-NeRF), the first fast NeRF training method that supports free camera trajectories in unbounded large scenes, to address the problems above.
F2-NeRF, built on the Instant-NGP framework, retains the fast convergence of the hash-grid representation and can be trained well on unbounded scenes with diverse camera trajectories. In F2-NeRF, the researchers describe the criteria a warping function should satisfy for an arbitrary camera configuration, and based on these criteria they develop perspective warping, a general spatial warping scheme applicable to any camera trajectory.
The fundamental principle of perspective warping is to first represent the position of a 3D point p by concatenating the 2D coordinates of p's projections onto the input images, and then to map these concatenated coordinates into a compact 3D subspace using principal component analysis (PCA). The researchers show empirically that the proposed perspective warp generalizes the existing NDC warp and inverse sphere warp to arbitrary trajectories: it can handle free trajectories while automatically degenerating to those two warp functions in forward-facing scenes or 360° object-centric scenes.
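The projection-then-PCA idea can be sketched roughly as below. This is a hypothetical illustration of the principle, not the paper's actual algorithm: the camera model is a plain pinhole, and the PCA is fit per batch of points rather than per subdivided region as F2-NeRF does.

```python
import numpy as np

def project(p, K, R, t):
    """Pinhole projection of world point p into one camera (K, R, t)."""
    cam = R @ p + t          # world -> camera coordinates
    uv = K @ cam             # camera -> homogeneous image coordinates
    return uv[:2] / uv[2]    # perspective divide -> 2D pixel coordinates

def perspective_warp(points, cameras):
    """Map 3D points to 3D warped coordinates via projection + PCA.

    points:  (N, 3) array of 3D sample points
    cameras: list of (K, R, t) tuples for the visible input views
    """
    # Concatenate the 2D projections: each point becomes a 2M-dim vector.
    feats = np.stack([
        np.concatenate([project(p, K, R, t) for K, R, t in cameras])
        for p in points
    ])
    # PCA via SVD: center the vectors, keep the top-3 principal directions.
    centered = feats - feats.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T  # (N, 3) warped coordinates
```

The intuition: distances in the warped space approximate distances as seen by the cameras, so nearby foreground points (which move a lot across images) spread out, while distant background points (which barely move) compress together.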
They also present a space-subdivision approach that adaptively assigns coarse grids to background regions and fine grids to foreground regions, which makes perspective warping practical in a grid-based NeRF framework. Extensive experiments on an unbounded forward-facing dataset, an unbounded 360° object-centric dataset, and a new unbounded free-trajectory dataset show that F2-NeRF renders high-quality images on all three datasets, with their various trajectory patterns, using the same perspective warping. On the new free-trajectory dataset, F2-NeRF outperforms baseline grid-based NeRF methods while taking only about 12 minutes to train on a 2080Ti GPU.
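A minimal sketch of such adaptive subdivision, under simplified assumptions (an octree split driven only by distance to the nearest camera; F2-NeRF's actual criterion and data structure differ):

```python
import numpy as np

def subdivide(center, size, cam_positions, max_depth=6, depth=0):
    """Octree-style subdivision: return leaf cells as (center, size) pairs.

    A cell is split further only if some camera observes it at close
    range (here: within 4 cell-sizes, an arbitrary illustrative
    threshold), so foreground space ends up with fine cells while
    distant background space keeps coarse ones.
    """
    d = min(np.linalg.norm(c - center) for c in cam_positions)
    if depth >= max_depth or d > 4.0 * size:
        return [(center, size)]
    half = size / 2.0
    leaves = []
    for dx in (-half / 2, half / 2):       # 8 child octants
        for dy in (-half / 2, half / 2):
            for dz in (-half / 2, half / 2):
                child = center + np.array([dx, dy, dz])
                leaves.extend(subdivide(child, half, cam_positions,
                                        max_depth, depth + 1))
    return leaves
```

In this toy version, a region far from every camera stays a single coarse cell, while the region around the cameras is recursively split, mirroring the fine-foreground / coarse-background allocation described above.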
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.