Despite the tremendous progress in neural radiance fields (NeRF), existing methods still trade quality against efficiency. For example, MipNeRF produces finely detailed, anti-aliased renderings but takes days to train, while Instant-ngp completes reconstruction in a few minutes but suffers from blurring or aliasing when rendering at varying distances or resolutions because it ignores the sampling area.
To this end, we propose a novel Tri-Mip encoding (à la “mipmap”) that enables both instant reconstruction and anti-aliased high-fidelity rendering for neural radiance fields. The key is to factorize the pre-filtered 3D feature space into three orthogonal mipmaps. In this way, we can efficiently perform 3D area sampling by taking advantage of 2D pre-filtered feature maps, which significantly improves rendering quality without sacrificing efficiency. To work with the novel Tri-Mip representation, we propose a cone-casting rendering technique that efficiently samples anti-aliased 3D features with the Tri-Mip encoding, considering both pixel imaging and observing distance.
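The idea of sampling a pre-filtered feature from three orthogonal mipmaps can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes a single scalar feature channel per plane, square power-of-two base grids, coordinates normalized to [0, 1], and a mip level chosen as `log2(radius / base_radius)`. All function and parameter names (`build_mipmap`, `sample_mip`, `tri_mip_encode`, `base_radius`) are hypothetical.

```python
import math

def downsample(grid):
    """Average each 2x2 block of a square grid to form the next, coarser mip level."""
    n = len(grid) // 2
    return [[(grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
              + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n)] for i in range(n)]

def build_mipmap(base):
    """Return [base, half-resolution, ..., 1x1]; assumes a power-of-two square grid."""
    levels = [base]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels

def bilinear(grid, u, v):
    """Bilinearly interpolate a grid at (u, v) in [0, 1], with texel centers at (i + 0.5) / n."""
    n = len(grid)
    x = min(max(u * n - 0.5, 0.0), n - 1.0)
    y = min(max(v * n - 0.5, 0.0), n - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, n - 1), min(y0 + 1, n - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def sample_mip(levels, u, v, level):
    """Trilinear lookup: bilinear within the two nearest levels, linear between them."""
    level = min(max(level, 0.0), len(levels) - 1.0)
    lo = int(math.floor(level))
    hi = min(lo + 1, len(levels) - 1)
    t = level - lo
    return bilinear(levels[lo], u, v) * (1 - t) + bilinear(levels[hi], u, v) * t

def tri_mip_encode(planes, center, radius, base_radius):
    """Project a sphere onto the XY, XZ and YZ planes and concatenate the three features.

    Assumption: the mip level grows with log2 of the sphere radius relative to
    base_radius, the radius associated with the finest level.
    """
    x, y, z = center
    level = math.log2(max(radius, 1e-12) / base_radius)
    return [sample_mip(planes["xy"], x, y, level),
            sample_mip(planes["xz"], x, z, level),
            sample_mip(planes["yz"], y, z, level)]
```

Because each lookup touches only 2D maps, the cost of an area sample stays constant regardless of how large the sphere is, which is the property the factorization is meant to provide.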
Extensive experiments on both synthetic and real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and reconstruction speed while maintaining a compact representation, reducing model size by 25% compared with Instant-ngp.
To render a pixel, we emit a cone from the camera’s projection center to the pixel on the image plane, and then cast a set of spheres inside the cone. Next, the spheres are orthogonally projected onto the three planes and featurized by our Tri-Mip encoding. The resulting feature vector is then fed into a tiny MLP, which non-linearly maps it to density and color. Finally, the densities and colors of the spheres are integrated using volume rendering to produce the final color of the pixel.
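The last two stages of this pipeline can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the pixel is treated as a disc with the same area as the pixel, the sphere radius is assumed to grow linearly with distance along the cone axis (a similar-triangles approximation), and the integration uses the standard NeRF alpha-compositing quadrature. The function names are hypothetical, and the densities and colors would come from the MLP in practice.

```python
import math

def pixel_disc_radius(pixel_width, pixel_height):
    """Radius of a disc on the image plane with the same area as one pixel."""
    return math.sqrt(pixel_width * pixel_height / math.pi)

def sphere_radius(t, disc_radius, focal_length):
    """Radius of a sphere at distance t along the cone axis.

    Simplifying assumption: the cone widens linearly, so the radius at
    distance t is the disc radius scaled by t / focal_length.
    """
    return t * disc_radius / focal_length

def composite(densities, colors, deltas):
    """Integrate per-sphere density and RGB color into one pixel color.

    alpha_i = 1 - exp(-sigma_i * delta_i); each sample is weighted by the
    transmittance accumulated over all samples in front of it.
    Returns (pixel_rgb, remaining_transmittance).
    """
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

In a full renderer, `sphere_radius` would supply the radius used to pick the mip level for each sphere, and `composite` would run once per pixel over that pixel's spheres.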
Our Tri-MipRF achieves state-of-the-art rendering quality while remaining efficient to reconstruct, compared with cutting-edge radiance field methods, e.g., NeRF, MipNeRF, Plenoxels, TensoRF, and Instant-ngp. Equipping Instant-ngp with super-sampling (named Instant-ngp↑5×) improves the rendering quality to a certain extent but significantly slows down the reconstruction.
To further demonstrate the applicability of our method, we captured several objects in the wild, performed structure-from-motion (SfM) on each sequence to estimate the camera’s intrinsic and extrinsic parameters, and applied our Tri-MipRF to reconstruct them. We show three example results here: the rendered novel views faithfully reproduce the detailed structures and appearances, and the PSNR/SSIM values further support the applicability of our method.
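For readers unfamiliar with the metric, PSNR can be computed from the mean squared error between a rendered view and its reference image. The sketch below uses the standard definition and assumes flattened pixel values in [0, 1]. The function name `psnr` and its parameters are illustrative, and it is not the evaluation script used for the reported numbers.

```python
import math

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a rendering closer to the reference.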
Zip-NeRF introduces a multi-sampling-based method to address the same problem of efficient anti-aliasing, whereas our method is pre-filtering-based.
@inproceedings{hu2023Tri-MipRF,
author = {Hu, Wenbo and Wang, Yuling and Ma, Lin and Yang, Bangbang and Gao, Lin and Liu, Xiao and Ma, Yuewen},
title = {Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields},
booktitle = {ICCV},
year = {2023}
}