Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields

ICCV 2023 (Oral, Best Paper Finalist)

PICO, ByteDance, Beijing · Tsinghua University · Institute of Computing Technology, CAS
Teaser: Instant-ngp (left) suffers from aliasing in distant or low-resolution views and blurriness in close-up shots, while Tri-MipRF (right) renders both fine-grained details in close-ups and high-fidelity zoomed-out images.

Abstract

Despite the tremendous progress in neural radiance fields (NeRF), we still face a dilemma in the trade-off between quality and efficiency: Mip-NeRF presents fine-detailed and anti-aliased renderings but takes days to train, while Instant-ngp can accomplish the reconstruction in a few minutes but suffers from blurring or aliasing when rendering at various distances or resolutions because it ignores the sampling area.

To this end, we propose a novel Tri-Mip encoding (à la “mipmap”) that enables both instant reconstruction and anti-aliased high-fidelity rendering for neural radiance fields. The key is to factorize the pre-filtered 3D feature space into three orthogonal mipmaps. In this way, we can efficiently perform 3D area sampling by taking advantage of 2D pre-filtered feature maps, which significantly elevates rendering quality without sacrificing efficiency. To cope with the novel Tri-Mip representation, we propose a cone-casting rendering technique that efficiently samples anti-aliased 3D features with the Tri-Mip encoding, accounting for both the pixel's imaging area and the observing distance.

Extensive experiments on both synthetic and real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and reconstruction speed while maintaining a compact representation that reduces model size by 25% compared with Instant-ngp.
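
To make the idea concrete, below is a minimal PyTorch sketch of the core operation behind such an encoding: querying one plane's mipmap with trilinear filtering (bilinear taps on the two nearest levels, blended by the fractional level). This is an illustration under assumed shapes and conventions, not the authors' implementation; the function name query_mipmap and all arguments are hypothetical.

import torch
import torch.nn.functional as F

def query_mipmap(mip_levels, uv, level):
    """Query one plane's mipmap with trilinear filtering.

    mip_levels: list of feature maps; level k is shaped (1, C, H/2^k, W/2^k)
    uv:         (N, 2) plane coordinates in [-1, 1]
    level:      (N,) fractional mip level per sample
    """
    level = level.clamp(0, len(mip_levels) - 1)
    lo = level.floor().long()
    frac = (level - lo.float()).unsqueeze(-1)
    feats = []
    for k in (lo, (lo + 1).clamp(max=len(mip_levels) - 1)):
        # Gather bilinear taps level by level (a loop for clarity;
        # a real implementation would batch this).
        out = torch.zeros(uv.shape[0], mip_levels[0].shape[1])
        for l, fmap in enumerate(mip_levels):
            mask = k == l
            if mask.any():
                grid = uv[mask].view(1, 1, -1, 2)
                out[mask] = F.grid_sample(fmap, grid, align_corners=True)[0, :, 0].t()
        feats.append(out)
    # Blend the two nearest levels linearly -> trilinear filtering.
    return torch.lerp(feats[0], feats[1], frac)

A 3D sample's full feature would then combine the queries of its projections on the XY, XZ, and YZ planes (e.g., by concatenation), which is what lets 3D area sampling reuse cheap, pre-filtered 2D lookups.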

Method

To render a pixel, we emit a cone from the camera's projection center through the pixel on the image plane, and then cast a set of spheres inside the cone. Next, the spheres are orthogonally projected onto the three planes and featurized by our Tri-Mip encoding. The feature vector is then fed into a tiny MLP that non-linearly maps it to density and color. Finally, the densities and colors of the spheres are integrated using volume rendering to produce the final color of the pixel.
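
This pipeline reads directly as code. Below is a hedged PyTorch sketch of rendering one pixel; render_pixel, tri_mip_encode (standing in for the Tri-Mip query sketched above), tiny_mlp (returning per-sample density and color), the near/far bounds, and the level heuristic are all illustrative assumptions, not the paper's exact formulation.

import torch

def render_pixel(origin, direction, pixel_radius, tri_mip_encode, tiny_mlp,
                 near=0.1, far=4.0, n_samples=128):
    """Cast a cone through one pixel and volume-render its color.

    origin, direction: (3,) camera center and unit ray direction
    pixel_radius:      footprint radius at unit depth, so the sphere
                       radius grows linearly along the cone
    """
    t = torch.linspace(near, far, n_samples)
    centers = origin + t[:, None] * direction         # sphere centers (N, 3)
    radii = pixel_radius * t                          # cone cross-sections (N,)
    # Mip level from the sphere footprint relative to the finest grid
    # cell size r0 (an assumed convention, not the paper's formula).
    r0 = 1.0 / 512
    levels = torch.log2(radii / r0).clamp(min=0)

    feats = tri_mip_encode(centers, levels)           # (N, C) features
    sigma, rgb = tiny_mlp(feats)                      # (N,), (N, 3)

    # Standard volume-rendering quadrature (alpha compositing).
    deltas = torch.diff(t, append=t[-1:] + (far - near) / n_samples)
    alpha = 1.0 - torch.exp(-sigma * deltas)
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(dim=0)        # final pixel color (3,)

Because each sphere's radius, and hence its mip level, grows with distance, far-away content is read from coarser, pre-filtered levels, which is what suppresses aliasing without super-sampling.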

Results

Quality vs. Reconstruction time

Our Tri-MipRF achieves state-of-the-art rendering quality while being reconstructed efficiently, compared with cutting-edge radiance field methods, e.g., NeRF, Mip-NeRF, Plenoxels, TensoRF, and Instant-ngp. Equipping Instant-ngp with super-sampling (denoted Instant-ngp↑5×) improves rendering quality to a certain extent but significantly slows down reconstruction.


In-the-wild Captures

To further demonstrate the applicability of our method, we captured several objects in the wild, performed SfM on each sequence to estimate the camera's intrinsic and extrinsic parameters, and applied Tri-MipRF to reconstruct them. We show three example results here: the rendered novel views faithfully reproduce the detailed structures and appearance, and the PSNR/SSIM values corroborate this quantitatively.


Related Links

Zip-NeRF introduces a multi-sampling-based method to address the same problem of efficient anti-aliasing, whereas our method is pre-filtering-based.

BibTeX

@inproceedings{hu2023Tri-MipRF,
        author      = {Hu, Wenbo and Wang, Yuling and Ma, Lin and Yang, Bangbang and Gao, Lin and Liu, Xiao and Ma, Yuewen},
        title       = {Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields},
        booktitle   = {ICCV},
        year        = {2023}
}