FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization
CVPR 2023

Jiawei Yang
University of California, Los Angeles
Marco Pavone
NVIDIA Research
Stanford University
Yue Wang
NVIDIA Research

Abstract

Novel view synthesis with sparse inputs is a challenging problem for neural radiance fields (NeRF). Recent efforts alleviate this challenge by introducing external supervision, such as pre-trained models and extra depth signals, and by non-trivial patch-based rendering. In this paper, we present Frequency regularized NeRF (FreeNeRF), a surprisingly simple baseline that outperforms previous methods with minimal modifications to the plain NeRF. We analyze the key challenges in few-shot neural rendering and find that frequency plays an important role in NeRF's training. Based on the analysis, we propose two regularization terms. One regularizes the frequency range of NeRF's inputs, while the other penalizes the near-camera density fields. Both techniques are "free lunches" at no additional computational cost. We demonstrate that even with a one-line code change, the plain NeRF can achieve performance similar to other complicated methods in the few-shot setting. FreeNeRF achieves state-of-the-art performance across diverse datasets, including Blender, DTU, and LLFF. We hope this simple baseline will motivate a rethinking of the fundamental role of frequency in NeRF's training under the low-data regime and beyond.

TL;DR:

We use frequency regularization and occlusion regularization to improve few-shot neural rendering. Both techniques can be implemented with a few lines of code.


Example novel view synthesis results

FreeNeRF enables novel view synthesis from as few as 3 input images, with only a few lines of code added.


More Results

For more results, check out: Comparison to others


How does FreeNeRF work?


1. High-frequency signals cause catastrophic overfitting in few-shot neural rendering.


Neural rendering methods, such as NeRF, can learn 3D scene representations from a set of 2D images without explicit 3D geometry. Instead, the 3D geometry is learned implicitly by optimizing appearance in its 2D projected views. However, given only very few input views, NeRF is prone to overfitting to these 2D images with a small training loss while failing to explain the 3D geometry in a multi-view-consistent way.

This overfitting in few-shot neural rendering is further exacerbated by the high-frequency signals in the input positional encoding. A previous study shows that higher-frequency mappings enable faster convergence for high-frequency components. However, this over-fast convergence to high-frequency components leads to catastrophic overfitting in few-shot neural rendering.

To test this, we conducted an experiment in which we trained NeRF models with masked positional encodings, setting the high-frequency entries to zero:

pos_enc[int(L * x):] = 0,

where L is the length of the positional encoding and x is the visible ratio (between 0 and 1).
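As a concrete illustration, this experiment can be sketched in a few lines of NumPy. This is a minimal sketch under our own naming (positional_encoding and mask_high_frequencies are not from the released code), and it assumes the encoding is laid out from low to high frequency along the last axis:

import numpy as np

def positional_encoding(x, num_freqs):
    # NeRF-style encoding [sin(2^k x), cos(2^k x)], ordered low -> high frequency.
    freqs = 2.0 ** np.arange(num_freqs)                # (num_freqs,)
    angles = x[..., None, :] * freqs[:, None]          # (..., num_freqs, dim)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)              # (..., num_freqs * 2 * dim)

def mask_high_frequencies(pos_enc, visible_ratio):
    # visible_ratio is x in the text above.
    masked = pos_enc.copy()
    L = masked.shape[-1]
    masked[..., int(L * visible_ratio):] = 0.0         # the one-line change above
    return masked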

The following videos show the negative impact of high-frequency signals on NeRF's performance in few-shot neural rendering: with the full frequency spectrum, training fails with severe overfitting. Using only low-frequency inputs allows NeRF to learn plausible 3D scene representations, but the resulting models may still be over-smooth. These results highlight the importance of addressing the overfitting issue in the frequency domain, so as to improve the accuracy of the 3D scene representations while mitigating over-smoothness.

High-frequency inputs cause catastrophic failure in few-shot neural rendering.

2. Frequency regularization enjoys the benefits of both high-frequency and low-frequency signals.


We propose Frequency Regularization. Given a positional encoding, we use a linearly increasing frequency mask to regularize the visible frequency spectrum based on the training time step, as described in Equations 4 and 5 of the paper.

The following figure shows how the frequency mask changes over training steps. We use a 50% schedule as an example, i.e., all inputs become visible at the midpoint of training. By gradually increasing the visibility of the high-frequency signals, Frequency Regularization reduces the risk of overfitting, which causes catastrophic failure at the beginning of training, while avoiding underfitting, which causes over-smoothness at the end.

The frequency mask changes over training steps.
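The schedule can be sketched in a few lines. The exact piecewise form is given in Equations 4 and 5 of the paper; the code below is a simplified sketch under our own naming, where T_end is the step at which the full spectrum becomes visible (e.g., half of the total steps for the 50% schedule shown above):

import numpy as np

def freq_mask(L, t, T_end):
    # Linearly increasing frequency mask: low-frequency entries are visible
    # from the start; high-frequency entries are revealed as training proceeds.
    mask = np.zeros(L)
    ptr = L * min(t / T_end, 1.0)        # length of the visible prefix
    mask[: int(ptr)] = 1.0               # fully visible low-frequency entries
    if int(ptr) < L:
        mask[int(ptr)] = ptr - int(ptr)  # fractional weight at the frontier
    return mask

The mask is multiplied element-wise with the positional encoding before it enters the MLP, e.g., masked_enc = pos_enc * freq_mask(pos_enc.shape[-1], step, 0.5 * total_steps).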


The following videos show two examples. NeRF models first learn smooth, coarse 3D scene representations from only the low-frequency signals. As training progresses, more high-frequency signals become visible, and the models learn more accurate 3D scene representations from both high- and low-frequency signals.

Training steps increase from left to right.

3. Occlusion regularization addresses the near-camera floaters.


Despite the use of Frequency Regularization, some characteristic artifacts may still appear in certain novel views due to the limited number of training views and the inherent ill-posedness of the problem. These artifacts often manifest as "walls" or "floaters" that are close to the camera and can significantly degrade the quality of the 3D scene representations.

To address this issue, we propose a new method called Occlusion Regularization, which penalizes the dense fields near the camera, as described in Equation 6 of the paper. By suppressing these dense fields, Occlusion Regularization improves the accuracy and realism of the 3D scene representations, as shown in the visual comparisons below between models without (left) and with (right) occlusion regularization.
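A minimal sketch of this loss term, assuming densities are sampled in near-to-far order along each ray; reg_range (the number of penalized samples, M) and the loss weight are hyperparameters, and the exact normalization is given in Equation 6 of the paper:

import numpy as np

def occlusion_reg(sigma, reg_range):
    # sigma: (num_rays, num_samples) densities, ordered near -> far along each ray.
    # Penalize density predicted within the first reg_range samples of the camera.
    return sigma[:, :reg_range].mean()

The term is simply added to the training objective, e.g., loss = rgb_loss + occ_weight * occlusion_reg(sigma, reg_range).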


Citation

Consider citing us if you find this project helpful.
@inproceedings{yang2022freenerf,
  author    = {Jiawei Yang and Marco Pavone and Yue Wang},
  title     = {FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
}

Acknowledgement

This webpage integrates components from many websites, including RefNeRF, RegNeRF, DreamFusion, and Richard Zhang's template. We sincerely thank the authors for their great work and websites.