Neural networks are biased toward learning lower frequency functions more easily than higher frequency ones. Consequently, most scene representation networks, such as NeRF, struggle to capture the high-frequency variations in a scene's color and geometry and thus produce blurry renderings. To counteract this issue, the authors of the original NeRF algorithm propose mapping the original input to a higher-dimensional space with high-frequency functions before passing it to the network. We compare the quality of two Fourier feature mapping approaches, Positional Encoding and Gauss Mapping, as well as networks that mimic this high-frequency mapping by introducing periodic activation functions, namely SIREN and SINONE.
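The two Fourier feature mappings can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation evaluated here: the frequency count, the Gaussian scale `sigma = 10`, and all function names are assumptions chosen for the example.

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Map each coordinate through sin/cos at exponentially growing
    frequencies, as in the original NeRF positional encoding."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi           # (num_freqs,)
    angles = x[..., None] * freqs                         # (..., dim, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)                 # (..., dim * 2 * num_freqs)

def gaussian_mapping(x, B):
    """Project inputs with a random Gaussian matrix B before the
    sin/cos mapping (Gaussian Fourier features)."""
    proj = 2.0 * np.pi * x @ B.T                          # (..., num_features)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(4, 3))                 # four 3D sample points
B = rng.normal(0.0, 10.0, size=(128, 3))                  # sigma = 10 (assumed scale)
print(positional_encoding(pts).shape)   # (4, 60)
print(gaussian_mapping(pts, B).shape)   # (4, 256)
```

Note that Positional Encoding uses fixed axis-aligned frequencies, while Gauss Mapping draws its frequencies isotropically at random, which is one explanation for its smoother renderings.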
Gauss Mapping produces smoother renderings with fewer artifacts and captures the overall object shapes better than the original Positional Encoding. The periodic activation functions of SIREN and SINONE are unstable during training in combination with DSNeRF and fail to capture the original 3D scene well.
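For reference, a single SIREN-style layer applies a sine activation to an affine transform, with the weight range tied to the frequency factor omega_0. The following is a minimal sketch following the initialization scheme of Sitzmann et al.; the class name and dimensions are assumptions for illustration.

```python
import numpy as np

class SineLayer:
    """One SIREN layer: y = sin(omega_0 * (W x + b)).
    Weight bounds follow the SIREN initialization scheme."""
    def __init__(self, in_dim, out_dim, omega_0=30.0, is_first=False, rng=None):
        rng = rng or np.random.default_rng(0)
        # First layer: uniform in [-1/in_dim, 1/in_dim];
        # later layers: scaled by 1/omega_0 to keep activations well distributed.
        bound = 1.0 / in_dim if is_first else np.sqrt(6.0 / in_dim) / omega_0
        self.W = rng.uniform(-bound, bound, size=(out_dim, in_dim))
        self.b = rng.uniform(-bound, bound, size=(out_dim,))
        self.omega_0 = omega_0

    def __call__(self, x):
        return np.sin(self.omega_0 * (x @ self.W.T + self.b))

layer = SineLayer(3, 256, is_first=True)
out = layer(np.zeros((4, 3)))
print(out.shape)  # (4, 256)
```

The sensitivity to omega_0 and to this initialization is one plausible reason the sine activations interact poorly with DSNeRF training.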
FastNeRF proposes a network split that separates the original DSNeRF into two networks. This architecture takes a three-dimensional and a two-dimensional input instead of a single five-dimensional one, which makes caching feasible. Since the volume renderer can fall back on the cache instead of querying the network at rendering time, interactive rendering speeds of 30 fps can be achieved.
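The split can be illustrated as follows: a position-dependent function returns density plus D color components, a view-dependent function returns D blending weights, and the final color is their weighted sum, so each factor can be cached on its own low-dimensional grid. The stand-in functions below are toy closed-form expressions, not trained networks, and D = 8 is an arbitrary illustrative choice.

```python
import numpy as np

D = 8  # number of factorized components (illustrative choice)

def f_pos(xyz):
    """Stand-in for the cached position network: density and D RGB components."""
    sigma = np.linalg.norm(xyz, axis=-1, keepdims=True)        # (..., 1)
    comps = np.stack([np.sin(xyz.sum(-1) * (i + 1)) for i in range(D)], -1)
    return sigma, np.repeat(comps[..., None], 3, axis=-1)      # (..., D, 3)

def f_dir(dirs):
    """Stand-in for the cached direction network: D blending weights."""
    return np.stack([np.cos(dirs.sum(-1) * (i + 1)) for i in range(D)], -1)

xyz = np.zeros((4, 3))    # 3D positions
dirs = np.zeros((4, 2))   # 2D viewing directions (theta, phi)
sigma, comps = f_pos(xyz)
weights = f_dir(dirs)
rgb = (weights[..., None] * comps).sum(axis=-2)                # (..., 3)
print(sigma.shape, rgb.shape)  # (4, 1) (4, 3)
```

Caching a 3D grid and a 2D grid separately is tractable, whereas a dense cache over the joint 5D input would be prohibitively large.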