Neural rendering methods have recently attracted great interest. Some approaches use 3D geometry reconstructed with Multi-View Stereo (MVS) but cannot recover from the errors of this process, while others directly learn a volumetric neural representation, but suffer from expensive training and inference. We introduce a general approach that is initialized with MVS, but allows further optimization of scene properties in the space of input views, including depth and reprojected features, resulting in improved novel view synthesis. A key element of our approach is a differentiable point-based splatting pipeline, based on our bi-directional Elliptical Weighted Average solution. To further improve the quality and efficiency of our point-based method, we introduce a probabilistic depth test and efficient camera selection. We use these elements together in our neural renderer, allowing us to achieve a good compromise between quality and speed. Our pipeline can be applied to multi-view harmonization and stylization in addition to novel view synthesis.
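Two of the ingredients above, the elliptical splat footprint and the probabilistic depth test, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the Gaussian footprint weight and the soft depth attenuation below are generic stand-ins, and `cov`, `z_front`, and `sigma` are hypothetical parameters chosen for illustration.

```python
import numpy as np

def ewa_splat_weight(offset, cov):
    """Gaussian weight of a pixel at `offset` from a splat center under the
    splat's 2x2 screen-space covariance `cov` -- the elliptical footprint at
    the heart of EWA-style splatting. `cov` is an illustrative stand-in for
    a projected point covariance."""
    inv_cov = np.linalg.inv(cov)
    return float(np.exp(-0.5 * offset @ inv_cov @ offset))

def soft_depth_weight(z, z_front, sigma):
    """Illustrative probabilistic depth test: rather than a hard z-buffer
    rejection, points behind the frontmost depth `z_front` are attenuated
    smoothly, keeping the test differentiable. `sigma` is a hypothetical
    softness parameter."""
    return float(np.exp(-0.5 * max(z - z_front, 0.0) ** 2 / sigma ** 2))

# A pixel at the splat center receives full weight; weight decays with
# Mahalanobis distance under the elliptical footprint.
w_center = ewa_splat_weight(np.array([0.0, 0.0]), np.eye(2))
w_edge = ewa_splat_weight(np.array([1.0, 0.0]), np.eye(2))

# A point at the frontmost depth is fully visible; occluded points fade.
v_front = soft_depth_weight(z=2.0, z_front=2.0, sigma=0.1)
v_behind = soft_depth_weight(z=2.3, z_front=2.0, sigma=0.1)
```

Because both weights are smooth functions of point position and depth, gradients can flow back to the per-view scene properties being optimized.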
The recent research explosion around implicit neural representations, such as NeRF, shows that there is immense potential for implicitly storing high-quality scene and lighting information in compact neural networks. However, one major limitation preventing the use of NeRF in real-time rendering applications is the prohibitive computational cost of excessive network evaluations along each view ray, requiring dozens of petaFLOPS. In this work, we bring compact neural representations closer to practical rendering of synthetic content in real-time applications, such as games and virtual reality. We show that the number of samples required for each view ray can be significantly reduced without compromising image quality when samples are placed around surfaces in the scene. To this end, we propose a depth oracle network that predicts ray sample locations for each view ray with a single network evaluation. We show that using a classification network over logarithmically discretized and spherically warped depth values is essential to encode surface locations, rather than directly estimating depth. The combination of these techniques leads to DONeRF, our compact dual network design with a depth oracle network as its first step and a locally sampled shading network for ray accumulation. With DONeRF, we reduce the inference costs by up to 48x compared to NeRF when conditioning on available ground truth depth information. Compared to concurrent acceleration methods for raymarching-based neural representations, DONeRF does not require additional memory for explicit caching or acceleration structures, and can render interactively (20 frames per second) on a single GPU.
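The two sampling ideas above, logarithmically discretized depth bins for the oracle's classification targets and local sample placement around a predicted surface, can be sketched as follows. This is a simplified illustration, not DONeRF's code: `near`, `far`, `num_bins`, and `half_width` are hypothetical settings, and the spherical warping step is omitted.

```python
import numpy as np

def log_depth_bins(near, far, num_bins):
    """Bin edges spaced logarithmically between the near and far planes,
    so nearby depths (where errors are most visible) get finer bins.
    These bins serve as classification targets for a depth oracle."""
    return np.exp(np.linspace(np.log(near), np.log(far), num_bins + 1))

def place_local_samples(surface_depth, num_samples, half_width):
    """Place shading samples in a narrow window around a predicted surface
    depth, rather than uniformly along the entire ray -- the source of the
    sample-count reduction described above."""
    return np.linspace(surface_depth - half_width,
                       surface_depth + half_width,
                       num_samples)

edges = log_depth_bins(near=0.5, far=100.0, num_bins=8)
samples = place_local_samples(surface_depth=4.0, num_samples=5, half_width=0.5)
```

With only a handful of samples concentrated near the surface, the shading network is evaluated far fewer times per ray than a uniformly sampled raymarcher.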
We investigate the use of neural fields for modeling diverse mesoscale appearances. Mesoscale structures, such as fur, fabric, and grass, are currently handled using case-specific graphics primitives with limited versatility. Instead, we draw inspiration from neural radiance fields and propose to represent a volumetric mesoscale primitive using a neural reflectance field (NeRF), which jointly models the geometry and lighting response. The volumetric primitive can be instantiated over a base mesh to ``texture'' it with the desired meso- and microscale appearance. We condition the reflectance field on user-defined parameters that control the appearance. A single NeRF texture thus captures a continuum of reflectance fields rather than one specific structure. This increases the gamut of appearances that can be modeled and provides an easy solution to combat repetitive texturing artifacts. Our approach unites the versatility and modeling power of neural networks with the artistic control needed for precise modeling of virtual scenes. While all our training data is currently synthetic, our work provides a recipe that can be further extended to extract complex, hard-to-model appearances from real images.
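The key mechanism, conditioning one network on user-defined appearance parameters so that it spans a continuum of reflectance fields, can be sketched with a toy query function. Everything here is illustrative: the randomly initialized two-layer MLP stands in for a trained NeRF texture, and the input layout (a 3D sample position concatenated with three appearance parameters) is an assumption, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weights standing in for a trained NeRF texture.
W1, b1 = rng.normal(size=(16, 6)), np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)

def query_nerf_texture(position, appearance_params):
    """Query a conditioned volumetric texture: the network input concatenates
    a sample position inside the primitive with user-defined appearance
    parameters, so a single network covers a family of reflectance fields."""
    x = np.concatenate([position, appearance_params])   # 3 + 3 = 6 inputs
    h = np.maximum(W1 @ x + b1, 0.0)                    # hidden ReLU layer
    out = W2 @ h + b2
    density = out[0]                                    # volumetric density
    rgb = 1.0 / (1.0 + np.exp(-out[1:]))                # sigmoid-bounded color
    return density, rgb

# Varying only the appearance parameters changes the returned reflectance,
# which is what lets one texture avoid repetitive texturing artifacts.
d_a, rgb_a = query_nerf_texture(np.array([0.1, 0.2, 0.3]),
                                np.array([0.0, 0.0, 0.0]))
d_b, rgb_b = query_nerf_texture(np.array([0.1, 0.2, 0.3]),
                                np.array([1.0, 0.5, 0.2]))
```

In the actual method, such queries would be accumulated along rays through primitives instanced over a base mesh; the sketch only shows the conditioning idea.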