Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation

Verica Lazova¹,², Vladimir Guzov¹,², Kyle Olszewski³, Sergey Tulyakov³ and Gerard Pons-Moll¹,²

¹University of Tübingen
²Max Planck Institute for Informatics, Saarland Informatics Campus
³Snap Inc.

Abstract

We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis. While NeRF-based approaches are effective for novel view synthesis, such models memorize the radiance for every point in a scene within a neural network. Since these models are scene-specific and lack a 3D scene representation, classical editing operations, such as shape manipulation or combining scenes, are not possible. Hence, editing and combining NeRF-based scenes have not been demonstrated. With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene-agnostic neural rendering network. With this hybrid representation, we decouple neural rendering from scene-specific geometry and appearance. We can generalize to novel scenes by optimizing only the scene-specific 3D feature representation, while keeping the parameters of the rendering network fixed. The rendering function learnt during the initial training stage can thus be easily applied to new scenes, making our approach more flexible. More importantly, since the feature volumes are independent of the rendering model, we can manipulate and combine scenes by editing their corresponding feature volumes. The edited volume can then be plugged into the rendering model to synthesize high-quality novel views. We demonstrate scene manipulation, including mixing scenes, deforming objects and inserting objects into scenes, while producing photo-realistic results.

Architecture

Our method learns volumetric representations for multiple scenes simultaneously. On the left of the figure, we show visualizations of the learned feature volumes. We query the volume along each ray and predict color and density from the sampled features. The pixel color is then derived using volume rendering, similar to NeRF. At training time, the volumes and the rendering network are trained jointly. For novel scenes, the rendering network is fixed and only the scene volume is optimized. As shown on the right, these volumes can be edited and mixed for the purpose of scene editing.

Video

Links

Paper

Citation

@article{lazova2022control,
  title={Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation},
  author={Lazova, Verica and Guzov, Vladimir and Olszewski, Kyle and Tulyakov, Sergey and Pons-Moll, Gerard},
  journal={arXiv preprint arXiv:2204.10850},
  year={2022}
}

Acknowledgments

We thank Aymen Mir, Bharat Bhatnagar, Garvita Tiwari, Ilya Petrov, Jan Eric Lenssen, Julian Chibane, Keyang Zhou and Xiaohan Zhang for the in-depth discussions, valuable insights and honest feedback. This work is supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A; and partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans). Gerard Pons-Moll is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 – Project number 390727645. The project was made possible by funding from the Carl Zeiss Foundation.