Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation

Verica Lazova1,2, Vladimir Guzov1,2, Kyle Olszewski3, Sergey Tulyakov3 and Gerard Pons-Moll1,2

1University of Tübingen
2Max Planck Institute for Informatics, Saarland Informatics Campus
3Snap Inc.


We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis. While NeRF-based approaches are effective for novel view synthesis, such models memorize the radiance for every point in a scene within a neural network. Since these models are scene-specific and lack a 3D scene representation, classical editing, such as shape manipulation or combining scenes, is not possible. Hence, editing and combining NeRF-based scenes has not been demonstrated. With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene-agnostic neural rendering network. With this hybrid representation, we decouple neural rendering from scene-specific geometry and appearance. We can generalize to novel scenes by optimizing only the scene-specific 3D feature representation while keeping the parameters of the rendering network fixed. The rendering function learnt during the initial training stage can thus be easily applied to new scenes, making our approach more flexible. More importantly, since the feature volumes are independent of the rendering model, we can manipulate and combine scenes by editing their corresponding feature volumes. The edited volume can then be plugged into the rendering model to synthesize high-quality novel views. We demonstrate scene manipulation, including mixing scenes, deforming objects and inserting objects into scenes, while producing photo-realistic results.
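As a rough illustration of why an explicit feature volume makes editing straightforward, the sketch below pastes a box of voxels from one scene's volume into another. It is not the authors' implementation: the nested-list volume layout, the function name `paste_region`, and the assumption that both volumes are already aligned on the same grid are all hypothetical choices made for this example.

```python
import copy


def paste_region(target, source, src_lo, src_hi, dst_lo):
    """Return a copy of `target` with a box of `source` voxels pasted in.

    Hypothetical layout: volume[i][j][k] is a list of floats holding the
    feature vector of one voxel. `src_lo` (inclusive) and `src_hi`
    (exclusive) bound the box to copy from `source`; `dst_lo` is the
    corner in `target` where the box is written.
    """
    edited = copy.deepcopy(target)  # leave the original scene untouched
    size = [hi - lo for lo, hi in zip(src_lo, src_hi)]
    for i in range(size[0]):
        for j in range(size[1]):
            for k in range(size[2]):
                voxel = source[src_lo[0] + i][src_lo[1] + j][src_lo[2] + k]
                edited[dst_lo[0] + i][dst_lo[1] + j][dst_lo[2] + k] = list(voxel)
    return edited


def constant_volume(n, value, channels=1):
    """Build an n x n x n toy volume whose features all equal `value`."""
    return [[[[value] * channels for _ in range(n)]
             for _ in range(n)] for _ in range(n)]


scene_a = constant_volume(3, 0.0)  # stand-in for the scene being edited
scene_b = constant_volume(3, 1.0)  # stand-in for the scene providing an object
mixed = paste_region(scene_a, scene_b, (0, 0, 0), (1, 1, 1), (2, 2, 2))
```

Because the rendering network is shared across scenes, an edited volume such as `mixed` could in principle be rendered without retraining; a real pipeline would also need to handle alignment, scale and blending at the box boundary, which this sketch ignores.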


Our method learns volumetric representations for multiple scenes simultaneously. On the left of the figure, we show visualizations of the learned feature volumes. We query the volume along the ray and predict color and density based on the obtained features. The pixel color is derived using volume rendering, similar to NeRF. At training time, the volume and the rendering network are trained jointly. For novel scenes, the rendering network is fixed and only the scene volume is optimized. As shown on the right, these volumes can be edited and mixed for the purpose of scene editing.
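The two steps described above, querying the volume at a point along the ray and compositing the predicted colors and densities into a pixel, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names, the nested-list volume layout and the use of trilinear interpolation for the query are assumptions, and the rendering network that maps features to color and density is omitted. The compositing step uses the standard NeRF-style quadrature.

```python
import math


def trilinear_sample(volume, x, y, z):
    """Trilinearly interpolate a feature vector at continuous voxel coords.

    Hypothetical layout: volume[i][j][k] is a list of floats. The grid
    needs at least 2 voxels per axis; coordinates are clamped to it.
    """
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    base, frac = [], []
    for coord, n in zip((x, y, z), dims):
        coord = min(max(coord, 0.0), n - 1.0)  # clamp to the grid
        lo = min(int(coord), n - 2)            # lower corner index
        base.append(lo)
        frac.append(coord - lo)                # offset within the cell
    feat = [0.0] * len(volume[0][0][0])
    for di, wx in ((0, 1.0 - frac[0]), (1, frac[0])):
        for dj, wy in ((0, 1.0 - frac[1]), (1, frac[1])):
            for dk, wz in ((0, 1.0 - frac[2]), (1, frac[2])):
                corner = volume[base[0] + di][base[1] + dj][base[2] + dk]
                for c, value in enumerate(corner):
                    feat[c] += wx * wy * wz * value
    return feat


def composite_ray(densities, colors, deltas):
    """Accumulate per-sample colors into one pixel color.

    alpha_i = 1 - exp(-sigma_i * delta_i); each sample is weighted by
    alpha_i times the transmittance left after the samples before it.
    """
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return pixel


# Toy 2x2x2 volume whose single feature channel equals i + j + k.
toy_volume = [[[[float(i + j + k)] for k in range(2)]
               for j in range(2)] for i in range(2)]
center_feature = trilinear_sample(toy_volume, 0.5, 0.5, 0.5)

# An empty sample followed by a fully opaque green one.
pixel = composite_ray([0.0, 1e9], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [1.0, 1.0])
```

In a full pipeline, each sampled feature would be passed through the shared rendering network to obtain the density and color that `composite_ray` consumes; here those values are supplied by hand.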





  title={Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation},
  author={Lazova, Verica and Guzov, Vladimir and Olszewski, Kyle and Tulyakov, Sergey and Pons-Moll, Gerard},
  journal={arXiv preprint arXiv:2204.10850},



We thank Aymen Mir, Bharat Bhatnagar, Garvita Tiwari, Ilya Petrov, Jan Eric Lenssen, Julian Chibane, Keyang Zhou and Xiaohan Zhang for the in-depth discussions, valuable insights and honest feedback. This work is supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A; and partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans). Gerard Pons-Moll is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 – Project number 390727645. The project was made possible by funding from the Carl Zeiss Foundation.