Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

Julian Chibane1,2,   Aayush Bansal3,   Verica Lazova1,2,   Gerard Pons-Moll1,2

1University of Tübingen, Germany
2Max Planck Institute for Informatics, Saarland Informatics Campus, Germany
3Carnegie Mellon University, USA

CVPR 2021 Virtual

Abstract & Method

In this work, we introduce Stereo Radiance Fields (SRF), a neural view synthesis approach that is trained end-to-end, generalizes to new scenes in a single forward pass, and requires only sparse views at test time.


In contrast, purely data-driven synthesis requires dense input images and time-intensive scene memorization for each new scene.


SRF intuition: Building on ideas from classical stereo reconstruction systems, SRF achieves this by combining information from image pairs. 3D points on an opaque, non-occluded surface project to similar-looking image regions when viewed from different perspectives (blue). A point in free space does not (red).
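
The core check can be illustrated in a few lines of code. Below is a minimal NumPy sketch of this projection-and-compare idea: a 3D point is projected into two calibrated views and the image features extracted at the two projections are compared. The camera parameters, the random placeholder features, and the cosine similarity are illustrative assumptions only; in SRF the features come from a learned CNN encoder and the similarity functions are learned as well.

    import numpy as np

    def project(point_xyz, K, R, t):
        # Pinhole projection of a world point into an image.
        # K: 3x3 intrinsics, R, t: world-to-camera rotation and translation.
        cam = R @ point_xyz + t          # world -> camera coordinates
        uvw = K @ cam                    # camera -> homogeneous pixel coordinates
        return uvw[:2] / uvw[2]          # perspective divide -> (u, v)

    def cosine_similarity(f1, f2, eps=1e-8):
        # High for similar-looking regions (surface points), low otherwise.
        return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + eps))

    # Toy stereo pair: identical intrinsics, second camera shifted along x.
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    R = np.eye(3)
    t_left, t_right = np.zeros(3), np.array([-0.2, 0.0, 0.0])

    point = np.array([0.0, 0.0, 2.0])    # hypothetical opaque surface point
    uv_left = project(point, K, R, t_left)
    uv_right = project(point, K, R, t_right)

    # Placeholder features standing in for learned CNN features at uv_left / uv_right.
    feat_left = np.random.randn(32)
    feat_right = feat_left + 0.05 * np.random.randn(32)   # surface point: similar features
    print(uv_left, uv_right, cosine_similarity(feat_left, feat_right))

For a point in free space, the two projections generally land on unrelated image content, so the extracted features disagree and the similarity is low.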



Method: To predict the pixel colors of a novel view (grey camera), we shoot a camera ray into the scene, sample points along it, and predict a color and density per sample, which are fused into a single pixel color using volumetric rendering (cf. NeRF). For the color and density prediction, we (a) project each sample into all reference views and extract point-specific CNN features there; (b) compare pairs of these features with learned similarity functions, emulating correspondence matching; (c) compute aggregated stereo features with CNNs and pool them into a single encoding of correspondence; and (d) decode this encoding into color and density.
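
As a concrete reference for the fusion step, here is a minimal PyTorch sketch of the standard NeRF-style volumetric rendering the text refers to, composing per-sample colors and densities along one ray into a single pixel color. Tensor names, shapes, and the toy inputs are illustrative assumptions; in SRF the colors and densities would come from the decoder in step (d).

    import torch

    def composite_along_ray(colors, sigmas, deltas):
        # colors: (N, 3) RGB per sample, sigmas: (N,) densities,
        # deltas: (N,) distances between consecutive samples along the ray.
        alphas = 1.0 - torch.exp(-sigmas * deltas)          # per-segment opacity
        # Transmittance: probability the ray reaches sample i unoccluded.
        trans = torch.cumprod(
            torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0)
        weights = trans * alphas                            # contribution of each sample
        return (weights.unsqueeze(-1) * colors).sum(dim=0)  # fused pixel color

    # Toy usage: 64 random samples along one camera ray.
    n = 64
    colors = torch.rand(n, 3)
    sigmas = torch.rand(n) * 5.0
    deltas = torch.full((n,), 1.0 / n)
    pixel_rgb = composite_along_ray(colors, sigmas, deltas)  # (3,) RGB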

Citation

@inproceedings{SRF,
    title = {Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes},
    author = {Chibane, Julian and Bansal, Aayush and Lazova, Verica and Pons-Moll, Gerard},
    booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {jun},
    organization = {{IEEE}},
    year = {2021},
}

Acknowledgments

Carl-Zeiss-Stiftung · Tübingen AI Center · University of Tübingen · MPII Saarbrücken


We thank the RVH group for their feedback. This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans). The project was made possible by funding from the Carl Zeiss Foundation.