NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

CVPR 2024 (Highlight)

1University of Tübingen, 2Tübingen AI Center, 3Max Planck Institute for Informatics, Saarland Informatics Campus,
4Imperial College London

Left: We present Neural Riemannian Distance Fields (NRDFs), a principled method to learn data-driven priors as subspaces of high-dimensional Riemannian manifolds. Right: NRDFs can effectively model the pose of different articulated shapes. We show diverse samples generated using NRDFs trained on human, hand, and animal poses, respectively.

Abstract

Faithfully modeling the space of articulations is a crucial task that enables the recovery and generation of realistic poses, yet it remains a notorious challenge. To this end, we introduce Neural Riemannian Distance Fields (NRDFs), data-driven priors that model the space of plausible articulations, represented as the zero-level set of a neural field in a high-dimensional product-quaternion space. To train NRDFs only on positive examples, we introduce a new sampling algorithm, ensuring that the geodesic distances follow a desired distribution, yielding a principled distance field learning paradigm. We then devise a projection algorithm to map any random pose onto the level set via an adaptive-step Riemannian optimizer, adhering to the product manifold of joint rotations at all times. NRDFs can compute the Riemannian gradient via backpropagation and, by mathematical analogy, are related to Riemannian flow matching, a recent generative model. We conduct a comprehensive evaluation of NRDF against other pose priors on various downstream tasks, i.e., pose generation, image-based pose estimation, and solving inverse kinematics, highlighting NRDF's superior performance. Beyond humans, NRDF's versatility extends to hand and animal poses, as it can effectively represent any articulation.
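To make the ingredients above concrete, here is a minimal NumPy sketch (all function names are ours, not the authors' released code). It treats a pose as per-joint unit quaternions, measures pose-space distance as the norm of per-joint geodesic angles, and perturbs a training pose along random geodesics so the resulting distance matches a draw from a chosen distribution; how the distance budget is split across joints here is a simplification, not the paper's exact sampling algorithm:

```python
import numpy as np

def quat_geodesic(q1, q2):
    # geodesic (rotation-angle) distance between two unit quaternions;
    # abs() accounts for the double cover q ~ -q of SO(3)
    dot = abs(np.clip(np.dot(q1, q2), -1.0, 1.0))
    return 2.0 * np.arccos(dot)

def pose_distance(pose_a, pose_b):
    # distance on the product manifold: norm of the per-joint geodesic angles
    return float(np.linalg.norm([quat_geodesic(a, b)
                                 for a, b in zip(pose_a, pose_b)]))

def perturb_pose(pose, dist_sampler, rng):
    # move each joint along a random geodesic so the total pose-space
    # distance equals a draw from the desired distance distribution
    target = dist_sampler(rng)
    weights = rng.dirichlet(np.ones(len(pose)))  # split squared distance over joints
    out = pose.copy()
    for j, q in enumerate(pose):
        d_j = target * np.sqrt(weights[j])       # this joint's geodesic share
        v = rng.normal(size=4)                   # random tangent direction at q
        v -= np.dot(v, q) * q
        v /= np.linalg.norm(v)
        # walking d_j/2 on the quaternion sphere yields rotation angle d_j
        out[j] = np.cos(d_j / 2.0) * q + np.sin(d_j / 2.0) * v
    return out
```

With `dist_sampler` drawing from, say, a half-Gaussian, the perturbed poses populate the level sets of the distance field at the desired rate, giving supervised (pose, distance) pairs from positive examples alone.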

Results

Pose denoising

We can denoise unrealistic poses by projecting the noisy pose onto the learned manifold. Use the sliders below to see the projection process.

[Interactive viewer: six before/after sliders, each morphing a noisy pose (left) into its projected pose on the learned manifold (right).]
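The projection step can be sketched as follows (our illustration, not the released code): descend a given distance function with a step size scaled by the current distance value, staying on the product of unit-quaternion spheres via the exponential map. The gradient is taken by finite differences for brevity, where the paper backpropagates through the network:

```python
import numpy as np

def exp_map(q, v):
    # exponential map on the unit-quaternion sphere: move from q along tangent v
    n = np.linalg.norm(v)
    if n < 1e-12:
        return q
    return np.cos(n) * q + np.sin(n) * (v / n)

def project_pose(pose, dist_fn, lr=0.3, steps=200, eps=1e-4):
    # pose: (J, 4) unit quaternions; dist_fn: pose -> estimated distance to manifold
    pose = pose.copy()
    for _ in range(steps):
        d0 = dist_fn(pose)
        if d0 < 1e-3:                 # close enough to the zero level set
            break
        for j in range(len(pose)):
            grad = np.zeros(4)        # finite-difference gradient at joint j
            for k in range(4):
                p = pose.copy()
                p[j, k] += eps
                p[j] /= np.linalg.norm(p[j])
                grad[k] = (dist_fn(p) - d0) / eps
            q = pose[j]
            g_tan = grad - np.dot(grad, q) * q   # project to tangent space at q
            n = np.linalg.norm(g_tan)
            if n > 1e-12:
                # adaptive step: scale the unit descent direction by the
                # current distance, so steps shrink near the level set
                pose[j] = exp_map(q, -(lr * d0) * g_tan / n)
    return pose
```

With `dist_fn` given by a trained NRDF, this loop corresponds to the pose denoising shown above; any smooth distance surrogate works for experimentation.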

Inverse kinematics from partial observations

Given partial observations (yellow markers), we perform 3D pose completion. We observe that VPoser-based optimization (pink) generates realistic yet fixed and less diverse poses, whereas NRDF (blue) generates diverse and realistic poses in all setups.

Monocular 3D pose estimation from images

Top: Results from SMPLer-X. Bottom: We refine the network predictions using an NRDF-based optimization pipeline. As highlighted, the refined poses align better with the observations.
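The refinement idea can be caricatured in one dimension. The hypothetical sketch below (ours, not the paper's pipeline) balances a data term pulling toward the network prediction against a squared distance-field prior pulling toward the plausible set; with an NRDF the prior gradient would come from backpropagation rather than finite differences:

```python
def refine(theta0, obs, prior_dist, lam=0.5, lr=0.1, steps=500, eps=1e-5):
    # minimize (theta - obs)^2 + lam * prior_dist(theta)^2 by gradient descent
    def loss(t):
        return (t - obs) ** 2 + lam * prior_dist(t) ** 2
    theta = theta0
    for _ in range(steps):
        # central finite-difference gradient of the composite loss
        g = (loss(theta + eps) - loss(theta - eps)) / (2.0 * eps)
        theta -= lr * g
    return theta
```

For a toy prior where joint angles with |theta| <= 1 are "plausible" (`prior = lambda t: max(0.0, abs(t) - 1.0)`), `refine(2.0, 2.0, prior)` lands between the raw prediction and the plausible set, which is exactly the trade-off the refined results above exhibit.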


Acknowledgments

Special thanks to the RVH team members and reviewers; their feedback helped improve the manuscript. The project was made possible by funding from the Carl Zeiss Foundation. This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans) and the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A. Gerard Pons-Moll is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 - Project number 390727645. This work was supported by the Engineering and Physical Sciences Research Council [grant number EP/X011364/1].

BibTeX

@inproceedings{he24nrdf,
    title = {NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors},
    author = {He, Yannan and Tiwari, Garvita and Birdal, Tolga and Lenssen, Jan Eric and Pons-Moll, Gerard},
    booktitle = {Conference on Computer Vision and Pattern Recognition ({CVPR})},
    year = {2024},
}