Research Projects | Real Virtual Humans

Figure 1. Single-view point cloud and registration using IP-Net. Given a single-view point cloud (A), IP-Net can be used to predict the missing shape and register SMPL+D (B) to it. This allows us to control the registration with novel poses (C, D).

Statistical human body models are key ingredients for studying, manipulating and animating digital humans. We proposed SMPL [1], a human body model built upon linear blend skinning. To account for body shape variations and pose-dependent shape variations, SMPL learns shape and pose blendshapes from a large number of aligned scans. SMPL has a linear formulation and is compatible with commercial rendering engines, which makes it a valuable tool for many research projects in the vision and graphics communities.
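To make the linear formulation concrete, here is a minimal NumPy sketch of shape blendshapes followed by linear blend skinning. It is an illustration under simplifying assumptions, not the SMPL implementation: pose-dependent blendshapes are omitted, joint locations are passed in rather than regressed from the shaped mesh, and all function and argument names are hypothetical.

```python
import numpy as np

def rodrigues(axis_angle):
    """Convert an axis-angle vector (3,) to a 3x3 rotation matrix."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)
    k = axis_angle / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def blend_skin(template, shape_dirs, betas, joints, parents, pose, weights):
    """Pose a mesh with shape blendshapes followed by linear blend skinning.

    template:   (V, 3) rest-pose vertices
    shape_dirs: (V, 3, B) shape blendshape basis
    betas:      (B,) shape coefficients
    joints:     (J, 3) rest-pose joint locations, parents listed before children
    parents:    length-J list with the parent index per joint, -1 for the root
    pose:       (J, 3) axis-angle rotation per joint
    weights:    (V, J) skinning weights, each row summing to 1
    """
    # Shape blendshapes: a linear offset added to the template.
    rest = template + shape_dirs @ betas

    # Forward kinematics: compose each joint's local transform with its parent's.
    world = []
    for j, parent in enumerate(parents):
        local = np.eye(4)
        local[:3, :3] = rodrigues(pose[j])
        local[:3, 3] = joints[j] if parent < 0 else joints[j] - joints[parent]
        world.append(local if parent < 0 else world[parent] @ local)

    # Express each transform relative to the rest pose (undo the joint offset).
    relative = np.stack(world)  # (J, 4, 4)
    relative[:, :3, 3] -= np.einsum("jab,jb->ja", relative[:, :3, :3], joints)

    # Blend the per-joint transforms with the skinning weights, then apply them.
    blended = np.einsum("vj,jab->vab", weights, relative)  # (V, 4, 4)
    rest_h = np.concatenate([rest, np.ones((len(rest), 1))], axis=1)
    return np.einsum("vab,vb->va", blended, rest_h)[:, :3]
```

Because every step is a matrix product or a weighted sum, the posed vertices are differentiable with respect to both shape and pose parameters, which is what makes a model of this kind convenient to fit to data.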

Realistic digital humans exhibit soft-tissue deformations as they move. In Dyna [2], we learned a low-dimensional linear subspace of soft-tissue deformations and related it to the pose coefficients of the underlying body model. Dyna significantly advances the state of the art in animation realism.
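The idea of a low-dimensional linear subspace can be sketched with a plain PCA over per-frame vertex displacements. This is only an illustration of the subspace step, assuming the displacements are already available as flattened vectors. The mapping from pose to subspace coefficients is not shown, and the function names are hypothetical.

```python
import numpy as np

def fit_linear_subspace(displacements, n_components):
    """Fit a PCA subspace to displacement fields.

    displacements: (N, D) array, one flattened displacement field per frame.
    Returns the mean field (D,) and an orthonormal basis (n_components, D).
    """
    mean = displacements.mean(axis=0)
    centered = displacements - mean
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def encode(fields, mean, basis):
    """Project displacement fields onto the subspace coefficients."""
    return (fields - mean) @ basis.T

def decode(coeffs, mean, basis):
    """Reconstruct displacement fields from subspace coefficients."""
    return mean + coeffs @ basis
```

A model in this spirit would then predict the low-dimensional coefficients from the body's pose and motion instead of predicting every vertex offset directly.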

Registering a 3D human body model to human scans is challenging because scans are noisy and have missing regions. In IP-Net [3], we used implicit functions modelled by deep neural networks to reconstruct detail-rich scan surfaces. The method can reconstruct humans in clothing from sparse point clouds or even from single-view depth images (see Figure 1). Another difficulty for learning-based registration is the lack of annotated training samples. In LoopReg [4], we formulated registration as a differentiable, end-to-end problem by diffusing the SMPL blending function to the whole 3D space. This self-supervised model requires only a small set of registered scans for a warm start, and it becomes more accurate as it processes more raw scans.
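One simple way to picture extending a surface-defined skinning function to all of 3D space is to give every query point the weights of its nearest surface vertex. The sketch below uses that nearest-neighbour rule as a stand-in. It is an assumption made for illustration and not necessarily the diffusion scheme used in LoopReg, and the function name is hypothetical.

```python
import numpy as np

def diffuse_weights(query_points, vertices, vertex_weights):
    """Assign skinning weights to arbitrary 3D points.

    query_points:   (Q, 3) points anywhere in space
    vertices:       (V, 3) body-model surface vertices
    vertex_weights: (V, J) skinning weights defined on the surface
    Returns (Q, J) weights copied from each point's nearest vertex.
    """
    # Pairwise squared distances between query points and vertices, (Q, V).
    diff = query_points[:, None, :] - vertices[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    nearest = dist2.argmin(axis=1)
    return vertex_weights[nearest]
```

With weights defined off the surface, a scan point that does not lie exactly on the model can still be posed and unposed, which is the property a registration loop over raw scans relies on.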

Figure 2. DFaust exploits consistency in texture over time intervals and deals with temporal offsets between shape and texture capture.

Applying machine learning to animation, motion prediction and motion synthesis requires a large amount of registered 4D motion data. We developed a novel 4D registration technique that exploits the temporal consistency of texture. With this approach, we collected DFaust [5], a dynamic 4D human dataset that contains 40,000 raw and aligned meshes (Figure 2). Existing marker-based human capture datasets vary in size, skeleton structure and annotation details, and hence cannot be used jointly. We therefore introduced AMASS [6], a large-scale human motion capture dataset that unifies 15 mocap datasets under the same parametrization. We achieved this by solving an optimization problem that fits the body model to sparse marker sets.
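As a heavily simplified picture of fitting a body model to sparse markers, the sketch below solves only for shape coefficients in a fixed rest pose, assuming each marker sits exactly on a known vertex. The actual optimization behind AMASS is more involved than this. The regularized least-squares setup and all names here are illustrative assumptions.

```python
import numpy as np

def fit_shape_to_markers(template, shape_dirs, marker_vertex_ids, markers, reg=1e-3):
    """Estimate shape coefficients from sparse 3D markers.

    template:          (V, 3) rest-pose vertices
    shape_dirs:        (V, 3, B) shape blendshape basis
    marker_vertex_ids: (M,) index of the vertex each marker is attached to
    markers:           (M, 3) observed marker positions
    reg:               ridge penalty keeping the coefficients small
    """
    n_betas = shape_dirs.shape[-1]
    # Linear system A @ betas = b, stacking the x, y, z rows of every marker.
    A = shape_dirs[marker_vertex_ids].reshape(-1, n_betas)      # (3M, B)
    b = (markers - template[marker_vertex_ids]).reshape(-1)     # (3M,)
    # Ridge regularization expressed as extra rows of the system.
    A_reg = np.vstack([A, np.sqrt(reg) * np.eye(n_betas)])
    b_reg = np.concatenate([b, np.zeros(n_betas)])
    betas, *_ = np.linalg.lstsq(A_reg, b_reg, rcond=None)
    return betas
```

Because a few dozen markers constrain only a small part of the surface, the low-dimensional shape space is what makes the problem well posed. A full fit would also have to estimate pose and the marker placement on the body.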

3D Human Body Registration

References