Generative Neural Articulated Radiance Fields | NeurIPS 2022

Alexander W. Bergman*, Petr Kellnhofer*, Wang Yifan*, Eric R. Chan*, David B. Lindell, Gordon Wetzstein

Unconditional generation of editable radiance field representations of human bodies.

ABSTRACT

Unsupervised learning of 3D-aware generative adversarial networks (GANs) using only collections of single-view 2D photographs has recently made significant progress. These 3D GANs, however, have not been demonstrated for human bodies, and the radiance fields generated by existing frameworks are not directly editable, limiting their applicability in downstream tasks. We propose a solution to these challenges by developing a 3D GAN framework that learns to generate radiance fields of human bodies or faces in a canonical pose and warp them using an explicit deformation field into a desired body pose or facial expression. Using our framework, we demonstrate the first high-quality radiance field generation results for human bodies. Moreover, we show that our deformation-aware training procedure significantly improves the quality of generated bodies or faces when editing their poses or facial expressions compared to a 3D GAN that is not trained with explicit deformations.

FILES

Technical paper and supplement (link to arxiv)
Code

CITATION

A.W. Bergman, P. Kellnhofer, W. Yifan, E.R. Chan, D.B. Lindell, G. Wetzstein, Generative Neural Articulated Radiance Fields, NeurIPS 2022.

@inproceedings{bergman2022gnarf,
author = {Bergman, Alexander W. and Kellnhofer, Petr and Yifan, Wang and Chan, Eric R. and Lindell, David B. and Wetzstein, Gordon},
title = {Generative Neural Articulated Radiance Fields},
booktitle = {NeurIPS},
year = {2022},
}

QUANTITATIVE RESULTS

Bodies	AIST++ FID	AIST++ PCKh@0.5	SURREAL FID	SURREAL PCKh@0.5
ENARF-GAN	–	–	21.3	0.966
EG3D+warping	66.5	0.855	163.9	0.348
GNARF	7.9	0.980	4.7	0.999

GNARF outperforms the concurrent work ENARF-GAN and a baseline that deforms a trained EG3D model by estimating the generated pose and warping ray samples into a target pose. Quality is measured with FID (lower is better), which compares how similar the generated images are to real images, and PCKh@0.5 (higher is better), which measures how closely the rendered result matches the target body pose.
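To make the PCKh@0.5 column concrete, here is a minimal sketch of how such a score is commonly computed: a keypoint counts as correct when it lies within 0.5 times the head segment length of its target. The function name, array layout, and the assumption that keypoints have already been detected in the rendered images are illustrative choices, not the evaluation code used for the table.

```python
import numpy as np

def pckh(pred, target, head_size, thresh=0.5):
    """Fraction of keypoints within thresh * head_size of their target.

    pred, target: (N, K, 2) arrays of 2D keypoints for N images.
    head_size:    (N,) array with the head segment length per image.
    """
    # Per-keypoint Euclidean distance between prediction and target.
    dist = np.linalg.norm(pred - target, axis=-1)        # (N, K)
    # A keypoint is correct if it falls inside the head-normalized radius.
    correct = dist <= thresh * head_size[:, None]        # (N, K)
    return float(correct.mean())
```

With a head size of 1.0, keypoints that are off by 0.0 and 0.4 count as correct, while those off by 0.6 and 1.0 do not.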

Faces	FID	AED	APD	ID-Consistency
EG3D+warping	22.9	0.29	0.028	0.81
PIRender	64.4	0.28	0.040	0.70
3D GAN Inversion	31.2	0.36	0.039	0.73
GNARF	17.9	0.23	0.025	0.80

Similarly, GNARF outperforms competing methods and the EG3D+warping baseline in animating facial expressions. This is measured with FID, Average Expression Distance (AED) and Average Pose Distance (APD), which measure how close the rendered image is to the target expression and pose, and ID-consistency. GNARF achieves the best FID, AED, and APD, and is competitive on ID-consistency (0.80 versus 0.81 for EG3D+warping).
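The distance-based face metrics can be sketched as follows, assuming expression or pose parameters and identity embeddings have already been estimated from the rendered and target images by some external model. The choice of the Euclidean norm for AED/APD and of cosine similarity for ID-consistency is an assumption made for illustration; the exact protocol behind the table may differ.

```python
import numpy as np

def average_param_distance(pred, target):
    """Mean Euclidean distance between (N, D) parameter vectors.

    Used here as a stand-in for AED (expression parameters)
    or APD (pose parameters).
    """
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))

def identity_consistency(emb_a, emb_b):
    """Mean cosine similarity between paired (N, D) identity embeddings."""
    a = emb_a / np.linalg.norm(emb_a, axis=-1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=-1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=-1)))
```

Lower is better for the two distance metrics, and higher is better for the similarity score.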

METHOD

Overview of our pipeline. The StyleGAN2 generator produces a tri-plane feature representation of a radiance field. The feature volume is then deformed via our Surface Field method, and the features are rendered by neural volume rendering. The image super-resolution module converts the rendering to a higher-resolution image, which is fed into the discriminator along with the camera and body poses and the raw neural-rendered image.
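The middle of this pipeline (warp, tri-plane lookup, volume rendering) can be sketched for a single ray as below. This is a simplified NumPy illustration, not the authors' implementation: the lookup uses nearest-neighbour sampling and sums the three plane features, `decode` is a hypothetical stand-in for a learned decoder, and `warp` is any function mapping posed-space points to canonical space. The generator, super-resolution module, and discriminator are omitted.

```python
import numpy as np

def sample_triplane(planes, pts):
    """planes: (3, C, R, R) features for the XY, XZ, YZ planes.
    pts: (N, 3) canonical-space points in [-1, 1]. Returns (N, C)."""
    res = planes.shape[-1]
    # Nearest-neighbour lookup (assumption; bilinear is the usual choice).
    idx = np.clip(np.rint((pts + 1.0) / 2.0 * (res - 1)).astype(int), 0, res - 1)
    xy = planes[0][:, idx[:, 1], idx[:, 0]]   # (C, N)
    xz = planes[1][:, idx[:, 2], idx[:, 0]]
    yz = planes[2][:, idx[:, 2], idx[:, 1]]
    return (xy + xz + yz).T

def decode(feat):
    """Hypothetical decoder: channel 0 -> density, channels 1..3 -> RGB."""
    sigma = np.log1p(np.exp(feat[:, 0]))          # softplus keeps density >= 0
    rgb = 1.0 / (1.0 + np.exp(-feat[:, 1:4]))     # sigmoid keeps color in (0, 1)
    return sigma, rgb

def composite(sigma, rgb, deltas):
    """Standard volume-rendering quadrature along one ray.
    sigma: (S,), rgb: (S, 3), deltas: (S,) sample spacings."""
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0), weights

def render_ray(planes, pts_posed, deltas, warp):
    """Warp posed-space samples to canonical space, then look up and render."""
    pts_canon = warp(pts_posed)
    sigma, rgb = decode(sample_triplane(planes, pts_canon))
    return composite(sigma, rgb, deltas)
```

Note that the body pose only enters through `warp`; the tri-plane features themselves always describe the canonical pose.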

HUMAN BODY RESULTS

GNARF can generate diverse 3D human body identities and animate them using a parametric model such as the SMPL skeleton and mesh.
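Driving a generated body with a mesh requires mapping each sample point from the posed space back to the canonical space. A minimal sketch of one mesh-guided warp is given below: the point is expressed relative to a nearby triangle of the posed mesh (barycentric coordinates plus a signed offset along the normal) and rebuilt on the same triangle of the canonical mesh. Picking the triangle by nearest centroid is a simplification chosen for brevity, and the function and argument names are hypothetical; this is not the authors' Surface Field code.

```python
import numpy as np

def _normal(a, b, c):
    """Unit normal of triangle (a, b, c)."""
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

def warp_to_canonical(x, posed_verts, canon_verts, faces):
    """x: (3,) posed-space point. posed_verts, canon_verts: (V, 3).
    faces: (F, 3) integer vertex indices shared by both meshes."""
    tri = posed_verts[faces]                                  # (F, 3, 3)
    # Simplification: choose the triangle whose centroid is closest to x.
    f = np.argmin(np.linalg.norm(tri.mean(axis=1) - x, axis=1))
    a, b, c = tri[f]
    n = _normal(a, b, c)
    s = np.dot(x - a, n)             # signed distance to the triangle plane
    p = x - s * n                    # projection of x onto that plane
    # Barycentric coordinates (u, v, w) of p with respect to (a, b, c).
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    u = 1.0 - v - w
    # Rebuild the point on the matching canonical triangle.
    ca, cb, cc = canon_verts[faces[f]]
    return u * ca + v * cb + w * cc + s * _normal(ca, cb, cc)
```

For a rigidly transformed mesh this warp inverts the transform exactly, because barycentric coordinates and the signed normal offset are preserved under rotation and translation.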

When compared to the baseline of animating EG3D, our method produces significantly better animated results. This is because during training, the animation and identity are explicitly factored in the generator architecture.

HUMAN FACE RESULTS

GNARF can also be applied to 3D faces and used to generate animatable models. We show that the deformation enables parametric model-driven expression editing.

RELATED PROJECTS

You may also be interested in related projects on 2D and 3D GANs, such as:

Karras et al. StyleGAN, 2019 (link)
Karras et al. StyleGAN2, 2020 (link)
Karras et al. StyleGAN3, 2021 (link)
Chan et al. pi-GAN. CVPR 2021 (link)
Chan et al. EG3D. CVPR 2022 (link)

or related projects focusing on neural scene representations and rendering from our group:

Lindell et al. BACON: Band-limited Coordinate Networks. CVPR 2022 (link)
Bergman et al. Fast Training of Neural Lumigraph Representations using Meta Learning. NeurIPS 2021 (link)
Kellnhofer et al. Neural Lumigraph Rendering. CVPR 2021 (link)
Martel et al. ACORN: Adaptive Coordinate Networks for Neural Representation. SIGGRAPH 2021 (link)
Lindell et al. Automatic Integration for Fast Neural Rendering. CVPR 2021 (link)
Sitzmann et al. Implicit Neural Representations with Periodic Activation Functions. NeurIPS 2020 (link)
Sitzmann et al. MetaSDF. NeurIPS 2020 (link)
Sitzmann et al. Scene Representation Networks. NeurIPS 2019 (link)
Sitzmann et al. Deep Voxels. CVPR 2019 (link)