
SCA 2022: Voice2Face: Audio-Driven Facial and Tongue Rig Animations with cVAEs

This is a research presentation from the Eurographics Symposium on Computer Animation (SCA 2022). Authors: Mónica Villanueva Aylagas, Héctor Anadon Leon, Mattias Teye, and Konrad Tollmar.

Download the full research paper. (5.9 MB PDF)

In this paper, we present Voice2Face, a tool that generates facial and tongue animations directly from recorded speech using machine learning.

Our approach consists of two steps: first, a conditional Variational Autoencoder (cVAE) generates mesh animations from speech; then a separate module maps those animations to rig controller space. Our contributions include an automated method for speech style control, a method to train a model with data from multiple quality levels, and a method for animating the tongue.
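The two-step pipeline above can be sketched as follows. This is a minimal illustration only, not the paper's implementation: all dimensions, weights, and names (`voice2face`, `W_rig`, etc.) are hypothetical, and the networks are reduced to single linear maps to show the data flow from audio features through a conditional latent space to mesh animation and finally rig controllers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only.
AUDIO_DIM, COND_DIM, LATENT_DIM, MESH_DIM, RIG_DIM = 64, 8, 16, 300, 50

def linear(in_dim, out_dim):
    """A stand-in for a trained network layer: a random weight matrix."""
    return rng.standard_normal((in_dim, out_dim)) * 0.01

# Step 1: a conditional VAE maps audio features (plus a style condition)
# to a mesh animation.
W_mu   = linear(AUDIO_DIM + COND_DIM, LATENT_DIM)   # encoder mean
W_logv = linear(AUDIO_DIM + COND_DIM, LATENT_DIM)   # encoder log-variance
W_dec  = linear(LATENT_DIM + COND_DIM, MESH_DIM)    # decoder

# Step 2: a separate module maps mesh animation to rig controller space.
W_rig = linear(MESH_DIM, RIG_DIM)

def voice2face(audio_feats, style):
    """audio_feats: (T, AUDIO_DIM); style: (COND_DIM,) -> (T, RIG_DIM)."""
    cond = np.tile(style, (audio_feats.shape[0], 1))
    x = np.concatenate([audio_feats, cond], axis=1)
    mu, logv = x @ W_mu, x @ W_logv
    # Reparameterization trick: sample z from N(mu, sigma^2).
    z = mu + np.exp(0.5 * logv) * rng.standard_normal(mu.shape)
    mesh = np.concatenate([z, cond], axis=1) @ W_dec  # per-frame mesh animation
    return mesh @ W_rig                               # rig controller values

# 100 frames of audio features, neutral style vector.
controls = voice2face(rng.standard_normal((100, AUDIO_DIM)), np.zeros(COND_DIM))
```

Conditioning both the encoder and decoder on the style vector is what allows speech style control at generation time, while the mesh-to-rig module keeps the learned animation decoupled from any particular character rig.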

Unlike previous works, our model generates animations without speaker-dependent characteristics, while allowing speech style control.

We demonstrate through a user study that Voice2Face significantly outperforms a state-of-the-art baseline, and our quantitative evaluation suggests that, thanks to our speech style optimization, Voice2Face yields more accurate lip closure in speech containing bilabials. Both evaluations also show that our data quality conditioning scheme outperforms both an unconditioned model and a model trained only on a smaller high-quality dataset. Finally, the user study shows a preference for animations that include the tongue.


Related News

Improving Generalization in Game Agents with Imitation Learning

SEED
Jul 16, 2024
How do we efficiently train in-game AI agents to handle new situations that they haven’t been trained on?

Towards Optimal Training Distribution for Photo-to-Face Models

SEED
Jul 8, 2024
How do we best construct game avatars from photos? This presentation discusses a work in progress with an optimized view of the training data.

Incorporating ML Research Into Audio Production: ExFlowSions Case Study

SEED
Jun 25, 2024
Mónica Villanueva and Jorge García present the challenges and lessons learned from turning a machine learning generative model from a research project into a game production tool.