StyleAvatar3D: Generating Stylized Avatars Using Image-Text Diffusion Models

The framework uses ControlNet to produce multi-view images with pose guidance.

Researchers have presented StyleAvatar3D, a new method for generating stylized 3D avatars that combines pre-trained image-text diffusion models with a GAN-based 3D generation network: the diffusion models are used to produce training data, and the GAN-based network is then trained on it.

The creators take advantage of the appearance and geometry priors provided by ControlNet to create multi-view avatar images in different styles, using poses extracted from existing 3D models to guide the image generation.
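For readers who want to experiment with a similar setup, below is a minimal sketch of pose-guided image generation with a pre-trained ControlNet in Hugging Face's diffusers library. The model IDs, the pose file name, and the prompt are illustrative assumptions; this is not the authors' actual pipeline.

```python
# Minimal sketch (not the authors' code): pose-conditioned image generation with
# a pre-trained OpenPose ControlNet via the diffusers library. The model IDs,
# the pose image file, and the prompt below are illustrative assumptions.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

# ControlNet conditioned on human poses, paired with a Stable Diffusion backbone
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# A pose map rendered from an existing 3D model (hypothetical file name)
pose_map = load_image("pose_front_view.png")

# The pose map guides the layout while the text prompt controls the style
image = pipe(
    "a stylized 3D avatar, front view, hand-painted style",
    image=pose_map,
    num_inference_steps=30,
).images[0]
image.save("avatar_front_view.png")
```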

"To address the misalignment between poses and images in data, we investigate view-specific prompts and develop a coarse-to-fine discriminator for GAN training. We also delve into attribute-related prompts to increase the diversity of the generated avatars. Additionally, we develop a latent diffusion model within the style space of StyleGAN to enable the generation of avatars based on image inputs."

The authors claim their approach "demonstrates superior performance over current state-of-the-art methods in terms of visual quality and diversity of the produced avatars."
