VLOGGER generates high-quality videos of variable length that are easily controllable through high-level representations of human faces and bodies, and it covers a broad spectrum of scenarios.
VLOGGER generates talking human videos from text and audio inputs, allowing for control over the content and tone of the video.
VLOGGER uses a stochastic human-to-3D-motion diffusion model to generate intermediate body motion controls from the input audio, governing gaze, facial expressions, and pose.
VLOGGER uses a temporal image-to-image translation model to generate the corresponding video frames, conditioned on the predicted body motion controls and a single reference image of the person.
VLOGGER generates a diverse distribution of videos of the original subject, exhibiting substantial motion while remaining realistic.
VLOGGER allows for editing existing videos, making it possible to change the expression of the subject or add new content.
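The two-stage pipeline described above (audio-conditioned motion diffusion, then temporal image-to-image rendering) can be sketched as follows. This is a minimal illustrative sketch only: the function names, class fields, sample rate, and frame rate are assumptions, not VLOGGER's actual API, which has not been released publicly.

```python
from dataclasses import dataclass
from typing import List
import random

@dataclass
class MotionControls:
    """Per-frame controls predicted by the motion stage (hypothetical fields)."""
    gaze: float               # e.g., a gaze direction parameter
    expression: List[float]   # e.g., 3D face expression coefficients
    pose: List[float]         # e.g., body pose parameters

def motion_diffusion(audio: List[float], num_frames: int,
                     seed: int = 0) -> List[MotionControls]:
    """Stage 1 (stub): stands in for the stochastic human-to-3D-motion
    diffusion model. Here we just emit seeded pseudo-random controls,
    one set per output frame."""
    rng = random.Random(seed)
    return [MotionControls(gaze=rng.uniform(-1.0, 1.0),
                           expression=[rng.random() for _ in range(3)],
                           pose=[rng.random() for _ in range(3)])
            for _ in range(num_frames)]

def temporal_image_to_image(reference_image: str,
                            controls: List[MotionControls]) -> list:
    """Stage 2 (stub): stands in for the temporal image-to-image
    translation model, rendering one frame per control set while
    conditioning every frame on the same reference image."""
    return [{"reference": reference_image, "controls": c} for c in controls]

def vlogger(reference_image: str, audio: List[float], fps: int = 25) -> list:
    """End-to-end sketch: audio -> motion controls -> video frames.
    Video length follows the audio length (variable-length generation);
    16 kHz audio is an assumption for this sketch."""
    num_frames = max(1, int(len(audio) / 16000 * fps))
    controls = motion_diffusion(audio, num_frames)
    return temporal_image_to_image(reference_image, controls)

frames = vlogger(reference_image="person.png", audio=[0.0] * 32000)
print(len(frames))  # 2 seconds of audio at 25 fps -> 50 frames
```

The point of the sketch is the data flow: audio determines the number of frames and drives the motion controls, while the reference image fixes the subject's identity across all frames.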
Generate talking human videos from text and audio inputs for use in video conferencing or virtual events.
Edit existing videos to change the expression of the subject or add new content.
Use VLOGGER to generate videos for social media or advertising campaigns.
Apply VLOGGER to generate videos for educational or training purposes.
Input text and audio to generate talking human videos.
Use the stochastic human-to-3D-motion diffusion model to generate intermediate body motion controls.
Use the temporal image-to-image translation model to generate the corresponding frames.
Edit existing videos using VLOGGER's video editing capabilities.
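The editing step above can be sketched under the assumption that edits work by masking the regions to change (e.g., the mouth) and regenerating only those regions in each frame, keeping the rest of the original video intact. The region names and dictionary-based "frames" below are purely illustrative.

```python
def edit_frame(frame: dict, mask: set, new_content: dict) -> dict:
    """Hypothetical inpainting-style edit: regions named in `mask` are
    replaced with regenerated content; all other regions are kept."""
    edited = dict(frame)
    for region in mask:
        edited[region] = new_content[region]
    return edited

def edit_video(frames: list, mask: set, new_content: dict) -> list:
    """Apply the same masked edit consistently to every frame."""
    return [edit_frame(f, mask, new_content) for f in frames]

# Toy "video": three frames with named regions instead of pixels.
video = [{"mouth": "open", "eyes": "open", "background": "office"}
         for _ in range(3)]
edited = edit_video(video, mask={"mouth"}, new_content={"mouth": "closed"})
print(edited[0])  # {'mouth': 'closed', 'eyes': 'open', 'background': 'office'}
```

In the real system the regenerated regions would come from the same diffusion-based renderer as in generation, so the edit stays temporally consistent with the untouched parts of the video.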