MoMask uses a hierarchical quantization scheme to represent human motion as multi-layer discrete motion tokens with high-fidelity details, and it outperforms prior state-of-the-art methods on the text-to-motion generation task.
Represents human motion as multi-layer discrete motion tokens with high-fidelity details.
Predicts randomly masked motion tokens conditioned on text input during training.
Learns to progressively predict the next-layer tokens from the results of the current layer.
Generates 3D human motions based on text input, leveraging the hierarchical quantization scheme.
Inpaints specific regions within existing motion clips, conditioned on a textual description.
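To make the idea of multi-layer discrete tokens concrete, here is a minimal sketch. It assumes the hierarchical scheme behaves like residual quantization, where each layer encodes whatever the previous layers left unexplained. The grid codebooks, the 2-D feature vector, and all function names are hypothetical and chosen only for illustration; they are not MoMask's actual codebooks or API.

```python
def make_grid_codebook(scale):
    """Hypothetical 2-D codebook: a 3x3 grid of entries spaced `scale` apart."""
    return [[scale * a, scale * b] for a in (-1, 0, 1) for b in (-1, 0, 1)]

def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def residual_quantize(vector, codebooks):
    """Quantize layer by layer; each layer encodes the remaining residual."""
    residual = list(vector)
    tokens = []
    for codebook in codebooks:
        index = nearest_code(residual, codebook)
        tokens.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tokens

def reconstruct(tokens, codebooks):
    """Sum the selected entry from each layer to approximate the input."""
    out = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out

def max_error(vector, approx):
    """Largest absolute per-dimension difference."""
    return max(abs(v - a) for v, a in zip(vector, approx))

# Three layers, from coarse to fine (assumed scales).
codebooks = [make_grid_codebook(s) for s in (1.0, 0.25, 0.0625)]
feature = [0.8, -0.3]  # stand-in for one frame's motion feature

tokens = residual_quantize(feature, codebooks)
coarse_error = max_error(feature, reconstruct(tokens[:1], codebooks[:1]))
full_error = max_error(feature, reconstruct(tokens, codebooks))
print(tokens, coarse_error, full_error)
```

In this toy setup the first token alone gives a coarse approximation, and each additional layer reduces the reconstruction error, which is the intuition behind keeping fine detail in the later token layers.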
Text-driven 3D human motion generation
Temporal inpainting of motion clips
Motion generation for animation and video games
Human-computer interaction and robotics
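Temporal inpainting and masked-token training can both be pictured as replacing some token positions with a placeholder that the model then fills in. The sketch below only shows how such inputs could be prepared. `MASK_ID`, the helper names, and the example token values are assumptions made for illustration, and the fixed masking ratio is a simplification rather than MoMask's actual training schedule.

```python
import random

MASK_ID = -1  # hypothetical placeholder id for a masked token

def mask_span(tokens, start, end, mask_id=MASK_ID):
    """Return a copy of `tokens` with positions [start, end) replaced by `mask_id`."""
    if not 0 <= start <= end <= len(tokens):
        raise ValueError("span must lie within the sequence")
    return tokens[:start] + [mask_id] * (end - start) + tokens[end:]

def mask_random(tokens, ratio, rng, mask_id=MASK_ID):
    """Replace a random subset of positions (about `ratio` of them) with `mask_id`."""
    count = round(ratio * len(tokens))
    positions = set(rng.sample(range(len(tokens)), count))
    return [mask_id if i in positions else t for i, t in enumerate(tokens)]

# Made-up base-layer tokens for a short motion clip.
clip_tokens = [12, 7, 7, 31, 4, 4, 19, 2]

# Inpainting: hide a contiguous region and keep the rest as context.
inpaint_input = mask_span(clip_tokens, 2, 5)

# Training-style input: hide a random half of the positions.
train_input = mask_random(clip_tokens, 0.5, random.Random(0))

print(inpaint_input)
print(train_input)
```

A text-conditioned model would then predict the tokens at the masked positions; that prediction step is outside the scope of this sketch.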
Use MoMask for text-driven 3D human motion generation
Apply MoMask to related tasks such as temporal inpainting
Fine-tune MoMask for specific domains or applications