MusicLM generates high-fidelity music from text descriptions by casting conditional music generation as a hierarchical sequence-to-sequence modeling task, and it supports conditioning on both text and melody.
Generate high-fidelity music from text descriptions, such as 'a calming violin melody backed by a distorted guitar riff'.
Condition MusicLM on both text and a melody, transforming whistled and hummed melodies according to the style described in a text caption.
Use a hierarchical sequence-to-sequence modeling approach to generate music at 24 kHz that remains consistent over several minutes.
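The hierarchical approach first predicts coarse semantic tokens from the conditioning signal, then predicts fine-grained acoustic tokens conditioned on them. Below is a minimal toy sketch of this two-stage, coarse-to-fine pipeline; the stage functions, vocabulary sizes, and token values are illustrative stand-ins, not MusicLM's actual models:

```python
import random

def semantic_stage(text_prompt, n_tokens, seed=0):
    """Toy stand-in for the first stage: map the conditioning signal to
    coarse semantic tokens (MusicLM uses autoregressive Transformers)."""
    rng = random.Random((text_prompt, seed).__hash__() if seed is None else seed)
    return [rng.randrange(1024) for _ in range(n_tokens)]  # 1024 = toy vocab

def acoustic_stage(semantic_tokens, tokens_per_semantic=4, seed=0):
    """Toy stand-in for the second stage: expand each coarse semantic token
    into several fine-grained acoustic tokens (the coarse-to-fine hierarchy)."""
    rng = random.Random(seed)
    acoustic = []
    for s in semantic_tokens:
        acoustic.extend((s * 7 + rng.randrange(16)) % 4096
                        for _ in range(tokens_per_semantic))
    return acoustic

semantic = semantic_stage("a calming violin melody", n_tokens=8, seed=42)
acoustic = acoustic_stage(semantic)
print(len(semantic), len(acoustic))  # 8 32
```

In the real model the acoustic tokens are finally decoded back to a 24 kHz waveform by a neural audio codec; the sketch stops at the token level.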
Access the MusicCaps dataset, composed of 5.5k music-text pairs, with rich text descriptions provided by human experts.
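MusicCaps is distributed as tabular metadata pairing YouTube clips with expert captions. A small self-contained sketch of reading such a file with the standard library is below; the two rows are made-up examples, and the column names (`ytid`, `start_s`, `end_s`, `caption`) are an assumption about the release layout, so check them against the actual download:

```python
import csv
import io

# Illustrative in-memory stand-in for the MusicCaps metadata file.
# The ids and captions below are invented for the example.
SAMPLE_CSV = """ytid,start_s,end_s,caption
abc123,30,40,"A calming violin melody backed by a soft pad."
def456,0,10,"An energetic distorted guitar riff with drums."
"""

# Collect (clip id, caption) music-text pairs.
pairs = [(row["ytid"], row["caption"])
         for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(len(pairs))  # 2
```

With the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle; the full dataset yields 5.5k such pairs.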
Test the diversity of the generated samples while keeping the conditioning and/or the semantic tokens constant.
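Holding the semantic tokens fixed and resampling only the acoustic stage should yield generations that share high-level structure but differ in fine detail. A toy sketch of that experiment, with an invented acoustic sampler standing in for the real model:

```python
import random

def sample_acoustic(semantic_tokens, seed):
    """Toy acoustic stage: the fine tokens depend on both the fixed
    semantic tokens and the sampling seed, so reruns differ while
    staying anchored to the same coarse structure."""
    rng = random.Random(seed)
    return tuple((s + rng.randrange(8)) % 4096
                 for s in semantic_tokens for _ in range(2))

semantic = [5, 17, 42, 99]  # held constant across all samples
samples = {sample_acoustic(semantic, seed) for seed in range(10)}
print(len(samples))  # number of distinct generations out of 10 runs
```

Counting distinct outputs is a crude diversity proxy; the MusicLM evaluation uses audio-level metrics, but the fix-the-semantic-tokens setup is the same.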
Generate music for film and video game soundtracks.
Create music for advertisements and commercials.
Produce music for live performances and concerts.
Use MusicLM for music education and composition.
Provide a text description of the desired music.
Condition MusicLM on a melody, if desired.
Adjust the model parameters to fine-tune the generation process.
Evaluate the generated music and refine the process as needed.
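The four steps above can be sketched as a single request-building workflow. MusicLM has no public API, so `generate_music` below is a hypothetical stub that only assembles the request a real text-to-music backend would receive; the parameter names are assumptions for illustration:

```python
# Hypothetical wrapper: records the generation request instead of
# calling a real model backend.
def generate_music(prompt, melody=None, temperature=1.0, duration_s=30):
    request = {"prompt": prompt, "melody": melody,
               "temperature": temperature, "duration_s": duration_s}
    return request  # a real backend would return 24 kHz audio

# Step 1: describe the music.  Step 2: optionally attach a hummed melody.
# Step 3: adjust sampling parameters.  Step 4: evaluate and refine.
req = generate_music("a calming violin melody backed by a distorted guitar riff",
                     melody="hummed_tune.wav", temperature=0.9)
print(req["temperature"])  # 0.9
```

Refinement in step 4 typically means editing the prompt or parameters and regenerating until the result matches the intent.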