Generate Music Using Meta’s MusicGen On Colab

Learn how to set up MusicGen on Colab, an advanced text-to-music model that uses artificial intelligence algorithms to generate captivating musical compositions.

By Mittal Patel · Oct. 10, 23 · Tutorial
In the vast realm of artificial intelligence, deep learning has revolutionized numerous domains, including natural language processing, computer vision, and speech recognition. However, one fascinating area that has captivated researchers and music enthusiasts alike is the generation of music using artificial intelligence algorithms. MusicGen is a state-of-the-art controllable text-to-music model that seamlessly translates textual prompts into captivating musical compositions.

What Is MusicGen?

MusicGen is a remarkable model designed for music generation that offers simplicity and controllability. Unlike existing methods such as MusicLM, MusicGen stands out by eliminating the need for a self-supervised semantic representation. The model employs a single-stage auto-regressive Transformer architecture and is trained using a 32kHz EnCodec tokenizer. Notably, MusicGen generates all four codebooks in a single pass, setting it apart from conventional approaches. By introducing a slight delay between the codebooks, the model demonstrates the ability to predict them in parallel, resulting in a mere 50 auto-regressive steps per second of audio. This innovative approach optimizes the efficiency and speed of the music generation process.
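
The delay idea described above can be sketched in plain Python. This is only an illustration, not AudioCraft's actual implementation: the helper names `apply_delay_pattern` and `steps_needed` are hypothetical, and the sketch assumes that codebook k is shifted by k steps, with 4 codebooks and 50 frames per second as stated in the paragraph above.

```python
# Illustrative sketch of a codebook "delay" interleaving pattern.
# Assumption: codebook k is shifted right by k steps; PAD marks empty slots.
PAD = None


def apply_delay_pattern(codes):
    """codes: K lists of T tokens each. Returns K rows of length T + K - 1."""
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    total_steps = num_frames + num_codebooks - 1
    delayed = []
    for k, row in enumerate(codes):
        # Pad k slots on the left and the remaining slots on the right.
        shifted = [PAD] * k + list(row) + [PAD] * (num_codebooks - 1 - k)
        assert len(shifted) == total_steps
        delayed.append(shifted)
    return delayed


def steps_needed(seconds, frame_rate=50, num_codebooks=4, flatten=False):
    """Auto-regressive steps for a clip, with or without the delay pattern."""
    frames = int(seconds * frame_rate)
    if flatten:
        # Predicting every codebook token one after another.
        return frames * num_codebooks
    # All codebooks advance together, offset by their delay.
    return frames + num_codebooks - 1


codes = [[f"c{k}t{t}" for t in range(3)] for k in range(4)]
for row in apply_delay_pattern(codes):
    print(row)

print(steps_needed(10))                # 503
print(steps_needed(10, flatten=True))  # 2000
```

Under these assumptions, a 10-second clip needs about 500 steps rather than the 2,000 that a fully flattened token sequence would require.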

MusicGen is trained on 20K hours of licensed music: an internal dataset of 10K high-quality music tracks, plus the ShutterStock and Pond5 music data.

Prerequisites:

As per the official MusicGen GitHub repo: 

  • Python 3.9
  • PyTorch 2.0.0
  • GPU with at least 16 GB of memory

Available MusicGen Models

There are 4 pre-trained models available and they are as follows:

  • Small: 300M model, Text to music only 
  • Medium: 1.5B model, Text to music only
  • Melody: 1.5B model, Text to music and text+melody to music
  • Large: 3.3B model, Text to music only
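
The list above can be kept as a small lookup table when a notebook needs to choose a checkpoint programmatically. This is a hypothetical helper for illustration only (`ModelInfo`, `MODELS`, and `models_for` are not part of AudioCraft); the names and sizes are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str              # name passed to get_pretrained
    params: str            # parameter count as listed above
    supports_melody: bool  # True if text+melody input is accepted


MODELS = [
    ModelInfo("small", "300M", False),
    ModelInfo("medium", "1.5B", False),
    ModelInfo("melody", "1.5B", True),
    ModelInfo("large", "3.3B", False),
]


def models_for(needs_melody: bool) -> list[str]:
    """Return the checkpoint names usable for the requested task."""
    return [m.name for m in MODELS if m.supports_melody or not needs_melody]


print(models_for(needs_melody=True))   # ['melody']
print(models_for(needs_melody=False))  # ['small', 'medium', 'melody', 'large']
```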

Experiments

Below is the output of conditional music generation using the MusicGen large model.

Text Input: Jingle bell tune with violin and piano
Output: (Using MusicGen "large" model)


Below is the output of the MusicGen “melody” model. We used the above audio and text input to generate the following audio.

Text Input: Add heavy drums drums and only drums
Output: (Using MusicGen "melody" model)


How to Set Up MusicGen on Colab

Make sure you are using a GPU for faster inference. Generating 10 seconds of audio took about 9 minutes on a CPU, but only 35 seconds on a GPU (T4).
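
One way to confirm that the runtime actually has a GPU attached is to look for the `nvidia-smi` tool. The sketch below uses only the standard library; the function name `gpu_visible` is hypothetical, and it assumes an NVIDIA GPU runtime such as the T4 mentioned above.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ...".
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` in Colab, the runtime type can usually be switched to a GPU from the Runtime menu before loading the model.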

Before starting, make sure Torch and TorchAudio are installed in the Colab runtime.

Install the AudioCraft library from Facebook.

!python3 -m pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft


Import necessary libraries.

from audiocraft.models import musicgen
from audiocraft.utils.notebook import display_audio
from audiocraft.data.audio import audio_write
import torch


Load the model. The list of models is as follows:

# | model types are => small, medium, melody, large |
# | size of models are => 300M, 1.5B, 1.5B, 3.3B |
model = musicgen.MusicGen.get_pretrained('large', device='cuda')

Set the parameters (optional):

model.set_generation_params(duration=60) # this will generate 60 seconds of audio.
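
Before asking for a long clip, it can help to estimate how long generation might take. The sketch below scales the timings reported earlier in this article (10 seconds of audio in 35 seconds on a T4, about 9 minutes on a CPU). It assumes runtime grows linearly with duration, which is only a rough approximation, and the function name is hypothetical.

```python
def estimate_generation_seconds(audio_seconds, measured_audio=10.0, measured_runtime=35.0):
    """Scale a measured runtime linearly to a new clip length (rough assumption)."""
    return audio_seconds * (measured_runtime / measured_audio)


# 60 seconds of audio, using the GPU (T4) figure from this article
print(estimate_generation_seconds(60))                           # 210.0
# 60 seconds of audio, using the CPU figure (about 9 minutes per 10 seconds)
print(estimate_generation_seconds(60, measured_runtime=9 * 60))  # 3240.0
```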


Conditional music generation (generate the music by providing text).

model.set_generation_params(duration=60)
res = model.generate(['Jingle bell tune with violin and piano'], progress=True)
# This will show the music controls in Colab
display_audio(res, sample_rate=32000)


To generate unconditional music:

res = model.generate_unconditional(num_samples=1, progress=True)
# This will show the music controls on the screen
display_audio(res, sample_rate=32000)


To Generate Music Continuation

To create music continuation we will need an audio file. We will feed that file to the model and the model will generate and add more music to it.

from audiocraft.utils.notebook import display_audio
import torchaudio

path_to_audio = "path-to-audio-file.wav"
description = "Jazz jazz and only jazz"

# Load audio from a file. Make sure to trim the file if it is too long!
prompt_waveform, prompt_sr = torchaudio.load(path_to_audio)
prompt_duration = 15  # seconds of the input to keep as the prompt
prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]
output = model.generate_continuation(
    prompt_waveform, prompt_sample_rate=prompt_sr,
    descriptions=[description], progress=True)
display_audio(output, sample_rate=32000)


To generate melody-conditioned music:

model = musicgen.MusicGen.get_pretrained('melody', device='cuda')
model.set_generation_params(duration=20)

melody_waveform, sr = torchaudio.load("path-to-audio-file.wav")
# Add a batch dimension: one melody for the single description below
melody_waveform = melody_waveform.unsqueeze(0)
output = model.generate_with_chroma(
    descriptions=['Add heavy drums'], melody_wavs=melody_waveform,
    melody_sample_rate=sr, progress=True)
display_audio(output, sample_rate=32000)


Write the audio file to the disk.

If you want to download the file from the Colab then you will need to write the WAV file on the disk. Here is the function that will write a WAV file onto the disk. It will take the model output as a first input and the filename as a second input.

def write_wav(output, file_initials):
    try:
        for idx, one_wav in enumerate(output):
            # audio_write appends the ".wav" extension to the given name
            audio_write(f'{file_initials}_{idx}', one_wav.cpu(), model.sample_rate,
                        strategy="loudness", loudness_compressor=True)
        return True
    except Exception as e:
        print("error while writing the file ", e)
        return None


# this will write files whose names start with "audio-file"
write_wav(res, "audio-file")
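
Once the files are on disk, they still need to be located before downloading. A minimal sketch, assuming the files follow the `<file_initials>_<idx>.wav` naming used by the function above; the helper name `written_wavs` is hypothetical.

```python
from pathlib import Path


def written_wavs(file_initials, directory="."):
    """Return sorted paths matching '<file_initials>_<idx>.wav' in `directory`."""
    return sorted(Path(directory).glob(f"{file_initials}_*.wav"))


for wav_path in written_wavs("audio-file"):
    # In Colab, each path could then be passed to
    # google.colab.files.download(str(wav_path)) to save it locally.
    print(wav_path)
```

The sort is lexicographic, so with ten or more files `audio-file_10.wav` is listed before `audio-file_2.wav`.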


Full Implementation (Google Colab File Link)

Our full implementation using Meta’s MusicGen library is available in the Colab file. Feel free to explore it and create your own music.

Conclusion

In conclusion, Audiocraft’s MusicGen is a powerful and controllable music generation model. Looking ahead, Audiocraft holds exciting future potential for advancements in AI-generated music. Whether you’re a musician or an AI enthusiast, Audiocraft’s MusicGen opens up a world of creative possibilities.


Published at DZone with permission of Mittal Patel. See the original article here.

Opinions expressed by DZone contributors are their own.
