ChatGPT finally brought AI to the masses, garnering over a million users in its first week of release in December 2022. Since then, people have put it to creative use for virtually everything, from planning meals to hosting Dungeons and Dragons nights. However, ChatGPT is, strictly speaking, a chatbot. Text flows in, text flows out.
As you’re probably aware from the influx of AI-generated media on social networks, there are also very robust algorithms that can turn text prompts into images or even videos, sometimes with striking results. Now, Google has unveiled a new system that can generate music in any genre starting from a simple text description. There’s even an option to generate music based on your humming or whistling if you can’t quite capture your idea for a song in words.
Music-making AI bots
This isn’t the first text-to-music AI that we’ve seen. However, the new system, called MusicLM, is head and shoulders above any previous attempt.
Trained on a massive database of over 280,000 hours of music, Google’s AI can combine various genres and instruments to generate surprisingly eclectic works, be they short songs or entire playlists. It’s also remarkably capable of handling more abstract requests. For instance, here’s one of the text prompts that the authors shared in their research paper:
“The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.”
And here’s what the output sounds like:
Here’s another interesting one:
“Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive.”
There’s also a story mode that generates tracks from several descriptions stitched together, which you could theoretically use to make an entire DJ set. This is useful if you want to generate a soundtrack in which different sections of the song need to evoke different feelings or play in a different style, like in this example:
One of the Google researchers really had fun with the next one, stretching the limits of MusicLM by asking it to generate a track that starts off with some jazzy vibes only to roll into pop, rap, and even death metal while staying cohesive.
Here’s a Google developer humming the main theme of the Italian protest folk song Bella Ciao:
And now here’s MusicLM reproducing the melody using a variety of instruments:
But perhaps the most interesting feature is the AI’s ability to generate soundtracks using paintings and their descriptions as prompts.
There are dozens of other sample tracks made using MusicLM posted on GitHub.
These are certainly impressive results, but don’t expect any of these songs to win a Grammy any time soon. The compositions, while entertaining and even creative at times, are littered with all sorts of artifacts that sound oddly out of place, like the seven-finger hands you sometimes see in AI-generated visual art. As for sound quality, although Google says the AI generates audio at 24 kHz, the output can sound like it was mixed and mastered by some junior sound engineer in his basement.
Despite its shortcomings, MusicLM is still pretty mind-blowing. Furthermore, it shows that neither Google nor its rival Meta, for that matter, is sitting idle while everyone is going crazy about ChatGPT. Google might even have a better chatbot than OpenAI, but it might just be keeping its cards close to its chest, waiting for the perfect moment to unveil its own work. If there’s anything that Google has shown us through its DeepMind division, it is that the company is capable of delivering extraordinary AI machines, like AlphaGo, which can steamroll the world’s best players at Go (a game several orders of magnitude more complex than chess), or AlphaFold, which predicted the structure of over 200 million proteins.
For now, MusicLM is not publicly available. The authors say that the system is not ready for public release yet, as researchers still need to iron out some glitches as well as some licensing dilemmas that may prove particularly thorny. Stability AI and Midjourney, two of the biggest names in the exploding field of AI-generated imagery, have become the target of a class action lawsuit in California filed by a group of artists who are seeking financial compensation for copyright infringement. The artists are “concerned about AI systems being trained on vast amounts of copyrighted work with no consent, no credit, and no compensation,” and Google might have a similar concern that it could get sued if it releases a public AI trained on music without the authors’ permission.