Google shows off ChatGPT-like bot that turns hums and text into music

AI is yet again redrawing the boundaries of what we call 'art'.

Tibi Puiu
January 30, 2023 @ 5:06 pm

Credit: Pixabay.

ChatGPT finally brought AI to the masses, garnering over a million users in its first week of release in December 2022. Since then, we’ve seen a ton of creative uses for it, from organizing people’s meals to hosting Dungeons and Dragons nights. However, ChatGPT is, strictly speaking, a chatbot. Text flows in, text flows out.

As you’re probably aware from the flood of AI-generated media on social networks, there are also very robust algorithms that can turn text prompts into images or even videos, sometimes with striking results. Now, Google has unveiled a new system that can generate music in any genre starting from a simple text description. There’s even an option to generate music based on your humming or whistling if you can’t really capture your idea for a song in words.

Music-making AI bots

This isn’t the first text-to-music AI that we’ve seen. However, the new system, called MusicLM, is head and shoulders above any previous attempt.

Trained using a massive database of over 280,000 hours of music, Google’s AI can combine various genres and instruments to generate surprisingly eclectic works, be they short songs or entire playlists. It’s also remarkably capable of integrating more abstract requests. For instance, here’s one of the text prompts that the authors shared in their research paper:

“The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.”

And here’s what the output sounds like:

Here’s another interesting one:

“Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive.”

There’s also a story mode that you can use to generate tracks based on several descriptions stitched together, which you could theoretically use to make an entire DJ set. This is useful if you want to generate a soundtrack in which different sections of the song need to evoke different feelings or play in a different style, like in this example:

time to meditate (0:00-0:15)
time to wake up (0:15-0:30)
time to run (0:30-0:45)
time to give 100% (0:45-0:60)

One of the Google researchers really had fun with the next one, stretching the limits of MusicLM by asking it to generate a track that starts off with some jazzy vibes only to roll into pop, rap, and even death metal while staying cohesive.

jazz song (0:00-0:15)
pop song (0:15-0:30)
rock song (0:30-0:45)
death metal song (0:45-1:00)
rap song (1:00-1:15)
string quartet with violins (1:15-1:30)
epic movie soundtrack with drums (1:30-1:45)
scottish folk song with traditional instruments (1:45-2:00)
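As a rough illustration of how such a timed prompt list is structured, the sketch below parses lines in the "description (start-end)" format shown above into segments and checks that each one starts where the previous one ends. MusicLM is not publicly available, so this does not call the model in any way; `Segment`, `to_seconds`, `parse_schedule`, and `is_contiguous` are hypothetical names made up for this example.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One story-mode entry: a text prompt and its time window in seconds."""
    prompt: str
    start: int
    end: int


# Matches lines such as "jazz song (0:00-0:15)"; the space before "(" is optional.
LINE = re.compile(r"^(?P<prompt>.+?)\s*\((?P<start>\d+:\d+)-(?P<end>\d+:\d+)\)$")


def to_seconds(stamp: str) -> int:
    """Convert "m:ss" to seconds. "0:60" is accepted and yields 60."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def parse_schedule(text: str) -> List[Segment]:
    """Parse a multi-line prompt schedule into a list of Segment objects."""
    segments = []
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if match is None:
            raise ValueError(f"unrecognized line: {line!r}")
        segments.append(
            Segment(
                prompt=match["prompt"],
                start=to_seconds(match["start"]),
                end=to_seconds(match["end"]),
            )
        )
    return segments


def is_contiguous(segments: List[Segment]) -> bool:
    """True if every segment begins exactly where the previous one ends."""
    return all(a.end == b.start for a, b in zip(segments, segments[1:]))


if __name__ == "__main__":
    schedule = """
    time to meditate (0:00-0:15)
    time to wake up (0:15-0:30)
    time to run (0:30-0:45)
    time to give 100% (0:45-0:60)
    """
    for segment in parse_schedule(schedule):
        print(segment)
```

Keeping the windows as plain seconds makes it easy to spot gaps or overlaps between sections before any audio would be generated.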

Here’s a Google developer humming the main theme of the Italian protest folk song Bella Ciao:

And now here’s MusicLM reproducing the melody using a variety of instruments:

jazz with saxophone
opera singer
tribal drums and flute

But perhaps the most interesting feature is the AI’s ability to generate soundtracks using written descriptions of famous paintings as prompts.

“His melting-clock imagery mocks the rigidity of chronometric time. The watches themselves look like soft cheese—indeed, by Dali’s own account they were inspired by hallucinations after eating Camembert cheese. In the center of the picture, under one of the watches, is a distorted human face in profile. The ants on the plate represent decay.” By Gromley, Jessica. “The Persistence of Memory”. Encyclopedia Britannica, 14 Apr. 2022.
Dali soundtrack
“Inspired by a hallucinatory experience in which Munch felt and heard a scream throughout nature, it depicts a panic-stricken creature, simultaneously corpse like and reminiscent of a sperm or fetus, whose contours are echoed in the swirling lines of the blood-red sky.” By Zaczek, Iain. “The Scream”. Encyclopedia Britannica, 14 Apr. 2022.
Munch soundtrack

There are dozens of other sample tracks made using MusicLM posted on GitHub.

These are surely impressive results, although don’t expect any of these songs to win a Grammy any time soon. The compositions, while entertaining and even creative at times, are littered with all sorts of artifacts that sound oddly out of place, like the seven-finger hands you sometimes see in AI-generated visual art. Sound quality-wise, although Google claims the AI generates files at 24 kHz, the output can sound like it was mixed and mastered by some junior sound engineer in his basement.

Despite its shortcomings, MusicLM is still pretty mind-blowing. Furthermore, it shows that neither Google nor its rival Meta, for that matter, is sitting idle while everyone is going crazy about ChatGPT. Google might even have a better chatbot than OpenAI but may just be keeping its cards close to its chest, waiting for the perfect moment to unveil its own work. If there’s anything that Google has shown us through its DeepMind division, it’s that the company is capable of delivering extraordinary AI systems, like AlphaGo, which can steamroll the world’s best players at Go (a game several orders of magnitude more complex than chess), or AlphaFold, which predicted the structures of over 200 million proteins.

For now, MusicLM is not publicly available. The authors say that the system is not ready for public release yet, as researchers still need to iron out some glitches and resolve some licensing dilemmas that may prove particularly thorny. Stability AI and Midjourney, two of the biggest names in the exploding field of AI-generated imagery, have become the target of a class action lawsuit filed in California by artists who are requesting financial reparation for copyright infringement. The artists are “concerned about AI systems being trained on vast amounts of copyrighted work with no consent, no credit, and no compensation,” and Google might have a similar concern that it could get sued if it releases a public AI trained on music without the authors’ permission.
