Google’s New AI Tool Can Turn Text Into Music

Google recently announced that it has developed a new artificial intelligence (AI) music generator that can create music in any genre from a text description.

In a research paper published on Thursday, Google researchers introduced the music-making AI tool, dubbed “MusicLM”, as a model that generates high-fidelity music from text descriptions such as “a calming violin melody backed by a distorted guitar riff”.

MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and “generates music at 24 kHz that remains consistent over several minutes.”

Further, the AI tool can produce “original” music up to five minutes long from both text and sound prompts. MusicLM was trained on a dataset of over 280,000 hours of music so that it could learn to generate coherent songs for descriptions in any desired genre.

“Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption,” the paper reads.

According to the research paper, MusicLM could create songs from richly written captions, such as:

The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.


It can also generate audio in a “story mode”, in which a sequence of shorter text prompts is provided, including:

  • Time to meditate (0:00-0:15), time to wake up (0:15-0:30), time to run (0:30-0:45), time to give 100% (0:45-0:60)
  • Electronic song played in a videogame (0:00-0:15), meditation song played next to a river (0:15-0:30), fire (0:30-0:45), fireworks (0:45-0:60)
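To make the timing in these story-mode prompts concrete, here is a minimal sketch of how such a prompt schedule could be represented. It is not MusicLM's interface, which has not been released. The names `Segment`, `build_schedule`, and `active_prompt` are hypothetical, and the only figures taken from the article are the 24 kHz output rate and the 15-second segments.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 24_000  # output rate reported for MusicLM (24 kHz)


@dataclass
class Segment:
    """One story-mode step: a text prompt and the time window it covers."""
    prompt: str
    start_s: int
    end_s: int

    @property
    def num_samples(self) -> int:
        # Number of audio samples needed to fill this window at 24 kHz.
        return (self.end_s - self.start_s) * SAMPLE_RATE_HZ


def build_schedule(prompts, segment_s=15):
    """Assign consecutive, equal-length windows to a list of prompts."""
    return [
        Segment(p, i * segment_s, (i + 1) * segment_s)
        for i, p in enumerate(prompts)
    ]


def active_prompt(schedule, t_s):
    """Return the prompt in effect at time t_s (seconds), or None if past the end."""
    for seg in schedule:
        if seg.start_s <= t_s < seg.end_s:
            return seg.prompt
    return None


schedule = build_schedule(
    ["time to meditate", "time to wake up", "time to run", "time to give 100%"]
)
```

Under these assumptions each 15-second segment corresponds to 360,000 samples, and the full one-minute clip to 1,440,000 samples.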

Google’s MusicLM can also generate music clips that capture the mood and atmosphere of paintings, working from written descriptions of some of the art world’s best-known works, including Edvard Munch’s “The Scream” and Salvador Dali’s “The Persistence of Memory”.

You can head over to Google’s GitHub page to listen to some of the music tracks generated by MusicLM.

To support future research, Google has already released MusicCaps, a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts.

Nevertheless, don’t expect MusicLM to be available to the public anytime soon, as a number of concerns need to be addressed before it can be released.

According to Google’s researchers, the AI tool has several limitations: biases learned from its training data that may lead to a lack of representation and to cultural appropriation, weak vocal quality, technical glitches, and difficulty handling negations and temporal ordering in text prompts.

In fact, during an experiment, Google’s researchers found that roughly one percent of the music examples created by MusicLM were copied directly from the songs it was trained on, raising copyright concerns.

“We acknowledge the risk of potential misappropriation of creative content associated to the use case. We strongly emphasize the need for more future work in tackling these risks associated to music generation — we have no plans to release models at this point,” the study states.


Kavita Iyer

