MusicLM: Google's Promising AI For Music Generation

AI AND INNOVATION

4/18/2023 · 4 min read

Salvador Dali - The Persistence of Memory, MusicLM Painting Conditioning

In this post, we take a brief look at "MusicLM", Google's newest AI project, which can generate music from text alone.

Do you have a creative mind, and sometimes daydream about music that doesn't exist?

Do you wish you could translate your thoughts and feelings into music because words weren't enough?

Have you ever wished you could create music just by describing it in words?

Well, with MusicLM, Google's latest AI-powered music generator, that dream may soon become a reality.

Picture this: you're sitting on your couch, thinking about a new piece of music, but you don't know how to play any instruments. Suddenly, you remember MusicLM, and with a few words, you're able to create the music you envisioned.

Sounds too good to be true?

Let's find out:

What is MusicLM?

Imagine you have a personal magical music genie that can turn your words into beautiful music.

This genie lives inside your computer (or other electronic device), and all you have to do is write down what you're thinking or feeling.

That's exactly what MusicLM is: an AI-powered music generation tool (a "Music Language Model"), still in development at Google, that can turn text into music.

How does MusicLM work?

Here's how it works: you simply describe the type of music you want to hear in words, like "a happy tune with a bouncy beat and jazzy saxophone solo".

Then, MusicLM uses its "magical" AI powers to generate a high-quality music track that matches your description.

Under the hood, MusicLM takes the text description of the music to be generated and analyzes it to understand the desired musical attributes, such as tempo, melody, rhythm, and harmony.

Then, it uses a hierarchical sequence-to-sequence modeling technique to generate the music, piece by piece.
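To make the "hierarchical, piece by piece" idea a little more concrete, here is a toy sketch in Python. It is not Google's code and it produces no audio: random numbers stand in for the trained models, and every name in it (generate_coarse, generate_fine, COARSE_VOCAB, FINE_PER_COARSE) is made up for illustration. It only shows the general shape of the approach, assuming a first stage that lays out a rough structure one token at a time and a second stage that fills in detail for each piece of that structure.

```python
import random

COARSE_VOCAB = 16     # hypothetical number of coarse "structure" tokens
FINE_PER_COARSE = 4   # hypothetical number of fine tokens per coarse token


def generate_coarse(prompt: str, steps: int) -> list[int]:
    """Stage 1: produce a short sequence of coarse tokens, one at a time."""
    rng = random.Random(prompt)  # stand-in for a text-conditioned model
    tokens: list[int] = []
    for _ in range(steps):
        previous = tokens[-1] if tokens else 0
        # Each new token depends on the previous one (an autoregressive step).
        tokens.append((previous + rng.randrange(COARSE_VOCAB)) % COARSE_VOCAB)
    return tokens


def generate_fine(coarse: list[int]) -> list[int]:
    """Stage 2: expand every coarse token into several fine-grained tokens."""
    rng = random.Random(0)  # stand-in for a second, finer model
    fine: list[int] = []
    for token in coarse:
        for _ in range(FINE_PER_COARSE):
            # The coarse token fixes the upper part of the value;
            # random detail fills in the lower part.
            fine.append(token * 16 + rng.randrange(16))
    return fine


prompt = "a happy tune with a bouncy beat and jazzy saxophone solo"
coarse = generate_coarse(prompt, steps=8)
fine = generate_fine(coarse)
print(len(coarse), "coarse tokens ->", len(fine), "fine tokens")
```

In a real system, the fine tokens would then be decoded into an audio waveform. That step is left out here on purpose.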

What makes MusicLM special?

While MusicLM is not the first AI model capable of generating music from text, it does show a lot of unique and exciting features.

You can, for example, whistle a melody, then prompt MusicLM to play that melody in a particular genre, with the instruments you want.

This is what Google calls "Text and Melody Conditioning" in its research paper.

How good is MusicLM?

The best word to describe it is: Promising!

Depending on how the tool is used, the results can range from "Pretty impressive" to "This is a little weird".

MusicLM does a great job with simpler requests, such as playing a specific instrument with a melody provided in the form of whistling or humming.

But like many other tools, it still sounds uncanny when you ask it to sing. The lyrics come out as gibberish and give that uncomfortable, creepy feeling of hearing a machine trying to imitate a real human.

As of right now, MusicLM is still in its research phase, so while it's very impressive to see what it is capable of, there is still a long way to go before we can see how something like this would work at its full potential.

When is MusicLM being released?

There is no release date available yet for MusicLM.

Google has a long record of maintaining high quality standards, so a project as big as this one is expected to take some time to reach the public.

Final Thoughts

MusicLM has the potential to revolutionize the way music is created, by allowing anyone with a creative vision to turn it into a reality, without the need for extensive musical knowledge or training.

There is a lot of debate around the morality of having AI tools generate artistic content and jeopardize the livelihoods of millions of musicians, composers, beat producers, and pretty much anyone making a living in the music and sound industry.

There is also a lot of debate around the legality of these AI models, since they are usually trained on pre-existing copyrighted music.

Also read: "AI-Generated Music - The ethical concerns for the music industry"

It is expected that Google won't rush a full public release, so that it can make sure everything is legally compliant and works as well as possible, in keeping with its long record of high standards in technology.

Be sure to check out Google's research on this, and listen to the results for yourself by going to their official research page:

MusicLM: Generating Music From Text

MusicLM - Text and Melody Conditioning

In the example above, the language model takes a small sample of some recognizable melodies, then transforms them into completely different sounds.

MusicLM can do much more than this, though.

It is also possible to describe an entire song, and divide it into sections, by providing a sequence of prompts.

This is what's called: Story Mode
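One simple way to picture a Story Mode request is as a timeline of text prompts, each covering a slice of the track. The Python sketch below is only an illustration of that idea: the Segment class, the prompt_at function, and the prompts themselves are made up for this post, and MusicLM's actual input format may look different.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    prompt: str      # text description for this part of the track
    seconds: float   # how long this part should last


def prompt_at(story: list[Segment], t: float) -> str:
    """Return the prompt that is active t seconds into the track."""
    start = 0.0
    for segment in story:
        end = start + segment.seconds
        if start <= t < end:
            return segment.prompt
        start = end
    raise ValueError(f"t={t} is outside the story")


# Made-up prompts, for illustration only.
story = [
    Segment("calm piano intro", 15),
    Segment("upbeat jazz with saxophone", 15),
    Segment("energetic rock finale", 15),
]

total = sum(segment.seconds for segment in story)
print(total, prompt_at(story, 20))
```

A generator that works through the track piece by piece could look up the active prompt at each point in time and switch styles when the timeline moves on to the next section.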

MusicLM Story Mode and Text Prompts

It's easy to see how a tool like this could turn the whole music-making industry upside-down.

Even a person with zero musical background, who doesn't know how to play any instrument, can generate music with a tool as simple as MusicLM.

But wait! It goes even deeper than this.

As with many other content-generating AI tools, like ChatGPT and Midjourney, there is a lot that can be done beyond what the tool promises to do.

So what if...

...you could describe a painting, just to hear what it "sounds like"?

MusicLM Painting Caption Conditioning

That is the concept of Painting Caption Conditioning, where MusicLM takes a description of a famous painting from Wikipedia (in the above example, "The Persistence of Memory" by Salvador Dali) and turns it into an audio version of that painting.

Depending on how we look at it, this is essentially turning paintings into sound.

Yes... technology has reached the point where we can see sound and hear images!

Imagine going to an art museum or gallery, and having a pair of speakers next to the painting, playing a musical interpretation of that particular piece of art.

It surely sounds promising!
