Google Magenta, going forward with AI-Assisted Music Production?

Google Magenta

Two years ago, Google launched Magenta, a research project that explores the role of AI in the process of creating art and music. I took a closer look at where the project currently stands, and it already has many demos showcasing how machine learning algorithms can help artists in their creative process.

I insist on the word help. In my opinion, these technologies are not created to replace artists. The goal is to enable them to explore more options, and thus potentially spark more creativity.

“Music is not a “problem to be solved” by AI. Music is not a problem, period. Music is a means of self expression. (…) What AI technologies are for, then, is finding new ways to make these connections. And never before has it been this easy to search for them.” Tero Parviainen

When you write a song, one of the first things you usually pick is which instruments you and/or your band are going to play. Right from the start, creativity is bounded by the finite number of instruments you have on hand.

That’s why today I’m sharing more about a project called NSynth. Short for Neural Synthesizer, NSynth enables musicians to create new sounds by combining existing ones in a very easy way.

You can try it for yourself with their demo website here: 

NSynth Sound Maker demo (screenshot)

Note that the sources don’t have to be musical instruments: you can, for example, create a new sound based on a pan flute and a dog 🙂

Why would you want to mix two sounds? Sure, software already enables you to create your own synthesisers, and you could just as well play two instrument samples at the same time.

Blending two instruments in this new way basically creates new sounds, much as a painter creates new colors by blending existing ones on a palette. Think of these as new sounds on your palette.

How NSynth works to generate sounds

NSynth is an algorithm that generates new sounds by combining the features of existing sounds. To do that, the algorithm takes different sounds as input. You teach the machine (a deep learning network) how music works by showing it examples. 

The technical challenge here is to find a mathematical model to represent a sound so that an algorithm can make computations. Once this model is built, it can be used to generate new sounds.
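To make the idea of "representing a sound with numbers" a bit more concrete, here is a minimal sketch in Python with NumPy. It is not NSynth code: it only shows that a digital sound is already an array of samples, and that a simple computation (a Fourier transform) can extract one of its characteristics, here the pitch. The sample rate, the duration and the synthetic 440 Hz tone are assumptions chosen for illustration.

```python
import numpy as np

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)
DURATION = 1.0       # length of the note in seconds (assumed)

# A sound is just a long list of numbers: here, a synthetic 440 Hz tone.
t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE
note = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# Once the sound is an array, an algorithm can compute on it.
# The magnitude spectrum tells us which frequencies are present.
spectrum = np.abs(np.fft.rfft(note))
freqs = np.fft.rfftfreq(note.size, d=1.0 / SAMPLE_RATE)
peak_hz = freqs[np.argmax(spectrum)]

print(note.shape, peak_hz)
```

A spectrum is a hand-crafted description of a sound. The appeal of a neural network is that it can learn its own, more compact description from examples.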

NSynth Autoencoder

The sound input is compressed into a vector by an encoder that extracts only the fundamental characteristics of the sound, using a latent space model. In our case, the sound input is reduced to a 16-dimensional numerical vector. The latent space is the space in which the data lies at the bottleneck (Z on the drawing below). In the process, the encoder ideally distills the defining qualities of both audio inputs. These qualities are then interpolated linearly to create new mathematical representations that sit between the two sounds. These new representations are then decoded into new sounds, which have the acoustic qualities of both inputs.

In a simpler version:

Figure: NSynth autoencoder diagram
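The encode, interpolate, decode steps can also be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the NSynth implementation: a random linear projection stands in for the trained deep network, and the names `encode`, `decode`, `blend` and `FRAME_LEN` are hypothetical. Only the 16-dimensional bottleneck and the linear interpolation come from the description above.

```python
import numpy as np

LATENT_DIM = 16  # size of the bottleneck vector Z mentioned in the text
FRAME_LEN = 512  # hypothetical number of audio samples per input frame

rng = np.random.default_rng(0)
# Matrix with orthonormal columns: a stand-in for learned network weights.
W, _ = np.linalg.qr(rng.standard_normal((FRAME_LEN, LATENT_DIM)))

def encode(frame):
    """Compress a frame of samples into a 16-dimensional latent vector."""
    return frame @ W

def decode(z):
    """Map a latent vector back to a frame of samples."""
    return W @ z

def blend(frame_a, frame_b, alpha=0.5):
    """Linearly interpolate two sounds in latent space, then decode."""
    z_a, z_b = encode(frame_a), encode(frame_b)
    z_mix = (1.0 - alpha) * z_a + alpha * z_b
    return decode(z_mix)

# Two synthetic inputs at the same pitch but with different timbres.
t = np.arange(FRAME_LEN) / 16000.0
smooth_tone = np.sin(2 * np.pi * 440.0 * t)
harsh_tone = np.sign(np.sin(2 * np.pi * 440.0 * t))

new_sound = blend(smooth_tone, harsh_tone, alpha=0.3)
print(new_sound.shape)
```

Because this toy encoder is linear, blending in latent space gives the same result as mixing the two waveforms before encoding them. A learned, nonlinear encoder does not have that property, which is presumably why an interpolated NSynth sound differs from two samples simply played together.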

To sum up, NSynth is an example of an autoencoder that has learned a latent space of timbre in the audio of musical notes.

Musicians can try it out on Ableton Live:

Of course, the Magenta team didn’t stop here, and I’ll be back showcasing more of their work soon!

Sources and Inspiration