AI of Things (VI): Generative Artificial Intelligence, creating music to the rhythm of perceptron

Guillermo Caminero Fernández    7 June, 2022
Robot playing piano. Photo: Possessed Photography / Unsplash

In recent years, the number of AI (Artificial Intelligence) models used to generate synthetic data has exploded. Among the best known are GANs (generative adversarial networks), which have been with us since 2014. These neural networks have been used to generate images of all kinds. They can serve creative purposes, such as producing works of art, or less ethical ones, such as generating synthetic faces for fake profiles on social networks.

AI models are also capable of ‘creating’ text. Text generation with AI has existed for a long time, using networks with memory such as LSTMs (long short-term memory). However, it was not until the appearance of Transformers in 2017 that generated text reached a quality and coherence that could pass for human writing.

Thanks to these state-of-the-art neural networks, words can be autocompleted, sentences can be finished and even novels (of dubious quality) can be written. The following website uses a Transformer to complete a sentence.

We have already mentioned that a computer today can create a high-quality image or text. We could be satisfied with creating a digital painting or writing a novel with a computer, but why not go further and create other types of data? In this post, we will look in detail at how an AI can create audio, and music in particular.

The mathematics of music

For a computer, an image is nothing more than a grid of pixels with certain intensities. Similarly, a text is a sequence of letters in a certain order. In both cases, the algorithm only has to find the right values and the right order.

Pixels are the basis of images and letters are the basis of texts. So what can we use to create music? To answer this, we will briefly look at how a loudspeaker or a headphone works. These devices convert an electrical impulse into movement, and this movement compresses the air, generating longitudinal waves. These waves, transmitted through the air, are what we know as sound. Like any wave, they can be described by a few variables: duration, intensity (amplitude), pitch (frequency) and phase.

Depending on the frequency, we hear one sound or another. For example, each note of a musical scale can be represented as a sinusoidal signal with a specific frequency. We can see an example of the waveform of these notes in the image below.

Code to create the scale in image and sound with Python using the PyAudio library: gist scale
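
As a rough sketch of the idea (not the code from the gist), a scale can be built by generating one sine wave per note and placing the notes one after another. The example below uses only the Python standard library instead of PyAudio. The sample rate, note duration, amplitude and the tuning reference (A4 = 440 Hz, equal temperament) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed value)


def midi_to_freq(midi_note: int) -> float:
    """Equal-temperament frequency in Hz, with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)


def sine_note(freq: float, duration: float = 0.25, amplitude: float = 0.5,
              phase: float = 0.0) -> list[float]:
    """Samples of amplitude * sin(2*pi*freq*t + phase) lasting `duration` seconds."""
    n = int(SAMPLE_RATE * duration)
    return [amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE + phase)
            for i in range(n)]


def to_wav_bytes(samples: list[float]) -> bytes:
    """Pack samples in [-1, 1] as 16-bit mono PCM inside a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        frames = struct.pack(f"<{len(samples)}h",
                             *(int(s * 32767) for s in samples))
        wav.writeframes(frames)
    return buffer.getvalue()


# C major scale (do, re, mi, fa, sol, la, si, do) as MIDI note numbers.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]

# Concatenate the notes one after another to form the scale.
scale = [s for note in C_MAJOR for s in sine_note(midi_to_freq(note))]
wav_data = to_wav_bytes(scale)
```

To listen to the result, the bytes in `wav_data` can be written to a file (for example with `open("scale.wav", "wb")`) and opened in any audio player.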

With these basic notes we can play scores for a single instrument or channel, but if we want more complex sounds, we can combine the notes into chords. For example, to play the chord formed by the notes do, mi and sol (C, E and G), we take the sine waves corresponding to these notes and add them together in phase. The resulting waveform is the one shown in the following image.

Code to create the chord in image and sound with Python: gist chord
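
A minimal sketch of that sum, again independent of the gist: the three sine waves are evaluated at the same instants, added together and divided by the number of notes so the result stays within [-1, 1]. The frequencies are the usual equal-temperament values for C4, E4 and G4, while the sample rate and the function name `chord` are assumptions made for this example.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed value)

# do, mi, sol (C4, E4, G4) in Hz, equal temperament with A4 = 440 Hz.
CHORD_FREQS = [261.63, 329.63, 392.00]


def chord(freqs: list[float], duration: float = 2.0) -> list[float]:
    """Sum one sine per frequency, all starting in phase, and normalise."""
    n = int(SAMPLE_RATE * duration)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        total = sum(math.sin(2 * math.pi * f * t) for f in freqs)
        samples.append(total / len(freqs))  # keep the amplitude within [-1, 1]
    return samples


c_major = chord(CHORD_FREQS)  # 2 seconds of audio samples
```

Because all three waves start with phase zero, the first sample is exactly 0 and the waveform is no longer a pure sinusoid but the superposition of the three.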

Mathematics is very present in music, and its application has been studied extensively to determine which frequencies sound “good” together (consonance) and which sound “bad” (dissonance). There is a lot of information on the internet about the analysis of these signals, and of music in general.

The following video will give you a clearer understanding of sounds, scales and how to generate different sounds with different waves.

Generating music with Artificial Intelligence

We can already see how complex it is to create a sound, and we only have 2 seconds of audio. A complete song requires a huge composition of such sounds, with different lengths and different pitches.

We might start by thinking about a reinforcement learning algorithm for this task: it would begin by offering random pitches and lengths, while a user indicates which ones they like and which they do not. Another alternative could be the use of genetic algorithms. In this case, different melodies are offered, the user selects their favourites, and the next generation is created from these.
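
The genetic approach can be sketched as follows. This is a toy illustration, not a practical composer: melodies are lists of MIDI note numbers, and the listener is replaced by a hypothetical scoring function, `smoothness`, that simply prefers small jumps between consecutive notes. In a real setting, a person's ratings would take its place. All names and parameter values are assumptions.

```python
import random

SCALE = [60, 62, 64, 65, 67, 69, 71, 72]  # C major, as MIDI note numbers


def random_melody(rng, length=8):
    return [rng.choice(SCALE) for _ in range(length)]


def smoothness(melody):
    """Hypothetical stand-in for the listener: prefers small jumps between notes."""
    return -sum(abs(a - b) for a, b in zip(melody, melody[1:]))


def crossover(rng, mother, father):
    """Child takes the start of one parent and the end of the other."""
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:]


def mutate(rng, melody, rate=0.1):
    """Replace each note with a random one with probability `rate`."""
    return [rng.choice(SCALE) if rng.random() < rate else note
            for note in melody]


def evolve(rate_melody=smoothness, generations=30, size=20, keep=5, seed=0):
    rng = random.Random(seed)
    population = [random_melody(rng) for _ in range(size)]
    history = []  # best score at the start of each generation
    for _ in range(generations):
        population.sort(key=rate_melody, reverse=True)
        history.append(rate_melody(population[0]))
        parents = population[:keep]  # the melodies the "user" liked most
        children = [mutate(rng, crossover(rng, *rng.sample(parents, 2)))
                    for _ in range(size - keep)]
        population = parents + children  # parents survive unchanged
    best = max(population, key=rate_melody)
    return best, history


best_melody, score_history = evolve()
```

Since the selected parents are carried over unchanged, the best score never gets worse from one generation to the next. With a human in the loop, each of those generations would require someone to listen to and rate every melody, which is why this approach is slow.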

With the options mentioned above, it could take days, months or years to get something we like. To make this task fast and efficient, supervised algorithms are used in combination with transfer learning. Starting from music that already exists and that we like, we can create new music that resembles it.

There are different methods for this kind of music generation. The best known are GANs (mentioned above) and VAEs (variational autoencoders).

  • GANs are trained in a competitive setting with a generator and a discriminator, each trying to beat the other. The generator produces music from random noise (fake music), and the discriminator is trained on both real music and the generator’s fake music. When the discriminator can no longer tell the real music from the generated music, we have an “almost” real music generator. At first the generator creates completely random music, but with training its output becomes more and more similar to the real music samples provided.
  • VAEs have a structure that reduces the input to a latent space and then reconstructs it. This type of music generation is widely used, and it is what OpenAI uses in Jukebox, its latest advance in music creation. Jukebox also makes use of recent neural network architectures such as Transformers (already mentioned) to achieve near-real music quality. We can find a wide range of examples of songs created with this technology on the project’s website.
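
To make the adversarial training loop more concrete, here is a deliberately tiny sketch in plain Python. It is not how a music GAN is built in practice: the "music" is a single number drawn from a normal distribution, the generator and the discriminator each have only two parameters, and the gradients are written out by hand. The non-saturating generator loss, the learning rate and every name in the code are assumptions made for illustration. Such a small model is not guaranteed to converge, so the example only shows the alternating updates.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, lr: float = 0.01, seed: int = 0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: x_fake = a * z + b
    w, c = 0.1, 0.0  # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)  # stand-in for a real music sample
        z = rng.gauss(0.0, 1.0)       # random noise fed to the generator
        x_fake = a * z + b            # the generator's "fake music"

        # Discriminator step: minimise -[log D(real) + log(1 - D(fake))].
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = -(1.0 - d_real) * x_real + d_fake * x_fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimise -log D(fake), i.e. try to fool
        # the discriminator that was just updated.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = -(1.0 - d_fake) * w  # gradient of the loss w.r.t. x_fake
        a -= lr * grad_x * z
        b -= lr * grad_x

    return (a, b), (w, c)


generator, discriminator = train_toy_gan()
```

In a real system, the two linear functions would be deep neural networks, the samples would be audio or symbolic music, and the gradients would be computed by a deep learning framework rather than by hand.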

These neural networks are computationally very expensive and difficult to train without good hardware such as a GPU or TPU. OpenAI provides sample code for generating music with this model on Google Colab. Thanks to the resources that Google provides for free, we can create a few seconds of a song in the style of our favourite singer and with the lyrics of our choice.

There are other libraries and projects aimed at creating music, such as Google Magenta, where we can find many examples of music creation using a multitude of methods, including the aforementioned GANs and VAEs.

Now, if we are able to generate music that resembles that of certain artists, with their rhythms, their bases and even a voice that could pass for theirs, how far can AI go? Is it ethically correct to use the creations of these artists to create new ones? Should the rights to the generated music belong to the original artist? Many questions arise that are likely to generate discussion, and not all of us will have the same opinion.

If you want to know more about applications of the fusion of the Internet of Things and Artificial Intelligence, known to us as AIoThings, you can read the other articles in the series.
