Here, we would like to present a "new" guest: reinforcement learning (RL). Not so new, in fact, as it has been with us since the 1980s.
We will quickly go over the classic approaches that, until now, have been sufficient to tackle the majority of machine learning problems. Their main characteristics are apparent to anyone who has had even minimal contact with this area. In supervised learning, we have labelled data (which already carry numerical values, in regression problems, or categories, in classification problems) from which the algorithm learns. In unsupervised learning, we have unlabelled data, and the objective is to discover structures or patterns within them (for example, in clustering or segmentation problems).
So where does reinforcement learning come into play? These models are used when the data to learn from are not available at the outset, either because they don't exist or because we can't wait to compile them. It could also be because the data change too quickly, and the expected output changes more frequently than typical models can keep up with.
We can better understand the concept if we go back to the origins of reinforcement learning, which lie in the study of animal behaviour. The classic example is the newborn gazelle, which manages to walk and run within a few minutes, without any previous knowledge and without anyone showing it how to use its legs. It learns by trial and error, interacting with its environment and discovering which movements are beneficial and which are not, all in pursuit of its goal, in this case driven by the will to survive.
Some examples of current problems that fit these characteristics are robot control (where a robot learns for the first time how to move) and interactive video games or classic games (where there are many possibilities and the situation is constantly changing). As with the gazelle, the objective is to maximise some notion of reward (which, depending on the game, could be killing as many zombies as possible).
Although this type of learning isn't as familiar as supervised and unsupervised learning, the key difference is clear: whereas classic methods start from data to learn from, RL algorithms generate their own data through experience, trial and error, in order to identify the best strategy or sequence of actions, using the (positive or negative) information they obtain from the environment in response to their actions as reinforcement. To summarise, RL models learn from their own experience, whilst the other models learn from examples they are given. And, most importantly, RL-based systems can keep learning from their environment without being tied to rules or models learned in the past.
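The idea of an agent that generates its own data can be illustrated with one of the simplest RL settings, the multi-armed bandit. The sketch below is a hypothetical example, not the authors' method: the function `run_bandit`, the payout probabilities and the epsilon-greedy strategy are all assumptions chosen for illustration. The agent starts with no data at all, tries actions, and uses each positive or null reward to reinforce its estimate of how good each action is.

```python
import random


def run_bandit(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical multi-armed bandit.

    The agent starts with no data: every estimate is built from rewards
    it collects by trying actions itself (trial and error).
    """
    rng = random.Random(seed)
    n_actions = len(payout_probs)
    counts = [0] * n_actions        # how many times each action was tried
    estimates = [0.0] * n_actions   # running mean reward per action
    total_reward = 0

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment returns a positive (1) or null (0) reward.
        reward = 1 if rng.random() < payout_probs[action] else 0
        total_reward += reward

        # Incremental mean: each reward strengthens or weakens the estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts, total_reward


if __name__ == "__main__":
    # Hypothetical payout probabilities, unknown to the agent.
    estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
    best = max(range(3), key=lambda a: estimates[a])
    print("estimates:", [round(e, 2) for e in estimates])
    print("times tried:", counts)
    print("best action:", best, "total reward:", total)
```

The parameter `epsilon` controls the balance between exploring untried actions and exploiting what the agent has already learned; with it set to zero, the agent could settle on a poor action simply because it never tried the others.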
To sum up, there are three main machine learning strategies: supervised learning, unsupervised learning and reinforcement learning.
The basic working scheme of a reinforcement learning model can be seen in figure 1. The agent is the RL algorithm that decides how to behave in its environment. The environment is the world in which the agent operates, representing the universe of possible situations the agent may find itself in at any given moment. The state is the agent's characterisation of the situation at a given moment. The actions are what the agent carries out in the environment. And, lastly, the reward is the feedback the environment returns for each executed action, and it is what guides the agent.
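To make the loop in figure 1 more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the hypothetical `CorridorEnv` class, its reward values, and the use of tabular Q-learning (one classic RL algorithm, not necessarily the one the authors have in mind) as the agent. The environment exposes a state, accepts an action and returns a reward; the agent uses that reward to update its estimates.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells 0..length-1.

    The agent starts in cell 0; reaching the last cell ends the episode
    with reward +1. Every other step costs -0.01.
    """

    ACTIONS = (-1, +1)  # move left, move right

    def __init__(self, length=6):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action_index):
        move = self.ACTIONS[action_index]
        # Clamp the new position to the corridor bounds.
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def q_learning(env, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = len(env.ACTIONS)
    # q[s][a]: the agent's estimate of long-term reward for action a in state s.
    q = [[0.0] * n_actions for _ in range(env.length)]

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Agent chooses an action: mostly greedy, sometimes exploratory.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            # Environment responds with the new state and a reward.
            next_state, reward, done = env.step(action)
            # The reward feeds back into the estimates (Q-learning update).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    env = CorridorEnv(length=6)
    q = q_learning(env)
    policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(env.length - 1)]
    print(policy)
```

After training, the greedy policy should choose "right" in every cell, because the small negative reward per step pushes the agent towards the shortest path to the goal. Swapping in a different environment only requires the same `reset`/`step` shape.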
Among the many possible application scenarios for this type of learning, the most notable are those that involve actions a human would take and that cannot be solved with a set of rules or traditional ML models. To mention a few: the automation of robotic processes (like the Fanuc industrial robot that learned by itself to pick objects from containers), packing materials for shipping, autonomous vehicle driving, digital marketing (where a model can learn to personalise adverts and adapt them in real time to users based on their activity), chatbots (used to understand user reactions), finance (where it can be used to evaluate trading strategies with the objective of maximising the value of financial portfolios), etc. Another typical application of RL algorithms is learning to play games, as in the case of AlphaGo, the first program to beat a human world champion at Go, the classic Chinese board game, and its successor AlphaGo Zero, which learned solely by playing against itself. The image shows a complete map of the immense possibilities of RL.
In short, there is life beyond classic models and supervised or unsupervised ML algorithms, as the variety of tasks RL can help us complete shows.
Written by Alfonso Ibañez and Ruben Granados