Deep Learning vs Atari: train your AI to dominate classic videogames (Part II)
LUCA, 22 June 2018

Written by Enrique Blanco (CDO Researcher) and Fran Ramírez (Security Researcher at Eleven Paths)

In this article, the second about our experiment using Reinforcement Learning (RL) and Deep Learning in OpenAI environments, we continue on from the previous post, which you can read here if you haven't done so already. This post presents the results obtained after training our agent in the Breakout-v0 and SpaceInvaders-v0 environments. Before continuing, you may also want to catch up on our recent webinar, in which we went into more detail about the results you will read about in this blog.

Introduction

Reinforcement Learning (RL) is the area of Machine Learning used here to train artificial intelligences to play videogames in environments developed in OpenAI Gym. It provides an agent with algorithms that allow it to examine and understand the environment it is working in, so that it can achieve an objective in exchange for a reward. These algorithms help the agent learn, through trial and error, to maximize the reward it can obtain based on the variables it observes in the game, all without human intervention.

Below, we briefly define some of the common concepts of Reinforcement Learning (RL):

- Environment: the game in which the agent must act and learn to perform.
- Reward: the incentive that the agent obtains after carrying out a given action. In the case of Breakout-v0, the agent receives a positive reward when it manages to return the ball and destroy one of the bricks.
- State: usually a tensor obtained from the observation space of the environment. In this case, the states consist of a collection of preprocessed images, prepared so as to help train the model.
- Action: a possible move in the action space that the agent can carry out, based on the current game state or the history of states it has observed. In our case, the agent can move left, move right or stay still, and shoot the ball.
- Control policy: determines how the agent chooses the action it will take. The programmer chooses the control policy when training the neural network. Typically, the agent starts by taking random actions and, once the model is sufficiently trained, acts according to the highest Q-Value estimated by the model at that point.

Figure 1: Diagram showing the learning process of an agent during training.

Beginning the Training

The algorithm used in this project, which we will explain in the following sections, aims to maximize the reward obtained at each step. The agent receives images of the game environment and feeds them into a neural network, which allows it to estimate the best action to take based on the input data. We use the TensorFlow library to build the architecture of the deep network and to perform the relevant calculations.

The values of the actions that the model estimates from a given input are usually referred to as Q-Values. If an agent knew these values beforehand, it would only have to select the action that maximizes the corresponding Q-Value for each game state it observes. In practice, however, these Q-Values have to be learned through an extensive training process, due to the large number of possible states that can occur.
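To make these concepts concrete before moving on, here is a minimal sketch (not the project's training code) of the basic interaction loop with the Breakout-v0 environment, using the classic OpenAI Gym API available at the time of writing. The random control policy is only a placeholder for the Q-Value-based policy described in the next section.

```python
import gym

# Create the Breakout environment used in this project.
env = gym.make("Breakout-v0")

state = env.reset()          # initial observation: a 210x160x3 RGB frame
total_reward = 0.0
done = False

while not done:
    # Placeholder control policy: sample a random action from the action space.
    # In the trained agent this is replaced by the action with the highest estimated Q-Value.
    action = env.action_space.sample()

    # The environment returns the next state, the reward for the action,
    # whether the episode has finished, and diagnostic info.
    state, reward, done, info = env.step(action)
    total_reward += reward

print("Episode finished with a total reward of", total_reward)
env.close()
```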
Control Policy

At the start of training, all Q-Values are zero, so the agent takes random actions in the game. Each time an action returns a positive reward (destroying a brick), the weights and biases of the model's layers are updated, so the estimation of the Q-Values becomes increasingly refined.

Reinforcement Learning techniques are often quite unstable when a deep neural network is used to approximate the map between states and actions. This is due to the nonlinearity of neural networks and to the fact that small changes in the Q-Values, under an inappropriate control policy, can drastically change the chosen action and therefore lead to very different game states. For this reason, and in order to reduce the instabilities that can arise during training, one usually trains on a random sample drawn from a large store of states, actions and rewards, so as to explore as many of the possible situations as possible and avoid divergences and blockages in the model's training.

https://gist.github.com/eblancoh/625f99a75bd8c851364899705fbadf41.js

Q-Function

The objective of the agent is to interact with the emulator in order to learn which action to take in a given game state, or set of game states, so as to maximize the reward of that action. A function that returns the optimum action given a certain game state is defined as:

Q(s, a) = reward(s, a) + γ · max(Q(s', a'))

This function is known as the Bellman Equation. It states that the value of the Q function for a given state s and action a equals the current reward r for that state s and action a, plus the expected reward derived from the best action a' available in the next state s', corrected by a discount factor γ ∈ [0, 1]. This discount hyperparameter allows us to decide how important future rewards are in relation to the current reward. Values close to γ ≃ 1 are better suited to Breakout, because the rewards are not obtained immediately after the action: various subsequent actions may take place before it becomes clear whether the initial action was successful or not. In other words, it takes several frames after bouncing the ball for a brick to break.

https://gist.github.com/eblancoh/766e5eea60db51e73a21c3ee18eca0ef.js

The Loss and Optimization Functions

Given the large number of frames per second to process and the high dimensionality of the game states, it is impractical to map the relationship between actions and states directly. This forces us to approximate the Q function from our random sample of states, rewards and actions. The loss function is usually chosen to minimize the Root Mean-Squared Error between the Q-Values produced by our model and the expected Q-Values:

sqrt(loss) = reward(s, a) + γ · max(Q(s', a')) − Q(s, a)

That is, the square root of the loss is the difference between the target Q-Value given by the Bellman Equation and the Q-Value currently estimated by the model. In order to find the minimum of this function, one can use the iterative optimization algorithm Gradient Descent. This algorithm calculates the gradient of the loss function with respect to each weight and moves the weights in the direction that reduces the loss. However, finding the minimum of a nonlinear function can be complicated, in particular because of the risk of getting stuck in a local minimum rather than the global minimum we are looking for, or of spending many iterations on a flat region of the curve.
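As a simplified illustration of this loss (a plain NumPy sketch, not the TensorFlow implementation linked in the gists above), the target Q-Value is built from the Bellman Equation and compared with the model's current estimate. The action indices and numerical values below are purely illustrative:

```python
import numpy as np

def q_learning_loss(q_current, q_next, action, reward, gamma=0.97):
    """Squared TD error for a single transition (s, a, r, s').

    q_current: Q-Values estimated by the model for state s (one value per action).
    q_next:    Q-Values estimated by the model for the next state s'.
    action:    index of the action a actually taken in s.
    reward:    reward r returned by the environment for (s, a).
    gamma:     discount factor in [0, 1]; values close to 1 weigh future rewards heavily.
    """
    # Bellman target: reward(s, a) + gamma * max(Q(s', a'))
    target = reward + gamma * np.max(q_next)

    # Difference between the target and the current estimate Q(s, a)
    td_error = target - q_current[action]

    # The loss minimized by Gradient Descent is the squared error
    return td_error ** 2

# Illustrative example with Breakout's four actions (NOOP, FIRE, RIGHT, LEFT)
q_s  = np.array([0.10, 0.05, 0.30, 0.20])   # Q(s, ·) from the model
q_s1 = np.array([0.12, 0.07, 0.25, 0.40])   # Q(s', ·) from the model
print(q_learning_loss(q_s, q_s1, action=2, reward=1.0))
```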
Optimizing a neural network is a complicated task, and one that depends heavily on the quality and quantity of the data the model is trained on. It is further complicated here by the network's architecture, which consists of more layers and has greater dimensionality than usual, and therefore requires a larger number of weights and biases.

https://gist.github.com/eblancoh/593f019e5735b810e436b7c2d25db9c9.js

Pre-Processing Input Data

Given the long computing times required, one of the main factors in training the model well is the pre-processing of the images and the nature of the input to the neural network. This also directly affects the routines that need to be developed for interacting with the environment.

In general, it is advisable to process the image generated by the Gym environment before feeding it to the model. The aim is to reduce its dimensionality by eliminating information that is not useful for training the neural network. In particular, the colour information that OpenAI Gym returns in its three colour channels does not contain valuable information for training our model, and is therefore discarded before the states are passed to the model.

The images returned by the OpenAI Gym environment are arrays of 210×160 pixels grouped in three RGB layers, which increases memory usage. It is therefore vitally important to preprocess the images in order to reduce the dimensions of the inputs, eliminate unnecessary information and reduce memory usage.

The tests carried out in this project are based on two approaches to processing the images:

- As a first approach, we take images of the game environment and process them: converting them to greyscale, resizing them, removing the background and applying a simple image filter to detect movement. The resulting state is the latest frame of the environment together with recent traces of the movement of the objects.
- As a second alternative, we use a stack of four images as the input, with the intention of allowing the model to learn to detect movement. This is necessary because an individual frame offers little information about the velocity and direction of the ball and the paddle.

We are only interested in the area of the game where the ball and paddle move and where the bricks are. The borders of the screenshots offer no valuable information to the model, so we remove those areas. Furthermore, we reduce the resolution of the image by 50% and turn it black and white (on a binary scale), since the RGB channels also offer little information of interest. A minimal sketch of these pre-processing steps is included at the end of this post.

In the next post, we will offer a description of the architecture of the model with which we have trained our agents in Breakout-v0 and SpaceInvaders-v0. We will also explain the logic of the training in greater detail, describe the testing phase and offer some conclusions about the project.

Don't miss out on a single post. Subscribe to LUCA Data Speaks.
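As mentioned in the pre-processing section, here is a minimal sketch of the second approach: greyscale conversion, cropping, roughly 50% downsampling, binarisation and stacking of the four most recent frames. It uses NumPy and OpenCV; the crop coordinates, threshold and helper names are illustrative assumptions, not the exact values used in the project.

```python
import numpy as np
import cv2

def preprocess_frame(frame, threshold=1):
    """Turn a 210x160x3 RGB frame from Gym into a small binary image."""
    # Discard the colour channels: convert to greyscale.
    grey = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)

    # Crop away the score bar and borders (row limits are illustrative).
    cropped = grey[34:194, :]

    # Reduce the resolution to roughly half the original size.
    small = cv2.resize(cropped, (80, 80), interpolation=cv2.INTER_NEAREST)

    # Keep a binary (black and white) image: anything brighter than the
    # black background becomes 1, the rest becomes 0.
    return (small > threshold).astype(np.uint8)

def stack_frames(frames):
    """Stack the four most recent processed frames so the model can infer motion.

    Assumes at least four processed frames are available; returns shape (80, 80, 4).
    """
    return np.stack(frames[-4:], axis=-1)
```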