Guest Post written by Paulo Villegas – Head of Cognitive Computing at AURA in Telefónica CDO
Modern AI can achieve the impressive performance in perception & recognition tasks mentioned in part I because of advances in several areas: algorithm improvements (especially in the area of Deep Learning), increases in computing power (cloud computing, GPUs) and, very notably, the Internet.
The Internet is what made it possible to amass the 14 million example images available in the ImageNet database mentioned in the previous part. In the case of supervised learning, it makes available the millions of annotated examples that algorithms need to learn from. It gives a new perspective to Newton’s quote “standing on the shoulders of giants”, transforming it into “standing on the shoulders of millions of dwarfs”.
Another example is given by the famous AlphaGo case. How did AlphaGo beat Go world champions? One answer is by training with far more examples than human masters can possibly manage in their lifetime. It used 30 million moves from 160,000 actual games in a database. Then it improved by using reinforcement learning (a branch of machine learning in which a system learns by optimizing a reward function, in this case winning the game), playing against itself over tens of millions of additional positions.
|Figure 1. ImageNet alone has over 14 million examples of images|
Its recent successor, AlphaGo Zero (or the even more recent AlphaZero, which can also play chess and shogi), seems to have overcome this restriction: it learns without data. The only information provided is the game rules, and it then uses reinforcement learning to improve its playing abilities. However, it works by playing against itself: AlphaGo Zero played almost 5 million games against itself during its initial training. You could argue that this does not actually change the picture: what its creators have achieved is a very clever way to generate synthetic data (the games of AlphaGo Zero against itself) to train it, but the amount of data needed is still huge. Nothing beats practice.
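To make the "rules in, synthetic data out" idea concrete, here is a minimal sketch of self-play on a toy game. It is not the AlphaGo Zero algorithm (which combines a deep neural network with tree search); it is a simple tabular Monte Carlo learner for a small Nim variant, and the game, the constants and the function names are all assumptions chosen for illustration. The only input is the rules; every training example comes from the program playing against itself.

```python
import random
from collections import defaultdict

random.seed(0)
START = 10     # assumed rule: the game starts with 10 stones
MAX_TAKE = 3   # assumed rule: take 1 to 3 stones per turn; taking the last stone wins

value = defaultdict(float)   # (stones_left, move) -> average outcome for the player moving
visits = defaultdict(int)    # (stones_left, move) -> how often that move was tried

def legal_moves(stones):
    return range(1, min(MAX_TAKE, stones) + 1)

def choose(stones, epsilon):
    """Epsilon-greedy: mostly pick the best-known move, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(list(legal_moves(stones)))
    return max(legal_moves(stones), key=lambda m: value[(stones, m)])

def self_play_game(epsilon=0.2):
    """Both sides share the same value table, so the program plays against itself."""
    stones, history = START, []
    while stones > 0:
        move = choose(stones, epsilon)
        history.append((stones, move))
        stones -= move
    # The player who made the last move won. Walk backwards through the game,
    # alternating the reward between the two players (+1 win, -1 loss).
    outcome = 1.0
    for key in reversed(history):
        visits[key] += 1
        value[key] += (outcome - value[key]) / visits[key]   # running average
        outcome = -outcome
    return history

# The synthetic dataset: thousands of games nobody had to collect or annotate.
games = [self_play_game() for _ in range(5000)]
```

Even for a game this small, the learner goes through thousands of self-generated games, which is the point made above: the data is synthetic, but it is still a lot of data.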
But although collecting great amounts of data is one of the reasons for the recent advances in Machine Learning results, it can only take you so far: there is almost always a limit to the amount of training data we can obtain. And for certain tasks it is inherently difficult to come up with enough good examples. One way humans cope with that is by using transfer learning, by which knowledge learned in one task, domain or class of data can be reused in another context.
|Figure 2. You have likely never seen a babirusa (Babyrousa celebensis) before, so there is no specific training in your brain for recognizing it. However, you have generic training for recognizing other animal shapes and parts, so by watching (and remembering) this single image, next time you see another picture of a babirusa you will recognize it. Thanks, transfer learning. (Source: By Masteraah at German Wikipedia)|
Deep Learning systems can use transfer learning too. A typical approach is to employ a pre-trained network (or parts of it) for a different task. For instance, a big deep net trained for image recognition commonly starts with a few convolutional layers, which together learn a representation of the input data (the image) in a higher-level latent space. Additional layers then perform the desired task (e.g. classification). We could take the first layers of the trained net, hence taking advantage of the learned representation, stack new layers on top of them, and train the resulting network for a different task. Given that a big share of the new net has already been pre-trained in the original net, and assuming the representation is also good for the new task, training time can be greatly reduced and far fewer examples are needed to achieve good results. We have therefore performed transfer learning from the original network to the new one.
This approach (using a pre-trained network for a new task) is already well established as a standard procedure for image classification, though it still requires datasets of some size for the new task. Those requirements could be further reduced by using techniques such as meta-learning, in which the system learns the best procedure to learn the new task.
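The freeze-and-restack recipe just described can be sketched in a few lines. This is a toy illustration in NumPy, not a real image network: the "base" is a single dense layer standing in for the convolutional layers, both tasks are synthetic, and all names (`forward`, `train`, `freeze_base`, the data arrays) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(X, W_base, w_head):
    H = np.maximum(X @ W_base, 0.0)            # base layer: the learned representation
    p = 1.0 / (1.0 + np.exp(-(H @ w_head)))    # head: task-specific classifier
    return H, p

def train(X, y, W_base, w_head, freeze_base, lr=0.1, epochs=300):
    """Plain gradient descent on cross-entropy; optionally keep the base fixed."""
    W_base, w_head = W_base.copy(), w_head.copy()
    for _ in range(epochs):
        H, p = forward(X, W_base, w_head)
        err = (p - y) / len(y)                 # gradient of the mean loss w.r.t. the logits
        grad_head = H.T @ err
        if not freeze_base:
            grad_H = np.outer(err, w_head) * (H > 0)   # backpropagate through the ReLU
            W_base -= lr * (X.T @ grad_H)
        w_head -= lr * grad_head
    return W_base, w_head

# Original task: plenty of labelled examples, so the whole network is trained.
X_src = rng.normal(size=(2000, 4))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(float)
W0 = rng.normal(scale=0.5, size=(4, 8))
h0 = rng.normal(scale=0.5, size=8)
W_pre, _ = train(X_src, y_src, W0, h0, freeze_base=False)

# New task: only 40 examples. Reuse the pre-trained base, train just a new head.
X_tgt = rng.normal(size=(40, 4))
y_tgt = (X_tgt[:, 0] - X_tgt[:, 1] > 0).astype(float)
W_after, h_tgt = train(X_tgt, y_tgt, W_pre, np.zeros(8), freeze_base=True)
_, p_tgt = forward(X_tgt, W_after, h_tgt)      # predictions of the transferred model
```

Only the eight weights of the new head are updated in the second call; the base weights come out exactly as they went in. Whether the reused representation actually helps depends, as noted above, on how well it suits the new task.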
There is also a more extreme version of transfer learning called zero-shot learning. In this modality, it is possible to correctly identify classes without having ever seen a single instance of them. How can we achieve that? It may be possible by domain transfer (a variant of transfer learning). If I say that “a maned wolf is an animal similar to a fox but with unusually long legs”, then you may be able to identify which of the three animals in the following figure is a maned wolf without having seen a single one before.
|Figure 3. Zero-shot learning: pick the maned wolf, please. (Source image 1, Source image 2, Source image 3)|
There are also machine learning techniques developed to perform zero-shot learning. Some of them employ the same domain transfer technique: by adequately mapping between visual features of objects and word semantic representations (extracted from text corpora), a system trained with dogs, frogs, fish, etc. but not with cats might be able to find cat instances, by evaluating how "cat" relates to other terms in the word domain and mapping those relations to the visual domain. This is not very different from what a human would do.
In summary, the overall learning process used by modern machine learning systems might not be that far from what human brains do. And going beyond standard supervised learning, there is a growing set of tools and procedures, such as transfer learning or zero-shot learning (but also reinforcement learning or unsupervised learning), that might increase their capabilities for identifying patterns and entities in the reality around us, which is a great part (though by no means all) of our cognitive baggage as humans. At least concerning perception.
First post of this series: The AI Hunger Games: Why is modern AI so data hungry? (I)
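As a rough sketch of that mapping idea, the toy example below learns a linear map from "visual" features to a small semantic space using only dogs, frogs and fish, and then asks it to name a cat. Everything here is an assumption for illustration: the three-dimensional "word vectors" are hand-made rather than extracted from a text corpus, the visual features are synthetic, and a least-squares fit stands in for whatever mapping a real system would learn.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical semantic vectors, dimensions: (furry, aquatic, small).
semantic = {
    "dog":  np.array([1.0, 0.0, 0.0]),
    "frog": np.array([0.0, 1.0, 1.0]),
    "fish": np.array([0.0, 1.0, 0.0]),
    "cat":  np.array([1.0, 0.0, 1.0]),   # known as a word, never seen as an image
}
seen = ["dog", "frog", "fish"]

# Stand-in for an image feature extractor: visual features are a noisy
# linear function of the underlying semantics (an assumption of this toy).
A = rng.normal(size=(3, 6))

def visual_features(label, n):
    return semantic[label] @ A + rng.normal(scale=0.02, size=(n, 6))

# Training set: images of seen classes only, paired with their word vectors.
X = np.vstack([visual_features(c, 50) for c in seen])
S = np.vstack([np.tile(semantic[c], (50, 1)) for c in seen])

# Learn the visual -> semantic mapping by least squares.
W, *_ = np.linalg.lstsq(X, S, rcond=None)

def classify(x):
    """Project an image into the semantic space and pick the nearest word."""
    s = x @ W
    return min(semantic, key=lambda c: np.linalg.norm(semantic[c] - s))

cat_image = visual_features("cat", 1)[0]
predicted = classify(cat_image)
```

The candidate labels in `classify` include "cat" even though no cat image was used to fit `W`: the unseen class is reachable only because its word vector relates to those of the seen classes.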