Modern Artificial Intelligence is performing human-like tasks that seemed out of reach just a few years ago. Granted, we are talking about narrow AI (tasks involving only a small subset of human capabilities) – general AI is still far away. But on those narrow tasks we are seeing spectacular advances. The most salient results are in perception: visual perception (image recognition, which in tasks such as large-scale object recognition is reaching human-like performance) and audio perception (speech recognition is also achieving unprecedented results). Other noteworthy results have made headlines too, such as Google’s AlphaGo beating Go champions. There are also initial forays into ‘artistic’ tasks such as imitating painting styles or composing music.
Many of these advances are tightly related to developments in a particular area of Machine Learning – Deep Learning (statistical learning performed by neural networks with many layers). Deep Learning is achieving impressive results in many areas thanks to its versatility and to the capability of modern networks to be trained very efficiently. But there is a catch: to achieve this almost magical performance, a deep learning model typically needs to be trained with lots of data.
A typical example is ImageNet, the image database commonly used to train deep learning classifiers for object recognition. ImageNet is large: it contains more than 14 million images, distributed into nearly 22,000 different classes (each class grouping the images that contain instances of a given entity). In order to recognize, say, cats in images, we could gather the cat images in ImageNet and train a deep neural network with them (along with a varied and sizable collection of images that are not cats). How many images would we use? Including subcategories (such as Siamese cat, Angora cat, etc.), there are 22,387 cat images in ImageNet. That’s indeed a lot of cats.
Figure 1. Everybody likes cat pictures, so here we go: the “cat” class in ImageNet
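To make this concrete, here is a minimal sketch (in Python with TensorFlow/Keras) of how such a cat vs. not-cat training set could be assembled. The `data/cat` and `data/not_cat` folders, the image size and the batch size are all illustrative assumptions, not ImageNet’s actual layout:

```python
import tensorflow as tf

# Hypothetical layout: "data/cat/" holds the ~22,000 ImageNet cat images,
# "data/not_cat/" a varied collection of images without cats.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data",                  # one sub-folder per class
    label_mode="binary",     # cat vs. not-cat
    image_size=(224, 224),   # resize every image to a fixed shape
    batch_size=32,
)
print(train_ds.class_names)  # e.g. ['cat', 'not_cat']
```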
Modern AI is largely about statistical pattern recognition, and that is a huge difference from older (“classic”) AI, which was highly symbolic. Back at the beginning of Artificial Intelligence research it was believed that AI was the realm of logic and reasoning. Good Old-Fashioned Artificial Intelligence (abbreviated to GOFAI) was all about establishing rules and reasoning over them, trying to emulate the higher levels of human thinking. This did not work. For perception tasks, GOFAI failed miserably, incapable of coping with the sheer variety of reality, which is full of noisy instances not really suited to “sharp” reasoning.
Nowadays it is accepted that tasks related to perception (making sense of the world around us) are much better solved with statistical machine learning, by training systems with real examples from that world. But why do we need so many? Do humans need that many examples to recognize things?
I don’t have data at hand, but it seems unlikely that a human child needs to see 22,000 labeled instances of cats (i.e. her parents and teachers showing her 22,000 cats and telling her that each one is a cat) before she can recognize one. Of course, since each cat is often presented to the child not as a still image but as a living animal, she can probably see it from different angles and in motion, which aids recognition. But still: humans seem to need far fewer examples to be able to recognize things.
However, that comparison is not fair.
A Deep Neural Network prepared for visual recognition starts as a blank slate. Yes, we fix its topology (the number and shape of the neuron layers, the activation functions, etc.) and the training procedure (mini-batches, dropout, momentum, etc.). But the network parameters (the neuron weights, of which a big deep learning network can have millions) start out uninitialized, or with random initial values.
From then on, the network is shown all the labeled training data (including our 22,000 cats) many times; after each training epoch the system has learned a bit more from the examples seen and (usually) improves its recognition performance, step by step, until it reaches its final (impressive) accuracy.
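As a hedged illustration of both points – the topology we fix but the weights start random, and the epoch-by-epoch training – here is a minimal Keras sketch. The layer sizes, the dropout rate and the `data` folder from the earlier sketch are all illustrative assumptions, not a description of an actual ImageNet-scale setup:

```python
import tensorflow as tf

# We choose the topology (layers, activations) and the training recipe (dropout, batches)...
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, activation="relu", name="conv1"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),                    # part of the training procedure we fix
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output: cat / not-cat
])

# ...but the weights themselves start out as random numbers: nothing learned yet.
print(model.get_layer("conv1").get_weights()[0][0, 0, 0])

# Same hypothetical folder layout as in the earlier sketch.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", label_mode="binary", image_size=(224, 224), batch_size=32)

# Each epoch shows the network the whole labeled training set once;
# accuracy (usually) improves from one epoch to the next.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```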
A human brain is nothing like a blank slate.
Instead, it comes preconfigured with a lot of wiring. This wiring is encoded in the human genome, which contains the instructions for building the brain. Those instructions have been shaped by millions of years of natural selection: evolution has selected the brain wirings that made us fitter for survival, and these amount to a huge number of “training epochs” for our neural circuits. We are not born with “random weights” in our brains, but with structures already well prepared for perception tasks. Among them, the recognition of cat-like animals – an ability likely to be very useful for survival, and hence a good trait to acquire through natural selection.
Figure 2. Human brains better come pre-trained for fast cat identification … or else
And in that sense we could argue that the human brain comes pre-trained by evolution. That gives our neural wiring a great advantage over current AI systems, which have to make up for it by using many examples.
In the next part we will talk about some ways to mitigate this data appetite of Deep Learning systems.