The 2 types of learning in Machine Learning: supervised and unsupervised
AI of Things 23 July, 2019

We have already seen in previous posts that Machine Learning techniques basically consist of automating, through specific algorithms, the identification of patterns or trends that "hide" in the data. It is therefore very important not only to choose the most suitable algorithm (and to tune its parameters for each particular problem), but also to have a large volume of data of sufficient quality.

Selecting the algorithm is not easy. If we look it up on the internet, we can find ourselves buried in an avalanche of very detailed articles that sometimes confuse us more than they help. So we are going to try to give some basic guidelines to get started.

There are two fundamental questions we must ask ourselves. The first is:

What do we want to do?

To answer this question, it may help to reread two posts we published earlier on our LUCA blog, "The 9 tasks on which to base Machine Learning" and "The 5 questions which you can answer with Data Science". The crux of the matter is to define the objective clearly.

To solve our problem, then, we must consider what kind of task we will have to undertake. It may be, for example, a classification problem, such as deciding whether a message is spam or not; or a clustering problem, such as recommending a book to a customer based on their previous purchases (as in Amazon's recommendation system). We may also want to estimate, for example, how much a customer will use a particular service. In that case we would be facing a regression problem (estimating a value).

If we consider the classic customer retention problem, we see that we can approach it in different ways. We want to segment customers, yes, but which strategy is best? Is it better to treat it as a classification problem, a clustering problem, or even a regression problem? The key clue comes from asking ourselves the second question.
What information do I have to achieve my objective?

If I ask myself, "Do my customers group together naturally in any way?", I have not defined any target for the grouping. However, if I phrase the question differently, "Can we identify groups of customers with a high probability of cancelling the service as soon as their contract ends?", we have a perfectly defined goal: whether the customer will leave. We want to take action based on the answer we get.

The first case is an example of unsupervised learning, while the second is supervised learning. In the early stages of the Data Science process, it is very important to decide whether the "attack strategy" will be supervised or unsupervised and, if it is supervised, to define precisely what the target variable will be. Depending on that decision, we will work with one family of algorithms or another.

Supervised Learning

In supervised learning, algorithms work with "labelled data", trying to find a function that, given the input variables, assigns them the appropriate output label. The algorithm is trained on "historical" data and thus "learns" to assign the appropriate output label to a new value; that is, it predicts the output value.

For example, a spam detector analyses the history of messages to find a function that, based on the input parameters defined (the sender, whether the recipient is an individual or part of a list, whether the subject contains certain terms, etc.), reproduces the assignment of the "spam" or "not spam" label. Once this function is defined, when a new unlabelled message comes in, the algorithm is able to assign it the appropriate label.

Supervised learning is often used in classification problems, such as digit recognition, diagnosis, or identity fraud detection. It is also used in regression problems, such as weather forecasting, life expectancy estimation, or growth prediction.
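To make the spam example more concrete, here is a minimal sketch of a supervised classifier: a word-count Naïve Bayes model in plain Python. It is an illustration under stated assumptions, not how any particular spam filter works. The training messages and labels are invented, `train_naive_bayes` and `predict` are hypothetical names, and the only input features are the words of each message (not the sender or recipient list mentioned above).

```python
import math
from collections import Counter


def train_naive_bayes(messages, labels):
    """Learn per-label word counts from labelled ("historical") messages."""
    word_counts = {label: Counter() for label in set(labels)}
    label_counts = Counter(labels)  # used for the prior probability of each label
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)  # iterating a Counter yields its words
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Assign the label with the highest log-probability to a new, unlabelled message."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior: share of training messages carrying this label.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so an unseen word does not zero the probability.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled history
messages = [
    "win a free prize now",
    "free money click now",
    "cheap offer win cash",
    "meeting agenda for monday",
    "project report attached",
    "lunch with the team on monday",
]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

model = train_naive_bayes(messages, labels)
print(predict(model, "free cash prize"))        # prints: spam
print(predict(model, "monday meeting report"))  # prints: not spam
```

The labelled history plays the role of the training data. Calling `predict` on a message the model has never seen is the "assign the label to a new value" step described in the text.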
These two main types of supervised learning, classification and regression, are distinguished by the type of target variable. In classification, the target variable is categorical, while in regression it is numeric.

Although we discuss the different algorithms in more detail in other posts, here are some of the most common:

1. Decision trees
2. Naïve Bayes classification
3. Least-squares regression
4. Logistic regression
5. Support Vector Machines (SVM)
6. "Ensemble" methods (combinations of classifiers)

Unsupervised Learning

Unsupervised learning occurs when no "labelled" data is available for training. We only know the input data; there is no output data corresponding to a given input. Therefore, we can only describe the structure of the data, trying to find some kind of organization that simplifies the analysis. These methods are thus exploratory in nature.

For example, clustering tasks look for groupings based on similarities, but there is no guarantee that these groupings will have any meaning or usefulness. Sometimes, when exploring data without a defined goal, we may find curious but useless spurious correlations. For example, in the graph below, published on Tyler Vigen's Spurious Correlations website, we can see a strong correlation between per capita chicken consumption in the United States and US oil imports.

Figure 1: Example of a spurious correlation

Unsupervised learning is often used in clustering, co-occurrence grouping, and profiling problems. However, problems that involve finding similarity, link prediction, or data reduction can be either supervised or unsupervised.

The most common types of algorithms in unsupervised learning are:

1. Clustering algorithms
2. Principal component analysis (PCA)
3. Singular value decomposition (SVD)
4. Independent component analysis (ICA)

Which algorithm to choose?
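As a very rough first step in that choice, the rule of thumb from the previous sections (is there a target variable, and is it categorical or numeric?) can be written as a tiny function. This is only a sketch: `suggest_task` is a hypothetical helper, and the customer-retention targets in the example are invented. Real projects weigh many more factors.

```python
from numbers import Number


def suggest_task(target_values=None):
    """Rough rule of thumb: map the target variable (if any) to a task family.

    target_values: the known outputs for the training rows, or None when
    the data has no labels at all.
    """
    if not target_values:
        # No labelled outputs: we can only explore the structure of the inputs
        # (clustering, PCA, SVD, ICA...).
        return "unsupervised"
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool)  # bool is a subclass of int
        for v in target_values
    )
    # Numeric target -> estimate a value (regression);
    # categorical target -> assign a label (classification).
    # Caveat: classes encoded as integers (0/1) would be misread as numeric here.
    return "regression" if is_numeric else "classification"


# Hypothetical targets for the customer-retention example
print(suggest_task(["leaves", "stays", "stays"]))  # classification
print(suggest_task([120.5, 80.0, 310.2]))          # regression (e.g. monthly usage)
print(suggest_task(None))                          # unsupervised
```

The same customer data can land in any of the three branches depending on which question we ask of it, which is why defining the objective comes first.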
Once we are clear whether we are dealing with a supervised or an unsupervised learning case, we can use one of the well-known algorithm "cheat sheets" to help us choose which algorithm to start working with. As an example, we include one of the best known, from scikit-learn, but there are many more, such as the Microsoft Azure Machine Learning Algorithm Cheat Sheet.

Figure 2: scikit-learn algorithm selection cheat sheet

So, what is reinforcement learning?

Not all ML algorithms can be classified as supervised or unsupervised learning algorithms. There is a "no man's land" where reinforcement learning techniques fit. This type of learning improves the model's response through a feedback process. It draws on studies of how humans and rats learn through rewards and punishments.

The algorithm learns by observing the world around it. Its input information is the feedback it receives from the outside world in response to its actions. The system therefore learns by trial and error.

It is not a type of supervised learning, because it is not strictly based on a set of labelled data, but on monitoring the response to the actions taken. Nor is it unsupervised learning, since when we model our "apprentice" we know in advance what the expected reward is.
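The trial-and-error loop can be illustrated with the simplest reinforcement learning setting: a multi-armed bandit solved with an epsilon-greedy strategy. This is a minimal sketch, assuming a made-up environment with three hypothetical actions. Their reward probabilities (0.2, 0.5 and 0.8) are hidden from the agent, which only sees the reward returned after each action. The function name and parameter values are illustrative.

```python
import random


def epsilon_greedy_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off best.

    reward_probs: hidden probability that each action returns a reward of 1.
    The agent never reads these values; it only sees the rewards it receives.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    n_actions = len(reward_probs)
    counts = [0] * n_actions       # how many times each action was tried
    estimates = [0.0] * n_actions  # running mean reward observed per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            # Exploit: pick the action with the best estimated reward so far.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # Feedback from the environment: reward 1 with the action's hidden probability.
        reward = 1 if rng.random() < reward_probs[action] else 0
        counts[action] += 1
        # Incremental update of the mean reward for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated reward per action:", [round(e, 2) for e in estimates])
print("times each action was tried:", counts)
print("best action found:", best_action)
```

The `epsilon` parameter sets the balance between exploring actions at random and exploiting the best one found so far. Full reinforcement learning problems add states and delayed rewards on top of this basic reward-driven loop.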