You are less rational than you think when you make decisions under uncertainty

ElevenPaths    8 November, 2018
Let me propose the following game of chance:
 
  • Option A: I give you 1,000 € with a probability of 100%.
  • Option B: We flip a coin: if it’s heads, you win 2,000 €; if it’s tails, you win nothing.
Which option would you choose? A sure profit, or the chance to win twice as much (or nothing)? If you are like 84% of the population, you probably chose option A: the sure profit. Now let me propose another scenario. You must pay a fine, and you can choose how to do it:
  • Option A: You pay a 1,000 € fine with a probability of 100%.
  • Option B: You flip a coin: if it’s heads, you pay a 2,000 € fine; if it’s tails, you pay nothing.
Which option would you choose now? Would you pay the fine, or would you flip the coin, knowing you might pay nothing (or twice as much)? In this case, if you are like 70% of the population, you probably chose option B. So, are you choosing well or not? Let’s analyse what is happening here from a purely rational point of view.

According to expected utility theory, you always choose the option that maximizes expected utility

The expected value of a decision, E[u(x)] (the return you can expect), is the product of two simple quantities: the probability p of the outcome multiplied by the value (or utility) of that outcome, u(x). That is to say: the more likely you are to win something, and the more it is worth, the higher the expected value. This is mathematically represented as follows:
https://business.blogthinkbig.com/wp-content/uploads/sites/2/2019/04/riesgo1.png
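For reference (and in case the image does not render), the expression is the standard expectation over all possible outcomes, written in the same notation used in the calculations below:

E[u(x)] = p₁∙u(x₁) + p₂∙u(x₂) + … = Σ pᵢ∙u(xᵢ)

For a simple gamble that pays x with probability p (and nothing otherwise), it reduces to E = p∙x.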

If we were 100% rational, this formula would always tell us what to do and how to make the ideal decision: we would only need to calculate the probability of each option and its utility, and then pick the option that maximizes the expected value. Unfortunately,
humans are not rational decision-making machines. We are not “homo economicus”, able to perform a perfect cost-benefit analysis and then choose, completely objectively, the optimal outcome. Leaving games of chance aside, the neat Expected Utility Theory runs into two big problems when we apply it to everyday life:
  1. We are awful at estimating the probability of winning.
  2. We are awful at estimating the value of the outcome.
To put it in context, let’s analyse the two initial proposals considering this theory.
Regarding the game of luck, the expected value for option A is:
E(A)=1.0∙1,000=1,000

While the expected value for option B is:
E(B)=0.5∙2,000+0.5∙0.0=1,000

Both values are identical! Therefore, purely from a rational point of view,
both options should be equally attractive to us. What about the second scenario?

 

In this case, the expected value for option A is:

E(A)=1.0∙(-1,000)=-1,000

While the expected value for option B is:
E(B)=0.5∙(-2,000)+0.5∙0.0=-1,000

Once again, they are identical. Consequently, once again, it should make no difference which one we choose. So why do most people choose option A in the first case and option B in the second, instead of being indifferent? Because we are not purely rational!
  • We would rather have a small but sure gain than a potentially greater one. As the saying goes: “a bird in the hand is worth two in the bush”.
  • However, we detest small but sure losses and prefer to gamble on a potentially greater one. That is to say, we feel aversion to losses, so we take the risk rather than accept the loss.
Of course, our brain doesn’t calculate anything. It simply applies a heuristic: if you can win something for sure, take it and don’t risk it for more; if you can avoid a sure loss, take the risk even if the potential loss is higher. When equivalent gains and losses are weighed against each other, the losses weigh more.

Indeed, the satisfaction of a gain is far weaker than the pain of a loss. It’s easy to see: if you go out with 100 € and lose 50, your subjective assessment of that loss is stronger than the pleasure of going out with no money and finding 50 €, even though, objectively, the outcomes are equal. In both cases you come back home with 50 €, but the two experiences are anything but equivalent to you.

 
According to Prospect Theory, our loss aversion leads us to take more risk to avoid a loss than to secure a gain
Do you remember the previous entry, A story about two minds: the vast difference between real and perceived risk? Modern behavioural economics and psychology establish a human model radically different from “homo economicus”: when our brain faces a complex situation under uncertainty, it simply replaces the problem with a simpler one. This is due to heuristics, or “thought shortcuts”, which lead us to take decisions that are “irrational”, yet perfectly explainable.

 

The following mathematical curve shows graphically the basis of the Prospect Theory, developed by Kahneman and Tversky:

https://business.blogthinkbig.com/wp-content/uploads/sites/2/2019/04/prospectiva.png

This curve illustrates three essential cognitive features of Prospect
Theory, related to System 1:
  1. It’s not a straight (or simply concave) line, as Utility Theory would predict.
    Interestingly, it is S-shaped, which shows that sensitivity to gains and losses diminishes: we tend to overestimate small probabilities and underestimate large ones.
  2. It is also surprising that the two branches are not symmetric. The slope changes sharply at the reference point because of loss aversion: you react more strongly to a loss than to an objectively equivalent gain. Indeed, the effect is estimated to be 2-3 times stronger.
  3. Finally, options are not assessed on the basis of their final outcome, but relative to a reference point. If you have 1,000.00 € in the bank, an extra 1,000.00 € will make you happier than if you already have 1,000,000.00 € in the bank. It’s the same 1,000.00 €, but the reference point is different, and that’s why you don’t value it in the same way. The same goes for losses: a loss of 1,000.00 € surely doesn’t hit you the same way if you have 2,000.00 € as it does if you have 1,000,000 €, right?
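To make the shape of that curve concrete, here is a minimal Python sketch of the value function Kahneman and Tversky proposed. The parameter values (α = β = 0.88, λ = 2.25) are the ones they reported experimentally, but treat them as illustrative: what matters is the S shape and the steeper branch for losses.

```python
import numpy as np

def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky value function: concave for gains, convex and
    steeper for losses (lam is the loss-aversion factor). Outcomes x
    are measured relative to the reference point."""
    x = np.asarray(x, dtype=float)
    gains = np.abs(x) ** alpha
    losses = -lam * np.abs(x) ** beta
    return np.where(x >= 0, gains, losses)

# A gain and an objectively equivalent loss do not feel the same:
print(prospect_value(1000))    # ~ +437 (subjective value of winning 1,000)
print(prospect_value(-1000))   # ~ -983: the loss weighs roughly 2.25x more
```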

 

A Decision Theory in the field of Information Security
Considering the above, two ideas are becoming clear:
  1. We don’t assess losses and gains objectively.
  2. We are swayed by the reference point and by loss aversion.
In these circumstances, it is worth considering the following two hypotheses about how you will invest in security measures depending on how
the information is presented:
  • 1st hypothesis: when two investment options on information security measures are presented positively (as gains), you will choose the one with greater certainty.
  • 2nd hypothesis: when two investment options on information security measures are presented negatively (as losses), you will choose the one with less certainty.
Let’s see an example. Imagine that your company has allocated a budget to fund an information security package. Without that protection, your company’s losses have been estimated at 600,000.00 € (financial, physical,
data, reputation and time losses included). Which of the following packages, A or B, would you choose in each scenario?
  • 1st scenario: options positively presented:
    • Package A: you will certainly save 200,000.00 € of assets.
    • Package B: there is a 1/3 likelihood of saving 600,000.00 € of assets and a 2/3 likelihood of not saving anything.
  • 2nd scenario: options negatively presented:
    • Package A: you will certainly lose 400,000.00 € of assets.
    • Package B: there is a 1/3 likelihood of not losing anything and a 2/3 likelihood of losing 600,000.00 € of assets.
As you may observe, these are the first two scenarios proposed, reformulated in terms of security decisions. Although packages A and B in both scenarios lead to the same expected utility, according to Prospect Theory most security managers would in practice choose package A in the first scenario (it’s better to save something for certain than to risk saving nothing) and package B in the second one. However, an experiment showed that in the second scenario both packages were chosen, with a bias towards package A. A quick sanity check of the equivalence is sketched below.
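The following small Python sketch uses the figures from the example above; the helper function is ours, not part of any framework.

```python
def expected_value(outcomes):
    """Expected value of a list of (probability, amount) pairs."""
    return sum(p * amount for p, amount in outcomes)

# 1st scenario: framed as savings (gains)
package_a_gains = expected_value([(1.0, 200_000)])
package_b_gains = expected_value([(1/3, 600_000), (2/3, 0)])

# 2nd scenario: the same situation framed as losses
package_a_losses = expected_value([(1.0, -400_000)])
package_b_losses = expected_value([(1/3, 0), (2/3, -600_000)])

print(package_a_gains, package_b_gains)    # 200000.0 200000.0
print(package_a_losses, package_b_losses)  # -400000.0 -400000.0
```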

       

How important are these results in our everyday life? It’s impossible to list every potential attack and to calculate its probability and impact according to the traditional risk assessment formula. Therefore, you must be on guard against the mental processes that push you away from optimal decisions:

  • Depending on your attitude, risk-seeking or risk-averse, you tend to react one way or another, bypassing your rationality. Risk-seekers will choose the B options. In practice, we tend to choose certain options when we face gains, and risky options when we face losses. That’s why, before taking a security decision, stop and ask yourself: how is this option being framed, as a gain or as a loss? How do I tend to react in such scenarios? Do I tend to be risk-seeking or risk-averse? Who is taking the decision, System 1 or System 2?
  • Likewise, when presenting an investment option to the committee or to management, you can frame it positively or negatively. In the first case, stress the certainty of the gain and keep probabilities and risk out of the picture. In the second case, instead of presenting a sure loss (however small), present the possibility of not losing anything (even at the risk of a bigger loss) and point out its high probability.
  • When framing a security investment, use the desire to earn with certainty. Instead of presenting the investment as expected protection against hypothetical threats that may never materialize, focus on certain and unquestionable gains: a better reputation, customer trust, efficient processes and operations, regulatory compliance, etc. Try to steer the discussion towards gains and talk about glaringly obvious matters. Seek sure gains and stay away from possibilities and uncertainties.
  • As security engineers or “defenders”, you are close friends of Losses. In short, whatever you do, you lose: if attacks succeed, you lose; if there is no evidence of successful attacks, does that mean you won? No, it doesn’t, so you will be told you spent too much on security: you have lost again. Nobody said that working in cybersecurity was easy or rewarding; it’s even worse than being a goalkeeper. Working with losses fosters a risk-seeking attitude: you are likely to risk more in pursuit of total defence, ignoring sure solutions against minor threats.
  • Bear in mind that it’s really easy to overestimate small probabilities. This can lead you to invest in solutions that protect against striking but rare threats. You can invest against flashily named APTs while forgetting that most attacks are carried out through common and not at all glamorous methods: phishing, web page injections, traditional recompiled and repacked malware for which patches exist… More of the same, far removed from “advanced”, “intelligent” or “sophisticated”. They are certainly highly persistent, though, since the most successful threats are the oldest ones. Nihil novum sub sole.
  • Don’t fall into the diminishing-sensitivity trap. The S curve gets flatter: a first incident causes a bigger impact than the tenth one of the same magnitude. Each attack “hurts” less than the previous one, even though the losses are the same. The organisation becomes desensitized. That is why acting from the very first incident is so important, while the organisation is still raw. The longer you take to react, the less striking the incident will seem when it happens again. After all, we are still here, right?
  • For defenders, an attack either succeeds or it doesn’t; the result is all or nothing. If an attack has a 1% chance of success, you are not 99% protected, because if the attack does succeed you will have succumbed 100%. A serious successful incident will radically shift your reference point towards the losses. You won’t feel as safe as before the incident, so the organization will probably invest in an attempt to bring the reference point back to its initial state. A change in the reference point changes your sensitivity to the same incidents: if the point is lowered, an incident that once seemed terrible will now leave you indifferent, and vice versa. It’s important to keep the reference point in check using all the metrics and measures at your disposal.
No matter how hard you try, you will never take ideal or perfect decisions. You will be forced to deal with countless constraints in terms of resources (both financial and human), culture, legislation, etc. Moreover, your own attitude towards risk must be brought into the equation, an attitude influenced by a number of factors of which YOU ARE NOT EVEN AWARE.

       

With this entry, I want to help you become more aware of some of these factors. Keep them in mind for your future security decisions. People tend to accept a small but certain gain in security rather than the probability of a greater one. In the same way, they tend to risk a big loss rather than accept the certainty of a small one. What about you?

Gonzalo Álvarez de Marañón
Innovation and Labs
(ElevenPaths)

The factory of the future

Cascajo Sastre María    7 November, 2018

The process of industrial transformation is a fundamental part of history. We all remember the history of the first industrial revolution with the arrival of steam, the second with electricity and the third with the start of automation. Now, and almost without realizing it, we are fully immersed in the Fourth Industrial Revolution, also called Industry 4.0 in Europe or Smart Manufacturing in the United States.

This revolution was born with the arrival of Internet of Things to the industrial sector. The development and evolution of this technology applied to industrial processes has led to the birth of a new term: Industrial Internet of Things, whose acronym is IIoT.

It is about creating an ecosystem based on the integration of processes, with machines, applications and people, all interconnected with each other. Therefore, we hear more and more about the factories of the future, including connected factories or factories 4.0. All these terms refer to the implementation of IoT technology in factories and plants to make stock management and production processes more efficient, faster and more accurate.

How do you transform a factory? Connectivity is the key to the whole process. Implementing interconnected devices in factories will allow workers to create new work patterns thanks to the analysis of the information they receive. Most practically of all, this information is updated in real time.

Data is the treasure of the digital age. With it, business owners can obtain information about their business that they did not have before. To gather all this information, the logistics sector relies above all on RFID sensors: devices that perform radio-frequency identification of a tag, that is, a wireless, remote identification technology in which a reader collects the tag’s information and sends it to a central station using radio waves.


However, to get the most out of Big Data, you have to know how to interpret this volume of information. If workers and managers know how to analyze it, they will be able to see in real time what is causing a problem and solve it very quickly. Not only that: thanks to predictive analytics you might spot a problem before it happens.

The digitization of the factories makes the information accessible to all workers, plant managers and directors thanks to its storage in the cloud. In this way, the data can be consulted at any time and from any site and device. This also favors the transfer of information between departments or business areas.

Among the most notable advantages of these innovative plants are the reduction of production costs and of the time needed to detect problems. In turn, efficiency, productivity and the speed with which the product reaches the end user all increase. Along these lines, it is worth noting that connected factories favor sustainability, since through IoT they can also manage the factory’s lighting and improve the energy efficiency of production processes.

And this is only the beginning. Undoubtedly, technology will continue to advance and evolve so that factories continue to digitize and who knows, maybe we will talk about Industry 5.0 sooner than we think.


4 trends in technology for 2019

AI of Things    7 November, 2018

Now that the year is almost coming to a close, it’s worth taking a look at the predictions made for next year. What are the trends on everyone’s radar? What problems are we looking to solve, and how can Artificial Intelligence help us?

Even though there are many trends and many predictions, we will focus on four key points that are already important, but will gain even more recognition in 2019. 

1.     Artificial Intelligence

Big Data is still as relevant as ever, but most companies are already aware of its benefits, and have teams dedicated to obtaining data and insights. If you are not using or focusing on Big Data, you are missing out. The focus next year will be what layers we can add to this, to continue basing business decisions on data, but also improving processes. As we covered in the LUCA Innovation Day, Artificial Intelligence will continue to be a point of focus, and a much talked about trend in 2019.

In a survey conducted across the USA and Europe by the IT marketplace Spiceworks, 76% of participants stated that they believe Artificial Intelligence will benefit their work by automating the mundane tasks that take up most of their time and, with this, help them focus on the strategic aspects of the business.

AI can help in a myriad of ways, including the ability to build improved algorithms, AI assistants and, in the medical industry, models that can help predict a patient’s outcome (life or death) with 95% accuracy.

The challenge, however, will be making sure that a company and its team have the necessary skills and resources to implement Artificial Intelligence, which only 20% of IT professionals believe their organization has.

2. IT as a Service (ITaaS)

While many companies want to head in the direction of a digital transformation, some lack the budget to do so, and this prevents them from riding the latest technology waves. With the introduction of the IT as a Service model, the IT service provider delivers an information technology service to a business, without the hefty price tag.

ITaaS should not be understood as a technological shift, but as a shift on the organizational and operational side. Using ITaaS will help companies undergo changes and migrations at a much quicker pace. According to Hewlett Packard Enterprise, “The vendor takes the day-to-day tasks off the plates of your IT staff, freeing them up to concentrate on projects that can generate revenue.” This model also helps companies scale up or down, adjusting their needs and their budget.

3.     Chatbots

According to Forbes, around 40% of businesses will have or will start using chatbots by the end of 2019. Chatbots have been on the radar for a while, and an estimated one in four companies already use them in their day-to-day operations. With the addition of Natural Language Processing (NLP), chatbots are becoming more human-like in their interactions and can help retail and e-commerce stores screen customer complaints or queries before having the customer speak to a person.

4. Edge computing and Cloud

Last year, Edge Computing and Cloud were already flagged as trends, and so far their popularity and potential have only increased with the creation and use of AI-powered smart devices. Smart devices such as drones or autonomous vehicles communicate instantly via IoT and send data to the cloud, which is becoming more and more impractical due to the time it takes. This is where Edge Computing comes into the picture: many of these devices need real-time responses, which so far only Edge Computing can provide. Edge Computing and Cloud work so closely together that some companies have called the combination “The Fog” due to their interdependence.

As smart drones, autonomous vehicles, and other AI-powered smart devices seek to connect and communicate instantly via the IoT, the matter of sending data “all the way” to the cloud will become highly impractical. Many of these devices will need real-time response and processing, making edge computing the only viable option.

Cloud is still gaining momentum, and is a “necessary catalyst for innovation” for companies. According to Gartner, by 2021 more than half of the global enterprises already using cloud today will adopt an all-in cloud strategy, especially as security and knowledge around it keep growing.

While many of these trends are not groundbreaking, and we already know about many of them, what we will accomplish with them will definitely be bigger and better than before.

Don’t miss out on a single post. Subscribe to LUCA Data Speaks.

You can also follow us on Twitter, YouTube and LinkedIn

A Brief History of Machine Learning

AI of Things    5 November, 2018

As members of the Machine Learning community, it would be a good idea for us all to have an idea of the history of the sector we work in. Although we are currently living through an authentic boom in Machine Learning, this field has not always been so prolific, going through periods of high expectations and advances as well as “winters” of severe stagnation.

Birth [1950 – 1958]

1950 — Alan Turing creates the “Turing Test” to determine whether or not a machine is truly intelligent. In order to pass the test, the machine must be capable of making a human believe that it is another human instead of a computer.
1952 — Arthur Samuel writes the first computer program capable of learning. The software was a program that could play checkers and improved with each game it played.
1956 — Marvin Minsky and John McCarthy, with the help of Claude Shannon and Nathan Rochester, held a conference at Dartmouth in 1956, which is considered to be where the field of Artificial Intelligence was born. McCarthy convinced the attendees to adopt the term “Artificial Intelligence” as the name for the new field.
1958 — Frank Rosenblatt designs the Perceptron, the first artificial neural network.


First Winter of AI [1974 – 1980]

In the second half of the 1970s, the field suffered its first “winter.” Various agencies that had been financing AI research cut funds after years of high expectations and few actual advances.
1967 — The algorithm “Nearest Neighbor” is written. This milestone is considered the birth of the field of pattern recognition in computers.
1979 — Students at Stanford University invent the “Stanford Cart”, a mobile robot capable of moving autonomously around a room while avoiding obstacles.

The Explosion of the 1980s [1980 – 1987]

The 80s are known as the birth of expert systems, based on rules. These were rapidly adopted by the corporate sector, generating new interest in Machine Learning.
1981 — Gerald Dejong introduces the concept of “Explanation Based Learning” (EBL), in which a computer analyzes the training data and creates general rules allowing the less important data to be discarded.
1985 — Terry Sejnowski invents NetTalk, which learns to pronounce words in the same way a child would learn to do.


Second AI Winter [1987 – 1993]

At the end of the 1980s and the beginning of the 90s, AI experienced a second “winter.” This time, its effects lasted for several years and the reputation of the field did not fully recover until the early 2000s.
1990s — Work in Machine Learning moves from a knowledge-driven focus to a data-driven one. Scientists begin to create programs that analyze large quantities of data and extract conclusions from the results.
1997 — IBM’s computer Deep Blue beats world chess champion Garry Kasparov.


Explosion and Commercial Adoption [2006 – Present Day]

The growth in computing power together with the great abundance of available data have relaunched the field of Machine Learning. Many businesses are moving towards data and incorporating Machine Learning into their processes, products and services in order to gain an edge over the competition.
2006 — Geoffrey Hinton coins the phrase “Deep Learning” to describe new architectures of deep neural networks capable of learning much better models.
2011 — IBM’s Watson computer beats its human competitors at Jeopardy, a game show that consists of answering questions in natural language.
2012 — Jeff Dean, at Google, with the assistance of Andrew Ng (Stanford University), leads the project GoogleBrain, which developed a deep neural network using all of the capacity of the Google infrastructure to detect patterns in videos and images.
2012 — Geoffrey Hinton leads the winning team in the Computer Vision contest at Imagenet using a deep neural network (DNN). The team won by a large margin, giving rise to the current explosion of Machine Learning based on DNNs.
2012 — The research laboratory Google X uses GoogleBrain to autonomously analyze Youtube videos and detect those containing cats.
2014 — Facebook develops DeepFace, an algorithm based on DNNs capable of recognizing people with the same precision as a human being.
2014 — Google buys DeepMind, a British deep learning startup that had recently demonstrated DNN capabilities with an algorithm capable of playing Atari games by simply viewing the pixels on the screen, the same way a person would. The algorithm, after hours of training, was capable of beating human experts in the games.
2015 — Amazon launches its own Machine Learning platform.
2015 — Microsoft creates the “Distributed Machine Learning Toolkit”, which allows for the efficient distribution of machine learning problems to multiple computers.
2015 — Elon Musk and Sam Altman, among others, found the non-profit organization OpenAI, providing it with one billion dollars with the objective of ensuring that artificial intelligence has a positive impact on humanity.
2016 – Google DeepMind’s AlphaGo beats professional Go player Lee Sedol four games to one at Go, considered one of the most complex board games. Expert Go players confirmed that the algorithm was capable of making “creative” moves that they had never seen before.
Today, we are experiencing a third explosion in artificial intelligence. Although there are skeptics who say we cannot rule out a third winter, this time the advances in the sector are being applied to business to the point of creating whole new markets and producing significant changes in the strategies of both big and small businesses.
The wide availability of data seems to be the fuel behind these algorithm engines which, in turn, are overcoming the computational limitations that existed prior to distributed computing. All of this suggests that we will continue to have access to more and more data to feed our algorithms, while the scientific community shows no sign of running out of ideas to keep advancing the field. The coming years promise to be truly frenetic.

Written by Víctor González Pacheco, Team Leader and Data Scientist at LUCA consulting analytics

You can also follow us on Twitter, YouTube and LinkedIn

DNS over HTTPS (DoH) is already here: the controversy is served

ElevenPaths    5 November, 2018
Recently, the IETF has elevated the DNS over HTTPS proposal to an RFC. In other words, this means resolving domains through the well-known HTTPS, with its corresponding POST, GET and certificate exchange for authentication and encryption. This news is more important than it may seem, for two reasons: firstly, it’s a new resolution paradigm that shakes the network’s foundations; secondly, the backing of an RFC combined with the interest shown by browsers (eager for the power it grants them) has led them to start implementing it in record time. Privacy is said to be guaranteed, fine, but… is it a good (or bad) idea?

DoH (DNS over HTTPS) is really simple. Instead of going to port 53 of a server (for instance, the well-known 8.8.8.8) and asking for a domain through a UDP or TCP packet, DoH standardizes the construction of a GET or POST to an HTTPS domain, and the answer contains the A and AAAA records (the RFC doesn’t specify other records) with the IP. There are more details, such as the clever solution of turning the cache-control header into the TTL. Everything carefully encrypted, of course. Do you remember when, in a hotel, you could tunnel HTTP browsing through the DNS protocol (often left unrestricted) to avoid paying for the Wi-Fi? Now it’s the other way around.

How have we reached this point?
The DNS protocol is like a camel. Over time it has been loaded with so much weight –patches, remediations and add-ons– that it now walks through the desert without completely solving any problem except the ones it was designed for. For whatever reason, the desired security and privacy have never quite arrived; not because they haven’t been proposed (there are dozens of alternative proposals, some complementary to each other), but because none of them has been adopted massively. They range from DNSSEC to DNS over TLS (DoT) which, as you can imagine, keeps the same DNS protocol but inside a TLS tunnel (something like POP3 and POP3S). DoT (the closest thing to DoH) uses port 853 and, indeed, also hides the traffic content and authenticates the server. Its RFC was proposed in 2016 but, contrary to expectations, it has not taken off. In any case, it has not caused the stir raised by DoH.

By the way, there are also DNS over DTLS, DNS over QUIC, DNS over Tor… and even a DoH variant that returns JSON, but this is a special adaptation used by Google (and also by Cloudflare) that is more powerful (since, for example, it allows querying records other than A or AAAA).

[Images: using DoH through the Google and Cloudflare JSON APIs and the JSON response they return]
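As a hedged illustration of that JSON flavour, the following Python sketch queries the public Google JSON resolver (the endpoint and parameters are as publicly documented; this is the provider’s adaptation, not the RFC wire format, and may change):

```python
import json
import urllib.request

def doh_json_query(name, record_type="A"):
    """Resolve a domain through Google's DoH JSON API.
    Cloudflare offers a similar endpoint at
    https://cloudflare-dns.com/dns-query (send the header
    'accept: application/dns-json')."""
    url = f"https://dns.google/resolve?name={name}&type={record_type}"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode())

answer = doh_json_query("example.com", "A")
for record in answer.get("Answer", []):
    print(record["name"], record["TTL"], record["data"])
```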
Why such a stir?
DNS is one of the oldest protocols on the network, and it has always been a security bottleneck (from the birthday attack to Kaminsky’s flaw): plain text, typically over UDP (which makes injecting forged packets even easier). It is a disaster even without attacks, since servers can be under government control and queries can then be redirected or blocked. All of this in full view, with neither privacy nor integrity (since DNSSEC is not as widely deployed as it should be). We have entrusted the Internet’s foundations to a protocol that could not technologically protect itself with massively deployed solutions (or was not allowed to, for the same reason). A protocol to which all kinds of patches and wrappers have been applied so as not to break with tradition, to the point that the proposal that finally promises security is a ground-breaking one: moving resolution up to the data layer. As if that were not enough, with DoH resolution no longer relies on the system’s global DNS, so it can ignore the DNS server usually provided by DHCP. In this way, each application could resolve via HTTPS by default.

But this doesn’t seem harmful, right? Wouldn’t it be wonderful if no one could see what we are trying to resolve and, consequently, no one could tamper with it in any way? Hiding queries and replies under HTTPS and going with the flow on a port that nobody can block: port 443. No more spies and constraints. This is what DoH offers but, is it actually advantageous?

Browsers are happy to implement it. It’s their chance to gain power: not only do they already master HTTPS, it also lets them set the default resolver to be queried from within the browser… For instance, Google would not only see whatever is resolved through its famous 8.8.8.8, it would extend its share of DNS users (around 13%) to everyone using Google Chrome, currently about 60% of browsing. It has been branded “secure DNS”. Browsers have seen the opportunity to break away from the system DNS precisely at the point where most domains are resolved: the browser. Google is already using DoH in Intra (released by Jigsaw), which is used precisely to evade DNS blocking.

As for Android, it implements DNS over TLS in its latest release, although it has not promoted it much. Cloudflare has also entered the DNS business, and the company behind 1.1.1.1 is working with Firefox to provide trusted resolution. In fact, in Firefox DoH is known as TRR (Trusted Recursive Resolver). Cloudflare promises not to abuse the wealth of user data it may receive; for example, it has committed to dropping the practice of sending the first 3 octets of the client’s IP address in DNS queries, a mechanism (with its own RFC) promoted by Google and OpenDNS in 2011 to improve DNS performance through IP geolocation.

Chrome has implemented it, but there is no proper interface yet; they are working on it: https://chromium-review.googlesource.com/c/chromium/src/+/1194946
Firefox already integrates it, disabled by default
Has nobody considered the problems derived from SNI or from forged certificates?
To be honest, there are two serious problems, both derived from TLS itself. The first, which the sharpest readers may have spotted, is that in today’s TLS world the domain being visited travels in the clear. Someone monitoring a TLS connection would not see the content of the client-server exchange, but they would see the domain itself. This is because of the SNI (Server Name Indication), a parameter exchanged openly during the TLS handshake. DoH supporters acknowledge this, but they say it will change soon: in 2017, an RFC draft was accepted establishing how the whole TLS handshake, including the domains visited, will be encrypted. Once the SNI is encrypted, the DNS-over-HTTPS resolution query itself will be completely invisible and, finally, private. How long must we wait? No one knows. People have faith that TLS will implement it.
However, beware! A traditional resolver sitting behind DoH could still see the domains queried, so at some point it would be possible to “go back” and check who queried what.
Maybe the logical option would be to use DoH as the client-facing interface and DoT between the servers that actually look the domain up in the background, all on top of DNSSEC, since it is fully compatible (it adds integrity) and serves a different purpose.

Moreover, another serious problem inherited from TLS is the use of forged certificates on the server side, since they allow the encryption to be broken and the traffic to be spied on. This bad practice is within reach of governments and, paradoxically, it is also a weak point of DoH derived from its use of TLS, even though DoH was designed precisely so that governments cannot restrict the Internet through traditional DNS. Any government could intervene simply by using a forged certificate for the DoH server as well (as is sometimes done for other web pages).
Although DoT’s RFC requires the use of pinning, in DoH it is not even recommended… Didn’t they plan to include it?

https://business.blogthinkbig.com/wp-content/uploads/sites/2/2019/04/doh5.png
As you can see, DNS over TLS specifies pins (dnsprivacy.org), unlike DoH, which even advises against them.

To perform the pinning, as in other solutions (for instance the now-defunct HPKP), after the DoT TLS handshake the client computes the SPKI hash of the certificate, based on the public key and other X.509 data. Exactly like HPKP pins, but without the initial transfer: the client must know and store them in advance.
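For illustration only, here is a hedged Python sketch of how such an SPKI pin can be computed from a certificate, using the third-party cryptography package (the certificate path is hypothetical and the snippet is not part of any DoT client):

```python
import base64
import hashlib

from cryptography import x509
from cryptography.hazmat.primitives import serialization

def spki_pin(pem_path):
    """base64(SHA-256(SubjectPublicKeyInfo)) of a certificate:
    the same kind of pin HPKP used and a DoT client can store."""
    with open(pem_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    spki = cert.public_key().public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    return base64.b64encode(hashlib.sha256(spki).digest()).decode()

# Hypothetical usage: compare the computed pin with a pin stored in advance.
print(spki_pin("resolver-cert.pem"))
```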
The browsers’ role and where this leads us
Because of all this, the well-known Internet paradigm may be about to break; at the very least, it raises doubts. In fact, Paul Vixie (one of the developers of DNS) is radically against it and promotes the use of DNS over TLS instead of DNS over HTTPS. Some of his reasons (even if they sound grinch-like) are that analysts will lose visibility into the network and its monitoring capabilities, and that signalling and data protocols are being mixed up… Bear in mind that this model gives even more power to the browser and, consequently, to whoever has the largest browsing share today: Google. In this regard, Firefox has a more transparent policy, although Cloudflare could obtain interesting information thanks to the partnership. In any case, aren’t we over-centralizing DNS, a system that is decentralized by nature?

DoH is simply a new way to use DNS; behind it, the server being queried can do whatever it wants (as it already can) and will in fact behave like any other resolver on the network. The protocol itself does not change; what changes is how it is accessed and who gets that access. Compared to DoT, the encryption does not change either, but now it happens over port 443, which hides it among the rest of the encrypted traffic, so DNS resolution gets lost among all the other requests. Just as malware learnt (to defeat firewalls’ reputation systems) to place its server outside the network (instead of turning the victim into a server by opening a port); just as it later understood it was better to stop using exotic ports (IRC, for instance) and communicate over port 80, and later even over 443. Just as we have moved our hard disks to the cloud and every application into the browser: DNS is joining the same trend, and its traditional operation is being reconsidered. All this raises doubts about how resolution will be configured in our systems. Imagine that what we feared from governments could now be done by application developers or by the big DNS owners; or the other way around: in the future, maybe we will download applications with their own built-in DoH, and we will accept changes in DNS resolution simply by accepting terms and conditions that nobody reads…

And what about the ability to filter domains at the DNS level? It would no longer work with DoH, since the browser could keep visiting that phishing site or command-and-control server even if you had blocked it on the company DNS. Are we doing malware a favour in exchange for user privacy and browser power?

However, DoH also opens new possibilities. It relies on multiplexed HTTP/2, which in turn opens other doors, such as server push, which allows several domains to be resolved in one go. Moreover, it can reduce SNI leakage. Why? On HTTP/2, connections are reused: from the first connection to a site, the browser can learn which other sites are hosted on the same server and reuse that connection to visit them. Since the channel is already encrypted, it is reused without sending the SNI again. Given that quite a few web pages share the same server, this will happen fairly often.

In short: locally, keep an eye on your browser, and if you notice something strange when domains are resolved on your systems, you now know what may be going on; globally… we will see what new paradigms this protocol brings.

Sergio de los Santos
Innovation and Labs (ElevenPaths)

Warning About Normalizing Data

Santiago Morante Cendrero    1 November, 2018
For many machine learning algorithms, normalizing the data before analysis is a must. A supervised example is neural networks: it is well known that normalizing the data fed into a network improves the results. If you don’t believe me, that’s OK (no offense taken), but you may prefer to believe Yann LeCun (Director of AI Research at Facebook and founding father of convolutional networks) by checking section 4.3 of this paper.

Convergence [of backprop] is usually faster if the average of each input variable over the training set is close to zero. Among other reasons, this is because when the neural network tries to correct the error made in a prediction, it updates the network by an amount proportional to the input vector, which is bad if the input is large.

Another example, in this case of an unsupervised algorithm, is K-means. This algorithm tries to group data into clusters so that the data in each cluster share some common characteristics. The algorithm repeats two steps:

  • Assign the cluster centers to some point in space (random on the first iteration, the centroid of each cluster afterwards).
  • Associate each point with the closest center.
In this second step, the distance between each point and the centers is usually calculated as a Minkowski distance (commonly the famous Euclidean distance). Each feature weighs the same in the calculation, so features measured on large scales will have more influence than those measured on small scales; e.g. the same feature would have more influence if measured in millimeters than in kilometers (because the numbers would be bigger). So the features must be on comparable scales, as the sketch below illustrates.
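A tiny Python sketch of that effect, with made-up numbers: the Euclidean distance is dominated by whichever feature happens to be expressed in the larger unit.

```python
import numpy as np

# Two points described by (bandwidth in Mbit/s, distance to the exchange in km)
a = np.array([30.0, 1.2])
b = np.array([300.0, 1.5])
print(np.linalg.norm(a - b))      # ~270.0: bandwidth dominates completely

# The same two points, but the second feature expressed in metres
a_m = np.array([30.0, 1200.0])
b_m = np.array([300.0, 1500.0])
print(np.linalg.norm(a_m - b_m))  # ~403.6: changing the unit changed the distance
```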
Now that you know that normalization is important, let’s see what options we have to normalize our data.

A couple of ways to normalize data:

Feature scaling

Each feature is normalized within its limits.
Figure 1, normalization formula
This is a common technique used to scale data into a range. But the problem with normalizing each feature within its empirical limits (so that the maximum and minimum are those found in that column) is that noise may be amplified.
One example: imagine we have Internet data from a particular house and we want to build a model to predict something (maybe the price to charge). One of our hypothetical features could be the bandwidth of the fiber-optic connection. Suppose the house purchased a 30 Mbit Internet connection, so the bit rate is approximately the same every time we measure it (lucky guy).
Figure 2, Connection speed over 50 days 

It looks like a pretty stable connection, right? As the bandwidth is measured on a scale far from 1, let us scale it between 0 and 1 using our feature scaling method (sklearn.preprocessing.MinMaxScaler).

Figure 3, Connection speed / day in scale 0-1.
After the scaling, our data is distorted: what was an almost flat signal now looks like a connection with a lot of variation. This tells us that feature scaling is not adequate for nearly constant signals, as the sketch below reproduces.
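A minimal sketch of that effect with made-up numbers (a roughly 30 Mbit connection with tiny measurement noise):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
speed = 30 + 0.01 * rng.standard_normal(50)   # almost flat signal, in Mbit/s

scaled = MinMaxScaler().fit_transform(speed.reshape(-1, 1)).ravel()

print(speed.min(), speed.max())    # ~29.97 ... ~30.03: visually flat
print(scaled.min(), scaled.max())  # 0.0 ... 1.0: the noise now spans the whole range
```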

Standard scaler

Next try. OK, scaling in a range didn’t work for a noisy flat signal, but what about standardizing the signal? Each feature would be normalized by:
Figure 4, Standard scaling formula
This could work in the previous case, but don’t open the champagne yet. The mean and standard deviation are very sensitive to outliers (small demonstration). This means that outliers may attenuate the non-outlier part of the data.
Now imagine we have data about how often the word “hangover” is posted on Facebook (for real). The frequency looks like a sine wave, with lows during the weekdays and highs on weekends. It also has big outliers after Halloween and similar dates. We have idealized this situation with the following data set (3 parties in 50 days. Not bad).
Figure 5, Number of times the word “hangover” is used in Facebook / days.
Despite the outliers, we would like to be able to see clearly that there is a measurable difference between weekdays and weekends. We want to predict something (that’s our business) and we would like to preserve the fact that values are higher during weekends, so we think of standardizing the data (sklearn.preprocessing.StandardScaler). We check the basic parameters of the standardization.
 
Figure 6, Standard standardization for the above data is not a good choice.

What happened? First, we were not able to scale the data between 0 and 1. Second, we now have negative numbers, which is not a dead end but does complicate the analysis. And third, we are no longer able to clearly distinguish weekdays from weekends (everything is close to 0), because the outliers have interfered with the rest of the data.

From very promising data, we now have almost irrelevant data. One solution could be to pre-process the data and remove the outliers first (things change without them), as the sketch below shows.
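A sketch of that comparison with an idealized “hangover” series (the numbers are made up: a weekly wave plus three party spikes):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

days = np.arange(50)
posts = 100 + 20 * np.sin(2 * np.pi * days / 7)   # weekday/weekend wave
posts[[10, 25, 40]] += 2000                       # three big parties (outliers)

with_outliers = StandardScaler().fit_transform(posts.reshape(-1, 1)).ravel()
cleaned = np.delete(posts, [10, 25, 40])
without_outliers = StandardScaler().fit_transform(cleaned.reshape(-1, 1)).ravel()

# With the outliers, the weekly variation is squashed towards 0;
# once they are removed, the weekday/weekend difference is visible again.
print(with_outliers[:7].round(2))
print(without_outliers[:7].round(2))
```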

Scaling over the maximum value

The next idea that comes to mind is to scale the data by dividing it by its maximum value. Let’s see how it behaves with our data sets (sklearn.preprocessing.MaxAbsScaler).

Figure 7, data divided by maximum value
Figure 8, data scaled over the maximum

Good! Our data is in the range [0, 1]… But wait. What happened to the differences between weekdays and weekends? They are all close to zero! As in the case of standardization, the outliers flatten the differences in the data when scaling over the maximum.

Normalizer

The next tool in the box of the data scientist is to normalize samples individually to unit norm (check this if you don’t remember what a norm is).

Figure 9, samples individually sampled to unit norm

This data rings a bell in your head right? Let’s normalize it (here by hand, but also available as sklearn.preprocessing.Normalizer).

Figure 10, the data was then normalized

At this point in the post you know the story, but this case is worse than the previous one: we don’t even get the highest outlier as 1, it is scaled to 0.74, which flattens the rest of the data even more.

Robust scaler

The last option we are going to evaluate is Robust scaler. This method removes the median and scales the data according to the Interquartile Range (IQR). It is supposed to be robust to outliers.

Figure 11, the median data removed and scaled
Figure 12, use of Robust scaler

You may not see it in the plot (but you can see it in the output): this scaler introduced negative numbers and did not limit the data to the range [0, 1]. (OK, I give up.) A side-by-side sketch of these last scalers follows.
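For completeness, a side-by-side sketch of the last three scalers on the same made-up, outlier-heavy series, so the behaviour described above can be checked numerically (flattening for MaxAbsScaler and the unit-norm Normalizer; negative values and no fixed [0, 1] range for RobustScaler):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, Normalizer, RobustScaler

days = np.arange(50)
posts = 100 + 20 * np.sin(2 * np.pi * days / 7)
posts[[10, 25, 40]] += 2000                   # outliers
column = posts.reshape(-1, 1)                 # one feature, 50 samples

max_abs = MaxAbsScaler().fit_transform(column).ravel()
unit_norm = Normalizer().fit_transform(posts.reshape(1, -1)).ravel()  # whole series as one sample
robust = RobustScaler().fit_transform(column).ravel()

for name, data in [("max-abs", max_abs), ("unit-norm", unit_norm), ("robust", robust)]:
    print(f"{name:9s} min={data.min():8.2f} max={data.max():8.2f}")
```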

There are other methods to normalize your data (based on PCA, taking into account possible physical boundaries, etc.), but now you know how to evaluate whether a given method is going to distort your data.

Things to remember (basically, know your data):

Normalization may (possibly dangerously) distort your data. There is no ideal method to normalize or scale every data set. Thus it is the job of the data scientist to know how the data is distributed, to know whether there are outliers, to check ranges, to know the physical limits (if any), and so on. With this knowledge, one can select the best technique for each feature, probably using a different method for each one.

If you know nothing about your data, I would recommend first checking for outliers (removing them if necessary) and then scaling over the maximum of each feature (while crossing your fingers).

Written by Santiago Morante, PhD, Data Scientist at LUCA Consulting Analytics

You can also follow us on Twitter, YouTube and LinkedIn

IoT growth forecast

Cascajo Sastre María    31 October, 2018

Internet of Things is transforming the world of technology by leaps and bounds. Its evolution enables the development of smart cities, industries and homes, which generates more sustainable and efficient spaces.

The number of connected devices is immense. They range from sensors that manage traffic, energy or pollution, to machines that manufacture components in industry, drones that help in agriculture and firefighting, or wearables with which we track our daily activity.

Knowing their possibilities, it is not surprising that in recent years the investment in IoT has not stopped growing. Its implementation generates a significant increase in efficiency where it is applied.

It is estimated that IoT spending will have an annual growth rate of 13.6% between 2017 and 2022

This explains the growing interest of companies in devoting resources to launching digital innovation plans for the services they offer. All this benefits their income, their prestige and the customers’ perception of them.

For these reasons, the different analysts agree that world investment will continue to increase. The International Data Corporation (IDC) estimates that spending on this technology will have an annual growth rate of 13.6% in the period from 2017 to 2022, when it will reach 1.2 trillion dollars.

Along the same lines, Bain & Company predicts that investment in the combined IoT markets will reach some 520,000 million dollars in 2021 (more than double the 235,000 million of 2017).

According to Business Insider, in 2023 the government investment to promote smart cities can reach 900,000 million dollars, while the manufacturing of solutions in the industry will cover some 450,000 million.

The number of connected devices will continue to grow, reaching 20,400 million in 2020, a year in which 95% of new products will incorporate this technology. Likewise, Business Insider predicts that, among citizens, institutions and governments, we will have more than 40,000 million connected devices in 2023.

Geographically, although connectivity is becoming ever more global, we must bear in mind that, to date, China, North America and Western Europe are the main actors of the IoT revolution (they accounted for 67% of the market in 2017).

By sector, manufacturing and transport invest the most, although analysts predict that consumer IoT will grow the most in the coming years, followed by insurance and healthcare.

IoT technology has a wide market available. Among the challenges it faces are stronger security measures and the integration of existing devices, as well as the development of intuitive interfaces that make it easy for the whole population to use.

It seems clear that its central role in the evolution of telecommunications opens a scenario full of possibilities for hardware manufacturers, software developers and connectivity providers.


Big Data & AI changing the music game: IBM Watson BEAT

AI of Things    30 October, 2018

Music is something that unites us. Go anywhere in the world and you will find music: someone singing, playing, rapping or dancing. For thousands of years it has formed a crucial part of every known society in the world, even the most isolated tribal groups, and every society and culture has music that reflects their beliefs, current affairs, and feelings. Our human intelligence; our experiences and our ability to learn from these, is what enables musicians to write music that resonates with a demographic the size of the population of India.

But what the majority of us don’t appreciate is that music can be complex. If we look at the music industry from a commercial point of view, there are over 7 billion people in the world; that’s a huge market. Trying to make music that will sell and appeal to everyone is an impossible task. But making music that appeals to a billion of them, that is attainable.

IBM’s music arm of Watson, Watson BEAT, is changing the game. Watson Beat is a cognitive, cloud-based music program developed using AI and machine learning. The machine’s music generation algorithms analyse individual tracks and collect data on pitch, time and key signatures, and note sequences. Using this data, it works out what a listener might want, or what an artist may be inspired by. Of course, this does not immediately produce a smash hit that everyone will love, but it certainly has the potential to help producers and songwriters know their audience and get inspired.

Video of music producer Alex da Kid collaborating with IBM Watson BEAT.

The project initially came from former IBM researcher Janani Mukundan. Under Janani’s guidance, it has been influenced by several different people, including musician Richard Daskas, with whom IBM created the collaboration for the Red Bull F1 commercial. The BEAT code uses two methods of machine learning: reinforcement learning (using the assumptions of modern Western music theory to create reward functions) and a Deep Belief Network (DBN), where the AI is trained on a single input melody to create a vibrant and complex melody layer.

However, discussions have arisen surrounding the copyright debate. Many believe that a machine cannot own copyright, and that it must belong to the person that sees the spark and creates something with it. Yet entertainment lawyer Bjorn Schipper, says that it is hard to give a conclusive answer due to the novelty of AI in composition.

Also joining in on the debate, Meindert Kennis, the Lead Digital Strategist and CMO of Spinnin’ Records, stated that “A lot of artists create music and come to us and say “Here’s my new track,” but, if they start to use more AI instruments, it might be wise to record the actual recording of the music and the creative process more to show that it’s actually them.”.

Although the technology can compose music using algorithms and data sets, it cannot guarantee a sound that will appeal to the masses. Echoing one of the themes we frequently visit in our blogs, human intelligence is still the most vital piece of the puzzle. Predominantly, this AI is used to reduce the time composers and producers spend on repetitive tasks: once again, AI as a tool to aid our human capabilities.

https://www.youtube.com/watch?v=bDO4yN4V-sM

Broadway musician Seth Rudetsky partners with IBM Watson to create a cognitive collaboration

Traditionally, people perceive AI as simply data and analytics. Watson Beat harnesses the power of AI to turn data into creative expression. You may have already heard a song composed by AI; who knows, maybe your next favourite will be created by IBM’s Watson BEAT!

Don’t miss out on a single post. Subscribe to LUCA Data Speaks.

You can also follow us on Twitter, YouTube and LinkedIn

7 objects that you did not know are IoT

Beatriz Sanz Baños    23 October, 2018

Internet of Things, Big Data and Artificial Intelligence keep expanding to all areas of our lives, improving the productivity of companies, governments and people.

The number of everyday objects that integrate these technologies keeps growing. Here are some examples:

  1. Smart insoles

Footwear is essential for our comfort, but it is also a valuable source of information. Thanks to connected insoles we can know the number of steps we take each day, as well as the distance and duration of our routes, and trace them via GPS.

  2. Coffee maker

In a connected world, people can have coffee ready when they get out of bed. Smart coffee makers prepare it at the desired time, warn when the capsules are running out, monitor the water level and alert if they need maintenance operations.

  3. Trackers

There are IoT devices with GPS tracking built into all types of objects, such as watches for seniors and backpacks with which parents can always know the location of their children, as well as receive alerts if they leave a previously established security perimeter. These trackers are also installed in suitcases and bicycles, which allows us to locate them immediately in case of theft or loss.

Similarly, sensors in collars allow us to keep track of our pets, monitoring their body temperature, sleep habits and diet, and even emitting ultrasounds that calm them in dangerous situations. These tools also let us send updated information about their daily activity and health status to the veterinarian.

  4. Smart refrigerators

How many times do we realize that we forgot to buy something just when we have come back from the supermarket?

This can change thanks to the creation of intelligent refrigerators able to monitor our food stock and warn us of the need to buy an item. Likewise, they can inform us when the expiration date of the food is near or detect an inefficient order of the products.

In addition, intelligent refrigerators control basic aspects of maintenance and energy management such as temperature, humidity or if we have left any door open. Moreover, some have a touch screen and voice assistant that offer users information on recipes or the nutritional value of the meals.

  5. Drones

The number of unmanned aerial vehicles has multiplied in recent years. They are already support elements in sectors such as agriculture, where they facilitate the monitoring of the state of the fields, the detection of pests or a more efficient fumigation. They have also increased efficiency in freight transport or construction.

Similarly, drones are a key element for public safety. They help in the prevention and extinction of fires, as well as in other natural disasters such as floods or earthquakes, where they exercise decisive tasks in the evaluation of damages and the search for people.

  6. Smart bracelets

Another IoT application in the field of health and sport is the smart wristband. These gadgets measure parameters related to physical exercise, such as the number of steps taken or the distance travelled, and make all the information available through an app on our smartphone, without having to carry the phone during the activity.

  7. Sensors for watering the garden

Smart garden sensors monitor environmental conditions such as humidity, temperature, the level of sunlight or the state of soil fertilization. An application then sends this data to our mobile phone, so users know at all times whether the garden needs watering.

The digital transformation is unstoppable. The scope of connectivity between objects will continue to grow. It is a reality that benefits us all, since an intelligent environment is more sustainable and efficient, which makes our day to day life easier.

IoT and Big Data: the parents of carsharing

Cascajo Sastre María    22 October, 2018

Mobility is one of the main problems of large cities. Every day, millions of citizens are caught in traffic jams. Although it may seem difficult, this reality can be changed. The application of Internet of Things and Big Data to the world of carsharing offers us a way to improve urban traffic.

Telecommunications networks constantly produce huge amounts of data about our daily lives. All this information, which requires Artificial Intelligence techniques to be processed, is what is known as Big Data. From its analysis, valuable insights can be obtained for more efficient management of urban mobility.

The connectivity of smartphones and automobiles (Gartner estimates that by 2020 there will be around 250 million connected cars worldwide), together with massive data processing, allows users to stay in contact with companies and calculate the most efficient routes to minimize travel time.

These technologies have led to a disruptive evolution of the rent a car model towards carsharing. The concept refers to the loan or temporary use of vehicles made available to users in exchange for a specific tariff, generally for short periods of time and in limited geographical areas.

This system allows users to be in contact with companies and calculate the most efficient routes to minimize the travel time

Carsharing is empowered by IoT and Big Data for the centralized management of its vehicle fleet

Like VTC platforms such as Uber or Cabify, the carsharing companies use IoT and Big Data to carry out a centralized management of their vehicle fleet. Everything works with geolocation services that provide constant traffic updates, which makes it possible to send notifications about accidents and other incidents, as well as information about the location of available cars.

Another variant of the shared car that is empowered by digital innovation is carpooling, that is, the practice of several people travelling in the same vehicle and splitting the expenses among all the participants. Applications such as BlaBlaCar put users with similar journeys in contact every day, thus reducing the number of cars on the road at the same time.

Besides saving time and money on journeys, joining these new initiatives is very simple, fast and intuitive for users. All these companies have developed mobile applications so that customers can use them more easily. You only need your smartphone to register, find the car, find the passengers who will accompany you and choose the destination.

Towards a more sustainable city

The conventional model of mobility is not sustainable. The density of traffic has an impact on the level of pollution of the air we breathe. It is no longer necessary for each citizen to own a car, especially if it is only going to be used for small fractions of time throughout the week.

New business models based on shared-use vehicles allow consumers to pay exclusively for driving time and save on refueling, insurance or maintenance costs. In addition, they facilitate the extension of electric cars through cities.

New business models based on shared-use vehicles allow consumers to pay exclusively for driving time and save on refueling, insurance or maintenance costs

The high initial cost is the main obstacle to the expansion of this type of transportation in the domestic sphere. Therefore, the creation and development of startups that offer it is great news for society as a whole. Car2Go, Emov or Zity are some examples, and there are also motosharing platforms with electric motorbikes for shared use.

Digital transformation has come to the world of the automobile to stay. The development of new ICTs brings new, more efficient habits. Everything seems to indicate that pay-per-use will displace ownership when it comes to travel, and that non-polluting vehicles will continue to spread through megacities around the world.

The future offered by technologies based on IoT and Big Data is one of connected, electric and autonomous cars within smart cities, where the parameters of sustainable mobility are recorded in real time, always with the aim of improving and simplifying the lives of their inhabitants.
