Blockchain technology

Diego Martín Moreno    2 April, 2019

In the previous article we talked about the basic concepts of the blockchain, and now we will focus on the current technologies to implement a blockchain solution.

Blockchain and smart contracts were conceived in the 1990s, but it has been in the last nine years that different technological solutions have appeared. At this moment there are many implementations, but we can highlight three:

On the one hand, we have R3's Corda, a blockchain created in 2014 and focused on the finance sector, where there is no monetary concept but there is a notary concept.

We also have Hyperledger, a blockchain born in 2015. Its use is more general, with a modular design, without money, and where the chain or ledger that we create is tied to the design of our smart contract.

And finally, there's Ethereum, a blockchain also established in 2015, with a much more general design, with its own money, ether, and where the network is focused on the design of Dapps (decentralised applications) that can be executed within Ethereum.

Given its technological interest, we will focus on Ethereum. With it we can build a blockchain application on an already flexible network that, moreover, has public test environments where we can construct our applications at practically no cost.

To begin, we should talk about Ethereum, a network with its own currency, ether, which is the key element of the whole system. In order to execute smart contracts, we need to have gas at our disposal, which we obtain with the ether in our Ethereum accounts; this is how we pay the miners who mine for us and execute our smart contracts.
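To get a feel for the numbers (the figures below are illustrative assumptions, not taken from the article): the fee for a transaction is the gas it consumes multiplied by the gas price we offer, and it is paid in ether. A minimal sketch:

```python
# Hypothetical fee calculation: fee = gas used * gas price.
# A plain ether transfer consumes 21,000 gas; we assume a 20 gwei gas price.
GWEI = 10**9    # 1 gwei  = 10^9 wei
ETHER = 10**18  # 1 ether = 10^18 wei

gas_used = 21_000
gas_price_wei = 20 * GWEI

fee_wei = gas_used * gas_price_wei  # total paid to the miner, in wei
fee_ether = fee_wei / ETHER
print(fee_ether)  # 0.00042
```

A more complex smart contract call consumes more gas, so the same arithmetic explains why contract executions must be bounded: every EVM instruction has a gas cost.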

Furthermore, besides ether, Ethereum has the second basic component of the network, the Ethereum Virtual Machine (EVM), which is the software that runs on all the network nodes and allows us to safely execute a smart contract. The EVM can be implemented in different languages; it simply needs to comply with the specifications of the Ethereum Yellow Paper.

At this time two large implementations exist: on the one hand Geth, which is the implementation of the EVM in Go, and on the other Parity, which is the Rust implementation. The biggest advantages of the EVM are:

1. It's deterministic. It always gives us the same result for the same data and the same operations, just as 2+2 always equals 4.

2. It's terminable. Our operations cannot be executed indefinitely, as every instruction executed in the EVM has a gas cost and we have to limit the spending of each smart contract.

3. It's isolated. No one can manipulate the execution of the smart contract.

The nodes and the ether form the base of the system, and now we can deploy our smart contracts, which are programs that can be developed in distinct languages although they are all executed on the same network. Nowadays, Solidity, a mix between C and JavaScript, is the main programming language for smart contracts in Ethereum.

We now have all the elements in place to build our applications in Ethereum: a network with nodes to execute the contracts, currency to pay for the transactions, and a language to write the smart contracts. Now the question arises: what type of blockchain do I want to use?

When we talk about networks within Blockchain, we have 3 types, depending on the privacy of each one:

– Public networks. These are networks that anyone can join. In the case of Ethereum, the public network is called Mainnet, and there are also public test networks such as Ropsten. Their main characteristic is that the information on the network is public.

These types of networks suit public agreements, certificates or any functionality that needs to be in the public domain. Economic costs are also higher on these networks.

– Semi-public networks. This type of network is oriented towards sharing information between distinct organisations, where the smart contract is in charge of managing the relationships between the members (automatically and without supervision), which improves the productivity of the processes.

One of the advantages of this type of network is that it doesn't need as many resources as the public one. There is already an implementation of Ethereum, Quorum, from JP Morgan, designed for semi-public networks.

In Spain there is Alastria, a semi-public network implemented with Quorum.

– Private networks. The last type of network, which would be used within a business that doesn't need to communicate outside of itself, where we want to simplify processes and keep all the information audited.

As we can see, we have distinct types of networks for different uses, and depending on our use case we should select a specific type of network.

All of this is happening today, but we need to be ready for future changes to the technology, for example future changes to Ethereum. In Ethereum there are a variety of projects designed to improve the network, like Plasma, a project to increase Ethereum's capacity to carry out more transactions per second. There's also Casper, born from the idea of reducing network consumption with a new consensus protocol that combines the Proof-of-Stake (PoS) and Proof-of-Work (PoW) algorithms.

As we have seen in this article, blockchain is still a relatively young technology that continues to improve and change with time, but one that is here to stay and will change the solutions we can offer in many cases, just as Big Data technologies have.

Don’t miss out on a single post. Subscribe to LUCA Data Speaks.

You can also follow us on Twitter, YouTube and LinkedIn

New research: we discover how to bypass SmartScreen via COM Hijacking, with no privileges required

ElevenPaths    2 April, 2019
The COM Hijacking technique has a simple theoretical basis, similar to that of DLL Hijacking: what happens when an application searches for a COM object that does not exist on the computer where it is being executed? Or when such an object exists but cannot be found under the registry key where it was first searched for? An attacker may create it with altered information, for instance a path leading the victim to a DLL created by the attacker instead of the one being searched for. We can take advantage of the default order the program uses to search for this object: this is how we have managed to bypass SmartScreen on Windows.

Brief introduction

COM (Component Object Model) is a binary-interface standard for software components that allows communication between processes as well as dynamic creation of objects, regardless of the language used to program them. COM provides a stable ABI (Application Binary Interface) that does not change across compiler versions. This is appealing for C++ developers when the code must be shared with clients using different compiler versions.

COM objects are commonly compiled as a DLL, but the way they are used is particular. COM objects must be unequivocally identifiable at execution time, so the GUID identification method is used.

{CB4445AC-D88E-4846-A5F3-05DD7F220288}


Each COM object is registered under its corresponding GUID, together with one or more keys that provide information on the object itself, such as the real path of its DLL. Usually, COM objects are registered under the registry paths HKLM\SOFTWARE\Classes\CLSID or HKCU\SOFTWARE\Classes\CLSID. There, under the corresponding GUID key, the InprocServer, InprocServer32, InprocHandler and InprocHandler32 registry keys are commonly used to provide the paths to the object's DLL. If the COM object is under the root HKEY_LOCAL_MACHINE (HKLM), it is available to all users on the computer and was created with system admin permissions; those under the root HKEY_CURRENT_USER (HKCU) are valid for the currently authenticated user and were not necessarily created by an admin.

The system's search order is quite interesting. In a typical scenario the user's branch is checked first, followed by the branch of the computer where the application is executed. Let's think of an application that on startup needs to use the functions of the COM object located at the following registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{CB4445AC-D88E-4846-A5F3-05DD7F220288}\InprocServer32

However, before examining there, the application searches for it in the following path:

HKEY_CURRENT_USER\SOFTWARE\Classes\CLSID\{CB4445AC-D88E-4846-A5F3-05DD7F220288}\InprocServer32

If this last key does not exist, we are facing an application vulnerable to COM Hijacking. Performing the technique only involves creating the following structure in the registry:

HKEY_CURRENT_USER\SOFTWARE\Classes\CLSID\
  {CB4445AC-D88E-4846-A5F3-05DD7F220288}\
    InprocServer32\
      (Default) = C:\DLLsMaliciosas\miDLL.dll

COM Hijacking as a persistence technique

Using the COM Hijacking technique to achieve persistence brings several advantages over the other traditional techniques that run at system boot. The best option is to target a native COM object, called every time the system is booted. The main problem here is that native COM objects are usually located under HKCR (classes root) instead of under the user's own registry, so a regular user should not be able to modify them.

The truth is that HKCR is a merged virtual view of what we see in both HKCU and HKLM. This means that if you wish to write a key at

HKCR\CLSID\{A47979D2-C419-11D9-A5B4-001185AD2B89}

you can do it by creating it at

HKCU\Software\Classes\CLSID\{A47979D2-C419-11D9-A5B4-001185AD2B89}

Consequently, to perform the hijack over the native COM object on Windows, the key may be created as shown in the following image, where you can observe how it is immediately propagated.


Since we work under HKEY_CURRENT_USER (HKCU), no admin permissions are needed to perform the attack. Once the registry key is created, the code within the planted DLL will be executed every time the vulnerable application looks up the hijacked COM object and loads the malicious DLL.

Elevating privileges through Event Viewer and Task Scheduler

To elevate privileges through a technique such as COM Hijacking, we must take advantage of a vulnerable application executed as an elevated, high-integrity process. The Event Viewer and Task Scheduler applications launch mmc.exe, an elevated, high-integrity process used by several Windows administration tools. The functionalities mentioned search for COM objects at the following path:

HKCU\Software\Classes\CLSID\{0A29FF9E-7F9C-4437-8B11-F424491E3931}\InprocServer32

What would happen if a COM hijack were performed on such an object? As you can see, the following line achieves the hijack:

powershell.exe -Command {$Path="HKCU:\Software\Classes\CLSID\{0A29FF9E-7F9C-4437-8B11-F424491E3931}\InprocServer32";$Name="(Default)";$Value="C:\MisDLLs\epp1.dll";New-Item -Path $Path -Force;New-ItemProperty -Path $Path -Name $Name -Value $Value}

Once the vulnerable process is launched, it will find the COM object (which in principle had not been registered) and will execute the malicious DLL: in this case, a Meterpreter shell located at "C:\MisDLLs\epp1.dll".

As the vulnerable process is elevated and has a high integrity level, the resulting shell will have SYSTEM privileges without problems. A similar technique has been used to bypass UAC.

Going unnoticed: SmartScreen is vulnerable to COM Hijacking

Some time ago we discovered how attackers were able to bypass SmartScreen by taking advantage of DLL Hijacking techniques. This approach achieves similar effects, but in a different manner. Every time a program is executed on Windows, SmartScreen runs in order to protect us. No matter what program it is, every execution goes through SmartScreen, which queries the cloud to determine whether the program might pose a risk to the system.

Nevertheless, SmartScreen is vulnerable to COM Hijacking.

Every time a binary is executed, SmartScreen is executed as well; in turn, every time SmartScreen is executed, several COM objects are unsuccessfully searched for in the registry. Among them:

HKCU\Software\Classes\CLSID\{A463FCB9-6B1C-4E0D-A80B-A2CA7999E25D}\InprocServer32


Using a simple DLL, a hijack can be performed over this object by executing a command on the PowerShell console.
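Following the same pattern as the Event Viewer hijack shown earlier, the registry entry can be created with a one-liner of this form (a hedged reconstruction; the DLL path is an illustrative placeholder):

```powershell
powershell.exe -Command {$Path="HKCU:\Software\Classes\CLSID\{A463FCB9-6B1C-4E0D-A80B-A2CA7999E25D}\InprocServer32";$Name="(Default)";$Value="C:\MisDLLs\epp1.dll";New-Item -Path $Path -Force;New-ItemProperty -Path $Path -Name $Name -Value $Value}
```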

Following the execution of the previous command, any program executed by the user will trigger SmartScreen, and in turn that process will load and execute the malicious DLL, returning a Meterpreter shell. For the proof of concept, we simply used a DLL that displayed a "Hello world!", killing the process that is supposed to protect us.

It may be used for bypassing and persistence too.

We have informed Microsoft about this issue. They answered that this behavior is by design.

The whole research can be found at:

Innovation and lab in ElevenPaths
www.elevenpaths.com

Don’t give up on human intelligence while adopting the artificial one

Álvaro Alegría    29 March, 2019

Artificial intelligence is here to stay, that's for sure. Just look at the salmon-pink (and not so pink) pages of any newspaper, or take a quick look at your LinkedIn feed, to see that AI is the latest craze in the business world.

And the truth is that there are plenty of reasons for this. The capabilities offered by this new technology make it possible not only to overcome barriers that were previously insurmountable for traditional systems, but also to unlock fields that were previously reserved for science fiction films.

However, this craze for artificial intelligence has revealed a fact that is, to say the least, surprising: when it comes to making decisions, companies seem to rely more on machines than on humans.

No one denies that technology has exponentially higher computing capabilities than the most intelligent human on earth, but it is also true that not all business decisions require those levels of intelligence and, in those cases, the trend seems to be maintained: the “machine” decides better.

To take a categorical position on this question, with a simple true or false, is simply impossible. There are so many conditioning factors and nuances in the context of each decision that, probably, the most appropriate answer is "it depends".

However, the purpose of this article is not to give an answer to that question, but to raise the debate on the reasons that apparently lead us to trust a machine rather than a human. Because, when the decision is made to trust a machine, that machine has been designed and programmed by a human; and the ultimate decision to trust is also made by a human.

From a strictly personal point of view, giving more reliability to machines than to people is due to different factors, not all of them positive.

The first, as we have already mentioned, is the greater computing capacity. Technology can store more, and more varied, information and cross-reference and contrast it at levels that human beings are incapable of.

The second, although it may be obvious, is the lack of humanity. Human beings have feelings, prejudices, biases, experiences that, even if we don’t want to, influence our reasoning capacity and, therefore, condition our decisions. (I don’t judge whether this is negative or positive. I just want to make it clear that, sometimes, this is one of the aspects to be avoided by designating a machine as a decision maker).

The third, and possibly the most controversial, is a certain dose of cowardice. If the decisions are made by a machine, which we have also agreed is less fallible than a human, we humans are released from responsibility for the decisions. Thus, if the result obtained is not as expected, the fault will never be ours, but that of the machine. At most, it may be the fault of the person who designed and programmed the machine, but in no case of the human being who should have taken that decision in the absence of it. In my opinion, this is a dangerous aspect, because the absence of responsibility can lead to irresponsibility.

Be that as it may, the reality, motivated by the speed of technological advances, is that many large companies have gone almost directly from making 100% human decisions, based solely on experience and intuition, to ceding that decision-making capacity to artificial intelligence. And in this process an intermediate step has been overlooked: human decisions based on data.

It is very likely that machines will make more precise, more aseptic and more coherent decisions than humans but, unfortunately, it seems that we have wasted the opportunity to show how good we could have been compared to machines, had we been able to trust ourselves.

In any case, there is something we should never lose sight of: no matter how superior the intelligence of a machine is to that of a human, it will always be artificial.


Groundbreaking study exposes the extent of UK gender bias

Stefan Leadbeater    26 March, 2019

In previous years, various reforms have been used to combat the labour gap and to "make work pay" for all; however, we still find ourselves with a persistent issue. In this blog we will be talking about how advances in Artificial Intelligence have allowed a more in-depth and reliable study of the gender gap and equality in the UK workplace.

In studies using Artificial Intelligence in the past, the internet as a source of data has been largely ignored. This has meant that studies have only been able to focus on a small number of sectors which would inevitably have a negative effect on the quality of the data produced.

However, a study by Glass AI has changed this.

Glass AI is a large-scale Artificial Intelligence system that reads, interprets and monitors the internet.

As a company, Glass AI has deep roots in machine learning and computational linguistics (a field not yet extensively researched) and through this the company aims to create a new AI driven research resource. With this study into the gender bias in UK employment, they have undertaken a “unique experiment in the application of data science”.

It was a first-of-its-kind study in which the artificial intelligence was able to understand written language, which Glass AI states is the first time primary data has been collected in this way. AI was used to read data on over 2.3 million people and the positions they hold in 150,000 companies across over 108 different industries.

The report published by Glass AI explains that the technology was programmed to recognise business websites under the '.uk' umbrella, as well as the gender of the people mentioned on them, their business roles in terms of importance, leadership positions and economic sectors: a wide range of information. The advantage of using the internet as an analytical tool is that Glass AI has been able to offer a more in-depth and detailed insight into the gravity of the gender bias in the UK workplace than was previously available, bringing the real issues to light.

The study has exposed these shocking facts:

  • 82% of all CEOs, 92% of chairpersons, and 7% of directors are male.
  • Support roles often deemed inferior are dominated by women, with 95% of all receptionists, legal secretaries, and care assistants being female.

This doesn't leave very much scope for women to advance professionally within their workplaces, and Sergi Martorell, co-founder of Glass AI, has stated that he is shocked by the sheer scale of the problem. He believes that not enough attention is being paid to it and, in turn, not enough action is taking place.

However, the study did highlight sectors in which women make up the main body of employees, such as education, where 71% of primary and secondary education is led by women. Equally, in veterinary science 78% of employees are women, but this merely highlights that the participation of women in the workplace has improved. It does not detract from the fact that a very low percentage of them are in managerial positions.

Looking to the future, having seen that the findings made by AI matched those published by the Office for National Statistics, the study shows promising steps towards the use of the internet as an accurate data source in future studies. When discussing AI, people's main concern is always accuracy, and once this hurdle is overcome, more and more time and resources can be saved while producing even more accurate results than those of humans.


Chiemsee-Alpenland Tourism and Telefónica NEXT examine visitors to the Chiemsee lake

AI of Things    22 March, 2019

Original post by Cécile Schneider, Telefónica NEXT

The touristic region of Chiemsee-Alpenland and our colleagues at Telefónica NEXT used data analysis to examine the profile of visitors to the Chiemsee lake. In this way, tourism professionals from the Chiemsee lake obtained new insights about the visitors, such as their origin, age distribution and gender, and the average length of their stay.

The study focused on the region of the Chiemsee lake and its island of Herreninsel ("Island of the Gentlemen"), which attracts almost half a million visitors a year. Telefónica NEXT retrospectively evaluated visitor traffic from July 2017 to June 2018, within a radius of 5 km. Amongst other places, this radius also includes the island of Fraueninsel ("Island of the Women") and localities on the shores of the Chiemsee lake such as Prien, Gstadt and Bernau.

Figure 2. Christina Pfaffinger, director of Chiemsee-Alpenland Tourism

Specialized offers directed at visitors 

Thanks to the results obtained, the touristic offer can be adapted to better suit the needs of the visitors. Christina Pfaffinger, director of Chiemsee-Alpenland Tourism, said in this regard: "A real-time analysis of visitors and their countries and regions of origin allows us to make the most of markets of origin and develop specific offers tailored to the visitor profile."

Telefónica NEXT ran this promotion as a free pilot project to test, for the first time in the tourism sector, the opportunity to combine data analysis with local advertising.

Opportunities of data analysis in tourism 

"We received many requests for this project, and we decided on the region of the Chiemsee lake because of the island Herreninsel there, one of the most important German tourist attractions, where we can demonstrate how knowing the visitor profiles better allows you to create better offers with greater precision," commented Jens Lappoehn, director of Telefónica NEXT.

O2 more local 

The pilot project also included an SMS and MMS advertising campaign with 100,000 promotional contacts. For this, they used Telefónica's location-based advertising service O2 More Local. This local advertising only reaches customers who have previously registered for the service. In this way, tourism professionals reach the right people at tailored times and places, above all in the locations from which the majority of clients travel.

Telefónica NEXT generates data from the mobile network of over 45 million users, produced when mobile phones use an internet connection or make calls. The data is always anonymized using a TÜV-certified process consisting of three security levels to ensure that no individual can be identified. The analysis of this data thus reveals patterns of movement in Germany, contributing significant added value for high-traffic areas.


How to win over the Millennial Shopper? A great personalised experience will make them stay

Anshul Kudal    19 March, 2019

Data is essential in creating the personalised experience that attracts Millennial clients. Obtaining and exchanging information, while always maintaining the privacy of the client, is key to continuing to capture and retain their attention.

Millennials, the first digital generation

We know that, depending on historical and economic context, human beings tend to develop similar attitudes and concerns. More than 8 million people in Spain belong to the Millennial generation, and many companies are directing all of their forces into connecting with them; understanding their characteristics is essential to achieving this. Millennials are demanding, digital and reserved, yet also lazy and impatient. They live in a digital world offering a wide and diverse product range, and they are accustomed to immediate communication, purchases and access to information. And it is not enough merely to capture the attention of a Millennial: you must maintain it.

In recent years, marketing professionals have realised that retaining clients is actually more important than acquiring them. As a matter of fact, a recent 2018 e-commerce survey about Spain shows that the conversion rate of existing clients is three times that of new clients.

When it comes to retaining clients, creating a personalised user experience is key: 77% of Millennial clients already consider personalised recommendations extremely useful, and 33% end up buying recommended products. There is nothing like knowing the client and offering them what you know will interest them.

Millennial and Generation Z clients have a very short attention span. These users are unlikely to stay on a web page for more than a few seconds if they cannot find what they are looking for, or on a website that doesn't suit their needs. Also, in a digital world characterised by a wide range of products on offer, these types of clients have countless tools and possibilities at their fingertips to choose what they want, when they want.

So, how do we retain these types of clients? According to experts in this sector, the key is in anticipating their searches and wants, and showing them the products in as little time as possible: creating a great personalised user experience.

Understanding the client and anticipating their needs:

The key to all of this is information. Digital clients generate data online and on other platforms which, while always preserving their privacy, can be used commercially. With sufficient data it is possible to differentiate between the curiosity of the client and the intention of the client, meaning, actually knowing whether a client is interested in a product.

How do you get this information? There are different sources of data available:

  • First-party data, which the user generates on your own web page, application or platform. This could be data from sources such as subscriptions or product searches and purchases.
  • Third-party data, which accumulates on applications and web pages of third parties in the form of cookies and email lists. This information is sold to businesses and is generally of quite poor quality and of dubious legality.
  • Second-party data, from other providers and companies willing to share or sell it. This is of better quality and legal.

Always taking into account that the privacy of the client must not be violated, this last model, which brings together all of the best aspects, is something in which all sellers should invest. To this day, giants such as Oracle, Salesforce and Adobe are investing in marketplaces and platforms to support this exchange of data between businesses.

At Telefónica, one of the biggest telecommunications businesses in the world, a team of data scientists has developed a state-of-the-art solution based on artificial intelligence which can meet marketers' needs using an abundance of company data. Thanks to this, a business can know the needs and the context of its clientele and anticipate when the next purchase will occur. These solutions are fully compatible with GDPR and ensure the privacy of users.

Telefónica, which has access to information and data for a large number of clients, is already positioned as one of the main companies helping businesses understand their clients. They hope that, with their advanced artificial intelligence, their data and their privacy-centred focus, they can help companies create a much more satisfactory digital ecosystem for their clients.


The base rate fallacy, or why antiviruses, antispam filters and detection probes work worse than actually promised

ElevenPaths    17 March, 2019

Before starting your workday, while you're savoring your morning coffee, you open your favorite cybersecurity newsletter and an advertisement for a new Intrusion Detection System (IDS) catches your attention:

THIS IDS IS CAPABLE OF DETECTING 99% OF ATTACKS!

"Hmmm, not bad", you think, while taking a new sip of coffee. You scroll down through a few more news items, when you see a new IDS advertisement:

THIS IDS IS CAPABLE OF DETECTING 99.9% OF ATTACKS!

At first glance, which IDS is better? It seems obvious: the best will be the one capable of detecting the higher number of attacks, that is, the IDS that detects 99.9%, against the 99% one.

Or maybe not? I’m going to make it easier. Imagine that you find a third advertisement:

THIS IDS IS CAPABLE OF DETECTING 100% OF ATTACKS!

This IDS is definitely the bomb! It detects everything!

Ok, it detects everything but… at what price? See how easy it is to obtain an IDS with a 100% detection rate: you only have to tag every incoming packet as malicious. You will obtain 100% detection at the cost of 100% false positives. Here a second actor comes into play, one often overlooked when data on attack detection effectiveness is provided: how many times has the system raised the alarm when there was no attack?

The detection problem
There is a high number of cybersecurity applications that address the challenge of detecting an attack, an anomaly or malicious behavior:

  • IDSs must separate malicious packets from legitimate traffic.
  • Antispam filters must find the junk mail (spam) among regular mail (ham).
  • Antiviruses must discover disguised malware among harmless files.
  • Applications’ firewalls must separate malicious URLs from benign ones.
  • Airport metal detectors must point out weapons and potentially dangerous metallic objects, instead of inoffensive objects.
  • Vulnerability scanners must warn about vulnerabilities in services or codes.
  • Cyberintelligence tools such as Aldara must know whether a conversation in social networks might become a reputational crisis, or if a Twitter account is a bot or is used by a terrorist cell.
  • Log analysis tools must identify correlated events.
  • Network protocol identification tools must correctly tag the packets.
  • Lie detectors must discern if a suspect is telling the truth or lying.
  • And many other applications. You can add more examples in the comments below. 

In spite of their disparity, all these systems share a common feature: they generate alerts when they consider that an attack or anomaly has been found, a True Positive (TP). Unfortunately, they are not perfect and also generate alerts when there is no malicious or anomalous activity, which is known as a False Positive (FP).

The following table shows all the possible response statuses of an IDS when it faces an incident. If the system detects an incident that has actually occurred, it is working correctly: a True Positive (TP) has taken place. However, the system is malfunctioning if the incident has occurred but the system does not warn about it: it results in a False Negative (FN). Similarly, if there is no incident and the system inaccurately reports one, we are facing a False Positive (FP), while we are dealing with a True Negative (TN) if the system stays silent in that case.

                 Incident          No incident
Alert raised     True Positive     False Positive
No alert         False Negative    True Negative

False alerts are as important as detections
Let’s think about any detection system. For instance, an IDS that detects 99% of attacks is capable of tagging as malicious 99% of the packets that are indeed malicious. In other words, the Detection Rate (DR), also known as True Positive Rate (TPR), is 0.99. Conversely, when a non-malicious packet arrives, the IDS is capable of tagging it as non-malicious in 99% of cases, meaning that the False Alert Rate (FAR) −also called False Positive Rate (FPR)− is 0.01. The truth is that in a conventional network the percentage of malicious packets is extremely low compared to legitimate packets. In this case, let’s assume that 1 in 100,000 packets is malicious, a rather conservative figure. Given these conditions, our IDS warns that one packet is malicious. What is the likelihood that it is actually malicious?
Don’t rush to give an answer. Think about it again. 

And think about it carefully once again: do you have an answer? We will reach it step by step. In the following table, you will find all the data for a specific example of 10,000,000 analyzed packets. Of all of them, 1 out of every 100,000 is malicious, that is: 100 packets. 99% of them will have been correctly identified as malicious, that is: 99 packets, while 1% −a single packet− has not been detected and no alarm has been raised: that packet has slipped through the system. The first column is complete. Moreover, the remaining 9,999,900 packets are legitimate. The alarm will have sounded erroneously for 1% of these packets, adding up to 99,999 packets, while for the remaining 99% the system did not sound the alert, that is: it stayed silent for a total of 9,899,901 packets. The second column is ready. Obviously, rows and columns must add up to the totals shown in the table.

With this table, now we are able to quickly answer the previous question: What is the likelihood that a packet will be malicious if the system has tagged it as such?

The answer is provided by the first row: only 99 of the 100,098 generated alerts corresponded to malicious packets. This means that the probability that an alert corresponds to a malicious packet is tiny: only 0.0989031%! You can check the calculations. The system is not right even one out of every thousand times the alarm is raised. Welcome to the false positive problem!
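The arithmetic of the example can be reproduced in a few lines (a sketch using the figures above):

```python
# 10,000,000 packets, base rate 1/100,000, DR = 0.99, FAR = 0.01
total = 10_000_000
malicious = total // 100_000       # 100 malicious packets
legitimate = total - malicious     # 9,999,900 legitimate packets

tp = malicious * 99 // 100         # DR = 0.99 -> 99 correctly flagged
fn = malicious - tp                # 1 packet slips through
fp = legitimate // 100             # FAR = 0.01 -> 99,999 false alarms
tn = legitimate - fp               # 9,899,901 correctly silent

alerts = tp + fp                   # 100,098 alerts in total
precision = tp / alerts            # P(malicious | alert)
print(f"{precision:.7%}")          # 0.0989031%
```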

Many people are shocked by this result: how is it possible that it fails so badly if the detection rate is 99%? Because the legitimate traffic volume is overwhelmingly higher than the malicious traffic volume!

The following Venn diagram helps to better understand what is happening. Even though it is not to scale, it shows how legitimate traffic (¬M) is much more frequent than malicious traffic (M). Indeed, it is 100,000 times more frequent. The answer to our question can be found in the ratio between the 3) area and the whole A area. This 3) area is quite small compared to A. Consequently, the fact that an alarm is raised does not mean much in absolute terms regarding the danger of the analyzed packet.

The most paradoxical fact is that no matter how much the Detection Rate of this hypothetical IDS improves, nothing will change until its False Alert Rate decreases. Even in the extreme case, assuming DR = 1.0 with the remaining parameters left untouched, when the IDS sounds an alarm, the probability that this alarm corresponds to a real attack will remain insignificant: 0.0999011%! As we can see, it does not even reach one per thousand. For this reason, IDSs have such a bad name: if an IDS is only right one out of every thousand times it warns, eventually you will end up ignoring all its alerts. The only solution is to improve the False Alert Rate, bringing it as close to zero as possible.

The following graphic shows how detection effectiveness evolves; in other words, how the probability P(M|A) that there is real malicious activity (M) when the system raises an alert (A) changes as the False Alert Rate P(A|¬M) decreases. Once the graphic has been examined, it is evident that however much the detection rate improves, the system will never exceed the maximum possible effectiveness… unless the False Alert Rate decreases.

In fact, the results are bleak to an extent: even with a perfect 100% Detection Rate (P(A|M) = 1.0), to reach an effectiveness higher than 50%, P(M|A) > 0.5, it would be necessary to reduce the False Alert Rate below 1e-05, a feat not likely to be achieved.
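This threshold can be checked with the underlying Bayes formula (a sketch; the function name is ours):

```python
def precision(br: float, dr: float, far: float) -> float:
    """P(malicious | alert) = BR*DR / (BR*DR + (1-BR)*FAR)."""
    return br * dr / (br * dr + (1 - br) * far)

br = 1e-5                                  # 1 malicious packet per 100,000

# Even a perfect detector stays useless while FAR = 0.01:
print(f"{precision(br, 1.0, 1e-2):.7%}")   # 0.0999011%, the figure above

# To push P(M|A) above 50% with DR = 1.0, FAR must fall below ~1e-05:
print(f"{precision(br, 1.0, 1e-5):.2%}")   # 50.00%
```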

In summary, it becomes clear that the effectiveness of detection systems (for malware, network attacks, malicious or spam URLs) does not depend so much on the system’s capability to detect intrusive behavior, but on its capability to discard false alerts.

Why you cannot ignore the Base Rate when evaluating the effectiveness of a system intended to detect any type of malicious behavior

When evaluating the behavior of an IDS, three variables are interconnected:

  • Detection Rate: how well the system identifies a malicious event as such. Ideally, DR = 1.0. This information is usually highlighted by the manufacturer, particularly when it is 100%.
  • False Alert Rate: how well the system identifies a legitimate event as such, without tagging it as malicious. Ideally, FAR = 0.0. In practice, this value is usually far from being ideal. It is common to find values between 1% and 25%.
  • Base Rate: what percentage of events are malicious in the context of the study. The higher this value is −in other words, the more dangerous the environment is−, the more effective the IDS will seem, just because there are more malicious events, so tagging any event as malicious will logically increase the percentage of right detections. The same IDS in two different contexts, one with a high base rate (many attacks in the background) and the other with a low base rate (few malicious events), will seem to magically improve its effectiveness. Actually, all that happens is that the more attacks you receive, the more often you will be right by tagging anything as an attack. 
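The base-rate effect can be illustrated numerically: the same hypothetical IDS (DR = 0.99, FAR = 0.01) is evaluated in three environments that differ only in their base rate (a sketch; the function name is ours):

```python
def p_malicious_given_alert(base_rate: float, dr: float = 0.99, far: float = 0.01) -> float:
    """Bayes: probability that an alert corresponds to a real malicious event."""
    return base_rate * dr / (base_rate * dr + (1 - base_rate) * far)

# Nothing about the detector changes, only the environment:
for br in (1e-5, 1e-3, 1e-1):
    print(f"base rate {br:g}: P(M|A) = {p_malicious_given_alert(br):.2%}")
# Precision climbs from ~0.1% to ~9% to ~92% as the base rate rises.
```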

Manufacturers tend to highlight the first value and leave out the remaining two. That said, the False Alert Rate is as important as the Detection Rate. A given organization may waste thousands of hours investigating false alerts in malware, IDSs, logs, spam, etc. Important e-mails or critical files might be kept in quarantine, or even deleted. In the worst-case scenario, when the FAR is very high, validating the alerts becomes so annoying and resource-intensive that the system may be disabled or completely ignored. You have fallen into alarm fatigue!

At a time when most of the manufacturers of this type of tools reach detection rates close to 100%, pay attention to False Positives as well. 

And remember that the effectiveness of any solution will be completely conditioned by the base rate. The less prevalent a problem is, the more times the system will falsely shout: “The Wolf is coming!”.

Gonzalo Álvarez Marañón
@gonalvmar
Innovation and Labs (ElevenPaths)
www.elevenpaths.com

AI will enable doctors to spend more time with their patients

AI of Things    15 March, 2019
The technical revolution within the NHS suggested in this report could not have come at a better time. We have doctors and GPs overburdened with more and more patients and simply not enough staff to deal with them and the administrative work that comes with them. The government seems to think that various forms of artificial intelligence, including robotics, are the answer. However, as many people fear, this is not the replacement of doctors with robots, but merely the use of technology as a tool to aid healthcare professionals and to ease the burden placed on them.

Using AI directly in the treatment of patients is clearly going to be a longer-term aspiration, as a result of the aforementioned as well as the need for extensive testing and various ethical issues to take into account. This means that in the short term the focus, realistically, will be on something that can only be described as ‘Virtual Assistants’.
These ‘Virtual Assistants’ can be utilised in various tasks, both administrative and technical, enabling healthcare professionals to focus more on the patient’s wellbeing and being able to dedicate more time to their problems and finding solutions to these.

On one hand, the NHS hopes that AI will drastically help with the administrative workload of the 111 system in England. This is a service where the public with non-urgent problems can call and be advised on whether to attend hospital, see a GP or to remain at home.
They have already tested AI Chatbots using technology from Babylon which would act as the first point of contact for people trying to contact medical services. Using extensive algorithms, the bots can process millions of different symptoms and recommend the best course of action. This would in turn prevent the ‘time wasting’ of the people who cause overcrowding with nothing more than a common cold.
Alternatively, the use of AI can save a lot of time by carrying out the menial tasks previously done by doctors or nurses. For example, the dispensing of medication or the sending of reminders to patients for appointments. This would be beneficial on two fronts, primarily it would reduce the workload of healthcare professionals giving them more time to spend with patients, and secondly it would reduce human error.
Additionally, there have been huge advances in using artificial intelligence to aid the interpretation of scans and the recommendation of treatment. A report by UCL (University College London) describes an AI system, developed by UCL, DeepMind Health and Moorfields Eye Hospital, that can identify over 50 eye diseases and recommend treatment. It is not only eyes, however: many other medical issues can be prevented through early detection and treatment, so the co-existence of experts and technology should be something the public desires. Given enough accurate data, AI should be able to process information as quickly as, or even faster than, doctors and with the same level of accuracy – a vital tool when health is involved.
However, as good as this all sounds, not everybody is convinced. The change from talking to human beings to talking to an algorithm is, for some, a bitter pill to swallow. People’s main concern is the room for error, as one wrong piece of advice could mean the difference between life and death in circumstances where a trip to the hospital was necessary but not advised. ‘Doctor Murphy’ is an NHS consultant who has reported on many questionable results from this technology, including a 48-year-old overweight man with sweating and chest pains. We would hope that the recommendation would be to go straight to hospital; however, the AI chatbot advised a GP appointment.
For people to gain confidence in the technology, consistency of results is paramount, but with this technology still in its infancy, errors are becoming ever more commonplace… a fatal mistake in medicine. If, however, these errors can be eliminated and mutual trust created between the NHS and the public, the report states that these virtual assistants could save 5.7 million hours of GPs’ time each year throughout England.
However, as Jo Best stated in an article on the use of AI in the NHS,

´´An AI system is only as good as the data it is drawing upon to learn from´´.

In a national health system where paper-based notes are still largely the norm and different IT systems being used around the country, accuracy and consistency of data are near impossible to achieve. If we don’t start compiling a nationwide data set and training medical personnel in new technologies, we simply won’t be able to exploit the technology to its fullest.

Artificial Intelligence will not replace healthcare professionals but will merely enhance their abilities and take off some of the pressure currently felt within the healthcare industry. AI is definitely something which will affect us, the public, in a positive way: not only will medical personnel have more time to consult with us, their abilities will be enhanced and complemented by technology, providing further peace of mind – a clear step in the right direction for Artificial Intelligence. Additionally, the future looks bright, with exciting emerging technologies such as VR (Virtual Reality), which experts say could be used in place of doctors to help treat mental conditions such as PTSD or phobias. They say this will further combat the congestion felt within the NHS and its impact on limited patient time.

You can also follow us on Twitter, YouTube and LinkedIn

If you want to change your employees’ security habits, don’t call their will, modify their environment instead

ElevenPaths    13 March, 2019
You’re in a coffee bar and you need to connect your smartphone to a Wi-Fi network, so you check your screen and see the following options. Imagine that you know, or can ask for, the key in case one is requested. Which one would you choose?
Wi-Fi networks image
Depending on your security awareness level, you will choose either the first one, mi38, which seems to have the best signal, or v29o, which has a decent signal and is secured with a password. Imagine now that you are in the same coffee bar, but this time you see the following list of Wi-Fi networks on your smartphone screen. Which one would you choose now?

Wi-Fi networks colors image
Whether your security awareness level is high or not, I’m pretty sure that you would choose 3gk6. What has changed? They are the same Wi-Fi networks, but presented in a different manner. You are not even aware of it, but this presentation will have influenced your decision. Welcome to the power of the nudge!
Those nudges that sway your decisions without your knowledge
In 2008 Richard Thaler and Cass Sunstein published Nudge: Improving Decisions about Health, Wealth, and Happiness, a book that helped to popularize the “Nudge theory” and the concept of “choice architecture”. In this book the authors postulate that by carefully designing the options shown to the public, as well as the way such options are presented or framed, we subtly influence the decision made, without limiting the freedom of choice.
According to the authors, a nudge is: “any aspect of the choice architecture that alters people’s behavior in a predictable way without forbidding any options or significantly changing their economic incentives.”
This book includes dozens of success cases of nudges and choice architectures: fly images etched on urinals that reduced spillage on men’s bathroom floors; fruits and vegetables placed at the front of the self-service line, which increased their purchases more than when they were placed at the end; displays along the road showing the speed of approaching vehicles, making drivers slow down; forms presenting organ donation as the default option upon death, which achieve vast differences in donation figures between countries; and I could go on. This book is really fun and illustrative.
As we previously saw in articles of this series, our rationality is limited and our decisions are systematically subject to biases and heuristics that produce undesirable results in some complex situations. Nudges are supported by the theoretical framework of two cognitive systems: System I and System II. The main feature of these nudges is that they exploit our irrationality.
Over the years, the concept of nudge has been shaped out and new definitions have appeared. A particularly useful definition is the behavioral scientist P. G. Hansen’s one: “A nudge is a function of (I) any attempt at influencing people’s judgment, choice or behavior in a predictable way (II) that is motivated because of cognitive boundaries, biases, routines, and habits in individual and social decision-making posing barriers for people to perform rationally in their own self-declared interests, and which (III) works by making use of those boundaries, biases, routines, and habits as integral parts of such attempts.”
This definition suggests the following features of nudges:
  • They produce predictable results: they influence towards a predictable direction.
  • They fight against irrationality: they intervene when people don’t act rationally in their self-interest due to their cognitive boundaries, biases, routines and habits.
  • They tap into irrationality: they exploit people’s cognitive boundaries, heuristics, routines and habits to influence towards a better behavior.
Let’s go back to the first example presented on Wi-Fi networks. According to Hansen’s definition, we can observe how the second way used to present the network list affects as follows:
  • It produces predictable results: more users turn to the most secure choices.
  • It fights against irrationality: it fights against the unthinking impulse to connect, which may be satisfied by the first Wi-Fi network with a good signal appearing in the list, regardless of whether it is open or secured.
  • It taps into such irrationality: green elements are seen as more secure than red ones, we privilege the first options of a list against the last ones, we pay more attention to visual cues (locks) than to textual ones, we privilege (the supposed) speed over security, etc.
And all this by showing the same networks, without forbidding any option or changing users’ economic incentives. That is to say, all these biases are being tapped to display the preferable option in the first place of the list, in green, including a lock in addition to text; and prioritizing by security as the first criterion as well as by connection speed as the second one. Ultimately, biases are analyzed, and a nudge is designed to tap into them, while respecting choice freedom.
Several research works have put this Wi-Fi network experiment into practice, successfully modifying users’ behaviors towards more secure choices. In all cases, these studies reached similar conclusions:
  • Well-designed nudges have the power to influence decisions.
  • This capacity for modifying behavior is greater as the probability that the user shows insecure behaviors increases.
  • The power to alter behavior increases if several types of nudges are combined, thus engaging both System I and System II.
How to influence your employees’ security behavior
Every day, people within your organization deal with a wide range of security decisions, in addition to choosing a secure Wi-Fi network:
  • If I download and install this app, will it involve a risk for my security?
  • If I plug this USB into the laptop, will it be an input vector for viruses?
  • If I create this short and easy-to-remember password, will it be cracked?
This is why security policies exist: they guide users’ behavior by requiring them to act as securely as possible within the organization’s security context and aims. However, are there other alternatives? Is it possible to guide people’s security choices while respecting their self-determination and without limiting their options? In other words: can we get them to act securely without them being aware that they are being influenced, and without them feeling that their freedom is being limited?
According to R. Calo, professor specialized in cyberlaw, there are three types of behavioral intervention:
  1. Codes: they involve manipulating the environment to make the undesirable (insecure) behavior (almost) impossible. For instance, if you want your system’s users to create secure passwords, you may refuse any password that does not follow the password security policy: “at least 12 characters long, including alphanumeric and special characters as well as upper and lower cases, and not repeating any of the last 12 passwords”. By doing so, the user has no choice but to comply, or they will not be able to access the system. In general, all security guidelines that leave no options fall into this category: blocking USB ports to prevent potentially dangerous devices from being connected; restricting browsing to a whitelist of sites; limiting the size of e-mail attachments; and many others typically foreseen within organizational security policies. Codes are really effective at modifying behaviors, but they neither leave choice nor exploit limited rationality, so they cannot be considered “nudges”. In fact, many of these measures are unpopular among users and may lead to workarounds that completely defeat their purpose, such as writing complex passwords on a post-it stuck to the monitor: by doing so, users are protected against remote attacks, but in-house attacks are eased and even fostered.
  2. Nudges: they exploit cognitive biases and heuristics to influence users towards wiser (more secure) behaviors. For example, going back to passwords, if you want your system’s users to create more secure passwords according to the security policy guidelines previously mentioned, you can add a password strength indicator to signup forms. Users feel the need to get a stronger password, so they are more likely to keep adding characters until the result is a flamboyant green “robust password”. Even though the system does not forbid weak passwords, thus respecting users’ self-determination, this simple nudge drastically increases the complexity of the created passwords.
passwords image
  3. Notices: they are purely informative interventions intended to prompt reflection. For instance, the new-password form may include a message reporting on the expected characteristics of new passwords, as well as on how important strong passwords are to prevent attacks, etc. Unfortunately, informative messages are quite ineffective, since users tend to ignore them and often do not even consider them intelligible. These notices cannot be considered “nudges” either, since they exploit neither biases nor cognitive boundaries. Nevertheless, their efficacy can be notably increased if they are combined with a nudge: for instance, by including the message and the strength indicator on the same password creation page. These hybrid nudges are aimed at engaging System I, quick and fast, as well as System II, slow and thoughtful, through informative messages.
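As an illustration of the strength-indicator nudge described in point 2, here is a minimal sketch; the scoring criteria loosely follow the password policy quoted above, but the thresholds and labels are our own assumptions, not any standard:

```python
import re

def strength(password: str) -> str:
    """Score a password on four illustrative criteria and return a label."""
    score = 0
    if len(password) >= 12:
        score += 1
    if re.search(r"[a-z]", password) and re.search(r"[A-Z]", password):
        score += 1
    if re.search(r"\d", password):
        score += 1
    if re.search(r"[^a-zA-Z0-9]", password):
        score += 1
    return ["weak", "weak", "fair", "good", "robust"][score]

print(strength("hunter2"))           # weak
print(strength("Tr0ub4dor&3xtra!"))  # robust
```

Shown next to the input field, a label like this nudges users to keep adding characters until the indicator turns green, without ever forbidding a weak password.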
Therefore, to ensure the success of a behavioral intervention, it is desirable to engage both types of processes.
The most effective nudges in the field of information security
Hybrid nudges are the most effective ones, since they combine thought-provoking information with some other cognitive trick that exploits biases or heuristics:
  • Default options: provide more than one option, but always make sure that the default option is the most secure one. By doing so, even if you allow users to select a different option, most of them will not do it.
  • (Subliminal) information: a password creation website induces users to create stronger passwords if it shows images of hackers, or simply of eyes, or even if the text is modified: “enter your secret” instead of “enter your password”.
  • Targets: present a goal to the user, for instance a strength indicator, a percentage indicator, a progress bar, etc. This way, they will strive to accomplish the task. This type of intervention can also be categorized as feedback.
  • Feedback: provide users with information so that they understand whether each action is achieving the expected result while a task is being executed. For example, by reporting on the security level reached during the set-up process of an application or service, or on the risk level of an action before tapping on “Send”. Mind you, the language must be carefully adapted to the recipient’s skill level. For instance, in one research work, thanks to the use of metaphors such as “locked doors” and “bandits”, users understood the information better and consequently made better choices. In another study, researchers confirmed that periodically informing Android users about the permissions used by their installed apps made them check the permissions they had granted. In yet another study, the same researchers informed users about how their location was being used; consequently, users limited apps’ access to their location. In a further study, reporting how many people could view a post in social networks led a high number of users to delete the post in order to avoid later regret.
  • Conventional behavior: show each user’s position in relation to the users’ average. Nobody likes to lag behind; everyone wants to be above average. For instance, following a password selection, the message “87% of your colleagues have created a strong password” makes those users who had created a weak password reflect and create a more secure one.
  • Order: present the most secure option at the top of the list. We tend to select the first option we see.
  • Standards: use pictographic conventions: green means “secure”, red indicates “danger”, a lock represents security, and so on.
  • Prominence: by highlighting the most secure options you attract people’s attention to them, making their selection easier. The more visible an option is, the higher its probability of being selected.
  • Frames: you can present an action’s result as “making a profit” or “avoiding a loss”. Loss aversion tends to be a more powerful motivator.
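Several of these nudges (order, standards, prominence) can be combined in a few lines. The SSIDs below are the made-up ones from the opening example; the signal values are our own assumptions:

```python
networks = [
    {"ssid": "mi38", "secured": False, "signal": 5},
    {"ssid": "v29o", "secured": True,  "signal": 3},
    {"ssid": "3gk6", "secured": True,  "signal": 4},
]

# Order nudge: secured networks first, then by signal strength.
nudged = sorted(networks, key=lambda n: (n["secured"], n["signal"]), reverse=True)

for n in nudged:
    colour = "green" if n["secured"] else "red"   # standards: green = secure
    lock = "[lock] " if n["secured"] else ""      # visual cue beats plain text
    print(f"{colour:>5} {lock}{n['ssid']} (signal {n['signal']})")
```

The list still shows every network, so no option is forbidden; the secure choice simply lands first, in green, with a lock icon.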

Nudges’ ethical implications

As you may imagine, this issue is not without ethical implications, since you are creating interventions with the aim of influencing users’ behavior by exploiting the gaps in their cognitive processes. In short, you are hacking users’ brains.
The researchers K. Renaud and V. Zimmermann have published a full paper where they explore nudges’ ethical guidelines in the broadest sense. They state a number of general principles for creating ethical nudges, so before setting out to design your own organizational nudges, I recommend that you think about the following five ethical principles:
  1. Autonomy: the end user must be free to choose any of the provided options, regardless of the direction in which the nudge points. In general terms, no option should be forbidden or removed from the environment. If any option needs to be limited for security reasons, this must be justified.
  2. Benefit: the nudge must only be deployed when it provides a clear benefit, so that the intervention is totally justified.
  3. Justice: as many people as possible must benefit from the nudge, not only its author.
  4. Social responsibility: both the nudge’s anticipated and unanticipated results must be considered. Pro-social nudges advancing the common good must always be contemplated.
  5. Integrity: nudges must be designed with scientific support, whenever possible.

Use nudges for Good

Nudges are becoming more common in the field of cybersecurity, with the aim of influencing people to choose the option that the nudge designer considers the best or most secure one. New choice architectures are being explored as a means of designing better security decision-making environments without drawing on restrictive policies or limiting options. Even though neutral design is a fallacy, be cautious and ethical when designing your organizational nudges: design them to help users overcome those biases and heuristics that endanger their decisions on privacy and security.
Push your organization to achieve greater security, always respecting people’s freedom.
Gonzalo Álvarez Marañón
Innovation and Labs (ElevenPaths)

Reinforcement learning… the new player

Rubén Granados    11 March, 2019

Here we would like to introduce a ‘new’ guest: reinforcement learning (RL). Not that new, in fact, since it has been with us since the 1980s.

We will quickly go over the classic approaches, which until now have been sufficient to tackle the majority of Machine Learning problems. Their main characteristics are very apparent to anyone who has had minimal contact with the area: in supervised learning we have labelled data (numerical values for problems such as regression, or categories for classification) from which the algorithm learns. In the case of unsupervised learning, we have unlabelled data, and the objective is to discover structures or patterns within it (for example, in clustering or segmentation problems).

So where does reinforcement learning come into play? These types of models are used when we do not initially have data to learn from, either because it does not exist or because we cannot wait to collect it. It may also be that the data changes too quickly, and the output changes more frequently than models can typically keep up with.

We can better understand the concept if we go back to the origins of reinforcement learning, which lie in the study of animal behaviour. The common example is the newborn gazelle, which is capable of learning how to walk and run within a few minutes, without any previous knowledge or being shown how to use its legs. Its learning method consists of trial and error: interacting with its environment and learning which movements are beneficial and which are not, all driven by its goal, in this case its will to survive.

Examples of current problems that fit these characteristics include robotic control (where a robot learns for the first time how it should move), and interactive video games or classic games (where there are many possibilities and the situation is constantly changing), in which, like the gazelle, the objective is to maximise some notion of reward (which, depending on the game, may be killing as many zombies as possible).

Although the concept of this type of learning isn’t as clear-cut as those of supervised and unsupervised learning, the key difference remains clear: whilst classic methods have data from which they can learn, RL algorithms generate their own data from experience, by trial and error, in order to identify the best strategy or sequence of moves, using the information (positive or negative) obtained from the environment and their actions as reinforcement. To summarise, RL models live through their own experiences and learn from them, whilst the others are given examples from which they must learn. And, most importantly, systems based on RL can keep learning from their environment, without being tied to rules or models learned in the past.

We could now look at the possible machine learning strategies:

Figure 1. Machine learning strategies

The basic working scheme of a Reinforcement Learning model can be seen in Figure 2. The agent is the RL algorithm that makes decisions about how to behave in its environment. The environment is the world in which the agent operates, representing the universe of possibilities or situations that can arise at a given moment. The state is the agent’s characterisation of the situation at a given moment. The actions are the tasks that the agent carries out in the environment. And, lastly, the reward is what guides the agent: it is associated with the executed actions, in the form of feedback from the environment.

Figure 2. Feedback to the environment.
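The agent–environment loop described above can be sketched with minimal tabular Q-learning in a toy corridor environment (all names, rewards and parameters here are illustrative, not from any of the applications mentioned):

```python
import random

N_STATES = 5                 # a toy corridor: states 0..4, reward at state 4
ACTIONS = [-1, +1]           # move left / move right
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(500):
    s = 0                                    # the agent starts at the left end
    while s != N_STATES - 1:
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)      # environment reacts
        r = 1.0 if s_next == N_STATES - 1 else 0.0     # reward as feedback
        # update the estimated long-term value of taking action a in state s
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# the greedy policy learned for the non-terminal states
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)   # converges to [1, 1, 1, 1]: always move right, towards the reward
```

Note that the agent is never told how to reach the reward: like the gazelle, it discovers the useful actions purely by trial, error and feedback.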

Among the possible application scenarios for this type of learning, we can highlight all those that used to require human action and cannot be reduced to a collection of rules or traditional ML models. To mention a few: robotic process automation (like the industrial robot from Fanuc, which learned by itself to grab objects from containers), the packaging of materials to be shipped, autonomous vehicle driving, digital marketing (where a model can learn to serve personalised adverts and adapt them in the moment to users based on their activity), chatbots (used to understand user reactions), finance (where it can be used to evaluate trading strategies with the objective of maximising the value of financial portfolios), etc. Another typical example of RL applications are algorithms that learn to play games, as in the case of AlphaGo, the first algorithm to beat a human world champion at the famous Chinese board game Go. The image shows a complete map of the immense possibilities of RL.

Figure 3. Various tasks RL can help us complete

Overall, there is more to life than the classic supervised and unsupervised ML models, as the variety of tasks RL can help us complete shows.

Written by Alfonso Ibáñez and Rubén Granados

Don’t miss out on a single post. Subscribe to LUCA Data Speaks.

You can also follow us on Twitter, YouTube and LinkedIn