#CyberSecurityPulse: Changing stereotypes in the security sector

ElevenPaths    12 June, 2018

Ripples of outrage spread across the cybersecurity industry last week after women in red evening gowns were seen promoting a product at the Infosecurity Europe 2018 conference. The event's organisers condemned the move, saying vendor contracts ban the use of so-called 'booth babes'. Thankfully, this behaviour is the exception. In fact, there is a growing perception of greater gender diversity in the sector: more women are participating in conferences, and multiple programmes and initiatives are being implemented, including a renewed focus on recruitment.

The value of a diverse workforce is becoming a reality. Change is taking place at the enterprise level, as companies accept that gender diversity can enhance their overall capabilities. Unfortunately, the industry as a whole is nowhere near gender parity or equality, but efforts are being made to attract women to the field of cybersecurity, including initiatives in schools designed to attract female talent early on.

Many issues remain unresolved, but the topic is gaining relevance. While the numbers are still stagnant, the industry is realising that at the root of the problem is the need for culture to change. That change is going to take years, so approaching the issue with an entrepreneurial and collaborative mindset and using data to make decisions will be essential.

More information available at SC Magazine

Highlighted News

Lawmakers renew push to preempt state encryption laws


A bipartisan group of lawmakers is renewing a push for legislation to block states from mandating that technology companies build backdoors into devices they produce in order to allow law enforcement access to them. The measure is designed to preempt state and local governments from moving forward with their own laws governing encryption before the federal government acts on the issue. Specifically, the legislation would prohibit state and local governments from mandating that tech companies “design or alter the security functions in its product or service to allow the surveillance of any user of such product or service, or to allow the physical search of such product, by any agency or instrumentality of a State, a political subdivision of a State or the United States,” according to a copy of the bill.

More information available at Congress.gov

South Korean Cryptocurrency Exchange Coinrail hacked


South Korea-based cryptocurrency exchange Coinrail announced on Sunday a cyber-incident during which an intruder made off with a large amount of ICO tokens stored on the company’s servers. The exchange announced the hack via a message on its website where it admitted a hacker stole tokens issued during the initial coin offerings (ICOs) of Pundi X (NPXS), NPER (NPER), and Aston (ATX), which were being traded at the time on its servers. As soon as it detected the intrusion, Coinrail put its portal in maintenance mode. The exchange said it secured and moved most of its cryptocurrency assets in cold storage (offline) wallets. While hacked exchanges often went under in the early days of Bitcoin, nowadays, with pressure from authorities, most offer compensation plans for affected users. Coinrail did not publish any information about compensation plans.

More information available at Coinrail

News from the rest of the week

Google Patches 11 Critical Android Bugs in June Update

Google patched 57 vulnerabilities affecting the Android operating system and kernel and chipset components tied to third-party firms MediaTek, NVIDIA and Qualcomm. Eleven of the bugs are rated critical and 46 are rated high. Google said the most severe of the vulnerabilities are remote code execution bugs (CVE-2018-9341, CVE-2018-5146 and CVE-2017-13230) in the Android media framework “that could enable a remote attacker using a specially crafted file to execute arbitrary code within the context of a privileged process.”

More information available at Android

Banking trojans replaced ransomware as top email-based payload in Q1

The concept of infecting targeted users with banking trojans has been so successful in the recent past that in the first quarter of 2018, banking trojans overtook ransomware as the top malicious payload distributed through email. In all, banking trojans accounted for 59 percent of all malicious email payloads in the first quarter of 2018, which also saw email-based malware attacks rise significantly. A new report from Proofpoint has shown that the number of firms receiving more than 50 email-based malware attacks grew by 20 percent compared to the last quarter of 2017.

More information available at Proofpoint

InvisiMole Spyware is a powerful malware that went undetected for at least five years

Malware researchers from ESET have spotted a new sophisticated piece of spyware, tracked as InvisiMole, used in targeted attacks in Russia and Ukraine over the last five years. According to the researchers, the authors of the InvisiMole spyware have removed any clue that could attribute the malware to a specific actor; the only exception is the compilation date of a single file (October 13, 2013). Compilation dates for all the remaining files have been removed by the authors.

More information available at Github

Other news

Ticketfly Confirms 27M Accounts Exposed

More information available at CNet

Dozens of Vulnerabilities Discovered in DoD’s Enterprise Travel System

More information available at Darkreading

Apple will let users run iOS apps on macOS

More information available at The Hacker News

Big Data Analytics, but what type?

AI of Things    11 June, 2018
In today's climate, businesses are already aware that if they don't make the most of their data, they will be left behind by the competition. They know that traditional Business Intelligence (BI) systems are no longer enough. All around them, they hear people talk about Big Data, Data Analytics, Data Science and more. They read leading consultancy reports that predict that an increasing number of businesses will enter these industries in the coming year. They have started to invest resources in their data storage. However, after all this, they don't know how to draw out value from this information.


As you can imagine from the title, in this post we are going to explore the different types of Big Data Analytics, what they consist of, how they can be used in the market, and how specific businesses are currently using them. All this information will give us an insight into which type is best suited to specific businesses.

The first step is to decide which type of analytics your business needs. This is not a trivial question, since there are no “universal analytics” that work for each case. Traditionally, one would work with analytical tools in a reactive way. These tools are capable of generating reports and visualizations about what has happened in the past, but they do not offer useful information about possible business opportunities or problems that may arise in the future. This led to a need for a movement towards Predictive Analytics as well as the Descriptive Analytics that already existed. The world saw a move from linear analytics in a controlled environment, towards analytics that can be applied in a real world (i.e. less structured) environment.
This need has yet to be fully met, as shown by the fact that technology consultancy IDC estimates a Compound Annual Growth Rate (CAGR) of 26.4% for the Big Data and Analytics industry through the end of 2018.
Figure 1: Analytics.
In this article we will define Descriptive, Predictive and Prescriptive analytics in order to reveal what each type can offer to businesses who want to improve their operational capabilities.

1) Descriptive Analytics

    
This is the most basic area of analytics, and is currently used by around 90% of businesses. Descriptive Analytics answers the question: What has happened? It analyzes historical data and data collected in real time in order to generate insights about how past business strategies have worked (for example, a marketing campaign).
  • Aim: To identify the causes that led to success or failure in the past in order to understand how they might affect the future
  • Based on: Standard aggregate functions of the database. They require a basic level of mathematics
  • Examples: This type of analytics is often used for social analytics and is the result of basic arithmetic operations such as average response time, page views, follower trends, likes etc
  • Application: Using tools such as Google Analytics to analyze whether a promotional campaign has worked well or not, with the use of basic parameters such as the number of visits to the page. The results are usually visualized on “Dashboards” that allow the user to see real-time data and send the reports to others.
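As a quick illustration of this first level, the following sketch computes a few of those aggregate metrics with pandas; the dataset and column names are invented purely for the example.

```python
# Minimal descriptive-analytics sketch (hypothetical data and column names).
import pandas as pd

# Imagine a web-analytics export: one row per visit to a campaign landing page.
visits = pd.DataFrame({
    "date": pd.to_datetime(["2018-05-01", "2018-05-01", "2018-05-02", "2018-05-02"]),
    "page_views": [3, 5, 2, 7],
    "response_time_ms": [210, 180, 260, 190],
    "converted": [0, 1, 0, 1],
})

# Standard aggregate functions answer the question "what has happened?"
daily_summary = visits.groupby("date").agg(
    num_visits=("page_views", "size"),
    avg_page_views=("page_views", "mean"),
    avg_response_time_ms=("response_time_ms", "mean"),
    conversion_rate=("converted", "mean"),
)
print(daily_summary)
```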

2) Predictive Analytics

This is the next step in refining data into insights, and according to Gartner, 13% of organizations use such techniques. This type of analytics answers the question: What may happen in the future based on what has happened in the past? It analyzes historical trends and data models in order to try to predict how they will behave in the future. For example, a company can predict its growth by extrapolating previous behavior and assuming that there will be no relevant changes in its environment. Predictive analytics offers better recommendations and answers to questions that BI cannot answer.
  • Aim: To identify the causes that led to success or failure in the past in order to understand how they might affect the future. This can be useful when setting realistic business objectives and can help businesses to plan more effectively
  • Based on: They use statistical algorithms and Machine Learning to predict the probability of future results. The data that feeds these algorithms comes from CRMs, ERPs or human resources. These algorithms are capable of identifying relationships between different variables in the dataset. They are also capable of filling gaps in information with the best possible predictions. However, despite being the “best possible”, they are still only predictions
  • Examples: One usually uses this type of analytics for Sentiment Analysis. Data enters a machine learning model in the form of plain text and the model is then capable of assigning a value to the text referring to whether the emotion shown is positive, negative or neutral
  • Application: Often in the financial sector in order to assign a client with a credit score. Retail companies also use this type of analytics to identify patterns in the purchasing behavior of clients, to make stock predictions or to offer personalized recommendations
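To make the sentiment-analysis example above a little more concrete, here is a minimal sketch using scikit-learn; the tiny labelled dataset is invented for illustration, and a real model would of course need far more data.

```python
# Minimal predictive-analytics sketch: sentiment classification with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training examples: plain text labelled with the emotion it expresses.
texts = [
    "great service, very happy",
    "terrible support, very disappointed",
    "love the new features",
    "awful experience, will not return",
]
labels = ["positive", "negative", "positive", "negative"]

# The pipeline turns text into numeric features and fits a classifier on top.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# The trained model assigns a sentiment to unseen text.
print(model.predict(["happy with the support team"]))
```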

Figure 2: The different types of analytics.

    

3) Prescriptive Analytics

This type of analytics goes one step further, as it aims to influence the future. As such, it is known as “the final frontier of analytical capabilities”. Predictive analytics can suggest which actions to take in order to achieve a certain objective. Prescriptive analytics does this as well, but also suggests the possible effects of each option.
It aims to answer the question: What should our business do? Its complexity means that, despite the immense value it could offer, only 3% of organizations use such analytics (according to Gartner).
Figure 3: The three phases of analytics.

    
  • Aim: Prescriptive analytics doesn't just anticipate what is going to happen and when, but can also tell us why. Further still, it can suggest which decisions we should take in order to make the most of a future business opportunity or to avoid a possible risk, showing the implications of each option for the result.
  • Based on: This type of analytics ingests hybrid data: structured (numbers, categories) and unstructured (videos, images, sounds and text). This data may come from an organization's internal sources, or external ones such as social networks. To the data, it applies statistical and mathematical models, machine learning and natural language processing. It also applies rules, norms, best practices and business regulations. These models can continue to collect data in order to keep making predictions and prescriptions. In this way, the predictions become increasingly precise and can suggest better decisions to the business
  • Examples: Prescriptive analytics is useful when making decisions relating to the exploration and production of oil and natural gas. It captures a large quantity of data, can create models and images of the Earth's structure and describe different characteristics of the process (machine performance, oil flow, temperature, pressure etc). These tools can be used to decide where and when to drill, and therefore to build wells in a way that minimizes costs and reduces the environmental impact.
  • Application: Health service providers can use such analytics to: effectively plan future investments in equipment and infrastructure, by basing plans on economic, demographic and public health data; to obtain better results in patient satisfaction surveys and to avoid patient churn; to identify the most appropriate intervention models according to specific population groups. Pharmaceutical companies can use it to find the most appropriate patients for a clinical trial.
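As a toy illustration of the prescriptive idea (the output is a recommended action, not just a forecast), the sketch below uses linear programming with SciPy on an invented stock-planning problem; all numbers and constraints are hypothetical.

```python
# Minimal prescriptive-analytics sketch: recommend an action via optimization.
from scipy.optimize import linprog

# Decide how many units of two products (A and B) to stock.
# Profit per unit: A = 4, B = 3. linprog minimizes, so we negate the profits.
profit = [-4, -3]

# Constraints: shelf space (2*A + 1*B <= 100) and budget (1*A + 2*B <= 80).
A_ub = [[2, 1], [1, 2]]
b_ub = [100, 80]

result = linprog(c=profit, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("Recommended stock (A, B):", result.x)  # the prescribed action
print("Expected profit:", -result.fun)
```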

We don’t have a crystal ball to tell us which numbers will appear in the lottery, but it is evident that Big Data technologies can shed light on current problems in our business, helping us to understand why they are happening. In this way, we can transform data into actionable insights and reinvent business processes.

Figure 4: Infographic – from data to decisions and actions.

We have moved from the question of "what has happened?" to being capable of understanding "why has it happened?" We can predict "what is going to happen?" and prescribe "what should I do now?" Now, we can truly create an intelligent business.

Deep Learning vs Atari: train your AI to dominate classic videogames (Part I)

AI of Things    7 June, 2018

A few months ago we began a series of posts on this blog where we explained how you can train an Artificial Intelligence (AI) to eventually win certain games. If this subject interests you, we invite you to watch our webinars "Dominate classic videogames with OpenAI and Machine Learning", where we explain everything you need to know about OpenAI Gym, from its installation to its functions, as well as showing examples with Python and simple techniques of Reinforcement Learning (RL). It was a two-part series that began on April 24 and concluded on May 29.

Before moving forward with these two episodes, we encourage you to read the following posts (in Spanish):

In these posts, you can find an introduction to OpenAI and its Universe and Gym libraries, and then you can develop simple solutions for well-known environments such as CartPole and Taxi. These articles will help you to install the environment and the necessary libraries, as well as establishing the foundations that will help you in the following posts.

In this new series, we’ll continue with OpenAI but this time we’re going to take on bigger challenges, with regards to the complexity of the game and the design of the AI agent (using Reinforcement Learning techniques). Our aim is to offer an introduction to the training of an AI that is capable of taking on these harder videogame environments by following a strategy of trial and error.

Our base for training will be Atari's classic videogame Breakout, but the solutions that we develop can be extrapolated to other games with similar characteristics. Although we have increased the difficulty of the environment, we are going to need a platform that isn't too complex. This simply means that there aren't too many moving objects to analyze and that the set of possible movements is simple (left, right, shoot). Within these limitations we can include classics such as Space Invaders and Pac-Man.

Figure 2: A screenshot of the OpenAI Gym environments for Breakout, Space Invaders and Pac-Man.

These arcade games run on the Atari 2600 console, which was released in 1977, enjoyed successful sales for over a decade and marked the teenage years of many of us. The simplicity of the game environments developed for this console has allowed these "worlds" to become platforms for the study and application of AI and Machine Learning techniques. This simplicity means that each frame of a game can be defined by a state in a relatively manageable observation space, with a limited number of actions.

As mentioned earlier, the game we have decided to use for our Proof of Concept (PoC) is Breakout. This classic arcade game was conceived in 1976 by Nolan Bushnell and Steve Bristow and initially built by Steve Wozniak. The aim of the game is to break all the bricks that fill the upper half of the screen with a ball that bounces off your movable paddle. The paddle can move to the left and right, within the game limits.

Figure 3: Screenshot of the Breakout training environment within the OpenAI Gym.

The player tries to bounce the ball back towards the bricks, destroying them one by one, while making sure that the ball doesn't fall off the lower part of the screen (which causes you to lose a life). The game resets each time all the bricks have been destroyed or when the player loses all of the five lives they start with.

Thanks to its large number of supported environments (you can see the full list here), we will again use OpenAI Gym. This time it will be supplied with screenshots from the game so that, before we start, it can identify and locate the pixels of the screen that correspond to different elements of the game (such as the position of the ball, the paddle etc).

The Gym environment that we have used is called Breakout-v0 and has the following characteristics:

  • An observation space (env.observation_space) that represents a state of play. It is an array with dimensions (210, 160, 3), which holds the pixel values of an image from the game. The third dimension is reserved for the RGB values of each pixel, drawn from a 128-color palette.
  • The action space (env.action_space.n) for this game is defined as a group of four integers [0, 1, 2, 3]. The relationship between these integers and the allowed actions can be obtained using env.unwrapped.get_action_meanings(), which returns ['NOOP', 'FIRE', 'RIGHT', 'LEFT'].
Figure 4: Information about the Breakout-v0 environment within OpenAI Gym. (Source)
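These properties can be inspected directly from OpenAI Gym, as in the short sketch below (it assumes gym and its Atari extras are installed, e.g. pip install gym[atari], and uses the gym API as it was at the time of writing):

```python
# Inspect the Breakout-v0 environment described above.
import gym

env = gym.make("Breakout-v0")

print(env.observation_space)                # Box with shape (210, 160, 3): RGB game frames
print(env.action_space.n)                   # 4 possible actions
print(env.unwrapped.get_action_meanings())  # ['NOOP', 'FIRE', 'RIGHT', 'LEFT']

env.close()
```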

The following steps summarize the interaction with OpenAI Gym: load the environment from the library; obtain an initial random state; apply to the environment an action derived from some rule; this action leads to a new game state; and receive a reward after applying the action, together with an indicator of whether the game has finished.
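Put together, a deliberately naive interaction loop that follows these steps, using random actions as the "rule", could look like this (again using the gym API of the time):

```python
# Minimal interaction loop with the Breakout-v0 environment, taking random actions.
import gym

env = gym.make("Breakout-v0")
observation = env.reset()                   # initial state of the game

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()      # the "rule" here: a purely random action
    observation, reward, done, info = env.step(action)
    total_reward += reward                  # reward received after applying the action

print("Episode finished with score:", total_reward)
env.close()
```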

Training an AI in environments such as Breakout with techniques such as "Random Search" quickly becomes an intractable task, especially when the agent is given game states that are larger and more complex. We therefore need our agent to approximate the Q(s,a) function, which estimates the reward obtained by applying a given action "a" in the state "s", and to choose the actions that maximize it. But how can we deal with the complexity that comes from combining different states and approximate this function? We can do it by using Deep Neural Networks equipped with Q-Learning algorithms, commonly known as Deep Q-Networks (DQN).

For this type of training, the Neural Networks to use are Convolutional Neural Networks. Throughout the history of Deep Learning, these networks have proven to behave excellently when recognizing and learning patterns based on images. These networks take the pixel values of the frames as inputs. The representations of these inputs become more abstract as they pass through the "architecture" (its various layers), and the result is a dense final layer with a number of outputs equal to the size of the environment's action space (4 in the case of Breakout-v0).

Figure 5: Graphic showing the Convolutional Neural Network layers.
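By way of illustration, a DQN-style convolutional network along these lines could be sketched in tf.keras as follows. The exact architecture we use in our training is not detailed until the next post, so the layer sizes below simply follow the classic DeepMind DQN layout and assume a stack of four pre-processed 84x84 grayscale frames as input:

```python
# Illustrative DQN-style convolutional network (not our exact architecture).
import tensorflow as tf

n_actions = 4  # size of the Breakout-v0 action space

model = tf.keras.Sequential([
    # Input: a stack of 4 pre-processed 84x84 grayscale frames.
    tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu", input_shape=(84, 84, 4)),
    tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    # Dense output layer: one estimated Q-value per possible action.
    tf.keras.layers.Dense(n_actions, activation="linear"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
model.summary()
```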

Optimizing a Neural Network is a difficult task, since it is highly dependent on the quality and quantity of the data that is used to train the model. The difficulty in optimizing the network is also a result of the architecture itself, since a larger number of layers and greater “dimensionality” requires the optimization of a larger number of weights and biases.

As a reminder, the Q(s,a) function is defined by the Bellman equation:

Q(s,a) = r + γ · max_a' Q(s', a')

where r is the immediate reward received after applying action a in state s, γ is the discount factor and s' is the resulting state.

If we knew the Q(s,a) function perfectly, we could immediately know which action to carry out in each state by following the greedy policy:

π(s) = argmax_a Q(s, a)

The Neural Network will try to predict its own output by repeatedly applying this formula in order to refine the Q(s,a) function. However, what guarantees that the Q function will converge towards the truth when we take samples from a decision-making policy? Unfortunately, nothing can save us from a function that never converges towards the optimum. There are, however, techniques that allow this function to be approximated correctly, which we will explain in the second part of this series.
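As a toy sketch of that self-referential update, the helper below builds a training target from the network's own prediction for the next state; it assumes a model like the one sketched earlier and an invented discount factor:

```python
# Toy Q-learning target: r if the episode ended, otherwise r + gamma * max_a' Q(s', a').
import numpy as np

gamma = 0.99  # discount factor (illustrative value)

def q_learning_target(model, reward, next_state, done):
    """Build the training target for one transition using the network's own estimate."""
    if done:
        return reward
    next_q_values = model.predict(next_state[np.newaxis, ...])[0]
    return reward + gamma * np.max(next_q_values)
```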

During the training of this type of architecture, the most convenient strategy is to feed our algorithm "pre-processed" states, from which it obtains an action that benefits the agent, observes the "reward" of that action and the following state of the environment, and then continues to feed the model in this way.
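A simple example of the kind of pre-processing referred to here could be a grayscale conversion, crop and downsampling of each frame; the exact pipeline we use is left for the next post, so the steps below are only illustrative:

```python
# Illustrative frame pre-processing for Breakout: grayscale, crop and downsample.
import numpy as np

def preprocess(frame):
    """Turn a (210, 160, 3) RGB frame into a compact, normalised 2D state."""
    gray = frame.mean(axis=2)             # collapse the RGB channels to grayscale
    cropped = gray[34:194, :]             # drop the score bar and the bottom border
    downsampled = cropped[::2, ::2]       # 160x160 -> 80x80, keeping every other pixel
    return (downsampled / 255.0).astype(np.float32)

# Quick check with a dummy frame of the right shape:
dummy_frame = np.zeros((210, 160, 3), dtype=np.uint8)
print(preprocess(dummy_frame).shape)      # (80, 80)
```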

Now that we have a foundation in Deep Neural Networks and Q-Learning, in the following post we will present the results of our own training for Breakout and Space Invaders. We will also give more details about the implementation of the model's architecture, the strategies used to reach the solution and the image pre-processing tasks that make the network more efficient. Our model will be a DQN built using the TensorFlow library.

Right now we are immersed in training this model, and below you can see a preview of the current capabilities of our agent:

https://www.youtube.com/watch?v=EvU6QlnL4_k

As you can see, in one episode our AI is already capable of scoring 101 points! It’s been able to achieve this after being trained on over 1900 episodes and having processed almost 2.5e7 states during one week of training. We’re going to give the model more training time so that in the next article we can show you a video of how our agent is capable of destroying all the bricks before losing five lives, as well as another video of the model training in Space Invaders.

Written by Enrique Blanco (CDO Researcher) and Fran Ramírez (Security Researcher at Eleven Paths)

We look forward to seeing you in the next post!

Don’t miss out on a single post. Subscribe to LUCA Data Speaks.

5 IoT elements to improve the customer experience in your store

Beatriz Sanz Baños    6 June, 2018

There are fewer and fewer sectors that can ignore Internet of Things. What started as a tech trend focused on home security and personal exercise has evolved in such a way that now it would be strange for a business niche to not consider incorporating it in its business lines or adjacent services.

IoT has ceased to be a mere add-on and has evolved into a tech segment with a life of its own. This is shown by the study The Internet of Things: mapping the value beyond the hype by McKinsey, which estimates that the Internet of Things will increase its business volumes dramatically in the years to come, creating an impact of no less than $11 trillion annually from 2025.

And the retail sector is one of the sectors that seems eager to use this tech trend sooner rather than later. Some uses you will have already seen, others you will see soon and yet others will now seem like science fiction, but sooner or later, they will be installed in far more stores than we could imagine.

1.- Stock management

You have probably already seen this in superstores: connected devices that analyse the stock available across all stores in real time. What does that mean? That if a person is in your store and falls in love with a shirt but you don’t have it in their size, you can tell them if that size is available in another store nearby.

This model can also work in reverse in a much more advanced way: without requiring a request from a customer, stores can see the stock they have in real time and, if they anticipate a possible shortage, request new material from another store so as not to run out of stock.

2.- User analysis

This is the kind of thing that users don’t catch on to, but that helps the chains a lot. Even though there is universal clothing accessible to all targets, it is evident that geographical areas, even within the same city, say a lot about the customer. And this can be analysed with the IoT.

Imagine, for example, a device that autonomously stores data on the sales made in your store: the average expenditure by each customer, what garments are the most (and least) successful, what sizes are most in demand, what days most people visit… If this data is crossed with stores in other areas you will be able to get behaviour patterns. In this way, you can optimise your stock to anticipate the possible purchases made by your future customers.

3.- Smart tags

Isn’t it time for us to take more technological advantage of clothing tags? There are already specific innovations that allow a lot of information to be extracted from a mere tag: the stock available, the exact measurements, the option of paying by mobile using the tag etc. An example of this is the Telefónica IoT Digital Tags service.

4.- Beacons on the street

A person goes walking down the street and suddenly… wham! They get a notification on their mobile with your store’s offers. And it doesn’t have to be a shot in the dark, since they will get it when they are nearby. This is possible with geomarketing and beacon tech, which geolocates users and sends offers and promotions to their mobile when they are near your store using different kinds of connections (usually Bluetooth).

5.- Dynamic music

We have all walked past a store with music playing at top volume, but this isn't necessarily a positive thing. A store's music doesn't have to be flashy, it has to be effective. This means that while a certain kind of music may attract a certain kind of customer, it may drive off others. In this regard, the Internet of Things can also help you plan the best music for your establishment.

One tool you could try is spotandsell, a business line by onthespot to help you choose the perfect music for your establishment, whatever kind it is. It’s not about standard music, but pieces especially composed to encourage purchase and provide a great user experience for your potential customers.

The AI Hunger Games – Why is modern Artificial Intelligence so data hungry? (Part I)

AI of Things    5 June, 2018
Guest Post written by Paulo Villegas – Head of Cognitive Computing at AURA in Telefónica CDO

Modern Artificial Intelligence is performing human-like tasks that seemed out of reach just a few years ago. Granted, we are talking about narrow AI (tasks involving only a small subset of human capabilities) – general AI is still far away. But on that narrow task we are experiencing spectacular advances. The most salient results are in perception: visual perception (image recognition, which in tasks such as large-scale object recognition is achieving human-like performance) or audio perception (speech recognition is also achieving unprecedented results). But other noteworthy results have also made headlines, such as Google’s AlphaGo beating Go champions. There are also initial forays into ‘artistic’ traits such as painting styles or music composition.
Continue reading “The AI Hunger Games – Why is modern Artificial Intelligence so data hungry? (Part I)”

New tools: Metashield Bots, analyzing and cleaning metadata for everyone, from everywhere

ElevenPaths    5 June, 2018
You all know Metashield. Basically, it is a technology of our own for analyzing and cleaning metadata, and it is used in several of our products. Although metadata may seem like an old problem, it is still useful when you analyze leaked data, as in the Bin Laden hard disk case that we covered, and it was even a key piece in our research into the WannaCry author, where we found out how the creator worked and even what his default language in Word was. Today we are introducing a new way to use Metashield, for everyone and from everywhere, since we have created bots for Telegram, Skype and Slack. It is easier than ever now. Let's see.

If you want to use Metashield, you have several choices: the online clean-up, a client for Windows, as a side technology in our imminent Path8, with FOCA… but how can you use it from your computer, laptop, tablet and phone? We now have the answer. Simply add these bots as a chat in Skype, Telegram or Slack, and send (or share) a file.

What is Metashield Bot?
https://msbot.e-paths.com

The bots will analyze and show the metadata and, if you want, clean it. This will allow you to:

  • Clean and analyze files from any platform or operating system where Telegram, Slack or Skype can be used.
  • Clean and analyze files easily before sharing them. Just drag and drop. If your Skype, Telegram or Slack is synchronized between your devices, the analyzed or cleaned file will be there too, so you will have a kind of "distributed Metashield".

Additionally, in the case of Telegram, for example, you can include it in your own developments as an API if you want to use the bots in an automated way.

You may analyze up to 500 files per hour, but we have limited cleaning to two files per hour so that the service is not abused. Also, please bear in mind that these programs are still in beta, so report any issues and forgive our downtimes. We do not keep any files, so your privacy is guaranteed.

We have recorded some videos to show you how easy it is to configure and use these bots in these platforms.

Metashield Bots for Slack:

Metashield Bots for Telegram:

Metashield Bots for Skype:

Innovation and Laboratory Team

Will GDPR’s “right to data portability” change the data industry forever?

Richard Benjamins    1 June, 2018
After years of preparation, on May 25 2018 the new General Data Protection Regulation (GDPR) came into force. Much has been written, discussed and speculated about it. Over the past two years, organizations have worked frenetically to understand its implications and to be prepared for this date. In this blog we will explore whether the "right to data portability" will drastically change the data industry.

The GDPR is a regulation, not a directive, and this is a key difference from the European data protection directive in force until now. A directive leaves room for national interpretations, whereas a regulation applies directly, like a national law. Apart from this important change, other relevant changes relate to:
  • Geography: the geographical scope is extended to all organizations that serve users in the European Union, regardless of citizenship and of where the organization is headquartered.
  • Penalties: the maximum fine for breaching the GDPR is 4% of the organization's global revenue or €20 million, whichever is greater.
  • Consent: organizations that want to process personal data need to obtain explicit consent (opt-in) through clear and easily understandable text that defines and explains the purpose of the processing.
  • Data subject rights: including the right to be informed, the right of access, the right to rectification, the right to be forgotten (erasure), the right to data portability and the right to object.
  • Data Protection Officers: the appointment of a DPO will be mandatory for organizations whose core activities consist of operations which require regular and systematic monitoring of data subjects on a large scale, or of special (sensitive) categories of data.
  • Breach notification: breach notification to the supervisory authority is mandatory within max 72 hours, and if the breach is likely to result in a high risk of adversely affecting individuals’ rights and freedoms, individuals must also be notified without undue delay.
In this post, we will talk about one of the less known new rights of citizens, namely the right to data portability. This right has not received too much attention in all discussions around the GDPR. However, we believe that it might be a future game changer for many industries.
The right to portability allows individuals to obtain a copy of their data and to reuse it for whatever purpose they see fit. They could use it just for their personal interest; or to transfer their data to a new service provider such as an electricity or insurance company. The new service provider would then “know” the new customer from day one. Not all personal data a company has about a customer falls under the right to portability. Covered are:
  • The data the customer has provided to the service provider, such as name, address, bank information, etc. and
  • The "observed" data that the service provider sees based on the customer's usage of the service, such as the kWh consumed, claims made or financial transactions, etc.
What is not covered is any information the service provider infers about the customer. For example, a company may use a Machine Learning model that assigns a score to customers reflecting the likelihood they will leave the company. This inferred “churn score” does not fall under the right to portability.
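As a purely hypothetical sketch, a portability export could separate these categories explicitly, handing over the provided and observed data in a machine-readable format while leaving inferred attributes out; all field names below are invented for illustration.

```python
# Hypothetical portability export: include provided and observed data, exclude inferred data.
import json

customer_record = {
    "provided": {                      # data the customer gave the service provider
        "name": "Jane Doe",
        "address": "Example Street 1",
        "iban": "ES00 0000 0000 0000 0000 0000",
    },
    "observed": {                      # data generated by using the service
        "kwh_consumed_last_12_months": [310, 295, 280, 260, 250, 240,
                                        245, 255, 270, 290, 305, 320],
    },
    "inferred": {                      # NOT covered by the right to portability
        "churn_score": 0.82,
    },
}

# Keep only the categories covered by the right to data portability.
portable_copy = {k: v for k, v in customer_record.items() if k != "inferred"}
print(json.dumps(portable_copy, indent=2))  # structured, commonly used, machine-readable
```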
Organizations that have prepared for this might have asked themselves how many customers would exercise their right to portability. This is important since a low number (in the hundreds) might be manageable manually, whereas a large number (e.g. in the tens of thousands or more) might require automation of the process, and therefore investment. There are usually two approaches companies have used to estimate the number of expected requests:
  • They compare it with the right to access data, which is already a right under the current data protection directive, and assume similar amounts as today. In general, very few people exercise their right to access data, and most of the companies handle those requests manually.
  • Another way companies estimate the number of data portability requests is from the (voluntary) churn rate of their customers. Customers who decide to change service provider might see a benefit in bringing their personal data to the new service provider because that makes the onboarding process easier; there is no need to fill in lots of information. Moreover, the new service provider can look at the usage behavior and give tailored services to the new customer from day one of the relationship. Not all customers who churn will, however, choose to port their data. For instance, in the insurance industry, customers who have submitted many claims might want to keep that information away from their new insurance company so that they are not penalized with a higher premium.
In most cases, those two approaches have led organizations to believe that the right to portability will not be exercised too much, and therefore they have considered no specific investments to prepare for massive portability requests.
In the short term, those organizations are probably right, and have taken the right decision. However, we think that this particular right might have a huge impact on many businesses across many sectors.
Figure 2: One of the key parts of the GDPR is the new "right to data portability".

Here is why…

The right to data portability also means that users can request that service providers transfer their personal data directly to other service providers. Moreover, users can authorize third parties to file the requests on their behalf. And this is the point that might have a game-changing impact on the data industry. Imagine that Amazon reaches out to all its customers to suggest that they authorize Amazon to file a data portability request on their behalf to port their data from all their service providers (e.g. insurance, utilities, telecommunications, etc.) to Amazon. In return, Amazon promises all customers who agree a better and cheaper alternative service, and significant discounts on future purchases. If Amazon were to offer telecommunications and insurance services, then through this campaign Amazon could acquire many new customers. But more importantly, Amazon would have access to the personal data of all those users who accepted Amazon's offer and could start creating value from this data. If this happened at a massive scale then, suddenly, the private data of the "left" service providers would have lost its uniqueness and thus would have become less differential. If we take this scenario to the extreme, then we might imagine a data war between companies to gather as much personal data as possible, all in a way that is fully compliant with the GDPR. In the end, users are just exercising their right to data portability.
Seen like this, it looks like a major threat for companies that currently exploit their proprietary data for business because it is differential data; only they have access to it. Notice, however, that it can also be seen as an opportunity. Any organization could try to convince customers to port their data to them, thereby increasing their customer base and/or their data assets. If such a scenario happens, we think it is likely that it will be started and led by the likes of the GAFAs and/or startups.
Of course, this scenario will not happen overnight. Several things need to be in place for this scenario to become realistic. First of all, the GDPR already mentions that the data needs to be ported in a structured (e.g. columns and rows), commonly used (e.g. CSV) and machine-readable format. A second requirement is that data portability should be an automated process powered by APIs. This makes it similar to the PSD2 regulation (Payment Services Directive) in the financial sector, that obliges banks to open their customer information through APIs to support so-called Open Banking. In this scenario, customers can tell the banks to give access to their financial data to third parties who can then provide them with additional value or even transactional services. Banks might see this as a major threat, but they shouldn’t forget that they might charge for API usage and thus create a new revenue stream. Together, the GDPR’s data portability right and PSD2 might significantly change the banking and data industry.
But neither automation nor APIs are sufficient for the scenario to work. What is still needed is a standard format to interchange data. Otherwise, a lot of effort needs to be done on the receiving side before the data can be processed. So apart from the data being in a structured, commonly used and machine-readable format, it also must be in a standard format. Only then, ecosystems can scale in a transparent way, with a possibly game-changing impact.
With this in mind, there are three possible scenarios to consider:
  • No standard – Each organization ports data in its own format, and receiving organizations need to build translators from the source format to the destination format. This will cause much data integration work, but on the other hand, it could start today.
  • Sector standard – The different organizations in a sector agree on a common sector format. For instance, all major telecommunications companies in a country could come together to agree what data fields to interchange and what the format should be. Examples of this include the so-called Green Button in the utility sector in the USA: "The Green Button initiative is an industry-led effort to respond to a White House call-to-action to provide electricity customers with easy access to their energy usage data in a consumer-friendly and computer-friendly format." Another example is the so-called Blue Button for the healthcare sector, also in the USA: "The Blue Button symbol signifies that a site has functionality for customers to download health records. You can use your health data to improve your health and to have more control over your personal health information and your family's healthcare."
  • Universal Standard – This is a cross-sectorial approach that tries to come up with a universal standard for data portability: the Rainbow Button: “The ‘Rainbow Button’ project has been initiated … by 8 leading companies …., in order to define a common framework for the deployment of the ‘portability right’ as described in the GDPR and the guidelines to data portability provided by WG29 in April 2017.” According to Fing, the organization that started the Rainbow Button initiative, “The regulators confirm that the right to data portability is at the heart of the creation of a data ecosystem and the services of tomorrow, based on new data usages initiated and controlled by the data subjects. The target is not limited to switching services (churn), but really to spark the creation of a new range of services based on data.” Another important initiative promoting the same approach is “midata” in the UK.
When all these requirements have become a reality, the right to data portability will have a game-changing impact on the data industry through the creation of thriving data ecosystems, where data can flow around freely in a transparent way, always under the strict control of the users.


Podcast #1: The Challenges of working with data

AI of Things    31 May, 2018
This is the first in a series of podcasts in which Richard Benjamins, Data & AI Ambassador for Telefonica/LUCA, will share his thoughts and reflections on these technologies.
Richard was named as one of the 100 most influential people in data-driven business (DataIQ 100). He is a frequent speaker at big data & analytics events and strategic advisor to BigML. He has held a number of big data management positions across Telefonica, working tirelessly to provide people with the ability to communicate using secure, state of the art technology.

Data and AI can be used for good things, but their effects can also be harmful, either intentionally or as an undesired side effect of some positive use. Decisions taken by AI products need to be fair and interpretable. The massive generation and collection of data today is creating huge opportunities for businesses and institutions through Artificial Intelligence and Machine Learning. However, those opportunities also come with responsibility and risk.

Are we living on a privacy time bomb?
Will the Cambridge Analytica/Facebook scandal change the data industry for good?
Will the data industry be a victim of its own success?
Is there a different, more sustainable, way forward for the data industry?
In this podcast, those questions are discussed by our Data & AI Ambassador,  Richard Benjamins.

Farms 4.0: IoT to benefit agriculture

Luis Simón Gómez Semeleder    30 May, 2018

When we think about the Internet of Things, different pictures come to mind: a sensor that measures the activity inside our house, a bracelet that helps us exercise, or a device that helps us maintain a room's temperature even when we're on the far side of the city.

However, the potential of IoT goes far beyond the most tech-heavy sectors and even reaches more traditional niches which over time have gradually come to embrace technologies in their daily use. An example is one of the essential sectors within the Spanish economy: agriculture.

The revolution in farming has been unstoppable. In fact, the European Union has already planned for the paradigm shift in the agricultural industry, in which it predicts that there will continue to be even more developments that allow professionals in the sector to optimize all their processes. This comes in a niche which will have no fewer than 75 devices connected by 2020, according to the report by the World Government Summit.

This is not a theory but a real practice that is already part of the day-to-day activity of Spain’s agriculture sector. And to prove it, here are a few examples:

1.- Smart tractors 

Tractors have not disappeared, nor are they going to disappear from agriculture; in fact, now they can be used more efficiently than ever, especially connected tractors, which can lay out the best route to plow the field in order to avoid repetition and possible soil erosion. Thanks to this kind of practice, fuel consumption and potential emissions into the atmosphere are also lowered.

2.- Drones in the fields

This is surely the most fascinating application of them all. If farmers have spent their entire lives looking up at the sky, now they’re still doing it, but with a bit of help: drones which help enormously to measure the harvest and carry out automated tasks.

Drones are able to perform an increasingly broad range of tasks in the fields, from monitoring the state of the plants and harvests to distributing fertilizer, not to mention measuring plots of land and different constants (air temperature, water and heat levels, etc.). The use of drones is so advantageous that some wineries have even decided to use them to improve their efficacy.

3.- Sensors in the soil

There is also room for technology underground. One example comes from the Spanish startup BrioAgro, which installs sensors to provide real-time information. In this way, the farmer can get all kinds of data on the moisture and light levels and the nutrients in their crops, right on their mobile phones.

Furthermore, farms that use underground sensors can also lower their water consumption and use fertilizers and energy much more efficiently, without this affecting their farm yields aboveground.

4.- Chips in animals

Farms tend to have some animals, but it’s not always easy to keep track of them. Many of them carry chips, but the fact is that these chips have traditionally only been used to identify the animal, not to offer any added technological use.

Today, however, Internet of Things allows us to take things much further, especially in cases like the Grupo Caro, a Spanish company that monitors the daily activity of each of its animals thanks to an individual chip which provides information on their general condition, feeding, hydration, quality, etc. In this way, the owners collect objective information in real time, a job that used to be done without the reliability of technology and with a much more burdensome work process.

5.- Big data in production

Among other things, farmers ultimately have to keep an endless list of factors in mind to monitor their yields and their farms’ daily activity. Thanks to the IoT, however, they can have global tools which can provide information in real time and improve the efficiency of their farms by merging applications like those mentioned above.


Ready for a Wild World: Big Data is key for humanitarian issues

AI of Things    30 May, 2018
The Big Data for Social Good event organized by LUCA, “Ready for a Wild World“, took place on May 24th and brought together experts from relevant global organizations and companies in Madrid. Among those organizations were FAO, UNICEF, The Ministry of Agriculture and Fisheries, Food and Environment (MAPAMA), the GSMA and companies like Data-Pop Alliance and Digital Globe. With each of their presentations, they were able to show the importance of data when it comes to developing efficient prevention plans to prepare for natural disasters and climate change.

Figure 1: The closing panel of experts.
In the new digital era, data-based decision making is a fundamental pillar for the success of an organization. According to Kyla Reid, Head of Mobile for Humanitarian Innovation & Digital Identity at GSMA, "Big Data is key for digital innovation when it comes to humanitarian work."

Along these same lines, the Big Data for Social Good department at LUCA is already working on projects with organizations like UNICEF and FAO. In fact an agreement has recently been signed with the latter. Natalia Winder Rossi, Head of Social Protection at FAO mentioned: “FAO and Telefónica are working together to take advantage of the use of state-of-the-art digital technologies for agricultural development, food security and nutrition, and specifically, to prepare and strengthen farmers in the face of extreme weather events related to climate change.”

Another debate that was opened during the sessions was data protection. Isabel Bombal, advisor to the General Directorate of Rural Development and Forest Policy at MAPAMA, highlighted how to overcome the barriers that arise when it comes to data sharing: "the first incentive to make data exchange possible is to explain the benefits of why it is important". Regardless of whether the objective when working with data is philanthropic or commercial, Elena Gil, CEO of LUCA, reminded the audience that "Privacy is a right and we must be very protective with that."

Besides live presentations of real tools and projects, such as OPAL, a platform that seeks to unleash the potential of private data for social good; the application of geospatial data like that offered by DigitalGlobe to predict all types of events, including environmental ones; and SafePost, a tool able to send messages in emergency situations without the need for an internet connection, the need for these initiatives to be sustainable over the medium to long term was also widely discussed. It is important that these initiatives are sustainable for their main users, namely humanitarian organizations and public administrations.

If you want more detail on the event, you can watch all the coverage, including videos, on our website.