The 5 most in-demand Big Data profiles in 2017

AI of Things    7 December, 2016
With Big Data falling off the Gartner Emerging Technology Hype Cycle this year, and Artificial Intelligence now hogging the limelight, we decided to take a look at the most in-demand data profiles we expect to see in 2017.
Over the past two years we have seen the craze for the Data Scientist. However, as a greater number of companies in an increasingly wide range of sectors begin their journeys to becoming data-driven, more and more different profiles are emerging. We spoke to our business unit and our HR team to get their insights, and these are the top 5 profiles that cropped up in our conversations:

1. Data Engineer

High quality data is fundamental in the success of Big Data projects.  For this reason, we expect to see plenty of vacancies in 2017 for Data Engineers who have a consistent and perfectionist approach to data conversion and treatment. Companies will be looking for these data gurus to have extensive experience in manipulating information with SQL, T-SQL, R, Hadoop, Hive, Python and Spark. Much like Data Scientists, they are also expected to be creative when it comes to comparing data with conflicting data types to be able to resolve issues. They also often need to create solutions which allow companies to capture existing data in more usable data formats – as well as performing data modelling and design.

2. Data Visualisation Specialist

Data Visualization has become extremely important in ensuring data-driven employees get the buy-in required to implement ambitious and important Big Data projects in their company. “Data Storytelling” and the art of visualizing data in a compelling way has become a crucial part of the Big Data world and increasingly organisations want to have these capabilities in-house. Furthermore, more often than not, these professionals are expected to know how to visualise in various tools such as Spotfire, D3, Carto and Tableau – amongst many others. Data Visualisation Specialists need to be adaptable and curious to ensure they keep up with latest trends and solutions to tell their data stories in the most interesting way possible in the board room.

3. Big Data Architect

This is where the Hadoop experts come in. Typically, a Big Data architect addresses specific data problems and requirements, being able to describe the structure and behaviour of a Big Data solution using the technology in which they specialise – which is more often than not, Hadoop.

These employees act as an important link between the organisation (and its particular needs) and Data Scientists and Engineers. Any company that wants to build a Big Data environment will require a Big Data architect who can comfortably manage the complete lifecycle of a Hadoop solution – including requirement analysis, platform selection, technical architecture design, application design and development, testing and finally deployment.
According to this article, the primary skills and competencies to become a Big Data Architect are:
  • Marketing and analytical skills
  • RDBMSs (Relational Database Management Systems) or foundational database skills
  • NoSQL, Cloud Computing, and MapReduce
  • Skills in statistics and applied math
  • Data Visualisation, Data Mining, Data Analysis and Data Migration experience
  • Data Governance (an understanding of the interaction between Big Data and Security)
If you can tick all of these boxes, then you’re currently recruitment gold, with salaries showing steep year-on-year growth:

Figure 2: US Salary Trend for Big Data Hadoop Architect
If you’re keen to become a real Big Data architecture rockstar, then getting the right certifications in Hadoop and Apache Spark is strongly recommended to achieve a thorough knowledge of tools such as MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Flume, Sqoop, Cassandra, Scala, MongoDB, Storm, Kafka and Impala. Not too much to get your head around!

4. Artificial Intelligence Developer

The unquestionable hype around Artificial Intelligence is also set to accelerate the number of roles advertised for specialists who really understand how to apply AI, Machine Learning and Deep Learning techniques in the business world.  Recruiters will be asking for developers with extensive knowledge of a wide array of programming languages which lend well to AI development such as Lisp, Prolog, C/C++, Java and Python.
However, many speculate that this peak in demand for AI experts could cause a “brain drain for tech groups”, with businesses poaching talent away from the world of academia. Last month in the Financial Times, deep learning pioneer and researcher Yoshua Bengio, of the University of Montreal, stated: “The industry has been recruiting a lot of talent — so now there’s a shortage in academia, which is fine for those companies, but it’s not great for academia.” It will be interesting to see how this conflict between academia and business is resolved in the next few years.

5. Data Scientist

The shift of Big Data from tech hype to business reality may have accelerated, but the demand for top Data Scientists isn’t set to fade in 2017. A recent Deloitte report highlighted that the world of business will need one million Data Scientists by 2018, so if their predictions are correct, there’s a big talent gap in the market. This multidisciplinary profile requires technical analytical skills and technical computer science skills, as well as strong softer skills such as communication, business acumen and intellectual curiosity. These are outlined in this article and in our video which we published earlier this week on the LUCA blog.

With only one month until 2017 kicks off, we’re excited to see where the new year takes us and how innovation in Big Data and Artificial Intelligence will affect the global data job market. If you think you have what it takes to help leading organisations on their journey to becoming data-driven, keep an eye on our jobs portal here.

Big Data and Tourism: How this Girona Festival became Data-Driven

Ana Zamora    2 December, 2016

Every year, from the 9th to the 17th of May, Girona celebrates the “Temps de Flors”, one of the most popular flower festivals in Europe. For ten days, the streets of the city come to life with music, colour and the smell of exotic flowers. During this period, thousands of visitors flood the city, enjoying the charm of this unique Catalonian festival against the backdrop of one of the most famous Game of Thrones filming locations.


For 2 years in a row, we have been working with Girona’s local government to enable them to take a more data-driven approach to this touristic event, ensuring that the festival is as successful as possible for the organisers.  One of LUCA’s products (Smart Steps) analyses crowd behaviour by aggregating and anonymising our mobile network event data to provide actionable insights to decision-makers in the public sector in areas such as mobility, infrastructure planning and in this case, tourism.
Our study enabled the city of Girona to become a pioneer in Big Data analysis for tourism, analysing millions of mobile data events per day to understand the behaviour of tourists as well as where they come from and how long they stay.  Smart Steps prioritises security and privacy at all times, carrying out a robust and exhaustive anonymisation and aggregation process to analyse the movements of groups of people, rather than individual tourists to provide trend insights and patterns. After this, an extrapolation is also applied to provide an accurate representation of both national and international tourists.

Key insights allowed us to estimate that Girona had a total of 244,199 visits to the city during the festival, with 90% of visitors coming from Catalonia, 2% from the rest of Spain and 8% from other countries.

Figure 2: Thousands of people walk the streets of Girona during Temps de Flors.

Furthermore, of the 92% of visitors coming from within Spain, we could identify that 60% were from the province of Girona itself, 35% from Barcelona, 2% from Tarragona and 1% from Lleida.
The study also showed that the gender and age split was consistent among the different regions, apart from in Girona where visitors were slightly younger on the whole compared to other areas of Catalonia.

Figure 3: Heatmap of national tourists at the Festival de las Flores.

We also saw in the analysis that 18,881 visitors came from outside of Spain, 82% of which came from 9 countries: France, Holland, Germany, Belgium, Great Britain, Italy, Poland, Russia and the USA. French visitors were the most prominent, accounting for 45% of the total.
Some years ago, the only way of obtaining this kind of data was by carrying out more traditional visitor surveys. However, now, thanks to Big Data, it is possible to obtain an in-depth analysis of the movements and behaviours of large groups of people. These tourism insights are extremely valuable to both public and private sector decision makers, as they allow them to adapt their offering to give tourists an even better experience.



Interested in finding out more about our Big Data tourism products? Watch Dave Sweeney’s presentation on our YouTube channel or drop us an email here.
Written by Ana Zamora

You can still win 5000 dollars. Send your Latch plugins over!

Florence Broderick    2 December, 2016
Remember that the deadline for submitting entries to our Latch plugins competition is Monday, December 12 at 1pm (CET). You’ve had almost two months to think of a breakthrough idea and to develop it, but don’t worry; you still have a few more days to round it off.

If you don’t yet know what you want to do, there is still time to register and, to help you, we will give you some ideas.

How about this Latch integration for the protection of payments developed by our colleagues at Equinox (in just 23.5 hours!)? A great project that combines creativity, security and utility!
The idea is to be able to issue a token that gives access to a service or device. This token is printed on paper (which I have) and is only valid when the token Issuer authorizes its use from the Latch application (second authorization factor).

Or how about the integration of ElevenPaths’ Latch with the AntiRansomWare tool? It is a winning combination to address a problem as worrying and common nowadays as ransomware. It is a tool that adds an authorization layer on Windows systems for “protected” folders, in addition to the existing permissions of the operating system, so that any type of write or delete operation on the files is denied. The authorization in this case relies on Latch instances for each folder, and files in those folders cannot be modified or deleted if the associated Latch is closed.

Aren’t you inspired yet? How about you try the new Latch Cloud TOTP functionality? This functionality allows you to use Latch as an application to generate TOTPs that you can easily use with websites like Facebook, Dropbox or Google.

Get involved and enter the competition! Register in the Latch Plugins Contest. A prize of up to $5,000 is waiting for you.

May luck be with you!

International call traffic may tell you more than first thought

AI of Things    30 November, 2016
This post discusses the value of international phone calls in understanding society.
Telefónica has a wide global infrastructure of networks which can be used by other service providers to carry their international call and data traffic. Telefónica Business Solutions sells this service, amongst others, negotiating wholesale business deals. Our role throughout this process is to collect call and data traffic in one country (provided by a telecommunications operator) and effectively transport and pass this on to another operator in a different country.

We recently had the chance to process and analyse a few months’ worth of data relating to this service. The aim of this was to let us understand the data and information allowing us to discover some interesting facts with a pretty simple analysis. All the information that is stored and processed by our global Big Data team here in LUCA is done so anonymously, ensuring a secure working environment.
When dealing with voice calls, each “recording” or event in the dataset can be summed up as a phone number from the country of origin, a destination phone number, a timestamp and the call duration. For a deeper analysis we could also use further parameters, but we won’t do that on this occasion.

Whilst the phone numbers we deal with are anonymous, we can access the country code and, in some cases, the region or province of the number of the person making the call and the person receiving the call. This dataset may face limitations in terms of data variance but it is expansive in terms of volume. In terms of structure it is very similar to some popular open data relating to air traffic (see this example). In fact, this resemblance has allowed us to easily reuse some interpretations of the data previously built with the Carto platform.
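
As a rough illustration of how such records can be explored, the sketch below (using pandas with purely hypothetical file and column names, since the real pipeline is not described here) aggregates anonymised call events into daily counts and minutes per destination country, which is essentially the calculation behind the daily traffic charts that follow.

import pandas as pd

# Hypothetical schema: anonymised identifiers, country codes, a timestamp
# and the call duration in seconds.
calls = pd.read_csv("call_events.csv", parse_dates=["timestamp"])

# Daily number of calls and total minutes per destination country.
daily = (calls
         .assign(day=calls["timestamp"].dt.date,
                 minutes=calls["duration_s"] / 60)
         .groupby(["day", "dest_country"])
         .agg(n_calls=("timestamp", "size"),
              total_minutes=("minutes", "sum"))
         .reset_index())

print(daily.head())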

Let’s listen to what the data tells us:

Given the basic information that this dataset provides, the first exploration we carried out looks at the evolution of the total number of calls studied. The following graph is a representation of the daily traffic.

Figure 2: A representation of the number of calls managed by Telefónica. We can see a clear weekly pattern and the curious changes that happen in different weeks.
The data shows a weekly pattern with dips during the weekend. What is most notable is the week-to-week variation, especially when the variations are very pronounced. The data is starting to show something worth discussing, so what is it trying to tell us?
The answer becomes clearer when we start to travel between countries. For example, in the following graphic we can see the daily progress of the number of calls to Italy from various countries. The biggest peaks that appear on the right hand of the graph are from the 24th of August 2016, which is when a large earthquake took place in Italy.

Figure 3: Representation of the amount of calls made to Italy from different countries worldwide during the earthquake that took place on the 24th of August in Italy.
The data may also be starting to let us analyse international events through the ties between countries that it reveals. Let’s hold that thought; the information starts to appear more subtly: why was the response from Ecuador or Argentina more notable than from other countries? We could try to explain the situation with a few well thought out arguments, but we want the data to do the talking.

Google gives us a very useful tool to help us interpret the information we are finding. This platform is called GDELT and it monitors in real time what’s happening in the world and the impact it’s having, also taking into consideration the language and where in the world it has happened. This means that we can further develop the information that we already have by combining the local and the global. This tool can be used with Google’s BigQuery platform. Depending on how you choose to set the parameters the results may vary, or you can simply stick to the preconfigured analytical tools.
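
As a flavour of what such a query can look like, here is a minimal sketch using the google-cloud-bigquery Python client against the public GDELT event tables (table and column names such as gdelt-bq.gdeltv2.events, SQLDATE, NumMentions and ActionGeo_CountryCode should be checked against the current GDELT documentation; a configured Google Cloud project is assumed):

from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials and a project are configured

# Daily count of GDELT events located in Italy during August 2016,
# a simple proxy for how much was happening there around the earthquake.
query = """
    SELECT SQLDATE, COUNT(*) AS n_events, SUM(NumMentions) AS mentions
    FROM `gdelt-bq.gdeltv2.events`
    WHERE ActionGeo_CountryCode = 'IT'
      AND SQLDATE BETWEEN 20160801 AND 20160831
    GROUP BY SQLDATE
    ORDER BY SQLDATE
"""
for row in client.query(query).result():
    print(row.SQLDATE, row.n_events, row.mentions)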

As an example, in June 2016, the United Kingdom voted over whether they would remain in the European Union. Can our data explain this? It certainly can. We aren’t just talking about the immediate effect but also the impact it will have in the following weeks. We can see in the following graph the amount of calls between the United Kingdom and Belgium (headquarters of the European Commission). The first marked date (in red) is the day of the vote (Thursday June 23). We can also see the impact in the weeks after the event. The second marked date, exactly a month after, coincides with the first published economic index which highlighted the economic contraction of the United Kingdom.

Figure 4: Representation of the amount of calls between the UK and Belgium around the time of the Brexit vote in the UK.
These initial investigations help us move towards a more formal model. The entities can be the anonymised phone numbers as well as the geographical regions of origin and destination. They can be treated separately to create a series of indicators (minutes received and minutes made), or this information can be paired up, in which case we are talking about a network or graph in which the nodes are the entities and the edges connect those nodes between which there has been traffic.
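
As a small sketch of the second option (again with the hypothetical call-event table used above), the aggregated traffic can be loaded into a weighted directed graph, for example with networkx:

import pandas as pd
import networkx as nx

calls = pd.read_csv("call_events.csv")  # hypothetical export, as above

# Total minutes per (origin country, destination country) pair.
pair_minutes = (calls.groupby(["origin_country", "dest_country"])["duration_s"]
                     .sum().div(60).reset_index(name="minutes"))

# Directed graph: nodes are countries, edge weights are minutes of traffic.
G = nx.DiGraph()
for _, row in pair_minutes.iterrows():
    G.add_edge(row["origin_country"], row["dest_country"], weight=row["minutes"])

# Simple per-country indicators: minutes made versus minutes received.
minutes_made = dict(G.out_degree(weight="weight"))
minutes_received = dict(G.in_degree(weight="weight"))
print(sorted(minutes_received.items(), key=lambda kv: kv[1], reverse=True)[:10])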

Types of data considered and suggested analyses:

The next figure shows an example of a graph represented on a map, showing and analysing the existing connections between Spain and the other countries noted. Specifically, the data corresponds to the 7th of July 2016 and highlights the connections with Islamic countries, as it was the last day of Ramadan. It also shows links to countries that contribute to tourism in the summer. The video below shows the daily changes in the data from the map.

Figure 5: Graph showing the connections between Spain and the rest of the world on the 7th of July 2016 (the end of Ramadan).
Time Series
The sequential and temporal nature of the data allows us to model it in a time-sensitive way. The analysis of time series is a popular statistical discipline, and library functions have been developed for it in almost all the programming languages regularly used in the world of data analysis (R, Python and Matlab). There are even free tools such as INZight which allow us to do more basic analyses without writing a single line of code.
As a first step before making any analysis, it is important to verify that our data series is stationary (the mean, variance and covariance of its values do not depend on time) and, if it is not, to make it so. A series built from the call data will not usually be stationary, so we need to work on that.
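
In Python, for example, a quick stationarity check can be done with the augmented Dickey-Fuller test in statsmodels; the sketch below assumes a hypothetical CSV holding the daily series of total minutes (differencing the series is a common way to remove the trend when the test fails):

import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical file with one row per day and a total_minutes column.
daily = pd.read_csv("daily_traffic.csv", parse_dates=["day"], index_col="day")
traffic = daily["total_minutes"]

stat, pvalue, *_ = adfuller(traffic.dropna())
print(f"ADF statistic: {stat:.2f}, p-value: {pvalue:.3f}")

# A high p-value suggests non-stationarity; first differences (day-over-day
# changes) usually remove the trend before further modelling.
if pvalue > 0.05:
    print("p-value after differencing:", adfuller(traffic.diff().dropna())[1])
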
Put simply, a time series like the ones we obtain from the call traffic can be decomposed into three parts that can be added together (or multiplied) to reproduce the original series.

Trend: In our case it depends on the volume of traffic that Telefónica processes with a particular country, i.e. it is mainly linked to the growth or contraction of the business.
Seasonality: There are notable weekly cycles, in which a significant increase in calls happens during the weekend.
Remaining information: This is the difference between the values of the original series and the data generated by the trend and seasonality. This part of the data is the most interesting, as the peaks and troughs can be linked to technical issues, international events and public holidays. Ultimately, the remaining information is where we can look if we want to analyse what happened outside of the normal trends.
Any of several R packages (such as zoo, xts or timeSeries) allows us to easily separate these three components.
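
In Python, statsmodels offers an equivalent; a minimal sketch on the same hypothetical daily series would be:

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

daily = pd.read_csv("daily_traffic.csv", parse_dates=["day"], index_col="day")
traffic = daily["total_minutes"].asfreq("D").interpolate()

# Additive decomposition with a weekly cycle (period of 7 days).
decomposition = seasonal_decompose(traffic, model="additive", period=7)

trend = decomposition.trend        # slow growth or contraction of the business
seasonal = decomposition.seasonal  # the weekly pattern
remainder = decomposition.resid    # events, incidents, public holidays

decomposition.plot()               # returns a matplotlib figure similar to Figure 6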

Figure 6: Decomposition into trend, seasonality and remainder of the number of minutes directed to a single country.
The usual interest in doing a time series analysis is to be able to generate a predictive model (like an exponential smoothing model or an ARIMA model) that allows us to anticipate, for example, how much traffic we will have in the next few days. It can also help us to find the true outliers in the series (values that fall outside the prediction intervals we can compute with confidence), which is quite simple to do in R.
Due to their reliability for short-term predictions, the family of models based on the Holt-Winters exponential smoothing technique has become popular and is available in tools like Tableau or TIBCO Spotfire.
ARIMA models are more complex to apply but in most cases improve on those predictions, as they explicitly model the dependence of each value on earlier values, giving the model more context.
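
A hedged sketch of both approaches with statsmodels, again on the hypothetical daily series used above, could look like this:

import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

daily = pd.read_csv("daily_traffic.csv", parse_dates=["day"], index_col="day")
traffic = daily["total_minutes"].asfreq("D").interpolate()

# Holt-Winters with additive trend and a weekly seasonal cycle.
hw = ExponentialSmoothing(traffic, trend="add", seasonal="add",
                          seasonal_periods=7).fit()
print(hw.forecast(14))  # expected traffic over the next two weeks

# A simple ARIMA alternative; the (p, d, q) order would normally be chosen
# by inspecting ACF/PACF plots or by an automatic AIC search.
arima = ARIMA(traffic, order=(2, 1, 2)).fit()
print(arima.forecast(14))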

Figure 7: Prediction of traffic generated with exponential smoothing (Holt-Winters).

Multi-country social networks:
The information behind social networks is used extensively by businesses. The main reason businesses pursue this information is to segment their clients so that they can communicate with them effectively and increase the chances of product consumption. However, the main obstacles for businesses when trying to exploit these sources are complexity and cost.
Telefónica provides experience and differential knowledge about the construction of social network models, or SNA (Social Network Analysis), using information derived from calling patterns. This time we want to understand the relationships that exist internationally, how they form social networks and how we can explain their relevance through telecommunications. We have been inspired by social initiatives like Combatting global epidemics with big mobile data and also Behavioural insights for the 2030 agenda.
The next figure gives us a first look at the data from this perspective, taking into account only the volume of calls that were actually answered, and combining it with a bit of common sense by aligning it with global socio-economic data.

Figure 8: The number of calls between countries, paired by origin and destination, throughout the month of August 2016. Only the countries generating the highest volumes of calls, and the main destination countries for each of them, are counted.
There are good sources of international socio-economic data with which to contrast and complement what is observed in our data. For example, large amounts of economic data can be found in the economic observatory of MIT, in the World Bank’s DataBank, or in Eurostat, and more social (and also economic) data in the United Nations or UNICEF databases. This type of data can be very useful even if its temporal granularity, spatial resolution or publication frequency is not ideal.

Before continuing to explore how countries interact, we need to stop for a moment and think about how people behave when it comes to making a call.
In Figure 9 we have divided the calls made daily into four different user groups: those who tend to call during working hours (green), those who call in their free time on weekdays (blue), those who call during the weekend (red), and finally those who call at night (purple). Although this first division may seem simple, it allows us to distinguish the users who normally call for personal reasons from those who call for work-related activity. We can highlight how, for example, the level of calls during the weekend easily exceeds those made from Monday to Friday, and furthermore, once you are in one of these groups you tend to stay there. It’s easy to say this was expected, but it’s the data that has been able to state and qualify these statements.
Figure 9: The daily evolution of calls made by users who normally call during office hours (green), those who call during the afternoon Monday-Friday (blue), weekend callers (red) and night-time callers (purple).
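
The exact rules behind this segmentation are not described here, but a rough sketch of the idea (again on the hypothetical call-event table used earlier) could bucket each anonymised caller according to when most of their calls are made:

import pandas as pd

calls = pd.read_csv("call_events.csv", parse_dates=["timestamp"])
calls["hour"] = calls["timestamp"].dt.hour
calls["weekend"] = calls["timestamp"].dt.dayofweek >= 5

def call_segment(row):
    # Simple rules loosely mirroring the four groups in Figure 9.
    if row["weekend"]:
        return "weekend"
    if 9 <= row["hour"] < 18:
        return "office hours"
    if 18 <= row["hour"] < 24:
        return "weekday evening"
    return "night"

calls["segment"] = calls.apply(call_segment, axis=1)

# Assign each caller to the segment where most of their calls fall, then
# count daily calls per group to reproduce a chart like Figure 9.
caller_segment = (calls.groupby("origin_id")["segment"]
                       .agg(lambda s: s.value_counts().idxmax()))
calls["caller_segment"] = calls["origin_id"].map(caller_segment)
daily_by_group = (calls.groupby([calls["timestamp"].dt.date, "caller_segment"])
                       .size().unstack(fill_value=0))
print(daily_by_group.head())
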
Coming back to the inter-country perspective, in the following graph we can see a geographical representation that can help us to better understand the communication flows. The original data has been simplified and scaled for convenience and ease of reading. We can monitor the changes in data from the end of Ramadan (7/7/16) to the earthquake in Italy (24/8/16).
Video 1: Animated representation of the connections made between Spain and the rest of the listed countries throughout the months of June, July and August 2016. We can see these connections in relation to the end of Ramadan and to the earthquake in Italy.
These representations have allowed us to confirm the personal (social) and professional (economic) links we mentioned before when referring to socio-economic data. In the next graphic we go a little deeper into a specific central European zone; the video gives us a closer look at the analysis of the data.

Figure 10: Geographical representation of communications within a defined zone in Europe.
Recapping what we have learnt from the data analysis: once we have separated the callers based on their habits and have precise information about their location, we can start to understand the fundamental relation between call data and other socio-economic indicators, not forgetting the link between global events, commercial relations between regions and even the simple interaction between people in their local communities.
For example:
  • We could analyse communications between eminently industrial zones and compare them with those of commercial seaports which are connected by transport links.
  • Combining knowledge of communications between the caller country and the destination country with historic immigration patterns has allowed us to give the data a deeper meaning. We could see that this was consistent with the data analysed about Argentina and Italy during the earthquake crisis in Italy, and for this reason we expect the same patterns between Spain and Germany. Does this mean that this call information could become a true indicator of modern-day migration? It’s possible that one day we might be able to predict the flow of people through data.
Clearly our digital footprint goes a long way in describing us and our behaviour.
Written by Pedro de Alarcón.

Artificial Intelligence: What even is that?

Richard Benjamins    29 November, 2016
Artificial Intelligence (AI) is the hottest topic out there at the moment, and often it is merely associated with chatbots such as Siri or other cognitive programs such as Watson. However, AI is much broader than just that. To understand what these systems mean for Artificial Intelligence, it is important to understand the “AI basics”, which are often lost in the midst of AI hype out there at the moment. By understanding these fundamental principles, you will be able to make your own judgment on what you read or hear about AI.

This post is the first of a series of three posts, each of which discusses fundamental concepts of AI. In this first post, we will discuss some definitions of AI, and explain what the sub-fields of AI are.

What are the most common definitions of AI?

So, first of all, how does Google (one of the kings of AI) define Artificial Intelligence?

Figure 2: A popular definition of Artificial Intelligence (Google).
There are many definitions of AI available online, and all of them refer to the same idea of machine intelligence; however, they differ in where they put the emphasis, which is what we have analysed below (an overview of these definitions can be found here).

For example, Webster gives the following definition:

Figure 3: The official Webster definition of Artificial Intelligence.
All definitions, of course, emphasise the presence of machines which are capable of performing tasks which normally require human intelligence. For example, Nilsson and Minsky define AI in the following ways:
  • “The goal of work in artificial intelligence is to build machines that perform tasks normally requiring human intelligence.” (Nilsson, Nils J. (1971), Problem-Solving Methods in Artificial Intelligence (New York: McGraw-Hill): vii.)
  • “The science of making machines do things that would require intelligence if done by humans.”  (Marvin Minsky)
Other definitions put emphasis on a temporary dimension, such as that of Rich & Knight and Michie:
  • “AI is the study of how to make computers perform things that, at the moment, people do better.” (Elaine Rich and Kevin Knight)
  • “AI is a collective name for problems which we do not yet know how to solve properly by computer.” (Michie, Donald, “Formation and Execution of Plans by Machine,” in N. V. Findler & B. Meltzer (eds.) (1971), Artificial Intelligence and Heuristic Programming (New York: American Elsevier): 101-124; quotation on p. 101.)
The above definitions portray AI as a moving target: making computers perform things that, at the moment, people do better. Forty years ago, imagining that a computer could beat the world champion of chess was considered AI; today, this is considered normal. The same goes for speech recognition: today we have it on our mobile phones, but 40 years ago it seemed impossible to most.
On the other hand, other definitions highlight the role of AI as a tool to understand human thinking. Here we enter into the territory of Cognitive Science, which is currently being popularized through the term Cognitive Computing (mainly by IBM’s Watson).
  • “By Artificial Intelligence I therefore mean the use of computer programs and programming techniques to cast light on the principles of intelligence in general and human thought in particular.” (Boden, Margaret (1977), Artificial Intelligence and Natural Man, New York: Basic Books)
  • “AI can have two purposes. One is to use the power of computers to augment human thinking, just as we use motors to augment human or horse power. Robotics and expert systems are major branches of that. The other is to use a computer’s artificial intelligence to understand how humans think. In a humanoid way. If you test your programs not merely by what they can accomplish, but how they accomplish it, then you’re really doing cognitive science; you’re using AI to understand the human mind.” — Herbert Simon
Some, however, take a much more concise and less scientific approach, with definitions such as:
  • “AI is everything we can’t do with today’s computers.”
  • “AI is making computers act like those in movies.” (Her, AI, Ex Machina, 2001: A Space Odyssey, etc.)

From all of these definitions, the important points to remember are:

  • AI can solve complex problems which previously could only be addressed by people.
  • What we consider today as AI, may just become commodity software in the not so distant future.
  • AI may shed light on how we, people, think and solve problems.

What are the sub areas of AI?

Looking at the introductory table of contents of any AI textbook will quickly reveal what are considered to be the sub-fields of AI, and there is ample consensus that the following areas definitely belong to it: Reasoning, Knowledge Representation, Planning, Learning, Natural Language Processing (communication), Perception and the Ability to Move and Manipulate objects. But what does it mean for a computer to perform those tasks?
  • Reasoning. People are able to deal with facts (who is the president of the United States), but also know how to reason, e.g. how to deduce new facts from existing facts. For instance, if I know that all men are mortal and that Socrates is a man, then I know that Socrates is mortal, even if I have never seen this fact before. There is a difference between Information Retrieval (like Google search: if it’s there, I will find it) and reasoning (like Wolfram Alpha: if it’s not there, but I can deduce it, I will still find it).
  • Knowledge Representation. Any computer program that reasons about things in the world needs to be able to represent virtually the objects and actions that correspond to the real world. If I want to reason about cats, dogs and animals, I need to represent something like isa(cat, animal), isa(dog, animal), has_legs(animal, 4). This representation allows a computer to deduce that a cat has 4 legs, because it is an animal, not because I have represented explicitly that a cat has 4 legs, e.g. has_legs(cat, 4). A minimal code sketch of this kind of deduction appears below, just after Figure 5.
  • Planning. People are planning constantly: if I have to go from home to work, I plan what route to take to avoid traffic. If I visit a city, I plan where to start, what to see, etc. For a computer to be intelligent, it needs to have this capability too. Planning requires a knowledge representation formalism that allows us to talk about objects, actions and about how those actions change the objects, or, in other words, change the state of the (virtual) world. Robots and self-driving cars incorporate the latest AI technology for their planning processes. One of the first AI planners was STRIPS (Stanford Research Institute Problem Solver), which used a formal language to express states and state changes in the world, as shown in Figure 4.
Figure 4: The STRIPS planner to build a pile of blocks.
  • Learning. Today this is probably the most popular aspect of AI. Rather than programming machines to do what they are supposed to do, machines are able to learn automatically from data: Machine Learning. Throughout their life, and especially in the early years, humans learn an enormous amount of things, such as talking, writing, mathematics, etc. Empowering machines with that capability makes them intelligent to a certain extent. Machines are also capable of improving their performance by learning by doing. Thanks to the popularity of Big Data,  there is a vast amount of publications on Machine Learning, as well as cloud-based tools to run ML algorithms as you need them, e.g. BigML.
  • Natural Language Processing. We, humans, are masters of language processing, since communication is one of the aspects that makes humans stand out from other living things. Therefore, any computer program that exhibits similar behavior is supposed to possess some intelligence. NLP is already part of our digital lives. We can ask Siri questions, and we get answers, which implies that Siri processes our language and knows what to respond (oftentimes).
  • Perception. Using our 5 senses, we constantly perceive and interpret things. We have no problem in attributing some intelligence to a computer that can “see”, e.g. can recognize faces and objects in images and videos. This kind of perception is also amply present in our current digital life.
  • Move and Manipulate objects. This capability is above all important for robotics. All our cars are assembled by robots, though they do not look like us. However, androids look a bit like us and need to manipulate objects all the time. Self-driving cars are another clear example of a machine manifesting this intelligent capability.
Figure 5: Self-driving cars combine many capabilities of Artificial Intelligence.
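
As promised under Knowledge Representation above, here is a minimal toy sketch of that isa-style deduction (purely illustrative; production knowledge bases use far richer formalisms):

# Toy knowledge base: facts stored as (relation, subject, object) triples.
facts = {
    ("isa", "cat", "animal"),
    ("isa", "dog", "animal"),
    ("has_legs", "animal", 4),
}

def isa(kb, thing, kind):
    """True if `thing` is a `kind`, following isa links transitively."""
    if thing == kind:
        return True
    return any(isa(kb, parent, kind)
               for (rel, subj, parent) in kb
               if rel == "isa" and subj == thing)

def legs(kb, thing):
    """Deduce the number of legs by inheritance rather than an explicit fact."""
    for (rel, subj, n) in kb:
        if rel == "has_legs" and isa(kb, thing, subj):
            return n
    return None

print(legs(facts, "cat"))  # 4, deduced because isa(cat, animal) holds
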
In this first post (of three), we have explained some key notions about Artificial Intelligence. If you couldn’t do so before, you will now be able to read AI publications a bit differently.  In the next post, we will elaborate on the question of how intelligent AI can become. Stay tuned!

Telefónica Mannequin Challenge

Florence Broderick    25 November, 2016
Today in the office we decided to do our very own Mannequin Challenge, bringing together employees from all over Telefónica. This viral internet craze has even frozen the internet in recent weeks so we decided to do our own version:

The network might never stop at Telefónica, but sometimes our employees do. Interested in finding out more about joining our team? Visit our careers website here.

Can Mobile Data combat Climate Change in Germany?

AI of Things    24 November, 2016
One of our favourite topics here at LUCA is using Big Data for Social Good, to measure our progress on Sustainable Development Goals. Three of the 17 goals are closely linked to Climate Change: Affordable and Clean Energy; Sustainable Cities and Communities and Climate Action.
Telefónica has shown their commitment to these goals by promising that 50% of its energy will be renewable by 2020 (and 100% by 2030).  As the specialist Big Data unit of Telefónica, we strongly believe that data is fundamental in ensuring we drive society towards a more sustainable model – as we showed yesterday with our post on the opportunity of finding carsharers using mobile phone data.

However, aside from goals 7, 11 and 13 there is also number 17, which we believe is extremely important. The 17th goal is to ensure we create compelling partnerships in order to achieve these objectives, and this is precisely what our team in Germany is doing. We have been working alongside Teralytics and the South Pole Group to find a smarter data solution for air pollution in the city of Nuremberg.
Urban areas and their respective local governments are facing immense challenges with accelerating rates of CO2 emissions. In their mission to ensure cleaner air for their cities, the first and most important step is to collect accurate data to identify where the major air pollution hotspots are.
Cities across Germany have shown excessive pollution levels in recent years, and to combat this we have been working with a wide range of public sector bodies using mobile data. Working with partners, we are providing actionable insights about traffic and crowd mobility patterns to help the authorities in Nuremberg measure and predict pollution in a more cost-effective way, reducing the impact for the German taxpayer.
Figure 2: How can we use Big Data to reduce pollution in Germany?
For local governments, air quality management can be costly and, more often than not, the way we study traffic is relatively manual, using roadside interview data and manual counters. Not only is this expensive, but it’s also often inaccurate, providing only a small snapshot of how traffic really moves around cities and countries. However, by using mobile data the authorities in Germany are now able to shift to Big Data, rather than small samples, receiving insights on a regular, more dynamic basis compared with more traditional data collection methods.
Mobile data technologies, such as Smart Steps, allow us to know how fast cars are travelling, which roads suffer with more traffic and which mode of transport people are using (and much more). After anonymising and aggregating our data, we are able to provide key insights for pollution analysis and transport planners looking to decarbonise their cities.
Figure 3: Calculating pollution using mobile phone data.
For decision makers in Nuremberg, tackling air pollution is a top priority. Officials have drafted a new clean air plan and the results of their pilot project will enable them to prioritise according to the needs of the city. As Germany’s largest mobile communications provider in terms of subscribers, Telefónica and LUCA will transform the data using their advanced technology platform based on machine learning and advanced algorithms together with Teralytics. Then, the South Pole Group will provide the analysis to calculate air pollution emissions. Data quality checks will be performed using historical data for comparison.
Using the results of this project, Nuremberg will be able to identify which areas of the region are worst affected. For example, the city council may substitute transportation options which have high emissions with greener solutions, or they may identify areas to extend bicycle lanes. By having better traffic data, cities like Nuremberg can now rise up against the challenges of pollution and smog – improving the lives of thousands of citizens.
Alexander Lange, who works in Business Development in LUCA in Germany, presented this project in Berlin at the European Commission’s conference “Decarbonising Transport: Smart Mobility Innovation for Sustainable Cities“. If you would like to find out more, please contact us here to receive more information on this ground-breaking project.

Take part in Latch Plugins Contest with such hacks as Paper Key. Are you game?

Florence Broderick    24 November, 2016
At ElevenPaths there is a tradition of fostering innovation and training the ability to transform an idea into something tangible; as you may know, in development processes, projects often have “asymptotic” completion times.

Every six months we are challenged to develop an idea for 24 hours in a row, put it into practice and then present it in public. It can be anything, but the important thing is that it works. We call it Equinox.

In the Fall 2016 Equinox, a group of colleagues (Jorge Rivera, Pedro Martínez, Alberto Sánchez and Félix Gómez) wanted to unite the abstract, logical security, with something specific that you could touch. We also thought that, at the same time, we could use the technology of Latch and the new API developed this year (the “operation instances” of the Latch SDK).

From there, the Paper Key project was created, with which we wanted to unite different technological pieces, prioritizing the security of the whole process, and abstracting the technology, so that the use is simple and intuitive.

The idea is to be able to issue a token that gives access to a service or device. This token is printed on paper (which I have) and is only valid when the token Issuer authorizes its use from the Latch application (second authorization factor).


In our real example, a person can print a ticket with an associated amount of money, and after authorizing the operation in Latch from their mobile, a second person exchanges the ticket in an automatic wallet, which will deliver the indicated amount of coins.

The whole process involves two people (the Issuer and the Recipient) and four technology blocks: the web application, the ticket printer, the API Python server, and the ticket reader + wallet.

The Issuer, from a web application, generates a ticket with an operation identifier and an amount of money. The operation is associated with the Issuer’s Latch account, and the ticket is sent to the Recipient by physical means, or with the printer that is in their environment.

When the Recipient wants to consume the ticket (in this case, get an amount of euros from an automated wallet), they approach a ticket reader, which will check the status of the authorization in Latch. As long as the ticket Issuer does not authorize the operation, the service cannot be accessed or consumed, and a notification will also be sent to their Latch app that someone is attempting to use the ticket (which is the standard behavior).

The architecture used in this proof of concept could be optimized, but since we had to finish all developments in 24 hours, we needed to share the work among the four of us. (This approach also allows the server, printer and ticket reader to be distributed in different locations, since they communicate with each other via the Internet).

Taking into account the premises of Equinox (24 hours, that it works and that it can be explained!), we describe the different components in more detail.

The WebApp
It is a simple PHP application with a liquid HTML interface that adapts the forms to the different sizes and orientations of mobile phone screens.

The application runs on a WAMP server and communicates with an API in Python to interface with the printer and the ticket reader. It is a standard PHP application, where users are authenticated by username and password against a MySQL database, generating a session token. You can find a lot of examples of how to do this on the web.

The WebApp allows the user to browse, and after being validated, to select an amount of money and write a free text to identify the operation. This information is sent via a POST to a Python server, which will generate a request for the printer.

The response from the Python API server is a JSON document that we parse on the PHP server to return the response to the WebApp:
{
status: [Ok/NOK]
money: [amount of money – to inform the WebApp]
id: [Identifier returned by the server – for the WebApp]
}

In the response to the POST we receive the status of the operation and the generated ID, which is shown on the screen of the Issuer’s phone.
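
For illustration, the same round trip could be scripted in Python with the requests library; this is just a sketch with a hypothetical endpoint (in the real project the call is made from the PHP WebApp):

import requests

API_URL = "http://api-server.local:1338/generate"  # hypothetical address

payload = {"money": 5, "text": "Equinox Demo 2016"}
response = requests.post(API_URL, json=payload, timeout=10)
response.raise_for_status()

result = response.json()
if result.get("status") == "Ok":
    # The returned ID is shown to the Issuer and encoded in the printed ticket.
    print("Operation created:", result["id"], "for", result["money"], "EUR")
else:
    print("Ticket generation failed")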

The ticket printer
This subsystem consists of a Raspberry Pi and a thermal ticket printer. The printer (Brother QL-570) was kindly lent to us by the Secretariat team, and we got the Raspberry from the IoT Security lab, which has enough hardware to play with.

The Raspberry is connected to the Internet via Wi-Fi, and it waits on a port for a REST request with the contents it has to print (the “generateID” operation):

{
instanceId: [Latch instance ID]
money: [amount of money in Euros]
}

A two-dimensional QR code is generated with the libqrencode library and, with the ImageMagick libraries, the code is superimposed over a pre-established background with the “Equinox” logo. Then the text from the request is added, in this case the value of the generated ticket.

The final ticket is printed from the Raspberry Pi thanks to the printing pseudo-driver for this printer, available on GitHub.

The QR code contains an operation identifier encoded in Base32, and will allow the QR code reader to check the authorization status of the operation before providing money (1 Internet Point goes to whoever tells us why we had to use Base32 instead of Base64).
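
On the Raspberry Pi this is done with libqrencode and ImageMagick, but the gist can be sketched in a few lines of Python with the qrcode package (purely illustrative, not the code used in the demo; the operation ID is hypothetical):

import base64
import qrcode

operation_id = "a1b2c3d4e5f6"  # hypothetical Latch operation instance ID

# Base32 keeps the payload in a single-case alphabet (see the scanner note
# further below), at the cost of a slightly longer string.
payload = base64.b32encode(operation_id.encode("ascii")).decode("ascii")

img = qrcode.make(payload)   # the 2-D code that identifies the operation
img.save("ticket_qr.png")    # later composited onto the ticket background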

The Python API server
On this server we can find the API in Python for Latch (interface between the WebApp, the printer, the ticket reader and the Latch server) and the WAMP server.

The server is invoked by the WebApp, using a POST to port 1338, with the fields:

{
money: [amount of money in euros]
text: [string of text that will appear in the Latch app]
}

Two operations are now executed sequentially:
1. The server creates a request via the API to request the “operation instance” from the Latch system of ElevenPaths, so that a new line with the text identifier of the operation appears in the Latch app associated with the user. This operation is now subject to the authorization of the user; it is “latched”.

And in the interface of the phone app … we find, within the PaperKey service, a new “operation instance” with the entered text “Equinox Demo 2016”.

2. The server invokes the ticket printer (IP and port of the Raspberry associated with the printer) so that the ticket is printed with the QR code associated with the operation. At this moment, the Issuer has generated an operation in Latch, and also has printed a paper ticket with a QR code that identifies said operation.

If the Recipient of the operation (that person who physically takes the ticket) would like to use it, they must wait for the Issuer to authorize such operation.

Ticket reader + money bank
This system is composed of another Raspberry Pi (in the cardboard box), a laser QR code reader like those in supermarkets, and a colorful coin dispenser (we told you they have a lot of toys).

The laser reader is presented by USB as a standard HID keyboard, so that to transmit information to the operating system it simulates keystrokes corresponding to the scanned code (digits or characters).

This posed an interesting problem with the terminal. In order to capture keystrokes without the STDIN of the process – since this would be in its console, and not available from a process launched in a pseudo-terminal – we used a wrapper programmed in C that intercepts the events of the device that the Linux kernel presents in user space as /dev/input/event5.

And this caused us a second problem, since the operation identifier we use has alphanumeric characters in uppercase and lowercase, and the keyboard emulation of the scanner only produces characters that do not require simultaneous keystrokes (e.g. [SHIFT] + letter). So we had to convert the code to Base32 (which collaterally increases the size of the string, so the density of the QR code must be increased as well). If you have read this, we will not give you that Internet Point. After all the twists and bumps, we have an operation identifier. From the Raspberry, we build a JSON request and launch it against the Python API server as the “checkID” operation:

{
Id: [Operation identifier]
}
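
A hedged Python sketch of that reader-side flow, with a hypothetical endpoint and the field name taken from the schema above, might look like this:

import base64
import requests

API_URL = "http://api-server.local:1338/checkID"  # hypothetical address

scanned = input("Scan ticket: ").strip()  # what the C wrapper reads from the scanner
# Undo the Base32 encoding applied when the ticket was printed.
operation_id = base64.b32decode(scanned).decode("ascii")

response = requests.post(API_URL, json={"Id": operation_id}, timeout=10)
if response.ok:
    print("Latch is open: dispensing coins")
else:
    print("Operation still latched or unknown: access denied")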


The server sends a query to Latch, providing the operation ID associated with the user. If the operation is “latched” (“Latch ON”), the system will return an error.

If the operation has been unlatched (“Latch OFF”), the system will consider the operation as authorized and will proceed to provide the amount of money indicated in the automatic wallet. The wallet is connected to the Raspberry Pi by USB, and it receives the amount of coins to dispense with a code of 4 digits.

Taking part in Latch Plugin Contest
Paper Key, as proof of concept, allowed us to prove that it is simple (we did it in 23.5 hours!) to integrate different technologies to achieve a secure and user friendly system with many use cases, depending on the imagination of each person.

For example, lockers containing a product provided by the Issuer that can only be opened by the Recipient, upon confirmation of payment received by the Issuer via their Latch. Or one could issue tickets for a free bar: only when the party responsible for paying decides so via their Latch can the tickets be validated in exchange for drinks. One could also give one-time access (OTA) to a facility, for example free trial days of access to a gym.

As you can see, a lot of things can be done with relatively simple integrations.

We would like to take this opportunity to remind you that a few weeks ago ElevenPaths launched a new edition of the Latch Plugins Contest. In this contest you can win up to $5,000; remember that what is rewarded is the imagination, talent, creativity and solution provided. If you want to know all the steps to follow to register, visit our Community, where we explain how to participate and where you can find tricks and tips, and also join the conversation about the Latch Plugins Contest. In addition, if you want to know all the mechanics of the contest, you can also check the legal terms and conditions.

Remember that the deadline of the contest is December 12, 2016, show your hack side and participate now!

*Related content:
Latch Plugins Contest: the plugins and hacks contest in which you can win up to 5,000 USD
Latch Plugins Contest: Remember the story!

Industry 4.0 is much more than Smart Factories

Beatriz Sanz Baños    23 November, 2016

We should start by explaining what Industrial IoT (IIoT) – also known as Industry 4.0 – is. It is undoubtedly as much a revolution as the three previous revolutions were. But this revolution is quieter and trickles from certain businesses to reach the entirety of organizations. In the digital era, changes move faster than in previous periods.

The term Industry 4.0 was coined by the German Academy for Science and Technology (acatech), an association that protects the interests of key players such as Bosch or Siemens. The term chosen by acatech has become very popular in the industry and describes a task that was being carried out before there was even a term to describe it: adding intelligence to industrial processes.

Industry 4.0: Evolution or revolution?

Obviously, some industries – like the automotive industry, for instance – have been working on improving efficiency in the manufacturing process for quite some time, and automation in these sectors is exhaustive. Industry 4.0 is a holistic reshaping of the industrial plant, taking into consideration that this does not only include the assembly premises but also the supply chain, the final distribution and so on. These centres combine technology as a core factor, both as a means to achieve connectivity and, through automation, as a way to optimize the organization’s resources. An initial conclusion is that Industry 4.0 goes beyond the traditional boundaries of the Smart Factory where the product is assembled. It now integrates the raw material supplier, logistics deliveries, the workers (who are now connected), etc., allowing the whole process to flow in a streamlined and connected manner throughout the different stages.

Sometimes, IIoT is called the fourth industrial revolution. In a certain manner, it is safe to say that every evolutionary step in the history of industrialisation has had a revolutionary component that stimulates a part of the productive process. The first factor addressed was energy. When chain production was developed, the effort was to improve productivity. The progressive decline in cost and improvements in automation spurred the search for better quality. Finally, this new transformation that the Internet of Things has brought aims to achieve more efficiency in the processes and to strive for a zero-manufacturing-defects scenario. Extreme efficiency is undoubtedly a prime goal for Industry 4.0. Business areas such as the automotive industry achieve 120-130% efficiency thanks to automation allowing for end-to-end control of the process. This trend towards minimum cost and maximum quality is a challenge that manufacturers need to address in a globalized market that is taking competitiveness to a new level.

Technology was not created only for the benefit of industrial use; however, the industrial market has been the keenest to embrace and maximize these advancements. Certain technologies make it possible to improve efficiency: the development of electronics, communications, processing power, analytical capacity, etc. are key components of the IoT. These elements have also been put to use by industrial businesses to achieve a higher degree of efficiency. To illustrate this with an example, 4G was not developed to improve Industry 4.0, but Industry 4.0 benefits from this technological advancement in communications. Another example is miniaturization. It was not developed for industrial use, but for consumer electronics, which required smaller and easier-to-use form factors. However, IIoT has capitalized on it to achieve more efficiency – for example using smaller and more efficient truck sensors. There is a transversal benefit for the IIoT, allowing it to make the most of technology which was developed for – or meant for – other business areas and industries.

Looking towards the future, this technological reinvention of different industries will continue its unstoppable drive in the years to come. This first stage – currently underway – especially affects big companies which need to control costs very closely, compete in highly competitive markets, and are on a path to reach the aforementioned ultra-efficiency. The transformational forces are not bound to stop after reaching the big players, and the expansion of the IIoT will continue until it reaches the whole industrial and manufacturing process of this fourth industrial era.

It will be neither an automatic nor a simple shift. It will require a new skillset, a new way of focusing on protecting data security and managing the uncertainty that digitization brings. Industry 4.0, for the reasons we have just mentioned, requires experienced and talented partners to guide companies through industrial transformation, adapting the general principles to the specific needs of each industry.

Commuter Traffic: Can Big Data solve the problem?

AI of Things    23 November, 2016
When we sit in our daily traffic jams, many of us may think: Where do the other commuters come from? Where are they on their way to? Are we all going in the same direction? Perhaps the lady in the car next to you actually lives two doors away and has the exact same commute as you every day. Here at LUCA, we decided to take a data-driven approach by looking at our mobile data insights to show you the huge potential of carsharing, demonstrating that we commuters have a lot more in common than you may think.
Smart Steps is part of the LUCA portfolio and enables us to extract actionable mobility insights from our customers’ mobile event data. After anonymizing and aggregating the data, we are able to understand the demographic profile of groups of mobile phones (which act as a proxy for groups of people) as well as identifying their home and work locations:
In this analysis  we decided to use Smart Steps to have a social impact, looking at how our product could contribute to sustainability goals. First of all, we perform pre-processing on our anonymized raw data in order to provide useful information in a recurrent way. An example of this is the assignment of POIs (points of interest) which can be either “work” or “home”.
As you can imagine, these POIs are calculated based on mobility patterns. So, we began to think about use cases related to the sharing economy, public transport planning, environmental impact and infrastructure construction. In the end, we decided to focus our analysis on a typical day in Madrid, which suffers tremendous traffic jams in rush hour (as in many cities) and has recently been affected by considerable pollution issues.

Deciding how to address such problems is a real challenge for the authorities, so we have developed a simple tool which explores how many people share both their home postcode and their work postcode.

Where do we work?

First of all, we extracted a heat map showing the density of workers in every postcode in Madrid. Below you can check out this map and compare to see if it is aligned with your expectations. The darker the colour, the greater the number of workers in this area:
We then decided to dig deeper by looking at the catchment of the Telefónica Headquarters, establishing where workers commute from every day. We expand on the heat map below in our demo video here in case you find this interesting:
Figure 4: Where do people commute from to the Telefónica Headquarters postcode?

How do we move around the city?

Fully engaged by this part of the analysis, we decided to build a simple dashboard which is able to show the home-to-work relationships between all postcodes in Madrid. By doing so, we ended up creating a tool which could help us to, for example, find carsharing partners who share both home and work postcode on a daily basis, providing a unique opportunity for companies and the public sector to encourage this green initiative (we’ll elaborate on this next week in a more specific post on carsharing).
Not satisfied by this, we took the analysis one step further, looking in greater detail at the movements of masses every day. We grouped the postcodes into four areas: (1) Internal-North, (2) Internal-South, (3) External-North and (4) External-South, where “internal” refers to the area inside the M-30 motorway, and “external” refers to the rest of the Autonomous Community of Madrid. Based on these areas, we can see how masses move. In the video below, we explain how this dashboard works:

Another great way to understand and visualize how people move in Madrid is to use graph analytics. The commuting dataset contains an aggregated count of the number of people moving from home (postcode A) to work (postcode B). This can be seen as a directed graph where nodes are postcodes and edges are weighted by the count of moving people. We then performed some pre-processing and ran the community detection algorithm in Gephi to find out which groups of postcodes are intrinsically connected (communities). The algorithm produced 5 groups or communities which you can see below:
Figure 5: We used Gephi to detect 5 communities of highly connected postcodes.
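
The same idea can also be sketched outside Gephi: networkx ships a modularity-based community detection that plays a similar role to the algorithm we ran there. The snippet below is a rough sketch assuming a hypothetical CSV of aggregated, anonymised home-to-work counts; it also computes the “poles of attraction” discussed further down.

import pandas as pd
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical aggregated flows: home postcode -> work postcode -> people count.
flows = pd.read_csv("commuting_flows.csv")  # columns: home_cp, work_cp, people

G = nx.DiGraph()
for _, row in flows.iterrows():
    G.add_edge(row["home_cp"], row["work_cp"], weight=row["people"])

# Modularity-based communities on the undirected projection of the graph.
communities = greedy_modularity_communities(G.to_undirected(), weight="weight")
for i, community in enumerate(communities):
    # Pole of attraction: the postcode receiving most commuters in its community.
    pole = max(community, key=lambda cp: G.in_degree(cp, weight="weight"))
    print(f"Community {i}: {len(community)} postcodes, pole of attraction {pole}")
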
Then, we can easily represent nodes and colors (communities) on a map using Spotfire:
As you can see in figure 6 and the second video, there are very clearly defined mobility areas, and in general, people appear to live and work in the same area (North, East, South, Central and West) as the colours are relatively compact. This is beneficial for the quality of our air as it implies shorter journeys to work. However, there is an exception with the “blue” community (Madrid East, also known as “Corredor de Henares”) as it shows a much sparser pattern than the others.

Another interesting approach is to investigate the “poles of attraction” of each community, that is, the postcodes with the highest number of commuters within their community (the biggest circles in the Gephi graph and map in figure 5), which are really the busy areas of each zone. This is also demonstrated in our second video.
Of course, our analysis is a relatively simple and straightforward approach to what could be a much more complete tool. There is plenty of fine tuning to be done, including greater capabilities and more extensive data which could complete the analysis. However, it is a first step in understanding how individuals, companies, NGOs and public administrations could use this data to improve our lives in an anonymized and aggregated way – prioritising security every step of the way.
After reading this, do you have further ideas on how to apply our data? Let us know by dropping us an email here or commenting on this post. We’d love to hear from you.
By Javier Carro and Pedro de Alarcón, Data Scientists at LUCA.