Big Dating: Could AI be the real matchmaker on Tinder?

Florence Broderick    21 November, 2016
Online dating platforms such as Tinder, Happn and Hinge are seeing exponential growth, slowly sliding onto the home screens of smartphone users all over the world. Last week at the Web Summit in Lisbon, Tinder’s CEO, Sean Rad, spoke about just how popular the world of swiping and superliking has become, declaring that 80% of people on the app are actually searching for “serious relationships”. He also shared that 85% of users are Millennials and that 1.4 billion swipes take place every day, creating 26 million daily matches.

Cloud Artificial Intelligence
Figure 1: Could Artificial Intelligence be a game changer in the world of online dating?
However, the massive popularity of turning to online platforms to meet potential suitors can cause great frustration for many. In a world where we can digitalise and automate so much of what we do, some find it too time-consuming, whilst others simply get bored of the generic icebreaker conversations. So, what if Artificial Intelligence could relieve online daters of the daily monotony of searching for the perfect match? What if they could invest that extra hour in cooking their favourite recipe or hitting the gym ahead of any potential date? Well, with Bernie, they can.
Bernie’s functionalities
Figure 2: An overview of Bernie’s functionalities in the online dating world.
Bernie, otherwise known as the “Personal Dating Assistant AI”, is a startup based out of Vancouver that aims to take the friction out of online dating. This bot lets you write customizable messages introducing yourself to potential candidates, “sounding like you, not someone else”. He also provides “freedom from hours of daily swiping”, saving users time by eliminating dates who won’t work out. Furthermore, he learns who you find attractive, “working hard to meet your standards.”

The solution relies on both Artificial Intelligence and Deep Learning, and the founder shared the number crunching in this blog post, revealing that out of a sample size of 164,519 efforts (actions or events by Bernie), users only reversed Bernie’s decisions 225 times, giving it a remarkable feedback accuracy of 99.86%.
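As a quick back-of-the-envelope check of that figure (the numbers come from the founder’s post; the variable names below are ours), the accuracy is simply the share of Bernie’s actions that users did not reverse:

# Back-of-the-envelope check of Bernie's reported feedback accuracy.
# The figures come from the founder's blog post; the variable names are ours.
total_efforts = 164519   # actions or events performed by Bernie
reversals = 225          # times users overrode Bernie's choice
accuracy = (total_efforts - reversals) / total_efforts
print("Feedback accuracy: {:.2%}".format(accuracy))  # ~99.86%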
However, whilst Bernie may entertain the serial online dater, how will the potential “victims” feel when they find out it was a robot and its algorithms that actually discovered their “unique” profile on online dating platforms? Well, the founder claims that his countless hours of research, experimentation and bug-fixing have been worth it, as he has now found a girlfriend who did not see any problem with his data-driven approach to dating.
On a more negative note, Tinder is also suffering from an invasion of trained Artificially Intelligent chatbots trying to trick users. Candid Wueest from Symantec explained that “the reason they exist is because somebody, somewhere is making money out of them” in an article in El Confidencial. These bots drive traffic to commercial websites where they try to get users to subscribe to premium services and share credit card details which can then be used by fraudsters, all while the users innocently pursue love.
An example of this was highlighted in TechCrunch, where many users reported fake profiles of women driving male users towards a mobile game called Castle Clash, using a Tinderverified.com URL to make it seem legitimate. The spam aimed to boost the game’s downloads but ended up aggravating many users who believed they were actually talking to real people, as you can see below:

Tinder Bot
Figure 3: An example of a Tinder Bot tricking a user to visit a gaming website.

A similar case also took place when someone used Tinder to collect Uber referral credits, which was also against Tinder’s terms of service.  As more and more of these cases emerge, it’s clear that one of the greatest challenges for online dating platforms is learning to co-exist with AI and bots, as innocent or dangerous as they may be for users. 
As well as protecting their users, the platforms also have to ensure they are integrating AI within their own applications to make online dating smarter, so that users don’t even feel the need to download Bernie in the first place. If they achieve that, then perhaps Sean Rad’s number of daily swipes may drop in the years to come.

ElevenPaths and Etisalat Digital announce their collaboration for Mobile Security R&D

Florence Broderick    21 November, 2016
Madrid, November 21 2016. ElevenPaths, Telefónica’s Cyber Security Unit, and Etisalat Digital, two of the world’s leading providers of communications services and solutions, announced today their collaboration in the field of Mobile Security research & development (R&D), to conduct extensive research in monitoring and analysing mobile threats for applications and devices. The collaboration was announced at the recent RSA Conference 2016 held in Abu Dhabi. This is an expansion of their alliance beyond their existing shared portfolio of Managed Security and Cyber Security Services. The agreement in Security Services is part of the broader collaboration between both companies in multiple areas, under the framework of the Strategic Partnership Agreement originally signed in June 2011.

Francisco Salcedo, Senior Vice President Etisalat Digital said “today’s announcement is significant as mobility has moved beyond devices, apps and online transactions to a connected ecosystem. This transformation has made mobile platforms vulnerable and an easy target for cybercriminals. The collaboration and the deployment at Etisalat Digital’s Cyber Security Operating Centres will enable both partners to provide a solution for enterprises to control fraudulent activity which directly impacts its services, brand or reputation.”

The tools and knowledge used to prevent PC malware are completely different from those required for mobile malware. The mobile ecosystem is extremely dynamic, and cybercriminals are constantly evolving the tools and techniques used for such activities. They look for sustainable, scalable business models that generate revenue through fraud while defeating the security enhancements regularly introduced by mobile app markets.

Both companies will work with Tacyt, a cyber intelligence tool developed by ElevenPaths for mobile threat monitoring and analysis. Tacyt uses a big data approach to research the mobile app environment and offers an enterprise-grade service to conduct full investigations, including mobile malware classification, attribution and categorization. Monitoring and in-depth analysis of mobile malware require multiple approaches:

  • The mobile ecosystem is extremely dynamic, and cybercriminals look for sustainable, scalable business models that generate revenue through fraud while defeating security enhancements introduced by mobile app markets. 
  • Attribution and malware family categorization reveal trends in the cybercriminal community. 
  • Malware risk categorization is vital for mobile threat defense in Bring Your Own Device (BYOD) deployments. If an employee installs aggressive adware on a device, would that be enough to block access to corporate email on its own? What if the adware roots the device and installs a backdoor? Categorization helps answer such questions.

Pedro Pablo Pérez, ElevenPaths CEO and Telefónica Global Security Managing Director, said “we are pleased to collaborate with Etisalat Digital to conduct this in-depth research and analysis on mobile threats. Cyber analysts can use Tacyt for manual or automated search, matching, and investigation of different parameters (metadata) within iOS and Android apps. This allows the identification of potential ‘singularities’, a concept which refers to whatever data (dates, size, images, digital certificates), technical or circumstantial, makes the app or its developer, as a person, singular or unique compared with others.”

LUCA at Big Data Spain 2016: Our Full Roundup

Ana Zamora    18 November, 2016
Over the past two days our LUCA team has been out in full force at the Big Data Spain 2016 event, held this year at Kinépolis in the Ciudad de la Imagen. This technology summit, run by Paradigma Digital, brings together over a thousand Big Data professionals and is already in its fifth edition.

This year, we sent several members of our team from both our Data Science wing and our recruitment area to share information about our new Big Data brand with the wide range of data enthusiasts visiting our stand, and to run the technical challenge we mentioned on our blog the other day, throughout the two-day festival of data.
LUCA team
Figure 1: The LUCA team at Big Data Spain 2016

Proceedings kicked off with a welcome note from Oscar Méndez of Stratio, before Paco Nathan of O’Reilly Media took on the very popular topic of Artificial Intelligence, raising some pointed questions about AI replacing humans in industry. One striking slide showed that the most common job in almost every US state is truck driver, so perhaps we should think harder about how, and whether, the economy will allow machines to replace us as swiftly as many are expecting.
Throughout the morning session, we also particularly enjoyed Kay Brodersen’s talk on CausalImpact, which you can find more information about on GitHub. He demonstrated how this approach can be used to estimate the causal effect of a designed intervention on a time series, using the example of an advertising campaign to explain the technology to the audience.
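CausalImpact itself is distributed as an R package, but the core idea can be sketched in a few lines of Python: learn the relationship between the target series and a control series on the pre-intervention period, predict the counterfactual for the post-intervention period, and treat the difference as the estimated effect. The series and numbers below are invented purely for illustration and are not from the talk.

# Minimal illustration of the idea behind CausalImpact (not the package itself):
# model the target from a control series on pre-intervention data, predict the
# counterfactual after the intervention, and compare.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_pre, n_post = 70, 30
control = 100 + rng.normal(0, 1, n_pre + n_post).cumsum()   # e.g. a market baseline we did not touch
target = 1.2 * control + rng.normal(0, 2, n_pre + n_post)   # e.g. our own sales
target[n_pre:] += 15                                        # the true effect of the campaign

model = LinearRegression().fit(control[:n_pre].reshape(-1, 1), target[:n_pre])
counterfactual = model.predict(control[n_pre:].reshape(-1, 1))  # what would have happened anyway
effect = target[n_pre:] - counterfactual
print("Estimated average causal effect: {:.1f}".format(effect.mean()))  # should land near 15

The real package models the counterfactual with a Bayesian structural time-series model and reports credible intervals, but the fit, predict and compare structure is the same.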
The afternoon sessions also touched on some of our favourite topics, including Open Data (From Insight to Visualization with Google BigQuery and CARTO) and the role of Stream Processing in Big Data and the Internet of Things. As well as more technical sessions with fascinating demos, there were also some insightful, more academic talks on Growing Data Scientists and Managing Data Science, delivered by Amparo Alonso from the University of A Coruña and David Martinez from University College London.
On Day 2, our CDO, Chema Alonso, took to the stage for his keynote. He opened by discussing the increasing threat smartphones pose to our privacy, explaining how their extensive sensors, GPS, WiFi and accelerometers exist to create data, data which tells a lot about who we are and, importantly, where we are. Chema explained that location data is pivotal, and that although many of us innocently think that by switching off “Location Services” we are no longer being tracked, that is not actually true at all.
Chema Alonso speech
Figure 2: Chema Alonso explains the important link between Big Data and Security.
He explained that the apps we have installed on our phones start to understand us from the moment we install them, giving them permission to access information such as file location and account credentials. By quickly clicking “I Agree” we hand over our most important personal information such as our email address, our mobile phone number, our social media accounts and everything which we are logged into.
Chema then went on to discuss Tacyt, an ElevenPaths product which monitors, stores, analyzes, correlates and classifies millions of mobile apps while adding thousands of new apps every day. However, he mentioned that aside from apps there are many other ways of tracking us. For example, via WiFi we share the name of our home network, the location of our daily coffee shop and plenty of other information that we’re not aware of. Battery cookies are another way of knowing where we are and where we go every day.
Furthermore, he shared how telcos can also gather extensive location data through the mobile network without even considering GPS signal, explaining that mere location data gives away a lot more than just where you are. For example, where you go on holiday, where you work, where you shop, if you go by car or train or even where you park the car. Additionally, by knowing how long you are there, telcos can infer how much you sleep, if you have any lovers, your affluence – all of which can be combined to build a relatively accurate user profile – moving from where you are to who you are. 
Chema Alonso
Figure 3: Chema discussed a Play Store bug which “scared” people.
The Telefónica CDO then went on to explain that this data can be used for a lot more than commercial purposes such as advertising, but rather for anti-fraud products, traffic prediction or emergency services. He discussed how LUCA is ensuring they apply their commercial expertise to use Big Data for Social Good by analyzing data from natural disasters in Mexico or using mobility data to reduce CO2 emissions. 
Chema finished his talk by inviting all of us to stop for a second and have a good think about the most relevant questions.  Firstly the legal questions: are the terms and conditions clear enough? Are people aware of what the “yes, I agree” really means?
Secondly the ownership questions: who is the owner of our data? And if it is companies right now, then we have the questions of trust to address: do we know what they are doing with our data? And most importantly, is it all worth it? 
The link between Big Data and Privacy continues to pose a wide range of technical and ethical questions for all of us, and although many are still under intense debate, Chema clearly underlined the importance of putting security at the centre of everything we do, protecting users and ensuring transparency for customers.

Open Data and Business – a paradox?

Richard Benjamins    17 November, 2016
While Open Data has a wide range of definitions, Wikipedia provides one of the most commonly accepted: “Open Data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.” 

From our perspective, the most important word in this definition is “freely”.  And we pose the question: does this mean that Open Data and Business are incompatible? The short answer: absolutely not.

McKinsey stated in a 2013 report that Open Data (public information and shared data from private sources) can help to create $3 trillion a year of value in seven areas of the global economy. The opportunities that arise when data is opened up to the masses are clear.
However, the longer answer is that anyone who has tried to get hold of some Open Data and perform an analysis knows that this is not trivial. Open Data varies greatly in terms of quality, formats, frequency of updates, support, and so on. Moreover, it is very hard to find exactly the Open Data you are looking for. Today, most business and value from Open Data is generated through ad hoc consultancy projects that search, find and incorporate Open Data to solve a specific business problem.
However, one of the visions of Open Data is to create a thriving ecosystem of, on the one hand, Open Data publishers, and on the other hand, users, developers, startups and businesses that process, combine and analyze this Open Data to create value (e.g. to solve specific problems, or to discover important and actionable insights).
The current state of play is that those thriving ecosystems are still being formed, and there are several initiatives and companies that try to position themselves, mostly in specific niche markets. A few players in the field include:
  • OpenCorporates. A large open database of companies in the world.
  • Transport API. A digital platform for transport collecting all kinds of transport data, especially in the UK.
  • Quandl. A financial and economic data portal.
These companies and organizations focus on aggregating Open Data in a specific niche area, and their business model is built around access to curated, quality data. Other types of companies can then use this Open Data to run a specific business. A typical example is Claim my Refund, which uses open transport data (e.g. from Transport API) to automatically claim refunds for its customers when there are delays on their London Underground trips.



Another business model around Open Data is to help institutions publish their Open Data in a structured way. Such projects are mostly performed for governmental institutions:

  • Socrata and Junar are cloud platforms that allow government organizations to put their data online.
  • Localidata focuses on Location Data, especially in Spain.
  • FiWare is an independent, open community to build an open sustainable ecosystem around public, royalty-free and implementation-driven software platform standards.

Once the data is published as Open Data, developers and other companies can then access that data and build value added applications. In the governmental space it is not uncommon for Public Administration to pay for having its data published as Open Data, and then to pay again for an innovative application that uses this Open Data to provide value to citizens (e.g. with information about schools).

In conclusion, there is definitely a business model for Open Data. In the short term it revolves around specific niche areas such as transport, or ad hoc consultancy projects. In the medium term, business will evolve around Open Data ecosystems drawing on both the public and the private sector. However, the current state of play is relatively immature. The bottom line is that public Open Data still lacks quality and private Open Data is barely available.
But this doesn’t mean that Open Data is not already powerful. A great example is British Airways’ use of only three Open Data sets for an impressively innovative advertising campaign in Piccadilly Circus in London:
On a huge screen in Piccadilly Circus, a boy stands up and points to a passing plane only if it is a BA flight and it can actually be seen (i.e. there are no clouds). The advert is based on bringing together three data sources, all publicly available: GPS data, plane tracking data and weather data. This work illustrates the power of Open Data when combined with creativity.
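Stripped to its essentials, the campaign is a simple rule over three data feeds. A minimal sketch of that rule, with invented field names and thresholds, might look like this in Python:

# Sketch of the kind of rule behind the BA "look up" billboard: combine three
# open data feeds and decide whether the boy should point at the sky.
# All field names and thresholds here are invented for illustration.
import math

def distance_km(a, b):
    # Great-circle (haversine) distance between two (lat, lon) points in km
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

def should_point(flight, weather, billboard):
    # Point only if a BA flight is overhead and the sky is clear enough to see it
    return (flight["airline"] == "BA"
            and distance_km(flight["position"], billboard) < 3
            and weather["cloud_cover_pct"] < 40)

# Made-up readings near Piccadilly Circus
print(should_point({"airline": "BA", "position": (51.52, -0.14)},
                   {"cloud_cover_pct": 20},
                   (51.51, -0.134)))  # True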

Here at LUCA, we’re fascinated by Open Data so watch this space to see more posts and content on the power of opening up data to bring new value to society.


The Data Transparency Lab Conference 2016 kicks off tomorrow

Florence Broderick    15 November, 2016

By Ramon Sangüesa, Data Transparency Lab coordinator.


This week the 2016 edition of the Data Transparency Lab conference will take place. At this event, a community of technologists, researchers, policymakers and industry representatives comes together at Columbia University in New York with the ambition of advancing online personal data transparency through scientific research and design. Last year the conference took place in Boston at MIT, as you can see below:

According to the DTL website, the uncontrollable growth of the internet has outpaced our ability as individuals, societies and states to maintain control of our identity and privacy, meaning that we need to define new guidelines for how our personal data is owned, accessed and used.

The conference, which kicks off tomorrow, has several key objectives:

  • Promote the concept of personal data transparency, giving users the right tools to know who is using their data, how, why and for what.
  • Provide a platform for the research and development of the new tools which allow this.
  • Bring together researchers, regulators, industry leaders, designers, journalists and active players in the area of privacy and data transparency, allowing there to be an interdisciplinary dialogue.
As part of their strategy to achieve these objectives, the DTL awards six grants per year to projects that advance online personal data transparency through scientific research. These projects are at different stages of development, but we are particularly excited about the Facebook Data Valuation Tool (FDVT), developed by a research team at Carlos III University led by Professor Angel Cuevas, which was recently featured in El Confidencial. This video explains a little more about the potential of this tool:

As you can see in the video, this unique tool is an add-on for browsers which runs while you interact with Facebook. The FDVT estimates how much your activity on Facebook (browsing, posting, liking, clicking on adverts, etc.) is worth. 
To calculate this estimate, the FDVT browser extension detects the ads you are shown while you are “inside” Facebook and tallies the value of those advertising impressions throughout the session. Of course, this value grows if you click on an advert within Facebook.
One of the tool’s key findings is that it hints at higher revenue per Facebook user than the usual estimate of roughly €10 per year: if you extrapolate the value generated in even a short span of FDVT usage, the yearly figure comes out much higher.
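As a rough illustration of the kind of estimate involved (the CPM and CPC figures below are placeholders of ours, not the FDVT’s actual pricing model):

# Rough sketch of an ad-value estimate for a Facebook session; the CPM/CPC
# figures are invented placeholders, not the FDVT's real model.
ASSUMED_CPM_EUR = 4.0    # value per 1,000 ad impressions shown
ASSUMED_CPC_EUR = 0.50   # value per ad click

def session_value_eur(impressions, clicks):
    return impressions * ASSUMED_CPM_EUR / 1000 + clicks * ASSUMED_CPC_EUR

# Example: a session with 60 ad impressions and 2 clicks, repeated twice a day for a year
per_session = session_value_eur(impressions=60, clicks=2)
print("Session: {:.2f} EUR, extrapolated yearly: {:.0f} EUR".format(per_session, per_session * 2 * 365))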
Beyond raising users’ awareness of the economy behind their personal data, the tool has other practical uses. We have also been approached by a research group in economics and international taxation, who believe that a tool like this could make it easier to estimate how much tax a company such as Facebook should pay in each country, simply by segmenting FDVT data if it is shared by a substantial number of users in different countries.
Want to find out more about the FDVT? Check out Chema Alonso’s take on his blog, or if you would like to try out the FDVT, you can find a downloadable version here.
To attend this conference, register on the Eventbrite page or keep an eye on the action on Twitter, following the Data Transparency Lab Twitter account.


Big Data Spain 2016: Your chance to join the LUCA team?

AI of Things    15 November, 2016
This week, on the 17th and 18th of November, Big Data Spain 2016 will be taking place, bringing together over 1,000 Big Data experts from all over the world to discuss the latest trends and challenges facing the modern organization when it comes to data.

The event, which takes place in Kinépolis Madrid (Ciudad de la Imagen), has an exciting line up of tech evangelists and our very own Chief Data Officer, Chema Alonso, will be taking part on Friday 18th at 12:10 in Theatre 25, giving a keynote with his insights on the world of Big Data and Cybersecurity – discussing in detail the enabling technology behind these phenomena.  To get an idea of what to expect, check out his keynote at the LUCA launch which took place last month. 
Chema Alonso
Figure 1: Come along and hear Chema Alonso’s keynote on Big Data at #BDS16

Of course, to lead the way in Big Data we need to attract top talent. According to a recent survey by CrowdFlower as part of its 2016 Data Science Report, 83% of respondents said there weren’t enough Data Scientists to go around, up from 79% the year before. There is a big talent gap at the moment and we are going out of our way to bring the best data professionals to our company by promoting a data culture and strategy internally.


For this reason, our recruitment team will be attending to engage with the Big Data community and do some talent spotting for both LUCA and the wider Telefónica organization. We would love budding Data Scientists and Data Engineers to come along and see us at our stand, and, if you think you’ve got what it takes, you can take part in our technical challenge for a chance to win a job interview with our CTO, José Palazon.
LUCA stand
Figure 2: Want to know how to win an interview with our CTO? Come and find out more on our stand. 
So, what is our technical challenge?  In Telefónica, we have to work with a diverse range of data sources on a daily basis.  One of these sources is call traffic, and in this task you will need to process a high volume of data to understand what is happening on our network at all times, analyzing aggregated information to estimate the activity per user, as well as combining our cell catalogue information to uncover which areas have greater call traffic. 
We will have a Jupyter notebook, preconfigured for Python, Scala or even R, with Spark available if you would like to use it, so that you can show us your skills at manipulating huge volumes of data.
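To give a flavour of the kind of processing involved, here is a minimal PySpark sketch; the file paths and column names (user_id, cell_id, area) are assumptions of ours, not the challenge’s actual schema.

# Minimal PySpark sketch of the kind of aggregation the challenge involves.
# File paths and column names are assumptions for illustration only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("call-traffic-challenge").getOrCreate()

calls = spark.read.csv("calls.csv", header=True, inferSchema=True)           # one row per call event
cells = spark.read.csv("cell_catalogue.csv", header=True, inferSchema=True)  # maps cell_id to area

# Estimated activity per user
activity_per_user = calls.groupBy("user_id").agg(F.count("*").alias("calls"))

# Call traffic per area, via the cell catalogue
traffic_per_area = (calls.join(cells, on="cell_id")
                         .groupBy("area")
                         .agg(F.count("*").alias("calls"))
                         .orderBy(F.desc("calls")))

traffic_per_area.show(10)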


Interested in finding out more about working with us? Then come along to our stand at Big Data Spain 2016 and follow us on Twitter to get the latest news and updates.

Chatbots? New? You haven’t met ELIZA

AI of Things    10 November, 2016
By Dr Richard Benjamins, VP for External Positioning and Big Data for Social Good at LUCA.



Artificial Intelligence is a hot topic at the moment. We are definitely living in an AI summer, as opposed to the AI winter of the 1970s, when AI research suffered a decline in interest and funding due to undelivered expectations. Today, AI is back in fashion, and chatbots in particular are at the centre of every analyst’s attention.
Facebook has recently launched a platform for developing chatbots, Google launched Allo, IBM has Watson, and there are of course Siri and Cortana. There are also hundreds of start-ups building their own chatbots such as you can see in this post from Venture Radar.
Chatbots are able to hold conversations with people in a relatively “natural way”. The business promise of chatbots is that they are able to automate human interaction, which is one of the biggest cost factors to organizations (for example, in customer service).
So what’s the history of AI? The first of what is now called a “chatbot” was ELIZA, a computer program written by Joseph Weizenbaum at the MIT AI Lab between 1964 and 1966. ELIZA simulated a Rogerian psychotherapist which people interacted with by typing. ELIZA was able to fool many people into believing that they were speaking with a real person rather than a computer program. This also generated one of the first discussions on passing the Turing Test: building a computer program whose output humans judge as coming from another human. ELIZA has been implemented thousands of times by students of AI courses (including myself), and there are still online implementations available. But how does ELIZA work?
Conversation with Eliza
Figure 1: Example of a conversation with Eliza
Basically, ELIZA is a rule-based system using pattern matching. The program reads the input from the command line and then parses the sentence looking for relevant keywords. When it finds a keyword, it plays back an appropriate answer to the user, often in the form of a new question (the Rogerian approach). And this repeats all the time. When ELIZA cannot make sense of the input, it returns a general answer such as “Tell me more about X” (where X matches a word from the user’s input), or “What do you mean by that?” Moreover, ELIZA has stored several alternative formulations for the same answer, so it doesn’t repeat itself all the time.
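A minimal Python sketch of that keyword-and-template loop might look like the following (the rules are simplified stand-ins, not Weizenbaum’s originals):

# Minimal sketch of ELIZA's keyword-and-template loop (simplified rules).
import random
import re

RULES = {
    "dream": ["What does that dream suggest to you?",
              "Do you dream often?",
              "What persons appear in your dreams?"],
    "mother": ["Tell me more about your family."],
}
FALLBACKS = ["What do you mean by that?", "Please, tell me more."]

def eliza_reply(user_input):
    for keyword, answers in RULES.items():
        if re.search(r"\b%s\b" % keyword, user_input, re.IGNORECASE):
            return random.choice(answers)       # keyword found: play back a template
    return random.choice(FALLBACKS)             # no keyword: generic prompt

print(eliza_reply("I dream about flying"))      # one of the "dream" answers
print(eliza_reply("Hello"))                     # a generic fallback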
The goal of Rogerian therapy is to give clients an opportunity to develop a sense of self and realize how their attitudes, feelings and behaviour are being negatively affected. In this sense, ELIZA just “listens” and plays questions back to the user to encourage them to say more. After all, in today’s society, aren’t many of us simply longing for someone who just listens?
Below, you can see some of the code for when the user inputs something about his or her dreams. The code is written in Prolog, a logic programming language long associated with Artificial Intelligence:
% Rule set for the keyword "dreamt" (priority 4): a decomposition pattern and
% alternative reassembly templates, ending with a link to the "dream" rules.
rules([[dreamt,4],[
  [1,[_,you,dreamt,Y],0,
    [really,',',Y,?],
    [have,you,ever,fantasied,Y,while,you,were,awake,?],
    [have,you,dreamt,Y,before,?],
    [equal,[dream,3]],
    [newkey]]]]).

% Rule set for the keyword "dream" (priority 3): generic follow-up questions.
rules([[dream,3],[
  [1,[_],0,
    [what,does,that,dream,suggest,to,you,?],
    [do,you,dream,often,?],
    [what,persons,appear,in,your,dreams,?],
    [do,you,believe,that,dreaming,has,something,to,do,with,your,problem,?],
    [newkey]]]]).
And if you want to listen to an extended conversation with ELIZA, check out this video:

After ELIZA, many other attempts were made to write computer programs capable of performing human tasks that require intelligence, for example MYCIN for diagnosing bacterial infections such as meningitis and DENDRAL for analyzing organic compounds.
The problem with these early AI systems was that they only had shallow knowledge: either the relevant knowledge was captured in the rule base, or the system didn’t know what to do. This phenomenon was referred to as the “brittleness” of AI systems. AI systems were brittle compared with robust human intelligence: ask a person something at the edge of a certain domain, and he or she will still be able to give a reasonable answer. Computers weren’t able to do the same.
Later attempts tried to deal with this issue partially through the inclusion of so-called “deep knowledge” in the knowledge base. With such knowledge, an AI system was still capable of some reasoning even if the subject was outside the direct scope of the system. A seminal article on this subject was Randall Davis’ “Reasoning from first principles in electronic troubleshooting”, published in 1983, which tried to encode some understanding of how devices work and to refer to that knowledge when solving unknown problems.
Real Artificial Intelligence, however, requires much more and has to include abilities such as Reasoning, Knowledge Representation, Planning, Natural Language Processing, Perception, and General Intelligence. Technology has changed and improved enormously since those early attempts, and new AI tools like Siri and Watson are streets ahead of ELIZA or MYCIN. However, there is still a long way to go for AIs to exhibit real human-like intelligence. We can all keep our jobs in the meantime.

Big Data and Elections: We shine a light on Trump and Clinton

Richard Benjamins    9 November, 2016
Twitter is widely used as a tool to understand and predict phenomena in the real world. Today on our blog, we have been using Twitter to understand the US Presidential Elections of November 8th 2016. There are no conclusive research results on whether it is possible to predict the outcomes of elections using tweets but we decided to investigate.

Election light
Figure 1: Shine a light on Trump or Clinton
In 2010, an article entitled “Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment” was published as part of the Fourth International AAAI Conference on Weblogs and Social Media, concluding that it is indeed possible. This research used tweets from the German federal elections of September 2009 and found that the mere number of messages mentioning a party can reflect the election result. Furthermore, joint mentions of two parties are in line with real-world political ties and coalitions. The research also went on to show that an analysis of the tweets’ political sentiment corresponds closely to the parties’ and politicians’ political positions, indicating that the content of Twitter messages “plausibly reflects the offline political landscape.”
However, another article published one year later at the same Artificial Intelligence conference, “Limits of Electoral Predictions Using Twitter”, found “no correlation between the analysis results and the electoral outcomes, contradicting previous reports”, basing its investigation on tweets from the 2010 US Congressional elections.
A search in Google will show much more research on the feasibility of using Twitter and other social media networks for election predictions.  Additionally, a Google Trends query to check which candidate generates more search activity provides some insight on the matter:
Whatever the conclusion is, there is no doubt that Twitter reflects to some extent what is going on in a country just before important elections. During the two days running up to the elections on the 8th of November 2016, we established a real-time feed from Twitter, filtering on relevant hashtags, handle names and keywords. As our technology, we used Sinfonier, an ElevenPaths cybersecurity product that detects cyberthreats based on real-time information processing; in this case, it automates the capture of tweets in real time (see Figure 3). Sinfonier encapsulates the real-time capabilities of Apache Storm in a very elegant way, so we could build a “digest Twitter-to-MongoDB” pipeline within a few minutes and with almost no code.

Sinfonier Topology
Figure 3:  Sinfonier Topology (data extraction pipeline) designed to capture real-time data from Twitter.
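For readers without access to Sinfonier, a roughly equivalent “Twitter-to-MongoDB” digest can be sketched with the tweepy 3.x streaming API and pymongo; the credentials and filter terms below are placeholders, and this is a plain-Python stand-in rather than the Storm topology shown in Figure 3.

# Rough stand-in for the "digest Twitter-to-MongoDB" pipeline, using the
# tweepy 3.x streaming API and pymongo instead of Sinfonier/Storm.
import tweepy
from pymongo import MongoClient

tweets = MongoClient()["elections"]["tweets"]

class DigestListener(tweepy.StreamListener):
    def on_status(self, status):
        # Keep only the fields needed for counting and visualisation
        tweets.insert_one({"text": status.text,
                           "created_at": status.created_at,
                           "user": status.user.screen_name})

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")   # placeholder credentials
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

stream = tweepy.Stream(auth, DigestListener())
stream.filter(track=["Trump", "Clinton", "@realDonaldTrump", "@HillaryClinton"])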

We then added real-time search and visualization with Elasticsearch and Kibana. Apart from visualizing the tweets in real time in a dashboard (see Figure 4), we wanted to try out something more fun, so we also got some Philips Hue lamps involved.

Real-time tweets
Figure 4: Real-time tweets on Clinton (blue) vs. Trump (red). Trump tweets almost double those of Clinton. Trump is slightly more active in replying and both re-tweet equally all tweets they are mentioned in.
In the Big Data era, where making data-driven decisions is the ultimate goal, there are many situations in which traditional dashboards are unsuitable for conveying the status of real-time KPIs: think of factories, open-floor call centres or large retail centres. It could be very appealing to have “real-time transparent dashboards”, so that lights (visual perception) adapt to the (big) data produced in the real world in a fast, intelligent and pervasive way. The applications are limitless!

In such cases, using dynamic lights could be a good alternative to convey the main insights of dashboards. For instance, in a call center, the light intensity and color could change according to the number and type of calls (complaints, information, products, etc.) from customers.  By connecting lamps to the Internet, we enter the world of the Internet of Things. Hue lamps can be instructed to react to phenomena on the Internet using the “If This Then That” framework. Using IFTTT, you can use the lamps to turn the light on when it is raining in Amsterdam, or when your plane lands safely in another city around the world.

So how did we connect the lamps to the tweets related to the US elections? On this occasion, we did not perform a complex analysis or try to predict the winner, as that would first require asking whether Twitter is the best data stream for the purpose.
Our first step was to design two query sets. In the first query, we count all the tweets related to common words used to refer to Hillary Clinton (e.g. Hillary, Clinton, HillaryClinton) or Donald Trump (e.g. Donald, Trump, DonaldTrump). The results are displayed in Figure 5:

Figure 5: Twitter shows much more activity related to Donald Trump, but the tweet count includes both positive and negative references, so direct interpretation could be misleading.

The second query (Figure 6) is much more specific as we compare tweets containing their handles, @RealDonaldTrump vs @HillaryClinton:

Pattern
Figure 6. The second query reveals a much more balanced pattern with no clear winner.
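Counting both query sets against the stored tweets then comes down to a couple of regular-expression queries. The sketch below reuses the placeholder collection from the capture example above:

# Count each candidate's mentions for the two query sets described above.
from pymongo import MongoClient

tweets = MongoClient()["elections"]["tweets"]

QUERIES = {
    "query_1_keywords": {"clinton": "Hillary|Clinton|HillaryClinton",
                         "trump":   "Donald|Trump|DonaldTrump"},
    "query_2_handles":  {"clinton": "@HillaryClinton",
                         "trump":   "@realDonaldTrump"},
}

for name, patterns in QUERIES.items():
    counts = {candidate: tweets.count_documents({"text": {"$regex": pattern, "$options": "i"}})
              for candidate, pattern in patterns.items()}
    print(name, counts)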

Additionally, we set rules for the lights as follows:

  • Lamp 1 is connected to a Twitter feed based on query 1 (Figure 5), blinking in the colour of the candidate with more tweets in the last 2 seconds and showing a stable colour for the candidate with more tweets in the last hour.
  • Lamp 2 behaves the same as Lamp 1, but it is connected to the second query stream (Figure 6); a sketch of this rule logic follows below.
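A sketch of that rule logic against the Philips Hue bridge’s local REST API might look like the following; the bridge address, API username and the counting helpers feeding it are placeholders of ours.

# Sketch of the lamp rules: blink in the colour of the candidate with more
# tweets in the last 2 seconds, then settle on the colour of the candidate
# with more tweets in the last hour. Bridge IP and username are placeholders.
import time
import requests

BRIDGE = "http://192.168.1.10/api/PLACEHOLDER_USERNAME"
HUE = {"trump": 0, "clinton": 46920}          # red and blue on the Hue colour wheel

def set_lamp(lamp_id, candidate, blink=False):
    state = {"on": True, "hue": HUE[candidate], "sat": 254, "bri": 254}
    if blink:
        state["alert"] = "select"             # single breathe-cycle blink
    requests.put("{}/lights/{}/state".format(BRIDGE, lamp_id), json=state)

def update_lamp(lamp_id, counts_last_2s, counts_last_hour):
    # counts_* are dicts like {"trump": 12, "clinton": 9} from the query sketch above
    set_lamp(lamp_id, max(counts_last_2s, key=counts_last_2s.get), blink=True)
    time.sleep(2)
    set_lamp(lamp_id, max(counts_last_hour, key=counts_last_hour.get))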
On the day of the elections, we used the lamps in our office to engage with Election Day. We observed that the lamps were mostly red, reflecting the fact that Trump had many more mentions than Clinton. However, we saw that many tweets mentioning Trump were actually against him, showing negative sentiment, whilst Hillary Clinton attracted fewer negative tweets.



 
Now, the votes have been cast and the United States of America has decided – but in the midst of global frenzied reaction to Donald Trump’s election, our lamps keep on flickering.

Can Big Data and IOT prevent motorcycle crashes?

AI of Things    8 November, 2016
Most of us are familiar with the dangers involved in riding motorbikes: motorcyclists are 27 times more likely than passenger car occupants to die in a crash per vehicle mile traveled, and almost five times more likely to be injured, according to the US-based III (Insurance Information Institute).

In Europe, the rate of motorcycle fatalities per million inhabitants is extremely high in countries such as Portugal, France and Slovenia, where travelling on two wheels is increasingly popular. In Spain, 247 people died in motorcycle accidents in 2015, up 32% from the year before, highlighting the importance of targeted campaigns and increased safety measures to protect motorcyclists.
At a professional level, motorcycle crashes are also happening more frequently. This year, the MotoGP, Moto2 and Moto3 championships have seen over 1,000 falls, which equates to around 60 per weekend, beating all previous records. We saw this unfortunate tendency only a few weeks ago at the Malaysian MotoGP, where a total of 17 riders fell off their bikes:

According to Marca, many pundits put this down to the new tournament rules, the new standard central electronic system and, above all, the change to the new Michelin tyres. However, could the Internet of Things and Big Data analytics give us more answers in the future? Could such technologies help us to save the lives of motorcyclists on the roads and on the MotoGP circuits? We certainly think so.
In the case of non-professional motorcyclists, connecting motorcycles to make them more intelligent could most definitely help us save lives. As an example, Internet of Things connectivity could have enabled the emergency authorities to find this man, who was trapped for 8 days under his motorcycle in South Georgia, much sooner.
Equally, Big Data is increasingly helping professional motorcyclists and cyclists to optimize their performance.  In fact, IBM have a special section in their Big Data and Analytics hub dedicated to the role of such technologies in cycling. Sky Christopherson spoke specifically about its potential at the 2013 Hadoop Summit here:

The British Cycling team have also underlined the immense value of using Big Data for “marginal gains” in their Olympic strategy, aiming to improve every element of their performance by 1%, ranging from tweaks in diet and sleeping patterns to bike engineering decisions. This data-driven approach to the sport may well be the reason they won 12 medals in the Rio Olympics (more than the US and the Netherlands combined).
One of our employees, Hugo Scagnetti, decided to investigate this data-driven approach in the world of motorcycles by riding a connected motorbike around the world for the first time after suffering severe leg injuries 2 years ago due to osteonecrosis. Hugo, the Global Rider, rode 37,000 kilometres on his Yamaha motorbike around the world to raise money for stem cell regenerative therapy for children.
During and after his trip, we decided to visualize the IOT data created, integrating information from Facebook and photos taken during the trip using analytics to accurately identify stops, which you can see in this roundup video which we showed at the LUCA launch event a few weeks ago:

The potential of data analytics for motorcyclists and cyclists looking to improve safety and optimize performance is huge. Together, the Internet of Things and Big Data provide a unique opportunity to revolutionize both sports as well as saving lives on our roads, and we are excited to see how startups, telcos, technology companies and manufacturers come together to make this happen in the future.

Fighting Fraud: The $3.7 trillion black hole facing today’s organizations

Ana Zamora    4 November, 2016


The global cost of fraud per year is approximately $3.7 trillion according to a 2014 survey, meaning that the average fraud impact per organization is estimated at around 5% of its annual revenue. Whilst many believe that fraud cases tend to be multi-million dollar affairs, in reality the survey revealed that the average loss was $145,000.

The growing number of annual data breaches and hacks of prominent consumer brands such as TalkTalk has generated tension amongst the general public, with users becoming increasingly aware of the risks created by our ever-growing digital footprints. In fact, the Association of Certified Fraud Examiners has started to hold an annual Fraud Awareness Week, which takes place from November 13th to 19th this month.

A wide range of data-driven companies are looking at ways to use their Big Data to help themselves, or others, to reduce the impact of fraud in their organizations using cutting-edge analytical techniques.
At LUCA, our security portfolio has a wide range of B2C and B2B products which look to fight fraud from an end-to-end perspective, working closely with our colleagues at ElevenPaths, our specialist cybersecurity unit within Telefónica.


As we see it, there are many different strategies to detect fraud: compromised points of purchase, payment fraud intelligence, online threat insights and so on. However, we also believe that telco data from network events, operations systems and CRM systems can provide groundbreaking insights to organizations looking to reduce the “black fraud hole” they are facing in their P&L.


To make this happen, we have implemented a quick and smooth integration between our data insights platform and our fraud management solutions, bringing different teams together and merging the worlds of Security and Big Data to deliver new value to our corporate customers with products such as Smart Digits.

Big Data and security
Figure 2: Big Data and Security slide from Chema Alonso’s LUCA launch keynote


Pedro Pablo Pérez, CEO of ElevenPaths, spoke about these very products at the LUCA launch, highlighting key use cases that simulate real situations. We highlighted the benefits for the end consumer, using a clear consent process to mitigate the effects of fraud and fight the increasingly savvy fraudsters of the online world:

Figure 3: Pedro Pablo talking about Fraud detection using Big Data



Want to know more about how we’re bringing Big Data and Security together to fight fraud? Drop us an email here.

By Daniel Torres, Global Product Manager at LUCA.