Can mobile phone traces help shed light on the spread of Zika in Colombia?

AI of Things    27 April, 2018

Content written by Daniela Perrotta, ISI Foundation researcher and UN Global Pulse fellow and Enrique Frias-Martinez, Researcher, Telefonica Research, Madrid. Originally posted on the UN Global Pulse blog.

Nowadays, thanks to the continuous growth of transport infrastructure, millions of people travel around the world every day, giving infectious diseases more opportunities than ever to spread quickly and on a large scale. In this blog, Enrique Frias explains how mobile data was used to map and predict the spread of Zika in Colombia.

A brief history of pandemics…

Between 1918 and 1920, due to the special circumstances of World War I, such as overcrowded camps and hospitals, and soldiers piled into trenches or in transit every day, the Spanish Flu killed between 20 and 100 million people (more than the war itself), making it perhaps the most lethal pandemic in the history of humankind. The question that naturally arises is: what if an equally virulent and deadly virus were to hit today's highly connected world, where most places can be reached in less than a day's journey? This prospect fuels a growing global concern about the next potential pandemic: when and where it might strike, and whether practitioners and scientists are prepared to respond and prevent the disastrous consequences of the widespread transmission of a new disease. The search for an answer, however, does not come without challenges, since the emergence (or re-emergence) of human infectious diseases is continuous, variable and remarkably difficult to predict.

Between 2015 and 2016, the Americas experienced a large-scale outbreak of Zika, which until then had been considered a neglected tropical vector-borne disease causing only local outbreaks since the virus responsible for the infection was first identified in Uganda in 1947. It is only the latest global public health threat to highlight the urgent need for accurate data on human mobility, and for models of mobility processes, in order to assess the spatial spread of infectious diseases in a timely manner and allow rapid interventions and appropriate control measures that reduce the overall impact of disease.

Traditional Methods vs Mobility Data

In developed countries, population movements are traditionally observed by national statistical institutes through costly and non-scalable techniques, such as census surveys, aimed at gathering information on how people usually move on a daily basis. Such datasets may be inadequate, however, due to poor spatial resolution or infrequent updates, which prevents them from capturing the rapid evolution of travel patterns and often limits the potential impact of many studies. Moreover, this information may be partially or completely unavailable in developing countries.

Mobility models help by leveraging the fundamental laws of physics to synthetically infer population movements according to parametric forms, such as the laws of gravity and radiation. However, these require good calibration data and their performance significantly depends on the specific geographical setting and modelling assumptions.

To overcome these limitations, more and more sources of data and innovative techniques are used to detect people’s physical movements over time, such as the digital traces generated by human activities on the Internet (e.g. Twitter, Flickr, Foursquare) or the footprints left by mobile phone users’ activity. In particular, cellular networks implicitly offer a large ensemble of details on human activity, incredibly helpful for capturing mobility patterns and providing a high-level picture of human mobility.

One powerful collaboration…

In this context, a collaborative effort began between Telefonica Research in Madrid (Spain),  the Computational Epidemiology Lab at the ISI Foundation in Turin (Italy) and UN Global Pulse (an innovation initiative of the United Nations). This collaboration is currently investigating the human mobility patterns relevant to the epidemic spread of Zika at a local level, in Colombia, mainly focusing on the potential benefits of harnessing mobile phone data as a proxy for human movements. Mobile phone data is defined as the information contained in call detail records (CDRs) created by telecom operators for billing purposes and summarizing mobile subscribers’ activity, i.e. phone calls, text messages and data connections. Such “digital traces” are continuously collected by telecom providers and thus represent a relatively low-cost and endless source for identifying human movements at an unprecedented scale. 

In this study, more than two billion encrypted, aggregated and anonymized calls made by around seven million mobile phone users in Colombia were used to identify aggregated population movements across the country. To assess the value of the human mobility derived from CDRs, the data is evaluated against more traditional methods: census data, which are considered the reference since they ideally represent the entire population of the country and its mobility features, and mobility models, namely the gravity model and the radiation model, which are the most commonly used today. In particular, the gravity model assumes that the number of trips increases with population size and decreases with distance, whereas the radiation model assumes that mobility depends on population density.
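To make the two baselines concrete, here is a rough sketch in Python of how gravity-model and radiation-model flows could be computed between a handful of hypothetical departments (the populations, coordinates, exponents and friction function below are illustrative assumptions, not the parameters calibrated in the study):

import itertools
import math

# Hypothetical department populations (in thousands) and coordinates.
departments = {
    "A": {"pop": 8000, "xy": (0.0, 0.0)},
    "B": {"pop": 2500, "xy": (3.0, 1.0)},
    "C": {"pop": 500,  "xy": (1.0, 4.0)},
}

def distance(i, j):
    (x1, y1), (x2, y2) = departments[i]["xy"], departments[j]["xy"]
    return math.hypot(x2 - x1, y2 - y1)

def gravity_flow(i, j, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    # Trips grow with both populations and decay with distance.
    pi, pj = departments[i]["pop"], departments[j]["pop"]
    return k * (pi ** alpha) * (pj ** beta) / (distance(i, j) ** gamma)

def radiation_flow(i, j, trips_out=1000):
    # s: population living closer to i than j is (excluding i and j).
    pi, pj = departments[i]["pop"], departments[j]["pop"]
    d = distance(i, j)
    s = sum(departments[k]["pop"] for k in departments
            if k not in (i, j) and distance(i, k) < d)
    return trips_out * (pi * pj) / ((pi + s) * (pi + pj + s))

for i, j in itertools.permutations(departments, 2):
    print(f"{i}->{j}: gravity={gravity_flow(i, j):11.1f}  radiation={radiation_flow(i, j):7.1f}")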

What does the analysis involve?

The first step is to reconstruct, for each method, a mobility network describing the flows of people travelling daily among the departments of Colombia. It is worth noting that finer geographical resolutions could in principle be used: CDR data can potentially go down to the level of individual mobile phone towers, but census data are generally aggregated at a coarser spatial resolution, which also limits the use of mobility models.
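As a purely illustrative sketch of the aggregation step, the following Python snippet turns a toy set of anonymized CDR-like records into a department-level origin-destination matrix (the record format, the tower-to-department mapping and the "most active department per day" heuristic are assumptions made here for clarity, not a description of the actual pipeline):

from collections import Counter, defaultdict

# Each (already anonymized) record: (user_id, day, tower_id); towers map to departments.
tower_to_department = {"T1": "Antioquia", "T2": "Antioquia", "T3": "Cundinamarca"}
records = [
    ("u1", "2015-10-01", "T1"), ("u1", "2015-10-01", "T3"),
    ("u2", "2015-10-01", "T2"), ("u2", "2015-10-02", "T3"),
]

def daily_department(towers):
    # Assign each user-day to the department where most activity happened.
    counts = Counter(tower_to_department[t] for t in towers)
    return counts.most_common(1)[0][0]

# Group records by user and day, then count transitions between consecutive days.
by_user_day = defaultdict(list)
for user, day, tower in records:
    by_user_day[(user, day)].append(tower)

locations = defaultdict(dict)   # user -> day -> department
for (user, day), towers in by_user_day.items():
    locations[user][day] = daily_department(towers)

od_matrix = Counter()
for user, days in locations.items():
    ordered = [days[d] for d in sorted(days)]
    for origin, destination in zip(ordered, ordered[1:]):
        if origin != destination:
            od_matrix[(origin, destination)] += 1

print(dict(od_matrix))  # e.g. {('Antioquia', 'Cundinamarca'): 1}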

Each mobility network is statistically analysed in terms of its structural and topological properties and carefully compared with the mobility network generated by the census data in order to evaluate the main similarities and differences. On average, trips are concentrated in the western part of the country, including some connections to the Archipelago of San Andres, Providencia y Santa Catalina and a few links to the south, reflecting the spatial distribution of the approximately 47 million people who live in Colombia. From the network point of view, the gravity model is not able to reproduce the mobility observed in the census data, as flows are strongly underestimated. On the other hand, the mobility produced by the radiation model and by the mobile phone data showed comparable performance, with high correlations and similarities with the census data, thus successfully representing mobility among departments in Colombia.

The next step is to assess the predictive power of each mobility network when applied to the study of Zika. Colombia was the second country, after Brazil, to experience a large-scale Zika outbreak in Latin America, with over 100 thousand cases reported between October 2015 and July 2016, of which about 9% were laboratory confirmed. However, the burden of the epidemic may have been strongly underestimated because of several reporting issues, mainly the clinical similarity of the mild symptoms associated with Zika infection to other diseases, asymptomatic cases, limited sentinel sites and medically unattended cases.

Figure 2: Schematic representation of the integration of mobility patterns (A) and data layers (B) into a modelling approach capable of simulating the epidemic spread of Zika in order to analyse different epidemic scenarios (C) 

In this study, a metapopulation modelling approach is adopted in order to explicitly simulate the epidemic spread of the disease, governed by the transmission dynamics of the Zika virus through human-mosquito interactions and promoted by population movements across the country. This approach represents the population as a set of "subpopulations" corresponding to defined geographical units (i.e. departments) connected by mobility flows, within which the infection dynamics unfold according to a compartmental classification of individuals based on the various stages of the disease. Given the same modelling settings (i.e. initial conditions and parameters), this approach allows us to perform numerical simulations of the spatio-temporal evolution of the epidemic by integrating one mobility network at a time. One can then ultimately assess their predictive power by comparing the simulated epidemic profiles with the Zika case data officially reported by the Instituto Nacional de Salud (INS) in Colombia.
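A heavily simplified sketch of a metapopulation simulation of this kind is shown below: a discrete-time SIR-like model runs inside each department and a mobility matrix redistributes individuals between departments at every step (human compartments only; the human-mosquito transmission dynamics, seasonality and all numerical values are omitted or invented, so this is only the skeleton of the approach described above):

import numpy as np

# Two hypothetical departments; populations and parameters are placeholders.
N = np.array([5_000_000.0, 1_000_000.0])
beta, gamma = 0.3, 0.1                      # effective transmission and recovery rates
# mobility[i, j]: fraction of department i's population travelling to j each day.
mobility = np.array([[0.99, 0.01],
                     [0.02, 0.98]])

S, I, R = N.copy(), np.array([0.0, 10.0]), np.zeros(2)

for day in range(200):
    # Local infection dynamics (discrete-time SIR in each department).
    new_inf = beta * S * I / N
    new_rec = gamma * I
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    # Redistribute individuals between departments according to the mobility matrix.
    S, I, R = mobility.T @ S, mobility.T @ I, mobility.T @ R
    N = S + I + R
    if day % 50 == 0:
        print(f"day {day:3d}  infectious per department: {np.round(I).astype(int)}")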

However, this is not an easy task since, in the case of vector-borne diseases like Zika, several factors beyond human mobility contribute to the epidemic spread of the infection. In fact, the local environment and climate are key factors regulating the spatial and seasonal variability of the presence of the Aedes mosquitoes that are primarily responsible for transmitting the Zika virus. For example, the country's capital, Bogotá, is not at risk of autochthonous Zika virus transmission because it sits at an average altitude of 2,640 metres above sea level with an average monthly air temperature of 18°C, conditions that do not favour the presence of mosquitoes. Indeed, no confirmed cases of Zika have been reported in Bogotá.

Looking forward…

Therefore, further modelling efforts are needed to account for all the ingredients necessary to provide a more realistic representation of the epidemic progression. Following the methodology adopted in recent state-of-the-art approaches to Zika modelling, the model would integrate detailed data on the spatial heterogeneity of mosquito abundance and the consequent exposure of the population to the disease, as well as detailed data on the population and on the Zika cases. This approach would allow the exploration of different epidemic scenarios and the comparison of the epidemic outcomes obtained by integrating the various human mobility patterns, in order to finally identify the potential impact of using mobility derived from mobile phones to inform epidemic models and help public health authorities plan timely interventions.


In search of improved cryptocurrency privacy with Dash, Zcash and Monero

ElevenPaths    24 April, 2018
When we talk about cryptocurrencies we often come across the belief that their use is completely anonymous. However, anyone who has investigated them a little (it is impossible to know all of the ones that exist) will know that this is not necessarily the case, given that many operations are perfectly traceable in the corresponding blockchains.

Thus, if we come across Bitcoin or Litecoin addresses in an alleged criminal activity, we can trace the operations in which they have been involved, navigating forwards or backwards in time through the blockchain. We should also know the internal history of the cryptocurrency, because if a hard fork has occurred the same coins may be spent on different blockchains under different rules. An example of this is the investigation we published a few weeks ago about the WannaCry addresses, tracking the clues through both the Bitcoin and Bitcoin Cash blockchains.

So what should we do if, during the course of an investigation, we come across a cryptocurrency that is not on our radar or that we do not know? Most of the time the first step will be a Google search, but the Coinmarketcap.com project can be used as a first reference: in addition to the average exchange rate, it lists the official websites of each project and some explorers for each cryptocurrency's blockchain.

Figure 1. Information provided by coinmarketcap about Bitcoin Cash


Why so much insistence on a layer of anonymity?

The question seems logical at first: where does the notion of anonymity come from if the transactions are traceable? The answer lies in the fact that, despite the possibility of linking operations to addresses through the blockchain, associating each address with a person or an organisation is genuinely complex, unless the owner reveals the association by mistake, omission or some type of exchange.

Even so, it is advisable to generate different addresses, following the recommendations of the Bitcoin community. That way, the person we pay will not know the total amount of money we hold when we make a payment, as could happen if we kept our entire fortune at a single address. After all, when we pay a business in cash we do not have to tell the person we pay how much money we have in the bank; we simply hand over the right amount to cover the purchase. It is precisely in this ability to generate numerous addresses that Bitcoin's anonymity resides, and it is what gave rise to services called mixers, dedicated exclusively to carrying out this operation. If a balance starts to circulate among dozens of addresses, it is difficult for an external observer, relying only on the information in the blockchain, to establish where the money has changed hands.

Dash, Zcash and Monero
To address the limited anonymity provided by Bitcoin, different projects such as Dash, Zcash and Monero have emerged.

Dash emerged in 2014 and is based on the Bitcoin source code. It relies on two types of nodes: ordinary nodes (miners) and masternodes (in charge of governance and extra features such as instant and anonymous sends). The latter are special nodes tasked with executing certain functions specific to the cryptocurrency:

  • PrivateSend. Unlike Monero, anonymity in Dash is optional. Transactions are routed through the masternodes in a similar fashion to a mixer. The maximum amount that can be sent by this method is 1,000 DASH.
  • InstantSend. In other cryptocurrencies it is necessary to wait for a transaction to be added to a block; this functionality speeds up the process when there is consensus among the masternodes.

To run a masternode and have the right to vote on proposals, you must hold 1,000 DASH (around 300K USD in April 2018). The reward for each block is split 45% to the miners, 45% to the masternodes, and 10% to finance the project (new developments or commercial actions).
Zcash appeared in 2016 as a fork of the Bitcoin source code, with a focus on privacy. Users can (optionally) use zk-SNARKs to mask the sender, receiver and amount of a transaction. However, only 3.5% of the coins are held in private (shielded) addresses, and in one study J. Quesnelle managed to associate 31% of the operations involving these transactions with their owners.
Monero: we have already written on this blog about the privacy implemented by Monero. In a nutshell, Monero does not reveal the sender or receiver of a transaction, using ring signatures that mix the transactions of different users at the protocol level. The implementation of Ring Confidential Transactions in January 2017 also added the possibility of hiding the amounts of the operations.

Following the trail of clues
Even in the absence of additional information about the origin of the transactions, explorers of cryptocurrency blockchains such as Monero's still handle even juicier information. To begin with, if a user looks up a txid in an explorer, that explorer can already associate an IP address with the transaction. If, as in the case of Monero, the explorer receives not just a txid lookup but a request to verify a payment to a Monero address, the platform gains the ability to know the balance received by that account, and not only for that transaction but, after rescanning the blockchain, for the rest of its operations as well. For this reason, it is advisable to verify balances on local nodes and not to depend on the reputation of third parties when checking received funds.

Figure 2: Example of a Monero explorer.

Even so, you must not lose sight of the fact that simply connecting to the network nodes of these cryptocurrencies can be an indicator of their use. For this reason, and because transaction anonymity alone is not sufficient to produce a completely anonymous payment, the main developers of the Monero project are also working on a project intended to improve privacy at the network level. Based on I2P and known as Kovri (Esperanto for "to cover" or "to hide"), the project is still in development, but it aims to fill the existing gap at the network level by anonymizing the source of connections.

Félix Brezo
Innovation and Laboratory Team ElevenPaths
Yaiza Rubio
Innovation and Laboratory Team ElevenPaths

An introduction to Machine Learning: “What are Insights?”

AI of Things    23 April, 2018
Content originally written in Spanish by Paloma Recuero de los Santos, LUCA Brand Awareness.


Within LUCA, we often talk of “Insights“. Our tools are designed to obtain valuable “Insights”, or “Actionable Insights” that allow a company to make better decisions based on data. But, what exactly does the word “Insight” mean?

Figure 1: A spiral representing an Insight as a discovery.

    

What is an insight?

The Collins dictionary gives the following definition:
insight
/ˈɪnˌsaɪt/ noun
1. The ability to perceive clearly or deeply; penetration
2. A penetrating and often sudden understanding, as of a complex situation or problem
As we can see, both definitions talk about perceiving something clearly, about understanding, and both hint at a "complex" situation or problem.

By looking at the etymology of the word, we get another interesting definition:

Word Origin and History for insight
c.1200, innsihht, "sight with the eyes of the mind; mental vision, understanding," from in + sight. Sense shifted to "penetrating understanding into character or hidden nature" (1580s).

The phrase "sight with the eyes of the mind" describes vision based on understanding, where expert eyes are able to see crucial things hidden in data, such as behaviours, trends and anomalous events. Artificial Intelligence, and crucially Machine Learning, allow us to apply such a "vision" in order to obtain a deep understanding of volumes of data that would be impossible to analyze manually. Such technologies make it possible to compare the data, order it, put it in context and convert it into "Insights". These Insights can then be translated into concrete actions that form the foundations of a business strategy. It is no longer enough to justify decisions with data; instead, strategy should be built on such Insights.

Figure 2: The process of turning data into Insights.

 
Therefore, "Data-Driven" companies are those that speak the language of data and are thus capable of making more intelligent decisions, based on their own data as well as on other sources that are freely available.

One of the challenges for non-English speaking countries is finding a word that conveys the meaning of "Insight". In Spain, for example, the word "clave" (key) can be used, but it does not quite encapsulate everything that "Insight" does. As such, the English word is often used. This is a fairly common occurrence; technical terms such as Big Data, Machine Learning and Data Science are often kept in the original language.

AMSI, one step further from Windows malware detection

ElevenPaths    23 April, 2018
In the beginning there were viruses: pieces of assembly code that attached themselves to files and modified their entry point. The technique was then twisted and improved as far as possible: malware gained automatic execution, self-replication and independence from its "host" (it has been standalone for some time now), and it also tried to stay under the antivirus radar. "Touching the hard disk" was both the premise (how else to infect it?) and the malware's anathema: the less it touched the disk, the better its chances of escaping the detectors. This gave rise to the technique called "fileless", an ethereal formula for surviving in memory for as long as possible without touching the disk (or touching it as little as possible), so as never to land on the ground the antivirus firmly controls. Fileless malware has been perfected to such an extent (are you familiar with the samples that combine macros and PowerShell?) that Windows already includes a native mechanism to mitigate it as far as possible. Yet it is not getting the attention it should.

The basic AMSI structure, provided by Microsoft

The Antimalware Scan Interface (AMSI) essentially seeks to solve a long-standing problem in the antivirus industry: detecting what never "touches the hard disk". Introduced in Windows 10, it establishes a native channel in the operating system through which content can be "passed through an antivirus" without the need to hook the disk, I/O calls, etc. Any stream can be passed through this detector, even heavily obfuscated scripts that evaluate and reconstruct their malicious payload in memory. AMSI comes as standard in Windows and provides a facility so that:

  • Any programmer can ask it to analyze the input streams their program finds in memory.
  • Anyone can analyze those streams by registering as an analysis provider (such as an antivirus).

In a world where systems are still infected thanks to blatant "fileless" obfuscation tricks, this tool is more than necessary.

How can you use it?

AMSI follows a provider model through which information streams can be passed. The provider is the party that receives the stream; this role should be fulfilled by anything capable of analyzing the stream as if it were a file, typically an antivirus. At the same time, AMSI makes it easy for any developer to request that a given "input" of their code be passed through AMSI: for example, a PowerShell script (something Windows already does) or JavaScript code in a browser. Would it not be interesting if Chrome or Firefox passed a piece of JavaScript through AMSI before running it, without it ever touching a temporary folder? In theory it is possible; the only problem is that nobody does it natively yet. So who does use it? Fundamentally, the Microsoft scripting tools, and among them one of the most necessary: PowerShell. Since around 2015, when PowerShell-based malware became popular, a move in this direction was seen as necessary. With AMSI, the interpreter first sends the stream to be analyzed by the corresponding provider. Since the default provider is Windows Defender itself, the detection capacity is what it is; but if a third-party antivirus registered as an AMSI provider, it could evaluate code that never touches the hard drive, since it arrives simply as an information "stream".
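As a rough illustration of the client side, the following sketch (Python with ctypes, Windows 10 only) passes an in-memory string through AMSI and checks whether the registered provider, Windows Defender by default, flags it; the application name, content and content label are arbitrary placeholders:

import ctypes
from ctypes import wintypes

amsi = ctypes.WinDLL("amsi")
AMSI_RESULT_DETECTED = 32768          # results at or above this value mean "malware"

amsi.AmsiInitialize.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(ctypes.c_void_p)]
amsi.AmsiOpenSession.argtypes = [ctypes.c_void_p, ctypes.POINTER(ctypes.c_void_p)]
amsi.AmsiScanString.argtypes = [ctypes.c_void_p, wintypes.LPCWSTR, wintypes.LPCWSTR,
                                ctypes.c_void_p, ctypes.POINTER(ctypes.c_int)]

context, session, result = ctypes.c_void_p(), ctypes.c_void_p(), ctypes.c_int()
amsi.AmsiInitialize("DemoAmsiClient", ctypes.byref(context))      # register as a client
amsi.AmsiOpenSession(context, ctypes.byref(session))

content = "Write-Host 'content to be inspected by the provider'"  # any in-memory stream
amsi.AmsiScanString(context, content, "demo-stream", session, ctypes.byref(result))
verdict = "malicious" if result.value >= AMSI_RESULT_DETECTED else "clean"
print(f"Provider verdict: {verdict} (AMSI_RESULT={result.value})")

amsi.AmsiCloseSession(context, session)
amsi.AmsiUninitialize(context)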

AMSI in Powershell (source available here)

Among the most attractive possibilities (beyond Microsoft's own scripting software, which already uses it), we can imagine a world in which browser content could natively be evaluated, as a "stream", by a traditional signature-based antimalware engine. The antivirus could reach much further, and would not even have to worry about inserting itself into the traffic, waiting for content to be written to disk, detecting "miners" on the fly, etc. Ultimately, AMSI allows an antivirus that supports it (for now, as far as we know, few appear to do so: AVG/Avast, Dr.Web, ESET…) to hook into the interpretation of scripts, which we assume would improve or facilitate its detection capacity.

A practical note

It is very simple: imagine the PowerShell line shown in the image below, which remotely downloads an instance of Mimikatz (software commonly used to dump credentials from memory and detected by many antimalware systems). A download and execution through PowerShell, without the payload really being stored on disk, ends up being detected by Windows Defender itself; in fact it is Windows Defender, through AMSI, that interrupts it in this case.

AMSI in action when it invokes a ‘downloaded’ script

By contrast, if we actually download the Mimikatz .ps1 file to the hard disk, it is Windows Defender that detects it directly. The logs make it very clear: on the left, the event generated by the first invocation through WebClient; on the right, the event generated when the file is saved to disk.

On the left, it is detected with the intent of downloading. On the right, it is detected when it is stored as a file.

This is all very well, but can it be avoided?

Of course, like everything else, it can be avoided, in the sense that content simply does not pass through it or that it is somehow disabled. We are not talking about evading detection itself, given that AMSI does not detect anything; it is a simple channel or messenger. The following list may not include every known technique for deactivating or evading it, but it covers most of them.

  • Using COM hijacking: this technique was published in mid-2017 and has already been fixed by Microsoft. AMSI looked up a COM object (CLSID) first in HKCU (where it did not exist) and then in HKLM, where it actually was. By creating that CLSID under HKCU first, the successful initialization of AMSI could be prevented.
  • Using a null byte: published in February 2018 and already fixed by Microsoft. It consisted of adding a classic null byte before the string that would otherwise be detected as malicious. AMSI used strncpy to copy the buffer to be scanned, stopping when it found a 0, a flaw that is rather inexcusable nowadays. More information can be found here.
  • Memory patching: also published in February this year and already fixed by Microsoft. It consisted of patching the AmsiScanBuffer() function in memory from PowerShell before the malicious script is executed. More information about this technique is available from Black Hat Asia.

There are other ways to avoid it, such as the ones shown here, as well as examples where it is disabled in some way or where the stream evades detection. Several examples are given in this talk.

The list of functions exported by amsi.dll, which is poorly documented.

Conclusion

AMSI, along with other security measures embedded as standard in Windows 10, may be implemented better or worse, be more or less usable, be liked a little or a lot, or turn out to be useful or not. However, these security measures (now grouped together in Windows 10 under the umbrella of Device Guard, alongside its EMET-like mitigations, the antivirus, CFG, the firewall, the old AppLocker…) are there by default; that is a fact, and it is necessary to know them well in order to take advantage of them and improve overall security. AMSI is one of the most interesting of these tools, and at the same time one of the least well known.

Sergio de los Santos
Innovation and Laboratory Team at ElevenPaths
[email protected]
@ssantosv


Sheila Berta
Innovation and Laboratory Team at ElevenPaths
[email protected]
@unapibageek

Data and Human Resources: a close relationship

AI of Things    20 April, 2018
We live in an era of burnout, information overload and a battle for work-life balance. More and more, we can see the evolution of Human Resources (HR) into not only recruiting, but also well-being. The question is how HR departments and employees can make the best out of technology and data to make time spent at work (which we all know is a lot) much better. 
Companies like Google are highly aware of the importance of their people and the culture they come into every day. This change in HR has opened the eyes of many, especially key stakeholders in companies, to why people should remain a focus and why investment and constant innovation in this area is relevant. Laszlo Bock, former Senior VP of People Operations and author of Work Rules!, once said, "We also use a lot of metrics and numbers to track how things are going. Our goal is to innovate as much on the people side as we do on the product side." Nowadays, Bock leads Humu, a company he started that uses ML and science to make work life better.

Figure 1. Did you know most people spend around 2,000 hours at work per year?

Here is where data usage comes in. Data, as we know, helps back up decisions and becomes the foundation and reason for choosing one path over another. By making data-informed choices, the information speaks for itself and the negotiation process is much shorter: less back-and-forth, more information. To adopt this view of people management, the first step is clearly the initial desire to become data driven when it comes to HR. A company will need to invest not only money but also time and training to execute this vision with transparency and the correct tools.
There is a lot of ground to cover on this topic; in this post, however, we will focus on how Artificial Intelligence and Machine Learning help with areas such as unwanted situations, frequently asked questions, employee benefits and turnover.
After a series of highly publicized events during the past year, harassment in the workplace has become a widely covered topic, but still many fear that speaking out will cost them their job or exacerbate the situation. SPOT began as a product to disrupt how workplaces handle uncomfortable situations. Through AI and analysis done on its servers, SPOT interacts with users and allows those who have experienced harassment to talk about it, explain what happened and say whether they have any witnesses. SPOT then stores the information for 30 days and, if the user wants a record of the incident, compiles the responses from the chat into a document that could potentially be sent to HR.
Recording negative experiences is not the only thing chatbots can do, though. Companies are using chatbots to respond to employee FAQs. By analyzing the questions and having the information in real time, an ongoing problem can easily be detected. For example, if several employees mention they don't know when they have holidays, a communication issue can be detected and immediately solved.
When it comes to collecting data on satisfaction at work, management style or general feedback, the first thing that comes to mind is a survey. One way to optimize this is to involve ML algorithms, which identify patterns in the responses and can generate insights much faster. If an employee is extremely unhappy, HR has time to reach out and get a better picture of the situation before the person quits. One company focusing on gathering this data is Glint. Their goal is to increase employee engagement with brief surveys, give managers and HR teams a better idea of where to focus their energy and what is truly important for a team or an individual, and provide guidance to take action.
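As a toy illustration of this kind of pattern mining, the sketch below (using scikit-learn; the survey answers are invented and this is not how any particular vendor's product works) groups free-text survey responses so that HR could review the more negative cluster first:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = [
    "I love my team and the flexibility we have",
    "Great projects, supportive manager, good balance",
    "I am exhausted, workload is unmanageable and nobody listens",
    "Too much overtime, I am thinking about leaving",
    "Happy with the new office and the training budget",
]

# Represent each response as a TF-IDF vector and group the responses into two clusters.
vectors = TfidfVectorizer(stop_words="english").fit_transform(responses)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, text in zip(labels, responses):
    print(f"cluster {label}: {text}")
# HR can then start by reading the smaller or more negative-sounding cluster.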
All of this being said, AI and ML should not replace the human factor in HR, but work hand in hand with recruiters and managers to save time, produce useful models and allow for mechanical tasks like payroll and reading through resumes to become much more fluid. The outcome of this is more time dedicated to personal contact, more time for hands on problem solving, and a happier and healthier work environment for everyone. 

First results of the OOH advertising project in Brazil

AI of Things    19 April, 2018

On the 26th March, Clear Channel, JCDecaux and Otima presented the results of the OOH map campaigns in Rio de Janeiro and São Paulo that were carried out alongside LUCA. The initiative began in April 2017 with the aim of testing the effectiveness of outdoor advertising with solutions such as LUCA OOH Audience. It involved combining a large variety of Telefonica’s data which is capable of delivering a more in-depth view of each location.

Testing the effectiveness of Out-Of-Home advertising is one of the key objectives of large companies in the advertising and marketing sectors. With this common goal, Clear Channel, JCDecaux and Otima came together to create the OOH Map, the project that offers tools and metrics that allow agencies and advertisers to plan and evaluate their campaigns. LUCA uses advanced data analytics to offer such companies regular and wide-reaching data.

Now, after launching the project in Brazil during the second half of 2017, the first results of the project have been published. There was an initial investment of 2 million Reais (R$) in Rio de Janeiro and São Paulo and the project relied on the expertise of LUCA, Ipsos Brazil, Ipsos UK, MGE Data and Logit.

A very reliable study of the data was carried out thanks to over 17 thousand interviews and the analysis of over 80 thousand journeys. Sergio Viriato, owner of Conmark, suggests that: 

“According to the registered data, people in São Paulo are impacted by outdoor advertising an average of 86 times per week, and in Rio de Janeiro this figure rises to 173. This translates into total weekly interactions of 490 million and 495 million respectively.”

It is an advertising model with an increasing presence in large cities due to its large impact. Based on these first results, during a week-long campaign with 300 installations in each city, the reach in São Paulo was 29% among people aged 15 and over, and 38% in the case of Rio de Janeiro. This translates into an average frequency of 5.1 and 10.1 impacts per person respectively.
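These metrics are tied together by the usual media arithmetic: total impacts are roughly the people reached (target population times reach) multiplied by the average frequency. A quick back-of-the-envelope check, using placeholder target-population figures that are not from the study, would look like this:

# Rough placeholder target populations (people aged 15+); not figures from the study.
campaigns = {
    "Sao Paulo":      {"population_15_plus": 10_000_000, "reach": 0.29, "frequency": 5.1},
    "Rio de Janeiro": {"population_15_plus":  5_500_000, "reach": 0.38, "frequency": 10.1},
}

for city, c in campaigns.items():
    reached = c["population_15_plus"] * c["reach"]      # people reached at least once
    impacts = reached * c["frequency"]                   # total weekly impacts
    print(f"{city}: ~{reached:,.0f} people reached, ~{impacts:,.0f} weekly impacts")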

“If outdoor advertising were a TV program, it would be one of the 5 most popular shows based on audience alone”

Sergio Viriato, coordinator of the project, assures that the good results can boost future investment in the sector, as is already being seen in Japan, France and the UK. The OOH project will be freely available to all those who register their email on the project’s platform. Ipsos will carry out simulations of the project with the registered agencies and advertisers, and the organizers of the project will put on a road show in order to present the conclusions of the study.

For the next applications of the software, the project will use the anonymous information of 410 million journeys identified through mobile data. By the end of 2018, the number of installations will increase, including some on public transport. Once adapted to the methodology, the project will also be applied to the areas around Rio de Janeiro and São Paulo. The study follows Esomar guidelines, was overseen by a committee formed by Ipsos Brazil, Ipsos UK, MGE Data and Logit, and was financed by the founding companies.

#CyberSecurityPulse: From the bug bounties (traditional) to the data abuse bounties

ElevenPaths    19 April, 2018

The Internet giants are going to great lengths to be transparent about the information they gather from their users. In the case of Facebook, they pay millions of dollars every year to researchers and bug hunters to detect security flaws in their products and infrastructure, in order to minimize the risk of targeted attacks. However, after the Cambridge Analytica scandal, the company has launched a new type of bug bounty to compensate those who report "data abuse" on their platform. Through the new "Data Abuse Bounty" program, Facebook will ask third parties to help it find application developers who are misusing its data. "Certain actors can maliciously gather and abuse Facebook users' data even when security vulnerabilities do not exist. This program is intended to protect us against that abuse", according to the company's announcement.

This program is the first of its kind in the industry, focusing on the misuse of users' data by application developers. Reports submitted to Facebook should involve at least 10,000 Facebook users and explain not only how the data was collected but also how it was abused, and the problem must not have been previously known through other means. In addition, Facebook has provided a platform where it offers users all of the information it has been collecting about them; measures which are undoubtedly necessary at a time when many people distrust the internet giants.

More information available at Facebook

Highlighted news

Russia wants to block Telegram after being denied its encryption keys

The Russian media and internet regulator has asked a court to block the Telegram encrypted messaging application after the company refused to hand its encryption keys over to the state authorities. The regulator, known as Roskomnadzor, filed the suit in a Moscow district court. The suit, which has not yet been ruled on, contains a "request to restrict access to the information services in the Russian territory" of the application, it said in a statement. In other words, the government wants to block the application so that it does not work in the country. The suit comes after the Russian state security service, the FSB (formerly known as the KGB), demanded that the Dubai-based application developer hand over its encryption keys, a demand that Russia claims is legal. The entrepreneur and founder of the company, Pavel Durov, refused to do so, and so the Russian government took Telegram to court.

More information available at the ZDNet

The UK's GCHQ director has confirmed a major cyberattack against the Islamic State


According to the head of GCHQ, the attack was launched in collaboration with the UK Ministry of Defence and has disrupted Islamic State operations. British intelligence believes this is the first time it has "systematically and persistently degraded an opponent's online efforts as part of a wider military campaign". Fleming explained that UK cyber experts have taken action to disrupt the online activities and networks of the Islamic State and to discourage individuals or groups. "These operations have made a significant contribution to the coalition's efforts to suppress Daesh propaganda, they have obstructed their ability to coordinate attacks and have protected the coalition forces on the battlefield", the head of GCHQ told the audience at the conference in Manchester.

More information available at Security Affairs

News from the rest of the week

Microsoft adds anti-ransomware protection and recovery tools to Office 365

Microsoft has launched a series of new tools to protect its Office 365 Home and Office 365 Personal customers from a wide range of cyber-threats, including ransomware. Kirk Koenigsbauer, Microsoft Office Corporate Vice President, said that subscribers to these two Office suites will receive additional measures to protect against ransomware and email-based threats, greater password protection and advanced link checking in Office products.

More information available at SC Magazine

A bug in Microsoft Outlook allows Windows passwords to be stolen easily

The Microsoft Outlook vulnerability (CVE-2018-0950) could allow attackers to steal confidential information, including the user's Windows login credentials, simply by convincing the victim to preview an email with Microsoft Outlook, with no additional interaction required from the user. The vulnerability lies in the way Microsoft Outlook renders remotely hosted OLE content when previewing an RTF (rich text format) email, automatically initiating SMB connections.

More information available at CMU

Your Windows system could be compromised just by visiting a website

Microsoft has patched five critical vulnerabilities in the Windows Graphics Component that stem from improper handling of embedded fonts by the Windows font library and affect all versions of the Windows operating system to date. An attacker can trick a user into opening a malicious file or a website specifically crafted with a malicious font which, if opened in a web browser, would hand control of the affected system to the attacker.

More information available at The Hacker News

Other news

Threat actors search for the Drupalgeddon2 vulnerability

More information available at Security Affairs

3.3 million dollars stolen from Coinsecure's main wallet

More information available at Security Affairs

New code injection technique named Early Bird used by APT33 to avoid detection by antimalware tools

More information available at Security Affairs

AI is invading the world of entertainment

AI of Things    16 April, 2018
Previously in our blog, we have seen how Artificial Intelligence (AI) can help make societies safer and even, in the not too distant future, save lives. In this post, we're going to explore some of the many ways in which AI is entering the world of entertainment, from designing movie trailers to reducing buffering times when streaming. Let's get started!


Technology related to artificial intelligence is getting ever smarter, and we are seeing its power in an increasing number of areas of our lives. Two of the key capabilities of an artificial intelligence are the ability to understand language (called Natural Language Processing, or NLP) and to detect images; abilities that are developed using Machine Learning algorithms. These concepts are important to bear in mind and will be referred to throughout this post.


Figure 1: AI was used at the recent Masters Tournament to show the key replays.

 
One of the key names in this field is Watson, IBM’s “AI platform for professionals”. In a short space of time, this platform has provided services for some major sporting events, including the US Open tennis and the recent golf Masters. One of the advantages of watching sporting events on TV is that you can see highlights and replays of the action. Watson was used to select which highlights should be shown and did so by combining audio cues such as cheers as well as detecting fist pumps and other celebrations. Similar image detection was recently employed by the AI at the 60th Annual Grammy Awards. Prior to the awards show, Watson analyzed close to 125,000 photos as well as hours of video from the red carpet. It then used facial recognition to create online content featuring the best shots, and was even able to detect the stars’ emotions to help inform its decisions.
As mentioned in a previous post, Netflix uses Big Data and Machine Learning in much of what it does, in particular when offering suggestions to its users. Perhaps a lesser-known application, and one that is based on artificial intelligence, is the site's Dynamic Optimizer. This relates to video compression, the process of encoding a video so that it requires less storage space. Historically, video content has been compressed uniformly, but this does not take advantage of the fact that less complex video can be compressed more without visibly losing quality. The Dynamic Optimizer uses AI to analyze content shot by shot, which leads to greater total compression and reduces buffering times.
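The idea behind per-shot optimization can be sketched very simply: instead of one bitrate for the whole title, each shot gets the lowest bitrate that still meets a quality target. The following snippet is purely illustrative (the shots, the complexity scores and the toy quality model are invented, and this is not how Netflix's Dynamic Optimizer is actually implemented):

# Invented shots with a rough "complexity" score (higher = harder to compress).
shots = [
    {"name": "static dialogue", "seconds": 40, "complexity": 0.2},
    {"name": "car chase",       "seconds": 25, "complexity": 0.9},
    {"name": "panning skyline", "seconds": 15, "complexity": 0.5},
]
LADDER = [800, 1500, 3000, 6000]     # candidate bitrates in kbps
QUALITY_TARGET = 0.90

def estimated_quality(bitrate, complexity):
    # Toy quality model: quality saturates with bitrate and drops with complexity.
    return min(1.0, bitrate / (complexity * 6000 + 1000))

total_uniform, total_per_shot = sum(s["seconds"] for s in shots) * LADDER[-1], 0
for shot in shots:
    # Pick the cheapest bitrate in the ladder that still reaches the quality target.
    bitrate = next((b for b in LADDER
                    if estimated_quality(b, shot["complexity"]) >= QUALITY_TARGET),
                   LADDER[-1])
    total_per_shot += shot["seconds"] * bitrate
    print(f"{shot['name']}: {bitrate} kbps")

print(f"bits saved vs. a single highest bitrate: {1 - total_per_shot / total_uniform:.0%}")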

Figure 2: Netflix uses the Dynamic Optimizer to reduce buffering times when streaming
 
We can find another application of AI elsewhere in the world of movies. For many films, the trailer is a crucial factor in determining the excitement around its release in cinemas. The difficulty is in selecting the right clips from the film to put in the trailer. For the 2016 horror film “Morgan”, creators employed (no surprises here) IBM’s Watson to help create the perfect trailer. The AI was trained using 100 horror movie trailers and was then fed the full version of Morgan. From this, it chose the scenes with most action and even detected the sentiment of each one. To put the final trailer together, a human touch was still needed, but nonetheless, it is an exciting prospect for the future.
Within Telefonica, the exciting launch of Aura, our artificial intelligence, has paved the way for a revolutionary user experience. In the six countries where Aura is currently present, users will be able to receive personalized TV recommendations, start phone calls, see how much data they have left and much more. At the recent Mobile World Congress, Telefonica also announced Movistar Home, which will (among other things) allow users to say goodbye to their TV remote control. To keep up to date with all things LUCA check out our website, and don't forget to follow us on Twitter, LinkedIn and YouTube.

A Technical Analysis of the Cobalt phases, a nightmare for a bank’s internal network

ElevenPaths    16 April, 2018
A few days ago, a key member of a group of attackers known as Cobalt/Carbanak (or FIN7, for some) was arrested in Alicante. This group has been linked to various campaigns against banking institutions which have caused substantial losses through fraudulent transfers and cash withdrawals at cash machines. We are going to look at some technical details of the modus operandi of the latest wave, how it works, and some ideas on how to mitigate its impact.

The objective of the group is to access the infrastructure of a financial entity in order to compromise cash machines and withdraw cash fraudulently. Although it sounds like science fiction, they achieve network-level control of the cash machines, to the point of being able to make a specific machine start dispensing all of the cash it contains at a chosen time, at which moment the "mule" standing in front of the machine collects it. Rather than on sample analysis, we will focus on the most interesting aspects of the attack phases.


Objective 1: Attack of the user’s inbox

This group employs spear phishing, or targeted phishing. It is fundamentally social engineering (requiring interaction from an employee) aimed at compromising the corporate network of the financial organization. The victims (probably selected in a prior intelligence phase, in which the attackers collect employees' names and map them onto the known structure of the bank) receive emails with malicious attachments that impersonate legitimate companies and regulatory authorities. These can be very elaborate emails, with alleged reports, updates, alerts, etc. They are not sent in bulk, only to a select group of addresses normally belonging to the same domain; at this stage the attackers intend to fly under the radar. The email normally contains a web link to a document, i.e. a file with a .doc extension (which in reality is usually a renamed RTF) hosted on a domain purchased for the occasion.

It is known that the emails in previous campaigns (and even the most recent one in this wave) are sent by a simple mailer written in PHP, normally recognizable because the following line is added to the message headers:

X-PHP-Originating-Script: 0:alexusMailer_v2.0.php

Advice: to improve on the typical anti-spam and anti-malware perimeter, apply intelligent sandboxing to email and monitor the inboxes of people with privileges, to whom information is likely to be targeted; controlled review of certain accounts can even be included to guarantee better protection.
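As a small complementary check, a script like the following sketch (Python; the folder path is a placeholder) could scan exported raw emails for the mailer header quoted above, which can serve as a quick indicator for this wave:

import email
from pathlib import Path

SUSPICIOUS_MARKER = "alexusMailer"   # value seen in the X-PHP-Originating-Script header

def is_suspicious(raw_message: bytes) -> bool:
    msg = email.message_from_bytes(raw_message)
    header = msg.get("X-PHP-Originating-Script", "")
    return SUSPICIOUS_MARKER in header

# Placeholder path: point it at a directory of exported .eml files.
for eml in Path("exported_mail").glob("*.eml"):
    if is_suspicious(eml.read_bytes()):
        print(f"possible Cobalt-style mailer: {eml.name}")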

Objective 2: Execution, the internal network

If the victim runs the file while Office is vulnerable, the infection begins. The RTF file takes advantage of various Office vulnerabilities (especially CVE-2017-8570, among others, all very recent) and, when triggered, extracts several types of files: EXE, DLL, DOC, BAT and SCT (Visual Basic Script). Each one has a function: memory injection, downloading another payload, deleting evidence (the BAT), while the DOC is a harmless document designed to distract the user; in fact, it is called "decoy.doc". The most interesting part is that the SCT hides an asynchronous call to download and run the executable that actually triggers the infection (in the cases where it is not embedded in the document itself).

Figure 1: Part of the code of one of the BAT files downloaded by the RTF

Another very interesting point is that it uses .BAT files for "flow control" of the execution. The code is quite self-explanatory: the intention is for the block.txt file to work like a mutex, so that if the user launches the RTF twice while the payloads are downloading, it does not behave erratically. Thanks to this, it is possible to create a simple vaccine for this wave and perhaps others. For example, with this code:

copy /y nul "%TMP%block.txt"
icacls %TMP%block.txt /deny *S-1-1-0:W /T
copy /y nul "%TEMP%block.txt"
icacls %TEMP%block.txt /deny *S-1-1-0:W /T

It is very simple and harmless to create a block.txt that no one has permission to modify. It is not the most elegant approach, and the malware could even evade it, but, as we have said, it is quite harmless.
The RTFs are created with tools sold on the black market, in which the attacker defines the vulnerability and the embedded files, and which carry out all of the composition work. Interestingly, they do not create RTFs consistent with the specification: the RTF that arrives by email has one of these headers:

"{rt ", "{rt " or "{rt1"

The header, however, should be "{\rtf1". This is due to the tool used to create the exploits and payloads in RTF. It also sometimes uses the test1.ru domain for statistics, although this has no relevant function.
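This malformed header can itself be used as a quick triage check. The sketch below (Python; the list of malformed prefixes simply mirrors the variants quoted above, accepted with and without the escaping backslash since the exact bytes depend on the sample, and the attachments folder is a placeholder) flags files that claim to be RTF but do not start with the standard header:

from pathlib import Path

STANDARD_PREFIX = b"{\\rtf1"
# Prefixes quoted above for this campaign; both escaped and unescaped spellings accepted.
MALFORMED_PREFIXES = (b"{\\rt ", b"{\\rt1", b"{rt ", b"{rt1")

def triage_rtf(path: Path) -> str:
    head = path.read_bytes()[:8]
    if head.startswith(STANDARD_PREFIX):
        return "header looks standard"
    if head.startswith(MALFORMED_PREFIXES):
        return "SUSPICIOUS: malformed RTF header as seen in this wave"
    return "does not look like an RTF at all"

for doc in Path("attachments").glob("*.doc"):   # placeholder folder of attachments
    print(doc.name, "->", triage_rtf(doc))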

Advice: at this stage, the most useful things are obviously a fully updated system (especially Office) and properly trained users. Additional, deeper security measures (Windows hardening, anti-malware, EDR, etc.) also help; web filtering and a good IOC consumption policy can be used to take intelligent advantage of data that is already within reach.

A real example of the code of the SCT extracted by the RTF, in which the executable download can be observed

Objective 3: Control of the interesting servers

The "implants" they inject into the system can be controlled remotely by calling home. In these cases, the domains and commands used correspond to RAT tools. The IDS, proxy filters, etc. should do their work here.

In these specific cases, once they control any system on the internal banking network, the attackers adapt to the environment as best they can, and how they proceed depends on what they find. They carry out the obvious lateral movement through the network in search of the server that controls the cash machines. To succeed they use standard lateral-movement techniques (Mimikatz, pass-the-hash, privilege escalation in Active Directory…), looking for connections to the servers that interest them (for example, Terminal Server). They may also create users on various machines to escalate privileges or pass unnoticed more conveniently. This stage can take weeks.

Once the server that controls the cash machines is located, they implant the "beacon" as a service so it can be controlled by the attacker. It is a kind of Meterpreter for remote control which, again, allows the attacker to gain access and make connections remotely, something that should be detectable through domain logs, etc. It is in fact based on Cobalt Strike, a commercial tool used to create this type of attack and remote-control tooling, of which a new version has recently appeared.

Advice: A well-guarded and segmented network. Adequate correlation of security events and above all, privilege control.

Conclusion

We have seen how traditional methods are mixed with more modern techniques to gain control of an internal machine and, from there, to manage the cash machines. It is not science fiction; neither is the possibility of adequate protection. As we have seen, there are numerous methods (if not in one phase, then in another) to combat these types of threats, sophisticated as they may seem, as long as the threat is understood and the information is adequately managed in each phase of the attack. It is nothing extraordinarily new, but it is a serious issue to take into account.

Sergio de los Santos
Innovation and Laboratory Team

 

Reducing travel delays using Big Data

AI of Things    13 April, 2018
When we travel by public transport, there are a number of factors that affect our satisfaction with the journey.  Sadly, data science can’t help us to enjoy seats that are more comfortable or food that tastes better. However, Big Data can help reduce the frequency of delays to our journeys by plane or train. In this blog, we see how this works and introduce one of LUCA’s tools that brings benefits to this area.

Delays are a frustrating, costly, and sometimes unavoidable part of traveling by public transport. There are a number of reasons why a train or plane might be cancelled, and often these causes are out of the operator's control (you can't control the weather, after all!). However, technical issues are controllable and yet cause a relatively large number of delays. It is here that Big Data can play an important role.
Figure 1: Flight delays are costly for both passengers and operators. 
    
The key for airlines is to predict when a component part might need replacing and thus prevent delays by replacing it in advance. Flight operators have a huge amount of data available to them: in-flight data from black boxes, data from sensors within engines, passenger data and more. With the use of Big Data analytics, it is possible to draw insights from this huge quantity of information. EasyJet, the fifth largest European airline by passengers in 2017, has the long-term aim of completely eliminating delays caused by technical faults. To achieve this, it has recently announced a “predictive maintenance partnership” that will rely on the Airbus Skywise platform. During the trial period, 31 faults were detected which helped to avoid costly disruption.
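The underlying idea can be sketched with a toy model: train a classifier on historical sensor readings labelled with whether the component later failed, then flag components whose current readings resemble pre-failure patterns. The example below (scikit-learn; the features, thresholds and data are entirely invented and far simpler than anything an airline or the Airbus Skywise platform would use) shows the shape of such a pipeline:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Invented features per engine check: [vibration, oil temperature, cycles since overhaul]
X = rng.normal(loc=[1.0, 80.0, 500.0], scale=[0.3, 5.0, 200.0], size=(1000, 3))
# Toy ground truth: high vibration plus many cycles tends to precede a fault.
y = ((X[:, 0] > 1.3) & (X[:, 2] > 600)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")

# Flag components whose current readings look like pre-failure patterns.
current_readings = np.array([[1.5, 82.0, 700.0],    # suspicious
                             [0.9, 79.0, 300.0]])   # looks fine
for reading, p_fail in zip(current_readings, model.predict_proba(current_readings)[:, 1]):
    action = "schedule replacement" if p_fail > 0.5 else "no action"
    print(f"readings {reading} -> failure probability {p_fail:.2f} -> {action}")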
Figure 2: Big Data has been used to predict train delays in Stockholm, could the same happen in London?
    
In Sweden, rail operator Stockholmståg has been working with Big Data for a few years now. In 2015 it announced a predictive analysis algorithm that uses historical data in order to predict delays with up to two hours of warning. At LUCA, our LUCA Fleet product helps our clients to not only understand the activity of their vehicles but also manage the fleet with more intelligence. The platform includes a system of alerts, predictions and recommendations to help prevent breakdowns and disruptions. This is incredibly useful for vehicle rental companies, couriers and more. You can find out more about LUCA Fleet on our website, and watch the demonstration below to see how the solution works in practice.