5 uses of data on Black Friday

AI of Things    27 November, 2017
There are many stories that suggest how the term ‘Black Friday’ came into use. One common misconception is that Black Friday gets its name from being the first day of the year when retailers move from making a loss (‘in the red’) to making profits (‘in the black’). The true roots of the term are found in Philadelphia in the early 1960s, where the police used it to refer to the added chaos following Thanksgiving. By the 1980s retailers had turned the term to their advantage, and in the years that followed it became the phenomenon we know today. We, as consumers, spend more money each year and therefore create more data. In this blog, we will look at the key ways in which shoppers and retailers can use Big Data over the Black Friday period.

1. Harnessing the data

Big Data is a reality that is here to stay, and companies in every sphere of life are using data technologies as an integral part of their business. For retailers and e-tailers, the question being asked is “can you afford not to harness Big Data?” This question is never more pressing than when Black Friday comes around, as the potential for profit is larger than ever. Alongside Big Data frameworks such as Hadoop, an increasing number of companies are using machine learning to better understand their clients. The advantage of these methods over traditional ones is that they can provide real-time analysis. From historical and current data, machine-learning algorithms can factor a large number of variables into their models, including predicted weather patterns. The outcome? The ability not only to predict trends in consumer tastes, but also to predict how much consumers will spend, and where.
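
As a purely illustrative sketch of the kind of model involved (not any retailer's actual system), a regression that folds weather forecasts in with historical sales could look like the following Python snippet; all features and figures are invented:

```python
# Illustrative sketch of demand forecasting with weather as one of the inputs.
# All feature names and numbers are invented for the example.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row: [day_of_week, is_black_friday, forecast_temp_c, forecast_rain_mm, avg_discount_pct]
X = np.array([
    [4, 0, 12.0, 0.0, 10],
    [5, 1,  8.0, 2.5, 35],
    [6, 0,  9.5, 1.0, 15],
    [5, 1,  6.0, 0.0, 40],
])
y = np.array([52_000, 310_000, 71_000, 355_000])  # daily revenue in euros

model = GradientBoostingRegressor().fit(X, y)

# Predicted spend for a rainy Black Friday with deep discounts.
print(model.predict([[5, 1, 7.0, 4.0, 38]]))
```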

2. Preparing for the big day

Months of preparation take place before Black Friday arrives, given its importance for retailers. With the data-driven technologies mentioned previously, important decisions relating to stock management and the hiring of seasonal workers can be made with more confidence. Each year, the NRF (National Retail Federation) in the United States carries out a number of surveys, one of which estimated that between 500,000 and 550,000 seasonal workers would be hired this year. By using the Big Data available to them, stores are able to predict the number of shoppers and thus hire the appropriate number of workers, improving their operational efficiency. Efficiency is also a key word when it comes to stock management. Here, it is important to have sufficient quantities of the best-selling products so that customers don’t leave empty handed, while also avoiding large amounts of leftover stock. Whilst an important part of Black Friday is the buzz generated by limited stock, firms should (and do) use data-driven modeling techniques to prepare themselves for the influx of shoppers.

Figure 2: It is important for stores to have sufficient stock to match the demand from customers.

3. Setting the right price

Such modeling techniques can also be applied to pricing strategies. Black Friday is essentially characterized by the low prices on offer, and yet it is a day when retailers see enormous profits. Some interesting policies have arisen in recent years. Large brands such as Best Buy and Home Depot go beyond standard price matching and allow managers to beat the competition’s price by 10%. The website Greentoe offers a ‘name-your-own-price’ policy on many goods, a novel way of differentiating itself from the rest. Finding the balance between a competitive price and strong profit margins is not easy, but by harnessing the data it becomes significantly easier. One of the key benefits of Big Data tools is that they can analyze data in real time, and machine-learning algorithms can factor in what the competition is doing (alongside many other variables) to arrive at a more precise price.
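
Purely as a sketch of the idea (not a description of any retailer's actual pricing engine), a "beat the competition by 10%" rule with a margin floor can be expressed in a few lines of Python; the cost, margin and competitor prices are invented:

```python
# Illustrative competitive pricing rule: undercut the cheapest competitor
# by 10%, but never drop below a minimum margin over our own cost.
# All numbers and the competitor price list are invented.

def reprice(our_cost, competitor_prices, beat_by=0.10, min_margin=0.05):
    """Return a price that beats the lowest competitor by `beat_by`
    without going below `min_margin` over our cost."""
    floor = our_cost * (1 + min_margin)            # lowest acceptable price
    target = min(competitor_prices) * (1 - beat_by)
    return round(max(target, floor), 2)

# A TV that costs us 320, with competitors listing it at 399, 429 and 449:
print(reprice(320.0, [399.0, 429.0, 449.0]))       # -> 359.1
```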

4. Reaching the right customers


In marketing, the acronym STP refers to the process of segmentation, targeting and positioning that brands must undergo in order to succeed. It is all well and good having the right amount of stock and the perfect price, but these things are useless if customers do not see your products. By combining in-house and external data sources, retailers can use data science to draw up more precise segments than would have been possible with previous methods. Since consumer data is aggregated and anonymized, it is possible to do this in a way that respects the privacy we all desire. Once segmented, retailers can target potential customers with the deals that will interest them most. For example, in the week leading up to Black Friday, I received a daily email from Amazon with personalized deals based on the data collected from my previous shopping history. As I will mention below, consumers crave a personal relationship with the brands they love, and targeted communication such as the email below allows brands to position themselves as a key player in the first phase of the consumer decision journey: their initial considerations.
Figure 3: A targeted email sent by Amazon.
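
As a purely illustrative sketch of the segmentation step described above (not Amazon's or any other retailer's actual pipeline), clustering shoppers on a few behavioural features might look like this in Python; the features and figures are invented:

```python
# Illustrative customer segmentation with k-means; all features are invented.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each row: [orders_last_year, avg_basket_eur, days_since_last_purchase]
customers = np.array([
    [24,  35.0,   3],
    [ 2, 120.0, 200],
    [18,  40.0,   7],
    [ 1,  90.0, 340],
    [30,  28.0,   1],
    [ 3, 150.0, 150],
])

X = StandardScaler().fit_transform(customers)   # put features on a common scale
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(segments)  # e.g. frequent small-basket buyers vs. occasional big spenders
```

Each segment can then be matched with the deals most likely to interest it.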

5. Improving the shopping experience

One of the trends of recent years has been the hyperconnectivity of our society. We rarely go anywhere without our phone, and this is especially true when we go shopping. In a way, our phone acts as a sales assistant, because we use it to make ‘wiser’ purchases. Another trend is that customers want both an increasingly close connection with the brands they buy from and the ability to interact with those brands on their own terms. In order to achieve this, retailers must take an omnichannel approach, with a presence on social media, a mobile-friendly website and real-time customer service. This personal experience is of course key on Black Friday, but perhaps even more so on Cyber Monday, when the mega sales head online. With so many different places to shop, brands must create ‘personal’ connections with their shoppers in order to ensure that they are chosen. Big Data analysis provides key insights that can strengthen the relationship between brand and consumer, and we at LUCA believe that this will only become more important in the future.
How about you? Did you take part in the Black Friday sales? Perhaps you chose to shop from the comfort of your own home this Cyber Monday, or decided (like me) to sit it out completely and avoid the crowds. Whatever your decision, it is likely that Big Data played a bigger part than ever before! To stay up to date with all the latest news, events and content from LUCA, follow us on Twitter and LinkedIn. Happy shopping!


Dumpster diving in Bin Laden’s computers: malware, passwords, warez and metadata (I)

ElevenPaths    27 November, 2017

What would you expect from a computer network that belongs to a terrorist group? Super-encrypted material? Special passwords? On 1 November 2017, the Central Intelligence Agency (CIA) released additional materials recovered in the 2 May 2011 raid on Bin Laden's compound in Abbottabad, Pakistan. We have seen plenty of news about the movies, porn, games and other material stored on those computers, but we will go further and focus on the security aspects of the 360 GB of zipped information. Did they use passwords? Proxies? Encryption? Any special software?

A few hours after releasing the raw information from the hard drives of at least three computers found there, the CIA removed the content due to “technical” issues. Eight days later, they released the data again, but now all Office documents had been converted to PDF and EXE files had been “deactivated” by removing their headers, for “security reasons”.

A few words about the CIA “technical issue”

Did the CIA have second thoughts in less than 24 hours? We do not know, but what certainly happened is that, by releasing it all again, they added their own metadata. For example, we now know they used LibreOffice 5.2 (not the latest version, and one with some known security issues) to convert Office documents to PDF, and LibreOffice 5.0 to convert RTF files. Does LibreOffice have a tool for converting a few thousand files to PDF? Yes, it does. They probably used lowriter, which can convert files to PDF from the command line.
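
As a minimal sketch of how such a batch conversion could be scripted (the folder names and the use of Python are our own illustrative assumptions, not something taken from the released material):

```python
# Minimal sketch of a LibreOffice batch conversion to PDF.
# Folder names are illustrative assumptions.
import subprocess
from pathlib import Path

SOURCE_DIR = Path("./seized_docs")    # hypothetical input folder
OUTPUT_DIR = Path("./converted_pdf")  # hypothetical output folder
OUTPUT_DIR.mkdir(exist_ok=True)

for doc in SOURCE_DIR.glob("*.doc*"):
    # lowriter is the command-line front end for LibreOffice Writer;
    # --convert-to pdf renders each document headlessly into OUTPUT_DIR.
    subprocess.run(
        ["lowriter", "--headless", "--convert-to", "pdf",
         "--outdir", str(OUTPUT_DIR), str(doc)],
        check=False,  # keep going even if a single file fails to convert
    )
```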

We used our https://metashieldclean-up.elevenpaths.com service to analyse some of the data

But, for some reason, the CIA made some mistakes. They did not correctly convert all of the DOCX files to PDF. Here is an example of the content of some of those files.

Messed up data after creating a PDF

These files were seized more than six years ago… why the rush? They did not even check that the files had been properly converted before the re-release. In any case, during the second release they removed some “malware”: 815 different samples. We checked them and found some interesting things. We looked those 815 “malware” samples up on VirusTotal (a sketch of this kind of lookup follows the list):

  • Not found: 524
  • Found with 0 positives: 146
  • Found with more than 1 positive:  145
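
As a minimal sketch of how such a triage could be automated (assuming a VirusTotal API v3 key; the hash list file name is our own invention):

```python
# Minimal sketch of triaging a list of MD5 hashes against VirusTotal (API v3).
# Assumes an API key and the `requests` library; the file name is illustrative.
import requests

API_KEY = "YOUR_VT_API_KEY"
VT_URL = "https://www.virustotal.com/api/v3/files/{}"

not_found = zero_positives = with_positives = 0

with open("sample_hashes.txt") as f:             # hypothetical list of MD5s
    for md5 in (line.strip() for line in f if line.strip()):
        resp = requests.get(VT_URL.format(md5), headers={"x-apikey": API_KEY})
        if resp.status_code == 404:
            not_found += 1                       # never submitted to VirusTotal
            continue
        stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
        if stats["malicious"] == 0:
            zero_positives += 1                  # known file, zero detections
        else:
            with_positives += 1                  # flagged by at least one engine

print(not_found, zero_positives, with_positives)
```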

At least 146 samples are not considered malware by any antivirus engine, but the CIA flagged them anyway. That is fine, AVs are not always right, as we well know… but, checking some samples manually, we did not see any evidence of malware in them. Some are documents, some executables… did the CIA perform a deeper analysis? It seems so. We took some random samples, such as 903A80A6E8C6457E51A00179F10A8FA8, which is not detected by any antivirus as of today, and found what does look like malicious content. So, good for the CIA here… or not.

Because this is the exception: that file does not look like malware unless you take a much deeper look. There are plenty of other documents that do not seem to be infected in any way or to pose any risk. We checked them manually. Yet for some reason they were removed, having been specifically classified as “malware” or dangerous. Why remove them?

As far as we can tell, even .log files (just text) have been labeled as malware by the CIA

Analyzing the memory (even though the computers were shut down)

Aside from the data re-released by the CIA, once we had all the original material, the first “not so obvious” step was to grab pagefile.sys and hiberfil.sys and analyze them. These files are especially interesting because, potentially, anything may be in there. Literally. Hiberfil.sys is a dump of the memory itself, and pagefile.sys is the swap file, so chunks of memory from different processes end up there and you may literally find URLs, passwords… anything. We found two hiberfil.sys files and seven pagefile.sys files from at least three computers.

The first thing to sniff around for is interesting URLs. Videos are always interesting; in this case, mainly videos for children. We also found their anonymous proxies of choice, such as “http://tproxy.guardster.com”, in whose requests we can see the URLs they were really visiting. Mainly Islamic forums.
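
Carving these kinds of strings out of the raw files is straightforward. A minimal sketch (the file name is illustrative, and note that much of hiberfil.sys is compressed, so only strings stored uncompressed will show up this way):

```python
# Minimal sketch of carving URL-like strings out of pagefile.sys / hiberfil.sys.
# The file name is illustrative; URLs split across chunk boundaries may be missed.
import re

URL_RE = re.compile(rb"(?:https?|ftp)://[\x21-\x7e]{4,200}")

def carve_urls(path):
    hits = set()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(64 * 1024 * 1024)   # read the file in 64 MB chunks
            if not chunk:
                break
            hits.update(m.group().decode("ascii", "replace")
                        for m in URL_RE.finditer(chunk))
    return hits

for url in sorted(carve_urls("pagefile.sys")):
    print(url)
```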

But, as we detected, there are also some malware IOCs there, i.e. evidence of malware in memory. For instance: 20080311cPxl31 (a Flash downloader popular during 2011), http://jL.chura.pl/rc/, http://218.25.11.147/download (a rather old Chinese malware distributor, or at least that is what it looked like), http://59.106.145.58/ (related to MS08-067), http://85.17.138.60/traf/fgaX, and 29x67629n689 (not a very common string…). These are all samples of strings found in memory.

Two of them are especially interesting. The string ftp://ggss:xsw2xsw2@, found in one of the pagefile.sys files and obviously an FTP username and password, belongs to the sample 4742ae6404fa227623192998e79f1bc6. But this sample is not a well-known piece of malware. The raid took place in May 2011, yet this sample was first seen on VirusTotal in 2015. How is it possible that it was not seen anywhere before? Not in any antivirus database for four years, but present on a computer in Abbottabad? It may be a uniquely crafted, automatically generated sample… but who knows.

Besides this, there are even more references to malware in the pagefile.sys and hiberfil.sys files. This one looks especially interesting.

Chunk of memory from one of the computers

There is always the chance that these chunks of information in memory come from another source (the user was just searching for it, AV signatures…), but given the location of the chunk itself we think the computer was infected. This “password sender trojan by: spyder” is a truly ancient piece of malware, dating from at least 2000.

An old PDF file by SANS referring to this keylogger

Thus, apart from the 815 files tagged as potential malware by the CIA, the evidence found in memory, linked to other malware samples found on the computers themselves, makes us think that those computers were quite heavily infected.

By the way, their antivirus of choice was a pirated version of ESET NOD32, since they all had its service running, although some of the computers also had references to AVG and some Kaspersky warez keys.

The hiberfil.sys files were interesting for another reason. The LSASS process is in there somewhere and, if treated the right way, you can “mount” the process and check it for credentials. That is what we did: we tried to recover the passwords of the users logged in at the moment the file was created. We tried with the hiberfil.sys from SHAED-PC, one of the computers in Bin Laden's compound.

Using Debugging Tools for Windows (WinDbg), the Windows Memory Toolkit free edition and mimikatz, we tried to find Windows passwords. The process consists of converting hiberfil.sys to a format WinDbg understands, finding the base of the LSASS process and running mimikatz against it. The result was that there were no passwords at all.

Taking passwords out of the hiberfil.sys file.

The NTLM and LM hashes are clearly null, so the passwords were blank.

We could go on analyzing pagefile.sys and hiberfil.sys for hours, but this is just a glimpse of what you may find.

In the next blog entry we will dig deeper into the registry files, the passwords used for communication, the programs that ran when the computers started up… and some other revealing clues.

* Dumpster diving in Bin Laden’s computers: malware, passwords, warez and metadata (II)


The Data Transparency Lab strengthens its work on data transparency after investing over one million euros in three years

ElevenPaths    27 November, 2017
  • Barcelona becomes the permanent headquarters of the DTL Annual Conference, which will take place from 11 to 13 December.
  • The DTL is a clear example of the various innovation projects that Telefónica develops at its headquarters in Barcelona.
  • The Laboratory is currently sponsoring research groups of prestigious universities such as Princeton or Berkeley.

Barcelona, 22 November 2017.- The Data Transparency Lab (DTL), created and promoted by Telefónica to carry out research in the field of transparency in the use of data in the digital environment, has established itself as a reference in its sector after making an investment of over one million euros in new applications and programs since its creation in 2014.

At a media event today, Kim Faura, director general of Telefónica in Catalonia, and Ramón Sangüesa, DTL’s Coordinator, have taken stock of Data Transparency Lab’s first three years.
In this time, the DTL has sponsored research groups of the most prestigious universities in the world, with which it has created a community that has allowed the development of programs and open source applications designed to improve data management both for individuals and for companies.
Thus, groups of researchers from Princeton University, Berkeley, Technische Universität Berlin, the University of California, Eurecom and the Max Planck Institute have been awarded grants for the development of programs and applications. Spanish researchers from Pompeu Fabra University and Carlos III University in Madrid also benefit from the DTL grants.
Some of the projects that have emerged from the DTL are applications that show users the information being sent to third parties through mobile applications, or plugins that help users understand how much money the companies that develop them earn by sharing our data.
For this year, the DTL’s grant program has chosen six finalists from among a total of 45 coming from 18 countries. Each finalist will receive a €50,000 grant.
The Data Transparency Lab is a project that originated at the Telefónica R&D Centre in Barcelona three years ago with the aim of bringing together the best technologists, policy makers, industry representatives and researchers around the world to work on improving transparency and privacy of personal data.

Innovation Hub
Today, data is considered the most important asset for companies and knowing how to manage and process it properly is essential in order to be able to create business opportunities. The DTL creates and offers tools for users to obtain transparent information on the management of their data allowing businesses to take advantage of their information, within the legal confines.
Promoted by Telefónica as a technological innovation hub based on research at the highest level, the DTL also comprises other partners: AT&T, Mozilla, MIT Connection Science and the Institut National de Recherche en Informatique et en Automatique (INRIA), the main artificial intelligence research entity in France. The DTL's intention is to gradually increase the number of companies associated with this consortium.
Since its creation, its aim has been to connect the best talent in the world and put it to work on the technological challenges that true data transparency requires, with the objective of generating a new economy and new confidence.
The DTL is an example of the innovation projects that Telefónica has always developed in Catalonia, where more than 200 researchers of 20 different nationalities work on various projects.

Annual Conference in Barcelona
The Data Transparency Lab annual conference, which has previously been held at centres such as the Massachusetts Institute of Technology (MIT) and Columbia University in New York, returns this year to Barcelona, which from now on will be the permanent headquarters of the event.
The next edition, which will take place from 11 to 13 December, will be held in the auditorium of the Telefónica Diagonal 00 Tower in Barcelona and will bring together some 200 participants, researchers, businesses and promoters of the transparent use of data. The Conference will present the sponsored projects for this year and the results and progress of the research projects which were set up a year ago.

Trump: one year in data

AI of Things    24 November, 2017
A little over a year ago we saw an American election that sent shockwaves through the political world. Donald J. Trump, the Republican candidate, went head-to-head with Hillary Clinton, who represented the Democratic Party. The election was held just over a year ago, on Tuesday November 8. As a reference point, prior to the day it was believed that a Trump win was “1000 times less likely than Brexit”. In August 2015, popular betting sites set the odds of a Trump win at 25 to 1, but as decision day approached the odds were cut to 5 to 1; still an unlikely event. Consequently, as the results became evident, the world was shaken. But how involved was data science in this surprising victory?
 

Cambridge Analytica, a private company specializing in combining data mining and data analysis with strategic communication for electoral processes, aided the Trump campaign efforts. The company analyzed voters and used personalized advertising to target individuals. By looking at “likes” on the popular social media site Facebook, Cambridge Analytica was able to determine (with high levels of accuracy) an individual's race, sexuality and whether they favored the Democratic or Republican Party. The company purchased information on US citizens and cross-referenced it with its own data in order to extract the maximum value from the figures. This included cross-referencing the information with registers of known Republican Party supporters, even including their address of residence.
 

This then progressed to personalized messages through Facebook advertising. Individuals received slightly different messages, designed to appeal to the audience on a psychological level and tell them what they wanted to hear. The differences could be as small as choosing a video rather than a photo, or using a bright background color instead of something simpler. The messages sometimes focused on particular areas, spreading negative messages about opponent Hillary Clinton, and this undoubtedly reduced general support for the Democrats. The technology also gave the campaign the power to determine an individual's race and, as a result, to target ethnic groups in different ways, showing people the campaign content they wanted to see and shielding them from less favorable topics. This shows the power that Big Data possesses in influencing world affairs.
 
Going into greater depth on the topic of Hillary Clinton, we can analyze the number of times her name was mentioned during Trump’s speeches. The number increased dramatically as the election approached, demonstrating Trump’s personal attacks on his opponent. If we also look at the number of times he mentioned the word ‘emails’, in reference to Hillary’s email controversy, this also increased significantly in the months leading up to the election. Here, data is giving us an interesting insight into Trump’s tactics; clearly, he wanted to boost his campaign by tarnishing the name of his political opponent.
 
The next point, which simply cannot be ignored when talking about Donald Trump, is his activity on the social media platform Twitter. Often controversial, the president of the United States posts both official messages and personal opinions on his Twitter account @realDonaldTrump. Noteworthy tweets include his reference to Hillary Clinton as the ‘biggest loser of all time’, his claim that he would ‘NEVER call him [Kim Jong-Un] short and fat’ and his numerous cries of ‘Fake News’. The website ‘Did Trump Tweet It?’ harnesses the power of machine learning to determine whether Trump writes his own tweets or whether a member of his team has written them. The site can identify the writer of a tweet with up to 98.8% accuracy and provides the probability that Trump himself wrote it. It is certainly an interesting insight into the mind of the President. Tweets that would be classified as his ‘personal opinion’ often show extremely high probabilities that Trump wrote them himself, whereas more generic tweets, such as official correspondence or thank-yous, are frequently written by other members of his team.
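
As a purely illustrative sketch of the general approach behind such a classifier (the actual model and training data behind ‘Did Trump Tweet It?’ are not described here; the example tweets and labels below are invented):

```python
# Illustrative sketch of a tweet-authorship classifier: TF-IDF features plus
# logistic regression. Training texts and labels are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labels: 1 = written by Trump himself, 0 = written by staff.
tweets = ["Crooked Hillary is the biggest loser of all time!",
          "Thank you to everyone who attended today's event.",
          "The FAKE NEWS media is working overtime. Sad!",
          "Read the President's full statement at the link below."]
labels = [1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=False),  # keep CAPS as a signal
    LogisticRegression(),
)
model.fit(tweets, labels)

# Probability that a new tweet was written by Trump himself.
print(model.predict_proba(["Such a dishonest story. Sad!"])[0][1])
```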

 
Figure 2: An external view of Trump Tower, in Chicago.

The UK also experienced a more data-driven election this year, as Prime Minister Theresa May called a snap election in an effort to increase Conservative control. Facebook carried out a targeted campaign on May 12 (under a month before the June 8 election) with the aim of increasing voter turnout. Average daily registrations quadrupled; targeted advertising is clearly a powerful tool for spreading messages. What implications will this have for future elections?
 
In the US election, Trump's data-driven approach was successful partly because the Democrats did not match it, and the Republicans kept these activities hidden from the public eye. Now that the information has been released, will it be possible for data to have a similar impact again? The world of Big Data keeps growing and machine-learning techniques are becoming ever more advanced. Inevitably, data will find a way to stay at the center of the political world.
 
Here at LUCA we are interested in how Big Data is revolutionizing politics, and this will be a topic of observation and analysis for many years to come. 

Designing an OOH advertising map in Brazil

AI of Things    23 November, 2017
For any company, it is vital to maximize client reach, and one way of doing this is to make use of the power and value that advertising brings. Nowadays, there are countless ways of sharing a message, but Out-Of-Home (OOH) advertising is the method seen by the most people. For this reason, we want to share with you a project carried out in Brazil with the aim of creating an OOH map of the cities of Sao Paulo and Rio de Janeiro.

But what is OOH advertising? It is a style of advertising that involves creating outdoor publicity campaigns through the use of billboards, outdoor information panels or adverts on public transport. It is an excellent way to reach the greatest number of people and to have the biggest possible influence; one only has to think of how many advertising hoardings each person passes every day once they leave their house! Furthermore, this form of advertising tends to be very eye-catching; the brand only has around 3 seconds to catch the attention of the target, so bright colors and catchy taglines are commonly used. Finally, it is also very common to incorporate the latest technologies, such as QR codes and NFC (Near-Field Communication), so that people can discover more about the product via their mobile phone.


LUCA OOH Audience is working alongside Clear Channel, JCDecaux and Otima with the goal of creating an OOH map of Brazil. The aim of this map is to offer useful tools and metrics so that communication and advertising agencies, and the advertisers themselves, can plan and evaluate their OOH advertising campaigns. The focus of the study has been the cities of Sao Paulo and Rio de Janeiro. The map makes use of the data these companies collect, including the number of users, their geographic locations, the journeys they make and the modes of transport they use, all of which contributes to a better understanding of how to apply OOH advertising. The challenge of the study is to lift Brazil to the same level as the world leaders in OOH (the United Kingdom, France and Japan).
Why should you invest in OOH advertising instead of other forms of advertising, such as the internet, which seem more modern at first glance? Firstly, thanks to OOH advertising it is possible to reach any age or demographic group. Secondly, after applying the designed OOH map, a total of 400 million weekly trips were registered in Sao Paulo and Rio alone, which gives OOH adverts an enormous exposure.
Figure 2: The bay in Rio de Janeiro.

LUCA OOH Audience, by using Smart Steps technology, is capable of providing an in-depth view of different locations, offering an unrivalled sample. The solution allows companies to make better decisions regarding their outdoor advertising.


III Impact Innovation Talks: Big Data’s new challenges, achievements and opportunities

AI of Things    17 November, 2017
Yesterday, 16 November, the III Impact Innovation Talks were held, organized by Telefónica Open Future together with PWN Madrid. This third edition of the talk series, under the slogan #WomensAge, aims to reflect on the situation of female entrepreneurs. This time the event covered three themes and their challenges, achievements and opportunities. Pedro A. de Alarcón, head of Big Data for Social Good at LUCA Data-Driven Decisions, was one of the speakers at the event, and he wanted to emphasize the benefits that the value of Big Data can bring to society.

Figure 1: The Wayra auditorium on Gran Vía ready for the III Innovation Talks to start.

To mark Women's Entrepreneurship Day on November 19th, Telefónica Open Future_ held the III Impact Innovation Talks with the main goal of reflecting on the situation of entrepreneurial women in society. The event was organized under the #WomensAge initiative in collaboration with PWN Madrid and involved the participation of many experts in fields related to big data, machine learning, cybersecurity and artificial intelligence.

Helena Díez-Fuentes opened the event by welcoming all the attendees, and the presentations began with Ángela Shen-Hsieh, head of predictive behaviour at Telefónica, and Ramón López de Mantaras, director of the Instituto de Inteligencia Artificial del CSIC. They spoke about AI and the opportunity of predicting human behavior. According to Ángela Shen-Hsieh, “there are many challenges for AI, and one of the keys is that it needs data to work better, to know more about us, so AI can also work better for us”. Along the same lines, Ramón López de Mantaras agreed that “the challenge now is to provide a better overall experience for us, as a consumer, as a customer… but to do it in a way that we have some protection and we can protect our privacy. The challenge is to make this data work for us, providing machines with common sense”. They both also said that the main problem is that data can be biased, and that there is a real need to prevent that from happening.

Figure 2: Rebeca Marciel and Pedro A. de Alarcón during their talk at the III Innovation Talks.

Rebeca Marciel, head of Gartner Consulting, alongside Pedro Antonio de Alarcón, head of Big Data for Social Good at LUCA, presented the concept of Big Data for Social Good (BD4SG), making it clear that “the potential for good thanks to the value of data is now a reality”. Pedro A. de Alarcón played a demo showing how big data can help underdeveloped countries in the case of natural disasters or pandemic threats; this data must always be combined with other data sources to be truly useful. They also addressed data privacy, emphasizing the importance of an ethical use of data and the need to develop corporate social responsibility to make sure we can use data's power for good.

Last but not least, Yaiza Rubio, intelligence analyst at ElevenPaths, and David Barroso, founder of CounterCraft, discussed the challenges of cybersecurity and data. According to both experts, the most important thing is being ready to predict what will come next. Thus, “the main challenge is to provide an easy structure in which everyone can get the information they need from the internet, but having the tools to analyze the security they have”. They both agreed that technology reflects society and that there are two main points: the feelings of anonymity and invulnerability we can have when browsing the web. This makes it easier for people to commit crimes, and that is why we all need to make sure we can have an easy but secure digital experience. The main goal of intelligence analysts such as Yaiza Rubio is to find a way of fulfilling both national security and privacy needs by improving knowledge of the challenges, achievements and opportunities of this field.

Figure 3: A view of the Wayra auditorium during Yaiza Rubio's talk at the III Innovation Talks.

During the closing of the event, Raquel Cabezudo, chairwoman of PWN Madrid, thanked all the speakers and attendees for coming to the III Innovation Talks and asked people to keep developing their ideas, as innovation and entrepreneurship will be the best engines for making the world a better place in the future.


Security and electronic signature for any enterprise

ElevenPaths    16 November, 2017

ElevenPaths, Microsoft and Gradiant have collaborated to allow companies to benefit from an advanced platform for electronic signatures and digital certificate safekeeping, integrated with a cloud service for HSM devices, through a simple pay-for-use model.

Guaranteeing the confidentiality, integrity and availability of information is the main objective of cybersecurity. The level of protection required varies according to each organization's needs and the legal or regulatory requirements of the applicable sector.

To ensure a high level of protection for your information, it is recommended to store and use encryption and signature keys in specially protected devices referred to as HSMs (Hardware Security Modules). Both the payment card industry standard, PCI-DSS, and the European Union's eIDAS regulation on identification and electronic signatures provide for the use of such devices.

In this context, the use of secure cryptographic hardware, or an HSM, provides an adequate mechanism to safeguard and protect keys (in the fashion of a safe-deposit box). However, the cost and complexity of installation and configuration hinder greater adoption of this hardware. For this reason, some as-a-service solutions have emerged, such as Azure Key Vault, which offer the possibility of using HSMs as just another service within a public cloud.
Microsoft Azure is a comprehensive set of cloud services used by developers and IT professionals to create, implement and administer applications through its global network of data centers. Microsoft incorporates Key Vault, a service to safeguard keys on Hardware Security Modules with FIPS 140-2 level 2 certification (hardware and firmware).
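
Purely as an illustration of what HSM-backed signing as a service looks like from a developer's point of view (this is not the SealSign® or BlackICE Connect integration itself; the vault URL and key name are placeholders), a document hash can be signed with a Key Vault key using the Azure SDK for Python:

```python
# Illustrative sketch: signing a document digest with an HSM-backed key in
# Azure Key Vault. Vault URL and key name are placeholder assumptions.
import hashlib

from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient
from azure.keyvault.keys.crypto import CryptographyClient, SignatureAlgorithm

credential = DefaultAzureCredential()
key_client = KeyClient(vault_url="https://my-vault.vault.azure.net",
                       credential=credential)
key = key_client.get_key("sealsign-signing-key")   # hypothetical key name

crypto = CryptographyClient(key, credential=credential)
digest = hashlib.sha256(b"contract to be signed").digest()

# The private key never leaves the HSM; only the digest is sent to be signed.
result = crypto.sign(SignatureAlgorithm.rs256, digest)
print(result.signature.hex())
```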
SealSign® is a scalable, modular and full enterprise platform developed by ElevenPaths providing electronic document and biometric signatures, digital certificate safekeeping, and long-term archiving of signed documents. 
ElevenPaths, Microsoft and Gradiant have collaborated to create a solution for electronic signatures and digital certificate safekeeping in high security cloud storage. This solution was presented on the occasion of Security Innovation Day 2017, an innovative cybersecurity event organized by ElevenPaths. It combines the SealSign® electronic signature platform, the availability and scalability of the Azure Key Vault, and Key Vault’s integrated key safekeeping service thanks to the BlackICE Connect integration module, developed by Gradiant.
Using this cloud solution provides every enterprise with a high-security, high-performance platform whose costs are tied to its real usage and needs. This allows savings of up to 80% for this service in comparison with dedicated on-premises platforms.

#CyberSecurityPulse: The Last Disaster of Ethereum’s Most Important Wallets

ElevenPaths    14 November, 2017

It is estimated that 587 wallets holding around 513,774.16 ether have been frozen after an anomaly was detected in one of Ethereum's most important wallets. Parity Technologies, a company focused on the development of software specialized in peer-to-peer solutions, published a security alert on November 8 stating that they had detected a vulnerability in the Parity Wallet library contract of the standard multi-sig contract. Specifically, the company considers the affected users to be those with assets in a multi-sig wallet created in Parity Wallet and deployed after 20th July.

Following the fix for the original multi-sig vulnerability that had been exploited on 19th of July, a new version of the Parity Wallet library contract was deployed on 20th of July. Unfortunately, that code contained another vulnerability which was undiscovered at the time – it was possible to turn the Parity Wallet library contract into a regular multi-sig wallet and become an owner of it by calling the initWallet function.


The company, in its latest communication, published yesterday, states that this is “a learning opportunity (albeit a painful one) for our company, for our collaborators and the community that stands with us”. There have been discussions within Parity and across the open source community for a while now on how to build better and more secure systems. After all the security incidents that cryptocurrency users have suffered in recent years, one thing is clear: without security, there will be no transformation towards the new payment methods.

More information at Parity Technologies

Top Stories

Critical Tor Flaw Leaks Users’ Real IP Address

Mac and Linux versions of the Tor anonymity browser just received a temporary fix for a critical vulnerability that leaks users’ IP addresses when they visit certain types of addresses. TorMoil, as the flaw has been dubbed by its discoverer, is triggered when users click on links that begin with file://. When the Tor browser for macOS and Linux is in the process of opening such an address, “the operating system may directly connect to the remote host, bypassing Tor Browser,” according to We Are Segment, the security firm that privately reported the bug to Tor developers.

More information at We Are Segment

APT28 Used Microsoft Office DDE Exploit Since October

Cybercriminals have started actively exploiting a newly discovered Microsoft Office vulnerability. This DDE attack technique has been found being leveraged by an Advanced Persistent Threat (APT) hacking group, APT28, since October. The campaign involved documents referencing the recent terrorist attack in New York City in an attempt to trick victims into clicking on the malicious documents, which eventually infect their systems with malware.

More information at McAfee

Rest of the Week’s News

Bill to Formalize the Election System as Critical Infrastructure

A Senate bill would put the power of legislation behind much of the government’s election security work during the past year and would establish a national competition for hacking election systems. The Securing America’s Voting Equipment Act, or SAVE Act, would formalize the Homeland Security Department’s designation of election systems as critical infrastructure, a move that makes it easier for the federal government to share cyberthreat information with state election officials.

More information at NextGov

IEEE P1735 Implementations May Have Weak Cryptographic Protections

The IEEE P1735 standard describes methods for encrypting electronic-design intellectual property (IP), as well as the management of access rights for such IP. The methods are flawed and, in the most egregious cases, enable attack vectors that allow recovery of the entire underlying plaintext IP. Implementations of IEEE P1735 may be vulnerable to cryptographic attacks that allow an attacker to obtain such information even without the key, among other impacts.

More information at Cert.gov

Vault 8: WikiLeaks Releases Source Code For Hive

WikiLeaks yesterday announced a new Vault 8 series that will reveal source code and information about the backend infrastructure developed by CIA hackers. Hive's infrastructure was specially designed to prevent attribution, and includes a public-facing fake website and multi-stage communication over a Virtual Private Network (VPN).

More information at Wikileaks

Further Reading

Built-in Keylogger Found in MantisTek GK2 Keyboards

More information at The Hacker News

SowBug Cyber-Espionage Group Stealing Diplomatic Secrets Since 2015

More information at Symantec

AVGater Attack Abuses Quarantine Vulnerabilities for Privilege Escalation

More information at Security Affairs

Mobility and transport planning in Neuquén

AI of Things    13 November, 2017

More and more cities, thanks to technology and the use of data, are hoping to improve the quality of life of their citizens through mobility planning and other initiatives. In this post we present a project carried out by LUCA alongside the Province of Neuquén (western Argentina), where the value of data has been used to design a new public transport system and the route of the new ‘Intelligent Metrobus’. The project has turned into an opportunity to improve urban mobility and public transport in the region.


Neuquén did not have a single source of information from which to gather data about the movement flows of people in the city, primarily at off-peak times such as banking or school hours. LUCA therefore offered the city a solution that fits perfectly with the idea of a Smart City that the province desired.

The City of Neuquén has a population of 245,000 people and, when combined with the municipalities of Plottier and Cipolletti, forms the largest concentration of population in Patagonia, with 341,300 inhabitants. Taking these characteristics into account, the province is at the cutting edge of innovation, aiming to use the available technologies to improve the quality of life of its citizens.


The objective for the Province of Neuquén is to make the most of the information that we have at LUCA, as well as our solutions, in order to design a mobility map with reliable and up-to-date data. This information will also be used to develop an organized transport plan, with the aim of creating the first Metrobus in the interior of the country.

Using Big Data tools, LUCA analyzes trends and patterns in data traffic in order to learn more about consumer behaviour. Traffic data is enriched with demographic and behavioral information, including sociodemographic attributes, home and work locations, the purpose of visitors' trips, age group, gender and other attributes that allow for evaluation by profile and sophisticated segmentation.


In this way, thanks to the application of LUCA Transit, based on Smart Steps (a solution that provides insights using anonymized and aggregated mobile data), it was possible to gather the information needed to estimate the movement of groups of people in the city. This solution allows both public and private organizations to make better decisions based on real behavior. Supported by reliable data, it will benefit the planning of transport routes as well as the location of stations and stops. In the same way, it will also be possible to plan the city's public transport systems, as the province desires. Additionally, it allows the province to measure and characterize the flow of trips within the city of Neuquén, to identify the key transport zones, to detect trends in the movement of large groups of people, to support the new Metrobus service and, in short, to create a more liveable city.


The rise of data-driven education

AI of Things    10 November, 2017
Here at LUCA, we like to use our blog to discover and explore how Big Data can be used in parts of society that have historically been closed off to data science. We passionately believe that data, if harnessed correctly, can be a force for good in society, and you can read more about our Big Data for Social Good initiatives here. In this latest blog, we shall see how such technologies are creating data-driven learning environments in our schools, colleges and universities.


The Oxford English Dictionary defines ‘knowledge’ as ‘facts, information, and skills acquired through experience or education; the theoretical or practical understanding of a subject’. Thus, one could define learning as ‘the acquisition of data’, and as students absorb this data, a vast amount of data is created at the same time. In fact, recent developments mean that more data is being created than ever before. Concrete statistics are hard to find, but some estimates suggest that at least 50% of classes will be delivered online by 2019. Online learning (e-learning), mobile learning (m-learning), MOOCs and blended learning are current trends that create more data than traditional teaching methods.

Harnessing this data has the ultimate goal of improving the results of students, and there are a number of key applications. For example, data technologies can make the monitoring of student performance easier and more precise. If a student were to take an online exam, staff at the institution could not only see which questions they answered correctly, but also how long they took to read and answer each one. This can provide insights for those setting the exams, and they can adjust the questions to make them more understandable.

The rise of blended learning, where students undertake a mixture of classroom and online courses, allows for even more insights. Staff can see how students perform in the two environments and tailor the balance accordingly. Analysis can be incredibly precise, even revealing whether a specific phrase in a textbook is difficult to understand. Thus, both teaching and learning methods can be improved.

Success and dropout rates can also benefit from data science, as shown in the case of Arizona State University. At this American university, learning is digital and customized to the students, and staff can then advise students about their strengths and the areas they need to work on. ASU has seen student success rates rise by 13% and dropout rates fall by 54%, a benefit of the extra support that students receive.

Figure 2: Using data can help reduce dropout rates in universities.

However, this case study highlights the first limitation of the current technologies. In mathematical and scientific courses, Big Data techniques can easily be applied to the learning process, since exams assess quantitative, not qualitative, information. Whether such analysis can work as comprehensively for essay-based subjects remains to be seen. Another significant question is whether the learning process can simply be reduced to a series of numbers. Those wary of Big Data will argue that the relationship between teacher and student is never the same, and that a one-size-fits-all policy based on data could never work.

This is where ‘small data’ enters the picture. According to Martin Lindstrom's book, small data is ‘The Tiny Clues that Uncover Huge Trends’. When applied to the education system, this would involve reducing national or international census-based assessments to the necessary minimum (whilst still maintaining anonymity). Thus, analysis of the data can be even more insightful for each individual school, as you can see the causes behind a particular trend. We expect to see an increasing number of ‘small data’ developments in the coming months and years.

One of the potential worries of a data-driven education system is likely to be a lack of privacy. However, at LUCA, we always work with anonymized and aggregated data sets, and believe that the same standards should (and will) be applied to the education system. We are excited to see how Big Data can be used to improve this area of society, are you?
