DIARIO: Our Privacy-Friendly Document Malware Detector

Innovation and Laboratory Area in ElevenPaths    11 May, 2020

Let’s imagine that a user receives an Excel file containing information that is supposed to be private or confidential. The user suspects that it could be malware, but their local antivirus has not flagged it: the file has reached their inbox or hard drive without triggering any alert. What would happen if it really were malware? How could they check it?

If we send it to an online multi-antivirus service, or email it to an administrator who can help us identify it, and the file turns out to be legitimate, we will have disclosed confidential information in an attempt to protect our system. However, if we skip any check because we believe the document should not be shared, we risk infecting our system. This is the situation where DIARIO comes into play.


DIARIO is a new malware detection concept. It analyses documents statically, with no need to know their content: the analysis relies only on the structure and formal features of the file, never on any sensitive content. DIARIO extracts these features and uses them to build a vector that cannot be attributed to a single file. This vector is then fed to standard Machine Learning techniques to detect malware.
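
To make the idea of "structure without content" more concrete, here is a minimal sketch in Python. It is not DIARIO's actual feature set, which is not described in this post. It only assumes that a modern Office file is a ZIP container, and it derives a few numbers from the names and sizes of the parts inside it. The functions `structural_features` and `build_sample`, and the five features chosen, are hypothetical and purely illustrative.

```python
import io
import zipfile


def structural_features(data: bytes) -> list[int]:
    """Describe an Office (OOXML) container using part names and sizes only.

    Illustrative assumption: these five features are NOT DIARIO's real ones.
    No part is ever decompressed or read, so the text of the document
    never enters the vector.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = archive.infolist()
    names = [info.filename for info in infos]
    return [
        len(names),                                                 # number of parts
        sum(n.lower().endswith("vbaproject.bin") for n in names),   # macro storages
        sum("/embeddings/" in n for n in names),                    # embedded objects
        sum(n.endswith(".rels") for n in names),                    # relationship parts
        max((info.file_size for info in infos), default=0),         # largest part, in bytes
    ]


def build_sample(with_macro: bool) -> bytes:
    """Build a tiny fake workbook in memory so the sketch can run anywhere."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("_rels/.rels", "<Relationships/>")
        archive.writestr("xl/workbook.xml", "<workbook>secret figures</workbook>")
        if with_macro:
            archive.writestr("xl/vbaProject.bin", b"\x00" * 64)
    return buffer.getvalue()


if __name__ == "__main__":
    print(structural_features(build_sample(with_macro=True)))
    print(structural_features(build_sample(with_macro=False)))
```

Many different documents would produce the same handful of numbers, which is the property the paragraph above describes: the vector says something about the shape of the file, but it cannot be traced back to one particular document.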

The model is flexible and is usually trained with the latest malware samples, so that it can detect threats that traditional antivirus signatures miss and thereby complement them.

This Machine Learning-based detection system is patented and has been built entirely by ElevenPaths Innovation & Labs.

There are many Machine Learning-based solutions to detect malware, but DIARIO is different from them for the following reasons:

  • It specialises in those documents where privacy is most critical: PDF and Office files.
  • Intelligent: We have trained our Machine Learning model on the samples that antivirus engines detect least. This way we can bridge the gap between traditional solutions and real-world malware. DIARIO is not intended to replace antivirus products, but to complement them.
  • It has a dashboard for the analyst to validate and reinforce the system conveniently. Analysts can also use this dashboard to carry out malware research: attribution, detection, learning, analysis, and so on. This gives two user profiles: the user who wants the prediction service without compromising the data in their documents, and the analyst who can take advantage of the database without accessing any compromising data from the documents.
  • Analyses are really fast. Only a minimal part of the file needs to be uploaded to the server to obtain the prediction. It is not that the server discards the file: the full file is simply never required.
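
As a rough illustration of how small "a minimal part of the file" can be, the following Python sketch reduces a PDF to a count of a few structural names and serialises only those counts. This is an assumption-laden example, not DIARIO's client: the list of markers, the function names `pdf_marker_counts` and `upload_payload`, and the JSON shape are all hypothetical.

```python
import json
import re

# Hypothetical selection of PDF names often associated with active content.
STRUCTURAL_MARKERS = (
    b"/JavaScript", b"/JS", b"/OpenAction", b"/AA",
    b"/Launch", b"/EmbeddedFile", b"/ObjStm",
)


def pdf_marker_counts(data: bytes) -> dict[str, int]:
    """Count structural names in raw PDF bytes, ignoring all page text.

    Limitation of this sketch: names hidden inside compressed streams or
    written with hex escapes are not seen.
    """
    counts = {}
    for marker in STRUCTURAL_MARKERS:
        # The lookahead stops a short name from matching inside a longer one.
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9])"
        counts[marker.decode("ascii")] = len(re.findall(pattern, data))
    return counts


def upload_payload(data: bytes) -> str:
    """Serialise only the counts; this string is all that would leave the machine."""
    return json.dumps(pdf_marker_counts(data), sort_keys=True)


SAMPLE_PDF = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert('confidential')) >>\nendobj\n"
    b"%%EOF"
)

if __name__ == "__main__":
    print(upload_payload(SAMPLE_PDF))
```

In this sketch the payload contains seven small integers and none of the bytes of the script or the document text, which is why the upload can be both fast and privacy-friendly.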

How Is It Used?

DIARIO has been running for a few months now. It can be used in the following ways:

  • Web: Users just need to drag the file into the scanner box in order to receive the prediction without compromising the information from the document.
  • Email Plugin: Users can conveniently send attachments without compromising their privacy. We will give further details later.
  • Analyst Dashboard: Here documents and features can be searched, analysed, or related to each other in order to develop new research and improve collective intelligence, while maintaining the confidentiality of the document. For now, access is by invitation only.
  • Static pages: The links containing the result and the prediction can be shared as static pages.

So you don’t trust the system? Well done. That’s why we also offer ways to send only part of the file:

  • API: Anyone can use DIARIO through an API. Build your own client, plug it into your repositories, and so on. FOCA has already integrated it.
  • SDK and command-line tools: Available on our GitHub.
  • Client for Windows, Linux, and Mac: It shows the content needed for the calculation, and only what is necessary is uploaded.


We have performed tests that allow us to confirm that the level of detection (and of false positives) is on a par with other commercial solutions. In addition, we have run tests with special types of macro malware, particularly those not detected by traditional signature systems. The full report is available at https://diario.elevenpaths.com
