Encryption That Preserves The Format To Ensure The Privacy Of Financial And Personal Data

Gonzalo Álvarez Marañón    20 October, 2020

Your personal information swarms through thousands of databases of public and private organizations. How do you protect its confidentiality so that it does not fall into the wrong hands? At first glance, the solution seems obvious: just encrypt it. Unfortunately, in cryptography things are never that simple. Encrypting information like this without further ado poses several drawbacks. Let’s see it with an example.

Disadvantages of Encrypting Confidential Data

Imagine that an online shop or your financial institution wants to encrypt the credit card number it keeps in its database. It could resort to the standard encryption solution: AES, for example, in CTR mode with a 128-bit key and a random initialisation vector. If your card number is 4444 5555 1111 0000, the result of encrypting it with AES-128-CTR is shown in the table below, in several common encodings:

Clear text: 4444 5555 1111 0000
Encrypted text in Base64: U2FsdGVkX1/Kgcb0V8G++1DWcwyu47pWXflP2CiVda51Ew==
Encrypted text in hexadecimal: 53616c7465645f5f3601f1e979348111d342c038e9275492a1966fd8659f61a89869
Raw encrypted text (no encoding): Salted__ݺ▒Ii<½║'{☺Éqc»▬@Çþ¶ÔÈ×C♂♦

As you can see, the format of the encrypted text has nothing to do with the format of the original clear text:

  • Change in length: the encrypted text is much longer than the clear text, so it would exceed the standard length that the database allows for credit card numbers.
  • Change in format: no one would recognise this jumble as a credit card number. An attacker who steals the database doesn’t have to be very clever to realise that what he has stolen is not a ready-to-use credit card.
  • Change in character set: the record would fail validation because it contains characters that are not digits, especially in its raw form, which looks like a teenager’s WhatsApp message. The encrypted text would cause problems in the data schema.
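The three changes listed above can be checked with a few lines of Python. This is only a sketch: the validation rule `CARD_PATTERN` and the helper `looks_like_card_number` are hypothetical stand-ins for whatever rule a real schema enforces, and the encrypted value is the Base64 string copied from the table above.

```python
import re

# Hypothetical rule a legacy schema might enforce: 16 ASCII digits,
# optionally grouped in fours by single spaces.
CARD_PATTERN = re.compile(r"[0-9]{4}( ?[0-9]{4}){3}")


def looks_like_card_number(value: str) -> bool:
    """Return True if the value has the shape of a 16-digit card number."""
    return CARD_PATTERN.fullmatch(value) is not None


clear_text = "4444 5555 1111 0000"
# Base64 ciphertext copied from the table in this article.
encrypted_b64 = "U2FsdGVkX1/Kgcb0V8G++1DWcwyu47pWXflP2CiVda51Ew=="

for label, value in (("clear", clear_text), ("encrypted", encrypted_b64)):
    # Print the length and whether the value would pass the format check.
    print(label, len(value), looks_like_card_number(value))
```

The clear text passes the check, while the encrypted value is longer and contains letters and symbols, so a column or form field that expects a card number would reject it.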

This transformation of the clear text into a monstrous string will break many systems:

  • You will not be able to store it in databases that are not prepared to accept this new format.
  • You will not be able to transmit it through the usual payment gateways.
  • You will have to decrypt it every time you use it.
  • You will not be able to search for a specific card number in the database to consult its operations.

Is that not enough? Well, the problems don’t end there. If during a database query the encrypted value is decrypted to be read and then re-encrypted, AES in CTR mode will use a new random initialisation vector, so the new ciphertext will look nothing like the previous one. As proof, this new table shows the same card number encrypted with the same key, but with a different initialisation vector:

Clear text: 4444 5555 1111 0000
Encrypted text in Base64: U2FsdGVkX18OyY1wEH1Co2mFw3nXazm9e6yCGqLLAyTbug==
Encrypted text in hexadecimal: 53616c7465645f5f09c2cb2e14abda1d21bea9d22e3653e8310e6e8551a94bbf1467
Raw encrypted text (no encoding): Salted__Ñ╬T7¶Í«é¿r═§yG»¬³hºƒð7→{╩e

Nothing alike, right? As a consequence, forget about using the encrypted data as a unique key to identify a row in a database, because the value will change from one encryption to the next.
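The effect is easy to reproduce. The sketch below does not use AES: to stay within the Python standard library it swaps in a toy stream cipher whose keystream comes from HMAC-SHA256 run in counter mode, which behaves like CTR mode for the purpose of this demonstration (a fresh random nonce for every encryption). The names `encrypt`, `decrypt` and `_keystream` are hypothetical, and the construction is for illustration only.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce || counter (stand-in for AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with a fresh random nonce, prepended to the ciphertext."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and XOR the keystream back out."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = os.urandom(16)
card = b"4444 5555 1111 0000"

first = encrypt(key, card)
second = encrypt(key, card)      # same key, same card, new nonce

index = {first: "row 1"}         # ciphertext used as a lookup key
print(first.hex())
print(second.hex())
print(second in index)           # lookup with the re-encrypted value
```

Both values decrypt to the same card number, yet the two ciphertexts differ, so a lookup keyed on the first one does not find the second.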

In short, encrypting data that has a very strict format, such as a credit card number, poses several seemingly insurmountable practical limitations. But if the change in format rules out conventional encryption, how can you comply with regulations such as GDPR, PCI DSS or PSD2 and preserve data confidentiality without impairing database functionality?

What Solution Does Cryptography Provide?

The answer cryptographers have given to this dilemma is known as Format-Preserving Encryption (FPE). FPE extends classic encryption algorithms, such as AES, so that the encrypted text retains the length and format of the original. Moreover, in the particular case of a credit card, the encrypted value can even be made to pass the Luhn check. See how the above credit card number might look when encrypted with FPE:

Clear text: 4444 5555 1111 0000
Encrypted text with FPE: 1234 8765 0246 9753

With FPE, a credit card number is encrypted into a string that still looks like a credit card number and can pass the usual validation checks.
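The Luhn check mentioned above is the checksum that card numbers carry in their last digit. The following sketch shows how it is computed; the function names are hypothetical and the sample value is a widely used test number, not one taken from this article. One possible way to make an FPE output pass the check, stated here as an assumption rather than as what any particular product does, is to encrypt only the first 15 digits and then recompute the final check digit.

```python
def _luhn_sum(digits: str) -> int:
    """Sum the digits, doubling every second one counted from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9           # same as adding the two digits of d
        total += d
    return total


def luhn_valid(number: str) -> bool:
    """Return True if the number (spaces allowed) passes the Luhn check."""
    digits = number.replace(" ", "")
    return digits.isascii() and digits.isdigit() and _luhn_sum(digits) % 10 == 0


def luhn_check_digit(payload: str) -> str:
    """Return the digit that, appended to payload, makes it Luhn-valid."""
    return str((10 - _luhn_sum(payload + "0") % 10) % 10)


print(luhn_valid("4111 1111 1111 1111"))      # True
print(luhn_valid("4111 1111 1111 1112"))      # False
print(luhn_check_digit("411111111111111"))    # 1
```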

Thanks to FPE, data no longer causes errors in databases, message formats or legacy applications. And what is the biggest advantage of FPE? You can process and analyse the data while it is encrypted because it will still comply with the validation rules.

Of course, many kinds of highly formatted data beyond credit card numbers can be successfully protected with FPE:

  • IMEI number
  • Bank account number
  • Phone number
  • Social security number
  • Post code
  • ID number
  • E-mail address
  • Etc.

These identifiers are routinely used in all kinds of industries: e-commerce, finance, health, etc. The question is: how secure are these encryption methods?

FPE in the Real World

In 2013, NIST proposed three algorithms for encrypting data while preserving its format in the draft of its SP 800-38G recommendation: FF1, FF2 and FF3. If you are curious about the name, it derives from the long-standing construction they rely on, the Feistel cipher; hence the algorithms are called Feistel-based Format-preserving encryption, or FF. FF2 never saw the light of day, as it was broken during the approval process, so the recommendation finally published in 2016 specified only FF1 and FF3. As for FF3, weaknesses were found in 2017, which were addressed in the subsequent FF3-1 version. For the time being, FF1 and FF3-1 are still holding up.
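To make the Feistel idea concrete, here is a minimal toy sketch of a Feistel network that works directly on decimal digits. It is not FF1 or FF3-1 and should not be used to protect real data: the round function (HMAC-SHA256), the number of rounds and every name in it are assumptions chosen for illustration. What it does show is the core trick: the string is split into two halves, and in each round one half is combined with a keyed pseudorandom value derived from the other half using addition modulo a power of ten, so the output always has the same number of digits as the input.

```python
import hashlib
import hmac

ROUNDS = 10  # arbitrary choice for this toy example


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed pseudorandom number in [0, modulus) derived from one half."""
    msg = bytes([round_no]) + half.encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> None:
    if not (digits.isascii() and digits.isdigit()) or len(digits) % 2 != 0:
        raise ValueError("expected an even-length string of decimal digits")


def toy_fpe_encrypt(key: bytes, digits: str) -> str:
    """Encrypt a digit string into a digit string of the same length."""
    _check(digits)
    n = len(digits) // 2
    modulus = 10 ** n
    left, right = digits[:n], digits[n:]
    for r in range(ROUNDS):
        # Feistel round: (L, R) -> (R, L + F(R) mod 10^n)
        mixed = (int(left) + _round_value(key, r, right, modulus)) % modulus
        left, right = right, str(mixed).zfill(n)
    return left + right


def toy_fpe_decrypt(key: bytes, digits: str) -> str:
    """Invert toy_fpe_encrypt by running the rounds backwards."""
    _check(digits)
    n = len(digits) // 2
    modulus = 10 ** n
    left, right = digits[:n], digits[n:]
    for r in reversed(range(ROUNDS)):
        # Undo one round: (L', R') -> (R' - F(L') mod 10^n, L')
        prev = (int(right) - _round_value(key, r, left, modulus)) % modulus
        left, right = str(prev).zfill(n), left
    return left + right


key = b"demo key, not a real secret"
card = "4444555511110000"
token = toy_fpe_encrypt(key, card)
print(token, len(token))
print(toy_fpe_decrypt(key, token) == card)
```

Running the encryption twice with the same key returns the same 16-digit value, which is what makes the result usable as a database key.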

However, the FPE algorithms still have limitations:

  • FPE algorithms are deterministic: identical clear texts produce identical ciphertexts when encrypted with the same key, unlike conventional encryption, which is usually randomised. However, for data with less demanding formats, such as an email address, randomness can easily be added, since an email address can be of any length, unlike, for example, a telephone number, which has a fixed number of digits (nine in Spain).
  • FPE schemes do not provide data integrity (you have no guarantee whether the encrypted data has been changed) or sender authentication (you have no guarantee of who encrypted the data).

In the end, FPE remains an open research problem, in which we will still see many advances both in cryptanalysis (breaking algorithms) and in the creation of new, more powerful ones.
