Initialization vector mishandling

Florian Picca

Crypto

Feb. 2023

Introduction

To encrypt data, one needs to choose a suitable encryption algorithm and generate a key, but most of the time additional parameters are required. In this blog post, we will focus on the initialization vector (IV), which is a parameter used by the most common symmetric encryption algorithms (AES-CBC, AES-CTR and AES-GCM). The majority of vulnerabilities I encounter during cryptographic reviews come from mishandling of this IV.

In this post we will look at what an IV is, why it is important and how to handle it safely depending on the chosen algorithm.

Background

An initialization vector (IV) is an additional input required by some cryptographic algorithms. Its purpose is to ensure that encrypting the same plaintext twice under the same key produces different ciphertexts, which avoids repeated blocks, makes the encryption stronger and provides semantic security.

The IV can be compared to a seed in a Pseudo-Random Number Generator (PRNG). The seed in a PRNG determines the starting point for the random number sequence, while the IV in encryption determines the starting point for the ciphertext. A different seed in a PRNG will result in a different sequence of random numbers, and a different IV in encryption will result in a different ciphertext, even if the plaintext stays the same.

Example

from Crypto.Cipher import AES
key = b'K'*16
msg = b'Just some data i want to encrypt'
IV = b'I'*16
print(AES.new(key=key, mode=AES.MODE_CBC, iv=IV).encrypt(msg).hex())
# 1623202e81366f70b20d3c1f47c7fb3760528bfcffc60f5bb9b8e0750d2f220d
# now change IV
IV = b'J'*16
print(AES.new(key=key, mode=AES.MODE_CBC, iv=IV).encrypt(msg).hex())
# 1023ba96d4d22eeca3f60cf091d52ae47f4bef84cf6cf99df71e5b12097daaf4

Common pitfalls

You might wonder what could go wrong if the IV is used improperly. Some developers simply don’t know what to do with it and just set it to zero. But beware, doing so can have disastrous consequences depending on the encryption algorithm used. Take the Zerologon vulnerability for example, which abused an all-zero IV for AES-CFB8.

In this section we will cover some of the most common mistakes that can be made when choosing an IV for the three most widespread symmetric encryption algorithms based on the Advanced Encryption Standard (AES).

The AES internals are outside the scope of this post. You only need to know that it takes a 16-byte plaintext and a symmetric key (128, 192 or 256 bits) as inputs and produces a 16-byte ciphertext. Thus, AES is part of the block cipher family.

To encrypt data longer than 16 bytes, block ciphers can be used in different ways called “modes of operation”. The three modes of operation we will discuss are CBC, CTR and GCM.

AES-CBC

The name CBC stands for “Cipher Block Chaining”. The encryption process is depicted below.

CBC encryption schema

As you can see, a plaintext block is XORed with the previous ciphertext block before being encrypted. But if the encryption of each block depends on the result of the previous one, what do we do with the first block? We need an additional initial value to fulfill the role of this imaginary previous block. That’s exactly what the IV does.

In more mathematical notation:

$$ \begin{cases} C_i = E_{key}(P_i \oplus C_{i-1}) \\ C_0 = IV \end{cases} $$
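The recurrence above can be sketched in a few lines of Python. To keep the sketch free of dependencies, the block function below is a stand-in built from HMAC-SHA256 truncated to 16 bytes: this is an assumption made for illustration only, it is not AES and it cannot be decrypted. The names `toy_block_encrypt` and `cbc_encrypt` are hypothetical helpers, not part of any library.

```python
import hashlib
import hmac

BLOCK = 16

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for E_key: a keyed function truncated to 16 bytes.
    # NOT AES and not invertible; it only serves to show the chaining.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> list:
    assert len(iv) == BLOCK and len(plaintext) % BLOCK == 0
    previous = iv  # C_0 = IV
    blocks = []
    for i in range(0, len(plaintext), BLOCK):
        # C_i = E_key(P_i xor C_{i-1})
        previous = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], previous))
        blocks.append(previous)
    return blocks

key = b'K' * 16
msg = b'Just some data i want to encrypt'
with_i = cbc_encrypt(key, b'I' * 16, msg)
with_j = cbc_encrypt(key, b'J' * 16, msg)
# Changing the IV changes the first block, and therefore every block after it
print(with_i[0] != with_j[0])
```

Because every block feeds into the next one, the IV influences the whole ciphertext and not only its first block.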

Knowing how the CBC mode works will help understand the mistakes showcased below.

Constant IV

As mentioned previously, it is common for developers to simply use an IV composed of 16 null bytes. By looking at the previous figure, you can see that if two messages start with the same n blocks, the resulting ciphertexts will also start with the same n blocks.

Example

from Crypto.Cipher import AES
key = b'K'*16
msg = b'Just some data i want to encrypt'
IV = b'\x00'*16
print(AES.new(key=key, mode=AES.MODE_CBC, iv=IV).encrypt(msg).hex())
# 13b5bb3bab8dc9c626a8708bc397efb4 3e026dbc88eb4955b9f05db9b82aaac5
# now change last block of plaintext
msg = b'Just some data i want to secure.'
print(AES.new(key=key, mode=AES.MODE_CBC, iv=IV).encrypt(msg).hex())
# 13b5bb3bab8dc9c626a8708bc397efb4 d2ea901a255183553e073280c09a3ebc

This may allow an attacker to gain some information about your encrypted data: he can tell whether two ciphertexts were produced from plaintexts sharing the same prefix (of at least 16 bytes). This is enough to say that the CBC mode with a constant IV is not semantically secure.

However, the real threat this exposes you to depends heavily on the context. For example, if you use this to encrypt files in a database, it would be possible to distinguish some file types. PNG images always start with the same 16 bytes, so it would be possible to differentiate such files from other types. There are obviously other scenarios with a variety of impact levels.

Also note that this vulnerability does not come from the value of the IV itself, but from the fact that it is constant. And by constant, I mean it is always the same for a given key. It is important not to reuse key/IV pairs.

Predictable IV

To never reuse an IV with the same key, you might be tempted to use a counter. This is actually what some other modes of operation do, as we will see for the CTR and GCM modes. But in the case of the CBC mode, this is not a good idea, especially for interactive protocols.

Recall the following relation for the CBC mode:

$$ C_i = E_{key}(P_i \oplus C_{i-1}) $$

If the attacker can predict the IV that will be used for the next encryption operation ($IV^\prime$), he can test a guess ($P_g$) of a plaintext block $P_i$ whose ciphertext $C_i$ and preceding ciphertext block $C_{i-1}$ he has already observed.

To do so, he computes the following plaintext value and submits it for encryption:

$$ P^\prime = C_{i-1} \oplus P_g \oplus IV^\prime $$

The victim will encrypt this value, resulting in :

$$ \begin{aligned} C^\prime &= E_{key}(P^\prime \oplus IV^\prime) \\ &= E_{key}(C_{i-1} \oplus P_g \oplus IV^\prime \oplus IV^\prime) \\ &= E_{key}(C_{i-1} \oplus P_g) \\ \end{aligned} $$

If the guess is correct ($P_g = P_i$) then the resulting ciphertext should match the previously known one ($C^\prime = C_i$).

Example

Imagine a protocol that encrypts each network packet by using the last encrypted block as the new IV. Also suppose that transmitted data is limited to “Yes” or “No”.

The first message “Yes” is encrypted below:

from Crypto.Cipher import AES
from Crypto.Util.Padding import pad
key = b'K'*16
msg = pad(b'Yes', 16)
IV = b'\x00'*16
enc = AES.new(key=key, mode=AES.MODE_CBC, iv=IV).encrypt(msg)
print(enc.hex())
# 7ed27092a092b1265ac219821a5a066d

An attacker who intercepted the communication, and who can have chosen plaintexts encrypted, might try to recover the message by submitting guesses until the right value is found:

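def strxor(a, b):
    # XOR two byte strings byte by byte
    return bytes([x ^ y for x,y in zip(a, b)])

# The next IV is the last ciphertext block, so the attacker knows it in advance.
# Here C_{i-1} is the all-zero IV of the first message, so P' = P_g xor IV'.
IV = enc
guess = strxor(pad(b'No', 16), IV)
enc = AES.new(key=key, mode=AES.MODE_CBC, iv=IV).encrypt(guess)
print(enc.hex())
# 5abccc68cf2f949ed187924897da24e8 (no match: the message was not "No")

IV = enc
guess = strxor(pad(b'Yes', 16), IV)
enc = AES.new(key=key, mode=AES.MODE_CBC, iv=IV).encrypt(guess)
print(enc.hex())
# 7ed27092a092b1265ac219821a5a066d (match: the message was "Yes")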

When the guess is right, the encryption result matches the intercepted one.

Although this example might seem far-fetched, this is roughly what is done by the BEAST attack, allowing it to decrypt traffic in TLS 1.0 and below.

Choosing the IV

If you have to use the CBC mode of operation, make sure to always generate a random IV using a cryptographically strong PRNG to avoid predictability.
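As a minimal sketch of this advice, the IV can be drawn from the operating system's CSPRNG through Python's `secrets` module and transmitted in front of the ciphertext, since the IV does not need to stay secret once the message has been encrypted. The helper names `new_cbc_iv`, `pack` and `unpack` are hypothetical, and the wire format (IV followed by ciphertext) is an assumption, not something the CBC mode mandates.

```python
import secrets

BLOCK_SIZE = 16  # AES block size in bytes, which is also the CBC IV size

def new_cbc_iv() -> bytes:
    # secrets draws from the operating system's CSPRNG
    return secrets.token_bytes(BLOCK_SIZE)

def pack(iv: bytes, ciphertext: bytes) -> bytes:
    # Assumed wire format: the IV travels in clear in front of the ciphertext
    return iv + ciphertext

def unpack(blob: bytes):
    # Split a received message back into (IV, ciphertext)
    return blob[:BLOCK_SIZE], blob[BLOCK_SIZE:]

iv = new_cbc_iv()
print(iv.hex())  # a fresh value on every call
```

A new IV has to be generated for every message. Avoid the `random` module here, as its generator is not designed for cryptographic use.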

AES-CTR

The name CTR stands for “Counter”. The encryption process is depicted below.

CTR encryption schema

Unlike the CBC mode, the plaintext is not passed to the underlying block cipher. The IV acts as a counter (the most significant bits of this counter are initialized with a nonce in the above picture) that is incremented for each plaintext block. Each counter value is encrypted by the block cipher to produce a keystream, and the plaintext is then XORed with this keystream. This construction is similar to stream ciphers.

Keep this in mind while we look at the incorrect ways of handling the IV for this mode.

Constant or reused IV

Just like for CBC mode, you should not reuse a key/IV pair. Although the security of the CBC mode is somewhat resilient to a single reuse, the CTR mode’s security would be completely broken!

If the same IV is used twice, the same keystream ($KS$) will be produced. The attacker can then compute the XOR of the two plaintexts, allowing him to decrypt them using crib dragging. This is the same attack path as the one used to break the two-time pad.

$$ \begin{aligned} C_1 &= P_1 \oplus KS \\ C_2 &= P_2 \oplus KS \\ C_1 \oplus C_2 &= P_1 \oplus P_2 \end{aligned} $$

Example

Let’s assume an attacker has access to the encrypted values of two messages encrypted using the same key/IV pair:

from Crypto.Cipher import AES
from Crypto.Util import Counter
key = b'K'*16
msg = b'Just some data i want to encrypt'
IV = Counter.new(initial_value=0, nbits=128)
print(AES.new(key=key, mode=AES.MODE_CTR, counter=IV).encrypt(msg).hex())
# 6c4ec8cbc99e98836b876432ed8108a959c1900c86f31e4c6bb1c37eabe8f649
msg = b'This is a super secret sentence!'
print(AES.new(key=key, mode=AES.MODE_CTR, counter=IV).encrypt(msg).hex())
# 7253d2ccc98484ce6f877326e9855ae00ad3921097a74a502ebad978b7f2e31c

If he already knows (even partially) one of the two messages, he can (even partially) decrypt the other one like this:

def strxor(a, b):
    return bytes([x ^ y for x,y in zip(a, b)])

c1 = bytes.fromhex("6c4ec8cbc99e98836b876432ed8108a959c1900c86f31e4c6bb1c37eabe8f649")
c2 = bytes.fromhex("7253d2ccc98484ce6f877326e9855ae00ad3921097a74a502ebad978b7f2e31c")
msg = b'Just some data i want to encrypt'
ks = strxor(c1, msg)
print(strxor(c2, ks))
# b'This is a super secret sentence!'
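When neither message is known, crib dragging can still recover fragments: the attacker slides a guessed word (the crib) along $C_1 \oplus C_2$ and keeps the positions where the result looks like readable text. The sketch below is a rough illustration under stated assumptions: a random keystream stands in for the reused AES-CTR keystream, the crib ` secret ` is a lucky guess, and `crib_drag` is a hypothetical helper with a deliberately crude readability filter (ASCII letters and spaces only).

```python
import secrets
import string

ALLOWED = set((string.ascii_letters + ' ').encode())

def strxor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def crib_drag(xored, crib):
    # Try the crib at every offset and keep the offsets where the
    # fragment revealed from the other plaintext looks like text
    hits = []
    for offset in range(len(xored) - len(crib) + 1):
        candidate = strxor(xored[offset:offset + len(crib)], crib)
        if all(c in ALLOWED for c in candidate):
            hits.append((offset, candidate))
    return hits

p1 = b'Just some data i want to encrypt'
p2 = b'This is a super secret sentence!'
keystream = secrets.token_bytes(len(p1))  # stand-in for the reused CTR keystream
c1, c2 = strxor(p1, keystream), strxor(p2, keystream)

# The attacker only sees c1 and c2; the keystream cancels out in c1 xor c2
for offset, fragment in crib_drag(strxor(c1, c2), b' secret '):
    print(offset, fragment)
```

The hits include offset 15, where the crib reveals `b'i want t'` from the first message. Other offsets may show up as false positives, which the attacker weeds out by extending the guess on either side.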

Hopefully, you now see why you should never reuse a key/IV pair with the CTR mode.

Random IV

To never reuse an IV with the same key, you might be tempted to generate the IV randomly just like for the CBC mode. But, you guessed it, this idea is not as good as it seems because of the Birthday Paradox.

The key takeaway is that if you generate n-bit random values, the probability of obtaining a collision reaches roughly 40% after just $2^{\frac{n}{2}}$ generations, and 50% shortly after (around $1.18 \times 2^{\frac{n}{2}}$ generations).

If n is sufficiently large, the probability of collision is negligible for a longer time, but keep in mind that a single collision is enough to break confidentiality.
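To get a feel for the numbers, the usual birthday approximation $p \approx 1 - e^{-k(k-1)/2^{n+1}}$ can be evaluated for a few IV sizes and message counts. The function name `collision_probability` and the chosen sizes are illustrative assumptions; the formula is an approximation, not an exact bound.

```python
import math

def collision_probability(n_bits: int, generations: int) -> float:
    # Birthday approximation: p ~= 1 - exp(-k(k-1) / 2^(n+1))
    exponent = -generations * (generations - 1) / 2 ** (n_bits + 1)
    # expm1 keeps precision when the probability is tiny
    return -math.expm1(exponent)

for n_bits, k in [(64, 2**32), (96, 2**32), (128, 2**32), (128, 2**64)]:
    p = collision_probability(n_bits, k)
    print(f"{n_bits}-bit random IV, 2^{int(math.log2(k))} messages: p ~ {p:.3g}")
```

With a 64-bit random part, $2^{32}$ messages already give a collision probability of about 0.39, while the same number of messages with a 128-bit random value leaves it negligible.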

You can read more about it here.

Choosing the IV

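You should choose the IV in a way that guarantees it will never be reused with the same key. But keep in mind that encrypting a message composed of n blocks actually consumes n consecutive counter values, so the IVs of two messages must be far enough apart for these ranges never to overlap.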

Depending on the context, using a random IV might be the easiest solution providing enough collision resistance.

Choosing the IV is a delicate task that depends heavily on the context, in particular on the number of messages encrypted under the same key and on the size of those messages.
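The fact that a message of n blocks consumes n consecutive counter values can be made concrete with a small sketch. It compares a flawed scheme, where the whole 128-bit counter is simply incremented by one per message, with a layout that reserves the upper 64 bits for a per-message number and the lower 64 bits for the block counter. The 64/64 split and the helper names are assumptions chosen for illustration; other splits are possible.

```python
BLOCK_SIZE = 16

def consumed_counters(initial_counter: int, message_length: int) -> range:
    # A message of n blocks uses counter values initial, initial+1, ..., initial+n-1
    n_blocks = -(-message_length // BLOCK_SIZE)  # ceiling division
    return range(initial_counter, initial_counter + n_blocks)

def overlap(a: range, b: range) -> bool:
    # True when the two messages share at least one counter value
    return max(a.start, b.start) < min(a.stop, b.stop)

# Flawed: the initial counter is incremented by one for each message
first = consumed_counters(0, 32)   # uses counter values 0 and 1
second = consumed_counters(1, 32)  # uses counter values 1 and 2
print(overlap(first, second))      # True: one keystream block is reused

def initial_counter(message_number: int) -> int:
    # Assumed layout: message number in the upper 64 bits, block counter below
    return message_number << 64

first = consumed_counters(initial_counter(0), 32)
second = consumed_counters(initial_counter(1), 32)
print(overlap(first, second))      # False
```

With this layout, ranges cannot overlap as long as each message stays below $2^{64}$ blocks and no message number is ever used twice under the same key.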

AES-GCM

The name GCM stands for “Galois/Counter Mode”. The encryption process is depicted below.

GCM encryption schema

This mode acts like the CTR mode, but offers an additional authentication tag. You don’t need to understand all of the above figure, just notice that the ciphertext is constructed in the same way as for the CTR mode.

Because of this similarity, the same IV generation flaws are applicable.

Constant or reused IV

Reusing a single key/IV pair for this mode has the same consequences on confidentiality as for the CTR mode. But the GCM mode offers data authentication and this also severely suffers from IV reuse. In fact, a single IV reuse completely breaks the authentication property and even exposes the internal authentication key, allowing an attacker to forge new tags.

Choosing the IV

The GCM mode is designed for 12-byte (96-bit) IVs. They should be generated sequentially, for example from a message counter, to guarantee that no IV is ever used twice with the same key.
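A sequential IV generator can be as simple as a counter encoded on 12 bytes. The class below is a hypothetical helper written for illustration (the name `NonceSequence` and its interface are assumptions); it refuses to wrap around rather than hand out the same IV twice.

```python
import threading

class NonceSequence:
    """Hypothetical helper: hands out 96-bit IVs from a counter, never twice."""

    def __init__(self, start: int = 0):
        self._next = start
        self._lock = threading.Lock()  # two threads must never get the same value

    def next_iv(self) -> bytes:
        with self._lock:
            value = self._next
            if value >= 2 ** 96:
                raise OverflowError("IV space exhausted: rotate the key")
            self._next = value + 1
        # 12 bytes, big-endian, as expected for a 96-bit GCM IV
        return value.to_bytes(12, 'big')

ivs = NonceSequence()
print(ivs.next_iv().hex())  # 000000000000000000000000
print(ivs.next_iv().hex())  # 000000000000000000000001
```

This only guarantees uniqueness if the counter survives restarts (for instance by persisting it) or if a fresh key is generated whenever the counter is reset. When several senders share a key, each one also needs its own disjoint range of values.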

Conclusion

The IV is a crucial element for the security of encryption schemes. It should be carefully chosen depending on the encryption mode used and the context.

CBC mode should use an unpredictable, randomly generated IV, while CTR and GCM might prefer a sequentially generated one to avoid collisions.

Mishandling IVs can have disastrous consequences for the security of your product. We at Stackered can assess the security of your cryptographic mechanisms and help you make the best choices depending on your specific needs and constraints.