Post

8. Number Theory

Background

Number Theory

Let $n$ be a positive integer and let $p$ be prime.

Notation. Let $\mathbb{Z}$ denote the set of integers. We will write $\mathbb{Z}_n = \left\lbrace 0, 1, \dots, n - 1 \right\rbrace$.

Definition. Let $x, y \in \mathbb{Z}$. $\gcd(x, y)$ is the greatest common divisor of $x, y$. $x$ and $y$ are relatively prime if $\gcd(x, y) = 1$.

Definition. The multiplicative inverse of $x \in \mathbb{Z}_n$ is an element $y \in \mathbb{Z}_n$ such that $xy = 1$ in $\mathbb{Z}_n$.

Lemma. $x \in \mathbb{Z}_n$ has a multiplicative inverse if and only if $\gcd(x, n) = 1$.

Definition. $\mathbb{Z}_n^\ast$ is the set of invertible elements in $\mathbb{Z}_n$. i.e, $\mathbb{Z}_n^\ast = \left\lbrace x \in \mathbb{Z}_n : \gcd(x, n) = 1 \right\rbrace$.

Lemma. (Extended Euclidean Algorithm) For $x, y \in \mathbb{Z}$, there exists $a, b \in \mathbb{Z}$ such that $ax + by = \gcd(x, y)$.

Group Theory

Definition. A group is a set $G$ with a binary operation $* : G \times G \rightarrow G$, satisfying the following properties.

  • $(\mathsf{G1})$ (Associative) $(a * b) * c = a * (b * c)$ for all $a, b, c \in G$.
  • $(\mathsf{G2})$ (Identity) $\exists e \in G$ such that for all $a\in G$, $e * a = a * e = a$.
  • $(\mathsf{G3})$ (Inverse) For each $a \in G$, $\exists x \in G$ such that $a * x = x * a = e$. In this case, $x = a^{-1}$.

Definition. A group is commutative if $a * b = b * a$ for all $a, b \in G$.

Definition. The order of a group is the number of elements in $G$, denoted as $\left\lvert G \right\lvert$.

Definition. A set $H \subseteq G$ is a subgroup of $G$ if $H$ is itself a group under the operation of $G$. We write $H \leq G$.

Theorem. (Lagrange) Let $G$ be a finite group and $H \leq G$. Then $\left\lvert H \right\lvert \mid \left\lvert G \right\lvert$.

Proof. All left cosets of $H$ have the same number of elements. A bijection between any two coset can be constructed. Cosets partition $G$, so $\left\lvert G \right\lvert$ is equal to the number of left cosets multiplied by $\left\lvert H \right\lvert$.

Let $G$ be a group.

Definition. Let $g \in G$. The set $\left\langle g \right\rangle = \left\lbrace g^n : n \in \mathbb{Z} \right\rbrace$ is called the cyclic subgroup generated by $g$. The order of $g$ is the number of elements in $\left\langle g \right\rangle$, denoted as $\left\lvert g \right\lvert$.

Definition. $G$ is cyclic if there exists $g \in G$ such that $G = \left\langle g \right\rangle$.

Theorem. $\mathbb{Z}_p^\ast$ is cyclic.

Proof. $\mathbb{Z}_p$ is a finite field, so $\mathbb{Z}_p^\ast = \mathbb{Z}_p \setminus \left\lbrace 0 \right\rbrace$ is cyclic.

Theorem. If $G$ is a finite group, then $g^{\left\lvert G \right\lvert} = 1$ for all $g \in G$. i.e, $\left\lvert g \right\lvert \mid \left\lvert G \right\lvert$.

Proof. Consider $\left\langle g \right\rangle \leq G$, then the result follows from Lagrange’s theorem.

Corollary. (Fermat’s Little Theorem) If $x \in \mathbb{Z}_p^\ast$, $x^{p-1} = 1$.

Proof. $\mathbb{Z}_p^\ast$ has $p-1$ elements.

Corollary. (Euler’s Generalization) If $x \in \mathbb{Z}_n^\ast$, $x^{\phi(n)} = 1$.

Proof. $\mathbb{Z}_n^\ast$ has $\phi(n)$ elements, where $\phi(n)$ is the Euler’s totient function.


Schemes such as Diffie-Hellman rely on the hardness of the DLP. So, how hard is it? How does one compute the discrete logarithm?

There are group-specific algorithms that exploit the algebraic features of the group, but we only cover generic algorithms, that works on any cyclic group. A trivial example would be the exhaustive search, where if $\left\lvert G \right\lvert = n$ and given a generator $g \in G$, find the discrete logarithm of $h \in G$ by computing $g^i$ for all $i = 1, \dots, n - 1$. Obviously, it has running time $\mathcal{O}(n)$. We can do better than this.

Baby Step Giant Step Method (BSGS)

Let $G = \left\langle g \right\rangle$, where $g \in G$ has order $q$. $q$ need not be prime for this method. We are given $u = g^\alpha$, $g$, and $q$. Our task is to find $\alpha \in \mathbb{Z}_q$.

Set $m = \left\lceil \sqrt{q} \right\rceil$. $\alpha$ is currently unknown, but by the division algorithm, there exists integers $i,j$ such that $\alpha = i \cdot m + j$ and $0\leq i, j < m$. Then $u = g^\alpha = g^{i\cdot m + j} = g^{im} \cdot g^j$. Therefore,

\[u(g^{-m})^i = g^j.\]

Now, we compute the values of $g^j$ for $j = 0, 1,\dots, m - 1$ and keep a table of $(j, g^j)$ pairs. Next, compute $g^{-m}$ and for each $i$, compute $u(g^{-m})^{i}$ and check if this value is in the table. If a value is found, then we found $(i, j)$ such that $i \cdot m + j = \alpha$.

We see that this algorithm takes $2\sqrt{q}$ group operations on $G$ in the worst case, so the time complexity is $\mathcal{O}(\sqrt{q})$. However, to store the values of $(j, g^j)$ pairs, a lot of memory is required. The table must be large enough to contain $\sqrt{q}$ group elements, so the space complexity is also $\mathcal{O}(\sqrt{q})$.

To get around this, we can build a smaller table by choosing a smaller $m$. But then $0 \leq j < m$ but $i$ must be checked for around $q/m$ values.

There is actually an algorithm using constant space. Pollard’s Rho algorithm takes $\mathcal{O}(\sqrt{q})$ times and $\mathcal{O}(1)$ space.

Groups of Composite Order

In Diffie-Hellman, we only used large primes. There is a reason for using groups with prime order. We study what would happen if we used composite numbers.

Let $G$ be a cyclic group of composite order $n$. First, we start with a simple case.

Prime Power Case: Order $n = q^e$

Let $G = \left\langle g \right\rangle$ be a cyclic group of order $q^e$.1 ($q > 1$, $e \geq 1$) We are given $g,q, e$ and $u = g^\alpha$ and we will find $\alpha$. ($0 \leq \alpha < q^e)$

For each $f = 0, \dots, e$, define $g_f = g^{(q^f)}$. Then

\[(g_f)^{(q^{e-f})} = g^{(q^f) \cdot (q^{e-f})} = g^{(q^e)} = 1.\]

So $g_f$ generates a cyclic subgroup of order $q^{e-f}$. In particular, $g_{e-1}$ generates a cyclic subgroup of order $q$. Using this fact, we will reduce the given problem into a discrete logarithm problem on a group having smaller order $q$.

We proceed with recursion on $e$. If $e = 1$, then $\alpha \in \mathbb{Z}_q$, so we have nothing to do. Suppose $e > 1$. Choose $f$ so that $1 \leq f \leq e-1$. We can write $\alpha = i\cdot q^f + j$, where $0 \leq i < q^{e-f}$ and $0 \leq j < g^f$. Then

\[u = g^\alpha = g^{i \cdot q^f + j} = (g_f)^i \cdot g^j.\]

Since $g_f$ has order $q^{e-f}$, exponentiate both sides by $q^{e-f}$ to get

\[u^{(q^{e-f})} = (g_f)^{q^{e-f} \cdot i} \cdot g^{q^{e-f} \cdot j} = (g_{e-f})^j.\]

Now the problem has been reduced to a discrete logarithm problem with base $g_{e-f}$, which has order $q^f$. We can compute $j$ using algorithms for discrete logarithms.

After finding $j$, we have

\[u/g^j = (g_f)^i\]

which is also a discrete logarithm problem with base $g_f$, which has order $q^{e-f}$. We can compute $i$ that satisfies this equation. Finally, we can compute $\alpha = i \cdot q^f + j$. We have reduced a discrete logarithm problem into two smaller discrete logarithm problems.

To get the best running time, choose $f \approx e/2$. Let $T(e)$ be the running time, then

\[T(e) = 2T\left( \frac{e}{2} \right) + \mathcal{O}(e\log q).\]

The $\mathcal{O}(e\log q)$ term comes from exponentiating both sides by $q^{e-f}$. Solving this recurrence gives

\[T(e) = \mathcal{O}(e \cdot T_{\mathrm{base}} + e\log e \log q),\]

where $T_\mathrm{base}$ is the complexity of the algorithm for the base case $e = 1$. $T_\mathrm{base}$ is usually the dominant term, since the best known algorithm takes $\mathcal{O}(\sqrt{q})$.

Thus, computing the discrete logarithm in $G$ is only as hard as computing it in the subgroup of prime order.

General Case: Pohlig-Hellman Algorithm

Let $G = \left\langle g \right\rangle$ be a cyclic group of order $n = q_1^{e_1}\cdots q_r^{e_r}$, where the factorization of $n$ into distinct primes $q_i$ is given. We want to find $\alpha$ such that $g^\alpha = u$.

For $i = 1, \dots, r$, define $q_i^\ast = n / q_i^{e_i}$. Then $u^{q_i^\ast} = (g^{q_i^\ast})^\alpha$, where $g^{q_i^\ast}$ will have order $q_i^{e_i}$ in $G$. Now compute $\alpha_i$ using the algorithm for the prime power case.

Then for all $i$, we have $\alpha \equiv \alpha_i \pmod{q_i^{e_i}}$. We can now use the Chinese remainder theorem to recover $\alpha$. Let $q_r$ be the largest prime, then the running time is bounded by

\[\sum_{i=1}^r \mathcal{O}(e_i T(q_i) + e_i \log e_i \log q_i) = \mathcal{O}(T(q_r) \log n + \log n \log \log n)\]

group operations. Thus, we can conclude the following.

The difficulty of computing discrete logarithms in a cyclic group of order $n$ is determined by the size of the largest prime factor.

Consequences

  • For a group with order $n = 2^k$, the Pohlig-Hellman algorithm will easily compute the discrete logarithm, since the largest prime factor is $2$. The DL assumption is false for this group.
  • For primes of the form $p = 2^k + 1$, the group $\mathbb{Z}_p^\ast$ has order $2^k$, so the DL assumption is also false for these primes.
  • In general, $G$ must have at least one large prime factor for the DL assumption to be true.
  • By the Pohlig-Hellman algorithm, discrete logarithms in groups of composite order is a little harder than groups of prime order. So we often use a prime order group.

Information Leakage in Groups of Composite Order

Let $G = \left\langle g \right\rangle$ be a cyclic group of composite order $n$. We suppose that $n = n_1n_2$, where $n_1$ is a small prime factor.

By the Pohlig-Hellman algorithm, the adversary can compute $\alpha_1 \equiv \alpha \pmod {n_1}$ by computing the discrete logarithm of $u^{n_2}$ with base $g^{n_2}$.

Consider $n_1 = 2$. Then the adversary knows whether $\alpha$ is even or not.

Lemma. $\alpha$ is even if and only if $u^{n/2} = 1$.

Proof. If $\alpha$ is even, then $u^{n/2} = g^{\alpha n/2} = (g^{\alpha/2})^n = 1$, since the group has order $n$. Conversely, if $u^{n/2} = g^{\alpha n/2} = 1$, then the order of $g$ must divide $\alpha n/2$, so $n \mid (\alpha n /2)$ and $\alpha$ is even.

This lemma can be used to break the DDH assumption.

Lemma. Given $u = g^\alpha$ and $v = g^\beta$, $\alpha\beta \in \mathbb{Z}_n$ is even if and only if $u^{n/2} = 1$ or $v^{n/2} = 1$.

Proof. $\alpha\beta$ is even if and only if either $\alpha$ or $\beta$ is even. By the above lemma, this is equivalent to $u^{n/2} = 1$ or $v^{n/2} = 1$.

Now we describe an attack for the DDH problem.

  1. The adversary is given $(g^\alpha, g^\beta, g^\gamma)$.
  2. The adversary computes the parity of $\gamma$ and $\alpha\beta$ and compares them.
  3. The adversary outputs $\texttt{accept}$ if the parities match, otherwise output $\texttt{reject}$.

If $\gamma$ was chosen uniformly, then the adversary wins with probability $1/2$. But if $\gamma = \alpha\beta$, the adversary always wins, so the adversary has DDH advantage $1/2$.

The above process can be generalized to any groups with small prime factor. See Exercise 16.22 Thus, this is another reason we use groups of prime order.

  • DDH assumption does not hold in $\mathbb{Z}_p^\ast$, since its order $p-1$ is always even.
  • Instead, we use a prime order subgroup of $\mathbb{Z}_p^\ast$ or prime order elliptic curve group.

Summary of Discrete Logarithm Algorithms

Name Time Complexity Space Complexity
BSGS $\mathcal{O}(\sqrt{q})$ $\mathcal{O}(\sqrt{q})$
Pohlig-Hellman $\mathcal{O}(\sqrt{q_\mathrm{max}}$ $\mathcal{O}(1)$
Pollard’s Rho $\mathcal{O}(\sqrt{q})$ $\mathcal{O}(1)$
  • In generic groups, solving the DLP requires $\Omega(\sqrt{q})$ operations.
    • By generic groups, we mean that only group operations and equality checks are allowed. Algebraic properties are not used.
  • Thus, we use a large prime $q$ such that $\sqrt{q}$ is large enough.

Candidates of Discrete Logarithm Groups

We need groups of order prime, and we cannot use $\mathbb{Z}_p^\ast$ as itself. We have two candidates.

  • Use a subgroup of $\mathbb{Z}_p^\ast$ having prime order $q$ such that $q \mid (p-1)$ as in Diffie-Hellman.
  • Elliptic curve group modulo $p$.

Reduced Residue Class $\mathbb{Z}_p^\ast$

There are many specific algorithms for discrete logarithms on $\mathbb{Z}_p^\ast$.

  • Index-calculus
  • Elliptic-curve method
  • Special number-field sieve (SNFS)
  • General number-field sieve (GNFS)

GNFS running time is dominated by the term $\exp(\sqrt[3]{\ln p})$. If we let $p$ to be an $n$-bit prime, then the complexity is $\exp(\sqrt[3]{n})$. Suppose that GNFS runs in time $T$ for prime $p$. Since $\sqrt[3]{2} \approx 1.26$, doubling the number of bits will increase the running time of GNFS to $T^{1,26}$.

Compare this with symmetric ciphers such as AES, where doubling the key size squares the amount of work required.3 NIST and Lenstra recommends the size of primes that gives a similar level of security to that of symmetric ciphers.

Symmetric key length Size of prime (NIST) Size of prime (Lenstra)
80 1024 1329
128 3072 4440
256 15360 26268

All sizes are in bits. Thus we need a very large prime, for example $p > 2^{2048}$, for security these days.

Elliptic Curve Group over $\mathbb{Z}_p$

Currently, the best-known attacks are generic attacks, so we can use much smaller parameters than $\mathbb{Z}_p^\ast$. Often the groups have sizes about $2^{256}$, $2^{384}$, $2^{512}$.

  1. We didn’t require $q$ to be prime! 

  2. A Graduate Course in Applied Cryptography 

  3. Recall that the best known attack was only 4 times faster than brute-force search. 

This post is licensed under CC BY 4.0 by the author.