I first learned about the RSA algorithm in high school. The actual computation wasn’t difficult to understand, but the fact that it worked seemed magical. I spent a good year trying to understand it back then, and even so it was only in college that I figured it out to any level of confidence. The beautiful simplicity of the algorithm was one of the things that got me into cryptography in the first place, so here is the layman’s explanation I wish I had heard in high school, building RSA from the ground up, free of any messy business like group theory.

However, this does not mean we can just jump into it. First we need some basics.

A Brief Primer on Modular Arithmetic

Many may already know that modulo simply means the remainder after division: \(13 \bmod 6 = 1\) for instance, \(144 \bmod 100 = 44\), and \(88 \bmod 44 = 0\) as well. This is often described as “clock arithmetic” in high school: if you add 13 hours to a clock, that’s the same as adding 1 hour.

Which brings us to the interesting properties of modular arithmetic, namely that our familiar arithmetic operations still work under a modular system.

Modulo is Distributive Over Addition

$$(x + y) \bmod z = (x \bmod z + y \bmod z) \bmod z$$

Essentially, modulo distributes over addition, i.e. we can take the modulo as many times as we like between arithmetic steps. For example, \((144 + 362) \bmod 100\)\(= (44 + 62) \bmod 100 \)\(= 106 \bmod 100 = 6\).

The reason this works in general should be pretty clear. The expression \((x + y) \bmod z\) can be split up as $$(x + y) \bmod z = ((az + b) + (cz + d)) \bmod z$$ where \(az + b = x\) and \(cz + d = y\), with \(b\) and \(d\) less than \(z\) (if \(x\) or \(y\) is less than \(z\), then \(a\) or \(c\) is of course 0). Note that \(b\) and \(d\) are exactly \(x \bmod z\) and \(y \bmod z\). We can rearrange this to get $$((a + c)z + b + d) \bmod z = (b + d) \bmod z$$ $$= (x \bmod z + y \bmod z) \bmod z$$ Any multiple of \(z\) is taken out by the modulo operation, leaving us with what we have at the end.

The intuition for this is our clock arithmetic from earlier. Adding 13 hours then 14 hours merely adds \(13 + 14 \bmod 12 = 1 + 2 = 3\) hours to our clock. We can think of any amount of “overwinding” from either of these quantities as getting eaten by the modulo.
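This property is easy to check numerically. Here is a quick Python sanity check (the sample values are arbitrary):

```python
# Verify (x + y) mod z == ((x mod z) + (y mod z)) mod z on a few samples.
samples = [(144, 362, 100), (13, 14, 12), (999, 1, 7)]
for x, y, z in samples:
    assert (x + y) % z == ((x % z) + (y % z)) % z
print((144 + 362) % 100)  # matches the worked example: 6
```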

Modulo is Distributive over Multiplication

$$xy \bmod z = (x \bmod z)(y \bmod z) \bmod z$$

This is a little harder to visualize intuitively, but mathematically it goes as follows. We can similarly split up \(x\) and \(y\) into a multiple of \(z\) plus a remainder.

$$xy \bmod z = (az + b)(cz + d) \bmod z$$ $$= (acz^2 + adz + bcz + bd) \bmod z$$ $$= bd \bmod z$$

Similar to the case with addition, any term containing a \(z\) in it is eaten by the modulo.
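The multiplication version checks out numerically the same way:

```python
# Verify x*y mod z == ((x mod z) * (y mod z)) mod z on a few samples.
for x, y, z in [(144, 362, 100), (25, 30, 7), (81, 64, 13)]:
    assert (x * y) % z == ((x % z) * (y % z)) % z
```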

And since it deserves a brief mention, this should make sense as well:

$$z^a \bmod z = 0$$

is true for any whole number \(a \geq 1\).

Building RSA

So now that we are here, let’s discuss our goals, and more specifically what RSA actually does.

RSA is of course an acronym for its creators, Ron Rivest, Adi Shamir, and Leonard Adleman. It is an asymmetric encryption scheme, also known as a public-key encryption scheme. The goal was to create an encryption scheme as follows.

I am able to give you a piece of information \(P\); let’s call this a public key. I keep a secret key \(S\). You are able to at least trust that I was the one who gave you \(P\), and I don’t have to keep \(P\) secret; in fact, I can give \(P\) out to everyone.

Using \(P\), you can encrypt a message \(m\) by some means. No other holder of \(P\) can decrypt the encrypted message, but I, as the owner of \(S\), can. All the algorithms and so on are public; the only secret thing I have is \(S\).

\(S\) and \(P\) can be made up of some pieces of data, but all can of course be encoded as a set of numbers, as can the message.

So let’s formalize what we need.

Properties of a Public-Key Encryption Scheme

Let’s first discuss what it was that Rivest was looking for that night: what exactly defines a public-key encryption scheme?

Firstly, there are two sets of data we are operating on: a public key \(P\) and a private key \(S\). Everything here is of course numbers; \(P\) and \(S\) can be numbers or sets of numbers, and the messages are numbers. Otherwise we wouldn’t be talking about math.

Now of course we need to encrypt with a key, which is done as follows. Given some message \(m\),

$$\text{Encryption}: Enc(P, m) = c$$

And decryption.

$$\text{Decryption}: Dec(S, c) = m$$

Firstly, one thing to note is that these should be fixed-input and fixed-output functions. That is, we can define a maximum input size and a maximum output size on these functions such that we can reasonably send and receive them.

These functions also act as reverses of each other, that is

$$\text{Reversibility}: Dec(S, Enc(P, m)) = m$$

This adds one additional interesting property: every output of the encrypt function is unique. No two inputs encrypt to the same thing, as otherwise we wouldn’t be able to uniquely decrypt each.

This turns out to not be quite the whole story for RSA: there is a very, very small proportion of possible inputs that RSA handles poorly (a few messages, for instance, encrypt to themselves). But it’s such a small percentage of the possible inputs that it’s unlikely to come up in practice, and it can be avoided by other means; more on this later.

And lastly, if something is encrypted, given all the public information they should not be able to decrypt it, except if they have the secret key \(S\).

$$\text{Security}: \text{Given the encrypted message } c \text{ and } P,$$ $$\text{discovering anything about the plaintext } m \text{ is hard}$$

$$\text{Without knowing } S \text{, there is no known}$$ $$\text{reasonably computable function } f(c, P) = m$$

Now, there are very formal definitions of these, but I don’t want to get too deep into big-O notation or anything of that sort. “Hard” means that brute force, or some slightly smarter brute force, is really the best way we know to compute something. The basic idea is that only the holders of the secret key should be able to decrypt an encrypted message.

Analyzing these properties, we can discuss a few interesting facts.

The ciphertext \(c\) has properties of the plaintext message \(m\) hidden within it. Perhaps this is obvious, but it’s something I feel needs to sink in a little more to truly understand this.

The main issue is that most mathematical functions we know of don’t hide these properties well; we can pretty easily reverse them. Or, if they do hide these properties well enough, we lose information in the process. For instance, can you tell me what input \(x\) I gave to \(x \bmod 62\) to get 40? The input isn’t hidden here due to some complex mathematical property; it’s hidden due to the sheer number of inputs that give this output.

Functions that are easy to compute in one direction but hard to compute in the other are known as one-way functions. Perhaps we can find some hard problems that give us useful one-way functions.

Hard Problems

One-way functions are typically based on hard problems, so let’s investigate some hard problems as examples.

The Prime Factorization Problem

It turns out that finding the prime factors of a number is considered a hard problem.

Suppose I take two prime numbers \(p\) and \(q\). I don’t tell you what they are, but I multiply them together and give you the product. What can you do to find \(p\) and \(q\)?

The best answer a novice mathematician can give is the brute force answer, guessing and checking. The best answer a less novice mathematician can give is a slightly smarter version of brute force, but not smart enough to make this an easy problem.
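To make the brute-force answer concrete, here is a minimal trial-division sketch in Python (the function name `factor` is my own):

```python
def factor(n):
    """Brute-force search for a nontrivial factor of n by trial division."""
    d = 2
    while d * d <= n:       # a composite n must have a factor <= sqrt(n)
        if n % d == 0:
            return d, n // d
        d += 1
    return None             # no factor found: n is prime

print(factor(11 * 13))  # (11, 13)
```

This is fine for small numbers, but the loop grows with the square root of \(n\); for the hundreds-of-digits products used in practice it is hopeless.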

The Discrete Logarithm Problem

A logarithm in classical mathematics: given the equation \(a^x = b\) and provided \(a\) and \(b\), one must find \(x\). We have pretty good ways of computing these over the real numbers.

But suppose that, for some whole numbers \(a\), \(b\), and \(c\), I take

$$a^x \bmod b = c$$

If I provide you with \(a\), \(b\), and \(c\), are you able to find an input \(x\) satisfying this equation?

It also turns out that if I instead give you \(x\), \(b\), and \(c\), finding \(a\) instead is similarly difficult.

This is similarly a hard problem. We have made some progress on more sophisticated algorithms that do better than raw brute force, but nothing close to making the problem easy.

One remarkable thing about modular exponentiation is the fact that it “distributes.” That is, \(a^{bc} \bmod d = (a^b \bmod d)^c \bmod d\). In other words, despite being very difficult to reverse, modular exponentiation still preserves enough structure that we can keep operating on its result. This may prove to be useful.
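We can check this “distribution” with Python’s built-in three-argument `pow`, which performs modular exponentiation (the sample values are arbitrary):

```python
a, b, c, d = 5, 13, 7, 143
# Exponentiating in two stages, reducing mod d in between,
# gives the same answer as one big exponentiation.
assert pow(a, b * c, d) == pow(pow(a, b, d), c, d)
```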

Getting Somewhere, Fermat’s Little Theorem

It turns out modular exponentiation has some interesting properties; one of the most interesting conclusions was reached by the 17th-century French mathematician Pierre de Fermat. In his famous “little theorem,” he stated that if we have a prime number \(p\) and any whole number \(a < p\), then

$$a^p \bmod p = a$$

Now, this is a bit of a weird thing to stumble upon. Why would it be true?

There are a number of proofs for this theorem, but without getting into group theory, here is a very simple one based off of mathematical induction.

Proof of Fermat’s Little Theorem

This proof is based on mathematical induction. For those unfamiliar with this proof technique, it goes as follows. Suppose we want to prove that something holds for all the whole numbers. One way to prove this is to prove the two following statements.

  1. Base Case: Prove that it works for 1, or some other base case

  2. Induction: Prove that if it works for \(a\), it works for \(a+1\)

The idea here is that once we have proven the case of 1, we can use the second statement to say it works for 2 because it works for 1. We can then say it works for 3 because it works for 2, and so on. These are often called domino proofs: if you “tip the first domino” (the base case) and can show that each domino tips the next (the induction), then all the dominos fall.

We use this technique to prove Fermat’s Little Theorem.

  1. Base Case: We firstly know that \(1^p \bmod p = 1\) for any prime number \(p\). True by inspection.

  2. Induction: We assume that \(a^p \bmod p = a\) is true and try to prove the case of \(a+1\), using the binomial theorem.

$$(a + 1)^p \bmod p = a^p + {\binom{p}{1}}a^{p-1}1^1 + {\binom{p}{2}}a^{p-2}1^2 + \cdots$$

$$+ \binom{p}{p-1}a^{1}1^{p-1} + 1^p \bmod p$$

Because \({\binom{p}{n}} = \frac{p!}{(p-n)!n!}\) is an integer, and for any \(1 \leq n < p\) the denominator contains no factor of \(p\) while the numerator does, every such binomial coefficient is a multiple of \(p\). This means all the middle terms are multiples of \(p\) and thus taken out by the modulo. Only the first and last terms are left, thus

$$(a + 1)^p \bmod p = a^p + 1^p \bmod p$$

Since we assumed that \(a^p \bmod p = a\) and know that \(1^p \bmod p = 1\), we can say that

$$(a + 1)^p \bmod p = a + 1$$
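The theorem is easy to verify exhaustively for small primes; here is a quick Python check:

```python
# Check a^p mod p == a for every a below a few small primes.
for p in (2, 3, 5, 7, 11, 13):
    for a in range(1, p):
        assert pow(a, p, p) == a
```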

So how is this useful?

Well, given the discrete log problem, maybe we can abuse it to create an encryption scheme.

RSA Attempt 1

So let’s try to create an encryption scheme with our knowledge.

We know that with exponentiation we can split up exponents: \(a^{bc} = (a^b)^c\). This holds under modular arithmetic as well, which follows from how modulo distributes over multiplication, treating exponentiation as repeated multiplication.

So what if we did this.

We find a big prime \(p\). We split \(p\) into whole numbers \(e\) and \(d\) with \(p = ed\). We then publish \(p\) and \(e\). If someone wants to encrypt a message \(m\), they do

$$m^e \bmod p = c$$

Due to the discrete log problem, no one can take \(c\) and reverse it back into the message \(m\) as that is a hard problem.

As the holder of the secret key \(d\), I do

$$c^d \bmod p = (m^e)^d \bmod p $$ $$ = m^{ed} \bmod p = m^p \bmod p = m$$

And decrypt the message.

Except the astute of you probably have already noticed the problem.

Firstly, \(p\) is prime; there is no way to write \(p = ed\) with whole numbers besides 1 and \(p\) itself, and neither of those cases is particularly useful to me.

Secondly, even if I were able to do so, this entire system relies on \(d\) being kept secret. But anyone can compute \(d = \frac{p}{e}\). Not so secret anymore.

So our brilliant scheme was foiled. But perhaps we can build this failure into something that works.

RSA Attempt 2

So while we cannot use Fermat’s Little Theorem itself, let’s try a slightly weaker notion of it and see if that works. It turns out that Fermat’s Little Theorem is simply a special case of what is known as Euler’s theorem (of course Euler has his hands in this too). I’m not going to get into it much here, but it’s good to note.

Since our method fails with prime numbers, let’s see if we can do something with a number that is almost prime.

So we take two large prime numbers \(p\) and \(q\). Multiply them together to produce \(n=pq\).

One thing to note here is that \(n\) is protected by the prime factorization problem. That is, provided \(n\), it is difficult to find \(p\) and \(q\).

The next question, of course, is what exponent gets this weaker notion of Fermat’s Little Theorem to work. A first naive guess would probably be \(n\) itself, so let’s try it, say with \(n = 13(11) = 143\):

$$3^{143} \bmod 143 = 126$$

Well that wasn’t quite right.

A second thought that might come to mind is to test if an exponent works with Fermat’s little theorem on the individual primes.

$$3^{143} \bmod 13 = 9$$

$$3^{143} \bmod 11 = 5$$

We can see that it doesn’t work this way either, but perhaps it’s worth investigating this method further.

The result \(a \bmod p = b\) implies that \(a\) can be written as \(a = pg + b\) for some whole number \(g\). If we have another such relation, say \(c \bmod q = d\), then similarly \(c = qh + d\). And if the same number satisfies both, i.e. \(a = c\), then \(pg + b = qh + d\).

Now, this doesn’t quite help us; there are too many unknowns. But we can reduce some of them using a little trick. Notice that with Fermat’s little theorem, if we divide both sides by our integer \(a\) (valid whenever \(a\) is not a multiple of \(p\)), we get

$$a^{p-1} \bmod p = 1$$

So instead of searching for exponents where our number is returned after the modular exponentiation, we instead look for exponents where the number is 1.

This simplifies our attempts above, as we can have \(b = d = 1\), which means \(pg + 1 = qh + 1 \implies pg = qh\).

So how does this help us? Since all the numbers in that expression are integers, \(pg\) is divisible by both \(p\) and \(q\), and since \(p\) and \(q\) are distinct primes, by \(n = pq\) as well. That is, if we can find an exponent which returns 1 both \(\bmod p\) and \(\bmod q\), then we know it also returns 1 in the case of \(\bmod n\).

So how do we search for these? Well, we use a little trick. We can say that for any whole number \(h\),

$$(a^{p-1})^h \bmod p = 1^h \bmod p = 1$$

We can tune this \(h\) to be whatever it needs to be to work with \(q\) as well. So, to summarize: we want to find an exponent \(e\) where \(a^e \bmod pq = 1\). We know that if \(a^e \bmod p = 1\) and \(a^e \bmod q = 1\), then \(a^e \bmod pq = 1\). We also know that \(a^{p-1} \bmod p = 1\) and \(a^{q-1} \bmod q = 1\). So why not multiply these exponents together? Taking \(h = q-1\) in the trick above gives \(e = (p-1)(q-1)\), which works for both \(\bmod p\) and \(\bmod q\) at once.

Thus $$a^{(p-1)(q-1)} \bmod pq = 1$$

We have one final step. To make the result of the modular exponentiation equal 1, we took our original Fermat’s theorem and divided both sides by \(a\). Now we return that \(a\), multiplying both sides by it, and we see that

$$a^{(p-1)(q-1) + 1} \bmod pq$$ $$= a^{pq -p -q + 2} \bmod pq = a$$

There it is, our primary exponent, \((p-1)(q-1) + 1\).
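We can test the primary exponent exhaustively on a small pair of primes (it even holds for messages sharing a factor with \(n\), though our derivation only argued the coprime case):

```python
p, q = 11, 13
n = p * q
exponent = (p - 1) * (q - 1) + 1  # the primary exponent, 121 here
# a^((p-1)(q-1)+1) mod pq returns a for every message a below n.
for a in range(n):
    assert pow(a, exponent, n) == a
```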

Compared to our first RSA attempt, this exponent is no longer prime, and it can’t be derived from \(n = pq\) alone; one has to know \(p\) and \(q\) themselves to compute it.

We can actually optimize the primary exponent further: it turns out we merely need a number divisible by both \(p-1\) and \(q-1\), so we can get a smaller exponent by taking the least common multiple, \(\text{lcm}(p-1, q-1)\). This produces a smaller number, but for simplicity I will simply use the product in the remainder of this article.

To test our sanity, let’s try this with the primes \(11\) and \(13\), so \(n = 143\).

$$(p-1)(q-1) + 1 = 10(12) + 1 = 121$$

$$36^{121} \bmod 143 = 36$$

There it is.

Now we have another problem, though. We have the exponent \(121\). How do we split it?

With our classic intuition this doesn’t seem all that possible, at least not in a cryptographically secure way, as the prime factorization of \(121\) is \(11^2\). We could of course use these for our exponents; as a sanity check we see that

$$36^{11} \bmod 143 = 69$$

$$69^{11} \bmod 143 = 36$$

As expected.

But using 11 for both our public key exponent and private key exponent is probably not the best move. Another problem is that splitting the exponent this way requires prime factorization, which we previously described as a hard problem. On integers this small it is of course very easy, but on larger ones? What can we do?

We instead turn to some benefits of playing in a modular space.

One thing we notice is that we can use the same \(h\) exponent trick: we don’t just have one exponent, we have a whole class of exponents of the form \(h(p-1)(q-1) + 1\).

Because of this, we can recast this as a modular arithmetic problem. We are looking for two numbers \(e\) and \(d\) such that \(ed \bmod (p-1)(q-1) = 1\).

We call \(e\) and \(d\) modular multiplicative inverses of each other. Real numbers have multiplicative inverses as well, we just call them reciprocals: \((a)\frac{1}{a} = 1\). But in the modular space this becomes quite a bit more interesting.

One convenient fact about modular arithmetic is that if we choose \(e\) to be coprime to the modulus, then \(d\) is guaranteed to exist.

Conveniently, figuring out whether two numbers are coprime is easy: we verify that their greatest common divisor is 1. The algorithm that computes the greatest common divisor also assists us greatly in calculating the modular multiplicative inverse. It is known as the Euclidean algorithm.

Finding the multiplicative inverse actually entails finding an \(h\) for which the expression \(\frac{h(p-1)(q-1) + 1}{e}\) is a whole number. For small numbers this isn’t that big of a deal; roughly 1 in \(e\) values of \(h\) works, simply by the nature of how multiples of \(e\) are spread out. For instance, choosing \(e = 7\) gets us \(h = 6\) and \(d = 103\) pretty quickly by mere brute force. And we can verify these once again.
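The brute-force search for \(h\) described above is only a few lines; with \(e = 7\) it recovers the values from the example:

```python
p, q, e = 11, 13, 7
phi = (p - 1) * (q - 1)            # 120
h = 1
while (h * phi + 1) % e != 0:      # find an h making the division come out even
    h += 1
d = (h * phi + 1) // e
print(h, d)  # 6 103
```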

$$32^7 \bmod 143 = 98$$

$$98^{103} \bmod 143 = 32$$

So it does work.

But we would like an algorithm that isn’t mere brute force, and there is one. The Euclidean algorithm has a neat quirk: by adding some trivial calculations to the main loop that finds the gcd, we can also find the modular multiplicative inverse. More on this at the end of this article.

At long last we have an encryption scheme.

RSA In Full

Key Creation

Find two large primes \(p\) and \(q\). Multiply them together to get \(n = pq\). This is the modulus, part of both the public and private key.

Calculate the general exponent \(j = lcm(p-1, q-1)\)

Choose a public exponent \(1 < e < j\) coprime to \(j\)

Calculate the private exponent \(d\), the modular multiplicative inverse of \(e\) modulo \(j\).

Publish \(e, n\) as the public key. Keep \(d\), \(p\), and \(q\) secret.

Encryption

The public key holder can encrypt a message \(m\) to the private key recipient by computing

$$m^e \bmod n = c$$

Decryption

The private key holder can decrypt an encrypted message \(c\) by computing

$$c^d \bmod n = m$$
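Putting the whole scheme together with the toy numbers from earlier (a sketch only; real RSA uses enormous primes and message padding):

```python
from math import gcd

# Key creation with toy primes
p, q = 11, 13
n = p * q                                    # modulus, public
j = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1) = 60
e = 7                                        # public exponent, coprime to j
d = pow(e, -1, j)                            # private exponent (Python 3.8+)

# Encryption and decryption
m = 32
c = pow(m, e, n)
assert pow(c, d, n) == m
```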

Various Notes of Interest

How to find large primes

It turns out we can find large primes by using Fermat’s Little Theorem as well. In what is known as the Fermat primality test, a random number \(p\) is probably prime if it satisfies Fermat’s little theorem for several different integers. “False primes” do exist, composites that pass the test for many bases, but they are rare, and the number of iterations of the test can be set according to one’s tolerance for error.
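A sketch of the Fermat primality test (the function name and trial count are my own choices):

```python
import random

def fermat_probably_prime(p, trials=20):
    """Return True if p passes Fermat's test for `trials` random bases."""
    if p < 4:
        return p in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, p - 1)
        if pow(a, p, p) != a:   # Fermat's little theorem must hold if p is prime
            return False
    return True

print(fermat_probably_prime(101))  # True
print(fermat_probably_prime(143))  # False (143 = 11 * 13), with overwhelming probability
```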

Primes are relatively common on the number line: near a number \(n\), roughly 1 in \(\ln n\) numbers is prime. We also have some slightly more efficient ways of finding primes if needed. For instance, Sophie Germain studied what are now called Germain primes: a prime \(p\) such that \(2p + 1\) is also prime. When we find such a pair, a smaller prime leads us to a bigger one.

Fast Modular Exponentiation

Most computer systems limit their integers to some number of bytes. Moreover, naive repeated multiplication will not work well for exponents of the size needed for RSA.

So we have an alternate method known as fast modular exponentiation.

Written in binary, an exponent can easily be expressed as a sum of powers of 2.

For instance

$$5^{25} \bmod 20 = 5^{16}(5^{8})(5^1) \bmod 20$$ $$= ((((5^2)^2)^2)^2)(((5^2)^2)^2)(5) \bmod 20$$

We notice that by calculating this way, the smaller factors are computed along the way to the largest one. This effectively takes the logarithm of the number of multiplications we have to do to exponentiate.

We can also, on every squaring or multiplication step, take the modulo to keep the integers we are dealing with small.
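Here is the square-and-multiply idea in code (a sketch; Python’s built-in three-argument `pow` already does this for you):

```python
def power_mod(base, exp, mod):
    """Square-and-multiply, reducing mod `mod` at every step."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:                  # this binary digit of the exponent is 1
            result = result * base % mod
        base = base * base % mod     # square for the next binary digit
        exp >>= 1
    return result

assert power_mod(5, 25, 20) == pow(5, 25, 20)
```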

Extended Euclidean Algorithm and Adventures Finding the Multiplicative Inverse

I find it pretty interesting that while prime factorization is difficult, gcd and lcm are remarkably easy to calculate. To my computer science mind it would seem these should all be hard problems, since they seem to rely on the factorizations that are so hard to find. But no: if two numbers share a factor, extracting it is rather simple.

The key to Euclid’s method lies in a bit of a trick. It turns out that \(gcd(a, b) = gcd(a-b, b)\), assuming of course that \(a > b\). This fact may be a bit difficult to believe, but it follows from \(a-b = gcd(a, b)\left(\frac{a}{gcd(a, b)} - \frac{b}{gcd(a, b)}\right)\). Since the gcd divides both numbers, the terms \(\frac{a}{gcd(a, b)}\) and \(\frac{b}{gcd(a, b)}\) are both integers, and so is their difference. Multiply \(gcd(a, b)\) back in and you get a smaller number that shares the same gcd with \(b\).

Essentially, we can repeatedly subtract the smaller number from the bigger number, and at some point we reach a place where both numbers are equal. That is the gcd.

Now it seems rather silly to compute, say, \(gcd(400, 12)\) by repeatedly subtracting 12. Fortunately we have an operation that performs this repeated subtraction until the result is less than 12: the modulo operation. To compute \(gcd(400, 12)\) we first take \(400 \bmod 12 = 4\). As 4 is now the smaller number, we compute \(12 \bmod 4 = 0\). Since 4 goes evenly into 12, we know the gcd here is 4.

In summary, in code this looks to be a very simple algorithm. For instance here is a Python snippet I once wrote.

def gcd(a, b):
    # Repeatedly replace (a, b) with (b mod a, a) until a reaches 0.
    while a != 0:
        a, b = b % a, a
    return b

Now, this is the Euclidean algorithm. But what is “extended” about the extended Euclidean algorithm?

It turns out that during the process of solving \(gcd(a, b)\) we also more or less solve for the integer coefficients \(x\), \(y\) satisfying

$$ax + by = gcd(a, b)$$

This is known as Bézout’s identity, and such coefficients exist for any two nonzero numbers. The rationale follows from the Euclidean algorithm itself: integer multiples of two numbers can only approach each other, or move apart, in multiples of their greatest common divisor.

The extended algorithm primarily involves noticing that the input whose remainder is taken can be reconstructed from the output and one additional integer: \(a_n = hb_n + (a_n \bmod b_n)\), where \(h = \lfloor a_n / b_n \rfloor\). We can easily find this integer \(h\) by adding a line that does floor division of \(a\) by \(b\).

At some point in our Euclidean algorithm, \(a_n \bmod b_n\) is the \(gcd(a, b)\). We can then say that \(a_n - hb_n = gcd(a, b)\) and use the identity above, step by step, to reconstruct the coefficients. This works best as a recursive algorithm rather than an iterative one, as the reconstruction happens on the way back up, after the gcd is found at the deepest recursion level.

I don’t want to detail the algorithm too much here as the important part is how to actually find the modular multiplicative inverse. To find the inverse of \(a \bmod n\), we first calculate using the extended algorithm.

$$ax + ny = gcd(a, n)$$

Now, we know the gcd here is 1; we established that for the modular inverse to exist, the values must be coprime. Once we have the above expression, we simply take the entire thing \(\bmod n\) and get

$$(ax + ny) \bmod n = 1 \implies ax \bmod n = 1$$

Thus \(x\) is our modular multiplicative inverse.
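A recursive sketch of the extended Euclidean algorithm and the inverse derived from it (the function names are my own):

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # Reconstruct the coefficients on the way back up the recursion.
    return g, y, x - (a // b) * y

def mod_inverse(a, n):
    g, x, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError("a and n must be coprime")
    return x % n

print(mod_inverse(7, 120))  # 103, matching the example from earlier
```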

RSA as a Digital Signature Algorithm

RSA in today’s world is more often used for digital signatures than for encryption; it turns out that in a modern, connected internet, single-party encryption isn’t all that necessary, and we have other ways of handling interactive multi-party encryption.

Digital signatures are cryptographic proofs that the sender of a message holds the private key corresponding to a public key. That is, along with a message \(m\) comes a signature \(s\), and using the public key \(P\) there exists a \(verify(P, s, m)\) function to check that the message came from the private key holder.

For RSA, this is rather simple actually. Rather than the public key holder “encrypting” a message, that message is “encrypted” by the private key holder.

$$s = m^d \bmod n$$

To verify, the signature is “decrypted” by the verifier and checked against the original message.

$$m = s^e \bmod n$$

If they match, it’s a valid signature. The same hardness that protects encryption protects signatures here: without the private key, no one can produce a valid \(s\).
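With the toy key from earlier, signing and verifying look like this:

```python
n, e, d = 143, 7, 103
m = 42
s = pow(m, d, n)          # "encrypt" with the private exponent to sign
assert pow(s, e, n) == m  # anyone holding (n, e) can verify
```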

Final Notes

Please email me at maksymovi@maksym.pro for any corrections to math mistakes, typos, suggestions, or any other comments.