Public-key Cryptography Tutorial; What's SSH, SSL, TLS, AES, SHA-1, MD5

By Xah Lee. Date: . Last updated: .

This article is a basic intro to Public-key Cryptography related issues, written for programers and Linux System Administrators. This provide a conceptual introduction to understand Public-key cryptography, and associated protocols such as {HTTPS, SSH, SSL, TLS} and the bewildering meaning of acronyms such as {SHA-1, MD5, RSA, DES, Blowfish, AES, RC5}.

Why is Public-Key Cryptography So Confusing?

Public-key Cryptography (PKC) as used in tech industry (➢ for example: SSH, HTTPS, certificates, digital signature) is very confusing.

First, is the math concepts involved. You basically have to be a mathematician to understand how it works in detail. (it involves advanced number theory, modular arithmetic, group theory, elliptic curve theory (algebraic geometry).)

Besides the math, it gets extremely complex. Because:

Is it confusing enough?

Luckily, as a programer or sys admin, we don't need to understand the math to use it. We can still understand the concept of public-key encryption for sending secret messages.

Here's a very basic intro to get you started.

Ciphers, Plaintext, Ciphertext

There are many methods to encrypt a message. Such a method is called a Cipher. Message before encryption is called “plaintext”, after encryption is called “ciphertext”.

Cipher Involves a “Key”

To encrypt a message, almost always involves a “key”. For example, you can shift letter by 2, such as {a → c, b → d, c → e, …}. Then, the “key” is the map of letter transformation (or, just the phrase “shift alphabet to the right by 2”). Or, you can create a customized map to transform the letters, ➢ for example: {a → 3, b → u, c → h, …}. This map would be the “key”. To create a ciphertext, you just apply this transformation. To get plaintext, just use the reverse map.

Most Basic Cipher: Substitution Ciphers

The cipher described above belongs to a class called Substitution Ciphers. Substitution Ciphers are ancient. One popular example of substitution cipher is called ROT13. It just shift alphabet by 13 places {a → n, b → o, c → p, …}. If you use Emacs, you can call the command rot13-region to convert selected text into cipher text using the rot13 cipher. This is useful for example, when you post a quiz, and you can include the answer by rot13 the answer, to prevent accidental reading of the answer.

Try to decipher this:


(start emacs, paste into a empty file, select the text, then type 【Alt+x】 then rot13-region)

Symmetric-Key Ciphers

Ciphers are classified into many types. The most common type of cipher use the same key to encrypt & decrypt. Such cipher is called Symmetric-key ciphers. For example, substitution ciphers are symmetric-key ciphers, because, given the key used (a map of letter transformation), you are able to both encrypt & decrypt a plaintext/ciphertext.

Symmetric-key ciphers have a major practical problem. To transmit a message, both sender & receiver must already know the key. So, there's a chicken-and-egg problem. You want to send a secret message, but first you need to send another message (the key) secretly. For example, on the internet, when you send credit card numbers to a site, and if you are using symmetric-key cipher, both your machine and the remote site must first both know a key and agree on a cipher (method).

Public-key Cryptography (PKC): RSA and Others

In about 1973, mathematicians started to understand a system that eventually became the Public-key cryptography (PKC). Its popularity began in 1990s. Here's the basics of PKC:

The way to use this is that, each person have a key pair. One called public, and the other private. The public key is published. Only the private key is kept in secret, known just to himself.

Suppose, Alice & Bob want to talk in secret. Alice encrypt the message using Bob's public key. When Bob receives it, he can just decrypt it using his secret private key.

There are several public-key ciphers. The most popular one is RSA.

On Linux terminal, you can type ssh-keygen -t rsa to generate a public/private key pair (for the RSA cipher). By default, they are stored at ~/.ssh/ and ~/.ssh/id_rsa. Here's a example of what the keys look like:

public key:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDNal4QTKag28EVCvn48CYarg9dbtcVbPXVnYSb9FezRP6GrRALkK8Z0D1RzybIQhM0n/qb8AZo78ikkOP4UP8gjjP9+PWiRzsT8KAuB1rAeI7OYX6sD3tDbeK4XPJFigeKQBb626V8yFg9+7YN4ZW6kzmOUt+pINN0SHAIcPuaBU4VyDhWfmdXjFejyBALU+mbcSCDLHBkfjTjXe4/H5nvXQr+D1KkJWt868BHDLAdsuptJ4o7K/x12nrW4AATY2/t6xWaqViMObaEKTS+p0cy9ERNShms0F7XopyHdeYA6olDOQZJY1v3QJiDes9akJQ6N+jx6r0i7Sn/d0TRqwz/ xah@xah-VB3

private key:

Proc-Type: 4,ENCRYPTED
DEK-Info: AES-128-CBC,67855EB697A7ECF5028811BBC66C85EC


The Problem of Trusting Keys: Certificate, Certificate Authority, Public-key Infrastructure

This public-key cryptography solved a lot problems. However, along with it comes with many new practical problems.

How do you know, that Bob's public key on the net is actually from Bob? A bad guy can publish a key and claim he's Bob. If Alice encrypt a message using a wrong person's public key, then that wrong person will be able to read the message, nobody else. So, here you have the problem of trusting keys.

To solve this trusting public key problem, is invented Public key certificate and Certificate authority (CA). Basically, a organization is trusted as the authority, and it publish “certificates”.

The Certificate contains the public key and a person/org's ID (of some form), and the issuer, and other info. So, for example, when you send credit card number to, you first seek the public key in a certificate from Amazon that is issued by some CA that you trust.

Firefox Certificate Manager 2012-10-22
Firefox's Certificate Manager dialog screenshot, as of .

You can view the certificates that are bundled with browsers. In Firefox, it's under Options/Preference, Advanced tab, Encryption. In Internet Explorer 9, it's under Options, Content tab.

Web Browser bundle certificates of popular companies such as {Google, Yahoo, Skype, Microsoft's, …}, and certs from global CAs. You can add more or remove some.

The most popular global CA (➢ for example: for websites, online banking) are {VeriSign, Thawte}, both now owned by Symantec. (Thawte was founded by Mark Shuttleworth in 1995, before he started Ubuntu Linux in 2004) Other popular ones are Comodo and GoDaddy. These companies do business of issuing certs. To have a cert issued by them, you have to pay money.

For Firefox, you can see their cert inclusion policy and a list at 〔Mozilla CA Certificate Store @

Public-key Infrastructure

There is, another problem. Certificates Authority are basically centralized. If it's a big government (➢ for example: USA, China, Russia), or big corporation (➢ for example: Microsoft, Google), how do you know it won't be be corrupted at some point? Overall, several systems are invented to solve the problem of trusting keys.

Both “web of trust” & are not ideal. They are only used in Linux communities mostly among individuals. (they are not used by online merchants, or banks and financial services, governments, or any serious business.)

Overall, a system for the key-trust-problem is called the Public-key infrastructure (PKI).

Key Revocation and Expiration

There is, more complexity. Suppose Alice's computer got stolen, and thus her secret key is exposed. Now, her public key can't be trusted anymore. If you encrypt a message using that public key, then her computer thief can read it. So, there's a need for key revocation. When you lost a key, you should be able to report it, and the CA or PKI should be able to know it, and “cancel” your public key that is now bad. There's also a need for key expiration. Thus, is created Certificate revocation list and also Online Certificate Status Protocol.

Now, when you visit website, sometimes you get a error message saying that the Certificate can't be trusted, or that the Cert expired, or that the revocation list can't be verified. Now you know what it means.

Public-Key Cryptography and Symmetric-Key Cryptography; Mixed Use

PKC is great. However, it's slow to use PKC to encrypt messages. So, to make it practical, all HTTPS, SSH, etc systems that use PKC actually use hybrid of public-key and symmetric-key cipher. Following is a basic description of how it works.

First, use PKC system to encrypt a message that is the key for a symmetric-key cipher. Then, once this is done, all exchange of messages use symmetric-key cipher to encrypt & decrypt. For example, to send credit card to, first you get the public key of Amazon (by getting its certificate issued by a trusted CA. (which is usually bundled with the browser.)). Then, use PKC to encrypt a key of some symmetric-key cipher method. This new, temporary key, is now known to both you and Amazon. For the rest of the session, symmetric-key encryption is used to communicate.

So, this is another complexity, out of practicality, in using PKC.

Symmetric-key Ciphers: AES, Blowfish, DES, Triple DES, Serpent, Twofish, …

So now, you see that Symmetric-key Ciphers are quite important. They are also much simpler, and faster. Vast majority of ciphers, from ancient to modern ones, are actually of the symmetric-key type. Symmetric-key ciphers are usually called Symmetric-key algorithm.

Over the years, many symmetric-key ciphers came and went, because computer gets faster, and unbreakable cipher 10 years ago becomes easy to break. So, over the years we have {AES, Blowfish, DES, Triple DES, Serpent, Twofish} and lots others.

DES was the standard around 1990s. Since, Triple DES replaced it, then, AES is the most popular today. Here's some basic description of symmetric-key ciphers.

Block cipher

In cryptography, a block cipher is a deterministic algorithm operating on fixed-length groups of bits, called blocks, with an unvarying transformation that is specified by a symmetric key. Block ciphers are important elementary components in the design of many cryptographic protocols, and are widely used to implement encryption of bulk data.

Common block ciphers: AES, Blowfish, DES, Triple DES, Serpent, Twofish.

Data Encryption Standard

DES is the archetypal block cipher — an algorithm that takes a fixed-length string of plaintext bits and transforms it through a series of complicated operations into another ciphertext bitstring of the same length. In the case of DES, the block size is 64 bits. DES also uses a key to customize the transformation, so that decryption can supposedly only be performed by those who know the particular key used to encrypt. The key ostensibly consists of 64 bits; however, only 56 of these are actually used by the algorithm. Eight bits are used solely for checking parity, and are thereafter discarded. Hence the effective key length is 56 bits, and it is always quoted as such. Every 8th bit of the selected key is discarded, that is, positions 8, 16, 24, 32, 40, 48, 56, 64 are removed from the 64 bit key leaving behind only the 56 bit key.

Like other block ciphers, DES by itself is not a secure means of encryption but must instead be used in a mode of operation.

Triple DES (aka TDEA): basically applying DES 3 times, in sequence, using 3 different keys.


AES is based on a design principle known as a substitution-permutation network, and is fast in both software and hardware.[6] Unlike its predecessor DES, AES does not use a Feistel network. AES is a variant of Rijndael which has a fixed block size of 128 bits, and a key size of 128, 192, or 256 bits. By contrast, the Rijndael specification per se is specified with block and key sizes that may be any multiple of 32 bits, both with a minimum of 128 and a maximum of 256 bits.

AES has been adopted by the U.S. government and is now used worldwide. It supersedes the Data Encryption Standard (DES),[5] which was published in 1977. The algorithm described by AES is a symmetric-key algorithm.


In cryptography, RC5 is a block cipher notable for its simplicity. Designed by Ronald Rivest in 1994, RC stands for “Rivest Cipher”, or alternatively, “Ron's Code” (compare RC2 and RC4). The Advanced Encryption Standard (AES) candidate RC6 was based on RC5.

Hash Function, Cryptographic Hash Function, Fingerprint, Message Digest

In encryption, we turn a message into a scrambled form. In computing and computing security, we also need to “condense” a message into a short string, and this string can serve for ID purposes. This turning a message into a short string is called hashing, and the result is called Hash Value, Fingerprint, or Message Digest. This is often needed as part of cryptographic protocols.

Hash function is the most basic idea. Basically, it turns a chunk of data (i.e. a large string of arbitrary size) into a short string of fixed length. Ideally, any 2 different input data should produce different strings. Strickly speaking, this is impossible. It is possible only if the length of the result is equal to the length of biggest possible input.

When 2 different chunk of data get turned into identical short string by a hash function, it's called a “collision”, and this is not desirable. Different type of hash functions are designed for different purposes, with different degrees of negligible chance of collision.

Here's a excerpt from Wikipedia on Hash function:

A hash function is any algorithm or subroutine that maps large data sets of variable length, called keys, to smaller data sets of a fixed length. For example, a person's name, having a variable length, could be hashed to a single integer. The values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes.

Hash functions are related to (and often confused with) checksums, check digits, fingerprints, randomization functions, error correcting codes, and cryptographic hash functions. Although these concepts overlap to some extent, each has its own uses and requirements and is designed and optimized differently.

Fingerprint (computing), is basically a hash function with more stringent requirement of no collision.

Cryptographic hash function, is a hash function with highest requirement of no collision.

A cryptographic hash function is a hash function; that is, an algorithm that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an (accidental or intentional) change to the data will (with very high probability) change the hash value. The data to be encoded are often called the “message,” and the hash value is sometimes called the message digest or simply digest.

The ideal cryptographic hash function has four main or significant properties:

The most commonly used hash functions are: MD5, SHA-1. MD5 is popular in 1990s and is today obsolete. SHA-1 also reached end of its life today. (git uses SHA-1 still as file ID. See: 〔How would git handle a SHA-1 collision on a blob? @〕 〔➤see git Tutorial〕).


The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value. Specified in RFC 1321, MD5 has been utilized in a wide variety of security applications, and is also commonly used to check data integrity. MD5 was designed by Ron Rivest in 1991 to replace an earlier hash function, MD4. An MD5 hash is typically expressed as a hexadecimal number, 32 digits long.

However, it has since been shown that MD5 is not collision resistant; as such, MD5 is not suitable for applications like SSL certificates or digital signatures that rely on this property. In 1996, a flaw was found with the design of MD5, and while it was not a clearly fatal weakness, cryptographers began recommending the use of other algorithms, such as SHA-1 — which has since been found to be vulnerable as well. In 2004, more serious flaws were discovered in MD5, making further use of the algorithm for security purposes questionable — specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum. Further advances were made in breaking MD5 in 2005, 2006, and 2007. In December 2008, a group of researchers used this technique to fake SSL certificate validity, and US-CERT now says that MD5 “should be considered cryptographically broken and unsuitable for further use”, and most U.S. government applications now require the SHA-2 family of hash functions.

Emacs has MD5 algorithm builtin. For example, try this emacs lisp code (md5 "some big text file content …"). 〔➤see Emacs: How to Evaluate Emacs Lisp Code


In cryptography, SHA-1 is a cryptographic hash function designed by the United States National Security Agency and published by the United States NIST as a U.S. Federal Information Processing Standard. SHA stands for “secure hash algorithm”. The four SHA algorithms are structured differently and are distinguished as SHA-0, SHA-1, SHA-2, and SHA-3. SHA-1 is very similar to SHA-0, but corrects an error in the original SHA hash specification that led to significant weaknesses. The SHA-0 algorithm was not adopted by many applications. SHA-2 on the other hand significantly differs from the SHA-1 hash function.

SHA-1 is the most widely used of the existing SHA hash functions, and is employed in several widely used applications and protocols.

To get a idea of what cryptographic hash functions are there, and their security status, see: Comparison of cryptographic hash functions and Hash function security summary.


What about HTTPS, SSH, SSL, TLS? These are networking protocols using public-key cryptography. To understand them, you must first understand TCP/IP protocols. See: TCP/IP Tutorial for Beginner.