How PKI (Public Key Infrastructure) works

Cyclops3590
First off, I want to get a few definitions out of the way so we can all be on the same page.  This includes common abbreviations and terms used when talking about PKI.

Abbreviations:
PKI - Public Key Infrastructure
CA - Certificate Authority
CN - Common Name
CSR - Certificate Signing Request
SAN - Subject Alternative Name
OCSP - Online Certificate Status Protocol
CRL - Certificate Revocation List

Definitions:
Asymmetric Encryption - Encryption that uses two keys; one public, one private.  If data is encrypted with one of the keys, it can only be decrypted with the other key.
Public Key - The encryption key that is given to anyone in the public domain.  Normally used to encrypt data or to verify a signature.
Private Key - The encryption key that is kept secret and should never be shared anywhere.  Normally used to decrypt data or to create a signature.
Root CA - Basically the top dog of the certificate hierarchy.  The root of the trust chain.  
Intermediate CA - Sits below the Root CA in the trust chain and is often the CA that actually signs CSRs for the PKI environment it is a part of.
Digital Signature - A mechanism by which a message can be sent to someone and they can verify it came from you, because a hash of the message, encrypted with your private key, is sent along with it.
Self-signed Certificate - A certificate that is signed by the same identity it claims to represent.  This is generally bad because anyone can say "I'm Fred" and vouch for themselves.  For such a certificate to be trusted, you have to already know who "Fred" is and trust him.
Identity - The system/individual that owns the key pair.

PKI's purpose:

First off, I want to explain why PKI is necessary.  Asymmetric encryption is very computationally expensive compared to symmetric encryption.  So oftentimes when you are using applications that use, e.g., SSH or SSL, the encryption that is performed on the data itself is not asymmetric; it is symmetric.  So why do we even use asymmetric encryption then?  Well, it's because we don't trust the world as far as we can throw it.  So we use PKI primarily for authentication purposes, nothing more.

This is accomplished through the two-key design.  If data is encrypted with the public key, it can only be decrypted with the private key.  Since you never give out the private key, anyone who encrypts data with your public key knows that only you can read it.  However, how does the public client know you are you?  This is oftentimes done via digital signatures.  The message being sent is hashed, and the hash is encrypted using the private key, so only the public key can decrypt it.  The receiver then hashes the message and compares the result with the decrypted hash.  If they are the same, the message must have been sent by the private key owner.  This is why it is so important to keep that private key secret; otherwise anyone can masquerade as you.

Keep in mind though, while the public person validated you, you never really validated them.  This is because the public key can be used by anyone, since you give it out like candy.  This can be overcome by both sides having an asymmetric key pair.  You can look up 2-way SSL for an example of this.
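To make the hash-then-sign idea concrete, here is a minimal sketch in Python.  It is not how any real library does it: the key is a toy RSA key I built from two well-known Mersenne primes, there is no padding, and the names (digest, sign, verify) are my own.  It only shows that the private exponent produces the signature and the public exponent checks it.  It needs Python 3.8 or later for the modular inverse.

```python
import hashlib

# Toy RSA key built from two well-known Mersenne primes. Far too small and
# too structured for real use; it only makes the arithmetic visible.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (kept secret)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Owner side: transform the hash with the private exponent."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver side: undo the signature with the public key, compare hashes."""
    return pow(signature, E, N) == digest(message)


signature = sign(b"hello from Fred")
print(verify(b"hello from Fred", signature))      # the genuine message
print(verify(b"hello from Mallory", signature))   # an altered message
```

Only the holder of D can produce a signature that passes verify, which is exactly why the private key has to stay private.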

So PKI is about authentication.

PKI architecture:

Simply

   Root CA
      |
Intermediate CA
      |
    Server

Now that we know why we use PKI, how is everything put together?  Every PKI environment needs a Root CA.  Until that is created, a PKI doesn't exist.  The primary purpose of this Root CA is to be the root of all trust.  How does it accomplish this though?  Well, it does it via a self-signed certificate.  Wait, I thought you said self-signed certificates are bad in the definition section?  Yes, I did.  And this is the exception that proves the rule.

Since we are using certificates so that a client can verify the party it initiates communication with is actually who we think it is, we need a way to trust what that party says.  Root CAs are rare and are not under the same administrative domain as the server you're connecting to.  When you say you trust one, it should be like saying you trust your mother or father; it should mean something.  It should not be like saying you trust that stranger not to run their car into you; do you really?  So you say you trust the Root CA's self-signed certificate and, as a result, any certificate it digitally signs.  I'll get to a CA signing a certificate in just a moment.

Next in the architecture is the Intermediate CA.  There are generally several of these to provide redundancy, which also allows the Root CA to be turned off.  Yes, off.  Why?  Because if the private key behind that self-signed Root CA certificate gets compromised, your entire PKI is now compromised and no certificate can be trusted.  So Intermediate CAs, while not a requirement like a Root CA, are generally the ones that actually sign the certificate requests.

But how does the Intermediate CA do signing and be trusted like the Root CA?  Well, the Intermediate CA creates a public/private key pair and the public key is put into a CSR that gets signed by the Root CA.  The public key and digital signature are put into a certificate, which the Intermediate CA provides to the public.  That certificate gives clients the information needed to link that CA to the Root CA, establishing a trust chain.

Finally, the server.  The server will create a private/public key pair.  The private key is always kept secret.  The public key, on the other hand, just like with the Intermediate CA, is used to create a CSR.  That CSR is given to the admin of the Intermediate CA.  The admin then verifies the request against certain data (I'll get into what data there is to certify in a moment) and, if everything looks ok, lets the CA sign the request and sends back the resulting certificate along with the Root CA and Intermediate CA chain certs.

The reason for all of those certificates is simple.  After you receive everything back, you have to install the certificate (which contains the public key) and the private key into whatever server you are going to use it in.  You most likely will also need to install the chain certs so that the software can validate everything and confirm the certificate is in fact legit.  This is the case even with self-signed certificates.  In addition, you will link the server certificate to the Intermediate CA and the Intermediate CA to the Root CA.

The purpose is that when the client authenticates the server, the server will provide the signed server certificate.  Since it is linked to the Intermediate CA certificate, it will also provide that and the Root CA certificate in turn.  This way, the client doesn't have to have every certificate in the chain loaded in its trusted certificate store.  All the client needs is one of those certificates (normally the Root CA's) in the store as trusted, and the server is now trusted.

Certificate creation/installation/maintenance process:

I said I'd get to it in a moment, so here we go.  I glossed over several important technical points about the creation of the keys, the CSR, and the signing.  First off, certificates have purposes.  So the certificate you install on your web server is created with certain purposes in mind, like identifying the server to any connecting clients.  The Intermediate CA's certificate was created with signing CSRs in mind.

When a certificate is created, it starts with the identity/server creating the key pair.  The key pair itself is just keys.  Depending on the tool, the common name is supplied either when the key pair is generated or when the CSR is created; either way it ends up in the CSR alongside the public key.  After the keys are created, the identity runs a command to encapsulate the public key into a CSR.  This preps the public key into the correct format for whatever CA you are going to use to sign the request.  The CSR, not the bare public key (remember, the public key is in the CSR), is then sent to the CA admin; nothing else, and never the private key.

The CSR, however, is not enough for the certificate to be signed.  There is other information you must provide.  This generally involves contact information, SANs (which can be in the CSR), expiry information, etc.  It can differ based on the CA.  The information that identifies the certificate is then put into the certificate as well.

The signing process is effectively a digital signature from the CA.  Since the client has the CA's public cert, it can decrypt and validate the information (the public key decrypts the signature, which was encrypted with the CA's private key, and then that hash is compared against the verifier's own hash calculation).  So we're done, right?  Not quite yet; there are still 2 areas that need to be covered: certificate chaining and revocation.
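As a rough illustration of the signing step, the sketch below has a CA take the data from a CSR, add the fields the CA controls, and sign the lot.  Everything here is an assumption made for the example: real certificates are ASN.1/DER structures rather than JSON, a real CSR is also signed by the requester's private key (left out here), and the toy key, the issuer name and the function names are all hypothetical.

```python
import hashlib
import json

# Toy CA key pair built from Mersenne primes; illustrative only.
CA_P, CA_Q, CA_E = 2**89 - 1, 2**107 - 1, 65537
CA_N = CA_P * CA_Q
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))   # CA private exponent


def _digest(body: dict, modulus: int) -> int:
    """Hash a canonical JSON encoding of the certificate fields."""
    encoded = json.dumps(body, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big") % modulus


def issue_certificate(csr: dict, serial: int, not_after: str) -> dict:
    """CA side: combine CSR data with CA-controlled fields, then sign it all."""
    body = {
        "subject": csr["subject"],
        "public_key": csr["public_key"],
        "san": csr.get("san", []),
        "issuer": "Example Intermediate CA",   # hypothetical CA name
        "serial": serial,
        "not_after": not_after,
    }
    signature = pow(_digest(body, CA_N), CA_D, CA_N)
    return {"body": body, "signature": signature}


def signature_is_valid(cert: dict, ca_public_key: tuple) -> bool:
    """Client side: recover the hash with the CA public key and compare."""
    n, e = ca_public_key
    return pow(cert["signature"], e, n) == _digest(cert["body"], n)


# The requester's public key is shown as a placeholder pair of numbers.
csr = {"subject": "www.example.com", "public_key": [3233, 17], "san": ["example.com"]}
cert = issue_certificate(csr, serial=1001, not_after="2030-01-01")
print(signature_is_valid(cert, (CA_N, CA_E)))
```

Because the signature covers every field, changing the subject or the public key after the fact makes the check fail.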

I only briefly mentioned chain certificates above.  Remember I said you could turn off the Root CA.  What if the client connecting to you only has the Root CA certificate and not the Intermediate certificate?  Well, on its own the client won't trust your certificate because it was signed by the Intermediate CA, not the Root CA.  Wait, I thought they created a chain of trust?  Precisely.  The chain certificates you received and installed along with your server certificate are used here, and the Intermediate CA's certificate is given to the client automatically; there is no need for the client to have it, as it will be provided by the server being communicated with.  The client says, hey, I can trust that Intermediate certificate because it is signed by a Root CA cert I trust.  Thus I can now trust your server certificate.  Pretty cool, huh?
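Here is a small sketch of that chain walk, under heavy simplifying assumptions: certificates are plain Python objects, keys are toy RSA keys built from Mersenne primes, and only the issuer link and signature are checked (no dates, purposes or revocation).  The names Cert, issue and chain_is_trusted are mine, not from any library.

```python
import hashlib
from dataclasses import dataclass


def make_key(p: int, q: int, e: int = 65537):
    """Return (public, private) toy RSA keys; far too small for real use."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def _digest(text: str, n: int) -> int:
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % n


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    public_key: tuple
    signature: int = 0

    def signed_text(self) -> str:
        return f"{self.subject}|{self.issuer}|{self.public_key}"


def issue(subject, public_key, issuer_name, issuer_private) -> Cert:
    """Sign the subject's details with the issuer's private key."""
    n, d = issuer_private
    unsigned = Cert(subject, issuer_name, public_key)
    return Cert(subject, issuer_name, public_key,
                pow(_digest(unsigned.signed_text(), n), d, n))


def signed_by(cert: Cert, issuer_cert: Cert) -> bool:
    n, e = issuer_cert.public_key
    return pow(cert.signature, e, n) == _digest(cert.signed_text(), n)


def chain_is_trusted(leaf, presented, trust_store) -> bool:
    """Walk from the leaf towards a certificate in the client's trust store."""
    by_subject = {c.subject: c for c in list(presented) + list(trust_store)}
    current = leaf
    for _ in range(len(by_subject) + 1):      # bound the walk to avoid loops
        if current in trust_store:
            return True
        issuer = by_subject.get(current.issuer)
        if issuer is None or not signed_by(current, issuer):
            return False
        current = issuer
    return False


root_pub, root_priv = make_key(2**61 - 1, 2**89 - 1)
int_pub, int_priv = make_key(2**89 - 1, 2**107 - 1)
srv_pub, _ = make_key(2**107 - 1, 2**127 - 1)

root = issue("Root CA", root_pub, "Root CA", root_priv)             # self-signed
intermediate = issue("Intermediate CA", int_pub, "Root CA", root_priv)
server = issue("www.example.com", srv_pub, "Intermediate CA", int_priv)

print(chain_is_trusted(server, [intermediate], [root]))  # server sent its chain
print(chain_is_trusted(server, [], [root]))              # chain cert missing
```

The second call is the situation described above: the client holds only the Root CA certificate, and without the Intermediate certificate from the server it has no way to connect the dots.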

Revocation is one of those things you hope is never needed and that, for some reason, lots of people don't use, but it keeps the whole purpose of using PKI intact.  The world has lots of bad people (and lazy ones) in it and sometimes private keys get compromised for whatever reason.  When this happens, you can't trust the public key either because people can masquerade as you.  So you tell the CA that signed your certificate to revoke the certificate.  They do it and now it is invalid, right?  Kind of.  The client has to know that it has been revoked.  If it doesn't know, it will still use and trust that public certificate.  The client can be made aware by using a CRL check or OCSP verification.  A CRL is a list, published by the CA, of certificates that should not be trusted anymore; the certificate usually carries a URL pointing the client to it.  OCSP is a more communicative protocol that accomplishes the same thing by letting the client ask about a single certificate.
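To show what the client-side check boils down to, here is a minimal sketch of a CRL-style lookup.  The RevocationList class and its fields are a simplified stand-in I made up for illustration; a real CRL is a signed structure that the client must also verify.  The three answers (good, revoked, unknown) mirror the statuses an OCSP responder can return.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RevocationList:
    """Simplified stand-in for a CRL: issuer name plus revoked serial numbers."""
    issuer: str
    next_update: datetime
    revoked_serials: set = field(default_factory=set)


def revocation_status(serial: int, issuer: str, crl: RevocationList,
                      now: datetime) -> str:
    if crl.issuer != issuer:
        return "unknown"    # this list says nothing about that CA's certificates
    if now > crl.next_update:
        return "unknown"    # stale list: the client should fetch a fresh one
    return "revoked" if serial in crl.revoked_serials else "good"


crl = RevocationList(
    issuer="Example Intermediate CA",                      # hypothetical CA name
    next_update=datetime(2030, 1, 1, tzinfo=timezone.utc),
    revoked_serials={1001, 1007},
)
now = datetime(2029, 6, 1, tzinfo=timezone.utc)
print(revocation_status(1001, "Example Intermediate CA", crl, now))
print(revocation_status(2002, "Example Intermediate CA", crl, now))
```

What a client does with "unknown" is a policy decision; failing open there is one reason revocation often ends up not being enforced.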

Finally (ok, maybe there were 3 things left), what about when the certificate isn't compromised but expires?  You simply perform a certificate renewal.  This is done through the same process as when you originally got the certificate.  You still have the private key, which doesn't need to change unless it was compromised of course, but you can create a new one if you want.  You create another CSR, have your CA sign it, and then install the new certificate.

Private key passwords:

So is the private key just a key and that's it?  Mostly, yes, but you can put a password on it.  Yes, it should be kept secret, but if the private key file does get compromised, whoever has it needs to know the password to open the file contents before they can use it.  Basically it's just an additional security measure and should always be used.  It can buy you enough time to revoke the compromised certificate and get a new one installed before the key becomes useful to the hacker.

Certificate validation process:

Ok, so we know why we use PKI, how PKI is architected, and how a certificate is created/maintained.  What about its use?  The process is as follows.  It is generalized here, as protocols differ in the specifics.

Client                        Server
   |  ----Connection---->       |
   |  <---Certificate----       |
Validation
   |  **Sym Key Exch**          |

So when the client connects, the server will send its signed certificate.  The client then validates that certificate to know it can trust that server.  The client actually tries to validate a lot of things:
1. Is the name in the certificate the same name I used to connect to the server?
2. Is the certificate's date range valid for the current time?
3. Is the signing CA one that I trust?  If not, is there a chain certificate that I trust?
4. Is the certificate on the signing CA's revocation list (or reported as revoked via OCSP)?
5. Does the server use a cipher that I can use?
There may be others, but those are the most common that I'm aware of.
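The first four checks can be sketched as one function that collects every problem it finds.  This is only an outline under my own assumptions: CertInfo is a made-up record rather than a parsed X.509 certificate, trust is reduced to an issuer-name lookup instead of a real signature check, name matching is an exact comparison without wildcards, and the cipher negotiation is left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CertInfo:
    """Hypothetical, already-parsed view of the fields the checks need."""
    common_name: str
    san: tuple
    not_before: datetime
    not_after: datetime
    issuer: str
    serial: int


def validation_problems(cert, hostname, now, trusted_issuers, revoked_serials):
    """Return a list of failed checks; an empty list means all checks passed."""
    problems = []
    names = {cert.common_name.lower(), *(n.lower() for n in cert.san)}
    if hostname.lower() not in names:
        problems.append("name mismatch")
    if not (cert.not_before <= now <= cert.not_after):
        problems.append("outside validity period")
    if cert.issuer not in trusted_issuers:
        problems.append("untrusted issuer")
    if cert.serial in revoked_serials:
        problems.append("revoked")
    return problems


utc = timezone.utc
cert = CertInfo(
    common_name="www.example.com",
    san=("example.com",),
    not_before=datetime(2024, 1, 1, tzinfo=utc),
    not_after=datetime(2025, 1, 1, tzinfo=utc),
    issuer="Example Intermediate CA",
    serial=1001,
)
now = datetime(2024, 6, 1, tzinfo=utc)
print(validation_problems(cert, "www.example.com", now,
                          {"Example Intermediate CA"}, set()))
```

Returning the full list instead of stopping at the first failure matches what most clients show you: every reason the certificate was rejected.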

If everything comes back good, the client continues with the communication, usually with the symmetric key exchange (the actual exchange process depends on what was agreed upon by the client and server; that process is outside the scope of this article).

Note though that the client doesn't have to perform any validations it doesn't want to, e.g. the hostname check.  Also, what happens if one of the checks comes back as bad?  Well, the client gets an error and sometimes the option to add the certificate to a personal store of certificates to explicitly trust.  Depending on what you're connecting to, this explicit trust can range from common practice to downright stupid.  But it's the client's choice at least.
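As one concrete example of the checks being the client's choice, Python's standard ssl module exposes them as settings on the context the client builds.  The snippet below only constructs two contexts and makes no connection; the variable names strict and lax are mine.

```python
import ssl

# Default client context: verifies the chain and checks the hostname.
strict = ssl.create_default_context()
print(strict.check_hostname, strict.verify_mode)

# A client can opt out of both checks. The hostname check has to be
# disabled first, otherwise setting CERT_NONE raises a ValueError.
lax = ssl.create_default_context()
lax.check_hostname = False
lax.verify_mode = ssl.CERT_NONE
print(lax.check_hostname, lax.verify_mode)
```

The lax context will still encrypt traffic, but it has given up the authentication that PKI was there to provide, so it has no idea who is on the other end.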

And remember this is only about authentication.  But wait, if it's just about authentication, why revoke a certificate if the private key is compromised?  First, because whoever holds the private key can now masquerade as you.  Second, it's because of the possible symmetric key exchanges that can happen (yes, I know I said this process is out of scope, but this is important to know).  There are some exchanges where the client comes up with the key, encrypts it with the public key, and sends it to the server so it knows the key to use.  If the private key is compromised, the hacker can decrypt that portion of the communication and in turn decrypt the entire communication between the two parties.  This is not the case when using cipher suites based on ephemeral Diffie-Hellman (DHE/ECDHE), where the private key only authenticates the exchange and is never used to encrypt the session key.
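For contrast with the "client encrypts the key with the public key" style of exchange, here is a toy Diffie-Hellman agreement.  The prime and generator are values I picked to keep the numbers readable and are much too small for real use, and in a real protocol the server would also sign its public value with the certificate's private key so the client knows who it is talking to.

```python
import secrets

P = 2**127 - 1   # a prime, but far too small for real Diffie-Hellman
G = 3            # toy generator, chosen only for illustration


def dh_keypair():
    """Pick a fresh secret exponent and derive the public value to send."""
    private = secrets.randbelow(P - 3) + 2    # in the range 2..P-2
    return private, pow(G, private, P)


# Each side generates a throwaway key pair for this one session.
client_private, client_public = dh_keypair()
server_private, server_public = dh_keypair()

# Only the public values cross the wire; each side combines the other's
# public value with its own private value.
client_secret = pow(server_public, client_private, P)
server_secret = pow(client_public, server_private, P)

print(client_secret == server_secret)
```

The session key is never sent, encrypted or otherwise, so a private key stolen later has nothing recorded to decrypt, provided the throwaway values were discarded after the session.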
