GKE with CertManager

Using cert-manager add-on with GKE

Joaquín Menchaca (智裕)
9 min read · Aug 29, 2022


This article details how to secure web traffic using TLS with a certificate issued by a trusted CA. This will use Let’s Encrypt through a popular Kubernetes add-on called cert-manager.

This application will cover the following components:

NOTE: This will apply the principle of least privilege for securing access to cloud resources using Workload Identity. This increases the complexity of the solution, but it is the recommended best practice when building such solutions for production.

📔 NOTE: This was tested with the versions below and may not work if your versions are significantly different.
* Kubernetes API v1.22
* kubectl v1.22
* gcloud 394.0.0
* ExternalDNS v0.12.2
* CertManager v1.9.1
* Dgraph v21.03.2

Requirements

This is an advanced article that combines several concepts: load balancing, reverse proxies, provisioning cloud resources, and deploying Kubernetes resources (including ingress and statefulset).

Accounts

  • To follow this article, you will need to have a registered domain and forward DNS queries to the Cloud DNS name servers. Consult documentation from the registrar for your domain. This tutorial will use example.com as an example domain.
  • A Google Cloud account with a billing account and a project set up. Google is offering a 90-day $300 free trial (Aug 2022) that is sufficient for this article. See https://cloud.google.com/free.

Knowledge

  • Basic knowledge of using Google Cloud SDK to configure access, set up a project, and provision resources.
  • Basic shell scripting knowledge, including things like setting up environment variables.
  • Basic Kubernetes knowledge: using the kubectl command to deploy applications and setting up configuration with the KUBECONFIG environment variable.
  • Basic networking knowledge of TCP (Layer 4), the HTTP/1.1 protocol (Layer 7), and exposure to SSL/TLS certificates.
  • Understanding of load balancers and reverse proxies.

Tools (Required)

  • Google Cloud SDK (gcloud command) to interact with Google Cloud
  • Kubernetes client (kubectl command) to interact with Kubernetes
  • Helm (helm command) to install Kubernetes packages
  • helm-diff plugin to preview the differences in what will be deployed
  • helmfile (helmfile command) to automate installing many helm charts

Tools (Recommended)

  • POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested using either of these shells on macOS and Ubuntu Linux.
  • GNU stream-editor (sed) and GNU grep (grep): scripts were tested with these tools, and the macOS or BSD equivalents may not work.
  • curl (curl): tool to interact with web services from the command line.
  • jq (jq): a JSON processor tool that can transform and extract objects from JSON, as well as providing colorized JSON output for greater readability.

These tools can be installed with Homebrew on macOS and with Chocolatey and MSYS2 on Windows.

Project Setup

Directory structure

We want to create this directory structure for the project:

~/projects/ingress-gce
├── dgraph
│   └── helmfile.yaml
├── kube-addons
│   ├── helmfile.yaml
│   └── issuers.yaml
└── ratel
    └── helmfile.yaml

In GNU Bash, you can create the above structure like this:

mkdir -p ~/projects/ingress-gce/{dgraph,ratel,kube-addons} 
cd ~/projects/ingress-gce
touch {dgraph,kube-addons,ratel}/helmfile.yaml \
kube-addons/issuers.yaml

Environment variables

These environment variables will be used in this project. Create a file called env.sh and run source env.sh.
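The exact contents of env.sh depend on your own projects and domain. As a minimal sketch: apart from DNS_DOMAIN and KUBECONFIG, which this article refers to elsewhere, every variable name and value below is a hypothetical placeholder.

```shell
# env.sh: sketch only. Names and values are placeholders, except
# DNS_DOMAIN and KUBECONFIG, which this article refers to elsewhere.

# Domain delegated to Cloud DNS (this article uses example.com)
export DNS_DOMAIN="example.com"

# Hypothetical IDs for the two projects (DNS zone and GKE cluster)
export DNS_PROJECT_ID="my-dns-project"
export GKE_PROJECT_ID="my-gke-project"

# Hypothetical cluster settings
export GKE_CLUSTER_NAME="my-gke-cluster"
export GKE_REGION="us-central1"

# Keep this cluster's credentials separate from ~/.kube/config
export KUBECONFIG="$HOME/.kube/$GKE_REGION-$GKE_CLUSTER_NAME.yaml"
```

Sourcing the file, rather than executing it, keeps the exported values in your current shell session.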

Google project setup

There will be two projects created to provision cloud resources. One project will have the Cloud DNS zone and the other project will have the GKE cluster. You can set this up in the web console, or by typing these commands:
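A rough sketch of the command-line route, wrapped in a small helper so it can be reused for both projects. The function name and the project IDs in the example calls are hypothetical.

```shell
# Sketch: create one project and enable one API in it.
# $1 = project ID (must be globally unique), $2 = API to enable
create_project() {
  gcloud projects create "$1" &&
    gcloud services enable "$2" --project "$1"
}

# Example calls (hypothetical project IDs):
#   create_project my-dns-project dns.googleapis.com
#   create_project my-gke-project container.googleapis.com
```

A billing account typically has to be linked to a project before APIs such as container.googleapis.com can be enabled; that step is easiest to do in the web console.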

Provision Cloud Resources

Cloud DNS

Using the Google Cloud SDK gcloud command, you can set up the Cloud DNS zone in GNU Bash with the following command:
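A minimal sketch of what this can look like, assuming a public zone. The function names and the zone name in the example calls are hypothetical.

```shell
# Sketch: create a public Cloud DNS zone.
# $1 = project ID, $2 = zone name, $3 = domain (for example, example.com)
create_dns_zone() {
  gcloud dns managed-zones create "$2" \
    --project "$1" \
    --dns-name "$3." \
    --description "Public zone for $3" \
    --visibility public
}

# Print the zone's name servers, one per line.
# $1 = project ID, $2 = zone name
list_name_servers() {
  gcloud dns managed-zones describe "$2" \
    --project "$1" \
    --format "value(nameServers)" | tr ';' '\n'
}

# Example calls (hypothetical zone name):
#   create_dns_zone "$DNS_PROJECT_ID" example-com "$DNS_DOMAIN"
#   list_name_servers "$DNS_PROJECT_ID" example-com
```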

You should get a list of name servers to configure with your registrar. The name servers will vary; here’s an example output:

ns-cloud-d1.googledomains.com.
ns-cloud-d2.googledomains.com.
ns-cloud-d3.googledomains.com.
ns-cloud-d4.googledomains.com.

Google Kubernetes Engine

With the default settings, every node in a new GKE cluster has access that can be used to escalate privileges. This is not desirable. The process below creates a GKE cluster with minimal privileges and with Workload Identity enabled.

⚠️ NOTE: Though this Kubernetes cluster applies the principle of least privilege, using identity principals (Google service accounts) to secure access to cloud resources, the master nodes and worker nodes are still accessible from the public Internet. For further security, such as in production environments, you may consider using private master nodes, which require some form of jump host or VPN to access them, and placing worker nodes on separate private and public networks (subnets), so that you have further control over which endpoints are explicitly exposed to the public Internet. Additionally, using a CNI plugin that supports network policies, such as Calico, allows you to restrict traffic from both external and internal networks.

Run through the steps below in GNU Bash:
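One way to sketch these steps: create a dedicated Google service account for the nodes with only logging and monitoring roles, then create the cluster with that account and a workload pool. The function names and arguments are hypothetical; the role list follows Google's minimal-privilege guidance for node service accounts.

```shell
# Sketch: service account for the nodes with minimal roles.
# $1 = project ID, $2 = service account name
create_node_service_account() {
  gcloud iam service-accounts create "$2" \
    --project "$1" \
    --display-name "$2"
  # Only what nodes need to ship logs and metrics
  for role in roles/logging.logWriter \
              roles/monitoring.metricWriter \
              roles/monitoring.viewer \
              roles/stackdriver.resourceMetadata.writer; do
    gcloud projects add-iam-policy-binding "$1" \
      --member "serviceAccount:$2@$1.iam.gserviceaccount.com" \
      --role "$role"
  done
}

# Sketch: cluster that uses the account above and enables Workload Identity.
# $1 = project ID, $2 = cluster name, $3 = region, $4 = node service account name
create_cluster() {
  gcloud container clusters create "$2" \
    --project "$1" \
    --region "$3" \
    --num-nodes 1 \
    --service-account "$4@$1.iam.gserviceaccount.com" \
    --workload-pool "$1.svc.id.goog"
}
```

With a region rather than a zone, --num-nodes 1 gives one node in each zone of that region.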

Grant Access to Cloud DNS

We want to grant CertManager and ExternalDNS, the services we will deploy later, the ability to read and write DNS records.

When we deploy an ingress object, ExternalDNS will automatically update DNS records in the Cloud DNS zone, and CertManager will issue a certificate using an ACME CA. This process, called a DNS01 challenge, requires writing records in the Cloud DNS zone to verify that you own the domain.
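To make the challenge more concrete: the ACME CA looks for a TXT record named _acme-challenge under the host being validated. The helper below (a hypothetical name, for illustration only) prints that record name.

```shell
# Print the TXT record name an ACME CA queries for a DNS01 challenge.
# A leading "*." is dropped, because a wildcard certificate is
# validated against the base name.
acme_challenge_name() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}

acme_challenge_name "dgraph.example.com"   # _acme-challenge.dgraph.example.com

# While a certificate is being issued, the record can be inspected with:
#   dig +short TXT "$(acme_challenge_name "dgraph.example.com")"
```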

The best way to do this is to restrict access to only the exact services that need this level of access. This can be done through Workload Identity, which creates a one-to-one relationship between the KSA (Kubernetes Service Account) and the GSA (Google Service Account) using OIDC (OpenID Connect).

Run through the steps here to set up access:
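As a sketch of this half of the process, under the assumption that each service gets its own GSA created in the GKE project: create the GSA, let it manage records in the DNS project, and allow a single KSA to act as it. The function name and its arguments are hypothetical.

```shell
# Sketch: give one Kubernetes service account write access to Cloud DNS.
# $1 = DNS project ID, $2 = GKE project ID, $3 = GSA name,
# $4 = Kubernetes namespace, $5 = KSA name
grant_dns_access() {
  gsa_email="$3@$2.iam.gserviceaccount.com"

  # Google service account that the workload will act as
  gcloud iam service-accounts create "$3" \
    --project "$2" \
    --display-name "$3"

  # Let the GSA manage records in the project that holds the zone
  gcloud projects add-iam-policy-binding "$1" \
    --member "serviceAccount:$gsa_email" \
    --role roles/dns.admin

  # Let exactly one KSA impersonate the GSA
  gcloud iam service-accounts add-iam-policy-binding "$gsa_email" \
    --project "$2" \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:$2.svc.id.goog[$4/$5]"
}

# Example call (hypothetical names):
#   grant_dns_access "$DNS_PROJECT_ID" "$GKE_PROJECT_ID" cert-manager cert-manager cert-manager
```

The member string in the last command is what ties the binding to one namespace and one service account name, so the KSA does not need to exist yet when this runs.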

This is only half of the process. The second half happens after deploying the required services: each Kubernetes service account needs an annotation that associates it with its Google service account. This will happen in the next part using the tool helmfile.
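For reference, the same association can be made by hand. This sketch uses the iam.gke.io/gcp-service-account annotation that Workload Identity reads; the function name and arguments are hypothetical.

```shell
# Sketch: point a Kubernetes service account at its Google service account.
# $1 = namespace, $2 = KSA name, $3 = GSA email
link_ksa_to_gsa() {
  kubectl annotate serviceaccount "$2" \
    --namespace "$1" \
    --overwrite \
    "iam.gke.io/gcp-service-account=$3"
}

# Example call (hypothetical names):
#   link_ksa_to_gsa external-dns external-dns "external-dns@$GKE_PROJECT_ID.iam.gserviceaccount.com"
```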

Kubernetes Addons: CertManager and ExternalDNS

Create the file below at kube-addons/helmfile.yaml:

This helmfile.yaml will install two helm charts with the necessary configuration to support Workload Identity for access to the Cloud DNS zone.

When ready, you can deploy this with:

source env.sh
helmfile --file kube-addons/helmfile.yaml apply

After CertManager is installed, it will install some custom CRDs that are needed for issuing certificates. We’ll need to add some cluster-wide issuers that will allow us to issue certificates when deploying ingress objects.
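To illustrate what a cluster-wide issuer looks like, here is a sketch that writes a plain ClusterIssuer manifest for the Let's Encrypt staging server, using the Cloud DNS solver. This is not the helmfile-based file used in this article: the output file name, email address, and project ID are hypothetical placeholders.

```shell
# Hypothetical values; replace with your own
ACME_EMAIL="${ACME_EMAIL:-user@example.com}"
DNS_PROJECT_ID="${DNS_PROJECT_ID:-my-dns-project}"
OUT_FILE="${OUT_FILE:-letsencrypt-staging.yaml}"

# Write a ClusterIssuer that solves DNS01 challenges through Cloud DNS.
# With Workload Identity no service account key is referenced here.
cat > "$OUT_FILE" <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: $ACME_EMAIL
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
      - dns01:
          cloudDNS:
            project: $DNS_PROJECT_ID
EOF
```

A manifest like this could be applied with kubectl apply --filename letsencrypt-staging.yaml. The staging server is useful for testing because its rate limits are far more generous than production, though its certificates are not trusted by browsers.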

Copy the following below and save as kube-addons/issuers.yaml:

When ready, these can be deployed with:

source env.sh
helmfile --file kube-addons/issuers.yaml apply

Example Application: Dgraph

Dgraph is a highly performant distributed graph database that can be deployed easily on Kubernetes.

Though Dgraph is accessed through a web interface using GraphQL or DQL, a query language derived from GraphQL, it is still a database and for that reason should be secured.

Run the following to create a DG_ALLOW_LIST of the private IP addresses allocated by the GKE cluster, as well as the IP address that you use to access the Internet.
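A sketch of how such a list can be assembled as a comma-separated string. The helper name is hypothetical, and the address ranges in the example are made-up values standing in for your cluster's pod range, service range, and your own public IP.

```shell
# Join all arguments with commas, skipping empty ones
join_allow_list() {
  list=""
  for cidr in "$@"; do
    [ -n "$cidr" ] || continue
    list="${list:+$list,}$cidr"
  done
  printf '%s\n' "$list"
}

# Example with made-up ranges: pod range, service range, own public IP
DG_ALLOW_LIST="$(join_allow_list "10.8.0.0/14" "10.12.0.0/20" "203.0.113.7/32")"
export DG_ALLOW_LIST
echo "$DG_ALLOW_LIST"   # 10.8.0.0/14,10.12.0.0/20,203.0.113.7/32
```

The real ranges can likely be read from gcloud container clusters describe (the clusterIpv4Cidr and servicesIpv4Cidr fields), and your public IP from a service such as ifconfig.me; treat both as assumptions to verify against your own setup.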

Create the following file below as dgraph/helmfile.yaml:

When ready, this can be deployed by typing:

source env.sh
helmfile --file dgraph/helmfile.yaml apply

This will deploy 3 Dgraph Alpha nodes and 3 Dgraph Zero nodes and an ingress object to access the Dgraph Alpha nodes.

The Dgraph Alpha nodes have to be exposed through a service of type NodePort, because the L7 load balancer provisioned by deploying the ingress can only route to IP addresses that are reachable on the VPC network. It cannot route to private IP addresses within the Kubernetes network.

Verify that the load balancer is running using:

kubectl get ing --namespace dgraph

When an IP address is allocated, you can query the endpoint using:

source env.sh
curl https://dgraph.${DNS_DOMAIN}/health | jq
curl https://dgraph.${DNS_DOMAIN}/state | jq

Example Application: Ratel

Ratel is a small application that runs in your browser and connects from there directly to the Dgraph cluster. A small web server hosts the front-end code, and it needs to be deployed first.

Ratel should never be deployed in the same namespace as Dgraph, because if the application is ever compromised, the malicious actor could then directly read the private graph database. With separate namespaces, Dgraph can be further secured using network policies, enforced by a CNI plugin such as Calico or Cilium, and also with a service mesh that supports strict mode.

Create a file below as ratel/helmfile.yaml:

When ready, this can be deployed with:

source env.sh
helmfile --file ratel/helmfile.yaml apply

You can then access it at a URL based on your domain, something like https://ratel.example.com.

Configure the Dgraph server URL to your domain name used for the Dgraph Alpha nodes, such as https://dgraph.example.com.

Fun with Dgraph

From the Ratel application we can add some data, add a schema, and run a query.

NOTE: These are adapted from the Getting Started tutorial, using snippets from an earlier blog post I wrote called Dgraph on AWS: Setting up a horizontally scalable graph database.

Dataset

Select the Console and the Mutate radio button, and copy and paste the following text below, then select the Run button:

Schema

Alter the schema by selecting the Schema button on the left, and then the Bulk Edit button. In this edit box, copy and paste the text below, and then select Apply Schema.

Query

Now we can do a query. Select Console, Query, then copy and paste the following:

This should look something like this:

Cleanup

🚨 IMPORTANT: It is vital to delete the persistent volume claims, i.e. pvc resources, before deleting the cluster. Otherwise, the storage will remain and eat up costs.

Deleting the other Kubernetes resources is optional, as they will be deleted with the destruction of the cluster.

Kubernetes Resources

You can clean up Kubernetes resources with the following commands.
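A minimal sketch of the part that matters most, assuming it is run from the project directory after source env.sh. The function name is hypothetical; the dgraph namespace is the one used earlier in this article.

```shell
# Sketch: remove Dgraph and the storage it leaves behind.
cleanup_dgraph() {
  # Remove the release first, so no pod is still using the volumes
  helmfile --file dgraph/helmfile.yaml destroy &&
    # Then delete the claims, which releases the underlying disks
    kubectl delete pvc --all --namespace dgraph
}

# Example call:
#   cleanup_dgraph
```

The order matters: a claim that is still mounted by a running pod stays in a terminating state until that pod is gone.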

Google Cloud resources

You can clean up Google Cloud resources with the following commands.
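As a sketch, the two main resources can be removed as below. The function names and arguments are hypothetical, and service accounts and IAM bindings created along the way would need separate cleanup.

```shell
# Sketch: delete the GKE cluster without an interactive prompt.
# $1 = GKE project ID, $2 = cluster name, $3 = region
delete_cluster() {
  gcloud container clusters delete "$2" \
    --project "$1" \
    --region "$3" \
    --quiet
}

# Sketch: delete the Cloud DNS zone.
# Cloud DNS refuses to delete a zone that still holds records other
# than its NS and SOA records, so remove those records first.
# $1 = DNS project ID, $2 = zone name
delete_dns_zone() {
  gcloud dns managed-zones delete "$2" --project "$1"
}
```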

Resources

Helmfile

GKE default ingress (ingress-gce)

This is the default ingress controller for GKE, and it is configured outside of the GKE cluster using an L7 load balancer. Google has no formal name for this ingress controller other than calling it External HTTP(S) Load Balancer.

ExternalDNS

CertManager

Dgraph

Conclusion

I originally thought this tutorial would be straightforward, but there are many issues that can cause erroneous results. First, the Workload Identity setup is complex, but fortunately some of this complexity is hidden by automating with helmfile.

The default GKE ingress

The default ingress that comes with GKE had some snafus that I had forgotten about. First, the ingress controller creates the HTTP(S) load balancer outside of the GKE cluster, so a service of type ClusterIP cannot be used. Only the NodePort and LoadBalancer service types are supported, as these are reachable on the same network as the external load balancer.

There are a few downsides to this:

  1. Dgraph exposed to local private networks
  2. Each ingress allocates a new IP address

The first is not desirable as it increases the attack surface of Dgraph. Extra firewall rules or network policies are desirable to help lock this down. This tutorial showed how to restrict which addresses can access Dgraph to help minimize the risk.

The second issue is problematic because load balancers in Google Cloud can get expensive.

Finally

Anyhow, I hope all of this is useful for your journeys with CertManager and ExternalDNS on GKE. I also hope you got a chance to play with Dgraph, especially as graph databases are really fun.
