GKE with gRPC and ingress-nginx
Using gRPC with the ingress-nginx add-on on GKE
This article details how to secure mixed HTTP and gRPC (HTTP/2) web traffic with a single ingress controller. As part of the process, TLS certificates will be issued by a trusted CA. This will use Let’s Encrypt through a popular Kubernetes add-on called cert-manager.
About ingress-nginx
In a previous article, I demonstrated how to set up secure HTTP/1.1 traffic using the default ingress controller (ingress-gce) that comes bundled with GKE.
This time around, we’ll use the ever-popular ingress controller ingress-nginx. This ingress controller is built around the OpenResty platform, which combines open source NGINX with LuaJIT to provide many features.
About Dgraph
Dgraph is a highly performant and highly available distributed graph database. Dgraph supports using either gRPC or HTTP to interact with the graph database.
The main purpose of this article is to demonstrate handling mixed north-south traffic, or in other words, traffic that enters the Kubernetes cluster through an endpoint, which in this case is ingress-nginx.
For internal traffic, called east-west traffic, you can use service meshes, or advanced CNI drivers that support similar features through eBPF. This will be covered in future articles.
Components
This application will cover the following components:
- Google Cloud resources: GKE, Cloud DNS zone
- Kubernetes add-ons: cert-manager (installed with helmfile), external-dns (installed with helmfile), ingress-nginx (installed with helmfile)
- Applications: Dgraph (installed with helmfile), a pydgraph client (run using docker)
NOTE: This will apply the principle of least privilege for securing access to cloud resources using Workload Identity. This is the recommended best practice when creating solutions that need privileged access to a cloud resource.
📔 NOTE: This was tested on the following versions and may not work if your versions are significantly different:
* Kubernetes API v1.22
* kubectl v1.22
* gcloud 394.0.0
* external-dns v0.12.2
* cert-manager v1.9.1
* ingress-nginx 1.3.0
* Docker 20.10.17
* Dgraph v21.03.2
Requirements
This is an intermediate-to-advanced category article that combines several concepts across load balancing, reverse proxying, provisioning cloud resources, and deploying Kubernetes resources (including ingress and statefulset).
Accounts
- To follow the steps in this article, you will need a registered domain and to forward DNS queries to the Cloud DNS name servers. Consult the documentation from the registrar for your domain. This tutorial will use example.com as an example domain.
- A Google Cloud account with a billing account and project set up. Google is offering a 90-day $300 free trial (Aug 2022) that is sufficient for this article. See https://cloud.google.com/free.
For an alternative to registering a domain and forwarding domain resolution to Cloud DNS, you can do the following to use untrusted certificates and simulate domain resolution:
- Set ACME_ISSUER_NAME=letsencrypt-staging before deploying any ingress.
- Edit /etc/hosts (or equivalent) to match the DNS records, use a local DNS cache, or configure your DNS client to point to Cloud DNS as the resolver for that domain.
- When accessing a service through the web, like Ratel, you will need to click to add an exception when prompted about an untrusted website.
- When using the curl command, you will have to use the -k option, e.g. curl -k.
Knowledge
- Basic knowledge of using Google Cloud SDK to configure access, set up a project, and provision resources.
- Basic shell scripting knowledge, including things like setting environment variables. Python is useful for understanding the load_data.py script, but not required.
- Basic Kubernetes knowledge: using the kubectl command to deploy applications and setting up configuration with the KUBECONFIG environment variable. Understanding Kubernetes resource types like Deployment, StatefulSet, ReplicaSet, Pod, Service (L4), and Ingress (L7) is useful.
- Basic networking knowledge of Layer 4 (TCP) vs Layer 7 (HTTP), knowledge of the HTTP/2 vs HTTP/1.1 protocols, and exposure to TLS vs SSL certificates.
- Understanding of load balancers and reverse proxies, and routing based on ports, virtual hosts, and URL paths.
Tools (Required)
- Google Cloud SDK (gcloud command) to interact with Google Cloud
- Kubernetes client (kubectl command) to interact with Kubernetes
- Helm (helm command) to install Kubernetes packages
- helm-diff plugin to see differences in what will be deployed
- helmfile (helmfile command) to automate installing many helm charts
- Docker Engine (docker command) to run the pydgraph client and all of its dependencies locally
Tools (Recommended)
- POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested using either of these shells on macOS and Ubuntu Linux.
- GNU stream editor (sed) and GNU grep (grep): scripts were tested with these tools; the macOS or BSD equivalents may not work.
- curl (curl): tool to interact with web servers from the command line.
- jq (jq): a JSON processor that can transform and extract objects from JSON, as well as provide colorized JSON output for greater readability.
- grpcurl (grpcurl): tool to interact with gRPC servers from the command line.
These tools can be installed with Homebrew on macOS, and with Chocolatey or MSYS2 on Windows.
Project Setup
Directory structure
Create this directory structure in your project area:
~/projects/ingress-nginx-grpc
├── dgraph
│ └── helmfile.yaml
├── kube-addons
│ ├── helmfile.yaml
│ └── issuers.yaml
└── ratel
└── helmfile.yaml
In GNU Bash, you can create the above structure like this:
mkdir -p ~/projects/ingress-nginx-grpc/{dgraph,ratel,kube-addons}
cd ~/projects/ingress-nginx-grpc
touch {dgraph,kube-addons,ratel}/helmfile.yaml \
kube-addons/issuers.yaml
Environment variables
These environment variables will be used in this project. Create a file called env.sh with the contents below, changing values as appropriate, and then run source env.sh.
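A minimal sketch of env.sh, assuming the variable names used later in this article (DNS_DOMAIN, ACME_ISSUER_NAME) plus hypothetical project and cluster names:
# env.sh — a minimal sketch; DNS_DOMAIN and ACME_ISSUER_NAME are used later
# in this article, while the project and cluster names are hypothetical
# placeholders to adjust for your setup.
export DNS_PROJECT="my-dns-project"          # hypothetical Cloud DNS project ID
export GKE_PROJECT="my-gke-project"          # hypothetical GKE project ID
export GKE_CLUSTER_NAME="ingress-nginx-grpc" # hypothetical cluster name
export GKE_REGION="us-central1"              # hypothetical region
export DNS_DOMAIN="example.com"              # replace with your registered domain
export ACME_EMAIL="user@example.com"         # e-mail to register with Let's Encrypt
export ACME_ISSUER_NAME="letsencrypt-prod"   # or letsencrypt-staging for testing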
You can run this script to verify the environment variables and commands are available:
USER="darkn3rd"
ID="7af3da347073b0ddf20fd7fa0c4e69c7"
VERS="a3ac0c761e49c2ca8cd88f2e0d75d04dd3f4ed1c"
FILE="validate.sh"
URL=https://gist.githubusercontent.com/$USER/$ID/raw/$VERS/$FILE
curl -s $URL | bash -s --
Google project setup
There will be two Google Cloud projects created to provision cloud resources. One project will have the Cloud DNS zone and the other will have the GKE cluster. You can set this up in the web console, or by typing these commands:
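A sketch of that setup, assuming the variables from the env.sh sketch above (project IDs must be globally unique; the billing command may require the beta component on older gcloud releases):
source env.sh
# create the two projects
gcloud projects create $DNS_PROJECT
gcloud projects create $GKE_PROJECT

# link each project to your billing account (replace the placeholder ID)
gcloud billing projects link $DNS_PROJECT --billing-account=XXXXXX-XXXXXX-XXXXXX
gcloud billing projects link $GKE_PROJECT --billing-account=XXXXXX-XXXXXX-XXXXXX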
Provision Cloud Resources
Cloud DNS
Using the Google Cloud SDK gcloud command, you can set up the Cloud DNS zone in GNU Bash with the following command:
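A sketch, assuming the env.sh variables above and deriving the zone name from the domain:
source env.sh
ZONE_NAME="${DNS_DOMAIN//./-}"  # e.g. example-com
# create a public managed zone for your domain
gcloud dns managed-zones create $ZONE_NAME \
  --project $DNS_PROJECT \
  --dns-name "${DNS_DOMAIN}." \
  --description "Public zone for ${DNS_DOMAIN}"

# print the name servers to configure at your registrar
gcloud dns managed-zones describe $ZONE_NAME \
  --project $DNS_PROJECT \
  --format "value(nameServers)"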
You should get a list of name servers that are useful to configure with your registrar. The name servers will vary; here’s an example output:
ns-cloud-d1.googledomains.com.
ns-cloud-d2.googledomains.com.
ns-cloud-d3.googledomains.com.
ns-cloud-d4.googledomains.com.
Google Kubernetes Engine
The default setting when creating a GKE cluster is to allow all nodes to have access to escalate privileges. This is not desirable. This process will create a GKE cluster with minimal privileges and have Workload Identity enabled.
⚠️ NOTE: Though this Kubernetes cluster follows the principle of least privilege with identity principals (Google service accounts) for securing access to cloud resources, the master nodes and worker nodes are accessible from the public Internet. For further security, such as in production environments, you may consider using private master nodes, which require some form of jump host or VPN to access, as well as placing worker nodes on private and public networks (subnets), so that you have further control over which endpoints are explicitly exposed to the public Internet. Additionally, using a CNI plugin that supports network policies, such as Calico, allows you to restrict traffic from both external and internal networks.
Run through the steps below in GNU Bash:
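A minimal sketch of those steps, assuming the env.sh variables above: a dedicated least-privilege Google service account for the worker nodes, then a cluster created with Workload Identity enabled:
source env.sh
# a least-privilege GSA for worker nodes instead of the default compute SA
gcloud iam service-accounts create gke-worker \
  --project $GKE_PROJECT --display-name "GKE worker nodes"
NODE_GSA="gke-worker@${GKE_PROJECT}.iam.gserviceaccount.com"

# grant only the roles nodes need for logs and metrics
for ROLE in roles/logging.logWriter \
            roles/monitoring.metricWriter \
            roles/monitoring.viewer; do
  gcloud projects add-iam-policy-binding $GKE_PROJECT \
    --member "serviceAccount:${NODE_GSA}" --role $ROLE
done

# create the cluster with Workload Identity enabled
gcloud container clusters create $GKE_CLUSTER_NAME \
  --project $GKE_PROJECT \
  --region $GKE_REGION \
  --num-nodes 1 \
  --service-account $NODE_GSA \
  --workload-pool "${GKE_PROJECT}.svc.id.goog"

# configure kubectl credentials
gcloud container clusters get-credentials $GKE_CLUSTER_NAME \
  --project $GKE_PROJECT --region $GKE_REGION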
Grant Access to Cloud DNS
We want to grant the future CertManager and ExternalDNS services the ability to read and write DNS records.
When we deploy an ingress object, ExternalDNS will automatically update DNS records in the Cloud DNS zone, and CertManager will issue a certificate using an ACME CA. This process, called the DNS01 challenge, requires writing records in the Cloud DNS zone to verify that we own the domain.
The best way to do this is to restrict access to only the exact services that need this level of access. This can be done through Workload Identity, which creates a one-to-one relationship between the KSA (Kubernetes Service Account) and the GSA (Google Service Account) using OIDC (OpenID Connect).
Run through the steps here to set up access:
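A sketch of those steps, assuming a single GSA shared by CertManager and ExternalDNS, and KSA names (kube-addons/external-dns, kube-addons/cert-manager) that match the releases deployed in the next section; use one GSA per service for tighter separation:
source env.sh
# create the GSA that will manage records in the Cloud DNS zone
gcloud iam service-accounts create cloud-dns-admin \
  --project $GKE_PROJECT --display-name "Cloud DNS admin"
export GSA="cloud-dns-admin@${GKE_PROJECT}.iam.gserviceaccount.com"

# allow the GSA to read-write records in the DNS project
gcloud projects add-iam-policy-binding $DNS_PROJECT \
  --member "serviceAccount:${GSA}" --role roles/dns.admin

# allow the KSAs to impersonate the GSA through Workload Identity
for KSA in kube-addons/external-dns kube-addons/cert-manager; do
  gcloud iam service-accounts add-iam-policy-binding $GSA \
    --project $GKE_PROJECT \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:${GKE_PROJECT}.svc.id.goog[${KSA}]"
done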
This is only half of the process. The second part happens after deploying the required services, and requires adding annotations to the Kubernetes service account objects to associate them with the related Google service account. This will happen in the next part using the tool helmfile.
Kubernetes Addons
Create the file below as kube-addons/helmfile.yaml:
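A minimal sketch of this file, assuming the Bitnami external-dns chart and the GSA variable exported in the previous step; verify every value against each chart's values.yaml:
cat > kube-addons/helmfile.yaml <<'EOF'
repositories:
  - name: ingress-nginx
    url: https://kubernetes.github.io/ingress-nginx
  - name: bitnami
    url: https://charts.bitnami.com/bitnami
  - name: jetstack
    url: https://charts.jetstack.io

releases:
  - name: ingress-nginx
    namespace: kube-addons
    chart: ingress-nginx/ingress-nginx

  - name: external-dns
    namespace: kube-addons
    chart: bitnami/external-dns
    values:
      - provider: google
        google:
          project: {{ requiredEnv "DNS_PROJECT" }}
        domainFilters:
          - {{ requiredEnv "DNS_DOMAIN" }}
        txtOwnerId: external-dns
        serviceAccount:
          annotations:
            iam.gke.io/gcp-service-account: {{ requiredEnv "GSA" }}

  - name: cert-manager
    namespace: kube-addons
    chart: jetstack/cert-manager
    values:
      - installCRDs: true
        serviceAccount:
          annotations:
            iam.gke.io/gcp-service-account: {{ requiredEnv "GSA" }}
EOF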
This helmfile.yaml will install the ingress-nginx, external-dns, and cert-manager helm charts. The additional configuration necessary to support Workload Identity for access to the Cloud DNS zone is also added.
When ready, you can deploy this with:
source env.sh
helmfile --file kube-addons/helmfile.yaml apply
After CertManager is installed, it will install some custom CRDs that are needed for issuing certificates. We’ll need to add some cluster-wide issuers that will allow us to issue certificates when deploying ingress objects.
Copy the following below and save as kube-addons/issuers.yaml:
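A minimal sketch of this file; since it is deployed with helmfile below, the sketch assumes the bedag raw chart to wrap plain manifests, while the ClusterIssuer specs are standard cert-manager v1 resources using the DNS01 Cloud DNS solver:
cat > kube-addons/issuers.yaml <<'EOF'
repositories:
  - name: bedag
    url: https://bedag.github.io/helm-charts

releases:
  - name: acme-issuers
    namespace: kube-addons
    chart: bedag/raw
    values:
      - resources:
          - apiVersion: cert-manager.io/v1
            kind: ClusterIssuer
            metadata:
              name: letsencrypt-staging
            spec:
              acme:
                server: https://acme-staging-v02.api.letsencrypt.org/directory
                email: {{ requiredEnv "ACME_EMAIL" }}
                privateKeySecretRef:
                  name: letsencrypt-staging
                solvers:
                  - dns01:
                      cloudDNS:
                        project: {{ requiredEnv "DNS_PROJECT" }}
          - apiVersion: cert-manager.io/v1
            kind: ClusterIssuer
            metadata:
              name: letsencrypt-prod
            spec:
              acme:
                server: https://acme-v02.api.letsencrypt.org/directory
                email: {{ requiredEnv "ACME_EMAIL" }}
                privateKeySecretRef:
                  name: letsencrypt-prod
                solvers:
                  - dns01:
                      cloudDNS:
                        project: {{ requiredEnv "DNS_PROJECT" }}
EOF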
When ready, these can be deployed with:
source env.sh
helmfile --file kube-addons/issuers.yaml apply
You can verify the results with:
kubectl get all,clusterissuers --namespace kube-addons
Example Application: Dgraph
Dgraph is a highly performant distributed graph database that can be deployed easily on Kubernetes.
Though the interface to Dgraph is through a web interface using GraphQL or a superset called DQL, it is still a database, and for that reason, it should be secured.
Run the following below to create a DG_ALLOW_LIST of the private IP addresses allocated by the GKE cluster, as well as the IP address that you use to access the Internet.
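A sketch of building that allow list, assuming the env.sh variables above:
source env.sh
# pod and service CIDR blocks allocated by the GKE cluster
CLUSTER_CIDR=$(gcloud container clusters describe $GKE_CLUSTER_NAME \
  --project $GKE_PROJECT --region $GKE_REGION \
  --format 'value(clusterIpv4Cidr)')
SERVICE_CIDR=$(gcloud container clusters describe $GKE_CLUSTER_NAME \
  --project $GKE_PROJECT --region $GKE_REGION \
  --format 'value(servicesIpv4Cidr)')
# the public IP address you use to access the Internet
MY_IP=$(curl -s ifconfig.me)
export DG_ALLOW_LIST="${CLUSTER_CIDR},${SERVICE_CIDR},${MY_IP}/32"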
Create the following file below as dgraph/helmfile.yaml:
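A heavily hedged sketch of this file: the global.ingress and global.ingress_grpc value names and the whitelist wiring are assumptions, so verify them against the Dgraph chart's values.yaml:
cat > dgraph/helmfile.yaml <<'EOF'
repositories:
  - name: dgraph
    url: https://charts.dgraph.io

releases:
  - name: dgraph
    namespace: dgraph
    chart: dgraph/dgraph
    values:
      - global:
          ingress:
            enabled: true
            ingressClassName: nginx
            alpha_hostname: dgraph.{{ requiredEnv "DNS_DOMAIN" }}
            annotations:
              cert-manager.io/cluster-issuer: {{ requiredEnv "ACME_ISSUER_NAME" }}
          ingress_grpc:
            enabled: true
            ingressClassName: nginx
            alpha_grpc_hostname: grpc.{{ requiredEnv "DNS_DOMAIN" }}
            annotations:
              cert-manager.io/cluster-issuer: {{ requiredEnv "ACME_ISSUER_NAME" }}
              nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
        alpha:
          # whitelist wiring is an assumption; Dgraph v21.03 takes a
          # --security "whitelist=..." flag on the alpha nodes
          extraFlags: --security whitelist={{ requiredEnv "DG_ALLOW_LIST" }}
EOF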
When ready, this can be deployed by typing:
source env.sh
helmfile --file dgraph/helmfile.yaml apply
This will deploy 3 Dgraph Alpha nodes, 3 Dgraph Zero nodes, and ingress objects to access the Dgraph Alpha nodes.
There will be two ingress resources created because the rules in NGINX are configured through annotations, which are global for the whole ingress object. Thus we need to create two of these: one for all HTTP traffic, and another for gRPC traffic.
As ingress-nginx sits within the Kubernetes cluster, we can use a service type of ClusterIP, so that the Dgraph servers are not unnecessarily exposed to networks outside of Kubernetes. Unlike ingress resources backed by an external load balancer, the services can remain ClusterIP, keeping the Dgraph services inaccessible to outside networks.
Verify that the ingress resources were created by running:
kubectl get ing --namespace dgraph
You can query the endpoints using:
source env.sh
HTTP_ADDR=dgraph.${DNS_DOMAIN}
GRPC_ADDR=grpc.$DNS_DOMAIN
GIT_ADDR=raw.githubusercontent.com
GIT_PATH=dgraph-io/pydgraph/master/pydgraph/proto/api.proto

# test using HTTP/1.1
curl https://$HTTP_ADDR/health | jq
curl https://$HTTP_ADDR/state | jq

# fetch api.proto file
curl -sOL https://$GIT_ADDR/$GIT_PATH

# test using gRPC
grpcurl -proto api.proto \
$GRPC_ADDR:443 \
api.Dgraph/CheckVersion
Example Application: Ratel
Ratel is a small application that runs in your browser and accesses the Dgraph cluster directly from the browser. There’s a small web server that hosts the front-end code, which needs to be deployed first.
Ratel should never be deployed in the same namespace as Dgraph, because if the application is ever compromised, the malicious actor could then directly read the private graph database. By having separate namespaces, Dgraph can be further secured using network policies with a CNI plugin like Calico or Cilium, and also with a service mesh that supports strict mode.
Create a file below as ratel/helmfile.yaml:
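A minimal sketch, assuming a ratel chart published in the same Dgraph chart repository; verify the chart name and its ingress values:
cat > ratel/helmfile.yaml <<'EOF'
repositories:
  - name: dgraph
    url: https://charts.dgraph.io

releases:
  - name: ratel
    namespace: ratel
    chart: dgraph/ratel
    values:
      - ingress:
          enabled: true
          ingressClassName: nginx
          hostname: ratel.{{ requiredEnv "DNS_DOMAIN" }}
          annotations:
            cert-manager.io/cluster-issuer: {{ requiredEnv "ACME_ISSUER_NAME" }}
EOF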
When ready, this can be deployed with:
source env.sh
helmfile --file ratel/helmfile.yaml apply
You can access it at a URL based on your domain, something like https://ratel.example.com (swapping out example.com for the domain you are using):
Configure the Dgraph server URL to the domain name used for the Dgraph Alpha nodes, such as https://dgraph.example.com (again swapping out example.com for the domain you are using).
Fun with Dgraph
From the Ratel application we can add some data, add a schema, and run a query.
NOTE: These are adapted from the Getting Started tutorial, using snippets from an earlier blog I wrote, Dgraph on AWS: Setting up a horizontally scalable graph database.
Dataset and Schema through gRPC
For the data set and schema, we can use gRPC instead of curl or Ratel.
For this part, I created a set of scripts that can build a Docker image with the needed dependencies and data. The following structure and files will be created:
~/projects/ingress-nginx-grpc
└── examples
└── pydgraph
├── Dockerfile
├── Makefile
├── helmfile.yaml
├── load_data.py
├── requirements.txt
├── sw.nquads.rdf
└── sw.schema
You can download, setup the file structure, build the Docker image, and run it with the data set and schema using the following commands:
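Only the general flow is sketched here; the image tag and the DGRAPH_GRPC_SERVER variable name are hypothetical placeholders:
source env.sh
cd ~/projects/ingress-nginx-grpc/examples/pydgraph
# build the client image with pydgraph and the dataset baked in
docker build --tag pydgraph-client .
# run the loader against the gRPC endpoint exposed by ingress-nginx
docker run --rm -it \
  --env DGRAPH_GRPC_SERVER="grpc.${DNS_DOMAIN}:443" \
  pydgraph-client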
This will use the grpc FQDN, e.g. grpc.example.com, to interact with the Dgraph Alpha servers using gRPC.
The client (the Python script load_data.py) will send encrypted traffic (h2) to the Dgraph Alpha server. The ingress-nginx controller will terminate the TLS connection, and then send the traffic to one of the Dgraph Alpha pods unencrypted (h2c).
The ingress-nginx controller can do this because a certificate was issued from the ACME CA server using automation from CertManager. To make all of this happen, DNS address records were created in Cloud DNS through ExternalDNS.
Query using Ratel
Performing the query using Ratel is more stimulating as you can get both the JSON and visual feedback.
In Ratel, which should be at https://ratel.example.com (substituting example.com for the domain you are using), select Console, then Query, and copy and paste the following:
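A query adapted from Dgraph’s Getting Started tutorial for this Star Wars dataset looks like this:
{
  me(func: allofterms(name, "Star Wars"), orderasc: release_date)
    @filter(ge(release_date, "1980")) {
    name
    release_date
    revenue
    running_time
    director {
      name
    }
    starring (orderasc: name) {
      name
    }
  }
}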
This should look something like this:
Troubleshooting
Certificates
To test the HTTPS endpoint with curl, you can run:
source env.sh
curl -svvI https://dgraph.$DNS_DOMAIN/health
One problem I ran into was that an earlier macOS Catalina (version 10.15.2) from around 2019 had expired Let’s Encrypt root certificates. The only way around this that I could find was to upgrade macOS or buy a new computer. I didn’t have the same issue with Ubuntu.
If you need to inspect the certificate itself, you can run this:
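For example, with openssl (assuming the dgraph host name used earlier):
source env.sh
openssl s_client -connect dgraph.${DNS_DOMAIN}:443 \
  -servername dgraph.${DNS_DOMAIN} </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates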
Cleanup
🚨 IMPORTANT: It is vital to delete the persistent volume claims, i.e. pvc resources, before deleting the cluster. Otherwise, the storage will remain and eat up costs.
Deleting the other Kubernetes resources is optional, as they will be removed along with the destruction of the cluster.
Kubernetes Resources
You can clean up Kubernetes resources with the following commands:
⚠️ IMPORTANT: Delete the PVC so that no storage is left over when the Kubernetes cluster is deleted.
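A sketch using helmfile’s destroy subcommand, mirroring the deploy steps above; double-check the PVC list before deleting wholesale:
source env.sh
helmfile --file ratel/helmfile.yaml destroy
helmfile --file dgraph/helmfile.yaml destroy
helmfile --file kube-addons/helmfile.yaml destroy

# delete the leftover PVCs from the Dgraph StatefulSets
kubectl delete pvc --namespace dgraph --all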
Google Cloud resources
You can clean up the Google Cloud resources with the following commands.
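A sketch assuming the env.sh variables and zone name used earlier; a managed zone must be emptied of non-NS/SOA records before it can be deleted (record cleanup not shown):
source env.sh
# delete the GKE cluster
gcloud container clusters delete $GKE_CLUSTER_NAME \
  --project $GKE_PROJECT --region $GKE_REGION

# delete the Cloud DNS zone
gcloud dns managed-zones delete "${DNS_DOMAIN//./-}" --project $DNS_PROJECT

# or delete the projects entirely
gcloud projects delete $GKE_PROJECT
gcloud projects delete $DNS_PROJECT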
Resources
Here are some articles on gRPC, Google Cloud, and configuration related to the kube-addons installed earlier.
gRPC Articles
- gRPC Load Balancing. 2017-06-15
- gRPC vs HTTP APIS. 2019-11-18
- HTTP/2: The secret weapon of gRPC. 2021-06-25
- How gRPC is faster? 2020-02-05
cert-manager
- Deploy cert-manager on Google Kubernetes Engine (GKE) and create SSL certificates for Ingress using Let’s Encrypt
- Manage certificate lifecycle using Cert-Manager
- Integrating cert-manager with Google Cloud Certificate Authority Service
- ACME DNS01 Challenge with Google Cloud DNS docs
Conclusion
For microservices, gRPC is ubiquitous, and you would think it would be straightforward and not too complex to set up, but unfortunately that is not the case.
Complexities of gRPC
Part of this complexity is the nature of HTTP/2, which is required for gRPC. This protocol is encrypted by default with TLS certificates (called h2 mode), but can operate without encryption (called h2c mode). The HTTP/2 protocol is stateful and uses a single connection for many sessions, where HTTP/1.1 is stateless and has a single connection per session. With traditional load balancers that don’t yet support HTTP/2, traffic can accumulate on a single server.
For communication with gRPC, either the service supports reflection, so that you can discover the interface, or you need to use a protocol buffer definition to facilitate communication.
Complexities of documentation
Outside of the technologies, open source documentation, being volunteer-driven, can leave a lot to be desired. In the case of the gRPC documentation for ingress-nginx, it was bad. Previously, the docs covered a fortune-teller app that was built with Bazel; then Bazel changed (with breaking changes), nothing compiled, and the original author was absent. For a long time, the docs did not have a valid, reproducible path to demonstrate using gRPC.
Fortunately, last July, a volunteer (thank you, Long Wu Yuan) swapped it out for go-grpc-greeter-server, which is built with a Makefile and BuildKit (the docker buildx plugin), so hopefully the documentation is better.
In any event, this process can be complex, especially when supporting mixed HTTP and gRPC traffic with cert-manager and ingress-nginx. Given that situation, I started making tutorials like this to help people put all of this together and find success with gRPC load balancing.
Continued Adventures of gRPC
I would like to cover more north-south traffic solutions with ingress controllers such as NGINX Kubernetes Ingress (kubernetes-ingress), Ambassador, Contour, Traefik, Gloo, and maybe Caddy. I was inspired by the ArgoCD ingress documentation in this area, especially as ArgoCD uses gRPC and did some robust testing here.
For east-west traffic, which is important for clients that operate from within the same Kubernetes cluster, I hope to have some articles on service meshes like NSM, Consul Connect, Istio, and Linkerd, as well as advanced CNI drivers that use eBPF, like Cilium.
Long term, I would like to explore advanced use cases, as this leads into progressive delivery (Spinnaker, ArgoCD, FluxCD, Drone), blue-green and canary deployments, o11y (visualization, metrics, tracing, profiling, log aggregation), and policy-as-code (OPA).
I hope you enjoyed this, thanks for reading.