GKE with NGINX Service Mesh 2

Integrating an Ingress into NGINX Service Mesh

Joaquín Menchaca (智裕)
15 min read · Sep 18, 2022


In the previous article, I demonstrated how to deploy NSM (NGINX Service Mesh) for east-west traffic. This allows clients and servers to communicate transparently with mTLS (mutual TLS), where the client and server authenticate each other and traffic is encrypted, for a mixture of gRPC and HTTPS traffic.

I also showed how to deploy a real-world scenario: the distributed graph database Dgraph, and a client that connects to Dgraph through the service mesh.

Service Mesh with Pydgraph and Dgraph

Overview

This tutorial continues from there to add an ingress controller (NGINX+ Ingress Controller) that is integrated with the service mesh, so that communication between the ingress controller and members within the mesh uses mTLS.

With these two solutions, both gRPC and HTTPS are supported for north-south traffic (ingress endpoint) and east-west traffic (service mesh).

For this solution, we’ll install a graphical client application called Dgraph Ratel, which can access Dgraph through endpoints outside of the Kubernetes cluster.

Service Mesh with Ratel and NGINX Ingress Controller

To follow this guide, you must complete the previous article first, as this one depends on NSM being installed and fully operational.

📔 NOTE: This was tested with the versions listed below and may not work if your versions are significantly different.

* Kubernetes API v1.22
* kubectl v1.22
* helm v3.8.2
* helmfile v0.144.0
* gcloud 402.0.0
* gsutil 5.13
* external-dns v0.12.2
* cert-manager v1.9.1
* NGINX Ingress Controller 2.3.0
* NGINX Service Mesh 1.5.0
* nginx-meshctl v1.5.0
* Docker 20.10.17
* Dgraph v21.03.2

Requirements

Previous Article

The previous article walks through installing NGINX Service Mesh and Dgraph. These components are required.

Accounts

Knowledge

Tools (Required)

  • Google Cloud SDK (gcloud command) to interact with Google Cloud
  • Kubernetes client (kubectl command) to interact with Kubernetes
  • Helm (helm command) to install Kubernetes packages
  • helm-diff plugin to preview the changes that will be deployed.
  • helmfile (helmfile command) to automate installing many helm charts
  • Docker Engine (docker command) to automate running pydgraph client and all its dependencies locally.
  • NSM command line tool (nginx-meshctl) is an optional tool used to deploy and interact with the service mesh.
    NOTE: This tool is gated behind https://downloads.f5.com that seems to have a lot of problems, so this tool is strictly optional for this tutorial.

Tools (Recommended)

  • POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested using these shells on macOS and Ubuntu Linux.
  • GNU stream editor (sed) and GNU grep (grep): the scripts were tested with the GNU versions; the macOS or BSD equivalents may not work.
  • curl (curl): tool to interact with web servers from the command line.
  • jq (jq): a JSON processor that can transform and extract objects from JSON, as well as provide colorized JSON output for greater readability.
  • grpcurl (grpcurl): tool to interact with gRPC servers from the command line.

These tools can be installed with Homebrew on macOS and with Chocolatey and MSYS2 on Windows.

Project Setup

Directory Structure

The directory structure adds some new items (the cert_manager, external_dns, nginx_ic, and ratel directories, plus dgraph/vs.yaml) in addition to the items created in the previous article.

~/projects/nsm
├── clients
│   ├── examples
│   │   └── pydgraph
│   │       ├── Dockerfile
│   │       ├── Makefile
│   │       ├── helmfile.yaml
│   │       ├── load_data.py
│   │       ├── requirements.txt
│   │       ├── sw.nquads.rdf
│   │       └── sw.schema
│   └── fetch_scripts.sh
├── dgraph
│   ├── dgraph_allow_lists.sh
│   ├── helmfile.yaml
│   └── vs.yaml
├── kube_addons
│   ├── cert_manager
│   │   ├── helmfile.yaml
│   │   └── issuers.yaml
│   ├── external_dns
│   │   └── helmfile.yaml
│   └── nginx_ic
│       ├── docker_keys.sh
│       └── helmfile.yaml
├── nsm
│   └── helmfile.yaml
├── o11y
│   ├── fetch_manifests.sh
│   └── helmfile.yaml
└── ratel
    ├── helmfile.yaml
    └── vs.yaml

You can add these additional items with the following commands in Bash:
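Here is a minimal sketch of those commands, assuming you are at the root of the project (~/projects/nsm) and using the paths from the tree above:

mkdir -p ./kube_addons/{cert_manager,external_dns,nginx_ic} ./ratel

touch ./kube_addons/cert_manager/{helmfile.yaml,issuers.yaml}
touch ./kube_addons/external_dns/helmfile.yaml
touch ./kube_addons/nginx_ic/{docker_keys.sh,helmfile.yaml}
touch ./ratel/{helmfile.yaml,vs.yaml}
touch ./dgraph/vs.yaml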

Environment Variables

These environment variables will be used in this project. Create a file called dns_env.sh with the contents below:
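A sketch of dns_env.sh follows; the variable names and values here are assumptions, so substitute your own project ID and registered domain:

# dns_env.sh
export DNS_PROJECT_ID="my-dns-project"        # assumed: project hosting the Cloud DNS zone
export DNS_DOMAIN="example.com"               # assumed: replace with your registered domain
export DNS_ZONE_NAME="${DNS_DOMAIN//./-}"     # e.g. example-com
export ACME_ISSUER_NAME="letsencrypt-prod"    # or letsencrypt-staging for testing
export ACME_ISSUER_EMAIL="user@example.com"   # assumed: email for ACME registration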

When finished with this script, combine gke_env.sh from the previous article with dns_env.sh into env.sh and source the result. In a POSIX shell, you can do that with the following commands:

cat gke_env.sh dns_env.sh > env.sh
source env.sh

Google project setup

In the previous article, we set up GCR to publish Docker images and GKE to run workloads. In this article, we’ll create a project to host the Cloud DNS zone:
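A sketch of that command, assuming DNS_PROJECT_ID from dns_env.sh and that billing is linked the same way as the other projects:

source env.sh
gcloud projects create $DNS_PROJECT_ID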

Provision Cloud Resources

Cloud DNS

Using the Google Cloud SDK gcloud command, you can set up the Cloud DNS zone with the following commands in GNU Bash:
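A sketch of those commands, assuming DNS_PROJECT_ID, DNS_DOMAIN, and DNS_ZONE_NAME from env.sh:

# create the managed zone for the domain
gcloud dns managed-zones create $DNS_ZONE_NAME \
  --project $DNS_PROJECT_ID \
  --dns-name "${DNS_DOMAIN}." \
  --description "DNS zone for ${DNS_DOMAIN}"

# list the name servers to configure at your registrar
gcloud dns managed-zones describe $DNS_ZONE_NAME \
  --project $DNS_PROJECT_ID \
  --format "value(nameServers)" | tr ';' '\n'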

You should get a list of name servers to configure with your registrar. The name servers will vary; here’s an example output:

ns-cloud-d1.googledomains.com.
ns-cloud-d2.googledomains.com.
ns-cloud-d3.googledomains.com.
ns-cloud-d4.googledomains.com.

Google Kubernetes Engine

In the previous article, this was provisioned in the $GKE_PROJECT_ID project.

Google Container Registry

In the previous article, this was provisioned in the $GCR_PROJECT_ID project.

Grant Access to Cloud DNS with Workload Identity

Both CertManager and ExternalDNS require the ability to read and write DNS records.

This step will start the process of setting up a one-to-one relationship between the KSA (Kubernetes Service Account) and the GSA (Google Service Account) using an OIDC (OpenID Connect) provider.

Run through the steps below to set up access:
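Here is a sketch of those steps for one GSA shared by CertManager and ExternalDNS; the GSA name, KSA names, and kube-addons namespace are assumptions:

source env.sh

export GSA_NAME="cloud-dns-admin"                                    # assumed name
export GSA_EMAIL="${GSA_NAME}@${GKE_PROJECT_ID}.iam.gserviceaccount.com"

# create the GSA in the GKE project
gcloud iam service-accounts create $GSA_NAME --project $GKE_PROJECT_ID

# grant the GSA rights to manage records in the DNS project
gcloud projects add-iam-policy-binding $DNS_PROJECT_ID \
  --member "serviceAccount:${GSA_EMAIL}" \
  --role roles/dns.admin

# allow each KSA to impersonate the GSA (repeat for cert-manager)
gcloud iam service-accounts add-iam-policy-binding $GSA_EMAIL \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${GKE_PROJECT_ID}.svc.id.goog[kube-addons/external-dns]"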

Deploy Kubernetes Addons

There are three add-ons that will add functionality to the GKE cluster, which should be installed in the following order:

  1. CertManager to issue TLS certificates used to secure web traffic
  2. NGINX Ingress Controller to provide a layer 7 load balancer for services within the service mesh
  3. ExternalDNS to automate updating DNS records of deployed services

CertManager

Create the following below and save it as ./kube_addons/cert_manager/helmfile.yaml:
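The original helmfile is not reproduced here, but a minimal sketch might look like the following; the chart version matches the tested versions above, and GSA_EMAIL is the variable from the Workload Identity step (an assumption):

repositories:
  - name: jetstack
    url: https://charts.jetstack.io

releases:
  - name: cert-manager
    namespace: kube-addons
    chart: jetstack/cert-manager
    version: v1.9.1
    values:
      - installCRDs: true
        # annotate the KSA for Workload Identity access to Cloud DNS
        serviceAccount:
          annotations:
            iam.gke.io/gcp-service-account: '{{ requiredEnv "GSA_EMAIL" }}'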

When ready, you can deploy this using the following:

source env.sh
helmfile --file ./kube_addons/cert_manager/helmfile.yaml apply

Create the following below and save it as ./kube_addons/cert_manager/issuers.yaml:
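A sketch of what issuers.yaml might contain, using the bedag/raw chart to manage a ClusterIssuer through helmfile; the issuer name and ACME_ISSUER_EMAIL variable are assumptions (a letsencrypt-staging issuer can be added the same way with the staging ACME server URL):

repositories:
  - name: bedag
    url: https://bedag.github.io/helm-charts/

releases:
  - name: acme-issuers
    namespace: kube-addons
    chart: bedag/raw
    values:
      - resources:
          - apiVersion: cert-manager.io/v1
            kind: ClusterIssuer
            metadata:
              name: letsencrypt-prod
            spec:
              acme:
                email: '{{ requiredEnv "ACME_ISSUER_EMAIL" }}'
                server: https://acme-v02.api.letsencrypt.org/directory
                privateKeySecretRef:
                  name: letsencrypt-prod
                solvers:
                  - dns01:
                      cloudDNS:
                        project: '{{ requiredEnv "DNS_PROJECT_ID" }}'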

When ready, you can deploy this using the following:

source env.sh
helmfile --file ./kube_addons/cert_manager/issuers.yaml apply

NGINX Ingress Controller: access NGINX+ images

This is a more complex component because, currently, in order to have NGINX Ingress Controller integrated with NGINX Service Mesh, you need a commercial license of NGINX. For a trial license, see https://www.nginx.com/free-trial-request/.

Follow the instructions to download nginx-repo.crt, nginx-repo.key, and optionally nginx-repo.jwt. You will need to install these into your local Docker environment to be able to access the private repository (below).

First, copy the keys to a local location. Assuming these are in $HOME/Downloads, you can copy them using this command in Bash:

cp ~/Downloads/nginx-repo.{jwt,key,crt} ./kube_addons/nginx_ic

Save the following script to ./kube_addons/nginx_ic/docker_keys.sh:
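A sketch of what docker_keys.sh might look like; it follows the documented Docker client-certificate layout, where a directory named after the registry holds client.cert and client.key:

#!/usr/bin/env bash
# install NGINX repo certs so Docker can pull from the private registry
REGISTRY="private-registry.nginx.com"

# Docker Desktop (macOS) reads from ~/.docker/certs.d; Linux uses /etc/docker/certs.d
if [[ "$(uname -s)" == "Darwin" ]]; then
  DEST="$HOME/.docker/certs.d/$REGISTRY"
  SUDO=""
else
  DEST="/etc/docker/certs.d/$REGISTRY"
  SUDO="sudo"
fi

$SUDO mkdir -p "$DEST"
$SUDO cp nginx-repo.crt "$DEST/client.cert"
$SUDO cp nginx-repo.key "$DEST/client.key"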

When ready, you can run this with:

pushd ./kube_addons/nginx_ic/ && bash docker_keys.sh && popd

If you are using Docker Desktop on Mac, you will need to restart Docker Desktop so that it copies the keys into its virtual machine, the HyperKit system that runs the Docker engine.

For Linux, nothing needs to be done, as the new credentials will be immediately available.

NGINX Ingress Controller: republish private images

Now it is time to pull the images from the private repository and push them to the GCR repository.

NOTE: GCR was enabled and access was granted using gsutil in the previous article.

The easiest solution to access private images is to republish them to GCR.

NOTE: the alternative is more complex, as it requires using JWT credentials to do image pulls directly from NGINX’s private repository; see Using the NGINX IC Plus JWT token in a Docker Config Secret for more information.

Run the following steps to republish to GCR:
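A sketch of those steps, assuming GCR_PROJECT_ID from env.sh and the App Protect (nginx-ic-nap) image for IC 2.3.0; use the nginx-ic repository instead if you are not using App Protect:

source env.sh

VERSION="2.3.0"
SRC="private-registry.nginx.com/nginx-ic-nap/nginx-plus-ingress"
DEST="gcr.io/${GCR_PROJECT_ID}/nginx-plus-ingress"

docker pull $SRC:$VERSION
docker tag  $SRC:$VERSION $DEST:$VERSION
docker push $DEST:$VERSION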

NGINX Ingress Controller: Helmfile configuration

Create the following at ./kube_addons/nginx_ic/helmfile.yaml:
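A minimal sketch of such a helmfile, assuming the chart version that ships IC 2.3.0 and the image republished to GCR above; the values shown are assumptions based on the chart's NSM-integration settings:

repositories:
  - name: nginx-stable
    url: https://helm.nginx.com/stable

releases:
  - name: nginx-ingress
    namespace: kube-addons
    chart: nginx-stable/nginx-ingress
    version: 0.14.0
    values:
      - controller:
          nginxplus: true
          enableCustomResources: true
          image:
            repository: gcr.io/{{ requiredEnv "GCR_PROJECT_ID" }}/nginx-plus-ingress
            tag: "2.3.0"
          appprotect:
            enable: {{ env "NGINX_APP_PROTECT" | default "false" }}
          # integrate the ingress controller into NGINX Service Mesh
          nginxServiceMesh:
            enable: true
            enableEgress: true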

If you would like to use App Protect to secure access to the backend database (typically a good idea), you can enable this feature with:

export NGINX_APP_PROTECT=true

When ready, you can deploy this with the following command:

source env.sh
helmfile --file ./kube_addons/nginx_ic/helmfile.yaml apply

ExternalDNS

Create the following at ./kube_addons/external_dns/helmfile.yaml:
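A sketch of an ExternalDNS helmfile using the Bitnami chart; GSA_EMAIL is the variable from the Workload Identity step (an assumption):

repositories:
  - name: bitnami
    url: https://charts.bitnami.com/bitnami

releases:
  - name: external-dns
    namespace: kube-addons
    chart: bitnami/external-dns
    values:
      - provider: google
        google:
          project: '{{ requiredEnv "DNS_PROJECT_ID" }}'
        domainFilters:
          - '{{ requiredEnv "DNS_DOMAIN" }}'
        policy: sync
        serviceAccount:
          annotations:
            iam.gke.io/gcp-service-account: '{{ requiredEnv "GSA_EMAIL" }}'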

When ready, you can deploy this with the following command:

source env.sh
helmfile --file ./kube_addons/external_dns/helmfile.yaml apply

Verify Deployment

You can verify all the components in the kube-addons namespace by running:

kubectl get all,clusterissuer --namespace kube-addons

This should show something like the following:

Dgraph Ratel

Dgraph has a graphical administrative, query, and visualization tool called Ratel. You can install this with the following steps.

Create the following in ./ratel/helmfile.yaml:
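A sketch of what this helmfile might contain, using the bedag/raw chart to deploy a Deployment and Service for the Ratel image; the image tag and port are assumptions:

repositories:
  - name: bedag
    url: https://bedag.github.io/helm-charts/

releases:
  - name: ratel
    namespace: ratel
    chart: bedag/raw
    values:
      - resources:
          - apiVersion: apps/v1
            kind: Deployment
            metadata:
              name: ratel
              labels: { app: ratel }
            spec:
              replicas: 1
              selector:
                matchLabels: { app: ratel }
              template:
                metadata:
                  labels: { app: ratel }
                spec:
                  containers:
                    - name: ratel
                      image: dgraph/ratel:v21.03.2
                      ports:
                        - containerPort: 8000
          - apiVersion: v1
            kind: Service
            metadata:
              name: ratel
            spec:
              selector: { app: ratel }
              ports:
                - port: 80
                  targetPort: 8000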

When ready, this can be deployed with the following command:

source env.sh

kubectl get namespace "ratel" > /dev/null 2> /dev/null \
  || kubectl create namespace "ratel" \
  && kubectl label namespaces "ratel" name="ratel"

helmfile --file ./ratel/helmfile.yaml template \
  | nginx-meshctl inject \
  | kubectl apply --namespace "ratel" --filename -

This will install Ratel as a member of the service mesh network.

Normally, we would NOT want to do this, because Ratel does NOT need direct access to Dgraph; it is a wholly independent client application that can connect to any Dgraph database endpoint.

However, once the NGINX ingress controller is configured to integrate with NGINX Service Mesh, it will only work for services that are a part of the mesh network.

You can check the results of the deployment with the following command:

kubectl get all --namespace ratel

The results should look something like the following:

Notice that the Ratel pod shows 2/2 ready, meaning there are two containers ready: one for the Ratel web service and another for the proxy sidecar container.

Deploy Endpoints

Now that an ingress controller, the layer 7 load balancer, is available, we can configure some endpoints to use the ingress. NGINX supports a custom CRD for this process: the VirtualServer resource.

Additionally, if App Protect is enabled, we can also configure a Policy resource to further secure the endpoint.

Ratel Endpoint

Create the following file at ./ratel/vs.yaml:
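A sketch of what vs.yaml might contain, again wrapping the resource in the bedag/raw chart so it can be deployed with helmfile; the TLS secret name is an assumption, and the cert-manager annotation-style keys follow the VirtualServer tls.cert-manager integration:

repositories:
  - name: bedag
    url: https://bedag.github.io/helm-charts/

releases:
  - name: ratel-vs
    namespace: ratel
    chart: bedag/raw
    values:
      - resources:
          - apiVersion: k8s.nginx.org/v1
            kind: VirtualServer
            metadata:
              name: ratel
            spec:
              host: ratel.{{ requiredEnv "DNS_DOMAIN" }}
              tls:
                secret: ratel-tls
                cert-manager:
                  cluster-issuer: '{{ env "ACME_ISSUER_NAME" | default "letsencrypt-prod" }}'
              upstreams:
                - name: ratel
                  service: ratel
                  port: 80
              routes:
                - path: /
                  action:
                    pass: ratel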

When ready to deploy this, you can run the following:

source env.sh
helmfile --file ./ratel/vs.yaml apply

You can see the newly deployed resources with the following command:

kubectl get all,virtualserver,certificate --namespace ratel

This should look something like the following:

The VirtualServer is a custom resource from NGINX ingress controller that supports features that are not available in the generic Ingress resource.

When the VirtualServer resource is deployed, ExternalDNS will update the appropriate DNS records, and CertManager will issue a certificate, which shows up here as the Certificate resource and corresponding Secret resource to store the certificate.

You can view the application at https://ratel.example.com, replacing example.com with the domain that you are using.

Dgraph Endpoint

Dgraph was already deployed in the previous article. Because Dgraph is a backend database, we should take measures to secure it.

One way to do this is with NGINX App Protect. If this was enabled earlier during installation of the NGINX ingress controller, you can set a policy to secure traffic.

Create the following file at ./dgraph/vs.yaml:
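A sketch of what this might contain: an access-control Policy plus VirtualServers for Dgraph's HTTP (8080) and gRPC (9080) endpoints. The Dgraph Alpha service name (demo-dgraph-alpha) and secret names are assumptions carried over from the previous article:

repositories:
  - name: bedag
    url: https://bedag.github.io/helm-charts/

releases:
  - name: dgraph-vs
    namespace: dgraph
    chart: bedag/raw
    values:
      - resources:
          - apiVersion: k8s.nginx.org/v1
            kind: Policy
            metadata:
              name: dgraph-allow
            spec:
              accessControl:
                allow:
                  - '{{ requiredEnv "MY_IP_ADDRESS" }}'
          - apiVersion: k8s.nginx.org/v1
            kind: VirtualServer
            metadata:
              name: dgraph-http
            spec:
              host: dgraph.{{ requiredEnv "DNS_DOMAIN" }}
              policies:
                - name: dgraph-allow
              tls:
                secret: dgraph-http-tls
                cert-manager:
                  cluster-issuer: '{{ env "ACME_ISSUER_NAME" | default "letsencrypt-prod" }}'
              upstreams:
                - name: dgraph-alpha-http
                  service: demo-dgraph-alpha
                  port: 8080
              routes:
                - path: /
                  action:
                    pass: dgraph-alpha-http
          - apiVersion: k8s.nginx.org/v1
            kind: VirtualServer
            metadata:
              name: dgraph-grpc
            spec:
              host: grpc.{{ requiredEnv "DNS_DOMAIN" }}
              policies:
                - name: dgraph-allow
              tls:
                secret: dgraph-grpc-tls
                cert-manager:
                  cluster-issuer: '{{ env "ACME_ISSUER_NAME" | default "letsencrypt-prod" }}'
              upstreams:
                - name: dgraph-alpha-grpc
                  service: demo-dgraph-alpha
                  port: 9080
                  type: grpc
              routes:
                - path: /
                  action:
                    pass: dgraph-alpha-grpc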

If you have multiple IP addresses you wish to allow, you can add them under spec.accessControl.allow in the above manifest. For example, I have one or two coffee shops that I use, so I could add their IP addresses.

When ready to deploy, you can run the following:

source env.sh

# get outbound IP address
export MY_IP_ADDRESS=$(curl --silent ifconfig.me)

# enable if NGINX Ingress was installed with this setting
export NGINX_APP_PROTECT=true

helmfile --file ./dgraph/vs.yaml apply

You can check the results of newly deployed resources with:

kubectl get all,virtualserver,certificate,policy --namespace dgraph

This should look something like the following:

Testing Access

Testing Access using curl and grpcurl

Afterward, you can test access with the following commands:

source env.sh

curl dgraph.${DNS_DOMAIN}/health | jq
curl dgraph.${DNS_DOMAIN}/state | jq

# fetch local api.proto
curl -sOL https://raw.githubusercontent.com/dgraph-io/pydgraph/master/pydgraph/proto/api.proto

grpcurl -proto api.proto \
  grpc.$DNS_DOMAIN:443 \
  api.Dgraph/CheckVersion

Testing Access using Ratel

Open the Ratel application and connect to https://dgraph.example.com, replacing example.com with the domain that you are using:

If the icon is green, then Ratel is able to connect to the Dgraph database through the Dgraph endpoint, e.g. https://dgraph.example.com.

In Ratel, select Console, Query, then copy and paste the following:
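A query like the following works against the Star Wars sample data loaded in the previous article; the predicate names assume the standard Dgraph tutorial schema:

{
  me(func: allofterms(name@en, "Star Wars")) @filter(ge(release_date, "1980")) {
    name@en
    release_date
    revenue
    running_time
    director {
      name@en
    }
    starring {
      name@en
    }
  }
}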

NOTE: The data set and schema should already be populated into the database using the steps in the previous tutorial.

This should look something like this:

The Future: Traffic Access Control Policies

NGINX Service Mesh has some support for Traffic Access Control policies using SMI (Service Mesh Interface).

It would be nice to use this feature, because you could deny all traffic and allow only designated services on the mesh to connect to the Dgraph graph database. In this case, those would be the pydgraph client and the NGINX ingress controller, which should be able to connect to Dgraph through the service mesh.

NGINX Service Mesh set to Deny

In order to get started, you would need to reconfigure NSM to deny all traffic, and then allow traffic by deploying a TrafficTarget custom resource.

Should you want to experiment with this feature, you can configure this in the existing NSM with the following:

# set global setting that will be picked up in helmfile
export NSM_ACCESS_CONTROL_MODE=deny

# patch existing deployment
helmfile --file ./nsm/helmfile.yaml apply

# delete existing running pods so new pods pick up the config-map
kubectl delete --namespace "nginx-mesh" \
  $(kubectl get pods \
      --namespace "nginx-mesh" \
      --selector "app.kubernetes.io/name=nginx-mesh-api" \
      --output name
  )

# verify currently set to 'deny'
nginx-meshctl config | jq -r .accessControlMode

Ultimately, this solution will not work, for one reason:

  1. Only HTTP traffic is denied; gRPC traffic still works regardless.

For this issue, I let the team know by filing a GitHub issue and received some feedback:

Our gRPC support does not have feature parity with HTTP. The TrafficTarget objects do not effect gRPC traffic. This deserves an explanation in the documentation (ref).

From the SMI specification, it is currently not supported:

gRPC — there should be a gRPC specific traffic spec. As part of the first version, this has been left out as HTTPRouteGroup can be used in the interim (ref).

Thus, in the current implementation, you cannot restrict gRPC traffic with any configuration. This comes as a shock, given the popularity of gRPC for microservices and distributed database clusters.

Addendum: alternative to using a registered domain

For an alternative to registering a domain and forwarding domain resolution to Cloud DNS, you can do the following to use untrusted certificates and simulate domain resolution:

  • Set ACME_ISSUER_NAME=letsencrypt-staging before deploying any VirtualServer.
  • Edit /etc/hosts (or equivalent) on your local system to match the DNS records, update the local DNS cache, or configure your DNS client to point to Cloud DNS as the resolver for that domain.
  • When accessing a service through the web, like Ratel, you will need to add an exception when prompted about an untrusted website.
  • When using the curl command, you will have to use the curl -k option.
  • When using the load_data.py script, you will need to copy the private certificate from the Kubernetes secret with the kubectl cp command and then copy it into the running Docker container with the docker cp command. Then, when running the load_data.py script, use the appropriate command line options to point to the path of the private certificate. Run load_data.py --help for more information on command line options.

Cleanup

Delete Kubernetes Resources

This will delete the resources from this tutorial and the previous tutorial.
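A sketch of the cleanup, assuming the namespaces and helmfiles used across both tutorials (the pydgraph-client namespace is an assumption from the previous article):

source env.sh

# endpoints and clients
helmfile --file ./dgraph/vs.yaml destroy
kubectl delete namespace ratel pydgraph-client

# dgraph and the service mesh
helmfile --file ./dgraph/helmfile.yaml destroy
helmfile --file ./nsm/helmfile.yaml destroy

# cluster add-ons
helmfile --file ./kube_addons/nginx_ic/helmfile.yaml destroy
helmfile --file ./kube_addons/external_dns/helmfile.yaml destroy
helmfile --file ./kube_addons/cert_manager/issuers.yaml destroy
helmfile --file ./kube_addons/cert_manager/helmfile.yaml destroy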

Purge Google Cloud Resources

This will delete the cloud resources from this tutorial and the previous tutorial.
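A sketch of the purge commands; GKE_CLUSTER_NAME and GKE_REGION are assumed variable names from the previous article's gke_env.sh, and the managed zone must be emptied of records added by ExternalDNS before it can be deleted:

source env.sh

gcloud container clusters delete $GKE_CLUSTER_NAME \
  --project $GKE_PROJECT_ID --region $GKE_REGION

gcloud dns managed-zones delete $DNS_ZONE_NAME \
  --project $DNS_PROJECT_ID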

Resources

Blog Source Code

These are further notes and code that I created while testing the NGINX Service Mesh solution.

Google Cloud

F5 NGINX

F5 Aspen Mesh

Articles

Other

Conclusion

For now, this concludes the journey to deploy a service mesh platform (NGINX Service Mesh) with an integrated ingress controller (NGINX Ingress Controller).

Takeaways

The takeaways from this exercise are the following:

  • Deploying an integrated ingress controller with NGINX Ingress Controller.
  • Using a private container registry to host NGINX+ image artifacts.
  • Using the new NGINX Ingress Controller custom resources VirtualServer and Policy.
  • Enabling App Protect and adding a policy to further protect services.
  • Integrating CertManager with VirtualServer to secure web traffic with trusted TLS certificates.
  • Integrating ExternalDNS to automate DNS record upserts when deploying VirtualServer.
  • Using Workload Identity with GKE to grant secure access to Cloud DNS.

About the NGINX Service Mesh Solution

The NGINX Service Mesh was deployed with strict mode (mTLS), so that those that are not members of the service mesh cannot connect to any services on the mesh, such as the private Dgraph graph database.

In this article, we added an integrated ingress controller where the ingress controller communicates to the mesh using mTLS. This adds an essential layer of security, because traffic between the ingress controller (layer 7 load balancer) and members of the service mesh are encrypted and authenticated using mTLS.

Limitations of NGINX Ingress Controller

But there’s a catch: the ingress controller cannot serve non-mesh members after it is integrated.

Thus, in order to use NGINX Ingress Controller, all services MUST be integrated into the service mesh, regardless of whether the new service needs access to other services within the mesh, such as the private Dgraph graph database.

In the case of the Ratel client application, we do not want anything running in the small web-service container to access the mesh. Ratel is a client that loads in your browser and accesses Dgraph through an endpoint; the service does nothing in the backend except serve the client application.

Workarounds for the Limitation

So what can we do?

There are four possible workarounds that come to mind to get around this limitation:

  1. Deploy two ingress controllers, one to service the meshed services, and one that services non-meshed services. This adds costs for the public IP address and external load balancer.
  2. Set mTLS mode to permissive, to allow both non-mesh and mesh traffic to the services, which is less secure.
  3. Introduce a network plugin that supports network policies, such as Calico. This can work whether the service mesh sets mTLS to permissive or strict.
  4. Set access control mode to deny and then use SMI traffic access control policies to control access to the service, and only allow the ingress controller and designated client pods to access the database†.

† Note that the SMI traffic access control feature is currently still in development and not recommended for production. Ultimately, however, this solution will NOT work for gRPC, as gRPC traffic will always go through regardless of these policies, which is a significant limitation (ref. issue 76).

Required Commercial License Expense

NGINX Service Mesh advertises that this solution is free:

NGINX Service Mesh is free, optimized for developers, and the lightest, easiest way to implement mTLS and end-to-end encryption in Kubernetes for both east‑west (service-to-service) traffic and north‑south (ingress and egress) traffic. (ref How to Choose a Service Mesh)

Ultimately, this may be false advertising if you want “north-south (ingress and egress)” traffic that is integrated into the service mesh. The documentation explicitly states this:

There are two versions of NGINX Ingress Controller for Kubernetes: NGINX Open Source and NGINX Plus. To deploy NGINX Ingress Controller with NGINX Service Mesh, you must use the NGINX Plus version. (ref deploy-with-kic)

In contrast, solutions that are truly free, such as Istio and F5’s Aspen Mesh, offer an integrated ingress controller that does not require a commercial license.

Finally

I think it is exciting that NGINX is embracing Kubernetes and offers a service mesh, especially given NGINX’s background of building the fastest web server and load balancer.

Their recent announcement to commit more resources to open source is exciting, so hopefully these products can improve and address some of the issues I brought up in this article and others.

Thank you for following.
