GKE with NGINX Service Mesh 2
Integrating an Ingress into NGINX Service Mesh
In the previous article, I demonstrated how to deploy NSM (NGINX Service Mesh) for east-west traffic. This allows clients and servers to communicate transparently with mTLS (mutual TLS), where the client and server authenticate each other and traffic is encrypted, using a mixture of gRPC and HTTPS.
I also showed how to deploy a real-world scenario: the distributed graph database Dgraph, along with a client that connects to Dgraph through the service mesh.
Overview
This tutorial builds on that setup by adding an ingress controller (NGINX+ ingress controller) that is integrated with the service mesh, so that communication between the ingress controller and members of the mesh uses mTLS.
With these two solutions combined, both gRPC and HTTPS are supported for north-south traffic (ingress endpoint) and east-west traffic (service mesh).
For this solution, we'll install a graphical client application called Dgraph Ratel, which can access Dgraph through endpoints exposed outside of the Kubernetes cluster.
To follow this guide, you must complete the previous article first, as this one depends on NSM being installed and fully operational.
📔 NOTE: This was tested on the versions below and may not work if your versions are significantly different.
* Kubernetes API v1.22
* kubectl v1.22
* helm v3.8.2
* helmfile v0.144.0
* gcloud 402.0.0
* gsutil 5.13
* external-dns v0.12.2
* cert-manager v1.9.1
* NGINX Ingress Controller 2.3.0
* NGINX Service Mesh 1.5.0
* nginx-meshctl v1.5.0
* Docker 20.10.17
* Dgraph v21.03.2
Requirements
Previous Article
The previous article walks through installing NGINX Service Mesh and Dgraph. These components are required.
Accounts
- Google Cloud account with a billing account and project set up. Google is offering a 90-day $300 free trial (Aug 2022) that is sufficient for this article. See https://cloud.google.com/free.
- NGINX Plus Trial License or Subscription for NGINX Ingress Controller with NGINX Plus. This requires a business email, so free accounts like gmail.com or yahoo.com will not work.
- Registered domain (or alternative) with a registrar, configured to forward DNS queries to the Cloud DNS name servers that will be created later.
Knowledge
- Basic knowledge of using Google Cloud SDK to configure access, set up a project, and provision resources.
- Basic shell scripting knowledge, including things like setting environment variables. Python is useful for understanding the load_data.py script, but not required.
- Basic Kubernetes knowledge: using the kubectl command to deploy applications and set up configuration with the KUBECONFIG environment variable. Understanding Kubernetes resource types like Deployment, StatefulSet, ReplicaSet, Pod, Service (L4), and Ingress (L7) is useful.
- Basic networking knowledge of TCP (Layer 4 vs Layer 7), knowledge of HTTP/2 vs HTTP/1.1 protocols, and exposure to TLS vs SSL certificates.
- Understanding of load balancers and reverse proxies, and routing based on ports, virtual hosts, and URL paths.
- Understanding of Google principal identifiers, Kubernetes RBAC, and Kubernetes service accounts is useful, but not required.
Tools (Required)
- Google Cloud SDK (gcloud command) to interact with Google Cloud
- Kubernetes client (kubectl command) to interact with Kubernetes
- Helm (helm command) to install Kubernetes packages
- helm-diff plugin to see differences in what will be deployed
- helmfile (helmfile command) to automate installing many helm charts
- Docker Engine (docker command) to automate running the pydgraph client and all its dependencies locally
- NSM command line tool (nginx-meshctl), an optional tool used to deploy and interact with the service mesh. NOTE: This tool is gated behind https://downloads.f5.com, which seems to have a lot of problems, so it is strictly optional for this tutorial.
Tools (Recommended)
- POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested using either of these shells on macOS and Ubuntu Linux.
- GNU stream editor (sed) and GNU grep (grep): scripts were tested with these tools; the macOS or BSD equivalents may not work.
- curl (curl): tool to interact with web servers from the command line.
- jq (jq): a JSON processor that can transform and extract objects from JSON, as well as provide colorized JSON output for greater readability.
- grpcurl (grpcurl): tool to interact with gRPC servers from the command line.
These tools can be installed with Homebrew on macOS and with Chocolatey and MSYS2 on Windows.
Project Setup
Directory Structure
The directory structure will gain some new items in addition to the items created in the previous article.
~/projects/nsm
├── clients
│ ├── examples
│ │ └── pydgraph
│ │ ├── Dockerfile
│ │ ├── Makefile
│ │ ├── helmfile.yaml
│ │ ├── load_data.py
│ │ ├── requirements.txt
│ │ ├── sw.nquads.rdf
│ │ └── sw.schema
│ └── fetch_scripts.sh
├── dgraph
│ ├── dgraph_allow_lists.sh
│ ├── helmfile.yaml
│ └── vs.yaml
├── kube_addons
│ ├── cert_manager
│ │ ├── helmfile.yaml
│ │ └── issuers.yaml
│ ├── external_dns
│ │ └── helmfile.yaml
│ └── nginx_ic
│ ├── docker_keys.sh
│ └── helmfile.yaml
├── nsm
│ └── helmfile.yaml
├── o11y
│ ├── fetch_manifests.sh
│ └── helmfile.yaml
└── ratel
├── helmfile.yaml
└── vs.yaml
You can add these additional items with the following commands in Bash:
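A minimal sketch of such commands, assuming the new items are the nginx_ic, ratel, and dgraph endpoint files from the tree above (adjust to taste):
mkdir -p ./kube_addons/{cert_manager,external_dns,nginx_ic} ./ratel
touch ./kube_addons/cert_manager/{helmfile.yaml,issuers.yaml}
touch ./kube_addons/external_dns/helmfile.yaml
touch ./kube_addons/nginx_ic/{docker_keys.sh,helmfile.yaml}
touch ./ratel/{helmfile.yaml,vs.yaml} ./dgraph/vs.yaml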
Environment Variables
These environment variables will be used in this project. Create a file called dns_env.sh with the contents below:
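A hypothetical sketch of dns_env.sh; the variable names other than DNS_DOMAIN are assumptions inferred from how they are used later in this article:
# dns_env.sh -- a hypothetical sketch; adjust values for your environment
export DNS_PROJECT_ID="my-dns-project"      # Google project hosting the Cloud DNS zone (assumed name)
export DNS_DOMAIN="example.com"             # registered domain you control
export ACME_ISSUER_NAME="letsencrypt-prod"  # cert-manager issuer used for VirtualServer certificates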
When finished with this script, combine gke_env.sh from the previous article with dns_env.sh into env.sh and source it. In a POSIX shell, you can do that with the following commands:
cat gke_env.sh dns_env.sh > env.sh
source env.sh
Google Project Setup
In the previous article, we set up GCR to publish Docker images and GKE to run workloads. In this article, we'll create a project to host the Cloud DNS zone:
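A sketch of the project setup, assuming DNS_PROJECT_ID from dns_env.sh and a billing account already configured:
source env.sh

# create the project and enable the Cloud DNS API
# (link a billing account first if your organization requires it)
gcloud projects create $DNS_PROJECT_ID
gcloud services enable dns.googleapis.com --project $DNS_PROJECT_ID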
Provision Cloud Resources
Cloud DNS
Using the Google Cloud SDK gcloud command, you can set up the Cloud DNS zone with the following commands in GNU Bash:
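A sketch, assuming DNS_DOMAIN and DNS_PROJECT_ID from env.sh; deriving the zone name from the domain is an assumption:
source env.sh

# create a managed zone for the domain
gcloud dns managed-zones create "${DNS_DOMAIN//./-}" \
  --project $DNS_PROJECT_ID \
  --dns-name "${DNS_DOMAIN}." \
  --description "Zone for ${DNS_DOMAIN}"

# print the name servers to configure at your registrar
gcloud dns managed-zones describe "${DNS_DOMAIN//./-}" \
  --project $DNS_PROJECT_ID \
  --format "value(nameServers)"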
You should get a list of name servers to configure with your registrar. The name servers will vary; here's an example output:
ns-cloud-d1.googledomains.com.
ns-cloud-d2.googledomains.com.
ns-cloud-d3.googledomains.com.
ns-cloud-d4.googledomains.com.
Google Kubernetes Engine
In the previous article, this was provisioned in the $GKE_PROJECT_ID project.
Google Container Registry
In the previous article, this was provisioned in the $GCR_PROJECT_ID project.
Grant Access to Cloud DNS with Workload Identity
Both CertManager and ExternalDNS require the ability to read and write DNS records.
This step starts the process of setting up a one-to-one relationship between the KSA (Kubernetes Service Account) and the GSA (Google Service Account) using an OIDC (OpenID Connect) provider.
Run through the steps here to set up access:
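A sketch of the Workload Identity bindings; the GSA names, the kube-addons namespace, and the KSA names are assumptions and must match the helm values used later:
source env.sh

for SA_NAME in cert-manager external-dns; do
  # create the Google service account (GSA)
  gcloud iam service-accounts create $SA_NAME --project $DNS_PROJECT_ID

  # grant the GSA read-write access to Cloud DNS
  gcloud projects add-iam-policy-binding $DNS_PROJECT_ID \
    --member "serviceAccount:${SA_NAME}@${DNS_PROJECT_ID}.iam.gserviceaccount.com" \
    --role roles/dns.admin

  # allow the Kubernetes service account (KSA) to impersonate the GSA
  gcloud iam service-accounts add-iam-policy-binding \
    "${SA_NAME}@${DNS_PROJECT_ID}.iam.gserviceaccount.com" \
    --project $DNS_PROJECT_ID \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:${GKE_PROJECT_ID}.svc.id.goog[kube-addons/${SA_NAME}]"
done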
Deploy Kubernetes Addons
Three add-ons will add functionality to the GKE cluster; they should be installed in the following order:
- CertManager to issue TLS certificates used to secure web traffic
- NGINX Ingress Controller to provide a layer 7 load balancer for services within the service mesh.
- ExternalDNS to automate updating DNS records of deployed services
CertManager
Create the following below and save it as ./kube_addons/cert_manager/helmfile.yaml:
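A hypothetical sketch of such a helmfile; the chart version and the Workload Identity annotation are assumptions:
repositories:
  - name: jetstack
    url: https://charts.jetstack.io

releases:
  - name: cert-manager
    namespace: kube-addons
    chart: jetstack/cert-manager
    version: v1.9.1
    values:
      - installCRDs: true
        serviceAccount:
          name: cert-manager  # KSA name bound earlier with Workload Identity
          annotations:
            iam.gke.io/gcp-service-account: cert-manager@{{ requiredEnv "DNS_PROJECT_ID" }}.iam.gserviceaccount.com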
When ready, you can deploy this using the following:
source env.sh
helmfile --file ./kube_addons/cert_manager/helmfile.yaml apply
Create the following below and save it as ./kube_addons/cert_manager/issuers.yaml:
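Since this file is applied with helmfile, it presumably wraps raw manifests in a chart; the core resource would be a ClusterIssuer along these lines (a sketch; the name must match $ACME_ISSUER_NAME, and the email is a placeholder):
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod            # must match $ACME_ISSUER_NAME
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com          # placeholder; use your own address
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - dns01:
          cloudDNS:
            project: my-dns-project # your $DNS_PROJECT_ID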
When ready, you can deploy this using the following:
source env.sh
helmfile --file ./kube_addons/cert_manager/issuers.yaml apply
NGINX Ingress Controller: access NGINX+ images
This is a more complex component because, currently, in order to have NGINX Ingress Controller integrated with NGINX Service Mesh, you need a commercial license for NGINX. For a trial license, see https://www.nginx.com/free-trial-request/.
Follow the instructions to download nginx-repo.crt, nginx-repo.key, and optionally nginx-repo.jwt. You will need to install these into your local Docker environment to be able to access the private repository (below).
First, copy the keys to a local location. Assuming these are in $HOME/Downloads, you can copy them using this command in Bash:
cp ~/Downloads/nginx-repo.{jwt,key,crt} ./kube_addons/nginx_ic
Save the following script to ./kube_addons/nginx_ic/docker_keys.sh:
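A sketch of what such a script might do, following the documented Docker client-certificate layout for private-registry.nginx.com; the use of sudo and the exact destination path are assumptions:
#!/usr/bin/env bash
# install the NGINX repo cert/key so Docker can authenticate
# to private-registry.nginx.com
CERT_DIR="/etc/docker/certs.d/private-registry.nginx.com"
sudo mkdir -p $CERT_DIR
sudo cp nginx-repo.crt "${CERT_DIR}/client.cert"
sudo cp nginx-repo.key "${CERT_DIR}/client.key"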
When ready, you can run this with:
pushd ./kube_addons/nginx_ic/ && bash docker_keys.sh && popd
If you are using Docker Desktop on Mac, you will need to restart Docker Desktop so that it copies the keys into its virtual machine, the HyperKit system that runs the Docker engine.
For Linux, nothing needs to be done, as the new credentials will be immediately available.
NGINX Ingress Controller: republish private images
Now it is time to pull the images from the private repository and push them to the GCR repository.
NOTE: GCR was enabled and access was granted using gsutil in the previous article.
The easiest solution to access private images is to republish them to GCR.
NOTE: the alternative is more complex, as it requires using JWT credentials to do image pulls directly from NGINX’s private repository; see Using the NGINX IC Plus JWT token in a Docker Config Secret for more information.
Run the following steps to republish to GCR:
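A sketch, assuming the documented NGINX Plus image path and the GCR project from the previous article; the image name with App Protect is an assumption (use the plain nginx-plus-ingress image if you skip App Protect):
source env.sh

# pull from NGINX's private registry (uses the certs installed above)
docker pull private-registry.nginx.com/nginx-ic-nap/nginx-plus-ingress:2.3.0

# retag and push into your own GCR project
docker tag private-registry.nginx.com/nginx-ic-nap/nginx-plus-ingress:2.3.0 \
  gcr.io/${GCR_PROJECT_ID}/nginx-plus-ingress:2.3.0
docker push gcr.io/${GCR_PROJECT_ID}/nginx-plus-ingress:2.3.0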
NGINX Ingress Controller: Helmfile configuration
Create the following at ./kube_addons/nginx_ic/helmfile.yaml:
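A hypothetical sketch; the chart version, the mesh-integration and cert-manager/external-dns values, and the App Protect toggle are assumptions based on the nginx-ingress Helm chart:
repositories:
  - name: nginx-stable
    url: https://helm.nginx.com/stable

releases:
  - name: nginx-ic
    namespace: kube-addons
    chart: nginx-stable/nginx-ingress
    version: 0.14.0   # chart version paired with NGINX IC 2.3.0
    values:
      - controller:
          nginxplus: true
          image:
            repository: gcr.io/{{ requiredEnv "GCR_PROJECT_ID" }}/nginx-plus-ingress
            tag: "2.3.0"
          enableCustomResources: true
          enableCertManager: true    # allow cert-manager fields in VirtualServer
          enableExternalDNS: true    # allow ExternalDNS integration for VirtualServer
          # integrate the ingress controller into NGINX Service Mesh
          nginxServiceMesh:
            enable: true
          # toggled by the NGINX_APP_PROTECT environment variable
          appprotect:
            enable: {{ env "NGINX_APP_PROTECT" | default "false" }}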
If you would like to use App Protect to secure access to the backend database (typically a good idea), you can enable this feature with:
export NGINX_APP_PROTECT=true
When ready, you can deploy this with the following command:
source env.sh
helmfile --file ./kube_addons/nginx_ic/helmfile.yaml apply
ExternalDNS
Create the following at ./kube_addons/external_dns/helmfile.yaml:
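A hypothetical sketch using the Bitnami chart; the chart source, the values, and the Workload Identity annotation are assumptions:
repositories:
  - name: bitnami
    url: https://charts.bitnami.com/bitnami

releases:
  - name: external-dns
    namespace: kube-addons
    chart: bitnami/external-dns
    values:
      - provider: google
        google:
          project: {{ requiredEnv "DNS_PROJECT_ID" }}
        domainFilters:
          - {{ requiredEnv "DNS_DOMAIN" }}
        serviceAccount:
          name: external-dns  # KSA name bound earlier with Workload Identity
          annotations:
            iam.gke.io/gcp-service-account: external-dns@{{ requiredEnv "DNS_PROJECT_ID" }}.iam.gserviceaccount.com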
When ready, you can deploy this with the following command:
source env.sh
helmfile --file ./kube_addons/external_dns/helmfile.yaml apply
Verify Deployment
You can verify all the components in the kube-addons namespace by running:
kubectl get all,clusterissuer --namespace kube-addons
This should show something like the following:
Dgraph Ratel
Dgraph has a graphical tool for administration, queries, and visualization called Ratel. You can install it with the following steps.
Create the following in ./ratel/helmfile.yaml:
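A hypothetical sketch using the bedag raw chart to render a plain Deployment and Service for Ratel; the chart choice, image tag, and port are assumptions:
repositories:
  - name: bedag
    url: https://bedag.github.io/helm-charts

releases:
  - name: ratel
    namespace: ratel
    chart: bedag/raw
    values:
      - resources:
          - apiVersion: apps/v1
            kind: Deployment
            metadata:
              name: ratel
              labels: {app: ratel}
            spec:
              replicas: 1
              selector:
                matchLabels: {app: ratel}
              template:
                metadata:
                  labels: {app: ratel}
                spec:
                  containers:
                    - name: ratel
                      image: dgraph/ratel:v21.03.2  # tag is an assumption
                      ports:
                        - containerPort: 8000       # Ratel's default port
          - apiVersion: v1
            kind: Service
            metadata:
              name: ratel
            spec:
              selector: {app: ratel}
              ports:
                - port: 80
                  targetPort: 8000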
When ready, this can be deployed with the following command:
source env.sh

kubectl get namespace "ratel" > /dev/null 2> /dev/null \
  || kubectl create namespace "ratel" \
  && kubectl label namespaces "ratel" name="ratel"

helmfile --file helmfile.yaml template \
  | nginx-meshctl inject \
  | kubectl apply --namespace "ratel" --filename -
This will install Ratel as a member of the service mesh network.
Normally, we would NOT want to do this, because Ratel does NOT need direct access to Dgraph; it is a wholly independent client application that can connect to any Dgraph database endpoint.
However, once the NGINX ingress controller is configured to integrate with NGINX Service Mesh, it will only work for services that are part of the mesh network.
You can check the results of the deployment with the following command:
kubectl get all --namespace ratel
The results should look something like the following:
Notice that the Ratel pod shows 2/2 ready, meaning there are two containers ready: one for the Ratel web service and another for the proxy sidecar container.
Deploy Endpoints
Now that an ingress controller, the layer 7 load balancer, is available, we can configure some endpoints to use the ingress. NGINX supports a custom CRD for this process: the VirtualServer resource.
Additionally, if App Protect is enabled, we can also configure a Policy resource to further secure the endpoint.
Ratel Endpoint
Create the following file at ./ratel/vs.yaml:
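This file is applied via helmfile, so it presumably wraps its manifests in a chart; the core resource would be a VirtualServer along these lines (a sketch; the host, secret name, and issuer are assumptions):
apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: ratel
  namespace: ratel
spec:
  host: ratel.example.com              # ratel.$DNS_DOMAIN
  tls:
    secret: ratel-tls
    cert-manager:
      cluster-issuer: letsencrypt-prod # your $ACME_ISSUER_NAME
  upstreams:
    - name: ratel
      service: ratel
      port: 80
  routes:
    - path: /
      action:
        pass: ratel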
When ready to deploy this, you can run the following:
source env.sh
helmfile --file ./ratel/vs.yaml apply
You can see the newly deployed resources with the following command:
kubectl get all,virtualserver,certificate --namespace ratel
This should look something like the following:
The VirtualServer is a custom resource from the NGINX ingress controller that supports features not available in the generic Ingress resource.
When the VirtualServer resource is deployed, ExternalDNS will update the appropriate DNS records, and CertManager will issue a certificate, which shows up here as the Certificate resource and a corresponding Secret resource that stores the certificate.
You can view the application at https://ratel.example.com, replacing example.com with the domain that you are using.
Dgraph Endpoint
Dgraph was already deployed in the previous article. Because Dgraph is a backend database, we should take measures to secure it.
One way to do this is with an NGINX App Protect policy. If App Protect was enabled earlier during the installation of the NGINX ingress controller, you can set a policy to secure traffic.
Create the following file at ./dgraph/vs.yaml:
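A sketch of the kind of resources involved; the hosts, the upstream service name from the previous article, and the policy wiring are assumptions. A second VirtualServer with a gRPC upstream (type: grpc) would cover grpc.$DNS_DOMAIN:
apiVersion: k8s.nginx.org/v1
kind: Policy
metadata:
  name: dgraph-allow
  namespace: dgraph
spec:
  accessControl:
    allow:
      - 203.0.113.7/32             # your $MY_IP_ADDRESS
---
apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: dgraph
  namespace: dgraph
spec:
  host: dgraph.example.com         # dgraph.$DNS_DOMAIN
  tls:
    secret: dgraph-tls
    cert-manager:
      cluster-issuer: letsencrypt-prod
  policies:
    - name: dgraph-allow
  upstreams:
    - name: dgraph-http
      service: demo-dgraph-alpha   # alpha service name is an assumption
      port: 8080
  routes:
    - path: /
      action:
        pass: dgraph-http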
If you have multiple IP addresses you wish to add, you can add them under spec.accessControl.allow in the above script. For example, I have one or two coffee shops that I use, so I could add those IP addresses.
When ready to deploy, you can run the following:
source env.sh

# get outbound IP address
export MY_IP_ADDRESS=$(curl --silent ifconfig.me)

# enable if NGINX Ingress was installed with this setting
export NGINX_APP_PROTECT=true

helmfile --file ./dgraph/vs.yaml apply
You can check the results of newly deployed resources with:
kubectl get all,virtualserver,certificate,policy --namespace dgraph
This should look something similar to the following:
Testing Access
Testing Access using curl and grpcurl
Afterward, you can test access with the following commands:
source env.sh

curl dgraph.${DNS_DOMAIN}/health | jq
curl dgraph.${DNS_DOMAIN}/state | jq

# fetch local api.proto
curl -sOL https://raw.githubusercontent.com/dgraph-io/pydgraph/master/pydgraph/proto/api.proto

grpcurl -proto api.proto \
  grpc.$DNS_DOMAIN:443 \
  api.Dgraph/CheckVersion
Testing Access using Ratel
Open the Ratel application and connect to https://dgraph.example.com, replacing example.com with the domain that you are using:
If the icons are green, then Ratel is able to connect to the Dgraph database through the Dgraph endpoint, e.g. https://dgraph.example.com.
In Ratel, select Console, Query, then copy and paste the following:
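For example, a query along these lines, drawn from Dgraph's standard Star Wars tutorial that matches the sw.nquads.rdf data set (the exact predicates are assumptions), lists Star Wars films released after 1980 with their directors:
{
  me(func: allofterms(name@en, "Star Wars"), orderasc: release_date)
    @filter(ge(release_date, "1980")) {
    name@en
    release_date
    revenue
    director {
      name@en
    }
  }
}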
NOTE: The data set and schema should already be populated in the database using the steps in the previous tutorial.
This should look something like this:
The Future: Traffic Access Control Policies
NGINX Service Mesh has some support for Traffic Access Control policies using SMI (Service Mesh Interface).
It would be nice to use this feature, because you could deny all traffic by default and allow only select services on the mesh to connect to the Dgraph graph database. In this case, that would be the pydgraph client and the NGINX ingress controller, which should be able to connect to Dgraph through the service mesh.
To get started, you would need to reconfigure NSM to deny all traffic, and then allow traffic by deploying a TrafficTarget custom resource.
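A hypothetical sketch of such a TrafficTarget with its companion HTTPRouteGroup; the service-account names and namespaces are assumptions:
apiVersion: access.smi-spec.io/v1alpha2
kind: TrafficTarget
metadata:
  name: dgraph-allow-ingress
  namespace: dgraph
spec:
  destination:
    kind: ServiceAccount
    name: dgraph                     # Dgraph's service account (an assumption)
    namespace: dgraph
  sources:
    - kind: ServiceAccount
      name: nginx-ic-nginx-ingress   # ingress controller's SA (an assumption)
      namespace: kube-addons
  rules:
    - kind: HTTPRouteGroup
      name: dgraph-routes
      matches:
        - all
---
apiVersion: specs.smi-spec.io/v1alpha3
kind: HTTPRouteGroup
metadata:
  name: dgraph-routes
  namespace: dgraph
spec:
  matches:
    - name: all
      pathRegex: ".*"
      methods: ["*"]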
Should you want to experiment with this feature, you can configure this in the existing NSM with the following:
# set global setting that will be picked up in helmfile
export NSM_ACCESS_CONTROL_MODE=deny

# patch existing deployment
helmfile --file ./nsm/helmfile.yaml apply

# delete existing running pods so new pods pick up config-map
kubectl delete --namespace "nginx-mesh" \
  $(kubectl get pods \
      --namespace "nginx-mesh" \
      --selector "app.kubernetes.io/name=nginx-mesh-api" \
      --output name
  )

# verify currently set to 'deny'
nginx-meshctl config | jq -r .accessControlMode
Ultimately, this solution will not work for one reason:
- Only HTTP traffic is denied; gRPC traffic still works regardless.
I let the team know about this by filing a GitHub issue and received some feedback:
Our gRPC support does not have feature parity with HTTP. The TrafficTarget objects do not effect gRPC traffic. This deserves an explanation in the documentation (ref).
From the SMI specification, it is currently not supported:
gRPC — there should be a gRPC specific traffic spec. As part of the first version, this has been left out as HTTPRouteGroup can be used in the interim (ref).
Thus far, in the current implementation, you cannot restrict gRPC traffic regardless of configuration. This comes as a shock, given the popularity of gRPC for microservices and distributed database clusters.
Addendum: alternative to using a registered domain
For an alternative to registering a domain and forwarding domain resolution to Cloud DNS, you can do the following to use untrusted certificates and simulate domain resolution:
- Set ACME_ISSUER_NAME=letsencrypt-staging before deploying any VirtualServer.
- Edit /etc/hosts (or equivalent) on your local system to match the DNS records, update the local DNS cache, or configure the DNS client to use Cloud DNS as the resolver for that search domain.
- When accessing a service through the web like Ratel, you will need to add an exception when prompted about an untrusted website.
- When using the curl command, you will have to use the curl -k option.
- When using the load_data.py script, you will need to copy the private certificate from the Kubernetes secret with the kubectl cp command and then copy it into the running Docker container with the docker cp command. Then, when running the load_data.py script, use the appropriate command-line options to point to the path of the private certificate. Run load_data.py --help for more information on command-line options.
Cleanup
Delete Kubernetes Resources
This will delete the resources from this tutorial and the previous tutorial.
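A sketch of the cleanup; the namespaces and file paths follow this series, but release names are assumptions:
source env.sh

# remove workloads and endpoints (namespace names are assumptions)
kubectl delete namespace ratel dgraph pydgraph-client

# remove the cluster add-ons and the mesh
helmfile --file ./kube_addons/external_dns/helmfile.yaml destroy
helmfile --file ./kube_addons/nginx_ic/helmfile.yaml destroy
helmfile --file ./kube_addons/cert_manager/helmfile.yaml destroy
helmfile --file ./nsm/helmfile.yaml destroy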
Purge Google Cloud Resources
This will delete the cloud resources from this tutorial and the previous tutorial.
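A sketch; the GKE cluster variables come from the previous article's gke_env.sh and are assumptions. This is destructive, so double-check project IDs first:
source env.sh

# delete the GKE cluster (cluster name/region variable names are assumptions)
gcloud container clusters delete $GKE_CLUSTER_NAME \
  --project $GKE_PROJECT_ID --region $GKE_REGION

# delete the zone (record sets other than NS/SOA must be removed first)
gcloud dns managed-zones delete "${DNS_DOMAIN//./-}" \
  --project $DNS_PROJECT_ID

# remove images pushed to GCR
gsutil rm -r gs://artifacts.${GCR_PROJECT_ID}.appspot.com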
Resources
Blog Source Code
These are further notes and code that I created while testing the NGINX service mesh solution.
Google Cloud
F5 NGINX
- NGINX and NGINX Plus Ingress Controllers for Kubernetes (KIC) source code
- F5 NGINX Ingress Controller product page
- F5 NGINX Service Mesh product page
- How to Choose a Service Mesh (2021-MAY-04): blog
- SMI Traffic Policies: docs
- Deploy NGINX Plus Ingress Controller (aka KIC or Kubernetes Ingress Controller): docs
- Using the NGINX IC Plus JWT token in a Docker Config Secret: docs
F5 Aspen Mesh
- Aspen Mesh: F5 alternative service mesh solution built around Istio and Envoy, not NGINX.
Articles
Other
- Service Mesh Interface home page
- SMI Specification github repo
- grpc traffic allowed when accessControlMode=deny github issue
- SSL_do_handshake Errors with gRPC github issue
Conclusion
For now, this concludes the journey to deploy a service mesh platform (NGINX Service Mesh) with an integrated ingress controller (NGINX Ingress Controller).
Takeaways
The takeaways from this exercise are the following:
- Deploying an integrated ingress controller with NGINX Ingress Controller.
- Using a private container registry to host NGINX+ image artifacts.
- Using the new NGINX Ingress Controller custom resources VirtualServer and Policy.
- Enabling App Protect and adding a policy to further protect services.
- Integrating CertManager with VirtualServer to secure web traffic with trusted TLS certificates.
- Integrating ExternalDNS to automate DNS record upserts when deploying VirtualServer.
- Using Workload Identity with GKE to grant secure access to Cloud DNS.
About the NGINX Service Mesh Solution
The NGINX Service Mesh was deployed in strict mode (mTLS), so that anything that is not a member of the service mesh cannot connect to any services on the mesh, such as the private Dgraph graph database.
In this article, we added an integrated ingress controller that communicates with the mesh using mTLS. This adds an essential layer of security, because traffic between the ingress controller (layer 7 load balancer) and members of the service mesh is encrypted and authenticated using mTLS.
Limitations of NGINX Ingress Controller
But there’s a catch, the ingress controller cannot service non-mesh-members after it is integrated.
Thus in order to use NGINX Ingress Controller, all services MUST be integrated into the service mesh, regardless of whether or not the new service needs access to other services within the mesh, such as the private Dgraph graph database.
In the case of the Ratel client application, we do not want anything running on the small web service container to access the mesh. Ratel is a client that will load in your browser, and accesses Dgraph through an endpoint. The service does nothing in the backend except serve the client application.
Workarounds for the Limitation
So what can we do?
There are a few possible workarounds that come to mind to get around this limitation:
- Deploy two ingress controllers: one to serve the meshed services, and one to serve non-meshed services. This adds costs for the extra public IP address and external load balancer.
- Set mTLS mode to permissive, which allows both non-mesh and mesh traffic to the services, but is less secure.
- Introduce a network plugin that supports network policies, such as Calico. This works whether the service mesh sets mTLS to permissive or strict.
- Set access control mode to deny and then use SMI traffic access control policies to control access to the service, allowing only the ingress controller and designated client pods to access the database†.
† Note that the SMI traffic access control feature is currently still in development and not recommended for production. Ultimately, however, this solution will NOT work for gRPC, as gRPC traffic will always go through regardless of these policies, which is a significant limitation (ref. issue 76).
Required Commercial License Expense
NGINX Service Mesh advertises that this solution is free:
NGINX Service Mesh is free, optimized for developers, and the lightest, easiest way to implement mTLS and end-to-end encryption in Kubernetes for both east‑west (service-to-service) traffic and north‑south (ingress and egress) traffic. (ref: How to Choose a Service Mesh)
Ultimately, this may be false advertising if you want "north-south (ingress and egress)" traffic that is integrated into the service mesh. The documentation explicitly states this:
There are two versions of NGINX Ingress Controller for Kubernetes: NGINX Open Source and NGINX Plus. To deploy NGINX Ingress Controller with NGINX Service Mesh, you must use the NGINX Plus version. (ref deploy-with-kic)
In contrast, other solutions that are truly free, such as Istio and F5's Aspen Mesh, offer an integrated ingress controller that does not require a commercial license.
Finally
I think it is exciting that NGINX is embracing Kubernetes and offering a service mesh, especially given NGINX's background in building the fastest web server and load balancer.
Their recent announcement to commit more resources to open source is exciting, so hopefully these products can improve and address some of the issues I brought up in this article and others.
Thank you for following.