GKE with Consul Service Mesh
Using Consul Connect and Envoy to build a service mesh
This article shows how to set up and get started with Consul Connect service mesh (CCSM), more recently called simply Consul Service Mesh.
It covers how to install the mesh and configure services to use CCSM. Dgraph, a distributed graph database, serves as the example because it demonstrates a real-world application.
📔 NOTE: This was tested with the versions below and may not work if your versions are significantly different.
* Kubernetes API v1.22
* gcloud 402.0.0
* gsutil 5.13
* kubectl v1.22
* kustomize v4.5.4
* helm v3.8.2
* helmfile v0.144.0
* Docker 20.10.17
* Dgraph v21.03.2
* Consul 1.13.2
About Consul
Consul is a popular tool for service discovery and a key-value store that was released in April-2014.
Service discovery is important for cluster members or microservices, as it provides “automatic detection of devices and services offered by these devices on a computer network (ref)”. This allows “applications and microservices locate different components on a network (ref)”.
The key-value store is a network database for storing key-value pairs, as in hash maps (also called associative arrays or dictionaries). This allows services to create and retrieve configuration.
Building upon this, HashiCorp developed consul-template, which re-renders configuration files when the underlying values in Consul change, and Consul Connect, now called Consul Service Mesh, which can automatically inject side-car proxy containers into your pods, so that services can communicate securely.
About Service Mesh
A service mesh uses automation to secure internal network traffic between member nodes. It does this by inserting reverse-proxy sidecar containers into every pod that is a part of the service mesh.
With a network of side-car proxies installed, the traffic can be further secured using strict mTLS, where not only the client must authenticate the validity of the server, but the server must authenticate the validity of the client. This is on top of the encryption of traffic between all the members of the mesh.
A service mesh is divided into three planes (illustration below): control plane to manage the overall service mesh, a data plane that consists of the members within the mesh that are secured with a proxy, and observability plane to monitor traffic from within the mesh.
📔 NOTE: Observability is not supported for Consul Service Mesh with services that use multiple ports.
Consul Connect builds on Consul to manage connectivity through service discovery, health checks, and a service catalog. Envoy is the default proxy that is injected into each of the pods to create the service mesh. This proxy can be swapped for another proxy component, such as HAProxy or NGINX.
Requirements
These are the requirements to use this solution.
Accounts
No commercial licenses are needed for either Consul or Dgraph. All of the tools are accessible from the public Internet. For creating resources on Google Cloud, you will need to create an account.
- Google Cloud account with ownership of a project where you can deploy resources (where billing account was linked to the project)
Knowledge
You should be familiar with or have exposure to the following concepts to get a more thorough understanding of this tutorial:
- virtual hosting, reverse-proxy, and load balancer
- Layer 4 vs Layer 7 (OSI), TCP/IP, routing, gateway
- HTTP/1.1 vs HTTP/2 and gRPC
- images vs containers
- virtual machines instances (nodes) vs containers vs pods
For Kubernetes, experience with deploying applications with service resources is useful, but even if you don’t have this, this guide will walk you through it. You will configure KUBECONFIG to access the Kubernetes cluster with the Kubernetes client (kubectl) and use Helm (helm), so familiarity with these is useful.
For Google Cloud, you should be familiar with the Google Cloud SDK (gcloud tool), including setting up an account and project and provisioning resources. This is important as there are cost factors involved in setting these things up.
Tools (Required)
- Google Cloud SDK (gcloud command) to interact with Google Cloud
- Kubernetes client (kubectl command) to interact with Kubernetes
- Helm (helm command) to install Kubernetes packages
- helm-diff plugin to see differences about what will be deployed
- helmfile (helmfile command) to automate installing many helm charts
- Kustomize (kustomize command) to apply patches to existing Helm charts
Tools (Recommended)
These tools are useful for running the automation used within this article.
- POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested using either of these shells on macOS and Ubuntu Linux.
- GNU stream-editor (sed) and GNU grep (grep): scripts were tested with these tools. Note that BSD versions of these tools, such as those bundled with macOS or BSD, may NOT WORK.
- Docker Engine (docker command) to automate building and pushing the pydgraph client image to Google Container Registry.
- git (git command) to download source code from git code repositories.
Project Setup
This will set up all the content for this tutorial.
Directory Structure
The directory structure should look like this:
~/projects/consul_connect
├── consul
│   └── helmfile.yaml
└── examples
    └── dgraph
        ├── helmfile.yaml
        └── pydgraph_client.yaml
In GNU Bash, you can create the above structure like this:
export PROJECT_DIR=~/projects/consul_connect
mkdir -p $PROJECT_DIR/{examples/dgraph,consul}
cd $PROJECT_DIR
touch {consul,examples/dgraph}/helmfile.yaml \
examples/dgraph/pydgraph_client.yaml
Environment Variables
These environment variables will be used in this project. Create a file called env.sh with the contents below, changing values as appropriate, and then run source env.sh.
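The exact variables depend on your setup. As a minimal sketch, assuming hypothetical names such as GKE_PROJECT_ID, GKE_CLUSTER_NAME, GKE_REGION, and GKE_SA_NAME (none of these names are mandated by gcloud or Consul, so adjust them to whatever your own scripts and helmfiles read), such a file might look like this:

```shell
# env.sh (sketch): every name and value below is a placeholder assumption.
export GKE_PROJECT_ID="my-gke-project"     # hypothetical Google Cloud project ID
export GKE_CLUSTER_NAME="consul-demo"      # hypothetical cluster name
export GKE_REGION="us-central1"            # any region where you have quota
export GKE_SA_NAME="gke-worker-nodes-sa"   # hypothetical node service account

# Derived value: service account emails follow this standard Google format.
export GKE_SA_EMAIL="${GKE_SA_NAME}@${GKE_PROJECT_ID}.iam.gserviceaccount.com"

# Keep this cluster's credentials in a dedicated kubeconfig file.
export KUBECONFIG="${HOME}/.kube/${GKE_REGION}-${GKE_CLUSTER_NAME}.yaml"
```

Deriving the email and kubeconfig path from the other values means only the first four lines need editing when you change projects.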
Google Project Setup
For this tutorial, we’ll need to set up a Google Cloud project and grant access that allows us to create the necessary cloud resources. Here is an example of how you can set this up with gcloud:
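A minimal sketch of the project setup follows. The function name, the project ID, and the billing account ID are placeholder assumptions, and the billing step uses the beta command group. Setting DRY_RUN=1 prints each command instead of running it, which is a convenient way to review what will happen first.

```shell
# Sketch only: the project ID and billing account are placeholder assumptions.
# Set DRY_RUN=1 to print the gcloud commands instead of running them.
setup_gcp_project() {
  project_id="$1"        # e.g. "my-gke-project" (hypothetical)
  billing_account="$2"   # e.g. "000000-AAAAAA-000000" (hypothetical)

  ${DRY_RUN:+echo} gcloud projects create "$project_id"
  ${DRY_RUN:+echo} gcloud config set project "$project_id"
  # Resources cannot be created until billing is linked to the project.
  ${DRY_RUN:+echo} gcloud beta billing projects link "$project_id" \
    --billing-account "$billing_account"
  # GKE needs the Kubernetes Engine API enabled on the project.
  ${DRY_RUN:+echo} gcloud services enable container.googleapis.com
}
```

For example, DRY_RUN=1 setup_gcp_project my-gke-project 000000-AAAAAA-000000 prints the four commands without touching your account.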
Provision Cloud Resources
These instructions will create the necessary cloud resources for this project.
Provision Google Kubernetes Engine cluster
The steps below will allow you to bring up a Kubernetes cluster with 3 worker nodes.
📔 NOTE: This will deploy a robust 3 worker node Kubernetes cluster that is suitable for Consul. This will create a principal identity (Google Service Account) with the minimal necessary privileges required to manage the Kubernetes nodes (GCE).
📔 NOTE: For production environments, you will want to explore further security measures, such as a private cluster, to block access from the public Internet.
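The provisioning steps can be sketched as follows. The function name, argument order, and the e2-standard-2 machine type are assumptions for illustration; the four roles are the ones Google documents as the minimum for a GKE node service account. As before, DRY_RUN=1 prints the commands rather than running them.

```shell
# Sketch only: names, region, and machine type are placeholder assumptions.
# Set DRY_RUN=1 to print the commands instead of running them.
create_gke_cluster() {
  project_id="$1"; cluster="$2"; region="$3"; sa_name="$4"
  sa_email="${sa_name}@${project_id}.iam.gserviceaccount.com"

  ${DRY_RUN:+echo} gcloud iam service-accounts create "$sa_name" \
    --project "$project_id"

  # Minimal roles for node logging and monitoring, nothing broader.
  for role in logging.logWriter monitoring.metricWriter \
              monitoring.viewer stackdriver.resourceMetadata.writer; do
    ${DRY_RUN:+echo} gcloud projects add-iam-policy-binding "$project_id" \
      --member "serviceAccount:${sa_email}" --role "roles/${role}"
  done

  # In a regional cluster --num-nodes is per zone, so one node in each of
  # the (typically) three zones yields three worker nodes.
  ${DRY_RUN:+echo} gcloud container clusters create "$cluster" \
    --project "$project_id" --region "$region" --num-nodes 1 \
    --machine-type e2-standard-2 --service-account "$sa_email"

  # Write cluster credentials to the file named by KUBECONFIG (if set).
  ${DRY_RUN:+echo} gcloud container clusters get-credentials "$cluster" \
    --project "$project_id" --region "$region"
}
```

Running the nodes under a dedicated service account, instead of the default Compute Engine account, keeps the node privileges limited to what is listed here.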
You can test access to the cluster as well as the components installed with the following commands:
kubectl get nodes
kubectl get all --all-namespaces
Another useful command to test a new cluster is to see how many resources are available and what is consumed in the new cluster:
kubectl top nodes
kubectl top pods --all-namespaces
Deploy Kubernetes Resources
This section covers deploying Kubernetes resources such as Deployment, StatefulSet, ServiceAccount, Service, and so on. This will cover installing the Consul Connect service mesh, Dgraph, and pydgraph-client to access Dgraph through the service mesh.
Deploy Consul Connect service mesh
This will deploy the Consul Connect service mesh. Save the following code below as consul/helmfile.yaml:
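A minimal sketch of such a helmfile, assuming the official hashicorp/consul chart and only a handful of values (a single server replica is an assumption to keep the footprint small; the values you need may differ), could be generated like this:

```shell
# Sketch only: these values are assumptions, not a tested production config.
# Prints a minimal helmfile for the official hashicorp/consul chart.
consul_helmfile() {
  cat <<'EOF'
repositories:
  - name: hashicorp
    url: https://helm.releases.hashicorp.com

releases:
  - name: consul
    namespace: consul
    chart: hashicorp/consul
    values:
      - global:
          name: consul
        server:
          replicas: 1
        connectInject:
          enabled: true
        ui:
          enabled: true
EOF
}
```

Redirecting the output to a scratch file first, for example consul_helmfile > /tmp/consul-helmfile.yaml, lets you compare it against your own consul/helmfile.yaml before overwriting anything.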
These Helm chart configuration values will install Consul Connect service mesh with automatic injection enabled. When you deploy a pod with the annotation consul.hashicorp.com/connect-inject: "true", side-car containers will be installed to copy the consul binary into the container and to set up and configure the Envoy proxy. The service proxy resources will be used as a blueprint to register the service with Consul’s service catalog and configure the Envoy proxy.
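To illustrate where the annotation goes, here is a minimal sketch that adds it to an existing Deployment with kubectl patch. The function names and the deployment name and namespace are hypothetical. The key point is that the annotation belongs on the pod template, not on the Deployment's own metadata.

```shell
# Sketch only: deployment name and namespace are placeholder assumptions.
# Builds a merge patch that adds the injection annotation to the pod
# template (spec.template.metadata), which is what the injector inspects.
inject_patch() {
  printf '{"spec":{"template":{"metadata":{"annotations":{"%s":"%s"}}}}}\n' \
    "consul.hashicorp.com/connect-inject" "true"
}

# Set DRY_RUN=1 to print the kubectl command instead of running it.
enable_injection() {
  deployment="$1"; namespace="$2"
  ${DRY_RUN:+echo} kubectl patch deployment "$deployment" \
    --namespace "$namespace" --patch "$(inject_patch)"
}
```

Patching the pod template triggers a rollout, so the replacement pods are created with the sidecar. A matching Kubernetes service is still needed for the pod to join the mesh.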
Run the following to deploy the service mesh:
source env.sh
helmfile --file ./consul/helmfile.yaml apply
You can check that everything is deployed with:
kubectl get all --namespace consul
This should show something like this:
Deploy Observability
Currently observability is not supported with multi-port services like Dgraph. Hopefully this will get fixed in the future.
For further information, see:
- Caveats for Multi-port Pods
- injection stack traces when deploying multi-port application (issue 1594)
Deploy Dgraph
Dgraph is a distributed graph database that communicates through both HTTP on port 8080 and gRPC on port 9080. Dgraph uses DQL (Dgraph Query Language) through either gRPC or HTTP, and can also use GraphQL with HTTP. Dgraph supports administrative operations using GraphQL or REST.
For this reason, to fully use Dgraph on a service mesh, you have to use the recently added multi-port configuration with Consul Connect. This requires separating the single multi-port service into two separate services: one for gRPC (9080) and one for HTTP (8080).
Save the following helmfile config below as examples/dgraph/helmfile.yaml:
This helmfile config uses some advanced features to make some necessary changes required by Consul Connect:
- pre-install service accounts and a new gRPC service, all packaged up as the dgraph-extras chart
- render Dgraph resources with the required annotations for Consul
- apply patches to add Dgraph headless service labels that instruct Consul to ignore these services when it configures the proxies
- remove the gRPC port (9080) from the Dgraph Alpha service, as this was defined earlier as a separate gRPC service with the dgraph-extras chart
Consul Connect will inject Envoy sidecar proxy containers. Dgraph Zero will get a sidecar for port 6080, while Dgraph Alpha will have two sidecar proxy containers per pod: one for gRPC at port 9080 and another one for HTTP at port 8080.
When ready to deploy all of this, run the following command:
source env.sh
helmfile --file ./examples/dgraph/helmfile.yaml apply
You can check on the status using:
kubectl get all --namespace dgraph
This should show something like:
You will notice the extra containers per pod in the ready state, which are the Envoy proxy sidecar containers.
Deploy Pydgraph client
The client is a small Python script that can load data into Dgraph using gRPC, and the container also has some useful tools like curl, grpcurl, and jq.
Save the following below as examples/dgraph/pydgraph_client.yaml:
When ready to deploy this, you can run the following:
source env.sh
# https://hub.docker.com/r/darknerd/pydgraph-client
export DOCKER_REGISTRY=darknerd
export CCSM_ENABLED=true
helmfile --file ./examples/dgraph/pydgraph_client.yaml apply
You can check the deployment with the following:
kubectl get all --namespace pydgraph-client
This should result in something similar to the following:
Testing Upstream Traffic
Consul Connect will set up a tunnel between the upstream ports specified in the annotation and the ports that are serviced by Dgraph.
First remote into the client container:
CLIENT_NS="pydgraph-client"
PYDGRAPH_POD=$(kubectl get pods -n $CLIENT_NS --output name)
kubectl exec -ti -c "pydgraph-client" -n $CLIENT_NS \
${PYDGRAPH_POD} -- bash
Once in the container, test that HTTP traffic is working:
curl --silent localhost:8080/health
For gRPC traffic, you can run the following:
grpcurl -plaintext -proto api.proto \
localhost:9080 api.Dgraph/CheckVersion
Also, you can try loading data:
python3 load_data.py \
--plaintext \
--alpha localhost:9080 \
--files ./sw.nquads.rdf \
--schema ./sw.schema
These should work through the tunnel that is configured by Consul Connect using the Envoy proxy side-cars.
Dgraph Graphical Viewer: Ratel
Dgraph hosts an online graphical viewer at https://play.dgraph.io/. If you would like to access the data we deployed with load_data.py, you can run this in a new terminal tab:
kubectl port-forward svc/dgraph-dgraph-alpha -n dgraph 8080:8080
Now you can point the connection configuration in Ratel to http://localhost:8080:
Click on the Console and select Query and enter the following DQL:
Click Run to see the results of the query:
Consul User Interface
The Consul UI can be accessed by running this command in a new terminal tab:
source env.sh
kubectl port-forward service/consul-ui --namespace consul 8500:80
You can access the Consul UI through http://localhost:8500. The Consul UI should look like this below with other services appearing after Dgraph and pydgraph-client were deployed.
If you click on pydgraph-client, you can see the connections:
Cleanup
Kubernetes Resources
You can clean up Kubernetes resources with the following:
It is important to delete the consul namespace if you intend to deploy a new version of Consul Connect service mesh in the future. This is because there are secrets left behind that will break future installations, so deleting the namespace will avoid this scenario.
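A minimal sketch of that cleanup, in reverse order of deployment, follows. The function name is hypothetical, the paths follow this article's directory layout, and the PVC deletion assumes the Dgraph StatefulSets left persistent volume claims behind. DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch only: paths and namespaces follow this article's layout.
# Set DRY_RUN=1 to print the commands instead of running them.
cleanup_k8s() {
  ${DRY_RUN:+echo} helmfile --file ./examples/dgraph/pydgraph_client.yaml destroy
  ${DRY_RUN:+echo} helmfile --file ./examples/dgraph/helmfile.yaml destroy
  # StatefulSet volume claims are not removed together with the release.
  ${DRY_RUN:+echo} kubectl delete pvc --all --namespace dgraph
  ${DRY_RUN:+echo} helmfile --file ./consul/helmfile.yaml destroy
  # Removes the leftover secrets that would break a later reinstall.
  ${DRY_RUN:+echo} kubectl delete namespace consul
}
```

Removing the mesh last lets the application pods shut down while their sidecars can still deregister from Consul.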
Cloud Resources
The Kubernetes cluster and the associated Google service account can be deleted with the following commands:
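As a minimal sketch, assuming the same placeholder project, cluster, region, and service account values used when provisioning (the function name and argument order are hypothetical), the deletion could look like this:

```shell
# Sketch only: all arguments are placeholder assumptions.
# Set DRY_RUN=1 to print the commands instead of running them.
cleanup_gcp() {
  project_id="$1"; cluster="$2"; region="$3"; sa_email="$4"

  # --quiet skips the interactive confirmation prompt.
  ${DRY_RUN:+echo} gcloud container clusters delete "$cluster" \
    --project "$project_id" --region "$region" --quiet
  ${DRY_RUN:+echo} gcloud iam service-accounts delete "$sa_email" \
    --project "$project_id" --quiet
}
```

Deleting the cluster first avoids leaving running nodes attached to a service account that no longer exists.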
Addendum: Publishing Pydgraph-Client Images
If you would like to publish the pydgraph-client images to an alternative registry, you can run the following steps below.
Download the source code
pushd examples
git clone \
--depth 1 \
--branch "consul" \
git@github.com:darkn3rd/pydgraph-client.git
popd
Publishing to GCR
If you wish to use Google Container Registry, you can run the following.
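A minimal sketch of the build and push follows. It assumes the repository was cloned into examples/pydgraph-client as shown above and that its Dockerfile sits at the repository root; the function name and default tag are hypothetical. DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch only: the build context path and tag are assumptions.
# Set DRY_RUN=1 to print the commands instead of running them.
publish_to_gcr() {
  project_id="$1"; tag="${2:-latest}"
  image="gcr.io/${project_id}/pydgraph-client:${tag}"

  # Let docker authenticate to gcr.io with your gcloud credentials.
  ${DRY_RUN:+echo} gcloud auth configure-docker --quiet
  ${DRY_RUN:+echo} docker build --tag "$image" ./examples/pydgraph-client
  ${DRY_RUN:+echo} docker push "$image"
}
```

After pushing, DOCKER_REGISTRY would presumably need to point at gcr.io/ followed by your project ID, rather than darknerd, before you deploy the client.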
Publishing to DockerHub
If you have an account on DockerHub, you can publish it there with these steps:
Resources
These are some resources and references that may be useful in using this solution.
Consul Documentation
- Kubernetes Pods with Multiple Ports
- Secure Consul and Registered Service on Kubernetes
- Annotations and Labels
- Enable Transparent Proxy Mode
- Transparent Proxy on Consul Service Mesh, 2020-JUL-01
- How does Consul Service Mesh Work on Kubernetes?
Gateways and Ingress
These are links for north-south traffic into mesh.
I have not tested these solutions yet
These are links that cover integration of either ingress controllers or API gateways with Consul. This may be using Consul as a backend database or the Consul Connect service mesh itself.
📔 NOTE: I have not tested the content of this material, just documenting any material I find on the topic for later exploration. If you find any useful material out there, please send me a note.
- Ambassador Edge Stack integration with Consul
- Consul API Gateway
- Using HashiCorp Consul with Kong Ingress Controller for Kubernetes
- Getting Started With Traefik Proxy and HashiCorp Consul
- Traefik Consul Provider configuration
- traefik-consul walk-through
- consul ingress controllers integration with Traefik or Kong using Terraform
- Consul on EKS using nginx as ingress (transparent mode)
Tracing
- Consul connect integration of jaeger in Kubernetes
- Consul Service Mesh Tracing Demo
- https://github.com/envoyproxy/envoy/tree/main/examples/jaeger-tracing
Dgraph Documentation
Helmfile
- https://github.com/helmfile/helmfile
- What is Helmfile? by Paul Czarkowski (VMWare Tanzu Developer Center)
Blog Source Code
This is some code that I developed when testing Consul Connect service mesh solution.
- Blog Source Code: https://github.com/darkn3rd/blog_tutorials/tree/master/kubernetes/gke/service-mesh/consul-connect
- HTTP/gRPC Greeter Application: https://github.com/darkn3rd/greeter
- pydgraph-client w consul support: https://github.com/darkn3rd/pydgraph-client/tree/consul
Conclusion
There you have it, a small (cough) overview of how to get started with Consul Connect Service Mesh. In particular, here are some of the takeaways:
- Provisioning Kubernetes (GKE)
- (addendum) Provisioning GCR and publishing images to GCR
- Deploying Consul Connect Service Mesh on GKE
- Deploying a server and a client with multiport support: HTTP and gRPC
- Testing HTTP traffic with curl and gRPC traffic with grpcurl
- Limitations and Challenges with current multi-port scenarios
Additionally, here are some extra takeaways beyond just using Consul Connect:
- Using Helmfile to deploy Helm charts with templated chart config values, where values and branch logic are set by env vars.
- Using Helmfile to patch using Kustomize merge and JSON Patch
- Helm raw chart to package Kubernetes manifests as templated values
- Introduction to Dgraph distributed graph database
The Challenges with Consul
You may have noticed that Consul is, dare I say, complex, beyond complex. The documentation is good, but not all that well organized, and many things are missing.
The underlying tool Consul is very powerful, and Consul Connect service mesh on top of this tool is quite robust and extremely flexible where you can swap out the default CA for other solutions, like Vault CA, and swap out the Envoy proxy for another solution, like NGINX or HAProxy. For ingress into the cluster, you can use Consul API Gateway, or another API Gateway or an ingress controller.
Consul Connect service mesh has some challenges or limitations (see below) when you have a service that supports multiple ports.
Complexity
I have experimented with other service meshes and I was able to get up to speed quickly: Linkerd = 1 day, Istio = 3 days, NGINX Service Mesh = 5 days, but Consul Connect service mesh took at least 11 days to get off the ground. This is by far the most complex solution available.
Unable to Update
If you need to update Consul Connect with a configuration change and use helm to update consul, the consul-server pods may not reach a healthy state. You may have to delete everything and recreate it from scratch.
Apparently there’s some way to ameliorate this by adding the leave_on_terminate: true setting in server.extraConfig (ref).
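As a minimal sketch of what that setting might look like in the chart values: in the hashicorp/consul chart, server.extraConfig takes a JSON string of extra agent configuration. The function name here is hypothetical, and I have not verified that this resolves the upgrade problem.

```shell
# Sketch only: prints a Helm values fragment; untested as a fix for the
# upgrade problem described above.
consul_leave_on_terminate_values() {
  cat <<'EOF'
server:
  extraConfig: |
    {
      "leave_on_terminate": true
    }
EOF
}
```

The fragment would be merged into the values for the consul release, so that each server leaves the cluster gracefully when its pod is terminated.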
Higher Memory Footprint
Consul Connect service mesh has a higher memory footprint, so on a small cluster with e2-medium nodes (2 vCPUs, 4 GB memory), you will only be able to support a maximum of 6 side-car proxies. In order to get an application like Dgraph working, which will have 6 nodes (3 Dgraph Alpha pods and 3 Dgraph Zero pods) for high availability along with at least one client, a larger footprint with more robust Kubernetes worker nodes was required.
Requirement for Service Resource
One challenge to Consul Connect service mesh is that it configures the Envoy side-car proxy based on what you specify for a service. This added some challenges.
- A pure client that is not listening on a port still requires you to specify a service resource so that it can be added to the service mesh.
- A StatefulSet that requires specifying a headless service in addition to a service endpoint into the cluster will fail spectacularly if both the service and the headless service use the same port.
The docs explicitly note this:
Note: As of consul-k8s v0.26.0 and Consul Helm v0.32.0, having a Kubernetes service is required to run services on the Consul Service Mesh. (ref)
More Complexity with Multiport
The Kubernetes service API supports an array of ports that you can specify, but Consul Connect only supports a single port for transparent-proxy mode. This is very bizarre, because a service with multiple ports is quite common, such as an admin port vs API port, or scenarios where a service has both HTTP and gRPC interfaces.
This is also part of the Kubernetes service API specification, which Consul Connect reads to configure the Envoy proxy. So, in this sense, Kubernetes is not fully supported as far as parity with the service API.
For the multi-port scenario, the following will need to be done on the server:
- all services with multiple ports will need to be broken up into separate services with only one port
- need to specify the consul.hashicorp.com/connect-service annotation listing each of the services supported that will be mapped into Consul.
- need to specify the consul.hashicorp.com/connect-service-port annotation listing the ports that correspond to the services in the previous annotation
- if ACLs are enabled, a serviceaccount needs to be specified corresponding to each service specified.
- if ACLs are enabled and Kubernetes 1.24+ is used, a corresponding secret for the service token needs to be created as well.
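The two server-side annotations must list services and ports in the same order, which is easy to get wrong by hand. As a minimal sketch, the helper below (a hypothetical name, as is the dgraph-dgraph-alpha-grpc service name used in the example) derives both annotation values from a single list of name:port pairs, and also emits the annotation that turns transparent proxy off for the pod.

```shell
# Sketch only: the helper and the example service names are hypothetical.
# Builds the pod annotations for a multi-port server from name:port pairs,
# keeping service names and ports in matching order.
multiport_annotations() {
  names=""; ports=""
  for pair in "$@"; do
    names="${names:+${names},}${pair%%:*}"   # text before the colon
    ports="${ports:+${ports},}${pair##*:}"   # text after the colon
  done
  printf 'consul.hashicorp.com/connect-inject: "true"\n'
  printf 'consul.hashicorp.com/transparent-proxy: "false"\n'
  printf 'consul.hashicorp.com/connect-service: "%s"\n' "$names"
  printf 'consul.hashicorp.com/connect-service-port: "%s"\n' "$ports"
}
```

For Dgraph Alpha this could be called as multiport_annotations dgraph-dgraph-alpha:8080 dgraph-dgraph-alpha-grpc:9080, with the resulting lines pasted under the pod template's annotations.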
The client will need the following in order to connect to the server:
- specify the consul.hashicorp.com/connect-service-upstreams annotation listing the Consul service and outbound port to use from localhost.
- if ACLs are enabled, a serviceaccount that corresponds to the service specified for the client.
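On the client side, the upstreams annotation is a comma-separated list of service:port entries, where each port is the local port the client will dial on localhost. A minimal sketch of a helper that assembles it (the helper name and the dgraph-dgraph-alpha-grpc service name are hypothetical):

```shell
# Sketch only: the helper and the example service names are hypothetical.
# Joins service:local-port entries into the client's upstreams annotation.
upstreams_annotation() {
  value=""
  for upstream in "$@"; do
    value="${value:+${value},}${upstream}"
  done
  printf 'consul.hashicorp.com/connect-service-upstreams: "%s"\n' "$value"
}
```

With upstreams_annotation dgraph-dgraph-alpha:8080 dgraph-dgraph-alpha-grpc:9080, the client would reach Dgraph over HTTP at localhost:8080 and over gRPC at localhost:9080, matching the curl and grpcurl tests earlier in this article.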
The client is now required to connect to localhost at the target outbound port, not to the service endpoint DNS name, such as mysvc.myns.svc.cluster.local. This will be the only way to use the service mesh. Directly connecting to the service endpoint, e.g. mysvc.myns.svc.cluster.local, will bypass the service mesh and thus will not be protected with encryption.
Insecurity with Multiport
When transparent-proxy is enabled, members can communicate using the DNS of the service endpoint, for example: mysvc.myns.svc.cluster.local. When you use the multi-port scenario, transparent-proxy is unfortunately disabled.
Because of this situation, security through mTLS or ACLs (tokens) can be bypassed completely when multi-port services are configured. Any non-mesh member or mesh member that does not have access granted (through configuring an intention) can connect to the service endpoint, such as mysvc.myns.svc.cluster.local. The only thing ACLs offer at this point is blocking encrypted traffic through the mesh, and thus the ACL feature is pointless.
This issue can be ameliorated by configuring the service itself to only communicate through localhost, which forces it to use the service mesh, but then this poses problems, such as trying to use an ingress. Alternatively, you could use a firewall, such as a network policy. Ultimately, another non-Consul solution is needed.
Ingress Challenge with Multiport
An ingress controller is an interesting challenge to integrate into the service mesh, as annotations will be needed to put the ingress controller pods onto the service mesh. The ingress controller will route traffic to the backend service using the local DNS, such as mysvc.myns.svc.cluster.local, for the service named mysvc running in the myns namespace.
With the multi-port scenario, however, this will not work, because the ingress controller is now required to route to localhost for the specific outbound ports that are specified in the consul.hashicorp.com/connect-service-upstreams annotation. The normal ingress resource API does not support this setup, as it routes to the Kubernetes service DNS name, not to localhost.
There may be some ingress controllers that provide extra non-standard configurations that could support this requirement to route to localhost, but unfortunately no one at HashiCorp has even tested this common use case (ref).
No Observability with Multiport
If you are using multi-port scenario, observability is not an option. Just forget you even heard of the word observability, one of the three planes that make up the service mesh solution. The Consul Connect injection process will actually cause stack traces.
Wrapping Up
I hope this is useful as exposure to Consul Connect service mesh and can help you get started should you want to try this out. If you have services that only listen on a single port, then this is certainly an interesting solution to explore.
If, however, you have an application service that needs support for 2+ ports, because you know, Kubernetes supports this, I would recommend avoiding Consul Connect, as it does not meet the minimum requirements for a service mesh. Perhaps someday, when HashiCorp prioritizes basic functionality and usability in a future version, this product can be considered.