AKS with Istio Service Mesh

Securing traffic with Istio service mesh on AKS

Joaquín Menchaca (智裕)
13 min read · Aug 7, 2021


Update (Aug 14, 2021): reduced commands, moved sequences of 4+ commands into gists, fixes.

So now you have your distributed application, monolith, or microservices packaged as a container and deployed to Kubernetes. Congratulations!

But now you need security, such as encrypted traffic and network firewalls, and for all of these secured services you need monitoring and proper load balancing of gRPC (HTTP/2) traffic, an area where Kubernetes falls short out of the box (ref).

And all of these things need to happen every time you roll out a new pod.

The Solution

This can all be done with a service mesh, which adds encryption-in-transit (mTLS, or mutual TLS), o11y (cloud native observability), load balancing, traffic management, and other features. For security outside of the service mesh (layer 4), you can use a network plugin, like Calico, that supports network policies.

This article will cover how to get started with Istio, coupled with the famous Envoy proxy; Istio is one of the most popular service mesh platforms on Kubernetes.

Goals and Not Goals

This article will cover the following goals:

  1. Install AKS with Calico and install Istio with Istio addons
  2. Install Dgraph and some clients (Python script using pydgraph)
  3. Test outside traffic is blocked after installing the network policy.
  4. Test traffic works through the service mesh.
  5. Generate traffic (gRPC and HTTP) and observe in Kiali.

The not goals (reserved for later):

  • Restricting traffic within the mesh to authorized clients.
  • Automatic authorization (AuthN, JWT, etc) for mesh members
  • Managing external inbound or outbound traffic through Gateways.
  • Other traffic management features, like retries and circuit breaker.

Architecture

Istio Architecture: Control Plane vs. Data Plane

A service mesh can be logically organized into two primary layers:

a control plane layer that’s responsible for configuration and management, and a data plane layer that provides network functions valuable to distributed applications. (ref)

Articles in Series

This series shows how to both secure and load balance gRPC and HTTP traffic.

  1. AKS with Azure Container Registry
  2. AKS with Calico network policies
  3. AKS with Linkerd service mesh
  4. AKS with Istio service mesh (this article)

Previous Article

The previous article covered similar topics using the Linkerd service mesh.

Requirements

To create Azure cloud resources, you will need a subscription with permissions to create resources.

Required Tools

  • Azure CLI tool (az): command line tool that interacts with Azure API.
  • Kubernetes client tool (kubectl): command line tool that interacts with Kubernetes API
  • Helm (helm): command line tool for “templating and sharing Kubernetes manifests” (ref) that are bundled as Helm chart packages.
  • helm-diff plugin: allows you to see the changes made with helm or helmfile before applying the changes.
  • Helmfile (helmfile): command line tool that uses a “declarative specification for deploying Helm charts across many environments” (ref).
  • Istio CLI (istioctl): command line tool to configure and deploy the Istio environment.

Optional tools

  • POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested with both of these shells on macOS and Ubuntu Linux.
  • Docker (docker): command line tool to build, test, and push docker images.

Project setup

The following structure will be used:

~/azure_istio
├── addons
│   ├── grafana.yaml
│   ├── jaeger.yaml
│   ├── kiali.yaml
│   ├── prometheus.yaml
│   ├── prometheus_vm.yaml
│   └── prometheus_vm_tls.yaml
├── env.sh
└── examples
    ├── dgraph
    │   ├── helmfile.yaml
    │   └── network_policy.yaml
    └── pydgraph
        ├── Dockerfile
        ├── Makefile
        ├── helmfile.yaml
        ├── load_data.py
        ├── requirements.txt
        ├── sw.nquads.rdf
        └── sw.schema

With either Bash or Zsh, you can create the file structure with the following commands:
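
The original commands were published as a gist; this minimal sketch reproduces the tree above using Bash/Zsh brace expansion:

mkdir -p ~/azure_istio/addons ~/azure_istio/examples/{dgraph,pydgraph}
cd ~/azure_istio
touch env.sh \
  addons/{grafana,jaeger,kiali,prometheus,prometheus_vm,prometheus_vm_tls}.yaml \
  examples/dgraph/{helmfile.yaml,network_policy.yaml} \
  examples/pydgraph/{Dockerfile,Makefile,helmfile.yaml,load_data.py,requirements.txt,sw.nquads.rdf,sw.schema}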

Project environment variables

Set up the environment variables below to keep a consistent environment across the different tools used in this article. If you are using a POSIX shell, you can save these into a script and source that script whenever needed.

Copy this source script and save as env.sh:
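
The original env.sh was published as a gist; a minimal sketch is below. Only AZ_RESOURCE_GROUP, AZ_CLUSTER_NAME, and AZ_ACR_NAME are referenced directly later in this article; the remaining names and all values are assumptions to adapt to your environment.

# env.sh — shared settings sourced by the commands in this article
export AZ_RESOURCE_GROUP="azure-istio"           # assumed name
export AZ_LOCATION="westus2"                     # assumed region
export AZ_CLUSTER_NAME="aks-istio-demo"          # assumed name
export AZ_ACR_NAME="uniqueacrname123"            # must be globally unique, alphanumeric only
export KUBECONFIG="$HOME/.kube/${AZ_CLUSTER_NAME}.yaml"  # keep this cluster's config separate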

Provision Azure resources

Azure Resources

Both the AKS cluster (with Azure CNI and Calico network policies) and the ACR cloud resources can be provisioned with the steps outlined in the script below.
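
The original provisioning script was published as a gist; this is a sketch of the equivalent Azure CLI steps (node count, region, and ACR SKU are assumptions):

source env.sh

# resource group to hold the AKS and ACR resources
az group create --name ${AZ_RESOURCE_GROUP} --location ${AZ_LOCATION}

# container registry (ACR)
az acr create \
  --resource-group ${AZ_RESOURCE_GROUP} \
  --name ${AZ_ACR_NAME} \
  --sku Basic

# AKS cluster with Azure CNI and Calico network policies,
# granted pull access to the ACR created above
az aks create \
  --resource-group ${AZ_RESOURCE_GROUP} \
  --name ${AZ_CLUSTER_NAME} \
  --node-count 3 \
  --network-plugin azure \
  --network-policy calico \
  --attach-acr ${AZ_ACR_NAME} \
  --generate-ssh-keys

# fetch credentials into the file pointed to by KUBECONFIG
az aks get-credentials \
  --resource-group ${AZ_RESOURCE_GROUP} \
  --name ${AZ_CLUSTER_NAME} \
  --file ${KUBECONFIG}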

Verify AKS and KUBECONFIG

Verify access to the AKS cluster with the currently configured KUBECONFIG environment variable:

source env.sh
kubectl get all --all-namespaces

The results should look something like the following:

AKS with Azure CNI and Calico

NOTE: As of Aug 1, 2021, this will install Kubernetes v1.20.7 and Calico v3.19.0. This reflects recent changes that introduce two new namespaces: calico-system and tigera-operator.

Verify Azure CNI

Verify that nodes and pods are now on the same Azure VNET subnet, which means that the Azure CNI network plugin is installed as the default plugin.

You can print the IP addresses on the nodes and pods with the following:
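
A sketch of commands that produce such a listing, using kubectl JSONPath (the formatting of the original gist may differ):

echo "Nodes:"
echo "------------"
kubectl get nodes --output \
  jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'

echo "Pods:"
echo "------------"
kubectl get pods --all-namespaces --output \
  jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.podIP}{"\n"}{end}'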

This should show something like this:

Nodes:
------------
aks-nodepool1-56788426-vmss000000 10.240.0.4
aks-nodepool1-56788426-vmss000001 10.240.0.35
aks-nodepool1-56788426-vmss000002 10.240.0.66
Pods:
------------
calico-kube-controllers-7d7897d6b7-qlrh6 10.240.0.36
calico-node-fxg66 10.240.0.66
calico-node-j4hlq 10.240.0.35
calico-node-kwfjv 10.240.0.4
calico-typha-85c77f79bd-5ksvc 10.240.0.4
calico-typha-85c77f79bd-6cl7p 10.240.0.66
calico-typha-85c77f79bd-ppb8x 10.240.0.35
azure-ip-masq-agent-6np6q 10.240.0.66
azure-ip-masq-agent-dt2b7 10.240.0.4
azure-ip-masq-agent-pltj9 10.240.0.35
coredns-9d6c6c99b-5zl69 10.240.0.28
coredns-9d6c6c99b-jzs8w 10.240.0.85
coredns-autoscaler-599949fd86-qlwv4 10.240.0.75
kube-proxy-4tbs4 10.240.0.35
kube-proxy-9rxr9 10.240.0.66
kube-proxy-bjjq5 10.240.0.4
metrics-server-77c8679d7d-dnbbt 10.240.0.89
tunnelfront-589474564b-k8s88 10.240.0.67
tigera-operator-7b555dfbdd-ww8sn 10.240.0.4

The Istio service mesh

Kubernetes components

There are a few ways to install Istio: Helm charts, operators, or the istioctl command. For this article, we take the easy road: istioctl.

Istio Platform

Install and verify Istio service mesh with the following commands:

source env.sh
istioctl install --set profile=demo -y
kubectl get all --namespace istio-system

This should show something like the following:

Deployment of Istio

Istio addons

Download the addon manifests and install them with the following commands:
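
A sketch of those commands, assuming the addon manifests from the Istio repository (the release-1.11 branch and the extras/ paths for the prometheus_vm* manifests are assumptions — adjust to the Istio version you installed):

BASE_URL="https://raw.githubusercontent.com/istio/istio/release-1.11/samples/addons"

curl --silent --location --output addons/grafana.yaml            "${BASE_URL}/grafana.yaml"
curl --silent --location --output addons/jaeger.yaml             "${BASE_URL}/jaeger.yaml"
curl --silent --location --output addons/kiali.yaml              "${BASE_URL}/kiali.yaml"
curl --silent --location --output addons/prometheus.yaml         "${BASE_URL}/prometheus.yaml"
curl --silent --location --output addons/prometheus_vm.yaml      "${BASE_URL}/extras/prometheus_vm.yaml"
curl --silent --location --output addons/prometheus_vm_tls.yaml  "${BASE_URL}/extras/prometheus_vm_tls.yaml"

# install Kiali, Grafana, Jaeger, and Prometheus (run this apply twice — see NOTE below)
kubectl apply \
  --filename addons/grafana.yaml \
  --filename addons/jaeger.yaml \
  --filename addons/kiali.yaml \
  --filename addons/prometheus.yaml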

NOTE: The first time kubectl apply is run, there will be errors because the CRDs were not yet installed. Run kubectl apply again to apply the remaining manifests that depend on the CRDs.

After adding these components, you can see new resources with kubectl get all -n istio-system:

Deployment of Istio + Addons

The Dgraph service

Dgraph is a distributed graph database that can be installed with these steps below.

Save the following as examples/dgraph/helmfile.yaml:
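
The original helmfile.yaml was published as a gist; a minimal sketch is below, written as a shell heredoc so it can be pasted directly. The release name demo matches the demo-dgraph-alpha service seen later in Kiali; the presync hook that creates and labels the namespace is an assumption about how the original handled the istio-injection label.

cat > examples/dgraph/helmfile.yaml <<'EOF'
repositories:
  - name: dgraph
    url: https://charts.dgraph.io

releases:
  - name: demo
    namespace: dgraph
    chart: dgraph/dgraph
    hooks:
      # create the dgraph namespace and label it for Istio sidecar
      # injection before the chart is synced
      - events: ["presync"]
        showlogs: true
        command: "sh"
        args:
          - "-c"
          - "kubectl create namespace dgraph --dry-run=client --output yaml | kubectl apply --filename - && kubectl label namespace dgraph istio-injection=enabled --overwrite"
EOF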

NOTE: The dgraph namespace needs the label istio-injection: enabled to signal Istio to inject Envoy proxy sidecars.

Both the namespace with the needed label and Dgraph can be installed and verified with these commands:

source env.sh
helmfile --file examples/dgraph/helmfile.yaml apply
kubectl --namespace dgraph get all

After about two minutes, this should show something like the following:

Deployment of Dgraph

The pydgraph client

For the pydgraph client, we’ll run through these steps to showcase the Istio service mesh and Calico network policies:

  1. Build pydgraph-client image and push to ACR
  2. Deploy pydgraph-client in pydgraph-allow namespace. Istio will inject an Envoy proxy into the pod.
  3. Deploy pydgraph-client in pydgraph-deny namespace.

Fetch the build and deploy scripts

In the previous blog, I documented steps to build and release a pydgraph-client image, and then deploy a container using that image.


Below is a script you can use to download the gists and populate the files needed to run through these steps.

NOTE: These scripts and further details are covered in the previous article (see AKS with Azure Container Registry).

Build and Push

Now that all the required source files are available, build the image:

source env.sh
az acr login --name ${AZ_ACR_NAME}
pushd examples/pydgraph && make build && make push && popd

Deploy to pydgraph-deny namespace

The client in this namespace will not be a part of the service mesh.

helmfile \
--namespace "pydgraph-deny" \
--file examples/pydgraph/helmfile.yaml \
apply

Afterward, you can check the results with kubectl get all -n pydgraph-deny:

Namespace: pydgraph-deny

Deploy to pydgraph-allow namespace

The client in this namespace will be a part of the service mesh. Create the namespace pydgraph-allow, deploy the pydgraph client into that namespace, and verify the results with the following commands:
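
A sketch of those commands (whether the original gist created the namespace with kubectl or within the helmfile is an assumption):

source env.sh

# create the namespace and opt it into Istio sidecar injection
kubectl create namespace pydgraph-allow
kubectl label namespace pydgraph-allow istio-injection=enabled

# deploy the pydgraph client into the labeled namespace
helmfile \
  --namespace "pydgraph-allow" \
  --file examples/pydgraph/helmfile.yaml \
  apply

# verify
kubectl get all --namespace pydgraph-allow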

The final results in the pydgraph-allow namespace should look similar to the following:

Namespace: pydgraph-allow

This will add the Envoy proxy sidecar container:
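
One way to confirm the injection (a sketch): list the containers in each pod, which should show an istio-proxy container alongside the pydgraph client container.

kubectl get pods --namespace pydgraph-allow --output \
  jsonpath='{range .items[*]}{.metadata.name}{": "}{range .spec.containers[*]}{.name}{" "}{end}{"\n"}{end}'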

Test 0 (Baseline): No Network Policy

Conduct a basic check to verify that things are working before running any tests with network policies. In this sanity check and the tests that follow, both HTTP (port 8080) and gRPC (port 9080) will be tested.

No Network Policy

Log into pydgraph-deny

Log into pydgraph-deny client:

PYDGRAPH_DENY_POD=$(
  kubectl get pods --namespace "pydgraph-deny" --output name
)
kubectl exec -ti --namespace "pydgraph-deny" \
  ${PYDGRAPH_DENY_POD} -- bash

HTTP check (no network policy)

In the pydgraph-client container, run this command:

curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq

The expected result is the health status of one of the Dgraph Alpha nodes:

/health (HTTP)

gRPC check (no network policy)

In the pydgraph-client container, run this command:

grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion

The expected results will be the Dgraph server version.

api.Dgraph/CheckVersion (gRPC)

Test 1: Apply a network policy

The goal of this next test is to deny all traffic from outside the service mesh. This can be done with a network policy that permits only traffic from the service mesh.

After adding the policy, the expected results will be timeouts, as communication from the pydgraph-client in the pydgraph-deny namespace, which is not in the service mesh, will be blocked.

Network Policy added to block traffic outside the mesh

Adding a network policy

This policy will deny all traffic to the Dgraph Alpha pods, except for traffic from the service mesh, or more explicitly, from any pod in a namespace with the label istio-injection: enabled.

Dgraph Network Policy for Istio (made with https://editor.cilium.io)

Copy the following and save as examples/dgraph/network_policy.yaml:
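
The original manifest was published as a gist; below is a sketch that matches the description above, written as a shell heredoc so it can be pasted directly. The policy name and the app: dgraph pod selector are assumptions based on the Dgraph Helm chart's labels.

cat > examples/dgraph/network_policy.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dgraph-allow-mesh-only   # assumed name
  namespace: dgraph
spec:
  # select the Dgraph pods (label assumed from the Dgraph Helm chart)
  podSelector:
    matchLabels:
      app: dgraph
  policyTypes:
    - Ingress
  ingress:
    # permit traffic only from pods in namespaces that are part of the mesh
    - from:
        - namespaceSelector:
            matchLabels:
              istio-injection: enabled
EOF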

When ready, apply this with the following command:

kubectl --filename ./examples/dgraph/network_policy.yaml apply

Log into pydgraph-deny

Log into pydgraph-deny client:

PYDGRAPH_DENY_POD=$(
  kubectl get pods --namespace "pydgraph-deny" --output name
)
kubectl exec -ti --namespace "pydgraph-deny" \
  ${PYDGRAPH_DENY_POD} -- bash

HTTP check (network policy applied)

Log into the pydgraph-client pod, and run this command:

curl ${DGRAPH_ALPHA_SERVER}:8080/health

The expected result in this case, after a very long wait (about 5 minutes), will be something similar to this:

gRPC check (network policy applied)

Log into the pydgraph-client pod and run this command:

grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion

The expected result for gRPC, after about 10 seconds, will be:

Test 2: Test with Envoy proxy sidecar

Now that we have verified that network connectivity is not possible from the pydgraph-deny namespace, we can test from pydgraph-allow, which has the Envoy proxy sidecar injected into the pod by Istio.

Log into pydgraph-allow

Log into pydgraph-allow client:

PYDGRAPH_ALLOW_POD=$(
  kubectl get pods --namespace "pydgraph-allow" --output name
)
kubectl exec -ti --namespace "pydgraph-allow" \
  ${PYDGRAPH_ALLOW_POD} -- bash

HTTP check (namespace label applied)

Log into the pydgraph-client pod, and run this command:

curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq

The expected result is JSON data about the health of one of the Dgraph Alpha pods.

/health (HTTP)

gRPC check (namespace label applied)

Log into the pydgraph-client pod and run this command:

grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion

The expected result is JSON detailing the Dgraph server version.

api.Dgraph/CheckVersion (gRPC)

Test 3: Listening to traffic streams

For this step, we will monitor traffic as it goes through the proxy and then generate some traffic. For monitoring, we’ll use the Kiali graphical dashboard.

Kiali dashboard

Run this command:

istioctl dashboard kiali

Once in the dashboard, click on Graph and select dgraph for the Namespace.

Generate Traffic

With this monitoring in place, log into the pydgraph-client pod and run these commands:

curl ${DGRAPH_ALPHA_SERVER}:8080/health

grpcurl -plaintext -proto api.proto \
  ${DGRAPH_ALPHA_SERVER}:9080 \
  api.Dgraph/CheckVersion

python3 load_data.py --plaintext \
  --alpha ${DGRAPH_ALPHA_SERVER}:9080 \
  --files ./sw.nquads.rdf \
  --schema ./sw.schema

curl "${DGRAPH_ALPHA_SERVER}:8080/query" --silent \
  --request POST \
  --header "Content-Type: application/dql" \
  --data $'{ me(func: has(starring)) { name } }'

Observe the resulting traffic

As both gRPC and HTTP traffic is generated, you can see two lines into the demo-dgraph-alpha service, which is depicted as a triangle △ icon.

In the graph you can see the following content:

  • Kubernetes services are represented by the triangle △ icon and pod containers by the square ◻ icon.
  • Both gRPC and HTTP incoming traffic connect to the demo-dgraph-alpha service and then to the alpha container, which is called latest due to the lack of a version label.
  • The Dgraph Alpha service then communicates with the Dgraph Zero service, also called latest due to the lack of a version label.

Cleanup

This will remove the AKS cluster as well as any resources provisioned within AKS, including external volumes created through the Dgraph deployment.

az aks delete \
--resource-group $AZ_RESOURCE_GROUP \
--name $AZ_CLUSTER_NAME

Resources

These are some resources I came across while researching this article.

Blog Source Code

Service Mesh

General articles about service meshes.

gRPC Load Balancing

Topics on gRPC load balancing on Kubernetes.

Istio vs. Calico: Combining Network Policies with Istio

There are a few articles around using network policies with Istio.

Istio vs AKS: Installing Istio on AKS

These are specific pages related to AKS and Istio.

Documentation

Articles

Articles and blogs on Istio.

Example Application

This is an example application from Istio. There are more examples in the project source code:

Conclusion

In this article I narrowly focused on the basics of Istio combined with network policies (Calico) for pods that are not in the mesh. One of the main reasons I wanted to look at Istio is due to issues regarding load balancing long-lived multiplexed gRPC traffic, and the security (mTLS) and observability were added bonuses.

There are a few things I would like to explore as next steps, around managing external traffic and further securing traffic within the mesh.

For traffic access, or rather restricting traffic within the mesh using AuthorizationPolicy, I would like to explore adding a layer of authorization, so that a service must authenticate to access a component.

External Traffic

There comes a point where you may want to expose a service to an external endpoint. Istio provides two custom resources for this: a Gateway resource, for the L4-L6 properties of a load balancer, and a VirtualService resource that can be bound to a gateway to control the forwarding of traffic arriving at a particular host or gateway port.

For a public-facing service, you would want to use a friendly DNS name like https://dgraph.example.com, as this is easier to remember than something like https://20.69.65.109. This can be automated with the Kubernetes addons external-dns and cert-manager. Through these two addons, you can automate DNS record updates and the issuing of X.509 certificates from a trusted certificate authority.

So how can I integrate these addons with Istio?

You can integrate these addons using either the native Gateway and VirtualService resources or an Ingress resource.

For an Ingress, you select Istio's ingress implementation by setting the annotation kubernetes.io/ingress.class: istio. I wrote an earlier article, AKS with Cert Manager, that demonstrates how to use ingress-nginx with both external-dns using Azure DNS and cert-manager using Let's Encrypt. The process is identical, with the exception of the annotation selecting istio instead of nginx.

For Gateway and VirtualService resources, external-dns has direct support for scanning these sources. With cert-manager, you would configure a Certificate resource and then reference the secret it creates from the Gateway resource.
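
As a rough sketch of that combination (all names here are hypothetical; the TLS secret would be issued by a cert-manager Certificate, and external-dns would publish the host by watching the Gateway/VirtualService sources):

kubectl apply --filename - <<'EOF'
# Hypothetical names throughout; dgraph.example.com is the illustrative
# host from above, and the secret dgraph-example-com-tls is assumed to be
# created by cert-manager in the istio-system namespace.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: dgraph-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: dgraph-example-com-tls
      hosts:
        - dgraph.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: dgraph
  namespace: dgraph
spec:
  hosts:
    - dgraph.example.com
  gateways:
    - istio-system/dgraph-gateway
  http:
    - route:
        - destination:
            host: demo-dgraph-alpha.dgraph.svc.cluster.local
            port:
              number: 8080
EOF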

Final Note

Thank you for following this article. I hope it is useful for getting started with Istio within your organization.
