AKS with Istio Service Mesh

Securing traffic with Istio service mesh on AKS

The Solution

This can all be done with a service mesh, which adds encryption-in-transit (mTLS, or mutual TLS), o11y (cloud native observability), load balancing, and traffic management, among other features. For security outside of the service mesh (layers 3 and 4), you can use a network plugin, like Calico, that supports network policies.
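As an illustration of the encryption-in-transit feature, once Istio is installed (later in this article), strict mTLS can be enforced mesh-wide with a single PeerAuthentication resource. This sketch is not part of the article's walkthrough; the demo profile used here accepts both plaintext and mTLS by default:

```shell
# Sketch: enforce strict mTLS for the whole mesh (run after Istio is installed).
# Applying this in the root namespace (istio-system) makes it mesh-wide.
kubectl apply --filename - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
EOF
```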

Goals and Non-Goals

This article will cover the following goals:

  1. Install AKS with Calico and install Istio with Istio addons
  2. Install Dgraph and some clients (Python script using pydgraph)
  3. Test that outside traffic is blocked after installing the network policy.
  4. Test that traffic works through the service mesh.
  5. Generate traffic (gRPC and HTTP) and observe in Kiali.
These topics will not be covered:

  • Restricting traffic within the mesh to authorized clients.
  • Automatic authentication and authorization (AuthN, JWT, etc.) for mesh members.
  • Managing external inbound or outbound traffic through Gateways.
  • Other traffic management features, like retries and circuit breaker.

Architecture

Istio Architecture: Control Plane vs. Data Plane

Articles in Series

This series shows how to both secure and load balance gRPC and HTTP traffic.

  1. AKS with Azure Container Registry
  2. AKS with Calico network policies
  3. AKS with Linkerd service mesh
  4. AKS with Istio service mesh (this article)

Previous Article

The previous article covered similar topics using the Linkerd service mesh.

Requirements

To create Azure cloud resources, you will need a subscription with permissions to create resources.

Required Tools

  • Azure CLI tool (az): command line tool that interacts with Azure API.
  • Kubernetes client tool (kubectl): command line tool that interacts with Kubernetes API
  • Helm (helm): command line tool for “templating and sharing Kubernetes manifests” (ref) that are bundled as Helm chart packages.
  • helm-diff plugin: allows you to see the changes made with helm or helmfile before applying the changes.
  • Helmfile (helmfile): command line tool that uses a “declarative specification for deploying Helm charts across many environments” (ref).
  • Istio CLI (istioctl): command line tool to configure and deploy the Istio environment.

Optional tools

  • POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested using these shells on macOS and Ubuntu Linux.
  • Docker (docker): command line tool to build, test, and push docker images.

Project setup

The following structure will be used:

~/azure_istio
├── addons
│   ├── grafana.yaml
│   ├── jaeger.yaml
│   ├── kiali.yaml
│   ├── prometheus.yaml
│   ├── prometheus_vm.yaml
│   └── prometheus_vm_tls.yaml
├── env.sh
└── examples
    ├── dgraph
    │   ├── helmfile.yaml
    │   └── network_policy.yaml
    └── pydgraph
        ├── Dockerfile
        ├── Makefile
        ├── helmfile.yaml
        ├── load_data.py
        ├── requirements.txt
        ├── sw.nquads.rdf
        └── sw.schema

Project environment variables

Set up the environment variables below to keep a consistent environment amongst the different tools used in this article. If you are using a POSIX shell, you can save these into a script and source that script whenever needed.
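The variables themselves did not survive extraction. A minimal sketch of what `env.sh` might contain, based on the variables referenced later in this article (`AZ_RESOURCE_GROUP`, `AZ_CLUSTER_NAME`, `AZ_ACR_NAME`); all values are placeholders to adjust for your own environment:

```shell
# env.sh -- hypothetical values; adjust for your own environment.
export AZ_RESOURCE_GROUP="azure-istio"       # resource group for all resources
export AZ_CLUSTER_NAME="azure-istio"         # AKS cluster name
export AZ_ACR_NAME="myuniqueacr"             # ACR name (must be globally unique)
export AZ_LOCATION="westus2"                 # Azure region
export KUBECONFIG="$HOME/.kube/azure-istio"  # keep cluster credentials isolated
```

Note that `DGRAPH_ALPHA_SERVER`, used in later tests, is an environment variable inside the pydgraph-client pod, not part of this local script.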

Provision Azure resources

Azure Resources

Verify AKS and KUBECONFIG

Verify access to the AKS cluster using the currently configured KUBECONFIG environment variable:

source env.sh
kubectl get all --all-namespaces
AKS with Azure CNI and Calico

Verify Azure CNI

Verify that nodes and pods are now on the same Azure VNET subnet, which means that the Azure CNI network plugin is installed as the default plugin.
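The exact commands that produced the listing below were lost; output along these lines can be reproduced with standard wide output (the original may have used custom columns):

```shell
# Node names with their internal IPs
kubectl get nodes --output wide

# Pod names with their pod IPs, across all namespaces
kubectl get pods --all-namespaces --output wide
```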

Nodes:
------------
aks-nodepool1-56788426-vmss000000 10.240.0.4
aks-nodepool1-56788426-vmss000001 10.240.0.35
aks-nodepool1-56788426-vmss000002 10.240.0.66
Pods:
------------
calico-kube-controllers-7d7897d6b7-qlrh6 10.240.0.36
calico-node-fxg66 10.240.0.66
calico-node-j4hlq 10.240.0.35
calico-node-kwfjv 10.240.0.4
calico-typha-85c77f79bd-5ksvc 10.240.0.4
calico-typha-85c77f79bd-6cl7p 10.240.0.66
calico-typha-85c77f79bd-ppb8x 10.240.0.35
azure-ip-masq-agent-6np6q 10.240.0.66
azure-ip-masq-agent-dt2b7 10.240.0.4
azure-ip-masq-agent-pltj9 10.240.0.35
coredns-9d6c6c99b-5zl69 10.240.0.28
coredns-9d6c6c99b-jzs8w 10.240.0.85
coredns-autoscaler-599949fd86-qlwv4 10.240.0.75
kube-proxy-4tbs4 10.240.0.35
kube-proxy-9rxr9 10.240.0.66
kube-proxy-bjjq5 10.240.0.4
metrics-server-77c8679d7d-dnbbt 10.240.0.89
tunnelfront-589474564b-k8s88 10.240.0.67
tigera-operator-7b555dfbdd-ww8sn 10.240.0.4

The Istio service mesh

Kubernetes components

Istio Platform

Install and verify Istio service mesh with the following commands:

source env.sh
istioctl install --set profile=demo -y
kubectl get all --namespace istio-system
Deployment of Istio

Istio addons

Download the addon manifests and install them with the following commands:
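The commands themselves did not survive extraction. Assuming the manifests in the `addons/` directory (per the project layout above) were copied from the Istio release's `samples/addons`, installation might look like this:

```shell
source env.sh
# Install the observability addons (Prometheus, Grafana, Jaeger, Kiali)
kubectl apply --namespace istio-system --filename addons/
# Verify the addon deployments
kubectl get all --namespace istio-system
```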

Deployment of Istio + Addons

The Dgraph service

Dgraph is a distributed graph database that can be installed with these steps below.

source env.sh
helmfile --file examples/dgraph/helmfile.yaml apply
kubectl --namespace dgraph get all
Deployment of Dgraph

The pydgraph client

For the pydgraph client, we’ll run through these steps to showcase the Istio service mesh and Calico network policies:

  1. Build pydgraph-client image and push to ACR
  2. Deploy pydgraph-client in pydgraph-allow namespace. Istio will inject an Envoy proxy into the pod.
  3. Deploy pydgraph-client in pydgraph-deny namespace.
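Istio injects the Envoy sidecar only into namespaces labeled for injection. A sketch of how the two namespaces from the steps above might be prepared, labeling only pydgraph-allow:

```shell
# Namespace in the mesh: label it so Istio injects the Envoy sidecar
kubectl create namespace pydgraph-allow
kubectl label namespace pydgraph-allow istio-injection=enabled

# Namespace outside the mesh: no injection label
kubectl create namespace pydgraph-deny
```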

Fetch build and deploy scripts

Build and Push

Now that all the required source files are available, build the image:

source env.sh
az acr login --name ${AZ_ACR_NAME}
pushd examples/pydgraph && make build && make push && popd

Deploy to pydgraph-deny namespace

The client in this namespace will not be a part of the service mesh.

helmfile \
--namespace "pydgraph-deny" \
--file examples/pydgraph/helmfile.yaml \
apply
Namespace: pydgraph-deny

Deploy to pydgraph-allow namespace

The client in this namespace will be a part of the service mesh. Create the namespace pydgraph-allow, deploy the pydgraph client into that namespace, and verify the results with the following commands:

Namespace: pydgraph-allow
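The commands are not shown above; assuming they mirror the pydgraph-deny deployment, they would be:

```shell
helmfile \
  --namespace "pydgraph-allow" \
  --file examples/pydgraph/helmfile.yaml \
  apply

kubectl --namespace "pydgraph-allow" get all
```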

Test 0 (Baseline): No Network Policy

Conduct a basic check to verify that things are working before running any tests with network policies. In this sanity check and the subsequent tests, both HTTP (port 8080) and gRPC (port 9080) will be tested.

No Network Policy

Log into pydgraph-deny

Log into pydgraph-deny client:

PYDGRAPH_DENY_POD=$(
  kubectl get pods --namespace "pydgraph-deny" --output name
)
kubectl exec -ti --namespace "pydgraph-deny" \
  ${PYDGRAPH_DENY_POD} -- bash

HTTP check (no network policy)

In the pydgraph-client container, run this command:

curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq
/health (HTTP)

gRPC check (no network policy)

In the pydgraph-client container, run this command:

grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion
api.Dgraph/CheckVersion (gRPC)

TEST 1: Apply a network policy

The goal of this next test is to deny all traffic that is outside of the service mesh. This can be done by using network policies where only traffic from the service mesh is permitted.

Network Policy added to block traffic outside the mesh

Adding a network policy

This policy will deny all traffic to the Dgraph Alpha pods, except for traffic from the service mesh, or more explicitly, from any pod in namespaces with the label istio-injection=enabled.

Dgraph Network Policy for Istio (made with https://editor.cilium.io)
kubectl apply --filename ./examples/dgraph/network_policy.yaml
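The contents of `network_policy.yaml` are not shown in this article. A minimal sketch of such a policy, assuming Dgraph is deployed in a `dgraph` namespace and that mesh namespaces (including `dgraph` itself) carry the `istio-injection=enabled` label:

```shell
# Sketch: permit ingress to Dgraph pods only from namespaces in the mesh
kubectl apply --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dgraph-allow-mesh-only
  namespace: dgraph
spec:
  podSelector: {}        # applies to all pods in the dgraph namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              istio-injection: enabled
EOF
```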

Log into pydgraph-deny

Log into pydgraph-deny client:

PYDGRAPH_DENY_POD=$(
  kubectl get pods --namespace "pydgraph-deny" --output name
)
kubectl exec -ti --namespace "pydgraph-deny" \
  ${PYDGRAPH_DENY_POD} -- bash

HTTP check (network policy applied)

Log into the pydgraph-client pod, and run this command:

curl ${DGRAPH_ALPHA_SERVER}:8080/health

gRPC check (network policy applied)

Log into the pydgraph-client pod and run this command:

grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion

Test 2: Test with Envoy proxy sidecar

Now that we have verified that network connectivity is not possible from the pydgraph-deny namespace, we can try testing from pydgraph-allow, which has the Envoy proxy sidecar injected into the pod by Istio.

Log into pydgraph-allow

Log into pydgraph-allow client:

PYDGRAPH_ALLOW_POD=$(
  kubectl get pods --namespace "pydgraph-allow" --output name
)
kubectl exec -ti --namespace "pydgraph-allow" \
  ${PYDGRAPH_ALLOW_POD} -- bash

HTTP check (namespace label applied)

Log into the pydgraph-client pod, and run this command:

curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq
/health (HTTP)

gRPC check (namespace label applied)

Log into the pydgraph-client pod and run this command:

grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion
api.Dgraph/CheckVersion (gRPC)

Test 3: Listening to traffic streams

For this step, we will monitor traffic as it goes through the proxy while generating some traffic. For monitoring, we’ll use the Kiali graphical dashboard.

Kiali dashboard

Run this command:

istioctl dashboard kiali

Generate Traffic

With this monitoring in place, log into the pydgraph-client pod and run these commands:

curl ${DGRAPH_ALPHA_SERVER}:8080/health

grpcurl -plaintext -proto api.proto \
  ${DGRAPH_ALPHA_SERVER}:9080 api.Dgraph/CheckVersion

python3 load_data.py --plaintext \
  --alpha ${DGRAPH_ALPHA_SERVER}:9080 \
  --files ./sw.nquads.rdf \
  --schema ./sw.schema

curl "${DGRAPH_ALPHA_SERVER}:8080/query" --silent \
  --request POST \
  --header "Content-Type: application/dql" \
  --data $'{ me(func: has(starring)) { name } }'
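To give Kiali a steady stream of traffic to visualize, the health and version checks above can be wrapped in a simple loop (a convenience, not part of the original walkthrough):

```shell
# Generate alternating HTTP and gRPC traffic for a few minutes
for i in $(seq 1 100); do
  curl --silent ${DGRAPH_ALPHA_SERVER}:8080/health > /dev/null
  grpcurl -plaintext -proto api.proto \
    ${DGRAPH_ALPHA_SERVER}:9080 api.Dgraph/CheckVersion > /dev/null
  sleep 2
done
```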

Observe the resulting traffic

As both gRPC and HTTP traffic is generated, you can see two lines into the demo-dgraph-alpha service, which is depicted as a triangle △ icon.

  • Kubernetes services are represented by the triangle △ icon and pod containers by the square ◻ icon.
  • Both gRPC and HTTP incoming traffic connect to the demo-dgraph-alpha service and then to the alpha container, which is labeled latest due to the lack of a version label.
  • The Dgraph Alpha service then communicates with the Dgraph Zero service, also labeled latest due to the lack of a version label.

Cleanup

This will remove the AKS cluster as well as any resources provisioned through AKS, including external volumes created by the Dgraph deployment.

az aks delete \
--resource-group $AZ_RESOURCE_GROUP \
--name $AZ_CLUSTER_NAME

Resources

These are some resources I came across while researching this article.

Blog Source Code

Service Mesh

General articles about service meshes.

gRPC Load Balancing

Topics on gRPC load balancing on Kubernetes.

Istio vs. Calico: Combining Network Policies with Istio

There are a few articles around using network policies with Istio.

Istio vs AKS: Installing Istio on AKS

These are specific pages related to AKS and Istio.

Documentation

Articles

Articles and blogs on Istio.

Example Application

This is an example application from Istio. There are more examples in the project source code:

Conclusion

In this article, I focused narrowly on the basics of Istio combined with network policies (Calico) for pods that are not in the mesh. One of the main reasons I wanted to look at Istio is the difficulty of load balancing long-lived, multiplexed gRPC traffic; the security (mTLS) and observability were added bonuses.

External Traffic

There comes a point where you may want to expose a service to an external endpoint. Istio provides two custom resources for this: a Gateway resource, for the L4-L6 properties of a load balancer, and a VirtualService resource that can be bound to a gateway to control the forwarding of traffic arriving at a particular host or gateway port.
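As an illustration (not part of this article's walkthrough), a Gateway and VirtualService pair exposing Dgraph Alpha's HTTP port through the default istio-ingressgateway might look like this; the hostname is a placeholder:

```shell
kubectl apply --filename - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: dgraph-gateway
  namespace: dgraph
spec:
  selector:
    istio: ingressgateway      # use the default Istio ingress gateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "dgraph.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: dgraph
  namespace: dgraph
spec:
  hosts:
    - "dgraph.example.com"
  gateways:
    - dgraph-gateway
  http:
    - route:
        - destination:
            host: demo-dgraph-alpha
            port:
              number: 8080
EOF
```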

Final Note

Thank you for following this article. I hope it is useful for getting started with Istio within your organization.

Joaquín Menchaca (智裕)

Linux NinjaPants Automation Engineering Mutant — exploring DevOps, o11y, k8s, progressive deployment (ci/cd), cloud native infra, infra as code