AKS with Istio Service Mesh
Securing traffic with Istio service mesh on AKS
Update (Aug 14, 2021): reduced commands, gists for 4+ commands, fixes
So now you have your distributed application, monolith, or microservices packaged as a container and deployed to Kubernetes. Congratulations!
But now you need security, such as encrypted traffic and network firewalls; and for all of these secured services, you need monitoring and proper load balancing of gRPC (HTTP/2) traffic, especially as Kubernetes falls short in this department (ref).
And all of these things need to happen every time you roll out a new pod.
The Solution
This can all be done with a service mesh, which adds encryption-in-transit (mTLS, or mutual TLS), o11y (cloud native observability), load balancing, and traffic management, as well as other features. For security outside of the service mesh (layer 4), you can use a network plugin, like Calico, that supports network policies.
This article will cover how to get started using Istio, coupled with the famous Envoy proxy, which together form one of the most popular service mesh platforms on Kubernetes.
Goals and Non-Goals
This article will cover the following goals:
- Install AKS with Calico and install Istio with Istio addons
- Install Dgraph and some clients (Python script using pydgraph)
- Test that outside traffic is blocked after installing the network policy.
- Test that traffic works through the service mesh.
- Generate traffic (gRPC and HTTP) and observe in Kiali.
The non-goals (reserved for later):
- Restricting traffic within the mesh to authorized clients.
- Automatic authentication (AuthN, JWT, etc.) for mesh members.
- Managing external inbound or outbound traffic through Gateways.
- Other traffic management features, like retries and circuit breaking.
Architecture
A service mesh can be logically organized into two primary layers:
a control plane layer that’s responsible for configuration and management, and a data plane layer that provides network functions valuable to distributed applications. (ref)
Articles in Series
This series shows how to both secure and load balance gRPC and HTTP traffic.
- AKS with Azure Container Registry
- AKS with Calico network policies
- AKS with Linkerd service mesh
- AKS with Istio service mesh (this article)
Previous Article
The previous article covered similar topics using the Linkerd service mesh.
Requirements
For creation of Azure cloud resources, you will need a subscription that allows you to create resources.
Required Tools
- Azure CLI tool (az): command line tool that interacts with the Azure API.
- Kubernetes client tool (kubectl): command line tool that interacts with the Kubernetes API.
- Helm (helm): command line tool for “templating and sharing Kubernetes manifests” (ref) that are bundled as Helm chart packages.
- helm-diff plugin: allows you to see the changes made with helm or helmfile before applying the changes.
- Helmfile (helmfile): command line tool that uses a “declarative specification for deploying Helm charts across many environments” (ref).
- Istio CLI (istioctl): command line tool to configure and deploy the Istio environment.
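If you use Homebrew on macOS, for example, you could install the required tools with something like the following sketch (the formula names are assumptions based on Homebrew's packages):

# install the required command line tools (Homebrew formula names assumed)
brew install azure-cli kubernetes-cli helm helmfile istioctl

# install the helm-diff plugin used by helmfile
helm plugin install https://github.com/databus23/helm-diff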
Optional tools
- POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested using either of these shells on macOS and Ubuntu Linux.
- Docker (docker): command line tool to build, test, and push docker images.
Project setup
The following structure will be used:
~/azure_istio
├── addons
│ ├── grafana.yaml
│ ├── jaeger.yaml
│ ├── kiali.yaml
│ ├── prometheus.yaml
│ ├── prometheus_vm.yaml
│ └── prometheus_vm_tls.yaml
├── env.sh
└── examples
├── dgraph
│ ├── helmfile.yaml
│ └── network_policy.yaml
└── pydgraph
├── Dockerfile
├── Makefile
├── helmfile.yaml
├── load_data.py
├── requirements.txt
├── sw.nquads.rdf
└── sw.schema
With either Bash or Zsh, you can create the file structure with the following commands:
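A minimal sketch of those commands, assuming the tree shown above:

# create the project directories
mkdir -p ~/azure_istio/{addons,examples/{dgraph,pydgraph}}
cd ~/azure_istio

# create empty placeholder files to fill in later
touch env.sh \
  addons/{grafana,jaeger,kiali,prometheus,prometheus_vm,prometheus_vm_tls}.yaml \
  examples/dgraph/{helmfile,network_policy}.yaml \
  examples/pydgraph/{Dockerfile,Makefile,helmfile.yaml,load_data.py,requirements.txt,sw.nquads.rdf,sw.schema}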
Project environment variables
Setup these environment variables below to keep a consistent environment amongst different tools used in this article. If you are using a POSIX shell, you can save these into a script and source that script whenever needed.
Copy this source script and save as env.sh:
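A sketch of what env.sh might contain; the names and region below are placeholder assumptions, so adjust them to your environment:

# env.sh: shared environment variables for this article (values are examples)
export AZ_RESOURCE_GROUP="aks-istio-demo"
export AZ_LOCATION="westus2"
export AZ_CLUSTER_NAME="aks-istio-demo"
export AZ_ACR_NAME="aksistiodemo"  # ACR names must be globally unique

# use a dedicated kubeconfig for this cluster
export KUBECONFIG="$HOME/.kube/${AZ_CLUSTER_NAME}.yaml"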
Provision Azure resources
Both AKS with Azure CNI and Calico network policies and ACR cloud resources can be provisioned with the following steps outlined in the script below.
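A sketch of the provisioning script; the flags shown are the essential ones for Azure CNI with Calico, with defaults assumed elsewhere:

source env.sh

# create the resource group and container registry
az group create --name ${AZ_RESOURCE_GROUP} --location ${AZ_LOCATION}
az acr create \
  --resource-group ${AZ_RESOURCE_GROUP} \
  --name ${AZ_ACR_NAME} \
  --sku Basic

# create an AKS cluster with the Azure CNI network plugin and
# Calico network policies, attached to ACR for pulling images
az aks create \
  --resource-group ${AZ_RESOURCE_GROUP} \
  --name ${AZ_CLUSTER_NAME} \
  --network-plugin azure \
  --network-policy calico \
  --attach-acr ${AZ_ACR_NAME}

# fetch credentials into the KUBECONFIG path set in env.sh
az aks get-credentials \
  --resource-group ${AZ_RESOURCE_GROUP} \
  --name ${AZ_CLUSTER_NAME} \
  --file ${KUBECONFIG}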
Verify AKS and KUBECONFIG
Verify access to the AKS cluster with the currently configured KUBECONFIG environment variable:
source env.sh
kubectl get all --all-namespaces
The results should look something like the following:
NOTE: As of Aug 1, 2021, this will install Kubernetes v1.20.7 with Calico cluster version v3.19.0. This reflects recent changes and introduces two new namespaces: calico-system and tigera-operator.
Verify Azure CNI
Verify that nodes and pods are now on the same Azure VNET subnet, which means that the Azure CNI network plugin is installed as the default plugin.
You can print the IP addresses on the nodes and pods with the following:
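One way to do this is a small script over kubectl's wide output (a sketch; the column positions assume current kubectl output):

source env.sh

printf "Nodes:\n------------\n"
kubectl get nodes --output wide --no-headers | awk '{ print $1, $6 }'

printf "\nPods:\n------------\n"
kubectl get pods --all-namespaces --output wide --no-headers | awk '{ print $2, $7 }'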
This should show something like this:
Nodes:
------------
aks-nodepool1-56788426-vmss000000 10.240.0.4
aks-nodepool1-56788426-vmss000001 10.240.0.35
aks-nodepool1-56788426-vmss000002 10.240.0.66

Pods:
------------
calico-kube-controllers-7d7897d6b7-qlrh6 10.240.0.36
calico-node-fxg66 10.240.0.66
calico-node-j4hlq 10.240.0.35
calico-node-kwfjv 10.240.0.4
calico-typha-85c77f79bd-5ksvc 10.240.0.4
calico-typha-85c77f79bd-6cl7p 10.240.0.66
calico-typha-85c77f79bd-ppb8x 10.240.0.35
azure-ip-masq-agent-6np6q 10.240.0.66
azure-ip-masq-agent-dt2b7 10.240.0.4
azure-ip-masq-agent-pltj9 10.240.0.35
coredns-9d6c6c99b-5zl69 10.240.0.28
coredns-9d6c6c99b-jzs8w 10.240.0.85
coredns-autoscaler-599949fd86-qlwv4 10.240.0.75
kube-proxy-4tbs4 10.240.0.35
kube-proxy-9rxr9 10.240.0.66
kube-proxy-bjjq5 10.240.0.4
metrics-server-77c8679d7d-dnbbt 10.240.0.89
tunnelfront-589474564b-k8s88 10.240.0.67
tigera-operator-7b555dfbdd-ww8sn 10.240.0.4
The Istio service mesh
There are a few ways to install Istio: Helm charts, operators, or the istioctl command. For this article, we take the easy road with istioctl.
Istio Platform
Install and verify Istio service mesh with the following commands:
source env.sh

istioctl install --set profile=demo -y
kubectl get all --namespace istio-system
This should show something like the following below:
Istio addons
Download the addon manifests and install them with the following commands:
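A sketch of these commands, fetching the manifests from the Istio repository (the release branch below is an assumption; match it to your istioctl version):

ISTIO_RELEASE="release-1.10"  # assumed branch; match your Istio version
BASE_URL="https://raw.githubusercontent.com/istio/istio/${ISTIO_RELEASE}/samples/addons"

# download the addon manifests into the local addons directory
for ADDON in grafana jaeger kiali prometheus; do
  curl --silent --location "${BASE_URL}/${ADDON}.yaml" \
    --output "addons/${ADDON}.yaml"
done

# apply the manifests (run a second time per the note below)
for ADDON in grafana jaeger kiali prometheus; do
  kubectl apply --filename "addons/${ADDON}.yaml"
done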
NOTE: The first time kubectl apply is run, there will be errors because the CRDs were not yet installed. Run kubectl apply again to apply the remaining manifests that depend on the CRDs.
After adding these components, you can see the new resources with kubectl get all -n istio-system:
The Dgraph service
Dgraph is a distributed graph database that can be installed with these steps below.
Save the following as examples/dgraph/helmfile.yaml:
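A sketch of what this helmfile could look like; the release name demo matches the service names seen later, and the presync hook is one possible way to create and label the namespace before the chart installs:

repositories:
  - name: dgraph
    url: https://charts.dgraph.io

releases:
  - name: demo
    namespace: dgraph
    chart: dgraph/dgraph
    hooks:
      # create and label the namespace for Istio sidecar injection
      - events: ["presync"]
        showlogs: true
        command: "sh"
        args:
          - "-c"
          - >-
            kubectl create namespace dgraph
            --dry-run=client --output yaml
            | kubectl apply --filename -
            && kubectl label namespace dgraph
            istio-injection=enabled --overwrite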
NOTE: The namespace dgraph will need to have the required label istio-injection: enabled to signal Istio to inject Envoy proxy sidecars.
Both the namespace with the needed label and Dgraph can be installed and verified with these commands:
source env.sh
helmfile --file examples/dgraph/helmfile.yaml apply
kubectl --namespace dgraph get all
After about two minutes, this should show something like the following:
The pydgraph client
For the pydgraph client, we’ll run through these steps to showcase the Istio service mesh and Calico network policies:
- Build the pydgraph-client image and push it to ACR.
- Deploy pydgraph-client in the pydgraph-allow namespace. Istio will inject an Envoy proxy into the pod.
- Deploy pydgraph-client in the pydgraph-deny namespace.
Fetch the build and deploy scripts
In the previous blog, I documented steps to build and release a pydgraph-client image, and then deploy a container using that image.
Below is a script you can use to download the gists and populate the files needed to run through these steps.
NOTE: These scripts and further details are covered in the previous article (see AKS with Azure Container Registry).
Build and Push
Now that all the required source files are available, build the image:
source env.sh

az acr login --name ${AZ_ACR_NAME}
pushd examples/pydgraph && make build && make push && popd
Deploy to pydgraph-deny namespace
The client in this namespace will not be a part of the service mesh.
helmfile \
--namespace "pydgraph-deny" \
--file examples/pydgraph/helmfile.yaml \
apply
Afterward, you can check the results with kubectl get all -n pydgraph-deny:
Deploy to pydgraph-allow namespace
The client in this namespace will be a part of the service mesh. Create the namespace pydgraph-allow, deploy the pydgraph client into that namespace, and verify the results with the following commands:
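A sketch of those commands (the namespace label enables Envoy sidecar injection):

source env.sh

kubectl create namespace pydgraph-allow
kubectl label namespace pydgraph-allow istio-injection=enabled

helmfile \
  --namespace "pydgraph-allow" \
  --file examples/pydgraph/helmfile.yaml \
  apply

kubectl --namespace pydgraph-allow get all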
The final results in the pydgraph-allow namespace should look similar to the following:
This will add the Envoy proxy sidecar container:
Test 0 (Baseline): No Network Policy
Conduct a basic check to verify that things are working before running any tests with network policies. In this sanity check and the tests that follow, both HTTP (port 8080) and gRPC (port 9080) will be tested.
Log into pydgraph-deny
Log into the pydgraph-deny client:
PYDGRAPH_DENY_POD=$(
  kubectl get pods --namespace "pydgraph-deny" --output name
)

kubectl exec -ti --namespace "pydgraph-deny" \
  ${PYDGRAPH_DENY_POD} -- bash
HTTP check (no network policy)
In the pydgraph-client container, run this command:
curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq
The expected results should be the health status of one of the Dgraph Alpha nodes:
gRPC check (no network policy)
In the pydgraph-client container, run this command:
grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion
The expected results will be the Dgraph server version.
Test 1: Apply a network policy
The goal of this next test is to deny all traffic that is outside of the service mesh. This can be done by using network policies where only traffic from the service mesh is permitted.
After adding the policy, the expected results will be timeouts, as communication from the pydgraph-client in the pydgraph-deny namespace, which is not in the service mesh, will be blocked.
Adding a network policy
This policy will deny all traffic to the Dgraph Alpha pods, except for traffic from the service mesh, or more explicitly, from any pod in a namespace with the label istio-injection: enabled.
Copy the following and save as examples/dgraph/network_policy.yaml:
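A sketch of such a policy; the app: dgraph pod label is an assumption based on the labels the Dgraph Helm chart applies:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dgraph-allow-mesh-only
  namespace: dgraph
spec:
  # select the Dgraph pods (label assumed from the Dgraph Helm chart)
  podSelector:
    matchLabels:
      app: dgraph
  policyTypes:
    - Ingress
  ingress:
    # permit ingress only from namespaces enrolled in the mesh
    - from:
        - namespaceSelector:
            matchLabels:
              istio-injection: enabled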
When ready, apply this with the following command:
kubectl --filename ./examples/dgraph/network_policy.yaml apply
Log into pydgraph-deny
Log into the pydgraph-deny client:
PYDGRAPH_DENY_POD=$(
  kubectl get pods --namespace "pydgraph-deny" --output name
)

kubectl exec -ti --namespace "pydgraph-deny" \
  ${PYDGRAPH_DENY_POD} -- bash
HTTP check (network policy applied)
Log into the pydgraph-client pod, and run this command:
curl ${DGRAPH_ALPHA_SERVER}:8080/health
The expected results in this case, after a very long wait (about 5 minutes), will be something similar to this:
gRPC check (network policy applied)
Log into the pydgraph-client pod and run this command:
grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion
The expected results for gRPC in about 10 seconds will be:
Test 2: Test with Envoy proxy sidecar
Now that we have verified that network connectivity is not possible from the pydgraph-deny namespace, we can try testing from pydgraph-allow, which has the Envoy proxy sidecar injected into the pod by Istio.
Log into pydgraph-allow
Log into the pydgraph-allow client:
PYDGRAPH_ALLOW_POD=$(
  kubectl get pods --namespace "pydgraph-allow" --output name
)

kubectl exec -ti --namespace "pydgraph-allow" \
  ${PYDGRAPH_ALLOW_POD} -- bash
HTTP check (namespace label applied)
Log into the pydgraph-client pod, and run this command:
curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq
The expected result is JSON data about the health of one of the Dgraph Alpha pods.
gRPC check (namespace label applied)
Log into the pydgraph-client pod and run this command:
grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion
The expected result is JSON detailing the Dgraph server version.
Test 3: Listening to traffic streams
For this step, we will monitor traffic as it goes through the proxy and then generate some traffic. For monitoring, we’ll use the Kiali graphical dashboard.
Kiali dashboard
Run this command:
istioctl dashboard kiali
Once in the dashboard, click on Graph and select dgraph for the Namespace.
Generate Traffic
With this monitoring in place, log into the pydgraph-client pod and run these commands:
curl ${DGRAPH_ALPHA_SERVER}:8080/health

grpcurl -plaintext -proto api.proto \
  ${DGRAPH_ALPHA_SERVER}:9080 api.Dgraph/CheckVersion

python3 load_data.py --plaintext \
  --alpha ${DGRAPH_ALPHA_SERVER}:9080 \
  --files ./sw.nquads.rdf \
  --schema ./sw.schema

curl "${DGRAPH_ALPHA_SERVER}:8080/query" --silent \
  --request POST \
  --header "Content-Type: application/dql" \
  --data $'{ me(func: has(starring)) { name } }'
Observe the resulting traffic
As both gRPC and HTTP traffic is generated, you can see two lines into the demo-dgraph-alpha service, which is depicted with a triangle △ icon.
In the graph you can see the following content:
- Kubernetes services are represented by the triangle △ icon and pod containers by the square ◻ icon.
- Both gRPC and HTTP incoming traffic connect to the demo-dgraph-alpha service and then to the alpha container, which is called latest due to the lack of a version label.
- The Dgraph Alpha service then communicates with the Dgraph Zero service, also called latest due to the lack of a version label.
Cleanup
This will remove the AKS cluster as well as any provisioned resources from AKS, including external volumes created through the Dgraph deployment.
az aks delete \
--resource-group $AZ_RESOURCE_GROUP \
--name $AZ_CLUSTER_NAME
Resources
These are some resources I have come across when researching this article.
Blog Source Code
- AKS with Istio Service Mesh: https://github.com/darkn3rd/blog_tutorials/tree/master/kubernetes/aks/series_2_network_mgmnt/part_4_istio
Service Mesh
General articles about service meshes.
- The History of the Service Mesh by William Morgan, 13 Feb 2018.
- Which Service Mesh Should I Use? by George Miranda, 24 Apr 2018.
- Service Meshes in the Cloud Native World by Pavan Belagatti, 5 Apr 2021
- What is a Service Mesh? Redhat, Accessed 1 Aug 2021.
- Service Mesh, Wikipedia, Accessed 1 Aug 2021.
gRPC Load Balancing
Topics on gRPC load balancing on Kubernetes.
- gRPC Load Balancing on Kubernetes without Tears: https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/
Istio vs. Calico: Combining Network Policies with Istio
There are a few articles around using network policies with Istio.
- Network Policy and Istio: Deep Dive by Saurabh Mohan, 24 May 2017.
- Using Network Policy in Concert with Istio — Part 1 by Spike Curtis, 6 Aug 2017.
- Using Network Policy in Concert with Istio — Part 2 by Spike Curtis, 6 Aug 2017.
- Using Network Policy in Concert with Istio — Part 3 by Spike Curtis, 6 Aug 2017.
- Using Network Policy with Istio by Spike Curtis, 10 Aug 2017.
- Configuring Zero Trust Networking with Kubernetes, Istio and Calico, DarkEdges, 17 Jan 2019.
- Enforce Calico network policy using Istio (tutorial), Accessed 1 Aug 2021.
- Istio integration, Accessed 1 Aug 2021.
- Enforce network policy for Istio, Accessed 1 Aug 2021.
- Use HTTP methods and paths in policy rules, Accessed 1 Aug 2021.
Istio vs AKS: Installing Istio on AKS
These are specific pages related to AKS and Istio.
- Install and use Istio in Azure Kubernetes Service (AKS), Azure Documentation, 2 Oct 2019.
- Azure, Istio Documentation, 12 Sep 2019.
Documentation
- Getting Started, Istio Documentation, Accessed 1 Aug 2021.
- Request Routing, Istio Documentation, Accessed 1 Aug 2021.
- Gateway, Istio Documentation, Accessed 2 Aug 2021.
- VirtualService, Istio Documentation, Accessed 2 Aug 2021.
- AuthorizationPolicy, Istio Documentation, Accessed 2 Aug 2021.
Articles
Articles and blogs on Istio.
- StatefulSets Made Easier With Istio 1.10 by Lin Sun, 19 May 2021.
- Introducing istiod: simplifying the control plane by Craig Box, 19 Mar 2020.
- Introducing the Istio Operator by Martin Ostrowski and Frank Budinsky, 14 Nov 2019
- Why You Should Care About Istio Gateways by Neeraj Poddar, 2 Aug 2018
Example Application
This is an example application from Istio. There are more examples in the project source code:
Conclusion
In this article I narrowly focused on the basics of Istio combined with network policies (Calico) for pods that are not in the mesh. One of the main reasons I wanted to look at Istio is due to issues regarding load balancing long-lived multiplexed gRPC traffic, and the security (mTLS) and observability were added bonuses.
There are a few things I would like to explore as next steps, around managing external traffic and further securing traffic within the mesh.
For traffic access, this means restricting traffic within the mesh using AuthorizationPolicy, and exploring adding a layer of authorization, so that a service must authenticate to access a component.
External Traffic
There comes a point where you may want to expose a service to an endpoint. Istio provides two custom resources for this: a Gateway resource, for the L4-L6 properties of a load balancer, and a VirtualService resource that can be bound to a gateway to control the forwarding of traffic arriving at a particular host or gateway port.
For a public facing service, you would want to use a friendly DNS name like https://dgraph.example.com, as this is easier to remember than something like https://20.69.65.109. This can be automated with the Kubernetes addons external-dns and cert-manager. Through these two addons you can automate DNS record updates and the issuing of X.509 certificates from a trusted certificate authority.
So how can I integrate these addons with Istio?
You can integrate these addons using either the native Gateway and VirtualService resources or an ingress resource.
For the ingress, you can select the ingress by setting an annotation of kubernetes.io/ingress.class: istio. I wrote an earlier article, AKS with Cert Manager, that demonstrates how to use ingress-nginx with both external-dns using Azure DNS and cert-manager using Let’s Encrypt. The process is identical with the exception of the annotation selecting istio instead of nginx.
For Gateway and VirtualService resources, external-dns has direct support to scan these sources. With cert-manager, you would configure a Certificate resource, and then reference the secret it creates from the Gateway resource. A sketch of both resources follows below.
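As an illustration, here is a hedged sketch of these two resources for a hypothetical dgraph.example.com host; the names, namespace, and the cert-manager-created secret are assumptions:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: dgraph-gateway
  namespace: dgraph
spec:
  selector:
    istio: ingressgateway  # use Istio's default ingress gateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        # secret created by a cert-manager Certificate; it must live in the
        # namespace of the ingress gateway workload (istio-system by default)
        credentialName: dgraph-example-com-tls
      hosts:
        - dgraph.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: dgraph
  namespace: dgraph
spec:
  hosts:
    - dgraph.example.com
  gateways:
    - dgraph-gateway
  http:
    - route:
        - destination:
            host: demo-dgraph-alpha
            port:
              number: 8080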
Final Note
Thank you for following this article. I hope it is useful for getting started with Istio and putting it to use within your organization.