Ultimate EKS Baseline Cluster

Proper Elastic Kubernetes Service Baseline Cluster

Joaquín Menchaca (智裕)
27 min read · Jul 6, 2023


The AWS managed Kubernetes service, EKS (Elastic Kubernetes Service), has the highest level of complexity among the major cloud offerings. Besides having to build out the networking, routing, security, and worker nodes separately from the managed master nodes, there is no longer any bundled support for storage starting with 1.23, and the built-in support for external load balancers is limited.

In the first part of this article, I show how, with a few commands, you can build a more robust Kubernetes cluster that includes the following:

To secure services that need access to cloud resources, an OIDC provider will be used through the IAM Roles for Service Accounts (IRSA) facility.

In the second part of this article, I will use Dgraph, a distributed graph database, and the Ratel visual client application, to demonstrate the features of this cluster.

0. Prerequisites

These are some prerequisites and initial steps needed to get started before provisioning a Kubernetes cluster and installing add-ons.

0.1 Knowledge: Systems

Basic concepts of systems, such as Linux and the shell (redirection, pipes, process substitution, command substitution, environment variables), as well as virtualization and containers are useful. The concept of a service (daemon) is important.

0.2 Knowledge: Networking

This article requires some basic understanding of networking with TCP/IP and the OSI model, specifically Layer 4 (transport) and Layer 7 (application, e.g. HTTP). This article covers using load balancing and reverse proxies.

0.3 Knowledge: Kubernetes

In Kubernetes, familiarity with the service types ClusterIP, NodePort, LoadBalancer, and ExternalName, as well as the ingress resource, is important.

Exposure to the other types of Kubernetes resource objects used in this guide is helpful: persistent volume claims, storage classes, pods, deployments, statefulsets, configmaps, service accounts and network policies.

0.4 Tools

These are the required tools for this guide:

  • AWS CLI [aws] is a tool that interacts with AWS.
  • kubectl client [kubectl] is the tool used to interact with the Kubernetes cluster. This can be installed using the asdf tool.
  • helm [helm] is a tool that can install Kubernetes applications that are packaged as helm charts.
  • eksctl [eksctl] is the tool that can provision an EKS cluster as well as the supporting VPC network infrastructure.
  • POSIX Shell [sh] such as bash [bash] or zsh [zsh] is used to run the commands.

These tools are highly recommended:

  • asdf [asdf] is a tool that installs versions of popular tools like kubectl.
  • jq [jq] is a tool to query and print JSON data.
  • GNU Grep [grep] supports extracting string patterns using extended regex and PCRE.

0.5 AWS Setup

Before getting started on EKS, you will need to set up billing to an AWS account (there’s a free tier), and then configure a profile that provides access to an IAM user identity. See Setting up the AWS CLI for more information on configuring a profile.

After setup, you can test access with the following:

export AWS_PROFILE="<your-profile-goes-here>"
aws sts get-caller-identity

This should show something like the following, with values appropriate to your environment, e.g. example output for an IAM user named kwisatzhaderach:
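The output is JSON resembling the sketch below; the user id and account id here are placeholders, not real identifiers:

```json
{
    "UserId": "AIDAXXXXXXXXXXXXXXXXX",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/kwisatzhaderach"
}
```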

0.6 Kubernetes Client Setup

If you use asdf to install kubectl, you can get the latest version with the following:

# install kubectl plugin for asdf
asdf plugin-add kubectl \
  https://github.com/asdf-community/asdf-kubectl.git

# fetch and set the latest kubectl
asdf install kubectl latest
asdf global kubectl latest

# test results of latest kubectl
kubectl version --short --client 2> /dev/null

This should show something like:

Client Version: v1.27.1
Kustomize Version: v5.0.1

Also, create a directory to store the Kubernetes configurations that will be referenced by the KUBECONFIG environment variable:

mkdir -p $HOME/.kube

0.7 Setup Environment Variables

These environment variables will be used throughout this guide. If opening up a new terminal tab, make sure to set the environment variables accordingly.

# variables used to create EKS
export AWS_PROFILE="my-aws-profile" # CHANGEME
export EKS_CLUSTER_NAME="my-unique-cluster-name" # CHANGEME
export EKS_REGION="us-west-2"
export EKS_VERSION="1.26"
# KUBECONFIG variable
export KUBECONFIG=$HOME/.kube/$EKS_REGION.$EKS_CLUSTER_NAME.yaml

# account id
export ACCOUNT_ID=$(aws sts get-caller-identity \
--query "Account" \
--output text
)

# aws-load-balancer-controller
export POLICY_NAME_ALBC="${EKS_CLUSTER_NAME}_AWSLoadBalancerControllerIAMPolicy"
export POLICY_ARN_ALBC="arn:aws:iam::$ACCOUNT_ID:policy/$POLICY_NAME_ALBC"
export ROLE_NAME_ALBC="${EKS_CLUSTER_NAME}_AmazonEKSLoadBalancerControllerRole"

# ebs-csi-driver
export ROLE_NAME_ECSI="${EKS_CLUSTER_NAME}_EBS_CSI_DriverRole"
export ACCOUNT_ROLE_ARN_ECSI="arn:aws:iam::$ACCOUNT_ID:role/$ROLE_NAME_ECSI"
POLICY_NAME_ECSI="AmazonEBSCSIDriverPolicy" # preinstalled by AWS
export POLICY_ARN_ECSI="arn:aws:iam::aws:policy/service-role/$POLICY_NAME_ECSI"
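As a sanity check, you can echo the composed ARNs. A quick sketch with hypothetical placeholder values (the account id and cluster name below are illustrative, not real):

```shell
# hypothetical placeholder values, standing in for your real account and cluster
ACCOUNT_ID="123456789012"
EKS_CLUSTER_NAME="my-unique-cluster-name"

# the policy ARN is composed the same way as POLICY_ARN_ALBC above
POLICY_NAME_ALBC="${EKS_CLUSTER_NAME}_AWSLoadBalancerControllerIAMPolicy"
POLICY_ARN_ALBC="arn:aws:iam::$ACCOUNT_ID:policy/$POLICY_NAME_ALBC"
echo "$POLICY_ARN_ALBC"
```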

0.8 Setup Helm Repositories

These days helm charts come from a variety of sources. You can add the helm chart repositories used in this guide by running the following commands.

# add AWS LB Controller (NLB/ALB) helm charts
helm repo add "eks" "https://aws.github.io/eks-charts"
# add Calico CNI helm charts
helm repo add "projectcalico" "https://docs.tigera.io/calico/charts"
# add Dgraph helm charts (demo application)
helm repo add "dgraph" "https://charts.dgraph.io"

# download charts
helm repo update

1. Provision an EKS cluster

After the prerequisite tools are installed and set up, we can start provisioning cloud resources and deploying components to Kubernetes. The cluster can be brought up with the following command:

eksctl create cluster \
--version $EKS_VERSION \
--region $EKS_REGION \
--name $EKS_CLUSTER_NAME \
--nodes 3

Provisioning both the VPC and the EKS cluster takes about 20 minutes. Once this is finished, install a kubectl version that matches the Kubernetes server version:

# fetch exact version of Kubernetes server (Requires GNU Grep)
VER=$(kubectl version --short 2> /dev/null \
| grep Server \
| grep -oP '(\d{1,2}\.){2}\d{1,2}'
)

# setup kubectl tool
asdf list kubectl | grep -q $VER || asdf install kubectl $VER
asdf global kubectl $VER

⚠️ NOTE: The above command requires GNU grep. If you have Homebrew, you can run brew install grep. Windows can get this with MSYS2 or git-bash.
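To see what the extraction does without a live cluster, here is a sketch against a sample Server line (the version string below is hypothetical):

```shell
# sample "kubectl version --short" server line (hypothetical version string)
LINE='Server Version: v1.26.4-eks-0a21954'

# extract major.minor.patch using GNU grep's PCRE support
VER=$(echo "$LINE" | grep Server | grep -oP '(\d{1,2}\.){2}\d{1,2}')
echo "$VER"   # 1.26.4
```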

Also, check the status of the worker nodes and applications running on Kubernetes.

kubectl get nodes
kubectl get all --all-namespaces

This should show something like the following.
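The node names, ages, and versions will differ in your environment, but the node listing should resemble:

```
NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-12-34.us-west-2.compute.internal   Ready    <none>   5m    v1.26.4-eks-0a21954
ip-192-168-56-78.us-west-2.compute.internal   Ready    <none>   5m    v1.26.4-eks-0a21954
ip-192-168-90-12.us-west-2.compute.internal   Ready    <none>   5m    v1.26.4-eks-0a21954
```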

1.1 Add OIDC Provider Support

The EKS cluster has an OpenID Connect (OIDC) issuer URL associated with it. To use IRSA (IAM roles for service accounts), an IAM OIDC provider must exist for the cluster’s OIDC issuer URL.

You can set this up with the following command:

eksctl utils associate-iam-oidc-provider \
--cluster $EKS_CLUSTER_NAME \
--region $EKS_REGION \
--approve

You can verify the OIDC provider is added with the following:

OIDC_ID=$(aws eks describe-cluster \
--name $EKS_CLUSTER_NAME \
--region $EKS_REGION \
--query "cluster.identity.oidc.issuer" \
--output text \
| cut -d '/' -f 5
)

aws iam list-open-id-connect-providers \
| grep $OIDC_ID \
| cut -d '"' -f4 \
| cut -d '/' -f4
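To illustrate what the cut pipeline extracts, here is a sketch with a hypothetical issuer URL (the id below is a placeholder):

```shell
# hypothetical OIDC issuer URL for an EKS cluster
ISSUER="https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"

# field 5 (splitting on "/") is the OIDC provider id
OIDC_ID=$(echo "$ISSUER" | cut -d '/' -f 5)
echo "$OIDC_ID"
```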

2. AWS Load Balancer Controller

This will add external load balancer support with either layer 4 TCP through NLB or layer 7 HTTP through ALB. The NLB can be used with a service resource of type LoadBalancer, while the reverse proxy is provisioned through an ingress resource.

Installation of this component will require the following steps:

  1. Upload policy (e.g. AWSLoadBalancerControllerIAMPolicy) that grants access to Elastic Load Balancing APIs
  2. Create an IAM Role (e.g. AmazonEKSLoadBalancerControllerRole) with the above attached policy and associate it to the name of the future KSA (i.e. aws-load-balancer-controller).
  3. Create a Kubernetes service account (i.e. aws-load-balancer-controller) and associate it with the previously created IAM Role (e.g. AmazonEKSLoadBalancerControllerRole).
  4. Deploy AWS Load Balancer Controller that uses the above service account.

2.1 Create a Policy to access ELB APIs

There’s a policy file we can download that has sufficient permissions to access Elastic Load Balancing APIs. In this step, we’ll create an IAM policy using these permissions.

First, download the file with the following:

VER="v2.5.2" # change if version changes
PREFIX="https://raw.githubusercontent.com"
HTTP_PATH="kubernetes-sigs/aws-load-balancer-controller/$VER/docs/install"
FILE_GOV="iam_policy_us-gov"
FILE_REG="iam_policy"

# download the appropriate policy file (use $FILE_GOV.json on AWS GovCloud)
curl --remote-name --silent --location $PREFIX/$HTTP_PATH/$FILE_REG.json

Once this is downloaded, you can upload the policy with the following:

aws iam create-policy \
--policy-name $POLICY_NAME_ALBC \
--policy-document file://iam_policy.json

2.2 Associate Service Account with the uploaded policy

eksctl create iamserviceaccount \
--cluster $EKS_CLUSTER_NAME \
--region $EKS_REGION \
--namespace "kube-system" \
--name "aws-load-balancer-controller" \
--role-name $ROLE_NAME_ALBC \
--attach-policy-arn $POLICY_ARN_ALBC \
--approve

You can verify what was created with the following:

aws iam get-role --role-name $ROLE_NAME_ALBC

This should show something like the following:

You can inspect the metadata added to the service account:

kubectl get serviceaccount "aws-load-balancer-controller" \
--namespace "kube-system" \
--output yaml

This should show something like the following:
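The service account should carry an eks.amazonaws.com/role-arn annotation pointing back at the IAM role. A sketch with a placeholder account id:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-unique-cluster-name_AmazonEKSLoadBalancerControllerRole
  name: aws-load-balancer-controller
  namespace: kube-system
```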

2.3 Install AWS load balancer controller add-on

helm install \
aws-load-balancer-controller \
eks/aws-load-balancer-controller \
--namespace "kube-system" \
--set clusterName=$EKS_CLUSTER_NAME \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller

When completed, you can check on the status of it by running:

kubectl get all \
--namespace "kube-system" \
--selector "app.kubernetes.io/name=aws-load-balancer-controller"

This should show something like the following:

2.4 Testing NLB

This is a minimal test that deploys an Apache Web Server to verify the solution using an NLB.

# deploy application
kubectl create namespace httpd-svc
kubectl create deployment httpd \
--image=httpd \
--replicas=3 \
--port=80 \
--namespace=httpd-svc

# provision external load balancer
cat <<EOF | kubectl apply --namespace httpd-svc -f -
apiVersion: v1
kind: Service
metadata:
  name: httpd
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: LoadBalancer
  selector:
    app: httpd
EOF

Verify that all the components were installed:

kubectl get all --namespace=httpd-svc

This should show something like the following:

You can run curl to fetch a response from the web service:

export SVC_LB=$(kubectl get service httpd \
--namespace "httpd-svc" \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

curl --silent --include $SVC_LB

This should look something like the following:

Additionally, for the curious, if you want to look at resources created by the controller, you can run the following command:

export SVC_LB=$(kubectl get service httpd \
--namespace "httpd-svc" \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

aws elbv2 describe-load-balancers --region $EKS_REGION \
--query "LoadBalancers[?DNSName==\`$SVC_LB\`]"

This should show something like this:

2.5 Testing ALB

This is a minimal test that deploys an Apache Web Server to verify the solution using an ALB.

# deploy application 
kubectl create namespace "httpd-ing"
kubectl create deployment httpd \
--image "httpd" \
--replicas 3 \
--port 80 \
--namespace "httpd-ing"

# create proxy to deployment
kubectl expose deployment httpd \
--port 80 \
--target-port 80 \
--namespace "httpd-ing"

# provision application load balancer
kubectl create ingress alb-ingress \
--class "alb" \
--rule "/=httpd:80" \
--annotation "alb.ingress.kubernetes.io/scheme=internet-facing" \
--annotation "alb.ingress.kubernetes.io/target-type=ip" \
--namespace "httpd-ing"

Verify the components were installed with:

kubectl get all,ing --namespace "httpd-ing"

This should show something like:

Test the connection with the following:

export ING_LB=$(kubectl get ing alb-ingress \
--namespace "httpd-ing" \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

curl --silent --include $ING_LB

This should show something like the following:

Additionally, for the curious, if you want to look at AWS resources created by the controller, you can run the following command:

export ING_LB=$(kubectl get ing alb-ingress \
--namespace "httpd-ing" \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

aws elbv2 describe-load-balancers --region $EKS_REGION \
--query "LoadBalancers[?DNSName==\`$ING_LB\`]"

This should show something like the following:

2.6 Delete Test Applications

You can delete these test applications with the following commands:

# deprovision cloud resources
kubectl delete "ingress/alb-ingress" --namespace "httpd-ing"
kubectl delete "service/httpd" --namespace "httpd-svc"

# delete kubernetes resources
kubectl delete namespace "httpd-svc" "httpd-ing"

3. AWS EBS CSI driver

Current versions of EKS, starting with 1.23, no longer come with working persistent volume support out of the box, so you have to install it on your own. The easiest method to install this is the EKS add-ons facility, which will install the EBS CSI driver.

Installation of this component will require the following steps:

  1. Create IAM Role (e.g. EBS_CSI_DriverRole) and associate it to Kubernetes service account (i.e. ebs-csi-controller-sa).
  2. Deploy AWS EBS CSI driver using EKS add-ons facility, which also sets up the Kubernetes service account (i.e. ebs-csi-controller-sa) with an association back to the above IAM Role (e.g. EBS_CSI_DriverRole).
  3. Create a storage class that uses the new EBS CSI driver.

3.1 Setup IAM Role and K8S SA association

The following process will create an IAM Role with permissions to access AWS EBS API. The service account ebs-csi-controller-sa will be created later when installing the driver.

# AWS IAM role bound to a Kubernetes service account
eksctl create iamserviceaccount \
--name "ebs-csi-controller-sa" \
--namespace "kube-system" \
--cluster $EKS_CLUSTER_NAME \
--region $EKS_REGION \
--attach-policy-arn $POLICY_ARN_ECSI \
--role-only \
--role-name $ROLE_NAME_ECSI \
--approve

This will create an IAM Role, which you can verify with:

aws iam get-role --role-name $ROLE_NAME_ECSI

This should show something like this:

3.2 Install AWS EBS CSI Driver

# Install Addon
eksctl create addon \
--name "aws-ebs-csi-driver" \
--cluster $EKS_CLUSTER_NAME \
--region $EKS_REGION \
--service-account-role-arn $ACCOUNT_ROLE_ARN_ECSI \
--force

# Pause here until STATUS=ACTIVE
ACTIVE=""; while [[ -z "$ACTIVE" ]]; do
  if eksctl get addon \
    --name "aws-ebs-csi-driver" \
    --region $EKS_REGION \
    --cluster $EKS_CLUSTER_NAME \
    | tail -1 \
    | awk '{print $3}' \
    | grep -q "ACTIVE"
  then
    ACTIVE="1"
  fi
  sleep 5  # avoid hammering the EKS API while polling
done

It is important to wait until status changes to ACTIVE before proceeding.

You can inspect the pods created by running the following command:

kubectl get pods \
--namespace "kube-system" \
--selector "app.kubernetes.io/name=aws-ebs-csi-driver"

This should show something like:

You can verify that the service account annotations reference the IAM Role for the EBS CSI driver:

kubectl get serviceaccount "ebs-csi-controller-sa" \
--namespace "kube-system" \
--output yaml

3.3 Create storage class that uses the EBS CSI driver

In order to use the driver, we will need to create a storage class. You can do so by running the following command:

# create ebs-sc storage class
cat <<EOF | kubectl apply --filename -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF

When completed you can verify the storage class was created with:

kubectl get storageclass

This should show something like the following

Console: Get StorageClass

3.4 Set new storage class to the default (optional)

This is an optional step. As there’s no functional default storage class, we can set the newly created storage class to be the default with the following commands:

kubectl patch storageclass gp2 --patch \
'{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass ebs-sc --patch \
'{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

After this, you can verify the change with:

kubectl get storageclass

This should show something like:
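Names and ages aside, after the patches the listing should resemble the sketch below, with ebs-sc marked as the default:

```
NAME               PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
ebs-sc (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   2m
gp2                kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  30m
```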

3.5 Testing Persistent Volume

In this small test, we deploy a pod that continually writes to the external volume, and a volume claim to allocate storage using the storage class we created earlier.

If this works, the storage will be provisioned in the cloud to create the volume, which will then be attached to the node and mounted into the pod. If this fails, the pod will be stuck in the Pending state.

# create pod with persistent volume
kubectl create namespace "ebs-test"

# deploy application with mounted volume
# note: \$ keeps the date expansion inside the pod, not in your local shell
cat <<EOF | kubectl apply --namespace "ebs-test" --filename -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: ubuntu
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: ebs-claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 4Gi
EOF

You can test the results of the volume creation with the following command:

kubectl get all,pvc --namespace "ebs-test"

We can also look at the events that took place in this namespace with:

kubectl events --namespace "ebs-test"

This should show something like this:

3.6 Delete test application

When finished, you can delete the pod with the following:

kubectl delete pod app --namespace "ebs-test"
kubectl delete pvc ebs-claim --namespace "ebs-test"
kubectl delete ns "ebs-test"

4. Calico CNI

These instructions will continue to use AWS VPC CNI for networking, and will use Calico CNI for network policies.

4.1 Install Calico CNI via the Tigera operator

# create ns for operator
kubectl create namespace tigera-operator

# deploy calico cni
helm install calico projectcalico/tigera-operator \
--version v3.26.1 \
--namespace tigera-operator \
--set installation.kubernetesProvider=EKS

You can verify the installed components with the following command:

kubectl get pods -n calico-system

This should look something like this:

4.2 Enable Pod IP annotation (important)

There is a known issue with kubelet taking time to update Pod.Status.PodIP, which leaves Calico blocked on programming the policy. Setting ANNOTATE_POD_IP to true enables the AWS VPC CNI plugin (i.e. IPAMD) to add the pod IP as an annotation (i.e. vpc.amazonaws.com/pod-ips) to the pod spec, which addresses this race condition.

cat << EOF > append.yaml
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - patch
EOF

# patch cluster role to allow updating annotations
kubectl apply -f <(cat <(kubectl get clusterrole aws-node -o yaml) append.yaml)

# enable pod annotation
kubectl set env daemonset aws-node \
--namespace kube-system \
ANNOTATE_POD_IP=true

# delete existing pod, so that they are refreshed with the annotation
kubectl delete pod \
--selector app.kubernetes.io/name=calico-kube-controllers \
--namespace calico-system

Verify that the annotation was added:

kubectl describe pod \
--selector app.kubernetes.io/name=calico-kube-controllers \
--namespace calico-system \
| grep -o vpc.amazonaws.com/pod-ips.*$

This should show something like:

vpc.amazonaws.com/pod-ips: 192.168.95.173

4.3 Test network policy

Calico has a great tutorial application that shows the connections between a front-end service, a back-end service, and a client.

STEP 1: You can install this demonstration application with the following commands:

MANIFESTS=(00-namespace 01-management-ui 02-backend 03-frontend 04-client)
APP_URL=https://docs.projectcalico.org/v3.5/getting-started/kubernetes/tutorials/stars-policy/manifests/

for MANIFEST in ${MANIFESTS[*]}; do
kubectl apply -f $APP_URL/$MANIFEST.yaml
done

STEP 2: You can check out the graphical application by running the command below. You can run this in another terminal tab (make sure to set KUBECONFIG so that you can access the cluster).

kubectl port-forward service/management-ui \
--namespace management-ui 9001

STEP 3: Open a browser to http://localhost:9001/. You should see the management user interface. The C node is the client service, the F node is the front-end service, and the B node is the back-end service. Each node has full communication access to all other nodes, as indicated by the bold, colored lines.

STEP 4: Apply the following network policies to isolate the services from each other:

DENY_URL=https://docs.projectcalico.org/v3.5/getting-started/kubernetes/tutorials/stars-policy/policies/default-deny.yaml

kubectl apply --namespace client --filename $DENY_URL
kubectl apply --namespace stars --filename $DENY_URL

STEP 5: If the graphical application is still running, hit refresh. The management user interface will no longer be able to reach the services.

STEP 6: Apply the following network policies to allow the management user interface to access the services:

export ALLOW_URL=https://docs.projectcalico.org/v3.5/getting-started/kubernetes/tutorials/stars-policy/policies/

kubectl apply --filename $ALLOW_URL/allow-ui.yaml
kubectl apply --filename $ALLOW_URL/allow-ui-client.yaml

STEP 7: After refreshing the browser, you can see that the management user interface can reach the nodes again, but the nodes cannot communicate with each other.

STEP 8: Apply the following network policy to allow traffic from the front-end service to the back-end service:

kubectl apply --filename $ALLOW_URL/backend-policy.yaml

STEP 9: After refreshing the browser, you can see that the front-end can communicate with the back-end.

STEP 10: Apply the following network policy to allow traffic from the client to the front-end service.

kubectl apply --filename $ALLOW_URL/frontend-policy.yaml

After refreshing the browser, you can see the client can communicate to the front-end service. The front-end service can still communicate to the back-end service.

4.4 Delete test application

This command will delete the application.

MANIFESTS=(04-client 03-frontend 02-backend 01-management-ui 00-namespace)
APP_URL=https://docs.projectcalico.org/v3.5/getting-started/kubernetes/tutorials/stars-policy/manifests/

for MANIFEST in ${MANIFESTS[*]}; do
kubectl delete --filename $APP_URL/$MANIFEST.yaml
done

5. Dgraph Demo Application

Dgraph is a highly performant distributed graph database that uses either DQL (Dgraph Query Language) or GraphQL as graph database query languages.

You can install Dgraph using the Dgraph helm chart. For this example, we’ll use an external load balancer with NLB. Since it is generally not safe to park a database on the public internet, we’ll create an access list (allow list) to limit which client sources can communicate with the Dgraph database.

5.1 Setup Access List for Security

We can get your current public IP address as well as the private IP range used by the EKS VPC with the following commands:

VPC_ID=$(aws eks describe-cluster \
--name $EKS_CLUSTER_NAME \
--region $EKS_REGION \
--query 'cluster.resourcesVpcConfig.vpcId' \
--output text
)

EKS_CIDR=$(aws ec2 describe-vpcs \
--vpc-ids $VPC_ID \
--region $EKS_REGION \
--query 'Vpcs[0].CidrBlock' \
--output text
)

# get the current outbound IP from your current location
MY_IP_ADDRESS=$(curl --silent ifconfig.me)

# set env var to use later
export DG_ALLOW_LIST="${EKS_CIDR},${MY_IP_ADDRESS}/32"
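To show how the allow list string is composed, here is a sketch with hypothetical values standing in for the aws and curl lookups above:

```shell
# hypothetical values standing in for the real VPC CIDR and public IP lookups
EKS_CIDR="192.168.0.0/16"
MY_IP_ADDRESS="203.0.113.7"

# the allow list is the VPC CIDR plus your public IP as a /32
DG_ALLOW_LIST="${EKS_CIDR},${MY_IP_ADDRESS}/32"
echo "$DG_ALLOW_LIST"
```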

5.2 Install HA Dgraph cluster

With the env variable DG_ALLOW_LIST set, we can deploy Dgraph with the following:

# get dgraph helm chart
helm repo add dgraph https://charts.dgraph.io && helm repo update

# deploy dgraph
helm install dg dgraph/dgraph \
  --namespace dgraph \
  --create-namespace \
  --values - <<EOF
zero:
  persistence:
    storageClass: ebs-sc
alpha:
  configFile:
    config.yaml: |
      security:
        whitelist: ${DG_ALLOW_LIST}
  persistence:
    storageClass: ebs-sc
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: external
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
      service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
      service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
EOF

You can inspect the deployed components with:

kubectl get all --namespace dgraph

5.3 About External Load Balancer Configuration

Normally, when configuring the external load balancer, you can use loadBalancerSourceRanges to limit access through the load balancer, but this will not work with NLB, because Security Groups are not supported with NLB (see issue 2221), so this will need to be configured at the application level.

Also note that the client source IP address can be preserved when externalTrafficPolicy is set to Local, but with NLB, this has to be set through the target group attributes.

If you would like to get further information regarding what was provisioned on AWS, you can run this command:

export DG_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

# Get information using LB DNS Name
aws elbv2 describe-load-balancers --region $EKS_REGION \
--query "LoadBalancers[?DNSName==\`$DG_LB\`]" | jq

5.4 Connecting to Dgraph

You can run the following to test connectivity to Dgraph.


export DG_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

curl --silent $DG_LB:8080/state | jq -r '.groups."1".members'

We will use this environment variable DG_LB in the following steps of this tutorial.

5.5 Testing Dgraph

We can test Dgraph by loading some data and a schema, and then running some queries using curl. Make sure to have the DG_LB environment variable set.

First, let’s upload some data.

curl "$DG_LB:8080/mutate?commitNow=true" --silent --request POST \
  --header "Content-Type: application/json" \
  --data $'
{
  "set": [
    {"uid": "_:luke", "name": "Luke Skywalker", "dgraph.type": "Person"},
    {"uid": "_:leia", "name": "Princess Leia", "dgraph.type": "Person"},
    {"uid": "_:han", "name": "Han Solo", "dgraph.type": "Person"},
    {"uid": "_:lucas", "name": "George Lucas", "dgraph.type": "Person"},
    {"uid": "_:irvin", "name": "Irvin Kernshner", "dgraph.type": "Person"},
    {"uid": "_:richard", "name": "Richard Marquand", "dgraph.type": "Person"},
    {
      "uid": "_:sw1",
      "name": "Star Wars: Episode IV - A New Hope",
      "release_date": "1977-05-25",
      "revenue": 775000000,
      "running_time": 121,
      "starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
      "director": [{"uid": "_:lucas"}],
      "dgraph.type": "Film"
    },
    {
      "uid": "_:sw2",
      "name": "Star Wars: Episode V - The Empire Strikes Back",
      "release_date": "1980-05-21",
      "revenue": 534000000,
      "running_time": 124,
      "starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
      "director": [{"uid": "_:irvin"}],
      "dgraph.type": "Film"
    },
    {
      "uid": "_:sw3",
      "name": "Star Wars: Episode VI - Return of the Jedi",
      "release_date": "1983-05-25",
      "revenue": 572000000,
      "running_time": 131,
      "starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
      "director": [{"uid": "_:richard"}],
      "dgraph.type": "Film"
    },
    {
      "uid": "_:st1",
      "name": "Star Trek: The Motion Picture",
      "release_date": "1979-12-07",
      "revenue": 139000000,
      "running_time": 132,
      "dgraph.type": "Film"
    }
  ]
}
' | jq

Now, let’s upload the schema that will add an index.

curl "$DG_LB:8080/alter" --silent --request POST \
  --data $'
name: string @index(term) .
release_date: datetime @index(year) .
revenue: float .
running_time: int .
starring: [uid] .
director: [uid] .

type Person {
  name
}

type Film {
  name
  release_date
  revenue
  running_time
  starring
  director
}
' | jq

NOTE: This alter command will fail if the whitelist is not set up.

You can list all of the movies that have a starring edge:

curl "$DG_LB:8080/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'{ me(func: has(starring)) { name } }' \
| jq .data
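Given the data loaded above, only the three Star Wars films carry a starring edge, so the result should resemble (ordering may differ):

```json
{
  "me": [
    { "name": "Star Wars: Episode IV - A New Hope" },
    { "name": "Star Wars: Episode V - The Empire Strikes Back" },
    { "name": "Star Wars: Episode VI - Return of the Jedi" }
  ]
}
```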

You can run this query for Star Wars movies released after 1980:

curl "$DG_LB:8080/query" --silent --request POST \
  --header "Content-Type: application/dql" \
  --data $'
{
  me(func: allofterms(name, "Star Wars"), orderasc: release_date)
    @filter(ge(release_date, "1980")) {
    name
    release_date
    revenue
    running_time
    director { name }
    starring (orderasc: name) { name }
  }
}
' | jq .data

6. Ratel visual application

Ratel is a graphical query and administration application for Dgraph. Unlike a double-clickable desktop application, it runs only within a web browser like Safari, Firefox, or Chrome.

6.1 Installing Ratel

You can install a small web service that hosts Ratel, so that you can run this in your browser. Run the following to install the web service hosting Ratel:

# get dgraph helm chart
helm repo add dgraph https://charts.dgraph.io && helm repo update

# deploy Ratel
helm install ratel \
  --namespace ratel \
  --create-namespace dgraph/ratel \
  --values - <<EOF
ingress:
  enabled: true
  className: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
  hosts:
    - paths:
        - path: /*
          pathType: ImplementationSpecific
EOF

You can verify the installed components with the following command:

kubectl get all,ing --namespace ratel

6.2 Accessing Ratel

Print out the URL to Ratel, and paste it into the browser.

RATEL_LB=$(kubectl get ing ratel \
--namespace "ratel" \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

echo "http://$RATEL_LB"

You will be prompted to enter in the Dgraph server URL. You can get this with the following command:

DGRAPH_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

echo "http://$DGRAPH_LB:8080"

Copy that string, paste it into the Dgraph server URL textbox, and hit enter.

Click on Continue. This should drop you into the Query mode.

6.3 Testing queries with Ratel

If you ran through the previous steps in the Testing Dgraph section, the data and schema should already be loaded. You can run the same query from earlier:

{
  me(func: allofterms(name, "Star Wars"), orderasc: release_date)
    @filter(ge(release_date, "1980")) {
    name
    release_date
    revenue
    running_time
    director { name }
    starring (orderasc: name) { name }
  }
}

This should look something like:

7. Security with Network Policies

Network Policies can control ingress (inbound) and egress (outbound) traffic between services running on Kubernetes.

7.1 General recommendations

For securing services within a Kubernetes cluster, I recommend creating the following policies:

  1. default baseline: in a given namespace, deny all traffic.
  2. single tier web application: allow all traffic (0.0.0.0/0) but deny egress traffic to private subnets.
  3. multi-tier web application: same as single tier web application, but allow outbound traffic to database’s namespace, e.g. add an egress rule to namespaceSelector.matchLabels.name=$DATABASE_NAMESPACE.
  4. private database tier: allow all traffic from namespace of clients that need access to the database.

Dgraph is in the #4 category, so all traffic should be blocked except from desired sources.

Ratel is in the #2 category, a small web server that hosts the client-only Ratel application. As with any web server, it should be isolated and not able to reach any private service within the internal network.

7.2 Test initial access from Ratel

First, let's test access by running a shell inside the Ratel container.

RATEL_POD=$(kubectl get pod \
--selector app.kubernetes.io/name=ratel \
--namespace ratel \
--output name
)

kubectl exec -ti -n ratel $RATEL_POD -- sh

Once inside the Ratel container, run the following commands to test access:

# connect using service
wget -q -O- dg-dgraph-alpha.dgraph:8080/health

# connect using pod
wget -q -O- dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health

7.3 Test initial access to Dgraph

We will use the namespace unapproved to test traffic that should be denied, and the namespace dgraph-client to test approved traffic to the Dgraph service.

Run the commands below to set this up.

# create namespace
kubectl create namespace dgraph-client

# run new container and exec into the container
# CTRL-D to exit the session
kubectl run curl \
--namespace dgraph-client \
--image=curlimages/curl \
--stdin --tty -- sh

Once inside this container running in the dgraph-client namespace, run these commands to test the connection:

# connect using service
curl dg-dgraph-alpha.dgraph:8080/health

# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health

Also repeat the same process for the unapproved namespace:

# create namespace
kubectl create namespace unapproved

# run new container and exec into the container
kubectl run curl \
--namespace unapproved \
--image=curlimages/curl \
--stdin --tty -- sh

Once in the container running in the unapproved namespace, repeat the same test:

# connect using service
curl dg-dgraph-alpha.dgraph:8080/health

# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health

7.4 Example: restricting egress traffic from Ratel

Now we can add a network policy that restricts all egress traffic to private IP addresses, using the command below:

kubectl apply --namespace ratel --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ratel-deny-egress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}
  egress:
    - to:
        - ipBlock:
            cidr: "0.0.0.0/0"
            except:
              - "10.0.0.0/8"
              - "172.16.0.0/12"
              - "192.168.0.0/16"
EOF
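To illustrate which destinations the except block carves out, here is a rough sketch that classifies a few sample addresses. It uses a simple string-prefix check, not real CIDR math, and the sample addresses are illustrative:

```shell
#!/bin/sh
# Sketch: which destinations remain reachable under the egress policy above.
# Egress is allowed to 0.0.0.0/0 except the RFC 1918 private ranges.
# NOTE: prefix matching is an approximation of CIDR matching, good enough
# for these three well-aligned ranges.
in_rfc1918() {
  case "$1" in
    10.*)                                  return 0 ;;  # 10.0.0.0/8
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;  # 172.16.0.0/12
    192.168.*)                             return 0 ;;  # 192.168.0.0/16
    *)                                     return 1 ;;
  esac
}

for ADDR in 10.1.2.3 172.20.0.10 192.168.5.5 93.184.216.34; do
  if in_rfc1918 "$ADDR"; then
    echo "$ADDR: egress denied (private range)"
  else
    echo "$ADDR: egress allowed (public)"
  fi
done
```

Note that cluster-internal services, including the DNS service, live in these private ranges, which is why the Ratel pod loses access to them.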

Try running the same tests from earlier inside the Ratel container. The output should be something like wget: bad address, because the egress policy also blocks access to the cluster's DNS service, which listens on a private IP address, so name resolution itself fails.

7.5 Example: restrict ingress traffic to Dgraph

Here's a small example of how Dgraph can be secured. This example policy will do the following:

  • allows all pods in namespace dgraph to receive traffic from all pods in the same namespace on all ports (denies inbound traffic to all pods in namespace dgraph from other namespaces)
  • allows all pods in namespace dgraph to receive traffic from all pods in dgraph-client namespace for ports 8080 and 9080
  • allows all pods in namespace dgraph to receive traffic from load balancer private IP addresses on ports 8080 and 9080

Before we do this, we need to fetch the private IP addresses of the load balancer, so that we can add them to the ingress rules:

# fetch DNS name of the load balancer
export DG_LB=$(kubectl get service dg-dgraph-alpha --namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

# get ELB name from ARN
ELB_NAME=$(aws elbv2 describe-load-balancers --region $EKS_REGION \
--query "LoadBalancers[?DNSName==\`$DG_LB\`].LoadBalancerArn" \
--output text | cut -d/ -f2-4
)

# get network interfaces of the NLB using the ELB name
ELB_PRIVATE_ADDRS=($(aws ec2 describe-network-interfaces \
--region $EKS_REGION \
--filters Name=description,Values="ELB $ELB_NAME" \
--query 'NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' \
--output text
))

# get client source IP
MY_IP_ADDRESS=$(curl --silent ifconfig.me)

# create new array with client source IP + private LB
export INGRESS_ADDRS=(${ELB_PRIVATE_ADDRS[@]} $MY_IP_ADDRESS)
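The INGRESS_ADDRS array is later expanded into ipBlock entries inside the policy manifest. As a standalone sketch, using the example addresses mentioned further below (the indentation shown is illustrative), the generation loop behaves like this:

```shell
#!/bin/sh
# Sketch: expand a list of IP addresses into YAML ipBlock list entries,
# one per address, as done inside the network policy heredoc.
# The addresses are illustrative examples.
INGRESS_ADDRS="192.168.24.42 192.168.46.167 192.168.94.221"

for IP in $INGRESS_ADDRS; do
  printf -- "        - ipBlock:\n            cidr: %s/32\n" "$IP"
done
```

Each address becomes a /32 CIDR, so only those exact source IPs are allowed through the ingress rule.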

To implement the network policy, you can run the following:

# deploy network policy to dgraph namespace
kubectl apply --namespace dgraph --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: dgraph-allow
spec:
podSelector: {}
ingress:
- from:
- podSelector: {}
- from:
$(P=" "; for IP in ${INGRESS_ADDRS[*]};
do printf -- "$P$P- ipBlock:\n$P$P${P}cidr: $IP/32\n";
done
)
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: dgraph-client
ports:
- port: 8080
- port: 9080
EOF

NOTE: For this exercise, we include both the client source IP address and the private NLB IP addresses. Ultimately, the application will accept or block traffic based on the addresses supplied when the network policy is applied.

NOTE: If the external load balancer (NLB) is replaced, such as by deleting and recreating the service, the load balancer's IP addresses will change. Update the INGRESS_ADDRS variable and re-apply the network policy.

As an example, the array ELB_PRIVATE_ADDRS could have the values 192.168.24.42, 192.168.46.167, and 192.168.94.221, so visually this policy will look like this (courtesy of https://orca.tufin.io/netpol/):

Table of dgraph-allow network policy

With Cilium’s visual editor, the same policy will look like this:

Visual diagram of dgraph-allow network policy

7.6 Testing restricted access

After applying the policy, we can test access through the load balancer:

export DG_LB=$(kubectl get service dg-dgraph-alpha --namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].hostname}'
)

curl --silent $DG_LB:8080/state | jq -r '.groups."1".members'
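The jq filter above pulls the member list of group 1 out of the /state response. As a sketch of what that filter does, run against an abbreviated, hypothetical /state payload (not actual Dgraph output):

```shell
#!/bin/sh
# Sketch: what jq -r '.groups."1".members' extracts.
# The JSON below is a hypothetical, abbreviated /state payload.
STATE='{"groups":{"1":{"members":{"1":{"id":"1","addr":"dg-dgraph-alpha-0:7080"}}}}}'

# the quoted "1" is needed because the group key is a string, not an index
echo "$STATE" | jq -r '.groups."1".members'
```

If the curl through the load balancer hangs or fails instead, the policy is blocking your client IP, so re-check the addresses in INGRESS_ADDRS.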

From the earlier test with curl running in the dgraph-client namespace, we can run this:

# try commands from the dgraph-client namespace
kubectl exec --stdin --tty curl --namespace dgraph-client -- sh

Once inside the container, run:

curl dg-dgraph-alpha.dgraph:8080/health

Similarly, from the unapproved namespace, we can do the same:

# try commands from the unapproved namespace
kubectl exec --stdin --tty curl --namespace unapproved -- sh

Once inside the container, run the command below. The expectation is that you will not be able to establish network connectivity.

curl dg-dgraph-alpha.dgraph:8080/health

8. Cleanup

8.1 Dgraph and Ratel Cleanup

You can delete Dgraph and Ratel with the following commands:

helm delete ratel --namespace ratel
helm delete dg --namespace dgraph

kubectl delete pvc --namespace dgraph --selector release=dg

8.2 Delete all Kubernetes objects that provision cloud resources

Before cleaning up the cloud resources, it is important to delete any Kubernetes resource objects that caused provisioning of cloud resources. Run this command to check:

kubectl get ingress,service,persistentvolumeclaim \
--all-namespaces | grep -v none

If there are any services of type LoadBalancer or any ingress resources that created an external load balancer (NLB or ALB), these should be deleted. All persistent volume claims should be deleted as well.

Otherwise, after Kubernetes is removed, there can be leftover orphaned cloud resources eating up costs 💵 💶 💷 💴.

8.3 Reset Default to original Storage Class

As a precaution, we don't want any lingering resources that may prevent deletion of the Kubernetes cluster. Run these commands if you changed the defaults earlier.

kubectl patch storageclass ebs-sc --patch \
'{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass gp2 --patch \
'{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

8.4 IAM Roles

These should be removed automatically when deleting the Kubernetes cluster with eksctl, but it is good practice to remove them explicitly just in case.

eksctl delete iamserviceaccount \
--name "aws-load-balancer-controller" \
--namespace "kube-system" \
--cluster $EKS_CLUSTER_NAME \
--region $EKS_REGION

eksctl delete iamserviceaccount \
--name "ebs-csi-controller-sa" \
--namespace "kube-system" \
--cluster $EKS_CLUSTER_NAME \
--region $EKS_REGION

8.5 Delete Policy

We can also delete the policy that was created earlier:

aws iam delete-policy --policy-arn "$POLICY_ARN_ALBC"

8.6 Kubernetes cluster

Finally, the Kubernetes cluster itself can be deleted.

eksctl delete cluster --region $EKS_REGION --name $EKS_CLUSTER_NAME

Source Code

The source code to this repository can be found at:

Resources

Kubernetes Resources

These are the Kubernetes resource objects or components used in this guide.

Kubernetes Addons

These are the documentation areas from Amazon used in this guide.

AWS Cloud Resources

These are some of the AWS resource objects used in this guide.

Dgraph Documentation and Articles

Kubernetes

Kubernetes Network Policy Tools

Security Guides

AWS Load Balancer Issues

Conclusion

Thank you for following the article. Here are some final notes.

Takeaways

The main takeaway of this article is a baseline reference platform for EKS (Elastic Kubernetes Service) that includes support for volumes (EBS), external load balancers (ALB and NLB), as well as network policies (Calico).

The second takeaway is to showcase these features using a robust application, the distributed graph database Dgraph:

  • persistent volumes: the Dgraph database requires storage
  • external load balancer: restricted connectivity to the Dgraph database
  • ingress: a small web server hosting the visual application Ratel
  • network policies: restrict outbound access from the web server hosting Ratel, and restrict inbound access to Dgraph within the cluster

Secure web traffic

One thing that I avoided, as it adds several more layers of complexity, is securing traffic with certificates. This requires owning a domain, so that you can change records in a DNS zone and access services using a DNS FQDN, like https://dgraph.example.com and https://ratel.example.com.

On EKS, there are a few solution paths for automation. You will need automation to configure the DNS records with Route53 (or another solution like Cloudflare depending on where you manage your domain), and you will need automation to issue and install certificates, which also requires DNS automation to verify ownership of said domain.

Here are some options:

It will come as no surprise that ACM will work only with Route53. The cert-manager solution, on the other hand, will work with Route53, Cloudflare, Azure DNS, Cloud DNS, and several other providers.

The future

In the future, I would like to cover these topics, as well as showing how to do some of these processes with Terraform, which will likely be several articles, especially as eksctl automates many components in the background.

From this baseline, there are all sorts of directions to cover, such as more advanced networking or service meshes (Istio, Linkerd, Cilium, NSM, Consul Connect), o11y (tracing, log aggregation/shipping, profiling, metrics, alerting, visualization), and progressive delivery with Spinnaker, ArgoCD, and FluxCD, not just on EKS, but also GKE and AKS.

So stay tuned. In the meantime, I hope this article is useful.
