Ultimate Baseline GKE cluster, Pt 2

Part II: Distributed Graph Database on Proper GKE baseline

Joaquín Menchaca (智裕)
13 min read · Oct 27, 2023

In the previous article, I demonstrated how to set up a baseline GKE cluster that has a private subnet and a least-privilege service account. I also covered some basic tests that can be used to demo or troubleshoot cluster features.

In this article, I will show how to deploy a robust stateful application called Dgraph, a highly performant, resilient, distributed graph database.

Why Dgraph?

The Dgraph database is a cloud native application that is highly resilient with built-in high availability, communicates using gRPC (HTTP/2) or GraphQL (HTTP/1.1), and has built-in metrics and distributed tracing through OpenCensus.

These features intersect nicely with Kubernetes features for auto-healing and recoverability:

  • Application Layer: Dgraph itself can keep operating when some members are unavailable, can elect a new leader, and sends snapshots to synchronize data between its members.
  • Platform Layer: Kubernetes, through the statefulset controller, can restart a Dgraph member with its data intact. For example, if alpha-0 goes down, it comes back with the persistent volume for alpha-0, and if alpha-1 goes down, it comes back with the alpha-1 data, and so forth.
  • Infrastructure Layer: Google MIGs (Managed Instance Groups) can recover failing GCE virtual machine instances, which are distributed across different data centers (each represented by a zone) within the same region.

Some of the other features that will be demonstrated with Dgraph on GKE are the following:

  • persistent volumes: Dgraph, as a database, needs fast storage, so it will need a storage class that supports SSD.
  • external load balancer: gRPC uses HTTP/2, which is easy to support with a layer 4 load balancer, so a service object of type LoadBalancer will be used for this.
  • ingress: can be used for GraphQL, which uses HTTP/1.1. This is fairly straightforward to set up with an ingress, which is a configuration sent to the ingress controller (a layer 7 load balancer).
  • network policies: to allow only limited access to the Dgraph cluster, while the Ratel client, which is hosted by a small web service, can be more open.

Prerequisite Setup

If not already completed, make sure to run through Step 0 and Step 1 from the previous article, Part I: Proper Google Kubernetes Engine baseline cluster.

3.0 Dgraph Demo Application

Dgraph is a highly performant distributed graph database that uses either DQL (Dgraph Query Language) or GraphQL as graph database query languages.

You can install Dgraph using the Dgraph helm chart. For this example, we’ll use an external load balancer. Since, generally, it is not safe to park the database on the public internet, we’ll create an access or allow list to limit what client sources can communicate with the Dgraph database.

3.1 Setup Access List for Security

As most should know, databases are supposed to be private and not accessible on the Internet. Dgraph can be configured to block all traffic except from IP addresses that are allowed to access Dgraph.

To get started, we can get the IP address range from the subnet we created earlier as well as the pod subnet used on GKE.

SUBNET_CIDR=$(gcloud compute networks subnets describe $GKE_SUBNET_NAME \
--project $GKE_PROJECT_ID \
--region $GKE_REGION \
--format json \
| jq -r '.ipCidrRange'
)

GKE_POD_CIDR=$(gcloud container clusters describe $GKE_CLUSTER_NAME \
--project $GKE_PROJECT_ID \
--region $GKE_REGION \
--format json \
| jq -r '.clusterIpv4Cidr'
)

# get the current outbound IP from your current location
# this will simulate the remote office IP address
MY_IP_ADDRESS=$(curl --silent ifconfig.me)

# set env var to use later
export DG_ALLOW_LIST="${SUBNET_CIDR},${GKE_POD_CIDR},${MY_IP_ADDRESS}/32"
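
Because the allow list is assembled from three separate lookups, an empty or malformed entry (for example, if one of the gcloud commands returned nothing, or ifconfig.me returned an IPv6 address) would only surface later as a connection problem. The following is a minimal sketch of a sanity check, assuming bash; the function names is_ipv4_cidr and validate_allow_list are hypothetical helpers and not part of Dgraph or gcloud.

```shell
# Hypothetical helper: succeeds if the value looks like an IPv4 CIDR,
# e.g. 10.0.0.0/8. It checks the shape, the octet range, and the prefix length.
is_ipv4_cidr() {
  local cidr="$1" ip prefix octet
  local -a octets
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'
  [[ "$cidr" =~ $re ]] || return 1
  ip="${cidr%/*}"
  prefix="${cidr#*/}"
  # 10# forces base 10 so that values such as "08" are not read as octal
  (( 10#$prefix <= 32 )) || return 1
  IFS=. read -r -a octets <<< "$ip"
  for octet in "${octets[@]}"; do
    (( 10#$octet <= 255 )) || return 1
  done
}

# Hypothetical helper: validates every comma-separated entry of an allow list.
validate_allow_list() {
  local entry
  local -a entries
  IFS=, read -r -a entries <<< "$1"
  (( ${#entries[@]} > 0 )) || return 1
  for entry in "${entries[@]}"; do
    if ! is_ipv4_cidr "$entry"; then
      echo "invalid allow list entry: '${entry}'" >&2
      return 1
    fi
  done
}
```

It could be used as validate_allow_list "$DG_ALLOW_LIST" && echo "allow list looks fine" before moving on to the install step.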

3.2 Install HA Dgraph cluster

With the env variable DG_ALLOW_LIST set, we can deploy Dgraph with the following:

# get dgraph helm chart
helm repo add dgraph https://charts.dgraph.io && helm repo update

# deploy dgraph
helm install dg dgraph/dgraph \
  --namespace dgraph \
  --create-namespace \
  --values - <<EOF
zero:
  persistence:
    storageClass: premium-rwo
    size: 10Gi
alpha:
  configFile:
    config.yaml: |
      security:
        whitelist: ${DG_ALLOW_LIST}
  persistence:
    storageClass: premium-rwo
    size: 30Gi
  service:
    type: LoadBalancer
    externalTrafficPolicy: Local
EOF

📓 NOTE: With Google Cloud, the initial quota for storage is limited to 500Gi. A three-node GKE cluster will already consume 300Gi (3 x 100Gi), leaving a scant 200Gi for Dgraph or anything else running on the cluster. If you run into pods that are stuck in a pending state, check the events, e.g. kubectl get events --namespace dgraph, to see if storage is the issue.
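
To make the storage budget concrete, here is a rough back-of-the-envelope calculation in shell. It takes the figures from the note at face value and treats the quota as a single pool, as the note does. The replica counts of three Zero and three Alpha members are an assumption about the chart defaults for an HA install, so adjust them to match your values.

```shell
# Figures from the note above; replica counts are ASSUMED to be 3 each.
quota_gi=500
node_count=3;     node_disk_gi=100
zero_replicas=3;  zero_pvc_gi=10    # zero.persistence.size
alpha_replicas=3; alpha_pvc_gi=30   # alpha.persistence.size

nodes_gi=$(( node_count * node_disk_gi ))
dgraph_gi=$(( zero_replicas * zero_pvc_gi + alpha_replicas * alpha_pvc_gi ))
remaining_gi=$(( quota_gi - nodes_gi - dgraph_gi ))

echo "nodes=${nodes_gi}Gi dgraph=${dgraph_gi}Gi remaining=${remaining_gi}Gi"
```

Under these assumptions, Dgraph would claim 120Gi of the 200Gi left after the nodes, which is why larger volume sizes can quickly leave pods pending.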

You can inspect the deployed components with:

kubectl get all --namespace dgraph

3.3 About External Load Balancer Configuration

When filtering by IP address, such as with Dgraph’s whitelist or a network policy, the client source IP address needs to be preserved. This can be done by setting externalTrafficPolicy to Local.

If you would like further information about what was provisioned on Google Cloud, you can run these commands:

export DG_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)


gcloud compute forwarding-rules list \
--filter $DG_LB \
--format "table[box](name,IPAddress,target.segment(-2):label=TARGET_TYPE)"
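
Since the allow list only works when the client source IP address is preserved, it may be worth confirming the setting on the deployed service. This is a small sketch, assuming kubectl is pointed at the cluster; get_traffic_policy and preserves_source_ip are hypothetical helper names.

```shell
# Hypothetical helper: prints the externalTrafficPolicy of a Service.
get_traffic_policy() {
  local service="$1" namespace="$2"
  kubectl get service "$service" \
    --namespace "$namespace" \
    --output jsonpath='{.spec.externalTrafficPolicy}'
}

# Hypothetical helper: succeeds only when the policy is Local,
# which is the setting that keeps the client source IP intact.
preserves_source_ip() {
  [ "$(get_traffic_policy "$1" "$2")" = "Local" ]
}
```

For example: preserves_source_ip dg-dgraph-alpha dgraph && echo "client source IP is preserved".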

3.4 Connecting to Dgraph

You can run the following to test connectivity to Dgraph.

export DG_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)

curl --silent $DG_LB:8080/state | jq -r '.groups."1".members'

We will use the DG_LB environment variable in the following steps of this tutorial.
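
The load balancer address can take a little while to appear after the chart is installed, in which case DG_LB ends up empty. The sketch below polls until an address is reported. It assumes bash and kubectl; wait_for_lb_ip is a hypothetical helper, and the default of 30 attempts with a 10 second delay is an arbitrary choice.

```shell
# Hypothetical helper: polls until the Service reports an external IP,
# then prints it. Arguments: service, namespace, [attempts], [delay seconds].
wait_for_lb_ip() {
  local service="$1" namespace="$2"
  local attempts="${3:-30}" delay="${4:-10}"
  local ip="" i
  for (( i = 0; i < attempts; i++ )); do
    ip=$(kubectl get service "$service" \
      --namespace "$namespace" \
      --output jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for ${service} in ${namespace}" >&2
  return 1
}
```

Usage would look like: export DG_LB=$(wait_for_lb_ip dg-dgraph-alpha dgraph).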

3.5 Testing Dgraph

We can test Dgraph by loading up some data and schema, and then run some queries using curl. Make sure to have the DG_LB environment variable set.

First let’s upload some data.

curl "$DG_LB:8080/mutate?commitNow=true" --silent --request POST \
--header "Content-Type: application/json" \
--data $'
{
"set": [
{"uid": "_:luke","name": "Luke Skywalker", "dgraph.type": "Person"},
{"uid": "_:leia","name": "Princess Leia", "dgraph.type": "Person"},
{"uid": "_:han","name": "Han Solo", "dgraph.type": "Person"},
{"uid": "_:lucas","name": "George Lucas", "dgraph.type": "Person"},
{"uid": "_:irvin","name": "Irvin Kershner", "dgraph.type": "Person"},
{"uid": "_:richard","name": "Richard Marquand", "dgraph.type": "Person"},
{
"uid": "_:sw1",
"name": "Star Wars: Episode IV - A New Hope",
"release_date": "1977-05-25",
"revenue": 775000000,
"running_time": 121,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:lucas"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw2",
"name": "Star Wars: Episode V - The Empire Strikes Back",
"release_date": "1980-05-21",
"revenue": 534000000,
"running_time": 124,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:irvin"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw3",
"name": "Star Wars: Episode VI - Return of the Jedi",
"release_date": "1983-05-25",
"revenue": 572000000,
"running_time": 131,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:richard"}],
"dgraph.type": "Film"
},
{
"uid": "_:st1",
"name": "Star Trek: The Motion Picture",
"release_date": "1979-12-07",
"revenue": 139000000,
"running_time": 132,
"dgraph.type": "Film"
}
]
}
' | jq

Now, let’s upload the schema that will add an index.

curl "$DG_LB:8080/alter" --silent --request POST \
--data $'
name: string @index(term) .
release_date: datetime @index(year) .
revenue: float .
running_time: int .
starring: [uid] .
director: [uid] .

type Person {
name
}

type Film {
name
release_date
revenue
running_time
starring
director
}
' | jq

NOTE: This alter command will fail if a whitelist is not set up.

You can list out all of the movies that have a starring edge.

curl "$DG_LB:8080/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'{ me(func: has(starring)) { name } }' \
| jq .data

You can run this query to list Star Wars movies released in 1980 or later.

curl "$DG_LB:8080/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'
{
me(func: allofterms(name, "Star Wars"), orderasc: release_date)
@filter(ge(release_date, "1980")) {
name
release_date
revenue
running_time
director { name }
starring (orderasc: name) { name }
}
}
' | jq .data

4. Ratel visualization application

Ratel is a graphical query and administration application for Dgraph. This application, unlike a double-clickable application on the desktop, runs only within a web browser like Safari, Firefox, or Chrome.

4.1 Installing Ratel

You can install a small web service that hosts Ratel, so that you can run this in your browser. Run the following to install the web service hosting Ratel:

# get dgraph helm chart
helm repo add dgraph https://charts.dgraph.io && helm repo update

# deploy Ratel
helm install ratel dgraph/ratel \
  --namespace ratel \
  --create-namespace \
  --values - <<EOF
service:
  type: NodePort
ingress:
  enabled: true
  className: gce
  annotations:
    kubernetes.io/ingress.class: gce
  hosts:
    - paths:
        - path: /*
          pathType: ImplementationSpecific
EOF

📓 NOTE: The service must be of NodePort or LoadBalancer type, as ingress-gce does not support ClusterIP. Also, the annotation kubernetes.io/ingress.class: gce is required if ingress-gce is not the default as the className will be ignored.

You can verify the installed components with the following command:

kubectl get all,ing --namespace ratel

4.2 Accessing Ratel

Print out the URL to Ratel, and paste this into the browser.

RATEL_LB=$(kubectl get ing ratel \
--namespace "ratel" \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)

echo "http://$RATEL_LB"

You will be prompted to enter in the Dgraph server URL. You can get this with the following command:

DGRAPH_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)

echo "http://$DGRAPH_LB:8080"

Copy that string, paste it into the Dgraph server URL textbox, and press Enter.

Click on Continue. This should drop you into the Query mode.

4.3 Testing queries with Ratel

If you ran through the previous steps in the Testing Dgraph section, the data and schema should already be loaded. You can run the same query from earlier:

{
me(func: allofterms(name, "Star Wars"), orderasc: release_date)
@filter(ge(release_date, "1980")) {
name
release_date
revenue
running_time
director { name }
starring (orderasc: name) { name }
}
}

This should look something like:

5. Security with Network Policies

Network Policies can control ingress (inbound) and egress (outbound) traffic between services running on Kubernetes.

5.1 General recommendations

For securing services within a Kubernetes cluster, I recommend creating the following policies:

  1. default baseline: in a given namespace, deny all traffic.
  2. single tier web application: allow all traffic (0.0.0.0/0) but deny egress traffic to private subnets.
  3. multi-tier web application: same as single tier web application, but allow outbound traffic to database’s namespace, e.g. add an egress rule to namespaceSelector.matchLabels.name=$DATABASE_NAMESPACE.
  4. private database tier: allow all traffic from namespace of clients that need access to the database.

Dgraph is in the #4 category, so all traffic should be blocked except from desired sources.

Ratel is in the #2 category, a small web server that hosts the client-only Ratel application. As with any web server, it should be isolated and not able to reach any private service within the internal network.
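
The first recommendation, a default baseline that denies all traffic in a namespace, can be sketched as a small shell function that prints the manifest. The function name default_deny_policy and the policy name default-deny-all are hypothetical choices for illustration. Note that a policy like this also blocks DNS lookups from the namespace until a more specific rule allows them.

```shell
# Hypothetical helper: prints a NetworkPolicy manifest that selects every pod
# in the given namespace and allows no ingress or egress traffic.
default_deny_policy() {
  local namespace="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ${namespace}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}
```

The output can be reviewed first and then applied, for example with default_deny_policy my-namespace | kubectl apply --filename -, where my-namespace is a placeholder.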

5.2 Testing Network Policies

The following tests should be run both before and after the policies are applied.

Before applying a policy, we test connectivity to establish a baseline. After applying it, we repeat the tests to verify that the Dgraph service can only be reached from approved sources.

5.2.1 Testing from an approved namespace

The namespace dgraph-client will be used to test approved traffic to the Dgraph service.

Run the commands below to set this up.

# create name spaces
kubectl create namespace dgraph-client

# run new container and exec into the container
# CTRL-D to exit the session
kubectl run curl \
--namespace dgraph-client \
--image=curlimages/curl \
--stdin --tty -- sh

Once inside this container running in dgraph-client namespace, run this command to test the connection.

# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health

5.2.2 Testing from unapproved namespace

We will use the namespace unapproved to represent traffic that should not be allowed. Repeat the same process for the unapproved namespace:

# create name spaces
kubectl create namespace unapproved

# run new container and exec into the container
kubectl run curl \
--namespace unapproved \
--image=curlimages/curl \
--stdin --tty -- sh

Once in the container running in the unapproved namespace, repeat the same test:

# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health

5.2.3 Testing from Ratel web service namespace

Exec into the Ratel container:

RATEL_POD=$(kubectl get pod \
--selector app.kubernetes.io/name=ratel \
--namespace ratel \
--output name
)

kubectl exec -ti -n ratel $RATEL_POD -- sh

Once inside the Ratel container, run the following commands to test access:

# connect using service
wget -q -O- dg-dgraph-alpha.dgraph:8080/health
# connect using pod
wget -q -O- dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health

5.3 Network Policy: restricting egress traffic from Ratel

Now we can add a network policy that blocks all egress traffic to private IP addresses, using the command below:

kubectl apply --namespace ratel --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ratel-deny-egress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}
  egress:
    - to:
        - ipBlock:
            cidr: "0.0.0.0/0"
            except:
              - "10.0.0.0/8"
              - "172.16.0.0/12"
              - "192.168.0.0/16"
EOF

Try running the same test above. The output should be something like wget: bad address.
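
The three ranges listed under except are the private IPv4 ranges from RFC 1918. Addresses inside a cluster typically fall within them, so a name lookup from the Ratel pod will likely fail as well, which would explain an error such as bad address. As a rough illustration of which destinations the policy excludes, here is a sketch in bash; is_private_ipv4 is a hypothetical helper and assumes a well-formed dotted-quad address without leading zeros.

```shell
# Hypothetical helper: succeeds if the IPv4 address falls in one of the
# private ranges that the policy above excludes from allowed egress.
is_private_ipv4() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  # 10.0.0.0/8
  (( a == 10 )) && return 0
  # 172.16.0.0/12 covers 172.16.x.x through 172.31.x.x
  (( a == 172 && b >= 16 && b <= 31 )) && return 0
  # 192.168.0.0/16
  (( a == 192 && b == 168 )) && return 0
  return 1
}
```

For example, is_private_ipv4 10.8.0.12 succeeds (egress blocked by the policy), while is_private_ipv4 8.8.8.8 fails (egress still allowed).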

5.4 Network Policy: restrict ingress traffic to Dgraph

Here’s a small example of how Dgraph can be secured. The example policy does the following:

  • allows all pods in namespace dgraph to receive traffic from all pods in the same namespace on all ports
  • allows all pods in namespace dgraph to receive traffic from all pods in the dgraph-client namespace on ports 8080 and 9080
  • allows all pods in namespace dgraph to receive traffic from the client source IP address, which arrives through the load balancer, on ports 8080 and 9080
  • denies all other inbound traffic to pods in namespace dgraph, such as traffic from other namespaces

Before we do this, we need to fetch the client source IP address, so that we can add it to the ingress rules. Because the load balancer was configured with externalTrafficPolicy set to Local, this is the address that the Dgraph pods will see:

# get client source IP 
INGRESS_ADDRS=$(curl --silent ifconfig.me)

To implement the network policy, you can run the following:

# deploy network policy to dgraph namespace
kubectl apply --namespace dgraph --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dgraph-allow
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}
    - from:
$(P="    "; for IP in ${INGRESS_ADDRS[*]};
  do printf -- "$P$P- ipBlock:\n$P$P${P}cidr: $IP/32\n";
  done
)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: dgraph-client
      ports:
        - port: 8080
        - port: 9080
EOF

NOTE: For this exercise, the client source IP address is used in the ipBlock rule. Ultimately, the application will accept or block traffic based on the addresses supplied during deployment of the Dgraph application.
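
The loop embedded in the manifest above expands to one ipBlock entry per address. To see that step in isolation, here is a sketch in bash; render_ip_blocks is a hypothetical helper, and the four-space indent unit is an assumption chosen so that the entries line up under the from: list.

```shell
# Hypothetical helper: prints one ipBlock entry per address given as an
# argument. The entry is indented by two units and its cidr key by three.
render_ip_blocks() {
  local unit="    " ip
  for ip in "$@"; do
    printf '%s- ipBlock:\n%scidr: %s/32\n' \
      "${unit}${unit}" "${unit}${unit}${unit}" "$ip"
  done
}

# Example with documentation-only addresses (not real client IPs)
render_ip_blocks 203.0.113.7 198.51.100.9
```

Printing the entries this way before applying the policy is an easy way to check that the indentation matches the rest of the manifest.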

As an example, a public IP of 50.196.140.125/32, representing a home office IP address, will be used. A table for the policies is shown below (courtesy of https://orca.tufin.io/netpol/):

Table of dgraph-allow network policy

With Cilium’s visual editor, the same policy will look like this:

Visual diagram of dgraph-allow network policy

5.5 Testing restricted access

Now we can log into the same containers as before to test access after the policies have been applied.

5.5.1 Testing from dgraph-client

Using the curl pod from the earlier test in the dgraph-client namespace, we can run this:

# try commands from the dgraph-client namespace
kubectl exec --stdin --tty curl --namespace dgraph-client -- sh

Once inside the container, run the same test, expecting this to work:

# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health

5.5.2 Testing from unapproved

Similarly, from the unapproved namespace, we can run this:

# try commands from the unapproved namespace
kubectl exec --stdin --tty curl --namespace unapproved -- sh

Once inside the container, run the following command, expecting it to fail:

# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
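
When a network policy drops traffic, a request may hang until it times out rather than fail right away. A small wrapper with an explicit timeout makes the result easier to read. This is a sketch in plain POSIX shell so that it can be pasted into the container; check_access is a hypothetical helper and the 5 second limit is an arbitrary choice.

```shell
# Hypothetical helper: reports whether a URL answers within 5 seconds.
# --fail makes curl return non-zero on HTTP errors, --max-time bounds the wait.
check_access() {
  url="$1"
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    echo "reachable: $url"
  else
    echo "blocked or unavailable: $url"
  fi
}
```

For example: check_access dg-dgraph-alpha.dgraph:8080/health. From the unapproved namespace, this would be expected to report blocked or unavailable once the policy is in place.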

5.5.3 Testing from Ratel

We can exec into the Ratel container with the following:

RATEL_POD=$(kubectl get pod \
--selector app.kubernetes.io/name=ratel \
--namespace ratel \
--output name
)

kubectl exec -ti -n ratel $RATEL_POD -- sh

At the shell prompt, run the following tests, expecting them to fail:

# connect using service
wget -q -O- dg-dgraph-alpha.dgraph:8080/health
# connect using pod
wget -q -O- dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health

5.5.4 Testing from a whitelisted public IP

After applying the policy, we can test access through the load balancer:

export DG_LB=$(kubectl get service dg-dgraph-alpha --namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)

curl --silent $DG_LB:8080/state | jq -r '.groups."1".members'

Cleanup

Kubernetes Resources

Here’s how you can clean up the Kubernetes resources that were used in this article:

helm delete ratel --namespace ratel
helm delete dg --namespace dgraph
kubectl delete pvc --namespace dgraph --selector release=dg
kubectl delete namespace dgraph-client unapproved ratel dgraph

Related Articles

I also have an article on setting up something similar for EKS on AWS.

Source Code

Scripts related to this article can be found here:

Final Thoughts

Thank you for following thus far. In the previous article, I showed how to provision a GKE cluster with a private network and a least-privilege service account, and provided some optional tests you can run to verify the features.

In this article, I showed how to deploy a robust distributed graph database called Dgraph, demonstrated some basic usage of Dgraph, and secured Dgraph using network policies that are applied with Calico.

This material should be useful for learning how to manage and secure other database applications on Kubernetes with GKE.

From here, there are a few directions we can go, such as the following list:

  • Dgraph (or other database): configure object storage access for backups
  • Setup web service with TLS support (requires registering a domain)
  • Setup zero trust network using a service mesh

Joaquín Menchaca (智裕)

DevOps/SRE/PlatformEng — k8s, o11y, vault, terraform, ansible