Ultimate Baseline GKE cluster, Pt 2
Part II: Distributed Graph Database on Proper GKE baseline
In the previous article I demonstrated how to set up a baseline GKE cluster with a private subnet and a least-privilege service account. I also covered some basic tests that can be used to demo or troubleshoot cluster features.
In this article, I will show how to deploy a robust stateful application called Dgraph, a highly performant, resilient distributed graph database.
Why Dgraph?
The Dgraph database is a cloud native application that is highly resilient with built-in high availability, communicates using gRPC (HTTP/2) or GraphQL (HTTP/1.1), and has built-in metrics and distributed tracing through OpenCensus.
These features intersect nicely with Kubernetes features for auto-healing and recoverability:
- Application Layer — the Dgraph application itself can operate while some members are unavailable, can elect a new leader, and sends snapshots to synchronize data between its members.
- Platform Layer — Kubernetes, through the statefulset controller, can restart a Dgraph member with its data intact, so that, for example, if alpha-0 goes down, it comes back with the persistent volume for alpha-0; if alpha-1 goes down, it comes back with alpha-1's data, and so forth.
- Infrastructure Layer — Google MIGs (Managed Instance Groups) can recover failing GCE virtual machine instances that are distributed across different data centers (represented by zones) within the same region.
Some of the other characteristics that will be demonstrated with Dgraph on GKE are the following:
- persistent volumes — Dgraph, as a database, needs fast storage, so a storage class that supports SSDs will be used
- external load balancer — gRPC uses HTTP/2, which is easy to support with a layer 4 load balancer, so a service object of type LoadBalancer will be used for this
- ingress — can be used for GraphQL, which uses HTTP/1.1. This is fairly straightforward to set up with an ingress, which is a configuration sent to the ingress controller (a layer 7 load balancer).
- network policies — to allow only limited access to the cluster; the Ratel client, which is hosted by a small web service, can be more open.
Prerequisite Setup
If not already completed, make sure to run through the Steps 0 and Steps 1 from the previous article, Part I: Proper Google Kubernetes Engine baseline cluster.
3.0 Dgraph Demo Application
Dgraph is a highly performant distributed graph database that uses either DQL (Dgraph Query Language) or GraphQL as graph database query languages.
You can install Dgraph using the Dgraph helm chart. For this example, we’ll use an external load balancer. Since it is generally not safe to expose a database on the public internet, we’ll create an access (allow) list to limit which client sources can communicate with the Dgraph database.
3.1 Setup Access List for Security
As most should know, databases are supposed to be private and not accessible on the Internet. Dgraph can be configured to block all traffic except for IP addresses that are allowed to access Dgraph.
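Dgraph’s whitelist is a comma-delimited list of CIDR ranges. Here is a minimal sketch of the value we will build in this section, using hypothetical subnet ranges and a hypothetical office IP in place of the values fetched below:

```shell
# hypothetical values standing in for the subnet, pod range, and office IP
SUBNET_CIDR="10.60.0.0/20"
GKE_POD_CIDR="10.80.0.0/14"
MY_IP_ADDRESS="203.0.113.10"

# the whitelist is a comma-delimited list of CIDR ranges;
# a single host is allowed by appending /32
DG_ALLOW_LIST="${SUBNET_CIDR},${GKE_POD_CIDR},${MY_IP_ADDRESS}/32"
echo "$DG_ALLOW_LIST"
# → 10.60.0.0/20,10.80.0.0/14,203.0.113.10/32
```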
To get started, we can get the IP address range from the subnet we created earlier as well as the pod subnet used on GKE.
SUBNET_CIDR=$(gcloud compute networks subnets describe $GKE_SUBNET_NAME \
--project $GKE_PROJECT_ID \
--region $GKE_REGION \
--format json \
| jq -r '.ipCidrRange'
)
GKE_POD_CIDR=$(gcloud container clusters describe $GKE_CLUSTER_NAME \
--project $GKE_PROJECT_ID \
--region $GKE_REGION \
--format json \
| jq -r '.clusterIpv4Cidr'
)
# get the current outbound IP from your current location
# this will simulate the remote office IP address
MY_IP_ADDRESS=$(curl --silent ifconfig.me)
# set env var to use later
export DG_ALLOW_LIST="${SUBNET_CIDR},${GKE_POD_CIDR},${MY_IP_ADDRESS}/32"
3.2 Install HA Dgraph cluster
With the env variable DG_ALLOW_LIST set, we can deploy Dgraph with the following:
# get dgraph helm chart
helm repo add dgraph https://charts.dgraph.io && helm repo update
# deploy dgraph
helm install dg dgraph/dgraph \
--namespace dgraph \
--create-namespace \
--values - <<EOF
zero:
persistence:
storageClass: premium-rwo
size: 10Gi
alpha:
configFile:
config.yaml: |
security:
whitelist: ${DG_ALLOW_LIST}
persistence:
storageClass: premium-rwo
size: 30Gi
service:
type: LoadBalancer
externalTrafficPolicy: Local
EOF
📓 NOTE: With Google Cloud, the initial quota for storage is limited to 500Gi. A three node GKE cluster will already consume 300Gi (3 × 100Gi), leaving a scant 200Gi for anything else running on the cluster. If you run into pods that are stuck in a pending state, check the events, e.g. kubectl get events --namespace dgraph, to see if storage is the issue.
You can inspect the deployed components with:
kubectl get all --namespace dgraph
3.3 About External Load Balancer Configuration
When filtering by IP address, such as with Dgraph’s whitelist or a network policy, the client source IP address needs to be preserved. This can be done by setting externalTrafficPolicy to Local.
If you would like further information about what was provisioned on Google Cloud, you can run this command:
export DG_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
gcloud compute forwarding-rules list \
--filter $DG_LB \
--format "table[box](name,IPAddress,target.segment(-2):label=TARGET_TYPE)"
3.4 Connecting to Dgraph
You can run the following to test connectivity to Dgraph.
export DG_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
curl --silent $DG_LB:8080/state | jq -r '.groups."1".members'
We will use the environment variable DG_LB in the following steps of this tutorial.
3.5 Testing Dgraph
We can test Dgraph by loading up some data and schema, and then run some queries using curl. Make sure to have the DG_LB environment variable set.
First let’s upload some data.
curl "$DG_LB:8080/mutate?commitNow=true" --silent --request POST \
--header "Content-Type: application/json" \
--data $'
{
"set": [
{"uid": "_:luke","name": "Luke Skywalker", "dgraph.type": "Person"},
{"uid": "_:leia","name": "Princess Leia", "dgraph.type": "Person"},
{"uid": "_:han","name": "Han Solo", "dgraph.type": "Person"},
{"uid": "_:lucas","name": "George Lucas", "dgraph.type": "Person"},
{"uid": "_:irvin","name": "Irvin Kershner", "dgraph.type": "Person"},
{"uid": "_:richard","name": "Richard Marquand", "dgraph.type": "Person"},
{
"uid": "_:sw1",
"name": "Star Wars: Episode IV - A New Hope",
"release_date": "1977-05-25",
"revenue": 775000000,
"running_time": 121,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:lucas"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw2",
"name": "Star Wars: Episode V - The Empire Strikes Back",
"release_date": "1980-05-21",
"revenue": 534000000,
"running_time": 124,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:irvin"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw3",
"name": "Star Wars: Episode VI - Return of the Jedi",
"release_date": "1983-05-25",
"revenue": 572000000,
"running_time": 131,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:richard"}],
"dgraph.type": "Film"
},
{
"uid": "_:st1",
"name": "Star Trek: The Motion Picture",
"release_date": "1979-12-07",
"revenue": 139000000,
"running_time": 132,
"dgraph.type": "Film"
}
]
}
' | jq
Now, let’s upload the schema that will add an index.
curl "$DG_LB:8080/alter" --silent --request POST \
--data $'
name: string @index(term) .
release_date: datetime @index(year) .
revenue: float .
running_time: int .
starring: [uid] .
director: [uid] .
type Person {
name
}
type Film {
name
release_date
revenue
running_time
starring
director
}
' | jq
NOTE: This alter command will fail if a whitelist is not set up.
You can list out all of the movies that have a starring edge.
curl "$DG_LB:8080/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'{ me(func: has(starring)) { name } }' \
| jq .data
You can run this query for Star Wars movies released after 1980.
curl "$DG_LB:8080/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'
{
me(func: allofterms(name, "Star Wars"), orderasc: release_date)
@filter(ge(release_date, "1980")) {
name
release_date
revenue
running_time
director { name }
starring (orderasc: name) { name }
}
}
' | jq .data
4. Ratel visualization application
Ratel is a graphical query and administration application for Dgraph. Unlike a desktop application, it runs entirely within a web browser such as Safari, Firefox, or Chrome.
4.1 Installing Ratel
You can install a small web service that hosts Ratel, so that you can run this in your browser. Run the following to install the web service hosting Ratel:
# get dgraph helm chart
helm repo add dgraph https://charts.dgraph.io && helm repo update
# deploy Ratel
helm install ratel \
--namespace ratel \
--create-namespace dgraph/ratel \
--values - <<EOF
service:
type: NodePort
ingress:
enabled: true
className: gce
annotations:
kubernetes.io/ingress.class: gce
hosts:
- paths:
- path: /*
pathType: ImplementationSpecific
EOF
📓 NOTE: The service must be of NodePort or LoadBalancer type, as ingress-gce does not support ClusterIP. Also, the annotation kubernetes.io/ingress.class: gce is required if ingress-gce is not the default ingress controller, as the className will be ignored.
You can verify the installed components with the following command:
kubectl get all,ing --namespace ratel
4.2 Accessing Ratel
Print out the URL to Ratel, and paste this into the browser.
RATEL_LB=$(kubectl get ing ratel \
--namespace "ratel" \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
echo "http://$RATEL_LB"
You will be prompted to enter the Dgraph server URL. You can get this with the following command:
DGRAPH_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
echo "http://$DGRAPH_LB:8080"
Copy that string, paste it into the Dgraph server URL textbox, and hit enter.
Click on Continue. This should drop you into the Query mode.
4.3 Testing queries with Ratel
If you ran through the previous steps in the Testing Dgraph section, the data and schema should already be loaded. You can run the same query from earlier:
{
me(func: allofterms(name, "Star Wars"), orderasc: release_date)
@filter(ge(release_date, "1980")) {
name
release_date
revenue
running_time
director { name }
starring (orderasc: name) { name }
}
}
This should look something like:
5. Security with Network Policies
Network Policies can control ingress (inbound) and egress (outbound) traffic between services running on Kubernetes.
5.1 General recommendations
For securing services within a Kubernetes cluster, I recommend creating the following policies:
- default baseline: in a given namespace, deny all traffic.
- single tier web application: allow all traffic (0.0.0.0/0) but deny egress traffic to private subnets.
- multi-tier web application: same as single tier web application, but allow outbound traffic to the database’s namespace, e.g. add an egress rule with namespaceSelector.matchLabels.name=$DATABASE_NAMESPACE.
- private database tier: allow traffic only from namespaces of clients that need access to the database.
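For the default baseline, the policy is a very short manifest. Here is a minimal sketch in the same heredoc style used elsewhere in this article, with a placeholder namespace myapp standing in for whatever namespace you want to lock down:

```shell
# sketch: deny all ingress and egress for every pod in the namespace
# ("myapp" is a placeholder namespace)
kubectl apply --namespace myapp --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
EOF
```

With this baseline in place, the other policies below become targeted exceptions layered on top of a deny-by-default posture.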
Dgraph is in the #4 category, so all traffic should be blocked except from desired sources.
Ratel is in the #2 category, a small web server that hosts the client-only Ratel application. As with any web server, it should be isolated and not able to reach any private service within the internal network.
5.2 Testing Network Policies
We will run the following tests before and after the policies are applied: first test connectivity without any policy, then apply the policy and verify that the Dgraph service can only be reached from approved sources.
5.2.1 Testing from an approved namespace
The namespace dgraph-client will be used to test approved traffic to the Dgraph service.
Run the commands below to set this up.
# create name spaces
kubectl create namespace dgraph-client
# run new container and exec into the container
# CTRL-D to exit the session
kubectl run curl \
--namespace dgraph-client \
--image=curlimages/curl \
--stdin --tty -- sh
Once inside this container running in the dgraph-client namespace, run this command to test the connection.
# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.2.2 Testing from unapproved namespace
We will use the namespace unapproved for traffic that should not be allowed. Repeat the same process for the unapproved namespace:
# create name spaces
kubectl create namespace unapproved
# run new container and exec into the container
kubectl run curl \
--namespace unapproved \
--image=curlimages/curl \
--stdin --tty -- sh
Once in the container running in the unapproved namespace, repeat the same test:
# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.2.3 Testing from Ratel web service namespace
Exec into the Ratel container:
RATEL_POD=$(kubectl get pod \
--selector app.kubernetes.io/name=ratel \
--namespace ratel \
--output name
)
kubectl exec -ti -n ratel $RATEL_POD -- shOnce inside the Ratel container, run the following commands to test access:
# connect using service
wget -q -O- dg-dgraph-alpha.dgraph:8080/health
# connect using pod
wget -q -O- dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.3 Network Policy: restricting egress traffic from Ratel
Now, we can add a network policy that blocks all egress traffic to private IP addresses, using the command below:
kubectl apply --namespace ratel --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: ratel-deny-egress
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
ingress:
- {}
egress:
- to:
- ipBlock:
cidr: "0.0.0.0/0"
except:
- "10.0.0.0/8"
- "172.16.0.0/12"
- "192.168.0.0/16"
EOF
Try running the same tests as above. The output should be something like wget: bad address.
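The except list above covers the three RFC 1918 private ranges. As a purely illustrative sketch (plain shell arithmetic, no cluster required), here is how membership in those ranges can be checked, which is conceptually what the policy does to each egress destination:

```shell
# convert a dotted-quad IPv4 address to a 32-bit integer
ip_to_int() {
  set -- $(echo "$1" | tr '.' ' ')
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP CIDR -- succeeds when IP falls inside CIDR
in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# addresses inside 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16
# are the destinations the egress policy refuses to reach
in_cidr 10.12.0.5    10.0.0.0/8    && echo "10.12.0.5 is private (blocked)"
in_cidr 172.20.1.9   172.16.0.0/12 && echo "172.20.1.9 is private (blocked)"
in_cidr 203.0.113.10 10.0.0.0/8    || echo "203.0.113.10 is public (allowed)"
```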
5.4 Network Policy: restrict ingress traffic to Dgraph
Here’s a small example of how Dgraph can be secured, this example policy will do the following:
- allows all pods in namespace dgraph to receive traffic from all pods in the same namespace on all ports (denies inbound traffic to pods in namespace dgraph from other namespaces)
- allows all pods in namespace dgraph to receive traffic from all pods in the dgraph-client namespace on ports 8080 and 9080
- allows all pods in namespace dgraph to receive traffic from load balancer private IP addresses on ports 8080 and 9080
Before we do this, we need to fetch the source IP addresses that should be allowed, so that we can add them to the ingress rules:
# get client source IP
INGRESS_ADDRS=$(curl --silent ifconfig.me)
To implement the network policy, you can run the following:
# deploy network policy to dgraph namespace
kubectl apply --namespace dgraph --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: dgraph-allow
spec:
podSelector: {}
ingress:
- from:
- podSelector: {}
- from:
$(P=" "; for IP in ${INGRESS_ADDRS[*]};
do printf -- "$P$P- ipBlock:\n$P$P${P}cidr: $IP/32\n";
done
)
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: dgraph-client
ports:
- port: 8080
- port: 9080
EOF
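The inline $( ... ) substitution above expands each address in INGRESS_ADDRS into an ipBlock entry. Here is the same loop in isolation, with hypothetical addresses, so its output can be inspected before applying the policy:

```shell
# hypothetical space-delimited list of allowed source addresses
INGRESS_ADDRS="203.0.113.10 198.51.100.7"

# P is one two-space indent unit; each address becomes an ipBlock entry
P="  "
for IP in $INGRESS_ADDRS; do
  printf -- "$P$P- ipBlock:\n$P$P${P}cidr: $IP/32\n"
done
# printed YAML fragment:
#     - ipBlock:
#       cidr: 203.0.113.10/32
#     - ipBlock:
#       cidr: 198.51.100.7/32
```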
NOTE: For this exercise, only the current client source IP address is used. Ultimately, the application will accept or block traffic based on the addresses supplied when the policy is deployed.
As an example, a public IP of 50.196.140.125/32, representing a home office IP address, will be used. A table for the policies is shown below (courtesy of https://orca.tufin.io/netpol/):
With Cilium’s visual editor, the same policy will look like this:
5.5 Testing restricted access
Now we want to log into the containers we did before to test access after the policies have been applied.
5.5.1 Testing from dgraph-client
From earlier test with curl running in dgraph-client namespace, we can run this:
# try commands from the dgraph-client namespace
kubectl exec --stdin --tty curl --namespace dgraph-client -- sh
Once inside the container, run the same test, expecting this to work:
# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.5.2 Testing from unapproved
Similarly, we can exec into the container in the unapproved namespace:
# try commands from the unapproved namespace
kubectl exec --stdin --tty curl --namespace unapproved -- sh
Once inside the container, run the following command, expecting it to fail:
# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.5.3 Testing from Ratel
We can exec into the Ratel container with the following:
RATEL_POD=$(kubectl get pod \
--selector app.kubernetes.io/name=ratel \
--namespace ratel \
--output name
)
kubectl exec -ti -n ratel $RATEL_POD -- sh
At the shell prompt, run the following tests, expecting them to fail:
# connect using service
wget -q -O- dg-dgraph-alpha.dgraph:8080/health
# connect using pod
wget -q -O- dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.5.4 Testing from a whitelisted public IP
After applying the policy, we can test access through the load balancer:
export DG_LB=$(kubectl get service dg-dgraph-alpha --namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
curl --silent $DG_LB:8080/state | jq -r '.groups."1".members'
Cleanup
Kubernetes Resources
Here’s how you can clean up the Kubernetes resources that were used in this article:
helm delete ratel --namespace ratel
helm delete dg --namespace dgraph
kubectl delete pvc --namespace dgraph --selector release=dg
kubectl delete namespace dgraph-client unapproved ratel dgraph
Related Articles
I also have an article on setting up something similar for EKS on AWS.
Source Code
Scripts related to this article can be found here:
Final Thoughts
Thank you for following thus far. In the previous article, I showed how to provision a GKE cluster with a private network and a least-privilege service account, along with some optional tests you can run to verify its features.
In this article, I showed how to deploy a robust distributed graph database called Dgraph, demonstrated some basic usage of Dgraph, and secured Dgraph using network policies that are applied with Calico.
This material should be useful for learning how to manage and secure other database applications on Kubernetes with GKE.
From here, there are a few directions we can go, such as the following list:
- Dgraph (or other database): configure object storage access for backups
- Set up a web service with TLS support (requires registering a domain)
- Set up a zero trust network using a service mesh
