Ultimate Baseline GKE cluster, Pt 2
Part II: Distributed Graph Database on Proper GKE baseline
In the previous article, I demonstrated how to set up a baseline GKE cluster with a private subnet and a least-privilege service account. I also covered some basic tests that can be used to demo or troubleshoot cluster features.
In this article, I will deploy a robust stateful application called Dgraph, a highly performant, resilient distributed graph database.
Why Dgraph?
The Dgraph database is a cloud native application that is highly resilient with built-in high availability, communicates using gRPC (HTTP/2) or GraphQL (HTTP/1.1), and has built-in metrics and distributed tracing through OpenCensus.
These features intersect nicely with Kubernetes features for auto-healing and recoverability:
- Application Layer — the Dgraph application itself can operate when some members are not available, elect a new leader, and send snapshots to synchronize data between its members.
- Platform Layer — Kubernetes, through the statefulset controller, can restart a Dgraph member with its data intact, so that, for example, if `alpha-0` goes down, it comes back with the persistent volume for `alpha-0`; if `alpha-1` goes down, it comes back with the `alpha-1` data; and so forth.
- Infrastructure Layer — Google MIGs (Managed Instance Groups) can recover failing GCE virtual machine instances that are distributed across different data centers (each represented by a zone) within the same region.
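As a quick sketch of the platform layer in action, once Dgraph is installed (Section 3.2 below), you can delete a member pod and watch the statefulset controller bring it back attached to the same persistent volume. The pod and namespace names assume the `dg` Helm release used later in this article:
# simulate a failure by deleting one Dgraph alpha pod
kubectl delete pod dg-dgraph-alpha-0 --namespace dgraph
# watch the statefulset controller recreate it (CTRL-C to stop)
kubectl get pods --namespace dgraph --watch
# the recreated pod reattaches its original persistent volume claim
kubectl get pvc --namespace dgraph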
Some of the other characteristics that will be demonstrated with Dgraph on GKE are the following:
- persistent volumes — Dgraph, as a database, needs fast storage, so it will use a storage class that supports SSDs.
- external load balancer — gRPC uses HTTP/2, which is easy to support with a layer 4 load balancer, so a `service` object of type `LoadBalancer` will be used for this.
- ingress — can be used for GraphQL, which uses HTTP/1.1. This is fairly straightforward to set up with an ingress, which is a configuration sent to the ingress controller (a layer 7 load balancer).
- network policies — to allow only limited access to Dgraph, while the Ratel client, which is hosted by a small web service, can be more open.
Prerequisite Setup
If not already completed, make sure to run through the Steps 0 and Steps 1 from the previous article, Part I: Proper Google Kubernetes Engine baseline cluster.
3.0 Dgraph Demo Application
Dgraph is a highly performant distributed graph database that uses either DQL (Dgraph Query Language) or GraphQL as graph database query languages.
You can install Dgraph using the Dgraph helm chart. For this example, we’ll use an external load balancer. Since it is generally not safe to expose a database on the public internet, we’ll create an allow list to limit which client sources can communicate with the Dgraph database.
3.1 Setup Access List for Security
As most should know, databases are supposed to be private and not accessible from the internet. Dgraph can be configured to block all traffic except for IP addresses that are allowed to access Dgraph.
To get started, we can get the IP address range from the subnet we created earlier as well as the pod subnet used on GKE.
SUBNET_CIDR=$(gcloud compute networks subnets describe $GKE_SUBNET_NAME \
--project $GKE_PROJECT_ID \
--region $GKE_REGION \
--format json \
| jq -r '.ipCidrRange'
)
GKE_POD_CIDR=$(gcloud container clusters describe $GKE_CLUSTER_NAME \
--project $GKE_PROJECT_ID \
--region $GKE_REGION \
--format json \
| jq -r '.clusterIpv4Cidr'
)
# get the current outbound IP from your current location
# this will simulate the remote office IP address
MY_IP_ADDRESS=$(curl --silent ifconfig.me)
# set env var to use later
export DG_ALLOW_LIST="${SUBNET_CIDR},${GKE_POD_CIDR},${MY_IP_ADDRESS}/32"
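It’s worth sanity-checking the assembled list before installing, since this value is passed straight into the Dgraph configuration:
# expect three comma-separated CIDR entries
echo $DG_ALLOW_LIST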
3.2 Install HA Dgraph cluster
With the environment variable `DG_ALLOW_LIST` set, we can deploy Dgraph with the following:
# get dgraph helm chart
helm repo add dgraph https://charts.dgraph.io && helm repo update
# deploy dgraph
helm install dg dgraph/dgraph \
--namespace dgraph \
--create-namespace \
--values - <<EOF
zero:
persistence:
storageClass: premium-rwo
size: 10Gi
alpha:
configFile:
config.yaml: |
security:
whitelist: ${DG_ALLOW_LIST}
persistence:
storageClass: premium-rwo
size: 30Gi
service:
type: LoadBalancer
externalTrafficPolicy: Local
EOF
📓 NOTE: With Google Cloud, the initial quota for SSD storage is limited to 500Gi. A three node GKE cluster will already consume 300Gi (3 x 100Gi), leaving a scant 200Gi for anything else running on the cluster. If you run into pods that are stuck in a `Pending` state, check the events, e.g. `kubectl get events --namespace dgraph`, to see if storage is the issue.
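If you want to check this quota directly, here is a sketch using gcloud and jq; `SSD_TOTAL_GB` is the regional quota metric for SSD persistent disks:
# compare regional SSD persistent disk quota against current usage
gcloud compute regions describe $GKE_REGION \
  --project $GKE_PROJECT_ID \
  --format json \
  | jq '.quotas[] | select(.metric == "SSD_TOTAL_GB")'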
You can inspect the deployed components with:
kubectl get all --namespace dgraph
3.3 About External Load Balancer Configuration
When using any filtering by IP address, such as Dgraph’s whitelist or a network policy, the client source IP address needs to be preserved. This can be done by setting `externalTrafficPolicy` to `Local`.
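You can confirm this setting on the deployed service; the command below should print `Local`:
kubectl get service dg-dgraph-alpha \
  --namespace dgraph \
  --output jsonpath='{.spec.externalTrafficPolicy}'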
If you would like to get further information about what was provisioned on Google Cloud, you can run these commands:
export DG_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
gcloud compute forwarding-rules list \
--filter $DG_LB \
--format "table[box](name,IPAddress,target.segment(-2):label=TARGET_TYPE)"
3.4 Connecting to Dgraph
You can run the following to test connectivity to Dgraph.
export DG_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
curl --silent $DG_LB:8080/state | jq -r '.groups."1".members'
We will use this environment variable `DG_LB` in the following steps of this tutorial.
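Besides `/state`, the Dgraph alpha `/health` endpoint (used again in the network policy tests later) makes for a quick connectivity check:
# basic liveness check against the Dgraph alpha HTTP port
curl --silent $DG_LB:8080/health | jq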
3.5 Testing Dgraph
We can test Dgraph by loading some data and a schema, and then running some queries using curl. Make sure to have the `DG_LB` environment variable set.
First, let’s upload some data.
curl "$DG_LB:8080/mutate?commitNow=true" --silent --request POST \
--header "Content-Type: application/json" \
--data $'
{
"set": [
{"uid": "_:luke","name": "Luke Skywalker", "dgraph.type": "Person"},
{"uid": "_:leia","name": "Princess Leia", "dgraph.type": "Person"},
{"uid": "_:han","name": "Han Solo", "dgraph.type": "Person"},
{"uid": "_:lucas","name": "George Lucas", "dgraph.type": "Person"},
{"uid": "_:irvin","name": "Irvin Kernshner", "dgraph.type": "Person"},
{"uid": "_:richard","name": "Richard Marquand", "dgraph.type": "Person"},
{
"uid": "_:sw1",
"name": "Star Wars: Episode IV - A New Hope",
"release_date": "1977-05-25",
"revenue": 775000000,
"running_time": 121,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:lucas"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw2",
"name": "Star Wars: Episode V - The Empire Strikes Back",
"release_date": "1980-05-21",
"revenue": 534000000,
"running_time": 124,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:irvin"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw3",
"name": "Star Wars: Episode VI - Return of the Jedi",
"release_date": "1983-05-25",
"revenue": 572000000,
"running_time": 131,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:richard"}],
"dgraph.type": "Film"
},
{
"uid": "_:st1",
"name": "Star Trek: The Motion Picture",
"release_date": "1979-12-07",
"revenue": 139000000,
"running_time": 132,
"dgraph.type": "Film"
}
]
}
' | jq
Now, let’s upload the schema that will add an index.
curl "$DG_LB:8080/alter" --silent --request POST \
--data $'
name: string @index(term) .
release_date: datetime @index(year) .
revenue: float .
running_time: int .
starring: [uid] .
director: [uid] .
type Person {
name
}
type Film {
name
release_date
revenue
running_time
starring
director
}
' | jq
NOTE: This `alter` command will fail if a whitelist is not set up.
You can list all of the movies that have a `starring` edge.
curl "$DG_LB:8080/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'{ me(func: has(starring)) { name } }' \
| jq .data
You can run this query for `Star Wars` movies released after `1980`:
curl "$DG_LB:8080/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'
{
me(func: allofterms(name, "Star Wars"), orderasc: release_date)
@filter(ge(release_date, "1980")) {
name
release_date
revenue
running_time
director { name }
starring (orderasc: name) { name }
}
}
' | jq .data
4. Ratel visualization application
Ratel is a graphical query and administration application for Dgraph. Unlike a double-clickable desktop application, it runs only within a web browser like Safari, Firefox, or Chrome.
4.1 Installing Ratel
You can install a small web service that hosts Ratel, so that you can run it in your browser. Run the following to install it:
# get dgraph helm chart
helm repo add dgraph https://charts.dgraph.io && helm repo update
# deploy Ratel
helm install ratel \
--namespace ratel \
--create-namespace dgraph/ratel \
--values - <<EOF
service:
type: NodePort
ingress:
enabled: true
className: gce
annotations:
kubernetes.io/ingress.class: gce
hosts:
- paths:
- path: /*
pathType: ImplementationSpecific
EOF
📓 NOTE: The service must be of type `NodePort` or `LoadBalancer`, as ingress-gce does not support `ClusterIP`. Also, the annotation `kubernetes.io/ingress.class: gce` is required if ingress-gce is not the default ingress controller, as the `className` will be ignored.
You can verify the installed components with the following command:
kubectl get all,ing --namespace ratel
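Provisioning the external HTTP load balancer behind the ingress typically takes several minutes; you can watch the ingress until an address appears:
# wait for the ADDRESS column to be populated (CTRL-C to stop)
kubectl get ingress ratel --namespace ratel --watch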
4.2 Accessing Ratel
Print out the URL to Ratel, and paste this into the browser.
RATEL_LB=$(kubectl get ing ratel \
--namespace "ratel" \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
echo "http://$RATEL_LB"
You will be prompted to enter in the Dgraph server URL. You can get this with the following command:
DGRAPH_LB=$(kubectl get service dg-dgraph-alpha \
--namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
echo "http://$DGRAPH_LB:8080"
Copy that string and paste it into the Dgraph server URL textbox, then hit enter.
Click on Continue. This should drop you into Query mode.
4.3 Testing queries with Ratel
If you ran through the previous steps in the Testing Dgraph section above, the data and schema should already be loaded. You can run the same query from earlier:
{
me(func: allofterms(name, "Star Wars"), orderasc: release_date)
@filter(ge(release_date, "1980")) {
name
release_date
revenue
running_time
director { name }
starring (orderasc: name) { name }
}
}
This should look something like:
5. Security with Network Policies
Network Policies can control ingress (inbound) and egress (outbound) traffic between services running on Kubernetes.
5.1 General recommendations
For securing services within a Kubernetes cluster, I recommend creating the following policies:
- default baseline: in a given namespace, deny all traffic (a minimal sketch follows this list).
- single tier web application: allow all traffic (`0.0.0.0/0`), but deny egress traffic to private subnets.
- multi-tier web application: same as the single tier web application, but allow outbound traffic to the database’s namespace, e.g. add an egress rule with `namespaceSelector.matchLabels.name=$DATABASE_NAMESPACE`.
- private database tier: allow traffic only from the namespaces of clients that need access to the database.
Dgraph is in the #4 category, so all traffic should be blocked except from desired sources.
Ratel is in the #2 category: a small web server that hosts the client-only Ratel application. As with any web server, it should be isolated and not able to reach any private service within the internal network.
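Here is a minimal sketch of the default-deny baseline from item #1; the policy name is arbitrary, and because it selects every pod while specifying no allow rules, all ingress and egress traffic in the namespace is denied:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {} # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress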
5.2 Testing Network Policies
Before applying the policies, we will test connectivity from several sources; after the policies are deployed, we will verify that the Dgraph service can only be reached from approved sources.
5.2.1 Testing from an approved namespace
The namespace `dgraph-client` will be used to test approved traffic to the Dgraph service.
Run the commands below to set this up.
# create namespace
kubectl create namespace dgraph-client
# run new container and exec into the container
# CTRL-D to exit the session
kubectl run curl \
--namespace dgraph-client \
--image=curlimages/curl \
--stdin --tty -- sh
Once inside this container running in the `dgraph-client` namespace, run these commands to test the connection.
# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.2.2 Testing from unapproved namespace
We will use the namespace `unapproved` for traffic that is not approved. Repeat the same process for the unapproved namespace:
# create namespace
kubectl create namespace unapproved
# run new container and exec into the container
kubectl run curl \
--namespace unapproved \
--image=curlimages/curl \
--stdin --tty -- sh
Once in the container running in the `unapproved` namespace, repeat the same tests:
# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.2.3 Testing from Ratel web service namespace
Exec into the Ratel container:
RATEL_POD=$(kubectl get pod \
--selector app.kubernetes.io/name=ratel \
--namespace ratel \
--output name
)
kubectl exec -ti -n ratel $RATEL_POD -- sh
Once inside the Ratel container, run the following commands to test access:
# connect using service
wget -q -O- dg-dgraph-alpha.dgraph:8080/health
# connect using pod
wget -q -O- dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.3 Network Policy: restricting egress traffic from Ratel
Now we can add a network policy that blocks all egress traffic to private IP addresses, using the command below:
kubectl apply --namespace ratel --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: ratel-deny-egress
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
ingress:
- {}
egress:
- to:
- ipBlock:
cidr: "0.0.0.0/0"
except:
- "10.0.0.0/8"
- "172.16.0.0/12"
- "192.168.0.0/16"
EOF
Try running the same tests above. The output should be something like `wget: bad address`.
5.4 Network Policy: restrict ingress traffic to Dgraph
Here’s a small example of how Dgraph can be secured. This example policy will do the following:
- allows all pods in namespace `dgraph` to receive traffic from all pods in the same namespace on all ports (this denies inbound traffic to pods in namespace `dgraph` from other namespaces)
- allows all pods in namespace `dgraph` to receive traffic from all pods in the `dgraph-client` namespace on ports `8080` and `9080`
- allows all pods in namespace `dgraph` to receive traffic from allowed client source IP addresses (arriving through the load balancer) on ports `8080` and `9080`
Before we do this, we need to fetch the client source IP addresses that should be allowed through the load balancer, so that we can add them to the ingress rules:
# get client source IP
INGRESS_ADDRS=$(curl --silent ifconfig.me)
To implement the network policy, you can run the following:
# deploy network policy to dgraph namespace
kubectl apply --namespace dgraph --filename - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: dgraph-allow
spec:
podSelector: {}
ingress:
- from:
- podSelector: {}
- from:
$(P=" "; for IP in ${INGRESS_ADDRS[*]};
do printf -- "$P$P- ipBlock:\n$P$P${P}cidr: $IP/32\n";
done
)
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: dgraph-client
ports:
- port: 8080
- port: 9080
EOF
NOTE: For this exercise, the client source IP address is used. Ultimately, the application will accept or block traffic based on the addresses supplied during deployment of the Dgraph application.
As an example, a public IP of 50.196.140.125/32
, representing a home office IP address, will be used. A table for the policies is shown below (courtesy of https://orca.tufin.io/netpol/):
With Cilium’s visual editor, the same policy will look like this:
5.5 Testing restricted access
Now we can log back into the containers from before to test access after the policies have been applied.
5.5.1 Testing from dgraph-client
From the earlier test with curl running in the `dgraph-client` namespace, we can run this:
# try commands from the dgraph-client namespace
kubectl exec --stdin --tty curl --namespace dgraph-client -- sh
Once inside the container, run the same test, expecting this to work:
# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.5.2 Testing from unapproved
Similarly, we can exec back into the container in the `unapproved` namespace:
# try commands from the unapproved namespace
kubectl exec --stdin --tty curl --namespace unapproved -- sh
Once inside the container, run the following command, expecting it to fail:
# connect using service
curl dg-dgraph-alpha.dgraph:8080/health
# connect using pod
curl dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.5.3 Testing from Ratel
We can exec into the Ratel container with the following:
RATEL_POD=$(kubectl get pod \
--selector app.kubernetes.io/name=ratel \
--namespace ratel \
--output name
)
kubectl exec -ti -n ratel $RATEL_POD -- sh
At the shell prompt, run the following tests, expecting them to fail:
# connect using service
wget -q -O- dg-dgraph-alpha.dgraph:8080/health
# connect using pod
wget -q -O- dg-dgraph-alpha-0.dg-dgraph-alpha-headless.dgraph:8080/health
5.5.4 Testing from a whitelisted public IP
After applying the policy, we can test access through the load balancer:
export DG_LB=$(kubectl get service dg-dgraph-alpha --namespace dgraph \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'
)
curl --silent $DG_LB:8080/state | jq -r '.groups."1".members'
Cleanup
Kubernetes Resources
Here’s how you can clean up the Kubernetes resources that were used in this article:
helm delete ratel --namespace ratel
helm delete dg --namespace dgraph
kubectl delete pvc --namespace dgraph --selector release=dg
kubectl delete namespace dgraph-client unapproved ratel dgraph
Related Articles
I also have an article on setting up something similar with EKS on AWS.
Source Code
Scripts related to this article can be found here:
Final Thoughts
Thank you for following thus far. In the previous article, I showed how to provision a GKE cluster with a private network and a least-privilege service account, along with some optional tests you can run to exercise the cluster’s features.
In this article, I showed how to deploy a robust distributed graph database called Dgraph, demonstrated some basic usage of Dgraph, and secured it using network policies applied with Calico.
This material should be useful for learning how to manage and secure other database applications on Kubernetes with GKE.
From here, there are a few directions we can go, such as the following:
- Dgraph (or other database): configure object storage access for backups
- Set up a web service with TLS support (requires registering a domain)
- Set up a zero trust network using a service mesh