Dgraph on Kubernetes

Getting Started with Dgraph on Kubernetes

Joaquín Menchaca (智裕)
Dec 9, 2023


Dgraph is a high-performance distributed graph database that speaks GraphQL natively. You do not need a layer that translates GraphQL into another query language, such as Cypher for Neo4j, or into SQL for a non-graph database, as Hasura does.

Dgraph also offers DQL (Dgraph Query Language), a query language derived from GraphQL. Clients can send DQL over HTTP or, for higher performance, over gRPC.

This tutorial shows how to quickly install a highly available Dgraph cluster on Kubernetes and test its basic functionality. The setup here is suitable for demonstrations, not for production, which requires further attention to security, business continuity, and performance.

Prerequisites

Tools

  • kubectl [kubectl] is the tool that interacts with the Kubernetes cluster. It can be installed with the asdf tool.
  • helm [helm] is a tool that installs Kubernetes applications packaged as Helm charts.
  • grpcurl [grpcurl] is a tool that interacts with gRPC servers from the command line.
  • curl [curl] is a tool that interacts with web servers from the command line.
  • A shell [sh] such as bash [bash] or zsh [zsh] runs the commands. The examples use $'...' quoting, which a strictly POSIX sh does not support. bash comes standard on most Linux distributions, and on macOS you can get the latest versions of both with brew install bash zsh if Homebrew is installed.

This tool is highly recommended:

  • jq [jq] is a tool to query and print JSON data. Several commands in this guide pipe their output to it.

Depending on which cloud provider(s) you use, you will also need their CLI tools installed and configured, such as gcloud for Google Cloud, aws for AWS, or az for Azure.

Kubernetes Client Access

You need a provisioned Kubernetes cluster to deploy applications onto. I wrote some articles that cover how to provision Kubernetes:

Once the cluster is provisioned, set the KUBECONFIG environment variable to point to a configuration file with the credentials needed to access it. Afterward, you can use kubectl and helm to install Dgraph and interact with the cluster.

Installing Dgraph

Dgraph can be installed with its Helm chart. First, set a few variables appropriate for your cluster. Here are the example settings used in this guide:

DGRAPH_NS="dgraph"
DGRAPH_RELEASE_NAME="dg"
DGRAPH_ALLOW_LIST="0.0.0.0/0" # insecure, should be changed
DGRAPH_ZERO_DISK_SIZE="10Gi" # change as needed
DGRAPH_ALPHA_DISK_SIZE="30Gi" # change as needed for db size

DGRAPH_SC="<CHANGE_ME>" # absolutely must be changed
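A leftover placeholder such as <CHANGE_ME> typically surfaces only after the install, when the volumes fail to provision. A quick check before running helm can save a round trip. The sketch below is a hypothetical helper (check_dgraph_settings is not part of Dgraph or Helm). It assumes the variables above and deliberately accepts only sizes like 10Gi, 512Mi, or 1Ti, which is stricter than what Kubernetes itself allows.

```shell
# Hypothetical helper (not part of Dgraph or Helm): fail fast before
# `helm install` if a placeholder or malformed value is still set.
check_dgraph_settings() {
  # the storage class must be set and must not be the placeholder
  if [ -z "${DGRAPH_SC:-}" ] || [ "$DGRAPH_SC" = "<CHANGE_ME>" ]; then
    echo "error: set DGRAPH_SC to a storage class from 'kubectl get storageclass'" >&2
    return 1
  fi
  # disk sizes should be plain binary quantities such as 10Gi or 1Ti
  for size in "${DGRAPH_ZERO_DISK_SIZE:-}" "${DGRAPH_ALPHA_DISK_SIZE:-}"; do
    if ! printf '%s' "$size" | grep -Eq '^[0-9]+(Mi|Gi|Ti)$'; then
      echo "error: unexpected disk size '$size' (expected e.g. 10Gi)" >&2
      return 1
    fi
  done
}

# usage: check_dgraph_settings && echo "settings look OK"
```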

Dgraph should use disks with high I/O performance, such as SSDs. How you get them varies by cloud provider. For example, on GKE you can use the premium-rwo storage class; on EKS, you have to install a CSI driver, such as the Amazon EBS CSI driver, and create a storage class that uses it.
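For EKS, one possible storage class is sketched below as a hypothetical helper that prints a manifest for gp3 EBS volumes. It assumes the Amazon EBS CSI driver (provisioner ebs.csi.aws.com) is already installed. The name you pass, such as dgraph-gp3, is an arbitrary example, and volume type, binding mode, and expansion settings are choices you may want to change.

```shell
# Hypothetical helper: print a StorageClass manifest for gp3 EBS volumes.
# Assumes the Amazon EBS CSI driver (provisioner ebs.csi.aws.com) is
# already installed on the EKS cluster.
render_gp3_storage_class() {
  if [ -z "${1:-}" ]; then
    echo "usage: render_gp3_storage_class NAME" >&2
    return 1
  fi
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: $1
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF
}

# usage (requires cluster access):
#   render_gp3_storage_class dgraph-gp3 | kubectl apply -f -
#   DGRAPH_SC="dgraph-gp3"
```

WaitForFirstConsumer delays volume creation until a pod is scheduled, so each EBS volume is created in the same availability zone as the Dgraph pod that uses it.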

When ready, add the Dgraph Helm chart repository:

helm repo add dgraph https://charts.dgraph.io
helm repo update

Now install Dgraph using the variables set earlier:

helm install $DGRAPH_RELEASE_NAME dgraph/dgraph \
  --namespace $DGRAPH_NS \
  --create-namespace \
  --values - <<EOF
zero:
  persistence:
    storageClass: $DGRAPH_SC
    size: $DGRAPH_ZERO_DISK_SIZE
alpha:
  configFile:
    config.yaml: |
      security:
        whitelist: ${DGRAPH_ALLOW_LIST}
  persistence:
    storageClass: $DGRAPH_SC
    size: $DGRAPH_ALPHA_DISK_SIZE
EOF
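YAML indentation inside an inline heredoc is easy to get wrong. You might prefer to render the values to a file, review it, and then pass it to helm. Here is a sketch using a hypothetical render_dgraph_values function that emits the values from the install command above and reads the DGRAPH_* variables set earlier. The file name dgraph-values.yaml is an arbitrary choice.

```shell
# Hypothetical helper: print the Helm values for the Dgraph chart so they
# can be reviewed (or passed to `helm template`) before installing.
# Reads the DGRAPH_* variables set earlier.
render_dgraph_values() {
  cat <<EOF
zero:
  persistence:
    storageClass: ${DGRAPH_SC}
    size: ${DGRAPH_ZERO_DISK_SIZE}
alpha:
  configFile:
    config.yaml: |
      security:
        whitelist: ${DGRAPH_ALLOW_LIST}
  persistence:
    storageClass: ${DGRAPH_SC}
    size: ${DGRAPH_ALPHA_DISK_SIZE}
EOF
}

# usage:
#   render_dgraph_values > dgraph-values.yaml
#   helm template "$DGRAPH_RELEASE_NAME" dgraph/dgraph --values dgraph-values.yaml
#   helm install "$DGRAPH_RELEASE_NAME" dgraph/dgraph \
#     --namespace "$DGRAPH_NS" --create-namespace --values dgraph-values.yaml
```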

Securing the Whitelist

You may want to set DGRAPH_ALLOW_LIST to something more restrictive before running helm install. For example, you could use the cluster's local subnets plus any office IP addresses, such as the home IP address assigned by your ISP.

You can get your outbound IP address with the following:

export MY_IP_ADDRESS=$(curl --silent ifconfig.me)
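The commands that follow append /32 to this address. That prefix length is only correct for IPv4, and an empty reply from ifconfig.me would produce a bogus entry. A small guard helps. The sketch below uses a hypothetical host_cidr helper that checks only the shape of a dotted IPv4 address, not that each octet is at most 255. It pairs the helper with curl -4, which asks curl to use IPv4.

```shell
# Hypothetical helper: turn a single IPv4 address into a /32 CIDR entry,
# refusing anything else (an empty reply or an IPv6 address, for which
# /32 would be the wrong prefix length). Checks shape only, not octet range.
host_cidr() {
  if printf '%s' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    printf '%s/32\n' "$1"
  else
    echo "error: '$1' is not an IPv4 address" >&2
    return 1
  fi
}

# usage: force an IPv4 lookup with curl -4, then build the CIDR entry
#   MY_IP_CIDR=$(host_cidr "$(curl -4 --silent ifconfig.me)")
```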

On Google Cloud, the right value depends on how you configured the network (custom settings or defaults), where you set up the cluster, and which project you used. The following uses the cluster's pod CIDR:

GKE_CLUSTER_NAME="<your_cluster_name>"
GKE_PROJECT_ID="<your_project_id>"
GKE_REGION="<your_cluster_region>"

export GKE_POD_CIDR=$(gcloud container clusters describe $GKE_CLUSTER_NAME \
--project $GKE_PROJECT_ID \
--region $GKE_REGION \
--format json \
| jq -r '.clusterIpv4Cidr'
)

# setup environment variable with GKE CIDR and Home IP
export DGRAPH_ALLOW_LIST="${GKE_POD_CIDR},${MY_IP_ADDRESS}/32"

On AWS, you can use the CIDR block of the cluster's VPC:

EKS_CLUSTER_NAME="<your_cluster_name>"
EKS_REGION="<your_cluster_region>"

VPC_ID=$(aws eks describe-cluster \
--name $EKS_CLUSTER_NAME \
--region $EKS_REGION \
--query 'cluster.resourcesVpcConfig.vpcId' \
--output text
)

EKS_CIDR=$(aws ec2 describe-vpcs \
--vpc-ids $VPC_ID \
--region $EKS_REGION \
--query 'Vpcs[0].CidrBlock' \
--output text
)

# setup environment variable with EKS CIDR and Home IP
export DGRAPH_ALLOW_LIST="${EKS_CIDR},${MY_IP_ADDRESS}/32"

On Azure, you can do something like the following:

AZ_CLUSTER_NAME="<your_cluster_name>"
AZ_RESOURCE_GROUP="<your_resource_group>"

# get the AKS pod CIDR
AKS_CIDR=$(az aks show \
--name $AZ_CLUSTER_NAME \
--resource-group $AZ_RESOURCE_GROUP \
--query 'networkProfile.podCidr' \
--output tsv
)

# setup environment variable with AKS CIDR and Home IP
export DGRAPH_ALLOW_LIST="${AKS_CIDR},${MY_IP_ADDRESS}/32"
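Each provider example builds the list by plain string concatenation. If one of the lookups returns nothing, or returns a value with a trailing comma (for example from a tr '\n' ',' pipeline), the result contains empty entries such as ",,". The sketch below is a hypothetical join_allow_list helper that drops blanks, spaces, and stray commas before joining the entries with commas.

```shell
# Hypothetical helper: join CIDR entries into one comma-separated list,
# dropping blank entries, whitespace, and stray commas.
join_allow_list() {
  printf '%s\n' "$@" \
    | tr ',' '\n' \
    | tr -d ' \t\r' \
    | grep -v '^$' \
    | paste -s -d ',' -
}

# usage:
#   export DGRAPH_ALLOW_LIST=$(join_allow_list "$AKS_CIDR" "${MY_IP_ADDRESS}/32")
```

Splitting on commas first means the helper accepts both separate arguments and values that are already comma-separated lists.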

Testing Dgraph Connectivity

After Dgraph is installed, you can run some basic tests to make sure connectivity works. These tests are especially useful after setting up network policies, a service mesh, or an external endpoint.

You can make Dgraph accessible from your local machine with the kubectl port-forward command:

DGRAPH_NS="dgraph"
DGRAPH_RELEASE_NAME="dg"

# setup HTTP access in another terminal tab
kubectl port-forward \
--namespace $DGRAPH_NS \
svc/$DGRAPH_RELEASE_NAME-dgraph-alpha 8080:8080

# setup gRPC access in another terminal tab
kubectl port-forward \
--namespace $DGRAPH_NS \
svc/$DGRAPH_RELEASE_NAME-dgraph-alpha 9080:9080

Testing HTTP connectivity

You can get the status of the Dgraph cluster using the following command:

export DGRAPH_HTTP="localhost:8080" # change as needed

# test http connectivity
curl -s http://$DGRAPH_HTTP/state
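Because the port-forward runs in another tab and Alpha may still be starting, a first request can fail even when nothing is wrong. The sketch below uses a hypothetical wait_for_dgraph helper that polls Alpha's /health endpoint until curl gets a success status. It assumes a host:port argument like DGRAPH_HTTP and that /health answers with a 2xx code once Alpha is ready. The default of 30 attempts, one second apart, is arbitrary.

```shell
# Hypothetical helper: poll Alpha's /health endpoint until it answers with a
# success status, e.g. while the port-forward is still starting up.
wait_for_dgraph() {
  endpoint="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero for HTTP error responses (>= 400)
    if curl --silent --fail --output /dev/null --max-time 2 "http://${endpoint}/health"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "error: Dgraph at ${endpoint} not healthy after ${attempts} attempts" >&2
  return 1
}

# usage: wait_for_dgraph "$DGRAPH_HTTP" && curl -s "http://$DGRAPH_HTTP/state" | jq
```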

Testing gRPC connectivity

export DGRAPH_GRPC="localhost:9080" # change as needed

# fetch api.proto definition
curl -sOL \
https://raw.githubusercontent.com/dgraph-io/pydgraph/master/pydgraph/proto/api.proto

# test grpc connectivity
grpcurl -plaintext -proto api.proto $DGRAPH_GRPC api.Dgraph/CheckVersion
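CheckVersion returns the Version message from api.proto, which grpcurl prints as JSON with a "tag" field. If you want the version string alone, for example to log it in a script, a hypothetical extract_version_tag filter can pull it out. It assumes grpcurl's JSON output format and prints nothing when the call failed.

```shell
# Hypothetical helper: pull the version string out of the CheckVersion reply,
# which grpcurl prints as JSON with a "tag" field; prints nothing otherwise.
extract_version_tag() {
  sed -n 's/.*"tag": *"\([^"]*\)".*/\1/p'
}

# usage:
#   grpcurl -plaintext -proto api.proto "$DGRAPH_GRPC" api.Dgraph/CheckVersion \
#     | extract_version_tag
```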

Getting Started Tutorial

In this section, we'll upload data, upload a schema that adds indexes, and run a few queries.

First, set an environment variable with the hostname and port number of the Dgraph server, for example:

export DGRAPH_HTTP="localhost:8080"

Uploading Data

You can use either RDF or JSON as the data format. Examples of both follow; choose one. Running both would create duplicate nodes, because blank node names such as _:luke are assigned new UIDs in every mutation.

For RDF, you can upload data with this command:

curl "$DGRAPH_HTTP/mutate?commitNow=true" --silent --request POST \
--header "Content-Type: application/rdf" \
--data $'
{
set {
_:luke <name> "Luke Skywalker" .
_:luke <dgraph.type> "Person" .
_:leia <name> "Princess Leia" .
_:leia <dgraph.type> "Person" .
_:han <name> "Han Solo" .
_:han <dgraph.type> "Person" .
_:lucas <name> "George Lucas" .
_:lucas <dgraph.type> "Person" .
_:irvin <name> "Irvin Kershner" .
_:irvin <dgraph.type> "Person" .
_:richard <name> "Richard Marquand" .
_:richard <dgraph.type> "Person" .

_:sw1 <name> "Star Wars: Episode IV - A New Hope" .
_:sw1 <release_date> "1977-05-25" .
_:sw1 <revenue> "775000000" .
_:sw1 <running_time> "121" .
_:sw1 <starring> _:luke .
_:sw1 <starring> _:leia .
_:sw1 <starring> _:han .
_:sw1 <director> _:lucas .
_:sw1 <dgraph.type> "Film" .

_:sw2 <name> "Star Wars: Episode V - The Empire Strikes Back" .
_:sw2 <release_date> "1980-05-21" .
_:sw2 <revenue> "534000000" .
_:sw2 <running_time> "124" .
_:sw2 <starring> _:luke .
_:sw2 <starring> _:leia .
_:sw2 <starring> _:han .
_:sw2 <director> _:irvin .
_:sw2 <dgraph.type> "Film" .

_:sw3 <name> "Star Wars: Episode VI - Return of the Jedi" .
_:sw3 <release_date> "1983-05-25" .
_:sw3 <revenue> "572000000" .
_:sw3 <running_time> "131" .
_:sw3 <starring> _:luke .
_:sw3 <starring> _:leia .
_:sw3 <starring> _:han .
_:sw3 <director> _:richard .
_:sw3 <dgraph.type> "Film" .

_:st1 <name> "Star Trek: The Motion Picture" .
_:st1 <release_date> "1979-12-07" .
_:st1 <revenue> "139000000" .
_:st1 <running_time> "132" .
_:st1 <dgraph.type> "Film" .
}
}
' | jq

For JSON, you can upload data with this command:

curl "$DGRAPH_HTTP/mutate?commitNow=true" --silent --request POST \
--header "Content-Type: application/json" \
--data $'
{
"set": [
{"uid": "_:luke","name": "Luke Skywalker", "dgraph.type": "Person"},
{"uid": "_:leia","name": "Princess Leia", "dgraph.type": "Person"},
{"uid": "_:han","name": "Han Solo", "dgraph.type": "Person"},
{"uid": "_:lucas","name": "George Lucas", "dgraph.type": "Person"},
{"uid": "_:irvin","name": "Irvin Kershner", "dgraph.type": "Person"},
{"uid": "_:richard","name": "Richard Marquand", "dgraph.type": "Person"},
{
"uid": "_:sw1",
"name": "Star Wars: Episode IV - A New Hope",
"release_date": "1977-05-25",
"revenue": 775000000,
"running_time": 121,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:lucas"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw2",
"name": "Star Wars: Episode V - The Empire Strikes Back",
"release_date": "1980-05-21",
"revenue": 534000000,
"running_time": 124,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:irvin"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw3",
"name": "Star Wars: Episode VI - Return of the Jedi",
"release_date": "1983-05-25",
"revenue": 572000000,
"running_time": 131,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:richard"}],
"dgraph.type": "Film"
},
{
"uid": "_:st1",
"name": "Star Trek: The Motion Picture",
"release_date": "1979-12-07",
"revenue": 139000000,
"running_time": 132,
"dgraph.type": "Film"
}
]
}
' | jq
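Piping to jq makes the response readable, but a script cannot tell from that alone whether the mutation worked. Dgraph's HTTP mutate and alter responses report a "code" of "Success" under "data", while failures come back under "errors". The sketch below uses a hypothetical check_dgraph_success filter that returns a non-zero status and prints the body to stderr when that success marker is missing.

```shell
# Hypothetical helper: succeed only if a Dgraph HTTP response body reports
# "code": "Success"; otherwise print the body to stderr and fail.
check_dgraph_success() {
  body="$(cat)"
  if printf '%s' "$body" | grep -Eq '"code"[[:space:]]*:[[:space:]]*"Success"'; then
    return 0
  fi
  printf 'error: Dgraph did not report success:\n%s\n' "$body" >&2
  return 1
}

# usage: replace the trailing `| jq` of the mutation command with
#   ... | check_dgraph_success && echo "mutation succeeded"
```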

Uploading Schema

Alter the schema to add indexes on some of the predicates so that queries can use term matching, filtering, and sorting.

Note that this requires running this command from an outbound IP address that is included in the security.whitelist configuration of Dgraph.

curl "$DGRAPH_HTTP/alter" --silent --request POST \
--data $'
name: string @index(term) .
release_date: datetime @index(year) .
revenue: float .
running_time: int .
starring: [uid] .
director: [uid] .

type Person {
name
}

type Film {
name
release_date
revenue
running_time
starring
director
}
' | jq
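To confirm that the indexes were applied, you can ask Dgraph for the schema of specific predicates with a DQL schema query. The sketch below uses a hypothetical schema_query helper that builds such a query for the predicate names you pass. The type, index, and tokenizer fields are a subset of what a schema query can return.

```shell
# Hypothetical helper: build a DQL schema query for the given predicates,
# e.g. to confirm the indexes from the alter step were applied.
schema_query() {
  preds=$(printf '%s\n' "$@" | paste -s -d ',' - | sed 's/,/, /g')
  printf 'schema(pred: [%s]) { type index tokenizer }\n' "$preds"
}

# usage:
#   curl "$DGRAPH_HTTP/query" --silent --request POST \
#     --header "Content-Type: application/dql" \
#     --data "$(schema_query name release_date)" | jq .data
```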

Query all Starring Edges

List all movies that have a starring edge:

curl "$DGRAPH_HTTP/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'{ me(func: has(starring)) { name } }' \
| jq .data

Query Star Wars Movies Released in or after 1980

curl "$DGRAPH_HTTP/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'
{
me(func: allofterms(name, "Star Wars"), orderasc: release_date)
@filter(ge(release_date, "1980")) {
name
release_date
revenue
running_time
director { name }
starring (orderasc: name) { name }
}
}
' | jq .data
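The queries above embed DQL in $'...' strings, which gets awkward as queries grow. Here is a sketch of a hypothetical dql wrapper that reads the query from stdin, so you can write it as a plain heredoc. It assumes DGRAPH_HTTP is set as earlier. Note that --fail only catches HTTP-level errors; query errors returned in the response body still need checking, for example with jq.

```shell
# Hypothetical helper: send a DQL query read from stdin to Alpha's /query
# endpoint. Assumes DGRAPH_HTTP holds host:port, as set earlier.
dql() {
  if [ -z "${DGRAPH_HTTP:-}" ]; then
    echo "error: set DGRAPH_HTTP (for example localhost:8080)" >&2
    return 1
  fi
  # --data-binary @- sends stdin unchanged, newlines included
  curl "http://${DGRAPH_HTTP}/query" --silent --fail \
    --request POST \
    --header "Content-Type: application/dql" \
    --data-binary @-
}

# usage:
#   dql <<'EOF' | jq .data
#   { me(func: has(starring)) { name } }
#   EOF
```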

Conclusion

I wrote this article for two purposes: as a walk-through for newcomers to Dgraph, and as a reference for installing Dgraph on Kubernetes.

I see many questions online about how to install Dgraph on Kubernetes, so I thought it would be useful to cover that, along with how to test connectivity and a short tutorial on the basics of Dgraph: mutations (uploading data), alter (uploading a schema), and queries.

Other things you may want to try are using network policies or a service mesh to further secure Dgraph, and setting up least-privilege access to an object store or NFS for backups and exports.
