Dgraph on Kubernetes
Getting Started with Dgraph on Kubernetes
Dgraph is a highly performant distributed graph database that speaks GraphQL natively. This means that you do not need to translate GraphQL into another query language, such as Cypher with Neo4j, or map it onto a non-graph database, as is the case with Hasura.
Dgraph also has a superset of the GraphQL language called DQL (Dgraph Query Language), which can communicate over either HTTP or gRPC, the latter for higher performance.
This tutorial will show how you can quickly install a high-availability Dgraph cluster on Kubernetes and test basic functionality. The Dgraph setup here is suitable for demonstration purposes, not for production, which requires further consideration of security, business continuity, and performance.
Prerequisites
Tools
- kubectl: the client tool for interacting with the Kubernetes cluster.
- helm: a tool that installs Kubernetes applications packaged as Helm charts.
- grpcurl: a tool for interacting with gRPC services from the command line.
- curl: a tool for interacting with web servers from the command line.
- A POSIX shell (sh), such as bash or zsh, to run the commands. These come standard on Linux; on macOS you can get the latest with brew install bash zsh if Homebrew is installed.
This tool is highly recommended:

- jq: a tool to query and pretty-print JSON data.
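As a quick sanity check that jq works, you can extract a single field from an inline JSON document (the JSON values here are just made-up examples):

```shell
# extract a single field from a JSON document with jq
echo '{"name": "dgraph", "chart": "dgraph/dgraph"}' | jq -r '.name'
# → dgraph
```

The -r flag prints the raw string without surrounding quotes, which is handy when capturing values into shell variables.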
Depending on which cloud provider(s) you are using, you will need their tools installed and set up:

- Azure CLI (az): for interacting with Azure.
- Google Cloud SDK (gcloud): for interacting with Google Cloud.
- AWS CLI (aws): for interacting with AWS.
Kubernetes Client Access
Obviously, you need a Kubernetes cluster provisioned so that you can deploy applications onto it. I have written some articles that cover how to provision Kubernetes.
Once installed, you will need the KUBECONFIG environment variable set to point to a configuration with the credentials needed to access the Kubernetes cluster. Afterward, you can use the kubectl and helm commands to install Dgraph and interact with the Kubernetes cluster.
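As a minimal sketch, you can point KUBECONFIG at the credentials file and confirm it exists before running kubectl or helm (the path here is an example; use whatever path your provider wrote the kubeconfig to):

```shell
# point the Kubernetes clients at the cluster credentials (example path)
export KUBECONFIG="$HOME/.kube/config"

# sanity check that the file exists before using kubectl or helm
if [ -f "$KUBECONFIG" ]; then
  echo "using kubeconfig: $KUBECONFIG"
else
  echo "kubeconfig not found: $KUBECONFIG" >&2
fi
```

Once this succeeds, kubectl get nodes should list the cluster's nodes, confirming client access.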
Installing Dgraph
Dgraph can be easily installed using the Helm chart. First you will want to set up some values that are appropriate to your cluster. Here are some example settings we can use for this guide.
DGRAPH_NS="dgraph"
DGRAPH_RELEASE_NAME="dg"
DGRAPH_ALLOW_LIST="0.0.0.0/0" # insecure, should be changed
DGRAPH_ZERO_DISK_SIZE="10Gi" # change as needed
DGRAPH_ALPHA_DISK_SIZE="30Gi" # change as needed for db size
DGRAPH_SC="<CHANGE_ME>" # absolutely must be changed
Dgraph should use disks with high I/O, such as SSDs. What is available varies by cloud provider. For example, on GKE you can use the premium-rwo storage class; on EKS, you have to install a CSI storage driver and create a storage class that uses that driver.
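For example, on EKS, after installing the AWS EBS CSI driver, a storage class along these lines could be used. The name dgraph-ssd and the gp3 volume type are assumptions for illustration; adjust them to your environment:

```shell
# hypothetical SSD-backed StorageClass for EKS (requires the AWS EBS CSI driver)
cat > dgraph-storageclass.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dgraph-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
EOF

# apply it, then use DGRAPH_SC="dgraph-ssd" in the settings above:
# kubectl apply -f dgraph-storageclass.yaml
```

WaitForFirstConsumer delays volume creation until a pod is scheduled, so the disk lands in the same availability zone as the pod.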
When ready, add the Dgraph Helm chart repository:
helm repo add dgraph https://charts.dgraph.io
helm repo update
Now install Dgraph using the variables set earlier:
helm install $DGRAPH_RELEASE_NAME dgraph/dgraph \
--namespace $DGRAPH_NS \
--create-namespace \
--values - <<EOF
zero:
  persistence:
    storageClass: $DGRAPH_SC
    size: $DGRAPH_ZERO_DISK_SIZE
alpha:
  configFile:
    config.yaml: |
      security:
        whitelist: ${DGRAPH_ALLOW_LIST}
  persistence:
    storageClass: $DGRAPH_SC
    size: $DGRAPH_ALPHA_DISK_SIZE
EOF
Securing the Whitelist
You may want to set the whitelist to something more restrictive when setting up the DGRAPH_ALLOW_LIST environment variable. You could, for example, use the local subnets plus any office IP address(es), such as your home IP address assigned by your ISP.
You can get your outbound IP address with the following:
export MY_IP_ADDRESS=$(curl --silent ifconfig.me)
On Google Cloud, the values depend on how you configured your network (or whether you used the defaults), where you set up your cluster, and which project you used.
GKE_CLUSTER_NAME="<your_cluster_name>"
GKE_PROJECT_ID="<your_project_id>"
GKE_REGION="<your_cluster_region>"
export GKE_POD_CIDR=$(gcloud container clusters describe $GKE_CLUSTER_NAME \
--project $GKE_PROJECT_ID \
--region $GKE_REGION \
--format json \
| jq -r '.clusterIpv4Cidr'
)
# setup environment variable with GKE CIDR and Home IP
export DGRAPH_ALLOW_LIST="${GKE_POD_CIDR},${MY_IP_ADDRESS}/32"
On AWS, you can do the following:
EKS_CLUSTER_NAME="<your_cluster_name>"
EKS_REGION="<your_cluster_region>"
VPC_ID=$(aws eks describe-cluster \
--name $EKS_CLUSTER_NAME \
--region $EKS_REGION \
--query 'cluster.resourcesVpcConfig.vpcId' \
--output text
)
EKS_CIDR=$(aws ec2 describe-vpcs \
--vpc-ids $VPC_ID \
--region $EKS_REGION \
--query 'Vpcs[0].CidrBlock' \
--output text
)
# setup environment variable with EKS CIDR and Home IP
export DGRAPH_ALLOW_LIST="${EKS_CIDR},${MY_IP_ADDRESS}/32"
On Azure, we can do something like the following:
AZ_CLUSTER_NAME="<your_cluster_name>"
AZ_RESOURCE_GROUP="<your_resource_group>"
# get AKS pod IP addresses
AKS_CIDR=$(az aks show \
--name $AZ_CLUSTER_NAME \
--resource-group $AZ_RESOURCE_GROUP \
| jq -r '.networkProfile.podCidr'
)
# setup environment variable with AKS CIDR and Home IP
export DGRAPH_ALLOW_LIST="${AKS_CIDR},${MY_IP_ADDRESS}/32"
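If you change the allow list after the initial install, the setting needs to be re-applied to the running release. A sketch of doing that with helm upgrade, writing only the changed values to a file (the file name dgraph-allowlist.yaml is arbitrary):

```shell
# write only the updated security settings to a values file
cat > dgraph-allowlist.yaml <<EOF
alpha:
  configFile:
    config.yaml: |
      security:
        whitelist: ${DGRAPH_ALLOW_LIST}
EOF

# re-apply against the existing release, keeping all other values:
# helm upgrade $DGRAPH_RELEASE_NAME dgraph/dgraph \
#   --namespace $DGRAPH_NS --reuse-values --values dgraph-allowlist.yaml
```

The --reuse-values flag keeps the values from the original install, so only the whitelist is changed by the upgrade.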
Testing Dgraph Connectivity
After Dgraph is installed, we can run some basic tests to make sure connectivity to Dgraph works. These tests are especially useful after setting up network policies, a service mesh, or an external endpoint.
You can make Dgraph accessible locally through the kubectl port-forward command.
DGRAPH_NS="dgraph"
DGRAPH_RELEASE_NAME="dg"
# setup HTTP access in another terminal tab
kubectl port-forward \
--namespace $DGRAPH_NS \
svc/$DGRAPH_RELEASE_NAME-dgraph-alpha-headless 8080:8080
# setup gRPC access in another terminal tab
kubectl port-forward \
--namespace $DGRAPH_NS \
svc/$DGRAPH_RELEASE_NAME-dgraph-alpha-headless 9080:9080
Testing HTTP connectivity
You can get the status of the Dgraph cluster using the following command:
export DGRAPH_HTTP="localhost:8080" # change as needed
# test http connectivity
curl -s http://$DGRAPH_HTTP/state
Testing gRPC connectivity
export DGRAPH_GRPC="localhost:9080" # change as needed
# fetch api.proto definition
curl -sOL \
https://raw.githubusercontent.com/dgraph-io/pydgraph/master/pydgraph/proto/api.proto
# test grpc connectivity
grpcurl -plaintext -proto api.proto $DGRAPH_GRPC api.Dgraph/CheckVersion
Getting Started Tutorial
For this section, we'll upload data, apply a schema to add indexes, and run a few queries.
First setup an environment variable with the hostname and port number of the Dgraph server, for example:
export DGRAPH_HTTP="localhost:8080"
Uploading Data
You can choose between RDF or JSON as the data format. Below is an example of each; choose one of the options.
For RDF, you can upload data with this command:
curl "$DGRAPH_HTTP/mutate?commitNow=true" --silent --request POST \
--header "Content-Type: application/rdf" \
--data $'
{
set {
_:luke <name> "Luke Skywalker" .
_:luke <dgraph.type> "Person" .
_:leia <name> "Princess Leia" .
_:leia <dgraph.type> "Person" .
_:han <name> "Han Solo" .
_:han <dgraph.type> "Person" .
_:lucas <name> "George Lucas" .
_:lucas <dgraph.type> "Person" .
_:irvin <name> "Irvin Kernshner" .
_:irvin <dgraph.type> "Person" .
_:richard <name> "Richard Marquand" .
_:richard <dgraph.type> "Person" .
_:sw1 <name> "Star Wars: Episode IV - A New Hope" .
_:sw1 <release_date> "1977-05-25" .
_:sw1 <revenue> "775000000" .
_:sw1 <running_time> "121" .
_:sw1 <starring> _:luke .
_:sw1 <starring> _:leia .
_:sw1 <starring> _:han .
_:sw1 <director> _:lucas .
_:sw1 <dgraph.type> "Film" .
_:sw2 <name> "Star Wars: Episode V - The Empire Strikes Back" .
_:sw2 <release_date> "1980-05-21" .
_:sw2 <revenue> "534000000" .
_:sw2 <running_time> "124" .
_:sw2 <starring> _:luke .
_:sw2 <starring> _:leia .
_:sw2 <starring> _:han .
_:sw2 <director> _:irvin .
_:sw2 <dgraph.type> "Film" .
_:sw3 <name> "Star Wars: Episode VI - Return of the Jedi" .
_:sw3 <release_date> "1983-05-25" .
_:sw3 <revenue> "572000000" .
_:sw3 <running_time> "131" .
_:sw3 <starring> _:luke .
_:sw3 <starring> _:leia .
_:sw3 <starring> _:han .
_:sw3 <director> _:richard .
_:sw3 <dgraph.type> "Film" .
_:st1 <name> "Star Trek: The Motion Picture" .
_:st1 <release_date> "1979-12-07" .
_:st1 <revenue> "139000000" .
_:st1 <running_time> "132" .
_:st1 <dgraph.type> "Film" .
}
}
' | jq
For JSON, you can upload data with this command:
curl "$DGRAPH_HTTP/mutate?commitNow=true" --silent --request POST \
--header "Content-Type: application/json" \
--data $'
{
"set": [
{"uid": "_:luke","name": "Luke Skywalker", "dgraph.type": "Person"},
{"uid": "_:leia","name": "Princess Leia", "dgraph.type": "Person"},
{"uid": "_:han","name": "Han Solo", "dgraph.type": "Person"},
{"uid": "_:lucas","name": "George Lucas", "dgraph.type": "Person"},
{"uid": "_:irvin","name": "Irvin Kernshner", "dgraph.type": "Person"},
{"uid": "_:richard","name": "Richard Marquand", "dgraph.type": "Person"},
{
"uid": "_:sw1",
"name": "Star Wars: Episode IV - A New Hope",
"release_date": "1977-05-25",
"revenue": 775000000,
"running_time": 121,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:lucas"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw2",
"name": "Star Wars: Episode V - The Empire Strikes Back",
"release_date": "1980-05-21",
"revenue": 534000000,
"running_time": 124,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:irvin"}],
"dgraph.type": "Film"
},
{
"uid": "_:sw3",
"name": "Star Wars: Episode VI - Return of the Jedi",
"release_date": "1983-05-25",
"revenue": 572000000,
"running_time": 131,
"starring": [{"uid": "_:luke"},{"uid": "_:leia"},{"uid": "_:han"}],
"director": [{"uid": "_:richard"}],
"dgraph.type": "Film"
},
{
"uid": "_:st1",
"name": "Star Trek: The Motion Picture",
"release_date": "1979-12-07",
"revenue": 139000000,
"running_time": 132,
"dgraph.type": "Film"
}
]
}
' | jq
Uploading Schema
Alter the schema to add indexes on some of the data so queries can use term matching, filtering, and sorting.
Note that this requires running the command from an outbound IP address that is included in the security.whitelist configuration of Dgraph.
curl "$DGRAPH_HTTP/alter" --silent --request POST \
--data $'
name: string @index(term) .
release_date: datetime @index(year) .
revenue: float .
running_time: int .
starring: [uid] .
director: [uid] .
type Person {
name
}
type Film {
name
release_date
revenue
running_time
starring
director
}
' | jq
Query all Starring Edges
You can list all the movies that have the starring edge.
curl "$DGRAPH_HTTP/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'{ me(func: has(starring)) { name } }' \
| jq .data
Query all movies released after 1980
curl "$DGRAPH_HTTP/query" --silent --request POST \
--header "Content-Type: application/dql" \
--data $'
{
me(func: allofterms(name, "Star Wars"), orderasc: release_date)
@filter(ge(release_date, "1980")) {
name
release_date
revenue
running_time
director { name }
starring (orderasc: name) { name }
}
}
' | jq .data
Conclusion
I wrote this article with two main purposes: as a walk-through for newcomers to Dgraph, and as a reference for how to install Dgraph on Kubernetes.
I see a lot of questions online about how to install Dgraph on Kubernetes, so I thought it would be useful to cover that, along with some material on testing connectivity and a small tutorial that runs through the basics of Dgraph: mutation (uploading data), alter (uploading a schema), and running queries.
Other things you may want to try are using network policies or a service mesh to further secure Dgraph, and setting up least-privilege access to an object store or NFS for backups and exports.