Survey of DevOps Tools

Continued Learning Adventures with SRE & DevOps Tools

Joaquín Menchaca (智裕)
6 min read · Apr 28, 2018

I embarked on a small quest to explore the current state of tools that occupy slots for desired state change configuration tools and IaC (infrastructure as code). My use case is fairly simple:

  • implement a small Elasticsearch cluster

This is what I have learned so far…

Infrastructure As Code Projects

IaC Projects (grey bubbles = not yet explored)

I wanted to create some systems using AWS Elastic Compute Cloud (EC2) and Google Compute Engine (GCE) for production or staging environments, and Vagrant and VirtualBox for local development environments.

This is what I found in these areas.

Bubble-Gum and Scripts

Both Google Cloud and AWS provide command line tools that abstract the complexity of provisioning cloud resources, like instances, gateways, networks, load balancers, databases, and so on. These tools can get you started, and are great for doing ad-hoc miscellaneous chores.

I started to create my own definition files to describe my infrastructure using JSON, which is a fun exercise in your favorite scripting language, but this can get complex quickly.
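As a rough sketch of that approach, the snippet below reads a small JSON definition and turns each entry into an `aws ec2 run-instances` argument list. The schema (`name`, `image_id`, `instance_type`, `count`) and the AMI ID are made up for illustration and are not the definition files I actually used; nothing is executed, the commands are only printed.

```python
import json

# Hypothetical definition file contents; the schema is an assumption for this sketch.
DEFINITION = """
{
  "instances": [
    {"name": "es-node", "image_id": "ami-00000000", "instance_type": "t2.medium", "count": 3}
  ]
}
"""


def build_commands(definition_text):
    """Translate each instance entry into an `aws ec2 run-instances` argument list."""
    definition = json.loads(definition_text)
    commands = []
    for item in definition["instances"]:
        commands.append([
            "aws", "ec2", "run-instances",
            "--image-id", item["image_id"],
            "--instance-type", item["instance_type"],
            "--count", str(item["count"]),
        ])
    return commands


if __name__ == "__main__":
    for command in build_commands(DEFINITION):
        # Print instead of running; subprocess.run(command, check=True) would execute it.
        print(" ".join(command))
```

Even this tiny version hints at why the approach gets complex: networks, security groups, and dependencies between resources all need their own handling.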

If you go down the path of SRE patterns, then more robust libraries are desirable for tighter integration, such as boto for AWS or api-client-library for GCP. And for DevOps collaboration patterns, you can use an intuitive DSL with Ansible or Terraform.

Chef

Chef has a knife-google plug-in that allows you to easily spin up systems on GCE, and there’s an equivalent for AWS called knife-ec2. These are great for ad-hoc development, but I couldn’t recommend using them for production.

Google has some Chef cookbooks, such as google-gcompute, that expose Google Cloud components as configurable resources. Currently, the underlying authorization scheme doesn’t support the current GCloud SDK. Hopefully this will be fixed soon. In the meantime, the cookbook forces you to create a service account as part of its operation.

Also, because this is a third-party cookbook, there are added levels of complexity: ① creating a full cookbook, ② using chef-zero server (local-mode), ③ configuration in config.rb or knife.rb, and ④ managing cookbook dependencies with berkshelf (or policyfile.rb).

One person recommended Chef Provisioning (previously known by the cooler name Chef Metal), which uses Chef to configure cloud resources. At some point, I may explore this, but there are other tools in high market demand.

Ansible

Ansible has been the most delightful tool to use, at least with AWS. Ansible can read the state directly from the source of truth, AWS, and use that information to dynamically create cloud resources. For example, I could read the number of zones in the current region and then distribute nodes evenly across the availability zones to increase availability. Ansible lets you do this programmatically, where other tools only allow you to do it statically (not DRY).
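The zone-spreading idea can be sketched in a few lines. This is plain Python rather than an Ansible playbook, and the node names and zone list are placeholders; in a real run the zones would be read from AWS instead of hard-coded.

```python
from collections import Counter


def spread_across_zones(node_count, zones):
    """Assign nodes to zones round-robin so zone sizes differ by at most one."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    return {f"node-{i + 1}": zones[i % len(zones)] for i in range(node_count)}


# Placeholder zone list; in practice it could come from the cloud API
# (for example, the boto3 EC2 client's describe_availability_zones call).
zones = ["us-west-2a", "us-west-2b", "us-west-2c"]
placement = spread_across_zones(5, zones)
per_zone = Counter(placement.values())

if __name__ == "__main__":
    for node, zone in placement.items():
        print(node, zone)
```

Because the zone list is an input rather than a constant baked into the definition, the same logic works unchanged in a region with two zones or six.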

I have yet to explore Ansible with Google. In combing through the documentation, one thing I found absent was modules that let you fetch state from Google Cloud. Hopefully these will be added soon. There is a dynamic inventory tool, gce.py, so maybe some state from GCP could be accessed through it via Ansible’s group variables.

Terraform

Terraform has an unbelievably intuitive DSL for creating both AWS and Google Cloud resources. Google Cloud was easy to get started with, as you can create instances in the default VPC and subnet without explicitly specifying them.

This is not the case with AWS, where you have to explicitly specify the vpc_id or subnet_id to configure systems. Fortunately, Terraform recently added an import feature that lets you import the current state of specified resources and save it to its local state file. With this, you can build on existing resources in your infrastructure that were created outside of Terraform.

One awesome tool I came across in my research is terraforming, from Daisuke Fujita. It can dump AWS resources to Terraform’s HCL language, which significantly reduces the time it takes to come up to speed with Terraform on AWS.

A colleague of mine recommended a supplementary tool called terragrunt that adds DRY (don’t repeat yourself). Terraform is a very static tool, and it is difficult or impossible to create resources dynamically, as you can with Ansible or a scripting language, so terragrunt adds some of this capability.

Desired State Change Configuration Projects

Desired State Change Configuration and Deploy Tools (grey = not explored)

After creating the systems with one tool, I wanted to then hand off to another tool to put stuff on the systems. These tools are divided into two main categories: tools that use an agent to pull down a configuration from a centralized server(s), and tools that push a configuration to a remote system.

Agent Based DS Tools

The three tools in this category are CFEngine, Puppet, and Chef. These platforms are centralized change management solutions. You install an agent on your instances, which then applies the configuration specified in a script (CFEngine policy, Puppet manifest, Chef recipe) and converges the system to a desired state at a regular interval. Should your system fall out of the desired state, the agent will converge it back. This is especially useful for applying security policies across the whole infrastructure.

When you update your configuration, the changes are uploaded to the central servers, and the agents then pull down those changes and apply them.
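The convergence behavior can be illustrated with a toy model. This is not how any of these agents is implemented; the state is just a dictionary and the setting names are invented, but it shows the key property: applying the same desired state twice changes nothing the second time.

```python
def converge(desired, actual):
    """Make `actual` match `desired` in place; return the keys that were changed."""
    changes = []
    for key, value in desired.items():
        if actual.get(key) != value:
            actual[key] = value
            changes.append(key)
    return changes


# Hypothetical settings; a real agent would inspect packages, files, services, and so on.
desired = {"ntp": "installed", "sshd_root_login": "no"}
actual = {"ntp": "installed", "sshd_root_login": "yes"}

first_run = converge(desired, actual)   # repairs the drifted setting
second_run = converge(desired, actual)  # nothing left to do

if __name__ == "__main__":
    print(first_run, second_run)
```

An agent effectively runs this kind of check on a timer, which is why drift introduced by hand gets reverted at the next interval.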

Those who do not want a robust centralized solution can use alternatives within each ecosystem. With Puppet, instead of pulling the latest changes from centralized puppet masters, you can use puppet apply to apply changes from a local copy of your Puppet manifests. With Chef, you can use Chef Solo without the need for a robust Chef Server, or you can use a small in-memory Chef Server called Chef-Zero (now packaged as chef-client local-mode). For local development environments, tools like knife-solo and knife-zero are useful for developing and testing Chef recipes without the need for a Chef Server.

Push Based DS Tools

These tools have a wide variety of capabilities and solve a range of problems that are not possible with the agent-based tools above. Ansible and Salt Stack are robust change configuration tools that use remote execution to apply changes to a system, typically through SSH (Salt does this with salt-ssh; its default mode uses a minion agent).

Before the arrival of these change configuration tools, there were deploy tools that worked in conjunction with Puppet or Chef, such as Mina, Capistrano, and Fabric. These tools were used to deploy applications and orchestrate changes, such as updating a load balancer to point to updated services. MCollective was another early tool that offers a robust distributed orchestration platform, not unlike Salt Stack, where you can run distributed parallel execution. Puppet recently added the Bolt remote task execution tool. Pure deploy tools have fallen out of popularity, given that both Ansible and Salt Stack can do this as well as change configuration tasks.

Agentless push tools are generally popular where you need deploy-time change configuration for immutable production using Docker. In this scenario, you would never change a running container, but rather dispose of the container and run it again. When you launch the container, you may apply limited change configuration to a template for a configuration file, or configure environment variables used by the container. Thus robust runtime change configuration management is no longer needed.
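A minimal sketch of that launch-time step, assuming a container entrypoint written in Python: it fills a configuration template from environment variables and falls back to defaults. The variable names `CLUSTER_NAME` and `NETWORK_HOST` and the default values are invented for this example; `cluster.name` and `network.host` are Elasticsearch settings.

```python
import os
from string import Template

# Template for a small Elasticsearch-style config file.
CONFIG_TEMPLATE = Template(
    "cluster.name: $CLUSTER_NAME\n"
    "network.host: $NETWORK_HOST\n"
)

# Hypothetical defaults used when a variable is not set in the environment.
DEFAULTS = {"CLUSTER_NAME": "dev-cluster", "NETWORK_HOST": "0.0.0.0"}


def render_config(env):
    """Render the config text, letting values in `env` override the defaults."""
    values = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return CONFIG_TEMPLATE.substitute(values)


if __name__ == "__main__":
    # At container start, the real environment supplies the overrides.
    print(render_config(os.environ), end="")
```

The container image stays identical across environments; only the variables passed at launch differ.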

Another reason these tools have traditionally been popular is that you may need some level of coordination or orchestration. Many times, you may need to configure a system based on the state of another system. A change tool that uses an agent can only offer eventual convergence, through Puppet’s exported resources or Chef’s search facility, for example. For configuring a cluster, where several systems are composed into a single service, this eventual convergence will not work.

In these scenarios, you really need an orchestration tool to configure a state that spans multiple systems simultaneously or in a particular sequence.
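To make the "particular sequence" part concrete, here is a small sketch that orders steps by their dependencies and stops at the first failure. The step names are hypothetical, and this uses Python's standard `graphlib` module (Python 3.9+) rather than any of the tools discussed here.

```python
from graphlib import TopologicalSorter

# Hypothetical steps for bringing up a small cluster; each key lists the
# steps that must finish before it can start.
DEPENDENCIES = {
    "start_master_nodes": set(),
    "start_data_nodes": {"start_master_nodes"},
    "update_load_balancer": {"start_data_nodes"},
}


def plan(dependencies):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run(dependencies, execute):
    """Run each step in order, stopping at the first step whose execute() is falsy."""
    completed = []
    for step in plan(dependencies):
        if not execute(step):
            break
        completed.append(step)
    return completed


if __name__ == "__main__":
    # Pretend every step succeeds and just print the order.
    run(DEPENDENCIES, lambda step: print("running", step) is None)
```

An agent converging on its own schedule has no equivalent of the "stop here" branch: each node would keep applying its own configuration regardless of whether the step it depends on has succeeded elsewhere.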

Ansible is well known for its simplicity and has gained considerable ground in this area, for deploy-time configuration and orchestration, in addition to IaC cloud resource provisioning.

Joaquín Menchaca (智裕)

Linux NinjaPants Automation Engineering Mutant — exploring DevOps, o11y, k8s, progressive deployment (ci/cd), cloud native infra, infra as code