This is part 1 of a 2-part series, A Practical Guide to HashiCorp Consul. This part focuses on understanding the problems that Consul solves and how it solves them. The second part focuses on a practical application of Consul in a real-life example and will be published next week. Let’s get started.
How about setting up a discoverable, configurable, and secure service mesh using a single tool?
What if we tell you this tool is platform-agnostic and cloud-ready?
And that it comes as a single binary download?
All this is true. The tool we are talking about is HashiCorp Consul.
Let’s learn about Consul in detail below and see how it solves these complex challenges and makes the life of a distributed system operator easy.
HashiCorp announced Consul in April 2014, and it has since gained good community acceptance.
This guide discusses some of the crucial problems of distributed systems and explores the solutions HashiCorp Consul provides to tackle them.
Let’s run through the topics that we are going to cover in this guide. The topics are written to be self-contained. You can jump directly to a specific topic if you want to.
Brief Background on Monolithic vs. Service-oriented Architectures (SOA)
Even when it is a single application, a monolith typically has multiple sub-components.
One of the examples that HashiCorp’s CTO Armon Dadgar gave in his introductory video for Consul was about delivering a desktop banking application. It has a discrete set of sub-components: for example, authentication (say subsystem A), account management (subsystem B), fund transfer (subsystem C), and foreign exchange (subsystem D).
Now, although these are independent functions (system A authentication vs. system C fund transfer), we deploy them as a single, monolithic app.
Over the last few years, we have seen a trend away from this kind of architecture. There are several reasons for this shift.
The challenge with a monolith is this: suppose there is a bug in one of the subsystems, system A, related to authentication.
We can’t just fix it in system A and update it in production.
We have to update system A and redeploy the whole application, which means redeploying subsystems B, C, and D as well.
This whole redeployment is not ideal. Instead, we would like to do a deployment of individual services.
The same monolithic app can be delivered as a set of individual, discrete services.
So, if there is a bug in one of our services and we fix that bug, we can redeploy that service without coordinating the deployment with other services. What we are essentially talking about is one form of microservices.
This gives a big boost to our development agility. We don’t need to coordinate our development efforts across different development teams or even systems. We have the freedom to develop and deploy independently: one service can ship weekly and another quarterly. This is a big advantage for the development teams.
But, there is no such thing as a free lunch.
The development efficiency we have gained introduces its own set of operational challenges. Let’s look at some of those.
Service discovery in a monolith, its challenges in a distributed system, and Consul's solution
Suppose two services in a single application want to talk to one another. One way is to expose a method, make it public, and allow other services to call it. A monolithic application is a single app, so the services expose public functions and communication simply means function calls across services.
As this is a function call within a process, it happens in memory. Thus, it's fast, and we need not worry about how our data was moved or whether it was secure.
In the distributed world, service A is no longer delivered as part of the same application as service B. So, how does service A find service B if it wants to talk to B?
Service A might not even be on the same machine as service B, so there is a network in play. It is not as fast: the latency is measured on the order of milliseconds, compared to the nanoseconds of a simple function call.
As we already know by now, two services in a distributed system have to discover one another to interact. One of the traditional ways of solving this is by using load balancers.
Load balancers would sit in front of each service with a static IP known to all other services.
This makes it possible to add multiple instances of the same service behind the load balancer, which directs the traffic accordingly. Because the load balancer IP is static and hard-coded within all other services, the services can skip discovery.
The challenge now is to maintain a set of load balancers, one for each individual service. And we can safely assume there was originally a load balancer for the whole application as well. The cost and effort of maintaining these load balancers have increased.
With load balancers in front of the services, each one is a single point of failure. Even when we have multiple instances of a service behind the load balancer, if the load balancer is down, our service is down, no matter how many instances of that service are running.
Load balancers also increase the latency of inter-service communication. If service A wishes to talk to service B, the request from A first has to reach the load balancer of service B and only then reaches B. The response from B has to go through the same drill.
And by nature, load balancers are manually managed in most cases. If we add another instance of a service, it will not be readily available. We will need to register that instance with the load balancer to make it accessible to the world. This means manual effort and time.
Consul’s solution to the service discovery problem in distributed systems is a central service registry.
Consul maintains a central registry which contains an entry for every upstream service. When a service instance starts, it is registered in the central registry. The registry is populated with all the upstream instances of the service.
When service A wants to talk to service B, it discovers and communicates with B by querying the registry about the upstream service instances of B. So, instead of talking to a load balancer, the service can talk directly to the desired destination service instance.
Consul also provides health checks on these service instances. If one of the service instances, or the service itself, is unhealthy or fails its health check, the registry knows about it and avoids returning that instance’s address. The work that a load balancer would do is handled by the registry in this case.
Also, if there are multiple instances of the same service, Consul sends the traffic randomly to different instances, thus leveling the load among them.
Consul has handled our challenges of failure detection and load distribution across multiple instances of services without the need to deploy a centralized load balancer.
The traditional problem of slow and manually managed load balancers is taken care of here. Consul programmatically manages the registry, which gets updated when any new service registers itself and becomes available for receiving traffic.
This helps with scaling the services with ease.
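To make the registry idea concrete, here is a minimal in-memory sketch in Python. It is a toy model of the behavior described above (register, health-check, pick a random healthy instance), not Consul's implementation or API; the class name, service name, and addresses are all made up for illustration.

```python
import random


class ServiceRegistry:
    """Toy in-memory registry; illustrative only, not how Consul is built."""

    def __init__(self):
        # service name -> {instance address: healthy flag}
        self._instances = {}

    def register(self, service, address):
        # A newly registered instance is assumed healthy until a check fails.
        self._instances.setdefault(service, {})[address] = True

    def set_health(self, service, address, healthy):
        # Stand-in for the result of a health check on one instance.
        self._instances[service][address] = healthy

    def healthy_instances(self, service):
        instances = self._instances.get(service, {})
        return sorted(addr for addr, ok in instances.items() if ok)

    def discover(self, service, rng=random):
        # Unhealthy instances are never returned; the pick is random.
        candidates = self.healthy_instances(service)
        if not candidates:
            raise LookupError(f"no healthy instance of {service!r}")
        return rng.choice(candidates)


registry = ServiceRegistry()
registry.register("service-b", "10.0.0.11:8080")  # hypothetical addresses
registry.register("service-b", "10.0.0.12:8080")
registry.set_health("service-b", "10.0.0.12:8080", False)

# Only one healthy instance remains, so this prints 10.0.0.11:8080
print(registry.discover("service-b"))
```

The caller never sees a load balancer address: it asks for a service by name and gets back one healthy instance.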
Configuration Management in a monolith, its challenges in a distributed environment, and Consul's solution
When we look at the configuration for a monolithic application, it tends to be somewhere along the lines of a giant YAML, XML, or JSON file. That configuration is supposed to configure the entire application.
Given a single file, all the subsystems in our monolithic application consume the configuration from the same file, thus creating a consistent view across all our subsystems or services.
If we wish to change the state of the application using a configuration update, it is easily available to all the subsystems. The new configuration is simultaneously consumed by all the components of our application.
Unlike a monolith, distributed services do not have a common view of configuration. The configuration is now distributed, and every individual service needs to be configured separately.
Challenges in Distributed Systems
Configuration has to be spread across different services. Maintaining consistency between the configuration on different services after each update is a challenge.
Moreover, the challenge grows when we expect the configuration to be updated dynamically.
Consul’s solution for configuration management in a distributed environment is a central key-value store.
Consul solves this challenge in a unique way. Instead of spreading the configuration across the distributed services as configuration pieces, it pushes the whole configuration to all the services and configures them dynamically on the distributed system.
Let’s take an example of state change in configuration. The changed state is pushed across all the services in real-time. The configuration is consistently present with all the services.
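As a rough sketch of how a service might read such a value, the Python below builds the URL for Consul's key-value HTTP endpoint (/v1/kv/) and decodes the response, in which values arrive base64-encoded. The agent address (the default local HTTP port) and the key name app/feature-x are assumptions made for illustration; the sample payload is hand-written so the decoding step can run without a Consul agent.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import urlopen

CONSUL_ADDR = "http://localhost:8500"  # assumption: local agent, default HTTP port


def kv_url(key, base=CONSUL_ADDR):
    # KV entries are addressed by path under /v1/kv/
    return f"{base}/v1/kv/{quote(key)}"


def decode_kv_response(body):
    """Turn the JSON body of a KV read into {key: text value}."""
    result = {}
    for entry in json.loads(body):
        raw = entry["Value"]  # base64 text, or null for an empty value
        if raw is None:
            result[entry["Key"]] = None
        else:
            result[entry["Key"]] = base64.b64decode(raw).decode("utf-8")
    return result


def read_config(key):
    # Needs a running agent; not called in this sketch.
    with urlopen(kv_url(key)) as resp:
        return decode_kv_response(resp.read())


# Hand-written sample shaped like a KV read response (hypothetical key).
sample = json.dumps(
    [{"Key": "app/feature-x", "Value": base64.b64encode(b"enabled").decode("ascii")}]
)
print(kv_url("app/feature-x"))
print(decode_kv_response(sample))
```

Every service reading the same key sees the same value, which is what gives the consistent view described above.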
Network segmentation in a monolith, its challenges in distributed systems, and Consul's solutions
When we look at our classic monolithic architecture, the network is typically divided into three different zones.
The first zone in our network is publicly accessible. It carries the traffic coming to our application via the internet and reaching our load balancers.
The second zone carries the traffic from our load balancers to our application. It is mostly an internal network zone without direct public access.
The third zone is the closed network zone, primarily designated for data. This is considered to be an isolated zone.
Only the load balancers zone can reach into the application zone and only the application zone can reach into the data zone. It is a straightforward zoning system, simple to implement and manage.
The pattern changes drastically for distributed services.
There are multiple services within our application network zone itself. Each of these services talks to the others within this network, making for a complicated traffic pattern.
The primary challenge is that the traffic does not follow any sequential flow, unlike the monolithic architecture, where the flow was defined from load balancers to the application and from the application to data.
Depending on the access pattern we want to support, the traffic might come from different endpoints and reach different services.
Having multiple services and the ability to support multiple endpoints allows us to deploy multiple service consumers and providers.
Controlling the flow of traffic and segmenting the network into groups or chunks becomes a bigger issue. It is also vital to have strict rules that partition the network based on who should be allowed to talk to whom.
Consul’s solution to the overall network segmentation challenge in distributed systems is a service graph and mutual TLS.
Consul solves the problem of network segmentation by centrally managing the definition of who can talk to whom. Consul has a dedicated feature for this called Consul Connect.
Consul Connect enrolls the policies of inter-service communication that we desire and implements them as part of the service graph. So, a policy might say service A can talk to service B, but B cannot talk to C, for example.
The bigger benefit of this is that it is not IP-restricted but service-level, which makes it scalable. The policy is enforced on all instances of a service, and there is no hard-bound firewall rule specific to a service’s IP. This makes us independent of the scale of our distributed network.
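The service-level policy idea can be sketched as a small lookup keyed by service names rather than IP addresses. The Python below is a toy model for illustration only, not Consul Connect's API; the class, its methods, and the deny-by-default choice are assumptions.

```python
class ServiceGraph:
    """Toy service-to-service policy table; illustrative only."""

    def __init__(self, default_allow=False):
        # (source service, destination service) -> allowed?
        self._rules = {}
        self._default_allow = default_allow  # assumption: deny unless stated

    def allow(self, source, destination):
        self._rules[(source, destination)] = True

    def deny(self, source, destination):
        self._rules[(source, destination)] = False

    def is_allowed(self, source, destination):
        # The decision depends only on service names, so it holds for
        # every instance of a service regardless of its IP address.
        return self._rules.get((source, destination), self._default_allow)


graph = ServiceGraph()
graph.allow("service-a", "service-b")
graph.deny("service-b", "service-c")

print(graph.is_allowed("service-a", "service-b"))  # True
print(graph.is_allowed("service-b", "service-c"))  # False
```

Adding a tenth or a hundredth instance of service-b changes nothing in the table, which is the scalability point made above.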
Consul Connect also handles service identity using the popular TLS protocol. It distributes the TLS certificate associated with a service.
These certificates help services securely identify each other. TLS also helps with secure communication between the services. This makes for a trusted network implementation.
Consul enforces TLS using an agent-based proxy attached to each service instance. This proxy acts as a sidecar. Using a proxy in this case saves us from making any change to the code of the original service.
This allows for the higher-level benefit of enforcing encryption of data in transit. Moreover, it helps with meeting the compliance requirements of laws around privacy and user identity.
Basic Architecture of Consul
Consul is a distributed and highly available system.
Consul is shipped as a single binary download for all popular platforms. The executable can run as a client as well as a server.
Each node that provides services to Consul runs a Consul agent. Each of these agents talks to one or more Consul servers.
The Consul agent is responsible for health-checking the services on the node as well as the node itself. It is not responsible for service discovery or maintaining key/value data.
Consul servers are where data is stored and replicated.
Consul can run with a single server, but HashiCorp recommends running a set of 3 to 5 servers to avoid failures. As all the data is stored on the Consul server side, a failure with a single server could cause data loss.
In a multi-server cluster, the servers elect a leader among themselves. HashiCorp also recommends having a cluster of servers per datacenter.
During the discovery process, any service in search of another service can query the Consul servers or even the Consul agents. The Consul agents forward the queries to the Consul servers automatically.
If the query is cross-datacenter, it is forwarded by the Consul server to the remote Consul servers. The results from the remote Consul servers are returned to the original Consul server.
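The 3-to-5 recommendation follows from majority arithmetic: a consensus-based server cluster needs a quorum, that is more than half of its servers, to keep operating. The short Python sketch below works through the numbers; the function names are made up for illustration.

```python
def quorum(servers):
    # A majority: more than half of the servers must agree.
    return servers // 2 + 1


def failure_tolerance(servers):
    # How many servers can be lost while a majority still remains.
    return servers - quorum(servers)


for n in (1, 3, 5):
    print(f"{n} server(s): quorum {quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

A single server tolerates no failures, three servers tolerate one, and five tolerate two.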
Getting Started with Consul
This section is dedicated to closely looking at Consul as a tool, with some hands-on experience.
Download and Install
As discussed above, Consul ships as a single binary that can be downloaded from HashiCorp’s website or from the releases section of Consul’s GitHub repo.
The single binary can run as a Consul server or as a Consul client agent.
You can download Consul from the Download Consul page.
We will download Consul on the command line using the link from the download page.
Unzip the downloaded zip file.
Add it to PATH.
Once you unzip the compressed file and put the binary under your PATH, you can start an agent by running consul agent -dev.
This will start the agent in development mode.
While the above command is running, you can check for all the members in Consul’s network by running consul members in another terminal.
Given we only have one node running, it is treated as a server by default. You can designate an agent as a server by supplying -server as a command-line parameter or server as a configuration parameter in Consul’s config.
The output of the above command is based on the gossip protocol and is eventually consistent.
Consul HTTP API
For a strongly consistent view of Consul’s agent network, we can use the HTTP API that Consul provides out of the box.
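As a sketch of such a call, the Python below targets the catalog endpoint /v1/catalog/nodes and pulls the node name and address out of the JSON response. The agent address is assumed to be the default local one, and the sample response (including the node name node-1) is hand-written so the parsing can run without an agent.

```python
import json
from urllib.request import urlopen

CONSUL_ADDR = "http://localhost:8500"  # assumption: local agent, default HTTP port


def catalog_nodes_url(base=CONSUL_ADDR):
    return f"{base}/v1/catalog/nodes"


def node_addresses(body):
    """Map node name -> address from a /v1/catalog/nodes response body."""
    return {node["Node"]: node["Address"] for node in json.loads(body)}


def list_nodes(base=CONSUL_ADDR):
    # Needs a running agent; not called in this sketch.
    with urlopen(catalog_nodes_url(base)) as resp:
        return node_addresses(resp.read())


# Hand-written sample with a hypothetical node name.
sample = json.dumps([{"Node": "node-1", "Address": "127.0.0.1", "Datacenter": "dc1"}])
print(catalog_nodes_url())
print(node_addresses(sample))
```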
Consul DNS Interface
Consul also provides a DNS interface to query nodes. It serves DNS on port 8600 by default. That port is configurable.
Registering a service on Consul can be achieved either by writing a service definition or by sending a request over an appropriate HTTP API.
Consul Service Definition
A service definition is one of the popular ways of registering a service. Let’s take a look at one such service definition.
To host our service definitions we will add a configuration directory, conventionally named consul.d. The ‘.d’ indicates that there is a set of configuration files under this directory, instead of a single config file named consul.
Write the service definition for a fictitious Django web application running on port 80 on localhost.
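A definition of this kind is a small JSON document with the service name, tags, and port. The Python sketch below generates one and can write it into a configuration directory. The service name web matches the log and queries in this guide; the tag django and the file name web.json are assumptions chosen for illustration.

```python
import json
from pathlib import Path


def web_service_definition():
    # "django" is a hypothetical tag; the port comes from the text above.
    return {"service": {"name": "web", "tags": ["django"], "port": 80}}


def write_definition(config_dir, filename="web.json"):
    """Write the definition as JSON into config_dir and return the path."""
    path = Path(config_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(web_service_definition(), indent=2) + "\n")
    return path


print(json.dumps(web_service_definition()))
```

Calling write_definition("./consul.d") would place the file in a local consul.d directory for the agent to load.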
To make our consul agent aware of this service definition, we can supply the configuration directory to it.
The relevant information in the log here is the sync statements related to the “web” service. The Consul agent has accepted our config and synced it across all nodes, in this case one node.
Consul DNS Service Query
We can query the service with DNS, as we did with the node, by looking up the name web.service.consul.
We can also query DNS for service (SRV) records, which give us more information about service specifics such as port and node.
You can also use the tag that we supplied in the service definition to query a specific tag, using a name of the form TAG.web.service.consul, where TAG is the tag name.
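The names used in these DNS lookups follow a fixed pattern, which the Python helpers below spell out. The default domain consul is Consul's standard one; the tag django and the node name node-1 are hypothetical, and the helper names are made up for illustration.

```python
def node_dns_name(node, domain="consul"):
    # Nodes are addressed as <node>.node.<domain>
    return f"{node}.node.{domain}"


def service_dns_name(service, tag=None, domain="consul"):
    # Services are addressed as [<tag>.]<service>.service.<domain>
    prefix = f"{tag}." if tag else ""
    return f"{prefix}{service}.service.{domain}"


print(node_dns_name("node-1"))                   # node-1.node.consul
print(service_dns_name("web"))                   # web.service.consul
print(service_dns_name("web", tag="django"))     # django.web.service.consul
```

These names only resolve when the lookup is sent to the agent's DNS port (8600 by default) rather than to the system resolver.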
Consul Service Catalog Over HTTP API
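A service can similarly be queried using the HTTP API, through the catalog endpoint /v1/catalog/service/web.
We can also filter the services based on health checks over the HTTP API, for example by asking the health endpoint /v1/health/service/web for passing instances only.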
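As a sketch of the health-filtered query, the Python below builds the health endpoint URL with the passing filter and turns the response into address:port pairs. The agent address is assumed to be the default local one, and the sample response is hand-written with a hypothetical node so the parsing can run without an agent.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CONSUL_ADDR = "http://localhost:8500"  # assumption: local agent, default HTTP port


def health_service_url(service, passing_only=True, base=CONSUL_ADDR):
    url = f"{base}/v1/health/service/{quote(service)}"
    # The "passing" query parameter drops instances with failing checks.
    return url + "?passing" if passing_only else url


def endpoints(body):
    """Return 'address:port' for each entry in a health response body."""
    result = []
    for entry in json.loads(body):
        # An empty service address means the service uses the node's address.
        address = entry["Service"].get("Address") or entry["Node"]["Address"]
        port = entry["Service"]["Port"]
        result.append(f"{address}:{port}")
    return result


def healthy_endpoints(service):
    # Needs a running agent; not called in this sketch.
    with urlopen(health_service_url(service)) as resp:
        return endpoints(resp.read())


# Hand-written sample with a hypothetical node.
sample = json.dumps([
    {"Node": {"Node": "node-1", "Address": "127.0.0.1"},
     "Service": {"Service": "web", "Address": "", "Port": 80}}
])
print(health_service_url("web"))
print(endpoints(sample))
```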
Update Consul Service Definition
Updating the service definition on a running Consul agent is simple.
There are three ways to achieve this. You can send a SIGHUP signal to the process, run the Consul reload command (which internally sends SIGHUP on the node), or call the HTTP API dedicated to service definition updates, which internally reloads the agent configuration.
Send SIGHUP to the agent process (PID 21289 in this example) with kill -HUP 21289.
Or reload Consul with consul reload, which reports:
Configuration reload triggered
You should also see the reload in your Consul log.
Consul Web UI
Consul provides a beautiful web user interface out of the box. You can access it on port 8500.
In this case, it is at http://localhost:8500. Let’s look at some of the screens.
The home page of the Consul UI is the services view, with all the relevant information related to the Consul agent and the web service check.
Going into further details on a given service, we get a service dashboard with all the nodes and their health for that service.
On each individual node, we can look at the health-checks, services, and sessions.
Overall, Consul Web UI is really impressive and a great companion for the command line tools that Consul provides.
How is Consul Different From Zookeeper, doozerd, and etcd?
Consul has first-class support for service discovery, health checks, key-value storage, and multiple datacenters.
All these tools, including Consul, use server nodes that require a quorum of nodes to operate and are strongly consistent.
More or less, they all have similar semantics for key/value store management.
These semantics are attractive for building service discovery systems. Consul has out-of-the-box support for service discovery, which the other systems lack.
A service discovery system also requires a way to perform health checks, as it is important to check a service’s health before allowing others to discover it. Some systems use heartbeats with periodic updates and a TTL. The work for these health checks grows with scale and requires fixed infrastructure. The failure detection window is at least as long as the TTL.
Unlike Zookeeper, Consul has client agents sitting on each node in the cluster, talking to each other in gossip pool. This allows the clients to be thin, gives better health-checking ability, reduces client-side complexity, and solves debugging challenges.
Consul’s website gives a good commentary on comparisons between Consul and other tools.
Open Source Tools Around HashiCorp Consul
HashiCorp and the community have built several tools around Consul.
These Consul tools are created and managed by the dedicated engineers at HashiCorp:
Consul Template (3.3k stars) - Generic template rendering and notifications with Consul. Template rendering, notifier, and supervisor for @hashicorp Consul and Vault data. It provides a convenient way to populate values from Consul into the file system using the consul-template daemon.
Envconsul (1.2k stars) - Read and set environmental variables for processes from Consul. Envconsul provides a convenient way to launch a subprocess with environment variables populated from HashiCorp Consul and Vault.
Consul Replicate (360 stars) - Consul cross-DC KV replication daemon. This project provides a convenient way to replicate values from one Consul datacenter to another using the consul-replicate daemon.
Consul Migrate - Data migration tool to handle Consul upgrades to 0.5.1+.
The community around Consul has also built several tools to help with registering services and managing service configuration. I would like to mention some of the popular and well-maintained ones:
Confd (5.9k stars) - Manage local application configuration files using templates and data from etcd or consul.
Fabio (5.4k stars) - Fabio is a fast, modern, zero-conf load balancing HTTP(S) and TCP router for deploying applications managed by consul. Register your services in consul, provide a health check and fabio will start routing traffic to them. No configuration required.
Registrator (3.9k stars) - Service registry bridge for Docker with pluggable adapters. Registrator automatically registers and deregisters services for any Docker container by inspecting containers as they come online.
Hashi-UI (871 stars) - A modern user interface for HashiCorp Consul & Nomad.
Git2consul (594 stars) - Mirrors the contents of a git repository into Consul KVs. git2consul takes one or many git repositories and mirrors them into Consul KVs. The goal is for organizations of any size to use git as the backing store, audit trail, and access control mechanism for configuration changes and Consul as the delivery mechanism.
Spring-cloud-consul (503 stars) - This project provides Consul integrations for Spring Boot apps through autoconfiguration and binding to the Spring Environment and other Spring programming model idioms. With a few simple annotations, you can quickly enable and configure the common patterns inside your application and build large distributed systems with Consul based components.
Crypt (453 stars) - Store and retrieve encrypted configs from etcd or consul.
Mesos-Consul (344 stars) - Mesos to Consul bridge for service discovery. Mesos-consul automatically registers/deregisters services run as Mesos tasks.
Consul-cli (228 stars) - Command line interface to Consul HTTP API.
Distributed systems are not easy to build and set up. Maintaining them and keeping them running is another piece of work altogether. HashiCorp Consul makes the life of engineers facing such challenges easier.
As we went through different aspects of Consul, we learnt how straightforward it becomes to develop and deploy applications with a distributed or microservices architecture.
Ease of use, excellent documentation, robust production-ready code, and community backing make adopting and introducing HashiCorp Consul in our technology stack fairly easy.
We hope it was an informative ride on the journey of Consul. Our journey has not yet ended; this was just the first half. We will meet you again with the second part of this article, which walks through a practical example close to real-life applications.
Let us know what more you would like to hear from us. If you have any questions around the topic, we will be more than happy to answer them.
About the Author
Pranav is a Technical Lead at Velotio. He is a full-stack developer with extensive product development experience and specializes in deploying high-performance, scalable applications. He primarily leads the backend development on Python/Django and Ruby/Rails and frontend development on ReactJS and jQuery. He has a keen interest in hunting new technologies, analytics, and solving puzzles.