In the first part of this series, we took an in-depth look at the problems and tasks that a distributed application architecture can solve. We defined which tools can be used to solve these problems and noted the importance of implementing discovery at the design stage of the project. We also chose Consul as the base for our discovery service implementation.
In the second part, we will review how Consul works with the DNS protocol, describe the main HTTP API requests, clarify which types of health checks can be used, and, of course, find out why K/V storage matters. Most importantly, we will try some of these features in practice.
Consul can respond to requests over the DNS protocol, so any DNS client can be used to query it. The DNS interface is available on the local host, port 8600. Besides querying Consul directly, you can specify it as a resolver in the system and use it for name resolution transparently: it proxies all external requests to the "full" upstream DNS server mentioned earlier and resolves requests in the private .consul zone itself.
If several services exist in the catalog with the same name but different IP addresses, Consul randomly shuffles the addresses in the response, implementing primitive DNS load balancing.
You can either resolve a domain name directly within the cluster or perform a lookup; both service lookups and node lookups are supported.
The domain name format for DNS requests within a Consul cluster is strictly defined and not subject to change.
A typical DNS request of this kind returns a cluster node's IP address by its name (the node name is set at agent start with the -node parameter).
Let’s review the node name format for a DNS request: <node>.node[.<datacenter>].<domain>
- <node> – obligatory part, the node's name;
- .node – indicates that we are making a node lookup;
- [.datacenter] – optional part, the datacenter name (out of the box, Consul can provide discovery for several datacenters within one cluster; by default, the name "dc1" is used. If the datacenter name is omitted, the current datacenter is used, i.e. the one containing the agent that received the request);
- <domain> – obligatory part, Consul's private top-level domain, .consul by default.
So the domain name for a node (for example, one named nodeservice) will look like this: nodeservice.node.consul
As we can see, the datacenter name is dropped here, but the name can also be built with it: nodeservice.node.dc1.consul
Several nodes with the same name within one datacenter are not allowed.
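The node name format above can be sketched as a small helper; the node name nodeservice and the datacenter dc1 are just the examples used above, not real cluster entities:

```python
# A minimal sketch of building Consul node lookup names according to the
# format <node>.node[.<datacenter>].<domain> described above.

def consul_node_name(node, datacenter=None, domain="consul"):
    """Build the DNS name for a node lookup."""
    parts = [node, "node"]
    if datacenter:          # optional: the current datacenter is used if omitted
        parts.append(datacenter)
    parts.append(domain)
    return ".".join(parts)

print(consul_node_name("nodeservice"))           # nodeservice.node.consul
print(consul_node_name("nodeservice", "dc1"))    # nodeservice.node.dc1.consul
```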
A service lookup by name is processed on every cluster node. A service lookup also provides more than plain domain name resolution: besides requesting a service's IP address (an A record), we can request an SRV record and find out the ports on which the service is running.
Here is a typical lookup for nodes running a service named rls:
root@511cdc9dd19b:~# dig @127.0.0.1 -p 8600 rls.service.consul.

; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> @127.0.0.1 -p 8600 rls.service.consul.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26143
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;rls.service.consul.		IN	A

;; ANSWER SECTION:
rls.service.consul.	0	IN	A	172.17.0.2
rls.service.consul.	0	IN	A	172.17.0.3

;; Query time: 4 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Feb 18 07:23:00 UTC 2016
;; MSG SIZE  rcvd: 104
From this response, we can see that two nodes in the cluster are running the service (rls) and that Consul's DNS interface returns the IP addresses of all of them. If we repeat the request several times, we will see the records change places, meaning the first position is not reserved for the first server found. This is the simple DNS load balancing mentioned above.
If we request an SRV record, the response will be the following:
root@511cdc9dd19b:/# dig @127.0.0.1 -p 8600 rls.service.consul. SRV

; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> @127.0.0.1 -p 8600 rls.service.consul. SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8371
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;rls.service.consul.		IN	SRV

;; ANSWER SECTION:
rls.service.consul.	0	IN	SRV	1 1 80 agent-two.node.dc1.consul.
rls.service.consul.	0	IN	SRV	1 1 80 agent-one.node.dc1.consul.

;; ADDITIONAL SECTION:
agent-two.node.dc1.consul.	0	IN	A	172.17.0.3
agent-one.node.dc1.consul.	0	IN	A	172.17.0.2

;; Query time: 5 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Feb 18 07:39:22 UTC 2016
;; MSG SIZE  rcvd: 244
The ANSWER SECTION contains the domain names of nodes in Consul's format (pay attention: these are nodes, not services!) together with the ports on which the requested service is running. The nodes' IP addresses are listed in the ADDITIONAL SECTION of the response.
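A client can combine the two sections into connectable endpoints. A minimal sketch, with the SRV answer and ADDITIONAL section from the dig output above hard-coded instead of coming from a real resolver:

```python
# Parsed SRV records for rls.service.consul, taken from the dig output
# above: (priority, weight, port, target).
srv_records = [
    (1, 1, 80, "agent-two.node.dc1.consul."),
    (1, 1, 80, "agent-one.node.dc1.consul."),
]

# ADDITIONAL SECTION: target node name -> its A record.
additional = {
    "agent-two.node.dc1.consul.": "172.17.0.3",
    "agent-one.node.dc1.consul.": "172.17.0.2",
}

def endpoints(records, extra):
    """Combine SRV targets with their A records into (ip, port) pairs."""
    return [(extra[target], port) for _, _, port, target in records]

print(endpoints(srv_records, additional))
# [('172.17.0.3', 80), ('172.17.0.2', 80)]
```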
The service name format for a DNS request looks like this: [<tag>.]<service>.service[.<datacenter>].<domain>
- [tag.] – optional part, used to filter services by tag. If there are services with the same name but different tags, adding the tag name filters the response;
- <service> – obligatory part, the service name;
- .service – indicates that we are making a service lookup;
- [.datacenter] – optional part, the datacenter name;
- <domain> – obligatory part, Consul's private top-level domain.
So a service named nginx with the tag "web" can be represented by the domain name web.nginx.service.consul
SRV requests for service lookups according to RFC 2782
Besides the "usual" way of building a domain name, we can build it according to the stricter rules of RFC 2782 when requesting an SRV record. The name format looks like this: _<service>._<tag>.service[.<datacenter>].<domain>
The service name and the tag are prefixed with an underscore (_). (In the original RFC, the protocol name stands in place of the tag.) This prevents collisions when making a request. Using the RFC 2782 format, a service named nginx with the tag "web" will look like this: _nginx._web.service.consul
The response will be the same as in the case of a "simple" request:
root@511cdc9dd19b:/# dig @127.0.0.1 -p 8600 _rls._rails.service.consul. SRV

; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> @127.0.0.1 -p 8600 _rls._rails.service.consul. SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26932
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;_rls._rails.service.consul.	IN	SRV

;; ANSWER SECTION:
_rls._rails.service.consul.	0	IN	SRV	1 1 80 agent-one.node.dc1.consul.
_rls._rails.service.consul.	0	IN	SRV	1 1 80 agent-two.node.dc1.consul.

;; ADDITIONAL SECTION:
agent-one.node.dc1.consul.	0	IN	A	172.17.0.2
agent-two.node.dc1.consul.	0	IN	A	172.17.0.3

;; Query time: 6 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Feb 18 07:52:59 UTC 2016
;; MSG SIZE  rcvd: 268
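Both naming schemes can be summarized in one small helper; a sketch, where nginx/web and rls/rails are only the example names used above:

```python
# Sketch of building service lookup names in both forms described above:
# the "usual" [tag.]<service>.service[.<dc>].<domain> and the RFC 2782
# form _<service>._<tag>.service[.<dc>].<domain>.

def service_name(service, tag=None, datacenter=None,
                 domain="consul", rfc2782=False):
    """Build the DNS name for a service lookup."""
    if rfc2782:
        parts = ["_" + service, "_" + tag, "service"]
    else:
        parts = ([tag] if tag else []) + [service, "service"]
    if datacenter:          # optional: current datacenter is used if omitted
        parts.append(datacenter)
    parts.append(domain)
    return ".".join(parts)

print(service_name("nginx", "web"))                 # web.nginx.service.consul
print(service_name("nginx", "web", rfc2782=True))   # _nginx._web.service.consul
print(service_name("rls", "rails", rfc2782=True))   # _rls._rails.service.consul
```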
By default, all domain names in Consul have TTL = 0, meaning they are not cached at all. This is important to keep in mind.
The HTTP REST API is the main tool for managing a Consul cluster, and it offers a wide range of capabilities. The API exposes 10 endpoints, each providing access to the configuration of a particular functional aspect of Consul. The Consul documentation contains a detailed description of the endpoints.
Here is a brief overview of them, to give you an idea of what the API can do:
- acl – access control
- agent – Consul agent management
- catalog – cluster nodes and services management
- coordinate – network coordinates
- event – custom events
- health – availability checks
- kv – key/value storage
- query – prepared queries
- session – sessions
- status – system status
As the name suggests, acl manages access control to Consul's services. We can regulate access to reading and changing data about services, nodes, and custom events, and also control access to the K/V storage.
Management of the local Consul agent. All operations on this endpoint affect the local agent's data. You can get information about the agent's current state and its role in the cluster, as well as manage local services. Changes made to local services are synchronized with all cluster nodes.
Management of Consul's global registry. Work with nodes and services is concentrated here: within this endpoint, you can register and deregister them. For routine service registration, though, working through the agent endpoint is usually preferable: it is simpler, clearer, and the local agent keeps the catalog in sync via anti-entropy.
Consul uses network tomography to compute network coordinates. These coordinates are used to build efficient routes within the cluster and enable many useful functions, such as looking up the nearest node providing a given service or switching to the nearest datacenter in case of a crash. The API functions in this section are read-only and return the current state of the network coordinates.
Custom event processing. Custom events are used to trigger actions across the cluster: automatic deploys, service restarts, running specific scripts, or other orchestration tasks.
Allows checking the current state of nodes and services. This endpoint is read-only: it returns the current state of nodes and services and lists the checks performed.
This endpoint is used for managing data in the distributed key/value storage provided by Consul. It has only one method, which looks like this: /v1/kv/<key>
The behavior depends on the HTTP request method: GET returns the value by key, PUT saves a new value or overwrites the old one, and DELETE deletes the record.
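When reading a key, note that a GET on /v1/kv/<key> returns a JSON array in which each Value field is base64-encoded. A small decoding sketch, where the key name and payload are made up for illustration:

```python
import base64
import json

# A response body as GET /v1/kv/app/config might return it; the Key and
# Value here are illustrative (real entries also carry index fields).
body = '[{"Key": "app/config", "Flags": 0, "Value": "aGVsbG8="}]'

def kv_values(response_body):
    """Decode the base64-encoded Value field of each returned entry."""
    return {entry["Key"]: base64.b64decode(entry["Value"]).decode("utf-8")
            for entry in json.loads(response_body)}

print(kv_values(body))  # {'app/config': 'hello'}
```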
Management of prepared queries. These queries allow complex manipulations of the Consul configuration and can be saved and executed later. A saved query gets a unique ID, so it can be run at any time without being prepared again.
The session mechanism in Consul is used to build distributed locks. Sessions form a binding layer between nodes, health checks, and the K/V storage. Each session has a name that can be saved in the storage; the name is used to take locks that serialize operations on nodes and services running concurrently. How sessions work is described in the Consul documentation.
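A rough sketch of the HTTP calls behind such a lock, assuming the standard session-create endpoint and the acquire/release parameters of the K/V endpoint; no requests are actually sent here, and the key and session ID are made up:

```python
# Sketch of the request sequence for a distributed lock built on sessions
# and the K/V storage. A real client would send these over HTTP to the
# local agent and check each response.

def lock_requests(key, session_id):
    """Return the (method, path) pairs used to take and release a lock."""
    return [
        ("PUT", "/v1/session/create"),                  # create a session; returns its ID
        ("PUT", f"/v1/kv/{key}?acquire={session_id}"),  # body "true" means the lock is ours
        ("PUT", f"/v1/kv/{key}?release={session_id}"),  # give the lock back
    ]

for method, path in lock_requests("locks/db-migrate", "b1e4..."):  # truncated, made-up session ID
    print(method, path)
```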
This endpoint is used for receiving information about the cluster status: you can find out the current leader and get information about the cluster members.
Earlier, we talked about even load distribution with the help of DNS; now let's look at the mechanism for checking the health of nodes and services. A health check is a periodic operation whose results determine the condition of the tested system. In effect, this is automatic monitoring that keeps the cluster in working order: it removes failed nodes and services and returns them to work once they recover. Consul supports several types of checks:
- Script check – runs a given script on a given node at a predetermined interval. Depending on the exit code (any non-zero code means the check failed), the node or service is turned on or off.
- HTTP check – tries to fetch the stated URL; the response code decides the state of the tested object (any 2xx is OK, 429 Too Many Requests generates a warning, other codes report an error).
- TCP check – tries to establish a TCP connection to a given address and port at a predetermined interval. A connection failure means the check fails.
- TTL check – a passive check that has to be refreshed periodically via the HTTP API: the service must report its state at regular intervals, and if no report arrives within the given interval, the check is considered failed and the service is marked as non-working.
- Docker check – for services running in Docker containers. Using the Docker Exec API, Consul runs a script located inside the container; the result depends on the exit code (any non-zero code means the check failed).
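To make this concrete, here is roughly what an HTTP check attached to a service definition might look like; a sketch serialized to the JSON an agent configuration file expects, where the service name, URL, and intervals are assumptions:

```python
import json

# A hypothetical service definition with an HTTP health check, as it
# could appear in a Consul agent configuration file.
definition = {
    "service": {
        "name": "rls",
        "port": 80,
        "check": {
            "http": "http://localhost:80/health",  # URL the agent polls
            "interval": "10s",                     # how often to poll
            "timeout": "1s",                       # when to give up on a poll
        },
    }
}

print(json.dumps(definition, indent=2))
```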
The storage provided by Consul is a distributed key/value database and can be used to store any data accessible to any cluster member (subject to ACL rules, of course). Services can store data in it that other cluster members need: values of configuration options, some calculation results, or, as stated above, distributed locks built on the session mechanism. Using the K/V storage makes the cluster more effective and reduces the amount of manual configuration: services can correct their state according to the information the cluster provides through the storage. Note: don't store data belonging to the business logic of your services here. The storage provided by Consul is meant for keeping and distributing meta-information about the condition of cluster members, not the data those members process.
It is hard to overestimate the role of a discovery service in building a distributed architecture for large projects, and Consul is a perfect fit for that role. The product is constantly developing and doesn't stand still; a lot of useful functionality needed for easy support of a system with many components has already been implemented. Moreover, Consul is written in Go and distributed as a single executable file, which makes updating and maintenance very convenient.