Service Discovery: More Than It Seems (Part 2)

With our services ready, it's time to make our clients fault-tolerant and more customizable using tools that work with Mesos Marathon and Spring Cloud.

In the first episode, we successfully fetched data from Mesos Marathon into Spring Cloud beans directly. At the same time, we found our first problems, one of which we will analyze in the current part of the story.

Let's remember our connection configuration to Marathon:

      scheme: http #url scheme 
      host: marathon #marathon host
      port: 8080 #marathon port 

What problems do we see here? First, we do not have any authorization while connecting, which is strange for production usage. Second, we can specify only one host and port. In principle, we could hide several masters behind one load balancer or DNS name, but that creates an additional point of failure, which we want to avoid.

Password — Whole Head!

There are two authorization mechanisms available: Basic and Token. Basic authorization is familiar to every developer: take a login and password, join them with a colon (:), encode the result in Base64, and add the HTTP header Authorization with the value Basic <Base64>. That's all.
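
The Basic recipe can be sketched with nothing but the JDK. This is only an illustration of the header format, not the interceptor the Marathon client actually uses; the class and method names are hypothetical, and UTF-8 is assumed as the charset for the credentials.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthHeader {

    // Joins login and password with ':' and Base64-encodes the pair.
    // Assumption: credentials are encoded as UTF-8 bytes.
    static String basicAuth(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Example credentials only; real ones should never be hard-coded.
        System.out.println("Authorization: " + basicAuth("marathon", "mesos"));
    }
}
```

Base64 is an encoding, not encryption, so this header should only travel over a connection that is protected by other means.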

Token authorization is slightly more involved. It is unavailable in open source implementations, but it is a reasonable choice for those who use DC/OS. All it takes is a different authorization header:

Authorization: token=<auth_token> 

Thus we can add several necessary properties to our configuration:

      token: <dcos_acs_token> 
      username: marathon 
      password: mesos 

From here, we can follow simple priorities. If a token is specified, we use it. Otherwise, if a username and password are given, we use Basic authorization. If neither is present, we create the client without authorization.

Feign.Builder builder = Feign.builder()
    .encoder(new GsonEncoder(ModelUtils.GSON))
    .decoder(new GsonDecoder(ModelUtils.GSON))
    .errorDecoder(new MarathonErrorDecoder());

if (!StringUtils.isEmpty(token)) {
    builder.requestInterceptor(new TokenAuthRequestInterceptor(token));
} else if (!StringUtils.isEmpty(username)) {
    builder.requestInterceptor(new BasicAuthRequestInterceptor(username, password));
}

builder.requestInterceptor(new MarathonHeadersInterceptor()); 

return builder.target(Marathon.class, baseEndpoint); 

The Marathon client is implemented with the declarative HTTP client Feign, which can be extended by interceptors. In our case, they provide extra HTTP headers to the query. After that, the builder constructs a proxy object with additional behavior according to an interface that it implements. An interface should have one or several methods that could be called on the remote side:

public interface Marathon {
  // Apps
  @RequestLine("GET /v2/apps")
  GetAppsResponse getApps() throws MarathonException;

  // Other methods
}
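
To make the idea of a proxy object with interceptors more concrete, here is a minimal sketch that uses only JDK dynamic proxies. It is a simplified model and not Feign's API: the annotation Line, the interface MarathonApi, and the method target are hypothetical stand-ins, and the "call" only describes the request instead of sending it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class DeclarativeClientSketch {

    // Hypothetical stand-in for a request-line annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Line {
        String value();
    }

    // Hypothetical API interface, reduced to a single call.
    interface MarathonApi {
        @Line("GET /v2/apps")
        String getApps();
    }

    // Builds a proxy whose handler runs every interceptor over the headers
    // and then returns a textual description of the request.
    static MarathonApi target(String baseUrl,
                              List<Consumer<Map<String, String>>> interceptors) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            Line line = method.getAnnotation(Line.class);
            if (line == null) {
                throw new UnsupportedOperationException(method.getName());
            }
            Map<String, String> headers = new LinkedHashMap<>();
            for (Consumer<Map<String, String>> interceptor : interceptors) {
                interceptor.accept(headers);
            }
            String[] parts = line.value().split(" ", 2);
            return parts[0] + " " + baseUrl + parts[1] + " " + headers;
        };
        return (MarathonApi) Proxy.newProxyInstance(
                MarathonApi.class.getClassLoader(),
                new Class<?>[] {MarathonApi.class},
                handler);
    }

    public static void main(String[] args) {
        MarathonApi api = target("http://marathon:8080", List.of(
                headers -> headers.put("Authorization", "token=example"),
                headers -> headers.put("Accept", "application/json")));
        System.out.println(api.getApps());
    }
}
```

The interceptors run in the order they were registered, so a later one can see or overwrite headers set by an earlier one.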

So, the warm-up is over. Now let's take on a more complex challenge.

Fault-Tolerant Client

If we have deployed a production installation of Mesos and Marathon, there will be more than one master to read data from. Moreover, some of them could be unavailable, broken, or under maintenance. If the client cannot obtain fresh information, its data goes stale, and at some point it will make the wrong decisions. Or, when the application is being updated, we won't receive the list of instances at all, and the application will be out of service. None of that is good. We need smart client-side load balancing to fix it.

It would be logical to use Ribbon as the most appropriate candidate because it is already used for client load balancing inside Spring Cloud. We will talk more about balancing strategies in upcoming articles. Now, we limit it to the basic functionality that is required for solving our problem.

First of all, we need to use the balancer in feign-client:

Feign.Builder builder = Feign.builder()
        .client(RibbonClient.builder()
                .lbClientFactory(new MarathonLBClientFactory())
                .build())

You may wonder what lbClientFactory is and why we should use our own. This factory constructs the client for the load balancer. By default, the feign-client lacks a major feature: it can't retry calls if something goes wrong. To enable retries, we add a retry handler during object construction:

public static class MarathonLBClientFactory implements LBClientFactory {
    @Override
    public LBClient create(String clientName) {
        // Start from the default client, then attach a retry handler to it
        LBClient client = new LBClientFactory.Default().create(clientName);
        IClientConfig config = ClientFactory.getNamedConfig(clientName);
        client.setRetryHandler(new DefaultLoadBalancerRetryHandler(config));
        return client;
    }
}

Don't worry about the fact that our retry handler has the Default prefix. It contains everything we need. Let's configure it.

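Many feign-clients might exist in an application, and the Marathon client is only one of them. With that in mind, Ribbon properties are scoped by client name and follow this pattern:

      <clientName>.ribbon.<propertyName>

In our case, the client name identifies the Marathon client, and the resulting prefix is kept in the MARATHON_SERVICE_ID_RIBBON_PREFIX constant used in the snippet below.
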
All properties for the client load balancer are stored in a configuration manager named Archaius. In our case, these properties are stored in memory, and we can add them on the fly. To do so, we add the helper method setMarathonRibbonProperty to our modified client. This method sets a property using the following pattern:

ConfigurationManager.getConfigInstance().setProperty(MARATHON_SERVICE_ID_RIBBON_PREFIX + suffix, value); 

And now, before the construction of our feign-client, we should initialize:

setMarathonRibbonProperty("listOfServers", listOfServers); 
setMarathonRibbonProperty("OkToRetryOnAllOperations", Boolean.TRUE.toString());
setMarathonRibbonProperty("MaxAutoRetriesNextServer", 2);
setMarathonRibbonProperty("ConnectTimeout", 100);
setMarathonRibbonProperty("ReadTimeout", 300); 

What is interesting here? First, listOfServers. This is a comma-separated list of all host and port pairs of the Marathon masters. In our case, we add a proxy property to our configuration that is translated into the Ribbon property:

      listOfServers: m1:8080,m2:8080,m3:8080 

Now, every new call to the master will go to one of these servers.
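
As a rough illustration of what happens with such a list, the sketch below parses the comma-separated value and picks servers in turn. It is a JDK-only model with hypothetical names (ServerListSketch, Server, RoundRobin), and it assumes a plain round-robin choice, which may differ from the rule Ribbon actually applies in a given setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ServerListSketch {

    // One host and port pair from the list.
    static final class Server {
        final String host;
        final int port;

        Server(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    // Splits "m1:8080,m2:8080" into Server entries, skipping empty items.
    static List<Server> parse(String listOfServers) {
        List<Server> servers = new ArrayList<>();
        for (String entry : listOfServers.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            servers.add(new Server(trimmed.substring(0, colon),
                    Integer.parseInt(trimmed.substring(colon + 1))));
        }
        return servers;
    }

    // Assumed selection strategy: hand out servers one after another, wrapping around.
    static final class RoundRobin {
        private final List<Server> servers;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobin(List<Server> servers) {
            if (servers.isEmpty()) {
                throw new IllegalArgumentException("No servers configured");
            }
            this.servers = List.copyOf(servers);
        }

        Server choose() {
            return servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
        }
    }

    public static void main(String[] args) {
        RoundRobin balancer = new RoundRobin(parse("m1:8080,m2:8080,m3:8080"));
        for (int i = 0; i < 4; i++) {
            System.out.println(balancer.choose());
        }
    }
}
```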

We should also remember to set OkToRetryOnAllOperations to true to enable retries.

The maximum retry count is set in the MaxAutoRetriesNextServer property. So, what about the NextServer suffix? The separate MaxAutoRetries option defines the retry count on the same server before the next one is tried. By default, it has a value of 0, which means that after the first failed attempt, the client moves to the next candidate immediately. Also remember that MaxAutoRetriesNextServer counts only the retries on further servers and does not include the first attempt.
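
The interplay of the two counters is easier to see in a small simulation. The sketch below is a simplified JDK-only model, not Ribbon's implementation: the names are hypothetical, and it assumes that the "next server" is simply the next entry in the list, whereas in Ribbon the load balancer makes that choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RetryBudgetSketch {

    // Returns the servers contacted, in order, until one answers
    // or the retry budget is used up.
    static List<String> attempts(List<String> servers,
                                 int maxAutoRetries,
                                 int maxAutoRetriesNextServer,
                                 Predicate<String> isUp) {
        List<String> contacted = new ArrayList<>();
        // The first server plus maxAutoRetriesNextServer further servers.
        for (int s = 0; s <= maxAutoRetriesNextServer; s++) {
            String server = servers.get(s % servers.size());
            // The first try plus maxAutoRetries retries on the same server.
            for (int attempt = 0; attempt <= maxAutoRetries; attempt++) {
                contacted.add(server);
                if (isUp.test(server)) {
                    return contacted;
                }
            }
        }
        return contacted;
    }

    public static void main(String[] args) {
        List<String> masters = List.of("m1:8080", "m2:8080", "m3:8080");
        // Only the third master is healthy in this example.
        System.out.println(attempts(masters, 0, 2, "m3:8080"::equals));
    }
}
```

In this model, the upper bound on requests per call is (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1), which gives three requests for the values 0 and 2 used in this article.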

And finally, to avoid waiting on slow connections, we should set ConnectTimeout and ReadTimeout to reasonable limits.
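
For readers less familiar with the two timeouts, here is what they mean on a plain JDK connection. This sketch does not go through Ribbon at all; it only shows the same two knobs, in milliseconds, on HttpURLConnection. The class name and the URL are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class TimeoutSketch {

    // Prepares a connection object with both timeouts set (in milliseconds).
    // No network traffic happens until the connection is actually used.
    static HttpURLConnection prepare(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // How long to wait for the TCP connection to be established.
            connection.setConnectTimeout(connectTimeoutMs);
            // How long to wait for data once the connection is open.
            connection.setReadTimeout(readTimeoutMs);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection connection = prepare("http://m1:8080/v2/apps", 100, 300);
        System.out.println(connection.getConnectTimeout() + " ms / "
                + connection.getReadTimeout() + " ms");
    }
}
```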


In this part of the series, we've made our Marathon client fault-tolerant and more customizable. Moreover, we used solutions that are already part of Spring Cloud. But we're still far from our goal because the most interesting part is not ready yet.

See you in the next part!

Published at DZone with permission of Aleksandr Tarasov. See the original article here.

Opinions expressed by DZone contributors are their own.
