Polyglot Microservices: Application Integration With Apache Thrift
There are many viable communications schemes to choose from today and they all have their place. However, if you want it all in one package, then Randy Abernethy makes a solid case for Apache Thrift. Read on for more.
Modern software systems live in a networked world. Network communications are critical to the tiniest embedded systems in the Internet-of-Things, through to the weightiest of relational databases anchoring traditional multi-tier applications. As new software systems increasingly embrace dynamically scheduled containerized microservices, light-weight high-performance language agnostic network communications become ever more important.
But how do we wire all of these things together, the old and the new, the big and the small? How do we package a message from a service written in one language in such a way that any other language can read it? How do we design services that are fast enough for high-performance backend cloud systems but accessible by front-end scripting technologies? How do we keep things lightweight to support efficient containers and embedded systems? How do we create interfaces that can evolve over time without breaking existing components? How do we do all of this in an open, vendor-neutral way and, perhaps most important, how can we do it all precisely once, reusing the same communications primitives across a broad platform? For companies like Facebook, Evernote and Twitter, the answer is Apache Thrift.
This article introduces the Apache Thrift framework and its role in modern distributed applications. We will take a look at why Apache Thrift was created and how it helps programmers build high-performance cross-language services. To begin, we'll consider the growing need for multi-language integration and examine the role Apache Thrift plays in polyglot application development. Next we'll look at the two key functions of a communications framework and walk through the construction of a simple Apache Thrift service. At the end of the article, we'll compare Apache Thrift to several other tools offering similar features to help you determine when Apache Thrift might be a good fit.
Polyglotism, the Pleasure and the Pain
The number of programming languages in common commercial use has grown considerably in recent years. In 2003, 80% of the TIOBE Index (http://www.tiobe.com/index.php/tiobe_index) was attributed to six programming languages: Java, C, C++, Perl, Visual Basic and PHP. In 2013 it took nearly twice as many languages to capture the same 80%, adding Objective-C, C#, Python, JavaScript and Ruby to the list. In early 2016 the entire TIOBE top 20 did not add up to 80% of the mind share. In Q4 2014 GitHub reported 19 languages each having more than 10,000 active repositories (http://githut.info/), adding Swift, Go, Scala and others to the in-crowd.
Increasingly, developers and architects choose the programming language most suitable for the task at hand. A developer working on a Big Data project might decide Clojure is the best language to use. Meanwhile, folks down the hall may be doing front-end work in Dart, while programmers in the basement might be using C with embedded systems (no aversion to sunlight implied). Years ago this type of diversity would be rare at a single company; now it can be found within a single team.
Choosing a programming language uniquely suited to solving a particular problem can lead to productivity gains and better quality software. When the language fits the problem, friction is reduced, programming becomes more direct and code becomes simpler and easier to maintain. For example, in large-scale data analysis, horizontal scaling is instrumental to achieving acceptable performance. Functional programming languages like Haskell, Scala and Clojure tend to fit naturally here, allowing analytic systems to scale out without complex concurrency concerns.
Platforms drive language adoption as well. Objective-C exploded in popularity when Apple released the iPhone and Swift is following suit. Go is the language of the booming container ecosystem, responsible for Docker, Kubernetes, etcd and other essentials. Those programming for the browser will have teams competent with JavaScript, TypeScript, and/or Dart while the game and GUI world still codes in C++ for top performing graphics. These choices are driven by history as well as compelling technology underpinnings. Even when such groups are internally monoglots, languages mix and mingle as they collaborate across business boundaries.
Many organizations who claim monoglottism make use of a range of support languages for testing and prototyping. Dynamic programming languages such as Groovy and Ruby are often used for testing, while Lua, Perl, and Python are popular for prototyping and PHP has a long history with the web. Build systems such as the Groovy based Gradle and the Ruby based Rake also provide innovative capabilities.
The Polyglot story is not all wine and song, however. Mastering a programming language is no small feat, not to mention the tools and libraries that come with it. As this burden is multiplied with each new language, firms may experience diminishing returns. Introducing multiple languages into a product initiative can have numerous costs associated with cross-language integration, developer training, and complexity in build and test. If managed improperly, these costs can quickly overshadow the benefits of a multilanguage strategy.
Apache Thrift Language Support
AS3 | C | C++ | C#
D | Dart | Delphi | Erlang
Go | Haskell | Haxe | Java
JavaScript | Lua | Node.js | Objective-C
OCaml | Perl | PHP | Python
Ruby | Smalltalk | TypeScript
One of the key strengths of Apache Thrift is its ability to simplify, centralize and encapsulate the cross-language aspects of a system. Apache Thrift offers broad support, in tree, for polyglot application development. Every language mentioned above is supported by the Apache Thrift project, over twenty languages in all, and growing. This unrivaled direct support for existing languages and the Apache Thrift community's rapid addition of support for new languages can help organizations maximize the potential of polyglotism while minimizing the downside.
Application Integration With Apache Thrift
Whether your application makes use of multiple platforms and languages or not, it is fairly likely that its operations span multiple processes over networks and time. At some point, these processes will need to communicate, either through a file on disk, through a buffer in memory or across networks. There are two central concerns associated with inter-process communications:
Type Serialization
Service Implementation
Let's consider each in turn.
Type Serialization
Serialization is a basic function in any cross platform/language exchange. For example, consider an application for the Music Industry that uses Apache QPID as a messaging system to communicate song data. Using the QPID message broker, Java and Python programs can send/receive messages using queues. The question is, can the Python and Java programs read each other’s musical messages? Python objects are represented differently in memory than Java objects. If a Python program sent the raw memory bits for its music track data to a Java program fireworks would ensue.
To solve this problem we need a data serialization layer on top of the messaging platform. Why not just send everything back and forth in JSON one might ask? Using a standard format like JSON is part of a solution, however, we still must answer questions like: how are data fields ordered when sending multi-field messages, what happens when fields are missing, and what does a language that does not directly support a data type do when receiving that datatype? These and many other questions cannot be answered by a data layout specification like JSON, YAML or XML. Different languages frequently produce different, though legally formatted, documents for the same dataset.
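To make the ambiguity concrete, here is a minimal Python sketch using only the standard library. The two encoder functions and the sample values are hypothetical stand-ins for two programs that each believe they are emitting "the same" track as JSON; neither is taken from any real system.

```python
import json

# One logical record; field names mirror the music example in this article,
# the values are made up for illustration.
track = {"title": "Example Song", "artist": "Example Artist", "duration": 5.0}

def encode_a(rec):
    # Producer A: keeps insertion order and writes every field as-is.
    return json.dumps(rec)

def encode_b(rec):
    # Producer B: sorts keys, drops empty fields and writes whole-number
    # durations as integers. All of this is still legal JSON.
    out = {}
    for key, value in rec.items():
        if value in ("", None):
            continue
        if isinstance(value, float) and value.is_integer():
            value = int(value)
        out[key] = value
    return json.dumps(out, sort_keys=True)

doc_a = encode_a(track)
doc_b = encode_b(track)
print(doc_a)
print(doc_b)

# The documents differ as text, and the duration comes back with a
# different type depending on which producer wrote it.
print(doc_a == doc_b)
print(type(json.loads(doc_a)["duration"]), type(json.loads(doc_b)["duration"]))
```

A dynamically typed reader may not notice the difference, but a statically typed one has to decide what to do with an integer where it expected a double. That decision is exactly what a layout specification alone leaves open.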
IDL and Types
Apache Thrift provides a modular serialization framework which addresses these issues. With Apache Thrift, developers define abstract data types in an Interface Definition Language (IDL). This IDL can then be compiled into source code for any supported language. The generated code provides complete serialization and deserialization logic for all of the user's defined types. Apache Thrift ensures that types written by any language can be read by any other language.
namespace * music

enum PerfRightsOrg {
    ASCAP = 1
    BMI = 2
    SESAC = 3
    Other = 4
}

typedef double Minutes

struct MusicTrack {
    1: string title
    2: string artist
    3: string publisher
    4: string composer
    5: Minutes duration
    6: PerfRightsOrg pro
}
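To give a feel for how these abstract IDL types map onto a concrete language, the following is a hand-written Python approximation of the definitions above. It is a sketch for illustration only and is not the code the Apache Thrift compiler emits; the FIELD_IDS table and the sample values are assumptions introduced here.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class PerfRightsOrg(IntEnum):
    # Mirrors the IDL enum; the numeric values are part of the contract.
    ASCAP = 1
    BMI = 2
    SESAC = 3
    Other = 4

Minutes = float  # IDL: typedef double Minutes

@dataclass
class MusicTrack:
    # Every field defaults to None so that a missing field is representable.
    title: Optional[str] = None
    artist: Optional[str] = None
    publisher: Optional[str] = None
    composer: Optional[str] = None
    duration: Optional[Minutes] = None
    pro: Optional[PerfRightsOrg] = None

# The numeric tags from the IDL struct (hypothetical helper table).
FIELD_IDS = {1: "title", 2: "artist", 3: "publisher",
             4: "composer", 5: "duration", 6: "pro"}

track = MusicTrack(title="Example Song", duration=3.5, pro=PerfRightsOrg.BMI)
print(track)
```

The point of the sketch is that the IDL pins down names, types and numeric field tags once, and each target language then gets its own idiomatic rendering of the same contract.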
Some complain that creating IDL is an extra step that slows the development process. I have found quite the opposite. IDL forces you to carefully consider your interfaces in isolation, free of noisy implementation code. This may be the most important time you spend on a system design. IDL is also lightweight, easy to modify and experiment with, and often useful as a communications tool on the business side as well.
Some say schemaless systems are more flexible and that IDL is brittle. The truth is, whether you document your schema or not, you still have a schema if you are reading and interpreting data. Implied (undocumented) schemas can be the source of fairly treacherous application errors and create a burden on developers who need to interact with the data or extend the system. If you have no definition for the data layout you read and write except the code that reads and writes it, it will be slow going when you want to extend the system. How many bits of code throughout the system depend on this implied schema? How do you change such a thing?
The popularity of NoSQL systems, many of which are schemaless, creates another role for IDL. You now have the opportunity to document your types in a single place and to use those types in service calls, with messaging systems and in storage systems like Redis, MongoDB and others.
Some systems reverse the process and generate their schema from a given coded solution. Annotation-driven systems like Java's JAX-RS can work this way. This approach makes it very easy to allow implementation details to bias the interface definition, straining portability and clarity. It is generally much more work to modify implementation code than it is to modify IDL. Also, there is no guarantee that another vendor’s code generator will create compatible code given the schema from a different vendor. This is a problem anytime multiple vendors are involved in a communications solution.
Apache Thrift sidesteps some of these problems by providing a single source of truth, the IDL. Apache Thrift supplies vendor-independent support for a single IDL across a wide array of programming languages, and the Apache Thrift cross-language test suite is constantly at work verifying interoperability as the framework grows.
Interface Evolution
IDL creates a contract that all parties can rely upon and that code generators can use to create working serialization operations, ensuring the contract is adhered to. Yet IDL schemas need not be brittle. Apache Thrift IDL supports a range of interface evolution features which, when used properly, allow fields to be added and removed, types to be changed and more.
Support for interface evolution greatly simplifies the task of ongoing software maintenance and extension. Modern engineering sensibilities such as microservices, Continuous Integration (CI) and Continuous Delivery (CD) require systems to support incremental improvements without impacting the rest of the platform. Tools which do not supply some form of interface evolution tend to “break the world” when changed. In such systems changing an interface means that all of the clients and servers using that interface must be rewritten and/or recompiled, then redeployed in a big bang.
Apache Thrift interface evolution features allow multiple interface versions to coexist seamlessly in a single operating environment. This makes incremental updates viable, enabling CI/CD pipelines and empowering individual Agile teams to deliver business value at their own cadence.
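The idea behind this kind of coexistence can be sketched in a few lines of Python. The encoding below is a deliberately simplified illustration and not Apache Thrift's actual wire format: each field is written as a numeric ID, a length and the UTF-8 bytes, so a reader can skip any ID it does not recognize. The function names and the V1/V2 tables are hypothetical.

```python
import struct

def write_fields(fields):
    """Encode {field_id: str} as repeated (id, length, utf-8 bytes) records."""
    out = bytearray()
    for fid, text in fields.items():
        data = text.encode("utf-8")
        out += struct.pack(">HI", fid, len(data)) + data
    return bytes(out)

def read_fields(buf, known):
    """Decode records, keeping known field IDs and skipping all others."""
    result = {name: None for name in known.values()}
    pos = 0
    while pos < len(buf):
        fid, size = struct.unpack_from(">HI", buf, pos)
        pos += 6  # 2-byte id + 4-byte length
        if fid in known:
            result[known[fid]] = buf[pos:pos + size].decode("utf-8")
        pos += size  # unknown fields are skipped using their length
    return result

V1 = {1: "title", 2: "artist"}                 # older interface version
V2 = {1: "title", 2: "artist", 3: "composer"}  # newer version adds a field

new_msg = write_fields({1: "Example Song", 2: "Example Artist", 3: "Example Composer"})
old_msg = write_fields({1: "Example Song", 2: "Example Artist"})

print(read_fields(new_msg, V1))  # old reader ignores the added field
print(read_fields(old_msg, V2))  # new reader sees the missing field as None
```

Because fields are identified by number rather than by position, old and new programs can exchange messages in both directions, provided existing IDs are never reused for a different meaning.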
Modular Serialization
Apache Thrift provides pluggable serializers, known as protocols, allowing you to use any one of several serialization formats for data exchange, including binary for speed, compact for size and JSON for readability. The same contract (IDL) can remain in place even as you change serialization protocols to refine the operational aspects of your system. This modular approach allows custom serialization protocols to be added as well. Because Apache Thrift is community managed and open source, you can easily change or enhance functionality and push it upstream when needed (patches are always welcome at the Apache Thrift project).
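As a rough illustration of what "pluggable" means here, the sketch below writes the same two fields through two interchangeable serializer classes. The class and method names (BinaryProto, JsonProto, write_string, write_double) are invented for this example and are not the Apache Thrift protocol API; the byte layouts are likewise simplified assumptions.

```python
import json
import struct

class BinaryProto:
    """Hypothetical protocol: length-prefixed UTF-8 strings, 8-byte doubles."""
    def __init__(self):
        self.buf = bytearray()
    def write_string(self, name, value):
        data = value.encode("utf-8")
        self.buf += struct.pack(">I", len(data)) + data
    def write_double(self, name, value):
        self.buf += struct.pack(">d", value)
    def getvalue(self):
        return bytes(self.buf)

class JsonProto:
    """Hypothetical protocol: collects fields and emits a JSON document."""
    def __init__(self):
        self.fields = {}
    def write_string(self, name, value):
        self.fields[name] = value
    def write_double(self, name, value):
        self.fields[name] = value
    def getvalue(self):
        return json.dumps(self.fields).encode("utf-8")

def write_track(proto, title, duration):
    """Type-level code: identical no matter which protocol is plugged in."""
    proto.write_string("title", title)
    proto.write_double("duration", duration)
    return proto.getvalue()

binary = write_track(BinaryProto(), "Example Song", 3.5)
text = write_track(JsonProto(), "Example Song", 3.5)
print(len(binary), binary)
print(len(text), text)
```

The write_track function never changes; only the object passed in does. That separation is what lets the contract stay fixed while the on-the-wire format is tuned for speed, size or readability.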
Service Implementation
Services are modular application components which provide interfaces accessible over a network (physical or virtual). Apache Thrift IDL allows you to define services in addition to types. Like types, IDL services can be compiled to generate stub code. Service stubs are used to connect clients and servers in a wide range of languages.
service SailStats {
    double get_sailor_rating(1: string sailor_name)
    double get_team_rating(1: string team_name)
    double get_boat_rating(1: i64 boat_serial_number)
    list<string> get_sailors_on_team(1: string team_name)
    list<string> get_sailors_rated_between(1: double min_rating,
                                           2: double max_rating)
    string get_team_captain(1: string team_name)
}
Imagine you have a module which tracks and computes sailing team statistics and that this module is built into a Windows C++ GUI application designed to visualize wind flow dynamics. As it happens, your company’s web dev team would like to use the sail stats module to enhance a client facing Node.js based web application on Linux. Faced with multiple languages and platforms, different module cardinalities (we may need to run 10s of instances of our Node.js code but only a few of our C++ module), and the desirable trait of laziness (wanting to write as little code as possible), Apache Thrift could be a good solution.
With Apache Thrift, we could repackage the sail stats functions as a microservice and provide the Node.js programmers with access to the service via an easy to use Node.js client stub. To create the sail stats microservice we need only define the service interface in IDL, compile the IDL to create client and server stubs for the service, select one of the prebuilt Apache Thrift servers to host the service and then assemble the parts.
Prebuilt Server Shells
It is important to note that, unlike standalone serialization solutions, Apache Thrift comes with a complete set of server shells, ready to use, in almost all of the supported languages. This eliminates the difficult and repetitive process of building custom network servers. The prebuilt Apache Thrift servers are also small and focused, providing just the functionality necessary to host Apache Thrift services. A typical Apache Thrift server will consume an order of magnitude less memory than an equivalent Tomcat deployment. This makes Apache Thrift servers a good choice for containerized microservices and embedded systems that do not have the resources necessary to run full blown web or application servers.
Modular Transport System
Apache Thrift also offers a pluggable transport system. Apache Thrift clients and servers communicate over transports which adapt Apache Thrift data flows to the outside world. For example, the TSocket transport allows Apache Thrift applications to communicate over TCP/IP sockets. There are prebuilt transports for other communications schemes, such as named pipes, and custom transports are easy to craft as well. Apache Thrift also supports offline transports which allow data to be serialized to disk, memory, and other devices.
A particularly elegant aspect of the Apache Thrift transport model is support for layered transports. Protocols serialize application data into a bit stream. Transports simply read and write the bytes, making any type of manipulation possible. For example, the TZLibTransport is available in many Apache Thrift language libraries and can be layered on top of any other transport to achieve high ratio data compression. You can branch data to loggers, fork requests to parallel servers, encrypt and perform any other manner of manipulation with custom layered transports.
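The layering idea can be sketched with plain Python and the standard zlib module. The classes below are hypothetical stand-ins written for this example; they are not the Apache Thrift transport classes, and the sketch handles only a single flushed message.

```python
import zlib

class MemoryTransport:
    """End-point transport that keeps the bytes it is given in memory."""
    def __init__(self):
        self.data = b""
    def write(self, payload):
        self.data += payload
    def flush(self):
        pass  # nothing to do: the bytes are already "delivered"
    def read(self):
        return self.data

class CompressingLayer:
    """Layered transport: buffers writes, compresses them on flush."""
    def __init__(self, inner):
        self.inner = inner
        self.pending = bytearray()
    def write(self, payload):
        self.pending += payload
    def flush(self):
        # Compress the whole buffered message, then hand it to the next layer.
        self.inner.write(zlib.compress(bytes(self.pending)))
        self.pending.clear()
        self.inner.flush()
    def read(self):
        # Assumes the inner transport holds exactly one compressed message.
        return zlib.decompress(self.inner.read())

endpoint = MemoryTransport()
stack = CompressingLayer(endpoint)

message = b"hello_func " * 100
stack.write(message)
stack.flush()

print(len(message), len(endpoint.read()))  # raw size vs. stored size
print(stack.read() == message)             # round trip through the layer
```

Code writing to the top of the stack does not know or care that compression is happening underneath, which is why layers for logging, encryption or request forking can be added without touching the serialization code.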
Apache Thrift Service Walkthrough
To get a better understanding of the practical aspects of Apache Thrift, we'll build a simple microservice. Our service will be designed to supply various parts of our enterprise with a daily greeting. The service will expose a single "hello_func" function which takes no parameters and returns a greeting string. To see how Apache Thrift works across languages we'll build clients in C++, Python, and Java.
The Hello IDL
Most projects involving Apache Thrift begin with careful consideration of the interface components involved. Apache Thrift IDL is similar to C in its notation and makes it easy to define types and services shared across systems. Apache Thrift IDL code is saved in plain text files with a “.thrift” extension.
# hello.thrift
service HelloSvc {
    string hello_func()
}
Our hello.thrift IDL file declares a single service interface called HelloSvc with a single function, hello_func(). The function accepts no parameters and returns a string. To use this interface we can compile it with the Apache Thrift IDL compiler. The IDL compiler binary is named “thrift” on UNIX-like systems and “thrift.exe” on Windows. The compiler expects two command line arguments, an IDL file to compile and one (or more) target languages to generate code for. Here’s an example session which generates Python stubs for our HelloSvc:
$ ls
hello.thrift
$ thrift --gen py hello.thrift
$ ls
gen-py hello.thrift
In the above session, the IDL Compiler created a gen-py directory to house all of the emitted Python code for our hello.thrift IDL. The directory contains client/server stubs for all of the services defined and serialization code for all of the user defined types.
The Hello Server
Now that we have our support code generated we can implement our service and use a prebuilt Apache Thrift server to house it. Here's an example server coded in Python:
# hello_server.py
import sys
sys.path.append("gen-py")  # make the generated stubs importable

from hello import HelloSvc
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from thrift.server import TServer

class HelloHandler:
    # Implements the HelloSvc interface declared in hello.thrift
    def hello_func(self):
        return "Hello from the python server"

handler = HelloHandler()
proc = HelloSvc.Processor(handler)                    # server-side stub
trans_ep = TSocket.TServerSocket(port=9090)           # TCP end point
trans_fac = TTransport.TBufferedTransportFactory()    # buffering layer
proto_fac = TBinaryProtocol.TBinaryProtocolFactory()  # binary serialization
server = TServer.TSimpleServer(proc, trans_ep, trans_fac, proto_fac)
server.serve()
At the top of our server listing we use the built in Python sys module to add the gen-py directory to the Python Path. This allows us to import the generated service stubs for our HelloSvc service.
Our next step is to import several Apache Thrift library packages. TSocket provides an end point for our clients to connect to, TTransport provides a buffering layer, TBinaryProtocol will handle data serialization and TServer will give us access to some of the prebuilt Python server classes.
The next block of code implements the HelloSvc service itself. This class is called a handler in Apache Thrift parlance. All of the service methods must be represented in the Handler class, in our case this is just the hello_func() method. In the real world, this is where all of your time and effort is spent, implementing your services, because Apache Thrift handles the rest.
Next we create an instance of our handler and use it to initialize a processor for our service. The processor is the server side stub generated by the IDL compiler which turns network service requests into calls to the appropriate handler function.
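To illustrate what "turning requests into handler calls" amounts to, here is a minimal stand-alone sketch. SketchProcessor and the dictionary-shaped request are assumptions made for this example; the real generated processor reads requests from a protocol object rather than a dictionary, and its internals may differ.

```python
class HelloHandler:
    """User-written service implementation, as in the server listing."""
    def hello_func(self):
        return "Hello from the python server"

class SketchProcessor:
    """Hypothetical stand-in for a generated processor."""
    def __init__(self, handler):
        # A lookup table keyed by method name is one straightforward way
        # to route an incoming call to the right handler method.
        self._methods = {"hello_func": handler.hello_func}

    def process(self, request):
        """Dispatch one decoded request of the form {'name': ..., 'args': [...]}."""
        method = self._methods.get(request["name"])
        if method is None:
            return {"ok": False, "error": "unknown method: " + request["name"]}
        return {"ok": True, "result": method(*request["args"])}

proc = SketchProcessor(HelloHandler())
print(proc.process({"name": "hello_func", "args": []}))
print(proc.process({"name": "goodbye_func", "args": []}))
```

The handler contains only business logic, while the processor owns the routing and error reporting. This division is why the handler class in the server listing can stay so small.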
The Apache Thrift library offers end point transports for use with files, memory, and various networks. The example here creates a TCP server socket end point to accept client connections on port 9090. The buffering layer ensures that we make efficient use of the underlying network, transmitting bits only when an entire message has been serialized. The binary serialization protocol transmits our data in a fast binary format with little overhead.
Apache Thrift provides a range of servers to choose from, each with unique features. The server used here is an instance of the TSimpleServer class, which, as its name implies, provides basic server functionality. Once constructed, we can start the server run loop by calling the serve() method.
The example session below runs our Python server:
$ ls
gen-py hello_server.py hello.thrift
$ python hello_server.py
Our Python server took about 7 lines of code, excluding imports and the service implementation. The story is similar in C++, Java, and most other languages. This is a very basic server but the example should give you some sense as to how much leverage Apache Thrift gives you when it comes to quickly creating cross-language microservices.
A Python Client
Now that we have our server running, let's create a simple Python client to test it.
# hello_client.py
import sys
sys.path.append("gen-py")
from hello import HelloSvc
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
trans = TSocket.TSocket("localhost", 9090)
trans = TTransport.TBufferedTransport(trans)
proto = TBinaryProtocol.TBinaryProtocol(trans)
client = HelloSvc.Client(proto)
trans.open()
msg = client.hello_func()
print("[Client] received: %s" % msg)
trans.close()
Our Python client begins by importing the same HelloSvc module used by the server, though the client uses the client-side stubs for the hello service. We also import three modules from the Apache Thrift Python library. The first is TSocket, which is used on the client side to make a TCP connection to the server socket; as you might guess, the client must use a client-side transport compatible with the server transport. The next import pulls in TTransport, which provides a network buffer. Finally, the TBinaryProtocol import allows us to serialize messages to the server; again, this must match the server implementation.
Our next block of code initializes the TSocket with the host and port to connect to. We wrap the socket transport in a buffer and finally wrap the entire transport stack in the TBinaryProtocol, creating an I/O stack that can serialize data to and from the server end point.
The I/O stack is used by the client stub which acts as a proxy for the remote service. Opening the transport causes the client to connect to the server, allowing us to make calls to the service using the client stub. Invoking the hello_func() method on the Client object serializes our call request with the binary protocol and transmits it over the socket to the server. Our program prints out the result and then closes the connection using the transport close() method.
Here is a sample session running the above client (the Python server must be running in another shell to respond).
$ ls
gen-py hello_client.py hello_server.py hello.thrift
$ python hello_client.py
[Client] received: Hello from the python server
While a bit more work than your run of the mill hello world program, a few lines of IDL and a few lines of Python code have allowed us to create a language agnostic, OS agnostic, and platform agnostic service API with a working client and server. Not bad.
A C++ Client
To broaden our perspective and demonstrate the cross-language aspects of Apache Thrift let's build two more clients for the hello server, one in C++ and one in Java. We'll start with the C++ client.
First we need to compile the service definition again, this time generating C++ stubs:
$ thrift --gen cpp hello.thrift
$ ls
gen-cpp gen-py hello_client.py hello_server.py hello.thrift
Running the IDL Compiler with the “--gen cpp” switch causes it to emit C++ files in the gen-cpp directory roughly equivalent to those generated for Python, producing headers (.h) and source files (.cpp) for our hello.thrift IDL. The gen-cpp/HelloSvc.h header contains the declarations for our service and the gen-cpp/HelloSvc.cpp source file contains the implementation of the service stub components.
Here’s the code for a HelloSvc C++ client with the same functionality as the Python client above:
#include "gen-cpp/HelloSvc.h"
#include <thrift/transport/TSocket.h>
#include <thrift/transport/TBufferTransports.h>
#include <thrift/protocol/TBinaryProtocol.h>
#include <boost/make_shared.hpp>
#include <iostream>
#include <string>

using namespace apache::thrift::transport;
using namespace apache::thrift::protocol;
using boost::make_shared;

int main() {
    auto trans_ep = make_shared<TSocket>("localhost", 9090);
    auto trans_buf = make_shared<TBufferedTransport>(trans_ep);
    auto proto = make_shared<TBinaryProtocol>(trans_buf);
    HelloSvcClient client(proto);

    trans_ep->open();
    std::string msg;
    client.hello_func(msg);
    std::cout << "[Client] received: " << msg << std::endl;
    trans_ep->close();
}
Our C++ client code is structurally identical to the Python client code. With few exceptions, the Apache Thrift meta-model is consistent from language to language, making it easier for developers to work across languages.
The C++ main() function corresponds line for line with the Python code, with one exception: hello_func() does not return its string conventionally but instead returns it through an out-parameter reference.
The Apache Thrift language libraries are generally wrapped in namespaces to avoid conflicts in the global namespace of the host language. In C++, all of the Apache Thrift library code is located within the “apache::thrift” namespace. The using statements here provide implicit access to the necessary Apache Thrift library code.
Apache Thrift strives to maintain as few dependencies as possible to keep the development environment simple and portable, however, there are exceptions. For example, the Apache Thrift C++ library relies on the open source Boost library. In our example, several objects are wrapped in boost::shared_ptr. Apache Thrift uses shared_ptr to manage the lifetimes of almost all of the key objects involved in C++ service operations.
Those familiar with C++ will know that shared_ptr has become part of the standard library in C++11. While our example code is written in C++11, Apache Thrift supports C++98 as well, requiring the use of the boost version of shared_ptr (C++98 support will likely be dropped at some point in the future moving all boost namespace elements to the std namespace).
Here is a bash session which builds and runs our C++ client.
$ ls
gen-cpp gen-py hello_client.cpp hello_client.py hello_server.py
hello.thrift
$ g++ --std=c++11 hello_client.cpp gen-cpp/HelloSvc.cpp -lthrift
$ ls
a.out gen-cpp gen-py hello_client.cpp hello_client.py
hello_server.py hello.thrift
$ ./a.out
[Client] received: Hello from the python server
In this example we use the Gnu C++ compiler to build our hello_client.cpp file into an executable program. Clang, Visual C++ and other compilers are also commonly used to build Apache Thrift C++ applications.
For the C++ compile, we must also compile the generated client stub found in the HelloSvc.cpp source file. During the link phase the “-lthrift” switch tells the linker to scan the standard Apache Thrift C++ library to resolve the TSocket and TBinaryProtocol library dependencies (this switch must follow the list of .cpp files when using g++, or it will be ignored, causing link errors).
Assuming the Python Hello server is still up, we can run our executable C++ client and make a cross-language RPC call. The C++ compiler builds our source into an a.out file which produces the same result as the Python client when executed.
A Java Client
As a final example let’s put together a Java client for our service. Our first step is to generate Java stubs for the service.
$ thrift --gen java hello.thrift
$ ls
a.out gen-cpp gen-java gen-py hello_client.cpp
hello_client.py hello_server.py hello.thrift
The “--gen java” switch causes the IDL Compiler to emit Java code for our interface in the gen-java directory, creating a HelloSvc class with nested client and server stub classes. Here is the source for a Java client which parallels the prior Python and C++ clients:
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.TException;

public class HelloClient {
    public static void main(String[] args) throws TException {
        TSocket trans = new TSocket("localhost", 9090);
        TBinaryProtocol protocol = new TBinaryProtocol(trans);
        HelloSvc.Client client = new HelloSvc.Client(protocol);

        trans.open();
        String str = client.hello_func();
        System.out.println("[Client] received: " + str);
        trans.close();
    }
}
Our Java main() method lives inside a class with the same name as the containing file. The rest is a rehash of our previous clients. The one noticeable difference is that the Java client has no buffering layer above the end point transport. This is because the socket implementation in Java is based on a stream class which buffers internally, so no additional buffering is required.
Here is a build and run session for the Java client:
$ javac -cp /usr/local/lib/libthrift-1.0.0.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-nop.jar HelloClient.java gen-java/HelloSvc.java
$ ls
a.out gen-cpp gen-java gen-py HelloClient.class
HelloClient.java hello_client.cpp hello_client.py hello_server.py hello.thrift
$ java -cp /usr/local/lib/libthrift-1.0.0.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-nop.jar:./gen-java:. HelloClient
[Client] received: Hello from the python server
Our Java compile includes three dependencies. The first is the Apache Thrift Java library jar. The IDL generated code for our service also depends on SLF4J, a popular Java logging façade. The slf4j-api jar is the façade and the slf4j-nop jar is the nonoperational logger, which simply ignores logging calls. Compiling the Java files produces byte code in .class files for our HelloClient class as well as the HelloSvc class.
To run our Java HelloClient class under the JVM, we must modify the Java class path as we did in the compilation step, adding the current directory and the gen-java directory, where the HelloClient and HelloSvc class files will be found. Running the client produces the same result we saw with Python and C++.
Beyond running standard build tools in our respective languages, it took very little effort to produce our Apache Thrift server and the three clients. In short order, we have built a microservice which can handle requests from clients created in a wide range of languages. Now that we have seen how basic Apache Thrift programs are created, let’s take a look at how Apache Thrift fits into the overall application integration landscape.
Comparing Toolkits
SOAP, REST, Protocol Buffers and Apache Avro are perhaps the technologies most often considered as alternatives to Apache Thrift, though there are many others. Each technology is unique and all have their place. The following sections provide a brief overview of the key players in the software communications landscape followed by a summary of the features fielded by Apache Thrift and a discussion of where it fits in the milieu.
SOAP
Simple Object Access Protocol (SOAP) is a W3C recommendation (https://www.w3.org/TR/2007/REC-soap12-part1-20070427/) specifying a Service Oriented Architecture (SOA) style remote procedure call (RPC) system over HTTP. SOAP relies on XML for carrying its payload between the client and server and is typically deployed over HTTP, though other transports can also be used. Optimizations are available which attempt to reduce the burden of transmitting XML and there are versions of SOAP which use JSON, among other off shoots. Related technologies, such as XML-RPC, operate on similar principles. Unlike, RESTful services, which directly utilize HTTP headers, verbs and status codes, SOAP and XMP-RPC systems tunnel function calls through HTTP POST operations, missing out on most of the caching and system layering benefits found in RESTful services.
The key benefit of HTTP-friendly technologies is their broad interoperability. By transmitting standards-based text documents (XML, JSON, etc.) over the ubiquitous HTTP protocol, almost any application or language can be engaged. Human-readable XML/JSON payloads also greatly simplify prototyping, testing and debugging. On the downside, each language, vendor and, often, each company provides its own scheme for generating stubs. There are no guarantees that code generated by different SOAP WSDL (Web Service Description Language) tools will collaborate. Another drawback is that these HTTP tunneling systems tend to underperform REST at scale due to their lack of support for traditional web infrastructure, in particular caching, among other efficiencies.
SOAP was one of the principal technologies used during the evolution of Service Orientation and is still widely used today. SOAP also offers a number of useful WS-* standards established by the OASIS standards body, addressing authentication, transactions and other concerns (https://www.oasis-open.org/standards). That said, few new SOAP services appear to be coming online and most considering SOAP today find REST simpler, faster at scale and more compelling as a public API solution.
REST
REST is an acronym for REpresentational State Transfer, a term coined by Dr. Roy Fielding in his 2000 dissertation (https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm). REST is the typical means for web browsers to retrieve content from web servers. RESTful web services use the REST architectural style in order to leverage the infrastructure of the web. The well understood and widely supported HTTP protocol gives REST based services broad reach. REST based services typically use the JSON format for payload transmission, making client/server requests human readable and easy to work with.
RESTful services are unique in that their interfaces are based on resources accessed through URIs and manipulated through HTTP verbs, such as GET, PUT, POST, and DELETE. When done well, this is referred to as a Resource Oriented Architecture (ROA). ROAs produce significant benefits when scaling over the web. For example, standard web-based caching systems can cache resources acquired using the GET verb, firewalls can make more intelligent decisions about HTTP delivered traffic and applications can leverage the wealth of technology associated with existing Web server infrastructure. HTTP headers can be used to negotiate payload formats, cache expirations, security features and more. In-browser clients can leverage the native features of the browser. The list goes on.
When developers refer to APIs or services today, they are usually talking about REST APIs/services. The RESTful approach has become nigh on ubiquitous when it comes to implementing public interfaces. The ecosystem is vast and the developer skills are widespread. There are, however, drawbacks to REST.
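The resource-plus-verb idea can be sketched in a few lines of Python. This is an illustrative in-memory dispatcher, not a web server: the `handle` function, the dictionary store and the `/greetings/1` URI are assumptions made for the example, while the status codes are the standard HTTP ones.

```python
from typing import Dict, Optional, Tuple


def handle(store: Dict[str, str], method: str, uri: str,
           body: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Apply an HTTP-style verb to the resource named by `uri`.

    Returns (status code, response body). The URI names the resource;
    the verb says what to do with it.
    """
    if method == "GET":  # safe read, so intermediaries may cache the result
        if uri in store:
            return 200, store[uri]
        return 404, None
    if method == "PUT":  # create or replace the resource at this URI
        created = uri not in store
        store[uri] = body or ""
        return (201 if created else 204), None
    if method == "DELETE":  # remove the resource
        if uri not in store:
            return 404, None
        del store[uri]
        return 204, None
    return 405, None  # verb not supported by this sketch


if __name__ == "__main__":
    resources: Dict[str, str] = {}
    print(handle(resources, "PUT", "/greetings/1", "hello"))
    print(handle(resources, "GET", "/greetings/1"))
    print(handle(resources, "DELETE", "/greetings/1"))
```

Because the verb alone tells an intermediary whether a request is a read, a cache or firewall can act on it without understanding the payload.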
It is important to keep in mind that REST is an architectural style, not a standard or a technology framework. Two different teams might build the same REST service in very different and incompatible ways. While this might be said of any solution, it is particularly true of REST due to the broad set of perspectives on how REST should be done and the several toolkits, schema mechanisms and documentation systems in use. For example, the RESTful world offers three competing platforms for service definition and code generation: RAML, Swagger, and API Blueprint.
There are also communications models not addressed by REST. REST is by definition a client/server architecture and, in practice, it is implemented over HTTP, a request/response based protocol. REST does not address serialization concerns or support messaging or data streaming.
One of the most important issues with RESTful interfaces is their overhead in backend systems. The advent of HTTP/2 (https://http2.github.io/) does much to address the overhead associated with HTTP header and JSON text transmission. However, no amount of external optimization is likely to allow a REST service to perform at the level of a purpose-built binary solution such as Apache Thrift. In fact, Protocol Buffers and Thrift were created by Google and Facebook respectively to alleviate the performance issues associated with RESTful services in high load server systems.
Protocol Buffers
Google Protocol Buffers (https://developers.google.com/protocol-buffers/) and Apache Thrift are similar in function and performance, and in their approach to serialization and IDL. They were built by different companies to do the same thing. Official Google Protocol Buffer (PB) language support is limited to C++, Go, Java, Python, C# and Objective-C in PB2. This is a moving target and new languages are added over time (for example, Protocol Buffers 3 added Ruby to the list). Protocol Buffers are used by a large community of developers and many projects on the web expand the out-of-tree languages available to a list comparable to that of Apache Thrift.
Google Protocol Buffers focuses on providing a monolithic integrated message serialization system through the main project. Several RPC style systems for Protocol Buffers are available in other projects. For some, the modular serialization and transport features of the Apache Thrift framework and the in tree language and server support provide an advantage. Others prefer the simple integrated serialization scheme offered by PB.
Another difference between the platforms is support for transmission of collections. Apache Thrift supports transmission of three common container types: lists, sets, and maps. Protocol Buffers supplies a repeating field feature rather than support for containers, producing similar capabilities through a lower level construct. Newer versions of PB add map simulation with some restrictions. Protocol Buffers supports signed and unsigned integers while Apache Thrift supports only signed integers. Apache Thrift, however, supports unions and some other minor IDL features not found in Protocol Buffers. When compared side by side, Protocol Buffers serialization may be marginally faster and produce marginally smaller payloads than Apache Thrift in exchange for fewer high-level abstractions.
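The signed-only integer model has a practical consequence when an application needs to carry an unsigned value. One common workaround, offered here as an assumption rather than anything the Apache Thrift project prescribes, is to reinterpret the 32 bits as a signed value before sending and reverse the step on receipt (the alternative is simply to use the next wider signed type). The helper names below are hypothetical.

```python
import struct


def u32_to_i32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as the signed 32-bit value with the same bits."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in an unsigned 32-bit integer")
    return struct.unpack("<i", struct.pack("<I", value))[0]


def i32_to_u32(value: int) -> int:
    """Reverse the mapping on the receiving side."""
    if not -0x80000000 <= value <= 0x7FFFFFFF:
        raise ValueError("value does not fit in a signed 32-bit integer")
    return struct.unpack("<I", struct.pack("<i", value))[0]


if __name__ == "__main__":
    wire_value = u32_to_i32(4_000_000_000)  # too large for a signed 32-bit field as-is
    print(wire_value)                       # negative number with the same bit pattern
    print(i32_to_u32(wire_value))           # original value recovered
```

Both ends must agree on the convention, since nothing in the interface definition records that the field is really unsigned.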
Protocol Buffers are robust, well documented and backed by a large corporation, which contrasts with the community driven nature of Apache Thrift. This is evident most clearly in the quality of the documentation for the two projects, Google's being noticeably superior (and I am being kind).
Apache Avro
Apache Avro (https://avro.apache.org/) is a serialization framework designed to package the serialization schema with the data serialized. This contrasts with Apache Thrift and Protocol Buffers, both of which describe the schema (data types and service interfaces) in IDL. Apache Avro interprets the schema on the fly while most other systems generate code to interpret the schema at compile time. In general, combining the schema with the data works well for long-lived objects serialized to disk. However, such a model can add complexity and overhead to real-time RPC style communications. Arguments and optimizations can be made to turn these observations on their head, of course, but the most practical use of Apache Avro has been focused on serializing objects to disk.
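The trade-off can be illustrated with a small Python sketch. It is not Avro's actual format: the schema layout, the `Greeting` record and every function name are hypothetical, and the byte layout is invented for the example. What it does show is a decoder that interprets a schema at run time, and the fixed cost of shipping that schema with every payload.

```python
import json
import struct

# Hypothetical schema, loosely JSON-shaped for readability.
SCHEMA = {"name": "Greeting",
          "fields": [{"name": "id", "type": "int"},
                     {"name": "text", "type": "string"}]}


def encode(schema: dict, record: dict) -> bytes:
    """Encode a record by walking the schema at run time."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "int":
            out += struct.pack(">i", value)
        elif field["type"] == "string":
            raw = value.encode("utf-8")
            out += struct.pack(">H", len(raw)) + raw
        else:
            raise ValueError(f"unsupported type: {field['type']}")
    return bytes(out)


def decode(schema: dict, data: bytes) -> dict:
    """Decode a record by walking the same schema."""
    record, pos = {}, 0
    for field in schema["fields"]:
        if field["type"] == "int":
            record[field["name"]] = struct.unpack_from(">i", data, pos)[0]
            pos += 4
        elif field["type"] == "string":
            length = struct.unpack_from(">H", data, pos)[0]
            pos += 2
            record[field["name"]] = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported type: {field['type']}")
    return record


def write_self_describing(schema: dict, record: dict) -> bytes:
    """Prefix the payload with its schema so any reader can decode it later."""
    header = json.dumps(schema).encode("utf-8")
    return struct.pack(">I", len(header)) + header + encode(schema, record)


def read_self_describing(blob: bytes) -> dict:
    """Recover the schema from the blob itself, then decode the payload."""
    length = struct.unpack_from(">I", blob, 0)[0]
    schema = json.loads(blob[4:4 + length].decode("utf-8"))
    return decode(schema, blob[4 + length:])


if __name__ == "__main__":
    rec = {"id": 7, "text": "hello"}
    bare = encode(SCHEMA, rec)                    # reader must already know the schema
    packed = write_self_describing(SCHEMA, rec)   # reader needs nothing in advance
    print(len(bare), len(packed))
    print(read_self_describing(packed))
```

For a file holding many records the schema header is paid once and amortized; for a single small RPC message it can outweigh the data, which is the overhead concern described above.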
Apache Avro supported seven programming languages at the time of this writing and offered a basic RPC framework as well. Avro supports the same containers present in Apache Thrift although Apache Avro maps only allow strings as keys. The use of dynamically interpreted embedded schemas in Apache Avro and the use of compiled IDL in Apache Thrift are the key distinctions between these two platforms.
Apache Thrift
The strength of the Apache Thrift (https://thrift.apache.org/) platform lies in the completeness of its package, its performance and flexibility as well as the expressiveness of its IDL. Apache Thrift was created to provide cross-language capabilities comparable to REST but with dramatically improved performance and a significantly smaller footprint.
Performance
To get a sense of the relative performance of several of the communications approaches described here, we can look at some simple test results. The chart below displays the time required to make one million API calls to several services. All of the servers were coded in Java and the same client, also coded in Java, was used in all cases, though the necessary bindings are used to call the service backend under test. Each bar shows the number of seconds the requests took to complete against a different implementation running on the same machine. The tests were performed in isolation over the local loopback on a system with no other activity. Multiple runs of each test were completed and no outliers were discovered. The sole service function accepts a string and returns a small struct. The service implementation is identical in all cases and performs no logic, simply returning a static struct to highlight the service and serialization overhead.
The first bar shows the elapsed time for the service when implemented with SOAP. A standard Java SOAP service coded in JAX-WS and deployed on Tomcat 7 was used for the test. The serialization overhead associated with XML and the load incurred by Tomcat and HTTP make this the worst performer in the group at over 350 seconds.
The second bar shows the results of the same test but against a REST service created with Java and JAX-RS. Though the comparison normalizes as many variables as possible, REST based services are defined with HTTP verbs and IRIs, not functions. The implementation here is a simple GET request (no caching), passing the input string as a query parameter and receiving the resultant struct in a JSON payload. This is noticeably faster than the SOAP example at about 300 seconds, due to the improved serialization performance of JSON over XML and the significantly smaller JSON payload, which is also only present in the response.
The last three bars are Apache Thrift server cases. The first is as close to an apples-to-apples comparison with the REST example as can be had with Apache Thrift. An Apache Thrift server was created with the same one method service, packaged as a servlet, deployed on Tomcat and configured to use the JSON protocol over an HTTP transport. The Apache Thrift client sends parameters to the server using JSON and receives results using JSON. The REST API uses the GET verb and receives its parameters in the IRI, eliminating JSON on the request and making the total bytes transmitted by the REST API smaller than the Apache Thrift JSON implementation. The result, perhaps surprisingly, is a significant performance improvement when using Apache Thrift. This is attributable to the serialization benefits produced by the purpose built Apache Thrift client/server stubs among other efficiencies. The gap widens further when POST requests are required in the REST API, causing JSON serialization on the client and server.
The real performance gains arrive when Tomcat and HTTP are left behind. The final two bars show the performance of compiled Apache Thrift servers running over TCP with JSON and Compact protocols respectively. Both servers are an order of magnitude faster on the run and an order of magnitude smaller in memory.
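Part of what separates the Compact protocol from text formats is how little it writes for common values. The Apache Thrift compact protocol encodes integers as zigzag-mapped variable-length integers; the Python sketch below implements only that integer encoding, as an illustration, and is not a full protocol implementation. The function names are hypothetical, and the 4-byte comparison assumes a fixed-width big-endian encoding such as the one used for 32-bit integers by the Thrift binary protocol.

```python
import struct


def zigzag32(n: int) -> int:
    """Map signed 32-bit ints to unsigned so small magnitudes stay small (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def unzigzag32(z: int) -> int:
    """Inverse of zigzag32."""
    return (z >> 1) ^ -(z & 1)


def write_varint(value: int) -> bytes:
    """Emit 7 bits per byte, low bits first; the high bit marks 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes) -> int:
    """Decode a single varint from the start of `data`."""
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


def encode_i32(n: int) -> bytes:
    return write_varint(zigzag32(n))


def decode_i32(data: bytes) -> int:
    return unzigzag32(read_varint(data))


if __name__ == "__main__":
    for n in (0, -1, 1, 300, -2147483648):
        compact = encode_i32(n)
        fixed = struct.pack(">i", n)  # fixed-width 4-byte encoding for comparison
        print(n, len(compact), len(fixed), decode_i32(compact))
```

Small values, which dominate most real payloads, take one or two bytes instead of four, at the cost of up to five bytes for values near the extremes of the 32-bit range.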
While your mileage will vary with different languages, different levels of concurrency, different server shells, different services and different frameworks, the example above provides a frame of reference and explains why many firms have moved large scale backend services away from REST/SOAP and/or JSON serialization when under pressure for performance. Simply migrating to Apache Thrift from REST or SOAP could enable the same hardware to support 10 to 20 times the traffic.
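To translate elapsed times into capacity, the arithmetic is straightforward. The Python sketch below uses only figures stated in the text (one million calls, roughly 300 seconds for the REST case, and a 10x to 20x improvement range); it does not reproduce measurements from the chart, and the function names are illustrative.

```python
def calls_per_second(calls: int, elapsed_seconds: float) -> float:
    """Sustained throughput implied by a timed run."""
    return calls / elapsed_seconds


def elapsed_after_speedup(elapsed_seconds: float, speedup: float) -> float:
    """Time the same run would take if throughput improved by `speedup`."""
    return elapsed_seconds / speedup


CALLS = 1_000_000
REST_SECONDS = 300.0  # approximate REST figure quoted in the text

baseline = calls_per_second(CALLS, REST_SECONDS)
print(f"baseline: {baseline:,.0f} calls/s")
for speedup in (10, 20):
    seconds = elapsed_after_speedup(REST_SECONDS, speedup)
    rate = calls_per_second(CALLS, seconds)
    print(f"{speedup}x: {seconds:.0f} s per million calls, {rate:,.0f} calls/s")
```

Starting from about 3,333 calls per second, a 10x to 20x gain corresponds to roughly 33,000 to 67,000 calls per second on the same hardware, which is the sense in which the same machines could carry 10 to 20 times the traffic.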
Some designers opt for REST with payloads serialized using Protocol Buffers or Apache Thrift. This doubles the toolkit burden and complexity and misses out on the significant benefits to be had by eliminating HTTP, yet gives up the endearing "human readable payload" property typically associated with REST: an altogether unsatisfying combination.
When it comes to performance, Apache Thrift offers a complete package with near REST-class interoperability, significantly improved performance and the widest range of protocol and transport choices.
Reach
Apache Thrift offers in-tree support not only for a wide set of programming languages but also for an impressive range of platforms. Apache Thrift can be a good fit for embedded systems, offering support for Java's Compact implementation and providing small footprint servers in C++ and other languages.
Apache Thrift is a natural fit for typical enterprise development environments, with support for Java/JVM and C#/CLR on Windows, Linux, and OS X. Apache Thrift is also well suited to cloud-native systems, offering minimal, small footprint servers in many languages, ideal for container packaging.
Apache Thrift also integrates well with the web. Native support for languages like JavaScript and Dart can be combined with HTTP, TLS, WebSocket and JSON support in backend systems written in Node.js, C++, Java, C#, etc. Mobile solutions on iOS and Android are also easy to build with support for Objective-C and Java.
Summary
There are many viable communications schemes to choose from today and they all have their place. As a default API option, and particularly if you want broad accessibility over the public Internet, REST may be your best choice. If you need absolute speed, you can write your own native binary protocol or use something edgy like Cap'n Proto (https://capnproto.org/). If you are principally serializing to disk, take a look at Apache Avro. If you want a solid, name brand high-speed serialization system, consider FlatBuffers (https://google.github.io/flatbuffers/), or if you need RPC services as well, perhaps Protocol Buffers.
However, if you want:
Servers and Serialization – a complete serialization and service solution in tree
Modularity – pluggable serialization protocols and transports with a range of provided implementations
Performance – light weight, scalable servers with fast and efficient serialization
Reach – support for an impressive range of languages, protocols and platforms
Rich IDL – language independent support for expressive type and service abstractions
Flexibility – integrated type and service evolution features
Community Driven Open Source – Apache Software Foundation hosted and community managed
... in one package, then Apache Thrift belongs at the top of your consideration list.
A warm thank you from the author and his associates goes out to the many contributors to Open Source software and particularly to Jake, Jens, Roger, and Aki on the Apache Thrift team.
The author is a member of the project management committee for Apache Thrift and the author of The Programmer's Guide to Apache Thrift from Manning Publications [39% discount code: abernethydz].
Published at DZone with permission of Randy Abernethy. See the original article here.