Over a year ago, I was introduced to a new way of writing Web Services called REST. REST is about using the principles of the World Wide Web to build applications. REST stands for REpresentational State Transfer and was first defined in a PhD thesis by Roy Fielding. REST is a set of architectural principles that address the following questions:
- Why is the World Wide Web so prevalent and ubiquitous?
- What makes the Web scale?
- How can I apply the architecture of the Web to my own applications?
While REST has many similarities to the more traditional ways of writing SOA applications, in many important ways it is very different. You would think that my background would be an asset to understanding this new way of creating web services, but unfortunately this was not the case. The reason is that some of the concepts of REST are hard to swallow, especially if you have written successful SOAP or CORBA applications. If your career has a foundation in one of these older technologies, there's a bit of emotional baggage you will have to overcome. For me, it took a few months and a lot of reading. For you, it may be easier. Others will never pick REST over something like SOAP and WS-*. I just ask that you keep an open mind and do some research if I fail to convince you that REST is an intriguing alternative to WS-*. So...
RESTful Architectural Principles
REST isn't protocol specific, but when people talk about REST they usually mean REST over HTTP. Technologies like SOAP use HTTP strictly as a transport protocol and thus use a very small subset of its capabilities. Many would say that WS-* uses HTTP solely to tunnel through firewalls. HTTP is actually a very rich application protocol which gives us things like content negotiation and distributed caching. RESTful web services try to leverage HTTP in its entirety using specific architectural principles. What are those RESTful principles?
- Addressable Resources. Every “thing” on your network should have an ID. With REST over HTTP, every object will have its own specific URI.
- A Uniform, Constrained Interface. When applying REST over HTTP, stick to the methods provided by the protocol. This means following the meaning of GET, POST, PUT, and DELETE religiously.
- Representation oriented. You interact with services using representations of that service. An object referenced by one URI can have different formats available. Different platforms need different formats. AJAX may need JSON. A Java application may need XML.
- Communicate statelessly. Stateless applications are easier to scale.
Let's go into more detail on each of these individual principles.
Addressability is the idea that every object and resource in your system is reachable through a unique identifier. This seems like a no-brainer, but, if you think about it, standardized object identity isn't available in many environments. If you have tried to implement a portable J2EE application you probably know what I mean. In J2EE, distributed and even local references to services are not standardized, which makes portability really difficult. This isn't such a big deal for one application, but with the new popularity of SOA, we're heading to a world where disparate applications must integrate and interact. Not having something as simple as service addressability standardized adds a whole complex dimension to integration efforts.
In the REST world, addressability is handled through the use of URIs. URIs are standardized and well-known. Anybody who has ever used a browser is familiar with URIs. From a URI we know the object's protocol; in other words, we know how to communicate with the object. We know its host and port, that is, where it is on the network. Finally, we know the resource's path on its host, which is its identity on the server where it resides.
Using a unique URI to identify each of your services makes each of your resources linkable. Service references can be embedded in documents or even emails. For instance, consider the situation where somebody calls your company's help desk with a problem with your SOA application. They can email the developers a link to the exact service they had problems with. Furthermore, the data which services publish can also be composed into larger data streams fairly easily.
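As a small illustration of what a URI tells a client, here is a sketch using Python's standard `urllib.parse` module. The URI itself (host, port, and path) is a made-up example, not one taken from a real system.

```python
from urllib.parse import urlsplit

# Hypothetical URI for a customer resource; host, port, and path are assumptions.
uri = "http://customers.example.com:8080/customers/32133"

parts = urlsplit(uri)

protocol = parts.scheme    # how to communicate with the resource
host = parts.hostname      # where the resource lives on the network
port = parts.port          # which port to connect to
path = parts.path          # the resource's identity on that host

print(protocol, host, port, path)
```

Everything a client needs in order to reach the resource is carried in that one string, which is what makes it easy to pass around in documents or emails.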
Consider, for example, an XML document that describes an e-commerce order entry. Such a document can reference data provided by different divisions in a company by embedding their URIs. From these references we can not only obtain information about the linked customer and the products that were bought, but we also have the identifier of the service this data comes from. We know exactly where we can further interact with and manipulate this data if we so desire.
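To make the idea of linked data concrete, here is a minimal sketch of what such an order document could look like and how a client could pull the links out of it. The element names, the order id, and the `example.com` hosts are all hypothetical; only the standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document. Element names and hosts are assumptions
# made for illustration only.
ORDER_XML = """
<order id="111">
  <customer>http://customers.example.com/customers/32133</customer>
  <order-entries>
    <order-entry>
      <quantity>5</quantity>
      <product>http://products.example.com/products/111</product>
    </order-entry>
  </order-entries>
</order>
""".strip()

order = ET.fromstring(ORDER_XML)

# Each link identifies both the data and the service that owns it.
customer_uri = order.findtext("customer")
product_uris = [product.text for product in order.iter("product")]

print(customer_uri)
print(product_uris)
```

The order service never copies customer or product data into its own document. It only points at the services that own that data, and a client can follow those links when it needs more detail.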
The Uniform, Constrained Interface
The REST principle of a constrained interface is perhaps the hardest pill for an experienced CORBA or SOAP developer to swallow. The idea behind it is that you stick to the finite set of operations of the application protocol you're distributing your services upon. For HTTP, this means that services are restricted to using the methods GET, PUT, DELETE, and POST. Let's explain each of these methods:
- GET is a read-only operation. It is both idempotent and safe. Idempotent means that no matter how many times you apply the operation, the result is always the same. The act of reading an HTML document shouldn't change the document. Safe means that invoking a GET does not change the state of the server at all: other than request load, the operation will not affect the server.
- PUT is usually modeled as an insert or update. It is also idempotent. When using PUT, the client knows the identity of the resource it is creating or updating. It is idempotent because sending the same PUT message more than once has no further effect on the underlying service beyond the first request. An analogy is an MS Word document that you are editing. No matter how many times you click the “save” button, the file that stores your document will logically be the same document.
- DELETE is used to remove resources. It is idempotent as well.
- POST is the only non-idempotent and unsafe operation of these four. It is a method where the constraints are relaxed to give some flexibility to the user. In a RESTful system, POST usually models a factory service. Where with PUT you know exactly which object you are creating, with POST you are relying on a factory service to create the object for you.
You may be scratching your head and thinking, “How is it possible to write a distributed service with only 4 methods?” Well... SQL only has 4 operations: SELECT, INSERT, UPDATE, and DELETE. JMS and other MOMs really only have two: send and receive. How powerful are both of these tools? For both SQL and JMS, the complexity of the interaction is confined purely to the data model. The addressability and operations are well defined and finite and the hard stuff is delegated to the data model (in SQL's case) or the message body (in JMS's case).
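The semantics of the four methods described above can be sketched with a toy in-memory store. This is not an HTTP server; the `OrderStore` class, its method names, and the `/orders/...` paths are hypothetical stand-ins meant only to show which operations are safe or idempotent.

```python
import itertools

class OrderStore:
    """Toy in-memory stand-in for a service exposing order resources."""

    def __init__(self):
        self._orders = {}
        self._ids = itertools.count(1)

    def get(self, uri):
        # Safe and idempotent: reading never changes server state.
        return self._orders.get(uri)

    def put(self, uri, representation):
        # Idempotent: the client names the resource, so repeating the
        # call leaves the store in the same state as calling it once.
        self._orders[uri] = dict(representation)

    def delete(self, uri):
        # Idempotent: deleting an already-deleted order changes nothing.
        self._orders.pop(uri, None)

    def post(self, representation):
        # Not idempotent: the server acts as a factory and assigns a
        # new identity on every call.
        uri = f"/orders/generated-{next(self._ids)}"
        self._orders[uri] = dict(representation)
        return uri

    def count(self):
        return len(self._orders)

store = OrderStore()

store.put("/orders/42", {"item": "book"})
store.put("/orders/42", {"item": "book"})   # repeated PUT, same state
count_after_puts = store.count()

first = store.post({"item": "pen"})
second = store.post({"item": "pen"})        # repeated POST, second resource
count_after_posts = store.count()

store.delete("/orders/42")
store.delete("/orders/42")                  # repeated DELETE, same state
count_after_deletes = store.count()
```

Repeating the PUT or the DELETE leaves the store unchanged, while repeating the POST creates a second resource under a different identity.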
Why is the Uniform Interface Important?
Constraining the interface for your web services has many more advantages than disadvantages. Let's look at a few:
If you have a URI that points to a service you know exactly what methods are available on that resource. You don't need an IDL-like file describing what methods are available. You don't need stubs. All you need is an HTTP client library. If you have a document that is composed of links to data provided by many different services, you already know what method to call to pull in data from those links.
HTTP is a ubiquitous protocol. Most programming languages have an HTTP client library available to them. So, if your web service is exposed via REST, there is a very high probability that people who want to use your service will be able to do so without any additional requirements beyond being able to exchange the data formats the service is expecting. With CORBA or WS-* you have to install vendor specific client libraries. How many of you have had the problem of getting CORBA or WS-* vendors to interoperate? It has traditionally been very problematic. The WS-* set of specifications has also been a moving target over the years. So with WS-* and CORBA, you not only have to worry about vendor interoperability, you have to make sure that your client and server are using the same specification version. With REST over HTTP, you don't have to worry about either of these things and can just focus on understanding the data format of the service. I like to think that you are focusing on what is really important: application interoperability, rather than vendor interoperability.
Because REST constrains you to a well-defined set of methods, you have predictable behavior which can have incredible performance benefits. GET is the strongest example. Because GET is a read method that is both idempotent and safe, browsers and HTTP proxies can cache responses from servers, which can save a huge amount of network traffic and hits to your website. Add the capabilities of HTTP 1.1's Cache-Control header, and you have an incredibly rich way of defining caching policies for your services.
It doesn't end with caching though. Consider both PUT and DELETE. Because they are idempotent, neither the client nor the server has to worry about handling duplicate message delivery. This saves a lot of bookkeeping and complex code.
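As a rough sketch of "all you need is an HTTP client library", the snippet below prepares requests against a resource using only Python's standard `urllib.request`. The order URI and the XML body are hypothetical, and nothing is sent over the network; the requests are only constructed.

```python
from urllib.request import Request

def build_request(uri, method, body=None, content_type=None):
    """Prepare an HTTP request for any resource; no stubs or IDL needed."""
    headers = {"Content-Type": content_type} if content_type else {}
    return Request(uri, data=body, headers=headers, method=method)

# Hypothetical resource URI, used for illustration only.
order_uri = "http://orders.example.com/orders/111"

read = build_request(order_uri, "GET")
update = build_request(order_uri, "PUT", b"<order id='111'/>", "application/xml")
remove = build_request(order_uri, "DELETE")

for request in (read, update, remove):
    print(request.get_method(), request.full_url)
```

Passing any of these to `urllib.request.urlopen` would perform the call. The same helper works for every URI a document might link to, because the set of methods is the same everywhere.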
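To show how a cache can exploit GET's predictability, here is a deliberately simplified sketch of a cache that honors the `max-age` directive of `Cache-Control`. The `SimpleCache` class, the `fake_origin` function, and the URI are hypothetical. Real HTTP caches also handle validation, `Vary`, `private`, and more, none of which is modeled here; this sketch simply refuses to store responses marked `no-store` or `no-cache`.

```python
def parse_cache_control(value):
    """Split a Cache-Control header into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        name, _, argument = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = argument.strip().strip('"')
    return directives

class SimpleCache:
    def __init__(self, fetch, clock):
        self._fetch = fetch      # callable(uri) -> (headers dict, body)
        self._clock = clock      # callable() -> current time in seconds
        self._entries = {}       # uri -> (expires_at, body)

    def get(self, uri):
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now < entry[0]:
            return entry[1]      # still fresh: no trip to the origin
        headers, body = self._fetch(uri)
        directives = parse_cache_control(headers.get("Cache-Control", ""))
        max_age = directives.get("max-age", "")
        storable = ("no-store" not in directives
                    and "no-cache" not in directives
                    and max_age.isdigit())
        if storable:
            self._entries[uri] = (now + int(max_age), body)
        else:
            self._entries.pop(uri, None)
        return body

# Usage with a fake origin server and a fake clock (both assumptions).
clock = [0]
origin_hits = []

def fake_origin(uri):
    origin_hits.append(uri)
    return {"Cache-Control": "max-age=60"}, "<customer id='32133'/>"

cache = SimpleCache(fake_origin, lambda: clock[0])
uri = "http://customers.example.com/customers/32133"

cache.get(uri)
cache.get(uri)                         # served from the cache
hits_while_fresh = len(origin_hits)

clock[0] = 61                          # the entry has now expired
cache.get(uri)
hits_after_expiry = len(origin_hits)
```

The cache needs no knowledge of what a customer is. It only relies on GET being safe and on the policy the service declares in its response headers.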
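One way to picture the bookkeeping that idempotency saves is a client-side retry helper. This is a sketch under stated assumptions: `send_with_retry`, `TransportError`, and the flaky transport are hypothetical, and the set of retryable methods is limited to the idempotent ones discussed in this article.

```python
IDEMPOTENT_METHODS = {"GET", "PUT", "DELETE"}

class TransportError(Exception):
    """Raised when a request or its response is lost in transit."""

def send_with_retry(send, method, uri, body=None, attempts=3):
    """Call send(method, uri, body), retrying only when a repeat is harmless."""
    tries = attempts if method in IDEMPOTENT_METHODS else 1
    for attempt in range(1, tries + 1):
        try:
            return send(method, uri, body)
        except TransportError:
            if attempt == tries:
                raise

# Fake transport: the server applies the PUT, but the first reply is lost.
state = {}
failures = {"left": 1}

def flaky_send(method, uri, body):
    if method == "PUT":
        state[uri] = body                 # the change was applied...
    if failures["left"] > 0:
        failures["left"] -= 1
        raise TransportError("response lost")   # ...but the reply never arrived
    return 200

status = send_with_retry(flaky_send, "PUT", "/orders/42", "<order/>")
print(status, state)
```

The PUT is delivered twice, yet the server ends up in the same state as if it had arrived once, so no duplicate detection is needed. A POST is not retried by this helper because a repeat could create a second resource.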
The third architectural principle of REST is that your services should be representation oriented. Each service is addressable through a specific URI and representations are exchanged between the client and service. With a GET operation you are receiving a representation of the current state of that resource. A PUT or POST passes a representation of the resource to the server so that the underlying resource's state can change.
In a RESTful system, the complexity of the client-server interaction is within the representations being passed back and forth. These representations could be XML, JSON, YAML, or really any format you can come up with. One really cool thing about HTTP is that it provides a simple content negotiation protocol between the client and server. Through the Content-Type header, the client or server specifies the type of the representation it is sending. With the Accept header, the client can list its preferred response formats. AJAX clients can ask for JSON, Java for XML, Ruby for YAML. Content negotiation is also very useful for versioning services. The same service can be available through the same URI with the same methods (GET, POST, etc.), and all that changes is the MIME type. For example, the MIME type could be “application/xml” for an old service while newer services could exchange “application/xml;schemaVersion=1.1” MIME types.
All in all, because REST and HTTP have a layered approach to addressability, method choice, and data format, you have a much more decoupled protocol that allows your service to interact with a wide variety of different clients in a consistent way.
The last RESTful principle I will discuss is the idea of statelessness. When I talk about statelessness though, I don't mean that your applications can't have state. In REST, stateless means that there is no client session data stored on the server. The server only records and manages the state of the resources it exposes. If there needs to be session specific data, it should be held and maintained by the client and transferred to the server with each request as needed. A service layer that does not have to maintain client sessions is a lot easier to scale as it has to do a lot less expensive replication in a clustered environment. It's a lot easier to scale up as all you have to do is add machines.
A world without server maintained session data isn't so hard to imagine if you look back 12-15 years. Back then many distributed applications had a fat GUI client written in Visual Basic, PowerBuilder, or Visual C++ talking RPCs to a middle-tier that sat in front of a database. The server was stateless and just processed data. The fat client held all session state. The problem with this architecture was an operational one. It was very hard for operations to upgrade, patch, and maintain client GUIs in large environments. Web applications solved this problem because the applications could be delivered from a central server and rendered by the browser. We started maintaining client sessions on the server because of the limitations of the browser. Now, circa 2008, with the growing popularity of AJAX, Flex, and JavaFX, the browsers are sophisticated enough to maintain their own session state like their fat-client counterparts in the mid-90s used to do. We can now go back to that stateless scalable middle tier that we enjoyed in the past. It's funny how things go full circle sometimes.
REST identifies the key architectural principles of why the World Wide Web is so prevalent and scalable. The next step in the evolution of the web is to apply these principles to the semantic web and the world of web services. REST offers a simple, interoperable, and flexible way of writing web services that can be very different from the RPC mechanisms like CORBA and WS-* that so many of us have had training in.
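Here is a minimal sketch of server-side content negotiation driven by the Accept header. The function names, the customer data, and the two supported media types are assumptions made for illustration. The parsing is simplified as well: it understands `q` values and `*/*`, but not partial wildcards such as `application/*`.

```python
import json
import xml.etree.ElementTree as ET

def parse_accept(header):
    """Return (quality, media_type) pairs from an Accept header, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        media_type, *params = [piece.strip() for piece in item.split(";")]
        if not media_type:
            continue
        quality = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        ranked.append((-quality, position, media_type.lower()))
    return [(-neg_q, media_type) for neg_q, _, media_type in sorted(ranked)]

def negotiate(accept_header, available):
    """Pick the first acceptable media type the service can produce."""
    for quality, media_type in parse_accept(accept_header):
        if quality <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
    return None

def to_json(customer):
    return json.dumps(customer, sort_keys=True)

def to_xml(customer):
    root = ET.Element("customer")
    for key, value in customer.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

RENDERERS = {"application/json": to_json, "application/xml": to_xml}

def represent(customer, accept_header):
    """Return (status, content_type, body) for one resource."""
    media_type = negotiate(accept_header, list(RENDERERS))
    if media_type is None:
        return 406, None, None          # 406 Not Acceptable
    return 200, media_type, RENDERERS[media_type](customer)

customer = {"id": 32133, "name": "Bill"}   # hypothetical resource state
print(represent(customer, "application/json"))
print(represent(customer, "application/xml;q=0.9, application/json;q=0.5"))
```

The resource and its URI stay the same in both calls. Only the representation changes, based on what the client says it prefers.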
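The following sketch illustrates the idea with paging through a product list. The catalog, the handler, and the `Client` class are hypothetical. The point is only that the client keeps its own position and sends it with every request, so any server instance can answer.

```python
# Hypothetical resource state managed by the service.
CATALOG = [f"product-{n}" for n in range(1, 8)]

def handle_list_products(request):
    """Stateless handler: everything it needs arrives in the request."""
    offset = int(request.get("offset", 0))
    limit = int(request.get("limit", 3))
    end = offset + limit
    next_offset = end if end < len(CATALOG) else None
    return {"items": CATALOG[offset:end], "next_offset": next_offset}

class Client:
    """Keeps the paging position itself instead of using a server session."""

    def __init__(self, servers):
        self._servers = servers   # interchangeable server instances
        self._offset = 0          # session state lives on the client
        self._turn = 0

    def next_page(self):
        # Rotate through the servers to show that none holds a session.
        server = self._servers[self._turn % len(self._servers)]
        self._turn += 1
        response = server({"offset": self._offset, "limit": 3})
        if response["next_offset"] is not None:
            self._offset = response["next_offset"]
        return response["items"]

# Two "machines" running the same handler, with no shared session memory.
client = Client([handle_list_products, handle_list_products])

page_one = client.next_page()
page_two = client.next_page()
page_three = client.next_page()
print(page_one, page_two, page_three)
```

Because no request depends on which machine served the previous one, adding capacity is a matter of adding another copy of the handler behind the load balancer.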
This article is the first of a two part series. In this article I wanted to introduce you to the basic concepts of REST. In my next article “Putting Java to REST”, we will build a very simple RESTful service in Java using the new JCP standard JAX-RS. In other words, you'll get to see the theory being put into action. Until then, I urge you to read more about REST. Below are some interesting links.
Read Part II of this series: Putting Java to REST
About the Author
Bill Burke is an engineer and Fellow at the JBoss division of Red Hat. He blogs regularly at bill.burkecentral.com.