HTTP is the protocol used to deliver our web applications to the end user; moreover, in every application integration based on REST it is the transport protocol for communication between the different components or services.
Given this omnipresence, knowing the HTTP specification by heart, along with the features already built into the protocol, can only be beneficial. I'm surprised that the web is full of theoretical material and explanations of the standard, yet there is no REST or HTTP exercise that encourages practising with it.
So last Friday we collected some resources and set HTTP as the target of our book club session. As Deming said, there is no substitute for knowledge.
Reading the HTTP spec documents would be very boring in a group setting. Furthermore, the RFCs are distilled and sometimes isolated from each other (HTTP 1.0 vs 1.1). While they are definitely a good background read, specific teaching material is going to win in efficiency.
I selected Fabien Potencier's Caching on the edge presentation as the base guideline to follow. This slide deck presents the caching features of HTTP as used by servers, clients and caches in the path between them; it's an excellent excuse to review beforehand the basic concepts of HTTP such as methods, response codes, and headers.
Stack Overflow is also a very refined source of theoretical explanations: whenever you are in doubt about how to use a specific header or about the difference between Expires and Cache-Control, you can find an answer there. Such an answer has emerged from peer review and has been produced after years of widespread use of the protocol, unlike the original RFCs. It's definitely going to beat W3schools.
Theory is useful but practice makes perfect (and what we practise is also retained better in our brains). The standard tool for performing HTTP requests is curl:
curl -v -X POST -H 'Accept: application/json' -d 'param=value1&another=value2' http://example.com/resource
You can start by targeting your own application to see the headers that you are explicitly setting in responses and the ones added by your container (Apache, Tomcat, or a framework). For example, this is a resource that we use for testing our HTTP clients:
$ curl -v -X GET http://www.onebip.com/smsctest/answer.php
* About to connect() to www.onebip.com port 80 (#0)
* Trying 126.96.36.199... connected
> GET /smsctest/answer.php HTTP/1.1
> User-Agent: curl/7.22.0 (i686-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/188.8.131.52 libidn/1.23 librtmp/2.3
> Host: www.onebip.com
> Accept: */*
< HTTP/1.1 200 OK
< Date: Thu, 14 Nov 2013 20:44:54 GMT
< Server: Apache
< X-Onebip-Node: ws-b-46.production.onebip.com
< Vary: Accept-Encoding
< Content-Length: 2
< Content-Type: text/html; charset=utf-8
< Cache-control: private
< Set-Cookie: SERVERID=01-FUSF5JEJKQ68M; path=/
< Via: 1.1 www.onebip.com
< Connection: close
* Closing connection #0
To experiment with caching behaviors, we instead used the debian.org ISO download links. They feature:
- redirects and appropriate status codes (302, 304)
- cache headers for conditional requests (Last-Modified, ETag)
Large files such as an .iso image make it really clear why GET is the most optimized operation on the web.
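To see why a conditional GET saves so much on a large file, here is a minimal Python sketch of the ETag round trip. It does not talk to debian.org: it starts a throwaway local server, and the handler, the stand-in payload and the /debian.iso path are all assumptions made up for the illustration. The first request gets a 200 with the full body, the second one sends If-None-Match and gets a bodyless 304.

```python
import hashlib
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in payload (hypothetical); a real .iso would be hundreds of megabytes.
BODY = b"pretend this is a large .iso image"
# A strong ETag derived from the content, quoted as the header syntax requires.
ETAG = '"' + hashlib.sha256(BODY).hexdigest()[:16] + '"'

# Opener that ignores proxy settings from the environment, so that the
# request really goes to the local server.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class IsoHandler(BaseHTTPRequestHandler):
    """Toy origin server: answers 304 when the client's validator matches."""

    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            # The client copy is still fresh: send headers only, no body.
            self.send_response(304)
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(url, etag=None):
    """Return (status, etag, body); sends If-None-Match when etag is given."""
    request = urllib.request.Request(url)
    if etag is not None:
        request.add_header("If-None-Match", etag)
    try:
        with OPENER.open(request) as response:
            return response.status, response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as error:
        # urllib reports every non-2xx status, 304 included, as an HTTPError.
        with error:
            return error.code, error.headers.get("ETag"), b""


def demo():
    server = HTTPServer(("127.0.0.1", 0), IsoHandler)  # port 0: pick a free one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/debian.iso" % server.server_address[1]
    try:
        first = fetch(url)                   # full download
        second = fetch(url, etag=first[1])   # revalidation only
    finally:
        server.shutdown()
        server.server_close()
    return first, second


if __name__ == "__main__":
    for status, etag, body in demo():
        print(status, etag, len(body), "bytes")
```

The same exchange can be reproduced by hand with curl by passing the validator in an `-H 'If-None-Match: ...'` header.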
I've learned quite a bit about system administrators' concerns regarding caches and HTTP server configuration. Developers instead shared their usage of HTTP headers on the server side to trigger particular behaviors, from the classic layout selection by User-Agent to content negotiation based on Accept, Accept-Language and Accept-Encoding.
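As an illustration of content negotiation on Accept, here is a small Python sketch of how a server might pick a representation. The function names are hypothetical, and the matching is deliberately simplified: it honours q-values and wildcards, but it does not implement the full precedence rules of the specification (for instance, a more specific media range does not override a wildcard here).

```python
def parse_accept(header):
    """Return (media_range, q) pairs, highest q first; ties keep header order."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_range, q))
    # sorted() is stable, so equal q-values stay in the order the client sent.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def matches(media_range, media_type):
    """True if a range such as text/* or */* covers the concrete media type."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    type_, _, sub = media_type.partition("/")
    return range_type == type_ and range_sub in ("*", sub)


def negotiate(accept_header, available):
    """Pick the first available type covered by the client's best range."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means the client explicitly refuses this range
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None  # a server would typically answer 406 Not Acceptable


if __name__ == "__main__":
    offered = ["text/html", "application/json"]
    print(negotiate("text/html;q=0.5, application/json", offered))
```

Accept-Language and Accept-Encoding follow the same comma-separated, q-weighted shape, so a similar parser would apply to them.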
It was also a nice occasion to define responsibilities, so that devs and ops avoid stepping on each other's toes. For example:
- application developers set the value of Cache-Control headers, to define what is cacheable and what is not.
- system administrators enable server-side caches like Apache mod_cache. Caching of full responses is then application-independent and does not touch the code written in your favorite programming language.
- client-side caches instead must be used by the client explicitly. In a SOA environment it seems server-side ones are enough.
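To make this split concrete, here is a rough Python sketch of the decision a shared server-side cache takes when it reads the Cache-Control value chosen by the application. The function names are hypothetical and the rule is a simplification: a real cache also looks at the status code, Expires, Vary and heuristics, and this parser does not handle quoted values that contain commas.

```python
def parse_cache_control(header):
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"') or None
    return directives


def shared_cache_may_store(header):
    """Simplified rule: store only when the application explicitly allows it."""
    directives = parse_cache_control(header)
    # private: only the end user's own cache may keep the response;
    # no-store: nobody may keep it.
    if "private" in directives or "no-store" in directives:
        return False
    return any(name in directives for name in ("public", "s-maxage", "max-age"))


if __name__ == "__main__":
    for value in ("private", "public, max-age=3600", "no-store"):
        print(value, "->", shared_cache_may_store(value))
```

With the `Cache-control: private` header from the curl transcript earlier, this sketch keeps the response out of the shared cache, which is the behavior the application asked for.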
The scope of what you can explore in a single session is limited by our minds' bandwidth and by the time allocated (the minimum of the two). Don't overdo it.
I assume that, like us, you are a web developer working with RESTful web services inside and on the boundary of an application. Knowing HTTP is then basic knowledge: it is a building block comparable to CPU and memory, or for a better comparison TCP. Being a 20+ year old standard that has undergone some evolution, it is also stable knowledge, meaning that what you learn by practising with it will last much longer than what you learn by playing with a new framework or tool.
However, don't rest on your laurels: HTTPbis is being worked on...