The HTML 5 proposal contains many new and interesting ideas. In particular, we'll be discussing WebSocket at the Silicon Valley WebBuilder panel (so please submit your questions). To provide some background, let's take a closer look at WebSocket and consider it against potential small modifications to HTTP usage (such as keeping TCP connections distinct when "window" objects are distinct).
WebSocket essentially functions as follows: after opening an HTTP connection, the client sends an "Upgrade: WebSocket" request, whereupon the underlying TCP connection is used to stream 0xFF-separated UTF-8 messages in both directions. We can now look at the specifics in more detail:
Does WebSocket use TCP ports 81 and 815?
The use of new ports requires firewall and proxy configuration, which many IT administrators will resist. Moreover, these ports appear to have been suggested without consulting IANA. They appear to be available, but due process calls for consulting IANA before appropriating well-known ports. The most likely outcome for widespread use is that WebSocket would "upgrade" port 80, as described below.
How does WebSocket make use of an HTTP connection on port 80?
20 48 54 54 50 2f 31 2e 31 0d 0a 55 70 67 72 61
64 65 3a 20 57 65 62 53 6f 63 6b 65 74 0d 0a 43
6f 6e 6e 65 63 74 69 6f 6e 3a 20 55 70 67 72 61
64 65 0d 0a
For documenting the protocol, perhaps it would make sense to simply give the client/server interaction in ASCII, rather than specifying the exact sequence of bytes used to interact with the remote HTTP server for "Upgrade: WebSocket". Note here that the flexibility of HTTP is being used effectively.
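To illustrate the point about ASCII, the byte listing above can be decoded directly. The following is a minimal sketch; the variable names are illustrative, and the dump is decoded exactly as shown, which appears to begin partway through the request line.

```python
# Hex bytes copied verbatim from the listing above.
HEX_DUMP = """
20 48 54 54 50 2f 31 2e 31 0d 0a 55 70 67 72 61
64 65 3a 20 57 65 62 53 6f 63 6b 65 74 0d 0a 43
6f 6e 6e 65 63 74 69 6f 6e 3a 20 55 70 67 72 61
64 65 0d 0a
"""

# Remove all whitespace, then turn the hex digits into bytes and decode as ASCII.
decoded = bytes.fromhex("".join(HEX_DUMP.split())).decode("ascii")

# Show each CRLF-terminated line with its line ending made visible.
for line in decoded.split("\r\n")[:-1]:
    print(repr(line + "\r\n"))
```

The decoded text is the tail of a request line (" HTTP/1.1") followed by the "Upgrade: WebSocket" and "Connection: Upgrade" headers, which is considerably easier to read than the byte sequence.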
Does WebSocket obey the same origin policy?
The "same origin policy" is one of the cornerstones of web security. Essentially, executable page content can only establish a connection to the server that the user has loaded the page from. Many of the recent security exploits on the web (such as the Gmail address book exploit and clickjacking) arise because of subtle breakdowns in same-origin enforcement. It is not clear whether WebSocket is intended to follow the same-origin policy: no failure condition is documented for a URL that does not refer to the originating host. For the safety of the web, we should insist that this policy remain in place.
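As a rough illustration of the check being asked for, the sketch below compares the host of the page with the host of the socket URL. The function name `refers_to_originating_host` and the `ws://` example URLs are assumptions made for illustration. A full same-origin comparison would also consider scheme and port, which is left out here because a WebSocket URL necessarily differs from the page URL in scheme.

```python
from urllib.parse import urlsplit


def refers_to_originating_host(page_url: str, socket_url: str) -> bool:
    """Return True when both URLs name the same host (case-insensitive).

    Hypothetical helper: compares hosts only, not scheme or port.
    """
    page_host = urlsplit(page_url).hostname
    socket_host = urlsplit(socket_url).hostname
    # urlsplit lowercases the hostname; a missing host never matches.
    return page_host is not None and page_host == socket_host


# A page connecting back to the server it was loaded from is accepted.
print(refers_to_originating_host("http://example.com/app", "ws://example.com:81/chat"))
# A connection to a different host would be the undocumented failure case.
print(refers_to_originating_host("http://example.com/app", "ws://other.example.net/chat"))
```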
Is WebSocket restricted to the two-connection limit of HTTP?
Can WebSocket read and write arbitrarily as with low-level socket APIs?
WebSocket communication is restricted to the WebSocket protocol (which includes the connection setup and the 0xFF-delimited UTF-8 messages). It is argued that this improves security because WebSocket clients are unlikely to be able to attack existing network services. However, if WebSocket becomes popular, the majority of internet-facing systems will have applications that are vulnerable to attack through their WebSocket interface. Is the short-term benefit worth the long-term loss in flexibility (especially considering that a variety of existing plugins allow low-level socket interaction with the originating host)?
How does WebSocket delineate messages?
WebSocket framing terminates messages with 0xFF. This is efficient in terms of byte usage, but framing errors could easily occur due to stray binary data (and keep in mind that a framing failure is a critical failure in a protocol). Further, detecting such framing errors would not be obvious from inspecting the TCP stream. (In contrast, MIME framing is unambiguous and requires no internal escaping of binary messages.)
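The framing concern can be made concrete with a small sketch. It assumes only what is described above, namely that each message is terminated by a single 0xFF byte; the function name `split_frames` is illustrative, and any other framing bytes a real draft may define are ignored. Well-formed UTF-8 never contains the byte 0xFF, so the failure shown arises when a sender emits data that is not UTF-8.

```python
def split_frames(stream: bytes) -> tuple[list[bytes], bytes]:
    """Split a byte stream on 0xFF terminators.

    Returns the complete frames and any unterminated trailing bytes.
    """
    *complete, leftover = stream.split(b"\xff")
    return complete, leftover


# Two text messages, each terminated by 0xFF, are recovered as sent.
text_stream = "hello".encode("utf-8") + b"\xff" + "world".encode("utf-8") + b"\xff"
print(split_frames(text_stream))

# One binary message that happens to contain 0xFF is silently split in two,
# and nothing in the stream itself reveals that a framing error occurred.
binary_message = b"\x01\xff\x02"
print(split_frames(binary_message + b"\xff"))
```

The receiver in the second case sees two plausible frames, which is why such an error would be hard to spot by inspecting the TCP stream.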
How are function call semantics implemented over WebSocket?
WebSocket enforces no relationship between messages sent and received; multiple messages may be received from the server subsequent to a client message being sent to the server. This is not necessarily a drawback of the protocol; it is simply important to keep in mind that the request/response structure familiar on the web with HTTP is not enforced by WebSocket.
Is WebSocket easy to implement?
On the surface, implementation is straightforward; however, it is important to note that writing can occur simultaneously at both ends of the connection. If both ends attempt to write an amount larger than their TCP output buffers, deadlock can occur. The point here is not that the protocol should be designed to avoid simultaneous writing (as with HTTP 1.0) -- this is necessary to obtain the event-based interactivity we are after. The point is that WebSocket implementations added ad-hoc to many different applications would lead to problems; in other words, ease of implementation is not as important as correctness in the protocol.
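The condition behind this deadlock can be observed with a short sketch. It uses a local socket pair rather than a real TCP connection, and the function name and 16 MB payload size are arbitrary choices for illustration. The writer is made non-blocking so that the example stops instead of hanging once the buffers are full.

```python
import socket


def bytes_accepted_before_blocking(payload_size: int = 16 * 1024 * 1024) -> int:
    """Count how many bytes a socket accepts while its peer never reads."""
    writer, idle_peer = socket.socketpair()
    try:
        writer.setblocking(False)
        payload = memoryview(bytes(payload_size))
        sent = 0
        while sent < payload_size:
            try:
                sent += writer.send(payload[sent:])
            except BlockingIOError:
                # Buffers are full; a blocking send would wait here
                # until the peer reads some data.
                break
        return sent
    finally:
        writer.close()
        idle_peer.close()


accepted = bytes_accepted_before_blocking()
print(f"accepted {accepted} bytes before the write would block")
```

If both ends used blocking writes and each reached this point at the same time, neither would return to its read loop, which is the deadlock described above. Implementations typically avoid it by multiplexing reads and writes rather than finishing a write before reading.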
Can we just upgrade HTTP?
One interpretation, then, is that the greatest benefit of WebSocket is its unspecified behavior in terms of TCP connections. Are there simple things that we can do to improve HTTP for use with Ajax Push and Comet? After all, we want to make use of the framing and metadata features of HTTP, as well as benefit from its many standard and widely deployed implementations.
The next step is to fully support HTTP 1.1 from the browser (specifically, pipelining). By calling enablePipelining(true) on an XMLHttpRequest object, multiple push notification requests could be sent to the server without waiting for one of the two TCP connections to be freed. When a notification became available for one of the requests, all intermediate requests would be unblocked with no-op responses. Again, this would allow more straightforward multi-window push implementations.
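The no-op behavior can be sketched from the server's point of view. This is a simulation, not browser or server code: the function name `flush_in_order` and the window identifiers are hypothetical, and it assumes that "intermediate" requests are those queued ahead of the one with a notification, since pipelined HTTP responses must be returned in request order.

```python
from collections import deque
from typing import Optional


def flush_in_order(
    pending: deque, ready_id: str, payload: str
) -> list[tuple[str, Optional[str]]]:
    """Answer pipelined requests in order until ready_id has its payload.

    Requests queued ahead of ready_id receive a no-op (None) response.
    """
    if ready_id not in pending:
        raise ValueError(f"no pending request {ready_id!r}")
    responses = []
    while pending:
        request_id = pending.popleft()
        if request_id == ready_id:
            responses.append((request_id, payload))
            break
        responses.append((request_id, None))  # no-op to unblock the queue
    return responses


# Three windows each have a push request in the pipeline.
pipeline = deque(["window-1", "window-2", "window-3"])
print(flush_in_order(pipeline, "window-2", "new mail"))
print(list(pipeline))  # window-3 is still waiting
```

In this sketch, window-1 would have to reissue its request after receiving the no-op, which is the overhead that grows as more windows are open.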
Finally, we should consider extensions to the HTTP protocol itself, since a flurry of no-op responses when many windows are open is not efficient. With the introduction of a RequestTag HTTP header, an HTTP response could be uniquely associated with a request (other than by virtue of its order in the queue). This would allow out-of-order responses to pipelined requests, and would make it possible to use HTTP in an event-driven fashion. Note that this is not just useful for notification-style applications; control over response ordering can reduce latency and server buffering requirements. With support for out-of-order responses, it would be desirable to have control over which TCP connection is used for a given request. This could be controlled through an optionally specified connection "name".
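A client-side view of the proposed RequestTag header might look like the sketch below. RequestTag is the header suggested above and is not part of HTTP; the class `TagDispatcher`, its methods, and the tag format are all hypothetical. The idea is simply that each outstanding request registers a callback under its tag, and a response is routed by tag rather than by its position in the queue.

```python
from typing import Callable


class TagDispatcher:
    """Route responses to callbacks by their RequestTag value."""

    def __init__(self) -> None:
        self._callbacks: dict[str, Callable[[str], None]] = {}
        self._next_id = 0

    def register(self, callback: Callable[[str], None]) -> str:
        """Store a callback and return the tag to send with the request."""
        tag = f"req-{self._next_id}"
        self._next_id += 1
        self._callbacks[tag] = callback
        return tag

    def dispatch(self, headers: dict[str, str], body: str) -> None:
        """Deliver a response to the request that carries the same tag."""
        callback = self._callbacks.pop(headers["RequestTag"])
        callback(body)


received: list[str] = []
dispatcher = TagDispatcher()
first = dispatcher.register(lambda body: received.append(f"first: {body}"))
second = dispatcher.register(lambda body: received.append(f"second: {body}"))

# The response to the second request arrives before the first.
dispatcher.dispatch({"RequestTag": second}, "notification")
dispatcher.dispatch({"RequestTag": first}, "late result")
print(received)
```

With tags in place, no no-op responses are needed: a request can stay pending for as long as necessary without holding up the responses behind it.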