WebSocket is neither Web nor Socket
WebSocket essentially functions as follows: after initiating an HTTP connection, the client sends an HTTP "Upgrade: WebSocket" request, whereupon the underlying TCP connection is used to bidirectionally stream 0xFF-separated UTF-8 messages. We can now look at the specifics in more detail:
Does WebSocket use TCP ports 81 and 815?
The use of new ports requires firewall and proxy configuration, which
will be resisted by many IT administrators. Moreover, these ports
appear to have been suggested without consulting IANA.
The ports appear to be available, but it is a matter of due process to
consult IANA before appropriating well-known ports. The most likely
outcome for widespread use is that WebSocket would "upgrade" port 80 as
below.
How does WebSocket make use of an HTTP connection on port 80?
20 48 54 54 50 2f 31 2e 31 0d 0a 55 70 67 72 61
64 65 3a 20 57 65 62 53 6f 63 6b 65 74 0d 0a 43
6f 6e 6e 65 63 74 69 6f 6e 3a 20 55 70 67 72 61
64 65 0d 0a
For documenting the protocol, perhaps it would make sense to simply give the client/server interaction in ASCII, rather than specifying the exact sequence of bytes used to interact with the remote HTTP server for "Upgrade: WebSocket". Note here that the flexibility of HTTP is being used effectively.
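To illustrate how much easier the ASCII form is to read, the hex fragment shown in this section can be decoded with a few lines of Python. This is only a sketch: the fragment starts partway through the request line, so the method and path that would precede " HTTP/1.1" are not reconstructed here, and the names HEX_DUMP and decode_hex_dump are my own.

```python
# Hex bytes exactly as they appear in the fragment in this section.
# The beginning of the request line is not part of the fragment.
HEX_DUMP = """
20 48 54 54 50 2f 31 2e 31 0d 0a 55 70 67 72 61
64 65 3a 20 57 65 62 53 6f 63 6b 65 74 0d 0a 43
6f 6e 6e 65 63 74 69 6f 6e 3a 20 55 70 67 72 61
64 65 0d 0a
"""


def decode_hex_dump(dump: str) -> str:
    """Turn whitespace-separated hex byte values into ASCII text."""
    compact = "".join(dump.split())  # drop spaces and newlines
    return bytes.fromhex(compact).decode("ascii")


decoded = decode_hex_dump(HEX_DUMP)
print(repr(decoded))
```

The decoded text contains the "Upgrade: WebSocket" and "Connection: Upgrade" header lines, each ending in CRLF, which is all the byte listing is really saying.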
Does WebSocket obey the same origin policy?
The "same
origin policy" is one of the cornerstones of web security. Essentially,
executable page content can only establish a connection to the server
that the user has loaded the page from. Many of the recent security
exploits on the web (such as the Gmail address book exploit and clickjacking)
arise because of subtle breakdowns in same-origin enforcement. It is
not clear whether WebSocket is intended to follow the same-origin
policy or not (a failure condition when the URL does not refer to the
originating host is not documented) but for the safety of the web, we
should insist that this policy remain in place.
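As a rough sketch of the failure condition that seems to be missing, a client could refuse any WebSocket URL whose host and port differ from those of the page. The function names, the example URLs, and the assumption that "ws" and "wss" default to the HTTP ports are all mine, not something the WebSocket proposal states; a browser's real same-origin check also compares the scheme.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed default ports; "ws"/"wss" reusing 80/443 is an assumption.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def host_and_port(url: str) -> Tuple[str, Optional[int]]:
    """Extract the (host, port) pair, filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.hostname or "").lower(), port


def may_connect(page_url: str, target_url: str) -> bool:
    """Allow the connection only if the target is the originating server."""
    return host_and_port(page_url) == host_and_port(target_url)


print(may_connect("http://example.com/page", "ws://example.com/feed"))
print(may_connect("http://example.com/page", "ws://other.example.net/feed"))
```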
Is WebSocket restricted to the two-connection limit of HTTP?
This
does not appear to be specified. However, since the WebSocket protocol
makes no use of metadata, chaos would ensue if a single connection were
used to multiplex the traffic of different WebSocket instances. The
most natural interpretation is that a new TCP socket is created for
each JavaScript construction of a WebSocket object. Typical usage, such
as for standalone Ajax components, would have a WebSocket created for
each component on the page, potentially resulting in hundreds of
connections to the server. Strangely enough, the two-connection limit
is the only fundamental aspect that makes using HTTP for Ajax Push
difficult, and if we had control over how XMLHttpRequest used the
underlying TCP connections, we would be in much better shape. The most
dramatic benefit (and greatest risk to scalability) of WebSocket must
not be left unspecified. Note that socket establishment is expensive, so
providing a way to multiplex different endpoints of a protocol over a
single connection (as HTTP can) is a useful optimization.
Can WebSocket read and write arbitrarily as with low-level socket APIs?
WebSocket
communication is restricted to the WebSocket protocol (which includes
the connection setup and the 0xFF-delineated UTF-8 messages). It is
argued that this improves security because WebSocket clients are
unlikely to be able to attack existing network services. However, if
WebSocket becomes popular, the majority of internet-facing systems will
have applications that are vulnerable to attack through their WebSocket
interface. Is the short-term benefit worth the long-term loss in
flexibility (especially considering that a variety of existing plugins
allow low-level socket interaction with the originating host)?
How does WebSocket delineate messages?
WebSocket framing
terminates messages with 0xFF. This is efficient in terms of byte
usage, but framing errors could easily occur due to stray binary data
(and keep in mind that a framing failure is a critical failure in a
protocol). Further, detecting such framing errors would not be obvious
from inspecting the TCP stream. (In contrast, MIME framing is
unambiguous and requires no internal escaping of binary messages.)
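The framing concern can be sketched in a few lines. This follows the simplified description used in this article (each message is UTF-8 text followed by a single 0xFF byte), not the exact frame layout of any particular draft, and the helper names are my own. Valid UTF-8 never contains the byte 0xFF, so text is safe; the trouble starts when arbitrary binary data ends up in the stream.

```python
from typing import List

TERMINATOR = b"\xff"


def frame(messages: List[str]) -> bytes:
    """Encode each message as UTF-8 and terminate it with 0xFF."""
    return b"".join(m.encode("utf-8") + TERMINATOR for m in messages)


def split_frames(stream: bytes) -> List[bytes]:
    """Split a byte stream on 0xFF; the piece after the last terminator is dropped."""
    return stream.split(TERMINATOR)[:-1]


# Text round-trips cleanly because UTF-8 never produces a 0xFF byte.
text_frames = split_frames(frame(["héllo", "wörld"]))

# One binary payload that happens to contain 0xFF is silently cut in two.
binary_payload = b"\x01\xff\x02"
broken_frames = split_frames(binary_payload + TERMINATOR)

print(len(text_frames), len(broken_frames))
```

Note that the receiver sees nothing wrong in the second case: it simply receives two short messages where one was sent, which is why such an error is hard to spot from the TCP stream alone.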
How are function call semantics implemented over WebSocket?
WebSocket
enforces no relationship between messages sent and received; multiple
messages may be received from the server subsequent to a client message
being sent to the server. This is not necessarily a drawback of the
protocol; it is simply important to keep in mind that the
request/response structure familiar on the web with HTTP is not
enforced by WebSocket.
Is WebSocket easy to implement?
On the surface,
implementation is straightforward; however, it is important to note
that writing can occur simultaneously at both ends of the connection.
If both ends attempt to write an amount larger than their TCP output
buffers, deadlock can occur. The point here is not that the protocol
should be designed to avoid simultaneous writing (as with HTTP 1.0);
simultaneous writing is necessary to obtain the event-based interactivity we are after.
The point is that WebSocket implementations added ad-hoc to many
different applications would lead to problems; in other words, ease of
implementation is not as important as correctness in the protocol.
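The deadlock scenario can be reproduced in miniature. The sketch below uses a local socket pair from Python's standard library as a stand-in for a real TCP connection (an assumption made so the example runs anywhere), and non-blocking sockets so that a full buffer shows up as an error instead of a hang. With blocking sockets, both writers would simply wait forever.

```python
import socket

CHUNK = b"x" * 4096


def fill_until_blocked(sock: socket.socket) -> int:
    """Keep writing, never reading, until the send buffer refuses more data."""
    total = 0
    while True:
        try:
            total += sock.send(CHUNK)
        except BlockingIOError:
            return total


def drain(sock: socket.socket) -> int:
    """Read everything that is currently waiting on the socket."""
    total = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return total
        if not data:
            return total
        total += len(data)


def can_write(sock: socket.socket) -> bool:
    """Report whether the socket accepts even a tiny write right now."""
    try:
        return sock.send(b"ping") > 0
    except BlockingIOError:
        return False


left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

# Both ends write at once and neither reads: both directions fill up.
sent_left = fill_until_blocked(left)
sent_right = fill_until_blocked(right)
stuck = not can_write(left) and not can_write(right)

# As soon as one end reads, the other end can make progress again.
drained = drain(right)
recovered = can_write(left)

left.close()
right.close()
print(stuck, recovered)
```

An implementation therefore has to interleave reading with writing (for example with an event loop or a dedicated reader thread), which is exactly the kind of detail an ad-hoc implementation is likely to miss.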
Can we just upgrade HTTP?
So, on one interpretation, the greatest benefit of WebSocket is its
unspecified behavior in terms of TCP connections. Are there simple
things that we can do to improve HTTP for use with Ajax Push and Comet?
After all, we want to make use of the framing and metadata features of
HTTP, as well as benefit from its many standard and widely deployed
implementations.
The first step is to allow HTTP to benefit (in a reasonable way) from the connection behavior that WebSocket leaves unspecified: if two JavaScript object contexts do not share a "window" object, they should not (by default) share TCP connections. This would allow multiple browser windows/tabs to open notification connections to the server without interference and without complex inter-window coordination for the purpose of sharing a single connection. This step requires no API or protocol changes.
The next step is to fully support HTTP 1.1 from the browser (specifically, pipelining). By calling enablePipelining(true) on an XMLHttpRequest object, multiple push notification requests could be sent to the server without waiting for one of the two TCP connections to be freed. When a notification was available for one of the requests, all intermediate requests would be unblocked with no-op responses. Again, this would allow more straightforward multi-window push implementations.
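The no-op behavior can be modeled with a small function. This is only a sketch of the server's side of the idea under the assumption that pipelined responses must go out in request order; the function name, the window labels, and the "no-op" marker are hypothetical.

```python
from typing import List, Tuple


def respond_in_order(
    pipeline: List[str], ready: str, payload: str
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Answer the ready request, sending no-ops to every request queued ahead of it.

    Returns the responses in the order they are sent and the requests
    that remain open afterwards.
    """
    position = pipeline.index(ready)
    responses = [(request, "no-op") for request in pipeline[:position]]
    responses.append((ready, payload))
    return responses, pipeline[position + 1:]


responses, still_waiting = respond_in_order(
    ["window-1", "window-2", "window-3"], "window-3", "new mail"
)
print(responses, still_waiting)
```

In this model a notification for the last of three pipelined requests costs two no-op responses, and the two unblocked windows then have to issue fresh requests.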
Finally, we should consider extensions to the HTTP protocol itself, since a flurry of no-op responses when many windows are open is not efficient. With the introduction of a RequestTag HTTP header, an HTTP response could be uniquely associated with a request (other than by virtue of its order in the queue). This would allow out-of-order responses to pipelined requests, and would make it possible to use HTTP in an event-driven fashion. Note that this is not just useful for notification-style applications; control over response ordering can reduce latency and server buffering requirements. With support for out-of-order responses, it would be desirable to have control over which TCP connection is used for a given request. This could be controlled through an optionally specified connection "name".
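A minimal sketch of the client-side bookkeeping that a RequestTag header would make possible is shown below. The header name comes from the proposal above; the tag format (a simple counter), the class name, and the example paths are assumptions made for illustration only.

```python
from typing import Dict, Tuple


class TaggedRequests:
    """Client-side bookkeeping for requests awaiting a tagged response."""

    def __init__(self) -> None:
        self._counter = 0
        self._pending: Dict[str, str] = {}

    def send(self, path: str) -> Dict[str, str]:
        """Record a request and return the headers it would carry."""
        self._counter += 1
        tag = str(self._counter)
        self._pending[tag] = path
        return {"RequestTag": tag}

    def receive(self, headers: Dict[str, str], body: str) -> Tuple[str, str]:
        """Match a response to its request by tag, regardless of arrival order."""
        path = self._pending.pop(headers["RequestTag"])
        return path, body

    def waiting(self) -> int:
        """Number of requests that have not been answered yet."""
        return len(self._pending)


client = TaggedRequests()
first = client.send("/notify/inbox")
second = client.send("/notify/chat")
third = client.send("/notify/calendar")

# The server answers the third request first; the other two stay open.
matched = client.receive({"RequestTag": third["RequestTag"]}, "meeting moved")
print(matched, client.waiting())
```

Because the response names the request it belongs to, no no-op responses are needed for the two requests that were queued earlier.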
From Ted's Ajax Adventure