The absolute minimum you'll ever have to know about session persistence on the web
What is session persistence? In practice, it means recognizing a user as the same one who filled in a login form earlier. Technically speaking, it means identifying a client across different HTTP requests.
Session persistence is different from data persistence (for which we use databases, files, and ORMs), since the thing to maintain is not the state of the application but the state of the interaction with a particular user. It is usually enough for a session to last for the duration of a single usage (from minutes to hours), so local storage on the web server (usually RAM) can be employed.
Parameters like the IP address of the client are not reliable for this recognition (several users can share one address, and a single user's address can change), and as we'll see, every web application has to craft a custom approach to session persistence since the HTTP protocol does not offer such a facility. Even HTTP-based authentication is repeated at every interaction (the credentials are resent with each request), and it is not as widely used as a cookie-based approach.
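To see why HTTP-based authentication means resending the password, here is a minimal sketch of how a client could build the Authorization header of the Basic scheme. The class and method names are hypothetical, and the credentials are just sample values; only the JDK's Base64 encoder is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value a client sends with EVERY request
// when HTTP Basic authentication is in use.
class BasicAuthHeader {

    static String headerValue(String user, String password) {
        // The credentials are only encoded, not encrypted: "user:password" in Base64.
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Sample credentials, chosen only for illustration.
        System.out.println("Authorization: " + headerValue("Aladdin", "open sesame"));
        // prints "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
    }
}
```

Since the server keeps no memory of the previous request, this same header has to travel with each one.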
HTTP is a stateless protocol - but it gives us workarounds to maintain the identity of clients during a connection.
Stateless means that each HTTP request, taken as-is, is independent of the previous and next ones. There is no state of the connection embedded in the protocol like with TCP sequence numbers: in fact, typically a connection is opened when you request a page with your browser and closed when the operation is completed. The stateless nature of the protocol is what makes proxies work, since they can easily cache idempotent GET requests.
Actually it's not so simple: persistent connections, requested with the Connection: Keep-Alive header in HTTP 1.0 and made the default in HTTP 1.1, prescribe not closing the connection after a response has been sent, which lets the server deliver multiple resources in sequence (like the images of a web page) with the overhead of a single connection. Still, a new connection may well be opened at every click, and the requests remain independent of each other either way.
There are various ways to instill a bit of state into HTTP interactions, by providing the client with a parameter that can be passed back to the server in order for it to recognize the user. This parameter can, for example, be embedded in the URL, or it can be set via a specialized mechanism which was introduced by Netscape in 1994, a few years after the first version of HTTP: cookies.
Cookies are supported by every modern browser through two headers: Cookie (in requests) and Set-Cookie (in responses).
A cookie is a small string value with an identifying name, which is passed back as a request header with every new request until its expiration time is reached. Cookies are host-specific (every website has its own set maintained on the client) and path-specific (they can be restricted to the pages in a particular subdirectory of the server).
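To make the two headers concrete, here is a small sketch that plays the role of the client: it parses a Set-Cookie header and builds the Cookie header to send back. It relies on java.net.HttpCookie, the JDK's client-side cookie class (not the servlet API); the class name CookieHeaderDemo, the sample header, and the two helper methods are assumptions made up for illustration.

```java
import java.net.HttpCookie;
import java.util.List;

// Hypothetical demo of the Set-Cookie / Cookie round trip, seen from the client.
class CookieHeaderDemo {

    // Parses a Set-Cookie response header and returns the first cookie it carries.
    static HttpCookie parseSetCookie(String header) {
        List<HttpCookie> parsed = HttpCookie.parse(header);
        return parsed.get(0);
    }

    // Builds the request header the client sends back on later requests.
    // Only the name and the value travel back; path and expiry stay on the client.
    static String toRequestHeader(HttpCookie cookie) {
        return "Cookie: " + cookie.getName() + "=" + cookie.getValue();
    }

    public static void main(String[] args) {
        // Sample header, as a server might send it.
        HttpCookie cookie = parseSetCookie(
                "Set-Cookie: fancyName=value; Max-Age=3600; Path=/app");
        System.out.println(cookie.getName() + " expires in " + cookie.getMaxAge()
                + "s, valid under " + cookie.getPath());
        System.out.println(toRequestHeader(cookie));
    }
}
```

Note that the expiration time and the path are instructions for the client: the server never sees them again, only the name and value.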
In Java servlets, you can easily handle cookies via an object model:
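// response instanceof HttpServletResponse
Cookie myData = new Cookie("fancyName", "value");
response.addCookie(myData); // sends the Set-Cookie header to the client
// the cookie will be available from the next request
// request instanceof HttpServletRequest
Cookie[] cookies = request.getCookies(); // null if the client sent no cookies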
In a PHP script, the setcookie() function describes every characteristic of cookies in its signature:
setcookie("TestCookie", $value); // expires when browser is closed
setcookie("TestCookie", $value, time()+3600); // expires in 1 hour
setcookie("TestCookie", $value, time()+3600, "/folderWhereCookieIsValid/", ".domainwherecookieisvalid.com");
The $_COOKIE superglobal array will contain the values of all cookies, starting from the subsequent request.
There are two options when using cookies to maintain the state of an interaction: storing raw data, or storing the keys of a server-side data structure (like a database table). Cookies can be forged easily, so don't store in them information that the client must not be able to modify (user permissions or nickname) without a further mechanism of identification (a one-time password, for example).
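As an illustration of how forgery can be detected, here is a sketch of one common mechanism, which is not the one-time password mentioned above: signing the cookie value with an HMAC, so that the server can tell when the client has modified it. The class name, the "value.signature" layout, and the sample secret are all assumptions made for this example; the crypto calls are standard JDK ones.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

// Hypothetical helper: appends an HMAC-SHA256 signature to a cookie value.
class SignedCookieValue {
    private final SecretKeySpec key;

    SignedCookieValue(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    private byte[] hmac(String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 not available", e);
        }
    }

    // Produces "value.signature"; the URL-safe Base64 alphabet contains no dot.
    String sign(String value) {
        return value + "." + Base64.getUrlEncoder().withoutPadding().encodeToString(hmac(value));
    }

    // Returns the original value, or null if the signature does not match.
    String verify(String cookieValue) {
        int dot = cookieValue.lastIndexOf('.');
        if (dot < 0) return null;
        String value = cookieValue.substring(0, dot);
        byte[] received;
        try {
            received = Base64.getUrlDecoder().decode(cookieValue.substring(dot + 1));
        } catch (IllegalArgumentException e) {
            return null; // signature part is not valid Base64
        }
        // Constant-time comparison of the expected and received signatures.
        return MessageDigest.isEqual(hmac(value), received) ? value : null;
    }

    public static void main(String[] args) {
        // The secret is a placeholder; a real one must stay on the server.
        SignedCookieValue signer = new SignedCookieValue("change-me".getBytes(StandardCharsets.UTF_8));
        String cookie = signer.sign("role=user");
        System.out.println(signer.verify(cookie));
        System.out.println(signer.verify(cookie.replace("role=user", "role=admin")));
    }
}
```

The signature only proves the value was not altered; the value itself is still readable by the client, so this is no substitute for keeping sensitive data on the server.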
If you're struggling with the transmission of raw data via cookies, maybe you should take your infrastructure to the next level, and start using sessions.
Session management stores a unique id in a conventionally named cookie, and then uses the value transmitted by the client to access a big hash table where every session id has its own array of data.
In the past, session ids were passed in the URL because not every client had cookie support, but now the risk that a user tweets a URL containing a session id is greater than the chance of finding someone with a browser that does not support cookies.
If you have multiple servers, session state management becomes complex, since either the session storage is shared between the HTTP servers (as an external node) or the load balancer must send every user to the same server, based on the IP or some other request parameter.
Session hijacking is the practice of obtaining the session id of another user, either by guessing it or, more typically, by having the victim's browser send the cookie to an external server via an attack executed on the client side.
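The "big hash table" can be sketched in a few lines. This is only a toy model of what a session manager might do internally, not how any particular server implements it: the class name, the 16-byte id length, and the method names are assumptions, and expiration is left out entirely.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory session store: session id -> that session's own data.
class InMemorySessionStore {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Creates an empty session and returns the id to put in the cookie.
    String create() {
        byte[] bytes = new byte[16]; // random, so ids are hard to guess
        random.nextBytes(bytes);
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        sessions.put(id, new ConcurrentHashMap<>());
        return id;
    }

    // Looks up the data for the id sent by the client; null if the id is unknown.
    Map<String, Object> find(String id) {
        return id == null ? null : sessions.get(id);
    }

    // Forgets the session, for example at logout.
    void destroy(String id) {
        sessions.remove(id);
    }

    public static void main(String[] args) {
        InMemorySessionStore store = new InMemorySessionStore();
        String id = store.create();            // sent to the client in a cookie
        store.find(id).put("name", "Alice");   // first request stores something
        System.out.println(store.find(id).get("name")); // a later request reads it back
    }
}
```

Only the id travels over the wire; everything else stays in the server's memory, which is exactly why the id must be unpredictable.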
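The second option, often called sticky sessions, boils down to a deterministic mapping from some request parameter to a server. Here is a deliberately naive sketch, assuming the client IP is used as the key; the class name and the backend names are made up, and a real load balancer would also have to handle servers joining and leaving.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sticky router: the same client key always maps to the same backend.
class StickyRouter {
    private final List<String> backends;

    StickyRouter(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends);
    }

    // Picks a backend from a request parameter such as the client IP.
    String backendFor(String clientKey) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(clientKey.hashCode(), backends.size());
        return backends.get(index);
    }

    public static void main(String[] args) {
        StickyRouter router = new StickyRouter(Arrays.asList("web1", "web2", "web3"));
        // Two requests from the same address land on the same server.
        System.out.println(router.backendFor("203.0.113.7"));
        System.out.println(router.backendFor("203.0.113.7"));
    }
}
```

With this approach each server can keep its sessions in local RAM, at the price of losing them when the server goes down or the list of backends changes.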
In a Java servlet, the request object will give you access to the HttpSession object:
// request instanceof HttpServletRequest
HttpSession session = request.getSession();
// remember to cast the attribute when you get it back
String myAttribute = (String) session.getAttribute("name");
In PHP, another superglobal array manages session variables:
<?php
session_start(); // needed before using $_SESSION, unless session.auto_start is enabled
$_SESSION['key'] = $value; // yes, it's that simple
// you can also use a higher-level abstraction like Zend_Session
A higher-level abstraction over PHP's session management is useful when the time comes to do some acceptance testing, as it provides something to mock or to put in "testing" mode (a hack, but Zend_Session works like this). Without a level of indirection, it would be very difficult to test more than one request in the same script.