I Know What Facebook is Using - Part 1


· Performance Zone ·


If your site focuses on community or social media, or you hope to build a wildly successful web application, chances are you aspire to have a user base as large as Facebook's someday. That is an admirable goal, but exposing your servers to rapidly growing numbers of users and simultaneous connections will present you with the kind of problems you'll want to have. High-performance applications and highly optimized servers will be necessary to handle that growth. With this goal in mind, I took a careful look at some of the technologies Facebook uses and how they might help you optimize and scale your own application infrastructure. I call this article series 'I Know What Facebook is Using', and eventually you will too. Let's get started!

Part 1 - Facebook Uses Varnish

Have you ever wondered how Facebook has the server capacity to display so many profile pictures and photo albums so quickly? According to Facebook, there's a very good chance that an application called Varnish is involved. Varnish is a high-performance HTTP accelerator designed to speed up web applications built from a number of interlocking pieces. It uses a configuration language called VCL that lets you control how requests are answered and how content is cached. Varnish supports Edge Side Includes (ESI), an XML-based markup language that lets you break a web page into sections and cache each of them independently. Varnish can also load balance across multiple backend nodes and poll those nodes through its health-checking system.
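
As a rough illustration of the Edge Side Includes idea, the sketch below assumes Varnish 3.0-era VCL and two hypothetical URLs: a page at /index.html that contains an <esi:include src="/sidebar.html"/> tag, and the fragment that tag pulls in. The paths and cache lifetimes are placeholders, not anything taken from Facebook's setup.

```vcl
# Sketch only: Varnish 3.0-era syntax; both URLs and TTLs are hypothetical.
sub vcl_fetch {
    if (req.url == "/index.html") {
        # Parse the page for <esi:include> tags and cache it briefly.
        set beresp.do_esi = true;
        set beresp.ttl = 2m;
    } elsif (req.url == "/sidebar.html") {
        # The included fragment is cached separately, for longer.
        set beresp.ttl = 1h;
    }
}
```

Later Varnish releases renamed the subroutine that handles backend responses, so check the documentation for the version you run before copying this.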

Installing Varnish

Packages for Varnish are available on a number of the major Unix/Linux distributions. You can also build the application from source, though a few prerequisites must be met:

  • A recent version of GCC (3.3.x or newer should be fine, 4.2.1 or newer recommended)
  • A POSIX-compatible make (GNU make is fine)
  • Recent versions of the GNU autotools (automake, autoconf and libtool), plus ncurses
  • When building source code checked out from the repository (as opposed to source code from a release tarball), you will also need xsltproc from libxslt.

Consider following the documentation for configuring and building Varnish from source on your web/application server.

Running Varnish

First, let's get an idea of how Varnish works. Varnish is designed as a caching reverse proxy. A reverse proxy sits in front of a web server and intercepts all traffic from the web destined for one or more of your web servers. A reverse proxy has the intelligence to handle a request itself or to forward it to the backend servers as needed. A caching reverse proxy relieves the load on your backend servers by caching static content such as images, as well as dynamically generated HTML pages.
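
To make the caching idea more concrete, here is a minimal sketch in Varnish's configuration language (VCL), assuming Varnish 3.0-era syntax. The list of file extensions and the one-hour lifetime are assumptions chosen for illustration. Cookies are stripped because Varnish's default rules typically decline to cache requests and responses that carry them.

```vcl
# Sketch only: Varnish 3.0-era syntax; extensions and TTL are assumptions.
sub vcl_recv {
    if (req.url ~ "\.(png|gif|jpg|css|js)$") {
        # Static files rarely depend on cookies; removing them lets
        # the default logic look the object up in the cache.
        unset req.http.Cookie;
    }
}

sub vcl_fetch {
    if (req.url ~ "\.(png|gif|jpg|css|js)$") {
        # Drop cookies set by the backend and keep the object for an hour.
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1h;
    }
}
```

Note that the pattern only matches URLs ending in one of those extensions, so requests with a query string would fall through to the default behavior.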

When configuring your installation of Varnish, the first order of business is changing the listen port on your HTTP server so that Varnish can take over port 80; many example configurations move the HTTP server to port 8080. Now, run Varnish with the following command:

varnishd -a :80 -b localhost:8080 -T localhost:6082

This command runs Varnish in its most basic configuration: it listens on port 80, forwards traffic as needed to your web server on port 8080, and provides a management interface on port 6082. Notice that your website is still served on port 80, but only after the traffic has passed through Varnish.

Using the Varnish Configuration Language

As stated, Varnish uses its default configuration when started with the above command. However, we can use VCL, the Varnish Configuration Language, to create a custom configuration file for added control. VCL has a syntax strikingly close to C, and when loaded, it is compiled to a shared object and dynamically linked into the server process. There is an entire list of functions and subroutines available to aid in your configurations. Take a look at the Varnish documentation for the most comprehensive list.
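
For comparison, the -b localhost:8080 option from the earlier command can be expressed as a small VCL file and loaded with varnishd's -f option instead of -b. The file path below is an assumption; use whatever location your installation expects.

```vcl
# Sketch of a minimal VCL file, saved for example as
# /etc/varnish/default.vcl (hypothetical path) and loaded with:
#   varnishd -a :80 -f /etc/varnish/default.vcl -T localhost:6082
backend default {
    .host = "localhost";   # same backend as -b localhost:8080
    .port = "8080";
}
```

Starting from a file like this gives you a place to add the subroutines and backend options shown in the rest of this section.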

Here are a few examples. Let's start with a simple configuration that creates a named backend object:

backend www {
    .host = "www.myexampleapp.com";
    .port = "http";
}

With an added parameter, you can limit the number of connections that Varnish will send to the backend like this:

backend www {
    .host = "www.myexampleapp.com";
    .port = "http";
    .max_connections = 300;
}
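
The load balancing and health checking mentioned earlier build on the same backend definitions. The following is a sketch under stated assumptions: it uses Varnish 3.0-era syntax, and the backend names, IP addresses, probe URL, and probe timings are all hypothetical values picked for illustration.

```vcl
# Sketch only: Varnish 3.0-era syntax; hosts and numbers are hypothetical.
backend www1 {
    .host = "10.0.0.11";
    .port = "8080";
    .max_connections = 300;
    .probe = {
        .url = "/";          # page requested by the health check
        .interval = 5s;      # poll every five seconds
        .timeout = 1s;       # give up on a poll after one second
        .window = 5;         # consider the last five polls
        .threshold = 3;      # healthy if at least three succeeded
    }
}

backend www2 {
    .host = "10.0.0.12";
    .port = "8080";
    .max_connections = 300;
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

# Rotate requests across both backends.
director www_pool round-robin {
    { .backend = www1; }
    { .backend = www2; }
}

sub vcl_recv {
    # Send every request through the pool rather than a single backend.
    set req.backend = www_pool;
}
```

With probes attached, the director should only hand requests to backends that are currently passing their health checks. Newer Varnish releases handle directors differently, so treat this as a starting point for the version that matches the other examples here.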

Lastly, here is a handy configuration for stopping those pesky hotlinks to the images on your website. With this rule, Varnish returns a 403 error to any request for one of your images that arrives with a Referer header from another site:

sub vcl_recv {
    if (req.http.host == "www.myexampleapp.com" &&
        req.url ~ "^/images/" &&
        (req.http.referer && req.http.referer !~ "^http://www.myexampleapp.com/")) {
        error 403 "No hotlinking please";
    }
}
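
The error statement used above belongs to older Varnish releases. If you run Varnish 4.0 or later, a roughly equivalent rule might look like the sketch below, which returns a synthetic response instead. The backend definition is only included to make the file complete, and its port value is an assumption.

```vcl
vcl 4.0;

# Sketch of the same hotlinking rule for Varnish 4.0 and later.
backend www {
    .host = "www.myexampleapp.com";
    .port = "80";   # assumed port
}

sub vcl_recv {
    if (req.http.host == "www.myexampleapp.com" &&
        req.url ~ "^/images/" &&
        (req.http.referer && req.http.referer !~ "^http://www.myexampleapp.com/")) {
        # Replaces the older `error 403 "..."` statement.
        return (synth(403, "No hotlinking please"));
    }
}
```

Requests that send no Referer header at all are still allowed through in both versions, since the rule only fires when the header is present and points elsewhere.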

As you can see, Varnish offers you a number of configuration options for use with your web infrastructure.  Take a page out of Facebook's playbook, and see how you can use this tool to improve the overall performance of your web applications.


