My First A/B Test
A/B testing gets a lot of attention on Hacker News, inbound.org, and other forums, and appeals to me as a data analysis exercise. As a software engineer with a practical bent, I like the concept of data analysis techniques which produce useful results while treating a system as a black box. This stands in contrast to algorithms that aim to analyze data and tell a story, for instance applying agent-based modeling to political science and the study of war and peace.
Testing two variations of a site to see how people react turns out to be extremely difficult to tinker with on a blog. You need the right sort of problem to justify using statistics, and it is hard to manufacture such a problem just to justify the experiment. From another angle, I’ve long been leery of using split testing at all: keeping WordPress stable has been a real pain, so I prefer to avoid additional operational complexity.
Enter Adzerk, which provides a hosted ad server, removing the need for additional infrastructure. The function of an ad server is to let you upload media (pictures, etc.) and set up business rules for when to display each “ad”, effectively letting you run something resembling an Adsense clone, minus all the AI. Adzerk has a nice free plan, which covers you up to a ridiculously high number of impressions, so paying isn’t really necessary. I’ve been really happy with the site, although from their perspective I’m likely a terrible customer, as they’re not making any money off me.
The logical products to promote on a programming site seem to be jobs, books, and developer tools – right now I’m just running campaigns on a couple of sites, consisting of links to Amazon pages. Here’s a screenshot of a campaign set up very recently:
Once you get enough impressions to compare, you can just turn off each entry and make new ones for the next test. There’s no particular reason you have to use traditional display advertising with this – as cheap as it is, one could easily build a very dynamic site using their API, for instance to pump in hot news, thumbnails for suggested stories, etc.
|Title||Impressions||CTR (%)||Clicks|
|ExtJS in Action||10060||0.43||43|
To test this, I made a table of each combination of values, comparing them using the chi-squared test on Evan’s Awesome A/B testing tools. Fortunately, this showed ExtJS in Action to be a clear winner over all the rest, as I had hoped. One risk of this technique is that you end up with only a clear loser, or with a couple of equivalent winners, when you were hoping for a single best option.
|x||no difference||no difference||loses|
|no difference||x||no difference||loses|
|no difference||no difference||x||loses|
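For anyone who would rather script the pairwise comparison than paste numbers into a web form, here is a minimal sketch of the same idea in Python. It is not the code behind Evan’s tools, just a plain Pearson chi-squared test on a 2x2 table of clicks versus non-clicks. Only the ExtJS in Action figures come from my campaign; the other three entries ("Book B", "Book C", "Book D") and the 0.05 threshold are made-up placeholders for illustration.

```python
import math
from itertools import combinations


def chi_squared_2x2(clicks_a, impressions_a, clicks_b, impressions_b):
    """Pearson chi-squared test (1 degree of freedom, no continuity
    correction) on clicks vs. non-clicks. Returns (statistic, p_value)."""
    observed = [
        [clicks_a, impressions_a - clicks_a],
        [clicks_b, impressions_b - clicks_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                # Degenerate table (e.g. nobody clicked anything): no evidence.
                return 0.0, 1.0
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def compare(a, b, alpha=0.05):
    """Outcome for a against b: 'wins', 'loses', or 'no difference'.
    a and b are (clicks, impressions) tuples; alpha is an assumed cutoff."""
    _, p_value = chi_squared_2x2(a[0], a[1], b[0], b[1])
    if p_value >= alpha:
        return "no difference"
    return "wins" if a[0] / a[1] > b[0] / b[1] else "loses"


# (clicks, impressions). Only the first entry is real data from this post;
# the rest are hypothetical numbers for illustration.
ads = {
    "ExtJS in Action": (43, 10060),
    "Book B": (20, 10000),   # hypothetical
    "Book C": (22, 10100),   # hypothetical
    "Book D": (18, 9900),    # hypothetical
}

for name_a, name_b in combinations(ads, 2):
    print(f"{name_a} vs {name_b}: {compare(ads[name_a], ads[name_b])}")
```

Each pair gets its own 2x2 test, which mirrors the grid above: a cell reads "no difference" when the p-value is not below the cutoff, and otherwise the entry with the higher click-through rate wins.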
In all, this took around nine months to achieve, but with little effort on my part. The lesson I take is this: while this is fun, it may be better to look for big wins elsewhere, especially by borrowing ideas from people who already run these tests. Additionally, the tests would complete much faster with only two variations each, and many such tests can run in parallel.