Philly ETE 2013 – Day 1 Keynote Summary

The Philly ETE keynote was a presentation on modelling advertising data by Media6Degrees, a company that uses data modelling techniques to improve advertisement conversion rates for large brands.

The techniques presented showed that predictive models of purchasing behavior can be built in an unsupervised fashion, without knowledge of causality. Specifically, the company purchases data that ties cookies to hashed URLs, covering about 100 million people per day and averaging tens of clicks per cookie. The models use techniques such as linear regression and work best with niche products; products that most people already have, such as credit cards, are naturally untargeted.
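As a rough illustration of this kind of model, here is a minimal sketch of a linear model fitted over hashed-URL features. Everything specific in it is an assumption rather than something from the talk: the hashed URL strings, the choice of label (whether a cookie later visited the brand's site), the learning rate, and the function names `train_linear_model` and `score`.

```python
def train_linear_model(histories, labels, epochs=200, lr=0.1):
    """Fit a least-squares linear model by stochastic gradient descent.

    Each feature is a hashed URL; its value is 1 if the cookie visited it.
    """
    weights = {}  # hashed URL -> learned weight
    bias = 0.0
    for _ in range(epochs):
        for urls, label in zip(histories, labels):
            prediction = bias + sum(weights.get(u, 0.0) for u in urls)
            error = prediction - label
            # Gradient step for squared error on this one example.
            bias -= lr * error
            for u in urls:
                weights[u] = weights.get(u, 0.0) - lr * error
    return weights, bias


def score(weights, bias, urls):
    """Higher scores mean the cookie looks more like past brand visitors."""
    return bias + sum(weights.get(u, 0.0) for u in urls)


# Made-up hashed URLs per cookie (hypothetical values, not real data).
histories = [
    {"a1", "b2"}, {"a1", "c3"}, {"d4", "e5"},
    {"e5", "c3"}, {"a1", "d4"}, {"b2", "e5"},
]
# Assumed label: 1 if the cookie later visited the brand's site, else 0.
labels = [1, 1, 0, 0, 1, 0]

weights, bias = train_linear_model(histories, labels)
print(score(weights, bias, {"a1", "b2"}))  # cookie resembling brand visitors
print(score(weights, bias, {"d4", "e5"}))  # cookie resembling non-visitors
```

The model never needs to know what a hashed URL actually is, only that cookies sharing it tend to behave alike, which fits the point that no knowledge of causality is required.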

Using this data, they predict whether it is worth purchasing an impression on an ad exchange by comparing the traits of a user listed on the exchange with those of users known to visit a brand. The decision to purchase an ad in front of a user has to happen in under 15 ms.
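A bid decision under a time budget could look something like the sketch below. The 15 ms figure comes from the talk; the rest is assumed for illustration, including the expected-value rule (bid when estimated conversion probability times conversion value exceeds the asking price), the clamping of the linear score to a probability, and the name `decide_bid` and its parameters.

```python
import time

BUDGET_SECONDS = 0.015  # the "under 15 ms" limit mentioned in the talk


def decide_bid(user_urls, weights, bias, price, value_per_conversion,
               budget=BUDGET_SECONDS, clock=time.perf_counter):
    """Return True if the impression looks worth buying within the budget.

    weights/bias describe a pre-trained linear model over hashed URLs
    (hypothetical inputs; any scoring model could be substituted).
    """
    start = clock()
    raw_score = bias + sum(weights.get(u, 0.0) for u in user_urls)
    # Assumption: clamp the linear score into [0, 1] and treat it as a
    # rough conversion probability.
    probability = min(1.0, max(0.0, raw_score))
    if clock() - start > budget:
        return False  # too slow to answer the exchange, so skip the bid
    return probability * value_per_conversion > price


# Hypothetical numbers: one known hashed URL, a cheap impression.
print(decide_bid({"a1"}, {"a1": 0.5}, 0.0, price=0.01, value_per_conversion=1.0))
```

The `clock` parameter exists only so the timeout path can be exercised with a fake clock in tests; in this sketch the scoring step is a dictionary lookup per URL, which keeps the work per request small.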

One interesting challenge is establishing causal relationships, since running A/B tests with blank or public service ads as a control is undesirable. Data can establish causal relationships, but the results can be confounded (cause and effect may appear reversed). For instance, consider beauty: a message on a dating site that uses the word “beautiful” might fail because of the reaction the word induces, or because attractive people receive most of the messages.
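The dating-site example can be made concrete with a small calculation. The counts below are entirely invented for illustration (they are not from the talk); they are chosen so that the word has no effect within either group of recipients, yet appears harmful when the groups are pooled.

```python
# (recipient group, message contains "beautiful") -> (messages sent, replies)
# All counts are invented for illustration.
counts = {
    ("very attractive", True): (800, 80),
    ("very attractive", False): (200, 20),
    ("everyone else", True): (100, 40),
    ("everyone else", False): (900, 360),
}


def reply_rate(counts, uses_word, group=None):
    """Reply rate for messages with/without the word, optionally per group."""
    sent = replies = 0
    for (g, has_word), (n_sent, n_replies) in counts.items():
        if has_word == uses_word and (group is None or g == group):
            sent += n_sent
            replies += n_replies
    return replies / sent


# Pooled comparison: the word looks like it hurts reply rates.
print(reply_rate(counts, True), reply_rate(counts, False))

# Per-group comparison: the word makes no difference in either group.
for group in ("very attractive", "everyone else"):
    print(group, reply_rate(counts, True, group), reply_rate(counts, False, group))
```

In this made-up data the pooled gap comes only from who receives the messages: the word is used mostly on recipients who reply to few messages in general.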

Retargeting was also discussed at some length. This is the advertising practice of showing ads for a product to a user who has previously visited the product's site. The effects of botnets clicking ads are visible to those who buy ads on exchanges (many impressions that don’t convert), but the traffic is appealing to the exchanges and to some ad agencies because it gives the appearance of higher volume. Botnets can make money by driving visits to random sites, normally resulting in low-value ad traffic for those sites, with many unique IPs and traffic that can appear somewhat normal.

Recently these botnets started visiting sites that use retargeting (e.g. Chase), leading to much higher-value ads being shown on many of the botnet sites, which was visible to a number of ad buyers.

Published at DZone with permission of Gary Sieling, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
