Keeping up


Data and analytics are wonderful things, aren’t they? Ooh, the power of data. Woo, the insight from analytics. And it’s all so automated. So automagic…

Thing is, as the very old saying goes: Garbage In, Garbage Out. So it was when I came across a piece in The Telegraph from a month ago, “Only 7 of the FTSE 100 CEOs are active on Twitter”. A subject close to my heart.

The piece was one of those easy bits of journalism that, if only I paid a proper PR person, I could probably get as coverage of my own research. But I don’t, so I don’t (yet). The headline was spot on according to my own research conducted at the end of January: there were indeed seven FTSE100 CEOs who had a Twitter account and had tweeted in the past 6 months. The only problem was one of the CEOs who got a lot of coverage, Amec Foster Wheeler’s Samir Brikho. Samir is a prolific socialCEO; it’s just that he’s not been the CEO of a FTSE100 company since March 2014, when Amec (as they were then) were shifted down to the FTSE250 in the quarterly review.

This might sound like pedantry, and possibly it is, in a “my research is better than your research” kind of way. But what it highlights for me is one of the key challenges with social media analytics software, and analytics platforms in general. If you build on assumptions about a bunch of standing data (in this case, the list of FTSE100 CEO Twitter accounts compiled by the people behind the article’s research), everything built on top of it can end up wrong.
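To make that concrete, here is a minimal sketch of the kind of check that would catch a stale entry before any analytics run on top of it. Everything in it is an assumption for illustration: the `TrackedCeo` record, the `split_active_and_stale` function, the placeholder names and dates, and the 183-day stand-in for "the past 6 months". It is not how the article's researchers (or I) actually hold the data.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class TrackedCeo:
    """One row of standing data (all field names are hypothetical)."""
    name: str
    company: str
    handle: Optional[str]        # None means no known account
    last_tweet: Optional[date]   # None means no tweets on record


def split_active_and_stale(tracked, current_constituents, today, window_days=183):
    """Separate entries that still qualify from entries that need review."""
    cutoff = today - timedelta(days=window_days)
    active, stale = [], []
    for ceo in tracked:
        if ceo.company not in current_constituents:
            # The company has left the index, so the CEO no longer counts.
            stale.append((ceo.name, "company not in current index"))
        elif ceo.handle is None or ceo.last_tweet is None:
            stale.append((ceo.name, "no known account or tweets"))
        elif ceo.last_tweet < cutoff:
            stale.append((ceo.name, "no tweet inside the window"))
        else:
            active.append(ceo.name)
    return active, stale


# Placeholder data only; none of these are real companies or people.
current_constituents = {"Company A", "Company B", "Company C"}
tracked = [
    TrackedCeo("CEO A", "Company A", "@ceo_a", date(2015, 1, 20)),
    TrackedCeo("CEO B", "Company B", "@ceo_b", date(2014, 3, 1)),
    TrackedCeo("CEO C", "Company C", None, None),
    TrackedCeo("CEO D", "Company D", "@ceo_d", date(2015, 1, 25)),
]

active, stale = split_active_and_stale(
    tracked, current_constituents, today=date(2015, 1, 31)
)
for name, reason in stale:
    print(f"{name}: {reason}")
```

The check itself is trivial. The hard part is keeping `current_constituents` and the account list up to date, which no amount of code does for you.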

In the research that I’ve been doing, without doubt the hardest, most time-consuming bit has been the quarterly trawl to find out who the current FTSE100 CEOs actually are (there’s 10-20% turnover every year) and then whether the new and existing ones have created, kept, or deleted Twitter or LinkedIn accounts. That takes quite a bit of informed judgement and over a day of work, even though I’ve gotten pretty good at it, having run the research six times. And even then there are things that are hard to resolve.

For instance, it took me a year to confirm that Martin Sorrell from WPP doesn’t have a Twitter account, despite a few accounts that look like they could have been set up defensively. I know that because I spoke to WPP’s External Comms person last year.

Analytics are only as good as the data that structures them. If you don’t put the effort into meticulously managing those often small but very complex metadata sets, you can forget about the big data stuff.

