Could Social Media Data Streamline the Census?


While there's significant potential for a real-time census process, it's likely to be a while before such projects are adopted globally. Researchers have come up with an intermediate solution.


The census is one of those institutions that, though undoubtedly valuable, seems stuck in a bygone era. There is significant potential for a real-time census where something like Estonia's e-residency program is in place, but it is likely to be some time before such projects are adopted globally.

As such, an intermediate solution may be needed to give the state a better idea of who lives where within a country. A team from the University of Washington believes that social networks such as Facebook could play that role.

Because a census is carried out so infrequently, the reliability of its data is a real concern, especially as people move about for work or study. In a recent paper, the researchers compare estimates of migrant populations in California and Texas from Facebook Ads Manager with those from the American Community Survey.

Real-Time Migration

The team believes that social scientists could make better use of the databases held by sites such as Facebook and LinkedIn to gather more information on geography, mobility, behavior, and employment. They are not advocating that such databases replace the census, but rather that they augment the official records.

"Facebook data are freely available and disaggregated at the level of city or ZIP code in the US," they say.

The researchers used Facebook's Ads Manager service, which allows advertisers to target audiences by quite specific demographics. You might, for instance, look for British expats living in New York.

The researchers used an algorithm to extract data from Ads Manager on expats from more than 50 countries living in every US state, broken down by age and sex. They tried as far as possible to compensate for the unrepresentative nature of the Facebook sample, then compared the resulting numbers with official figures from the American Community Survey, focusing specifically on the number of Mexican migrants living in California and Texas.

When the numbers were crunched, the Facebook estimates turned out to be considerably lower than the official figures, especially for older migrants, with the official statistics around 5% higher than Facebook's. The authors acknowledge that this may be due to biases in the data and lower Facebook usage in that demographic.

When they repeated the analysis for Filipino migrants, the discrepancy was much smaller.

The team is confident that, through such iterations, they were able to develop a model that accounts for the various biases inherent in the Facebook data and adjusts the findings accordingly.

"Is it better to have a large sample that is biased, or a small sample that is nonbiased? The American Community Survey is a small sample that is more representative of the underlying population; Facebook is a very large sample but not representative," they say.

The researchers continue:

"The idea is that in certain contexts, the sample in the American Community Survey is too small to say something significant. In other circumstances, Facebook samples are too biased. With this project we aim at getting the best of both worlds: By calibrating the Facebook data with the American Community Survey, we can correct for the bias and get better estimates."

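The trade-off in the quotes above can be made concrete with the standard decomposition of mean squared error into squared bias plus variance. The sketch below is not from the paper: the function name `rmse` and every number in it are hypothetical, chosen only to show how a small unbiased sample and a huge biased one can each come out ahead depending on sample size.

```python
import math


def rmse(bias: float, sd: float, n: int) -> float:
    """Root-mean-square error of a sample-mean estimator.

    Uses the standard decomposition MSE = bias^2 + variance, where the
    variance of a mean over n independent draws is sd^2 / n.
    """
    return math.sqrt(bias ** 2 + sd ** 2 / n)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; none come from the study.
    sd = 10.0
    # A survey-like sample: no systematic bias, but relatively few respondents.
    for n in (25, 100):
        print(f"unbiased, n={n}: RMSE = {rmse(0.0, sd, n):.3f}")
    # A platform-like sample: very large, but with a fixed systematic bias.
    print(f"biased (1.5), n=1,000,000: RMSE = {rmse(1.5, sd, 1_000_000):.3f}")
```

With 25 respondents the large biased sample has the lower error; with 100 the unbiased sample wins. The bias term never shrinks as the sample grows, which is why calibrating a large sample against a smaller, representative one is attractive.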
The team plans to develop the model further by testing it in developing countries, where reliable and timely data are both crucially important and sorely lacking.

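The article does not spell out how the researchers' calibration model works. As a rough illustration only, the sketch below uses simple ratio calibration, a swapped-in technique that may differ from the study's actual method: for each age group, a correction factor is the survey count divided by the Facebook count in a reference area where both are available, and that factor then rescales Facebook counts elsewhere. The function names `correction_factors` and `calibrate`, the age groups, and all counts are hypothetical.

```python
from typing import Dict


def correction_factors(survey: Dict[str, float], platform: Dict[str, float]) -> Dict[str, float]:
    """Survey count divided by platform count for each group in a reference area.

    Groups missing from the platform data, or with a zero count, are skipped
    to avoid division by zero.
    """
    return {g: survey[g] / platform[g] for g in survey if platform.get(g)}


def calibrate(platform: Dict[str, float], factors: Dict[str, float]) -> Dict[str, float]:
    """Scale platform counts by the matching group factor.

    Groups with no factor are left unscaled (factor 1.0).
    """
    return {g: count * factors.get(g, 1.0) for g, count in platform.items()}


if __name__ == "__main__":
    # Hypothetical reference-area counts; not figures from the study.
    survey_ref = {"18-39": 1000, "40-64": 1050}
    platform_ref = {"18-39": 1000, "40-64": 1000}
    factors = correction_factors(survey_ref, platform_ref)

    # Hypothetical platform counts for an area without a usable survey estimate.
    platform_target = {"18-39": 400, "40-64": 200}
    print(calibrate(platform_target, factors))
```

In this toy setup the older group is scaled up because the survey counts more people than the platform in the reference area, in the same direction as the gap the researchers reported. A real model would also need to account for uncertainty in the factors and for differences between areas.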

Published at DZone with permission of Adi Gaskell, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
