At Mazama Science we produce web-based tools for interrogating important datasets. We are proud to announce the release of a new Population databrowser that allows users to review international population trends. We have done our best to make this release Open, Transparent and Reproducible by providing the source code and data that enable readers to retrace our steps from raw data to final graphic.
Understanding global population trends is extremely important for anyone trying to understand world events or trying to make projections regarding economics and natural resource usage. The Population databrowser is a pro bono data visualization service that promotes a better understanding of existing and projected population trends throughout the world.
In this release we focused on internationalization by providing a single-click switch from one language to another. (This also proves useful if you want to test your knowledge of country names in another language!) As an example, here is the Russian-language plot for the population trend in Ukraine:
In case your Cyrillic is rusty, the top portion of the plot shows that the (estimated) population of Ukraine in 2014 is the same as it was in 1963. The annual change shown in the bottom half makes clear the mass exodus after the fall of the Soviet Union and an ongoing population decline projected into the future.
Open, Transparent and Reproducible
Our stated goal of making things Open, Transparent and Reproducible deserves some further explanation:
Open means freely accessible, and also open source: the data and analysis software are available at zero cost. The Population databrowser uses publicly available data from the US Census Bureau and relies on open source R for analysis and plotting. Users wishing to run the example R scripts should try out RStudio, an open source IDE for R.
Transparent is the word used to describe data graphics and user interfaces that don’t need a lot of explaining. We have done our best to make using this databrowser as effortless as possible. With careful attention to variable naming and code structure we hope that the source code we provide is, if not always transparent, at least not opaque.
In elementary school we learned that science and engineering should be Reproducible. Sadly, this is not always the case, as analyses are often reported without any way to assess their validity. The data and analysis scripts we provide offer you a chance to reproduce the results seen in the Population databrowser.
Example R code
On the databrowser Source page we provide all of the data files and R code needed to convert raw data obtained from the Census Bureau into the graphics seen in the Population databrowser. A few of the issues addressed include:
- using reshape2 to convert an ‘unraveled’ table into a proper dataframe
- converting country codes from FIPS to ISO
- using RJSONIO to read in JSON data (with Unicode characters)
- using lists as flexible containers to simplify and regularize function arguments
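To give a flavor of these four issues, here is a minimal sketch in R. It is not the code from the Source page: the population numbers, the column names and the helpers `fips2iso`, `infoList` and `plotTitle` are all hypothetical, chosen only for illustration. It assumes the reshape2 and RJSONIO packages are installed.

```r
# Illustrative sketch only. All data values and helper names below are
# made up for this example and are not taken from the databrowser source.
library(reshape2)   # provides melt()
library(RJSONIO)    # provides fromJSON()

# 1) An 'unraveled' (wide) table: one row per country, one column per year.
#    The population values are placeholders, not Census Bureau figures.
wide <- data.frame(
  FIPS = c("UP", "GM"),
  `2013` = c(100, 200),
  `2014` = c(99, 201),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# melt() stacks the year columns into (FIPS, year, population) rows
long <- melt(wide, id.vars = "FIPS",
             variable.name = "year", value.name = "population")
long$year <- as.integer(as.character(long$year))

# 2) FIPS -> ISO conversion with a named lookup vector
#    (FIPS "UP" is Ukraine, ISO "UA"; FIPS "GM" is Germany, ISO "DE")
fips2iso <- c(UP = "UA", GM = "DE")
long$ISO <- unname(fips2iso[long$FIPS])

# 3) Reading JSON that contains Unicode (here, Russian country names)
json <- '{"UA": "Украина", "DE": "Германия"}'
names_ru <- fromJSON(json, asText = TRUE)

# 4) A list bundles the request options so that functions can share
#    one 'info' argument instead of many separate ones
infoList <- list(countryCode = "UA", language = "ru")

plotTitle <- function(info, countryNames) {
  paste0(countryNames[[info$countryCode]], " (", info$language, ")")
}

title <- plotTitle(infoList, names_ru)
print(long)
print(title)
```

Passing a single list around means that a new option (say, a plot width) can be added without changing any function signature, which is one reason lists work well as argument containers.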
Take Home Message
We sincerely hope that this databrowser provides some inspiration to people across the globe to:
- learn more about international population trends
- create more multi-language data visualization tools
- share source code used to create good data visualizations
Published at DZone with permission of Jonathan Callahan, DZone MVB. See the original article here.