
The Reliability of Government Data Over Externally Managed Datasets


Wading through a significant number of federal government API links that returned 404 last week, I was reminded yet again of the unreliability of federal government data.


When I worked at the Department of Veterans Affairs, I was approached by a number of folks — external to the federal government — who wanted to help clean up, work with, and improve public datasets as part of the federal government's open data efforts. As I worked on specific datasets about veteran facilities, organizations, programs, services, and more that could make an impact on veterans' lives, I would often suggest publishing CSVs to GitHub and soliciting the help of the public to validate and manage the data out in the open. That suggestion was almost always shut down when I brought it up to anyone in leadership.

The common stance regarding the public participating in acquiring, managing, and cleaning up data using GitHub was no! The federal government was the authority when it came to providing data. It would own the entire process and would be the only gatekeeper for accessing it. A couple of datasets that came up were the information for suicide assistance and the information for substance abuse clinic support, for which I had on-the-ground folks at local clinics and veteran support groups wanting to help. I was told there would be no way I could get approval to crowdsource the evolution of these datasets; all data would be stored, maintained, and made available via VA servers.

As I waded through a significant number of links that returned 404 as part of my talk about the state of APIs in the federal government last week, I was reminded once again of the unreliability of federal government datasets. I’m finding a significant number of APIs, datasets, and supporting documentation go missing. This has me looking for any existing examples of how the federal government can better publish, share, syndicate, and manage data in an interoperable way — efforts like the National Information Exchange Model (NIEM), which “is a common vocabulary that enables efficient information exchange across diverse public and private organizations. NIEM can save time and money by providing consistent, reusable data terms and definitions, and repeatable processes.”
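The kind of link audit described above is easy to automate. As a rough sketch (the function name and URL list are my own, not from any official tooling), a short script can walk a list of dataset or API links and record which ones still resolve and which return 404:

```python
import urllib.request
import urllib.error


def check_links(urls, timeout=10):
    """Map each URL to its HTTP status code, or an error string if unreachable.

    Uses HEAD requests to avoid downloading full dataset payloads.
    """
    results = {}
    for url in urls:
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results[url] = resp.status
        except urllib.error.HTTPError as e:
            # A 404 here is exactly the "gone missing" case described above.
            results[url] = e.code
        except urllib.error.URLError as e:
            # DNS failure, refused connection, retired hostname, etc.
            results[url] = str(e.reason)
    return results
```

Running something like this on a periodic schedule against an agency's published catalog would surface vanished datasets long before a talk deadline does.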

Another aspect of this conversation I’ll be exploring further is the role GitHub plays in all this. There are 130+ federal agency users and organizations on the platform, and I’d like to see how this usage might contribute to federal agencies being more engaged and to managing the uptime, availability, and reliability of data, code, APIs, and other resources coming out of the federal government. I am looking for any positive examples of federal agencies leveraging external cloud services and private-sector partnerships to make data, content, and other resources more available and reliable for public consumption. Let me know any other angles you’d like to see highlighted as part of my federal government data and API research.


Topics:
big data, government, open data

Published at DZone with permission of Kin Lane, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
