A New Way to Archive the Web

Archiving the web has always been a challenge. A new collaborative project called Cobweb, planned for release in 2017, aims to help.

One might think that platforms such as Archive.org have done a decent job of providing an open archive of the web. A team of researchers believes, however, that more could and should be done, and proposes an innovative solution in a newly published paper.

The researchers propose an open-source, collaborative platform called Cobweb, which would build a comprehensive web archive by coordinating the existing efforts of the archiving community. They reason that sharing this responsibility across a number of institutions would produce a more frequently updated archive, faster and at lower cost, than any single institution could manage alone.

Better Archiving

The researchers use the Arab Spring to illustrate their point, noting how its rapidly unfolding events played out across blogs, official media, and social media, presenting a major challenge for archiving efforts.

“Recognizing the importance of recording this event, a curator immediately creates a new Cobweb project and issues an open call for nominations of relevant websites,” they say. “Scholars, subject area specialists, interested members of the public, and event participants themselves quickly respond, contributing to a site list that is more comprehensive than could be created by any curator or institution.”

“Archiving institutions review the site list and publicly claim responsibility for capturing portions of it that are consistent with local collection development policies and technical capacities.”

Cobweb relies heavily on collaboration among its members. This distinguishes it from existing efforts, which involve some collaboration but remain primarily individual endeavors.

“As a centralized catalog of aggregated collection and seed-level descriptive metadata, Cobweb will enable a range of desirable collaborative, coordinated, and complementary collecting activities,” the team say. “Cobweb will leverage existing tools and sources of archival information, exploiting, for example, the APIs being developed for Archive-It to retrieve holdings information for over 3,500 collections from 350 institutions.”

If the project comes to fruition, it will be hosted at the California Digital Library, with initial data drawn from metadata collected from partners and stakeholders. Development is expected to take a year, with a public release planned for the IIPC General Assembly in April 2017 to gather feedback from the community.

Published at DZone with permission of Adi Gaskell, DZone MVB.

Opinions expressed by DZone contributors are their own.