What Is an Active Archive?
Find out what an active archive is.
Simply put, an active archive is one in which data is organized, accessible, retrievable, and intelligently retained, so that archived data stays useful to the organization. In the past, archives were thought of strictly as long-term repositories for very infrequently accessed data (think cold storage), so little thought went into managing that data intelligently. The hope was that an archive would be like an insurance policy you never needed to use.
But data overall has continued to grow at tremendous rates year over year, and unstructured data has led that charge, with annual growth rates of 60 percent or more. What’s more, unstructured data is predicted to represent 90 percent or more of all data within just a few years. Unstructured data (office documents, videos, audio files, images, PDFs, and anything else not stored in a database) has become the lifeblood of most organizations. Storing it intelligently over the long term is critical not only for compliance and organizational history but, increasingly, for business intelligence, analytics, data mining, and other purposes.
An active archive is a way to manage unstructured data intelligently and cost-effectively over the long term, not just to save money but, increasingly, to treat that data as a critical corporate asset that can be mined and put to work.
Data Should Be Organized
Unstructured data tends to be messy: a typical organization can have millions of files with no consistent organization. To make sense of them, it helps to classify and tag data using categories that matter both internally and externally. Flags such as “confidential” or “legal” make it possible to retrieve the right data quickly in the event of an audit, and beyond that, all sales data, all financial data, and so on could be classified for fast, easy retrieval later.
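As an illustration of rule-based tagging, here is a minimal Python sketch. The rule table, the `ArchivedFile` record, and the `classify` and `find_by_tag` helpers are hypothetical names invented for this example; real archive products may classify using metadata, content analysis, or machine learning rather than keyword rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical classification rules: tag -> keywords that trigger it.
RULES = {
    "confidential": ("confidential", "internal only"),
    "legal": ("contract", "nda", "litigation"),
    "sales": ("quote", "purchase order", "pipeline"),
    "financial": ("balance sheet", "budget", "invoice"),
}


@dataclass
class ArchivedFile:
    path: str
    text: str
    tags: set = field(default_factory=set)


def classify(f: ArchivedFile) -> ArchivedFile:
    """Add every tag whose keywords appear as whole words in the path or text."""
    haystack = f"{f.path} {f.text}".lower()
    for tag, keywords in RULES.items():
        if any(re.search(rf"\b{re.escape(k)}\b", haystack) for k in keywords):
            f.tags.add(tag)
    return f


def find_by_tag(files, tag):
    """Return the paths of all files carrying the given tag."""
    return [f.path for f in files if tag in f.tags]


files = [
    classify(ArchivedFile("deals/acme_nda.docx", "Mutual NDA between Acme and us")),
    classify(ArchivedFile("finance/q3_budget.xlsx", "Q3 budget forecast")),
    classify(ArchivedFile("team/calendar.txt", "Meeting agenda")),
]
print(find_by_tag(files, "legal"))      # ['deals/acme_nda.docx']
print(find_by_tag(files, "financial"))  # ['finance/q3_budget.xlsx']
```

Matching on whole words keeps a short keyword like "nda" from tagging files that merely contain "calendar" or "agenda".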
Data Should Be Accessible
You need to be able to store your data where you want and get at it easily. That could mean on-premises in a private cloud or, increasingly, in one or more public clouds. Competition among cloud vendors is increasing, and the ability to take advantage of changing cloud economics is extremely valuable. An active archive should support both on-premises storage and true multi-cloud, with the ability to switch storage destinations among cloud vendors dynamically, without requiring the administrator to remember where each piece of data lives.
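One way to spare the administrator from remembering where data lives is a catalog that maps each file to its current storage destination. The Python sketch below assumes hypothetical `InMemoryBackend` and `ArchiveCatalog` classes standing in for real on-premises and cloud object stores; a production system would call each vendor's SDK and verify each step before moving on.

```python
from typing import Dict, List


class InMemoryBackend:
    """Hypothetical stand-in for one storage target (on-premises store or cloud bucket)."""

    def __init__(self, name: str):
        self.name = name
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def delete(self, key: str) -> None:
        del self.objects[key]


class ArchiveCatalog:
    """Tracks which backend holds each file so callers never need to know."""

    def __init__(self, backends: List[InMemoryBackend]):
        self.backends = {b.name: b for b in backends}
        self.location: Dict[str, str] = {}  # file id -> backend name

    def store(self, file_id: str, data: bytes, backend: str) -> None:
        self.backends[backend].put(file_id, data)
        self.location[file_id] = backend

    def read(self, file_id: str) -> bytes:
        # Callers supply only the file id; the catalog resolves the location.
        return self.backends[self.location[file_id]].get(file_id)

    def migrate(self, file_id: str, target: str) -> None:
        source = self.location[file_id]
        if source == target:
            return
        data = self.backends[source].get(file_id)
        self.backends[target].put(file_id, data)  # 1. copy to the new target
        self.location[file_id] = target           # 2. repoint the catalog
        self.backends[source].delete(file_id)     # 3. remove the old copy


catalog = ArchiveCatalog([InMemoryBackend(n) for n in ("on-prem", "cloud-a", "cloud-b")])
catalog.store("report.pdf", b"%PDF-sample", "cloud-a")
catalog.migrate("report.pdf", "cloud-b")  # e.g. cloud-b became cheaper
print(catalog.read("report.pdf"))  # b'%PDF-sample'
```

Copying before repointing and deleting means the file remains readable at every step of a migration in this single-threaded sketch.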
Data Should Be Retrievable
Complementary to classification and tagging is full-content search. Imagine being able to search petabytes of data across millions (or billions) of files and find the needle you are looking for using a word or a string of words, rather than having to know where or when a file was saved. Search turns what has been an opaque black hole of practically unusable data into a usable repository.
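Full-content search is commonly built on an inverted index, which maps each word to the set of files that contain it. The minimal Python sketch below uses a hypothetical `ContentIndex` class and treats a multi-word query as "all of these words" rather than an exact phrase; real search engines add stemming, ranking, phrase queries, and on-disk index structures.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ContentIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> paths of files containing it

    def add(self, path: str, text: str) -> None:
        for word in tokenize(text):
            self.postings[word].add(path)

    def search(self, query: str) -> set:
        """Return paths of files that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        # .get() avoids inserting empty entries into the defaultdict
        results = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            results &= self.postings.get(word, set())
        return results


index = ContentIndex()
index.add("emea/forecast.docx", "Quarterly revenue forecast for EMEA")
index.add("policies/revenue.pdf", "Revenue recognition policy")
print(index.search("revenue forecast"))  # {'emea/forecast.docx'}
```

A query touches only the posting sets for its words instead of rescanning every file, which is what makes search across very large archives practical.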
Data Should Be Intelligently Retained
If you ask an audience of IT administrators what their corporate data retention policy is, the vast majority will tell you they keep everything forever. Data governance is a huge topic, more than we can get into here, but suffice it to say that best practice is not to keep everything forever; it is to prune data that no longer needs to be retained, for legal, capacity, cost, and other reasons. An active archive helps administrators set policies that prune data no longer required to be kept, freeing up space and lowering storage costs.
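A retention policy can be expressed as rules that map a data category to a retention period. The Python sketch below assumes hypothetical tag-based rules with illustrative periods and a legal-hold flag that overrides expiry; when several rules apply it conservatively uses the longest period, and it keeps any record that no rule covers.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical retention rules: tag -> minimum time to keep. Periods are illustrative.
RETENTION = {
    "financial": timedelta(days=7 * 365),
    "marketing": timedelta(days=2 * 365),
}


@dataclass
class Record:
    path: str
    created: date
    tags: set = field(default_factory=set)
    legal_hold: bool = False


def is_expired(rec: Record, today: date) -> bool:
    if rec.legal_hold:
        return False  # a legal hold always blocks deletion
    periods = [RETENTION[t] for t in rec.tags if t in RETENTION]
    if not periods:
        return False  # no rule applies: keep by default
    # When several rules apply, the longest retention period wins.
    return today - rec.created > max(periods)


def prune(records, today):
    """Split records into (keep, expired) according to the retention rules."""
    keep, expired = [], []
    for rec in records:
        (expired if is_expired(rec, today) else keep).append(rec)
    return keep, expired


records = [
    Record("promo.png", date(2020, 1, 1), {"marketing"}),
    Record("ledger.xlsx", date(2020, 1, 1), {"financial", "marketing"}),
    Record("old_ad.mp4", date(2015, 1, 1), {"marketing"}, legal_hold=True),
]
keep, expired = prune(records, today=date(2024, 1, 1))
print([r.path for r in expired])  # ['promo.png']
```

Only the records in `expired` would be candidates for deletion; everything else, including data on legal hold, stays in the archive.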
An active archive provides intelligent, multi-cloud data management, making the long-term storage of an organization’s most critical asset, its data, useful both today and in the future.
Opinions expressed by DZone contributors are their own.