The idea of turning your business data into a product, also termed “data as a product,” is a known concept that I didn’t invent. It has been well documented by many groups in a number of well-written white papers. I bring it up here simply to reinforce it. It is common for organizations to feel that experience in a specific industry, or years in IT, makes them ready to introduce big data via a data lake architecture. How hard can it be?
On the other hand, what we as “Big Data Vendors” experience in our initial engagements is at least four stages of customer experience. These roughly correspond to the models of big data maturity, or data maturity, described by other groups.
In reality, most organizations are in the initial stages of big data maturity. The business world is awash with encouragement towards a utopia or promised land where data becomes your product. For your own aspirations, it helps to understand the steps that other organizations have taken through stages of data maturity to potentially achieve data as a product.
4 Stages of Data Maturity
Initial - This usually represents a stage at which most companies have realized they need something beyond a traditional data warehouse. Sometimes the IT or development group may have some limited experience with Hadoop. Some groups may even have at least one use case working but usually on a lab system that is no more than a prototype.
For the most part, groups in the initial stage are beginning to learn about eliminating data silos and about data lakes as a concept, with a stated goal of self-service data. Manual processing is largely the standard procedure of the day. Realistically, there is no self-service, and for all but the smallest of use cases requests need to be pushed through the IT support infrastructure. Even worse, these requests can take extended periods to implement because of all the sign-offs required. There is usually very little security implemented beyond user authentication integration. Data lifecycle planning is not yet understood, and the development of a data swamp is very possible.
Awareness - At this stage external data is combined with company data to provide some additional value. Awareness of the movement to Hadoop and away from traditional RDBMS starts to permeate the organization. The changing nature of the skills required, and the general disruption to existing processes, become evident. Self-service generally does not yet exist in any formal sense, but a data lake infrastructure is in place and the initial stages of data lake use are occurring. Usually one or two successful use cases have been implemented.
But with initial success comes an avalanche of ideas from other business units looking to make use of the new system. Data ingest with some level of automation is happening, and some data silos have been eliminated or dependence upon them has been reduced. Some additional security measures, such as encryption over the wire and at rest and possibly Kerberos, have been implemented or are in the process of being implemented. Data lifecycle planning is in place but potentially not automated at this stage.
Proficiency - At this stage, groups are getting at least internal business value from data. There may be some initial use of self-service reporting and possibly data transformation. All data engineering is well defined and standardized. Systems now ingest data easily, through a mature process that includes automated metadata, data quality and transformations. Traditional RDBMS systems have been right-sized or eliminated completely. New data sets are added to this framework via a standard operating procedure that takes less than a week. Security is fully implemented, with robust authentication, authorization, and general administration very well defined. The output of these processes is now at the core of the business and is considered business critical.
Mature Data Processes/Data Driven - This is the final stage, where data is now a product that can be sold to other organizations. This may include multiple geographically dispersed Hadoop clusters functioning in unison via automated processes to aggregate and transform data into a valuable commodity. This stage often also includes the use of virtualization or cloud layers, and the cloud strategy may include an external self-service presentation layer. This is the most highly refined state, leveraging well-understood processes codified in a reusable and reliable technology stack. It includes sponsorship and vision alignment throughout the organization, combined with an executive plan in the 3-5 year range for expansion and EOL planning. Data lifecycle and retention policy is at its most mature state, and the data lifecycle is fully automated.
Assess the Health of Your Big Data Strategy
Getting to the most mature stage of big data is more than a technology choice. You must understand the current maturity state across multiple functional areas. Company-wide executive alignment on a big data vision is also critical to the success of big data projects. Also consider how many developer- and IT-sponsored big data projects are substantial, and how many are, at the end of the day, little more than extensive proofs of concept. There are also issues of maturity of infrastructure, staffing and business process that all merge to provide a picture of where an organization stands.
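As a rough illustration of assessing maturity across functional areas, here is a minimal Python sketch. It is not a formal assessment method: the area names are taken loosely from the paragraph above, the example ratings are invented, and the "weakest link" rule (an organization is only as mature as its least mature area) is an assumption made for this example.

```python
from enum import IntEnum
from typing import Dict, List


class Stage(IntEnum):
    """The four maturity stages described above, ordered lowest to highest."""
    INITIAL = 1
    AWARENESS = 2
    PROFICIENCY = 3
    DATA_DRIVEN = 4


def overall_stage(area_stages: Dict[str, Stage]) -> Stage:
    """Return the lowest stage found across all functional areas.

    Assumption for this sketch: overall maturity is capped by the
    least mature area (a "weakest link" rule).
    """
    if not area_stages:
        raise ValueError("at least one functional area is required")
    return min(area_stages.values())


def lagging_areas(area_stages: Dict[str, Stage]) -> List[str]:
    """Return the names of the areas sitting at the lowest stage, sorted."""
    floor = overall_stage(area_stages)
    return sorted(name for name, stage in area_stages.items() if stage == floor)


# Hypothetical self-assessment; the ratings are illustrative only.
assessment = {
    "executive_alignment": Stage.AWARENESS,
    "infrastructure": Stage.PROFICIENCY,
    "staffing": Stage.INITIAL,
    "business_process": Stage.AWARENESS,
}

print(overall_stage(assessment).name)  # INITIAL
print(lagging_areas(assessment))       # ['staffing']
```

In this made-up example, reasonably mature infrastructure does not lift the overall picture, because staffing is still at the initial stage. A real assessment would likely weigh areas differently and rely on far more than a single rating per area.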
Understanding these concepts makes the difference between effective modernization projects and utter failures. It's important to use the right technology and the right partner to help guide projects through the wilds of the big data landscape.