Customer: Is Hadoop enterprise-ready?
Me *standing next to the whiteboard*: Yes, and that's why we use the term "Enterprise-Ready Data Lake."
Imagine three points:
- You need to prove your identity to get access to the lake, and then you need permissions (authorization) to access the data.
- Once you are authenticated, the demand shifts to managing the lifecycle of data, from requirement to retirement, as an automated process.
- The lifecycle management process needs to be integrated with a governance solution that manages "data about data" (metadata), data lineage, auditing, and more to fulfill security and compliance requirements.
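The first point, the two gates a user must pass before touching data, can be sketched in a few lines. This is a minimal illustration, not a real Hadoop API; the identity and policy stores below are hypothetical stand-ins (in a real lake, Kerberos proves identity and a policy engine grants access).

```python
# Hypothetical identity store: username -> password. In reality Kerberos
# proves identity; this dict only stands in for that step.
IDENTITIES = {"alice": "s3cret"}

# Hypothetical policy store: (user, dataset) -> set of allowed actions.
POLICIES = {("alice", "/lake/sales"): {"read"}}

def authenticate(user: str, password: str) -> bool:
    """Gate 1: prove who you are."""
    return IDENTITIES.get(user) == password

def authorize(user: str, dataset: str, action: str) -> bool:
    """Gate 2: prove you are allowed to perform this action."""
    return action in POLICIES.get((user, dataset), set())

def access(user: str, password: str, dataset: str, action: str) -> bool:
    # Access is granted only when BOTH gates pass, in order.
    return authenticate(user, password) and authorize(user, dataset, action)
```

With these stores, `access("alice", "s3cret", "/lake/sales", "read")` succeeds, while a write attempt or an unknown user is rejected at the appropriate gate.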
Entry Point: You must have strong authentication in place to get into the system, and more users will be coming in to access data as we move away from data silos to a centralized repository. Access management must be easy to administer, i.e., the security solution should provide a centralized place to create, define, and manage security policies. Once users are in and have access, we need to track their actions: that's auditing. Finally, data must be encrypted both in motion and at rest.
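The "centralized place to administer policies plus auditing" idea can be sketched as a single admin point that both answers access checks and records every decision. This is the role a tool like Apache Ranger plays; the class and method names below are hypothetical, for illustration only.

```python
import time

class CentralSecurityAdmin:
    """Illustrative sketch of a centralized policy store and audit point.
    Not a real Ranger API; a stand-in for the concept."""

    def __init__(self):
        self.policies = {}   # (user, resource) -> set of allowed actions
        self.audit_log = []  # every access decision is recorded here

    def define_policy(self, user, resource, actions):
        # One place to create, define, and manage security policies.
        self.policies[(user, resource)] = set(actions)

    def check(self, user, resource, action):
        allowed = action in self.policies.get((user, resource), set())
        # Auditing: who tried what, on which resource, when, and the outcome.
        self.audit_log.append({
            "ts": time.time(), "user": user, "resource": resource,
            "action": action, "allowed": allowed,
        })
        return allowed
```

Note that denied attempts are audited too; a compliance review needs the failures as much as the successes.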
Security is in place, and now we know that data ingestion is occurring with full security. Next, the business wants to manage the lifecycle of data in one common place: data replication, retention, handling late data arrival, data mirroring, and visualizing the complete data pipeline.
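Two of those lifecycle rules, retention and late data arrival, can be sketched as simple time-window checks. In a real cluster this is the kind of thing a scheduler such as Apache Falcon or Oozie automates; the rule format and cutoffs below are assumptions made up for the example.

```python
from datetime import datetime, timedelta

# Hypothetical retention policy: how long each dataset's partitions are kept.
RETENTION = {"/lake/raw/events": timedelta(days=90)}

# Hypothetical late-arrival rule: accept data arriving up to 6 hours late.
LATE_ARRIVAL_CUTOFF = timedelta(hours=6)

def expired(dataset: str, partition_date: datetime, now: datetime) -> bool:
    """Retention: should this partition be retired (archived or deleted)?"""
    keep_for = RETENTION.get(dataset)
    return keep_for is not None and now - partition_date > keep_for

def accept_late(event_time: datetime, arrival_time: datetime) -> bool:
    """Late arrival: reprocess only if the data is within the cutoff window."""
    return arrival_time - event_time <= LATE_ARRIVAL_CUTOFF
```

The point is that both rules live in one common place, so "requirement to retirement" becomes a scheduled, automated process rather than ad-hoc cleanup scripts.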
Once data lifecycle management is in place, we will be generating more "data about data" (metadata), and there is existing legacy metadata that needs to be exchanged with the Hadoop system. This generates the requirement for a data governance solution, which should provide complete data lineage, metadata exchange, and search functionality.
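A governance catalog offering search and lineage can be sketched as a tiny metadata graph. This is the role Apache Atlas plays in the Hadoop ecosystem; the catalog structure and dataset names here are hypothetical.

```python
# Hypothetical metadata catalog: each dataset carries tags (for search)
# and a list of upstream sources (for lineage).
CATALOG = {
    "/lake/raw/events":   {"tags": ["raw"],      "sources": []},
    "/lake/clean/events": {"tags": ["cleansed"], "sources": ["/lake/raw/events"]},
    "/lake/marts/sales":  {"tags": ["mart"],     "sources": ["/lake/clean/events"]},
}

def search(tag: str):
    """Metadata search: find every dataset carrying a given tag."""
    return [name for name, meta in CATALOG.items() if tag in meta["tags"]]

def lineage(dataset: str):
    """Walk upstream sources to show where a dataset came from."""
    chain, todo = [], [dataset]
    while todo:
        current = todo.pop()
        chain.append(current)
        todo.extend(CATALOG.get(current, {}).get("sources", []))
    return chain
```

Here `lineage("/lake/marts/sales")` traces the mart back through the cleansed layer to the raw ingest, which is exactly the trail an auditor asks for during a compliance review.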
Customer: Yes, this is exactly what we are looking for. All of this must be well integrated, and please provide it as a 100% open-source but enterprise-ready solution.
Kerberos is a must in production.