To gather insights on the state of big data in 2018, we talked to 22 executives from 21 companies who are helping clients manage and optimize their data to drive business value. We asked them, "How has the role of the DBA changed as part of the big data strategy?" Here's what they told us:
- Struggling to become the data management expert. A SQL database can scale to handle big data but the database is not optimized. Data managers are now working on meeting the SLA within budget or for the lowest cost. We provide the option of trimming the structured database from 100 terabytes to 40 — much easier to manage from a time and cost standpoint.
- The emergence of the data ops concept, covering data management and data governance. Many players are part of the value chain, with roles based on certain assumptions. Data operations is part of the team, with shared goals, collaborating from the point when data is collected and created. The focus is on how to use automation to squeeze out delays, gain the agility to take the most appropriate action, and drive greater efficiency and value.
- Management, as well as governance and software delivery. Maintain database models and schemas. In big data, everything is exploratory, not relational. The shift is from well-defined structures to enabling applications and engineers to work together collaboratively. Manage metadata discovery.
- While developers don’t think they need them, DBAs are still needed for governance to make it easier to analyze data.
- DBAs have gone from managing databases to being data engineers across multiple systems. They focus on how data moves from one database to another, the consumption of data, and the tuning of the data. Management of the data process across the data landscape is critical until it is distributed and executed automatically.
- DBAs have moved from being focused on individual products like SQL Server and Oracle to having to bring companies' big data implementations to life.
Infrastructure and Platforms
- The role of the DBA has evolved to patient zero at the frontline. Responsible for the evolution of the IT stack. The scope of cognizance is greater with infrastructure and platforms.
- It's quite a challenge, as they are no longer in control of the database technologies that the applications use. The more that’s moved to the cloud, the less the DBA has control of. There are more data and more databases. The skill set now includes managing data infrastructure, proposing solutions for large amounts of data, integrating, knowing how to archive, and handling disaster recovery. AWS seems to tie database options in the cloud to DBAs. They still have to worry about backup, disaster recovery, and massive storage. They need to think more strategically with regard to backup and storage.
- They are more important than ever. But they need to learn how to efficiently integrate legacy data stored in RDBMS systems with big data technologies.
- As big data changes data architectures, the consequences for the DBA may be moderate, but they are real and require some technical adjustments. New technologies are reshuffling the cards of data management, making the era of the DBA as the long-standing guardian of data schemas a bit outdated today.
- Indeed, NoSQL platforms without data schemas and Hadoop, as well as the set of tools that support it, are increasingly deployed in enterprises. Developers now have more influence in the design of the data itself.
- This has the effect of pushing DBAs to broaden their scope of expertise: They must learn the mechanisms and operations of NoSQL systems, acquire the ability to administer Hadoop clusters, and more generally adopt the practice of storing data without a schema.
- Furthermore, the agility provided by NoSQL comes at the price of data integrity, which is much more difficult to obtain with this model. For the time being, integrity has given way to data flexibility in the web applications of many companies.
- DBAs must adapt while design and development styles change. DBAs are also expected to operate several relational systems and consider NoSQL technology closely to guide the company on what to do and when to deploy it. There will likely be several types of DBAs in the future: those confined to a technical role, those who are traditional administrators, and others who will seek to learn new technologies and tools to manage big data. New titles will appear to qualify the DBAs and the specializations will become even more precise.
- It depends on how you view the previous or current roles of DBAs. For this, I’ll start by stipulating that DBAs, in my opinion, have always needed to be part of the entire software development process. In today’s problem sets, it’s now more imperative than ever to have all DBAs engaged throughout the development process, especially planning, scoping, and prototyping. DBAs can provide concrete information on items such as data infrastructure capability, cost of needed changes, potential performance impact, and overall capacity planning (there’s a strong desire for this to be predictable).
- The most successful organizations who have adopted a big data strategy have evolved the role of database administrators into broader administrators of new forms of data infrastructure that include NoSQL databases as well as Hadoop. Combined with data developers who can develop data management logic, data scientists who can manipulate and prepare the data, and data analysts in the lines of business, the database administrator is an essential part of operating a big data strategy. The role of the DBA has now become dependent on more intelligent tools that can manage and report on data infrastructure and processes across a wide variety of databases and technology frameworks. For example, these tools have the capability to monitor and optimize the utilization and capacity of data infrastructure resources across the enterprise.
Workloads and SLAs
- The structure of the job is gone. The types of problems solved are much broader. Need to be able to blend workloads of different SLAs for batch and streaming workloads. What tool do I use? Need a hybrid environment to deliver new workloads in streaming and batch while staying abreast of changes.
- The DBA was the old leader of the fiefdom. Now, there are many data sources, and much of the data in the ecosystem is managed outside the database, which calls for a hybrid view of the world. DBAs must understand communications, the speed of links, security, and how to pull sources together. DBAs aren't part of the bigger team, though they need to be. Ultimately, we’ll have autonomous databases that are self-learning and self-teaching.
- There are many more technologies to manage than ever. You’ll have a data warehouse and 10 to 20 technologies to understand and manage. You need to understand the technologies and what will make it easier for the DBA to work online and scale out. Pick the right technology for the right problem. Larger companies are looking at standardizing a group of technologies for search, NoSQL, Hadoop, and GPU so that they require less management and administration.
- They’ve gone from being a systems administrator with domain knowledge of databases to now needing to know how to work with data integration, unstructured data, natural language processing, document stores, and statistics. Toolsets can make the job easier. Relational databases aren’t going anywhere, but there will be new stores for big data. Must be motivated to expand their expertise and learn.
- The traditional role of DBAs has changed significantly in the age of big data. For the longest time, DBAs were barely more than systems administrators with specialized knowledge of a particular database system. They certainly had knowledge of SQL, an understanding of SQL optimization, and an appreciation for building data keys, but they were not actively involved in particular uses of the data stored in those database systems.
- Big data DBAs have a deeper understanding of the applications for data and also of non-relational data models (for example, self-referential graphical models, key/value stores, non-structured data stores, document stores, etc.) and must possess the knowledge to perform data integration that goes beyond the traditional extraction-transformation-loading process (ETL) used for business intelligence (BI) applications of the 90s. Probabilistic data integration is not that uncommon nowadays and neither are techniques to deal with reducing the likelihood of exposing sensitive data when appropriate (such as differential privacy). I believe that this is an exciting time for those DBAs who want to invest the effort to transform their careers for the years to come.
- Adapt scripting and automation to end users.
- It varies by company. With previous generations of Oracle and Teradata, you needed a DBA. Now, with enterprise architectures evolving to open source and the Hadoop ecosystem, the expertise lies with the developers rather than the DBAs. DBAs need to evolve to be more developer-facing.
- Data is more mainstream and dominant. DBAs have evolved toward delivering more sophisticated insights and writing scripts. The work is more upstream and complex.
- In a world of real-time data, the roles of DBA and DevOps are changing significantly. Decentralized data applications need a new set of tools for management.
Here’s who we spoke to:
Emma McGrattan, S.V.P. of Engineering, Actian
Neena Pemmaraju, VP, Products, Alluxio, Inc.
Tibi Popp, Co-founder and CTO, Archive360
Laura Pressman, Marketing Manager, Automated Insights
Sébastien Vugier, SVP, Ecosystem Engagement and Vertical Solutions, Axway
Kostas Tzoumas, Co-founder and CEO, Data Artisans
Shehan Akmeemana, CTO, Data Dynamics
Peter Smails, V.P. of Marketing and Business Development, Datos IO
Tomer Shiran, Founder and CEO, and Kelly Stirman, CMO, Dremio
Ali Hodroj, Vice President Products and Strategy, GigaSpaces
Flavio Villanustre, CISO and V.P. of Technology, HPCC Systems
Fangjin Yang, Co-founder and CEO, Imply
Murthy Mathiprakasam, Director of Product Marketing, Informatica
Iran Hutchinson, Product Manager and Big Data Analytics Software/Systems Architect, InterSystems
Dipti Borkar, V.P. of Products, Kinetica
Adnan Mahmud, Founder and CEO, LiveStories
Jack Norris, S.V.P. Data and Applications, MapR
Derek Smith, Co-founder and CEO, Naveego
Ken Tsai, Global Vice President, Head of Database and Data Management Product Marketing, SAP
Clarke Patterson, Head of Product Marketing, StreamSets
Seeta Somagani, Solutions Architect, VoltDB
Opinions expressed by DZone contributors are their own.