Skills Developers Need for Big Data
To work with big data, developers need to understand the business problem they are working on, along with the deployment architectures and data.
To gather insights on the state of big data in 2018, we talked to 22 executives from 21 companies who are helping clients manage and optimize their data to drive business value. We asked them, "What skills do developers need to have to work on big data projects?" Here's what they told us.
Know the Business Problem
- Developers need a variety of skills to work on big data projects, including the following three that are crucial for success: They must have a clear understanding of the range of business objectives within a company and how those align with the capabilities of various technologies. Similarly, in the context of an application, developers need to have an understanding of the business value of the datasets they’re working with. Finally, developers need the ability to build and manage an application as a member of a self-contained team that’s part of a larger organization.
- While developers rule right now, platforms will converge in order to scale, so you had better understand Kafka. You don’t have to know all of the coding, since there will be tools that remove the connectivity challenges. Stay in touch with what is happening with the data you are collecting, and ensure it is being used ethically and securely.
- Leverage a data fabric to simplify processes. Treat data as a general resource alongside containers and microservices. A simplified transformation process lets developers pursue business moments, and added intelligence makes processes more tailored and reactive. Developers can see quality issues and their root causes, which makes their job easier so they’re able to contribute more.
- Understand the basic data vocabulary of structure, dimension, and variables. Understand what kind of analysis can be done with a given variable. Know the gotchas for data: What are the minimum quality standards? What tests can you run to determine data integrity?
- Know how to work with data at scale and how to handle concurrency across multiple users. Application developers pick up languages quickly; what they need to understand is how the data ecosystem works.
- Developers need to use programming languages, probability and statistics, applied math, and algorithms for the rising trend of machine learning. They also need to understand the context of data, how it will be consumed by the end user, and how it will be reused. They need to think in terms of distributed computing and architecture to properly separate data management into distinct zones and to keep the big data architecture organized, agile, and secure. DevOps principles should be applied, too. By being involved throughout the software delivery process, data experts can help the rest of the team understand the types of data challenges their software will face in production. The result of big data and DevOps teams working together will be apps whose real-world behavior matches their behavior in development and testing environments as closely as possible.
- Data engineering and data science are the big divisions. Basic knowledge of data science might suffice, but deep knowledge of the different data technologies is necessary. Despite NoSQL’s popularity, SQL is still the standard for querying data. Developers need to be aware of the different deployment options, such as cloud native, containers, and other popular choices. But my personal view is that developers need to know the underlying concepts of databases; without them, the sheer number of technologies in the space can seem daunting to learn. A good understanding of database and system concepts such as consistency guarantees, transactional boundaries, system architecture, and responsibilities will help developers understand the landscape, categorize the technologies, and identify the ones they should be looking into.
- Understand that the big data world is decentralized and distributed by nature. Understand the pitfalls of high availability, latency, and debugging. Understand the concepts of in-memory processing with Spark and data locality with Hadoop. Understand the open source options for AI/ML with Apache Spark. You are not restricted to big frameworks, so look into simpler frameworks as well.
- Developers should not need to be aware of any specialized development languages and should be able to focus on identifying the core business logic needed to deliver big data projects. A systematic, AI-driven data platform can then translate the business logic into the underlying processing, future-proofing developers against any changes to processing technology frameworks.
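One respondent asked which tests you can run to determine data integrity. As a rough illustration only, and not something any of the executives described, the sketch below runs three common checks in plain Python: completeness (required fields are present), uniqueness (no duplicate keys), and validity (values fall in an allowed range). The function name `check_integrity`, the field names, and the sample records are all hypothetical and chosen for this example.

```python
from collections import Counter


def check_integrity(rows, key, required, ranges):
    """Return simple data-quality findings for a list of dict records.

    Hypothetical helper for illustration; real projects often use a
    dedicated validation library or checks inside the data platform.
    """
    # Completeness: count records where a required field is missing or empty.
    missing = {
        field: sum(1 for r in rows if r.get(field) in (None, ""))
        for field in required
    }
    # Uniqueness: key values that appear more than once.
    counts = Counter(r.get(key) for r in rows)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    # Validity: count non-null values outside their allowed [low, high] range.
    out_of_range = {
        field: sum(
            1
            for r in rows
            if r.get(field) is not None and not (low <= r[field] <= high)
        )
        for field, (low, high) in ranges.items()
    }
    return {
        "missing": missing,
        "duplicates": duplicates,
        "out_of_range": out_of_range,
    }


# Hypothetical sample records, invented for this example.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 41},
    {"id": 2, "email": "c@example.com", "age": 230},
    {"id": 3, "email": None, "age": None},
]

report = check_integrity(
    records,
    key="id",
    required=["email", "age"],
    ranges={"age": (0, 120)},
)
print(report)
```

Checks like these are deliberately simple. The point is that minimum quality standards can be written down as small, repeatable tests and run before any analysis begins.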