What Devs Need to Know About Big Data
If you're a developer, the skills you need include Java, Python, Spark, Machine Learning, and natural language processing.
To gather insights on the state of Big Data today, we spoke with 22 executives from 20 companies who are working in Big Data themselves or providing Big Data solutions to clients.
Here's what they told us when we asked, "What skills do developers need to have to work on Big Data projects?"
- Be familiar with JSON, which is the standard. Tie it in with Java and Scala. Be open-minded about different best practices in the Big Data environment. Understand the different use cases and how to take advantage of Spark, rather than approaching everything with an app-building mentality.
- Learn new languages and tools beyond Python, R, Ruby, and the Apache projects. Don’t throw away the lessons of the past. Understand the implications of security being an afterthought in Hadoop. Maintain best practices for data provenance and security.
- Evolve with modern technology like microservices, Hadoop, and Cassandra. Learn these and push the envelope.
- Conversion tools and databases. Machine Learning. Natural language processing. Think outside the box. Work in a collaborative environment.
- Know your rules of thumb for latency: how long it takes to get data off disk, out of memory, and across the network. Understand how your application will handle failure. Hardware fails. Code fails. Databases fail. How will your application handle those scenarios? Build in operational visibility so you will know when something goes wrong and what went wrong. (A minimal failure-handling sketch appears after this list.)
- Be excited by the future and do not feel threatened by it. Our developers in Kiev and Tel Aviv embrace these challenges. Get into AI and Machine Learning tools that enable customers to interact through NLP.
- It depends on the category of developers. Those in Machine Learning and SQL need to understand advanced statistical approaches. Other developers are builders who integrate technology; like architects, DevOps must handle integration. Stay on top of the components and the approaches to integration, especially in the cloud.
- Be able to script in Spark, Java, and Scala. Understand what the business is trying to accomplish. What decisions are being made? What data is needed? How should it be presented? Have multiple skill sets while understanding the business goals and objectives. Innovation happens at the intersection of different disciplines.
- SQL is an industry-standard tool, and it's very flexible. Combine it with knowledge of Linux, Python, and Java for a successful mix of knowledge and expertise.
- Developers are good at adapting to tools and being creative. Understand data and how to translate it into tangible business value.
- Developers are typically focused on the technology — and rightly so. They also need to understand the business problem and work with the entire team to identify and solve the business problem. Don’t optimize for a single use case; things are changing too quickly. Understand and partner with lines of business leaders. There’s a “hero” culture in open-source communities. Developers should be responsible for looking for proprietary solutions. Understand the benefits of going from one solution to another — speed versus applicability.
- Keep an open mind. Remain adaptable. Don’t pin yourself to a technology. Don’t get too emotionally committed to what technology you are using. It’s safer to be a Big Data analyst than a Spark analyst.
- Prepare yourself by working with large datasets at every opportunity. Be conscientious about data integrity, hygiene, applications, and moving data. There are a lot of aspects to Big Data. Determine where your interests lie. Get up to speed on Machine Learning, as this is where Big Data is headed.
- Understand interfaces and natural language tools to do the job. Reduce friction for the developers. Make Hadoop fast, easy, and secure to use. Spark and Spark SQL are familiar to developers.
- There are two ways to approach it. One is to rely on the tools that are available for application development and SQL, without a command of any major programming language. The other is to skip the abstraction layer and know the Java, Python, and R ecosystems and libraries.
- Have a good handle on Python. Understand statistics and modeling algorithms, regression, and linear programming. Go back to basics. Understand the fundamentals. (A basic regression example appears after this list.)
- A talented developer who is a good engineer but has no business experience is less valuable to me than someone with two to three years of programming experience who also has experience in retail, warehouse management, customer service, or product management. I’ll invest in them to take the IT classes they need. Look for jobs that aren’t in development.
- Advanced programming skills and knowledge of the open-source ecosystem.
- Data processing and cleansing techniques. Data analysis, in particular the use of Machine Learning techniques. Data modeling: how to structure unstructured data so it is easy to analyze. (A small log-structuring example appears after this list.)
- Follow the tutorials provided by Azure, AWS, and other cloud service providers. Tools such as Apache Hive offer an easy transition from relational database SQL to an SQL-type language that can run on top of Hadoop clusters. Developers can then learn Pig and/or Spark for more flexibility, depending on the problems they are trying to solve. If they want to go deeper in the data science direction, they should learn R. Finally, they should be comfortable with the DevOps side of deploying Hadoop clusters in cloud environments. (A short Spark SQL example appears after this list.)
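To make the failure-handling and latency advice concrete, here is a minimal Python sketch of retrying a transient failure with exponential backoff. The latency figures in the comments are rough orders of magnitude, not measurements, and `fetch_record` is a hypothetical stand-in for any network or database call.

```python
import random
import time

# Rough latency orders of magnitude (approximate; varies widely by hardware):
#   main-memory reference            ~100 ns
#   SSD random read                  ~100 us
#   spinning-disk seek               ~10 ms
#   round trip within a datacenter   ~500 us
#   cross-continent network round trip ~150 ms

def call_with_retries(fn, max_attempts=5, base_delay=0.1):
    """Retry a flaky call with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # give up and surface the failure to operators
            # Back off: 0.1s, 0.2s, 0.4s, ... plus a little randomness.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))

# Usage (fetch_record is hypothetical):
# record = call_with_retries(lambda: fetch_record("user-42"))
```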
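On the "go back to basics" point about statistics and regression, a minimal sketch: fit a simple linear model to synthetic data with ordinary least squares, using nothing but NumPy. The data here is made up purely for illustration.

```python
import numpy as np

# Synthetic data: y = 3x + 5 plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(0.0, 1.0, size=200)

# Ordinary least squares via a design matrix with an intercept column.
X = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")  # close to 3 and 5
```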
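For the point about structuring unstructured data so it is easy to analyze, here is one small example of the idea: parse semi-structured log lines into a table and aggregate them. The log format and field names are invented for illustration, and pandas is just one of many tools that could do this.

```python
import re
import pandas as pd

# Semi-structured log lines (format invented for illustration).
raw_lines = [
    "2017-03-01T10:15:02 WARN  checkout latency_ms=840 user=alice",
    "2017-03-01T10:15:07 ERROR checkout latency_ms=2300 user=bob",
    "2017-03-01T10:15:09 INFO  search latency_ms=120 user=carol",
]

pattern = re.compile(
    r"(?P<ts>\S+)\s+(?P<level>\w+)\s+(?P<service>\w+)\s+"
    r"latency_ms=(?P<latency_ms>\d+)\s+user=(?P<user>\w+)"
)

# Parse each line into a structured record, then load the records into a table.
records = []
for line in raw_lines:
    match = pattern.search(line)
    if match:
        records.append(match.groupdict())

df = pd.DataFrame(records)
df["latency_ms"] = df["latency_ms"].astype(int)
df["ts"] = pd.to_datetime(df["ts"])

# Once structured, the data is easy to analyze.
print(df.groupby("level")["latency_ms"].mean())
```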
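To illustrate the SQL-on-Hadoop transition mentioned above, here is a short PySpark sketch. The file path, table name, and columns are hypothetical; the same SELECT/GROUP BY thinking carries over directly from a relational database, and a near-identical query would work in Hive.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-on-hadoop-example").getOrCreate()

# Hypothetical dataset: JSON order records stored on HDFS.
orders = spark.read.json("hdfs:///data/orders/")
orders.createOrReplaceTempView("orders")

# Familiar relational SQL, executed across the cluster.
top_customers = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
    ORDER BY total_spent DESC
    LIMIT 10
""")
top_customers.show()
```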
What skills do you think developers need to be successful on Big Data projects?
By the way, here’s who we talked to!
- Nitin Tyagi, Vice President Enterprise Solutions, Cambridge Technology Enterprises.
- Ryan Lippert, Senior Marketing Manager and Sean Anderson, Senior Product Marketing Manager, Cloudera.
- Sanjay Jagad, Senior Manager, Product Marketing, Coho Data.
- Amy Williams, COO, Data Conversion Laboratory (DCL).
- Andrew Brust, Senior Director Market Strategy and Intelligence, Datameer.
- Eric Haller, Executive Vice President, Experian DataLabs.
- Julie Lockner, Global Product Marketing, Data Platforms, Intersystems.
- Jim Frey, V.P. Strategic Alliances, Kentik.
- Eric Mizell, Vice President Global Engineering, Kinetica.
- Rob Consoli, Chief Revenue Officer, Liaison.
- Dale Kim, Senior Director of Industrial Solutions, MapR.
- Chris Cheney, CTO, MPP Global.
- Amit Satoor, Senior Director, Product and Solution Marketing, SAP.
- Guy Levy-Yurista, Head of Product, Sisense.
- Jon Bock, Vice President of Product and Marketing, Snowflake Computing.
- Bob Brodie, CTO, SUMOHeavy.
- Kim Hanmark, Director of Professional Services EMEA, TARGIT.
- Dennis Duckworth, Director of Product Marketing, VoltDB.
- Alex Gorelik, Founder and CEO and Todd Goldman, CMO, Waterline Data.
- Oliver Robinson, Director and Co-Founder, World Programming.
Opinions expressed by DZone contributors are their own.