Here's who we talked to:
Uri Maoz, Head of U.S. Sales and Marketing, Anodot | Dave McCrory, CTO, Basho | Carl Tsukahara, CMO, Birst | Bob Vaillancourt, Vice President, CFB Strategies | Mikko Jarva, CTO Intelligent Data, Comptel | Sham Mustafa, Co-Founder and CEO, Correlation One | Andrew Brust, Senior Director Marketing Strategy, Datameer | Tarun Thakur, CEO/Co-Founder, Datos IO | Guy Yehiav, CEO, Profitect | Hjalmar Gislason, Vice President of Data, Qlik | Guy Levy-Yurista, Head of Product, Sisense | Girish Pancha, CEO, StreamSets | Ciaran Dynes, Vice President of Products, Talend | Kim Hanmark, Director, Professional Services, TARGIT | Dennis Duckworth, Director of Product Marketing, VoltDB.
We asked these executives, "What skills do developers need to work on big data projects?"
Here's what they told us:
- Application owners are on the hook for the budget and for keeping the app running 24x7, which means handling deployment, scale, and data management. Application developers are smart, but is the database consistently protected against programming errors? The app needs to scale, be resilient, and be protected. MongoDB takes the pain away. The new owners of IT are application developers; they define the needs of the app.
- A synthesis of statistical knowledge and programming. Those with a background in statistics pick up programming quickly; the reverse is not true. Ramp up your statistics and applied math coding skills, since statistics is the foundation of data science.
- Understand SQL and traditional programming. Have the discipline to understand governance and security. A lot of people are getting into NoSQL with MongoDB and Cassandra; however, the same people doing business intelligence want access to big data and already know SQL. NoSQL requires some new skills, but SQL is still useful to have. Know how to build rock-solid, secure applications. Learn Spark for data exploration.
- Early adopters should learn Scala and how to program with Spark; that's valuable for future employment. R programming is super valuable for data scientists. It's also fun to use for exploring data sets and building predictive models. Python is similar. Learn the ecosystems for the packages you control. Enterprise developers should wait until they know which big data engines they'll be working on, so they stay on a logical continuum.
- Different skills for different levels. Understand the options available, load and get the data ready, and push processing down to the ETL level. Middle-tier developers need to know how to prepare data in a model that can be queried. There will be a huge collection of attributes. Don't assume it will scale; know what you're scaling for. Think about the architecture for the application, and be aware of multi-layered architectural design issues for the next set of use cases. Know the level of granularity at which you need to scale.
- Developers tend to think of the data ingestion process as a piece of the app instead of separating ingestion from data analytics. Move from custom coding to building a single ingestion platform for all your analytic needs. Use emerging technology to learn and to work more efficiently and effectively. Use Python, Groovy, or Java to add custom data flow logic. Adopt an architect's mindset that separates ingestion from analytics.
- Focus on machine learning and on algorithms that are highly scalable and pluggable. Get more comfortable with data science. Understand the difference between a distributed system and a single-node system. Learn how to handle errors; there's a lot of complexity under the covers. Know how to take your work to production.
- Collaborate with storytellers and statisticians. Develop data integration skills. Don’t be afraid to insert yourself into B.I. Be pushier. Developers can contribute more than they think – they are two-thirds of the unicorn.
- It depends on the scale and the technology. Look for tighter integration between R and Spark. You need PhDs with math and statistics backgrounds. If we push capabilities to the edge to do data discovery, then pseudo-data scientists can access the data and garner insights. Enable regular people to be pseudo-data scientists. By connecting different tables with common identifiers, we can let people spend less time modeling data.
- Think about who is going to be using your data: they're smart, but in a different way than you are. Use tools and look at heat maps. Go to the balcony and observe, in an unbiased way, how people are using the tool. Make sure users are receiving the value they need. Implementation needs to be short; it cannot take six months.
- Developers are good at learning any new language relatively fast, given their mindset. They need to work on non-developer skill sets that call for an analytical mind: preparing data the right way and presenting insights to others.
- Understand the architecture. Understand how to build a system from the ground up that scales and handles large amounts of data. Read about how different companies are building architectures that will scale out. The industry is changing fast. Stay up to date with distributed systems, not a single product or database. Understand the basics.
- Developers need to understand how data is structured and how to resolve conflicts using CRDTs (conflict-free replicated data types). Machine learning, AI, and neural networks will require machines to parse the data. Visualization tools will be needed to present data and insights to users.
- Rapid deployment lifecycles with iterative evolution. Do not become wedded to a single set of tools, because they will change as rapidly as the big data landscape is changing. Prepare for change. The principles will remain consistent, but the technology will change.
- Patience. I graduated with a degree in History and Political Science and was first introduced to big data on Mayor Mike Bloomberg's 2005 re-election campaign. You first have to define the end objective and work backwards. There is an abundance of free information on the web that you can use to teach yourself over time, in addition to many helpful tools that have an intuitive UI and are fairly easy to learn. To start off, though, it's best to have a basic understanding of SQL and general-purpose programming languages such as Java, C, or Python.
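CRDTs, mentioned above as a way to resolve conflicts, are easier to grasp with a concrete case. The following is a minimal sketch of our own, not something the executives described: it assumes a grow-only counter (G-Counter), one of the simplest CRDTs, and the `GCounter` class and replica names are hypothetical. Each replica increments only its own slot, and merging takes the element-wise maximum, so replicas converge to the same total regardless of the order in which they exchange state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Hypothetical grow-only counter CRDT: each replica increments only its own slot."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A G-Counter can only grow; negative increments would break convergence.
        if amount < 0:
            raise ValueError("G-Counter only supports non-negative increments")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of every replica's slot.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge no matter how often or in what order they merge.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Usage: two replicas accept writes independently, then exchange state.
a = GCounter("node-a")
b = GCounter("node-b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # prints "5 5"
```

Because merging is idempotent, re-sending the same state (a common occurrence in distributed systems) does not double-count updates.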
Are there additional skills you think developers need for big data projects?