I had a great discussion with Dave King, the founder and CEO of Exaptive, about the state of Big Data application development today.
Q: What are the keys to Big Data application development?
A: The first key is recognizing that we are now in an environment in which we think about data in applications rather than in reports and spreadsheets. How we work with data has emerged from the data report ethos. Think about how data fits into the workflow, and how to integrate data across many applications and databases to benefit from the knowledge and insights it provides. We need to enable business users to have a conversation with data, edit it, and annotate it, versus just reading and visualizing. Think of data as a landscape. We need better tools to explore the complex landscape.
Q: How can companies get more out of Big Data applications?
A: First and most important for getting the most out of data is reducing the time and cost of experimentation. When it takes a long time to analyze data, you do less exploration because it’s time-consuming and expensive. To get a better return on your Big Data investment, you have to explore hypotheses, and some of the less traditional ones result in the biggest, most unexpected insights. Businesses do not like to take the time, expense, and risk needed to explore non-traditional hypotheses. Minimize iteration cost to be able to do more cycles of analysis.
Second, with as much interest as there is in AI, Machine Learning, and automation, there still needs to be collaboration between humans and computers. Subject matter experts (SMEs) should be working with data scientists and the computing structure.
Third, as businesses invest in Big Data infrastructure and the increased connectivity of data through warehouses and microservices, they also need to think about synthesizing expertise within the company with the data. Network SMEs with data.
Fourth is the ability to reuse the data. Finding a balance between specialization and reuse is hard, but it’s a huge advantage. Some tools are so narrowly focused on a single use case that they’re not reusable by anyone else in the organization. This hamstrings the organization. Target toolkits that enable the reuse of the data and code.
Q: How has Big Data application development changed over the past year?
A: Specialization of tools and data practitioners. LexisNexis and Westlaw, for instance, used to manage all three tiers of data applications: the data, the algorithm, and the presentation. Today, software focuses on one tier, possibly for a single industry or application. In parallel, the full-stack developer is becoming a thing of the past. We’re seeing a proliferation of specialized technical positions like data scientists and data visualization specialists.
Both kinds of specialization create value. They also create overhead for connecting the tiers to provide business solutions.
Q: What real-world problems are your clients solving with Big Data applications?
A: There are three categories of use cases we see repeatedly and that I find particularly interesting to solve.
Medical researchers working with genetics data need powerful tools for data storage, query algorithms, and novel visualizations to help generate hypotheses from the data. This is a great example of how an application helps solve a Big Data challenge across an organization. You’ve got developers, data scientists, researchers, and business decision-makers all needing to collaborate. An application is where they all converge to work with the data and generate value, in a feedback loop between creators and users.
Organizations across a number of verticals have tabular data that actually represents network analysis challenges. When in need of visualization, they can’t escape the “hairball” network problem. The right tools help turn big data into small data, untangle the hairball, and enable exploration of the data to find value.
Digital content providers need a way to take huge corpora of text and images and enable search and visualization by end-users. It has to be fast and intuitive. Going from raw text to a user-friendly web app requires a lot of creative solutions across the data stack, including Machine Learning and Natural Language Processing (NLP).
Q: What are the most common issues you see preventing companies from realizing the benefits of Big Data applications?
A: Companies have a partial-stack development problem. They have a core competency and focus on that core competency. That yields efficiency gains. However, to be successful, they need all the other pieces of their application or their data product. Maybe they need someone who knows Machine Learning, or they need to leverage a broader set of technologies to get the most out of the data they have in-house.
Companies also burn a lot of resources “gluing” things together to get them released. Then, they have to take them apart to improve them. There are some heuristics to follow, but iteration is inevitable and experimentation has benefits. Lowering the cost of iteration improves an application. Companies need the opportunity to learn something new in the act of building and failing, without feeling like they’re failing.
Q: What are your biggest concerns regarding the state of Big Data application development today?
A: We use the term “science” in Data Science. However, Data Science is very different from traditional research science, where you have a hypothesis, run experiments to test the hypothesis, and then have data to support your findings. In Big Data, we take data and form hypotheses after the fact, without knowing the conditions, methods, and hypotheses that led to this data set. That’s risky. We’re doing a disservice to traditional science if we don’t bridge the gap between the two methodologies. We need a more nuanced way of talking about Data Science and collaborating with scientists to realize the potential of Big Data.
Q: What skills do developers need to be successful in Big Data application development?
A: Think about how quickly you can learn new languages and get up to speed on new technologies. The Big Data landscape is changing very quickly and you have to be able to keep up with and respond to the changes. Practice learning new things to develop your own learning curve. The more practice you get, the faster you will become at gaining a new proficiency. This skill and experience will make you invaluable throughout your career.
Over the next decade, I think the ability to synthesize disparate information and data and the ability to connect technology will be increasingly in demand. Data scientists are semi-developers who are really good at connecting things.
Q: What have I failed to ask that you think developers need to know about Big Data application development?
A: While I’ve watched the rise of open source and the contributions of developers to the ecosystem, we haven’t discussed the ways for developers to monetize their work. They can work for a company and earn a salary. They can create open-source shareware that generates some incremental revenue. I’d like to see avenues open to developers to monetize work between closed source and open source.
As Big Data becomes a bigger force, the developers and data scientists who created the magic algorithms that yielded value from all this investment are at risk of losing out. Most of the time, they’ve already given up their intellectual property to their employer or they’ve open-sourced their work. I’d like to see monetization avenues that enable retention and compensation for intellectual property, not just for execution and services.