Forecasting the Future: Let's Rewind to the Basics
The goal of forecasting is not to predict the future but to show what meaningful action you can take in the present to make the future bright.
Those who predict may not always have sound knowledge. So, how does one gain the right knowledge? By going back to the basics to strengthen the foundation. This is especially true in the field of predictive analytics.
Having evolved from a buzzword a couple of years ago, data science now suffers from information overload, with so much popularity (some might call it hype) around forecasting and predicting the future. Endless information is readily available, be it sample data, code, or even free space to practice that code.
Despite so much exposure, one common concern we keep hearing is the poor accuracy or precision of the model. After all the hard work, the results are not even close to what the stakeholder wants. Surely, data science did not start as an alternative to the crystal ball or astrology. So, what exactly is amiss?
Is it a surprise that only 20%-30% of a project deals with data modeling and the rest is about preparing the data?
From our school days, we know that to have a good future, we need to get the basics right first. Listed below are five basic blocks one needs to consider before building a data science castle.
Project charter: Yes, you read it right. This is the starting point. We need to clearly articulate the business problem at hand, the domain involved, the stakeholders for whom we are solving the problem, and, most importantly, their tolerance for false positives or false negatives.
Analyzing the real problem to be solved: As Steve Blasis says, "The real answer to asking the right questions is simple: Keep asking. In the end, the right questions are those that get you relevant information." A lean tool like the 5 Whys will help you understand the underlying problem. This is an essential step, as we need to collect relevant data for further engineering.
Understanding the domain: The problem arises within an ecosystem, so it’s important to understand the domain and how data gets populated. There are plenty of user-study methods available, e.g., surveys, interviews, and the latest ethnographic techniques. Everything comes at a cost, and building knowledge of a domain is no exception. Hence, we need to choose the method that fits our budget while still being thorough enough to reach the depth of the ecosystem.
Identifying potential causes: Once we understand the problem and ecosystem, it is important to understand the contributing parameters, regardless of whether they are controllable. While we can firm up the correlation through modeling, at this point it’s more about our understanding of the ecosystem. In mathematical terms, this means finding all Xs that may influence Y, the problem we are trying to solve: Y = f(x1, x2, x3, …). A cause-and-effect diagram (AKA fishbone) comes in handy for categorizing the possible causes of a problem.
Collecting relevant data: A fishbone diagram helps in understanding what data we need to collect. Of course, data collection comes with constraints of time and cost, traded off against accuracy and precision. This step is vital to understanding data maturity and setting expectations about the desired outcome. If we can collect data for all potential causes, or can engineer the available data to cover all influencers, there is a high probability of our model being accurate and/or precise.
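To make the Y = f(x1, x2, x3, …) idea concrete, here is a minimal sketch of screening candidate causes from a fishbone diagram before committing to a full model. The data and variable names (x1, x2, x3) are entirely hypothetical, generated for illustration; in practice each X would come from the data you collected for a branch of the fishbone.

```python
import numpy as np

# Hypothetical data: three candidate causes from a fishbone diagram.
# In this fabricated example, x1 and x2 actually drive Y; x3 does not.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)  # e.g., a process parameter
x2 = rng.normal(size=n)  # e.g., an environmental factor
x3 = rng.normal(size=n)  # e.g., a factor with no real effect
y = 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Screen each candidate X by its correlation with Y. A weak correlation
# here does not rule a cause out (effects can be nonlinear or joint),
# but strong ones tell you where data collection is likely to pay off.
for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name}: correlation with Y = {r:+.2f}")
```

This is only a first-pass filter, not the modeling step itself; it helps decide which potential causes justify the time and cost of collecting more data.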
Once we have a solid foundation, the rest of the steps — data cleansing, data exploration and engineering, modeling, and visualization — follow seamlessly.
Before we fast-forward to predicting the future, we need to rewind and get the basics right. The goal of forecasting is not to predict the future but to show what meaningful action you can take in the present to make the future bright.
Opinions expressed by DZone contributors are their own.