Data Science: What You Already Know Can Hurt You
In data science, new, shiny tools can distract you and impede your judgment.
The Einstellung effect is a psychological phenomenon that shapes how we all arrive at solutions and can impede innovation.
Every day we solve problems - from choosing the quickest way to work, to how we’re going to fix a problem for that one client. How do we know if our solutions are any good? What if there is a much better solution that we haven’t thought of yet?
I recently came across a cover letter where someone said, “Every solution comes to me eventually”. This struck me as a strange thing to say. We don’t have visibility into every solution; we all have unknown unknowns. Beyond that, even things we do know may not come to mind for a particular problem. The Einstellung effect may occur, preventing us from considering all the available solutions.
The Einstellung effect occurs where preexisting knowledge impedes one’s ability to reach an optimal solution. We become unable to consider other solutions when we think we already have one, even though it may not be accurate or optimal. It leaves us cognitively incapable of differentiating previous experience from the current problem. So we may solve a problem but we don’t actually innovate.
Einstellung is a German word that translates to setting, mindset, or attitude. The brain attempts to work efficiently by referring to past solutions without giving the current problem much thought. It’s stuck in a mindset. We apply previous methods to a seemingly similar problem instead of evaluating the problem on its own terms. This effect appears across disciplines and skill levels. Whether or not we know it, we all experience it.
The classic experiment used to validate this effect was conducted by Abraham Luchins in 1942 - the water jar problem.
Participants were separated into two groups, one of which was given a few priming questions before the core question. The priming questions steered the first group toward a particular method of solving the problem. When presented with the core problem, one that couldn’t be solved with the same technique, they were unable to solve it. The participants in the second group, on the other hand, were asked the same core question without the primer and, more often than not, were able to find the optimal solution. (You can find the problem here. Try it yourself!)
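To make the setup concrete, here is a minimal sketch of the water jar task in Python. The jar sizes are values often quoted for this experiment, but treat them as illustrative assumptions rather than something taken from this article. The function names are hypothetical, and modeling a solution as "add or subtract whole jars" is a simplification of the actual pouring.

```python
from itertools import product


def primed_recipe(a, b, c):
    """The habitual recipe learned from priming: fill B, pour off A once and C twice."""
    return b - a - 2 * c


def simplest_recipes(jars, target, max_uses=2):
    """Brute-force every signed combination of jars and keep the ones
    that hit the target with the fewest jar uses.

    Each result is a tuple of coefficients, one per jar: +1 means
    "add this jar once", -2 means "pour this jar off twice", and so on.
    """
    best, best_cost = [], None
    for coeffs in product(range(-max_uses, max_uses + 1), repeat=len(jars)):
        if sum(k * j for k, j in zip(coeffs, jars)) != target:
            continue
        cost = sum(abs(k) for k in coeffs)
        if best_cost is None or cost < best_cost:
            best, best_cost = [coeffs], cost
        elif cost == best_cost:
            best.append(coeffs)
    return best


if __name__ == "__main__":
    # Illustrative priming problem: the habitual recipe works.
    print(primed_recipe(21, 127, 3))            # compare against a target of 100
    # Illustrative core problem: the habit fails, a simpler recipe works.
    print(primed_recipe(28, 76, 3))             # compare against a target of 25
    print(simplest_recipes((28, 76, 3), 25))
```

The search has no memory of earlier problems, so it finds the two-jar answer (A minus C) right away. A primed solver keeps reaching for B - A - 2C, which does not produce the target on the core problem.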
Another experiment involved analyzing chess players and their eye movements on the board. The participants were again split into two groups: the first was shown a position containing both a suboptimal and an optimal solution, the other a position containing just the optimal solution. Players in the first group, once they had found the suboptimal solution, continued to look at squares relating to it even though they said they were actively looking for a better one. Their eyes became fixated on the known solution. The Einstellung effect prevented them from viewing the board without bias even though they were intentionally trying to do so.
This effect suggests that the more experience we gain, the more likely we are to fall into its trap and fail to evaluate each problem on its merits. We need to ask what is fundamentally different about this problem and evaluate each new problem without bias, so that our brains don’t slip into a mechanized state of autopilot. It’s not a lack of knowledge that leads to these errors but initial ideas formed from previous experience.
In Data Science
Data science is a young field where new technologies and methods appear seemingly every day. But be wary, as trending methods may cloud our judgement. These new tools and ideas can be like shiny objects that we can’t look away from even when they aren’t right for our problem: think of adopting Hadoop or NoSQL just to use something trendy or associated with ‘big data’. Rather than leveraging a smaller dataset, we jump into an ocean of unexploited data without adequate reasoning or preparation. Or we approach a problem by blindly throwing the trending algorithm of the day at it. (Recurrent Neural Networks and Random Forests are all the rage these days.) This can lead to solution blindness, especially when intelligence is added too early in the process. Sometimes we form our problem around the solution rather than the other way around.
The Einstellung effect also presents itself in the context of confirmation bias, where we ignore results that don’t support our initial representation of the model or hypothesis. Feature and model selection need to reflect an accurate depiction of the data. Exploratory data analysis is a critical stage in data science that is often overlooked. We need to explore and visualize the data in various ways to dispel preconceived notions before going toward solutions.
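As a small illustration of why looking at the data matters, the sketch below compares two toy samples whose mean, median, and standard deviation are identical but whose shapes are not. The numbers are invented for this example and the helper names are hypothetical; only the Python standard library is used.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical toy samples, invented for illustration only.
spiky = [1, 5, 5, 5, 5, 5, 5, 9]        # one tight cluster plus two outliers
two_groups = [3, 3, 3, 3, 7, 7, 7, 7]   # two separate clusters


def summary(values):
    """Return the summary statistics people often stop at."""
    return (mean(values), median(values), pstdev(values))


def text_histogram(values):
    """Render a crude text histogram, one row per integer value in range."""
    counts = Counter(values)
    rows = []
    for v in range(min(values), max(values) + 1):
        rows.append(f"{v:>2} | {'#' * counts[v]}")
    return "\n".join(rows)


if __name__ == "__main__":
    for name, sample in (("spiky", spiky), ("two_groups", two_groups)):
        print(name, summary(sample))
        print(text_histogram(sample))
        print()
```

Both samples produce the same summary tuple, so a model or hypothesis chosen from the summary alone would treat them as interchangeable. Even a crude plot shows they call for different treatment.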
“Good is the enemy of great.” – Jim Collins
Although a bit of an extreme case, this problem echoes the philosophy of J.K. Simmons’s character in the recent movie Whiplash: "There are no two words in the English language more harmful than 'good job.'" One becomes content with local maxima rather than the absolute maximum.
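The local-maximum picture can be sketched in a few lines of Python. The "solution quality" scores below are made up, and the function names are hypothetical. A greedy climber accepts the first peak it reaches, while restarting from several different points (a rough stand-in for stepping away or asking someone else) can find the higher one.

```python
def hill_climb(scores, start):
    """Greedy ascent: keep moving to the better neighbor until none is better."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]
        if not neighbors:
            return i
        best = max(neighbors, key=lambda j: scores[j])
        if scores[best] <= scores[i]:
            return i  # no neighbor improves on the current position
        i = best


def hill_climb_with_restarts(scores, starts):
    """Run the greedy climb from several starting points and keep the best peak."""
    peaks = [hill_climb(scores, s) for s in starts]
    return max(peaks, key=lambda i: scores[i])


# Hypothetical landscape: index = candidate solution, value = its quality.
scores = [1, 3, 5, 4, 2, 6, 9, 7]

if __name__ == "__main__":
    stuck = hill_climb(scores, 0)
    better = hill_climb_with_restarts(scores, [0, 4, 7])
    print(stuck, scores[stuck])
    print(better, scores[better])
```

Starting from the left, the climber stops at the first peak because every single step away from it looks worse. Only a different starting point reveals that a better peak exists.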
Our brains are sabotaging our ability to come up with new ideas! What can we do about it? Break the pattern.
When we think of geniuses, we usually picture people with a large working memory, able to process more at a single point in time. However, working memory, which is associated with the prefrontal cortex, can block other memories from creating a connection, which consequently prevents creative thinking. A well-known creative process looks something like this:
- Gather as much information as possible
- Come up with ideas – they won’t be good
- Forget about the project and think about or do other things
The third point is key in coming up with novel solutions and bypassing the Einstellung effect. Taking your mind off the task at hand for a while effectively activates the cerebral cortex and gets you out of the working memory to explore new ideas and connections.
Similar to distraction, interleaving is the technique of switching between ongoing tasks to improve memory, retention, and learning. It lets a topic percolate in your mind so that you can extract its general rules. This is not to be confused with multitasking, although like multitasking it can cost some productivity, since switching between projects and modes of thinking is time consuming. But the benefit of jumping in and out of a problem can greatly outweigh the time required if it leads to a better solution. Being flexible and allowing yourself to explore paths that don’t necessarily look promising from the start are great ways to allow your mind to discover new dimensions of a problem.
Collaboration, getting different perspectives, is a great method to break out of a rut. An approach that I like is to have multiple people work on an initial concept separately then convene with their findings and explore each other’s unbiased ideas. If a solution is presented too early, it can cause the others to suffer from the Einstellung effect.
The field of data science lacks meaningful collaboration tools. Data science encompasses a large domain of knowledge and often requires more than one perspective. There are competitions like Kaggle where people can work together on a project, but a meaningful collaboration tool would not only allow data scientists to work with each other on a dataset but also track the decisions made in the process. Visualization redesigns, for example, could greatly benefit from this. Edward Tufte’s redesign of the Challenger data effectively shows the desired result in hindsight. But because it relies on his knowledge of the outcome rather than on the decisions made in the process, it leads to an unfair critique. Most of the data is left out to highlight the major data point that caused the disaster.
In a production environment, sometimes good enough really is good enough. It may not be worth the extra effort to get the best solution. It may not even be possible with the current technology. Marginal benefits may not be worth the time it takes to reach a better solution. The key is in knowing the tradeoffs and when to explore. Recurring problems are the best candidates for exploration, since a better solution could be just around the corner and its payoff is repeated every time the problem comes up.
A Cognitive Network
At Exaptive, one of the things we are striving to facilitate is a better method for discovering novel innovations around data. We want to eradicate the Einstellung effect in our field, and eliminate any associated efficiency loss to boot. (Hey, it’s good to have lofty goals.) We believe something along the lines of a suggestion engine is what data practitioners of all kinds need. Except, in addition to suggesting new approaches, the right suggestion engine would reveal the potential collaborators who designed those approaches. What we like to call a cognitive network - a concept that deserves a post of its own - allows people to explore data in various ways with a diverse set of collaborators and suggests different ways to think about a problem.
At its core, a cognitive network is focused on connections, the kind that wouldn’t be made if the Einstellung effect had its way. Connections are what allow us to know when something fits a particular application, where to apply a technique, or how to translate a concept to another area or field of study. They are the glue that holds pieces of information together by use and meaning. Connections, then, are crucial to innovation, and missing one is detrimental.
We should set aside some time to determine whether we are settling for a known, good enough solution or evaluating the problem with clear eyes and seeing all the solutions. At the end of the day, we may still not consider every solution. However, being mindful of the Einstellung effect and staying open to new approaches, even after an apparent solution has presented itself, will help us reach the solutions that lie just outside our conventional way of thinking and lead to innovations.
Published at DZone with permission of Stephen Arra, DZone MVB. See the original article here.