For AI to Be Effective, It Needs to Be Fueled With Quality Data

Data quality and intelligent AI go hand-in-hand. If you want to succeed in using AI, you must spend more time working on the data and less time working on the AI.

There's no doubt that AI has usurped big data as the enterprise technology industry's new favorite buzzword. After all, it's on Gartner's 2017 Hype Cycle for emerging technologies for a reason.

While progress was slow during the first few decades, AI advancement has accelerated rapidly during the last decade. Some people say AI will augment humans and maybe even make us immortal; others, more pessimistic, say AI will lead to conflict and may even automate our society out of jobs. Despite the differences in opinion, the fact is that only a few people can identify what AI really is. Today, we are surrounded by small forms of AI, like the voice assistants in our smartphones, often without noticing how much work they do for us.

From Siri to self-driving cars, AI has already shown a lot of promise and hinted at the benefits it can bring to our economy, personal lives, and society at large. The question now turns to how enterprises will benefit from AI. But before companies or people can obtain the numerous improvements AI promises to deliver, they must first start with good-quality, clean data. Having accurate, cleansed, and verified information is critical to the success of AI. The data that fuels AI-driven applications must be trusted, timely, and of the highest quality.

Data Quality and Intelligence Must Go Hand-in-Hand

Organizations currently use data to extract numerous informational assets that then inform their strategic plans. Those plans dictate the future of the organization and how it fares against rising competition. Considering the importance of data, the impact of low-quality information is intimidating to think of. In fact, bad data costs the US about $3 trillion per year.

Recently, I had the opportunity to interview Nicholas Piette and Jean-Michel Franco from Talend, one of the leading big data and cloud integration companies. Nicholas Piette, who is the Chief Evangelist at Talend, has been working with integration companies for nine years now and has been part of Talend for over a year.

When asked about the link between data quality and artificial intelligence, Nick Piette responded with authority that you cannot do one without the other. Data quality and AI go hand in hand, and data quality must be present for AI to be not only accurate but impactful.

To explain the concept of data quality and its impact on AI, Nick drew on the five Rs method, which he said he learned from David Shrier, his professor at MIT. The five Rs mentioned by Nicholas include:

Whatever data you have should be relevant to what you do and should serve as a guide and not as a deterrent.

We might reach a point where the influx of data at our fingertips is too overwhelming for us to tell which elements are really useful and which are disposable. This is where the concept of data readiness comes in. Having mountains of historical data can be helpful for extracting patterns, forecasting cyclical behavior, or re-engineering processes that lead to undesirable outcomes. However, as businesses move toward increased use of real-time engines and applications, data readiness, meaning the information that was most recently made available, takes on greater importance. The data that you apply should be recent and should contain figures that reflect reality.
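As a rough illustration of the recency idea, the sketch below separates records that were updated within a chosen window from those that were not. The `updated_at` field, the 30-day window, and the `split_by_recency` helper are all assumptions made for this example; they do not come from the interview or from any Talend product.

```python
from datetime import datetime, timedelta


def split_by_recency(records, now, max_age_days=30):
    """Split records into (recent, stale) using a hypothetical 'updated_at' field."""
    cutoff = now - timedelta(days=max_age_days)
    recent = [r for r in records if r["updated_at"] >= cutoff]
    stale = [r for r in records if r["updated_at"] < cutoff]
    return recent, stale


# Made-up sample records, purely for illustration.
records = [
    {"id": 1, "updated_at": datetime(2018, 1, 10)},
    {"id": 2, "updated_at": datetime(2017, 6, 1)},
]

recent, stale = split_by_recency(records, now=datetime(2018, 1, 15))
print("recent ids:", [r["id"] for r in recent])
print("stale ids:", [r["id"] for r in stale])
```

The right window depends entirely on the use case: a real-time engine may treat data older than a few minutes as stale, while a seasonal forecast may need years of history.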

AI Use Cases: Once You Know the Rules, How Do You Play the Game?

When asked for the best examples of the use of AI at work today, Nick said he considered the use of AI in healthcare as a shining example of both what has been achieved using AI to date and what more companies can do with this technology. More specifically, Nick said:

"Today, healthcare professionals are using AI technology to determine the chances of a heart attack in an individual, or predict cardiac diseases. AI is now ready to assist doctors and help them diagnose patients in ways they were unable to do before."

All accolades aside, the use of AI in healthcare is also currently dictated by our understanding or interpretation of what the AI algorithms produce. Thus, if an AI system comes up with new insights that seem 'foreign' to our current understanding, it's often difficult for the end-user to 'trust' that analysis. According to Nick, the only way society can truly trust and comprehend the results delivered by AI algorithms is if we know that at the very core of those analyses is quality data.

Nicholas Piette added that ensuring data quality is an absolutely necessary prerequisite for all companies looking to implement AI. He said:

"100% of AI projects are subject to fail if there are no solid efforts beforehand to improve the quality of the data being used to fuel the applications. Making no effort to ensure the data you are using is absolutely accurate and trusted is, in my opinion, indicative of unclear objectives regarding what AI is expected to answer or do. I understand it can be difficult to acknowledge, but if data quality mandates aren't addressed up front, by the time the mistake is realized, a lot of damage has already been done. So make sure it's forefront."

Nick also pointed out that hearing they have a data problem is not easy for organizations to digest. Adding a light touch of humor, he said, "Telling a company it has a data problem is like telling someone they have an ugly child." But the only way to solve a problem is to first realize you have one and be willing to put in the time needed to fix it.

Referring to companies' inability to realize that they have a problem, Nicholas pointed out that more than half of the companies he has worked with did not believe they had a data problem until it was pointed out to them. Once it was, they had their "aha!" moment.
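One way to make a data problem visible before any AI work starts is a quick profile of the data. The sketch below counts records with missing required fields and records whose key repeats. The field names, the sample customers, and the `profile_records` helper are hypothetical and chosen only for illustration; a real project would rely on a proper data quality tool and far richer checks.

```python
def profile_records(records, required_fields, key_field):
    """Count records with empty required fields and records that repeat a key."""
    missing = 0
    duplicates = 0
    seen = set()
    for record in records:
        # A record counts as incomplete if any required field is None or "".
        if any(record.get(field) in (None, "") for field in required_fields):
            missing += 1
        # A record counts as a duplicate if its key was already seen.
        key = record.get(key_field)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "total": len(records),
        "missing_required": missing,
        "duplicate_keys": duplicates,
    }


# Made-up sample data, purely for illustration.
customers = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "FR"},
    {"id": 2, "email": "b@example.com", "country": "FR"},
    {"id": 3, "email": "c@example.com", "country": None},
]

report = profile_records(customers, required_fields=["email", "country"], key_field="id")
print(report)
```

Even a simple count like this gives a team something concrete to discuss, which is often easier to accept than a general claim that the data is bad.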

Nick further voiced his opinion that it would be great if AI could, in the future, explain exactly how it reached an answer and which computations went into that conclusion. Until that happens, data quality and AI remain interlinked, and there is no way to succeed with AI without ensuring the accuracy of the data you feed into the machine. Nick says:

"If you want to be successful, you have to spend more time working on the data and less time working on the AI."

Topics:
ai, data quality, data analytics, ai applications

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
