
How to Make Machine Learning Fairer


Can we make AI technology fair?



As artificial intelligence has become more powerful, more and more attention is being given to how we can ensure it’s as fair as possible. A recent project by researchers at MIT CSAIL suggests that the key might be to focus on how the data that underpins AI today is collected.

“Computer scientists are often quick to say that the way to make these systems less biased is to simply design better algorithms,” the researchers say. “But algorithms are only as good as the data they’re using, and our research shows that you can often make a bigger difference with better data.”

The researchers were able not only to identify data that could cause potential problems, but also to quantify the impact each factor could have on accuracy levels. They then used this to show how different ways of collecting data could reduce the various types of bias they had identified, whilst still maintaining high levels of predictive accuracy.
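The paper doesn't prescribe a single recipe, but the first diagnostic step it describes — asking whether a model's errors concentrate in particular groups — can be sketched in a few lines. This is a minimal illustration, not the MIT team's actual tooling; the group labels and predictions below are invented toy data.

```python
from collections import defaultdict

def accuracy_by_group(examples):
    """Compute per-group accuracy from (group, y_true, y_pred) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in examples:
        total[group] += 1
        if y_true == y_pred:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy predictions: the model is noticeably worse on group "B".
preds = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 1), ("B", 1, 0),
]
rates = accuracy_by_group(preds)
gap = max(rates.values()) - min(rates.values())
print(rates)                      # accuracy for each group
print(f"accuracy gap: {gap:.2f}")
```

A large gap between the best- and worst-served groups is the kind of signal that points toward the data, rather than the algorithm, as the thing to fix.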

“We view this as a toolbox for helping machine learning engineers figure out what questions to ask of their data in order to diagnose why their systems may be making unfair predictions,” the team explains.

Fairer Data

I wrote recently about a project that was aiming to provide comparable levels of predictive accuracy with a lot less training data. It’s a nice example of how more data is not always beneficial to the performance of the system. This could be because the additional data is of poor quality, but it could also be because it lacks fundamental diversity.

The team believes that their approach allows developers to look at datasets and easily see if biases exist and whether more data is needed from particular demographic groups to make the data representative and, therefore, fairer.
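A crude version of that "look at the dataset" check is simply measuring each group's share of the training data and flagging anything below a chosen floor. The threshold and group labels here are invented for illustration; what counts as "representative" depends entirely on the population the model will serve.

```python
from collections import Counter

def representation_report(groups, min_share=0.2):
    """Report each group's share of the dataset and flag those below min_share."""
    counts = Counter(groups)
    n = len(groups)
    return {
        g: {"share": c / n, "underrepresented": c / n < min_share}
        for g, c in counts.items()
    }

# Toy dataset: group "A" dominates, "B" and "C" fall below the 20% floor.
labels = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
report = representation_report(labels)
for group, row in sorted(report.items()):
    print(group, f"{row['share']:.0%}", "⚠" if row["underrepresented"] else "")
```

The flagged groups are the ones where collecting more examples is likely to pay off most, which is exactly the question the next quote addresses.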

“We can plot trajectory curves to see what would happen if we added 2,000 more people versus 20,000, and from that figure out what size the dataset should be if we want to have the best of all worlds,” they say. “With a more nuanced approach like this, hospitals and other institutions would be better equipped to do cost-benefit analyses to see if it would be useful to get more data.”
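The trajectory curves the researchers describe are, in spirit, learning curves: accuracy as a function of dataset size, extrapolated to support a cost-benefit decision. The saturating power-law curve and all coefficients below are hypothetical placeholders, not fitted to any real dataset — the point is only the shape of the reasoning.

```python
def projected_accuracy(n, a=0.95, b=0.8, c=0.35):
    """Hypothetical saturating learning curve: accuracy approaches `a` as n grows.

    Coefficients are illustrative; in practice they would be fitted to
    accuracies measured on subsets of the data you already have.
    """
    return a - b * n ** (-c)

current = 1000
for extra in (0, 2000, 20000):
    n = current + extra
    print(f"n={n:>6}: projected accuracy {projected_accuracy(n):.3f}")

# Marginal value per example: the first 2,000 extras vs the next 18,000.
gain_small = projected_accuracy(3000) - projected_accuracy(1000)
gain_large = projected_accuracy(21000) - projected_accuracy(3000)
print(f"per-example gain, first 2k: {gain_small / 2000:.2e}")
print(f"per-example gain, next 18k: {gain_large / 18000:.2e}")
```

Because the per-example gain shrinks as the dataset grows, a curve like this lets an institution ask whether the tenth thousand examples is worth its collection cost — the "best of all worlds" trade-off the quote describes.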

The researchers believe this is perhaps the best way to improve the fairness of AI, as whilst you could request additional information from your existing pool of participants, this could simply result in acquiring largely irrelevant information.

As AI becomes more powerful and plays a bigger role in the decisions we make in life, it’s positive to see a growing number of projects attempting to make those algorithms as fair as possible. Whilst this project suggests the need for authentic data, there is growing usage of virtual data in healthcare use cases, especially in areas where real data might be limited. Hopefully, this study will act as a reminder of the need for that data to result in fair outcomes, and not embed bias and discrimination.



Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
