The Machines Are Biased: Fixing the Fatal Flaws That Plague Modern Algorithms
In this brief guide to building ethical algorithms, we'll uncover how modern algorithms are prone to bias and discover the best practices to avoid it.
"We've arranged a society based on science and technology, in which nobody understands anything about science technology. And this combustible mixture of ignorance and power, sooner or later, is going to blow up in our faces." -Carl Sagan
While applicable to most of our technology, Sagan's quote hits the hardest when thinking about algorithms. We live in a digitally automated world, yet most of us lack a working understanding of how digital automation works.
While conversations around self-driving cars or sentient AI often take center stage in our imaginations, we shouldn't fail to recognize the progress we've already made in areas like problem-solving and machine learning.
"Smart algorithms" essentially power our entire lives. They birth our news feeds, recommend us ideas and people to follow, power our search results, and track, understand and predict our behavior. Our world is transitioning from digitization to automation as we enter the Age of Algorithms.
But before you scroll or click away to the next shiny piece of content, most certainly brought to your attention by one of the algorithms in question here, it might be worth asking: are algorithms ethical?
For the common folk, a question like this yields nothing but confusion.
And this is true to a great extent because the word "algorithm" is a very broad term. An algorithm, in general, refers to a set of rules that must be followed to achieve a particular goal. A recipe for baking apple cinnamon muffins, for example, can be called an algorithm; so can the specific solution to a particular math problem. But these things aren't what we mean when we speak of algorithms being unethical.
In such conversations, we are generally speaking about what I've decided to call Social Algorithms. These are algorithms/models, typically built by machine learning programs based on heaps of consumer or public data, deployed by government or corporate organizations in a way that has a deep and lasting impact on individuals and society.
In that sense, Facebook's ML models constitute a Social Algorithm (loosely speaking, since the company certainly uses more than one model), and predictive policing models—some straight out of Minority Report—are another example of what we can call Social Algorithms.
Granted, the way I've defined a Social Algorithm might be a bit too vague for some readers (for instance, does the YouTube algorithm fall within this category?). But what isn't vague is the way these algorithms affect our lives. Algorithms are becoming a part of an increasing number of decisions we make every day, and so it is worth asking whether we can trust them. We don't know how these machines work, but if they are involved in judging things like our credit scores or our likelihood of becoming criminals, then we certainly ought to.
How Algorithms Can Be Unethical
The internet is full to the brim with examples of biased and prejudiced algorithms. From predictive policing to deciding how healthcare resources are allocated, biased algorithms have affected us all, and not without serious consequences.
I don't want to spend time discussing specific examples or ringing the alarm for how dangerous this whole situation can be. Instead, I want to take the time to break down how exactly algorithms get biased, especially considering how their authors (for the most part at least) have the best intentions in mind.
In a Forbes article, AI expert Bruno Maisonnier points out three major reasons algorithms turn out biased: flawed data, inconsiderate constraints, and the very principles that fuel AI algorithms. Let's break these down further:
Flawed Data

This is certainly the most unsurprising of the bunch. If you feed biased training data to a program, the result is a biased program. What is surprising, though, is the amount of problematic data floating around.
One striking example of this is Amazon's (now scrapped) recruitment algorithm. The program was meant to rate applicants on a scale of one to five stars, allowing recruiters to pick the best of the bunch at a glance. But soon enough, it was discovered that the algorithm had developed a bias against women, favoring men for roles like "software developer."
The discrepancy was rooted in the training data—the past 10 years of applications—which massively underrepresented women, since tech jobs are usually male-dominated. As a direct but unintended consequence, the program had learned to discriminate against women for certain positions.
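The dynamic can be sketched with a toy, entirely invented dataset (nothing here reflects Amazon's actual system): a naive scorer that "learns" hiring rates from skewed historical data simply reproduces the skew.

```python
# Hypothetical illustration of bias learned from skewed training data.
# The numbers are invented; this is not Amazon's actual model or data.
from collections import Counter

# Historical applications as (gender, was_hired) pairs — past hires
# are overwhelmingly male, mirroring a male-dominated applicant pool.
history = (
    [("male", True)] * 80 + [("male", False)] * 120
    + [("female", True)] * 5 + [("female", False)] * 45
)

def train_score(history):
    """'Learn' P(hired | gender) from past outcomes — a naive scorer."""
    hired = Counter(g for g, h in history if h)
    total = Counter(g for g, h in history)
    return {g: hired[g] / total[g] for g in total}

scores = train_score(history)
# Two equally qualified candidates now get different scores purely
# because of gender: male -> 0.4, female -> 0.1.
print(scores)
```

Nothing in the code mentions discrimination; the bias arrives entirely through the historical outcomes the scorer was fit to.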
That's just one example of how biased data leads to prejudiced algorithms, and it is certainly not the only one. More often than not, such data-driven biases are completely unintentional and severely problematic, both for those who build and deploy the programs and for those the programs discriminate against.
Constraints That Birth Bias
Even with a decent data set, Bruno points out, one could still end up with a biased program. Consider Optum's healthcare algorithm, which was found to be biased against Black patients. The algorithm was in use at several prominent healthcare institutions and had thus affected an estimated 200 million people.
With the Optum algorithm, the developers had a rather straightforward goal: to build a program that identifies high-risk patients who need extra care and resources to prevent them from getting worse. One way to do that is to assign "risk scores" (based on a number of different factors) to each patient.
This is where things get interesting. One of the parameters the developers allowed the algorithm to draw from was the cost of care, i.e., the amount people spend on healthcare. Since Black Americans spend significantly less on healthcare than white Americans, largely because they cannot afford to, the algorithm "learned" that Black Americans were, on average, healthier than their white counterparts.
As a direct but unintended consequence, when comparing two patients with identical conditions, the algorithm more often approved special care for the white patient, "thinking" they were sicker. "Although the algorithm did not explicitly apply racial identification to patients," The Guardian reported, "it still played out racial biases in effect. That's because the parameter the algorithm used to signify health – cost of care – had racial biases baked into it."
Here we are reminded of how important it is to train algorithms with appropriate parameters. Had the developers added a condition demanding that the discrepancy between risk scores for similar medical conditions across races be kept to a minimum, the results would have been much different.
However, I doubt any programmer would have seen this coming. In terms of building the algorithms themselves, the best we can do is perhaps be extra cautious about the training conditions and the constraints we subject our algorithms to.
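Under stated assumptions (invented costs and thresholds; nothing here reflects Optum's actual model), a minimal sketch shows both the cost-of-care proxy problem and the kind of parity check on risk scores described above:

```python
# Hypothetical illustration — invented numbers, not Optum's model.

def risk_score(annual_cost):
    """Naive risk model: higher past spending -> 'sicker' patient."""
    return annual_cost / 1000.0

# Two patients with identical medical conditions but unequal access
# to (and spending on) care.
white_patient_cost = 8000
black_patient_cost = 5000

gap = risk_score(white_patient_cost) - risk_score(black_patient_cost)
print(gap)  # 3.0 — the white patient looks far 'sicker' despite equal need

# The kind of constraint a developer could impose: flag any model whose
# score gap between equally sick patients of different groups exceeds
# a tolerance (MAX_GAP is an illustrative choice).
MAX_GAP = 0.5
passes_fairness_check = gap <= MAX_GAP
print(passes_fairness_check)  # False — this model fails the check
```

The proxy (cost) is what smuggles the bias in; the check catches the symptom before deployment rather than after.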
The Principles That Fuel AI Algorithms

Finally, there are limitations and problems inherent to machine learning itself that keep us from designing perfect algorithms. Sometimes it is impossible to quantify precisely, in terms an algorithm can be mapped onto, what we are trying to accomplish. Sometimes computer logic and human logic do not go hand in hand. Sometimes we fail to anticipate what a machine will learn when left on its own. The only way to combat these problems is to remain critical of our algorithms: if machines can be just as biased as humans, humans will need to step up and be just as objective as machines.
How to Build Ethical Algorithms
But it isn't all doom and gloom. Now that we know what creates algorithmic bias, we are better positioned to discuss how to get around it. Here are a few pointers:
Use Clean, Bias-free Datasets to Train Your Algorithms
The first step toward building a fair algorithm, as we discussed above, is using clean data sets that aren't biased toward, and don't under-represent, certain groups in society. The obvious trade-off here is that curating such diverse and accurate databases is often significantly more expensive and time-consuming.
The real problem is that we simply cannot go about using whatever data we can get our hands on, mainly because most systems and institutions we have built are, in one way or another, already biased. Relying on data from places that already harbor bias will only amplify that same bias within our algorithms. What we need to do instead is build entire ethical frameworks that can inform our data selection process for upcoming algorithms (more on these below).
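One practical way to act on this, sketched below with hypothetical group labels, shares, and thresholds, is to audit a training set's representation against the population the model will serve before training begins:

```python
# Hypothetical dataset audit — group names, shares, and the 5% tolerance
# are illustrative assumptions, not a standard.
from collections import Counter

def representation_gaps(samples, population_shares):
    """Return each group's training-data share minus its population share."""
    counts = Counter(samples)
    total = len(samples)
    return {g: counts[g] / total - population_shares.get(g, 0.0)
            for g in population_shares}

training_groups = ["A"] * 700 + ["B"] * 300   # skewed 70/30 training set
population = {"A": 0.5, "B": 0.5}             # real-world 50/50 split

gaps = representation_gaps(training_groups, population)
flagged = {g: d for g, d in gaps.items() if abs(d) > 0.05}
print(flagged)  # both groups off by ~0.2 — the set needs rebalancing
```

An audit like this is cheap compared to the downstream cost of a model trained on a skewed sample, though it only catches representation bias, not label bias of the Amazon kind.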
But beyond just ensuring healthy data sets for budding algorithms, we must also remember that ML-based algorithms (and the companies that build and use them) do not exist in a vacuum. Developers and product designers often work in collaborative environments, relying upon and drawing from a vast library of resources and third-party products, and the third-party service providers involved complicate the issue further.
Given such an interdependent ecosystem, developers need to rely on reputable software and app vendors and use only trusted, transparent products. This added layer of diligence helps ensure that no unintentional bias makes it into the development process through unregulated methodologies or flawed data sets.
Define What You Mean by "Ethical" and Code it Into Your Program
In their book, The Ethical Algorithm, authors Aaron Roth and Michael Kearns break down the two significant challenges we face when building ethical algorithms. "One of the greatest challenges here," they write, "is the development of quantitative definitions of social values that many of us can agree on."
They say we need to be precise when defining terms like 'privacy' and 'fairness.' Achieving that goal will take more than just programmers and app developers.
"But once we have settled on our definitions," they add, "we can try to internalize them in the machine learning pipeline, encoding them into our algorithms." How do you translate ethical complexities into code that machines can read? The authors claim we do it by introducing ethical requirements as constraints. "Instead of asking for the model that only minimizes error, we ask for the model that minimizes error subject to the constraint that it should not violate particular notions of fairness or privacy 'too much.'"
The obvious challenge here is that the model's accuracy will take a hit as a consequence of this new constraint. This added cost to corporations and governments (or basically anyone that wishes to deploy social algorithms) brings us to the final point in our discussion: public opinion.
Promoting Algorithmic Literacy for the Public Is Critical
Algorithmic literacy among the public is critical when it comes to addressing the issues that plague digital automation. We need to be aware of concerns like the trade-off between fairness and accuracy, and of how it translates into imperfect results and more expensive products in the real world.
We have to be aware of the legislation that surrounds such issues and continue to push for better, but at the same time more applicable, solutions. As consumers, we need to support the companies and institutions that go the extra mile to ensure their algorithms are fair and just; and as developers, we need to do our bit in building systems that do more good than harm.
We deploy algorithms because machines can, in principle, be objective and fair. They give us an opportunity to automate the tasks we often struggle with. And with the arrival of rudimentary AI and machine learning, the stage is set for a world where bots do most of the computational heavy lifting—and do it better than us.
But such a dream can soon turn into a nightmare if we aren't careful about how we build our algorithms. Neglecting the biases and other flaws within our algorithms will compound the problem: their outputs will pollute the data for the next generation of social algorithms, making things significantly worse with every iteration.
But with an honest, diligent effort to build algorithms imbued with ethics, and by preparing for the social, political, and cultural shift required to transition into an automated society, we can not just inhabit a world run by algorithms but thrive in it.
Opinions expressed by DZone contributors are their own.