Can Predictive Models Be Fair?

Consider the fairness of predictive models with this analysis of relevant research papers and the example of credit scoring.

In the first episode of season 3 of the television series Black Mirror, we discover the dystopia of a society governed by a "personal rating," a score ranging from 0 to 5. In this world, each person rates the others, and the best rated have access to better services (priority service, better rates, better prices, etc.). Will this tendency to construct scores in all sorts of fields (historically for credit, but today for criminal and even civic matters in some countries) not lead to a world that is an endless popularity contest? And how would that be compatible with social justice, which is a priori desirable?

A credit score is, from an actuarial point of view, a quantity proportional to the probability that a borrower will not honor their commitments. Failing to honor them may mean missing payments for three consecutive months, or just paying late. In real life, as always, it's a little more complicated. In the United States or Great Britain, it is not uncommon for students to go into debt for decades in order to take the courses that interest them (even if the motivation is mainly to obtain a degree at the end). But above all, as soon as they reach the age of 18, credit rating companies monitor their every move, often without their knowledge. And if one day a consumer loan or mortgage is refused, no reasons are given. Is it a late rent payment? Forgotten library fines? An unpaid water bill, years old?

Credit rating companies in the United States, but also in China, are beginning to explore the use of social media data to improve credit scores. Could counting the number of times a user writes the word "wasted" in what they post online reveal information about debt repayment? This is at least what the American credit analyst FICO claims: "If you look at how many times a person says 'wasted' in their profile, it has some value in predicting whether they're going to repay their debt (...) It's not much, but it's more than zero." (quoted in McLannahan (2015)). In China, peer-to-peer lender Jubao revealed that it was more likely to give "bonuses" to borrowers if they were Facebook friends with celebrities, as Botsman (2017) tells us.

For the moment, credit rating companies still use the data they know well (utility bills and credit cards), but they imagine that a lot of interesting information must be accessible (in one way or another) on social networks. Such data are still scarce, however, and difficult to analyze. What about the sarcastic or humorous component of a tweet using the word "wasted"? As is often the case, the difficulty is that truly relevant data are hard to obtain. Information on rent payments may be available when a tenant goes through an agency, but what about transactions between two individuals? And even if that were possible, how would you handle roommates? Being denied credit because a former roommate didn't pay on time is disturbing, all the more so if the debt is a cell phone bill wrongly claimed by the phone company after the subscription had been canceled.

But the big penalty (the "malus") in a credit score is often the fact of never having had a credit card. One might think that a person who did not need a credit card (and was satisfied with a debit card for paying merchants, like most bank cards in France) is a prudent person, who does not need credit for daily expenses. But for credit institutions, this person is not reliable because they know nothing about him. And it is up to him to prove that he is reliable (we return to the recurrent practice of reversing the burden of proof mentioned in Charpentier (2016)). Strangely, this is also what happens today when you want to enter the United States without having a Facebook page.

What if credit institutions aren't the only ones interested in our lives? What would the world look like if, in addition to knowing whether I pay my bills on time, some people wanted to know about my networks of friends, which newspapers I read, whether I prefer to buy whole milk or semi-skimmed milk? When we visit the Stasi Museum in Berlin, we discover that this world existed, that 1 person out of 63 was an agent (or informant) of the Stasi (counting occasional informants, the proportion can reach one person out of 6). The museum describes a total panopticism, each person being observed permanently, as described by Foucault (1975). But doesn't this nightmare correspond to today's world of permanent, more or less consensual, surveillance? There is surveillance via cell phones (geolocation for the most common function, but sometimes also audio recordings made by certain applications without the user's knowledge), via connected objects, and also via surveillance cameras coupled with increasingly powerful facial recognition algorithms. At the end of 2017, 170 million cameras were installed in China, and the 300 million mark should be reached by 2020. In an experiment conducted by the BBC, it took seven minutes to find the journalist John Sudworth walking in the streets.

The danger is that you never know who's in control. More and more private security companies have partnered with governments. Email providers read our messages to detect spam, but also to resell certain information. For example, in the Privacy Policy attached to Gmail's Terms of Use (Google), we read: "Our automated systems analyze your content (including email) to provide you with custom product features, such as (...) custom advertising." Insurers are increasingly considering the installation of GPS boxes in cars, but through external service providers. Beyond the ownership of data (mentioned in Charpentier & Suire (2016)), we can wonder about their resale and their use. Knowing that someone regularly visits a blood transfusion center is potentially interesting information, especially when combined with other data.

Since 2014, the Chinese government has been working on an evaluation system for its own citizens, scheduled to be implemented in 2020, as Trujillo (2017) tells us. This "social credit system" aims to create a "citizen score" (to use the expression of Galeon & Bergan (2017)), in order to predict and prevent potential dangers, normalizing individual behavior through panoptic devices (e.g. video surveillance) and inducing self-defense and self-control reflexes. As Foucault (1975) said, it is a question of "ensuring that surveillance is permanent in its effects, even if it is discontinuous in its action; that the perfection of power tends to render its actual exercise unnecessary" (although today, surveillance is more and more continuous in its action). Some of these scores are used by police to decide where to patrol to reduce crime, as with PredPol. But when we look more closely, the predictions say, in substance, that most crimes will take place in the (historically) most crime-prone areas of the city. The boundary between banality and tautology is narrow. And the real danger is that scores often transform probabilities into near-certainties, and suspicion becomes proof, as Supiot (2015) noted.

In June 2010, a report from the Academy of Medicine called for "improving the practice of expert assessment of sex offender dangerousness by teaching and disseminating actuarial methods." These "actuarial methods" are quite simply scoring techniques, or "profiling" as defined in the European regulation on personal data of 27 April 2016 (GDPR). Angèle Christin has studied algorithms that estimate the probability of recidivism in the American criminal justice system. As she has shown, these techniques raise many questions, particularly discriminatory biases, an opacity that makes appeals difficult, and especially the difficulty of understanding what is actually calculated. In the State of Virginia, a score between 1 and 10 is returned, a convention also adopted by COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), which additionally offers a color code that predicts the risk of violent recidivism. It is meant to be a decision-support tool: the machine alone cannot place a person in detention (Christin et al. (2015)).

The conclusions of a predictive score depend on two key elements: the model used, and the data. In the majority of cases in the United States, the models' code remains opaque (and therefore impossible to challenge), and few people have seen the data used to calibrate these models. But one may ask whether court decisions are not also relatively opaque. Judges must certainly give reasons for their decisions, which opens them to criticism and challenge, but if the process were so transparent, shouldn't the outcome of a (human) trial be more predictable? Finally, some biases are quite simple to understand. Suppose being rich means having a good lawyer, and having a good lawyer means avoiding certain convictions. In this case, a wealth variable (the type of vehicle owned, for example) will be positively related to not being found guilty, and will lower the dangerousness score. The other danger with selection biases is that they are sometimes complex to understand, even paradoxical. A classic example is shown in Figure 1. During World War II, engineers and statisticians were asked how to reinforce bombers that were facing enemy fire.

Figure 1: Damaged locations of returned aircraft (Source: McGeddon 2016)

Statistician Abraham Wald began collecting data on impacts in the cabin, as reported by Mangel & Samaniego (1984). To everyone's surprise, he recommended armoring the areas of the aircraft that showed the least damage. Indeed, the sample had a significant bias: only aircraft that returned were taken into account. If they were able to return with holes at the tips of the wings, it is because these parts are sufficiently solid. And since no aircraft returned with holes in the propeller engines, these were the parts that needed to be reinforced.
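The selection effect behind Wald's recommendation can be reproduced with a small simulation. This is only a sketch under invented assumptions: the three sections, the lethality values in LETHALITY, and the rule of one uniformly located hit per plane are all hypothetical and are not taken from Wald's data.

```python
import random

# Hypothetical probability that a hit in each section brings the plane down.
LETHALITY = {"engine": 0.8, "fuselage": 0.3, "wing_tip": 0.05}


def simulate_missions(n_planes=10_000, seed=0):
    """Return (hits on all planes, hits seen on returned planes) per section."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    hits_all = dict.fromkeys(sections, 0)
    hits_returned = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        # Assumption: every plane takes exactly one hit, uniformly located.
        section = rng.choice(sections)
        hits_all[section] += 1
        # The plane returns only if the hit was not lethal.
        if rng.random() >= LETHALITY[section]:
            hits_returned[section] += 1
    return hits_all, hits_returned


hits_all, hits_returned = simulate_missions()
for section in LETHALITY:
    print(section, "all:", hits_all[section], "returned:", hits_returned[section])
```

Hits are spread roughly evenly over all planes, yet among returned planes the engine shows the fewest holes, precisely because it is the most vulnerable section. An analyst who sees only the returned sample would draw the opposite conclusion.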

Another danger arises when causal relationships are reversed. Consider the doctor who prescribes a powerful neuroleptic to a patient under investigation, for fear that the courts will reproach him for not having seen how dangerous his patient was; the courts, conversely, rely on this prescription to prove that the patient is dangerous. A poorly designed algorithm could misread the direction of causal relationships.

But predictive models in judicial matters are not only on the side of judges. In the event of a road traffic accident, the Badinter Act (of 5 July 1985) provides for a "right to compensation" for any victim of a traffic accident involving a land motor vehicle. When the driver's insurance company offers compensation, the victim makes a quick cost/benefit analysis to decide whether to go to court. Even without formally constructing a predictive model, the victim tries to weigh, from the elements known to him, the costs of asking a judge to decide on the amount of compensation against the (potential) benefits.

Another important point is that lawyers call these "predictive" models "actuarial" models. The first function of actuaries was to discount, that is, to calculate the value of time. And judicial delays often have disastrous consequences. How would an imperfect human decision, taken after 5 years of proceedings, be "better" than an automatic decision taken in 15 days by a machine? Many people who have endured several years of proceedings that ended in a dismissal dream of accelerated procedures. "Lost time" has a value, as actuaries know well.

What then of this efficiency of algorithmic models? Justice must be effective, but this constraint must not make us forget the central objective, which is to render justice. What happens if efficiency becomes an objective, not to say the main objective? This is the question posed by predictive models: what is the objective that we are trying to maximize? And how is it formulated in a simple way?

In the United States, many judges have been accused of justifying a judgment with decision-support tools, which leaves some doubt as to the real function of these tools. The original idea was to assist. Recently, several systems put in place over the past years have been questioned. For example, in Australia, the STMP (Suspect Targeting Management Plan) was meant to identify whether or not pre-adolescents should be monitored. Like any actuarial model, it is a risk assessment and prediction tool, focusing either on repeat offenders or on those suspected of being likely to commit a future crime. However, a recent report showed that its use had "no observable impact on crime prevention". At the same time in the United States, COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) has been criticized in Dressel & Farid (2017): "Advocates of these systems argue that data and advanced automated learning make these analyses more accurate and less biased than those of humans. However, we show that the widely used Compas risk assessment software is no more accurate or fair than predictions made by people with little or no criminal justice expertise." In that study, people recruited on the Internet, with no legal training, were asked to predict whether or not defendants would commit another crime within the next two years. COMPAS was wrong in 34.8% of cases, and the Internet users in 33% of cases. That said, one may wonder what "to be wrong" means. What is measured here is not recidivism, but conviction for recidivism. What if the models (or the people) hadn't been wrong, but the judges had?
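The gap between recidivism and conviction for recidivism can be made concrete with a toy calculation. The ten cases below are invented for illustration and have nothing to do with the COMPAS data: a hypothetical predictor that is perfect with respect to actual reoffending still appears to err when it is scored against convictions.

```python
def error_rate(predictions, labels):
    """Share of cases where the prediction disagrees with the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical ground truth: 4 of 10 people actually reoffended.
reoffended = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical observed label: only 2 of those 4 were convicted.
convicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A predictor that is perfect with respect to actual reoffending.
predicted = list(reoffended)

print("error vs. actual reoffending:", error_rate(predicted, reoffended))
print("error vs. convictions:", error_rate(predicted, convicted))
```

Scored against the observed labels, the predictor is "wrong" in 20% of these invented cases, although every one of those errors comes from the label and not from the prediction.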

What if part of the problem lies in what we ask of a predictive tool? To predict is (basically) to assign a probability to a future event. As was pointed out in a debate on polls and elections, can we say that we were wrong if we announce that an event has a 5% chance of happening and it actually happens? To know whether a forecasting technique is good, you need to collect a set of forecasts and compare them with observations. This is what meteorologists have been doing for about fifteen years, and it has been formalized by Gneiting et al. (2007). Their idea is that a model yields a set of predictive distributions $\{\hat F_t, \hat F_{t+1}, \dots, \hat F_{t+h}\}$, and that it is these distributions, and not the point forecasts $\{\hat y_t, \hat y_{t+1}, \dots, \hat y_{t+h}\}$, that should be compared with the observations $\{y_t, y_{t+1}, \dots, y_{t+h}\}$. It is then necessary to introduce a distance between the predictive distributions and the observations. In a physical system, it is possible to imagine understanding the different causal relationships, and thus to predict. But in human relations (and justice is a perfect example), nothing is as simple, as automatic as the laws of fluid mechanics that make it possible to model meteorological phenomena.
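For binary events, one simple distance between forecast probabilities and observations is the Brier score, the mean squared difference between the announced probability and the 0/1 outcome. It is used here as an illustrative stand-in and is not necessarily the measure chosen by Gneiting et al. (2007). The data are hypothetical: 100 events, 5 of which occur.

```python
def brier_score(probabilities, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical record: 100 events, 5 of which actually happened.
outcomes = [1] * 5 + [0] * 95

calibrated = [0.05] * 100   # always announces a 5% chance
never = [0.0] * 100         # always announces that the event is impossible
coin_flip = [0.5] * 100     # always announces a 50% chance

for name, forecast in [("5%", calibrated), ("0%", never), ("50%", coin_flip)]:
    print(name, brier_score(forecast, outcomes))
```

On this record, the forecaster who announced 5% each time obtains the lowest (best) score of the three, even though the event "actually happened" five times. A single occurrence says little; the quality of a probabilistic forecast only shows over a set of forecasts.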

Topics:
big data, predictive analytics, data analytics, credit score

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
