
Bayes, Credit Scoring, and Terrorism



My neighbor Corey published a very interesting post on his blog http://bayesianbiologist.com/… on how likely it is that the NSA program will catch a terrorist (a real one). I have been working on something similar over the last few weeks, with Stéphane Tufféry, for our chapter, entitled Statistical Learning in Actuarial Science.

The idea was to show credit scoring techniques, from logistic regression to classification trees and random forests. Of course, it is more boring, since we talk about loans and not terrorism. In credit scoring, we consider possible loans, and we have to predict whether someone is more likely to be a bad guy or a good guy. The idea is the same: based on some covariates, we need to build a score function that can be related to the probability of being bad. The higher the score, the more likely the person is to be a bad guy. Then, of course, we have to discuss errors, namely false positives (good guys that we think are bad) and false negatives (bad guys that we think are good).
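To make the two kinds of error concrete, here is a minimal Python sketch. It is not the code from the chapter: the scores, labels, threshold, and rates are all made up for illustration, and the function names `confusion_counts` and `prob_bad_given_flagged` are hypothetical. The first function flags everyone whose score reaches a threshold and counts the four possible outcomes. The second applies Bayes' rule to get the probability that a flagged person really is bad.

```python
from typing import Dict, Sequence


def confusion_counts(scores: Sequence[float], is_bad: Sequence[bool],
                     threshold: float) -> Dict[str, int]:
    """Flag everyone whose score is at or above the threshold, then count outcomes."""
    counts = {"true_positive": 0, "false_positive": 0,
              "true_negative": 0, "false_negative": 0}
    for score, bad in zip(scores, is_bad):
        flagged = score >= threshold
        if flagged and bad:
            counts["true_positive"] += 1    # bad guy, correctly flagged
        elif flagged and not bad:
            counts["false_positive"] += 1   # good guy that we think is bad
        elif not flagged and bad:
            counts["false_negative"] += 1   # bad guy that we think is good
        else:
            counts["true_negative"] += 1    # good guy, correctly left alone
    return counts


def prob_bad_given_flagged(prevalence: float, true_positive_rate: float,
                           false_positive_rate: float) -> float:
    """Bayes' rule: P(bad | flagged) = P(flagged | bad) P(bad) / P(flagged)."""
    p_flagged = (true_positive_rate * prevalence
                 + false_positive_rate * (1.0 - prevalence))
    return true_positive_rate * prevalence / p_flagged


# Made-up toy portfolio: six scores and whether each person turned out bad.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
is_bad = [True, False, True, False, False, False]

print(confusion_counts(scores, is_bad, threshold=0.5))
print(confusion_counts(scores, is_bad, threshold=0.7))

# Made-up rates: 1% of people are bad, 90% of bad guys are flagged,
# and 10% of good guys are flagged by mistake.
print(prob_bad_given_flagged(0.01, 0.9, 0.1))
```

Raising the threshold in this toy example trades a caught bad guy for a missed one, which is the tension behind the classification curves. With the made-up rates, most flagged people are good guys, simply because bad guys are rare.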

From the company's perspective, you do not want to have bad guys in your portfolio, and from everyone else's point of view (since everyone believes he's the good one, which is a classical optimism bias), we do not want to be confused with those bad guys. Then we can spend hours on classification curves and on criteria to assess whether our classifier is good or not. While I was writing the introduction of the chapter, I remember that I found it hard to find the proper words to describe that 0/1 problem. But I did use (like everyone else) the terms good and bad. As in terrorism. Except that to use this terminology (bad and good), we have to be more specific. In credit scoring, a bad guy is, for instance, someone who failed to pay back at least once. But in terrorism, I think it is more difficult to say what a terrorist is.

In December 1996, I was in an RER train, going south, and we reached Cité Universitaire when a bomb exploded in Port Royal. The train following mine, I guess. I remember that a couple of days later, I was traveling across Paris, on a bus, carrying with me a nice plant of…well, a plant that you’re not supposed to grow. (Say I was carrying sandwiches, à la Ted Mosby.) In order to avoid trouble, I put my sandwiches in a large box. I remember that people were staring at me, and some actually asked what was in the box.

People often try to build their own terrorist classifiers, based on what they think might be covariates. And dirty trousers, a poorly shaved face, long hair (yes, I used to have long hair), and a box on a bus were obviously some of them. Note that I don’t blame them -- I do the same! After reading Corey’s post this morning, I took the bus. And I saw someone with a ninja sword.

At first, my terrorist classifier put her in the bad guy class. Then I understood it was an umbrella. So I put her in the super cool geeky category (that only a few can reach).

When I started to teach non-life insurance in Paris, the last part of the course was dedicated to large risks, natural catastrophes, and a hot topic: terrorism. I was giving this course (probably my best experience, ever) in tandem with François Bucchini, who was working at that time for AXA France. The two of us were giving the course together, interacting: I was the boring guy doing the maths, and François was sharing his experience. At that time, he was involved in the creation of GAREAT, a market structure launched in France in 2002 to offer reinsurance against terrorism (for French companies). And one of the first claims came from the CAV (a punning acronym for Comité d’Action Viticole), which was considered a terrorist group. So, as he told us, be careful of prejudices when you think about terrorism. Cool wine drinkers can be dangerous terrorists…

Actually, I would love to see the covariates used by the NSA to predict if you’re a bad guy, or a potentially dangerous terrorist. Let's have a guess… You've asked for a visa for Pakistan? Or Afghanistan? Or Libya (not Libya, not yet bad guys, still have good friends there)? You have an NRA membership? You bought some heavy metal on iTunes? You still have a Stop ACTA sticker on your blog? You have a blog? You wrote a post including the word terrorist in it?

Note: I am supposed to be in Chicago next week. If I can't enter the U.S., we’ll probably know more about potential covariates.


Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

