Over a million developers have joined DZone.

Inequalities and Quantile Regression

· Big Data Zone

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

In the course on inequality measures, we’ve seen how to compute various (standard) inequality indices, based on some sample of incomes (that can be binned, in various categories). On Thursday, we discussed the fact that incomes can be related to different variables (e.g. experience), and that comparing income inequalities between countries can be biased, if they have very different age structures.

So we’ve seen quantile regressions. I can mention some old slides (used in a crash course at McGill three years ago), as well as a more technical discussion on ties, and non-unicity of the regression line.

In order to illustrate, consider the following dataset:

> salary <- read.table("http://data.princeton.edu/wws509/datasets/salary.dat",header=TRUE)
> plot(salary$yd,salary$sl)
> abline(lm(sl~yd,data=salary),col="blue")

We have here the standard regression line, obtained using ordinary least squares. Here we have the expected income given the experience. But we can also use a quantile regression:


> library(quantreg)
> Q10 <- rq(sl~yd,data=salary,tau=.1)
> Q90 <- rq(sl~yd,data=salary,tau=.9)
> abline(Q10,col="red")
> abline(Q90,col="purple")

A classical tool to describe inequalities is the ratio of the 90% quantile over the 10% quantile (among so many others):

> ratio9010 = function(age){
+   predict(Q90,newdata=data.frame(yd=age))/
+   predict(Q10,newdata=data.frame(yd=age))
+ }

For instance, among people with 5 years of experience, there is an inequality index of

> ratio9010(5)

while for people with 30 years of experience, it would be

> ratio9010(30)

If we plot the evolution of this 90-10 ratio, as a function of the experience, we get the following increasing trend:

> A=0:30
> plot(A,Vectorize(ratio9010)(A),type="l",ylab="90-10 quantile ratio")

So clearly, comparing inequalitis ceteris paribus between two groups, should be performed carefully, and probably including some covariates.


Hortonworks Sandbox is a personal, portable Apache Hadoop® environment that comes with dozens of interactive Hadoop and it's ecosystem tutorials and the most exciting developments from the latest HDP distribution, brought to you in partnership with Hortonworks.


Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}