Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

DZone's Guide to

# Smoothing Mortality Rates in R

· Big Data Zone ·
Free Resource

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

Recently, I was working with Julie, a student of mine, coming from Rennes, on mortality tables. Actually, we work on genealogical datasets from a small region in Québec, and we can observe a lot of volatiliy. If I borrow one of her graphs, we get something like

Since we have some missing data, we wanted to use some Generalized Nonlinear Models. So let us see how to get a smooth estimator of the mortality surface.  We will write some code that we can use on our data later on (the dataset we have has been obtained after signing a lot of official documents, and I guess I cannot upload it here, even partially).

```DEATH <- read.table(
"http://freakonometrics.free.fr/Deces-France.txt",
header=TRUE)
EXPO  <- read.table(
"http://freakonometrics.free.fr/Exposures-France.txt",
header=TRUE,skip=2)
library(gnm)
D=DEATH\$Male
E=EXPO\$Male
A=as.numeric(as.character(DEATH\$Age))
Y=DEATH\$Year
I=(A<100)
base=data.frame(D=D,E=E,Y=Y,A=A)
subbase=base[I,]
subbase=subbase[!is.na(subbase\$A),]```

The first idea can be to use a Poisson model, where the mortality rate is a smooth function of the age and the year, something like

$D_{x,t}\sim\mathcal{P}(E_{x,t}\cdot \exp[{\color{blue}s(x,t)}>)$

```library(mgcv)
regbsp=gam(D~s(A,Y,bs="cr")+offset(log(E)),data=subbase,family=quasipoisson)
predmodel=function(a,y) predict(regbsp,newdata=data.frame(A=a,Y=y,E=1))
vX=trunc(seq(0,99,length=41))
vY=trunc(seq(1900,2005,length=41))
vZ=outer(vX,vY,predmodel)
persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)",
ylab="Years (1900-2005)",zlab="Mortality rate (log)")```

The mortality surface is here

It is also possible to extract the average value of the years, which is the interpretation of the $a_x$ coefficient in the Lee-Carter model,

```predAx=function(a) mean(predict(regbsp,newdata=data.frame(A=a,
Y=seq(min(subbase\$Y),max(subbase\$Y)),E=1)))
plot(seq(0,99),Vectorize(predAx)(seq(0,99)),col="red",lwd=3,type="l")```

We have the following smoothed mortality rate

Recall that the Lee-Carter model is

$D_{x,t}\sim\mathcal{P}(E_{x,t}\cdot \exp[{\color{blue}a_x+b_x\cdot k_t}>)$

where parameter estimates can be obtained using

```regnp=gnm(D~factor(A)+Mult(factor(A),factor(Y))+offset(log(E)),
data=subbase,family=quasipoisson)```
```predmodel=function(a,y) predict(regnp,newdata=data.frame(A=a,Y=y,E=1))
vZ=outer(vX,vY,predmodel)
persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)",
ylab="Years (1900-2005)",zlab="Mortality rate (log)")```

The (crude) mortality surface is

with the following $a_x$ coefficients.

`plot(seq(1,99),coefficients(regnp)[2:100],col="red",lwd=3,type="l")`

Here we have a lot of coefficients, and unfortunately, on a smaller dataset, we have much more variability. Can we smooth our Lee-Carter model? To get something which looks like

$D_{x,t}\sim\mathcal{P}(E_{x,t}\cdot \exp[{\color{blue}s_a(x)+s_b(x)\cdot s_k(t)}>)$

Actually, we can, and the code is rather simple:

```library(splines)
knotsA=c(20,40,60,80)
knotsY=c(1920,1945,1980,2000)
regsp=gnm(D~bs(subbase\$A,knots=knotsA,Boundary.knots=range(subbase\$A),degre=3)+
Mult(bs(subbase\$A,knots=knotsA,Boundary.knots=range(subbase\$A),degre=3),
bs(subbase\$Y,knots=knotsY,Boundary.knots=range(subbase\$Y),degre=3))+
offset(log(E)),data=subbase, family=quasipoisson)
BpA=bs(seq(0,99),knots=knotsA,Boundary.knots=range(subbase\$A),degre=3)
BpY=bs(seq(min(subbase\$Y),max(subbase\$Y)),knots=knotsY,Boundary.knots= range(subbase\$Y),degre=3)
predmodel=function(a,y)
predict(regsp,newdata=data.frame(A=a,Y=y,E=1)) v
Z=outer(vX,vY,predmodel)
persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)",
ylab="Years (1900-2005)",zlab="Mortality rate (log)")```

The mortality surface is now

and again, it is possible to extract the average mortality rate, as a function of the age, over the years,

```BpA=bs(seq(0,99),knots=knotsA,Boundary.knots=range(subbase\$A),degre=3)
Ax=BpA%*%coefficients(regsp)[2:8]
plot(seq(0,99),Ax,col="red",lwd=3,type="l")```

We can then play with the smoothing parameters of the spline functions, and see the impact on the mortality surface

```knotsA=seq(5,95,by=5)
knotsY=seq(1910,2000,by=10)
regsp=gnm(D~bs(A,knots=knotsA,Boundary.knots=range(subbase\$A),degre=3)+
Mult(bs(A,knots=knotsA,Boundary.knots=range(subbase\$A),degre=3),
bs(Y,knots=knotsY,Boundary.knots=range(subbase\$Y),degre=3))
+offset(log(E)),data=subbase,family=quasipoisson)
predmodel=function(a,y) predict(regsp,newdata=data.frame(A=a,Y=y,E=1))
vZ=outer(vX,vY,predmodel)
persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)",
ylab="Years (1900-2005)",zlab="Mortality rate (log)")```

We now have to use those functions our our small data sample ! That should be fun….

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

# {{ parent.title || parent.header.title}}

### {{ parent.tldr }}

{{ parent.urlSource.name }}