Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Using the Mortality Database

DZone's Guide to

Using the Mortality Database

Take a look at some fun analysis based on demographic data and using a demography library in R to get the proportion of people alive in 1945 who are still alive today.

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

In demography, we like to use life tables to estimate the probability that someone born in 1945 (as an example) is still alive nowadays. But another interesting quantity might be the probability that someone alive in 1945 is still alive nowadays.

The main difference is that we do not know when that person, alive in 1945, was born. Someone who was old in 1945 is very unlikely to still be alive in 2017. To compute those probabilities, we can use datasets from here. More precisely, we need both death and birth data. I assume that datasets (text files) were downloaded (it is necessary to register, for free, to get the data).

D=read.table("FRDeaths_1x1.txt",skip=1,header=TRUE)
B=read.table("FRBirths.txt",skip=1,header=TRUE)

In the death dataset, there is a “110+” for people older than 110 years. For convenience, let us cap our observations at 110 years old,

D$Age=as.numeric(as.character(D$Age))
D$Age[is.na(D$Age)]=110

Consider now a first function that will return for people born in 1930 (as an example) two peices of information:

  • The number of people (here, let's consider women only) born in 1930 (from the birth database).
  • The number of death of people of age 0 in 1930, people of age 1 in 1931, people of age 2 in 1932, etc.

The code is simple:

nb=function(y=1930){
debut=1816
MatDFemale=matrix(D$Female,nrow=111)
colnames(MatDFemale)=debut+0:198
cly=y-debut+1:111
deces=diag(MatDFemale[,cly[cly%in%1:199]])
return(c(B$Female[B$Year==y],deces))}

We have a single number for the number of births, and then a vector for the number of deaths. Consider now another function. Consider the people born in 1930. We want to get two numbers: the number of people still alive in 1945 (say), and the number of people still alive nowadays. The ratio will be the proportion of people born in 1930 that were alive in 1945 that are still alive in 2015.

pop=function(ne=1930,an=1945){
comptage=nb(ne)
s=0
if(an>ne) s=sum(comptage[seq(2,1+an-ne)])
p1=max(comptage[1]-s,0)
p2=max(p1-sum(comptage[seq(2+an-ne,length(comptage))]),0)
c(p1,p2)
}

Then, for a given year (say 1945), to get the proportion of people alive in 1945 that are still alive today, we need to count how many people born in 1944 were still alive in 1945, and in 2015, but also born in 1943, 1942, etc. And we simply consider the ratio of the total number of people alive in 2015 over the total number of people alive in 1945:

ptn=function(y=1945){
V=Vectorize(function(x) pop(x,y))(1816:y)
sum(V[2,!is.na(V[2,])])/sum(V[1,!is.na(V[1,])])
}

Hence, 22% of those alive in 1945 are still alive in 2015:

> ptn(1945)
[1] 0.2209435

Actually, instead of looking only at 1945, it is possible to get a plot:

P=Vectorize(ptn)(1900:2010)
plot(1900:2010,P,type="l",ylim=0:1)

For instance:

> ptn(1975)
[1] 0.6377413

63.7% of those alive in 1975 are still alive 40 years after. That is a rather interesting function; I was surprised that I couldn’t find it is standard demographical R package.

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.

Topics:
database ,demographics ,r

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}