Visualizing (Censored) Lifetime Distributions

Understanding all the R packages out there is hard, but this one's pretty interesting. Check out some standard censored lifetime data and see how to turn it into a nice visual.

There are now more than 10,000 R packages available from CRAN, and many more if you include those available only on GitHub. So, to be honest, it's difficult to know all of them. But sometimes you discover a nice function in one of them, and that is really awesome. Consider, for instance, some (standard) censored lifetime data:

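n=10000
idx=sample(1:4,size=n,replace=TRUE)
pd=LETTERS[idx]              # product label, A to D
lambda=1+(idx-1)/3           # failure rate of each product
life=rexp(n,lambda)          # latent lifetime
cens=rexp(n)                 # independent censoring time
obs=life<=cens               # TRUE when the failure is observed (not censored)
y=pmin(life,cens)            # what we actually record
df=data.frame(time=y,status=obs,product=pd)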

Yes, I simulate the data here rather than load a real data set. Consider the Kaplan-Meier estimator of the survival function:

library(survival)
km.base = survfit( Surv(time,status) ~ 1  , data = df )
plot(km.base)
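
Since the data are simulated, the Kaplan-Meier estimate can be checked against the true survival function. The sketch below is only an illustration under stated assumptions: it regenerates its own sample (the seed is arbitrary), codes status as TRUE when the failure is observed, which is the convention Surv() expects, and compares the estimate with the equal-weight mixture of the four exponential survival functions. The names life, cens, true_surv and comparison are mine, not from the original code.

```r
library(survival)

set.seed(1)                            # arbitrary seed, for reproducibility only
n <- 10000
idx <- sample(1:4, size = n, replace = TRUE)
lambda <- 1 + (idx - 1) / 3            # failure rate of each product
life <- rexp(n, lambda)                # latent lifetime
cens <- rexp(n)                        # independent censoring time
df <- data.frame(time = pmin(life, cens),
                 status = life <= cens,  # assumption: TRUE = failure observed
                 product = LETTERS[idx])

km <- survfit(Surv(time, status) ~ 1, data = df)

# True marginal survival: equal-weight mixture of four exponentials
rates <- 1 + (0:3) / 3
true_surv <- function(s) sapply(s, function(u) mean(exp(-rates * u)))

# Compare the estimate and the truth at a few time points
grid <- c(0.25, 0.5, 1, 2)
comparison <- data.frame(time = grid,
                         km = summary(km, times = grid)$surv,
                         truth = true_surv(grid))
print(round(comparison, 3))
```

With 10,000 observations the two columns should be close, with the gap widening at later times where few units remain at risk.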

Recently, Anat (currently finishing the Data Science for Actuaries program) helped me discover a nice R function to add information to that graph (well, not that graph, since it will be a ggplot version, but the same survival distribution plot):

library(ggplot2)
library(survminer)
ggsurvplot(km.base, main = "", color = "blue", censor = FALSE, xlim = c(0,3),
           risk.table = TRUE, risk.table.col = "blue", risk.table.height = 0.2,
           risk.table.title = "", legend.labs = "All", legend.title = "",
           break.time.by = 1, xlab = "", ylab = "")
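
The information added under the curve is the risk table: the number of units still under observation at each break time. As far as I can tell, these are the same counts that survfit stores as n.risk, so they can be read without drawing anything. The sketch below is a minimal check on a freshly simulated sample (arbitrary seed, my own variable names), comparing summary() at times 0, 1, 2, 3 with a direct count of observations whose recorded time is at least that large.

```r
library(survival)

set.seed(1)                            # arbitrary seed, for reproducibility only
n <- 10000
idx <- sample(1:4, size = n, replace = TRUE)
life <- rexp(n, 1 + (idx - 1) / 3)     # latent lifetime
cens <- rexp(n)                        # independent censoring time
df <- data.frame(time = pmin(life, cens),
                 status = life <= cens)  # assumption: TRUE = failure observed

km <- survfit(Surv(time, status) ~ 1, data = df)

# Break times matching xlim = c(0, 3) and break.time.by = 1
breaks <- 0:3
at_risk_km <- summary(km, times = breaks, extend = TRUE)$n.risk

# Same counts by hand: units whose recorded time has not yet passed
at_risk_by_hand <- sapply(breaks, function(s) sum(df$time >= s))

print(data.frame(time = breaks, at_risk_km, at_risk_by_hand))
```

The count at time 0 is the full sample, and it drops quickly because both failures and censoring remove units from the risk set.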


This is more interesting when we have different lifetimes:

km.prod = survfit( Surv(time,status) ~ product  , data = df )
ggsurvplot(km.prod, main = "", censor = FALSE, xlim = c(0,3),
           risk.table = TRUE, risk.table.col = "strata", risk.table.height = 0.3,
           risk.table.title = "", legend.labs = LETTERS[1:4], legend.title = "",
           break.time.by = 1, xlab = "", ylab = "")
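
To put a number on how different the four lifetimes are, one option is to read the median lifetime of each product from the fitted object and compare it with the theoretical median of an exponential distribution, log(2) divided by the rate. This is only a sketch under my own assumptions: it simulates a new sample with an arbitrary seed, codes status as TRUE for an observed failure, and uses the summary table that survfit objects provide. The data frame medians is a name I made up for the comparison.

```r
library(survival)

set.seed(1)                            # arbitrary seed, for reproducibility only
n <- 10000
idx <- sample(1:4, size = n, replace = TRUE)
lambda <- 1 + (idx - 1) / 3            # failure rate of each product
life <- rexp(n, lambda)                # latent lifetime
cens <- rexp(n)                        # independent censoring time
df <- data.frame(time = pmin(life, cens),
                 status = life <= cens,  # assumption: TRUE = failure observed
                 product = LETTERS[idx])

km.prod <- survfit(Surv(time, status) ~ product, data = df)

# Kaplan-Meier median per stratum, next to the exponential median log(2)/rate
km_median <- summary(km.prod)$table[, "median"]
true_median <- log(2) / (1 + (0:3) / 3)

medians <- data.frame(product = LETTERS[1:4],
                      km = unname(km_median),
                      truth = true_median)
print(medians, digits = 3)
```

Product A has the lowest failure rate, so it should show the longest median lifetime, and product D the shortest.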


Or a different time granularity:
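
The call that produced this last variant is not shown here, so the following is a guess rather than the original code. It assumes that "time granularity" refers to the break.time.by argument, and uses 0.5 as a hypothetical value so that the axis ticks and the risk table columns appear every half unit of time. The data are simulated again with an arbitrary seed, and data = df is passed explicitly to ggsurvplot.

```r
library(survival)
library(survminer)

set.seed(1)                            # arbitrary seed, for reproducibility only
n <- 10000
idx <- sample(1:4, size = n, replace = TRUE)
life <- rexp(n, 1 + (idx - 1) / 3)     # latent lifetime
cens <- rexp(n)                        # independent censoring time
df <- data.frame(time = pmin(life, cens),
                 status = life <= cens,  # assumption: TRUE = failure observed
                 product = LETTERS[idx])

km.prod <- survfit(Surv(time, status) ~ product, data = df)

# Same plot as before, with a finer time grid for ticks and risk table
p <- ggsurvplot(km.prod, data = df, censor = FALSE, xlim = c(0, 3),
                risk.table = TRUE, risk.table.col = "strata",
                risk.table.height = 0.3, risk.table.title = "",
                legend.labs = LETTERS[1:4], legend.title = "",
                break.time.by = 0.5,   # assumed value, not from the original
                xlab = "", ylab = "")
print(p)
```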

Nice, isn't it?

Published at DZone with permission of
